Qwen 3 fine-tuning >> MoE. Updated the experiment thread to include the config and script for fine-tuning the Qwen3-30B-A3B model.
The goal is to make a low-latency, non-thinking model for daily-driver coding, so 3 billion active parameters should be perfect.
✔️ training running
✔️ evals running
⏭️ improve dataset
The MoE isn't going to fit into Colab's A100 even with quantization (🙏 @UnslothAI ). So I've been working on HF Spaces' H100s for this. Everything is available in the thread and I'll share more tomorrow.
burtenshaw/Qwen3-Code-Lite#1
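For reference, here's a minimal sketch of the kind of LoRA SFT run this describes, assuming TRL's SFTTrainer with PEFT. The dataset name and hyperparameters are placeholders, not the actual values; the real config and script live in the thread above.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder dataset for illustration; swap in the real coding dataset.
dataset = load_dataset("my-org/my-coding-dataset", split="train")

# LoRA keeps the trainable footprint small enough for a single H100.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

config = SFTConfig(
    output_dir="Qwen3-Code-Lite",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    bf16=True,                    # native bf16 on H100
    gradient_checkpointing=True,  # trade compute for memory on the 30B MoE
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-30B-A3B",
    args=config,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```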