Hi, can you build a model for the coding task?

#1
by win10 - opened

Also, can you build an MoE from GLM4-32B and 9B? (In my attempts, fine-tuning was necessary to restore performance; I suspect it's because the QKV projection was split.)

It depends on the models available, e.g. Llama, Mistral, DeepSeek, Qwen/QwQ.
Not all model architectures support MoE merging.
For example, Gemma does not.

Likewise, the model size, layer count, and architecture type must all match up to create an MoE (a rough compatibility check is sketched below).
GLM is also a new architecture type, which complicates things.
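
For reference, one way to pre-check that "must match up" point is to compare the candidate models' configs with the `transformers` library before attempting a merge. This is only an illustrative sketch, not part of any merge toolkit: the helper name `check_moe_compat` and the example model IDs are placeholders I chose here.

```python
# Minimal sketch (assumption: Hugging Face `transformers` is installed).
# Compares the config fields an MoE merge generally needs to agree on.
from transformers import AutoConfig

def check_moe_compat(model_ids):
    """Return True if all models share architecture, hidden size, layer count, and head count."""
    configs = [AutoConfig.from_pretrained(m) for m in model_ids]
    keys = ("architectures", "hidden_size", "num_hidden_layers", "num_attention_heads")
    reference = {k: getattr(configs[0], k, None) for k in keys}
    for model_id, cfg in zip(model_ids, configs):
        for k in keys:
            if getattr(cfg, k, None) != reference[k]:
                print(f"{model_id}: {k}={getattr(cfg, k, None)} differs from {reference[k]}")
                return False
    return True

if __name__ == "__main__":
    # Two Mistral-based checkpoints share the same architecture and dimensions,
    # so they pass; mixing in a Gemma checkpoint would fail the architecture check.
    print(check_moe_compat([
        "mistralai/Mistral-7B-v0.1",
        "HuggingFaceH4/zephyr-7b-beta",
    ]))
```

Running it only downloads the small `config.json` files, so it's a cheap sanity check before committing to a full merge.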
