Hi, can you build a model for the coding task?

#1
by win10 - opened

Also, can you build an MoE from GLM4-32B and 9B? (In my attempts, fine-tuning was necessary to restore performance; I suspect it's because the QKV projection was split.)

It depends on the models available, e.g. Llama, Mistral, DeepSeek, Qwen/QwQ.
Not all model architectures support MoE merging.
For example, Gemma does not.

Likewise, the model size, layer count, and architecture type must all match up to create an MoE (a rough compatibility check is sketched below).
GLM is also a new architecture type, which complicates things.
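
For reference, one way to pre-check that "must match up" point is to compare the candidate models' configs with the `transformers` library before attempting a merge. This is only an illustrative sketch, not part of any merge toolkit: the helper name `check_moe_compat` and the example model IDs are placeholders I chose here.

```python
# Minimal sketch (assumption: Hugging Face `transformers` is installed).
# Compares the config fields an MoE merge generally needs to agree on.
from transformers import AutoConfig

def check_moe_compat(model_ids):
    """Return True if all models share architecture, hidden size, layer count, and head count."""
    configs = [AutoConfig.from_pretrained(m) for m in model_ids]
    keys = ("architectures", "hidden_size", "num_hidden_layers", "num_attention_heads")
    reference = {k: getattr(configs[0], k, None) for k in keys}
    for model_id, cfg in zip(model_ids, configs):
        for k in keys:
            if getattr(cfg, k, None) != reference[k]:
                print(f"{model_id}: {k}={getattr(cfg, k, None)} differs from {reference[k]}")
                return False
    return True

if __name__ == "__main__":
    # Two Mistral-based checkpoints share the same architecture and dimensions,
    # so they pass; mixing in a Gemma checkpoint would fail the architecture check.
    print(check_moe_compat([
        "mistralai/Mistral-7B-v0.1",
        "HuggingFaceH4/zephyr-7b-beta",
    ]))
```

Running it only downloads the small `config.json` files, so it's a cheap sanity check before committing to a full merge.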
