Tags: Text Generation, GGUF, all use cases, creative, creative writing, all genres, tool calls, tool use, problem solving, deep thinking, reasoning, deep reasoning, story, writing, fiction, roleplaying, bfloat16, role play, sillytavern, backyard, llama 3.1, context 128k, mergekit, Merge, Mixture of Experts, conversational
Hi, can you build a model for the coding task?
#1 opened by win10
Also, can you build an MoE for GLM4-32B and 9B? (In my attempts, fine-tuning was necessary to restore performance; I suspect this is because the fused QKV projection was split.)
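For context on the issue being described: GLM-style attention stores Q, K, and V as a single fused query_key_value weight, while Llama-style tooling expects separate q_proj / k_proj / v_proj tensors. Below is a minimal PyTorch sketch of such a split; the shapes are hypothetical illustrations, not GLM4's actual configuration.

```python
# Illustrative only: shows what "splitting QKV" means here.
# Shapes are hypothetical, not taken from GLM4's actual config.
import torch

hidden_size = 4096                 # hypothetical model width
num_heads, num_kv_heads = 32, 2    # grouped-query attention: fewer K/V heads
head_dim = hidden_size // num_heads

q_size = num_heads * head_dim      # 4096 rows for Q
kv_size = num_kv_heads * head_dim  # 256 rows each for K and V

# Fused projection as stored in GLM-style checkpoints: [Q; K; V] stacked.
fused_qkv = torch.randn(q_size + 2 * kv_size, hidden_size)

# Split into the separate tensors Llama-style tooling expects.
q_proj, k_proj, v_proj = torch.split(fused_qkv, [q_size, kv_size, kv_size], dim=0)

# The split itself is lossless when the fused layout really is [Q; K; V];
# head-interleaved layouts would need reshuffling, not a plain split.
assert q_proj.shape == (q_size, hidden_size)
assert k_proj.shape == (kv_size, hidden_size)
```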
It depends on the models available, e.g. Llama, Mistral, DeepSeek, Qwen/QwQ.
Not all model architectures support MoE; Gemma, for example, does not.
Likewise, the model size, layer count, and architecture type must all match up to create an MoE.
GLM is also a new architecture type, which complicates things.
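As a rough sketch of the "must all match up" check described above, one could compare the candidate expert models' configs before attempting a mergekit MoE merge. This assumes the transformers library; the model IDs and the exact list of fields to compare are placeholders, not a statement about any particular merge.

```python
# Sketch: check that candidate expert models agree on architecture and
# shape before attempting a mergekit MoE merge. Model IDs are placeholders.
from transformers import AutoConfig

CANDIDATES = [
    "meta-llama/Llama-3.1-8B-Instruct",    # placeholder
    "NousResearch/Hermes-3-Llama-3.1-8B",  # placeholder
]

# Fields that must line up for the experts to share one MoE skeleton.
KEYS = ["architectures", "hidden_size", "num_hidden_layers",
        "num_attention_heads", "intermediate_size", "vocab_size"]

def compatible(model_ids):
    configs = [AutoConfig.from_pretrained(m) for m in model_ids]
    for key in KEYS:
        # repr() so list-valued fields (e.g. architectures) are hashable
        values = {repr(getattr(cfg, key, None)) for cfg in configs}
        if len(values) > 1:
            print(f"Mismatch on {key}: {sorted(values)}")
            return False
    return True

if __name__ == "__main__":
    print("Mergeable:", compatible(CANDIDATES))
```

Passing a check like this is necessary but not sufficient: mergekit's MoE tooling also has its own per-architecture support list, which is why a new architecture like GLM complicates things even when the shapes line up.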