Speculative Decoding: I'd love to have a much smaller "companion model" (0.5B for example)

#43
by lstrozzi - opened

Hello, thank you very much for releasing such a great model to the public!

I was using the QWEN 32B + QWEN 0.5B pair until you released Mistral-Small-3.1-24B, but having tested your model, it clearly beats QWEN for my application.

However, I got an approximately 40% speed boost with QWEN, because the small 0.5B draft model can "predict" about half of the tokens much faster, leaving the 32B model to merely verify them. Do you plan to release a smaller Mistral to serve as a speculative decoding companion to the bigger models?
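(For readers unfamiliar with the technique: the control flow behind that speedup can be sketched in a few lines. This is a toy greedy variant with illustrative names and a hypothetical `next_token`-style model interface, not any specific library's API; real implementations verify drafts against the target model's probability distribution in a single batched pass.)

```python
def speculative_decode(draft_model, target_model, prompt, n_tokens, k=4):
    """Generate n_tokens after prompt, drafting k tokens at a time.

    draft_model / target_model: callables mapping a token list to the
    next token (toy stand-ins for a 0.5B draft and a 24B+ target model).
    """
    tokens = list(prompt)
    goal = len(prompt) + n_tokens
    while len(tokens) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        ctx = list(tokens)
        draft = []
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. The target model verifies each drafted token in order; the
        #    agreeing prefix is accepted, and at the first mismatch the
        #    target's own token is taken instead.
        for t in draft:
            expected = target_model(tokens)
            if t == expected:
                tokens.append(t)          # accepted: cheap token
            else:
                tokens.append(expected)   # rejected: fall back to target
                break
            if len(tokens) >= goal:
                break
    return tokens[len(prompt):]

# Toy deterministic models: the draft agrees with the target except at
# every 5th position, so most tokens are accepted cheaply.
target = lambda ctx: (ctx[-1] + 1) % 10
draft = lambda ctx: (ctx[-1] + 1) % 10 if len(ctx) % 5 else 0

out = speculative_decode(draft, target, [0], 8)
print(out)  # identical to greedy decoding with the target alone
```

With greedy verification the output is guaranteed to match what the target model alone would produce; the speedup comes purely from how often the draft's tokens are accepted, which is why a well-matched small companion model (roughly half the tokens accepted, in the case above) translates into the ~40% wall-clock gain described.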

Thanks in advance for your feedback.
Congratulations again on the excellent European work; we very much need local AI. I'm building an AI platform at https://www.thalabus.com, all made in Europe.
Cheers
