Speculative Decoding

#1
by FrenzyBiscuit - opened

I'm trying to use this model as the draft model for speculative decoding with the 32B model, and it dramatically slows generation down.

On the other hand, the regular Qwen 2.5 1.5B dramatically speeds up the regular Qwen 2.5 32B model.

Is this trained on the same v0.2 data as the 32B v0.2?
