Speculative Decoding

by FrenzyBiscuit - opened Jan 22

I'm trying to use this model as the draft model for speculative decoding with the 32B model, and it dramatically slows generation down instead of speeding it up.

On the other hand, regular Qwen 2.5 1.5B dramatically speeds up the regular 32B Qwen 2.5 model.

Was this model trained on the same v0.2 data as the 32B v0.2 model?
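
For context on why a mismatched draft model can hurt rather than help, here is a rough sketch of the standard expected-speedup estimate for speculative decoding. It assumes each drafted token is accepted independently with probability `alpha`, that `gamma` tokens are drafted per step, and that one draft forward pass costs `cost_ratio` times one target forward pass. The function name and the example numbers are hypothetical, not measurements from these models.

```python
def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Rough expected speedup of speculative decoding over plain decoding.

    alpha      -- assumed per-token acceptance probability of the draft model
    gamma      -- number of tokens drafted per verification step
    cost_ratio -- assumed cost of one draft pass relative to one target pass
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 1:
        raise ValueError("gamma must be at least 1")

    # Expected tokens produced per step: 1 + alpha + ... + alpha**gamma.
    if alpha == 1.0:
        tokens_per_step = float(gamma + 1)
    else:
        tokens_per_step = (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)

    # Cost per step in units of one target pass: gamma draft passes
    # plus one target pass to verify them.
    cost_per_step = gamma * cost_ratio + 1.0

    return tokens_per_step / cost_per_step


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    for alpha in (0.1, 0.5, 0.8):
        print(alpha, round(expected_speedup(alpha, gamma=4, cost_ratio=0.05), 2))
```

Under these assumptions, a value below 1.0 means a net slowdown, which happens when the draft's tokens are rarely accepted. That would be consistent with a draft model whose output distribution has drifted from the 32B target.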
