
YaRN: is "performance" referring to quality or speed?

#4
by kmouratidis - opened

"All the notable open-source frameworks implement static YaRN, which means the scaling factor remains constant regardless of input length, potentially impacting performance on shorter texts."

Does performance here refer to prefill / decode speeds, or to model quality?

If it's the latter, do you have mitigation suggestions or even complete alternatives?
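
To make the question concrete, here is a minimal sketch of how I understand the difference between a static factor and a length-dependent ("dynamic") one. The function names are hypothetical, and the values 4.0 and 32768 are only assumed for illustration; this is not taken from any framework's actual code. The dynamic rule `max(1, seq_len / original_max_len)` is the one I would expect as a possible mitigation, but I may be wrong about that.

```python
# Illustrative sketch only: names and default values are assumptions,
# not any framework's real API.

def static_factor(seq_len: int, configured_factor: float = 4.0) -> float:
    """Static scaling: the configured factor is applied to every input,
    no matter how short the sequence is."""
    return configured_factor


def dynamic_factor(seq_len: int, original_max_len: int = 32768) -> float:
    """Length-dependent scaling: no scaling (factor 1.0) while the input
    fits in the original context window, growing with length beyond it."""
    return max(1.0, seq_len / original_max_len)


if __name__ == "__main__":
    for n in (2_048, 32_768, 131_072):
        print(n, static_factor(n), dynamic_factor(n))
```

With a static factor, a 2,048-token prompt gets the same scaling as a 131,072-token one, which is the case the quoted sentence seems to warn about.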
