YaRN: is "performance" referring to quality or speed?
#4, opened by kmouratidis
"All the notable open-source frameworks implement static YaRN, which means the scaling factor remains constant regardless of input length, potentially impacting performance on shorter texts."
Does performance here refer to prefill / decode speeds, or to model quality?
If it's the latter, do you have mitigation suggestions or even complete alternatives?
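To make sure I'm reading the quoted sentence correctly, here is a minimal sketch of how I understand "static" versus length-dependent ("dynamic") scaling. The function names, the 32768-token native context, and the factor of 4.0 are my own illustrative assumptions, not taken from any framework; the dynamic rule is the s = max(1, l'/L) form, capped here at the configured factor.

```python
def static_scale(seq_len: int, factor: float) -> float:
    """Static YaRN: the configured factor is applied at every input length."""
    return factor


def dynamic_scale(seq_len: int, original_max_len: int, factor: float) -> float:
    """Length-dependent scaling: no scaling (1.0) while the input fits the
    native context, then grow with the input, capped at the configured factor."""
    return min(factor, max(1.0, seq_len / original_max_len))


if __name__ == "__main__":
    ORIGINAL_MAX_LEN = 32768  # assumed native context length (illustrative)
    FACTOR = 4.0              # assumed configured scaling factor (illustrative)
    for seq_len in (1_000, 65_536, 200_000):
        print(
            seq_len,
            static_scale(seq_len, FACTOR),
            dynamic_scale(seq_len, ORIGINAL_MAX_LEN, FACTOR),
        )
```

If that reading is right, a 1,000-token prompt still gets the full factor under static scaling, which is the case I'm asking about.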