Different number of attention heads makes rotary_ndims vs rope scaling factors wrong?

#1
by bartowski - opened

In configuration_phi3.py, it has:

```
rotary_ndims = int(self.hidden_size // self.num_attention_heads * self.partial_rotary_factor)
```

so rotary_ndims would be 3072 // 24 * 1.0 = 128

Then rope_scaling_short_factor is a list of length 48

It then raises an error if

```
len(rope_scaling_short_factor) != rotary_ndims // 2
```

and since 48 != 64, this is an error (and I get a similar one in llama.cpp)

The question is: is the number of heads incorrect? In both Phi-3 mini and Phi-3.5 mini, num_attention_heads is 32, which would give a rotary_ndims of 96; divided by 2, that gives the 48 we expect.
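
For clarity, here is the arithmetic above as a minimal Python sketch (values taken from the config; partial_rotary_factor is shown as the 1.0 that the computation above effectively uses):

```python
# Values from the Phi-4-mini config; partial_rotary_factor of 1.0 is what
# the computation above effectively uses.
hidden_size = 3072
num_attention_heads = 24
partial_rotary_factor = 1.0  # assumed in the arithmetic above

rotary_ndims = int(hidden_size // num_attention_heads * partial_rotary_factor)
print(rotary_ndims, rotary_ndims // 2)  # 128 64 -- but short_factor has 48 entries
```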

Any idea what's incorrect?

Thanks for your interest!

In the config, the rotary factor is 0.75.
Could you share how you are loading the config?

https://huggingface.co/microsoft/Phi-4-mini-instruct/blob/4b00ec8714b0cb224e4fb33380cbf0919f177f3e/config.json#L31
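
With the 0.75 factor applied, the numbers line up. A minimal sketch (same values as above, now with the factor from the linked config):

```python
# Same arithmetic as above, this time honoring partial_rotary_factor = 0.75
# from the linked Phi-4-mini-instruct config.
hidden_size = 3072
num_attention_heads = 24
partial_rotary_factor = 0.75

rotary_ndims = int(hidden_size // num_attention_heads * partial_rotary_factor)
print(rotary_ndims, rotary_ndims // 2)  # 96 48 -- matches the 48-entry short_factor
```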

I was attempting to quantize this to an 8-bit EXL2 quant this morning, which also failed for what I assume are similar reasons. Looks like it's missing the check for partial_rotary_factor. Very cool to see 128K context on Phi-4. Will work to get the associated infrastructure in place.

Maybe there is the same issue with SGLang?

When I run the following command:

```
python3 -m sglang.launch_server --model-path microsoft/Phi-4-mini-instruct --host 0.0.0.0 --port 30000 --dp 4 --enable-p2p-check --mem-fraction-static 0.95
```

I get this error:

  File "/usr/local/lib/python3.10/dist-packages/transformers/models/phi3/configuration_phi3.py", line 159, in __init__
self._rope_scaling_validation()
File "/usr/local/lib/python3.10/dist-packages/transformers/models/phi3/configuration_phi3.py", line 208, in _rope_scaling_validation
raise ValueError(
ValueError: `rope_scaling`'s short_factor field must have length 64, got 48```

Yes, it seems likely that most of these tools are ignoring the 0.75 scaling. Thanks for pointing that out, @ykim362! Will investigate.


Same issue with vLLM, even with version 0.7.2 in OpenAI server mode.

Hi @leflak ,

Thanks for your interest!
We have already integrated it into vLLM, and it will be available from v0.7.3.
https://github.com/vllm-project/vllm/pull/12718

Thanks.

Getting the same error when GRPO training with Unsloth: `ValueError: rope_scaling's short_factor field must have length 64, got 48`

Same error when doing SFT with Hugging Face TRL:

ValueError: `rope_scaling`'s short_factor field must have length 64, got 48

This error is raised because the length of your rope_scaling dictionary’s short_factor list doesn’t match what the model configuration expects. In the validation method, the code calculates:

```
rotary_ndims = int(self.hidden_size // self.num_attention_heads * self.partial_rotary_factor)
```

Then it requires that the length of rope_scaling["short_factor"] be exactly rotary_ndims // 2. In your case, the error message indicates that it expected a length of 64, meaning rotary_ndims = 128 and 128 / 2 = 64. But your provided list has only 48 elements.

To resolve this issue, you have two options:

Update the rope_scaling dictionary:
Modify your rope_scaling["short_factor"] (and similarly the long_factor, if applicable) so that its length is 64, matching the computed expectation.

Adjust model parameters:
If the list of 48 elements is what you intend to use, then you’ll need to adjust your model’s configuration (for example, by changing hidden_size, num_attention_heads, or partial_rotary_factor) so that the computed value of rotary_ndims // 2 equals 48.

Review your model configuration settings and ensure that the dimensions in rope_scaling align with the derived value from your model parameters.
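
As a quick sanity check, here is a minimal sketch that compares the expected length against the actual list, reading values straight from a config.json (the file path is illustrative):

```python
import json

# Illustrative path -- point this at the model's actual config.json.
with open("config.json") as f:
    cfg = json.load(f)

factor = cfg.get("partial_rotary_factor", 1.0)
rotary_ndims = int(cfg["hidden_size"] // cfg["num_attention_heads"] * factor)
expected = rotary_ndims // 2
actual = len(cfg["rope_scaling"]["short_factor"])

# A mismatch here usually means the loading code is ignoring partial_rotary_factor.
print(f"expected {expected}, got {actual}")
```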

Is there a fix from Microsoft for that?

Microsoft org

Hi @legolasyiu .
Thanks for your interest!
Yes, support for the new model has already been added to the latest HF Transformers (v4.49.0) and vLLM (v0.7.3).

VLLM: https://github.com/vllm-project/vllm/pull/12718
HF: https://github.com/huggingface/transformers/pull/35947
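
For anyone still hitting the error, a minimal sketch to confirm the installed versions carry the fix (vLLM is optional):

```python
# Check that the installed libraries are new enough to include the fix.
import transformers
print("transformers", transformers.__version__)  # should be >= 4.49.0

try:
    import vllm
    print("vllm", vllm.__version__)  # should be >= 0.7.3
except ImportError:
    print("vllm not installed")
```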


Hi @ykim362, thanks for your reply! (I suggest updating the model card, which currently says it requires vllm>=0.7.2.)


Thanks. I am so glad you guys are fixing it.

Microsoft org

Thanks, @leflak .
Will update the model card to vllm v0.7.3.

nguyenbh changed discussion status to closed