About the rope_theta

#12
by HongyuZang - opened

I noticed that rope_theta is "10000" and sliding_window is "4096" for both qwen-7b and qwen-1.5b. However, the original rope_theta and sliding_window are "1000000.0" and "131072", respectively. Is there a specific reason rope_theta was changed here? Will this affect the performance of further fine-tuning or RL training?
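
To show why the difference might matter, here is a minimal sketch of the standard RoPE frequency formula, inv_freq_i = theta^(-2i/d), evaluated for the two rope_theta values. The helper names are made up for illustration, and the head dimension of 128 is an assumption on my part, not something read from these configs.

```python
import math

def rope_inv_freq(theta, head_dim):
    """Standard RoPE inverse frequencies: theta^(-2i/d) for i in [0, d/2)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]

def longest_wavelength(theta, head_dim):
    """Wavelength, in token positions, of the slowest-rotating dimension pair."""
    return 2 * math.pi / rope_inv_freq(theta, head_dim)[-1]

HEAD_DIM = 128  # assumed head dimension, for illustration only

for theta in (10000.0, 1000000.0):
    wavelength = longest_wavelength(theta, HEAD_DIM)
    print(f"rope_theta={theta}: longest wavelength ~ {wavelength:.0f} positions")
```

With the smaller rope_theta the slowest rotation completes a full period over far fewer positions, which is why I suspect it could interact with long-context fine-tuning.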
