About the rope_theta
#12
by
HongyuZang
- opened
I notice that for both qwen-7b and qwen-1.5b, rope_theta is "10000" and sliding_window is "4096". However, the original rope_theta and sliding_window are "1000000.0" and "131072", respectively. Is there a specific reason rope_theta was changed here? Will this affect the performance of further fine-tuning or RL training?
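To make the difference concrete, here is a rough sketch of how rope_theta sets the RoPE rotation wavelengths, using the standard formulation where frequency pair i has inverse frequency theta^(-2i/d). The helper name `rope_wavelengths` and the head dimension of 128 are assumptions for illustration only, not values read from these checkpoints.

```python
import math

def rope_wavelengths(theta: float, head_dim: int = 128) -> list[float]:
    """Wavelength, in tokens, of each RoPE frequency pair.

    Pair i rotates with inverse frequency theta ** (-2 * i / head_dim),
    so its wavelength is 2 * pi * theta ** (2 * i / head_dim).
    head_dim=128 is an assumed value, not taken from the model config.
    """
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]

# Compare the two rope_theta values mentioned above.
for theta in (10000.0, 1000000.0):
    longest = rope_wavelengths(theta)[-1]
    print(f"rope_theta={theta:g}: longest wavelength ~ {longest:,.0f} tokens")
```

A smaller rope_theta gives shorter wavelengths in the slowest-rotating dimensions, which is why the value is usually tied to the context length a model is expected to handle.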