luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v5-Train-NoKL-Marg-NormAdv • Text Generation • Updated 18 days ago • 4
luckeciano/Qwen-2.5-7B-RL-LACPO-NoBaselineNoKLNoEntropyNoSmooth • Text Generation • Updated 8 days ago • 1
luckeciano/Qwen-2.5-7B-RL-LACPO-NoBaselineNoKLNoEntropy0.5NoSmooth • Text Generation • Updated 8 days ago • 1
luckeciano/Qwen-2.5-7B-RL-LACPO-NoBaselineNoKLNoEntropy0.5Smooth10 • Text Generation • Updated 7 days ago
luckeciano/Qwen-2.5-7B-RL-LACPO-BaselineNoKLNoEntropy0.1Smooth10 • Text Generation • Updated 6 days ago • 1