R1-GRPO-Math-Python-Code-Experiments

0-hero's Collections

updated 2 days ago

LoRA and full fine-tuning experiments on R1 distills for generating Python code that solves math problems