
sarthak247/qwen2.5-grpo-gsm8k-250steps-fp16
Text Generation
Trained with Unsloth on GSM8K to add reasoning abilities to Qwen2.5-3B. Because of resource constraints, training ran for only 250 steps, and the smaller 3B model was chosen for the same reason.
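The model name indicates GRPO training, which needs a reward signal for each sampled completion. The reward actually used for this model is not documented here, so the following is only a minimal sketch of one common choice: a correctness reward that compares the last number in a completion with the GSM8K reference answer (the text after `####` in each GSM8K solution). The function names `gsm8k_reference_answer`, `last_number`, and `correctness_reward` are hypothetical and not part of Unsloth or any other library.

```python
import re
from typing import Optional


def gsm8k_reference_answer(solution: str) -> str:
    """Return the final answer that follows '####' in a GSM8K solution string."""
    return solution.split("####")[-1].strip().replace(",", "")


def last_number(text: str) -> Optional[str]:
    """Return the last number found in text (thousands commas removed), or None."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return matches[-1].replace(",", "") if matches else None


def correctness_reward(completion: str, solution: str) -> float:
    """Hypothetical reward: 1.0 if the completion's last number equals the reference, else 0.0."""
    predicted = last_number(completion)
    if predicted is None:
        return 0.0
    try:
        reference = float(gsm8k_reference_answer(solution))
        return 1.0 if float(predicted) == reference else 0.0
    except ValueError:
        # Assumption: any unparsable reference or prediction counts as incorrect.
        return 0.0
```

A binary reward like this is simple to check but gives no partial credit for correct intermediate steps; real setups often combine it with a separate format reward.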