Commit fd65b72 (verified) by Jaward · 1 Parent(s): f066e86

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -4,7 +4,7 @@ tags: []
 ---
 
 # SmolLM2-360M-Instruct-Reasoner
-This is an experimental reasoning version of [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct). It was trained with a custom grpo trainer that scales for models with <=500M params, supports both cpu and gpu (with vllm support). So far the final model tends to perform well on most reasoning problems with responses in desired format, although there is still room for improvements. Feel free to send a PR on the repo.
+This is an experimental reasoning version of [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct). It was trained with a custom grpo trainer that scales for models with <=500M params, supports both cpu and gpu (with vllm + flash attention support). So far the final model tends to perform well on most reasoning problems with responses in desired format, although there is still room for improvements. Feel free to send a PR on the repo.
 
 CODE: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb
 
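
The README names a custom GRPO trainer but does not describe its internals. As a rough illustration only, the sketch below shows the group-relative advantage step that GRPO is generally built around: several completions are sampled for one prompt, each gets a scalar reward, and each reward is normalized against its own group. This is not the notebook's code. The function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the sample rewards are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: `eps` and the population standard deviation are
    assumptions for this sketch, not values taken from the linked notebook.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions that beat the group mean get a positive advantage,
    # those below it a negative one. eps guards against a zero spread.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled answers to one prompt,
# e.g. 1.0 for a correct, well-formatted answer and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

When every completion in a group receives the same reward, all advantages are zero, so that group contributes no learning signal in this formulation.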