Update README.md
README.md
CHANGED
@@ -4,7 +4,7 @@ tags: []
 ---
 
 # SmolLM2-360M-Instruct-Reasoner
-This is an experimental reasoning version of [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct). It was trained with a custom grpo trainer that scales for models with <=500M params, supports both cpu and gpu (with vllm support). So far the final model tends to perform well on most reasoning problems with responses in desired format, although there is still room for improvements. Feel free to send a PR on the repo.
+This is an experimental reasoning version of [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct). It was trained with a custom grpo trainer that scales for models with <=500M params, supports both cpu and gpu (with vllm + flash attention support). So far the final model tends to perform well on most reasoning problems with responses in desired format, although there is still room for improvements. Feel free to send a PR on the repo.
 
 CODE: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb
 