Jaward/smollm2_360m_grpo_gsm8k_reasoner

Jaward committed 16 days ago

Commit fd65b72 · verified · 1 Parent(s): f066e86

Update README.md

Files changed (1)

README.md +1 -1


@@ -4,7 +4,7 @@ tags: []
 ---
 # SmolLM2-360M-Instruct-Reasoner
-This is an experimental reasoning version of [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct). It was trained with a custom grpo trainer that scales for models with <=500M params, supports both cpu and gpu (with vllm support). So far the final model tends to perform well on most reasoning problems with responses in desired format, although there is still room for improvements. Feel free to send a PR on the repo.
 CODE: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb

 ---
 # SmolLM2-360M-Instruct-Reasoner
+This is an experimental reasoning version of [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct). It was trained with a custom grpo trainer that scales for models with <=500M params, supports both cpu and gpu (with vllm + flash attention support). So far the final model tends to perform well on most reasoning problems with responses in desired format, although there is still room for improvements. Feel free to send a PR on the repo.
 CODE: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb