0.5B Model
#9, opened by chrisvnz
Great work! Do you think a 0.5B model would also produce reasonable results?
Unsure! Happy to collaborate to see whether an even smaller model can truly learn reasoning. It would most likely require distillation from DeepSeek-R1 first ;)