
Gemma 3 12B V.2 Fornax

This model is an experiment in producing a strong, smaller thinking model with generalizable reasoning that fits on an 8GiB consumer graphics card. Most other open-source thinking models, especially smaller ones, fail to generalize their reasoning beyond coding and math because of an overly heavy focus on GRPO-zero-style CoT training, which is only applicable to coding and math.

Instead of using GRPO, this model SFTs a wide variety of high-quality, diverse reasoning traces from DeepSeek R1 onto Gemma 3, forcing the model to learn to generalize its reasoning capabilities across a large number of tasks, extending the LiMO paper's approach to math/coding CoT. A subset of DeepSeek V3 0324 non-thinking data was also included to improve creativity and to let the model retain its non-thinking capabilities.

Training from the QAT checkpoint allows this model to be used without a drop in quality at Q4_0, requiring only ~6GiB of memory.
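
As an illustration, a Q4_0 GGUF quantization can be run with llama-cpp-python on an 8GiB card. This is a minimal sketch; the GGUF filename below is hypothetical and depends on which quantization you download:

```python
from llama_cpp import Llama

# Hypothetical filename: substitute the actual Q4_0 GGUF you downloaded.
llm = Llama(
    model_path="Gemma-3-12B-FornaxV.2-QAT-CoT-Q4_0.gguf",
    n_ctx=8192,       # context window; adjust to fit your VRAM
    n_gpu_layers=-1,  # offload all layers; ~6GiB of weights fits an 8GiB card
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```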

Thinking Mode

Similar to the Qwen 3 model line, Gemma Fornax can be used with or without thinking mode enabled.

  • To enable thinking mode, place /think in the system prompt and prefill <think>\n in the model's response (see the sketch after this list).

  • To disable thinking, put /no_think in the system prompt.
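
Here is a minimal sketch of thinking-mode usage with transformers, assuming a version with Gemma 3 support; the example question is illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ConicCat/Gemma-3-12B-FornaxV.2-QAT-CoT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# /think in the system prompt enables thinking mode.
messages = [
    {"role": "system", "content": "/think"},
    {"role": "user", "content": "What is 17 * 24?"},
]

# Build the prompt, then prefill <think>\n so the model begins by reasoning.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
prompt += "<think>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:]))
```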

Settings

I recommend using the included sampler and template JSON configs for SillyTavern, as the defaults do not play well with Gemma 3 due to formatting issues.

Special Thanks:

Google for open-sourcing the excellent Gemma 3 model line.

Undi95 for portions of their dataset and inspiration.

PJMixers-Dev for their dataset curation and creation efforts.

GeneralReasoning for their dataset.
