
Gemma 3 12B V.2 Fornax

This model is an experiment in producing a strong, smaller thinking model with generalizable reasoning that fits on an 8GiB consumer graphics card. Most other open-source thinking models, especially smaller ones, fail to generalize their reasoning beyond coding and math because of an overly heavy focus on GRPO-zero-style CoT training, which is only applicable to coding and math.

Instead of using GRPO, this model SFTs a wide variety of high-quality, diverse reasoning traces from DeepSeek R1 onto Gemma 3, forcing the model to learn to generalize its reasoning capabilities across a large number of tasks, extending the LiMO paper's approach to math/coding CoT. A subset of DeepSeek V3 0324 non-thinking data was also included to improve creativity and to let the model retain its non-thinking capabilities.

Training from the QAT checkpoint allows this model to be used without a drop in quality at Q4_0, requiring only ~6GiB of memory.
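
As an illustration, a Q4_0 GGUF quantization can be run with llama-cpp-python on an 8GiB card. This is a minimal sketch; the GGUF filename below is hypothetical and depends on which quantization you download:

```python
from llama_cpp import Llama

# Hypothetical filename: substitute the actual Q4_0 GGUF you downloaded.
llm = Llama(
    model_path="Gemma-3-12B-FornaxV.2-QAT-CoT-Q4_0.gguf",
    n_ctx=8192,       # context window; adjust to fit your VRAM
    n_gpu_layers=-1,  # offload all layers; ~6GiB of weights fits an 8GiB card
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```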

Thinking Mode

Similar to the Qwen 3 model line, Gemma Fornax can be used with or without thinking mode enabled.

  • To enable thinking mode, place /think in the system prompt and prefill <think>\n in the model's response (see the sketch after this list).

  • To disable thinking, put /no_think in the system prompt.
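
Here is a minimal sketch of thinking-mode usage with transformers, assuming a version with Gemma 3 support; the example question is illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ConicCat/Gemma-3-12B-FornaxV.2-QAT-CoT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# /think in the system prompt enables thinking mode.
messages = [
    {"role": "system", "content": "/think"},
    {"role": "user", "content": "What is 17 * 24?"},
]

# Build the prompt, then prefill <think>\n so the model begins by reasoning.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
prompt += "<think>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:]))
```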

Settings

I recommend using the included sampler and template JSON configs for SillyTavern, as the defaults do not play well with Gemma 3 due to formatting issues.

Special Thanks:

Google for open-sourcing the excellent Gemma 3 model line.

Undi95 for portions of their dataset and inspiration.

PJMixers-Dev for their dataset curation and creation efforts.

GeneralReasoning for their dataset.
