google/gemma-3-12b-it-qat-q4_0-unquantized日本語が多く含まれるimatrixを使って量子化したモデルです
This is a model that quantizes google/gemma-3-12b-it-qat-q4_0-unquantized using an imatrix that contains a lot of Japanese..
https://huggingface.co/dahara1/imatrix-jpn-test).

最新のllama.cppを使って動かしてください。
Please use the latest llama.cpp.

投機的デコーディングで速度を向上させる使い方の例
Example of Speculative Decoding for speed up.

1Bモデルには視覚機能は含まれていません
There is no vision ability in 1B model.

./llama-server -m ./gemma-3-12b-it-qat-q4_0-japanese-imatrix-Q4_0.gguf -md ./gemma-3-1b-it-qat-q4_0-japanese-imatrix-Q4_K-f16.gguf -e -ngld 99 -ngl 99
Downloads last month
1,779
GGUF
Model size
1,000M params
Architecture
gemma3
Hardware compatibility
Log In to view the estimation

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support