# GGUF Quantized Models for TomoDG/EtherealAurora-MN-Nemo-12B

This repository contains GGUF format model files for TomoDG/EtherealAurora-MN-Nemo-12B.

These files were quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp).
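For reference, below is a minimal sketch of a typical llama.cpp conversion-and-quantization pipeline. The script name, binary name, and paths are illustrative assumptions based on recent llama.cpp releases; the exact commands used for this repository are not recorded here.

```python
# Sketch of a typical llama.cpp GGUF quantization pipeline.
# Assumes a local llama.cpp checkout; paths and tool names are
# illustrative, not the exact commands used for this repository.
import subprocess

# 1. Convert the original HF checkpoint to an FP16 GGUF file
#    (convert_hf_to_gguf.py ships with llama.cpp).
subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",
        "EtherealAurora-MN-Nemo-12B",   # local HF model directory (assumed)
        "--outtype", "f16",
        "--outfile", "EtherealAurora-MN-Nemo-12B-F16.gguf",
    ],
    check=True,
)

# 2. Quantize the FP16 file to each target type listed below.
for qtype in ["Q4_K_S", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]:
    subprocess.run(
        [
            "./llama-quantize",         # llama.cpp quantize binary
            "EtherealAurora-MN-Nemo-12B-F16.gguf",
            f"EtherealAurora-MN-Nemo-12B-{qtype}.gguf",
            qtype,
        ],
        check=True,
    )
```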

## Original Model Card

For details on the merge process, methodology, and intended use, please refer to the original model card: [TomoDG/EtherealAurora-MN-Nemo-12B](https://huggingface.co/TomoDG/EtherealAurora-MN-Nemo-12B).

## Available Quantizations

| File Name | Quantization Type | Size (approx.) | Recommended RAM | Use Case |
| --- | --- | --- | --- | --- |
| EtherealAurora-MN-Nemo-12B-Q4_K_S.gguf | Q4_K_S | ~6.95 GB | 9 GB+ | Smallest 4-bit K-quant, lower RAM usage |
| EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf | Q4_K_M | ~7.30 GB | 10 GB+ | Good quality/performance balance, medium RAM |
| EtherealAurora-MN-Nemo-12B-Q5_K_M.gguf | Q5_K_M | ~8.52 GB | 12 GB+ | Higher quality, higher RAM usage |
| EtherealAurora-MN-Nemo-12B-Q6_K.gguf | Q6_K | ~9.82 GB | 13 GB+ | Very high quality, close to FP16 |
| EtherealAurora-MN-Nemo-12B-Q8_0.gguf | Q8_0 | ~12.7 GB | 16 GB+ | Highest-quality GGUF quant, large size |

**General Recommendations:**

- `_K_M` quants (such as `Q4_K_M` and `Q5_K_M`): generally recommended for a good balance of quality and resource usage.
- `Q6_K`: quality close to FP16, if you have sufficient RAM.
- `Q8_0`: the highest-quality GGUF quantization here, but requires the most resources.
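Below is a minimal download-and-run sketch using `huggingface_hub` and `llama-cpp-python`. The quant choice, context size, GPU layer count, and prompt are illustrative assumptions; adjust them for your hardware.

```python
# Minimal sketch: download one quant from this repo and run it locally.
# Runtime settings below are assumptions, not recommended values.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="TomoDG/EtherealAurora-MN-Nemo-12B-GGUF",
    filename="EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf",
)

llm = Llama(
    model_path=model_path,
    n_ctx=4096,        # context window; raise if you have spare RAM
    n_gpu_layers=-1,   # offload all layers when a GPU backend is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a two-line poem about auroras."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

Smaller quants (`Q4_K_S`, `Q4_K_M`) trade some quality for lower memory use; pick the largest file whose recommended RAM fits your machine.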