# GGUF Quantized Models for TomoDG/EtherealAurora-MN-Nemo-12B

This repository contains GGUF format model files for TomoDG/EtherealAurora-MN-Nemo-12B.

These files were quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp).
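For reference, below is a minimal sketch of a typical llama.cpp conversion-and-quantization pipeline. The script name, binary name, and paths are illustrative assumptions based on recent llama.cpp releases; the exact commands used for this repository are not recorded here.

```python
# Sketch of a typical llama.cpp GGUF quantization pipeline.
# Assumes a local llama.cpp checkout; paths and tool names are
# illustrative, not the exact commands used for this repository.
import subprocess

# 1. Convert the original HF checkpoint to an FP16 GGUF file
#    (convert_hf_to_gguf.py ships with llama.cpp).
subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",
        "EtherealAurora-MN-Nemo-12B",   # local HF model directory (assumed)
        "--outtype", "f16",
        "--outfile", "EtherealAurora-MN-Nemo-12B-F16.gguf",
    ],
    check=True,
)

# 2. Quantize the FP16 file to each target type listed below.
for qtype in ["Q4_K_S", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]:
    subprocess.run(
        [
            "./llama-quantize",         # llama.cpp quantize binary
            "EtherealAurora-MN-Nemo-12B-F16.gguf",
            f"EtherealAurora-MN-Nemo-12B-{qtype}.gguf",
            qtype,
        ],
        check=True,
    )
```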

## Original Model Card

For details on the merge process, methodology, and intended use, please refer to the original model card: [TomoDG/EtherealAurora-MN-Nemo-12B](https://huggingface.co/TomoDG/EtherealAurora-MN-Nemo-12B).

## Available Quantizations

| File Name | Quantization Type | Size (approx.) | Recommended RAM | Use Case |
| --- | --- | --- | --- | --- |
| EtherealAurora-MN-Nemo-12B-Q4_K_S.gguf | Q4_K_S | ~6.95 GB | 9 GB+ | Smallest 4-bit K-quant, lower RAM usage |
| EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf | Q4_K_M | ~7.30 GB | 10 GB+ | Good quality/performance balance, medium RAM |
| EtherealAurora-MN-Nemo-12B-Q5_K_M.gguf | Q5_K_M | ~8.52 GB | 12 GB+ | Higher quality, higher RAM usage |
| EtherealAurora-MN-Nemo-12B-Q6_K.gguf | Q6_K | ~9.82 GB | 13 GB+ | Very high quality, close to FP16 |
| EtherealAurora-MN-Nemo-12B-Q8_0.gguf | Q8_0 | ~12.7 GB | 16 GB+ | Highest-quality GGUF quant, large size |

**General Recommendations:**

- `_K_M` quants (such as `Q4_K_M` and `Q5_K_M`): generally recommended for a good balance of quality and resource usage.
- `Q6_K`: quality close to FP16, if you have sufficient RAM.
- `Q8_0`: the highest-quality GGUF quantization here, but requires the most resources.
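Below is a minimal download-and-run sketch using `huggingface_hub` and `llama-cpp-python`. The quant choice, context size, GPU layer count, and prompt are illustrative assumptions; adjust them for your hardware.

```python
# Minimal sketch: download one quant from this repo and run it locally.
# Runtime settings below are assumptions, not recommended values.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="TomoDG/EtherealAurora-MN-Nemo-12B-GGUF",
    filename="EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf",
)

llm = Llama(
    model_path=model_path,
    n_ctx=4096,        # context window; raise if you have spare RAM
    n_gpu_layers=-1,   # offload all layers when a GPU backend is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a two-line poem about auroras."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

Smaller quants (`Q4_K_S`, `Q4_K_M`) trade some quality for lower memory use; pick the largest file whose recommended RAM fits your machine.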