---
license: apache-2.0 # Or the license of the original model
language: en
library_name: llama.cpp # Indicate it's primarily for the llama.cpp ecosystem
tags:
- gguf
- quantized
- merge
- mergekit
- ties
- 12b
- text-generation
- etherealaurora
- mn-mag-mell
- nemomix
- chat
- roleplay
pipeline_tag: text-generation
base_model: TomoDG/EtherealAurora-MN-Nemo-12B
model_type: llama
---

# GGUF Quantized Models for TomoDG/EtherealAurora-MN-Nemo-12B

This repository contains GGUF format model files for [TomoDG/EtherealAurora-MN-Nemo-12B](https://huggingface.co/TomoDG/EtherealAurora-MN-Nemo-12B).

These files were quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Original Model Card

For details on the merge process, methodology, and intended use, please refer to the original model card: [**TomoDG/EtherealAurora-MN-Nemo-12B**](https://huggingface.co/TomoDG/EtherealAurora-MN-Nemo-12B)

## Available Quantizations

| File Name | Quantization Type | Size (Approx.) | Recommended RAM | Use Case |
| :--- | :--- | :--- | :--- | :--- |
| `EtherealAurora-MN-Nemo-12B-Q4_K_S.gguf` | Q4_K_S | ~6.95 GB | 9 GB+ | Smallest 4-bit K-quant, lowest RAM usage |
| `EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf` | Q4_K_M | ~7.30 GB | 10 GB+ | Good balance of quality and performance, medium RAM |
| `EtherealAurora-MN-Nemo-12B-Q5_K_M.gguf` | Q5_K_M | ~8.52 GB | 12 GB+ | Higher quality, higher RAM usage |
| `EtherealAurora-MN-Nemo-12B-Q6_K.gguf` | Q6_K | ~9.82 GB | 13 GB+ | Very high quality, close to FP16 |
| `EtherealAurora-MN-Nemo-12B-Q8_0.gguf` | Q8_0 | ~12.7 GB | 16 GB+ | Highest-quality GGUF quant, largest size |

**General Recommendations:**

* **`_K_M` quants (like Q4_K_M, Q5_K_M):** Generally recommended for a good balance of quality and resource usage.
* **`Q6_K`:** Offers higher quality, closer to FP16, if you have sufficient RAM.
* **`Q8_0`:** Highest-quality GGUF quantization, but requires the most resources.
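
## Example Usage

A minimal sketch of downloading and running one of these quants with the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) bindings. This is not part of the original release: the `repo_id` below is an assumption (substitute this repository's actual id), and the prompt, context size, and GPU settings are illustrative only.

```python
# Minimal sketch: download one quant from this repo and run a short chat
# completion via llama-cpp-python (pip install huggingface_hub llama-cpp-python).
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch the recommended Q4_K_M quant; adjust the filename for other quants
# from the table above. repo_id is assumed -- replace with this repo's id.
model_path = hf_hub_download(
    repo_id="TomoDG/EtherealAurora-MN-Nemo-12B-GGUF",
    filename="EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf",
)

# n_gpu_layers=-1 offloads all layers to the GPU if llama.cpp was built with
# GPU support; set it to 0 for CPU-only inference.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in two sentences."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

The same files also work with any other llama.cpp-based frontend; point it at the downloaded `.gguf` path and choose a quant that fits the RAM guidance in the table above.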