---
license: apache-2.0
language: en
library_name: llama.cpp
tags:
- gguf
- quantized
- merge
- mergekit
- ties
- 12b
- text-generation
- etherealaurora
- mn-mag-mell
- nemomix
- chat
- roleplay
pipeline_tag: text-generation
base_model: TomoDG/EtherealAurora-MN-Nemo-12B
model_type: llama
---
# GGUF Quantized Models for TomoDG/EtherealAurora-MN-Nemo-12B
This repository contains GGUF format model files for [TomoDG/EtherealAurora-MN-Nemo-12B](https://huggingface.co/TomoDG/EtherealAurora-MN-Nemo-12B). The files were quantized using [llama.cpp](https://github.com/ggml-org/llama.cpp).
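To fetch a single quant file programmatically, here is a minimal sketch using the `huggingface_hub` Python client. The `repo_id` below is a placeholder, since this repository's id is not named in the card; substitute the actual id.

```python
# Minimal sketch: download one GGUF file with the huggingface_hub client.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="your-username/EtherealAurora-MN-Nemo-12B-GGUF",  # placeholder repo id
    filename="EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf",        # any file from the table below
)
print(path)  # local cache path of the downloaded file
```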
## Original Model Card
For details on the merge process, methodology, and intended use, please refer to the original model card: [TomoDG/EtherealAurora-MN-Nemo-12B](https://huggingface.co/TomoDG/EtherealAurora-MN-Nemo-12B)
## Available Quantizations
| File Name | Quantization Type | Size (Approx.) | Recommended RAM | Use Case |
|---|---|---|---|---|
| EtherealAurora-MN-Nemo-12B-Q4_K_S.gguf | Q4_K_S | ~6.95 GB | 9 GB+ | Smallest 4-bit K-quant, lower RAM usage |
| EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf | Q4_K_M | ~7.30 GB | 10 GB+ | Good balance of quality and performance, medium RAM |
| EtherealAurora-MN-Nemo-12B-Q5_K_M.gguf | Q5_K_M | ~8.52 GB | 12 GB+ | Higher quality, higher RAM usage |
| EtherealAurora-MN-Nemo-12B-Q6_K.gguf | Q6_K | ~9.82 GB | 13 GB+ | Very high quality, close to FP16 |
| EtherealAurora-MN-Nemo-12B-Q8_0.gguf | Q8_0 | ~12.7 GB | 16 GB+ | Highest-quality GGUF quant, large size |
**General Recommendations:**

- `_K_M` quants (such as Q4_K_M and Q5_K_M): generally recommended for a good balance of quality and resource usage.
- `Q6_K`: offers higher quality, closer to FP16, if you have sufficient RAM.
- `Q8_0`: highest-quality GGUF quantization, but requires the most resources.

A usage sketch follows below.
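## Usage Example

For scripting rather than the llama.cpp CLI, here is a minimal sketch using the `llama-cpp-python` bindings (a separate package, not part of this repository); the model path, context size, and sampling parameters are illustrative, so adjust them to your hardware.

```python
# Minimal sketch: run a downloaded quant locally via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf",  # any quant from the table above
    n_ctx=4096,       # context window; raise it if you have RAM to spare
    n_gpu_layers=-1,  # offload all layers to GPU if available; use 0 for CPU only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

Larger quants from the table trade RAM for quality; only the `model_path` needs to change to switch between them.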