---
license: apache-2.0
language: en
library_name: llama.cpp
tags:
  - gguf
  - quantized
  - merge
  - mergekit
  - ties
  - 12b
  - text-generation
  - etherealaurora
  - mn-mag-mell
  - nemomix
  - chat
  - roleplay
pipeline_tag: text-generation
base_model: TomoDG/EtherealAurora-MN-Nemo-12B
model_type: llama
---

# GGUF Quantized Models for TomoDG/EtherealAurora-MN-Nemo-12B

This repository contains GGUF-format model files for [TomoDG/EtherealAurora-MN-Nemo-12B](https://huggingface.co/TomoDG/EtherealAurora-MN-Nemo-12B).

These files were quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp).
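
As a quick sanity check, here is a minimal sketch of running one of these files locally with the `llama-cpp-python` bindings (an assumption: `pip install llama-cpp-python`; the filename is the Q4_K_M quant from the table below and must already be downloaded):

```python
from llama_cpp import Llama

# Minimal sketch using llama-cpp-python. Assumes the Q4_K_M file from
# the table below has already been downloaded to the working directory.
llm = Llama(
    model_path="EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf",
    n_ctx=4096,       # context window; raise it if you have RAM to spare
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a short greeting."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```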

## Original Model Card

For details on the merge process, methodology, and intended use, please refer to the original model card: [TomoDG/EtherealAurora-MN-Nemo-12B](https://huggingface.co/TomoDG/EtherealAurora-MN-Nemo-12B)

## Available Quantizations

| File Name | Quantization Type | Size (Approx.) | Recommended RAM | Use Case |
|---|---|---|---|---|
| `EtherealAurora-MN-Nemo-12B-Q4_K_S.gguf` | Q4_K_S | ~6.95 GB | 9 GB+ | Smallest 4-bit K-quant, lower RAM usage |
| `EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf` | Q4_K_M | ~7.30 GB | 10 GB+ | Good balance of quality and performance, medium RAM |
| `EtherealAurora-MN-Nemo-12B-Q5_K_M.gguf` | Q5_K_M | ~8.52 GB | 12 GB+ | Higher quality, higher RAM usage |
| `EtherealAurora-MN-Nemo-12B-Q6_K.gguf` | Q6_K | ~9.82 GB | 13 GB+ | Very high quality, close to FP16 |
| `EtherealAurora-MN-Nemo-12B-Q8_0.gguf` | Q8_0 | ~12.7 GB | 16 GB+ | Highest-quality GGUF quant, largest size |
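
To fetch a single file from the table without cloning the whole repository, you can use `huggingface_hub` directly. A minimal sketch follows; note that the `repo_id` below is an assumption, so check the actual repository name on the Hub:

```python
from huggingface_hub import hf_hub_download

# Download one quant file from the Hub and print its local cache path.
# The repo_id is an assumption (this GGUF repository's actual id may
# differ); the filename comes from the table above.
path = hf_hub_download(
    repo_id="TomoDG/EtherealAurora-MN-Nemo-12B-GGUF",  # hypothetical id
    filename="EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf",
)
print(path)
```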

**General Recommendations:**

- **`_K_M` quants** (like `Q4_K_M`, `Q5_K_M`): Generally recommended for a good balance of quality and resource usage.
- **`Q6_K`**: Offers higher quality, closer to FP16, if you have sufficient RAM.
- **`Q8_0`**: Highest-quality GGUF quantization, but requires the most resources.
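
If you are unsure which file fits your machine, a small sketch like the following (using `psutil`, an extra dependency, with the RAM figures taken from the table above) picks the largest quant whose recommended RAM fits in total system memory:

```python
import psutil

# Hypothetical helper: choose the largest quant whose recommended RAM
# (from the table above) fits within total system memory.
QUANTS = [  # (filename, recommended RAM in GiB), smallest to largest
    ("EtherealAurora-MN-Nemo-12B-Q4_K_S.gguf", 9),
    ("EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf", 10),
    ("EtherealAurora-MN-Nemo-12B-Q5_K_M.gguf", 12),
    ("EtherealAurora-MN-Nemo-12B-Q6_K.gguf", 13),
    ("EtherealAurora-MN-Nemo-12B-Q8_0.gguf", 16),
]

total_gib = psutil.virtual_memory().total / 2**30
fitting = [name for name, ram in QUANTS if total_gib >= ram]
print(fitting[-1] if fitting else "No listed quant fits in system RAM")
```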