---
license: apache-2.0
language: en
library_name: llama.cpp
tags:
  - gguf
  - quantized
  - merge
  - mergekit
  - ties
  - 12b
  - text-generation
  - etherealaurora
  - mn-mag-mell
  - nemomix
  - chat
  - roleplay
pipeline_tag: text-generation
base_model: TomoDG/EtherealAurora-MN-Nemo-12B
model_type: llama
---

# GGUF Quantized Models for TomoDG/EtherealAurora-MN-Nemo-12B

This repository contains GGUF-format model files for [TomoDG/EtherealAurora-MN-Nemo-12B](https://huggingface.co/TomoDG/EtherealAurora-MN-Nemo-12B).

These files were quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp).
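
As a quick sanity check, here is a minimal sketch of running one of these files locally with the `llama-cpp-python` bindings (an assumption: `pip install llama-cpp-python`; the filename is the Q4_K_M quant from the table below and must already be downloaded):

```python
from llama_cpp import Llama

# Minimal sketch using llama-cpp-python. Assumes the Q4_K_M file from
# the table below has already been downloaded to the working directory.
llm = Llama(
    model_path="EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf",
    n_ctx=4096,       # context window; raise it if you have RAM to spare
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a short greeting."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```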

## Original Model Card

For details on the merge process, methodology, and intended use, please refer to the original model card: [TomoDG/EtherealAurora-MN-Nemo-12B](https://huggingface.co/TomoDG/EtherealAurora-MN-Nemo-12B)

## Available Quantizations

| File Name | Quantization Type | Size (Approx.) | Recommended RAM | Use Case |
|---|---|---|---|---|
| `EtherealAurora-MN-Nemo-12B-Q4_K_S.gguf` | Q4_K_S | ~6.95 GB | 9 GB+ | Smallest 4-bit K-quant, lower RAM usage |
| `EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf` | Q4_K_M | ~7.30 GB | 10 GB+ | Good balance of quality and performance, medium RAM |
| `EtherealAurora-MN-Nemo-12B-Q5_K_M.gguf` | Q5_K_M | ~8.52 GB | 12 GB+ | Higher quality, higher RAM usage |
| `EtherealAurora-MN-Nemo-12B-Q6_K.gguf` | Q6_K | ~9.82 GB | 13 GB+ | Very high quality, close to FP16 |
| `EtherealAurora-MN-Nemo-12B-Q8_0.gguf` | Q8_0 | ~12.7 GB | 16 GB+ | Highest-quality GGUF quant, largest size |
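
To fetch a single file from the table without cloning the whole repository, you can use `huggingface_hub` directly. A minimal sketch follows; note that the `repo_id` below is an assumption, so check the actual repository name on the Hub:

```python
from huggingface_hub import hf_hub_download

# Download one quant file from the Hub and print its local cache path.
# The repo_id is an assumption (this GGUF repository's actual id may
# differ); the filename comes from the table above.
path = hf_hub_download(
    repo_id="TomoDG/EtherealAurora-MN-Nemo-12B-GGUF",  # hypothetical id
    filename="EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf",
)
print(path)
```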

**General Recommendations:**

- **`_K_M` quants** (like `Q4_K_M`, `Q5_K_M`): Generally recommended for a good balance of quality and resource usage.
- **`Q6_K`**: Offers higher quality, closer to FP16, if you have sufficient RAM.
- **`Q8_0`**: Highest-quality GGUF quantization, but requires the most resources.
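
If you are unsure which file fits your machine, a small sketch like the following (using `psutil`, an extra dependency, with the RAM figures taken from the table above) picks the largest quant whose recommended RAM fits in total system memory:

```python
import psutil

# Hypothetical helper: choose the largest quant whose recommended RAM
# (from the table above) fits within total system memory.
QUANTS = [  # (filename, recommended RAM in GiB), smallest to largest
    ("EtherealAurora-MN-Nemo-12B-Q4_K_S.gguf", 9),
    ("EtherealAurora-MN-Nemo-12B-Q4_K_M.gguf", 10),
    ("EtherealAurora-MN-Nemo-12B-Q5_K_M.gguf", 12),
    ("EtherealAurora-MN-Nemo-12B-Q6_K.gguf", 13),
    ("EtherealAurora-MN-Nemo-12B-Q8_0.gguf", 16),
]

total_gib = psutil.virtual_memory().total / 2**30
fitting = [name for name, ram in QUANTS if total_gib >= ram]
print(fitting[-1] if fitting else "No listed quant fits in system RAM")
```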