Shuttle 3.5 β€” Q8_0 GGUF Quant

This repo contains a GGUF quantized version of ShuttleAI's Shuttle 3.5 model, a high-performance instruction-tuned variant of Qwen 3 32B. This quant was built for efficient local inference without sacrificing quality.

πŸ”— Base Model

  • Original: shuttleai/shuttle-3.5
  • Parent architecture: Qwen 3 32B
  • Quantized by: Lex-au
  • Quantization format: GGUF Q8_0

πŸ“¦ Model Size

| Format                      | Size     |
|-----------------------------|----------|
| Original (safetensors, F16) | 65.52 GB |
| Q8_0 (GGUF)                 | 34.8 GB  |

Size Reduction: ~47% (30.72 GB saved)
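These figures follow from the Q8_0 block layout: each block of 32 weights is stored as 32 int8 values plus one 16-bit scale, i.e. 8.5 bits per weight versus 16 for F16. A quick back-of-envelope check (actual files also carry metadata and a few tensors kept at higher precision, so real sizes differ slightly):

```python
# Back-of-envelope size check for the table above.
# Q8_0: (32*8 + 16) / 32 = 8.5 bits per weight.
PARAMS = 32.8e9  # parameter count reported for this model

f16_bytes = PARAMS * 2  # 16 bits per weight
q8_0_bytes = PARAMS * 8.5 / 8  # 8.5 bits per weight

print(f"F16:  {f16_bytes / 1e9:.2f} GB")  # ~65.60 GB
print(f"Q8_0: {q8_0_bytes / 1e9:.2f} GB")  # ~34.85 GB
print(f"Reduction: {1 - q8_0_bytes / f16_bytes:.1%}")  # ~46.9%
```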

πŸ§ͺ Quality

  • Q8_0 is near-lossless, preserving almost all performance of the full-precision model.
  • Ideal for high-quality inference on capable consumer hardware.

πŸš€ Usage

Compatible with all major GGUF-supporting runtimes, including:

  • llama.cpp
  • KoboldCPP
  • text-generation-webui
  • llamafile
  • LM Studio

Example with llama.cpp:

```shell
./main -m shuttle-3.5.Q8_0.gguf --ctx-size 4096 --threads 16 --prompt "Describe the effects of quantum decoherence in plain English."
```
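Before raising `--ctx-size`, it helps to estimate the extra KV-cache memory on top of the 34.8 GB of weights. A rough sketch, assuming the commonly reported Qwen3-32B configuration (64 layers, 8 grouped-query KV heads, head dimension 128 — verify these against the base model's config) and an f16 cache:

```python
# Rough KV-cache memory estimate for choosing --ctx-size.
# Assumed Qwen3-32B config values (verify against the model card):
N_LAYERS = 64    # transformer layers
N_KV_HEADS = 8   # GQA key/value heads
HEAD_DIM = 128   # dimension per head
KV_BYTES = 2     # f16 cache, 2 bytes per value

def kv_cache_gib(ctx_len: int) -> float:
    """GiB of KV cache for one sequence of ctx_len tokens (K and V)."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES
    return ctx_len * per_token / 2**30

print(kv_cache_gib(4096))  # 1.0 (GiB) at the example's context size
```

Under these assumptions, the 4096-token context in the example above costs about 1 GiB of cache; quadruple the context and the cache quadruples too.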
πŸ“Š Model Details

  • Parameters: 32.8B
  • Architecture: qwen3
  • Precision: 8-bit (Q8_0)