LlamaLite-1B-Q8

Model Description

LlamaLite-1B-Q8 is an 8-bit quantized version of Meta's Llama 3.2-1B-Instruct model, optimized for efficient inference on edge devices and in resource-constrained environments. Quantization reduces the memory footprint to roughly half of the FP16 base model while preserving most of its accuracy.

Model Details

  • Base Model: Meta Llama 3.2-1B-Instruct
  • Quantization: 8-bit (GGUF format)
  • Parameters: 1.24B
  • Architecture: llama
  • File Size: 1.31 GB
  • Framework: llama.cpp
  • Optimized for: Offline use, low-power devices
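
The file size is consistent with the parameter count: assuming the common Q8_0 layout (blocks of 32 int8 weights plus one fp16 scale, about 8.5 bits per weight), 1.24B parameters work out to roughly 1.31 GB. A quick Python check of that arithmetic:

# Back-of-the-envelope size estimate for an 8-bit GGUF file.
# Assumes the common Q8_0 layout: 32 int8 weights plus one fp16
# scale per block, i.e. 34 bytes per 32 weights (8.5 bits/weight).
params = 1.24e9                # parameter count reported for this model
bytes_per_weight = 34 / 32
print(f"{params * bytes_per_weight / 1e9:.2f} GB")  # ~1.32 GB, close to the 1.31 GB file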

Usage

This model is suited to real-time and offline applications such as:

  • Offline AI assistants
  • Embedded systems
  • Edge AI devices
  • Low-latency inference

Example Usage with llama.cpp

# Run a single prompt with the llama.cpp CLI (named llama-cli in newer releases)
./main -m LlamaLite-1B-Q8.gguf -p "Tell me about quantum computing"
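
The same GGUF file can also be driven from Python. Below is a minimal sketch using the llama-cpp-python bindings (one of several GGUF-compatible runtimes; the filename is assumed to match the command above):

from llama_cpp import Llama  # pip install llama-cpp-python

# Load the quantized GGUF; n_ctx sets the context window.
llm = Llama(model_path="LlamaLite-1B-Q8.gguf", n_ctx=2048)

# create_chat_completion applies the chat template embedded in the
# GGUF metadata, which matters for an -Instruct model.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Tell me about quantum computing"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])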