# LlamaLite-1B-Q8

## Model Description
LlamaLite-1B-Q8 is an 8-bit quantized version of Meta's Llama 3.2-1B-Instruct model, optimized for efficient inference on edge devices and other resource-constrained environments. Quantization roughly halves the memory footprint relative to FP16 weights while retaining most of the base model's accuracy.
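The memory saving follows from simple arithmetic. An illustrative sketch (the parameter count is rounded to 1 billion, and GGUF files also carry metadata and per-block scale factors, which is why the actual file is 1.31 GB rather than exactly 1 GB):

```python
# Approximate weight-storage cost of a 1B-parameter model at different precisions.
PARAMS = 1_000_000_000  # rounded parameter count for a "1B" model

def weight_gb(bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (decimal GB)."""
    return PARAMS * bits_per_weight / 8 / 1e9

fp16_gb = weight_gb(16)  # 16-bit floats: ~2.0 GB
q8_gb = weight_gb(8)     # 8-bit quantized: ~1.0 GB
print(f"fp16: {fp16_gb:.1f} GB, 8-bit: {q8_gb:.1f} GB")
```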
## Model Details
- Base Model: Meta Llama 3.2-1B-Instruct
- Quantization: 8-bit (GGUF format)
- Size: 1.31 GB
- Framework: llama.cpp
- Optimized for: Offline use, low-power devices
## Usage
This model is suitable for real-time and offline applications such as:
- Offline AI assistants
- Embedded systems
- Edge AI devices
- Low-latency inference
### Example Usage with llama.cpp

```shell
./main -m LlamaLite-1B-Q8.gguf -p "Tell me about quantum computing"
```
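For longer or interactive sessions, llama.cpp's CLI accepts additional context and sampling flags. A hedged sketch (flag names as in recent llama.cpp builds, where the binary has also been renamed `llama-cli`; exact defaults vary by version):

```shell
# -c: context window in tokens; -n: max tokens to generate; --temp: sampling temperature.
./main -m LlamaLite-1B-Q8.gguf -c 2048 -n 256 --temp 0.7 \
  -p "Tell me about quantum computing"
```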