About

Bitsandbytes 4-bit quantized version of https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7
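
A minimal loading sketch with Transformers, assuming `bitsandbytes` and `accelerate` are installed; the 4-bit quantization config is stored in the checkpoint, so no extra `BitsAndBytesConfig` should be needed:

```python
# pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "3WaD/Lamarck-14B-v0.7-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The bnb 4-bit quantization_config ships inside the checkpoint,
# so from_pretrained restores the quantized weights directly.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Summarize 4-bit quantization in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```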

Ideal for faster and cheaper GPU inference, e.g. with vLLM.
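
A serving sketch using vLLM's offline API; the exact flags vary with the vLLM version (recent releases can auto-detect the pre-quantized bitsandbytes checkpoint), so treat this as an assumption-laden example rather than the canonical invocation:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="3WaD/Lamarck-14B-v0.7-bnb-4bit",
    quantization="bitsandbytes",  # may be auto-detected on recent vLLM versions
    max_model_len=32768,          # matches the context length used for the stats below
)

params = SamplingParams(temperature=0.7, max_tokens=128)
out = llm.generate(["Explain 4-bit quantization in one paragraph."], params)
print(out[0].outputs[0].text)
```

The server-mode equivalent is roughly `vllm serve 3WaD/Lamarck-14B-v0.7-bnb-4bit --quantization bitsandbytes`, again depending on the installed vLLM version.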

Stats from running on an RTX 4090:

- model weights: 9.35 GiB
- non_torch_memory: 0.08 GiB
- PyTorch activation peak memory: 4.50 GiB
- remaining memory reserved for KV cache: 9.24 GiB
- maximum concurrency for 32768 tokens per request: 1.54x
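
As a rough sanity check on the 1.54x figure, a back-of-the-envelope sketch, assuming Lamarck-14B keeps the Qwen2.5-14B geometry (48 layers, 8 KV heads, head dim 128) and a bf16 KV cache; these architecture numbers are an assumption, not read from this repository:

```python
# Approximate KV-cache budget per 32768-token request
# (assumed Qwen2.5-14B geometry, bf16 cache).
layers, kv_heads, head_dim, bytes_per_elem = 48, 8, 128, 2
tokens = 32768

kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # 2x for K and V
kv_gib_per_request = kv_bytes_per_token * tokens / 2**30                # ~6.0 GiB

print(round(9.24 / kv_gib_per_request, 2))  # ~1.54 concurrent 32k-token requests
```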

Safetensors model size: 8.37B params (tensor types F32, BF16, U8)
