InternLM2-Chat NF4 Quant

Usage

As of 2024/1/17, Transformers must be installed from source and bitsandbytes >=0.42.0 is required in order to load serialized 4-bit quants.

pip install -U git+https://github.com/huggingface/transformers bitsandbytes
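A quick check that the environment meets these requirements (a minimal sketch; exact version strings will vary):

import transformers, bitsandbytes

print("transformers:", transformers.__version__)   # should report a source/dev build
print("bitsandbytes:", bitsandbytes.__version__)   # must be >= 0.42.0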

Quantization config

import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
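
For reference, a sketch of how this config would be applied to quantize and serialize the base model (the base model id "internlm/internlm2-chat-20b" is an assumption, not confirmed by this card; substitute the full-precision checkpoint this quant was built from):

from transformers import AutoModelForCausalLM

# "internlm/internlm2-chat-20b" is an assumed base model id.
model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm2-chat-20b",
    quantization_config=quantization_config,  # the config defined above
    trust_remote_code=True,  # InternLM2 ships custom modeling code
    device_map="auto",
)
model.save_pretrained("internlm2-chat-nf4")  # writes the serialized 4-bit weights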

The config above is only needed when quantizing the original model; it is not necessary for inference. To run the serialized quant, just load the model without specifying any quantization/load_in_*bit options.
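A minimal loading sketch for inference (the repo id "your-namespace/internlm2-chat-nf4" is a placeholder for this model's actual id; trust_remote_code=True is assumed to be needed for InternLM2's custom modeling code):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-namespace/internlm2-chat-nf4"  # placeholder; replace with this repo's id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# No quantization_config or load_in_*bit flags: the serialized NF4 weights are
# picked up from the quantization metadata stored in the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",
)

# Quick smoke test
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))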

Model Details

Model size: 10.8B params
Tensor types: F32, FP16, U8
Format: Safetensors