I can't run any of the dynamic bnb-4bit quants with TextGenerationInference

#6
by v3ss0n - opened

Here are the options I used:

"--quantize bitsandbytes-fp4 --max-input-tokens 30000 --sharded true --num-shard 2"

Docker compose file:

  text-generation-inference:
    image: ghcr.io/huggingface/text-generation-inference:3.1.0
    environment:
      - MODEL_ID=unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit
    ports:
      - "0.0.0.0:8099:80"
    restart: "unless-stopped"
    command: "--quantize bitsandbytes-fp4 --max-input-tokens 30000 --sharded true --num-shard 2"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0', '1']
              capabilities: [gpu]
    shm_size: '90g'
    volumes:
      - ~/.hf-docker-data:/data
    networks:
      - llmhost

Error :

text-generation-inference-1  | [rank1]: AssertionError: The choosen size 1 is not compatible with sharding on 2 shards rank=1
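A guess at what is going on (my reading of the failure, not confirmed on either side): TGI's tensor-parallel loader slices each weight along one dimension across the shards and asserts that the dimension is divisible by the shard count. A pre-quantized dynamic checkpoint can contain tensors with packed or scalar shapes, which would explain the "size 1" in the message. A hypothetical sketch of such a check (names are illustrative, not TGI's actual code):

```python
# Hypothetical sketch of the divisibility check behind the error above
# (function name and message wiring are illustrative, not TGI's code).
def shard_size(size: int, world_size: int) -> int:
    """Return the per-shard slice length for a dimension of `size`."""
    assert size % world_size == 0, (
        f"The choosen size {size} is not compatible "
        f"with sharding on {world_size} shards"
    )
    return size // world_size

# A normal weight dimension splits cleanly across 2 shards.
print(shard_size(4096, 2))  # -> 2048
```

A tensor whose split dimension is 1 (e.g. a packed quantization scale) trips the assertion with `world_size=2`, producing a message of exactly the shape above.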

I also opened an issue on the TGI side; I'm not sure which side has the problem:

https://github.com/huggingface/text-generation-inference/issues/3005

Unsloth AI org

Thanks, honestly I have never seen this error before - but please note you are using our dynamic quant, which might not be supported. Instead, use the basic BnB version.

The basic BnB JIT quant works fine; I wanted to use the dynamic quants.
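For readers hitting the same wall: the "basic BnB JIT quant" path quantizes the full-precision weights on the fly at load time, and the same thing can be requested outside TGI through transformers' `BitsAndBytesConfig`. A minimal sketch, assuming the unquantized base checkpoint is used (model id and dtype choices are illustrative, not taken from the thread):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# JIT 4-bit quantization: weights are quantized while loading,
# rather than read from a pre-quantized dynamic checkpoint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",  # "fp4" would mirror --quantize bitsandbytes-fp4
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Illustrative base model id (an assumption, not confirmed in the thread).
model_id = "unsloth/DeepSeek-R1-Distill-Llama-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

This sidesteps the sharded-loading path entirely, at the cost of quantizing on every load instead of downloading an already-quantized checkpoint.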

v3ss0n changed discussion title from I can't run any of the bnb-4bit quants with TextGenerationInference to I can't run any of the dynamic bnb-4bit quants with TextGenerationInference