KR (KashyapR)

AI & ML interests

LLMs, Generative AI, RAG, ML


Posts

Question: Quantization with GPTQ

Hi team, I’m trying to quantize a 13B model on an A100 using the configuration below. I tried the following options:

from transformers import GPTQConfig

quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="wikitext2",
    batch_size=16,
    desc_act=False,
)

1. Enforcing batch_size = 16 (or batch_size = 2) in the quantization configuration
2. Setting tokenizer.pad_token_id = tokenizer.eos_token_id (which is 2) (full setup sketched below)
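
For context, the surrounding setup is roughly the following. This is a minimal sketch: the model ID is a placeholder for the actual 13B checkpoint, and depending on the transformers version the tokenizer may also need to be passed to GPTQConfig via tokenizer=tokenizer.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"  # placeholder: stands in for the actual 13B checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token_id = tokenizer.eos_token_id  # option 2: eos_token_id is 2 for this tokenizer

# With a GPTQConfig passed in, calibration and quantization run inside from_pretrained.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,  # the GPTQConfig defined above
)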

I observed that even when we explicitly enforce the batch size and set pad_token_id to a value other than None, neither setting is taken into account.

Can’t we set batch_size and pad_token_id to other values, or is this expected behavior with GPTQ? What is the reason behind this? Please suggest whether there is any way to override the batch size in the configuration.

https://github.com/huggingface/optimum/blob/main/optimum/gptq/data.py#L51
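
For reference, this is the quick sanity check I’m running to see whether the values at least reach the config object. It is a minimal sketch; I’m assuming GPTQConfig stores its constructor arguments as attributes and exposes them via to_dict(), like the other transformers quantization configs.

# Sanity check: do the values at least land on the config object?
print(quantization_config.batch_size)     # expected: 16, as passed above
print(quantization_config.pad_token_id)   # likely None here, since it was set on the tokenizer rather than the config
print(quantization_config.to_dict())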

Could you kindly advise? I’d appreciate your support.
Thanks
