Question: Quantization through GPTQ
Hi Team, I’m trying to quantize a 13B model on an A100 using the configuration below. I tried the following options:
from transformers import GPTQConfig

quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="wikitext2",
    batch_size=16,
    desc_act=False
)
1. Enforce batch_size = 16 (and also batch_size = 2) in the quantization config
2. Set tokenizer.pad_token_id = tokenizer.eos_token_id (which is 2)
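For reference, here is a minimal sketch of the full flow these settings sit in (the model name is a placeholder for the actual 13B checkpoint; the tokenizer is passed to GPTQConfig so the "wikitext2" calibration set can be tokenized):

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-2-13b-hf"  # placeholder: the actual 13B checkpoint being quantized

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token_id = tokenizer.eos_token_id  # eos_token_id is 2 here

quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="wikitext2",
    batch_size=16,        # the value I'm trying to enforce (also tried 2)
    desc_act=False,
    tokenizer=tokenizer,  # used to tokenize the "wikitext2" calibration data
)

# Passing quantization_config here triggers GPTQ calibration and quantization on load
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
)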
I observed that even when we explicitly set the batch size and set pad_token_id to a value other than None, neither setting is actually picked up.
Is it possible to set batch_size and pad_token_id to other values, or is this expected behavior with GPTQ? What is the reason behind it? Please suggest whether there is any way to override the batch size configuration.
https://github.com/huggingface/optimum/blob/main/optimum/gptq/data.py#L51
Could you kindly advise? I appreciate your support.
Thanks