What is the difference between "cpu-int4-rtn-block-32-acc-level-4" and "cpu-int4-rtn-block-32"?

#3
by Zhubarb - opened

I understand both are aimed at CPU and mobile. What does "acc-level-4" stand for, and what does it do?
The ONNX files seem to be the same size. Which one should we use, and when? I could not find details on the model card. Thanks in advance.

Microsoft org

Both are ONNX models for CPU and mobile using int4 quantization via RTN (round-to-nearest). Two versions are uploaded to balance latency against accuracy: one with accuracy level = 1 (cpu-int4-rtn-block-32) and one with accuracy level = 4 (cpu-int4-rtn-block-32-acc-level-4). If you want better performance with a minor trade-off in accuracy (for example on mobile devices), we recommend using the model with acc-level-4.
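To illustrate why the two files can be the same size yet behave differently, here is a rough pure-Python sketch, not the actual ONNX Runtime kernel. It assumes, based on my reading of the ONNX Runtime `MatMulNBits` operator documentation, that accuracy level 1 keeps the activations in fp32 while accuracy level 4 allows them to be quantized to int8 internally. The int4 weights are identical in both cases. The helper names (`quantize_blocks`, `dot_float_activations`, `dot_int8_activations`) are hypothetical, and the symmetric scaling is a simplification chosen for brevity.

```python
import random

BLOCK = 32  # block size, taken from "block-32" in the model name


def quantize_blocks(values, qmax, block=BLOCK):
    """Symmetric round-to-nearest: one float scale per block, integers in [-qmax, qmax].

    Assumption: real exporters may use asymmetric quantization with a zero point.
    """
    out = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        max_abs = max(abs(v) for v in chunk)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q = [max(-qmax, min(qmax, round(v / scale))) for v in chunk]
        out.append((scale, q))
    return out


def dot_float_activations(w_blocks, x, block=BLOCK):
    """int4 weights, float activations (assumed analogue of accuracy level 1)."""
    total = 0.0
    for i, (w_scale, wq) in enumerate(w_blocks):
        xs = x[i * block:(i + 1) * block]
        total += w_scale * sum(q * xi for q, xi in zip(wq, xs))
    return total


def dot_int8_activations(w_blocks, x, block=BLOCK):
    """int4 weights, activations quantized to int8 per block (assumed analogue of level 4)."""
    x_blocks = quantize_blocks(x, 127, block)
    total = 0.0
    for (w_scale, wq), (x_scale, xq) in zip(w_blocks, x_blocks):
        acc = sum(a * b for a, b in zip(wq, xq))  # integer-only accumulation
        total += w_scale * x_scale * acc
    return total


if __name__ == "__main__":
    random.seed(0)
    w = [random.uniform(-1, 1) for _ in range(64)]
    x = [random.uniform(-1, 1) for _ in range(64)]
    w_blocks = quantize_blocks(w, 7)  # int4 weights, shared by both variants
    print("exact          :", sum(a * b for a, b in zip(w, x)))
    print("float activ.   :", dot_float_activations(w_blocks, x))
    print("int8 activ.    :", dot_int8_activations(w_blocks, x))
```

In this sketch the stored weights are the same for both paths, which matches the observation that the files are the same size. Only the arithmetic at inference time changes: the int8 path accumulates in integers, which is typically faster on CPU, at the cost of a small extra rounding error in the activations.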

gugarosa changed discussion status to closed