NeMo

qnemo file

#9
by willy1212009 - opened

Has anyone run the PTQ flow from nemo-framework to produce a nemotron-340b fp8/int4 qnemo file? The conversion needs 16x H100 or 8x H200, but we don't have that hardware QQ.
It's weird that we want to use quantization, yet quantizing needs 16x H100 in the first place lol.
The paper says the quantized model only needs 8x H100.

https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/ptq.html
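A rough back-of-the-envelope check of the GPU counts above, as a sketch only. It assumes the unquantized checkpoint is stored in BF16 (2 bytes per parameter), uses 80 GB per H100 and 141 GB per H200, and counts weights only, ignoring activations, KV cache, and calibration overhead. The helper names `weights_gb` and `fits` are made up for illustration and are not part of NeMo.

```python
# Weight-only memory estimate for a ~340B-parameter model.
# Assumptions: BF16 source checkpoint; activations, KV cache, and
# calibration overhead are ignored, so real requirements are higher.
PARAMS = 340e9  # approximate parameter count

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}
GPU_MEMORY_GB = {"H100": 80, "H200": 141}  # HBM capacity per GPU


def weights_gb(precision: str) -> float:
    """Size of the weights alone, in GB, at the given precision."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9


def fits(precision: str, gpu: str, count: int) -> bool:
    """True if the weights alone fit in the combined memory of `count` GPUs."""
    return weights_gb(precision) <= GPU_MEMORY_GB[gpu] * count


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weights_gb(precision):.0f} GB of weights")
    for precision, gpu, count in [
        ("bf16", "H100", 8),
        ("bf16", "H100", 16),
        ("bf16", "H200", 8),
        ("fp8", "H100", 8),
    ]:
        print(f"{precision} on {count}x {gpu}: fits={fits(precision, gpu, count)}")
```

Under these assumptions the BF16 weights (about 680 GB) exceed the 640 GB of 8x H100, which would explain why PTQ has to load the model on 16x H100 or 8x H200 first, while the FP8 result (about 340 GB) fits on 8x H100.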

NVIDIA org

Same as for the base model: there's some quantization work in progress (not sure about int4, though) that will be shared once fully validated.
