is this W8A16 or W8A8?

#3
by ehartford - opened

W8A16 is compatible with Ampere via the Marlin kernel.

W8A8 is only compatible with Hopper.

Which is this?

The quantization scheme is compatible with finegrained_fp8 in Transformers.
You should be able to run it with W8A8.
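
To make the hardware side of the question concrete, here is a minimal sketch that maps a CUDA compute capability to the FP8 mode a GPU would likely use. The thresholds are assumptions that are not stated in this thread: native FP8 (W8A8) from compute capability 8.9 upward, which would include Ada Lovelace as well as Hopper, and the weight-only Marlin path (W8A16) from 8.0 upward. The function name `fp8_mode` and both constants are hypothetical, so check them against the documentation of your inference stack.

```python
from typing import Tuple

# Assumed thresholds, not confirmed in this thread:
# native FP8 matmul (W8A8) from compute capability 8.9 upward,
# Marlin weight-only fallback (W8A16) from 8.0 (Ampere) upward.
FP8_NATIVE_MIN = (8, 9)
MARLIN_MIN = (8, 0)


def fp8_mode(capability: Tuple[int, int]) -> str:
    """Return the FP8 execution mode expected for a (major, minor) capability."""
    if capability >= FP8_NATIVE_MIN:
        return "W8A8"   # FP8 weights and FP8 activations
    if capability >= MARLIN_MIN:
        return "W8A16"  # FP8 weights, 16-bit activations
    return "unsupported"


if __name__ == "__main__":
    # Compute capabilities: A100 = 8.0, RTX 4090 = 8.9, H100 = 9.0.
    for name, cap in [("A100", (8, 0)), ("RTX 4090", (8, 9)), ("H100", (9, 0))]:
        print(name, fp8_mode(cap))
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns a tuple in this form, so its result can be passed to the function directly.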
