Is this W8A16 or W8A8?
#3 opened by ehartford
W8A16 is compatible with Ampere through the Marlin kernel.
W8A8 is only compatible with Hopper.
Which is this?
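A minimal sketch of the rule of thumb in this question, assuming the GPU architecture is identified by its CUDA compute capability (Ampere reports 8.x, Hopper reports 9.0). `pick_scheme` is a hypothetical helper for illustration, not part of any library.

```python
from typing import Optional


def pick_scheme(major: int, minor: int) -> Optional[str]:
    """Map a CUDA compute capability to a scheme, following the post's rule of thumb."""
    # Compute capability 9.x or newer: Hopper, where the post says W8A8 runs.
    if major >= 9:
        return "W8A8"
    # Compute capability 8.x: Ampere, where W8A16 runs through the Marlin kernel.
    # (This simple rule also lumps in Ada Lovelace at 8.9.)
    if major == 8:
        return "W8A16"
    # Older architectures are not covered by the post.
    return None


print(pick_scheme(8, 0))  # W8A16
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple that can be passed straight in.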
The quantization scheme is compatible with `finegrained_fp8` in Transformers, so you should be able to run it as W8A8.
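One way to check a checkpoint yourself is to look at the `quantization_config` block in its `config.json`. The sketch below assumes fine-grained FP8 checkpoints record `"quant_method": "fp8"` plus an `"activation_scheme"` entry (as in DeepSeek-style FP8 configs); if activations have a scheme, they are quantized too, which makes it W8A8. The key layout is an assumption, and `classify_scheme` is a hypothetical helper.

```python
import json


def classify_scheme(config: dict) -> str:
    """Guess W8A8 vs. unknown from a model config dict (assumed key layout)."""
    qcfg = config.get("quantization_config", {})
    # Assumption: fine-grained FP8 checkpoints use quant_method "fp8" and
    # store how activations are quantized under "activation_scheme".
    if qcfg.get("quant_method") == "fp8" and qcfg.get("activation_scheme") in ("dynamic", "static"):
        # Weights and activations are both 8-bit.
        return "W8A8"
    # Anything else is not covered by this sketch.
    return "unknown"


# Hypothetical config.json contents for illustration.
example = json.loads(
    '{"quantization_config": {"quant_method": "fp8", '
    '"activation_scheme": "dynamic", "weight_block_size": [128, 128]}}'
)
print(classify_scheme(example))  # W8A8
```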