Possible for a 5.0 quant or so?

#1
by Adzeiros - opened

Curious if you could make a 5.0 bpw quant? I usually run 70B models at 4.86 bpw to get 32k context (using a 4-bit cache), and I wonder if EXL3 will let me push that to 5.0 bpw.
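For a rough sense of what the jump from 4.86 to 5.0 bpw costs in weight memory, here is a back-of-the-envelope sketch. It assumes exactly 70e9 parameters all stored at the average bpw, and it ignores the KV cache, activations, and any runtime overhead, so treat the numbers as an estimate only. The function names are made up for illustration and are not part of any quantization library.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_gib(num_params: float, bpw: float) -> float:
    """Approximate weight storage in GiB at a given average bits per weight."""
    return num_params * bpw / 8 / GIB


def extra_gib(num_params: float, bpw_from: float, bpw_to: float) -> float:
    """Approximate additional weight storage when raising the bpw."""
    return weight_gib(num_params, bpw_to) - weight_gib(num_params, bpw_from)


if __name__ == "__main__":
    params = 70e9  # assumption: a nominal "70B" model
    for bpw in (4.86, 5.0):
        print(f"{bpw:.2f} bpw: ~{weight_gib(params, bpw):.2f} GiB of weights")
    print(f"difference: ~{extra_gib(params, 4.86, 5.0):.2f} GiB")
```

Under these assumptions the extra 0.14 bpw comes to a bit over 1 GiB of weights, which would have to come out of whatever headroom is left after the 32k context cache.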

Added

MikeRoz changed discussion status to closed
