Curious if you could make a 5.0 bpw quant? I usually run 70B models at 4.86 bpw to get 32k context (using 4-bit cache), and I wonder if EXL3 will let me push that to 5.0 bpw.
Added
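For a rough sense of what the step from 4.86 to 5.0 bpw costs in memory, here is a back-of-the-envelope sketch. It assumes a nominal 70e9 parameters and counts weight storage only, so the KV cache, activations, and any loader overhead are left out. The function name `weight_gib` is just illustrative and is not part of any library.

```python
def weight_gib(params: float, bpw: float) -> float:
    """Approximate weight storage in GiB at an average of `bpw` bits per weight."""
    return params * bpw / 8 / 2**30  # bits -> bytes -> GiB


PARAMS = 70e9  # assumption: nominal count; actual "70B" models differ slightly

current = weight_gib(PARAMS, 4.86)
target = weight_gib(PARAMS, 5.0)
extra = target - current  # additional memory the 5.0 bpw quant would need

print(f"4.86 bpw: {current:.2f} GiB")
print(f"5.00 bpw: {target:.2f} GiB")
print(f"extra:    {extra:.2f} GiB")
```

Whether that extra fits alongside a 32k context depends on how much headroom the 4.86 bpw setup leaves, which this estimate does not capture.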