Tags: Transformers · GGUF · Serbian · mistral · text-generation-inference · conversational
datatab committed (verified) · Commit 9a20e6f · 1 parent: 504670e

Update README.md

Files changed (1): README.md (+31 −1)

README.md CHANGED
@@ -61,4 +61,34 @@ datasets:
  <td><strong>35.60</strong></td>
  <td>69.43</td>
  </tr>
- </table>
+ </table>
+
+ # Quant. preference
+
+ | Quant. | Description |
+ |---------------|---------------------------------------------------------------------------------------|
+ | not_quantized | Recommended. Fast conversion. Slow inference, big files. |
+ | fast_quantized | Recommended. Fast conversion. OK inference, OK file size. |
+ | quantized | Recommended. Slow conversion. Fast inference, small files. |
+ | f32 | Not recommended. Retains 100% accuracy, but super slow and memory hungry. |
+ | f16 | Fastest conversion + retains 100% accuracy. Slow and memory hungry. |
+ | q8_0 | Fast conversion. High resource use, but generally acceptable. |
+ | q4_k_m | Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K. |
+ | q5_k_m | Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K. |
+ | q2_k | Uses Q4_K for the attention.wv and feed_forward.w2 tensors, Q2_K for the other tensors. |
+ | q3_k_l | Uses Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K. |
+ | q3_k_m | Uses Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K. |
+ | q3_k_s | Uses Q3_K for all tensors. |
+ | q4_0 | Original quant method, 4-bit. |
+ | q4_1 | Higher accuracy than q4_0 but not as high as q5_0; however, it has quicker inference than q5 models. |
+ | q4_k_s | Uses Q4_K for all tensors. |
+ | q4_k | Alias for q4_k_m. |
+ | q5_k | Alias for q5_k_m. |
+ | q5_0 | Higher accuracy, higher resource usage, and slower inference. |
+ | q5_1 | Even higher accuracy and resource usage, and slower inference. |
+ | q5_k_s | Uses Q5_K for all tensors. |
+ | q6_k | Uses Q8_K for all tensors. |
+ | iq2_xxs | 2.06 bpw quantization. |
+ | iq2_xs | 2.31 bpw quantization. |
+ | iq3_xxs | 3.06 bpw quantization. |
+ | q3_k_xs | 3-bit extra small quantization. |
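
The bpw (bits-per-weight) figures in the last rows translate directly into approximate file sizes. A minimal sketch of that arithmetic (the 7B parameter count below is an assumption for illustration, not taken from this model card):

```python
# Rough GGUF file-size estimate from bits per weight (bpw):
# size in bytes = parameter count * bpw / 8.
# The 7B parameter count is an assumption for illustration.
PARAMS = 7_000_000_000

BPW = {              # bpw values quoted in the table above (f16/f32 by definition)
    "f32": 32.0,
    "f16": 16.0,
    "iq3_xxs": 3.06,
    "iq2_xs": 2.31,
    "iq2_xxs": 2.06,
}

def est_gib(bpw: float, params: int = PARAMS) -> float:
    """Approximate file size in GiB: params * bpw bits, 8 bits per byte."""
    return params * bpw / 8 / 2**30

for name, bpw in BPW.items():
    print(f"{name:8s} ~{est_gib(bpw):6.2f} GiB")
```

For a hypothetical 7B model this puts iq2_xxs at roughly 1.7 GiB versus about 13 GiB for f16, which is the trade-off the table is describing. Real files are slightly larger because mixed-precision tensors and metadata add overhead.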