Update README.md
README.md
CHANGED
@@ -40,9 +40,9 @@ Use Jinja Template or CHATML template.
 
 IMPORTANT NOTES:
 
-- Due to the unique nature (MOE, Size, Activated experts) of this model GGUF quants can be run on the CPU, GPU or with GPU part "off-load", right up to full precision.
-- This model is difficult to Imatrix : You need a much larger imatrix file / multi-language / multi-content to imatrix it.
-- GPU speeds will be BLISTERING 4x-8x or higher than CPU only AND relative to other "30B" models (equal roughly to 7.5B "normal" model speeds).
+- Due to the unique nature (MOE, Size, Activated experts, size of experts) of this model GGUF quants can be run on the CPU, GPU or with GPU part "off-load", right up to full precision.
+- This model is difficult to Imatrix : You need a much larger imatrix file / multi-language / multi-content (ie code/text) to imatrix it.
+- GPU speeds will be BLISTERING 4x-8x or higher than CPU only speeds AND this model will be BLISTERING too, relative to other "30B" models (Token per second speed equal roughly to 7.5B "normal" model speeds).
 
 Please refer the org model card for details, benchmarks, how to use, settings, system roles etc etc :
 