Update README.md
README.md
CHANGED
````diff
@@ -3,49 +3,36 @@ language:
 - en
 pipeline_tag: text-generation
 base_model:
-
+- amd/gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16-onnx
 license: gemma
 ---
 
 # gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16-onnx
 ## Introduction
-
+This model was prepared using the AMD Quark Quantization tool, followed by necessary post-processing.
+
 ## Quantization Strategy
-
-
+- AWQ / Group size lm_head 32 / Group size 128 / Asymmetric / UINT4 Weights / FP16 activations
+- Excluded Layers: None
+
 ## Quick Start
-
-
-
-```
-# single GPU
-python quantize_quark.py --model_dir $MODEL_DIR \
-    --output_dir output_dir/$MODEL_NAME-awq-uint4-asym-g128-lmhead-g32-fp16 \
-    --quant_scheme w_uint4_per_group_asym \
-    --num_calib_data 128 \
-    --quant_algo awq \
-    --dataset pileval_for_awq_benchmark \
-    --model_export hf_format \
-    --group_size 128 \
-    --group_size_per_layer lm_head 32 \
-    --data_type float32 \
-    --exclude_layers
-# cpu
-python quantize_quark.py --model_dir $MODEL_DIR \
-    --output_dir output_dir/$MODEL_NAME-awq-uint4-asym-g128-lmhead-g32-fp16 \
-    --quant_scheme w_uint4_per_group_asym \
-    --num_calib_data 128 \
-    --quant_algo awq \
-    --dataset pileval_for_awq_benchmark \
-    --model_export hf_format \
-    --group_size 128 \
-    --group_size_per_layer lm_head 32 \
-    --data_type float32 \
-    --exclude_layers \
-    --device cpu
-```
-## Deployment
-Quark has its own export format, quark_safetensors, which is compatible with autoAWQ exports.
+For a quick start, refer to the [Ryzen AI documentation](https://ryzenai.docs.amd.com/en/latest/hybrid_oga.html).
+
+#### Evaluation scores
+The perplexity measurement is run on the wikitext-2-raw-v1 (raw data) dataset provided by Hugging Face. The perplexity score measured for a prompt length of 2K is 66.625.
 
 #### License
-Modifications copyright(c)
+Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
````
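The `w_uint4_per_group_asym` scheme named in the card quantizes each group of weights (128 elements per group, 32 for `lm_head`) to 4-bit unsigned integers with a per-group scale and zero point. The sketch below is the generic asymmetric scale/zero-point math that scheme name refers to, not Quark's actual implementation; function names are illustrative.

```python
# Minimal sketch of asymmetric per-group uint4 weight quantization.
# Illustrative only -- generic scale/zero-point math, NOT Quark's code.
def quantize_group_asym_uint4(group):
    """Quantize one group of float weights to uint4 codes, asymmetric range."""
    lo, hi = min(group), max(group)
    scale = (hi - lo) / 15.0 or 1.0     # uint4 spans [0, 15]; guard constant groups
    zero_point = round(-lo / scale)     # maps the group minimum near code 0
    q = [max(0, min(15, round(x / scale) + zero_point)) for x in group]
    return q, scale, zero_point

def dequantize_group(q, scale, zero_point):
    """Map uint4 codes back to approximate float weights."""
    return [(v - zero_point) * scale for v in q]

# Each group is reconstructed to within roughly one scale step:
weights = [-0.31, 0.07, 0.5, -0.18]
codes, scale, zp = quantize_group_asym_uint4(weights)
recovered = dequantize_group(codes, scale, zp)
```

With `--group_size 128`, every 128-element group of a weight matrix carries its own scale and zero point; the card's `--group_size_per_layer lm_head 32` gives the output projection finer 32-element groups, trading a little extra metadata for lower quantization error where it matters most.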
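On the evaluation score above: perplexity is the exponential of the mean per-token negative log-likelihood the model assigns to the ground-truth text, so lower is better. A minimal sketch of the metric itself (the actual measurement feeds 2K-token wikitext-2 windows through the model to get these probabilities):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood) over the true tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model that always assigns the true next token probability 1/2
# has an average "surprise" of log 2, i.e. perplexity 2.0:
ppl = perplexity([0.5, 0.5, 0.5, 0.5])
```

By this measure the quantized model's score of 66.625 summarizes how sharply it predicts wikitext-2 at a 2K prompt length; comparing it against the FP16 baseline's score is the usual way to judge the accuracy cost of the uint4 weights.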