Update README.md
README.md
CHANGED
````diff
@@ -3,49 +3,36 @@ language:
 - en
 pipeline_tag: text-generation
 base_model:
-
+- amd/gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16-onnx
 license: gemma
 ---
 
 # gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16-onnx
 ## Introduction
-
+This model was prepared using the AMD Quark Quantization tool, followed by necessary post-processing.
+
 ## Quantization Strategy
-
-
+- AWQ / Group size lm_head 32 / Group size 128 / Asymmetric / UINT4 Weights / FP16 activations
+- Excluded Layers: None
+
 ## Quick Start
-
-
-
-```
-# single GPU
-python quantize_quark.py --model_dir $MODEL_DIR \
-    --output_dir output_dir/$MODEL_NAME-awq-uint4-asym-g128-lmhead-g32-fp16 \
-    --quant_scheme w_uint4_per_group_asym \
-    --num_calib_data 128 \
-    --quant_algo awq \
-    --dataset pileval_for_awq_benchmark \
-    --model_export hf_format \
-    --group_size 128 \
-    --group_size_per_layer lm_head 32 \
-    --data_type float32 \
-    --exclude_layers
-# cpu
-python quantize_quark.py --model_dir $MODEL_DIR \
-    --output_dir output_dir/$MODEL_NAME-awq-uint4-asym-g128-lmhead-g32-fp16 \
-    --quant_scheme w_uint4_per_group_asym \
-    --num_calib_data 128 \
-    --quant_algo awq \
-    --dataset pileval_for_awq_benchmark \
-    --model_export hf_format \
-    --group_size 128 \
-    --group_size_per_layer lm_head 32 \
-    --data_type float32 \
-    --exclude_layers \
-    --device cpu
-```
-## Deployment
-Quark has its own export format, quark_safetensors, which is compatible with autoAWQ exports.
+For a quick start, refer to the [Ryzen AI documentation](https://ryzenai.docs.amd.com/en/latest/hybrid_oga.html).
+
+#### Evaluation scores
+The perplexity measurement is run on the wikitext-2-raw-v1 (raw data) dataset provided by Hugging Face. The perplexity score measured for a prompt length of 2K is 66.625.
 
 #### License
-Modifications copyright(c)
+Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
````
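The `w_uint4_per_group_asym` scheme named in the card quantizes each group of weights (128 elements per group, 32 for `lm_head`) to 4-bit unsigned integers with a per-group scale and zero point. The sketch below is the generic asymmetric scale/zero-point math that scheme name refers to, not Quark's actual implementation; function names are illustrative.

```python
# Minimal sketch of asymmetric per-group uint4 weight quantization.
# Illustrative only -- generic scale/zero-point math, NOT Quark's code.
def quantize_group_asym_uint4(group):
    """Quantize one group of float weights to uint4 codes, asymmetric range."""
    lo, hi = min(group), max(group)
    scale = (hi - lo) / 15.0 or 1.0     # uint4 spans [0, 15]; guard constant groups
    zero_point = round(-lo / scale)     # maps the group minimum near code 0
    q = [max(0, min(15, round(x / scale) + zero_point)) for x in group]
    return q, scale, zero_point

def dequantize_group(q, scale, zero_point):
    """Map uint4 codes back to approximate float weights."""
    return [(v - zero_point) * scale for v in q]

# Each group is reconstructed to within roughly one scale step:
weights = [-0.31, 0.07, 0.5, -0.18]
codes, scale, zp = quantize_group_asym_uint4(weights)
recovered = dequantize_group(codes, scale, zp)
```

With `--group_size 128`, every 128-element group of a weight matrix carries its own scale and zero point; the card's `--group_size_per_layer lm_head 32` gives the output projection finer 32-element groups, trading a little extra metadata for lower quantization error where it matters most.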
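On the evaluation score above: perplexity is the exponential of the mean per-token negative log-likelihood the model assigns to the ground-truth text, so lower is better. A minimal sketch of the metric itself (the actual measurement feeds 2K-token wikitext-2 windows through the model to get these probabilities):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood) over the true tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model that always assigns the true next token probability 1/2
# has an average "surprise" of log 2, i.e. perplexity 2.0:
ppl = perplexity([0.5, 0.5, 0.5, 0.5])
```

By this measure the quantized model's score of 66.625 summarizes how sharply it predicts wikitext-2 at a 2K prompt length; comparing it against the FP16 baseline's score is the usual way to judge the accuracy cost of the uint4 weights.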