bowenbaoamd committed
Commit 2505537 · verified · 1 Parent(s): 171c2e8

Upload README.md with huggingface_hub

Files changed (1): README.md +4 -2
README.md CHANGED
@@ -26,7 +26,8 @@ python3 quantize_quark.py \
  --kv_cache_dtype fp8 \
  --num_calib_data 128 \
  --model_export quark_safetensors \
- --no_weight_matrix_merge
+ --no_weight_matrix_merge \
+ --custom_mode fp8
 
 # If model size is too large for single GPU, please use multi GPU instead.
 python3 quantize_quark.py \
@@ -37,7 +38,8 @@ python3 quantize_quark.py \
  --num_calib_data 128 \
  --model_export quark_safetensors \
  --no_weight_matrix_merge \
- --multi_gpu
+ --multi_gpu \
+ --custom_mode fp8
 ```
 ## Deployment
 Quark has its own (vLLM-compatible) export format, which allows FP8-quantized models to be deployed efficiently on the vLLM backend.
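Since the commit switches the export to `--custom_mode fp8`, which the README describes as vLLM-compatible, the exported directory should be loadable by vLLM directly. A minimal serving sketch, not part of this commit: the checkpoint path `./quark-fp8-export` is a placeholder, and `--kv-cache-dtype fp8` mirrors the `--kv_cache_dtype fp8` setting used at quantization time (flag support depends on your vLLM version and hardware).

```sh
# Serve the Quark-exported FP8 checkpoint with vLLM.
# ./quark-fp8-export is a placeholder for the export directory
# produced by quantize_quark.py above.
vllm serve ./quark-fp8-export \
    --kv-cache-dtype fp8
```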