  # Qwen2.5-VL-7B-Instruct-GPTQ-Int4

This is an **UNOFFICIAL** GPTQ-Int4 quantized version of the `Qwen2.5-VL` model, created with the `gptqmodel` library.

  The model is compatible with the latest `transformers` library (which can run non-quantized Qwen2.5-VL models).
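
To see how the checkpoint was quantized without downloading the weights, here is a minimal sketch (assuming a `transformers` build recent enough to recognize the Qwen2.5-VL architecture; the exact fields recorded depend on the `gptqmodel` version used at quantization time):

```python
from transformers import AutoConfig

# Fetches only config.json (a few KB), not the quantized weights.
config = AutoConfig.from_pretrained("hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int4")

# GPTQ checkpoints record their settings here; expect bits=4 for this repo.
print(config.quantization_config)
```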

### Performance

| Model | Size | ChartQA | OCRBench |
| ------------------------------------------------------------ | :---------: | :------------: | :------: |
| [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | 7.1 GB | 83.48 | 791 |
| [Qwen2.5-VL-3B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct-AWQ) | 3.2 GB | 82.52 | 786 |
| [**Qwen2.5-VL-3B-Instruct-GPTQ-Int4**](https://huggingface.co/hfl/Qwen2.5-VL-3B-Instruct-GPTQ-Int4) | 3.2 GB | 82.56 | 784 |
| [**Qwen2.5-VL-3B-Instruct-GPTQ-Int3**](https://huggingface.co/hfl/Qwen2.5-VL-3B-Instruct-GPTQ-Int3) | 2.9 GB | 76.68 | 742 |
| [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 16.0 GB | 83.2 | 846 |
| [Qwen2.5-VL-7B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct-AWQ) | 6.5 GB | 79.68 | 837 |
| [**Qwen2.5-VL-7B-Instruct-GPTQ-Int4**](https://huggingface.co/hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int4) | 6.5 GB | 81.48 | 845 |
| [**Qwen2.5-VL-7B-Instruct-GPTQ-Int3**](https://huggingface.co/hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3) | 5.8 GB | 78.56 | 823 |

  #### Note

- Evaluations are performed using [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) with default settings.
- GPTQ models are more computationally efficient (lower VRAM usage, faster inference) than the AWQ series in these evaluations; the sketch below shows a quick way to check peak VRAM yourself.
- We recommend using the `gptqmodel` library instead of `autogptq`, as `autogptq` is no longer maintained.
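
As a rough way to verify the VRAM point on your own hardware, here is a minimal sketch (absolute numbers vary with GPU, image size, and generation settings): read the peak allocated memory after running the Quick Tour example below.

```python
import torch

# Run this after model loading and a generate() call (see Quick Tour).
# Peak allocated memory approximates the working VRAM footprint:
# quantized weights plus activations and the KV cache.
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak CUDA memory: {peak_gib:.2f} GiB")
```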
 
  ### Quick Tour

Install the required libraries:

```
pip install git+https://github.com/huggingface/transformers accelerate qwen-vl-utils
pip install git+https://github.com/huggingface/optimum.git
pip install gptqmodel
```

Optionally, you may need to install:

```
pip install tokenicer device_smi logbar
```
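
Before loading the model, you can sanity-check the environment with a short sketch (the class import below fails on `transformers` builds that predate Qwen2.5-VL support):

```python
from importlib.metadata import version

import transformers

print("transformers:", transformers.__version__)
print("gptqmodel:", version("gptqmodel"))

# Raises ImportError on transformers builds without Qwen2.5-VL support.
from transformers import Qwen2_5_VLForConditionalGeneration
```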

Sample code:

  ```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "hfl/Qwen2.5-VL-3B-Instruct-GPTQ-Int4",
    attn_implementation="flash_attention_2",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("hfl/Qwen2.5-VL-3B-Instruct-GPTQ-Int4")

messages = [{
    "role": "user",
    "content": [
        # Placeholder image; replace with your own local path or URL.
        {"type": "image", "image": "path/to/image.png"},
        {"type": "text", "text": "Describe this image."},
    ],
}]

# Standard qwen-vl-utils preprocessing: render the chat template,
# collect vision inputs, and batch everything into tensors.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=512)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text[0])
  ```

Response:

> This image shows a logo with both Chinese and English text: "中文LLaMA & Alpaca大模型" and "Chinese LLaMA & Alpaca Large Language Models". On the left are two cartoon alpacas, one wearing a red scarf and one with white fur, against a background of green grass and a building with a red roof. On the right is the number 3 next to some circuit patterns. The overall design is clean and simple, using bright colors and cute cartoon characters to draw attention.
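
The same snippet works for the other quantized variants in the table above; swap the repo id (for example, `hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int4`) into both `from_pretrained` calls.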
 
 
  ### Disclaimer
  - **This is NOT an official model by Qwen. Use at your own risk.**
- For detailed usage, please check [Qwen2.5-VL's page](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct).