---
license: apache-2.0
language:
- zh
- en
pipeline_tag: image-text-to-text
tags:
- multimodal
library_name: transformers
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

# Qwen2.5-VL-7B-Instruct-GPTQ-Int3

This is an **UNOFFICIAL** GPTQ-Int3 quantized version of the `Qwen2.5-VL` model, created with the `gptqmodel` library.

The model is compatible with the latest `transformers` library (i.e., any version that can run the non-quantized Qwen2.5-VL models).

### Performance

| Model | Size (Disk) | ChartQA (test) | OCRBench |
| ------------------------------------------------------------ | :---------: | :------------: | :------: |
| [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | 7.1 GB | 83.48 | 791 |
| [Qwen2.5-VL-3B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct-AWQ) | 3.2 GB | 82.52 | 786 |
| [**Qwen2.5-VL-3B-Instruct-GPTQ-Int4**](https://huggingface.co/hfl/Qwen2.5-VL-3B-Instruct-GPTQ-Int4) | 3.2 GB | 82.56 | 784 |
| [**Qwen2.5-VL-3B-Instruct-GPTQ-Int3**](https://huggingface.co/hfl/Qwen2.5-VL-3B-Instruct-GPTQ-Int3) | 2.9 GB | 76.68 | 742 |
| [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 16.0 GB | 83.2 | 846 |
| [Qwen2.5-VL-7B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct-AWQ) | 6.5 GB | 79.68 | 837 |
| [**Qwen2.5-VL-7B-Instruct-GPTQ-Int4**](https://huggingface.co/hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int4) | 6.5 GB | 81.48 | 845 |
| [**Qwen2.5-VL-7B-Instruct-GPTQ-Int3**](https://huggingface.co/hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3) | 5.8 GB | 78.56 | 823 |
33
+
34
+ #### Note
35
+
36
+ - Evaluations are performed using [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) with default setting.
37
+ - GPTQ models are computationally more effective (fewer VRAM usage, faster inference speed) than AWQ series in these evaluations.
38
+ - We recommend use `gptqmodel` instead of `autogptq` library, as `autogptq` is no longer maintained.
39
+
40
+ ### Quick Tour
41
+
42
+ Install the required libraries:
43
+ ```
44
+ pip install git+https://github.com/huggingface/transformers accelerate qwen-vl-utils
45
+ pip install git+https://github.com/huggingface/optimum.git
46
+ pip install gptqmodel
47
+ ```
48
+
49
+ Optionally, you may need to install:
50
+
51
+ ```
52
+ pip install tokenicer device_smi logbar
53
+ ```

Sample code:

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3",
    attn_implementation="flash_attention_2",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://raw.githubusercontent.com/ymcui/Chinese-LLaMA-Alpaca-3/refs/heads/main/pics/banner.png"},
        {"type": "text", "text": "请你描述一下这张图片。"},  # "Please describe this image."
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=512)
# Drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text[0])
```

Response (translated from Chinese):
> This image shows a logo in both Chinese and English, reading "中文LLaMA & Alpaca大模型" and "Chinese LLaMA & Alpaca Large Language Models". On the left of the logo are two cartoon alpacas, one wearing a red scarf and one with white fur, set against a green lawn and a building with a red roof. On the right is the number 3 with some circuit patterns beside it. The overall design is clean and simple, using bright colors and cute cartoon characters to draw attention.
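
The `generated_ids_trimmed` line in the sample above slices off the prompt, since `model.generate` returns the prompt tokens followed by the continuation. The same slicing shown with plain Python lists (hypothetical token ids):

```python
# generate() output = prompt tokens + newly generated tokens;
# slicing at len(prompt) keeps only the newly generated part.
prompt_ids = [151644, 872, 198]                    # hypothetical prompt token ids
full_output = prompt_ids + [104169, 3837, 151645]  # prompt + generated continuation
new_tokens = full_output[len(prompt_ids):]
print(new_tokens)  # [104169, 3837, 151645]
```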

### Disclaimer

- **This is NOT an official model by Qwen. Use at your own risk.**
- For detailed usage, please check [Qwen2.5-VL's page](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct).