nielsr HF Staff committed
Commit 7c8b38c · verified · 1 Parent(s): ff6f369

Set pipeline tag to feature-extraction, add link to code and usage examples


This PR fixes the model card by setting the pipeline tag to feature-extraction, allowing the model to be discovered at https://huggingface.co/models?pipeline_tag=feature-extraction.
It also adds a link to the GitHub repository, along with usage examples.

Files changed (1)
  1. README.md +67 -30
README.md CHANGED
@@ -1,31 +1,68 @@
- ---
- license: other
- license_name: cogvlm2
- license_link: https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B/blob/main/LICENS
-
- language:
- - ens
- pipeline_tag: text-generation
- tags:
- - chat
- - cogvlm2
-
- inference: false
- ---
- # VisionReward-Image
-
- ## Introduction
- We present VisionReward, a general strategy for aligning visual generation models (both image and video generation) with human preferences through a fine-grained and multi-dimensional framework. We decompose human preferences in images and videos into multiple dimensions, each represented by a series of judgment questions that are linearly weighted and summed into an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance for video preference prediction.
- Here, we present the VisionReward-Image model.
-
- ## Merging and Extracting Checkpoint Files
- Use the following commands to merge the split files into a single `.tar` file and then extract it:
-
- ```sh
- cat ckpts/split_part_* > ckpts/visionreward_image.tar
- tar -xvf ckpts/visionreward_image.tar
- ```
-
- ## Using this model
- You can quickly install the Python package dependencies and run model inference by following the instructions in our [GitHub repository](https://github.com/THUDM/VisionReward).
+ ---
+ language:
+ - ens
+ license: other
+ license_name: cogvlm2
+ license_link: https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B/blob/main/LICENS
+ pipeline_tag: feature-extraction
+ tags:
+ - chat
+ - cogvlm2
+ inference: false
+ ---
+
+ # VisionReward-Image
+
+ ## Introduction
+ We present VisionReward, a general strategy for aligning visual generation models (both image and video generation) with human preferences through a fine-grained and multi-dimensional framework. We decompose human preferences in images and videos into multiple dimensions, each represented by a series of judgment questions that are linearly weighted and summed into an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance for video preference prediction.
+ Here, we present the VisionReward-Image model.
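+
+ Schematically, in notation we add here for illustration (not taken verbatim from the paper): if $q_i \in \{0, 1\}$ denotes the yes/no answer to the $i$-th checklist question and $w_i$ its weight, the decomposed preference score is the linear combination
+
+ $$\text{score} = \sum_i w_i \, q_i.$$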
+
+ ## Merging and Extracting Checkpoint Files
+ Use the following commands to merge the split files into a single `.tar` file and then extract it:
+
+ ```sh
+ cat ckpts/split_part_* > ckpts/visionreward_image.tar
+ tar -xvf ckpts/visionreward_image.tar
+ ```
+
+ ## Using this model
+ You can quickly install the Python package dependencies and run model inference by following the instructions in our [GitHub repository](https://github.com/THUDM/VisionReward).
+
+ ### Usage
+
+ #### VQA (Visual Question Answering)
+ You can run the following commands for a checklist query. The available image and video questions can be found in `VisionReward_Image/VisionReward_image_qa.txt` and `VisionReward_Video/VisionReward_video_qa.txt`, respectively.
+ ```sh
+ # For Image QA
+ python inference-image.py --bf16 --question [[your_question]]
+ # Input: image_path + prompt + question
+ # Output: yes/no
+
+ # For Video QA
+ python inference-video.py --question [[your_question]]
+ # Input: video_path + prompt + question
+ # Output: yes/no
+ ```
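+
+ The question files above are plain text. As a rough illustration (assuming one question per line, which is an assumption about the file format rather than something stated here), you can inspect the image checklist like this:
+
+ ```python
+ # Illustrative sketch: list the checklist questions that can be passed one at a
+ # time to inference-image.py via --question. Assumes one question per line.
+ from pathlib import Path
+
+ qa_file = Path("VisionReward_Image/VisionReward_image_qa.txt")
+ questions = [line.strip() for line in qa_file.read_text().splitlines() if line.strip()]
+ print(f"{len(questions)} checklist questions, for example:")
+ for question in questions[:3]:
+     print("-", question)  # each question is answered with yes/no by the model
+ ```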
+
+ #### Scoring with VisionReward
+ You can also calculate scores for images/videos with the following commands. The corresponding weights are in `VisionReward_Image/weight.json` and `VisionReward_Video/weight.json`.
+ ```sh
+ # Scoring an Image
+ python inference-image.py --bf16 --score
+ # Input: image_path + prompt
+ # Output: score
+
+ # Scoring a Video
+ python inference-video.py --score
+ # Input: video_path + prompt
+ # Output: score
+ ```
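+
+ For intuition, here is a minimal, illustrative sketch of how per-question yes/no answers could be combined with weights into a single score. It is not the repository's implementation, and the question names and weight values below are made up:
+
+ ```python
+ # Illustrative only: linear aggregation of checklist answers into a score,
+ # mirroring the "linearly weighted and summed" description in the Introduction.
+ # The real weights live in VisionReward_Image/weight.json; its exact layout is
+ # not shown here, so the flat question -> weight mapping below is hypothetical.
+
+ def checklist_score(answers: dict, weights: dict) -> float:
+     """answers: question -> True/False; weights: question -> float weight."""
+     # Encode "yes" as 1 and "no" as 0, then take the weighted sum.
+     return sum(weights[q] * (1.0 if ans else 0.0) for q, ans in answers.items())
+
+ # Hypothetical toy values, purely for illustration:
+ weights = {"Is the image sharp?": 0.8, "Are there visible artifacts?": -0.6}
+ answers = {"Is the image sharp?": True, "Are there visible artifacts?": False}
+ print(checklist_score(answers, weights))  # 0.8 * 1 + (-0.6) * 0 = 0.8
+ ```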
+
+ #### Comparing Two Videos
+ It is also possible to directly compare the quality of two videos, leveraging the weights in `VisionReward_Video/weight.json`.
+ ```sh
+ python inference-video.py --compare
+ # Input: video_path1 + video_path2 + prompt
+ # Output: better_video
+ ```
  > This model utilizes fp32 precision parameters and requires the use of the sat (SwissArmyTransformer) library for invocation. For the bf16 (bfloat16) version of the model, please refer to the following link: [https://huggingface.co/THUDM/VisionReward-Image-bf16](https://huggingface.co/THUDM/VisionReward-Image-bf16)