Commit 482829b · BoZhang (verified) · 1 Parent(s): 52521a7

Upload README.md

Files changed (1): README.md added (+83, -0)

<div align="center">
<h1> OmniCaptioner: One Captioner to Rule Them All </h1>

</div>
<div align="center">

<p align="center">
<a href="https://alpha-innovator.github.io/OmniCaptioner-project-page/"><b>HomePage</b></a>&nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://github.com/Alpha-Innovator/OmniCaptioner">GitHub</a>&nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://huggingface.co/papers/2504.07089">Paper</a>
</p>
</div>


## 💻 Finetuning Code
### 1. Create a conda environment and install PyTorch
```bash
conda create -n OmniCap python=3.9
conda activate OmniCap
```
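
Step 1 creates the environment but does not itself install PyTorch. A minimal sketch of that install, assuming a CUDA 12.1 wheel (adjust the index URL and versions to your hardware; this command is not part of the repository):

```bash
# Assumed PyTorch install step; pick the wheel index that matches your CUDA version.
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```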
### 2. Install dependencies
```bash
pip install -r requirements.txt
```
### 3. Install flash-attn
```bash
pip install flash-attn --no-build-isolation
```
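
flash-attn is compiled against your local CUDA toolchain, so a quick sanity check after installation helps catch a broken build early. A small check, assuming the package imports as `flash_attn` and exposes `__version__` (true for current releases, but verify against your installed version):

```bash
# Fails immediately if the flash-attn build is broken or missing.
python -c "import flash_attn; print(flash_attn.__version__)"
```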
### 4. Prepare data
You can place the links to your data files in `./data/caption_data.yaml`.

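Check the repository for the expected structure of `caption_data.yaml`; the sketch below only illustrates the general idea, and every field name and path in it is a hypothetical placeholder rather than the actual schema:

```bash
# Hypothetical example only: field names and paths are assumptions, not the repository's schema.
cat > ./data/caption_data.yaml <<'EOF'
datasets:
  - json_path: /path/to/your_caption_annotations.json   # assumed annotation file
    sampling_rate: 1.0                                   # assumed per-dataset weight
EOF
```
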
### 5. Start finetuning
```bash
bash scripts/finetune_caption_slurm.sh
```
## 🚀 Inference Code

You can caption an image in the AIGC style with the following command:

```bash
CUDA_VISIBLE_DEVICES=0 python src/inference_single_image.py \
    --model_path your_model_path \
    --image_path your_image_path \
    --image_type aigc
```

You can caption an image in the OCR style with the following command:

```bash
CUDA_VISIBLE_DEVICES=0 python src/inference_single_image.py \
    --model_path your_model_path \
    --image_path your_image_path \
    --image_type ocr
```
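
To caption a whole folder rather than a single image, the same script can be wrapped in a shell loop. Only the script path and flags come from the commands above; the directory layout and glob pattern are assumptions:

```bash
# Assumed layout: the images to caption live in ./images/ as .jpg files.
for img in ./images/*.jpg; do
    CUDA_VISIBLE_DEVICES=0 python src/inference_single_image.py \
        --model_path your_model_path \
        --image_path "$img" \
        --image_type aigc
    # Swap --image_type to ocr (or another supported style) as needed.
done
```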
## 🚀 Evaluation Code with LLM

```bash
cd VLMEvalkit
conda create -n VLMEvalkit python=3.9
conda activate VLMEvalkit
pip install -e .

CUDA_VISIBLE_DEVICES=0 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-qwen2-5-3B --verbose > output_omnicap_qwen2-5-3B_MMMU_DEV_VAL.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-qwen2-5-7B --verbose > output_omnicap_qwen2-5-7B_MMMU_DEV_VAL.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-qwen2-5-32B --verbose > output_omnicap_qwen2-5-32B_MMMU_DEV_VAL.log 2>&1 &

CUDA_VISIBLE_DEVICES=0 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-deepseek-distill-7B --verbose > output_omnicap_deepseek_distill_7B_MMMU_DEV_VAL.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-deepseek-distill-32B --verbose > output_omnicap_deepseek_distill_32B_MMMU_DEV_VAL.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-deepseek-distill-70B --verbose > output_omnicap_deepseek_distill_70B_MMMU_DEV_VAL.log 2>&1 &
```
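
Each evaluation is launched in the background with `nohup`, so progress goes to the per-model log file named in the command; you can follow a run while it executes, for example:

```bash
# Stream the log of the 3B Qwen2.5 run as it is written.
tail -f output_omnicap_qwen2-5-3B_MMMU_DEV_VAL.log
```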


## Citation

If you find the provided code or models useful for your research, please consider citing:
```

```
