U4R

BoZhang committed · verified
Commit 1753ff2 · 1 Parent(s): a060e73

Upload README.md

Files changed (1)
  1. README.md +52 -15

README.md CHANGED
@@ -13,27 +13,64 @@

## 💻 Finetuning Code

- Coming Soon
-

## 🚀 Inference Code

- - Python >= 3.10.0 (Recommend to use [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html))
- - [PyTorch >= 2.0.1+cu12.1](https://pytorch.org/)

- ```bash
- git clone https://github.com/NVlabs/Sana.git
- cd Sana
- ./environment_setup.sh sana
- ```
- - Prepare the prompts in asset/samples/samples.txt

- ```
- python scripts/inference.py \
- --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
- --model_path=hf://U4R/Sana_trainwithOmnicap/sana_omnicaptioner.pth
- ```

## Citation
 
## 💻 Finetuning Code
+ ### 1. Create a conda environment and install PyTorch
+ ```bash
+ conda create -n OmniCap python=3.9
+ conda activate OmniCap
+ ```
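Step 1's heading mentions installing PyTorch, but the commands above only create and activate the environment. A minimal sketch of the missing install, assuming a CUDA 12.1 build (the exact versions and CUDA tag are assumptions; pick the build that matches your driver and the repository's `requirements.txt`):

```bash
# Assumption: CUDA 12.1 wheels; choose the index URL that matches your CUDA/driver setup
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# Sanity check that a GPU-enabled build was installed
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```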
+ ### 2. Install dependencies
+ ```bash
+ pip install -r requirements.txt
+ ```
+ ### 3. Install flash-attn
+ ```bash
+ pip install flash-attn --no-build-isolation
+ ```
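`flash-attn` builds CUDA extensions, so it is worth confirming the wheel actually imports before moving on; a quick check, assuming the standard `flash_attn` module name:

```bash
# Verify that flash-attn built correctly and report its version
python -c "import flash_attn; print(flash_attn.__version__)"
```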
+ ### 4. Prepare data
+ You can place the links to your data files in `./data/caption_data.yaml`.
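The schema of `./data/caption_data.yaml` is not shown in this diff, so the sketch below is purely hypothetical: it only illustrates the general idea of listing annotation files and image folders, and the real field names must be taken from the repository's data-loading code.

```bash
# Hypothetical layout for ./data/caption_data.yaml -- field names are illustrative, not the repo's actual schema
cat > ./data/caption_data.yaml <<'EOF'
datasets:
  - annotation_path: /path/to/your_captions.jsonl
    image_dir: /path/to/your_images
    sampling_rate: 1.0
EOF
```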

+ ### 5. Start finetuning
+ ```bash
+ bash scripts/finetune_caption_slurm.sh
+ ```
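The launch details live inside `scripts/finetune_caption_slurm.sh`, which is not part of this diff. Independently of that script, a generic pre-flight check that GPUs are visible to PyTorch can save a failed job (not specific to this repository):

```bash
# Generic pre-flight check before launching finetuning
nvidia-smi
python -c "import torch; print('GPUs visible to PyTorch:', torch.cuda.device_count())"
```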
## 🚀 Inference Code

+ You can caption an image in AIGC style with the following command:
+
+ ```bash
+ CUDA_VISIBLE_DEVICES=0 python src/inference_single_image.py \
+     --model_path your_model_path \
+     --image_path your_image_path \
+     --image_type aigc
+ ```
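The command above captions a single image. To caption a whole folder, a plain bash loop over the same script is enough; the sketch below assumes `.jpg` inputs and that the script prints the caption to stdout, which may not match the script's actual output behaviour.

```bash
# Caption every .jpg in a folder with AIGC style (stdout redirection is an assumption)
for img in /path/to/images/*.jpg; do
  CUDA_VISIBLE_DEVICES=0 python src/inference_single_image.py \
    --model_path your_model_path \
    --image_path "$img" \
    --image_type aigc > "${img%.jpg}.caption.txt"
done
```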
+
+ You can caption an image in OCR style with the following command:
+
+ ```bash
+ CUDA_VISIBLE_DEVICES=0 python src/inference_single_image.py \
+     --model_path your_model_path \
+     --image_path your_image_path \
+     --image_type ocr
+ ```
+ ## 🚀 Evaluation Code with LLM
+
+ ```bash
+ cd VLMEvalkit
+ conda create -n VLMEvalkit python=3.9
+ conda activate VLMEvalkit
+ pip install -e .
+
+ CUDA_VISIBLE_DEVICES=0 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-qwen2-5-3B --verbose > output_omnicap_qwen2-5-3B_MMMU_DEV_VAL.log 2>&1 &
+ CUDA_VISIBLE_DEVICES=0,1 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-qwen2-5-7B --verbose > output_omnicap_qwen2-5-7B_MMMU_DEV_VAL.log 2>&1 &
+ CUDA_VISIBLE_DEVICES=0,1,2,3 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-qwen2-5-32B --verbose > output_omnicap_qwen2-5-32B_MMMU_DEV_VAL.log 2>&1 &
+
+ CUDA_VISIBLE_DEVICES=0 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-deepseek-distill-7B --verbose > output_omnicap_deepseek_distill_7B_MMMU_DEV_VAL.log 2>&1 &
+ CUDA_VISIBLE_DEVICES=0,1 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-deepseek-distill-32B --verbose > output_omnicap_deepseek_distill_32B_MMMU_DEV_VAL.log 2>&1 &
+ CUDA_VISIBLE_DEVICES=0,1,2,3 nohup python run.py --data MMMU_DEV_VAL --model Omnicaptioner-deepseek-distill-70B --verbose > output_omnicap_deepseek_distill_70B_MMMU_DEV_VAL.log 2>&1 &
+ ```
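Because the evaluation runs are backgrounded with `nohup`, nothing appears in the terminal; the usual way to monitor them is to follow the log files named in the commands above:

```bash
# Follow one of the evaluation logs and confirm the run is still alive
tail -f output_omnicap_qwen2-5-3B_MMMU_DEV_VAL.log
ps aux | grep run.py
```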

## Citation