---
license: mit
language:
- en
library_name: transformers
tags:
- Llava
- Multimodal
- Image-Text-to-Text
- FineTuned
- Vision
---

# Model Details
This model is a fine-tuned version of LLaVA-v1.5-7B, adapted to a custom Historical Paintings Dataset. The fine-tuning process used PEFT (Parameter-Efficient Fine-Tuning) with LoRA and DeepSpeed to reduce the number of trainable parameters and make efficient use of GPU resources.

## How to use?
The folder 'llava-v1.5-7b-task-lora' contains the LoRA weights, and the folder 'llava-ftmodel' contains the merged model weights and configuration.
- To use the model, first clone the LLaVA repository:
```bash
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
```
- Place the folder 'llava-ftmodel' (this repo) in the 'LLaVA' directory.
- Make sure the installed transformers version is 4.37.2.
- Place 'test.jpg' from this repo in the 'LLaVA' directory (to use it as a test image).
- Run the following command:
```bash
python -m llava.serve.cli --model-path 'llava-ftmodel' --image-file 'test.jpg'
```
The model will then prompt for human input; type 'Describe this image' or 'What is depicted in this figure?' and hit enter. Enjoy! A programmatic alternative to the CLI is sketched below.

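If you prefer to run inference from Python rather than the CLI, the LLaVA codebase exposes quick-start helpers. The following is a minimal sketch based on that API, assuming it is run from inside the cloned 'LLaVA' directory with the folders laid out as above; the query and generation settings are illustrative.

```python
# Minimal sketch: programmatic inference with the merged weights, using
# the LLaVA repository's quick-start helpers. Run from inside the cloned
# 'LLaVA' directory with transformers 4.37.2 installed.
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "llava-ftmodel"  # merged weights from this repo

args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "Describe this image",  # illustrative prompt
    "conv_mode": None,
    "image_file": "test.jpg",        # test image from this repo
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)  # prints the model's answer to stdout
```
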
## Intended Use
The fine-tuned LLaVA model is designed for tasks related to historical paintings, such as image captioning, visual question answering, and multimodal understanding. It can be used by researchers, historians, and enthusiasts interested in exploring and analyzing historical artworks.

## Fine-Tuning Procedure
The model was fine-tuned on 8 NVIDIA A40 GPUs, each with 48 GB of VRAM. Training leveraged PEFT LoRA and DeepSpeed to optimize the use of GPU resources and minimize the number of trainable parameters. Once the LoRA weights were trained, they were merged into the original model weights. After fine-tuning, the model achieved a final loss value of 0.11.

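As a rough illustration of this approach (not the exact training code), the sketch below shows how a PEFT LoRA adapter is typically configured and how the LLaVA builder can merge a LoRA checkpoint back into the base weights. The rank, alpha, and target modules are placeholders; the actual hyperparameters for this model are not published in this card.

```python
# Illustrative sketch of PEFT LoRA setup and merging; hyperparameters
# below are placeholders, not the values used for this model.
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from peft import LoraConfig, get_peft_model

# Load the base model through the LLaVA builder.
base_path = "liuhaotian/llava-v1.5-7b"
tokenizer, base_model, image_processor, ctx_len = load_pretrained_model(
    model_path=base_path,
    model_base=None,
    model_name=get_model_name_from_path(base_path),
)

# Attach a LoRA adapter: the 7B base weights stay frozen and only the
# small adapter matrices are trained.
lora_config = LoraConfig(
    r=16,                                  # placeholder adapter rank
    lora_alpha=32,                         # placeholder scaling factor
    target_modules=["q_proj", "v_proj"],   # placeholder target layers
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # reports the small trainable fraction

# ... the training loop itself ran under DeepSpeed across the 8 A40s ...

# After training, the LLaVA builder merges a checkpoint whose name
# contains "lora" into the base weights, yielding standalone weights
# like those in 'llava-ftmodel':
tokenizer, merged, _, _ = load_pretrained_model(
    model_path="llava-v1.5-7b-task-lora",  # LoRA weights from this repo
    model_base=base_path,
    model_name=get_model_name_from_path("llava-v1.5-7b-task-lora"),
    device_map="cpu",
)
merged.save_pretrained("llava-ftmodel")
tokenizer.save_pretrained("llava-ftmodel")
```
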
## Performance
The fine-tuned LLaVA model has demonstrated improved performance on tasks related to historical paintings compared to the original LLaVA-v1.5-7B model. However, exact performance metrics and benchmarks are not provided in this model card.

### Limitations and Biases
As with any large model, the fine-tuned LLaVA model may exhibit biases present in its training data, including historical, cultural, or societal biases. Additionally, the model's performance may be limited by the quality and diversity of the Historical Paintings Dataset used for fine-tuning.

### Ethical Considerations
Users of this model should be aware of potential ethical implications, such as the use of historical artworks without proper attribution or consent. It is essential to respect intellectual property rights and ensure that any generated content or analyses are used responsibly and respectfully.