---
license: mit
language:
- en
library_name: transformers
tags:
- Llava
- Multimodal
- Image-Text-to-Text
- FineTuned
- Vision
---
# Model Details
This model is a fine-tuned version of the LLaVA-v1.5-7B vision-language
model, adapted to a custom Historical Paintings Dataset. The fine-tuning
process used PEFT (Parameter-Efficient Fine-Tuning) with LoRA and DeepSpeed
to reduce the number of trainable parameters and to use GPU resources
efficiently.
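For readers unfamiliar with LoRA, the sketch below shows what a minimal PEFT LoRA setup looks like. It is illustrative only: the actual run used the LLaVA repo's training scripts (see "Fine-Tuning Procedure" below), a small stand-in base model is loaded so the snippet runs, and the rank/alpha/target-module values are assumptions, not the exact settings used here.
```python
# Illustrative LoRA setup with PEFT; NOT the exact configuration used for
# this model. A small stand-in model (facebook/opt-125m) is loaded so the
# snippet runs quickly; the real fine-tune adapted LLaVA-v1.5-7B.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(
    r=16,                                 # rank of the low-rank updates (assumed)
    lora_alpha=32,                        # scaling factor for the updates (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
# Only the LoRA adapters train; the base weights stay frozen, which is what
# keeps the trainable-parameter count low.
model.print_trainable_parameters()
```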
## Dataset
The dataset used for fine-tuning is a collection of famous historical paintings/artworks by artists such as Leonardo da Vinci and Hans von Aachen.
The dataset consists of 3k image-text pairs. A sample instance is shown below. Each instance contains an image ID as well as an image path, which LLaVA requires.
```json
{
    "id": "data_0001",
    "image": "images/dataset/1.jpg",
    "conversations": [
        {
            "from": "human",
            "value": "What is this image?"
        },
        {
            "from": "gpt",
            "value": "The Procuring Scene by Hans von Aachen is a captivating masterpiece that showcases the artists exceptional talent in depicting the nuances of human behavior and social dynamics. With remarkable attention to detail von Aachen portrays a scene of seduction and illicit liaisons subtly hinting at the undercurrents of desire and power play that permeated the elite circles of his time. Through his deft brushstrokes and skillful "
        }
    ]
},
```
## How to use
**Note** - Don't use the model through the 'Use this model' (Transformers) button on Hugging Face; instead, follow the step-by-step approach below to run inference.
The folder 'llava-v1.5-7b-task-lora' contains the LoRA weights, and the folder 'llava-ftmodel' contains the merged model weights and configuration.
- To use the model, first clone the LLaVA repository:
```bash
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
```
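- Install the LLaVA package in editable mode, as its own README describes (environment setup may differ on your machine; this is the standard sequence from that README):
```bash
# Install the LLaVA package and its dependencies (per the LLaVA README)
pip install --upgrade pip
pip install -e .
```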
- Now place the folder 'llava-ftmodel' (this repo) in the 'LLaVA' directory.
- Make sure the transformers version is 4.37.2 (for example, `pip install transformers==4.37.2`).
- Now place 'test.jpg' from this repo in the 'LLaVA' directory (to use it as a test image).
- Now run the following command:
```bash
python -m llava.serve.cli --model-path 'llava-ftmodel' --image-file 'test.jpg'
```
The model will ask for human input. Type 'Describe this image' or 'What is depicted in this figure?' and hit enter!
ENJOY!
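If you prefer scripted inference over the interactive CLI, the LLaVA repository also exposes a Python API. The sketch below follows the usage example in the LLaVA README, pointed at this model's merged weights; treat it as a starting point under those assumptions, not a tested script.
```python
# Scripted inference via the LLaVA repo's Python API (run from the LLaVA
# directory). Mirrors the usage example in the LLaVA README.
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "llava-ftmodel"  # merged weights from this repo

args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "Describe this image",
    "conv_mode": None,
    "image_file": "test.jpg",
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)
```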
## Model key metrics
- `train/global_step`: 940
- `train/train_samples_per_second`: 7.443
- `_step`: 940
- `train/loss`: 0.1388
- `train/epoch`: 5
## Intended Use
The fine-tuned LLaVA model is designed for tasks related to historical paintings, such as image captioning, visual question answering, and
multimodal understanding. It can be used by researchers, historians, and
enthusiasts interested in exploring and analyzing historical artworks.
## Fine-Tuning Procedure
The model was fine-tuned on an NVIDIA A40 GPU with 48 GB of VRAM. The training process leveraged PEFT LoRA and DeepSpeed to optimize GPU usage and minimize the number of trainable parameters. Once the new LoRA weights were trained, they were merged into the original model weights. After fine-tuning, the model reached a final training loss of 0.1388 (see the metrics above).
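For reference, here is a sketch of such a run, modeled on the LLaVA repo's scripts/v1_5/finetune_task_lora.sh and scripts/merge_lora_weights.py. The data and output paths are illustrative, and any hyperparameters not shown follow that reference script rather than the exact values used for this model.
```bash
# LoRA fine-tuning sketch, following scripts/v1_5/finetune_task_lora.sh in
# the LLaVA repo; paths are illustrative, omitted flags follow that script.
deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero3.json \
    --lora_enable True --lora_r 128 --lora_alpha 256 \
    --model_name_or_path liuhaotian/llava-v1.5-7b \
    --version v1 \
    --data_path ./data/historical_paintings.json \
    --image_folder ./images \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --num_train_epochs 5 \
    --output_dir ./checkpoints/llava-v1.5-7b-task-lora

# Merge the trained LoRA weights back into the base model
python scripts/merge_lora_weights.py \
    --model-path ./checkpoints/llava-v1.5-7b-task-lora \
    --model-base liuhaotian/llava-v1.5-7b \
    --save-model-path ./llava-ftmodel
```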
## Performance
The fine-tuned LLaVA model has demonstrated improved performance on tasks related to historical paintings compared to the original LLaVA-v1.5-7B
model. However, the exact performance metrics and benchmarks are not provided in
this model card.
### Limitations and Biases
As with any language model, the fine-tuned LLaVA model may exhibit biases present in the training data, which could include historical,
cultural, or societal biases. Additionally, the model's performance may be
limited by the quality and diversity of the Historical Paintings Dataset used
for fine-tuning.
### Ethical Considerations
Users of this model should be aware of potential ethical implications, such as the use of historical artworks without proper attribution
or consent. It is essential to respect intellectual property rights and ensure
that any generated content or analyses are used responsibly and respectfully.