---
base_model: unsloth/pixtral-12b-2409-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- pixtral
- trl
- qlora
license: apache-2.0
language:
- en
datasets:
- unsloth/llava-instruct-mix-vsft-mini
---

# Uploaded model

- **Developed by:** MMoshtaghi
- **License:** apache-2.0
- **Finetuned from model:** unsloth/pixtral-12b-2409-unsloth-bnb-4bit
- **Finetuned on dataset:** [unsloth/llava-instruct-mix-vsft-mini](https://huggingface.co/datasets/unsloth/llava-instruct-mix-vsft-mini)
- **PEFT method:** [Quantized LoRA](https://huggingface.co/papers/2305.14314)
                       
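For intuition, QLoRA keeps the quantized base weights frozen and trains only a low-rank update: the effective projection becomes W + (alpha/r)·B·A. The NumPy sketch below illustrates that update in isolation; the shapes, rank, and scaling here are illustrative toy values, not this adapter's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 16, 4, 8   # toy sizes, not Pixtral's

W = rng.standard_normal((d_out, d_in))  # frozen (quantized) base weight
A = rng.standard_normal((r, d_in))      # trainable, rank r
B = np.zeros((d_out, r))                # zero-initialized, so the adapter
                                        # starts as a no-op

def lora_forward(x, W, A, B, alpha, r):
    """Base projection plus the scaled low-rank LoRA update."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((2, d_in))
# With B = 0 the adapter contributes nothing:
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)

# After training B is non-zero, and the update can be merged back
# into a single dense weight for deployment:
B = rng.standard_normal((d_out, r))
W_merged = W + (alpha / r) * (B @ A)
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W_merged.T)
```

The merge step is why LoRA adapters add no inference latency once folded into the base weights.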
## Quick start

```python
from datasets import load_dataset
from transformers import TextStreamer
from unsloth import FastVisionModel

# Load the 4-bit base model together with this LoRA adapter
model, tokenizer = FastVisionModel.from_pretrained(
    model_name = "MMoshtaghi/Pixtral-12B-2409-LoRAAdpt-General",
    load_in_4bit = True,
)
FastVisionModel.for_inference(model)  # Enable inference mode

# Grab a sample image from the finetuning dataset
dataset = load_dataset("unsloth/llava-instruct-mix-vsft-mini", split = "train")
image = dataset[2]["images"][0]
instruction = "Is there something interesting about this image?"

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction},
    ]},
]
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt = True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens = False,
    return_tensors = "pt",
).to("cuda")

# Stream the generated answer token by token
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64,
                   use_cache = True, temperature = 1.5, min_p = 0.1)
```
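The `generate` call pairs a high temperature (1.5) with min-p sampling (`min_p = 0.1`): at each step, tokens whose probability falls below 0.1 times the most likely token's probability are discarded before sampling, which keeps high-temperature output diverse without admitting junk tokens. A standalone NumPy sketch of that filter (an illustration of the technique, not Unsloth's or Transformers' implementation):

```python
import numpy as np

def min_p_filter(probs, min_p=0.1):
    """Zero out tokens below min_p * max(probs), then renormalize."""
    probs = np.asarray(probs, dtype=float)
    threshold = min_p * probs.max()
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()

probs = np.array([0.5, 0.3, 0.15, 0.04, 0.01])
filtered = min_p_filter(probs, min_p=0.1)
# Threshold is 0.1 * 0.5 = 0.05, so the last two tokens are dropped:
assert filtered[3] == 0.0 and filtered[4] == 0.0
assert abs(filtered.sum() - 1.0) < 1e-9
```

Because the cutoff scales with the top token's probability, the filter adapts per step: it prunes aggressively when the model is confident and permissively when the distribution is flat.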

### Framework versions

- TRL: 0.13.0
- Transformers: 4.47.1
- Pytorch: 2.5.1+cu121
- Datasets: 3.2.0
- Tokenizers: 0.21.0
- Unsloth: 2025.1.5

## Training procedure

The training run was tracked with Weights & Biases (login required to view):

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/open_ai/huggingface/runs/rvj0a631)


## Citations
This VLM was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.