---
base_model: Qwen/Qwen-VL-Chat
---

# Lumixion-e1-70k-fncall-qlora 

Lumixion is the first family of multi-modal function-calling models that is readily available for use. This first iteration is fine-tuned on 70k+ samples with QLoRA and other optimizations.
If you would like to work on real-world multi-modal AI, join our Discord: [LINK](https://discord.gg/a2FWEDD8HV)

![IMG](img.webp)

## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("AgoraX/Lumixion-e1-70k-fncall-qlora", trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    "AgoraX/Lumixion-e1-70k-fncall-qlora",  # Hub repo ID, or a local path to a fine-tuned output directory
    device_map="cuda",
    trust_remote_code=True,
).eval()



# 1st dialogue turn
query = tokenizer.from_list_format([
    {'image': 'https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIzLTA4L3Jhd3BpeGVsX29mZmljZV8xNV9waG90b19vZl9hX2RvZ19ydW5uaW5nX3dpdGhfb3duZXJfYXRfcGFya19lcF9mM2I3MDQyZC0zNWJlLTRlMTQtOGZhNy1kY2Q2OWQ1YzQzZjlfMi5qcGc.jpg'},  # Either a local path or a URL
    {'text': "What are the objects in the image? What animals are present? Are there any people in the image?"},
])
print("sending model to chat")
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```


## Output
```
[FUNCTION CALL]
{{
  'type': 'object',
  'properties': {{
    'objects': {{
      'type': 'array',
      'description': 'The objects present in the image.',
      'items': {{
        'type': 'string',
        'enum': ['dog', 'person', 'tree', 'path', 'sun']
      }}
    }},
    'animals': {{
      'type': 'array',
      'description': 'The animals present in the image.',
      'items': {{
        'type': 'string',
        'enum': ['dog']
      }}
    }},
    'people': {{
      'type': 'boolean',
      'description': 'Whether there are people in the image.',
      'enum': [true]
    }}
  }}
}}

[EXPECTED OUTPUT]
{{
  'objects': ['dog', 'person', 'tree', 'path', 'sun'],
  'animals': ['dog'],
  'people': true
}}

```
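
The response shown above is not strict JSON: braces are doubled and strings use single quotes. The following is a minimal sketch of one way to turn such a response into Python dictionaries, assuming every response follows exactly this layout. The `parse_response` helper is hypothetical and is not part of the model's API.

```python
import json
import re


def parse_response(response: str) -> dict:
    """Parse the schema and values sections of a response (hypothetical helper)."""
    # Collapse doubled braces, e.g. "{{" -> "{".
    text = response.replace("{{", "{").replace("}}", "}")
    # Naive quote conversion: this breaks if a string value contains an apostrophe.
    text = text.replace("'", '"')
    match = re.search(
        r"\[FUNCTION CALL\](.*?)\[EXPECTED OUTPUT\](.*)", text, re.DOTALL
    )
    if match is None:
        raise ValueError("response does not contain both section markers")
    schema_text, values_text = match.groups()
    return {"schema": json.loads(schema_text), "values": json.loads(values_text)}
```

Calling `parse_response(response)` on the string returned by `model.chat` would then give a dictionary with a `schema` entry and a `values` entry. Responses that deviate from the layout raise an error rather than being silently misread.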











## Model Details

### Model Description




- **Developed by:** Agora Research
- **Model type:** Vision Language Model
- **Language(s) (NLP):** English/Chinese
- **Finetuned from model:** Qwen-VL-Chat

### Model Sources


- **Repository:** https://github.com/QwenLM/Qwen-VL
- **Paper:** https://arxiv.org/pdf/2308.12966.pdf

## Uses
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    "MODEL_PATH_HERE",  # path to the output directory
    device_map="cuda",
    trust_remote_code=True,
).eval()

# Specify hyperparameters for generation (only needed if transformers < 4.32.0)
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

# 1st dialogue turn
query = tokenizer.from_list_format([
    {'image': 'https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIzLTA4L3Jhd3BpeGVsX29mZmljZV8xNV9waG90b19vZl9hX2RvZ19ydW5uaW5nX3dpdGhfb3duZXJfYXRfcGFya19lcF9mM2I3MDQyZC0zNWJlLTRlMTQtOGZhNy1kY2Q2OWQ1YzQzZjlfMi5qcGc.jpg'},  # Either a local path or a URL
    {'text': "What are the objects in the image? What animals are present? Are there any people in the image?"},
])
print("sending model to chat")
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```

### Direct Use

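Pass an image (a local path or a URL) together with a question in the text field. As in the example output, the model responds with a function-call schema followed by the filled-in values.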

### Recommendations

We recommend `transformers >= 4.32.0`.

## How to Get Started with the Model
```python
query = tokenizer.from_list_format([
    {'image': 'https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIzLTA4L3Jhd3BpeGVsX29mZmljZV8xNV9waG90b19vZl9hX2RvZ19ydW5uaW5nX3dpdGhfb3duZXJfYXRfcGFya19lcF9mM2I3MDQyZC0zNWJlLTRlMTQtOGZhNy1kY2Q2OWQ1YzQzZjlfMi5qcGc.jpg'},  # Either a local path or a URL
    {'text': "QUESTIONS/QUERIES GO HERE"},
])
```
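
The template above can be wrapped in a small helper so that the image and the question are supplied as arguments. This is a sketch only: `build_query_items` and the `dog_park.jpg` path are hypothetical names introduced for illustration.

```python
def build_query_items(image: str, question: str) -> list:
    """Return the list-format items for one image plus one text question."""
    if not image or not question.strip():
        raise ValueError("both an image path/URL and a question are required")
    # One dict per item: the image first, then the question text.
    return [{"image": image}, {"text": question}]


# "dog_park.jpg" is a placeholder path, not a file shipped with the model.
items = build_query_items("dog_park.jpg", "What animals are present?")
print(items)
```

The resulting list has the same shape as the argument to `tokenizer.from_list_format` in the snippet above, so it can be passed there directly.
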
## Training Details

### Training Data

A custom function-calling dataset with 70k examples.

### Training Procedure 

QLoRA fine-tuning for 3 epochs.