---
base_model: Qwen/Qwen-VL-Chat
---
# Lumixion-e1-70k-fncall-qlora
Lumixion is the first broadly available family of multi-modal function-calling models. This first iteration is fine-tuned from Qwen-VL-Chat on a 70k-example function-calling dataset with QLoRA and several other optimizations.
If you would like to work on real-world multi-modal AI, join our Discord: [LINK](https://discord.gg/a2FWEDD8HV)

## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("AgoraX/Lumixion-e1-70k-fncall-qlora", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "AgoraX/Lumixion-e1-70k-fncall-qlora",
    device_map="cuda",
    trust_remote_code=True,
).eval()

# 1st dialogue turn
query = tokenizer.from_list_format([
    {'image': 'https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIzLTA4L3Jhd3BpeGVsX29mZmljZV8xNV9waG90b19vZl9hX2RvZ19ydW5uaW5nX3dpdGhfb3duZXJfYXRfcGFya19lcF9mM2I3MDQyZC0zNWJlLTRlMTQtOGZhNy1kY2Q2OWQ1YzQzZjlfMi5qcGc.jpg'},  # either a local path or a URL
    {'text': "What are the objects in the image? What animals are present? Are there any people in the image?"},
])
print("sending model to chat")
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```
## Output
```
[FUNCTION CALL]
{{
  'type': 'object',
  'properties': {{
    'objects': {{
      'type': 'array',
      'description': 'The objects present in the image.',
      'items': {{
        'type': 'string',
        'enum': ['dog', 'person', 'tree', 'path', 'sun']
      }}
    }},
    'animals': {{
      'type': 'array',
      'description': 'The animals present in the image.',
      'items': {{
        'type': 'string',
        'enum': ['dog']
      }}
    }},
    'people': {{
      'type': 'boolean',
      'description': 'Whether there are people in the image.',
      'enum': [true]
    }}
  }}
}}
[EXPECTED OUTPUT]
{{
  'objects': ['dog', 'person', 'tree', 'path', 'sun'],
  'animals': ['dog'],
  'people': true
}}
```
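The response comes back as plain text containing a `[FUNCTION CALL]` schema block followed by an `[EXPECTED OUTPUT]` block. Below is a minimal sketch of one way to turn that text into Python objects, reusing the `response` string from the Usage snippet; it assumes the response follows the exact layout above, and the `parse_fncall_response` helper is illustrative, not part of the model's API:

```python
import ast
import re

def parse_fncall_response(text: str):
    """Split a [FUNCTION CALL] / [EXPECTED OUTPUT] response into two Python objects."""
    def extract(block: str):
        # Normalize doubled braces and JSON-style literals to Python syntax,
        # then parse with ast.literal_eval (the output uses single-quoted keys).
        block = block.replace("{{", "{").replace("}}", "}")
        block = re.sub(r"\btrue\b", "True", block)
        block = re.sub(r"\bfalse\b", "False", block)
        block = re.sub(r"\bnull\b", "None", block)
        try:
            return ast.literal_eval(block.strip())
        except (ValueError, SyntaxError):
            return None

    schema_text, _, output_text = text.partition("[EXPECTED OUTPUT]")
    schema_text = schema_text.replace("[FUNCTION CALL]", "")
    return extract(schema_text), extract(output_text)

schema, output = parse_fncall_response(response)
print(output)  # e.g. {'objects': [...], 'animals': ['dog'], 'people': True}
```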
## Model Details
### Model Description
- **Developed by:** Agora Research
- **Model type:** Vision Language Model
- **Language(s) (NLP):** English/Chinese
- **Finetuned from model:** Qwen-VL-Chat
### Model Sources
- **Repository:** https://github.com/QwenLM/Qwen-VL
- **Paper:** https://arxiv.org/pdf/2308.12966.pdf
## Uses
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

# Note: the default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "MODEL_PATH_HERE",  # path to the output directory
    device_map="cuda",
    trust_remote_code=True,
).eval()

# Specify hyperparameters for generation (generation_config if transformers < 4.32.0)
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

# 1st dialogue turn
query = tokenizer.from_list_format([
    {'image': 'https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIzLTA4L3Jhd3BpeGVsX29mZmljZV8xNV9waG90b19vZl9hX2RvZ19ydW5uaW5nX3dpdGhfb3duZXJfYXRfcGFya19lcF9mM2I3MDQyZC0zNWJlLTRlMTQtOGZhNy1kY2Q2OWQ1YzQzZjlfMi5qcGc.jpg'},  # either a local path or a URL
    {'text': "What are the objects in the image? What animals are present? Are there any people in the image?"},
])
print("sending model to chat")
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```
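For follow-up questions in the same conversation, pass the returned `history` back into `model.chat`. A minimal sketch of a second dialogue turn, following the Qwen-VL-Chat chat API (the follow-up question is just an example):

```python
# 2nd dialogue turn: reuse the history returned by the first call
response, history = model.chat(tokenizer, "Is the dog on a leash?", history=history)
print(response)
```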
Running the first-turn query prints the same `[FUNCTION CALL]` / `[EXPECTED OUTPUT]` response shown in the Output section above.
### Direct Use
Send an image together with a text question, either as a URL or a local file path, as in the usage examples above.
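For example, with a local image file (the path and question below are placeholders):

```python
query = tokenizer.from_list_format([
    {'image': '/path/to/local/image.jpg'},  # local paths work the same way as URLs
    {'text': "What animals are present in this image?"},
])
response, _ = model.chat(tokenizer, query=query, history=None)
```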
### Recommendations
transformers >= 4.32.0 is recommended.
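A quick way to check the installed version before loading the model (this check is just a convenience, not a requirement of the model itself):

```python
from packaging.version import Version
import transformers

# This card recommends transformers >= 4.32.0.
assert Version(transformers.__version__) >= Version("4.32.0"), transformers.__version__
```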
## How to Get Started with the Model
```python
query = tokenizer.from_list_format([
    {'image': 'https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIzLTA4L3Jhd3BpeGVsX29mZmljZV8xNV9waG90b19vZl9hX2RvZ19ydW5uaW5nX3dpdGhfb3duZXJfYXRfcGFya19lcF9mM2I3MDQyZC0zNWJlLTRlMTQtOGZhNy1kY2Q2OWQ1YzQzZjlfMi5qcGc.jpg'},  # either a local path or a URL
    {'text': "QUESTIONS/QUERIES GO HERE"},
])
```
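Then run the query through `model.chat` as in the usage example at the top of this card:

```python
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```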
## Training Details
### Training Data
Custom Function Calling Dataset with 70k examples
### Training Procedure
Fine-tuned with QLoRA for 3 epochs.
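For reference, a minimal sketch of what a QLoRA setup for Qwen-VL-Chat could look like with `peft` and `bitsandbytes`; the rank, alpha, and target modules below are illustrative assumptions, not the exact configuration used for this checkpoint:

```python
# Illustrative QLoRA setup; the exact hyperparameters used for Lumixion are not published here.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base weights (the "Q" in QLoRA)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat",
    quantization_config=bnb_config,
    device_map="cuda",
    trust_remote_code=True,
)

lora_config = LoraConfig(
    r=64,                                   # assumed rank
    lora_alpha=16,                          # assumed scaling
    lora_dropout=0.05,
    target_modules=["c_attn", "attn.c_proj", "w1", "w2"],  # assumed Qwen attention/MLP projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# ...then train for 3 epochs on the 70k-example function-calling dataset.
```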