|
---
base_model: Qwen/Qwen-VL-Chat
---
|
|
|
# Lumixion-e1-70k-fncall-qlora |
|
|
|
Lumixion is the first broadly available family of multimodal function-calling models. This first iteration was finetuned from Qwen-VL-Chat on 70k+ samples using QLoRA along with several other training optimizations.
|
If you would like to work on real-world multimodal AI, join our Discord: [LINK](https://discord.gg/a2FWEDD8HV)
|
|
|
 |
|
|
|
## Usage |
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
from transformers.generation import GenerationConfig |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("AgoraX/Lumixion-e1-70k-fncall-qlora", trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    "AgoraX/Lumixion-e1-70k-fncall-qlora",  # path to the output directory
    device_map="cuda",
    trust_remote_code=True,
).eval()
|
|
|
|
|
|
|
# 1st dialogue turn |
|
query = tokenizer.from_list_format([
    {'image': 'https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIzLTA4L3Jhd3BpeGVsX29mZmljZV8xNV9waG90b19vZl9hX2RvZ19ydW5uaW5nX3dpdGhfb3duZXJfYXRfcGFya19lcF9mM2I3MDQyZC0zNWJlLTRlMTQtOGZhNy1kY2Q2OWQ1YzQzZjlfMi5qcGc.jpg'},  # either a local path or a URL
    {'text': "What are the objects in the image? What animals are present? Are there any people in the image?"},
])
|
print("sending model to chat") |
|
response, history = model.chat(tokenizer, query=query, history=None) |
|
print(response) |
|
|
|
```
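`model.chat` returns the running conversation `history` alongside the response, so later turns can reuse it. A minimal sketch of a second turn (the follow-up question is illustrative):

```python
# 2nd dialogue turn: pass the returned history back in to keep context
response, history = model.chat(tokenizer, query='What is the dog doing?', history=history)
print(response)
```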
|
|
|
|
|
## Output
|
``` |
|
[FUNCTION CALL]
{
    'type': 'object',
    'properties': {
        'objects': {
            'type': 'array',
            'description': 'The objects present in the image.',
            'items': {
                'type': 'string',
                'enum': ['dog', 'person', 'tree', 'path', 'sun']
            }
        },
        'animals': {
            'type': 'array',
            'description': 'The animals present in the image.',
            'items': {
                'type': 'string',
                'enum': ['dog']
            }
        },
        'people': {
            'type': 'boolean',
            'description': 'Whether there are people in the image.',
            'enum': [true]
        }
    }
}

[EXPECTED OUTPUT]
{
    'objects': ['dog', 'person', 'tree', 'path', 'sun'],
    'animals': ['dog'],
    'people': true
}
|
|
|
``` |
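The response text uses Python-style dicts (single quotes) with JSON booleans, so it is not directly parseable with `json.loads`. Below is a hypothetical helper, assuming the `[FUNCTION CALL]` and `[EXPECTED OUTPUT]` markers appear verbatim as above:

```python
import ast
import re

def parse_fncall_response(text):
    """Hypothetical parser: split on the [FUNCTION CALL] / [EXPECTED OUTPUT]
    markers and evaluate each block as a Python literal."""
    schema_src = text.split('[FUNCTION CALL]')[1].split('[EXPECTED OUTPUT]')[0]
    output_src = text.split('[EXPECTED OUTPUT]')[1]

    def to_py(s):
        # Map JSON literals to Python ones so ast.literal_eval accepts them
        s = re.sub(r'\btrue\b', 'True', s)
        s = re.sub(r'\bfalse\b', 'False', s)
        return re.sub(r'\bnull\b', 'None', s)

    return ast.literal_eval(to_py(schema_src).strip()), ast.literal_eval(to_py(output_src).strip())

schema, output = parse_fncall_response(response)
print(output['animals'])  # ['dog'] for the example above
```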
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
|
|
|
|
|
|
|
- **Developed by:** Agora Research |
|
- **Model type:** Vision Language Model |
|
- **Language(s) (NLP):** English/Chinese |
|
- **Finetuned from model:** Qwen-VL-Chat |
|
|
|
### Model Sources
|
|
|
|
|
|
- **Repository:** https://github.com/QwenLM/Qwen-VL |
|
- **Paper:** https://arxiv.org/pdf/2308.12966.pdf |
|
|
|
## Uses

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

# Note: the default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    "MODEL_PATH_HERE",  # path to the output directory
    device_map="cuda",
    trust_remote_code=True,
).eval()

# Specify hyperparameters for generation (use generation_config if transformers < 4.32.0)
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

# 1st dialogue turn
query = tokenizer.from_list_format([
    {'image': 'https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIzLTA4L3Jhd3BpeGVsX29mZmljZV8xNV9waG90b19vZl9hX2RvZ19ydW5uaW5nX3dpdGhfb3duZXJfYXRfcGFya19lcF9mM2I3MDQyZC0zNWJlLTRlMTQtOGZhNy1kY2Q2OWQ1YzQzZjlfMi5qcGc.jpg'},  # either a local path or a URL
    {'text': "What are the objects in the image? What animals are present? Are there any people in the image?"},
])
print("sending model to chat")
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```
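If you would rather set decoding hyperparameters explicitly than load the upstream generation config, the fields can be assigned directly. A sketch with illustrative values (not tuned defaults for this model):

```python
from transformers.generation import GenerationConfig

# Illustrative sampling settings; adjust for your workload
model.generation_config = GenerationConfig(
    max_new_tokens=512,
    do_sample=True,
    top_p=0.8,
    temperature=0.7,
)
```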
|
|
|
Running the dialogue turn above prints the same `[FUNCTION CALL]` schema and `[EXPECTED OUTPUT]` pair shown in the Output section earlier.
|
### Direct Use |
|
|
|
Send the model an image together with one or more questions in the text; it replies with a function-call schema and the expected output.
|
|
|
### Recommendations |
|
|
|
|
|
|
transformers >= 4.32.0 is recommended.
|
|
|
## How to Get Started with the Model |
|
```python
query = tokenizer.from_list_format([
    {'image': 'https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIzLTA4L3Jhd3BpeGVsX29mZmljZV8xNV9waG90b19vZl9hX2RvZ19ydW5uaW5nX3dpdGhfb3duZXJfYXRfcGFya19lcF9mM2I3MDQyZC0zNWJlLTRlMTQtOGZhNy1kY2Q2OWQ1YzQzZjlfMi5qcGc.jpg'},  # either a local path or a URL
    {'text': "QUESTIONS/QUERIES GO HERE"},
])
```
|
## Training Details |
|
|
|
### Training Data |
|
|
|
A custom function-calling dataset with 70k examples; a hypothetical record layout is sketched below.
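The dataset itself has not been released. Judging from the output format above, each record presumably pairs an image and a query with a `[FUNCTION CALL]`/`[EXPECTED OUTPUT]` target; the record below is purely hypothetical:

```python
# Purely hypothetical record layout inferred from the output format above;
# the actual dataset schema has not been released.
example_record = {
    'image': 'path/or/url/to/image.jpg',
    'query': 'What animals are present?',
    'target': (
        "[FUNCTION CALL]\n"
        "{'type': 'object', 'properties': {'animals': {'type': 'array', "
        "'items': {'type': 'string', 'enum': ['dog']}}}}\n"
        "[EXPECTED OUTPUT]\n"
        "{'animals': ['dog']}"
    ),
}
```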
|
|
|
### Training Procedure |
|
|
|
QLoRA for 3 epochs; a rough sketch of the setup is shown below.
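The exact training script and hyperparameters are not published. The following is a minimal QLoRA sketch using `peft` and `bitsandbytes`; the rank, alpha, and target module names are assumptions, not the values used for this run:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the frozen base weights: the "Q" in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections; r/alpha/targets are illustrative
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "attn.c_proj"],  # assumed Qwen-VL module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```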
|
|