nielsr (HF staff) committed
Commit 4e8d1f3 · verified · 1 Parent(s): e300d83

Improve Metadata and add Paper/Github links

This PR improves the metadata by adding the `datasets` tag, correcting the `pipeline_tag` to `image-text-to-text`, and including the `library_name`. It also adds links to the paper and the GitHub repository.

Files changed (1)
  1. README.md +12 -5
README.md CHANGED
@@ -1,14 +1,19 @@
  ---
- license: apache-2.0
- language:
- - en
  base_model:
  - OpenGVLab/InternVL2_5-8B
- pipeline_tag: visual-question-answering
+ language:
+ - en
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ datasets:
+ - ayeshaishaq/DriveLMMo1
  ---

  **DriveLMM-o1: A Large Multimodal Model for Autonomous Driving Reasoning**

+ [Paper](https://arxiv.org/abs/2503.10621)
+
  DriveLMM-o1 is a fine-tuned large multimodal model designed for autonomous driving. Built on InternVL2.5-8B with LoRA-based adaptation, it leverages stitched multiview images to produce step-by-step reasoning. This structured approach enhances both final decision accuracy and interpretability in complex driving tasks like perception, prediction, and planning.

  **Key Features:**
@@ -57,6 +62,8 @@ tokenizer = AutoTokenizer.from_pretrained(

  For detailed usage instructions and additional configurations, please refer to the [OpenGVLab/InternVL2_5-8B](https://huggingface.co/OpenGVLab/InternVL2_5-8B) repository.

+ Code: [https://github.com/Vision-CAIR/DriveLMM](https://github.com/Vision-CAIR/DriveLMM)
+
  **Limitations:**
- While DriveLMM-o1 demonstrates strong performance in autonomous driving tasks, it is fine-tuned for domain-specific reasoning. Users may need to further fine-tune or adapt the model for different driving environments.
+ While DriveLMM-o1 demonstrates strong performance in autonomous driving tasks, it is fine-tuned for domain-specific reasoning. Users may need to further fine-tune or adapt the model for different driving environments.
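
Since the updated metadata declares `library_name: transformers` and the model is built on InternVL2.5-8B, loading the checkpoint should follow the base model's remote-code pattern. The snippet below is a minimal sketch under that assumption, not the authoritative usage: the repository ID is a placeholder, and the preprocessing and `model.chat(...)` helper come from the InternVL2.5 remote code documented on the [OpenGVLab/InternVL2_5-8B](https://huggingface.co/OpenGVLab/InternVL2_5-8B) card.

```python
# Minimal loading sketch, assuming DriveLMM-o1 keeps the InternVL2.5 remote-code
# interface of its base model. The repo ID below is a placeholder, not confirmed
# by this PR.
import torch
from transformers import AutoModel, AutoTokenizer

path = "<org>/DriveLMM-o1"  # placeholder: replace with the actual model repo ID

model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,  # InternVL2.5 ships custom modeling code
).eval().cuda()

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

# Inference on a stitched multiview driving image would then go through the base
# model's `model.chat(tokenizer, pixel_values, question, generation_config)`
# helper; see the OpenGVLab/InternVL2_5-8B card for the image preprocessing code.
```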