Improve Metadata and add Paper/GitHub links
This PR improves the metadata by adding the `datasets` tag, correcting the `pipeline_tag` to `image-text-to-text`, and including the `library_name`. It also adds links to the paper and the GitHub repository.
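Since `library_name: transformers` and `pipeline_tag: image-text-to-text` tell the Hub how the model is meant to be loaded, here is a minimal loading sketch. It assumes this model follows the same remote-code loading convention as its base, OpenGVLab/InternVL2_5-8B, and it uses the base repo id only as a placeholder because this model repository's id is not shown in the diff below:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder repo id: substitute the id of this model repository on the Hub.
path = "OpenGVLab/InternVL2_5-8B"

# InternVL2.5-style checkpoints ship custom modeling code, hence trust_remote_code=True.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
```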
README.md
CHANGED

@@ -1,14 +1,19 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - OpenGVLab/InternVL2_5-8B
-
+language:
+- en
+license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
+datasets:
+- ayeshaishaq/DriveLMMo1
 ---

 **DriveLMM-o1: A Large Multimodal Model for Autonomous Driving Reasoning**

+[Paper](https://arxiv.org/abs/2503.10621)
+
 DriveLMM-o1 is a fine-tuned large multimodal model designed for autonomous driving. Built on InternVL2.5-8B with LoRA-based adaptation, it leverages stitched multiview images to produce step-by-step reasoning. This structured approach enhances both final decision accuracy and interpretability in complex driving tasks like perception, prediction, and planning.

 **Key Features:**

@@ -57,6 +62,8 @@ tokenizer = AutoTokenizer.from_pretrained(

 For detailed usage instructions and additional configurations, please refer to the [OpenGVLab/InternVL2_5-8B](https://huggingface.co/OpenGVLab/InternVL2_5-8B) repository.

+Code: [https://github.com/Vision-CAIR/DriveLMM](https://github.com/Vision-CAIR/DriveLMM)
+

 **Limitations:**
 While DriveLMM-o1 demonstrates strong performance in autonomous driving tasks, it is fine-tuned for domain-specific reasoning. Users may need to further fine-tune or adapt the model for different driving environments.