moonshotai/Kimi-VL-A3B-Thinking

Image-Text-to-Text

feature-extraction


Nealeon committed 18 days ago

Commit edb5925 · verified · 1 Parent(s): 17bce8a

Update README.md

Files changed (1)

README.md +9 -0

README.md CHANGED

@@ -10,6 +10,15 @@ library_name: transformers
   <img width="30%" src="figures/logo.png">
 </div>
+<div align="center">
+  <a href="https://arxiv.org/abs/2504.07491">
+    <b>📄 Tech Report</b>
+  </a> &nbsp;|&nbsp;
+  <a href="https://github.com/MoonshotAI/Kimi-VL">
+    <b>📄 Github</b>
+  </a> &nbsp;|&nbsp;
+  <a href="https://huggingface.co/spaces/moonshotai/Kimi-VL-A3B-Thinking/">💬 Chat Web</a>
+</div>
 ## Introduction
 We present **Kimi-VL**, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers **advanced multimodal reasoning, long-context understanding, and strong agent capabilities**—all while activating only **2.8B** parameters in its language decoder (Kimi-VL-A3B).