s-emanuilov's Collections
Multimodal models

updated Jan 16

Papers on AI models that combine vision and language capabilities.

  • LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

    Paper • 2501.03895 • Published Jan 7 • 53

  • LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

    Paper • 2501.06186 • Published Jan 10 • 66

  • Multimodal LLMs Can Reason about Aesthetics in Zero-Shot

    Paper • 2501.09012 • Published Jan 15 • 10