ingridtv's Collections

Multimodal/VLM

updated 10 days ago

  • microsoft/Phi-4-multimodal-instruct

    Automatic Speech Recognition • Updated May 1 • 436k • 1.41k

  • microsoft/Phi-4-mini-instruct

    Text Generation • Updated May 1 • 343k • 489

  • SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

    Paper • 2503.11576 • Published Mar 14 • 108

  • Emerging Properties in Unified Multimodal Pretraining

    Paper • 2505.14683 • Published 12 days ago • 124

  • google/medgemma-4b-it

    Image-Text-to-Text • Updated 11 days ago • 28.3k • 296

  • kelkalot/medgemma-4b-it-GGUF

    Updated 11 days ago • 156 • 1