Article: Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • 4 days ago
Collection: Shot categorizer • Fine-tune of Florence-2 to generate shot categories, useful for data curation. Code: https://github.com/huggingface/movie-shot-categorizer • 3 items • Updated 10 days ago
Collection: C4AI Aya Vision • Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 12 days ago
Article: A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality • 12 days ago
Article: HuggingFace, IISc partner to supercharge model building on India's diverse languages • 17 days ago
Collection: Phi-4 • Phi-4 family of small language and multimodal models. • 7 items • Updated 13 days ago
Paper: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • arXiv 2502.14786 • Published 24 days ago
Article: PaliGemma 2 Mix - New Instruction Vision Language Models by Google • 25 days ago
Article: ColPali: Efficient Document Retrieval with Vision Language Models 👀 • By manu • Jul 5, 2024
Article: Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 • 26 days ago