Aritra Roy Gosthipaty PRO

ariG23498

https://arig23498.github.io/

AI & ML interests

Deep Representation Learning

Recent Activity

upvoted an article 4 days ago

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

upvoted an article 4 days ago

Open R1: Update #3

published an article 4 days ago

Benchmarking Assisted Generation with Gemma 3 and Qwen 2.5: A Code-First Guide

ariG23498's activity

upvoted 2 articles 4 days ago

Article

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

5 days ago • 269

Article

Open R1: Update #3

By

and 9 others • 5 days ago • 225

upvoted 2 collections 4 days ago

Gemma 3

4 items • Updated 4 days ago • 14

Gemma 3 Release

9 items • Updated 3 days ago • 251

upvoted a collection 10 days ago

Shot categorizer

Fine-tune of Florence-2 to generate shot categories, useful for data curation. Code: https://github.com/huggingface/movie-shot-categorizer. • 3 items • Updated 10 days ago • 2

upvoted a collection 12 days ago

C4AI Aya Vision

Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 12 days ago • 64

upvoted an article 12 days ago

Article

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

13 days ago • 66

upvoted an article 16 days ago

Article

Common AI Model Formats

17 days ago • 31

upvoted 2 articles 17 days ago

Article

SigLIP 2: A better multilingual vision language encoder

24 days ago • 136

Article

HuggingFace, IISc partner to supercharge model building on India's diverse languages

18 days ago • 14

upvoted a paper 17 days ago

Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 111

upvoted a collection 17 days ago

Phi-4

Phi-4 family of small language and multi-modal models. • 7 items • Updated 13 days ago • 110

upvoted an article 20 days ago

Article

Remote VAEs for decoding with HF endpoints 🤗

21 days ago • 36

upvoted a collection 22 days ago

SigLIP 2

OpenCLIP and timm SigLIP 2 models • 45 items • Updated 23 days ago • 12

upvoted a collection 23 days ago

SigLIP2

36 items • Updated 4 days ago • 64

upvoted a paper 23 days ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published 24 days ago • 129

upvoted an article 24 days ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

25 days ago • 207

upvoted 2 articles 25 days ago

Article

PaliGemma 2 Mix - New Instruction Vision Language Models by Google

26 days ago • 65

Article

ColPali: Efficient Document Retrieval with Vision Language Models 👀

Jul 5, 2024 • 214

upvoted an article 26 days ago

Article

Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥

27 days ago • 93