Shehan Munasinghe

shehan97

https://shehanmunasinghe.github.io/

AI & ML interests

Computer Vision, Multi-modal learning

shehan97's activity

commented 2 papers 6 months ago

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Paper • 2411.04923 • Published Nov 7, 2024 • 23

New activity in MBZUAI/swiftformer-xs over 1 year ago

Adding `safetensors` variant of this model

#1 opened almost 2 years ago by