che111's Collections
VideoForMed
Work for 3D Medical Vision
Med Multimodal Learning
Localize Visual Understanding
Generative Model
Synthetic Data Learning
Explainable, Fairness Work
General Multimodal Learning

VideoForMed

updated Sep 5, 2024

  • Distilling Vision-Language Models on Millions of Videos

    Paper • 2401.06129 • Published Jan 11, 2024 • 17

  • Koala: Key frame-conditioned long video-LLM

    Paper • 2404.04346 • Published Apr 5, 2024 • 7

  • MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

    Paper • 2404.05726 • Published Apr 8, 2024 • 23

  • OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

    Paper • 2406.07471 • Published Jun 11, 2024 • 1

  • VISA: Reasoning Video Object Segmentation via Large Language Models

    Paper • 2407.11325 • Published Jul 16, 2024 • 1

  • SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

    Paper • 2407.15841 • Published Jul 22, 2024 • 41

  • VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

    Paper • 2409.01071 • Published Sep 2, 2024 • 28

  • OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

    Paper • 2409.01199 • Published Sep 2, 2024 • 14