---
license: apache-2.0
datasets:
- yzy666/SVBench
language:
- en
metrics:
- code_eval
base_model:
- OpenGVLab/InternVideo2_5_Chat_8B
pipeline_tag: visual-question-answering
---

# Model Card for StreamingChat

This model card provides an overview of the StreamingChat model. For details, see our [Project](https://yzy-bupt.github.io/SVBench/), [Paper](https://arxiv.org/abs/2502.10810), [Dataset](https://huggingface.co/datasets/yzy666/SVBench) and [GitHub repository](https://github.com/yzy-bupt/SVBench).

## **Model Description**

**StreamingChat** is a streaming video understanding model built upon [InternVideo2.5](https://huggingface.co/OpenGVLab/InternVideo2_5_Chat_8B). It is trained on streaming video dialogue data, including temporal dialogue paths from the [SVBench](https://huggingface.co/datasets/yzy666/SVBench) training set. The model is fine-tuned using a static resolution strategy, enabling it to process several minutes of video at a rate of 1 FPS. Images are interleaved with language tokens, and each image is represented by 16 tokens. This model aims to catalyze progress in streaming video understanding.

## **Uses**

Download the StreamingChat model from Hugging Face:

```bash
git clone https://huggingface.co/yzy666/StreamingChat_8B
```

Install Python dependencies:

```bash
conda create -n StreamingChat -y python=3.9.21
conda activate StreamingChat
conda install -y -c pytorch pytorch=2.5.1 torchvision=0.20.1
pip install transformers==4.37.2 opencv-python==4.11.0.84 imageio==2.37.0 decord==0.6.0
pip install flash-attn --no-build-isolation
```

Run the inference script directly:

```bash
python demo.py
```

## **Citation**

If you find our model useful, please consider citing our work!
```
@article{yang2025svbench,
  title={SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding},
  author={Yang, Zhenyu and Hu, Yuhang and Du, Zemin and Xue, Dizhan and Qian, Shengsheng and Wu, Jiahong and Yang, Fan and Dong, Weiming and Xu, Changsheng},
  journal={arXiv preprint arXiv:2502.10810},
  year={2025}
}
```
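
## **Example: Visual Token Budget**

The model description states that video is processed at 1 FPS with 16 tokens per image. The sketch below works through what those two numbers imply for a clip of a given length. It is only an illustration under those stated figures: the helper names `sample_frame_indices` and `visual_token_count` are hypothetical and are not taken from `demo.py` or the StreamingChat code base, and the actual preprocessing may sample frames differently.

```python
from typing import List


def sample_frame_indices(total_frames: int, native_fps: float,
                         target_fps: float = 1.0) -> List[int]:
    """Pick frame indices so the clip is sampled at roughly `target_fps`.

    Hypothetical helper: assumes uniform sampling from the first frame.
    """
    if total_frames < 0 or native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame count must be >= 0 and frame rates must be > 0")
    step = native_fps / target_fps            # source frames per sampled frame
    num_samples = int(total_frames / step)    # whole sampling intervals that fit
    indices = [int(round(i * step)) for i in range(num_samples)]
    # Guard against rounding past the end of the clip.
    return [i for i in indices if i < total_frames]


def visual_token_count(duration_seconds: float, fps: float = 1.0,
                       tokens_per_frame: int = 16) -> int:
    """Estimate the visual tokens for a clip, using the card's 1 FPS / 16 tokens."""
    if duration_seconds < 0 or fps <= 0 or tokens_per_frame < 0:
        raise ValueError("duration and tokens must be >= 0 and fps must be > 0")
    num_frames = int(duration_seconds * fps)
    return num_frames * tokens_per_frame


if __name__ == "__main__":
    # A 3-second clip recorded at 30 FPS yields three sampled frames.
    print(sample_frame_indices(total_frames=90, native_fps=30.0))  # [0, 30, 60]
    # A 3-minute clip at 1 FPS: 180 frames * 16 tokens = 2880 visual tokens.
    print(visual_token_count(180))  # 2880
```

Under these assumptions the visual token count grows linearly with clip length, so each additional minute of video adds 60 × 16 = 960 visual tokens, not counting the interleaved language tokens.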