Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning Paper • 2504.17192 • Published 16 days ago • 106
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published 23 days ago • 48
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers Paper • 2504.10483 • Published 25 days ago • 21
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 25 days ago • 255
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published Mar 5 • 232
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published Mar 3 • 87
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25 • 73
Phantom: Subject-consistent video generation via cross-modal alignment Paper • 2502.11079 • Published Feb 16 • 60
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters • Running • 2.57k
On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices Paper • 2502.04363 • Published Feb 5 • 12
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published Feb 11 • 36
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features Paper • 2502.04320 • Published Feb 6 • 38