Phys

AI & ML interests

None defined yet.

Recent Activity

Phys111111's activity

wchai

authored a paper 4 days ago

TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action

Paper • 2505.01583 • Published 7 days ago • 9

sainx

authored a paper 23 days ago

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Paper • 2504.10483 • Published 25 days ago • 21

wchai

authored a paper about 1 month ago

An Empirical Study of GPT-4o Image Generation Capabilities

Paper • 2504.05979 • Published Apr 8 • 62

sainx

authored a paper about 1 month ago

Scaling Language-Free Visual Representation Learning

Paper • 2504.01017 • Published Apr 1 • 29

Jialuo21

published a dataset about 2 months ago

Phys111111/combined_data

Viewer • Updated Oct 6, 2024 • 1.93k • 56

wchai

authored a paper 2 months ago

Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think

Paper • 2502.20172 • Published Feb 27 • 28

sainx

authored a paper 3 months ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 121

sainx

authored a paper 4 months ago

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published Jan 16 • 72

Fiaa

authored a paper 4 months ago

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Paper • 2501.05452 • Published Jan 9 • 15

sainx

authored a paper 5 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

wchai

authored 2 papers 6 months ago

PAD: Personalized Alignment at Decoding-Time

Paper • 2410.04070 • Published Oct 5, 2024 • 1

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

Paper • 2411.11922 • Published Nov 18, 2024 • 19

wchai

authored a paper 7 months ago

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Paper • 2410.03051 • Published Oct 4, 2024 • 6

Jialuo21

updated a dataset 7 months ago

Phys111111/combined_data

Viewer • Updated Oct 6, 2024 • 1.93k • 56

Jialuo21

updated a dataset 8 months ago

Phys111111/data_buoyancy

Viewer • Updated Sep 23, 2024 • 114 • 21

wchai

authored 2 papers 8 months ago

Chasing Consistency in Text-to-3D Generation from a Single Image

Paper • 2309.03599 • Published Sep 7, 2023 • 1

RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark

Paper • 2407.13930 • Published Jul 18, 2024

Jialuo21

updated a model 8 months ago

Phys111111/temp

Updated Sep 6, 2024

sainx

authored a paper 11 months ago

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24, 2024 • 61

Fiaa

authored a paper 11 months ago

Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

Paper • 2406.07546 • Published Jun 11, 2024 • 9

Team members: 6
