taesiri PRO

https://taesiri.ai/

AI & ML interests

AGI ... one linear layer at a time

Recent Activity

updated a dataset 9 minutes ago

taesiri/GBPro-RAW

published a dataset 11 minutes ago

taesiri/GBPro-RAW

new activity 38 minutes ago

taesiri/Gemini-Text-based-Image-Editor: I thought this space has no limit?

taesiri's activity

upvoted a paper about 21 hours ago

A Survey of Interactive Generative Video

Paper • 2504.21853 • Published 3 days ago • 37

upvoted a paper 1 day ago

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Paper • 2505.00703 • Published 2 days ago • 25

upvoted a paper 2 days ago

Softpick: No Attention Sink, No Massive Activations with Rectified Softmax

Paper • 2504.20966 • Published 4 days ago • 19

upvoted 4 papers 3 days ago

Taming the Titans: A Survey of Efficient LLM Inference Serving

Paper • 2504.19720 • Published 5 days ago • 9

Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Paper • 2504.21233 • Published 4 days ago • 32

RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning

Paper • 2504.18904 • Published 7 days ago • 7

Phi-4-reasoning Technical Report

Paper • 2504.21318 • Published 4 days ago • 24

upvoted 7 papers 4 days ago

The Leaderboard Illusion

Paper • 2504.20879 • Published 4 days ago • 54

YoChameleon: Personalized Vision and Language Generation

Paper • 2504.20998 • Published 4 days ago • 10

TesserAct: Learning 4D Embodied World Models

Paper • 2504.20995 • Published 4 days ago • 15

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published 4 days ago • 74

ReasonIR: Training Retrievers for Reasoning Tasks

Paper • 2504.20595 • Published 4 days ago • 43

RepText: Rendering Visual Text via Replicating

Paper • 2504.19724 • Published 5 days ago • 29

Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency

Paper • 2504.18589 • Published 9 days ago • 9

upvoted 6 papers 5 days ago

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects

Paper • 2504.19838 • Published 5 days ago • 21

Towards Understanding Camera Motions in Any Video

Paper • 2504.15376 • Published 12 days ago • 149

Kimi-Audio Technical Report

Paper • 2504.18425 • Published 8 days ago • 13

DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency

Paper • 2504.12080 • Published 17 days ago • 7

Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark

Paper • 2504.16427 • Published 11 days ago • 16

Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning

Paper • 2504.16656 • Published 10 days ago • 50