Daily Papers

by AK and the research community

Apr 23

Submitted by

Hennara

Kuwain 1.5B: An Arabic SLM via Language Injection

·
6 authors

7

Submitted by

iseesaw

TTRL: Test-Time Reinforcement Learning

·
10 authors

2

Submitted by

minghaowu

The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks

·
10 authors

2

Submitted by

longlian

Describe Anything: Detailed Localized Image and Video Captioning

·
11 authors

3

Submitted by

longlian

Learning Adaptive Parallel Reasoning with Language Models

·
9 authors

2

Submitted by

chenjoya

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

·
6 authors

2

Submitted by

zhangysk

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs

·
20 authors

2

Submitted by

Neph0s

BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation

·
6 authors

Submitted by

bongbohong

Efficient Pretraining Length Scaling

·
7 authors

2

Submitted by

yueyang2000

CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning

·
6 authors

2

Submitted by

Kaiyue

Personalized Text-to-Image Generation with Auto-Regressive Models

·
4 authors

3

Submitted by

thomasschmied

LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

·
5 authors

2

Submitted by

Zilence006

Vidi: Large Multimodal Models for Video Understanding and Editing

·
22 authors

Submitted by

sayakpaul

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

·
9 authors

2

Submitted by

zhoutianyi

WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

·
7 authors

4

Submitted by

theFoxofSky

RealisDance-DiT: Simple yet Strong Baseline towards Controllable Character Animation in the Wild

·
8 authors

Submitted by

ziqipang

MR. Video: "MapReduce" is the Principle for Long Video Understanding

·
2 authors

2

Submitted by

stneng

Progent: Programmable Privilege Control for LLM Agents

·
7 authors

2

Submitted by

QiYao-Wang

IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property

·
23 authors

2

Submitted by

j-min

CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

·
4 authors

2

Submitted by

yoyolicoris

DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions

·
7 authors

2