SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model
Abstract
SridBench, a new benchmark for scientific figure generation, shows that even top-tier models such as GPT-4o-image fall short of human performance in semantic and structural accuracy, underscoring the need for more advanced reasoning-driven multimodal visual generation.
Recent years have seen rapid advances in AI-driven image generation. Early diffusion models emphasized perceptual quality, while newer multimodal models like GPT-4o-image integrate high-level reasoning, improving semantic understanding and structural composition. Scientific illustration generation exemplifies this evolution: unlike general image synthesis, it demands accurate interpretation of technical content and the transformation of abstract ideas into clear, standardized visuals. The task is far more knowledge-intensive and laborious than general image synthesis, often requiring hours of manual work and specialized tools, so automating it in a controllable, intelligent manner would provide substantial practical value. Yet no benchmark currently exists to evaluate AI on this front. To fill this gap, we introduce SridBench, the first benchmark for scientific figure generation. It comprises 1,120 instances curated from leading scientific papers across 13 natural-science and computer-science disciplines, collected with the help of human experts and MLLMs. Each sample is evaluated along six dimensions, including semantic fidelity and structural accuracy. Experimental results reveal that even top-tier models like GPT-4o-image lag behind human performance, with common issues in textual and visual clarity and scientific correctness. These findings highlight the need for more advanced reasoning-driven visual generation capabilities.
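To make the evaluation setup concrete, below is a minimal Python sketch of what a single benchmark record and its per-dimension scores might look like. The field names, the uniform averaging, and the example values are illustrative assumptions, not the paper's actual schema; the abstract only states that each of the 1,120 instances is drawn from one of 13 disciplines and scored along six dimensions, two of which are semantic fidelity and structural accuracy.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical sketch of a SridBench-style evaluation record.
# Field names and weighting are assumptions for illustration only.

@dataclass
class SridBenchSample:
    paper_id: str                 # source paper the figure was curated from
    discipline: str               # one of the 13 natural/computer science fields
    instruction: str              # textual description the model must illustrate
    reference_figure: str         # path to the human-drawn reference figure
    scores: Dict[str, float] = field(default_factory=dict)  # dimension -> score

def aggregate(sample: SridBenchSample) -> float:
    """Average the per-dimension scores (uniform weighting is an assumption)."""
    if not sample.scores:
        return 0.0
    return sum(sample.scores.values()) / len(sample.scores)

# Example usage with two of the six dimensions named in the abstract.
sample = SridBenchSample(
    paper_id="example-0001",
    discipline="computer science",
    instruction="Draw the pipeline of a retrieval-augmented generation system.",
    reference_figure="figures/example-0001.png",
    scores={"semantic_fidelity": 0.72, "structural_accuracy": 0.65},
)
print(f"mean score: {aggregate(sample):.2f}")
```

In practice, a benchmark like this would pair each record with model outputs and have human or MLLM judges fill in the six dimension scores before aggregation; the snippet above only illustrates the data shape implied by the abstract.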