FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow
Abstract
FullFront is a benchmark evaluating Multimodal Large Language Models across the conceptualization, comprehension, and implementation phases of front-end engineering.
Front-end engineering involves a complex workflow where engineers conceptualize designs, translate them into code, and iteratively refine the implementation. While recent benchmarks primarily focus on converting visual designs to code, we present FullFront, a benchmark designed to evaluate Multimodal Large Language Models (MLLMs) across the full front-end development pipeline. FullFront assesses three fundamental tasks that map directly to the front-end engineering pipeline: Webpage Design (conceptualization phase), Webpage Perception QA (comprehension of visual organization and elements), and Webpage Code Generation (implementation phase). Unlike existing benchmarks that use either scraped websites with bloated code or oversimplified LLM-generated HTML, FullFront employs a novel, two-stage process to transform real-world webpages into clean, standardized HTML while maintaining diverse visual designs and avoiding copyright issues. Extensive testing of state-of-the-art MLLMs reveals significant limitations in page perception, code generation (particularly for image handling and layout), and interaction implementation. Our results quantitatively demonstrate performance disparities across models and tasks, and highlight a substantial gap between current MLLM capabilities and human expert performance in front-end engineering. The FullFront benchmark and code are available at https://github.com/Mikivishy/FullFront.
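The two-stage webpage-to-HTML transformation described in the abstract can be pictured roughly as follows. This is a minimal sketch, not the authors' released pipeline: it assumes stage one asks an MLLM to rewrite a scraped page into clean, self-contained HTML, and stage two applies cheap structural validation before the page enters the benchmark. The model name, prompt wording, and the `stage1_rewrite` / `stage2_validate` helpers are all hypothetical.

```python
# Minimal sketch of a two-stage "real webpage -> clean HTML" pipeline.
# Assumptions (not from the paper): an OpenAI-compatible chat API, a
# hypothetical model choice, and parse-based validation in stage two.
from openai import OpenAI
from html.parser import HTMLParser

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STAGE1_PROMPT = (
    "Rewrite the following scraped webpage source as a single, clean, "
    "self-contained HTML file: inline minimal CSS, drop scripts, trackers, "
    "and framework boilerplate, and replace copyrighted images with "
    "placeholders, while preserving the visual layout.\n\n{source}"
)

def stage1_rewrite(scraped_html: str) -> str:
    """Stage 1: ask an MLLM to produce standardized, copyright-safe HTML."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice; any capable MLLM would do
        messages=[
            {"role": "user", "content": STAGE1_PROMPT.format(source=scraped_html)}
        ],
    )
    return resp.choices[0].message.content

class _Validator(HTMLParser):
    """Stage 2 helper: flag pages that still carry <script> tags."""
    def __init__(self) -> None:
        super().__init__()
        self.has_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.has_script = True

def stage2_validate(clean_html: str) -> bool:
    """Stage 2: reject output that is not a standalone, script-free page."""
    v = _Validator()
    v.feed(clean_html)
    return (
        clean_html.lstrip().lower().startswith("<!doctype html")
        and not v.has_script
    )

if __name__ == "__main__":
    raw = open("scraped_page.html", encoding="utf-8").read()
    clean = stage1_rewrite(raw)
    if stage2_validate(clean):
        open("standardized_page.html", "w", encoding="utf-8").write(clean)
```

A real pipeline would presumably also render the cleaned page and compare it visually against the original to confirm the diverse layouts survive; the sketch above only shows the control flow of the two stages.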
Community
FullFront is a comprehensive benchmark for evaluating Multimodal Large Language Models (MLLMs) across the entire front-end engineering workflow.
The following related papers were recommended by the Semantic Scholar API:
- LLM Code Customization with Visual Results: A Benchmark on TikZ (2025)
- UICopilot: Automating UI Synthesis via Hierarchical Code Generation from Webpage Designs (2025)
- DetailMaster: Can Your Text-to-Image Model Handle Long Prompts? (2025)
- CompBench: Benchmarking Complex Instruction-guided Image Editing (2025)
- Multimodal graph representation learning for website generation based on visual sketch (2025)
- Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning (2025)
- Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models (2025)