arXiv:2505.20650

FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information

Published on May 27
· Submitted by YanAdjeNole on May 28
Abstract

AI-generated summary: FinTagging evaluates LLMs for structured information extraction and semantic alignment in XBRL financial reporting, revealing challenges in fine-grained concept alignment.

We introduce FinTagging, the first full-scope, table-aware XBRL benchmark designed to evaluate the structured information extraction and semantic alignment capabilities of large language models (LLMs) in the context of XBRL-based financial reporting. Unlike prior benchmarks that oversimplify XBRL tagging as flat multi-class classification and focus solely on narrative text, FinTagging decomposes the XBRL tagging problem into two subtasks: FinNI for financial entity extraction and FinCL for taxonomy-driven concept alignment. It requires models to jointly extract facts and align them with the full 10k+ US-GAAP taxonomy across both unstructured text and structured tables, enabling realistic, fine-grained evaluation. We assess a diverse set of LLMs under zero-shot settings, systematically analyzing their performance on both subtasks and overall tagging accuracy. Our results reveal that, while LLMs demonstrate strong generalization in information extraction, they struggle with fine-grained concept alignment, particularly in disambiguating closely related taxonomy entries. These findings highlight the limitations of existing LLMs in fully automating XBRL tagging and underscore the need for improved semantic reasoning and schema-aware modeling to meet the demands of accurate financial disclosure. Code is available at our GitHub repository and data is at our Hugging Face repository.
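The abstract's two-stage decomposition (FinNI fact extraction, then FinCL concept linking) can be sketched with a toy, non-LLM stand-in. The regex extractor, the three-concept mini-taxonomy, and the keyword-overlap linker below are illustrative assumptions, not the paper's method, which prompts LLMs against the full 10k+ US-GAAP taxonomy:

```python
import re

# Illustrative three-concept stand-in for the 10k+ US-GAAP taxonomy.
TAXONOMY = {
    "us-gaap:Revenues": {"revenue", "revenues", "sales"},
    "us-gaap:NetIncomeLoss": {"net", "income", "loss"},
    "us-gaap:Assets": {"assets"},
}

def fin_ni(text):
    """FinNI-style step: extract numeric facts as (context, value) pairs."""
    pattern = r"([A-Za-z ]+?) (?:of|was|were) \$?(\d[\d,]*)"
    return [{"context": m.group(1).strip().lower(), "value": m.group(2)}
            for m in re.finditer(pattern, text)]

def fin_cl(fact):
    """FinCL-style step: link a fact to the concept with the most keyword overlap."""
    words = set(fact["context"].split())
    best = max(TAXONOMY, key=lambda c: len(TAXONOMY[c] & words))
    return best if TAXONOMY[best] & words else None  # None = no plausible tag

report = "Total revenues were $394,328 and net income was $99,803."
tagged = [(f["value"], fin_cl(f)) for f in fin_ni(report)]
print(tagged)
# → [('394,328', 'us-gaap:Revenues'), ('99,803', 'us-gaap:NetIncomeLoss')]
```

Even this toy version shows why the second stage is the hard part: extraction is local pattern matching, while linking must disambiguate among closely related concepts, which is exactly where the paper reports LLMs struggle.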

Community

Paper author and submitter:

Automated tagging is essential for converting financial disclosures into machine-readable data by linking numerical facts to standardized meanings. Despite the widespread adoption of the XBRL format, accurately tagging over 2,000 facts per report to more than 10,000 taxonomy concepts remains challenging, with thousands of errors identified annually. In this work, we introduce FinTagging, the first benchmark tailored for evaluating large language models on full-scope XBRL tagging across both text and tables. Unlike prior benchmarks that simplify tagging as flat classification over limited concepts, FinTagging requires models to jointly extract structured financial facts and align them with a comprehensive taxonomy. We evaluate ten state-of-the-art models in a zero-shot setting using two new datasets, FinNI-eval for numerical fact extraction and FinCL-eval for concept linking. Our results show that while some models perform well on extraction, they struggle with precise semantic alignment, especially across low-frequency concepts. A unified evaluation framework further reveals that without structured assessment, models often produce invalid outputs. These findings highlight the limitations of general LLMs in handling complex financial tagging and underscore the need for domain-specific adaptation, with FinTagging providing a foundation for future research in financial document understanding.
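The comment's observation that models "often produce invalid outputs" without structured assessment suggests what a unified, schema-aware scorer must do. A minimal sketch follows; the function name, metric choices, and the decision to count invalid predictions against precision are assumptions, not the paper's actual evaluation framework:

```python
def evaluate(predictions, gold, taxonomy):
    """Score predicted (value, concept) pairs against gold pairs.

    Predictions whose concept is not in the taxonomy are invalid: they can
    never be correct, so they still count against precision.
    """
    valid = [p for p in predictions if p[1] in taxonomy]
    tp = len(set(valid) & set(gold))
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    invalid_rate = 1 - len(valid) / len(predictions) if predictions else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "invalid_rate": invalid_rate}

# Hypothetical example: one correct tag, one hallucinated concept.
taxonomy = {"us-gaap:Revenues", "us-gaap:NetIncomeLoss"}
gold = [("394,328", "us-gaap:Revenues")]
preds = [("394,328", "us-gaap:Revenues"), ("99,803", "not-a-real-concept")]
metrics = evaluate(preds, gold, taxonomy)
print(metrics)  # precision 0.5, recall 1.0, invalid_rate 0.5
```

Reporting `invalid_rate` separately from precision makes the failure mode the comment describes visible, rather than folding hallucinated concepts silently into generic misses.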


