arXiv:2505.20650

FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information

Published on May 27
· Submitted by YanAdjeNole on May 28
Abstract

AI-generated summary: FinTagging evaluates LLMs for structured information extraction and semantic alignment in XBRL financial reporting, revealing challenges in fine-grained concept alignment.

We introduce FinTagging, the first full-scope, table-aware XBRL benchmark designed to evaluate the structured information extraction and semantic alignment capabilities of large language models (LLMs) in the context of XBRL-based financial reporting. Unlike prior benchmarks that oversimplify XBRL tagging as flat multi-class classification and focus solely on narrative text, FinTagging decomposes the XBRL tagging problem into two subtasks: FinNI for financial entity extraction and FinCL for taxonomy-driven concept alignment. It requires models to jointly extract facts and align them with the full 10k+ US-GAAP taxonomy across both unstructured text and structured tables, enabling realistic, fine-grained evaluation. We assess a diverse set of LLMs under zero-shot settings, systematically analyzing their performance on both subtasks and overall tagging accuracy. Our results reveal that, while LLMs demonstrate strong generalization in information extraction, they struggle with fine-grained concept alignment, particularly in disambiguating closely related taxonomy entries. These findings highlight the limitations of existing LLMs in fully automating XBRL tagging and underscore the need for improved semantic reasoning and schema-aware modeling to meet the demands of accurate financial disclosure. Code is available at our GitHub repository and data is at our Hugging Face repository.
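The abstract's two-stage decomposition (FinNI fact extraction, then FinCL concept linking) can be sketched with a toy, non-LLM stand-in. The regex extractor, the three-concept mini-taxonomy, and the keyword-overlap linker below are illustrative assumptions, not the paper's method, which prompts LLMs against the full 10k+ US-GAAP taxonomy:

```python
import re

# Illustrative three-concept stand-in for the 10k+ US-GAAP taxonomy.
TAXONOMY = {
    "us-gaap:Revenues": {"revenue", "revenues", "sales"},
    "us-gaap:NetIncomeLoss": {"net", "income", "loss"},
    "us-gaap:Assets": {"assets"},
}

def fin_ni(text):
    """FinNI-style step: extract numeric facts as (context, value) pairs."""
    pattern = r"([A-Za-z ]+?) (?:of|was|were) \$?(\d[\d,]*)"
    return [{"context": m.group(1).strip().lower(), "value": m.group(2)}
            for m in re.finditer(pattern, text)]

def fin_cl(fact):
    """FinCL-style step: link a fact to the concept with the most keyword overlap."""
    words = set(fact["context"].split())
    best = max(TAXONOMY, key=lambda c: len(TAXONOMY[c] & words))
    return best if TAXONOMY[best] & words else None  # None = no plausible tag

report = "Total revenues were $394,328 and net income was $99,803."
tagged = [(f["value"], fin_cl(f)) for f in fin_ni(report)]
print(tagged)
# → [('394,328', 'us-gaap:Revenues'), ('99,803', 'us-gaap:NetIncomeLoss')]
```

Even this toy version shows why the second stage is the hard part: extraction is local pattern matching, while linking must disambiguate among closely related concepts, which is exactly where the paper reports LLMs struggle.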

Community

Paper author and submitter:

Automated tagging is essential for converting financial disclosures into machine-readable data by linking numerical facts to standardized meanings. Despite the widespread adoption of the XBRL format, accurately tagging over 2,000 facts per report to more than 10,000 taxonomy concepts remains challenging, with thousands of errors identified annually. In this work, we introduce FinTagging, the first benchmark tailored for evaluating large language models on full-scope XBRL tagging across both text and tables. Unlike prior benchmarks that simplify tagging as flat classification over limited concepts, FinTagging requires models to jointly extract structured financial facts and align them with a comprehensive taxonomy. We evaluate ten state-of-the-art models in a zero-shot setting using two new datasets, FinNI-eval for numerical fact extraction and FinCL-eval for concept linking. Our results show that while some models perform well on extraction, they struggle with precise semantic alignment, especially across low-frequency concepts. A unified evaluation framework further reveals that without structured assessment, models often produce invalid outputs. These findings highlight the limitations of general LLMs in handling complex financial tagging and underscore the need for domain-specific adaptation, with FinTagging providing a foundation for future research in financial document understanding.
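The comment's observation that models "often produce invalid outputs" without structured assessment suggests what a unified, schema-aware scorer must do. A minimal sketch follows; the function name, metric choices, and the decision to count invalid predictions against precision are assumptions, not the paper's actual evaluation framework:

```python
def evaluate(predictions, gold, taxonomy):
    """Score predicted (value, concept) pairs against gold pairs.

    Predictions whose concept is not in the taxonomy are invalid: they can
    never be correct, so they still count against precision.
    """
    valid = [p for p in predictions if p[1] in taxonomy]
    tp = len(set(valid) & set(gold))
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    invalid_rate = 1 - len(valid) / len(predictions) if predictions else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "invalid_rate": invalid_rate}

# Hypothetical example: one correct tag, one hallucinated concept.
taxonomy = {"us-gaap:Revenues", "us-gaap:NetIncomeLoss"}
gold = [("394,328", "us-gaap:Revenues")]
preds = [("394,328", "us-gaap:Revenues"), ("99,803", "not-a-real-concept")]
metrics = evaluate(preds, gold, taxonomy)
print(metrics)  # precision 0.5, recall 1.0, invalid_rate 0.5
```

Reporting `invalid_rate` separately from precision makes the failure mode the comment describes visible, rather than folding hallucinated concepts silently into generic misses.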


