DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
Abstract
DeepTheorem enhances LLM theorem-proving through a large-scale natural language dataset and a tailored reinforcement learning strategy, achieving state-of-the-art results in informal theorem proving.
Theorem proving serves as a major testbed for evaluating complex reasoning abilities in large language models (LLMs). However, traditional automated theorem proving (ATP) approaches rely heavily on formal proof systems that align poorly with the strengths LLMs derive from the informal, natural-language knowledge acquired during pre-training. In this work, we propose DeepTheorem, a comprehensive informal theorem-proving framework exploiting natural language to enhance LLM mathematical reasoning. DeepTheorem includes a large-scale benchmark dataset of 121K high-quality, IMO-level informal theorems and proofs spanning diverse mathematical domains, rigorously annotated for correctness, difficulty, and topic categories, and accompanied by systematically constructed verifiable theorem variants. We devise a novel reinforcement learning strategy (RL-Zero) explicitly tailored to informal theorem proving, leveraging the verified theorem variants to incentivize robust mathematical inference. Additionally, we propose comprehensive outcome and process evaluation metrics examining proof correctness and the quality of reasoning steps. Extensive experimental analyses demonstrate that DeepTheorem significantly improves LLM theorem-proving performance compared to existing datasets and supervised fine-tuning protocols, achieving state-of-the-art accuracy and reasoning quality. Our findings highlight DeepTheorem's potential to fundamentally advance automated informal theorem proving and mathematical exploration.
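As a rough illustration of how a theorem and its systematically constructed, verifiable variant might be represented, here is a minimal Python sketch. The field names and record structure are our own assumptions for illustration, not the paper's released data schema:

```python
from dataclasses import dataclass

@dataclass
class TheoremInstance:
    """Hypothetical record for one DeepTheorem-style entry: an informal
    theorem with annotated metadata and a known truth value."""
    statement: str      # natural-language theorem statement
    truth_value: bool   # ground truth: does the statement hold?
    difficulty: int     # annotated difficulty level
    topic: str          # annotated topic category

# A true theorem and a falsified variant obtained by perturbing it.
original = TheoremInstance(
    statement="For every integer n, n**2 >= 0.",
    truth_value=True,
    difficulty=1,
    topic="algebra",
)
variant = TheoremInstance(
    statement="There exists an integer n such that n**2 < 0.",
    truth_value=False,
    difficulty=1,
    topic="algebra",
)
```

Because each variant carries a known truth value, a model's verdict on it can be checked automatically, which is what makes the variants usable as a verifiable RL signal.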
Community
Announcing DeepTheorem: Revolutionizing LLM Mathematical Reasoning!
TL;DR:
- Learning by exploration is the most important lesson from recent RL-zero training: self-exploration significantly boosts the utilization of LLM pre-training knowledge.
- Since LLMs are pre-trained on massive knowledge of mathematical theorems, can an LLM learn theorem proving by self-exploration?
- We show that our high-quality DeepTheorem dataset, combined with online RL training, is sufficient to activate an LLM's theorem-proving ability. Our 7B model can outperform even advanced models like Gemini and Claude 3.5! More importantly, we need no proof annotations at all: all you need is the truth value of the theorem itself (see the reward sketch below).
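A minimal sketch of what such a truth-value-only reward could look like, assuming a prompt convention where the model ends its proof attempt with an explicit verdict. The `VERDICT:` format and both function names below are our own invention for illustration, not the paper's released code:

```python
def extract_verdict(model_output: str) -> bool | None:
    """Parse the model's final verdict from its proof attempt.
    Assumes (by our invented convention) that the output ends with
    'VERDICT: TRUE' or 'VERDICT: FALSE'."""
    tail = model_output.strip().upper()
    if tail.endswith("VERDICT: TRUE"):
        return True
    if tail.endswith("VERDICT: FALSE"):
        return False
    return None  # unparseable output: treated as incorrect

def truth_value_reward(model_output: str, ground_truth: bool) -> float:
    """Binary outcome reward for RL: 1.0 if the model's verdict matches
    the theorem's known truth value, 0.0 otherwise. No proof annotation
    is needed, only the truth value of the statement itself."""
    return 1.0 if extract_verdict(model_output) == ground_truth else 0.0

# Example usage:
attempt = "Suppose n is an integer... hence n**2 >= 0. VERDICT: TRUE"
print(truth_value_reward(attempt, ground_truth=True))  # 1.0
```

The design point this sketch captures is that the supervision signal is the theorem's truth value alone, so the policy is free to explore its own proof strategies rather than imitate annotated proofs.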
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning (2025)
- Leanabell-Prover: Posttraining Scaling in Formal Reasoning (2025)
- DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning (2025)
- Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification (2025)
- SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond (2025)
- General-Reasoner: Advancing LLM Reasoning Across All Domains (2025)
- AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning (2025)