ByteDance-Seed/BFS-Prover

Text Generation

theorem-proving

formal-mathematics

text-generation-inference

RanXinByteDance committed on Feb 24

Commit b7d5988 · verified · 1 Parent(s): 0d4c8a8

Update README.md

Files changed (1)

README.md +18 -2

@@ -33,9 +33,25 @@ This repository contains the latest tactic generator model checkpoint from BFS-P
   - Autoformalized NuminaMath-CoT dataset
 ## Performance
-When integrated into the full BFS-Prover system, this tactic generator model achieved
-72.54% success rate on MiniF2F test set accumulatively.
+BFS-Prover achieves state-of-the-art performance on the MiniF2F test benchmark. Here's a detailed comparison:
+### MiniF2F Test Benchmark Results
+| Prover System | Search Method | Critic Model | Tactic Budget | Score |
+|---------------|---------------|--------------|---------------|--------|
+| BFS-Prover (Accumulative) | BFS | No | - | **72.95%** |
+| BFS-Prover (This Work) | BFS | No | 2048×2×600 | **70.83% ± 0.89%** |
+| HunyuanProver | BFS | Yes | 600×8×400 | 68.4% |
+| InternLM2.5-StepProver | BFS | Yes | 256×32×600 | 65.9% |
+| DeepSeek-Prover-V1.5* | MCTS | No | 32×16×400 | 63.5% |
+*Note: DeepSeek-Prover-V1.5 uses whole-proof generation method; tactic budget decomposed for comparison.
+### Key Advantages
+- Achieves better performance without requiring a critic model
+- Uses simpler search method (BFS) compared to MCTS
+- Shows strong scaling with increased search passes
+- Benefits from DPO training using compiler feedback
 ## Usage
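
Review note: the "Tactic Budget" column added in this commit lists each budget as a product of three factors, but the diff does not say what each factor means. As a rough aid for comparing rows, the sketch below just multiplies the factors copied from the table. The names `BUDGETS` and `total_budget` are hypothetical, and treating the product as a single comparable number is an assumption, not something the README states.

```python
from math import prod

# Budget strings copied verbatim from the table in this diff.
# What each factor stands for is not stated there, so the product
# is only a rough, assumed proxy for total search effort.
BUDGETS = {
    "BFS-Prover (This Work)": "2048×2×600",
    "HunyuanProver": "600×8×400",
    "InternLM2.5-StepProver": "256×32×600",
    "DeepSeek-Prover-V1.5": "32×16×400",
}


def total_budget(spec: str) -> int:
    """Multiply the '×'-separated integer factors of a budget string."""
    return prod(int(factor) for factor in spec.split("×"))


totals = {name: total_budget(spec) for name, spec in BUDGETS.items()}

# Print the systems from largest to smallest product.
for name, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {total:,}")
```

Under this reading, InternLM2.5-StepProver has the largest product (4,915,200), followed by BFS-Prover (2,457,600), HunyuanProver (1,920,000), and DeepSeek-Prover-V1.5 (204,800). The scores in the table are therefore not reported at equal budgets, which is worth keeping in mind when reading the comparison.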