Update README.md
Browse files
README.md
CHANGED
@@ -33,9 +33,25 @@ This repository contains the latest tactic generator model checkpoint from BFS-P
|
|
33 |
- Autoformalized NuminaMath-CoT dataset
|
34 |
|
35 |
## Performance
|
|
|
36 |
|
37 |
-
|
38 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
39 |
|
40 |
## Usage
|
41 |
|
|
|
33 |
- Autoformalized NuminaMath-CoT dataset
|
34 |
|
35 |
## Performance
|
36 |
+
BFS-Prover achieves state-of-the-art performance on the MiniF2F test benchmark. Here's a detailed comparison:
|
37 |
|
38 |
+
### MiniF2F Test Benchmark Results
|
39 |
+
|
40 |
+
| Prover System | Search Method | Critic Model | Tactic Budget | Score |
|
41 |
+
|---------------|---------------|--------------|---------------|--------|
|
42 |
+
| BFS-Prover (Accumulative) | BFS | No | - | **72.95%** |
|
43 |
+
| BFS-Prover (This Work) | BFS | No | 2048×2×600 | **70.83% ± 0.89%** |
|
44 |
+
| HunyuanProver | BFS | Yes | 600×8×400 | 68.4% |
|
45 |
+
| InternLM2.5-StepProver | BFS | Yes | 256×32×600 | 65.9% |
|
46 |
+
| DeepSeek-Prover-V1.5* | MCTS | No | 32×16×400 | 63.5% |
|
47 |
+
|
48 |
+
*Note: DeepSeek-Prover-V1.5 uses whole-proof generation method; tactic budget decomposed for comparison.
|
49 |
+
|
50 |
+
### Key Advantages
|
51 |
+
- Achieves better performance without requiring a critic model
|
52 |
+
- Uses simpler search method (BFS) compared to MCTS
|
53 |
+
- Shows strong scaling with increased search passes
|
54 |
+
- Benefits from DPO training using compiler feedback
|
55 |
|
56 |
## Usage
|
57 |
|