RanXinByteDance committed
Commit b7d5988 · verified · 1 Parent(s): 0d4c8a8

Update README.md

Files changed (1)
  1. README.md +18 -2
README.md CHANGED
@@ -33,9 +33,25 @@ This repository contains the latest tactic generator model checkpoint from BFS-P
  - Autoformalized NuminaMath-CoT dataset
 
  ## Performance
+ BFS-Prover achieves state-of-the-art performance on the MiniF2F test benchmark. Here's a detailed comparison:
 
- When integrated into the full BFS-Prover system, this tactic generator model achieved
- 72.54% success rate on MiniF2F test set accumulatively.
+ ### MiniF2F Test Benchmark Results
+
+ | Prover System | Search Method | Critic Model | Tactic Budget | Score |
+ |---------------|---------------|--------------|---------------|--------|
+ | BFS-Prover (Accumulative) | BFS | No | - | **72.95%** |
+ | BFS-Prover (This Work) | BFS | No | 2048×2×600 | **70.83% ± 0.89%** |
+ | HunyuanProver | BFS | Yes | 600×8×400 | 68.4% |
+ | InternLM2.5-StepProver | BFS | Yes | 256×32×600 | 65.9% |
+ | DeepSeek-Prover-V1.5* | MCTS | No | 32×16×400 | 63.5% |
+
+ *Note: DeepSeek-Prover-V1.5 uses whole-proof generation method; tactic budget decomposed for comparison.
+
+ ### Key Advantages
+ - Achieves better performance without requiring a critic model
+ - Uses simpler search method (BFS) compared to MCTS
+ - Shows strong scaling with increased search passes
+ - Benefits from DPO training using compiler feedback
 
  ## Usage
 
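As a rough aid for reading the Tactic Budget column in the added table, the sketch below multiplies out each budget as written. The README does not say what the individual factors mean, so treating the product as a single "total budget" figure is an assumption made here for illustration only, and the variable names are hypothetical.

```python
from math import prod

# Tactic budgets copied from the results table in the diff above.
# Assumption: the product of the three factors is a usable rough scale;
# the table itself does not define what each factor represents.
budgets = {
    "BFS-Prover (This Work)": (2048, 2, 600),
    "HunyuanProver": (600, 8, 400),
    "InternLM2.5-StepProver": (256, 32, 600),
    "DeepSeek-Prover-V1.5": (32, 16, 400),
}

# Multiply the factors of each budget into one number per system.
totals = {name: prod(factors) for name, factors in budgets.items()}

# Print systems from largest to smallest product.
for name, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26} {total:>10,}")
```

Under that assumption, the BFS-Prover row's budget (2,457,600) sits between HunyuanProver (1,920,000) and InternLM2.5-StepProver (4,915,200), which may be useful context when comparing the scores.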