Efficient Neural Theorem Proving
via Fine-grained Proof Structure Analysis

Tsinghua University, Huawei Noah's Ark Lab, Shanghai Qi Zhi Institute

Abstract

The synergy between deep learning models and traditional automation tools plays a pivotal role in developing robust neural theorem provers (NTPs). However, for proof synthesis with LLMs, previous work applies automation tools either only when the model explicitly calls for them, or only at a single granularity level, failing to fully exploit the power of built-in tactics and off-the-shelf automated theorem provers. In this work, we propose ProofAug, a novel theorem-proving method that achieves superior sample efficiency by equipping proof-generation LLMs with automation methods at different granularity levels via fine-grained structure analysis of model-generated proof proposals. Furthermore, ProofAug serves as a versatile plug-and-play module that seamlessly integrates with any tree-search algorithm, enabling our construction of an efficient recursive proving (ERP) module to further enhance performance. We validate our method on the miniF2F-test benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant. Notably, by additionally employing a mixed prompting strategy, we achieve a cumulative pass rate of 66.0% after dataset curation (61.9% on the original version), setting a new SOTA among Isabelle NTPs with a total sample budget of only 2100. We also implement a Lean version of ProofAug that improves the performance of Kimina-Prover-Preview-Distill-1.5B by 6.1%. Our code is available at https://github.com/haoxiongliu/ProofAug.

Method

The following flowchart illustrates our method. It improves the Draft, Sketch, and Prove method by applying automation tools at different granularity levels through fine-grained proof structure analysis.

[Figure: flowchart of the ProofAug pipeline]

Each box in the flowchart corresponds to a proof step. (a) The initial proof proposal encounters an error when proving Conjecture 1-0. (b) We replace the original proof of Conjecture 1-0 with a sorry and continue until a syntactic error occurs at Conjecture 1-1. (c) The proof of Conjecture 1 is replaced by sorry, yielding a semi-proof with two sorry placeholders. (d) Automation tools are called to prove Conjecture 0 and Conjecture 1, succeeding on the former but failing on the latter. (e) The efficient recursive proving (ERP) module sends a completion request at Conjecture 1. (f) The generated ERP proof still fails at Conjecture 1. (g) We finally fall back to a coarser proof level and successfully find a proof with automation tools. Refer to Algorithm 1 for a pseudo-code description of ProofAug.
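The coarse-to-fine fallback described in steps (a)-(g) can be sketched in a few lines of Python. This is an illustrative mock, not the authors' implementation: `check_step` and `try_automation` are hypothetical stand-ins for real Isabelle interaction (step checking and automation calls such as built-in tactics or ATPs), and the ERP completion step (e)-(f) is omitted for brevity.

```python
# Illustrative sketch of ProofAug's coarse-to-fine fallback loop.
# NOT the authors' code: check_step and try_automation are mocks that
# stand in for real Isabelle step checking and automation calls.

def check_step(stmt):
    """Mock verifier: a proof step fails iff it is tagged 'bad'."""
    return "bad" not in stmt

def try_automation(goal):
    """Mock automation call (e.g. a built-in tactic or ATP):
    only goals tagged 'easy' can be closed automatically."""
    return "easy" in goal

def prove(goal, steps):
    """Verify the fine-grained proof `steps` of `goal`.
    Each step is (statement, nested_sub_steps_or_None).
    On a failing step, first try automation at that step's
    granularity; if that fails too, fall back to trying
    automation on the whole (coarser) enclosing goal."""
    repaired = []
    for stmt, sub in steps:
        if sub:  # a conjecture with its own sub-proof: recurse
            ok, body = prove(stmt, sub)
            if not ok:  # sub-proof unrepairable: coarser fallback
                return try_automation(goal), [goal]
            repaired.append((stmt, body))
        elif check_step(stmt) or try_automation(stmt):
            repaired.append((stmt, None))
        else:  # step-level repair failed: coarser fallback
            return try_automation(goal), [goal]
    return True, repaired
```

In this toy setting, a failing step that automation can close is repaired in place, while an unrepairable step discards the fine-grained sub-proof and retries automation one level up, mirroring the fallback from (d) to (g) above.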

Main Results

Table 2: Performance on miniF2F-test

Comparison of different methods on miniF2F-test. For BFS methods, the sample budget N × S × T corresponds to N attempts, each running T iterations of search with S expansions per iteration. For tree-search methods, it becomes N × T, with the symbols carrying the same meanings. Results marked with `†` are obtained using a mixed strategy.
Method | Model | Sample Budget | miniF2F-test

Methods using Isabelle:
Draft, Sketch, and Prove | Codex | 100 | 39.3%
Subgoal-XL | Fine-tuned Llama-8B | 64 | 39.3%
Subgoal-XL | Fine-tuned Llama-8B | 16384† | 56.1%
LEGO-Prover | mixed GPTs | 100 | 50.0%
Lyra | GPT-4 | 100 | 47.1%
Lyra | GPT-4 | 200 | 51.2%
POETRY | Fine-tuned ProofGPT (1.3B) | 1 × 32 × 128 | 42.2%

Our Experiments (using Isabelle):
DSP baseline | deepseek-math-7b-base | 1 | 28.7%
DSP baseline | deepseek-math-7b-base | 10 | 40.6%
DSP baseline | deepseek-math-7b-base | 100 | 49.2%
ProofAug | deepseek-math-7b-base | 1 | 36.5% (+7.8%)
ProofAug | deepseek-math-7b-base | 10 | 44.7% (+4.1%)
ProofAug | deepseek-math-7b-base | 100 | 52.5% (+3.3%)
ProofAug (0-shot) | deepseek-math-7b-base | 500 | 54.5%
ProofAug (0-shot) w/ ERP | deepseek-math-7b-base | 500 | 56.1%
ProofAug + Mixed Strategy | deepseek-math-7b-base | 1400† | 61.9%
ProofAug + Mix. + Dataset Curation | deepseek-math-7b-base | 2100† | 66.0%

Methods using Lean:
HTPS | Evariste (600M) | 64 × 5000 | 41.0%
RMaxTS | DeepSeek-Prover-V1.5-RL (7B) | 32 × 6400† | 63.5%
Best-First Search + CG | InternLM2.5-StepProver (7B) | 256 × 32 × 600 | 65.9%

Citation

Please cite the paper and star this repo if you use ProofAug. Thanks!

@article{liu2025efficientneuraltheoremproving,
    title={Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis}, 
    author={Haoxiong Liu and Jiacheng Sun and Zhenguo Li and Andrew C Yao},
    year={2025},
    eprint={2501.18310},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2501.18310}, 
}