Automated theorem proving: Difference between revisions

Jump to navigation Jump to search
imported>OAbot
m Open access bot: url-access updated in citation with #oabot.
 
imported>OAbot
m Open access bot: doi updated in citation with #oabot.
 
Line 3: Line 3:


== Logical foundations ==
== Logical foundations ==
While the roots of formalized [[Logicism|logic]] go back to [[Aristotelian logic|Aristotle]], the end of the 19th and early 20th centuries saw the development of modern logic and formalized mathematics. [[Gottlob Frege|Frege]]'s ''[[Begriffsschrift]]'' (1879) introduced both a complete [[propositional logic|propositional calculus]] and what is essentially modern [[predicate logic]].<ref>{{cite book|last=Frege|first=Gottlob|title=Begriffsschrift|year=1879|publisher=Verlag Louis Neuert|url=http://gallica.bnf.fr/ark:/12148/bpt6k65658c}}</ref> His ''[[The Foundations of Arithmetic|Foundations of Arithmetic]]'', published in 1884,<ref>{{cite book|last=Frege|first=Gottlob|title=Die Grundlagen der Arithmetik|year=1884|publisher=Wilhelm Kobner|location=Breslau|url=http://www.ac-nancy-metz.fr/enseign/philo/textesph/Frege.pdf|access-date=2012-09-02|archive-url=https://web.archive.org/web/20070926172317/http://www.ac-nancy-metz.fr/enseign/philo/textesph/Frege.pdf|archive-date=2007-09-26|url-status=dead}}</ref> expressed (parts of) mathematics in formal logic. This approach was continued by [[Bertrand Russell|Russell]] and [[Alfred North Whitehead|Whitehead]] in their influential ''[[Principia Mathematica]]'', first published 1910–1913,<ref>{{cite book |author=Russell |first1=Bertrand |url=https://archive.org/details/cu31924001575244 |title=Principia Mathematica |last2=Whitehead |first2=Alfred North |publisher=Cambridge University Press |year=1910–1913 |edition=1st}}</ref> and with a revised second edition in 1927.<ref>{{cite book |author=Russell |first1=Bertrand |url=https://archive.org/details/in.ernet.dli.2015.221192 |title=Principia Mathematica |last2=Whitehead |first2=Alfred North |publisher=Cambridge University Press |year=1927 |edition=2nd |language=en}}</ref> Russell and Whitehead thought they could derive all mathematical truth using [[axiom]]s and [[inference rule]]s of formal logic, in principle opening up the process to automation. In 1920, [[Thoralf Skolem]] simplified a previous result by [[Leopold Löwenheim]], leading to the [[Löwenheim–Skolem theorem]] and, in 1930, to the notion of a [[Herbrand universe]] and a [[Herbrand interpretation]] that allowed [[satisfiability|(un)satisfiability]] of first-order formulas (and hence the [[Validity (logic)|validity]] of a theorem) to be reduced to (potentially infinitely many) propositional satisfiability problems.<ref>{{cite thesis |first=J. |last=Herbrand |title=Recherches sur la théorie de la démonstration |date=1930 |type=PhD |publisher=University of Paris |url=https://eudml.org/doc/192791 |language=fr}}</ref>
While the roots of formalized [[Logicism|logic]] go back to [[Aristotelian logic|Aristotle]], the end of the 19th and early 20th centuries saw the development of modern logic and formalized mathematics. [[Gottlob Frege|Frege]]'s ''[[Begriffsschrift]]'' (1879) introduced both a complete [[propositional logic|propositional calculus]] and what is essentially modern [[predicate logic]].<ref>{{cite book|last=Frege|first=Gottlob|title=Begriffsschrift|year=1879|publisher=Verlag Louis Neuert|url=https://gallica.bnf.fr/ark:/12148/bpt6k65658c}}</ref> His ''[[The Foundations of Arithmetic|Foundations of Arithmetic]]'', published in 1884,<ref>{{cite book|last=Frege|first=Gottlob|title=Die Grundlagen der Arithmetik|year=1884|publisher=Wilhelm Kobner|location=Breslau|url=http://www.ac-nancy-metz.fr/enseign/philo/textesph/Frege.pdf|access-date=2012-09-02|archive-url=https://web.archive.org/web/20070926172317/http://www.ac-nancy-metz.fr/enseign/philo/textesph/Frege.pdf|archive-date=2007-09-26|url-status=dead}}</ref> expressed (parts of) mathematics in formal logic. This approach was continued by [[Bertrand Russell|Russell]] and [[Alfred North Whitehead|Whitehead]] in their influential ''[[Principia Mathematica]]'', first published 1910–1913,<ref>{{cite book |author=Russell |first1=Bertrand |url=https://archive.org/details/cu31924001575244 |title=Principia Mathematica |last2=Whitehead |first2=Alfred North |publisher=Cambridge University Press |year=1910–1913 |edition=1st}}</ref> and with a revised second edition in 1927.<ref>{{cite book |author=Russell |first1=Bertrand |url=https://archive.org/details/in.ernet.dli.2015.221192 |title=Principia Mathematica |last2=Whitehead |first2=Alfred North |publisher=Cambridge University Press |year=1927 |edition=2nd |language=en}}</ref> Russell and Whitehead thought they could derive all mathematical truth using [[axiom]]s and [[inference rule]]s of formal logic, in principle opening up the process to automation.<ref>{{Cite book |last=Russell |first=Bertrand |url=https://philarchive.org/rec/RUSTPO-77 |title=Principles of Mathematics |date=1931 |publisher=W.W. Norton & Company |location=New York}}</ref> In 1920, [[Thoralf Skolem]] simplified a previous result by [[Leopold Löwenheim]], leading to the [[Löwenheim–Skolem theorem]] and, in 1930, to the notion of a [[Herbrand universe]] and a [[Herbrand interpretation]] that allowed [[satisfiability|(un)satisfiability]] of first-order formulas (and hence the [[Validity (logic)|validity]] of a theorem) to be reduced to (potentially infinitely many) propositional satisfiability problems.<ref>{{cite thesis |first=J. |last=Herbrand |title=Recherches sur la théorie de la démonstration |date=1930 |type=PhD |publisher=University of Paris |url=https://eudml.org/doc/192791 |language=fr}}</ref>


In 1929, [[Mojżesz Presburger]] showed that the [[first-order theory]] of the [[natural numbers]] with addition and equality (now called [[Presburger arithmetic]] in his honor) is [[Decidability (logic)|decidable]] and gave an algorithm that could determine if a given [[sentence (logic)|sentence]] in the [[language (logic)|language]] was true or false.<ref>{{cite journal|last=Presburger|first=Mojżesz|title=Über die Vollständigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt|journal=Comptes Rendus du I Congrès de Mathématiciens des Pays Slaves|year=1929|pages=92–101|location=Warszawa}}</ref><ref name=Davis2001>{{Cite book
In 1929, [[Mojżesz Presburger]] showed that the [[first-order theory]] of the [[natural numbers]] with addition and equality (now called [[Presburger arithmetic]] in his honor) is [[Decidability (logic)|decidable]] and gave an algorithm that could determine if a given [[sentence (logic)|sentence]] in the [[language (logic)|language]] was true or false.<ref>{{cite journal|last=Presburger|first=Mojżesz|title=Über die Vollständigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt|journal=Comptes Rendus du I Congrès de Mathématiciens des Pays Slaves|year=1929|pages=92–101|location=Warszawa}}</ref><ref name=Davis2001>{{Cite book
Line 19: Line 19:
}}</ref>
}}</ref>


However, shortly after this positive result, [[Kurt Gödel]] published ''[[On Formally Undecidable Propositions of Principia Mathematica and Related Systems]]'' (1931), showing that in any sufficiently strong axiomatic system, there are true statements that cannot be proved in the system. This topic was further developed in the 1930s by [[Alonzo Church]] and [[Alan Turing]], who on the one hand gave two independent but equivalent definitions of [[computability]], and on the other gave concrete examples of [[Undecidable problem|undecidable question]]s.
However, shortly after this positive result, [[Kurt Gödel]] published ''[[On Formally Undecidable Propositions of Principia Mathematica and Related Systems]]'' (1931), showing that in any sufficiently strong axiomatic system, there are true statements that cannot be proved in the system.<ref>{{Cite book |last1=Boolos |first1=George S. |url=https://www.cambridge.org/core/books/computability-and-logic/2431FF2C369748EC0EFFF45221D38392 |title=Computability and Logic |last2=Burgess |first2=John P. |last3=Jeffrey |first3=Richard C. |date=2002 |publisher=Cambridge University Press |edition=4 |location=Cambridge |doi=10.1017/CBO9781139164931 |isbn=978-0-521-80975-7 }}</ref> This topic was further developed in the 1930s by [[Alonzo Church]] and [[Alan Turing]], who on the one hand gave two independent but equivalent definitions of [[computability]], and on the other gave concrete examples of [[Undecidable problem|undecidable question]]s.<ref>{{Cite journal |last=Chowdhary |first=K.R. |date=2025 |title=Theory of Computation |url=https://link.springer.com/book/10.1007/978-981-97-6234-7 |journal=SpringerLink |language=en |doi=10.1007/978-981-97-6234-7 |isbn=978-981-97-6233-0 |doi-access=free }}</ref>


== First implementations ==
== First implementations ==
Line 28: Line 28:


== Decidability of the problem ==
== Decidability of the problem ==
{{Unreferenced section|date=April 2010}}
Depending on the underlying logic, the problem of deciding the validity of a formula varies from trivial to impossible. For the common case of [[propositional logic]], the problem is decidable but [[co-NP-complete]], and hence only [[exponential time|exponential-time]] algorithms are believed to exist for general proof tasks.<ref>{{Cite web |title=Computational Complexity: A Modern Approach / Sanjeev Arora and Boaz Barak |url=https://theory.cs.princeton.edu/complexity/ |access-date=2026-01-25 |website=theory.cs.princeton.edu}}</ref> For a [[first-order logic|first-order predicate calculus]], [[Gödel's completeness theorem]] states that the theorems (provable statements) are exactly the semantically valid [[well-formed formula]]s, so the valid formulas are [[computably enumerable]]: given unbounded resources, any valid formula can eventually be proven. However, ''invalid'' formulas (those that are ''not'' entailed by a given theory), cannot always be recognized.<ref>{{Cite book |last=Kleene |first=Stephen Cole |url=https://philpapers.org/rec/KLEML |title=Mathematical Logic |date=1967 |publisher=Dover Publications |location=Mineola, N.Y.}}</ref>
Depending on the underlying logic, the problem of deciding the validity of a formula varies from trivial to impossible. For the common case of [[propositional logic]], the problem is decidable but [[co-NP-complete]], and hence only [[exponential time|exponential-time]] algorithms are believed to exist for general proof tasks. For a [[first-order logic|first-order predicate calculus]], [[Gödel's completeness theorem]] states that the theorems (provable statements) are exactly the semantically valid [[well-formed formula]]s, so the valid formulas are [[computably enumerable]]: given unbounded resources, any valid formula can eventually be proven. However, ''invalid'' formulas (those that are ''not'' entailed by a given theory), cannot always be recognized.


The above applies to first-order theories, such as [[Peano axioms|Peano arithmetic]]. However, for a specific model that may be described by a first-order theory, some statements may be true but undecidable in the theory used to describe the model. For example, by [[Gödel's incompleteness theorem]], we know that any consistent theory whose axioms are true for the natural numbers cannot prove all first-order statements true for the natural numbers, even if the list of axioms is allowed to be infinite enumerable. It follows that an automated theorem prover will fail to terminate while searching for a proof precisely when the statement being investigated is undecidable in the theory being used, even if it is true in the model of interest. Despite this theoretical limit, in practice, theorem provers can solve many hard problems, even in models that are not fully described by any first-order theory (such as the [[integer]]s).
The above applies to first-order theories, such as [[Peano axioms|Peano arithmetic]]. However, for a specific model that may be described by a first-order theory, some statements may be true but undecidable in the theory used to describe the model. For example, by [[Gödel's incompleteness theorem]], we know that any consistent theory whose axioms are true for the natural numbers cannot prove all first-order statements true for the natural numbers, even if the list of axioms is allowed to be infinite enumerable.<ref>{{Citation |last=Raatikainen |first=Panu |title=Gödel's Incompleteness Theorems |date=2026 |encyclopedia=The Stanford Encyclopedia of Philosophy |editor-last=Zalta |editor-first=Edward N. |url=https://plato.stanford.edu/archives/spr2026/entries/goedel-incompleteness/ |access-date=2026-01-25 |edition=Spring 2026 |publisher=Metaphysics Research Lab, Stanford University |editor2-last=Nodelman |editor2-first=Uri}}</ref> It follows that an automated theorem prover will fail to terminate while searching for a proof precisely when the statement being investigated is undecidable in the theory being used, even if it is true in the model of interest. Despite this theoretical limit, in practice, theorem provers can solve many hard problems, even in models that are not fully described by any first-order theory (such as the [[integer]]s).


== Related problems ==
== Related problems ==
Line 46: Line 45:


== Applications ==
== Applications ==
Commercial use of automated theorem proving is mostly concentrated in [[integrated circuit design]] and verification. Since the [[Pentium FDIV bug]], the complicated [[floating point unit]]s of modern microprocessors have been designed with extra scrutiny. [[AMD]], [[Intel]] and others use automated theorem proving to verify that division and other operations are correctly implemented in their processors.<ref>{{Citation |last=Goel |first=Shilpi |title=Microprocessor Assurance and the Role of Theorem Proving |date=2022 |work=Handbook of Computer Architecture |pages=1–43 |editor-last=Chattopadhyay |editor-first=Anupam |url=https://link.springer.com/10.1007/978-981-15-6401-7_38-1 |access-date=2024-02-10 |place=Singapore |publisher=Springer Nature Singapore |language=en |doi=10.1007/978-981-15-6401-7_38-1 |isbn=978-981-15-6401-7 |last2=Ray |first2=Sandip|url-access=subscription }}</ref>
Commercial use of automated theorem proving is mostly concentrated in [[integrated circuit design]] and verification. Since the [[Pentium FDIV bug]], the complicated [[floating point unit]]s of modern microprocessors have been designed with extra scrutiny. [[AMD]], [[Intel]] and others use automated theorem proving to verify that division and other operations are correctly implemented in their processors.<ref>{{Citation |last1=Goel |first1=Shilpi |title=Microprocessor Assurance and the Role of Theorem Proving |date=2022 |work=Handbook of Computer Architecture |pages=1–43 |editor-last=Chattopadhyay |editor-first=Anupam |url=https://link.springer.com/10.1007/978-981-15-6401-7_38-1 |access-date=2024-02-10 |place=Singapore |publisher=Springer Nature Singapore |language=en |doi=10.1007/978-981-15-6401-7_38-1 |isbn=978-981-15-6401-7 |last2=Ray |first2=Sandip|url-access=subscription }}</ref>


Other uses of theorem provers include [[program synthesis]], constructing programs that satisfy a [[formal specification]].<ref>{{cite book |last1=Basin |first1=D. |title=Program Development in Computational Logic |last2=Deville |first2=Y. |last3=Flener |first3=P. |last4=Hamfelt |first4=A. |last5=Fischer Nilsson |first5=J. |publisher=Springer |year=2004 |editor=M. Bruynooghe and K.-K. Lau |series=LNCS |volume=3049 |pages=30&ndash;65 |chapter=Synthesis of programs in computational logic |citeseerx=10.1.1.62.4976}}</ref> Automated theorem provers have been integrated with [[Proof assistant|proof assistants]], including [[Isabelle (proof assistant)|Isabelle/HOL]].<ref>{{Cite journal |last1=Meng |first1=Jia |last2=Paulson |first2=Lawrence C. |date=2008-01-01 |title=Translating Higher-Order Clauses to First-Order Clauses |url=https://doi.org/10.1007/s10817-007-9085-y |journal=Journal of Automated Reasoning |language=en |volume=40 |issue=1 |pages=35–60 |doi=10.1007/s10817-007-9085-y |issn=1573-0670 |s2cid=7716709|url-access=subscription }}</ref>
Other uses of theorem provers include [[program synthesis]], constructing programs that satisfy a [[formal specification]].<ref>{{cite book |last1=Basin |first1=D. |title=Program Development in Computational Logic |last2=Deville |first2=Y. |last3=Flener |first3=P. |last4=Hamfelt |first4=A. |last5=Fischer Nilsson |first5=J. |publisher=Springer |year=2004 |editor=M. Bruynooghe and K.-K. Lau |series=LNCS |volume=3049 |pages=30&ndash;65 |chapter=Synthesis of programs in computational logic |citeseerx=10.1.1.62.4976}}</ref> Automated theorem provers have been integrated with [[Proof assistant|proof assistants]], including [[Isabelle (proof assistant)|Isabelle/HOL]].<ref>{{Cite journal |last1=Meng |first1=Jia |last2=Paulson |first2=Lawrence C. |date=2008-01-01 |title=Translating Higher-Order Clauses to First-Order Clauses |url=https://doi.org/10.1007/s10817-007-9085-y |journal=Journal of Automated Reasoning |language=en |volume=40 |issue=1 |pages=35–60 |doi=10.1007/s10817-007-9085-y |issn=1573-0670 |s2cid=7716709|url-access=subscription }}</ref>
Line 68: Line 67:
* [[Otter (theorem prover)|Otter]], developed at the [[Argonne National Laboratory]], is based on [[first-order resolution]] and [[paramodulation]]. Otter has since been replaced by [[Prover9]], which is paired with [[Mace4]].
* [[Otter (theorem prover)|Otter]], developed at the [[Argonne National Laboratory]], is based on [[first-order resolution]] and [[paramodulation]]. Otter has since been replaced by [[Prover9]], which is paired with [[Mace4]].
* [[SETHEO]] is a high-performance system based on the goal-directed [[model elimination]] calculus, originally developed by a team under direction of [[Wolfgang Bibel]]. E and SETHEO have been combined (with other systems) in the composite theorem prover E-SETHEO.
* [[SETHEO]] is a high-performance system based on the goal-directed [[model elimination]] calculus, originally developed by a team under direction of [[Wolfgang Bibel]]. E and SETHEO have been combined (with other systems) in the composite theorem prover E-SETHEO.
* [[Vampire theorem prover|Vampire]] was originally developed and implemented at [[University of Manchester|Manchester University]] by [[Andrei Voronkov]] and Kryštof Hoder. It is now developed by a growing international team. It has won the FOF division (among other divisions) at the CADE ATP System Competition regularly since 2001.<ref>{{cite web |title=History |url=https://vprover.github.io/history.html |website=vprover.github.io}}</ref>
* [[Vampire theorem prover|Vampire]] was originally developed and implemented at [[University of Manchester|Manchester University]] by [[Andrei Voronkov (computer scientist)|Andrei Voronkov]] and Kryštof Hoder. It is now developed by a growing international team. It has won the FOF division (among other divisions) at the CADE ATP System Competition regularly since 2001.<ref>{{cite web |title=History |url=https://vprover.github.io/history.html |website=vprover.github.io}}</ref>
* Waldmeister is a specialized system for unit-equational first-order logic developed by Arnim Buch and Thomas Hillenbrand. It won the CASC UEQ division for fourteen consecutive years (1997–2010).
* Waldmeister is a specialized system for unit-equational first-order logic developed by Arnim Buch and Thomas Hillenbrand. It won the CASC UEQ division for fourteen consecutive years (1997–2010).
* [[SPASS]] is a first-order logic theorem prover with equality. This is developed by the research group Automation of Logic, [[Max Planck Institute for Computer Science]].
* [[SPASS]] is a first-order logic theorem prover with equality. This is developed by the research group Automation of Logic, [[Max Planck Institute for Computer Science]].