Metabiology: Life as Evolving Software, by G. J. Chaitin


Published in: Education, Technology
  • 1. METABIOLOGY: LIFE AS EVOLVING SOFTWARE
    METABIOLOGY: a field parallel to biology, dealing with the random evolution of artificial software (computer programs) rather than natural software (DNA), and simple enough that it is possible to prove rigorous theorems or formulate heuristic arguments at the same high level of precision that is common in theoretical physics.
  • 2. “The chance that higher life forms might have emerged in this way [by Darwinian evolution] is comparable to the chance that a tornado sweeping through a junkyard might assemble a Boeing 747 from the materials therein.” (Fred Hoyle)
    “In my opinion, if Darwin’s theory is as simple, fundamental and basic as its adherents believe, then there ought to be an equally fundamental mathematical theory about this, that expresses these ideas with the generality, precision and degree of abstractness that we are accustomed to demand in pure mathematics.” (Gregory Chaitin, Speculations on Biology, Information and Complexity)
    “Mathematics is able to deal successfully only with the simplest of situations, more precisely, with a complex situation only to the extent that rare good fortune makes this complex situation hinge upon a few dominant simple factors. Beyond the well-traversed path, mathematics loses its bearings in a jungle of unnamed special functions and impenetrable combinatorial particularities. Thus, the mathematical technique can only reach far if it starts from a point close to the simple essentials of a problem which has simple essentials. That form of wisdom which is the opposite of single-mindedness, the ability to keep many threads in hand, to draw for an argument from many disparate sources, is quite foreign to mathematics.” (Jacob Schwartz, The Pernicious Influence of Mathematics on Science)
    “It may seem natural to think that, to understand a complex system, one must construct a model incorporating everything that one knows about the system. However sensible this procedure may seem, in biology it has repeatedly turned out to be a sterile exercise. There are two snags with it. The first is that one finishes up with a model so complicated that one cannot understand it: the point of a model is to simplify, not to confuse. The second is that if one constructs a sufficiently complex model one can make it do anything one likes by fiddling with the parameters: a model that can predict anything predicts nothing.” (John Maynard Smith & Eörs Szathmáry, The Origins of Life)
  • 3. Course Notes. METABIOLOGY: LIFE AS EVOLVING SOFTWARE. G. J. Chaitin. Draft, October 1, 2010.
  • 4. To my wife Virginia, who played an essential role in this research
  • 5. Contents
    Preface
    1 Introduction: Building a theory
    2 The search for the perfect language
    3 Is the world built out of information? Is everything software?
    4 The information economy
    5 How real are real numbers?
    6 Speculations on biology, information and complexity
    7 Metaphysics, metamathematics and metabiology
    8 Algorithmic information as a fundamental concept in physics, mathematics and biology
    9 To a mathematical theory of evolution and biological creativity
      9.1 Introduction
      9.2 History of Metabiology
      9.3 Modeling Evolution
        9.3.1 Software Organisms
        9.3.2 The Hill-Climbing Algorithm
        9.3.3 Fitness
        9.3.4 What is a Mutation?
  • 6.     9.3.5 Mutation Distance
        9.3.6 Hidden Use of Oracles
      9.4 Model A (Naming Integers) Exhaustive Search
        9.4.1 The Busy Beaver Function
        9.4.2 Proof of Theorem 1 (Exhaustive Search)
      9.5 Model A (Naming Integers) Intelligent Design
        9.5.1 Another Busy Beaver Function
        9.5.2 Improving Lower Bounds on Ω
        9.5.3 Proof of Theorem 2 (Intelligent Design)
      9.6 Model A (Naming Integers) Cumulative Evolution at Random
      9.7 Model B (Naming Functions)
      9.8 Remarks on Model C (Naming Ordinals)
      9.9 Conclusion
    10 Parsing the Turing test
    11 Should mathematics be done differently because of Gödel’s incompleteness theorem?
    Bibliography
    Books by Chaitin
  • 7. Preface
    Biology and mathematics are like oil and water: they do not mix. Nevertheless this course will describe my attempt to express some basic biological principles mathematically. I’ll try to explain the raison d’être of what I call my “metabiological” approach, which studies randomly evolving computer programs rather than biological organisms.
    I want to thank a number of people and organizations for inviting me to lecture on metabiology; the interaction with audiences was extremely stimulating and helped these ideas to evolve.
    Firstly, I thank the IBM Watson Research Center, Yorktown Heights, where I gave two talks on this, including the world premiere talk on metabiology. Another talk on metabiology in the United States was at the University of Maine.
    In Argentina I thank Veronica Becher of the University of Buenos Aires and Victor Rodriguez of the University of Cordoba for their kind invitations. And I am most grateful to the University of Cordoba, currently celebrating its 400th anniversary, for the honorary doctorate that they were kind enough to bestow on me.
    In Chile I spoke on metabiology several times at the Valparaiso Complex Systems Institute, and in Brazil I included metabiology in courses I gave at the Federal University of Rio de Janeiro and in a talk at the Federal University in Niteroi.
    Furthermore I thank Bernd-Olaf Küppers for inviting me to a very stimulating meeting at his Frege Centre for Structural Sciences at the University of Jena.
    And I thank Ilias Kotsireas for organizing a Chaitin-in-Ontario lecture series in 2009 in the course of which I spoke on metabiology at the University of Western Ontario in London, at the Institute for Quantum Computing in Waterloo, and at the Fields Institute at the University of Toronto.
  • 8. Chaitin: Metabiology
    The chapter of this book on Ω is based on a talk I gave at Wilfrid Laurier University in Waterloo.
    Finally, I should mention that the chapter on “The Search for the Perfect Language” was first given as a talk at the Hebrew University in Jerusalem in 2008, then at the University of Campinas in Brazil, and finally at the Perimeter Institute in Waterloo, Canada.
    The chapter on “Is Everything Software?” was originally a talk at the Technion in Haifa, where I also spoke on metabiology at the University of Haifa, one of a series of talks I gave there as the Rothschild Distinguished Lecturer for 2010.
    These were great audiences, and their questions and suggestions were extremely valuable.
    Gregory Chaitin, August 2010
  • 9. Chapter 1. Introduction: Building a theory
    • This is a course on biology that will spend a lot of time discussing Kurt Gödel’s famous 1931 incompleteness theorem on the limits of formal mathematical reasoning. Why? Because in my opinion the ultimate historical perspective on the significance of incompleteness may be that Gödel opens the door from mathematics to biology.
    • We will also spend a lot of time discussing computer programs and software for doing mathematical calculations. How come? Because DNA is presumably a universal programming language, which is a language that is rich enough that it can express any algorithm. The fact that DNA is such a powerful programming language is a more fundamental characteristic of life than mere self-reproduction, which anyway is never exact, for if it were, there would be no evolution.
    • Now a few words on the kind of mathematics that we shall use in this course. Starting with Newton, mathematical physics is full of what are called ordinary differential equations, and starting with Maxwell, partial differential equations become more and more important. Mathematical physics is full of differential equations, that is, continuous mathematics. But that is not the kind of mathematics that we shall use here. The secret of life is not a differential equation. There is no differential equation for your spouse, for an organism, or for biological evolution. Instead we shall concentrate on the fact that DNA is the software; it’s the programming language for life.
    • It is true that there are (ordinary) differential equations in a highly successful mathematical theory of evolution, Wright-Fisher-Haldane population genetics. But population genetics does not say where new genes come from; it assumes a fixed gene pool and discusses the change of gene frequencies in response to selective pressure, not biological creativity and the major transitions in evolution, such as the transition from unicellular to multicellular organisms, which is what interests us.
  • 10. • If we are no longer going to use the differential equations that populate mathematical physics, what kind of math are we going to use? It will be discrete math, new math, the math of the 20th century dealing with computation, with algorithms. It won’t be traditional continuous math, it won’t be the calculus. As Dorothy says in The Wizard of Oz, “Toto, we’re not in Kansas anymore!”
    More in line with our life-as-evolving-software viewpoint are three hot new topics in 20th century mathematics: computation, information and complexity. These have expanded into entire theories, called computability theory, information theory and complexity theory, theories which superficially appear to have little or no connection with biology.
    In particular, our basic tool in this course will be algorithmic information theory (AIT), a mixture of Turing computability theory with Shannon information theory, which features the concept of program-size complexity. The author was one of the people who created this theory, AIT, in the mid-1960s and then further developed it in the mid-1970s; the theory of evolution presented in this course could have been done then, since all the necessary tools were available.
    Why then the delay of 35 years? My apologies; I got distracted working on computer engineering and thinking about metamathematics.
    I had published notes on biology occasionally on and off since 1969, but I couldn’t find the right way of thinking about biology; I couldn’t figure out how to formulate evolution mathematically in a workable manner. Once I discovered the right way, this new theory I call metabiology went from being a gleam in my eye to a full-fledged mathematical theory in just two years.
    • Also, it would be nice to be able to show that in our toy model hierarchical structure will evolve, since that is such a conspicuous feature of biological organisms.
  • 11. Introduction: Building a theory
    What kind of math can we use for that? Well, there are places in pure math and in software engineering where you get hierarchical structures: in Mandelbrot fractals, in Cantor transfinite ordinal numbers, in hierarchies of fast growing functions, and in software levels of abstraction. Fractals are continuous math and therefore not suitable for our discrete models, but the three others are genuine possibilities, and we shall discuss them all. One of our models of evolution does provably exhibit hierarchical structure.
    • Here is the big challenge: Biology is extremely complicated, and every rule has exceptions. How can mathematics possibly deal with this? We will outline an indirect way to deal with it, by studying a toy model I call metabiology (= life as evolving software, computer program organisms, computer program mutations), not the real thing. We are using Leibnizian math, not Newtonian math.
    By modeling life as software, as computer programs, we get a very rich space of possible designs for organisms, and we can discuss biological creativity = where new genes come from (where new biological ideas such as multicellular organization come from), not just changes in gene frequencies in a population as in conventional evolutionary models.
    • Some simulations of evolution on the computer (in silico, as contrasted with in vivo, in the organism, and in vitro, in the test tube) such as Tierra and Avida do in fact model organisms as software. But in these models there is only a limited amount of evolution followed by stagnation.[1]
    Furthermore I do not run my models on a computer, I prove theorems about them. And one of these theorems is that evolution will continue indefinitely, that biological creativity (or what passes for it in my model) is endless, unceasing.
    • The main theme of Darwinian evolution is competition, survival of the fittest, “Nature red in tooth and claw.” The main theme of my model is creativity: Instead of a population of individuals competing ferociously with each other in order to spread their individual genes (as in Richard Dawkins’ The Selfish Gene), instead of a jungle, my model is like an individual Buddhist trying to attain enlightenment, a monk who is on the path to enlightenment; it is like a mystic or a kabbalist who is trying to get closer and closer to God.
    [1] As for genetic algorithms, they are intended to “stagnate” when they achieve an optimal solution to an engineering design problem; such a solution is a fixed point of the process of simulated evolution used by genetic algorithms.
  • 12. More precisely, the single mutating organism in my model attains greater and greater mathematical knowledge by discovering more and more of the bits of Ω, which is, as we shall see in the part of the course on Ω, Course Topic 5, a very concentrated form of mathematical knowledge, of mathematical creativity. My organisms strive for greater mathematical understanding, for purely mathematical enlightenment. I model where new mathematical knowledge is coming from, where new biological ideas are coming from; it is this process that I model and prove theorems about.
    • But my model of a single mutating organism is indeed Darwinian: I have a single organism that is subjected to completely random mutations until a fitter organism is found, which then replaces my original organism, and this process continues indefinitely. The key point is that in my model progress comes from combining random mutations and having a fitness criterion (which is my abstract encapsulation of both competition and the environment). The key point in Darwin’s theory was to replace God by randomness; organisms are not designed, they emerge at random, and that is also the case in my highly simplified toy model.
    • Does this highly abstract game have any relevance to real biology? Probably not, and if so, only very, very indirectly. It is mathematics itself that benefits most, because we begin to have a mathematical theory inspired by Darwin, to have mathematical concepts that are inspired by biology.
    The fact that I can prove that evolution occurs in my model does not in any way constitute a proof that Darwinians are correct and Intelligent Design partisans are mistaken. But my work is suggestive and it does clarify some of the issues, by furnishing a toy model that is much easier to analyze than the real thing. The real thing is what is actually taking place in the biosphere, not in my toy model, which consists of arbitrary mutation computer programs operating on arbitrary organism computer programs.
  • 13. • More on creativity, a key word in my model: Something is mechanical if there is an algorithm for doing it; it is creative if there is no such algorithm.
    This notion of creativity is basic to our endeavor, and it comes from the work of Gödel on the incompleteness of formal axiomatic theories, and from the work of Turing on the unsolvability of the so-called halting problem.[2] Their work shows that there are no absolutely general methods in mathematics and theoretical computer science, that creativity is essential, a conclusion that Paul Feyerabend with his book Against Method would have loved had he been aware of it: What Feyerabend espouses for philosophical reasons is in fact a theorem, that is, is provably correct in the field of mathematics.
    So before we get to Darwin, we shall spend a lot of time in this course with Gödel and Turing and the like, preparing the groundwork for our model of evolution. Without this historical background it is impossible to appreciate what is going on in our model.
    • My model therefore mixes mathematical creativity and biological creativity. This is both good and bad. It’s bad, because it distances my model from biology. But it is good, because mathematical creativity is a deep mathematical question, a fundamental mystery, a big unknown, and therefore something important to think about, at least for mathematicians, if not for biologists.
    Further distancing my model from biology, my model combines randomness, a very Darwinian feature, with Turing oracles, which have no counterpart in biology; we will discuss this in due course.
    Exploring such models of randomly evolving software may well develop into a new field of mathematics.
    Hopefully this is just the beginning, and metabiology will develop and will have more connection with biology in the future than it has at present.
    [2] Turing’s halting problem is the question of deciding whether or not a computer program that is self-contained, without any input, will run forever, or will eventually finish.
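    The asymmetry behind the halting problem can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not part of the course: "programs" are modeled here as Python generators that yield once per step, and the names `countdown`, `loop_forever` and `halts_within` are hypothetical. Running a program for a bounded number of steps can confirm that it halts, but failing to halt within the bound tells us nothing.

```python
def countdown(n):
    """Hypothetical input-free 'program' that halts after n steps."""
    while n > 0:
        n -= 1
        yield  # one step of computation


def loop_forever():
    """Hypothetical 'program' that never halts."""
    while True:
        yield


def halts_within(program, max_steps):
    """Run `program` for at most `max_steps` steps.

    Returns True if it finished within the budget, and None (unknown)
    otherwise. It never returns False: a bounded run cannot show that
    a program runs forever.
    """
    steps = program()
    for _ in range(max_steps):
        try:
            next(steps)
        except StopIteration:
            return True
    return None


if __name__ == "__main__":
    print(halts_within(lambda: countdown(3), 10))  # halting confirmed
    print(halts_within(lambda: countdown(3), 3))   # budget too small: unknown
    print(halts_within(loop_forever, 1000))        # unknown, however large the budget
```

    Raising the step budget turns more "unknown" answers into "halts", but no finite budget settles every case; that open-endedness is what the footnote's question is about.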
  • 14. • The main difference between our model and the DNA software in real organisms is their time complexity: the amount of time the software can run. I can prove elegant theorems because in my model the time allowed for a program to run is finite, but unlimited. Real DNA software must run quickly: 9 months to produce a baby, 70 years in total, more or less.
    A theory of the evolution of programs with such limited time complexity, with such limited run time, would be more realistic but it will not contain the neat results we have in our idealized version of biology.
    This is similar to the thermodynamic arguments which are taken in the “thermodynamic limit” of large amounts of time, in order to obtain more clear-cut results, when discussing the ideal performance of heat engines (e.g., steam engines). Indeed, AIT is a kind of thermodynamics of computation, with program-size complexity replacing entropy. Instead of applying to heat engines and telling us their ideal efficiency, AIT does the same for computers, for computations.
    You have to go far from everyday biology to find beautiful mathematical structure.
    • It should be emphasized that metabiology is Work in Progress. It may be mistaken. And it is certainly not finished yet. We are building a new theory. How do you create a theory?
    “Beauty” is the guide. And this course will give a history of ideas for metabiology with plenty of examples. An idea is beautiful when it illuminates you, when it connects everything, when you ask yourself, “Why didn’t I see that before!,” when in retrospect it seems obvious.
    AIT has two such ideas: the idea of looking at the size of a computer program as a complexity measure, and the idea of self-delimiting programs. Metabiology has two more beautiful ideas: the idea of organisms as arbitrary programs with a difficult mathematical problem to solve, and the idea of mutations as arbitrary programs that operate on an organism to produce a mutated organism.
Once you have these ideas, the rest is just uninspired routine work, lots of hard work, but that’s all. In this course we shall discuss all four of these beautiful ideas, which were the key inspirations required for creating AIT and metabiology.
  • 15. Routine work is not enough, you need a spark from God. And mostly you need an instinct for mathematical beauty, for sensing an idea that can be developed, for the importance of an idea. That is, more than anything else, a question of aesthetics, of intuition, of instinct, of judgement, and it is highly subjective.
    I will try my best to explain why I believe in these ideas, but just as in artistic taste, there is no way to convince anyone. You either feel it somewhere deep in your soul or you don’t. There is nothing more important than experiencing beauty; it’s a glimpse of transcendence, a glimpse of the divine, something that fewer and fewer people believe in nowadays. But without that we are mere machines.
    And I may have the beginnings of a mathematical theory of evolution and biological creativity, but a mathematical theory of beauty is nowhere in sight.
    • Incompleteness goes from being threatening to provoking creativity and being applied in order to keep our organisms evolving indefinitely. Evolution stagnates in most models because the organisms achieve their goals. In my model the organisms are asked to achieve something that can never be fully achieved because of the incompleteness phenomenon. So my organisms keep getting better and better at what they are doing; they can never stop, because stopping would mean that they had a complete answer to a math problem to which incompleteness applies. Indeed, the three mathematical challenges that my organisms face, naming large integers, fast growing functions, and large transfinite ordinals, are very concrete, tangible examples of the incompleteness phenomenon, which at first seemed rather mysterious.
    Incompleteness is the reason that our organisms have to keep evolving forever, as they strive to become more and more complete, less and less incomplete...
    Incompleteness keeps our model of evolution from stagnating; it gives our organisms a mission, a raison d’être. You have to go beyond incompleteness; incompleteness gives rise to creativity and evolution. Incompleteness sounds bad, but the other side of the coin is creativity and evolution, which are good.
    Now we give an outline of the course, consisting of Course Topics 1–9:
    1. This introduction.
  • 16. 2. The Search for the Perfect Language. (My talk at the Perimeter Institute in Waterloo.)
    Umberto Eco, Lull, Leibniz, Cantor, Russell, Hilbert, Gödel, Turing, AIT, Ω. Kabbalah, Key to Universal Knowledge, God-like Power of Creation, the Golem!
    Mathematical theories are all incomplete (Gödel, Turing, Ω), but programming languages are universal. Most concise programming languages, self-delimiting programs.
    3. Is the world built out of information? Is everything software? (My talk at the Technion in Haifa.)
    Physics of information: Quantum Information Theory; general relativity and black holes, Bekenstein bound, holographic principle = every physical system contains a finite number of bits of information that grows as the surface area of the physical system, not as its volume (Lee Smolin, Three Roads to Quantum Gravity); derivation of Einstein’s field equations for gravity from the thermodynamics of black holes (Ted Jacobson, “Thermodynamics of Spacetime: The Einstein Equation of State”).
    The first attempt to construct a truly fundamental mathematical model for biology: von Neumann self-reproducing automata in a cellular automata world, a world in which magic works, a plastic world.
    See also: Edgar F. Codd, Cellular Automata. Konrad Zuse, Rechnender Raum (Calculating Space). Fred Hoyle, Ossian’s Ride. Freeman Dyson, The Sun, the Genome, and the Internet (1999), green technology. Craig Venter, genetic engineering, synthetic life.
    Technological applications: Seeds for houses, seeds for jet planes! Plant the seed in the earth; just add water and sunlight. Universal constructors, 3D printers = matter printers = printers for objects. Flexible manufacturing. Alchemy, Plastic reality.
    4. Artificial Life: Evolution Simulations.
    Low Level: Thomas Ray’s Tierra, Christoph Adami’s Avida, Walter Fontana’s ALchemy (Algorithmic Chemistry), Genetic algorithms.
    High Level: Exploratory concept-formation based on examining lots of examples in elementary number theory, experimental math with no proofs: Douglas Lenat (1984), “Automated theory formation in mathematics,” AM.
    After a while these stop evolving. What about proofs instead of simulations? Seems impossible (see the frontispiece quotes facing the title page, especially the one by Jacob Schwartz), but there is hope. See Course Topic 5 arguing that Ω is provably a bridge from math to biology.
  • 17. 5. How Real Are Real Numbers? A History of Ω. (My talk at WLU in Waterloo.)
    Course Topic 3 gives physical arguments against real numbers, and this course topic gives mathematical arguments against real numbers. These considerations about paradoxical real numbers will lead us straight to the halting probability Ω. That is not how Ω was actually discovered, but it is the best way of understanding Ω. It’s a Whig history: how it should have been, not how it actually was.
    The irreducible complexity real number Ω proves that math is more biological than biology; this is the first real bridge between math and biology. Biology is extremely complicated, and pure math is infinitely complicated.
    The theme of Ω as concentrated mathematical creativity is introduced here; this is important because Ω is the organism that emerges through random evolution in Course Topic 8.
    Now let’s get to work in earnest to build a mathematical theory of evolution and biological creativity.
    6. Metabiology: Life as Evolving Software.
    Stephen Wolfram, NKS: the origin of life as the physical implementation of a universal programming language; the Ubiquity of Universality. François Jacob, bricolage, Nature is a cobbler, a tinkerer. Neil Shubin, Your Inner Fish. Stephen Gould, Wonderful Life, on the Cambrian explosion of body designs. Murray Gell-Mann, frozen accidents. Ernst Haeckel, ontogeny recapitulates phylogeny. Evo-devo.
    Note that a small change in a computer program (one bit!) can completely wreck it. But small changes can also make substantial improvements.
    This is a highly nonlinear effect, like the famous butterfly effect of chaos theory (see James Gleick’s Chaos).
  • 18. Over the history of this planet, covering the entire surface of the earth, there is time to try many small changes. But not enough time according to the Intelligent Design book Signature in the Cell. In the real world this is still controversial, but in my toy model evolution provably works.
    A blog summarized one of my talks on metabiology like this: “We are all random walks in program space!” That’s the general idea; in Course Topics 7 and 8 we fill in the details of this new theory.
    7. Creativity in Mathematics.
    We need to challenge our organisms into evolving. We need to keep them from stagnating. These problems can utilize an unlimited amount of mathematical creativity:
      • Busy Beaver problem: Naming large integers: $10^{10}$, $10^{10^{10}}$, ...
      • Naming fast-growing functions: $N^2$, $2^N$, ...
      • Naming large transfinite Cantor ordinals: $\omega$, $\omega^2$, $\omega^\omega$, ...
    8. Creativity in Biology.
    Single mutating organism. Hill-climbing algorithm on a fitness landscape. Hill-climbing random walks in software space. Evolution of mutating software. What is a mutation? Exhaustive search. Intelligent design. Cumulative evolution at random. Ω as concentrated creativity, Ω as an evolving organism. Randomness yields intelligence.
    We have a proof that evolution works, at least in this toy model; in fact, surprisingly it is nearly as fast as intelligent design, as deliberately choosing the mutations in the best possible order. But can we show that random evolution is slower than intelligent design? Otherwise the theory collapses onto a point; it cannot distinguish, it does not make useful distinctions. We also get evolution of hierarchical structure in non-universal programming languages.
    So we seem to have evolution at work in these toy models. But to what extent is this relevant to real biological systems?
    9. Conclusion: On the plasticity of the world. Is the universe mental?
    Speculation on where all this might possibly lead.
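    The hill-climbing loop named in Course Topic 8 (a single organism, random mutations, replacement only when a fitter organism appears) can be sketched in a few lines of Python. This is a minimal sketch under loud assumptions, not the model of the course: the organisms here are lists of operations in a tiny, non-universal toy language (`OPS` is hypothetical), mutations are random single edits rather than arbitrary programs, no oracle is involved, and a length cap (`max_length`) is added only to keep the integers manageable. Fitness is the integer an organism names, in the spirit of the "naming large integers" challenge.

```python
import random

# Hypothetical toy language: an organism is a list of operation names that
# are applied in order to the starting value 1. It is NOT a universal
# programming language, unlike the organisms discussed in the text.
OPS = {
    "inc": lambda n: n + 1,
    "double": lambda n: 2 * n,
    "square": lambda n: n * n,
}


def fitness(organism):
    """The integer the organism names: run its operations starting from 1."""
    value = 1
    for op in organism:
        value = OPS[op](value)
    return value


def mutate(organism, rng):
    """Return a randomly edited copy: insert, replace, or delete one operation."""
    mutant = list(organism)
    kind = rng.choice(["insert", "replace", "delete"])
    if kind == "insert" or not mutant:
        mutant.insert(rng.randrange(len(mutant) + 1), rng.choice(list(OPS)))
    elif kind == "replace":
        mutant[rng.randrange(len(mutant))] = rng.choice(list(OPS))
    else:
        del mutant[rng.randrange(len(mutant))]
    return mutant


def evolve(steps, max_length=8, seed=0):
    """Hill-climb: try random mutations, keep a mutant only if it is fitter."""
    rng = random.Random(seed)
    organism = []                   # the empty program names the integer 1
    history = [fitness(organism)]   # fitness after each accepted mutation
    for _ in range(steps):
        mutant = mutate(organism, rng)
        if len(mutant) <= max_length and fitness(mutant) > fitness(organism):
            organism = mutant
            history.append(fitness(organism))
    return organism, history


if __name__ == "__main__":
    best, history = evolve(steps=200)
    print("accepted mutations:", len(history) - 1)
    print("final organism:", best)
```

    Because this toy language is capped in length, the sketch stagnates once it reaches the best organism of that size, which is the behaviour the outline attributes to most simulations; the course's models avoid it by posing a problem that can never be fully solved.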
  • 19. Chapter 2. The search for the perfect language
    I will tell how the story given in Umberto Eco’s book The Search for the Perfect Language continues with modern work on logical and programming languages. Lecture given Monday, 21 September 2009, at the Perimeter Institute for Theoretical Physics in Waterloo, Canada.[1]
    Today I’m not going to talk much about Ω. I will focus on that at Wilfrid Laurier University tomorrow. And if you want to hear a little bit about my current enthusiasm, which is what I’m optimistically calling metabiology (it’s a field with a lovely name and almost no content at this time), that’s on Wednesday at the Institute for Quantum Computing.
    I thought it would be fun here at the Perimeter Institute to repeat a talk, to give a version of a talk, that I gave in Jerusalem a year ago. To understand the talk it helps to keep in mind that it was first given in Jerusalem. I’d like to give you a broad sweep of the history of mathematical logic. I’m a mathematician who likes physicists; some mathematicians don’t like physicists. But I do. Before I became a mathematician I wanted to be a physicist.
    So I’m going to talk about mathematics, and I’d like to give you a broad overview, most definitely a non-standard view of some intellectual history. It will be a talk about the history of work on the foundations of mathematics as seen from the perspective of the Middle Ages.
    [1] This lecture was published in Portuguese in São Paulo, Brazil, in the magazine Dicta & Contradicta, No. 4, 2009. See
  • 20. So here goes...
    This talk = Umberto Eco + Hilbert, Gödel, Turing...
    Outline at:
    There is a wonderful book by Umberto Eco called The Search for the Perfect Language, and I recommend it highly to all of you.
    In The Search for the Perfect Language you can see that Umberto Eco likes the Middle Ages; I think he probably wishes we were still there. And this book talks about a dream that Eco believes played a fundamental role in European intellectual history, which is the search for the perfect language.
    What is the search for the perfect language? Nowadays a physicist would call this the search for a Theory of Everything (TOE), but in the terms in which it was formulated originally, it was the idea of finding, shall we say, the language of creation, the language before the Tower of Babel, the language that God used in creating the universe, the language whose structure directly expresses the structure of the world, the language in which concepts are expressed in their direct, original format.
    You can see that this idea is a little bit like the attempt to find a foundational Theory of Everything in physics.
    The crucial point is that knowing this language would be like having a key to universal knowledge. If you’re a theologian, it would bring you closer, very close, to God’s thoughts, which is dangerous. If you’re a magician, it would give you magical powers. If you’re a linguist, it would tell you the original, pure, uncorrupted language from which all languages descend. One can go on and on...
    This very fascinating book is about the quest to find this language. If you find it, you’re opening a door to absolute knowledge, to God, to the ultimate nature of reality, to whatever.
    And there are a lot of interesting chapters in this intellectual history. One of them is Raymond Lull, around 1200, a Catalan.
Raymond Lull ≈ 1200
He was a very interesting gentleman who had the idea of mechanically combining all possible concepts to get new knowledge. So you would have a wheel with different concepts on it, and another wheel with other concepts on it, and you would rotate them to get all possible combinations. This would be
  • 21. a systematic way to discover new concepts and new truths. And if you remember Swift’s Gulliver’s Travels, there Swift makes fun of an idea like this, in one of the parts of the book that is not for children but definitely only for adults.
Let’s leave Lull and go on to Leibniz. In The Search for the Perfect Language there is an entire chapter on Leibniz. Leibniz is a transitional figure in the search for the perfect language. Leibniz is wonderful because he is universal. He knows all about Kabbalah, Christian Kabbalah and Jewish Kabbalah, and all kinds of hermetic and esoteric doctrines, and he knows all about alchemy; he actually ghost-authored a book on alchemy. Leibniz knows about all these things, and he knows about ancient philosophy, he knows about scholastic philosophy, and he also knows about what was then called mechanical philosophy, which was the beginning of modern science. And Leibniz sees good in all of this.
And he formulates a version of the search for the perfect language, which is firmly grounded in the magical, theological original idea, but which is also fit for consumption nowadays, that is, acceptable to modern ears, to contemporary scientists. This is a universal language he called the characteristica universalis that was supposed to come with a crucial calculus ratiocinator.
Leibniz: characteristica universalis, calculus ratiocinator
The idea, the goal, is that you would reduce reasoning to calculation, to computation, because the most certain thing is that 2 + 5 = 7. In other words, the way Leibniz put it, perhaps in one of his letters, is that if two people have an intellectual dispute, instead of dueling they could just sit down and say, “Gentlemen, let us compute!”, and get the correct answer and find out who was right.
So this is Leibniz’s version of the search for the perfect language. How far did he get with this?
Well, Leibniz is a person who gets bored easily, and flies like a butterfly from field to field, throwing out fundamental ideas, rarely taking the trouble to develop them fully. One case of the characteristica universalis that Leibniz did develop is called the calculus. This is one case where Leibniz worked out his ideas for the perfect language in beautiful detail. Leibniz’s version of the calculus differs from Newton’s precisely because it is part of Leibniz’s project for the characteristica universalis. Christian Huygens hated the calculus.
  • 22. Christian Huygens taught Leibniz mathematics in Paris at a relatively late age, when Leibniz was in his twenties. Most mathematicians start very, very young. And Christian Huygens hated Leibniz’s calculus because he said that it was mechanical, it was brainless: Any fool can just calculate the answer by following the rules, without understanding what he or she is doing.
Huygens preferred the old, synthetic geometry proofs where you have to be creative and come up with a diagram and some particular reason for something to be true. Leibniz wanted a general method. He wanted to get the formalism, the notation, right, and have a mechanical way to get the answer. Huygens didn’t like this, but that was precisely the point. This was precisely what Leibniz was looking for, for everything!
The idea was that if you get absolute truth, if you have found the truth, it should mechanically enable you to determine what’s going on, without creativity. This is good, this is not bad. This is also precisely how Leibniz’s version of the calculus differed from Newton’s. Leibniz saw clearly the importance of having a formalism that led you automatically to the answer.
Let’s now take a big jump, to David Hilbert, about a century ago. . . No, first I want to tell you about an important attempt to find the perfect language: Cantor’s theory of infinite sets.
Cantor: Infinite Sets
This late 19th century theory is interesting because it’s firmly in the Middle Ages and also, in a way, the inspiration for all of 20th century mathematics. This theory of infinite sets was actually theology. This is mathematical theology. Normally you don’t mention that fact. To be a field of mathematics, the price of admission is you throw out all the philosophy, and you just end up with something technical. So all the theology has been thrown out. But Cantor’s goal was to understand God. God is transcendent.
The theory of infinite sets has this hierarchy of bigger and bigger infinities, the alephs, the ℵ’s. You have $\aleph_0$, $\aleph_1$, the infinity of integers, of real numbers, and you keep going. Each one of these is the set of all subsets of the previous one. And very far out you get mind-boggling infinities like $\aleph_\omega$; this is the first infinity after $\aleph_0, \aleph_1, \aleph_2, \aleph_3, \aleph_4, \ldots$
  • 23. Then you can continue with ω + 1, ω + 2, ω + 3 . . . 2ω + 1, 2ω + 2, 2ω + 3 . . . These so-called ordinal numbers are subscripts for the ℵ’s, which are cardinalities. Let’s go farther:
$\aleph_{\omega^2}, \aleph_{\omega^\omega}, \aleph_{\omega^{\omega^\omega}}, \ldots$
And there’s an ordinal called epsilon-nought
$\varepsilon_0 = \omega^{\omega^{\omega^{\omega^{\cdots}}}}$
which is the smallest solution of the equation $x = \omega^x$. And the corresponding cardinal $\aleph_{\varepsilon_0}$ is pretty big!
You know, God is very far off, since God is infinite and transcendent. We can try to go in His direction. But we’re never going to get there, because after every cardinal, there’s a bigger one, the cardinality of the set of all subsets. And after any infinite sequence of cardinals that you get, you just take the union of all of that, and you get a bigger cardinal than is in the sequence. So this thing is inherently open-ended. And contradictory, by the way!
There’s only one problem. This is absolutely wonderful, breath-taking stuff. The only problem is that it’s contradictory.
The problem is very simple. If you take the universal set, the set of everything, and you consider the set of all its subsets, by Cantor’s diagonal argument this should have a bigger cardinality, but how can you have anything bigger than the set of everything?
This is the paradox that Bertrand Russell discovered. Russell looked at this and asked why do you get this bad result. And if you look at the Cantor diagonal argument proof that the set of all subsets of everything is bigger than everything, it involves the set of all sets that are not members of themselves, $\{x : x \notin x\}$,
  • 24. which can neither be in itself nor not be in itself. This is called the Russell paradox.
Cantor was aware of the fact that this happens, but Cantor wasn’t bothered by these contradictions, because he was doing theology. We’re finite but God is infinite, and it’s paradoxical for a finite being to try to comprehend a transcendent, infinite being, so paradoxes are okay. But the math community is not very happy with a theory which leads to contradictions. However, these ideas are so wonderful, that what the math community has done is forget about all this theology and philosophy and try to sweep the contradictions under the rug. There is an expurgated version of all this called Zermelo-Fraenkel set theory, with the axiom of choice, usually: ZFC. This is a formal axiomatic theory which you develop using first-order logic, and it is an expurgated version of Cantor’s theory believed not to contain any paradoxes.
Anyway, Bertrand Russell was inspired by all of this to attempt a general critique of mathematical reasoning, and to find a lot of contradictions, a lot of mathematical arguments that lead to contradictions.
Bertrand Russell: mathematics is full of contradictions.
I already told you about his most famous one, the Russell paradox. Russell was an atheist who was searching for the absolute, who believed in absolute truth. And he loved mathematics and wanted mathematics to be perfect. Russell went around telling people about these contradictions in order to try to get them fixed.
Besides the paradox that there’s no biggest cardinal and that the set of subsets of everything is bigger than everything, there’s also a problem with the ordinal numbers that’s called the Burali-Forti paradox, namely that the set of all the ordinals is an ordinal that’s bigger than all the ordinals. This works because each ordinal can be defined as the set of all the ordinals that are smaller than it is.
(Then an ordinal is less than another ordinal if and only if it is contained in it.) Russell is going around telling people that reason leads to contradictions. So David Hilbert about a century ago proposes a program to put mathematics on a firm foundation. And basically what Hilbert proposes is the idea of a completely formal axiomatic theory, which is a modern version of Leibniz’s characteristica universalis and calculus ratiocinator: David Hilbert: mathematics is a formal axiomatic theory.
  • 25. This is the idea of making mathematics totally objective, of removing all the subjective elements. So in such a formal axiomatic theory you would have a finite number of axioms, axioms that are not written in an ambiguous natural language. Instead you use a precise artificial language with a simple, regular artificial grammar. You use mathematical logic, not informal reasoning, and you specify the rules of the game completely precisely. It should be mechanical to decide whether a proof is correct.
Hilbert was a conservative. He believed that mathematics gives absolute truth, which is an idea from the Middle Ages. You can see the Middle Ages whenever you mention absolute truth. Nevertheless, modern mathematicians remain enamored with absolute truth. As Gödel said, we pure mathematicians are the last holdout of the Middle Ages. We still believe in the Platonic world of ideas, at least mathematical ideas, when everyone else, including philosophers, now laughs at this notion. But pure mathematicians live in the Platonic world of ideas, even though everyone else stopped believing in this a long time ago.
So math gives absolute truth, said Hilbert. Every mathematician somewhere deep inside believes this. Then there ought to exist a finite set of axioms, and a precise set of rules for deduction, for inference, such that all of mathematical truth is a consequence of these axioms. You see, if mathematical truth is black or white, and purely objective, then if you fill in all the steps in a proof and carefully use an artificial language to avoid ambiguity, you should be able to have a finite set of axioms we can all agree on, that in principle enable you to deduce all of mathematical truth. This is just the notion that mathematics provides absolute certainty; Hilbert is analyzing what this means.
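To make "mechanical to decide whether a proof is correct" concrete, here is a minimal Python sketch. It is not Hilbert's system or anything from the lecture. It assumes a toy formal system (the well-known MIU string puzzle) with one axiom and four rewrite rules, and the names `AXIOMS`, `successors`, `is_valid_proof` and `theorems_by_depth` are made up for this illustration. The point is only that checking a proof, and listing the consequences of the axioms by proof length, requires no understanding of what the strings mean.

```python
# Toy formal system (the MIU puzzle), assumed here purely for illustration.
AXIOMS = {"MI"}

def successors(s):
    """All strings obtainable from s by applying one inference rule once."""
    out = set()
    if s.endswith("I"):                 # Rule 1: xI -> xIU
        out.add(s + "U")
    if s.startswith("M"):               # Rule 2: Mx -> Mxx
        out.add(s + s[1:])
    for i in range(len(s) - 2):         # Rule 3: replace any III by U
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):         # Rule 4: delete any UU
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out

def is_valid_proof(steps):
    """A proof is valid if it starts at an axiom and every later line
    follows from the line just before it by a single rule."""
    if not steps or steps[0] not in AXIOMS:
        return False
    return all(b in successors(a) for a, b in zip(steps, steps[1:]))

def theorems_by_depth(max_depth):
    """levels[k] holds the theorems first reached by a k-step proof."""
    seen = set(AXIOMS)
    levels = [set(AXIOMS)]
    for _ in range(max_depth):
        frontier = set()
        for s in levels[-1]:
            frontier |= successors(s)
        frontier -= seen
        seen |= frontier
        levels.append(frontier)
    return levels

print(is_valid_proof(["MI", "MII", "MIIII", "MUI"]))  # a correct 3-step proof
print(is_valid_proof(["MI", "MU"]))                   # not a legal step
print(theorems_by_depth(2))
```

The checker never asks whether a line is "true"; it only asks whether each line matches a rule, which is the sense in which proof checking is mechanical in a formal axiomatic theory.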
What Hilbert says is that the traditional view that mathematics provides absolute certainty, that in the Platonic world of pure mathematics everything is black or white, means that there should be a single formal axiomatic theory for all of math. That was a very important idea of his. An important consequence of this idea goes back to the Middle Ages. This perfect language for mathematics, which is what Hilbert was looking for, would in fact give a key to absolute knowledge, because in principle you could mechanically deduce all the theorems from the axioms, simply by running through the tree of all possible proofs. You start with the axioms, then you apply the rules of inference once, and get all the theorems that have one-step proofs, you apply them two times, and you get all the theorems that
  • 26. have two-step proofs, and like that, totally mechanically, you would get all of mathematical truth, by systematically traversing the tree of all possible proofs.
This would not put all mathematicians out of work, not at all. In practice this process would take an outrageous amount of time to get to interesting results, and all the interesting theorems would be overwhelmed by uninteresting theorems, such as the fact that 1 + 1 = 2 and other trivialities. It would be hard to find the interesting theorems and to separate the wheat from the chaff. But in principle this would give you all mathematical truths. You wouldn’t actually do it, but it would show that math gives absolute certainty.
By the way, it was important to make all mathematicians agree on the choice of formal axiomatic theory, and you would use metamathematics to try to convince everyone that this formal axiomatic theory avoids all the paradoxes that Bertrand Russell had noticed and contains no contradictions.
Okay, so this was the idea of putting mathematics on a firm foundation and removing all doubts. This was Hilbert’s idea, about a century ago, and metamathematics studies a formal axiomatic theory from the outside, and notice that this is a door to absolute truth, following the notion of the perfect language.
So what happens with this program, with this proposal of Hilbert’s? Well, there’s some good news and some bad news. Some of the good news I already mentioned: The thing that comes the closest to what Hilbert asked for is Zermelo-Fraenkel set theory, and it is a beautiful axiomatic theory.
I want to mention some of the milestones in the development of this theory. One of them is the von Neumann integers, so let me tell you about that. Remember that Spinoza has a philosophical system in which the world is built out of only one substance, and that substance is God, that’s all there is. Zermelo-Fraenkel set theory is similar.
Everything is sets, and every set is built out of the empty set. That’s all there is: the empty set, and sets built starting with the empty set. So zero is the empty set, that’s the first von Neumann integer, and in general n + 1 is defined to be the set of all integers less than or equal to n: von Neumann integers: 0 = {}, n + 1 = {0, 1, 2, . . . , n}. So if you write this out in full, removing all the abbreviations, all you have are curly braces, you have set formation starting with no content, and the
  • 27. full notation for n grows exponentially in n, if you write it all out, because everything up to that point is repeated in the next number. In spite of this exponential growth, this is a beautiful conceptual scheme.
Then you can define rational numbers as pairs of these integers, you can define real numbers as limit sequences of rationals, and you get all of mathematics, starting just with the empty set. So it’s a lovely piece of ontology. Here’s all of mathematical creation just built out of the empty set.
And other people who worked on this are of course Fraenkel and Zermelo, because it is called Zermelo-Fraenkel set theory, and an approximate notion of what they did was to try to avoid sets that are too big. The universal set is too big, it gets you into trouble. Not every property determines a set.
So this is a formal theory that most mathematicians believe enables you to carry out all the arguments that normally appear in mathematics (maybe if you don’t include category theory, which is very difficult to formalize, and even more paradoxical than set theory, from what I hear).
Okay, so that’s some of the positive work on Hilbert’s program. Now some of the negative work on Hilbert’s program (I’d like to tell you about it, you’ve all heard of it) is of course Gödel in 1931 and Turing in 1936.
Gödel, 1931; Turing, 1936
What they show is that you can’t have a perfect language for mathematics, you cannot have a formal axiomatic theory like Hilbert wanted for all of mathematics, because of incompleteness, because no such system will include all of mathematical truth, it will always leave out truths, it will always be incomplete. And this is Gödel’s incompleteness theorem of 1931, and Gödel’s original proof is very strange. It’s basically the paradox of “this statement is false,”
“This statement is false!”
which is a paradox of course because it can be neither true nor false.
If it’s false that it’s false, then it’s true, and if it’s true that it’s false, then it’s false. That’s just a paradox. But what G¨odel does is say “this statement is unprovable.” “This statement is unprovable!” So if the statement says of itself it’s unprovable, there are two possibilities: it’s provable, or it isn’t.
  • 28. If it’s provable, then we’re proving something that’s false, because it says it’s unprovable. So we hope that’s not the case; by hypothesis, we’ll eliminate that possibility. If we prove things that are false, we have a formal axiomatic theory that we’re not interested in, because it proves false things. The only possibility left is that it’s unprovable. But if it’s unprovable then it’s true, because it asserts it’s unprovable, therefore there’s a hole. We haven’t captured all of mathematical truth in our theory.
This proof of incompleteness shocks a lot of people, but my personal reaction to it is, okay, it’s correct, but I don’t like it.
A better proof of incompleteness, a deeper proof, comes from Turing in 1936. He derives incompleteness from a more fundamental phenomenon, which is uncomputability, the discovery that mathematics is full of stuff that can’t be calculated, of things you can define, but which you cannot calculate, because there’s no algorithm.
Uncomputability ⇒ Incompleteness
And in particular, the uncomputable thing that he discovers is the halting problem, a very simple question: Does a computer program that’s self-contained halt or does it go on forever? There is no algorithm to answer this in every individual case, therefore there is no formal axiomatic theory that enables you to always prove in individual cases what the answer is.
Why not? Because if there were a formal axiomatic theory that’s complete for the halting problem, that would give you a mechanical procedure for deciding, by running through the tree of all possible proofs, until you find a proof that an individual program you’re interested in halts, or you find a proof that it doesn’t. But that’s impossible because this is not a computable function.
So Turing’s insight in 1936 is that incompleteness, that Gödel found in 1931, for any formal axiomatic theory, comes from a deeper phenomenon, which is uncomputability.
Incompleteness is an immediate corollary of uncomputability, a concept which does not appear in Gödel’s 1931 paper.
But Turing’s paper has both good and bad aspects. There’s a negative aspect of his 1936 paper, which I’ve just told you about, but there’s also a positive aspect. You get another proof, a deeper proof of incompleteness, but you also get a kind of completeness. You find a perfect language.
There is no perfect language for mathematical reasoning. Gödel showed that in 1931, and Turing showed it again in 1936. But what Turing also
  • 29. showed in 1936 is that there are perfect languages, not for mathematical reasoning, but for computation, for specifying algorithms. What Turing discovers in 1936 is that there’s a kind of completeness called universality and that there are universal Turing machines and universal programming languages.
Universal Turing Machines / Programming Languages
What “universal” means, what a universal programming language or a universal Turing machine is, is a language in which every possible algorithm can be written. So on the one hand, Turing shows us in a deeper way that any language for mathematical reasoning has to be incomplete, but on the other hand, he shows us that languages for computation can be universal, which is just another name, a synonym, for completeness. There are perfect languages for computation, for writing algorithms, even though there aren’t any perfect languages for mathematical reasoning. This is the positive side, this is the completeness side, of Turing’s 1936 paper.
Now, what I’ve spent most of my professional life on, is a subject I call algorithmic information theory
Algorithmic Information Theory (AIT)
that derives incompleteness from uncomputability by taking advantage of a deeper phenomenon, by considering an extreme form of uncomputability, which is called algorithmic randomness or algorithmic irreducibility.
AIT: algorithmic randomness, algorithmic irreducibility
There’s a perfect language again, and there’s also a negative side, the halting probability Ω, whose bits are algorithmically random, algorithmically irreducible mathematical truths.
Ω = .010010111 . . .
This is a place in pure mathematics where there’s no structure. If you want to know the bits of the numerical value of the halting probability, this is a well-defined mathematical question, and in the world of mathematics all truths are necessary truths, but these look like accidental, contingent truths.
They look random, they have irreducible complexity.
  • 30. This is a maximal case of uncomputability, this is a place in pure mathematics where there’s absolutely no structure at all. Although it is true that you can in a few cases actually know some of the first bits. . .
There are actually an infinite number of halting probabilities depending on your choice of programming language. After you choose a language, then you ask what is the probability that a program generated by coin tossing will eventually halt. And that gives you a different halting probability. The numerical value will be different; the paradoxical properties are the same.
Okay, there are cases for which you can get a few of the first bits. For example, if Ω starts with 1s in binary or 9s in decimal, you can know those bits or digits, if Ω is .11111. . . base two or .99999. . . base ten. So you can get a finite number of bits, perhaps, of the numerical value, but if you have an N-bit formal axiomatic theory, then you can’t get more than N bits of Ω. That’s sort of the general result. It’s irreducible logically and computationally. It’s irreducible mathematical information. It’s a perfect simulation in pure math, where all truths are necessary, of contingent, accidental, maximal entropy truths.
So that’s the bad news from AIT. But just like in Turing’s 1936 work, there is a positive side. On the one hand we have maximal uncomputability, maximal entropy, total lack of structure, of any redundancy, in an information-theoretic sense, but there’s also good news.
AIT, the theory of program-size complexity, the theory where Ω is the crown jewel, goes further than Turing, and picks out from Turing’s universal Turing machines, from Turing’s universal languages, maximally expressive programming languages. Because those are the ones that you have to use to develop this theory where you get to Ω.
AIT has the notion of a maximally expressive programming language in which programs are maximally compact, and deals with a very basic complexity concept which is the size of the smallest program to calculate something:
H(x) is the size in bits of the smallest program to calculate x.
And we now have a better notion of perfection. The perfect languages that Turing found, the universal programming languages, are not all equally good. We now concentrate on a subset, the ones that enable us to write the most concise programs. These are the most expressive languages, the ones with the smallest programs.
Now let me tell you, this definition of complexity is a dry, technical way of expressing this idea in modern terms. But let me put this into Medieval
  • 31. terminology, which is much more colorful. The notion of program-size complexity (which by the way has many different names: algorithmic complexity, Kolmogorov complexity, algorithmic information content), in Medieval terms, what we’re asking is, how many yes/no decisions did God have to make to create something?, which is obviously a rather basic question to ask. That is, if you consider that God is calculating the universe. I’m giving you a Medieval perspective on these modern developments. Theology is the fundamental physics, it’s the theoretical physics of the Middle Ages.
I have a lot of time left (I’ve been racing through this material), so maybe I should explain in more detail how AIT contributes to the quest for the perfect language.
The notion of universal Turing machine that is used in AIT is Turing’s very basic idea of a flexible machine. It’s flexible hardware, which we call software. In a way, Turing in 1936 creates the computer industry and computer technology. That’s a tremendous benefit of a paper that mathematically sounds at first rather negative, since it talks about things that cannot be calculated, that cannot be proved. But on the other hand there’s a very positive aspect (I stated it in theoretical terms), which is that programming languages can be complete, can be universal, even though formal axiomatic theories cannot be complete.
Okay, so you get this technology, there’s this notion of a flexible machine, this notion of software, which emerges in this paper. Von Neumann, the same von Neumann who invented the von Neumann integers, credited all of this to Turing. At least Turing is responsible for the concept; the hardware implementation is another matter.
Now, AIT, where you talk about program-size complexity, the size of the smallest program, how many yes/no decisions God has to make to calculate something, to create something, picks out a particular class of universal Turing machines U.
What are the universal computers U like that you use to define program-size complexity and talk about Ω? Well, a universal computer U has the property that for any other computer C and its program p, your universal computer U will calculate the same result if you give it the original program p for C concatenated to a prefix $\pi_C$ which depends only on the computer C that you want to simulate. $\pi_C$ tells U which computer to simulate. In symbols,
$U(\pi_C\, p) = C(p)$.
  • 32. In other words, $\pi_C\, p$ is the concatenation of two pieces of information. It’s a binary string. You take the original program p, which is also a binary string, and in front of it you put a prefix that tells you which computer to simulate. Which means that these programs $\pi_C\, p$ for U are only a fixed number of bits larger than the programs p for any individual machine C.
These U are the universal Turing machines that you use in AIT. These are the most expressive languages. These are the languages with maximal expressive power. These are the languages in which programs are as concise as possible. This is how you define program-size complexity. God will naturally use the most perfect, most powerful programming languages, when he creates the world, to build everything.
I should point out that Turing’s original universality concept was not careful about counting bits; it didn’t really care about the size of programs. All a universal machine U had to do was to be able to simulate any other machine C, but one did not study the size of the program for U as a function of the size of the program for C. Here we are careful not to waste bits. AIT is concerned with particularly efficient ways for U to be universal. The original notion of universality in Turing was not this demanding.
The fact that you can just add a fixed number of bits to a program for C to get one for U is not completely trivial. Let me tell you why. After you put $\pi_C$ and p together, you have to know where the prefix ends and the program that is being simulated begins. There are many ways to do this. A very simple way to make the prefix $\pi_C$ self-delimiting is to have it be a sequence of 0’s followed by a 1:
$\pi_C = 0^k 1$.
And the number k of 0’s tells us which machine C to simulate. That’s a very wasteful way to indicate this.
The prefix $\pi_C$ is actually an interpreter for the programming language C.
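The unary prefix scheme can be tried out in a few lines. The Python sketch below is only a toy, under loud assumptions: the "computers" C are three made-up functions on bit strings (`machine_identity`, `machine_reverse`, `machine_count_ones`), not real Turing machines, and `U` just looks them up in a table instead of simulating them. What it does show faithfully is the bookkeeping: the prefix 0^k 1 is self-delimiting, and the program for U is exactly k + 1 bits longer than the program for the k-th machine, however long p is.

```python
# Hypothetical toy "computers": each maps a bit string to a bit string.
def machine_identity(p):
    return p

def machine_reverse(p):
    return p[::-1]

def machine_count_ones(p):
    # Output the number of 1 bits in p, written in binary.
    return bin(p.count("1"))[2:]

MACHINES = [machine_identity, machine_reverse, machine_count_ones]

def prefix_for(k):
    """Unary self-delimiting prefix 0^k 1 that selects machine number k."""
    return "0" * k + "1"

def U(program):
    """Toy universal machine: read 0s up to the first 1 to learn which
    machine to run, then hand the rest of the string to that machine."""
    k = program.index("1")      # number of leading 0s
    p = program[k + 1:]         # everything after the prefix is p
    return MACHINES[k](p)

p = "1101"
for k, C in enumerate(MACHINES):
    assert U(prefix_for(k) + p) == C(p)        # U(pi_C p) = C(p)
    print(k, prefix_for(k), U(prefix_for(k) + p))
```

Because the prefix is unary, selecting machine number k costs k + 1 bits, which is why this way of naming the machine is so wasteful once there are many machines to choose from.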
AIT’s universal languages U have the property that you give U an interpreter plus the program p in this other language C, and U will run the interpreter to see what p does.
If you think of this interpreter $\pi_C$ as an arbitrary string of bits, one way to make it self-delimiting is to just double all the bits. 0 goes to 00, 1 goes to 11, and you put a pair of unequal bits 01 as punctuation at the end:
  • 33. Arbitrary $\pi_C$: 0 → 00, 1 → 11, 01 at the end.
This is a better way to have a self-delimiting prefix that you can concatenate with p. It only doubles the size; the $0^k 1$ trick increases the size exponentially. And there are more efficient ways to make the prefix self-delimiting. For example, you can put the size of the prefix in front of the prefix. But it’s sort of like Russian dolls, because if you put the size $|\pi_C|$ of $\pi_C$ in front of $\pi_C$, $|\pi_C|$ also has to be self-delimiting:
$U(\ldots\; ||\pi_C||\; |\pi_C|\; \pi_C\; p) = C(p)$.
Anyway, picking U this way is the key idea in the original 1960s version of AIT that Solomonoff, Kolmogorov and I independently proposed. But ten years later I realized that this is not the right approach. You actually want the whole program $\pi_C\, p$ for U to be self-delimiting, not just the prefix $\pi_C$. You want the whole thing to be self-delimiting to get the right theory of program-size complexity.
Let me compare the 1960s version of AIT and the 1970s version of AIT. Let me compare these two different theories of program-size complexity.
In the 1960s version, an N-bit string will in general need an N-bit program, if it’s irreducible, and most strings are algorithmically irreducible. Most N-bit strings need an N-bit program. These are the irreducible strings, the ones that have no pattern, no structure. Most N-bit strings need an N-bit program, because there aren’t enough smaller programs. But in the 1970s version of AIT, you go from N bits to $N + \log_2 N$ bits, because you want to make the programs self-delimiting. An N-bit string will usually need an $N + \log_2 N$ bit program:
Most N-bit strings:
AIT1960: N bits of complexity,
AIT1970: $N + \log_2 N$ bits of complexity.
Actually, in AIT1970 it’s N plus H(N), which is the size of the smallest self-delimiting program to calculate N, that’s exactly what that logarithmic term is.
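The two self-delimiting tricks just described (doubling every bit and closing with 01, or putting the size in front) can be sketched in Python as follows. The function names are made up for this illustration, and the length-prefixed variant stops after one level of the Russian dolls: the size field itself is made self-delimiting by bit doubling, which is an assumption of this sketch rather than the encoding used in AIT. It costs about 2 log2 N extra bits instead of the roughly log2 N quoted above for the optimal scheme.

```python
def encode_doubled(bits):
    """0 -> 00, 1 -> 11, then the unequal pair 01 as end punctuation."""
    return "".join(b + b for b in bits) + "01"

def decode_doubled(stream):
    """Read one doubled-bit item; return (payload, rest of the stream)."""
    out = []
    i = 0
    while True:
        pair = stream[i:i + 2]
        i += 2
        if pair == "01":                      # end marker reached
            return "".join(out), stream[i:]
        if pair not in ("00", "11"):          # also catches a truncated stream
            raise ValueError("malformed doubled-bit encoding")
        out.append(pair[0])

def encode_with_length(bits):
    """Size of the string (in binary, doubled) followed by the raw string."""
    return encode_doubled(format(len(bits), "b")) + bits

def decode_with_length(stream):
    """Read the size field, then exactly that many payload bits."""
    size_field, rest = decode_doubled(stream)
    n = int(size_field, 2)
    if len(rest) < n:
        raise ValueError("stream shorter than announced size")
    return rest[:n], rest[n:]

prefix = "10110"
program = "111000"
print(encode_doubled(prefix))                 # about twice as long as prefix
print(encode_with_length(prefix))             # prefix plus a short size header
print(decode_with_length(encode_with_length(prefix) + program))
```

In both cases the decoder finds the boundary between the prefix and whatever follows by itself, with no blank or other extra symbol, which is the property the text is after.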
In other words, in the 1970s version of AIT, the size of the smallest program for calculating an N-bit string is usually N bits plus the size in bits of the smallest self-delimiting program to calculate N, which is roughly log N + log log N + log log log N + . . .
  • 34. bits long. That’s the Russian dolls aspect of this.
The 1970s version of AIT, which takes the idea of being self-delimiting from the prefix and applies it to the whole program, gives us even better perfect languages. AIT evolved in two stages. First we concentrate on those U with
$U(\pi_C\, p) = C(p)$
with $\pi_C$ self-delimiting, and then we insist that the whole thing $\pi_C\, p$ has also got to be self-delimiting. And when you do that, you get important new results, such as the sub-additivity of program-size complexity,
H(x, y) ≤ H(x) + H(y),
which is not the case if you don’t make everything self-delimiting. This just says that you can concatenate the smallest program for calculating x and the smallest program for calculating y to get a program for calculating x and y.
And you can’t even define the halting probability Ω in AIT1960. If you allow all N-bit strings to be programs, then you cannot define the halting probability in a natural way, because the sum for defining the probability that a program will halt
$\Omega = \sum_{p \text{ halts}} 2^{-(\text{size in bits of } p)}$
diverges to infinity instead of being between zero and one. This is the key technical point in AIT.
I want the halting probability to be finite. The normal way of thinking about programs is that there are $2^N$ N-bit programs, and the natural way of defining the halting probability is that every N-bit program that halts contributes $1/2^N$ to the halting probability. The only problem is that for any fixed size N there are roughly order of $2^N$ programs that halt, so if you sum over all possible sizes, you get infinity, which is no good.
In order to get the halting probability to be between zero and one
$0 < \Omega = \sum_{p \text{ halts}} 2^{-(\text{size in bits of } p)} < 1$
you have to be sure that the total probability summed over all programs p is less than or equal to one. This happens automatically if we force p to be self-delimiting. How can we do this? Easy! Pretend that you are the
  • 35. universal computer U. As you read the program bit by bit, you have to be able to decide by yourself where the program ends, without any special punctuation, such as a blank, at the end of the program. This implies that no extension of a valid program is a valid program, and that the set of valid programs is what’s called a prefix-free set. Then the fact that the sum that defines Ω must be between zero and one is just a special case of what’s called the Kraft inequality in Shannon information theory. But this technical machinery isn’t necessary. That 0 < Ω < 1 follows immediately from the fact that as you read the program bit by bit you are forced to decide where to stop without seeing any special punctuation. In other words, in AIT1960 we were actually using a three-symbol alphabet for programs: 0, 1 and blank. The blank told us where a program ends. But that’s a symbol that you’re wasting, because you use it very little. As you all know, if you have a three-symbol alphabet, then the right way to use it is to use each symbol roughly one-third of the time. So if you really use only 0s and 1s, then you have to force the Turing machine to decide by itself where the program ends. You don’t put a blank at the end to indicate that. So programs go from N bits in size to $N + \log_2 N$ bits, because you’ve got to indicate in each program how big it is. On the other hand, you can just take subroutines and concatenate them to make a bigger program, so program-size complexity becomes sub-additive. You run the universal machine U to calculate the first object x, and then you run it again to calculate the second object y, and then you’ve got x and y, and so H(x, y) ≤ H(x) + H(y). These self-delimiting binary languages are the ones that the study of program-size complexity has led us to discriminate as the ideal languages, the most perfect languages. We got to them in two stages, AIT1960 and AIT1970.
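To make the self-delimiting idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the length header uses the Elias gamma code, which costs about 2 log2 N bits rather than the roughly log2 N of the text, and the names `header`, `encode`, `decode` and `kraft_sum` are hypothetical helpers, not anything from AIT itself. The sketch checks three things for all payloads of up to 6 bits: that the blank-terminated scheme gives a sum that grows with every extra length, that the self-delimiting codewords form a prefix-free set whose sum stays below one, and that two codewords can be concatenated and read back without any punctuation.

```python
from fractions import Fraction
from itertools import product


def header(n: int) -> str:
    """Elias gamma code for n + 1: k zeros, then the (k + 1)-bit binary of n + 1."""
    binary = bin(n + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def encode(payload: str) -> str:
    """Prefix the payload bits with a self-delimiting statement of their length."""
    return header(len(payload)) + payload


def decode(stream: str):
    """Read one codeword from the front of stream; return (payload, rest)."""
    k = 0
    while stream[k] == "0":              # count the leading zeros of the header
        k += 1
    n = int(stream[k:2 * k + 1], 2) - 1  # payload length announced by the header
    start = 2 * k + 1
    return stream[start:start + n], stream[start + n:]


def all_payloads(max_len: int):
    """Every bit string of length 0..max_len (the blank-terminated programs)."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def kraft_sum(words) -> Fraction:
    """Sum of 2^-(length in bits) over the given words, as an exact fraction."""
    return sum((Fraction(1, 2 ** len(w)) for w in words), Fraction(0))


MAX_LEN = 6
raw = list(all_payloads(MAX_LEN))
coded = [encode(p) for p in raw]

raw_sum = kraft_sum(raw)      # every length contributes exactly 1, so this grows
coded_sum = kraft_sum(coded)  # prefix-free codewords: the sum stays below 1

# No codeword is a proper prefix of another codeword.
is_prefix_free = not any(a != b and b.startswith(a) for a in coded for b in coded)

# Concatenate two "subroutines" and read them back with no separator.
x, y = "1011", "001"
first, rest = decode(encode(x) + encode(y))
second, leftover = decode(rest)

print(raw_sum, coded_sum, is_prefix_free, (first, second, leftover))
```

Raising `MAX_LEN` makes `raw_sum` grow without bound while `coded_sum` creeps toward one from below, which is the Kraft inequality at work in this toy setting.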
These are languages for computation, for expressing algorithms, not for mathematical reasoning. They are universal programming languages that are maximally expressive, maximally concise. We already knew how to do that in the 1960s, but in the 1970s we realized that programs should be self-delimiting, which made it possible to define the halting probability Ω. Okay, so that’s the story, and now maybe I should summarize all of this, this saga of the quest for the perfect language. As I said, the search for the perfect language has some negative conclusions and some positive conclusions.
  • 36. Hilbert wanted to find a perfect language giving all of mathematical truth, all mathematical knowledge, he wanted a formal axiomatic theory for all of mathematics. This was supposed to be a Theory of Everything for the world of pure math. And this cannot succeed, because we know that every formal axiomatic theory is incomplete, as shown by Gödel, by Turing, and by my halting probability Ω. Instead of finding a perfect language, a perfect formal axiomatic theory, we found incompleteness, uncomputability, and even algorithmic irreducibility and algorithmic randomness. So that’s the negative side of this story, which is fascinating from an epistemological point of view, because we found limits to what we can know, we found limits of formal reasoning. Now interestingly enough, the mathematical community couldn’t care less. They still want absolute truth! They still believe in absolute truth, and that mathematics gives absolute truth. And if you want a proof of this, just go to the December 2008 issue of the Notices of the American Mathematical Society. That’s a special issue of the Notices devoted to formal proof. The technology has been developed to the point where they can run real mathematics, real proofs, through proof-checkers, and get them checked. A mathematician writes the proof out in a formal language, and fills in the missing steps and makes corrections until the proof-checker can understand the whole thing and verify that it is correct. And these proof-checkers are getting smarter and smarter, so that more and more of the details can be left out. As the technology improves, the job of formalizing a proof becomes easier and easier. The formal-proof extremists are saying that in the future all mathematics will have to be written out formally and verified by proof-checkers.
The engineering has been worked out to the point that you can formally prove real mathematical results and run them through proof-checkers for verification. For example, this has been done with the proof of the four-color conjecture. It was written out as a formal proof that was run through a proof-checker. And the position of these extremists is that in the future all mathematics will have to be written out in a formal language, and you will have to get it checked before submitting a paper to a human referee, who will then only have to decide if the proof is worth publishing, not whether the proof is correct. And they want a repository of all mathematical knowledge, which would be a database of checked formal proofs of theorems. This is a substantial community, and to learn more, go to the December
  • 37. 2008 AMS Notices, which is available on the web for free on the AMS website. This is being worked on by a sizeable community, and the Notices devoted a special issue to it, which means that mathematicians still believe in absolute truth. I’m not disparaging this extremely interesting work, but I am saying that there’s a wonderful intellectual tension between it and the incompleteness results that I’ve discussed in this talk. There’s a wonderful intellectual tension between incompleteness and the fact that people still believe in formal proof and absolute truth. People still want to go ahead and carry out Hilbert’s program and actually formalize everything, just as if Gödel and Turing had never happened! I think this is an extremely interesting and, at least for me, a quite unexpected development. These were the negative conclusions from this saga. Now I want to wrap this talk up by summarizing the positive conclusions. There are perfect languages, for computing, not for reasoning. They’re computer programming languages. And we have universal Turing machines and universal programming languages, and although languages for reasoning cannot be complete, these universal programming languages are complete. Furthermore, AIT has picked out the most expressive programming languages, the ones that are particularly good to use for a theory of program-size complexity. So there is a substantial practical spinoff. Furthermore, since I’ve worked most of my professional career on AIT, I view AIT as a substantial contribution to the search for the perfect language, because it gives us a measure of expressive power, and of conceptual complexity and the complexity of ideas.
Remember, I said that from the perspective of the Middle Ages, that’s how many yes/no decisions God had to make to create something, which obviously He will do in an optimum manner.2 From the theoretical side, however, this quest was disappointing due to Gödel incompleteness and because there is no Theory of Everything for pure math. Provably there is no TOE for pure math. In fact, if you look at the bits of the halting probability Ω, they show that pure mathematics contains infinite irreducible complexity, and in this precise sense is more like biology, the domain of the complex, than like theoretical physics, where there is still
2 Note that program-size complexity = size of smallest name for something.
  • 38. hope of finding a simple, elegant TOE.3 So this is the negative side of the story, unless you’re a biologist. The positive side is we get this marvelous programming technology. So this dream, the search for the perfect language and for absolute knowledge, ended in the bowels of a computer, it ended in a Golem. In fact, let me end with a Medieval perspective on this. How would all this look to someone from the Middle Ages? This quest, the search for the perfect language, was an attempt to obtain magical, God-like powers. Let’s bring someone from the 1200s here and show them a notebook computer. You have this dead machine, it’s a machine, it’s a physical object, and when you put software into it, all of a sudden it comes to life! So from the perspective of the Middle Ages, I would say that the perfect languages that we’ve found have given us some magical, God-like powers, which is that we can breathe life into some inanimate matter. Observe that hardware is analogous to the body, and software is analogous to the soul, and when you put software into a computer, this inanimate object comes to life and creates virtual worlds. So from the perspective of somebody from the year 1200, the search for the perfect language has been successful and has given us some magical, God-like abilities, except that we take them entirely for granted. Thanks very much!4
3 Incompleteness can be considered good rather than bad: It shows that mathematics is creative, not mechanical.
4 Twenty minutes of questions and discussion followed. These have not been transcribed, but are available via digital streaming video at
  • 39. Chapter 3 Is the world built out of information? Is everything software?
From Chaitin, Costa, Doria, After Gödel, in preparation. Lecture, the Technion, Haifa, Thursday, 10 June 2010.
Now for some even weirder stuff! Let’s return to The Thirteenth Floor and to the ideas that we briefly referred to in the introductory section of this chapter. Let’s now turn to ontology: What is the world built out of, made out of? Fundamental physics is currently in the doldrums. There is no pressing unexpected, new experimental data (or if there is, we can’t see that it is!). So we are witnessing a return to pre-Socratic philosophy with its emphasis on ontology rather than epistemology. We are witnessing a return to metaphysics. Metaphysics may be dead in contemporary philosophy, but amazingly enough it is alive and well in contemporary fundamental physics and cosmology. There are serious problems with the traditional view that the world is a space-time continuum. Quantum field theory and general relativity contradict each other. The notion of space-time breaks down at very small distances, because extremely massive quantum fluctuations (virtual particle/antiparticle pairs) should provoke black holes and space-time should be torn apart, which doesn’t actually happen. Here are two other examples of problems with the continuum, with very
  • 40. small distances:
• the infinite self-energy of a point electron in classical Maxwell electrodynamics,
• and in quantum field theory, renormalization, which Dirac never accepted.
And here is an example of renormalization: the infinite bare charge of the electron which is shielded by vacuum polarization via virtual pair formation and annihilation, so that far from an electron it only seems to have finite charge. This is analogous to the behavior of water, which is a highly polarized molecule forming micro-clusters that shield charge, with many of the highly positive hydrogen-ends of H2O near the highly negative oxygen-ends of these water molecules. In response to these problems with the continuum, some of us feel that the traditional Pythagorean ontology: God is a mathematician, the world is built out of mathematics, should be changed to this more modern → Neo-Pythagorean ontology: God is a programmer, the world is built out of software. In other words, all is algorithm! There is an emerging school, a new viewpoint named digital philosophy. Here are some key people and key works in this new school of thought: Edward Fredkin; Stephen Wolfram, A New Kind of Science; Konrad Zuse, Rechnender Raum (Calculating Space); John von Neumann, Theory of Self-Reproducing Automata; and Chaitin, Meta Math!.1 These may be regarded as works on metaphysics, on possible digital worlds. However there have in fact been parallel developments in the world of physics itself.
1 Lesser known but important works on digital philosophy: Arthur Burks, Essays on Cellular Automata, Edgar Codd, Cellular Automata.
  • 41. Quantum information theory builds the world out of qubits, not matter. And phenomenological quantum gravity and the theory of the entropy of black holes suggest that any physical system contains only a finite number of bits of information that grows, amazingly enough, as the surface area of the physical system, not as its volume, hence the name holographic principle. For more on the entropy of black holes, the Bekenstein bound, and the holographic principle, see Lee Smolin, Three Roads to Quantum Gravity. One of the key ideas that has emerged from this research on possible digital worlds is to transform the universal Turing machine, a machine capable of running any algorithm, into the universal constructor, a machine capable of building anything: Universal Turing Machine → Universal Constructor. And this leads to the idea of an information economy: worlds in which everything is software, worlds in which everything is information and you can construct anything if you have a program to calculate it. This is like magic in the Middle Ages. You can bring something into being by invoking its true name. Nothing is hardware, everything is software!2 A more modern version of this everything-is-information view is presented in two green-technology books by Freeman Dyson: The Sun, the Genome and the Internet, and A Many-Colored Glass. He envisions seeds to grow houses, seeds to grow airplanes, seeds to grow factories, and imagines children using genetic engineering to design and grow new kinds of flowers! All you need is water, sun and soil, plus the right seeds! From an abstract, theoretical mathematical point of view, the key concept here is an old friend from Chapter 2: H(x) = the size in bits of the smallest program to compute x.
H(x) is also equal to the minimum amount of algorithmic information needed to build/construct x; in Medieval language, to the number of yes/no decisions God had to make to create x; and in biological terms, roughly to the amount of DNA needed for growing x. It requires the self-delimiting programs of Chapter 2 for the following intuitively necessary condition to hold: H(x, y) ≤ H(x) + H(y) + c.
2 On magic in the Middle Ages, see Umberto Eco, The Search for the Perfect Language, and Allison Coudert, Leibniz and the Kabbalah.
  • 42. This says that algorithmic information is sub-additive: If it takes H(x) bits of information to build x and H(y) bits of information to build y, then the sum of the two suffices to build both x and y. Furthermore, the mutual information, the information in common, has this important property: $H(x) + H(y) - H(x, y) = H(x) - H(x|y^*) + O(1) = H(y) - H(y|x^*) + O(1)$. Here H(x|y) = the size in bits of the smallest program to compute x from y. This triple equality tells us that the extent to which it is better to build x and y together rather than separately (the bits of subroutines that are shared, the amount of software that is shared) is also equal to the extent that knowing a minimum-size program $y^*$ for y helps us to know x and to the extent to which knowing a minimum-size program $x^*$ for x helps us to know y. (This triple equality is an idealization; it holds only in the limit of extremely large compute times for x and y.) These results about algorithmic information/complexity H are a kind of economic meta-theory for the information economy, which is the asymptotic limit, perhaps, of our current economy in which material resources (petroleum, uranium, gold) are still important, not just technological and scientific know-how. But as astrophysicist Fred Hoyle points out in his science fiction novel Ossian’s Ride, the availability of unlimited amounts of energy, say from nuclear fusion reactors, would make it possible to use giant mass spectrometers to extract gold and other chemical elements directly from sea water and soil. Material resources would no longer be that important. If we had unlimited energy, all that would matter would be know-how, information, knowing how to build things. And so we finally end up with the idea of a printer for objects, a more plebeian term for a universal constructor. There are already commercial versions of such devices.
They are called 3D printers and are used for rapid prototyping and digital fabrication. They are not yet universal constructors, but the trend is clear. . .3 In Medieval terms, results about H(x) are properties of the size of spells, they are about the complexity of magic incantations! The idea that everything is software is not as new as it may seem.
3 One current project is to build a 3D printer that can print a copy of itself. See
  • 43. Bibliography
[1] A. Burks, Essays on Cellular Automata, University of Illinois Press (1970).
[2] G. J. Chaitin, Meta Math!, Pantheon (2005).
[3] E. Codd, Cellular Automata, Academic Press (1968).
[4] A. Coudert, Leibniz and the Kabbalah, Kluwer (1995).
[5] F. Dyson, The Sun, the Genome and the Internet, Oxford University Press (1999).
[6] F. Dyson, A Many-Colored Glass, University of Virginia Press (2007).
[7] U. Eco, The Search for the Perfect Language, Blackwell (1995).
[8] E. Fredkin,
[9] F. Hoyle, Ossian’s Ride, Harper (1959).
[10] J. von Neumann, Theory of Self-Reproducing Automata, University of Illinois Press (1966).
[11] L. Smolin, Three Roads to Quantum Gravity, Basic Books (2001).
[12] S. Wolfram, A New Kind of Science, Wolfram Media (2002).
[13] K. Zuse, Rechnender Raum (Calculating Space), Vieweg (1969).
  • 45. Chapter 4 The information economy
S. Zambelli, Computable, Constructive and Behavioural Economic Dynamics, Routledge, 2010, pp. 73–78.
In honor of Kumaraswamy Velupillai’s 60th birthday
Abstract: One can imagine a future society in which natural resources are irrelevant and all that counts is information. I shall discuss this possibility, plus the role that algorithmic information theory might then play as a metatheory for the amount of information required to construct something.
Introduction
I am not an economist; I work on algorithmic information theory (AIT). This essay, in which I present a vision of a possible future information economy, should not be taken too seriously. I am merely playing with ideas and trying to provide some light entertainment of a kind suitable for this festschrift volume, given Vela’s deep appreciation of the relevance of foundational issues in mathematics for economic theory. In algorithmic information theory, you measure the complexity of something by counting the number of bits in the smallest program for calculating it: program → Universal Computer → output. If the output of a program could be a physical or a biological system, then this complexity measure would give us a way to measure the difficulty of
  • 46. explaining how to construct or grow something, in other words, measure either traditional smokestack or newer green technological complexity: software → Universal Constructor → physical system, DNA → Development → biological system. And it is possible to conceive of a future scenario in which technology is not natural-resource limited, because energy and raw materials are freely available, but is only know-how limited. In this essay, I will outline four different versions of this dream, in order to explain why I take it seriously:
1. Magic, in which knowing someone’s secret name gives you power over them,
2. Astrophysicist Fred Hoyle’s vision of a future society in his science-fiction novel Ossian’s Ride,
3. Mathematician John von Neumann’s cellular automata world with its self-reproducing automata and a universal constructor,
4. Physicist Freeman Dyson’s vision of a future green technology in which you can, for example, grow houses from seeds.
As these four examples show, if an idea is important, it’s reinvented, it keeps being rediscovered. In fact, I think this is an idea whose time has come.
Secret/True Names and the Esoteric Tradition
“In the beginning was the Word, and the Word was with God, and the Word was God.” John 1:1
Information, knowing someone’s secret/true name, is very important in the esoteric tradition [1, 2]:
• Recall the German fairy tale in which the punch line is “Rumpelstiltskin is my name!” (the Brothers Grimm).
• You have power over someone if you know their secret name.
• You can summon a demon if you know its secret name.
  • 47.
• In the Garden of Eden, Adam acquired power over the animals by naming them.
• God’s name is never mentioned by Orthodox Jews.
• The golem in Prague was animated by a piece of paper with God’s secret name on it.
• Presumably God can summon a person or thing into existence by calling its true name.
• Leibniz was interested in the original sacred Adamic language of creation, the perfect language in which the essence/true nature of each substance or being is directly expressed, as a way of obtaining ultimate knowledge. His project for a characteristica universalis evolved from this, and the calculus evolved from that. Christian Huygens, who had taught Leibniz mathematics in Paris, hated the calculus [3], because it eliminated mathematical creativity and arrived at answers mechanically and inelegantly.
Fred Hoyle’s Ossian’s Ride
The main features in the future economy that Hoyle imagines are:
• Cheap and unlimited hydrogen to helium fusion power,
• Therefore raw materials readily available from sea-water, soil and air (for example, using extremely large-scale and energy intensive mass spectrometer-like devices [Gordon Lasher, private communication]).
• And with essentially free energy and raw materials, all that counts is technological know-how, which is just information.
Perhaps it’s best to let Hoyle explain this in his own words [4]: [T]he older established industries of Europe and America. . . grew up around specialized mineral deposits—coal, oil, metallic ores. Without these deposits the older style of industrialization was completely impossible. On the political and economic fronts,
  • 48. the world became divided into “haves” and “have-nots,” depending whereabouts on the earth’s surface these specialized deposits happened to be situated. . . In the second phase of industrialism. . . no specialized deposits are needed at all. The key to this second phase lies in the possession of an effectively unlimited source of energy. Everything here depends on the thermonuclear reactor. . . With a thermonuclear reactor, a single ton of ordinary water can be made to yield as much energy as several hundred tons of coal—and there is no shortage of water in the sea. Indeed, the use of coal and oil as a prime mover in industry becomes utterly inefficient and archaic. With unlimited energy the need for high-grade metallic ores disappears. Low-grade ones can be smelted—and there is an ample supply of such ores to be found everywhere. Carbon can be taken from inorganic compounds, nitrogen from the air, a whole vast range of chemicals from sea water. So I arrived at the rich concept of this second phase of industrialization, a phase in which nothing is needed but the commonest materials—water, air and fairly common rocks. This was a phase that can be practiced by anybody, by any nation, provided one condition is met: provided one knows exactly what to do. This second phase was clearly enormously more effective and powerful than the first. Of course this concept wasn’t original. It must have been at least thirty years old. It was the second concept that I was more interested in. The concept of information as an entity in itself, the concept of information as a violently explosive social force. In Hoyle’s fantasy, this crucial information (including the design of thermonuclear reactors) that suddenly propels the world into a second phase of industrialization comes from another world.
It is a legacy bequeathed to humanity by a nonhuman civilization desperately trying to preserve anything it can when being destroyed by the brightening of its star.
  • 49. John von Neumann’s Cellular Automata World
This cellular automata world first appeared in lectures and private working notes by von Neumann. These ideas were advertised in an article in Scientific American in 1955 that was written by John Kemeny [5]. Left unfinished because of von Neumann’s death in 1957, his notes were edited by Arthur Burks and finally published in 1966 [6]. Burks then presented an overview in [7]. Key points:
• World is a discrete crystalline medium.
• Two-dimensional world, graph paper, divided into square cells.
• Each square has 29 states.
• Time is quantized as well as space.
• State of each square is the same universal function of its previous state and the previous state of its 4 immediate neighbors (square itself plus up, down, left, right immediate neighbors).
• Universal constructor can assemble any quiescent array of states.
• Then you have to start the device running.
• The universal constructor is part of von Neumann’s self-reproducing automata.
The crucial point is that in von Neumann’s toy world, physical systems are merely discrete information, that is all there is. And there is no difference between computing a string of bits (as in AIT) and “computing” (constructing) an arbitrary physical system. I should also mention that starting from scratch, Edgar Codd came up with a simpler version of von Neumann’s cellular automata world in 1968 [8]. In Codd’s model cells have 8 states instead of 29.
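The update scheme in the key points above can be sketched in a few lines of Python. This is a toy under loud assumptions: it is not von Neumann’s 29-state rule or Codd’s 8-state rule, the cells have only 2 states, the rule (`parity_rule`, the XOR of the five cells) is chosen purely for illustration, and the grid wraps around at the edges so that the example stays finite. What it does share with the description in the text is the structure: discrete space, discrete time, and one universal function of a square and its up, down, left and right neighbors applied everywhere at once.

```python
def step(grid, rule):
    """One synchronous time step: every cell is updated from the previous grid."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            new[r][c] = rule(
                grid[r][c],                # the square itself
                grid[(r - 1) % rows][c],   # up (wrap-around is an assumption)
                grid[(r + 1) % rows][c],   # down
                grid[r][(c - 1) % cols],   # left
                grid[r][(c + 1) % cols],   # right
            )
    return new


def parity_rule(centre, up, down, left, right):
    """Hypothetical 2-state rule for illustration: XOR of the five cells."""
    return centre ^ up ^ down ^ left ^ right


def show(grid):
    """Render a grid as text, '#' for state 1 and '.' for state 0."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)


SIZE = 9
g0 = [[0] * SIZE for _ in range(SIZE)]
g0[4][4] = 1                 # a single non-quiescent cell in the middle

g1 = step(g0, parity_rule)   # a plus-shaped pattern around the centre
g2 = step(g1, parity_rule)   # the same five-cell shape, spread two cells out

print(show(g1))
print()
print(show(g2))
```

Swapping in a different `rule` function, with more states per cell, is all it would take to move from this toy toward the kinds of worlds the text describes; the stepping code itself does not change.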
  • 50. Freeman Dyson’s Green Technology
Instead of Hoyle’s vision of a second stage of traditional smokestack heavy industry, Dyson [9, 10] optimistically envisions a green-technology small-is-beautiful do-it-yourself grass-roots future. The emerging technology that may someday lead to Dyson’s utopia is becoming known as “synthetic biology” and deals with deliberately engineered organisms. This is also referred to as “artificial life,” the development of “designer genomes.” To produce something, you just create the DNA for it. Here are some key points in Dyson’s vision:
• Solar electrical power obtained from modified trees. (Not from thermonuclear reactors!)
• Other useful devices/machines grown from seeds. Even houses grown from seeds?!
• School children able to design and grow new plants, animals.
• Mop up excessive carbon dioxide or produce fuels from sugar (actual Craig Venter projects [11]).
On a much darker note, to show how important information is, there presumably exists a sequence of a few-thousand DNA bases (A, C, G, T) for the genome of a virus that would destroy the human race, indeed, most life on this planet. With current or soon-to-be-available molecular biology technology, genetic engineering tools, anyone who knew this sequence could easily synthesize the corresponding pathogen. Dyson’s utopia can easily turn into a nightmare.
AIT as an Economic Metatheory
So one can imagine scenarios in which natural resources are irrelevant and all that counts is technological know-how, that is, information. We have just seen four such scenarios. In such a world, I believe, AIT becomes, not an economic theory, but perhaps an economic metatheory, since it is a theory of information, a theory about the properties of technological know-how, as I will now explain.
  • 51. The main concept in AIT is the amount of information H(X) required to compute (or construct) something, X. This is measured in bits of software, the number of bits in the smallest program that calculates X. Briefly, one refers to H(X) as the complexity of X. For an introduction to AIT, please see [12, 13]. In economic terms, H(X) is a measure of the amount of technological know-how needed to produce X. If X is a hammer, H(X) will be small. If X is a sophisticated military aircraft, H(X) will be quite large. Two other concepts in AIT are the joint complexity H(X, Y) of producing X and Y together, and the relative complexity H(X|Y) of producing X if we are given Y for free. Consider now two objects, X and Y. In AIT, H(X) + H(Y) − H(X, Y) is referred to as the mutual information in X and Y. This is the extent to which it is cheaper to produce X and Y together than to produce X and Y separately, in other words, the extent to which the technological know-how needed to produce X and Y can be shared, or overlaps. And there is a basic theorem in AIT that states that this is also H(X) − H(X|Y), which is the extent to which being given the know-how for Y helps us to construct X, and it’s also H(Y) − H(Y|X), which is the extent to which being given the know-how for X helps us to construct Y. This is not earth-shaking, but it’s nice to know. (For a proof of this theorem about mutual information, please see [14].) One of the reasons that we get these pleasing properties is that AIT is like classical thermodynamics in that time is ignored. In thermodynamics, heat engines operate very slowly, for example, reversibly. In AIT, the time or effort required to construct something is ignored, only the information required is measured. This enables both thermodynamics and AIT to have clean, simple results. They are toy models, as they must be if we wish to prove nice theorems.
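H(X) itself cannot be computed, but the flavor of the mutual information H(X) + H(Y) − H(X, Y) can be felt with an ordinary file compressor. The Python sketch below is only a rough stand-in under stated assumptions: the compressed size from `zlib` replaces the size of the smallest program (it can only overestimate it), concatenation replaces the pair (X, Y), and the names `c_bits`, `shared_bits`, `noise` and the three `design_*` byte strings are hypothetical. Two “designs” that reuse the same material should be cheaper to describe together than two unrelated ones.

```python
import random
import zlib


def c_bits(data: bytes) -> int:
    """Compressed size in bits: a crude, computable stand-in for H."""
    return 8 * len(zlib.compress(data, 9))


def shared_bits(x: bytes, y: bytes) -> int:
    """Proxy for H(X) + H(Y) - H(X, Y): bits saved by describing X and Y together."""
    return c_bits(x) + c_bits(y) - c_bits(x + y)


rng = random.Random(0)  # fixed seed so that the run is repeatable


def noise(n: int) -> bytes:
    """n pseudo-random bytes, standing in for incompressible know-how."""
    return bytes(rng.getrandbits(8) for _ in range(n))


design_x = noise(2000)
design_y = design_x[:1000] + noise(1000)  # reuses half of the know-how in X
design_z = noise(2000)                    # shares nothing with X

related = shared_bits(design_x, design_y)
unrelated = shared_bits(design_x, design_z)

print("X and Y share about", related, "bits")
print("X and Z share about", unrelated, "bits")
```

The first figure should come out far larger than the second, which stays close to zero. The same proxy also gives a feel for the relative complexity: `c_bits(y + x) - c_bits(y)` plays the role of H(X|Y), the extra cost of X once Y is available for free.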
  • 52. Conclusion
Clearly, we are not yet living in an information economy. Oil, uranium, gold and other scarce, precious limited natural resources still matter. But someday we may live in an information economy, or at least approach it asymptotically. In such an economy, everything is, in effect, software; hardware is comparatively unimportant. This is a possible world, though perhaps not yet our own world.
References
1. A. Coudert, Leibniz and the Kabbalah, Kluwer, Dordrecht, 1995.
2. U. Eco, The Search for the Perfect Language, Blackwell, Oxford, 1995.
3. J. Hofmann, Leibniz in Paris 1672–1676, Cambridge University Press, 1974, p. 299.
4. F. Hoyle, Ossian’s Ride, Harper & Brothers, New York, 1959, pp. 157–158.
5. J. Kemeny, “Man viewed as a machine,” Scientific American, April 1955, pp. 58–67.
6. J. von Neumann, Theory of Self-Reproducing Automata, University of Illinois Press, Urbana, 1966. (Edited and completed by Arthur W. Burks.)
7. A. Burks (ed.), Essays on Cellular Automata, University of Illinois Press, Urbana, 1970.
8. E. Codd, Cellular Automata, Academic Press, New York, 1968.
9. F. Dyson, The Sun, the Genome, & the Internet, Oxford University Press, New York, 1999.
10. F. Dyson, A Many-Colored Glass, University of Virginia Press, Charlottesville, 2007.
11. C. Venter, A Life Decoded, Viking, New York, 2007.
  • 53.
12. G. Chaitin, Meta Maths, Atlantic Books, London, 2006.
13. G. Chaitin, Thinking about Gödel and Turing, World Scientific, Singapore, 2007.
14. G. Chaitin, Exploring Randomness, Springer-Verlag, London, 2001, pp. 95–96.
1 July 2008
  • 55. Chapter 5 How real are real numbers?
We discuss mathematical and physical arguments against continuity and in favor of discreteness, with particular emphasis on the ideas of Émile Borel (1871–1956). Lecture given Tuesday, 22 September 2009, at Wilfrid Laurier University in Waterloo, Canada.
I’m not going to give a tremendously serious talk on mathematics today. Instead I will try to entertain and stimulate you by showing you some really weird real numbers. I’m not trying to undermine what you may have learned in your mathematics classes. I love the real numbers. I have nothing against real numbers. There’s even a real number, Ω, that has my name on it.1 But as you will see, there are some strange things going on with the real numbers. Let’s start by going back to a famous paper by Turing in 1936. This is Turing’s famous 1936 paper in the Proceedings of the London Mathematical Society; mathematicians proudly claim it creates the computer industry, which is not quite right of course. But it does have the idea of a general-purpose computer and of hardware and software, and it is a wonderful paper. This paper is called “On computable numbers, with an application to the Entscheidungsproblem.” And what most people forget, and is the subject of
1 See the chapter on “Chaitin’s Constant” in Steven Finch, Mathematical Constants, Cambridge University Press, 2003.
  • 56. my talk today, is that when Turing talks about computable numbers, he’s talking about computable real numbers. Turing, 1936: “On computable numbers...” But when you work on a computer, the last thing on earth you’re ever going to see is a real number, because a real number has an infinite number of digits of precision, and computers only have finite precision. Computers don’t quite measure up to the exalted standards of pure mathematics. One of the important contributions of Turing’s paper, not to computer technology but to pure mathematics, and even to philosophy and epistemology, is that Turing’s paper distinguishes very clearly between real numbers that are computable and real numbers that are uncomputable. What is a real number? It’s just a measurement made with infinite precision. So if I have a straight line one unit long, and I want to find out where a point is, that corresponds to a real number. If it is all the way to the left in this unit interval, it’s 0.0000... If the point is all the way to the right, it’s 1.0000... If it is exactly in the middle, that’s 0.50000... And every point on this line corresponds to precisely one real number. There are no gaps. [Figure: the unit interval, a line segment marked 0.0, 0.5 and 1.0] So, if you just tell me exactly where a point is, that’s a real number. From the point of view of geometrical intuition a real number is something very simple: it is just a point on a line. But from an arithmetical point of view, if you want to calculate its numerical value digit by digit or bit by bit if you’re using binary, it turns out that real numbers are problematical. Even though to geometrical intuition points are the most natural and elementary thing you can imagine, if you want to actually calculate the value of a real number with infinite precision, you can get into big trouble. Actually, you never calculate it with infinite precision. What Turing says is that you calculate it with arbitrary precision. 
His notion of a computable real number is a real number that you can calculate as accurately as you may wish. I guess he actually says it is an infinite calculation. You start calculating its numerical value, if you’re using decimal, digit by digit, or if you’re using binary, bit by bit. You have the integer part, the decimal point, and then you have an infinite string of bits or digits, depending on your base, and the computer will grind away gradually giving you more and more bits or more and more digits of the numerical value of the number.
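As a concrete illustration of "as accurately as you may wish," here is a minimal sketch that produces as many decimal digits of √2 as requested, using only integer arithmetic. The function name `sqrt2_digits` is invented for this example and is not from the talk; the only library call is Python's `math.isqrt`.

```python
from math import isqrt


def sqrt2_digits(n: int) -> str:
    """Return sqrt(2) truncated (not rounded) to n decimal digits.

    Hypothetical helper for illustration. It relies on the identity
    floor(sqrt(2) * 10**n) == isqrt(2 * 10**(2 * n)),
    so no floating-point rounding is involved.
    """
    scaled = isqrt(2 * 10 ** (2 * n))
    text = str(scaled)
    # The integer part of sqrt(2) is the single digit 1.
    return text[0] + "." + text[1:]


if __name__ == "__main__":
    # Asking for more precision only ever extends the earlier digits.
    for n in (5, 10, 20):
        print(sqrt2_digits(n))
```

The program never finishes "all" of √2; it just answers any finite request for precision, which is the sense in which the number is computable.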
  • 57. So that’s a computable real number. According to Turing, that means it is a real number for which there is an algorithm, a mechanical procedure, for calculating its value with arbitrary accuracy, with more and more precision. For example, π is a computable real number, √2 is a computable real number, and e is a computable real number. In fact, every real number you’ve ever encountered in your math classes, every individual real number that you’ve ever encountered, is a computable real number. Computable reals: π, √2, e, 1/2, 3/4... These are the familiar real numbers, the computable ones, but surprisingly enough, Turing points out that there are also lots of uncomputable real numbers. Dramatically enough, the moment Turing comes up with the computer as a mathematical concept (mathematicians call this a universal Turing machine) he immediately points out that there are things no computer can do. And one thing no computer can do is calculate the value of an uncomputable real number. How does Turing show that there are uncomputable reals? Well, the first argument he gives goes back to Cantor’s theory of infinite sets, which tells us that the set of real numbers is an infinite set that is bigger, that is infinitely more numerous, than the set of computer programs. The possible computer programs are just as numerous as the positive integers, as the whole numbers 1, 2, 3, 4, 5... but the set of real numbers is a much bigger infinity. So in fact there are more uncomputable reals than computable reals. From Cantor’s theory of infinite sets, we see that the set of uncomputable reals is just as big as the set of all reals, while the set of computable reals is only as big as the set of whole numbers. The set of uncomputable reals is much bigger than the set of computable reals. $\#\{\text{uncomputable reals}\} = \#\{\text{all reals}\} = 2^{\aleph_0}$, $\#\{\text{computable reals}\} = \#\{\text{computer programs}\} = \#\{\text{whole numbers}\} = \aleph_0$, $2^{\aleph_0} > \aleph_0$. 
The set of computable reals is as numerous as the computer programs, because each computable real needs a computer program to calculate it. And the computer programs are as numerous as the whole numbers 1, 2, 3, 4, 5... because you can think of a computer program as a very big whole number.
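To make "a program is a very big whole number" tangible, here is a small sketch under assumptions of my own: the program is given as source text, it is encoded as UTF-8 bytes, and a marker byte is put in front so that nothing is lost when converting back. The names `program_to_number` and `number_to_program` are hypothetical.

```python
def program_to_number(source: str) -> int:
    """Read the bytes of a program's text as one big base-256 numeral."""
    # Assumption for this sketch: a leading 0x01 marker byte keeps any
    # leading zero bytes from vanishing in the integer representation.
    data = b"\x01" + source.encode("utf-8")
    return int.from_bytes(data, "big")


def number_to_program(n: int) -> str:
    """Invert program_to_number for numbers that it produced."""
    data = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return data[1:].decode("utf-8")


if __name__ == "__main__":
    source = "print(2 + 2)"
    n = program_to_number(source)
    print(n)                      # one (large) whole number
    print(number_to_program(n))   # the original program text
```

Not every whole number decodes to a meaningful program under this encoding, but every program gets its own number, which is all the counting argument needs.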
  • 58. In base-two a whole number is just a string of bits, which is all a computer program is. That most reals are uncomputable was quite a surprise, but Turing doesn’t stop with that. In his famous paper he uses a technique from set theory called Cantor’s diagonal argument to exhibit an individual example of an uncomputable real. Turing’s 1936 paper has an intellectual impact and a technological impact. From a technological point of view his paper is fantastic because as we all know the computer has changed our lives; we can’t imagine living without it. In 1936 Turing has the idea of a flexible machine, of flexible hardware, of what we now call software. A universal machine changes to be like any other machine when you insert the right software. This is a very deep concept: a flexible digital machine. You don’t need lots of special-purpose machines, you can make do with only one machine. This is the idea of a general-purpose computer, and it is in Turing’s wonderful 1936 paper, before anybody built such a device. But then immediately Turing points out that there are real numbers that can’t be calculated, that no machine can furnish you with better and better approximations for; in fact there are more uncomputable reals than computable reals. So in a sense this means that most individual real numbers are like mythical beasts, like Pegasus or unicorns; name your favorite mythical object that unfortunately doesn’t exist in the real world. In this talk I will concentrate on the uncomputable reals, not the reals that are familiar to all of us like π, √2 and 3/4. I’ll tell you about some surprising real numbers, ones that are in the shadow, that we cannot compute, and whose numerical values are quite elusive. And the first thing I’d like to say on this topic is that there actually was an example of an uncomputable real before Turing’s 1936 paper. 
It was in a short essay by a wonderful French mathematician who is now largely forgotten called Émile Borel. Émile Borel in 1927, without anticipating in any way Turing’s wonderful paper, does point out a real number that we can now recognize as an uncomputable real. Borel’s 1927 number is a very paradoxical real number, and I’d like to tell you about it. Let’s see if you enjoy it as much as I do! Borel’s idea is to have a know-it-all real number: It’s an oracle that knows the answer to every yes/no question. Borel was a Frenchman, and he imagined writing all possible yes/no questions
  • 59. in French in a numbered list. So each question has its number, and then what you do is consider the real number with no integer part, and whose Nth decimal, the Nth digit after the decimal point of this number, answers the Nth question in the list of all possible questions. Borel’s 1927 know-it-all real: The Nth digit answers the Nth question. If the answer to the Nth question is “yes,” then the Nth digit is, say, a 1, and if the answer is “no” then that digit will be a 2. You have an infinite number of digits, so you can pack an infinite number of answers in Borel’s number. If you can imagine a list of all possible yes/no questions, this number will be like an oracle that will answer them all. If you could know the value of this magical number with sufficient accuracy, you could answer any particular, individual question. Needless to say, this number is a bit fantastic. So why did Borel come up with this crazy example? To show us that there are real numbers that are not for real. Borel’s amazing number will give you the answer to every yes/no question in mathematics and in physics, and about the past and the future. You can ask Borel’s oracle paradoxical questions, like “Is the answer to this question no?” And then you have a problem, because whether you answer “no” or you answer “yes,” it will always be the wrong answer. There is no correct answer. Another problem is if you ask “Will I drink coffee tomorrow?” And depending on what you find in Borel’s number, you do the opposite. You will just have to put up with not drinking coffee for one day, perhaps, to refute the know-it-all number. So these are some paradoxical aspects of Borel’s idea. Another problem is, how do you make a list of all possible yes/no questions in order to give a number to each question? Actually, it is easy to number each question. 
What you do is make a list of all possible texts drawn from the alphabet of the language you’re interested in. You have ten possibilities for each digit in Borel’s oracle number, and so far we’ve only used the digits 1 and 2. So you can use the other digits to say that that sequence of characters from that particular national alphabet
  • 60. isn’t grammatical, or it’s not a question, or it’s a question but it’s not a yes/no question, or it’s a yes/no question which has no answer because it’s paradoxical, and maybe it’s a yes/no question which has an answer but I don’t want to tell you about the future because I want to avoid the coffee problem. Another way to try to fix Borel’s number is to restrict it to just give the answer to mathematical questions. You can imagine an infinite list of mathematical questions, you pick a formal language in which you ask yes/no mathematical questions, you pick a notation for asking such questions, and there is certainly no problem with having Borel’s know-it-all number answer only mathematical questions. That will get rid of the paradoxes. You can make that work simply because you can pack an infinite amount of information in a real number.[2] And this magical number could be represented, in principle, by two metal rods, if you believe in infinite precision lengths. You have a rod that is exactly one unit long, one meter long, and you have another rod that is smaller, whose length is precisely the know-it-all number. You have your standard meter, and you have something less than a meter long whose length is precisely the magical know-it-all number. If you could measure the size of the smaller rod relative to the standard meter with arbitrary accuracy, you could answer every mathematical question, if somebody gave you this magical metal rod.[3] Of course, we are assuming that you can make measurements with infinite precision, which any physicist who is here will say is impossible. I think that the most accurate measurement that has ever been made has under twenty digits of precision. But in theory having these two rods would give us an oracle. It would be like having Borel’s know-it-all number. 
[2] As long as we avoid self-reference, i.e., giving this know-it-all number a name and then having a digit ask about itself in a way that requires that very digit to be different. E.g., is the digit of Borel’s know-it-all number that corresponds to this very question a 2? [3] This is an argument against infinite divisibility of space. In 1932 Hermann Weyl gave an argument against the infinite divisibility of time. If time is really infinitely divisible, then a machine could perform one step of a calculation in one second, then another step in half a second, the next in 1/4 of a second, then in 1/8 of a second, then 1/16 of a second, and in this manner would perform an infinite number of steps in precisely $1 + \frac{1}{2} + \frac{1}{4} + \cdots = 2$ seconds. But no one believes that such so-called Zeno machines are actually possible. And now you’re not going to be surprised to hear that if you fix Borel’s
  • 61. number so that it is at least well-defined, not paradoxical, it is in fact uncomputable. For otherwise you could compute the answer to every question, which is implausible. But Borel did not really have the notion of computability nor of what a computer is. He was working with the idea of computation intuitively, informally, without defining it. He said that as far as he was concerned this number is conceivable but not really legitimate. He felt that his real number was too wild. Borel had what is now called a constructive attitude. His personal view, which he states in that little 1927 essay with the oracle number, is that he believes in a real number if in principle it can be calculated, if in theory there’s some way to do that. And he talks about this counter-example, this number which is conceivable but in his opinion is not really legitimate. If we remove Borel’s number, that will leave a hole in the line, in the unit interval [0, 1] = {x : 0 ≤ x ≤ 1}. So we’ve broken the unit interval into two pieces because we just eliminated one point, which is very unfortunate geometrically. But let’s go on. Okay, so this was 1927, this was Émile Borel with his paradoxical know-it-all real number that answers every yes/no question, and it is a mathematical fantasy, not a reality. Now let’s go back to Turing. In 1936 Turing points out that there are more uncomputable reals than computable ones. Now I’d like to tell you something that Turing doesn’t point out, which uses ideas from Émile Borel, who was one of the inventors of what’s called measure theory or probability theory. It turns out that if you choose a real number x between zero and one, x ∈ [0, 1], and you have uniform probability of picking any real number between zero and one, then the probability is unity that the number x will be uncomputable. The probability is zero that the number will be computable. Prob{uncomputable reals} = 1, Prob{computable reals} = 0. 
That’s not too difficult to see. Now I’ll give you a mathematical proof of this. You want to cover all the computable reals in the unit interval with a covering which can be arbitrarily small. That’s the way to prove that the computable reals have measure zero, which means that they are an infinitesimal part of the unit interval, that they have zero probability. Technically, you say they have measure zero.
  • 62. Remember that every computable real corresponds to a program, the program to calculate it, and the programs are essentially positive integers, so there’s a first program, a second program, a third program... So you can imagine all the computable reals in a list. There will be a first computable real, a second, a third... And I cover the first computable real with an interval. I put on top of it an interval of length $\varepsilon/2$. And then I cover the second computable real with an interval of length $\varepsilon/4$. I cover the third computable real with an interval of length $\varepsilon/8$, then $\varepsilon/16$, then $\varepsilon/32$... So the total size of the covering is $\frac{\varepsilon}{2} + \frac{\varepsilon}{4} + \frac{\varepsilon}{8} + \frac{\varepsilon}{16} + \frac{\varepsilon}{32} + \cdots = \varepsilon$. Some of these intervals may overlap; it doesn’t really matter. What does matter is that the total length of all these intervals is exactly $\varepsilon$, and you can make $\varepsilon$ as small as you want. So this is just a proof that something is of measure zero. I’m taking the trouble to show that you can corner all the computable real numbers in the unit interval by covering them. You can do it with a covering that you can make as small as you please, which means that the computable reals have zero probability, they occupy zero real-estate in the unit interval. This is a proof that the computable reals really are exceptional. But they’re the exception of our normal, everyday experience. The fact that uncomputable reals have probability unity doesn’t help us to find any concrete examples! To repeat, if I pick a real number at random between zero and one, it is possible to get a computable real, but it is infinitely unlikely. Probability zero in this circumstance doesn’t mean impossibility, it just means that it’s an infinitesimal probability, it is infinitely unlikely. It is possible. It would be miraculous, but it can happen. The way mathematicians say this, is that real numbers are almost surely uncomputable. This is a bit discouraging to those of us who prefer computable reals to uncomputable ones. 
Or maybe it is a bit surprising that all the individual reals in our normal experience are exceptional. It does make me think that perhaps real numbers are problematic and cannot be taken for granted. What do you think?
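For readers who like to see the covering argument on one line, it can be written as a geometric series. The symbols here are labels assumed for this summary rather than notation from the talk: $r_k$ is the $k$th computable real in the list, $I_k$ is the interval placed over it, $|I_k|$ is its length, and $\mu$ is length (Lebesgue measure) on the unit interval.

```latex
\[
  \{r_1, r_2, r_3, \dots\} \subseteq \bigcup_{k=1}^{\infty} I_k ,
  \qquad
  |I_k| = \frac{\varepsilon}{2^{k}} ,
  \qquad
  \mu\Bigl(\bigcup_{k=1}^{\infty} I_k\Bigr)
  \;\le\; \sum_{k=1}^{\infty} \frac{\varepsilon}{2^{k}}
  \;=\; \varepsilon .
\]
```

The inequality allows for overlapping intervals. Since the bound holds for every $\varepsilon > 0$, the set of computable reals has measure zero.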
  • 63. What’s another way to put it? In other words, the real numbers are like a Swiss cheese with a lot of holes. In fact, it’s all holes! It’s like looking at the night sky. There are stars in the night sky, those are the computable reals, but the background is always black; the stars are the exception. So that is how the real numbers look! All the reals we know and love are exceptional. And Borel goes a little farther in his last book, written when he was in his eighties. In 1952 he published a book called Les nombres inaccessibles (Inaccessible Numbers) in which he points out that most real numbers can’t even be referred to or named individually in any way. The real numbers that you can somehow name, or pick out as individuals, even without being able to compute them, have to be the exception, with total probability zero. Most real numbers cannot even be named as individuals in any way, constructive or non-constructive. The way somebody put it is, most reals are wall-flowers, they’ll never be invited to dance! Prob{individually nameable reals} = 0, Prob{un-nameable reals} = 1. Okay, so what I’d like to do in the rest of this talk is to take Borel’s crazy, know-it-all, oracle real number, and try to make it as realistic as possible. We’ve gotten ourselves into a bit of a quandary. I tend to believe in something if I can calculate it; if so, that mathematical object has concrete meaning for me. So I have a sort of constructive attitude in math. But there is this surprising fact that in some sense most mathematical facts or objects seem to be beyond our reach. Most real numbers can never be calculated, they’re uncomputable, which suggests that mathematics is full of things that we can’t know, that we can’t calculate. This is related to something famous called Gödel’s incompleteness theorem from 1931, five years before Turing. 
Gödel’s 1931 theorem says that given any finite set of axioms, there will be true mathematical statements that escape, that are true but can’t be proven from those axioms. So mathematics resists axiomatization. There is no Theory of Everything for pure mathematics. Gödel, 1931: No TOE for pure mathematics! And what Turing shows in 1936 is that there are a lot of things in mathematics that you can never calculate, that are beyond our reach because there’s no way to calculate the answer. And in fact real numbers are an example: most real numbers, with probability one, cannot be calculated.
  • 64. So it might be nice to try to come up with an example of a particular real number that can’t be calculated, and try to make these strange, mysterious, hidden real numbers as concrete as possible. I’d like to show you a real number, Ω, which is as real as I can make it, but is nevertheless uncomputable. That’s my goal. In other words, there is what you can calculate or what you can prove in mathematics, and then there is this vast cloud of unknowing, of things that you can’t know or calculate. And I would like to try to find something right at the border between the knowable and the unknowable. I’m going to make it as real as possible, but it’s going to be just beyond our reach, just beyond what we can calculate. I want to show you a real number that can almost be calculated, which is as close as possible to seeming real, concrete, but in fact escapes us and is an example of this ubiquitous phenomenon of uncomputability that Turing discovered in 1936, of numbers which cannot be computed. How can we come up with a number like this? I’ll do it by combining ideas from Turing with ideas from Borel, and then using compression to eliminate all the redundancy. And the result will be my Ω number. This is not how I actually discovered Ω, but I think it is a good way to understand Ω. In his 1936 paper Turing discovered what’s called the halting problem. This famous paper took years to digest, and it was a while before mathematicians realized how important the halting problem is. Another important idea in this paper is the notion of a universal Turing machine. Of course, he doesn’t call it a Turing machine, that name came later. So if you look there you don’t find the words “Turing machine.” Another thing that is in this paper but you won’t find it if you look for it, is a very famous result called the unsolvability of the halting problem, which I will explain now. 
If you look at the paper, it’s not easy to spot, it’s not called that, but the idea is certainly there. It took years of work on this paper by a community to extract the essential ideas, give them catchy names, and start waving flags with those names on them. So let me tell you about the halting problem, which is a very fundamental thing that Turing came up with. Remember that Turing has the idea of a general-purpose computer, and then since he’s a pure mathematician, he immediately starts pointing out that there are things that no computer can calculate. There are things that no algorithm can achieve, which there is no mechanical way to calculate.
  • 65. One of these things is called the halting problem. What is the halting problem? It’s a very simple question. Let’s say you’re given a computer program, and it’s a computer program that is self-contained, so it cannot ask for input, it cannot read in any data. If there is any data it needs, all that data has to be included in the program as a constant. And the program just starts calculating, it starts executing. And there are two possibilities: Does the program go on forever, or at some point does it get a result and say “I’m finished,” and halt? That’s the question. Does a self-contained computer program ever halt? So you’re given a program that’s self-contained, and want to know what will happen. It’s a self-contained program, you just start running it (the process is totally mechanical) and there are two possibilities: The first possibility is that this process will go on forever, that the program is searching for something that it will never find, and is in some kind of an infinite loop. The other possibility is that the program will eventually find what it is looking for, and maybe produce a result; at any rate, it will halt and stop and it is finished. And you can try to find out which is the case by running it. You run the program, and if it stops eventually you are going to discover that, if you are patient enough. The problem is what if it never stops; running the program cannot determine that. You can give up after running the program for a day or for a week, but you can’t be sure it is never going to stop. So Turing asks the very deep question, “Is there a general procedure, given a program, for deciding in advance, without running it, whether it is going to go on forever or whether it is eventually going to stop?” You want an algorithm for deciding this. You want an algorithm which will take a finite amount of time to decide, and will always give you the correct answer. 
And what Turing shows is that there is no general method for deciding, there is no algorithm for doing this; deciding whether a program halts or not isn’t a computable function. This is a very simple question involving computer programs that always has an answer — the program either goes on forever or not — but there’s no mechanical procedure for deciding, there’s no algorithm which always gives you the correct answer, there’s no general way, given a computer program, to tell what is going to happen.
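The asymmetry between "it halted" and "it hasn't halted yet" can be seen in a toy model. This is only a sketch under assumptions of my own, not Turing's construction: a self-contained program is modelled as a Python generator function that yields once per step, and `halts_within`, `counts_to_ten` and `loops_forever` are names invented for the example.

```python
from typing import Callable, Iterator, Optional

# Toy model (an assumption of this sketch): a program is a generator
# function; each `yield` is one step, and returning means halting.
Program = Callable[[], Iterator[None]]


def halts_within(program: Program, max_steps: int) -> Optional[bool]:
    """Run `program` for at most `max_steps` steps.

    Returns True if it halted within the budget, and None (meaning
    "don't know") if the budget ran out. It can never return False:
    running a program only ever confirms halting.
    """
    run = program()
    for _ in range(max_steps):
        try:
            next(run)
        except StopIteration:
            return True
    return None


def counts_to_ten() -> Iterator[None]:
    for _ in range(10):
        yield


def loops_forever() -> Iterator[None]:
    while True:
        yield


if __name__ == "__main__":
    print(halts_within(counts_to_ten, 100))    # halting is confirmed
    print(halts_within(counts_to_ten, 5))      # budget too small: unknown
    print(halts_within(loops_forever, 1000))   # unknown, at any budget
```

No choice of `max_steps` turns the "unknown" answer into a definite "never halts," which is the gap that Turing's theorem says no algorithm can close in general.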
  • 66. For individual programs you can sometimes decide, you can even settle infinitely many cases, but there’s no general way to decide. This is the famous result called the unsolvability of the halting problem. Turing, 1936: Unsolvability of the halting problem! However if you are a practical person from the business school, you may say, “What do I care?” And I would have to agree with you. You may well say, “All I care is will the program stop in a reasonable amount of time, say, a year. Who is going to wait more than a year?” But if you want to know if a program will stop in a fixed amount of time, that’s very easy to do, you just run it for that amount of time and see what happens. There is no unsolvability, none at all. You only get into trouble when there’s no time limit. So you may say that this is sort of a fantasy, because in the real world there is always a time limit: We’re not going to live forever, or you’re going to run out of power, or the computer is going to break down, or be crushed by glaciers, or the continents will shift and a volcano will melt your computer, or the sun will go nova, whatever horror you want to contemplate! And I agree with you. The halting problem is a theoretical question. It is not a practical question. The world of mathematics is a toy world where we ask fantasy questions, which is why we have nice theories that give nice answers. The real world is messy and complicated. The reason you can use reasoning and prove things in pure mathematics is because it’s a toy world, it’s much simpler than the real world. Okay, so this question, the halting problem, is not a real question, it’s an abstract, philosophical question. If you suspected that, I agree with you, you were right! But I like to live in the world of ideas. It’s a game, you may say, it’s a fantasy world, but that’s the world of pure mathematics. So let’s start with the question Turing proved in his 1936 paper is unsolvable.
There is no general method, no mechanical procedure, no algorithm to answer this question that will always work. So what I do, is I play a trick of the kind that you use in a field called statistical mechanics, a field of physics, which is to take an individual problem and embed it in a space, an ensemble of all possible problems of that type. That’s a well-known strategy. In other words, instead of asking if an individual program halts or not, let’s look at the probability that this will happen, taken over the ensemble of all possible programs...
  • 67. But first let me tell you why it is sometimes very important to know whether a program halts or not. You may say, “Who cares?” Well, in pure mathematics it is important because there are famous mathematical conjectures which it turns out are equivalent to asking whether a program halts or not. There’s a lovely example from ancient Greece. There’s something called a perfect number. A number is perfect if it is the sum of all its divisors. (Or half the sum, if you include the number itself as one of the divisors.) So 6 is a perfect number, because its divisors are 1, 2 and 3, and 6 = 1 + 2 + 3. That’s a perfect number. If the sum of the divisors is more than the number, then it’s abundant; if the sum of the divisors is less than the number, then it is deficient; and if the sum of the divisors is exactly equal to the number, then it is perfect. Furthermore, two numbers are amicable if each one is the sum of the divisors of the other. And there are lots of perfect numbers. The next perfect number is 28. 28 = 1 + 2 + 4 + 7 + 14. The question is, are there any odd perfect numbers? This is a question that goes back to ancient Greece, to Pythagoras, Euclid and Plato. Are there odd perfect numbers? So the question is, are there any odd perfect numbers? And the answer, amazingly enough, is that nobody knows. It’s a very simple question, the concepts go back two millennia, but all the perfect numbers that have been found are even, and nobody knows if there’s an odd perfect number. Now, in principle you could just start a computer program going, have it look at each odd number, find its divisors, add them, and see whether the sum is exactly the number. So if there’s an odd perfect number, we’re eventually going to find it. If the program never ends, then all the perfect numbers are even. It searches for an odd perfect number and either it halts because it found one, or goes on forever without ever finding what it is looking for. 
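The search just described is easy to write down. Here is a minimal sketch; the function names are made up for illustration, divisors are taken to exclude the number itself, and a `limit` parameter is added so that the example terminates (the program in the talk has no limit, which is exactly what makes it an instance of the halting problem).

```python
from typing import Optional


def proper_divisor_sum(n: int) -> int:
    """Sum of the divisors of n, excluding n itself."""
    if n < 2:
        return 0
    total = 1  # 1 divides every n >= 2
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:      # avoid counting a square root twice
                total += n // d
        d += 1
    return total


def classify(n: int) -> str:
    """Return 'perfect', 'abundant' or 'deficient' for n."""
    s = proper_divisor_sum(n)
    if s == n:
        return "perfect"
    return "abundant" if s > n else "deficient"


def search_odd_perfect(limit: int) -> Optional[int]:
    """Look for an odd perfect number below `limit` (bounded on purpose)."""
    for n in range(3, limit, 2):
        if proper_divisor_sum(n) == n:
            return n
    return None


if __name__ == "__main__":
    print([n for n in range(1, 10_000) if classify(n) == "perfect"])
    print(proper_divisor_sum(220), proper_divisor_sum(284))  # an amicable pair
    print(search_odd_perfect(10_000))
```

Removing the limit turns `search_odd_perfect` into a program that halts if and only if an odd perfect number exists, so knowing whether it halts would settle the conjecture.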
It turns out that most of the famous conjectures in mathematics, but not all, are equivalent to asking whether a computer program halts. The general
  • 68. idea is that most down-to-earth mathematical questions are instances of the halting problem. However whether or not there are infinitely many perfect numbers (which is also unknown) is not a case of the halting problem. On the other hand, a famous conjecture called the Riemann hypothesis is an instance of the halting problem. And there’s Fermat’s Last Theorem, actually a three-century old conjecture which has now been proven by Andrew Wiles, stating that there is no solution of $x^N + y^N = z^N$ ($x, y, z$ integers $> 0$, $N$ integer $\ge 3$). These are all conjectures which if false can be refuted by a numerical counter-example. You can search systematically for a counter-example using a computer program, hence that kind of mathematical conjecture is equivalent to asking whether a program halts or not. There’s a program that systematically looks for solutions of $x^N + y^N = z^N$ and there’s a program that systematically looks for zeros of the Riemann zeta function that are in the wrong place. The Riemann hypothesis is complicated, but if it’s false, there is a finite calculation which refutes it, and you can search systematically for that. (The Riemann hypothesis is important because if it’s true, then the prime numbers are smoothly distributed in a certain precise technical sense. This seems to be the case but no one can prove it.) What I’m trying to say is that a lot of famous mathematical conjectures are equivalent to special cases of the halting problem. If you had a way of solving the halting problem that would be pretty nifty. It would be great to have an oracle for the halting problem. Which by the way is Turing’s terminology, but not in that famous 1936 paper. In another paper he talks about oracles, which is a lovely term to use in pure mathematics. 
Following Borel 1927, we know how to pack the answers to all possible cases of the halting problem into one real number, and this gives us a more realistic version of Borel’s magical know-it-all oracle number. You use the successive bits of a real number to give the answer to every individual case of the halting problem. Remember that you can think of a computer program as a whole number, as an integer. You can number all the programs. In binary machine language a computer program is just a long bit string, and you can think of it as the base-two numeral for a big whole number. So every program is also a whole number. And then if a program is the number N, the Nth program in a list of all possible programs, you use the Nth bit of a real number to tell us whether
  • 69. or not that program halts. If the Nth program halts, the Nth bit will be a 1; if it doesn’t halt, the Nth bit will be a 0. Halting-problem oracle number: The Nth bit answers the Nth case of the halting problem. This is a more realistic version of Borel’s 1927 oracle number. And following Turing’s 1936 paper it is uncomputable. Why? Because if you could compute this real number, you could solve the halting problem, you could decide whether any self-contained program will halt, and this would enable you to settle a lot of famous mathematical conjectures, for instance the Riemann hypothesis. The Clay Mathematics Institute has offered a million dollar prize to the person who settles the Riemann hypothesis, but only if they settle it positively, I think. But it would also be very interesting to refute the Riemann hypothesis. There is a bit in this real number which corresponds to the program that looks for a refutation of the Riemann hypothesis. If you could know what this particular bit is, that wouldn’t actually be worth a million dollars, because it wouldn’t give you a proof. Nevertheless this is a piece of information that a lot of mathematicians would like to know because the Riemann hypothesis is a famous problem in pure mathematics having to do with the prime numbers and how smoothly they are distributed. So a halting-problem oracle would be a valuable thing to have. This number wouldn’t tell you about history or the future; it wouldn’t answer every yes/no question in French. But Borel’s 1927 number is paradoxical. Our halting-problem oracle is a much more down-to-earth number. In spite of being more down to earth, it is an uncomputable real that would also be very valuable. But we can do even better! This halting-problem oracle packs a lot of mathematical information into one real number, but it doesn’t do it in the best, most economical way. 
This real number is redundant, it repeats a lot of information, it’s not the most compact, concise way to give the answer to every case of the halting problem. You’re wasting a lot of bits, you’re wasting a lot of space in this real number, you’re repeating a lot of information. Let me tell you why. We want to know whether individual programs halt or not. Now I’ll give the second and last proof in this talk. Suppose that we are given a lot of individual cases of the halting problem. Suppose we have a list of a thousand or a million programs, and want to know if each one halts or not. These are all self-contained programs.
If you have a thousand programs or a million programs, you might think that to know whether each of these programs halts or not is a thousand or a million bits of mathematical information. And it turns out that it’s not, it’s actually only ten or twenty bits of mathematical information.

N cases of halting problem = only $\log_2 N$ bits of information.

Why isn’t it a thousand or a million bits of information? Well, you don’t need to know the answer in every individual case. You don’t want to ask the oracle too many questions. Oracles should be used sparingly. Do we really need to ask the oracle about each individual program? Not at all! It is enough to know how many of the programs halt; I don’t need to know each individual case. And that’s a lot less information. If there are $2^N$ programs, you only need N bits of information, not $2^N$ bits. You don’t need to know about each individual case. As I said, you just need to know how many of the programs halt. If there are N programs, that’s just $\log_2 N$ bits of information, which is much less than N bits of information. How come we get this huge savings? Let’s say you are given a finite set of programs, you have a finite collection of programs, and you want to know whether each one halts or not. Why does it suffice to know how many of these programs halt? You just start running all of them in parallel, and they start halting, and eventually all the programs that will ever halt, have halted. And if you know exactly how many that is, you don’t have to wait any longer, you can stop at that point. You know that all the other programs will never halt. All the ones that haven’t halted yet are never going to halt. In other words, the answers to individual instances of the halting problem are never independent, they are always correlated. These are not independent mathematical facts. That’s why we don’t really need to ask an oracle in each individual case whether a program halts.
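The run-everything-in-parallel argument can be made concrete with a small sketch. It is a toy under stated assumptions: programs are modeled as Python generators (one `next` call is one step), and the names `counter`, `loop_forever`, `decide_halting` and `bits_for_count` are hypothetical. The only information taken from the "oracle" is the number of programs that halt.

```python
from functools import partial
from typing import Callable, Iterator, List


def counter(steps: int) -> Iterator[None]:
    """Toy program that halts after `steps` steps."""
    for _ in range(steps):
        yield


def loop_forever() -> Iterator[None]:
    """Toy program that never halts."""
    while True:
        yield


def decide_halting(programs: List[Callable[[], Iterator[None]]],
                   num_halting: int) -> List[bool]:
    """Run all programs in parallel, one step each per round, and stop as
    soon as `num_halting` of them have halted; the rest never will."""
    running = {i: make() for i, make in enumerate(programs)}
    halted = set()
    while len(halted) < num_halting:
        for i in list(running):
            try:
                next(running[i])
            except StopIteration:
                halted.add(i)
                del running[i]
    return [i in halted for i in range(len(programs))]


def bits_for_count(n_programs: int) -> int:
    """Bits needed to write a count between 0 and n_programs."""
    return n_programs.bit_length()


if __name__ == "__main__":
    programs = [partial(counter, 3), loop_forever, partial(counter, 10), loop_forever]
    print(decide_halting(programs, num_halting=2))
    print(bits_for_count(1000), bits_for_count(10**6))
```

If the supplied count were too large, the loop would itself never halt, which is why the count has to come from an oracle rather than from a guess.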
We can compress this information a great deal. This information has a lot of redundancy. There are a lot of correlations in the answers to individual instances of the halting problem. Okay, so you don’t need to use a bit for each program to get a real number that’s an oracle for the halting problem. I just told you how to do much better if you are only interested in a finite set of programs. But what if you are interested in all possible programs, what then? Well, here’s how you handle this.
You don’t ask whether individual programs halt or not; you ask what is the probability that a program chosen at random will halt.

Halting probability Ω = Prob{random program halts}.

That’s a real number between zero and one, and it is a real number I’m very proud of. I like to call it Ω, which is the last letter in the Greek alphabet, because it’s sort of a maximally unknowable real number. Let me explain first how you define Ω, and then I’ll talk about its remarkable properties. The idea is this: I’m taking Turing’s halting problem and I’m making it into the halting probability. Turing is interested in individual programs and asks whether or not they halt. I take all possible programs, I put them into a bag, a big bag that contains every possible computer program, I close my eyes, I shake the bag, I reach in and pull out a program and ask, “What is the probability that this program will halt?” If every program halts, this probability would be one. If no program halts, the probability of halting would be zero. Actually some programs halt and some don’t, so the halting probability is going to be strictly between zero and one

$0 < \Omega = .11011100\ldots < 1$,

with an exact numerical value depending on the choice of programming language. And it turns out that if you do things properly (there are some technical problems that I don’t want to talk about) you don’t really need to know for every individual program whether it halts or not. What you really need to know is what is the probability that a program will halt. And the way it works is this: If I know the numerical value of the halting probability Ω with N bits of precision (I’m writing it in binary, in base two), then I know for every program up to N bits in size whether or not it halts. Can you see why? Try thinking about it for a while.

Knowing N bits of Ω ⇒ Knowing which ≤ N bit programs halt.

This is a very compact, compressed way (in fact, it is the most compressed, compact way) of giving the answers to Turing’s halting problem. You can show this is the best possible compression, the best possible oracle,
this is the most economical way to do it, this is the algorithmic information content of the halting problem. Let me try to explain this. Do you know about file compression programs? There are lots of compression programs on your computer, and I’m taking all the individual answers to the halting problem and compressing them. So whatever your favorite compression program is, let’s use it to compress all the answers to the halting problem. If you could compress it perfectly, you’d get something that has absolutely no redundancy, something that couldn’t be compressed any more. So you get rid of all the redundancy in individual answers to the halting problem, and what you get is this number I call the halting probability Ω. This is just the most compact, compressed way to give you the answer to all the individual cases of Turing’s famous 1936 halting problem. Even though Ω is a very valuable number because it solves the halting problem, the interesting thing about it is that it is algorithmically and logically irreducible. In other words, Ω looks random, it looks like it has no structure, the bits of its numerical value look like independent tosses of a fair coin.

The bits of Ω are irreducible mathematical information.

Why is this? The answer is, basically, that any structure in something disappears when you compress it. If there were any pattern in the bits of Ω, for example, if 0s and 1s were not equally likely, then Ω would not be maximally compressed. In other words, when you remove all the redundancy from something, what you’re left with looks random, but it isn’t, because it’s full of valuable information. What you get when you compress Turing’s halting problem, Ω, isn’t noise, it’s very valuable mathematical information, it gives you the answers to Turing’s halting problem, but it looks random, accidental, arbitrary, simply because you’ve removed all the redundancy. Each bit is a complete surprise.
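To make the definition of a halting probability tangible, here is a minimal sketch for a toy machine, not for a universal computer. Everything in it is an assumption made for illustration: programs have the self-delimiting form `1^k 0` followed by k payload bits (so no program is a prefix of another), the payload read as an integer m "halts after m steps" when m is even and "runs forever" when m is odd, and the names `toy_programs`, `toy_halting_time` and `omega_lower_bound` are hypothetical. Each halting program p contributes 2^-|p|, the chance that coin tossing produces exactly p.

```python
from fractions import Fraction
from itertools import product
from typing import Iterator, Optional


def toy_programs(max_len: int) -> Iterator[str]:
    """Yield every toy program 1^k 0 <k payload bits> of length <= max_len."""
    k = 0
    while 2 * k + 1 <= max_len:
        for payload in product("01", repeat=k):
            yield "1" * k + "0" + "".join(payload)
        k += 1


def toy_halting_time(program: str) -> Optional[int]:
    """ASSUMPTION (toy rule): even payload m halts after m steps; odd never halts."""
    k = program.index("0")          # number of leading 1s
    payload = program[k + 1:]
    m = int(payload, 2) if payload else 0
    return m if m % 2 == 0 else None


def omega_lower_bound(t: int) -> Fraction:
    """Sum 2^-|p| over toy programs of size <= t that halt within t steps."""
    total = Fraction(0)
    for p in toy_programs(t):
        steps = toy_halting_time(p)
        if steps is not None and steps <= t:
            total += Fraction(1, 2 ** len(p))
    return total


if __name__ == "__main__":
    for t in (1, 3, 5, 9, 15):
        print(t, omega_lower_bound(t), float(omega_lower_bound(t)))
```

For this toy rule the bounds creep up toward 3/4 and the limit is computable, which is exactly what fails for a universal machine: there the same monotone procedure still converges, but no algorithm can tell how close a given stage is to the limit.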
This may seem paradoxical, but it is a basic result in information theory that once you compress something and get rid of all the redundancy in it, if you take a meaningful message and do this to it, afterwards it looks just like noise. Let me summarize what we’ve seen thus far. We have the halting probability Ω that is an oracle for Turing’s halting problem. It depends on your programming language and there are technical details that I don’t want to go into, but if you do everything properly you
get this probability that is greater than zero and less than one. It’s a real number, and if you write it in base two, there’s no integer part, just a “.” and then a lot of bits. These bits look like they have absolutely no structure or pattern; they look random, they look like the typical result of independent tosses of a fair coin. They are sort of maximally unknowable, maximally uncomputable. Let me try to explain what this means. At this point I want to make a philosophical statement. In pure mathematics all truths are necessary truths. And there are other truths that are called contingent or accidental like historical facts. That Napoleon was the emperor of France is not something that you expect to prove mathematically, it just happened, so it’s an accidental or a contingent truth. And whether each bit of the numerical value of the halting probability Ω is a 0 or a 1 is a necessary truth, but looks like it’s contingent. It’s a perfect simulation of a contingent, accidental, random truth in pure mathematics, where all truths are necessary truths.

The bits of Ω are necessary but look accidental, contingent.

This is a place where God plays dice. I don’t know if any of you remember the dispute many years ago between Niels Bohr and Albert Einstein about quantum mechanics? Einstein said, “God doesn’t play dice!”, and Bohr said, “Well, He does in quantum mechanics!” I think God also plays dice in pure mathematics. I do believe that in the Platonic world of mathematics the bits of the halting probability are fully determined. It’s not arbitrary, you can’t choose them at random. In the Platonic world of pure mathematics each bit is determined. Another way to put it is that God knows what each bit is. But what can we know down here at our level with our finite means? Well, seen from our limited perspective the bits of Ω are maximally unknowable, they are a worst case.
The precise mathematical statement of why the bits of the numerical value of Ω are difficult to know, difficult to calculate and difficult to prove (to determine what they are by proof) is this: In order to be able to calculate the first N bits of the halting probability you need to use a program that is at least N bits in size. And to be able to prove what each of these N bits is starting from a set of axioms, you need to have at least N bits of axioms. So the bits of Ω are irreducible mathematical facts, they are computationally and logically irreducible. Essentially the only way to get out of a
formal mathematical theory what these bits are, is to put that in as a new axiom. But you can prove anything by adding it as a new axiom. So this is a place where mathematical truth has no structure, no pattern, where logical reasoning doesn’t work, because these are sort of accidental mathematical facts. Let me explain this another way. Leibniz talks about something called the principle of sufficient reason. He was a rationalist, and he believed that if anything is true, it must be true for a reason. In pure math the reason that something is true is called a proof. However, the bits of the halting probability Ω are truths that are true for no reason; more precisely, they are true for no reason simpler than themselves. The only way to prove what they are is to take that as a new postulate. They seem to be completely contingent, entirely accidental.

The bits of Ω are mathematical facts that are true for no reason.

They look a lot like independent tosses of a fair coin, even though they are determined mathematically. It’s a perfect simulation within pure math of independent tosses of a fair coin. So to give an example, 0s and 1s are going to be equally likely. If you knew all the even bits, it wouldn’t help you to get any of the odd bits. If you knew the first million bits, it wouldn’t help you to get the next bit. It’s a place where mathematical truth just has no structure or pattern. But the bits of Ω do have a lot of statistical structure. For example, in the limit there will be exactly as many 0s as 1s, the ratio of their occurrences will tend to unity. Also, all blocks of two bits are equally likely. 00, 01, 10 and 11 each have limiting relative frequency exactly 1/4; you can prove that. More generally, Ω is what Borel called a normal number, which means that in each base b, every possible block of K base-b “digits” will have exactly the same limiting relative frequency $1/b^K$. That’s provably the case for Ω.
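The block-frequency property can be illustrated empirically, with one loud caveat: the bits of Ω cannot be computed, so the sketch below uses pseudo-random bits from Python's seeded generator as a stand-in. It therefore shows what "every K-bit block has limiting frequency 1/2^K" looks like on a finite sample, and says nothing about Ω itself. The function name `block_frequencies` and the sample size are illustrative choices, and blocks are counted without overlap for simplicity.

```python
import random
from collections import Counter
from typing import Dict


def block_frequencies(bits: str, k: int) -> Dict[str, float]:
    """Relative frequency of each k-bit block, using non-overlapping blocks."""
    blocks = [bits[i:i + k] for i in range(0, len(bits) - k + 1, k)]
    counts = Counter(blocks)
    return {block: count / len(blocks) for block, count in sorted(counts.items())}


# ASSUMPTION: seeded pseudo-random bits stand in for the (uncomputable) bits of Omega.
rng = random.Random(0)
sample_bits = "".join(rng.choice("01") for _ in range(200_000))

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, block_frequencies(sample_bits, k))
```

On a sample of this size each 2-bit block should appear with frequency close to 1/4; normality is the statement that the deviation vanishes in the limit, for every block length and every base.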
Ω is provably Borel normal. Another thing you can show is that the Ω number is transcendental; it’s not algebraic, it’s not the solution of an algebraic equation with integer coefficients. Actually any uncomputable number must be transcendental; it can’t be algebraic. But Ω is more than uncomputable, it’s maximally uncomputable. This is a place where mathematical truth has absolutely no structure or pattern. This is a place where mathematical truth looks contingent or accidental or random.
Now if I may go one step further, I’d like to end this talk by comparing pure mathematics with theoretical physics and with biology. Pure mathematics developed together with theoretical physics. A lot of wonderful pure mathematicians of the past were also theoretical physicists, Euler for example, or more recently Hermann Weyl. The two fields are rather similar. And physicists are still hoping for a theory of everything (TOE), which would be a set of simple, elegant equations that give you the whole universe, and which would fit on a T-shirt. So that’s physics. On the other hand we have biology. Molecular biology is a very complicated subject. An individual cell is like a city. Every one of us has $3 \times 10^9$ bases in our DNA, which is $6 \times 10^9$ bits. There is no simple equation for a human being. Biology is the domain of the complicated. How does pure mathematics compare with these two other fields? Normally you think pure math is closer to physics, since they grew together, they co-evolved. But what the bits of the halting probability Ω show is that in a certain sense pure math is closer to biology than it is to theoretical physics, because pure mathematics provably contains infinite irreducible complexity. Math is even worse than biology, which has very high but only finite complexity. The human genome is $6 \times 10^9$ bits, which is a lot, but it’s finite. But pure mathematics contains the bits of Ω, which is an infinite number of bits of complexity!

Human = $6 \times 10^9$ bits, Ω = infinite number of bits.

Thanks very much!
Chapter 6
Speculations on biology, information and complexity

Bulletin of the European Association for Theoretical Computer Science 91 (February 2007), pp. 231–237.

Abstract: It would be nice to have a mathematical understanding of basic biological concepts and to be able to prove that life must evolve in very general circumstances. At present we are far from being able to do this. But I’ll discuss some partial steps in this direction plus what I regard as a possible future line of attack.

Can Darwinian evolution be made into a mathematical theory? Is there a fundamental mathematical theory for biology?

Darwin = math ?!

In 1960 the physicist Eugene Wigner published a paper with a wonderful title, “The unreasonable effectiveness of mathematics in the natural sciences.” In this paper he marveled at the miracle that pure mathematics is so often extremely useful in theoretical physics. To me this does not seem so marvelous, since mathematics and physics co-evolved. That however does not diminish the miracle that at a fundamental
level Nature is ruled by simple, beautiful mathematical laws, that is, the miracle that Nature is comprehensible. I personally am much more disturbed by another phenomenon, pointed out by I.M. Gel’fand and propagated by Vladimir Arnold in a lecture of his that is available on the web, which is the stunning contrast between the relevance of mathematics to physics, and its amazing lack of relevance to biology! Indeed, unlike physics, biology is not ruled by simple laws. There is no equation for your spouse, or for a human society or a natural ecology. Biology is the domain of the complex. It takes $3 \times 10^9$ bases = $6 \times 10^9$ bits of information to specify the DNA that determines a human being. Darwinian evolution has acquired the status of a dogma, but to me as a mathematician it seems woefully vague and unsatisfactory. What is evolution? What is evolving? How can we measure that? And can we prove, mathematically prove, that with high probability life must arise and evolve? In my opinion, if Darwin’s theory is as simple, fundamental and basic as its adherents believe, then there ought to be an equally fundamental mathematical theory about this, that expresses these ideas with the generality, precision and degree of abstractness that we are accustomed to demand in pure mathematics. Look around you. We are surrounded by evolving organisms, they’re everywhere, and their ubiquity is a challenge to the mathematical way of thinking. Evolution is not just a story for children fascinated by dinosaurs. In my own lifetime I have seen the ease with which microbes evolve immunity to antibiotics. We may well live in a future in which people will again die of simple infections that we were once briefly able to control. Evolution seems to work remarkably well all around us, but not as a mathematical theory! In the next section of this paper I will speculate about possible directions for modeling evolution mathematically.
I do not know how to solve this difficult problem; new ideas are needed. But later in the paper I will have the pleasure of describing a minor triumph. The program-size complexity viewpoint that I will now describe to you does have some successes to its credit, even though they only take us an infinitesimal distance in the direction we must travel to fully understand evolution.
A software view of biology: Can we model evolution via evolving software?

I’d like to start by explaining my overall point of view. It is summarized here:

Life = Software ?
program → COMPUTER → output
DNA → DEVELOPMENT/PREGNANCY → organism
(Size of program in bits) ≈ (Amount of DNA in bases) × 2

So the idea is firstly that I regard life as software, biochemical software. In particular, I focus on the digital information contained in DNA. In my opinion, DNA is essentially a programming language for building an organism and then running that organism. More precisely, my central metaphor is that DNA is a computer program, and its output is the organism. And how can we measure the complexity of an organism? How can we measure the amount of information that is contained in DNA? Well, each of the successive bases in a DNA strand is just 2 bits of digital software, since there are four possible bases. The alphabet for computer software is 0 and 1. The alphabet of life is A, C, G, and T, standing for adenine, cytosine, guanine, and thymine. A program is just a string of bits, and the human genome is just a string of bases. So in both cases we are looking at digital information. My basic approach is to measure the complexity of a digital object by the size in bits of the smallest program for calculating it. I think this is more or less analogous to measuring the complexity of a biological organism by 2 times the number of bases in its DNA. Of course, this is a tremendous oversimplification. But I am only searching for a toy model of biology that is simple enough that I can prove some theorems, not for a detailed theory describing the actual biological organisms that we have here on earth. I am searching for the Platonic essence of biology; I am only interested in the actual creatures we know and love to the extent that they are clues for finding ideal Platonic forms of life.
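The "2 bits per base" bookkeeping is easy to check with a short sketch. The particular assignment of bit pairs to bases below is an arbitrary assumption (any one-to-one assignment gives the same size), and the names `BASE_TO_BITS`, `dna_to_bits` and `bits_to_dna` are hypothetical.

```python
# ASSUMPTION: arbitrary one-to-one assignment of 2-bit codes to the four bases.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def dna_to_bits(dna: str) -> str:
    """Encode a DNA string as a bit string, 2 bits per base."""
    return "".join(BASE_TO_BITS[base] for base in dna.upper())


def bits_to_dna(bits: str) -> str:
    """Decode a bit string of even length back into bases."""
    if len(bits) % 2:
        raise ValueError("bit string must have even length")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    strand = "GATTACA"
    encoded = dna_to_bits(strand)
    print(strand, "->", encoded, f"({len(encoded)} bits)")
    print("human genome:", 2 * 3_000_000_000, "bits")
```

This counts raw storage only. The program-size complexity of a genome could be smaller than 2 bits per base if the sequence contains redundancy.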
How to go about doing this, I am not sure. But I have some suggestions. It might be interesting, I think, to attempt to discover a toy model for evolution consisting of evolving, competing, interacting programs. Each organism would consist of a single program, and we would measure its complexity in bits of software. The only problem is how to make the programs
interact! This kind of model has no geometry, it leaves out the physical universe in which the organisms live. In fact, it omits bodies and retains only their DNA. This hopefully helps to make the mathematics more tractable. But at present this model has no interaction between organisms, no notion of time, no dynamics, and no reason for things to evolve. The question is how to add that to the model. Hopeless, you may say. Perhaps not! Let’s consider some other models that people have proposed. In von Neumann’s original model creatures are embedded in a cellular automata world and are largely immobile. Not so good! There is also the problem of dissecting out the individual organisms that are embedded in a toy universe, which must be done before their individual complexities can be measured. My suggestion in one of my early papers that it might be possible to use the concept of mutual information (the extent to which the complexity of two things taken together is smaller than the sum of their individual complexities) in order to accomplish this, is not, in my current opinion, particularly fruitful. In von Neumann’s original model we have the complete physics for a toy cellular automata universe. Walter Fontana’s ALChemy = algorithmic chemistry project went to a slightly higher level of abstraction. It used LISP S-expressions to model biochemistry. LISP is a functional programming language in which everything (programs as well as data) is kept in identical symbolic form, namely as what are called LISP S-expressions. Such programs can easily operate on each other and produce other programs, much in the way that molecules can react and produce other molecules. I have a feeling that both von Neumann’s cellular automata world and Fontana’s algorithmic chemistry are too low-level to model biological evolution.
(A model with perhaps the opposite problem of being at too high a level, is Douglas Lenat’s AM = Automated Mathematician project, which dealt with the evolution of new mathematical concepts.) So instead I am proposing a model in which individual creatures are programs. As I said, the only problem is how to model the ecology in which these creatures compete. In other words, the problem is how to insert a dynamics into this static software world. (See footnote 1.)

Footnote 1: Thomas Ray’s Tierra project did in fact create an ecology with software parasites and hyperparasites. The software creatures he considered were sequences of machine language instructions coexisting in the memory of a single computer and competing for that machine’s memory and execution time. Again, I feel this model was too low-level. I feel that too much micro-structure was included.
Since I have not been able to come up with a suitable dynamics for the software model I am proposing, I must leave this as a challenge for the future and proceed to describe a few biologically relevant things that I can do by measuring the size of computer programs. Let me tell you what this viewpoint can buy us that is a tiny bit biologically relevant.

Pure mathematics has infinite complexity and is therefore like biology

Okay, program-size complexity can’t help us very much with biological complexity and evolution, at least not yet. It’s not much help in biology. But this viewpoint has been developed into a mathematical theory of complexity that I find beautiful and compelling (since I’m one of the people who created it) and that has important applications in another major field, namely metamathematics. I call my theory algorithmic information theory, and in it you measure the complexity of something X via the size in bits of the smallest program for calculating X, while completely ignoring the amount of effort which may be necessary to discover this program or to actually run it (time and storage space). In fact, we pay a severe price for ignoring the time a program takes to run and concentrating only on its size. We get a beautiful theory, but we can almost never be sure that we have found the smallest program for calculating something. We can almost never determine the complexity of anything, if we choose to measure that in terms of the size of the smallest program for calculating it! This amazing fact, a modern example of the incompleteness phenomenon first discovered by Kurt Gödel in 1931, severely limits the practical utility of the concept of program-size complexity. However, from a philosophical point of view, this paradoxical limitation on what we can know is precisely the most interesting thing about algorithmic information theory, because that has profound epistemological implications.
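The gap between "a program we have found" and "the smallest program" can be illustrated with an ordinary compressor. This is a sketch under explicit assumptions: the length of a zlib-compressed file is used as a crude upper bound on program-size complexity (it ignores the fixed size of the decompressor), and the function name `compressed_size_bits` is hypothetical. A compressor can only ever show that something is simple; it can never certify that nothing shorter exists.

```python
import random
import zlib


def compressed_size_bits(data: bytes) -> int:
    """Size in bits of the zlib-compressed data: an upper bound, never a proof of minimality."""
    return 8 * len(zlib.compress(data, 9))


# Highly structured data: the two bytes "01" repeated 5000 times.
structured = b"01" * 5000

# Pseudo-random data from a seeded generator (ASSUMPTION: stand-in for patternless data).
rng = random.Random(0)
random_like = bytes(rng.getrandbits(8) for _ in range(10_000))

if __name__ == "__main__":
    for name, data in (("structured", structured), ("random_like", random_like)):
        print(name, 8 * len(data), "bits ->", compressed_size_bits(data), "bits")
```

The second case makes the point of the paragraph above: zlib finds no pattern in `random_like`, yet those bytes do have a short description, namely the few lines of code that generated them from the seed. Failing to find a smaller program is not evidence that none exists.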
The jewel in the crown of algorithmic information theory is the halting probability Ω, which provides a concentrated version of Alan Turing’s 1936 halting problem. In 1936 Turing asked if there was a way to determine whether or not individual self-contained computer programs will eventually stop. And his answer, surprisingly enough, is that this cannot be done. Perhaps it can be done in individual cases, but Turing showed that there
could be no general-purpose algorithm for doing this, one that would work for all possible programs. The halting probability Ω is defined to be the probability that a program that is chosen at random, that is, one that is generated by coin tossing, will eventually halt. If no program ever halted, the value of Ω would be zero. If all programs were to halt, the value of Ω would be one. And since in actual fact some programs halt and some fail to halt, the value of Ω is greater than zero and less than one. Moreover, Ω has the remarkable property that its numerical value is maximally unknowable. More precisely, let’s imagine writing the value of Ω out in binary, in base-two notation. That would consist of a binary point followed by an infinite stream of bits. It turns out that these bits are irreducible, both computationally and logically:

• You need an N-bit program in order to be able to calculate the first N bits of the numerical value of Ω.
• You need N bits of axioms in order to be able to prove what are the first N bits of Ω.
• In fact, you need N bits of axioms in order to be able to determine the positions and values of any N bits of Ω, not just the first N bits.

Thus the bits of Ω are, in a sense, mathematical facts that are true for no reason, more precisely, for no reason simpler than themselves. Essentially the only way to determine the values of some of these bits is to directly add that information as a new axiom. And the only way to calculate individual bits of Ω is to separately add each bit you want to your program. The more bits you want, the larger your program must become, so the program doesn’t really help you very much. You see, you can only calculate bits of Ω if you already know what these bits are, which is not terribly useful. Whereas with π = 3.1415926 . . . we can get all the bits or all the digits from a single finite program, that’s all you have to know.
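The contrast with π can be shown directly: a program of fixed size produces as many digits as one cares to wait for. The sketch below uses a widely circulated Python rendering of Gibbons' unbounded spigot algorithm; it is one possible choice of algorithm, not one named in the text, and the generator name `pi_digits` is hypothetical.

```python
from itertools import islice
from typing import Iterator


def pi_digits() -> Iterator[int]:
    """Yield the decimal digits of pi one at a time (unbounded spigot)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled; emit it and rescale the state.
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            # Not enough information yet; absorb one more term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


if __name__ == "__main__":
    print("".join(str(d) for d in islice(pi_digits(), 30)))
```

The program text stays the same length whether 30 digits or 30 million are requested. For Ω no such fixed-size program can exist: each additional bit requires, in effect, another bit of program.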
The algorithm for π compresses an infinite amount of information into a finite package. But with Ω there can be no compression, none at all, because there is absolutely no structure. Furthermore, since the bits of Ω in their totality are infinitely complex, we see that pure mathematics contains infinite complexity. Each of the bits of Ω is, so to speak, a complete surprise, an individual atom of mathematical creativity. Pure mathematics is therefore, fundamentally, much more similar
to biology, the domain of the complex, than it is to physics, where there is still hope of someday finding a theory of everything, a complete set of equations for the universe that might even fit on a T-shirt. In my opinion, establishing this surprising fact has been the most important achievement of algorithmic information theory, even though it is actually a rather weak link between pure mathematics and biology. But I think it’s an actual link, perhaps the first.

Computing Ω in the limit from below as a model for evolution

I should also point out that Ω provides an extremely abstract (much too abstract to be satisfying) model for evolution. Because even though Ω contains infinite complexity, it can be obtained in the limit of infinite time via a computational process. Since this extremely lengthy computational process generates something of infinite complexity, it may be regarded as an evolutionary process. How can we do this? Well, it’s actually quite simple. Even though, as I have said, Ω is maximally unknowable, there is a simple but very time-consuming way to obtain increasingly accurate lower bounds on Ω. To do this simply pick a cut-off t, and consider the finite set of all programs p up to t bits in size which halt within time t. Each such program p contributes $1/2^{|p|}$, 1 over 2 raised to p’s size in bits, to Ω. In other words,

$$\Omega = \lim_{t \to \infty} \sum_{|p| \le t \;\&\; p \text{ halts within time } t} 2^{-|p|}.$$

This may be cute, and I feel compelled to tell you about it, but I certainly do not regard this as a satisfactory model for biological evolution, since there is no apparent connection with Darwin’s theory.

References

The classical work on a theoretical mathematical underpinning for biology is von Neumann’s posthumous book [2]. (An earlier account of von Neumann’s thinking on this subject was published in [1], which I read as a
child.) Interestingly enough, Francis Crick, who probably contributed more than any other individual to creating modern molecular biology, for many years shared an office with Sydney Brenner, who was aware of von Neumann’s thoughts on theoretical biology and self-reproduction. This interesting fact is revealed in the splendid biography of Crick [3]. For a book-length presentation of my own work on information and complexity, see [4], where there is a substantial amount of material on molecular biology. This book is summarized in my recent article [5], which however does not discuss biology. A longer overview of [4] is my Alan Turing lecture [6], which does touch on biological questions. For my complete train of thought on biology extending over nearly four decades, see also [7,8,9,10,11]. For information on Tierra, see Tom Ray’s home page at http://www.his. For information on ALChemy, see http://www.santafe.edu/~walter/AlChemy/papers.html. For information on Douglas Lenat’s Automated Mathematician, see [12] and the Wikipedia entry http://en. For Vladimir Arnold’s provocative lecture, the one in which Wigner and Gel’fand are mentioned, see arnold.html. Wigner’s entire paper is itself on the web at http://www.

1. J. Kemeny, “Man viewed as a machine,” Scientific American, April 1955, pp. 58–67.
2. J. von Neumann, Theory of Self-Reproducing Automata, University of Illinois Press, Urbana, 1967.
3. M. Ridley, Francis Crick, Eminent Lives, New York, 2006.
4. G. Chaitin, Meta Math!, Pantheon Books, New York, 2005.
5. G. Chaitin, “The limits of reason,” Scientific American, March 2006, pp. 74–81.
6. G. Chaitin, “Epistemology as information theory: from Leibniz to Ω,” European Computing and Philosophy Conference, Västerås, Sweden, June 2005.
7. G. Chaitin, “To a mathematical definition of ‘life’,” ACM SICACT News, January 1970, pp. 12–18.
8. G. Chaitin, “Toward a mathematical definition of ‘life’,” R. Levine, M. Tribus, The Maximum Entropy Formalism, MIT Press, 1979, pp. 477–498.
9. G. Chaitin, “Algorithmic information and evolution,” O. Solbrig, G. Nicolis, Perspectives on Biological Complexity, IUBS Press, 1991, pp. 51–60.
10. G. Chaitin, “Complexity and biology,” New Scientist, 5 October 1991, p. 52.
11. G. Chaitin, “Meta-mathematics and the foundations of mathematics,” Bulletin of the European Association for Theoretical Computer Science, June 2002, pp. 167–179.
12. D. Lenat, “Automated theory formation in mathematics,” pp. 833–842 in volume 2 of R. Reddy, Proceedings of the 5th International Joint Conference on Artificial Intelligence, Cambridge, MA, August 1977, William Kaufmann, 1977.
  • 87. Chapter 7 Metaphysics, metamathematics and metabiology
To be published in H. Zenil, Randomness Through Computation, World Scientific, 2011.
Abstract: In this essay we present an information-theoretic perspective on epistemology using software models. We shall use the notion of algorithmic information to discuss what is a physical law, to determine the limits of the axiomatic method, and to analyze Darwin’s theory of evolution.
Weyl, Leibniz, complexity and the principle of sufficient reason
The best way to understand the deep concept of conceptual complexity and algorithmic information, which is our basic tool, is to see how it evolved, to know its long history. Let’s start with Hermann Weyl and the great philosopher/mathematician G. W. Leibniz. That everything that is true is true for a reason is rationalist Leibniz’s famous principle of sufficient reason. The bits of Ω seem to refute this fundamental principle and also the idea that everything can be proved starting from self-evident facts.
  • 88. What is a scientific theory?
The starting point of algorithmic information theory, which is the subject of this essay, is this toy model of the scientific method:
theory/program/010 → Computer → experimental data/output/110100101.
A scientific theory is a computer program for exactly producing the experimental data, and both theory and data are a finite sequence of bits, a bit string. Then we can define the complexity of a theory to be its size in bits, and we can compare the size in bits of a theory with the size in bits of the experimental data that it accounts for. That the simplest theory is best, means that we should pick the smallest program that explains a given set of data. Furthermore, if the theory is the same size as the data, then it is useless, because there is always a theory that is the same size as the data that it explains. In other words, a theory must be a compression of the data, and the greater the compression, the better the theory. Explanations are compressions, comprehension is compression! Furthermore, if a bit string has absolutely no structure, if it is completely random, then there will be no theory for it that is smaller than it is. Most bit strings of a given size are incompressible and therefore incomprehensible, simply because there are not enough smaller theories to go around. This software model of science is not new. It can be traced back via Hermann Weyl (1932) to G. W. Leibniz (1686)! Let’s start with Weyl. In his little book on philosophy The Open World: Three Lectures on the Metaphysical Implications of Science, Weyl points out that if arbitrarily complex laws are allowed, then the concept of law becomes vacuous, because there is always a law! In his view, this implies that the concept of a physical law and of complexity are inseparable; for there can be no concept of law without a corresponding complexity concept.
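The remark above that there are "not enough smaller theories to go around" is a counting argument, and it can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are invented for this example, and it generously treats every bit string shorter than the data as a candidate program, which overcounts what any actual programming language would allow.

```python
from fractions import Fraction

def programs_shorter_than(n: int) -> int:
    """Number of bit strings of length 0, 1, ..., n-1 (candidate theories)."""
    return 2**n - 1

def max_fraction_compressible(n: int, k: int) -> Fraction:
    """Upper bound on the fraction of n-bit strings that could be the output
    of some program at least k bits shorter (each program has one output)."""
    candidates = programs_shorter_than(n - k + 1)  # program lengths 0 .. n-k
    return Fraction(candidates, 2**n)

if __name__ == "__main__":
    n = 64
    for k in (1, 8, 16):
        bound = max_fraction_compressible(n, k)
        print(f"n={n}: at most {float(bound):.6f} of strings compress by >= {k} bits")
```

Under these assumptions, fewer than one string in 2^(k-1) can be compressed by k or more bits, whatever the language.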
Unfortunately he also points out that in spite of its importance, the concept of complexity is a slippery one and hard to define mathematically in a convincing and rigorous fashion. Furthermore, Weyl attributes these ideas to Leibniz, to the 1686 Discours de métaphysique. What does Leibniz have to say about complexity in his Discours? The material on complexity is in Sections V and VI of the Discours. In Section V, Leibniz explains why science is possible, why the world is comprehensible, lawful. It is, he says, because God has created the best possible, the most perfect world, in that the greatest possible diversity of
  • 89. phenomena are governed by the smallest possible set of ideas. God simultaneously maximizes the richness and diversity of the world and minimizes the complexity of the ideas, of the mathematical laws, that determine this world. That is why science is possible! A modern restatement of this idea is that science is possible because the world seems very complex but is actually governed by a small set of laws having low conceptual complexity. And in Section VI of the Discours, Leibniz touches on randomness. He points out that any finite set of points on a piece of graph paper always seems to follow a law, because there is always a mathematical equation passing through those very points. But there is a law only if the equation is simple, not if it is very complicated. This is the idea that impressed Weyl, and it becomes the definition of randomness in algorithmic information theory.¹
Finding elegant programs
So the best theory for something is the smallest program that calculates it. How can we be sure that we have the best theory? Let’s forget about theories and just call a program elegant if it is the smallest program that produces the output that it does. More precisely, a program is elegant if no smaller program written in the same language produces the same output. So can we be sure that a program is elegant, that it is the best theory for its output? Amazingly enough, we can’t: It turns out that any formal axiomatic theory A can prove that at most finitely many programs are elegant, in spite of the fact that there are infinitely many elegant programs. More precisely, it takes an N-bit theory A, one having N bits of axioms, having complexity N, to be able to prove that an individual N-bit program is elegant. And we don’t need to know much about the formal axiomatic theory A in order to be able to prove that it has this limitation.
What is a formal axiomatic theory?
All we need to know about the axiomatic theory A is the crucial requirement emphasized by David Hilbert that there should be a proof-checking
¹ Historical Note: Algorithmic information theory was first proposed in the 1960s by R. Solomonoff, A. N. Kolmogorov, and G. J. Chaitin. Solomonoff and Chaitin considered this toy model of the scientific method, and Kolmogorov and Chaitin proposed defining randomness as algorithmic incompressibility.
  • 90. algorithm, a mechanical procedure for deciding if a proof is correct or not. It follows that we can systematically run through all possible proofs, all possible strings of characters in the alphabet of the theory A, in size order, checking which ones are valid proofs, and thus discover all the theorems, all the provable assertions in the theory A.² That’s all we need to know about a formal axiomatic theory A, that there is an algorithm for generating all the theorems of the theory. This is the software model of the axiomatic method studied in algorithmic information theory. If the software for producing all the theorems is N bits in size, then the complexity of our theory A is defined to be N bits, and we can limit A’s power in terms of its complexity H(A) = N. Here’s how:
Why can’t you prove that a program is elegant?
Suppose that we have an N-bit theory A, that is, that H(A) = N, and that it is always possible to prove that individual elegant programs are in fact elegant, and that it is never possible to prove that inelegant programs are elegant. Consider the following paradoxical program P: P runs through all possible proofs in the formal axiomatic theory A, searching for the first proof in A that an individual program Q is elegant for which it is also the case that the size of Q in bits is larger than the size of P in bits. And what does P do when it finds Q? It runs Q and then P produces as its output the output of Q. In other words, the output of P is the same as the output of the first provably elegant program Q that is larger than P. But this contradicts the definition of elegance! P is too small to be able to calculate the output of an elegant program Q that is larger than P. We seem to have arrived at a contradiction! But do not worry; there is no contradiction. What we have actually proved is that P can never find Q.
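The procedure described above, running through every string in size order and keeping the ones a proof checker accepts, can be sketched in a few lines. Everything specific here is an assumption made for illustration: the four-character alphabet, the function names, and the toy "theory" of binary addition in which a string of the form x+y=z serves as its own proof. A real formal axiomatic theory would plug a genuine proof checker into the same loop.

```python
from itertools import count, islice, product
from typing import Iterator

ALPHABET = "01+="  # hypothetical alphabet of the toy theory

def all_strings() -> Iterator[str]:
    """Yield every string over ALPHABET in size order (shortest first)."""
    for size in count(0):
        for chars in product(ALPHABET, repeat=size):
            yield "".join(chars)

def is_valid_proof(s: str) -> bool:
    """Toy proof checker: accept 'x+y=z' when the binary sum is correct."""
    if s.count("+") != 1 or s.count("=") != 1:
        return False
    left, _, z = s.partition("=")
    x, _, y = left.partition("+")
    if not (x and y and z):
        return False  # a part is missing or the symbols are out of order
    return int(x, 2) + int(y, 2) == int(z, 2)

def theorems() -> Iterator[str]:
    """Enumerate all theorems by mechanically checking every candidate."""
    return (s for s in all_strings() if is_valid_proof(s))

if __name__ == "__main__":
    for theorem in islice(theorems(), 5):
        print(theorem)
```

The enumeration never terminates on its own, which matches the text: it is a slow, mindless computation whose only relevant property is the size of the program that drives it.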
In other words, there is no proof in the formal axiomatic theory A that an individual program Q is elegant, not if Q is larger than P. And how large is P? Well, just a fixed number of bits c larger than N, the complexity H(A) of the formal axiomatic theory A. P 2 Historical Note: The idea of running through all possible proofs, of creativity by mechanically trying all possible combinations, can be traced back through Leibniz to Ramon Llull in the 1200s.
  • 91. consists of a small, fixed main program c bits in size, followed by a large subroutine H(A) bits in size for generating all the theorems of A. The only thing tricky about this proof is that it requires P to be able to know its own size in bits. And how well we are able to do this depends on the details of the particular programming language that we are using for the proof. So to get a neat result and to be able to carry out this simple, elegant proof, we have to be sure to use an appropriate programming language. This is one of the key issues in algorithmic information theory, which programming language to use.³
Farewell to reason: The halting probability Ω⁴
So there are infinitely many elegant programs, but there are only finitely many provably elegant programs in any formal axiomatic theory A. The proof of this is rather straightforward and short. Nevertheless, this is a fundamental information-theoretic incompleteness theorem that is rather different in style from the classical incompleteness results of Gödel, Turing and others. An even more important incompleteness result in algorithmic information theory has to do with the halting probability Ω, the numerical value of the probability that a program p whose successive bits are generated by independent tosses of a fair coin will eventually halt:
$$\Omega = \sum_{p \text{ halts}} 2^{-(\text{size in bits of } p)}.$$
To be able to define this probability Ω, it is also very important how you choose your programming language. If you are not careful, this sum will diverge instead of being ≤ 1 like a well-behaved probability should. Turing’s fundamental result is that the halting problem is unsolvable. In algorithmic information theory the fundamental result is that the halting probability Ω is algorithmically irreducible or random. It follows that the bits of Ω cannot be compressed into a theory less complicated than they are. They are irreducibly complex.
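The warning above that the sum can diverge is easy to see numerically. The sketch below compares two made-up program formats: one in which every bit string counts as a program, and one self-delimiting (prefix-free) format in which no program is a prefix of another. The text does not say here which restriction it imposes, so the prefix-free requirement, the particular encoding, and all function names are assumptions for illustration; the sums range over all programs in each format, which bounds the sum over the halting ones.

```python
from fractions import Fraction
from itertools import product

def weight_sum(programs):
    """Sum of 2^-(size in bits) over a collection of bit-string programs."""
    return sum(Fraction(1, 2 ** len(p)) for p in programs)

def all_bit_strings(max_len):
    """Every bit string of length 1..max_len (not prefix-free)."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def self_delimiting(max_payload):
    """Hypothetical prefix-free format: k ones, a zero, then k payload bits."""
    for k in range(max_payload + 1):
        for bits in product("01", repeat=k):
            yield "1" * k + "0" + "".join(bits)

def is_prefix_free(programs):
    """True if no program is a proper prefix of another (adjacent check on sorted list)."""
    progs = sorted(programs)
    return not any(b.startswith(a) for a, b in zip(progs, progs[1:]))

if __name__ == "__main__":
    for limit in (5, 10):
        print(limit, weight_sum(all_bit_strings(limit)),
              weight_sum(self_delimiting(limit)))
```

With unrestricted bit strings each length contributes exactly 1, so the total grows without bound; with the self-delimiting format the total stays below 1 however far the enumeration is extended.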
It takes N bits of axioms to be able to 3 See the chapter on “The Search for the Perfect Language” in Chaitin, Mathematics, Complexity and Philosophy, in press. 4 Farewell to Reason is the title of a book by Paul Feyerabend, a wonderfully provocative philosopher. We borrow his title here for dramatic effect, but he does not discuss Ω in this book or any of his other works.
  • 92. determine N bits of the numerical value Ω = .1101011 . . . of the halting probability. If your formal axiomatic theory A has H(A) = N, then you can determine the values and positions of at most N + c bits of Ω. In other words, the bits of Ω are logically irreducible, they cannot be proved from anything simpler than they are. Essentially the only way to determine what are the bits of Ω is to add these bits to your theory A as new axioms. But you can prove anything by adding it as a new axiom. That’s not using reasoning! So the bits of Ω refute Leibniz’s principle of sufficient reason: they are true for no reason. More precisely, they are not true for any reason simpler than themselves. This is a place where mathematical truth has absolutely no structure, no pattern, for which there is no theory!
Adding new axioms: Quasi-empirical mathematics⁵
So incompleteness follows immediately from fundamental information-theoretic limitations. What to do about incompleteness? Well, just add new axioms, increase the complexity H(A) of your theory A! That is the only way to get around incompleteness. In other words, do mathematics more like physics, add new axioms not because they are self-evident, but for pragmatic reasons, because they help mathematicians to organize their mathematical experience just like physical theories help physicists to organize their physical experience. After all, Maxwell’s equations and the Schrödinger equation are not at all self-evident, but they work! And this is just what mathematicians have done in theoretical computer science with the hypothesis that P ≠ NP, in mathematical cryptography with the hypothesis that factoring is hard, and in abstract axiomatic set theory with the new axiom of projective determinacy.⁶
⁵ The term quasi-empirical is due to the philosopher Imre Lakatos, a friend of Feyerabend.
For more on this school, including the original article by Lakatos, see the collection of quasi-empirical philosophy of math papers edited by Thomas Tymoczko, New Directions in the Philosophy of Mathematics. 6 See the article on “The Brave New World of Bodacious Assumptions in Cryptography” in the March 2010 issue of the AMS Notices, and the article by W. Hugh Woodin on “The Continuum Hypothesis” in the June/July 2001 issue of the AMS Notices.
  • 93. Mathematics, biology and metabiology
We’ve discussed physical and mathematical theories; now let’s turn to biology, the most exciting field of science at this time, but one where mathematics is not very helpful. Biology is very different from physics. There is no simple equation for your spouse. Biology is the domain of the complex. There are not many universal rules. There are always exceptions. Math is very important in theoretical physics, but there is no fundamental mathematical theoretical biology. This is unacceptable. The honor of mathematics requires us to come up with a mathematical theory of evolution and either prove that Darwin was wrong or right! We want a general, abstract theory of evolution, not an immensely complicated theory of actual biological evolution. And we want proofs, not computer simulations! So we’ve got to keep our model very, very simple. That’s why this proposed new field is metabiology, not biology. What kind of math can we use to build such a theory? Well, it’s certainly not going to be differential equations. Don’t expect to find the secret of life in a differential equation; that’s the wrong kind of mathematics for a fundamental theory of biology. In fact a universal Turing machine has much more to do with biology than a differential equation does. A universal Turing machine is a very complicated new kind of object compared to what came previously, compared with the simple, elegant ideas in classical mathematics like analysis. And there are self-reproducing computer programs, which is an encouraging sign. There are in fact three areas in our current mathematics that do have some fundamental connection with biology, that show promise for math to continue moving in a biological direction: Computation, Information, Complexity.
DNA is essentially a programming language that computes the organism and its functioning; hence the relevance of the theory of computation for biology. Furthermore, DNA contains biological information. Hence the relevance of information theory. There are in fact at least four different theories of information:
    • Boltzmann statistical mechanics and Boltzmann entropy,
    • Shannon communication theory and coding theory,
  • 94.
    • algorithmic information theory (Solomonoff, Kolmogorov, Chaitin), which is the subject of this essay, and
    • quantum information theory and qubits.
Of the four, AIT (algorithmic information theory) is closest in spirit to biology. AIT studies the size in bits of the smallest program to compute something. And the complexity of a living organism can be roughly (very roughly) measured by the number of bases in its DNA, in the biological computer program for calculating it. Finally, let’s talk about complexity. Complexity is in fact the most distinguishing feature of biological as opposed to physical science and mathematics. There are many computational definitions of complexity, usually concerned with computation times, but again AIT, which concentrates on program size or conceptual complexity, is closest in spirit to biology. Let’s emphasize what we are not interested in doing. We are certainly not trying to do systems biology: large, complex realistic simulations of biological systems. And we are not interested in anything that is at all like Fisher-Wright population genetics that uses differential equations to study the shift of gene frequencies in response to selective pressures. We want to use a sufficiently rich mathematical space to model the space of all possible designs for biological organisms, to model biological creativity. And the only space that is sufficiently rich to do that is a software space, the space of all possible algorithms in a fixed programming language. Otherwise we have limited ourselves to a fixed set of possible genes as in population genetics, and it is hopeless to expect to model the major transitions in biological evolution such as from single-celled to multicellular organisms, which is a bit like taking a main program and making it into a subroutine that is called many times. Recall the cover of Stephen Gould’s Wonderful Life on the Burgess shale and the Cambrian explosion?
Around 250 primitive organisms with wildly differing body plans, looking very much like the combinatorial exploration of a software space. Note that there are no intermediate forms; small changes in software produce vast changes in output. So to simplify matters and concentrate on the essentials, let’s throw away the organism and just keep the DNA. Here is our proposal: Metabiology: a field parallel to biology that studies the random evolution of artificial software (computer programs) rather than
  • 95. natural software (DNA), and that is sufficiently simple to permit rigorous proofs or at least heuristic arguments as convincing as those that are employed in theoretical physics. This analogy may seem a bit far-fetched. But recall that Darwin himself was inspired by the analogy between artificial selection by plant and animal breeders and natural selection imposed by Malthusian limitations. Furthermore, there are many tantalizing analogies between DNA and large, old pieces of software. Remember bricolage, that Nature is a cobbler, a tinkerer? In fact, a human being is just a very large piece of software, one that is $3 \times 10^9$ bases = $6 \times 10^9$ bits ≈ one gigabyte of software that has been patched and modified for more than a billion years: a tremendous mess, in fact, with bits and pieces of fish and amphibian design mixed in with that for a mammal.⁷ For example, at one point in gestation the human embryo has gills. As time goes by, large human software projects also turn into a tremendous mess with many old bits and pieces. The key point is that you can’t start over, you’ve got to make do with what you have as best you can. If we could design a human being from scratch we could do a much better job. But we can’t start over. Evolution only makes small changes, incremental patches, to adapt the existing code to new environments. So how do we model this? Well, the key ideas are: Evolution of mutating software, and: Random walks in software space. That’s the general idea. And here are the specifics of our current model, which is quite tentative. We take an organism, a single organism, and perform random mutations on it until we get a fitter organism. That replaces the original organism, and then we continue as before. The result is a random walk in software space with increasing fitness, a hill-climbing algorithm in fact.⁸
⁷ See Neil Shubin, Your Inner Fish: A Journey into the 3.5-Billion-Year History of the Human Body.
8 In order to avoid getting stuck on a local maximum, in order to keep evolution from stopping, we stipulate that there is a non-zero probability to go from any organism to any other organism, and − log2 of the probability of mutating from A to B defines an important concept, the mutation distance, which is measured in bits.
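The single-organism random walk described on this page can be sketched as a small hill-climbing loop. This is a toy under loudly stated assumptions, not the model in the text: the three-instruction language, the length cap `max_len`, and every function name are invented here. Because every toy program halts, no halting oracle is needed, and because mutations change only one instruction at a time, the toy does not satisfy the stipulation in footnote 8 that every organism can mutate into every other with non-zero probability.

```python
import math
import random

OPS = ("inc", "dbl", "sqr")  # hypothetical instruction set; every program halts

def run(program):
    """Fitness: the positive integer the organism computes, starting from 1."""
    x = 1
    for op in program:
        if op == "inc":
            x += 1
        elif op == "dbl":
            x *= 2
        else:  # "sqr"
            x *= x
    return x

def mutate(program, rng):
    """Randomly replace, insert, or delete one instruction."""
    prog = list(program)
    kind = rng.choice(("replace", "insert", "delete"))
    if kind == "insert" or not prog:
        prog.insert(rng.randrange(len(prog) + 1), rng.choice(OPS))
    elif kind == "replace":
        prog[rng.randrange(len(prog))] = rng.choice(OPS)
    else:
        del prog[rng.randrange(len(prog))]
    return tuple(prog)

def evolve(steps, max_len=12, seed=0):
    """Hill climbing: a mutant replaces the organism only if strictly fitter."""
    rng = random.Random(seed)
    organism = ()
    history = [run(organism)]
    for _ in range(steps):
        mutant = mutate(organism, rng)
        if len(mutant) <= max_len and run(mutant) > run(organism):
            organism = mutant
            history.append(run(organism))
    return organism, history

def mutation_distance_bits(probability):
    """-log2 of the probability of mutating from A to B (footnote 8)."""
    return -math.log2(probability)

if __name__ == "__main__":
    best, history = evolve(steps=2000)
    print(best, "fitness has", len(str(history[-1])), "decimal digits")
```

The length cap is there only to keep the integers manageable; it also means this toy climb eventually stalls, which is exactly the kind of stagnation the text's model is designed to avoid.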
  • 96. Finally, a key element in our proposed model is the definition of fitness. For evolution to work, it is important to keep our organisms from stagnating. It is important to give them something challenging to do. The simplest possible challenge to force our organisms to evolve is what is called the Busy Beaver problem, which is the problem of providing concise names for extremely large integers. Each of our organisms produces a single positive integer. The larger the integer, the fitter the organism.⁹ The Busy Beaver function of N, BB(N), that is used in AIT is defined to be the largest positive integer that is produced by a program that is less than or equal to N bits in size. BB(N) grows faster than any computable function of N and is closely related to Turing’s famous halting problem, because if BB(N) were computable, the halting problem would be solvable.¹⁰ Doing well on the Busy Beaver problem can utilize an unlimited amount of mathematical creativity. For example, we can start with addition, then invent multiplication, then exponentiation, then hyper-exponentials, and use this to concisely name large integers:
$$N + N \to N \times N \to N^N \to N^{N^N} \to \cdots$$
There are many possible choices for such an evolving software model: You can vary the computer programming language and therefore the software space, you can change the mutation model, and eventually you could also change the fitness measure. For a particular choice of language and probability distribution of mutations, and keeping the current fitness function, it is possible to show that in time of the order of $2^N$ the fitness will grow as BB(N), which grows faster than any computable function of N and shows that genuine creativity is taking place, for mechanically changing the organism can only yield fitness that grows as a computable function.¹¹
⁹ Alternative formulations: The organism calculates a total function f(n) of a single non-negative integer n and f(n) is fitter than g(n) if f(n)/g(n) → ∞ as n → ∞. Or the organism calculates a (constructive) Cantor ordinal number and the larger the ordinal, the fitter the organism.
¹⁰ Consider BB′(N) defined to be the maximum run-time of any program that halts that is less than or equal to N bits in size.
¹¹ Note that to actually simulate our model an oracle for the halting problem would have to be employed to avoid organisms that have no fitness because they never calculate a positive integer. This also explains how the fitness can grow faster than any computable function. In our evolution model, implicit use is being made of an oracle for the halting problem, which answers questions whose answers cannot be computed by any algorithmic process.
  • 97. So with random mutations and just a single organism we actually do get evolution, unbounded evolution, which was precisely the goal of metabiology! This theorem may seem encouraging, but it actually has a serious problem. The times involved are so large that our search process is essentially ergodic, which means that we are doing an exhaustive search. Real evolution is not at all ergodic, since the space of all possible designs is much too immense for exhaustive search. It turns out that with this same model there is actually a much quicker ideal evolutionary pathway that achieves fitness BB(N) in time of the order of N. This path is however unstable under random mutations, plus it is much too good: Each organism adds only a single bit to the preceding organism, and immediately achieves near optimal fitness for an organism of its size, which doesn’t seem to at all reflect the haphazard, frozen-accident nature of what actually happens in biological evolution.¹² So that is the current state of metabiology: a field with some promise, but not much actual content at the present time. The particular details of our current model are not too important. Some kind of mutating software model should work, should exhibit some kind of basic biological features. The challenge is to identify such a model, to characterize its behavior statistically,¹³ and to prove that it does what is required.
¹² The Nth organism in this ideal evolutionary pathway is essentially just the first N bits of the numerical value of the halting probability Ω. Can you figure out how to compute BB(N) from this?
¹³ For instance, will some kind of hierarchical structure emerge? Large human software projects are always written that way.
  • 99. Bibliography
[1] G. J. Chaitin, Thinking about Gödel and Turing: Essays on Complexity, 1970–2007, World Scientific, 2007.
[2] G. J. Chaitin, Mathematics, Complexity and Philosophy, Midas, in press. (Draft at
[3] S. Gould, Wonderful Life, Norton (1989).
[4] N. Koblitz and A. Menezes, “The brave new world of bodacious assumptions in cryptography,” AMS Notices 57, 357–365 (2010).
[5] G. W. Leibniz, Discours de métaphysique, suivi de Monadologie, Gallimard (1995).
[6] N. Shubin, Your Inner Fish, Pantheon (2008).
[7] T. Tymoczko, New Directions in the Philosophy of Mathematics, Princeton University Press (1998).
[8] H. Weyl, The Open World, Yale University Press (1932).
[9] W. H. Woodin, “The continuum hypothesis, Part I,” AMS Notices 48, 567–576 (2001).
  • 101. Chapter 8 Algorithmic information as a fundamental concept in physics, mathematics and biology
In Memoriam Jacob T. “Jack” Schwartz (1930–2009)
The concept of information is not only fundamental in quantum mechanics, but also, when formulated as program-size complexity, helps in understanding what is a law of nature, the limitations of the axiomatic method, and Darwin’s theory of evolution.
Lecture given Wednesday, 23 September 2009, at the Institute for Quantum Computing in Waterloo, Canada.
I’m delighted to be here at IQC, an institution devoted to two of my favorite topics, information and computation.
Information & Computation
The field I work in is algorithmic information theory, AIT, and in a funny way AIT is precisely the dual of what the IQC is all about, which is quantum information and quantum computing. Let me compare and contrast the two fields: First of all, I care about bits of software, not qubits, I care about the size of programs, not about
  • 102. 102 Chaitin: Metabiology compute times; I look at the information content of individual objects, not at ensembles, and my computers are classical, not quantum. You care about what can be known in physical systems and I care about what can be known in pure mathematics. You care about practical applications, and I care about philosophy. And strangely enough, we both sort of end up in the same place, because God plays dice both in quantum mechanics and in pure math: You need quantum randomness for cryptography, and I find irreducible complexity in the bits of the halting probability Ω. So my subject will be algorithmic information, not qubits, and I’d like to show you three different applications of the notion of algorithmic information: in physics, in math and in biology. I’ll show you three software models, three toy models of what goes on in physics, in math and in biology, in which you get insight by considering the amount of software, the size of programs, the algorithmic information content. But first of all, I should say that these are definitely toy models, highly simplified models, of what goes on in physics, in math and in biology. In fact, my motto for today is taken from Picasso, who said, “Art is a lie that helps us see the truth!” Well, that also applies to theories: Theories are lies that help us see the truth! (Picasso) You see, in order to create a mathematical theory in which you can prove neat theorems, you have to concentrate on the essential features of a situation. The models that I will show you are highly simplified toy models. You have to eliminate all the distractions, brush away all the inessential features, so that you can see the ideal case, the ideal situation. I am only interested in the Platonic essence of a situation, so that I can weave it into a beautiful mathematical theory, so that I can lay bare its inner soul. I have no interest in complicated, realistic models of anything, because they do not lead to beautiful theories. 
You will have to judge for yourself if my models are oversimplified, and whether or not they help you to understand what is going on in more realistic situations. So let’s start with physics, with what you might call AIT’s version of the Leibniz/Weyl model of the scientific method. What is a law of nature, what is a scientific law? The AIT model of this is given in this diagram: Theory / Program / 011 → Computer → Experimental Data / World / 1101101
  • 103. On the left-hand side we have a scientific theory, which is a computer program, a finite binary sequence, a bit string, for calculating exactly your experimental data, perhaps the whole time-evolution of the universe, which in this discrete model is also a bit string. In other words, in this model a program is a theory for its output. I am not interested in prediction and I am not interested in statistical theories, only in deterministic theories that explain the data perfectly. And I don’t care how much work it takes to execute the program, to run the theory, to actually calculate the world from it, as long as the amount of time required to do this is finite. And I assume the world is discrete, not continuous. Remember Picasso. Remember I’m a pure mathematician! The best theory is the smallest, the most concise program that calculates precisely your data. And a theory is useless unless it is a compression, unless it has a much smaller number of bits than the data it accounts for, the data it explains. Why? Because there is always a theory with the same number of bits, even if the data is completely random. In fact, this gives you a way to distinguish between situations where there is a law, and lawless situations, ones where there is no theory, no way to understand what is happening. Amazingly enough, aspects of this approach go back to Leibniz in 1686 and to Weyl in 1932. Let me tell you about this. In fact, it’s all here:
If arbitrarily complex laws are permitted, then the concept of law becomes vacuous, because there is always a law!
Leibniz, 1686: Discours de métaphysique V, VI
Hermann Weyl, 1932: The Open World
1949: Philosophy of Mathematics & Natural Science
So you see, the concepts of law of nature and of complexity are inseparable. Remember, 1686 was the year before Newton’s Principia.
What Leibniz discusses in Sections V and VI of his Discours is why is science possible, how can you distinguish a world in which science applies from one where it doesn’t, how can you tell if the world is lawful or not. According to AIT, that’s only if there’s a theory with a much smaller number of bits than the data. Okay, that’s enough physics, let’s go on to mathematics. What if you want to prove that you have the best theory, the most concise program for something? What if you try to use mathematical reasoning for that?
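The criterion stated above, that there is a law only if some theory has far fewer bits than the data, can be illustrated with an ordinary compressor. This is a rough stand-in and not AIT itself: program-size complexity is uncomputable, and zlib only gives a crude upper bound on it. The helper names are invented for this sketch. Note also that the "lawless" bytes below come from a seeded pseudo-random generator, so they do have a short description; zlib simply cannot find it.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes of zlib's encoding of data: a computable upper bound
    standing in for the (uncomputable) size of the smallest program."""
    return len(zlib.compress(data, 9))

def make_lawful(n: int) -> bytes:
    """Data generated by a simple rule: the pattern 0, 1, ..., 15 repeated."""
    return bytes(i % 16 for i in range(n))

def make_lawless(n: int, seed: int = 0) -> bytes:
    """Pseudo-random bytes, used here as a proxy for structureless data."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

if __name__ == "__main__":
    n = 10_000
    for name, data in (("lawful", make_lawful(n)), ("lawless", make_lawless(n))):
        print(f"{name:8} {n} bytes -> {compressed_size(data)} bytes compressed")
```

The patterned data shrinks to a small fraction of its size, while the pseudo-random data does not shrink at all, which is the compressor's way of reporting that it found no theory for it.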
  • 104. 104 Chaitin: Metabiology What metamathematics studies are so-called formal axiomatic theories, in which there has to be a proof-checking algorithm, and therefore there is an algorithm for checking all possible proofs and enumerating all the theorems in your formal axiomatic theory. And that’s the model of an axiomatic theory studied in AIT. AIT concentrates on the size in bits of the algorithm for running through all the proofs and producing all theorems, a very slow, never-ending computation, but one that can be done mindlessly. So how many bits of information does it take to do this? That is all AIT cares about. To AIT a formal axiomatic theory is just a black box; the inner structure, the details of the theory, are unimportant. Formal Axiomatic Theory / Hilbert Axioms, Rules of Inference → Computer → Theorem1, Theorem2, Theorem3 . . . How many bits of software are you putting into the computer to get the theorems out? So that’s how we measure the information content, the complexity of a mathematical theory. The complexity of a formal axiomatic theory is the size in bits of the program that generates all the theorems. An N-bit theory is one in which the program to generate all the theorems is N bits in size. And my key result is that it takes an N-bit axiomatic theory to enable you to prove that an N-bit program is elegant, that is, that it is the most concise explanation for its output. More precisely, a program is elegant if no smaller program written in the same language produces the same output. It takes an N-bit theory to prove that an N-bit program is elegant. How can you prove this metatheorem? Well, that’s very easy. Consider a formal axiomatic mathematical theory T and this paradoxical program P: P computes the same output as the first provably elegant program Q whose size in bits is larger than P. 
In other words, P runs through all proofs in T, checking them all, until it gets to the first proof that a program Q that is larger than P is elegant, then P runs Q and produces Q’s output. But if P finds Q and does that, it produces the same output as a provably elegant program Q that is larger
  • 105. than P, which is impossible, because an elegant program is the most concise program that yields the output it does. Hence P never finds Q, which means that in T you can’t prove that Q is elegant if Q is larger in size than P. So now the key question is this: How many bits are there in P? Well, just a fixed number of bits more than there are in T. In other words, there is a constant c such that for any formal axiomatic theory T, if you can prove in T that a program Q is elegant only if it is actually elegant, then you can prove in T that Q is elegant only if Q’s size in bits, |Q|, is less than or equal to c plus the complexity of T, which is by definition the number of bits of software that it takes to enumerate all the theorems of your theory T. Q.E.D. [“Q is elegant” ∈ T only if Q is elegant] ⇒ [“Q is elegant” ∈ T only if |Q| ≤ |T| + c]. That’s the first major result in AIT: that it takes an N-bit theory to prove that an N-bit program is elegant. The second major result is that the bits of the base-two numerical value of the halting probability Ω are irreducible, because it takes an N-bit theory to enable you to determine N bits of Ω. This I won’t prove, but let me remind you how Ω is defined. And of course $\Omega = \Omega_L$ also depends on your choice of programming language L, which I discussed in my talk on Monday at the Perimeter Institute. Halting probability: $0 < \Omega_L = \sum_{p \text{ halts}} 2^{-|p|} < 1$, $\Omega_L = .1101100\ldots$ An N-bit theory ⇒ at most N + c bits of Ω. By the way, the fact that in general a formal axiomatic theory can’t enable you to prove that individual programs Q are elegant, except in finitely many cases, has as an immediate corollary that Turing’s halting problem is unsolvable.
Because if we had a general method, an algorithm, for deciding if a program will ever halt, then it would be trivial to decide if Q is elegant: You’d just run all the programs that halt that are smaller than Q to see if any of them produce the same output. Corollary: There is no algorithm for solving the halting problem. And we have also proved in two different ways that the world of pure math is infinitely complex, that no theory of finite complexity can enable you to determine all the elegant programs or all the bits of the halting probability Ω.
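The brute-force elegance test just described can be tried out in a setting where halting is trivially decidable. The Python sketch below assumes a hypothetical toy language (digit strings joined by `^`, read as right-associative exponentiation) in which every well-formed program halts, and it measures program size in characters rather than bits. The names `run` and `is_elegant` are illustrative only and are not part of AIT. For a universal language the same loop would need exactly the halting algorithm that the corollary rules out.

```python
from itertools import product

DIGITS = "0123456789"
ALPHABET = DIGITS + "^"  # hypothetical toy language, not a universal machine


def run(program):
    """Evaluate a toy program; return None if it is ill-formed.

    A program is one or more digit strings joined by '^', read as
    right-associative exponentiation. Every well-formed program halts.
    """
    parts = program.split("^")
    if any(part == "" or any(ch not in DIGITS for ch in part) for part in parts):
        return None
    value = int(parts[-1])
    for part in reversed(parts[:-1]):
        value = int(part) ** value
    return value


def is_elegant(program):
    """True if no strictly shorter toy program produces the same output."""
    if len(program) > 4:
        # Longer programs can be power towers that are too large to evaluate.
        raise ValueError("keep toy programs to at most 4 characters")
    target = run(program)
    if target is None:
        return False
    # Because every toy program halts, we can simply run all shorter ones.
    for size in range(1, len(program)):
        for chars in product(ALPHABET, repeat=size):
            if run("".join(chars)) == target:
                return False
    return True
```

In this toy, `is_elegant("9^9")` holds because no one- or two-character program names 387420489, while `is_elegant("2^3")` fails because the shorter program `8` has the same output.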
  • 106. Corollary: The world of pure math has infinite complexity, and is therefore more like biology, the domain of the complex, than like physics, where there is still hope of a simple, elegant theory of everything. Those are my software models of physics and math. Now for a software model of evolution! These are some new ideas of mine. I’ve just started working on this. I call it metabiology. Our starting point is the fact that, as Jack Schwartz used to tell me, DNA is just digital software. So we model organisms only as DNA, that is, we consider software organisms. Remember Picasso! We’ll mutate these software organisms, have a fitness function, and see what happens! The basic ideas are summarized here: Metabiology: random walks in software space; evolution of mutating software. Organism = program. Single organism. Fitness function = Busy Beaver problem. Mutation distance[A, B] = $-\log_2$ (probability of mutating from A to B). It will take a while to explain all this, and to tell you how far I’ve been able to get with this model. Basically, I have only two and a half theorems… Let me start by reminding you that Darwin was inspired by the analogy between artificial selection by animal and plant breeders and natural selection by Nature. Metabiology exploits the analogy between natural software (DNA) and artificial software (computer programs): METABIOLOGY: a field parallel to biology, dealing with the random evolution of artificial software (computer programs) rather than natural software (DNA), and simple enough to make it possible to prove rigorous theorems or formulate heuristic arguments at the same high level of precision that is common in theoretical physics. Next, I’d like to tell you how I came up with this idea. There are two key components. One I got by reading David Berlinski’s polemical book The
  • 107. Devil’s Delusion which discusses some of the arguments against Darwinian evolution, and the other is the Busy Beaver problem, which gives our organisms something challenging to do. And I should tell you about Neil Shubin’s book Your Inner Fish. Let me explain each of these in turn. Berlinski has an incisive discussion of perplexities with Darwinian evolution. One is the absence of intermediate forms, the other is major transitions in evolution such as that from single-celled to multicellular organisms. But neither of these is a problem if we consider software organisms. Darwin himself worried about the eye; he thought that partial eyes were useless. In fact, as a biologist explained to me, eye-like organs have evolved independently many different times. Anyway, we know very well that small changes in the software can produce drastic changes in the output. A one-bit change can destroy a program! So the absence of intermediate forms is not a problem. How about the transition from unicellular to multicellular? No problem, that’s just the idea of a subroutine. You take the main program and make it into a subroutine that you call many times. Or you fork it and run it simultaneously in many parallel threads. And Berlinski discusses the neutral theory of evolution and talks about evolution as a random walk. So that was my starting point. But to make my software organisms evolve, I need to give them something challenging to do. The Busy Beaver problem to the rescue! That’s the problem of finding small, concise names for extremely large positive integers. A program names a positive integer by calculating it and then halting. BB(N) = the largest positive integer you can name with a program of size ≤ N bits.
And the BB problem can utilize an unlimited amount of mathematical creativity, because it’s equivalent to Turing’s halting problem, since another way to define BB of N is as follows: BB(N) = the longest runtime of any program that halts that is ≤ N bits in size. These two definitions of BB(N) are essentially equivalent.¹ ¹ On naming large numbers, see Archimedes’ The Sand Reckoner, described in Gamow, One, Two, Three… Infinity. I thank Ilias Kotsireas for reminding me of this early work on the BB problem.
  • 108. The key point is that BB(N) grows faster than any computable function of N, because otherwise there would be an algorithm for solving the halting problem, which earlier in this lecture we showed is impossible using an information-theoretic argument. BB(N) grows faster than any computable function of N. At the level of abstraction I am working with in this model, there is no essential difference between mathematical creativity and biological creativity. Now let’s turn to Neil Shubin’s book Your Inner Fish, that he summarized in his article in the January 2009 special issue of Scientific American devoted to Darwin’s bicentennial. I don’t know much about biology, but I do have a lot of experience with software. Besides my theoretical work, during my career at IBM I worked on large software projects, compilers, operating systems, that kind of stuff. You can’t start a large software project over from scratch, you just patch it and add new function as best you can. And as Shubin spells it out, that’s exactly what Nature does too. Think of yourself as an extremely large piece of software that has been patched and modified for more than a billion years to adapt us to new ecological niches. We were not designed from scratch to be human beings, to be bipedal primates. Some of the design is like that of fish, some is like that of an amphibian. If you could start over, you could design human beings much better. But you can’t start over. You have to make do with what you have as best you can. Evolution makes the minimum changes needed to adapt to a changing environment. As François Jacob used to emphasize, Nature is a cobbler, a handyman, a bricoleur. We are all random walks in program space! So those are the main ideas, and now let me present my model. I have a single software organism and I try making random mutations. These could be high-level language mutations (copy a subroutine with a change), or low-level mutations.
Initially, I have chosen point mutations: insert, delete or change one or more bits. As the number of bits that are mutated increases, the probability of the mutation drops off exponentially. Also I favor the beginning of the program. The closer to the beginning a bit is, the more likely it is to change. Again, that drops off exponentially.
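A minimal sketch of such a point-mutation operator follows. The lecture does not give the exact distributions, so everything specific here is an assumption: positions and run lengths are drawn from a geometric distribution with ratio 1/2, and insert, delete and change are folded into one "delete a run, splice in fresh random bits" operation. The names `geometric` and `point_mutation` are hypothetical.

```python
import random


def geometric(rng, p=0.5):
    """Sample k = 0, 1, 2, ... with probability (1 - p)**k * p."""
    k = 0
    while rng.random() >= p:
        k += 1
    return k


def point_mutation(bits, rng):
    """Delete a run of bits near the front and splice in fresh random bits.

    Assumed stand-in for insert/delete/change: the position and both run
    lengths are geometric, so mutations that touch more bits, or bits
    farther from the start, become exponentially less likely.
    """
    pos = min(geometric(rng), len(bits))  # favors the beginning; clipped at the end
    n_deleted = geometric(rng)            # 0 means a pure insertion
    n_inserted = geometric(rng)           # 0 means a pure deletion
    fresh = [rng.randint(0, 1) for _ in range(n_inserted)]
    return bits[:pos] + fresh + bits[pos + n_deleted:]
```

Under these assumptions a single mutation can, with small but non-zero probability, delete the whole program (position 0, long deleted run) and insert any other bit string, so every organism is reachable from every other in one step.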
  • 109. So I try a random mutation and I see what is the fitness of the resulting organism. I am only interested in programs that calculate a single positive integer then halt, and the bigger the integer the fitter the program. So if the mutated organism is more fit, it becomes my current organism. Otherwise I keep my current organism and continue trying mutations. By the way, to actually do this you would need to have an oracle for the halting problem, since you have to skip mutations that give you a program that never halts. There is a non-zero probability that a single mutation will take us from an organism to any other, which ensures we will not get stuck at a local maximum. For the details, you can see my article in the February 2009 EATCS Bulletin, the Bulletin of the European Association for Theoretical Computer Science. I don’t think that the details matter too much. There are a lot of parameters that you can vary in this model and probably still get things to work. For example, you can change the programming language or you can change the mutation model. As a matter of fact, the programming language I’ve used is one of the universal Turing machines in AIT1960 that I discussed in my Monday lecture at Perimeter. I picked it not because it is the right choice, but because it is a programming language I know well. Okay, we have a random walk in software space with increasing fitness. How well will this do? I’m not sure but I’ll tell you what I can prove. First of all, the mutation model is set up in such a way that in time of the order of $2^N$ a single mutation will try adding a self-contained N-bit prefix that calculates BB(N) and ignores the rest of the program; the time is the number of mutations that have been considered. The size of the organism in bits will grow at most as $2^N$, the fitness will grow as BB(N).
So the fitness will grow faster than any computable function, which shows that biological creativity is taking place; for if an organism is improved mechanically via an algorithm and without any creativity, then its fitness will only increase as a computable function. Theorem 1: With high probability, fitness[time of order $2^N$] ≥ BB(N), which grows faster than any computable function of N.
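Why time of order $2^N$ is enough can be checked with a little arithmetic. The sketch below assumes, as a simplification, that each try independently hits the required N-bit prefix with probability exactly $2^{-N}$ (the text only fixes this up to a constant factor); the function name `miss_probability` is illustrative.

```python
import math


def miss_probability(n_bits, multiple=1):
    """Chance that an event of probability 2**-n_bits never occurs
    in multiple * 2**n_bits independent tries."""
    p = 2.0 ** -n_bits
    tries = multiple * 2 ** n_bits
    return (1.0 - p) ** tries


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, miss_probability(n), math.exp(-1))
```

Under that assumption the miss probability after $2^N$ tries approaches $e^{-1}$ as N grows, and after $K \cdot 2^N$ tries it is below $2^{-K}$.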
  • 110. We can prove that evolution will occur but our proof is not very interesting since the time involved is large enough for evolution to randomly try all the possibilities. And, most important of all, in this proof evolution is not cumulative. In effect, we are starting from scratch each time. Now, I think that the behavior of this model will actually be cumulative but I can’t prove it. However, I can show that there is what might be called an ideal evolutionary pathway, which is a sequence of organisms having fitness that grows as BB(N) and size in bits that grows as N, and the mutation distance between successive organisms is bounded. That is encouraging. It shows that there are intermediate forms that are fitter and fitter. The only problem is that this pathway is unstable and not a likely random walk; it’s an ideal evolutionary pathway. This is my second theorem, and the organisms are reversed initial segments of the bits of the halting probability Ω (plus a fixed prefix). If you are given an initial portion of Ω, K bits in fact, then you can find which ≤ K bit programs halt and see which one produces the largest positive integer, which is by definition BB(K). Furthermore the mutation distance between reversed initial segments of Ω is not large because you are only adding one bit at a time. Theorem 2: There is a sequence of organisms $O_K$ with the property that: $O_K$ = the first K bits of Ω, fitness[$O_K$] = BB(K), mutation-distance[$O_K$, $O_{K+1}$] < c. (If we could shield the prefix from mutations, and we picked as successor to each organism the fittest organism within a certain fixed mutation distance neighborhood, then this ideal evolutionary pathway would be followed.) These are my two theorems.
The half-theorem is the fact that a sequence of software organisms with increasing fitness and bounded mutation distance does not depend on the choice of universal Turing machine, because adding a fixed prefix to each organism keeps the mutation distance bounded. Theorem 2.5: That a sequence of organisms $O_K$ has the property that mutation-distance[$O_K$, $O_{K+1}$] < c does not depend on the choice of universal Turing machine.
  • 111. Okay, so at this point there are only these 2½ theorems. Not very impressive. As I said, at this time metabiology is a field with a lovely name but not much content. However, I am hopeful. I feel that some kind of evolving software model should work. There are a lot of parameters to vary, a lot of knobs to tweak. The question is, how biological will the behavior of these models be? In particular I would like to know if hierarchical structure will emerge. The human genome is a very large piece of software: about a gigabyte of DNA. Large computer programs must be structured, they cannot be spaghetti code; otherwise they cannot be debugged and maintained. Software organisms, I suspect, can also benefit from such discipline. Then a useful mutation is likely to be small and localized, rather than involve many coordinated changes scattered throughout the organism, which is much less likely. Software engineering practice has a lot of experience with large software projects, which may also be relevant to randomly evolving software organisms, and perhaps indirectly to biology. Clearly, there is a lot of work to be done. As Dobzhansky said, nothing in biology makes sense except in the light of evolution, and I think that a randomly evolving software approach can give us some insight. Thank you very much!
  • 113. Chapter 9. To a mathematical theory of evolution and biological creativity. To be published in H. Zenil, Computation in Nature & The Nature of Computation, World Scientific, 2012. Abstract: We present an information-theoretic analysis of Darwin’s theory of evolution, modeled as a hill-climbing algorithm on a fitness landscape. Our space of possible organisms consists of computer programs, which are subjected to random mutations. We study the random walk of increasing fitness made by a single mutating organism. In two different models we are able to show that evolution will occur and to characterize the rate of evolutionary progress, i.e., the rate of biological creativity. Key words and phrases: metabiology, evolution of mutating software, random walks in software space, algorithmic information theory. 9.1 Introduction. For many years we have been disturbed by the fact that there is no fundamental mathematical theory inspired by Darwin’s theory of evolution [1, 2, 3, 4, 5, 6, 7, 8, 9]. This is the fourth paper in a series [10, 11, 12] attempting to create such a theory. In a previous paper [10] we did not yet have a workable mathematical
  • 114. framework: We were able to prove two not very impressive theorems, and then the way forward was blocked. Now we have what appears to be a good mathematical framework, and have been able to prove a number of theorems. Things are starting to work, things are starting to get interesting, and there are many technical questions, many open problems, to work on. So this is a working paper, a progress report, intended to promote interest in the field and get others to participate in the research. There is much to be done. In order to present the ideas as clearly as possible and not get bogged down in technical details, the material is presented more like a physics paper than a math paper. Estimates are at times rather sloppy. We are trying to get an idea of what is going on. The arguments concerning the basic math framework are however very precise; that part is done more or less like a math paper. 9.2 History of Metabiology. In the first paper in this series [10] we proposed modeling biological evolution by studying the evolution of randomly mutating software; we call this metabiology. In particular, we proposed considering a single mutating software organism following a random walk in software space of increasing fitness. Besides that the main contribution of [10] was to use the Busy Beaver problem to challenge organisms into evolving. The larger the positive integer that a program names, the fitter the program. And we measured the rate of evolutionary progress using the Busy Beaver function BB(N) = the largest integer that can be named by an N-bit program. Our two results employing the framework in [10] are that • with random mutations, random point mutations, we will get to fitness BB(N) in time exponential in N (evolution by exhaustive search) [10, 11], • whereas by choosing the mutations by hand and applying them in the right order, we will get to fitness BB(N) in time linear in N (evolution by intelligent design) [11, 12].
We were unable to show that cumulative evolution will occur at random;
  • 115. exhaustive search starts from scratch each time.¹ This paper advances beyond the previous work on metabiology [10, 11, 12, 13] by proposing a better concept of mutation. Instead of changing, deleting or inserting one or more adjacent bits in a binary program, we now have high-level mutations: we can use an arbitrary algorithm M to map the organism A into the mutated organism $A' = M(A)$. Furthermore, the probability of the mutation M is now furnished by algorithmic information theory: it depends on the size in bits of the self-delimiting program for M. It is very important that we now have a natural, universal probability distribution on the space of all possible mutations, and that this is such a rich space. Using this new notion of mutation, these much more powerful mutations, enables us to accomplish the following: • We are now able to show that random evolution will become cumulative and will reach fitness BB(N) in time that grows roughly as $N^2$, so that random evolution behaves much more like intelligent design than it does like exhaustive search.² • We also have a version of our model in which we can show that hierarchical structure will evolve, a conspicuous feature of biological organisms that previously [10] was beyond our reach. This is encouraging progress, and suggests that we may now have the correct version of these biology-inspired concepts. However there are many serious lacunae in the theory as it currently stands. It does not yet deserve to be called a mathematical theory of evolution and biological creativity; at best, it is a sketch of a possible direction in which such a theory might go. On the other hand, the new results are encouraging, and we feel it would be inappropriate to sit on these results until all the lacunae are filled. After all, that would take an entire book, since metabiology is, or will hopefully become, a rich and entirely new field.
Footnotes: ¹ The Busy Beaver function BB(N) grows faster than any computable function. That evolution is able to “compute” the uncomputable function BB(N) is evidence of creativity that cannot be achieved mechanically. This is possible only because our model of evolution/creativity utilizes an uncomputable Turing oracle. Our model utilizes the oracle in a highly constrained manner; otherwise it would be easy to calculate BB(N). ² Most unfortunately, it is not yet demonstrated that random evolution cannot be as fast as intelligent design.
That said, the reader will understand that this is a working paper, a progress report, to show the direction in which the theory is developing, and
  • 116. to indicate problems that need to be solved in order to advance, in order to take the next step. We hope that this paper will encourage others to participate in developing metabiology and exploring its potential. 9.3 Modeling Evolution. 9.3.1 Software Organisms. In this paper we follow a metabiological [10, 11, 12, 13] approach: Instead of studying the evolution of actual biological organisms we study the evolution of software subjected to random mutations. In order to do this we use tools from algorithmic information theory (AIT) [13, 14, 15, 16, 17, 18, 19]; to fully understand this paper expert understanding of AIT is unfortunately necessary (see the outline in the Appendix). As our programming formalism we employ one of the optimal self-delimiting binary universal Turing machines U of AIT [14], and also, but only in Section 9.7, a primitive FORTRAN-like language that is not universal. So our organisms consist on the one hand of arbitrary self-delimiting binary programs p for U, or on the other hand of certain FORTRAN-like computer programs. These are the respective software spaces in which we shall be working, and in which we will study hill-climbing random walks. 9.3.2 The Hill-Climbing Algorithm. In our models of evolution, we define a hill-climbing random walk as follows: We start with a single software organism A and subject it to random mutations until a fitter organism $A'$ is obtained, then subject that organism to random mutations until an even fitter organism $A''$ is obtained, etc. In one of our models, organisms calculate natural numbers, and the bigger the number, the fitter the organism. In the other, organisms calculate functions that map a natural number into another natural number, and the faster the function grows, the fitter the organism. In this connection, here is a useful piece of terminology: A mutation M succeeds if $A' = M(A)$ is fitter than A; otherwise M is said to fail.
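The hill-climbing walk of Section 9.3.2 can be sketched in a few lines. This is only an illustration under stated assumptions: the organisms are bit lists rather than programs for U, `toy_fitness` is a computable stand-in for the fitness comparison (it returns `None` to model a mutant that never halts, something the real models need an oracle to detect), and `toy_mutate` is an arbitrary simple mutation. All three names are hypothetical.

```python
import random


def hill_climb(organism, mutate, fitness, tries, rng):
    """Random walk of increasing fitness: keep a mutant only if it is fitter.

    Returns the final organism and the list of successive fitness values.
    """
    best = fitness(organism)
    history = [best]
    for _ in range(tries):
        candidate = mutate(organism, rng)
        score = fitness(candidate)
        # The mutation succeeds only if the mutant "halts" and is fitter.
        if score is not None and score > best:
            organism, best = candidate, score
            history.append(best)
    return organism, history


def toy_fitness(bits):
    """Toy fitness: the binary value of the bits; None models non-halting."""
    if not bits or bits[0] == 0:
        return None
    return int("".join(str(b) for b in bits), 2)


def toy_mutate(bits, rng):
    """Toy mutation: append a random bit, or flip one existing bit."""
    if rng.random() < 0.5:
        return bits + [rng.randint(0, 1)]
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]
```

For example, `hill_climb([1], toy_mutate, toy_fitness, 200, random.Random(0))` returns a walk whose recorded fitness values are strictly increasing, since failed mutations are simply discarded.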
  • 117. 9.3.3 Fitness. In order to get our software organisms to evolve it is important to present them with a challenge, to give them something difficult to do. Three well-known problems requiring unlimited amounts of mathematical creativity are: • Model A: Naming large natural numbers (non-negative integers) [20, 21, 22, 23], • Model B: Defining extremely fast-growing functions [24, 25, 26], • Model C: Naming large constructive Cantor ordinal numbers [26, 27]. So a software organism will be judged to be more fit if it calculates a larger integer (our Model A, Sections 9.4, 9.5, 9.6), or if it calculates a faster-growing function (our Model B, Section 9.7). Naming large Cantor ordinals (Model C) is left for future work, but is briefly discussed in Section 9.8. 9.3.4 What is a Mutation? Another central issue is the concept of a mutation. Biological systems are subjected to point mutations, localized changes in DNA, as well as to high-level mutations such as copying an entire gene and then introducing changes in it. Initially [10] we considered mutating programs by changing, deleting or adding one or more adjacent bits in a binary program, and postponed working with high-level source language mutations. Here we employ an extremely general notion of mutation: A mutation is an arbitrary algorithm that transforms, that maps the original organism into the mutated organism. It takes as input the organism, and produces as output the mutated organism. And if the mutation is an n-bit program, then it has probability $2^{-n}$. In order to have the total probability of mutations be ≤ 1 we use the self-delimiting programs of AIT [14].³ 9.3.5 Mutation Distance. A second crucial concept is mutation distance, how difficult it is to get from organism A to organism B. We measure this distance in bits and it is defined
We measure this distance in bits and it is defined 3 The total probability of mutations is actually < 1, so that each time we pick a mutation at random, there is a fixed probability that we will get the null mutation M(A) = A, which always fails.
  • 118. to be $-\log_2$ of the probability that a random mutation will change A to B. Using AIT [14, 15, 16], we see that this is nearly H(B|A), the size in bits of the smallest self-delimiting program that takes A as input and produces B as output.⁴ More precisely, $H(B|A) = -\log_2 P(B|A) + O(1) = -\log_2 \Big[ \sum_{U(p|A)=B} 2^{-|p|} \Big] + O(1)$. (9.1) Here |p| denotes the size in bits of the program p, and U(p|A) denotes the output produced by running p given input A on the computer U until p halts. The definition of H(B|A) that we employ here is somewhat different from the one that is used in AIT: a mutation is given A directly, it is not given a minimum-size program for A. Nevertheless, (9.1) holds [14]. Interpreting (9.1) in words, it is nearly the same to consider the simplest mutation from A to B, which is H(B|A) bits in size and has probability $2^{-H(B|A)}$, as to sum the probability over all the mutations that carry A into B. Note that this distance measure is not symmetric. For example, it is easy to change (X, Y) into Y, but not vice versa. 9.3.6 Hidden Use of Oracles. There are two hidden assumptions here. First of all, we need to use an oracle to compare the fitness of an organism A with that of a mutated organism $A'$. This is because a mutated program may not halt and thus never produces a natural number. Once we know that the original organism A and the mutated organism $A'$ both halt, then we can run them to see what they calculate and which is fitter. In the case of fast-growing computable functions, an oracle is definitely needed to see if one grows faster than another; this cannot be determined by running the primitive recursive functions [29] calculated by the FORTRAN-like programs that we will study later, in Section 9.7.
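To make the mutation distance of (9.1) concrete, here is a small worked example in Python. The table `TOY_MUTATIONS` is invented for illustration: it lists a handful of hypothetical self-delimiting programs p together with the organism U(p|A) each would produce from one fixed organism A. A real U has infinitely many programs, so this is only a finite caricature of the sum in (9.1).

```python
import math
from fractions import Fraction

# Hypothetical table for one fixed organism A:
# self-delimiting program p (a bit string) -> mutated organism U(p|A).
TOY_MUTATIONS = {"0": "B", "10": "B", "110": "C", "1110": "B"}


def is_prefix_free(programs):
    """True if no program is a proper prefix of another (self-delimiting set)."""
    return not any(p != q and q.startswith(p) for p in programs for q in programs)


def transition_probability(table, target):
    """P(target|A): the sum of 2**-|p| over all programs p that output target."""
    return sum((Fraction(1, 2 ** len(p)) for p, out in table.items() if out == target),
               Fraction(0))


def mutation_distance(table, target):
    """Mutation distance in bits: -log2 of the transition probability."""
    return -math.log2(float(transition_probability(table, target)))


def shortest_program(table, target):
    """Size in bits of the smallest listed program producing target."""
    return min(len(p) for p, out in table.items() if out == target)
```

In this table the programs for B have total probability 1/2 + 1/4 + 1/16 = 13/16, so the distance from A to B is about 0.3 bits, slightly less than the 1 bit of the shortest program; for C there is a single 3-bit program and the two measures agree. The total probability over the whole table is 15/16, which is less than 1 as the prefix-free property guarantees.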
⁴ Similarly, H(B) denotes the size in bits of the smallest self-delimiting program for B that is not given A. H(B) is called the complexity of B, and H(B|A) is the relative complexity of B given A.
Just as oracles would be needed to actually find fitter organisms, they are also necessary because a random mutation may never halt and produce a
  • 119. mutated organism. So to actually apply our random mutations to organisms we would need to use an oracle in order to avoid non-terminating mutations. 9.4 Model A (Naming Integers): Exhaustive Search. 9.4.1 The Busy Beaver Function. The first step in this metabiological approach is to measure the rate of evolution. To do that, we introduce this version of the Busy Beaver function: BB(N) = the biggest natural number named by a ≤ N-bit program. More formally, $BB(N) = \max_{H(k) \le N} k$. Here the program-size complexity or the algorithmic information content H(k) of k is the size in bits of the smallest self-delimiting program p without input for calculating k: $H(k) = \min_{U(p)=k} |p|$. Here again |p| denotes the size in bits of p, and U(p) denotes the output produced by running the program p on the computer U until p halts. 9.4.2 Proof of Theorem 1 (Exhaustive Search). Now, for the sake of definiteness, let’s start with the trivial program that directly outputs the positive integer 1, and apply mutations at random.⁵ Let’s define the mutation time to be n if we have tried n mutations, and the organism time to be n if there are n successive organisms of increasing fitness so far in our infinite random walk. From AIT [14] we know that there is an (N + O(1))-bit mutation that ignores its input and produces as output a ≤ N-bit program that calculates BB(N). This mutation M has probability $2^{-N+O(1)}$ and on the average, it will occur at random every $2^{N+O(1)}$ times a random mutation is tried. Therefore: ⁵ The choice of initial organism is actually unimportant.
  • 120. Theorem 1: The fitness of our organism will reach BB(N) by mutation time $2^N$. In other words, we will achieve N bits of biological/mathematical creativity by time $2^N$. Each successive bit of creativity takes twice as long as the previous bit did.⁶ More precisely, the probability that this should fail to happen, the probability that M has not been tried by time $2^N$, is $\left(1 - \frac{1}{2^N}\right)^{2^N} \to e^{-1} \approx \frac{1}{2.7} < \frac{1}{2}$. And the probability that it will fail to happen by mutation time $K 2^N$ is $< 1/2^K$. This is the worst that evolution can do. It is the fitness that organisms will achieve if we are employing exhaustive search on the space of all possible organisms. Actual biological evolution is not at all like that. The human genome has $3 \times 10^9$ bases, but in the mere $4 \times 10^9$ years of life on this planet only a tiny fraction of the total enormous number $4^{3 \times 10^9}$ of sequences of $3 \times 10^9$ bases can have been tried. In other words, evolution is not ergodic. 9.5 Model A (Naming Integers): Intelligent Design. 9.5.1 Another Busy Beaver Function. If we could choose our mutations intelligently, evolution would be much more rapid. Let’s use the halting probability Ω [19] to show just how rapid. First we define a slightly different Busy Beaver function $BB'$ based on Ω. Consider a fixed recursive/computable enumeration $\{p_i : i = 0, 1, 2, \ldots\}$ without repetitions of all the programs without input that halt when run on U. Thus $0 < \Omega = \Omega_U = \sum_i 2^{-|p_i|} < 1$ (9.2) and we get the following sequence $\Omega_0 = 0 < \Omega_1 < \Omega_2 \ldots$ of lower bounds on Ω: $\Omega_N = \sum_{i<N} 2^{-|p_i|}$. (9.3) ⁶ Instead of bits of creativity one could perhaps refer to bits of inspiration; said inspiration of course is ultimately coming through/from our oracle, which keeps us from getting stuck on non-terminating programs.
  • 121. In (9.2) and (9.3) |p| denotes the size in bits of p, as before. We define $BB'(K)$ to be the least N for which the first K bits of the base-two numerical value of $\Omega_N$ are correct, i.e., the same as the first K bits of the numerical value of Ω. $BB'(K)$ exists because we know from AIT [14] that Ω is irrational, so $\Omega = .010000\ldots$ is impossible and there is no danger that $\Omega_N$ will be of the form $.0011111\ldots$ with 1’s forever. Note that $BB$ and $BB'$ are approximately equal. For we can calculate $BB'(N)$ if we are given N and the first N bits of Ω. Therefore $BB'(N) \le BB(N + H(N) + c) = BB(N + O(\log N))$. Furthermore, if we knew N and any $M \ge BB'(N)$, we could calculate the string ω of the first N bits of Ω, which according to AIT [14] has complexity $H(\omega) > N - c'$, so $N - c' < H(\omega) \le H(N) + H(M) + c''$. Therefore $BB'(N)$ and all greater than or equal numbers M have complexity $H(M) > N - H(N) - c' - c''$, so $BB'(N)$ must be greater than the biggest number $M_0$ with complexity $H(M_0) \le N - H(N) - c' - c''$. Therefore $BB'(N) > BB(N - H(N) - c' - c'') = BB(N + O(\log N))$. 9.5.2 Improving Lower Bounds on Ω. Our model consists of arbitrary mutation computer programs operating on arbitrary organism computer programs. To analyze the behavior of this system (Model A), however, we shall focus on a select subset: Our organisms are lower bounds on Ω, and our mutations increase these lower bounds. We are going to use these same organisms and mutations to analyze both intelligent design (Section 9.5.3) and cumulative evolution at random (Section 9.6). Think of Section 9.5.3 versus Section 9.6 as counterpoint. Organism $P_\rho$: Lower Bound ρ on Ω. Now we use a bit string ρ to represent a dyadic rational number in $[0, 2) = \{0 \le x < 2\}$; ρ consists of the base-two units “digit” followed by the base-two expansion of the fractional part of this rational number.
There is a self-delimiting prefix $\pi_\Omega$ that, given a bit string $\rho$ that is a lower bound on $\Omega$, calculates the first $N$ such that $\Omega > \Omega_N \ge \rho$, where $\Omega_N$ is defined as in (9.3). [fn 7]

  • 122. If we concatenate the prefix $\pi_\Omega$ with the string of bits $\rho$, and insert $0^{|\rho|}1$ in front of $\rho$ in order to make everything self-delimiting, we obtain a program $P_\rho$ for this $N$. We will now analyze the behavior of Model A by using these organisms of the form
\[ P_\rho = \pi_\Omega \, 0^{|\rho|} 1 \rho. \qquad (9.4) \]
To repeat, the output of $P_\rho$, and therefore its fitness $\phi_{P_\rho}$, is determined as follows:
\[ U(P_\rho) = \text{the first } N \text{ for which } \sum_{i<N} 2^{-|p_i|} = \Omega_N \ge \rho. \qquad (9.5) \]
This fitness will be $\ge BB'(K)$ if $\rho < \Omega$ and the first $K$ bits of $\rho$ are the correct base-two numerical value of $\Omega$. $P_\rho$ will fail to halt if $\rho > \Omega$. [fn 8]

Mutation $M_k$: Lower Bound $\rho$ on $\Omega$ Increased by $2^{-k}$

Consider the mutations $M_k$ that do the following. First of all, $M_k$ computes the fitness $\phi$ of the current organism $A$ by running $A$ to determine the integer $\phi = \phi_A$ that $A$ names. All that $M_k$ takes from $A$ is its fitness $\phi_A$. Then $M_k$ computes the corresponding lower bound on $\Omega$:
\[ \rho = \sum_{i<\phi} 2^{-|p_i|} = \Omega_\phi. \]
Here $\{p_i\}$ is the standard enumeration of all the programs that halt when run on $U$ that we employed in Section 9.5.1. Then $M_k$ increments the lower bound $\rho$ on $\Omega$ by $2^{-k}$:
\[ \rho' = \rho + 2^{-k}. \]
In this way $M_k$ obtains the mutated program $A' = P_{\rho'}$. $A'$ will fail to halt if $\rho' > \Omega$. If $A'$ does halt, then $A' = M_k(A) = P_{\rho'}$ will have fitness $N$ (see (9.5)) greater than $\phi_A = \phi$ because $\rho' > \rho = \Omega_\phi$, so more halting programs are included in the sum (9.3) for $\Omega_N$, which therefore has been extended farther:
\[ [\Omega_N \ge \rho' > \rho = \Omega_\phi] \implies [N > \phi]. \]

[fn 7] That $\rho \ne \Omega$ follows from the fact that $\Omega$ is irrational.
[fn 8] That $\rho \ne \Omega$ follows from the fact that $\Omega$ is irrational.
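To make the definitions of $\Omega_N$, the fitness of $P_\rho$, $BB'$ and the mutation $M_k$ concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the list `SIZES` is a made-up finite list of program sizes standing in for the (uncomputable) enumeration $\{p_i\}$, so the toy "Ω" is a dyadic rational, unlike the real $\Omega$, and non-halting is modeled by returning `None`.

```python
import math
from fractions import Fraction

# Hypothetical sizes |p_i| of the halting programs, in enumeration order.
# Toy data only: the real enumeration is infinite and Omega is uncomputable.
SIZES = [3, 2, 5, 4, 3, 6, 5]

def omega_lower(n: int) -> Fraction:
    """Omega_n = sum over i < n of 2^-|p_i|, as in (9.3)."""
    return sum((Fraction(1, 2 ** s) for s in SIZES[:n]), Fraction(0))

OMEGA_TOY = omega_lower(len(SIZES))  # stand-in for Omega (41/64 here)

def fitness(rho: Fraction):
    """Toy U(P_rho) as in (9.5): first N with Omega_N >= rho.
    Returns None to model a program that never halts (rho > Omega)."""
    if rho > OMEGA_TOY:
        return None
    for n in range(len(SIZES) + 1):
        if omega_lower(n) >= rho:
            return n

def first_bits(x: Fraction, k: int) -> int:
    """Integer formed by the first k fractional bits of x."""
    return math.floor(x * 2 ** k)

def bb_prime(k: int) -> int:
    """Least N for which the first k bits of Omega_N agree with Omega."""
    for n in range(len(SIZES) + 1):
        if first_bits(omega_lower(n), k) == first_bits(OMEGA_TOY, k):
            return n

def mutate(phi: int, k: int):
    """Toy M_k: take only the fitness phi, add 2^-k to Omega_phi."""
    return fitness(omega_lower(phi) + Fraction(1, 2 ** k))

if __name__ == "__main__":
    print("toy Omega =", OMEGA_TOY)
    print("BB'(1), BB'(3) =", bb_prime(1), bb_prime(3))
    print("M_1 applied to fitness 0 ->", mutate(0, 1))
    print("M_1 applied to fitness 5 ->", mutate(5, 1))  # overshoots: None
```

In this toy run a successful mutation always returns a strictly larger fitness, and a $\rho$ whose first $K$ bits agree with the toy Ω has fitness at least `bb_prime(K)`, mirroring the two claims made above.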
  • 123. Therefore if $\Omega > \rho' = \rho + 2^{-k}$, then $M_k$ increases the fitness of $A$. If $\rho' > \Omega$, then $P_{\rho'} = M_k(A)$ never halts and is totally unfit.

9.5.3 Proof of Theorem 2 (Intelligent Design)

Please note that in this toy world, the "intelligent designer" is the author of this paper, who chooses the mutations optimally in order to get his creatures to evolve.

Let's now start with the computer program $P_\rho$ with $\rho = 0$. In other words, we start with a lower bound on $\Omega$ of zero. Then for $k = 1, 2, 3, \ldots$ we try applying $M_k$ to $P_\rho$. The mutated organism $P_{\rho'} = M_k(P_\rho)$ will either fail to halt, or it will have higher fitness than our previous organism and will replace it. Note that in general $\rho' \ne \rho + 2^{-k}$, although it could conceivably have that value. $M_k$ will from $P_\rho$ take only its fitness, which is the first $N$ such that $\Omega_N \ge \rho$:
\[ \rho' = \Omega_N + 2^{-k} \ge \rho + 2^{-k}. \]
So $\rho'$ is actually equal to a lower bound on $\Omega$, $\Omega_N$, plus $2^{-k}$. Thus $M_k$ will attempt to increase a lower bound on $\Omega$, $\Omega_N$, by $2^{-k}$. $M_k$ will succeed if $\Omega > \rho'$. $M_k$ will fail if $\rho' > \Omega$. This is the situation at the end of stage $k$. Then we increment $k$ and repeat. The lower bounds on $\Omega$ will get higher and higher.

More formally, let $O_0 = P_\rho$ with $\rho = 0$. And for $k \ge 1$ let
\[ O_k = \begin{cases} O_{k-1} & \text{if } M_k \text{ fails,} \\ M_k(O_{k-1}) & \text{if } M_k \text{ succeeds.} \end{cases} \]
Each $O_k$ is a program of the form $P_\rho$ with $\Omega > \rho$.

At the end of stage $k$ in this process the first $k$ bits of $\rho$ will be exactly the same as the first $k$ bits of $\Omega$, because at that point all together we have tried summing $1/2 + 1/4 + 1/8 + \cdots + 1/2^k$ to $\rho$. In essence, we are using an oracle to determine the value of $\Omega$ by successive interval halving. [fn 9] In other words, at the end of stage $k$ the first $k$ bits of $\rho$ in $O_k$ are correct. Hence:

[fn 9] That this works is easy to see visually. Think of the unit interval drawn vertically, with 0 below and 1 above. The intervals are being pushed up after being halved, but it is still the case that $\Omega$ remains inside each halved interval, even after it has been pushed up.
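The stage-by-stage construction of $O_k$ can be simulated in a toy setting. This is a sketch under loudly stated assumptions: `OMEGA` is an arbitrary non-dyadic rational standing in for the uncomputable $\Omega$, `omega_lower` is an invented sequence of lower bounds converging to it, and the oracle is modeled by returning `None` for a mutated organism that would never halt. The sketch checks the weaker, easily verified invariant that the lower bound is within $2^{-k}$ of the target after stage $k$.

```python
from fractions import Fraction

OMEGA = Fraction(2, 7)  # hypothetical stand-in for Omega (not dyadic)

def omega_lower(n: int) -> Fraction:
    """Invented lower bounds Omega_n increasing to OMEGA (assumption)."""
    return OMEGA * (1 - Fraction(1, 2 ** n))

def fitness_of(rho: Fraction):
    """Toy P_rho: first N with Omega_N >= rho; None if it never halts."""
    if rho >= OMEGA:
        return None
    n = 0
    while omega_lower(n) < rho:
        n += 1
    return n

def mutate(phi: int, k: int):
    """Toy M_k: uses only the fitness phi and tries Omega_phi + 2^-k."""
    return fitness_of(omega_lower(phi) + Fraction(1, 2 ** k))

def intelligent_design(stages: int):
    """Apply M_1, M_2, ... in order, keeping a mutant only if it halts."""
    phi = fitness_of(Fraction(0))
    history = []
    for k in range(1, stages + 1):
        new_phi = mutate(phi, k)
        if new_phi is not None:      # M_k succeeded
            phi = new_phi
        history.append((k, phi, omega_lower(phi)))
    return history

if __name__ == "__main__":
    for k, phi, lower in intelligent_design(8):
        print(f"stage {k}: fitness {phi}, gap to target {float(OMEGA - lower):.6f}")
```

Each stage costs one oracle query and halves the guaranteed gap, which is the interval-halving picture described in footnote 9.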
  • 124. Theorem 2 By picking our mutations intelligently rather than at random, we obtain a sequence $O_N$ of software organisms with non-decreasing fitness [fn 10] for which the fitness of each organism is $\ge BB'(N)$.

In other words, we will achieve $N$ bits of biological/mathematical creativity in mutation time linear in $N$. Each successive bit of creativity takes about as long as the previous bit did.

However, successive mutations must be tried at random in our evolution model; they cannot be chosen deliberately. We see in these two theorems two extremes: Theorem 1, brainless exhaustive search, and Theorem 2, intelligent design. What can real, random evolution actually achieve? We shall see that the answer is closer to Theorem 2 than to Theorem 1. We will achieve fitness $BB'(N)$ in time roughly of order $N^2$. In other words, each successive bit of creativity takes an amount of time which increases linearly in the number of bits.

Open Problem 1 Is this the best that can be done by picking the mutations intelligently rather than at random? Or can creativity be even faster than linear? Does each use of the oracle yield only one bit of creativity? [fn 11]

Open Problem 2 In Theorem 2 how fast does the size in bits of the organism $O_N$ grow? By using entirely different mutations intelligently, would it be possible to have the size in bits of the organism $O_N$ grow linearly, or, alternatively, for the mutation distance between $O_N$ and $O_{N+1}$ to be bounded, and still achieve the same rapid growth in fitness?

Open Problem 3 In Theorem 2 how many different organisms will there be by mutation time $N$? I.e., on the average how fast does organism time grow as a function of mutation time?

9.6 Model A (Naming Integers) Cumulative Evolution at Random

Now we shall achieve what Theorem 2 achieved by intelligent design, by using randomness instead.
[fn 10] Note that this is actually a legitimate fitness increasing (non-random) walk because the fitness increases each time that $O_N$ changes, i.e., each time that $O_{N+1} \ne O_N$.
[fn 11] Yes, only one bit of creativity, otherwise $\Omega$ would be compressible. In fact, the sequence of oracle replies must be incompressible.

  • 125. Since the order of our mutations will be random, not intelligent, there will be some duplication of effort and creativity is delayed, but not overmuch. In other words, instead of using the mutations $M_k$ in a predetermined order, they shall be picked at random, and also mixed together with other mutations that increase the fitness.

As you will recall (Section 9.5.2), a larger and larger positive integer is equivalent to a better and better lower bound on $\Omega$. That will be our clock, our memory. We will again be evolving better and better lower bounds $\rho$ on $\Omega$ and we shall make use of the organisms $P_\rho$ as before ((9.4), Section 9.5.2). We will also use again the mutations $M_k$ of Section 9.5.2.

Let's now study the behavior of the random walk in Model A if we start with an arbitrary program $A$ that has a fitness, for example, the program that is the constant 0, and apply mutations to it at random, according to the probability measure on mutations determined by AIT [14], namely that $M$ has probability $2^{-H(M)}$. [fn 12] So with probability one, every mutation will be tried infinitely often; $M$ will be tried roughly every $2^{H(M)}$ mutation times.

At any given point in this random walk, we can measure our progress to $\Omega$ by the fitness $\phi = \phi_A$ of our current organism $A$ and the corresponding lower bound $\Omega_\phi = \Omega_{\phi_A}$ on $\Omega$. Since the fitness $\phi$ can only increase, the lower bound $\Omega_\phi$ can only get better.

In our analysis of what will happen we focus on the mutations $M_k$; other mutations will have no effect on the analysis. They are harmless and can be mixed in together with the $M_k$. By increasing the fitness, they can only make $\Omega_\phi$ converge to $\Omega$ more quickly.

We also need a new mutation $M^*$. $M^*$ doesn't get us much closer to $\Omega$, it just makes sure that our random walk will contain infinitely many of the programs $P_\rho$. $M^*$ will be tried roughly periodically during our random walk. $M^*$ takes the current lower bound $\Omega_\phi = \Omega_{\phi_A}$ on $\Omega$, and produces
\[ A' = M^*(A) = P_{\Omega_{1+\phi_A}}. \]
$A'$ has fitness 1 greater than the fitness of $A$ and thus mutation $M^*$ will always succeed, and this keeps lots of organisms of the form $P_\rho$ in our random walk.

Let's now return to the mutations $M_k$, each of which will also have to be tried infinitely often in the course of our random walk.

[fn 12] This is a convenient lower bound on the probability of a mutation. A more precise value for the probability of jumping from $A$ to $A'$ is $2^{-H(A'|A)}$.
  • 126. The mutation $M_k$ will either have no effect because $M_k(A)$ fails to halt, which means that we are less than $2^{-k}$ away from $\Omega$, that is, $\Omega_{\phi_A}$ is less than $2^{-k}$ away from $\Omega$, or $M_k$ will have the effect of incrementing our lower bound $\Omega_{\phi_A}$ on $\Omega$ by $2^{-k}$.

As more and more of these mutations $M_k$ are tried at random, eventually, purely by chance, more and more of the beginning of $\Omega_{\phi_A}$ will become correct (the same as the initial bits of $\Omega$). Meanwhile, the fitness $\phi_A$ will increase enormously, passing $BB'(n)$ as soon as the first $n$ bits of $\Omega_{\phi_A}$ are correct. And soon afterwards, $M^*$ will package this in an organism $A' = P_{\Omega_{1+\phi_A}}$.

How long will it take for all this to happen? I.e., how long will it take to try the $M_k$ for $k = 1, 2, 3, \ldots, n$ and then try $M^*$? We have $H(M_k) \le H(k) + c$. Therefore mutation $M_k$ has probability
\[ \ge 2^{-H(k)-c} > \frac{1}{c' \, k (\log k)^{1+\epsilon}} \qquad (9.6) \]
since $\sum_k \frac{1}{k (\log k)^{1+\epsilon}}$ converges. [fn 13] The mutation $M_k$ will be tried in time proportional to 1 over the probability of its being tried, which by (9.6) is approximately upper bounded by
\[ \xi(k) = c' \, k (\log k)^{1+\epsilon}. \qquad (9.7) \]

On the average, from what point on will the first $n$ bits of $\Omega_\phi = \Omega_{\phi_A}$ be the same as the first $n$ bits of $\Omega$? We can be sure this will happen if we first try $M_1$, then afterwards $M_2$, then $M_3$, etc. through $M_n$, in that order. Note that if these mutations are tried in the wrong order, they will not have the desired effect. But they will do no harm either, and eventually will also be tried in the correct order.

Note that it is conceivable that none of these $M_k$ actually succeed, because of the other random mutations that were in the mix, in the melee. These other mutations may already have pushed us within $2^{-k}$ of $\Omega$. So these $M_k$ don't have to succeed, they just have to be tried. Then $M^*$ will make sure that we get an organism of the form $P_\rho$ with at least $n$ bits of $\rho$ correct.

[fn 13] We are using here one of the basic theorems of AIT [14].
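The waiting-time bound $\xi(k)$ of (9.7) can be tabulated, and summing it over $k \le n$ gives a feel for how the total time to try $M_1, \ldots, M_n$ in order scales. The sketch below makes several assumptions that are not in the text: the constants `c`, `eps` and `c_star` are arbitrary placeholders, and $\log_2(k+1)$ is used in place of $\log k$ so that the term for $k = 1$ is not zero.

```python
import math

def xi(k: int, c: float = 1.0, eps: float = 0.1) -> float:
    """Waiting-time bound modeled on (9.7): c * k * (log k)^(1 + eps).
    log2(k + 1) is an assumption made to avoid log 1 = 0."""
    return c * k * math.log2(k + 1) ** (1 + eps)

def expected_time_bound(n: int, c: float = 1.0, eps: float = 0.1,
                        c_star: float = 1.0) -> float:
    """Sum of xi(k) for k <= n, plus a constant wait for M* (placeholder)."""
    return sum(xi(k, c, eps) for k in range(1, n + 1)) + c_star

if __name__ == "__main__":
    for n in (10, 100, 1000):
        total = expected_time_bound(n)
        envelope = n ** 2 * math.log2(n + 1) ** 1.1
        print(n, round(total), "ratio to n^2 (log n)^(1+eps):",
              round(total / envelope, 3))
```

Since each of the $n$ terms is at most $\xi(n)$, the sum is bounded by $n \, \xi(n)$, which is of order $n^2 (\log n)^{1+\epsilon}$; the printed ratio stays below 1 for these placeholder constants.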
  • 127. Hence:

Expected time to try $M_1$ $\le \xi(1)$
Expected time to then afterwards try $M_2$ $\le \xi(2)$
Expected time to then afterwards try $M_3$ $\le \xi(3)$
. . .
Expected time to then afterwards try $M_n$ $\le \xi(n)$
Expected time to then afterwards try $M^*$ $\le c''$
∴ Expected time to try $M_1, M_2, M_3, \ldots, M_n, M^*$ in order $\le \sum_{k \le n} \xi(k) + c''$

Using (9.7), we see that this is our extremely rough "ball-park" estimate on a mutation time sufficiently big for the first $n$ bits of $\rho$ in $P_\rho = M^*(A)$ to be the correct bits of $\Omega$:
\[ \sum_{k \le n} \xi(k) + c'' = \sum_{k \le n} c' \, k (\log k)^{1+\epsilon} + c'' = O\!\left(n^2 (\log n)^{1+\epsilon}\right). \qquad (9.8) \]
Hence we expect that in time $O(n^2 (\log n)^{1+\epsilon})$ our random walk will include an organism $P_\rho$ in which the first $n$ bits of $\rho$ are correct, and so $P_\rho$ will compute a positive integer $\ge BB'(n)$, and thus at this time the fitness will have to be at least that big:

Theorem 3 In Model A with random mutations, the fitness of the organisms $P_\rho = M^*(A)$ will reach $BB'(N)$ by mutation time roughly $N^2$.

Note that since the bits of $\rho$ in the organisms $P_\rho = M^*(A)$ are becoming better and better lower bounds on $\Omega$, these organisms in effect contain their evolutionary history. In Model A, evolution is cumulative, it does not start over from scratch as in exhaustive search.

It should be emphasized that in the course of such a hill-climbing random walk, with probability one every possible mutation will be tried infinitely often. However the mutations $M_k$ will immediately recover from perturbations and set the evolution back on course. In a sense the system is self-organizing and self-repairing. Similarly, the initial organism is irrelevant.

Also note that with probability one the time history or evolutionary pathway (i.e., the random walk in Model A) will quickly grow better and better approximations to all possible halting probabilities $\Omega_{U'}$ (see (9.2)) determined by any optimal universal self-delimiting binary computer $U'$, not just for our
  • 128. original $U$. Furthermore, some mutations will periodically convert our organism into a numerical constant for its fitness $\phi$, and there will even be arbitrarily long chains of successive numerical constant organisms $\phi, \phi + 1, \phi + 2, \ldots$

The microstructure and fluctuations that will occur with probability one are quite varied and should perhaps be studied in detail to unravel the full zoo of organisms and their interconnections; this is in effect a kind of miniature mathematical ecology.

Open Problem 4 Study this mathematical ecology.

Open Problem 5 Improve the estimate (9.8) and get a better upper bound on the expected time it will take to try $M_1, M_2, M_3$ through $M_n$ and $M^*$ in that order. Besides the mean, what is the variance?

Open Problem 6 Separate random evolution and intelligent design: We have shown that random evolution is fast, but can you prove that it cannot be as fast as intelligent design? I.e., we have a lower bound on the speed of random evolution, and now we also need an upper bound. This is probably easier to do if we only consider random mutations $M_k$ and keep other mutations from mixing in.

Open Problem 7 In Theorem 3 how fast does the size in bits of the organism $P_\rho$ grow? Is it possible to have the size in bits of the organism $P_\rho$ grow linearly and still achieve the same rapid growth in fitness?

Open Problem 8 It is interesting to think of Model A as a conventional random walk and to study the average mutation distance between an organism $A$ and its successor $A'$, its second successor $A''$, etc. In organism time $\Delta t$ how far will we get from $A$ on the average? What will the variance be?

9.7 Model B (Naming Functions)

Let's now consider Model B. Why study Model B? Because hierarchical structure is a conspicuous feature of actual biological organisms, but it is impossible to prove that such structure must emerge by random evolution in Model A. Why not?
Because the programming language used by the organisms in Model A is so powerful that all structure in the programs can be hidden.
  • 129. Consider the programs $P_\rho$ defined in Section 9.5.2 and used to prove Theorems 2 and 3. As we saw in Theorem 3, these programs $P_\rho$ evolve without limit at random. However, $P_\rho$ consists of a fixed prefix $\pi_\Omega$ followed by a lower bound on $\Omega$, $\rho$, and what evolves is the lower bound $\rho$, data which has no visible hierarchical structure, not the prefix $\pi_\Omega$, code which has fixed, unevolving, hierarchical structure.

So in Model A it is impossible to prove that hierarchical structure will emerge and increase in depth. To be able to do this we must utilize a less powerful programming language, one that is not universal and in which the hierarchical structure cannot be hidden: the Meyer-Ritchie LOOP language [28].

We will show that the nesting depth of LOOP programs will increase without limit, due to random mutations. This also provides a much more concrete example of evolution than is furnished by our main model, Model A.

Now for the details. We study the evolution of functions $f(x)$ of a single integer argument $x$; faster growing functions are taken to be fitter. More precisely, if $f(x)$ and $g(x)$ are two such functions, $f$ is fitter than $g$ iff $g/f \to 0$ as $x \to \infty$. We use an oracle to decide if $A' = M(A)$ is fitter than $A$; if not, $A$ is not replaced by $A'$. [fn 14]

The programming language we are using has the advantage that program structure cannot be hidden. It's a programming language that is powerful enough to program any primitive recursive function [29], but it's not a universal programming language.

To give a concrete example of hierarchical evolution, we use the extremely simple Meyer-Ritchie LOOP programming language, containing only assignment, addition by 1, do loops, and no conditional statements or subroutines. All variables are natural numbers, non-negative integers.
[fn 14] An oracle is needed in order to decide whether $g(x)/f(x) \to 0$ as $x \to \infty$ and also to avoid mutations $M$ that never produce an $A' = M(A)$. Furthermore, if a mutation produces a syntactically invalid LOOP program $A'$, $A'$ does not replace $A$.

Here is an example of a program written in this language:
  • 130.
    // Exponential: 2 to the Nth power
    // with only two nested do loops!
    function(N)            // Parameter must be called N.
       M = 1
       do N times
          M2 = 0
          // M2 = 2 * M
          do M times
             M2 = M2 + 1
             M2 = M2 + 1
          end do
          M = M2
       end do
       // Return M = 2 to the Nth power.
       return_value = M    // Last line of function must
                           // always set return_value.
    end function

More generally, let's start with $f_0(x) = 2x$:

    function(N)            // f_0(N)
       M = 0
       // M = 2 * N
       do N times
          M = M + 1
          M = M + 1
       end do
       return_value = M
    end function           // end f_0(N)

Note that the nesting depth of $f_0$ is 1. And given a program for the function $f_k$, here is how we program
\[ f_{k+1}(x) = f_k^x(2) \qquad (9.9) \]
by increasing the nesting depth of the program for $f_k$ by 1:
  • 131.
    function(N)            // f_(k+1)(N)
       M = 2
       // do M = f_k(M) N times
       do N times
          N_ = M
          // Insert program for f_k here
          // with "function" and "end function"
          // stripped and all variable names
          // renamed to variable name_
          M = return_value_
       end do
       return_value = M
    end function           // end f_(k+1)(N)

So following (9.9) we now have programs for
\[ f_0(x) = 2x, \quad f_1(x) = 2^x, \quad f_2(x) = 2^{2^{2^{\cdot^{\cdot^{\cdot}}}}} \text{ with } x \text{ 2's}, \ \ldots \]
Note that a program in this language which has nesting depth 0 (no do loops) can only calculate a function of the form ($x$ + a constant), and that the depth 1 function $f_0(x) = 2x$ grows faster than all of these depth 0 functions.

More generally, it can be proven by induction [29] that a program in this language with do loop nesting depth $\le k$ defines functions that grow more slowly than $f_k$, which is defined by a depth $k+1$ LOOP program. This is the basic theorem of Meyer and Ritchie [28] classifying the primitive recursive functions according to their rates of growth.

Now consider the mutation $M$ that examines a software organism $A$ written in this LOOP language to determine its nesting depth $n$, and then replaces $A$ by $A' = f_n(x)$, a function that grows faster than any LOOP function with depth $\le n$. Mutation $M$ will be tried at random with probability $\ge 2^{-H(M)}$. And so:

Theorem 4 In Model B, the nesting depth of a LOOP function will increase by 1 roughly periodically, with an estimated mutation time of $2^{H(M)}$ between successive increments. Once mutation $M$ increases the nesting depth, it will remain greater than or equal to that increased depth, because no LOOP function with smaller nesting depth can grow as fast.

Note that this theorem works because the nesting depth of a primitive recursive function is used as a clock; it gives Model B memory that can be used by intelligent mutations like $M$.
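The hierarchy defined by (9.9) can be tried out directly. The following Python transliteration is a sketch, not the LOOP programs themselves: the function name `f` and the use of recursion are conveniences that LOOP does not have, although each `for` loop iterates a count fixed before the loop starts, as a LOOP `do N times` does.

```python
def f(k: int, x: int) -> int:
    """Evaluate f_k(x), where f_0(x) = 2x and f_{k+1}(x) = f_k applied
    x times starting from 2, mirroring the LOOP templates above."""
    if k == 0:
        m = 0
        for _ in range(x):       # do N times
            m += 1               #    M = M + 1
            m += 1               #    M = M + 1
        return m
    m = 2
    for _ in range(x):           # do N times
        m = f(k - 1, m)          #    M = f_(k-1)(M)
    return m

if __name__ == "__main__":
    print([f(0, x) for x in range(5)])   # linear growth
    print([f(1, x) for x in range(5)])   # exponential growth
    print([f(2, x) for x in range(3)])   # tower-like growth
```

Evaluated literally with the starting value $M = 2$, the template gives $f_1(x) = 2^{x+1}$, which has the same exponential growth rate as the $2^x$ quoted above up to a constant factor. Values of $f_2$ beyond very small arguments are already far too large to compute this way, which is the point of the hierarchy.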
  • 132. Open Problem 9 In the proof of Theorem 4, is the mutation $M$ primitive recursive, and if so, what is its LOOP nesting depth?

Open Problem 10 $M$ can actually increase the nesting depth extremely fast. Study this.

Open Problem 11 Formulate a version of Theorem 4 in terms of subroutine nesting instead of do loop nesting. What is a good computer programming language to use for this?

9.8 Remarks on Model C (Naming Ordinals)

Now let's briefly turn to programs that compute constructive Cantor ordinal numbers $\alpha$ [27]. From a biological point of view, the evolution of ordinals is piquant, because they certainly exhibit a great deal of hierarchical structure. Not, in effect, as we showed in Section 9.7 must occur in the genotype; here it is automatically present in the phenotype.

Ordinals also seem like an excellent choice for an evolutionary model because of their fundamental role in mathematics [fn 15] and because of the mystique associated with naming large ordinals, a problem which can utilize an unlimited amount of mathematical creativity [26, 27].

Conventional ordinal notations can only handle an initial segment of the constructive ordinals. However there are two fundamentally different ways [27] to use algorithms to name all such ordinals $\alpha$:

• An ordinal is a program that given two positive integers, tells us which is less than the other in a well-ordering of the positive integers with order type $\alpha$.
• An ordinal $\alpha$ is a program for obtaining that ordinal from below: If it is a successor ordinal, as $\beta + 1$; if it is a limit ordinal, as the limit of a fundamental sequence $\beta_k$ ($k = 0, 1, 2, \ldots$).

This yields two different definitions of the algorithmic information content or program-size complexity of a constructive ordinal:

[fn 15] As an illustration of this, ordinals may be used to extend the function hierarchy $f_k$ of Section 9.7 to transfinite $k$. For example, $f_\omega(x) = f_x(x)$, $f_{\omega+1}(x) = f_\omega^x(2)$, $f_{\omega+2}(x) = f_{\omega+1}^x(2)$, $\ldots$, $f_{\omega \times 2}(x) = f_{\omega+x}(x)$, etc., an extension of (9.9).
  • 133. $H(\alpha)$ = the size in bits of the smallest self-delimiting program for calculating $\alpha$.

We can now define this beautiful new version of the Busy Beaver function:
\[ BB_{\mathrm{ord}}(N) = \max_{H(\alpha) \le N} \alpha. \]
In order to make programs for ordinals $\alpha$ evolve, we now need to use a very sophisticated oracle, one that can determine if a program computes an ordinal and, given two such programs, can also determine if one of these ordinals is less than the other. Assuming such an oracle, we get the following version of Theorem 1, merely by using brainless exhaustive search:

Theorem 5 The fitness of our ordinal organism $\alpha$ will reach $BB_{\mathrm{ord}}(N)$ by mutation time $2^N$.

Can we do better than this? The problem is to determine if there is some kind of $\Omega$ number or other way to compress information about constructive ordinals so that we can improve on Theorem 5 by proving that evolution will probably reach $BB_{\mathrm{ord}}(N)$ in an amount of time which does not grow exponentially.

We suspect that Model C may be an example of a case in which cumulative evolution at random does not occur. On the other hand, we are given an extremely powerful oracle; maybe it is possible to take advantage of that. The problem is open.

Open Problem 12 Improve on Theorem 5 or show that no improvement is possible.

9.9 Conclusion

At this point we should look back and ask why this all worked. Mainly for the following reason: We used an extremely rich space of possible mutations, one that possesses a natural probability distribution: the space of all possible self-delimiting programs studied by AIT [14]. But the use of such powerful mutational mechanisms raises a number of issues.

Presumably DNA is a universal programming language, but how sophisticated can mutations be in actual biological organisms? In this connection,
  • 134. note that evo-devo views DNA as software for constructing the embryo, and that the change from single-celled to multicellular organisms is roughly like taking a main program and making it into a subroutine, which is a fairly high-level mutation. Could this be the reason that it took so long (on the order of $10^9$ years) for this to happen? [fn 16]

The issue of balance between the power of the organisms and the power of the mutations is an important one. In the current version of the theory, both have equal power, but as a matter of aesthetics it would be bad form for a proof to overemphasize the mutations at the expense of the organisms. In future versions of the theory perhaps it will be desirable to limit the power of mutations in some manner by fiat.

In this connection, note that there are two uses of oracles in this theory, one to decide which of two organisms is fitter, and another to eliminate non-terminating mutations. It is perfectly fine for a proof to be based on taking advantage of the oracle for organisms, but taking advantage of the oracle for mutations is questionable.

We have by no means presented in this paper a mathematical theory of evolution and biological creativity comme il faut. But at this point in time we believe that metabiology is still a possible contender for such a theory. The ultimate goal must be to find in the Platonic world of mathematical ideas that ideal model of evolution by natural selection which real, messy biological evolution can but approach asymptotically in the limit from below.

We thank Prof. Cristian Calude of the University of Auckland for reading a draft of this paper, for his helpful comments, and for providing the paper by Meyer and Ritchie [28].

[fn 16] During most of the history of the earth, life was unicellular.

Appendix. AIT in a Nutshell

Programming languages are commonly universal, that is to say, capable of expressing essentially any algorithm.

In order to be able to combine subroutines, i.e., for algorithmic information to be subadditive,

    size of program to calculate x and y ≤ size of program to calculate x + size of program to calculate y,
  • 135. it is important that programs be self-delimiting. This means that the universal computer $U$ reads a program bit by bit as required and there is no special delimiter to mark the end of the program; the computer must decide by itself where to stop reading.

More precisely, if programs are self-delimiting we have
\[ H(x, y) \le H(x) + H(y) + c, \]
where $H(\ldots)$ denotes the size in bits of the smallest program for $U$ to calculate $\ldots$, and $c$ is the number of bits in the main program that reads and executes the subroutine for $x$ followed by the subroutine for $y$.

Besides giving us subadditivity, the fact that programs are self-delimiting also enables us to talk about the probability $P(x)$ that a program that is generated at random will compute $x$ when run on $U$.

Let's now consider how expressive different programming languages can be. Given a particular programming language $U$, two important things to consider are the program-size complexity $H(x)$ as a function of $x$, and the corresponding algorithmic probability $P(x)$ that a program whose bits are chosen using independent tosses of a fair coin will compute $x$.

We are thus led to select a subset of the universal languages that minimize $H$ and maximize $P$; one way to define such a language is to consider a universal computer $U$ that runs self-delimiting binary computer programs $\pi_C \, p$ defined as follows:
\[ U(\pi_C \, p) = C(p). \]
In other words, the result of running on $U$ the program consisting of the prefix $\pi_C$ followed by the program $p$, is the same as the result of running $p$ on the computer $C$. The prefix $\pi_C$ tells $U$ which computer $C$ to simulate.

Any two such maximally expressive universal languages $U$ and $V$ will necessarily have
\[ |H_U(x) - H_V(x)| \le c \]
and
\[ P_U(x) \ge P_V(x) \times 2^{-c}, \qquad P_V(x) \ge P_U(x) \times 2^{-c}. \]
It is in this precise sense that such a universal $U$ minimizes $H$ and maximizes $P$.

For such languages $U$ it will be the case that
\[ H(x) = -\log_2 P(x) + O(1), \]
  • 136. which means that most of the probability of calculating $x$ is concentrated on the minimum-size program for doing this, which is therefore essentially unique. $O(1)$ means that the difference between the two sides of the equation is order of unity, i.e., bounded by a constant.

Furthermore, we have
\[ H(x, y) = H(x) + H(y|x) + O(1). \]
Here $H(y|x)$ is the size of the smallest program to calculate $y$ from $x$. [fn 17] This tells us that essentially the best way to calculate $x$ and $y$ is to calculate $x$ and then calculate $y$ from $x$. In other words, the joint complexity of $x$ and $y$ is essentially the same as the absolute complexity of $x$ added to the relative complexity of $y$ given $x$.

This decomposition of the joint complexity as a sum of absolute and relative complexities implies that the mutual information content
\[ H(x : y) \equiv H(x) + H(y) - H(x, y), \]
which is the extent to which it is easier to compute $x$ and $y$ together rather than separately, has the property that
\[ H(x : y) = H(x) - H(x|y) + O(1) = H(y) - H(y|x) + O(1). \]
In other words, $H(x : y)$ is also the extent to which knowing $y$ helps us to know $x$ and vice versa.

Last but not least, using such a maximally expressive $U$ we can define the halting probability $\Omega$, for example as follows:
\[ \Omega = \sum 2^{-|p|} \]
summed over all programs $p$ that halt when run on $U$, or alternatively
\[ \Omega = \sum 2^{-H(n)} \]
summed over all positive integers $n$, which has a slightly different numerical value but essentially the same paradoxical properties.

What are these properties? $\Omega$ is a form of concentrated mathematical creativity, or, alternatively, a particularly economical Turing oracle for the

[fn 17] It is crucial that we are not given $x$ directly. Instead we are given a minimum-size program for $x$.
  • 137. halting problem, because knowing $n$ bits of the dyadic expansion of $\Omega$ enables one to solve the halting problem for all programs $p$ which compute a positive integer that are up to $n$ bits in size. It follows that the bits of the dyadic expansion of $\Omega$ are irreducible mathematical information; they cannot be compressed into a theory smaller than they are. [fn 18]

From a philosophical point of view, however, the most striking thing about $\Omega$ is that it provides a perfect simulation in pure mathematics, where all truths are necessary truths, of contingent, accidental truths, i.e., of truths such as historical facts or biological frozen accidents.

Furthermore, $\Omega$ opens a door for us from mathematics to biology. The halting probability $\Omega$ contains infinite irreducible complexity and in a sense shows that pure mathematics is even more biological than biology itself, which merely contains extremely large finite complexity. For each bit of the dyadic expansion of $\Omega$ is one bit of independent, irreducible mathematical information, while the human genome is merely $3 \times 10^9$ bases $= 6 \times 10^9$ bits of information.

[fn 18] More precisely, it takes a formal axiomatic theory of complexity $\ge n - c$ (one requiring a $\ge n - c$ bit program to enumerate all its theorems) to enable us to determine $n$ bits of $\Omega$.
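The appendix's requirement that programs be self-delimiting can be illustrated with the unary-length header $0^{|\rho|}1$ used in (9.4). The sketch below is an illustration only: `encode`, `decode` and `kraft_sum` are hypothetical helper names, bit strings are represented as Python strings of "0" and "1", and this particular encoding is just one simple self-delimiting scheme, not the one a real universal computer $U$ would use.

```python
from fractions import Fraction

def encode(rho: str) -> str:
    """Self-delimiting form of rho: |rho| zeros, a one, then rho itself."""
    return "0" * len(rho) + "1" + rho

def decode(stream: str):
    """Read one encoded string from the front of stream, bit by bit,
    and return (payload, remaining bits). No end marker is needed."""
    n = stream.index("1")                 # count the leading zeros
    payload = stream[n + 1 : 2 * n + 1]
    if len(payload) < n:
        raise ValueError("stream ended before the payload was complete")
    return payload, stream[2 * n + 1 :]

def kraft_sum(max_len: int) -> Fraction:
    """Sum of 2^-|encode(s)| over all bit strings s of length <= max_len."""
    return sum(
        (2 ** n * Fraction(1, 2 ** (2 * n + 1)) for n in range(max_len + 1)),
        Fraction(0),
    )

if __name__ == "__main__":
    packed = encode("1011") + encode("00")   # two "subroutines" back to back
    first, rest = decode(packed)
    second, rest = decode(rest)
    print(first, second, repr(rest))
    print("Kraft sum up to length 10:", kraft_sum(10))
```

Because the decoder knows where each piece ends, two encoded strings can be concatenated and recovered, which is the mechanism behind subadditivity. The Kraft sum staying below 1 is what lets sums such as $\sum 2^{-|p|}$ be read as probabilities.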
  • 139. Bibliography

[1] D. Berlinski, The Devil's Delusion, Crown Forum, 2008.
[2] S. J. Gould, Wonderful Life, Norton, 1990.
[3] N. Shubin, Your Inner Fish, Pantheon, 2008.
[4] M. Mitchell, Complexity, Oxford University Press, 2009.
[5] J. Fodor, M. Piattelli-Palmarini, What Darwin Got Wrong, Farrar, Straus and Giroux, 2010.
[6] S. C. Meyer, Signature in the Cell, HarperOne, 2009.
[7] J. Maynard Smith, Shaping Life, Yale University Press, 1999.
[8] J. Maynard Smith, E. Szathmáry, The Origins of Life, Oxford University Press, 1999; The Major Transitions in Evolution, Oxford University Press, 1997.
[9] F. Hoyle, Mathematics of Evolution, Acorn, 1999.
[10] G. J. Chaitin, "Evolution of mutating software," EATCS Bulletin 97 (February 2009), pp. 157–164.
[11] G. J. Chaitin, "Metaphysics, metamathematics and metabiology," in H. Zenil, Randomness Through Computation, World Scientific, in press. (Draft at
[12] G. J. Chaitin, Mathematics, Complexity and Philosophy, Midas, in press. (Draft at (See Chapter 3, "Algorithmic Information as a Fundamental Concept in Physics, Mathematics and Biology.")
  • 140.
[13] G. J. Chaitin, Chapter "Complexity, Randomness" in Chaitin, Costa, Doria, After Gödel, in preparation. (Draft at http://www.umcs.maine.edu/~chaitin/bookgoedel_2.pdf.)
[14] G. J. Chaitin, "A theory of program size formally identical to information theory," J. ACM 22 (1975), pp. 329–340.
[15] G. J. Chaitin, Algorithmic Information Theory, Cambridge University Press, 1987.
[16] G. J. Chaitin, Exploring Randomness, Springer, 2001.
[17] C. S. Calude, Information and Randomness, Springer-Verlag, 2002.
[18] M. Li, P. M. B. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, 2008.
[19] C. Calude, G. Chaitin, "What is a halting probability?," AMS Notices 57 (2010), pp. 236–237.
[20] H. Steinhaus, Mathematical Snapshots, Oxford University Press, 1969, pp. 29–30.
[21] D. E. Knuth, "Mathematics and computer science: Coping with finiteness," Science 194 (1976), pp. 1235–1242.
[22] A. Hodges, One to Nine, Norton, 2008, pp. 246–249; M. Davis, The Universal Computer, Norton, 2000, pp. 169, 235.
[23] G. J. Chaitin, "Computing the Busy Beaver function," in T. M. Cover, B. Gopinath, Open Problems in Communication and Computation, Springer, 1987, pp. 108–112.
[24] G. H. Hardy, Orders of Infinity, Cambridge University Press, 1910. (See Theorem of Paul du Bois-Reymond, p. 8.)
[25] D. Hilbert, "On the infinite," in J. van Heijenoort, From Frege to Gödel, Harvard University Press, 1967, pp. 367–392.
[26] J. Stillwell, Roads to Infinity, A. K. Peters, 2010.
  • 141. To a mathematical theory of evolution and biological creativity 141 [27] H. Rogers, Jr., Theory of Recursive Functions and Effective Computabil- ity, MIT Press, 1987. (See Chapter 11, especially Sections 11.7, 11.8 and the exercises for these two sections.) [28] A. R. Meyer, D. M. Ritchie, “The complexity of loop programs,” Pro- ceedings ACM National Meeting, 1967, pp. 465–469. [29] C. Calude, Theories of Computational Complexity, North-Holland, 1988. (See Chapters 1, 5.)
  • 143. Chapter 10 Parsing the Turing test

Journal of Scientific Exploration 23 (2009), pp. 530–534.

Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer, edited by Robert Epstein, Gary Roberts and Grace Beber. Springer, 2009. xxiii + 517 pp. $199.00 (hardcover). ISBN 9781402067082.

This big, expensive book offers much food for thought. This review will be a reaction to the first editor’s introduction, plus the clever reverse Turing test in Chapter 28 by Charles Platt with machines attempting to determine if humans have any intelligence.

Basically, based on my sample of these two chapters, this book is a celebration of the coming extinction of the human race. I shall play the devil’s advocate, and also take a meta perspective on the book, analyzing its significance as a social phenomenon instead of considering its contents.

Turing’s famous paper on the imitation game (reprinted and annotated in this book), a remote conversation with a computer attempting to prove it is human, in addition to its intellectual fireworks, reflects the fact that Turing, as the French say, “felt uncomfortable in his skin,” both as a male and as a human being. As this book indicates, this has now become part of the zeitgeist and a general social problem.

The general attitude I see here reminds me of remarks by Marvin Minsky I heard many years ago, when he called human beings “meat machines,” and described the human race as a carbon-based life-form that was creating a silicon-based life-form that would replace it. At the time, his remarks seemed a bit mad, but now many people seem to feel that way.
  • 144. Why is this? Well, our current society attempts to make people into machines; it behaves as if human beings were ants or bees. We are being forced to live in an anthill, beehive society. Obviously machines are better at being machines than we are, and humans feel ill-suited for anthill or beehive life. Human beings are made to feel obsolete, has-beens.

Robert Epstein’s introduction argues that a super-human intelligence is inevitable and not far off in time, and that at best we shall be slaves or pets for the machine, at worst exterminated as annoying insects.

The authors are well aware of the amazing advances in computer technology that they believe make this possible, but perhaps they are less aware of the fact that the more we understand about organisms, the more molecular biology progresses, the more amazing living beings seem.

The cells in the human body were originally autonomous living beings that have now banded together much like the citizens in a nation or the employees in a corporation. An individual cell is amazingly sophisticated, and, it seems to me, is best compared with a computer or even with an entire city.

So our artificial machines may not catch up with Nature’s machines for a while. Can a century of human engineering compare with billions of years of evolution, essentially an immense parallel-processing molecular-level computation going on throughout the entire biosphere?

In a more optimistic scenario we are not exterminated; the machines will be our servants. Isaac Asimov thought that in the future human beings might live like ancient Greek aristocrats with robotic slaves.

Yes, machines can calculate better than we can, and remember things better than we can. Should we be very upset? Railroad trains go faster than a person can run, a steam-shovel can move earth quicker than a person, an airplane can fly. But human beings made those machines, and should be proud of it.
Are we upset about the fact that we need to wear clothing in the winter? Not at all. People are not very fast, not very strong, they do not have fur or a tough hide, but they are extremely curious, clever, and imaginative, flexible and adaptable. Like the universal Turing machine, we are generalists, not specialists. We are not optimized for any particular little ecological niche.

It is also possible that eventually enhanced humans and humanized machines will become nearly indistinguishable, which doesn’t sound too bad to me. It’s much like wearing clothing or using a can-opener.

But maybe none of this will happen. Another possibility is that machine intelligences will remain unconscious zombies, monstrous golems lacking a
  • 145. divine spark, a human soul. For we are products of George Bernard Shaw’s life-force, of Henri Bergson’s “élan vital”, and machines are not.

This is of course not a fashionable view in our secular times, but let me try to give a contemporary version of this argument, one designed for modern sensibilities.

First of all, quantum mechanics, a branch of fundamental physics, has been telling us that the Schrödinger Psi function is real, more real than the particles it describes. Electrons in atoms are expressed as probability waves that interfere constructively and destructively. Atoms are like musical instruments. Whatever the Psi function is, it is not material. It is more like an idea, and therefore gives support to those Platonic idealist philosophies that view spirit as more fundamental, more real, than matter.

Of course, this is not a fashionable interpretation. Nonetheless Nature is giving us this hint loud and clear, even if we refuse to listen.

The latest version of quantum mechanics, now called quantum information theory, reformulates “classical” 1920s quantum mechanics in terms of qubits of information; information is certainly not matter. In my opinion quantum information theory is even less materialist than classical quantum mechanics.

Consciousness, quite mysterious at this time, is also more about information than about matter, I think. Could consciousness reflect some currently unknown level of physical reality? Could our current science be radically incomplete? Indeed, it may well be so. There may be many scientific mysteries yet to solve.

It is true that during the three-century-plus history of modern science, each period thinks it has a nearly final answer, only to discover 25 or 50 years later some totally unexpected phenomenon that provokes a complete paradigm shift.

Let me invoke a temporal rather than a spatial “Copernican principle.” Why should our epoch be especially favored?
Why should we have the final answers? A simple linear extrapolation of the history of science suggests that a century from now things will look remarkably different. What did we know of quantum mechanics a century ago? Is it possible that, to use Wolfgang Pauli’s trenchant phrase, our current scientific world-view “is not even wrong”?

For our grandchildren’s and great-grandchildren’s sake I hope so. How boring if it should happen that there will be no fundamental changes in our scientific world view in the future. Why should Nature’s imagination be as limited as ours?
  • 146. So if our current scientific world view is not at all final, perhaps living beings do have something special that machines cannot attain, something that science will some day understand as well as we currently understand quantum mechanics, a scientific version, perhaps, of the soul or what the spiritual would refer to as a divine spark.

How otherwise to understand cases of amazing human creativity? Pick your own favorite examples. I pick the composer Johann Sebastian Bach, and the mathematicians Leonhard Euler, Srinivasa Ramanujan and Georg Cantor. Can machines have that kind of creativity, that kind of inspiration? These men seem to have had a direct link to the source of new ideas.

Believers in Darwinian evolution by natural selection will argue that no vital spark, no élan vital, nothing at all divine is needed, just random mutations.

I myself am a believer in Darwinian evolution. I am currently trying to develop a theory I optimistically have dubbed “metabiology.” The purpose of metabiology is to prove mathematically that Darwinian evolution works. But I am open to the possibility that this may not be achievable. It would also be delightful to be able to prove that evolution by natural selection doesn’t, cannot work. I would be happy either way, as long as I can prove it. Most likely my metabiological ideas will lead nowhere, but I feel my honor as a mathematician demands that I should give it a try.

And why have human beings become so defeatist? Is it more fun to work in a factory that produces robots than to conceive and raise one’s own children?

Or look at cars. I have been in remote corners of Argentina, where people seem almost completely divorced from the modern world economy and do everything themselves. They manage splendidly without cars, with horses and donkeys. These are self-reproducing cars, vegetarian cars, not ones that need petroleum.

No wonder that the contributors to this book have given up on human beings.
People are ill-used in our modern society, and sensitive scientific intellectuals feel it. Scientists are now micro-managed. The refereeing and grant systems, with everything decided by committees, favor safe, conservative, incremental science. Can radical new ideas have a chance with our current “factory” science? I doubt it. Would Galileo, Newton, Maxwell, Darwin and Einstein be able to work in the current system? Would Euler, Ramanujan and Cantor? I think not.

As I said, human beings are not ants, they are not bees, they were not designed to be slaves. Let’s look at particularly creative periods in human history, for example ancient Greece and the Italian Renaissance.
  • 147. How come the ancient Greeks were so creative? I asked a Greek intellectual that once, in Mykonos, and he told me that the ancient Greeks discussed this, and noted that ancient Egypt was largely stable and un-innovative for millennia, the contrary of the ancient Greeks. Greek city-states were small and separated by mountains or isolated on islands, and so imaginative individuals could be creative and affect things, while Egyptian geography permitted strong central, unified control of an empire, so that creativity was suppressed and talented individuals could have little or no effect.

Similarly, the creativity of the Italian Renaissance probably had something to do with the fact that, even now, there is no Italian nation-state. Italians are first of all Tuscans or Sicilians, they are individualists, not Italians!

In both cases, ancient Greece and Renaissance Italy, chaos and anarchy encouraged creativity, and kept it from being suppressed by the authorities.

What can we learn from this? That strong central control is bad for us. Immediate corollaries: The European Community was not a good idea. And the United States would be better off as fifty separate states. At least that’s the case if you want to maximize creativity. I’ve already said what I think of the current refereeing and grant systems.

Let me wrap up my argument. People are not machines. It is time for people to stop trying to be like machines, because we have machines for that now. We should stop worshipping the machine, and instead unleash our creative, curious, passionate, inspired, intuitive, irrational, individualistic humanity.
  • 149. Chapter 11 Should mathematics be done differently because of Gödel’s incompleteness theorem?

Speech on the occasion of being granted an honorary doctorate by the University of Cordoba, founded in 1613. Lecture given Monday, 23 November 2009, in Cordoba, Argentina.¹

¹ This speech was delivered in Spanish and translated into English by the author.

Good afternoon. First of all, I want to thank the university authorities who are present, to thank the University of Cordoba, and to thank the Faculty of Philosophy and Humanities, for this honor which I find really moving. I consider myself an Argentinean-American, and I cannot imagine anything nicer than receiving an honorary doctorate from the oldest university in Argentina and one of the oldest in the Americas. I’m really very moved. It’s a great pleasure for me and my wife to be here in Cordoba, especially for such a nice reason, and for us to become acquainted with this city and its intellectual and scientific traditions. So thank you very much.

Furthermore, in spite of what has just been said here by Professor Victor Rodriguez about my achievements, I don’t think that I have accomplished
  • 150. very much. What I see constantly before me are the challenging questions that I have not been able to answer, the big holes in what we can understand. Very basic questions, such as whether it is possible to prove mathematically that Darwin’s theory of evolution works or that it doesn’t work; either way it would be very interesting. Or the subject that I want to talk about today, which I will now introduce for you.

I used to work as a computer programmer; I wrote computer software and did theory as a hobby. So I’m an amateur mathematician and a professional programmer. That’s how I used to earn a living.

People normally think that mathematics is a dry, serious subject where nothing dramatic ever happens. But in the past century math went through a revolution as serious as the one that took place in physics because of the theory of relativity and quantum theory. This fact is not well-known outside the math community, but it is becoming better known now.

In particular, I’m referring to a controversy over how mathematics should be done. There is a struggle for the soul of mathematics. I exaggerate a bit, but not too much. There is a struggle for the soul of mathematics between two different groups, two tendencies, two opposing viewpoints.

On one side there is the famous French mathematician Poincaré who spoke of the importance of intuition in mathematics. On the other side we have the German mathematician Hilbert who emphasized formalism and the role of the axiomatic method. The conflict is between intuition and formalism.

In other words, is mathematics creative or is it mechanical? Stating it that way, I indicate my own biases. You can see which side I am on: the romantic side. But the debate is still very much alive and I want to give you a concise history of this conflict.
About a century ago Hilbert proposed formalizing all of mathematics, dropping the use of natural language and making math into a formal axiomatic theory using an artificial language and mathematical logic.

The key point is that Hilbert thought that math gives absolute certainty and that this implies that you can formalize mathematics completely in such a way that there is an algorithm, a mechanical procedure, for checking whether or not a proof is correct.

In other words, Hilbert believed that if math is objective not subjective, if it really is absolutely certain, this is equivalent to saying that there are rules of the game for carrying out proofs (if no steps are left out and we use a completely formal language) which provide us with a completely mechanical way to check if a proof is correct, that is, whether it obeys the
  • 151. rules. According to Hilbert, this is what it means to say that math gives absolute certainty, which is what most mathematicians believe, because math is a way of fleeing from the real world to a toy world where truth is black or white and proofs are absolutely convincing.

This is what Hilbert proposed about a century ago. And most people thought that it could actually be done, that one could formalize everything. Hilbert represented the orthodox, conservative position within the math community. People thought that it ought to be possible. In fact, some very pretty work was done trying to achieve what Hilbert had proposed, trying to fulfill his dream of formalizing mathematics completely and obtaining absolute certainty and total objectivity.²

But in 1931 and in 1936 there were two big surprises. In 1931 Kurt Gödel showed that Hilbert’s project could never work, and in 1936 Alan Turing showed this completely differently and found a deeper reason why Hilbert’s dream was unattainable.

These two pieces of work are greatly admired, but in my opinion the math community has a very ambiguous position about these two achievements. Gödel and Turing are heroes, but nobody wants to face the disturbing implications of their work.

What Gödel showed in 1931 is that Hilbert’s dream is impossible because any formalization of mathematics (any formal axiomatic system of the kind that Hilbert sought for all of math, to give absolute certainty, to show that the truth is black or white) will necessarily have to be incomplete because some true results will be missing. In other words, no finite formal axiomatic theory can give us all mathematical truths, some of them will always escape us. In fact, an infinity of true math results will be missing from any formal axiomatic theory proposed to achieve Hilbert’s dream.
Formal axiomatic theories are always incomplete; they do not enable us to demonstrate all possible mathematical truths.

Gödel shows how to construct assertions which are true but cannot be demonstrated within a given formal axiomatic system. The way he does it is very surprising. He constructs a mathematical assertion (in fact, an arithmetical assertion) which states that it itself cannot be demonstrated. “I’m unprovable!”

² In particular, I’m thinking of Zermelo-Fraenkel set theory and of the von Neumann integers.
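Stated a little more formally (a standard textbook rendering, not notation used in the speech): write $T$ for the formal axiomatic system and $\mathrm{Prov}_T$ for an arithmetical formula expressing “is provable in $T$”. Both symbols, and the name $G_T$, are assumptions introduced here for illustration. The assertion “I’m unprovable!” is then a sentence $G_T$ for which $T$ itself proves the equivalence below.

```latex
% Goedel sentence for a formal axiomatic system T.
% Needs the amssymb package for \ulcorner and \urcorner.
% \ulcorner G_T \urcorner denotes the numerical code (Goedel number) of G_T.
\[
  T \vdash \; G_T \;\leftrightarrow\; \neg\,\mathrm{Prov}_T\bigl(\ulcorner G_T \urcorner\bigr)
\]
```

Read from left to right: $G_T$ holds exactly when no proof of $G_T$ exists in $T$.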
  • 152. If you can construct an assertion that states that it’s unprovable, there are two possibilities: that it’s provable and that it isn’t. If it’s provable and it asserts that it isn’t, we’re demonstrating something that’s false, which is terrible. So by hypothesis we eliminate this possibility. If a formal axiomatic system enables us to prove things that are false, it doesn’t interest us, it’s a complete waste of time. Therefore “I’m unprovable” cannot be proved, which means that it is true.

So you either demonstrate things that are false, or there are indemonstrable truths, truths that escape us. This is the alternative that Gödel confronts us with. Assuming that the formal axiomatic system doesn’t enable you to prove things that are false, there must be true mathematical assertions that cannot be proved.

Gödel’s incompleteness theorem was a big surprise at the time, and while not provoking panic, it did lead to some rather emotional reactions, for example, from Hermann Weyl. Weyl said that his faith in pure mathematics was badly affected, and that at first it was difficult for him to continue with his research. And Weyl was a very fine mathematician.

Now my story splits in two. On the one hand, there is more research on incompleteness, on Gödel’s remarkable discovery. On the other hand, the math community begins to lose interest in these philosophical questions and continues with its everyday work.

First I’ll tell you about Turing. In 1936 Turing goes beyond Gödel and finds a much deeper reason for incompleteness. But I should emphasize that pioneering work is always the most difficult. Before Gödel nobody was courageous enough to imagine that Hilbert might be wrong.

Turing found a deeper reason for incompleteness. Turing discovered that there are many things in mathematics that can be defined but for which there is no mechanical procedure, no algorithm, for calculating them: they are not computable functions.
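A standard illustration of something that can be defined but not computed is the question of whether a given program ever halts. The sketch below is a hedged Python rendering of the usual diagonal argument, not code from the speech; `make_paradox`, `refute` and the tester argument `halts` are hypothetical names introduced here. Given any claimed halting tester, it builds a program that does the opposite of whatever the tester predicts about it.

```python
def make_paradox(halts):
    """Build a function that contradicts the claimed tester `halts`.

    `halts` is a hypothetical callable: it takes a zero-argument function
    and returns True if it claims that function eventually halts.
    """
    def paradox():
        if halts(paradox):
            while True:       # tester said "halts", so loop forever
                pass
        return "halted"       # tester said "loops", so halt at once
    return paradox


def refute(halts):
    """Show, without ever running an infinite loop, that `halts` is wrong
    about the paradox function built from it (assumes `halts` is deterministic)."""
    paradox = make_paradox(halts)
    if halts(paradox):
        # By construction paradox() would now enter the endless loop,
        # so we reason about it instead of calling it.
        return "tester said 'halts', but paradox() would loop forever"
    paradox()                 # returns immediately in this branch
    return "tester said 'loops', but paradox() halted"


if __name__ == "__main__":
    print(refute(lambda f: False))
    print(refute(lambda f: True))
```

Whatever tester is supplied, `refute` exhibits a program on which it errs, which is the heart of why no algorithm can answer the halting question in general.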
Math is full of things that can be defined but cannot be calculated. And uncomputability is a new source of incompleteness. If we consider a mathematical question such as Turing’s famous halting problem for which there is no general method for calculating the answer, we get the immediate corollary that there cannot be a formal axiomatic theory that always enables us to prove what the answer is. Why not? One of the most basic properties of a formal axiomatic theory is that in principle there is a mechanical procedure for systematically traversing the
  • 153. tree of all possible proofs and eliminating the ones that are incorrect. It would be very slow, but in principle it would enable us to find all the theorems. So if we have a theory that enables us to demonstrate in individual cases whether or not a program eventually halts, this would give us a mechanical procedure, an algorithm, that always gives the correct answer (just run through all the proofs until one of them settles the question), which Turing showed in 1936 is impossible.

So Turing deduces Gödel incompleteness from a more fundamental idea, uncomputability, which is the fact that math is full of things that can be defined but cannot be calculated.

Now World War II begins and the generation that was interested in these philosophical questions disappears from the scene. The math community goes forward forgetting the crisis that was provoked by Gödel’s theorem which had been such a big surprise.

My problem is that I didn’t go forward. I remained obsessed with Gödel’s theorem. I thought it had to be very important. I bet my professional career on the idea that it was a mistake to ignore Gödel’s result.

What the math community did, since they are mathematicians and not philosophers, is to continue with their daily work, with the problems that interested them. The consensus was that yes, in theory there are limits to what can be demonstrated using any particular formal axiomatic theory, but not in practice, not with the kinds of questions that interest us, not in our own particular field.

This was more or less the community’s reaction. In other words, while there may be mathematical facts that are true but unprovable, these are highly artificial pathological cases. The consensus was that in practice this does not occur. At least that is what mathematicians preferred to think in order to be able to carry on with their work.

People have an amazing ability to avoid thinking about unpleasant subjects such as death. If we think about death all the time it is impossible to function.
And if mathematicians think all the time about incompleteness they can’t function either, since there will always be doubt about whether the matter at hand can be settled by means of a proof. Why am I wasting years of my life trying to prove something if there may not even be a proof?

Let’s consider an alternative course of action. Instead of ignoring Gödel’s theorem, what if we take it very seriously? I don’t believe in going to extremes, but if one took Gödel’s result very, very seriously, how might one proceed?

Consider the Riemann hypothesis. This is an important mathematical conjecture that has a lot of significant consequences. But unfortunately in a hundred and fifty years of effort nobody has succeeded in proving the
  • 154. Riemann hypothesis. Mathematicians don’t know what to do; the way forward is blocked. But physicists would just consider the Riemann hypothesis to be a mathematical fact that has been corroborated empirically.

In other words, I think that a possible reaction to Gödel’s result is to make math a little bit more like theoretical physics. In physics axioms don’t have to be self-evident. Maxwell’s equations and the Schrödinger equation are not self-evident but they help us to organize, to unify a large body of experimental data.

One could do mathematics in a similar fashion, taking Gödel as justification for behaving as if math were an empirical science in which one doesn’t try to demonstrate everything from self-evident principles, but instead one only seeks to organize mathematical experience as physicists organize their physics lab experience.

One could proceed pragmatically and adopt unproven hypotheses as new basic principles because they are extremely fruitful and have many useful consequences even though they aren’t at all self-evident. This is what I think we should do if we take Gödel’s theorem seriously.

In my opinion mathematics is different from physics, but maybe not as different as most people think. My work on metamathematics using complexity and information-theoretic ideas suggests to me that perhaps we should emphasize the similarities between the world of mathematics and the world of physics instead of emphasizing the differences.

In this connection, there is a highly pertinent remark by the Russian mathematician Vladimir Arnold. In his opinion the only difference between mathematics and physics is that in mathematics the experiments are cheaper, since one can carry them out on a computer instead of having to have a laboratory full of expensive equipment! So math experiments are easier than physics experiments.

How do I try to justify this new “quasi-empirical” view of mathematics?
Well, like most mathematicians, I do in fact believe in the Platonic world of math ideas in which the truth is totally black or white. But I also believe that we are denied direct access to this Platonic world and that down here at our level it may be helpful to work a bit more quasi-empirically.

It may look like my mixed, hybrid, Platonic-empiricist position is inconsistent, but I don’t think that this is actually the case. Indeed, it is sometimes very fruitful to take ideas that seem to be inconsistent and show that in fact they aren’t.

Okay, so where do I find arguments in favor of this quasi-empirical view of mathematics? The key question is whether the incompleteness phenomenon
  • 155. that was discovered by Gödel and further explored by Turing is exceptional or widespread. How pervasive is incompleteness? That’s the basic question, and it is quite controversial.

My contribution to this discussion is that I’ve found tools for measuring the complexity or the information content of a formal axiomatic mathematical theory. And by using the concept of complexity in algorithmic information theory one can see that incompleteness is natural, not surprising. In fact, it’s inevitable, it’s unavoidable.

Using algorithmic information theory, one can see that the world of mathematical truths, the Platonic world of mathematical ideas, is infinitely complex. But any formal axiomatic system made by human beings necessarily has only finite complexity. Indeed, rather low complexity, since the axioms and rules of inference normally fit on a couple of pages.

So seen from this perspective, incompleteness is natural, inevitable. The world of mathematical ideas is infinitely complex, but our theories only have low, finite complexity; otherwise they wouldn’t fit in a mathematician’s brain nor would they be regarded as self-evident. But I’m against the idea that in mathematics axioms have to be self-evident, because in physics self-evidence of axioms is not required.

I’ve used complexity and information theory to argue that since the amount of information in pure mathematics is infinite, incompleteness is only to be expected, since a formal axiomatic theory can capture at most a finite amount of this mathematical information, an infinitesimal portion in fact.

This more or less summarizes an entire lifetime of research. But you will not be surprised to learn that the mathematics community has not accepted my quasi-empirical proposal. The immune system of an intellectual community is very strong, and my ideas are rejected as foreign, as alien to the math community.
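One way to make the finite-versus-infinite contrast above concrete is the following hedged sketch of an information-theoretic incompleteness statement. The notation is assumed here for illustration: $H(s)$ is the size in bits of the smallest program that outputs the string $s$ on some fixed universal computer, $H(T)$ is the size of the smallest program that lists all the theorems of the formal axiomatic theory $T$, and $c$ is a constant that does not depend on $s$ or $n$. If $T$ proves only true statements, then:

```latex
% Sketch: a theory of complexity H(T) cannot certify complexity much above H(T).
\[
  T \vdash \text{``}H(s) > n\text{''}
  \quad\Longrightarrow\quad
  n < H(T) + c
\]
```

Since fewer than $2^{n+1}$ programs have at most $n$ bits, for every $n \ge H(T) + c$ there are infinitely many strings $s$ with $H(s) > n$, yet $T$ can prove this for none of them.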
Logicians don’t care much for computability, for complexity, for information and for randomness. Randomness is a nightmare for a logician, because randomness is irrational. Random events happen for no reason, they are incomprehensible from a logical point of view.

However, the physics community has some interest in my work. They like the idea of using a physics-inspired approach in pure mathematics. They like the idea that math isn’t that different from physics. They like the idea that a mathematical proof may be more convincing than the heuristic arguments that are accepted in physics, but that this is only a matter of degree, not an
  • 156. absolute black or white difference. They have always felt that mathematicians believe too much in absolute truth, and do not appreciate theoretical physics enough.

But the coin is two-sided, and the conflict between intuition and formalism has become much more acute because of the computer. Computer technology is a powerful argument against creativity and in favor of mechanization and formalization.

Just take a look at the December 2008 issue of the Notices of the American Mathematical Society which you can find for free on the web. This is a special issue devoted to formal proof.

While I, a poor theoretician, have been trying to convince mathematicians to pay attention to Gödel’s theorem and work slightly differently, these people (I didn’t realize what was happening until they did it) have nearly succeeded in carrying out Hilbert’s dream. They’ve constructed tools for formalizing almost all of mathematics. They’ve done a superb piece of software engineering.

This community, which is a group of fine mathematicians and software engineers, believes that in the future all mathematical proofs should be formal proofs. In their opinion, there will soon be no reason for accepting informal proofs. We can start demanding formal proofs and re-writing all of mathematics in a formal language so that it can be checked by verification software.

There are now interactive proof checkers for verifying mathematical proofs. This is how these work: If I’m a mathematician and I have an informal proof that I want to formalize, I give it to the proof checker. It will say, “Well, there’s a particular step in this proof that I don’t understand yet. Can you please explain this better?” And you keep filling in the proof, providing more details, until the software says, “Now I understand everything. It’s all fine. I have a complete formal proof.”

You didn’t have to write all the steps in the formal proof yourself; that would be a big job.
You write part of it, and the software provides the rest. The final result of this joint effort is a complete formal proof that has been checked and verified by reliable software, software that you trust because it was carefully developed using the best available software engineering methodology.

And this verification technology has advanced to the point where you don’t just verify toy proofs, you can verify complicated proofs of really important theorems, for example the four color theorem, which states that four colors suffice for coloring maps without having neighboring countries with
  • 157. the same color. Not only was this rather complicated proof formalized, but the mathematician who did it did not complain and even stated that going through this process enabled him to substantially improve the proof. So this formal proof business is getting really serious.

Hilbert never thought that mathematicians should be required to use detailed formal proofs in their daily work. But this community does. Furthermore, they envision an official repository for formal proofs that have been put through this verification process. Proofs will have to be accepted by this repository to be used by the mathematics community; everything that has been formalized and checked will be there, in one place.

So amazingly enough, the lines of research opened up by Hilbert’s formalization proposal and by Gödel’s work on the limitations of formal systems are both progressing dramatically. I think there is a wonderful intellectual tension between the work advancing formalization and the work criticizing it. Both of these lines of research are going forward splendidly in parallel!

In mathematics this circumstance is striking because one thinks that the truth is black or white. But in philosophy this situation doesn’t seem so strange because philosophers understand that ideas that seem contradictory are often in fact complementary.

I won’t try to predict the final outcome of this conflict; probably there will be no final outcome. In philosophy there are no final answers. Each generation does its best to resolve the fundamental questions to its own satisfaction, and then the next generation goes off in a different direction.

So I won’t try to predict the future. I don’t know if mathematicians will eventually think that incompleteness implies that they should do math differently, or if formalization will win.

Perhaps we don’t have to choose between quasi-empiricism and formalization.
Both of these approaches can contribute something to mathematics and to mathematical practice. My late friend the mathematician Gian-Carlo Rota, whose provocative ideas I greatly enjoy, has bequeathed us a collection of his essays entitled Indiscrete Thoughts. He thinks that formal axiomatization is a cemetery. When a theory is completely finished, then you can formalize it. But when you are creating a new theory, you have to work with vague intuitions, with imprecise ideas, and formalization is deadly. Premature formalization stifles creativity; once a theory is formalized it becomes stiff and rigid and no new ideas can get in.
  • 158. So I think that quasi-empiricism and formalism can both contribute something of value. Furthermore, both are advancing step by step. In 1974 I proposed accepting new math axioms the way that this is done in physics,³ and nobody took me seriously, but in the past thirty-five years this has actually happened. It has happened in set theory, where there’s a new axiom called “projective determinacy.” It has happened in theoretical computer science, where you use the hypothesis that P is not equal to NP, which everyone believes but nobody can prove. And it has happened in mathematical cryptography, which is based on the assumption that you can’t factorize big numbers quickly. In these fields mathematicians are behaving as if they were physicists. They’ve found new principles that enable them to organize the experiences of each of these communities. These are principles that are not self-evident, that have not been demonstrated, but that are accepted by consensus as new fundamental principles, at least until they are disproven or counter-examples are encountered. Each of these mathematical communities is behaving as if they were theoretical physicists; they are doing what I call quasi-empirical mathematics. So I’ve been delighted to witness these developments, but not so delighted to see the striking advance of formalism in recent years. These questions are still open, and they are very difficult ones. I’ve tried to argue in favor of a quasi-empirical stance, in favor of creativity and against formalism, but I myself am not completely convinced by my own arguments. More work is needed. We still do not know to what extent math is mechanical or creative. Thank you very much! ³ “Information-theoretic limitations of formal systems,” J. ACM 21, 1974, pp. 403–424.
  • 159. Bibliography
    1. David Berlinski, The Devil’s Delusion: Atheism and its Scientific Pretensions
    2. Stephen Jay Gould, Wonderful Life: The Burgess Shale and the Nature of History
    3. Neil Shubin, Your Inner Fish: A Journey into the 3.5-Billion-Year History of the Human Body
    4. Melanie Mitchell, Complexity: A Guided Tour
    5. Jerry Fodor and Massimo Piattelli-Palmarini, What Darwin Got Wrong
    6. Stephen C. Meyer, Signature in the Cell: DNA and the Evidence for Intelligent Design
  • 161. Books by Chaitin
    • Algorithmic Information Theory, Cambridge University Press, 1987.
    • Information, Randomness and Incompleteness: Papers on Algorithmic Information Theory, World Scientific, 1987, 2nd edition, 1990.
    • Information-Theoretic Incompleteness, World Scientific, 1992.
    • The Limits of Mathematics: A Course on Information Theory and the Limits of Formal Reasoning, Springer, 1998. Also in Japanese.
    • The Unknowable, Springer, 1999. Also in Japanese.
    • Exploring Randomness, Springer, 2001.
    • Conversations with a Mathematician: Math, Art, Science and the Limits of Reason, Springer, 2002. Also in Portuguese and Japanese.
    • From Philosophy to Program Size: Key Ideas and Methods. Lecture Notes on Algorithmic Information Theory from the 8th Estonian Winter School in Computer Science, EWSCS ’03, Tallinn Institute of Cybernetics, 2003.
    • Meta Math! The Quest for Omega, Pantheon, 2005. Also UK, French, Italian, Portuguese, Japanese and Greek editions.
    • Teoria algoritmica della complessità, Giappichelli, 2006.
    • Thinking about Gödel and Turing: Essays on Complexity, 1970–2007, World Scientific, 2007.
    • Mathematics, Complexity & Philosophy: Lectures in Canada and Argentina, Midas, in press. This is an English/Spanish bilingual edition.
  • 162.
    • G. Chaitin, N. da Costa, F. A. Doria, After Gödel: Exploits into an undecidable world, in preparation.