METABIOLOGY:
LIFE AS EVOLVING
SOFTWARE
METABIOLOGY: a ﬁeld parallel to biology,
dealing with the random evolution of artiﬁ-
cial software (computer programs) rather than
natural software (DNA), and simple enough
that it is possible to prove rigorous theorems
or formulate heuristic arguments at the same
high level of precision that is common in the-
oretical physics.
“The chance that higher life forms might have emerged in this way [by
Darwinian evolution] is comparable to the chance that a tornado sweeping
through a junkyard might assemble a Boeing 747 from the materials therein.”
— Fred Hoyle.
“In my opinion, if Darwin’s theory is as simple, fundamental and basic as its
adherents believe, then there ought to be an equally fundamental mathemati-
cal theory about this, that expresses these ideas with the generality, precision
and degree of abstractness that we are accustomed to demand in pure math-
ematics.” — Gregory Chaitin, Speculations on Biology, Information and
Complexity.
“Mathematics is able to deal successfully only with the simplest of situations,
more precisely, with a complex situation only to the extent that rare good
fortune makes this complex situation hinge upon a few dominant simple fac-
tors. Beyond the well-traversed path, mathematics loses its bearings in a
jungle of unnamed special functions and impenetrable combinatorial partic-
ularities. Thus, the mathematical technique can only reach far if it starts
from a point close to the simple essentials of a problem which has simple
essentials. That form of wisdom which is the opposite of single-mindedness,
the ability to keep many threads in hand, to draw for an argument from many
disparate sources, is quite foreign to mathematics.” — Jacob Schwartz,
The Pernicious Inﬂuence of Mathematics on Science.
“It may seem natural to think that, to understand a complex system, one
must construct a model incorporating everything that one knows about the
system. However sensible this procedure may seem, in biology it has repeat-
edly turned out to be a sterile exercise. There are two snags with it. The
ﬁrst is that one ﬁnishes up with a model so complicated that one cannot
understand it: the point of a model is to simplify, not to confuse. The sec-
ond is that if one constructs a suﬃciently complex model one can make it
do anything one likes by ﬁddling with the parameters: a model that can
predict anything predicts nothing.” — John Maynard Smith & Eörs
Szathmáry, The Origins of Life.
Course Notes
METABIOLOGY:
LIFE AS EVOLVING
SOFTWARE
G. J. Chaitin
Draft October 1, 2010
To my wife Virginia
who played an essential role in this research
Contents
Preface
1 Introduction: Building a theory
2 The search for the perfect language
3 Is the world built out of information? Is everything software?
4 The information economy
5 How real are real numbers?
6 Speculations on biology, information and complexity
7 Metaphysics, metamathematics and metabiology
8 Algorithmic information as a fundamental concept in physics, mathematics and biology
9 To a mathematical theory of evolution and biological creativity
9.1 Introduction
9.2 History of Metabiology
9.3 Modeling Evolution
9.3.1 Software Organisms
9.3.2 The Hill-Climbing Algorithm
9.3.3 Fitness
9.3.4 What is a Mutation?
Preface
Biology and mathematics are like oil and water: they do not mix. Nevertheless,
this course will describe my attempt to express some basic biological
principles mathematically. I’ll try to explain the raison d’être of what I call
my “metabiological” approach, which studies randomly evolving computer
programs rather than biological organisms.
I want to thank a number of people and organizations for inviting me to
lecture on metabiology; the interaction with audiences was extremely stimu-
lating and helped these ideas to evolve.
Firstly, I thank the IBM Watson Research Center, Yorktown Heights,
where I gave two talks on this, including the world premiere talk on metabiol-
ogy. Another talk on metabiology in the United States was at the University
of Maine.
In Argentina I thank Veronica Becher of the University of Buenos Aires
and Victor Rodriguez of the University of Cordoba for their kind invitations.
And I am most grateful to the University of Cordoba, currently celebrating
its 400th anniversary, for the honorary doctorate that they were kind enough
to bestow on me.
In Chile I spoke on metabiology several times at the Valparaiso Complex
Systems Institute, and in Brazil I included metabiology in courses I gave
at the Federal University of Rio de Janeiro and in a talk at the Federal
University in Niteroi.
Furthermore I thank Bernd-Olaf Küppers for inviting me to a very stimulating
meeting at his Frege Centre for Structural Sciences at the University
of Jena.
And I thank Ilias Kotsireas for organizing a Chaitin-in-Ontario lecture se-
ries in 2009 in the course of which I spoke on metabiology at the University
of Western Ontario in London, at the Institute for Quantum Computing in
Waterloo, and at the Fields Institute at the University of Toronto. The chapter
of this book on Ω is based on a talk I gave at Wilfrid Laurier University
in Waterloo.
Finally, I should mention that the chapter on “The Search for the Perfect
Language” was ﬁrst given as a talk at the Hebrew University in Jerusalem
in 2008, then at the University of Campinas in Brazil, and ﬁnally at the
Perimeter Institute in Waterloo, Canada.
The chapter on “Is Everything Software?” was originally a talk at the
Technion in Haifa, where I also spoke on metabiology at the University of
Haifa, one of a series of talks I gave there as the Rothschild Distinguished
Lecturer for 2010.
These were great audiences, and their questions and suggestions were
extremely valuable.
Gregory Chaitin, August 2010
Chapter 1
Introduction: Building a theory
• This is a course on biology that will spend a lot of time discussing Kurt
Gödel’s famous 1931 incompleteness theorem on the limits of formal
mathematical reasoning. Why? Because in my opinion the ultimate
historical perspective on the significance of incompleteness may be that
Gödel opens the door from mathematics to biology.
• We will also spend a lot of time discussing computer programs and
software for doing mathematical calculations. How come? Because
DNA is presumably a universal programming language, which
is a language that is rich enough that it can express any algorithm.
The fact that DNA is such a powerful programming language is a more
fundamental characteristic of life than mere self-reproduction, which
anyway is never exact—for if it were, there would be no evolution.
• Now a few words on the kind of mathematics that we shall use in this
course. Starting with Newton, mathematical physics is full of what are
called ordinary differential equations, and starting with Maxwell, partial
differential equations become more and more important. Mathematical
physics is full of differential equations, that is, continuous mathematics.
But that is not the kind of mathematics that we shall use here. The
secret of life is not a diﬀerential equation. There is no diﬀerential
equation for your spouse, for an organism, or for biological evolution.
Instead we shall concentrate on the fact that DNA is the software, it’s
the programming language for life.
• It is true that there are (ordinary) differential equations in a highly successful
mathematical theory of evolution, Wright-Fisher-Haldane population
genetics. But population genetics does not say where new
genes come from; it assumes a fixed gene pool and discusses the
change of gene frequencies in response to selective pressure, not biological
creativity and the major transitions in evolution, such
as the transition from unicellular to multicellular organisms, which is
what interests us.
• If we are no longer going to use the differential equations that populate
mathematical physics, what kind of math are we going to use? It
will be discrete math, new math, the math of the 20th century dealing
with computation, with algorithms. It won’t be traditional continuous
math, it won’t be the calculus. As Dorothy says in The Wizard of Oz,
“Toto, we’re not in Kansas anymore!”
More in line with our life-as-evolving-software viewpoint are three hot
new topics in 20th century mathematics: computation, information
and complexity. These have expanded into entire theories, called computability
theory, information theory and complexity theory, theories
which superficially appear to have little or no connection with biology.
In particular, our basic tool in this course will be algorithmic infor-
mation theory (AIT), a mixture of Turing computability theory with
Shannon information theory, which features the concept of program-
size complexity. The author was one of the people who created this
theory, AIT, in the mid 1960’s and then further developed it in the
mid 1970’s; the theory of evolution presented in this course could have
been done then—all the necessary tools were available.
Why then the delay of 35 years? My apologies; I got distracted working
on computer engineering and thinking about metamathematics. I had
published notes on biology occasionally on and oﬀ since 1969, but I
couldn’t ﬁnd the right way of thinking about biology, I couldn’t ﬁgure
out how to formulate evolution mathematically in a workable manner.
Once I discovered the right way, this new theory I call metabiology went
from being a gleam in my eye to a full-ﬂedged mathematical theory in
just two years.
• Also, it would be nice to be able to show that in our toy model hierar-
chical structure will evolve, since that is such a conspicuous feature of
biological organisms.
What kind of math can we use for that? Well, there are places in pure
math and in software engineering where you get hierarchical structures:
in Mandelbrot fractals, in Cantor transﬁnite ordinal numbers, in hier-
archies of fast growing functions, and in software levels of abstraction.
Fractals are continuous math and therefore not suitable for our discrete
models, but the three others are genuine possibilities, and we shall
discuss them all. One of our models of evolution does provably exhibit
hierarchical structure.
• Here is the big challenge: Biology is extremely complicated, and every
rule has exceptions. How can mathematics possibly deal with this?
We will outline an indirect way to deal with it, by studying a toy
model I call metabiology (= life as evolving software, computer program
organisms, computer program mutations), not the real thing. We are
using Leibnizian math, not Newtonian math.
By modeling life as software, as computer programs, we get a very rich
space of possible designs for organisms, and we can discuss biological
creativity = where new genes come from (where new biological ideas
such as multicellular organization come from), not just changes in gene
frequencies in a population as in conventional evolutionary models.
• Some simulations of evolution on the computer (in silico—as contrasted
with in vivo, in the organism, and in vitro, in the test tube) such as
Tierra and Avida do in fact model organisms as software. But in
these models there is only a limited amount of evolution followed by
stagnation.1
Furthermore I do not run my models on a computer, I prove theorems
about them. And one of these theorems is that evolution will con-
tinue indeﬁnitely, that biological creativity (or what passes for it in my
model) is endless, unceasing.
1 As for genetic algorithms, they are intended to “stagnate” when they achieve an
optimal solution to an engineering design problem; such a solution is a fixed point of the
process of simulated evolution used by genetic algorithms.
• The main theme of Darwinian evolution is competition, survival of
the fittest, “Nature red in tooth and claw.” The main theme of my
model is creativity: Instead of a population of individuals competing
ferociously with each other in order to spread their individual genes (as
in Richard Dawkins’ The Selﬁsh Gene), instead of a jungle, my model
is like an individual Buddhist trying to attain enlightenment, a monk
who is on the path to enlightenment, it is like a mystic or a kabbalist
who is trying to get closer and closer to God.
More precisely, the single mutating organism in my model attains
greater and greater mathematical knowledge by discovering more and
more of the bits of Ω, which is, as we shall see in the part of the
course on Ω, Course Topic 5, a very concentrated form of mathematical
knowledge, of mathematical creativity. My organisms strive for greater
mathematical understanding, for purely mathematical enlightenment.
I model where new mathematical knowledge is coming from, where
new biological ideas are coming from; it is this process that I model
and prove theorems about.
• But my model of a single mutating organism is indeed Darwinian: I
have a single organism that is subjected to completely random muta-
tions until a ﬁtter organism is found, which then replaces my original
organism, and this process continues indeﬁnitely. The key point is that
in my model progress comes from combining random mutations and
having a ﬁtness criterion (which is my abstract encapsulation of both
competition and the environment).
The key point in Darwin’s theory was to replace God by randomness;
organisms are not designed, they emerge at random, and that is also
the case in my highly simpliﬁed toy model.
• Does this highly abstract game have any relevance to real biology?
Probably not, and if so, only very, very indirectly. It is mathematics
itself that beneﬁts most, because we begin to have a mathematical
theory inspired by Darwin, to have mathematical concepts that are
inspired by biology.
The fact that I can prove that evolution occurs in my model
does not in any way constitute a proof that Darwinians are
correct and Intelligent Design partisans are mistaken.
But my work is suggestive and it does clarify some of the issues, by
furnishing a toy model that is much easier to analyze than the real
thing—the real thing is what is actually taking place in the biosphere,
not in my toy model, which consists of arbitrary mutation computer
programs operating on arbitrary organism computer programs.
• More on creativity, a key word in my model: Something is mechanical
if there is an algorithm for doing it; it is creative if there is no such
algorithm.
This notion of creativity is basic to our endeavor, and it comes from
the work of Gödel on the incompleteness of formal axiomatic theories,
and from the work of Turing on the unsolvability of the so-called halting
problem.2 Their work shows that there are no absolutely general
methods in mathematics and theoretical computer science, that cre-
ativity is essential, a conclusion that Paul Feyerabend with his book
Against Method would have loved had he been aware of it: What Fey-
erabend espouses for philosophical reasons is in fact a theorem, that is,
is provably correct in the ﬁeld of mathematics.
So before we get to Darwin, we shall spend a lot of time in this course
with Gödel and Turing and the like, preparing the groundwork for our
model of evolution. Without this historical background it is impossible
to appreciate what is going on in our model.
• My model therefore mixes mathematical creativity and biological cre-
ativity. This is both good and bad. It’s bad, because it distances my
model from biology. But it is good, because mathematical creativity
is a deep mathematical question, a fundamental mystery, a big un-
known, and therefore something important to think about, at least for
mathematicians, if not for biologists.
Further distancing my model from biology, my model combines ran-
domness, a very Darwinian feature, with Turing oracles, which have no
counterpart in biology; we will discuss this in due course.
Exploring such models of randomly evolving software may well develop
into a new ﬁeld of mathematics. Hopefully this is just the beginning,
and metabiology will develop and will have more connection with biol-
ogy in the future than it has at present.
2 Turing’s halting problem is the question of deciding whether or not a computer program
that is self-contained, without any input, will run forever, or will eventually finish.
• The main diﬀerence between our model and the DNA software in real
organisms is their time complexity: the amount of time the software
can run. I can prove elegant theorems because in my model the time
allowed for a program to run is ﬁnite, but unlimited.
Real DNA software must run quickly: 9 months to produce a baby,
70 years in total, more or less. A theory of the evolution of programs
with such limited time complexity, with such limited run time, would
be more realistic but it will not contain the neat results we have in our
idealized version of biology.
This is similar to the thermodynamics arguments which are taken in
the “thermodynamic limit” of large amounts of time, in order to obtain
more clear-cut results, when discussing the ideal performance of heat
engines (e.g., steam engines). Indeed, AIT is a kind of thermodynamics
of computation, with program-size complexity replacing entropy. In-
stead of applying to heat engines and telling us their ideal eﬃciency,
AIT does the same for computers, for computations.
You have to go far from everyday biology to ﬁnd beautiful mathematical
structure.
• It should be emphasized that metabiology is Work in Progress. It
may be mistaken. And it is certainly not ﬁnished yet. We are building
a new theory. How do you create a theory?
“Beauty” is the guide. And this course will give a history of ideas
for metabiology with plenty of examples. An idea is beautiful when it
illuminates you, when it connects everything, when you ask yourself,
“Why didn’t I see that before!,” when in retrospect it seems obvious.
AIT has two such ideas: the idea of looking at the size of a com-
puter program as a complexity measure, and the idea of self-delimiting
programs. Metabiology has two more beautiful ideas: the idea of or-
ganisms as arbitrary programs with a diﬃcult mathematical problem
to solve, and the idea of mutations as arbitrary programs that operate
on an organism to produce a mutated organism.
Once you have these ideas, the rest is just uninspired routine work,
lots of hard work, but that’s all. In this course we shall discuss all four
of these beautiful ideas, which were the key inspirations required for
creating AIT and metabiology.
Routine work is not enough, you need a spark from God. And mostly
you need an instinct for mathematical beauty, for sensing an idea that
can be developed, for the importance of an idea. That is, more than
anything else, a question of aesthetics, of intuition, of instinct, of judge-
ment, and it is highly subjective.
I will try my best to explain why I believe in these ideas, but just as
in artistic taste, there is no way to convince anyone. You either feel
it somewhere deep in your soul or you don’t. There is nothing more
important than experiencing beauty; it’s a glimpse of transcendence, a
glimpse of the divine, something that fewer and fewer people believe in
nowadays. But without that we are mere machines.
And I may have the beginnings of a mathematical theory of evolu-
tion and biological creativity, but a mathematical theory of beauty is
nowhere in sight.
• Incompleteness goes from being threatening to provoking creativity and
being applied in order to keep our organisms evolving indeﬁnitely. Evo-
lution stagnates in most models because the organisms achieve their
goals. In my model the organisms are asked to achieve something
that can never be fully achieved because of the incompleteness phe-
nomenon. So my organisms keep getting better and better at what
they are doing; they can never stop, because stopping would mean
that they had a complete answer to a math problem to which incom-
pleteness applies. Indeed, the three mathematical challenges that my
organisms face, naming large integers, fast growing functions, and large
transﬁnite ordinals, are very concrete, tangible examples of the incom-
pleteness phenomenon, which at ﬁrst seemed rather mysterious.
Incompleteness is the reason that our organisms have to keep evolving
forever, as they strive to become more and more complete, less and
less incomplete. . . Incompleteness keeps our model of evolution from
stagnating; it gives our organisms a mission, a raison d’être.
You have to go beyond incompleteness; incompleteness gives rise to
creativity and evolution. Incompleteness sounds bad, but the other
side of the coin is creativity and evolution, which are good.
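The hill-climbing idea running through the bullets above (random mutation, a fitness criterion, and a goal that can never be fully achieved) can be caricatured in a few lines of Python. This is only a rough sketch under strong assumptions that are not part of the model described in these notes: here an organism is a plain string of decimal digits rather than an arbitrary program, its fitness is simply the integer it spells out, and a mutation edits a single digit rather than being an arbitrary program acting on the organism. The names `fitness`, `mutate` and `evolve` are hypothetical. What the sketch shares with the model is the loop itself: try a random mutation, keep it only if it is fitter, and never run out of room to improve, because a larger integer can always be named.

```python
import random
from typing import List


def fitness(organism: str) -> int:
    """Toy fitness: the integer spelled out by the organism's digits.

    Assumption for illustration only: an organism is a non-empty string
    of decimal digits, not an arbitrary program.
    """
    return int(organism)


def mutate(organism: str, rng: random.Random) -> str:
    """Apply one random mutation: insert, replace, or delete one digit."""
    kind = rng.choice(("replace", "insert", "delete"))
    digit = rng.choice("0123456789")
    if kind == "insert":
        pos = rng.randrange(len(organism) + 1)
        return organism[:pos] + digit + organism[pos:]
    pos = rng.randrange(len(organism))
    if kind == "replace":
        return organism[:pos] + digit + organism[pos + 1:]
    # Deletion: never shrink the organism to the empty string.
    if len(organism) == 1:
        return organism
    return organism[:pos] + organism[pos + 1:]


def evolve(start: str, steps: int, seed: int = 0) -> List[int]:
    """Hill-climb: a mutant replaces the organism only if it is fitter.

    Returns the fitness of each successive organism, starting with `start`.
    """
    rng = random.Random(seed)
    organism = start
    history = [fitness(organism)]
    for _ in range(steps):
        candidate = mutate(organism, rng)
        if fitness(candidate) > fitness(organism):
            organism = candidate
            history.append(fitness(organism))
    return history


if __name__ == "__main__":
    print(evolve("1", 200))
```

Calling `evolve("1", 200)` returns a strictly increasing list of fitness values; how far it climbs depends on the seed. Because a digit string is so much poorer than a space of arbitrary programs, nothing here should be read as evidence for or against the theorems discussed in this course.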
Now we give an outline of the course, consisting of Course Topics 1–9:
1. This introduction.
2. The Search for the Perfect Language. (My talk at the Perimeter Insti-
tute in Waterloo.)
Umberto Eco, Lull, Leibniz, Cantor, Russell, Hilbert, Gödel, Turing,
AIT, Ω. Kabbalah, Key to Universal Knowledge, God-like Power of
Creation, the Golem!
Mathematical theories are all incomplete (Gödel, Turing, Ω), but programming
languages are universal. Most concise programming languages,
self-delimiting programs.
3. Is the world built out of information? Is everything software? (My talk
at the Technion in Haifa.)
Physics of information: Quantum Information Theory; general relativity
and black holes, Bekenstein bound, holographic principle = every
physical system contains a ﬁnite number of bits of information that
grows as the surface area of the physical system, not as its volume
(Lee Smolin, Three Roads to Quantum Gravity); derivation of Ein-
stein’s ﬁeld equations for gravity from the thermodynamics of black
holes (Ted Jacobson, “Thermodynamics of Spacetime: The Einstein
Equation of State”).
The ﬁrst attempt to construct a truly fundamental mathematical model
for biology: von Neumann self-reproducing automata in a cellular
automata world, a world in which magic works, a plastic world.
See also: Edgar F. Codd, Cellular Automata. Konrad Zuse, Rechnen-
der Raum (Calculating Space). Fred Hoyle, Ossian’s Ride. Freeman
Dyson, The Sun, the Genome, and the Internet (1999), green technol-
ogy. Craig Venter, genetic engineering, synthetic life.
Technological applications: Seeds for houses, seeds for jet planes!
Plant the seed in the earth, just add water and sunlight. Universal constructors,
3D printers = matter printers = printers for objects. Flexible
manufacturing. Alchemy, Plastic reality.
4. Artiﬁcial Life: Evolution Simulations.
Low Level: Thomas Ray’s Tierra, Christoph Adami’s Avida, Walter
Fontana’s ALchemy (Algorithmic Chemistry), Genetic algorithms.
High Level: Exploratory concept-formation based on examining lots
of examples in elementary number theory, experimental math with no
proofs: Douglas Lenat (1984), “Automated theory formation in math-
ematics,” AM.
After a while these stop evolving. What about proofs instead of
simulations? Seems impossible—see the frontispiece quotes facing the
title page, especially the one by Jacob Schwartz—but there is hope.
See Course Topic 5 arguing that Ω is provably a bridge from math to
biology.
5. How Real Are Real Numbers? A History of Ω. (My talk at WLU in
Waterloo.)
Course Topic 3 gives physical arguments against real numbers, and this
course topic gives mathematical arguments against real numbers. These
considerations about paradoxical real numbers will lead us straight to
the halting probability Ω. That is not how Ω was actually discovered,
but it is the best way of understanding Ω. It’s a Whig history: how it
should have been, not how it actually was.
The irreducible complexity real number Ω proves that math is more
biological than biology; this is the ﬁrst real bridge between math and
biology. Biology is extremely complicated, and pure math is inﬁnitely
complicated.
The theme of Ω as concentrated mathematical creativity is introduced
here; this is important because Ω is the organism that emerges through
random evolution in Course Topic 8.
Now let’s get to work in earnest to build a mathematical theory of
evolution and biological creativity.
6. Metabiology: Life as Evolving Software.
Stephen Wolfram, NKS: the origin of life as the physical implementation
of a universal programming language; the Ubiquity of Universality.
François Jacob, bricolage, Nature is a cobbler, a tinkerer. Neil
Shubin, Your Inner Fish. Stephen Gould, Wonderful Life, on the Cam-
brian explosion of body designs. Murray Gell-Mann, frozen accidents.
Ernst Haeckel, ontogeny recapitulates phylogeny. Evo-devo.
Note that a small change in a computer program (one bit!) can com-
pletely wreck it. But small changes can also make substantial improve-
ments. This is a highly nonlinear eﬀect, like the famous butterﬂy eﬀect
of chaos theory (see James Gleick’s Chaos). Over the history of this
planet, covering the entire surface of the earth, there is time to try
many small changes. But not enough time according to the Intelli-
gent Design book Signature in the Cell. In the real world this is still
controversial, but in my toy model evolution provably works.
A blog summarized one of my talks on metabiology like this: “We are
all random walks in program space!” That’s the general idea; in Course
Topics 7 and 8 we ﬁll in the details of this new theory.
7. Creativity in Mathematics. We need to challenge our organisms into
evolving. We need to keep them from stagnating. These problems can
utilize an unlimited amount of mathematical creativity:
• Busy Beaver problem: Naming large integers: $10^{10}$, $10^{10^{10}}$, . . .
• Naming fast-growing functions: $N^2$, $2^N$, . . .
• Naming large transfinite Cantor ordinals: $\omega$, $\omega^2$, $\omega^\omega$, . . .
8. Creativity in Biology. Single mutating organism. Hill-climbing algo-
rithm on a ﬁtness landscape. Hill-climbing random walks in software
space. Evolution of mutating software. What is a mutation? Exhaus-
tive search. Intelligent design. Cumulative evolution at random. Ω as
concentrated creativity, Ω as an evolving organism. Randomness yields
intelligence.
We have a proof that evolution works, at least in this toy model; in fact,
surprisingly it is nearly as fast as intelligent design, as deliberately
choosing the mutations in the best possible order. But can we show
that random evolution is slower than intelligent design? Otherwise the
theory collapses onto a point, it cannot distinguish, it does not make
useful distinctions. We also get evolution of hierarchical structure in
non-universal programming languages.
So we seem to have evolution at work in these toy models. But to what
extent is this relevant to real biological systems?
9. Conclusion: On the plasticity of the world. Is the universe mental?
Speculation where all this might possibly lead.
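Course Topic 7 in the outline above mentions naming fast-growing functions. As a rough illustration of how one function can be used to name a faster one, here is a small Python sketch of a finite fast-growing hierarchy. The construction (f_0(n) = n + 1, and each later level applies the previous level n times to n) is a standard textbook device that I am assuming for illustration; it is not necessarily the one used later in this course, and the names `iterate` and `fast_growing` are hypothetical.

```python
from typing import Callable


def iterate(f: Callable[[int], int], times: int, x: int) -> int:
    """Apply f to x repeatedly, `times` times in a row."""
    for _ in range(times):
        x = f(x)
    return x


def fast_growing(level: int) -> Callable[[int], int]:
    """Return f_level, where f_0(n) = n + 1 and
    f_{k+1}(n) = f_k applied n times, starting from n."""
    if level == 0:
        return lambda n: n + 1
    lower = fast_growing(level - 1)
    return lambda n: iterate(lower, n, n)


if __name__ == "__main__":
    print(fast_growing(1)(5))   # f_1(n) = 2n, so this prints 10
    print(fast_growing(2)(3))   # f_2(n) = n * 2**n, so this prints 24
    print(fast_growing(3)(2))   # f_2(f_2(2)) = f_2(8), so this prints 2048
```

Each level outgrows every fixed iterate of the level below it, so there is always a faster function left to name. The values explode quickly: keep the arguments tiny, since this naive version counts up one step at a time.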
Chapter 2
The search for the perfect
language
I will tell how the story given in Umberto Eco’s book The Search for the
Perfect Language continues with modern work on logical and programming
languages. Lecture given Monday, 21 September 2009, at the Perimeter In-
stitute for Theoretical Physics in Waterloo, Canada.1
Today I’m not going to talk much about Ω. I will focus on that at Wilfrid
Laurier University tomorrow. And if you want to hear a little bit about my
current enthusiasm, which is what I’m optimistically calling metabiology —
it’s a ﬁeld with a lovely name and almost no content at this time — that’s
on Wednesday at the Institute for Quantum Computing.
I thought it would be fun here at the Perimeter Institute to repeat a talk,
to give a version of a talk, that I gave in Jerusalem a year ago. To understand
the talk it helps to keep in mind that it was ﬁrst given in Jerusalem. I’d like
to give you a broad sweep of the history of mathematical logic. I’m a math-
ematician who likes physicists; some mathematicians don’t like physicists.
But I do. Before I became a mathematician I wanted to be a physicist.
So I’m going to talk about mathematics, and I’d like to give you a broad
overview, most definitely a non-standard view of some intellectual history. It
will be a talk about the history of work on the foundations of mathematics
as seen from the perspective of the Middle Ages. So here goes. . .
1 This lecture was published in Portuguese in São Paulo, Brazil, in the magazine Dicta
& Contradicta, No. 4, 2009. See http://www.dicta.com.br/.
This talk = Umberto Eco + Hilbert, Gödel, Turing. . .
Outline at: http://www.cs.umaine.edu/~chaitin/hu.html
There is a wonderful book by Umberto Eco called The Search for the Perfect
Language, and I recommend it highly to all of you.
In The Search for the Perfect Language you can see that Umberto Eco
likes the Middle Ages — I think he probably wishes we were still there. And
this book talks about a dream that Eco believes played a fundamental role
in European intellectual history, which is the search for the perfect language.
What is the search for the perfect language? Nowadays a physicist would
call this the search for a Theory of Everything (TOE), but in the terms in
which it was formulated originally, it was the idea of ﬁnding, shall we say, the
language of creation, the language before the Tower of Babel, the language
that God used in creating the universe, the language whose structure directly
expresses the structure of the world, the language in which concepts are
expressed in their direct, original format.
You can see that this idea is a little bit like the attempt to ﬁnd a foun-
dational Theory of Everything in physics.
The crucial point is that knowing this language would be like having a key
to universal knowledge. If you’re a theologian, it would bring you closer,
very close, to God’s thoughts, which is dangerous. If you’re a magician, it
would give you magical powers. If you’re a linguist, it would tell you the
original, pure, uncorrupted language from which all languages descend. One
can go on and on. . .
This very fascinating book is about the quest to ﬁnd this language. If
you ﬁnd it, you’re opening a door to absolute knowledge, to God, to the
ultimate nature of reality, to whatever.
And there are a lot of interesting chapters in this intellectual history. One
of them is Raymond Lull, around 1200, a Catalan.
Raymond Lull ≈ 1200
He was a very interesting gentleman who had the idea of mechanically com-
bining all possible concepts to get new knowledge. So you would have a wheel
with diﬀerent concepts on it, and another wheel with other concepts on it,
and you would rotate them to get all possible combinations. This would be
a systematic way to discover new concepts and new truths. And if you re-
member Swift’s Gulliver’s Travels, there Swift makes fun of an idea like this,
in one of the parts of the book that is not for children but deﬁnitely only for
adults.
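Lull's wheels, as just described, amount to enumerating every combination of one concept from each wheel. A minimal Python sketch of that mechanical procedure follows. The concept labels on the two wheels are placeholders chosen for illustration, not taken from this talk, and `combine_wheels` is a hypothetical name.

```python
from itertools import product
from typing import List, Sequence


def combine_wheels(*wheels: Sequence[str]) -> List[str]:
    """List every combination that takes one concept from each wheel."""
    return [" + ".join(combo) for combo in product(*wheels)]


# Placeholder concepts, for illustration only.
wheel_a = ["goodness", "greatness", "eternity"]
wheel_b = ["difference", "concordance"]

if __name__ == "__main__":
    for pairing in combine_wheels(wheel_a, wheel_b):
        print(pairing)
```

With wheels of sizes 3 and 2 this produces 3 × 2 = 6 pairings. The enumeration is purely mechanical; deciding which combinations express a new truth is left entirely to the reader.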
Let’s leave Lull and go on to Leibniz. In The Search for the Perfect
Language there is an entire chapter on Leibniz. Leibniz is a transitional
ﬁgure in the search for the perfect language. Leibniz is wonderful because he
is universal. He knows all about Kabbalah, Christian Kabbalah and Jewish
Kabbalah, and all kinds of hermetic and esoteric doctrines, and he knows
all about alchemy, he actually ghost-authored a book on alchemy. Leibniz
knows about all these things, and he knows about ancient philosophy, he
knows about scholastic philosophy, and he also knows about what was then
called mechanical philosophy, which was the beginning of modern science.
And Leibniz sees good in all of this.
And he formulates a version of the search for the perfect language, which
is ﬁrmly grounded in the magical, theological original idea, but which is also
ﬁt for consumption nowadays, that is, acceptable to modern ears, to contem-
porary scientists. This is a universal language he called the characteristica
universalis that was supposed to come with a crucial calculus ratiocinator.
Leibniz: characteristica universalis, calculus ratiocinator
The idea, the goal, is that you would reduce reasoning to calculation, to
computation, because the most certain thing is that 2 + 5 = 7. In other
words, the way Leibniz put it, perhaps in one of his letters, is that if two
people have an intellectual dispute, instead of dueling they could just sit
down and say, “Gentlemen, let us compute!”, and get the correct answer and
ﬁnd out who was right.
So this is Leibniz’s version of the search for the perfect language. How
far did he get with this?
Well, Leibniz is a person who gets bored easily, and ﬂies like a butterﬂy
from ﬁeld to ﬁeld, throwing out fundamental ideas, rarely taking the trouble
to develop them fully.
One case of the characteristica universalis that Leibniz did develop is
called the calculus. This is one case where Leibniz worked out his ideas for
the perfect language in beautiful detail.
Leibniz’s version of the calculus diﬀers from Newton’s precisely because
it is part of Leibniz’s project for the characteristica universalis. Christian
Huygens hated the calculus.
Christian Huygens taught Leibniz mathematics in Paris at a relatively
late age, when Leibniz was in his twenties. Most mathematicians start very,
very young. And Christian Huygens hated Leibniz’s calculus because he
said that it was mechanical, it was brainless: Any fool can just calculate the
answer by following the rules, without understanding what he or she is doing.
Huygens preferred the old, synthetic geometry proofs where you have
to be creative and come up with a diagram and some particular reason for
something to be true. Leibniz wanted a general method. He wanted to get
the formalism, the notation, right, and have a mechanical way to get the
answer.
Huygens didn’t like this, but that was precisely the point. This was
precisely what Leibniz was looking for, for everything!
The idea was that if you get absolute truth, if you have found the truth,
it should mechanically enable you to determine what’s going on, without
creativity. This is good, this is not bad.
This is also precisely how Leibniz’s version of the calculus diﬀered from
Newton’s. Leibniz saw clearly the importance of having a formalism that led
you automatically to the answer.
Let’s now take a big jump, to David Hilbert, about a century ago. . .
No, ﬁrst I want to tell you about an important attempt to ﬁnd the perfect
language: Cantor’s theory of inﬁnite sets.
Cantor: Inﬁnite Sets
This late 19th century theory is interesting because it’s ﬁrmly in the Middle
Ages and also, in a way, the inspiration for all of 20th century mathematics.
This theory of inﬁnite sets was actually theology. This is mathematical
theology. Normally you don’t mention that fact. To be a ﬁeld of mathe-
matics, the price of admission is you throw out all the philosophy, and you
just end up with something technical. So all the theology has been thrown
out.
But Cantor’s goal was to understand God. God is transcendent. The
theory of inﬁnite sets has this hierarchy of bigger and bigger inﬁnities, the
alephs, the ℵ’s. You have ℵ0, ℵ1, the infinity of integers, of real numbers,
and you keep going. Each one of these is the cardinality of the set of all
subsets of the previous one (strictly speaking, this identification assumes
the generalized continuum hypothesis).
And very far out you get mind-boggling infinities like ℵω; this is the
ﬁrst inﬁnity after
ℵ0, ℵ1, ℵ2, ℵ3, ℵ4 . . .
Then you can continue with
ω + 1, ω + 2, ω + 3 . . . 2ω + 1, 2ω + 2, 2ω + 3 . . .
These so-called ordinal numbers are subscripts for the ℵ’s, which are cardi-
nalities. Let’s go farther:
ℵ_{ω^2}, ℵ_{ω^ω}, ℵ_{ω^{ω^ω}} . . .
And there’s an ordinal called epsilon-nought
ε_0 = ω^{ω^{ω^{ω^{...}}}}
which is the smallest solution of the equation
x = ω^x.
And the corresponding cardinal
ℵ_{ε_0}
is pretty big!
You know, God is very far oﬀ, since God is inﬁnite and transcendent. We
can try to go in His direction. But we’re never going to get there, because
after every cardinal, there’s a bigger one, the cardinality of the set of all
subsets. And after any inﬁnite sequence of cardinals that you get, you just
take the union of all of that, and you get a bigger cardinal than is in the
sequence. So this thing is inherently open-ended. And contradictory, by
the way!
There’s only one problem. This is absolutely wonderful, breath-taking
stuﬀ. The only problem is that it’s contradictory.
The problem is very simple. If you take the universal set, the set of
everything, and you consider the set of all its subsets, by Cantor’s diago-
nal argument this should have a bigger cardinality, but how can you have
anything bigger than the set of everything?
This is the paradox that Bertrand Russell discovered. Russell looked
at this and asked why do you get this bad result. And if you look at the
Cantor diagonal argument proof that the set of all subsets of everything is
bigger than everything, it involves the set of all sets that are not members
of themselves,
{x : x ∉ x},
which can neither be in itself nor not be in itself. This is called the Russell
paradox.
Cantor was aware of the fact that this happens, but Cantor wasn’t both-
ered by these contradictions, because he was doing theology. We’re ﬁnite but
God is inﬁnite, and it’s paradoxical for a ﬁnite being to try to comprehend a
transcendent, inﬁnite being, so paradoxes are okay. But the math community
is not very happy with a theory which leads to contradictions.
However, these ideas are so wonderful, that what the math community
has done is forget about all this theology and philosophy and try to sweep
the contradictions under the rug. There is an expurgated version of all this
called Zermelo-Fraenkel set theory, with the axiom of choice, usually: ZFC.
This is a formal axiomatic theory which you develop using ﬁrst-order logic,
and it is an expurgated version of Cantor’s theory believed not to contain
any paradoxes.
Anyway, Bertrand Russell was inspired by all of this to attempt a general
critique of mathematical reasoning, and to ﬁnd a lot of contradictions, a lot
of mathematical arguments that lead to contradictions.
Bertrand Russell: mathematics is full of contradictions.
I already told you about his most famous one, the Russell paradox.
Russell was an atheist who was searching for the absolute, who believed
in absolute truth. And he loved mathematics and wanted mathematics to
be perfect. Russell went around telling people about these contradictions in
order to try to get them ﬁxed.
Besides the paradox that there’s no biggest cardinal and that the set of
subsets of everything is bigger than everything, there’s also a problem with
the ordinal numbers that’s called the Burali-Forti paradox, namely that the
set of all the ordinals is an ordinal that’s bigger than all the ordinals. This
works because each ordinal can be deﬁned as the set of all the ordinals that
are smaller than it is. (Then an ordinal is less than another ordinal if and
only if it is contained in it.)
Russell is going around telling people that reason leads to contradictions.
So David Hilbert about a century ago proposes a program to put mathematics
on a ﬁrm foundation. And basically what Hilbert proposes is the idea of
a completely formal axiomatic theory, which is a modern version of
Leibniz’s characteristica universalis and calculus ratiocinator:
David Hilbert: mathematics is a formal axiomatic theory.
This is the idea of making mathematics totally objective, of removing all
the subjective elements.
So in such a formal axiomatic theory you would have a ﬁnite number
of axioms, axioms that are not written in an ambiguous natural language.
Instead you use a precise artiﬁcial language with a simple, regular artiﬁcial
grammar. You use mathematical logic, not informal reasoning, and you
specify the rules of the game completely precisely. It should be mechanical
to decide whether a proof is correct.
Hilbert was a conservative. He believed that mathematics gives abso-
lute truth, which is an idea from the Middle Ages. You can see the Middle
Ages whenever you mention absolute truth. Nevertheless, modern mathematicians
remain enamored with absolute truth. As Gödel said, we pure
mathematicians are the last holdout of the Middle Ages. We still believe
in the Platonic world of ideas, at least mathematical ideas, when everyone
else, including philosophers, now laughs at this notion. But pure mathemati-
cians live in the Platonic world of ideas, even though everyone else stopped
believing in this a long time ago.
So math gives absolute truth, said Hilbert. Every mathematician some-
where deep inside believes this. Then there ought to exist a ﬁnite set of
axioms, and a precise set of rules for deduction, for inference, such that all of
mathematical truth is a consequence of these axioms. You see, if mathemat-
ical truth is black or white, and purely objective, then if you ﬁll in all the
steps in a proof and carefully use an artiﬁcial language to avoid ambiguity,
you should be able to have a ﬁnite set of axioms we can all agree on, that
in principle enable you to deduce all of mathematical truth. This is just the
notion that mathematics provides absolute certainty; Hilbert is analyzing
what this means.
What Hilbert says is that the traditional view that mathematics provides
absolute certainty, that in the Platonic world of pure mathematics everything
is black or white, means that there should be a single formal axiomatic theory
for all of math. That was a very important idea of his.
An important consequence of this idea goes back to the Middle Ages.
This perfect language for mathematics, which is what Hilbert was looking
for, would in fact give a key to absolute knowledge, because in principle
you could mechanically deduce all the theorems from the axioms, simply by
running through the tree of all possible proofs. You start with the axioms,
then you apply the rules of inference once, and get all the theorems that have
one-step proofs, you apply them two times, and you get all the theorems that
have two-step proofs, and like that, totally mechanically, you would get all
of mathematical truth, by systematically traversing the tree of all possible
proofs.
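To make the idea of running through the tree of all possible proofs concrete, here is a minimal sketch in Python. The formal system used is an assumption chosen only for illustration: Hofstadter's small MIU string-rewriting puzzle, with one axiom and four rules of inference. It is not the kind of theory Hilbert had in mind, and the function names are hypothetical. The sketch lists the theorems that first appear after zero steps, one step, two steps, and so on.

```python
from typing import Iterator, Set

AXIOM = "MI"  # the single axiom of the toy system (illustrative assumption)


def successors(s: str) -> Set[str]:
    """All strings obtainable from s by applying one rule of inference once."""
    out = set()
    if s.endswith("I"):
        out.add(s + "U")                      # rule 1: xI -> xIU
    if s.startswith("M"):
        out.add(s + s[1:])                    # rule 2: Mx -> Mxx
    for i in range(len(s) - 2):
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])  # rule 3: xIIIy -> xUy
    for i in range(len(s) - 1):
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])        # rule 4: xUUy -> xy
    return out


def theorems_by_depth(max_depth: int) -> Iterator[Set[str]]:
    """Yield the theorems first reached after 0, 1, ..., max_depth steps."""
    seen = {AXIOM}
    frontier = {AXIOM}
    yield frontier
    for _ in range(max_depth):
        # Apply every rule to every theorem found at the previous depth.
        frontier = {t for s in frontier for t in successors(s)} - seen
        seen |= frontier
        yield frontier


if __name__ == "__main__":
    for depth, level in enumerate(theorems_by_depth(3)):
        print(depth, sorted(level))
```

Even in this tiny system the number of theorems grows quickly with depth, which is the practical objection raised next: the procedure is mechanical, but nobody would actually run it to find interesting results.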
This would not put all mathematicians out of work, not at all. In practice
this process would take an outrageous amount of time to get to interesting
results, and all the interesting theorems would be overwhelmed by uninter-
esting theorems, such as the fact that 1 + 1 = 2 and other trivialities.
It would be hard to ﬁnd the interesting theorems and to separate the
wheat from the chaﬀ. But in principle this would give you all mathematical
truths. You wouldn’t actually do it, but it would show that math gives
absolute certainty.
By the way, it was important to make all mathematicians agree on the
choice of formal axiomatic theory, and you would use metamathematics to
try to convince everyone that this formal axiomatic theory avoids all the
paradoxes that Bertrand Russell had noticed and contains no contradictions.
Okay, so this was the idea of putting mathematics on a ﬁrm foundation
and removing all doubts. This was Hilbert’s idea, about a century ago.
Metamathematics studies a formal axiomatic theory from the outside. And
notice that this is a door to absolute truth, following the notion of the perfect
language.
So what happens with this program, with this proposal of Hilbert’s? Well,
there’s some good news and some bad news. Some of the good news I already
mentioned: The thing that comes the closest to what Hilbert asked for is
Zermelo-Fraenkel set theory, and it is a beautiful axiomatic theory. I want
to mention some of the milestones in the development of this theory.
One of them is the von Neumann integers, so let me tell you about that.
Remember that Spinoza has a philosophical system in which the world is
built out of only one substance, and that substance is God, that’s all there
is. Zermelo-Fraenkel set theory is similar. Everything is sets, and every set
is built out of the empty set. That’s all there is: the empty set, and sets
built starting with the empty set.
So zero is the empty set, that’s the ﬁrst von Neumann integer, and in
general n + 1 is deﬁned to be the set of all integers less than or equal to n:
von Neumann integers: 0 = {}, n + 1 = {0, 1, 2, . . . , n}.
So if you write this out in full, removing all the abbreviations, all you have
are curly braces, you have set formation starting with no content, and the
full notation for n grows exponentially in n, if you write it all out, because
everything up to that point is repeated in the next number. In spite of this
exponential growth, this is a beautiful conceptual scheme.
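The construction and its exponential notation can be checked with a short Python sketch. The names `von_neumann` and `notation` are hypothetical, and nested `frozenset` objects are just one convenient way to model pure sets. Under these assumptions, the full curly-brace notation for n comes out to exactly 2^(n+1) characters.

```python
def von_neumann(n: int) -> frozenset:
    """Build the von Neumann integer n: 0 = {}, n + 1 = {0, 1, ..., n}."""
    current = frozenset()   # zero is the empty set
    smaller = []            # the integers built so far: 0, 1, 2, ...
    for _ in range(n):
        smaller.append(current)
        current = frozenset(smaller)
    return current


def notation(s: frozenset) -> str:
    """Write a set out in full, using nothing but curly braces."""
    # Members are ordered by the length of their notation so that the
    # output is deterministic (frozensets themselves are unordered).
    return "{" + "".join(sorted((notation(m) for m in s), key=len)) + "}"


if __name__ == "__main__":
    for n in range(5):
        text = notation(von_neumann(n))
        print(n, len(text), text)
```

Each number repeats the notation of all its predecessors, so the length doubles at every step.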
Then you can deﬁne rational numbers as pairs of these integers, you
can deﬁne real numbers as limit sequences of rationals, and you get all of
mathematics, starting just with the empty set. So it’s a lovely piece of
ontology. Here’s all of mathematical creation just built out of the empty set.
And other people who worked on this are of course Fraenkel and Zermelo,
because it is called Zermelo-Fraenkel set theory, and an approximate notion
of what they did was to try to avoid sets that are too big. The universal set
is too big, it gets you into trouble. Not every property determines a set.
So this is a formal theory that most mathematicians believe enables you to
carry out all the arguments that normally appear in mathematics — maybe
if you don’t include category theory, which is very diﬃcult to formalize, and
even more paradoxical than set theory, from what I hear.
Okay, so that’s some of the positive work on Hilbert’s program. Now
some of the negative work on Hilbert’s program — I’d like to tell you about
it, you’ve all heard of it — is of course Gödel in 1931 and Turing in 1936.
Gödel, 1931 — Turing, 1936
What they show is that you can’t have a perfect language for mathematics,
you cannot have a formal axiomatic theory like Hilbert wanted for all of
mathematics, because of incompleteness, because no such system will include
all of mathematical truth, it will always leave out truths, it will always be
incomplete.
And this is Gödel’s incompleteness theorem of 1931, and Gödel’s original
proof is very strange. It’s basically the paradox of “this statement is false,”
“This statement is false!”
which is a paradox of course because it can be neither true nor false. If it’s
false that it’s false, then it’s true, and if it’s true that it’s false, then it’s
false. That’s just a paradox. But what Gödel does is say “this statement is
unprovable.”
“This statement is unprovable!”
So if the statement says of itself it’s unprovable, there are two possibilities:
it’s provable, or it isn’t.
If it’s provable, then we’re proving something that’s false, because it says
it’s unprovable. So we hope that’s not the case; by hypothesis, we’ll eliminate
that possibility. If we prove things that are false, we have a formal axiomatic
theory that we’re not interested in, because it proves false things.
The only possibility left is that it’s unprovable. But if it’s unprovable
then it’s true, because it asserts it’s unprovable, therefore there’s a hole. We
haven’t captured all of mathematical truth in our theory.
This proof of incompleteness shocks a lot of people, but my personal
reaction to it is, okay, it’s correct, but I don’t like it.
A better proof of incompleteness, a deeper proof, comes from Turing in
1936. He derives incompleteness from a more fundamental phenomenon,
which is uncomputability, the discovery that mathematics is full of stuﬀ that
can’t be calculated, of things you can deﬁne, but which you cannot calculate,
because there’s no algorithm.
Uncomputability ⇒ Incompleteness
And in particular, the uncomputable thing that he discovers is the halt-
ing problem, a very simple question: Does a computer program that’s
self-contained halt or does it go on forever? There is no algorithm to answer
this in every individual case, therefore there is no formal axiomatic theory
that enables you to always prove in individual cases what the answer is.
Why not? Because if there were a formal axiomatic theory that’s complete
for the halting problem, that would give you a mechanical procedure for
deciding, by running through the tree of all possible proofs, until you ﬁnd
a proof that an individual program you’re interested in halts, or you ﬁnd a
proof that it doesn’t. But that’s impossible because this is not a computable
function.
So Turing’s insight in 1936 is that incompleteness, that Gödel found in
1931, for any formal axiomatic theory, comes from a deeper phenomenon,
which is uncomputability. Incompleteness is an immediate corollary of
uncomputability, a concept which does not appear in Gödel’s 1931 paper.
But Turing’s paper has both good and bad aspects. There’s a negative
aspect of his 1936 paper, which I’ve just told you about, but there’s also a
positive aspect. You get another proof, a deeper proof of incompleteness,
but you also get a kind of completeness. You ﬁnd a perfect language.
There is no perfect language for mathematical reasoning. Gödel showed
that in 1931, and Turing showed it again in 1936. But what Turing also
showed in 1936 is that there are perfect languages, not for mathematical
reasoning, but for computation, for specifying algorithms.
What Turing discovers in 1936 is that there’s a kind of completeness
called universality and that there are universal Turing machines and universal
programming languages.
Universal Turing Machines / Programming Languages
What “universal” means, what a universal programming language or a uni-
versal Turing machine is, is a language in which every possible algorithm can
be written.
So on the one hand, Turing shows us in a deeper way that any language
for mathematical reasoning has to be incomplete, but on the other hand,
he shows us that languages for computation can be universal, which is just
another name, a synonym, for completeness.
There are perfect languages for computation, for writing algorithms, even
though there aren’t any perfect languages for mathematical reasoning. This
is the positive side, this is the completeness side, of Turing’s 1936 paper.
Now, what I’ve spent most of my professional life on, is a subject I call
algorithmic information theory
Algorithmic Information Theory (AIT)
that derives incompleteness from uncomputability by taking advantage of
a deeper phenomenon, by considering an extreme form of uncomputability,
which is called algorithmic randomness or algorithmic irreducibility.
AIT: algorithmic randomness, algorithmic irreducibility
There’s a perfect language again, and there’s also a negative side, the halt-
ing probability Ω, whose bits are algorithmically random, algorithmically
irreducible mathematical truths.
Ω = .010010111 . . .
This is a place in pure mathematics where there’s no structure. If you want
to know the bits of the numerical value of the halting probability, this is
a well-deﬁned mathematical question, and in the world of mathematics all
truths are necessary truths, but these look like accidental, contingent
truths. They look random, they have irreducible complexity.
This is a maximal case of uncomputability, this is a place in pure mathe-
matics where there’s absolutely no structure at all. Although it is true that
you can in a few cases actually know some of the ﬁrst bits. . .
There are actually an inﬁnite number of halting probabilities depending
on your choice of programming language. After you choose a language, then
you ask what is the probability that a program generated by coin tossing
will eventually halt. And that gives you a diﬀerent halting probability. The
numerical value will be diﬀerent; the paradoxical properties are the same.
Okay, there are cases for which you can get a few of the ﬁrst bits. For
example, if Ω starts with 1s in binary or 9s in decimal, you can know those
bits or digits, if Ω is .11111. . . base two or .99999. . . base ten. So you can get
a ﬁnite number of bits, perhaps, of the numerical value, but if you have an N-
bit formal axiomatic theory, then you can’t get more than N bits of Ω. That’s
sort of the general result. It’s irreducible logically and computationally. It’s
irreducible mathematical information. It’s a perfect simulation in pure math,
where all truths are necessary, of contingent, accidental, maximal entropy
truths.
So that’s the bad news from AIT. But just like in Turing’s 1936 work,
there is a positive side. On the one hand we have maximal uncomputabil-
ity, maximal entropy, total lack of structure, of any redundancy, in an
information-theoretic sense, but there’s also good news.
AIT, the theory of program-size complexity, the theory where Ω is the
crown jewel, goes further than Turing, and picks out from Turing’s universal
Turing machines, from Turing’s universal languages, maximally expressive
programming languages. Because those are the ones that you have to use to
develop this theory where you get to Ω.
AIT has the notion of a maximally expressive programming language in
which programs are maximally compact, and deals with a very basic complex-
ity concept which is the size of the smallest program to calculate something:
H(x) is the size in bits of the smallest program to calculate x.
And we now have a better notion of perfection. The perfect languages
that Turing found, the universal programming languages, are not all equally
good. We now concentrate on a subset, the ones that enable us to write the
most concise programs. These are the most expressive languages, the ones
with the smallest programs.
Now let me tell you, this deﬁnition of complexity is a dry, technical way
of expressing this idea in modern terms. But let me put this into Medieval
terminology, which is much more colorful. The notion of program-size com-
plexity — which by the way has many diﬀerent names: algorithmic complex-
ity, Kolmogorov complexity, algorithmic information content — in Medieval
terms, what we’re asking is, how many yes/no decisions did God have
to make to create something?, which is obviously a rather basic question
to ask. That is, if you consider that God is calculating the universe.
I’m giving you a Medieval perspective on these modern developments.
Theology is the fundamental physics, it’s the theoretical physics of the Middle
Ages.
I have a lot of time left — I’ve been racing through this material — so
maybe I should explain in more detail how AIT contributes to the quest for
the perfect language.
The notion of universal Turing machine that is used in AIT is Turing’s
very basic idea of a ﬂexible machine. It’s ﬂexible hardware, which we call soft-
ware. In a way, Turing in 1936 creates the computer industry and computer
technology. That’s a tremendous beneﬁt of a paper that mathematically
sounds at ﬁrst rather negative, since it talks about things that cannot be
calculated, that cannot be proved. But on the other hand there’s a very pos-
itive aspect — I stated it in theoretical terms — which is that programming
languages can be complete, can be universal, even though formal axiomatic
theories cannot be complete.
Okay, so you get this technology, there’s this notion of a ﬂexible machine,
this notion of software, which emerges in this paper. Von Neumann, the
same von Neumann who invented the von Neumann integers, credited all of
this to Turing. At least Turing is responsible for the concept; the hardware
implementation is another matter.
Now, AIT, where you talk about program-size complexity, the size of the
smallest program, how many yes/no decisions God has to make to calcu-
late something, to create something, picks out a particular class of universal
Turing machines U.
What are the universal computers U like that you use to deﬁne program-
size complexity and talk about Ω? Well, a universal computer U has the
property that for any other computer C and its program p, your universal
computer U will calculate the same result if you give it the original program
p for C concatenated to a preﬁx πC which depends only on the computer
C that you want to simulate. πC tells U which computer to simulate. In
symbols,
U(πC p) = C(p).
In other words, πC p is the concatenation of two pieces of information.
It’s a binary string. You take the original program p, which is also a binary
string, and in front of it you put a preﬁx that tells you which computer to
simulate.
Which means that these programs πC p for U are only a ﬁxed number of
bits larger than the programs p for any individual machine C.
These U are the universal Turing machines that you use in AIT. These
are the most expressive languages. These are the languages with maximal
expressive power. These are the languages in which programs are as concise
as possible. This is how you deﬁne program-size complexity. God will natu-
rally use the most perfect, most powerful programming languages, when he
creates the world, to build everything.
I should point out that Turing’s original universality concept was not
careful about counting bits; it didn’t really care about the size of programs.
All a universal machine U had to do was to be able to simulate any other
machine C, but one did not study the size of the program for U as a function
of the size of the program for C. Here we are careful not to waste bits.
AIT is concerned with particularly eﬃcient ways for U to be universal.
The original notion of universality in Turing was not this demanding.
The fact that you can just add a ﬁxed number of bits to a program for C
to get one for U is not completely trivial. Let me tell you why.
After you put πC and p together, you have to know where the preﬁx ends
and the program that is being simulated begins. There are many ways to do
this.
A very simple way to make the preﬁx πC self-delimiting is to have it be
a sequence of 0’s followed by a 1:
πC = 0^k 1.
And the number k of 0’s tells us which machine C to simulate. That’s a very
wasteful way to indicate this.
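As a rough illustration of U(πC p) = C(p) with the unary prefix 0^k 1, here is a toy sketch in Python. Everything in it is an assumption made for the example: the three "computers" are trivial string functions, and the prefix merely selects one of them from a table, which is much weaker than a genuine universal machine.

```python
from typing import Callable, List

# Hypothetical computers C_0, C_1, C_2: each maps a program (a bit string)
# to an output string. They stand in for arbitrary machines.
MACHINES: List[Callable[[str], str]] = [
    lambda p: p,                  # C_0: copy the program to the output
    lambda p: p[::-1],            # C_1: output the program reversed
    lambda p: str(p.count("1")),  # C_2: output the number of 1 bits
]


def prefix_for(k: int) -> str:
    """The self-delimiting prefix 0^k 1 that selects machine C_k."""
    return "0" * k + "1"


def U(program: str) -> str:
    """Toy universal machine: read 0^k 1, then run C_k on the remaining bits."""
    k = program.index("1")        # number of leading 0s (ValueError if no 1)
    return MACHINES[k](program[k + 1:])


if __name__ == "__main__":
    p = "1101"
    for k, C in enumerate(MACHINES):
        print(k, U(prefix_for(k) + p), C(p))
```

The program for U is only k + 1 bits longer than the program for C_k, a constant that depends on the machine being simulated and not on p.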
The preﬁx πC is actually an interpreter for the programming language C.
AIT’s universal languages U have the property that you give U an interpreter
plus the program p in this other language C, and U will run the interpreter
to see what p does.
If you think of this interpreter πC as an arbitrary string of bits, one way
to make it self-delimiting is to just double all the bits. 0 goes to 00, 1 goes
to 11, and you put a pair of unequal bits 01 as punctuation at the end:
Arbitrary πC: 0 → 00, 1 → 11, 01 at the end.
This is a better way to have a self-delimiting preﬁx that you can concatenate
with p. It only doubles the size, whereas the 0^k 1 trick increases the size exponentially.
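The bit-doubling scheme is easy to write down. The following Python sketch uses hypothetical function names; it encodes an arbitrary bit string as a self-delimiting prefix and then recovers both the prefix and whatever follows it from a single stream of bits.

```python
from typing import Tuple


def encode_doubled(bits: str) -> str:
    """Double every bit (0 -> 00, 1 -> 11) and append 01 as punctuation."""
    return "".join(b + b for b in bits) + "01"


def split_doubled(stream: str) -> Tuple[str, str]:
    """Read a doubled prefix from the front of stream; return (prefix, rest)."""
    out = []
    i = 0
    while True:
        pair = stream[i:i + 2]
        i += 2
        if pair == "01":                # unequal pair: end of the prefix
            return "".join(out), stream[i:]
        if pair in ("00", "11"):        # equal pair: one bit of the prefix
            out.append(pair[0])
        else:                           # "10", or the stream ran out
            raise ValueError("malformed self-delimiting prefix")


if __name__ == "__main__":
    stream = encode_doubled("101") + "0110"
    print(stream, split_doubled(stream))
```

The encoded prefix is 2n + 2 bits long for an n-bit input, and the decoder never needs to look past the 01 marker to know where the prefix ends.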
And there are more eﬃcient ways to make the preﬁx self-delimiting. For
example, you can put the size of the preﬁx in front of the preﬁx. But it’s
sort of like Russian dolls, because if you put the size |πC| of πC in front of
πC, |πC| also has to be self-delimiting:
U(. . . ||πC|| |πC| πC p) = C(p).
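One level of this Russian-doll nesting can be sketched in Python as follows. This is only an illustrative assumption about how one might do it: the length of the prefix is written in binary, and that short length header is itself made self-delimiting by bit doubling. The function names are hypothetical.

```python
from typing import Tuple


def encode_with_length(bits: str) -> str:
    """Put a self-delimiting length header in front of bits."""
    length = format(len(bits), "b")                 # |bits| in binary
    header = "".join(b + b for b in length) + "01"  # doubled bits, then 01
    return header + bits


def split_with_length(stream: str) -> Tuple[str, str]:
    """Read header and payload from the front of stream; return (payload, rest)."""
    i = 0
    length_bits = []
    while stream[i:i + 2] != "01":
        pair = stream[i:i + 2]
        if pair not in ("00", "11"):
            raise ValueError("malformed length header")
        length_bits.append(pair[0])
        i += 2
    i += 2                                          # skip the 01 marker
    n = int("".join(length_bits), 2)                # ValueError if header is empty
    if len(stream) < i + n:
        raise ValueError("stream shorter than announced length")
    return stream[i:i + n], stream[i + n:]


if __name__ == "__main__":
    pi = "1" * 100
    print(len(encode_with_length(pi)))              # header plus 100 payload bits
```

For an n-bit prefix the overhead is about 2 log2 n bits instead of n, and nesting further (a header for the header, and so on) shrinks it again.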
Anyway, picking U this way is the key idea in the original 1960s version
of AIT that Solomonoﬀ, Kolmogorov and I independently proposed. But
ten years later I realized that this is not the right approach. You actually
want the whole program πC p for U to be self-delimiting, not just the preﬁx
πC. You want the whole thing to be self-delimiting to get the right theory of
program-size complexity.
Let me compare the 1960s version of AIT and the 1970s version of AIT.
Let me compare these two diﬀerent theories of program-size complexity.
In the 1960s version, an N-bit string will in general need an N-bit pro-
gram, if it’s irreducible, and most strings are algorithmically irreducible.
Most N-bit strings need an N-bit program. These are the irreducible strings,
the ones that have no pattern, no structure. Most N-bit strings need an N-
bit program, because there aren’t enough smaller programs.
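The counting argument behind "there aren’t enough smaller programs" can be written out explicitly. This is a standard sketch, stated here for programs that are arbitrary bit strings as in the 1960s theory; the parameter k (the number of bits saved) is introduced only for this illustration.

```latex
\#\{\text{programs shorter than } N-k \text{ bits}\}
  = \sum_{j=0}^{N-k-1} 2^{j} = 2^{N-k} - 1 < 2^{N-k},
\qquad\text{so}\qquad
\frac{\#\{N\text{-bit strings with a program shorter than } N-k \text{ bits}\}}{2^{N}}
  < \frac{2^{N-k}}{2^{N}} = 2^{-k}.
```

For example, with k = 10, fewer than one N-bit string in a thousand can be compressed by more than 10 bits.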
But in the 1970s version of AIT, you go from N bits to N + log_2 N bits,
because you want to make the programs self-delimiting. An N-bit string will
usually need an N + log_2 N bit program:
Most N-bit strings
AIT1960: N bits of complexity,
AIT1970: N + log_2 N bits of complexity.
Actually, in AIT1970 it’s N plus H(N), which is the size of the smallest
self-delimiting program to calculate N, that’s exactly what that logarithmic
term is. In other words, in the 1970s version of AIT, the size of the smallest
program for calculating an N-bit string is usually N bits plus the size in bits
of the smallest self-delimiting program to calculate N, which is roughly
log N + log log N + log log log N + . . .
bits long. That’s the Russian dolls aspect of this.
The 1970s version of AIT, which takes the idea of being self-delimiting
from the preﬁx and applies it to the whole program, gives us even better
perfect languages. AIT evolved in two stages. First we concentrate on those
U with
U(πC p) = C(p)
with πC self-delimiting, and then we insist that the whole thing πC p has also
got to be self-delimiting. And when you do that, you get important new
results, such as the sub-additivity of program-size complexity,
H(x, y) ≤ H(x) + H(y),
which is not the case if you don’t make everything self-delimiting. This just
says that you can concatenate the smallest program for calculating x and the
smallest program for calculating y to get a program for calculating x and y.
And you can’t even deﬁne the halting probability Ω in AIT1960. If you
allow all N-bit strings to be programs, then you cannot deﬁne the halting
probability in a natural way, because the sum for deﬁning the probability
that a program will halt
Ω = Σ_{p halts} 2^{−(size in bits of p)}
diverges to inﬁnity instead of being between zero and one. This is the key
technical point in AIT.
I want the halting probability to be finite. The normal way of thinking
about programs is that there are 2^N N-bit programs, and the natural way
of defining the halting probability is that every N-bit program that halts
contributes 1/2^N to the halting probability. The only problem is that for
any fixed size N there are roughly on the order of 2^N programs that halt, so
if you sum over all possible sizes, you get infinity, which is no good.
In order to get the halting probability to be between zero and one
0 < Ω = Σ_{p halts} 2^{−(size in bits of p)} < 1
you have to be sure that the total probability summed over all programs p
is less than or equal to one. This happens automatically if we force p to
be self-delimiting. How can we do this? Easy! Pretend that you are the
universal computer U. As you read the program bit by bit, you have to
be able to decide by yourself where the program ends, without any special
punctuation, such as a blank, at the end of the program.
This implies that no extension of a valid program is a valid program, and
that the set of valid programs is what’s called a preﬁx-free set. Then the fact
that the sum that deﬁnes Ω must be between zero and one, is just a special
case of what’s called the Kraft inequality in Shannon information theory.
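A small numerical check may help here. The Python sketch below, with hypothetical function names and hand-picked example sets, tests whether a set of bit strings is prefix-free and computes the sum of 2^(−length) over the set. For a prefix-free set the sum stays at or below one, as the Kraft inequality requires; if every bit string is allowed as a program, each length contributes a full unit and the sum grows without bound.

```python
from fractions import Fraction
from typing import Iterable


def is_prefix_free(programs: Iterable[str]) -> bool:
    """True if no program in the set is a proper prefix of another."""
    progs = list(set(programs))
    return not any(a != b and b.startswith(a) for a in progs for b in progs)


def kraft_sum(programs: Iterable[str]) -> Fraction:
    """Sum of 2^(-len(p)) over the distinct programs p, as an exact fraction."""
    return sum((Fraction(1, 2 ** len(p)) for p in set(programs)), Fraction(0))


if __name__ == "__main__":
    self_delimiting = ["0", "10", "110", "111"]
    print(is_prefix_free(self_delimiting), kraft_sum(self_delimiting))

    # All bit strings of length 1, 2 and 3: not prefix-free.
    everything = [format(i, "0{}b".format(n))
                  for n in (1, 2, 3) for i in range(2 ** n)]
    print(is_prefix_free(everything), kraft_sum(everything))
```

In the second example each of the three lengths contributes exactly 1 to the sum, which is the divergence described above in miniature.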
But this technical machinery isn’t necessary. That 0 < Ω < 1 follows
immediately from the fact that as you read the program bit by bit you are
forced to decide where to stop without seeing any special punctuation. In
other words, in AIT1960 we were actually using a three-symbol alphabet for
programs: 0, 1 and blank. The blank told us where a program ends. But
that’s a symbol that you’re wasting, because you use it very little. As you
all know, if you have a three-symbol alphabet, then the right way to use it
is to use each symbol roughly one-third of the time.
So if you really use only 0s and 1s, then you have to force the Turing
machine to decide by itself where the program ends. You don’t put a blank
at the end to indicate that.
So programs go from $N$ bits in size to $N + \log_2 N$ bits, because you’ve got
to indicate in each program how big it is. On the other hand, you can just take
subroutines and concatenate them to make a bigger program, so program-
size complexity becomes sub-additive. You run the universal machine U to
calculate the ﬁrst object x, and then you run it again to calculate the second
object y, and then you’ve got x and y, and so
H(x, y) ≤ H(x) + H(y).
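Here is a small sketch of one way to make bit strings self-delimiting, so that concatenated programs can be read back without any blank between them. It is an assumption-laden illustration, not the construction used in AIT: the header is a simple Elias-gamma-style length code that costs about $2 \log_2 N$ extra bits, somewhat more than the $N + \log_2 N$ quoted above, and the names `encode`, `decode_one` and `decode_all` are hypothetical.

```python
def encode(payload: str) -> str:
    """Prefix `payload` (a string of '0'/'1') with a self-delimiting length header."""
    n_bin = bin(len(payload) + 1)[2:]        # length + 1, so empty payloads work
    # k-1 zeros announce that the length field is k bits long.
    return "0" * (len(n_bin) - 1) + n_bin + payload

def decode_one(stream: str, pos: int = 0):
    """Read one program starting at `pos`; return (payload, position after it)."""
    zeros = 0
    while stream[pos + zeros] == "0":        # count the leading zeros of the header
        zeros += 1
    start = pos + zeros
    length = int(stream[start:start + zeros + 1], 2) - 1
    body = start + zeros + 1
    return stream[body:body + length], body + length

def decode_all(stream: str):
    """Split a concatenation of encoded programs back into its payloads."""
    pos, payloads = 0, []
    while pos < len(stream):
        payload, pos = decode_one(stream, pos)
        payloads.append(payload)
    return payloads

x, y = "1011", "000111"
combined = encode(x) + encode(y)             # plain concatenation, no separator
print(decode_all(combined))
```

Because the reader always knows where each program ends, the size of the combined program is just the sum of the two sizes, which is the sub-additivity idea in miniature.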
These self-delimiting binary languages are the ones that the study of
program-size complexity has led us to discriminate as the ideal languages,
the most perfect languages. We got to them in two stages, AIT1960 and
AIT1970. These are languages for computation, for expressing algorithms,
not for mathematical reasoning. They are universal programming languages
that are maximally expressive, maximally concise. We already knew how to
do that in the 1960s, but in the 1970s we realized that programs should be
self-delimiting, which made it possible to deﬁne the halting probability Ω.
Okay, so that’s the story, and now maybe I should summarize all of this,
this saga of the quest for the perfect language. As I said, the search for the
perfect language has some negative conclusions and some positive conclu-
sions.
Hilbert wanted to ﬁnd a perfect language giving all of mathematical truth,
all mathematical knowledge, he wanted a formal axiomatic theory for all of
mathematics. This was supposed to be a Theory of Everything for the world
of pure math. And this cannot succeed, because we know that every formal
axiomatic theory is incomplete, as shown by Gödel, by Turing, and by my
halting probability Ω. Instead of ﬁnding a perfect language, a perfect for-
mal axiomatic theory, we found incompleteness, uncomputability, and even
algorithmic irreducibility and algorithmic randomness.
So that’s the negative side of this story, which is fascinating from an
epistemological point of view, because we found limits to what we can know,
we found limits of formal reasoning.
Now interestingly enough, the mathematical community couldn’t care
less. They still want absolute truth! They still believe in absolute truth, and
that mathematics gives absolute truth. And if you want a proof of this, just
go to the December 2008 issue of the Notices of the American Mathematical
Society. That’s a special issue of the Notices devoted to formal proof.
The technology has been developed to the point where they can run real
mathematics, real proofs, through proof-checkers, and get them checked. A
mathematician writes the proof out in a formal language, and ﬁlls in the
missing steps and makes corrections until the proof-checker can understand
the whole thing and verify that it is correct. And these proof-checkers are
getting smarter and smarter, so that more and more of the details can be
left out. As the technology improves, the job of formalizing a proof becomes
easier and easier.
The formal-proof extremists are saying that in the future all mathematics
will have to be written out formally and veriﬁed by proof-checkers.
The engineering has been worked out to the point that you can formally
prove real mathematical results and run them through proof-checkers for
veriﬁcation. For example, this has been done with the proof of the four-color
conjecture. It was written out as a formal proof that was run through a
proof-checker.
And the position of these extremists is that in the future all mathematics
will have to be written out in a formal language, and you will have to get
it checked before submitting a paper to a human referee, who will then only
have to decide if the proof is worth publishing, not whether the proof is
correct. And they want a repository of all mathematical knowledge, which
would be a database of checked formal proofs of theorems.
This is a substantial community, and to learn more, go to the December
2008 AMS Notices, which is available on the web for free on the AMS website.
This is being worked on by a sizeable community, and the Notices devoted a
special issue to it, which means that mathematicians still believe in absolute
truth.
I’m not disparaging this extremely interesting work, but I am saying that
there’s a wonderful intellectual tension between it and the incompleteness re-
sults that I’ve discussed in this talk. There’s a wonderful intellectual tension
between incompleteness and the fact that people still believe in formal proof
and absolute truth. People still want to go ahead and carry out Hilbert’s
program and actually formalize everything, just as if Gödel and Turing had
never happened!
I think this is an extremely interesting and, at least for me, a quite
unexpected development.
These were the negative conclusions from this saga. Now I want to wrap
this talk up by summarizing the positive conclusions.
There are perfect languages, for computing, not for reasoning. They’re
computer programming languages. And we have universal Turing machines
and universal programming languages, and although languages for reason-
ing cannot be complete, these universal programming languages are com-
plete. Furthermore, AIT has picked out the most expressive programming
languages, the ones that are particularly good to use for a theory of program-
size complexity.
So there is a substantial practical spinoﬀ. Furthermore, since I’ve worked
most of my professional career on AIT, I view AIT as a substantial contri-
bution to the search for the perfect language, because it gives us a measure
of expressive power, and of conceptual complexity and the complexity
of ideas. Remember, I said that from the perspective of the Middle Ages,
that’s how many yes/no decisions God had to make to create something,
which obviously He will do in an optimum manner.2
From the theoretical side, however, this quest was disappointing due to
Gödel incompleteness and because there is no Theory of Everything for pure
math. Provably there is no TOE for pure math. In fact, if you look at the
bits of the halting probability Ω, they show that pure mathematics contains
inﬁnite irreducible complexity, and in this precise sense is more like biology,
the domain of the complex, than like theoretical physics, where there is still
hope of finding a simple, elegant TOE.3
2
Note that program-size complexity = size of smallest name for something.
So this is the negative side of the story, unless you’re a biologist. The
positive side is we get this marvelous programming technology. So this dream,
the search for the perfect language and for absolute knowledge, ended in the
bowels of a computer, it ended in a Golem.
In fact, let me end with a Medieval perspective on this. How would all
this look to someone from the Middle Ages? This quest, the search for the
perfect language, was an attempt to obtain magical, God-like powers.
Let’s bring someone from the 1200s here and show them a notebook
computer. You have this dead machine, it’s a machine, it’s a physical object,
and when you put software into it, all of a sudden it comes to life!
So from the perspective of the Middle Ages, I would say that the perfect
languages that we’ve found have given us some magical, God-like powers,
which is that we can breathe life into some inanimate matter. Observe that
hardware is analogous to the body, and software is analogous to the soul,
and when you put software into a computer, this inanimate object comes to
life and creates virtual worlds.
So from the perspective of somebody from the year 1200, the search for
the perfect language has been successful and has given us some magical,
God-like abilities, except that we take them entirely for granted.
Thanks very much!4
3
Incompleteness can be considered good rather than bad: It shows that mathematics
is creative, not mechanical.
4
Twenty minutes of questions and discussion followed. These have not been transcribed,
but are available via digital streaming video at http://pirsa.org/09090007/.
Chapter 3
Is the world built out of
information? Is everything
software?
From Chaitin, Costa, Doria, After Gödel, in preparation. Lecture, the Technion, Haifa,
Thursday, 10 June 2010.
Now for some even weirder stuﬀ! Let’s return to The Thirteenth Floor and
to the ideas that we brieﬂy referred to in the introductory section of this
chapter.
Let’s now turn to ontology: What is the world built out of, made out of?
Fundamental physics is currently in the doldrums. There is no pressing
unexpected, new experimental data — or if there is, we can’t see that it
is! So we are witnessing a return to pre-Socratic philosophy with its em-
phasis on ontology rather than epistemology. We are witnessing a return
to metaphysics. Metaphysics may be dead in contemporary philosophy, but
amazingly enough it is alive and well in contemporary fundamental physics
and cosmology.
There are serious problems with the traditional view that the world is
a space-time continuum. Quantum ﬁeld theory and general relativity con-
tradict each other. The notion of space-time breaks down at very small
distances, because extremely massive quantum ﬂuctuations (virtual parti-
cle/antiparticle pairs) should provoke black holes and space-time should be
torn apart, which doesn’t actually happen.
Here are two other examples of problems with the continuum, with very
small distances:
• the inﬁnite self-energy of a point electron in classical Maxwell electro-
dynamics,
• and in quantum ﬁeld theory, renormalization, which Dirac never ac-
cepted.
And here is an example of renormalization: the inﬁnite bare charge of the
electron which is shielded by vacuum polarization via virtual pair formation
and annihilation, so that far from an electron it only seems to have ﬁnite
charge. This is analogous to the behavior of water, which is a highly polarized
molecule forming micro-clusters that shield charge, with many of the highly
positive hydrogen-ends of H2O near the highly negative oxygen-ends of these
water molecules.
In response to these problems with the continuum, some of us feel that
the traditional
Pythagorean ontology:
God is a mathematician,
the world is built out of mathematics,
should be changed to this more modern
→ Neo-Pythagorean ontology:
God is a programmer,
the world is built out of software.
In other words, all is algorithm!
There is an emerging school, a new viewpoint named digital philosophy.
Here are some key people and key works in this new school of thought: Ed-
ward Fredkin, http://www.digitalphilosophy.org, Stephen Wolfram, A New
Kind of Science, Konrad Zuse, Rechnender Raum (Calculating Space), John
von Neumann, Theory of Self-Reproducing Automata, and Chaitin, Meta
Math!.1
These may be regarded as works on metaphysics, on possible digital
worlds. However there have in fact been parallel developments in the world
of physics itself.
1
Lesser known but important works on digital philosophy: Arthur Burks, Essays on
Cellular Automata, Edgar Codd, Cellular Automata.
Quantum information theory builds the world out of qubits, not matter.
And phenomenological quantum gravity and the theory of the entropy of
black holes suggest that any physical system contains only a finite number
of bits of information that grows, amazingly enough, as the surface area
of the physical system, not as its volume — hence the name holographic
principle. For more on the entropy of black holes, the Bekenstein bound, and
the holographic principle, see Lee Smolin, Three Roads to Quantum Gravity.
One of the key ideas that has emerged from this research on possible
digital worlds is to transform the universal Turing machine, a machine
capable of running any algorithm, into the universal constructor, a ma-
chine capable of building anything:
Universal Turing Machine → Universal Constructor.
And this leads to the idea of an information economy: worlds in which
everything is software, worlds in which everything is information and you can
construct anything if you have a program to calculate it. This is like magic
in the Middle Ages. You can bring something into being by invoking its true
name. Nothing is hardware, everything is software!2
A more modern version of this everything-is-information view is presented
in two green-technology books by Freeman Dyson: The Sun, the Genome and
the Internet, and A Many-Colored Glass. He envisions seeds to grow houses,
seeds to grow airplanes, seeds to grow factories, and imagines children using
genetic engineering to design and grow new kinds of ﬂowers! All you need is
water, sun and soil, plus the right seeds!
From an abstract, theoretical mathematical point of view, the key concept
here is an old friend from Chapter 2:
H(x) = the size in bits of the smallest program to compute x.
H(x) is also equal to the minimum amount of algorithmic information needed
to build/construct x; in Medieval language, the number of yes/no decisions
God had to make to create x; and in biological terms, roughly the amount of
DNA needed for growing x.
It requires the self-delimiting programs of Chapter 2 for the following
intuitively necessary condition to hold:
H(x, y) ≤ H(x) + H(y) + c.
2
On magic in the Middle Ages, see Umberto Eco, The Search for the Perfect Language,
and Allison Coudert, Leibniz and the Kabbalah.
This says that algorithmic information is sub-additive: If it takes H(x) bits of
information to build x and H(y) bits of information to build y, then the sum
of that suﬃces to build both x and y. Furthermore, the mutual information,
the information in common, has this important property:
\[ H(x) + H(y) - H(x, y) = H(x) - H(x|y^*) + O(1) = H(y) - H(y|x^*) + O(1). \]
Here
H(x|y) = the size in bits of the smallest program to compute x from y.
This triple equality tells us that the extent to which it is better to build
x and y together rather than separately (the bits of subroutines that are
shared, the amount of software that is shared) is also equal to the extent
to which knowing a minimum-size program y* for y helps us to know x, and to
the extent to which knowing a minimum-size program x* for x helps us to
know y. (This triple equality is an idealization; it holds only in the limit of
extremely large compute times for x and y.)
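H itself is uncomputable, so nothing below computes it. As a very rough illustration of the idea of shared information, one can use the size of a compressed file as a computable stand-in and look at how much is saved by compressing two objects together. This is a sketch under loud assumptions: zlib's output size is only a crude proxy that behaves like an upper bound, concatenation stands in for the pair (x, y), and the function names are hypothetical.

```python
import random
import zlib

def c_bits(data: bytes) -> int:
    """Compressed size in bits: a crude, computable stand-in for H (not H itself)."""
    return 8 * len(zlib.compress(data, 9))

def shared_bits(x: bytes, y: bytes) -> int:
    """Proxy for H(x) + H(y) - H(x, y), using concatenation for the pair."""
    return c_bits(x) + c_bits(y) - c_bits(x + y)

def noise(seed: int, n: int) -> bytes:
    """n pseudo-random bytes, which zlib cannot compress in any useful way."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

x = noise(0, 2000)
y = x[:1000] + noise(1, 1000)   # y reuses the first half of x
z = noise(2, 2000)              # z is built independently of x

print(shared_bits(x, y))  # comparatively large: half of x can be reused
print(shared_bits(x, z))  # comparatively small: nothing to share
```

The point of the comparison is only qualitative: building x and y together is cheaper than building them separately exactly to the extent that they overlap.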
These results about algorithmic information/complexity H are a kind
of economic meta-theory for the information economy, which is the asymp-
totic limit, perhaps, of our current economy in which material resources
(petroleum, uranium, gold) are still important, not just technological and
scientiﬁc know-how.
But as astrophysicist Fred Hoyle points out in his science ﬁction novel
Ossian’s Ride, the availability of unlimited amounts of energy, say from nu-
clear fusion reactors, would make it possible to use giant mass spectrometers
to extract gold and other chemical elements directly from sea water and soil.
Material resources would no longer be that important.
If we had unlimited energy, all that would matter would be know-how,
information, knowing how to build things. And so we ﬁnally end up with the
idea of a printer for objects, a more plebeian term for a universal constructor.
There are already commercial versions of such devices. They are called 3D
printers and are used for rapid prototyping and digital fabrication. They are
not yet universal constructors, but the trend is clear. . . 3
In Medieval terms, results about H(x) are properties of the size of spells,
they are about the complexity of magic incantations! The idea that every-
thing is software is not as new as it may seem.
3
One current project is to build a 3D printer that can print a copy of itself. See
http://reprap.org.
Bibliography
[1] A. Burks, Essays on Cellular Automata, University of Illinois Press
(1970).
[2] G. J. Chaitin, Meta Math!, Pantheon (2005).
[3] E. Codd, Cellular Automata, Academic Press (1968).
[4] A. Coudert, Leibniz and the Kabbalah, Kluwer (1995).
[5] F. Dyson, The Sun, the Genome and the Internet, Oxford University
Press (1999).
[6] F. Dyson, A Many-Colored Glass, University of Virginia Press (2007).
[7] U. Eco, The Search for the Perfect Language, Blackwell (1995).
[8] E. Fredkin, http://www.digitalphilosophy.org.
[9] F. Hoyle, Ossian’s Ride, Harper (1959).
[10] J. von Neumann, Theory of Self-Reproducing Automata, University of
Illinois Press (1966).
[11] L. Smolin, Three Roads to Quantum Gravity, Basic Books (2001).
[12] S. Wolfram, A New Kind of Science, Wolfram Media (2002).
[13] K. Zuse, Rechnender Raum (Calculating Space), Vieweg (1969).
Chapter 4
The information economy
S. Zambelli, Computable, Constructive and Behavioural Economic Dynamics, Routledge,
2010, pp. 73–78.
In honor of Kumaraswamy Velupillai’s 60th birthday
Abstract: One can imagine a future society in which natural resources are
irrelevant and all that counts is information. I shall discuss this possibil-
ity, plus the role that algorithmic information theory might then play as a
metatheory for the amount of information required to construct something.
Introduction
I am not an economist; I work on algorithmic information theory (AIT). This
essay, in which I present a vision of a possible future information economy,
should not be taken too seriously. I am merely playing with ideas and trying
to provide some light entertainment of a kind suitable for this festschrift
volume, given Vela’s deep appreciation of the relevance of foundational issues
in mathematics for economic theory.
In algorithmic information theory, you measure the complexity of some-
thing by counting the number of bits in the smallest program for calculating
it:
program → Universal Computer → output.
If the output of a program could be a physical or a biological system, then
this complexity measure would give us a way to measure the difficulty of
explaining how to construct or grow something, in other words, measure
either traditional smokestack or newer green technological complexity:
software → Universal Constructor → physical system,
DNA → Development → biological system.
And it is possible to conceive of a future scenario in which technology is
not natural-resource limited, because energy and raw materials are freely
available, but is only know-how limited.
In this essay, I will outline four diﬀerent versions of this dream, in order
to explain why I take it seriously:
1. Magic, in which knowing someone’s secret name gives you power over
them,
2. Astrophysicist Fred Hoyle’s vision of a future society in his science-
ﬁction novel Ossian’s Ride,
3. Mathematician John von Neumann’s cellular automata world with its
self-reproducing automata and a universal constructor,
4. Physicist Freeman Dyson’s vision of a future green technology in which
you can, for example, grow houses from seeds.
As these four examples show, if an idea is important, it’s reinvented, it keeps
being rediscovered. In fact, I think this is an idea whose time has come.
Secret/True Names and the Esoteric Tradition
“In the beginning was the Word, and the Word was with God,
and the Word was God.” John 1:1
Information, in the form of knowing someone’s secret/true name, is very
important in the esoteric tradition [1, 2]:
• Recall the German fairy tale in which the punch line is “Rumpelstiltskin
is my name!” (the Brothers Grimm).
• You have power over someone if you know their secret name.
• You can summon a demon if you know its secret name.
• In the Garden of Eden, Adam acquired power over the animals by
naming them.
• God’s name is never mentioned by Orthodox Jews.
• The golem in Prague was animated by a piece of paper with God’s
secret name on it.
• Presumably God can summon a person or thing into existence by calling
its true name.
• Leibniz was interested in the original sacred Adamic language of cre-
ation, the perfect language in which the essence/true nature of each
substance or being is directly expressed, as a way of obtaining ultimate
knowledge. His project for a characteristica universalis evolved from
this, and the calculus evolved from that. Christian Huygens, who had
taught Leibniz mathematics in Paris, hated the calculus [3], because
it eliminated mathematical creativity and arrived at answers mechani-
cally and inelegantly.
Fred Hoyle’s Ossian’s Ride
The main features in the future economy that Hoyle imagines are:
• Cheap and unlimited hydrogen to helium fusion power,
• Therefore raw materials readily available from sea-water, soil and air
(for example, using extremely large-scale and energy intensive mass
spectrometer-like devices [Gordon Lasher, private communication]).
• And with essentially free energy and raw materials, all that counts is
technological know-how, which is just information.
Perhaps it’s best to let Hoyle explain this in his own words [4]:
[T]he older established industries of Europe and America. . .
grew up around specialized mineral deposits—coal, oil, metallic
ores. Without these deposits the older style of industrialization
was completely impossible. On the political and economic fronts,
the world became divided into “haves” and “have-nots,” depend-
ing whereabouts on the earth’s surface these specialized deposits
happened to be situated. . .
In the second phase of industrialism. . . no specialized deposits
are needed at all. The key to this second phase lies in the pos-
session of an eﬀectively unlimited source of energy. Everything
here depends on the thermonuclear reactor. . . With a thermonu-
clear reactor, a single ton of ordinary water can be made to yield
as much energy as several hundred tons of coal—and there is no
shortage of water in the sea. Indeed, the use of coal and oil as a
prime mover in industry becomes utterly ineﬃcient and archaic.
With unlimited energy the need for high-grade metallic ores
disappears. Low-grade ones can be smelted—and there is an am-
ple supply of such ores to be found everywhere. Carbon can be
taken from inorganic compounds, nitrogen from the air, a whole
vast range of chemical from sea water.
So I arrived at the rich concept of this second phase of industri-
alization, a phase in which nothing is needed but the commonest
materials—water, air and fairly common rocks. This was a phase
that can be practiced by anybody, by any nation, provided one
condition is met: provided one knows exactly what to do. This
second phase was clearly enormously more eﬀective and powerful
than the ﬁrst.
Of course this concept wasn’t original. It must have been at
least thirty years old. It was the second concept that I was more
interested in. The concept of information as an entity in itself,
the concept of information as a violently explosive social force.
In Hoyle’s fantasy, this crucial information — including the design of ther-
monuclear reactors — that suddenly propels the world into a second phase
of industrialization comes from another world. It is a legacy bequeathed to
humanity by a nonhuman civilization desperately trying to preserve anything
it can when being destroyed by the brightening of its star.
John von Neumann’s Cellular Automata World
This cellular automata world ﬁrst appeared in lectures and private working
notes by von Neumann. These ideas were advertised in an article in Scientific
American in 1955 that was written by John Kemeny [5]. Left unﬁnished
because of von Neumann’s death in 1957, his notes were edited by Arthur
Burks and ﬁnally published in 1966 [6]. Burks then presented an overview
in [7]. Key points:
• World is a discrete crystalline medium.
• Two-dimensional world, graph paper, divided into square cells.
• Each square has 29 states.
• Time is quantized as well as space.
• State of each square is the same universal function of its previous state
and the previous states of its 4 immediate neighbors (square itself plus
up, down, left, right immediate neighbors).
• Universal constructor can assemble any quiescent array of states.
• Then you have to start the device running.
• The universal constructor is part of von Neumann’s self-reproducing
automata.
The crucial point is that in von Neumann’s toy world, physical systems
are merely discrete information, that is all there is. And there is no dif-
ference between computing a string of bits (as in AIT) and “computing”
(constructing) an arbitrary physical system.
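The update scheme in the key points above can be sketched in a few lines of Python. This is only a toy: it uses two states per cell and a parity rule chosen for illustration, not von Neumann's 29-state transition function, and the grid wraps around at the edges, which is an assumption made to keep the code short.

```python
def step(grid):
    """One synchronous update: every cell applies the same local rule."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # The cell itself plus its up, down, left, right neighbours.
            neighbourhood = (
                grid[r][c]
                + grid[(r - 1) % rows][c]
                + grid[(r + 1) % rows][c]
                + grid[r][(c - 1) % cols]
                + grid[r][(c + 1) % cols]
            )
            new[r][c] = neighbourhood % 2   # toy parity rule, two states only
    return new

# Start from a single live cell in the middle of a 7 x 7 world.
world = [[0] * 7 for _ in range(7)]
world[3][3] = 1
for _ in range(2):
    world = step(world)
    print("\n".join("".join(".#"[cell] for cell in row) for row in world), "\n")
```

Space and time are both discrete here, and the whole "physics" of the world is the single function inside `step`, applied identically at every square.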
I should also mention that starting from scratch, Edgar Codd came up
with a simpler version of von Neumann’s cellular automata world in 1968 [8].
In Codd’s model cells have 8 states instead of 29.
Freeman Dyson’s Green Technology
Instead of Hoyle’s vision of a second stage of traditional smokestack heavy
industry, Dyson [9, 10] optimistically envisions a green-technology small-is-
beautiful do-it-yourself grass-roots future.
The emerging technology that may someday lead to Dyson’s utopia is be-
coming known as “synthetic biology” and deals with deliberately engineered
organisms. This is also referred to as “artiﬁcial life,” the development of
“designer genomes.” To produce something, you just create the DNA for it.
Here are some key points in Dyson’s vision:
• Solar electrical power obtained from modiﬁed trees. (Not from ther-
monuclear reactors!)
• Other useful devices/machines grown from seeds. Even houses grown
from seeds?!
• School children able to design and grow new plants, animals.
• Mop up excessive carbon dioxide or produce fuels from sugar (actual
Craig Venter projects [11]).
On a much darker note, to show how important information is, there
presumably exists a sequence of a few-thousand DNA bases (A, C, G, T)
for the genome of a virus that would destroy the human race, indeed, most
life on this planet. With current or soon-to-be-available molecular biology
technology, genetic engineering tools, anyone who knew this sequence could
easily synthesize the corresponding pathogen. Dyson’s utopia can easily turn
into a nightmare.
AIT as an Economic Metatheory
So one can imagine scenarios in which natural resources are irrelevant and
all that counts is technological know-how, that is, information. We have just
seen four such scenarios. In such a world, I believe, AIT becomes, not an
economic theory, but perhaps an economic metatheory, since it is a theory
of information, a theory about the properties of technological know-how, as
I will now explain.
The main concept in AIT is the amount of information H(X) required to
compute (or construct) something, X. This is measured in bits of software,
the number of bits in the smallest program that calculates X. Brieﬂy, one
refers to H(X) as the complexity of X. For an introduction to AIT, please
see [12, 13].
In economic terms, H(X) is a measure of the amount of technological
know-how needed to produce X. If X is a hammer, H(X) will be small. If
X is a sophisticated military aircraft, H(X) will be quite large.
Two other concepts in AIT are the joint complexity H(X, Y ) of produc-
ing X and Y together, and the relative complexity H(X|Y ) of producing
X if we are given Y for free.
Consider now two objects, X and Y . In AIT,
H(X) + H(Y ) − H(X, Y )
is referred to as the mutual information in X and Y . This is the extent to
which it is cheaper to produce X and Y together than to produce X and Y
separately, in other words, the extent to which the technological know-how
needed to produce X and Y can be shared, or overlaps. And there is a basic
theorem in AIT that states that this is also
H(X) − H(X|Y ),
which is the extent to which being given the know-how for Y helps us to
construct X, and it’s also
H(Y ) − H(Y |X),
which is the extent to which being given the know-how for X helps us to
construct Y . This is not earth-shaking, but it’s nice to know.
(For a proof of this theorem about mutual information, please see [14].)
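A short sketch of why the three expressions agree, assuming the chain rule for joint complexity in the form $H(X, Y) = H(Y) + H(X|Y) + O(1)$. That form is an assumption of this sketch: in AIT it holds only up to an additive constant and with the condition taken to be a minimum-size program for $Y$. Substituting it gives

\[ H(X) + H(Y) - H(X, Y) = H(X) + H(Y) - H(Y) - H(X|Y) + O(1) = H(X) - H(X|Y) + O(1). \]

Since $H(X, Y) = H(Y, X) + O(1)$, exchanging the roles of $X$ and $Y$ gives in the same way

\[ H(X) + H(Y) - H(X, Y) = H(Y) - H(Y|X) + O(1). \]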
One of the reasons that we get these pleasing properties is that AIT is
like classical thermodynamics in that time is ignored. In thermodynamics,
heat engines operate very slowly, for example, reversibly. In AIT, the time
or eﬀort required to construct something is ignored, only the information
required is measured. This enables both thermodynamics and AIT to have
clean, simple results. They are toy models, as they must be if we wish to
prove nice theorems.
Conclusion
Clearly, we are not yet living in an information economy. Oil, uranium,
gold and other scarce, precious limited natural resources still matter. But
someday we may live in an information economy, or at least approach it
asymptotically. In such an economy, everything is, in eﬀect, software; hard-
ware is comparatively unimportant. This is a possible world, though perhaps
not yet our own world.
References
1. A. Coudert, Leibniz and the Kabbalah, Kluwer, Dordrecht, 1995.
2. U. Eco, The Search for the Perfect Language, Blackwell, Oxford, 1995.
3. J. Hofmann, Leibniz in Paris 1672–1676, Cambridge University Press,
1974, p. 299.
4. F. Hoyle, Ossian’s Ride, Harper & Brothers, New York, 1959, pp. 157–
158.
5. J. Kemeny, “Man viewed as a machine,” Scientiﬁc American, April
1955, pp. 58–67.
6. J. von Neumann, Theory of Self-Reproducing Automata, University of
Illinois Press, Urbana, 1966. (Edited and completed by Arthur W.
Burks.)
7. A. Burks (ed.), Essays on Cellular Automata, University of Illinois
Press, Urbana, 1970.
8. E. Codd, Cellular Automata, Academic Press, New York, 1968.
9. F. Dyson, The Sun, the Genome, & the Internet, Oxford University
Press, New York, 1999.
10. F. Dyson, A Many-Colored Glass, University of Virginia Press, Char-
lottesville, 2007.
11. C. Venter, A Life Decoded, Viking, New York, 2007.
12. G. Chaitin, Meta Maths, Atlantic Books, London, 2006.
13. G. Chaitin, Thinking about Gödel and Turing, World Scientific,
Singapore, 2007.
14. G. Chaitin, Exploring Randomness, Springer-Verlag, London, 2001, pp.
95–96.
1 July 2008
Chapter 5
How real are real numbers?
We discuss mathematical and physical arguments against continuity and in
favor of discreteness, with particular emphasis on the ideas of Émile Borel
(1871–1956). Lecture given Tuesday, 22 September 2009, at Wilfrid Laurier
University in Waterloo, Canada.
I’m not going to give a tremendously serious talk on mathematics today.
Instead I will try to entertain and stimulate you by showing you some really
weird real numbers.
I’m not trying to undermine what you may have learned in your mathe-
matics classes. I love the real numbers. I have nothing against real numbers.
There’s even a real number — Ω — that has my name on it.1
But as you
will see, there are some strange things going on with the real numbers.
Let’s start by going back to Turing’s famous 1936 paper in the Proceedings
of the London Mathematical Society; mathematicians proudly claim it created
the computer industry, which is not quite right of course.
But it does have the idea of a general-purpose computer and of hardware
and software, and it is a wonderful paper.
This paper is called “On computable numbers, with an application to the
Entscheidungsproblem.” And what most people forget, and is the subject of
my talk today, is that when Turing talks about computable numbers, he’s
talking about computable real numbers.
1
See the chapter on “Chaitin’s Constant” in Steven Finch, Mathematical Constants,
Cambridge University Press, 2003.
Turing, 1936: “On computable numbers. . . ”
But when you work on a computer, the last thing on earth you’re ever going
to see is a real number, because a real number has an inﬁnite number of digits
of precision, and computers only have ﬁnite precision. Computers don’t quite
measure up to the exalted standards of pure mathematics.
One of the important contributions of Turing’s paper, not to computer
technology but to pure mathematics, and even to philosophy and epistemol-
ogy, is that Turing’s paper distinguishes very clearly between real numbers
that are computable and real numbers that are uncomputable.
What is a real number? It’s just a measurement made with inﬁnite pre-
cision. So if I have a straight line one unit long, and I want to ﬁnd out where
a point is, that corresponds to a real number. If it is all the way to the left
in this unit interval, it’s 0.0000. . . If the point is all the way to the right, it’s
1.0000. . . If it is exactly in the middle, that’s .50000. . . And every point on
this line corresponds to precisely one real number. There are no gaps.
0.0 ——– 0.5 ——– 1.0
So, if you just tell me exactly where a point is, that’s a real number. From
the point of view of geometrical intuition a real number is something very
simple: it is just a point on a line. But from an arithmetical point of view, if
you want to calculate its numerical value digit by digit or bit by bit if you’re
using binary, it turns out that real numbers are problematical.
Even though to geometrical intuition points are the most natural and
elementary thing you can imagine, if you want to actually calculate the value
of a real number with inﬁnite precision, you can get into big trouble. Actually,
you never calculate it with inﬁnite precision. What Turing says is that you
calculate it with arbitrary precision.
His notion of a computable real number is a real number that you can
calculate as accurately as you may wish.
I guess he actually says it is an inﬁnite calculation. You start calculating
its numerical value, if you’re using decimal, digit by digit, or if you’re using
binary, bit by bit. You have the integer part, the decimal point, and then
you have an inﬁnite string of bits or digits, depending on your base, and the
computer will grind away gradually giving you more and more bits or more
and more digits of the numerical value of the number.
So that’s a computable real number. According to Turing, that means it
is a real number for which there is an algorithm, a mechanical procedure, for
calculating its value with arbitrary accuracy, with more and more precision.
For example, π is a computable real number, √2 is a computable real
number, and e is a computable real number. In fact, every real number
you’ve ever encountered in your math classes, every individual real number
that you’ve ever encountered, is a computable real number.
Computable reals: π, √2, e, 1/2, 3/4 . . .
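To make "arbitrary precision" concrete, here is a minimal Python sketch, not anything taken from Turing's paper, that produces as many decimal digits of √2 as you ask for. The function name sqrt2_digits is an illustrative choice; the only library call is the standard math.isqrt, which computes exact integer square roots.

```python
from math import isqrt

def sqrt2_digits(n: int) -> str:
    """Return sqrt(2) truncated (not rounded) to n decimal digits."""
    # isqrt(2 * 10**(2n)) equals floor(sqrt(2) * 10**n), computed exactly
    # with integers, so no floating-point rounding is involved.
    scaled = isqrt(2 * 10 ** (2 * n))
    s = str(scaled)
    # The integer part of sqrt(2) is the single digit 1.
    return s[0] + "." + s[1:] if n > 0 else s

print(sqrt2_digits(10))  # 1.4142135623
```

Asking for a larger n simply makes the program run longer. That is the sense in which √2 is computable: one finite program yields every digit, given enough time.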
These are the familiar real numbers, the computable ones, but surprisingly
enough, Turing points out that there are also lots of uncomputable real
numbers.
Dramatically enough, the moment Turing comes up with the computer
as a mathematical concept — mathematicians call this a universal Turing
machine — he immediately points out that there are things no computer
can do. And one thing no computer can do is calculate the value of an
uncomputable real number.
How does Turing show that there are uncomputable reals? Well, the ﬁrst
argument he gives goes back to Cantor’s theory of inﬁnite sets, which tells us
that the set of real numbers is an inﬁnite set that is bigger, that is inﬁnitely
more numerous, than the set of computer programs.
The possible computer programs are just as numerous as the positive
integers, as the whole numbers 1, 2, 3, 4, 5. . . but the set of real numbers is
a much bigger inﬁnity.
So in fact there are more uncomputable reals than computable reals.
From Cantor’s theory of inﬁnite sets, we see that the set of uncomputable
reals is just as big as the set of all reals, while the set of computable reals
is only as big as the set of whole numbers. The set of uncomputable reals is
much bigger than the set of computable reals.
#{uncomputable reals} = #{all reals} = 2^ℵ0,
#{computable reals} = #{computer programs} = #{whole numbers} = ℵ0,
2^ℵ0 > ℵ0.
The set of computable reals is as numerous as the computer programs, be-
cause each computable real needs a computer program to calculate it. And
the computer programs are as numerous as the whole numbers 1, 2, 3, 4, 5. . .
because you can think of a computer program as a very big whole number.
In base-two a whole number is just a string of bits, which is all a computer
program is.
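The identification of programs with whole numbers can be sketched in a few lines of Python. The two function names below are hypothetical, and the leading 1 byte is just one convenient way (an assumption of this sketch) to make sure that programs beginning with zero bytes still get distinct numbers.

```python
def program_to_number(source: str) -> int:
    """Read the bytes of a program text as one big base-two numeral."""
    data = source.encode("utf-8")
    # Prefix a 1 byte so that leading zero bytes are not lost.
    return int.from_bytes(b"\x01" + data, "big")

def number_to_program(n: int) -> str:
    """Invert program_to_number: recover the program text from its number."""
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:].decode("utf-8")  # drop the 1 byte that was prefixed

n = program_to_number("print(6 * 7)")
assert number_to_program(n) == "print(6 * 7)"
```

Different program texts always get different whole numbers under this encoding, which is all the counting argument needs.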
That most reals are uncomputable was quite a surprise, but Turing
doesn’t stop with that. In his famous paper he uses a technique from set
theory called Cantor’s diagonal argument to exhibit an individual example
of an uncomputable real.
Turing’s 1936 paper has an intellectual impact and a technological impact.
From a technological point of view his paper is fantastic because as we all
know the computer has changed our lives; we can’t imagine living without
it. In 1936 Turing has the idea of a ﬂexible machine, of ﬂexible hardware, of
what we now call software. A universal machine changes to be like any other
machine when you insert the right software. This is a very deep concept:
a ﬂexible digital machine. You don’t need lots of special-purpose machines,
you can make do with only one machine. This is the idea of a general-purpose
computer, and it is in Turing’s wonderful 1936 paper, before anybody built
such a device.
But then immediately Turing points out that there are real numbers
that can’t be calculated, that no machine can furnish you with better and
better approximations for; in fact there are more uncomputable reals than
computable reals.
So in a sense this means that most individual real numbers are like myth-
ical beasts, like Pegasus or unicorns — name your favorite mythical object
that unfortunately doesn’t exist in the real world.
In this talk I will concentrate on the uncomputable reals, not the reals that
are familiar to all of us like π, √2 and 3/4. I’ll tell you about some surprising
real numbers, ones that are in the shadow, that we cannot compute, and
whose numerical values are quite elusive.
And the first thing I’d like to say on this topic is that there actually
was an example of an uncomputable real before Turing’s 1936 paper. It was
in a short essay by a wonderful French mathematician, now largely
forgotten, called Émile Borel. In 1927, without anticipating in any way
Turing’s wonderful paper, Borel does point out a real number that we can
now recognize as an uncomputable real.
Borel’s 1927 number is a very paradoxical real number, and I’d like to
tell you about it. Let’s see if you enjoy it as much as I do!
Borel’s idea is to have a know-it-all real number: It’s an oracle that knows
the answer to every yes/no question.
Borel was a Frenchman, and he imagined writing all possible yes/no ques-
tions in French in a numbered list. So each question has its number, and then
what you do is consider the real number with no integer part, and whose Nth
decimal, the Nth digit after the decimal point of this number, answers the
Nth question in the list of all possible questions.
Borel’s 1927 know-it-all real: The Nth digit answers the Nth question.
If the answer to the Nth question is “yes,” then the Nth digit is, say, a 1, and
if the answer is “no” then that digit will be a 2. You have an inﬁnite number
of digits, so you can pack an inﬁnite number of answers in Borel’s number.
If you can imagine a list of all possible yes/no questions, this number will be
like an oracle that will answer them all.
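The packing itself is elementary, as a toy Python sketch shows. It handles only a finite list of answers that we already know, which is exactly what Borel's number does not give us; the function names are illustrative, and the digits 1 and 2 follow the convention just described.

```python
def pack_answers(answers: list[bool]) -> str:
    """Write the answers as decimal digits: 1 for yes, 2 for no."""
    return "0." + "".join("1" if a else "2" for a in answers)

def read_answer(number: str, n: int) -> bool:
    """Look up the answer to the Nth question (N counted from 1)."""
    # number starts with "0.", so the Nth digit sits at index n + 1.
    return number[n + 1] == "1"

oracle_prefix = pack_answers([True, False, True])
print(oracle_prefix)                  # 0.121
print(read_answer(oracle_prefix, 2))  # False
```

All the difficulty lies in obtaining the digits, not in reading them off.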
If you could know the value of this magical number with suﬃcient accu-
racy, you could answer any particular, individual question.
Needless to say, this number is a bit fantastic. So why did Borel come up
with this crazy example? To show us that there are real numbers that are
not for real.
Borel’s amazing number will give you the answer to every yes/no question
in mathematics and in physics, and about the past and the future.
You can ask Borel’s oracle paradoxical questions, like
“Is the answer to this question no?”
And then you have a problem, because whether you answer “no” or you
answer “yes,” it will always be the wrong answer. There is no correct answer.
Another problem is if you ask
“Will I drink coﬀee tomorrow?”
And depending on what you ﬁnd in Borel’s number, you do the opposite.
You will just have to put up with not drinking coﬀee for one day, perhaps,
to refute the know-it-all number.
So these are some paradoxical aspects of Borel’s idea. Another problem
is, how do you make a list of all possible yes/no questions in order to give a
number to each question?
Actually, it is easy to number each question. What you do is make a list of
all possible texts drawn from the alphabet of the language you’re interested
in, and you have ten possibilities for each digit in Borel’s oracle number, and
so far we’ve only used the digits 1 and 2. So you can use the other digits to
say that that sequence of characters from that particular national alphabet
isn’t grammatical, or it’s not a question, or it’s a question but it’s not a
yes/no question, or it’s a yes/no question which has no answer because it’s
paradoxical, and maybe it’s a yes/no question which has an answer but I
don’t want to tell you about the future because I want to avoid the coﬀee
problem.
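Listing all possible texts is a purely mechanical matter. A minimal Python sketch, assuming a tiny two-letter alphabet for illustration, generates every finite text in order of length, and alphabetically within each length, so that each text receives a definite position N in the list:

```python
from itertools import count, islice, product

def all_texts(alphabet: str):
    """Yield every nonempty finite string over the alphabet, shortest first."""
    for length in count(1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

# The first six texts over the toy alphabet "ab".
print(list(islice(all_texts("ab"), 6)))  # ['a', 'b', 'aa', 'ab', 'ba', 'bb']
```

With the full alphabet of French the list grows much faster, but every text, and hence every question, still turns up at some finite position.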
Another way to try to ﬁx Borel’s number is to restrict it to just give
the answer to mathematical questions. You can imagine an inﬁnite list of
mathematical questions, you pick a formal language in which you ask yes/no
mathematical questions, you pick a notation for asking such questions, and
there is certainly no problem with having Borel’s know-it-all number answer
only mathematical questions. That will get rid of the paradoxes. You can
make that work simply because you can pack an inﬁnite amount of
information in a real number.2
And this magical number could be represented, in principle, by two metal
rods, if you believe in inﬁnite precision lengths. You have a rod that is exactly
one unit long, one meter long, and you have another rod that is smaller, whose
length is precisely the know-it-all number. You have your standard meter,
and you have something less than a meter long whose length is precisely the
magical know-it-all number.
If you could measure the size of the smaller rod relative to the standard
meter with arbitrary accuracy, you could answer every mathematical ques-
tion, if somebody gave you this magical metal rod.3
Of course, we are assuming that you can make measurements with inﬁnite
precision, which any physicist who is here will say is impossible. I think that
the most accurate measurement that has ever been made has under twenty
digits of precision.
But in theory having these two rods would give us an oracle. It would be
like having Borel’s know-it-all number.
And now you’re not going to be surprised to hear that if you fix Borel’s
number so that it is at least well-defined, not paradoxical, it is in fact
uncomputable. For otherwise you could compute the answer to every question,
which is implausible.
2 As long as we avoid self-reference, i.e., giving this know-it-all number a name and then
having a digit ask about itself in a way that requires that very digit to be different. E.g.,
is the digit of Borel’s know-it-all number that corresponds to this very question a 2?
3 This is an argument against infinite divisibility of space. In 1932 Hermann Weyl gave
an argument against the infinite divisibility of time. If time is really infinitely divisible,
then a machine could perform one step of a calculation in one second, then another step in
half a second, the next in 1/4 of a second, then in 1/8 of a second, then 1/16 of a second,
and in this manner would perform an infinite number of steps in precisely
1 + 1/2 + 1/4 + . . . = 2 seconds. But no one believes that such so-called Zeno machines
are actually possible.
But Borel did not really have the notion of computability nor of what
a computer is. He was working with the idea of computation intuitively,
informally, without deﬁning it. He said that as far as he was concerned this
number is conceivable but not really legitimate. He felt that his real number
was too wild.
Borel had what is now called a constructive attitude. His personal view
which he states in that little 1927 essay with the oracle number, is that he
believes in a real number if in principle it can be calculated, if in theory
there’s some way to do that. And he talks about this counter-example, this
number which is conceivable but in his opinion is not really legitimate.
If we remove Borel’s number, that will leave a hole in the line, in the
unit interval [0, 1] = {0 ≤ x ≤ 1}. So we’ve broken the unit interval into
two pieces because we just eliminated one point, which is very unfortunate
geometrically. But let’s go on.
Okay, so this was 1927, this was Émile Borel with his paradoxical
know-it-all real number that answers every yes/no question, and it is a
mathematical fantasy, not a reality. Now let’s go back to Turing.
In 1936 Turing points out that there are more uncomputable reals than
computable ones. Now I’d like to tell you something that Turing doesn’t
point out, which uses ideas from Émile Borel, who was one of the inventors
of what’s called measure theory or probability theory.
It turns out that if you choose a real number x between zero and one,
x ∈ [0, 1], and you have uniform probability of picking any real number
between zero and one, then the probability is unity that the number x will be
uncomputable. The probability is zero that the number will be computable.
Prob{uncomputable reals} = 1, Prob{computable reals} = 0.
That’s not too diﬃcult to see. Now I’ll give you a mathematical proof of
this.
You want to cover all the computable reals in the unit interval with a
covering which can be arbitrarily small. That’s the way to prove that the
computable reals have measure zero, which means that they are an inﬁnites-
imal part of the unit interval, that they have zero probability. Technically,
you say they have measure zero.
Remember that every computable real corresponds to a program, the
program to calculate it, and the programs are essentially positive integers,
so there’s a ﬁrst program, a second program, a third program. . . So you can
imagine all the computable reals in a list. There will be a ﬁrst computable
real, a second, a third. . .
And I cover the first computable real with an interval. I put on top of it
an interval of length ε/2. And then I cover the second computable real with
an interval of length ε/4. I cover the third computable real with an interval
of length ε/8 . . . then ε/16, then ε/32 . . .
So the total size of the covering is
ε/2 + ε/4 + ε/8 + ε/16 + ε/32 + . . . = ε.
Some of these intervals may overlap; it doesn’t really matter. What does
matter is that the total length of all these intervals is exactly ε, and you can
make ε as small as you want.
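The arithmetic of the covering can be checked exactly. In the Python sketch below, ε is the small positive number that sets the total size of the covering, the Kth interval has length ε/2^K, and the function name covering_length is an illustrative choice. Exact fractions are used so that no rounding intrudes.

```python
from fractions import Fraction

def covering_length(eps: Fraction, k: int) -> Fraction:
    """Total length of the first k intervals eps/2, eps/4, ..., eps/2**k."""
    return sum((eps / 2 ** i for i in range(1, k + 1)), Fraction(0))

eps = Fraction(1, 1000)
for k in (1, 5, 20):
    total = covering_length(eps, k)
    # Every partial sum equals eps * (1 - 1/2**k), so it stays below eps.
    assert total == eps * (1 - Fraction(1, 2 ** k))
    print(k, float(total))
```

However many computable reals have been covered so far, the total length used is still less than ε, and the infinite sum is exactly ε.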
So this is just a proof that something is of measure zero. I’m taking the
trouble to show that you can corner all the computable real numbers in the
unit interval by covering them. You can do it with a covering that you can
make as small as you please, which means that the computable reals have
zero probability, they occupy zero real-estate in the unit interval.
This is a proof that the computable reals really are exceptional. But
these exceptions are all that we meet in our normal, everyday experience.
The fact that uncomputable reals have probability unity doesn’t help us to
find any concrete examples!
To repeat, if I pick a real number at random between zero and one, it
is possible to get a computable real, but it is inﬁnitely unlikely. Probability
zero in this circumstance doesn’t mean impossibility, it just means that it’s
an inﬁnitesimal probability, it is inﬁnitely unlikely. It is possible. It would
be miraculous, but it can happen.
The way mathematicians say this, is that real numbers are almost surely
uncomputable.
This is a bit discouraging to those of us who prefer computable reals to
uncomputable ones. Or maybe it is a bit surprising that all the individual
reals in our normal experience are exceptional. It does make me think that
perhaps real numbers are problematic and cannot be taken for granted. What
do you think?
What’s another way to put it? In other words, the real numbers are like
a Swiss cheese with a lot of holes. In fact, it’s all holes! It’s like looking at
the night sky. There are stars in the night sky, those are the computable
reals, but the background is always black — the stars are the exception.
So that is how the real numbers look!
All the reals we know and love are exceptional.
And Borel goes a little farther in his last book, written when he was in
his eighties. In 1952 he published a book called Les nombres inaccessibles —
Inaccessible Numbers — in which he points out that most real numbers can’t
even be referred to or named individually in any way. The real numbers that
you can somehow name, or pick out as individuals, even without being able
to compute them, have to be the exception, with total probability zero.
Most real numbers cannot even be named as individuals in any way,
constructive or non-constructive. The way somebody put it is, most reals
are wall-ﬂowers, they’ll never be invited to dance!
Prob{individually nameable reals} = 0, Prob{un-nameable reals} = 1.
Okay, so what I’d like to do in the rest of this talk is to take Borel’s crazy,
know-it-all, oracle real number, and try to make it as realistic as possible.
We’ve gotten ourselves into a bit of a quandary. I tend to believe in
something if I can calculate it; if so, that mathematical object has concrete
meaning for me. So I have a sort of constructive attitude in math.
But there is this surprising fact that in some sense most mathematical
facts or objects seem to be beyond our reach. Most real numbers can never
be calculated, they’re uncomputable, which suggests that mathematics is full
of things that we can’t know, that we can’t calculate.
This is related to something famous called Gödel’s incompleteness theorem
from 1931, five years before Turing. Gödel’s 1931 theorem says that
given any finite set of axioms, there will be true mathematical statements
that escape, that are true but can’t be proven from those axioms. So
mathematics resists axiomatization. There is no Theory of Everything for pure
mathematics.
Gödel, 1931: No TOE for pure mathematics!
And what Turing shows in 1936 is that there are a lot of things in mathemat-
ics that you can never calculate, that are beyond our reach because there’s
no way to calculate the answer. And in fact real numbers are an example:
most real numbers, with probability one, cannot be calculated.
So it might be nice to try to come up with an example of a particular real
number that can’t be calculated, and try to make these strange, mysterious,
hidden real numbers as concrete as possible.
I’d like to show you a real number — Ω — which is as real as I can make
it, but is nevertheless uncomputable. That’s my goal.
In other words, there is what you can calculate or what you can prove in
mathematics, and then there is this vast cloud of unknowing, of things
that you can’t know or calculate. And I would like to try to ﬁnd something
right at the border between the knowable and the unknowable. I’m going to
make it as real as possible, but it’s going to be just beyond our reach, just
beyond what we can calculate.
I want to show you a real number that can almost be calculated, which
is as close as possible to seeming real, concrete, but in fact escapes us and is
an example of this ubiquitous phenomenon of uncomputability that Turing
discovered in 1936, of numbers which cannot be computed.
How can we come up with a number like this? I’ll do it by combining ideas
from Turing with ideas from Borel, and then using compression to eliminate
all the redundancy. And the result will be my Ω number. This is not how I
actually discovered Ω, but I think it is a good way to understand Ω.
In his 1936 paper Turing discovered what’s called the halting problem.
This famous paper took years to digest, and it was a while before mathe-
maticians realized how important the halting problem is. Another important
idea in this paper is the notion of a universal Turing machine. Of course, he
doesn’t call it a Turing machine, that name came later. So if you look there
you don’t ﬁnd the words “Turing machine.”
Another thing that is in this paper, though you won’t find it under that
name, is a very famous result called the unsolvability of the halting problem,
which I will explain now. It’s not easy to spot in the paper and it’s not
called that, but the idea is certainly there.
It took years of work on this paper by a community to extract the essential
ideas, give them catchy names, and start waving ﬂags with those names on
them.
So let me tell you about the halting problem, which is a very fundamental
thing that Turing came up with.
Remember that Turing has the idea of a general-purpose computer, and
then since he’s a pure mathematician, he immediately starts pointing out
that there are things that no computer can calculate. There are things that
no algorithm can achieve, which there is no mechanical way to calculate.
One of these things is called the halting problem. What is the halting
problem? It’s a very simple question. Let’s say you’re given a computer
program, and it’s a computer program that is self-contained, so it cannot ask
for input, it cannot read in any data. If there is any data it needs all that
data has to be included in the program as a constant.
And the program just starts calculating, it starts executing. And there
are two possibilities: Does the program go on forever, or at some point does
it get a result and say “I’m ﬁnished,” and halt? That’s the question.
Does a self-contained computer program ever halt?
So you’re given a program that’s self-contained, and want to know what will
happen. It’s a self-contained program, you just start running it — the process
is totally mechanical — and there are two possibilities: The ﬁrst possibility
is that this process will go on forever, that the program is searching for
something that it will never ﬁnd, and is in some kind of an inﬁnite loop.
The other possibility is that the program will eventually ﬁnd what it is
looking for, and maybe produce a result; at any rate, it will halt and stop
and it is ﬁnished.
And you can ﬁnd out which is the case by running it. You run the
program, and if it stops eventually you are going to discover that, if you are
patient enough.
The problem is what if it never stops; running the program cannot de-
termine that. You can give up after running the program for a day or for a
week, but you can’t be sure it is never going to stop.
So Turing asks the very deep question, “Is there a general procedure,
given a program, for deciding in advance, without running it, whether it is
going to go on forever or whether it is eventually going to stop?” You want
an algorithm for deciding this. You want an algorithm which will take a
ﬁnite amount of time to decide, and will always give you the correct answer.
And what Turing shows is that there is no general method for deciding,
there is no algorithm for doing this; deciding whether a program halts or not
isn’t a computable function.
This is a very simple question involving computer programs that always
has an answer — the program either goes on forever or not — but there’s no
mechanical procedure for deciding, there’s no algorithm which always gives
you the correct answer, there’s no general way, given a computer program,
to tell what is going to happen.
For individual programs you can sometimes decide, you can even settle
inﬁnitely many cases, but there’s no general way to decide.
This is the famous result called the unsolvability of the halting problem.
Turing, 1936: Unsolvability of the halting problem!
However if you are a practical person from the business school, you may say,
“What do I care?” And I would have to agree with you. You may well say,
“All I care is will the program stop in a reasonable amount of time, say, a
year. Who is going to wait more than a year?”
But if you want to know if a program will stop in a ﬁxed amount of time,
that’s very easy to do, you just run it for that amount of time and see what
happens. There is no unsolvability, none at all.
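The bounded version of the question really is this easy. In the following Python sketch a "program" is modelled, purely as an assumption for illustration, by a generator function that yields once per step; halts_within and the two sample programs are hypothetical names.

```python
from typing import Callable, Iterator

def halts_within(program: Callable[[], Iterator[None]], max_steps: int) -> bool:
    """Run the program for at most max_steps steps; True if it finished."""
    steps = program()
    for _ in range(max_steps):
        try:
            next(steps)        # execute one more step
        except StopIteration:
            return True        # the program halted within the budget
    return False               # budget exhausted; it may or may not halt later

def counts_to_ten() -> Iterator[None]:
    for _ in range(10):
        yield

def loops_forever() -> Iterator[None]:
    while True:
        yield

print(halts_within(counts_to_ten, 100))  # True
print(halts_within(loops_forever, 100))  # False
```

Note what a False result means here: only that the program did not stop within the budget. It says nothing about what happens afterwards, and that gap is where the unsolvability lives.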
You only get into trouble when there’s no time limit.
So you may say that this is sort of a fantasy, because in the real world
there is always a time limit: We’re not going to live forever, or you’re going
to run out of power, or the computer is going to break down, or be crushed by
glaciers, or the continents will shift and a volcano will melt your computer,
or the sun will go nova, whatever horror you want to contemplate!
And I agree with you. The halting problem is a theoretical question. It
is not a practical question. The world of mathematics is a toy world where
we ask fantasy questions, which is why we have nice theories that give nice
answers. The real world is messy and complicated. The reason you can use
reasoning and prove things in pure mathematics is because it’s a toy world,
it’s much simpler than the real world.
Okay, so this question, the halting problem, is not a real question, it’s
an abstract, philosophical question. If you suspected that, I agree with you,
you were right! But I like to live in the world of ideas. It’s a game, you may
say, it’s a fantasy world, but that’s the world of pure mathematics.
So let’s start with the question that Turing proved unsolvable in his 1936
paper. There is no general method, no mechanical procedure, no algorithm to
answer this question that will always work. So what I do is play a trick of
the kind that you use in a field of physics called statistical mechanics,
which is to take an individual problem and embed it in a space, an ensemble
of all possible problems of that type. That’s a well-known strategy.
In other words, instead of asking if an individual program halts or not,
let’s look at the probability that this will happen, taken over the ensemble
of all possible programs. . .
But ﬁrst let me tell you why it is sometimes very important to know
whether a program halts or not. You may say, “Who cares?” Well, in
pure mathematics it is important because there are famous mathematical
conjectures which it turns out are equivalent to asking whether a program
halts or not.
There’s a lovely example from ancient Greece. There’s something called
a perfect number. A number is perfect if it is the sum of all its proper
divisors, that is, all its divisors other than the number itself.
(Equivalently, if you include the number itself as one of the divisors, the
sum is twice the number.) So 6 is a perfect number, because its proper
divisors are 1, 2 and 3, and
6 = 1 + 2 + 3.
That’s a perfect number.
If the sum of the proper divisors is more than the number, then it’s abundant;
if the sum is less than the number, then it is deficient; and if the sum is
exactly equal to the number, then it is perfect. Furthermore, two different
numbers are amicable if each one is the sum of the proper divisors of the
other.
And there are lots of perfect numbers. The next perfect number is 28.
28 = 1 + 2 + 4 + 7 + 14.
The question is, are there any odd perfect numbers? This is a question that
goes back to ancient Greece, to Pythagoras, Euclid and Plato.
Are there odd perfect numbers?
So the question is, are there any odd perfect numbers? And the answer,
amazingly enough, is that nobody knows. It’s a very simple question, the
concepts go back two millennia, but all the perfect numbers that have been
found are even, and nobody knows if there’s an odd perfect number.
Now, in principle you could just start a computer program going, have
it look at each odd number, ﬁnd its divisors, add them, and see whether
the sum is exactly the number. So if there’s an odd perfect number, we’re
eventually going to ﬁnd it.
If the program never ends, then all the perfect numbers are even. It
searches for an odd perfect number and either it halts because it found one,
or goes on forever without ever ﬁnding what it is looking for.
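Here is a minimal Python sketch of such a search program. The names are illustrative, and the optional limit parameter is an addition of this sketch so that the example can be run safely. With limit=None it is precisely the kind of program whose halting is in question.

```python
from typing import Optional

def sum_proper_divisors(n: int) -> int:
    """Sum of the divisors of n other than n itself."""
    if n < 2:
        return 0
    total = 1                   # 1 divides every n >= 2
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:     # do not count a square root twice
                total += n // d
        d += 1
    return total

def first_odd_perfect(limit: Optional[int] = None) -> Optional[int]:
    """Search the odd numbers 3, 5, 7, ... for a perfect number.

    With limit=None the loop halts if and only if an odd perfect
    number exists. Nobody knows whether it does.
    """
    n = 3
    while limit is None or n <= limit:
        if sum_proper_divisors(n) == n:
            return n
        n += 2
    return None

print(sum_proper_divisors(6), sum_proper_divisors(28))  # 6 28
print(first_odd_perfect(limit=10_000))                  # None
```

The bounded run only reports that there is no odd perfect number up to the limit. Settling the conjecture would mean knowing the fate of the unbounded run.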
It turns out that most of the famous conjectures in mathematics, but not
all, are equivalent to asking whether a computer program halts. The general
idea is that most down-to-earth mathematical questions are instances of the
halting problem. However whether or not there are inﬁnitely many perfect
numbers — which is also unknown — is not a case of the halting problem.
On the other hand, a famous conjecture called the Riemann hypothesis is
an instance of the halting problem. And there’s Fermat’s Last Theorem, ac-
tually a three-century old conjecture which has now been proven by Andrew
Wiles, stating that there is no solution of
x^N + y^N = z^N (x, y, z integers > 0, N integer ≥ 3).
These are all conjectures which if false can be refuted by a numerical counter-
example. You can search systematically for a counter-example using a com-
puter program, hence that kind of mathematical conjecture is equivalent to
asking whether a program halts or not.
There’s a program that systematically looks for solutions of x^N + y^N = z^N,
and there’s a program that systematically looks for zeros of the Riemann zeta
function that are in the wrong place. The Riemann hypothesis is complicated,
but if it’s false, there is a ﬁnite calculation which refutes it, and you can search
systematically for that. (The Riemann hypothesis is important because if it’s
true, then the prime numbers are smoothly distributed in a certain precise
technical sense. This seems to be the case but no one can prove it.)
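For Fermat's equation the systematic search is easy to sketch in Python. The version below is bounded so that it terminates, which is an assumption of this illustration; the real search would keep enlarging the bound forever. The function name and the min_exponent parameter are hypothetical; the parameter is there only so that the search can be seen to succeed for N = 2.

```python
from typing import Optional, Tuple

def find_solution(bound: int, min_exponent: int = 3) -> Optional[Tuple[int, int, int, int]]:
    """Look for x**n + y**n == z**n with 1 <= x <= y <= bound, 1 <= z <= bound
    and min_exponent <= n <= bound. Return (x, y, z, n), or None if none is found."""
    for n in range(min_exponent, bound + 1):
        # Table of nth powers, so that z can be looked up directly.
        powers = {z ** n: z for z in range(1, bound + 1)}
        for x in range(1, bound + 1):
            for y in range(x, bound + 1):
                z = powers.get(x ** n + y ** n)
                if z is not None:
                    return (x, y, z, n)
    return None

print(find_solution(20, min_exponent=2))  # (3, 4, 5, 2)
print(find_solution(20))                  # None
```

With exponent 2 allowed, the search stops at once at 3, 4, 5. With N ≥ 3 it finds nothing, and by Wiles's theorem the unbounded search would never halt.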
What I’m trying to say is that a lot of famous mathematical conjectures
are equivalent to special cases of the halting problem. If you had a way of
solving the halting problem that would be pretty nifty. It would be great
to have an oracle for the halting problem. Which by the way is Turing’s
terminology, but not in that famous 1936 paper. In another paper he talks
about oracles, which is a lovely term to use in pure mathematics.
Following Borel 1927, we know how to pack the answers to all possible
cases of the halting problem into one real number, and this gives us a more
realistic version of Borel’s magical know-it-all oracle number. You use the
successive bits of a real number to give the answer to every individual case
of the halting problem.
Remember that you can think of a computer program as a whole number,
as an integer. You can number all the programs. In binary machine language
a computer program is just a long bit string, and you can think of it as the
base-two numeral for a big whole number. So every program is also a whole
number.
And then if a program is the number N, the Nth program in a list of all
possible programs, you use the Nth bit of a real number to tell us whether
or not that program halts. If the Nth program halts, the Nth bit will be a
1; if it doesn’t halt, the Nth bit will be a 0.
Halting-problem oracle number:
The Nth bit answers the Nth case of the halting problem.
This is a more realistic version of Borel’s 1927 oracle number. And following
Turing’s 1936 paper it is uncomputable. Why?
Because if you could compute this real number, you could solve the halt-
ing problem, you could decide whether any self-contained program will halt,
and this would enable you to settle a lot of famous mathematical conjectures,
for instance the Riemann hypothesis. The Clay Mathematics Institute has
oﬀered a million dollar prize to the person who settles the Riemann hypoth-
esis, but only if they settle it positively, I think. But it would also be very
interesting to refute the Riemann hypothesis.
There is a bit in this real number which corresponds to the program that
looks for a refutation of the Riemann hypothesis. If you could know what this
particular bit is, that wouldn’t actually be worth a million dollars, because it
wouldn’t give you a proof. Nevertheless this is a piece of information that a
lot of mathematicians would like to know because the Riemann hypothesis is
a famous problem in pure mathematics having to do with the prime numbers
and how smoothly they are distributed.
So a halting-problem oracle would be a valuable thing to have. This
number wouldn’t tell you about history or the future; it wouldn’t answer
every yes/no question in French. But Borel’s 1927 number is paradoxical.
Our halting-problem oracle is a much more down-to-earth number. In spite
of being more down to earth, it is an uncomputable real that would also be
very valuable.
But we can do even better! This halting-problem oracle packs a lot of
mathematical information into one real number, but it doesn’t do it in the
best, most economical way. This real number is redundant, it repeats a lot
of information, it’s not the most compact, concise way to give the answer to
every case of the halting problem. You’re wasting a lot of bits, you’re wasting
a lot of space in this real number, you’re repeating a lot of information.
Let me tell you why. We want to know whether individual programs halt
or not. Now I’ll give the second and last proof in this talk.
Suppose that we are given a lot of individual cases of the halting problem.
Suppose we have a list of a thousand or a million programs, and want to know
if each one halts or not. These are all self-contained programs.
If you have a thousand programs or a million programs, you might think
that to know whether each of these programs halts or not is a thousand or a
million bits of mathematical information. And it turns out that it’s not, it’s
actually only ten or twenty bits of mathematical information.
N cases of the halting problem = only $\log_2 N$ bits of information.
Why isn’t it a thousand or a million bits of information?
Well, you don’t need to know the answer in every individual case. You
don’t want to ask the oracle too many questions. Oracles should be used
sparingly.
Do we really need to ask the oracle about each individual program? Not
at all! It is enough to know how many of the programs halt; I don’t need to
know each individual case.
And that’s a lot less information. If there are $2^N$ programs, you only
need N bits of information, not $2^N$ bits. You don’t need to know about each
individual case. As I said, you just need to know how many of the programs
halt. If there are N programs, that’s just $\log_2 N$ bits of information, which
is much less than N bits of information.
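The arithmetic can be checked directly. The count of halting programs is some whole number from 0 to N, so writing it down takes roughly $\log_2 N$ bits; the helper name below is an assumption for this sketch.

```python
def bits_for_count(n_programs: int) -> int:
    """Bits needed to write any count from 0 to n_programs in binary."""
    return n_programs.bit_length()


thousand = bits_for_count(1_000)
million = bits_for_count(1_000_000)
```

This gives ten bits for a thousand programs and twenty for a million, matching the figures quoted above.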
How come we get this huge savings?
Let’s say you are given a ﬁnite set of programs, you have a ﬁnite collection
of programs, and you want to know whether each one halts or not. Why does
it suﬃce to know how many of these programs halt? You just start running
all of them in parallel, and they start halting, and eventually all the programs
that will ever halt, have halted. And if you know exactly how many that is,
you don’t have to wait any longer, you can stop at that point. You know
that all the other programs will never halt. All the ones that haven’t halted
yet are never going to halt.
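The procedure just described can be sketched with toy programs. Here each "program" is a Python generator that either finishes after some number of steps or runs forever; that modelling choice and all the names are assumptions for illustration.

```python
def halting(steps):
    """Toy program that halts after `steps` steps."""
    def run():
        for _ in range(steps):
            yield
    return run


def looping():
    """Toy program that never halts."""
    def run():
        while True:
            yield
    return run


def decide_with_count(programs, how_many_halt):
    """Run all programs in parallel, one step each per round.

    Stops as soon as `how_many_halt` of them have halted; every program still
    running at that point never halts. If the count given is too large, this
    loop itself never halts, so the count must be exactly right.
    """
    running = {i: prog() for i, prog in enumerate(programs)}
    halted = set()
    while len(halted) < how_many_halt:
        for i in list(running):
            try:
                next(running[i])
            except StopIteration:
                halted.add(i)
                del running[i]
    return [i in halted for i in range(len(programs))]


programs = [halting(3), looping(), halting(10), looping()]
answers = decide_with_count(programs, how_many_halt=2)
```

A single number, the count, is enough to settle all four cases.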
In other words, the answers to individual instances of the halting problem
are never independent, they are always correlated. These are not independent
mathematical facts. That’s why we don’t really need to ask an oracle in each
individual case whether a program halts. We can compress this information
a great deal. This information has a lot of redundancy. There are a lot of
correlations in the answers to individual instances of the halting problem.
Okay, so you don’t need to use a bit for each program to get a real
number that’s an oracle for the halting problem. I just told you how to do
much better if you are only interested in a ﬁnite set of programs. But what
if you are interested in all possible programs, what then? Well, here’s how
you handle this.
You don’t ask whether individual programs halt or not; you ask what is
the probability that a program chosen at random will halt.
Halting probability Ω = Prob{random program halts}.
That’s a real number between zero and one, and it is a real number I’m very
proud of. I like to call it Ω, which is the last letter in the Greek alphabet,
because it’s sort of a maximally unknowable real number.
Let me explain ﬁrst how you deﬁne Ω, and then I’ll talk about its remark-
able properties.
The idea is this: I’m taking Turing’s halting problem and I’m making it
into the halting probability. Turing is interested in individual programs and
asks whether or not they halt. I take all possible programs, I put them into
a bag, a big bag that contains every possible computer program, I close my
eyes, I shake the bag, I reach in and pull out a program and ask, “What is
the probability that this program will halt?”
If every program halts, this probability would be one. If no program
halts, the probability of halting would be zero. Actually some programs halt
and some don’t, so the halting probability is going to be strictly between
zero and one
0 < Ω = .11011100 . . . < 1,
with an exact numerical value depending on the choice of programming lan-
guage.
And it turns out that if you do things properly — there are some technical
problems that I don’t want to talk about — you don’t really need to know
for every individual program whether it halts or not. What you really need
to know is what is the probability that a program will halt.
And the way it works is this: If I know the numerical value of the halting
probability Ω with N bits of precision — I’m writing it in binary, in base two
— if I know the numerical value of the halting probability Ω with N bits of
precision, then I know for every program up to N bits in size whether or not
it halts.
Can you see why? Try thinking about it for a while.
Knowing N bits of Ω ⇒ Knowing which ≤ N bit programs halt.
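One way to see why is to try the argument on a toy language. Everything about the language below is an assumption made for illustration: program k is the bit string of k ones followed by a zero (so no program is a prefix of another, which is one standard way of handling the technical problems alluded to above), and by decree program k halts after k*k + 1 steps unless k is a multiple of 3. For this toy, Ω works out to 3/7 = .011011... in binary.

```python
from fractions import Fraction


def halts_within(k: int, t: int) -> bool:
    """True if toy program k (which is k + 1 bits long) has halted after t steps."""
    return k % 3 != 0 and k * k + 1 <= t


def omega_lower_bound(t: int) -> Fraction:
    """Sum of 2^-|p| over programs of at most t bits that halt within t steps."""
    return sum(
        (Fraction(1, 2 ** (k + 1)) for k in range(t) if halts_within(k, t)),
        Fraction(0),
    )


def decide_from_omega_bits(omega_bits: str) -> dict:
    """Given the first N bits of the toy Omega, decide every program of <= N bits."""
    n = len(omega_bits)
    omega_n = Fraction(int(omega_bits, 2), 2 ** n)  # Omega truncated to N bits
    t = 1
    while omega_lower_bound(t) < omega_n:  # wait until enough programs have halted
        t += 1
    # A program of <= N bits that halted later would add at least 2^-N, pushing
    # Omega to omega_n + 2^-N or more, which the known N bits rule out.
    return {k: halts_within(k, t) for k in range(n)}


verdicts = decide_from_omega_bits("011011")
```

The toy Ω is computable only because the toy halting rule is; the point of the sketch is the stopping argument in the comment, which is the same for a real Ω.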
This is a very compact, compressed way — in fact, it is the most com-
pressed, compact way — of giving the answers to Turing’s halting problem.
You can show this is the best possible compression, the best possible oracle,
this is the most economical way to do it, this is the algorithmic information
content of the halting problem.
Let me try to explain this. Do you know about ﬁle compression programs?
There are lots of compression programs on your computer, and I’m taking
all the individual answers to the halting problem and compressing them.
So whatever your favorite compression program is, let’s use it to com-
press all the answers to the halting problem. If you could compress it per-
fectly, you’d get something that has absolutely no redundancy, something
that couldn’t be compressed any more.
So you get rid of all the redundancy in individual answers to the halting
problem, and what you get is this number I call the halting probability Ω.
This is just the most compact, compressed way to give you the answer to all
the individual cases of Turing’s famous 1936 halting problem.
Even though Ω is a very valuable number because it solves the halting
problem, the interesting thing about it is that it is algorithmically and log-
ically irreducible. In other words, Ω looks random, it looks like it has no
structure, the bits of its numerical value look like independent tosses of a fair
coin.
The bits of Ω are irreducible mathematical information.
Why is this? The answer is, basically, that any structure in something dis-
appears when you compress it. If there were any pattern in the bits of Ω,
for example, if 0s and 1s were not equally likely, then Ω would not be maxi-
mally compressed. In other words, when you remove all the redundancy from
something, what you’re left with looks random, but it isn’t, because it’s full
of valuable information.
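The remark about unequal 0s and 1s can be made quantitative with Shannon's binary entropy, a standard measure brought in here for illustration rather than something computed in the talk. It gives the average number of bits per symbol needed for a source that emits 1 with probability p.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits per symbol, of a coin that lands 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


fair = binary_entropy(0.5)    # a fair coin: nothing to squeeze out
biased = binary_entropy(0.9)  # a biased coin: well under one bit per symbol
```

Only at p = 1/2 does the entropy reach a full bit per symbol, so a maximally compressed string cannot favor either digit.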
What you get when you compress Turing’s halting problem, Ω, isn’t noise,
it’s very valuable mathematical information, it gives you the answers to Tur-
ing’s halting problem, but it looks random, accidental, arbitrary, simply be-
cause you’ve removed all the redundancy. Each bit is a complete surprise.
This may seem paradoxical, but it is a basic result in information theory
that once you compress something and get rid of all the redundancy in it, if
you take a meaningful message and do this to it, afterwards it looks just like
noise.
Let me summarize what we’ve seen thus far.
We have the halting probability Ω that is an oracle for Turing’s halting
problem. It depends on your programming language and there are technical
details that I don’t want to go into, but if you do everything properly you
get this probability that is greater than zero and less than one. It’s a real
number, and if you write it in base two, there’s no integer part, just a “.” and
then a lot of bits. These bits look like they have absolutely no structure or
pattern; they look random, they look like the typical result of independent
tosses of a fair coin. They are sort of maximally unknowable, maximally
uncomputable. Let me try to explain what this means.
At this point I want to make a philosophical statement. In pure mathe-
matics all truths are necessary truths. And there are other truths that are
called contingent or accidental like historical facts. That Napoleon was the
emperor of France is not something that you expect to prove mathematically,
it just happened, so it’s an accidental or a contingent truth.
And whether each bit of the numerical value of the halting probability Ω
is a 0 or a 1 is a necessary truth, but looks like it’s contingent. It’s a perfect
simulation of a contingent, accidental, random truth in pure mathematics,
where all truths are necessary truths.
The bits of Ω are necessary but look accidental, contingent.
This is a place where God plays dice. I don’t know if any of you remember
the dispute many years ago between Niels Bohr and Albert Einstein about
quantum mechanics? Einstein said, “God doesn’t play dice!”, and Bohr said,
“Well, He does in quantum mechanics!” I think God also plays dice in pure
mathematics.
I do believe that in the Platonic world of mathematics the bits of the
halting probability are fully determined. It’s not arbitrary, you can’t choose
them at random. In the Platonic world of pure mathematics each bit is
determined. Another way to put it is that God knows what each bit is.
But what can we know down here at our level with our ﬁnite means? Well,
seen from our limited perspective the bits of Ω are maximally unknowable,
they are a worst case.
The precise mathematical statement of why the bits of the numerical
value of Ω are diﬃcult to know, diﬃcult to calculate and diﬃcult to prove
(to determine what they are by proof) is this: In order to be able to calculate
the ﬁrst N bits of the halting probability you need to use a program that is
at least N bits in size. And to be able to prove what each of these N bits is
starting from a set of axioms, you need to have at least N bits of axioms.
So the bits of Ω are irreducible mathematical facts, they are computa-
tionally and logically irreducible. Essentially the only way to get out of a
formal mathematical theory what these bits are, is to put that in as a new
axiom. But you can prove anything by adding it as a new axiom.
So this is a place where mathematical truth has no structure, no pattern,
where logical reasoning doesn’t work, because these are sort of accidental
mathematical facts.
Let me explain this another way. Leibniz talks about something called
the principle of suﬃcient reason. He was a rationalist, and he believed that
if anything is true, it must be true for a reason. In pure math the reason
that something is true is called a proof. However, the bits of the halting
probability Ω are truths that are true for no reason; more precisely, they are
true for no reason simpler than themselves. The only way to prove what
they are is to take that as a new postulate. They seem to be completely
contingent, entirely accidental.
The bits of Ω are mathematical facts that are true for no reason.
They look a lot like independent tosses of a fair coin, even though they are
determined mathematically. It’s a perfect simulation within pure math of
independent tosses of a fair coin.
So to give an example, 0s and 1s are going to be equally likely. If you
knew all the even bits, it wouldn’t help you to get any of the odd bits. If
you knew the ﬁrst million bits, it wouldn’t help you to get the next bit. It’s
a place where mathematical truth just has no structure or pattern.
But the bits of Ω do have a lot of statistical structure. For example, in
the limit there will be exactly as many 0s as 1s, the ratio of their occurrences
will tend to unity. Also, all blocks of two bits are equally likely. 00, 01, 10
and 11 each have limiting relative frequency exactly 1/4 — you can prove
that. More generally, Ω is what Borel called a normal number, which means
that in each base b, every possible block of K base-b “digits” will have exactly
the same limiting relative frequency $1/b^K$. That’s provably the case for Ω.
Ω is provably Borel normal.
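What normality measures can be sketched by counting blocks in a finite bit string. The bits of Ω are not available, so the sample below uses pseudo-random bits as a stand-in; that substitution, the use of overlapping windows, and the function name are all assumptions for illustration.

```python
import random
from collections import Counter
from fractions import Fraction


def block_frequencies(bits: str, k: int) -> dict:
    """Relative frequency of each k-bit block among all overlapping windows."""
    windows = [bits[i:i + k] for i in range(len(bits) - k + 1)]
    counts = Counter(windows)
    return {block: Fraction(c, len(windows)) for block, c in counts.items()}


rng = random.Random(0)  # fixed seed so the run is repeatable
sample = "".join(rng.choice("01") for _ in range(100_000))
pair_freqs = block_frequencies(sample, 2)  # expect each of 00, 01, 10, 11 near 1/4
```

For a normal number these frequencies converge to exactly $1/2^K$ as the string grows; a finite sample can only come close.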
Another thing you can show is that the Ω number is transcendental;
it’s not algebraic, it’s not the solution of an algebraic equation with integer
coeﬃcients. Actually any uncomputable number must be transcendental;
it can’t be algebraic. But Ω is more than uncomputable, it’s maximally
uncomputable. This is a place where mathematical truth has absolutely
no structure or pattern. This is a place where mathematical truth looks
contingent or accidental or random.
Now if I may go one step further, I’d like to end this talk by comparing
pure mathematics with theoretical physics and with biology.
Pure mathematics developed together with theoretical physics. A lot of
wonderful pure mathematicians of the past were also theoretical physicists,
Euler for example, or more recently Hermann Weyl. The two ﬁelds are rather
similar. And physicists are still hoping for a theory of everything (TOE),
which would be a set of simple, elegant equations that give you the whole
universe, and which would ﬁt on a T-shirt.
So that’s physics. On the other hand we have biology. Molecular biology
is a very complicated subject. An individual cell is like a city. Every one of
us has $3 \times 10^9$ bases in our DNA, which is $6 \times 10^9$ bits. There is no simple
equation for a human being. Biology is the domain of the complicated.
How does pure mathematics compare with these two other ﬁelds? Nor-
mally you think pure math is closer to physics, since they grew together,
they co-evolved. But what the bits of the halting probability Ω show is that
in a certain sense pure math is closer to biology than it is to theoretical
physics, because pure mathematics provably contains inﬁnite irreducible
complexity. Math is even worse than biology, which has very high but only
finite complexity. The human genome is $6 \times 10^9$ bits, which is a lot, but
it’s ﬁnite. But pure mathematics contains the bits of Ω, which is an inﬁnite
number of bits of complexity!
Human = $6 \times 10^9$ bits, Ω = infinite number of bits.
Thanks very much!
Chapter 6
Speculations on biology,
information and complexity
Bulletin of the European Association for Theoretical Computer Science 91 (February 2007), pp. 231–237.
Abstract: It would be nice to have a mathematical understanding of basic
biological concepts and to be able to prove that life must evolve in very general
circumstances. At present we are far from being able to do this. But I’ll
discuss some partial steps in this direction plus what I regard as a possible
future line of attack.
Can Darwinian evolution be made into a math-
ematical theory?
Is there a fundamental mathematical theory
for biology?
Darwin = math ?!
In 1960 the physicist Eugene Wigner published a paper with a wonderful title,
“The unreasonable eﬀectiveness of mathematics in the natural sciences.” In
this paper he marveled at the miracle that pure mathematics is so often
extremely useful in theoretical physics.
To me this does not seem so marvelous, since mathematics and physics co-
evolved. That however does not diminish the miracle that at a fundamental
level Nature is ruled by simple, beautiful mathematical laws, that is, the
miracle that Nature is comprehensible.
I personally am much more disturbed by another phenomenon, pointed
out by I.M. Gel’fand and propagated by Vladimir Arnold in a lecture of
his that is available on the web, which is the stunning contrast between the
relevance of mathematics to physics, and its amazing lack of relevance to
biology!
Indeed, unlike physics, biology is not ruled by simple laws. There is
no equation for your spouse, or for a human society or a natural ecology.
Biology is the domain of the complex. It takes $3 \times 10^9$ bases = $6 \times 10^9$ bits
of information to specify the DNA that determines a human being.
Darwinian evolution has acquired the status of a dogma, but to me as
a mathematician it seems woefully vague and unsatisfactory. What is evolution?
What is evolving? How can we measure that? And can we prove,
mathematically prove, that with high probability life must arise and evolve?
In my opinion, if Darwin’s theory is as simple, fundamental and basic as
its adherents believe, then there ought to be an equally fundamental math-
ematical theory about this, that expresses these ideas with the generality,
precision and degree of abstractness that we are accustomed to demand in
pure mathematics.
Look around you. We are surrounded by evolving organisms, they’re
everywhere, and their ubiquity is a challenge to the mathematical way of
thinking. Evolution is not just a story for children fascinated by dinosaurs.
In my own lifetime I have seen the ease with which microbes evolve immunity
to antibiotics. We may well live in a future in which people will again die of
simple infections that we were once brieﬂy able to control.
Evolution seems to work remarkably well all around us, but not as a
mathematical theory!
In the next section of this paper I will speculate about possible directions
for modeling evolution mathematically. I do not know how to solve this
diﬃcult problem; new ideas are needed. But later in the paper I will have
the pleasure of describing a minor triumph. The program-size complexity
viewpoint that I will now describe to you does have some successes to its
credit, even though they only take us an inﬁnitesimal distance in the direction
we must travel to fully understand evolution.
A software view of biology:
Can we model evolution via evolving software?
I’d like to start by explaining my overall point of view. It is summarized
here:
Life = Software ?
program → COMPUTER → output
DNA → DEVELOPMENT/PREGNANCY → organism
(Size of program in bits) ≈ (Amount of DNA in bases) × 2
So the idea is ﬁrstly that I regard life as software, biochemical software.
In particular, I focus on the digital information contained in DNA. In my
opinion, DNA is essentially a programming language for building an organism
and then running that organism.
More precisely, my central metaphor is that DNA is a computer program,
and its output is the organism. And how can we measure the complexity of an
organism? How can we measure the amount of information that is contained
in DNA? Well, each of the successive bases in a DNA strand is just 2 bits
of digital software, since there are four possible bases. The alphabet for
computer software is 0 and 1. The alphabet of life is A, G, C, and T,
standing for adenine, guanine, cytosine, and thymine. A program is just a
string of bits, and the human genome is just a string of bases. So in both
cases we are looking at digital information.
My basic approach is to measure the complexity of a digital object by the
size in bits of the smallest program for calculating it. I think this is more
or less analogous to measuring the complexity of a biological organism by 2
times the number of bases in its DNA.
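The two-bits-per-base bookkeeping can be sketched as follows. The particular assignment of bit pairs to bases is arbitrary and chosen here only for illustration, as are the names.

```python
# Arbitrary assignment of two bits to each base (an assumption for this sketch).
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def dna_to_bits(dna: str) -> str:
    """Rewrite a strand of bases as a bit string, two bits per base."""
    return "".join(BASE_TO_BITS[base] for base in dna.upper())


def genome_size_in_bits(n_bases: int) -> int:
    """Size of the 'program' in bits: twice the number of bases."""
    return 2 * n_bases


encoded = dna_to_bits("GATTACA")
human = genome_size_in_bits(3 * 10 ** 9)
```

Any of the 24 possible assignments would do equally well; only the factor of 2 matters for the size estimate.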
Of course, this is a tremendous oversimpliﬁcation. But I am only search-
ing for a toy model of biology that is simple enough that I can prove some
theorems, not for a detailed theory describing the actual biological organ-
isms that we have here on earth. I am searching for the Platonic essence of
biology; I am only interested in the actual creatures we know and love to the
extent that they are clues for ﬁnding ideal Platonic forms of life.
How to go about doing this, I am not sure. But I have some suggestions.
It might be interesting, I think, to attempt to discover a toy model for
evolution consisting of evolving, competing, interacting programs. Each or-
ganism would consist of a single program, and we would measure its com-
plexity in bits of software. The only problem is how to make the programs
interact! This kind of model has no geometry, it leaves out the physical uni-
verse in which the organisms live. In fact, it omits bodies and retains only
their DNA. This hopefully helps to make the mathematics more tractable.
But at present this model has no interaction between organisms, no notion
of time, no dynamics, and no reason for things to evolve. The question is
how to add that to the model.
Hopeless, you may say. Perhaps not! Let’s consider some other models
that people have proposed. In von Neumann’s original model creatures are
embedded in a cellular automata world and are largely immobile. Not so
good! There is also the problem of dissecting out the individual organisms
that are embedded in a toy universe, which must be done before their in-
dividual complexities can be measured. My suggestion in one of my early
papers that it might be possible to use the concept of mutual information—
the extent to which the complexity of two things taken together is smaller
than the sum of their individual complexities—in order to accomplish this,
is not, in my current opinion, particularly fruitful.
In von Neumann’s original model we have the complete physics for a
toy cellular automata universe. Walter Fontana’s ALChemy = algorithmic
chemistry project went to a slightly higher level of abstraction. It used
LISP S-expressions to model biochemistry. LISP is a functional programming
language in which everything—programs as well as data—is kept in identical
symbolic form, namely as what are called LISP S-expressions. Such programs
can easily operate on each other and produce other programs, much in the
way that molecules can react and produce other molecules.
I have a feeling that both von Neumann’s cellular automata world and
Fontana’s algorithmic chemistry are too low-level to model biological evolu-
tion. (A model with perhaps the opposite problem of being at too high a
level, is Douglas Lenat’s AM = Automated Mathematician project, which
dealt with the evolution of new mathematical concepts.) So instead I am
proposing a model in which individual creatures are programs. As I said,
the only problem is how to model the ecology in which these creatures com-
pete. In other words, the problem is how to insert a dynamics into this static
software world.¹
¹ Thomas Ray’s Tierra project did in fact create an ecology with software parasites
and hyperparasites. The software creatures he considered were sequences of machine
language instructions coexisting in the memory of a single computer and competing for
that machine’s memory and execution time. Again, I feel this model was too low-level. I
feel that too much micro-structure was included.
Since I have not been able to come up with a suitable dynamics for the
software model I am proposing, I must leave this as a challenge for the
future and proceed to describe a few biologically relevant things that I can
do by measuring the size of computer programs. Let me tell you what this
viewpoint can buy us that is a tiny bit biologically relevant.
Pure mathematics has inﬁnite complexity and
is therefore like biology
Okay, program-size complexity can’t help us very much with biological com-
plexity and evolution, at least not yet. It’s not much help in biology. But
this viewpoint has been developed into a mathematical theory of complexity
that I ﬁnd beautiful and compelling—since I’m one of the people who cre-
ated it—and that has important applications in another major ﬁeld, namely
metamathematics. I call my theory algorithmic information theory, and in
it you measure the complexity of something X via the size in bits of the
smallest program for calculating X, while completely ignoring the amount
of eﬀort which may be necessary to discover this program or to actually run
it (time and storage space). In fact, we pay a severe price for ignoring the
time a program takes to run and concentrating only on its size. We get a
beautiful theory, but we can almost never be sure that we have found the
smallest program for calculating something. We can almost never determine
the complexity of anything, if we choose to measure that in terms of the size
of the smallest program for calculating it!
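The one-sidedness of this situation can be sketched with an ordinary compressor. Here zlib stands in for "some short description", which is a substitution made for illustration: a compressed copy (plus a fixed-size decompressor, ignored here) is a program for the data, so its length is an upper bound on the complexity, never a proof that nothing shorter exists.

```python
import random
import zlib


def complexity_upper_bound(data: bytes) -> int:
    """Length in bits of a zlib-compressed copy of data: an upper bound only."""
    return 8 * len(zlib.compress(data, 9))


regular = b"01" * 5000  # 10,000 bytes with an obvious pattern

rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = bytes(rng.getrandbits(8) for _ in range(10_000))

regular_bound = complexity_upper_bound(regular)
noisy_bound = complexity_upper_bound(noisy)
# zlib finds no pattern in `noisy`, yet the few lines above that generate it
# from a seed are a far shorter description: the bound is not the true value.
```

The second case shows the difficulty in miniature: failing to compress something does not show that it is incompressible.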
This amazing fact, a modern example of the incompleteness phenomenon
first discovered by Kurt Gödel in 1931, severely limits the practical utility of
the concept of program-size complexity. However, from a philosophical point
of view, this paradoxical limitation on what we can know is precisely the
most interesting thing about algorithmic information theory, because that
has profound epistemological implications.
The jewel in the crown of algorithmic information theory is the halting
probability Ω, which provides a concentrated version of Alan Turing’s 1936
halting problem. In 1936 Turing asked if there was a way to determine
whether or not individual self-contained computer programs will eventually
stop. And his answer, surprisingly enough, is that this cannot be done.
Perhaps it can be done in individual cases, but Turing showed that there
could be no general-purpose algorithm for doing this, one that would work
for all possible programs.
The halting probability Ω is deﬁned to be the probability that a program
that is chosen at random, that is, one that is generated by coin tossing, will
eventually halt. If no program ever halted, the value of Ω would be zero. If
all programs were to halt, the value of Ω would be one. And since in actual
fact some programs halt and some fail to halt, the value of Ω is greater
than zero and less than one. Moreover, Ω has the remarkable property that
its numerical value is maximally unknowable. More precisely, let’s imagine
writing the value of Ω out in binary, in base-two notation. That would consist
of a binary point followed by an inﬁnite stream of bits. It turns out that these
bits are irreducible, both computationally and logically:
• You need an N-bit program in order to be able to calculate the ﬁrst N
bits of the numerical value of Ω.
• You need N bits of axioms in order to be able to prove what are the
ﬁrst N bits of Ω.
• In fact, you need N bits of axioms in order to be able to determine the
positions and values of any N bits of Ω, not just the ﬁrst N bits.
Thus the bits of Ω are, in a sense, mathematical facts that are true for no
reason, more precisely, for no reason simpler than themselves. Essentially
the only way to determine the values of some of these bits is to directly add
that information as a new axiom.
And the only way to calculate individual bits of Ω is to separately add
each bit you want to your program. The more bits you want, the larger your
program must become, so the program doesn’t really help you very much.
You see, you can only calculate bits of Ω if you already know what these bits
are, which is not terribly useful. Whereas with π = 3.1415926 . . . we can get
all the bits or all the digits from a single ﬁnite program, that’s all you have
to know. The algorithm for π compresses an infinite amount of information
into a ﬁnite package. But with Ω there can be no compression, none at all,
because there is absolutely no structure.
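For contrast, here is one such finite package for π. The text does not say which algorithm is meant; the sketch below uses Gibbons' unbounded spigot algorithm as one well-known choice, with the function name an assumption for illustration.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi one at a time, without end.

    Gibbons' unbounded spigot algorithm, using only integer arithmetic.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # the next digit is now certain
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


first_ten = list(islice(pi_digits(), 10))
```

A dozen lines yield as many digits as one cares to wait for, which is exactly what no program can do for Ω.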
Furthermore, since the bits of Ω in their totality are inﬁnitely complex,
we see that pure mathematics contains inﬁnite complexity. Each of the bits
of Ω is, so to speak, a complete surprise, an individual atom of mathematical
creativity. Pure mathematics is therefore, fundamentally, much more similar
to biology, the domain of the complex, than it is to physics, where there
is still hope of someday ﬁnding a theory of everything, a complete set of
equations for the universe that might even ﬁt on a T-shirt.
In my opinion, establishing this surprising fact has been the most impor-
tant achievement of algorithmic information theory, even though it is actually
a rather weak link between pure mathematics and biology. But I think it’s
an actual link, perhaps the ﬁrst.
Computing Ω in the limit from below as a
model for evolution
I should also point out that Ω provides an extremely abstract—much too
abstract to be satisfying—model for evolution. Because even though Ω con-
tains inﬁnite complexity, it can be obtained in the limit of inﬁnite time via
a computational process. Since this extremely lengthy computational pro-
cess generates something of inﬁnite complexity, it may be regarded as an
evolutionary process.
How can we do this? Well, it’s actually quite simple. Even though, as
I have said, Ω is maximally unknowable, there is a simple but very time-
consuming way to obtain increasingly accurate lower bounds on Ω. To do
this simply pick a cut-oﬀ t, and consider the ﬁnite set of all programs p up
to t bits in size which halt within time t. Each such program p contributes
$1/2^{|p|}$, 1 over 2 raised to p’s size in bits, to Ω. In other words,

$$\Omega = \lim_{t \to \infty} \sum_{|p| \le t \;\&\; p \text{ halts within time } t} 2^{-|p|}.$$
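The limit from below can be watched in a toy setting. The language here is invented for illustration: program k is the bit string of k ones followed by a zero, and by decree it halts after k*k + 1 steps unless k is a multiple of 3. For this toy the limit is 3/7, which is computable only because the toy halting rule is.

```python
from fractions import Fraction


def halts_within(k: int, t: int) -> bool:
    """True if toy program k (which is k + 1 bits long) halts within t steps."""
    return k % 3 != 0 and k * k + 1 <= t


def omega_lower_bound(t: int) -> Fraction:
    """Sum of 2^-|p| over programs p with |p| <= t that halt within time t."""
    return sum(
        (Fraction(1, 2 ** (k + 1)) for k in range(t) if halts_within(k, t)),
        Fraction(0),
    )


# Successive lower bounds for cut-offs t = 1, 2, ..., 59.
bounds = [omega_lower_bound(t) for t in range(1, 60)]
toy_omega = Fraction(3, 7)
```

The bounds never decrease and never overshoot, but nothing in the computation says how close the current bound is; for the real Ω that gap is never computably known.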
This may be cute, and I feel compelled to tell you about it, but I certainly
do not regard this as a satisfactory model for biological evolution, since there
is no apparent connection with Darwin’s theory.
References
The classical work on a theoretical mathematical underpinning for biology
is von Neumann’s posthumous book [2]. (An earlier account of von Neu-
mann’s thinking on this subject was published in [1], which I read as a
child.) Interestingly enough, Francis Crick—who probably contributed more
than any other individual to creating modern molecular biology—for many
years shared an oﬃce with Sydney Brenner, who was aware of von Neumann’s
thoughts on theoretical biology and self-reproduction. This interesting fact
is revealed in the splendid biography of Crick [3].
For a book-length presentation of my own work on information and com-
plexity, see [4], where there is a substantial amount of material on molecular
biology. This book is summarized in my recent article [5], which however
does not discuss biology. A longer overview of [4] is my Alan Turing lecture
[6], which does touch on biological questions.
For my complete train of thought on biology extending over nearly four
decades, see also [7,8,9,10,11].
For information on Tierra, see Tom Ray’s home page at http://www.his.atr.jp/~ray/.
For information on ALChemy, see http://www.santafe.edu/~walter/AlChemy/papers.html.
For information on Douglas Lenat’s Automated Mathematician, see [12] and the Wikipedia entry
http://en.wikipedia.org/wiki/Automated_Mathematician.
For Vladimir Arnold’s provocative lecture, the one in which Wigner and
Gel’fand are mentioned, see http://pauli.uni-muenster.de/~munsteg/arnold.html.
Wigner’s entire paper is itself on the web at
http://www.dartmouth.edu/~matc/MathDrama/reading/Wigner.html.
1. J. Kemeny, “Man viewed as a machine,” Scientiﬁc American, April
1955, pp. 58–67.
2. J. von Neumann, Theory of Self-Reproducing Automata, University of
Illinois Press, Urbana, 1967.
3. M. Ridley, Francis Crick, Eminent Lives, New York, 2006.
4. G. Chaitin, Meta Math!, Pantheon Books, New York, 2005.
5. G. Chaitin, “The limits of reason,” Scientiﬁc American, March 2006,
pp. 74–81.
6. G. Chaitin, “Epistemology as information theory: from Leibniz to Ω,”
European Computing and Philosophy Conference, Västerås, Sweden,
June 2005.
7. G. Chaitin, “To a mathematical deﬁnition of ‘life’,” ACM SICACT
News, January 1970, pp. 12–18.
8. G. Chaitin, “Toward a mathematical deﬁnition of ‘life’,” R. Levine,
M. Tribus, The Maximum Entropy Formalism, MIT Press, 1979, pp.
477–498.
9. G. Chaitin, “Algorithmic information and evolution,” O. Solbrig, G.
Nicolis, Perspectives on Biological Complexity, IUBS Press, 1991, pp.
51–60.
10. G. Chaitin, “Complexity and biology,” New Scientist, 5 October 1991,
p. 52.
11. G. Chaitin, “Meta-mathematics and the foundations of mathematics,”
Bulletin of the European Association for Theoretical Computer Science,
June 2002, pp. 167–179.
12. D. Lenat, “Automated theory formation in mathematics,” pp. 833–842
in volume 2 of R. Reddy, Proceedings of the 5th International Joint
Conference on Artiﬁcial Intelligence, Cambridge, MA, August 1977,
William Kaufmann, 1977.
Chapter 7
Metaphysics, metamathematics
and metabiology
To be published in H. Zenil, Randomness Through Computation, World Scientiﬁc, 2011.
Abstract: In this essay we present an information-theoretic perspec-
tive on epistemology using software models. We shall use the notion of
algorithmic information to discuss what is a physical law, to determine the
limits of the axiomatic method, and to analyze Darwin’s theory of evolution.
Weyl, Leibniz, complexity and the principle of
suﬃcient reason
The best way to understand the deep concept of conceptual complexity and
algorithmic information, which is our basic tool, is to see how it evolved,
to know its long history. Let’s start with Hermann Weyl and the great
philosopher/mathematician G. W. Leibniz. That everything that is true is
true for a reason is rationalist Leibniz’s famous principle of suﬃcient reason.
The bits of Ω seem to refute this fundamental principle and also the idea
that everything can be proved starting from self-evident facts.
What is a scientiﬁc theory?
The starting point of algorithmic information theory, which is the subject of
this essay, is this toy model of the scientiﬁc method:
theory/program/010 → Computer → experimental data/output/110100101.
A scientiﬁc theory is a computer program for exactly producing the exper-
imental data, and both theory and data are a ﬁnite sequence of bits, a bit
string. Then we can deﬁne the complexity of a theory to be its size in bits,
and we can compare the size in bits of a theory with the size in bits of the
experimental data that it accounts for.
That the simplest theory is best means that we should pick the smallest
program that explains a given set of data. Furthermore, if the theory is the
same size as the data, then it is useless, because there is always a theory that
is the same size as the data that it explains. In other words, a theory must
be a compression of the data, and the greater the compression, the better
the theory. Explanations are compressions, comprehension is compression!
Furthermore, if a bit string has absolutely no structure, if it is completely
random, then there will be no theory for it that is smaller than it is. Most
bit strings of a given size are incompressible and therefore incomprehensible,
simply because there are not enough smaller theories to go around.
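A rough, hedged illustration of "comprehension is compression" uses an off-the-shelf compressor as a stand-in for the smallest program. This is an assumption made for the example: `zlib` only gives an upper bound on program size, and the helper name `shorter_strings` is made up here. The sketch compares a highly regular string with pseudo-random bytes and also spells out the counting argument for why most strings cannot be compressed.

```python
import random
import zlib

n = 4096
structured = b"01" * (n // 2)                        # highly regular data
rng = random.Random(0)                               # fixed seed, reproducible
noise = bytes(rng.getrandbits(8) for _ in range(n))  # pseudo-random data

for name, data in [("structured", structured), ("noise", noise)]:
    packed = zlib.compress(data, 9)
    print(name, len(data), "->", len(packed))


def shorter_strings(n_bits):
    """Number of bit strings strictly shorter than n_bits (candidate
    theories): 2^0 + 2^1 + ... + 2^(n_bits - 1) = 2^n_bits - 1."""
    return sum(2 ** k for k in range(n_bits))


# There are 2**n_bits strings of length n_bits but fewer shorter ones,
# so not every string can have a smaller theory.
print(shorter_strings(16), "<", 2 ** 16)
```

Note that the pseudo-random bytes do have a short description (the generator plus its seed) which `zlib` cannot find. A general-purpose compressor can show that data is compressible, but it can never show that data is incompressible.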
This software model of science is not new. It can be traced back via
Hermann Weyl (1932) to G. W. Leibniz (1686)! Let’s start with Weyl. In
his little book on philosophy The Open World: Three Lectures on the Meta-
physical Implications of Science, Weyl points out that if arbitrarily complex
laws are allowed, then the concept of law becomes vacuous, because there is
always a law! In his view, this implies that the concept of a physical law and
of complexity are inseparable; for there can be no concept of law without a
corresponding complexity concept. Unfortunately he also points out that in
spite of its importance, the concept of complexity is a slippery one and hard
to deﬁne mathematically in a convincing and rigorous fashion.
Furthermore, Weyl attributes these ideas to Leibniz, to the 1686 Discours
de métaphysique. What does Leibniz have to say about complexity in
his Discours? The material on complexity is in Sections V and VI of the
Discours.
In Section V, Leibniz explains why science is possible, why the world is
comprehensible, lawful. It is, he says, because God has created the best
possible, the most perfect world, in that the greatest possible diversity of
phenomena are governed by the smallest possible set of ideas. God simul-
taneously maximizes the richness and diversity of the world and minimizes
the complexity of the ideas, of the mathematical laws, that determine this
world. That is why science is possible!
A modern restatement of this idea is that science is possible because the
world seems very complex but is actually governed by a small set of laws
having low conceptual complexity.
And in Section VI of the Discours, Leibniz touches on randomness. He
points out that any ﬁnite set of points on a piece of graph paper always seems
to follow a law, because there is always a mathematical equation passing
through those very points. But there is a law only if the equation is simple,
not if it is very complicated. This is the idea that impressed Weyl, and it
becomes the deﬁnition of randomness in algorithmic information theory.1
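Leibniz's remark about points on graph paper can be made concrete with Lagrange interpolation, which is one standard way (chosen here for illustration, not taken from the Discours) to build a polynomial through any finite set of points. The function name `lagrange_interpolate` and the sample points are assumptions of this sketch.

```python
from fractions import Fraction


def lagrange_interpolate(points):
    """Return a function evaluating the unique polynomial of degree
    < len(points) that passes through every (x, y) pair in `points`.
    The x values must be distinct."""
    pts = [(Fraction(x), Fraction(y)) for x, y in points]

    def poly(x):
        x = Fraction(x)
        total = Fraction(0)
        for i, (xi, yi) in enumerate(pts):
            term = yi
            for j, (xj, _) in enumerate(pts):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total

    return poly


# Arbitrary "ink spots" with no evident pattern.
spots = [(0, 3), (1, -1), (2, 7), (3, 2), (4, 9)]
law = lagrange_interpolate(spots)
print([law(x) == y for x, y in spots])
```

The catch is the one Leibniz points to: such an equation generally needs as many coefficients as there are points, so it is no simpler than the data and explains nothing.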
Finding elegant programs
So the best theory for something is the smallest program that calculates it.
How can we be sure that we have the best theory? Let’s forget about theories
and just call a program elegant if it is the smallest program that produces
the output that it does. More precisely, a program is elegant if no smaller
program written in the same language produces the same output.
So can we be sure that a program is elegant, that it is the best theory
for its output? Amazingly enough, we can’t: It turns out that any formal
axiomatic theory A can prove that at most ﬁnitely many programs are el-
egant, in spite of the fact that there are inﬁnitely many elegant programs.
More precisely, it takes an N-bit theory A, one having N bits of axioms,
having complexity N, to be able to prove that an individual N-bit program
is elegant. And we don’t need to know much about the formal axiomatic
theory A in order to be able to prove that it has this limitation.
What is a formal axiomatic theory?
All we need to know about the axiomatic theory A, is the crucial requirement
emphasized by David Hilbert that there should be a proof-checking
algorithm, a mechanical procedure for deciding if a proof is correct or not. It
follows that we can systematically run through all possible proofs, all possible
strings of characters in the alphabet of the theory A, in size order, checking
which ones are valid proofs, and thus discover all the theorems, all the
provable assertions in the theory A.2
1
Historical Note: Algorithmic information theory was first proposed in the 1960s by
R. Solomonoff, A. N. Kolmogorov, and G. J. Chaitin. Solomonoff and Chaitin considered
this toy model of the scientific method, and Kolmogorov and Chaitin proposed defining
randomness as algorithmic incompressibility.
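Running through all strings in size order and keeping those that pass a proof checker can be sketched as follows. The "theory" here is a deliberately trivial stand-in invented for the example: its alphabet is the digits plus `+` and `=`, and a string counts as a valid proof exactly when it has the form a+b=c with a + b equal to c. The names `ALPHABET`, `is_valid_proof` and `theorems` are assumptions of this sketch.

```python
import re
from itertools import count, islice, product

ALPHABET = "0123456789+="                  # assumed toy alphabet
PROOF = re.compile(r"(\d+)\+(\d+)=(\d+)")


def is_valid_proof(text):
    """Toy proof checker: accept 'a+b=c' exactly when a + b equals c."""
    m = PROOF.fullmatch(text)
    return bool(m) and int(m.group(1)) + int(m.group(2)) == int(m.group(3))


def theorems():
    """Run through all strings over ALPHABET in size order and yield
    those that the checker accepts.  The generator never terminates."""
    for size in count(1):
        for chars in product(ALPHABET, repeat=size):
            candidate = "".join(chars)
            if is_valid_proof(candidate):
                yield candidate


print(list(islice(theorems(), 5)))
```

Only the checker knows anything about the theory. The enumeration loop is the same for every formal axiomatic theory, which is why the theory can be treated as a black box whose size in bits is all that matters.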
That’s all we need to know about a formal axiomatic theory A, that there
is an algorithm for generating all the theorems of the theory. This is the
software model of the axiomatic method studied in algorithmic information
theory. If the software for producing all the theorems is N bits in size, then
the complexity of our theory A is deﬁned to be N bits, and we can limit A’s
power in terms of its complexity H(A) = N. Here’s how:
Why can’t you prove that a program is elegant?
Suppose that we have an N-bit theory A, that is, that H(A) = N, and that
it is always possible to prove that individual elegant programs are in fact
elegant, and that it is never possible to prove that inelegant programs are
elegant. Consider the following paradoxical program P:
P runs through all possible proofs in the formal axiomatic theory
A, searching for the ﬁrst proof in A that an individual program
Q is elegant for which it is also the case that the size of Q in bits
is larger than the size of P in bits. And what does P do when it
ﬁnds Q? It runs Q and then P produces as its output the output
of Q.
In other words, the output of P is the same as the output of the ﬁrst provably
elegant program Q that is larger than P. But this contradicts the deﬁnition
of elegance! P is too small to be able to calculate the output of an elegant
program Q that is larger than P. We seem to have arrived at a contradiction!
But do not worry; there is no contradiction. What we have actually
proved is that P can never find Q. In other words, there is no proof in the
formal axiomatic theory A that an individual program Q is elegant, not if
Q is larger than P. And how large is P? Well, just a fixed number of bits
c larger than N, the complexity H(A) of the formal axiomatic theory A. P
consists of a small, fixed main program c bits in size, followed by a large
subroutine H(A) bits in size for generating all the theorems of A.
2
Historical Note: The idea of running through all possible proofs, of creativity by
mechanically trying all possible combinations, can be traced back through Leibniz to
Ramon Llull in the 1200s.
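The bookkeeping behind this argument can be written out in a few lines. The notation |X| for the size in bits of a program X is an assumption introduced here; N, c, P, Q and H(A) are as in the text.

```latex
\begin{align*}
|P| &= H(A) + c = N + c, \\
P \text{ finds } Q
  &\;\Longrightarrow\; |Q| > |P|
     \ \text{and}\ \operatorname{output}(P) = \operatorname{output}(Q) \\
  &\;\Longrightarrow\; Q \text{ is not elegant, although } A \text{ proved that it is.}
\end{align*}
```

Since A was assumed never to prove that an inelegant program is elegant, P never finds Q. Every program that A proves elegant therefore has at most N + c bits, and there are at most $2^{N+c+1} - 1$ bit strings of that size, which is why only finitely many programs are provably elegant.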
The only thing tricky about this proof is that it requires P to be able to
know its own size in bits. And how well we are able to do this depends on
the details of the particular programming language that we are using for the
proof. So to get a neat result and to be able to carry out this simple, elegant
proof, we have to be sure to use an appropriate programming language. This
is one of the key issues in algorithmic information theory, which programming
language to use.3
Farewell to reason: The halting probability Ω4
So there are inﬁnitely many elegant programs, but there are only ﬁnitely
many provably elegant programs in any formal axiomatic theory A. The proof
of this is rather straightforward and short. Nevertheless, this is a fundamental
information-theoretic incompleteness theorem that is rather diﬀerent in style
from the classical incompleteness results of Gödel, Turing and others.
An even more important incompleteness result in algorithmic informa-
tion theory has to do with the halting probability Ω, the numerical value
of the probability that a program p whose successive bits are generated by
independent tosses of a fair coin will eventually halt:
$$\Omega = \sum_{p \text{ halts}} 2^{-(\text{size in bits of } p)}.$$
To be able to define this probability Ω, it is also very important how you
choose your programming language. If you are not careful, this sum will
diverge instead of being ≤ 1 like a well-behaved probability should.
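Why the choice of language matters for convergence can be seen with a small calculation. The sketch below compares two made-up sets of programs: all bit strings up to a given length, and a self-delimiting (prefix-free) encoding consisting of k ones, a zero, and k payload bits. Both encodings and all function names are assumptions chosen for illustration. The relevant standard fact is the Kraft inequality, which bounds the sum by 1 for any prefix-free set.

```python
from fractions import Fraction
from itertools import product


def weight(programs):
    """Sum of 2^(-|p|) over a collection of bit-string programs."""
    return sum((Fraction(1, 2 ** len(p)) for p in programs), Fraction(0))


def all_strings(max_len):
    """Every bit string of length 1..max_len (not prefix-free)."""
    return ["".join(b) for n in range(1, max_len + 1)
            for b in product("01", repeat=n)]


def self_delimiting(max_k):
    """Prefix-free toy encoding: k ones, a zero, then k payload bits."""
    return ["1" * k + "0" + "".join(b) for k in range(max_k + 1)
            for b in product("01", repeat=k)]


def is_prefix_free(programs):
    """True if no program is a proper prefix of another one."""
    ordered = sorted(programs)
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


for m in (1, 2, 4, 8):
    print(m, weight(all_strings(m)), weight(self_delimiting(m)))
```

With arbitrary bit strings each length contributes a total weight of 1, so the sum grows without bound. With the self-delimiting encoding the total stays below 1 however many programs are included.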
Turing’s fundamental result is that the halting problem is unsolvable.
In algorithmic information theory the fundamental result is that the halting
probability Ω is algorithmically irreducible or random. It follows that the
bits of Ω cannot be compressed into a theory less complicated than they
are. They are irreducibly complex. It takes N bits of axioms to be able to
determine N bits of the numerical value
Ω = .1101011 . . .
of the halting probability. If your formal axiomatic theory A has H(A) = N,
then you can determine the values and positions of at most N + c bits of Ω.
3
See the chapter on “The Search for the Perfect Language” in Chaitin, Mathematics,
Complexity and Philosophy, in press.
4
Farewell to Reason is the title of a book by Paul Feyerabend, a wonderfully provocative
philosopher. We borrow his title here for dramatic effect, but he does not discuss Ω in
this book or any of his other works.
In other words, the bits of Ω are logically irreducible, they cannot be
proved from anything simpler than they are. Essentially the only way to
determine what are the bits of Ω is to add these bits to your theory A as new
axioms. But you can prove anything by adding it as a new axiom. That’s
not using reasoning!
So the bits of Ω refute Leibniz’s principle of suﬃcient reason: they are
true for no reason. More precisely, they are not true for any reason simpler
than themselves. This is a place where mathematical truth has absolutely
no structure, no pattern, for which there is no theory!
Adding new axioms: Quasi-empirical mathematics5
So incompleteness follows immediately from fundamental information-
theoretic limitations. What to do about incompleteness? Well, just add
new axioms, increase the complexity H(A) of your theory A! That is the
only way to get around incompleteness.
In other words, do mathematics more like physics, add new axioms not
because they are self-evident, but for pragmatic reasons, because they help
mathematicians to organize their mathematical experience just like physi-
cal theories help physicists to organize their physical experience. After all,
Maxwell’s equations and the Schrödinger equation are not at all self-evident,
but they work! And this is just what mathematicians have done in theoretical
computer science with the hypothesis that P ≠ NP, in mathematical
cryptography with the hypothesis that factoring is hard, and in abstract
axiomatic set theory with the new axiom of projective determinacy.6
5
The term quasi-empirical is due to the philosopher Imre Lakatos, a friend of Feyer-
abend. For more on this school, including the original article by Lakatos, see the collection
of quasi-empirical philosophy of math papers edited by Thomas Tymoczko, New Directions
in the Philosophy of Mathematics.
6
See the article on “The Brave New World of Bodacious Assumptions in Cryptography”
in the March 2010 issue of the AMS Notices, and the article by W. Hugh Woodin on “The
Continuum Hypothesis” in the June/July 2001 issue of the AMS Notices.
Mathematics, biology and metabiology
We’ve discussed physical and mathematical theories; now let’s turn to biol-
ogy, the most exciting ﬁeld of science at this time, but one where mathematics
is not very helpful. Biology is very diﬀerent from physics. There is no sim-
ple equation for your spouse. Biology is the domain of the complex. There
are not many universal rules. There are always exceptions. Math is very
important in theoretical physics, but there is no fundamental mathematical
theoretical biology.
This is unacceptable. The honor of mathematics requires us to come up
with a mathematical theory of evolution and either prove that Darwin was
wrong or right! We want a general, abstract theory of evolution, not an
immensely complicated theory of actual biological evolution. And we want
proofs, not computer simulations! So we’ve got to keep our model very, very
simple.
That’s why this proposed new ﬁeld is metabiology, not biology.
What kind of math can we use to build such a theory? Well, it’s certainly
not going to be diﬀerential equations. Don’t expect to ﬁnd the secret of
life in a diﬀerential equation; that’s the wrong kind of mathematics for a
fundamental theory of biology.
In fact a universal Turing machine has much more to do with biology than
a diﬀerential equation does. A universal Turing machine is a very complicated
new kind of object compared to what came previously, compared with the
simple, elegant ideas in classical mathematics like analysis. And there are
self-reproducing computer programs, which is an encouraging sign.
There are in fact three areas in our current mathematics that do have
some fundamental connection with biology, that show promise for math to
continue moving in a biological direction:
Computation, Information, Complexity.
DNA is essentially a programming language that computes the organism and
its functioning; hence the relevance of the theory of computation for biology.
Furthermore, DNA contains biological information. Hence the relevance
of information theory. There are in fact at least four diﬀerent theories of
information:
• Boltzmann statistical mechanics and Boltzmann entropy,
• Shannon communication theory and coding theory,
• algorithmic information theory (Solomonoﬀ, Kolmogorov, Chaitin),
which is the subject of this essay, and
• quantum information theory and qubits.
Of the four, AIT (algorithmic information theory) is closest in spirit to biol-
ogy. AIT studies the size in bits of the smallest program to compute some-
thing. And the complexity of a living organism can be roughly (very roughly)
measured by the number of bases in its DNA, in the biological computer pro-
gram for calculating it.
Finally, let’s talk about complexity. Complexity is in fact the most distin-
guishing feature of biological as opposed to physical science and mathematics.
There are many computational deﬁnitions of complexity, usually concerned
with computation times, but again AIT, which concentrates on program size
or conceptual complexity, is closest in spirit to biology.
Let’s emphasize what we are not interested in doing. We are certainly
not trying to do systems biology: large, complex realistic simulations of
biological systems. And we are not interested in anything that is at all like
Fisher-Wright population genetics that uses diﬀerential equations to study
the shift of gene frequencies in response to selective pressures.
We want to use a suﬃciently rich mathematical space to model the space
of all possible designs for biological organisms, to model biological creativity.
And the only space that is suﬃciently rich to do that is a software space, the
space of all possible algorithms in a ﬁxed programming language. Otherwise
we have limited ourselves to a ﬁxed set of possible genes as in population
genetics, and it is hopeless to expect to model the major transitions in bio-
logical evolution such as from single-celled to multicellular organisms, which
is a bit like taking a main program and making it into a subroutine that is
called many times.
Recall the cover of Stephen Gould’s Wonderful Life on the Burgess shale
and the Cambrian explosion? Around 250 primitive organisms with wildly
diﬀering body plans, looking very much like the combinatorial exploration of
a software space. Note that there are no intermediate forms; small changes
in software produce vast changes in output.
So to simplify matters and concentrate on the essentials, let’s throw away
the organism and just keep the DNA. Here is our proposal:
Metabiology: a ﬁeld parallel to biology that studies the random
evolution of artiﬁcial software (computer programs) rather than
natural software (DNA), and that is suﬃciently simple to permit
rigorous proofs or at least heuristic arguments as convincing as
those that are employed in theoretical physics.
This analogy may seem a bit far-fetched. But recall that Darwin himself
was inspired by the analogy between artiﬁcial selection by plant and animal
breeders and natural selection imposed by Malthusian limitations.
Furthermore, there are many tantalizing analogies between DNA and
large, old pieces of software. Remember bricolage, that Nature is a cobbler,
a tinkerer? In fact, a human being is just a very large piece of software, one
that is $3 \times 10^9$ bases = $6 \times 10^9$ bits ≈ one gigabyte of software that has been
patched and modified for more than a billion years: a tremendous mess, in
fact, with bits and pieces of fish and amphibian design mixed in with that
for a mammal.7 For example, at one point in gestation the human embryo
has gills. As time goes by, large human software projects also turn into a
tremendous mess with many old bits and pieces.
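The size estimate can be checked with a line of arithmetic, assuming that each base is one of four letters and therefore carries 2 bits, and that a byte is 8 bits:

```latex
\[
3 \times 10^{9}\ \text{bases} \times 2\ \frac{\text{bits}}{\text{base}}
  = 6 \times 10^{9}\ \text{bits}
  = 7.5 \times 10^{8}\ \text{bytes}
  \approx 0.75\ \text{gigabytes},
\]
```

which the text rounds to one gigabyte.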
The key point is that you can’t start over, you’ve got to make do with
what you have as best you can. If we could design a human being from
scratch we could do a much better job. But we can’t start over. Evolution
only makes small changes, incremental patches, to adapt the existing code
to new environments.
So how do we model this? Well, the key ideas are:
Evolution of mutating software,
and:
Random walks in software space.
That’s the general idea. And here are the speciﬁcs of our current model,
which is quite tentative.
We take an organism, a single organism, and perform random mutations
on it until we get a ﬁtter organism. That replaces the original organism, and
then we continue as before. The result is a random walk in software space
with increasing ﬁtness, a hill-climbing algorithm in fact.8
7
See Neil Shubin, Your Inner Fish: A Journey into the 3.5-Billion-Year History of the
Human Body.
8
In order to avoid getting stuck on a local maximum, in order to keep evolution from
stopping, we stipulate that there is a non-zero probability to go from any organism to
any other organism, and − log2 of the probability of mutating from A to B deﬁnes an
important concept, the mutation distance, which is measured in bits.
Finally, a key element in our proposed model is the deﬁnition of ﬁtness.
For evolution to work, it is important to keep our organisms from stagnating.
It is important to give them something challenging to do.
The simplest possible challenge to force our organisms to evolve is what
is called the Busy Beaver problem, which is the problem of providing concise
names for extremely large integers. Each of our organisms produces a single
positive integer. The larger the integer, the ﬁtter the organism.9
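A minimal sketch of a single mutating organism with this kind of fitness is given below. It is a toy and not the model analyzed in the text: the "programming language" (a sequence of the made-up operations `inc`, `dbl` and `sqr` applied to 1), the length cap `MAX_LEN`, and the purely local mutations are all assumptions chosen so that the example runs quickly.

```python
import random

OPS = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: 2 * x,
    "sqr": lambda x: x * x,
}
MAX_LEN = 8          # assumed cap so the toy numbers stay manageable


def fitness(organism):
    """The positive integer the organism names: apply its ops to 1."""
    value = 1
    for op in organism:
        value = OPS[op](value)
    return value


def mutate(organism, rng):
    """Return a random variant: insert, delete or replace one op."""
    genes = list(organism)
    kind = rng.choice(["insert", "delete", "replace"])
    if kind == "insert" and len(genes) < MAX_LEN:
        genes.insert(rng.randrange(len(genes) + 1), rng.choice(list(OPS)))
    elif kind == "delete" and genes:
        del genes[rng.randrange(len(genes))]
    elif kind == "replace" and genes:
        genes[rng.randrange(len(genes))] = rng.choice(list(OPS))
    return tuple(genes)


def evolve(steps, seed=0):
    """Hill climbing: keep a single organism and replace it only when a
    mutation is strictly fitter.  Returns the final organism and the
    fitness recorded after each step."""
    rng = random.Random(seed)
    organism, history = (), []
    for _ in range(steps):
        mutant = mutate(organism, rng)
        if fitness(mutant) > fitness(organism):
            organism = mutant
        history.append(fitness(organism))
    return organism, history


best, history = evolve(2000)
print(best, history[-1])
```

Because this toy only tries one-step mutations, it can stall on a local maximum. The model in the text avoids that by giving every organism a non-zero probability of mutating into any other organism, and its programs may fail to halt, which no runnable sketch can reproduce.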
The Busy Beaver function of N, BB(N), that is used in AIT is deﬁned to
be the largest positive integer that is produced by a program that is less than
or equal to N bits in size. BB(N) grows faster than any computable function
of N and is closely related to Turing’s famous halting problem, because if
BB(N) were computable, the halting problem would be solvable.10
Doing well on the Busy Beaver problem can utilize an unlimited amount
of mathematical creativity. For example, we can start with addition, then
invent multiplication, then exponentiation, then hyper-exponentials, and use
this to concisely name large integers:
$$N + N \to N \times N \to N^N \to N^{N^N} \to \cdots$$
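To get a feel for how quickly this ladder climbs, here is a small Python sketch that evaluates the first four rungs for N = 3. The function names are made up for the example, and the last rung is only safe to evaluate for very small N.

```python
def add(n):
    return n + n


def mul(n):
    return n * n


def power(n):
    return n ** n


def tower(n):
    """N^(N^N); only safe to evaluate for very small n."""
    return n ** (n ** n)


for name, f in [("N+N", add), ("N*N", mul), ("N^N", power), ("N^N^N", tower)]:
    print(name, f(3))
```

Each function is only a few characters longer than the previous one, yet the integer it names grows enormously. That is the sense in which a better idea yields a more concise name for a large number.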
There are many possible choices for such an evolving software model:
You can vary the computer programming language and therefore the soft-
ware space, you can change the mutation model, and eventually you could
also change the ﬁtness measure. For a particular choice of language and
probability distribution of mutations, and keeping the current fitness function,
it is possible to show that in time of the order of $2^N$ the fitness will
grow as BB(N), which grows faster than any computable function of N and
shows that genuine creativity is taking place, for mechanically changing the
organism can only yield ﬁtness that grows as a computable function.11
9
Alternative formulations: The organism calculates a total function f(n) of a single
non-negative integer n and f(n) is ﬁtter than g(n) if f(n)/g(n) → ∞ as n → ∞. Or the
organism calculates a (constructive) Cantor ordinal number and the larger the ordinal,
the ﬁtter the organism.
10
Consider BB′(N) defined to be the maximum run-time of any program that halts that
is less than or equal to N bits in size.
11
Note that to actually simulate our model an oracle for the halting problem would
have to be employed to avoid organisms that have no ﬁtness because they never calculate
a positive integer. This also explains how the ﬁtness can grow faster than any computable
function. In our evolution model, implicit use is being made of an oracle for the halting
problem, which answers questions whose answers cannot be computed by any algorithmic
process.
So with random mutations and just a single organism we actually do get
evolution, unbounded evolution, which was precisely the goal of metabiology!
This theorem may seem encouraging, but it actually has a serious prob-
lem. The times involved are so large that our search process is essentially
ergodic, which means that we are doing an exhaustive search. Real evolu-
tion is not at all ergodic, since the space of all possible designs is much too
immense for exhaustive search.
It turns out that with this same model there is actually a much quicker
ideal evolutionary pathway that achieves ﬁtness BB(N) in time of the order of
N. This path is however unstable under random mutations, plus it is much
too good: Each organism adds only a single bit to the preceding organism,
and immediately achieves near optimal ﬁtness for an organism of its size,
which doesn’t seem to at all reﬂect the haphazard, frozen-accident nature of
what actually happens in biological evolution.12
So that is the current state of metabiology: a ﬁeld with some promise, but
not much actual content at the present time. The particular details of our
current model are not too important. Some kind of mutating software model
should work, should exhibit some kind of basic biological features. The chal-
lenge is to identify such a model, to characterize its behavior statistically,13
and to prove that it does what is required.
12
The Nth organism in this ideal evolutionary pathway is essentially just the ﬁrst N bits
of the numerical value of the halting probability Ω. Can you ﬁgure out how to compute
BB(N) from this?
13
For instance, will some kind of hierarchical structure emerge? Large human software
projects are always written that way.
Bibliography
[1] G. J. Chaitin, Thinking about Gödel and Turing: Essays on Complexity,
1970–2007, World Scientiﬁc, 2007.
[2] G. J. Chaitin, Mathematics, Complexity and Philosophy, Midas, in press.
(Draft at http://www.cs.umaine.edu/~chaitin/midas.html.)
[3] S. Gould, Wonderful Life, Norton (1989).
[4] N. Koblitz and A. Menezes, “The brave new world of bodacious assump-
tions in cryptography,” AMS Notices 57, 357–365 (2010).
[5] G. W. Leibniz, Discours de métaphysique, suivi de Monadologie, Gallimard
(1995).
[6] N. Shubin, Your Inner Fish, Pantheon (2008).
[7] T. Tymoczko, New Directions in the Philosophy of Mathematics, Prince-
ton University Press (1998).
[8] H. Weyl, The Open World, Yale University Press (1932).
[9] W. H. Woodin, “The continuum hypothesis, Part I,” AMS Notices 48,
567–576 (2001).
Chapter 8
Algorithmic information as a
fundamental concept in physics,
mathematics and biology
In Memoriam Jacob T. “Jack” Schwartz (1930–2009)
The concept of information is not only fundamental in quantum me-
chanics, but also, when formulated as program-size complexity, helps in
understanding what is a law of nature, the limitations of the axiomatic
method, and Darwin’s theory of evolution. Lecture given Wednesday, 23
September 2009, at the Institute for Quantum Computing in Waterloo,
Canada.
I’m delighted to be here at IQC, an institution devoted to two of my favorite
topics, information and computation.
Information & Computation
The ﬁeld I work in is algorithmic information theory, AIT, and in a funny
way AIT is precisely the dual of what the IQC is all about, which is quantum
information and quantum computing.
Let me compare and contrast the two ﬁelds: First of all, I care about
bits of software, not qubits, I care about the size of programs, not about
compute times; I look at the information content of individual objects, not
at ensembles, and my computers are classical, not quantum. You care about
what can be known in physical systems and I care about what can be known
in pure mathematics. You care about practical applications, and I care about
philosophy.
And strangely enough, we both sort of end up in the same place, because
God plays dice both in quantum mechanics and in pure math: You need
quantum randomness for cryptography, and I ﬁnd irreducible complexity in
the bits of the halting probability Ω.
So my subject will be algorithmic information, not qubits, and I’d like to
show you three diﬀerent applications of the notion of algorithmic information:
in physics, in math and in biology. I’ll show you three software models, three
toy models of what goes on in physics, in math and in biology, in which you
get insight by considering the amount of software, the size of programs, the
algorithmic information content.
But ﬁrst of all, I should say that these are deﬁnitely toy models, highly
simpliﬁed models, of what goes on in physics, in math and in biology. In
fact, my motto for today is taken from Picasso, who said, “Art is a lie that
helps us see the truth!” Well, that also applies to theories:
Theories are lies that help us see the truth! (Picasso)
You see, in order to create a mathematical theory in which you can prove
neat theorems, you have to concentrate on the essential features of a situation.
The models that I will show you are highly simpliﬁed toy models. You have
to eliminate all the distractions, brush away all the inessential features, so
that you can see the ideal case, the ideal situation. I am only interested in
the Platonic essence of a situation, so that I can weave it into a beautiful
mathematical theory, so that I can lay bare its inner soul. I have no interest
in complicated, realistic models of anything, because they do not lead to
beautiful theories.
You will have to judge for yourself if my models are oversimpliﬁed, and
whether or not they help you to understand what is going on in more realistic
situations.
So let’s start with physics, with what you might call AIT’s version of the
Leibniz/Weyl model of the scientiﬁc method. What is a law of nature, what
is a scientiﬁc law? The AIT model of this is given in this diagram:
Theory / Program / 011 → Computer → Experimental Data / World / 1101101
On the left-hand side we have a scientiﬁc theory, which is a computer pro-
gram, a ﬁnite binary sequence, a bit string, for calculating exactly your
experimental data, perhaps the whole time-evolution of the universe, which
in this discrete model is also a bit string.
In other words, in this model a program is a theory for its output. I am
not interested in prediction and I am not interested in statistical theories,
only in deterministic theories that explain the data perfectly. And I don’t
care how much work it takes to execute the program, to run the theory, to
actually calculate the world from it, as long as the amount of time required
to do this is ﬁnite. And I assume the world is discrete, not continuous.
Remember Picasso. Remember I’m a pure mathematician!
The best theory is the smallest, the most concise program that calculates
precisely your data. And a theory is useless unless it is a compression, unless
it has a much smaller number of bits than the data it accounts for, the data
it explains. Why? Because there is always a theory with the same number
of bits, even if the data is completely random.
In fact, this gives you a way to distinguish between situations where
there is a law, and lawless situations, ones where there is no theory, no way
to understand what is happening.
Amazingly enough, aspects of this approach go back to Leibniz in 1686
and to Weyl in 1932. Let me tell you about this. In fact, it’s all here:
If arbitrarily complex laws are permitted, then the concept of law
becomes vacuous, because there is always a law!
Leibniz, 1686: Discours de métaphysique V, VI
Hermann Weyl, 1932: The Open World
1949: Philosophy of Mathematics & Natural Science
So you see, the concepts of law of nature and of complexity are insepara-
ble. Remember, 1686 was the year before Newton’s Principia. What Leibniz
discusses in Sections V and VI of his Discours is why is science possible,
how can you distinguish a world in which science applies from one where it
doesn’t, how can you tell if the world is lawful or not. According to AIT,
that’s only if there’s a theory with a much smaller number of bits than the
data.
Okay, that’s enough physics, let’s go on to mathematics. What if you
want to prove that you have the best theory, the most concise program for
something? What if you try to use mathematical reasoning for that?
What metamathematics studies are so-called formal axiomatic theories,
in which there has to be a proof-checking algorithm, and therefore there is an
algorithm for checking all possible proofs and enumerating all the theorems
in your formal axiomatic theory. And that’s the model of an axiomatic
theory studied in AIT. AIT concentrates on the size in bits of the algorithm
for running through all the proofs and producing all theorems, a very slow,
never-ending computation, but one that can be done mindlessly. So how
many bits of information does it take to do this? That is all AIT cares
about. To AIT a formal axiomatic theory is just a black box; the inner
structure, the details of the theory, are unimportant.
Formal Axiomatic Theory / Hilbert
Axioms, Rules of Inference → Computer → Theorem1, Theorem2,
Theorem3 . . .
How many bits of software are you putting into the computer to
get the theorems out?
So that’s how we measure the information content, the complexity of a
mathematical theory. The complexity of a formal axiomatic theory is the
size in bits of the program that generates all the theorems. An N-bit theory
is one in which the program to generate all the theorems is N bits in size.
And my key result is that it takes an N-bit axiomatic theory to enable
you to prove that an N-bit program is elegant, that is, that it is the most
concise explanation for its output. More precisely, a program is elegant if no
smaller program written in the same language produces the same output.
It takes an N-bit theory to prove that an N-bit program is elegant.
How can you prove this metatheorem?
Well, that’s very easy. Consider a formal axiomatic mathematical theory
T and this paradoxical program P:
P computes the same output as the ﬁrst provably elegant program Q
whose size in bits is larger than P.
In other words, P runs through all proofs in T, checking them all, until it
gets to the ﬁrst proof that a program Q that is larger than P is elegant,
then P runs Q and produces Q’s output. But if P ﬁnds Q and does that,
it produces the same output as a provably elegant program Q that is larger
than P, which is impossible, because an elegant program is the most concise
program that yields the output it does.
Hence P never ﬁnds Q, which means that in T you can’t prove that Q is
elegant if Q is larger in size than P.
So now the key question is this: How many bits are there in P? Well, just
a ﬁxed number of bits more than there are in T. In other words, there is a
constant c such that for any formal axiomatic theory T, if you can prove in T
that a program Q is elegant only if it is actually elegant, then you can prove
in T that Q is elegant only if Q’s size in bits, |Q|, is less than or equal to c
plus the complexity of T, which is by deﬁnition the number of bits of software
that it takes to enumerate all the theorems of your theory T. Q.E.D.
“Q is elegant” ∈ T only if Q is elegant
⇒ “Q is elegant” ∈ T only if |Q| ≤ |T| + c.
That’s the ﬁrst major result in AIT: that it takes an N-bit theory to prove
that an N-bit program is elegant. The second major result is that the bits of
the base-two numerical value of the halting probability Ω are irreducible,
because it takes an N-bit theory to enable you to determine N bits of Ω.
This I won’t prove, but let me remind you how Ω is deﬁned. And of
course Ω = ΩL also depends on your choice of programming language L,
which I discussed in my talk on Monday at the Perimeter Institute:
Halting Probability
$0 < \Omega_L = \sum_{p \text{ halts}} 2^{-|p|} < 1, \qquad \Omega_L = .1101100\ldots$
N-bit theory ⇒ at most N + c bits of Ω.
By the way, the fact that in general a formal axiomatic theory can’t
enable you to prove that individual programs Q are elegant, except in ﬁnitely
many cases, has as an immediate corollary that Turing’s halting problem is
unsolvable. Because if we had a general method, an algorithm, for deciding
if a program will ever halt, then it would be trivial to decide if Q is elegant:
You’d just run all the programs that halt that are smaller than Q to see if
any of them produce the same output.
Corollary: There is no algorithm for solving the halting problem.
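The brute-force elegance check just described can be sketched in a toy setting. The language below (postfix expressions over `1`, `+`, `*`) is a hypothetical stand-in chosen because every program in it either halts or is rejected as malformed. That is exactly why elegance is decidable here; for a universal language this is the step that would require a halting decider.

```python
from itertools import product

ALPHABET = "1+*"  # hypothetical toy language: postfix arithmetic on the constant 1

def run(program: str):
    """Evaluate a postfix expression; return None if it is malformed."""
    stack = []
    for ch in program:
        if ch == "1":
            stack.append(1)
        else:
            if len(stack) < 2:
                return None
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if ch == "+" else a * b)
    return stack[0] if len(stack) == 1 else None

def is_elegant(program: str) -> bool:
    """True if no strictly shorter program produces the same output."""
    out = run(program)
    if out is None:
        return False
    for n in range(1, len(program)):
        for cand in product(ALPHABET, repeat=n):
            if run("".join(cand)) == out:
                return False
    return True
```

For instance, `11*` computes 1 but is not elegant because the one-character program `1` does the same, whereas `11+11+*` (which computes 4) has no shorter rival in this language.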
And we have also proved in two diﬀerent ways that the world of pure math
is inﬁnitely complex, that no theory of ﬁnite complexity can enable you to
determine all the elegant programs or all the bits of the halting probability
Ω.
Corollary: The world of pure math has inﬁnite complexity, and
is therefore more like biology, the domain of the complex, than
like physics, where there is still hope of a simple, elegant theory
of everything.
Those are my software models of physics and math. Now for a software
model of evolution! These are some new ideas of mine. I’ve just started
working on this. I call it metabiology.
Our starting point is the fact that, as Jack Schwartz used to tell me,
DNA is just digital software. So we model organisms only as DNA, that
is, we consider software organisms. Remember Picasso! We’ll mutate these
software organisms, have a ﬁtness function, and see what happens!
The basic ideas are summarized here:
Metabiology
Random Walks in Software Space
Evolution of Mutating Software
Organism = Program
Single Organism
Fitness Function = Busy Beaver Problem
Mutation Distance[A, B] =
− log2 probability of mutating from A to B
It will take a while to explain all this, and to tell you how far I’ve been able
to get with this model. Basically, I have only two and a half theorems. . .
Let me start by reminding you that Darwin was inspired by the analogy
between artiﬁcial selection by animal and plant breeders and natural selec-
tion by Nature. Metabiology exploits the analogy between natural software
(DNA) and artiﬁcial software (computer programs):
METABIOLOGY: a ﬁeld parallel to biology, dealing with the ran-
dom evolution of artiﬁcial software (computer programs) rather
than natural software (DNA), and simple enough to make it pos-
sible to prove rigorous theorems or formulate heuristic arguments
at the same high level of precision that is common in theoretical
physics.
Next, I’d like to tell you how I came up with this idea. There are two
key components. One I got by reading David Berlinski’s polemical book The
Devil’s Delusion which discusses some of the arguments against Darwinian
evolution, and the other is the Busy Beaver problem, which gives our organ-
isms something challenging to do. And I should tell you about Neil Shubin’s
book Your Inner Fish. Let me explain each of these in turn.
Berlinski has an incisive discussion of perplexities with Darwinian evolu-
tion. One is the absence of intermediate forms, the other is major transitions
in evolution such as that from single-celled to multicellular organisms. But
neither of these is a problem if we consider software organisms.
Darwin himself worried about the eye; he thought that partial eyes were
useless. In fact, as a biologist explained to me, eye-like organs have evolved
independently many diﬀerent times.
Anyway, we know very well that small changes in the software can produce
drastic changes in the output. A one-bit change can destroy a program! So
the absence of intermediate forms is not a problem.
How about the transition from unicellular to multicellular? No problem,
that’s just the idea of a subroutine. You take the main program and make
it into a subroutine that you call many times. Or you fork it and run it
simultaneously in many parallel threads.
And Berlinski discusses the neutral theory of evolution and talks about
evolution as a random walk.
So that was my starting point.
But to make my software organisms evolve, I need to give them something
challenging to do. The Busy Beaver problem to the rescue! That’s the
problem of ﬁnding small, concise names for extremely large positive integers.
A program names a positive integer by calculating it and then halting.
BB(N) = the largest positive integer you can name with
a program of size ≤ N bits.
And the BB problem can utilize an unlimited amount of mathematical cre-
ativity, because it’s equivalent to Turing’s halting problem, since another
way to deﬁne BB of N is as follows:
BB(N) = the longest runtime of any program that halts that is ≤ N bits in size.
These two definitions of BB(N) are essentially equivalent.¹
¹ On naming large numbers, see Archimedes' The Sand Reckoner, described in Gamow,
One, Two, Three. . . Infinity. I thank Ilias Kotsireas for reminding me of this early work
on the BB problem.
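The first definition of BB(N) can be tried out on a toy language. The sketch below assumes a hypothetical, deliberately non-universal interpreter with 2-bit instructions; because it is not universal, halting is decidable and the resulting function is computable, unlike the real BB(N).

```python
from itertools import product

def run(bits: str):
    """Toy interpreter (an assumption for illustration only).

    2-bit instructions act on an accumulator that starts at 1:
    00 adds 1, 01 doubles, 10 squares, 11 halts and outputs.
    A program that runs off its end without reaching 11 returns None,
    our decidable stand-in for "never halts".
    """
    acc = 1
    for i in range(0, len(bits) - 1, 2):
        op = bits[i:i + 2]
        if op == "00":
            acc += 1
        elif op == "01":
            acc *= 2
        elif op == "10":
            acc *= acc
        else:  # "11"
            return acc
    return None

def toy_bb(n: int) -> int:
    """Largest integer named by a halting toy program of at most n bits."""
    best = 0
    for length in range(1, n + 1):
        for prog in product("01", repeat=length):
            out = run("".join(prog))
            if out is not None:
                best = max(best, out)
    return best
```

In this toy, `toy_bb` takes the values 1, 2, 4, 16, 256 at n = 2, 4, 6, 8, 10. The growth is fast but perfectly computable, which is the contrast with the genuine Busy Beaver function.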
The key point is that BB(N) grows faster than any computable function
of N, because otherwise there would be an algorithm for solving the halt-
ing problem, which earlier in this lecture we showed is impossible using an
information-theoretic argument.
BB(N) grows faster than any computable function of N.
At the level of abstraction I am working with in this model, there is no
essential diﬀerence between mathematical creativity and biological creativity.
Now let’s turn to Neil Shubin’s book Your Inner Fish, that he summarized
in his article in the January 2009 special issue of Scientiﬁc American devoted
to Darwin’s bicentennial.
I don’t know much about biology, but I do have a lot of experience with
software. Besides my theoretical work, during my career at IBM I worked
on large software projects, compilers, operating systems, that kind of stuﬀ.
You can’t start a large software project over from scratch, you just patch it
and add new function as best you can.
And as Shubin spells it out, that’s exactly what Nature does too. Think
of yourself as an extremely large piece of software that has been patched
and modiﬁed for more than a billion years to adapt us to new ecological
niches. We were not designed from scratch to be human beings, to be bipedal
primates. Some of the design is like that of ﬁsh, some is like that of an
amphibian.
If you could start over, you could design human beings much better. But
you can’t start over. You have to make do with what you have as best you
can. Evolution makes the minimum changes needed to adapt to a changing
environment.
As François Jacob used to emphasize, Nature is a cobbler, a handyman,
a bricoleur. We are all random walks in program space!
So those are the main ideas, and now let me present my model. I have a
single software organism and I try making random mutations. These could be
high-level language mutations (copy a subroutine with a change), or low-level
mutations.
Initially, I have chosen point mutations: insert, delete or change one or
more bits. As the number of bits that are mutated increases, the probability
of the mutation drops oﬀ exponentially. Also I favor the beginning of the
program. The closer to the beginning a bit is, the more likely it is to change.
Again, that drops oﬀ exponentially.
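A point-mutation sampler in this spirit might look as follows. The exact distribution is an assumption made for illustration (the details of the real one are in the EATCS article): both the number of affected bits and the position are drawn with probability halving at each step, so short mutations near the beginning are favored.

```python
import random

def geometric(rng: random.Random) -> int:
    """Return k >= 1 with probability 2**-k."""
    k = 1
    while rng.random() < 0.5:
        k += 1
    return k

def mutate(bits: str, rng: random.Random) -> str:
    """Insert, delete or flip a run of adjacent bits (illustrative sketch)."""
    pos = min(geometric(rng) - 1, len(bits))  # position, favoring the start
    count = geometric(rng)                    # number of adjacent bits affected
    kind = rng.choice(["insert", "delete", "change"])
    if kind == "insert":
        fresh = "".join(rng.choice("01") for _ in range(count))
        return bits[:pos] + fresh + bits[pos:]
    if kind == "delete":
        return bits[:pos] + bits[pos + count:]
    # "change": flip the affected bits in place
    span = bits[pos:pos + count]
    flipped = "".join("1" if b == "0" else "0" for b in span)
    return bits[:pos] + flipped + bits[pos + len(span):]
```

Calling `mutate("0101", random.Random(0))` repeatedly yields nearby bit strings most of the time and, much more rarely, strings that differ in many bits.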
So I try a random mutation and I see what is the ﬁtness of the resulting
organism. I am only interested in programs that calculate a single positive
integer then halt, and the bigger the integer the ﬁtter the program. So if the
mutated organism is more ﬁt, it becomes my current organism. Otherwise I
keep my current organism and continue trying mutations.
By the way, to actually do this you would need to have an oracle for the
halting problem, since you have to skip mutations that give you a program
that never halts.
There is a non-zero probability that a single mutation will take us from
an organism to any other, which ensures we will not get stuck at a local
maximum.
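The loop just described can be sketched on a toy language in which halting is decidable, so that a `None` result plays the role of the oracle's verdict "never halts". Everything here (the 2-bit instruction set, the mutation mix, the cap on program length) is a hypothetical simplification, not the model used in the actual proofs.

```python
import random

def run(bits: str):
    """Toy interpreter: 00 adds 1, 01 doubles, 10 squares, 11 halts.

    Returns None when the program runs off its end without halting.
    """
    acc = 1
    for i in range(0, len(bits) - 1, 2):
        op = bits[i:i + 2]
        if op == "00":
            acc += 1
        elif op == "01":
            acc *= 2
        elif op == "10":
            acc *= acc
        else:
            return acc
    return None

def random_program(rng: random.Random, max_instructions: int = 20) -> str:
    """Random program whose length drops off exponentially.

    The cap keeps the toy cheap; it also limits one-step reachability
    to programs of at most `max_instructions` instructions.
    """
    k = 1
    while k < max_instructions and rng.random() < 0.5:
        k += 1
    return "".join(rng.choice("01") for _ in range(2 * k))

def mutate(bits: str, rng: random.Random) -> str:
    """Half the time flip one bit, otherwise jump to a fresh random program."""
    if rng.random() < 0.5:
        i = rng.randrange(len(bits))
        return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]
    return random_program(rng)

def evolve(steps: int, seed: int = 0):
    """Hill-climbing random walk: keep a mutant only if it is fitter."""
    rng = random.Random(seed)
    organism = "11"              # the trivial program that outputs 1
    fitness = run(organism)
    history = [fitness]
    for _ in range(steps):
        candidate = mutate(organism, rng)
        out = run(candidate)     # None stands in for "the oracle says it never halts"
        if out is not None and out > fitness:
            organism, fitness = candidate, out
            history.append(fitness)
    return organism, history
```

The returned `history` is the sequence of successively fitter organisms' outputs; by construction it is strictly increasing.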
For the details, you can see my article in the February 2009 EATCS
Bulletin, the Bulletin of the European Association for Theoretical Computer
Science. I don’t think that the details matter too much. There are a lot of
parameters that you can vary in this model and probably still get things to
work. For example, you can change the programming language or you can
change the mutation model.
As a matter of fact, the programming language I’ve used is one of the
universal Turing machines in AIT1960 that I discussed in my Monday lecture
at Perimeter. I picked it not because it is the right choice, but because it is
a programming language I know well.
Okay, we have a random walk in software space with increasing ﬁtness.
How well will this do? I’m not sure but I’ll tell you what I can prove.
First of all, the mutation model is set up in such a way that in time of the
order of $2^N$ a single mutation will try adding a self-contained N-bit prefix
that calculates BB(N) and ignores the rest of the program; the time is the
number of mutations that have been considered. The size of the organism in
bits will grow at most as $2^N$, the fitness will grow as BB(N). So the fitness
will grow faster than any computable function, which shows that biological
creativity is taking place; for if an organism is improved mechanically via an
algorithm and without any creativity, then its ﬁtness will only increase as a
computable function.
Theorem 1
With high probability
fitness[time of order $2^N$] ≥ BB(N)
which grows faster than any computable function of N.
We can prove that evolution will occur but our proof is not very interesting
since the time involved is large enough for evolution to randomly try all
the possibilities. And, most important of all, in this proof evolution is not
cumulative. In eﬀect, we are starting from scratch each time.
Now, I think that the behavior of this model will actually be cumulative
but I can’t prove it.
However, I can show that there is what might be called an ideal evolution-
ary pathway, which is a sequence of organisms having ﬁtness that grows as
BB(N) and size in bits that grows as N, and the mutation distance between
successive organisms is bounded.
That is encouraging. It shows that there are intermediate forms that are
ﬁtter and ﬁtter. The only problem is that this pathway is unstable and not
a likely random walk; it’s an ideal evolutionary pathway.
This is my second theorem, and the organisms are reversed initial seg-
ments of the bits of the halting probability Ω (plus a ﬁxed preﬁx). If you are
given an initial portion of Ω, K bits in fact, then you can ﬁnd which ≤ K
bit programs halt and see which one produces the largest positive integer,
which is by deﬁnition BB(K). Furthermore the mutation distance between
reversed initial segments of Ω is not large because you are only adding one
bit at a time.
Theorem 2
There is a sequence of organisms $O_K$ with the property that:
$O_K$ = the first K bits of Ω,
fitness[$O_K$] = BB(K),
mutation-distance[$O_K$, $O_{K+1}$] < c.
(If we could shield the preﬁx from mutations, and we picked as successor to
each organism the ﬁttest organism within a certain ﬁxed mutation distance
neighborhood, then this ideal evolutionary pathway would be followed.)
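The step "given K bits of Ω, find which ≤ K-bit programs halt" can be illustrated on a tiny made-up machine. The table below is entirely hypothetical: it lists a prefix-free set of programs with the step at which each halts (or `None` if it never does) and its output. The decision procedure itself only ever observes which programs have halted so far; the `None` entries are consulted solely to simulate running the machine.

```python
from fractions import Fraction

# Hypothetical toy machine: program -> (halting step or None, output).
TOY_MACHINE = {
    "0": (3, 7),
    "10": (None, None),     # never halts
    "110": (1, 2),
    "1110": (50, 1000),
    "11110": (None, None),  # never halts
    "111110": (5, 12),
}

def omega() -> Fraction:
    """Halting probability of the toy machine (what the oracle would supply)."""
    return sum((Fraction(1, 2 ** len(p))
                for p, (t, _) in TOY_MACHINE.items() if t is not None),
               Fraction(0))

def first_bits(x: Fraction, k: int) -> Fraction:
    """x truncated to its first k binary digits."""
    return Fraction(int(x * 2 ** k), 2 ** k)

def halting_programs_up_to(k: int, omega_prefix: Fraction) -> dict:
    """Run all programs in parallel until the lower bound on Omega reaches
    the given prefix; any program of at most k bits still running then
    can never halt."""
    step = 0
    while True:
        halted = {p: out for p, (t, out) in TOY_MACHINE.items()
                  if t is not None and t <= step}
        lower_bound = sum((Fraction(1, 2 ** len(p)) for p in halted),
                          Fraction(0))
        if lower_bound >= omega_prefix:
            return {p: out for p, out in halted.items() if len(p) <= k}
        step += 1

def toy_bb_from_omega(k: int) -> int:
    """Largest output among halting programs of at most k bits."""
    halted = halting_programs_up_to(k, first_bits(omega(), k))
    return max(halted.values(), default=0)
```

Here Ω = 45/64 = .101101 in binary. With 3 bits of Ω the procedure stops at step 3 and reports 7; with 4 bits it must wait until step 50 and reports 1000. This sketch covers only the Ω-to-BB(K) step, not the reversed encoding or the fixed prefix of the organisms in Theorem 2.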
These are my two theorems. The half-theorem is the fact that a sequence
of software organisms with increasing ﬁtness and bounded mutation distance
does not depend on the choice of universal Turing machine, because adding
a ﬁxed preﬁx to each organism keeps the mutation distance bounded.
Theorem 2.5
That a sequence of organisms $O_K$ has the property that
mutation-distance[$O_K$, $O_{K+1}$] < c
does not depend on the choice of universal Turing machine.
Okay, so at this point there are only these 2 1/2 theorems. Not very
impressive. As I said, at this time metabiology is a ﬁeld with a lovely name
but not much content.
However, I am hopeful. I feel that some kind of evolving software model
should work. There are a lot of parameters to vary, a lot of knobs to tweak.
The question is, how biological will the behavior of these models be? In
particular I would like to know if hierarchical structure will emerge.
The human genome is a very large piece of software: about a gigabyte of
DNA.
Large computer programs must be structured, they cannot be spaghetti
code; otherwise they cannot be debugged and maintained. Software organ-
isms, I suspect, can also beneﬁt from such discipline. Then a useful muta-
tion is likely to be small and localized, rather than involve many coordinated
changes scattered throughout the organism, which is much less likely.
Software engineering practice has a lot of experience with large software
projects, which may also be relevant to randomly evolving software organ-
isms, and perhaps indirectly to biology.
Clearly, there is a lot of work to be done.
As Dobzhansky said, nothing in biology makes sense except in the light
of evolution, and I think that a randomly evolving software approach can
give us some insight.
Thank you very much!
Chapter 9
To a mathematical theory of
evolution and biological
creativity
To be published in H. Zenil, Computation in Nature & The Nature of Computation, World Scientiﬁc, 2012.
Abstract: We present an information-theoretic analysis of Darwin’s
theory of evolution, modeled as a hill-climbing algorithm on a ﬁtness
landscape. Our space of possible organisms consists of computer pro-
grams, which are subjected to random mutations. We study the random
walk of increasing ﬁtness made by a single mutating organism. In two
diﬀerent models we are able to show that evolution will occur and to char-
acterize the rate of evolutionary progress, i.e., the rate of biological creativity.
Key words and phrases: metabiology, evolution of mutating soft-
ware, random walks in software space, algorithmic information theory
9.1 Introduction
For many years we have been disturbed by the fact that there is no fun-
damental mathematical theory inspired by Darwin’s theory of evolution
[1, 2, 3, 4, 5, 6, 7, 8, 9]. This is the fourth paper in a series [10, 11, 12]
attempting to create such a theory.
In a previous paper [10] we did not yet have a workable mathematical
framework: We were able to prove two not very impressive theorems, and
then the way forward was blocked. Now we have what appears to be a good
mathematical framework, and have been able to prove a number of theorems.
Things are starting to work, things are starting to get interesting, and there
are many technical questions, many open problems, to work on.
So this is a working paper, a progress report, intended to promote interest
in the ﬁeld and get others to participate in the research. There is much to
be done.
In order to present the ideas as clearly as possible and not get bogged
down in technical details, the material is presented more like a physics paper
than a math paper. Estimates are at times rather sloppy. We are trying to
get an idea of what is going on. The arguments concerning the basic math
framework are however very precise; that part is done more or less like a
math paper.
9.2 History of Metabiology
In the ﬁrst paper in this series [10] we proposed modeling biological evo-
lution by studying the evolution of randomly mutating software—we call
this metabiology. In particular, we proposed considering a single mutating
software organism following a random walk in software space of increasing
fitness. Besides that, the main contribution of [10] was to use the Busy Beaver
problem to challenge organisms into evolving. The larger the positive integer
that a program names, the ﬁtter the program.
And we measured the rate of evolutionary progress using the Busy Beaver
function BB(N) = the largest integer that can be named by an N-bit pro-
gram. Our two results employing the framework in [10] are that
• with random mutations, random point mutations, we will get to ﬁtness
BB(N) in time exponential in N (evolution by exhaustive search) [10,
11],
• whereas by choosing the mutations by hand and applying them in the
right order, we will get to ﬁtness BB(N) in time linear in N (evolution
by intelligent design) [11, 12].
We were unable to show that cumulative evolution will occur at random;
exhaustive search starts from scratch each time.¹
This paper advances beyond the previous work on metabiology [10, 11, 12,
13] by proposing a better concept of mutation. Instead of changing, deleting
or inserting one or more adjacent bits in a binary program, we now have
high-level mutations: we can use an arbitrary algorithm M to map the organism A
into the mutated organism A′ = M(A). Furthermore, the probability of the
mutation M is now furnished by algorithmic information theory: it depends
on the size in bits of the self-delimiting program for M. It is very important
that we now have a natural, universal probability distribution on the space
of all possible mutations, and that this is such a rich space.
Using this new notion of mutation, these much more powerful mutations,
enables us to accomplish the following:
• We are now able to show that random evolution will become cumulative
and will reach fitness BB(N) in time that grows roughly as $N^2$, so
that random evolution behaves much more like intelligent design than
it does like exhaustive search.²
• We also have a version of our model in which we can show that hi-
erarchical structure will evolve, a conspicuous feature of biological
organisms that previously [10] was beyond our reach.
This is encouraging progress, and suggests that we may now have the
correct version of these biology-inspired concepts. However there are many
serious lacunae in the theory as it currently stands. It does not yet deserve
to be called a mathematical theory of evolution and biological creativity; at
best, it is a sketch of a possible direction in which such a theory might go.
On the other hand, the new results are encouraging, and we feel it would
be inappropriate to sit on these results until all the lacunae are ﬁlled. After
all, that would take an entire book, since metabiology is, or will hopefully
become, a rich and entirely new ﬁeld.
That said, the reader will understand that this is a working paper, a
progress report, to show the direction in which the theory is developing, and
to indicate problems that need to be solved in order to advance, in order
to take the next step. We hope that this paper will encourage others to
participate in developing metabiology and exploring its potential.
¹ The Busy Beaver function BB(N) grows faster than any computable function. That
evolution is able to “compute” the uncomputable function BB(N) is evidence of creativity
that cannot be achieved mechanically. This is possible only because our model of
evolution/creativity utilizes an uncomputable Turing oracle. Our model utilizes the oracle in a
highly constrained manner; otherwise it would be easy to calculate BB(N).
² Most unfortunately, it is not yet demonstrated that random evolution cannot be as
fast as intelligent design.
9.3 Modeling Evolution
9.3.1 Software Organisms
In this paper we follow a metabiological [10, 11, 12, 13] approach: Instead of
studying the evolution of actual biological organisms we study the evolution
of software subjected to random mutations. In order to do this we use tools
from algorithmic information theory (AIT) [13, 14, 15, 16, 17, 18, 19]; to
fully understand this paper expert understanding of AIT is unfor-
tunately necessary (see the outline in the Appendix).
As our programming formalism we employ one of the optimal self-
delimiting binary universal Turing machines U of AIT [14], and also, but
only in Section 9.7, a primitive FORTRAN-like language that is not univer-
sal.
So our organisms consist on the one hand of arbitrary self-delimiting
binary programs p for U, or on the other hand of certain FORTRAN-like
computer programs. These are the respective software spaces in which we
shall be working, and in which we will study hill-climbing random walks.
9.3.2 The Hill-Climbing Algorithm
In our models of evolution, we deﬁne a hill-climbing random walk as follows:
We start with a single software organism A and subject it to random mutations
until a fitter organism A′ is obtained, then subject that organism
to random mutations until an even fitter organism A″ is obtained, etc. In
one of our models, organisms calculate natural numbers, and the bigger the
number, the ﬁtter the organism. In the other, organisms calculate functions
that map a natural number into another natural number, and the faster the
function grows, the ﬁtter the organism.
In this connection, here is a useful piece of terminology: A mutation M
succeeds if A′ = M(A) is fitter than A; otherwise M is said to fail.
9.3.3 Fitness
In order to get our software organisms to evolve it is important to present
them with a challenge, to give them something diﬃcult to do. Three well-
known problems requiring unlimited amounts of mathematical creativity are:
• Model A: Naming large natural numbers (non-negative integers) [20,
21, 22, 23],
• Model B: Deﬁning extremely fast-growing functions [24, 25, 26],
• Model C: Naming large constructive Cantor ordinal numbers [26, 27].
So a software organism will be judged to be more ﬁt if it calculates a larger
integer (our Model A, Sections 9.4, 9.5, 9.6), or if it calculates a faster-
growing function (our Model B, Section 9.7). Naming large Cantor ordinals
(Model C) is left for future work, but is brieﬂy discussed in Section 9.8.
9.3.4 What is a Mutation?
Another central issue is the concept of a mutation. Biological systems are
subjected to point mutations, localized changes in DNA, as well as to high-level
mutations such as copying an entire gene and then introducing changes
in it. Initially [10] we considered mutating programs by changing, deleting
or adding one or more adjacent bits in a binary program, and postponed
working with high-level source language mutations.
Here we employ an extremely general notion of mutation: A mutation
is an arbitrary algorithm that transforms, that maps the original organism
into the mutated organism. It takes as input the organism, and produces as
output the mutated organism. And if the mutation is an n-bit program, then
it has probability $2^{-n}$. In order to have the total probability of mutations be
≤ 1 we use the self-delimiting programs of AIT [14].³
³ The total probability of mutations is actually < 1, so that each time we pick a mutation
at random, there is a fixed probability that we will get the null mutation M(A) = A, which
always fails.
9.3.5 Mutation Distance
A second crucial concept is mutation distance, how difficult it is to get from
organism A to organism B. We measure this distance in bits and it is defined
to be $-\log_2$ of the probability that a random mutation will change A to B.
Using AIT [14, 15, 16], we see that this is nearly H(B|A), the size in bits of
the smallest self-delimiting program that takes A as input and produces B
as output.⁴
More precisely,
$$H(B|A) = -\log_2 P(B|A) + O(1) = -\log_2 \sum_{U(p|A)=B} 2^{-|p|} + O(1). \tag{9.1}$$
Here |p| denotes the size in bits of the program p, and U(p|A) denotes the
output produced by running p given input A on the computer U until p halts.
The deﬁnition of H(B|A) that we employ here is somewhat diﬀerent from
the one that is used in AIT: a mutation is given A directly, it is not given a
minimum-size program for A. Nevertheless, (9.1) holds [14].
Interpreting (9.1) in words, it is nearly the same to consider the simplest
mutation from A to B, which is H(B|A) bits in size and has probability
$2^{-H(B|A)}$, as to sum the probability over all the mutations that carry A into
B.
Note that this distance measure is not symmetric. For example, it is easy
to change (X, Y ) into Y , but not vice versa.
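The relationship in (9.1) can be checked by hand on a tiny, made-up set of mutations. The table below is a hypothetical prefix-free code (so the probabilities sum to less than 1), in which two different programs happen to perform the same duplication. It is a sketch of the bookkeeping only, not of a universal machine U.

```python
from math import inf, log2

# Hypothetical prefix-free mutation programs; an n-bit program has probability 2**-n.
MUTATIONS = {
    "0": lambda a: a,                    # null mutation
    "10": lambda a: a + a,               # duplicate the organism
    "110": lambda a: a[::-1],            # reverse it
    "1110": lambda a: a + a,             # a second, longer program for duplication
    "11110": lambda a: a[len(a) // 2:],  # keep only the second half: (X, Y) -> Y
}

def mutation_probability(a: str, b: str) -> float:
    """P(B|A): total probability of all mutations that carry a into b."""
    return sum(2.0 ** -len(p) for p, m in MUTATIONS.items() if m(a) == b)

def mutation_distance(a: str, b: str) -> float:
    """-log2 P(B|A), or infinity if no mutation in the table reaches b."""
    prob = mutation_probability(a, b)
    return -log2(prob) if prob > 0 else inf

def simplest_mutation_bits(a: str, b: str):
    """Size of the smallest mutation carrying a into b (the analogue of H(B|A))."""
    sizes = [len(p) for p, m in MUTATIONS.items() if m(a) == b]
    return min(sizes) if sizes else None
```

From `"ab"` to `"abab"` the simplest mutation has 2 bits, while the summed probability 1/4 + 1/16 gives a distance of about 1.68 bits, so the two agree up to a constant. The asymmetry also shows up: `"abcd"` reaches `"cd"` at distance 5, but in this small table nothing carries `"cd"` back to `"abcd"`.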
9.3.6 Hidden Use of Oracles
There are two hidden assumptions here. First of all, we need to use an oracle
to compare the fitness of an organism A with that of a mutated organism A′.
This is because a mutated program may not halt and thus never produces a
natural number. Once we know that the original organism A and the mutated
organism A′ both halt, then we can run them to see what they calculate and
which is fitter.
In the case of fast-growing computable functions, an oracle is deﬁnitely
needed to see if one grows faster than another; this cannot be determined by
running the primitive recursive functions [29] calculated by the FORTRAN-
like programs that we will study later, in Section 9.7.
Just as oracles would be needed to actually find fitter organisms, they
are also necessary because a random mutation may never halt and produce a
mutated organism. So to actually apply our random mutations to organisms
we would need to use an oracle in order to avoid non-terminating mutations.
⁴ Similarly, H(B) denotes the size in bits of the smallest self-delimiting program for
B that is not given A. H(B) is called the complexity of B, and H(B|A) is the relative
complexity of B given A.
9.4 Model A (Naming Integers) Exhaustive Search
9.4.1 The Busy Beaver Function
The ﬁrst step in this metabiological approach is to measure the rate of evo-
lution. To do that, we introduce this version of the Busy Beaver function:
BB(N) = the biggest natural number named by a ≤ N-bit program.
More formally,
$$BB(N) = \max_{H(k) \le N} k.$$
Here the program-size complexity or the algorithmic information content
H(k) of k is the size in bits of the smallest self-delimiting program p without
input for calculating k:
$$H(k) = \min_{U(p)=k} |p|.$$
Here again |p| denotes the size in bits of p, and U(p) denotes the output
produced by running the program p on the computer U until p halts.
9.4.2 Proof of Theorem 1 (Exhaustive Search)
Now, for the sake of deﬁniteness, let’s start with the trivial program that
directly outputs the positive integer 1, and apply mutations at random.⁵
Let’s deﬁne the mutation time to be n if we have tried n mutations, and
the organism time to be n if there are n successive organisms of increasing
ﬁtness so far in our inﬁnite random walk.
⁵ The choice of initial organism is actually unimportant.
From AIT [14] we know that there is an N + O(1)-bit mutation that
ignores its input and produces as output a ≤ N-bit program that calculates
BB(N). This mutation M has probability $2^{-N+O(1)}$ and on the average, it will
occur at random every $2^{N+O(1)}$ times a random mutation is tried. Therefore:
Theorem 1 The fitness of our organism will reach BB(N) by mutation time
$2^N$. In other words, we will achieve N bits of biological/mathematical creativity
by time $2^N$. Each successive bit of creativity takes twice as long as the
previous bit did.⁶
More precisely, the probability that this should fail to happen, the
probability that M has not been tried by time $2^N$, is
$$\left(1 - \frac{1}{2^N}\right)^{2^N} \to e^{-1} \approx \frac{1}{2.7} < \frac{1}{2}.$$
And the probability that it will fail to happen by mutation time $K 2^N$ is
$< 1/2^K$.
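These two estimates are easy to check numerically. The sketch below assumes, as the estimate itself does, that the mutation M has probability exactly $2^{-N}$ on each independent trial; the function name is illustrative.

```python
from math import exp

def miss_probability(n_bits: int, k: int = 1) -> float:
    """Probability that a mutation of probability 2**-n_bits has not
    occurred in k * 2**n_bits independent trials."""
    m = 2 ** n_bits
    return (1 - 1 / m) ** (k * m)

# The values approach 1/e from below as N grows, and K-fold waiting
# drives the failure probability below 2**-K.
values = [miss_probability(n) for n in range(1, 21)]
```

For N = 1 the value is exactly 1/4, and it rises toward 1/e ≈ 0.368 as N increases, always staying below 1/2.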
This is the worst that evolution can do. It is the fitness that organisms
will achieve if we are employing exhaustive search on the space of all possible
organisms. Actual biological evolution is not at all like that. The human
genome has $3 \times 10^9$ bases, but in the mere $4 \times 10^9$ years of life on this planet
only a tiny fraction of the total enormous number $4^{3 \times 10^9}$ of sequences of
$3 \times 10^9$ bases can have been tried. In other words, evolution is not ergodic.
9.5 Model A (Naming Integers) Intelligent Design
9.5.1 Another Busy Beaver Function
If we could choose our mutations intelligently, evolution would be much more
rapid. Let’s use the halting probability Ω [19] to show just how rapid. First
we define a slightly different Busy Beaver function $BB'$ based on Ω.
Consider a fixed recursive/computable enumeration
$\{p_i : i = 0, 1, 2, \ldots\}$ without repetitions of all the programs without
input that halt when run on U. Thus
$$0 < \Omega = \Omega_U = \sum_i 2^{-|p_i|} < 1 \tag{9.2}$$
and we get the following sequence $\Omega_0 = 0 < \Omega_1 < \Omega_2 < \ldots$
of lower bounds on Ω:
$$\Omega_N = \sum_{i<N} 2^{-|p_i|}. \tag{9.3}$$
6
Instead of bits of creativity one could perhaps refer to bits of inspiration; said inspira-
tion of course is ultimately coming through/from our oracle, which keeps us from getting
stuck on non-terminating programs.
In (9.2) and (9.3) |p| denotes the size in bits of p, as before.
We define $BB'(K)$ to be the least N for which the first K bits of the
base-two numerical value of $\Omega_N$ are correct, i.e., the same as the first
K bits of the numerical value of Ω. $BB'(K)$ exists because we know from
AIT [14] that Ω is irrational, so Ω = .010000... is impossible and there is no
danger that $\Omega_N$ will be of the form .0011111... with 1's forever.
Note that $BB$ and $BB'$ are approximately equal. For we can calculate
$BB'(N)$ if we are given N and the first N bits of Ω. Therefore
$$BB'(N) \le BB(N + H(N) + c) = BB(N + O(\log N)).$$
Furthermore, if we knew N and any $M \ge BB'(N)$, we could calculate the
string ω of the first N bits of Ω, which according to AIT [14] has complexity
$H(\omega) > N - c'$, so
$$N - c' < H(\omega) \le H(N) + H(M) + c''.$$
Therefore $BB'(N)$ and all numbers $M \ge BB'(N)$ have complexity
$H(M) > N - H(N) - c' - c''$, so $BB'(N)$ must be greater than the biggest
number $M_0$ with complexity $H(M_0) \le N - H(N) - c' - c''$. Therefore
$$BB'(N) > BB(N - H(N) - c' - c'') = BB(N + O(\log N)).$$
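To make the definition of $BB'$ concrete, here is a small sketch on invented data. It assumes a finite list `TOY_SIZES` of program sizes $|p_i|$ standing in for the infinite enumeration, so the resulting toy Ω is a dyadic rational, unlike the real Ω, which is irrational. All names are hypothetical.

```python
from fractions import Fraction

# Invented toy data: sizes in bits |p_i| of the halting programs, in the order
# in which a hypothetical enumeration discovers them. A real enumeration is infinite.
TOY_SIZES = [1, 3, 4, 6, 5]

def omega_lower_bounds(sizes):
    """Return [Omega_0, Omega_1, ...] with Omega_N = sum over i < N of 2**-|p_i|."""
    bounds = [Fraction(0)]
    for size in sizes:
        bounds.append(bounds[-1] + Fraction(1, 2 ** size))
    return bounds

def first_bits(x, k):
    """First k bits after the binary point of a rational 0 <= x < 1, as an integer."""
    return int(x * 2 ** k)   # floor, since x is non-negative

def bb_prime(k, sizes=TOY_SIZES):
    """Least N such that the first k bits of Omega_N agree with those of the toy Omega."""
    bounds = omega_lower_bounds(sizes)
    omega = bounds[-1]
    for n, omega_n in enumerate(bounds):
        if first_bits(omega_n, k) == first_bits(omega, k):
            return n

if __name__ == "__main__":
    print([bb_prime(k) for k in range(1, 7)])
```

In this toy setting the values are small; for the real Ω the function $BB'$ grows faster than any computable function, which is the point of the comparison with BB.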
9.5.2 Improving Lower Bounds on Ω
Our model consists of arbitrary mutation computer programs operating on
arbitrary organism computer programs. To analyze the behavior of this
system (Model A), however, we shall focus on a select subset: Our organisms
are lower bounds on Ω, and our mutations increase these lower bounds.
We are going to use these same organisms and mutations to analyze
both intelligent design (Section 9.5.3) and cumulative evolution at random
(Section 9.6). Think of Section 9.5.3 versus Section 9.6 as counterpoint.
Organism $P_\rho$: Lower Bound ρ on Ω
Now we use a bit string ρ to represent a dyadic rational number in [0, 2) =
{0 ≤ x < 2}; ρ consists of the base-two units “digit” followed by the base-two
expansion of the fractional part of this rational number.
There is a self-delimiting prefix $\pi_\Omega$ that given a bit string ρ that
is a lower bound on Ω, calculates the first N such that
$\Omega > \Omega_N \ge \rho$, where $\Omega_N$ is defined as in (9.3).$^{7}$
If we concatenate the prefix $\pi_\Omega$ with the string of bits ρ, and insert
$0^{|\rho|}1$ in front of ρ in order to make everything self-delimiting, we
obtain a program $P_\rho$ for this N.
We will now analyze the behavior of Model A by using these organisms
of the form
$$P_\rho = \pi_\Omega \, 0^{|\rho|} 1 \rho. \tag{9.4}$$
To repeat, the output of $P_\rho$, and therefore its fitness $\phi_{P_\rho}$,
is determined as follows:
$$U(P_\rho) = \text{the first } N \text{ for which } \sum_{i<N} 2^{-|p_i|} = \Omega_N \ge \rho. \tag{9.5}$$
This fitness will be $\ge BB'(K)$ if ρ < Ω and the first K bits of ρ are the
correct base-two numerical value of Ω. $P_\rho$ will fail to halt if
ρ > Ω.$^{8}$
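The self-delimiting packaging of ρ used in (9.4) can be sketched as follows. This is an illustration only: the prefix $\pi_\Omega$ is left out, bit strings are represented as Python strings of "0" and "1", and the function names are invented here.

```python
from fractions import Fraction

def encode_rho(rho_bits):
    """Self-delimiting encoding 0^{|rho|} 1 rho of a bit string rho."""
    return "0" * len(rho_bits) + "1" + rho_bits

def decode_rho(stream):
    """Read one encoded rho from the front of a bit stream.
    Returns (rho, rest_of_stream); no end-of-program marker is needed."""
    length = stream.index("1")          # number of leading zeros equals |rho|
    start = length + 1
    return stream[start:start + length], stream[start + length:]

def rho_value(rho_bits):
    """Dyadic rational in [0, 2): the units bit followed by the fractional bits.
    Assumes rho_bits contains at least the units bit."""
    return Fraction(int(rho_bits, 2), 2 ** (len(rho_bits) - 1))

if __name__ == "__main__":
    packed = encode_rho("0101")
    print(packed)                        # zeros, a 1, then rho itself
    print(decode_rho(packed + "111"))    # trailing bits are left unread
    print(rho_value("0101"))             # 0.101 in base two
```

The leading run of zeros tells the reader how many bits of ρ follow, so the reader can stop by itself; the cost is that the encoding is about twice as long as ρ.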
Mutation $M_k$: Lower Bound ρ on Ω Increased by $2^{-k}$
Consider the mutations $M_k$ that do the following. First of all, $M_k$
computes the fitness φ of the current organism A by running A to determine the
integer $\phi = \phi_A$ that A names. All that $M_k$ takes from A is its
fitness $\phi_A$. Then $M_k$ computes the corresponding lower bound on Ω:
$$\rho = \sum_{i<\phi} 2^{-|p_i|} = \Omega_\phi.$$
Here $\{p_i\}$ is the standard enumeration of all the programs that halt when
run on U that we employed in Section 9.5.1. Then $M_k$ increments the lower
bound ρ on Ω by $2^{-k}$:
$$\rho' = \rho + 2^{-k}.$$
In this way $M_k$ obtains the mutated program
$$A' = P_{\rho'}.$$
$A'$ will fail to halt if ρ′ > Ω. If $A'$ does halt, then
$A' = M_k(A) = P_{\rho'}$ will have fitness N (see (9.5)) greater than
$\phi_A = \phi$ because $\rho' > \rho = \Omega_\phi$, so more halting programs
are included in the sum (9.3) for $\Omega_N$, which therefore has been extended
farther:
$$[\Omega_N \ge \rho' > \rho = \Omega_\phi] \implies [N > \phi].$$
7
That $\rho \ne \Omega$ follows from the fact that Ω is irrational.
8
That $\rho \ne \Omega$ follows from the fact that Ω is irrational.
Therefore if $\Omega > \rho' = \rho + 2^{-k}$, then $M_k$ increases the fitness
of A. If ρ′ > Ω, then $P_{\rho'} = M_k(A)$ never halts and is totally unfit.
9.5.3 Proof of Theorem 2 (Intelligent Design)
Please note that in this toy world, the “intelligent designer” is the author of
this paper, who chooses the mutations optimally in order to get his creatures
to evolve.
Let's now start with the computer program $P_\rho$ with ρ = 0. In other
words, we start with a lower bound on Ω of zero.
Then for k = 1, 2, 3, ... we try applying $M_k$ to $P_\rho$. The mutated
organism $P_{\rho'} = M_k(P_\rho)$ will either fail to halt, or it will have
higher fitness than our previous organism and will replace it. Note that in
general $\rho' \ne \rho + 2^{-k}$, although it could conceivably have that
value. $M_k$ will from $P_\rho$ take only its fitness, which is the first N
such that $\Omega_N \ge \rho$:
$$\rho' = \Omega_N + 2^{-k} \ge \rho + 2^{-k}.$$
So ρ′ is actually equal to a lower bound on Ω, $\Omega_N$, plus $2^{-k}$. Thus
$M_k$ will attempt to increase a lower bound on Ω, $\Omega_N$, by $2^{-k}$.
$M_k$ will succeed if Ω > ρ′. $M_k$ will fail if ρ′ > Ω. This is the situation
at the end of stage k. Then we increment k and repeat. The lower bounds on Ω
will get higher and higher.
More formally, let $O_0 = P_\rho$ with ρ = 0. And for k ≥ 1 let
$$O_k = \begin{cases} O_{k-1} & \text{if } M_k \text{ fails,} \\ M_k(O_{k-1}) & \text{if } M_k \text{ succeeds.} \end{cases}$$
Each $O_k$ is a program of the form $P_\rho$ with Ω > ρ.
At the end of stage k in this process the first k bits of ρ will be exactly
the same as the first k bits of Ω, because at that point all together we have
tried adding $1/2 + 1/4 + 1/8 + \cdots + 1/2^k$ to ρ. In essence, we are using
an oracle to determine the value of Ω by successive interval halving.$^{9}$
In other words, at the end of stage k the first k bits of ρ in $O_k$ are
correct. Hence:
9
That this works is easy to see visually. Think of the unit interval drawn vertically,
with 0 below and 1 above. The intervals are being pushed up after being halved, but it is
still the case that Ω remains inside each halved interval, even after it has been pushed up.
Theorem 2 By picking our mutations intelligently rather than at random,
we obtain a sequence $O_N$ of software organisms with non-decreasing
fitness$^{10}$ for which the fitness of each organism is $\ge BB'(N)$. In
other words, we will
achieve N bits of biological/mathematical creativity in mutation time linear
in N. Each successive bit of creativity takes about as long as the previous bit
did.
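The staged procedure behind Theorem 2 can be simulated on toy data. The sketch below assumes a finite, invented list of program sizes in place of the real enumeration, and it replaces the halting oracle by a direct comparison with the (here known) toy Ω. The names `TOY_SIZES`, `omega_n`, `fitness_of` and `intelligent_design` are hypothetical.

```python
from fractions import Fraction

TOY_SIZES = [1, 3, 4, 6, 5]   # invented |p_i| values; a real enumeration is infinite
OMEGA = sum(Fraction(1, 2 ** s) for s in TOY_SIZES)

def omega_n(n):
    """Omega_N: the sum of 2**-|p_i| over the first n toy programs."""
    return sum((Fraction(1, 2 ** s) for s in TOY_SIZES[:n]), Fraction(0))

def fitness_of(rho):
    """Output of P_rho: the first N with Omega_N >= rho (requires rho < OMEGA)."""
    return next(n for n in range(len(TOY_SIZES) + 1) if omega_n(n) >= rho)

def intelligent_design(stages):
    """Apply M_1, M_2, ..., M_stages in order; return the fitness after each stage."""
    fitness = fitness_of(Fraction(0))          # O_0 = P_rho with rho = 0
    history = []
    for k in range(1, stages + 1):
        rho_prime = omega_n(fitness) + Fraction(1, 2 ** k)
        if rho_prime < OMEGA:                  # stand-in for the oracle: M_k succeeds
            fitness = fitness_of(rho_prime)
        history.append(fitness)                # unchanged if M_k failed
    return history

if __name__ == "__main__":
    for k, phi in enumerate(intelligent_design(6), start=1):
        print(k, phi, float(OMEGA - omega_n(phi)))
```

On this data the gap between Ω and the current lower bound is at most $2^{-k}$ after stage k, which is the interval-halving picture described above. Nothing is claimed here beyond the toy data.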
However, successive mutations must be tried at random in our evolution
model; they cannot be chosen deliberately. We see in these two theorems two
extremes: Theorem 1, brainless exhaustive search, and Theorem 2, intelligent
design. What can real, random evolution actually achieve? We shall see that
the answer is closer to Theorem 2 than to Theorem 1. We will achieve fitness
$BB'(N)$ in time roughly of order $N^2$. In other words, each successive bit
of creativity takes an amount of time which increases linearly in the number
of bits.
Open Problem 1 Is this the best that can be done by picking the mutations
intelligently rather than at random? Or can creativity be even faster than
linear? Does each use of the oracle yield only one bit of creativity?$^{11}$
Open Problem 2 In Theorem 2 how fast does the size in bits of the organism
$O_N$ grow? By using entirely different mutations intelligently, would it be
possible to have the size in bits of the organism $O_N$ grow linearly, or,
alternatively, for the mutation distance between $O_N$ and $O_{N+1}$ to be
bounded, and still achieve the same rapid growth in fitness?
Open Problem 3 In Theorem 2 how many diﬀerent organisms will there
be by mutation time N? I.e., on the average how fast does organism time
grow as a function of mutation time?
10
Note that this is actually a legitimate fitness increasing (non-random) walk
because the fitness increases each time that $O_N$ changes, i.e., each time
that $O_{N+1} \ne O_N$.
11
Yes, only one bit of creativity, otherwise Ω would be compressible. In fact,
the sequence of oracle replies must be incompressible.
9.6 Model A (Naming Integers) Cumulative Evolution at Random
Now we shall achieve what Theorem 2 achieved by intelligent design, by using
randomness instead. Since the order of our mutations will be random, not
intelligent, there will be some duplication of effort and creativity is
delayed, but not overmuch.
In other words, instead of being used in a predetermined order, the mutations
$M_k$ shall be picked at random, and also mixed together with other mutations
that increase the fitness.
As you will recall (Section 9.5.2), a larger and larger positive integer is
equivalent to a better and better lower bound on Ω. That will be our clock,
our memory. We will again be evolving better and better lower bounds ρ on
Ω and we shall make use of the organisms $P_\rho$ as before ((9.4),
Section 9.5.2). We will also use again the mutations $M_k$ of Section 9.5.2.
Let’s now study the behavior of the random walk in Model A if we start
with an arbitrary program A that has a ﬁtness, for example, the program
that is the constant 0, and apply mutations to it at random, according to
the probability measure on mutations determined by AIT [14], namely that
M has probability $2^{-H(M)}$.$^{12}$ So with probability one, every mutation
will be tried infinitely often; M will be tried roughly every $2^{H(M)}$
mutation times.
At any given point in this random walk, we can measure our progress to
Ω by the fitness $\phi = \phi_A$ of our current organism A and the
corresponding lower bound $\Omega_\phi = \Omega_{\phi_A}$ on Ω. Since the
fitness φ can only increase, the lower bound $\Omega_\phi$ can only get better.
In our analysis of what will happen we focus on the mutations $M_k$; other
mutations will have no effect on the analysis. They are harmless and can
be mixed in together with the $M_k$. By increasing the fitness, they can only
make $\Omega_\phi$ converge to Ω more quickly.
We also need a new mutation $M^*$. $M^*$ doesn't get us much closer to Ω,
it just makes sure that our random walk will contain infinitely many of the
programs $P_\rho$. $M^*$ will be tried roughly periodically during our random
walk. $M^*$ takes the current lower bound $\Omega_\phi = \Omega_{\phi_A}$ on Ω,
and produces
$$A' = M^*(A) = P_{\Omega_{1+\phi_A}}.$$
$A'$ has fitness 1 greater than the fitness of A and thus mutation $M^*$ will
always succeed, and this keeps lots of organisms of the form $P_\rho$ in our
random walk.
Let's now return to the mutations $M_k$, each of which will also have to be
tried infinitely often in the course of our random walk.
12
This is a convenient lower bound on the probability of a mutation. A more
precise value for the probability of jumping from A to $A'$ is
$2^{-H(A'|A)}$.
The mutation $M_k$ will either have no effect because $M_k(A)$ fails to halt,
which means that we are less than $2^{-k}$ away from Ω, that is,
$\Omega_{\phi_A}$ is less than $2^{-k}$ away from Ω, or $M_k$ will have the
effect of incrementing our lower bound $\Omega_{\phi_A}$ on Ω by $2^{-k}$. As
more and more of these mutations $M_k$ are tried at random, eventually, purely
by chance, more and more of the beginning of $\Omega_{\phi_A}$ will become
correct (the same as the initial bits of Ω). Meanwhile, the fitness $\phi_A$
will increase enormously, passing $BB'(n)$ as soon as the first n bits of
$\Omega_{\phi_A}$ are correct. And soon afterwards, $M^*$ will package this in
an organism $A' = P_{\Omega_{1+\phi_A}}$.
How long will it take for all this to happen? I.e., how long will it take to
try the $M_k$ for k = 1, 2, 3, ..., n and then try $M^*$? We have
$$H(M_k) \le H(k) + c.$$
Therefore mutation $M_k$ has probability
$$\ge 2^{-H(k)-c} > \frac{1}{c'\, k (\log k)^{1+\epsilon}} \tag{9.6}$$
since
$$\sum_k \frac{1}{k (\log k)^{1+\epsilon}}$$
converges.$^{13}$ The mutation $M_k$ will be tried in time proportional to 1
over the probability of its being tried, which by (9.6) is approximately upper
bounded by
$$\xi(k) = c'\, k (\log k)^{1+\epsilon}. \tag{9.7}$$
On the average, from what point on will the first n bits of
$\Omega_\phi = \Omega_{\phi_A}$ be the same as the first n bits of Ω? We can
be sure this will happen if we first try $M_1$, then afterwards $M_2$, then
$M_3$, etc. through $M_n$, in that order. Note that if these mutations are
tried in the wrong order, they will not have the desired effect. But they will
do no harm either, and eventually will also be tried in the correct order.
Note that it is conceivable that none of these $M_k$ actually succeed, because
of the other random mutations that were in the mix, in the melee. These other
mutations may already have pushed us within $2^{-k}$ of Ω. So these $M_k$
don't have to succeed, they just have to be tried. Then $M^*$ will make sure
that we get an organism of the form $P_\rho$ with at least n bits of ρ correct.
13
We are using here one of the basic theorems of AIT [14].
Hence:
Expected time to try $M_1$ $\le \xi(1)$
Expected time to then afterwards try $M_2$ $\le \xi(2)$
Expected time to then afterwards try $M_3$ $\le \xi(3)$
. . .
Expected time to then afterwards try $M_n$ $\le \xi(n)$
Expected time to then afterwards try $M^*$ $\le c''$
∴ Expected time to try $M_1, M_2, M_3, \ldots, M_n, M^*$ in order
$\le \sum_{k \le n} \xi(k) + c''$
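The additive structure of this bound comes from the fact that waiting for one particular mutation is a geometric waiting time with mean 1/probability. Here is a small simulation sketch. It assumes an invented probability assignment $1/(c\,k\,(\log_2(k+1))^2)$ for $M_k$ with c = 4, which only mimics the shape of (9.6); the function names are hypothetical, and the final mutation $M^*$ is left out.

```python
import math
import random

def toy_probabilities(n, c=4.0):
    """Invented probabilities for M_1..M_n, shaped like 1 / (c k (log k)^2)."""
    return [1.0 / (c * k * math.log2(k + 1) ** 2) for k in range(1, n + 1)]

def expected_time_in_order(probabilities):
    """Sum of the geometric means 1/p_k: expected draws to see M_1..M_n in order."""
    return sum(1.0 / p for p in probabilities)

def waiting_time_in_order(probabilities, rng):
    """Number of random draws until M_1, M_2, ..., M_n have each been drawn,
    in that order. Draws of any other mutation (label 0) are simply ignored."""
    labels = list(range(1, len(probabilities) + 1)) + [0]
    weights = list(probabilities) + [max(0.0, 1.0 - sum(probabilities))]
    draws, wanted = 0, 1
    while wanted <= len(probabilities):
        draws += 1
        if rng.choices(labels, weights=weights)[0] == wanted:
            wanted += 1
    return draws

if __name__ == "__main__":
    probs = toy_probabilities(4)
    rng = random.Random(0)
    runs = [waiting_time_in_order(probs, rng) for _ in range(2000)]
    print("simulated mean:", sum(runs) / len(runs))
    print("sum of 1/p_k:  ", expected_time_in_order(probs))
```

Mutations drawn out of order are wasted but harmless, which is why the expected total is just the sum of the individual waiting times.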
Using (9.7), we see that this is our extremely rough "ball-park" estimate
on a mutation time sufficiently big for the first n bits of ρ in
$P_\rho = M^*(A)$ to be the correct bits of Ω:
$$\sum_{k \le n} \xi(k) + c'' = \sum_{k \le n} c'\, k (\log k)^{1+\epsilon} + c'' = O(n^2 (\log n)^{1+\epsilon}). \tag{9.8}$$
Hence we expect that in time $O(n^2 (\log n)^{1+\epsilon})$ our random walk
will include an organism $P_\rho$ in which the first n bits of ρ are correct,
and so $P_\rho$ will compute a positive integer $\ge BB'(n)$, and thus at this
time the fitness will have to be at least that big:
Theorem 3 In Model A with random mutations, the fitness of the organisms
$P_\rho = M^*(A)$ will reach $BB'(N)$ by mutation time roughly $N^2$.
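The order of growth in (9.8) can be checked numerically. The sketch below assumes base-two logarithms, the illustrative values ε = 0.1 and c′ = c″ = 1, and the convention ξ(1) = c′ (since log 1 = 0); none of these choices comes from the text.

```python
import math

def xi(k, eps=0.1, c=1.0):
    """xi(k) = c * k * (log2 k)**(1 + eps), with xi(1) set to c by convention."""
    if k < 2:
        return c
    return c * k * math.log2(k) ** (1 + eps)

def total_time(n, eps=0.1, c=1.0, c2=1.0):
    """Sum of xi(k) for k <= n, plus a constant c2 for the final M* step."""
    return sum(xi(k, eps, c) for k in range(1, n + 1)) + c2

def envelope(n, eps=0.1):
    """The comparison function n**2 * (log2 n)**(1 + eps) from (9.8)."""
    return n ** 2 * math.log2(n) ** (1 + eps)

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, total_time(n) / envelope(n))
```

The printed ratio stays bounded (it settles a little below 1/2, as one would expect from comparing the sum with the integral of k), which is all that the O-estimate asserts.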
Note that since the bits of ρ in the organisms $P_\rho = M^*(A)$ are becoming
better and better lower bounds on Ω, these organisms in effect contain their
evolutionary history. In Model A, evolution is cumulative; it does not
start over from scratch as in exhaustive search.
It should be emphasized that in the course of such a hill-climbing random
walk, with probability one every possible mutation will be tried infinitely
often. However the mutations $M_k$ will immediately recover from perturbations
and set the evolution back on course. In a sense the system is self-organizing
and self-repairing. Similarly, the initial organism is irrelevant.
Also note that with probability one the time history or evolutionary pathway
(i.e., the random walk in Model A) will quickly grow better and better
approximations to all possible halting probabilities $\Omega_{U'}$ (see (9.2))
determined by any optimal universal self-delimiting binary computer $U'$, not
just for our original U. Furthermore, some mutations will periodically convert
our organism into a numerical constant for its fitness φ, and there will even
be arbitrarily long chains of successive numerical constant organisms
φ, φ + 1, φ + 2, ...
The microstructure and ﬂuctuations that will occur with probability one are
quite varied and should perhaps be studied in detail to unravel the full zoo
of organisms and their interconnections; this is in eﬀect a kind of miniature
mathematical ecology.
Open Problem 4 Study this mathematical ecology.
Open Problem 5 Improve the estimate (9.8) and get a better upper bound
on the expected time it will take to try $M_1$, $M_2$, $M_3$ through $M_n$ and
$M^*$ in that order. Besides the mean, what is the variance?
Open Problem 6 Separate random evolution and intelligent design: We
have shown that random evolution is fast, but can you prove that it cannot
be as fast as intelligent design? I.e., we have a lower bound on the speed
of random evolution, and now we also need an upper bound. This is prob-
ably easier to do if we only consider random mutations Mk and keep other
mutations from mixing in.
Open Problem 7 In Theorem 3 how fast does the size in bits of the organ-
ism Pρ grow? Is it possible to have the size in bits of the organism Pρ grow
linearly and still achieve the same rapid growth in ﬁtness?
Open Problem 8 It is interesting to think of Model A as a conventional
random walk and to study the average mutation distance between an organism
A and its successor $A'$, its second successor $A''$, etc. In organism time Δt
how far will we get from A on the average? What will the variance be?
9.7 Model B (Naming Functions)
Let’s now consider Model B. Why study Model B? Because hierarchical struc-
ture is a conspicuous feature of actual biological organisms, but it is impossi-
ble to prove that such structure must emerge by random evolution in Model
A.
Why not? Because the programming language used by the organisms in
Model A is so powerful that all structure in the programs can be hidden.
Consider the programs $P_\rho$ defined in Section 9.5.2 and used to prove
Theorems 2 and 3. As we saw in Theorem 3, these programs $P_\rho$ evolve
without limit at random. However, $P_\rho$ consists of a fixed prefix
$\pi_\Omega$ followed by a lower bound on Ω, ρ, and what evolves is the lower
bound ρ, data which has no visible hierarchical structure, not the prefix
$\pi_\Omega$, code which has fixed, unevolving, hierarchical structure.
So in Model A it is impossible to prove that hierarchical structure will
emerge and increase in depth. To be able to do this we must utilize a less
powerful programming language, one that is not universal and in which the
hierarchical structure cannot be hidden: the Meyer-Ritchie LOOP language
[28].
We will show that the nesting depth of LOOP programs will increase
without limit, due to random mutations. This also provides a much more
concrete example of evolution than is furnished by our main model, Model
A.
Now for the details.
We study the evolution of functions f(x) of a single integer argument x;
faster growing functions are taken to be fitter. More precisely, if f(x) and
g(x) are two such functions, f is fitter than g iff g/f → 0 as x → ∞. We use
an oracle to decide if $A' = M(A)$ is fitter than A; if not, A is not replaced
by $A'$.$^{14}$ The programming language we are using has the advantage that
program structure cannot be hidden. It's a programming language that is
powerful enough to program any primitive recursive function [29], but it's
not a universal programming language.
To give a concrete example of hierarchical evolution, we use the extremely
simple Meyer-Ritchie LOOP programming language, containing only assign-
ment, addition by 1, do loops, and no conditional statements or subroutines.
All variables are natural numbers, non-negative integers. Here is an example
of a program written in this language:
14
An oracle is needed in order to decide whether g(x)/f(x) → 0 as x → ∞ and also
to avoid mutations M that never produce an $A' = M(A)$. Furthermore, if a
mutation produces a syntactically invalid LOOP program $A'$, $A'$ does not
replace A.
// Exponential: 2 to the Nth power
// with only two nested do loops!
function(N) // Parameter must be called N.
    M = 1
    //
    do N times
        M2 = 0
        // M2 = 2 * M
        do M times
            M2 = M2 + 1
            M2 = M2 + 1
        end do
        M = M2
    end do
    // Return M = 2 to the Nth power.
    return_value = M
    // Last line of function must
    // always set return_value.
end function
More generally, let's start with $f_0(x) = 2x$:
function(N) // f_0(N)
    M = 0
    // M = 2 * N
    do N times
        M = M + 1
        M = M + 1
    end do
    return_value = M
end function // end f_0(N)
Note that the nesting depth of $f_0$ is 1.
And given a program for the function $f_k$, here is how we program
$$f_{k+1}(x) = f_k^x(2) \tag{9.9}$$
(that is, $f_k$ applied x times in succession, starting from 2)
by increasing the nesting depth of the program for $f_k$ by 1:
function(N) // f_(k+1)(N)
    M = 2
    // do M = f_k(M) N times
    do N times
        N_ = M
        // Insert program for f_k here
        // with "function" and "end function"
        // stripped and all variable names
        // renamed to variable name_
        M = return_value_
    end do
    return_value = M
end function // end f_(k+1)(N)
So following (9.9) we now have programs for
$f_0(x) = 2x$, $f_1(x) = 2^x$, $f_2(x) = 2^{2^{\cdot^{\cdot^{\cdot}}}}$
with x 2's, ...
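The recursion (9.9) translates directly into a few lines of Python, which makes it easy to tabulate small values. This is a sketch for experimentation, not a LOOP program, and the function name `f` with an explicit level argument k is a convenience introduced here. One observation from running it: read literally, (9.9) gives $f_1(x) = 2^{x+1}$ rather than exactly $2^x$, so the closed forms above are best read as statements about growth rates.

```python
def f(k, x):
    """f_0(x) = 2x; f_{k+1}(x) = f_k applied x times starting from 2, as in (9.9)."""
    if k == 0:
        return 2 * x
    m = 2
    for _ in range(x):
        m = f(k - 1, m)
    return m

if __name__ == "__main__":
    print([f(0, x) for x in range(5)])   # doubling
    print([f(1, x) for x in range(5)])   # powers of two
    print([f(2, x) for x in range(3)])   # already 512 at x = 2
    print(f(2, 3).bit_length())          # size in bits of the next value
```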
Note that a program in this language which has nesting depth 0 (no do
loops) can only calculate a function of the form (x + a constant), and that
the depth 1 function $f_0(x) = 2x$ grows faster than all of these depth 0
functions. More generally, it can be proven by induction [29] that programs in
this language with do loop nesting depth ≤ k define functions that grow more
slowly than $f_k$, which is defined by a depth k + 1 LOOP program. This is the
basic theorem of Meyer and Ritchie [28] classifying the primitive recursive
functions according to their rates of growth.
Now consider the mutation M that examines a software organism A written in
this LOOP language to determine its nesting depth n, and then replaces A by
$A' = f_n(x)$, a function that grows faster than any LOOP function with
depth ≤ n. Mutation M will be tried at random with probability
$\ge 2^{-H(M)}$. And so:
Theorem 4 In Model B, the nesting depth of a LOOP function will increase
by 1 roughly periodically, with an estimated mutation time of $2^{H(M)}$
between successive increments. Once mutation M increases the nesting depth, it
will remain greater than or equal to that increased depth, because no LOOP
function with smaller nesting depth can grow as fast.
Note that this theorem works because the nesting depth of a primitive
recursive function is used as a clock; it gives Model B memory that can be
used by intelligent mutations like M.
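The mutation M of Theorem 4 can be sketched as a text transformation on LOOP programs. The code below is an illustration under several assumptions: programs are plain strings with one statement per line, the only statement forms are those appearing in the examples above, and the small interpreter `run` is a convenience written for this sketch, not part of the model. All function names are hypothetical.

```python
import re

# Program text for f_0(N) = 2N, copied from the text without its comments.
F0 = """function(N)
M = 0
do N times
M = M + 1
M = M + 1
end do
return_value = M
end function"""

KEYWORDS = {"function", "end", "do", "times"}

def nesting_depth(program):
    """Maximum number of do loops nested inside one another."""
    depth = deepest = 0
    for line in program.splitlines():
        code = line.split("//")[0].strip()
        if code.startswith("do "):
            depth += 1
            deepest = max(deepest, depth)
        elif code == "end do":
            depth -= 1
    return deepest

def rename(line):
    """Append "_" to every variable name, leaving keywords and numbers alone."""
    return re.sub(r"[A-Za-z_]\w*",
                  lambda m: m.group(0) if m.group(0) in KEYWORDS else m.group(0) + "_",
                  line)

def next_level(program):
    """Wrap a program for f_k in the template for f_(k+1)."""
    body = [rename(line) for line in program.splitlines()[1:-1]]
    return "\n".join(["function(N)", "M = 2", "do N times", "N_ = M", *body,
                      "M = return_value_", "end do", "return_value = M",
                      "end function"])

def mutation_M(organism):
    """Replace organism A by a program for f_n, where n is A's nesting depth."""
    program = F0
    for _ in range(nesting_depth(organism)):
        program = next_level(program)
    return program

def run(program, n):
    """Tiny interpreter for the statement forms used above (sketch only)."""
    lines = [ln.split("//")[0].strip() for ln in program.splitlines()]
    lines = [ln for ln in lines
             if ln and not ln.startswith("function") and ln != "end function"]
    env = {"N": n}

    def value(token):
        return int(token) if token.isdigit() else env.get(token, 0)

    def execute(start, end):
        i = start
        while i < end:
            line = lines[i]
            if line.startswith("do "):
                depth, j = 1, i + 1          # find the matching "end do"
                while depth:
                    if lines[j].startswith("do "):
                        depth += 1
                    elif lines[j] == "end do":
                        depth -= 1
                    j += 1
                for _ in range(value(line.split()[1])):   # count fixed on entry
                    execute(i + 1, j - 1)
                i = j
            else:
                target, expr = [s.strip() for s in line.split("=")]
                env[target] = sum(value(t.strip()) for t in expr.split("+"))
                i += 1

    execute(0, len(lines))
    return env["return_value"]

if __name__ == "__main__":
    f1 = mutation_M(F0)
    print(nesting_depth(F0), nesting_depth(f1), run(f1, 3))
```

Each application of `mutation_M` yields a program one level deeper than its input, which is the "clock" behaviour that Theorem 4 relies on.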
Open Problem 9 In the proof of Theorem 4, is the mutation M primitive
recursive, and if so, what is its LOOP nesting depth?
Open Problem 10 M can actually increase the nesting depth extremely
fast. Study this.
Open Problem 11 Formulate a version of Theorem 4 in terms of subrou-
tine nesting instead of do loop nesting. What is a good computer programming
language to use for this?
9.8 Remarks on Model C (Naming Ordinals)
Now let’s brieﬂy turn to programs that compute constructive Cantor ordinal
numbers α [27]. From a biological point of view, the evolution of ordinals is
piquant, because they certainly exhibit a great deal of hierarchical structure.
Here the structure is not in the genotype, as we showed in Section 9.7 must
occur in Model B; it is automatically present in the phenotype.
Ordinals also seem like an excellent choice for an evolutionary model
because of their fundamental role in mathematics$^{15}$ and because of the
mystique associated with naming large ordinals, a problem which can utilize an
unlimited amount of mathematical creativity [26, 27]. Conventional ordinal
notations can only handle an initial segment of the constructive ordinals.
However there are two fundamentally diﬀerent ways [27] to use algorithms
to name all such ordinals α:
• An ordinal is a program that given two positive integers, tells us which
is less than the other in a well-ordering of the positive integers with
order type α.
• An ordinal α is a program for obtaining that ordinal from below: If it
is a successor ordinal, as β + 1; if it is a limit ordinal, as the limit of a
fundamental sequence βk (k = 0, 1, 2 . . .).
This yields two diﬀerent deﬁnitions of the algorithmic information content
or program-size complexity of a constructive ordinal:
15
As an illustration of this, ordinals may be used to extend the function
hierarchy $f_k$ of Section 9.7 to transfinite k. For example,
$f_\omega(x) = f_x(x)$, $f_{\omega+1}(x) = f_\omega^x(2)$,
$f_{\omega+2}(x) = f_{\omega+1}^x(2)$, ...,
$f_{\omega \times 2}(x) = f_{\omega+x}(x)$, etc., an extension of (9.9).
H(α) = the size in bits of the smallest self-delimiting program
for calculating α.
We can now define this beautiful new version of the Busy Beaver function:
$$BB_{\mathrm{ord}}(N) = \max_{H(\alpha) \le N} \alpha.$$
In order to make programs for ordinals α evolve, we now need to use
a very sophisticated oracle, one that can determine if a program computes
an ordinal and, given two such programs, can also determine if one of these
ordinals is less than the other. Assuming such an oracle, we get the following
version of Theorem 1, merely by using brainless exhaustive search:
Theorem 5 The fitness of our ordinal organism α will reach
$BB_{\mathrm{ord}}(N)$ by mutation time $2^N$.
Can we do better than this? The problem is to determine if there is some
kind of Ω number or other way to compress information about constructive
ordinals so that we can improve on Theorem 5 by proving that evolution
will probably reach $BB_{\mathrm{ord}}(N)$ in an amount of time which does not
grow exponentially.
We suspect that Model C may be an example of a case in which cumulative
evolution at random does not occur. On the other hand, we are given an
extremely powerful oracle; maybe it is possible to take advantage of that.
The problem is open.
Open Problem 12 Improve on Theorem 5 or show that no improvement is
possible.
9.9 Conclusion
At this point we should look back and ask why this all worked. Mainly for
the following reason: We used an extremely rich space of possible mutations,
one that possesses a natural probability distribution: the space of all possible
self-delimiting programs studied by AIT [14]. But the use of such powerful
mutational mechanisms raises a number of issues.
Presumably DNA is a universal programming language, but how sophis-
ticated can mutations be in actual biological organisms? In this connection,
note that evo-devo views DNA as software for constructing the embryo, and
that the change from single-celled to multicellular organisms is roughly like
taking a main program and making it into a subroutine, which is a fairly
high-level mutation. Could this be the reason that it took so long (on the
order of $10^9$ years) for this to happen?$^{16}$
The issue of balance between the power of the organisms and the power
of the mutations is an important one. In the current version of the theory,
both have equal power, but as a matter of aesthetics it would be bad form for
a proof to overemphasize the mutations at the expense of the organisms. In
future versions of the theory perhaps it will be desirable to limit the power
of mutations in some manner by ﬁat.
In this connection, note that there are two uses of oracles in this theory,
one to decide which of two organisms is ﬁtter, and another to eliminate non-
terminating mutations. It is perfectly ﬁne for a proof to be based on taking
advantage of the oracle for organisms, but taking advantage of the oracle for
mutations is questionable.
We have by no means presented in this paper a mathematical theory of
evolution and biological creativity comme il faut. But at this point in time we
believe that metabiology is still a possible contender for such a theory. The
ultimate goal must be to ﬁnd in the Platonic world of mathematical ideas
that ideal model of evolution by natural selection which real, messy biological
evolution can but approach asymptotically in the limit from below.
We thank Prof. Cristian Calude of the University of Auckland for reading
a draft of this paper, for his helpful comments, and for providing the paper
by Meyer and Ritchie [28].
Appendix. AIT in a Nutshell
Programming languages are commonly universal, that is to say, capable of
expressing essentially any algorithm.
In order to be able to combine subroutines, i.e., for algorithmic informa-
tion to be subadditive,
size of program to calculate x and y
≤ size of program to calculate x
+ size of program to calculate y,
16
During most of the history of the earth, life was unicellular.
it is important that programs be self-delimiting. This means that the uni-
versal computer U reads a program bit by bit as required and there is no
special delimiter to mark the end of the program; the computer must decide
by itself where to stop reading.
More precisely, if programs are self-delimiting we have
H(x, y) ≤ H(x) + H(y) + c,
where H(. . .) denotes the size in bits of the smallest program for U to cal-
culate . . . , and c is the number of bits in the main program that reads and
executes the subroutine for x followed by the subroutine for y.
Besides giving us subadditivity, the fact that programs are self-delimiting
also enables us to talk about the probability P(x) that a program that is
generated at random will compute x when run on U.
Let’s now consider how expressive diﬀerent programming languages can
be. Given a particular programming language U, two important things to
consider are the program-size complexity H(x) as a function of x, and the
corresponding algorithmic probability P(x) that a program whose bits are
chosen using independent tosses of a fair coin will compute x.
We are thus led to select a subset of the universal languages that minimize
H and maximize P; one way to define such a language is to consider a
universal computer U that runs self-delimiting binary computer programs
$\pi_C\, p$ defined as follows:
$$U(\pi_C\, p) = C(p).$$
In other words, the result of running on U the program consisting of the
prefix $\pi_C$ followed by the program p, is the same as the result of running
p on the computer C. The prefix $\pi_C$ tells U which computer C to simulate.
Any two such maximally expressive universal languages U and V will
necessarily have
$$|H_U(x) - H_V(x)| \le c$$
and
$$P_U(x) \ge P_V(x) \times 2^{-c}, \qquad P_V(x) \ge P_U(x) \times 2^{-c}.$$
It is in this precise sense that such a universal U minimizes H and maximizes
P.
For such languages U it will be the case that
$$H(x) = -\log_2 P(x) + O(1),$$
which means that most of the probability of calculating x is concentrated
on the minimum-size program for doing this, which is therefore essentially
unique. O(1) means that the diﬀerence between the two sides of the equation
is order of unity, i.e., bounded by a constant.
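The relation between H and P can be seen on a toy prefix-free machine. The sketch below assumes a small invented table `TOY_U` of halting programs and their outputs; it is not a universal computer, so it only illustrates the definitions, not the theorem itself. The names are hypothetical.

```python
import math
from fractions import Fraction

# Invented toy machine: a prefix-free set of halting programs and their outputs.
TOY_U = {"00": 1, "010": 2, "011": 7, "100": 7, "1010": 3, "1011": 40}

def P(x):
    """Algorithmic probability: total weight 2**-|p| of the programs that output x."""
    return sum(Fraction(1, 2 ** len(p)) for p, out in TOY_U.items() if out == x)

def H(x):
    """Program-size complexity: size in bits of the smallest program that outputs x."""
    return min(len(p) for p, out in TOY_U.items() if out == x)

def halting_probability():
    """Toy analogue of Omega: the sum of 2**-|p| over all halting programs."""
    return sum(Fraction(1, 2 ** len(p)) for p in TOY_U)

if __name__ == "__main__":
    for x in sorted(set(TOY_U.values())):
        print(x, H(x), -math.log2(P(x)))
    print("toy Omega =", halting_probability())
```

Because the programs are prefix-free, the weights $2^{-|p|}$ sum to at most 1, so P is a genuine (sub)probability. In the table, the output 7 has two programs of equal size, and H(7) exceeds $-\log_2 P(7)$ by exactly one bit.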
Furthermore, we have
H(x, y) = H(x) + H(y|x) + O(1).
Here H(y|x) is the size of the smallest program to calculate y from x.$^{17}$
This
tells us that essentially the best way to calculate x and y is to calculate x
and then calculate y from x. In other words, the joint complexity of x and y
is essentially the same as the absolute complexity of x added to the relative
complexity of y given x.
This decomposition of the joint complexity as a sum of absolute and
relative complexities implies that the mutual information content
H(x : y) ≡ H(x) + H(y) − H(x, y),
which is the extent to which it is easier to compute x and y together rather
than separately, has the property that
H(x : y) = H(x) − H(x|y) + O(1) = H(y) − H(y|x) + O(1).
In other words, H(x : y) is also the extent to which knowing y helps us to
know x and vice versa.
Last but not least, using such a maximally expressive U we can deﬁne
the halting probability Ω, for example as follows:
$$\Omega = \sum_{p} 2^{-|p|}$$
summed over all programs p that halt when run on U, or alternatively
$$\Omega' = \sum_{n} 2^{-H(n)}$$
summed over all positive integers n, which has a slightly different numerical
value but essentially the same paradoxical properties.
What are these properties? Ω is a form of concentrated mathematical
creativity, or, alternatively, a particularly economical Turing oracle for the
17
It is crucial that we are not given x directly. Instead we are given a minimum-size
program for x.
halting problem, because knowing n bits of the dyadic expansion of Ω enables
one to solve the halting problem for all programs p up to n bits in size that
compute a positive integer. It follows that the bits of the dyadic
expansion of Ω are irreducible mathematical information; they cannot be
compressed into a theory smaller than they are.$^{18}$
From a philosophical point of view, however, the most striking thing
about Ω is that it provides a perfect simulation in pure mathematics, where
all truths are necessary truths, of contingent, accidental truths—i.e., of truths
such as historical facts or biological frozen accidents.
Furthermore, Ω opens a door for us from mathematics to biology. The
halting probability Ω contains inﬁnite irreducible complexity and in a sense
shows that pure mathematics is even more biological than biology itself,
which merely contains extremely large ﬁnite complexity. For each bit of the
dyadic expansion of Ω is one bit of independent, irreducible mathematical
information, while the human genome is merely $3 \times 10^{9}$ bases $= 6 \times 10^{9}$ bits
of information.
18 More precisely, it takes a formal axiomatic theory of complexity ≥ n − c (one requiring
a ≥ n − c bit program to enumerate all its theorems) to enable us to determine n bits of Ω.
Bibliography
[1] D. Berlinski, The Devil’s Delusion, Crown Forum, 2008.
[2] S. J. Gould, Wonderful Life, Norton, 1990.
[3] N. Shubin, Your Inner Fish, Pantheon, 2008.
[4] M. Mitchell, Complexity, Oxford University Press, 2009.
[5] J. Fodor, M. Piattelli-Palmarini, What Darwin Got Wrong, Farrar,
Straus and Giroux, 2010.
[6] S. C. Meyer, Signature in the Cell, HarperOne, 2009.
[7] J. Maynard Smith, Shaping Life, Yale University Press, 1999.
[8] J. Maynard Smith, E. Szathmáry, The Origins of Life, Oxford University
Press, 1999; The Major Transitions in Evolution, Oxford University
Press, 1997.
[9] F. Hoyle, Mathematics of Evolution, Acorn, 1999.
[10] G. J. Chaitin, “Evolution of mutating software,” EATCS Bulletin 97
(February 2009), pp. 157–164.
[11] G. J. Chaitin, “Metaphysics, metamathematics and metabiology,” in
H. Zenil, Randomness Through Computation, World Scientiﬁc, in press.
(Draft at http://www.umcs.maine.edu/~chaitin/lafalda.pdf.)
[12] G. J. Chaitin, Mathematics, Complexity and Philosophy, Midas,
in press. (Draft at http://www.umcs.maine.edu/~chaitin/midas.html.)
(See Chapter 3, “Algorithmic Information as a Fundamental Concept in
Physics, Mathematics and Biology.”)
[13] G. J. Chaitin, Chapter “Complexity, Randomness” in Chaitin, Costa,
Doria, After Gödel, in preparation.
(Draft at http://www.umcs.maine.edu/~chaitin/bookgoedel_2.pdf.)
[14] G. J. Chaitin, “A theory of program size formally identical to informa-
tion theory,” J. ACM 22 (1975), pp. 329–340.
[15] G. J. Chaitin, Algorithmic Information Theory, Cambridge University
Press, 1987.
[16] G. J. Chaitin, Exploring Randomness, Springer, 2001.
[17] C. S. Calude, Information and Randomness, Springer-Verlag, 2002.
[18] M. Li, P. M. B. Vitányi, An Introduction to Kolmogorov Complexity and
Its Applications, Springer, 2008.
[19] C. Calude, G. Chaitin, “What is a halting probability?,” AMS Notices
57 (2010), pp. 236–237.
[20] H. Steinhaus, Mathematical Snapshots, Oxford University Press, 1969,
pp. 29–30.
[21] D. E. Knuth, “Mathematics and computer science: Coping with ﬁnite-
ness,” Science 194 (1976), pp. 1235–1242.
[22] A. Hodges, One to Nine, Norton, 2008, pp. 246–249; M. Davis, The
Universal Computer, Norton, 2000, pp. 169, 235.
[23] G. J. Chaitin, “Computing the Busy Beaver function,” in T. M. Cover,
B. Gopinath, Open Problems in Communication and Computation,
Springer, 1987, pp. 108–112.
[24] G. H. Hardy, Orders of Inﬁnity, Cambridge University Press, 1910. (See
Theorem of Paul du Bois-Reymond, p. 8.)
[25] D. Hilbert, “On the infinite,” in J. van Heijenoort, From Frege to Gödel,
Harvard University Press, 1967, pp. 367–392.
[26] J. Stillwell, Roads to Inﬁnity, A. K. Peters, 2010.
[27] H. Rogers, Jr., Theory of Recursive Functions and Eﬀective Computabil-
ity, MIT Press, 1987. (See Chapter 11, especially Sections 11.7, 11.8 and
the exercises for these two sections.)
[28] A. R. Meyer, D. M. Ritchie, “The complexity of loop programs,” Pro-
ceedings ACM National Meeting, 1967, pp. 465–469.
[29] C. Calude, Theories of Computational Complexity, North-Holland, 1988.
(See Chapters 1, 5.)
Chapter 10
Parsing the Turing test
Journal of Scientiﬁc Exploration 23 (2009), pp. 530–534.
Parsing the Turing Test: Philosophical and Methodological
Issues in the Quest for the Thinking Computer edited by Robert
Epstein, Gary Roberts and Grace Beber. Springer, 2009. xxiii + 517 pp.
$199.00 (hardcover). ISBN 9781402067082.
This big, expensive book oﬀers much food for thought. This review will
be a reaction to the ﬁrst editor’s introduction, plus the clever reverse Turing
test in Chapter 28 by Charles Platt with machines attempting to determine
if humans have any intelligence. Judging from my sample of these two
chapters, this book is a celebration of the coming extinction of the human
race. I shall play the devil’s advocate, and also take a meta perspective
on the book, analyzing its signiﬁcance as a social phenomenon instead of
considering its contents.
Turing’s famous paper on the imitation game (reprinted and annotated
in this book), a remote conversation with a computer attempting to prove
it is human, in addition to its intellectual ﬁreworks, reﬂects the fact that
Turing, as the French say, “felt uncomfortable in his skin,” both as a male
and as a human being. As this book indicates, this has now become part of
the zeitgeist and a general social problem.
The general attitude I see here reminds me of remarks by Marvin Minsky
I heard many years ago, when he called human beings “meat machines,”
and described the human race as a carbon-based life-form that was creating
a silicon-based life-form that would replace it. At the time, his remarks
seemed a bit mad, but now many people seem to feel that way.
Why is this? Well, our current society attempts to make people into
machines; it behaves as if human beings were ants or bees. We are being
forced to live in an anthill, beehive society. Obviously machines are better at
being machines than we are, and humans feel ill-suited for anthill or beehive
life. Human beings are made to feel obsolete, has-beens.
Robert Epstein’s introduction argues that a super-human intelligence is
inevitable and not far oﬀ in time, and that at best we shall be slaves or pets
for the machine, at worst exterminated as annoying insects.
The authors are well aware of the amazing advances in computer technol-
ogy that they believe make this possible, but perhaps they are less aware of
the fact that the more we understand about organisms, the more molecular
biology progresses, the more amazing living beings seem. The cells in the
human body were originally autonomous living beings that have now banded
together much like the citizens in a nation or the employees in a corporation.
An individual cell is amazingly sophisticated, and, it seems to me, is best
compared with a computer or even with an entire city.
So our artiﬁcial machines may not catch up with Nature’s machines for
a while. Can a century of human engineering compare with billions of years
of evolution, essentially an immense parallel-processing molecular-level com-
putation going on throughout the entire biosphere?
In a more optimistic scenario we are not exterminated; the machines will
be our servants. Isaac Asimov thought that in the future human beings might
live like ancient Greek aristocrats with robotic slaves.
Yes, machines can calculate better than we can, and remember things
better than we can. Should we be very upset? Railroad trains go faster than
a person can run, a steam-shovel can move earth quicker than a person, an
airplane can ﬂy. But human beings made those machines, and should be
proud of it. Are we upset about the fact that we need to wear clothing in
the winter? Not at all. People are not very fast, not very strong, they do
not have fur or a tough hide, but they are extremely curious, clever, and
imaginative, ﬂexible and adaptable. Like the universal Turing machine, we
are generalists, not specialists. We are not optimized for any particular little
ecological niche.
It is also possible that eventually enhanced humans and humanized ma-
chines will become nearly indistinguishable, which doesn’t sound too bad to
me. It’s much like wearing clothing or using a can-opener.
But maybe none of this will happen. Another possibility is that machine
intelligences will remain unconscious zombies, monstrous golems lacking a
divine spark, a human soul. For we are products of George Bernard Shaw’s
life-force, of Henri Bergson’s “élan vital”, and machines are not. This is of
course not a fashionable view in our secular times, but let me try to give a
contemporary version of this argument, one designed for modern sensibilities.
First of all, quantum mechanics, a branch of fundamental physics, has
been telling us that the Schrödinger Psi function is real, more real than the
particles it describes. Electrons in atoms are expressed as probability waves
that interfere constructively and destructively. Atoms are like musical in-
struments.
Whatever the Psi function is, it is not material. It is more like an idea,
and therefore gives support to those Platonic idealist philosophies that view
spirit as more fundamental, more real, than matter. Of course, this is not
a fashionable interpretation. Nonetheless Nature is giving us this hint loud
and clear, even if we refuse to listen.
The latest version of quantum mechanics, now called quantum informa-
tion theory, reformulates “classical” 1920s quantum mechanics in terms of
qubits of information; information is certainly not matter. In my opinion
quantum information theory is even less materialist than classical quantum
mechanics.
Consciousness, quite mysterious at this time, is also more about informa-
tion than about matter, I think. Could consciousness reﬂect some currently
unknown level of physical reality? Could our current science be radically in-
complete? Indeed, it may well be so. There may be many scientiﬁc mysteries
yet to solve.
It is true that during the three-century plus history of modern science,
each period thinks it has a nearly ﬁnal answer, only to discover 25 or 50
years later some totally unexpected phenomenon that provokes a complete
paradigm shift. Let me invoke a temporal rather than a spatial “Copernican
principle.” Why should our epoch be especially favored? Why should we
have the ﬁnal answers?
A simple linear extrapolation of the history of science suggests that a
century from now things will look remarkably diﬀerent. What did we know
of quantum mechanics a century ago? Is it possible that, to use Wolf-
gang Pauli’s trenchant phrase, our current scientiﬁc world-view “is not even
wrong?” For our grandchildren’s and great-grandchildren’s sake I hope so.
How boring if it should happen that there will be no fundamental changes
in our scientiﬁc world view in the future. Why should Nature’s imagination
be as limited as ours?
So if our current scientiﬁc world view is not at all ﬁnal, perhaps living
beings do have something special that machines cannot attain, something
that science will some day understand as well as we currently understand
quantum mechanics, a scientiﬁc version, perhaps, of the soul or what the
spiritual would refer to as a divine spark. How otherwise to understand cases
of amazing human creativity? Pick your own favorite examples. I pick the
composer Johann Sebastian Bach, and the mathematicians Leonhard Euler,
Srinivasa Ramanujan and Georg Cantor. Can machines have that kind of
creativity, that kind of inspiration? These men seem to have had a direct
link to the source of new ideas.
Believers in Darwinian evolution by natural selection will argue that no
vital spark, no élan vital, nothing at all divine is needed, just random
mutations. I myself am a believer in Darwinian evolution. I am currently trying to
develop a theory I optimistically have dubbed “metabiology.” The purpose
of metabiology is to prove mathematically that Darwinian evolution works.
But I am open to the possibility that this may not be achievable. It would
also be delightful to be able to prove that evolution by natural selection
doesn’t, cannot work. I would be happy either way, as long as I can prove it.
Most likely my metabiological ideas will lead nowhere, but I feel my honor
as a mathematician demands that I should give it a try.
And why have human beings become so defeatist? Is it more fun to
work in a factory that produces robots than to conceive and raise one’s own
children? Or look at cars. I have been in remote corners of Argentina, where
people seem almost completely divorced from the modern world economy
and do everything themselves. They manage splendidly without cars, with
horses and donkeys. These are self-reproducing cars, vegetarian cars, not
ones that need petroleum.
No wonder that the contributors to this book have given up on human
beings. People are ill-used in our modern society, and sensitive scientiﬁc
intellectuals feel it. Scientists are now micro-managed. The refereeing and
grant systems, with everything decided by committees, favor safe, conservative,
incremental science. Can radical new ideas have a chance with our
current “factory” science? I doubt it. Would Galileo, Newton, Maxwell,
Darwin and Einstein be able to work in the current system? Would Euler,
Ramanujan and Cantor? I think not.
As I said, human beings are not ants, they are not bees, they were not
designed to be slaves. Let’s look at particularly creative periods in human
history, for example ancient Greece and the Italian Renaissance.
How come the ancient Greeks were so creative? I asked a Greek intellectual
that once, in Mykonos, and he told me that the ancient Greeks discussed
this themselves. They noted that ancient Egypt was largely stable and
un-innovative for millennia, the contrary of ancient Greece. Greek city-states
were small and separated by mountains or isolated on islands, so imaginative
individuals could be creative and affect things. Egyptian geography, by
contrast, permitted strong, central, unified control of an empire, so creativity
was suppressed and talented individuals could have little or no effect.
Similarly, the creativity of the Italian Renaissance probably had some-
thing to do with the fact that, even now, there is no Italian nation-state.
Italians are ﬁrst of all Tuscans or Sicilians, they are individualists, not Ital-
ians!
In both cases, ancient Greece and renaissance Italy, chaos and anarchy
encouraged creativity, and kept it from being suppressed by the authorities.
What can we learn from this? That strong central control is bad for us.
Immediate corollaries: The European Community was not a good idea. And
the United States would be better oﬀ as ﬁfty separate states. At least that’s
the case if you want to maximize creativity. I’ve already said what I think
of the current refereeing and grant systems.
Let me wrap up my argument. People are not machines. It is time for
people to stop trying to be like machines, because we have machines for
that now. We should stop worshipping the machine, and instead unleash
our creative, curious, passionate, inspired, intuitive, irrational individualistic
humanity.
Chapter 11
Should mathematics be done
differently because of Gödel’s
incompleteness theorem?
Speech on the occasion of being granted an honorary doctorate by the Univer-
sity of Cordoba, founded in 1613. Lecture given Monday, 23 November 2009,
in Cordoba, Argentina.1
Good afternoon.
First of all, I want to thank the university authorities who are present, to
thank the University of Cordoba, and to thank the Faculty of Philosophy and
Humanities, for this honor which I ﬁnd really moving. I consider myself an
Argentinean-American, and I cannot imagine anything nicer than receiving
an honorary doctorate from the oldest university in Argentina and one of the
oldest in the Americas.
I’m really very moved. It’s a great pleasure for me and my wife to be
here in Cordoba, especially for such a nice reason, and for us to become
acquainted with this city and its intellectual and scientiﬁc traditions.
So thank you very much.
Furthermore, in spite of what has just been said here by Professor Victor
Rodriguez about my achievements, I don’t think that I have accomplished
very much. What I see constantly before me are the challenging questions
that I have not been able to answer, the big holes in what we can understand.
Very basic questions, such as whether it is possible to prove mathematically
that Darwin’s theory of evolution works or that it doesn’t work; either way
it would be very interesting. Or the subject that I want to talk about today,
which I will now introduce for you.

1 This speech was delivered in Spanish and translated into English by the author.

I used to work as a computer programmer; I wrote computer software and
did theory as a hobby. So I’m an amateur mathematician and a professional
programmer. That’s how I used to earn a living.
People normally think that mathematics is a dry, serious subject where
nothing dramatic ever happens. But in the past century math went through
a revolution as serious as the one that took place in physics because of the
theory of relativity and quantum theory. This fact is not well-known outside
the math community, but it is becoming better known now.
In particular, I’m referring to a controversy over how mathematics should
be done. There is a struggle for the soul of mathematics. I exaggerate a bit,
but not too much. There is a struggle for the soul of mathematics between
two diﬀerent groups, two tendencies, two opposing viewpoints.
On one side there is the famous French mathematician Poincaré who
spoke of the importance of intuition in mathematics. On the other side
we have the German mathematician Hilbert who emphasized formalism
and the role of the axiomatic method. The conﬂict is between intuition and
formalism. In other words, is mathematics creative or is it mechanical?
Stating it that way, I indicate my own biases.
You can see which side I am on: the romantic side. But the debate is still
very much alive and I want to give you a concise history of this conﬂict.
About a century ago Hilbert proposed formalizing all of mathematics,
dropping the use of natural language and making math into a formal ax-
iomatic theory using an artiﬁcial language and mathematical logic. The key
point is that Hilbert thought that math gives absolute certainty and that
this implies that you can formalize mathematics completely in such a way
that there is an algorithm, a mechanical procedure, for checking whether or
not a proof is correct.
In other words, Hilbert believed that if math is objective not subjective,
if it really is absolutely certain, this is equivalent to saying that there are
rules of the game for carrying out proofs — if no steps are left out and we
use a completely formal language — which provide us with a completely
mechanical way to check if a proof is correct, that is, whether it obeys the
rules. According to Hilbert, this is what it means to say that math gives
absolute certainty, which is what most mathematicians believe, because math
is a way of ﬂeeing from the real world to a toy world where truth is black or
white and proofs are absolutely convincing.
This is what Hilbert proposed about a century ago. And most people
thought that it could actually be done, that one could formalize everything.
Hilbert represented the orthodox, conservative position within the math com-
munity. People thought that it ought to be possible. In fact, some very pretty
work was done trying to achieve what Hilbert had proposed, trying to ful-
ﬁll his dream of formalizing mathematics completely and obtaining absolute
certainty and total objectivity.2
But in 1931 and in 1936 there were two big surprises. In 1931 Kurt
Gödel showed that Hilbert’s project could never work, and in 1936 Alan
Turing showed this completely diﬀerently and found a deeper reason why
Hilbert’s dream was unattainable.
These two pieces of work are greatly admired, but in my opinion the
math community has a very ambiguous position about these two achievements.
Gödel and Turing are heroes, but nobody wants to face the disturbing
implications of their work.
What Gödel showed in 1931 is that Hilbert’s dream is impossible because
any formalization of mathematics — any formal axiomatic system of the kind
that Hilbert sought for all of math, to give absolute certainty, to show that
the truth is black or white — will necessarily have to be incomplete because
some true results will be missing. In other words, no ﬁnite formal axiomatic
theory can give us all mathematical truths, some of them will always escape
us. In fact, an inﬁnity of true math results will be missing from any formal
axiomatic theory proposed to achieve Hilbert’s dream. Formal axiomatic
theories are always incomplete, they do not enable us to demonstrate all
possible mathematical truths.
Gödel shows how to construct assertions which are true but cannot be
demonstrated within a given formal axiomatic system. The way he does it
is very surprising. He constructs a mathematical assertion — in fact, an
arithmetical assertion — which states that it itself cannot be demonstrated.
“I’m unprovable!”

2 In particular, I’m thinking of Zermelo-Fraenkel set theory and of the von Neumann
integers.

If you can construct an assertion that states that it’s unprovable, there are
two possibilities: that it’s provable and that it isn’t. If it’s provable and
it asserts that it isn’t, we’re demonstrating something that’s false, which is
terrible. So by hypothesis we eliminate this possibility. If a formal axiomatic
system enables us to prove things that are false, it doesn’t interest us, it’s
a complete waste of time. Therefore “I’m unprovable” cannot be proved,
which means that it is true.
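In modern notation the argument can be sketched as follows. The symbols T (the formal axiomatic system), G (the self-referential assertion) and Prov_T (the arithmetical formula expressing "provable in T") are introduced here only for illustration and do not appear in the lecture; the corner quotes need the amssymb package.

```latex
\[
T \vdash \; G \leftrightarrow \neg\,\mathrm{Prov}_T(\ulcorner G \urcorner)
\]
% Assume T proves only true statements.
% If T proved G, then G would be true, so G would be unprovable in T: a contradiction.
% Hence T does not prove G, which is exactly what G asserts, so G is true.
\[
T \nvdash G \qquad \text{and} \qquad G \text{ is true.}
\]
```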
So you either demonstrate things that are false, or there are indemonstrable
truths, truths that escape us. This is the alternative that Gödel confronts
us with. Assuming that the formal axiomatic system doesn’t enable you to
prove things that are false, there must be true mathematical assertions that
cannot be proved.
Gödel’s incompleteness theorem was a big surprise at the time, and while
not provoking panic, it did lead to some rather emotional reactions, for ex-
ample, from Hermann Weyl. Weyl said that his faith in pure mathematics
was badly aﬀected, and that at ﬁrst it was diﬃcult for him to continue with
his research. And Weyl was a very ﬁne mathematician.
Now my story splits in two. On the one hand, there is more research on
incompleteness, on Gödel’s remarkable discovery. On the other hand, the
math community begins to lose interest in these philosophical questions and
continues with its everyday work.
First I’ll tell you about Turing.
In 1936 Turing goes beyond Gödel and finds a much deeper reason for
incompleteness. But I should emphasize that pioneering work is always the
most difficult. Before Gödel nobody was courageous enough to imagine that
Hilbert might be wrong. Turing found a deeper reason for incompleteness.
Turing discovered that there are many things in mathematics that can
be defined but for which there is no mechanical procedure, no algorithm, for
calculating them: they are not computable functions. Math is full of things that
can be deﬁned but cannot be calculated. And uncomputability is a new
source of incompleteness.
If we consider a mathematical question such as Turing’s famous halting
problem for which there is no general method for calculating the answer, we
get the immediate corollary that there cannot be a formal axiomatic theory
that always enables us to prove what the answer is.
Why not?
One of the most basic properties of a formal axiomatic theory is that
in principle there is a mechanical procedure for systematically traversing the
tree of all possible proofs and eliminating the ones that are incorrect. It would
be very slow, but in principle it would enable us to ﬁnd all the theorems.
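To see what "systematically traversing the tree of all possible proofs" looks like in practice, here is a minimal sketch on a toy formal system. It uses Hofstadter's MIU puzzle purely as a stand-in (it is not mentioned in the lecture): the single axiom is MI, and the four rewriting rules in `successors` play the role of rules of inference. The function names are hypothetical. A breadth-first search over derivations lists every theorem sooner or later, however slowly.

```python
from collections import deque

AXIOM = "MI"


def successors(s):
    """All strings derivable from s by one application of an MIU rule."""
    out = set()
    if s.endswith("I"):                      # rule 1: xI -> xIU
        out.add(s + "U")
    out.add("M" + s[1:] * 2)                 # rule 2: Mx -> Mxx
    for i in range(len(s) - 2):              # rule 3: III -> U
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):              # rule 4: UU -> (nothing)
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out


def enumerate_theorems(limit):
    """Return the first `limit` theorems in breadth-first order of derivation."""
    seen = {AXIOM}
    queue = deque([AXIOM])
    order = []
    while queue and len(order) < limit:
        theorem = queue.popleft()
        order.append(theorem)
        for nxt in sorted(successors(theorem)):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


print(enumerate_theorems(6))
```

The enumeration only ever confirms theorems. A string that is not a theorem (MU is the well-known example in this system) simply never shows up, so the search alone cannot tell "not a theorem" from "not found yet".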
So if we have a theory that enables us to demonstrate in individual cases
whether or not a program eventually halts, this would give us a mechanical
procedure, an algorithm, that always gives the correct answer, which Turing
showed in 1936 is impossible.
So Turing deduces Gödel incompleteness from a more fundamental idea,
uncomputability, which is the fact that math is full of things that can be
deﬁned but cannot be calculated.
Now World War II begins and the generation that was interested in these
philosophical questions disappears from the scene. The math community
goes forward forgetting the crisis that was provoked by Gödel’s theorem
which had been such a big surprise.
My problem is that I didn’t go forward. I remained obsessed with Gödel’s
theorem. I thought it had to be very important. I bet my professional career
on the idea that it was a mistake to ignore Gödel’s result.
What the math community did, since they are mathematicians and not
philosophers, is to continue with their daily work, with the problems that
interested them. The consensus was that yes, in theory there are limits to
what can be demonstrated using any particular formal axiomatic theory, but
not in practice, not with the kinds of questions that interest us, not in our
own particular ﬁeld. This was more or less the community’s reaction.
In other words, while there may be mathematical facts that are true but
unprovable, these are highly artiﬁcial pathological cases. The consensus was
that in practice this does not occur. At least that is what mathematicians
preferred to think in order to be able to carry on with their work.
People have an amazing ability to avoid thinking about unpleasant sub-
jects such as death. If we think about death all the time it is impossible
to function. And if mathematicians think all the time about incompleteness
they can’t function either, since there will always be doubt about whether
the matter at hand can be settled by means of a proof. Why am I wasting
years of my life trying to prove something if there may not even be a proof?
Let’s consider an alternative course of action. Instead of ignoring Gödel’s
theorem, what if we take it very seriously? I don’t believe in going to extremes,
but if one took Gödel’s result very, very seriously, how might one proceed?
Consider the Riemann hypothesis. This is an important mathematical
conjecture that has a lot of signiﬁcant consequences. But unfortunately in a
hundred and ﬁfty years of eﬀort nobody has succeeded in proving the Rie-
mann hypothesis. Mathematicians don’t know what to do; the way forward
is blocked. But physicists would just consider the Riemann hypothesis to be
a mathematical fact that has been corroborated empirically.
In other words, I think that a possible reaction to Gödel’s result is to
make math a little bit more like theoretical physics. In physics axioms don’t
have to be self-evident. Maxwell’s equations and the Schrödinger equation
are not self-evident but they help us to organize, to unify a large body of
experimental data.
One could do mathematics in a similar fashion, taking Gödel as justification
for behaving as if math were an empirical science in which one doesn’t
try to demonstrate everything from self-evident principles, but instead one
only seeks to organize mathematical experience like physicists organize their
physics lab experience. One could proceed pragmatically and adopt unproven
hypotheses as new basic principles because they are extremely fruitful and
have many useful consequences even though they aren’t at all self-evident.
This is what I think we should do if we take Gödel’s theorem seriously.
In my opinion mathematics is diﬀerent from physics, but maybe not as
diﬀerent as most people think. My work on metamathematics using complex-
ity and information-theoretic ideas suggests to me that perhaps we should
emphasize the similarities between the world of mathematics and the world
of physics instead of emphasizing the diﬀerences.
In this connection, there is a highly pertinent remark by the Russian
mathematician Vladimir Arnold. In his opinion the only diﬀerence between
mathematics and physics is that in mathematics the experiments are cheaper,
since one can carry them out on a computer instead of having to have a
laboratory full of expensive equipment! So math experiments are easier than
physics experiments.
How do I try to justify this new “quasi-empirical” view of mathematics?
Well, like most mathematicians, I do in fact believe in the Platonic world of
math ideas in which the truth is totally black or white. But I also believe
that we are denied direct access to this Platonic world and that down here
at our level it may be helpful to work a bit more quasi-empirically.
It may look like my mixed, hybrid, Platonic-empiricist position is incon-
sistent, but I don’t think that this is actually the case. Indeed, it is sometimes
very fruitful to take ideas that seem to be inconsistent and show that in fact
they aren’t.
Okay, so where do I ﬁnd arguments in favor of this quasi-empirical view of
mathematics? The key question is whether the incompleteness phenomenon
that was discovered by Gödel and further explored by Turing is exceptional
or widespread. How pervasive is incompleteness? That’s the basic question,
and it is quite controversial.
My contribution to this discussion is that I’ve found tools for measuring
the complexity or the information content of a formal axiomatic mathemati-
cal theory. And by using the concept of complexity in algorithmic information
theory one can see that incompleteness is natural, not surprising. In fact,
it’s inevitable, it’s unavoidable.
Using algorithmic information theory, one can see that the world of math-
ematical truths, the Platonic world of mathematical ideas, is inﬁnitely com-
plex. But any formal axiomatic system made by human beings necessarily
has only ﬁnite complexity. Indeed, rather low complexity, since the axioms
and rules of inference normally ﬁt on a couple of pages.
So seen from this perspective, incompleteness is natural, inevitable. The
world of mathematical ideas is inﬁnitely complex, but our theories only have
low, ﬁnite complexity; otherwise they wouldn’t ﬁt in a mathematician’s brain
nor would they be regarded as self-evident — but I’m against the idea that in
mathematics axioms have to be self-evident, because in physics self-evidence
of axioms is not required.
I’ve used complexity and information theory to argue that since the
amount of information in pure mathematics is inﬁnite, incompleteness is
only to be expected, since a formal axiomatic theory can capture at most a
ﬁnite amount of this mathematical information, an inﬁnitesimal portion in
fact.
This more or less summarizes an entire lifetime of research. But you will
not be surprised to learn that the mathematics community has not accepted
my quasi-empirical proposal. The immune system of an intellectual commu-
nity is very strong, and my ideas are rejected as foreign, as alien to the math
community.
Logicians don’t care much for computability, for complexity, for informa-
tion and for randomness. Randomness is a nightmare for a logician, because
randomness is irrational. Random events happen for no reason, they are
incomprehensible from a logical point of view.
However, the physics community has some interest in my work. They like
the idea of using a physics-inspired approach in pure mathematics. They like
the idea that math isn’t that diﬀerent from physics. They like the idea that
a mathematical proof may be more convincing than the heuristic arguments
that are accepted in physics, but that this is only a matter of degree, not an
absolute black or white diﬀerence. They have always felt that mathemati-
cians believe too much in absolute truth, and do not appreciate theoretical
physics enough.
But the coin is two-sided, and the conﬂict between intuition and for-
malism has become much more acute because of the computer. Computer
technology is a powerful argument against creativity and in favor of mecha-
nization and formalization.
Just take a look at the December 2008 issue of the Notices of the American
Mathematical Society which you can ﬁnd for free on the web. This is a special
issue devoted to formal proof. While I, a poor theoretician, have been trying
to convince mathematicians to pay attention to Gödel’s theorem and work
slightly diﬀerently, these people — I didn’t realize what was happening until
they did it — have nearly succeeded in carrying out Hilbert’s dream.
They’ve constructed tools for formalizing almost all of mathematics.
They’ve done a superb piece of software engineering.
This community, which is a group of ﬁne mathematicians and software
engineers, believes that in the future all mathematical proofs should be for-
mal proofs. In their opinion, there will soon be no reason for accepting
informal proofs. We can start demanding formal proofs and re-writing all of
mathematics in a formal language so that it can be checked by veriﬁcation
software.
There are now interactive proof checkers for verifying mathematical
proofs. This is how they work: If you are a mathematician and you have an
informal proof that you want to formalize, you give it to the proof checker. It will
say, “Well, there’s a particular step in this proof that I don’t understand yet.
Can you please explain this better?” And you keep filling in the proof, providing
more details, until the software says, “Now I understand everything.
It’s all ﬁne. I have a complete formal proof.”
You didn’t have to write all the steps in the formal proof yourself; that
would be a big job. You write part of it, and the software provides the
rest. The ﬁnal result of this joint eﬀort is a complete formal proof that has
been checked and veriﬁed by reliable software, software that you trust be-
cause it was carefully developed using the best available software engineering
methodology.
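The dialogue just described can be illustrated with a small sketch in Lean 4, which is one interactive proof checker among several. The text does not name a particular system, so the choice of Lean and the theorem names below are assumptions made only for illustration. The placeholder `sorry` marks a step that the author has not yet explained; the checker flags it with a warning instead of treating the proof as complete.

```lean
-- Lean 4, core library only (no Mathlib assumed).
-- All theorem names here are hypothetical, chosen for this sketch.

-- Step 1: the author states the theorem but leaves a gap.
-- `sorry` lets the file compile with a warning that the proof is incomplete.
theorem add_comm_sketch (a b : Nat) : a + b = b + a := by
  sorry

-- Step 2: the author fills the gap by citing a library lemma.
-- The checker verifies that the lemma matches the goal exactly.
theorem add_comm_checked (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- Step 3: the author gives only a hint and the simplifier
-- supplies the low-level steps of the formal proof.
theorem add_zero_checked (n : Nat) : n + 0 = n := by
  simp
```

The second and third theorems show the division of labor: the human supplies the idea, and the software supplies and checks the remaining formal detail.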
And this veriﬁcation technology has advanced to the point where you
don’t just verify toy proofs, you can verify complicated proofs of really im-
portant theorems, for example the four color theorem, which states that four
colors suﬃce for coloring maps without having neighboring countries with
the same color.
Not only was this rather complicated proof formalized; the
mathematician who did it did not complain, and even stated that going
through this process enabled him to substantially improve the proof. So
this formal proof business is getting really serious.
Hilbert never thought that mathematicians should be required to use
detailed formal proofs in their daily work. But this community does. Fur-
thermore, they envision an oﬃcial repository for formal proofs that have been
put through this veriﬁcation process. Proofs will have to be accepted by this
repository to be used by the mathematics community; everything that has
been formalized and checked will be there, in one place.
So amazingly enough, the lines of research opened up by Hilbert’s formalization
proposal and by Gödel’s work on the limitations of formal systems
are both progressing dramatically. I think there is a wonderful intellectual
tension between the work advancing formalization and the work criticizing it.
Both of these lines of research are going forward splendidly in parallel!
In mathematics this circumstance is striking because one thinks that the
truth is black or white. But in philosophy this situation doesn’t seem so
strange because philosophers understand that ideas that seem contradictory
are often in fact complementary.
I won’t try to predict the ﬁnal outcome of this conﬂict; probably there
will be no ﬁnal outcome. In philosophy there are no ﬁnal answers. Each
generation does its best to resolve the fundamental questions to its own
satisfaction, and then the next generation goes oﬀ in a diﬀerent direction.
So I won’t try to predict the future. I don’t know if mathematicians
will eventually think that incompleteness implies that they should do math
diﬀerently, or if formalization will win.
Perhaps we don’t have to choose between quasi-empiricism and formal-
ization. Both of these approaches can contribute something to mathematics
and to mathematical practice.
My late friend the mathematician Gian-Carlo Rota, whose provocative
ideas I greatly enjoy, has bequeathed us a collection of his essays entitled
Indiscrete Thoughts. He thinks that formal axiomatization is a cemetery.
When a theory is completely ﬁnished, then you can formalize it.
But when you are creating a new theory, you have to work with vague
intuitions, with imprecise ideas, and formalization is deadly. Premature for-
malization stiﬂes creativity; once a theory is formalized it becomes stiﬀ and
rigid and no new ideas can get in.
So I think that quasi-empiricism and formalism can both contribute some-
thing of value. Furthermore, both are advancing step by step.
In 1974 I proposed accepting new math axioms the way that this is done
in physics,³ and nobody took me seriously, but in the past thirty-five years
this has actually happened.
It has happened in set theory, where there’s a new axiom called “projec-
tive determinacy.” It has happened in theoretical computer science, where
you use the hypothesis that P is not equal to NP, which everyone believes
but nobody can prove. And it has happened in mathematical cryptogra-
phy, which is based on the assumption that you can’t factorize big numbers
quickly.
In these ﬁelds mathematicians are behaving as if they were physicists.
They’ve found new principles that enable them to organize the experiences
of each of these communities. These are principles that are not self-evident,
that have not been demonstrated, but that are accepted by consensus as new
fundamental principles, at least until they are disproven or counter-examples
are encountered.
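Working from an unproven but accepted principle can itself be expressed inside a formal system. The following Lean 4 sketch is only an illustration under stated assumptions: `Problem`, `InP`, `InNP` and `p_ne_np` are hypothetical placeholders introduced here, not a real formalization of complexity classes and not taken from any library. It shows a hypothesis adopted as an axiom and a consequence derived from it, with the checker verifying the derivation but not the axiom.

```lean
-- Lean 4, core library only. Every name below is a hypothetical placeholder.

-- An abstract type of decision problems and two unspecified properties.
axiom Problem : Type
axiom InP : Problem → Prop
axiom InNP : Problem → Prop

-- The hypothesis, accepted without proof: some problem is in NP but not in P.
axiom p_ne_np : ∃ q : Problem, InNP q ∧ ¬ InP q

-- A consequence that is fully checked, but only relative to the axiom above:
-- it is not the case that every problem in NP is also in P.
theorem not_all_np_in_p : ¬ (∀ q : Problem, InNP q → InP q) :=
  fun h => Exists.elim p_ne_np (fun q hq => hq.2 (h q hq.1))
```

If the axiom were ever refuted, every theorem that depends on it would have to be reexamined, which is the sense in which such principles hold only until counter-examples are encountered.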
Each of these mathematical communities is behaving as if they were
theoretical physicists; they are doing what I call quasi-empirical mathematics.
So I’ve been delighted to witness these developments, but not so delighted
to see the striking advance of formalism in recent years.
These questions are still open, and they are very diﬃcult ones. I’ve tried
to argue in favor of a quasi-empirical stance, in favor of creativity and against
formalism, but I myself am not completely convinced by my own arguments.
More work is needed. We still do not know to what extent math is mechanical
or creative.
Thank you very much!
³ “Information-theoretic limitations of formal systems,” J. ACM 21, 1974, pp. 403–424.
Bibliography
1. David Berlinski, The Devil’s Delusion: Atheism and its Scientiﬁc Pre-
tensions
2. Stephen Jay Gould, Wonderful Life: The Burgess Shale and the Nature
of History
3. Neil Shubin, Your Inner Fish: A Journey into the 3.5-Billion-Year
History of the Human Body
4. Melanie Mitchell, Complexity: A Guided Tour
5. Jerry Fodor and Massimo Piattelli-Palmarini, What Darwin Got
Wrong
6. Stephen C. Meyer, Signature in the Cell: DNA and the Evidence for
Intelligent Design
Books by Chaitin
• Algorithmic Information Theory, Cambridge University Press, 1987.
• Information, Randomness and Incompleteness: Papers on Algorithmic
Information Theory, World Scientiﬁc, 1987, 2nd edition, 1990.
• Information-Theoretic Incompleteness, World Scientiﬁc, 1992.
• The Limits of Mathematics: A Course on Information Theory and the
Limits of Formal Reasoning, Springer, 1998. Also in Japanese.
• The Unknowable, Springer, 1999. Also in Japanese.
• Exploring Randomness, Springer, 2001.
• Conversations with a Mathematician: Math, Art, Science and the Lim-
its of Reason, Springer, 2002. Also in Portuguese and Japanese.
• From Philosophy to Program Size: Key Ideas and Methods. Lecture
Notes on Algorithmic Information Theory from the 8th Estonian Win-
ter School in Computer Science, EWSCS ’03, Tallinn Institute of Cy-
bernetics, 2003.
• Meta Math! The Quest for Omega, Pantheon, 2005. Also UK, French,
Italian, Portuguese, Japanese and Greek editions.
• Teoria algoritmica della complessità, Giappichelli, 2006.
• Thinking about Gödel and Turing: Essays on Complexity, 1970–2007,
World Scientiﬁc, 2007.
• Mathematics, Complexity & Philosophy: Lectures in Canada and Ar-
gentina, Midas, in press. This is an English/Spanish bilingual edition.
• G. Chaitin, N. da Costa, F. A. Doria, After Gödel: Exploits into an
undecidable world, in preparation.