Foundations of Induction
                 Marcus Hutter
           Canberra, ACT, 0200, Australia
            http://www.hutter1.net/



                  ANU       ETHZ

 NIPS – PhiMaLe Workshop – 17 December 2011

                                   Abstract
   Humans and many other intelligent systems (have to) learn from experience, build
models of the environment from the acquired knowledge, and use these models for
prediction. In philosophy this is called inductive inference, in statistics it is called
estimation and prediction, and in computer science it is addressed by machine
learning.
   I will first review unsuccessful attempts and unsuitable approaches towards a
general theory of induction, including Popper’s falsificationism and denial of
confirmation, frequentist statistics and much of statistical learning theory, subjective
Bayesianism, Carnap’s confirmation theory, the data paradigm, eliminative induction,
and deductive approaches. I will also debunk some other misguided views, such as
the no-free-lunch myth and pluralism.
   I will then turn to Solomonoff’s formal, general, complete, and essentially unique
theory of universal induction and prediction, rooted in algorithmic information theory
and based on the philosophical and technical ideas of Ockham, Epicurus, Bayes,
Turing, and Kolmogorov.
   This theory provably addresses most issues that have plagued other inductive
approaches, and essentially constitutes a conceptual solution to the induction
problem. Some theoretical guarantees, extensions to (re)active learning, practical
approximations, applications, and experimental results are mentioned in passing, but
they are not the focus of this talk.
   I will conclude with some general advice to philosophers and scientists interested in
the foundations of induction.

            Induction/Prediction Examples
Hypothesis testing/identification: Does treatment X cure cancer?
Do observations of white swans confirm that all ravens are black?
Model selection: Are planetary orbits circles or ellipses? How many
wavelets do I need to describe my picture well? Which genes can predict
cancer?
Parameter estimation: Bias of my coin. Eccentricity of earth’s orbit.
Sequence prediction: Predict weather/stock-quote/... tomorrow, based
on past sequence. Continue IQ test sequence like 1,4,9,16,?
Classification can be reduced to sequence prediction:
Predict whether email is spam.
Question: Is there a general & formal & complete & consistent theory
for induction & prediction?
Beyond induction: active/reward learning, fct. optimization, game theory.

            The Need for a Unified Theory
     Why do we need or should want a unified theory of induction?
 • Finding new rules for every particular (new) problem is cumbersome.
 • A plurality of theories is prone to disagreement or contradiction.
 • Axiomatization boosted mathematics & logic & deduction, and should
   do the same for induction.
 • Provides a convincing story and conceptual tools for outsiders.
 • Automate induction & science (that's what machine learning does).
 • By relating it to existing narrow/heuristic/practical approaches,
   we deepen our understanding of them and can improve them.
 • Necessary for resolving philosophical problems.
 • Unified/universal theories are often beautiful gems.
 • There is no convincing argument that the goal is unattainable.


                Math ⇔ Words
  “There is nothing that can be said by mathematical
   symbols and relations which cannot also be said by
   words.

    The converse, however, is false.
    Much that can be and is said by words
    cannot be put into equations,

    because it is nonsense.”

                  (Clifford A. Truesdell, 1966)


                Math ⇔ Words
  “There is nothing that can be said by mathematical
   symbols and relations which cannot also be said by
   words.

    The converse, however, is false.
    Much that can be and is said by words
    cannot be put into equations,

     because it is n̶o̶n̶s̶e̶n̶s̶e̶ non-science.”

                       Induction ⇔ Deduction
                    Approximate correspondence between
           the most important concepts in induction and deduction.
                                     Induction        ⇔   Deduction
Type of inference:      generalization/prediction     ⇔   specialization/derivation
Framework:                    probability axioms      =   logical axioms
Assumptions:                                  prior   =   non-logical axioms
Inference rule:                       Bayes rule      =   modus ponens
Results:                                  posterior   =   theorems
Universal scheme:         Solomonoff probability       =   Zermelo-Fraenkel set theory
Universal inference:          universal induction     =   universal theorem prover
Limitation:                        incomputable       =   incomplete (Gödel)
In practice:                     approximations       =   semi-formal proofs
Operation:                          computation       =   proof

 The foundations of induction are as solid as those for deduction.




                          Contents

                • Critique

                • Universal Induction

                • Universal Artificial Intelligence (very briefly)

                • Approximations & Applications

                • Conclusions




                Critique

                   Why Popper is Dead
 • Popper was good at popularizing philosophy
   of science outside of philosophy.
 • Popper’s appeal: simple ideas, clearly expressed.
   Noble and heroic vision of science.
 • This made him a pop star among many scientists.
 • Unfortunately his ideas (falsificationism, corroboration) are seriously
   flawed.
 • Further, there have been better philosophy/philosophers before,
   during, and after Popper (but also many worse ones!).
 • Verdict: It's time to move on and change your idol.
 • References: Godfrey-Smith (2003) Chp.4, Gardner (2001),
    Salmon (1981), Putnam (1974), Schilpp (1974).


                Popper’s Falsificationism
 • Demarcation problem: What is the difference between a scientific
   and a non-scientific theory?
 • Popper’s solution: Falsificationism: A hypothesis is scientific if and
   only if it can be refuted by some possible observation.
   Falsification is a matter of deductive logic.
 • Problem 1: Stochastic models can never be falsified in Popper’s
   strong deductive sense, since stochastic models can only become
   unlikely but never inconsistent with data.
 • Problem 2: Falsificationism alone cannot prefer to use a well-tested
   theory (e.g. how to build bridges) over a brand-new untested one,
   since both have not been falsified.



                   Popper on Simplicity

 • Why should we a priori prefer to investigate "reasonable" theories
   over "obscure" theories?

 • Popper prefers simple over complex theories because he believes
   that simple theories are easier to falsify.

 • Popper equates simplicity with falsifiability, so is not advocating a
   simplicity bias proper.

 • Problem: A complex theory with fixed parameters is as easy to
   falsify as a simple theory.

Popper’s Corroboration / (Non)Confirmation
 • Popper0 (fallibilism): We can never be completely certain about
   factual issues. (✓)
 • Popper1 (skepticism): Scientific confirmation is a myth.
 • Popper2 (no confirmation): We cannot even increase our confidence
   in the truth of a theory when it passes observational tests.
 • Popper3 (no reason to worry):
   Induction is a myth, but science does not need it anyway.
 • Popper4 (corroboration): A theory that has survived many attempts
   to falsify it is “corroborated”, and it is rational to choose more
   corroborated theories.
 • Problem: Corroboration is just a new name for confirmation or
   meaningless.

  The No Free Lunch (NFL) Theorem/Myth
 • Consider algorithms for finding the maximum of a function, and
   compare their performance uniformly averaged over all functions
   over some fixed finite domain.
 • Since sampling uniformly leads with (very) high probability to a
   totally random function (white noise), it is clear that on average no
   optimization algorithm can perform better than exhaustive search.
 ⇒ All reasonable optimization algorithms are equally good/bad on average.
 • Conclusion correct, but obviously no practical implication, since
   nobody cares about the maximum of white-noise functions.
 • Uniform and universal sampling are both (non)assumptions, but only
   universal sampling makes sense and offers a free lunch*.
   [Cartoon: "Free!" lunch signs; *subject to computation fees]


                Problems with Frequentism
 • Definition: The probability of event E is the limiting relative
   frequency of its occurrence: P(E) := lim_{n→∞} #_n(E)/n.
   (A small simulation sketch follows after this list.)
 • Circularity of definition: Limit exists only with probability 1.
   So we have explained “Probability of E” in terms of “Probability 1”.
   What does probability 1 mean?                [Cournot’s principle can help]

 • Limitation to i.i.d.: Requires independent and identically distributed
   (i.i.d) samples. But the real world is not i.i.d.
 • Reference class problem: Example: Counting the frequency of some
   disease among “similar” patients. Considering all we know
   (symptoms, weight, age, ancestry, ...) there are no two similar
   patients.                 [Machine learning via feature selection can help]
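
A minimal simulation sketch (in Python, with a made-up bias of 0.7) of the
limiting-relative-frequency definition above; note that the displayed ratio
converges to 0.7 only with probability 1, which is exactly the circularity
pointed out in the second bullet.

```python
# Hypothetical example: relative frequency #_n(E)/n of heads for a
# coin with (unknown-to-the-frequentist) bias 0.7.
import random

random.seed(0)
p_true = 0.7
heads = 0
for n in range(1, 100_001):
    heads += random.random() < p_true          # one Bernoulli sample
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {n:6d}   #_n(E)/n = {heads / n:.4f}")
```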


                Statistical Learning Theory
 • Statistical Learning Theory predominantly considers i.i.d. data.

 • E.g. Empirical Risk Minimization, PAC bounds, VC-dimension,
   Rademacher complexity, and Cross-Validation are mostly developed for
   i.i.d. data.

 • Applications: There are enough applications with data close to i.i.d.
   for Frequentists to thrive, and they are pushing their frontiers too.

 • But the Real World is not (even close to) i.i.d.

 • Real Life is a single long non-stationary non-ergodic trajectory of
   experience.

          Limitations of Other Approaches
 • Subjective Bayes: No formal procedure/theory to get prior.
 • Objective Bayes: Right in spirit, but limited to small classes
   unless community embraces information theory.
 • MDL/MML: practical approximations of universal induction.
 • Pluralism is globally inconsistent.
 • Deductive Logic: Not strong enough to allow for induction.
 • Non-monotonic reasoning, inductive logic, default reasoning
   do not properly take uncertainty into account.
 • Carnap’s confirmation theory: Only for exchangeable data.
   Cannot confirm universal hypotheses.
 • Data paradigm: Data may be more important than algorithms for
   “simple” problems, but a “lookup-table” AGI will not work.
 • Eliminative induction ignores uncertainty and information theory.


                            Summary
                       The criticized approaches
           cannot serve as a general foundation of induction.

                          Conciliation
               Of course most of the criticized approaches
                  do work in their limited domains, and
      are trying to push their boundaries towards more generality.

                      And What Now?
         Criticizing others is easy and in itself a bit pointless.
  The crucial question is whether there is something better out there.
             And indeed there is, which I will turn to now.




         Universal Induction

         Foundations of Universal Induction
         Ockham's razor (simplicity) principle
        Entities should not be multiplied beyond necessity.
        Epicurus’ principle of multiple explanations
        If more than one theory is consistent with the observations, keep
        all theories.
        Bayes’ rule for conditional probabilities
        Given the prior belief/probability one can predict all future prob-
        abilities.
        Turing’s universal machine
        Everything computable by a human using a fixed procedure can
        also be computed by a (universal) Turing machine.
        Kolmogorov’s complexity
        The complexity or information content of an object is the length
        of its shortest description on a universal Turing machine.
        Solomonoff’s universal prior=Ockham+Epicurus+Bayes+Turing
        Solves the question of how to choose the prior if nothing is known.
         ⇒ universal induction, formal Occam, AIT, MML, MDL, SRM, ...


     Science ≈ Induction ≈ Occam’s Razor
 • Grue Emerald Paradox:

    Hypothesis 1: All emeralds are green.

     Hypothesis 2: All emeralds found until the year 2020 are green;
                   thereafter all emeralds are blue.

 • Which hypothesis is more plausible? H1! Justification?

 • Occam's razor (take the simplest hypothesis consistent with the data)
   is the most important principle in machine learning and science.

 • Problem: How to quantify “simplicity”? Beauty? Elegance?
            Description Length!
    [The Grue problem goes much deeper. This is only half of the story]

   Turing Machines & Effective Enumeration
 • Turing machine (TM) = (mathematical model for an) idealized computer.
 • See e.g. textbook [HMU06].
   [Animation: Turing machine in action (TuringBeispielAnimated.gif)]
 • Instruction i: If the symbol on the tape under the head is 0/1,
   write 0/1/-, move the head left/right/not,
   and go to instruction=state j. (A minimal simulator sketch follows below.)
 • {partial recursive functions} ≡ {functions computable with a TM}.
 • A set of objects S = {o1, o2, o3, ...} can be (effectively) enumerated
   :⇐⇒ ∃ TM mapping i to ⟨oi⟩,
   where ⟨·⟩ is some (often omitted) default coding of elements in S.
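
A minimal simulator sketch for exactly this instruction format; the table
layout and the unary-successor example program are our own hypothetical
illustrations, not from the slides.

```python
# program[state][symbol] = (write, move, next_state); move in {-1, 0, +1};
# next_state None means halt. Unwritten cells read as 0.
def run_tm(program, tape, state=0, head=0, max_steps=1_000):
    cells = dict(enumerate(tape))
    for _ in range(max_steps):
        if state is None:                      # halted
            break
        write, move, state = program[state][cells.get(head, 0)]
        cells[head] = write                    # write 0/1
        head += move                           # move left/right/not
    return [cells[i] for i in sorted(cells)]

# Hypothetical example: move right over the 1s, turn the first 0 into
# a 1 (unary successor), then halt.
succ = {0: {1: (1, +1, 0),
            0: (1, 0, None)}}
print(run_tm(succ, [1, 1, 1, 0]))              # -> [1, 1, 1, 1]
```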


Information Theory & Kolmogorov Complexity
 • Quantification/interpretation of Occam’s razor:

 • Shortest description of object is best explanation.

 • Shortest program for a string on a Turing machine
   T leads to best extrapolation=prediction.

                      K_T(x) = min_p { l(p) : T(p) = x }

 • Prediction is best for a natural universal Turing machine U.

     Kolmogorov complexity:  K(x) := K_U(x) ≤ K_T(x) + c_T
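
Since K is incomputable (more on this later), any concrete illustration must
replace the universal machine by a real compressor. Below is a minimal sketch
with zlib as an arbitrary stand-in for one particular machine T; the
compressed length is only a crude upper bound on the true complexity.

```python
import random
import zlib

def approx_K(s: str) -> int:
    # Bits of the zlib-compressed string: an upper bound on K_T(s)
    # for T = the zlib decompressor (up to small additive constants).
    return 8 * len(zlib.compress(s.encode(), 9))

random.seed(1)
regular = "0101" * 500                                     # simple pattern
noise = "".join(random.choice("01") for _ in range(2000))  # random bits
print(approx_K(regular), approx_K(noise))  # regular comes out far smaller
```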

                Bayesian Probability Theory
Given (1): Models P(D|H_i) for the probability of
observing data D when H_i is true.
Given (2): Prior probability over hypotheses P(H_i).
Goal: Posterior probability P(H_i|D) of H_i,
after having seen data D.
Solution:

Bayes' rule:   P(H_i|D) = P(D|H_i) · P(H_i) / Σ_i P(D|H_i) · P(H_i)

(1) Models P(D|H_i) are usually easy to describe (objective probabilities).
(2) But Bayesian probability theory does not tell us how to choose the prior
P(H_i) (subjective probabilities).
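
A minimal numerical sketch of the rule. The two coin hypotheses, the data
(8 heads in 10 flips), and the uniform prior are all made up for
illustration; the prior is precisely the ingredient that, per (2), Bayes
does not supply.

```python
from math import comb

def likelihood(heads, flips, bias):     # P(D | H_i) for a binomial model
    return comb(flips, heads) * bias**heads * (1 - bias)**(flips - heads)

biases = {"fair (0.5)": 0.5, "loaded (0.8)": 0.8}
prior = {h: 0.5 for h in biases}        # chosen by fiat

evidence = sum(likelihood(8, 10, b) * prior[h] for h, b in biases.items())
for h, b in biases.items():             # Bayes' rule, hypothesis by hypothesis
    print(h, round(likelihood(8, 10, b) * prior[h] / evidence, 3))
```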

            Algorithmic Probability Theory
 • Epicurus: If more than one theory is consistent
   with the observations, keep all theories.
 • ⇒ uniform prior over all Hi ?
 • Refinement with Occam’s razor quantified
   in terms of Kolmogorov complexity:

                      P(H_i) := w^U_{H_i} := 2^{−K_{T/U}(H_i)}

 • Fixing T we have a complete theory for prediction.
   Problem: How to choose T .
 • Choosing U we have a universal theory for prediction.
   Observation: Particular choice of U does not matter much.
   Problem: Incomputable.

 Inductive Inference & Universal Forecasting
 • Solomonoff combined Occam, Epicurus, Bayes, and
   Turing into one formal theory of sequential prediction.
 • M (x) = probability that a universal Turing
   machine outputs x when provided with
   fair coin flips on the input tape.
 • A posteriori probability of y given x is M (y|x) = M (xy)/M (x).
 • Given x1 , .., xt−1 , the probability of xt is M (xt |x1 ...xt−1 ).
 • Immediate “applications”:
   - Weather forecasting: xt ∈ {sun,rain}.
   - Stock-market prediction: xt ∈ {bear,bull}.
    - Continuing number sequences in an IQ test: xt ∈ ℕ.
 • Optimal universal inductive reasoning system!
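
The real M is incomputable, but a finite caricature shows the mechanics. In
the sketch below (our own toy construction) the "programs" are all binary
patterns of length at most 8, each repeated forever and given prior weight
2^-length; the predictive probability is the ratio M(x1)/M(x) as above.

```python
def models(max_len=8):
    # enumerate (pattern, weight) pairs with weight 2^-length(pattern)
    for L in range(1, max_len + 1):
        for i in range(2 ** L):
            yield format(i, f"0{L}b"), 2.0 ** (-L)

def consistent(pattern, x):
    # does repeating `pattern` forever produce a sequence starting with x?
    return all(x[t] == pattern[t % len(pattern)] for t in range(len(x)))

def M(x):
    # total weight of all toy "programs" whose output starts with x
    return sum(w for p, w in models() if consistent(p, x))

x = "110110110"
print("M(1 | x) =", M(x + "1") / M(x))   # close to 1: period-3 pattern found
```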

           Some Prediction Bounds for M
M(x) = universal distribution;  µ(x) = unknown true computable distribution
(no i.i.d. or any other assumptions);  h_n := Σ_{x_n} (M(x_n|x_{<n}) − µ(x_n|x_{<n}))²
Notation: ≤× / =× / ≤+ denote inequality/equality up to a multiplicative
resp. additive constant.
• Total bound: Σ_{n=1}^∞ E[h_n] ≤ K(µ) ln 2, which implies
  Convergence: M(x_n|x_{<n}) → µ(x_n|x_{<n}) with µ-probability 1.
• Instantaneous i.i.d. bounds: For i.i.d. M with continuous, discrete, and
  universal prior, respectively:
  E[h_n] ≤× (1/n) ln w(µ)^{-1}  and  E[h_n] ≤× (1/n) ln w_µ^{-1} = (1/n) K(µ) ln 2.
• Bounds for computable environments: Rapidly M(x_t|x_{<t}) → 1 on every
  computable sequence x_{1:∞} (whichsoever, e.g. 1^∞ or the digits of π or e),
  i.e. M quickly recognizes the structure of the sequence.
• Weak instantaneous bounds: valid for all n, all x_{1:n}, and all x̄_n ≠ x_n:
  2^{−K(n)} ≤× M(x̄_n|x_{<n}) ≤× 2^{2K(x_{1:n}*)−K(n)}
• Magic instance numbers: e.g. M(0|1^n) =× 2^{−K(n)} → 0, but spikes up for
  simple n. M is cautious at magic instance numbers n.
• Future bounds / errors to come: If our past observations ω_{1:n} contain a
  lot of information about µ, we make few errors in future:
  Σ_{t=n+1}^∞ E[h_t|ω_{1:n}] ≤+ [K(µ|ω_{1:n}) + K(n)] ln 2

                Some other Properties of M
 • Problem of zero prior / confirmation of universal hypotheses:
   P[All ravens black | n black ravens]  ≡ 0 in the Bayes-Laplace model,
   but → 1 fast for the universal prior w^U_θ.

 • Reparametrization and regrouping invariance: w^U_θ = 2^{−K(θ)} always
   exists and is invariant w.r.t. all computable reparametrizations f.
   (The Jeffreys prior is invariant only w.r.t. bijections, and does not
   always exist.)

 • The Problem of Old Evidence: No risk of biasing the prior towards
   past data, since w^U_θ is fixed and independent of the model class M.


 • The Problem of New Theories: Updating of M is not necessary,
   since universal class MU includes already all.
 • M predicts better than all other mixture predictors based on any
   (continuous or discrete) model class and prior, even in
   non-computable environments.



          More Stuff / Critique / Problems

 • Prior knowledge y can be incorporated by using the "subjective" prior
   w^U_{ν|y} = 2^{−K(ν|y)} or by prefixing observation x by y.

 • Additive/multiplicative constant fudges and U-dependence are often
   (but not always) harmless.

 • Incomputability: K and M can serve as “gold standards” which
   practitioners should aim at, but have to be (crudely) approximated
   in practice (MDL [Ris89], MML [Wal05], LZW [LZ76], CTW [WSTT95],
   NCD [CV05]).




             Universal
       Artificial Intelligence

   Induction→Prediction→Decision→Action
Having or acquiring or learning or inducing a model of the environment
an agent interacts with allows the agent to make predictions and utilize
them in its decision process of finding a good next action.
Induction infers general models from specific observations/facts/data,
usually exhibiting regularities or properties or relations in the latter.

                             Example
Induction: Find a model of the world economy.
Prediction: Use the model for predicting the future stock market.
Decision: Decide whether to invest assets in stocks or bonds.
Action: Trading large quantities of stocks influences the market.

                Sequential Decision Theory
Setup: For t = 1, 2, 3, 4, ...
       Given sequence x1 , x2 , ..., xt−1
       (1) predict/make decision yt ,
       (2) observe xt ,
       (3) suffer loss Loss(xt , yt ),
       (4) t → t + 1, goto (1)
Goal: Minimize expected Loss.
Greedy minimization of expected loss is optimal if (important!):
the decision yt does not influence the environment (future observations),
and the Loss function is known.
Problem: Expectation w.r.t. what?
Solution: W.r.t. universal distribution M if true distr. is unknown.
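
A sketch of this loop under 0-1 loss, with a simple Laplace-rule predictor
standing in for the (incomputable) universal distribution M; the predictor
choice and the observation stream are ours, for illustration only.

```python
def decide(p1, actions, loss):
    # greedy: pick y minimizing E[Loss] = (1-p1)*loss(0,y) + p1*loss(1,y)
    return min(actions, key=lambda y: (1 - p1) * loss(0, y) + p1 * loss(1, y))

loss = lambda x, y: int(x != y)          # 0-1 loss
seq = [1, 1, 0, 1, 1, 1, 0, 1]           # hypothetical observation stream
ones = n = total = 0
for x in seq:
    p1 = (ones + 1) / (n + 2)            # Laplace rule: P(next bit = 1)
    total += loss(x, decide(p1, [0, 1], loss))   # (1) decide, (2)+(3) observe, suffer
    ones, n = ones + x, n + 1
print("cumulative loss:", total)
```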


                           Agent Model
                           with Reward
                           if actions/decisions a
                        influence the environment q

[Diagram: the agent p (with work tape) and the environment q (with work
tape) interact in a loop: the agent outputs actions a1, a2, a3, ... and
receives reward/observation pairs r1|o1, r2|o2, r3|o3, ... in return.]

            Universal Artificial Intelligence
Key idea: Optimal action/plan/policy based on the simplest world
model consistent with history. Formally ...
 AIXI:  a_k := arg max_{a_k} Σ_{o_k r_k} ... max_{a_m} Σ_{o_m r_m} [r_k + ... + r_m] · Σ_{p : U(p,a_1..a_m) = o_1 r_1 .. o_m r_m} 2^{−length(p)}

  (k = now; a = action, o = observation, r = reward; U = universal TM,
   p = program, m = lifespan)
AIXI is an elegant, complete, essentially unique,
and limit-computable mathematical theory of AI.
Claim: AIXI is the most intelligent environment-independent,
i.e. universally optimal, agent possible.
Proof: For formalizations, quantifications, and proofs see [Hut05].
Problem: Computationally intractable.
Achievement: Well-defines AI. Gold standard to aim at.
Inspired practical algorithms. Cf. infeasible exact minimax.             [H’00-05]
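
A heavily simplified expectimax sketch of the recursion above: the
incomputable mixture over programs p is replaced by a single known
environment model (a hypothetical toy of our own), leaving only the
alternating max-over-actions and expectation-over-percepts structure.

```python
def value(env, h, horizon):
    # V(h) = max_a sum_{o,r} P(o,r | h,a) * (r + V(h a o r))
    if horizon == 0:
        return 0.0
    return max(
        sum(prob * (r + value(env, h + ((a, o, r),), horizon - 1))
            for (o, r), prob in env(h, a).items())
        for a in (0, 1))

def best_action(env, h, horizon):
    return max((0, 1), key=lambda a: sum(
        prob * (r + value(env, h + ((a, o, r),), horizon - 1))
        for (o, r), prob in env(h, a).items()))

# Toy environment: action 1 yields reward 1 with probability 0.6,
# action 0 yields nothing; observations are trivial.
env = lambda h, a: {(0, 1): 0.6, (0, 0): 0.4} if a == 1 else {(0, 0): 1.0}
print(best_action(env, (), horizon=3))   # -> 1
```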




                Applications &
                Approximations



 The Minimum Description Length Principle

 • Approximation of Solomonoff,
    since M is incomputable:

 • M(x) ≈ 2^{−K_U(x)} (quite good)

 • K_U(x) ≈ K_T(x) (very crude)

 • Predicting the y with highest M(y|x) is then approximately the same as

 • MDL: predict the y with smallest K_T(xy).
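
A sketch of this prediction rule with zlib as an arbitrary, crude stand-in
for K_T; the observed string and the candidate continuations are made up
for illustration.

```python
import zlib

def KT(s: str) -> int:
    # compressed length in bytes, a rough proxy for K_T(s)
    return len(zlib.compress(s.encode(), 9))

x = "abab" * 50                          # hypothetical observed sequence
candidates = ["abab", "zzzz", "qqqq"]    # hypothetical continuations
print(min(candidates, key=lambda y: KT(x + y)))   # -> "abab"
```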

                    Universal Clustering
 • Question: When is object x similar to object y?
 • Universal solution: x similar to y
   ⇔ x can be easily (re)constructed from y
    ⇔ K(x|y) := min_p { l(p) : U(p, y) = x } is small.
 • Universal Similarity: Symmetrize & normalize K(x|y).
 • Normalized compression distance: Approximate K by K_T
   (a minimal sketch follows after this list).
 • Practice: For T choose a (de)compressor like lzw or gzip or bzip(2).
 • Multiple objects ⇒ similarity matrix ⇒ similarity tree.
 • Applications: Completely automatic reconstruction (a) of the
   evolutionary tree of 24 mammals based on complete mtDNA, and
   (b) of the classification tree of 52 languages based on the
   declaration of human rights and (c) many others. [CilibrasiVitanyi’05]
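
The promised sketch: the normalized compression distance
NCD(x,y) = (C(xy) − min(C(x),C(y))) / max(C(x),C(y)) of Cilibrasi & Vitanyi,
with zlib as our choice of compressor C and made-up test strings.

```python
import zlib

def C(b: bytes) -> int:
    return len(zlib.compress(b, 9))      # compressed length in bytes

def ncd(x: bytes, y: bytes) -> float:
    return (C(x + y) - min(C(x), C(y))) / max(C(x), C(y))

s1 = b"the quick brown fox jumps over the lazy dog " * 20
s2 = b"the quick brown fox leaps over the lazy cat " * 20
s3 = bytes(range(256)) * 4               # unrelated byte soup
print(ncd(s1, s2), ncd(s1, s3))          # similar texts score markedly lower
```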

                      Universal Search
 • Levin search: Fastest algorithm for inversion and optimization
   problems (a toy sketch follows after this list).
 • Theoretical application:
   Assume somebody found a non-constructive
   proof of P=NP, then Levin-search is a polynomial
   time algorithm for every NP (complete) problem.
 • Practical (OOPS) applications (J. Schmidhuber)
    Maze, Towers of Hanoi, robotics, ...
 • FastPrg: The asymptotically fastest and shortest algorithm for all
   well-defined problems.
 • Computable Approximations of AIXI:
   AIXItl and AIξ and MC-AIXI-CTW and ΦMDP.
 • Human Knowledge Compression Prize: (50'000€)
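
The toy sketch promised above: in phase k, every candidate "program" of
length l ≤ k is restarted and run for 2^(k−l) steps, so the total time
stays within a constant factor of 2^{l(p*)} · time(p*) for the fastest
solver p*. The two enumerators and the inversion problem are hypothetical
stand-ins for real programs.

```python
def levin_search(programs, is_solution, max_phase=20):
    # programs: list of (length, generator_factory) pairs
    for k in range(1, max_phase + 1):
        for length, make in programs:
            if length > k:
                continue
            gen = make()                          # restart in every phase
            for _ in range(2 ** (k - length)):    # time budget 2^(k - length)
                candidate = next(gen, None)
                if candidate is not None and is_solution(candidate):
                    return candidate
    return None

# Hypothetical inversion problem: find x with x*x == 144.
programs = [
    (2, lambda: iter(range(1000))),            # "short" program: count up
    (5, lambda: iter(range(1000, 0, -1))),     # "longer" program: count down
]
print(levin_search(programs, lambda x: x * x == 144))   # -> 12
```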

                               A Monte-Carlo AIXI Approximation
                              based on Upper Confidence Tree (UCT) search for planning
                             and Context Tree Weighting (CTW) compression for learning
[Plot: "Normalized Learning Scalability": normalized average reward per
trial (0 to 1, optimum at 1) versus experience (100 to 1,000,000 cycles)
for the domains Tiger, 4x4 Grid, 1d Maze, Extended Tiger, TicTacToe,
Cheese Maze, and Pocman*; performance approaches the optimum with
experience. [VNHUS'09-11]]




                Conclusion

                               Summary
 • Conceptually and mathematically the problem of induction is solved.
 • Computational problems and some philosophical questions remain.
 • Ingredients for induction and prediction:
   Ockham, Epicurus, Turing, Bayes, Kolmogorov, Solomonoff
 • For decisions and actions: Include Bellman.
 • Mathematical results: consistency, bounds, optimality, many others.
 • Most philosophical riddles around induction are solved.
 • Experimental results via practical compressors.
                 Induction ≈ Science ≈ Machine Learning ≈
                Ockham’s razor ≈ Compression ≈ Intelligence.

                              Advice
 • Accept Universal Induction (UI) as the best conceptual solution of
   the induction problem so far.
 • Stand on the shoulders of giants like Shannon, Bayes, Turing,
   Kolmogorov, Solomonoff, Wallace, Rissanen, Bellman.
 • Work out defects / what is missing, and try to improve, or
 • Work on alternatives but then benchmark your approach against
   state of the art UI.
 • Cranks who have not understood the giants and try to reinvent the
   wheel from scratch can safely be ignored.
       Never trust a t̶h̶e̶o̶r̶y̶ experiment if it is not supported by
       an e̶x̶p̶e̶r̶i̶m̶e̶n̶t̶ theory.

                When it’s OK to ignore UI
 • if your pursued approach already works sufficiently well
 • if your problem is simple enough (e.g. i.i.d.)
 • if you do not care about a principled/sound solution
 • if you’re happy to succeed by trial-and-error (with restrictions)

                    Information Theory
 • Information Theory plays an even more significant role for induction
   than this presentation might suggest.
 • Algorithmic Information Theory is superior to Shannon Information.
 • There are AIT versions that even capture Meaningful Information.



                            Outlook
 • Use compression size as general performance measure
   (like perplexity is used in speech)
 • Via code-length view, many approaches become comparable, and
   may be regarded as approximations to UI.
 • This should lead to better compression algorithms which in turn
   should lead to better learning algorithms.
 • Address open problems in induction within the UI framework.


           Thanks!           Questions?             Details:
[RH11]   S. Rathmanner and M. Hutter. A philosophical treatise of universal
         induction. Entropy, 13(6):1076–1136, 2011.

[Hut07] M. Hutter. On universal prediction and Bayesian confirmation.
        Theoretical Computer Science, 384(1):33–48, 2007.

[LH07]   S. Legg and M. Hutter. Universal intelligence: A definition of
         machine intelligence. Minds & Machines, 17(4):391–444, 2007.

[Hut05] M. Hutter. Universal Artificial Intelligence: Sequential Decisions
        based on Algorithmic Probability. Springer, Berlin, 2005.

[GS03]   P. Godfrey-Smith. Theory and Reality: An Introduction to the
         Philosophy of Science. University of Chicago, 2003.

More Related Content

Viewers also liked

Recursively Defined Sequences
Recursively Defined SequencesRecursively Defined Sequences
Recursively Defined SequencesBruce Lightner
 
Theories of Something, Everything & Nothing.
Theories of Something, Everything & Nothing.Theories of Something, Everything & Nothing.
Theories of Something, Everything & Nothing.mahutte
 
Can Intelligence Explode?
Can Intelligence Explode?Can Intelligence Explode?
Can Intelligence Explode?mahutte
 
Ground Excited Systems
Ground Excited SystemsGround Excited Systems
Ground Excited SystemsTeja Ande
 
Seismic Design of Buried Structures in PH and NZ
Seismic Design of Buried Structures in PH and NZSeismic Design of Buried Structures in PH and NZ
Seismic Design of Buried Structures in PH and NZLawrence Galvez
 
Base Excited Systems
Base Excited SystemsBase Excited Systems
Base Excited SystemsTeja Ande
 
Lesson14 Exmpl
Lesson14 ExmplLesson14 Exmpl
Lesson14 ExmplTeja Ande
 
Membranes and thin plates
Membranes and thin platesMembranes and thin plates
Membranes and thin platesMaged Mostafa
 
Vibration measurements2
Vibration measurements2Vibration measurements2
Vibration measurements2Maged Mostafa
 
Dynamic response to harmonic excitation
Dynamic response to harmonic excitationDynamic response to harmonic excitation
Dynamic response to harmonic excitationUniversity of Glasgow
 
02 Damping - SPC408 - Fall2016
02 Damping - SPC408 - Fall201602 Damping - SPC408 - Fall2016
02 Damping - SPC408 - Fall2016Maged Mostafa
 
Fly ash as a wood substitute
Fly ash as a wood substituteFly ash as a wood substitute
Fly ash as a wood substituteRanjeet Patil
 
App4 time history analysis
App4 time history analysisApp4 time history analysis
App4 time history analysisrafa far
 

Viewers also liked (20)

Recursively Defined Sequences
Recursively Defined SequencesRecursively Defined Sequences
Recursively Defined Sequences
 
Theories of Something, Everything & Nothing.
Theories of Something, Everything & Nothing.Theories of Something, Everything & Nothing.
Theories of Something, Everything & Nothing.
 
Can Intelligence Explode?
Can Intelligence Explode?Can Intelligence Explode?
Can Intelligence Explode?
 
My L&T Statement
My L&T StatementMy L&T Statement
My L&T Statement
 
Sam Session
Sam SessionSam Session
Sam Session
 
1q05 performancemonitoring
1q05 performancemonitoring1q05 performancemonitoring
1q05 performancemonitoring
 
Ground Excited Systems
Ground Excited SystemsGround Excited Systems
Ground Excited Systems
 
Seismic Design of Buried Structures in PH and NZ
Seismic Design of Buried Structures in PH and NZSeismic Design of Buried Structures in PH and NZ
Seismic Design of Buried Structures in PH and NZ
 
Base Excited Systems
Base Excited SystemsBase Excited Systems
Base Excited Systems
 
Lesson14 Exmpl
Lesson14 ExmplLesson14 Exmpl
Lesson14 Exmpl
 
Membranes and thin plates
Membranes and thin platesMembranes and thin plates
Membranes and thin plates
 
Vibration measurements2
Vibration measurements2Vibration measurements2
Vibration measurements2
 
Periodic Structures
Periodic StructuresPeriodic Structures
Periodic Structures
 
Dynamic response to harmonic excitation
Dynamic response to harmonic excitationDynamic response to harmonic excitation
Dynamic response to harmonic excitation
 
05 continuous shaft
05 continuous shaft05 continuous shaft
05 continuous shaft
 
02 Damping - SPC408 - Fall2016
02 Damping - SPC408 - Fall201602 Damping - SPC408 - Fall2016
02 Damping - SPC408 - Fall2016
 
Wind provisions
Wind provisionsWind provisions
Wind provisions
 
Fly ash as a wood substitute
Fly ash as a wood substituteFly ash as a wood substitute
Fly ash as a wood substitute
 
App4 time history analysis
App4 time history analysisApp4 time history analysis
App4 time history analysis
 
Initial Value Problems
Initial Value ProblemsInitial Value Problems
Initial Value Problems
 

Similar to Foundations of Induction

Foundations of Machine Learning
Foundations of Machine LearningFoundations of Machine Learning
Foundations of Machine Learningmahutte
 
The shaky foundations of science slides - James Fodor
The shaky foundations of science slides - James FodorThe shaky foundations of science slides - James Fodor
The shaky foundations of science slides - James FodorAdam Ford
 
A reflection on modeling and the nature of knowledge
A reflection on modeling and the nature of knowledgeA reflection on modeling and the nature of knowledge
A reflection on modeling and the nature of knowledgeJorge Zazueta
 
Natural sciences 2012 13
Natural sciences 2012 13Natural sciences 2012 13
Natural sciences 2012 13Kieran Ryan
 
Research paradigm
Research paradigmResearch paradigm
Research paradigmAmina Tariq
 
Research Methods and Paradigms
Research Methods and ParadigmsResearch Methods and Paradigms
Research Methods and ParadigmsDr Bryan Mills
 
IMS.BrownBagSeminar.QRP
IMS.BrownBagSeminar.QRPIMS.BrownBagSeminar.QRP
IMS.BrownBagSeminar.QRPtim smits
 
research methodology and description on designs
research methodology and description on designsresearch methodology and description on designs
research methodology and description on designsshaila55
 
Complexity and Context-Dependency (version for Bath IOP Seminar)
Complexity and Context-Dependency (version for Bath IOP Seminar)Complexity and Context-Dependency (version for Bath IOP Seminar)
Complexity and Context-Dependency (version for Bath IOP Seminar)Bruce Edmonds
 
Role of theory in research
Role of theory in researchRole of theory in research
Role of theory in researchAman Qureshi
 
Logic and scientific method
Logic and scientific methodLogic and scientific method
Logic and scientific methodBachtiar Idris
 
Research methodology Chapter 1
Research methodology Chapter 1Research methodology Chapter 1
Research methodology Chapter 1Pulchowk Campus
 
Mff715 w1 1_introto_research_fall11
Mff715 w1 1_introto_research_fall11Mff715 w1 1_introto_research_fall11
Mff715 w1 1_introto_research_fall11Rachel Chung
 
Fuzzy mathematics:An application oriented introduction
Fuzzy mathematics:An application oriented introductionFuzzy mathematics:An application oriented introduction
Fuzzy mathematics:An application oriented introductionNagasuri Bala Venkateswarlu
 
The logic of informal proofs
The logic of informal proofsThe logic of informal proofs
The logic of informal proofsBrendan Larvor
 

Similar to Foundations of Induction (20)

Foundations of Machine Learning
Foundations of Machine LearningFoundations of Machine Learning
Foundations of Machine Learning
 
The shaky foundations of science slides - James Fodor
The shaky foundations of science slides - James FodorThe shaky foundations of science slides - James Fodor
The shaky foundations of science slides - James Fodor
 
scientific.ppt
scientific.pptscientific.ppt
scientific.ppt
 
A reflection on modeling and the nature of knowledge
A reflection on modeling and the nature of knowledgeA reflection on modeling and the nature of knowledge
A reflection on modeling and the nature of knowledge
 
Natural sciences 2012 13
Natural sciences 2012 13Natural sciences 2012 13
Natural sciences 2012 13
 
Theory building (brm)
Theory building (brm)Theory building (brm)
Theory building (brm)
 
Research paradigm
Research paradigmResearch paradigm
Research paradigm
 
On theories
On theoriesOn theories
On theories
 
Reasoning in AI.pdf
Reasoning in AI.pdfReasoning in AI.pdf
Reasoning in AI.pdf
 
Inductive Essay Examples
Inductive Essay ExamplesInductive Essay Examples
Inductive Essay Examples
 
Research Methods and Paradigms
Research Methods and ParadigmsResearch Methods and Paradigms
Research Methods and Paradigms
 
IMS.BrownBagSeminar.QRP
IMS.BrownBagSeminar.QRPIMS.BrownBagSeminar.QRP
IMS.BrownBagSeminar.QRP
 
research methodology and description on designs
research methodology and description on designsresearch methodology and description on designs
research methodology and description on designs
 
Complexity and Context-Dependency (version for Bath IOP Seminar)
Complexity and Context-Dependency (version for Bath IOP Seminar)Complexity and Context-Dependency (version for Bath IOP Seminar)
Complexity and Context-Dependency (version for Bath IOP Seminar)
 
Role of theory in research
Role of theory in researchRole of theory in research
Role of theory in research
 
Logic and scientific method
Logic and scientific methodLogic and scientific method
Logic and scientific method
 
Research methodology Chapter 1
Research methodology Chapter 1Research methodology Chapter 1
Research methodology Chapter 1
 
Mff715 w1 1_introto_research_fall11
Mff715 w1 1_introto_research_fall11Mff715 w1 1_introto_research_fall11
Mff715 w1 1_introto_research_fall11
 
Fuzzy mathematics:An application oriented introduction
Fuzzy mathematics:An application oriented introductionFuzzy mathematics:An application oriented introduction
Fuzzy mathematics:An application oriented introduction
 
The logic of informal proofs
The logic of informal proofsThe logic of informal proofs
The logic of informal proofs
 

Recently uploaded

Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 

Recently uploaded (20)

Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 

Foundations of Induction

  • 1. Foundations of Induction Marcus Hutter Canberra, ACT, 0200, Australia http://www.hutter1.net/ ANU ETHZ NIPS – PhiMaLe Workshop – 17 December 2011
  • 2. Marcus Hutter -2- Foundations of Induction Abstract Humans and many other intelligent systems (have to) learn from experience, build models of the environment from the acquired knowledge, and use these models for prediction. In philosophy this is called inductive inference, in statistics it is called estimation and prediction, and in computer science it is addressed by machine learning. I will first review unsuccessful attempts and unsuitable approaches towards a general theory of induction, including Popper’s falsificationism and denial of confirmation, frequentist statistics and much of statistical learning theory, subjective Bayesianism, Carnap’s confirmation theory, the data paradigm, eliminative induction, and deductive approaches. I will also debunk some other misguided views, such as the no-free-lunch myth and pluralism. I will then turn to Solomonoff’s formal, general, complete, and essentially unique theory of universal induction and prediction, rooted in algorithmic information theory and based on the philosophical and technical ideas of Ockham, Epicurus, Bayes, Turing, and Kolmogorov. This theory provably addresses most issues that have plagued other inductive approaches, and essentially constitutes a conceptual solution to the induction problem. Some theoretical guarantees, extensions to (re)active learning, practical approximations, applications, and experimental results are mentioned in passing, but they are not the focus of this talk. I will conclude with some general advice to philosophers and scientists interested in the foundations of induction.
  • 3. Marcus Hutter -3- Foundations of Induction Induction/Prediction Examples Hypothesis testing/identification: Does treatment X cure cancer? Do observations of white swans confirm that all ravens are black? Model selection: Are planetary orbits circles or ellipses? How many wavelets do I need to describe my picture well? Which genes can predict cancer? Parameter estimation: Bias of my coin. Eccentricity of earth’s orbit. Sequence prediction: Predict weather/stock-quote/... tomorrow, based on past sequence. Continue IQ test sequence like 1,4,9,16,? Classification can be reduced to sequence prediction: Predict whether email is spam. Question: Is there a general & formal & complete & consistent theory for induction & prediction? Beyond induction: active/reward learning, fct. optimization, game theory.
  • 4. Marcus Hutter -4- Foundations of Induction The Need of a Unified Theory Why do we need or should want a unified theory of induction? • Finding new rules for every particular (new) problem is cumbersome. • A plurality of theories is prone to disagreement or contradiction. • Axiomatization boosted mathematics&logic&deduction and so (should) induction. • Provides a convincing story and conceptual tools for outsiders. • Automatize induction&science (that’s what machine learning does) • By relating it to existing narrow/heuristic/practical approaches we deepen our understanding of and can improve them. • Necessary for resolving philosophical problems. • Unified/universal theories are often beautiful gems. • There is no convincing argument that the goal is unattainable.
  • 5. Marcus Hutter -5- Foundations of Induction Math ⇔ Words “There is nothing that can be said by mathematical symbols and relations which cannot also be said by words. The converse, however, is false. Much that can be and is said by words cannot be put into equations, because it is nonsense.” (Clifford A. Truesdell, 1966)
  • 6. Marcus Hutter -6- Foundations of Induction Math ⇔ Words “There is nothing that can be said by mathematical symbols and relations which cannot also be said by words. The converse, however, is false. Much that can be and is said by words cannot be put into equations, because it is nonsense xxxxx-science.”
  • 7. Marcus Hutter -7- Foundations of Induction Induction ⇔ Deduction Approximate correspondence between the most important concepts in induction and deduction. Induction ⇔ Deduction Type of inference: generalization/prediction ⇔ specialization/derivation Framework: probability axioms = logical axioms Assumptions: prior = non-logical axioms Inference rule: Bayes rule = modus ponens Results: posterior = theorems Universal scheme: Solomonoff probability = Zermelo-Fraenkel set theory Universal inference: universal induction = universal theorem prover Limitation: incomputable = incomplete (G¨del) o In practice: approximations = semi-formal proofs Operation: computation = proof The foundations of induction are as solid as those for deduction.
  • 8. Marcus Hutter -8- Foundations of Induction Contents • Critique • Universal Induction • Universal Artificial Intelligence (very briefly) • Approximations & Applications • Conclusions
  • 9. Marcus Hutter -9- Foundations of Induction Critique
  • 10. Marcus Hutter - 10 - Foundations of Induction Why Popper is Dead • Popper was good at popularizing philosophy of science outside of philosophy. • Popper’s appeal: simple ideas, clearly expressed. Noble and heroic vision of science. • This made him a pop star among many scientists. • Unfortunately his ideas (falsificationism, corroboration) are seriously flawed. • Further, there have been better philosophy/philosophers before, during, and after Popper (but also many worse ones!) • Fazit: It’s time to move on and change your idol. • References: Godfrey-Smith (2003) Chp.4, Gardner (2001), Salmon (1981), Putnam (1974), Schilpp (1974).
  • 11. Marcus Hutter - 11 - Foundations of Induction Popper’s Falsificationism • Demarcation problem: What is the difference between a scientific and a non-scientific theory? • Popper’s solution: Falsificationism: A hypothesis is scientific if and only if it can be refuted by some possible observation. Falsification is a matter of deductive logic. • Problem 1: Stochastic models can never be falsified in Popper’s strong deductive sense, since stochastic models can only become unlikely but never inconsistent with data. • Problem 2: Falsificationism alone cannot prefer to use a well-tested theory (e.g. how to build bridges) over a brand-new untested one, since both have not been falsified.
  • 12. Marcus Hutter - 12 - Foundations of Induction Popper on Simplicity • Why should we a-priori prefer to investigate “reasonable” theories over “obscure” theories. • Popper prefers simple over complex theories because he believes that simple theories are easier to falsify. • Popper equates simplicity with falsifiability, so is not advocating a simplicity bias proper. • Problem: A complex theory with fixed parameters is as easy to falsify as a simple theory.
  • 13. Marcus Hutter - 13 - Foundations of Induction Popper’s Corroboration / (Non)Confirmation • Popper0 (fallibilism): We can never be completely certain about factual issues ( ) • Popper1 (skepticism): Scientific confirmation is a myth. • Popper2 (no confirmation): We cannot even increase our confidence in the truth of a theory when it passes observational tests. • Popper3 (no reason to worry): Induction is a myth, but science does not need it anyway. • Popper4 (corroboration): A theory that has survived many attempts to falsify it is “corroborated”, and it is rational to choose more corroborated theories. • Problem: Corroboration is just a new name for confirmation or meaningless.
  • 14. Marcus Hutter - 14 - Foundations of Induction The No Free Lunch (NFL) Theorem/Myth • Consider algorithms for finding the maximum of a function, and compare their performance uniformly averaged over all functions over some fixed finite domain. • Since sampling uniformly leads with (very) high probability to a totally random function (white noise), it is clear that on average no optimization algorithm can perform better than exhaustive search. . ⇒ All reasonable optimization algorithms Free! are equally good/bad on average. • Conclusion correct, but obviously no practical implication, since nobody cares about the maximum of white noise functions. Free!* • Uniform and universal sampling are both (non)assumptions, but only universal sampling makes sense and offers a free lunch. *Subject to computation fees
Marcus Hutter - 15 - Foundations of Induction
Problems with Frequentism
• Definition: The probability of event E is the limiting relative frequency of its occurrence: P(E) := lim_{n→∞} #_n(E)/n.
• Circularity of the definition: The limit exists only with probability 1. So we have explained "probability of E" in terms of "probability 1". What does probability 1 mean? [Cournot's principle can help]
• Limitation to i.i.d.: Requires independent and identically distributed (i.i.d.) samples. But the real world is not i.i.d.
• Reference class problem: Example: counting the frequency of some disease among "similar" patients. Considering all we know (symptoms, weight, age, ancestry, ...), there are no two similar patients. [Machine learning via feature selection can help]
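The definition is trivially simulated (a hypothetical fair coin, for illustration only). Note, though, that the convergence the printout suggests is itself only a probability-1 statement (the strong law of large numbers), which is precisely the circularity criticized above.

```python
import random

random.seed(0)
heads = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n in range(1, 100_001):
    heads += random.random() < 0.5          # one flip of a fair coin
    if n in checkpoints:
        print(f"n = {n:>6}   #_n(heads)/n = {heads / n:.4f}")  # -> 0.5 w.p.1
```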
Marcus Hutter - 16 - Foundations of Induction
Statistical Learning Theory
• Statistical Learning Theory predominantly considers i.i.d. data.
• E.g. Empirical Risk Minimization, PAC bounds, VC-dimension, Rademacher complexity, and Cross-Validation are mostly developed for i.i.d. data.
• Applications: There are enough applications with data close to i.i.d. for frequentists to thrive, and they are pushing their frontiers too.
• But the Real World is not (even close to) i.i.d.
• Real Life is a single long non-stationary non-ergodic trajectory of experience.
Marcus Hutter - 17 - Foundations of Induction
Limitations of Other Approaches
• Subjective Bayes: No formal procedure/theory to get the prior.
• Objective Bayes: Right in spirit, but limited to small classes unless the community embraces information theory.
• MDL/MML: Practical approximations of universal induction.
• Pluralism is globally inconsistent.
• Deductive logic: Not strong enough to allow for induction.
• Non-monotonic reasoning, inductive logic, and default reasoning do not properly take uncertainty into account.
• Carnap's confirmation theory: Only for exchangeable data. Cannot confirm universal hypotheses.
• Data paradigm: Data may be more important than algorithms for "simple" problems, but a "lookup-table" AGI will not work.
• Eliminative induction ignores uncertainty and information theory.
Marcus Hutter - 18 - Foundations of Induction
Summary
The criticized approaches cannot serve as a general foundation of induction.
Conciliation
Of course most of the criticized approaches do work in their limited domains, and are trying to push their boundaries towards more generality.
And What Now?
Criticizing others is easy and in itself a bit pointless. The crucial question is whether there is something better out there. And indeed there is, which I will turn to now.
Marcus Hutter - 19 - Foundations of Induction
Universal Induction
Marcus Hutter - 20 - Foundations of Induction
Foundations of Universal Induction
• Ockham's razor (simplicity) principle: Entities should not be multiplied beyond necessity.
• Epicurus' principle of multiple explanations: If more than one theory is consistent with the observations, keep all theories.
• Bayes' rule for conditional probabilities: Given the prior belief/probability, one can predict all future probabilities.
• Turing's universal machine: Everything computable by a human using a fixed procedure can also be computed by a (universal) Turing machine.
• Kolmogorov's complexity: The complexity or information content of an object is the length of its shortest description on a universal Turing machine.
• Solomonoff's universal prior = Ockham + Epicurus + Bayes + Turing: Solves the question of how to choose the prior if nothing is known.
⇒ universal induction, formal Occam, AIT, MML, MDL, SRM, ...
Marcus Hutter - 21 - Foundations of Induction
Science ≈ Induction ≈ Occam's Razor
• Grue Emerald Paradox:
  Hypothesis 1: All emeralds are green.
  Hypothesis 2: All emeralds found until 2020 are green, thereafter all emeralds are blue.
• Which hypothesis is more plausible? H1! Justification?
• Occam's razor: take the simplest hypothesis consistent with the data. It is the most important principle in machine learning and science.
• Problem: How to quantify "simplicity"? Beauty? Elegance? Description length!
[The Grue problem goes much deeper. This is only half of the story.]
Marcus Hutter - 22 - Foundations of Induction
Turing Machines & Effective Enumeration
• Turing machine (TM) = (mathematical model of an) idealized computer. See e.g. textbook [HMU06].
• Instruction i: If the symbol on the tape under the head is 0/1, write 0/1/-, move the head left/right/not, and go to instruction/state j.
• {partial recursive functions} ≡ {functions computable with a TM}.
• A set of objects S = {o_1, o_2, o_3, ...} can be (effectively) enumerated :⇔ ∃ TM mapping i to ⟨o_i⟩, where ⟨·⟩ is some (often omitted) default coding of elements of S.
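The instruction format above is directly executable. Here is a minimal TM simulator; the "binary incrementer" machine is a hypothetical example chosen for brevity, not one from the talk:

```python
def run_tm(delta, tape, pos, state="start", blank="_", max_steps=10_000):
    """Run a TM given as delta[(state, read)] = (write, move, next_state)."""
    tape = dict(enumerate(tape))
    for _ in range(max_steps):
        if state == "halt":
            break
        write, move, state = delta[(state, tape.get(pos, blank))]
        tape[pos] = write
        pos += {"L": -1, "R": +1, "N": 0}[move]
    lo, hi = min(tape), max(tape)
    return "".join(tape.get(i, blank) for i in range(lo, hi + 1)).strip(blank)

# Example machine: increment a binary number; the head starts on the rightmost bit.
increment = {
    ("start", "1"): ("0", "L", "start"),  # 1 + carry = 0, carry moves left
    ("start", "0"): ("1", "N", "halt"),   # 0 + carry = 1, done
    ("start", "_"): ("1", "N", "halt"),   # carry creates a new leading digit
}

print(run_tm(increment, "1011", pos=3))   # -> "1100"
```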
Marcus Hutter - 23 - Foundations of Induction
Information Theory & Kolmogorov Complexity
• Quantification/interpretation of Occam's razor:
• The shortest description of an object is its best explanation.
• The shortest program for a string on a Turing machine T leads to the best extrapolation = prediction:
  K_T(x) = min_p { ℓ(p) : T(p) = x }
• Prediction is best for a natural universal Turing machine U:
  Kolmogorov complexity K(x) = K_U(x) ≤ K_T(x) + c_T
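K(x) is incomputable, but any real compressor yields a computable upper bound: the compressed string is a description that a fixed decompressor (playing the role of T) expands back to x. A sketch using zlib, an arbitrary choice of compressor:

```python
import random
import zlib

def K_upper(x: bytes) -> int:
    """Upper bound on K_T(x) with T = the zlib decompressor (up to +c_T)."""
    return len(zlib.compress(x, 9))

regular = b"0" * 1000
random.seed(0)
noise = bytes(random.getrandbits(8) for _ in range(1000))

print(K_upper(regular))  # a handful of bytes: highly regular, low complexity
print(K_upper(noise))    # ~1000 bytes: incompressible, high complexity
```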
Marcus Hutter - 24 - Foundations of Induction
Bayesian Probability Theory
• Given (1): Models P(D|H_i) for the probability of observing data D when H_i is true.
• Given (2): Prior probability over hypotheses, P(H_i).
• Goal: Posterior probability P(H_i|D) of H_i after having seen data D.
• Solution is given by Bayes' rule: P(H_i|D) = P(D|H_i)·P(H_i) / Σ_i P(D|H_i)·P(H_i)
• (1) The models P(D|H_i) are usually easy to describe (objective probabilities).
• (2) But Bayesian probability theory does not tell us how to choose the prior P(H_i) (subjective probabilities).
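A worked instance with hypothetical numbers (two coin hypotheses and hand-picked data, purely for illustration):

```python
from math import comb

# D = 8 heads in 10 flips; H1 = fair coin (p = 0.5), H2 = biased coin (p = 0.8)
prior = {"fair": 0.5, "biased": 0.5}
likelihood = {
    "fair":   comb(10, 8) * 0.5**8 * 0.5**2,   # P(D|H1)
    "biased": comb(10, 8) * 0.8**8 * 0.2**2,   # P(D|H2)
}
evidence = sum(likelihood[h] * prior[h] for h in prior)   # sum_i P(D|H_i) P(H_i)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}
print(posterior)   # approx. {'fair': 0.127, 'biased': 0.873}
```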
Marcus Hutter - 25 - Foundations of Induction
Algorithmic Probability Theory
• Epicurus: If more than one theory is consistent with the observations, keep all theories.
• ⇒ uniform prior over all H_i?
• Refinement with Occam's razor, quantified in terms of Kolmogorov complexity: P(H_i) := w_{H_i} := 2^(−K_{T/U}(H_i))
• Fixing T, we have a complete theory for prediction. Problem: How to choose T?
• Choosing U, we have a universal theory for prediction. Observation: The particular choice of U does not matter much. Problem: Incomputable.
Marcus Hutter - 26 - Foundations of Induction
Inductive Inference & Universal Forecasting
• Solomonoff combined Occam, Epicurus, Bayes, and Turing into one formal theory of sequential prediction.
• M(x) = probability that a universal Turing machine outputs x when provided with fair coin flips on the input tape.
• A posteriori probability of y given x: M(y|x) = M(xy)/M(x).
• Given x_1, ..., x_{t−1}, the probability of x_t is M(x_t|x_1...x_{t−1}).
• Immediate "applications":
  - Weather forecasting: x_t ∈ {sun, rain}.
  - Stock-market prediction: x_t ∈ {bear, bull}.
  - Continuing number sequences in an IQ test: x_t ∈ ℕ.
• Optimal universal inductive reasoning system!
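M itself is incomputable, but its structure, a Bayes mixture over computable environments with 2^(-complexity) weights, can be mimicked in miniature. The sketch below replaces "all programs" by a small class of Bernoulli coins and uses a crude description-length proxy as prior; every ingredient is an illustrative assumption, not the real M:

```python
# Toy "Solomonoff-style" mixture: hypotheses nu = Bernoulli(theta),
# prior weight w_nu ~ 2^-(description length of theta), prediction =
# posterior-weighted average, mimicking M(x_t | x_<t).
thetas = [k / 16 for k in range(1, 16)]
prior = {t: 2.0 ** -len(format(int(t * 16), "b")) for t in thetas}  # crude 2^-K proxy

def mixture_prob_of_one(history):
    ones = sum(history)
    zeros = len(history) - ones
    post = {t: prior[t] * t**ones * (1 - t) ** zeros for t in thetas}
    z = sum(post.values())
    return sum(post[t] * t for t in thetas) / z   # P(next symbol = 1 | history)

seq = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0]
for n in range(len(seq) + 1):
    print(n, round(mixture_prob_of_one(seq[:n]), 3))  # rises toward the empirical frequency of 1s
```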
Marcus Hutter - 27 - Foundations of Induction
Some Prediction Bounds for M
M(x) = universal distribution; μ(x) = unknown true computable distribution (no i.i.d. or any other assumptions); instantaneous error h_n := Σ_{x_n} (M(x_n|x_<n) − μ(x_n|x_<n))².
• Total bound: Σ_{n=1}^∞ E[h_n] ≤ K(μ) ln 2, which implies convergence: M(x_n|x_<n) → μ(x_n|x_<n) with μ-probability 1.
• Instantaneous i.i.d. bounds: For i.i.d. model classes with continuous, discrete, and universal prior, respectively: E[h_n] ≲ (1/n)·ln w(μ)^(−1) and E[h_n] ≲ (1/n)·ln w_μ^(−1) = (1/n)·K(μ) ln 2, where ≲ means ≤ up to a multiplicative constant.
• Bounds for computable environments: M(x_t|x_<t) → 1 rapidly on every computable sequence x_{1:∞} (whichsoever, e.g. 1^∞ or the digits of π or e), i.e. M quickly recognizes the structure of the sequence.
• Weak instantaneous bounds: valid for all n and all x_{1:n} and x̄_n ≠ x_n: 2^(−K(n)) ≲ M(x̄_n|x_<n) ≲ 2^(2K(x_{1:n}*)−K(n)).
• Magic instance numbers: e.g. M(0|1^n) ≈ 2^(−K(n)) → 0, but spikes up for simple n. M is cautious at magic instance numbers n.
• Future bounds / errors to come: If our past observations ω_{1:n} contain a lot of information about μ, we make few errors in the future: Σ_{t=n+1}^∞ E[h_t|ω_{1:n}] ≤ [K(μ|ω_{1:n}) + K(n)] ln 2, up to an additive constant.
Marcus Hutter - 28 - Foundations of Induction
Some Other Properties of M
• Problem of zero prior / confirmation of universal hypotheses:
  P[All ravens black | n black ravens] ≡ 0 in the Bayes-Laplace model, but → 1 fast for the universal prior w^U_θ.
• Reparametrization and regrouping invariance: w^U_θ = 2^(−K(θ)) always exists and is invariant w.r.t. all computable reparametrizations f. (The Jeffreys prior is invariant only w.r.t. bijections, and does not always exist.)
• The problem of old evidence: No risk of biasing the prior towards past data, since w^U_θ is fixed and independent of the model class M.
• The problem of new theories: Updating of M is not necessary, since the universal class M_U already includes all (computable) theories.
• M predicts better than all other mixture predictors based on any (continuous or discrete) model class and prior, even in non-computable environments.
Marcus Hutter - 29 - Foundations of Induction
More Stuff / Critique / Problems
• Prior knowledge y can be incorporated by using the "subjective" prior w^U_{ν|y} = 2^(−K(ν|y)) or by prefixing the observation x by y.
• Additive/multiplicative constant fudges and the U-dependence are often (but not always) harmless.
• Incomputability: K and M can serve as "gold standards" which practitioners should aim at, but they have to be (crudely) approximated in practice (MDL [Ris89], MML [Wal05], LZW [LZ76], CTW [WSTT95], NCD [CV05]).
Marcus Hutter - 30 - Foundations of Induction
Universal Artificial Intelligence
Marcus Hutter - 31 - Foundations of Induction
Induction → Prediction → Decision → Action
Having or acquiring or learning or inducing a model of the environment an agent interacts with allows the agent to make predictions and utilize them in its decision process of finding a good next action.
Induction infers general models from specific observations/facts/data, usually exhibiting regularities or properties or relations in the latter.
Example:
• Induction: Find a model of the world economy.
• Prediction: Use the model for predicting the future stock market.
• Decision: Decide whether to invest assets in stocks or bonds.
• Action: Trading large quantities of stocks influences the market.
Marcus Hutter - 32 - Foundations of Induction
Sequential Decision Theory
Setup: For t = 1, 2, 3, 4, ...:
(1) given the sequence x_1, x_2, ..., x_{t−1}, predict/make decision y_t;
(2) observe x_t;
(3) suffer loss Loss(x_t, y_t);
(4) t → t + 1, go to (1).
Goal: Minimize expected loss.
Greedy minimization of expected loss is optimal if:
• the decision y_t does not influence the environment (future observations), and
• the loss function is known.
Problem: Expectation with respect to what?
Solution: With respect to the universal distribution M, if the true distribution is unknown (see the sketch below).
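A minimal sketch of this loop, with the Laplace rule standing in for M and a 0-1 loss; the observation stream is made up for illustration:

```python
def predictive(history):
    """Laplace rule P(next = 1 | history): a computable stand-in for M."""
    return (sum(history) + 1) / (len(history) + 2)

def loss(x, y):
    return int(x != y)   # 0-1 loss

stream = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]   # hypothetical environment
history, total = [], 0
for x_t in stream:
    p1 = predictive(history)
    # greedy decision: minimize expected loss under the predictive distribution
    y_t = min((0, 1), key=lambda y: p1 * loss(1, y) + (1 - p1) * loss(0, y))
    total += loss(x_t, y_t)
    history.append(x_t)
print("cumulative loss:", total)
```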
Marcus Hutter - 33 - Foundations of Induction
Agent Model with Reward
If actions/decisions a influence the environment:
[Diagram: an Agent and an Environment, each with its own work tape, coupled in a loop. The agent emits actions a1 a2 a3 a4 ...; the environment returns reward/observation pairs r1|o1 r2|o2 r3|o3 r4|o4 ...]
Marcus Hutter - 34 - Foundations of Induction
Universal Artificial Intelligence
Key idea: Optimal action/plan/policy based on the simplest world model consistent with history. Formally:
AIXI: a_k := arg max_{a_k} Σ_{o_k r_k} ... max_{a_m} Σ_{o_m r_m} [r_k + ... + r_m] · Σ_{p : U(p, a_1..a_m) = o_1 r_1..o_m r_m} 2^(−length(p))
where k = now, a = action, o = observation, r = reward, U = universal TM, p = program, m = lifespan.
AIXI is an elegant, complete, essentially unique, and limit-computable mathematical theory of AI.
Claim: AIXI is the most intelligent environment-independent, i.e. universally optimal, agent possible.
Proof: For formalizations, quantifications, and proofs see [Hut05].
Problem: Computationally intractable.
Achievement: Well-defines AI. A gold standard to aim at. Has inspired practical algorithms. Cf. infeasible exact minimax. [H'00-05]
Marcus Hutter - 35 - Foundations of Induction
Approximations & Applications
Marcus Hutter - 36 - Foundations of Induction
The Minimum Description Length Principle
• Approximation of Solomonoff, since M is incomputable:
• M(x) ≈ 2^(−K_U(x)) (quite good)
• K_U(x) ≈ K_T(x) (very crude)
• Predicting the y of highest M(y|x) is then approximately the same as
• MDL: Predict the y of smallest K_T(xy).
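With a real compressor in place of K_T, the MDL prediction rule is a few lines. zlib is an arbitrary stand-in, and at this crude granularity ties between candidate continuations are possible (min then falls back to alphabet order):

```python
import zlib

def mdl_predict(x: bytes, alphabet: bytes = b"01") -> bytes:
    """MDL rule: pick the continuation y minimizing the code length of xy."""
    return min((bytes([y]) for y in alphabet),
               key=lambda y: len(zlib.compress(x + y, 9)))

x = b"0" * 200
print(mdl_predict(x))   # b"0": continuing the run compresses better than breaking it
```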
Marcus Hutter - 37 - Foundations of Induction
Universal Clustering
• Question: When is object x similar to object y?
• Universal solution: x is similar to y ⇔ x can be easily (re)constructed from y ⇔ K(x|y) := min_p { ℓ(p) : U(p, y) = x } is small.
• Universal similarity: symmetrize and normalize K(x|y).
• Normalized compression distance: approximate K by K_T.
• Practice: For T, choose a (de)compressor like lzw or gzip or bzip(2).
• Multiple objects ⇒ similarity matrix ⇒ similarity tree.
• Applications: Completely automatic reconstruction (a) of the evolutionary tree of 24 mammals based on complete mtDNA, (b) of the classification tree of 52 languages based on the declaration of human rights, and (c) many others. [CilibrasiVitanyi'05]
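The resulting distance takes a few lines of code. A sketch with zlib as the compressor and made-up strings instead of mtDNA, using NCD(x,y) = (C(xy) − min(C(x),C(y))) / max(C(x),C(y)) from Cilibrasi & Vitányi:

```python
import zlib

def C(x: bytes) -> int:
    return len(zlib.compress(x, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar, near 1 for unrelated."""
    return (C(x + y) - min(C(x), C(y))) / max(C(x), C(y))

s1 = b"the quick brown fox jumps over the lazy dog " * 20
s2 = b"the quick brown fox leaps over the lazy cat " * 20
s3 = bytes(range(256)) * 4   # unrelated, nearly incompressible

print(round(ncd(s1, s2), 3))  # small: the two texts share most of their structure
print(round(ncd(s1, s3), 3))  # close to 1: nothing to reuse across them
```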
Marcus Hutter - 38 - Foundations of Induction
Universal Search
• Levin search: The fastest algorithm for inversion and optimization problems.
• Theoretical application: Assume somebody found a non-constructive proof of P=NP; then Levin search is a polynomial-time algorithm for every NP(-complete) problem.
• Practical (OOPS) applications (J. Schmidhuber): maze, Towers of Hanoi, robotics, ...
• FastPrg: The asymptotically fastest and shortest algorithm for all well-defined problems.
• Computable approximations of AIXI: AIXItl, AIξ, MC-AIXI-CTW, and ΦMDP.
• Human Knowledge Compression Prize: 50,000€
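The scheduling idea behind Levin search, giving a program p a runtime budget proportional to 2^(−ℓ(p)), can be shown on a toy program space. Everything below (the two-instruction language, the target, the phase-doubling loop) is an illustrative assumption, not Levin's original formulation over a universal machine:

```python
from itertools import product

OPS = {"i": lambda n: n + 1, "d": lambda n: 2 * n}   # toy instruction set

def execute(prog, x, budget):
    """Run prog on input x; give up (return None) after `budget` steps."""
    for steps, op in enumerate(prog, 1):
        if steps > budget:
            return None
        x = OPS[op](x)
    return x

def levin_search(x, y, max_phase=20):
    # Phase i: every program p with len(p) <= i gets 2^(i - len(p)) steps,
    # so shorter programs receive exponentially more time: Ockham built in.
    for i in range(1, max_phase + 1):
        for length in range(0, i + 1):
            budget = 2 ** (i - length)
            for prog in product("id", repeat=length):
                if execute(prog, x, budget) == y:
                    return "".join(prog)
    return None

print(levin_search(5, 22))   # "did": 5 -> 10 -> 11 -> 22
```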
Marcus Hutter - 39 - Foundations of Induction
A Monte-Carlo AIXI Approximation
Based on Upper Confidence Tree (UCT) search for planning and Context Tree Weighting (CTW) compression for learning. [VNHUS'09-11]
[Plot "Learning Scalability": normalized average reward per trial (scaled so the optimum is 1) versus experience (100 to 1,000,000 interaction cycles, log scale), for the domains Tiger, Extended Tiger, 4x4 Grid, 1d Maze, TicTacToe, Cheese Maze, and Pocman.]
Marcus Hutter - 40 - Foundations of Induction
Conclusion
Marcus Hutter - 41 - Foundations of Induction
Summary
• Conceptually and mathematically, the problem of induction is solved.
• Computational problems and some philosophical questions remain.
• Ingredients for induction and prediction: Ockham, Epicurus, Turing, Bayes, Kolmogorov, Solomonoff.
• For decisions and actions: include Bellman.
• Mathematical results: consistency, bounds, optimality, and many others.
• Most philosophical riddles around induction are solved.
• Experimental results via practical compressors.
Induction ≈ Science ≈ Machine Learning ≈ Ockham's razor ≈ Compression ≈ Intelligence.
Marcus Hutter - 42 - Foundations of Induction
Advice
• Accept Universal Induction (UI) as the best conceptual solution of the induction problem so far.
• Stand on the shoulders of giants like Shannon, Bayes, Turing, Kolmogorov, Solomonoff, Wallace, Rissanen, Bellman.
• Work out defects / what is missing, and try to improve, or
• Work on alternatives, but then benchmark your approach against state-of-the-art UI.
• Cranks who have not understood the giants and try to reinvent the wheel from scratch can safely be ignored.
"Never trust a theory if it is not supported by an experiment" (and vice versa).
Marcus Hutter - 43 - Foundations of Induction
When It's OK to Ignore UI
• if your pursued approach already works sufficiently well
• if your problem is simple enough (e.g. i.i.d.)
• if you do not care about a principled/sound solution
• if you're happy to succeed by trial-and-error (with restrictions)
Information Theory
• Information theory plays an even more significant role for induction than this presentation might suggest.
• Algorithmic Information Theory is superior to Shannon information.
• There are AIT versions that even capture meaningful information.
Marcus Hutter - 44 - Foundations of Induction
Outlook
• Use compression size as a general performance measure (like perplexity is used in speech).
• Via the code-length view, many approaches become comparable, and may be regarded as approximations to UI.
• This should lead to better compression algorithms, which in turn should lead to better learning algorithms.
• Address open problems in induction within the UI framework.
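A sketch of the code-length view as a performance measure: a model's score on data x_1..x_n is the number of bits −log2 P(x_1..x_n) it assigns, so any probabilistic predictor becomes comparable to any other (the two predictors and the data below are illustrative):

```python
from math import log2

def code_length(seq, predictor):
    """Bits needed to code seq (via arithmetic coding) under `predictor`."""
    bits, history = 0.0, []
    for x in seq:
        p1 = predictor(history)                # P(next = 1 | history)
        bits += -log2(p1 if x == 1 else 1 - p1)
        history.append(x)
    return bits

uniform = lambda h: 0.5                          # memoryless fair-coin model
laplace = lambda h: (sum(h) + 1) / (len(h) + 2)  # adaptive Laplace rule

data = [1] * 90 + [0] * 10
print(code_length(data, uniform))   # exactly 100 bits
print(code_length(data, laplace))   # far fewer bits: the better "compressor"
```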
Marcus Hutter - 45 - Foundations of Induction
Thanks! Questions? Details:
[RH11] S. Rathmanner and M. Hutter. A philosophical treatise of universal induction. Entropy, 13(6):1076–1136, 2011.
[Hut07] M. Hutter. On universal prediction and Bayesian confirmation. Theoretical Computer Science, 384(1):33–48, 2007.
[LH07] S. Legg and M. Hutter. Universal intelligence: A definition of machine intelligence. Minds and Machines, 17(4):391–444, 2007.
[Hut05] M. Hutter. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin, 2005.
[GS03] P. Godfrey-Smith. Theory and Reality: An Introduction to the Philosophy of Science. University of Chicago Press, 2003.