Machine Learning: Some theoretical and practical problems

Olivier Bousquet

Journées MAS, Lille, 2006




Outline



1. Framework
2. Theoretical Results
3. Consequences
4. Practical Implications




The Setting


Prediction problems: after observing example pairs (X, Y), build a function g : X → Y that predicts well, i.e. g(X) ≈ Y.
    The typical setting is statistical (the data are assumed to be sampled i.i.d.).
    Another setting is online adversarial (no assumption on the data generation mechanism).
Goal: find the best algorithm.
    Theoretical answer: fundamental limits of learning.
    Practical answer: guidelines for algorithm design.




Definitions

We consider the classification setting: Y = {0, 1}, with data sampled i.i.d.

    A rule (or learning algorithm) is a mapping gn : (X × Y)^n × X → Y.
    Sample: Sn = {(X1, Y1), . . . , (Xn, Yn)}
    Misclassification error: L(g) = P(g(X) ≠ Y) (conditional on the sample)
    Bayes error: the best possible error, L∗ = inf_g L(g), with the infimum over all measurable functions
    Sequence of classification rules {gn}: defined for any sample size (algorithms are usually defined this way, possibly with a sample-size-dependent parameter)
    Consistency: lim_{n→∞} E L(gn) = L∗
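To make these definitions concrete, here is a minimal numerical sketch (Python with NumPy; the finite X, the marginal p, the η, and the rule g are illustrative assumptions, not from the talk). For a known η(x) = P(Y = 1|X = x), the Bayes error is L∗ = E[min(η(X), 1 − η(X))], and L(g) ≥ L∗ for any fixed classifier g.

```python
import numpy as np

# Illustrative distribution on X = {0, ..., 9} (an assumption for this sketch):
# marginal p(x) and regression function eta(x) = P(Y = 1 | X = x).
p = np.full(10, 0.1)
eta = np.linspace(0.05, 0.95, 10)

# Bayes error: L* = E[min(eta(X), 1 - eta(X))], attained by the
# Bayes classifier g*(x) = 1{eta(x) > 1/2}.
L_star = np.sum(p * np.minimum(eta, 1 - eta))

# Misclassification error of a fixed classifier g:
# L(g) = P(g(X) != Y) = sum_x p(x) * [eta(x) if g(x) = 0 else 1 - eta(x)].
g = (np.arange(10) >= 7).astype(int)   # some suboptimal threshold rule
L_g = np.sum(p * np.where(g == 1, 1 - eta, eta))

print(f"L(g) = {L_g:.3f} >= L* = {L_star:.3f}")
```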


Consistency


How to build a consistent sequence of rules?
    Countable X: very easy, just wait! Eventually every point with non-zero probability is observed an unbounded number of times (i.e. take a majority vote over the observed x, and predict at random on unobserved ones).
    Uncountable X: the observed sample has measure zero (for non-atomic measures), so this trick does not work. Instead, take a local majority vote under two conditions: more and more local, but also more and more points averaged.
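A minimal sketch of the "just wait" rule on a finite X (Python; the space size, the η, and the Monte Carlo sizes are illustrative assumptions): majority vote on observed points, random prediction elsewhere. As n grows, the estimated E L(gn) approaches L∗.

```python
import numpy as np

rng = np.random.default_rng(1)

# Countable (here: finite) X with known eta(x) = P(Y = 1 | X = x);
# the space size and eta are illustrative assumptions.
K = 20
p = np.full(K, 1.0 / K)
eta = rng.random(K)
L_star = np.sum(p * np.minimum(eta, 1 - eta))

def sample(n):
    x = rng.choice(K, size=n, p=p)
    y = (rng.random(n) < eta[x]).astype(int)
    return x, y

def majority_vote_rule(x_tr, y_tr):
    """'Just wait': majority vote over observed x, random prediction
    on x values never observed in the sample."""
    ones = np.bincount(x_tr, weights=y_tr, minlength=K)
    total = np.bincount(x_tr, minlength=K)
    g = rng.integers(0, 2, size=K)            # default: random guess
    seen = total > 0
    g[seen] = (2 * ones[seen] > total[seen]).astype(int)
    return g

for n in [10, 100, 1000, 10000]:
    risks = []
    for _ in range(200):
        g = majority_vote_rule(*sample(n))
        # exact risk of g: L(g) = sum_x p(x) * P(Y != g(x) | X = x)
        risks.append(np.sum(p * np.where(g == 1, 1 - eta, eta)))
    print(f"n={n:6d}  E L(g_n) ~ {np.mean(risks):.3f}   (L* = {L_star:.3f})")
```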




Consistency of Histograms

Histogram in R^d: cubic cells of size hn; the prediction is constant over each cell (majority vote).
    hn → 0 and n·hn^d → ∞ are enough for universal consistency.
    Idea of the proof:
        Continuous functions with bounded support are dense in Lp(ν).
        Such functions are uniformly continuous and can thus be approximated by histograms (the average of the function over each cell), provided the cell size goes to 0.
        Since cells will contain more and more points (the second condition), the cell value will eventually converge to the average over the cell.
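A sketch of the histogram rule (Python; the toy distribution in R², the label-noise level, and the choice hn = n^(−1/4) are illustrative assumptions; with d = 2 this gives n·hn² = √n → ∞): the test error should approach the Bayes error, here 0.1.

```python
import numpy as np

rng = np.random.default_rng(2)

def histogram_classifier(X_tr, y_tr, h):
    """Histogram rule in R^d: cubic cells of side h, majority vote per cell.
    Returns a predict function; empty cells default to class 0."""
    votes = {}
    for k, y in zip(np.floor(X_tr / h).astype(int), y_tr):
        ones, total = votes.get(tuple(k), (0, 0))
        votes[tuple(k)] = (ones + y, total + 1)

    def predict(X):
        out = np.zeros(len(X), dtype=int)
        for i, k in enumerate(np.floor(X / h).astype(int)):
            ones, total = votes.get(tuple(k), (0, 0))
            out[i] = int(2 * ones > total)   # majority vote (empty cell -> 0)
        return out

    return predict

# Toy problem in R^2 (illustrative): Y = 1 inside the unit disc, plus label noise.
def sample(n):
    X = rng.uniform(-2, 2, size=(n, 2))
    y = (np.sum(X**2, axis=1) < 1).astype(int)
    flip = rng.random(n) < 0.1               # Bayes error = 0.1 here
    return X, np.where(flip, 1 - y, y)

# h_n -> 0 with n * h_n^d -> infinity, e.g. h_n = n^(-1/4) in d = 2.
for n in [100, 1000, 10000, 100000]:
    h = n ** (-0.25)
    predict = histogram_classifier(*sample(n), h)
    X_te, y_te = sample(50_000)
    print(f"n={n:7d} h={h:.3f}  test error = {np.mean(predict(X_te) != y_te):.3f}")
```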



No Free Lunch



We can "learn" anything. Is the problem solved?
The question becomes: among the consistent algorithms, which one is the best?
We consider here the special case of classification; similar phenomena occur for regression and density estimation.

Unfortunately, there is no free lunch.




No Free Lunch 1


Out-of-sample error: L′(gn) = P(gn(X) ≠ Y | X ∉ Sn)
    Consider a uniform probability distribution µ over problems, i.e. for all x, Eµ P(Y = 1|X = x) = Eµ P(Y = 0|X = x).
    Then all classifiers have the same average error.

Theorem (Wolpert96)
For any classification rule gn,

    Eµ E L′(gn) = 1/2
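A Monte Carlo illustration of NFL1 (Python; the finite X, the sample size, and the majority-vote rule are illustrative assumptions, and the theorem says any rule substituted here gives the same average): drawing the target labeling uniformly at random and measuring error off the sample averages out to 1/2.

```python
import numpy as np

rng = np.random.default_rng(3)

K, n, trials = 50, 20, 2000    # |X| = K, sample size n (illustrative sizes)
total_err = 0.0
for _ in range(trials):
    # Draw the "problem" uniformly at random: a deterministic labeling of X.
    target = rng.integers(0, 2, size=K)
    x_tr = rng.integers(0, K, size=n)
    y_tr = target[x_tr]
    # Some learning rule (majority vote with a fixed default on unseen x);
    # by the theorem, any other rule would give the same average.
    ones = np.bincount(x_tr, weights=y_tr, minlength=K)
    total = np.bincount(x_tr, minlength=K)
    g = (2 * ones > total).astype(int)
    # Off-sample error: evaluate only on x values not in the training sample.
    off = np.setdiff1d(np.arange(K), x_tr)
    total_err += np.mean(g[off] != target[off])

print(f"average off-sample error over random problems: {total_err / trials:.3f}")
```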




No Free Lunch 2


A consequence of NFL1 is that there are always cases where an algorithm can be beaten.
A stronger version of NFL1: No Super Classifier.

Theorem (DGL96)
For every sequence of classification rules {gn} there is a universally consistent sequence {g′n} such that, for some distribution,

    L(gn) > L(g′n)

for all n.




No Free Lunch 3



A variation of NFL1: arbitrarily bad error at any fixed sample size.

Theorem (Devroye82)
Fix an ε > 0. For any integer n and classification rule gn, there exists a distribution of (X, Y) with Bayes risk L∗ = 0 such that

    E L(gn) ≥ 1/2 − ε




No Free Lunch 4


NFL3 possibly uses a different distribution for each n.
What happens for a fixed distribution as n increases? The slow-rate phenomenon.

Theorem (Devroye82)
Let {an} be a sequence of positive numbers converging to zero with 1/16 ≥ a1 ≥ a2 ≥ . . .. For every sequence of classification rules, there exists a distribution of (X, Y) with Bayes risk L∗ = 0 such that

    E L(gn) ≥ an

for all n.



Proofs



The idea is to create a "bad" distribution.
It turns out that random ones are bad enough: just create a problem with no structure (the prediction at x unrelated to the prediction at x′).
All the proofs work on finite (for fixed n) or countable (for varying n) spaces; there is no need to introduce an uncountable X.
The trick is to make sure that enough points have not been observed yet (on those, the error will be 1/2).
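A sketch of the counting behind this trick (Python; the atom counts are illustrative assumptions): with X uniform on K atoms and K much larger than n, the expected probability mass never observed in the sample is (1 − 1/K)^n, and a random labeling forces error 1/2 on that mass.

```python
import numpy as np

# Proof device (sketch): put X uniform on K atoms with K >> n, so most
# mass is unseen; a random labeling then forces error ~ 1/2 there.
# Expected unseen mass after n i.i.d. draws: K * (1/K) * (1 - 1/K)^n.
for n in [100, 1000]:
    for K in [10 * n, 100 * n]:
        unseen = (1 - 1 / K) ** n
        print(f"n={n:5d} K={K:7d}  E[unseen mass] = {unseen:.3f}"
              f"  -> E L(g_n) >= {unseen / 2:.3f} under a random labeling")
```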




A closer look at consistency



Consider the trivially consistent rule for a countable space (majority vote).
Its error decreases with increasing sample size:

    ∀n, E L(gn) ≥ E L(gn+1)

Is this true in general for universally consistent rules?




Smart rules
Consistency for uncountable spaces is not so trivial.

Definition
A sequence {gn} of classification rules is smart if, for any distribution and any integer n,

    E L(gn) ≥ E L(gn+1)

For uncountable spaces, some of the known universally consistent rules can be shown to be non-smart.
Conjecture: on R^d, no smart rule is universally consistent.
Interpretation: consistency on uncountable spaces requires adapting the degree of smoothness to the sample size; this means there will be some point for which the degree of smoothness is too large.
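One can at least probe the smartness of a given rule empirically. Below is a rough Monte Carlo sketch (Python; the 1-NN rule, the toy η on [0, 1], and all sizes are illustrative assumptions): it estimates the curve n ↦ E L(gn), though noisy estimates can only suggest its shape, never prove monotonicity.

```python
import numpy as np

rng = np.random.default_rng(4)

def eta(x):
    # Illustrative regression function on [0, 1]: eta(x) = P(Y = 1 | X = x).
    return 0.5 + 0.4 * np.sin(6 * x)

def estimate_risk_1nn(n, trials=300, m=2000):
    """Monte Carlo estimate of E L(g_n) for the 1-NN rule (noisy estimate)."""
    errs = []
    for _ in range(trials):
        x = np.sort(rng.random(n))
        y = (rng.random(n) < eta(x)).astype(int)
        xt = rng.random(m)
        yt = (rng.random(m) < eta(xt)).astype(int)
        # nearest training point for each test point (x is sorted)
        idx = np.clip(np.searchsorted(x, xt), 1, n - 1)
        nearest = np.where(xt - x[idx - 1] < x[idx] - xt, idx - 1, idx)
        errs.append(np.mean(y[nearest] != yt))
    return float(np.mean(errs))

for n in [5, 10, 20, 40, 80, 160]:
    print(f"n={n:4d}  E L(g_n) ~ {estimate_risk_1nn(n):.3f}")
```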

Anti-learning


The average error is 1/2, so there are problems for which the error is much worse than random guessing!
    One can indeed construct distributions for which some standard algorithms have E L(gn) arbitrarily close to 1, even with L∗ = 0!
    Of course, this occurs at a fixed sample size.
    Can one always do that (for any rule)?
    The problem must have structure, but structure opposite to what the algorithm prefers.




Bayes Error Estimation

Assume we just want to estimate L∗.
    Of course, we could use any universally consistent algorithm and estimate its error. But we get slow rates!
    Is there a better way?

Theorem (DGL96)
For every n, for any estimate L̂n of the Bayes error L∗, and for every ε > 0, there exists a distribution of (X, Y) such that

    E |L̂n − L∗| ≥ 1/4 − ε

Estimating this single number does not seem easier than estimating the whole set {x : P(Y = 1|x) > 1/2}.


What can we hope to prove?


Our framework is too general! Nothing interesting can be said about learning algorithms in full generality.
Can we prove something interesting under slightly more restrictive assumptions?
Are the distributions used to prove the NFL results pathological? (NFL4 holds even within classes of "reasonable" distributions!)
If we can characterize which problems actually occur in real life, we can hope to derive appropriate algorithms (optimal on this class of problems).




The Bayesian Way

Assume something about how the data is generated.
Consider an algorithm specifically tuned to this property.
Prove that under this assumption the algorithm does well.

Most results go in this direction (sometimes in a subtle way):
    Bayesian algorithms
    Most minimax results are of this form:

        inf_{gn} sup_{P∈𝒫} [ L(gn) − inf_g L(g) ]

This seems reasonable and useful for understanding, but it does not provide guarantees.

The Worst Case Way



Assume nothing about the data (distribution-free).
Restrict your objectives.
Derive an algorithm that reaches this objective no matter what the data is:

    inf_{gn} sup_P [ L(gn) − inf_{g∈𝒢} L(g) ]

This gives guarantees.

In between the two: adaptation.




Does this help practically?



We can probably come up with algorithms that work well on most real-world problems.
If we have a characterization of these problems, we can even prove something about such algorithms.
However, there is no guarantee that a new problem will satisfy this characterization.
So there cannot be a formal proof that an algorithm is good or bad.




If theory cannot help, what can we do?




It is essentially a matter of finding an algorithm that implements the right notion of smoothness for the problem at hand.
More an art than a science!




Priors



Algorithm design is composed of two steps:
    Choosing a preference. This first step is based on knowledge of the problem; this is where guidance (but no theory) is needed.
    Exploiting it for inference. The second step can possibly be formalized (optimality with respect to the assumptions). The main issue is computational cost.




Why can algorithms fail in practice?

          1   Data representation (inappropriate features, errors, ...)
          2   Data scarcity (not enough samples)
          3   Data overload (too many variables, too much noise)
          4   Lack of understanding of the result (making validation
              impossible) / lack of validation data

      Examples (a leakage-screening sketch follows below)
              Forgetting to remove the output variable (or a version of it):
              the algorithm picks it up
              An irrelevant variable happens to be discriminative (e.g. the
              date of sample collection)
              An error in a measurement (a misalignment in the database)

                                 Olivier Bousquet       Machine Learning: Some theoretical and practical problems
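
A minimal sketch of the first failure mode, the leaked output variable (not from the talk; the data, column names, and the 0.95 threshold are invented for illustration). A column that is a thinly disguised copy of the label will dominate any reasonable learner, so a cheap pre-training screen on feature-target correlation is worth running.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Two genuinely informative but noisy features, plus one "leaked" column
# that is essentially a copy of the label (e.g. a derived output variable
# that was forgotten in the table).
y = rng.integers(0, 2, size=n)
honest_1 = y + 1.5 * rng.standard_normal(n)
honest_2 = y + 2.0 * rng.standard_normal(n)
leaked = y + 0.01 * rng.standard_normal(n)
X = np.column_stack([honest_1, honest_2, leaked])
names = ["honest_1", "honest_2", "leaked"]

# Screen: absolute correlation of each feature with the target.
# A value near 1.0 is usually too good to be true and worth auditing.
for name, col in zip(names, X.T):
    r = np.corrcoef(col, y)[0, 1]
    flag = "  <-- suspicious, audit this column" if abs(r) > 0.95 else ""
    print(f"{name:9s} |corr with y| = {abs(r):.3f}{flag}")
```

The same screen also surfaces the second failure mode (an irrelevant but discriminative variable such as a collection date), though that case needs a human to judge whether the association is legitimate.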
Outline      Framework       Theoretical Results             Consequences              Practical Implications




So, what would be helpful?

          Flexible ways to incorporate knowledge/expertise
               Provide tools that make it natural to formulate prior
               knowledge
               Look for other types of prior assumptions that occur across
               problems (e.g. manifold structure, cluster structure,
               analogy...)
          Ability to understand what the algorithm finds (we need a
          language for interacting with experts)
               Investigate how to improve understandability (simpler models,
               separate models and languages for interaction...)
               Improve interaction (understand the user's intent)
          Computationally efficient algorithms
               Scalability, anytime behavior
               Incorporate time complexity into the theoretical analysis,
               trading complexity for accuracy (see the sketch below)

                            Olivier Bousquet       Machine Learning: Some theoretical and practical problems
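
A minimal sketch of the complexity-for-accuracy trade (not from the talk; the data, step size, and epoch budget are invented). Plain stochastic gradient descent on the logistic loss is a natural anytime algorithm: the current iterate is usable after any budget of passes, and each extra pass buys accuracy.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 2000, 10

# Synthetic linear classification problem.
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (X @ w_true + 0.5 * rng.standard_normal(n) > 0).astype(float)

def logistic_loss(w):
    # mean of log(1 + exp(-s * <w, x>)) with s in {-1, +1}, computed stably.
    s = 2 * y - 1
    return np.mean(np.logaddexp(0.0, -s * (X @ w)))

# Plain SGD: stop after any number of passes and keep the current iterate;
# more compute (more passes) buys a lower loss.
w = np.zeros(d)
step = 0.1
for epoch in range(1, 6):
    for i in rng.permutation(n):
        z = np.clip(X[i] @ w, -30.0, 30.0)
        s = 2 * y[i] - 1
        grad = -s * X[i] / (1.0 + np.exp(s * z))  # per-example logistic gradient
        w -= step * grad
    print(f"after {epoch} pass(es): loss = {logistic_loss(w):.4f}")
```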
Outline      Framework       Theoretical Results             Consequences              Practical Implications




References



          L. Devroye: Necessary and Sufficient Conditions for the Almost
          Everywhere Convergence of Nearest Neighbor Regression Function
          Estimates. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte
          Gebiete, 61: 467-481 (1982)
          D. Wolpert: The Lack of A Priori Distinctions Between Learning
          Algorithms, Neural Computation 8 (1996)
          L. Devroye, L. Györfi and G. Lugosi: A Probabilistic Theory of
          Pattern Recognition, Springer (1996)




                            Olivier Bousquet       Machine Learning: Some theoretical and practical problems

More Related Content

What's hot

Reject Inference in Credit Scoring
Reject Inference in Credit ScoringReject Inference in Credit Scoring
Reject Inference in Credit ScoringAdrien Ehrhardt
 
ABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space modelsABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space modelsUmberto Picchini
 
Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Umberto Picchini
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Umberto Picchini
 
Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...Umberto Picchini
 
Lesson 23: Antiderivatives
Lesson 23: AntiderivativesLesson 23: Antiderivatives
Lesson 23: AntiderivativesMatthew Leingang
 
Learning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldLearning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldKai-Wen Zhao
 
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...Gabriel Peyré
 
Lesson 28: Integration by Subsitution
Lesson 28: Integration by SubsitutionLesson 28: Integration by Subsitution
Lesson 28: Integration by SubsitutionMatthew Leingang
 
Active learning lecture
Active learning lectureActive learning lecture
Active learning lectureazuring
 
05 history of cv a machine learning (theory) perspective on computer vision
05  history of cv a machine learning (theory) perspective on computer vision05  history of cv a machine learning (theory) perspective on computer vision
05 history of cv a machine learning (theory) perspective on computer visionzukun
 
Predicting Short Term Movements of Stock Prices: A Two-Stage L1-Penalized Model
Predicting Short Term Movements of Stock Prices: A Two-Stage L1-Penalized ModelPredicting Short Term Movements of Stock Prices: A Two-Stage L1-Penalized Model
Predicting Short Term Movements of Stock Prices: A Two-Stage L1-Penalized Modelweekendsunny
 
Lesson 26: Integration by Substitution (slides)
Lesson 26: Integration by Substitution (slides)Lesson 26: Integration by Substitution (slides)
Lesson 26: Integration by Substitution (slides)Matthew Leingang
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsJulyan Arbel
 
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessAlessandro Panella
 
Classical inference in/for physics
Classical inference in/for physicsClassical inference in/for physics
Classical inference in/for physicsThiago Mosqueiro
 

What's hot (19)

Reject Inference in Credit Scoring
Reject Inference in Credit ScoringReject Inference in Credit Scoring
Reject Inference in Credit Scoring
 
ABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space modelsABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space models
 
Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)
 
Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...
 
Lesson 23: Antiderivatives
Lesson 23: AntiderivativesLesson 23: Antiderivatives
Lesson 23: Antiderivatives
 
Learning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldLearning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifold
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Inference on Treatment...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Inference on Treatment...MUMS: Bayesian, Fiducial, and Frequentist Conference - Inference on Treatment...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Inference on Treatment...
 
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
 
Lesson 28: Integration by Subsitution
Lesson 28: Integration by SubsitutionLesson 28: Integration by Subsitution
Lesson 28: Integration by Subsitution
 
Active learning lecture
Active learning lectureActive learning lecture
Active learning lecture
 
QMC: Operator Splitting Workshop, Stochastic Block-Coordinate Fixed Point Alg...
QMC: Operator Splitting Workshop, Stochastic Block-Coordinate Fixed Point Alg...QMC: Operator Splitting Workshop, Stochastic Block-Coordinate Fixed Point Alg...
QMC: Operator Splitting Workshop, Stochastic Block-Coordinate Fixed Point Alg...
 
05 history of cv a machine learning (theory) perspective on computer vision
05  history of cv a machine learning (theory) perspective on computer vision05  history of cv a machine learning (theory) perspective on computer vision
05 history of cv a machine learning (theory) perspective on computer vision
 
Predicting Short Term Movements of Stock Prices: A Two-Stage L1-Penalized Model
Predicting Short Term Movements of Stock Prices: A Two-Stage L1-Penalized ModelPredicting Short Term Movements of Stock Prices: A Two-Stage L1-Penalized Model
Predicting Short Term Movements of Stock Prices: A Two-Stage L1-Penalized Model
 
Main
MainMain
Main
 
Lesson 26: Integration by Substitution (slides)
Lesson 26: Integration by Substitution (slides)Lesson 26: Integration by Substitution (slides)
Lesson 26: Integration by Substitution (slides)
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian Nonparametrics
 
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet Process
 
Classical inference in/for physics
Classical inference in/for physicsClassical inference in/for physics
Classical inference in/for physics
 

Viewers also liked

An introduc on to Machine Learning
An introduc on to Machine LearningAn introduc on to Machine Learning
An introduc on to Machine Learningbutest
 
WEKA: Practical Machine Learning Tools And Techniques
WEKA: Practical Machine Learning Tools And TechniquesWEKA: Practical Machine Learning Tools And Techniques
WEKA: Practical Machine Learning Tools And TechniquesDataminingTools Inc
 
Engineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platformsEngineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platformsHisham Arafat
 
Practical machine learning - Part 1
Practical machine learning - Part 1Practical machine learning - Part 1
Practical machine learning - Part 1Traian Rebedea
 
Modern frameworks for machine learning
Modern frameworks for machine learningModern frameworks for machine learning
Modern frameworks for machine learningSergii Nechuiviter
 
Practical Machine Learning
Practical Machine LearningPractical Machine Learning
Practical Machine LearningDavid Jones
 
Introduction to Machine Learning for Oracle Database Professionals
Introduction to Machine Learning for Oracle Database ProfessionalsIntroduction to Machine Learning for Oracle Database Professionals
Introduction to Machine Learning for Oracle Database ProfessionalsAlex Gorbachev
 
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerH2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerSri Ambati
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnBenjamin Bengfort
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkDatabricks
 
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016MLconf
 
林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning台灣資料科學年會
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsJohann Schleier-Smith
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In ProductionSamir Bessalah
 
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
Machine Learning system architecture – Microsoft Translator, a Case Study :  ...Machine Learning system architecture – Microsoft Translator, a Case Study :  ...
Machine Learning system architecture – Microsoft Translator, a Case Study : ...Vishal Chowdhary
 
Introduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningIntroduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningNik Spirin
 

Viewers also liked (20)

An introduc on to Machine Learning
An introduc on to Machine LearningAn introduc on to Machine Learning
An introduc on to Machine Learning
 
Practical Machine Learning
Practical Machine Learning Practical Machine Learning
Practical Machine Learning
 
WEKA: Practical Machine Learning Tools And Techniques
WEKA: Practical Machine Learning Tools And TechniquesWEKA: Practical Machine Learning Tools And Techniques
WEKA: Practical Machine Learning Tools And Techniques
 
Engineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platformsEngineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platforms
 
Practical machine learning - Part 1
Practical machine learning - Part 1Practical machine learning - Part 1
Practical machine learning - Part 1
 
Part 4 (machine learning overview) solution architecture
Part 4 (machine learning overview)   solution architecturePart 4 (machine learning overview)   solution architecture
Part 4 (machine learning overview) solution architecture
 
Modern frameworks for machine learning
Modern frameworks for machine learningModern frameworks for machine learning
Modern frameworks for machine learning
 
Practical Machine Learning
Practical Machine LearningPractical Machine Learning
Practical Machine Learning
 
Introduction to Machine Learning for Oracle Database Professionals
Introduction to Machine Learning for Oracle Database ProfessionalsIntroduction to Machine Learning for Oracle Database Professionals
Introduction to Machine Learning for Oracle Database Professionals
 
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerH2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache Spark
 
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
 
Big Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud DetectionBig Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud Detection
 
林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In Production
 
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
Machine Learning system architecture – Microsoft Translator, a Case Study :  ...Machine Learning system architecture – Microsoft Translator, a Case Study :  ...
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
 
Introduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningIntroduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine Learning
 
Conceptual and theoretical framework
Conceptual and theoretical frameworkConceptual and theoretical framework
Conceptual and theoretical framework
 

Similar to Machine Learning: Some theoretical and practical problems

Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyMarina Santini
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
Conceptual Introduction to Gaussian Processes
Conceptual Introduction to Gaussian ProcessesConceptual Introduction to Gaussian Processes
Conceptual Introduction to Gaussian ProcessesJuanPabloCarbajal3
 
Free Ebooks Download ! Edhole.com
Free Ebooks Download ! Edhole.comFree Ebooks Download ! Edhole.com
Free Ebooks Download ! Edhole.comEdhole.com
 
Machine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.pptMachine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.pptAnshika865276
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程台灣資料科學年會
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonChun-Ming Chang
 
Statistical Machine________ Learning.ppt
Statistical Machine________ Learning.pptStatistical Machine________ Learning.ppt
Statistical Machine________ Learning.pptSandeepGupta229023
 
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7Ono Shigeru
 
Machine Learning part 2 - Introduction to Data Science
Machine Learning part 2 -  Introduction to Data Science Machine Learning part 2 -  Introduction to Data Science
Machine Learning part 2 - Introduction to Data Science Frank Kienle
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
Module 4 - Linear Model for Classification.pptx
Module 4 - Linear Model for Classification.pptxModule 4 - Linear Model for Classification.pptx
Module 4 - Linear Model for Classification.pptxGulamSarwar31
 
Regression_1.pdf
Regression_1.pdfRegression_1.pdf
Regression_1.pdfAmir Saleh
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learningYogendra Singh
 

Similar to Machine Learning: Some theoretical and practical problems (20)

Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language Technology
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Conceptual Introduction to Gaussian Processes
Conceptual Introduction to Gaussian ProcessesConceptual Introduction to Gaussian Processes
Conceptual Introduction to Gaussian Processes
 
Automatic bayesian cubature
Automatic bayesian cubatureAutomatic bayesian cubature
Automatic bayesian cubature
 
Free Ebooks Download ! Edhole.com
Free Ebooks Download ! Edhole.comFree Ebooks Download ! Edhole.com
Free Ebooks Download ! Edhole.com
 
Machine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.pptMachine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.ppt
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程
 
nnml.ppt
nnml.pptnnml.ppt
nnml.ppt
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in Python
 
ppt
pptppt
ppt
 
Statistical Machine________ Learning.ppt
Statistical Machine________ Learning.pptStatistical Machine________ Learning.ppt
Statistical Machine________ Learning.ppt
 
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
 
Machine Learning part 2 - Introduction to Data Science
Machine Learning part 2 -  Introduction to Data Science Machine Learning part 2 -  Introduction to Data Science
Machine Learning part 2 - Introduction to Data Science
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Module 4 - Linear Model for Classification.pptx
Module 4 - Linear Model for Classification.pptxModule 4 - Linear Model for Classification.pptx
Module 4 - Linear Model for Classification.pptx
 
Regression_1.pdf
Regression_1.pdfRegression_1.pdf
Regression_1.pdf
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learning
 
01 lec intro
01 lec intro01 lec intro
01 lec intro
 
presentacion
presentacionpresentacion
presentacion
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Machine Learning: Some theoretical and practical problems

  • 1. Outline Framework Theoretical Results Consequences Practical Implications Machine Learning: Some theoretical and practical problems Olivier Bousquet Journ´es MAS, Lille, 2006 e Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 2. Outline Framework Theoretical Results Consequences Practical Implications 1 Framework 2 Theoretical Results 3 Consequences 4 Practical Implications Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 3. Outline Framework Theoretical Results Consequences Practical Implications Outline 1 Framework 2 Theoretical Results 3 Consequences 4 Practical Implications Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 4. Outline Framework Theoretical Results Consequences Practical Implications The Setting Prediction problems after observing example pairs (X , Y ), build a function g : X → Y that predicts well: g (X ) ≈ Y Typical setting is statistical (data assumed to be sampled i.i.d.) Other setting: on-line adversarial (no assumption on the data generation mechanism) Goal: find the best algorithm Theoretical answer: fundamental limits of learning Practical answer: guidelines for algorithm design Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 5. Outline Framework Theoretical Results Consequences Practical Implications The Setting Prediction problems after observing example pairs (X , Y ), build a function g : X → Y that predicts well: g (X ) ≈ Y Typical setting is statistical (data assumed to be sampled i.i.d.) Other setting: on-line adversarial (no assumption on the data generation mechanism) Goal: find the best algorithm Theoretical answer: fundamental limits of learning Practical answer: guidelines for algorithm design Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 6. Outline Framework Theoretical Results Consequences Practical Implications The Setting Prediction problems after observing example pairs (X , Y ), build a function g : X → Y that predicts well: g (X ) ≈ Y Typical setting is statistical (data assumed to be sampled i.i.d.) Other setting: on-line adversarial (no assumption on the data generation mechanism) Goal: find the best algorithm Theoretical answer: fundamental limits of learning Practical answer: guidelines for algorithm design Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 7. Outline Framework Theoretical Results Consequences Practical Implications The Setting Prediction problems after observing example pairs (X , Y ), build a function g : X → Y that predicts well: g (X ) ≈ Y Typical setting is statistical (data assumed to be sampled i.i.d.) Other setting: on-line adversarial (no assumption on the data generation mechanism) Goal: find the best algorithm Theoretical answer: fundamental limits of learning Practical answer: guidelines for algorithm design Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 8. Outline Framework Theoretical Results Consequences Practical Implications The Setting Prediction problems after observing example pairs (X , Y ), build a function g : X → Y that predicts well: g (X ) ≈ Y Typical setting is statistical (data assumed to be sampled i.i.d.) Other setting: on-line adversarial (no assumption on the data generation mechanism) Goal: find the best algorithm Theoretical answer: fundamental limits of learning Practical answer: guidelines for algorithm design Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 9. Outline Framework Theoretical Results Consequences Practical Implications The Setting Prediction problems after observing example pairs (X , Y ), build a function g : X → Y that predicts well: g (X ) ≈ Y Typical setting is statistical (data assumed to be sampled i.i.d.) Other setting: on-line adversarial (no assumption on the data generation mechanism) Goal: find the best algorithm Theoretical answer: fundamental limits of learning Practical answer: guidelines for algorithm design Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 10. Outline Framework Theoretical Results Consequences Practical Implications Definitions We consider the classification setting: Y = {0, 1} with data sampled i.i.d. A rule (or learning algorithm) is a mapping gn : (X × Y)n × X → Y. Sample: Sn = {(X1 , Y1 ), . . . , (Xn , Yn )} Misclassification error: L(g ) = P (g (X ) = Y ) (conditional on the sample) Bayes error: best possible error L∗ = inf g L(g ) over all measurable functions Sequence of classification rules {gn }: defined for any sample size (algorithms are usually defined in this way, possibly with a sample size-dependent parameter) Consistency: limn→∞ EL(gn ) = L∗ Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 11. Outline Framework Theoretical Results Consequences Practical Implications Definitions We consider the classification setting: Y = {0, 1} with data sampled i.i.d. A rule (or learning algorithm) is a mapping gn : (X × Y)n × X → Y. Sample: Sn = {(X1 , Y1 ), . . . , (Xn , Yn )} Misclassification error: L(g ) = P (g (X ) = Y ) (conditional on the sample) Bayes error: best possible error L∗ = inf g L(g ) over all measurable functions Sequence of classification rules {gn }: defined for any sample size (algorithms are usually defined in this way, possibly with a sample size-dependent parameter) Consistency: limn→∞ EL(gn ) = L∗ Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 12. Outline Framework Theoretical Results Consequences Practical Implications Definitions We consider the classification setting: Y = {0, 1} with data sampled i.i.d. A rule (or learning algorithm) is a mapping gn : (X × Y)n × X → Y. Sample: Sn = {(X1 , Y1 ), . . . , (Xn , Yn )} Misclassification error: L(g ) = P (g (X ) = Y ) (conditional on the sample) Bayes error: best possible error L∗ = inf g L(g ) over all measurable functions Sequence of classification rules {gn }: defined for any sample size (algorithms are usually defined in this way, possibly with a sample size-dependent parameter) Consistency: limn→∞ EL(gn ) = L∗ Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 13. Outline Framework Theoretical Results Consequences Practical Implications Definitions We consider the classification setting: Y = {0, 1} with data sampled i.i.d. A rule (or learning algorithm) is a mapping gn : (X × Y)n × X → Y. Sample: Sn = {(X1 , Y1 ), . . . , (Xn , Yn )} Misclassification error: L(g ) = P (g (X ) = Y ) (conditional on the sample) Bayes error: best possible error L∗ = inf g L(g ) over all measurable functions Sequence of classification rules {gn }: defined for any sample size (algorithms are usually defined in this way, possibly with a sample size-dependent parameter) Consistency: limn→∞ EL(gn ) = L∗ Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 14. Outline Framework Theoretical Results Consequences Practical Implications Definitions We consider the classification setting: Y = {0, 1} with data sampled i.i.d. A rule (or learning algorithm) is a mapping gn : (X × Y)n × X → Y. Sample: Sn = {(X1 , Y1 ), . . . , (Xn , Yn )} Misclassification error: L(g ) = P (g (X ) = Y ) (conditional on the sample) Bayes error: best possible error L∗ = inf g L(g ) over all measurable functions Sequence of classification rules {gn }: defined for any sample size (algorithms are usually defined in this way, possibly with a sample size-dependent parameter) Consistency: limn→∞ EL(gn ) = L∗ Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 15. Outline Framework Theoretical Results Consequences Practical Implications Definitions We consider the classification setting: Y = {0, 1} with data sampled i.i.d. A rule (or learning algorithm) is a mapping gn : (X × Y)n × X → Y. Sample: Sn = {(X1 , Y1 ), . . . , (Xn , Yn )} Misclassification error: L(g ) = P (g (X ) = Y ) (conditional on the sample) Bayes error: best possible error L∗ = inf g L(g ) over all measurable functions Sequence of classification rules {gn }: defined for any sample size (algorithms are usually defined in this way, possibly with a sample size-dependent parameter) Consistency: limn→∞ EL(gn ) = L∗ Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 16. Outline Framework Theoretical Results Consequences Practical Implications Outline 1 Framework 2 Theoretical Results 3 Consequences 4 Practical Implications Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 17. Outline Framework Theoretical Results Consequences Practical Implications Consistency How to build a consistent sequence of rules? Countable X very easy, just wait! eventually every point with non-zero probability is observed an unbounded number of times (i.e. take majority vote over observed x, and random prediction on unobserved ones) Uncountable X observed sample has measure zero (for non-atomic measures) so this trick does not work Instead, take local majority with two conditions: more and more local, but also more and more points averaged Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 18. Outline Framework Theoretical Results Consequences Practical Implications Consistency How to build a consistent sequence of rules? Countable X very easy, just wait! eventually every point with non-zero probability is observed an unbounded number of times (i.e. take majority vote over observed x, and random prediction on unobserved ones) Uncountable X observed sample has measure zero (for non-atomic measures) so this trick does not work Instead, take local majority with two conditions: more and more local, but also more and more points averaged Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 19. Outline Framework Theoretical Results Consequences Practical Implications Consistency How to build a consistent sequence of rules? Countable X very easy, just wait! eventually every point with non-zero probability is observed an unbounded number of times (i.e. take majority vote over observed x, and random prediction on unobserved ones) Uncountable X observed sample has measure zero (for non-atomic measures) so this trick does not work Instead, take local majority with two conditions: more and more local, but also more and more points averaged Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 20. Outline Framework Theoretical Results Consequences Practical Implications Consistency of Histograms Histogram in Rd : cubic cells of size hn , prediction is constant over each cell (majority vote) d hn → 0, nhn → ∞ enough for universal consistency Idea of the proof Continuous functions with bounded support are dense in Lp (ν) Such functions are uniformly continuous and can thus be approximated by histograms (average of the function on a cell) provided cell size goes to 0 Since cells will contain more and more points (second condition), the cell value will eventually converge to the average over the cell Olivier Bousquet Machine Learning: Some theoretical and practical problems
  • 21. Outline Framework Theoretical Results Consequences Practical Implications Consistency of Histograms Histogram in Rd : cubic cells of size hn , prediction is constant over each cell (majority vote) d hn → 0, nhn → ∞ enough for universal consistency Idea of the proof Continuous functions with bounded support are dense in Lp (ν) Such functions are uniformly continuous and can thus be approximated by histograms (average of the function on a cell) provided cell size goes to 0 Since cells will contain more and more points (second condition), the cell value will eventually converge to the average over the cell Olivier Bousquet Machine Learning: Some theoretical and practical problems
• 22. No Free Lunch
We can "learn" anything. Is the problem solved?
The question becomes: among the consistent algorithms, which one is the best?
We consider here the special case of classification; similar phenomena occur for regression and density estimation.
Unfortunately, there is no free lunch.
• 28. No Free Lunch 1
Out-of-sample error: L'(g_n) = P(g_n(X) ≠ Y | X ∉ S_n)
Consider a uniform probability distribution µ over problems, i.e. for all x, E_µ P(Y = 1 | X = x) = E_µ P(Y = 0 | X = x).
All classifiers have the same average error.
Theorem (Wolpert96). For any classification rule g_n, E_µ E L'(g_n) = 1/2.
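NFL1 can be checked by simulation (a hedged sketch, not Wolpert's proof): draw the target function on a finite X uniformly at random, fix any rule, and the average out-of-sample error concentrates around 1/2. The rule used below, "predict the overall training majority", is an arbitrary stand-in for any g_n.

import numpy as np

rng = np.random.default_rng(2)
K, n, trials = 20, 5, 20000
errs = []
for _ in range(trials):
    truth = rng.integers(2, size=K)        # a problem drawn uniformly at random
    train_x = rng.integers(K, size=n)      # training inputs
    # an arbitrary fixed rule: predict the majority training label everywhere
    pred = int(truth[train_x].mean() > 0.5)
    outside = np.setdiff1d(np.arange(K), train_x)
    errs.append(np.mean(truth[outside] != pred))
print(np.mean(errs))                        # approximately 1/2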
• 30. No Free Lunch 2
A consequence of NFL1 is that there are always cases where an algorithm can be beaten. A stronger version of NFL1: No Super Classifier.
Theorem (DGL96). For every sequence of classification rules {g_n} there is a universally consistent sequence {g'_n} such that, for some distribution, L(g_n) > L(g'_n) for all n.
• 32. No Free Lunch 3
A variation of NFL1: arbitrarily bad error for fixed sample sizes.
Theorem (Devroye82). Fix an ε > 0. For any integer n and classification rule g_n, there exists a distribution of (X, Y) with Bayes risk L* = 0 such that E L(g_n) ≥ 1/2 − ε.
• 34. No Free Lunch 4
NFL3 possibly considers a different distribution for each n. What happens for a fixed distribution as n increases? A slow rate phenomenon.
Theorem (Devroye82). Let {a_n} be a sequence of positive numbers converging to zero with 1/16 ≥ a_1 ≥ a_2 ≥ ... For every sequence of classification rules, there exists a distribution of (X, Y) with Bayes risk L* = 0 such that E L(g_n) ≥ a_n for all n.
• 37. Proofs
The idea is to create a "bad" distribution. It turns out that random ones are bad enough: just create a problem with no structure (the prediction at x is unrelated to the prediction at x').
All proofs work on finite (for fixed n) or countable (for varying n) spaces; there is no need to introduce an uncountable X.
The trick is to make sure that there are enough points that have not been observed yet (on those, the error will be 1/2).
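The counting behind that trick can be made explicit (a sketch with arbitrary sizes): on a uniform distribution over K points, the expected probability mass left unobserved by n samples is (1 − 1/K)^n, and on a structureless random problem any rule errs with probability 1/2 there, so E L(g_n) ≥ (1 − 1/K)^n / 2.

import numpy as np

rng = np.random.default_rng(3)
n = 100
for K in (50, 1000, 100000):
    unseen = (1 - 1 / K) ** n                         # analytic unobserved mass
    draws = rng.integers(K, size=(2000, n))
    unseen_mc = np.mean([(K - np.unique(row).size) / K for row in draws])
    # lower bound on the expected error of any rule on a structureless problem
    print(K, unseen, unseen_mc, unseen / 2)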
• 41. A closer look at consistency
Consider the trivially consistent rule for a countable space (majority vote). Its error decreases with increasing sample size: ∀n, E L(g_n) ≥ E L(g_{n+1}).
Is it true in general for universally consistent rules?
• 44. Smart rules
Consistency for uncountable spaces is not so trivial.
Definition. A sequence {g_n} of classification rules is smart if for any distribution and any integer n, E L(g_n) ≥ E L(g_{n+1}).
For uncountable spaces, some of the known universally consistent rules can be shown to be non-smart.
Conjecture: on R^d, no smart rule is universally consistent.
Interpretation: consistency on uncountable spaces requires adapting the degree of smoothness to the sample size; this means there will be a point for which the smoothness degree is too large.
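Smartness of a given rule can at least be probed numerically. The sketch below (a hypothetical probe, not a proof of non-smartness: Monte Carlo noise and the particular distribution both matter) estimates the error curve n ↦ E L(g_n) for the 1-nearest-neighbor rule on a toy one-dimensional problem.

import numpy as np

rng = np.random.default_rng(4)

def sample(n):
    X = rng.uniform(0, 1, size=(n, 1))
    y = (rng.random(n) < 0.5 + 0.3 * np.sin(8 * X[:, 0])).astype(int)
    return X, y

def one_nn(Xtr, ytr, Xte):
    d = np.abs(Xte[:, 0][:, None] - Xtr[:, 0][None, :])
    return ytr[np.argmin(d, axis=1)]

for n in (1, 2, 5, 10, 50, 200):
    errs = []
    for _ in range(300):
        Xtr, ytr = sample(n)
        Xte, yte = sample(2000)
        errs.append(np.mean(one_nn(Xtr, ytr, Xte) != yte))
    print(n, round(np.mean(errs), 4))   # a monotone decrease is not guaranteed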
• 48. Anti-learning
The average error is 1/2, so there are problems for which the error is much worse than random guessing!
One can indeed construct distributions for which some standard algorithms have E L(g_n) arbitrarily close to 1, even with L* = 0! Of course this occurs for a fixed sample size.
Can one always do that (for any rule)? The problem should have a structure, but one which is opposite to the ones preferred by the algorithm. (A construction in this spirit is sketched below.)
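Here is one hypothetical construction in this spirit (not the one from the talk), aimed at the 1-nearest-neighbor rule. Choose points whose Gram matrix makes same-class inner products negative and opposite-class inner products positive; then every point's nearest neighbors all belong to the other class, so on any unseen support point 1-NN is wrong, while L* = 0 since the labels are deterministic.

import numpy as np

rng = np.random.default_rng(5)
N = 200                                         # support size, labels deterministic
y = np.where(np.arange(N) % 2 == 0, 1, -1)
c = 1.0 / N
# Gram matrix: G_ij = -c * y_i * y_j for i != j, G_ii = 1. It is positive
# definite, and it makes opposite-class points strictly closer than same-class
# ones (squared distance 2 - 2c versus 2 + 2c).
G = (1 + c) * np.eye(N) - c * np.outer(y, y)
X = np.linalg.cholesky(G)                       # rows realize these inner products

n = 20
sample = rng.integers(N, size=n)                # i.i.d. uniform training sample
test = np.setdiff1d(np.arange(N), sample)       # the points not yet observed
d = np.linalg.norm(X[test][:, None, :] - X[sample][None, :, :], axis=2)
pred = y[sample[np.argmin(d, axis=1)]]
print(np.mean(pred != y[test]))                 # 1.0: wrong on every unseen point
# The unseen mass is about (1 - 1/N)^n, so E L(g_n) is close to 1 when n << N.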
• 52. Bayes Error Estimation
Assume we just want to estimate L*. Of course, we could use any universally consistent algorithm and estimate its error. But we get slow rates! Is there a better way?
Theorem (DGL96). For every n, for any estimate L̂_n of the Bayes error L* and for every ε > 0, there exists a distribution of (X, Y) such that E|L̂_n − L*| ≥ 1/4 − ε.
Estimating this single number does not seem easier than estimating the whole set {x : P(Y = 1 | x) > 1/2}.
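A sketch of the natural slow approach (an illustration with arbitrary toy numbers): estimate L* by the holdout error of a universally consistent rule, here a k-nearest-neighbor rule with k ≈ √n.

import numpy as np

rng = np.random.default_rng(6)

def sample(n):
    X = rng.uniform(0, 1, size=n)
    y = (rng.random(n) < np.where(X < 0.5, 0.3, 0.8)).astype(int)
    return X, y

L_star = 0.5 * 0.3 + 0.5 * 0.2                 # E[min(eta, 1 - eta)] = 0.25

def knn_estimate(n, k):
    Xtr, ytr = sample(n)
    Xte, yte = sample(1000)
    idx = np.argsort(np.abs(Xte[:, None] - Xtr[None, :]), axis=1)[:, :k]
    pred = (ytr[idx].mean(axis=1) > 0.5).astype(int)
    return np.mean(pred != yte)

for n in (50, 500, 5000):
    k = max(1, int(np.sqrt(n)))                # k_n -> infinity, k_n / n -> 0
    print(n, knn_estimate(n, k), L_star)       # converges, but with no fast-rate guarantee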
• 55. Outline
1 Framework
2 Theoretical Results
3 Consequences
4 Practical Implications
• 56. What can we hope to prove?
Our framework is too general: nothing interesting can be said about learning algorithms.
Can we prove something interesting under slightly more restrictive assumptions? Are the distributions used to prove the NFLs pathological? (NFL4 holds even within classes of "reasonable" distributions!)
If we can define which problems actually occur in real life, we can hope to derive appropriate algorithms (optimal on this class of problems).
• 60. The Bayesian Way
Assume something about how the data is generated; consider an algorithm specifically tuned to this property; prove that under this assumption the algorithm does well.
Most results go in this direction (sometimes in a subtle way): Bayesian algorithms.
Most minimax results are of this form:
inf_{{g_n}} sup_{P ∈ P} ( L(g_n) − inf_g L(g) )
Seems reasonable and useful for understanding, but does not provide guarantees.
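A miniature of this program (a hypothetical sketch, not from the talk): assume one-dimensional Gaussian class conditionals with a shared variance, tune the algorithm to that assumption (a plug-in rule with maximum-likelihood parameters), and it performs near the Bayes error exactly when the assumption holds.

import numpy as np

rng = np.random.default_rng(7)

def fit_plugin(X, y):
    """Plug-in rule under the assumed equal-variance Gaussian model."""
    m0, m1 = X[y == 0].mean(), X[y == 1].mean()
    p1 = y.mean()
    s2 = (np.sum((X[y == 0] - m0) ** 2) + np.sum((X[y == 1] - m1) ** 2)) / len(X)
    # with equal variances the Bayes rule is a threshold where the posteriors cross
    thr = (m0 + m1) / 2 + s2 * np.log((1 - p1) / p1) / (m1 - m0)
    return (lambda x: (x > thr).astype(int)) if m1 > m0 else (lambda x: (x < thr).astype(int))

n = 500
y = (rng.random(n) < 0.5).astype(int)
X = rng.normal(2.0 * y, 1.0)                   # the assumption holds: N(0,1) vs N(2,1)
g = fit_plugin(X, y)
yte = (rng.random(20000) < 0.5).astype(int)
Xte = rng.normal(2.0 * yte, 1.0)
print(np.mean(g(Xte) != yte))                  # close to this model's Bayes error, about 0.159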
• 67. The Worst Case Way
Assume nothing about the data (distribution-free). Restrict your objectives. Derive an algorithm that reaches this objective no matter how the data is distributed:
inf_{{g_n}} sup_P ( L(g_n) − inf_{g ∈ G} L(g) )
Gives guarantees.
In between: adaptation.
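One concrete instance (a sketch under assumptions, with an arbitrary toy distribution): take G to be a small finite class of threshold rules and run empirical risk minimization; a uniform Hoeffding bound then gives a guarantee relative to the best rule in G that holds for every distribution.

import numpy as np

rng = np.random.default_rng(8)
thresholds = np.linspace(0, 1, 21)             # a finite class G of decision stumps

def erm(X, y):
    """Pick the stump "predict 1 iff x > t" with the smallest empirical risk."""
    risks = [np.mean((X > t).astype(int) != y) for t in thresholds]
    return thresholds[int(np.argmin(risks))]

n, delta = 2000, 0.05
X = rng.uniform(0, 1, size=n)
y = (rng.random(n) < np.where(X < 0.4, 0.1, 0.9)).astype(int)
t_hat = erm(X, y)
# distribution-free guarantee: with probability >= 1 - delta,
#   L(g_hat) <= min_{g in G} L(g) + 2 * sqrt(log(2|G|/delta) / (2n))
slack = 2 * np.sqrt(np.log(2 * len(thresholds) / delta) / (2 * n))
print(t_hat, slack)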
• 72. Outline
1 Framework
2 Theoretical Results
3 Consequences
4 Practical Implications
• 73. Does this help practically?
We can probably come up with algorithms that work well on most real-world problems. If we have a characterization of these problems, we can even prove something about such algorithms.
However, there is no guarantee that a new problem will satisfy this characterization. So there cannot be a formal proof that an algorithm is good or bad.
• 77. If theory cannot help, what can we do?
It is essentially a matter of finding an algorithm that implements the right notion of smoothness for the problem at hand. More an art than a science!
• 79. Priors
Algorithm design is composed of two steps.
Choosing a preference: this first step is based on knowledge of the problem; this is where guidance (but no theory) is needed.
Exploiting it for inference: this second step can possibly be formalized (optimality with respect to assumptions). The main issue is computational cost.
• 81. Why can algorithms fail in practice?
1 Data representation (inappropriate features, errors, ...)
2 Data scarcity (not enough data samples)
3 Data overload (too many variables, too much noise)
4 Lack of understanding of the result (impossible validation) / lack of validation data
Examples:
Forgot to remove the output variable (or a version of it): the algorithm picks it up (see the sketch below).
An irrelevant variable happens to be discriminative (e.g. the date of sample collection).
An error in a measurement (a misalignment in the database).
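The first failure mode is easy to reproduce. The sketch below (entirely hypothetical data) leaves a thinly disguised copy of the output among the features; even a one-feature stump "discovers" it and reports near-perfect accuracy, which says nothing about deployment, where the leaked column will not exist.

import numpy as np

rng = np.random.default_rng(9)
n = 1000
y = (rng.random(n) < 0.5).astype(int)
honest = rng.normal(0.3 * y, 1.0)              # a weak but legitimate signal
leaked = y + rng.normal(0.0, 0.01, size=n)     # "a version of" the output variable

def best_stump_accuracy(feature, y):
    """Best single-threshold accuracy achievable on one feature."""
    ts = np.quantile(feature, np.linspace(0.05, 0.95, 19))
    return max(max(np.mean((feature > t) == y), np.mean((feature <= t) == y)) for t in ts)

print(best_stump_accuracy(honest, y))          # modest, around 0.56
print(best_stump_accuracy(leaked, y))          # about 1.0: the rule picked up the target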
• 89. So, what would be helpful?
Flexible ways to incorporate knowledge/expertise: provide tools that allow prior knowledge to be formulated in a natural way; look for other types of prior assumptions that occur in various problems (e.g. manifold structure, clusteredness, analogy, ...).
Ability to understand what is found by the algorithm (we need a language to interact with experts): investigate how to improve understandability (simpler models, separate models and language for interaction, ...); improve interaction (understand the user's intent).
Computationally efficient algorithms: scalability, anytime behavior; incorporate time complexity in the theoretical analysis (trade complexity for accuracy).
• 98. References
L. Devroye: Necessary and Sufficient Conditions for the Almost Everywhere Convergence of Nearest Neighbor Regression Function Estimates. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 61:467-481 (1982)
D. Wolpert: The Lack of A Priori Distinctions Between Learning Algorithms. Neural Computation 8 (1996)
L. Devroye, L. Györfi and G. Lugosi: A Probabilistic Theory of Pattern Recognition. Springer (1996)