        Machine Learning
Preliminaries and Math Refresher

                    M. Lüthi, T. Vetter

                     February 18, 2008




Outline



   1   General remarks about learning


   2   Probability Theory and Statistics


   3   Linear spaces




The problem of learning is arguably at the very core of the problem
of intelligence, both biological and artificial.

                                                    T. Poggio and C.R. Shelton




Model building in natural sciences


   Model building
   Given a phenomenon, construct a model for it.

   Example (Heat Conduction)
   Phenomenon: The spontaneous transfer of thermal energy
   through matter, from a region of higher temperature to a region of
   lower temperature
   Model:
                          ∂Q/∂t = −k ∫_S ∇T · dS




Learning as Model Building


   Example (Learning)
   Phenomenon: Learning (Inferring general rules from examples)
   Model:

                      f* = arg max_{f ∈ H}  P(f) P(D|f) / P(D)




Learning as Model Building


   Example (Learning)
   Phenomenon: Learning (Inferring general rules from examples)
   Model:

                      f* = arg max_{f ∈ H}  P(f) P(D|f) / P(D)

   Neural networks, Decision Trees, Naive Bayes, Support Vector
   machines, etc.

   Models for learning
   The models for learning are the learning algorithms
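
   As a concrete illustration of the MAP rule above, here is a minimal
   Python sketch (an addition, not part of the slides): a small finite
   hypothesis class of biased-coin models is scored by prior times
   likelihood, and the constant P(D) is dropped from the argmax. The
   hypothesis class, the prior, and the coin-flip data are illustrative
   assumptions.

```python
import numpy as np

# A minimal sketch of f* = argmax_{f in H} P(f) P(D|f) / P(D).
# H, the prior, and the coin-flip data are illustrative assumptions.
H = [0.1, 0.3, 0.5, 0.7, 0.9]                  # biased-coin hypotheses: P(head)
prior = np.array([0.1, 0.2, 0.4, 0.2, 0.1])    # P(f), sums to 1
D = np.array([1, 1, 0, 1, 1, 1, 0, 1])         # observed flips: 1 = head

def likelihood(theta, data):
    """P(D|f) for i.i.d. Bernoulli observations with head probability theta."""
    return np.prod(np.where(data == 1, theta, 1.0 - theta))

# P(D) is the same for every f, so it drops out of the argmax.
scores = np.array([p * likelihood(f, D) for f, p in zip(H, prior)])
f_star = H[int(np.argmax(scores))]
print("MAP hypothesis (head probability):", f_star)
```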


Goals of the first block

   Life is short . . .
   We want to cover the essentials of learning.


  General Setting
      Mathematically precise setting of the learning problem
      Valid for any kind of learning algorithm

  Statistical Learning Theory
      When does learning work
      Conditions any algorithm has to satisfy
      Performance bounds

  Kernel Methods
      Theory of Kernels
      Make linear algorithms non-linear
      Learning from non-vectorial data

Mathematics needed in the first block
   The need for mathematics
   As we treat the learning problem in a formal setting, the results
   and methods are necessarily formulated in mathematical terms.


  General Setting
      Probability theory
      Statistics
      Basic optimization theory

  Statistical Learning Theory
      More probability theory
      More statistics

  Kernel Methods
      Linear spaces
      Linear algebra
      Basic optimization theory



   A bit of mathematical maturity and an open mind are required. The
   rest will be explained.
Nothing is more practical than a good theory.

                                                                  Vladimir N. Vapnik




Nothing (in computer science) is more beautiful than learning
theory?

                                                                                  M. Lüthi




Probability theory vs Statistics


 Definition (Probability Theory)
 A branch of mathematics concerned with the analysis of random
 phenomena.
 General ⇒ Specific

 Definition (Statistics)
 The science of collecting, analyzing, presenting, and interpreting
 data.
 Specific ⇒ General


        Statistical machine learning is closely related to (inferential)
        statistics.
       Many state-of-the-art learning algorithms are based on
       concepts from probability theory.


Probabilities


   Definition (Probability Space)
   A probability space is the triple

                                            (Ω, F, P)

   where
         Ω is the set of elementary events (outcomes) ω
        F is a collection of events (e.g. the power-set P(Ω))
        P is a measure that satisfies the probability axioms.




Axioms of Probability



    1   For any A ∈ F, there exists a number P(A), the probability of
        A, satisfying P(A) ≥ 0.
    2   P(Ω) = 1.
    3   Let {An , n ≥ 1} be a collection of pairwise disjoint events,
        and let A be their union. Then
                               P(A) = ∑_{n=1}^∞ P(An).
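
   For intuition, the axioms can be spot-checked on a toy finite
   probability space. The fair-die example below is a sketch added
   here for illustration; it is not part of the slides.

```python
import itertools
from fractions import Fraction

# A toy finite probability space (Omega, F, P) for a fair die.
Omega = frozenset({1, 2, 3, 4, 5, 6})
F = [frozenset(s) for r in range(len(Omega) + 1)
     for s in itertools.combinations(sorted(Omega), r)]   # the power set of Omega
P = lambda A: Fraction(len(A), len(Omega))                # uniform measure

assert all(P(A) >= 0 for A in F)               # axiom 1: P(A) >= 0
assert P(Omega) == 1                           # axiom 2: P(Omega) = 1
A1, A2 = frozenset({1, 2}), frozenset({5})     # pairwise disjoint events
assert P(A1 | A2) == P(A1) + P(A2)             # axiom 3 (finite case)
print("the probability axioms hold on this finite space")
```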




Independence

   Definition (Independence)
   Two events, A and B, are independent iff the probability of their
   intersection equals the product of the individual probabilities, i.e.

                             P(A ∩ B) = P(A) · P(B).

   Definition (Conditional probability)
   Given two events A and B, with P(B) > 0, we define the
   conditional probability for A given B, P(A|B), by the relation

                          P(A|B) = P(A ∩ B) / P(B).
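
   A quick numerical illustration (an addition, not from the slides):
   with two fair dice, the events "first die is even" and "the dice
   sum to 7" turn out to be independent, and P(A|B) follows directly
   from the definition above.

```python
from fractions import Fraction

# Two fair dice; every outcome (i, j) is equally likely.
Omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = lambda A: Fraction(len(A), len(Omega))

A = [w for w in Omega if w[0] % 2 == 0]      # first die is even
B = [w for w in Omega if w[0] + w[1] == 7]   # the dice sum to 7
AB = [w for w in A if w in B]                # A intersected with B

print(P(AB) == P(A) * P(B))   # True: A and B are independent
print(P(AB) / P(B))           # P(A|B) = P(A ∩ B) / P(B) = 1/2
```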


Random Variables

       A single event is not that interesting.

   Definition (Random Variable)
   A random variable X is a function from the probability space to a
   vector of real numbers

                                         X : Ω → R^n.
   Random variables are characterized by their distribution function F :
   Definition (Probability Distribution Function)
   Let X : Ω → R be a random variable. We define

                     FX(x) = P(X ≤ x),    −∞ < x < ∞.

Probability density function

   Definition (Probability density function)
   The density function is the function fX with the property

                 FX(x) = ∫_{−∞}^{x} fX(y) dy,    −∞ < x < ∞.
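
   To see the relation between fX and FX numerically, the sketch below
   (an added illustration, not from the slides) integrates the standard
   normal density on a grid and compares the result with the known
   value FX(2) ≈ 0.977.

```python
import numpy as np

# Density of the standard normal distribution.
f = lambda x: np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

# Approximate F_X(2) = integral of f from -inf to 2 by a Riemann sum;
# the grid start of -8 stands in for -inf (the tail mass beyond it is tiny).
xs = np.linspace(-8.0, 2.0, 200001)
F_at_2 = np.sum(f(xs[:-1]) * np.diff(xs))
print(F_at_2)   # close to 0.9772, the standard normal CDF at x = 2
```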




Convergence



  Definition (Convergence in Probability)
  Let X1, X2, . . . be random variables. We say that Xn converges in
  probability to the random variable X as n → ∞ iff, for all ε > 0,

                      P(|Xn − X| > ε) → 0  as n → ∞.

  We write Xn →^p X as n → ∞.




Weak law of large numbers


   Theorem (Bernoulli's Theorem (Weak law of large numbers))
   Let X1, . . . , Xn be a sequence of independent and identically
   distributed (i.i.d.) random variables, each having mean µ and
   standard deviation σ. Then, for every ε > 0,

                     P[|(X1 + . . . + Xn)/n − µ| > ε] → 0

   as n → ∞.
   Thus, given enough observations xi ∼ FX, the sample mean
   x̄ = (1/n) ∑_{i=1}^n xi will approach the true mean µ.
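
   A small simulation (an added illustration, not from the slides)
   shows the statement in action: the sample mean of i.i.d. draws
   typically gets closer to µ as n grows. The exponential distribution
   and the sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 3.0
for n in (10, 1_000, 100_000):
    x = rng.exponential(scale=mu, size=n)   # i.i.d. draws with mean mu = 3
    print(n, abs(x.mean() - mu))            # the deviation typically shrinks with n
```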




Expectation

   Definition (Expectation)
   Let X be a random variable with probability density function fX ,
   and g : R → R a function. We define the expectation
                      E[g(X)] := ∫_{−∞}^{∞} g(x) fX(x) dx.


   Definition (Sample mean)
   Let a sample x = {x1 , x2 , . . . , xn } be given. We define the
   (sample) mean to be
                            x̄ = (1/n) ∑_{i=1}^n xi.
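
   The two definitions are linked in practice by Monte Carlo
   estimation: the sample mean of g(xi) over i.i.d. draws approximates
   E[g(X)]. The sketch below is an added illustration (not from the
   slides) using X standard normal and g(x) = x^2, for which the exact
   expectation is 1.

```python
import numpy as np

rng = np.random.default_rng(1)
g = lambda x: x**2
samples = rng.standard_normal(1_000_000)   # i.i.d. draws of X ~ N(0, 1)
print(g(samples).mean())                   # close to E[g(X)] = 1
```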


Variance
   Definition (Variance)
   Let X be a random variable with density function fX. The variance
   is given by

               Var[X] = E[(X − E[X])^2] = E[X^2] − (E[X])^2.

   The square root √Var[X] of the variance is referred to as the
   standard deviation.

   Definition (Sample Variance)
   Let the sample x = {x1, x2, . . . , xn} with sample mean x̄ be given.
   We define the sample variance to be

                     s^2 = (1/(n−1)) ∑_{i=1}^n (xi − x̄)^2.
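
   As a practical note (added here, not from the slides): NumPy's
   np.var divides by n by default, so ddof=1 is needed to obtain the
   n−1 denominator of the definition above.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)
s2_manual = ((x - x.mean())**2).sum() / (n - 1)   # the definition above
print(s2_manual, np.var(x, ddof=1))               # ddof=1 gives the same value
```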
Notation


   Assume F has a probability density function:

                              f(x) = dF(x)/dx

   Formally, we write:
                              f(x) dx = dF(x)

   Example: Expectation

         E[g(X)] := ∫_{−∞}^{∞} g(x) f(x) dx = ∫_{−∞}^{∞} g(x) dF(x)




Vector Space
   A set V together with two binary operations
     1   vector addition + : V × V → V and
     2   scalar multiplication · : R × V → V
   is called a vector space over R, if it satisfies the following axioms:
     1   ∀x, y ∈ V : x + y = y + x (commutativity)
     2   ∀x, y, z ∈ V : x + (y + z) = (x + y) + z (associativity)
     3   ∃0 ∈ V , ∀x ∈ V : 0 + x = x (identity of vector addition)
     4   ∀x ∈ V : 1 · x = x (identity of scalar multiplication)
     5   ∀x ∈ V , ∃(−x) ∈ V : x + (−x) = 0 (additive inverse element)
     6   ∀α ∈ R, ∀x, y ∈ V : α · (x + y ) = α · x + α · y (distributivity)
     7   ∀α, β ∈ R, ∀x ∈ V : (α + β) · x = α · x + β · x (distributivity)
     8   ∀α, β ∈ R, ∀x ∈ V : α(β · x) = (αβ) · x
Vector Space

   More importantly for us, the definition implies:

                x + y ∈ V,                                      ∀x, y ∈ V
                αx ∈ V ,                              ∀α ∈ R, ∀x ∈ V


   Subspace criterion
   Let V be a vector space over R, and let W be a subset of V .
   Then W is a subspace if and only if it satisfies the following 3
   conditions:
     1   0∈W
     2   If x, y ∈ W then x + y ∈ W
     3   If x ∈ W and α ∈ R then αx ∈ W
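
   A numerical spot-check of the criterion (an added sketch, not a
   proof and not from the slides): for W = {x ∈ R^3 : x1 + x2 + x3 = 0},
   random elements of W stay in W under addition and scalar
   multiplication.

```python
import numpy as np

in_W = lambda x: np.isclose(x.sum(), 0.0)   # membership test for W

rng = np.random.default_rng(2)
assert in_W(np.zeros(3))                    # condition 1: 0 is in W
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    x -= x.mean(); y -= y.mean()            # project onto W, so x, y are in W
    alpha = rng.normal()
    assert in_W(x + y)                      # condition 2: closed under addition
    assert in_W(alpha * x)                  # condition 3: closed under scaling
print("subspace criterion spot-check passed")
```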

Normed spaces



  Definition (Normed vector space)
  A normed vector space is a pair (V, ‖·‖) where V is a vector space
  and ‖·‖ is the associated norm, satisfying the following properties
  for all u, v ∈ V and α ∈ R:
    1   ‖v‖ ≥ 0 (positivity)
    2   ‖u + v‖ ≤ ‖u‖ + ‖v‖ (triangle inequality)
    3   ‖αv‖ = |α| ‖v‖ (positive scalability)
    4   ‖v‖ = 0 ⇔ v = 0 (positive definiteness)




Definition (Inner product space)
A real inner product space is a pair (V, ⟨·, ·⟩), where V is a real
vector space and ⟨·, ·⟩ the associated inner product, satisfying the
following properties for all u, v, w ∈ V and α ∈ R:
  1   ⟨u, v⟩ = ⟨v, u⟩ (symmetry)
  2   ⟨αu, v⟩ = α⟨u, v⟩, ⟨u, αv⟩ = α⟨u, v⟩
      and
      ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩, ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩
      (bilinearity)
  3   ⟨u, u⟩ ≥ 0 (positive definiteness)

Definition (Strict inner product space)
An inner product space is called strict if

                                  ⟨u, u⟩ = 0 ⇔ u = 0
Inner product space


   The strict inner product
        induces a norm: ‖f‖^2 = ⟨f, f⟩.
        is used to define distances and angles between elements.

   Theorem (Cauchy-Schwarz inequality)
   For all vectors u and v of a real inner product space (V, ⟨·, ·⟩),
   the following inequality holds:

                                    |⟨u, v⟩| ≤ ‖u‖ ‖v‖.
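
   Both the Cauchy-Schwarz inequality and the triangle inequality of
   the induced norm can be checked numerically for the standard inner
   product on R^n; the random test below is an added illustration, not
   from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
norm = lambda w: np.sqrt(w @ w)   # induced norm: ||w||^2 = <w, w>

for _ in range(1000):
    u, v = rng.normal(size=5), rng.normal(size=5)
    assert abs(u @ v) <= norm(u) * norm(v) + 1e-12    # Cauchy-Schwarz
    assert norm(u + v) <= norm(u) + norm(v) + 1e-12   # triangle inequality
print("Cauchy-Schwarz and triangle inequality hold on all samples")
```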




If you’re not comfortable with any of the presented material, you
should take your favourite textbook and read up on it within the next
two weeks.



