Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.




Gaussians

Andrew W. Moore
Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~awm
awm@cs.cmu.edu
412-268-7599



Copyright © Andrew W. Moore                                                                                    Slide 1




                              Gaussians in Data Mining
    •     Why we should care
    •     The entropy of a PDF
    •     Univariate Gaussians
    •     Multivariate Gaussians
    •     Bayes Rule and Gaussians
    •     Maximum Likelihood and MAP using
          Gaussians


Copyright © Andrew W. Moore                                                                                    Slide 2




Why we should care
    • Gaussians are as natural as Orange Juice
      and Sunshine
    • We need them to understand Bayes Optimal
      Classifiers
    • We need them to understand regression
    • We need them to understand neural nets
    • We need them to understand mixture
      models
    •…
                                (You get the idea)
Copyright © Andrew W. Moore                                            Slide 3




The "box" distribution

$$p(x) = \begin{cases} 1/w & \text{if } |x| \le w/2 \\ 0 & \text{if } |x| > w/2 \end{cases}$$

[Figure: a uniform density of height 1/w on the interval (−w/2, w/2).]




Copyright © Andrew W. Moore                                            Slide 4




The "box" distribution

$$p(x) = \begin{cases} 1/w & \text{if } |x| \le w/2 \\ 0 & \text{if } |x| > w/2 \end{cases}$$

[Figure: the same uniform density of height 1/w on (−w/2, w/2).]

$$E[X] = 0 \qquad \mathrm{Var}[X] = \frac{w^2}{12}$$
Copyright © Andrew W. Moore                                                              Slide 5




Entropy of a PDF

$$\text{Entropy of } X = H[X] = -\int_{x=-\infty}^{\infty} p(x) \log p(x)\, dx$$

(Natural log, i.e. ln or log_e.)

The larger the entropy of a distribution…
…the harder it is to predict
…the harder it is to compress it
…the less spiky the distribution



Copyright © Andrew W. Moore                                                              Slide 6
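As a sanity check, the entropy integral is easy to evaluate numerically. A minimal sketch in Python (my addition, not part of the slides; assumes scipy is available and that the density's support is known):

```python
import numpy as np
from scipy.integrate import quad

def differential_entropy(p, lo, hi):
    """H[X] = -integral of p(x) ln p(x) dx over the support [lo, hi]."""
    integrand = lambda x: -p(x) * np.log(p(x)) if p(x) > 0 else 0.0
    return quad(integrand, lo, hi)[0]

# Example: a box of width w = 2 has entropy ln(2) ~ 0.693 (see the next slide)
print(differential_entropy(lambda x: 0.5 if abs(x) <= 1 else 0.0, -1, 1))
```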




The "box" distribution

$$p(x) = \begin{cases} 1/w & \text{if } |x| \le w/2 \\ 0 & \text{if } |x| > w/2 \end{cases}$$

[Figure: the uniform density of height 1/w on (−w/2, w/2).]

$$H[X] = -\int_{x=-\infty}^{\infty} p(x) \log p(x)\, dx = -\int_{x=-w/2}^{w/2} \frac{1}{w} \log \frac{1}{w}\, dx = -\frac{1}{w} \log \frac{1}{w} \int_{x=-w/2}^{w/2} dx = \log w$$


Copyright © Andrew W. Moore                                                                     Slide 7




Unit variance box distribution

$$p(x) = \begin{cases} 1/w & \text{if } |x| \le w/2 \\ 0 & \text{if } |x| > w/2 \end{cases}$$

$$E[X] = 0 \qquad \mathrm{Var}[X] = \frac{w^2}{12}$$

[Figure: the box density of height $1/(2\sqrt{3})$ on $(-\sqrt{3}, \sqrt{3})$.]

If $w = 2\sqrt{3}$ then $\mathrm{Var}[X] = 1$ and $H[X] = 1.242$.

Copyright © Andrew W. Moore                                                                     Slide 8




The Hat distribution

$$p(x) = \begin{cases} \dfrac{w - |x|}{w^2} & \text{if } |x| \le w \\ 0 & \text{if } |x| > w \end{cases}$$

$$E[X] = 0 \qquad \mathrm{Var}[X] = \frac{w^2}{6}$$

[Figure: a triangular density with peak height 1/w at x = 0, falling to 0 at x = ±w.]




Copyright © Andrew W. Moore                                         Slide 9




Unit variance hat distribution

$$p(x) = \begin{cases} \dfrac{w - |x|}{w^2} & \text{if } |x| \le w \\ 0 & \text{if } |x| > w \end{cases}$$

$$E[X] = 0 \qquad \mathrm{Var}[X] = \frac{w^2}{6}$$

[Figure: the triangular density with peak height $1/\sqrt{6}$ on $(-\sqrt{6}, \sqrt{6})$.]

If $w = \sqrt{6}$ then $\mathrm{Var}[X] = 1$ and $H[X] = 1.396$.

Copyright © Andrew W. Moore                                        Slide 10




The "2 spikes" distribution (Dirac deltas)

$$p(x) = \frac{\delta(x = -1) + \delta(x = 1)}{2}$$

$$E[X] = 0 \qquad \mathrm{Var}[X] = 1$$

[Figure: two spikes, $\tfrac{1}{2}\delta(x = -1)$ and $\tfrac{1}{2}\delta(x = 1)$.]

$$H[X] = -\int_{x=-\infty}^{\infty} p(x) \log p(x)\, dx = -\infty$$


Copyright © Andrew W. Moore                                                                                 Slide 11




Entropies of unit-variance distributions

    Distribution    Entropy
    Box             1.242
    Hat             1.396
    2 spikes        −∞
    ???             1.4189   ← the largest possible entropy of any unit-variance distribution



Copyright © Andrew W. Moore                                                                                 Slide 12
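The finite table entries can be reproduced by numerical integration (the 2-spikes row is omitted since its entropy diverges). A quick check of my own, using the unit-variance widths derived on the earlier slides:

```python
import numpy as np
from scipy.integrate import quad

def H(p, lo, hi):
    """Differential entropy of density p with support [lo, hi], natural log."""
    f = lambda x: -p(x) * np.log(p(x)) if p(x) > 0 else 0.0
    return quad(f, lo, hi, limit=200)[0]

w = 2 * np.sqrt(3)   # unit-variance box
print(H(lambda x: 1/w if abs(x) <= w/2 else 0.0, -w/2, w/2))            # ~1.2425

v = np.sqrt(6)       # unit-variance hat
print(H(lambda x: (v - abs(x)) / v**2 if abs(x) <= v else 0.0, -v, v))  # ~1.3959

print(H(lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi), -12, 12))     # ~1.4189
```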




Unit variance Gaussian

$$p(x) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right)$$

$$E[X] = 0 \qquad \mathrm{Var}[X] = 1$$

$$H[X] = -\int_{x=-\infty}^{\infty} p(x) \log p(x)\, dx = 1.4189$$


Copyright © Andrew W. Moore                                                                 Slide 13




General Gaussian

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

$$E[X] = \mu \qquad \mathrm{Var}[X] = \sigma^2$$

[Figure: a bell curve with μ = 100, σ = 15.]




Copyright © Andrew W. Moore                                                                 Slide 14




General Gaussian    (also known as the normal distribution or bell-shaped curve)

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

$$E[X] = \mu \qquad \mathrm{Var}[X] = \sigma^2$$

[Figure: the same bell curve with μ = 100, σ = 15.]

Shorthand: We say X ~ N(μ, σ²) to mean "X is distributed as a Gaussian with parameters μ and σ²".

In the above figure, X ~ N(100, 15²).


Copyright © Andrew W. Moore                                                                    Slide 15




                                        The Error Function
            Assume X ~ N(0,1)
            Define ERF(x) = P(X<x) = Cumulative Distribution of X

$$\mathrm{ERF}(x) = \int_{z=-\infty}^{x} p(z)\, dz = \frac{1}{\sqrt{2\pi}} \int_{z=-\infty}^{x} \exp\!\left(-\frac{z^2}{2}\right) dz$$




Copyright © Andrew W. Moore                                                                    Slide 16




Using The Error Function

Assume X ~ N(μ, σ²).

$$P(X < x \mid \mu, \sigma^2) = \mathrm{ERF}\!\left(\frac{x - \mu}{\sigma}\right)$$




Copyright © Andrew W. Moore                                   Slide 17
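In code, the slides' ERF (the standard normal CDF) can be built from the math library's erf, using the identity Φ(z) = ½(1 + erf(z/√2)). A minimal sketch of mine, under that convention:

```python
from math import erf, sqrt

def ERF(z):
    """Standard normal CDF: the slides' ERF(z) = P(Z < z) for Z ~ N(0, 1)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def gaussian_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2) = ERF((x - mu) / sigma)."""
    return ERF((x - mu) / sigma)

print(gaussian_cdf(130, 100, 15))   # ~0.977
```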




The Central Limit Theorem

• If (X₁, X₂, …, Xₙ) are i.i.d. continuous random variables
• Then define $z = f(x_1, x_2, \ldots, x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i$
• As n → ∞, p(z) → a Gaussian with mean E[Xᵢ] and variance Var[Xᵢ]/n

Somewhat of a justification for assuming Gaussian noise is common.
Copyright © Andrew W. Moore                                   Slide 18
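A small simulation illustrates the claim. This sketch (mine, not from the slides) uses Uniform(0,1) draws, for which E[Xᵢ] = 1/2 and Var[Xᵢ] = 1/12:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 200_000
x = rng.uniform(0.0, 1.0, size=(trials, n))   # i.i.d. Uniform(0,1) samples
z = x.mean(axis=1)                            # one sample mean per trial

print(z.mean())   # ~0.5, i.e. E[X_i]
print(z.var())    # ~1/(12*50) ~ 0.00167, i.e. Var[X_i]/n
# A histogram of z is already visually indistinguishable from a Gaussian.
```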




Other amazing facts about
                           Gaussians
    • Wouldn’t you like to know?




    • We will not examine them until we need to.




Copyright © Andrew W. Moore                                                                      Slide 19




Bivariate Gaussians

Write r.v. $\mathbf{X} = \begin{pmatrix} X \\ Y \end{pmatrix}$. Then define $\mathbf{X} \sim N(\boldsymbol{\mu}, \Sigma)$ to mean

$$p(\mathbf{x}) = \frac{1}{2\pi \|\Sigma\|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

Where the Gaussian's parameters are…

$$\boldsymbol{\mu} = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix} \qquad \Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}$$

Where we insist that Σ is symmetric non-negative definite.




Copyright © Andrew W. Moore                                                                      Slide 20




Bivariate Gaussians

Write r.v. $\mathbf{X} = \begin{pmatrix} X \\ Y \end{pmatrix}$. Then define $\mathbf{X} \sim N(\boldsymbol{\mu}, \Sigma)$ to mean

$$p(\mathbf{x}) = \frac{1}{2\pi \|\Sigma\|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

Where the Gaussian's parameters are…

$$\boldsymbol{\mu} = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix} \qquad \Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}$$

Where we insist that Σ is symmetric non-negative definite.

It turns out that E[X] = μ and Cov[X] = Σ. (Note that this is a resulting property of Gaussians, not a definition.)*

*This note rates 7.4 on the pedanticness scale


Copyright © Andrew W. Moore                                                                                         Slide 21




Evaluating p(x): Step 1

$$p(\mathbf{x}) = \frac{1}{2\pi \|\Sigma\|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

1. Begin with vector x

[Figure: the point x and the mean μ in the plane.]




Copyright © Andrew W. Moore                                                                                         Slide 22




Evaluating p(x): Step 2

$$p(\mathbf{x}) = \frac{1}{2\pi \|\Sigma\|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

1. Begin with vector x
2. Define δ = x − μ

[Figure: the vector δ pointing from μ to x.]




Copyright © Andrew W. Moore                                                                               Slide 23




Evaluating p(x): Step 3

$$p(\mathbf{x}) = \frac{1}{2\pi \|\Sigma\|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

1. Begin with vector x
2. Define δ = x − μ
3. Count the number of contours crossed of the ellipsoids formed by Σ⁻¹. D = this count = $\sqrt{\delta^T \Sigma^{-1} \delta}$ = the Mahalanobis distance between x and μ.

[Figure: x, μ, and δ, with ellipsoidal contours defined by $\sqrt{\delta^T \Sigma^{-1} \delta} = \text{constant}$.]




Copyright © Andrew W. Moore                                                                               Slide 24




Evaluating p(x): Step 4

$$p(\mathbf{x}) = \frac{1}{2\pi \|\Sigma\|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

1. Begin with vector x
2. Define δ = x − μ
3. Count the number of contours crossed of the ellipsoids formed by Σ⁻¹. D = this count = $\sqrt{\delta^T \Sigma^{-1} \delta}$ = the Mahalanobis distance between x and μ.
4. Define $w = \exp(-D^2/2)$

[Figure: $w = \exp(-D^2/2)$ plotted against $D^2$: x close to μ in squared Mahalanobis distance gets a large weight; far away gets a tiny weight.]

Copyright © Andrew W. Moore                                                                                     Slide 25




Evaluating p(x): Step 5

$$p(\mathbf{x}) = \frac{1}{2\pi \|\Sigma\|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

1. Begin with vector x
2. Define δ = x − μ
3. Count the number of contours crossed of the ellipsoids formed by Σ⁻¹. D = this count = $\sqrt{\delta^T \Sigma^{-1} \delta}$ = the Mahalanobis distance between x and μ.
4. Define $w = \exp(-D^2/2)$
5. Multiply w by $\frac{1}{2\pi \|\Sigma\|^{1/2}}$ to ensure $\int p(\mathbf{x})\, d\mathbf{x} = 1$.




Copyright © Andrew W. Moore                                                                                     Slide 26
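The five steps map directly onto code. A minimal sketch of mine, written for a general m-dimensional Gaussian (anticipating Slide 32; for m = 2 the normalizer reduces to the 2π‖Σ‖^½ above), with scipy only as a cross-check:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    delta = x - mu                              # Step 2: delta = x - mu
    D2 = delta @ np.linalg.solve(Sigma, delta)  # Step 3: D^2 = delta^T Sigma^-1 delta
    w = np.exp(-D2 / 2.0)                       # Step 4: w = exp(-D^2 / 2)
    m = len(mu)                                 # Step 5: normalize so p integrates to 1
    return w / ((2 * np.pi) ** (m / 2) * np.sqrt(np.linalg.det(Sigma)))

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([1.0, -0.5])
print(gaussian_pdf(x, mu, Sigma))
print(multivariate_normal(mu, Sigma).pdf(x))    # agrees
```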




Example

[Figure: contour plot of a bivariate Gaussian density.]

Observe: the mean, the principal axes, the implication of the off-diagonal covariance term, and the max gradient zone of p(x).

Common convention: show the contour corresponding to 2 standard deviations from the mean.

Copyright © Andrew W. Moore                                            Slide 27




                               Example




Copyright © Andrew W. Moore                                            Slide 28




Example




       In this example, x and y are almost independent




Copyright © Andrew W. Moore                                       Slide 29




                                 Example




       In this example, x and “x+y” are clearly not independent




Copyright © Andrew W. Moore                                       Slide 30




Example




       In this example, x and “20x+y” are clearly not independent




Copyright © Andrew W. Moore                                                                                     Slide 31




Multivariate Gaussians

Write r.v. $\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}$. Then define $\mathbf{X} \sim N(\boldsymbol{\mu}, \Sigma)$ to mean

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{m/2} \|\Sigma\|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

Where the Gaussian's parameters have…

$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_m \end{pmatrix} \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix}$$

Where we insist that Σ is symmetric non-negative definite.
Again, E[X] = μ and Cov[X] = Σ. (Note that this is a resulting property of Gaussians, not a definition.)

Copyright © Andrew W. Moore                                                                                     Slide 32




General Gaussians

$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_m \end{pmatrix} \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix}$$

[Figure: arbitrarily oriented elliptical contours in the (x₁, x₂) plane.]
Copyright © Andrew W. Moore                                                                    Slide 33




Axis-Aligned Gaussians

$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_m \end{pmatrix} \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & 0 & 0 & \cdots & 0 & 0 \\ 0 & \sigma_2^2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \sigma_3^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma_{m-1}^2 & 0 \\ 0 & 0 & 0 & \cdots & 0 & \sigma_m^2 \end{pmatrix}$$

$X_i \perp X_j$ for $i \ne j$

[Figure: axis-aligned elliptical contours in the (x₁, x₂) plane.]
Copyright © Andrew W. Moore                                                                    Slide 34




Spherical Gaussians

$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_m \end{pmatrix} \qquad \Sigma = \begin{pmatrix} \sigma^2 & 0 & 0 & \cdots & 0 & 0 \\ 0 & \sigma^2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \sigma^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2 & 0 \\ 0 & 0 & 0 & \cdots & 0 & \sigma^2 \end{pmatrix}$$

$X_i \perp X_j$ for $i \ne j$

[Figure: circular contours in the (x₁, x₂) plane.]
Copyright © Andrew W. Moore                                                                        Slide 35




Degenerate Gaussians

$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_m \end{pmatrix} \qquad \|\Sigma\| = 0$$

[Figure: the contours collapse onto a line in the (x₁, x₂) plane. Thought bubble: "What's so wrong with clipping one's toenails in public?"]
Copyright © Andrew W. Moore                                                                        Slide 36




Where are we now?
    • We’ve seen the formulae for Gaussians
    • We have an intuition of how they behave
    • We have some experience of “reading” a
      Gaussian’s covariance matrix

    • Coming next:
                                Some useful tricks with Gaussians


Copyright © Andrew W. Moore                                               Slide 37




Subsets of variables

Write $\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}$ as $\mathbf{X} = \begin{pmatrix} \mathbf{U} \\ \mathbf{V} \end{pmatrix}$ where $\mathbf{U} = \begin{pmatrix} X_1 \\ \vdots \\ X_{m(u)} \end{pmatrix}$ and $\mathbf{V} = \begin{pmatrix} X_{m(u)+1} \\ \vdots \\ X_m \end{pmatrix}$

This will be our standard notation for breaking an m-dimensional distribution into subsets of variables.



Copyright © Andrew W. Moore                                               Slide 38




Gaussian Marginals are Gaussian    [Diagram: (U; V) → marginalize → U]

Write $\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}$ as $\mathbf{X} = \begin{pmatrix} \mathbf{U} \\ \mathbf{V} \end{pmatrix}$ where $\mathbf{U} = \begin{pmatrix} X_1 \\ \vdots \\ X_{m(u)} \end{pmatrix}$, $\mathbf{V} = \begin{pmatrix} X_{m(u)+1} \\ \vdots \\ X_m \end{pmatrix}$

IF $\begin{pmatrix} \mathbf{U} \\ \mathbf{V} \end{pmatrix} \sim N\!\left( \begin{pmatrix} \boldsymbol{\mu}_u \\ \boldsymbol{\mu}_v \end{pmatrix}, \begin{pmatrix} \Sigma_{uu} & \Sigma_{uv} \\ \Sigma_{uv}^T & \Sigma_{vv} \end{pmatrix} \right)$

THEN U is also distributed as a Gaussian:

$$\mathbf{U} \sim N(\boldsymbol{\mu}_u, \Sigma_{uu})$$
Copyright © Andrew W. Moore                                                   Slide 39
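In code, marginalizing a Gaussian costs nothing: slice out U's entries of μ and U's rows and columns of Σ. A trivial sketch (mine, not from the slides):

```python
import numpy as np

mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

u = [0, 1]                        # U = (X1, X2); V = (X3,)
mu_u = mu[u]                      # mu_u
Sigma_uu = Sigma[np.ix_(u, u)]    # Sigma_uu
# U ~ N(mu_u, Sigma_uu); no integral over v is ever computed.
```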




Gaussian Marginals are Gaussian    [Diagram: (U; V) → marginalize → U]

Write $\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}$ as $\mathbf{X} = \begin{pmatrix} \mathbf{U} \\ \mathbf{V} \end{pmatrix}$ where $\mathbf{U} = \begin{pmatrix} X_1 \\ \vdots \\ X_{m(u)} \end{pmatrix}$, $\mathbf{V} = \begin{pmatrix} X_{m(u)+1} \\ \vdots \\ X_m \end{pmatrix}$

IF $\begin{pmatrix} \mathbf{U} \\ \mathbf{V} \end{pmatrix} \sim N\!\left( \begin{pmatrix} \boldsymbol{\mu}_u \\ \boldsymbol{\mu}_v \end{pmatrix}, \begin{pmatrix} \Sigma_{uu} & \Sigma_{uv} \\ \Sigma_{uv}^T & \Sigma_{vv} \end{pmatrix} \right)$

THEN U is also distributed as a Gaussian:

$$\mathbf{U} \sim N(\boldsymbol{\mu}_u, \Sigma_{uu})$$

This fact is not immediately obvious. Obvious, once we know it's a Gaussian (why?)

Copyright © Andrew W. Moore                                                   Slide 40




Gaussian Marginals are Gaussian    [Diagram: (U; V) → marginalize → U]

Write $\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}$ as $\mathbf{X} = \begin{pmatrix} \mathbf{U} \\ \mathbf{V} \end{pmatrix}$ where $\mathbf{U} = \begin{pmatrix} X_1 \\ \vdots \\ X_{m(u)} \end{pmatrix}$, $\mathbf{V} = \begin{pmatrix} X_{m(u)+1} \\ \vdots \\ X_m \end{pmatrix}$

IF $\begin{pmatrix} \mathbf{U} \\ \mathbf{V} \end{pmatrix} \sim N\!\left( \begin{pmatrix} \boldsymbol{\mu}_u \\ \boldsymbol{\mu}_v \end{pmatrix}, \begin{pmatrix} \Sigma_{uu} & \Sigma_{uv} \\ \Sigma_{uv}^T & \Sigma_{vv} \end{pmatrix} \right)$

THEN U is also distributed as a Gaussian:

$$\mathbf{U} \sim N(\boldsymbol{\mu}_u, \Sigma_{uu})$$

How would you prove this?

$$p(\mathbf{u}) = \int_{\mathbf{v}} p(\mathbf{u}, \mathbf{v})\, d\mathbf{v} = \ldots \text{(snore...)}$$

Copyright © Andrew W. Moore                                                              Slide 41




Linear Transforms remain Gaussian    [Diagram: X → multiply by matrix A → AX]

Assume X is an m-dimensional Gaussian r.v.:

$$\mathbf{X} \sim N(\boldsymbol{\mu}, \Sigma)$$

Define Y to be a p-dimensional r.v. thusly (note p ≤ m):

$$\mathbf{Y} = A\mathbf{X}$$

…where A is a p × m matrix. Then…

$$\mathbf{Y} \sim N(A\boldsymbol{\mu}, A\Sigma A^T)$$

Note: the "subset" result is a special case of this result.


Copyright © Andrew W. Moore                                                              Slide 42
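A Monte Carlo check of this fact (my own sketch, with an arbitrary 2 × 3 matrix A, so p = 2 and m = 3):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.4],
                  [0.0, 0.4, 1.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])                # p = 2, m = 3

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T                                     # each row is A x

print(Y.mean(axis=0), A @ mu)                   # empirical mean vs A mu
print(np.cov(Y.T))                              # empirical covariance...
print(A @ Sigma @ A.T)                          # ...matches A Sigma A^T
```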




Adding samples of 2 independent Gaussians is Gaussian    [Diagram: X, Y → + → X+Y]

If $\mathbf{X} \sim N(\boldsymbol{\mu}_x, \Sigma_x)$ and $\mathbf{Y} \sim N(\boldsymbol{\mu}_y, \Sigma_y)$ and $\mathbf{X} \perp \mathbf{Y}$
then $\mathbf{X} + \mathbf{Y} \sim N(\boldsymbol{\mu}_x + \boldsymbol{\mu}_y, \Sigma_x + \Sigma_y)$

Why doesn't this hold if X and Y are dependent?
Which of the below statements is true?
  • If X and Y are dependent, then X+Y is Gaussian but possibly with some other covariance
  • If X and Y are dependent, then X+Y might be non-Gaussian


Copyright © Andrew W. Moore                                                             Slide 43




Conditional of Gaussian is Gaussian    [Diagram: (U; V) → conditionalize → U | V]

IF $\begin{pmatrix} \mathbf{U} \\ \mathbf{V} \end{pmatrix} \sim N\!\left( \begin{pmatrix} \boldsymbol{\mu}_u \\ \boldsymbol{\mu}_v \end{pmatrix}, \begin{pmatrix} \Sigma_{uu} & \Sigma_{uv} \\ \Sigma_{uv}^T & \Sigma_{vv} \end{pmatrix} \right)$

THEN $\mathbf{U} \mid \mathbf{V} \sim N(\boldsymbol{\mu}_{u|v}, \Sigma_{u|v})$ where

$$\boldsymbol{\mu}_{u|v} = \boldsymbol{\mu}_u + \Sigma_{uv} \Sigma_{vv}^{-1} (\mathbf{V} - \boldsymbol{\mu}_v)$$

$$\Sigma_{u|v} = \Sigma_{uu} - \Sigma_{uv} \Sigma_{vv}^{-1} \Sigma_{uv}^T$$




Copyright © Andrew W. Moore                                                             Slide 44
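The two formulas implement in a few lines. A minimal sketch (mine), where S_uv denotes the upper-right block Σ_uv = Cov(U, V):

```python
import numpy as np

def condition(mu_u, mu_v, S_uu, S_uv, S_vv, v):
    """Mean and covariance of U | V = v for jointly Gaussian (U, V)."""
    K = S_uv @ np.linalg.inv(S_vv)      # "gain" Sigma_uv Sigma_vv^-1
    mu_cond = mu_u + K @ (v - mu_v)     # mu_u|v
    S_cond = S_uu - K @ S_uv.T          # Sigma_uu - Sigma_uv Sigma_vv^-1 Sigma_uv^T
    return mu_cond, S_cond
```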




Worked example. IF $\begin{pmatrix} w \\ y \end{pmatrix} \sim N\!\left( \begin{pmatrix} 2977 \\ 76 \end{pmatrix}, \begin{pmatrix} 849^2 & -967 \\ -967 & 3.68^2 \end{pmatrix} \right)$

THEN $w \mid y \sim N(\mu_{w|y}, \Sigma_{w|y})$ where

$$\mu_{w|y} = 2977 - \frac{967\,(y - 76)}{3.68^2} \qquad \Sigma_{w|y} = 849^2 - \frac{967^2}{3.68^2} \approx 808^2$$

(These follow directly from the general formulas on the previous slide with $\Sigma_{uv} = -967$ and $\Sigma_{vv} = 3.68^2$.)




Copyright © Andrew W. Moore                                                                                    Slide 45
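Plugging the numbers in confirms the arithmetic (a quick check of my own):

```python
import numpy as np

S_uu, S_uv, S_vv = 849.0**2, -967.0, 3.68**2   # blocks of the joint covariance

def mu_w_given_y(y):
    return 2977 + S_uv / S_vv * (y - 76)       # = 2977 - 967 (y - 76) / 3.68^2

var_w_given_y = S_uu - S_uv**2 / S_vv          # = 849^2 - 967^2 / 3.68^2

print(mu_w_given_y(76))                        # 2977.0: conditioning on y = mu_y leaves the mean
print(np.sqrt(var_w_given_y))                  # ~807.3, i.e. roughly the slide's 808^2
```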




(The same example.) IF $\begin{pmatrix} w \\ y \end{pmatrix} \sim N\!\left( \begin{pmatrix} 2977 \\ 76 \end{pmatrix}, \begin{pmatrix} 849^2 & -967 \\ -967 & 3.68^2 \end{pmatrix} \right)$

THEN $w \mid y \sim N(\mu_{w|y}, \Sigma_{w|y})$ where

$$\mu_{w|y} = 2977 - \frac{967\,(y - 76)}{3.68^2} \qquad \Sigma_{w|y} = 849^2 - \frac{967^2}{3.68^2} \approx 808^2$$

[Figure: the broad marginal density P(w), and the two much narrower conditional densities P(w|m=76) and P(w|m=82), whose means shift with the conditioning value.]

Copyright © Andrew W. Moore                                                                                    Slide 46




(The same example again, with observations.)

$$\mu_{w|y} = 2977 - \frac{967\,(y - 76)}{3.68^2} \qquad \Sigma_{w|y} = 849^2 - \frac{967^2}{3.68^2} \approx 808^2$$

Note: when the given value of v is μ_v, the conditional mean of u|v is μ_u.
Note: the conditional mean is a linear function of v.
Note: the conditional variance is independent of the given value of v.
Note: the conditional variance can only be equal to or smaller than the marginal variance.

[Figure: P(w), P(w|m=76), and P(w|m=82), as on the previous slide.]

Copyright © Andrew W. Moore                                                                                  Slide 47




Gaussians and the chain rule    [Diagram: U|V and V → chain rule → (U; V)]

Let A be a constant matrix.

IF $\mathbf{U} \mid \mathbf{V} \sim N(A\mathbf{V}, \Sigma_{u|v})$ and $\mathbf{V} \sim N(\boldsymbol{\mu}_v, \Sigma_{vv})$

THEN $\begin{pmatrix} \mathbf{U} \\ \mathbf{V} \end{pmatrix} \sim N(\boldsymbol{\mu}, \Sigma)$, with

$$\boldsymbol{\mu} = \begin{pmatrix} A\boldsymbol{\mu}_v \\ \boldsymbol{\mu}_v \end{pmatrix} \qquad \Sigma = \begin{pmatrix} A\Sigma_{vv}A^T + \Sigma_{u|v} & A\Sigma_{vv} \\ (A\Sigma_{vv})^T & \Sigma_{vv} \end{pmatrix}$$




Copyright © Andrew W. Moore                                                                                  Slide 48
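The chain-rule construction also implements directly. A sketch of mine that rebuilds the joint parameters from V's marginal and the conditional:

```python
import numpy as np

def joint_from_chain(A, S_ugv, mu_v, S_vv):
    """Joint N(mu, Sigma) of (U, V), given U|V ~ N(A V, S_ugv) and V ~ N(mu_v, S_vv)."""
    mu = np.concatenate([A @ mu_v, mu_v])
    top = np.hstack([A @ S_vv @ A.T + S_ugv, A @ S_vv])
    bot = np.hstack([(A @ S_vv).T, S_vv])
    return mu, np.vstack([top, bot])

A = np.array([[0.5, 1.0]])                    # U is 1-D, V is 2-D
S_ugv = np.array([[0.1]])
mu_v = np.array([1.0, -1.0])
S_vv = np.array([[1.0, 0.2], [0.2, 2.0]])
print(joint_from_chain(A, S_ugv, mu_v, S_vv))
```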




Available Gaussian tools

• Marginalize: IF $\begin{pmatrix} \mathbf{U} \\ \mathbf{V} \end{pmatrix} \sim N\!\left( \begin{pmatrix} \boldsymbol{\mu}_u \\ \boldsymbol{\mu}_v \end{pmatrix}, \begin{pmatrix} \Sigma_{uu} & \Sigma_{uv} \\ \Sigma_{uv}^T & \Sigma_{vv} \end{pmatrix} \right)$ THEN $\mathbf{U} \sim N(\boldsymbol{\mu}_u, \Sigma_{uu})$

• Multiply by matrix A: IF $\mathbf{X} \sim N(\boldsymbol{\mu}, \Sigma)$ AND $\mathbf{Y} = A\mathbf{X}$ THEN $\mathbf{Y} \sim N(A\boldsymbol{\mu}, A\Sigma A^T)$

• Add independent Gaussians: if $\mathbf{X} \sim N(\boldsymbol{\mu}_x, \Sigma_x)$ and $\mathbf{Y} \sim N(\boldsymbol{\mu}_y, \Sigma_y)$ and $\mathbf{X} \perp \mathbf{Y}$ then $\mathbf{X} + \mathbf{Y} \sim N(\boldsymbol{\mu}_x + \boldsymbol{\mu}_y, \Sigma_x + \Sigma_y)$

• Conditionalize: IF $\begin{pmatrix} \mathbf{U} \\ \mathbf{V} \end{pmatrix} \sim N\!\left( \begin{pmatrix} \boldsymbol{\mu}_u \\ \boldsymbol{\mu}_v \end{pmatrix}, \begin{pmatrix} \Sigma_{uu} & \Sigma_{uv} \\ \Sigma_{uv}^T & \Sigma_{vv} \end{pmatrix} \right)$ THEN $\mathbf{U} \mid \mathbf{V} \sim N(\boldsymbol{\mu}_{u|v}, \Sigma_{u|v})$ where $\boldsymbol{\mu}_{u|v} = \boldsymbol{\mu}_u + \Sigma_{uv}\Sigma_{vv}^{-1}(\mathbf{V} - \boldsymbol{\mu}_v)$ and $\Sigma_{u|v} = \Sigma_{uu} - \Sigma_{uv}\Sigma_{vv}^{-1}\Sigma_{uv}^T$

• Chain rule: IF $\mathbf{U} \mid \mathbf{V} \sim N(A\mathbf{V}, \Sigma_{u|v})$ and $\mathbf{V} \sim N(\boldsymbol{\mu}_v, \Sigma_{vv})$ THEN $\begin{pmatrix} \mathbf{U} \\ \mathbf{V} \end{pmatrix} \sim N(\boldsymbol{\mu}, \Sigma)$ with $\boldsymbol{\mu} = \begin{pmatrix} A\boldsymbol{\mu}_v \\ \boldsymbol{\mu}_v \end{pmatrix}$ and $\Sigma = \begin{pmatrix} A\Sigma_{vv}A^T + \Sigma_{u|v} & A\Sigma_{vv} \\ (A\Sigma_{vv})^T & \Sigma_{vv} \end{pmatrix}$




Copyright © Andrew W. Moore                                                                                        Slide 49
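A small sketch exercising the Marginalize and Conditionalize tools on a concrete bivariate Gaussian (the numbers below are illustrative assumptions, not taken from the slides):

```python
# Marginalize and Conditionalize on a concrete 2-d Gaussian (sketch).
import numpy as np

mu = np.array([0.0, 5.0])          # (mu_u, mu_v)
Sigma = np.array([[4.0, 1.2],      # [[Sigma_uu, Sigma_uv],
                  [1.2, 2.0]])     #  [Sigma_uv, Sigma_vv]]

# Marginalize: U ~ N(mu_u, Sigma_uu) -- just read off the u-block
mu_u, Sigma_uu = mu[0], Sigma[0, 0]

# Conditionalize on an observed v
v = 6.0
mu_u_given_v = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (v - mu[1])
var_u_given_v = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]
print(mu_u, Sigma_uu)               # 0.0 4.0
print(mu_u_given_v, var_u_given_v)  # 0.6 3.28
```

Note how marginalizing is free (read off a block) while conditioning shifts the mean toward the evidence and can only shrink the variance, as the earlier slides observed.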




                                        Assume…
    • You are an intellectual snob
    • You have a child




Copyright © Andrew W. Moore                                                                                        Slide 50




Intellectual snobs with children
    • …are obsessed with IQ
• In the world as a whole, IQs are drawn from
  a Gaussian N(100, 15²)




Copyright © Andrew W. Moore                            Slide 51




                                    IQ tests
• If you take an IQ test you’ll get a score that,
  on average (over many tests), will be your IQ
• But because of noise on any one test, the
  score will often be a few points lower or
  higher than your true IQ.

SCORE | IQ ~ N(IQ, 10²)


Copyright © Andrew W. Moore                            Slide 52
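A quick simulation of this test-noise model for one fixed true IQ (a sketch assuming numpy; the seed, the chosen IQ of 120, and the number of repeated tests are arbitrary):

```python
# Simulating SCORE | IQ ~ N(IQ, 10^2) for a fixed true IQ (sketch).
import numpy as np

rng = np.random.default_rng(1)
true_iq = 120.0
scores = true_iq + rng.normal(0.0, 10.0, size=100_000)
print(scores.mean())   # ~120: averaged over many tests, the score is the IQ
print(scores.std())    # ~10:  but any single test is off by a few points
```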




Assume…
    • You drag your kid off to get tested
    • She gets a score of 130
    • “Yippee” you screech and start deciding how
      to casually refer to her membership of the
      top 2% of IQs in your Christmas newsletter.

P(X < 130 | μ=100, σ² = 15²) =
P(Z < 2 | μ=0, σ² = 1) =
ERF(2) = 0.977




Copyright © Andrew W. Moore                                      Slide 53
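Reproducing the 0.977 numerically (a sketch, standard library only). One caution: the slides define ERF(x) as the standard normal CDF, whereas Python's math.erf is the classical error function (for which erf(2) ≈ 0.995); the two are related by Φ(z) = (1 + erf(z/√2))/2.

```python
# P(X < 130 | mu=100, sigma=15) via the standard normal CDF (sketch).
from math import erf, sqrt

z = (130 - 100) / 15                     # standardize: z = 2
phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))   # Phi(2), the slides' ERF(2)
print(round(phi, 3))                     # 0.977
```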




Assume…
• You drag your kid off to get tested
• She gets a score of 130
• “Yippee” you screech and start deciding how
  to casually refer to her membership of the
  top 2% of IQs in your Christmas newsletter.

You are thinking:
“Well sure the test isn’t accurate, so she might
have an IQ of 120 or she might have an IQ of 140,
but the most likely IQ given the evidence
‘score=130’ is, of course, 130.”

P(X < 130 | μ=100, σ² = 15²) =
P(Z < 2 | μ=0, σ² = 1) =
ERF(2) = 0.977

Can we trust this reasoning?
Copyright © Andrew W. Moore                                      Slide 54
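A simulation hints at the answer before the algebra does (a sketch assuming numpy; the seed, population size, and the ±1-point band used to pick out kids who scored “about 130” are arbitrary choices):

```python
# Among kids who score ~130, what is the average TRUE IQ? (sketch)
import numpy as np

rng = np.random.default_rng(2)
n = 2_000_000
iq = rng.normal(100.0, 15.0, size=n)         # IQ ~ N(100, 15^2)
score = iq + rng.normal(0.0, 10.0, size=n)   # SCORE | IQ ~ N(IQ, 10^2)

near_130 = np.abs(score - 130.0) < 1.0
print(iq[near_130].mean())   # ~120.8, not 130: the "of course, 130" fails
```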




Maximum Likelihood IQ
• IQ ~ N(100, 15²)
• S | IQ ~ N(IQ, 10²)
• S = 130

$$IQ^{mle} = \arg\max_{iq} \; p(s = 130 \mid iq)$$

• The MLE is the value of the hidden parameter that
  makes the observed data most likely
• In this case

$$IQ^{mle} = 130$$

Copyright © Andrew W. Moore                                                                Slide 55
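A brute-force check that the likelihood p(s = 130 | iq), viewed as a function of iq, really does peak at iq = 130 (a sketch assuming numpy; the grid range and resolution are arbitrary, and the normalizing constant is dropped since it does not affect the argmax):

```python
# Grid search for the MLE of iq given s = 130 (sketch).
import numpy as np

iq_grid = np.linspace(60.0, 160.0, 2001)
lik = np.exp(-((130.0 - iq_grid) ** 2) / (2.0 * 10.0 ** 2))  # N(iq, 10^2) at 130
print(iq_grid[np.argmax(lik)])   # 130.0
```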




BUT….
• IQ ~ N(100, 15²)
• S | IQ ~ N(IQ, 10²)
• S = 130

$$IQ^{mle} = \arg\max_{iq} \; p(s = 130 \mid iq)$$

• The MLE is the value of the hidden parameter that
  makes the observed data most likely
• In this case

$$IQ^{mle} = 130$$

This is not the same as “The most likely value of
the parameter given the observed data”

Copyright © Andrew W. Moore                                                                Slide 56




What we really want:
• IQ ~ N(100, 15²)
• S | IQ ~ N(IQ, 10²)
    • S=130

    • Question: What is
      IQ | (S=130)?



             Called the Posterior
              Distribution of IQ

Copyright © Andrew W. Moore                                            Slide 57




                              Which tool or tools?
• IQ ~ N(100, 15²)
• S | IQ ~ N(IQ, 10²)
• S = 130

• Question: What is
  IQ | (S=130)?

The candidate tools:
• Marginalize:     (U; V) → U
• Matrix Multiply: X → AX
• Add:             X, Y → X + Y
• Conditionalize:  (U; V) → U | V
• Chain Rule:      U | V and V → (U; V)
Copyright © Andrew W. Moore                                            Slide 58




Plan
• IQ ~ N(100, 15²)
• S | IQ ~ N(IQ, 10²)
• S = 130

• Question: What is
  IQ | (S=130)?

S | IQ and IQ  →[Chain Rule]→  (S; IQ)  →[Swap]→  (IQ; S)  →[Conditionalize]→  IQ | S


Copyright © Andrew W. Moore                                                                                        Slide 59
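Carrying out the first two steps of the plan (a worked sketch: the chain rule applies with A equal to the 1×1 identity, since S | IQ ~ N(IQ, 10²)):

$$\begin{pmatrix} S \\ IQ \end{pmatrix} \sim N\!\left( \begin{pmatrix} 100 \\ 100 \end{pmatrix}, \begin{pmatrix} 15^2 + 10^2 & 15^2 \\ 15^2 & 15^2 \end{pmatrix} \right) = N\!\left( \begin{pmatrix} 100 \\ 100 \end{pmatrix}, \begin{pmatrix} 325 & 225 \\ 225 & 225 \end{pmatrix} \right)$$

The Swap step just reorders the two blocks, after which the Conditionalize tool applies directly.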




Working…

IQ ~ N(100, 15²)
S | IQ ~ N(IQ, 10²)
S = 130
Question: What is IQ | (S=130)?  (The two tools needed are restated here; the worked numbers follow below.)

Conditionalize:
IF $\begin{pmatrix} U \\ V \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_u \\ \mu_v \end{pmatrix}, \begin{pmatrix} \Sigma_{uu} & \Sigma_{uv} \\ \Sigma_{uv}^T & \Sigma_{vv} \end{pmatrix} \right)$
THEN $\mu_{u|v} = \mu_u + \Sigma_{uv}^T \Sigma_{vv}^{-1}(V - \mu_v)$

Chain Rule:
IF $U \mid V \sim N(AV, \Sigma_{u|v})$ and $V \sim N(\mu_v, \Sigma_{vv})$
THEN $\begin{pmatrix} U \\ V \end{pmatrix} \sim N(\mu, \Sigma)$, with $\Sigma = \begin{pmatrix} A\Sigma_{vv}A^T + \Sigma_{u|v} & A\Sigma_{vv} \\ (A\Sigma_{vv})^T & \Sigma_{vv} \end{pmatrix}$




Copyright © Andrew W. Moore                                                                                        Slide 60
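Finishing the working (a sketch of the step the slides leave to the reader): conditionalize the joint from the previous slide with U = IQ and V = S, so Σ_uu = 15² = 225, Σ_uv = 225, Σ_vv = 325:

$$\mu_{IQ|S} = 100 + \frac{225}{325}(130 - 100) \approx 120.8, \qquad \Sigma_{IQ|S} = 225 - \frac{225^2}{325} \approx 69.2 \approx 8.3^2$$

So IQ | (S=130) ~ N(≈120.8, ≈8.3²): the evidence pulls the estimate up from the prior mean of 100, while the prior pulls it down from the raw score of 130.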




Your pride and joy’s posterior IQ
    • If you did the working, you now have
      p(IQ|S=130)
• If you have to give the most likely IQ given
  the score, you should give

$$IQ^{map} = \arg\max_{iq} \; p(iq \mid s = 130)$$



    • where MAP means “Maximum A-posteriori”



Copyright © Andrew W. Moore                                        Slide 61
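Because the posterior here is Gaussian (symmetric and unimodal), the MAP estimate coincides with the posterior mean. A minimal sketch computing it via the standard conjugate-Gaussian update, using the slides' numbers (no libraries needed):

```python
# MAP estimate of IQ given s = 130 under the Gaussian prior/noise model.
prior_mu, prior_var = 100.0, 15.0 ** 2   # IQ ~ N(100, 15^2)
noise_var = 10.0 ** 2                    # S | IQ ~ N(IQ, 10^2)
s = 130.0

# Precision-weighted combination of prior mean and observed score
post_var = prior_var * noise_var / (prior_var + noise_var)
post_mu = post_var * (prior_mu / prior_var + s / noise_var)
print(post_mu, post_var ** 0.5)          # ~120.8 and ~8.3
```

This matches both the worked conditionalization and the earlier simulation: the MAP IQ is about 120.8, not the MLE's 130.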




                              What you should know
• Know the Gaussian PDF formula off by heart
    • Understand the workings of the formula for
      a Gaussian
    • Be able to understand the Gaussian tools
      described so far
    • Have a rough idea of how you could prove
      them
    • Be happy with how you could use them

Copyright © Andrew W. Moore                                        Slide 62




