Slide 1: EE3J2 Data Mining. Lecture 10: Statistical Modelling. Martin Russell.
Slide 2: Objectives
- To review basic statistical modelling
- To review the notion of a probability distribution
- To review the notion of a probability density function
- To introduce mixture densities
- To introduce the multivariate Gaussian density
Slide 3: Discrete variables
- Suppose that Y is a random variable which can take any value in a discrete set X = {x_1, x_2, ..., x_M}.
- Suppose that y_1, y_2, ..., y_N are samples of the random variable Y.
- If c_m is the number of times that y_n = x_m, then an estimate of the probability that Y takes the value x_m is given by:

  P(x_m) = c_m / N
Slide 4: Discrete Probability Mass Function

  Symbol:            1    2    3    4    5    6    7    8    9    Total
  Num. occurrences:  120  231  90   87   63   57   156  203  91   1098
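This counting estimate is straightforward to compute. A minimal Python sketch using the counts from the table above (variable names are ours):

```python
# Estimate the discrete PMF from counts: P(x_m) = c_m / N (slide 3).
counts = {1: 120, 2: 231, 3: 90, 4: 87, 5: 63, 6: 57, 7: 156, 8: 203, 9: 91}

N = sum(counts.values())                       # total number of samples: 1098
pmf = {symbol: c / N for symbol, c in counts.items()}

print(pmf[2])                                  # 231 / 1098, roughly 0.21
```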
Slide 5: Continuous Random Variables
- In most practical applications the data are not restricted to a finite set of values: they can take any value in N-dimensional space.
- Simply counting the number of occurrences of each value is no longer a viable way of estimating probabilities...
- ...but there are generalisations of this approach which are applicable to continuous variables; these are referred to as non-parametric methods.
Slide 6: Continuous Random Variables
- An alternative is to use a parametric model.
- In a parametric model, probabilities are defined by a small set of parameters.
- The simplest example is a normal, or Gaussian, model.
- A Gaussian probability density function (PDF) is defined by two parameters: its mean μ and variance σ².
Slide 7: Gaussian PDF
- The 'standard' 1-dimensional Gaussian PDF has:
  - mean μ = 0
  - variance σ² = 1
Slide 8: Gaussian PDF
[Figure: P(a ≤ x ≤ b) shown as the area under the Gaussian PDF between the points a and b.]
Slide 9: Gaussian PDF
- For a 1-dimensional Gaussian PDF p with mean μ and variance σ²:

  p(y) = (1 / √(2πσ²)) exp(-(y - μ)² / (2σ²))

- The constant 1/√(2πσ²) ensures that the area under the curve is 1; the exponential term defines the 'bell' shape.
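This formula translates directly into code. A minimal sketch (the function name is ours):

```python
import math

def gaussian_pdf(y, mu, var):
    """1-dimensional Gaussian PDF with mean mu and variance var."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)             # constant: area under curve is 1
    return norm * math.exp(-((y - mu) ** 2) / (2.0 * var))  # defines the 'bell' shape

print(gaussian_pdf(0.0, 0.0, 1.0))   # peak of the standard Gaussian, about 0.3989
```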
Slide 10: More examples
[Figure: Gaussian PDFs with variances σ² = 0.1, 1.0, 10.0 and 5.0.]
Slide 11: Fitting a Gaussian PDF to Data
- Suppose y = y_1, ..., y_n, ..., y_N is a set of N data values.
- Given a Gaussian PDF p with mean μ and variance σ², define:

  p(y | μ, σ²) = p(y_1) p(y_2) ... p(y_N)

- How do we choose μ and σ² to maximise this probability?
Slide 12: Fitting a Gaussian PDF to Data
[Figure: two candidate Gaussian PDFs over the same data, one labelled 'Poor fit' and one labelled 'Good fit'.]
Slide 13: Maximum Likelihood Estimation
- Define the best fitting Gaussian to be the one such that p(y | μ, σ²) is maximised.
- Terminology:
  - p(y | μ, σ²), thought of as a function of y, is the probability (density) of y.
  - p(y | μ, σ²), thought of as a function of μ and σ², is the likelihood of μ and σ².
- Maximising p(y | μ, σ²) with respect to μ and σ² is called Maximum Likelihood (ML) estimation of μ and σ².
Slide 14: ML estimation of μ and σ²
- Intuitively:
  - The maximum likelihood estimate of μ should be the average value of y_1, ..., y_N (the sample mean).
  - The maximum likelihood estimate of σ² should be the variance of y_1, ..., y_N (the sample variance).
- This turns out to be true: p(y | μ, σ²) is maximised by setting:

  μ = (1/N) Σ_n y_n,    σ² = (1/N) Σ_n (y_n - μ)²
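In code, ML estimation of a single Gaussian therefore reduces to two averages. A minimal sketch (the data values are purely illustrative):

```python
# ML estimates of mu and sigma^2: the sample mean and sample variance (slide 14).
data = [1.2, 0.7, 2.3, 1.9, 0.4, 1.1]               # illustrative values y_1, ..., y_N

N = len(data)
mu_hat = sum(data) / N                              # sample mean
var_hat = sum((y - mu_hat) ** 2 for y in data) / N  # sample variance (1/N, not 1/(N-1))
print(mu_hat, var_hat)
```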
Slide 15: Multi-modal distributions
- In practice the distributions of many naturally occurring phenomena do not follow the simple bell-shaped Gaussian curve.
- For example, if the data arise from several different sources, there may be several distinct peaks (e.g. the distribution of heights of adults).
- These peaks are the modes of the distribution, and the distribution is called multi-modal.
Slide 16: Gaussian Mixture PDFs
- Gaussian Mixture PDFs, or Gaussian Mixture Models (GMMs), are commonly used to model multi-modal or other non-Gaussian distributions.
- A GMM is just a weighted average of several Gaussian PDFs, called the component PDFs.
- For example, if p_1 and p_2 are Gaussian PDFs, then

  p(y) = w_1 p_1(y) + w_2 p_2(y)

  defines a 2-component Gaussian mixture PDF.
Slide 17: Gaussian Mixture - Example
- 2-component mixture model:
  - Component 1: μ = 0, σ² = 0.1
  - Component 2: μ = 2, σ² = 1
  - w_1 = w_2 = 0.5

Slide 18: Example 2
- 2-component mixture model:
  - Component 1: μ = 0, σ² = 0.1
  - Component 2: μ = 2, σ² = 1
  - w_1 = 0.2, w_2 = 0.8

Slide 19: Example 3
- 2-component mixture model:
  - Component 1: μ = 0, σ² = 0.1
  - Component 2: μ = 2, σ² = 1
  - w_1 = 0.2, w_2 = 0.8
Slide 20: Example 4
- 5-component Gaussian mixture PDF.
Slide 21: Gaussian Mixture Model
- In general, an M-component Gaussian mixture PDF is defined by:

  p(y) = Σ_m w_m p_m(y),  m = 1, ..., M

- where each p_m is a Gaussian PDF and

  Σ_m w_m = 1,  with each w_m ≥ 0
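A mixture density is just this weighted sum. A minimal sketch, evaluated with the 2-component parameters from slide 17 (function names are ours):

```python
import math

def gaussian_pdf(y, mu, var):
    return math.exp(-((y - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gmm_pdf(y, weights, means, variances):
    """p(y) = sum over m of w_m * p_m(y); the weights must sum to 1."""
    return sum(w * gaussian_pdf(y, mu, var)
               for w, mu, var in zip(weights, means, variances))

# 2-component example from slide 17: mu=0, var=0.1 and mu=2, var=1, equal weights.
print(gmm_pdf(0.5, [0.5, 0.5], [0.0, 2.0], [0.1, 1.0]))
```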
Slide 22: Estimating the parameters of a Gaussian mixture model
- A Gaussian Mixture Model with M components has:
  - M means: μ_1, ..., μ_M
  - M variances: σ²_1, ..., σ²_M
  - M mixture weights: w_1, ..., w_M
- Given a set of data y = y_1, ..., y_N, how can we estimate these parameters?
- I.e. how do we find a maximum likelihood estimate of μ_1, ..., μ_M, σ²_1, ..., σ²_M, w_1, ..., w_M?
Slide 23: Parameter Estimation
- If we knew which component each sample y_n came from, then parameter estimation would be easy (a sketch of this case follows below):
  - Set μ_m to be the average value of the samples which belong to the m-th component.
  - Set σ²_m to be the variance of the samples which belong to the m-th component.
  - Set w_m to be the proportion of samples which belong to the m-th component.
- But we don't know which component each sample belongs to.
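To make the 'known labels' case concrete, here is a minimal sketch in which the component assignments are simply given (both the data and the labels are hypothetical, for illustration only):

```python
# Hypothetical case where labels[n] names the component that sample n came from.
data   = [0.1, -0.2, 0.05, 1.8, 2.3, 2.1]   # illustrative samples
labels = [0, 0, 0, 1, 1, 1]                 # illustrative known assignments
M = 2

for m in range(M):
    ys = [y for y, lab in zip(data, labels) if lab == m]
    mu_m  = sum(ys) / len(ys)                           # component mean
    var_m = sum((y - mu_m) ** 2 for y in ys) / len(ys)  # component variance
    w_m   = len(ys) / len(data)                         # mixture weight
    print(m, mu_m, var_m, w_m)
```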
Slide 24: Solution - the E-M algorithm
- Guess initial values for the parameters.
- REPEAT:
  - For each n, calculate the probabilities P(m | y_n). This is a measure of how much y_n 'belongs to' the m-th component.
  - Use these probabilities to estimate how much each sample y_n 'belongs to' the m-th component.
  - Calculate new estimates of μ_m, σ²_m and w_m from these weighted memberships.
Slide 25: The E-M algorithm
[Figure: the likelihood p(y | θ) plotted against the parameter set θ; E-M moves from an initial estimate θ^(0) through successive estimates θ^(i) towards a local optimum.]
Slide 26: E-M Algorithm
- Let's just look at estimation of the mean μ of a single component of a GMM.
- In fact,

  λ_n = P(m | y_n)

- In other words, λ_n is the probability of the m-th component given the data point y_n.
Slide 27: E-M continued
- From Bayes' theorem:

  P(m | y_n) = w_m p_m(y_n) / Σ_k w_k p_k(y_n)

  The numerator is calculated from the m-th Gaussian component and the m-th weight; the denominator is a sum over all components.
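Putting the last two slides together: a sketch of one E-M update of the component means only, where the E-step computes the probabilities P(m | y_n) via Bayes' theorem and the M-step takes probability-weighted averages (function names are ours):

```python
import math

def gaussian_pdf(y, mu, var):
    return math.exp(-((y - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def responsibilities(y, weights, means, variances):
    """P(m | y) for every component m, computed via Bayes' theorem."""
    numer = [w * gaussian_pdf(y, mu, var)
             for w, mu, var in zip(weights, means, variances)]
    total = sum(numer)                       # sum over all components
    return [p / total for p in numer]

def em_update_means(data, weights, means, variances):
    """One E-M update of the component means: probability-weighted averages."""
    lam = [responsibilities(y, weights, means, variances) for y in data]  # lam[n][m]
    new_means = []
    for m in range(len(means)):
        num = sum(lam[n][m] * y for n, y in enumerate(data))
        den = sum(lam[n][m] for n in range(len(data)))
        new_means.append(num / den)
    return new_means
```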
Slide 28: Example - initial model
[Figure: an initial 2-component model with components m_1 and m_2 and a data point y_6, where P(m_1 | y_6) = λ_1 and P(m_2 | y_6) = λ_2.]
Slide 29: Example - after 1st iteration of E-M

Slide 30: Example - after 2nd iteration of E-M

Slide 31: Example - after 4th iteration of E-M

Slide 32: Example - after 10th iteration of E-M
Slide 33: Multivariate Gaussian PDFs
- All PDFs so far have been 1-dimensional: they take scalar values.
- But most real data will be represented as D-dimensional vectors.
- The vector equivalent of a Gaussian PDF is called a multivariate Gaussian PDF.
Slide 34: Multivariate Gaussian PDFs
[Figure: contours of equal probability for a multivariate Gaussian PDF, with the corresponding 1-dimensional Gaussian PDFs.]

Slide 35: Multivariate Gaussian PDFs
[Figure: a further example, again with the corresponding 1-dimensional Gaussian PDFs.]
Slide 36: Multivariate Gaussian PDF
- The parameters of a multivariate Gaussian PDF are:
  - the (vector) mean μ
  - the (vector) variance σ²
  - the covariances
- These are usually gathered into the covariance matrix Σ.
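A minimal sketch of evaluating a multivariate Gaussian PDF with numpy (the parameter values are illustrative):

```python
import numpy as np

def multivariate_gaussian_pdf(y, mu, Sigma):
    """D-dimensional Gaussian PDF with mean vector mu and covariance matrix Sigma."""
    D = len(mu)
    diff = y - mu
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** D * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 0.0])                # illustrative mean vector
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])           # variances on the diagonal, covariances off it
print(multivariate_gaussian_pdf(np.array([0.5, -0.5]), mu, Sigma))
```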
Slide 37: Multivariate Gaussian PDFs
- Multivariate Gaussian PDFs are commonly used in pattern processing and data mining.
- Vector data is often not unimodal, so we use mixtures of multivariate Gaussian PDFs.
- The E-M algorithm works for multivariate Gaussian mixture PDFs.
Slide 38: Summary
- Basic statistical modelling
- Probability distributions
- Probability density functions
- Gaussian PDFs
- Gaussian mixture PDFs and the E-M algorithm
- Multivariate Gaussian PDFs