# EE3J2 Data Mining



1. EE3J2 Data Mining, Lecture 10: Statistical Modelling. Martin Russell.
2. Objectives
    - To review basic statistical modelling
    - To review the notion of a probability distribution
    - To review the notion of a probability density function
    - To introduce mixture densities
    - To introduce the multivariate Gaussian density
3. Discrete variables
    - Suppose that Y is a random variable which can take any value in a discrete set X = {x_1, x_2, …, x_M}
    - Suppose that y_1, y_2, …, y_N are samples of the random variable Y
    - If c_m is the number of times that y_n = x_m, then an estimate of the probability that y_n takes the value x_m is given by: P(x_m) ≈ c_m / N
4. Discrete Probability Mass Function

    | Symbol | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Total |
    |---|---|---|---|---|---|---|---|---|---|---|
    | Num. Occurrences | 120 | 231 | 90 | 87 | 63 | 57 | 156 | 203 | 91 | 1098 |
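The counting estimate c_m / N from the previous slide can be checked directly against this table. A minimal sketch in Python (variable names are illustrative, not from the lecture):

```python
# Estimate a discrete probability mass function by counting occurrences,
# using the symbol counts from the slide's table.
counts = {1: 120, 2: 231, 3: 90, 4: 87, 5: 63, 6: 57, 7: 156, 8: 203, 9: 91}
total = sum(counts.values())                   # N = 1098
pmf = {x: c / total for x, c in counts.items()}

print(round(pmf[2], 4))   # symbol 2 is the most frequent -> 0.2104
```

The estimated probabilities necessarily sum to 1, since each count is divided by the same total.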
5. Continuous Random Variables
    - In most practical applications the data are not restricted to a finite set of values: they can take any value in N-dimensional space
    - Simply counting the number of occurrences of each value is no longer a viable way of estimating probabilities…
    - …but there are generalisations of this approach which are applicable to continuous variables; these are referred to as non-parametric methods
6. Continuous Random Variables
    - An alternative is to use a parametric model
    - In a parametric model, probabilities are defined by a small set of parameters
    - The simplest example is a normal, or Gaussian, model
    - A Gaussian probability density function (PDF) is defined by two parameters: its mean μ and variance σ²
7. Gaussian PDF
    - 'Standard' 1-dimensional Gaussian PDF:
      - mean μ = 0
      - variance σ² = 1
8. Gaussian PDF (figure: the shaded area under the curve between a and b is P(a ≤ x ≤ b))
9. Gaussian PDF
    - For a 1-dimensional Gaussian PDF p with mean μ and variance σ²:

      p(y) = (1 / √(2πσ²)) · exp(−(y − μ)² / (2σ²))

      The constant 1/√(2πσ²) ensures the area under the curve is 1; the exponential term defines the 'bell' shape.
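The two factors in the density are easy to see in code. A minimal sketch (the function name is illustrative):

```python
import math

# 1-D Gaussian density: the 1/sqrt(2*pi*var) factor normalises the area
# under the curve to 1, and the exponential gives the bell shape.
def gaussian_pdf(y, mean=0.0, var=1.0):
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    return norm * math.exp(-((y - mean) ** 2) / (2.0 * var))

# The standard Gaussian (mean 0, variance 1) peaks at 1/sqrt(2*pi)
print(round(gaussian_pdf(0.0), 4))   # -> 0.3989
```

The curve is symmetric about the mean, so gaussian_pdf(μ + d) equals gaussian_pdf(μ − d) for any d.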
10. More examples (figure: Gaussian PDFs with σ² = 0.1, 1.0, 10.0 and 5.0)
11. Fitting a Gaussian PDF to Data
    - Suppose y = y_1, …, y_n, …, y_N is a set of N data values
    - Given a Gaussian PDF p with mean μ and variance σ², define: p(y | μ, σ) = ∏_{n=1}^{N} p(y_n | μ, σ)
    - How do we choose μ and σ to maximise this probability?
12. Fitting a Gaussian PDF to Data (figure: a poorly fitting and a well fitting Gaussian for the same data)
13. Maximum Likelihood Estimation
    - Define the best-fitting Gaussian to be the one such that p(y | μ, σ) is maximised.
    - Terminology:
      - p(y | μ, σ), thought of as a function of y, is the probability (density) of y
      - p(y | μ, σ), thought of as a function of μ, σ, is the likelihood of μ, σ
    - Maximising p(y | μ, σ) with respect to μ, σ is called Maximum Likelihood (ML) estimation of μ, σ
14. ML estimation of μ, σ
    - Intuitively:
      - the maximum likelihood estimate of μ should be the average value of y_1, …, y_N (the sample mean)
      - the maximum likelihood estimate of σ² should be the variance of y_1, …, y_N (the sample variance)
    - This turns out to be true: p(y | μ, σ) is maximised by setting:

      μ = (1/N) ∑_{n=1}^{N} y_n,  σ² = (1/N) ∑_{n=1}^{N} (y_n − μ)²
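The ML estimates are just the sample mean and the (biased, divide-by-N) sample variance. A minimal sketch with made-up data values:

```python
# ML fit of a 1-D Gaussian: sample mean and sample variance.
data = [1.0, 2.0, 4.0, 5.0]   # illustrative data, not from the lecture
n = len(data)
mean = sum(data) / n
var = sum((y - mean) ** 2 for y in data) / n   # note: divides by N, not N-1

print(mean, var)   # -> 3.0 2.5
```

The divide-by-N form is the maximum likelihood estimator; the familiar divide-by-(N−1) form is the unbiased variant.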
15. Multi-modal distributions
    - In practice the distributions of many naturally occurring phenomena do not follow the simple bell-shaped Gaussian curve
    - For example, if the data arises from several different sources, there may be several distinct peaks (e.g. the distribution of heights of adults)
    - These peaks are the modes of the distribution, and the distribution is called multi-modal
16. Gaussian Mixture PDFs
    - Gaussian Mixture PDFs, or Gaussian Mixture Models (GMMs), are commonly used to model multi-modal or other non-Gaussian distributions.
    - A GMM is just a weighted average of several Gaussian PDFs, called the component PDFs
    - For example, if p_1 and p_2 are Gaussian PDFs, then p(y) = w_1 p_1(y) + w_2 p_2(y) defines a 2-component Gaussian mixture PDF
17. Gaussian Mixture - Example
    - 2-component mixture model:
      - Component 1: μ = 0, σ² = 0.1
      - Component 2: μ = 2, σ² = 1
      - w_1 = w_2 = 0.5
18. Example 2
    - 2-component mixture model:
      - Component 1: μ = 0, σ² = 0.1
      - Component 2: μ = 2, σ² = 1
      - w_1 = 0.2, w_2 = 0.8
19. Example 3
    - 2-component mixture model:
      - Component 1: μ = 0, σ² = 0.1
      - Component 2: μ = 2, σ² = 1
      - w_1 = 0.2, w_2 = 0.8
20. Example 4
    - 5-component Gaussian mixture PDF
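A mixture PDF is literally a weighted sum of component densities. A minimal sketch using the parameters of the first example above (function names are illustrative):

```python
import math

# 1-D Gaussian component density.
def gaussian_pdf(y, mean, var):
    return math.exp(-((y - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# M-component mixture: p(y) = sum_m w_m * p_m(y).
def mixture_pdf(y, weights, means, variances):
    return sum(w * gaussian_pdf(y, m, v)
               for w, m, v in zip(weights, means, variances))

# Parameters from the slide: w1 = w2 = 0.5,
# component 1: mean 0, variance 0.1; component 2: mean 2, variance 1.
p = mixture_pdf(0.0, [0.5, 0.5], [0.0, 2.0], [0.1, 1.0])
print(p > 0)   # a density value, positive everywhere
```

Changing the weights (as in Examples 2 and 3) rescales each component's peak without moving it.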
21. Gaussian Mixture Model
    - In general, an M-component Gaussian mixture PDF is defined by:

      p(y) = ∑_{m=1}^{M} w_m p_m(y)

      where each p_m is a Gaussian PDF and ∑_{m=1}^{M} w_m = 1, w_m ≥ 0
22. Estimating the parameters of a Gaussian mixture model
    - A Gaussian Mixture Model with M components has:
      - M means: μ_1, …, μ_M
      - M variances: σ²_1, …, σ²_M
      - M mixture weights: w_1, …, w_M
    - Given a set of data y = y_1, …, y_N, how can we estimate these parameters?
    - I.e. how do we find a maximum likelihood estimate of μ_1, …, μ_M, σ²_1, …, σ²_M, w_1, …, w_M?
23. Parameter Estimation
    - If we knew which component each sample y_n came from, then parameter estimation would be easy:
      - set μ_m to be the average value of the samples which belong to the m-th component
      - set σ²_m to be the variance of the samples which belong to the m-th component
      - set w_m to be the proportion of samples which belong to the m-th component
    - But we don't know which component each sample belongs to.
24. Solution - the E-M algorithm
    - Guess initial values for the parameters
    - REPEAT:
      - For each n, calculate the probabilities λ_n = P(m | y_n); this is a measure of how much the sample y_n 'belongs to' the m-th component
      - Use these probabilities to calculate updated estimates of the parameters of each component
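The loop above can be sketched for two 1-D components. This is a compact illustration, not the lecture's implementation; the synthetic data and all names are made up:

```python
import math
import random

def gaussian_pdf(y, mean, var):
    return math.exp(-((y - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(data, weights, means, variances):
    M = len(weights)
    # E-step: lambda[n][m] = P(component m | y_n), via Bayes' theorem.
    resp = []
    for y in data:
        joint = [weights[m] * gaussian_pdf(y, means[m], variances[m]) for m in range(M)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: re-estimate each component from its 'soft' share of the samples.
    new_w, new_mu, new_var = [], [], []
    for m in range(M):
        nm = sum(r[m] for r in resp)
        mu = sum(r[m] * y for r, y in zip(resp, data)) / nm
        var = sum(r[m] * (y - mu) ** 2 for r, y in zip(resp, data)) / nm
        new_w.append(nm / len(data))
        new_mu.append(mu)
        new_var.append(var)
    return new_w, new_mu, new_var

# Guess initial values, then repeat the step: two well-separated clusters.
random.seed(0)
data = [random.gauss(0.0, 0.5) for _ in range(200)] + \
       [random.gauss(3.0, 1.0) for _ in range(200)]
w, mu, var = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(30):
    w, mu, var = em_step(data, w, mu, var)
```

After a few iterations the estimated means settle near the true cluster centres (0 and 3), illustrating convergence to a local optimum of the likelihood.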
25. The E-M algorithm (figure: likelihood p(y | θ) plotted against the parameter set θ, with iterates θ(0), …, θ(i) climbing towards a local optimum)
26. E-M Algorithm
    - Let's just look at estimation of the mean μ_m of a single component of a GMM
    - In fact, μ_m = (∑_n λ_n y_n) / (∑_n λ_n)
    - In other words, λ_n is the probability of the m-th component given the data point y_n: λ_n = P(m | y_n)
27. E-M continued
    - From Bayes' theorem:

      P(m | y_n) = w_m p_m(y_n) / ∑_{k=1}^{M} w_k p_k(y_n)

      The numerator is the m-th weight times the m-th Gaussian component; the denominator sums the same quantity over all components.
28. Example - initial model (figure: a 2-component model m_1, m_2 and a data point y_6, with P(m_1 | y_6) = λ_1 and P(m_2 | y_6) = λ_2)
29. Example - after 1st iteration of E-M
30. Example - after 2nd iteration of E-M
31. Example - after 4th iteration of E-M
32. Example - after 10th iteration of E-M
33. Multivariate Gaussian PDFs
    - All PDFs so far have been 1-dimensional: they take scalar values
    - But most real data will be represented as D-dimensional vectors
    - The vector equivalent of a Gaussian PDF is called a multivariate Gaussian PDF
34. Multivariate Gaussian PDFs (figure: contours of equal probability for a 2-D Gaussian, together with its 1-dimensional Gaussian PDFs)
35. Multivariate Gaussian PDFs (figure: a second 2-D Gaussian, together with its 1-dimensional Gaussian PDFs)
36. Multivariate Gaussian PDF
    - The parameters of a multivariate Gaussian PDF are:
      - the (vector) mean μ
      - the covariance matrix Σ, whose diagonal holds the variance of each dimension and whose off-diagonal entries hold the covariances between pairs of dimensions
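The multivariate density generalises the 1-D formula: the quadratic (y − μ)² / σ² becomes (y − μ)ᵀ Σ⁻¹ (y − μ). A minimal 2-D sketch with the matrix algebra written out by hand (names are illustrative):

```python
import math

# 2-D multivariate Gaussian density:
# p(y) = exp(-0.5 * (y-mu)^T S^{-1} (y-mu)) / (2*pi*sqrt(det S))
def mvn_pdf_2d(y, mu, S):
    (a, b), (c, d) = S
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]   # 2x2 matrix inverse
    dy = [y[0] - mu[0], y[1] - mu[1]]
    quad = (dy[0] * (inv[0][0] * dy[0] + inv[0][1] * dy[1])
            + dy[1] * (inv[1][0] * dy[0] + inv[1][1] * dy[1]))
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# With zero mean and identity covariance the peak value is 1/(2*pi).
p = mvn_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(round(p, 4))   # -> 0.1592
```

With a diagonal covariance the density factorises into independent 1-D Gaussians; off-diagonal entries tilt the equal-probability contours.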
37. Multivariate Gaussian PDFs
    - Multivariate Gaussian PDFs are commonly used in pattern processing and data mining
    - Vector data is often not unimodal, so we use mixtures of multivariate Gaussian PDFs
    - The E-M algorithm works for multivariate Gaussian mixture PDFs
38. Summary
    - Basic statistical modelling
    - Probability distributions
    - Probability density functions
    - Gaussian PDFs
    - Gaussian mixture PDFs and the E-M algorithm
    - Multivariate Gaussian PDFs