Transcript

  • 1. EE3J2 Data Mining, Lecture 10: Statistical Modelling (Martin Russell)
  • 2. Objectives
    • To review basic statistical modelling
    • To review the notion of probability distribution
    • To review the notion of probability density function
    • To introduce mixture densities
    • To introduce the multivariate Gaussian density
  • 3. Discrete variables
    • Suppose that Y is a random variable which can take any value in a discrete set X = {x_1, x_2, …, x_M}
    • Suppose that y_1, y_2, …, y_N are samples of the random variable Y
    • If c_m is the number of times that y_n = x_m, then an estimate of the probability that y_n takes the value x_m is given by:
      P(y_n = x_m) ≈ c_m / N
  • 4. Discrete Probability Mass Function
      Symbol:            1    2    3   4   5   6   7    8    9   Total
      Num. Occurrences:  120  231  90  87  63  57  156  203  91  1098
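The counting estimate from the previous slide can be reproduced directly from this table. A minimal sketch in Python, with the symbol counts taken from the table above:

```python
# Estimate a discrete probability mass function by counting,
# using the occurrence counts from the table above.
counts = {1: 120, 2: 231, 3: 90, 4: 87, 5: 63, 6: 57, 7: 156, 8: 203, 9: 91}

total = sum(counts.values())          # 1098 observations in total
pmf = {symbol: c / total for symbol, c in counts.items()}

# The estimates are non-negative and sum to 1, as a PMF must.
print(pmf[2])   # P(Y = 2) ≈ 231/1098
```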
  • 5. Continuous Random Variables
    • In most practical applications the data are not restricted to a finite set of values – they can take any value in N-dimensional space
    • Simply counting the number of occurrences of each value is no longer a viable way of estimating probabilities…
    • … but there are generalisations of this approach which are applicable to continuous variables – these are referred to as non-parametric methods
  • 6. Continuous Random Variables
    • An alternative is to use a parametric model
    • In a parametric model, probabilities are defined by a small set of parameters
    • The simplest example is a normal, or Gaussian, model
    • A Gaussian probability density function (PDF) is defined by two parameters: its mean μ and variance σ²
  • 7. Gaussian PDF
    • ‘Standard’ 1-dimensional Gaussian PDF:
      • mean μ = 0
      • variance σ² = 1
  • 8. Gaussian PDF
    [Figure: the area under the Gaussian curve between a and b gives P(a ≤ x ≤ b)]
  • 9. Gaussian PDF
    • For a 1-dimensional Gaussian PDF p with mean μ and variance σ²:
      p(x) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))
      The constant 1/√(2πσ²) ensures the area under the curve is 1; the exponential term defines the ‘bell’ shape
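The 1-dimensional Gaussian density can be written out directly. A minimal sketch in Python, using only the standard library:

```python
import math

def gauss_pdf(x, mu, var):
    """1-D Gaussian PDF with mean mu and variance var."""
    # 1/sqrt(2*pi*var) normalises the area under the curve to 1;
    # the exponential term gives the bell shape.
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# The 'standard' Gaussian (mu = 0, var = 1) peaks at 1/sqrt(2*pi) ≈ 0.3989
print(gauss_pdf(0.0, 0.0, 1.0))
```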
  • 10. More examples
    [Figure: Gaussian PDFs with σ² = 0.1, 1.0, 10.0 and 5.0]
  • 11. Fitting a Gaussian PDF to Data
    • Suppose y = y_1, …, y_n, …, y_N is a set of N data values
    • Given a Gaussian PDF p with mean μ and variance σ², define:
      p(y | μ, σ²) = ∏_{n=1}^{N} p(y_n | μ, σ²)
    • How do we choose μ and σ² to maximise this probability?
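Because the definition above is a product of many small densities, it is numerically safer in practice to work with its logarithm. A small sketch, with illustrative data values, showing that a Gaussian centred on the data scores higher than one far away:

```python
import math

def gauss_pdf(x, mu, var):
    # 1-D Gaussian PDF with mean mu and variance var
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_likelihood(data, mu, var):
    # log p(y | mu, var) = sum_n log p(y_n | mu, var)
    return sum(math.log(gauss_pdf(y, mu, var)) for y in data)

y = [1.8, 2.1, 2.4, 1.9, 2.2]           # illustrative data values
print(log_likelihood(y, 2.0, 0.1))       # candidate centred on the data
print(log_likelihood(y, 0.0, 0.1))       # candidate far from the data: much lower
```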
  • 12. Fitting a Gaussian PDF to Data
    [Figure: a poorly fitting and a well fitting Gaussian PDF over the same data]
  • 13. Maximum Likelihood Estimation
    • Define the best-fitting Gaussian to be the one such that p(y | μ, σ²) is maximised
    • Terminology:
      • p(y | μ, σ²), thought of as a function of y, is the probability (density) of y
      • p(y | μ, σ²), thought of as a function of μ, σ², is the likelihood of μ, σ²
    • Maximising p(y | μ, σ²) with respect to μ, σ² is called Maximum Likelihood (ML) estimation of μ, σ²
  • 14. ML estimation of μ, σ²
    • Intuitively:
      • The maximum likelihood estimate of μ should be the average value of y_1, …, y_N (the sample mean)
      • The maximum likelihood estimate of σ² should be the variance of y_1, …, y_N (the sample variance)
    • This turns out to be true: p(y | μ, σ²) is maximised by setting:
      μ = (1/N) ∑_{n=1}^{N} y_n   and   σ² = (1/N) ∑_{n=1}^{N} (y_n − μ)²
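The two ML formulas translate directly into code. A minimal sketch, with illustrative data values:

```python
def ml_estimate(data):
    """Maximum likelihood estimates of the mean and variance of a Gaussian."""
    n = len(data)
    mu = sum(data) / n                            # sample mean
    var = sum((y - mu) ** 2 for y in data) / n    # sample variance (divide by N, not N-1)
    return mu, var

y = [1.0, 2.0, 3.0, 4.0]     # illustrative data
mu, var = ml_estimate(y)
print(mu, var)               # 2.5 1.25
```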
  • 15. Multi-modal distributions
    • In practice the distributions of many naturally occurring phenomena do not follow the simple bell-shaped Gaussian curve
    • For example, if the data arises from several different sources, there may be several distinct peaks (e.g. the distribution of heights of adults)
    • These peaks are the modes of the distribution, and the distribution is called multi-modal
  • 16. Gaussian Mixture PDFs
    • Gaussian Mixture PDFs, or Gaussian Mixture Models (GMMs) are commonly used to model multi-modal, or other non-Gaussian distributions.
    • A GMM is just a weighted average of several Gaussian PDFs, called the component PDFs
    • For example, if p_1 and p_2 are Gaussian PDFs, then
      p(y) = w_1 p_1(y) + w_2 p_2(y)
      defines a 2-component Gaussian mixture PDF
  • 17. Gaussian Mixture - Example
    • 2 component mixture model
      • Component 1: μ = 0, σ² = 0.1
      • Component 2: μ = 2, σ² = 1
      • w_1 = w_2 = 0.5
  • 18. Example 2
    • 2 component mixture model
      • Component 1: μ = 0, σ² = 0.1
      • Component 2: μ = 2, σ² = 1
      • w_1 = 0.2, w_2 = 0.8
  • 19. Example 3
    • 2 component mixture model
      • Component 1: μ = 0, σ² = 0.1
      • Component 2: μ = 2, σ² = 1
      • w_1 = 0.2, w_2 = 0.8
  • 20. Example 4
    • 5 component Gaussian mixture PDF
  • 21. Gaussian Mixture Model
    • In general, an M-component Gaussian mixture PDF is defined by:
      p(y) = ∑_{m=1}^{M} w_m p_m(y)
      where each p_m is a Gaussian PDF, w_m ≥ 0 and ∑_{m=1}^{M} w_m = 1
  • 22. Estimating the parameters of a Gaussian mixture model
    • A Gaussian Mixture Model with M components has:
      • M means: μ_1, …, μ_M
      • M variances: σ²_1, …, σ²_M
      • M mixture weights: w_1, …, w_M
    • Given a set of data y = y_1, …, y_N, how can we estimate these parameters?
    • I.e. how do we find a maximum likelihood estimate of μ_1, …, μ_M, σ²_1, …, σ²_M, w_1, …, w_M?
  • 23. Parameter Estimation
    • If we knew which component each sample y_n came from, then parameter estimation would be easy:
      • Set μ_m to be the average value of the samples which belong to the m-th component
      • Set σ²_m to be the variance of the samples which belong to the m-th component
      • Set w_m to be the proportion of samples which belong to the m-th component
    • But we don’t know which component each sample belongs to
  • 24. Solution – the E-M algorithm
    • Guess initial values of the parameters μ_m, σ²_m and w_m
    • For each n, calculate the probabilities λ_n(m) = P(m | y_n); this is a measure of how much the sample y_n ‘belongs to’ the m-th component
    • Use these probabilities to re-estimate the parameters, e.g. the weighted mean:
      μ_m = ∑_n λ_n(m) y_n / ∑_n λ_n(m)
    • REPEAT from the second step until the estimates stop changing
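Putting the steps above together, here is a minimal sketch of the E-M loop for a 1-dimensional Gaussian mixture. It omits practical safeguards (a variance floor, a convergence test on the likelihood) and the data values are illustrative:

```python
import math
import random

def gauss_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(data, mus, variances, weights, iterations=50):
    for _ in range(iterations):
        # E-step: responsibilities lambda_n(m) = P(m | y_n), via Bayes' theorem
        resp = []
        for y in data:
            joint = [w * gauss_pdf(y, mu, var)
                     for w, mu, var in zip(weights, mus, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: responsibility-weighted means, variances and weights
        for m in range(len(mus)):
            r_m = [r[m] for r in resp]
            n_m = sum(r_m)
            mus[m] = sum(r * y for r, y in zip(r_m, data)) / n_m
            variances[m] = sum(r * (y - mus[m]) ** 2
                               for r, y in zip(r_m, data)) / n_m
            weights[m] = n_m / len(data)
    return mus, variances, weights

# Illustrative data drawn from two well-separated Gaussians
random.seed(0)
data = ([random.gauss(0.0, 0.5) for _ in range(200)]
        + [random.gauss(3.0, 1.0) for _ in range(200)])
mus, variances, weights = em_gmm(data, [-1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
print(sorted(mus))   # close to the true means 0 and 3
```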
  • 25. The E-M algorithm
    [Figure: the likelihood p(y | θ) plotted against the parameter set θ; the successive estimates θ(0), …, θ(i) climb towards a local optimum]
  • 26. E-M Algorithm
    • Let’s just look at estimation of the mean μ of a single component of a GMM
    • In fact, λ_n = P(m | y_n)
    • In other words, λ_n is the probability of the m-th component given the data point y_n
  • 27. E-M continued
    • From Bayes’ theorem:
      P(m | y_n) = w_m p_m(y_n) / ∑_{m′=1}^{M} w_{m′} p_{m′}(y_n)
      (the m-th weight multiplies the m-th Gaussian component in the numerator; the denominator sums over all components)
  • 28. Example – initial model
    [Figure: two components m_1 and m_2 with a data point y_6; P(m_1 | y_6) = λ_1 and P(m_2 | y_6) = λ_2]
  • 29. Example – after 1st iteration of E-M
  • 30. Example – after 2nd iteration of E-M
  • 31. Example – after 4th iteration of E-M
  • 32. Example – after 10th iteration of E-M
  • 33. Multivariate Gaussian PDFs
    • All PDFs so far have been 1-dimensional
    • They take scalar values
    • But most real data will be represented as D -dimensional vectors
    • The vector equivalent of a Gaussian PDF is called a multivariate Gaussian PDF
  • 34. Multivariate Gaussian PDFs
    [Figure: a 2-dimensional Gaussian PDF with contours of equal probability, together with the corresponding 1-dimensional Gaussian PDFs]
  • 35. Multivariate Gaussian PDFs
    [Figure: further 2-dimensional Gaussian PDFs with their corresponding 1-dimensional Gaussian PDFs]
  • 36. Multivariate Gaussian PDF
    • The parameters of a multivariate Gaussian PDF are:
      • the (vector) mean μ
      • the covariance matrix Σ, whose diagonal entries are the variances of the individual dimensions and whose off-diagonal entries are the covariances between them
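For the 2-dimensional case the multivariate density can be written out by hand, since a 2×2 covariance matrix is easy to invert. A sketch with illustrative mean and covariance values:

```python
import math

def gauss2d_pdf(x, mu, cov):
    """2-D multivariate Gaussian PDF; cov is a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c                      # determinant of the covariance
    # Inverse of a 2x2 matrix, written out explicitly
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(det))

mu = [0.0, 0.0]
cov = [[1.0, 0.5], [0.5, 2.0]]   # off-diagonal 0.5 is the covariance term
print(gauss2d_pdf([0.0, 0.0], mu, cov))
```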
  • 37. Multivariate Gaussian PDFs
    • Multivariate Gaussian PDFs are commonly used in pattern processing and data mining
    • Vector data is often not unimodal, so we use mixtures of multivariate Gaussian PDFs
    • The E-M algorithm works for multivariate Gaussian mixture PDFs
  • 38. Summary
    • Basic statistical modelling
    • Probability distributions
    • Probability density function
    • Gaussian PDFs
    • Gaussian mixture PDFs and the E-M algorithm
    • Multivariate Gaussian PDFs