1. Data Mining (CS 341, Spring 2007), Lecture 4: Data Mining Techniques (I)
2. Review:
   - Information Retrieval
     - Similarity measures
     - Evaluation metrics: precision and recall
   - Question Answering
   - Web Search Engines
     - An application of IR
     - Related to web mining
3. Data Mining Techniques Outline
   Goal: provide an overview of basic data mining techniques.
   - Statistical
     - Point Estimation
     - Models Based on Summarization
     - Bayes Theorem
     - Hypothesis Testing
     - Regression and Correlation
   - Similarity Measures
4. Point Estimation
   - Point estimate: estimate a population parameter.
   - May be made by calculating the parameter for a sample.
   - May be used to predict a value for missing data.
   - Ex:
     - R contains 100 employees
     - 99 have salary information
     - Mean salary of these is $50,000
     - Use $50,000 as the value of the remaining employee's salary.
     - Is this a good idea?
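A minimal sketch of this kind of mean imputation, assuming the salaries arrive as a Python list with None marking the missing value (hypothetical data, not from the slides):

```python
# Point estimation for missing data: impute the missing salary with the
# sample mean of the observed salaries (simple, but not always a good idea).
def impute_with_mean(values):
    observed = [v for v in values if v is not None]
    estimate = sum(observed) / len(observed)  # point estimate of the mean
    return [estimate if v is None else v for v in values]

salaries = [48000, 52000, 50000, None]  # hypothetical data; None = unknown
print(impute_with_mean(salaries))       # missing entry replaced by 50000.0
```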
5. Estimation Error
   - Bias: difference between the expected value of the estimator and the actual value: Bias = E[θ̂] − θ.
   - Mean Squared Error (MSE): expected value of the squared difference between the estimate and the actual value: MSE = E[(θ̂ − θ)²].
   - Root Mean Squared Error (RMSE): the square root of the MSE.
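As an illustration, a short sketch (with a hypothetical population) that approximates the bias, MSE, and RMSE of the sample mean by repeated sampling:

```python
import math
import random

# Approximate bias, MSE, and RMSE of the sample mean as an estimator of a
# known population mean, by simulating many samples (hypothetical setup).
true_mean = 50.0
estimates = []
for _ in range(10000):
    sample = [random.gauss(true_mean, 10.0) for _ in range(30)]
    estimates.append(sum(sample) / len(sample))

bias = sum(estimates) / len(estimates) - true_mean
mse = sum((e - true_mean) ** 2 for e in estimates) / len(estimates)
print(bias, mse, math.sqrt(mse))  # bias near 0; RMSE near 10/sqrt(30)
```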
6. Jackknife Estimate
   - Jackknife estimate: an estimate of a parameter obtained by omitting one value (or group of values) at a time from the set of observed values.
   - Named after the jackknife, a "handy and useful tool".
   - Used to reduce bias.
   - Property: the jackknife estimator lowers the bias from order 1/n to order 1/n².
7. Jackknife Estimate
   - Definition:
     - Divide the sample of size n into g groups of size m each, so n = mg (often m = 1 and g = n).
     - θ̂ⱼ is the estimate obtained by ignoring the jth group.
     - θ̄ is the average of the θ̂ⱼ: θ̄ = (θ̂₁ + ... + θ̂_g)/g.
     - The jackknife estimator is θQ = g·θ̂ − (g − 1)·θ̄, where θ̂ is the estimator for the parameter θ computed from the full sample.
8. Jackknife Estimator: Example 1
   - Estimate of the mean for X = {x₁, x₂, x₃}, n = 3, g = 3, m = 1; θ̂ = μ̂ = (x₁ + x₂ + x₃)/3
   - θ̂₁ = (x₂ + x₃)/2, θ̂₂ = (x₁ + x₃)/2, θ̂₃ = (x₁ + x₂)/2
   - θ̄ = (θ̂₁ + θ̂₂ + θ̂₃)/3 = (x₁ + x₂ + x₃)/3
   - θQ = g·θ̂ − (g − 1)·θ̄ = 3θ̂ − 2θ̄ = (x₁ + x₂ + x₃)/3
   - In this case, the jackknife estimator is the same as the usual estimator.
9. Jackknife Estimator: Example 2
   - Estimate of the variance for X = {1, 4, 4}, n = 3, g = 3, m = 1; θ = σ²
   - θ̂ = σ̂² = ((1 − 3)² + (4 − 3)² + (4 − 3)²)/3 = 2
   - θ̂₁ = ((4 − 4)² + (4 − 4)²)/2 = 0, θ̂₂ = θ̂₃ = ((1 − 2.5)² + (4 − 2.5)²)/2 = 2.25
   - θ̄ = (θ̂₁ + θ̂₂ + θ̂₃)/3 = (0 + 2.25 + 2.25)/3 = 1.5
   - θQ = g·θ̂ − (g − 1)·θ̄ = 3(2) − 2(1.5) = 3
   - In this case, the jackknife estimator is different from the usual estimator.
10. Jackknife Estimator: Example 2 (cont'd)
   - In general, applying the jackknife technique to the biased estimator σ̂² = Σ(xᵢ − x̄)²/n yields the estimator s² = Σ(xᵢ − x̄)²/(n − 1), which is known to be unbiased for σ². A generic implementation is sketched below.
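A small sketch of the leave-one-out jackknife (m = 1, g = n) for an arbitrary estimator; applied to the biased variance on X = {1, 4, 4}, it reproduces θQ = 3 from Example 2:

```python
# Leave-one-out jackknife (m = 1, g = n) for an arbitrary estimator.
def biased_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def jackknife(xs, estimator):
    g = len(xs)
    theta_hat = estimator(xs)                      # estimate on full sample
    leave_one_out = [estimator(xs[:j] + xs[j+1:])  # estimate omitting x_j
                     for j in range(g)]
    theta_bar = sum(leave_one_out) / g
    return g * theta_hat - (g - 1) * theta_bar     # θQ = g·θ̂ − (g−1)·θ̄

print(jackknife([1, 4, 4], biased_variance))  # 3.0, matching Example 2
```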
11. Maximum Likelihood Estimate (MLE)
   - Obtain parameter estimates that maximize the probability that the sample data occurs for the specific model.
   - The joint probability of observing the sample data is obtained by multiplying the individual probabilities. Likelihood function: L(Θ | x₁, ..., xₙ) = ∏ᵢ f(xᵢ | Θ)
   - Maximize L.
12. MLE Example
   - Coin tossed five times: {H, H, H, H, T}
   - Assuming a fair coin with H and T equally likely, the likelihood of this sequence is (0.5)⁵ = 0.03125.
   - However, if the probability of an H is 0.8, then the likelihood is (0.8)⁴(0.2) = 0.08192, so the observed sequence is more likely.
13. MLE Example (cont'd)
   - General likelihood formula for n tosses with h heads: L(p | x₁, ..., xₙ) = pʰ(1 − p)ⁿ⁻ʰ
   - Maximizing L gives p̂ = h/n; the estimate for p is then 4/5 = 0.8.
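A brief sketch that scans candidate values of p and confirms the likelihood of {H, H, H, H, T} peaks at the closed-form MLE p̂ = 4/5:

```python
# Numerically maximize the likelihood L(p) = p^h * (1-p)^(n-h)
# for the coin-toss sample {H, H, H, H, T} (h = 4 heads, n = 5 tosses).
h, n = 4, 5

def likelihood(p):
    return p ** h * (1 - p) ** (n - h)

grid = [i / 1000 for i in range(1001)]  # candidate values of p in [0, 1]
p_mle = max(grid, key=likelihood)
print(p_mle, likelihood(p_mle))  # 0.8 0.08192, matching h/n = 4/5
```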
14. Expectation-Maximization (EM)
   - Solves estimation problems with incomplete data.
   - Obtain initial estimates for the parameters.
   - Iteratively use the current estimates to fill in the missing data, re-estimate, and continue until convergence. A small sketch follows.
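A minimal sketch of the EM idea for the simplest case, estimating a mean when some observations are missing (hypothetical data; this is an illustration of the loop above, not the example from the slides):

```python
# EM-style estimation of a mean with missing observations (None entries).
# E-step: fill each missing value with the current mean estimate.
# M-step: re-estimate the mean from the completed data. Repeat to convergence.
def em_mean(values, tol=1e-9, max_iter=100):
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)          # initial estimate
    for _ in range(max_iter):
        completed = [mean if v is None else v for v in values]  # E-step
        new_mean = sum(completed) / len(completed)              # M-step
        if abs(new_mean - mean) < tol:
            break
        mean = new_mean
    return mean

print(em_mean([1.0, 5.0, None, 3.0]))  # converges to 3.0
```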
15. EM Example (figure in original slides)
16. EM Algorithm (figure in original slides)
17. Models Based on Summarization
   - Basic concepts that provide an abstraction and summarization of the data as a whole.
     - Statistical concepts: mean, variance, median, mode, etc.
   - Visualization: display the structure of the data graphically.
     - Line graphs, pie charts, histograms, scatter plots, hierarchical graphs
18. Scatter Diagram (figure in original slides)
19. Bayes Theorem
   - Posterior probability: P(h₁ | xᵢ)
   - Prior probability: P(h₁)
   - Bayes theorem: P(h₁ | xᵢ) = P(xᵢ | h₁) P(h₁) / P(xᵢ)
   - Assigns probabilities to hypotheses given a data value.
20. Bayes Theorem Example
   - Credit authorizations (hypotheses): h₁ = authorize purchase, h₂ = authorize after further identification, h₃ = do not authorize, h₄ = do not authorize but contact police
   - Assign twelve data values x₁, ..., x₁₂ for all combinations of credit and income.
   - From training data: P(h₁) = 60%, P(h₂) = 20%, P(h₃) = 10%, P(h₄) = 10%.
21. Bayes Example (cont'd)
   - Training data (table in original slides)
22. Bayes Example (cont'd)
   - Calculate P(xᵢ | hⱼ) and P(xᵢ)
   - Ex: P(x₇ | h₁) = 2/6; P(x₄ | h₁) = 1/6; P(x₂ | h₁) = 2/6; P(x₈ | h₁) = 1/6; P(xᵢ | h₁) = 0 for all other xᵢ.
   - Predict the class for x₄:
     - Calculate P(hⱼ | x₄) for all hⱼ.
     - Place x₄ in the class with the largest value.
     - Ex: P(h₁ | x₄) = P(x₄ | h₁) P(h₁) / P(x₄) = (1/6)(0.6)/0.1 = 1, so x₄ goes in class h₁.
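A short sketch of the posterior computation for this example. The priors and P(x₄ | h₁) = 1/6 come from the slides; P(x₄) = 0.1 is taken from the worked example, and the zero likelihoods for the other hypotheses are an assumption consistent with the posterior of 1 computed on the slide:

```python
# Posterior P(h | x4) = P(x4 | h) * P(h) / P(x4) for each hypothesis,
# using the numbers from the credit-authorization example.
priors = {"h1": 0.6, "h2": 0.2, "h3": 0.1, "h4": 0.1}
likelihoods_x4 = {"h1": 1/6, "h2": 0.0, "h3": 0.0, "h4": 0.0}  # P(x4 | h),
p_x4 = 0.1                                 # P(x4); zeros assumed (see above)

posteriors = {h: likelihoods_x4[h] * priors[h] / p_x4 for h in priors}
best = max(posteriors, key=posteriors.get)
print(posteriors, best)  # P(h1 | x4) = 1.0, so x4 is assigned to class h1
```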
23. Hypothesis Testing
   - Find a model to explain behavior by creating and then testing a hypothesis about the data.
   - Exact opposite of the usual DM approach.
   - H₀: null hypothesis, the hypothesis to be tested.
   - H₁: alternative hypothesis.
24. Chi-Square Test
   - One technique for performing hypothesis testing.
   - Used to test the association between two observed variables and to determine whether a set of observed values is statistically different from the expected ones.
   - The chi-square statistic is defined as χ² = Σ (O − E)² / E
     - O: observed value
     - E: expected value based on the hypothesis.
25. Chi-Square Test
   - Given the average scores of five schools, determine whether the differences are statistically significant.
   - Ex:
     - O = {50, 93, 67, 78, 87}
     - E = 75
     - χ² = ((50 − 75)² + (93 − 75)² + (67 − 75)² + (78 − 75)² + (87 − 75)²)/75 = 15.55, and therefore significant
   - Examine a chi-square significance table:
     - with 4 degrees of freedom at the 95% level, the critical value is 9.488. Since 15.55 > 9.488, the variation between the schools' scores and the expected value cannot be attributed to pure chance.
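A quick sketch reproducing the statistic for this example in plain Python:

```python
# Chi-square statistic for the school-score example: χ² = Σ (O − E)² / E.
observed = [50, 93, 67, 78, 87]
expected = 75  # same expected score for every school under H0

chi2 = sum((o - expected) ** 2 / expected for o in observed)
critical = 9.488  # chi-square critical value for 4 df at the 95% level
print(round(chi2, 2), chi2 > critical)  # 15.55 True -> reject H0
```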
26. Regression
   - Predict future values based on past values.
   - Fit a set of points to a curve.
   - Linear regression assumes a linear relationship exists: y = c₀ + c₁x₁ + ... + cₙxₙ
     - n input variables (called regressors or predictors)
     - one output variable (called the response)
     - n + 1 constants, chosen during the modeling process to match the input examples
27. Linear Regression -- with one input value (figure in original slides)
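For the one-input case on this slide, a minimal least-squares sketch on hypothetical points, using the closed-form solution c₁ = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)² and c₀ = ȳ − c₁x̄:

```python
# Simple linear regression y = c0 + c1*x via closed-form least squares.
def fit_line(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    c1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    c0 = y_bar - c1 * x_bar
    return c0, c1

xs, ys = [1, 2, 3, 4], [2.1, 3.9, 6.0, 8.1]  # hypothetical points
c0, c1 = fit_line(xs, ys)
print(c0, c1)  # c0 ≈ 0, c1 ≈ 2.01 for this data
```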
28. Correlation
   - Examine the degree to which the values of two variables behave similarly.
   - Correlation coefficient r:
     - 1 = perfect correlation
     - −1 = perfect but opposite correlation
     - 0 = no correlation
29. Correlation
   - r = Σ(xᵢ − X̄)(yᵢ − Ȳ) / sqrt(Σ(xᵢ − X̄)² · Σ(yᵢ − Ȳ)²), where X̄ and Ȳ are the means of X and Y respectively.
   - Suppose X = (1, 3, 5, 7, 9) and Y = (9, 7, 5, 3, 1): r = ? (Y decreases in exact step with X, so r = −1.)
   - Suppose X = (1, 3, 5, 7, 9) and Y = (2, 4, 6, 8, 10): r = ? (Y is a perfect increasing linear function of X, so r = 1.)
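A sketch of the Pearson correlation computation, which confirms r = −1 and r = 1 for the two pairs above:

```python
import math

# Pearson correlation: r = Σ(x−x̄)(y−ȳ) / sqrt(Σ(x−x̄)² · Σ(y−ȳ)²).
def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - x_bar) ** 2 for x in xs)
                           * sum((y - y_bar) ** 2 for y in ys))

print(pearson_r([1, 3, 5, 7, 9], [9, 7, 5, 3, 1]))   # -1.0
print(pearson_r([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]))  #  1.0
```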
30. Similarity Measures
   - Determine the similarity between two objects.
   - Similarity characteristics: typically sim(x, y) ∈ [0, 1], sim(x, x) = 1, and sim is symmetric.
   - Alternatively, distance measures measure how unlike or dissimilar objects are.
31. Similarity Measures (formulas in original slides)
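The slide's formulas are not preserved in this transcript; as a sketch, here are three measures commonly covered at this point (Dice, Jaccard, cosine) for real-valued vectors, which are an assumption rather than necessarily the exact set shown in the lecture:

```python
import math

# Three common similarity measures for real-valued vectors.
def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

def dice(xs, ys):
    return 2 * dot(xs, ys) / (dot(xs, xs) + dot(ys, ys))

def jaccard(xs, ys):
    d = dot(xs, ys)
    return d / (dot(xs, xs) + dot(ys, ys) - d)

def cosine(xs, ys):
    return dot(xs, ys) / math.sqrt(dot(xs, xs) * dot(ys, ys))

a, b = [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]
print(dice(a, b), jaccard(a, b), cosine(a, b))  # 0.5 0.333... 0.5
```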
32. Distance Measures
   - Measure dissimilarity between objects (formulas in original slides)
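Again as an assumption standing in for the slide's own formulas, two standard distance measures:

```python
import math

# Two standard distance measures between numeric vectors.
def euclidean(xs, ys):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))

def manhattan(xs, ys):
    return sum(abs(x - y) for x, y in zip(xs, ys))

p, q = [0.0, 0.0], [3.0, 4.0]
print(euclidean(p, q), manhattan(p, q))  # 5.0 7.0
```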
33. Next Lecture:
   - Data Mining Techniques (II)
     - Decision trees, neural networks, and genetic algorithms
   - Reading assignment: Chapter 3