
# DataMining-techniques1.ppt


1. **Data Mining, CS 341, Spring 2007. Lecture 4: Data Mining Techniques (I)**
2. **Review**
    - Information Retrieval
        - Similarity measures
        - Evaluation metrics: precision and recall
    - Question Answering
    - Web Search Engines
        - An application of IR
        - Related to web mining
3. **Data Mining Techniques Outline**
    - Goal: provide an overview of basic data mining techniques.
    - Statistical
        - Point estimation
        - Models based on summarization
        - Bayes theorem
        - Hypothesis testing
        - Regression and correlation
    - Similarity measures
4. **Point Estimation**
    - A point estimate estimates a population parameter.
    - It may be made by calculating the parameter for a sample.
    - It may be used to predict a value for missing data.
    - Example:
        - Relation R contains 100 employees.
        - 99 have salary information.
        - The mean salary of these 99 is $50,000.
        - Use $50,000 as the value of the remaining employee's salary. Is this a good idea?
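The example above amounts to mean imputation; a minimal sketch, with the salary figures assumed for illustration:

```python
# Point estimation for missing data: impute the missing salary with the
# sample mean of the 99 known salaries (toy data, assumed for illustration).
known_salaries = [48_000 + 40 * i for i in range(99)]  # hypothetical sample

point_estimate = sum(known_salaries) / len(known_salaries)  # sample mean
salaries = known_salaries + [point_estimate]  # fill in the 100th employee

print(point_estimate)
```

Whether this is a good idea depends on how representative the 99 known salaries are of the missing one; a single outlier-heavy sample can make the point estimate misleading.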
5. **Estimation Error**
    - Bias: the difference between the expected value of the estimator and the actual parameter value: Bias = E[θ̂] − θ.
    - Mean Squared Error (MSE): the expected value of the squared difference between the estimate and the actual value: MSE(θ̂) = E[(θ̂ − θ)²].
    - Root Mean Squared Error (RMSE): RMSE(θ̂) = √MSE(θ̂).
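These error measures can be computed directly from a set of repeated estimates; the values below are assumed toy data:

```python
# Bias, MSE, and RMSE for a toy estimator: repeated estimates of a
# parameter whose true value is known (values assumed for illustration).
from math import sqrt

true_theta = 5.0
estimates = [4.8, 5.3, 4.9, 5.2, 4.6, 5.4]  # theta-hat from repeated samples

expected = sum(estimates) / len(estimates)            # E[theta-hat]
bias = expected - true_theta                          # E[theta-hat] - theta
mse = sum((e - true_theta) ** 2 for e in estimates) / len(estimates)
rmse = sqrt(mse)
print(bias, mse, rmse)
```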
6. **Jackknife Estimate**
    - Jackknife estimate: an estimate of a parameter obtained by omitting one value (or group of values) at a time from the set of observed values.
    - Named after the jackknife, a "handy and useful tool."
    - Used to reduce bias.
    - Property: the jackknife estimator lowers the bias from order 1/n to order 1/n².
7. **Jackknife Estimate**
    - Definition:
        - Divide the sample of size n into g groups of size m each, so n = mg (often m = 1 and g = n).
        - Let θ̂ be an estimator of the parameter θ computed from the full sample, and let θ̂₍ⱼ₎ be the same estimator computed with the jth group omitted.
        - Let θ̄ be the average of the θ̂₍ⱼ₎.
        - The jackknife estimator is θ̂Q = g·θ̂ − (g−1)·θ̄.
8. **Jackknife Estimator: Example 1**
    - Estimate the mean for X = {x₁, x₂, x₃}; n = 3, g = 3, m = 1; θ̂ = x̄ = (x₁ + x₂ + x₃)/3.
    - θ̂₍₁₎ = (x₂ + x₃)/2, θ̂₍₂₎ = (x₁ + x₃)/2, θ̂₍₃₎ = (x₁ + x₂)/2.
    - θ̄ = (θ̂₍₁₎ + θ̂₍₂₎ + θ̂₍₃₎)/3 = (x₁ + x₂ + x₃)/3.
    - θ̂Q = g·θ̂ − (g−1)·θ̄ = 3θ̂ − 2θ̄ = (x₁ + x₂ + x₃)/3.
    - In this case the jackknife estimator is the same as the usual estimator.
9. **Jackknife Estimator: Example 2**
    - Estimate the variance for X = {1, 4, 4}; n = 3, g = 3, m = 1; θ̂ = σ̂² (the biased variance estimator).
    - σ̂² = ((1−3)² + (4−3)² + (4−3)²)/3 = 2.
    - θ̂₍₁₎ = ((4−4)² + (4−4)²)/2 = 0, θ̂₍₂₎ = 2.25, θ̂₍₃₎ = 2.25.
    - θ̄ = (θ̂₍₁₎ + θ̂₍₂₎ + θ̂₍₃₎)/3 = 1.5.
    - θ̂Q = g·θ̂ − (g−1)·θ̄ = 3(2) − 2(1.5) = 3.
    - In this case the jackknife estimator differs from the usual estimator.
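The leave-one-out jackknife procedure from Examples 1 and 2 can be sketched in a few lines; this reproduces the Example 2 result of 3:

```python
# Jackknife estimate with m = 1, g = n, applied to the biased variance
# estimator on X = {1, 4, 4}; the result matches the unbiased sample
# variance, as Example 2 shows.
def biased_variance(xs):
    """sigma-hat^2 with divisor n (the biased estimator)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def jackknife(estimator, xs):
    g = len(xs)  # one group per observation (m = 1)
    theta_hat = estimator(xs)
    # theta-hat_(j): the estimate computed with the jth observation omitted
    leave_one_out = [estimator(xs[:j] + xs[j + 1:]) for j in range(g)]
    theta_bar = sum(leave_one_out) / g
    return g * theta_hat - (g - 1) * theta_bar  # g*theta - (g-1)*theta-bar

theta_q = jackknife(biased_variance, [1, 4, 4])
print(theta_q)  # 3.0
```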
10. **Jackknife Estimator: Example 2 (cont'd)**
    - In general, applying the jackknife technique to the biased estimator σ̂² = Σ(xᵢ − x̄)²/n
    - yields the jackknife estimator s² = Σ(xᵢ − x̄)²/(n − 1),
    - which is known to be unbiased for σ².
11. **Maximum Likelihood Estimate (MLE)**
    - Obtain the parameter estimates that maximize the probability of observing the sample data under the specific model.
    - The joint probability of observing the sample is the product of the individual probabilities. Likelihood function: L(Θ | x₁, …, xₙ) = ∏ᵢ f(xᵢ | Θ).
    - Maximize L.
12. **MLE Example**
    - Toss a coin five times: {H, H, H, H, T}.
    - Assuming a perfect coin with H and T equally likely, the likelihood of this sequence is (1/2)⁵ = 0.03125.
    - However, if the probability of an H is 0.8, then the likelihood is (0.8)⁴(0.2) = 0.08192.
13. **MLE Example (cont'd)**
    - General likelihood formula: L(p | x₁, …, x₅) = ∏ᵢ p^xᵢ (1 − p)^(1−xᵢ), where xᵢ = 1 for heads and 0 for tails.
    - Maximizing L gives the estimate p̂ = Σxᵢ/n = 4/5 = 0.8.
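A minimal sketch of the coin-toss likelihood, confirming both slide values and finding the maximizer by a brute-force scan rather than calculus:

```python
# Likelihoods from the coin-toss example {H, H, H, H, T}, encoded as 1s
# and 0s. A grid scan over candidate p confirms the closed-form MLE 4/5.
tosses = [1, 1, 1, 1, 0]  # H = 1, T = 0

def likelihood(p, xs):
    out = 1.0
    for x in xs:
        out *= p if x == 1 else (1 - p)  # product of per-toss probabilities
    return out

fair = likelihood(0.5, tosses)    # (1/2)^5 = 0.03125
biased = likelihood(0.8, tosses)  # 0.8^4 * 0.2 = 0.08192

# scan p in {0.00, 0.01, ..., 1.00} for the likelihood maximizer
grid = [i / 100 for i in range(101)]
p_hat = max(grid, key=lambda p: likelihood(p, tosses))
print(fair, biased, p_hat)
```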
14. **Expectation-Maximization (EM)**
    - Solves estimation problems with incomplete data.
    - Obtain initial estimates for the parameters.
    - Iteratively use the current estimates to fill in the missing data, then re-estimate the parameters, continuing until convergence.
15. **EM Example**
16. **EM Algorithm**
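The slides' EM example and algorithm are not reproduced in this text; as a stand-in, here is a minimal sketch of the iterate-until-convergence idea for the simplest incomplete-data setting, estimating a mean when some observations are missing (data assumed for illustration):

```python
# EM-style iteration for incomplete data: estimate the mean of a sample
# in which some values are missing (None).
#   E-step: fill each missing value with the current mean estimate.
#   M-step: recompute the mean from the completed data.
data = [10.0, 12.0, None, 11.0, None]

# initial estimate: mean of the observed values only
mean = sum(x for x in data if x is not None) / sum(x is not None for x in data)
for _ in range(50):  # iterate until convergence
    completed = [x if x is not None else mean for x in data]  # E-step
    new_mean = sum(completed) / len(completed)                # M-step
    if abs(new_mean - mean) < 1e-12:
        break
    mean = new_mean

print(mean)
```

In this degenerate case the iteration converges immediately to the observed-data mean; real EM applications (e.g. mixture models) need several iterations.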
17. **Models Based on Summarization**
    - Use basic statistical concepts to provide an abstraction and summarization of the data as a whole.
        - Statistical concepts: mean, variance, median, mode, etc.
    - Use visualization to display the structure of the data graphically.
        - Line graphs, pie charts, histograms, scatter plots, hierarchical graphs.
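The summary statistics listed above are available directly in Python's standard library; the sample below is assumed for illustration:

```python
# Summarization statistics (mean, median, mode, variance) computed on a
# small assumed sample with Python's statistics module.
import statistics

data = [3, 7, 7, 2, 9, 4, 7]
print(statistics.mean(data))       # arithmetic mean
print(statistics.median(data))     # middle value of the sorted sample
print(statistics.mode(data))       # most frequent value
print(statistics.pvariance(data))  # population variance (divisor n)
```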
18. **Scatter Diagram**
19. **Bayes Theorem**
    - Posterior probability: P(h₁ | xᵢ)
    - Prior probability: P(h₁)
    - Bayes theorem: P(h₁ | xᵢ) = P(xᵢ | h₁) P(h₁) / P(xᵢ)
    - Assigns probabilities to hypotheses given a data value.
20. **Bayes Theorem Example**
    - Credit authorization hypotheses: h₁ = authorize purchase, h₂ = authorize after further identification, h₃ = do not authorize, h₄ = do not authorize but contact police.
    - Assign twelve data values, one for each combination of credit and income ranges.
    - From the training data: P(h₁) = 60%, P(h₂) = 20%, P(h₃) = 10%, P(h₄) = 10%.
21. **Bayes Example (cont'd)**
    - Training data:
22. **Bayes Example (cont'd)**
    - Calculate P(xᵢ | hⱼ) and P(xᵢ).
    - Ex: P(x₇ | h₁) = 2/6; P(x₄ | h₁) = 1/6; P(x₂ | h₁) = 2/6; P(x₈ | h₁) = 1/6; P(xᵢ | h₁) = 0 for all other xᵢ.
    - Predict the class for x₄:
        - Calculate P(hⱼ | x₄) for all hⱼ.
        - Place x₄ in the class with the largest value.
        - Ex: P(h₁ | x₄) = P(x₄ | h₁) P(h₁) / P(x₄) = (1/6)(0.6)/0.1 = 1, so x₄ goes in class h₁.
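The x₄ calculation above is a direct application of Bayes theorem. The slide text does not reproduce the conditionals for h₂ through h₄, so only the h₁ posterior is sketched here, with P(x₄) = 0.1 taken from the worked example:

```python
# Bayes theorem applied to the credit-authorization example: posterior
# probability of hypothesis h1 given data value x4.
p_h1 = 0.6             # prior P(h1), from the training data
p_x4_given_h1 = 1 / 6  # conditional P(x4 | h1)
p_x4 = 0.1             # evidence P(x4), from the slide's example

posterior = p_x4_given_h1 * p_h1 / p_x4  # P(h1 | x4) = P(x4|h1) P(h1) / P(x4)
print(posterior)  # 1.0 up to floating-point rounding
```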
23. **Hypothesis Testing**
    - Find a model to explain behavior by creating and then testing a hypothesis about the data.
    - This is the opposite of the usual DM approach, which lets the data suggest the model.
    - H₀: the null hypothesis, the hypothesis to be tested.
    - H₁: the alternative hypothesis.
24. **Chi-Square Test**
    - One technique for performing hypothesis testing.
    - Used to test the association between two observed variables and to determine whether a set of observed values is statistically different from the expected values.
    - The chi-squared statistic is defined as χ² = Σ (O − E)² / E, where:
        - O: observed value
        - E: expected value under the hypothesis.
25. **Chi-Square Test**
    - Given the average scores of five schools, determine whether the differences are statistically significant.
    - Ex:
        - O = {50, 93, 67, 78, 87}
        - E = 75
        - χ² = 15.55, which is significant.
    - Examine a chi-squared significance table:
        - With 4 degrees of freedom and a significance level of 95%, the critical value is 9.488. Since 15.55 > 9.488, the variance between the schools' scores and the expected value cannot be attributed to pure chance.
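The five-school statistic can be checked in a few lines; the 9.488 critical value (df = 4, 95% level) is taken from the slide's significance table:

```python
# Chi-square statistic for the five-school example: sum of (O - E)^2 / E.
observed = [50, 93, 67, 78, 87]
expected = 75

chi2 = sum((o - expected) ** 2 / expected for o in observed)
critical = 9.488  # chi-square critical value for df = 4 at the 95% level

print(round(chi2, 2), chi2 > critical)  # 15.55 True
```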
26. **Regression**
    - Predict future values based on past values.
    - Fits a set of points to a curve.
    - Linear regression assumes a linear relationship exists:
    - y = c₀ + c₁x₁ + … + cₙxₙ
        - n input variables (called regressors or predictors)
        - one output variable (called the response)
        - n + 1 constants, chosen during the modeling process to match the input examples.
27. **Linear Regression with One Input Value**
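For the one-input case, the constants c₀ and c₁ have a closed-form least-squares solution; the data points below are assumed for illustration:

```python
# One-input linear regression y = c0 + c1 * x via the closed-form
# least-squares estimates:
#   c1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
#   c0 = y_mean - c1 * x_mean
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.1, 5.9, 8.1, 9.8]  # roughly y = 2x (assumed toy data)

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
c1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
      / sum((x - x_mean) ** 2 for x in xs))
c0 = y_mean - c1 * x_mean

print(c0, c1)
```

For these points the fitted intercept and slope come out near 0.18 and 1.94, i.e. close to the generating line y = 2x.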
28. **Correlation**
    - Examines the degree to which the values of two variables behave similarly.
    - Correlation coefficient r:
        - 1 = perfect correlation
        - −1 = perfect but opposite correlation
        - 0 = no correlation
29. **Correlation**
    - r = Σ(xᵢ − X̄)(yᵢ − Ȳ) / √(Σ(xᵢ − X̄)² · Σ(yᵢ − Ȳ)²), where X̄ and Ȳ are the means of X and Y respectively.
    - Suppose X = (1, 3, 5, 7, 9) and Y = (9, 7, 5, 3, 1): r = ?
    - Suppose X = (1, 3, 5, 7, 9) and Y = (2, 4, 6, 8, 10): r = ?
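The two "r = ?" questions can be answered by computing the coefficient directly: the first pair is perfectly anti-correlated (r = −1) and the second perfectly correlated (r = 1):

```python
# Pearson correlation coefficient:
#   r = sum((x - x_mean)(y - y_mean)) / sqrt(var_x * var_y)
from math import sqrt

def correlation(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_x = sum((x - x_mean) ** 2 for x in xs)
    var_y = sum((y - y_mean) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

r1 = correlation([1, 3, 5, 7, 9], [9, 7, 5, 3, 1])   # Y decreases as X grows
r2 = correlation([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])  # Y grows with X
print(r1, r2)  # -1.0 1.0
```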
30. **Similarity Measures**
    - Determine the similarity between two objects.
    - Similarity characteristics:
    - Alternatively, distance measures gauge how unlike or dissimilar objects are.
31. **Similarity Measures**
32. **Distance Measures**
    - Measure the dissimilarity between objects.
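The slide's specific distance formulas are not reproduced in this text; Euclidean and Manhattan distance are standard examples of such measures and are sketched here as assumed stand-ins:

```python
# Two common distance (dissimilarity) measures between numeric vectors.
from math import sqrt

def euclidean(a, b):
    """Straight-line distance: sqrt(sum of squared coordinate differences)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (0, 0), (3, 4)
print(euclidean(p, q), manhattan(p, q))  # 5.0 7
```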
33. **Next Lecture**
    - Data mining techniques (II): decision trees, neural networks, and genetic algorithms.
    - Reading assignment: Chapter 3.