SlideShare a Scribd company logo
1 of 158
Data Mining: the Practice An Introduction Slides taken from : Data Mining  by I. H. Witten and E. Frank
What’s it all about? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data vs. information ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data mining ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Machine learning techniques ,[object Object],[object Object],[object Object],[object Object],[object Object]
Structural descriptions ,[object Object],… … … … … Hard Normal Yes Myope Presbyopic None Reduced No Hypermetrope Pre-presbyopic Soft Normal No Hypermetrope Young None Reduced No Myope Young Recommended lenses Tear production rate Astigmatism Spectacle prescription Age If tear production rate = reduced then recommendation = none Otherwise, if age = young and astigmatic = no  then recommendation = soft
Screening images ,[object Object],[object Object],[object Object],[object Object],[object Object]
Enter machine learning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Marketing and sales I ,[object Object],[object Object],[object Object],[object Object]
Marketing and sales II ,[object Object],[object Object],[object Object],[object Object],[object Object]
Generalization as search ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Enumerating the concept space ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Bias ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Language bias ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Search bias ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Overfitting-avoidance bias ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Concepts, instances, attributes Slides for Chapter 2 of  Data Mining  by I. H. Witten and E. Frank
Input:  Concepts, instances, attributes ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Terminology ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What’s a concept? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Classification learning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Association learning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Clustering ,[object Object],[object Object],[object Object],[object Object],… … … Iris virginica 1.9 5.1 2.7 5.8 102 101 52 51 2 1 Iris virginica 2.5 6.0 3.3 6.3 Iris versicolor 1.5 4.5 3.2 6.4 Iris versicolor 1.4 4.7 3.2 7.0 Iris setosa 0.2 1.4 3.0 4.9 Iris setosa 0.2 1.4 3.5 5.1 Type Petal width Petal length Sepal width Sepal length
Numeric prediction ,[object Object],[object Object],[object Object],[object Object],… … … … … 40 False Normal Mild Rainy 55 False High Hot  Overcast  0 True High  Hot  Sunny 5 False High Hot Sunny Play-time Windy Humidity Temperature Outlook
What’s in an example? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What’s in an attribute? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Nominal quantities ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Ordinal quantities ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Interval quantities ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Ratio quantities ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Attribute types used in practice ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Metadata ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Preparing the input ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The ARFF format % % ARFF file for weather data with some numeric features % @relation weather @attribute outlook {sunny, overcast, rainy} @attribute temperature numeric @attribute humidity numeric @attribute windy {true, false} @attribute play? {yes, no} @data sunny, 85, 85, false, no sunny, 80, 90, true, no overcast, 83, 86, false, yes ...
Additional attribute types ,[object Object],[object Object],[object Object],[object Object],@attribute description string @attribute today date
Attribute types ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Nominal vs. ordinal ,[object Object],[object Object],If age = young and astigmatic = no and tear production rate = normal then recommendation = soft If age = pre-presbyopic and astigmatic = no  and tear production rate = normal  then recommendation = soft If age    pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft
Missing values ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Inaccurate values ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Getting to know the data ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Output: representing structural patterns
Output: representing structural patterns ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Classification rules ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The weather problem ,[object Object],… … … … … Yes False Normal Mild Rainy Yes False High Hot  Overcast  No True High  Hot  Sunny No False High Hot Sunny Play Windy Humidity Temperature Outlook If outlook = sunny and humidity = high then play = no If outlook = rainy and windy = true then play = no If outlook = overcast then play = yes If humidity = normal then play = yes If none of the above then play = yes
Weather data with mixed attributes ,[object Object],… … … … … Yes False 80 75 Rainy Yes False 86 83 Overcast  No True 90 80 Sunny No False 85 85 Sunny Play Windy Humidity Temperature Outlook If outlook = sunny and humidity > 83 then play = no If outlook = rainy and windy = true then play = no If outlook = overcast then play = yes If humidity < 85 then play = yes If none of the above then play = yes
Association rules ,[object Object],[object Object],[object Object],[object Object],[object Object]
Support and confidence of a rule ,[object Object],[object Object],[object Object],[object Object],[object Object],If temperature = cool then humidity = normal
Interpreting association rules ,[object Object],[object Object],[object Object],If windy = false and play = no then outlook = sunny    and humidity = high If windy = false and play = no then outlook = sunny  If windy = false and play = no then humidity = high If humidity = high and windy = false and play = no  then outlook = sunny
Decision trees ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Nominal and numeric attributes ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Missing values ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The contact lenses data None Reduced Yes Hypermetrope Pre-presbyopic  None Normal Yes Hypermetrope Pre-presbyopic None Reduced No Myope Presbyopic None Normal No Myope Presbyopic None Reduced Yes Myope Presbyopic Hard Normal Yes Myope Presbyopic None Reduced No Hypermetrope Presbyopic Soft Normal No Hypermetrope Presbyopic None Reduced Yes Hypermetrope Presbyopic None Normal Yes Hypermetrope Presbyopic Soft Normal No Hypermetrope Pre-presbyopic None Reduced No Hypermetrope Pre-presbyopic Hard Normal Yes Myope Pre-presbyopic None Reduced Yes Myope Pre-presbyopic Soft Normal No Myope Pre-presbyopic None Reduced No Myope Pre-presbyopic hard Normal Yes Hypermetrope Young None Reduced Yes Hypermetrope Young Soft Normal No Hypermetrope Young None Reduced No Hypermetrope Young Hard Normal Yes Myope Young None Reduced Yes Myope Young  Soft Normal No Myope Young None Reduced No Myope Young Recommended lenses Tear production rate Astigmatism Spectacle prescription Age
A complete and correct rule set If tear production rate = reduced then recommendation = none If age = young and astigmatic = no and tear production rate = normal then recommendation = soft If age = pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft If age = presbyopic and spectacle prescription = myope and astigmatic = no  then recommendation = none If spectacle prescription = hypermetrope and astigmatic = no and tear production rate = normal then recommendation = soft If spectacle prescription = myope and astigmatic = yes and tear production rate = normal then recommendation = hard If age young and astigmatic = yes  and tear production rate = normal then recommendation = hard If age = pre-presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none If age = presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
Classification vs. association rules ,[object Object],[object Object],If outlook = sunny and humidity = high then play = no If temperature = cool then humidity = normal If humidity = normal and windy = false then play = yes If outlook = sunny and play = no  then humidity = high If windy = false and play = no  then outlook = sunny and humidity = high
A decision tree for this problem
[object Object],[object Object],Predicting CPU performance 0 0 32 128 CHMAX 0 0 8 16 CHMIN Channels Performance Cache (Kb) Main memory (Kb) Cycle time (ns) 45 0 4000 1000 480 209 67 32 8000 512 480 208 … 269 32 32000 8000 29 2 198 256 6000 256 125 1 PRP CACH MMAX MMIN MYCT PRP = -55.9 + 0.0489 MYCT + 0.0153 MMIN + 0.0056 MMAX + 0.6410 CACH - 0.2700 CHMIN + 1.480 CHMAX
Linear regression for the CPU data PRP =  -56.1 + 0.049 MYCT + 0.015 MMIN + 0.006 MMAX + 0.630 CACH -  0.270 CHMIN + 1.46  CHMAX
Trees for numeric prediction ,[object Object],[object Object],[object Object],[object Object],[object Object]
Regression tree for the CPU data
Model tree for the CPU data
Instance-based representation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The distance function ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Representing clusters I Simple 2-D representation Venn diagram Overlapping clusters
Representing clusters II 1   2  3 a  0.4 0.1  0.5 b  0.1 0.8  0.1 c  0.3 0.3  0.4 d  0.1 0.1  0.8 e  0.4 0.2  0.4 f  0.1 0.4  0.5 g  0.7 0.2  0.1 h  0.5 0.4  0.1 … Probabilistic assignment Dendrogram NB: dendron is the Greek word for tree
Simplicity first ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Inferring rudimentary rules ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Pseudo-code for 1R ,[object Object],For each attribute, For each value of the attribute, make a rule as follows: count how often each class appears find the most frequent class make the rule assign that class to this attribute-value Calculate the error rate of the rules Choose the rules with the smallest error rate
Evaluating the weather attributes *  indicates a tie 3/6 True    No* 5/14 2/8 False    Yes Windy 1/7 Normal    Yes 4/14 3/7 High     No Humidity 5/14 4/14 Total errors 1/4 Cool     Yes 2/6 Mild     Yes 2/4 Hot    No* Temp 2/5 Rainy    Yes 0/4 Overcast    Yes 2/5 Sunny    No Outlook Errors Rules Attribute  No True High Mild Rainy Yes False Normal Hot Overcast Yes True High Mild Overcast Yes True Normal Mild Sunny Yes False Normal Mild Rainy Yes False Normal Cool Sunny No False High Mild Sunny Yes True Normal Cool Overcast No True Normal Cool Rainy Yes False Normal Cool Rainy Yes False High Mild Rainy Yes False High Hot  Overcast  No True High  Hot  Sunny No False High Hot Sunny Play Windy Humidity Temp Outlook
Constructing decision trees ,[object Object],[object Object],[object Object],[object Object],[object Object]
Which attribute to select?
Which attribute to select?
Criterion for attribute selection ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Computing information ,[object Object],[object Object],[object Object],[object Object]
Example: attribute  Outlook   ,[object Object],[object Object],[object Object],[object Object],Note: this is normally undefined.
Computing information gain ,[object Object],[object Object],gain( Outlook  )   = 0.247 bits gain( Temperature  )   = 0.029 bits gain( Humidity  )   = 0.152 bits gain( Windy  )   = 0.048 bits gain( Outlook  ) = info([9,5]) – info([2,3],[4,0],[3,2]) = 0.940 – 0.693 = 0.247 bits
Continuing to split gain( Temperature  ) = 0.571 bits gain( Humidity  )   = 0.971 bits gain( Windy  )   = 0.020 bits
Final decision tree ,[object Object],[object Object]
Covering algorithms: Ruler Learners ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example: generating a rule ,[object Object],[object Object],If x > 1.2 then class = a If x > 1.2 and y > 2.6 then class = a If true then class = a If x    1.2 then class = b If x > 1.2 and y    2.6 then class = b
Simple covering algorithm ,[object Object],[object Object],[object Object],[object Object]
Selecting a test ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example: contact lens data ,[object Object],[object Object],4/12 Tear production rate = Normal 0/12 Tear production rate = Reduced 4/12 Astigmatism = yes 0/12 Astigmatism = no 1/12 Spectacle prescription = Hypermetrope 3/12 Spectacle prescription = Myope 1/8 Age = Presbyopic 1/8 Age = Pre-presbyopic 2/8 Age = Young If ?   then recommendation = hard
Modified rule and resulting data ,[object Object],[object Object],None Reduced Yes Hypermetrope Pre-presbyopic  None Normal Yes Hypermetrope Pre-presbyopic None Reduced Yes Myope Presbyopic Hard Normal Yes Myope Presbyopic None Reduced Yes Hypermetrope Presbyopic None Normal Yes Hypermetrope Presbyopic Hard Normal Yes Myope Pre-presbyopic None Reduced Yes Myope Pre-presbyopic hard Normal Yes Hypermetrope Young None Reduced Yes Hypermetrope Young Hard Normal Yes Myope Young None Reduced Yes Myope Young  Recommended lenses Tear production rate Astigmatism Spectacle prescription Age If astigmatism = yes    then recommendation = hard
Further refinement ,[object Object],[object Object],4/6 Tear production rate = Normal 0/6 Tear production rate = Reduced 1/6 Spectacle prescription = Hypermetrope 3/6 Spectacle prescription = Myope 1/4 Age = Presbyopic 1/4 Age = Pre-presbyopic 2/4 Age = Young If astigmatism = yes   and ?    then recommendation = hard
Modified rule and resulting data ,[object Object],[object Object],None Normal Yes Hypermetrope Pre-presbyopic Hard Normal Yes Myope Presbyopic None Normal Yes Hypermetrope Presbyopic Hard Normal Yes Myope Pre-presbyopic hard Normal Yes Hypermetrope Young Hard Normal Yes Myope Young Recommended lenses Tear production rate Astigmatism Spectacle prescription Age If astigmatism = yes   and tear production rate = normal  then recommendation = hard
Further refinement ,[object Object],[object Object],[object Object],[object Object],1/3 Spectacle prescription = Hypermetrope 3/3 Spectacle prescription = Myope 1/2 Age = Presbyopic 1/2 Age = Pre-presbyopic 2/2 Age = Young If astigmatism = yes    and tear production rate = normal   and ? then recommendation = hard
The result ,[object Object],[object Object],[object Object],[object Object],If astigmatism = yes and tear production rate = normal and spectacle prescription = myope then recommendation = hard If age = young and astigmatism = yes and tear production rate = normal then recommendation = hard
Pseudo-code for PRISM For each class C Initialize E to the instance set While E contains instances in class C Create a rule R with an empty left-hand side that predicts class C Until R is perfect (or there are no more attributes to use) do For each attribute A not mentioned in R, and each value v, Consider adding the condition A = v to the left-hand side of R Select A and v to maximize the accuracy p/t (break ties by choosing the condition with the largest p) Add A = v to R Remove the instances covered by R from E
Separate and conquer ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Classification rules ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Other Approaches ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Credibility: Evaluating what’s been learned ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Evaluation: the key to success ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Issues in evaluation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Training and testing I ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Training and testing II ,[object Object],[object Object],[object Object],[object Object],[object Object]
Note on parameter tuning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Making the most of the data ,[object Object],[object Object],[object Object],[object Object],[object Object]
Predicting performance ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Confidence intervals ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Mean and variance ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Confidence limits ,[object Object],[object Object],[object Object],– 1  0  1  1.65 0.25 40% 0.84 20% 1.28 10% 1.65 5% 2.33 2.58 3.09 z 1% 0.5% 0.1% Pr[ X     z ]
Transforming  f ,[object Object],[object Object],[object Object]
Examples ,[object Object],[object Object],[object Object],[object Object]
Holdout estimation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Repeated holdout method ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Cross-validation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
More on cross-validation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Leave-One-Out cross-validation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Leave-One-Out-CV and stratification ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The bootstrap ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The 0.632 bootstrap ,[object Object],[object Object],[object Object],[object Object]
Estimating error with the bootstrap ,[object Object],[object Object],[object Object],[object Object],[object Object]
More on the bootstrap ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Comparing data mining schemes ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Comparing schemes II ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Paired t-test ,[object Object],[object Object],[object Object],[object Object],[object Object],William Gosset Born: 1876 in Canterbury; Died:  1937 in Beaconsfield, England Obtained a post as a chemist in the Guinness brewery in Dublin in 1899. Invented the t-test to handle small samples for quality control in brewing. Wrote under the name &quot;Student&quot;.
Distribution of the means ,[object Object],[object Object],[object Object],[object Object],[object Object]
Student’s distribution ,[object Object],[object Object],9 degrees of freedom  normal distribution Assuming we have 10 estimates 0.88 20% 1.38 10% 1.83 5% 2.82 3.25 4.30 z 1% 0.5% 0.1% Pr[ X     z ] 0.84 20% 1.28 10% 1.65 5% 2.33 2.58 3.09 z 1% 0.5% 0.1% Pr[ X     z ]
Distribution of the differences ,[object Object],[object Object],[object Object],[object Object],[object Object]
Performing the test ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Unpaired observations ,[object Object],[object Object],[object Object]
Dependent estimates ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Predicting probabilities ,[object Object],[object Object],[object Object],[object Object],[object Object]
Quadratic loss function ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Informational loss function ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Discussion ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Counting the cost ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Counting the cost ,[object Object],[object Object],Actual class True negative False positive No False negative True positive Yes No Yes Predicted class
Aside: the kappa statistic ,[object Object],[object Object],[object Object]
Classification with costs ,[object Object],[object Object],[object Object]
Cost-sensitive classification ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Cost-sensitive learning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Lift charts ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Generating a lift chart ,[object Object],[object Object],… … … Yes 0.88 4 No 0.93 3 Yes 0.93 2 Yes 0.95 1 Actual class Predicted probability
A hypothetical lift chart 40% of responses for 10% of cost 80% of responses for 40% of cost
ROC curves ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
A sample ROC curve ,[object Object],[object Object]
Cross-validation and ROC curves ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
ROC curves for two schemes ,[object Object],[object Object],[object Object]
The convex hull ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
More measures... ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Summary of  some  measures Explanation Plot Domain TP/(TP+FN) TP/(TP+FP) Recall Precision Information retrieval Recall-precision curve TP/(TP+FN) FP/(FP+TN) TP rate FP rate Communications ROC curve TP (TP+FP)/(TP+FP+TN+FN) TP  Subset size Marketing Lift chart
Cost curves ,[object Object],[object Object]
Cost curves: example with costs
Evaluating numeric prediction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Other measures ,[object Object],[object Object],[object Object]
Improvement on the mean ,[object Object],[object Object],[object Object]
Correlation coefficient ,[object Object],[object Object],[object Object]
Which measure? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],0.91 0.89 0.88 0.88 Correlation coefficient 30.4% 34.8% 40.1% 43.1% Relative absolute error 35.8% 39.4% 57.2% 42.2% Root rel squared error 29.2 33.4 38.5 41.3 Mean absolute error 57.4 63.3 91.7 67.8 Root mean-squared error D C B A
The MDL principle ,[object Object],[object Object],[object Object],[object Object],[object Object]
Model selection criteria ,[object Object],[object Object],[object Object],[object Object],[object Object],William of Ockham, born in the village of Ockham in Surrey (England) about 1285, was the most influential philosopher of the 14th century and a controversial theologian.
Elegance vs. errors ,[object Object],[object Object],[object Object],[object Object],[object Object]
MDL and compression ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
MDL and Bayes’s theorem ,[object Object],[object Object],[object Object],[object Object],[object Object],constant
MDL and MAP ,[object Object],[object Object],[object Object],[object Object],[object Object]
Discussion of MDL principle ,[object Object],[object Object],[object Object],[object Object],[object Object]
MDL and clustering ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

More Related Content

What's hot

Competitive Analysis in Pharmaceutical Industry
Competitive Analysis in Pharmaceutical IndustryCompetitive Analysis in Pharmaceutical Industry
Competitive Analysis in Pharmaceutical IndustryYee Jie NG
 
Therapeutic Drug Monitoring.pptx
Therapeutic Drug Monitoring.pptxTherapeutic Drug Monitoring.pptx
Therapeutic Drug Monitoring.pptxMangeshBansod2
 
Pharmaceutical labelling
Pharmaceutical labellingPharmaceutical labelling
Pharmaceutical labellingbvocmithilesh
 
Pharmaceutical Management: Product mix
Pharmaceutical Management: Product mixPharmaceutical Management: Product mix
Pharmaceutical Management: Product mixCarmela Alonzo
 
Searching literature databases for post authorisation safety studies (pass)
Searching literature databases for post authorisation safety studies (pass)Searching literature databases for post authorisation safety studies (pass)
Searching literature databases for post authorisation safety studies (pass)Ann-Marie Roche
 
Hospital formulary- By Dr.Irfanul Haque
Hospital formulary- By Dr.Irfanul HaqueHospital formulary- By Dr.Irfanul Haque
Hospital formulary- By Dr.Irfanul HaqueDr.Irfan shaikh
 
How and When to Kill a Program in New Product Planning
How and When to Kill a Program in New Product PlanningHow and When to Kill a Program in New Product Planning
How and When to Kill a Program in New Product PlanningAnthony Russell
 
Seguimiento farmacoterapéutico
Seguimiento farmacoterapéuticoSeguimiento farmacoterapéutico
Seguimiento farmacoterapéuticoAdrianatej
 
The role of a pharmacist
The role of a pharmacistThe role of a pharmacist
The role of a pharmacistjos manik
 
Corporate Research Project on Pharmaceutical Industry of Singapore
Corporate Research Project on Pharmaceutical Industry of SingaporeCorporate Research Project on Pharmaceutical Industry of Singapore
Corporate Research Project on Pharmaceutical Industry of SingaporeSanket Golechha
 
Sources of drug information
Sources of drug informationSources of drug information
Sources of drug informationArvind Kumar
 
Clinical trial supply management
Clinical trial supply managementClinical trial supply management
Clinical trial supply managementDipesh Pabrekar
 
Rational Drug use - Group 5 presentation 2.0 bello edit
Rational Drug use - Group 5 presentation 2.0 bello editRational Drug use - Group 5 presentation 2.0 bello edit
Rational Drug use - Group 5 presentation 2.0 bello editCipriano Bello
 
Proper Disposal of Unwanted Medicines with a Focus on the Aging. Source:U.S. ...
Proper Disposal of Unwanted Medicines with a Focus on the Aging. Source:U.S. ...Proper Disposal of Unwanted Medicines with a Focus on the Aging. Source:U.S. ...
Proper Disposal of Unwanted Medicines with a Focus on the Aging. Source:U.S. ...SerPIE Repository
 
The Statisticians Role in Pharmaceutical Development
The Statisticians Role in Pharmaceutical DevelopmentThe Statisticians Role in Pharmaceutical Development
The Statisticians Role in Pharmaceutical DevelopmentAshwani Dhingra
 
Literature monitoring for pv what are we doing at galderma elsevier webinar
Literature monitoring for pv   what are we doing at galderma elsevier webinarLiterature monitoring for pv   what are we doing at galderma elsevier webinar
Literature monitoring for pv what are we doing at galderma elsevier webinarAnn-Marie Roche
 
Trends in pharmaceutical industry
Trends in pharmaceutical industryTrends in pharmaceutical industry
Trends in pharmaceutical industryDarya Daoud
 

What's hot (20)

Competitive Analysis in Pharmaceutical Industry
Competitive Analysis in Pharmaceutical IndustryCompetitive Analysis in Pharmaceutical Industry
Competitive Analysis in Pharmaceutical Industry
 
Outsourcing in pharma
Outsourcing in pharmaOutsourcing in pharma
Outsourcing in pharma
 
Therapeutic Drug Monitoring.pptx
Therapeutic Drug Monitoring.pptxTherapeutic Drug Monitoring.pptx
Therapeutic Drug Monitoring.pptx
 
Pharmaceutical labelling
Pharmaceutical labellingPharmaceutical labelling
Pharmaceutical labelling
 
Pharmaceutical Management: Product mix
Pharmaceutical Management: Product mixPharmaceutical Management: Product mix
Pharmaceutical Management: Product mix
 
Searching literature databases for post authorisation safety studies (pass)
Searching literature databases for post authorisation safety studies (pass)Searching literature databases for post authorisation safety studies (pass)
Searching literature databases for post authorisation safety studies (pass)
 
Hospital formulary- By Dr.Irfanul Haque
Hospital formulary- By Dr.Irfanul HaqueHospital formulary- By Dr.Irfanul Haque
Hospital formulary- By Dr.Irfanul Haque
 
How and When to Kill a Program in New Product Planning
How and When to Kill a Program in New Product PlanningHow and When to Kill a Program in New Product Planning
How and When to Kill a Program in New Product Planning
 
Seguimiento farmacoterapéutico
Seguimiento farmacoterapéuticoSeguimiento farmacoterapéutico
Seguimiento farmacoterapéutico
 
The role of a pharmacist
The role of a pharmacistThe role of a pharmacist
The role of a pharmacist
 
Good pharmacy practice
Good pharmacy practiceGood pharmacy practice
Good pharmacy practice
 
Corporate Research Project on Pharmaceutical Industry of Singapore
Corporate Research Project on Pharmaceutical Industry of SingaporeCorporate Research Project on Pharmaceutical Industry of Singapore
Corporate Research Project on Pharmaceutical Industry of Singapore
 
Pharmacovigilance
PharmacovigilancePharmacovigilance
Pharmacovigilance
 
Sources of drug information
Sources of drug informationSources of drug information
Sources of drug information
 
Clinical trial supply management
Clinical trial supply managementClinical trial supply management
Clinical trial supply management
 
Rational Drug use - Group 5 presentation 2.0 bello edit
Rational Drug use - Group 5 presentation 2.0 bello editRational Drug use - Group 5 presentation 2.0 bello edit
Rational Drug use - Group 5 presentation 2.0 bello edit
 
Proper Disposal of Unwanted Medicines with a Focus on the Aging. Source:U.S. ...
Proper Disposal of Unwanted Medicines with a Focus on the Aging. Source:U.S. ...Proper Disposal of Unwanted Medicines with a Focus on the Aging. Source:U.S. ...
Proper Disposal of Unwanted Medicines with a Focus on the Aging. Source:U.S. ...
 
The Statisticians Role in Pharmaceutical Development
The Statisticians Role in Pharmaceutical DevelopmentThe Statisticians Role in Pharmaceutical Development
The Statisticians Role in Pharmaceutical Development
 
Literature monitoring for pv what are we doing at galderma elsevier webinar
Literature monitoring for pv   what are we doing at galderma elsevier webinarLiterature monitoring for pv   what are we doing at galderma elsevier webinar
Literature monitoring for pv what are we doing at galderma elsevier webinar
 
Trends in pharmaceutical industry
Trends in pharmaceutical industryTrends in pharmaceutical industry
Trends in pharmaceutical industry
 

Similar to Data Mining: Practical Machine Learning Tools and Techniques ...

Introduction
IntroductionIntroduction
Introductionbutest
 
Introduction
IntroductionIntroduction
Introductionbutest
 
Introduction
IntroductionIntroduction
Introductionbutest
 
Machine learning presentation (razi)
Machine learning presentation (razi)Machine learning presentation (razi)
Machine learning presentation (razi)Rizwan Shaukat
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchjim
 
Data Mining in Market Research
Data Mining in Market ResearchData Mining in Market Research
Data Mining in Market Researchbutest
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchkevinlan
 
slides
slidesslides
slidesbutest
 
slides
slidesslides
slidesbutest
 
www1.cs.columbia.edu
www1.cs.columbia.eduwww1.cs.columbia.edu
www1.cs.columbia.edubutest
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data MiningKai Koenig
 
Lecture 9 slides: Machine learning for Protein Structure ...
Lecture 9 slides: Machine learning for Protein Structure ...Lecture 9 slides: Machine learning for Protein Structure ...
Lecture 9 slides: Machine learning for Protein Structure ...butest
 
Brief Tour of Machine Learning
Brief Tour of Machine LearningBrief Tour of Machine Learning
Brief Tour of Machine Learningbutest
 
Data Mining
Data MiningData Mining
Data Miningshrapb
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationDr. Abdul Ahad Abro
 
WEKA: Data Mining Input Concepts Instances And Attributes
WEKA: Data Mining Input Concepts Instances And AttributesWEKA: Data Mining Input Concepts Instances And Attributes
WEKA: Data Mining Input Concepts Instances And AttributesDataminingTools Inc
 

Similar to Data Mining: Practical Machine Learning Tools and Techniques ... (20)

Introduction
IntroductionIntroduction
Introduction
 
Introduction
IntroductionIntroduction
Introduction
 
Introduction
IntroductionIntroduction
Introduction
 
Machine learning presentation (razi)
Machine learning presentation (razi)Machine learning presentation (razi)
Machine learning presentation (razi)
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Data Mining in Market Research
Data Mining in Market ResearchData Mining in Market Research
Data Mining in Market Research
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Chapter8.coding
Chapter8.codingChapter8.coding
Chapter8.coding
 
slides
slidesslides
slides
 
slides
slidesslides
slides
 
www1.cs.columbia.edu
www1.cs.columbia.eduwww1.cs.columbia.edu
www1.cs.columbia.edu
 
Talk
TalkTalk
Talk
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Text Analytics for Legal work
Text Analytics for Legal workText Analytics for Legal work
Text Analytics for Legal work
 
Lecture 9 slides: Machine learning for Protein Structure ...
Lecture 9 slides: Machine learning for Protein Structure ...Lecture 9 slides: Machine learning for Protein Structure ...
Lecture 9 slides: Machine learning for Protein Structure ...
 
Brief Tour of Machine Learning
Brief Tour of Machine LearningBrief Tour of Machine Learning
Brief Tour of Machine Learning
 
Data Mining
Data MiningData Mining
Data Mining
 
Lecture1
Lecture1Lecture1
Lecture1
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, Classification
 
WEKA: Data Mining Input Concepts Instances And Attributes
WEKA: Data Mining Input Concepts Instances And AttributesWEKA: Data Mining Input Concepts Instances And Attributes
WEKA: Data Mining Input Concepts Instances And Attributes
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Data Mining: Practical Machine Learning Tools and Techniques ...

  • 1. Data Mining: the Practice An Introduction Slides taken from : Data Mining by I. H. Witten and E. Frank
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17. Concepts, instances, attributes Slides for Chapter 2 of Data Mining by I. H. Witten and E. Frank
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34. The ARFF format % % ARFF file for weather data with some numeric features % @relation weather @attribute outlook {sunny, overcast, rainy} @attribute temperature numeric @attribute humidity numeric @attribute windy {true, false} @attribute play? {yes, no} @data sunny, 85, 85, false, no sunny, 80, 90, true, no overcast, 83, 86, false, yes ...
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52. The contact lenses data None Reduced Yes Hypermetrope Pre-presbyopic None Normal Yes Hypermetrope Pre-presbyopic None Reduced No Myope Presbyopic None Normal No Myope Presbyopic None Reduced Yes Myope Presbyopic Hard Normal Yes Myope Presbyopic None Reduced No Hypermetrope Presbyopic Soft Normal No Hypermetrope Presbyopic None Reduced Yes Hypermetrope Presbyopic None Normal Yes Hypermetrope Presbyopic Soft Normal No Hypermetrope Pre-presbyopic None Reduced No Hypermetrope Pre-presbyopic Hard Normal Yes Myope Pre-presbyopic None Reduced Yes Myope Pre-presbyopic Soft Normal No Myope Pre-presbyopic None Reduced No Myope Pre-presbyopic hard Normal Yes Hypermetrope Young None Reduced Yes Hypermetrope Young Soft Normal No Hypermetrope Young None Reduced No Hypermetrope Young Hard Normal Yes Myope Young None Reduced Yes Myope Young Soft Normal No Myope Young None Reduced No Myope Young Recommended lenses Tear production rate Astigmatism Spectacle prescription Age
  • 53. A complete and correct rule set If tear production rate = reduced then recommendation = none If age = young and astigmatic = no and tear production rate = normal then recommendation = soft If age = pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft If age = presbyopic and spectacle prescription = myope and astigmatic = no then recommendation = none If spectacle prescription = hypermetrope and astigmatic = no and tear production rate = normal then recommendation = soft If spectacle prescription = myope and astigmatic = yes and tear production rate = normal then recommendation = hard If age young and astigmatic = yes and tear production rate = normal then recommendation = hard If age = pre-presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none If age = presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
  • 54.
  • 55. A decision tree for this problem
  • 56.
  • 57. Linear regression for the CPU data PRP = -56.1 + 0.049 MYCT + 0.015 MMIN + 0.006 MMAX + 0.630 CACH - 0.270 CHMIN + 1.46 CHMAX
  • 58.
  • 59. Regression tree for the CPU data
  • 60. Model tree for the CPU data
  • 61.
  • 62.
  • 63. Representing clusters I Simple 2-D representation Venn diagram Overlapping clusters
  • 64. Representing clusters II 1 2 3 a 0.4 0.1 0.5 b 0.1 0.8 0.1 c 0.3 0.3 0.4 d 0.1 0.1 0.8 e 0.4 0.2 0.4 f 0.1 0.4 0.5 g 0.7 0.2 0.1 h 0.5 0.4 0.1 … Probabilistic assignment Dendrogram NB: dendron is the Greek word for tree
  • 65.
  • 66.
  • 67.
  • 68. Evaluating the weather attributes * indicates a tie 3/6 True  No* 5/14 2/8 False  Yes Windy 1/7 Normal  Yes 4/14 3/7 High  No Humidity 5/14 4/14 Total errors 1/4 Cool  Yes 2/6 Mild  Yes 2/4 Hot  No* Temp 2/5 Rainy  Yes 0/4 Overcast  Yes 2/5 Sunny  No Outlook Errors Rules Attribute No True High Mild Rainy Yes False Normal Hot Overcast Yes True High Mild Overcast Yes True Normal Mild Sunny Yes False Normal Mild Rainy Yes False Normal Cool Sunny No False High Mild Sunny Yes True Normal Cool Overcast No True Normal Cool Rainy Yes False Normal Cool Rainy Yes False High Mild Rainy Yes False High Hot Overcast No True High Hot Sunny No False High Hot Sunny Play Windy Humidity Temp Outlook
  • 69.
  • 72.
  • 73.
  • 74.
  • 75.
  • 76. Continuing to split gain( Temperature ) = 0.571 bits gain( Humidity ) = 0.971 bits gain( Windy ) = 0.020 bits
  • 77.
  • 78.
  • 79.
  • 80.
  • 81.
  • 82.
  • 83.
  • 84.
  • 85.
  • 86.
  • 87.
  • 88. Pseudo-code for PRISM For each class C Initialize E to the instance set While E contains instances in class C Create a rule R with an empty left-hand side that predicts class C Until R is perfect (or there are no more attributes to use) do For each attribute A not mentioned in R, and each value v, Consider adding the condition A = v to the left-hand side of R Select A and v to maximize the accuracy p/t (break ties by choosing the condition with the largest p) Add A = v to R Remove the instances covered by R from E
  • 89.
  • 90.
  • 91.
  • 92.
  • 93.
  • 94.
  • 95.
  • 96.
  • 97.
  • 98.
  • 99.
  • 100.
  • 101.
  • 102.
  • 103.
  • 104.
  • 105.
  • 106.
  • 107.
  • 108.
  • 109.
  • 110.
  • 111.
  • 112.
  • 113.
  • 114.
  • 115.
  • 116.
  • 117.
  • 118.
  • 119.
  • 120.
  • 121.
  • 122.
  • 123.
  • 124.
  • 125.
  • 126.
  • 127.
  • 128.
  • 129.
  • 130.
  • 131.
  • 132.
  • 133.
  • 134.
  • 135.
  • 136. A hypothetical lift chart 40% of responses for 10% of cost 80% of responses for 40% of cost
  • 137.
  • 138.
  • 139.
  • 140.
  • 141.
  • 142.
  • 143. Summary of some measures Explanation Plot Domain TP/(TP+FN) TP/(TP+FP) Recall Precision Information retrieval Recall-precision curve TP/(TP+FN) FP/(FP+TN) TP rate FP rate Communications ROC curve TP (TP+FP)/(TP+FP+TN+FN) TP Subset size Marketing Lift chart
  • 144.
  • 145. Cost curves: example with costs
  • 146.
  • 147.
  • 148.
  • 149.
  • 150.
  • 151.
  • 152.
  • 153.
  • 154.
  • 155.
  • 156.
  • 157.
  • 158.