Learning from Examples: Standard Methodology for Evaluation
Using Tuning Sets
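Experimental Methodology: A Pictorial Overview
[Figure: a collection of classified examples is divided into training examples and testing examples. The training examples are further divided into a train’ set and a tune set. The LEARNER generates solutions from the train’ set and selects the best using the tune set. The resulting classifier is run on the testing examples to estimate expected accuracy on future examples.]
Statistical techniques such as 10-fold cross validation and t-tests are used to get meaningful results.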
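The three-way split in the overview can be sketched in a few lines. This is only an illustration: the function name `split_examples`, the 20% fractions, and the fixed seed are assumptions made here, not values given on the slide.

```python
import random


def split_examples(examples, test_frac=0.2, tune_frac=0.2, seed=0):
    """Shuffle once, then carve out test, tune, and train' sets.

    The fractions and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)

    # Testing examples are set aside first and never used for learning.
    n_test = int(len(shuffled) * test_frac)
    test = shuffled[:n_test]
    training = shuffled[n_test:]

    # The remaining training examples are divided into tune and train' sets.
    n_tune = int(len(training) * tune_frac)
    tune = training[:n_tune]
    train_prime = training[n_tune:]
    return train_prime, tune, test


train_prime, tune, test = split_examples(range(100))
print(len(train_prime), len(tune), len(test))
```

The key property is that the three sets are disjoint, so the testing examples play no part in either learning or parameter selection.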
Proper Experimental Methodology Can Have a Huge Impact!
Parameter Setting
Using Multiple Tuning Sets
Tuning a Parameter - Sample Usage
[Figure: for each candidate value of K (K=0, K=2, …, K=100), runs 1 through 10 each divide the data into train and tune sets, and the tune set accuracy averaged over the 10 runs is reported for that K. Values shown: 92%, 97%, …, 80%.]
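A minimal sketch of this tuning loop follows. It assumes the parameter being tuned is the k of a toy 1-D k-nearest-neighbour classifier; `tune_parameter`, `fit_knn`, `accuracy`, the candidate values, and the synthetic data are all hypothetical names and choices made for illustration.

```python
import random
from statistics import mean


def tune_parameter(training, candidates, fit, accuracy, runs=10, tune_frac=0.2, seed=0):
    """Pick the candidate with the best tune-set accuracy averaged over `runs` splits."""
    rng = random.Random(seed)
    splits = []
    for _ in range(runs):
        shuffled = list(training)
        rng.shuffle(shuffled)
        n_tune = max(1, int(len(shuffled) * tune_frac))
        # (train' set, tune set) for this run
        splits.append((shuffled[n_tune:], shuffled[:n_tune]))

    averages = {}
    for k in candidates:
        averages[k] = mean(accuracy(fit(train, k), tune) for train, tune in splits)
    best = max(averages, key=averages.get)
    return best, averages


def fit_knn(train, k):
    """Toy 1-D k-NN: majority vote among the k nearest training examples."""
    def predict(x):
        nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
        votes = sum(1 if label else -1 for _, label in nearest)
        return votes > 0
    return predict


def accuracy(model, examples):
    return mean(1.0 if model(x) == y else 0.0 for x, y in examples)


# Synthetic training data (an assumption): the label is True when x >= 20.
data = [(x, x >= 20) for x in range(40)]
best_k, averages = tune_parameter(data, [1, 3, 5], fit_knn, accuracy)
print(best_k, averages)
```

Every candidate is scored on the same set of splits, so differences in the averages reflect the parameter rather than the luck of a particular split.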
What to Do for the FIELDED System?
What’s Wrong with This?
Why Not Learn After Each Test Example?
Choosing a Good N for CV (from Weiss & Kulikowski Textbook)
Recap: N-fold Cross Validation
[Figure: the examples partitioned into Fold 1 through Fold 5.]
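As a rough sketch of N-fold cross validation, the generator below partitions the examples into folds and, for each fold in turn, holds it out for testing while training on the rest. The function name `cross_validation_folds` is hypothetical, and the sketch assumes the examples have already been shuffled.

```python
def cross_validation_folds(examples, n_folds):
    """Yield (train, test) pairs; each example lands in exactly one test fold."""
    examples = list(examples)
    # Deal examples round-robin into n_folds folds of near-equal size.
    folds = [examples[i::n_folds] for i in range(n_folds)]
    for i, test_fold in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test_fold


pairs = list(cross_validation_folds(range(10), n_folds=5))
for train, test in pairs:
    print(sorted(train), sorted(test))
```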
Confusion Matrices - Useful Way to Report TESTSET Errors
Useful for NETtalk testbed – task of pronouncing written words
Scatter Plots - Compare Two Algo’s on Many Datasets
[Figure: scatter plot of Algo A’s Error Rate against Algo B’s Error Rate.]
Each dot is the error rate of the two algo’s on ONE dataset.
Statistical Analysis of Sampling Effects
The Binomial Distribution
Using the Binomial
Central Limit Theorem
[Figure: distribution of Ave Y over N trials (repeated many times), on an axis from 0 to 1.]
Surprisingly, N = 30 is large enough! (in most cases at least) - see pg 132 of textbook
Confidence Intervals
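The slide's own formula did not survive extraction, so the following is only a sketch of the usual normal-approximation interval for a test-set accuracy, which the Central Limit Theorem justifies when the test set is reasonably large. The function name and the 85-out-of-100 figures are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def accuracy_confidence_interval(n_correct, n_test, confidence=0.95):
    """Normal approximation to the binomial: p +/- z * sqrt(p (1 - p) / n)."""
    p = n_correct / n_test
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(p * (1 - p) / n_test)
    return p - half_width, p + half_width


low, high = accuracy_confidence_interval(85, 100)
print(round(low, 3), round(high, 3))
```

The approximation is poor for very small test sets or for accuracies close to 0 or 1.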
As You Already Learned in “Stat 101”
The Remaining Details
Alg 1 vs. Alg 2
Leave-One-Out: Sign Test
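The details of this slide were lost in extraction. As a sketch of a standard two-sided sign test: count the held-out examples on which only one of the two algorithms is correct, and ask how likely so lopsided a count would be if each algorithm were equally likely to win. The function name and the win counts below are made up for illustration.

```python
from math import comb


def sign_test_p_value(wins_a, wins_b):
    """Two-sided sign test; examples where both algorithms agree are ignored."""
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # Probability of k or more wins out of n under a fair coin.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_lopsided = sign_test_p_value(8, 2)
p_even = sign_test_p_value(5, 5)
print(p_lopsided, p_even)
```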
What about 10-fold?
Paired Student t-tests
Paired Student t-Tests (cont.)
The Random Variable in the t-Test
More on the Paired t-Test
The Null Hypothesis Graphically (View #1)
1. Assume zero mean and use the sample’s variance (sample = experiment).
[Figure: distribution P(δ) over δ, with ½ (1 – M) probability mass in each tail (i.e., M inside); arrows mark the two tails.]
Typically M = 0.95.
Does our measured δ lie in the regions indicated by arrows? If so, reject the null hypothesis, since it is unlikely we’d get such a δ by chance.
View #2 – The Confidence Interval for δ
2. Use the sample’s mean and variance.
[Figure: distribution P(δ) over δ.]
Is zero in the M % of probability mass? If NOT, reject the null hypothesis.
The t-test Confidence Interval
See if the interval contains ZERO. If not, we can reject the NULL HYPOTHESIS (i.e., that algorithms A & B perform equivalently).
* Hence if N is the typical 10, our dataset must have ≥ 300 examples.
The t-Test Calculation
See Table 5.6 in Mitchell.
We don’t know an analytical expression for the variance, so we need to estimate it from the data.
The t-test Calculation (cont.) - Using View #2 (get same result using View #1)
[Figure: PDF over δ.]
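A rough sketch of the paired t-test calculation under View #2: compute the per-fold differences δ_i, estimate their mean and the standard error of that mean from the data, and check whether the resulting confidence interval contains zero. The per-fold accuracies are invented for illustration, and the critical value 2.262 (95% two-sided, 9 degrees of freedom, i.e. N = 10 folds) is supplied as a constant because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_interval(acc_a, acc_b, t_crit=2.262):
    """Return (low, high, t) for the mean per-fold difference delta = A - B.

    t_crit defaults to the 95% two-sided value for 9 degrees of freedom,
    which is an assumption tied to using 10 folds.
    """
    deltas = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(deltas)
    delta_bar = mean(deltas)
    # Standard error of the mean difference, estimated from the sample.
    std_err = stdev(deltas) / sqrt(n)
    return delta_bar - t_crit * std_err, delta_bar + t_crit * std_err, delta_bar / std_err


# Hypothetical accuracies of algorithms A and B on the same 10 test folds.
acc_a = [0.82, 0.84, 0.82, 0.84, 0.82, 0.84, 0.82, 0.84, 0.82, 0.84]
acc_b = [0.80] * 10
low, high, t = paired_t_interval(acc_a, acc_b)
print(low, high, t)
print("reject null hypothesis" if not (low <= 0.0 <= high) else "cannot reject")
```

With these made-up numbers the interval lies entirely above zero, so the null hypothesis that A and B perform equivalently would be rejected at the 95% level.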
Some Jargon: P-values (Uses View #1)
[Figure: NULL HYPO DISTRIBUTION with P marked.]
From Wikipedia (http://en.wikipedia.org/wiki/P-value)
“Accepting” the Null Hypothesis
More on the t-Distribution
[Figure: the Gaussian compared with t_N; there is a different curve for each N.]
Some Assumptions Underlying our Calculations
Stability
Stability = how much the model an algorithm learns changes due to minor perturbations of the training set.
Paired t-test assumptions are a better match to stable algorithms.
Example: k-NN; the higher the k, the more stable.
More on Paired t-Test Assumption
Ideally, train on one data set and then do a 10-fold paired t-test.
What we should do: train, then test1 … test10 (the learned model does not vary while we’re measuring its performance).
What we usually do: train1 test1 … train10 test10.
However, there is usually not enough data to do the ideal.
If we assume that the train data is part of each paired experiment, then we violate independence assumptions: each train set overlaps 90% with every other train set.
The Great Debate (or one of them, at least)
One vs. Two-Tailed Graphically
[Figure: P(x) plotted against x for a One-Tailed Test and a Two-Tailed Test, with tail areas labeled 2.5%.]
The Great Debate (More)
See http://www.psychstat.missouristate.edu/introbook/sbk25m.htm
By being more confident, it is easier to show significance!
Two Sided vs. One Sided
[Figure: the measured mean, with mean - x and mean + x marked.]
Two Sided vs. One Sided
[Figure: region labeled 85%.]
Two Sided vs. One Sided
[Figure: axis labeled A - B.]
Two Sided vs. One Sided
[Figure: axis labeled A - B.]
Contingency Tables
Counts of occurrences, n(algorithm answer, true answer):

                        True Answer +           True Answer -
Algorithm Answer +      n(1,1) [true pos]       n(1,0) [false pos]
Algorithm Answer -      n(0,1) [false neg]      n(0,0) [true neg]

TPR and FPR
True Positive Rate (TPR) = n(1,1) / ( n(1,1) + n(0,1) )
  = correctly categorized +’s / total positives
  i.e., P(algo outputs + | + is correct)
False Positive Rate (FPR) = n(1,0) / ( n(1,0) + n(0,0) )
  = incorrectly categorized –’s / total neg’s
  i.e., P(algo outputs + | - is correct)
Can similarly define False Negative Rate and True Negative Rate.
See http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
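The four rates follow directly from the contingency-table counts. A small sketch, with a hypothetical function name and illustrative counts:

```python
def rates(true_pos, false_pos, false_neg, true_neg):
    """Compute TPR, FPR, FNR, and TNR from contingency-table counts."""
    total_pos = true_pos + false_neg   # n(1,1) + n(0,1)
    total_neg = false_pos + true_neg   # n(1,0) + n(0,0)
    return {
        "TPR": true_pos / total_pos,
        "FPR": false_pos / total_neg,
        "FNR": false_neg / total_pos,
        "TNR": true_neg / total_neg,
    }


# Illustrative counts: 100 positives and 1000 negatives.
r = rates(true_pos=75, false_pos=100, false_neg=25, true_neg=900)
print(r)
```

TPR and FNR sum to 1, as do FPR and TNR, because each pair divides up the same class of examples.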
ROC Curves
ROC Curves Graphically
[Figure: ROC space. x-axis: False positives rate, Prob (alg outputs + | - is correct), up to 1.0. y-axis: True positives rate, Prob (alg outputs + | + is correct), up to 1.0. Curves for Alg 1 and Alg 2 are shown, with the Ideal Spot marked.]
Different algorithms can work better in different parts of ROC space. This depends on the cost of false + vs. false -.
Creating an ROC Curve - the Standard Approach
Algo for Creating ROC Curves (one possibility; use it on HW2)
Plotting ROC Curves - Example

Example   ML Algo Output (Sorted)   Correct Category
Ex 9      .99                       +
Ex 7      .98                       +
Ex 1      .72                       -
Ex 2      .70                       +
Ex 6      .65                       +
Ex 10     .51                       -
Ex 3      .39                       -
Ex 5      .24                       +
Ex 4      .11                       -
Ex 8      .01                       -

[Figure: ROC plot of P(alg outputs + | + is correct) against P(alg outputs + | - is correct), both axes up to 1.0, with the following points marked.]
TPR=(2/5), FPR=(0/5)
TPR=(2/5), FPR=(1/5)
TPR=(4/5), FPR=(1/5)
TPR=(4/5), FPR=(3/5)
TPR=(5/5), FPR=(3/5)
TPR=(5/5), FPR=(5/5)
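The ten-example table can be turned into ROC points with a short sketch. This is one possible procedure, not necessarily the algorithm given in the lecture: walk down the examples in decreasing order of output and emit a point at the end of each run of same-class examples. The function name is hypothetical, and tied scores are not handled.

```python
def roc_points(scored):
    """scored: list of (algorithm output, is_positive). Returns (FPR, TPR) points."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, is_pos in ranked if is_pos)
    n_neg = len(ranked) - n_pos

    points = []
    tp = fp = 0
    for i, (_, is_pos) in enumerate(ranked):
        if is_pos:
            tp += 1
        else:
            fp += 1
        # Emit a point where the class changes (or at the end of the list).
        last_of_run = i + 1 == len(ranked) or ranked[i + 1][1] != is_pos
        if last_of_run:
            points.append((fp / n_neg, tp / n_pos))
    return points


# The ten examples from the slide: (output, correct category is +).
examples = [
    (0.99, True), (0.98, True), (0.72, False), (0.70, True), (0.65, True),
    (0.51, False), (0.39, False), (0.24, True), (0.11, False), (0.01, False),
]
points = roc_points(examples)
for fpr, tpr in points:
    print(f"TPR={tpr:.1f}, FPR={fpr:.1f}")
```

On this data the sketch yields the same six (TPR, FPR) pairs that are marked on the slide's plot.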
ROC’s and Many Models (not in the ensemble sense)
Area Under ROC Curve
[Figure: ROC plot of True positives against False positives, both axes up to 1.0.]
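A minimal sketch of computing the area under an ROC curve with the trapezoid rule. The point list below is the staircase traced by the ten-example ROC plotting example earlier in the deck, with the origin added; the function name is an assumption.

```python
def area_under_curve(points):
    """Trapezoid-rule area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# (FPR, TPR) points, starting from the origin.
roc = [(0.0, 0.0), (0.0, 0.4), (0.2, 0.4), (0.2, 0.8), (0.6, 0.8), (0.6, 1.0), (1.0, 1.0)]
auc = area_under_curve(roc)
print(round(auc, 3))
```

The result, 0.8, agrees with the ranking interpretation of ROC area: of the 25 positive-negative pairs in that example, 20 have the positive ranked above the negative.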
Asymmetric Error Costs
ROC’s & Skewed Data
Precision vs. Recall (think about search engines)
Precision vs. Recall
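The slide's definitions were lost in extraction. As a sketch using the standard definitions, precision is the fraction of items labeled + that really are +, and recall is the fraction of true +'s that were found (recall is the same quantity as TPR). The function name and counts are illustrative assumptions.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Illustrative counts: 20 positives in total, 10 items labeled + of which 5 are right.
precision, recall = precision_recall(true_pos=5, false_pos=5, false_neg=15)
print(precision, recall)
```

Neither quantity uses the true-negative count, which is why precision and recall are often preferred when negatives vastly outnumber positives.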
ROC vs. Recall-Precision
[Figure: ROC plot, P ( + | + ) against P ( + | - ), vs. Recall-Precision plot, Precision against Recall.]
The reason for this is that there may be lots of – ex’s (e.g., might need to include 100 neg’s to get 1 more pos).
Recall-Precision Curves
[Figure: Precision against Recall, with a point marked x.]
Interpolating in PR Space
The Relationship between Precision-Recall and ROC Curves
Jesse Davis & Mark Goadrich
Department of Computer Sciences, University of Wisconsin
Four Questions about PR space and ROC space
Definition: Dominance
Definition: Area Under the Curve (AUC)
[Figure: two panels, Precision against Recall and TPR against FPR.]
How do we evaluate ML algorithms?
Two Highly Skewed Domains
Is an abnormality on a mammogram benign or malignant?
Do these two identities refer to the same person?
Diagnosing Breast Cancer [Real Data: Davis et al. IJCAI 2005]
Predicting Aliases [Synthetic data:  Davis et al. ICIA 2005]
A1: Dominance Theorem For a fixed number of positive and negative examples, one curve dominates another curve in ROC space if and only if the first curve dominates the second curve in PR space
Q2: What is the “best” PR curve?
Convex Hull
A2: Achievable Curve
Constructing the Achievable Curve
Q3: Interpolation
[Figure: points A and B in ROC space (TPR against FPR).]
Linear Interpolation Not Achievable in PR Space

Example Counts      ROC Curves             PR Curves
TP       FP         TP Rate    FP Rate     Recall    Prec
500      500        0.50       0.06        0.50      0.50
750      4750       0.75       0.53        0.75      0.14
1000     9000       1.00       1.00        1.00      0.10
Example Interpolation
Q: For each extra TP covered, how many FPs do you cover?
A: $(FP_B - FP_A) / (TP_B - TP_A)$

      TP    FP    REC     PREC
A     5     5     0.25    0.5
B     10    30    0.5     0.25

Example Interpolation

      TP    FP    REC     PREC
A     5     5     0.25    0.5
B     10    30    0.5     0.25

Example Interpolation
A dataset with 20 positive and 2000 negative examples

      TP    FP    REC     PREC
A     5     5     0.25    0.5
.     6     10    0.3     0.375
B     10    30    0.5     0.25

Example Interpolation
A dataset with 20 positive and 2000 negative examples

      TP    FP    REC     PREC
A     5     5     0.25    0.5
.     6     10    0.3     0.375
.     7     15    0.35    0.318
.     8     20    0.4     0.286
.     9     25    0.45    0.265
B     10    30    0.5     0.25
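The interpolation table can be reproduced with a short sketch: for each extra true positive between A and B, add (FP_B - FP_A) / (TP_B - TP_A) false positives, then recompute recall and precision. The function name is an assumption; the counts are the ones on the slide.

```python
def interpolate_pr(tp_a, fp_a, tp_b, fp_b, total_pos):
    """Interpolate between PR points A and B by stepping one TP at a time.

    Returns a list of (TP, FP, recall, precision) rows, including A and B.
    """
    # How many false positives are covered per extra true positive.
    fp_per_tp = (fp_b - fp_a) / (tp_b - tp_a)
    rows = []
    for step in range(tp_b - tp_a + 1):
        tp = tp_a + step
        fp = fp_a + fp_per_tp * step
        rows.append((tp, fp, tp / total_pos, tp / (tp + fp)))
    return rows


# A = (5 TP, 5 FP), B = (10 TP, 30 FP), in a dataset with 20 positives.
rows = interpolate_pr(tp_a=5, fp_a=5, tp_b=10, fp_b=30, total_pos=20)
for tp, fp, rec, prec in rows:
    print(tp, fp, round(rec, 2), round(prec, 3))
```

Precision falls from 0.5 to 0.375 with the first extra true positive and flattens out after that, so the interpolated points lie below the straight line joining A and B in PR space. Drawing that straight line would overstate performance.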
Optimizing AUC
Back to Q1
Dominance Theorem For a fixed number of positive and negative examples, one curve dominates another curve in ROC space if and only if the first curve dominates the second curve in Precision-Recall space
For Fixed N, P and TPR: FPR  Precision (Not =)

                        True Answer + (P)    True Answer - (N)
Algorithm Answer +      75                   100
Algorithm Answer -      25                   900
Conclusions about PR and ROC Curves

MLlectureMethod.ppt

  • 1.
  • 2.
  • 3. Experimental Methodology: A Pictorial Overview generate solutions select best LEARNER training examples train’ set tune set testing examples classifier expected accuracy on future examples collection of classified examples Statistical techniques such as 10-fold cross validation and t -tests are used to get meaningful results
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. Confusion Matrices - Useful Way to Report TESTSET Errors Useful for NETtalk testbed – task of pronouncing written words
  • 14. Scatter Plots - Compare Two Algo’s on Many Datasets Algo A’s Error Rate Algo B’s Error Rate Each dot is the error rate of the two algo’s on ONE dataset
  • 15.
  • 16.
  • 17.
  • 18.
  • 20.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29. The Null Hypothesis Graphically (View #1). [Plot: P(δ) vs. δ.] Assume zero mean and use the sample’s variance (sample = experiment). There is ½(1 – M) probability mass in each tail (i.e., M inside); typically M = 0.95. Does our measured δ lie in the tail regions indicated by arrows? If so, reject the null hypothesis, since it is unlikely we’d get such a δ by chance.
  • 30. View #2 – The Confidence Interval for δ. [Plot: P(δ) vs. δ.] Use the sample’s mean and variance. Is zero in the M% of probability mass? If NOT, reject the null hypothesis.
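The two views above can be sketched for a paired comparison over 10 folds. This is a minimal illustration, not the lecture's own code: the function name `paired_t_views` and the accuracy lists are hypothetical, and the critical value 2.262 is the standard two-tailed Student's t value for 9 degrees of freedom at M = 0.95, hard-coded so that only the standard library is needed.

```python
import math
import statistics

# Two-tailed critical value of Student's t, 9 degrees of freedom, M = 0.95.
# Hard-coded assumption: it is only valid for exactly 10 paired measurements.
T_CRIT_DF9 = 2.262


def paired_t_views(acc_a, acc_b, t_crit=T_CRIT_DF9):
    """Return (mean delta, confidence interval, reject by view 1, reject by view 2)."""
    deltas = [a - b for a, b in zip(acc_a, acc_b)]
    mean = statistics.mean(deltas)
    # Standard error of the mean difference, from the sample's variance.
    sem = statistics.stdev(deltas) / math.sqrt(len(deltas))
    half_width = t_crit * sem

    # View 1: centre the distribution at zero; does the measured delta fall in a tail?
    reject_view1 = abs(mean) > half_width

    # View 2: centre the interval at the measured delta; is zero outside it?
    interval = (mean - half_width, mean + half_width)
    reject_view2 = not (interval[0] <= 0.0 <= interval[1])

    return mean, interval, reject_view1, reject_view2


# Hypothetical per-fold test accuracies for two algorithms on the same 10 folds.
algo_a = [0.90, 0.92, 0.88, 0.91, 0.93, 0.89, 0.90, 0.92, 0.91, 0.90]
algo_b = [0.85, 0.88, 0.86, 0.87, 0.90, 0.84, 0.86, 0.89, 0.88, 0.85]
mean_delta, ci, r1, r2 = paired_t_views(algo_a, algo_b)
```

Both views use the same half-width, so they always reach the same decision; they differ only in where the distribution is centred.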
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39. Stability. Stability = how much the model an algorithm learns changes due to minor perturbations of the training set. Paired t-test assumptions are a better match to stable algorithms. Example: k-NN; the higher the k, the more stable.
  • 40. More on Paired t-Test Assumption. Ideally, train on one data set and then do a 10-fold paired t-test, so that the learned model does not vary while we’re measuring its performance. What we should do: train, test1 … test10. What we usually do: train1 test1 … train10 test10. However, there is usually not enough data to do the ideal. If we assume that the train data is part of each paired experiment, then we violate independence assumptions: each train set overlaps 90% with every other train set.
  • 41.
  • 42. One vs. Two-Tailed Graphically. [Plots of P(x) vs. x: One-Tailed Test and Two-Tailed Test, with tail regions labeled 2.5%.]
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48. Contingency Tables: counts of occurrences
                                True Answer +          True Answer -
        Algorithm Answer +      n(1,1) [true pos]      n(1,0) [false pos]
        Algorithm Answer -      n(0,1) [false neg]     n(0,0) [true neg]
  • 49. TPR and FPR. True Positive Rate (TPR) = n(1,1) / ( n(1,1) + n(0,1) ) = correctly categorized +’s / total positives ≡ P(algo outputs + | + is correct). False Positive Rate (FPR) = n(1,0) / ( n(1,0) + n(0,0) ) = incorrectly categorized –’s / total neg’s ≡ P(algo outputs + | - is correct). Can similarly define False Negative Rate and True Negative Rate. See http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
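The rate definitions above translate directly into code. The sketch below is illustrative only: the function name `rates` and the example counts are made up, and the argument names follow the slide's n(algorithm answer, true answer) notation.

```python
def rates(n11, n10, n01, n00):
    """Compute the four rates from contingency-table counts.

    n(i, j): i is the algorithm's answer, j is the true answer (1 = +, 0 = -).
    Assumes there is at least one true positive example and one true negative.
    """
    total_pos = n11 + n01  # all examples whose true answer is +
    total_neg = n10 + n00  # all examples whose true answer is -
    return {
        "TPR": n11 / total_pos,  # P(algo outputs + | + is correct)
        "FPR": n10 / total_neg,  # P(algo outputs + | - is correct)
        "FNR": n01 / total_pos,  # P(algo outputs - | + is correct)
        "TNR": n00 / total_neg,  # P(algo outputs - | - is correct)
    }


# Hypothetical counts: 50 positives (40 found), 100 negatives (10 flagged by mistake).
example = rates(n11=40, n10=10, n01=10, n00=90)
```

Because each pair of rates is conditioned on the same true answer, TPR + FNR = 1 and FPR + TNR = 1.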
  • 50.
  • 51. ROC Curves Graphically. [Plot: true positives rate, Prob(alg outputs + | + is correct), vs. false positives rate, Prob(alg outputs + | - is correct), each axis running to 1.0; curves for Alg 1 and Alg 2; the Ideal Spot is marked.] Different algorithms can work better in different parts of ROC space. This depends on the cost of false + vs. false -.
  • 52.
  • 53.
  • 54. Plotting ROC Curves - Example
        Example   ML Algo Output (Sorted)   Correct Category
        Ex 9      .99                       +
        Ex 7      .98                       +
        Ex 1      .72                       -
        Ex 2      .70                       +
        Ex 6      .65                       +
        Ex 10     .51                       -
        Ex 3      .39                       -
        Ex 5      .24                       +
        Ex 4      .11                       -
        Ex 8      .01                       -
    [Plot: P(alg outputs + | + is correct) vs. P(alg outputs + | - is correct), each axis running to 1.0.] Points on the curve: TPR=(2/5), FPR=(0/5); TPR=(2/5), FPR=(1/5); TPR=(4/5), FPR=(1/5); TPR=(4/5), FPR=(3/5); TPR=(5/5), FPR=(3/5); TPR=(5/5), FPR=(5/5).
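The points on this slide can be reproduced by walking down the sorted outputs and treating each position as a threshold. The following is a minimal sketch under two assumptions: there are no tied scores and both classes are present. The function name `roc_points` is hypothetical; the data are the ten examples from the slide.

```python
# (algorithm output, correct category) for Ex 9, 7, 1, 2, 6, 10, 3, 5, 4, 8.
EXAMPLES = [
    (0.99, "+"), (0.98, "+"), (0.72, "-"), (0.70, "+"), (0.65, "+"),
    (0.51, "-"), (0.39, "-"), (0.24, "+"), (0.11, "-"), (0.01, "-"),
]


def roc_points(scored):
    """Return (FPR, TPR) after each example, scanning from highest output to lowest."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, label in ranked if label == "+")
    n_neg = len(ranked) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every output: nothing is labeled +
    for _, label in ranked:
        # Lowering the threshold past this example labels it +.
        if label == "+":
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


points = roc_points(EXAMPLES)
```

The slide lists only the corner points of the resulting staircase; the sketch returns every step, so the slide's six points appear among the eleven returned.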
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
  • 64. The Relationship between Precision-Recall and ROC Curves. Jesse Davis & Mark Goadrich, Department of Computer Sciences, University of Wisconsin
  • 65.
  • 67. Definition: Area Under the Curve (AUC). [Plots: Precision vs. Recall, and TPR vs. FPR.]
  • 68.
  • 69. Two Highly Skewed Domains. Is an abnormality on a mammogram benign or malignant? Do these two identities refer to the same person?
  • 70. Diagnosing Breast Cancer [Real Data: Davis et al. IJCAI 2005]
  • 71. Diagnosing Breast Cancer [Real Data: Davis et al. IJCAI 2005]
  • 72. Predicting Aliases [Synthetic data: Davis et al. ICIA 2005]
  • 73. Predicting Aliases [Synthetic data: Davis et al. ICIA 2005]
  • 74. A1: Dominance Theorem. For a fixed number of positive and negative examples, one curve dominates another curve in ROC space if and only if the first curve dominates the second curve in PR space.
  • 75.
  • 80.
  • 81.
  • 82.
  • 83.
  • 84.
  • 85. Example Interpolation. A dataset with 20 positive and 2000 negative examples.
        Point   TP   FP   REC    PREC
        A        5    5   0.25   0.5
        .        6   10   0.3    0.375
        B       10   30   0.5    0.25
  • 86. Example Interpolation. A dataset with 20 positive and 2000 negative examples.
        Point   TP   FP   REC    PREC
        A        5    5   0.25   0.5
        .        6   10   0.3    0.375
        .        7   15   0.35   0.318
        .        8   20   0.4    0.286
        .        9   25   0.45   0.265
        B       10   30   0.5    0.25
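The intermediate rows in this example follow a simple pattern: step TP up by one at a time and add a constant number of false positives per step, (FP_B − FP_A) / (TP_B − TP_A). The sketch below reproduces the table under that reading; the function name `interpolate_pr` is hypothetical, and it assumes B has strictly more true positives than A.

```python
def interpolate_pr(tp_a, fp_a, tp_b, fp_b, total_pos):
    """Return (recall, precision) points from A to B, one true positive per step."""
    # False positives gained for each additional true positive between A and B.
    fp_per_tp = (fp_b - fp_a) / (tp_b - tp_a)
    points = []
    for step in range(tp_b - tp_a + 1):
        tp = tp_a + step
        fp = fp_a + fp_per_tp * step
        recall = tp / total_pos
        precision = tp / (tp + fp)
        points.append((recall, precision))
    return points


# Points A (TP=5, FP=5) and B (TP=10, FP=30) from the slide, with 20 positives.
curve = interpolate_pr(tp_a=5, fp_a=5, tp_b=10, fp_b=30, total_pos=20)
precisions = [round(p, 3) for _, p in curve]
```

Precision falls faster at first than a straight line between A and B would suggest, which is why interpolating linearly in PR space would overstate the area under the curve.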
  • 87.
  • 88.
  • 89. Dominance Theorem. For a fixed number of positive and negative examples, one curve dominates another curve in ROC space if and only if the first curve dominates the second curve in Precision-Recall space.
  • 90. For Fixed N, P and TPR: FPR Precision (Not =)
                                True Answer + (P)   True Answer - (N)
        Algorithm Answer +      75                  100
        Algorithm Answer -      25                  900
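With N, P and TPR held fixed, precision is determined by FPR, because TP = TPR · P and FP = FPR · N. The sketch below illustrates this. It assumes the slide's counts read as TP = 75, FP = 100, FN = 25, TN = 900 (so P = 100 and N = 1000); the function name `precision_from_rates` is hypothetical.

```python
def precision_from_rates(tpr, fpr, n_pos, n_neg):
    """Precision = TP / (TP + FP), with TP and FP recovered from the rates."""
    tp = tpr * n_pos  # true positives implied by TPR and P
    fp = fpr * n_neg  # false positives implied by FPR and N
    return tp / (tp + fp)


# Assumed reading of the slide's table: P = 100, N = 1000, TPR = 75/100.
P, N, TPR = 100, 1000, 0.75

# Precision at the table's FPR (100/1000) and at two hypothetical alternatives.
at_table = precision_from_rates(TPR, 0.10, P, N)   # 75 / (75 + 100)
at_lower = precision_from_rates(TPR, 0.05, P, N)   # 75 / (75 + 50)
at_higher = precision_from_rates(TPR, 0.20, P, N)  # 75 / (75 + 200)
```

Lower FPR always gives higher precision here, but the two quantities are not equal and not linearly related: with N ten times larger than P, a 10% FPR already pulls precision below one half.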
  • 91.