Extension and Validation of Moro et al.
By: Tapan Oza

  1. Extension and Validation of Moro et al. By: Tapan Oza
  2. Goals
     • Validation
       • Repeatable results
       • Use same data
       • Use same protocol
     • Extension
       • Same data, new protocols
       • Averaged one-dependence estimators (AODE)
       • Random Forest
     • Tools used: Weka
  3. Original Paper
     • "Using data mining for bank direct marketing: An application of the CRISP-DM methodology." Moro et al.
     • CRISP-DM: CRoss-Industry Standard Process for Data Mining
     • Paper uses data from a Portuguese bank
       • Acquired via call center in 17 different campaigns
       • Large number of features
       • Large number of cases
     • Classification methodologies:
       • Naïve Bayes
       • Decision Tree
       • Support Vector Machine
  4. CRISP-DM
  5. Classification Methodologies
     • Naïve Bayes
       • Assumes independent features
       • Classification using Bayes' rule
       • Apply a decision rule on the probability function
     • Decision Tree
       • Many ways to build a tree
       • Common method splits on information gain
     • Support Vector Machine
       • Hard-margin form requires linearly separable data (soft margins and kernels relax this)
       • Identifies separating hyperplanes
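The Naïve Bayes bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the Weka implementation used in the study: the toy rows, the feature values, and the function names `train_naive_bayes` and `predict` are all assumptions made for the example, and add-one (Laplace) smoothing is an extra choice the slides do not mention.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            feature_values[i].add(v)
    return class_counts, value_counts, feature_values

def predict(model, row):
    """Bayes' rule with the independence assumption, then an argmax decision rule."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # log prior P(y)
        for i, v in enumerate(row):
            # log P(x_i | y) with add-one smoothing (an assumption of this sketch)
            count = value_counts[(i, y)][v]
            score += math.log((count + 1) / (n_y + len(feature_values[i])))
        if score > best_score:
            best_label, best_score = y, score
    return best_label

# Hypothetical toy data: (job, contact type) -> did the client subscribe?
rows = [
    ("student", "cellular"), ("student", "cellular"), ("retired", "cellular"),
    ("blue-collar", "telephone"), ("blue-collar", "telephone"), ("blue-collar", "cellular"),
]
labels = ["yes", "yes", "yes", "no", "no", "no"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("student", "cellular")))
print(predict(model, ("blue-collar", "telephone")))
```

Working in log space avoids underflow when many per-feature probabilities are multiplied together.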
  6. Performance: Accuracy vs Speed
     • Why Accuracy?
       • Data mining is strategic
       • Computation costs are falling (Amazon EC2)
       • Without accuracy, model is useless
     • What do we use to measure Accuracy?
       • Area under the receiver operating characteristic curve (AUROC)
       • Higher AUROC = more confidence in classification
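One way to read AUROC is as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it that way by brute force over all pairs; the function name `auroc` and the toy labels and scores are assumptions for illustration, and Weka computes the metric internally by its own method.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for four cases (1 = positive class)
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a higher AUROC gives more confidence in the classifier.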
  7. Extensions
     • AODE
       • Modified Naïve Bayes
       • Weaker independence assumption than Naïve Bayes
       • Higher computational cost
       • Computation is cheap
     • Random Forest
       • Many trees, one classification
       • Every tree “votes” on classification
       • Class with most “votes” is chosen
       • Impressive accuracy
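The voting step of a random forest can be sketched as follows. This shows only the aggregation of votes; the bootstrap sampling and random feature selection that Breiman's method uses to grow the trees are omitted. The three hand-written "trees", the feature names, and the thresholds are hypothetical stand-ins, not trees learned from the bank data.

```python
from collections import Counter

def majority_vote(votes):
    """Return the class label that received the most votes."""
    label, _ = Counter(votes).most_common(1)[0]
    return label

def forest_predict(trees, x):
    """Ask every tree for a vote, then pick the most common class."""
    return majority_vote([tree(x) for tree in trees])

# Hypothetical one-split trees, each looking at a different feature
trees = [
    lambda x: "yes" if x["duration"] > 300 else "no",
    lambda x: "yes" if x["balance"] > 1000 else "no",
    lambda x: "yes" if x["previous"] > 0 else "no",
]

client = {"duration": 420, "balance": 250, "previous": 2}
print(forest_predict(trees, client))  # votes are yes, no, yes
```

Using an odd number of trees for a two-class problem, as here, avoids tied votes.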
  8. Results: Validation
     • Paper doesn’t specify tree type
       • Average two different tree results
     • 2 out of 3 validated
     • SVM not validated

     AUROC        SVM     NB      Decision Tree
     Original     0.938   0.870   0.868
     Validation   0.583   0.861   0.863
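The "2 out of 3 validated" tally can be reproduced from the table with a small gap check. The AUROC values below are the ones on the slide; the tolerance of 0.02 and the helper name `compare` are assumptions of this sketch, since the slides do not state what threshold counted as a successful validation.

```python
# AUROC values as reported on the slide
original = {"SVM": 0.938, "NB": 0.870, "Decision Tree": 0.868}
validation = {"SVM": 0.583, "NB": 0.861, "Decision Tree": 0.863}

TOLERANCE = 0.02  # assumed threshold, not taken from the slides

def compare(original, validation, tolerance):
    """Map each model to (absolute AUROC gap, whether the gap is within tolerance)."""
    report = {}
    for model, orig_auc in original.items():
        gap = abs(orig_auc - validation[model])
        report[model] = (round(gap, 3), gap <= tolerance)
    return report

report = compare(original, validation, TOLERANCE)
for model, (gap, ok) in report.items():
    print(f"{model}: gap={gap}, validated={ok}")
```

Under this assumed tolerance, Naïve Bayes and the decision tree fall within range while the SVM does not, which matches the slide's summary.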
  9. Results: Extension
     • Extension was to have two models
       • AODE
       • Random forest
     • Weka output for AODE was incomplete
       • Cause unknown
       • Could be Weka
     • Random forest AUROC is 0.9
       • Best result out of all the algorithms run in this study
  10. Lessons Learned
     • Random forest has impressive accuracy
     • Naïve Bayes, Decision Tree, Random Forest are accurate enough for deployment
     • Make sure you have the same tools when validating
     • Make sure you use multiple tools when testing extensions
  11. References
     • Moro, Sérgio, Raul Laureano, and Paulo Cortez. "Using data mining for bank direct marketing: An application of the CRISP-DM methodology." (2011).
     • Breiman, Leo. "Random forests." Machine Learning 45.1 (2001): 5-32.
     • Webb, Geoffrey I., Janice R. Boughton, and Zhihai Wang. "Not so naive Bayes: Aggregating one-dependence estimators." Machine Learning 58.1 (2005): 5-24.
  12. Questions?