Extension and Validation of Moro et al.
By: Tapan Oza

Goals
• Validation
  • Repeatable results
  • Use same data
  • Use same protocol
• Extension
  • Same data, new protocols
    • Averaged one-dependence estimators (AODE)
    • Random Forest
• Tools used: Weka

Original Paper
• "Using data mining for bank direct marketing: An application of the CRISP-DM methodology." Moro et al.
• CRISP-DM: CRoss-Industry Standard Process for Data Mining
• Paper uses data from a Portuguese bank
  • Acquired via call center in 17 different campaigns
  • Large number of features
  • Large number of cases
• Classification methodologies:
  • Naïve Bayes
  • Decision Tree
  • Support Vector Machine

CRISP-DM

Classification Methodologies
• Naïve Bayes
  • Assumes features are independent given the class
  • Classification using Bayes' rule
  • Applies a decision rule to the resulting class probabilities
• Decision Tree
  • Many ways to build a tree
  • A common method splits on information gain
• Support Vector Machine
  • In its basic (linear, hard-margin) form, requires linearly separable data
  • Identifies a separating hyperplane

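A minimal sketch of the information-gain criterion mentioned for decision trees: the gain of a feature is the entropy of the class labels minus the weighted entropy left after splitting on that feature. The toy records and the feature names `contact` and `housing` are hypothetical and are not taken from the bank data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted entropy after splitting on a feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data (assumption for illustration only).
subscribed = ["yes", "yes", "no", "no", "no", "yes"]
contact = ["cell", "cell", "phone", "phone", "phone", "cell"]  # separates classes
housing = ["yes", "no", "yes", "no", "yes", "no"]              # barely informative

gain_contact = information_gain(contact, subscribed)
gain_housing = information_gain(housing, subscribed)
```

A tree builder using this criterion would split on `contact` first, since it has the larger gain.
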
Performance: Accuracy vs Speed
• Why accuracy?
  • Data mining is strategic
  • Computation costs are falling (e.g., Amazon EC2)
  • Without accuracy, a model is useless
• What do we use to measure accuracy?
  • Area under the receiver operating characteristic curve (AUROC)
  • Higher AUROC = better separation of the two classes (0.5 is chance level, 1.0 is perfect)

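One way to read AUROC is as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition. The scores and labels are made up for illustration, and this is not how Weka computes the value internally.

```python
def auroc(scores, labels):
    """AUROC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores (assumption for illustration only).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
example_auroc = auroc(scores, labels)  # 8 of the 9 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of cases. It is fine for a sketch, but a rank-based formulation would be preferable on a data set of realistic size.
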
Extensions
• AODE
  • Modified Naïve Bayes
  • Weaker independence assumption than Naïve Bayes
  • Higher computational cost
  • Computation is cheap
• Random Forest
  • Many trees, one classification
  • Every tree “votes” on the classification
  • Class with the most “votes” is chosen
  • Impressive accuracy

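The voting step of a random forest can be sketched in a few lines. The three "trees" below are hypothetical hand-written rules standing in for trained trees, and the field names are assumptions. A real random forest (Breiman, 2001) also trains each tree on a bootstrap sample with random feature subsets, which this sketch leaves out.

```python
from collections import Counter


def forest_predict(trees, example):
    """Let every tree vote and return the class with the most votes."""
    votes = Counter(tree(example) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained trees: each maps a client record to a class.
trees = [
    lambda c: "yes" if c["call_seconds"] > 300 else "no",
    lambda c: "yes" if c["previous_contacts"] > 2 else "no",
    lambda c: "yes" if c["age"] < 30 else "no",
]

client = {"call_seconds": 420, "previous_contacts": 0, "age": 25}
prediction = forest_predict(trees, client)  # two of the three trees vote "yes"
```

Using an odd number of trees avoids ties in a two-class problem. With an even number, `most_common` returns the tied class that was seen first.
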
Results: Validation
• Paper doesn’t specify tree type
  • Average two different tree results
• 2 out of 3 validated
• SVM not validated

AUROC        SVM     NB      Decision Tree
Original     0.938   0.870   0.868
Validation   0.583   0.861   0.863

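The "2 out of 3 validated" reading of the table can be reproduced with a simple gap check. The slides do not state the criterion that was used, so the tolerance of 0.01 below is purely an assumption. The AUROC values are the ones in the table.

```python
# AUROC values from the validation table.
original = {"SVM": 0.938, "NB": 0.870, "Decision Tree": 0.868}
replicated = {"SVM": 0.583, "NB": 0.861, "Decision Tree": 0.863}

# Hypothetical tolerance: the slides do not say how close counts as "validated".
TOLERANCE = 0.01

# Absolute gap between the published and the reproduced AUROC for each model.
gaps = {model: abs(original[model] - replicated[model]) for model in original}
validated = [model for model, gap in gaps.items() if gap <= TOLERANCE]

for model, gap in gaps.items():
    status = "validated" if model in validated else "not validated"
    print(f"{model}: gap {gap:.3f} ({status})")
```

Under this assumed threshold, Naïve Bayes and the decision tree pass and the SVM does not, which matches the slide.
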
Results: Extension
• Extension was to have two models
  • AODE
  • Random forest
• Weka output for AODE was incomplete
  • Cause unknown
  • Could be Weka
• Random forest AUROC is 0.9
  • Best result of all the algorithms run in this study

Lessons Learned
• Random forest has impressive accuracy
• Naïve Bayes, Decision Tree, and Random Forest are accurate enough for deployment
• Make sure you have the same tools when validating
• Make sure you use multiple tools when testing extensions

References:
• Moro, Sérgio, Raul Laureano, and Paulo Cortez. "Using data mining for bank direct marketing: An application of the CRISP-DM methodology." (2011).
• Breiman, Leo. "Random forests." Machine Learning 45.1 (2001): 5-32.
• Webb, Geoffrey I., Janice R. Boughton, and Zhihai Wang. "Not so naive Bayes: Aggregating one-dependence estimators." Machine Learning 58.1 (2005): 5-24.

Questions?
