Extension and Validation of Moro et al.
By: Tapan Oza

Goals
• Validation
  • Repeatable results
  • Use same data
  • Use same protocol
• Extension
  • Same data, new protocols
  • Averaged one-dependence estimators (AODE)
  • Random Forest
• Tools used: Weka

Original Paper
• "Using data mining for bank direct marketing: An application of the CRISP-DM methodology." Moro et al.
• CRISP-DM: CRoss-Industry Standard Process for Data Mining
• Paper uses data from a Portuguese bank
  • Acquired via call center in 17 different campaigns
  • Large number of features
  • Large number of cases
• Classification methodologies:
  • Naïve Bayes
  • Decision Tree
  • Support Vector Machine

CRISP-DM

Classification Methodologies
• Naïve Bayes
  • Assumes features are independent given the class
  • Classification using Bayes' rule
  • Applies a decision rule to the resulting class probabilities
• Decision Tree
  • Many ways to build a tree
  • A common method splits on information gain
• Support Vector Machine
  • Identifies a maximum-margin separating hyperplane
  • The basic (hard-margin) form requires linearly separable data; soft margins and kernels relax this

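As a rough illustration of the Naïve Bayes bullets above, here is a minimal sketch for categorical features. It is not the Weka implementation used in the talk. The toy records, the feature meanings (job, has_loan), and the add-one smoothing are assumptions made only for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    # feature_values[feature_index] -> set of values seen for that feature
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Pick the class with the highest posterior score (the decision rule)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Bayes' rule with the independence assumption, in log space:
        # log P(class) + sum over features of log P(feature value | class)
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Add-one smoothing (an assumption here) avoids log(0)
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy records: (job, has_loan) -> did the client subscribe?
rows = [("admin", "no"), ("admin", "yes"), ("student", "no"),
        ("student", "no"), ("retired", "no"), ("admin", "yes")]
labels = ["no", "no", "yes", "yes", "yes", "no"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("student", "no")))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once the number of features grows.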
Performance: Accuracy vs Speed
• Why accuracy?
  • Data mining is strategic
  • Computation costs are falling (Amazon EC2)
  • Without accuracy, the model is useless
• What do we use to measure accuracy?
  • Area under the receiver operating characteristic curve (AUROC)
  • Higher AUROC = more confidence in classification (0.5 is chance level, 1.0 is a perfect ranking)

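To make the AUROC metric above concrete, the sketch below computes it from its pairwise interpretation: the fraction of (positive, negative) pairs in which the positive case receives the higher score. The labels and scores are hypothetical and are not taken from the bank data. Weka reports this value itself; this is only an illustration of what the number means.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical labels (1 = client subscribed) and classifier scores
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.8]
print(auroc(labels, scores))  # 8 of the 9 pairs are ordered correctly
```

The double loop is quadratic in the number of cases, which is fine for an illustration. A rank-based formulation gives the same value in O(n log n) for large datasets.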
Extensions
• AODE
  • Modified Naïve Bayes
  • Weaker independence assumption than Naïve Bayes
  • Higher computational cost
  • Computation is cheap
• Random Forest
  • Many trees, one classification
  • Every tree “votes” on classification
  • Class with most “votes” is chosen
  • Impressive accuracy

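The voting step described for Random Forest can be sketched as follows. This shows only how the votes are combined. Tree training (bootstrap sampling and random feature selection, as in Breiman's paper) is omitted. The three "trees" are hypothetical stand-in rules, and the field names (duration, previous_contacts, age) are assumptions for illustration.

```python
from collections import Counter


def forest_predict(trees, x):
    """Return the class that receives the most votes across all trees."""
    votes = Counter(tree(x) for tree in trees)
    # most_common(1) returns [(label, count)] for the top-voted label
    label, _count = votes.most_common(1)[0]
    return label


# Hypothetical stand-ins for trained trees: each maps a record to "yes" or "no"
trees = [
    lambda x: "yes" if x["duration"] > 300 else "no",
    lambda x: "yes" if x["previous_contacts"] > 2 else "no",
    lambda x: "yes" if x["age"] < 30 else "no",
]

record = {"duration": 420, "previous_contacts": 0, "age": 25}
print(forest_predict(trees, record))  # two of the three trees vote "yes"
```

Using an odd number of trees for a two-class problem avoids tied votes.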
Results: Validation
• Paper doesn’t specify tree type
  • Average two different tree results
• 2 out of 3 validated
• SVM not validated

AUROC        SVM     NB      Decision Tree
Original     0.938   0.870   0.868
Validation   0.583   0.861   0.863

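The "2 out of 3 validated" summary can be checked with simple arithmetic on the AUROC values reported above. The slides do not state the criterion that was used, so the 0.01 tolerance below is purely an assumption for illustration.

```python
# AUROC values as reported on the slide above
original = {"SVM": 0.938, "NB": 0.870, "Decision Tree": 0.868}
validation = {"SVM": 0.583, "NB": 0.861, "Decision Tree": 0.863}

# Assumption: a replication counts as validated if it lands within this gap
TOLERANCE = 0.01


def validated_models(original, validation, tolerance):
    """Return {model: (absolute_gap, within_tolerance)} for each model."""
    report = {}
    for model, reported in original.items():
        gap = round(abs(reported - validation[model]), 3)
        report[model] = (gap, gap <= tolerance)
    return report


report = validated_models(original, validation, TOLERANCE)
for model, (gap, ok) in report.items():
    print(f"{model}: gap={gap:.3f} validated={ok}")
```

Under this assumed tolerance, Naïve Bayes and Decision Tree fall within the gap and SVM does not, which matches the summary on the slide.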
Results: Extension
• Extension was to have two models
  • AODE
  • Random forest
• Weka output for AODE was incomplete
  • Cause unknown
  • Could be Weka
• Random forest AUROC is 0.9
  • Best result out of all the algorithms

Lessons Learned
• Random forest has impressive accuracy
• Naïve Bayes, Decision Tree, and Random Forest are accurate enough for deployment
• Make sure you have the same tools when validating
• Make sure you use multiple tools when testing extensions

References:
• Moro, Sérgio, Raul Laureano, and Paulo Cortez. "Using data mining for bank direct marketing: An application of the CRISP-DM methodology." (2011).
• Breiman, Leo. "Random forests." Machine Learning 45.1 (2001): 5-32.
• Webb, Geoffrey I., Janice R. Boughton, and Zhihai Wang. "Not so naive Bayes: Aggregating one-dependence estimators." Machine Learning 58.1 (2005): 5-24.

Questions?
