SlideShare a Scribd company logo
DATA SCIENCE
PROJECT
METHODOLOGY
Sergey Shelpuk
sergey@shelpuk.com
DATA
SCIENCE
IS ALL
ABOUT
BUSINESS
TOP BIG DATA CHALLENGES
0 10 20 30 40 50 60
Determining how to get value from Big Data
Defining our strategy
Obtaining skills and capabilities needed
Integrating multiple data sources
Infrastructure and/or architecture
Risk and governance issues
Funding for Big Data related initiatives
Understanding what is Big Data
Leadership or organizational issues
Other
Top challenge Second Third© Gartner
METHODOLOGY IS A KEY TO
SUCCESS
Cross-Industry Standard Process for Data Mining (CRISP-DM)
BUSINESS
UNDERSTANDING
Determining Business Objectives
1. Gather background information
 Compiling the business background
 Defining business objectives
 Business success criteria
2. Assessing the situation
 Resource Inventory
 Requirements, Assumptions, and Constraints
 Risks and Contingencies
 Cost/Benefit Analysis
4. Determining data science goals
 Data science goals
 Data science success criteria
4. Producing a Project Plan
© IBM
EXAMPLE OF THE
PROJECT PLAN
Phase Time Resources Risks
Business
understanding
1 week All analysts Economic change
Data
understanding
3 weeks All analysts Data problems, technology
problems
Data preparation 5 weeks Data scientists, DB
engineers
Data problems, technology
problems
Modeling 2 weeks Data scientists Technology problems, inability
to build adequate model
Evaluation 1 week All analysts Economic change, inability to
implement results
Deployment 1 week Data scientist, DB
engineers,
implementation
team
Economic change, inability to
implement results
© IBM
READY FOR THE
DATA UNDERSTANDING?
From a business perspective:
 What does your business hope to gain from this project?
 How will you define the successful completion of our efforts?
 Do you have the budget and resources needed to reach our goals?
 Do you have access to all the data needed for this project?
 Have you and your team discussed the risks and contingencies associated with
this project?
 Do the results of your cost/benefit analysis make this project worthwhile?
From a data science perspective:
 How specifically can data mining help you meet your business goals?
 Do you have an idea about which data mining techniques might produce the best
results?
 How will you know when your results are accurate or effective enough? (Have we
set a measurement of data mining success?)
 How will the modeling results be deployed? Have you considered deployment in
your project plan?
 Does the project plan include all phases of CRISP-DM?
 Are risks and dependencies called out in the plan?© IBM
DATA
UNDERSTANDING
© IBM
1. Collect initial data
 Existing data
 Purchased data
 Additional data
2. Describe data
 Amount of data
 Value types
 Coding schemes
3. Explore data
4. Verify data quality
 Missing data
 Data errors
 Coding inconsistencies
 Bad metadata
READY FOR THE
DATA PREPARATION?
 Are all data sources clearly identified and accessed? Are you aware of
any problems or restrictions?
 Have you identified key attributes from the available data?
 Did these attributes help you to formulate hypotheses?
 Have you noted the size of all data sources?
 Are you able to use a subset of data where appropriate?
 Have you computed basic statistics for each attribute of interest? Did
meaningful information emerge?
 Did you use exploratory graphics to gain further insight into key
attributes? Did this insight reshape any of your hypotheses?
 What are the data quality issues for this project? Do you have a plan to
address these issues?
 Are the data preparation steps clear? For instance, do you know which
data sources to merge and which attributes to filter or select?
© IBM
DATA
PREPARATION
© IBM
1. Select right data
 Select training examples
 Select features
2. Clean data
 Fill in missed data
 Correct data errors
 Make coding consistent
2. Extend data
 Extend training examples
 Extend features
2. Format data
 Put data in a format for training the model
READY FOR THE
MODELING?
 Based upon your initial exploration and understanding, were you able to
select relevant subsets of data?
 Have you cleaned the data effectively or removed unsalvageable items?
Document any decisions in the final report.
 Are multiple data sets integrated properly? Were there any merging
problems that should be documented?
 Have you researched the requirements of the modeling tools that you
plan to use?
 Are there any formatting issues you can address before modeling? This
includes both required formatting concerns as well as tasks that may
reduce modeling time.
© IBM
MODELING
© IBM
1. Select modeling techniques
 Select data types available for analysis
 Select an algorithm or a model
Define modeling goals
 State specific modeling requirements
2. Build the model
 Set up hyperparameters
 Train the model
 Describe the result
3. Assess the model
READY FOR THE
EVALUATION?
 Are you able to understand the results of the models?
 Do the model results make sense to you from a purely logical
perspective? Are there apparent inconsistencies that need further
exploration?
 From your initial glance, do the results seem to address your
organization’s business question?
 Have you used analysis nodes and lift or gains charts to compare and
evaluate model accuracy?
 Have you explored more than one type of model and compared the
results?
 Are the results of your model deployable?
© IBM
EVALUATION
© IBM
1. Evaluate the results
 Are results presented clearly?
 Are there any novel findings?
 Can models and findings be applicable to business
goals?
 How well do the models and findings answer business
goals?
 What additional questions the modeling results have
risen?
2. Review the process
 Did the stage contribute to the value of the results?
 What went wrong and how it can be fixed?
 Are there alternative decisions which could have been
executed?
2. Determine the next steps
DEPLOYMENT
© IBM
1. Planning for deployment
 Summarize models and findings
 For each model create a deployment plan
 Identify any deployment problems and plan for
contingencies
2. Plan monitoring and maintenance
 Identify models and findings which require support
 How can the accuracy and validity be evaluated?
 How will you determine that a model has expired?
 What to do with the expired models?
2. Conduct a final project review
THANK YOU

More Related Content

What's hot

Data Strategy - Executive MBA Class, IE Business School
Data Strategy - Executive MBA Class, IE Business SchoolData Strategy - Executive MBA Class, IE Business School
Data Strategy - Executive MBA Class, IE Business School
Gam Dias
 
Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...
Simplilearn
 
Data analytics
Data analyticsData analytics
Data analytics
davidfergarcia
 
The future of big data analytics
The future of big data analyticsThe future of big data analytics
The future of big data analytics
Ahmed Banafa
 
Social media analytics powered by data science
Social media analytics powered by data scienceSocial media analytics powered by data science
Social media analytics powered by data science
Navin Manaswi
 
Data Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data AnalysisData Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data Analysis
Eva Durall
 
Predictive Analytics - Big Data & Artificial Intelligence
Predictive Analytics - Big Data & Artificial IntelligencePredictive Analytics - Big Data & Artificial Intelligence
Predictive Analytics - Big Data & Artificial Intelligence
Manish Jain
 
Our big data
Our big dataOur big data
Our big data
uthrarajan
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
Utkarsh Sharma
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
Spotle.ai
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Edureka!
 
Data Governance
Data GovernanceData Governance
Data Governance
SambaSoup
 
Three Big Data Case Studies
Three Big Data Case StudiesThree Big Data Case Studies
Three Big Data Case Studies
Atidan Technologies Pvt Ltd (India)
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
SSaudia
 
Building a Winning Roadmap for Analytics
Building a Winning Roadmap for AnalyticsBuilding a Winning Roadmap for Analytics
Building a Winning Roadmap for Analytics
Ironside
 
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
Simplilearn
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science Process
Vishal Patel
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Simplilearn
 
Big Data & Data Science
Big Data & Data ScienceBig Data & Data Science
Big Data & Data Science
BrijeshGoyani
 
Exploratory data analysis with Python
Exploratory data analysis with PythonExploratory data analysis with Python
Exploratory data analysis with Python
Davis David
 

What's hot (20)

Data Strategy - Executive MBA Class, IE Business School
Data Strategy - Executive MBA Class, IE Business SchoolData Strategy - Executive MBA Class, IE Business School
Data Strategy - Executive MBA Class, IE Business School
 
Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...
 
Data analytics
Data analyticsData analytics
Data analytics
 
The future of big data analytics
The future of big data analyticsThe future of big data analytics
The future of big data analytics
 
Social media analytics powered by data science
Social media analytics powered by data scienceSocial media analytics powered by data science
Social media analytics powered by data science
 
Data Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data AnalysisData Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data Analysis
 
Predictive Analytics - Big Data & Artificial Intelligence
Predictive Analytics - Big Data & Artificial IntelligencePredictive Analytics - Big Data & Artificial Intelligence
Predictive Analytics - Big Data & Artificial Intelligence
 
Our big data
Our big dataOur big data
Our big data
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data Governance
Data GovernanceData Governance
Data Governance
 
Three Big Data Case Studies
Three Big Data Case StudiesThree Big Data Case Studies
Three Big Data Case Studies
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
 
Building a Winning Roadmap for Analytics
Building a Winning Roadmap for AnalyticsBuilding a Winning Roadmap for Analytics
Building a Winning Roadmap for Analytics
 
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science Process
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
 
Big Data & Data Science
Big Data & Data ScienceBig Data & Data Science
Big Data & Data Science
 
Exploratory data analysis with Python
Exploratory data analysis with PythonExploratory data analysis with Python
Exploratory data analysis with Python
 

Similar to CRISP-DM: a data science project methodology

Group 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxGroup 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptx
ellamangapis2003
 
Analytics
AnalyticsAnalytics
Santander's Data Transformation
Santander's Data TransformationSantander's Data Transformation
Santander's Data Transformation
Umran Rafi
 
Data integration my_experience
Data integration my_experienceData integration my_experience
Data integration my_experience
Arvind Murali. CIPM, CPM, M
 
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
DataScienceConferenc1
 
DAT 520 Final Project Guidelines and Rubric Overview .docx
DAT 520 Final Project Guidelines and Rubric  Overview .docxDAT 520 Final Project Guidelines and Rubric  Overview .docx
DAT 520 Final Project Guidelines and Rubric Overview .docx
simonithomas47935
 
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
AgileNetwork
 
Demystifying ML/AI
Demystifying ML/AIDemystifying ML/AI
Demystifying ML/AI
Matthew Reynolds
 
Data Science.pdf
Data Science.pdfData Science.pdf
Data Science.pdf
WinduGata3
 
Best practice for_agile_ds_projects
Best practice for_agile_ds_projectsBest practice for_agile_ds_projects
Best practice for_agile_ds_projects
Khalid Kahloot
 
Data mining
Data miningData mining
Data mining
GILM Project
 
The Value of Predictive Analytics and Decision Modeling
The Value of Predictive Analytics and Decision ModelingThe Value of Predictive Analytics and Decision Modeling
The Value of Predictive Analytics and Decision Modeling
Decision Management Solutions
 
413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf
413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf
413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf
IsmailCassiem
 
Doing Analytics Right - Selecting Analytics
Doing Analytics Right - Selecting AnalyticsDoing Analytics Right - Selecting Analytics
Doing Analytics Right - Selecting Analytics
Tasktop
 
Technical Documentation 101 for Data Engineers.pdf
Technical Documentation 101 for Data Engineers.pdfTechnical Documentation 101 for Data Engineers.pdf
Technical Documentation 101 for Data Engineers.pdf
Shristi Shrestha
 
Egypt hackathon 2014 analytics & spss session
Egypt hackathon 2014   analytics & spss sessionEgypt hackathon 2014   analytics & spss session
Egypt hackathon 2014 analytics & spss session
M Baddar
 
Sfeldman bbworld 07_going_enterprise (1)
Sfeldman bbworld 07_going_enterprise (1)Sfeldman bbworld 07_going_enterprise (1)
Sfeldman bbworld 07_going_enterprise (1)Steve Feldman
 
Big data and Hadoop Training Brochure
Big data and Hadoop Training Brochure Big data and Hadoop Training Brochure
Big data and Hadoop Training Brochure
MCAL Management Consulting and Advanced Learning
 
SAP Applications and the Modern Data Scientist - Predictive Analytics for the...
SAP Applications and the Modern Data Scientist - Predictive Analytics for the...SAP Applications and the Modern Data Scientist - Predictive Analytics for the...
SAP Applications and the Modern Data Scientist - Predictive Analytics for the...
Dickinson + Associates
 

Similar to CRISP-DM: a data science project methodology (20)

Big data@work
Big data@workBig data@work
Big data@work
 
Group 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxGroup 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptx
 
Analytics
AnalyticsAnalytics
Analytics
 
Santander's Data Transformation
Santander's Data TransformationSantander's Data Transformation
Santander's Data Transformation
 
Data integration my_experience
Data integration my_experienceData integration my_experience
Data integration my_experience
 
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
 
DAT 520 Final Project Guidelines and Rubric Overview .docx
DAT 520 Final Project Guidelines and Rubric  Overview .docxDAT 520 Final Project Guidelines and Rubric  Overview .docx
DAT 520 Final Project Guidelines and Rubric Overview .docx
 
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
 
Demystifying ML/AI
Demystifying ML/AIDemystifying ML/AI
Demystifying ML/AI
 
Data Science.pdf
Data Science.pdfData Science.pdf
Data Science.pdf
 
Best practice for_agile_ds_projects
Best practice for_agile_ds_projectsBest practice for_agile_ds_projects
Best practice for_agile_ds_projects
 
Data mining
Data miningData mining
Data mining
 
The Value of Predictive Analytics and Decision Modeling
The Value of Predictive Analytics and Decision ModelingThe Value of Predictive Analytics and Decision Modeling
The Value of Predictive Analytics and Decision Modeling
 
413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf
413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf
413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf
 
Doing Analytics Right - Selecting Analytics
Doing Analytics Right - Selecting AnalyticsDoing Analytics Right - Selecting Analytics
Doing Analytics Right - Selecting Analytics
 
Technical Documentation 101 for Data Engineers.pdf
Technical Documentation 101 for Data Engineers.pdfTechnical Documentation 101 for Data Engineers.pdf
Technical Documentation 101 for Data Engineers.pdf
 
Egypt hackathon 2014 analytics & spss session
Egypt hackathon 2014   analytics & spss sessionEgypt hackathon 2014   analytics & spss session
Egypt hackathon 2014 analytics & spss session
 
Sfeldman bbworld 07_going_enterprise (1)
Sfeldman bbworld 07_going_enterprise (1)Sfeldman bbworld 07_going_enterprise (1)
Sfeldman bbworld 07_going_enterprise (1)
 
Big data and Hadoop Training Brochure
Big data and Hadoop Training Brochure Big data and Hadoop Training Brochure
Big data and Hadoop Training Brochure
 
SAP Applications and the Modern Data Scientist - Predictive Analytics for the...
SAP Applications and the Modern Data Scientist - Predictive Analytics for the...SAP Applications and the Modern Data Scientist - Predictive Analytics for the...
SAP Applications and the Modern Data Scientist - Predictive Analytics for the...
 

More from Sergey Shelpuk

Data science: A New Profession in IT
Data science: A New Profession in ITData science: A New Profession in IT
Data science: A New Profession in IT
Sergey Shelpuk
 
Buzzword scheme
Buzzword schemeBuzzword scheme
Buzzword scheme
Sergey Shelpuk
 
Machine Learning: Advanced Topics Overview
Machine Learning: Advanced Topics OverviewMachine Learning: Advanced Topics Overview
Machine Learning: Advanced Topics Overview
Sergey Shelpuk
 
Artificial intelligence 2015: Quo Vadis?
Artificial intelligence 2015: Quo Vadis?Artificial intelligence 2015: Quo Vadis?
Artificial intelligence 2015: Quo Vadis?
Sergey Shelpuk
 
Machine learning intro
Machine learning introMachine learning intro
Machine learning intro
Sergey Shelpuk
 
How to take over the world with artificial intelligence final
How to take over the world with artificial intelligence finalHow to take over the world with artificial intelligence final
How to take over the world with artificial intelligence final
Sergey Shelpuk
 
Object similarity with office laptop
Object similarity with office laptopObject similarity with office laptop
Object similarity with office laptop
Sergey Shelpuk
 
Data science for HR
Data science for HRData science for HR
Data science for HR
Sergey Shelpuk
 

More from Sergey Shelpuk (8)

Data science: A New Profession in IT
Data science: A New Profession in ITData science: A New Profession in IT
Data science: A New Profession in IT
 
Buzzword scheme
Buzzword schemeBuzzword scheme
Buzzword scheme
 
Machine Learning: Advanced Topics Overview
Machine Learning: Advanced Topics OverviewMachine Learning: Advanced Topics Overview
Machine Learning: Advanced Topics Overview
 
Artificial intelligence 2015: Quo Vadis?
Artificial intelligence 2015: Quo Vadis?Artificial intelligence 2015: Quo Vadis?
Artificial intelligence 2015: Quo Vadis?
 
Machine learning intro
Machine learning introMachine learning intro
Machine learning intro
 
How to take over the world with artificial intelligence final
How to take over the world with artificial intelligence finalHow to take over the world with artificial intelligence final
How to take over the world with artificial intelligence final
 
Object similarity with office laptop
Object similarity with office laptopObject similarity with office laptop
Object similarity with office laptop
 
Data science for HR
Data science for HRData science for HR
Data science for HR
 

Recently uploaded

Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Hivelance Technology
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
KrzysztofKkol1
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
varshanayak241
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
MayankTawar1
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Why React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdfWhy React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdf
ayushiqss
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
Jelle | Nordend
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 

Recently uploaded (20)

Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Why React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdfWhy React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdf
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 

CRISP-DM: a data science project methodology

  • 3. TOP BIG DATA CHALLENGES 0 10 20 30 40 50 60 Determining how to get value from Big Data Defining our strategy Obtaining skills and capabilities needed Integrating multiple data sources Infrastructure and/or architecture Risk and governance issues Funding for Big Data related initiatives Understanding what is Big Data Leadership or organizational issues Other Top challenge Second Third© Gartner
  • 4. METHODOLOGY IS A KEY TO SUCCESS Cross-Industry Standard Process for Data Mining (CRISP-DM)
  • 5. BUSINESS UNDERSTANDING Determining Business Objectives 1. Gather background information  Compiling the business background  Defining business objectives  Business success criteria 2. Assessing the situation  Resource Inventory  Requirements, Assumptions, and Constraints  Risks and Contingencies  Cost/Benefit Analysis 4. Determining data science goals  Data science goals  Data science success criteria 4. Producing a Project Plan © IBM
  • 6. EXAMPLE OF THE PROJECT PLAN Phase Time Resources Risks Business understanding 1 week All analysts Economic change Data understanding 3 weeks All analysts Data problems, technology problems Data preparation 5 weeks Data scientists, DB engineers Data problems, technology problems Modeling 2 weeks Data scientists Technology problems, inability to build adequate model Evaluation 1 week All analysts Economic change, inability to implement results Deployment 1 week Data scientist, DB engineers, implementation team Economic change, inability to implement results © IBM
  • 7. READY FOR THE DATA UNDERSTANDING? From a business perspective:  What does your business hope to gain from this project?  How will you define the successful completion of our efforts?  Do you have the budget and resources needed to reach our goals?  Do you have access to all the data needed for this project?  Have you and your team discussed the risks and contingencies associated with this project?  Do the results of your cost/benefit analysis make this project worthwhile? From a data science perspective:  How specifically can data mining help you meet your business goals?  Do you have an idea about which data mining techniques might produce the best results?  How will you know when your results are accurate or effective enough? (Have we set a measurement of data mining success?)  How will the modeling results be deployed? Have you considered deployment in your project plan?  Does the project plan include all phases of CRISP-DM?  Are risks and dependencies called out in the plan?© IBM
  • 8. DATA UNDERSTANDING © IBM 1. Collect initial data  Existing data  Purchased data  Additional data 2. Describe data  Amount of data  Value types  Coding schemes 3. Explore data 4. Verify data quality  Missing data  Data errors  Coding inconsistencies  Bad metadata
  • 9. READY FOR THE DATA PREPARATION?  Are all data sources clearly identified and accessed? Are you aware of any problems or restrictions?  Have you identified key attributes from the available data?  Did these attributes help you to formulate hypotheses?  Have you noted the size of all data sources?  Are you able to use a subset of data where appropriate?  Have you computed basic statistics for each attribute of interest? Did meaningful information emerge?  Did you use exploratory graphics to gain further insight into key attributes? Did this insight reshape any of your hypotheses?  What are the data quality issues for this project? Do you have a plan to address these issues?  Are the data preparation steps clear? For instance, do you know which data sources to merge and which attributes to filter or select? © IBM
  • 10. DATA PREPARATION © IBM 1. Select right data  Select training examples  Select features 2. Clean data  Fill in missed data  Correct data errors  Make coding consistent 2. Extend data  Extend training examples  Extend features 2. Format data  Put data in a format for training the model
  • 11. READY FOR THE MODELING?  Based upon your initial exploration and understanding, were you able to select relevant subsets of data?  Have you cleaned the data effectively or removed unsalvageable items? Document any decisions in the final report.  Are multiple data sets integrated properly? Were there any merging problems that should be documented?  Have you researched the requirements of the modeling tools that you plan to use?  Are there any formatting issues you can address before modeling? This includes both required formatting concerns as well as tasks that may reduce modeling time. © IBM
  • 12. MODELING © IBM 1. Select modeling techniques  Select data types available for analysis  Select an algorithm or a model Define modeling goals  State specific modeling requirements 2. Build the model  Set up hyperparameters  Train the model  Describe the result 3. Assess the model
  • 13. READY FOR THE EVALUATION?  Are you able to understand the results of the models?  Do the model results make sense to you from a purely logical perspective? Are there apparent inconsistencies that need further exploration?  From your initial glance, do the results seem to address your organization’s business question?  Have you used analysis nodes and lift or gains charts to compare and evaluate model accuracy?  Have you explored more than one type of model and compared the results?  Are the results of your model deployable? © IBM
  • 14. EVALUATION © IBM 1. Evaluate the results  Are results presented clearly?  Are there any novel findings?  Can models and findings be applicable to business goals?  How well do the models and findings answer business goals?  What additional questions the modeling results have risen? 2. Review the process  Did the stage contribute to the value of the results?  What went wrong and how it can be fixed?  Are there alternative decisions which could have been executed? 2. Determine the next steps
  • 15. DEPLOYMENT © IBM 1. Planning for deployment  Summarize models and findings  For each model create a deployment plan  Identify any deployment problems and plan for contingencies 2. Plan monitoring and maintenance  Identify models and findings which require support  How can the accuracy and validity be evaluated?  How will you determine that a model has expired?  What to do with the expired models? 2. Conduct a final project review