SlideShare a Scribd company logo
1 of 33
1
CRISP-DM
CSE634 Data Mining
Prof. Anita Wasilewska
Jae Hong Kil (105228510)
2
References
• Pete Chapman (NCR), Julian Clinton (SPSS), Randy Kerber (NCR),
Thomas Khabaza (SPSS), Thomas Reinartz, (DaimlerChrysler), Colin
Shearer (SPSS) and Rüdiger Wirth (DaimlerChrysler) “CRISP-DM 1.0 -
Step-by-step data mining guide”
• P. Gonzalez-Aranda, E.Menasalvas, S.Millan, F. Segovia “Towards a
Methodology for Data Mining Project Development: The Importance of
Abstraction”
• Laura Squier “What is Data Mining?” PPT
• “The CRISP-DM Model: The New Blueprint for DataMining”, Colin
Shearer, JOURNAL of Data Warehousing, Volume 5, Number 4,
p. 13-22, 2000
3
References
• Websites
http://www.crisp-dm.org/
http://www.crisp-dm.org/CRISPWP-800.pdf
http://www.spss.com/
http://www.kdnuggets.com/
4
Overview
• Introduction to CRISP-DM
• Phases and Tasks
• Summary
5
CRISP-DM
CRoss-Industry Standard Process
for Data Mining
6
Why Should There be a Standard Process?
The data mining process must be reliable and
repeatable by people with little data mining
background.
7
Why Should There be a Standard Process?
• Framework for recording experience
– Allows projects to be replicated
• Aid to project planning and management
• “Comfort factor” for new adopters
– Demonstrates maturity of Data Mining
– Reduces dependency on “stars”
8
Process Standardization
• Initiative launched in late 1996 by three “veterans” of data mining market.
Daimler Chrysler (then Daimler-Benz), SPSS (then ISL) , NCR
• Developed and refined through series of workshops (from 1997-1999)
• Over 300 organization contributed to the process model
• Published CRISP-DM 1.0 (1999)
• Over 200 members of the CRISP-DM SIG worldwide
- DM Vendors - SPSS, NCR, IBM, SAS, SGI, Data Distilleries, Syllogic, etc.
- System Suppliers / consultants - Cap Gemini, ICL Retail, Deloitte & Touche, etc.
- End Users - BT, ABB, Lloyds Bank, AirTouch, Experian, etc.
9
CRISP-DM
• Non-proprietary
• Application/Industry neutral
• Tool neutral
• Focus on business issues
– As well as technical analysis
• Framework for guidance
• Experience base
– Templates for Analysis
10
CRISP-DM: Overview
• Data Mining methodology
• Process Model
• For anyone
• Provides a complete blueprint
• Life cycle: 6 phases
11
CRISP-DM: Phases
• Business Understanding
Project objectives and requirements understanding, Data mining problem definition
• Data Understanding
Initial data collection and familiarization, Data quality problems identification
• Data Preparation
Table, record and attribute selection, Data transformation and cleaning
• Modeling
Modeling techniques selection and application, Parameters calibration
• Evaluation
Business objectives & issues achievement evaluation
• Deployment
Result model deployment, Repeatable data mining process implementation
12
Phases and Tasks
Business
Understanding
Data
Understanding
Data
Preparation
Modeling Deployment
Evaluation
Format
Data
Integrate
Data
Construct
Data
Clean
Data
Select
Data
Determine
Business
Objectives
Review
Project
Produce
Final
Report
Plan Monitering
&
Maintenance
Plan
Deployment
Determine
Next Steps
Review
Process
Evaluate
Results
Assess
Model
Build
Model
Generate
Test Design
Select
Modeling
Technique
Assess
Situation
Explore
Data
Describe
Data
Collect
Initial
Data
Determine
Data Mining
Goals
Verify
Data
Quality
Produce
Project Plan
13
Phase 1. Business Understanding
• Statement of Business Objective
• Statement of Data Mining Objective
• Statement of Success Criteria
Focuses on understanding the project objectives and requirements
from a business perspective, then converting this knowledge into a
data mining problem definition and a preliminary plan designed to
achieve the objectives
14
Phase 1. Business Understanding
• Determine business objectives
- thoroughly understand, from a business perspective, what the client
really wants to accomplish
- uncover important factors, at the beginning, that can influence the
outcome of the project
- neglecting this step is to expend a great deal of effort producing the
right answers to the wrong questions
• Assess situation
- more detailed fact-finding about all of the resources, constraints,
assumptions and other factors that should be considered
- flesh out the details
15
Phase 1. Business Understanding
• Determine data mining goals
- a business goal states objectives in business terminology
- a data mining goal states project objectives in technical terms
ex) the business goal: “Increase catalog sales to existing customers.”
a data mining goal: “Predict how many widgets a customer will buy,
given their purchases over the past three years,
demographic information (age, salary, city) and
the price of the item.”
• Produce project plan
- describe the intended plan for achieving the data mining goals and the
business goals
- the plan should specify the anticipated set of steps to be performed
during the rest of the project including an initial selection of tools and
techniques
16
Phase 2. Data Understanding
• Explore the Data
• Verify the Quality
• Find Outliers
Starts with an initial data collection and proceeds with activities in
order to get familiar with the data, to identify data quality problems,
to discover first insights into the data or to detect interesting subsets
to form hypotheses for hidden information.
17
Phase 2. Data Understanding
• Collect initial data
- acquire within the project the data listed in the project resources
- includes data loading if necessary for data understanding
- possibly leads to initial data preparation steps
- if acquiring multiple data sources, integration is an additional issue,
either here or in the later data preparation phase
• Describe data
- examine the “gross” or “surface” properties of the acquired data
- report on the results
18
Phase 2. Data Understanding
• Explore data
- tackles the data mining questions, which can be addressed using
querying, visualization and reporting including:
distribution of key attributes, results of simple aggregations
relations between pairs or small numbers of attributes
properties of significant sub-populations, simple statistical analyses
- may address directly the data mining goals
- may contribute to or refine the data description and quality reports
- may feed into the transformation and other data preparation needed
• Verify data quality
- examine the quality of the data, addressing questions such as:
“Is the data complete?”, Are there missing values in the data?”
19
Phase 3. Data Preparation
•Takes usually over 90% of the time
- Collection
- Assessment
- Consolidation and Cleaning
- Data selection
- Transformations
Covers all activities to construct the final dataset from the initial raw data.
Data preparation tasks are likely to be performed multiple times and not in
any prescribed order. Tasks include table, record and attribute selection as
well as transformation and cleaning of data for modeling tools.
20
Phase 3. Data Preparation
• Select data
- decide on the data to be used for analysis
- criteria include relevance to the data mining goals, quality and technical
constraints such as limits on data volume or data types
- covers selection of attributes as well as selection of records in a table
• Clean data
- raise the data quality to the level required by the selected analysis
techniques
- may involve selection of clean subsets of the data, the insertion of
suitable defaults or more ambitious techniques such as the estimation
of missing data by modeling
21
Phase 3. Data Preparation
• Construct data
- constructive data preparation operations such as the production of
derived attributes, entire new records or transformed values for existing
attributes
• Integrate data
- methods whereby information is combined from multiple tables or
records to create new records or values
• Format data
- formatting transformations refer to primarily syntactic modifications
made to the data that do not change its meaning, but might be required
by the modeling tool
22
Phase 4. Modeling
• Select the modeling technique
(based upon the data mining objective)
• Build model
(Parameter settings)
• Assess model (rank the models)
Various modeling techniques are selected and applied and their parameters
are calibrated to optimal values. Some techniques have specific requirements
on the form of data. Therefore, stepping back to the data preparation phase
is often necessary.
23
Phase 4. Modeling
• Select modeling technique
- select the actual modeling technique that is to be used
ex) decision tree, neural network
- if multiple techniques are applied, perform this task for each techniques
separately
• Generate test design
- before actually building a model, generate a procedure or mechanism
to test the model’s quality and validity
ex) In classification, it is common to use error rates as quality measures
for data mining models. Therefore, typically separate the dataset into
train and test set, build the model on the train set and estimate its
quality on the separate test set
24
Phase 4. Modeling
• Build model
- run the modeling tool on the prepared dataset to create one or more
models
• Assess model
- interprets the models according to his domain knowledge, the data
mining success criteria and the desired test design
- judges the success of the application of modeling and discovery
techniques more technically
- contacts business analysts and domain experts later in order to discuss
the data mining results in the business context
- only consider models whereas the evaluation phase also takes into
account all other results that were produced in the course of the project
25
Phase 5. Evaluation
• Evaluation of model
- how well it performed on test data
• Methods and criteria
- depend on model type
• Interpretation of model
- important or not, easy or hard depends on algorithm
Thoroughly evaluate the model and review the steps executed to construct
the model to be certain it properly achieves the business objectives. A key
objective is to determine if there is some important business issue that has
not been sufficiently considered. At the end of this phase, a decision on the
use of the data mining results should be reached
26
Phase 5. Evaluation
• Evaluate results
- assesses the degree to which the model meets the business
objectives
- seeks to determine if there is some business reason why this
model is deficient
- test the model(s) on test applications in the real application if
time and budget constraints permit
- also assesses other data mining results generated
- unveil additional challenges, information or hints for future
directions
27
Phase 5. Evaluation
• Review process
- do a more thorough review of the data mining engagement in order to
determine if there is any important factor or task that has somehow
been overlooked
- review the quality assurance issues
ex) “Did we correctly build the model?”
• Determine next steps
- decides how to proceed at this stage
- decides whether to finish the project and move on to deployment if
appropriate or whether to initiate further iterations or set up new data
mining projects
- include analyses of remaining resources and budget that influences the
decisions
28
Phase 6. Deployment
• Determine how the results need to be utilized
• Who needs to use them?
• How often do they need to be used
• Deploy Data Mining results by
Scoring a database, utilizing results as business rules,
interactive scoring on-line
The knowledge gained will need to be organized and presented in a way that
the customer can use it. However, depending on the requirements, the
deployment phase can be as simple as generating a report or as complex as
implementing a repeatable data mining process across the enterprise.
29
Phase 6. Deployment
• Plan deployment
- in order to deploy the data mining result(s) into the business, takes the
evaluation results and concludes a strategy for deployment
- document the procedure for later deployment
• Plan monitoring and maintenance
- important if the data mining results become part of the day-to-day
business and it environment
- helps to avoid unnecessarily long periods of incorrect usage of data
mining results
- needs a detailed on monitoring process
- takes into account the specific type of deployment
30
Phase 6. Deployment
• Produce final report
- the project leader and his team write up a final report
- may be only a summary of the project and its experiences
- may be a final and comprehensive presentation of the data mining
result(s)
• Review project
- assess what went right and what went wrong, what was done well and
what needs to be improved
31
Summary
• Why CRISP-DM?
The data mining process must be reliable and repeatable
by people with little data mining skills
CRISP-DM provides a uniform framework for
- guidelines
- experience documentation
CRISP-DM is flexible to account for differences
- Different business/agency problems
- Different data
32
Questions & Discussion
33
Thank you
very much!!!

More Related Content

What's hot

Executive Search International Presentation Slideshow
Executive Search International Presentation SlideshowExecutive Search International Presentation Slideshow
Executive Search International Presentation Slideshowjlohr
 
Power BI vs Tableau vs Cognos: A Data Analytics Research
Power BI vs Tableau vs Cognos: A Data Analytics ResearchPower BI vs Tableau vs Cognos: A Data Analytics Research
Power BI vs Tableau vs Cognos: A Data Analytics ResearchLuciano Vilas Boas
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceSrishti44
 
Accenture company details
Accenture company details Accenture company details
Accenture company details Arabinda Das
 
Data Mining Technique - CRISP-DM
Data Mining Technique - CRISP-DMData Mining Technique - CRISP-DM
Data Mining Technique - CRISP-DMAshish Chandra Jha
 
Types of Database Models
Types of Database ModelsTypes of Database Models
Types of Database ModelsMurassa Gillani
 
Market basketanalysis using r
Market basketanalysis using rMarket basketanalysis using r
Market basketanalysis using rYogesh Khandelwal
 
DataEd Slides: Data Strategy Best Practices
DataEd Slides:  Data Strategy Best PracticesDataEd Slides:  Data Strategy Best Practices
DataEd Slides: Data Strategy Best PracticesDATAVERSITY
 
System Analysis And Design Management Information System
System Analysis And Design Management Information SystemSystem Analysis And Design Management Information System
System Analysis And Design Management Information Systemnayanav
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...Edureka!
 

What's hot (20)

Accenture-HRM
Accenture-HRMAccenture-HRM
Accenture-HRM
 
Executive Search International Presentation Slideshow
Executive Search International Presentation SlideshowExecutive Search International Presentation Slideshow
Executive Search International Presentation Slideshow
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Power BI vs Tableau vs Cognos: A Data Analytics Research
Power BI vs Tableau vs Cognos: A Data Analytics ResearchPower BI vs Tableau vs Cognos: A Data Analytics Research
Power BI vs Tableau vs Cognos: A Data Analytics Research
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Analytics in Online Retail
Analytics in Online RetailAnalytics in Online Retail
Analytics in Online Retail
 
Accenture company details
Accenture company details Accenture company details
Accenture company details
 
Etl elt simplified
Etl elt simplifiedEtl elt simplified
Etl elt simplified
 
data-mining-tutorial.ppt
data-mining-tutorial.pptdata-mining-tutorial.ppt
data-mining-tutorial.ppt
 
Data Mining Technique - CRISP-DM
Data Mining Technique - CRISP-DMData Mining Technique - CRISP-DM
Data Mining Technique - CRISP-DM
 
Types of Database Models
Types of Database ModelsTypes of Database Models
Types of Database Models
 
Market basketanalysis using r
Market basketanalysis using rMarket basketanalysis using r
Market basketanalysis using r
 
DataEd Slides: Data Strategy Best Practices
DataEd Slides:  Data Strategy Best PracticesDataEd Slides:  Data Strategy Best Practices
DataEd Slides: Data Strategy Best Practices
 
Data warehousing ppt
Data warehousing pptData warehousing ppt
Data warehousing ppt
 
System Analysis And Design Management Information System
System Analysis And Design Management Information SystemSystem Analysis And Design Management Information System
System Analysis And Design Management Information System
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
 
Crisp dm
Crisp dmCrisp dm
Crisp dm
 
Big Data Trends
Big Data TrendsBig Data Trends
Big Data Trends
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Lect 1-2
Lect 1-2Lect 1-2
Lect 1-2
 

Similar to crisp.ppt

Data Mining Implementation process.pptx
Data Mining Implementation process.pptxData Mining Implementation process.pptx
Data Mining Implementation process.pptxLithal Fragrance
 
Crisp dm
Crisp dmCrisp dm
Crisp dmakbkck
 
Knowledge discovery process
Knowledge discovery process Knowledge discovery process
Knowledge discovery process Shuvra Ghosh
 
Lecture 10 - DataMiningEngineering.ppt
Lecture 10 - DataMiningEngineering.pptLecture 10 - DataMiningEngineering.ppt
Lecture 10 - DataMiningEngineering.pptAsadkhan47384
 
Group 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxGroup 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxellamangapis2003
 
finalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxfinalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxshumPanwar
 
Module 5 - Data Science Methodology.pdf
Module 5 - Data Science Methodology.pdfModule 5 - Data Science Methodology.pdf
Module 5 - Data Science Methodology.pdffathiah5
 
Leveraging Oracle's Life Sciences Data Hub to Enable Dynamic Cross-Study Anal...
Leveraging Oracle's Life Sciences Data Hub to Enable Dynamic Cross-Study Anal...Leveraging Oracle's Life Sciences Data Hub to Enable Dynamic Cross-Study Anal...
Leveraging Oracle's Life Sciences Data Hub to Enable Dynamic Cross-Study Anal...Perficient
 
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docxDATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docxrandyburney60861
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptxXanGwaps
 
Mind Map Test Data Management Overview
Mind Map Test Data Management OverviewMind Map Test Data Management Overview
Mind Map Test Data Management Overviewdublinx
 
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]ssuser23e4f31
 
Bda life cycle slideshare
Bda life cycle   slideshareBda life cycle   slideshare
Bda life cycle slideshareSathyaseelanK1
 
gis project planning and management
gis project planning and managementgis project planning and management
gis project planning and managementAbhiram Kanigolla
 
Saksham Sarode - Building Effective test Data Management in Distributed Envir...
Saksham Sarode - Building Effective test Data Management in Distributed Envir...Saksham Sarode - Building Effective test Data Management in Distributed Envir...
Saksham Sarode - Building Effective test Data Management in Distributed Envir...TEST Huddle
 
An Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data miningAn Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data miningBarry Leventhal
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introductionBasma Gamal
 

Similar to crisp.ppt (20)

Data Mining Implementation process.pptx
Data Mining Implementation process.pptxData Mining Implementation process.pptx
Data Mining Implementation process.pptx
 
Crisp dm
Crisp dmCrisp dm
Crisp dm
 
Knowledge discovery process
Knowledge discovery process Knowledge discovery process
Knowledge discovery process
 
ml-02x01.pdf
ml-02x01.pdfml-02x01.pdf
ml-02x01.pdf
 
Lecture 10 - DataMiningEngineering.ppt
Lecture 10 - DataMiningEngineering.pptLecture 10 - DataMiningEngineering.ppt
Lecture 10 - DataMiningEngineering.ppt
 
Group 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxGroup 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptx
 
finalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxfinalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptx
 
Module 5 - Data Science Methodology.pdf
Module 5 - Data Science Methodology.pdfModule 5 - Data Science Methodology.pdf
Module 5 - Data Science Methodology.pdf
 
Leveraging Oracle's Life Sciences Data Hub to Enable Dynamic Cross-Study Anal...
Leveraging Oracle's Life Sciences Data Hub to Enable Dynamic Cross-Study Anal...Leveraging Oracle's Life Sciences Data Hub to Enable Dynamic Cross-Study Anal...
Leveraging Oracle's Life Sciences Data Hub to Enable Dynamic Cross-Study Anal...
 
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docxDATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
 
Mind Map Test Data Management Overview
Mind Map Test Data Management OverviewMind Map Test Data Management Overview
Mind Map Test Data Management Overview
 
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
 
Bda life cycle slideshare
Bda life cycle   slideshareBda life cycle   slideshare
Bda life cycle slideshare
 
Datascience methodology
Datascience methodologyDatascience methodology
Datascience methodology
 
gis project planning and management
gis project planning and managementgis project planning and management
gis project planning and management
 
Saksham Sarode - Building Effective test Data Management in Distributed Envir...
Saksham Sarode - Building Effective test Data Management in Distributed Envir...Saksham Sarode - Building Effective test Data Management in Distributed Envir...
Saksham Sarode - Building Effective test Data Management in Distributed Envir...
 
An Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data miningAn Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data mining
 
KDD assignmnt data.docx
KDD assignmnt data.docxKDD assignmnt data.docx
KDD assignmnt data.docx
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
 

Recently uploaded

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 

Recently uploaded (20)

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 

crisp.ppt

  • 1. 1 CRISP-DM CSE634 Data Mining Prof. Anita Wasilewska Jae Hong Kil (105228510)
  • 2. 2 References • Pete Chapman (NCR), Julian Clinton (SPSS), Randy Kerber (NCR), Thomas Khabaza (SPSS), Thomas Reinartz, (DaimlerChrysler), Colin Shearer (SPSS) and Rüdiger Wirth (DaimlerChrysler) “CRISP-DM 1.0 - Step-by-step data mining guide” • P. Gonzalez-Aranda, E.Menasalvas, S.Millan, F. Segovia “Towards a Methodology for Data Mining Project Development: The Importance of Abstraction” • Laura Squier “What is Data Mining?” PPT • “The CRISP-DM Model: The New Blueprint for DataMining”, Colin Shearer, JOURNAL of Data Warehousing, Volume 5, Number 4, p. 13-22, 2000
  • 4. 4 Overview • Introduction to CRISP-DM • Phases and Tasks • Summary
  • 6. 6 Why Should There be a Standard Process? The data mining process must be reliable and repeatable by people with little data mining background.
  • 7. 7 Why Should There be a Standard Process? • Framework for recording experience – Allows projects to be replicated • Aid to project planning and management • “Comfort factor” for new adopters – Demonstrates maturity of Data Mining – Reduces dependency on “stars”
  • 8. 8 Process Standardization • Initiative launched in late 1996 by three “veterans” of data mining market. Daimler Chrysler (then Daimler-Benz), SPSS (then ISL) , NCR • Developed and refined through series of workshops (from 1997-1999) • Over 300 organization contributed to the process model • Published CRISP-DM 1.0 (1999) • Over 200 members of the CRISP-DM SIG worldwide - DM Vendors - SPSS, NCR, IBM, SAS, SGI, Data Distilleries, Syllogic, etc. - System Suppliers / consultants - Cap Gemini, ICL Retail, Deloitte & Touche, etc. - End Users - BT, ABB, Lloyds Bank, AirTouch, Experian, etc.
  • 9. 9 CRISP-DM • Non-proprietary • Application/Industry neutral • Tool neutral • Focus on business issues – As well as technical analysis • Framework for guidance • Experience base – Templates for Analysis
  • 10. 10 CRISP-DM: Overview • Data Mining methodology • Process Model • For anyone • Provides a complete blueprint • Life cycle: 6 phases
  • 11. 11 CRISP-DM: Phases • Business Understanding Project objectives and requirements understanding, Data mining problem definition • Data Understanding Initial data collection and familiarization, Data quality problems identification • Data Preparation Table, record and attribute selection, Data transformation and cleaning • Modeling Modeling techniques selection and application, Parameters calibration • Evaluation Business objectives & issues achievement evaluation • Deployment Result model deployment, Repeatable data mining process implementation
  • 12. 12 Phases and Tasks Business Understanding Data Understanding Data Preparation Modeling Deployment Evaluation Format Data Integrate Data Construct Data Clean Data Select Data Determine Business Objectives Review Project Produce Final Report Plan Monitering & Maintenance Plan Deployment Determine Next Steps Review Process Evaluate Results Assess Model Build Model Generate Test Design Select Modeling Technique Assess Situation Explore Data Describe Data Collect Initial Data Determine Data Mining Goals Verify Data Quality Produce Project Plan
  • 13. 13 Phase 1. Business Understanding • Statement of Business Objective • Statement of Data Mining Objective • Statement of Success Criteria Focuses on understanding the project objectives and requirements from a business perspective, then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives
  • 14. 14 Phase 1. Business Understanding • Determine business objectives - thoroughly understand, from a business perspective, what the client really wants to accomplish - uncover important factors, at the beginning, that can influence the outcome of the project - neglecting this step is to expend a great deal of effort producing the right answers to the wrong questions • Assess situation - more detailed fact-finding about all of the resources, constraints, assumptions and other factors that should be considered - flesh out the details
  • 15. 15 Phase 1. Business Understanding • Determine data mining goals - a business goal states objectives in business terminology - a data mining goal states project objectives in technical terms ex) the business goal: “Increase catalog sales to existing customers.” a data mining goal: “Predict how many widgets a customer will buy, given their purchases over the past three years, demographic information (age, salary, city) and the price of the item.” • Produce project plan - describe the intended plan for achieving the data mining goals and the business goals - the plan should specify the anticipated set of steps to be performed during the rest of the project including an initial selection of tools and techniques
  • 16. 16 Phase 2. Data Understanding • Explore the Data • Verify the Quality • Find Outliers Starts with an initial data collection and proceeds with activities in order to get familiar with the data, to identify data quality problems, to discover first insights into the data or to detect interesting subsets to form hypotheses for hidden information.
  • 17. 17 Phase 2. Data Understanding • Collect initial data - acquire within the project the data listed in the project resources - includes data loading if necessary for data understanding - possibly leads to initial data preparation steps - if acquiring multiple data sources, integration is an additional issue, either here or in the later data preparation phase • Describe data - examine the “gross” or “surface” properties of the acquired data - report on the results
  • 18. 18 Phase 2. Data Understanding • Explore data - tackles the data mining questions, which can be addressed using querying, visualization and reporting including: distribution of key attributes, results of simple aggregations relations between pairs or small numbers of attributes properties of significant sub-populations, simple statistical analyses - may address directly the data mining goals - may contribute to or refine the data description and quality reports - may feed into the transformation and other data preparation needed • Verify data quality - examine the quality of the data, addressing questions such as: “Is the data complete?”, Are there missing values in the data?”
  • 19. 19 Phase 3. Data Preparation •Takes usually over 90% of the time - Collection - Assessment - Consolidation and Cleaning - Data selection - Transformations Covers all activities to construct the final dataset from the initial raw data. Data preparation tasks are likely to be performed multiple times and not in any prescribed order. Tasks include table, record and attribute selection as well as transformation and cleaning of data for modeling tools.
  • 20. 20 Phase 3. Data Preparation • Select data - decide on the data to be used for analysis - criteria include relevance to the data mining goals, quality and technical constraints such as limits on data volume or data types - covers selection of attributes as well as selection of records in a table • Clean data - raise the data quality to the level required by the selected analysis techniques - may involve selection of clean subsets of the data, the insertion of suitable defaults or more ambitious techniques such as the estimation of missing data by modeling
  • 21. 21 Phase 3. Data Preparation • Construct data - constructive data preparation operations such as the production of derived attributes, entire new records or transformed values for existing attributes • Integrate data - methods whereby information is combined from multiple tables or records to create new records or values • Format data - formatting transformations refer to primarily syntactic modifications made to the data that do not change its meaning, but might be required by the modeling tool
  • 22. 22 Phase 4. Modeling • Select the modeling technique (based upon the data mining objective) • Build model (Parameter settings) • Assess model (rank the models) Various modeling techniques are selected and applied and their parameters are calibrated to optimal values. Some techniques have specific requirements on the form of data. Therefore, stepping back to the data preparation phase is often necessary.
  • 23. 23 Phase 4. Modeling • Select modeling technique - select the actual modeling technique that is to be used ex) decision tree, neural network - if multiple techniques are applied, perform this task for each techniques separately • Generate test design - before actually building a model, generate a procedure or mechanism to test the model’s quality and validity ex) In classification, it is common to use error rates as quality measures for data mining models. Therefore, typically separate the dataset into train and test set, build the model on the train set and estimate its quality on the separate test set
  • 24. 24 Phase 4. Modeling • Build model - run the modeling tool on the prepared dataset to create one or more models • Assess model - interprets the models according to his domain knowledge, the data mining success criteria and the desired test design - judges the success of the application of modeling and discovery techniques more technically - contacts business analysts and domain experts later in order to discuss the data mining results in the business context - only consider models whereas the evaluation phase also takes into account all other results that were produced in the course of the project
  • 25. 25 Phase 5. Evaluation • Evaluation of model - how well it performed on test data • Methods and criteria - depend on model type • Interpretation of model - important or not, easy or hard depends on algorithm Thoroughly evaluate the model and review the steps executed to construct the model to be certain it properly achieves the business objectives. A key objective is to determine if there is some important business issue that has not been sufficiently considered. At the end of this phase, a decision on the use of the data mining results should be reached
  • 26. 26 Phase 5. Evaluation • Evaluate results - assesses the degree to which the model meets the business objectives - seeks to determine if there is some business reason why this model is deficient - test the model(s) on test applications in the real application if time and budget constraints permit - also assesses other data mining results generated - unveil additional challenges, information or hints for future directions
  • 27. 27 Phase 5. Evaluation • Review process - do a more thorough review of the data mining engagement in order to determine if there is any important factor or task that has somehow been overlooked - review the quality assurance issues ex) “Did we correctly build the model?” • Determine next steps - decides how to proceed at this stage - decides whether to finish the project and move on to deployment if appropriate or whether to initiate further iterations or set up new data mining projects - include analyses of remaining resources and budget that influences the decisions
  • 28. 28 Phase 6. Deployment • Determine how the results need to be utilized • Who needs to use them? • How often do they need to be used • Deploy Data Mining results by Scoring a database, utilizing results as business rules, interactive scoring on-line The knowledge gained will need to be organized and presented in a way that the customer can use it. However, depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process across the enterprise.
  • 29. 29 Phase 6. Deployment • Plan deployment - in order to deploy the data mining result(s) into the business, takes the evaluation results and concludes a strategy for deployment - document the procedure for later deployment • Plan monitoring and maintenance - important if the data mining results become part of the day-to-day business and it environment - helps to avoid unnecessarily long periods of incorrect usage of data mining results - needs a detailed on monitoring process - takes into account the specific type of deployment
  • 30. 30 Phase 6. Deployment • Produce final report - the project leader and his team write up a final report - may be only a summary of the project and its experiences - may be a final and comprehensive presentation of the data mining result(s) • Review project - assess what went right and what went wrong, what was done well and what needs to be improved
  • 31. 31 Summary • Why CRISP-DM? The data mining process must be reliable and repeatable by people with little data mining skills CRISP-DM provides a uniform framework for - guidelines - experience documentation CRISP-DM is flexible to account for differences - Different business/agency problems - Different data