DATA MINING PROCESS
Lecture 2
1.11.2012
Barbro Back
DATA MINING PROCESSES - STANDARD PROCESSES
  CRISP-DM
      Cross-Industry Standard Process for Data Mining
  SEMMA
      Is specific to SAS
Cross-Industry Standard Process for Data Mining (CRISP-DM)
provides an overview of the life cycle of a data mining project.

Six phases:
  Business understanding
  Data understanding
  Data preparation
  Modeling
  Evaluation
  Deployment

[Figure: Phases of the CRISP-DM Process Model]
CRISP-DM
1.    Business Understanding
2.    Data Understanding
3.    Data Preparation
4.    Modeling
5.    Evaluation
6.    Deployment
1 BUSINESS UNDERSTANDING
  Includes:
      Determining business objectives
        A managerial need for new knowledge
           What types of customers are interested in each of our products?
           What are typical profiles of our customers, and how much value does each of them provide to us?
      Assessing the current situation
      Establishing data mining goals
      Developing a project plan including a budget
2 DATA UNDERSTANDING
  Select the data
  Three important issues
      Set up a concise and clear description of the problem
      Identify the relevant data for the problem description
      The selected variables should be independent of each other (whether this is required depends on the method)
  Types of data
      Demographic data – income, education, gender, etc.
      Sociographic data – hobbies, club memberships, etc.
      Transactional data – sales records, credit card spending, etc.
      Quantitative data – numerical values
      Qualitative data – contains nominal and ordinal data
SCALES
  Nominal – no order between data points – e.g. gender
  Ordinal – order between data points – e.g. ranking results
  Interval – order between data points and equal distances between measurements, but no true zero point
  Ratio – an interval scale with a true zero point – e.g. sales have doubled: 1 million the previous month, 2 million this month

  Question: Is the Likert scale an ordinal or an interval scale?
3 DATA PREPARATION
  Clean data for better quality
  Convert data to be consistent
  Treatment of missing values
  Redundant data
  Determine the data types:
      In SPSS Modeler the following data types are used
      RANGE – numeric values (integer, real)
      FLAG – binary (yes/no, 0/1)
      SET – data with multiple distinct values (string)
      TYPELESS – for other types of data
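As a rough illustration of the four labels above, the following Python sketch guesses a label for a column of raw values. The function name `infer_storage_type` and all of its rules are assumptions made for this example; they are not SPSS Modeler's own logic, which may well differ.

```python
from typing import Iterable, Optional


def infer_storage_type(values: Iterable[Optional[object]]) -> str:
    """Guess one of the four slide labels for a column of raw values.

    Hypothetical rules, chosen only for illustration:
    missing values (None) are ignored, two distinct values mean FLAG,
    all-numeric columns are RANGE, all-string columns are SET,
    and anything else is TYPELESS.
    """
    present = [v for v in values if v is not None]
    if not present:
        return "TYPELESS"  # nothing to judge from

    distinct = set(present)
    if len(distinct) == 2:
        return "FLAG"  # binary column such as yes/no or 0/1

    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "RANGE"

    if all(isinstance(v, str) for v in present):
        return "SET"

    return "TYPELESS"
```

For instance, `infer_storage_type(["yes", "no", None, "yes"])` returns `"FLAG"`, because the missing value is skipped and two distinct values remain. A numeric column that happens to hold only two distinct numbers would also be labelled FLAG under these assumed rules, which shows why such guesses should be reviewed by hand.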
4 MODELING
  Data treatment
      Training set, validation set, test set
  Data mining techniques
      Association
      Classification
      Clustering-segmentation
      Predictions
      Sequential Patterns
      Similar Time Sequences
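The split into training, validation and test sets mentioned under data treatment can be sketched in Python as follows. The function name `split_three_way`, the 60/20/20 default fractions and the fixed seed are assumptions for this example; the slides do not prescribe any particular proportions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_three_way(
    rows: Sequence[T],
    train_frac: float = 0.6,   # assumed default, not from the slides
    valid_frac: float = 0.2,   # assumed default, not from the slides
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the rows and cut them into training, validation and test sets.

    Whatever is left after the training and validation parts becomes the test set.
    """
    if train_frac <= 0 or valid_frac <= 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")

    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable

    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)

    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```

With 100 rows and the default fractions this yields 60 training, 20 validation and 20 test rows, and every row ends up in exactly one of the three sets. The model is fitted on the training set, tuned on the validation set, and only the final assessment uses the test set.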
5 EVALUATION
  How to recognize the business value from knowledge discovered.
      A puzzle to be solved between data analysts, business analysts and decision makers
  Which visualization tool to use
      Pie charts, histograms, box plots, scatter plots, self-organizing maps
6 DEPLOYMENT
  Results need to be reported to project sponsors
  Monitoring for change is important
SEMMA (BY THE SAS INSTITUTE)
  Sample
  Explore
  Modify
  Model
  Assess
  See
      http://www.sas.com/offices/europe/uk/technologies/analytics/datamining/miner/semma.html
AN APPLICATION EXAMPLE (CRISP-DM)
  To predict which customers would become insolvent, early enough for the firm to take preventive actions
  Billing period was 2 months
  Customers used their phone for 4 weeks
  Received bill about 1 week later
  Payment was due 30 days after receiving the bill
  Actions if bill not paid within 14 days after due date
  Phone disconnected if bill exceeded a certain amount
  Hypothesis: Customers change their calling behaviour before becoming insolvent
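To see why early prediction matters, the delays listed on this slide can be added up. The Python sketch below assumes the steps follow one another without gaps and treats "about 1 week" as exactly 7 days; the constant and function names are hypothetical.

```python
from datetime import date, timedelta

# Durations taken from the slide; treating them as strictly sequential is an assumption.
USAGE_PERIOD = timedelta(weeks=4)   # customers used their phone for 4 weeks
BILL_DELAY = timedelta(weeks=1)     # bill received about 1 week later
PAYMENT_TERM = timedelta(days=30)   # payment due 30 days after receiving the bill
GRACE_PERIOD = timedelta(days=14)   # actions if unpaid 14 days after the due date


def days_until_action() -> int:
    """Days from the first call in a usage period until the firm can act."""
    return (USAGE_PERIOD + BILL_DELAY + PAYMENT_TERM + GRACE_PERIOD).days


def action_date(usage_start: date) -> date:
    """Earliest date on which action is taken for usage starting on usage_start."""
    return usage_start + timedelta(days=days_until_action())
```

Under these assumptions `days_until_action()` gives 28 + 7 + 30 + 14 = 79 days, so a customer could keep accumulating unpaid calls for well over two months before the firm reacts.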
EXAMPLE CONT.
  Data: 100 000 customers
  17-month period
  Discriminant analysis, decision trees and neural networks were used
  2066 cases
  46 initial variables
  Costs were allocated to misclassification errors

  Final result:
  89.8 % correctly classified with test data and a cost function = 360 € compared to 14 580 € in the first run.
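A cost function of this kind can be sketched in Python as a weighted count of the two possible errors. The slides do not give the unit costs or say which error was considered more expensive, so the function names and the cost parameters below are hypothetical.

```python
from typing import Sequence


def misclassification_cost(
    actual: Sequence[bool],
    predicted: Sequence[bool],
    cost_false_negative: float,  # hypothetical: an insolvent customer was missed
    cost_false_positive: float,  # hypothetical: a solvent customer was flagged
) -> float:
    """Sum the cost of every wrong prediction; True means 'insolvent'."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    total = 0.0
    for is_insolvent, flagged in zip(actual, predicted):
        if is_insolvent and not flagged:
            total += cost_false_negative
        elif flagged and not is_insolvent:
            total += cost_false_positive
    return total


def accuracy(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Share of cases classified correctly."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)
```

With five made-up cases where one insolvent customer is missed and one solvent customer is flagged, accuracy is 0.6, and with illustrative unit costs of 10 and 1 the total cost is 11.0. Two models with the same accuracy can therefore differ greatly in cost, which is presumably why the example reports both figures.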
