Lecture 2 1_11_2012_data_mining_process

  1. DATA MINING PROCESS
     Lecture 21.11.2012
     Barbro Back
  2. DATA MINING PROCESSES: STANDARD PROCESSES
     - CRISP-DM – Cross-Industry Standard Process for Data Mining
     - SEMMA – specific to SAS
  3. The Cross-Industry Standard Process for Data Mining (CRISP-DM) provides an overview of the life cycle of a data mining project. It has six phases: business understanding, data understanding, data preparation, modeling, evaluation and deployment.
     [Figure: Phases of the CRISP-DM Process Model]
  4. CRISP-DM
     1. Business Understanding
     2. Data Understanding
     3. Data Preparation
     4. Modeling
     5. Evaluation
     6. Deployment
  5. 1 BUSINESS UNDERSTANDING
     Includes:
     - Determining business objectives
       - A managerial need for new knowledge, for example:
         - What types of customers are interested in each of our products?
         - What are the typical profiles of our customers, and how much value does each of them provide to us?
     - Assessing the current situation
     - Establishing data mining goals
     - Developing a project plan, including a budget
  6. 2 DATA UNDERSTANDING
     - Select the data
     - Three important issues:
       - Set up a concise and clear description of the problem
       - Identify the relevant data for the problem description
       - Depending on the method, the selected variables should be independent of each other
     - Types of data:
       - Demographic data – income, education, gender, etc.
       - Socio-graphic data – hobbies, club memberships, etc.
       - Transactional data – sales records, credit card spending, etc.
       - Quantitative data – numerical values
       - Qualitative data – nominal and ordinal data
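Slide 6 notes that, depending on the method, the selected variables should be independent of each other. One common check can be sketched as follows, assuming numeric variables and using the Pearson correlation coefficient, which only detects linear dependence. The variable names, the values and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def dependent_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair of variables whose absolute
    correlation exceeds the threshold, i.e. candidates for dropping one of them."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical customer variables; yearly_income is simply income * 12.
customers = {
    "income": [2100, 3400, 2800, 5200, 4100],
    "yearly_income": [25200, 40800, 33600, 62400, 49200],
    "club_memberships": [0, 2, 1, 1, 3],
}
print(dependent_pairs(customers))  # [('income', 'yearly_income', 1.0)]
```

Here yearly_income adds no information beyond income, so one of the two would normally be left out; a low correlation, on the other hand, does not prove independence, because non-linear relationships are not detected.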
  7. SCALES
     - Nominal – no order between data points (e.g. gender)
     - Ordinal – order between data points (e.g. ranking results)
     - Interval – order between data points and equal distances between measurements, but no true zero point
     - Ratio – an interval scale with a true zero point, so a statement such as "sales have doubled" is meaningful (sales previous month 1 million, this month 2 million)
     - Question: Is the Likert scale an ordinal or an interval scale?
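The practical difference between the scales is which operations give meaningful results. The sketch below encodes the usual textbook mapping (mode for nominal data, median for ordinal data, differences and means for interval data, ratios only on a ratio scale). The Scale enum, the MEANINGFUL table and the ratio() helper are illustrative assumptions, not part of any library.

```python
from enum import Enum


class Scale(Enum):
    NOMINAL = 1   # no order between data points (e.g. gender)
    ORDINAL = 2   # ordered data points (e.g. ranking results)
    INTERVAL = 3  # ordered, equal distances, no true zero point
    RATIO = 4     # interval scale with a true zero point (e.g. sales)


# Operations that give meaningful results on each scale; each scale
# inherits everything that is meaningful on the scales before it.
MEANINGFUL = {
    Scale.NOMINAL: {"equality", "mode"},
    Scale.ORDINAL: {"equality", "mode", "order", "median"},
    Scale.INTERVAL: {"equality", "mode", "order", "median", "difference", "mean"},
    Scale.RATIO: {"equality", "mode", "order", "median", "difference", "mean", "ratio"},
}


def ratio(previous, current, scale):
    """Return current / previous, refusing scales on which a ratio has no meaning."""
    if "ratio" not in MEANINGFUL[scale]:
        raise ValueError(f"ratios are not meaningful on the {scale.name.lower()} scale")
    return current / previous


# Ratio scale: 1 million last month, 2 million this month -> sales have doubled.
print(ratio(1_000_000, 2_000_000, Scale.RATIO))  # 2.0

# Interval scale (e.g. temperature in degrees Celsius): 20 is not "twice" 10.
try:
    ratio(10, 20, Scale.INTERVAL)
except ValueError as err:
    print(err)  # ratios are not meaningful on the interval scale
```

In these terms, the Likert question on the slide is whether the mean of Likert responses should be allowed, i.e. whether such data belongs in the ORDINAL or the INTERVAL row.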
  8. 3 DATA PREPARATION
     - Clean data for better quality
     - Convert data to be consistent
     - Treatment of missing values
     - Redundant data
     - Determine the data types. In SPSS Modeler the following data types are used:
       - RANGE – numeric values (integer, real)
       - FLAG – binary (yes/no, 0/1)
       - SET – data with multiple distinct values (strings)
       - TYPELESS – other types of data
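The preparation steps can be sketched in plain Python. This is a hypothetical illustration, not SPSS Modeler's own type-detection logic: infer_type() assigns the slide's RANGE/FLAG/SET/TYPELESS labels with a simple heuristic, clean_column() makes strings consistent and fills missing values (the mean for numbers and the most common value otherwise, one common choice among several), and drop_duplicates() removes redundant records. All function names and data are assumptions.

```python
from statistics import mean, mode


def infer_type(values):
    """Assign one of the slide's type labels with a simple heuristic.
    Missing values (None) are ignored. Illustrative only: this is not
    how SPSS Modeler itself determines types."""
    present = [v for v in values if v is not None]
    if not present:
        return "TYPELESS"
    if len(set(present)) == 2:
        return "FLAG"  # exactly two distinct values, e.g. yes/no or 0/1
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "RANGE"
    if all(isinstance(v, str) for v in present):
        return "SET"
    return "TYPELESS"


def clean_column(values):
    """Make strings consistent (trimmed, lower case) and fill missing values:
    the mean for numeric columns, the most common value for the rest."""
    values = [v.strip().lower() if isinstance(v, str) else v for v in values]
    present = [v for v in values if v is not None]
    if not present:
        return values
    numeric = all(isinstance(v, (int, float)) for v in present)
    fill = mean(present) if numeric else mode(present)
    return [fill if v is None else v for v in values]


def drop_duplicates(rows):
    """Remove exact duplicate records (redundant data), keeping the first copy."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


income = [3000, None, 4200, 2400]
answers = ["Yes", "no", None, "yes "]  # hypothetical yes/no field

print(infer_type(income))                 # RANGE
print(clean_column(income))               # the missing income is replaced by the mean, 3200
print(infer_type(answers))                # SET, because "Yes" and "yes " look like different values
print(infer_type(clean_column(answers)))  # FLAG once the values are consistent
```

The answers column shows why conversion to consistent values comes before type determination: inconsistent spellings make a binary field look like a SET with three values.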
  9. 4 MODELING
     - Data treatment
       - Training set, validation set, test set
     - Data mining techniques
       - Association
       - Classification
       - Clustering (segmentation)
       - Prediction
       - Sequential patterns
       - Similar time sequences
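The data treatment on slide 9 can be sketched as a random three-way split. The training set is used to fit a model, the validation set to compare models or tune their settings, and the test set only for the final estimate of performance. The 60/20/20 proportions, the fixed seed and the function name are assumptions for illustration.

```python
import random


def split_dataset(records, train=0.6, validation=0.2, seed=42):
    """Shuffle a copy of the records and split it into training, validation
    and test sets; whatever is left after train + validation is the test set."""
    if not 0 < train + validation < 1:
        raise ValueError("train + validation must leave room for a test set")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


customer_ids = list(range(1, 101))  # hypothetical customer ids
train_set, val_set, test_set = split_dataset(customer_ids)
print(len(train_set), len(val_set), len(test_set))  # 60 20 20
```

Shuffling before splitting avoids sets that differ systematically, for example when the records are sorted by date or by customer group.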
  10. 5 EVALUATION
      - How to recognize the business value of the knowledge discovered
        - A puzzle to be solved jointly by data analysts, business analysts and decision makers
      - Which visualization tool to use?
        - Pie charts, histograms, box plots, scatter plots, self-organizing maps
  11. 6 DEPLOYMENT
      - Results need to be reported to project sponsors
      - Monitoring for change is important
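Monitoring for change can take many forms. One simple, hypothetical approach is to compare a deployed model's accuracy on recently resolved cases with the accuracy measured during evaluation, and to raise a flag when it drops by more than a chosen tolerance. The function name, the 0.05 tolerance and the data are illustrative assumptions.

```python
def needs_attention(baseline_accuracy, predictions, outcomes, tolerance=0.05):
    """Compare accuracy on recently resolved cases with the accuracy measured
    during evaluation. Returns (recent_accuracy, flag); flag is True when the
    drop exceeds the tolerance and the model should be reviewed."""
    if not outcomes or len(predictions) != len(outcomes):
        raise ValueError("need equally long, non-empty predictions and outcomes")
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    recent_accuracy = correct / len(outcomes)
    return recent_accuracy, baseline_accuracy - recent_accuracy > tolerance


# Hypothetical: 0.9 accuracy at evaluation time, 10 recent cases (1 = insolvent).
predictions = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
outcomes = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0]
print(needs_attention(0.9, predictions, outcomes))  # (0.7, True)
```

In practice the recent window would need to be much larger than ten cases before a drop is taken seriously.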
  12. SEMMA (BY THE SAS INSTITUTE)
      - Sample
      - Explore
      - Modify
      - Model
      - Assess
      - See http://www.sas.com/offices/europe/uk/technologies/analytics/datamining/miner/semma.html
  13. AN APPLICATION EXAMPLE (CRISP-DM)
      - Goal: to predict which customers would become insolvent, early enough for the firm to take preventive actions
      - The billing period was 2 months
      - Customers used their phone for 4 weeks
      - They received the bill about 1 week later
      - Payment was due 30 days after receiving the bill
      - Actions were taken if the bill was not paid within 14 days after the due date
      - The phone was disconnected if the bill exceeded a certain amount
      - Hypothesis: customers change their calling behaviour before becoming insolvent
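The payment timeline on slide 13 shows how much time passes before the firm can react to an unpaid bill. A small sketch, assuming the bill arrives exactly one week after the usage period ends (the slide says "about 1 week"), adds up the delays with Python's datetime module; the start date is hypothetical.

```python
from datetime import date, timedelta


def collection_timeline(usage_end):
    """Key dates after a usage period ends, using the slide's figures.
    Assumes the bill arrives exactly 7 days after the usage period ends."""
    bill_received = usage_end + timedelta(weeks=1)
    due_date = bill_received + timedelta(days=30)  # payment due 30 days after receipt
    action_date = due_date + timedelta(days=14)    # actions if still unpaid 14 days later
    return bill_received, due_date, action_date


usage_end = date(2012, 1, 31)  # hypothetical end of a usage period
received, due, action = collection_timeline(usage_end)
print(received, due, action)      # 2012-02-07 2012-03-08 2012-03-22
print((action - usage_end).days)  # 51
```

Under these assumptions about seven weeks (7 + 30 + 14 = 51 days) pass between the end of a usage period and the first collection action; predicting insolvency from calling behaviour aims to warn the firm before that point.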
  14. EXAMPLE CONT.
      - Data: 100 000 customers over a 17-month period
      - Discriminant analysis, decision trees and neural networks were used
      - 2066 cases
      - 46 initial variables
      - Costs were allocated to misclassification errors
      - Final result: 89.8 % of the test data correctly classified, with a cost function value of 360 € compared with 14 580 € in the first run
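The idea of allocating costs to misclassification errors can be sketched as follows. Two hypothetical models make the same number of errors, and so have the same accuracy, but differ in total cost because, under the assumed costs of 100 € per missed insolvent customer (false negative) and 10 € per wrongly flagged solvent customer (false positive), the kind of error matters. The costs, labels and predictions are illustrative only and are not the figures from the study.

```python
def evaluate(predicted, actual, cost_fn, cost_fp):
    """Accuracy and total misclassification cost of a binary classifier.
    1 = insolvent, 0 = solvent. A false negative (FN) is an insolvent customer
    the model missed; a false positive (FP) is a solvent customer wrongly flagged."""
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    accuracy = (len(actual) - fn - fp) / len(actual)
    return accuracy, fn * cost_fn + fp * cost_fp


# Hypothetical test data and costs (not the figures from the study).
actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses both insolvent customers
model_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # catches them, flags two solvent ones
print(evaluate(model_a, actual, cost_fn=100, cost_fp=10))  # (0.8, 200)
print(evaluate(model_b, actual, cost_fn=100, cost_fp=10))  # (0.8, 20)
```

Both models classify 80 % of the cases correctly, yet model B costs a tenth as much under these assumed costs, which illustrates why a cost function can be reported alongside the share of correctly classified cases.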
