Predictive Analytics: Advanced techniques in data mining

Presented at SAS Business Analytics 2011 event in Singapore.

  1. Predictive Analytics: Advanced Techniques in Data Mining
     Sara Venturina
     Copyright © 2011, SAS Institute Inc. All rights reserved.
  2. Agenda
       • What is predictive analytics?
       • Predictive Analytics Process
       • Data Preparation techniques
       • Modeling Techniques
       • Model Monitoring techniques
  3. What is Predictive Analytics? Different levels of analytics
     [Diagram listing the levels of analytics:]
       • Standard reports
       • Ad hoc reports
       • Query drilldown (or OLAP)
       • Alerts
       • Statistical analysis
       • Forecasting
       • Predictive modeling
       • Optimization
  4. What is Predictive Analytics? Unfortunately, there is no "magic" involved!
       • Uses data from different source tables
       • Utilizes various data transformation techniques
       • Employs statistical theory as its foundation
       • Needs software to manage all of this
     Business/commercial analytics (as opposed to research analytics) is trickier, because you need to balance theory with realistic application.
  5. Predictive Analytics Process
     [Diagram of the process stages: Defining Objectives, Data Preparation, Modeling, Deployment, Model Monitoring]
  6. Data Preparation Techniques
       • Possible data sources
       • Data transformation techniques
       • Deriving "behavioral" information
       • Data quality check before modeling
  7. Data Preparation Techniques: Possible data sources
       • Data warehouse / data marts
       • Operational systems, e.g. transaction systems, billing, call center data, etc.
       • External data, e.g. survey data, campaign data, data from external agencies, etc.
     For external data, make sure the information is consistently available.
  8. Data Preparation Techniques: Data transformation techniques
       • Entity-level information
       • Indicator variables
           • Are values skewed towards one level?
       • Categorization/grouping of values
           • Are there too many levels?
           • Are there values that rarely occur?
       • Binning of continuous variables
       • Benchmarking information, e.g. industry benchmarking
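The checks on this slide can be sketched in a few lines of Python. This is only an illustration, not the presenter's SAS workflow: the function names, the 10% rarity threshold, and the age cut points are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import Counter


def indicator_skew(flags):
    """Share of rows that fall in the most common level of an indicator."""
    counts = Counter(flags)
    return max(counts.values()) / len(flags)


def group_rare_levels(values, min_share=0.05, other_label="OTHER"):
    """Collapse levels whose share of rows is below min_share into one bucket."""
    counts = Counter(values)
    total = len(values)
    keep = {level for level, n in counts.items() if n / total >= min_share}
    return [v if v in keep else other_label for v in values]


def bin_continuous(values, cut_points):
    """Return a bin index per value; bin i holds values in [cut[i-1], cut[i])."""
    cuts = sorted(cut_points)
    return [bisect_right(cuts, v) for v in values]


# Hypothetical data: a plan code with two rare levels, and customer ages.
plans = ["A"] * 10 + ["B"] * 8 + ["C"] * 1 + ["D"] * 1
grouped = group_rare_levels(plans, min_share=0.10)  # C and D become "OTHER"

ages = [17, 25, 40, 65, 80]
age_bins = bin_continuous(ages, cut_points=[18, 35, 65])

skew = indicator_skew([1, 0, 0, 0])
```

An indicator whose dominant level covers nearly all rows, or a categorical variable with many sparse levels, usually carries little signal as is, which is why the slide asks these questions before modeling.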
  9. Data Preparation Techniques: Deriving "behavioral" information using several time periods
       • Average behavior over the last X time periods
       • Measures of variation
           • Standard deviation
           • Coefficient of variation
           • Deviation from the mean
       • Measures of trend information
           • Ratio of 1 vs 3, 3 vs 6 time periods
           • Proportion of current vs average of last X time periods
           • Slope of regression line
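A minimal sketch of these derived variables for one customer follows. It assumes the history is a list of per-period values ordered oldest to newest, that the "1 vs 3" ratio means the latest period against the average of the last three, and that the population standard deviation is wanted; the slide specifies none of these, and the feature names are hypothetical.

```python
from statistics import mean, pstdev


def slope(ys):
    """Least-squares slope of ys against the period index 0..n-1."""
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def behavioral_features(history):
    """Derive level, variation, and trend measures from at least 6 periods."""
    last3, last6 = history[-3:], history[-6:]
    avg3, avg6 = mean(last3), mean(last6)
    sd6 = pstdev(last6)
    return {
        "avg_6": avg6,
        "std_6": sd6,
        "cv_6": sd6 / avg6 if avg6 else None,           # coefficient of variation
        "dev_from_mean": history[-1] - avg6,            # current vs 6-period mean
        "ratio_1v3": history[-1] / avg3 if avg3 else None,
        "ratio_3v6": avg3 / avg6 if avg6 else None,
        "slope_6": slope(last6),                        # trend per period
    }


# Hypothetical monthly usage for one customer, oldest first.
usage = [100, 110, 120, 130, 140, 150]
features = behavioral_features(usage)
```

Ratios above 1 and a positive slope both indicate rising activity; the zero checks avoid dividing by an average of zero for inactive customers.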
  10. Data Preparation Techniques: Data quality check before modeling
       • Generation of summary statistics of derived variables
       • Random checking
       • Correct imputation of missing values
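As a rough illustration of these checks, the sketch below summarizes one derived variable and fills its missing values. Median imputation is an assumption chosen for the example; the slide does not say which imputation method is "correct" for a given variable, and the function names are hypothetical.

```python
from statistics import mean, median


def summarize(values):
    """Basic summary statistics, counting None as a missing value."""
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "n_missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
    }


def impute_median(values):
    """Replace missing values with the median of the observed values."""
    present = [v for v in values if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in values]


# Hypothetical derived variable with two missing entries.
spend = [20.0, None, 35.0, 50.0, None, 45.0]
report = summarize(spend)
imputed = impute_median(spend)
```

Comparing the summary before and after imputation is a quick way to confirm that the fill value did not distort the distribution.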
  11. Modeling Techniques
       • Use of SAS Enterprise Miner
       • Ensemble modeling outside of SAS
       • Base SAS modeling, e.g. for categorical targets, survival analysis, etc.
  12. Modeling Techniques: Use of SAS Enterprise Miner
      For initial/basic modeling, use Decision Tree and Regression. Neural networks can be used to provide diagnostic insights.
  13. Modeling Techniques: Ensemble modeling in and out of SAS EM
        Ensemble models based on the following models    Weightage
        Model 1 Decision                                 0.4
        Model 2 Regression                               0.6
        Model 3 Regression                               0.4
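One way to combine model scores outside SAS Enterprise Miner is a weighted average, sketched below. The slide does not state how the weights are applied, so dividing by the sum of the weights is an assumption, as are the model names and scores; the 0.4 and 0.6 weights are borrowed from the first two rows of the table for illustration.

```python
def weighted_ensemble(scores_by_model, weights):
    """Blend per-record scores from several models using normalized weights."""
    total = sum(weights.values())
    n = len(next(iter(scores_by_model.values())))
    return [
        sum(weights[m] * scores_by_model[m][i] for m in weights) / total
        for i in range(n)
    ]


# Hypothetical scores for two customers from two models.
scores = {"tree": [0.2, 0.9], "regression": [0.4, 0.7]}
weights = {"tree": 0.4, "regression": 0.6}
blended = weighted_ensemble(scores, weights)
```

Normalizing by the weight total keeps the blended score on the same 0 to 1 scale as the inputs even when the weights do not sum to 1.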
  14. Modeling Techniques: Base SAS modeling
       • Categorical data modeling, e.g.
           • PROC CATMOD/GENMOD
           • PROC SURVEYLOGISTIC
       • Survival analysis:
           • PROC LIFEREG
           • PROC LIFETEST
           • PROC PHREG
      Base SAS modeling requires more familiarity with the underlying statistical concepts.
  15. Model Monitoring Techniques
       • Comparing actual vs predicted
       • Scored base analysis:
           • Variable distribution analysis
           • Predicted score distribution
  16. Model Monitoring: Monitoring of model assessment charts
       • One chart measures what percentage of all churners are in the scoring list (e.g. the top 10% of scores captured 40% of actual churners).
       • Another chart compares the effectiveness of running a model versus selecting randomly.
      Other model assessment statistics can be computed, such as hit rate, Gini coefficient, etc.
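The two chart measures described on this slide can be computed directly from scored records once actual outcomes are known. The sketch below is an assumption-laden illustration: the function name, the definition of hit rate as the churn rate within the selected list, and the sample data are all hypothetical.

```python
def top_fraction_metrics(scores, actuals, top_fraction=0.10):
    """Capture rate, hit rate, and lift for the highest-scoring fraction.

    scores: predicted churn scores; actuals: 1 for a churner, 0 otherwise.
    """
    ranked = sorted(zip(scores, actuals), key=lambda p: p[0], reverse=True)
    k = max(1, round(len(ranked) * top_fraction))   # size of the scoring list
    captured = sum(a for _, a in ranked[:k])
    total_churners = sum(actuals)
    hit_rate = captured / k                          # churn rate inside the list
    base_rate = total_churners / len(actuals)        # churn rate if chosen randomly
    return {
        "capture_rate": captured / total_churners,   # share of all churners caught
        "hit_rate": hit_rate,
        "lift": hit_rate / base_rate,                # model vs random selection
    }


# Hypothetical scored base of ten customers, two of whom churned.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
actuals = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
metrics = top_fraction_metrics(scores, actuals, top_fraction=0.20)
```

A lift of 1 means the model does no better than random selection, so tracking this value over successive scoring runs shows whether the model is degrading.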
  17. Model Monitoring (cont'd): Scored base analysis, e.g.
       • Variable distribution analysis
  18. Model Monitoring (cont'd): Scored base analysis, e.g.
       • Predicted score distribution
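A simple way to monitor a distribution over time is to bin the values and compare bin shares between a baseline scoring run and the current one. The sketch below assumes scores in the range 0 to 1 and equal-width bins; the function names and the choice of the largest absolute share difference as the summary are assumptions, not something the slides prescribe. The same comparison can be applied to an input variable after rescaling it.

```python
def bin_shares(scores, n_bins=5):
    """Share of records in each equal-width bin over the range [0, 1]."""
    counts = [0] * n_bins
    for s in scores:
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        counts[idx] += 1
    return [c / len(scores) for c in counts]


def max_share_shift(baseline, current, n_bins=5):
    """Largest absolute change in bin share between two scored bases."""
    b = bin_shares(baseline, n_bins)
    c = bin_shares(current, n_bins)
    return max(abs(x - y) for x, y in zip(b, c))


# Hypothetical scores from the development sample and a later scoring run.
baseline_scores = [0.1, 0.3, 0.5, 0.7, 0.9]
current_scores = [0.1, 0.1, 0.3, 0.5, 0.9]
shift = max_share_shift(baseline_scores, current_scores)
```

A large shift suggests the scored population has drifted from the one the model was built on, which is a cue to revisit data preparation or refit the model.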
  19. Predictive Analytics as an Iterative Process
      [Diagram of the process stages as a repeating cycle: Defining Objectives, Data Preparation, Modeling, Deployment, Model Monitoring]
  20. Questions?