10 best practices in operational analytics


One of the most powerful ways to apply advanced analytics is by putting them to work in operational systems. Using analytics to improve the way every transaction, every customer, every website visitor is handled is tremendously effective. The multiplicative effect means that even small analytic improvements add up to real business benefit.

This is the slide deck from the Webinar. James Taylor, CEO of Decision Management Solutions, and Dean Abbott of Abbott Analytics discuss 10 best practices to make sure you can effectively build and deploy analytic models into your operational systems. Webinar recording available here: https://decisionmanagement.omnovia.com/archives/70931

  1. Webinar: 10 best practices in operational analytics. James Taylor, CEO
  2. Your presenters. James Taylor, CEO of Decision Management Solutions. James works with clients to improve their business by applying analytics and analytic technology to automate and improve decisions. He has spent the last 8 years developing the concept of Decision Management and has 20 years of experience in all aspects of software. Dean Abbott, Owner of Abbott Analytics. Dean has applied Data Mining and Predictive Analytics for 22 years and provides mentoring, coaching, and solutions for Web Analytics, Compliance, Fraud Detection, Survey Analysis, Text Mining, Marketing and CRM analytics, and more. Dean has partnerships with the largest predictive analytics organizations in the US. ©2011 Decision Management Solutions
  3. AGENDA: 1. Introducing Operational Analytics. 2. The 10 Best Practices. 3. Wrap up.
  4. The 10 Best Practices: 1. Be flexible; data mining is not a set of rules! 2. Avoid 3 key data preparation, modeling mistakes. 3. Diversity is strength: build lots of models. 4. Pick the right metric to assess models. 5. Have deployment in mind when building models. 6. Focus on actions. 7. The three legged stool. 8. Focus on explicability. 9. Build in decision analysis. 10. BWTDIM.
  5. Introducing Operational Analytics
  6. Analytics have power: online acquisition, campaign response, conversion rates, risk, fraud, customer churn.
  7. And that power is operational. How do I… prevent this customer from churning? Convert this visitor? Acquire this prospect? Make this offer compelling to this person? Identify this claim as fraudulent? Correctly estimate the risk of this loan? It's not about "aha" moments; it's about making better operational decisions.
  8. Multiplying the power of analytics. [Chart: decision types (Strategy, Tactics, Operations) plotted against economic impact (Low to High)]
  9. Operational decisions matter. "Most discussions of decision making assume that only senior executives make decisions or that only senior executives' decisions matter. This is a dangerous mistake." Peter Drucker
  10. 10 Best Practices
  11. Be Flexible: Data Mining is Not a Series of Recipes. Data mining project entry points: 1) Business Understanding 2) Data Understanding. Data mining project next steps: 1) Data Understanding 2) Modeling, then Data Preparation 3) Data Preparation, then Data Understanding, then Modeling. [Diagram: the cycle of Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment around the data]
  12. Avoid the Three Biggest Data Preparation Mistakes. 1. Don't blindly use data mining software defaults. Missing data: is the record with missing values in one of the fields kept at all? What value is filled in? What effect will this have? Exploding categorical variables with large numbers of values: what happens to the models?
  13. Some Software Fills Missing Values Automatically. Common automated missing value imputation: 0, mid-point, mean, or listwise deletion. The example at upper right has 5300+ records, with 17 missing values encoded as "0". After fixing the model with mean imputation, R^2 rises from 0.597 to 0.657.
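The mean-imputation fix described on this slide can be sketched in a few lines. The field values and the choice of 0 as the missing-value code are illustrative, mirroring the slide's example rather than its actual data:

```python
# Mean imputation sketch: replace missing values (encoded here as 0,
# as in the slide's example) with the mean of the observed values.
from statistics import mean

def impute_mean(values, missing=0):
    """Replace `missing` codes with the mean of the non-missing values."""
    observed = [v for v in values if v != missing]
    fill = mean(observed)
    return [fill if v == missing else v for v in values]

incomes = [52.0, 0, 48.0, 61.0, 0, 55.0]   # 0 encodes "missing"
print(impute_mean(incomes))                 # fills both gaps with 54.0
```

The point of doing this deliberately, rather than accepting a software default, is that you know exactly which records were changed and what effect the fill value will have downstream.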
  14. Avoid the Three Biggest Data Preparation Mistakes. 2. Don't forget some algorithms assume distributions for the data. Some algorithms assume normally distributed data: linear regression, Bayes and Nearest Mean classifiers.
  15. How Non-normality Affects Regression Models. Regression models' "fit" is worse with skewed (non-normal) data. In the example at right, simply applying the log transform improves performance from R^2 = 0.566 to 0.597.
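The effect of a log transform on a skewed predictor can be demonstrated with a toy simple-regression fit. The data below is constructed so the relationship is logarithmic; it is illustrative, not the slide's actual data set:

```python
import math

def r_squared(xs, ys):
    """R^2 of a simple least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Right-skewed predictor with a logarithmic relationship to the target.
xs = [1, 2, 3, 5, 10, 20, 50, 100, 200, 500]
ys = [math.log(x) for x in xs]

r2_raw = r_squared(xs, ys)
r2_log = r_squared([math.log(x) for x in xs], ys)
print(r2_raw, r2_log)   # the log-transformed fit is better
```

With real (noisy) data the improvement is smaller, as on the slide, but the direction is the same: straightening the relationship raises the fit of a linear model.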
  16. Avoid the Three Biggest Data Preparation Mistakes. 2. Don't forget some algorithms assume distributions for the data. Some algorithms assume normally distributed data: linear regression, Bayes and Nearest Mean classifiers. Distance-based algorithms are strongly influenced by outliers and skewed distributions: k-Nearest Neighbor, k-Means, the above algorithms.
  17. Avoid the Three Biggest Data Preparation Mistakes. 2. Don't forget some algorithms assume distributions for the data. Some algorithms assume normally distributed data: linear regression, Bayes and Nearest Mean classifiers. Distance-based algorithms are strongly influenced by outliers and skewed distributions: k-Nearest Neighbor, k-Means, the above algorithms. Some algorithms require categorical data (rather than numeric): Naïve Bayes, CHAID, Apriori.
  18. Avoid the Three Biggest Data Preparation Mistakes. 3. Don't assume algorithms can "figure out" patterns on their own. Features fix data distribution problems. Features present data (information) to modeling algorithms in ways the algorithms perhaps can never identify themselves: interactions, record-connecting and temporal features, non-linear transformations.
  19. What are Model Ensembles? Combining outputs from multiple models into a single decision. Models can be created using the same algorithm, or several different algorithms. [Diagram: model outputs feeding decision logic to produce the ensemble prediction]
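A minimal sketch of the idea, with hypothetical scoring functions standing in for trained models and simple averaging as the decision logic:

```python
# Minimal ensemble sketch: average the scores from several models into
# one decision. The "models" here are hypothetical scoring functions.
def ensemble_score(models, record, threshold=0.5):
    avg = sum(m(record) for m in models) / len(models)
    return avg, avg >= threshold

# Three toy churn-score models that disagree on the same record.
models = [lambda r: 0.9, lambda r: 0.4, lambda r: 0.7]
score, decision = ensemble_score(models, record={})
print(score, decision)
```

Real decision logic might instead vote, weight models by validation performance, or feed the scores into a meta-model; averaging is the simplest case.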
  20. Motivation for Ensembles. Performance, performance, performance. A single model sometimes provides insufficient accuracy: neural networks become stuck in local minima; decision trees run out of data; single algorithms keep pushing performance using the same ideas (basis function / algorithm), and are incapable of "thinking outside of their box". Often, different algorithms achieve the same level of accuracy but on different cases; they identify different ways to get the same level of accuracy.
  21. Four Keys to Effective Ensembling: diversity of opinion, independence, decentralization, aggregation. From The Wisdom of Crowds, James Surowiecki.
  22. Bagging. Method: create many data sets by bootstrapping (can also do this with cross-validation); create one decision tree for each data set; combine the decision trees by averaging (or voting) their final decisions. Bagging primarily reduces model variance rather than bias. Results: on average, better than any individual tree.
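A toy version of the bagging recipe, using one-feature threshold "stumps" in place of full decision trees (an assumption made to keep the sketch short); the data is illustrative:

```python
import random

def train_stump(sample):
    """Threshold rule 'predict 1 if x > t', fit by minimizing training errors."""
    thresholds = sorted({x for x, _ in sample})
    return min(thresholds,
               key=lambda t: sum(int(x > t) != y for x, y in sample))

def bagged_predict(data, x_new, n_models=25, seed=0):
    """Bagging: bootstrap the data, fit one stump per sample, majority-vote."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # bootstrap resample
        t = train_stump(sample)
        votes += int(x_new > t)
    return int(votes > n_models / 2)

# Toy data: class 1 tends to have larger x.
data = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1)]
print(bagged_predict(data, 5.5), bagged_predict(data, 0.5))
```

Each bootstrap sample sees a slightly different data set, so the stumps pick slightly different thresholds; the majority vote smooths out that variance, which is the variance-reduction point made above.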
  23. Boosting (AdaBoost). Method: create a tree using the training data set; score each data point, indicating where incorrect decisions are made (errors); retrain, giving rows with incorrect decisions more weight; repeat. The final prediction is a weighted average of all models (model regularization). Best to create "weak" models -- simple models (just a few splits for a decision tree) -- and let the boosting iterations find the complexity. Often used with trees or Naïve Bayes. Results: usually better than an individual tree or Bagging.
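The reweighting loop can be sketched as a tiny AdaBoost on the same kind of threshold stumps; labels are +1/-1 and the data is illustrative (and cleanly separable, so the loop quickly settles on one stump):

```python
import math

def adaboost(data, rounds=10):
    """Tiny AdaBoost sketch with threshold stumps 'predict 1 if x > t'.
    Labels are +1/-1. Returns a list of (model_weight, threshold) pairs."""
    n = len(data)
    w = [1.0 / n] * n                     # start with uniform row weights
    thresholds = sorted({x for x, _ in data})
    model = []
    for _ in range(rounds):
        # Fit the stump that minimizes the *weighted* error.
        def werr(t):
            return sum(wi for wi, (x, y) in zip(w, data)
                       if (1 if x > t else -1) != y)
        t = min(thresholds, key=werr)
        err = max(werr(t), 1e-10)         # avoid log(0) on separable data
        if err >= 0.5:
            break                          # no better than chance: stop
        alpha = 0.5 * math.log((1 - err) / err)   # model weight
        model.append((alpha, t))
        # Reweight: boost the rows this stump got wrong, then renormalize.
        w = [wi * math.exp(-alpha * y * (1 if x > t else -1))
             for wi, (x, y) in zip(w, data)]
        s = sum(w)
        w = [wi / s for wi in w]
    return model

def ada_predict(model, x):
    return 1 if sum(a * (1 if x > t else -1) for a, t in model) > 0 else -1

data = [(1, -1), (2, -1), (3, -1), (4, 1), (5, 1), (6, 1)]
model = adaboost(data)
print([ada_predict(model, x) for x, _ in data])
```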
  24. Random Forest Ensembles. Random Forest (RF) method: exactly the same methodology as Bagging, but with a twist. At each split, rather than using the entire set of candidate inputs, use a random subset of candidate inputs. This generates diversity of samples and of inputs (splits). Results: on average, better than any individual tree, Bagging, or even Boosting. Final answer by averaging.
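The RF twist -- restricting each learner to a random subset of candidate inputs -- can be grafted onto the bagging sketch. The "trees" are again single-split stumps, a simplification; in a real forest the random subset is re-drawn at every split of every tree:

```python
import random

def best_stump(sample, features):
    """Best (feature, threshold) rule 'predict 1 if record[f] > t'
    among the given candidate features, by training error."""
    best, best_err = None, float("inf")
    for f in features:
        for t in sorted({r[f] for r, _ in sample}):
            err = sum(int(r[f] > t) != y for r, y in sample)
            if err < best_err:
                best, best_err = (f, t), err
    return best

def random_forest(data, n_trees=25, n_candidates=1, seed=0):
    """Bagging plus the RF twist: each learner may only split on a
    random subset of the input features."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]            # bootstrap
        feats = rng.sample(range(n_features), n_candidates)  # random inputs
        forest.append(best_stump(sample, feats))
    return forest

def rf_predict(forest, record):
    votes = sum(int(record[f] > t) for f, t in forest)
    return int(votes > len(forest) / 2)

# Toy records (feature0, feature1); both features separate the classes.
data = [((1, 2), 0), ((2, 1), 0), ((3, 3), 0),
        ((4, 5), 1), ((5, 6), 1), ((6, 4), 1)]
forest = random_forest(data)
print(rf_predict(forest, (5.5, 5.5)), rf_predict(forest, (0.5, 0.5)))
```

Because different learners are forced onto different inputs, the forest's members disagree more than plain bagged trees would, which is exactly the diversity the slide credits for RF's edge.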
  25. Model Ensembles: The Good and the Bad. Pro: can significantly reduce model error; can be easy to automate -- already has been done in many commercial tools using Boosting, Bagging, ARCing, RF. Con: model interpretability is lost (if there was any); if not done automatically, it can be very time consuming to generate dozens of models to combine.
  26. Ensembles of Trees: Smoothers. Ensembles smooth jagged decision boundaries. Picture from T.G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, Cagliari, Italy, 2000.
  27. Heterogeneous Model Ensembles on Glass Data. Model prediction diversity obtained by using different algorithms: tree, NN, RBF, Gaussian, Regression, k-NN. Combining 3-5 models is on average better than the best single model. Combining all 6 models is not best (best is a 3&4 model combination), but is close. This is an example of reducing model variance through ensembles, but not model bias.
  28. The Conflict with Data Mining Algorithm Objectives. Algorithm objectives: Linear Regression and Neural networks minimize squared error; C5 minimizes entropy; CART minimizes the Gini index; Logistic regression maximizes the log of the odds of the probability the record belongs to class "1" (classification accuracy); Nearest neighbor minimizes Euclidean distance.
  29. The Conflict with Data Mining Algorithm Objectives. Algorithm objectives: Linear Regression and Neural networks minimize squared error; C5 minimizes entropy; CART minimizes the Gini index; Logistic regression maximizes the log of the odds of the probability the record belongs to class "1" (classification accuracy); Nearest neighbor minimizes Euclidean distance. Business objectives: maximize net revenue; achieve a cumulative response rate of 13%; maximize responders subject to a budget of $100,000; maximize savings from identifying customers likely to churn; maximize collected revenue by identifying the next best case to collect; minimize false alarms in the top 100 hits; maximize hits subject to a false alarm rate of 1 in 1,000,000.
  30. Possible Solutions to the Business Objective / Data Mining Objective Mismatch. Option 1 -- model ranking metric: rank models by algorithm objectives, ignoring business objectives, and hope the models do a good enough job; model building consideration: force the data into the algorithm box, and hope the winner does a good job in reality. Option 2 -- model ranking metric: use optimization algorithms to maximize/minimize the business objective directly; model building consideration: throw away the very nice theory of data mining algorithms, and hope the optimization algorithms converge well. Option 3 -- model ranking metric: build models normally, but rank models by business objectives, ignoring their "natural" algorithm score, hoping that some algorithms do well enough at scoring by business objective; model building consideration: take your lumps with algorithms not quite doing what we want them to do, but take advantage of the power and efficiency of algorithms.
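Option 3 can be illustrated by ranking two hypothetical models both ways: by RMSE (a "natural" algorithm score) and by a business objective such as responders captured in the top-ranked records. The scores below are made up purely to show the two rankings disagreeing:

```python
import math

def rmse(scores, actuals):
    """Root mean squared error: the algorithm's 'natural' metric here."""
    return math.sqrt(sum((s - a) ** 2 for s, a in zip(scores, actuals))
                     / len(scores))

def responders_in_top_n(scores, actuals, n):
    """Business objective: responders captured among the n highest scores."""
    ranked = sorted(zip(scores, actuals), key=lambda p: -p[0])
    return sum(a for _, a in ranked[:n])

actuals = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.1, 0.7, 0.2, 0.1, 0.1, 0.1]  # better RMSE
model_b = [0.6, 0.6, 0.6, 0.5, 0.5, 0.4, 0.4, 0.4]  # worse RMSE, perfect top 3

for name, scores in [("A", model_a), ("B", model_b)]:
    print(name, round(rmse(scores, actuals), 3),
          responders_in_top_n(scores, actuals, n=3))
```

Model A wins on RMSE while Model B captures more responders in the top 3, so the "best" model depends entirely on which metric you rank by -- the mismatch the slide describes.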
  31. Model Comparison Example: Rankings Tell Different Stories. The top RMS model is 9th in AUC; the 2nd test RMS rank is 42nd in AUC. Correlation between rankings: [chart]
  32. Model Deployment Methods. In the data mining software application itself. Pro: easy, the same processing is done as in building the model. Con: the slowest method of implementation with large data. In a database or real-time system: the model is encoded in Predictive Model Markup Language (PMML, http://www.dmg.org/) and a database becomes the run-time engine; typically for the model only, though PMML supports data preparation and cleansing functions as well. Or SQL code. Or the model is encoded in a "wrapper" and run via calls from a database, transaction system, or operating system (batch run or source code). In a run-time engine, often part of the data mining software package itself.
  33. Sample PMML Code. [Slide shows a PMML listing]
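The transcript does not reproduce the slide's listing. Below is a hypothetical minimal fragment in the same spirit -- a linear regression expressed with PMML's RegressionModel elements -- plus a toy scorer, to show how a separate run-time engine can consume the markup. The model, field names, and coefficients are invented; real deployments use full PMML engines rather than hand-rolled parsing:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal PMML-style fragment: score = 1.5 + 0.2*age + 0.01*income.
PMML = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.2"/>
      <NumericPredictor name="income" coefficient="0.01"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

def score(pmml_text, record):
    """Toy run-time engine: read the regression table and score one record."""
    ns = {"p": "http://www.dmg.org/PMML-4_1"}
    table = ET.fromstring(pmml_text).find(".//p:RegressionTable", ns)
    value = float(table.get("intercept"))
    for pred in table.findall("p:NumericPredictor", ns):
        value += float(pred.get("coefficient")) * record[pred.get("name")]
    return value

print(score(PMML, {"age": 30, "income": 500}))
```

The appeal of the approach is exactly what the previous slide says: the model travels as data, so any system that can read the markup can score records without the data mining software present.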
  34. Typical Predictive Model Deployment Processing Flow: import/select data to score, select the fields needed, clean the data (missing values, recodes, …), re-create derived variables, score, and decile the scored data. The key: reproduce all data pre-processing done to build the models.
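The flow's key point -- identical pre-processing at build time and at scoring time -- can be sketched by packaging every step (cleaning, recodes, derived variables) in one function that both phases call. The field names, recode table, and scoring formula here are hypothetical:

```python
def prepare(record):
    """All pre-processing in one place, so build and deployment match."""
    out = dict(record)
    if out.get("income") is None:                 # clean: impute missing income
        out["income"] = 50.0
    out["region"] = {"N": 0, "S": 1}.get(out.get("region"), -1)  # recode
    out["income_per_year"] = out["income"] / max(out["tenure_years"], 1)
    return out                                    # derived variable added

def score(record):
    r = prepare(record)                           # identical prep at scoring time
    return 0.1 * r["income_per_year"] + 0.5 * r["region"]

print(score({"income": None, "region": "S", "tenure_years": 5}))
```

If scoring-time code re-implements these steps independently, any drift between the two versions silently changes every prediction; sharing one `prepare` function (or its PMML/SQL equivalent) is the guard against that.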
  35. Knowing is not enough. Those who know first, win. Those who ACT first, win, provided they act intelligently.
  36. Avoid the insight-to-action gap
  37. Analytic insights must drive action
  38. Business rules drive decisions. [Diagram: a Decision drawing on Regulations, Policy, History, Experience, and Legacy Applications]
  39. Three legged stools need three legs
  40. Operational decisions at the center. [Diagram]
  41. Monitoring and compliance
  42. Scorecards are a powerful tool. Years Under Contract: 1 = 0 points, 2 = 5, More than 2 = 10. Number of Contract Changes: 0 = 0, 1 = 5, More than 1 = 10. Value Rating of Current Plan: Poor = 0, Good = 10, Excellent = 20. Score: 30. Fig 5.4, Smart (Enough) Systems, Prentice Hall, June 2007.
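A scorecard like this one translates directly into code: sum the points for each characteristic and return the score. Reason codes are returned here as the highest-scoring characteristics, a common convention used illustratively; the point values come from the table on the slide:

```python
def years_points(years):
    """Years Under Contract: 1 = 0, 2 = 5, More than 2 = 10."""
    return 0 if years <= 1 else (5 if years == 2 else 10)

def changes_points(changes):
    """Number of Contract Changes: 0 = 0, 1 = 5, More than 1 = 10."""
    return 0 if changes == 0 else (5 if changes == 1 else 10)

def value_points(rating):
    """Value Rating of Current Plan: Poor = 0, Good = 10, Excellent = 20."""
    return {"Poor": 0, "Good": 10, "Excellent": 20}[rating]

def scorecard(years, changes, rating, n_reasons=2):
    parts = {
        "Years Under Contract": years_points(years),
        "Number of Contract Changes": changes_points(changes),
        "Value Rating of Current Plan": value_points(rating),
    }
    score = sum(parts.values())
    # Reason codes: the characteristics contributing the most points.
    reasons = sorted(parts, key=parts.get, reverse=True)[:n_reasons]
    return score, reasons

print(scorecard(years=2, changes=0, rating="Excellent"))
```

The next slide's virtues fall straight out of this shape: the workings are trivially loggable, a single artifact carries the whole prediction, and the per-characteristic points double as reason codes.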
  43. Why use a scorecard? Reason codes: return the most important reason(s) for a score; explaining results. Simplicity: easy to use and explain; easy to implement (although not necessarily easy to build). Transparency: it is really clear how a score card got its result; the complete workings of a score card can be logged. Compact: one score card can often replace many rules and tables; one artifact for one prediction. Compliance: easy to enforce rules about the use of specific attributes; easy to remove rough edges. Familiar: analytic teams are used to developing score cards; regulators and business owners are used to reviewing them.
  44. Continuous improvement
  45. Continuous improvement
  46. Don’t start by focusing on the data. [Pyramid, bottom to top: Available data, Derived information, Analytic insight, Better decision]
  47. Start by focusing on the value. [The same pyramid, approached from the top: Better decision, Analytic insight, Derived information, Available data]
  48. Wrap Up
  49. The 10 Best Practices: 1. Be flexible; data mining is not a set of rules! 2. Avoid 3 key data preparation, modeling mistakes. 3. Diversity is strength: build lots of models. 4. Pick the right metric to assess models. 5. Have deployment in mind when building models. 6. Focus on actions. 7. The three legged stool. 8. Focus on explicability. 9. Build in decision analysis. 10. BWTDIM.
  50. Action Plan: identify your decisions before analytics; adopt business rules to implement analytics; bring business, analytic, and IT people together.
  51. Let us know if we can help. Decision Management Solutions can help you focus on the right decisions, implement a blueprint, and define a strategy: http://www.decisionmanagementsolutions.com. Abbott Analytics can help you find the right software, define a strategy, and learn the ropes: http://www.abbottanalytics.com.
  52. Thank you! James Taylor, CEO. james@decisionmanagementsolutions.com. www.decisionmanagementsolutions.com/learnmore