What Is Predictive Modeling?

4250 258th Ave SE
Issaquah, WA 98029
425.996.8732 Office
bill.cassill@numericalalchemy.com

Copyright 2009 Numerical Alchemy, Inc.
This material is not to be distributed or in any way duplicated without the prior consent of the author.
What Is a Model?

• Predictive modeling refers to a class of techniques that
  determine the most likely outcome given a set of inputs.
  Frequently, these inputs consist of prior data that are used
  to predict a future outcome or event.
[Figure: Predictive Model Often Uses Past Data to Predict Future Events. Model inputs (Input A, Input B, Input C, Input D) come from past data (e.g., last month); the outcome event lies in future data (e.g., 2 months out).]
What Are Models Used For?

• Models currently have many uses.

• Some examples include:
   – Which people are a good credit risk?
   – What is someone’s accident risk based on age, gender, and past
     driving history?
   – Who is most likely to buy my products in the next 90 days?
   – Who is most likely to stop doing business with my company in the near
     future?
   – Which purchase transactions represent a significant fraud risk?

• All of these questions can be answered with predictive
  modeling.
A Tangled Web of Data

• The prediction task becomes complex when we face hundreds or
  thousands of potential factors that could be used as inputs.

• The obvious questions arise:
   – Which ones should I use?
   – How many of the factors are truly relevant or predictive?
   – How do I know if I have the “right” model?

• All of these questions can be answered by a good analyst or
  statistician.
Outcome Variables

• Several types of outcome variables can be predicted using
  statistical modeling techniques.

• These include:
   – Continuous values, such as future customer profitability and future
     sales volumes
   – Binary outcomes (1 = event occurs, 0 = event does not occur), such as
     whether someone buys something or defaults on a credit card
   – Multi-category outcomes, such as small, medium, and large

• By far, however, the most commonly modeled outcomes are continuous
  and binary.
Prediction and Scores

• Once a model has been built, it can be used to generate
  scores (i.e., predicted values) on new data. Depending on the
  outcome being modeled, these scores take different forms.

• Predicted scores for binary outcomes are expressed as a
  probability score: a decimal between 0 and 1 representing the
  likelihood that the modeled event will occur for a given case.

• For continuous values, predicted scores take on the scale and
  characteristics of the original outcome variable.
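To make the two score types concrete, here is a minimal Python sketch. It assumes a logistic-regression-style model for the binary case (the logistic function maps any weighted sum of inputs to a value between 0 and 1) and a plain linear model for the continuous case. The functions `score_binary` and `score_continuous`, and all coefficient values, are hypothetical and not taken from any particular model.

```python
import math


def score_binary(inputs, coefficients, intercept):
    """Probability score in (0, 1) via the logistic function (assumed model form)."""
    linear = intercept + sum(c * x for c, x in zip(coefficients, inputs))
    return 1.0 / (1.0 + math.exp(-linear))


def score_continuous(inputs, coefficients, intercept):
    """Prediction on the same scale as the outcome, e.g. dollars of sales."""
    return intercept + sum(c * x for c, x in zip(coefficients, inputs))


# Hypothetical coefficients for a case with two inputs.
case = [2.0, 1.0]
p = score_binary(case, coefficients=[0.8, -0.5], intercept=-1.2)
sales = score_continuous(case, coefficients=[150.0, 40.0], intercept=500.0)
print(round(p, 3), sales)
```

The binary score reads as a probability of the event; the continuous score is directly comparable to past values of the outcome.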
Finding the “Right Model”

• There are many measures of how predictive a model is. The
  problem is that no matter how predictive your model is on one
  set of data, it may lose its predictive power when applied to
  another set of data.

• For example, suppose we use demographic data to predict
  store-level retail sales during the summer months. The
  predictors we observe for the Southeastern U.S. may not prove
  useful for West Coast locations.

• Similarly, an algorithm that predicts summer sales well may
  prove useless for predicting the spike in sales during the
  November and December Christmas season.
Validation Is the Key

• The way to truly test how well a model performs is to test it
  on an external data set, i.e., data that was not used to build it.

• The data the model is built on is typically called the
  “development sample,” while the data set used to validate the
  model is called the “validation sample.”

• Ideally, both samples will be pulled from the same population
  of cases. By drawing random samples, we can be fairly confident
  that both data sets are representative of the population of
  interest.
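A minimal sketch of drawing random development and validation samples from one population, using only Python's standard library. The 70/30 split, the fixed seed, and the function name `split_development_validation` are illustrative assumptions, not a prescribed recipe.

```python
import random


def split_development_validation(cases, validation_fraction=0.3, seed=42):
    """Randomly assign cases to a development sample and a validation sample."""
    rng = random.Random(seed)   # fixed seed makes the split reproducible
    shuffled = list(cases)      # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    validation = shuffled[:n_validation]
    development = shuffled[n_validation:]
    return development, validation


development, validation = split_development_validation(range(1000))
print(len(development), len(validation))  # 700 300
```

Because every case has the same chance of landing in either sample, both samples should resemble the population, which is what makes the validation results trustworthy.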
Lift Charts

• One way to tell how well a model performs is to look at a
  lift chart. To construct one, follow these basic steps:

     1. Sort the cases in the data set in descending order from the highest
        predicted score to the lowest (i.e., the highest scores are at the top).

     2. Cut the file into 10% chunks called “deciles,” where the top decile
        holds the 10% of cases with the highest scores.

     3. Calculate the lift for each decile by dividing the average value of the
        outcome variable within that decile by the average value of the
        entire sample.
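The three steps above can be sketched in Python as follows. This is a minimal illustration rather than a production routine: the function name `decile_lift` is hypothetical, and it assumes at least 10 cases and a nonzero sample average (i.e., at least one event for a binary outcome).

```python
def decile_lift(scores, outcomes, n_groups=10):
    """Return the lift for each decile, highest-scoring decile first.

    scores   -- predicted scores, one per case
    outcomes -- observed outcome values (1/0 for a binary event, or continuous)
    """
    if len(scores) < n_groups:
        raise ValueError("need at least one case per decile")

    # Step 1: sort cases from highest to lowest predicted score.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    sample_average = sum(outcome for _, outcome in ranked) / n

    lifts = []
    for g in range(n_groups):
        # Step 2: cut the sorted file into (roughly) equal 10% chunks.
        group = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        decile_average = sum(outcome for _, outcome in group) / len(group)
        # Step 3: lift = decile average / average of the entire sample.
        lifts.append(decile_average / sample_average)
    return lifts


# Synthetic example: 100 cases, events only among the 10 highest scores.
scores = list(range(100))
outcomes = [1 if s >= 90 else 0 for s in scores]
print(decile_lift(scores, outcomes))
```

A lift above 1 means a decile performs better than the sample as a whole; for a useful model the values should generally fall from the top decile to the bottom.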
Lift Charts (cont.)

• Once we’ve done the basic data manipulation shown on the
  previous page, we can make a chart like the one below. The
  advantage of a model is that it lets us identify a much smaller
  number of cases and target our actions at them.

[Figure: Sample Lift Chart. Compares the Average Decile Value with the Average Sample Value (y-axis 0% to 7%); annotations read “It is better to target these cases…” and “…than these.”]

The average rate for the outcome event is 1.5% of the total cases.
However, for the top decile (the 10% of cases with the highest
scores), 6% of cases experience the event. This represents a lift
of 4: the top decile’s event rate is four times the sample average.

In terms of application, if this model were developed to identify
likely buyers of a product, we would want to focus our marketing
efforts on the top one or two deciles, whose members are much more
likely to purchase than those who are very unlikely to purchase.
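The lift figure in this example follows directly from step 3 of the lift-chart procedure, dividing the top decile's event rate by the event rate of the entire sample:

```latex
\text{Lift}_{\text{top decile}}
  = \frac{\text{event rate in top decile}}{\text{event rate in entire sample}}
  = \frac{6\%}{1.5\%}
  = 4
```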
Gains Charts

• Gains charts are another way to determine how well a model
  performs.

• As with lift charts, we sort the data in descending order from
  highest score to lowest score and then cut the file into 10%
  chunks.

• Unlike a lift chart, however, a gains chart shows how much of
  the target event we capture as we move from the top of the
  data file to the bottom.
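A minimal Python sketch of the gains calculation, with a hypothetical function name (`cumulative_gains`). It returns the model's cumulative percentage of events captured at each decile alongside a random-order baseline, where the top g deciles are expected to capture g × 10% of the events. It assumes a 1/0 event flag per case and at least one event in the data.

```python
def cumulative_gains(scores, outcomes, n_groups=10):
    """Return (model_capture, random_capture): cumulative % of events captured.

    scores   -- predicted scores, one per case
    outcomes -- observed 1/0 event flags, aligned with scores
    """
    # Sort cases from highest to lowest predicted score, as for a lift chart.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_events = sum(outcome for _, outcome in ranked)
    if total_events == 0:
        raise ValueError("need at least one event case")

    model_capture, random_capture = [], []
    for g in range(1, n_groups + 1):
        cutoff = g * n // n_groups  # number of cases in the top g deciles
        captured = sum(outcome for _, outcome in ranked[:cutoff])
        model_capture.append(100.0 * captured / total_events)
        # A randomly ordered file captures events in proportion to its size.
        random_capture.append(100.0 * g / n_groups)
    return model_capture, random_capture
```

Plotting both lists against the decile number gives the two curves of a gains chart; the wider the gap between them near the top of the file, the more useful the model's ranking.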
Gains Charts (cont.)

• We compare the cumulative capture of the “event” cases to the
  cumulative capture we would get if the file had simply been
  sorted in random order.

[Figure: Sample Gains Chart. Cumulative % of Event Captured (y-axis, 0% to 100%) for Cumulative Capture (Model) vs. Cumulative Capture (Random).]

In this example, the model captures 45% of all the cases that
exhibit the “event” within the top 10% of the file. Within the
top 30% of the file, more than 75% of the “event” cases have
been captured.

These results for the model can be compared to a random sorting
of the file. With a random sort, we could expect to capture 10%
of the “event” cases within the top 10% of the file and 30% of
the “event” cases within the top 30% of the file.
Using the Model

• Once the model has been developed and validated, it is time
  to use it. To do so, we generate scores from fresh data for
  the cases or population of interest.

• Models are typically deployed in one of three ways:
   – One-time or infrequent, occasional use
   – Regularly scheduled rescoring (e.g., weekly, monthly, or quarterly),
     depending on when fresh data becomes available
   – Real-time scoring, which is most appropriate for applications like
     transaction fraud detection or continuously learning predictive
     algorithms
Tracking the Model

• Like almost everything else, models age and can become less
  predictive over time.

• Because of this, it is important to periodically reassess a
  model’s performance.

• This can be done using the standard lift and gains charts. By
  comparing model performance across different time periods,
  we can assess the degree of performance decay on an ongoing
  basis.
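One way to track decay, sketched under assumptions: compute the top-decile lift for each scoring period and express it as a percentage of the first period's lift. The functions `top_decile_lift` and `lift_decay`, and the choice of top-decile lift as the single tracking metric, are illustrative; in practice you would likely inspect the full lift and gains charts as well. The sketch assumes each period has at least one event.

```python
def top_decile_lift(scores, outcomes):
    """Event rate among the top 10% of scores divided by the overall event rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top = ranked[: max(1, len(ranked) // 10)]
    overall_rate = sum(o for _, o in ranked) / len(ranked)
    top_rate = sum(o for _, o in top) / len(top)
    return top_rate / overall_rate


def lift_decay(periods):
    """Express each period's top-decile lift as a % of the first period's lift.

    periods -- dict of {label: (scores, outcomes)} in chronological order
    """
    lifts = {label: top_decile_lift(s, o) for label, (s, o) in periods.items()}
    baseline = next(iter(lifts.values()))  # first (earliest) period
    return {label: 100.0 * lift / baseline for label, lift in lifts.items()}
```

A steadily falling percentage signals decay; where to draw the line for retirement remains a judgment call.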
Putting a Model Out to Pasture

• When a model finally loses its luster, it is time to retire it.

• However, the decision about when to retire an existing model
  can be somewhat subjective.

• When you do make this decision, you are faced with the
  prospect of creating a new model to replace the one you are
  going to retire.

• Don’t panic! This is just part of the model lifecycle. Simply
  build the new model and then swap it in for the old one.
Final Comments

• Congratulations! You can now claim to be an educated user
  of predictive analytics.

• At this point, you should have an idea of:
   – What a model does
   – What it can be used for
   – How to assess its predictive accuracy
   – The basic model lifecycle

• We hope you have enjoyed this little overview, and best of
  luck in your application of predictive analytics.

Breaking the Kubernetes Kill Chain: Host Path Mount
Ā 

What Is a Model, Anyhow?

  • 1. What Is Predictive Modeling? 4250 258th Ave SE Issaquah, WA 98029 425.996.8732 Office bill.cassill@numericalalchemy.com Copyright 2009 Numerical Alchemy, Inc. This material is not to be distributed or in any way duplicated without the prior consent of the author.
  • 2. What Is a Model? • Predictive modeling refers to a class of techniques that determine the most likely outcome given a set of inputs. Frequently, those inputs consist of prior data that is used to predict a future outcome or event. [Figure: Predictive Model Often Uses Past Data to Predict Future Events. Model inputs A through D, taken from past data (e.g. last month), are used to predict an outcome event in future data (e.g. 2 months out).]
  • 3. What Are Models Used For? • Models currently have many uses. • Some examples include: – Which people are a good credit risk? – What is someone's accident risk based on age, gender, and past driving history? – Who is most likely to buy my products in the next 90 days? – Who is most likely to stop doing business with my company in the near future? – Which purchase transactions represent a significant fraud risk? • All of these questions can be answered with predictive modeling.
  • 4. A Tangled Web of Data • The prediction task becomes complex when we are faced with hundreds or thousands of potential factors that could be used as inputs. • Obvious questions arise: – Which ones should I use? – How many of the factors are truly relevant or predictive? – How do I know if I have the "right" model? • A good analyst or statistician can answer all of these questions.
  • 5. Outcome Variables • Several types of outcome variables can be predicted using statistical modeling techniques. • These include: – Continuous values, like future customer profitability and future sales volumes – Binary outcomes (1 = event occurs, 0 = event does not occur), like whether someone buys something (or not) or defaults on a credit card (or not) – Multi-category outcomes, like small, medium, and large • However, by far the most popular outcomes to model are the continuous and binary varieties.
  • 6. Prediction and Scores • Once a model has been built, it can be used to generate scores (i.e. predicted values) on new data. Depending on the outcome being modeled, these scores come in a couple of different varieties. • Predicted scores for binary outcomes are expressed as a probability: a decimal between 0 and 1 representing the chance that the modeled event will occur for a given case. • For continuous outcomes, predicted scores take on the scale and characteristics of the original outcome variable.
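The slides don't say which technique produces a 0-to-1 probability score. As one common illustration only, assume a logistic-regression model: the score is the logistic function applied to a weighted sum of the inputs, which always lands strictly between 0 and 1. The input names, intercept, and weights below are hypothetical.

```python
import math

# Hypothetical intercept and weights for a logistic-regression model of a
# binary outcome; the slides do not specify a technique or any coefficients.
INTERCEPT = -4.0
WEIGHTS = {"visits_last_month": 0.30, "past_purchases": 0.80}


def probability_score(case):
    """Return a score between 0 and 1: the predicted chance the event occurs."""
    linear = INTERCEPT + sum(weight * case[name] for name, weight in WEIGHTS.items())
    # The logistic function maps any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


# Score a new case: -4.0 + 0.30 * 5 + 0.80 * 2 = -0.9, giving a score of about 0.29.
score = probability_score({"visits_last_month": 5, "past_purchases": 2})
```

A continuous-outcome model would skip the logistic step and report the weighted sum directly, which is why its scores stay on the scale of the original outcome variable.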
  • 7. Finding the "Right Model" • There are many measures that tell you how predictive your model is. The problem is that no matter how predictive your model is on one set of data, it may lose its predictive power once applied to another set of data. • One example is using demographic data to predict store-level retail sales during the summer months. The predictors we observe for the southeastern U.S. may not prove useful when applied to West Coast locations. • Similarly, an algorithm that predicts summer sales well may prove useless in predicting the spike in sales during the November and December Christmas season.
  • 8. Validation Is the Key • The way to truly test how well a model performs is to test it on a separate data set that was not used to build it. • The data the model is built on is typically called the "development sample," while the data set used to validate the model is called the "validation sample." • Ideally, both samples will be drawn from the same population of cases. By drawing random samples, we can be fairly sure that both data sets are representative of the population of interest.
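A random split into development and validation samples can be sketched in a few lines. The 70/30 proportion and the fixed seed are illustrative assumptions, not values from the slides; the fixed seed just makes the split repeatable.

```python
import random


def split_development_validation(cases, validation_fraction=0.3, seed=42):
    """Randomly split cases into a development sample and a validation sample.

    validation_fraction and seed are illustrative choices, not values from the slides.
    """
    shuffled = list(cases)                 # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # random order: both samples reflect one population
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


development, validation = split_development_validation(range(1000))
# 700 development cases and 300 validation cases, with no case in both samples.
```

The model is fit only on `development`; its lift and gains are then measured on `validation`, which it has never seen.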
  • 9. Lift Charts • One way to tell how well a model performs is to look at a lift chart. To construct one, follow these basic steps: 1. Sort the cases in the data set in descending order from the highest predicted score to the lowest (i.e. the highest scores are at the top). 2. Cut the file into 10% chunks called "deciles," where the top decile is the 10% of cases with the highest scores. 3. Calculate the lift for each decile by dividing the average value of the outcome variable within that decile by the average value for the entire sample.
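The three lift-chart steps on this slide translate directly into code. This is a minimal sketch that assumes the number of cases divides evenly into ten groups; the function name and the tiny data set are hypothetical.

```python
def decile_lift(scores, outcomes, n_groups=10):
    """Return the lift of each decile, top-scoring decile first.

    scores are predicted values; outcomes are observed values (1/0 for a binary
    event). Assumes the case count divides evenly by n_groups to keep it short.
    """
    if len(scores) != len(outcomes) or len(scores) % n_groups:
        raise ValueError("inputs must be equal length and divisible by n_groups")
    # Step 1: sort cases from the highest predicted score to the lowest.
    pairs = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    ranked = [outcome for _, outcome in pairs]
    overall = sum(ranked) / len(ranked)
    if overall == 0:
        raise ValueError("the sample average of the outcome is zero")
    # Step 2: cut the sorted file into equal-sized chunks (deciles by default).
    size = len(ranked) // n_groups
    deciles = [ranked[i * size:(i + 1) * size] for i in range(n_groups)]
    # Step 3: lift = decile average outcome / whole-sample average outcome.
    return [(sum(d) / size) / overall for d in deciles]


# Tiny illustrative data set: 20 cases, 3 events, scores already in descending order.
scores = list(range(20, 0, -1))
outcomes = [1, 1, 1, 0] + [0] * 16
lifts = decile_lift(scores, outcomes)
```

Here the top decile holds two of the three events, so its lift is 1.0 / 0.15, about 6.7. With equal-sized deciles the lifts always average to 1, which makes a handy sanity check.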
  • 10. Lift Charts (cont.) • Once we've done the basic data manipulation described on the previous page, we can make a chart like the one below. The good thing about models is that we can use them to identify and target our actions to a much smaller number of cases. [Figure: Sample Lift Chart, plotting Average Decile Value against Average Sample Value (0.00% to 7.00%) for each decile, annotated "It is better to target these cases…" and "…than these cases."] The average rate for the outcome event is 1.5% of the total cases. However, for the top decile (the 10% of cases with the highest scores), 6% of cases experience the event. This represents a lift of 4: the top decile's event rate is four times the sample average. In terms of application, if this model were developed to identify likely buyers of a product, we would want to focus our marketing efforts on those in the top one or two deciles, who are much more likely to purchase than those who are very unlikely to purchase.
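Step 3 on the previous slide defines lift as a ratio of averages. Applied to the figures quoted on this slide, the top-decile calculation works out as:

```latex
\text{Lift}_{\text{top decile}}
  = \frac{\text{event rate in top decile}}{\text{event rate in whole sample}}
  = \frac{6\%}{1.5\%}
  = 4
```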
  • 11. Gains Charts • Gains charts are another way to determine how well a model performs. • As with lift charts, we sort the data in descending order from highest score to lowest score and then cut the file into 10% chunks. • However, unlike a lift chart, the idea is to see how much of the target event we capture as we move from the top of the data file to the bottom.
  • 12. Gains Charts (cont.) • We compare the cumulative capture of the "event" cases to the cumulative capture we would get if the file had simply been sorted in a random order. [Figure: Sample Gains Chart, plotting Cumulative % of Event Captured (0% to 100%) for Cumulative Capture (Model) and Cumulative Capture (Random).] In this example, the model captures 45% of all the cases that exhibit the "event" within the top 10% of the file. Within the top 30% of the file, more than 75% of the "event" cases have been captured. These results for the model can be compared to a random sorting of the file. With a random sort, we could expect to capture only 10% of the "event" cases within the top 10% of the file and 30% within the top 30% of the file.
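The two gains-chart lines can be computed side by side. This sketch assumes a 1/0 event outcome and a case count that divides evenly into ten groups; the function name and the toy data are hypothetical.

```python
def cumulative_gains(scores, outcomes, n_groups=10):
    """Return (model, random_sort): cumulative % of event cases captured per decile.

    outcomes are 1 for an event case and 0 otherwise. Assumes the case count
    divides evenly by n_groups to keep the sketch short.
    """
    if len(scores) != len(outcomes) or len(scores) % n_groups:
        raise ValueError("inputs must be equal length and divisible by n_groups")
    total_events = sum(outcomes)
    if total_events == 0:
        raise ValueError("no event cases to capture")
    # Sort from the highest score to the lowest, as for a lift chart.
    pairs = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    ranked = [outcome for _, outcome in pairs]
    size = len(ranked) // n_groups
    model, captured = [], 0
    for i in range(n_groups):
        captured += sum(ranked[i * size:(i + 1) * size])  # events found so far
        model.append(100.0 * captured / total_events)
    # On average, a random sort captures events in proportion to cases covered.
    random_sort = [100.0 * (i + 1) / n_groups for i in range(n_groups)]
    return model, random_sort


scores = list(range(20, 0, -1))
outcomes = [1, 1, 1, 0] + [0] * 16
model, random_sort = cumulative_gains(scores, outcomes)
```

In this toy file the model captures two of the three events (about 67%) in the top decile and all of them by the second, against 10% and 20% for a random sort.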
  • 13. Using the Model • Once the model has been developed and validated, it is time to use it. To do so, fresh data is used to generate scores on the cases or population of interest. • Typically, models are deployed in one of three ways: – One-time or infrequent, occasional use – Regularly scheduled rescoring (e.g. weekly, monthly, quarterly), depending on when fresh data becomes available – Scoring in real time. This is most appropriate for applications like transaction fraud detection or continuously learning predictive algorithms.
  • 14. Tracking the Model • Like almost everything else, models age and can become less predictive over time. • Because of this, it is important to periodically reassess a model's performance. • This can be done using the standard lift and gains charts. By comparing model performance over different time periods, the degree of performance decay can be assessed on an ongoing basis.
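The slides leave the choice of decay measure open. One simple option, assumed here purely for illustration, is to track the top-decile lift of the same model in each period and report the percentage drop from a baseline period. The function names and data are hypothetical.

```python
def top_decile_lift(scores, outcomes):
    """Lift of the top 10% of cases by score (assumes at least 10 cases)."""
    if len(scores) != len(outcomes) or len(scores) < 10:
        raise ValueError("need at least 10 cases with matching outcomes")
    pairs = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    ranked = [outcome for _, outcome in pairs]
    top = ranked[: len(ranked) // 10]
    overall = sum(ranked) / len(ranked)
    return (sum(top) / len(top)) / overall


def performance_decay(baseline, current):
    """Percentage drop in top-decile lift from a baseline period to a later one.

    baseline and current are (scores, outcomes) pairs scored by the same model.
    """
    base = top_decile_lift(*baseline)
    now = top_decile_lift(*current)
    return 100.0 * (base - now) / base


scores = list(range(20, 0, -1))
last_quarter = (scores, [1, 1, 1, 0] + [0] * 16)  # top decile holds 2 of 3 events
this_quarter = (scores, [1, 0, 1, 1] + [0] * 16)  # top decile now holds only 1
decay = performance_decay(last_quarter, this_quarter)
```

Here the top-decile lift falls from about 6.7 to about 3.3, a 50% decay. Where to draw the retirement line is still the judgment call the next slide describes.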
  • 15. Putting a Model Out to Pasture • When a model finally loses its luster, it is time to retire it. • However, the decision about when to retire an existing model can be somewhat subjective. • When you do make this decision, you are faced with the prospect of creating a new model to replace the one you are retiring. • Don't panic! This is just part of the model lifecycle. Simply create the new model and then swap it in.
  • 16. Final Comments • Congratulations! You can now claim to be an educated user of predictive analytics. • At this point, you should have an idea of: – What a model does – What it can be used for – How to assess its predictive accuracy – The basic model lifecycle • We hope you have enjoyed this brief overview, and best of luck in your application of predictive analytics.