FORECASTING VISITATION 
March 06, 2014
GOALS FOR TODAY’S PRESENTATION 
• Overview of predictive analytics (PA) and the modeling process
• Share a use case that illustrates PA
THE MODELING PROCESS
DEFINE QUESTION → EXPLORE AND SELECT DATA → MODEL → EVALUATE → DEPLOY AND MONITOR
USE CASE PROFILE 
• Science center in the Midwest
• Approx. 800,000 visitors a year
• Approx. 20,000 member households
• The Raiser’s Edge for fundraising
• Ticketmaster VISTA for ticketing
THE BUSINESS QUESTION
How do we make more money?
What are the factors that affect visitation?
BRAINSTORMING THE ANSWER
• What do we think the factors are?
  • Exhibits
  • Day of the week
  • Seasonality
  • Holidays
• These are the “predictors”: use them to create the modeling database
EXPLORING THE DATA
• Become familiar with the data
• Where are the outliers?
• Are you finding evidence of bad data?
• Do you have the data you need?
• Transform the data so it is ready to be modeled
EXPLORE THE DATA
[Figures: exploratory charts of the admissions data (three slides)]
MODELING: FIRST PASS
[Figure: Normal Probability Plot of standardized residuals (response is ADM); x-axis: Standardized Residual, y-axis: Percent]
MODELING: FIRST PASS = 44%
Predictor       Coef      P
Constant     1085.08      0
Mon          -651.48      0
Tue          -650.91      0
Wed           -266.8      0
Thur         -308.87      0
Fri           -56.84  0.388
Sat           507.88      0
Apr           -128.2  0.412
May          -253.93  0.011
June             370  0.001
July          1019.8      0
Aug            843.4      0
Sept         -392.99      0
Oct           -398.2      0
Nov           -179.2  0.014
Holiday       -214.8  0.053
Holiday Wkn   355.26      0
EXH2           578.5   0.01
EXH3           448.9  0.069
EXH4            62.6  0.908
EXH5           629.3   0.01
Active Exh+     -3.2  0.995
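The speaker notes treat any p-value of .05 or greater as suspect. A minimal Python sketch of that screen follows; the coefficients and p-values are copied from the table above, while the dictionary layout and the `suspect_predictors` name are illustrative, not part of the deck.

```python
# First-pass coefficients and p-values, copied from the slide table.
FIRST_PASS = {
    "Constant": (1085.08, 0.0), "Mon": (-651.48, 0.0), "Tue": (-650.91, 0.0),
    "Wed": (-266.8, 0.0), "Thur": (-308.87, 0.0), "Fri": (-56.84, 0.388),
    "Sat": (507.88, 0.0), "Apr": (-128.2, 0.412), "May": (-253.93, 0.011),
    "June": (370.0, 0.001), "July": (1019.8, 0.0), "Aug": (843.4, 0.0),
    "Sept": (-392.99, 0.0), "Oct": (-398.2, 0.0), "Nov": (-179.2, 0.014),
    "Holiday": (-214.8, 0.053), "Holiday Wkn": (355.26, 0.0),
    "EXH2": (578.5, 0.01), "EXH3": (448.9, 0.069), "EXH4": (62.6, 0.908),
    "EXH5": (629.3, 0.01), "Active Exh+": (-3.2, 0.995),
}

def suspect_predictors(table, alpha=0.05):
    """Return predictors whose p-value is alpha or greater (the constant is skipped)."""
    return [name for name, (_, p) in table.items()
            if name != "Constant" and p >= alpha]

print(suspect_predictors(FIRST_PASS))
# ['Fri', 'Apr', 'Holiday', 'EXH3', 'EXH4', 'Active Exh+']
```

The 0.05 cutoff is a rule of thumb rather than a hard law: Holiday (p = 0.053) sits right at the edge, and the notes describe re-coding holidays individually rather than simply dropping them.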
EVALUATE AND IMPROVE
SECOND PASS = 66%
[Figure: Normal Probability Plot of standardized residuals (response is ADM); x-axis: Standardized Residual, y-axis: Percent]
EVALUATE AND IMPROVE
THIRD PASS = 85%
[Figure: Normal Probability Plot of standardized residuals (response is ADM); x-axis: Standardized Residual, y-axis: Percent]
THE FINAL MODEL
ADM = 1124 + 354 DW_SUN - 448 DW_MON - 339 DW_TUE + 113 DW_WED + 297 DW_FRI 
+ 1295 DW_SAT - 189 M_JAN + 102 M_MAR + 25.1 M_APR - 454 M_MAY 
+ 360 M_JUN + 1349 M_JUL + 972 M_AUG - 515 M_SEP - 565 M_OCT 
- 426 M_NOV - 541 M_DEC + 426 EXH_26 + 450 HOL_WKN + 144 AE 
- 17.2 AE_01 + 917 SPR_BRK + 1064 CXNY_WKS + 4952 P_COH + 202 FAMFRI 
- 54 FTH_WKS - 3689 RWB - 2576 MKT_EV - 416 NH_COL - 1971 NH_FTH 
+ 2798 NH_MLK + 3058 NH_LBD + 1196 NH_MEM - 1273 NH_NYD + 3009 NH_PRES 
+ 309 NH_VET + 633 CH_ESTRS - 2201 CH_ESTRM + 1776 CH_GDFRI + 3332 NYL 
+ 3938 PKD - 2169 OLOW + 2838 OHIGH + 1738 STDH - 1421 STDL + 1275 S99 
- 1155 S01
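Evaluating the equation for a given day is just the constant plus the coefficients of whichever predictors apply. A minimal Python sketch with the coefficients transcribed from the equation above (the speaker notes describe this formula as representative rather than the client's actual model). `predict_admissions` is a hypothetical helper, and reading Thursday and February as the baseline levels is an inference from the terms that are missing.

```python
# Final-model coefficients transcribed from the ADM equation.
# There is no DW_THU or M_FEB term; those appear to be the baseline levels
# absorbed into the constant (an inference, not stated in the deck).
INTERCEPT = 1124.0
COEF = {
    "DW_SUN": 354, "DW_MON": -448, "DW_TUE": -339, "DW_WED": 113, "DW_FRI": 297,
    "DW_SAT": 1295, "M_JAN": -189, "M_MAR": 102, "M_APR": 25.1, "M_MAY": -454,
    "M_JUN": 360, "M_JUL": 1349, "M_AUG": 972, "M_SEP": -515, "M_OCT": -565,
    "M_NOV": -426, "M_DEC": -541, "EXH_26": 426, "HOL_WKN": 450, "AE": 144,
    "AE_01": -17.2, "SPR_BRK": 917, "CXNY_WKS": 1064, "P_COH": 4952, "FAMFRI": 202,
    "FTH_WKS": -54, "RWB": -3689, "MKT_EV": -2576, "NH_COL": -416, "NH_FTH": -1971,
    "NH_MLK": 2798, "NH_LBD": 3058, "NH_MEM": 1196, "NH_NYD": -1273, "NH_PRES": 3009,
    "NH_VET": 309, "CH_ESTRS": 633, "CH_ESTRM": -2201, "CH_GDFRI": 1776, "NYL": 3332,
    "PKD": 3938, "OLOW": -2169, "OHIGH": 2838, "STDH": 1738, "STDL": -1421,
    "S99": 1275, "S01": -1155,
}

def predict_admissions(values):
    """ADM = constant + sum(coefficient * value); predictors not listed count as 0."""
    unknown = set(values) - set(COEF)
    if unknown:
        raise KeyError(f"unknown predictors: {sorted(unknown)}")
    return INTERCEPT + sum(COEF[name] * value for name, value in values.items())

# July 1, 2012 was a Sunday in July; no other predictors are flagged here.
print(predict_admissions({"DW_SUN": 1, "M_JUL": 1}))  # 2827.0
```

Rejecting unknown names guards against typos in a flag (such as a DW_THU that the model does not contain) silently contributing nothing.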
THE FINAL MODEL
[Figure: decision-tree view of the final model]
THE FINAL MODEL
Constant: 1124
Predictor (top)         Effect    Predictor (bottom)  Effect
Community Open House      4952    September             -515
New Year's Week           3332    December              -541
Labor Day Weekend         3058    October               -565
Presidents' Day           3009    New Year's Day       -1273
Martin Luther King Day    2798    Fourth of July       -1971
Good Friday               1776    Easter Monday        -2201
July                      1349    Red White and Boom   -3689
COMPILE DATA TO PREDICT ADMISSIONS
[Figure: worksheet flagging each future date 1 or 0 for each predictor]
COMPILE DATA TO PREDICT
[Figure: predicted admissions after the dates are flagged]
PREDICTION LINE FIT PLOT
[Figure: July Admissions Line Fit Plot, July 2012; x-axis: day of month (1 to 31); y-axis: Admissions (0 to 5,000)]
COMPARING TO REALITY
[Figure: July Admissions Line Fit Plot, July 2012, actual vs. predicted; x-axis: day of month (1 to 31); y-axis: Admissions (0 to 5,000)]
SO WHAT?
• The model translates your strategy into numbers
• Business decisions could include…
  • Adding or reducing staff and volunteers more strategically
  • Opening the right number of ticket windows
  • Opening an auxiliary room to handle lunch overflow
  • Planning for shuttle parking and security
  • Leveling visitation: if you know a day will likely have low attendance, you could move events or group outings to it
• Using a visitation model, you can…
  • Invest resources more efficiently
  • Improve the visitor experience

Editor's Notes

  1. Begin by reviewing what this part of the presentation aims to accomplish. I would like to give you an overview of the predictive analytics (PA) process: the methodology you might follow if you were to do this yourself. I’ll spend most of our time illustrating how we used this model with a client, and I will share with you the resulting model and how one might deploy it.
  2. Before we jump into the work we’ve done with our client, I want to give you a quick review of the overall modeling process; then we’ll walk through each stage as we applied it to our client.
     Define question: You must know what you want to predict before you start. What is the business driver? It is easy to jump into a project without a clear understanding of the business problem to be addressed. Projects that start with “Let’s run this data through some predictive algorithms to see what we get” are doomed to fail.
     Explore and select data: Once you have clearly defined the business question, look at your data. Consider what data might be important to the question you have defined. Prepare the data: is it clean? Is it ready for the modeling tool?
     Model: Exploratory data analysis is a first look at the data you selected. Where are the outliers? Why are they there? Are you finding bad data? Choose a model; different models are used to answer different questions. Once you’ve explored and prepared the data and chosen the model, build the model on a subset of the data you selected. Point the model at the historical data to “train” it, and improve it in iterations.
     Evaluate: Did the model properly address the business question? Did you use the right data to answer it? What did you find surprising about the results? Are any surprises worth further investigation to make sure the data you’re using is effective?
     Deploy and monitor: This is the step where you release the model into your organization’s decision-making process, and it is an important one. It brings us to the point where we look at our results and ask, “So what?” It’s not enough to create a fancy-looking model if it doesn’t lead to specific changes we can apply to how we run our business. Embed the model in your reports and BI.
     At a high level, that is the modeling process we’re going to walk through. I’m going to use a case study from one of our clients to highlight it and share a little of what we’ve learned working with them.
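The notes describe building the model on a subset of the data and training it against history. For a forecasting question like this one, a common choice (an assumption here; the deck does not say how its subset was chosen) is to hold out the most recent dates so the model is checked against days it never saw. A minimal Python sketch with made-up admissions; `chronological_split` is a hypothetical helper.

```python
from datetime import date, timedelta

def chronological_split(rows, cutoff):
    """Split (date, admissions) rows into training rows dated before `cutoff`
    and holdout rows dated on or after it, keeping time order."""
    ordered = sorted(rows, key=lambda row: row[0])
    train = [row for row in ordered if row[0] < cutoff]
    holdout = [row for row in ordered if row[0] >= cutoff]
    return train, holdout

# Hypothetical daily records: ten days starting June 25, 2012 (made-up counts).
start = date(2012, 6, 25)
rows = [(start + timedelta(days=i), 1000 + 50 * i) for i in range(10)]
train, holdout = chronological_split(rows, cutoff=date(2012, 7, 1))
print(len(train), len(holdout))  # 6 4
```

Splitting by date rather than at random avoids letting the model "see" days that fall after the ones it is being tested on.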
  3. Let me tell you a little about our client. We are working with a mid-size science center. They get approximately 800,000 visitors a year and average about 20,000 member households. They use multiple systems throughout the organization, but two of the main ones are The Raiser’s Edge for fundraising and Ticketmaster VISTA for ticketing. For our work with this client, we’ve focused on data from The Raiser’s Edge and VISTA for now. This gives you a sense of the organization, and you’ll learn more about them as we go along.
  4. Now that you know a little about the client behind our use case, let me walk you through the modeling process we used for them. As you’ll recall, we need to start by defining our business question. Without a business goal, predictive modeling is just another answer in search of a question. The general question our client started with was: “How do we make more money? How do we earn a return?” It’s a good question, and one I’m sure we’ve all thought about at some point, but it is a big place to start. For the modeling to be meaningful, the question must be specific to your business. It must take into account the specifics of your organization, or it will not be valuable and it will be hard to know whether the modeling accomplished its goal. So we broke it down by talking about the different ways they make money: they sell memberships, receive donations, sell tickets to the science center and to IMAX, and run a gift shop, a café, and events. We decided to focus on general admission ticketing.
  5. So our question went from the broad (and not terribly usable) “How do we make more money?” to the focused, PA-ready “What are the factors that affect visitation?” By creating a model that predicts visitation (the bread and butter of this organization), our client will be in a better position to plan for it and to make strategic business decisions that, ultimately, earn a return. One of the issues we expected to address was staffing: some days they were swamped, some days they were not, and they struggled to plan staffing accordingly.
  6. No one knows your organization better than you do. Before you dive into the data, consider what you think the answer to the question is. With this client, we guessed that the factors affecting visitation were: SEE POINTS ON SLIDE. These “predictors” informed what data we needed. If we want to look at how exhibits affect attendance, we need to extract exhibit data into the “analysis dataset.” This is a first pass in an iterative process: you will learn along the way that not all of the initial factors are impactful and that some predictors are missing.
  7. We know the question, we guessed at the answer, and we created a dataset that included the value to predict (admissions) and all the predictors. Now we begin data exploration. Its purposes are to get a basic understanding of the data (just looking at it can be illuminating); to see what trends might exist; to find outliers (modeling is, in many ways, an exercise in explaining the outliers); and to notice data that looks odd or wrong, since exploration can highlight data entry errors or anomalous transaction processing. You may also learn, when looking at the data, that you need to transform it for analysis. For example, creating yes/no fields for exhibits was more useful than one field that listed all the exhibits. The same was true for holidays: the specific holiday turned out to be more relevant than a generic “holiday.” Tools for data exploration range from simple charts and graphs in Excel to more sophisticated tools that use statistical and data mining algorithms.
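The exhibit transformation mentioned in this note can be sketched in a few lines of Python. This assumes a hypothetical `exhibits` column holding a comma-separated list; the `EXH_` column names, the exhibit names other than Titanic, and all numbers are made up for illustration.

```python
def exhibit_flags(rows, exhibits):
    """Replace a comma-separated 'exhibits' field with one 1/0 column per exhibit."""
    flagged = []
    for row in rows:
        on_view = {name.strip() for name in row["exhibits"].split(",") if name.strip()}
        new_row = {key: value for key, value in row.items() if key != "exhibits"}
        for name in exhibits:
            new_row[f"EXH_{name}"] = 1 if name in on_view else 0
        flagged.append(new_row)
    return flagged

rows = [
    {"date": "2012-07-01", "admissions": 2900, "exhibits": "Titanic, Dinosaurs"},
    {"date": "2012-07-02", "admissions": 2100, "exhibits": ""},
]
for row in exhibit_flags(rows, ["Titanic", "Dinosaurs"]):
    print(row)
```

Each exhibit then gets its own coefficient in the regression, which is what lets a single high-impact exhibit stand out from the rest.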
  8. We also looked at a probability plot, which uses statistics (standard deviations) to help highlight outliers. The blue line shows what is expected, and you immediately see the curve. On the high end there were some outliers, and we began to explain those with the free days. We also discovered that “outreach” days were skewing the data (days on which attendees from schools and other groups were added as admissions). There are a lot of outliers on the low end, and it turns out those are days when the museum is closed. While that may seem obvious, it was in the data, and we needed a step to find and tag those days. We also noticed some “closed” days with admissions; these turned out to be data entry anomalies that needed to be removed. With each step the analysis data gets better, and it has the ancillary benefit of highlighting some opportunities to improve processing. It is essential that you identify outliers: they increase model failures. In other words, they will mess up your model; they can pull it down, prop it up, or even flip it completely. To find them, as we’ve shown you here, use statistics, graphs, and common sense.
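Two of the checks in this note can be sketched in Python: flagging "closed" days that still record admissions, and flagging open days far from the typical level. The z-score test on raw daily admissions is a simpler stand-in for reading a probability plot, and the 3-standard-deviation cutoff echoes a later note; the tuple layout, labels, and numbers are made up.

```python
from statistics import mean, stdev

def flag_anomalies(days, z_cutoff=3.0):
    """Return (closed_days_with_admissions, outlier_open_days).

    days: list of (label, is_closed, admissions) tuples.
    An open day is an outlier when its distance from the mean of open days,
    measured in sample standard deviations, exceeds z_cutoff.
    """
    anomalies = [label for label, closed, adm in days if closed and adm > 0]
    open_days = [(label, adm) for label, closed, adm in days if not closed]
    values = [adm for _, adm in open_days]
    mu, sigma = mean(values), stdev(values)
    outliers = [label for label, adm in open_days if abs(adm - mu) / sigma > z_cutoff]
    return anomalies, outliers

days = [(f"open-{i}", False, 1000 + 10 * i) for i in range(19)]
days.append(("free-day", False, 5000))  # e.g. an untagged free day
days.append(("closed-ok", True, 0))
days.append(("closed-bad", True, 37))   # closed, yet admissions were recorded
print(flag_anomalies(days))  # (['closed-bad'], ['free-day'])
```

Closed days are excluded before computing the mean and standard deviation so that the many zero-admission days do not distort the cutoff.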
  9. So let’s talk about the models we built for this client. We chose to begin with multiple linear regression. We added our predictors and ran a first pass: day of week, month, exhibits, whether it was a holiday, and whether it was a holiday weekend. We looked at the graphs and assessed the indicators. The blue line represents what we would expect if the model’s errors were normally distributed; the red dots are the actual standardized residuals. What we see is a lot of “errors”: the dots on the high end are much too far away, so the model is missing something. The overall result is an S-curve, and we want it to be straighter.
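The coefficient table and the final equation have the shape of an ordinary least-squares linear regression on 0/1 dummy predictors: a constant plus one effect per flagged category, with one category per group left out as the baseline (the first pass has no Sunday row). A minimal NumPy sketch of that setup on synthetic data; the data and the `design_matrix` helper are made up, and the deck does not say which tool produced its output.

```python
import numpy as np

DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

def design_matrix(day_names, baseline="Sun"):
    """Constant column plus one 0/1 dummy per weekday except the baseline.
    Dropping one level avoids perfect collinearity with the constant."""
    levels = [d for d in DAYS if d != baseline]
    X = np.array([[1.0] + [1.0 if day == lvl else 0.0 for lvl in levels]
                  for day in day_names])
    return X, ["Constant"] + levels

# Synthetic data: two weeks generated from known (made-up) effects,
# so least squares should recover them exactly.
true_effects = {"Constant": 1000.0, "Mon": -600.0, "Tue": -600.0, "Wed": -250.0,
                "Thu": -300.0, "Fri": -50.0, "Sat": 500.0}
day_names = DAYS * 2
X, names = design_matrix(day_names)
y = X @ np.array([true_effects[n] for n in names])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, value in zip(names, coef):
    print(f"{name:8s} {value:9.2f}")
```

With real admissions the fit is not exact, and the leftover residuals are what the probability plots on the slides are examining.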
  10. We don’t want to make this too technical, but I thought I’d share some of the data behind the model. This is where it starts to get real (and, in my opinion, cool). Without going too far into the weeds: the “Constant” is 1085, meaning that we start with that number and adjust it up or down depending on the predictors. On a Monday you would subtract 651; on a Saturday you would add 507; in July you would add 1019. The “P” column is a statistical indicator of the relevance of each predictor; anything at .05 or greater is suspect. We are creating the formula that will drive our model. This pass also told us that our model explained only 44% of the data, which is not good enough. So we looked closer at the outliers: we tagged the free “open house” days; we tagged the 4th of July, Christmas, and New Year’s individually instead of as generic “holidays”; we removed some of the less relevant predictors; and we ran the analysis again.
  11. The chart certainly improved: much more of the admissions data is explained by the model, 62% to be exact. We can still see that the predictions are weaker as admissions get larger. We focused on the high days that were not explained and found a data entry error: some entries were miscoded as admissions (they were actually large video conferencing events). We found that other high days were Friday nights when families were given discounts and special programming. We decided to add a weather variable to see what impact it had. We cleaned those up and recreated the analysis dataset.
  12. Now we see a marked improvement in the quality of the model. The outliers are now within bounds we are more comfortable with (all within 3 standard deviations, for what it’s worth). The model accurately describes 85% of the admissions data, which is good enough to start making some business decisions.
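Assuming the percentages on these slides (44%, 66%, 85%) are the regression's R², the usual measure of the share of variation a model explains, the calculation is short. A Python sketch with made-up numbers:

```python
def r_squared(actual, predicted):
    """Share of the variation in actual admissions explained by the predictions:
    R^2 = 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    sst = sum((a - mean_actual) ** 2 for a in actual)           # total
    return 1 - sse / sst

actual = [1200, 900, 2500, 3100, 1800]      # made-up daily admissions
predicted = [1100, 1000, 2300, 3300, 1800]  # made-up model predictions
print(round(r_squared(actual, predicted), 3))  # 0.97
```

Explaining outliers (tagging free days, removing miscoded events) raises R² because it shrinks the unexplained sum of squares in the numerator.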
  13. Here’s what the model ends up looking like. It’s a formula that predicts; that is what a model is. This is not the actual model for the client, but the data is fairly representative of what you might find. It is pretty ugly and not really usable as is, but there are a few ways you can represent it.
  14. You can look at models as decision trees. The tree moves from left to right, adjusting projected admissions along the way. For example, if you follow the path for a Saturday in July all the way to the end, it shows the predicted visitation for that scenario.
  15. Here is a snippet of the data behind the model. You can see how easily you could use it to build a spreadsheet or report that projects visitation based on the factors you know. For example, you can see that the primary factors that increase visitation are Good Friday, the month of July, and a large exhibit, while the 4th of July and Tuesdays decrease it; holidays are impactful in both directions. As for exhibits, we were surprised to find that most do not have a large impact, but one, Titanic, did. The next question is to model why it had such an outsized impact on visitation. To some, this may all seem like common sense. In some respects it is, but it is many layers of common sense interacting with each other dynamically. In practice, developing and using predictive models will generally outperform a pure “common sense” approach to targeting, because good models are better able to make correct judgment calls while simultaneously taking into account multiple factors and variables.
  16. With our “formula” in hand, we can create a 365-row spreadsheet that predicts visitation for every day of the year. You simply tag all future dates based on the variables the model found significant. For example, July 1, 2012 is tagged 1 for the month of July and 1 for Sunday. If you were also having an exhibit that day, you would tag that too. To illustrate this, we created one worksheet with the flags for each variable (1 or 0 for “yes” or “no”). We built this one manually in Excel for illustration; however, there are more automated and sophisticated approaches that leverage SSIS tools and our BI tool, JCA Answers.
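The flagging step can also be scripted instead of done by hand. A minimal Python sketch that builds one row per date for July 2012 using only a few terms copied from the final-model equation; the hand-tagged Fourth of July entry shows how event flags that a calendar cannot infer would be added. The `worksheet` helper and the coefficient subset are illustrative assumptions, not the deck's actual spreadsheet.

```python
from datetime import date, timedelta

# A subset of terms copied from the final-model equation (not the full model).
INTERCEPT = 1124.0
COEF = {"DW_SUN": 354, "DW_MON": -448, "DW_TUE": -339, "DW_WED": 113,
        "DW_FRI": 297, "DW_SAT": 1295, "M_JUL": 1349, "NH_FTH": -1971}
# date.weekday(): Monday=0 ... Sunday=6; Thursday has no term in the model.
WEEKDAY_FLAG = ["DW_MON", "DW_TUE", "DW_WED", None, "DW_FRI", "DW_SAT", "DW_SUN"]
MONTH_FLAG = {7: "M_JUL"}  # only July is needed for this example

def worksheet(start, days, hand_tags):
    """One row per date: a 1/0 column for each predictor, plus the prediction."""
    rows = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        flags = set(hand_tags.get(day, ()))
        for flag in (WEEKDAY_FLAG[day.weekday()], MONTH_FLAG.get(day.month)):
            if flag:
                flags.add(flag)
        row = {name: int(name in flags) for name in COEF}
        row["date"] = day
        # An unknown hand tag raises KeyError rather than being silently ignored.
        row["predicted"] = INTERCEPT + sum(COEF[f] for f in flags)
        rows.append(row)
    return rows

sheet = worksheet(date(2012, 7, 1), 31, hand_tags={date(2012, 7, 4): ["NH_FTH"]})
for row in sheet[:4]:
    print(row["date"], row["predicted"])
# 2012-07-01 2827.0
# 2012-07-02 2025.0
# 2012-07-03 2134.0
# 2012-07-04 615.0
```

Extending the sketch to a full year means adding the remaining month and event terms; holidays and exhibits still have to be tagged from the organization's calendar.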
  17. Here is the result that shows admissions after the data has been flagged
  18. It’s important to remember that there is no such thing as a perfect prediction: all predictions have error (also called the “residual”). The model tells you what your error is, and you can look at it using a “line fit plot,” which shows the predicted admissions with the error range.
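One rough way to put a number on that error range (an approximation, not necessarily how the deck's tool computed it) is plus or minus two residual standard errors around each prediction. A Python sketch with made-up numbers; it ignores uncertainty in the coefficients, so a proper prediction interval would be somewhat wider.

```python
import math

def rough_error_band(actual, predicted, n_params, z=2.0):
    """Half-width of a rough error band: z times the residual standard error.

    n_params counts every estimated coefficient, including the constant.
    """
    n = len(actual)
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    residual_se = math.sqrt(sse / (n - n_params))
    return z * residual_se

actual = [1200, 900, 2500, 3100, 1800]      # made-up daily admissions
predicted = [1100, 1000, 2300, 3300, 1800]  # made-up model predictions
band = rough_error_band(actual, predicted, n_params=2)
print(f"2827 +/- {band:.0f}")  # 2827 +/- 365
```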
  19. Of course, as we said, the model explains 85% of the admissions data, which still leaves 15% unexplained. As we continue to identify and explain outliers, the predictions will improve. In the chart above, you can see the actuals (orange line) compared to the predictions. Why were the two highlighted areas so far outside the prediction? Was it a variable we didn’t include, like weather? Are the data for these days complete and accurate (perhaps we didn’t flag an event)? In following up, we will note the larger deviations and seek to explain them, and the model will improve.
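Following up on the larger deviations starts with ranking them. A minimal Python sketch; `largest_misses` is a hypothetical helper, and the day numbers and admissions are made up.

```python
def largest_misses(days, actual, predicted, top=3):
    """Return (day, actual - predicted) pairs, largest absolute gap first."""
    gaps = [(day, a - p) for day, a, p in zip(days, actual, predicted)]
    return sorted(gaps, key=lambda item: abs(item[1]), reverse=True)[:top]

days = [1, 2, 3, 4, 5, 6]
actual = [2800, 2100, 2200, 700, 4100, 3900]     # made-up actual admissions
predicted = [2827, 2025, 2134, 615, 2800, 3000]  # made-up predictions
print(largest_misses(days, actual, predicted, top=2))  # [(5, 1300), (6, 900)]
```

A positive gap means the model under-predicted; each flagged day is a candidate for a missing variable (such as weather or an unflagged event) or a data problem.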
  20. Once you know your constant and the numeric effects of time of year, day of week, holidays, exhibits, and so on, you can literally plug your yearly plan into a spreadsheet and see the projected visitation. So now we have a pretty strong model that can predict visitation; what do we do with it? Everything we looked at today was just a starting place.