Best Practices and Considerations in
Big Data analytics
April 19, 2018
Challenges turning Big Data analytics into competitive advantage
2
Lack of clear roadmap
Getting corporate alignment and C-suite buy-in
Lack of appropriate data structures
Difficulties finding insights
Uncertainty as to how to convert insights to action
Difficulty proving the business case/undisciplined reporting and measurement
3
‘Small’ data… there’s still big work
to be done here
ANALYTICS 1.0 (DESCRIPTIVE)
ANALYTICS 2.0 (PREDICTIVE)
ANALYTICS 3.0 (PRESCRIPTIVE)
ONE VIEW OF THE CUSTOMER
ALIGNMENT AND CULTURE
© 2016 Environics Analytics
Many organizations that do have Big Data (needs, problems)
still have a lot of work to do to achieve analytics capabilities
with their ‘small data’.
4
Call Center Operations
Quality Control
Prioritizing channel opportunity
Risk Management
Fraud Detection
Direct Marketing
Retention/acquisition
Some current applications of analytics
Analytics as an ongoing journey
5
Analyze: Insights, Opportunities
Plan: Investment Strategies, Customer Contact & Communication Strategies
Execute: Creative and Media, In-Market Testing
Evaluate: Measure, Learn, Optimize
Continuous Improvement Through Data Analytics
Data discovery Process overview
6
Business discovery & alignment
Analytical snapshot
Insights and recommendations
Analytical roadmap
Tactical opportunities
Data strategy
Customer strategy
Data audit
The Four step Process in building analytics solutions
• A discipline that requires STRUCTURE and PROCESS
– At Environics Analytics we utilize the following four-step process to manage projects:
7
Problem Identification
Creation of the Analytical Data Environment
Application of the Data Mining Tools
Implementation and Tracking
The overall process remains the same
Yet elements within the process are different
The data mining tasks within the process?
8
• Meet with key stakeholders
• Understand business issues
• Understand available data
• Review relevant documentation
• Define data audit requirements
• Audit existing data
• Assess completeness and accuracy of data collected
• Produce initial frequency reports
• Identify data gaps/recommend third party data overlays
• Variable creation
• Operationalization of solution
• Real-time vs. batch
• Creating the measurement framework
• Analyzing model performance and key learning
• Feature engineering
• Correlation
• Factor analysis
• Decision tree
• EDA reports
• Modelling routines
• Regression
• Neural nets
• Ensemble, etc.
• Validation
• Decile charts, gains charts
• AUC curves
Identification of Business Problem
Creation of the Analytical File
Development / Validation of the Predictive Analytics Solution
Deployment and Measurement of the Solution
1. Identify the Real Business Problem
9
Define the problem
Listen and learn
Key
stakeholders
Documents
and reports
Dept. 1
Dept. 2
Dept. 3
• DO be careful of silos between departments
• DON’T jump to conclusions
2. Look for Quick Wins
10
Creating a
quick win
Identify and prioritize existing
business challenges
Make optimal
and effective
use of
information
Identify key
champion
stakeholders
• DO avoid exercises that cannot clearly demonstrate cost benefits
2. Creating a quick win
• Retention issues are typically a challenge for most marketers
• High-value retention model offers significant savings
11
Potential Benefits - Targeting Top 25% of Model
                          Quantity   Defection Rate   # of Potential   Save   # of Cust.   Avg.    Total Saved
                          Promoted   of Group         Defectors        Rate   Saved        Value   Quarterly
Savings without model     100,000    1.19%            1,186            20%    237          $300    $71,100
Savings with model        100,000    3.12%            3,117            20%    623          $300    $186,900
Diff: model vs. no model             1.93%            1,931                   386                  $115,800
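The arithmetic behind the savings table can be sketched in a few lines. This is a minimal illustration, not the presenter's own tooling: the function name `quarterly_savings` is hypothetical, and the defector counts are taken directly from the table because the defection rates shown there are rounded.

```python
def quarterly_savings(potential_defectors, save_rate, avg_value):
    """Return (customers saved, dollars saved) for one quarter.

    Customers saved is rounded to a whole customer before it is
    multiplied by the average customer value, as in the table.
    """
    customers_saved = round(potential_defectors * save_rate)
    return customers_saved, customers_saved * avg_value


# Figures from the table: 100,000 customers promoted in each scenario.
saved_without, dollars_without = quarterly_savings(1186, 0.20, 300)
saved_with, dollars_with = quarterly_savings(3117, 0.20, 300)

# Incremental benefit of targeting with the model.
model_benefit = dollars_with - dollars_without
print(saved_without, dollars_without)  # 237 71100
print(saved_with, dollars_with)        # 623 186900
print(model_benefit)                   # 115800
```

The same promotion quantity reaches almost three times as many likely defectors when the model picks the names, which is where the $115,800 difference comes from.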
3. Become familiar with the data: Data Audit
Process
12
Meta Data Frequency
N-tile
Random
Data Dump
Data Audit
Reports
Identify Gaps
Recommend 3rd Party
Data Overlays
(if applicable)
Data Audit: Assess Completeness & Accuracy of Data
3. Become familiar with data: meta Data Audit Reports
• Data audit is done on 53,235 records
• By looking at the number of unique values and the number of missing values, you can begin to understand the data
• Data indicates that relatively few customers have email (70% missing), while tenure (creation date) appears to be a good variable with few missing values (3%)
13
Data Audit Report - Column MetaData
Structure Value Distribution
Ordinal Column Name Column Type Unique Values FREQs Missing Mis % Non Missing
1 NAMEUPPER varchar 48,962 134 0% 53,101
2 PROFILENO int 53,177 59 0% 53,176
3 PROFILETYPE_LINKCODE varchar 64 Y 0 0% 53,235
4 NAME varchar 48,910 134 0% 53,101
5 FIRSTNAME varchar 12,025 3,690 7% 49,545
6 MIDDLEINIT varchar 823 44,632 84% 8,603
7 LASTNAME varchar 21,678 3,077 6% 50,158
8 COURTESYTITLE varchar 93 Y 15,884 30% 37,351
9 ADDRESS1 varchar 36,252 10,852 20% 42,383
10 ADDRESS2 varchar 2,795 49,254 93% 3,981
11 CITY varchar 3,469 10,878 20% 42,357
12 STATE varchar 758 10,084 19% 43,151
13 ZIP varchar 27,667 11,807 22% 41,428
14 COUNTRY varchar 1,217 19,509 37% 33,726
15 STMTREMARKS varchar 1,867 50,326 95% 2,909
16 UNAPPLIEDBALANCE varchar 57 Y 1,725 3% 51,510
17 PHONE varchar 32,037 17,088 32% 36,147
18 FAX varchar 3,053 49,673 93% 3,562
19 EMAIL varchar 13,731 37,219 70% 16,016
20 TITLE varchar 647 52,519 99% 716
21 BUSINESSTYPE varchar 121 53,105 100% 130
22 NOTES varchar 45 Y 53,191 100% 44
23 ADDITIONALNOTES varchar 20 Y 53,216 100% 19
24 OTHER varchar 9 Y 53,227 100% 8
25 TRAVELPREF varchar 6 Y 53,229 100% 6
26 INTERFACEID varchar 980 51,568 97% 1,667
27 CREATIONDATE datetime 3,041 Y 1,822 3% 51,413
28 CREDITLIMIT varchar 53,235 100% 0
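A column metadata report like the one above can be produced with very little code. The sketch below is an assumption-laden illustration in plain Python: the function name `audit_columns`, the treatment of `None` and empty strings as missing, and the four sample records are all hypothetical and are not taken from the audited file.

```python
def audit_columns(rows, columns):
    """Build one report line per column: unique values, missing, missing %."""
    total = len(rows)
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Assumption: None and the empty string both count as missing.
        present = [v for v in values if v not in (None, "")]
        missing = total - len(present)
        report[col] = {
            "unique_values": len(set(present)),
            "missing": missing,
            "missing_pct": round(100 * missing / total) if total else 0,
            "non_missing": len(present),
        }
    return report


# Hypothetical sample records, for illustration only.
sample = [
    {"EMAIL": "a@example.com", "CREATIONDATE": "2015-01-01"},
    {"EMAIL": "", "CREATIONDATE": "2015-01-01"},
    {"EMAIL": None, "CREATIONDATE": "2016-03-02"},
    {"EMAIL": "", "CREATIONDATE": ""},
]
audit = audit_columns(sample, ["EMAIL", "CREATIONDATE"])
for name, stats in audit.items():
    print(name, stats)
```

Columns with a very high missing percentage, or with as many unique values as records, are the first candidates to drop or investigate.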
• DON’T forget to look at key arithmetic diagnostics
14
Other fields were eliminated because:
• Missing values
• Cardinality
• Irrelevant to actual Twitter behaviour
• Example: many ID fields in data
3. Become familiar with data: meta Data Audit
Reports
Column Name Column Type Unique Values Missing Mis %
created_at Character 209,390 554,551 67%
entities_urls Character 114,003 380,722 46%
in_reply_to_screen_name Character 13,327 801,635 98%
in_reply_to_status_id Numeric 26,744 793,501 97%
in_reply_to_status_id_str Numeric 26,744 793,501 97%
in_reply_to_user_id Numeric 28,700 782,513 95%
in_reply_to_user_id_str Numeric 28,700 782,513 95%
lang Character 1 0 0%
place Character 1,415 816,208 99%
source Character 3,919 5,899 1%
text Character 172,958 513,496 62%
user_created_at Character 129,976 332,696 40%
user_followers_count Numeric 68,365 0 0%
user_friends_count Numeric 41,694 0 0%
user_id_str Numeric 329,121 0 0%
4. Use Statistics Judiciously
• The appropriate technique will depend on the business problem
15
Statistical Tool       Business Application
Correlation Analysis   Exploratory – identify key variables
CHAID                  Exploratory – identify key variables; can also be used to build final model
Factor Analysis        Exploratory – reduce data and help to identify key variables
Cluster Analysis       Define distinct homogenous groups of customers
Multiple Regression    Build final model
Logistic Regression    Build final model
Neural Nets            Build final model
• DON’T assume PhDs in math and computer engineering know everything
5. Establish Performance Benchmarks
from the Start
16
• Example: the gains chart looks at lift results
% of Validation   Validation   Response   % of Total   Response    Interval   Modelling
Sample            Names        Rate       Responders   Rate Lift   ROI        Benefits
0-10%             20,000       3.50%      23%          233         145%       $26,667
10-20%            40,000       3.00%      40%          200         75%        $40,000
20-30%            60,000       2.75%      55%          183         58%        $50,000
30-40%            80,000       2.50%      67%          167         22%        $53,333
40-50%            100,000      2.25%      75%          150         -13%       $50,000
...
90-100%           200,000      1.50%      100%         100         -58%       $0
• DON’T be consumed by looking at residuals (the difference between predicted and observed values)
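The "% of Total Responders" and "Response Rate Lift" columns of the gains chart follow from the cumulative names and cumulative response rates. The sketch below reproduces them; the function name `gains_table` is hypothetical, and the Interval ROI and Modelling Benefits columns are left out because the slide does not give the cost and revenue assumptions behind them.

```python
def gains_table(cumulative_rows, total_names, overall_rate):
    """Compute cumulative % of responders captured and lift per row.

    cumulative_rows: (cumulative names, cumulative response rate) pairs.
    Lift is indexed so that 100 equals the overall response rate.
    """
    total_responders = total_names * overall_rate
    table = []
    for names, rate in cumulative_rows:
        responders = names * rate
        table.append({
            "names": names,
            "pct_of_responders": round(100 * responders / total_responders),
            "lift": round(100 * rate / overall_rate),
        })
    return table


# Cumulative figures read from the gains chart above.
rows = [(20_000, 0.0350), (40_000, 0.0300), (60_000, 0.0275),
        (80_000, 0.0250), (100_000, 0.0225), (200_000, 0.0150)]
gains = gains_table(rows, total_names=200_000, overall_rate=0.015)
for line in gains:
    print(line)
```

Setting these benchmark figures before a campaign runs gives a fixed yardstick against which the live results can later be compared.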
• Be very careful about overstating results
• Example - Mortgage Insurance Model
17
Is there a problem here?
– Need to delve more into the model
6. Interpret Results Carefully/Overstatement
% of Names Promoted   % of Mortgage Insurance Buyers
0-5% 80%
5-10% 5%
10-15% 5%
15%-100% 10%
18
• Let’s look at model variable contribution reports
• Ever bought insurance accounts for 85% of the power of the model
• Maybe upfront segmentation should be done. Are we sure that the results are still valid?
• Correlation results indicate that the ever bought insurance variable is of a different order of magnitude than the other variables
• Perhaps we need to investigate this variable more closely
6. Interpret Results Carefully/overstatement
Model Variable                  % Contribution to Model
Ever bought Insurance           85%
1 or more lending product       8%
Have an investment product      5%
Have a credit card              1%
Live in Ontario                 1%

Variable                        Correlation Coefficient
ever bought Insurance           0.75
1 or more lending products      0.2
have a line of credit product   0.18
have a car loan                 0.17
have an RRSP                    0.16
have an RESP                    0.15
have an investment product      0.16
live in Toronto                 0.15
live in Ontario                 0.14
..etc
have a chequing account         0.0002
19
Pre-Period: Independent Variables
Post-Period: Dependent Variables
• The analytical file does tell us something here
• The analytical file was improperly created: information used to create the dependent variable was also used to create the ever bought insurance variable
• Need to create a proper analytical file
6. Interpret Results Carefully/Overstatement
Response   Ever bought   1 or more lending   have an      have a        live in
           Insurance     products            investment   credit card   Ontario
yes        yes           Yes                 Yes          Yes           Yes
yes        yes           Yes                 No           Yes           No
yes        yes           Yes                 Yes          No            Yes
yes        yes           Yes                 Yes          No            No
yes        yes           Yes                 No           Yes           Yes
yes        yes           No                  Yes          Yes           No
yes        yes           Yes                 Yes          No            Yes
yes        yes           Yes                 No           No            No
yes        yes           Yes                 Yes          Yes           Yes
yes        yes           Yes                 Yes          Yes           No
yes        yes           No                  Yes          No            Yes
No         No            No                  No           No            No
No         No            Yes                 No           No            No
No         No            No                  Yes          No            No
No         No            Yes                 No           No            Yes
No         yes           No                  No           No            Yes
No         yes           No                  No           No            No
No         No            No                  No           Yes           No
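One simple way to catch this kind of overstatement is to screen every predictor for a suspiciously high correlation with the response before trusting the model. The sketch below applies a Pearson correlation to the sample rows in the table above; the 0.7 cut-off and all names are hypothetical choices for illustration, not the presenter's method.

```python
import math

FEATURES = ["ever_bought_insurance", "lending_product",
            "investment_product", "credit_card", "lives_in_ontario"]

# Rows transcribed from the sample analytical file above (1 = yes, 0 = no).
# First value is the response, followed by the five predictors in order.
ROWS = [
    (1, 1, 1, 1, 1, 1), (1, 1, 1, 0, 1, 0), (1, 1, 1, 1, 0, 1),
    (1, 1, 1, 1, 0, 0), (1, 1, 1, 0, 1, 1), (1, 1, 0, 1, 1, 0),
    (1, 1, 1, 1, 0, 1), (1, 1, 1, 0, 0, 0), (1, 1, 1, 1, 1, 1),
    (1, 1, 1, 1, 1, 0), (1, 1, 0, 1, 0, 1), (0, 0, 0, 0, 0, 0),
    (0, 0, 1, 0, 0, 0), (0, 0, 0, 1, 0, 0), (0, 0, 1, 0, 0, 1),
    (0, 1, 0, 0, 0, 1), (0, 1, 0, 0, 0, 0), (0, 0, 0, 0, 1, 0),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


response = [row[0] for row in ROWS]
correlations = {
    name: pearson([row[i + 1] for row in ROWS], response)
    for i, name in enumerate(FEATURES)
}

# Hypothetical screening rule: anything above 0.7 deserves a closer look.
LEAKAGE_THRESHOLD = 0.7
suspects = [name for name, r in correlations.items() if r > LEAKAGE_THRESHOLD]
print(suspects)
```

A flagged variable is not proof of a problem, only a prompt to check how that variable was built and whether it draws on the same period as the response.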
7. Use Art and Science to Build Solutions
• Challenge
– Retailer collects no information on its customers
– Market research indicates the key drivers of purchase behaviour
are high income, females and immigrants
• Solution
– Using an indexing approach, create postal code index variable
based on three Statistics Canada variables
20
                      Income    % Female   % Landed Immig.
Average Postal Code   $40,000   52%        5%
M5A 1J2               $50,000   60%        10%
Index                 1.25      1.15       2
The index for M5A 1J2 is (.33 x 1.25)+(.33 x 1.15)+(.33 x 2) = 1.45
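The index calculation can be written as a small function. This is a sketch under the slide's stated assumptions (three variables, equal weights of 0.33); the function and key names are hypothetical.

```python
def composite_index(area, average, weights):
    """Weighted sum of each variable's ratio to the national average."""
    return sum(weights[key] * area[key] / average[key] for key in weights)


# Values from the table above; percentages are expressed as fractions.
average_postal_code = {"income": 40_000, "pct_female": 0.52, "pct_immigrant": 0.05}
m5a_1j2 = {"income": 50_000, "pct_female": 0.60, "pct_immigrant": 0.10}
equal_weights = {"income": 0.33, "pct_female": 0.33, "pct_immigrant": 0.33}

index_m5a = composite_index(m5a_1j2, average_postal_code, equal_weights)
print(round(index_m5a, 2))  # 1.45
```

The weights need not be equal. If the market research ranked one driver above the others, the weights could be adjusted to reflect that.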
• Index scheme can then be used to score each postal code
• 800,000 postal codes in Canada are then ranked into 20 half deciles
based on descending index score
21
% of File # of Postal Codes Min Index in Interval # of Prospects
0-5% 40,000 5.50 80,000
5-10% 40,000 5.00 60,000
10-15% 40,000 4.80 90,000
…
95-100% 40,000 0.05 30,000
Total 800,000 3,000,000
• DON’T use lack of data or inability to use advanced techniques as barriers to at least test different initiatives
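Ranking scored postal codes into 20 half deciles is a sort followed by an equal-sized split. The sketch below assumes a dictionary of index scores keyed by postal code; the function name and the 40 made-up codes are hypothetical and stand in for the 800,000 real ones.

```python
def assign_half_deciles(scores, bins=20):
    """Map each postal code to a bin from 1 (highest index) to `bins`."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    count = len(ranked)
    return {code: (position * bins) // count + 1
            for position, code in enumerate(ranked)}


# Hypothetical scores for 40 made-up codes, so each bin holds two codes.
scores = {f"PC{i:02d}": i / 10 for i in range(40)}
half_deciles = assign_half_deciles(scores)
print(half_deciles["PC39"], half_deciles["PC00"])  # 1 20
```

A test mailing can then start with the top few bins and extend down the ranking only while results justify it.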
7. Use Art and Science to Build Solutions
8. Implement Solutions Carefully
• Challenge: Loss Cost model
– Key variables were analyzed
– Why are key model variables not performing?
– Audit of current data indicated strong presence of apartments
– No apartment data were in model development file
22
9. Measure and Track Results
• Customer Migration: A Different View
• Series of reports designed to:
– Determine actual customer migration patterns of Carded Patrons between two set periods of time
– Compare this to the predicted migration pattern
– If variance is significantly different, look at which original profile variables are still impacting migration versus those that are not
23
Actual: New Segment (Current)
Old Segment (Pre Period)   Est # Customers   Gold   Silver   Reward   Lapsed
Gold                       50,000            50%    30%      15%      5%
Silver                     150,000           20%    30%      30%      20%
Reward                     300,000           5%     10%      50%      35%
Total                      500,000

Predicted: New Segment (Current)
Old Segment (Pre Period)   Est # Customers   Gold   Silver   Reward   Lapsed
Gold                       50,000            60%    20%      15%      5%
Silver                     150,000           15%    25%      35%      25%
Reward                     300,000           10%    15%      50%      25%
Total                      500,000

Variance: New Segment (Current)
Old Segment (Pre Period)   Est # Customers   Gold   Silver   Reward   Lapsed
Gold                       50,000            -10%   10%      0%       0%
Silver                     150,000           5%     5%       -5%      -5%
Reward                     300,000           -5%    -5%      0%       10%
Total                      500,000
• DON’T develop solutions that cannot be measured and tracked
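The variance report is the actual migration matrix minus the predicted one, cell by cell. The sketch below reproduces it from the figures above; the 10-point flagging threshold and all names are hypothetical.

```python
NEW_SEGMENTS = ("Gold", "Silver", "Reward", "Lapsed")

# Migration percentages by old segment, in NEW_SEGMENTS order.
ACTUAL = {"Gold": (50, 30, 15, 5), "Silver": (20, 30, 30, 20), "Reward": (5, 10, 50, 35)}
PREDICTED = {"Gold": (60, 20, 15, 5), "Silver": (15, 25, 35, 25), "Reward": (10, 15, 50, 25)}


def migration_variance(actual, predicted):
    """Actual minus predicted, in percentage points, for every cell."""
    return {old: tuple(a - p for a, p in zip(actual[old], predicted[old]))
            for old in actual}


def large_variances(variance, threshold=10):
    """List (old segment, new segment, variance) cells at or beyond threshold."""
    return [(old, NEW_SEGMENTS[i], value)
            for old, row in variance.items()
            for i, value in enumerate(row)
            if abs(value) >= threshold]


variance = migration_variance(ACTUAL, PREDICTED)
flagged = large_variances(variance)
print(variance)
print(flagged)
```

Here the flagged cells show fewer Gold customers staying Gold than predicted and more Reward customers lapsing, which is where a review of the profile variables would start.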
10. Keeping abreast of latest changes
• Big Data
• Increased Automation of Tools
• Artificial Intelligence
24
25
2011 2006 2017
Big Data
Lots of books, whitepapers and press
26
What does this really mean for
predictive analytics practitioners?
MORE DATA
Big Data
27
Big Data: Hype to productivity (early maturity)
Source: Gartner, Hype Cycle for Business Intelligence and Analytics 2015
Hype Cycle phases: Innovation Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity (positions marked for 2011, 2012, 2013, 2014 and 2015)
• Big Data had been a fixture on the Gartner Hype Cycle for Emerging Technologies for several years
• In 2015, Gartner eliminated the ‘Big Data’ Hype Cycle because it felt that Big Data had moved past hype and into practice.
28
Big Data: some of the Vendors?
Source: MattTurck.com
Big Data
• What is new to the data scientist?
– Newer ETL skills
– Ability to work with all kinds of data
29
Different roles, different impacts
• Data Managers – need to manage new containers, flows, governance, delivering on expectations
• Analytics Practitioners – need to harness new types of data and integrate them into their processes (determine the veracity and potential added value)
• Business Strategists – need to understand and evaluate the potential that new types of data and capabilities have and how to leverage this to deliver value to the organization
30
Big Data
• Increased demand for text mining
• Need for more robust platforms that operate in the cloud
• Increased data governance is a consequence of Big Data
31
Increased Automation of Tools
• Creating the analytical file is no longer the monopoly of programmers in R, Python or SAS.
• Data scientists need to understand data and the relevant processes in creating the analytical file but not necessarily programming code or syntax.
Sample of data process flow used to create analytical file in determining top 5 stores
32
Increased Automation of tools
• Machine learning software and tools can now generate multiple models in a matter of seconds. Can we envision a “Ford”-type model factory?
33
But what about AI and machine learning?
34
Picture credit: Medical Futurist
Machine learning
• There is some debate on the exact definition, but essentially it is the process of a machine making decisions without human intervention.
• The Predictive Analytics process is still the same. See below.
35
RAW DATA → Data cleansing → Feature engineering → Model building
Machine learning and artificial intelligence
• Neural net techniques represent the underpinnings of AI?
36
ARE THEY ALL FORMS OF MACHINE LEARNING?
Linear/logistic regression
SVM/discriminant analysis
Decision trees
NEURAL NETS
Image recognition has been the most commonly used AI application
37
• The concept of the pixel and trying to predict pixels within a picture
• Business applications could include insurance claim processing, property management claims and others
AI and Natural language processing
38
• Uses AI to generate text based on historical text
• Utilizes recurrent neural net methodology, unlike image recognition, which uses convolutional neural net methodology.
• Many applications today:
– Speech recognition (Echo, Siri)
– Analysis of historical documentation
• Medical diagnoses
• Review of legal cases and precedents
But why is AI now a game changer?
39
• Two keys to success:
– Extremely large volumes of data
– Large signal-to-noise ratio
My Book
40
'Boire provides a straightforward and disciplined overview of the practice of data mining. Whether you are
dealing with large or small data sets, tomorrow's business leaders will be the ones that extract the most
value from their customer information. Boire leverages his extensive experience as a practitioner to help
the reader take a measured approach while providing a unique view of data mining.'
- Bryan Pearson, President and Chief Executive Officer, LoyaltyOne
'Terms like 'analytics,' 'data mining,' and 'modeling' evoke fear in the heart of many a traditional marketer.
Instead of focusing solely on techniques, Boire organizes his content around types of management
decisions. In the process, he demystifies and explains the rationale and methods used in terms that anyone
can understand. A must read for those getting started or looking to round out their expertise.'
- Kenneth B. Wong, Distinguished Professor of Marketing, Queen's University, School of Business
'Business managers and decision makers have been in need of a book on data mining, and—voila! This
industry overview is unique in serving the needs of the consummate businessperson, differentiating it from
the many introductions for would-be hands-on, technical practitioners. Boire has formed a conceptually
rich and insightful compendium that delivers a pragmatic perspective on both the tactical and strategic
value of data mining and predictive analytics.'
- Eric Siegel, founder of Predictive Analytics World and author of Predictive Analytics: The Power to Predict
Who Will Click, Buy, Lie, or Die
Data discovery sample: migration example
• 60% of Premiere members are in decline or inactive – a concerning trend that should be addressed with new marketing initiatives to re-engage these members
Current Year Activity
Value Segment in Previous Yr. Growth Stable Decline Inactive Total
Premiere --- 41% 53% 7% 18,406
High 8% 23% 44% 24% 36,811
Medium 9% 21% 14% 56% 73,620
Low 14% 11% --- 75% 55,215
Total 9% 20% 20% 50% 184,052
41
Data discovery Sample example of
Potential Decision Matrix
– Sample Roadmap of Marketing Opportunities
– Opportunity to prioritize activities based on $
opportunity
42
Segment                      # Customers   Avg. Value   Strategy                         Projected RR   $ Growth Potential   Segment Potential
New                          54,752        $80.40       Dedicated Welcome & Onboarding   15.00%         40%                  $264,123.65
Premiere in Decline          9,755         $623.53      Retention                        5.00%          20%                  $60,825.35
Potential Premiere           22,086        $147.08      Up-sell                          10.00%         30%                  $97,452.27
Underdeveloped Concession    15,203        $248.00      Cross-sell                       2.50%          15%                  $14,138.79
Etc….
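The Segment Potential column is consistent with multiplying customers, average value, projected response rate and growth potential together. The sketch below reproduces it and ranks the opportunities; the formula is inferred from the figures in the matrix rather than stated on the slide, and the names are hypothetical.

```python
def segment_potential(customers, avg_value, projected_rr, growth_potential):
    """Dollar opportunity: customers x value x response rate x growth."""
    return round(customers * avg_value * projected_rr * growth_potential, 2)


# (segment, # customers, avg. value, projected RR, $ growth potential)
SEGMENTS = [
    ("New", 54_752, 80.40, 0.150, 0.40),
    ("Premiere in Decline", 9_755, 623.53, 0.050, 0.20),
    ("Potential Premiere", 22_086, 147.08, 0.100, 0.30),
    ("Underdeveloped Concession", 15_203, 248.00, 0.025, 0.15),
]

potentials = {name: segment_potential(n, value, rr, growth)
              for name, n, value, rr, growth in SEGMENTS}

# Prioritize activities from largest to smallest dollar opportunity.
priorities = sorted(potentials, key=potentials.get, reverse=True)
for name in priorities:
    print(f"{name}: ${potentials[name]:,.2f}")
```

Ranking by dollar opportunity puts onboarding of new customers first, even though each new customer is worth far less than a Premiere member.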
• Predictive Modelling
– Response (and Premium)
– Cancellation
– Contact Rate
• Contact Management
– 3.5 million credit card customers
– 8 marketing sponsors, 21 products
– 80 million targeted communications
• Optimization
– Cost per sale improved from $112
to $37
– Break-even period improved from
53 months to 7 months
– Avg. premium improved 240%
43
HBC: Ongoing Improvements
• The chart illustrates the improvement in the client's results over time:
– Campaigns 1 & 2: no modeling was used
– Campaigns 3-5: the model was applied for list selection
– Campaigns 6-10: the model was applied, plus business rules from a contact management database
Campaign   # Leads   # Sales   Total Cost    Cost/Sale   Avg Premium/Cust/Month   # Months to Break Even   Phase
1          20,000    285       $32,000.00    $112.28     $2.10                    53                       Pre-Modeling
2          20,000    303       $32,000.00    $105.61     $2.34                    45                       Pre-Modeling
3          40,000    1134      $64,000.00    $56.44      $4.17                    14                       Modeling Only
4          30,000    1029      $50,000.00    $48.59      $4.44                    11                       Modeling Only
5          30,000    1084      $54,750.00    $50.51      $4.06                    12                       Modeling Only
6          15,000    806       $30,446.00    $37.77      $3.89                    10                       Modeling & Contact Management
7          15,000    757       $28,442.00    $37.57      $4.79                    8                        Modeling & Contact Management
8          15,000    727       $26,678.00    $36.70      $4.72                    8                        Modeling & Contact Management
9          15,000    690       $28,064.00    $40.67      $4.10                    10                       Modeling & Contact Management
10         15,000    725       $27,225.00    $37.55      $5.07                    7                        Modeling & Contact Management
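The two tracking metrics in the campaign table can be recomputed from its raw columns. In the sketch below, cost per sale is total cost divided by sales; the break-even formula (cost per sale divided by average monthly premium, rounded to the nearest month) is an assumption inferred from the figures, which it reproduces, and is not stated on the slide.

```python
def cost_per_sale(total_cost, sales):
    """Campaign cost divided by the number of sales it produced."""
    return total_cost / sales


def months_to_break_even(total_cost, sales, avg_monthly_premium):
    """Assumed rule: months of premium needed to recover the cost of a sale."""
    return round(cost_per_sale(total_cost, sales) / avg_monthly_premium)


# (campaign, # sales, total cost, avg premium per customer per month)
CAMPAIGNS = [
    (1, 285, 32_000.00, 2.10),    # pre-modeling
    (3, 1134, 64_000.00, 4.17),   # modeling only
    (10, 725, 27_225.00, 5.07),   # modeling & contact management
]

results = {
    campaign: (round(cost_per_sale(cost, sales), 2),
               months_to_break_even(cost, sales, premium))
    for campaign, sales, cost, premium in CAMPAIGNS
}
for campaign, (cps, months) in results.items():
    print(f"Campaign {campaign}: ${cps:.2f} per sale, break-even in {months} months")
```

Tracking both figures campaign by campaign is what makes the move from pre-modeling to modeling with contact management visible as a result rather than a claim.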
44
HBC Targeting - Integrating Models
Likelihood to
Answer Call
Likelihood to Purchase Product
High Low
High
Low
Tele-market
Do Not Market
Tele-market different
product
Same product
different channel
45
46
IS BIG DATA ALWAYS THE ANSWER?
INSURANCE TELEMATICS CASE
P&C INSURANCE MODEL – USE TELEMATICS DATA?
47
This predictive model did NOT use telematics data. It is already a very good model. If telematics data is available, should it be used? Consider…
• only ~25% of the policyholder base has installed a sensor in their car
• this is a biased sample – sensors are typically installed by those with conservative driving behaviour
Can we significantly increase the lift shown in this table with telematics data?
Premium vs. Auto Loss Model Comparison of Cumulative % of Losses
EA Model
48
FINAL THOUGHTS
1. BIG DATA: PANACEA FOR ALL PROJECTS
2. BIG DATA: USE JUDICIOUSLY
3. ULTIMATELY, CONSIDER HOW TO:
GOAL: Create the right analytical data set for the business problem
FP Growth Algorithm and its Applications
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 

Business and Data Analytics Collaborative April Meetup

  • 1. Best Practices and Considerations in Big Data analytics April 19, 2018
  • 2. Challenges turning Big Data analytics into competitive advantage: Lack of clear roadmap; Getting corporate alignment and C-suite buy-in; Lack of appropriate data structures; Difficulties finding insights; Uncertainty as to how to convert insights to action; Difficulty proving the business case/undisciplined reporting and measurement
  • 3. ‘Small’ data… there’s still big work to be done here. ANALYTICS 1.0 (DESCRIPTIVE); ANALYTICS 2.0 (PREDICTIVE); ANALYTICS 3.0 (PRESCRIPTIVE); ONE VIEW OF THE CUSTOMER; ALIGNMENT AND CULTURE. © 2016 Environics Analytics. Many organizations that do have Big Data (needs, problems) still have a lot of work to do to achieve analytics capabilities with their ‘small data’.
  • 4. Some current applications of analytics: Call Center Operations; Quality Control; Prioritizing channel opportunity; Risk Management; Fraud Detection; Direct Marketing; Retention/acquisition
  • 5. Analytics as an ongoing journey. Analyze: Insights, Opportunities. Plan: Investment Strategies, Customer Contact & Communication Strategies. Execute: Creative and Media, In-Market Testing. Evaluate: Measure, Learn, Optimize. Continuous Improvement Through Data Analytics
  • 6. Data discovery process overview: Business discovery & alignment; Analytical snapshot; Insights and recommendations; Analytical roadmap; Tactical opportunities; Data strategy; Customer strategy; Data audit
  • 7. The Four-Step Process in building analytics solutions • A discipline that requires STRUCTURE and PROCESS – At Environics Analytics we utilize the following four-step process to manage projects: (1) Problem Identification; (2) Creation of the Analytical Data Environment; (3) Application of the Data Mining Tools; (4) Implementation and Tracking. The overall process remains the same, yet elements within the process are different.
  • 8. The data mining tasks within the process? • Meet with key stakeholders • Understand business issues • Understand available data • Review relevant documentation • Define data audit requirements • Audit existing data • Assess completeness and accuracy of data collected • Produce initial frequency reports • Identify data gaps/recommend third party data overlays • Variable creation • Operationalization of solution • Real-time vs. batch • Creating the measurement framework • Analyzing model performance and key learning • Feature engineering • Correlation • Factor analysis • Decision tree • EDA reports • Modelling routines • Regression • Neural nets • Ensemble, etc. • Validation • Decile charts, gains charts • AUC curves. Stages: Identification of Business Problem; Creation of the Analytical File; Development / Validation of the Predictive Analytics Solution; Deployment and Measurement of the Solution
  • 9. 1. Identify the Real Business Problem. Define the problem; Listen and learn; Key stakeholders; Documents and reports; Dept. 1, Dept. 2, Dept. 3. DO be careful of silos between departments. DON’T jump to conclusions.
  • 10. 2. Look for Quick Wins. Creating a quick win: Identify and prioritize existing business challenges; Make optimal and effective use of information; Identify key champion stakeholders. DO avoid exercises that cannot clearly demonstrate cost benefits.
  • 11. 2. Creating a quick win • Retention issues are typically a challenge for most marketers • High-value retention model offers significant savings
    Potential Benefits - Targeting Top 25% of Model
    Scenario | Quantity Promoted | Defection Rate of Group | # of Potential Defectors | Save Rate | # of Cust. Saved | Avg. Value | Total Saved Quarterly
    Savings without model | 100,000 | 1.19% | 1,186 | 20% | 237 | $300 | $71,100
    Savings with model | 100,000 | 3.12% | 3,117 | 20% | 623 | $300 | $186,900
    Diff: model vs. no model | | 1.93% | 1,931 | | 386 | | $115,800
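The arithmetic behind the quick-win table can be sketched in a few lines. This is a minimal illustration that takes the slide's figures as given; the function name `retention_savings` is hypothetical, and rounding the saved customers to whole numbers is an assumption that happens to reproduce the slide's totals.

```python
def retention_savings(potential_defectors, save_rate, avg_value):
    """Return (customers_saved, dollars_saved) for one promoted group."""
    # Assumption: saved customers are rounded to whole people before valuing them.
    customers_saved = round(potential_defectors * save_rate)
    return customers_saved, customers_saved * avg_value


# Figures from the slide: 100,000 customers promoted in each scenario.
without_model = retention_savings(1186, 0.20, 300)
with_model = retention_savings(3117, 0.20, 300)

# Incremental quarterly benefit of targeting with the model.
incremental = with_model[1] - without_model[1]
print(without_model, with_model, incremental)
```

The same promoted quantity reaches more potential defectors when the model picks the list, so the entire gain comes from the higher defection rate in the targeted group.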
  • 12. 3. Become familiar with the data: Data Audit Process. Data Audit: Assess Completeness & Accuracy of Data. Data Audit Reports: Meta Data, Frequency, N-tile, Random Data Dump. Identify Gaps. Recommend 3rd Party Data Overlays (if applicable).
  • 13. 3. Become familiar with data: meta Data Audit Reports • Data audit is done on 53,235 records • By looking at the number of unique values and the number of missing values, you can begin to understand the data • The data indicate that relatively few customers have an email address on file (70% missing), while tenure (creation date) appears to be a good variable with few missing values (3%)
    Data Audit Report - Column MetaData (Structure: Ordinal, Column Name, Column Type; Value Distribution: Unique Values, FREQs, Missing, Mis %, Non Missing)
    Ordinal | Column Name | Column Type | Unique Values | FREQs | Missing | Mis % | Non Missing
    1 | NAMEUPPER | varchar | 48,962 | | 134 | 0% | 53,101
    2 | PROFILENO | int | 53,177 | | 59 | 0% | 53,176
    3 | PROFILETYPE_LINKCODE | varchar | 64 | Y | 0 | 0% | 53,235
    4 | NAME | varchar | 48,910 | | 134 | 0% | 53,101
    5 | FIRSTNAME | varchar | 12,025 | | 3,690 | 7% | 49,545
    6 | MIDDLEINIT | varchar | 823 | | 44,632 | 84% | 8,603
    7 | LASTNAME | varchar | 21,678 | | 3,077 | 6% | 50,158
    8 | COURTESYTITLE | varchar | 93 | Y | 15,884 | 30% | 37,351
    9 | ADDRESS1 | varchar | 36,252 | | 10,852 | 20% | 42,383
    10 | ADDRESS2 | varchar | 2,795 | | 49,254 | 93% | 3,981
    11 | CITY | varchar | 3,469 | | 10,878 | 20% | 42,357
    12 | STATE | varchar | 758 | | 10,084 | 19% | 43,151
    13 | ZIP | varchar | 27,667 | | 11,807 | 22% | 41,428
    14 | COUNTRY | varchar | 1,217 | | 19,509 | 37% | 33,726
    15 | STMTREMARKS | varchar | 1,867 | | 50,326 | 95% | 2,909
    16 | UNAPPLIEDBALANCE | varchar | 57 | Y | 1,725 | 3% | 51,510
    17 | PHONE | varchar | 32,037 | | 17,088 | 32% | 36,147
    18 | FAX | varchar | 3,053 | | 49,673 | 93% | 3,562
    19 | EMAIL | varchar | 13,731 | | 37,219 | 70% | 16,016
    20 | TITLE | varchar | 647 | | 52,519 | 99% | 716
    21 | BUSINESSTYPE | varchar | 121 | | 53,105 | 100% | 130
    22 | NOTES | varchar | 45 | Y | 53,191 | 100% | 44
    23 | ADDITIONALNOTES | varchar | 20 | Y | 53,216 | 100% | 19
    24 | OTHER | varchar | 9 | Y | 53,227 | 100% | 8
    25 | TRAVELPREF | varchar | 6 | Y | 53,229 | 100% | 6
    26 | INTERFACEID | varchar | 980 | | 51,568 | 97% | 1,667
    27 | CREATIONDATE | datetime | 3,041 | Y | 1,822 | 3% | 51,413
    28 | CREDITLIMIT | varchar | | | 53,235 | 100% | 0
    DON’T forget to look at key arithmetic diagnostics.
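A column metadata report of this kind is straightforward to produce. The sketch below is one possible way to do it with the Python standard library, not the tooling used for the slide; the function `audit_columns`, the rule that `None` and empty strings count as missing, and the four sample records are all assumptions made for illustration.

```python
def audit_columns(rows, columns):
    """Summarize unique and missing counts per column for a list of dict records."""
    total = len(rows)
    report = []
    for col in columns:
        values = [row.get(col) for row in rows]
        # Assumption: None and empty strings both count as missing.
        present = [v for v in values if v not in (None, "")]
        missing = total - len(present)
        report.append({
            "column": col,
            "unique": len(set(present)),
            "missing": missing,
            "missing_pct": round(100 * missing / total) if total else 0,
            "non_missing": len(present),
        })
    return report


# Hypothetical records; the column names mirror two rows of the slide's report.
sample = [
    {"EMAIL": "a@example.com", "CREATIONDATE": "2015-01-02"},
    {"EMAIL": "", "CREATIONDATE": "2015-01-02"},
    {"EMAIL": None, "CREATIONDATE": "2016-03-04"},
    {"EMAIL": "", "CREATIONDATE": None},
]
audit = audit_columns(sample, ["EMAIL", "CREATIONDATE"])
for line in audit:
    print(line)
```

Columns with a very high missing percentage or an extreme unique count (every value distinct, or a single value) are the usual candidates for exclusion before any modelling starts.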
  • 14. 3. Become familiar with data: meta Data Audit Reports. Other fields were eliminated because: • Missing values • Cardinality • Irrelevant to actual Twitter behaviour • Example: many ID fields in data
    Column Name | Column Type | Unique Values | Missing | Mis %
    created_at | Character | 209,390 | 554,551 | 67%
    entities_urls | Character | 114,003 | 380,722 | 46%
    in_reply_to_screen_name | Character | 13,327 | 801,635 | 98%
    in_reply_to_status_id | Numeric | 26,744 | 793,501 | 97%
    in_reply_to_status_id_str | Numeric | 26,744 | 793,501 | 97%
    in_reply_to_user_id | Numeric | 28,700 | 782,513 | 95%
    in_reply_to_user_id_str | Numeric | 28,700 | 782,513 | 95%
    lang | Character | 1 | 0 | 0%
    place | Character | 1,415 | 816,208 | 99%
    source | Character | 3,919 | 5,899 | 1%
    text | Character | 172,958 | 513,496 | 62%
    user_created_at | Character | 129,976 | 332,696 | 40%
    user_followers_count | Numeric | 68,365 | 0 | 0%
    user_friends_count | Numeric | 41,694 | 0 | 0%
    user_id_str | Numeric | 329,121 | 0 | 0%
  • 15. 4. Use Statistics Judiciously • The appropriate technique will depend on the business problem
    Statistical Tool | Business Application
    Correlation Analysis | Exploratory – identify key variables
    CHAID | Exploratory – identify key variables; can also be used to build final model
    Factor Analysis | Exploratory – reduce data and help to identify key variables
    Cluster Analysis | Define distinct homogeneous groups of customers
    Multiple Regression | Build final model
    Logistic Regression | Build final model
    Neural Nets | Build final model
    DON’T assume PhDs in math and computer engineering know everything.
  • 16. 5. Establish Performance Benchmarks from the Start • Example: the gains chart looks at lift results
    % of Validation Sample | Validation Names | Response Rate | % of Total Responders | Response Rate Lift | Interval ROI | Modelling Benefits
    0-10% | 20,000 | 3.50% | 23% | 233 | 145% | $26,667
    10-20% | 40,000 | 3.00% | 40% | 200 | 75% | $40,000
    20-30% | 60,000 | 2.75% | 55% | 183 | 58% | $50,000
    30-40% | 80,000 | 2.50% | 67% | 167 | 22% | $53,333
    40-50% | 100,000 | 2.25% | 75% | 150 | -13% | $50,000
    . . .
    90-100% | 200,000 | 1.50% | 100% | 100 | -58% | $0
    DON’T be consumed by looking at residuals (difference between predicted estimates and observed estimates).
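Two of the gains chart columns can be recomputed from the others, which is a useful sanity check on any benchmark report. The sketch below assumes, as the slide's numbers suggest, that names and response rates are cumulative at each depth and that the last row covers the whole validation file; `gains_rows` is a hypothetical helper, and the ROI and benefit columns are left out because the slide does not give the cost inputs behind them.

```python
def gains_rows(cumulative):
    """cumulative: list of (label, names, response_rate) at each cumulative depth.

    The last entry is assumed to cover the full validation sample.
    """
    _, total_names, overall_rate = cumulative[-1]
    total_responders = total_names * overall_rate
    rows = []
    for label, names, rate in cumulative:
        rows.append({
            "interval": label,
            # Share of all responders captured down to this depth.
            "pct_of_responders": round(100 * names * rate / total_responders),
            # Lift index: 100 means no better than mailing at random.
            "lift": round(100 * rate / overall_rate),
        })
    return rows


# Figures from the slide (intermediate deciles were elided there as well).
slide = [
    ("0-10%", 20_000, 0.0350),
    ("10-20%", 40_000, 0.0300),
    ("20-30%", 60_000, 0.0275),
    ("30-40%", 80_000, 0.0250),
    ("40-50%", 100_000, 0.0225),
    ("90-100%", 200_000, 0.0150),
]
rows = gains_rows(slide)
for row in rows:
    print(row)
```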
  • 17. 6. Interpret Results Carefully/Overstatement • Be very careful about overstating results • Example - Mortgage Insurance Model. Is there a problem here? – Need to delve more into the model
    % of Names Promoted | % of Mortgage Insurance Buyers
    0-5% | 80%
    5-10% | 5%
    10-15% | 5%
    15%-100% | 10%
  • 18. 6. Interpret Results Carefully/Overstatement • Let’s look at model variable contribution reports • "Ever bought insurance" accounts for 85% of the power of the model • Maybe upfront segmentation should be done. Are we sure that the results are still valid? • Correlation results indicate that the "ever bought insurance" variable has a coefficient of a different order of magnitude compared to the other variables • Perhaps we need to investigate this variable more closely
    Model Variable | % Contribution to Model
    Ever bought insurance | 85%
    1 or more lending product | 8%
    Have an investment product | 5%
    Have a credit card | 1%
    Live in Ontario | 1%

    Variable | Correlation Coefficient
    ever bought insurance | 0.75
    1 or more lending products | 0.2
    have a line of credit product | 0.18
    have a car loan | 0.17
    have an RRSP | 0.16
    have an RESP | 0.15
    have an investment product | 0.16
    live in Toronto | 0.15
    live in Ontario | 0.14
    ..etc
    have a chequing account | 0.0002
  • 19. 6. Interpret Results Carefully/Overstatement. Independent variables come from the pre-period; dependent variables come from the post period. • The analytical file does tell us something here • The analytical file was improperly created: information used to create the dependent variable was also used to create the "ever bought insurance" variable • Need to create a proper analytical file
    Response | Ever bought insurance | 1 or more lending products | Have an investment | Have a credit card | Live in Ontario
    Yes | Yes | Yes | Yes | Yes | Yes
    Yes | Yes | Yes | No | Yes | No
    Yes | Yes | Yes | Yes | No | Yes
    Yes | Yes | Yes | Yes | No | No
    Yes | Yes | Yes | No | Yes | Yes
    Yes | Yes | No | Yes | Yes | No
    Yes | Yes | Yes | Yes | No | Yes
    Yes | Yes | Yes | No | No | No
    Yes | Yes | Yes | Yes | Yes | Yes
    Yes | Yes | Yes | Yes | Yes | No
    Yes | Yes | No | Yes | No | Yes
    No | No | No | No | No | No
    No | No | Yes | No | No | No
    No | No | No | Yes | No | No
    No | No | Yes | No | No | Yes
    No | Yes | No | No | No | Yes
    No | Yes | No | No | No | No
    No | No | No | No | Yes | No
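A predictor that almost reproduces the target is a common symptom of the pre-period/post-period mix-up described above. The sketch below is one simple screening heuristic, not the diagnostic used in the presentation: it measures how often each predictor agrees with the response on the slide's 18 sample records and flags anything at or above a cutoff. The snake_case column names and the 0.85 cutoff are assumptions chosen for illustration.

```python
COLUMNS = ["ever_bought_insurance", "lending_product", "investment_product",
           "credit_card", "lives_in_ontario"]

# One string per record from the slide: response first, then the five predictors.
RECORDS = [
    "YYYYYY", "YYYNYN", "YYYYNY", "YYYYNN", "YYYNYY", "YYNYYN",
    "YYYYNY", "YYYNNN", "YYYYYY", "YYYYYN", "YYNYNY",
    "NNNNNN", "NNYNNN", "NNNYNN", "NNYNNY", "NYNNNY", "NYNNNN", "NNNNYN",
]


def agreement_with_target(records, columns):
    """Fraction of records where each predictor has the same value as the response."""
    rates = {}
    for position, name in enumerate(columns, start=1):
        matches = sum(1 for rec in records if rec[position] == rec[0])
        rates[name] = matches / len(records)
    return rates


rates = agreement_with_target(RECORDS, COLUMNS)
# Assumption: 0.85 is an arbitrary cutoff for "suspiciously close to the target".
suspects = [name for name, rate in rates.items() if rate >= 0.85]
print(rates)
print(suspects)
```

A flagged variable is only a prompt to check how the analytical file was built; a legitimately strong predictor can trip the same screen.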
  • 20. 7. Use Art and Science to Build Solutions • Challenge – Retailer collects no information on its customers – Market research indicates the key drivers of purchase behaviour are high income, females and immigrants • Solution – Using an indexing approach, create postal code index variable based on three Statistics Canada variables
    Row | Income | % Female | % Landed Immig.
    Average postal code | $40,000 | 52% | 5%
    M5A 1J2 | $50,000 | 60% | 10%
    Index | 1.25 | 1.15 | 2
    The index for M5A 1J2 is (.33 x 1.25) + (.33 x 1.15) + (.33 x 2) = 1.45
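The index calculation generalizes to any postal code once the averages are known. This is a minimal sketch using the slide's three variables and its 0.33 weights; the function name `postal_code_index` and the dictionary keys are hypothetical.

```python
def postal_code_index(area, average, weights):
    """Weighted sum of (area value / average value) ratios across the variables."""
    return sum(weights[key] * area[key] / average[key] for key in weights)


# Values from the slide for the average postal code and for M5A 1J2.
average = {"income": 40_000, "pct_female": 52, "pct_immigrant": 5}
m5a_1j2 = {"income": 50_000, "pct_female": 60, "pct_immigrant": 10}
weights = {key: 0.33 for key in average}  # equal weights, as on the slide

index = postal_code_index(m5a_1j2, average, weights)
print(round(index, 2))
```

Equal weights are the simplest choice; if the market research ranked the three drivers, unequal weights could encode that ranking without changing the method.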
  • 21. 7. Use Art and Science to Build Solutions • Index scheme can then be used to score each postal code • 800,000 postal codes in Canada are then ranked into 20 half deciles based on descending index score
    % of File | # of Postal Codes | Min Index in Interval | # of Prospects
    0-5% | 40,000 | 5.50 | 80,000
    5-10% | 40,000 | 5.00 | 60,000
    10-15% | 40,000 | 4.80 | 90,000
    …
    95-100% | 40,000 | 0.05 | 30,000
    Total | 800,000 | | 3,000,000
    DON’T use lack of data or inability to use advanced techniques as barriers to at least test different initiatives.
  • 22. 8. Implement Solutions Carefully • Challenge: Loss Cost model – Key variables were analyzed – Why are key model variables not performing? – Audit of current data indicated strong presence of apartments – No apartment data were in model development file
  • 23. 9. Measure and Track Results • Customer Migration: A Different View • Series of reports designed to: – Determine actual customer migration patterns of Carded Patrons between two set periods of time – Compare this to the predicted migration pattern – If variance is significantly different, look at which original profile variables are still impacting migration versus those that are not
    Actual: New Segment (Current)
    Old Segment (Pre Period) | Est # Customers | Gold | Silver | Reward | Lapsed
    Gold | 50,000 | 50% | 30% | 15% | 5%
    Silver | 150,000 | 20% | 30% | 30% | 20%
    Reward | 300,000 | 5% | 10% | 50% | 35%
    Total | 500,000

    Predicted: New Segment (Current)
    Old Segment (Pre Period) | Est # Customers | Gold | Silver | Reward | Lapsed
    Gold | 50,000 | 60% | 20% | 15% | 5%
    Silver | 150,000 | 15% | 25% | 35% | 25%
    Reward | 300,000 | 10% | 15% | 50% | 25%
    Total | 500,000

    Variance: New Segment (Current)
    Old Segment (Pre Period) | Est # Customers | Gold | Silver | Reward | Lapsed
    Gold | 50,000 | -10% | 10% | 0% | 0%
    Silver | 150,000 | 5% | 5% | -5% | -5%
    Reward | 300,000 | -5% | -5% | 0% | 10%
    Total | 500,000
    DON’T develop solutions that cannot be measured and tracked.
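The variance matrix is the actual migration matrix minus the predicted one, cell by cell. The sketch below reproduces it from the slide's percentages and then lists the cells that might deserve a closer look; the 10-point flagging threshold is an assumption for illustration, since the slide does not say what counts as a significant variance.

```python
SEGMENTS = ["Gold", "Silver", "Reward", "Lapsed"]

# Percent of each old segment landing in each new segment (from the slide).
actual = {"Gold": [50, 30, 15, 5], "Silver": [20, 30, 30, 20], "Reward": [5, 10, 50, 35]}
predicted = {"Gold": [60, 20, 15, 5], "Silver": [15, 25, 35, 25], "Reward": [10, 15, 50, 25]}


def migration_variance(actual, predicted):
    """Actual minus predicted share, in percentage points, for every cell."""
    return {
        old: [a - p for a, p in zip(actual[old], predicted[old])]
        for old in actual
    }


variance = migration_variance(actual, predicted)

# Assumption: a gap of 10 points or more is worth investigating.
flagged = [
    (old, new, gap)
    for old, row in variance.items()
    for new, gap in zip(SEGMENTS, row)
    if abs(gap) >= 10
]
print(variance)
print(flagged)
```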
  • 24. 10. Keeping abreast of latest changes • Big Data • Increased Automation of Tools • Artificial Intelligence
  • 25. Big Data: lots of books, whitepapers and press (2006, 2011, 2017)
  • 26. Big Data: What does this really mean for predictive analytics practitioners? MORE DATA
  • 27. Big Data: Hype to productivity (early maturity). [Figure: Gartner Hype Cycle phases (Innovation Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity) with Big Data's position marked for 2011 through 2015.] Source: Gartner, Hype Cycle for Business Intelligence and Analytics 2015 • Big Data had been a fixture on the Gartner Hype Cycle for Emerging Technologies for several years • In 2015, Gartner eliminated the ‘Big Data’ Hype Cycle because it felt that Big Data had moved past hype and into practice.
  • 28. Big Data: some of the Vendors? Source: MattTurck.com
  • 29. Big Data • What is new to the data scientist? – Newer ETL skills – Ability to work with all kinds of data
  • 30. Different roles, different impacts • Data Managers – need to manage new containers, flows, governance, delivering on expectations • Analytics Practitioners – need to harness new types of data and integrate them into their processes (determine the veracity and potential added value) • Business Strategists – need to understand and evaluate the potential that new types of data and capabilities have and how to leverage this to deliver value to the organization
  • 31. Big Data • Increased Demand for Text Mining • Need for more robust platforms that operate in cloud • Increased Data Governance is a consequence of Big Data
  • 32. Increased Automation of Tools • Creating the analytical file is no longer the monopoly of programmers in R, Python or SAS. • Data scientists need to understand data and the relevant processes in creating the analytical file but not necessarily programming code or syntax. [Figure: sample of data process flow used to create analytical file in determining top 5 stores]
  • 33. Increased Automation of Tools • Machine learning software and tools can now generate multiple models in a matter of seconds, i.e. can we envision a “Ford”-type model factory?
  • 34. But what about AI and machine learning? Picture credit: Medical Futurist
  • 35. Machine learning • Some debate on the exact definition but essentially the process of a machine that can make decisions without human intervention. • Predictive Analytics process is still the same: raw data, data cleansing, feature engineering, model building.
  • 36. Machine learning and artificial intelligence • Neural net techniques represent the underpinnings of AI? ARE THEY ALL FORMS OF MACHINE LEARNING? Linear/logistic regression; SVM/discriminant analysis; Decision trees; NEURAL NETS
  • 37. Image recognition has been the most commonly used AI application • The concept of the pixel and trying to predict pixels within a picture • Business applications could include insurance claim processing, property management claims and others
  • 38. AI and Natural language processing • Uses AI to generate text based on historical text • Utilizes recurrent neural net methodology, unlike image recognition, which uses convolutional neural net methodology. • Many applications today: – Speech recognition (Echo, Siri) – Analysis of historical documentation • Medical diagnoses • Review of legal cases and precedents
  • 39. But why is AI now a game changer? • Two keys to success: – Extremely large volumes of data – Large signal to noise ratio
  • 40. My Book. 'Boire provides a straightforward and disciplined overview of the practice of data mining. Whether you are dealing with large or small data sets, tomorrow's business leaders will be the ones that extract the most value from their customer information. Boire leverages his extensive experience as a practitioner to help the reader take a measured approach while providing a unique view of data mining.' - Bryan Pearson, President and Chief Executive Officer, LoyaltyOne. 'Terms like 'analytics,' 'data mining,' and 'modeling' evoke fear in the heart of many a traditional marketer. Instead of focusing solely on techniques, Boire organizes his content around types of management decisions. In the process, he demystifies and explains the rationale and methods used in terms that anyone can understand. A must read for those getting started or looking to round out their expertise.' - Kenneth B. Wong, Distinguished Professor of Marketing, Queen's University, School of Business. 'Business managers and decision makers have been in need of a book on data mining, and voila! This industry overview is unique in serving the needs of the consummate businessperson, differentiating it from the many introductions for would-be hands-on, technical practitioners. Boire has formed a conceptually rich and insightful compendium that delivers a pragmatic perspective on both the tactical and strategic value of data mining and predictive analytics.' - Eric Siegel, founder of Predictive Analytics World and author of Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
  • 41. Data discovery sample: migration example • 60% of Premiere members are in decline or inactive – a concerning trend that should be addressed with new marketing initiatives to re-engage these members
    Current Year Activity
    Value Segment in Previous Yr. | Growth | Stable | Decline | Inactive | Total
    Premiere | --- | 41% | 53% | 7% | 18,406
    High | 8% | 23% | 44% | 24% | 36,811
    Medium | 9% | 21% | 14% | 56% | 73,620
    Low | 14% | 11% | --- | 75% | 55,215
    Total | 9% | 20% | 20% | 50% | 184,052
  • 42. Data discovery: sample potential decision matrix – Sample Roadmap of Marketing Opportunities – Opportunity to prioritize activities based on $ opportunity
    Segment | # Customers | Avg. Value | Strategy | Projected RR | $ Growth Potential | Segment Potential
    New | 54,752 | $80.40 | Dedicated Welcome & Onboarding | 15.00% | 40% | $264,123.65
    Premiere in Decline | 9,755 | $623.53 | Retention | 5.00% | 20% | $60,825.35
    Potential Premiere | 22,086 | $147.08 | Up-sell | 10.00% | 30% | $97,452.27
    Underdeveloped Concession | 15,203 | $248.00 | Cross-sell | 2.50% | 15% | $14,138.79
    Etc….
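The Segment Potential column appears to be the product of the four inputs beside it: customers, average value, projected response rate and growth potential. The slide does not state the formula, so treat that as an inference that happens to reproduce all four printed values. The sketch below applies it and ranks the segments by dollar opportunity; `segment_potential` is a hypothetical helper name.

```python
def segment_potential(customers, avg_value, response_rate, growth):
    """Inferred formula: customers x average value x response rate x growth potential."""
    return round(customers * avg_value * response_rate * growth, 2)


# (segment, customers, avg value, projected RR, $ growth potential) from the slide.
segments = [
    ("New", 54_752, 80.40, 0.15, 0.40),
    ("Premiere in Decline", 9_755, 623.53, 0.05, 0.20),
    ("Potential Premiere", 22_086, 147.08, 0.10, 0.30),
    ("Underdeveloped Concession", 15_203, 248.00, 0.025, 0.15),
]

potential = {name: segment_potential(*inputs) for name, *inputs in segments}

# Prioritize marketing activity by descending dollar opportunity.
ranked = sorted(potential, key=potential.get, reverse=True)
for name in ranked:
    print(f"{name}: ${potential[name]:,.2f}")
```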
  • 43. • Predictive Modelling – Response (and Premium) – Cancellation – Contact Rate • Contact Management – 3.5 million credit card customers – 8 marketing sponsors, 21 products – 80 million targeted communications • Optimization – Cost per sale improved from $112 to $37 – Break-even period improved from 53 months to 7 months – Avg. premium improved 240%
  • 44. HBC: Ongoing Improvements • The table illustrates the improvement in the client's results over time: – Campaigns 1 & 2: no modeling was used – Campaigns 3-5: the model was applied for list selection – Campaigns 6-10: the model was applied, plus business rules from a contact management database
    Campaign | # Leads | # Sales | Total Cost | Cost/Sale | Avg Premium/Cust/Month | # Months to Break Even | Phase
    1 | 20,000 | 285 | $32,000.00 | $112.28 | $2.10 | 53 | Pre-Modeling
    2 | 20,000 | 303 | $32,000.00 | $105.61 | $2.34 | 45 | Pre-Modeling
    3 | 40,000 | 1134 | $64,000.00 | $56.44 | $4.17 | 14 | Modeling Only
    4 | 30,000 | 1029 | $50,000.00 | $48.59 | $4.44 | 11 | Modeling Only
    5 | 30,000 | 1084 | $54,750.00 | $50.51 | $4.06 | 12 | Modeling Only
    6 | 15,000 | 806 | $30,446.00 | $37.77 | $3.89 | 10 | Modeling & Contact Management
    7 | 15,000 | 757 | $28,442.00 | $37.57 | $4.79 | 8 | Modeling & Contact Management
    8 | 15,000 | 727 | $26,678.00 | $36.70 | $4.72 | 8 | Modeling & Contact Management
    9 | 15,000 | 690 | $28,064.00 | $40.67 | $4.10 | 10 | Modeling & Contact Management
    10 | 15,000 | 725 | $27,225.00 | $37.55 | $5.07 | 7 | Modeling & Contact Management
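The two derived columns in the campaign table can be rebuilt from the raw ones. Cost per sale is total cost divided by sales; the break-even figure is not defined on the slide, but dividing cost per sale by the average monthly premium and rounding to the nearest month matches the printed values, so the sketch below uses that as an assumption. `campaign_metrics` is a hypothetical helper name.

```python
def campaign_metrics(total_cost, sales, avg_monthly_premium):
    """Return (cost per sale, months to break even) for one campaign."""
    cost_per_sale = total_cost / sales
    # Assumption: break-even = months of premium needed to repay the cost of one sale.
    months_to_break_even = cost_per_sale / avg_monthly_premium
    return round(cost_per_sale, 2), round(months_to_break_even)


# Campaigns 1, 6 and 10 from the slide: (total cost, # sales, avg premium per month).
pre_modeling = campaign_metrics(32_000.00, 285, 2.10)
first_contact_mgmt = campaign_metrics(30_446.00, 806, 3.89)
latest = campaign_metrics(27_225.00, 725, 5.07)
print(pre_modeling, first_contact_mgmt, latest)
```

Tracking both numbers per campaign makes the effect of each change (adding the model, then adding contact management rules) visible as a step in the series rather than a single before/after comparison.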
  • 45. HBC Targeting - Integrating Models. Axes: Likelihood to Answer Call (High/Low) and Likelihood to Purchase Product (High/Low). Quadrants: Tele-market; Do Not Market; Tele-market different product; Same product different channel
  • 46. IS big data ALWAYS THE ANSWER? INSURANCE TELEMATICS CASE
  • 47. P&C INSURANCE MODEL – USE TELEMATICS DATA? This predictive model did NOT use telematics data. It is already a very good model. If telematics data is available, should it be used? Consider… • only ~25% of policyholder base has installed sensor in their car • this is a biased sample – sensors are typically installed by those with conservative driving behaviour. Can we significantly increase the lift shown in this table with telematics data? [Figure: Premium vs. Auto Loss Model, comparison of cumulative % of losses, EA Model]
  • 48. FINAL THOUGHTS. (1) BIG DATA: PANACEA FOR ALL PROJECTS (2) BIG DATA: USE JUDICIOUSLY (3) ULTIMATELY, CONSIDER HOW TO: GOAL: Create the right analytical data set for the business problem