Copyright © 2012 Red Olive Ltd, All Rights Reserved.
Analytics Academy – “Statistical thinking”
(Client: a household-name media company)
Information and Data Management
Contents
1st Day
Introduction
The research process: CRISP-DM
Analysis
Reporting vs. modelling
Is there an effect?
Is there a single cause?
Forecasting
Could there be more than one cause?
2nd Day
Working together
The Data Academy
Sharing results
What to show
How to show it
Next steps
A further project
Introduction: Getting your data to speak
Many business challenges
Customer Acquisition & Retention
Marketing Efficiency & Advertising Revenue
Cost to Serve & Profitability
Promotion & Pricing Optimisation
Demand Forecasting
Fraud Detection
Before jumping into the data...
Do you know what you are looking for?
The business need for the analysis is formulated and confirmed.
The question that the stakeholder needs answered is articulated.
This steers away from producing the ‘right answer to the wrong question’.
Do you know what you will do with the answers you find?
The desired outputs are shaped in detail so that the analysis produces them in a format that is fit for purpose.
Actual outputs can be easily integrated into the stakeholders’ target documents, systems or processes.
Do you have a way to evaluate success?
Can you measure the current situation in terms of money, time or units?
Do you have a way of tracking the results of your work in the same units?
Getting your data to speak
Translate business objective into analysis goal…
Gather ideas from people in your business about the cause → effect relationships.
Gather impressions about the different classes or types of events.
Consider both positive and negative outcomes.
Translate these ideas/impressions into data.
What would data have to look like to detect the effects and trends people believe in?
The research process: CRISP-DM
CRISP-DM process
Business data for analytics
1 Develop business understanding
2 Develop data understanding
3 Prepare data
4 Develop model
5 Evaluate results
6 Deploy live model
Key: data set; process stage; flow between stages
CRISP-DM
Business understanding
Determine objectives
Establish use cases
Summarise current situation
Determine project goals
Map business goals to data problem
Estimate current value so that ROI can be calculated
Create project plan
Data understanding
Collect initial data
Document the real meaning of each data field
Capture baseline SQL
Explore the data
Check data quality
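The "Explore the data" and "Check data quality" steps can begin with a very small profile of each field. The following is a minimal sketch in Python, assuming the records arrive as a list of dictionaries; the helper name `profile_fields`, the field names and the values are all made up for illustration and are not part of the course material.

```python
from collections import Counter

def profile_fields(records):
    """Return per-field counts of missing values and distinct non-missing values."""
    fields = sorted({name for row in records for name in row})
    profile = {}
    for name in fields:
        values = [row.get(name) for row in records]
        # Treat None and the empty string as missing (an assumption for this sketch)
        missing = sum(1 for v in values if v is None or v == "")
        present = Counter(v for v in values if v is not None and v != "")
        profile[name] = {
            "missing": missing,
            "distinct": len(present),
            "most_common": present.most_common(1)[0][0] if present else None,
        }
    return profile

# Hypothetical visit records (illustrative only)
visits = [
    {"section": "NEWS", "subscribed": "N"},
    {"section": "SPORT", "subscribed": ""},
    {"section": "NEWS", "subscribed": "Y"},
    {"section": None, "subscribed": "N"},
]

for field, stats in profile_fields(visits).items():
    print(field, stats)
```

A profile like this is only a starting point: it flags fields with many gaps or unexpected categories, which can then be checked against the documented meaning of each field.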
Analysis: Reporting vs. Modelling
Reporting vs. Modelling
Reporting | Modelling
What was the rate of net growth? | Why did we have higher/lower rate?
Information based on user-directed queries (hypothesis testing) | Knowledge based on finding unknown relationships (hypothesis generation)
Historical Analysis | Predictive Analysis
Monitors performance measures | Determines performance measures
Reactive | Proactive
Map analysis goal to modelling technique
Data Input | Target Output | Algorithm | Goal | Results | Find Most Important Inputs? | Easy to interpret/visualise?
Numeric and/or Symbolic | Symbolic | C5.0 | Predicting / Profiling | Decision Tree or Rule Set with prediction and confidence | Yes | Yes
Numeric and/or Symbolic | Numeric or Symbolic | C&RT | Predicting / Profiling | Decision Tree or Rule Set with prediction and confidence | Yes | Yes
Numeric | Numeric | Linear Regression | Predicting / Forecasting | Equation for prediction with coefficients | Yes | No
Numeric and/or Symbolic | Symbolic | Logistic Regression | Predicting / Probability | Equation for prediction of probability and associated coefficients | No [1] | No
Numeric and/or Symbolic | Numeric or Symbolic | Neural Network | Predicting / Probability | Prediction and relative importance of input neurons | No [2] | No
Numeric and/or Symbolic | None | Kohonen Map | Clustering / Segmentation | Cluster Membership and deviation | No | Yes
Numeric | None | K-Means | Clustering / Segmentation | Cluster Membership with cluster centers | No | Yes
Numeric and/or Symbolic | None | Two-Step | Clustering / Segmentation | Cluster Membership with cluster centers | No | Yes
Symbolic | Symbolic | Apriori [4, 5] | Association Detection | Association rules with confidence | Yes | Yes
Numeric and/or Symbolic | Time to event | Kaplan-Meier | Strategic Planning | Survivor / Hazard Curve | No | Yes
Numeric and/or Symbolic | Time to event | Cox Regression | Tactical Interventions | Survivor / Hazard Curve | No | Yes
Sometimes we put the data into the model and see what happens. Other times we manipulate the inputs (or the outputs) in some way so as to give the algorithm more information to work with.
By combining multiple techniques, we can often gain better insight into the nature of potential solutions to a business problem and, we hope, reach a more useful result.
Since more than one approach may be used to address a single business problem, the same data may be used to address a wide range of applications. Which application it serves depends on which model you choose, how you manipulate the data in the file, and which input or target variables you choose.
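One way to use the table is as a lookup: start from the kind of target you have and shortlist the algorithms that accept it. The sketch below is only an illustration in Python; it transcribes a few rows of the table, and the names `TECHNIQUES` and `shortlist` are hypothetical helpers, not part of any modelling tool.

```python
# A few rows transcribed from the table: (algorithm, target output, goal)
TECHNIQUES = [
    ("C5.0", "symbolic", "predicting / profiling"),
    ("C&RT", "numeric or symbolic", "predicting / profiling"),
    ("Linear Regression", "numeric", "predicting / forecasting"),
    ("Logistic Regression", "symbolic", "predicting / probability"),
    ("K-Means", "none", "clustering / segmentation"),
    ("Apriori", "symbolic", "association detection"),
    ("Kaplan-Meier", "time to event", "strategic planning"),
]

def shortlist(target_kind):
    """Return the algorithms whose target-output column accepts target_kind."""
    wanted = target_kind.lower()
    return [
        name
        for name, target, _goal in TECHNIQUES
        if wanted in target.split(" or ")
    ]

print(shortlist("symbolic"))  # candidates for a yes/no style target
print(shortlist("none"))      # no target: clustering
```

The shortlist narrows the choice; the goal and interpretability columns then help decide among the remaining candidates.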
Analysis: Basic statistical terms
Level of measurement
Level of measurement | Summary statistic | Visualization
Categorical or Nominal | Mode | Bar chart, pareto chart
Ranked or Ordinal | Median, percentile | Bar chart
Numeric or Scale | Mean or average | Histogram, line graph, bubble chart
[Figure: example charts; value axis 0 to 50, categories 1st Qtr to 4th Qtr]
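The pairing of level of measurement with summary statistic can be sketched in a few lines of Python using the standard `statistics` module. The function name `summarise`, the level labels and the sample data are assumptions made for this illustration. `median_low` is used for ordinal data so that the result is always one of the observed ranks.

```python
import statistics

def summarise(values, level):
    """Pick the summary statistic suggested for each level of measurement."""
    if level == "nominal":
        return statistics.mode(values)        # most common category
    if level == "ordinal":
        return statistics.median_low(values)  # middle rank, always an observed value
    if level == "scale":
        return statistics.mean(values)        # arithmetic average
    raise ValueError(f"unknown level of measurement: {level}")

# Illustrative data only
sections = ["NEWS", "SPORT", "NEWS", "TECH"]  # nominal
ratings = [1, 2, 2, 3, 5]                      # ordinal (1 = poor, 5 = excellent)
minutes = [2.0, 3.5, 4.0, 10.5]                # scale

print(summarise(sections, "nominal"))
print(summarise(ratings, "ordinal"))
print(summarise(minutes, "scale"))
```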
Inserting functions
Click in a cell
Go to the Insert menu
Choose Functions…
Select a category
Click on a function and look at the brief help (first-letter search works)
Click OK to paste
Click “Help on this function” for more information and a worked example
Numbers that describe a distribution
Statistic | Function | Definition
Mode | =MODE | The most common value
Percent | =COUNT | What proportion of the cases are in this group? COUNT in the group divided by total COUNT.
Percentile | =PERCENTILE, =PERCENTRANK | How far down the list of an ordered set are you?
Median | =MEDIAN | The middle value of an ordered set. The 50th percentile.
Mean | =AVERAGE | Add all the values and divide by the count
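The "Percent" and "Percentile" definitions can be written out directly. This is a minimal Python sketch under stated assumptions: the function names and data are invented for illustration, and `percentile_rank` uses one common definition (the share of values strictly below x), which will not always match the interpolated figure a spreadsheet function returns.

```python
def percent_in_group(values, group):
    """COUNT in the group divided by total COUNT, as a percentage."""
    return 100.0 * sum(1 for v in values if v == group) / len(values)

def percentile_rank(values, x):
    """Percentage of values strictly below x (one common definition)."""
    return 100.0 * sum(1 for v in values if v < x) / len(values)

# Illustrative data only
sections = ["NEWS", "SPORT", "NEWS", "TECH"]
dwell_minutes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(percent_in_group(sections, "NEWS"))   # share of visits in the NEWS group
print(percentile_rank(dwell_minutes, 8))    # how far down the ordered list 8 sits
```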
Analysis: Is there an effect?
Discussion of crosstabs
A method to test whether two variables have a non-random relationship.
Also called chi-square analysis, after the name of the statistic that is calculated: χ² (also written X²).
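To make the statistic concrete, here is a minimal Python sketch of the standard Pearson chi-square calculation for a table of counts: each expected count is row total × column total ÷ grand total, and the statistic sums (observed − expected)² ÷ expected over all cells. The function name `chi_square` and the counts are made up for illustration.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count we would expect in this cell if rows and columns were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof

# Made-up counts: rows are two site sections, columns are subscribed YES / NO
counts = [[10, 20],
          [20, 10]]
stat, dof = chi_square(counts)
print(round(stat, 2), dof)
```

With one degree of freedom the 5% critical value is about 3.84, so a statistic above that level suggests the pattern is unlikely to be random. The calculation assumes every expected count is greater than zero.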
Discussion of crosstabs
Is there a relationship between the section you are reading on the website and whether or not you are motivated to subscribe?
Or are the numbers just due to the normal visit pattern on the site?
Data: Subscribe y/n on this visit to this section
Section | YES | NO | TOTAL
HOME
NEWS
SPORT
FINANCE
COMMENT
BLOGS
CULTURE
TRAVEL
LIFESTYLE
FASHION
TECH
Offers
TOTAL
Crosstabs example
Actual counts
Calculated %
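The step from actual counts to calculated percentages can be sketched as follows. This is an illustration in Python only: the counts are invented, and `subscribe_rates` is a hypothetical helper. Comparing each section's rate with the overall rate shows which sections stand out from the normal visit pattern.

```python
def subscribe_rates(counts):
    """Map each section to the percentage of its visits that ended in a subscription."""
    return {
        section: 100.0 * yes / (yes + no)
        for section, (yes, no) in counts.items()
    }

# Made-up (YES, NO) counts per section
counts = {"NEWS": (30, 970), "SPORT": (10, 490), "FINANCE": (20, 180)}
rates = subscribe_rates(counts)

# Overall subscription rate across all sections, for comparison
total_yes = sum(yes for yes, _no in counts.values())
total_all = sum(yes + no for yes, no in counts.values())
overall = 100.0 * total_yes / total_all

for section, rate in rates.items():
    print(f"{section}: {rate:.1f}% (overall {overall:.1f}%)")
```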
Analysis: Is there a single cause?
Predictive modelling
When we seek to predict something, what we are really saying is that we have in mind what the cause is and we are trying to predict how likely the effect is.
Modelling techniques do not make predictions on their own. Analysts structure the data input so that the model can use it in a cause and effect way.
Thus, it is important to make sure that all of the inputs into a model precede the output in time. You can’t put the effect before the cause.
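The rule that inputs must precede the output in time can be enforced when the data is prepared. A minimal Python sketch, assuming each candidate input carries a date; the function name, the events and the dates are all hypothetical:

```python
from datetime import date

def inputs_before_outcome(events, outcome_date):
    """Keep only events dated strictly before the outcome they are meant to predict."""
    return [e for e in events if e["date"] < outcome_date]

# Hypothetical visitor history; the subscription (the effect) happened on 10 March
subscribed_on = date(2012, 3, 10)
history = [
    {"date": date(2012, 3, 1), "event": "read FINANCE"},
    {"date": date(2012, 3, 9), "event": "read NEWS"},
    {"date": date(2012, 3, 12), "event": "opened welcome email"},  # after the effect
]

usable = inputs_before_outcome(history, subscribed_on)
print([e["event"] for e in usable])
```

The welcome email is dropped because it happened after the subscription: including it would let the model "predict" the effect from something the effect itself caused.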
Interpret the results, then make them better
Try to get to models early in the process, even before you think you are ready.
Models can tell you things about the data that you can’t see “just by looking”.
Build lots of models.
Throw away the ones that you are done with.
Refine models based on what you learn at each iteration.
Algorithms (within their limitations) are objective.
Could there be more than one cause?
A topic for another course…
Advanced analytics
Structure the data into before and after
Pick a target
Test multiple input hypotheses at once
Forecasting:
ARIMA allows for including multiple time series inputs
Special events
Weather
Economic trends
Multivariate propensity
Discover different predictive segments
Works best with predicting Y/N actions rather than values
Sharing results: What to show
Inspire them
Your job is to inspire
Your job is not to convince or teach
Lead with the important and interesting findings
Explain in general terms
Leave the details at the Data Academy
Translating analysis results into business terms
[Diagram labels: Business Objectives; Analysis Goals; Modelling & Evaluation (Accuracy & Significance); Analysis Results; Business Terms]
Meaningful
Relevant
Actionable
Quantified
What to leave in the Data Academy
How you did it
How long it took
Statistical methods
Problems you had
Caveats related to data
Dirty data
[Diagram labels: Analysis Goals; Modelling & Evaluation (Accuracy & Significance); Analysis Results]
Sharing results: How to show it
Design – Colour
Can everyone read it?
Is someone colour blind?
Does someone have corrective lenses?
Will it print in black and white?
Test print
Black text on dark colours, including red, will not print. Use white text instead.
[Figure: text-on-colour examples labelled “Wrong” and “Better”]
[Figure: two bar charts of East, West and North by quarter (1st Qtr to 4th Qtr, value axis 0 to 100), labelled “Better” and “Not as Nice”]
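A test print is the real check, but the black-and-white question can be roughly anticipated in code. The Python sketch below converts a background colour to an approximate grey level using the Rec. 601 luma weights and picks whichever of black or white text differs more from it. This is an assumption-laden approximation: real printers and greyscale conversions vary, and the function names are invented for illustration.

```python
def grey_level(rgb):
    """Approximate grey level (0-255) of an RGB colour using Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def text_colour_for(background):
    """Choose black or white text, whichever differs more from the background in grey."""
    grey = grey_level(background)
    # Distance to white is (255 - grey); distance to black is grey itself
    return "white" if (255 - grey) > grey else "black"

for name, rgb in [("red", (255, 0, 0)), ("yellow", (255, 255, 0)), ("navy", (0, 0, 128))]:
    print(name, round(grey_level(rgb)), text_colour_for(rgb))
```

Under this approximation red renders as a fairly dark grey, which is consistent with the advice to use white text on it.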
Design – Colour for the colourblind
• http://www.colorbrewer2.org/
Contact information
Please direct enquiries to Jefferson Lynch:
jefferson.lynch@red-olive.co.uk
Office: 01256 831100
Mobile: 07860 353027

1505 Statistical Thinking course extract

  • 1. Copyright © 2012 Red Olive Ltd, All Rights Reserved. 11 Analytics Academy – “Statistical thinking” (Client: a household-name media company) Information and Data Management
  • 2. Copyright © 2012 Red Olive Ltd, All Rights Reserved. 22 Contents 1st Day Introduction The research process: CRISP-DM Analysis Reporting vs. modelling Is there an effect? Is there a single cause? Forecasting Could there be more than one cause? 2nd day Working together The Data Academy Sharing results What to show How to show it Next steps A further project Copyright © 2012 Red Olive Ltd, All Rights Reserved. 2
  • 3. Copyright © 2012 Red Olive Ltd, All Rights Reserved. 33 Introduction: Getting your data to speak Copyright © 2012 Red Olive Ltd, All Rights Reserved. 3
  • 4. Copyright © 2012 Red Olive Ltd, All Rights Reserved. 44 Customer Acquisition & Retention Marketing Efficiency & Advertising Revenue Cost to Serve & Profitability Promotion & Pricing Optimisation Demand Forecasting Fraud Detection Many business challenges 4Copyright © 2012 Red Olive Ltd, All Rights Reserved.
  • 5. Copyright © 2012 Red Olive Ltd, All Rights Reserved. 555 Do you know what you are looking for? The business need for the analysis is formulated and confirmed. The question that the stakeholder needs an answer for is articulated. Steering away from analysing ‘right answer to the wrong question’. Do you know what you will do with the answers you find? The desired outputs from the analysis are shaped in detail to ensure that the analysis produces outputs in a format that is fit-for-purpose. Actual outputs can be easily integrated into the stakeholders’ target documents, systems or processes. Do you have a way to evaluate success? Can you measure the current situation in terms of money, time or units? Do you have a way of tracking the results of your work in the same units? Before jumping into the data... Copyright © 2012 Red Olive Ltd, All Rights Reserved.
  • 6. Copyright © 2012 Red Olive Ltd, All Rights Reserved. 666 Gather ideas from people in your business about the cause –> effect relationships. Gather impressions about the different classes or types of events. Consider both positive and negative outcomes. Translate these ideas/impressions into data What would data have to look like to detect the effects and trends people believe in? Translate business objective into analysis goal… Getting your data to speak Copyright © 2012 Red Olive Ltd, All Rights Reserved.
  • 7. Copyright © 2012 Red Olive Ltd, All Rights Reserved. 77 The research process: CRISP-DM Copyright © 2012 Red Olive Ltd, All Rights Reserved. 7
  • 8. Copyright © 2012 Red Olive Ltd, All Rights Reserved. 88 CRISP-DM process Business data for analytics 1 Develop business understanding 2 Develop data understanding 3 Prepare data 4 Develop model 5 Evaluate results 6 Deploy live model Key: Data set Process stage Flow between stages Copyright © 2012 Red Olive Ltd, All Rights Reserved. 8
  • 9. Copyright © 2012 Red Olive Ltd, All Rights Reserved. 99 CRISP-DM Business understanding Determine objectives Establish use cases Summarise current situation Determine project goals Map business goals to data problem Estimate current value so that ROI can be calculated Create project plan Data understanding Collect initial data Document the real meaning of each data field Capture baseline SQL Explore the data Check data quality
  • 10. Copyright © 2012 Red Olive Ltd, All Rights Reserved. 1010 Analysis: Reporting vs. Modelling Copyright © 2012 Red Olive Ltd, All Rights Reserved. 10
  • 11. Copyright © 2012 Red Olive Ltd, All Rights Reserved. 111111 vs. What was the rate of net growth? Why did we have higher/lower rate? Information based on user-directed queries (hypothesis testing) Knowledge based on finding unknown relationships (hypothesis generation) Historical Analysis Predictive Analysis Monitors performance measures Determines performance measures Reactive Proactive Reporting vs. Modelling ModellingReporting Copyright © 2012 Red Olive Ltd, All Rights Reserved.
  • 12. Map analysis goal to modelling technique

    | Data Input | Target Output | Algorithm | Goal | Results | Find Most Important Inputs? | Easy to interpret / visualise? |
    | --- | --- | --- | --- | --- | --- | --- |
    | Numeric and/or Symbolic | Symbolic | C5.0 | Predicting / Profiling | Decision Tree or Rule Set with prediction and confidence | Yes | Yes |
    | Numeric and/or Symbolic | Numeric or Symbolic | C&RT | Predicting / Profiling | Decision Tree or Rule Set with prediction and confidence | Yes | Yes |
    | Numeric | Numeric | Linear Regression | Predicting / Forecasting | Equation for prediction with coefficients | Yes | No |
    | Numeric and/or Symbolic | Symbolic | Logistic Regression | Predicting / Probability | Equation for prediction of probability and associated coefficients | No [1] | No |
    | Numeric and/or Symbolic | Numeric or Symbolic | Neural Network | Predicting / Probability | Prediction and relative importance of input neurons | No [2] | No |
    | Numeric and/or Symbolic | None | Kohonen Map | Clustering / Segmentation | Cluster Membership and deviation | No | Yes |
    | Numeric | None | K-Means | Clustering / Segmentation | Cluster Membership with cluster centers | No | Yes |
    | Numeric and/or Symbolic | None | Two-Step | Clustering / Segmentation | Cluster Membership with cluster centers | No | Yes |
    | Symbolic | Symbolic | Apriori [4, 5] | Association Detection | Association rules with confidence | Yes | Yes |
    | Numeric and/or Symbolic | Time to event | Kaplan-Meier | Strategic Planning | Survivor / Hazard Curve | No | Yes |
    | Numeric and/or Symbolic | Time to event | Cox Regression | Tactical Interventions | Survivor / Hazard Curve | No | Yes |

    Sometimes we put the data into the model and see what happens. Other times we manipulate the inputs (or the outputs) in some way so as to give the algorithm more information to work with. By combining multiple techniques, we can often gain better insight into the nature of potential solutions to a business problem and, hopefully, arrive at a more useful result. Since more than one approach may be used to address a single business problem, the same data may be used to address a wide range of applications. The outcome depends on which model you choose, how you manipulate the data in the file, and which input or target variables you choose.
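As an illustration only, the mapping on this slide can be treated as a small lookup: given the kind of target output you have, list the techniques worth considering. The sketch below transcribes a handful of rows from the slide; the names `TECHNIQUES` and `candidates` are hypothetical and not part of the course material.

```python
# A few rows transcribed from the slide: (algorithm, target output, goal).
# The structure and names here are illustrative assumptions.
TECHNIQUES = [
    ("C5.0", "symbolic", "predicting / profiling"),
    ("Linear Regression", "numeric", "predicting / forecasting"),
    ("Logistic Regression", "symbolic", "predicting / probability"),
    ("K-Means", "none", "clustering / segmentation"),
    ("Apriori", "symbolic", "association detection"),
    ("Kaplan-Meier", "time to event", "strategic planning"),
]


def candidates(target_output):
    """Return the algorithms whose target output matches the request."""
    return [algo for algo, target, _goal in TECHNIQUES if target == target_output]


symbolic_options = candidates("symbolic")
no_target_options = candidates("none")
```

Several techniques usually match a given target type, which is the slide's point: the final choice still depends on the goal and on how the inputs are prepared.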
  • 13. Analysis: Basic statistical terms
  • 14. Level of measurement

    | Level of measurement | Summary statistic | Visualization |
    | --- | --- | --- |
    | Categorical or Nominal | Mode | Bar chart, Pareto chart |
    | Ranked or Ordinal | Median, percentile | Bar chart |
    | Numeric or Scale | Mean or average | Histogram, line graph, bubble chart |

    [Example charts by quarter omitted]
  • 15. Inserting functions
    1. Click in a cell.
    2. Go to the Insert menu.
    3. Choose Functions…
    4. Select a category.
    5. Click on a function and look at the brief help (first-letter search works).
    6. Click OK to paste.
    7. Click "Help on this function" for more information and a worked example.
  • 16. Numbers that describe a distribution

    | Statistic | Function | Definition |
    | --- | --- | --- |
    | Mode | =MODE | The most common value. |
    | Percent | =COUNT | What proportion of the cases are in this group? COUNT in the group divided by total COUNT. |
    | Percentile | =PERCENTILE, =PERCENTRANK | How far down the list of an ordered set are you? |
    | Median | =MEDIAN | The middle value of an ordered set. The 50th percentile. |
    | Mean | =AVERAGE | Add all the values and divide by the count. |
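For readers working outside a spreadsheet, the same summary statistics can be sketched in Python with the standard library. The data below are invented for illustration, and `percentile_rank` is a hypothetical helper using one simple definition (the share of values strictly below a given value); spreadsheet functions may follow a slightly different convention.

```python
import statistics

# Hypothetical data: section read on each of ten visits (categorical).
sections = ["NEWS", "SPORT", "NEWS", "HOME", "NEWS",
            "SPORT", "HOME", "NEWS", "TECH", "SPORT"]
# Hypothetical data: minutes spent on the site per visit (numeric).
minutes = [2, 3, 3, 4, 5, 5, 5, 8, 12, 43]

# Mode: the most common value.
mode_section = statistics.mode(sections)

# Percent: count in the group divided by the total count.
percent_news = sections.count("NEWS") / len(sections)

# Median: the middle value of the ordered set (the 50th percentile).
median_minutes = statistics.median(minutes)

# Mean: add all the values and divide by the count.
mean_minutes = statistics.mean(minutes)


def percentile_rank(values, x):
    """Share of values strictly below x (one simple definition)."""
    below = sum(1 for v in values if v < x)
    return below / len(values)


rank_of_8 = percentile_rank(minutes, 8)
```

In this made-up sample the single 43-minute visit pulls the mean well above the median, which is one reason to look at both.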
  • 17. Analysis: Is there an effect?
  • 18. Discussion of crosstabs
    - A method to test whether two variables have a non-random relationship.
    - Also called chi-square analysis, after the name of the statistic that is calculated: χ² (also written X2).
  • 19. Discussion of crosstabs
    Is there a relationship between the section you are reading on the website and whether or not you are motivated to subscribe? Or are the numbers just due to the normal visit pattern on the site?
    Data: Subscribe y/n on this visit to this section.
    Table columns: Section, YES, NO, TOTAL.
    Table rows: HOME, NEWS, SPORT, FINANCE, COMMENT, BLOGS, CULTURE, TRAVEL, LIFESTYLE, FASHION, TECH, Offers, TOTAL.
  • 20. Crosstabs example
    [Two tables omitted: Actual counts; Calculated %]
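To make the crosstab test concrete, here is a minimal sketch of the chi-square calculation in plain Python. The counts are invented for illustration (the slide's own figures are not reproduced in this transcript), and `chi_square` is a hypothetical helper. For each cell, the expected count is row total × column total ÷ grand total, and the statistic sums (observed − expected)² ÷ expected over all cells.

```python
def chi_square(table):
    """Return (chi2, degrees_of_freedom) for rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Count we would expect if section and subscribing were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed_count - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows are sections, columns are (YES, NO) for "subscribed".
observed = {
    "NEWS": [30, 970],
    "SPORT": [20, 980],
    "FINANCE": [50, 950],
}

chi2, dof = chi_square(list(observed.values()))

# Standard table value for 2 degrees of freedom at the 5% level.
CRITICAL_5_PERCENT_DOF_2 = 5.991
looks_non_random = chi2 > CRITICAL_5_PERCENT_DOF_2
```

With these made-up counts the statistic is well above the 5% critical value, so the differences between sections would be unlikely to arise from the normal visit pattern alone.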
  • 21. Analysis: Is there a single cause?
  • 22. Predictive modelling
    When we seek to predict something, what we are really saying is that we have in mind what the cause is and we are trying to predict how likely the effect is. Modelling techniques do not make predictions on their own. Analysts structure the data input so that the model can use it in a cause and effect way. Thus, it is important to make sure that all of the inputs into a model precede the output in time. You can’t put the effect before the cause.
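One way to enforce "inputs precede the output in time" is to cut each visitor's history at the moment the effect occurs and keep only earlier events as inputs. The sketch below assumes a simple event log; the event names, dates, and the helper `split_inputs_and_target` are all hypothetical.

```python
from datetime import date

# Hypothetical event log for one visitor: (event date, event name).
events = [
    (date(2012, 3, 1), "read_sport"),
    (date(2012, 3, 4), "read_finance"),
    (date(2012, 3, 9), "subscribed"),             # the effect to predict
    (date(2012, 3, 10), "opened_welcome_email"),  # happens after the effect
]


def split_inputs_and_target(events, target_name):
    """Return (input event names, target flag), keeping only inputs that
    occurred strictly before the first occurrence of the target event."""
    target_dates = [d for d, name in events if name == target_name]
    if not target_dates:
        # No outcome observed: every event is a candidate input.
        return [name for _d, name in events], False
    cutoff = min(target_dates)
    inputs = [name for d, name in events if d < cutoff]
    return inputs, True


inputs, subscribed = split_inputs_and_target(events, "subscribed")
```

The welcome email is dropped because it follows the subscription; leaving it in would let the model "predict" the effect from one of its own consequences.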
  • 23. Try to get to models early in the process, even before you think you are ready.
    - Models can tell you things about the data that you can’t see “just by looking”.
    - Build lots of models. Throw away the ones that you are done with.
    - Refine models based on what you learn at each iteration.
    - Algorithms (within their limitations) are objective.
    - Interpret the results, then make them better.
  • 24. Could there be more than one cause?
    A topic for another course…
    - Advanced analytics
      - Structure the data into before and after
      - Pick a target
      - Test multiple input hypotheses at once
    - Forecasting: ARIMA allows for including multiple time series inputs
      - Special events
      - Weather
      - Economic trends
    - Multivariate propensity
      - Discover different predictive segments
      - Works best with predicting Y/N actions rather than values
  • 25. Sharing results: What to show
  • 26. Inspire them
    - Your job is to inspire.
    - Your job is not to convince or teach.
    - Lead with the important and interesting findings.
    - Explain in general terms.
    - Leave the details at the Data Academy.
  • 27. Translating analysis results into business terms
    [Diagram labels: Business Objectives; Analysis Goals; Modelling & Evaluation (Accuracy & Significance); Analysis Results; Business Terms]
    Results in business terms should be: Meaningful, Relevant, Actionable, Quantified.
  • 28. What to leave in the Data Academy
    - How you did it
    - How long it took
    - Statistical methods
    - Problems you had
    - Caveats related to data
    - Dirty data
    [Diagram labels: Analysis Goals; Modelling & Evaluation (Accuracy & Significance); Analysis Results]
  • 29. Sharing results: How to show it
  • 30. Design – Colour
    - Can everyone read it?
      - Is someone colour blind?
      - Does someone have corrective lenses?
    - Will it print in black and white?
      - Test print.
      - Black text on dark colours, including red, will not print. Use white text instead.
    [Text colour samples labelled: Wrong; Better; Better]
    [Two example bar charts of East, West and North by quarter (1st Qtr to 4th Qtr); one is labelled "Not as Nice"]
  • 31. Design – Colour for the colourblind
    http://www.colorbrewer2.org/
  • 32. Contact information
    Please direct enquiries to Jefferson Lynch: jefferson.lynch@red-olive.co.uk
    Office: 01256 831100
    Mobile: 07860 353027