Predictive Analytics:
Extracts from Red Olive foundational course
For more details or to speak about a tailored
course for your organisation please contact:
Jefferson Lynch: jefferson.lynch@red-olive.co.uk
+44 1256 831100
December 2014
Analytics and
Data Management
Contents
What makes a great analysis?
Measuring relationships between variables
Profiling
What is data mining?
The data mining process
Data mining techniques
Discussion – next steps for data mining
Back-up slides
Introduction to descriptive statistics
Copyright © 2014 Red Olive Ltd, All Rights Reserved.
Some examples
Monitoring Trends
Traffic Disruption in London
Information from Transport for London
Oracle Day presentation, 6 Nov 2012
[Chart annotations: Christmas 2011; Queen’s Diamond Jubilee; Road works ban starts (1st July 2012); London 2012 Olympic and Paralympic Games; Winter-time road works / end FY]
Geographical Mash-ups
Visualising:
Connections between businesses in East London
Based on:
Streams of Twitter data, tracking relationships, mentions
and retweets
Source: http://www.techcitymap.com/index.html#/
Census Analysis
Original source: ONS data visualisation centre
http://www.ons.gov.uk/ons/interactive/index.html
Census 2011: Explore population changes in your area
Source: The Telegraph online
Interactive tool for comparing areas on their 2001 and 2011 demographic profiles
http://www.telegraph.co.uk/earth/greenpolitics/population/9403239/Census-2011-Explore-the-population-changes-in-your-area.html
Measuring relationships
between variables
• In order to start making ‘connections’ we need to investigate
relationships between variables
• Starting point: relationships between two variables at a time
• Multivariate techniques allow us to investigate relationships between
many variables
• The appropriate measure of relationship depends on the type of data
that you’re analysing – primarily whether scale (numeric) or nominal
(categorical)
Measures of relationship
Scale (numeric) data
Correlation quantifies the linear relationship between variables
in scatter plots
+1 = exact positive relationship
0 = no relationship
-1 = exact negative relationship
[Figure: example scatter plots illustrating each case]
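The three reference values above can be reproduced with a short calculation. The following is a minimal sketch in plain Python, not taken from the course material: the function name `pearson` and the small data sets are illustrative assumptions. It also shows why a value of 0 means only "no linear relationship": the curved data set is perfectly determined by x yet has zero correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Illustrative (made-up) data sets
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]      # exact positive linear relationship
falling = [10 - 3 * x for x in xs]    # exact negative linear relationship
centred = [-2, -1, 0, 1, 2]
curved = [x * x for x in centred]     # exact, but non-linear, relationship

r_rising = pearson(xs, rising)
r_falling = pearson(xs, falling)
r_curved = pearson(centred, curved)

print(r_rising, r_falling, r_curved)
```

In Excel, the equivalent calculation is the CORREL function applied to two ranges.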
Correlation coefficient
Takes values between -1 and +1
The correlation will rarely be exactly +1 or -1, as this would mean that one variable was an exact linear function of the other
Likewise the correlation is rarely exactly 0, because a slight relationship can occur by chance
Correlation measures only the extent of a linear relationship, so it needs to be handled with care: very different patterns of data can share the same value
[Figure: four sets of data with the same correlation of 0.816]
For correlation: Excel function CORREL
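The slide does not name the four data sets, but a correlation of 0.816 shared by four differently shaped scatter plots matches the well-known Anscombe's quartet. The sketch below assumes that this is the data shown; the values are the published quartet, not figures from the course, and `pearson` is an illustrative helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Anscombe's quartet (assumed to be the data behind the slide's figure)
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

correlations = {name: pearson(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in correlations.items():
    print(f"Set {name}: r = {r:.3f}")
```

Set II is a smooth curve and set IV is driven by a single extreme point, which is why the coefficient should always be read alongside the scatter plot.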
What is data mining?
Two main types of
data mining model
Type 1: Models driven by a Target Variable
e.g. Which site visitors are likely to subscribe?
- Implies building a Predictive Model
- ‘Directed’ Data Mining Techniques
Type 2: Models with no Target Variable
e.g. How does the subscriber base segment?
- Implies a Descriptive Model
- ‘Undirected’ Data Mining Techniques
Gains Chart – based on representative
evaluation sample
[Chart: Gains Chart, Churn Model. x-axis: cumulative % of base (0% to 100%); y-axis: cumulative % of respondents (0% to 100%); lines: prediction, random, optimal]
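The points on a gains chart can be derived by ranking the evaluation sample by model score and accumulating the respondents captured. The following is a minimal sketch under stated assumptions: the function name `cumulative_gains`, the scores, and the outcomes are all hypothetical and not from the course. The "random" line is simply the diagonal, and the "optimal" line is obtained by ranking on the true outcome itself.

```python
def cumulative_gains(scores, outcomes, steps=10):
    """Return (share of base, share of respondents captured) points.

    scores   -- model scores, higher means more likely to respond
    outcomes -- 1 for a respondent (e.g. a churner), 0 otherwise
    """
    # Rank the sample from highest to lowest score
    ranked = [o for _, o in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    total = sum(ranked)
    points = [(0.0, 0.0)]
    for i in range(1, steps + 1):
        cutoff = round(n * i / steps)
        points.append((cutoff / n, sum(ranked[:cutoff]) / total))
    return points


# Hypothetical evaluation sample of ten customers
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

prediction = cumulative_gains(scores, outcomes, steps=5)
optimal = cumulative_gains(outcomes, outcomes, steps=5)  # perfect ranking
random_line = [(i / 5, i / 5) for i in range(6)]         # no model at all

for base, captured in prediction:
    print(f"{base:.0%} of base -> {captured:.0%} of respondents")
```

The further the prediction line sits above the diagonal, and the closer to the optimal line, the better the model ranks the base.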
Data mining techniques and
where they can be applied
Techniques to be discussed
Predictive
Forecasting
Decision trees
Regression models
Descriptive
Factor analysis
Cluster analysis
Affinity analysis
Techniques on individual-level data
Data mining methods
Example Decision Tree
Target Variable: Good/Bad Credit Rating
Best predictor: Income Level
2nd best predictor: Number of credit cards
Final predictor: Age
End nodes: No further splits
Highly significant
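The idea of a "best predictor" can be illustrated by scoring each candidate variable on how cleanly it separates good from bad ratings. The sketch below uses Gini impurity as the splitting criterion; this is an assumption made for illustration, since the slide does not say which criterion its tree uses (the "highly significant" label suggests a significance-test-based method). The records and field names are invented.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels (0 = perfectly pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(records, predictor, target):
    """Average impurity of the groups formed by splitting on one predictor."""
    groups = defaultdict(list)
    for record in records:
        groups[record[predictor]].append(record[target])
    n = len(records)
    return sum(len(g) / n * gini(g) for g in groups.values())


def best_predictor(records, predictors, target):
    """Predictor whose split leaves the least impurity in the target."""
    return min(predictors, key=lambda p: weighted_gini(records, p, target))


# Hypothetical credit records (not from the course material)
records = [
    {"income": "high", "cards": "few", "age": "older", "rating": "good"},
    {"income": "high", "cards": "many", "age": "younger", "rating": "good"},
    {"income": "high", "cards": "few", "age": "younger", "rating": "good"},
    {"income": "high", "cards": "many", "age": "older", "rating": "good"},
    {"income": "low", "cards": "many", "age": "younger", "rating": "bad"},
    {"income": "low", "cards": "many", "age": "older", "rating": "bad"},
    {"income": "low", "cards": "few", "age": "younger", "rating": "bad"},
    {"income": "low", "cards": "few", "age": "older", "rating": "good"},
]

predictors = ["income", "cards", "age"]
first_split = best_predictor(records, predictors, "rating")
income_impurity = weighted_gini(records, "income", "rating")
print(first_split, income_impurity)
```

A full tree repeats this choice within each resulting group until no split improves the separation, which corresponds to the end nodes in the example.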
Regression Example
Source: The Times
24/11/2012
Regression
Model
The affinity tile map
Strengths of affinities are displayed using a ‘hot-cold’ colour palette
By clicking on a tile, details of the pair of products and their affinity are revealed
Source: Teradata
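Each tile in such a map corresponds to one pair of products and a number describing how strongly they go together. The slide does not state which measure is used; the sketch below assumes lift, a common choice, where values above 1 mean the pair is bought together more often than chance would suggest. The baskets and the function name `pair_lift` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Lift for every product pair that appears together in at least one basket.

    lift(a, b) = P(a and b) / (P(a) * P(b))
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(set(basket)), 2)
    )
    return {
        pair: (count / n)
        / ((item_counts[pair[0]] / n) * (item_counts[pair[1]] / n))
        for pair, count in pair_counts.items()
    }


# Hypothetical shopping baskets
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "tea"},
    {"butter", "jam"},
]

lifts = pair_lift(baskets)
for pair, lift in sorted(lifts.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(lift, 2))
```

In a tile map, the highest lifts would take the "hot" end of the palette and values near or below 1 the "cold" end.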
