201412 Predictive Analytics Foundation course extract
1. Predictive Analytics:
Extracts from Red Olive foundational course
For more details or to speak about a tailored
course for your organisation please contact:
Jefferson Lynch: jefferson.lynch@red-olive.co.uk
+44 1256 831100
December 2014
Analytics and
Data Management
2. Contents
What makes a great analysis?
Measuring relationships between variables
Profiling
What is data mining?
The data mining process
Data mining techniques
Discussion – next steps for data mining
Back-up slides
Introduction to descriptive statistics
Copyright © 2014 Red Olive Ltd, All Rights Reserved.
4. Monitoring Trends
Traffic Disruption in London
Information from Transport for London
Oracle Day presentation, 6 Nov 2012
[Chart: traffic disruption over time, annotated with: Christmas 2011; Queen’s Diamond Jubilee; road works ban starts (1st July 2012); London 2012 Olympic and Paralympic Games; winter-time road works / end FY]
6. Census Analysis
Original source: ONS data visualisation centre
http://www.ons.gov.uk/ons/interactive/index.html
Census 2011: Explore
population changes in your area
Source: The Telegraph online
Interactive tool for comparing areas on their 2001
and 2011 demographic profiles
http://www.telegraph.co.uk/earth/greenpolitics/population/9403239/Census-2011-Explore-the-population-changes-in-your-area.html
7. Measuring relationships
between variables
• In order to start making ‘connections’ we need to investigate
relationships between variables
• Start point - relationships between two variables at a time
• Multivariate techniques allow us to investigate relationships between
many variables
• The appropriate measure of relationship depends on the type of data
that you’re analysing – primarily whether scale (numeric) or nominal
(categorical)
8. Measures of relationship
Scale (numeric) data
Correlation quantifies the linear relationship between variables
in scatter plots
+1 = exact positive relationship
0 = no relationship
-1 = exact negative relationship
[Scatter plots illustrating each of the three cases]
9. Correlation coefficient
takes values between -1 and +1
The correlation will rarely be exactly +1 or -1, since that would
mean the variables were exactly linearly dependent on each other
Likewise, the correlation is rarely exactly 0, because a slight
relationship can occur by chance
Correlation measures only the extent of a linear relationship,
so it needs to be handled with care
Four sets of data with
the same correlation of 0.816
For Correlation:
Excel function CORREL
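As a rough illustration of the point about linear relationships, the sketch below computes the Pearson correlation coefficient by hand in Python. The function name `pearson` and the small data sets are made up for this example and are not part of the course material. The second data set has an exact but non-linear relationship (y is the square of x), yet its correlation comes out as zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Illustrative (made-up) data
linear = pearson([1, 2, 3], [2, 4, 6])                   # exact positive relationship
curved = pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])     # y = x squared
print(linear, curved)
```

The calculation follows the standard Pearson formula, which is the same quantity that Excel's CORREL function reports for two ranges. A zero result therefore rules out a linear relationship only, which is why a scatter plot is worth checking alongside the number.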
10. What is data mining?
11. Two main types of
data mining model
Type 1: Models driven by a Target Variable
e.g. Which site visitors are likely to subscribe?
- Implies building a Predictive Model
- ‘Directed’ Data Mining Techniques
Type 2: Models with no Target Variable
e.g. How does the subscriber base segment?
- Implies a Descriptive Model
- ‘Undirected’ Data Mining Techniques
12. Gains Chart – based on representative
evaluation sample
[Chart: Gains Chart, Churn Model. x-axis: Cumulative % of base (0% to 100%); y-axis: Cumulative % of respondents (0% to 100%); series: prediction, random, optimal]
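One possible way to compute the points behind a gains chart is sketched below in Python. It assumes each record in the evaluation sample has a model score and a 0/1 outcome (1 = responded or churned). The function name `cumulative_gains`, the bin count and the sample values are hypothetical and chosen only for illustration.

```python
def cumulative_gains(scores, outcomes, n_bins=10):
    """Return (fraction of base, fraction of responders captured) points.

    Records are ranked from highest to lowest score, then the share of all
    responders found in the top 1/n_bins, 2/n_bins, ... of the base is counted.
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    total_responders = sum(outcomes)
    if total_responders == 0:
        raise ValueError("the sample contains no responders")
    # Rank outcomes by descending model score
    ranked = [o for _, o in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    points = [(0.0, 0.0)]
    for b in range(1, n_bins + 1):
        cutoff = round(n * b / n_bins)       # number of records in the top b bins
        captured = sum(ranked[:cutoff])      # responders found so far
        points.append((b / n_bins, captured / total_responders))
    return points


# Made-up evaluation sample: 10 records, 3 responders
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for base, gain in cumulative_gains(scores, outcomes, n_bins=5):
    print(f"{base:.0%} of base -> {gain:.0%} of responders")
```

On a chart like the one on this slide, the "random" line would be the diagonal where the fraction of responders equals the fraction of base, and the "prediction" line would be these points; the further it sits above the diagonal, the more useful the model is for targeting.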
13. Data mining techniques and
where they can be applied
14. Techniques to be discussed
Predictive
Forecasting
Decision trees
Regression models
Descriptive
Factor analysis
Cluster analysis
Affinity analysis
16. Example Decision Tree
Target Variable: Good/Bad Credit Rating
Best predictor: Income Level
2nd best predictor: Number of credit cards
End nodes: No further splits
Final predictor: Age
Highly significant
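The idea of a "best predictor" at each split can be sketched in a few lines of Python. The slide does not say which splitting criterion the example tree uses, so this sketch swaps in Gini impurity as an assumption. The function names, field names and the six records are invented purely for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, predictor, target):
    """Weighted Gini impurity after splitting rows on each value of predictor."""
    groups = {}
    for row in rows:
        groups.setdefault(row[predictor], []).append(row[target])
    n = len(rows)
    return sum(len(group) / n * gini(group) for group in groups.values())


def best_predictor(rows, predictors, target):
    """Pick the predictor whose split leaves the least impurity."""
    return min(predictors, key=lambda p: split_impurity(rows, p, target))


# Made-up records: in this toy sample, income separates good from bad exactly
rows = [
    {"income": "high", "cards": "few",  "rating": "good"},
    {"income": "high", "cards": "many", "rating": "good"},
    {"income": "low",  "cards": "few",  "rating": "bad"},
    {"income": "low",  "cards": "many", "rating": "bad"},
    {"income": "high", "cards": "many", "rating": "good"},
    {"income": "low",  "cards": "few",  "rating": "bad"},
]
print(best_predictor(rows, ["income", "cards"], "rating"))
```

A tree-building algorithm repeats this choice within each resulting group until no further split is worthwhile, which gives the end nodes. Tools that report significance for each split typically use a statistical test rather than Gini impurity, so treat this only as a picture of the selection step.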
18. The affinity tile map
Strengths of affinities
are displayed using a
‘hot-cold’ colour palette
By clicking on a tile,
details of the pair of
products and their
affinity are revealed
Source: Teradata
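To make the notion of an affinity between a pair of products more concrete, here is a minimal Python sketch. The slide does not define how affinity is measured, so the sketch assumes "lift" as the measure (how much more often two products appear together than would be expected if they were bought independently). The function name `pair_lift` and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Lift for every product pair that occurs together in at least one basket.

    lift = P(A and B) / (P(A) * P(B)); values above 1 suggest positive affinity.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))                  # ignore repeats within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))   # pairs in alphabetical order
    return {
        pair: (count / n) / ((item_counts[pair[0]] / n) * (item_counts[pair[1]] / n))
        for pair, count in pair_counts.items()
    }


# Made-up shopping baskets
baskets = [
    ["bread", "butter"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["milk"],
]
for pair, lift in sorted(pair_lift(baskets).items()):
    print(pair, round(lift, 2))
```

Each value in the result corresponds to one tile in a map like the one described here, with a colour scale applied to the strength of the affinity.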