A PREDICTIVE ANALYTICS PRIMER
BY THOMAS H. DAVENPORT
ANALYSIS BY SHLOKA
PREDICTIVE ANALYTICS
Predictive analytics is the branch of advanced analytics used to make predictions about unknown future events.
It draws on techniques from data mining, statistics, modeling, machine learning, and artificial intelligence to analyze current data and make predictions about the future.
WHY DO WE NEED PREDICTIVE ANALYTICS?
1. Know where you're likely to be in the future.
2. Identify different groups of customers for targeted analysis and precision marketing.
3. Analyze your data to predict individual or group behavior.
4. Quantify the risk associated with customers or acquisitions.
IT'S NOT MAGIC!
Quantitative analysis isn't magic, but it is normally done with a lot of past data, a little statistical wizardry, and some important assumptions.
ELEMENTS THAT CAN MAKE LIFE EASIER OR TOUGHER!
01 IT STARTS WITH DATA
It is imperative that any advanced analytics be based on stable and accurate information. Therefore, there is a strong focus on sound data management to ensure all data is properly scrubbed and validated prior to analysis.
Lack of good data is the most common barrier to organisations seeking to employ predictive analytics.
If you have multiple channels, it is imperative that they capture data on customer purchases in the same way your previous channels did.
It's a fairly tough job to create a single customer data warehouse with unique customer IDs for everyone and all past purchases customers have made through all channels. If you've already done that, you've got an incredible asset for predictive customer analytics.
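The single customer view described above can be sketched in a few lines. This is only an illustration: the channel names, the field names (customer_id, item, amount) and the records are hypothetical, and a real warehouse would also need ID matching and data cleaning, which are not shown here.

```python
from collections import defaultdict


def build_customer_history(channel_feeds):
    """Combine per-channel purchase records into one history per customer ID."""
    history = defaultdict(list)
    for channel, records in channel_feeds.items():
        for record in records:
            history[record["customer_id"]].append(
                {"channel": channel, "item": record["item"], "amount": record["amount"]}
            )
    return dict(history)


# Hypothetical feeds: every channel records purchases with the same fields.
feeds = {
    "store": [{"customer_id": "C001", "item": "kettle", "amount": 35.0}],
    "web": [
        {"customer_id": "C001", "item": "toaster", "amount": 42.5},
        {"customer_id": "C002", "item": "mug", "amount": 8.0},
    ],
}

history = build_customer_history(feeds)
total_c001 = sum(purchase["amount"] for purchase in history["C001"])
```

Because both channels use the same fields and the same customer ID, the purchases of customer C001 in the store and on the web end up in one history. That is the point of capturing data the same way in every channel.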
02 THE STATISTICS
Predictive analytics encompasses a host of tools and techniques to achieve your specific goals and increase your knowledge.
PREDICTIVE ANALYTICS TOOLS AND TECHNIQUES
REGRESSION ANALYSIS IN ITS VARIOUS FORMS IS THE PRIMARY TOOL
An analyst hypothesizes that a set of independent variables is statistically correlated with the purchase of a product for a sample of customers.
The analyst performs a regression analysis to see just how correlated each variable is and, in this example, finds that each variable in the model is important.
The analyst can then use the regression coefficients to create a score predicting the likelihood of the purchase.
04 ASSUMPTIONS
Every model has them, and it's important to know what they are and monitor whether they are still true.
Since faulty or obsolete assumptions can clearly bring down whole banks and even (nearly!) whole economies, it's pretty important that they be carefully examined.
Managers should always ask analysts what the key assumptions are, and what would have to happen for them to no longer be valid. And both managers and analysts should continually monitor the world to see if key factors involved in assumptions might have changed over time.
REASONS FOR INVALID ASSUMPTIONS
A: TIME
If your model was created several years ago, it may no longer accurately predict current behaviour. The greater the elapsed time, the more likely customer behaviour has changed.
Netflix's predictive models, which were built on early Internet users (technically focused and young), had to be retired because later Internet users (essentially everyone) were different.
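One simple way to watch for this kind of change is to compare the data the model was built on with recent data. The sketch below measures how far the recent average has moved, in units of the original standard deviation. The ages are hypothetical, and the threshold of 1.0 is an illustrative assumption rather than an established rule.

```python
from statistics import mean, stdev


def drift_in_std_units(training_values, recent_values):
    """Shift of the recent mean from the training mean, in training standard deviations."""
    return abs(mean(recent_values) - mean(training_values)) / stdev(training_values)


# Hypothetical customer ages: early adopters versus a later, broader user base.
training_ages = [22, 24, 25, 27, 28, 30, 31, 33]
recent_ages = [19, 26, 34, 41, 48, 55, 62, 70]

shift = drift_in_std_units(training_ages, recent_ages)
needs_review = shift > 1.0  # illustrative threshold, chosen for this example only
```

A large shift does not prove the model is wrong, but it is a signal that the population has changed and the model's assumptions deserve a fresh look.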
B: MISSING KEY VARIABLE
Another reason a predictive model's assumptions may no longer be valid is that the analyst didn't include a key variable in the model, and that variable has changed substantially over time.
The financial crisis of 2008-9 was caused largely by invalid models predicting how likely mortgage customers were to repay their loans. The models didn't include the possibility that housing prices might stop rising or might fall. In essence, the belief that housing prices would always rise was a hidden assumption.
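A hidden assumption of this kind can sometimes be surfaced by recording the range of each variable seen when the model was built and flagging new cases that fall outside it. The sketch below is a minimal example under that assumption. The variable names and figures are hypothetical, and the check only helps for variables someone has thought to track; a variable left out entirely still has to be found by asking questions.

```python
def training_ranges(rows):
    """Record the (min, max) seen in the training data for each named variable."""
    names = rows[0].keys()
    return {name: (min(r[name] for r in rows), max(r[name] for r in rows)) for name in names}


def out_of_range(record, ranges):
    """Return the variables whose value falls outside what the model was built on."""
    return [name for name, (low, high) in ranges.items()
            if not low <= record[name] <= high]


# Hypothetical history in which house prices only ever rose.
history = [
    {"annual_price_change_pct": 3.0, "loan_to_value": 0.80},
    {"annual_price_change_pct": 6.5, "loan_to_value": 0.85},
    {"annual_price_change_pct": 9.0, "loan_to_value": 0.90},
]
ranges = training_ranges(history)

# A new case with falling prices lies outside anything in the history.
flagged = out_of_range({"annual_price_change_pct": -4.0, "loan_to_value": 0.85}, ranges)
```

Here the falling-price case is flagged because the history contains only rising prices, which makes the "prices always rise" assumption visible instead of hidden.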
WHAT SHOULD MANAGERS DO?
ASK QUESTIONS
Can you tell me something about the source of data you used in your analysis?
Are you sure the sample data are representative of the population?
Are there any outliers in your data distribution? How did they affect the results?
What assumptions are behind your analysis?
Are there any conditions that would make your assumptions invalid?
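The outlier question above can be made concrete with a small check. The sketch below uses the common 1.5 x interquartile range rule, which is one convention among several, and hypothetical order values to show how a single extreme value can distort an average.

```python
from statistics import mean, quantiles


def iqr_outliers(values, k=1.5):
    """Values lying more than k interquartile ranges outside the middle half of the data."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


# Hypothetical order values with one extreme purchase.
order_values = [20, 22, 23, 25, 26, 27, 29, 30, 31, 400]

outliers = iqr_outliers(order_values)
mean_all = mean(order_values)
mean_without_outliers = mean([v for v in order_values if v not in outliers])
```

Comparing the two averages answers the follow-up question, "How did they affect the results?": if they differ a lot, the analyst should explain how the extreme values were handled.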
