A high level overview of all that is Analytics

Simplified Analytics
Predictive Analytics: Primer
Sep 27, 2011

CONTENTS
What is Predictive Analytics?
Various ways of doing it
Forecasting Techniques
Decision Trees
Regression
How to find out if a method works?
How to deploy them in real world?
When to do Predictive Analytics vs. not?
REFERENCES
Intended for Knowledge Sharing only.

What is Predictive Analytics?
Prediction of the future value of a variable of interest (the predicted variable) from past values of either
itself or other explanatory variables (the predictors)…
e.g. stock price movements, credit card default rates, inventory management, etc.
Concepts of Time Windows:
• Development window (Jan’08 – Jun’10)
  • Observe the predicted variable (stock price, default rate, etc.) and/or get the relationship with predictor variables
• Validation window (Jul’10 – Dec’10)
  • Check if prediction accuracy is within acceptable limits
  • If not, improve the prediction framework
• Prediction window (Jan’11 – May’11)
  • Use the predictive method to get the projections
  • Strategize business actions based on projections
Other time components:
• Trend – long-term organic growth
• Seasonality – specific fluctuations repeating at certain time points (months, days) every year
[Figure: Rev ($Bn) over time, split into Development Window, Validation Window and Prediction Window]
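The three windows can be sketched in a few lines of Python. This is only an illustration: the window dates come from the slide, while the helper names `window_of` and `month_range` are made up for this example.

```python
from datetime import date

# Window boundaries as given on the slide (first-of-month dates, inclusive).
WINDOWS = {
    "development": (date(2008, 1, 1), date(2010, 6, 1)),
    "validation": (date(2010, 7, 1), date(2010, 12, 1)),
    "prediction": (date(2011, 1, 1), date(2011, 5, 1)),
}


def window_of(month):
    """Return the name of the window a first-of-month date falls in, or None."""
    for name, (start, end) in WINDOWS.items():
        if start <= month <= end:
            return name
    return None


def month_range(start, end):
    """Yield first-of-month dates from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


# Count how many months of data land in each window.
counts = {}
for m in month_range(date(2008, 1, 1), date(2011, 5, 1)):
    name = window_of(m)
    counts[name] = counts.get(name, 0) + 1

print(counts)
```

With these dates the model is built on 30 months, checked on 6 and projected over 5.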

Various ways of doing it
All methods can be grouped in three broad categories..
• Simple Forecasting Techniques
• Decision Trees
• Regression
Simple Forecasting Techniques:
• Moving Averages – average of the values over the last ‘x’ months
• Decomposition Method – tease out the Trend and Seasonality components for use in predictions
• Holt Exponential Smoothing Techniques – apply Trend and Seasonality to Exponential Averages.
Exponential Averages assign progressively smaller weights to older observations.
Decision Trees:
• Break the population down into smaller buckets and predict for each bucket. They generally yield much higher
prediction accuracy than simple forecasting techniques.
Regression:
• Establishes a mathematical relationship between the ‘predicted’ and the ‘predictor’ variables, which can then be
used to predict future values from known values of the ‘predictor’.

Simple Forecasting Techniques
The simplest methods of forecasting, but they cannot explain why they predict a certain value...
Moving Averages:
Prediction(t) = Average(Values at t-1 to t-x)
For the next month, shift the averaging window by 1 month, and so on.
Decomposition Method:
Prediction(t) = Trended value(T) * Seasonality Index(SI)
-> T = Actual value in last available month * (1 + Growth rate);
and Growth rate = (Actual(t-1) – Actual(t-2)) / Actual(t-2), with t-1 being the last available month
-> SI = average of all Jan / average of all months;
SI has to be calculated separately for each of the 12 months,
and the SI relevant to the month being predicted is then applied
Holt Exponential Smoothing:
Prediction(t) = (Smoothed series + Trend(T)) * SI
-> Smoothed series = Smoothing Factor * Actual for last month +
(1 – Smoothing Factor) * Smoothed value for last month, and so on
[Figure: Number of international airline passengers ('000): Actuals vs. Moving Averages (12 months), Decomposition Method and Holt-Winters]
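As a rough illustration of the building blocks above, the sketch below computes a moving-average forecast, the seasonality indices and an exponentially smoothed series. The function names and the toy quarterly series (a period of 4 rather than 12 months) are assumptions made for this example, and the trend term is left out for brevity.

```python
def moving_average_forecast(values, x):
    """Prediction(t) = average of the last x observed values."""
    if x <= 0 or x > len(values):
        raise ValueError("x must be between 1 and the number of observations")
    return sum(values[-x:]) / x


def seasonality_indices(values, period):
    """SI for each position in the cycle = average at that position / overall average."""
    overall = sum(values) / len(values)
    indices = []
    for position in range(period):
        same_position = values[position::period]  # e.g. every first quarter
        indices.append((sum(same_position) / len(same_position)) / overall)
    return indices


def exponential_smoothing(values, smoothing_factor):
    """Smoothed(t) = factor * Actual(t) + (1 - factor) * Smoothed(t-1)."""
    smoothed = [values[0]]  # start the series at the first actual value
    for actual in values[1:]:
        smoothed.append(smoothing_factor * actual
                        + (1 - smoothing_factor) * smoothed[-1])
    return smoothed


# Made-up series: two years of quarterly figures with a repeating pattern.
series = [10, 20, 30, 40, 10, 20, 30, 40]

ma = moving_average_forecast(series, 4)
si = seasonality_indices(series, 4)
smooth = exponential_smoothing([10, 20, 30], 0.5)

print(ma, si, smooth)
```

A higher smoothing factor makes the smoothed series follow recent actuals more closely; a lower one gives more weight to history.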

Decision Trees
Higher prediction accuracy and explainability, since prediction is done at the member sub-group level…
• Begins with the entire population and splits it on the ‘predicted’ variable (e.g., default rate) by a predictor variable,
e.g. Customer type – Subprime or Premium
• Checks if the difference in the ‘predicted’ variable (default rate) between the resulting groups is statistically significant, using a Chi-square or t-test
• If the difference is significant, it goes on to split the nodes* by other variables
• If not, it goes back and tries to ‘significantly’ split the population by another variable
How long does it keep splitting?
• As long as it finds significant splits based on the Chi-square or t-tests
• Until it hits the maximum number of nodes* (a manageable number for business actions)
• Until the counts in the lowermost nodes become less than 5%
*Each subgroup resulting from a split is called a node
All Credit Card holders (Default rate: 2%)
  Sub-prime (Default rate: 5%)
    FICO <250 (Default rate: 8%)
    FICO: 250 to 400 (Default rate: 6%)
    FICO >400 (Default rate: 4%)
  Premium (Default rate: 1%)
    Monthly spend <$500 (Default rate: 0.5%)
    Monthly spend >$500 (Default rate: 1.5%)
(each box in the tree is a node)
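The significance check behind a single split can be sketched as below, assuming a plain Chi-square test on a 2x2 table (defaulted or not, by group). The customer counts are made up and chosen only so that they reproduce the 5%, 1% and 2% default rates in the tree above. The function names are illustrative, and 3.841 is the standard 5% critical value for one degree of freedom.

```python
CRITICAL_VALUE_5PCT_DF1 = 3.841  # Chi-square cutoff, 1 degree of freedom, 5% level


def chi_square_2x2(defaults_a, total_a, defaults_b, total_b):
    """Chi-square statistic comparing the default rates of two groups."""
    observed = [
        [defaults_a, total_a - defaults_a],
        [defaults_b, total_b - defaults_b],
    ]
    grand_total = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [defaults_a + defaults_b,
                  grand_total - defaults_a - defaults_b]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if both groups shared one default rate.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


def is_significant_split(defaults_a, total_a, defaults_b, total_b):
    """True if the two groups' default rates differ significantly at the 5% level."""
    statistic = chi_square_2x2(defaults_a, total_a, defaults_b, total_b)
    return statistic > CRITICAL_VALUE_5PCT_DF1


# Hypothetical counts: 1,000 sub-prime customers with 50 defaults (5%) and
# 3,000 premium customers with 30 defaults (1%), i.e. 2% overall.
stat = chi_square_2x2(50, 1000, 30, 3000)
print(round(stat, 2), is_significant_split(50, 1000, 30, 3000))

# Two groups with nearly identical rates should not be split.
print(is_significant_split(10, 500, 11, 500))
```

A tree-building routine would run this kind of check for every candidate predictor at each node and keep splitting only where the test passes.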

Regression
Highest prediction accuracy and explainability, since prediction is done at the individual member level…
• Estimates the degree of relationship between the “predicted” variable and the “predictor” variables
e.g. Credit Card default = intercept + b1*bankruptcy + b2*payment to income ratio
-> intercept – unexplained factor
-> b1, b2 – strength of relationship: how much the “predicted” variable (default probability) changes with a unit
change in the “predictor” values (bankruptcy or payment to income ratio)
What are the various types of regression?
Regression Methods: Linear, Logistic, ARIMA
Linear
• When should it be used? To predict the value of a variable, e.g., Credit Card spend, inventory quantity
• Inherent assumption: the predicted variable follows a "normal distribution", meaning most members of the population have about average values, with smaller counts towards the extremes
Logistic
• When should it be used? To predict the probability of a certain event happening, e.g., credit card default, inventory shortage
• Inherent assumption: the probability of the event happening follows a "binomial distribution", meaning the probability of observing 'x' defaulters when picking 'N' members is highest if the proportion of defaulters in the population is (x/N)
ARIMA
• When should it be used? To predict future values from historical figures, e.g., future stock price from past figures
• Inherent assumption: a ‘stationary time series’, i.e., the structure of the time series doesn’t change significantly, e.g., no increase in volatility or change in the growth rate itself
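To make the linear and logistic cases concrete, here is a minimal sketch with a single predictor. The data points and coefficient values are invented for illustration, and the function names are not from any particular library.

```python
import math


def fit_simple_linear(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def logistic(score):
    """Turn a linear score (intercept + b1*x1 + ...) into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


# Made-up data that lies exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
intercept, slope = fit_simple_linear(xs, ys)
print(intercept, slope)

# Hypothetical logistic coefficients for a default model with one predictor.
b0, b1 = -3.0, 2.0
payment_to_income = 0.5
print(logistic(b0 + b1 * payment_to_income))
```

The slope plays the role of b1 in the slide's formula: the change in the predicted value for a unit change in the predictor. In the logistic case the same kind of linear score is passed through `logistic` so that the output can be read as a probability.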

How to find if a method works?
Various measurement diagnostics can be used to check prediction accuracy…
• Root Mean Square Error (RMSE): Typical size of the difference between actual and predicted values.
RMSE = square root of (average of square(actual – predicted))
• Error rate (%): Tells how large the error is relative to the actual values of the predicted variable.
Error rate (%) = RMSE / average of actuals
Decision Trees and Regression models have more sophisticated diagnostics…
• R-square: Tells how much of the variance in “predicted” variable is captured by the model.
• Rank Order: Checks if the predicted values correlate with the actual values.
Steps:
• Sort the population by predicted values, highest first
• Split into groups with an equal number of observations, generally ten groups or deciles
• Get the average of both actual and predicted values for each group
• Check if both averages decrease steadily from the top group to the bottom
• Gains Chart: Useful mostly in logistic regression models. Tells if most of the defaulters are being captured in
the top groups. If not, the model isn’t giving the highest probabilities to the actual defaulters and so it needs to
be revisited.
• Akaike Information Criterion (AIC): Helps in selecting the most “parsimonious” regression model, i.e., maximum
information capture with the least number of predictors.
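The simpler diagnostics above can be sketched as follows. The function names and the four data points are made up for illustration, and the rank-order helper assumes the number of observations divides evenly into the requested number of groups.

```python
import math


def rmse(actuals, predicted):
    """Square root of the average squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actuals, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def error_rate(actuals, predicted):
    """RMSE relative to the average of the actual values."""
    return rmse(actuals, predicted) / (sum(actuals) / len(actuals))


def r_square(actuals, predicted):
    """Share of the variance in the actuals captured by the predictions."""
    mean_actual = sum(actuals) / len(actuals)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actuals, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actuals)
    return 1 - ss_residual / ss_total


def rank_order(actuals, predicted, groups):
    """Average (predicted, actual) per group after sorting by predicted, highest first."""
    pairs = sorted(zip(predicted, actuals), reverse=True)
    size = len(pairs) // groups  # assumes the count divides evenly
    table = []
    for g in range(groups):
        chunk = pairs[g * size:(g + 1) * size]
        avg_predicted = sum(p for p, _ in chunk) / len(chunk)
        avg_actual = sum(a for _, a in chunk) / len(chunk)
        table.append((avg_predicted, avg_actual))
    return table


# Made-up actual and predicted values.
actuals = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

print(rmse(actuals, predicted))
print(error_rate(actuals, predicted))
print(r_square(actuals, predicted))
print(rank_order(actuals, predicted, 2))
```

In the rank-order table, a model that ranks well shows both averages falling from the first group to the last; with real data the same helper would typically be called with `groups=10` for deciles.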

How to deploy them in real world?
Simple Forecasting Techniques are used to predict at the portfolio level only, e.g., predictions of the
loss rates of an Auto-Lease portfolio,
but both Decision Trees and Regression Models require separate infrastructure to be deployed
for real-time or non-real-time predictions…
A Decision Tree is used as a “rule engine”. Every customer falls into one of the nodes, and the
prediction for that node is used to act on the customer’s request, e.g., a Sub-prime customer with
FICO<250 will be targeted even when just 1 payment is due, whereas a premium customer
will be given leeway of up to 4 payments due.
A Regression Model gives “account level” estimates, which are then used to act on the customer’s
request, e.g., Fraud models. Such models have to run every time a customer transacts.
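A decision tree deployed as a rule engine can be as simple as a chain of conditions. The sketch below hard-codes the example tree from the Decision Trees slide. The field names (`segment`, `fico`, `monthly_spend`) and the handling of boundary values such as a spend of exactly $500 are assumptions, since the slide does not specify them.

```python
def assign_node(customer):
    """Return (node name, predicted default rate) for a customer record.

    Node definitions and rates follow the example tree in this deck;
    boundary handling (e.g. spend of exactly 500) is an assumption.
    """
    if customer["segment"] == "sub-prime":
        fico = customer["fico"]
        if fico < 250:
            return ("Sub-prime, FICO <250", 0.08)
        if fico <= 400:
            return ("Sub-prime, FICO 250 to 400", 0.06)
        return ("Sub-prime, FICO >400", 0.04)
    # Everyone else is treated as premium in this two-segment example.
    if customer["monthly_spend"] < 500:
        return ("Premium, monthly spend <$500", 0.005)
    return ("Premium, monthly spend >$500", 0.015)


# Hypothetical customer records.
risky = {"segment": "sub-prime", "fico": 200, "monthly_spend": 300}
safe = {"segment": "premium", "fico": 700, "monthly_spend": 300}

print(assign_node(risky))
print(assign_node(safe))
```

Business rules, such as how many missed payments trigger a collections call, would then be keyed off the node a customer lands in.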

When to do Predictive Analytics vs. not?
Constraints and the questions to ask:
• Opportunity sizing: Is ROI acceptable?
• Min count requirements: Is minimum count available?
• Required accuracy of prediction: Prediction accuracy unsatisfactory?
[Figure: explanation diagrams for each constraint, labelled MONTH 1, MONTH 2, MONTH 3; FT 1, FT2; MMF, NO MMF]

REFERENCES
Simple Forecasting Techniques
http://itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm
Binomial Distributions
http://www.itl.nist.gov/div898/handbook/eda/section3/eda366i.htm
Exponential Smoothing
http://forecasters.org/pdfs/foresight/free/Issue19_goodwin.pdf
Decision Trees
http://www.salford-systems.com/resources/whitepapers/index.html
Linear Regression
http://faculty.chass.ncsu.edu/garson/PA765/regress.htm
Logistic Regression
http://faculty.chass.ncsu.edu/garson/PA765/logistic.htm
ARIMA Regression (also called the Box-Jenkins methodology)
http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc445.htm
