Predictions from MARS

May 2012
Maria Lupetini
Engineering Asset Management & Analytics
Qualcomm Incorporated

 Advantages of MARS Modeling
 Predicting Demand for an Asset
 Capturing Trends and Seasonal Effects
 Finding Interactive Effects
 Weighting More Recent Data
 Autoregressive Model for Time Series
 Using Lag Variables
 Don’t be Afraid of Missing Values
 Summary of Findings

 Regression: Linear, Logistic, GLM, MARS
 ARIMA Time Series
 Decision Trees
 Neural Networks
 Support Vector Machines
 And more

Need to pick one or more approaches tailored
to the problem you are tackling

 Sales - Dollars, Number of Chips

 Resources - People, Software Assets

 Performance of a Semiconductor - Seconds
to load a web page

 …You name it.

 Data contains continuous numbers
 $123,456.00
 Number of employees
 Understand influences of categories
 Geographical regions
 Operating system: Windows, Android
 Seasonal or repeated trends
 Months of the year
 Christmas season
 Special Effects
 Consumer Promotions and Advertising
 Switch turned on

What do you do if you want to predict a trend or find a pattern in data….and

 There are hundreds of possible variables that influence your outcome -
◦ Which ones matter?

 What if the variables interact with each other and affect the outcome?
◦ How do you find those relationships?

 What if variables are not linearly related to the outcome?
◦ How do you determine what the relationship curves will look like?
◦ Threshold or plateau relationship

 What if the data you are using to predict is a mixture of numbers and categories
◦ How do you build a prediction formula?

 How do I build a prediction model that is easy to understand?

… USE MARS

 MARS is short for Multivariate Adaptive Regression Splines

 Technique introduced in 1991 by Jerome Friedman, Stanford
University

 Nonparametric, data driven algorithm

 Prediction is a regression model with additional side
equations (basis functions)

 Uses piecewise regression splines to build the prediction

 Provides data reduction to select which variables matter
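
As a minimal sketch of what a basis function is, the Python below builds the mirrored pair of hinge functions max(0, x - knot) and max(0, knot - x) and combines them into a piecewise-linear prediction. The function names, the knot and the slopes are illustration values only; they are not output from the MARS software.

```python
def hinge_pair(x, knot):
    """Return the mirrored pair of hinge basis functions at a knot."""
    above = max(0.0, x - knot)   # active only when x is above the knot
    below = max(0.0, knot - x)   # active only when x is below the knot
    return above, below


def piecewise_prediction(x, intercept, knot, slope_above, slope_below):
    """Regression on the two hinge terms: a line that bends at the knot."""
    above, below = hinge_pair(x, knot)
    return intercept + slope_above * above + slope_below * below


# Hypothetical values chosen only to show the bend at the knot.
for x in (1.0, 3.0, 5.0):
    print(x, piecewise_prediction(x, intercept=10.0, knot=3.0,
                                  slope_above=2.0, slope_below=-1.0))
```

Each side of the knot gets its own slope, which is how a threshold or plateau relationship can be represented without choosing a curve shape in advance.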

Software Used in Designing Semiconductor Chips

 Is the use of the software growing?

 What time of day are the software licenses most
demanded?

 Does demand change over the weekend?

 How many copies do we need next week?

How do you forecast this time series of demand data?

[Figure: Number of Software Licenses Used in an Hour, from Aug 2011 to April 2012. X-axis: hourly timestamps, 8/28/2011 to 4/10/2012. Y-axis: 0 to 350 licenses.]

Time             Actual Licenses Used  Week Number  WeekDay  Day Name  Weekend  Holiday  Hour
9/4/2011 9 PM    58                    37           1        Sun       1        Y        21
9/4/2011 10 PM   75                    37           1        Sun       1        Y        22
9/4/2011 11 PM   88                    37           1        Sun       1        Y        23
9/5/2011 12 AM   81                    37           2        Mon       0        Y        0
9/5/2011 1 AM    74                    37           2        Mon       0        Y        1
9/5/2011 2 AM    80                    37           2        Mon       0        Y        2
9/5/2011 3 AM    81                    37           2        Mon       0        Y        3

• Real Continuous or Integer Variables: License Counts, Week Number
• Categorical Text Variables: Holiday flag, Day Name
• Binary Numbers: Weekend flag
• Choice of Categorical or Real Number: Week Day, Hour
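
The calendar predictors in the sample rows can be derived from the timestamp alone. The sketch below is one way to do that in Python, under two assumptions that the slides do not state: weeks start on Sunday with January 1 always in week 1 (a convention chosen because it reproduces week 37 for 9/4/2011), and the holiday calendar is supplied by the analyst. The function and dictionary key names are hypothetical, and how the deck numbered weeks after the new year is not shown.

```python
from datetime import date, datetime

DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# Holiday dates must be supplied; these two are the dates flagged "Y"
# in the sample rows.
HOLIDAYS = {date(2011, 9, 4), date(2011, 9, 5)}


def calendar_features(ts, holidays=HOLIDAYS):
    """Derive the calendar predictors for one hourly observation."""
    weekday = ts.isoweekday() % 7 + 1            # 1 = Sunday ... 7 = Saturday
    jan1_offset = date(ts.year, 1, 1).isoweekday() % 7  # 0 if Jan 1 is a Sunday
    day_of_year = ts.timetuple().tm_yday
    # Assumed convention: Sunday-start weeks, January 1 in week 1.
    week_number = (day_of_year + jan1_offset - 1) // 7 + 1
    return {
        "week_number": week_number,
        "weekday": weekday,
        "day_name": DAY_NAMES[weekday - 1],
        "weekend": 1 if weekday in (1, 7) else 0,
        "holiday": "Y" if ts.date() in holidays else "N",
        "hour": ts.hour,
    }


print(calendar_features(datetime(2011, 9, 4, 21)))
print(calendar_features(datetime(2011, 9, 5, 0)))
```

Keeping Week Day and Hour as plain integers leaves the choice between continuous and categorical coding to the modeling step.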

Can we build a prediction model of the form?

Demand =
Constant Base+
Baseline trend +
Hour of day effect +
Day of Week effect +
Holiday effect

Trend line captures:
• Growing use of this software product from Sep 2011 to Apr 2012
• Deadlines of semiconductor chip projects (Jan. and March)

[Plot: additional licenses needed as a function of hour of the day]

Hour Predictor Captures:
• Highest use of licenses from 10 AM to 1 PM US Pacific time
• Effect of Use in European/Indian time zones

[Plot: additional licenses needed as a function of day of the week]

Weekday was coded as a continuous variable (1 = Sunday, 2 = Monday, etc.).
Coding it as a categorical can also work here.

Day of Week Predictor Captures:
• Highest use of licenses during Wednesday to Friday

Possible Interactive Effects Between Variables

Look for interactive effects between hour of day
and day of week.

Did not want to allow interactive effects between
the week_number and holiday variables and other variables.

[Plot: additional licenses needed as a function of hour and day]

Interactive effect
• Work patterns are different on the weekends when
compared to the work week.
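
In MARS an interactive effect is a product of basis functions, so the term contributes only where both factors are active. The sketch below is an illustration of that idea, not the fitted model: the function names and both knots are hypothetical.

```python
def hinge(value, knot):
    """Single hinge basis function: zero up to the knot, linear above it."""
    return max(0.0, value - knot)


def hour_by_day_interaction(hour, weekday, hour_knot, weekday_knot):
    """Product of two hinges: nonzero only when both exceed their knots."""
    return hinge(hour, hour_knot) * hinge(weekday, weekday_knot)


# Hypothetical knots: hour 9 and weekday 2 (Monday, with 1 = Sunday).
print(hour_by_day_interaction(hour=12, weekday=4, hour_knot=9, weekday_knot=2))
print(hour_by_day_interaction(hour=12, weekday=1, hour_knot=9, weekday_knot=2))
print(hour_by_day_interaction(hour=8, weekday=4, hour_knot=9, weekday_knot=2))
```

Because the product vanishes whenever either hinge is zero, the hour-of-day shape is allowed to differ between parts of the week without changing the main effects elsewhere.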

[Plot: additional licenses needed on non-holidays]

Holiday Predictor Captures:
• The difference in demand in an hour if it is a holiday

Weighting of Observations
[Chart: Weight Applied to Observations (axis 0 to 4) against Day and Hour Observation (axis 7/26/2011 to 5/21/2012)]

MARS can treat a variable as a weighting factor for observations.
Here, the observations in April 2012 were 3 times
more important than observations in Sep 2011.
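
A minimal sketch of such a weighting scheme, assuming a linear ramp from weight 1 for the oldest observation to weight 3 for the newest (the slides give the 3-to-1 ratio but not the exact shape of the ramp). The weighted straight-line fit stands in for the regression step only to show the effect of the weights; it is not the MARS algorithm, and all names and data values are made up.

```python
def ramp_weights(n, first=1.0, last=3.0):
    """Linearly increasing weights from the oldest to the newest observation."""
    if n == 1:
        return [last]
    step = (last - first) / (n - 1)
    return [first + i * step for i in range(n)]


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up series whose level shifts upward in the most recent periods.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
print("equal weights: ", weighted_line_fit(x, y, [1.0] * len(x)))
print("recent-weighted:", weighted_line_fit(x, y, ramp_weights(len(x))))
```

With the ramp applied, the fitted line is pulled toward the recent observations, which is the behavior wanted when the newest data best reflects next week's demand.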

[Figure: Number of Software Licenses Used and Predicted, 4/8/2012 to 4/21/2012. Y-axis: 0 to 350 licenses. Regions marked: Part of the Training Dataset; Prediction on Unseen Data.]

Blue line is Actual Licenses Used.
Red line is MARS fit on Training Data for 4/8 to 4/14 and Prediction on 4/15 to 4/21.

[Figure: Number of Software Licenses Used, Training Dataset, 8/28/2011 to 4/10/2012. Y-axis: 0 to 350 licenses. Series: Actual, Prediction Model.]

MARS was able to capture:
• Overall trend
• Hourly and Week Day effect
• Somewhat captured US holidays

MARS tells you which variables are most important.

Variable       Importance    -gcv
---------------------------------------------------------------
WEEKDAY        100.00000     2713.86182
HOUR            93.20326     2418.96997
WEEK_NUMBER     44.00605      903.06390
HOLIDAY$        21.76427      574.55463

Great R-Squared of 90%. Other diagnostics, not presented here, looked good too.

==============================
N: 15217.52 R-SQUARED: 0.90281
MEAN DEP VAR: 158.15640 ADJ R-SQUARED: 0.90214
UNCENTERED R-SQUARED = R-0 SQUARED: 0.98493
F-STATISTIC = 1344.99320 S.E. OF REGRESSION = 35.12427
P-VALUE = 0.00000 RESIDUAL SUM OF SQUARES = .678790E+07
[MDF,NDF] = [ 38, 5502 ] REGRESSION SUM OF SQUARES = .630548E+08
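
The reported fit statistics can be cross-checked from the two sums of squares and the degrees of freedom. The sketch below assumes the usual regression identities (R-squared as the regression share of the total sum of squares, standard error from the residual mean square, F as the ratio of mean squares); the slides do not state which definitions the software uses, so treat this as a consistency check rather than a description of its internals.

```python
import math

# Values copied from the MARS output.
residual_ss = 0.678790e7
regression_ss = 0.630548e8
mdf, ndf = 38, 5502

# Assumed standard definitions.
r_squared = regression_ss / (regression_ss + residual_ss)
se_regression = math.sqrt(residual_ss / ndf)
f_statistic = (regression_ss / mdf) / (residual_ss / ndf)

print(f"R-squared        {r_squared:.5f}")
print(f"S.E. regression  {se_regression:.5f}")
print(f"F-statistic      {f_statistic:.3f}")
```

Under those assumptions the three computed values agree with the reported R-SQUARED, S.E. OF REGRESSION and F-STATISTIC to within the rounding of the inputs.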

Actual Used: Range 45 to 344 Licenses
Average 95
Standard Dev. 70

Can we build a prediction model of the
autoregressive form?

Demand =
Constant Base+
Baseline trend +
Effect of Licenses Used from a week ago +
Workweek vs. Weekend effect +
Holiday effect

Set Up Autoregressive Model, Part 2

Creating lag variable for “Used Lag168.”
This predictor is the number of licenses
used in the same hour, in the same day,
in the prior week.
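
A lag of 168 steps on an hourly series lines each observation up with the same hour of the same day one week earlier. The sketch below shows one way to build such a column in plain Python; the helper name and the made-up counts are illustrative, and observations without a full week of history are left as None, which is the missing-value case the model must handle.

```python
def lag(values, k):
    """Shift a series back by k steps; the first k entries have no history."""
    if k < 1:
        raise ValueError("k must be a positive number of steps")
    return [values[i - k] if i >= k else None for i in range(len(values))]


HOURS_PER_WEEK = 7 * 24  # 168 hourly observations in one week

# Made-up hourly license counts covering two weeks.
used = [100 + (h % 24) for h in range(2 * HOURS_PER_WEEK)]
used_lag168 = lag(used, HOURS_PER_WEEK)

print(used_lag168[0])      # no observation a week earlier
print(used_lag168[168])    # equals used[0]
```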

MARS found underlying trend when adjusting for other
factors in the Autoregressive model version.

Adjusting for underlying trend makes series
stationary. This is necessary for ARIMA models.

MARS captures contribution of Used Lag 168 hours
variable

Selected MARS Output Showing Model Form and Fit

Basis Functions and Prediction Equation from MARS.
Note the handling of missing values.
Reasonable fit with 82% R-squared.

BF1 = ( USED<168> ne . );
BF2 = ( USED<168> = . );
BF3 = max( 0, USED<168> - 42) * BF1;
BF4 = max( 0, 42 - USED<168>) * BF1;
BF5 = (HOLIDAY$ in ( "Y" ));
BF7 = (MON_TO_FRI in ( 0 ));
BF9 = max( 0, WEEK_NUMBER - 50) * BF1;
BF10 = max( 0, 50 - WEEK_NUMBER) * BF1;
BF11 = max( 0, USED<168> - 137) * BF1;
BF13 = max( 0, USED<168> - 265) * BF1;
BF15 = (MON_TO_FRI in ( 0 )) * BF2;

Number of Licenses Needed = 134 - 39 * BF1 + 0.58 * BF3 - 2.12 * BF4
- 42 * BF5 - 21.6 * BF7 - 0.235 * BF9 - 1.598 * BF10 + 0.338 * BF11
- 0.535 * BF13 - 38 * BF15;

N: 15055.88 R-SQUARED: 0.82525
MEAN DEP VAR: 158.75413 ADJ R-SQUARED: 0.82493

F-STATISTIC = 2533.14901 S.E. OF REGRESSION = 47.37796
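
The basis functions and prediction equation in the MARS output can be transcribed into a small function. The Python below is such a transcription, with coefficients and knots copied from that output; the function and argument names are hypothetical, a missing lag is represented as None, and "N" as the non-holiday value is an assumption (only "Y" appears in the sample data).

```python
def predict_licenses(used_lag168, week_number, holiday, mon_to_fri):
    """Evaluate the MARS prediction equation for one hourly observation."""
    bf1 = 0.0 if used_lag168 is None else 1.0     # lag is present
    bf2 = 1.0 - bf1                               # lag is missing
    lag = 0.0 if used_lag168 is None else float(used_lag168)

    bf3 = max(0.0, lag - 42) * bf1
    bf4 = max(0.0, 42 - lag) * bf1
    bf5 = 1.0 if holiday == "Y" else 0.0
    bf7 = 1.0 if mon_to_fri == 0 else 0.0         # Saturday or Sunday
    bf9 = max(0.0, week_number - 50) * bf1
    bf10 = max(0.0, 50 - week_number) * bf1
    bf11 = max(0.0, lag - 137) * bf1
    bf13 = max(0.0, lag - 265) * bf1
    bf15 = bf7 * bf2                              # weekend with missing lag

    return (134 - 39 * bf1 + 0.58 * bf3 - 2.12 * bf4
            - 42 * bf5 - 21.6 * bf7 - 0.235 * bf9 - 1.598 * bf10
            + 0.338 * bf11 - 0.535 * bf13 - 38 * bf15)


# A weekday hour with 100 licenses used a week earlier, then the same
# hour when no week-old observation exists.
print(predict_licenses(100, 50, "N", 1))
print(predict_licenses(None, 50, "N", 1))
```

When the lag is missing, every spline term multiplied by BF1 drops out and the model falls back on the constant plus the holiday and weekend terms, so a prediction is still produced for the first week of data.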

For observations where the 168 lag of the “Used” variable is not missing:

Holiday = 1 if it’s a holiday, else 0
Weekend = 1 if it’s Saturday or Sunday, else 0

Autoregressive Splines:
A = max( 0, USED<168> - 42)
B = max( 0, 42 - USED<168>)
C = max( 0, USED<168> - 137)
D = max( 0, USED<168> - 265)

Trend line Splines:
E = max( 0, WEEK_NUMBER - 50)
F = max( 0, 50 - WEEK_NUMBER)

Forecasted License Need = 95 - 42 * Holiday - 22 * Weekend +
[0.6 * A - 2.1 * B + 0.3 * C - 0.5 * D] +
[- 0.2 * E - 1.6 * F]

[Figure: USED and Predicted license counts, 9/4/2011 to 4/16/2012. Y-axis: 0 to 400 licenses.]

[Figure: Number of Licenses Used and Predicted, 4/8/2012 to 4/21/2012. Y-axis: 0 to 400 licenses. Regions marked: Part of Training Dataset; Forecasting Unseen Data.]

Blue line is Actual Used.
Red line is MARS fit on Training data for 4/8 to 4/14 and Prediction on 4/15 to 4/21 data.

Compare Forecast of Two Models to Actual Licenses Used

[Figure: Number of Licenses, 4/8/2012 to 4/21/2012. Y-axis: 0 to 400. Series: Actual Used, Predicted_AutoRegressive, Predicted Not Auto Reg.]

Mathematically
 MARS is versatile; it models most data types
 Selects best predictors
 Models nonlinear relationships
 Easily finds selective interactive effects
 Simple to create lag variables as predictors
 Flexible weighting schemes for observations
 Can handle missing values

Operationally
 Don’t call me for more software license copies on
Thursday at noon; everyone else is!
