# Predictions from MARS

Published in: Technology, Economy & Finance

### Predictions from MARS

1. 1. May 2012. Maria Lupetini, Engineering Asset Management & Analytics, Qualcomm Incorporated
2. 2. • Advantages of MARS Modeling • Predicting Demand for an Asset • Capturing Trends and Seasonal Effects • Finding Interactive Effects • Weighting More Recent Data • Autoregressive Model for Time Series Using Lag Variables • Don’t be Afraid of Missing Values • Summary of Findings
3. 3. • Regression: Linear, Logistic, GLM, MARS • ARIMA Time Series • Decision Trees • Neural Networks • Support Vector Machines • And more. Need to pick one or more approaches tailored to the problem you are tackling.
4. 4. • Sales - Dollars, Number of Chips • Resources - People, Software Assets • Performance of a Semiconductor - Seconds to load a web page • … You name it.
5. 5. • Data contains continuous numbers: \$123,456.00; number of employees • Understand influences of categories: geographical regions; operating system (Windows, Android) • Seasonal or repeated trends: months of the year; Christmas season • Special effects: consumer promotions and advertising; switch turned on
6. 6. What do you do if you want to predict a trend or find a pattern in data, and: • There are hundreds of possible variables that influence your outcome ◦ Which ones matter? • What if the variables interact with each other and affect the outcome? ◦ How do you find those relationships? • What if variables are not linearly related to the outcome? ◦ How do you determine what the relationship curves will look like? ◦ Threshold or plateau relationship • What if the data you are using to predict is a mixture of numbers and categories? ◦ How do you build a prediction formula? • How do I build a prediction model that is easy to understand? … USE MARS
7. 7. • MARS is short for Multivariate Adaptive Regression Splines • Technique introduced in 1991 by Jerome Friedman, Stanford University • Nonparametric, data-driven algorithm • Prediction is a regression model with additional side equations (basis functions) • Uses piecewise regression splines to build the prediction • Provides data reduction to select which variables matter
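The piecewise regression splines mentioned on this slide are built from hinge functions of the form max(0, x - knot) and max(0, knot - x), the same shapes that appear in the MARS output later in the deck. A minimal Python sketch of one such pair follows. The function names, the knot, and the coefficients are illustrative assumptions, not values from the presentation.

```python
def hinge_pair(x, knot):
    """Evaluate the mirrored pair of hinge basis functions at x."""
    above = max(0.0, x - knot)  # nonzero only when x is above the knot
    below = max(0.0, knot - x)  # nonzero only when x is below the knot
    return above, below


def piecewise_prediction(x, knot, intercept, slope_above, slope_below):
    """Combine one hinge pair into a piecewise-linear prediction."""
    above, below = hinge_pair(x, knot)
    return intercept + slope_above * above + slope_below * below


if __name__ == "__main__":
    # Hypothetical knot and coefficients, chosen only to show the shape.
    for x in (7, 10, 12):
        print(x, piecewise_prediction(x, knot=10, intercept=5.0,
                                      slope_above=2.0, slope_below=-1.0))
```

Because each hinge is zero on one side of its knot, the fitted curve can change slope at the knot, which is how a threshold or plateau relationship can be represented without specifying its shape in advance.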
8. 8. Software Used in Designing Semiconductor Chips: • Is the use of the software growing? • What time of day are the software licenses most demanded? • Does demand change over the weekend? • How many copies do we need next week?
9. 9. [Chart: Number of Software Licenses Used in an Hour, from Aug 2011 to April 2012; y-axis 0 to 350 licenses, x-axis date and hour from 8/28/2011 to 4/10/2012.] How do you forecast this time series of demand data?
10. 10. Sample rows of the dataset:

    | Actual Time | Licenses Used | Week Number | WeekDay | Day Name | Weekend | Holiday | Hour |
    |---|---|---|---|---|---|---|---|
    | 9/4/2011 9 PM | 58 | 37 | 1 | Sun | 1 | Y | 21 |
    | 9/4/2011 10 PM | 75 | 37 | 1 | Sun | 1 | Y | 22 |
    | 9/4/2011 11 PM | 88 | 37 | 1 | Sun | 1 | Y | 23 |
    | 9/5/2011 12 AM | 81 | 37 | 2 | Mon | 0 | Y | 0 |
    | 9/5/2011 1 AM | 74 | 37 | 2 | Mon | 0 | Y | 1 |
    | 9/5/2011 2 AM | 80 | 37 | 2 | Mon | 0 | Y | 2 |
    | 9/5/2011 3 AM | 81 | 37 | 2 | Mon | 0 | Y | 3 |

    • Real Continuous or Integer Variables: License Counts, Week Number • Categorical Text Variables: Holiday flag, Day Name • Binary Numbers: Weekend flag • Choice of Categorical or Real Number: Week Day, Hour
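The calendar predictors in the sample rows can be derived from the timestamp alone, apart from the holiday flag. The sketch below shows one way to do that in Python. The function name and the week-numbering convention (week 1 contains January 1, weeks start on Sunday) are assumptions chosen because they reproduce week 37 for 9/4/2011; the presentation does not say how the week number was computed or how it continues into 2012.

```python
from datetime import datetime

DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def calendar_features(timestamp, holiday="N"):
    """Derive the calendar predictors shown in the sample rows.

    The holiday flag cannot be computed from the date alone, so the
    caller supplies it (for example from a company holiday list).
    """
    # Slide convention: 1 = Sunday, 2 = Monday, ..., 7 = Saturday.
    week_day = (timestamp.weekday() + 1) % 7 + 1

    # ASSUMPTION: week 1 is the week containing January 1 and weeks start
    # on Sunday. This matches week 37 for 9/4/2011 in the sample rows.
    jan1 = datetime(timestamp.year, 1, 1)
    jan1_offset = (jan1.weekday() + 1) % 7  # 0 when January 1 is a Sunday
    day_of_year = timestamp.timetuple().tm_yday
    week_number = (day_of_year + jan1_offset - 1) // 7 + 1

    return {
        "week_number": week_number,
        "week_day": week_day,
        "day_name": DAY_NAMES[week_day - 1],
        "weekend": 1 if week_day in (1, 7) else 0,
        "holiday": holiday,
        "hour": timestamp.hour,
    }


if __name__ == "__main__":
    print(calendar_features(datetime(2011, 9, 4, 21), holiday="Y"))
```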
11. 11. Can we build a prediction model of the form: Demand = Constant Base + Baseline trend + Hour of day effect + Day of Week effect + Holiday effect?
12. 12. Setting Up Model in MARS
13. 13. Trend line captures: • Growing use of this software product from Sep 2011 to Apr 2012 • Deadlines of semiconductor chip projects (Jan. and March)
14. 14. [Chart: Additional licenses needed as a function of hour of the day.] Hour Predictor Captures: • Highest use of licenses from 10 AM to 1 PM US Pacific time • Effect of use in European/Indian time zones
15. 15. [Chart: Additional licenses needed as a function of day of the week; 1 = Sunday, 2 = Monday, etc.] Weekday was coded as a continuous variable. Coding it as a categorical can also work here. Day of Week Predictor Captures: • Highest use of licenses during Wednesday to Friday
16. 16. Possible Interactive Effects Between Variables: Look for interactive effects between hour of day and day of week. Did not want to allow interactive effects between the week_number and holiday variables and other variables.
17. 17. [Chart: Additional licenses needed as a function of hour and day.] Interactive effect: • Work patterns are different on the weekends when compared to the work week.
18. 18. [Chart: Additional licenses needed on non-holidays.] Holiday Predictor Captures: • The difference in demand in an hour if it is a holiday
19. 19. Weighting of Observations. [Chart: Weight Applied to Observations (axis 0 to 4) by Day and Hour of Observation, 7/26/2011 to 5/21/2012.] MARS will consider a “variable” as a weighting factor. Here, the observations in April 2012 were 3 times more important than observations in Sep 2011.
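A weight column of this kind can be generated before handing the data to the modeling tool. The slide states only that April 2012 observations counted 3 times as much as September 2011 observations, so the linear ramp below is an assumption, as are the function names. The second function shows what the weights do in a weighted least-squares sense: errors on recent observations cost more.

```python
def linear_weights(n_obs, first_weight=1.0, last_weight=3.0):
    """Weights rising linearly from the oldest to the newest observation.

    ASSUMPTION: a straight-line ramp; the slide gives only the endpoints.
    """
    if n_obs < 1:
        raise ValueError("n_obs must be at least 1")
    if n_obs == 1:
        return [last_weight]
    step = (last_weight - first_weight) / (n_obs - 1)
    return [first_weight + step * i for i in range(n_obs)]


def weighted_sse(actual, predicted, weights):
    """Weighted sum of squared errors for a set of predictions."""
    return sum(w * (a - p) ** 2 for a, p, w in zip(actual, predicted, weights))


if __name__ == "__main__":
    weights = linear_weights(5)
    print(weights)
    print(weighted_sse([10, 20], [12, 17], [1.0, 3.0]))
```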
20. 20. [Chart: Number of Software Licenses Used and Predicted, 4/8/2012 to 4/21/2012; y-axis 0 to 350.] Blue line: Actual Licenses Used. Red line: MARS fit on Training Data for 4/8 to 4/15 (Part of the Training Dataset) and Prediction on Unseen Data for 4/15 to 4/21.
21. 21. [Chart: Number of Software Licenses Used, Actual vs. Prediction Model on the Training Dataset, 8/28/2011 to 4/10/2012; y-axis 0 to 350.] MARS was able to capture: • Overall trend • Hourly and Week Day effect • Somewhat captured US holidays
22. 22. Variable importance:

    | Variable | Importance | -gcv |
    |---|---|---|
    | WEEKDAY | 100.00000 | 2713.86182 |
    | HOUR | 93.20326 | 2418.96997 |
    | WEEK_NUMBER | 44.00605 | 903.06390 |
    | HOLIDAY\$ | 21.76427 | 574.55463 |

    MARS tells you which variables are most important. Great R-Squared of 90%. Other diagnostics, not presented here, looked good too.

    | Statistic | Value |
    |---|---|
    | N | 15217.52 |
    | MEAN DEP VAR | 158.15640 |
    | R-SQUARED | 0.90281 |
    | ADJ R-SQUARED | 0.90214 |
    | UNCENTERED R-SQUARED = R-0 SQUARED | 0.98493 |
    | F-STATISTIC | 1344.99320 |
    | P-VALUE | 0.00000 |
    | S.E. OF REGRESSION | 35.12427 |
    | RESIDUAL SUM OF SQUARES | .678790E+07 |
    | REGRESSION SUM OF SQUARES | .630548E+08 |
    | [MDF,NDF] | [ 38, 5502 ] |

    Actual Used: Range 45 to 344 Licenses, Average 95, Standard Dev. 70
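The reported fit statistics can be cross-checked from the sums of squares and degrees of freedom printed on this slide. The short Python check below assumes the usual least-squares identities (total sum of squares equals regression plus residual, and total degrees of freedom equal MDF plus NDF); the variable names are illustrative.

```python
# Values copied from the MARS output on the slide.
residual_ss = 0.678790e7
regression_ss = 0.630548e8
model_df, residual_df = 38, 5502

# ASSUMPTION: total SS = regression SS + residual SS
# (ordinary least squares with an intercept).
total_ss = regression_ss + residual_ss
r_squared = regression_ss / total_ss

# Adjusted R-squared penalizes the model degrees of freedom.
total_df = model_df + residual_df
adj_r_squared = 1.0 - (1.0 - r_squared) * total_df / residual_df

# F-statistic: mean square of the regression over mean square of the residuals.
f_statistic = (regression_ss / model_df) / (residual_ss / residual_df)

print(round(r_squared, 5), round(adj_r_squared, 5), round(f_statistic, 2))
```

Under these assumptions the three computed values should agree with the reported R-SQUARED, ADJ R-SQUARED, and F-STATISTIC up to rounding of the printed sums of squares.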
23. 23. Can we build a prediction model of the autoregressive form: Demand = Constant Base + Baseline trend + Effect of Licenses Used from a week ago + Workweek vs. Weekend effect + Holiday effect?
24. 24. Set Up Autoregressive Model, Part 2: Creating a lag variable, “Used Lag168.” This predictor is the number of licenses used in the same hour, on the same day, in the prior week.
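With hourly data, a lag of 168 rows is exactly one week (24 hours times 7 days). The sketch below builds such a lag column in plain Python, outside of MARS. It assumes the series has one row per hour with no gaps, and the function name is hypothetical. The first 168 rows have no earlier observation, so their lag is left missing.

```python
def add_lag(values, lag=168):
    """Return a list whose element i is values[i - lag].

    Rows with no observation `lag` steps earlier get None (missing).
    ASSUMPTION: the input has exactly one row per hour, with no gaps.
    """
    return [values[i - lag] if i >= lag else None for i in range(len(values))]


if __name__ == "__main__":
    used = list(range(200))       # stand-in for hourly license counts
    used_lag168 = add_lag(used)
    print(used_lag168[167], used_lag168[168], used_lag168[199])
```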
25. 25. MARS found the underlying trend when adjusting for other factors in the Autoregressive model version. Adjusting for the underlying trend makes the series stationary. This is necessary for ARIMA models.
26. 26. MARS captures the contribution of the Used Lag 168 hours variable
27. 27. Selected MARS Output Showing Model Form and Fit. Basis Functions and Prediction Equation from MARS. Note the handling of missing values. Reasonable fit with 82% R-squared.

    | Basis function | Definition |
    |---|---|
    | BF1 | ( USED<168> ne . ) |
    | BF2 | ( USED<168> = . ) |
    | BF3 | max( 0, USED<168> - 42) * BF1 |
    | BF4 | max( 0, 42 - USED<168>) * BF1 |
    | BF5 | (HOLIDAY\$ in ( "Y" )) |
    | BF7 | (MON_TO_FRI in ( 0 )) |
    | BF9 | max( 0, WEEK_NUMBER - 50) * BF1 |
    | BF10 | max( 0, 50 - WEEK_NUMBER) * BF1 |
    | BF11 | max( 0, USED<168> - 137) * BF1 |
    | BF13 | max( 0, USED<168> - 265) * BF1 |
    | BF15 | (MON_TO_FRI in ( 0 )) * BF2 |

    Number of Licenses Needed = 134 - 39 * BF1 + 0.58 * BF3 - 2.12 * BF4 - 42 * BF5 - 21.6 * BF7 - 0.235 * BF9 - 1.598 * BF10 + 0.338 * BF11 - 0.535 * BF13 - 38 * BF15;

    N: 15055.88; R-SQUARED: 0.82525; MEAN DEP VAR: 158.75413; ADJ R-SQUARED: 0.82493; F-STATISTIC = 2533.14901; S.E. OF REGRESSION = 47.37796
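The basis functions and prediction equation on this slide can be evaluated directly. The Python sketch below transcribes them as printed; the function and parameter names are hypothetical, and a missing lag is represented by None. BF1 and BF2 are the missing-value indicators: every spline term is multiplied by BF1, so it drops out when the lag is missing, and BF15 supplies a separate weekend adjustment for that case.

```python
def mars_license_forecast(used_lag168, week_number, holiday, mon_to_fri):
    """Evaluate the prediction equation transcribed from the slide.

    used_lag168: licenses used 168 hours earlier, or None when missing.
    holiday: "Y" or "N". mon_to_fri: 1 on weekdays, 0 on weekends.
    """
    bf1 = 0.0 if used_lag168 is None else 1.0  # lag value is present
    bf2 = 1.0 - bf1                            # lag value is missing
    lag = 0.0 if used_lag168 is None else float(used_lag168)

    bf3 = max(0.0, lag - 42) * bf1
    bf4 = max(0.0, 42 - lag) * bf1
    bf5 = 1.0 if holiday == "Y" else 0.0
    bf7 = 1.0 if mon_to_fri == 0 else 0.0
    bf9 = max(0.0, week_number - 50) * bf1
    bf10 = max(0.0, 50 - week_number) * bf1
    bf11 = max(0.0, lag - 137) * bf1
    bf13 = max(0.0, lag - 265) * bf1
    bf15 = bf7 * bf2

    return (134 - 39 * bf1 + 0.58 * bf3 - 2.12 * bf4 - 42 * bf5
            - 21.6 * bf7 - 0.235 * bf9 - 1.598 * bf10 + 0.338 * bf11
            - 0.535 * bf13 - 38 * bf15)


if __name__ == "__main__":
    # Hypothetical inputs: a weekday, lag present versus lag missing.
    print(mars_license_forecast(150, 60, "N", 1))
    print(mars_license_forecast(None, 60, "N", 1))
```

When the lag is present, the constant 134 - 39 * BF1 reduces to 95, the base value of the model for rows with a known lag.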
28. 28. For observations where the 168 lag of the “Used” variable is not missing: • Holiday = 1 if it’s a holiday, else 0 • Weekend = 1 if it’s Saturday or Sunday, else 0 • Autoregressive Splines: A = max( 0, USED<168> - 42); B = max( 0, 42 - USED<168>); C = max( 0, USED<168> - 137); D = max( 0, USED<168> - 265) • Trend line Splines: E = max( 0, WEEK_NUMBER - 50); F = max( 0, 50 - WEEK_NUMBER) • Forecasted License Need = 95 - 42 * Holiday - 22 * Weekend + [0.6 * A - 2.1 * B + 0.3 * C - 0.5 * D] + [- 0.2 * E - 1.6 * F]
29. 29. [Chart: USED vs. Predicted licenses, 9/4/2011 to 4/16/2012; y-axis 0 to 400.]
30. 30. [Chart: Number of Licenses Used and Predicted, 4/8/2012 to 4/21/2012; y-axis 0 to 400.] Blue line is Actual Used. Red line is MARS fit on Training data for 4/8 to 4/14 (Part of Training Dataset) and Prediction on 4/15 to 4/21 data (Forecasting Unseen Data).
31. 31. [Chart: Compare Forecast of Two Models to Actual Licenses Used, 4/8/2012 to 4/21/2012; y-axis Number of Licenses, 0 to 400. Series: Actual Used, Predicted_AutoRegressive, Predicted Not Auto Reg.]
32. 32. Mathematically: • MARS is versatile; it models most data types • Selects best predictors • Models nonlinear relationships • Easily finds selective interactive effects • Simple to create lag variables as predictors • Flexible weighting schemes for observations • Can handle missing values. Operationally: • Don’t call me for more software license copies on Thursday at noon; everyone else is!