# Model Building Steps: Forecasting the Jobs number

1. Model Building Steps: Forecasting the jobs number. John H. Muller, October 1, 2012.
2. Outline
    1. Goals
    2. The Task: Forecasting the jobs number; Predictor Variables
    3. Modeling Process: Preliminaries; Fitting and Tuning the Model; Model Selection; Prediction
    4. Resources
3. Section 1: Goals
4. Goals for the presentation
    - Illustrate issues and choices in a typical model-building process.
    - To do that, we take the following as our task: build a model to forecast a macroeconomic time series.
    - Time is limited, so we don't have time to discuss:
        - econometrics or macroeconomics
        - time series methods
        - details or merits of particular modeling or model-fitting methods
5. Section 2: The Task, Forecasting the jobs number
6. Figure: Total nonfarm (FRED symbol = PAYEMS).
    - Every month the BLS publishes the Employment Situation report.
    - Its two most important numbers are the unemployment rate and Total nonfarm.
    - Total nonfarm: count of jobs from a survey of businesses (units: thousands of jobs).
    - Task: forecast the month-over-month change in Total nonfarm.
8. Figure: Total nonfarm (FRED symbol = PAYEMS) since 2000.
9. Figure: Month-over-month change in PAYEMS since 2000 (mean = 23, sd = 289, thousands of jobs).
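
The target is a simple first difference of the monthly level. A minimal sketch, assuming the PAYEMS levels are already loaded as a plain Python list (the numbers below are hypothetical, not real data) and that "sd" means the sample standard deviation (the slide does not say which):

```python
import statistics


def month_over_month_change(levels):
    """Return levels[t] - levels[t-1] for a list of monthly levels (thousands of jobs)."""
    return [curr - prev for prev, curr in zip(levels, levels[1:])]


# Hypothetical PAYEMS-like levels in thousands of jobs (illustrative only).
payems = [132000, 132150, 132080, 132300, 132290]
changes = month_over_month_change(payems)
mean_change = statistics.mean(changes)
sd_change = statistics.stdev(changes)  # sample standard deviation (n - 1 divisor)
print(changes, mean_change, round(sd_change, 1))
```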
10. Section 2: The Task, Predictor Variables
11. Table: Variables and descriptions

    | ID | Description |
    |---|---|
    | ALTSALES | Light Weight Vehicle Sales: Autos & Light Trucks |
    | BUSLOANS | Commercial and Industrial Loans at All Commercial Banks |
    | CE16OV | Civilian Employment |
    | CIVPART | Civilian Labor Force Participation Rate |
    | CLF16OV | Civilian Labor Force |
    | CONSUMER | Consumer Loans at All Commercial Banks |
    | CPATAX | Corporate Profits After Tax |
    | DPI | Disposable Personal Income |
    | PAYEMS | All Employees: Total nonfarm |
    | PCE | Personal Consumption Expenditures |
    | PSAVERT | Personal Saving Rate |
    | SRVPRD | All Employees: Service-Providing Industries |
    | TCU | Capacity Utilization: Total Industry |
    | UEMP27OV | Civilians Unemployed for 27 Weeks and Over |
    | UEMPLT5 | Civilians Unemployed - Less Than 5 Weeks |
    | UEMPMEAN | Average (Mean) Duration of Unemployment |
    | UEMPMED | Median Duration of Unemployment |
    | UNEMPLOY | Unemployed |
    | UNRATE | Civilian Unemployment Rate |
    | USGOOD | All Employees: Goods-Producing Industries |
12. Figure: Original series, Jobs group, 2000 to 2012 (panels: UNEMPLOY, UNRATE, USGOOD, UEMPLT5, UEMPMEAN, UEMPMED, PAYEMS, SRVPRD, UEMP27OV, CE16OV, CIVPART, CLF16OV).
13. Figure: Original series, Consumer group, 2000 to 2012 (panels: PCE, PSAVERT, ALTSALES, CONSUMER, DPI).
14. Figure: Original series, Business group, 2000 to 2012 (panels: TCU, BUSLOANS, CPATAX).
15. Figure: Differenced series, Jobs group, 2000 to 2012 (panels: UNEMPLOY, UNRATE, USGOOD, UEMPLT5, UEMPMEAN, UEMPMED, PAYEMS, SRVPRD, UEMP27OV, CE16OV, CIVPART, CLF16OV).
16. Figure: Differenced series, Consumer group, 2000 to 2012 (panels: PCE, PSAVERT, ALTSALES, CONSUMER, DPI).
17. Figure: Differenced series, Business group, 2000 to 2012 (panels: TCU, BUSLOANS, CPATAX).
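
One way to turn the differenced series into model inputs is a table of rows, one per month. The slides do not state the lag structure; the sketch below assumes, hypothetically, that the differenced predictors for month t are used to forecast the PAYEMS change in month t+1, and that all series share the same monthly index:

```python
def first_difference(series):
    """Month-over-month change: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def build_forecast_rows(predictors, target):
    """
    predictors: dict mapping series ID -> list of monthly levels (same length as target).
    target: list of monthly PAYEMS levels.
    Returns (names, X, y) where X[i] holds the differenced predictors for month t and
    y[i] is the PAYEMS change in month t + 1 (a one-month-ahead alignment, an assumption).
    """
    diffs = {name: first_difference(vals) for name, vals in predictors.items()}
    y_diff = first_difference(target)
    names = sorted(diffs)  # fixed column order
    X, y = [], []
    for t in range(len(y_diff) - 1):
        X.append([diffs[name][t] for name in names])
        y.append(y_diff[t + 1])
    return names, X, y
```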
18. Section 3: Modeling Process, Preliminaries
19. Preliminaries
    - Choose target and predictor variables. Considerations might include history, cost, frequency, and accuracy.
    - Choose model form and method: Lasso and random forest. Alternatives: neural networks, OLS, robust regression, ...
    - Criteria for choosing:
        - prediction accuracy
        - interpretability
        - suitability to the task and data
        - available software, model maintenance, implementation complexity
    - Derive variables from inputs: smoothed, standardized. Alternatives: powers of original variables, cross terms, ratios.
    - Plan for estimating out-of-sample error: cross validation and a test/train split.
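
The "smoothed, standardized" derived variables can be sketched as below. The slides do not say which smoother or window was used; this assumes a trailing moving average with a hypothetical window of 3 months, so that each smoothed value only uses past and current data. The `ref` argument lets the mean and sd come from the training period only, which avoids leaking test-period information into the inputs:

```python
import statistics


def trailing_moving_average(series, window=3):
    """Smooth with a trailing mean; the first window-1 points average what is available."""
    out = []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1): t + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def standardize(series, ref=None):
    """z-score `series` using the mean and sample sd of `ref` (defaults to the series itself)."""
    ref = series if ref is None else ref
    mu = statistics.mean(ref)
    sd = statistics.stdev(ref)
    return [(x - mu) / sd for x in series]
```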
20. Preliminaries: data issues
    - Missing data: remove. Alternatives: impute, ignore (for some model forms).
    - Outliers: trim to within 3 sd of the rolling mean. Alternatives: ignore, remove.
    - Correlated predictor variables: ignore. Alternatives: cluster the variables and choose one from each cluster.
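
The outlier rule "trim to within 3 sd of the rolling mean" can be read as clipping each point to a band around a rolling mean. A minimal sketch, assuming (hypothetically) a window of the 12 preceding observations that excludes the current point, so a large outlier cannot inflate its own band:

```python
import statistics


def trim_outliers(series, window=12, k=3.0):
    """Clip each point to rolling_mean +/- k * rolling_sd of the preceding `window` points."""
    trimmed = list(series)
    for t in range(len(series)):
        history = series[max(0, t - window):t]
        if len(history) < 2:
            continue  # not enough history to estimate a spread; keep the raw value
        mu = statistics.mean(history)
        sd = statistics.stdev(history)
        lo, hi = mu - k * sd, mu + k * sd
        trimmed[t] = min(max(series[t], lo), hi)
    return trimmed
```

Trimming (clipping) keeps every month in the sample, unlike the "remove" alternative, which would leave gaps in a monthly series.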
21. Figure: Trimmed and smoothed series, Jobs group, 2000 to 2012 (panels: UEMPMED, UNEMPLOY, UNRATE, USGOOD, SRVPRD, UEMP27OV, UEMPLT5, UEMPMEAN, CE16OV, CIVPART, CLF16OV, PAYEMS).
22. Figure: Trimmed and smoothed series, Consumer group, 2000 to 2012 (panels: PCE, PSAVERT, ALTSALES, CONSUMER, DPI).
23. Figure: Trimmed and smoothed series, Business group, 2000 to 2012 (panels: TCU, BUSLOANS, CPATAX).
24. Section 3: Modeling Process, Fitting and Tuning the Model
25. Fitting and tuning the model
    - Complexity: how many knobs the model has, e.g. degrees of freedom, number of variables, shrinkage factor, tree size, ...
    - Fitting: estimating the parameters for a given complexity. Methods: least squares, method of moments, maximum likelihood, optimization.
    - Tuning: adjusting the model's complexity. Possibly iterative, using diagnostics:
        - out-of-sample error
        - sensitivity, e.g. ∂error/∂data
        - significance of parameters
        - error structure, e.g. heteroskedasticity
        - alignment with prior beliefs
    - Which variables are important for the model?
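
For the Lasso, "fitting" means solving for the coefficients at a fixed penalty, and "tuning" means choosing the penalty by out-of-sample error. The sketch below uses coordinate descent on (1/2n)·||y − Xb||² + lam·||b||₁ and picks lam by K-fold cross validation with contiguous (time-ordered) folds. It is not the author's code: the slide's CV plot indexes the path by the fraction of the final L1 norm rather than by lam, and this sketch assumes X and y are already centered (no intercept):

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; exactly zero inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (assumes centered X and y)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb for b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the partial residual (feature j added back in).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def cv_mse(X, y, lam, k=5):
    """Average held-out squared error over k contiguous folds."""
    n = len(X)
    errors = []
    for f in range(k):
        test_idx = set(range(f * n // k, (f + 1) * n // k))
        X_tr = [X[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        b = lasso_cd(X_tr, y_tr, lam)
        for i in sorted(test_idx):
            pred = sum(xij * bj for xij, bj in zip(X[i], b))
            errors.append((y[i] - pred) ** 2)
    return sum(errors) / len(errors)


def tune_lambda(X, y, lambdas, k=5):
    """Return the penalty with the lowest cross-validated MSE, plus all scores."""
    scores = {lam: cv_mse(X, y, lam, k) for lam in lambdas}
    return min(scores, key=scores.get), scores
```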
26. Figure: Cross-validated MSE versus fraction of final L1 norm.
27. Figure: LASSO path, standardized coefficients versus |beta|/max|beta|.
28. Table: Coefficient estimates for LASSO

    | Variable | Estimate |
    |---|---|
    | ALTSALES | 0.000 |
    | BUSLOANS | 0.000 |
    | CE16OV | 0.157 |
    | CIVPART | 0.000 |
    | CLF16OV | 0.000 |
    | CONSUMER | 0.000 |
    | CPATAX | 0.000 |
    | DPI | 0.000 |
    | PAYEMS | 0.028 |
    | PCE | 2.250 |
    | PSAVERT | -15.350 |
    | SRVPRD | 0.000 |
    | TCU | 0.000 |
    | UEMP27OV | -0.265 |
    | UEMPLT5 | 0.000 |
    | UEMPMEAN | -18.559 |
    | UEMPMED | 0.000 |
    | UNEMPLOY | 0.000 |
    | UNRATE | -305.852 |
    | USGOOD | 0.292 |
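
The path plot shows standardized coefficients, while estimates like UNRATE = -305.852 are easier to read in each variable's own units. The slides do not say which scale the table uses; assuming the model was fit on standardized predictors and a centered target, the conversion back to original units is a division by each predictor's sd, with the intercept absorbing the means:

```python
def unstandardize(beta_std, x_means, x_sds, y_mean):
    """Map coefficients fit on (x - mean) / sd back to original predictor units."""
    beta = [b / s for b, s in zip(beta_std, x_sds)]
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta
```

Both parameterizations give identical predictions; only the units of the coefficients change, and zero coefficients stay exactly zero.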
29. Figure: Random forest predictor variable importance (IncNodePurity). Variables in the order they appear on the plot: USGOOD, TCU, UNEMPLOY, CE16OV, UEMP27OV, UNRATE, PCE, CLF16OV, PAYEMS, CIVPART, SRVPRD, UEMPMEAN, UEMPMED, PSAVERT, CONSUMER, BUSLOANS, UEMPLT5, DPI, CPATAX, ALTSALES.
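
For regression forests, IncNodePurity is documented in R's randomForest as the total decrease in node impurity (residual sum of squares) from splits on a variable, averaged over all trees. The sketch below is not a forest; it only shows the two ingredients: the RSS decrease for the best split of one node on one predictor, and the per-variable total over a hypothetical list of recorded splits:

```python
def rss(values):
    """Residual sum of squares around the node mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split_gain(x, y):
    """Best single split of one node on predictor x: (threshold, decrease in RSS)."""
    parent = rss(y)
    pairs = sorted(zip(x, y))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates equal x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        gain = parent - rss(left) - rss(right)
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best


def node_purity_importance(splits, n_trees):
    """Sum the RSS decreases per variable over (variable, gain) records, averaged over trees."""
    totals = {}
    for var, gain in splits:
        totals[var] = totals.get(var, 0.0) + gain
    return {var: total / n_trees for var, total in totals.items()}
```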
30. Section 3: Modeling Process, Model Selection
31. Model selection
    - Model selection: choosing the best among different models.
    - Our criterion: prediction accuracy. How will we measure it?
        - Training-set cross-validation estimates of out-of-sample MSE: RF 52,000; Lasso 62,000.
        - Separate test data: about 25,000, essentially the same for both!
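
For a time series, the separate test set is naturally the most recent months, held out without shuffling, and the model with the lower test MSE wins. A minimal sketch (function names are hypothetical); for scale, a test MSE of 25,000 corresponds to an RMSE of about 158 thousand jobs, against a target sd of 289:

```python
def chronological_split(rows, n_test):
    """Hold out the last n_test observations as the test set (no shuffling)."""
    if not 0 < n_test < len(rows):
        raise ValueError("n_test must be between 1 and len(rows) - 1")
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def select_model(actual, predictions_by_model):
    """Pick the model name with the lowest test MSE; also return all scores."""
    scores = {name: mse(actual, preds) for name, preds in predictions_by_model.items()}
    return min(scores, key=scores.get), scores
```

With only a handful of test months, two models whose test MSEs are this close should be treated as tied rather than ranked.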
32. Figure: Target, RF, and Lasso, 2000 to 2012.
33. Figure: Training set error (RF and Lasso), 2000 to 2012.
34. Figure: Training error ACF, Random Forest and Lasso (lags 0 to 20).
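
The ACF of the training errors checks the "error structure" diagnostic: if errors are correlated from month to month, the model is missing some time dependence. A minimal sketch of the sample autocorrelation (autocovariances normalized by the lag-0 value) and the usual approximate 95% band ±1.96/√n for white noise:

```python
import math


def acf(series, max_lag=20):
    """Sample autocorrelations r_0..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    max_lag = min(max_lag, n - 1)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def white_noise_band(n, z=1.96):
    """Approximate 95% bound for autocorrelations of white noise of length n."""
    return z / math.sqrt(n)
```

Lags whose autocorrelation falls outside ±`white_noise_band(n)` suggest structure left in the errors.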
35. Figure: Target, RF, and Lasso, January to September.
36. Figure: Test set error (RF and Lasso), January to September.
37. Section 3: Modeling Process, Prediction
38. Prediction (month-over-month change in Total nonfarm, thousands of jobs): Random Forest: 120; Lasso: 86.
39. Section 4: Resources
40. Resources
    - The Secrets of Economic Indicators, Bernard Baumohl
    - The Elements of Statistical Learning, Hastie, Tibshirani, Friedman
    - Macroeconomic Patterns and Stories, Edward E. Leamer
    - Analysis of Financial Time Series, Ruey S. Tsay
    - http://api.stlouisfed.org/docs/fred/ : good source for both FRED and ALFRED
    - http://cran.r-project.org/
41. Thank you! And thank you to John Verostek, Vladimir Valenta and Steve Kusiak.