1645 track 2 ard_using our laptop
- 1. © 2016 Micron Technology, Inc. |
©2016 Micron Technology, Inc. All rights reserved. Information, products, and/or specifications are subject to
change without notice. All information is provided on an “AS IS” basis without warranties of any kind.
Statements regarding products, including regarding their features, availability, functionality, or compatibility,
are provided for informational purposes only and do not modify the warranty, if any, applicable to any
product. Drawings may not be to scale. Micron, the Micron logo, and all other Micron trademarks are the
property of Micron Technology, Inc. All other trademarks are the property of their respective owners.
Demand Forecasting with Machine Learning
Colin Ard
- 2.
Agenda
- Demand Forecasting at Micron
- Classical Time Series Analysis
- Building Predictive Models for Time Series with Machine Learning
- Demand Forecasting with Machine Learning Ensembles
- 3.
A Bit About Micron
- Founded in 1978 in Boise, ID
- First fabrication unit completed in 1980
- Growth through expansion and acquisition
- Global company with over 30,000 employees
- 4.
Demand Forecasting at Micron
- 5.
Data Complexities
- Scope: Tens of thousands of series requiring forecasting
- Scale: Consistent high demand vs sparse low demand
- Structure…
- 6.
Univariate Time Series Analysis
$Y_t = \alpha + \beta t + e_t, \quad e_t \sim \mathrm{Normal}(0, \sigma^2)$
Solve by minimizing the loss function $L(Y, \alpha, \beta)$:
$L(Y, \alpha, \beta) = \sum_{t=1}^{T} (Y_t - \alpha - \beta t)^2$ (squared error loss, where $Y_t - \alpha - \beta t$ is the residual)
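A minimal sketch of the trend fit, assuming NumPy and a simulated series (the values 10.0, 0.5, and the noise level are illustrative, not from the talk). The function names `fit_linear_trend` and `squared_error_loss` are hypothetical.

```python
import numpy as np

def fit_linear_trend(y):
    """Return (alpha, beta) minimizing sum_t (y_t - alpha - beta*t)^2 for t = 1..T."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1)
    design = np.column_stack([np.ones(len(y)), t])  # columns: intercept, time index
    (alpha, beta), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(alpha), float(beta)

def squared_error_loss(y, alpha, beta):
    """Sum of squared residuals of the linear trend model."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1)
    residuals = y - alpha - beta * t
    return float(np.sum(residuals ** 2))

# Simulated demand with a linear trend plus Normal(0, 1) noise (illustrative values).
rng = np.random.default_rng(0)
t = np.arange(1, 61)
y = 10.0 + 0.5 * t + rng.normal(0.0, 1.0, size=t.size)

alpha_hat, beta_hat = fit_linear_trend(y)
loss_at_fit = squared_error_loss(y, alpha_hat, beta_hat)
```

Any other choice of the two coefficients gives a loss at least as large as `loss_at_fit`, which is what "solve by minimizing" means here.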
- 7.
Univariate Time Series Analysis
$\mathrm{ACF}_h = \mathrm{Correlation}(Y_t, Y_{t-h})$
$\mathrm{Partial\ ACF}_h = \mathrm{Correlation}(Y_t, Y_{t-h} \mid Y_{t-1}, \ldots, Y_{t-h+1})$
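The two correlation diagnostics can be sketched as follows, assuming NumPy. The partial ACF is computed here by regressing both $Y_t$ and $Y_{t-h}$ on the intermediate lags and correlating the residuals, which is one standard way to condition on $Y_{t-1}, \ldots, Y_{t-h+1}$; the function names and the simulated AR(1) series are illustrative assumptions.

```python
import numpy as np

def acf(y, h):
    """Sample correlation between Y_t and Y_{t-h}."""
    y = np.asarray(y, dtype=float)
    if h == 0:
        return 1.0
    return float(np.corrcoef(y[h:], y[:-h])[0, 1])

def pacf(y, h):
    """Correlation of Y_t and Y_{t-h} after removing the effect of lags 1..h-1."""
    y = np.asarray(y, dtype=float)
    if h == 1:
        return acf(y, 1)
    target = y[h:]    # Y_t
    lagged = y[:-h]   # Y_{t-h}
    # Intercept plus the intermediate lags Y_{t-1}, ..., Y_{t-h+1}.
    between = np.column_stack(
        [np.ones(len(target))] + [y[h - k: len(y) - k] for k in range(1, h)]
    )

    def residual(v):
        coef, *_ = np.linalg.lstsq(between, v, rcond=None)
        return v - between @ coef

    return float(np.corrcoef(residual(target), residual(lagged))[0, 1])

# Simulated AR(1) series with coefficient 0.7 (illustrative).
rng = np.random.default_rng(1)
w = rng.normal(size=3000)
y = np.zeros(3000)
for t in range(1, 3000):
    y[t] = 0.7 * y[t - 1] + w[t]

acf_1, acf_2, pacf_2 = acf(y, 1), acf(y, 2), pacf(y, 2)
```

For an AR(1) series the ACF decays gradually with the lag while the partial ACF is close to zero beyond lag 1, which is the pattern used to choose AR and MA orders.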
- 8.
Univariate Time Series Analysis
Auto-regression (AR): $Y_t = \delta + \varphi Y_{t-1} + w_t$
Differencing: $\Delta Y_t = Y_t - Y_{t-1}$
Moving average (MA): $Y_t = \mu + \theta w_{t-1} + w_t$
[Figure panels: Non-stationary series; Difference: $Y_t \to \Delta Y_t$; AR(1); MA(1); Residuals from Differenced Series]
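As a rough illustration of differencing followed by an AR(1) fit, the sketch below simulates a non-stationary series, differences it, and estimates $\delta$ and $\varphi$ by ordinary least squares. The simulated coefficients (0.3 and 0.6), the OLS estimator, and the function names are assumptions made for the example; the talk does not say how its models were estimated.

```python
import numpy as np

def difference(y):
    """Delta Y_t = Y_t - Y_{t-1}."""
    y = np.asarray(y, dtype=float)
    return y[1:] - y[:-1]

def fit_ar1(x):
    """OLS estimate of (delta, phi) in x_t = delta + phi * x_{t-1} + w_t."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (delta, phi), *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    return float(delta), float(phi)

# Simulated non-stationary series whose differences follow an AR(1).
rng = np.random.default_rng(2)
n = 2000
d = np.zeros(n)
for t in range(1, n):
    d[t] = 0.3 + 0.6 * d[t - 1] + rng.normal()
y = np.cumsum(d)  # integrating the differences gives a trending level series

delta_hat, phi_hat = fit_ar1(difference(y))
```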
- 9.
Univariate Time Series Analysis
[Figure panels: Non-stationary series; Difference: $Y_t \to \Delta Y_t$; Residuals from ARIMA(1, 1, 1)]
- 10.
Univariate Time Series Analysis
[Figure panels: Non-stationary series; Difference: $Y_t \to \Delta Y_t$; Forecasted Value; Expected Value; Residuals from ARIMA(1, 1, 1)]
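To show how a forecast is produced from a differenced model, here is a simplified sketch that drops the MA term, so it is an ARIMA(1, 1, 0) rather than the ARIMA(1, 1, 1) on the slide. The function name `forecast_arima110` and the numeric inputs are hypothetical.

```python
import numpy as np

def forecast_arima110(y, delta, phi, steps):
    """Forecast `steps` periods ahead when Delta Y_t = delta + phi * Delta Y_{t-1} + w_t."""
    y = np.asarray(y, dtype=float)
    last_level = y[-1]
    last_diff = y[-1] - y[-2]
    forecasts = []
    for _ in range(steps):
        # Expected next difference: future noise terms have mean zero.
        last_diff = delta + phi * last_diff
        # Undo the differencing by adding the forecast change to the level.
        last_level = last_level + last_diff
        forecasts.append(last_level)
    return np.array(forecasts)

path = forecast_arima110([10.0, 12.0, 15.0], delta=1.0, phi=0.5, steps=2)
```

With these inputs the last observed change is 3, so the forecast changes are 2.5 and 2.25 and the forecast levels are 17.5 and 19.75.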
- 11.
A Machine Learning Approach
- 16.
The Bias-Variance Tradeoff
$L(Y, f; \gamma) = \sum_{i=1}^{N} (Y_i - f(X_i))^2 + \gamma \int (f''(s))^2 \, ds$
First term: squared error loss. Second term: complexity penalty. $\gamma \ge 0$: tuning parameter.
[Figure panels: $\gamma = \infty$; $0 < \gamma < \infty$; $\gamma = 0$]
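The effect of the tuning parameter can be sketched with a discrete analogue of the penalized loss, assuming equally spaced inputs: the integral of $(f'')^2$ is replaced by a sum of squared second differences of the fitted values, which gives a closed-form solution. This discretization and the name `penalized_fit` are assumptions for illustration, not the method from the talk.

```python
import numpy as np

def penalized_fit(y, gamma):
    """Minimize sum_i (y_i - f_i)^2 + gamma * sum_i (f_{i+1} - 2 f_i + f_{i-1})^2."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d2 = np.diff(np.eye(n), n=2, axis=0)  # (n-2) x n second-difference operator
    # Setting the gradient to zero gives (I + gamma * D'D) f = y.
    return np.linalg.solve(np.eye(n) + gamma * d2.T @ d2, y)

def roughness(f):
    """Sum of squared second differences (the discrete complexity penalty)."""
    return float(np.sum(np.diff(f, n=2) ** 2))

# Noisy sine curve on an equally spaced grid (illustrative data).
rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, size=x.size)

fits = {gamma: penalized_fit(y, gamma) for gamma in (0.0, 10.0, 1e6)}
```

With `gamma = 0` the fit reproduces the data exactly (low bias, high variance); as `gamma` grows the fit approaches a straight line (high bias, low variance).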
- 17.
[Diagram labels: Demand History; Naïve Forecast; Alternate Forecasts; Ensemble Forecast; Machine Learning]
- 18.
Ensembling Methods Pt. 1
Final pre-processing steps:
- Cumulative forecast totals
  - Model demand over the next 3 months, as opposed to demand 3 months from now
  - Separate ensemble models trained for each cumulative forecast span
- Feature sorting for suitability in modeling:
  - linear associations
  - interaction-dependent associations
- Feature/Outcome transformation and scaling…
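$\mathrm{Outcome}_i = \dfrac{\mathrm{Actual}_i - \mathrm{Naive}_i}{c + \mathrm{Actual}_i + \mathrm{Naive}_i}$
$\mathrm{Feature}_{im} = \dfrac{\mathrm{Forecast}_{im} - \mathrm{Naive}_i}{c + \mathrm{Forecast}_{im} + \mathrm{Naive}_i}$
Notation for the scaled change: $\Delta Y_i$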
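A small sketch of this scaling, assuming NumPy. The constant `c = 1.0` is a placeholder (the slide does not give its value), and `invert_scaled_change` is a hypothetical helper obtained by solving the same formula for the unscaled value.

```python
import numpy as np

def scaled_change(value, naive, c=1.0):
    """(value - naive) / (c + value + naive); lies in (-1, 1) for non-negative inputs and c > 0."""
    value = np.asarray(value, dtype=float)
    naive = np.asarray(naive, dtype=float)
    return (value - naive) / (c + value + naive)

def invert_scaled_change(delta, naive, c=1.0):
    """Solve delta = (value - naive) / (c + value + naive) for value."""
    delta = np.asarray(delta, dtype=float)
    naive = np.asarray(naive, dtype=float)
    return (naive * (1.0 + delta) + c * delta) / (1.0 - delta)

actual = np.array([150.0, 0.0, 1_000_000.0])
naive = np.array([100.0, 40.0, 900_000.0])

outcome = scaled_change(actual, naive)            # training target
recovered = invert_scaled_change(outcome, naive)  # back to quantity units
```

Because the result is bounded, high-volume and sparse low-volume series end up on a comparable scale, and a predicted scaled change can be mapped back to a quantity with the inverse.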
Stacked Generalization
Base models: $f_1(X_i),\ f_2(X_i),\ f_3(X_i),\ \ldots,\ f_M(X_i)$
$Y_i = F(X_i) \sim F\left(\Delta Y_i^{(1)}, \Delta Y_i^{(2)}, \ldots, \Delta Y_i^{(M)}, \ldots\right)$
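One way to picture stacked generalization: the scaled changes implied by $M$ base forecasts become the features of a second-level model that predicts the scaled change in actual demand. The sketch below uses a linear second-level model fitted by least squares purely for illustration; the talk does not state the form of $F$, and all names, the constant `c`, and the simulated data are assumptions.

```python
import numpy as np

def scaled_change(value, naive, c=1.0):
    return (value - naive) / (c + value + naive)

def fit_stacker(forecasts, naive, actual, c=1.0):
    """forecasts: (N, M) base forecasts. Returns intercept followed by M weights."""
    features = scaled_change(forecasts, naive[:, None], c)
    outcome = scaled_change(actual, naive, c)
    design = np.column_stack([np.ones(len(outcome)), features])
    weights, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return weights

def predict_stacker(weights, forecasts, naive, c=1.0):
    """Predict the scaled change, then map it back to a quantity."""
    features = scaled_change(forecasts, naive[:, None], c)
    delta = weights[0] + features @ weights[1:]
    return (naive * (1.0 + delta) + c * delta) / (1.0 - delta)

# Simulated naive forecasts, actuals, and two base forecasts of differing quality.
rng = np.random.default_rng(4)
n_obs = 300
naive = rng.uniform(50.0, 150.0, size=n_obs)
actual = naive * rng.uniform(0.7, 1.3, size=n_obs)
forecasts = np.clip(np.column_stack([
    actual + rng.normal(0.0, 10.0, size=n_obs),
    actual + rng.normal(0.0, 25.0, size=n_obs),
]), 0.0, None)

weights = fit_stacker(forecasts, naive, actual)
stacked = predict_stacker(weights, forecasts, naive)
```

In practice the second-level model would be fitted on out-of-sample base forecasts so that it does not reward base models that overfit.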
- 19.
Ensembling Methods Pt. 2
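[Regression tree figure: splits on Naïve Forecast: Qty (at 100K and 1M) and on $\Delta Y$ ARIMA (at -0.2), giving regions R1, R2, R3, R4]
$\Delta Y_i = L\left(\boldsymbol{R}, \Delta Y_i^{(1)}, \Delta Y_i^{(2)}, \ldots, \Delta Y_i^{(M)}, \ldots\right)$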
Boosting Algorithms
$Y_0^* = Y \to h_0(X) \to Y_1^* \to h_1(X) \to \cdots \to Y_M^* \to h_M(X) \to Y = F_M(X)$
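The boosting chain can be sketched with one-split regression trees (stumps) under squared error loss: each stage fits a stump to the current working response, subtracts a fraction of its prediction, and passes the remainder to the next stage. The learning rate, the number of stages, the stump learner, and the simulated data are assumptions for illustration; the talk does not specify them.

```python
import numpy as np

def fit_stump(x, y):
    """Best single split over all columns under squared error.
    Assumes at least one column has two or more distinct values."""
    best = None
    for col in range(x.shape[1]):
        values = np.unique(x[:, col])
        for threshold in (values[:-1] + values[1:]) / 2.0:
            left = x[:, col] <= threshold
            left_mean, right_mean = y[left].mean(), y[~left].mean()
            sse = np.sum((y[left] - left_mean) ** 2) + np.sum((y[~left] - right_mean) ** 2)
            if best is None or sse < best[0]:
                best = (sse, col, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    col, threshold, left_mean, right_mean = stump
    return np.where(x[:, col] <= threshold, left_mean, right_mean)

def boost(x, y, n_stages=50, learning_rate=0.5):
    working = np.asarray(y, dtype=float).copy()  # Y*_0 = Y
    stumps = []
    for _ in range(n_stages):
        stump = fit_stump(x, working)            # h_m fitted to Y*_m
        working -= learning_rate * predict_stump(stump, x)  # Y*_{m+1}
        stumps.append(stump)
    return stumps

def predict_boosted(stumps, x, learning_rate=0.5):
    """F_M(X): the shrunken sum of all stage predictions."""
    return learning_rate * sum(predict_stump(s, x) for s in stumps)

# Simulated two-feature data (illustrative).
rng = np.random.default_rng(6)
x = rng.uniform(size=(200, 2))
y = np.sin(4.0 * x[:, 0]) + 0.5 * (x[:, 1] > 0.5) + rng.normal(0.0, 0.1, size=200)

mse = {}
for m in (5, 50):
    stumps = boost(x, y, n_stages=m)
    mse[m] = float(np.mean((y - predict_boosted(stumps, x)) ** 2))
```

Each stump partitions the inputs into two regions, the simplest case of the region-based tree shown on the slide; training error falls as stages are added, so the number of stages acts as a complexity control.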
- 20.
Ensemble Methods Pt. 3
Bootstrap Aggregation
Training Data: $T_1, T_2, T_3, \ldots, T_B$
Bagged Estimate: $Y_i = \dfrac{1}{B} \sum_{b=1}^{B} Y_i^{b}$
Forecasting $h$ months ahead from month $T$
Candidate model inputs for Product $i$: $(X_{i1}, Y_{i1}), (X_{i2}, Y_{i2}), \ldots, (X_{i,T-h}, Y_{i,T-h})$
Sample: $(X_{i s_i}, Y_{i s_i})$
Full sample at $b$th training iteration: $(X_{1 s_1}, Y_{1 s_1}), (X_{2 s_2}, Y_{2 s_2}), \ldots, (X_{i s_i}, Y_{i s_i}), \ldots, (X_{N s_N}, Y_{N s_N})$
Goal:
- A generalizable model for change in demand
Challenges:
- Have to assume the fundamentals that drove historical demand are likely to fluctuate over time
- Limited to validation of the algorithm rather than testing of predictions from a specific trained model
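The sampling scheme can be sketched as follows: at each of $B$ iterations one candidate month $s_i$ is drawn per product, a model is fitted to the resulting $N$ pairs, and the $B$ predictions are averaged. The linear least-squares base learner, the array shapes, and the simulated data are assumptions made for the example.

```python
import numpy as np

def bagged_predict(x_hist, y_hist, x_new, n_bags=200, seed=0):
    """x_hist: (N, S, P) features for N products over S = T - h candidate months;
    y_hist: (N, S) outcomes; x_new: (N, P) features to score.
    Returns the average prediction over n_bags resampled fits."""
    rng = np.random.default_rng(seed)
    n_products, n_months, _ = x_hist.shape
    rows = np.arange(n_products)
    design_new = np.column_stack([np.ones(n_products), x_new])
    total = np.zeros(n_products)
    for _ in range(n_bags):
        s = rng.integers(0, n_months, size=n_products)  # one month s_i per product
        design = np.column_stack([np.ones(n_products), x_hist[rows, s]])
        coef, *_ = np.linalg.lstsq(design, y_hist[rows, s], rcond=None)
        total += design_new @ coef                      # prediction from bag b
    return total / n_bags                               # (1/B) * sum over bags

# Simulated history for 100 products, 12 candidate months, 2 features.
rng = np.random.default_rng(5)
n_products, n_months, n_features = 100, 12, 2
x_hist = rng.normal(size=(n_products, n_months, n_features))
y_hist = 2.0 * x_hist[..., 0] - x_hist[..., 1] + rng.normal(0.0, 0.1, size=(n_products, n_months))
x_new = rng.normal(size=(n_products, n_features))

pred = bagged_predict(x_hist, y_hist, x_new, n_bags=50)
```

Drawing a different month for each product at every iteration means no single fit depends on one period's conditions, which suits the goal of a generalizable model.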
- 21.
Forecast Accuracy Metrics and Validation
$\mathrm{wMAPE} = 100 \times \dfrac{\sum |\mathrm{Actual} - \mathrm{Forecast}|}{\sum \mathrm{Actual}}$
$\mathrm{Accuracy} = 100 - \mathrm{wMAPE}$
- Training Data: $(X_{is}, Y_{is}),\ s \in \{1, \ldots, T-h\}$. Predict. Validation Data: $(X_{iT}, Y_{iT})$
- Training Data: $(X_{is}, Y_{is}),\ s \in \{2, \ldots, T-h+1\}$. Predict. Validation Data: $(X_{i,T+1}, Y_{i,T+1})$
- Training Data: $(X_{is}, Y_{is}),\ s \in \{3, \ldots, T-h+2\}$. Predict. Validation Data: $(X_{i,T+2}, Y_{i,T+2})$
- …
- Training Data: $(X_{is}, Y_{is}),\ s \in \{K+1, \ldots, T-h+K\}$. Predict. Validation Data: $(X_{i,T+K}, Y_{i,T+K})$
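A short sketch of the metric and the rolling validation windows, assuming the usual weighted-MAPE definition (sum of absolute errors divided by the sum of actuals). The function names and the example values of `T`, `h`, and `K` are hypothetical.

```python
import numpy as np

def wmape(actual, forecast):
    """Weighted MAPE in percent: 100 * sum|actual - forecast| / sum(actual)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(100.0 * np.sum(np.abs(actual - forecast)) / np.sum(actual))

def accuracy(actual, forecast):
    return 100.0 - wmape(actual, forecast)

def rolling_folds(T, h, K):
    """Yield (training months, validation month) for folds k = 0..K, 1-indexed.
    Fold k trains on months k+1 .. T-h+k and validates on month T+k."""
    for k in range(K + 1):
        yield range(k + 1, T - h + k + 1), T + k

example_wmape = wmape([100.0, 200.0], [110.0, 180.0])
folds = list(rolling_folds(T=24, h=3, K=2))
```

Every fold keeps the same training length and the same gap of `h` months between the last training month and the validation month, so each fold mimics a real forecast made `h` months ahead.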
- 22.
Forecast Accuracy Validation
[Figure panels: Lag-1; Lag-2 (Cumulative); Lag-3 (Cumulative)]
- 23.
Conclusions
- Significant gains in forecast accuracy across lags
- Understand the challenge and play to your strengths
  - Business processes, available data, and goals of the analysis
  - Institutional knowledge and human expertise
- Acknowledgements
  - Micron Enterprise Data Science and Demand Management Teams