ANN ARIMA Hybrid Models for Time Series Prediction
Mohamed Baddar
Senior Data Scientist at Careem Networks GmbH
mohamed.baddar@careem.com
Hybrid Linear and Non-Linear Models for Time Series Prediction - MarketPlace Case Study
Agenda
• Problem Statement
• Notation
• Motivation
• Background
• Hybrid Model
• Marketplace Case Study: Supply Prediction for P2P Ride Sharing
• Model Pitfalls and Possible Improvements
• Questions
Problem Statement (MarketPlace Objective)
• Customer-side objective: a reliable ride-sharing service
  • Reliability means that whenever a customer asks for a ride, they find a captain
• Captain-side objective: high utilization
  • A captain receives requests immediately after declaring himself “free”
• The core task for achieving these objectives is to predict supply (number of free captains) and demand (number of bookings) at each location and time instance
• If a significant gap between supply and demand is found, we can fill it by increasing supply
  • One way is to apply surge to incentivize captains to move to the areas and hours where this gap is expected
• We need to be proactive: predict the problem and act before it happens
• Supply prediction can help significantly with this problem
Notation
Y      Supply
X      Surge (peak) types
T      Time patterns (trend, seasonality)
TS     Time series
ARIMA  Autoregressive Integrated Moving Average: a model for time series forecasting
NN     Neural network
NL     Non-linear model
L      Linear model
E      White-noise error
Motivation
Currently implemented algorithms
• TS forecasting (ES, ARIMA): focuses on TS patterns; ARIMA with covariates assumes a linear relationship between X and Y
• Machine learning models (CART, NN): capture non-linearity between external factors and the predicted quantity, but don't focus on TS patterns
One possible solution? Hybridization
• A hybrid model that captures both the non-linearities between Y and X and the time series patterns T
• Inspired by how ARIMA with covariates is designed
Neural Networks
• Non-linear model: the output of each layer is a combination of the functions in the previous layers. Functions are categorized into propagation, activation, and output functions
• Parameters
  • Number of hidden layers
  • Number of neurons in each layer
• (+) Captures complex non-linear patterns
• (-) Slow training, not interpretable
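The three function types can be illustrated with a minimal forward pass. This is only a sketch, assuming a single hidden layer, a logistic activation, and an identity output function; the weights and function names are hypothetical and not taken from the case study.

```python
import math

def propagate(inputs, weights, bias):
    """Propagation function: weighted sum of the previous layer's outputs."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def logistic(z):
    """Activation function: squashes the propagated value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden, output):
    """One hidden layer followed by a single linear output neuron.

    hidden: list of (weights, bias) pairs, one per hidden neuron
    output: a single (weights, bias) pair for the output neuron
    """
    h = [logistic(propagate(x, w, b)) for w, b in hidden]
    out_w, out_b = output
    return propagate(h, out_w, out_b)  # output function: identity (regression)

# Hypothetical weights for a network with 2 inputs and 2 hidden neurons
hidden = [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)]
output = ([2.0, -2.0], 0.5)
y_hat = forward([1.0, 1.0], hidden, output)
```

The number of hidden layers and the number of neurons per layer correspond to the length and shape of `hidden` here; training (for example backpropagation) would adjust the weights and is not shown.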
ARIMA
ARMA(p,q)
• The quantity is modeled as a linear function of its previous values and previous fit errors
• Data must be stationary (mean and variance don't change over time)
• More complex models are used to capture seasonality
• Can be combined with regression to capture the effect of external factors on the time series
• (+) Captures time series patterns: ARMA structure and seasonality
• (-) Assumes linear relationships
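For reference, the standard ARMA(p,q) form can be written out as follows. The symbols are introduced here for illustration and are not from the slides: y_t is the stationary series, c a constant, \phi_i the autoregressive coefficients, \theta_j the moving-average coefficients, and \varepsilon_t the white-noise error (E in the notation above). The second line is one common way of combining regression with ARMA, where x_t holds the external factors and the regression error \eta_t follows an ARMA(p,q) process.

```latex
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t

y_t = \beta^{\top} x_t + \eta_t, \qquad
\eta_t = \sum_{i=1}^{p} \phi_i \, \eta_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
```

Both forms are linear in the past values, past errors, and covariates, which is the limitation marked (-) above.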
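Stationarity and differencing
Data is not (weakly) stationary if its mean or variance varies over time
* Differencing (together with a log transformation) is used to make the data stationary
[Figure: three panels: 1 - Input data; 2 - Log transformation; 3 - Seasonal and lag differencing]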
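The transformation steps named in the figure panels can be sketched in a few lines. This is a minimal illustration that assumes strictly positive values (the log is undefined otherwise); the function names and the toy series are hypothetical.

```python
import math

def log_transform(series):
    """Log transformation; helps stabilize a variance that grows with the level."""
    return [math.log(v) for v in series]

def difference(series, lag=1):
    """Lag differencing: d[t] = y[t] - y[t - lag]; the result is `lag` values shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def stationarize(series, season=None):
    """Log transform, then optional seasonal differencing, then lag-1 differencing."""
    out = log_transform(series)
    if season is not None:
        out = difference(out, lag=season)
    return difference(out, lag=1)

# Toy series: exponential growth with a period-4 multiplicative seasonal pattern
seasonal = [1.0, 2.0, 1.5, 0.5]
y = [math.exp(0.1 * t) * seasonal[t % 4] for t in range(16)]
z = stationarize(y, season=4)  # trend and seasonality are removed; z is flat
```

On this toy series the log turns the multiplicative pattern into an additive one, the seasonal difference removes the period-4 pattern, and the lag-1 difference removes what is left of the trend.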
Hybrid Model (PoC)
[Diagram: inputs (Y, X); T-D block; NL (NN) and L (ARIMA) models; outputs NL.Fitted and L.Fitted; error E]
Y = NL(X) + L(X) + E
* T-D: transformation and differencing
* Applied to both Y and X to preserve interpretability
Hybrid Model Building (PoC)
Transform and stationarize (Y, X) via log transformation and differencing, if necessary
NL_M = NULL   // non-linear model
L_M = NULL    // linear model
RMSE = Inf
L = 0's       // the initial linear component is assumed to be all zeros
while (iteration < max iterations AND delta(RMSE) > threshold)
    NL = Y - L                        // target for the non-linear model
    NL_M <- build NN (NL ~ X)         // fit the NN on X using the NL data
    K = Y - NL.fitted                 // remainder from NL_M
    L_M <- build ARMA(p,q) given K
    // If L_M is not a null model (e.g. p = q = 0), hybridization was actually needed
    Y.fitted = NL.fitted + L.fitted   // assuming E is zero-mean white noise
    E = Y - Y.fitted
    sanity check => E is white noise
    // updating phase
    RMSE = RMSE_calc(E)
    calculate delta(RMSE)
    L = L.fitted
* RMSE convergence means that the NL and L models have become stable
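The loop can be made concrete with a small runnable sketch. To keep it dependency-free, two stand-ins are swapped in, and they are assumptions rather than the models used in the talk: a per-category mean lookup replaces the NN, and a zero-mean AR(1) fitted by least squares replaces ARMA(p,q). All function names and the toy data are hypothetical, and the white-noise sanity check is omitted.

```python
import math
from collections import defaultdict

def fit_group_means(x, target):
    """Stand-in for NL_M: predicts the mean target for each level of a categorical x."""
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, ti in zip(x, target):
        sums[xi] += ti
        counts[xi] += 1
    means = {level: sums[level] / counts[level] for level in sums}
    return [means[xi] for xi in x]

def fit_ar1(k):
    """Stand-in for L_M: zero-mean AR(1) fitted by least squares on the remainder k."""
    num = sum(k[t] * k[t - 1] for t in range(1, len(k)))
    den = sum(k[t - 1] ** 2 for t in range(1, len(k)))
    phi = num / den if den > 0 else 0.0
    fitted = [0.0] + [phi * k[t - 1] for t in range(1, len(k))]
    return phi, fitted

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def fit_hybrid(y, x, max_iter=3, threshold=1e-6):
    """Alternate NL and L fits until RMSE stops changing or max_iter is reached."""
    l_fitted = [0.0] * len(y)                                   # L = 0's
    prev_rmse, history = math.inf, []
    for _ in range(max_iter):
        nl_target = [yi - li for yi, li in zip(y, l_fitted)]    # NL = Y - L
        nl_fitted = fit_group_means(x, nl_target)
        k = [yi - ni for yi, ni in zip(y, nl_fitted)]           # K = Y - NL.fitted
        _phi, l_fitted = fit_ar1(k)
        e = [yi - ni - li for yi, ni, li in zip(y, nl_fitted, l_fitted)]
        cur = rmse(e)
        history.append(cur)
        if abs(prev_rmse - cur) <= threshold:                   # delta(RMSE) too small
            break
        prev_rmse = cur
    return nl_fitted, l_fitted, history

# Toy data: a level per category plus an autocorrelated disturbance (illustrative only)
x = [t % 4 for t in range(40)]
level = {0: 10.0, 1: 20.0, 2: 15.0, 3: 5.0}
noise = [0.0]
for t in range(1, 40):
    noise.append(0.8 * noise[-1] + math.sin(7.0 * t))
y = [level[xi] + ni for xi, ni in zip(x, noise)]
nl_fitted, l_fitted, history = fit_hybrid(y, x)
```

`history` holds the RMSE after each iteration. With these stand-ins, the first hybrid RMSE can never exceed the RMSE of the non-linear stand-in alone, because the AR(1) coefficient is chosen by least squares on its remainder; later iterations are not guaranteed to keep improving.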
Marketplace Case Study (1)
MarketPlace Case Study (3)
Data Description
• Data is partitioned by zone (for example Berlin Mitte, Dubai Al Barsha)
• For each zone, data is aggregated at a time granularity level (hour, 15 or 30 min)
• A time series is created for the supply level of each zone at each time window
• The hybrid model is applied to model the relation of supply with time and surge, and also to predict future values under different surge values. It works like a what-if analysis tool
Model implementation
• If needed, seasonal differencing (frequency = 4*24) followed by lag-1 differencing
• A neural network trained with backpropagation is used as the non-linear model
Y <- NN(dow, Hour, Minute, Surge_1, Surge_2)
Y = (average) number of captains in a zone and time window
dow => day of week
Hour => factor with 24 levels
Minute => 4-level factor: 0, 15, 30, 45
Surge_1, Surge_2 => different types of surge (peak)
• Number of neurons per layer = 10, 1 layer
• ARMA model as the linear model, with max p, q = 5
• Model applied for each zone in each city
• Maximum number of hybrid model iterations = 3
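One way to turn a time window into the NN inputs listed above is indicator (one-hot) encoding of each factor. This is an assumption about the encoding, not something the slides specify; the function names and the sample timestamp are hypothetical.

```python
from datetime import datetime

def one_hot(index, size):
    """Indicator vector with a 1 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def encode_window(ts, surge_1, surge_2):
    """Feature vector for one 15-minute window: dow, hour, minute, then both surge values."""
    if ts.minute % 15 != 0:
        raise ValueError("expected a window starting at minute 0, 15, 30 or 45")
    return (
        one_hot(ts.weekday(), 7)        # day of week, Monday = 0
        + one_hot(ts.hour, 24)          # hour: factor with 24 levels
        + one_hot(ts.minute // 15, 4)   # minute: 4-level factor (0, 15, 30, 45)
        + [float(surge_1), float(surge_2)]
    )

features = encode_window(datetime(2017, 5, 1, 18, 30), surge_1=1.5, surge_2=0.0)
```

With 15-minute windows there are 4 * 24 = 96 observations per day, which matches the seasonal differencing frequency given above.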
Model Accuracy and diagnostics
In-sample data performance
• E = Y - Y.fitted
Y.fitted = NN.fitted + L.fitted
• NN built with 1 layer of 10 neurons
• Sample ARMA model for one of the datasets: ARMA(4,2)
• Accuracy on 5 zones:
  • Average RMSE for NN only = 39
  • Average RMSE for NN+ARIMA = 32
  • Improvement = approx. 18%
[Figure: diagnostic plot labeled "White noise"]
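The accuracy figures and the white-noise check can be reproduced with a few helper functions. This is a sketch under stated assumptions: the helper names are hypothetical, and the white-noise test is a simple heuristic that checks whether the sample autocorrelations of E stay within the approximate 95% band of ±1.96/sqrt(n). It is not necessarily the diagnostic used in the talk.

```python
import math

def rmse(errors):
    """Root mean squared error of the residuals E = Y - Y.fitted."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def relative_improvement(baseline, candidate):
    """Fractional RMSE reduction of `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline

def autocorrelation(errors, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(errors)
    mean = sum(errors) / n
    denom = sum((e - mean) ** 2 for e in errors)
    num = sum((errors[t] - mean) * (errors[t - lag] - mean) for t in range(lag, n))
    return num / denom

def looks_like_white_noise(errors, max_lag=10):
    """Heuristic: every autocorrelation up to max_lag lies within +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(errors))
    return all(abs(autocorrelation(errors, k)) <= bound for k in range(1, max_lag + 1))

# RMSE figures reported above: NN only = 39, NN+ARIMA = 32
improvement = relative_improvement(39.0, 32.0)  # (39 - 32) / 39
```

The improvement works out to 7/39, about 17.9%, which agrees with the reported figure of approximately 18%. Because several lags are checked at once, the band test will occasionally reject genuine white noise, so it is best read as a quick screen.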
Remaining Work after first PoC
• On the algorithmic side
  • Formal verification of the hybridization method
  • Experimenting with other NL models (CART, GBM, RF) and L models for TS (ARIMAX, transfer functions)
  • Further analysis of algorithm convergence
  • Explore modifying the core NN optimization to adapt to AR and MA patterns in the error
• On implementation
  • With the R neuralnet package, the NN sometimes fails to build because the training algorithm doesn't converge
  • Scaling the method to more zones and experimenting on more datasets
• On accuracy measures
  • Cross-validation for the NN and rolling origin for ARIMA (as a kind of unit testing)
  • Cross-validation and rolling origin for the hybrid model
This is still WIP!
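Rolling-origin evaluation, mentioned under accuracy measures, can be sketched as an index generator. This assumes an expanding training window; the function and parameter names are hypothetical.

```python
def rolling_origin_splits(n, initial, horizon=1, step=1):
    """Yield (train_indices, test_indices) pairs with an expanding training window.

    n: series length; initial: size of the first training window;
    horizon: number of points forecast per origin; step: how far the origin moves.
    """
    origin = initial
    while origin + horizon <= n:
        yield list(range(origin)), list(range(origin, origin + horizon))
        origin += step

splits = list(rolling_origin_splits(n=10, initial=6, horizon=2, step=2))
```

Each split trains only on observations that precede the forecast window, so the temporal order is respected, unlike randomly shuffled cross-validation folds.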
Feedback from audience
• Apply LSTM RNN for time series prediction
• More complex NN
• More complex TS models
• Estimator variance, stability, multiple initial models to avoid local optima
• Revisit algorithm convergence
Questions
We are Hiring!
Shukran! Thank you! Danke Schön
