SlideShare a Scribd company logo
1 of 19
Download to read offline
HOW TO BUILD
ADVANCED
PREDICTION WITH
ADDING
EXTERNAL DATA
Taras Firman, Ph.D.,
Senior Data Scientist
M:
Advantages – Fast to compute, easier to model, easier to identify changes in trends, better for strategic long term forecasting.
Disadvantages – If you need to plan as the daily level for capacity, people and spoilage of product then higher levels of forecasting
won’t help understand the demand on a daily basis as a 1/30th ratio estimate is clearly insufficient.
W:
Advantages – When you can’t handle the modeling process at a daily level you “settle” for this. When you have very systematic cyclical
cycles like “articice extents” that follow a rigid curve and not need for day of the week variations.
Disadvantages – Floating Holidays like Thanksgiving, Easter, Ramadan, Chinese New Year change every year and disrupt the estimate for
the coefficients for the week of the year impact which can be handled by creating a variable for each.
D:
Advantages – Weekly data can’t deal with holidays and their lead/lag relationships. If a holiday has days 1,2,3 before the holiday as very
large volume a daily model can forecast that while the weekly won’t be able year in and year out model and forecast that impact as the
day of the week that the holiday occurs changes every year.
Disadvantages – Slower to process, but this can be mitigated by reusing models.
Monthly VS weekly VS daily
Prediction approaches
● Frequency domain ● Machine learning● Time domain
Forecasting’s short history
● Generic models
(Moving Average Process MA(q), Exp Smoothing, Autoregressive Process AR(p), Autoregressive
Moving Average ARMA(p, q), Autoregressive Integrated Moving Average ARIMA (p, d, q))
● State Space models and Kalman Filter
● Multivariate vector models
● Feature extraction & ML
● DL approaches
(LSTM Recurrent Neural Networks)
Interpolation and extrapolation
Retail data. Substitutes. Categories
Decomposition tactic
Trend aproximation
The approximation of the trend can be found from the formula below
where Pn
(t) is a degree polynomial and Ak
is a set of indexes, including the first k indexes
with highest amplitudes.
Seasonality VS Cycles
Canadian lynx data
Aperiodic population cycles of approximately 10 years
Monthly sales of new one-family houses sold in USA
Strong seasonality within each year and strong cycles with
period 6-10 years
Half-hourly electricity demand in England
Multi-seasonality with daily and weekly patterns
Calendar
● Holidays
● Vacation
● School vacation
● Fasting and Abstinence
● Festivals
● Shopping holiday
External sources. APIs
Wunderground API
https://www.wunderground.com/weather/api/
Google trends
https://trends.google.com/
OfficeHolidays
http://www.officeholidays.com/
HolidayCalendar
https://holidaycalendar.com/
10Times
https://10times.com
OfficeHolidays
Feature engineering for residuals
● One-hot encoding
● Counting
● Statistical moments
● Percentiles
● Lags
● Logs
● Peaks
● Least-squares spectral analysis
● Nonlinear transformations Factor analysis!
Correlation types
● Pearson correlation is statistic to measure the degree of the relationship between
linearly related variables.
Assumptions: both variables should be normally distributed and have linearity
and homoscedasticity relationship (normally distributed about the regression line)
● Spearman rank correlation is non-parametric test that is used to measure the degree
of association between two variables.
Assumptions: it doesn’t make any assumptions about the distribution.
● Kendall tau is a statistic used to measure the ordinal association between two
measured quantities.
Assumptions: data must be at least ordinal and scores on one variable must
be montonically related to the other variable.
where
where s1
/s2
is number of
concordant/discordant pairs
How to work with a short history?
Stochastic Simulation (Monte-Carlo)
Predicting the Past and Predicting
the Future
Error measuring
is an accuracy measure based on percentage (or relative) errors. One
supposed problem with SMAPE is that it is not symmetric since over- and
under-forecasts are not treated equally.
is scale-dependent
is scale-dependent
is the computed average of percentage errors. The formula can be
used as a measure of the bias in the forecasts
usually expresses accuracy as a percentage. It puts a heavier penalty
on negative errors, than on positive errors.
Robustness. Model selection
If n/k < 40:
where
- the set of model parameters;
- the likelihood of the candidate model given the data;
- the number of estimated parameters in the candidate model;
- the number of observations.
Existing solutions with Python
Pandas Statsmodels Scikit-learn
XGBoost PyFlux Prophet
PyAF TensorFlow Cesium
Usage
● Capacity planning
● Utilization maximization
● Cost minimization
● Dynamic pricing
● Supply chain management
Inspired by Technology.
Driven by Value.
Find us at eleks.com
Have a question? Write to eleksinfo@eleks.com
Taras Firman
email: taras.firman@eleks.com
skype: tarasinho_318
AI&BigData 2017
4 November, Lviv

More Related Content

What's hot

Calculus in Machine Learning
Calculus in Machine Learning Calculus in Machine Learning
Calculus in Machine Learning Gokul Jayan
 
Quality management in nursing
Quality management in nursingQuality management in nursing
Quality management in nursingselinasimpson1401
 
Intro to Forecasting - Part 3 - HRUG
Intro to Forecasting - Part 3 - HRUGIntro to Forecasting - Part 3 - HRUG
Intro to Forecasting - Part 3 - HRUGegoodwintx
 
Types of variables and descriptive statistics
Types of variables and descriptive statisticsTypes of variables and descriptive statistics
Types of variables and descriptive statisticsDhritiman Chakrabarti
 
Quality management education
Quality management educationQuality management education
Quality management educationselinasimpson2001
 
UsingAHP_02
UsingAHP_02UsingAHP_02
UsingAHP_02pbaxter
 
Ses 1 basic fundamentals of mathematics and statistics
Ses 1 basic fundamentals of mathematics and statisticsSes 1 basic fundamentals of mathematics and statistics
Ses 1 basic fundamentals of mathematics and statisticsmetnashikiom2011-13
 
Levels of measurement
Levels of measurementLevels of measurement
Levels of measurementdebmahuya
 
A.1 properties of point estimators
A.1 properties of point estimatorsA.1 properties of point estimators
A.1 properties of point estimatorsUlster BOCES
 
Adaptive short term forecasting
Adaptive short term forecastingAdaptive short term forecasting
Adaptive short term forecastingAlex
 
Displaying and describing categorical data
Displaying and describing categorical dataDisplaying and describing categorical data
Displaying and describing categorical dataOlivia Dombrowski
 
Hrug intro to forecasting
Hrug intro to forecastingHrug intro to forecasting
Hrug intro to forecastingegoodwintx
 
Www.iso 9001
Www.iso 9001Www.iso 9001
Www.iso 9001jomritagu
 

What's hot (20)

Home quality management
Home quality managementHome quality management
Home quality management
 
Calculus in Machine Learning
Calculus in Machine Learning Calculus in Machine Learning
Calculus in Machine Learning
 
Quality management in nursing
Quality management in nursingQuality management in nursing
Quality management in nursing
 
Forecasting
ForecastingForecasting
Forecasting
 
Intro to Forecasting - Part 3 - HRUG
Intro to Forecasting - Part 3 - HRUGIntro to Forecasting - Part 3 - HRUG
Intro to Forecasting - Part 3 - HRUG
 
Project management quality
Project management qualityProject management quality
Project management quality
 
Types of variables and descriptive statistics
Types of variables and descriptive statisticsTypes of variables and descriptive statistics
Types of variables and descriptive statistics
 
Quality management education
Quality management educationQuality management education
Quality management education
 
UsingAHP_02
UsingAHP_02UsingAHP_02
UsingAHP_02
 
Ses 1 basic fundamentals of mathematics and statistics
Ses 1 basic fundamentals of mathematics and statisticsSes 1 basic fundamentals of mathematics and statistics
Ses 1 basic fundamentals of mathematics and statistics
 
Levels of measurement
Levels of measurementLevels of measurement
Levels of measurement
 
A.1 properties of point estimators
A.1 properties of point estimatorsA.1 properties of point estimators
A.1 properties of point estimators
 
Quality management policy
Quality management policyQuality management policy
Quality management policy
 
Adaptive short term forecasting
Adaptive short term forecastingAdaptive short term forecasting
Adaptive short term forecasting
 
Displaying and describing categorical data
Displaying and describing categorical dataDisplaying and describing categorical data
Displaying and describing categorical data
 
Probability and statistics
Probability and statisticsProbability and statistics
Probability and statistics
 
Quality management posters
Quality management postersQuality management posters
Quality management posters
 
Hrug intro to forecasting
Hrug intro to forecastingHrug intro to forecasting
Hrug intro to forecasting
 
Quality management journal
Quality management journalQuality management journal
Quality management journal
 
Www.iso 9001
Www.iso 9001Www.iso 9001
Www.iso 9001
 

Similar to Taras Firman "How to build advanced prediction with adding external data."

KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfDr. Radhey Shyam
 
Data What Type Of Data Do You Have V2.1
Data   What Type Of Data Do You Have V2.1Data   What Type Of Data Do You Have V2.1
Data What Type Of Data Do You Have V2.1TimKasse
 
Assessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generateAssessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generateDaniel Koh
 
Assessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIFAssessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIFDaniel Koh
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVADerek Kane
 
Storm Prediction data analysis using R/SAS
Storm Prediction data analysis using R/SASStorm Prediction data analysis using R/SAS
Storm Prediction data analysis using R/SASGautam Sawant
 
HRUG - Linear regression with R
HRUG - Linear regression with RHRUG - Linear regression with R
HRUG - Linear regression with Regoodwintx
 
IEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao LinIEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao LinMinchao Lin
 
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfAIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfMargiShah29
 
Classification via Logistic Regression
Classification via Logistic RegressionClassification via Logistic Regression
Classification via Logistic RegressionTaweh Beysolow II
 
Statistics for Managers pptS for better understanding
Statistics for Managers pptS for better understandingStatistics for Managers pptS for better understanding
Statistics for Managers pptS for better understandingShamshadAli58
 
Machine Learning statistical model using Transportation data
Machine Learning statistical model using Transportation dataMachine Learning statistical model using Transportation data
Machine Learning statistical model using Transportation datajagan477830
 
Weather forecasting model.pptx
Weather forecasting model.pptxWeather forecasting model.pptx
Weather forecasting model.pptxVisheshYadav12
 
Pentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIPentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIStudio Synthesis
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMSAli T. Lotia
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdfBeyaNasr1
 
[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...
[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...
[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...경록 박
 
Load_Forecastinglfviuguuyihonrekgdbgr.pptx
Load_Forecastinglfviuguuyihonrekgdbgr.pptxLoad_Forecastinglfviuguuyihonrekgdbgr.pptx
Load_Forecastinglfviuguuyihonrekgdbgr.pptxDEEPAKCHAURASIYA37
 

Similar to Taras Firman "How to build advanced prediction with adding external data." (20)

KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdf
 
Data What Type Of Data Do You Have V2.1
Data   What Type Of Data Do You Have V2.1Data   What Type Of Data Do You Have V2.1
Data What Type Of Data Do You Have V2.1
 
Assessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generateAssessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generate
 
Assessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIFAssessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIF
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVA
 
Storm Prediction data analysis using R/SAS
Storm Prediction data analysis using R/SASStorm Prediction data analysis using R/SAS
Storm Prediction data analysis using R/SAS
 
HRUG - Linear regression with R
HRUG - Linear regression with RHRUG - Linear regression with R
HRUG - Linear regression with R
 
IEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao LinIEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao Lin
 
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfAIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
 
Classification via Logistic Regression
Classification via Logistic RegressionClassification via Logistic Regression
Classification via Logistic Regression
 
Qt unit i
Qt unit   iQt unit   i
Qt unit i
 
Statistics for Managers pptS for better understanding
Statistics for Managers pptS for better understandingStatistics for Managers pptS for better understanding
Statistics for Managers pptS for better understanding
 
Machine Learning statistical model using Transportation data
Machine Learning statistical model using Transportation dataMachine Learning statistical model using Transportation data
Machine Learning statistical model using Transportation data
 
FinalReport
FinalReportFinalReport
FinalReport
 
Weather forecasting model.pptx
Weather forecasting model.pptxWeather forecasting model.pptx
Weather forecasting model.pptx
 
Pentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIPentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BI
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMS
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 
[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...
[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...
[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...
 
Load_Forecastinglfviuguuyihonrekgdbgr.pptx
Load_Forecastinglfviuguuyihonrekgdbgr.pptxLoad_Forecastinglfviuguuyihonrekgdbgr.pptx
Load_Forecastinglfviuguuyihonrekgdbgr.pptx
 

More from DataConf

Sergiy Lunyakin "Cloud BI with Azure Analysis Services"
Sergiy Lunyakin "Cloud BI with Azure Analysis Services"Sergiy Lunyakin "Cloud BI with Azure Analysis Services"
Sergiy Lunyakin "Cloud BI with Azure Analysis Services"DataConf
 
Sergiy Lunyakin "Azure SQL DWH: Tips and Tricks for developers"
Sergiy Lunyakin "Azure SQL DWH: Tips and Tricks for developers"Sergiy Lunyakin "Azure SQL DWH: Tips and Tricks for developers"
Sergiy Lunyakin "Azure SQL DWH: Tips and Tricks for developers"DataConf
 
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"DataConf
 
Juriy Zaletsky "Використання Encog для прогнозування коливання курсів валют"
Juriy Zaletsky "Використання Encog для прогнозування коливання курсів валют"Juriy Zaletsky "Використання Encog для прогнозування коливання курсів валют"
Juriy Zaletsky "Використання Encog для прогнозування коливання курсів валют"DataConf
 
Oles Petriv "Semantic image segmentation using word embeddings."
Oles Petriv "Semantic image segmentation using word embeddings."Oles Petriv "Semantic image segmentation using word embeddings."
Oles Petriv "Semantic image segmentation using word embeddings."DataConf
 
Anastasiya Kaminskaya "How to optimize Tabular model in PowerPivot or in Anal...
Anastasiya Kaminskaya "How to optimize Tabular model in PowerPivot or in Anal...Anastasiya Kaminskaya "How to optimize Tabular model in PowerPivot or in Anal...
Anastasiya Kaminskaya "How to optimize Tabular model in PowerPivot or in Anal...DataConf
 
Vitalii Bashun "First Spark application in one hour"
Vitalii Bashun "First Spark application in one hour"Vitalii Bashun "First Spark application in one hour"
Vitalii Bashun "First Spark application in one hour"DataConf
 
Vitalii Bondarenko "Machine Learning on Fast Data"
Vitalii Bondarenko "Machine Learning on Fast Data"Vitalii Bondarenko "Machine Learning on Fast Data"
Vitalii Bondarenko "Machine Learning on Fast Data"DataConf
 
Volodymyr Getmanskyi "Deep learning for satellite imagery colorization and di...
Volodymyr Getmanskyi "Deep learning for satellite imagery colorization and di...Volodymyr Getmanskyi "Deep learning for satellite imagery colorization and di...
Volodymyr Getmanskyi "Deep learning for satellite imagery colorization and di...DataConf
 

More from DataConf (9)

Sergiy Lunyakin "Cloud BI with Azure Analysis Services"
Sergiy Lunyakin "Cloud BI with Azure Analysis Services"Sergiy Lunyakin "Cloud BI with Azure Analysis Services"
Sergiy Lunyakin "Cloud BI with Azure Analysis Services"
 
Sergiy Lunyakin "Azure SQL DWH: Tips and Tricks for developers"
Sergiy Lunyakin "Azure SQL DWH: Tips and Tricks for developers"Sergiy Lunyakin "Azure SQL DWH: Tips and Tricks for developers"
Sergiy Lunyakin "Azure SQL DWH: Tips and Tricks for developers"
 
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
 
Juriy Zaletsky "Використання Encog для прогнозування коливання курсів валют"
Juriy Zaletsky "Використання Encog для прогнозування коливання курсів валют"Juriy Zaletsky "Використання Encog для прогнозування коливання курсів валют"
Juriy Zaletsky "Використання Encog для прогнозування коливання курсів валют"
 
Oles Petriv "Semantic image segmentation using word embeddings."
Oles Petriv "Semantic image segmentation using word embeddings."Oles Petriv "Semantic image segmentation using word embeddings."
Oles Petriv "Semantic image segmentation using word embeddings."
 
Anastasiya Kaminskaya "How to optimize Tabular model in PowerPivot or in Anal...
Anastasiya Kaminskaya "How to optimize Tabular model in PowerPivot or in Anal...Anastasiya Kaminskaya "How to optimize Tabular model in PowerPivot or in Anal...
Anastasiya Kaminskaya "How to optimize Tabular model in PowerPivot or in Anal...
 
Vitalii Bashun "First Spark application in one hour"
Vitalii Bashun "First Spark application in one hour"Vitalii Bashun "First Spark application in one hour"
Vitalii Bashun "First Spark application in one hour"
 
Vitalii Bondarenko "Machine Learning on Fast Data"
Vitalii Bondarenko "Machine Learning on Fast Data"Vitalii Bondarenko "Machine Learning on Fast Data"
Vitalii Bondarenko "Machine Learning on Fast Data"
 
Volodymyr Getmanskyi "Deep learning for satellite imagery colorization and di...
Volodymyr Getmanskyi "Deep learning for satellite imagery colorization and di...Volodymyr Getmanskyi "Deep learning for satellite imagery colorization and di...
Volodymyr Getmanskyi "Deep learning for satellite imagery colorization and di...
 

Recently uploaded

Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 

Recently uploaded (20)

Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 

Taras Firman "How to build advanced prediction with adding external data."

  • 1. HOW TO BUILD ADVANCED PREDICTION WITH ADDING EXTERNAL DATA Taras Firman, Ph.D., Senior Data Scientist
  • 2. M: Advantages – Fast to compute, easier to model, easier to identify changes in trends, better for strategic long term forecasting. Disadvantages – If you need to plan as the daily level for capacity, people and spoilage of product then higher levels of forecasting won’t help understand the demand on a daily basis as a 1/30th ratio estimate is clearly insufficient. W: Advantages – When you can’t handle the modeling process at a daily level you “settle” for this. When you have very systematic cyclical cycles like “articice extents” that follow a rigid curve and not need for day of the week variations. Disadvantages – Floating Holidays like Thanksgiving, Easter, Ramadan, Chinese New Year change every year and disrupt the estimate for the coefficients for the week of the year impact which can be handled by creating a variable for each. D: Advantages – Weekly data can’t deal with holidays and their lead/lag relationships. If a holiday has days 1,2,3 before the holiday as very large volume a daily model can forecast that while the weekly won’t be able year in and year out model and forecast that impact as the day of the week that the holiday occurs changes every year. Disadvantages – Slower to process, but this can be mitigated by reusing models. Monthly VS weekly VS daily
  • 3. Prediction approaches ● Frequency domain ● Machine learning● Time domain
  • 4. Forecasting’s short history ● Generic models (Moving Average Process MA(q), Exp Smoothing, Autoregressive Process AR(p), Autoregressive Moving Average ARMA(p, q), Autoregressive Integrated Moving Average ARIMA (p, d, q)) ● State Space models and Kalman Filter ● Multivariate vector models ● Feature extraction & ML ● DL approaches (LSTM Recurrent Neural Networks)
  • 8. Trend aproximation The approximation of the trend can be found from the formula below where Pn (t) is a degree polynomial and Ak is a set of indexes, including the first k indexes with highest amplitudes.
  • 9. Seasonality VS Cycles Canadian lynx data Aperiodic population cycles of approximately 10 years Monthly sales of new one-family houses sold in USA Strong seasonality within each year and strong cycles with period 6-10 years Half-hourly electricity demand in England Multi-seasonality with daily and weekly patterns
  • 10. Calendar ● Holidays ● Vacation ● School vacation ● Fasting and Abstinence ● Festivals ● Shopping holiday
  • 11. External sources. APIs Wunderground API https://www.wunderground.com/weather/api/ Google trends https://trends.google.com/ OfficeHolidays http://www.officeholidays.com/ HolidayCalendar https://holidaycalendar.com/ 10Times https://10times.com OfficeHolidays
  • 12. Feature engineering for residuals ● One-hot encoding ● Counting ● Statistical moments ● Percentiles ● Lags ● Logs ● Peaks ● Least-squares spectral analysis ● Nonlinear transformations Factor analysis!
  • 13. Correlation types ● Pearson correlation is statistic to measure the degree of the relationship between linearly related variables. Assumptions: both variables should be normally distributed and have linearity and homoscedasticity relationship (normally distributed about the regression line) ● Spearman rank correlation is non-parametric test that is used to measure the degree of association between two variables. Assumptions: it doesn’t make any assumptions about the distribution. ● Kendall tau is a statistic used to measure the ordinal association between two measured quantities. Assumptions: data must be at least ordinal and scores on one variable must be montonically related to the other variable. where where s1 /s2 is number of concordant/discordant pairs
  • 14. How to work with a short history? Stochastic Simulation (Monte-Carlo) Predicting the Past and Predicting the Future
  • 15. Error measuring is an accuracy measure based on percentage (or relative) errors. One supposed problem with SMAPE is that it is not symmetric since over- and under-forecasts are not treated equally. is scale-dependent is scale-dependent is the computed average of percentage errors. The formula can be used as a measure of the bias in the forecasts usually expresses accuracy as a percentage. It puts a heavier penalty on negative errors, than on positive errors.
  • 16. Robustness. Model selection If n/k < 40: where - the set of model parameters; - the likelihood of the candidate model given the data; - the number of estimated parameters in the candidate model; - the number of observations.
  • 17. Existing solutions with Python Pandas Statsmodels Scikit-learn XGBoost PyFlux Prophet PyAF TensorFlow Cesium
  • 18. Usage ● Capacity planning ● Utilization maximization ● Cost minimization ● Dynamic pricing ● Supply chain management
  • 19. Inspired by Technology. Driven by Value. Find us at eleks.com Have a question? Write to eleksinfo@eleks.com Taras Firman email: taras.firman@eleks.com skype: tarasinho_318 AI&BigData 2017 4 November, Lviv