HOW TO BUILD ADVANCED PREDICTION WITH ADDING EXTERNAL DATA
Taras Firman, Ph.D.,
Senior Data Scientist
Monthly VS weekly VS daily
M:
Advantages – Fast to compute, easier to model, easier to identify changes in trend, better for strategic long-term forecasting.
Disadvantages – If you need to plan at the daily level for capacity, staffing and product spoilage, higher levels of aggregation won't help you understand demand on a daily basis: a 1/30th ratio estimate is clearly insufficient.
W:
Advantages – A compromise you "settle" for when you can't handle the modeling process at a daily level. It suits series with very systematic cycles, like Arctic ice extent, that follow a rigid curve and need no day-of-the-week variation.
Disadvantages – Floating holidays like Thanksgiving, Easter, Ramadan and Chinese New Year fall on different dates every year and disrupt the estimates of the week-of-the-year coefficients. This can be handled by creating a variable for each holiday.
D:
Advantages – Daily data can deal with holidays and their lead/lag relationships, which weekly data can't. If days 1, 2 and 3 before a holiday have very large volume, a daily model can forecast that, while a weekly model can't capture that impact year in and year out, because the day of the week on which the holiday falls changes every year.
Disadvantages – Slower to process, but this can be mitigated by reusing models.
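The holiday lead/lag effect that a daily model can capture is usually encoded as indicator columns. The sketch below is one possible way to do it with pandas; the function name `holiday_lead_lag_features`, the column names and the choice of three lead days are illustrative assumptions, not something taken from the slides.

```python
import pandas as pd

def holiday_lead_lag_features(index, holidays, max_lead=3, max_lag=0):
    """Return 0/1 columns flagging the holiday itself and the days around it.

    index    -- DatetimeIndex of the daily series
    holidays -- iterable of holiday dates
    max_lead -- how many days before the holiday get their own column
    max_lag  -- how many days after the holiday get their own column
    """
    holidays = pd.DatetimeIndex(holidays)
    features = pd.DataFrame(index=index)
    for offset in range(-max_lead, max_lag + 1):
        shifted = holidays + pd.Timedelta(days=offset)
        if offset < 0:
            name = f"holiday_minus_{-offset}"
        elif offset > 0:
            name = f"holiday_plus_{offset}"
        else:
            name = "holiday"
        features[name] = index.isin(shifted).astype(int)
    return features

# Example: the week around US Thanksgiving 2017 (23 November).
days = pd.date_range("2017-11-18", "2017-11-26", freq="D")
features = holiday_lead_lag_features(days, ["2017-11-23"])
```

Because each offset has its own column, the model can learn a separate coefficient for "three days before", "two days before" and so on, regardless of which weekday the holiday falls on.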
Prediction approaches
● Frequency domain
● Machine learning
● Time domain
Forecasting’s short history
● Generic models (Moving Average Process MA(q), Exponential Smoothing, Autoregressive Process AR(p), Autoregressive Moving Average ARMA(p, q), Autoregressive Integrated Moving Average ARIMA(p, d, q))
● State Space models and Kalman Filter
● Multivariate vector models
● Feature extraction & ML
● DL approaches (LSTM Recurrent Neural Networks)
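As a concrete instance of the generic models listed above, an AR(p) process can be fitted by ordinary least squares on lagged values. This is a minimal sketch with NumPy only; the helper names `fit_ar` and `forecast_ar` and the toy series are assumptions made for illustration, and a dedicated time-series library would normally be used in practice.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of y[t] = c + phi_1*y[t-1] + ... + phi_p*y[t-p]."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    # Design matrix: a column of ones, then one column per lag.
    columns = [np.ones(n - p)] + [y[p - k:n - k] for k in range(1, p + 1)]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, phi_1, ..., phi_p]

def forecast_ar(series, coef, steps):
    """Iterate the fitted recursion forward, feeding forecasts back in."""
    history = list(np.asarray(series, dtype=float))
    p = len(coef) - 1
    out = []
    for _ in range(steps):
        lags = history[-1:-p - 1:-1]  # y[t-1], ..., y[t-p]
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        history.append(nxt)
        out.append(nxt)
    return out

# Toy series generated by y[t] = 1 + 0.5*y[t-1], so the fit should recover it.
series = [0.0]
for _ in range(19):
    series.append(0.5 * series[-1] + 1.0)

coef = fit_ar(series, p=1)
forecast = forecast_ar(series, coef, steps=3)
```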
Interpolation and extrapolation
Retail data. Substitutes. Categories
Decomposition tactic
Trend approximation
The approximation of the trend can be found from the formula below
[formula image not preserved]
where P_n(t) is a polynomial of degree n and A_k is the set of the k indexes with the highest amplitudes.
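The formula itself did not survive in the text, so the following is only one plausible reading: it assumes the trend is the polynomial P_n(t) plus the k Fourier components of the detrended series with the largest amplitudes (the index set A_k). The function name `approximate_trend` and the synthetic series are illustrative.

```python
import numpy as np

def approximate_trend(y, degree=1, k=2):
    """Polynomial fit plus the k strongest Fourier components of the remainder.

    Returns the approximation and the selected frequency indexes (A_k).
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    poly = np.polyval(np.polyfit(t, y, degree), t)  # P_n(t)

    spectrum = np.fft.rfft(y - poly)
    amplitudes = np.abs(spectrum)
    amplitudes[0] = 0.0  # the constant term already lives in the polynomial
    top = np.argsort(amplitudes)[-k:]  # indexes with the highest amplitudes

    kept = np.zeros_like(spectrum)
    kept[top] = spectrum[top]
    periodic = np.fft.irfft(kept, n=len(y))
    return poly + periodic, top

# Synthetic example: a linear drift plus one sinusoid with 4 cycles in 64 samples.
t = np.arange(64)
y = 0.1 * t + 3.0 * np.sin(2 * np.pi * 4 * t / 64)
trend, top = approximate_trend(y, degree=1, k=1)
```

Keeping only a handful of components is what makes this an approximation rather than an exact reconstruction: small amplitudes are treated as noise and dropped.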
Seasonality VS Cycles
Canadian lynx data
Aperiodic population cycles of approximately 10 years
Monthly sales of new one-family houses sold in the USA
Strong seasonality within each year and strong cycles with a period of 6-10 years
Half-hourly electricity demand in England
Multi-seasonality with daily and weekly patterns
Calendar
● Holidays
● Vacation
● School vacation
● Fasting and Abstinence
● Festivals
● Shopping holiday
External sources. APIs
Wunderground API
https://www.wunderground.com/weather/api/
Google trends
https://trends.google.com/
OfficeHolidays
http://www.officeholidays.com/
HolidayCalendar
https://holidaycalendar.com/
10Times
https://10times.com
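Whatever the source, external data usually ends up as a table keyed by date that is joined onto the target series. The sketch below assumes the data has already been downloaded into a DataFrame; it does not call any of the services above, and the column names and numbers are made up purely for illustration.

```python
import pandas as pd

# Target series (made-up values).
sales = pd.DataFrame({
    "date": pd.to_datetime(["2017-11-01", "2017-11-02", "2017-11-03"]),
    "units": [120, 95, 130],
})

# External feature with a missing day (made-up values).
weather = pd.DataFrame({
    "date": pd.to_datetime(["2017-11-01", "2017-11-03"]),
    "temp_c": [8.5, 4.0],
})

# A left join keeps every row of the target series, even when the
# external source has no observation for that date.
enriched = sales.merge(weather, on="date", how="left")

# Fill the gap by linear interpolation; other strategies may suit other features.
enriched["temp_c"] = enriched["temp_c"].interpolate()
```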
Feature engineering for residuals
● One-hot encoding
● Counting
● Statistical moments
● Percentiles
● Lags
● Logs
● Peaks
● Least-squares spectral analysis
● Nonlinear transformations
Factor analysis!
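A few of the feature types listed above (lags, statistical moments, percentiles, logs, one-hot encoding) can be sketched with pandas as follows. The function name `residual_features`, the lag choices and the window length are assumptions for illustration; all rolling statistics use only past values so that no feature leaks the value being predicted.

```python
import numpy as np
import pandas as pd

def residual_features(residuals, lags=(1, 7), window=7):
    """Build a small feature table from a daily residual series."""
    f = pd.DataFrame(index=residuals.index)

    # Lags
    for lag in lags:
        f[f"lag_{lag}"] = residuals.shift(lag)

    # Statistical moments and a percentile over a trailing window of past values
    past = residuals.shift(1).rolling(window)
    f["roll_mean"] = past.mean()
    f["roll_std"] = past.std()
    f["roll_p90"] = past.quantile(0.9)

    # Log transform of the previous absolute residual
    f["log_abs_lag_1"] = np.log1p(residuals.shift(1).abs())

    # One-hot encoding of the day of the week
    dow = pd.Series(residuals.index.dayofweek, index=residuals.index)
    return f.join(pd.get_dummies(dow, prefix="dow"))

# Toy residual series: three weeks of daily values 0, 1, ..., 20.
residuals = pd.Series(np.arange(21.0),
                      index=pd.date_range("2017-01-01", periods=21, freq="D"))
features = residual_features(residuals)
```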
Correlation types
● Pearson correlation is a statistic that measures the degree of the relationship between linearly related variables.
Assumptions: both variables should be normally distributed, and their relationship should be linear and homoscedastic (the data are evenly spread about the regression line).
● Spearman rank correlation is a non-parametric test that measures the degree of association between two variables.
Assumptions: it doesn't make any assumptions about the distribution.
● Kendall tau is a statistic that measures the ordinal association between two measured quantities.
Assumptions: data must be at least ordinal and scores on one variable must be monotonically related to the other variable.
$\tau = \frac{s_1 - s_2}{n(n-1)/2}$
where $s_1$/$s_2$ is the number of concordant/discordant pairs and $n$ is the number of observations.
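Kendall's tau can be computed directly from the counts of concordant and discordant pairs. The sketch below implements the simplest variant (often called tau-a), which assumes there are no ties; the function name `kendall_tau` is illustrative, and library implementations typically apply a tie correction that this version omits.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        direction = (xi - xj) * (yi - yj)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

same_order = kendall_tau([1, 2, 3, 4], [10, 20, 30, 40])
reversed_order = kendall_tau([1, 2, 3, 4], [40, 30, 20, 10])
one_swap = kendall_tau([1, 2, 3], [1, 3, 2])
```

Because only the ordering of the values matters, any monotonic rescaling of either variable leaves the result unchanged.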
How to work with a short history?
Stochastic Simulation (Monte-Carlo)
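The slide names only the technique, so the following is a generic sketch rather than the method used in the talk. It assumes that future day-to-day changes resemble the historical ones and resamples them with replacement to generate many possible future paths; the function name `simulate_paths` and the short history are illustrative.

```python
import numpy as np

def simulate_paths(history, horizon, n_paths=1000, seed=0):
    """Monte Carlo paths built by resampling historical one-step changes."""
    rng = np.random.default_rng(seed)
    history = np.asarray(history, dtype=float)
    steps = np.diff(history)  # observed one-step changes
    draws = rng.choice(steps, size=(n_paths, horizon), replace=True)
    # Each row is one simulated future, starting from the last observed value.
    return history[-1] + np.cumsum(draws, axis=1)

paths = simulate_paths([10, 12, 11, 13, 14, 13, 15], horizon=5)

# Summarise the simulated distribution at the end of the horizon.
lower, median, upper = np.percentile(paths[:, -1], [5, 50, 95])
```

With a short history the resulting interval is wide, which is an honest reflection of how little the data constrains the forecast.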
Predicting the Past and Predicting the Future
Error measuring
SMAPE is an accuracy measure based on percentage (or relative) errors. One supposed problem with SMAPE is that, despite its name, it is not symmetric, since over- and under-forecasts are not treated equally.
[measure name not preserved] is scale-dependent
[measure name not preserved] is scale-dependent
MPE is the computed average of percentage errors. It can be used as a measure of the bias in the forecasts.
MAPE usually expresses accuracy as a percentage. It puts a heavier penalty on negative errors (forecast above actual) than on positive errors.
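The percentage-based measures described above can be written in a few lines each. The formulas are the standard textbook definitions, assumed to match the slide; MAE and RMSE are included only as common examples of scale-dependent measures, since the slide's own choices are not recoverable from the text.

```python
from math import sqrt

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mpe(actual, forecast):
    """Mean percentage error, in percent; the sign indicates forecast bias."""
    return 100 * sum((a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    """Symmetric MAPE, in percent, with the mean of |a| and |f| as denominator."""
    return 100 * sum(abs(f - a) / ((abs(a) + abs(f)) / 2)
                     for a, f in zip(actual, forecast)) / len(actual)

def mae(actual, forecast):
    """Mean absolute error, in the units of the data (scale-dependent)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error, in the units of the data (scale-dependent)."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

actual = [100, 200]
forecast = [110, 180]
scores = {fn.__name__: fn(actual, forecast) for fn in (mape, mpe, smape, mae, rmse)}

# The same absolute miss of 10 units scores differently depending on its direction.
over_forecast = smape([100], [110])
under_forecast = smape([100], [90])
```

In the last two lines the over-forecast scores roughly 9.5% and the under-forecast roughly 10.5%, which illustrates the asymmetry of SMAPE. Note also that MAPE and MPE are undefined when an actual value is zero.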
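Robustness. Model selection
$\mathrm{AIC} = -2\ln L(\hat{\theta} \mid \text{data}) + 2k$
If n/k < 40:
$\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}$
where
$\theta$ - the set of model parameters;
$L(\hat{\theta} \mid \text{data})$ - the likelihood of the candidate model given the data;
$k$ - the number of estimated parameters in the candidate model;
$n$ - the number of observations.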
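The quantities listed above match the Akaike information criterion and its small-sample correction (AICc). Assuming that is the criterion meant, a minimal sketch of the selection rule looks like this; the function names are illustrative, and the log-likelihood is expected to come from whichever model-fitting routine is in use.

```python
def aic(log_likelihood, k):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2.0 * k

def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction term 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic(log_likelihood, k) + 2.0 * k * (k + 1) / (n - k - 1)

def information_criterion(log_likelihood, k, n):
    """Use the corrected criterion when the sample is small relative to k."""
    if n / k < 40:
        return aicc(log_likelihood, k, n)
    return aic(log_likelihood, k)

small_sample = information_criterion(log_likelihood=-100.0, k=3, n=30)   # n/k = 10
large_sample = information_criterion(log_likelihood=-100.0, k=3, n=300)  # n/k = 100
```

Among candidate models fitted to the same data, the one with the lowest value is preferred; the correction term penalises extra parameters more heavily when observations are scarce.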
Existing solutions with Python
Pandas Statsmodels Scikit-learn
XGBoost PyFlux Prophet
PyAF TensorFlow Cesium
Usage
● Capacity planning
● Utilization maximization
● Cost minimization
● Dynamic pricing
● Supply chain management
Inspired by Technology.
Driven by Value.
Find us at eleks.com
Have a question? Write to eleksinfo@eleks.com
Taras Firman
email: taras.firman@eleks.com
skype: tarasinho_318
AI&BigData 2017
4 November, Lviv

More Related Content

What's hot

Home quality management
Home quality managementHome quality management
Home quality management
selinasimpson371
 
Calculus in Machine Learning
Calculus in Machine Learning Calculus in Machine Learning
Calculus in Machine Learning
Gokul Jayan
 
Quality management in nursing
Quality management in nursingQuality management in nursing
Quality management in nursing
selinasimpson1401
 
Forecasting
ForecastingForecasting
Forecasting
Manish Kaushik
 
Intro to Forecasting - Part 3 - HRUG
Intro to Forecasting - Part 3 - HRUGIntro to Forecasting - Part 3 - HRUG
Intro to Forecasting - Part 3 - HRUG
egoodwintx
 
Project management quality
Project management qualityProject management quality
Project management quality
selinasimpson0401
 
Types of variables and descriptive statistics
Types of variables and descriptive statisticsTypes of variables and descriptive statistics
Types of variables and descriptive statistics
Dhritiman Chakrabarti
 
Quality management education
Quality management educationQuality management education
Quality management education
selinasimpson2001
 
UsingAHP_02
UsingAHP_02UsingAHP_02
UsingAHP_02
pbaxter
 
Ses 1 basic fundamentals of mathematics and statistics
Ses 1 basic fundamentals of mathematics and statisticsSes 1 basic fundamentals of mathematics and statistics
Ses 1 basic fundamentals of mathematics and statistics
metnashikiom2011-13
 
Levels of measurement
Levels of measurementLevels of measurement
Levels of measurement
debmahuya
 
A.1 properties of point estimators
A.1 properties of point estimatorsA.1 properties of point estimators
A.1 properties of point estimators
Ulster BOCES
 
Quality management policy
Quality management policyQuality management policy
Quality management policy
selinasimpson0501
 
Adaptive short term forecasting
Adaptive short term forecastingAdaptive short term forecasting
Adaptive short term forecasting
Alex
 
Displaying and describing categorical data
Displaying and describing categorical dataDisplaying and describing categorical data
Displaying and describing categorical data
Olivia Dombrowski
 
Probability and statistics
Probability and statisticsProbability and statistics
Probability and statistics
Fatima Bianca Gueco
 
Quality management posters
Quality management postersQuality management posters
Quality management posters
selinasimpson2701
 
Hrug intro to forecasting
Hrug intro to forecastingHrug intro to forecasting
Hrug intro to forecasting
egoodwintx
 
Quality management journal
Quality management journalQuality management journal
Quality management journal
selinasimpson0401
 
Www.iso 9001
Www.iso 9001Www.iso 9001
Www.iso 9001
jomritagu
 

What's hot (20)

Home quality management
Home quality managementHome quality management
Home quality management
 
Calculus in Machine Learning
Calculus in Machine Learning Calculus in Machine Learning
Calculus in Machine Learning
 
Quality management in nursing
Quality management in nursingQuality management in nursing
Quality management in nursing
 
Forecasting
ForecastingForecasting
Forecasting
 
Intro to Forecasting - Part 3 - HRUG
Intro to Forecasting - Part 3 - HRUGIntro to Forecasting - Part 3 - HRUG
Intro to Forecasting - Part 3 - HRUG
 
Project management quality
Project management qualityProject management quality
Project management quality
 
Types of variables and descriptive statistics
Types of variables and descriptive statisticsTypes of variables and descriptive statistics
Types of variables and descriptive statistics
 
Quality management education
Quality management educationQuality management education
Quality management education
 
UsingAHP_02
UsingAHP_02UsingAHP_02
UsingAHP_02
 
Ses 1 basic fundamentals of mathematics and statistics
Ses 1 basic fundamentals of mathematics and statisticsSes 1 basic fundamentals of mathematics and statistics
Ses 1 basic fundamentals of mathematics and statistics
 
Levels of measurement
Levels of measurementLevels of measurement
Levels of measurement
 
A.1 properties of point estimators
A.1 properties of point estimatorsA.1 properties of point estimators
A.1 properties of point estimators
 
Quality management policy
Quality management policyQuality management policy
Quality management policy
 
Adaptive short term forecasting
Adaptive short term forecastingAdaptive short term forecasting
Adaptive short term forecasting
 
Displaying and describing categorical data
Displaying and describing categorical dataDisplaying and describing categorical data
Displaying and describing categorical data
 
Probability and statistics
Probability and statisticsProbability and statistics
Probability and statistics
 
Quality management posters
Quality management postersQuality management posters
Quality management posters
 
Hrug intro to forecasting
Hrug intro to forecastingHrug intro to forecasting
Hrug intro to forecasting
 
Quality management journal
Quality management journalQuality management journal
Quality management journal
 
Www.iso 9001
Www.iso 9001Www.iso 9001
Www.iso 9001
 

Viewers also liked

Ai big dataconference_volodymyr getmanskyi colorization distance measuring
Ai big dataconference_volodymyr getmanskyi colorization distance measuringAi big dataconference_volodymyr getmanskyi colorization distance measuring
Ai big dataconference_volodymyr getmanskyi colorization distance measuring
Olga Zinkevych
 
Ai&bigdataconference oleksandr saienko machine learning use cases in telecom
Ai&bigdataconference oleksandr saienko machine learning use cases in telecomAi&bigdataconference oleksandr saienko machine learning use cases in telecom
Ai&bigdataconference oleksandr saienko machine learning use cases in telecom
Olga Zinkevych
 
Ai big dataconference_sparkinonehour_vitalii bashun
Ai big dataconference_sparkinonehour_vitalii bashunAi big dataconference_sparkinonehour_vitalii bashun
Ai big dataconference_sparkinonehour_vitalii bashun
Olga Zinkevych
 
Ai big dataconference_krakovetskyi_microsoft ai a new era of smart solutions
Ai big dataconference_krakovetskyi_microsoft ai a new era of smart solutionsAi big dataconference_krakovetskyi_microsoft ai a new era of smart solutions
Ai big dataconference_krakovetskyi_microsoft ai a new era of smart solutions
Olga Zinkevych
 
Ai big dataconference_ml_fastdata_vitalii bondarenko
Ai big dataconference_ml_fastdata_vitalii bondarenkoAi big dataconference_ml_fastdata_vitalii bondarenko
Ai big dataconference_ml_fastdata_vitalii bondarenko
Olga Zinkevych
 
Ai big dataconference_eugene_polonichko_azure data lake
Ai big dataconference_eugene_polonichko_azure data lake Ai big dataconference_eugene_polonichko_azure data lake
Ai big dataconference_eugene_polonichko_azure data lake
Olga Zinkevych
 

Viewers also liked (6)

Ai big dataconference_volodymyr getmanskyi colorization distance measuring
Ai big dataconference_volodymyr getmanskyi colorization distance measuringAi big dataconference_volodymyr getmanskyi colorization distance measuring
Ai big dataconference_volodymyr getmanskyi colorization distance measuring
 
Ai&bigdataconference oleksandr saienko machine learning use cases in telecom
Ai&bigdataconference oleksandr saienko machine learning use cases in telecomAi&bigdataconference oleksandr saienko machine learning use cases in telecom
Ai&bigdataconference oleksandr saienko machine learning use cases in telecom
 
Ai big dataconference_sparkinonehour_vitalii bashun
Ai big dataconference_sparkinonehour_vitalii bashunAi big dataconference_sparkinonehour_vitalii bashun
Ai big dataconference_sparkinonehour_vitalii bashun
 
Ai big dataconference_krakovetskyi_microsoft ai a new era of smart solutions
Ai big dataconference_krakovetskyi_microsoft ai a new era of smart solutionsAi big dataconference_krakovetskyi_microsoft ai a new era of smart solutions
Ai big dataconference_krakovetskyi_microsoft ai a new era of smart solutions
 
Ai big dataconference_ml_fastdata_vitalii bondarenko
Ai big dataconference_ml_fastdata_vitalii bondarenkoAi big dataconference_ml_fastdata_vitalii bondarenko
Ai big dataconference_ml_fastdata_vitalii bondarenko
 
Ai big dataconference_eugene_polonichko_azure data lake
Ai big dataconference_eugene_polonichko_azure data lake Ai big dataconference_eugene_polonichko_azure data lake
Ai big dataconference_eugene_polonichko_azure data lake
 

Similar to Ai big dataconference_taras firman how to build advanced prediction with adding external data

IT-601 Lecture Notes-UNIT-2.pdf Data Analysis
IT-601 Lecture Notes-UNIT-2.pdf Data AnalysisIT-601 Lecture Notes-UNIT-2.pdf Data Analysis
IT-601 Lecture Notes-UNIT-2.pdf Data Analysis
Dr. Radhey Shyam
 
KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdf
Dr. Radhey Shyam
 
Data What Type Of Data Do You Have V2.1
Data   What Type Of Data Do You Have V2.1Data   What Type Of Data Do You Have V2.1
Data What Type Of Data Do You Have V2.1
TimKasse
 
Assessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generateAssessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generate
Daniel Koh
 
Assessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIFAssessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIF
Daniel Koh
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVA
Derek Kane
 
Storm Prediction data analysis using R/SAS
Storm Prediction data analysis using R/SASStorm Prediction data analysis using R/SAS
Storm Prediction data analysis using R/SAS
Gautam Sawant
 
HRUG - Linear regression with R
HRUG - Linear regression with RHRUG - Linear regression with R
HRUG - Linear regression with R
egoodwintx
 
IEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao LinIEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao Lin
Minchao Lin
 
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfAIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
MargiShah29
 
Classification via Logistic Regression
Classification via Logistic RegressionClassification via Logistic Regression
Classification via Logistic Regression
Taweh Beysolow II
 
Qt unit i
Qt unit   iQt unit   i
Qt unit i
bhuvana ganesan
 
Statistics for Managers pptS for better understanding
Statistics for Managers pptS for better understandingStatistics for Managers pptS for better understanding
Statistics for Managers pptS for better understanding
ShamshadAli58
 
Machine Learning statistical model using Transportation data
Machine Learning statistical model using Transportation dataMachine Learning statistical model using Transportation data
Machine Learning statistical model using Transportation data
jagan477830
 
FinalReport
FinalReportFinalReport
FinalReport
Anıl Ulaş KOÇAK
 
Weather forecasting model.pptx
Weather forecasting model.pptxWeather forecasting model.pptx
Weather forecasting model.pptx
VisheshYadav12
 
Pentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIPentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BI
Studio Synthesis
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMS
Ali T. Lotia
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
BeyaNasr1
 
[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...
[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...
[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...
경록 박
 

Similar to Ai big dataconference_taras firman how to build advanced prediction with adding external data (20)

IT-601 Lecture Notes-UNIT-2.pdf Data Analysis
IT-601 Lecture Notes-UNIT-2.pdf Data AnalysisIT-601 Lecture Notes-UNIT-2.pdf Data Analysis
IT-601 Lecture Notes-UNIT-2.pdf Data Analysis
 
KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdf
 
Data What Type Of Data Do You Have V2.1
Data   What Type Of Data Do You Have V2.1Data   What Type Of Data Do You Have V2.1
Data What Type Of Data Do You Have V2.1
 
Assessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generateAssessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generate
 
Assessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIFAssessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIF
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVA
 
Storm Prediction data analysis using R/SAS
Storm Prediction data analysis using R/SASStorm Prediction data analysis using R/SAS
Storm Prediction data analysis using R/SAS
 
HRUG - Linear regression with R
HRUG - Linear regression with RHRUG - Linear regression with R
HRUG - Linear regression with R
 
IEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao LinIEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao Lin
 
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfAIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
 
Classification via Logistic Regression
Classification via Logistic RegressionClassification via Logistic Regression
Classification via Logistic Regression
 
Qt unit i
Qt unit   iQt unit   i
Qt unit i
 
Statistics for Managers pptS for better understanding
Statistics for Managers pptS for better understandingStatistics for Managers pptS for better understanding
Statistics for Managers pptS for better understanding
 
Machine Learning statistical model using Transportation data
Machine Learning statistical model using Transportation dataMachine Learning statistical model using Transportation data
Machine Learning statistical model using Transportation data
 
FinalReport
FinalReportFinalReport
FinalReport
 
Weather forecasting model.pptx
Weather forecasting model.pptxWeather forecasting model.pptx
Weather forecasting model.pptx
 
Pentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIPentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BI
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMS
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 
[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...
[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...
[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment pric...
 

More from Olga Zinkevych

Overview of text classification approaches algorithms &amp; software v lyubin...
Overview of text classification approaches algorithms &amp; software v lyubin...Overview of text classification approaches algorithms &amp; software v lyubin...
Overview of text classification approaches algorithms &amp; software v lyubin...
Olga Zinkevych
 
Evolution of words through time a malenko dataconf 21 04_18
Evolution of words through time a malenko dataconf 21 04_18Evolution of words through time a malenko dataconf 21 04_18
Evolution of words through time a malenko dataconf 21 04_18
Olga Zinkevych
 
What it takes to build a model for detecting patients that defaults from medi...
What it takes to build a model for detecting patients that defaults from medi...What it takes to build a model for detecting patients that defaults from medi...
What it takes to build a model for detecting patients that defaults from medi...
Olga Zinkevych
 
Variational autoencoders for speech processing d.bielievtsov dataconf 21 04 18
Variational autoencoders for speech processing d.bielievtsov dataconf 21 04 18Variational autoencoders for speech processing d.bielievtsov dataconf 21 04 18
Variational autoencoders for speech processing d.bielievtsov dataconf 21 04 18
Olga Zinkevych
 
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
Olga Zinkevych
 
Azure data catalog your data your way eugene polonichko dataconf 21 04 18
Azure data catalog your data your way eugene polonichko dataconf 21 04 18Azure data catalog your data your way eugene polonichko dataconf 21 04 18
Azure data catalog your data your way eugene polonichko dataconf 21 04 18
Olga Zinkevych
 
Aibdconference chat bot for every product Maksym Volchenko
Aibdconference chat bot for every product Maksym VolchenkoAibdconference chat bot for every product Maksym Volchenko
Aibdconference chat bot for every product Maksym Volchenko
Olga Zinkevych
 
Ai big dataconference_semantic image segmentatation using word embeddings_ole...
Ai big dataconference_semantic image segmentatation using word embeddings_ole...Ai big dataconference_semantic image segmentatation using word embeddings_ole...
Ai big dataconference_semantic image segmentatation using word embeddings_ole...
Olga Zinkevych
 
Ai big dataconference_jeffrey ricker_kappa_architecture
Ai big dataconference_jeffrey ricker_kappa_architectureAi big dataconference_jeffrey ricker_kappa_architecture
Ai big dataconference_jeffrey ricker_kappa_architecture
Olga Zinkevych
 

More from Olga Zinkevych (9)

Overview of text classification approaches algorithms &amp; software v lyubin...
Overview of text classification approaches algorithms &amp; software v lyubin...Overview of text classification approaches algorithms &amp; software v lyubin...
Overview of text classification approaches algorithms &amp; software v lyubin...
 
Evolution of words through time a malenko dataconf 21 04_18
Evolution of words through time a malenko dataconf 21 04_18Evolution of words through time a malenko dataconf 21 04_18
Evolution of words through time a malenko dataconf 21 04_18
 
What it takes to build a model for detecting patients that defaults from medi...
What it takes to build a model for detecting patients that defaults from medi...What it takes to build a model for detecting patients that defaults from medi...
What it takes to build a model for detecting patients that defaults from medi...
 
Variational autoencoders for speech processing d.bielievtsov dataconf 21 04 18
Variational autoencoders for speech processing d.bielievtsov dataconf 21 04 18Variational autoencoders for speech processing d.bielievtsov dataconf 21 04 18
Variational autoencoders for speech processing d.bielievtsov dataconf 21 04 18
 
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
 
Azure data catalog your data your way eugene polonichko dataconf 21 04 18
Azure data catalog your data your way eugene polonichko dataconf 21 04 18Azure data catalog your data your way eugene polonichko dataconf 21 04 18
Azure data catalog your data your way eugene polonichko dataconf 21 04 18
 
Aibdconference chat bot for every product Maksym Volchenko
Aibdconference chat bot for every product Maksym VolchenkoAibdconference chat bot for every product Maksym Volchenko
Aibdconference chat bot for every product Maksym Volchenko
 
Ai big dataconference_semantic image segmentatation using word embeddings_ole...
Ai big dataconference_semantic image segmentatation using word embeddings_ole...Ai big dataconference_semantic image segmentatation using word embeddings_ole...
Ai big dataconference_semantic image segmentatation using word embeddings_ole...
 
Ai big dataconference_jeffrey ricker_kappa_architecture
Ai big dataconference_jeffrey ricker_kappa_architectureAi big dataconference_jeffrey ricker_kappa_architecture
Ai big dataconference_jeffrey ricker_kappa_architecture
 

Recently uploaded

The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 

Recently uploaded (20)

The Building Blocks of QuestDB, a Time Series Database

AI & BigData conference: Taras Firman, How to build advanced prediction with adding external data

  • 1. HOW TO BUILD ADVANCED PREDICTION WITH ADDING EXTERNAL DATA Taras Firman, Ph.D., Senior Data Scientist
  • 2. Monthly vs. weekly vs. daily
    Monthly (M):
    Advantages: fast to compute, easier to model, easier to identify changes in trends, better for strategic long-term forecasting.
    Disadvantages: if you need to plan at the daily level for capacity, people and product spoilage, forecasting at a higher level won't help you understand daily demand, because a 1/30th ratio estimate is clearly insufficient.
    Weekly (W):
    Advantages: you "settle" for this when you can't handle the modeling process at a daily level, or when the data has very systematic cycles, like Arctic ice extents, that follow a rigid curve and do not need day-of-the-week variations.
    Disadvantages: floating holidays such as Thanksgiving, Easter, Ramadan and Chinese New Year move every year and disrupt the estimated week-of-the-year coefficients. This can be handled by creating a separate variable for each holiday.
    Daily (D):
    Advantages: daily data can capture holidays and their lead/lag relationships, which weekly data cannot. If the 1, 2 or 3 days before a holiday carry very large volume, a daily model can forecast that. A weekly model cannot model and forecast that impact year in and year out, because the day of the week on which the holiday falls changes every year.
    Disadvantages: slower to process, but this can be mitigated by reusing models.
  • 3. Prediction approaches ● Frequency domain ● Machine learning ● Time domain
  • 4. Forecasting’s short history ● Generic models (Moving Average Process MA(q), Exp Smoothing, Autoregressive Process AR(p), Autoregressive Moving Average ARMA(p, q), Autoregressive Integrated Moving Average ARIMA (p, d, q)) ● State Space models and Kalman Filter ● Multivariate vector models ● Feature extraction & ML ● DL approaches (LSTM Recurrent Neural Networks)
  • 8. Trend approximation The trend is approximated by a formula (shown on the slide) in which P_n(t) is a polynomial of degree n and A_k is the set of the k indices with the highest amplitudes.
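The slide's formula did not survive in this transcript, so the following is only a sketch of one plausible reading: a least-squares polynomial P_n(t) plus the k Fourier harmonics of the remainder that have the largest amplitudes (the index set A_k). The function name `approximate_trend` and its parameters are hypothetical, and the use of a real FFT on the de-trended series is an assumption, not the author's confirmed method.

```python
import numpy as np

def approximate_trend(y, degree=1, k=3):
    """Polynomial trend plus the k strongest Fourier harmonics of the remainder.

    Assumed reading of the slide: P_n(t) is a degree-n polynomial and A_k
    holds the k frequency indices with the highest amplitudes.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))

    # P_n(t): least-squares polynomial fit of the requested degree
    coeffs = np.polyfit(t, y, degree)
    poly = np.polyval(coeffs, t)

    # Spectrum of what the polynomial leaves unexplained
    spectrum = np.fft.rfft(y - poly)

    # A_k: indices of the k largest amplitudes
    top = np.argsort(np.abs(spectrum))[-k:]
    kept = np.zeros_like(spectrum)
    kept[top] = spectrum[top]

    # Back to the time domain using only the selected harmonics
    harmonic = np.fft.irfft(kept, n=len(y))
    return poly + harmonic, np.sort(top)
```

On a synthetic series such as `0.05 * t + 2 * np.sin(2 * np.pi * 10 * t / 200)` with 200 points, calling `approximate_trend(y, degree=1, k=1)` should select frequency index 10 and follow the series closely.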
  • 9. Seasonality vs. cycles ● Canadian lynx data: aperiodic population cycles of approximately 10 years ● Monthly sales of new one-family houses sold in the USA: strong seasonality within each year and strong cycles with a period of 6-10 years ● Half-hourly electricity demand in England: multi-seasonality with daily and weekly patterns
  • 10. Calendar ● Holidays ● Vacation ● School vacation ● Fasting and Abstinence ● Festivals ● Shopping holiday
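Calendar effects like these are usually turned into model inputs. The sketch below assumes a daily pandas `DatetimeIndex` and a user-supplied list of holiday dates; the function name `calendar_features`, the column names and the 3-day window are illustrative choices, not something the slides specify. It produces day-of-week dummies plus lead/lag flags around each holiday, which is what lets a daily model pick up the "days 1, 2, 3 before the holiday" effect.

```python
import pandas as pd

def calendar_features(index, holidays, window=3):
    """Day-of-week dummies plus lead/lag flags around each holiday."""
    holidays = pd.DatetimeIndex(holidays)

    # One-hot encode the day of week (0 = Monday)
    dow = pd.Series(index.dayofweek, index=index)
    feats = pd.get_dummies(dow, prefix="dow")

    feats["is_holiday"] = index.isin(holidays)
    for lag in range(1, window + 1):
        shift = pd.Timedelta(days=lag)
        # Flag days that fall `lag` days before / after a holiday
        feats[f"holiday_in_{lag}d"] = (index + shift).isin(holidays)
        feats[f"holiday_{lag}d_ago"] = (index - shift).isin(holidays)
    return feats.astype(int)
```

For floating holidays, the `holidays` list would need to contain the actual date for every year covered by the data.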
  • 11. External sources. APIs ● Wunderground API: https://www.wunderground.com/weather/api/ ● Google Trends: https://trends.google.com/ ● OfficeHolidays: http://www.officeholidays.com/ ● HolidayCalendar: https://holidaycalendar.com/ ● 10Times: https://10times.com
  • 12. Feature engineering for residuals ● One-hot encoding ● Counting ● Statistical moments ● Percentiles ● Lags ● Logs ● Peaks ● Least-squares spectral analysis ● Nonlinear transformations Factor analysis!
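A few of the listed transformations (lags, statistical moments, percentiles, logs) can be sketched with pandas as follows. The function name `residual_features`, the chosen lags and the 7-step window are assumptions for illustration only. The rolling statistics are computed on the series shifted by one step so that each row uses only values known before that time.

```python
import numpy as np
import pandas as pd

def residual_features(resid, lags=(1, 7), window=7):
    """Lag, rolling-moment, percentile and log features for a residual series."""
    r = pd.Series(resid, dtype=float)
    past = r.shift(1)  # only information available before time t

    feats = pd.DataFrame(index=r.index)
    for lag in lags:
        feats[f"lag_{lag}"] = r.shift(lag)

    roll = past.rolling(window)
    feats["roll_mean"] = roll.mean()           # first moment
    feats["roll_std"] = roll.std()             # second moment
    feats["roll_skew"] = roll.skew()           # third moment
    feats["roll_p90"] = roll.quantile(0.9)     # percentile
    feats["log_abs_lag_1"] = np.log1p(past.abs())  # log transform of magnitude
    return feats
```

The first rows contain NaN until enough history is available; they are typically dropped before fitting a model.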
  • 13. Correlation types ● Pearson correlation is a statistic that measures the degree of the relationship between linearly related variables. Assumptions: both variables should be normally distributed, and their relationship should be linear and homoscedastic (normally distributed about the regression line). ● Spearman rank correlation is a non-parametric test that measures the degree of association between two variables. Assumptions: it makes no assumptions about the distribution. ● Kendall tau is a statistic that measures the ordinal association between two measured quantities. Assumptions: the data must be at least ordinal, and scores on one variable must be monotonically related to the other variable. Here s1 and s2 are the numbers of concordant and discordant pairs, respectively.
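The Kendall tau formula itself is missing from the transcript. A minimal sketch, assuming the common tau-a normalization (s1 - s2) / (n(n - 1) / 2) with no correction for ties, counts the concordant and discordant pairs directly. The function name `kendall_tau` is hypothetical.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall tau-a from concordant (s1) and discordant (s2) pair counts.

    Assumes the tau-a normalization; tied pairs count as neither.
    """
    s1 = s2 = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        sign = (xi - xj) * (yi - yj)
        if sign > 0:
            s1 += 1      # pair ordered the same way in x and y
        elif sign < 0:
            s2 += 1      # pair ordered in opposite ways
    n = len(x)
    return (s1 - s2) / (n * (n - 1) / 2)
```

Because only the ordering matters, a monotonic but nonlinear relationship such as y = x^3 gives tau = 1, whereas Pearson correlation on the same data would be below 1.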
  • 14. How to work with a short history? ● Stochastic simulation (Monte Carlo) ● Predicting the past and predicting the future
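The slide does not say how the stochastic simulation is set up. One simple possibility, shown here purely as an assumption, is to resample the observed one-step changes with replacement and accumulate them into many possible future paths. The function name `simulate_paths` and its defaults are hypothetical.

```python
import numpy as np

def simulate_paths(history, horizon=7, n_paths=1000, seed=0):
    """Bootstrap past one-step changes to simulate future trajectories."""
    rng = np.random.default_rng(seed)
    history = np.asarray(history, dtype=float)

    # Empirical distribution of step-to-step changes in the short history
    steps = np.diff(history)

    # Draw changes with replacement and accumulate from the last observation
    draws = rng.choice(steps, size=(n_paths, horizon), replace=True)
    return history[-1] + np.cumsum(draws, axis=1)
```

Percentiles across the simulated paths, for example `np.percentile(paths, [5, 50, 95], axis=0)`, then give a rough forecast band. This treats the changes as independent and identically distributed, which is a strong simplification.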
  • 15. Error measuring ● SMAPE (symmetric mean absolute percentage error) is an accuracy measure based on percentage (or relative) errors. One supposed problem with SMAPE is that, despite its name, it is not symmetric: over- and under-forecasts are not treated equally. ● Two further measures on the slide are scale-dependent. ● MPE (mean percentage error) is the computed average of percentage errors. It can be used as a measure of the bias in the forecasts. ● MAPE (mean absolute percentage error) usually expresses accuracy as a percentage. It puts a heavier penalty on negative errors than on positive errors.
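The formulas for these measures are not preserved in the transcript. The sketch below uses the standard textbook definitions, with the error taken as actual minus forecast. MAE and RMSE are included as two common scale-dependent measures; that they are the ones shown on the slide is an assumption. The function name `forecast_errors` is hypothetical, and the percentage measures require non-zero actual values.

```python
import numpy as np

def forecast_errors(actual, forecast):
    """Common forecast accuracy measures (percentage measures in percent)."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    e = a - f  # error convention: actual minus forecast

    return {
        "MAE": np.mean(np.abs(e)),                # scale-dependent
        "RMSE": np.sqrt(np.mean(e ** 2)),         # scale-dependent
        "MPE": 100 * np.mean(e / a),              # signed, measures bias
        "MAPE": 100 * np.mean(np.abs(e / a)),
        "SMAPE": 100 * np.mean(np.abs(e) / ((np.abs(a) + np.abs(f)) / 2)),
    }
```

For example, with actuals `[100, 200]` and forecasts `[110, 180]` the percentage errors cancel, so MPE is 0 while MAPE is 10, which illustrates why MPE reads as a bias measure and not as an accuracy measure.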
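  • 16. Robustness. Model selection $\mathrm{AIC} = -2\ln L(\hat{\theta} \mid \text{data}) + 2k$. If $n/k < 40$: $\mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}$, where $\theta$ is the set of model parameters; $L(\hat{\theta} \mid \text{data})$ is the likelihood of the candidate model given the data; $k$ is the number of estimated parameters in the candidate model; $n$ is the number of observations.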
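The symbol definitions and the n/k < 40 rule match the Akaike information criterion and its small-sample correction (AICc); the formulas themselves are not in the transcript, so the sketch below assumes the standard forms AIC = -2 ln L + 2k and AICc = AIC + 2k(k + 1) / (n - k - 1). The function name `information_criterion` is hypothetical.

```python
def information_criterion(log_likelihood, k, n):
    """AIC, switching to the small-sample AICc when n / k < 40.

    log_likelihood: maximized log-likelihood of the candidate model
    k: number of estimated parameters
    n: number of observations
    """
    aic = -2.0 * log_likelihood + 2.0 * k
    if n / k < 40:
        if n - k - 1 <= 0:
            raise ValueError("AICc is undefined when n <= k + 1")
        # Small-sample correction term
        aic += 2.0 * k * (k + 1) / (n - k - 1)
    return aic
```

Among candidate models fitted to the same data, the one with the lowest value is preferred. The correction term shrinks toward zero as n grows, so both forms agree for long series.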
  • 17. Existing solutions with Python ● Pandas ● Statsmodels ● Scikit-learn ● XGBoost ● PyFlux ● Prophet ● PyAF ● TensorFlow ● Cesium
  • 18. Usage ● Capacity planning ● Utilization maximization ● Cost minimization ● Dynamic pricing ● Supply chain management
  • 19. Inspired by Technology. Driven by Value. Find us at eleks.com Have a question? Write to eleksinfo@eleks.com Taras Firman email: taras.firman@eleks.com skype: tarasinho_318 AI&BigData 2017 4 November, Lviv