Course: Business Forecasting
Submitted to: Prof. Apratim Guha
Topic: Time Series Forecasting of the number of Individuals crossing into US via Mexico and Canada Borders
Submitted by:
Name: Shruti Nigam
Roll No.: EA21029
Business Forecasting Project Report
PGCBA Batch-IV
Preface
As part of the curriculum for the Business Forecasting course in the XLRI PGCBA-4 Program, we are required to apply various models and forecasting techniques to a case study and submit a project report. The basic objective of this project is to gain hands-on experience and practical knowledge of business forecasting tools and techniques that can be used to solve real-world problems.
In this report, we use techniques such as basic exploratory analysis, visualization of time series, regression modelling for time series data, a two-layer model (regression followed by ARIMA on the residuals), decomposition methods, and residual analysis, in order to model the data, forecast accurately, and reach our conclusions.
Doing this report helped us gain deeper insights into analytics and into the application of business forecasting to real-life challenges and problems.
Abstract
Business forecasting allows us to analyse the data in hand, create strategies for projections, and then compare the forecasting model with the realized outcome. Forecasting can be done by many methods; this project uses time series analysis, which predicts future trends or patterns by analysing data over a given period. Time series analysis focuses on the patterns found in historical data and uses statistical methods to understand how time affects the target variable. Concepts such as the seasonality, trend, cyclicity, and irregularity found in historical data are used to understand the future better. In this project we apply time series analysis to determine the current trend in the rate of influx of individuals and commercial vehicles from Mexico and Canada into the US via multiple modes of transport. Based on this analysis, forecasts are produced and compared with the realized outcome.
An overview of the methodology is presented with the introduction on the following pages.
Table of Contents
Introduction
Overview of the procedure to be followed
Data Exploration
2.2 Data Loading
2.3 Data Pre-processing
2.4 Data Exploration
Decomposition
3. Model 1: Regression with trend and seasonality
Model 2: Pure ARIMA model
Model 3: Seasonal Naïve
Model 4: Holt-Winters
Model 5: Neural network model
Model 6: ETS
ACCURACY
Forecasting
Bibliography
Appendix
Introduction
The conventional wisdom about globalization over the years has introduced us to the idea of tearing down borders. The argument is that growing integration and interdependence lead to a retreat of the regulatory state, more open borders, and more harmonious cross-border relationships. Prominent free-market advocates such as the Wall Street Journal even argued that borders were becoming meaningless not only for the flow of goods and money but also for people, heralding an emerging 'borderless world'.
However, the devastating 9/11 terrorist attacks on the US mainland turned North America's vision of a border-free continent in another direction. One of the immediate responses of US authorities after the attacks was to do something about the leaky border. Rather than simply being dismantled in the face of intensifying pressures of economic integration, border controls are being retooled and redesigned as part of a new and expanding "war on terrorism." Traditional border issues such as trade and migration are now inescapably evaluated through a security lens. Optimistic talk of opening borders has been replaced by more anxious and sombre talk about "security perimeters" and "homeland defence". The American public's views on a borderless world have also been reshaped, as fears about unpredictable terrorism have heightened since the attacks.
In this project, we examine how cross-border inflows have changed under the evolving practice and politics of North American border control, and consider the implications of these changes for cross-border relations and continental integration. We use past data to identify patterns and past trends, and finally forecast the trend for the most recent year, 2020-21.
Abbreviations used in the report
ts – Time series
WN – White noise
Overview of the procedure to be followed
Data Exploration
2.1 Data Explanation
This data file contains the total inbound crossing counts into the US from January 1996 to February 2020. Its columns specify the port and its unique code, the state, the border, the date of crossing, the mode of transport used to cross over, and the number of crossings into the US:
• Port Name: Name of the port through which the border is crossed
• State: US state in which the port is located
• Port Code: Unique port code
• Border: US-Canada or US-Mexico border
• Month: Month of crossing, January to December (data run until February 2020)
• Year: 1996 to 2020
• Date (DD/MM/YY): Date of crossing the border
• Measure: Mode of transportation
• Value: Count of crossings (people or vehicles, depending on the measure)
2.2 Data Loading
The data, which is in Excel format, was loaded into R. There are 355,511 records and 7 variables in the dataset.
2.3 Data Pre-processing
As part of data pre-processing, we checked the data set for null values; none were found.
2.4 Data Exploration
The dataset records the influx of individuals into various US states through the Mexico and Canada borders, via varied modes of transport. It consists of the number of crossings on a daily basis across multiple entry points in states across the US. In time series forecasting, only numerical variables can be used as X variables (predictors).
The data was imported into R and then grouped into monthly data, from January 1996 to February 2020. Plots were drawn to examine the behaviour of the time series over the years and months.
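The monthly grouping can be sketched in a few lines. This is a minimal Python illustration rather than the authors' R code: it assumes each record has already been parsed into a (date, value) pair from the Date and Value columns, and the helper name `monthly_totals` is hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(records):
    """Sum crossing counts per calendar month.

    `records` is an iterable of (date, value) pairs; all ports, borders
    and measures that fall in the same month are added together.
    Returns a chronologically sorted list of ((year, month), total).
    """
    totals = defaultdict(int)
    for day, value in records:
        totals[(day.year, day.month)] += value
    return sorted(totals.items())


rows = [
    (date(2011, 1, 1), 120),  # port A, January 2011
    (date(2011, 1, 1), 80),   # port B, January 2011
    (date(2011, 2, 1), 150),  # port A, February 2011
]
print(monthly_totals(rows))  # [((2011, 1), 200), ((2011, 2), 150)]
```

Months with no records simply do not appear in the output, so any gaps should be checked before the totals are treated as a frequency-12 series.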
The plots show a varying trend over the years with no cyclic pattern. Seasonality appears to be a prominent characteristic.
Figure 1. Data Table for Individuals and commercial vehicles crossing US border into states, Jan 1996 to Feb 2020
There is a sharp decline after 2001; the series stabilizes by around 2012 and then starts moving slightly upwards. Because a significant change in trend is observed from 2010, data up to 2010 should not be used in the analysis. We therefore reduce the dataset and consider data from January 2011 onwards (see Fig-2).
Figure 2. Autoplot of entire data, since 1996
There is clear seasonality every 12 months, so the dataset can be treated as having frequency 12. The seasonal pattern is clearly visible across the years and is almost the same every year: although the number of people crossing the border has decreased substantially decade by decade, the monthly pattern through the year has stayed the same since 1996 (see Fig-3 and Fig-4).
There seems to be no cyclic pattern: in the first decade the numbers descend and in the next decade they ascend, and even five-year movements differ.
The graphs suggest that the time series has prominent trend and seasonal components, i.e., the ts is non-stationary.
The data has been reduced to 2011 onwards. Plotting the data from 2011 to 2020 shows a clear seasonal pattern and a slightly increasing trend (see Fig-6 and Fig-7).
Figure 3. Seasonality plot of entire data
Figure 4. Seasonality plot of entire data, numbers have decreased but pattern remains similar
Figure 5. Plot of Reduced dataset since Jan 2011 to Feb 2020
We now decompose the time series to understand the components that will help in building the forecasting model.
The seasonal plots show the presence of seasonality in the 2011-2020 data (see Fig-8 and Fig-9). Since 1996, the number of people crossing the borders into the US has followed the same pattern through the months of the year; despite the large decrease in crossings each decade, the seasonal pattern persists (see Fig-8, Fig-9, and Fig-10).
Decomposition
To analyse the data further, we decompose it into its individual components: trend, seasonality, and residuals/errors (see Fig-6).
Classical additive decomposition was run on the data, and the following observations were made:
1. Strong seasonality is present
2. The trend is strong and volatile
3. The remainder has high variance
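Classical additive decomposition can be sketched as below: a 2x12 centred moving average estimates the trend, the de-trended values are averaged month by month to give seasonal indices, and the remainder is what is left. This is a minimal Python sketch with hypothetical function names, not the R implementation used in the report; it assumes monthly data starting in January and at least two full years. The synthetic series (linear trend plus a fixed monthly pattern) is illustrative only.

```python
def centred_ma_2x12(y):
    """2x12 centred moving average (trend estimate for monthly data).

    Entries within 6 months of either end are None, as in classical
    decomposition.
    """
    n = len(y)
    trend = [None] * n
    for t in range(6, n - 6):
        window = y[t - 6:t + 7]  # 13 consecutive months centred on t
        trend[t] = (0.5 * window[0] + sum(window[1:12]) + 0.5 * window[12]) / 12
    return trend


def classical_additive(y, m=12):
    """Classical additive decomposition: y = trend + seasonal + remainder.

    Assumes y starts at the first season (January) and m == 12.
    """
    trend = centred_ma_2x12(y)
    detrended = [v - tr if tr is not None else None for v, tr in zip(y, trend)]
    # Average the de-trended values for each month of the year.
    raw = []
    for j in range(m):
        vals = [detrended[t] for t in range(j, len(y), m) if detrended[t] is not None]
        raw.append(sum(vals) / len(vals))
    # Centre the seasonal indices so that they sum to zero over a year.
    offset = sum(raw) / m
    indices = [s - offset for s in raw]
    seasonal = [indices[t % m] for t in range(len(y))]
    remainder = [v - tr - s if tr is not None else None
                 for v, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, remainder


pattern = [3, -1, 2, 0, -4, 1, -2, 5, -3, 0, 1, -2]  # sums to zero
y = [10 + 0.5 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, remainder = classical_additive(y)
```

On this noise-free series the decomposition recovers the linear trend and the monthly pattern exactly, leaving a remainder of (numerically) zero; on real data the remainder carries the irregular variation noted above.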
Figure 5. Seasonality, sub series plots of reduced Dataset. Similar patterns as entire dataset
Classical multiplicative decomposition was also run on the data, and the observations were similar to those for the additive model.
Because these two methods did not clearly distinguish between the additive and multiplicative structures, X11 seasonal adjustment was applied to understand the time series components.
The X11 findings are as follows:
1. It automatically selected the additive time series structure
2. Seasonality is strong, and the size of the seasonal fluctuations does not increase with time
The time series decomposition therefore defines the model as additive.
We now prepare to build the forecasting models.
We divide the dataset into training and test sets: January 2011 to December 2018 is the training data and January 2019 onwards is the test data.
Before modelling, we checked both the grouped data and the training data for heteroscedasticity by estimating the Box-Cox transformation parameter λ, obtaining λ = 0.5925186. Transforming with this λ makes the size of the seasonal variation about the same across the whole series, which makes the forecasting model simpler; in this case it works quite well.
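For reference, the Box-Cox transformation and its inverse can be written as below. This is a minimal Python sketch, not the authors' code: it only applies the transformation for a given λ (the report quotes λ = 0.5925186); estimating λ itself is not shown, and the function names and the three monthly values are hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform: log(y) if lam == 0, else (y**lam - 1) / lam.

    Requires strictly positive data, which holds for crossing counts.
    """
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def inv_box_cox(w, lam):
    """Back-transform to the original scale (e.g. for forecasts)."""
    if lam == 0:
        return [math.exp(v) for v in w]
    return [(lam * v + 1) ** (1 / lam) for v in w]


LAMBDA = 0.5925186                # value estimated in the report
monthly = [2.4e7, 2.9e7, 2.6e7]  # illustrative monthly totals, not report data
transformed = box_cox(monthly, LAMBDA)
restored = inv_box_cox(transformed, LAMBDA)
```

Models are fitted on the transformed scale and forecasts are back-transformed with the inverse, so the round trip above should reproduce the original values.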
The 12-month seasonality is also evident in the ACF plot: the autocorrelations decrease towards zero very slowly, and many lags lie outside the significance bounds. Hence, the training data is non-stationary (with seasonality present, a separate check for non-stationarity is not strictly needed).
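The ACF values and significance bounds referred to here can be computed as follows. This is a minimal Python sketch with hypothetical helper names, not the authors' code; it uses the same sample autocorrelation definition as R's acf() and the approximate 95% bounds ±1.96/√T. The sine series stands in for a purely seasonal monthly pattern and is not report data.

```python
import math


def acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag (mean-centred, divisor n)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / n / c0
            for k in range(1, max_lag + 1)]


def acf_bound(n):
    """Approximate 95% significance bound (+/-) drawn on ACF plots."""
    return 1.96 / math.sqrt(n)


# A purely seasonal monthly pattern: strong positive ACF at lag 12,
# strong negative ACF at lag 6.
series = [math.sin(2 * math.pi * t / 12) for t in range(96)]
r = acf(series, 24)
outside = [k for k, rk in enumerate(r, start=1) if abs(rk) > acf_bound(len(series))]
```

Lags listed in `outside` are the ones that would appear beyond the dashed lines on an ACF plot.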
Figure 6. Residual check in reduced dataset.
Moreover, the Box-Pierce test rejects the null hypothesis that the series is white noise, confirming significant autocorrelation.
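The Box-Pierce and Ljung-Box statistics used throughout this report can be computed as below. This is a minimal Python sketch, not the authors' R code; to stay dependency-free it computes the chi-square p-value with a closed form that is only valid for an even number of degrees of freedom (for example lag 24 with no fitted parameters). That restriction is an assumption of this sketch, not a limitation of the tests, and the function names are hypothetical.

```python
import math


def _acf(y, max_lag):
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / n / c0
            for k in range(1, max_lag + 1)]


def chi2_sf_even(x, df):
    """P(X > x) for a chi-square variable with an even number of df."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles even, positive df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def portmanteau(y, h, fitdf=0):
    """Return (Box-Pierce Q, Ljung-Box Q*, Ljung-Box p-value) for lags 1..h.

    The null hypothesis is that the series is white noise; fitdf is the
    number of fitted model parameters when y holds model residuals.
    """
    n = len(y)
    r = _acf(y, h)
    q_bp = n * sum(rk ** 2 for rk in r)
    q_lb = n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))
    return q_bp, q_lb, chi2_sf_even(q_lb, h - fitdf)


seasonal = [math.sin(2 * math.pi * t / 12) for t in range(96)]
q_bp, q_lb, p_value = portmanteau(seasonal, 24)  # tiny p-value: not white noise
```

A small p-value rejects the white-noise null, which is how the tests are read for each model's residuals below.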
We now start modelling the ts to forecast the influx of passengers into US states (including crossings at both borders).
3. Model 1: Regression with trend and seasonality
The tslm() function fits a linear regression model to time series data. We use it to model our dataset and obtain forecasts.
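To make the trend + season regression concrete, the sketch below fits the same design as tslm(Value ~ trend + season), namely an intercept, a linear trend t = 1, 2, ..., and dummy variables season2 to season12 with the first month as the baseline, by ordinary least squares. It is a minimal, dependency-free Python illustration with hypothetical function names and synthetic data, not the forecast package's implementation.

```python
def design_row(t, m=12):
    """Intercept, trend t (1-based) and dummies for seasons 2..m."""
    season = (t - 1) % m + 1
    return [1.0, float(t)] + [1.0 if season == j else 0.0 for j in range(2, m + 1)]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def fit_trend_season(y, m=12):
    """OLS coefficients [intercept, trend, season2, ..., season_m]."""
    X = [design_row(t, m) for t in range(1, len(y) + 1)]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)  # normal equations X'X b = X'y


# Synthetic check: build a noise-free series from known coefficients.
true = [100.0, 3.0, 5, -2, 0, 7, 1, -3, 4, 2, -1, 6, 8]
y = [sum(c * x for c, x in zip(true, design_row(t))) for t in range(1, 97)]
coef = fit_trend_season(y)
```

The returned coefficients are ordered like tslm's output (intercept, trend, season2, ..., season12), so a fitted value for month t is simply the dot product of `coef` with `design_row(t)`.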
To build the forecasting model, the data has been split into training and test sets, with the training set containing data points from Jan 2010 to Dec 2017 and the test set containing data points from Jan 2018 to the end of the series.
The t-values and p-values of the coefficients carry no meaning in terms of forecasting. If the predictions are close to the actual values, we would expect R² to be close to 1. On the other hand, if the predictions are unrelated to the actual values, then R² = 0 (again, assuming there is an intercept). In all cases, R² lies between 0 and 1.
The multiple R² (0.9575) and adjusted R² (0.9504) are both close to 1, meaning the fitted values from Model 1 are very close to the original values, as the following figure shows.
We also check the residuals, to see whether the model is good enough for prediction on the test data: we check the residuals on the training data and then check accuracy on the test data, because there is no guarantee that a model that performs well on the training data will perform as well or better on the test data.
Although the fitted values are very close to the original values, the residuals of the model exhibit autocorrelation. This is clear from the residual plot, and the Breusch-Godfrey test for serial correlation gives p-value = 0.01292 < 0.05, rejecting the null hypothesis of no autocorrelation (i.e., the residuals are not white noise).
tslm(formula = Value ~ trend + season, data = crossing.ts.train)
$$\widehat{\text{Value}} = 43948 - 2177459\,\text{season2} + 1797434\,\text{season3} + 1110099\,\text{season4} + 2456941\,\text{season5} + 2125431\,\text{season6} + 5482345\,\text{season7} + 5572770\,\text{season8} + 1241338\,\text{season9} + 1989670\,\text{season10} + 968427\,\text{season11} + 2760972\,\text{season12}$$
Figure 7. Fitted line by TSLM model over train dataset.
The residuals show variation, but there is no sign of heteroscedasticity.
The histogram shows that the residuals are slightly skewed, which may also affect the coverage probability of the prediction intervals. Most lags fall outside the ACF significance bounds (±1.96/√T, where T is the length of the series). In any case, the autocorrelation is large, which will affect the prediction intervals (PI) and hence the forecasts.
Hence, we fit a second-layer ARIMA model to the residuals, to improve predictive capability by capturing the information left in them.
First, we check how much differencing is needed to make the residuals stationary.
Differencing can help stabilise the mean of a time series by removing changes in its level, thereby eliminating (or reducing) trend and seasonality.
Using a sequence of KPSS tests to determine the appropriate number of first differences shows that zero first differences are needed to make the data stationary, while the corresponding check for seasonal differencing returns 1 (indicating that one seasonal difference is required). Together, these suggest one seasonal difference and no first difference.
So, d = 0 and D = 1. The trend and seasonality variables are then created on this basis and fed into the ARIMA model.
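Seasonal and first differencing, as used to choose d and D, amount to simple lagged subtractions. A minimal Python sketch with hypothetical helper names (not the authors' code), applying D = 1 and d = 0 to a synthetic series with a linear trend and a fixed monthly pattern:

```python
def diff(y, lag=1):
    """Lagged difference y_t - y_{t-lag}."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


def difference(y, d=0, D=0, m=12):
    """Apply D seasonal differences (lag m), then d first differences."""
    for _ in range(D):
        y = diff(y, m)
    for _ in range(d):
        y = diff(y, 1)
    return y


pattern = [3, -1, 2, 0, -4, 1, -2, 5, -3, 0, 1, -2]
y = [10 + 2 * t + pattern[t % 12] for t in range(48)]
seasonally_differenced = difference(y, d=0, D=1)  # d = 0, D = 1 as above
```

The seasonal difference removes the fixed monthly pattern entirely and turns the linear trend into a constant (2 × 12 = 24 here); each difference shortens the series, by 12 observations for a seasonal difference and by one for a first difference.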
The model fitted by auto.arima is ARIMA(1,0,1)(2,1,0)[12], with AICc = 1241.86.
The AR and MA coefficients are less than 1 in absolute value, the sum of the two seasonal AR coefficients is less than one, and the sum of the two MA coefficients is also less than one.
Figure 8. Residual check for TSLM model
$$(1 - 0.9764B)(1 + 0.4329B^{12})(1 - B^{12})\,Y_t = (1 - 0.3645B^{12})\,e_t$$
The Ljung-Box test p-value is 0.1967, greater than 0.05, so we fail to reject the null hypothesis, i.e., the residuals are consistent with white noise. Lag 7 lies outside the significance bounds, but a single lag outside the bounds should not materially affect the model's predictions. This seems to be a good model, and the ARIMA model is stationary and invertible.
Model TSLM + SARIMA: AICc = 1241.86, MAPE = 225.185, MASE = 0.6689771
Model 2: Pure ARIMA model
Before fitting ARIMA, we check for non-stationarity and whether differencing is needed, using unit root tests. These are statistical hypothesis tests of stationarity that are designed to determine whether differencing is required.
A number of unit root tests are available, which are based on different assumptions and may lead to
conflicting answers. In our analysis, we use the Kwiatkowski-Phillips-Schmidt-Shin (KPSS)
test (Kwiatkowski, Phillips, Schmidt, & Shin, 1992). In this test, the null hypothesis is that the data are
stationary, and we look for evidence that the null hypothesis is false. Consequently, small p-values
(e.g., less than 0.05) suggest that differencing is required.
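The KPSS level-stationarity statistic can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in the report: it uses a Newey-West long-run variance with Bartlett weights and the short truncation lag trunc(4(n/100)^(1/4)), and it compares the statistic with the 5% critical value 0.463 from Kwiatkowski et al. (1992) instead of computing an interpolated p-value. The function names are hypothetical.

```python
def kpss_level(y, lags=None):
    """KPSS statistic for the null hypothesis of level stationarity."""
    n = len(y)
    mean = sum(y) / n
    e = [v - mean for v in y]
    # Partial sums of the demeaned series.
    total, partial = 0.0, []
    for v in e:
        total += v
        partial.append(total)
    if lags is None:
        lags = int(4 * (n / 100) ** 0.25)
    # Long-run variance: autocovariances weighted by the Bartlett kernel.
    lrv = sum(v * v for v in e) / n
    for j in range(1, lags + 1):
        gamma = sum(e[t] * e[t - j] for t in range(j, n)) / n
        lrv += 2 * (1 - j / (lags + 1)) * gamma
    return sum(s * s for s in partial) / (n ** 2 * lrv)


KPSS_CRIT_5PCT = 0.463  # level stationarity, 5% significance


def needs_differencing(y):
    """True if KPSS rejects level stationarity at the 5% level."""
    return kpss_level(y) > KPSS_CRIT_5PCT


trending = [float(t) for t in range(100)]        # rejected: needs differencing
alternating = [(-1.0) ** t for t in range(100)]  # not rejected
```

Because the null hypothesis is stationarity, a large statistic (small p-value) is evidence that differencing is required, which is how the KPSS result below is read.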
On checking the residuals of the training data, lag 1 is highly significant, followed by a negative spike at lag 6. The autocorrelations oscillate at a fixed lag interval, indicating prominent seasonality, and several lags fall outside the significance bounds, meaning there is no white noise.
The KPSS test for level stationarity has p-value = 0.01 < 0.05, rejecting the null hypothesis, i.e., the series is non-stationary.
The Augmented Dickey-Fuller test has p-value = 0.2981, so we fail to reject its null hypothesis of a unit root, i.e., the ts is non-stationary.
The tests show that one first difference and no seasonal difference are needed to make the series stationary.
We fit a pure ARIMA model with d = 1 and D = 0 using auto.arima(); the model is ARIMA(0,1,0), with AICc = 1015.96, MAPE = 1.374052, and MASE = 0.611111.
Figure 9. Pure Arima Model fitted over train data
A Ljung-Box test returns p-value = 0.001733 < 0.05, suggesting that the residuals are NOT white noise. Lag 1 is negative and significant, falling outside the significance bounds; the other lags move in waves, and a single significant lag may not have a significant impact on forecasting.
The seasonal component is not captured by this model. Plotting the model's fitted values against the original dataset, we can see that the fitted values closely follow the test data values, so the model seems a good fit.
Next, we try to build a better model directly on the training data. To make the training data stationary, we need one non-seasonal and one seasonal difference, so the next pure model is a seasonal ARIMA.
The model is ARIMA(0,1,1)(0,1,1)[12], with AICc = 852.48, MAPE = -0.2317185, and MASE = 1.103649.
This model has the lowest AICc so far. The Ljung-Box test has p-value = 0.9549, so we fail to reject the null hypothesis, i.e., the residuals are white noise, as is also evident from the ACF plot. This is the best model so far.
Now, we explore other models.
Model 3: Seasonal Naïve
The naïve, seasonal naïve, average, and drift forecasting methods were applied to the data.
The seasonal naïve method has the lowest MAPE and MASE on the test data.
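The four benchmark methods are simple enough to write out directly. A minimal Python sketch with hypothetical function names (not the authors' code), following the standard definitions of these methods; `history` is an illustrative series, not report data.

```python
def naive(y, h):
    """Every forecast equals the last observation."""
    return [y[-1]] * h


def seasonal_naive(y, h, m=12):
    """Each forecast equals the last observed value from the same season."""
    return [y[len(y) - m + (i % m)] for i in range(h)]


def average(y, h):
    """Every forecast equals the mean of the history."""
    return [sum(y) / len(y)] * h


def drift(y, h):
    """Extend the straight line joining the first and last observations."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + slope * (i + 1) for i in range(h)]


history = list(range(1, 25))  # two illustrative "years" of monthly values
forecasts = {
    "naive": naive(history, 3),
    "seasonal naive": seasonal_naive(history, 14),
    "average": average(history, 1),
    "drift": drift(history, 3),
}
```

For horizons longer than one year, the seasonal naïve method simply repeats the last observed year, which is why it captures the strong monthly pattern of this series well.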
Comparing the outputs of these methods, the seasonal naïve technique gives the best fit. Checking the model's residuals, the following observations can be made (Fig-10):
1. ACF: all lags lie beyond the significance bounds
2. Lag 1 is significant, and every 12th lag is higher than the rest
3. The residuals are left-skewed, although the distribution is otherwise close to normal
4. There is strong autocorrelation
The Ljung-Box test gives a highly significant p-value (< 2.2e-16) (Fig-11), indicating that the residuals are not white noise: they still contain information and require further modelling.
Again, we check the differencing needed to make the residuals stationary: one non-seasonal difference is needed. We fit auto.arima with d = 1; the model is ARIMA(0,1,1), with AICc = 1195.28, MAPE = 246.2098, and MASE = 0.6888965.
This is the model for forecasting the residuals. Its residuals are pure white noise, with Ljung-Box p-value = 0.1834 > 0.05. Next, we combine the forecasts with the original values: the final forecasts are obtained by adding the seasonal naïve forecasts to the residual forecasts from the ARIMA model, and these are then plotted.
$$(1 - B)(1 - B^{12})\,Y_t = (1 - 0.7723B)(1 - 0.6338B^{12})\,e_t$$
$$(1 - B)\,Y_t = (1 - 0.8044B)\,e_t$$
As the plot shows, the fit is not as good as that of the pure ARIMA model; we compare accuracy later. Next, we see how smoothing methods perform on this data.
Model 4: Holt-Winters
We fit both additive and multiplicative Holt-Winters models, as both trend and seasonality are present.
The output of HoltWinters() shows that the estimated alpha parameter is about 0.032 for the additive model and 0.0616 for the multiplicative model. These values are very close to zero, telling us that the forecasts are based on both recent and less recent observations (although somewhat more weight is placed on recent observations).
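The additive Holt-Winters recursions behind these smoothing parameters can be sketched as below. This minimal Python version follows the level, trend, and seasonal update equations documented for R's HoltWinters(), but its initialisation from the first two years is a simplifying assumption that differs from R's, the parameters are supplied rather than estimated, and the function name is hypothetical. Only alpha = 0.032 comes from the report; beta and gamma in the example are illustrative.

```python
def holt_winters_additive(y, alpha, beta, gamma, m=12, h=12):
    """Additive Holt-Winters smoothing; returns forecasts for steps 1..h.

    Needs at least two full seasons (2*m observations) for initialisation.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons")
    # Simple initialisation (assumption): first-year mean, year-on-year slope.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / m ** 2
    season = [y[i] - level for i in range(m)]
    for t in range(m, len(y)):
        s_prev = season[t % m]  # seasonal index from one year earlier
        prev_level = level
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s_prev
    n = len(y)
    # Forecast: last level + k * last trend + latest index for that month.
    return [level + k * trend + season[(n + k - 1) % m] for k in range(1, h + 1)]


pattern = [3, -1, 2, 0, -4, 1, -2, 5, -3, 0, 1, -2]
y = [100 + pattern[t % 12] for t in range(48)]  # illustrative, not report data
fc = holt_winters_additive(y, alpha=0.032, beta=0.1, gamma=0.3)
```

A small alpha means the level moves slowly, averaging over many past observations; on this flat, purely seasonal example the forecasts simply reproduce the monthly pattern around a level of 100.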
The additive structure fits better than the multiplicative one.
For the additive model, the Ljung-Box test gives p-value = 0.01273 < 0.05, rejecting the null hypothesis, i.e., some autocorrelation remains in the residuals. However, on checking the residual ACF only lag 24 is significant, and one large lag will not affect the forecast significantly. We then plot the dataset with the fitted values.
The residuals are approximately normally distributed, slightly skewed, with no heteroscedasticity.
While the linear exponential smoothing models are all special cases of ARIMA models, the non-linear exponential smoothing models have no equivalent ARIMA counterparts. For non-stationary residuals and data, we can therefore also explore ETS models.
Model 5: Neural network model
We model the training data with a neural network. In the residual ACF, lag 1 is significant and positive, and from lag 18 onwards all lags are negative. No heteroscedasticity is present in the residuals.
The model is NNAR(2,1,2). No AICc value is computed for this model.
We compare the accuracy of its forecast later.
Next, we explore ETS, since the data residuals have displayed non-stationarity. All ETS models are non-stationary, while some ARIMA models are stationary.
Model 6: ETS
The ETS model is a univariate time series forecasting method focused on the trend and seasonal components. An ETS model is identified by a three-character string describing the nature of its components, which are modelled alongside the level ℓ_t:
first character: nature of the error (remainder) term
second character: nature of the trend, b_t
third character: nature of the seasonality, s_t
(Sindhanuru, n.d.) (Athanasopoulos, n.d.)
ETS models with seasonality, a non-damped trend, or both have two unit roots (i.e., they need two levels of differencing to become stationary). All other ETS models have one unit root (they need one level of differencing to become stationary).
Fitting a simple ETS model to the training dataset gives ETS(M,Ad,M): the error is multiplicative, the trend is additive but damped, and the seasonality is multiplicative.
Its alpha = 0.1316 is close to zero; AICc = 2581.995, MAPE = 1.051038, and MASE = 0.4693708.
The residuals are not white noise: the Ljung-Box test p-value = 0.006745 is less than 0.05, rejecting the null hypothesis. As the plot shows, only lag 24 is outside the significance bounds, so this may not have a significant effect on prediction. The MASE is the lowest, but the AICc is among the highest of the models in this project. Additionally, the residuals are normally distributed.
When the fitted values are plotted on the training dataset, the fitted line closely follows the original data points and the shape of the ts.
ACCURACY
We forecast the number of individuals crossing the US border with each model we have built, and then test each model's accuracy on the test data. The performance of each model on the test data is as follows:
Accuracy on test data

| Model Number | Model Name | AICc of Model | MAPE | MASE |
|---|---|---|---|---|
| 1 | TSLM | 1241.86 | 1.672851 | 0.7902939 |
| 2 | Pure ARIMA | 852.48 | 1.570884 | 0.7342843 |
| 3 | Seasonal Naive | 1195.28 | 2.017029 | 0.9388456 |
| 4 | Holt-Winters | 3436.111 | 1.527619 | 0.7171233 |
| 5 | Neural Network | - | 2.309248 | 1.0874329 |
| 6 | ETS | 2581.995 | 1.865868 | 0.8714899 |
The pure ARIMA model has the lowest AICc, with a low MASE and MAPE.
Holt-Winters has the lowest MAPE and MASE, but the highest AICc.
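The two accuracy measures in the table can be computed as below. This is a minimal Python sketch with hypothetical function names, not the authors' code: MAPE is expressed in percent, and MASE scales the test-set mean absolute error by the in-sample mean absolute error of the seasonal naïve method (lag-12 differences of the training data), the scaling commonly used for seasonal series. The numbers are illustrative only.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mase(train, actual, forecast, m=12):
    """Mean absolute scaled error.

    Test-set MAE divided by the in-sample MAE of the seasonal naive
    method (lag-m differences of the training data).
    """
    scale = sum(abs(train[t] - train[t - m]) for t in range(m, len(train))) / (len(train) - m)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


train = [10] * 12 + [12] * 12  # illustrative values only
actual, forecast = [14, 16], [12, 12]
```

A MASE below 1 means the forecasts beat the in-sample seasonal naïve benchmark on average; MAPE is undefined whenever an actual value is zero, which does not arise for these crossing counts.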
Forecasting
We forecast with the pure ARIMA model, re-training the chosen model on the whole dataset and then producing the forecast.
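Simple ETS model:
The final ETS(M,Ad,M) states are the level $\ell_T = 27398694.1062$, the trend $b_T = 104492.8019$, and the twelve seasonal indices $s = (1.0273, 0.9665, 1.0007, 0.977, 1.1223, 1.1185, 1.0062, 1.0183, 0.972, 0.9953, 0.8618, 0.9341)$. Because the trend is additive damped and the seasonality is multiplicative, they combine into the point forecast
$$\hat{y}_{T+h|T} = (\ell_T + \phi_h b_T)\, s_{T+h-m(k+1)}, \qquad \phi_h = \phi + \phi^2 + \dots + \phi^h, \quad k = \lfloor (h-1)/m \rfloor$$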
The forecast shows the number of individuals increasing while replicating the seasonal shape of previous years. These are the forecast values for the next 12 months, with 80%, 90%, and 95% prediction intervals.
In conclusion, the pattern of influx may see a decline compared with previous years. This could be an impact of the ongoing pandemic, of increased legal requirements for crossing the border, or of many other factors that are beyond the scope of this study.
We can now validate the model using data from March 2020 to December 2021, available at (Border-Crossing-Entry-Data, n.d.).
The model cannot predict the movement from the previous data because of pandemic restrictions at the borders. There is a sharp decrease in April 2020, followed by a gradual increase as restrictions were lifted over time. The model, however, shows the same seasonal movement through the pandemic, because macro-economic and political factors are not included in the forecasting model. We have learned that a proper forecast requires a comprehensive approach combining analytics, domain knowledge, expert views, and judgemental forecasts.
Figure 10. Final Forecast by Pure ARIMA
Bibliography
Athanasopoulos, R. J. (n.d.). Forecasting: Principles and Practice (2nd ed.). Retrieved from otexts.com: https://otexts.com/fpp2/arima-ets.html
Border-Crossing-Entry-Data. (n.d.). Retrieved from data.bts.gov: https://data.bts.gov/Research-and-Statistics/Border-Crossing-Entry-Data/keg4-3bc2/data
Sindhanuru, H. (n.d.). Retrieved from www.latentview.com: https://www.latentview.com/idealab/exponential-smoothing-ets-framework/
time-series-analysisforecast-with-visualization. (n.d.). Retrieved from kaggle: https://www.kaggle.com/datafan07/time-series-analysisforecast-with-visualization/data
PowerPoint slides, R and Rmd files, and code provided by the Professor during sessions.
Figure 1. Data Table for Individuals and commercial vehicles crossing US border into states, Jan 1996 to Feb 2020
Figure 2. Autoplot of entire data, since 1996
Figure 3. Seasonality plot of entire data
Figure 4. Seasonality plot of entire data, numbers have decreased but pattern remains similar
Figure 5. Seasonality, sub series plots of reduced Dataset. Similar patterns as entire dataset
Figure 6. Residual check in reduced dataset.
Figure 7. Fitted line by TSLM model over train dataset.
Figure 8. Residual check for TSLM model
Figure 9. Pure Arima Model fitted over train data
Figure 10. Final Forecast by Pure ARIMA
Appendix
.rmd file
Knitted word document.
C:UsersShruti
DocumentsXLRIBF Apratim Guhaassignment BFG6Group 6 PGCBA IV BF PROJECTGroup6-Border-Crossing Knitted doc.docx

Table of Contents
Introduction (p. 5)
Overview of the procedure to be followed (p. 5)
Data Exploration (p. 6)
2.2 Data Loading (p. 6)
2.3 Data Pre-processing (p. 6)
2.4 Data Exploration (p. 6)
Decomposition (p. 9)
3. Model 1: Regression with trend and seasonality (p. 11)
Model 2: Pure ARIMA model (p. 13)
Model 3: Seasonal Naïve (p. 14)
Model 4: Holt-Winters (p. 15)
Model 5: Neural network model (p. 15)
Model 6: ETS (p. 15)
Accuracy (p. 16)
Forecasting (p. 16)
Bibliography (p. 18)
Appendix (p. 18)
Introduction
Conventional wisdom about globalization over the years has introduced us to the idea of tearing down borders. The argument is that growing integration and interdependence lead to a retreat of the regulatory state, more open borders, and more harmonious cross-border relationships. Prominent free-market advocates such as the Wall Street Journal even argued that borders were becoming meaningless not only for the flow of goods and money but also for people, backing the emerging 'borderless world'. However, the devastating 9/11 terrorist attacks on the US mainland turned North America's vision of a border-free continent in another direction. One of the immediate responses of US authorities after the attacks was to do something about the leaky border. Rather than being dismantled under the intensifying pressures of economic integration, border controls were retooled and redesigned as part of a new and expanding "war on terrorism." Traditional border issues such as trade and migration are now inescapably evaluated through a security lens, and optimistic talk of opening borders has been replaced by more anxious and sombre talk about "security perimeters" and "homeland defence". The American public's views on a borderless world were also reshaped as fears of unpredictable terrorism heightened after the attacks.
In this project, we examine how the changing practice and politics of North American border controls are reflected in inbound crossings, and analyse the implications of these changes for cross-border relations and continental integration. We use past data to identify patterns and trends, and finally forecast the trend for the most recent period, 2020-21.

Abbreviations used in the report
ts: Time Series
WN: White noise

Overview of the procedure to be followed
Data Exploration
2.1 Data Explanation
The data file contains the total inbound crossing counts into the US from January 1996 to February 2020. It has 7 columns specifying the port name and its unique code, the state, the border, the date of crossing, the mode of transport used to cross, and the number of crossings into the US:
• Port Name: name of the port through which the border is crossed
• State: US state
• Port Code: unique port code
• Border: US-Canada or US-Mexico border
• Month: January to December (the data runs up to February 2020)
• Year: 1996 to 2020
• Date (DD/MM/YY): date of crossing the border
• Measure: mode of transportation
• Value: count of people crossing
2.2 Data Loading
The data, which is in Excel format, was loaded into a variable. There are 355,511 records and 7 variables in the dataset.
2.3 Data Pre-processing
As part of data pre-processing, we checked the dataset for nulls. No nulls were found.
2.4 Data Exploration
The dataset represents the inflow of people from the Mexico and Canada borders into various US states via different modes of transport. It records the number of crossings by date across multiple ports of entry in different states. In time series forecasting, only numerical variables can be used as predictors (X variables). The data was imported into R and aggregated into a monthly series from January 1996 to February 2020. We then plotted the series to ascertain its behaviour over the years and months. The plots show a trend that varies over the years with no cyclic pattern, while seasonality appears to be a prominent characteristic.
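The loading, null check and monthly aggregation steps above were carried out in R. As a language-neutral illustration, the Python sketch below shows the same idea on a few in-memory records. The tuple layout, port codes and sample values are hypothetical, and the `since()` helper mirrors the later decision to keep only data from January 2011 onwards.

```python
from collections import defaultdict
from datetime import date
from typing import Optional

# Hypothetical record layout (an assumption, not the file's exact schema):
# (port_code, border, crossing_date, measure, value)
Record = tuple[str, str, date, str, Optional[int]]


def count_nulls(records: list[Record]) -> int:
    """Number of records with at least one missing field."""
    return sum(1 for rec in records if any(field is None for field in rec))


def monthly_totals(records: list[Record]) -> list[tuple[int, int, int]]:
    """Sum Value over all ports, borders and measures for each (year, month)."""
    totals: dict[tuple[int, int], int] = defaultdict(int)
    for _port, _border, crossing_date, _measure, value in records:
        totals[(crossing_date.year, crossing_date.month)] += value
    # Chronological order, so the result reads as a monthly time series.
    return [(year, month, totals[(year, month)]) for year, month in sorted(totals)]


def since(series: list[tuple[int, int, int]], year: int, month: int) -> list[tuple[int, int, int]]:
    """Keep observations from (year, month) onwards, e.g. since(series, 2011, 1)."""
    return [row for row in series if (row[0], row[1]) >= (year, month)]


if __name__ == "__main__":
    sample = [
        ("0001", "US-Mexico Border", date(2010, 12, 1), "Pedestrians", 90),
        ("0001", "US-Mexico Border", date(2011, 1, 1), "Pedestrians", 100),
        ("0002", "US-Canada Border", date(2011, 1, 1), "Personal Vehicle Passengers", 250),
        ("0001", "US-Mexico Border", date(2011, 2, 1), "Pedestrians", 120),
    ]
    print(count_nulls(sample))
    print(since(monthly_totals(sample), 2011, 1))
```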
Figure 1. Data table for individuals and commercial vehicles crossing the US border into states, Jan 1996 to Feb 2020
There is a sharp decline after 2001, which stabilizes by 2012, after which the series starts moving slightly upwards. Because a significant change in trend is observed from 2010, data up to 2010 is not used in the analysis: we reduce the dataset and consider data from January 2011 onwards (see Fig-2).
Figure 2. Autoplot of the entire data, since 1996
There is clear seasonality with a period of 12 months, so the dataset can be treated as having frequency 12. The seasonal pattern is clearly visible across the years and appears almost identical every year: although the number of people crossing the border has decreased substantially every decade, crossings have followed the same month-by-month pattern through the year since 1996 (see Fig-3 and Fig-4). There is no cyclic pattern: in the first decade the numbers descend, in the next decade they ascend, and even 5-year movements differ. The graphs suggest that the time series has prominent trend and seasonal components, i.e., the time series is non-stationary. After reducing the data to 2011 onwards and plotting 2011 to 2020, we observe a clear seasonal pattern and a slightly increasing trend (see Fig-6 and Fig-7).
Figure 3. Seasonality plot of entire data
Figure 4. Seasonality plot of entire data; numbers have decreased but the pattern remains similar
Figure 5. Plot of reduced dataset, Jan 2011 to Feb 2020
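The visual argument above (a 12-month seasonal period and a non-stationary series) can be checked numerically with the sample autocorrelation function (ACF). The plain-Python sketch below computes the sample ACF together with the Box-Pierce and Ljung-Box portmanteau statistics that the report later applies to residuals. It runs on a synthetic sine wave with period 12, not on the border data, and the function names are our own. In practice each statistic is compared with a chi-squared distribution with h degrees of freedom (fewer when applied to the residuals of a fitted model); that p-value step is omitted because the standard library has no chi-squared CDF.

```python
import math


def acf(y: list[float], max_lag: int) -> list[float]:
    """Sample autocorrelations r_1, ..., r_max_lag (lag 0 is omitted)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def box_pierce(y: list[float], h: int) -> float:
    """Q = T * sum_{k=1..h} r_k^2."""
    return len(y) * sum(r * r for r in acf(y, h))


def ljung_box(y: list[float], h: int) -> float:
    """Q* = T(T+2) * sum_{k=1..h} r_k^2 / (T - k)."""
    n = len(y)
    r = acf(y, h)
    return n * (n + 2) * sum(r[k - 1] ** 2 / (n - k) for k in range(1, h + 1))


if __name__ == "__main__":
    # Synthetic monthly pattern with period 12 (illustrative, not the border data).
    y = [math.sin(2 * math.pi * t / 12) for t in range(120)]
    r = acf(y, 24)
    print(round(r[11], 3))  # lag 12: strong positive autocorrelation
    print(round(r[5], 3))   # lag 6: strong negative autocorrelation
    print(box_pierce(y, 24), ljung_box(y, 24))
```

On this series the lag-12 autocorrelation is 0.9 and the lag-6 autocorrelation is -0.95, the same "spike every 12 months" signature that a seasonal ACF plot shows.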
Let us decompose the time series to understand the components that will help in building the forecasting model. The seasonal plots show the presence of seasonality in the 2011-2020 data (see Fig-8 and Fig-9). Since 1996, the number of people crossing the borders into the US has followed the same pattern through the months of the year; despite the large decrease in crossings every decade, the seasonal pattern persists (see Fig-8, Fig-9, and Fig-10).
Decomposition
To analyse the data further, we decompose it into its individual components: trend, seasonality, and residuals/errors (see Fig-6). Classical decomposition (additive model) was run on the data, and the following observations were made:
1. Presence of strong seasonality
2. The trend line is strong and volatile
3. High variance in the remainders
Figure 5. Seasonality and sub-series plots of the reduced dataset; patterns are similar to the entire dataset
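Classical additive decomposition can be sketched in a few lines: estimate the trend with a centred 2x12 moving average, average the detrended values month by month to obtain seasonal indices that sum to zero, and treat what is left as the remainder. This is a minimal illustration of the textbook procedure, broadly what R's `decompose()` does, not the exact code used for this report. It assumes the series starts in January and covers at least two full years; the X11 method used next is considerably more elaborate and is not reproduced here.

```python
from typing import Optional


def centered_moving_average(y: list[float], m: int = 12) -> list[Optional[float]]:
    """2 x m moving average for even m; None where the window is incomplete."""
    half = m // 2
    trend: list[Optional[float]] = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]  # m + 1 values, half weight at both ends
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / m
    return trend


def classical_additive(y: list[float], m: int = 12):
    """Return (trend, seasonal, remainder) with y = trend + seasonal + remainder.

    Assumes y[0] is the first season of the cycle (January for monthly data).
    """
    trend = centered_moving_average(y, m)
    # Collect the detrended values for each season position.
    buckets: list[list[float]] = [[] for _ in range(m)]
    for t, (v, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % m].append(v - tr)
    raw = [sum(b) / len(b) for b in buckets]
    # Normalise so the seasonal indices sum to zero over one cycle.
    offset = sum(raw) / m
    indices = [s - offset for s in raw]
    seasonal = [indices[t % m] for t in range(len(y))]
    remainder = [
        None if tr is None else v - tr - s
        for v, tr, s in zip(y, trend, seasonal)
    ]
    return trend, seasonal, remainder
```

On a series built as a straight line plus a fixed monthly pattern, this recovers the line as the trend, the monthly pattern as the seasonal component, and a remainder of zero; real data leaves a non-zero remainder like the high-variance one noted above.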
Classical decomposition with a multiplicative model was also run on the data, and the observations were similar to those of the additive model. Since the two classical methods could not distinguish between the structures, X11 seasonal adjustment was applied to understand the time series components. The X11 findings are as follows:
1. It automatically selected the additive time series structure.
2. Seasonality is strongly present, and the size of the seasonal swings does not increase with time.
The time series decomposition therefore identifies the model as additive. We now prepare to build the forecasting models. We divide the dataset into train and test sets: January 2011 to December 2018 is the training data and January 2019 onwards is the test data. Before modelling, we checked for heteroscedasticity in both the grouped data and the training data using the Box-Cox transformation, by looking at the lambda value. We get λ = 0.5925186. The transformation makes the size of the seasonal variation about the same across the whole series, which makes the forecasting model simpler; in this case it works quite well.
The ACF plot confirms the 12-month seasonality, so the dataset can be treated as having frequency 12. The autocorrelations decrease towards zero very slowly, and many lags lie outside the significance bounds. Hence, the training dataset is non-stationary (although, with seasonality present, no separate check for non-stationarity is strictly needed).
Figure 6. Residual check in reduced dataset.
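The Box-Cox transformation with the reported λ, its inverse (needed to bring forecasts back to the original scale), and the train/test split by calendar month can be sketched as follows. The report does not say how λ was estimated; we assume a routine such as `BoxCox.lambda()` from R's forecast package. The function names below are illustrative, and the formulas assume strictly positive values, which holds for crossing counts.

```python
import math

LAMBDA = 0.5925186  # value reported in the text above


def box_cox(y: list[float], lam: float) -> list[float]:
    """w_t = log(y_t) if lam == 0, else (y_t**lam - 1) / lam; requires y_t > 0."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def inv_box_cox(w: list[float], lam: float) -> list[float]:
    """Back-transform to the original scale (applied to forecasts of w)."""
    if lam == 0:
        return [math.exp(v) for v in w]
    return [(lam * v + 1) ** (1 / lam) for v in w]


def split_train_test(series, last_train=(2018, 12)):
    """series holds (year, month, value) rows; training data ends at last_train inclusive."""
    train = [row for row in series if (row[0], row[1]) <= last_train]
    test = [row for row in series if (row[0], row[1]) > last_train]
    return train, test
```

Because λ is close to 0.5, the transformation behaves roughly like a square root: large monthly totals are compressed more than small ones, which is what evens out the seasonal swings.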
Moreover, the Box-Pierce test rejects the null hypothesis of no autocorrelation, i.e., the series is not white noise. Let us now model the time series to forecast the number of people entering the US states (crossings at both borders combined).
3. Model 1: Regression with trend and seasonality
The tslm() function fits a linear regression model to time series data. To build the forecasting model, the data has been split into train and test sets, with the training set containing data points from Jan 2010 to Dec 2017 and the test set containing data points from Jan 2018 to the end. The t-values and p-values of the coefficients are of little use for judging forecast accuracy. If the predictions are close to the actual values, we would expect R² to be close to 1. On the other hand, if the predictions are unrelated to the actual values, then R² = 0 (again, assuming there is an intercept). In all cases, R² lies between 0 and 1. Here, Multiple R-squared = 0.9575 and Adjusted R-squared = 0.9504 are both near 1, meaning the values fitted by Model 1 are very close to the actual values, as can be seen in the following figure. We also check the residuals to see whether the model is good enough for prediction on the test data: we check the residuals on the training data and then check accuracy on the test data, because there is no guarantee that a model that performs well on the training data will perform as well or better on the test data.
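For reference, the regression fitted by `tslm(Value ~ trend + season)` can be written as below. This assumes January is the baseline season absorbed into the intercept, which is consistent with the season2 to season12 terms in the fitted equation; $T$ is the number of training observations and $k$ the number of predictors (the trend plus 11 seasonal dummies). The $R^2$ and adjusted $R^2$ quoted above follow the usual definitions.

```latex
y_t = \beta_0 + \beta_1 t + \sum_{j=2}^{12} \gamma_j \, d_{j,t} + \varepsilon_t,
\qquad
d_{j,t} =
\begin{cases}
1 & \text{if observation } t \text{ falls in month } j,\\
0 & \text{otherwise,}
\end{cases}
\\[6pt]
R^2 = \frac{\sum_{t} (\hat{y}_t - \bar{y})^2}{\sum_{t} (y_t - \bar{y})^2},
\qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{T - 1}{T - k - 1}
```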
Although the fitted values are very close to the original values, the residuals of the model exhibit autocorrelation. This is clear from the residual plot, and the Breusch-Godfrey test for serial correlation (p-value = 0.01292 < 0.05) rejects the null hypothesis of no autocorrelation, i.e., the residuals are not white noise.
Model call: tslm(formula = Value ~ trend + season, data = crossing.ts.train)
Fitted equation: Value' = 43948 + (-2177459) * season2 + 1797434 * season3 + 1110099 * season4 + 2456941 * season5 + 2125431 * season6 + 5482345 * season7 + 5572770 * season8 + 1241338 * season9 + 1989670 * season10 + 968427 * season11 + 2760972 * season12
Figure 7. Fitted line by TSLM model over train dataset.
There is variation in the residuals, but no sign of heteroscedasticity. The histogram shows that the residuals are slightly skewed, which may affect the coverage probability of the prediction intervals. Most lags lie outside the ACF significance bounds (±1.96/√T, where T is the length of the training series). In any case, the autocorrelation is large, which will affect the prediction intervals and thus the forecasts. Hence, we fit a second-layer ARIMA model to the residuals, to improve predictive capability by capturing the information left in them.
First, we check how much differencing is needed to make the residuals stationary. Differencing can help stabilise the mean of a time series by removing changes in its level, thereby eliminating (or reducing) trend and seasonality. A sequence of KPSS tests indicates that the appropriate number of first differences is zero, while the seasonal differencing test returns 1, indicating that one seasonal difference is required. These functions suggest one seasonal difference and no first difference, so d = 0 and D = 1. We then create the trend and seasonality variables on this basis and feed them into the ARIMA model. The model fitted by auto.arima() is ARIMA(1,0,1)(2,1,0)[12] with AICc = 1241.86. The AR and MA coefficients are less than 1, the sum of the coefficients of the two seasonal AR terms is less than one, and the sum of the coefficients of the two MA terms is also less than one. The fitted equation is:
$(1 - 0.9764B)(1 + 0.4329B^{12})(1 - B^{12})\,Y_t = (1 - 0.3645B^{12})\,e_t$
Figure 8. Residual check for TSLM model
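Two pieces of the reasoning above can be made concrete: seasonal differencing, $(1 - B^{12})y_t = y_t - y_{t-12}$, and multiplying out the backshift polynomials of the fitted equation into an ordinary difference equation. The Python sketch below uses the coefficients exactly as printed in that equation; the helper names are our own, and the sketch only does polynomial arithmetic, it does not fit a model.

```python
def seasonal_difference(y: list[float], m: int = 12) -> list[float]:
    """(1 - B^m) y_t = y_t - y_{t-m}; the first m observations are lost."""
    return [y[t] - y[t - m] for t in range(m, len(y))]


def lag_poly(terms: dict[int, float]) -> list[float]:
    """Build 1 + sum_k c_k B^k from {k: c_k}; list index = power of B."""
    coeffs = [0.0] * (max(terms, default=0) + 1)
    coeffs[0] = 1.0
    for power, c in terms.items():
        coeffs[power] += c
    return coeffs


def poly_mul(a: list[float], b: list[float]) -> list[float]:
    """Multiply two polynomials in B given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


# Left-hand side of the printed equation, coefficients as reported.
ar = lag_poly({1: -0.9764})   # (1 - 0.9764 B)
sar = lag_poly({12: 0.4329})  # (1 - (-0.4329) B^12)
sdiff = lag_poly({12: -1.0})  # (1 - B^12), the seasonal difference (D = 1)
lhs = poly_mul(poly_mul(ar, sar), sdiff)

if __name__ == "__main__":
    # Print the non-zero lag coefficients of the combined left-hand side.
    for power, c in enumerate(lhs):
        if abs(c) > 1e-12:
            print(power, round(c, 6))
```

Multiplying out gives non-zero coefficients at lags 0, 1, 12, 13, 24 and 25, so the printed equation is equivalent to $Y_t = 0.9764Y_{t-1} + 0.5671Y_{t-12} - 0.5537Y_{t-13} + 0.4329Y_{t-24} - 0.4227Y_{t-25} + e_t - 0.3645e_{t-12}$.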
The Ljung-Box test gives p-value = 0.1967 > 0.05, so we fail to reject the null hypothesis: the residuals are consistent with white noise. Lag 7 does go outside the significance bounds, but a single lag outside the bounds will not materially affect the model's predictions. This seems to be a good model: it is stationary and invertible. AICc = 1241.86, MAPE = 225.185, MASE = 0.6689771.

Model: TSLM + SARIMA

Model 2: Pure ARIMA model
Before fitting ARIMA, we check for non-stationarity and whether differencing is needed, using unit root tests. These are statistical hypothesis tests of stationarity designed to determine whether differencing is required. A number of unit root tests are available; they rest on different assumptions and may lead to conflicting answers. In our analysis, we use the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test (Kwiatkowski, Phillips, Schmidt, & Shin, 1992). In this test, the null hypothesis is that the data are stationary, and we look for evidence that the null hypothesis is false. Consequently, small p-values (e.g., less than 0.05) suggest that differencing is required.

A residual check on the training data shows that lag 1 is highly significant, followed by a negative lag 6. The autocorrelations oscillate at a fixed lag interval, indicating prominent seasonality, and several lags cross the significance bounds, so the series is not white noise. The KPSS test for level stationarity gives p-value = 0.01 < 0.05, rejecting the null, i.e., the series is non-stationary. The Augmented Dickey-Fuller test gives p-value = 0.2981, so we fail to reject its null hypothesis of a unit root, i.e., the series is again non-stationary. The tests show that one non-seasonal difference and no seasonal difference are needed to make the series stationary. We therefore fit a pure ARIMA model with d = 1 and D = 0 using auto.arima(). The model is ARIMA(0,1,0) with AICc = 1015.96, MAPE = 1.374052, MASE = 0.611111.

Figure 9. Pure Arima Model fitted over train data
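Every model in this report is judged partly by a Ljung-Box test on its residuals. The statistic is Q = T(T+2) Σ r_k²/(T−k) over lags 1..h, compared against a chi-square distribution. The pure-Python sketch below computes Q and its p-value under stated assumptions:

- The chi-square tail comes from the regularized incomplete gamma function, using the standard series and continued-fraction evaluation.
- The function names are hypothetical.
- This is not R's Box.test() code.
- `fitdf`, the number of estimated ARMA coefficients subtracted from the degrees of freedom, is left for the caller to set.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def chi2_sf(q, df):
    """P(X > q) for X ~ chi-square(df), via the regularized incomplete gamma function."""
    a, x = df / 2.0, q / 2.0
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series expansion of the lower regularized gamma P(a, x).
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the upper regularized gamma Q(a, x).
    tiny = 1e-300
    b = x + 1 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def ljung_box(resid, lag, fitdf=0):
    """Return (Q, p-value) for the Ljung-Box test over lags 1..lag."""
    if lag - fitdf <= 0:
        raise ValueError("lag must exceed fitdf")
    n = len(resid)
    r = acf(resid, lag)
    q = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    return q, chi2_sf(q, lag - fitdf)
```

A p-value above 0.05, as for the ARIMA(1,0,1)(2,1,0)[12] residuals (0.1967), means no evidence of remaining autocorrelation. For monthly data a lag such as 24 (two seasons) is a common choice; that choice is our assumption, not something the report states.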
A Ljung-Box test returns p-value = 0.001733 < 0.05, suggesting that the residuals are NOT white noise. Lag 1 is negative and significant and crosses the significance bounds; the other lags move in waves, and a single lag may not have a significant impact on forecasting. The seasonal component is not captured by the model. Comparing the model's forecasts with the original dataset, the fitted values closely follow the test-set values, so the model seems a good fit.

Next, we try to build a better model on the training data directly. To make the training data stationary, we need one non-seasonal and one seasonal difference, which makes the next pure model a seasonal ARIMA. The model is ARIMA(0,1,1)(0,1,1)[12] with AICc = 852.48, MAPE = -0.2317185, MASE = 1.103649. It has the lowest AICc value so far. The Ljung-Box test gives p-value = 0.9549, so we fail to reject the null, i.e., the residuals are white noise, which is also evident from the ACF plot. This is the best model so far. Now, we explore other models.

Model 3: Seasonal Naïve
The Naïve, Seasonal Naïve, Average and Drift forecasting techniques were applied to the data. Seasonal naïve has the lowest MAPE and MASE on the test data, i.e., it gives the best fit among these techniques. Checking the model's residuals, the following observations can be made (FIG-10):
1. ACF: all the lags are beyond the threshold limits.
2. Lag 1 is significant, and every 12th lag is higher than the rest.
3. The residuals are left-skewed, although the distribution is roughly normal.
4. There is strong autocorrelation.
The Ljung-Box test gives a highly significant p-value (< 2.2e-16) (Fig-11), indicating that the residuals are autocorrelated, contain more information and require further modelling.
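The four benchmark methods named in Model 3, and the MAPE and MASE scores used to rank every model, are simple enough to write out. The sketch below is a minimal pure-Python version under stated assumptions:

- The function names are hypothetical.
- The toy series has period m = 4 so the numbers are easy to follow.
- MASE divides the test MAE by the in-sample MAE of the seasonal naïve method. We believe this matches how forecast::accuracy() scales seasonal data, but treat that as an assumption.

```python
def naive(y, h):
    """Repeat the last observation."""
    return [y[-1]] * h


def seasonal_naive(y, h, m=12):
    """Repeat the value from the same season of the last observed cycle."""
    return [y[len(y) - m + (i % m)] for i in range(h)]


def average(y, h):
    """Repeat the mean of the training data."""
    return [sum(y) / len(y)] * h


def drift(y, h):
    """Extend the straight line joining the first and last observations."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + slope * (i + 1) for i in range(h)]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (undefined if any actual is 0)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mase(train, actual, forecast, m=12):
    """Test-set MAE scaled by the in-sample MAE of the seasonal naive method."""
    scale = sum(abs(train[t] - train[t - m]) for t in range(m, len(train))) / (len(train) - m)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


train, test = [1, 2, 3, 4, 2, 3, 4, 5], [3, 4, 5, 6]   # toy data, period m = 4
forecasts = {
    "naive": naive(train, 4),
    "seasonal naive": seasonal_naive(train, 4, m=4),
    "average": average(train, 4),
    "drift": drift(train, 4),
}
for name, fc in forecasts.items():
    print(name, round(mape(test, fc), 2), round(mase(train, test, fc, m=4), 3))
```

A MASE below 1 means the method beats the in-sample seasonal naïve benchmark on average, which is why the report compares MASE values across very different models.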
Again, we check the differencing needed to make the residuals stationary: one non-seasonal difference is needed. Fitting auto.arima with d = 1 gives ARIMA(0,1,1) with AICc = 1195.28, MAPE = 246.2098, MASE = 0.6888965. This is the model for forecasting the residuals. Its residuals are white noise: the Ljung-Box test gives p-value = 0.1834 > 0.05. To obtain forecasts on the original scale, we add the forecasts from the seasonal naïve model to the forecasted residuals from the ARIMA model, and then plot the result.
- ARIMA(0,1,1)(0,1,1)[12]: $(1-B)(1-B^{12})Y_t = (1 - 0.7723B)(1 - 0.6338B^{12})\varepsilon_t$
- ARIMA(0,1,1): $(1-B)Y_t = (1 - 0.8044B)\varepsilon_t$
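A seasonal ARIMA written in backshift notation becomes a usable recursion once the factors are multiplied out. The sketch below multiplies backshift polynomials stored as {power: coefficient} dictionaries. Its inputs are the MA coefficients −0.7723 and −0.6338 from the first equation above, read as the usual multiplicative product. The helper names are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two backshift polynomials stored as {power of B: coefficient}."""
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0.0) + a * b
    return {k: out[k] for k in sorted(out) if out[k] != 0}


def describe(poly, symbol):
    """Render sum_k c_k * symbol_{t-k} as a readable string."""
    parts = []
    for k, c in poly.items():
        term = f"{symbol}_t" if k == 0 else f"{symbol}_(t-{k})"
        parts.append(f"{c:+.4f}*{term}")
    return " ".join(parts)


ar_side = poly_mul({0: 1, 1: -1}, {0: 1, 12: -1})            # (1 - B)(1 - B^12)
ma_side = poly_mul({0: 1, 1: -0.7723}, {0: 1, 12: -0.6338})  # (1 - 0.7723B)(1 - 0.6338B^12)
print(describe(ar_side, "Y"), "=", describe(ma_side, "e"))
```

Moving the lagged Y terms to the right-hand side gives the one-step recursion:

$$Y_t = Y_{t-1} + Y_{t-12} - Y_{t-13} + \varepsilon_t - 0.7723\,\varepsilon_{t-1} - 0.6338\,\varepsilon_{t-12} + 0.4895\,\varepsilon_{t-13}$$

The cross term 0.4895 is just 0.7723 × 0.6338.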
As we can see, the fit is not as good as that of the pure ARIMA model; we compare the accuracy later. Next, let's see how smoothing parameters work for forecasting this data.

Model 4: Holt-Winters
We fit both additive and multiplicative Holt-Winters models, as both trend and seasonality are present. The output of HoltWinters() gives an estimated alpha of about 0.032 for the additive model and 0.0616 for the multiplicative model. These values are very close to zero, telling us that the forecasts are based on both recent and less recent observations (although somewhat more weight is placed on recent observations). The additive structure fits better than the multiplicative one. In the residual check, only lag 24 is significant. The Ljung-Box test for the additive model gives p-value = 0.01273 < 0.05, rejecting the null, i.e., some autocorrelation remains in the residuals. However, a single large lag will not affect the forecast significantly. We then plot the dataset with the fitted values. The residuals are approximately normally distributed, slightly skewed, with no heteroscedasticity.

The linear exponential smoothing models are all special cases of ARIMA models, whereas the nonlinear exponential smoothing models have no equivalent ARIMA counterparts. For non-stationary residuals and data, we can also explore ETS models.

Model 5: Neural network model
We model the training data with a neural network. Lag 1 of the residual ACF is significant and positive, and from lag 18 onwards all lags are negative. No heteroscedasticity is present in the residuals. The model is NNAR(2,1,2); no AICc value is computed for it, so we compare its forecast accuracy later. Let's explore ETS next, since we have seen that the residuals display non-stationarity. All ETS models are non-stationary, while some ARIMA models are stationary.
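To show what the alpha values above control, here is a minimal pure-Python sketch of additive Holt-Winters filtering and forecasting. The update equations follow the additive form given in the documentation of R's HoltWinters(). The assumptions are:

- The starting values come from a simple two-season heuristic; R derives its starting values differently.
- alpha = 0.032 comes from the report, but the beta and gamma values in the usage are hypothetical.
- The parameters are not optimised here.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, h):
    """Additive Holt-Winters filtering followed by an h-step point forecast.

    level_t = alpha*(y_t - s_{t-m}) + (1-alpha)*(level_{t-1} + trend_{t-1})
    trend_t = beta*(level_t - level_{t-1}) + (1-beta)*trend_{t-1}
    s_t     = gamma*(y_t - level_t) + (1-gamma)*s_{t-m}
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialise")
    # Heuristic starting values (an assumption, not R's initialisation).
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]   # season[t % m] = latest s for that position
    for t in range(m, len(y)):
        s_prev = season[t % m]                  # s_{t-m}
        level_prev = level
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (level - level_prev) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s_prev
    n = len(y)
    # Step k reuses the latest seasonal state for position (n - 1 + k) % m.
    return [level + k * trend + season[(n - 1 + k) % m] for k in range(1, h + 1)]


# Toy monthly-style usage: alpha from the report, beta and gamma hypothetical.
pattern = [1, -1, 2, -2]
toy = [10 + p for p in pattern] * 6
print(holt_winters_additive(toy, 4, alpha=0.032, beta=0.1, gamma=0.1, h=4))
```

With alpha this close to zero, each new observation moves the level only slightly, so the level reflects a long stretch of history.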
Model 6: ETS
ETS is a univariate time series forecasting method whose focus is on the trend and seasonal components. An ETS model is identified by a three-character string describing the nature of the components:
- The first character gives the type of error (remainder).
- The second gives the type of trend, i.e., the slope state $b_t$.
- The third gives the type of seasonality, i.e., the seasonal state $s_t$.

The level itself is tracked by the state $\ell_t$ (Sindhanuru, n.d.) (Athanasopoulos, n.d.). The ETS models with seasonality or a non-damped trend or both have two unit roots (i.e., they need two levels of differencing to make them stationary). All other ETS models have one unit root (they need one level of differencing to make them stationary). Let's fit a simple ETS model on the training dataset. The model is ETS(M,Ad,M): the error (remainder) is multiplicative, the trend is additive but damped, and the seasonality is multiplicative.
The smoothing parameter alpha = 0.1316 is close to zero. AICc = 2581.995, MAPE = 1.051038, MASE = 0.4693708. The residuals are autocorrelated: the Ljung-Box test gives p-value = 0.006745 < 0.05, rejecting the null hypothesis. As visible in the plot, only lag 24 is outside the bounds, so it may not have a significant effect on prediction. The MASE is the lowest, but the AICc value is one of the highest among the models in this project. Additionally, the residuals are normally distributed. When the fitted values are plotted on the training dataset, the fitted line closely follows the original data points and the shape of the time series.

ACCURACY
We forecast the number of individuals crossing the US border with each model we built, and then tested each model's accuracy on the test data. The performance of each model on the test data is as follows:

Accuracy on test data
Model No. | Model Name     | AICc of Model | MAPE     | MASE
1         | TSLM           | 1241.86       | 1.672851 | 0.7902939
2         | Pure ARIMA     | 852.48        | 1.570884 | 0.7342843
3         | Seasonal Naive | 1195.28       | 2.017029 | 0.9388456
4         | Holt-Winters   | 3436.111      | 1.527619 | 0.7171233
5         | Neural Network | -             | 2.309248 | 1.0874329
6         | ETS            | 2581.995      | 1.865868 | 0.8714899

The AICc value is lowest for the pure ARIMA model, which also has a low MASE and MAPE. Holt-Winters has the lowest MAPE and MASE, but the highest AICc value.

Forecasting
Let's forecast the values with the pure ARIMA model by re-training the chosen model on the whole dataset and then producing the forecast.

Simple ETS model: for ETS(M,Ad,M) the point forecast combines the final states multiplicatively,
$$\hat{y}_{T+h|T} = (\ell_T + \phi_h b_T)\, s_{T+h-12(k+1)}, \qquad \phi_h = \phi + \phi^2 + \dots + \phi^h, \quad k = \lfloor (h-1)/12 \rfloor,$$
with final level $\ell_T = 27398694.1062$, final trend $b_T = 104492.8019$ and seasonal indices 1.0273, 0.9665, 1.0007, 0.977, 1.1223, 1.1185, 1.0062, 1.0183, 0.972, 0.9953, 0.8618, 0.9341.
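The ETS(M,Ad,M) forecast equation can be evaluated directly from the final states listed above. The sketch below is a minimal pure-Python version, with these assumptions:

- The damping parameter phi is not reported, so 0.98 is a hypothetical value.
- The report's twelve seasonal indices are assumed to be ordered from the first forecast month onward. Software may list seasonal states in a different order, so check before reusing.
- The function name is hypothetical.

```python
def ets_madm_forecast(level, trend, seasonal, phi, h):
    """Point forecasts of an ETS(M,Ad,M) model from its final states.

    yhat_{T+k} = (level + phi_k * trend) * s_k, where phi_k = phi + phi^2 + ... + phi^k
    and seasonal[(k - 1) % m] is assumed to apply to forecast step k.
    """
    m = len(seasonal)
    forecasts, phi_k = [], 0.0
    for k in range(1, h + 1):
        phi_k += phi ** k               # cumulative damping factor
        forecasts.append((level + phi_k * trend) * seasonal[(k - 1) % m])
    return forecasts


final_level = 27398694.1062
final_trend = 104492.8019
indices = [1.0273, 0.9665, 1.0007, 0.977, 1.1223, 1.1185,
           1.0062, 1.0183, 0.972, 0.9953, 0.8618, 0.9341]
phi = 0.98  # hypothetical damping parameter; the report does not state phi
print([round(v) for v in ets_madm_forecast(final_level, final_trend, indices, phi, 12)])
```

Because the trend is damped, the cumulative factor phi_k grows ever more slowly, so the trend's contribution flattens out over longer horizons. The seasonal indices then scale each month up or down.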
We can see that the number of individuals increases, replicating the shape of the previous seasons. These are the forecasted numbers for the next 12 months with 80%, 90% and 95% prediction intervals. In conclusion, the pattern of influx may witness a decline compared with previous years. This could be an impact of the ongoing pandemic, of increased legal requirements for crossing the border, or of many other factors beyond the scope of this study. We can now validate the model using the data from March 2020 to December 2021 available at (Border-Crossing-Entry-Data, n.d.). The model cannot predict this movement from the earlier data because of the pandemic restrictions across borders: there is a sharp decrease in April 2020, followed by a gradual increase as restrictions were lifted. The model, however, shows similar seasonal movement throughout the pandemic, because macro-economic and political factors are not included in the forecasting model. We have learned that a proper forecast requires a comprehensive approach combining analytics, domain knowledge and expert views through judgemental forecasts.

Figure 10. Final Forecast by Pure ARIMA
Bibliography
Athanasopoulos, R. J. (n.d.). Forecasting: Principles and Practice (2nd ed.). Retrieved from otexts.com: https://otexts.com/fpp2/arima-ets.html
Border-Crossing-Entry-Data. (n.d.). Retrieved from data.bts.gov: https://data.bts.gov/Research-and-Statistics/Border-Crossing-Entry-Data/keg4-3bc2/data
Sindhanuru, H. (n.d.). Retrieved from www.latentview.com: https://www.latentview.com/idealab/exponential-smoothing-ets-framework/
time-series-analysisforecast-with-visualization. (n.d.). Retrieved from kaggle: https://www.kaggle.com/datafan07/time-series-analysisforecast-with-visualization/data
PowerPoints and R and Rmd files, and code provided by the Professor during sessions.

Figure 1. Data Table for Individuals and commercial vehicles crossing US border into states, Jan 1996 to Feb 2020
Figure 2. Autoplot of entire data, since 1996
Figure 3. Seasonality plot of entire data
Figure 4. Seasonality plot of entire data, numbers have decreased but pattern remains similar
Figure 5. Seasonality, sub series plots of reduced Dataset. Similar patterns as entire dataset
Figure 6. Residual check in reduced dataset.
Figure 7. Fitted line by TSLM model over train dataset.
Figure 8. Residual check for TSLM model
Figure 9. Pure Arima Model fitted over train data
Figure 10. Final Forecast by Pure ARIMA

Appendix
.rmd file
Knitted word document.