Executive Summary
Forecasting is essential in business: it uses the data collected over previous years to predict
what is likely to come, so that a business can plan for the future. Statistics offers several
prediction strategies, and the stronger the prediction, the more reliable the forecast and the
better the business can plan for growth. In this paper, two predictive models are applied to the
Air Passenger data and the more accurate one is used for forecasting, so that a business model
can be prepared on the basis of the forecast. Because the analysis concerns air passengers, the
model will help the airline industry to estimate the passenger level per year and, according to
the prediction, make the necessary arrangements for their clients. The data analysis is done
using R Programming, and the outcomes of the predictions are discussed in the respective
sections.
Table of Contents
Introduction
Data Overview
Exploratory Data Analysis
Reading Data
Summary of Data
Raw Data Overview
Conversion of the data to Time Series
Decompose Data
Decompose
Trend
Seasonal Components
Random Components
Predictive Model
ARIMA
Linear Regression
Comparison of Models
Recommendation
Conclusion
References
Introduction
Airline service is among the most in-demand modes of travel in the world, as passengers can
travel in comfort and reach their destination within a very short time. Apart from the airways,
other modes such as the waterways and the roadways are available. Those are suited to journeys
within a nation, but if someone wishes to travel beyond the nation, the airways are the most
feasible, as travel through the air meets no interruption (Mao & Xiao, 2019). So, for this
kind of journey, passengers prefer to travel by air, and the industry has grown to serve that
purpose. In this paper, data on airline passengers is taken as the reference upon which the
analysis is done to build the predictive model. Different models are applied to the data and,
after comparing them, the best model is accepted for forecasting (Y. Guo, 2018).
Data Overview
The data for the analysis is collected from the repository and contains two attributes, namely
Month and #Passengers. A quick look at the data shows that it contains monthly records of
passenger travel from 1949 to 1960, giving the number of passengers that travelled in each
month of that range (Y. Hirata, 2017). The passenger counts are lowest in the early years of
the data and highest in its last years. The ten highest and the ten lowest months are shown
below:
Table-1: High Numbers of passengers
Top 10-Months with High Passengers
Month #Passengers
1960-07 622
1960-08 606
1959-08 559
1959-07 548
1960-06 535
1960-09 508
1958-08 505
1958-07 491
1959-06 472
1960-05 472
Table-2: Low Numbers of Passengers
Bottom 10 Months with Low Passengers
Month #Passengers
1949-11 104
1949-01 112
1950-11 114
1950-01 115
1949-02 118
1949-12 118
1949-10 119
1949-05 121
1950-05 125
1950-02 126
So, it can be seen that the number of passengers rises markedly over the period: all ten of the
lowest months fall in 1949 and 1950, while all ten of the highest fall in 1958 to 1960. In the
next section of the paper, the Exploratory Data Analysis is done, which prepares the data for
forecasting (J. M. Amigó, 2016).
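The two tables can be reproduced with a few lines of R. The sketch below is a minimal example
that assumes R's built-in AirPassengers series (monthly totals, 1949 to 1960) holds the same
records as the data file used in this paper; the names ap, top10 and bottom10 are illustrative
only.

```r
# Assumption: the built-in AirPassengers series stands in for the paper's data file.
months <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))
ap <- data.frame(
  Month = format(months, "%Y-%m"),
  Passengers = as.numeric(AirPassengers)
)

# Sort by passenger count; order() keeps tied months in their original order.
top10 <- head(ap[order(-ap$Passengers), ], 10)
bottom10 <- head(ap[order(ap$Passengers), ], 10)

print(top10, row.names = FALSE)
print(bottom10, row.names = FALSE)
```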
Exploratory Data Analysis
Exploratory analysis explores the statistical properties of a data set. It is well known in the
fields of statistics and data analytics and hence is used here to gain insight into the data
(Hu, 2018). The data is collected and the analysis is done using the R Programming Language.
R is designed for analysing data and exploring the statistics within it, and so it is used here
for the Exploratory Data Analysis of the Air Passenger data. The subsections give an overview
of the code and a description of the outcomes (F. Karim, 2018).
Reading Data
First, the data is fetched from its location on the device and read, and the result is checked
to confirm that the read operation succeeded.
Fig-1: Reading and checking the data head
The data is read and then checked using the head function, which shows the first five rows
of the data. The outcome of the code section is as follows:
Fig-2: Data Checking
So, here the data is checked.
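The read-and-check step can be sketched as follows. The path and column names of the real file
are not given in the text, so this example makes the loud assumption that a temporary CSV
written from the built-in AirPassengers series can stand in for it; csv_path, Month and
Passengers are hypothetical names.

```r
# Assumption: a temporary CSV built from AirPassengers stands in for the real file,
# whose path and column names are not given in the paper.
csv_path <- tempfile(fileext = ".csv")
months <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))
write.csv(
  data.frame(Month = format(months, "%Y-%m"), Passengers = as.numeric(AirPassengers)),
  csv_path,
  row.names = FALSE
)

# Read the file back and inspect the first five rows.
data <- read.csv(csv_path, stringsAsFactors = FALSE)
first_rows <- head(data, 5)
print(first_rows)
```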
Summary of Data
To check the summary of the data, first the required packages are imported so that the methods
in those packages can be used for the Exploratory Data Analysis.
Fig-3: Importing necessary packages
After importing the packages, the summary of the data is checked. This can be done using
the built-in method summary(). The code snippet is shown below:
Fig-4: Checking Summary
Checking the summary gives the statistical properties of the data as the outcome.
The outcome is shown below:
Fig-5: Summary of the Data
The summary shows the statistical properties of the data. Within the data, the minimum
number of passengers is 104, in November 1949, and the maximum number of passengers is 622,
in July 1960. The average number of passengers per month is about 280 (mean), and the median
is about 265.
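A short sketch of how the summary statistics and the months of the extreme values can be
checked, again assuming that the built-in AirPassengers series matches the data used here; the
variable names are illustrative.

```r
# Assumption: the built-in AirPassengers series stands in for the paper's data.
passengers <- as.numeric(AirPassengers)
months <- format(
  seq(as.Date("1949-01-01"), by = "month", length.out = length(passengers)),
  "%Y-%m"
)

# Minimum, quartiles, median, mean and maximum in one call.
print(summary(passengers))

# Locate the months in which the extremes occur.
min_month <- months[which.min(passengers)]
max_month <- months[which.max(passengers)]
cat("Minimum:", min(passengers), "in", min_month, "\n")
cat("Maximum:", max(passengers), "in", max_month, "\n")
```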
Raw Data Overview
In this subsection the raw data is inspected. The raw data is a data frame, and the plot
function is applied to it to get an overview of the data (M. Fukino, 2016). The code section
is shown below:
Fig-6: Checking Raw Data
The outcome of the code is stored in the local directory and is shown below:
Fig-7: Outcome of the Passenger ay data
The plot shows the trend of the passengers over time. Before modelling, the data is also
checked for NaN values, because if any NaN value persists in the data, a statistical model
cannot be applied to it and the analysis cannot be done (Li, 2018).
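The plot and the NaN check can be sketched as below. This is an illustration under
assumptions: the built-in AirPassengers values stand in for the data frame column, and the
plot is written to a PDF in a temporary directory rather than to the local directory used in
the paper.

```r
# Assumption: AirPassengers values stand in for the passenger column of the data frame.
passengers <- as.numeric(AirPassengers)

# Save a line plot of the raw values to a file (hypothetical file name).
plot_path <- file.path(tempdir(), "raw_passengers.pdf")
pdf(plot_path)
plot(passengers, type = "l", xlab = "Month index", ylab = "Passengers")
invisible(dev.off())

# is.na() is TRUE for both NA and NaN, so one count covers all missing values.
n_missing <- sum(is.na(passengers))
cat("Missing values:", n_missing, "\n")
```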
Conversion of the data to Time Series
The primary objective is to analyse the passenger data with respect to time, which requires
time series data. So, first the type of the data is checked and, if it is found not to be a
time series, it is converted to one (Hu, 2018). The code snippets for these actions are as
follows:
Fig-8 (a), (b), (c): Conversion of data frame to time series
As shown in the figure, the data was initially a data frame, on which time series analysis
cannot be done directly (Li, 2018). So, the data frame is converted to a time series so that
the required operations can be performed.
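A minimal sketch of the type check and the conversion, assuming a data frame with a single
hypothetical Passengers column filled from the built-in AirPassengers values. The start date
and monthly frequency follow the 1949 to 1960 range described in the Data Overview.

```r
# Assumption: a one-column data frame stands in for the data read from file.
df <- data.frame(Passengers = as.numeric(AirPassengers))

# Check the type: a data frame is not a time series.
print(class(df))
print(is.ts(df))

# Convert to a monthly time series starting in January 1949.
ap_ts <- ts(df$Passengers, start = c(1949, 1), frequency = 12)
print(class(ap_ts))
```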
Decompose Data
In this subsection, the data, which is now a time series, is decomposed to obtain the trend
of the passengers, the seasonal component and the random component of the data. All are
described below after the decomposition (Mao & Xiao, 2019).
Decompose
Decompose is the operation through which the different components of the data can be obtained.
The decomposition is shown below:
Fig-9: Decomposing Data
Decomposing the data with the decompose() method yields three components, which are described
below with reference to the output plot produced by the code section.
Fig-10: Components of Data after decomposing
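The decomposition can be sketched as follows. The text does not say whether an additive or a
multiplicative decomposition was used, so this example assumes the decompose() default, which
is additive, and applies it to the built-in AirPassengers series.

```r
# Assumption: default (additive) decomposition of the built-in AirPassengers series.
parts <- decompose(AirPassengers)
print(names(parts))

# In the additive case the three components add back up to the observed series.
# The moving-average trend is undefined (NA) for the first and last six months.
recombined <- parts$trend + parts$seasonal + parts$random
defined <- !is.na(recombined)

# Save the four-panel plot (observed, trend, seasonal, random) to a temporary file.
pdf(file.path(tempdir(), "decomposition.pdf"))
plot(parts)
invisible(dev.off())
```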
Trend
The trend plot shown in Fig-10 describes the trend in passenger use of the airways
(Y. Guo, 2018). The trend shows that the number of passengers increases as the years
progress, which suggests that passenger numbers will continue to increase in future
years.
Seasonal Components
The seasonal component reflects the pattern in the data that repeats over a fixed period, here
each year. The plot, already shown in Fig-10, describes the ups and downs in airline
passengers within each year, which recur as the years progress (Li, 2018).
Random Components
This is the third component of the time series data. It is the remainder that is left of the
original Air Passenger series once the trend and the seasonal component have been removed,
that is, the irregular variation that neither of them explains.
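The three components can also be inspected numerically. The sketch below assumes the default
additive decomposition of the built-in AirPassengers series; the variable names are
illustrative.

```r
# Assumption: default (additive) decomposition of the built-in AirPassengers series.
parts <- decompose(AirPassengers)

# Trend: compare the first and last defined trend values.
trend <- as.numeric(parts$trend)
trend <- trend[!is.na(trend)]
trend_change <- tail(trend, 1) - head(trend, 1)

# Seasonal: one effect per calendar month, labelled with month abbreviations.
seasonal_effect <- setNames(parts$figure, month.abb)
peak_month <- names(which.max(seasonal_effect))
low_month <- names(which.min(seasonal_effect))

# Random: spread of what remains after removing trend and seasonal effects.
random_sd <- sd(parts$random, na.rm = TRUE)

cat("Trend change:", trend_change, "\n")
cat("Seasonal peak:", peak_month, " seasonal low:", low_month, "\n")
cat("SD of random component:", random_sd, "\n")
```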
Predictive Model
The time series data shows the number of passengers with respect to time. The objective, as
discussed, is to predict the airline passenger flow for future years beyond 1960. This requires
predictive analysis, with the outcome expressed as a future trend (H. Guan, 2017). In this
subsection, the prediction is done with the help of an ARIMA model and a Linear Regression
model to predict airline passenger traffic, which will in turn be helpful for business
planning. The better of the two models is determined by the Root Mean Square Error (RMSE);
other error measures could also serve as the criterion, but for this case the RMSE is
considered (Y. Guo, 2018).
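For reference, the RMSE is the square root of the mean of the squared differences between the
actual and the fitted values. A minimal sketch, with rmse as a hypothetical helper name:

```r
# Hypothetical helper: root mean square error between actual and fitted values.
rmse <- function(actual, fitted) {
  stopifnot(length(actual) == length(fitted))
  sqrt(mean((actual - fitted)^2))
}

# Small worked example: errors of 3 and 4 give sqrt((9 + 16) / 2).
example_rmse <- rmse(c(10, 20), c(13, 24))
print(example_rmse)
```

A lower RMSE means the fitted values lie closer to the observed ones, which is why it is used
here to rank the two models.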
ARIMA
The ARIMA model is fitted using the method auto.arima, which selects an ARIMA model for the
time series data, and the errors obtained by this model are then reported. The code is shown
below:
Fig-11: ARIMA Model
From the fitted ARIMA model, the model error is found, which helps to determine the model to
use for future prediction. The error related to the ARIMA model is shown below:
Fig-12: Accuracy of ARIMA Model
So, for the forecasting purpose, the RMSE of the ARIMA model is 9.888653. This value will be
compared with the RMSE of Linear Regression to find the more accurate model. The entire error
report used to compute the accuracy is:
             ME        RMSE     MAE      MPE       MAPE     MASE      ACF1
Training set 0.5256045 9.888653 7.475767 0.1057234 2.870315 0.2437531 0.01120024
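The model fitting can be sketched as below. The auto.arima and accuracy calls named in the
text come from the forecast package, so the example runs them only if that package is
installed. As a stand-in that needs only base R, it also fits a fixed seasonal
ARIMA(0,1,1)(0,1,1)[12]; that order is an assumption made for illustration, not the model
selected in the paper, so its RMSE is not expected to match the value reported above.

```r
# The calls named in the text, run only if the forecast package is available.
if (requireNamespace("forecast", quietly = TRUE)) {
  auto_fit <- forecast::auto.arima(AirPassengers)
  print(forecast::accuracy(auto_fit))
}

# Assumption: a fixed seasonal ARIMA(0,1,1)(0,1,1)[12] fitted with base R,
# used only as a stand-in for the automatically selected model.
base_fit <- arima(
  AirPassengers,
  order = c(0, 1, 1),
  seasonal = list(order = c(0, 1, 1), period = 12)
)

# Training-set RMSE from the model residuals.
rmse_arima <- sqrt(mean(residuals(base_fit)^2))
cat("Training RMSE of the stand-in model:", rmse_arima, "\n")
```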
Linear Regression
Linear Regression is the other model used in this paper to predict the passenger trend. Linear
regression simply fits the equation of a straight line to the data, and hence it is the
simplest model to fit (J. M. Amigó, 2016). Linear Regression is done in this section
with the built-in method called lm. As the data frame has been converted to a time series,
linear regression is applied to the time series data to capture the trend. The code for linear
regression is shown below:
Fig-13: Accuracy of Linear Regression
So, using the Linear Regression model, the RMSE obtained is 41.57885, which is about 4.2 times
the RMSE obtained by the ARIMA model (9.888653). The accuracy report of Linear Regression is
as follows:
             ME           RMSE     MAE      MPE      MAPE     MASE
Training set 4.274398e-16 41.57885 31.26182 -1.38991 11.87438 0.3482385
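A straight-line fit with lm can be sketched as follows. The exact regression formula used in
the paper is not shown in the text, so this example assumes the simplest choice, passengers
regressed on time, and its error values are not expected to match the report above. The names
year, lm_fit and lm_errors are illustrative.

```r
# Assumption: passengers regressed on time only; the paper's formula is not shown.
y <- as.numeric(AirPassengers)
year <- as.numeric(time(AirPassengers))
lm_fit <- lm(y ~ year)

# Training-set errors computed from the residuals.
lm_res <- residuals(lm_fit)
lm_errors <- c(
  ME = mean(lm_res),
  RMSE = sqrt(mean(lm_res^2)),
  MAE = mean(abs(lm_res))
)
print(coef(lm_fit))
print(lm_errors)
```

Because the model includes an intercept, the least-squares residuals average to zero, which
is why the ME of a linear regression is zero up to rounding error.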
Comparison of Models
In this section, two models are built: one is the ARIMA model and the other is Linear
Regression. For each, the errors are computed to assess the accuracy of the model
(Y. Hirata, 2017). The model accuracy is checked with the accuracy(model) method, which
reports the outcome in terms of the amount of error: the lower the error, the more accurate
the model (H. Guan, 2017). The forecast is done on the basis of the most accurate model. The
comparison table for the errors is shown below:
Model             ME        RMSE     MAE      MPE      MAPE     MASE
ARIMA             0.525605  9.88865  7.47577  0.10572  2.87032  0.24375
Linear Regression 4.27E-16  41.5789  31.2618  -1.3899  11.8744  0.34824
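The comparison can also be made programmatically. The sketch below uses the error values from
the accuracy reports of the two models given in the ARIMA and Linear Regression subsections;
the data frame layout and names are illustrative.

```r
# Error values copied from the two accuracy reports in this paper.
errors <- data.frame(
  Model = c("ARIMA", "Linear Regression"),
  RMSE = c(9.888653, 41.57885),
  MAE = c(7.475767, 31.26182),
  MAPE = c(2.870315, 11.87438),
  MASE = c(0.2437531, 0.3482385),
  stringsAsFactors = FALSE
)

# Pick the model with the lowest RMSE and compare the two RMSE values.
best <- errors$Model[which.min(errors$RMSE)]
rmse_ratio <- errors$RMSE[2] / errors$RMSE[1]
cat("Lowest RMSE:", best, "\n")
cat("RMSE ratio (Linear Regression / ARIMA):", round(rmse_ratio, 2), "\n")
```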
The error reports show that ARIMA has the lower error on every measure except ME, where the
value for Linear Regression is effectively zero. So, the forecast of the future trend is
expected to be more reliable if it is done using the ARIMA model (Li, 2018). So, in this
subsection, the forecasting is done using the ARIMA model.
Fig-14: Forecasting using ARIMA (Best fit) model
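A forecast beyond 1960 can be sketched with base R as below. The horizon of 24 months and the
fixed ARIMA(0,1,1)(0,1,1)[12] order are assumptions made for illustration; the paper's own
forecast is based on the model selected by auto.arima, for which the forecast package provides
a forecast() function.

```r
# Assumption: a fixed seasonal ARIMA order stands in for the paper's selected model.
fit <- arima(
  AirPassengers,
  order = c(0, 1, 1),
  seasonal = list(order = c(0, 1, 1), period = 12)
)

# Predict the next 24 months (hypothetical horizon) with standard errors.
fc <- predict(fit, n.ahead = 24)

# Approximate 95% interval from the predicted standard errors.
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se
print(cbind(forecast = fc$pred, lower = lower, upper = upper))
```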
Recommendation
This analysis is based on forecasting the air passenger data. As forecasting is a general
business need, the same modelling approach can be applied to other predictive cases such as
forecasting the stock market, traffic volumes, the weather and others (Hu, 2018). As the model
uses ARIMA, its accuracy is higher because its error is low (an RMSE of 9.888653 against
41.57885 for Linear Regression), and thus it can forecast efficiently and with less error.
Conclusion
In this paper, the data is collected and analysed using R programming. The main analytical
task addressed here is time series analysis, used to forecast passenger numbers for future
years so that the business can estimate the number of air passengers that will arrive in a
particular year (M. Fukino, 2016). All the code screenshots are given for a better
understanding of the analysis, along with screenshots of the output to check the consistency
of the analysis. Finally, the ARIMA model is chosen for its higher accuracy and the
forecasting is done on the basis of that (Y. Hirata, 2017).
References
F. Karim, S. M. H. D. S. C., 2018. LSTM fully convolutional networks for time series
classification. IEEE Access, Volume 6, pp. 1662-1669.
H. Guan, S. G. A. Z., 2017. Forecasting model based on neutrosophic logical relationship and
Jaccard similarity. Symmetry, 9(9), p. 171.
Hu, M., 2018. Detecting anomalies in time series data via a meta-feature based approach. IEEE
Access, Volume 6, pp. 27760-27776.
J. M. Amigó, Y. H. K. A., 2016. On the limits of probabilistic forecasting in nonlinear time
series analysis. Chaos Interdiscipl. J. Nonlinear Sci., Volume 26.
Li, P., 2018. Dynamic similar sub-series selection method for time series forecasting. IEEE
Access, Volume 6, pp. 32532-32542.
M. Fukino, Y. H. K. A., 2016. Coarse-graining time series data: Recurrence plot of recurrence
plots and its application for music. Chaos Interdiscipl. J. Nonlinear Sci., 26(2).
Mao, S. & Xiao, F., 2019. Time Series Forecasting Based on Complex Network Analysis. IEEE
Access, Volume 7, pp. 40220-40229.
Y. Guo, X. S. N. L. D. F., 2018. An efficient missing data prediction method based on kronecker
compressive sensing in multivariable time series. IEEE Access, Volume 6, pp. 57239-57248.
Y. Hirata, K. A., 2017. Improving time series prediction of solar irradiance after sunrise:
Comparison among three methods for time series prediction. Sol. Energy, Volume 149, pp. 294-301.