Flight Delay Prediction using Data Mining
Abstract— The airline industry is growing fast, as
air travel has become the preferred mode of transport
for many people who find it cheap and faster than
other modes. However, like other modes, it also has
disadvantages. Due to growing air traffic and many
other reasons, flights are getting delayed, causing
inconvenience to customers. Delays have cost
millions of dollars in the United States in recent
years and have also affected many transportation
companies. To address this problem, it is necessary
to find the factors causing flight delays.
Classification has proved effective in many fields
for solving different problems. Several
classification algorithms are applied here:
K-Nearest Neighbors, Decision Tree C50 and
Artificial Neural Networks. The performance of these
algorithms is compared, and Decision Tree C50 turns
out to be the best algorithm, with an overall
accuracy of 85%.
Keywords— Classification, Data Mining, K-Nearest
Neighbors, C5.0, Artificial Neural Networks.
I. INTRODUCTION
With the increase in population, the use of
vehicles and transportation is also increasing.
As a result, traffic has increased, which causes
many problems and wastes a lot of people's time.
People going to the office or to an important
meeting have trouble arriving on time. To save
time, people nowadays prefer to travel by air.
However, air traffic has also increased, which
results in flight delays. A number of factors
cause these delays, and the delays in turn affect
many things.
The Bureau of Transportation Statistics (BTS) states
that there are five major causes of flight delays:
late aircraft, weather, carrier, NAS and
security [1].
Around 18% of flights were delayed and 1.5% of
flights were cancelled in the United States in
2015. Delays cost the airline industry billions of
dollars each year in the United States (around 40
billion each year). Not only the industry but also
the customers are affected. About 800 million of a
total of 7 billion people travel in the United States
each year [2].
Many people may have an emergency and need
to reach a place as early as possible. Switching to
an alternative at such a time is very inconvenient
for a traveler who needs to arrive quickly. The
transportation industry is also badly affected and
suffers great losses. It is therefore necessary to
reduce the number of flight delays, or at least
their effects. To do so, we first need to find the
factors that cause the delays. To study the problem,
a dataset containing information about flights in
the United States has been used. This motivates us
to formulate a research question and a probable
solution for it. The research question is ‘What are
the factors that cause flight arrival delays in the
United States?’ and our objective is to find these
factors using data mining techniques. Several data
mining classification techniques have been used:
K-Nearest Neighbors (KNN), Decision Tree C50 and
Artificial Neural Networks (ANN).
These data mining techniques were applied to two
datasets containing details of flights in the US.
One covers January 2017 and the other January 2018.
The first dataset is taken from ‘Data World’ [26]
and the other from ‘Transtats’ [25]. These datasets
were combined, and the combined dataset consisted of
around a million flights with around 60 features
describing the flights and their delays. These
features fall into categories such as details of
the flights, details of the source and destination,
schedule of the flights, reasons for delays and
amount of delay:
• Details of the flights like Carrier, Tail
Number, Airline.
• Details of the source and destination like
source airport, source city, source state,
destination airport, destination city,
destination state.
• Schedule of the flights like the year, month,
day of the week, day of the month, departure
time, arrival time.
• Reasons for delays like weather delay, NAS
delay, carrier delay, security delay, late
aircraft delay.
• Amount of delay in minutes for both
departure and arrival.
• Many more details like wheels-on time,
wheels-off time, taxi-in time, taxi-out time,
whether the flight was cancelled or diverted,
distance travelled, etc.
However, many of these attributes were not taken
into consideration when answering the research
question. The main focus was on the arrival delay,
so the arrival delay in minutes was the target
variable.
II. RELATED WORK
Over the past twenty years, air travel has gained
popularity as the most convenient and least
time-consuming mode of transportation. However, the
growing number of flights also creates air traffic,
which results in flight delays. When classification
machine learning algorithms were applied, departure
delay, taxi-out time and origin of the flight were
found to have the highest importance scores [3].
Flight delays severely affect the airline industry
because they cost the industry itself, its
customers and the country's economy millions of
dollars per month. The reasons for delays range from
the macroscopic level to the microscopic level. Cost
modelling and supervised machine learning algorithms
have been applied to build a cost-sensitive
classifier that predicts flight delays. The
performance evaluation is based on the cost
ratio [4].
The Bureau of Transportation Statistics provides
airline data for the United States. It gives detailed
information on flight routes, timings, carriers,
types of delays, etc. Regression analysis with
regularization has been used on this data to predict
the flight delay in minutes. The same study also
gives a statistical description of the individual
airlines and shows which hours are the busiest [5].
Research on this dataset is still ongoing, with the
aim of advising customers about flight delays.
To analyze flight delays, we need to check every
aspect that may cause them, such as the airport, the
flight route and the airline. One of these
parameters, or a combination of them, should be taken
into consideration while analyzing the delays. To
improve prediction and recommend the best
performance evaluation, existing approaches are
grouped into five categories: statistical models,
probabilistic models, network representation,
operations research and machine learning, all of
which are used to forecast flight delays more
accurately [6].
To find out whether airport busyness matters, we
need to check which airports have the largest
number of departing and arriving flights. An SQL
business intelligence tool was used for this. The
tool also produces visuals and gives statistical
answers to questions such as whether a flight was
already delayed when it departed. This study shows
a correlation between day of the month, month and
departure delay [7]. However, this model only
produces visuals by running a clustering algorithm
on the area of interest, and the tool cannot be used
to calculate an accuracy percentage.
Two airports are connected by certain routes, which
can cause problems for on-time arrival. If one
flight is delayed on a route, the subsequent flights
on that route will also be delayed because of it.
A delayed flight can thus badly affect all the
scheduled flights on that route, setting off a
chain reaction [8]. A Bayesian network can help to
identify which factors influence the flight
delays [9].
When other important parameters are taken into
consideration, weather alone is not responsible
for delays. Some research assumes that weather
conditions and en-route flights are the most
important factors when analyzing flight delays
[10], but the other parameters are also related to
weather conditions. Delays due to weather
conditions account for 40% of total delays [11]. In
one study, historical weather data was added to
improve performance, and naïve Bayes and C4.5 were
applied to separate two classes: non-delayed, and
delayed by more than 30 minutes. Naïve Bayes was
found to perform better than C4.5 [12].
Other parameters such as time of day, day of the
week, type of hour and season might influence
flight delays. Whether the day is a weekend or a
weekday indicates how busy the airport is. To
classify and predict delays, several methods were
applied: Artificial Neural Network (ANN),
Classification and Regression Tree (CART) and Markov
Jump Linear System (MJLS). The consistency of
delays and the correlated network were analyzed to
determine the delays at the airport. The three
models gave different accuracies. ANN performed
best at classifying origin-destination pairs. In
contrast, regression on origin-destination pairs
was best fitted by the Markov Jump Linear System.
This study can help to manage air traffic [2].
Another approach uses two stages: binary
classification followed by prediction by regression.
Among the major machine learning algorithms tried,
the gradient boosting classifier and the gradient
boosting regressor gave the best results. The
model is built so that it can easily be connected
to a user interface. This interface gives
passengers prior knowledge of the delay of the
flight they are boarding [1].
Air travel is increasingly the preferred mode of
transportation. Almost all cities are
interconnected by flights, which creates air traffic
congestion. Controlling this air traffic is a
complex task because congestion plays a large part
in flight delays. To study this problem, the New
York metroplex was chosen. New York City's airports
have served more than 100 million passengers.
Multi-layer clustering is applied to find the
spatial patterns in air traffic, and a multi-way
classifier is built using random forest [13].
III. METHODOLOGY
Several methodologies can be used for carrying out
data mining, such as CRISP-DM (CRoss Industry
Standard Process for Data Mining), KDD (Knowledge
Discovery in Databases) and SEMMA (Sample, Explore,
Modify, Model and Assess).
CRISP-DM is a six-phase sequential process model
that is hierarchical and iterative and provides an
extensible framework [14]. SEMMA is also an
iterative model whose internal procedures are
run repeatedly until the goal is achieved [15]. The
two models are broadly similar but differ slightly
with respect to tasks, activities, phases, etc.
[16]. However, the method used in our
project is KDD because it is easy, complete and
more accurate. As the name suggests, ‘Knowledge
Discovery in Databases’ is a process of extracting
important and useful hidden knowledge or
information from databases or the available data.
A simple diagram describing the KDD process is
given below.
Fig. 1. KDD Process [17]
KDD is a nine-step model.
1. Understanding the domain, i.e., identifying
the target or what is to be achieved. In this
project, the target is to identify the factors
that cause flight delays, as stated in the
research question. This requires background
knowledge of the problem in order to decide
which resources can be used to solve it.
2. Selecting the subset of variables, i.e., the
resources to be used for solving the question.
For this, a dataset of the details for the flight
and the reasons for flight delays was taken
into consideration, as mentioned above. This
was required so that the discovery can be
performed on it which can help us in
identifying the target we want to achieve.
3. Pre-processing of data, i.e., dealing with
dirty data. Dirty data is harmful to work
with because it does not give accurate
results; it degrades the quality of the results
and leads to wrong conclusions. Pre-processing
includes removing noisy data, replacing missing
values, etc. In this project, rows containing
dirty data were removed and missing values
were replaced with zeros.
4. Reducing the data, i.e., considering only
those attributes that can contribute to the
target variable. Taking useless features into
account can degrade the performance of the
model or give wrong, misleading results. In this
project, attributes such as the distance group,
the date of the flight and the airline ID were
not considered because they had nothing to do
with the target variable, i.e., the amount of
time by which flights are delayed, or with the
factors affecting the delays.
Another part of this step is the
transformation of the data, i.e., converting
the data into an appropriate format.
Categorizing the data is an example
of such a transformation. Many of the
attributes were categorized. For example,
arrival delay in minutes and departure delay
in minutes were categorized as early, on-time,
late and very late. Airports were categorized
by the frequency of flights as less busy,
medium busy and high busy. Distance
travelled by a flight was categorized into short
distance, medium distance and long distance.
Days of the week were categorized into weekdays
and weekends. Each month was divided into a
first half and a second half.
Some of the data was removed. For
example, flights that were cancelled or
diverted were not taken into consideration,
because for such flights there is no question
of arriving on time or being delayed. Flights
that departed less than 5 minutes late were
assumed to have departed on time. Only the
top 4 origin airports were considered, because
categorizing all the origin airports was not
possible; other data was removed as well.
The top 4 origins were found using the
‘Tableau’ visualization tool. The result is
shown below.
Fig. 2. Origin airports
5. Selecting the data mining procedure, i.e.,
the type of model that is to be constructed or
developed. This can be of different types,
such as classification, regression, analysis or
clustering, depending on the goal of the
domain. In this project, classification was
performed. It was used to identify the factors
that most affect or cause the flight delays.
6. Data mining algorithm, i.e., the technique
that is applied to get the results, which in
this case is the classification algorithm.
K-Nearest Neighbours (KNN), Artificial Neural
Networks (ANN) and Decision Tree C50
were used. The accuracy of all the
models is calculated and the results are
compared to decide the best algorithm.
A detailed explanation of the algorithms is
given in the next section.
7. Searching for patterns, i.e., extracting the
hidden patterns present in the output of the
classification like the factors displayed, or
the trees designed, or the network created.
However, it is also required to interpret the
results from the obtained graphs.
8. Interpreting the results, i.e., understanding
the patterns and extracting important
information from the graphs, as mentioned
above. These are the final results that were
aimed for in the first step of the process. Some
of the above steps can be repeated
iteratively to get better results or to
understand them better. In this
project, the obtained results are compared to
identify the best algorithm.
9. Consolidating the knowledge, i.e.,
strengthening the output and results by
applying them in the right place or forwarding
them to the relevant area. In this project, the
acquired results can be applied in real-world
scenarios in the airline industry to
address the problem faced by passengers [18].
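Steps 3 and 4 above (cleaning, removing cancelled or
diverted flights and categorizing attributes) could be
sketched as follows. This is a minimal illustration
and not the project's own code. The field names
(`cancelled`, `diverted`, `origin`, `weather_delay`,
`nas_delay`), the weekday encoding and the 30-minute
limit for ‘very late’ are assumptions made for the
example. Only the zero-fill, the exclusion of
cancelled and diverted flights, the 5-minute
tolerance (stated in the text for departures) and the
top-4 origin selection come from the text.

```python
from collections import Counter


def clean_flights(rows, delay_fields=("weather_delay", "nas_delay")):
    """Drop cancelled/diverted flights and replace missing delays with 0.

    `rows` is a list of dicts; all field names are hypothetical.
    """
    cleaned = []
    for row in rows:
        if row.get("cancelled") or row.get("diverted"):
            continue  # no arrival delay can be defined for these flights
        fixed = dict(row)
        for field in delay_fields:
            if fixed.get(field) is None:
                fixed[field] = 0  # missing values are replaced with zeros
        cleaned.append(fixed)
    return cleaned


def delay_category(minutes, on_time_limit=5, very_late_limit=30):
    """Map a delay in minutes to early / on-time / late / very late.

    The 5-minute tolerance follows the text; the 30-minute limit is an
    assumption used only for this illustration.
    """
    if minutes < 0:
        return "early"
    if minutes < on_time_limit:
        return "on-time"
    if minutes < very_late_limit:
        return "late"
    return "very late"


def day_type(day_of_week):
    """Assumed encoding: 1 = Monday ... 7 = Sunday."""
    return "weekend" if day_of_week in (6, 7) else "weekday"


def top_origins(rows, n=4):
    """Return the n origin airports with the most flights."""
    counts = Counter(row["origin"] for row in rows)
    return [airport for airport, _ in counts.most_common(n)]
```

Keeping each transformation in its own small function
makes it easy to change a threshold and re-run the
later KDD steps, which matches the iterative nature
of the process.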
IV. EVALUATION AND RESULTS
Various classification algorithms were applied on the
dataset as mentioned above.
A. K-Nearest Neighbours
KNN is a simple supervised non-parametric
model in which an input sample is assigned to the
class that is most common among its nearest
neighbors. The nearness of the neighbors is decided
by calculating the distance between them, and the
‘K’ closest neighbors are considered [19]. The ‘K’
value should be appropriate for the model and give
the minimum error. A very small ‘K’ gives a noisy
estimate, while a larger ‘K’ gives a smoother
estimate, although too large a value can blur the
class boundaries. In this project, various ‘K’
values were tried. After testing a number of
values, we found that K=31 fitted our model
best [20]. KNN was used because it identifies the
factors that are logically nearest to the target
variable, i.e., the most influential variables.
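The majority-vote rule and the search for the ‘K’ with
the lowest error can be sketched as below. This is an
illustrative sketch only: it assumes Euclidean
distance and numeric feature tuples, and the helper
names (`knn_predict`, `error_rate`, `best_k`) are
hypothetical, since the project's implementation is
not shown.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest neighbors."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def error_rate(train_x, train_y, val_x, val_y, k):
    """Fraction of validation samples that are misclassified for a given k."""
    wrong = sum(
        knn_predict(train_x, train_y, x, k) != y
        for x, y in zip(val_x, val_y)
    )
    return wrong / len(val_y)


def best_k(train_x, train_y, val_x, val_y, candidates):
    """Return the candidate k with the lowest validation error."""
    return min(
        candidates,
        key=lambda k: error_rate(train_x, train_y, val_x, val_y, k),
    )
```

Odd values of ‘K’ are a common choice because they
reduce the chance of tied votes when there are two
classes.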
Fig. 3. K-values
From the above image, it can be seen that the
error for K=31 was minimum and so it was finalized.
Then, the confusion matrix was generated to
calculate the accuracy of the model.
Fig. 4. Confusion Matrix of KNN
From the above image, it can be seen that 29513
of the flights that arrived early were correctly
predicted, as were 82 late and 118 very late
flights. We also obtained dimensions for
the input variables. They are shown below.
Fig. 5. Dimensions of input variables
After performing these steps, the overall
accuracy and other performance measures were
calculated. The results are shown below.
Fig. 6. Performance measures of KNN
The overall accuracy was found to be 80%,
calculated as the number of correctly classified
values divided by the total number of values. A
Kappa value of 0.0531 was obtained, which is low
and indicates that the agreement with the true
classes is only slightly better than chance.
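Both measures can be computed directly from a
confusion matrix. The sketch below assumes a square
matrix given as a list of rows, with actual classes
on the rows and predicted classes on the columns. The
function names are hypothetical and the numbers used
with it are invented for illustration, not taken from
Fig. 4.

```python
def accuracy(matrix):
    """Correctly classified samples (the diagonal) divided by all samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's Kappa: agreement beyond what is expected by chance."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement: product of the marginal proportions of each class.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented example: a classifier that always predicts the majority class.
    imbalanced = [[90, 0], [10, 0]]
    print(accuracy(imbalanced), cohen_kappa(imbalanced))
```

When one class dominates, a classifier can reach a
high accuracy while its Kappa stays near zero, which
is why the two measures are usually read together.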
B. Decision Tree C50
C50 is a type of decision tree classification
in which each split is chosen to maximize the
information gain [21]. Information gain is the
reduction in entropy produced by a split, where
entropy is the negative sum, over the classes, of
the class probability times the log of that
probability [22]. C50 was used because it
helps in identifying the factors and their usage or
contribution affecting the target class. A parent
node, and above all the root node, has more
influence than its child nodes. An advantage of the
C50 algorithm is that it can be applied to any kind
of data and saves a lot of memory. Another advantage
is that it can handle numeric as well as categorized
data. In this project, we categorized some of the
factors and then applied the C50 algorithm to the
dataset. Three attributes contributed to generating
the decision tree. The usage of these three
attributes is shown below.
Fig. 7. Attribute usage in C50
As you can see, the usage of ‘NAS delay’ was
100%, and that of ‘Weather delay’ and ‘Taxi out’
was 95.93% and 95.92% respectively. A decision
tree was formed consisting of these three attributes.
The decision tree is shown below.
Fig. 8. Decision tree C50
After this, predictions were made and the
error of the model was calculated. An error
rate of 15% was obtained, i.e., it was 85% accurate.
This result is shown in the image below.
Fig. 9. Error rate of C50
The classification of the data is also shown in
the above image and it can be observed that it has
performed much better than the KNN algorithm.
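The entropy-based splitting criterion described in
this subsection can be sketched as follows. This is a
minimal illustration of entropy and information gain
only. The function names are hypothetical, the labels
are invented, and the actual C50 implementation
includes further refinements (for example pruning and
boosting) that are not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(labels, groups):
    """Entropy of `labels` minus the weighted entropy after a split.

    `groups` is the list of label lists produced by splitting `labels`
    on one attribute; together they must contain the same labels.
    """
    total = len(labels)
    remainder = sum(len(group) / total * entropy(group) for group in groups)
    return entropy(labels) - remainder
```

An attribute that separates the classes perfectly
has a gain equal to the entropy of the parent node,
while an attribute that leaves the class proportions
unchanged has a gain of zero. The tree places the
attribute with the highest gain nearest the root.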
C. Artificial Neural Networks
An Artificial Neural Network processes
information in a way similar to the human
brain [23]. It need not be manually programmed but
learns from past experience [24]. An ANN consists of
several neurons arranged in three kinds of layers:
input layer, hidden layer and output layer. Each
connection between neurons is assigned a weight.
Each neuron adds up its weighted inputs and the
result is passed on to the next layer. In this
project, a total of 12 inputs, 5 hidden layers and
1 output layer were selected, and the accuracy was
found to be 79%. The accuracy was also checked with
2 hidden layers, which gave 77%, and so 5 hidden
layers were selected. The network and accuracy for
5 hidden layers are shown below.
Fig. 10. Neural Network with 5 hidden layers
Fig. 11. Accuracy of ANN
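The weighted-sum computation performed by each layer
can be sketched as a forward pass. This is an
assumption-laden illustration: the sigmoid
activation, the bias terms and the layer
representation are choices made for the example, and
the function names are hypothetical. The text does
not state which activation function or library was
used, and training of the weights is not shown.

```python
import math


def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute one layer's outputs.

    weights[j][i] is the weight on the connection from input i to
    neuron j; biases[j] is the bias of neuron j.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass `inputs` through a list of (weights, biases) layers in order."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations
```

During training, the weights and biases are adjusted
so that the output for known flights moves closer to
their recorded delay class.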
V. CONCLUSION AND FUTURE WORK
In this project, different classification algorithms
(KNN, C50 and ANN) were implemented to predict
flight delay. The results of these algorithms were
compared and C50 was found to be the best one,
with an accuracy of 85%. Many factors
cause flight delays. The C50 algorithm
showed that NAS delay, Weather delay and Taxi-out
were the features causing flight delay. These models
can be applied in real-world scenarios to
bring improvements to the airline industry. In
future, we can try to improve the prediction model
to gain higher accuracy. Further analysis can
identify the airline companies in which delays
occur the most. The times of year at which delays
occur can also be identified by combining the data
with a weather-related dataset.
REFERENCES
[1] R. J. Hansman, “Identification, Characterization,
and Prediction of Traffic Flow Patterns in
Multi-Airport Systems,” pp. 1–14, 2018.
[2] M. Baluch and T. Bergstra, “Complex Analysis of
United States Flight Data Using a Data Mining
Approach,” pp. 1–6, 2017.
[3] F. Bus, “Application of Machine Learning
Algorithms to Predict Flight Arrival Delays,” vol.
00, pp. 3992–3997, 2015.
[4] N. E. Md Isa, A. Amir, M. Z. Ilyas, and M. S.
Razalli, “The Performance Analysis of K-Nearest
Neighbors (K-NN) Algorithm for Motor Imagery
Classification Based on EEG Signal,” MATEC
Web Conf., vol. 140, p. 01024, 2017.
[5] M. S. B. Maind, “Research Paper on Basic of
Artificial Neural Network,” Int. J. Recent Innov.
Trends Comput. Commun., vol. 2, no. 1, pp. 96–
100, 2014.
[6] S. Choi, Y. J. Kim, S. Briceno, and D. Mavris,
“Prediction of weather-induced airline delays
based on machine learning algorithms,”
AIAA/IEEE Digit. Avion. Syst. Conf. - Proc., vol.
2016–December, pp. 1–6, 2016.
[7] Y. Ding, “Predicting flight delay based on
multiple linear regression,” 2017.
[8] M. Balamurugan and S. Kannan, “Performance
Analysis of Cart and C5.0 using Sampling
Techniques,” 2016 IEEE Int. Conf. Adv. Comput.
Appl., pp. 72–75, 2016.
[9] G. Costagliola, V. Fuccella, M. Giordano, and G.
Polese, “Monitoring online tests through data
visualization,” IEEE Trans. Knowl. Data Eng.,
vol. 21, no. 6, pp. 773–784, 2009.
[10] A. Guerra-hern, “Explorations of the BDI Multi-
Agent support for the Knowledge Discovery in
Databases Process,” no. January, 2008.
[11] O. Niakšu, “CRISP Data Mining Methodology
Extension for Medical Domain,” Balt. J. Mod.
Comput., vol. 3, no. 2, pp. 92–109, 2015.
[12] S. Choi, Y. J. Kim, S. Briceno, and D. Mavris,
“Cost-sensitive prediction of airline delays using
machine learning,” AIAA/IEEE Digit. Avion. Syst.
Conf. - Proc., vol. 2017–September, 2017.
[13] P. N. Patil, R. Lathi, and V. Chitre, “Comparison
of C5.0 & CART Classification algorithms using
pruning technique,” Int. J. Eng. Res. Technol.,
vol. 1, no. 4, pp. 1–5, 2012.
[14] N. Kuhn and N. Jamadagni, “Application of
Machine Learning Algorithms to Predict Flight
Arrival Delays,” pp. 1–6, 2017.
[15] S. B. Imandoust and M. Bolandraftar,
“Application of K-Nearest Neighbor (KNN)
Approach for Predicting Economic Events:
Theoretical Background,” Int. J. Eng. Res. Appl.,
vol. 3, no. 5, pp. 605–610, 2013.
[16] Q. Li, W. Lei, F. Rong, W. Bin, and X. Hei, “An
analysis method for flight delays based on
Bayesian network,” Proc. 2015 27th Chinese
Control Decis. Conf. CCDC 2015, pp. 2561–
2565, 2015.
[17] P. Chandraa, N. Prabakaran, and R. Kannadasan,
“Airline delay predictions using supervised
machine learning,” Int. J. Pure Appl. Math., vol.
119, no. Special Issue 7A, 2018.
[18] A. Sternberg, J. Soares, D. Carvalho, and E.
Ogasawara, “A Review on Flight Delay
Prediction,” pp. 1–21, 2017.
[19] B. Thiagarajan, L. Srinivasan, A. V. Sharma, D.
Sreekanthan, and V. Vijayaraghavan, “A machine
learning approach for prediction of on-time
performance of flights,” AIAA/IEEE Digit. Avion.
Syst. Conf. - Proc., vol. 2017–September, 2017.
[20] Y. J. Kim, S. Choi, S. Briceno, and D. Mavris, “A
deep learning approach to flight delay prediction,”
AIAA/IEEE Digit. Avion. Syst. Conf. - Proc., vol.
2016–December, pp. 1–6, 2016.
[21] V. Sharma, S. Rai, and A. Dev, “A
Comprehensive Study of Artificial Neural
Networks,” Int. J. Adv. Res. Comput. Sci. Softw.
Eng., vol. 2, no. 10, pp. 278–284, 2012.
[22] U. Shafique and H. Qaiser, “A Comparative Study
of Data Mining Process Models (KDD, CRISP-DM
and SEMMA),” Int. J. Innov. Sci. Res., vol.
12, no. 1, pp. 217–222, 2014.
[23] H. Jair et al., “A comparative between CRISP-
DM and SEMMA through the construction of a
MODIS repository for studies of land use and
cover change,” Adv. Sci. Technol. Eng. Syst. J.,
vol. 2, no. 3, pp. 598–604, 2017.
[24] K. Gopalakrishnan and H. Balakrishnan, “A
Comparative Analysis of Models for Predicting
Delays in Air Traffic Networks,” Eur. Air Traffic
Manag. Res. Dev. Semin., 2017.
[25] Transtats.bts.gov. (2018). OST_R | BTS | Transtats.
[online] Available at:
https://www.transtats.bts.gov/DL_SelectFields.asp
?Table_ID=236 [Accessed 3 Aug. 2018].
[26] Data.world. (2018). data.world. [online] Available
at: https://data.world/hoytick/2017-jan-
ontimeflightdata-usa [Accessed 3 Aug. 2018].

The Internet of Flying Things - OverviewThe Internet of Flying Things - Overview
The Internet of Flying Things - OverviewMichael Denis
 
ITA-software-travel-complexity.pdf
ITA-software-travel-complexity.pdfITA-software-travel-complexity.pdf
ITA-software-travel-complexity.pdfmustafe39
 
Predicting Flight Delays with Error Calculation using Machine Learned Classif...
Predicting Flight Delays with Error Calculation using Machine Learned Classif...Predicting Flight Delays with Error Calculation using Machine Learned Classif...
Predicting Flight Delays with Error Calculation using Machine Learned Classif...IRJET Journal
 
R Michaels - Analysis of the Effects of Automation for GMC
R Michaels - Analysis of the Effects of Automation for GMCR Michaels - Analysis of the Effects of Automation for GMC
R Michaels - Analysis of the Effects of Automation for GMCRobert Michaels
 
Jeff Poole, CANSO, SITA Europe Aviation ICT Forum 2013
Jeff Poole, CANSO, SITA Europe Aviation ICT Forum 2013Jeff Poole, CANSO, SITA Europe Aviation ICT Forum 2013
Jeff Poole, CANSO, SITA Europe Aviation ICT Forum 2013SITA
 
INFORMS AAS Newsletter Spring 2013 - Copy
INFORMS AAS Newsletter Spring 2013 - CopyINFORMS AAS Newsletter Spring 2013 - Copy
INFORMS AAS Newsletter Spring 2013 - CopyBenjamin Levy
 
Application of Data Science in the Airline industry
Application of Data Science in the Airline industryApplication of Data Science in the Airline industry
Application of Data Science in the Airline industryEshaNair4
 
IRJET- Traffic Prediction Techniques: Comprehensive analysis
IRJET- Traffic Prediction Techniques: Comprehensive analysisIRJET- Traffic Prediction Techniques: Comprehensive analysis
IRJET- Traffic Prediction Techniques: Comprehensive analysisIRJET Journal
 
A statistical approach to predict flight delay
A statistical approach to predict flight delayA statistical approach to predict flight delay
A statistical approach to predict flight delayiDTechTechnologies
 
IRJET- Accident Information Mining and Insurance Dispute Resolution
IRJET- Accident Information Mining and Insurance Dispute ResolutionIRJET- Accident Information Mining and Insurance Dispute Resolution
IRJET- Accident Information Mining and Insurance Dispute ResolutionIRJET Journal
 
20558-38937-1-PB.pdf
20558-38937-1-PB.pdf20558-38937-1-PB.pdf
20558-38937-1-PB.pdfIjictTeam
 
Airport information systems_airside_mana
Airport information systems_airside_manaAirport information systems_airside_mana
Airport information systems_airside_manamvks rao
 

Similar to Flight delay detection data mining project (20)

PRESENTATION ON CHALLENGE lab_084627 (1).pptx
PRESENTATION ON CHALLENGE lab_084627 (1).pptxPRESENTATION ON CHALLENGE lab_084627 (1).pptx
PRESENTATION ON CHALLENGE lab_084627 (1).pptx
 
A Review On Flight Delay Prediction
A Review On Flight Delay PredictionA Review On Flight Delay Prediction
A Review On Flight Delay Prediction
 
MACHINE LEARNING TECHNIQUES FOR ANALYSIS OF EGYPTIAN FLIGHT DELAY
MACHINE LEARNING TECHNIQUES FOR ANALYSIS OF EGYPTIAN FLIGHT DELAYMACHINE LEARNING TECHNIQUES FOR ANALYSIS OF EGYPTIAN FLIGHT DELAY
MACHINE LEARNING TECHNIQUES FOR ANALYSIS OF EGYPTIAN FLIGHT DELAY
 
Aircraft Ticket Price Prediction Using Machine Learning
Aircraft Ticket Price Prediction Using Machine LearningAircraft Ticket Price Prediction Using Machine Learning
Aircraft Ticket Price Prediction Using Machine Learning
 
Data Warehousing Fuses With Data Visualization To Solve Key Problems of Enter...
Data Warehousing Fuses With Data Visualization To Solve Key Problems of Enter...Data Warehousing Fuses With Data Visualization To Solve Key Problems of Enter...
Data Warehousing Fuses With Data Visualization To Solve Key Problems of Enter...
 
Airline operational management
Airline operational managementAirline operational management
Airline operational management
 
Random Forest Ensemble learning algorithm for Engineering Analytics Project
Random Forest Ensemble learning algorithm for Engineering Analytics ProjectRandom Forest Ensemble learning algorithm for Engineering Analytics Project
Random Forest Ensemble learning algorithm for Engineering Analytics Project
 
The Internet of Flying Things - Overview
The Internet of Flying Things - OverviewThe Internet of Flying Things - Overview
The Internet of Flying Things - Overview
 
ITA-software-travel-complexity.pdf
ITA-software-travel-complexity.pdfITA-software-travel-complexity.pdf
ITA-software-travel-complexity.pdf
 
Predicting Flight Delays with Error Calculation using Machine Learned Classif...
Predicting Flight Delays with Error Calculation using Machine Learned Classif...Predicting Flight Delays with Error Calculation using Machine Learned Classif...
Predicting Flight Delays with Error Calculation using Machine Learned Classif...
 
R Michaels - Analysis of the Effects of Automation for GMC
R Michaels - Analysis of the Effects of Automation for GMCR Michaels - Analysis of the Effects of Automation for GMC
R Michaels - Analysis of the Effects of Automation for GMC
 
Jeff Poole, CANSO, SITA Europe Aviation ICT Forum 2013
Jeff Poole, CANSO, SITA Europe Aviation ICT Forum 2013Jeff Poole, CANSO, SITA Europe Aviation ICT Forum 2013
Jeff Poole, CANSO, SITA Europe Aviation ICT Forum 2013
 
INFORMS AAS Newsletter Spring 2013 - Copy
INFORMS AAS Newsletter Spring 2013 - CopyINFORMS AAS Newsletter Spring 2013 - Copy
INFORMS AAS Newsletter Spring 2013 - Copy
 
Application of Data Science in the Airline industry
Application of Data Science in the Airline industryApplication of Data Science in the Airline industry
Application of Data Science in the Airline industry
 
IRJET- Traffic Prediction Techniques: Comprehensive analysis
IRJET- Traffic Prediction Techniques: Comprehensive analysisIRJET- Traffic Prediction Techniques: Comprehensive analysis
IRJET- Traffic Prediction Techniques: Comprehensive analysis
 
A statistical approach to predict flight delay
A statistical approach to predict flight delayA statistical approach to predict flight delay
A statistical approach to predict flight delay
 
MarketIS Pres
MarketIS PresMarketIS Pres
MarketIS Pres
 
IRJET- Accident Information Mining and Insurance Dispute Resolution
IRJET- Accident Information Mining and Insurance Dispute ResolutionIRJET- Accident Information Mining and Insurance Dispute Resolution
IRJET- Accident Information Mining and Insurance Dispute Resolution
 
20558-38937-1-PB.pdf
20558-38937-1-PB.pdf20558-38937-1-PB.pdf
20558-38937-1-PB.pdf
 
Airport information systems_airside_mana
Airport information systems_airside_manaAirport information systems_airside_mana
Airport information systems_airside_mana
 

Recently uploaded

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 

Recently uploaded (20)

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 

Flight delay detection data mining project

Flight Delay Prediction using Data Mining

Abstract— The airline industry is growing fast because air travel has become the preferred mode of transport for most people, who find it cheap and faster than other modes. However, like other modes, it has disadvantages. Growing traffic in the airline industry, among many other reasons, causes flights to be delayed, which inconveniences customers. Delays have cost millions of dollars in the United States in recent years and have also affected many transportation companies. To address this problem, it is necessary to identify the factors that cause flight delays. Classification has proved effective in many fields for solving different problems. Several classification algorithms are applied here: K-Nearest Neighbors, Decision Tree C5.0 and Artificial Neural Networks. Their performance is compared, and Decision Tree C5.0 turns out to be the best algorithm, with an overall accuracy of 85%.

Keywords— Classification, Data Mining, K-Nearest Neighbors, C5.0, Artificial Neural Networks.

I. INTRODUCTION

As the population grows, the use of vehicles and transportation also increases. The resulting traffic causes many problems and wastes many people's time. People travelling to the office or to an important meeting struggle to arrive on time. To save time, people now prefer to travel by air. However, air traffic has also increased, which results in flight delays. Flights are delayed for a number of reasons, and the delays affect many things in turn.

The Bureau of Transportation Statistics (BTS) states that there are five major causes of flight delays: late aircraft, weather, carrier, NAS and security [1]. Around 18% of flights were delayed and 1.5% of flights were cancelled in the United States in 2015.
Delays cost the airline industry in the United States billions of dollars each year (around 40 billion each year). Customers are affected as well as the industry. Of a total of 7 billion people, 800 million travel in the United States each year [2]. Many of them may have an emergency and need to reach their destination as early as possible, and switching to an alternative at such a time is very inconvenient. The transportation industry is also badly affected and suffers great losses. It is therefore necessary to reduce the number of flight delays, or at least their effects. To do so, we first need to identify the factors that cause flights to be delayed.

This motivates our research question and a probable solution to it. The research question is 'What are the factors that cause flight arrival delays in the United States?' and our objective is to find these factors using data mining techniques. Three data mining classification techniques are used: K-Nearest Neighbors (KNN), Decision Tree C5.0 and Artificial Neural Networks (ANN). They were applied to two datasets containing details of flights in the United States, one for January 2017 and the other for January 2018. The first dataset was taken from 'Data World' and the other from 'Transtats' [25][26]. The two datasets were combined; the combined dataset consisted of around a million flights and around 60 features describing the flights and their delays. These features fall into the following categories:

• Details of the flight, such as carrier, tail number and airline.
• Details of the source and destination, such as source airport, source city, source state, destination airport, destination city and destination state.
• Schedule of the flight, such as year, month, day of the week, day of the month, departure time and arrival time.
• Reasons for delay, such as weather delay, NAS delay, carrier delay, security delay and late aircraft delay.
• Amount of delay in minutes, for both departure and arrival.
• Further details such as wheels-on time, wheels-off time, taxi-in time, taxi-out time, whether the flight was cancelled or diverted, distance travelled, etc.

Many more attributes were available but were not considered for the research question. The main focus was on arrival delay, so the arrival delay in minutes was the target variable.
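The dataset preparation described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the field names (`carrier`, `origin`, `arr_delay_minutes`) and the function names are hypothetical, since the text does not list the exact column names or the tooling that was used.

```python
from typing import Dict, List, Tuple

# One flight record; keys are hypothetical stand-ins for the ~60 real features.
Flight = Dict[str, object]


def combine_months(*months: List[Flight]) -> List[Flight]:
    """Concatenate several monthly extracts into one list of flight records."""
    combined: List[Flight] = []
    for month in months:
        combined.extend(month)
    return combined


def split_target(
    flights: List[Flight], target: str = "arr_delay_minutes"
) -> Tuple[List[Flight], List[object]]:
    """Separate the target column (arrival delay in minutes) from the features."""
    features = [{k: v for k, v in f.items() if k != target} for f in flights]
    targets = [f[target] for f in flights]
    return features, targets


# Tiny made-up records standing in for the January 2017 and January 2018 data.
jan_2017 = [{"carrier": "AA", "origin": "ATL", "arr_delay_minutes": 12.0}]
jan_2018 = [{"carrier": "DL", "origin": "ORD", "arr_delay_minutes": 0.0}]

flights = combine_months(jan_2017, jan_2018)
features, targets = split_target(flights)
```

Keeping the target separate from the features at this stage makes it harder to leak the arrival delay into the inputs of a classifier by accident.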
II. RELATED WORK

Over the past twenty years, air travel has gained popularity because it is the most convenient and least time-consuming mode of transportation. But the growing number of flights also creates air traffic, which results in flight delays. When a classification machine learning algorithm was applied, departure delay, taxi-out time and origin of the flight were found to have the highest importance scores [3]. Flight delays severely affect the airline industry because they cost the industry itself, customers and the national economy millions of dollars per month. The reasons for delays range from the macroscopic to the microscopic level. Costing and a supervised machine learning algorithm have been applied to find a cost-sensitive classifier and to predict flight delays, with performance evaluated on the basis of cost ratio [4]. The Bureau of Transportation Statistics provides airline data for the United States, with detailed information on flight routes, timings, carriers, types of delay, etc. Regression analysis with regularization has been used on this data to predict flight delay in minutes; the same study also gives a statistical description of individual airlines and shows which hours are the busiest [5]. Research on this dataset is still ongoing, with the aim of advising customers about flight delays.

To analyze flight delays, every aspect that may cause them needs to be checked, such as the airport, the flight route and the airline. One of these parameters, or different combinations of them, should be taken into consideration when analyzing delays. To improve prediction and recommend the best performance evaluation, the approaches are grouped into five categories: statistical models, probabilistic models, network representation, operations research and machine learning, all used to forecast flight delays more accurately [6].

To find out whether airport busyness matters, we need to check which airports have the largest number of departing and arriving flights. An SQL business intelligence tool was used for this. The tool also presents visuals and gives statistical answers to questions such as whether a flight was delayed when it departed. This study shows a correlation between day of the month, month and departure delay [7]. However, this model produces its visuals by running a clustering algorithm on the area of interest, and the tool cannot be used to calculate an accuracy percentage.

Two airports are connected by certain routes, which can cause problems for on-time arrival. If one flight is delayed on a route, the flights that follow it are delayed as well; the delayed flight can badly affect all the flights scheduled on that route and start a chain reaction [8]. A Bayesian network can help identify which factors influence flight delays [9]. When other important parameters are taken into consideration, weather alone is not responsible for delays. Some research assumes that weather conditions and en-route flights are the most important factors when analyzing flight delays [10]. But the other parameters are fairly closely related to weather conditions. Delays due to weather account for 40% of total delays [11]. Historical weather data has been added to improve performance. Naïve Bayes and C4.5 were applied to classify flights into two classes, non-delayed and delayed by more than 30 minutes, and naïve Bayes was found to perform better than C4.5 [12]. Other parameters, such as time of day, day of the week, type of hour and season, might also influence flight delays. Whether the day falls on a weekend or a weekday indicates how busy the airport is.
To classify and predict this, several models were applied: Artificial Neural Network (ANN), Classification and Regression Tree (CART) and Markov Jump Linear System (MJLS). The consistency of delays and the correlated network were analyzed to determine delays at the airport. The three models gave different accuracies. ANN performed best at classifying origin-destination pairs. In contrast, regression on origin-destination pairs was best fitted by the Markov Jump Linear System. This study can help manage air traffic [2].

In another approach, two stages are created: binary classification, followed by prediction by regression. Among the major machine learning algorithms tried, the gradient boosting classifier and the gradient boosting regressor gave the best results. The model is built so that it can easily be connected to a user interface, which gives passengers prior knowledge of the delay of the flight they are boarding [1].

Air travel is increasingly the preferred mode of transportation. Almost all cities are interconnected by flights, which creates air traffic congestion. Controlling this air traffic is a complex task because congestion contributes greatly to flight delays. To study this problem, the New York metroplex was chosen. New York City's airports have served more than 100 million passengers. Multi-layer clustering was applied to find spatial patterns in air traffic, and a multi-way classifier was built using random forest [13].
III. METHODOLOGY

Several methodologies can be used to carry out data mining, such as CRISP-DM (Cross-Industry Standard Process for Data Mining), KDD (Knowledge Discovery in Databases) and SEMMA (Sample, Explore, Modify, Model and Assess). CRISP-DM is a six-phase sequential process model that is hierarchical and iterative and provides an extendable framework [14]. SEMMA is also an iterative model, in which the internal procedures are run repeatedly until the goal is achieved [15]. The two models are broadly similar but differ slightly in tasks, activities, phases, etc. [16]. However, the method used in our project is KDD, because it is easy, complete and more accurate. As the name suggests, 'Knowledge Discovery in Databases' is a process of extracting important and useful hidden knowledge or information from databases or other available data. A simple diagram describing the KDD process is given below.

Fig. 1. KDD Process [17]

KDD is a nine-step model.

1. Understanding the domain, i.e., identifying the target or what is to be achieved. In this project, the target is to identify the factors that cause flight delays, as stated in the research problem. Background knowledge of the problem is needed to decide which resources can be used to solve it.

2. Selecting the subset of variables, i.e., the resources to be used to answer the question. For this, a dataset containing flight details and the reasons for flight delays was used, as mentioned above. Discovery is performed on this dataset to help identify the target we want to achieve.

3. Pre-processing of data, i.e., dealing with dirty data. Dirty data is harmful to work with because it does not give accurate results; it degrades the quality of the results and leads to wrong conclusions.
Pre-processing includes removing noisy data, replacing missing values, etc. In this project, rows containing dirty data were removed and missing values were replaced with zeros.

4. Reducing the data, i.e., considering only those attributes that can contribute to the target variable. Including useless features can degrade the performance of the model or give inappropriate or wrong results. In this project, attributes such as the distance group, the date of the flight and the airline ID were not considered because they had nothing to do with the target variable, i.e., the amount of time by which flights are delayed, or with the factors affecting the delays. Another part of this step is the transformation of the data, i.e., converting it into an appropriate format. Categorizing data is one example of transformation. Many of the attributes were categorized: arrival delay and departure delay in minutes were categorized as early, on-time, late and very late. Airports were categorized by flight frequency as less busy, medium busy and high busy. Distance travelled was categorized as short, medium and long distance. Days of the week were categorized as weekdays and weekends. The month was categorized into first half and second half. Some of the data was removed. For example, cancelled flights were not considered, and neither were diverted flights, because a cancelled or diverted flight can be neither on time nor delayed on arrival. Flights that departed less than 5 minutes late were assumed to have departed on time. Only the top 4 origin airports were considered, because categorizing all origin airports was not possible, and further data was removed as well. The top 4 origins were found using the 'Tableau' visualization tool. The result is shown below.

Fig. 2. Origin airports

5. Selecting the data mining procedure, i.e., the type of model to be constructed. This can be classification, regression, analysis, clustering, etc., depending on the goal of the domain. In this project, classification was performed to identify the factors that most affect or cause flight delays.

6. Choosing the data mining algorithm, i.e., the technique to be applied to obtain the results, which in this case is the classification algorithm. K-Nearest Neighbors (KNN), Artificial Neural Networks (ANN) and Decision Tree C5.0 were used. The accuracy of every model is calculated and the results are compared to decide on the best algorithm. The algorithms are explained in detail in the next section.

7. Searching for patterns, i.e., extracting the hidden patterns present in the output of the classification, such as the factors displayed, the trees built or the network created. The results also need to be interpreted from the graphs obtained.

8. Interpreting the results, i.e., understanding the patterns and extracting important information from the graphs, as mentioned above. These are the final results that were targeted in the first step of the process. Some of the steps above can be repeated iteratively to obtain better results or to understand them better.
In this project, the obtained results are compared to identify the best algorithm among them.

9. Consolidating the knowledge, i.e., putting the results to use by applying them in the right place or forwarding them to the relevant area. In this project, the acquired results can be applied in real-world scenarios in the airline industry to help address the delays faced by passengers [18].

IV. EVALUATION AND RESULTS

Various classification algorithms were applied to the dataset, as mentioned above.

A. K-Nearest Neighbours

KNN is a simple supervised non-parametric model in which an input sample is assigned to the class that is most common among its nearest neighbors. The nearness of the neighbors is decided by calculating the distance between the sample and the training records, and the ‘K’ closest records are taken as the neighbors [19]. The ‘K’ value should be chosen so that it suits the model and gives the minimum error. A small ‘K’ makes the estimate sensitive to noise, while a larger ‘K’ gives a smoother estimate, although a value that is too large can blur the boundaries between classes. In this project, various ‘K’ values were tried. After testing a number of values, we found that K=31 fitted our model best [20]. KNN was used because it classifies each flight according to the records that are closest to it in terms of the input variables, i.e., the variables that most affect the outcome.
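The selection of ‘K’ by minimum error can be sketched in a few lines. The following is a minimal illustration in Python, not the code used in this project. The toy records, the two features (departure delay and taxi-out time in minutes), the labels and the helper names knn_predict, error_rate and best_k are all hypothetical, and Euclidean distance is assumed because the text does not name the distance measure.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest training records."""
    # Pair every training record's Euclidean distance to x with its label,
    # then sort so the closest records come first.
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def error_rate(train_X, train_y, val_X, val_y, k):
    """Fraction of validation records that are misclassified for a given k."""
    wrong = sum(
        knn_predict(train_X, train_y, x, k) != y for x, y in zip(val_X, val_y)
    )
    return wrong / len(val_y)


def best_k(train_X, train_y, val_X, val_y, candidates):
    """Return the candidate k with the lowest validation error."""
    return min(
        candidates,
        key=lambda k: error_rate(train_X, train_y, val_X, val_y, k),
    )


# Hypothetical toy data: (departure delay, taxi-out time), both in minutes.
# The record (3, 13) is deliberately mislabeled to act as noise.
train_X = [(0, 10), (2, 12), (1, 11), (3, 13), (30, 25), (35, 30), (40, 28)]
train_y = ["on-time", "on-time", "on-time", "late", "late", "late", "late"]
val_X = [(4, 13), (33, 27)]
val_y = ["on-time", "late"]

for k in (1, 3, 5):
    print(k, error_rate(train_X, train_y, val_X, val_y, k))
print("best k:", best_k(train_X, train_y, val_X, val_y, [1, 3, 5]))
```

On this toy data the single mislabeled record makes K=1 misclassify one validation point, while K=3 outvotes it. This mirrors the idea of trying several ‘K’ values and keeping the one with the lowest error.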
Fig. 3. K-values

From the above image, it can be seen that the error was minimum for K=31, and so this value was finalized. Then, the confusion matrix was generated to calculate the accuracy of the model.

Fig. 4. Confusion Matrix of KNN

From the above image, it can be seen that 29513 of the flights that arrived early were correctly predicted, while 82 late flights and 118 very late flights were correctly predicted. We also obtained the dimensions of the input variables. They are shown below.

Fig. 5. Dimensions of input variables

After performing these steps, the overall accuracy and other performance measures were calculated. The results are shown below.

Fig. 6. Performance measures of KNN

The overall accuracy was found to be 80%, calculated as the number of correctly classified values divided by the total number of values. A Kappa value of 0.0531 was obtained. Kappa measures the agreement between predicted and actual classes beyond what would be expected by chance, so a value this close to zero is low and suggests that much of the 80% accuracy comes from correctly predicting the most frequent class.

B. Decision Tree C50

C50 is a type of decision tree classification in which each split is chosen to maximize the information gain [21]. Information gain is the reduction in entropy produced by a split, where entropy is obtained by multiplying the probability of each class by the log of that probability, summing over the classes and negating the result [22]. C50 was used because it helps in identifying the factors that affect the target class and how much each of them contributes. A root or parent node has more influence on the outcome than its child nodes. One advantage of the C50 algorithm is that it can be applied to many kinds of data and uses memory efficiently. Another advantage is that it can handle numeric as well as categorical data. In this project, we categorized some of the factors and then applied the C50 algorithm to the dataset. Three of the attributes contributed to generating the decision tree. The usage of these three attributes is shown below.

Fig. 7. Attribute usage in C50

As can be seen, the usage of ‘NAS delay’ was 100%, and that of ‘Weather delay’ and ‘Taxi out’ was 95.93% and 95.92% respectively. A decision tree was formed from these three attributes. The decision tree is shown below.

Fig. 8. Decision tree C50

After this, predictions were made and the error of the model was calculated. An error rate of 15% was obtained, i.e., the model was 85% accurate. This result is shown in the image below.

Fig. 9. Error rate of C50

The classification of the data is also shown in the above image, and it can be observed that C50 performed better than the KNN algorithm.

C. Artificial Neural Networks

Artificial Neural Networks process information in a way similar to the human brain [23]. They need not be explicitly programmed but learn from past experience [24]. An ANN consists of several neurons arranged in three kinds of layers: input layer, hidden layer and output layer. Each connection between neurons is assigned a weight. A neuron multiplies its inputs by these weights and adds the products together, and the result is then used to calculate its output. In this project, a total of 12 inputs, 5 hidden layers and 1 output layer were selected, and the accuracy was found to be 79%.
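The weighted-sum computation performed by the neurons can be sketched as a forward pass. This is a minimal illustration in Python under stated assumptions, not the network trained in this project: the sigmoid activation, the bias terms, the tiny layer sizes (3 inputs, 2 hidden neurons, 1 output) and all weight values are hypothetical and chosen only to keep the example readable.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1). Assumed activation."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute one output per neuron: sigmoid(weighted sum of inputs + bias).

    weights holds one list of input weights per neuron in the layer.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Pass the inputs through one hidden layer and a single output neuron."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden, out_w, out_b)[0]


# Hypothetical weights for a 3-input, 2-hidden-neuron, 1-output network.
hidden_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
hidden_b = [0.0, 0.0]
out_w = [[1.0, 1.0]]
out_b = [0.0]

score = forward([1.0, 2.0, 3.0], hidden_w, hidden_b, out_w, out_b)
print(round(score, 3))
```

Training consists of adjusting the weights and biases so that the output score matches the known class of each training record. That step is omitted here.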
The accuracy was also checked with 2 hidden layers, which gave 77%, and so 5 hidden layers were selected. The network and the accuracy obtained with 5 hidden layers are shown in Fig. 10 and Fig. 11.
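The overall accuracy and Kappa values reported in this section are both derived from a confusion matrix. The sketch below, in Python, shows one way to compute them. The function name accuracy_and_kappa and the two small matrices are hypothetical and are not the confusion matrices obtained in this project.

```python
def accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of records of actual class i predicted as
    class j. Assumes the chance agreement is below 1, so kappa is defined.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: correctly classified records divided by all records.
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance, from the actual and predicted class totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical case 1: a model that always predicts the majority class.
majority_only = [[80, 0],
                 [20, 0]]
# Hypothetical case 2: a model that classifies every record correctly.
perfect = [[50, 0],
           [0, 50]]

print(accuracy_and_kappa(majority_only))
print(accuracy_and_kappa(perfect))
```

In the first hypothetical matrix the accuracy is 80% but Kappa is 0, because always predicting the majority class agrees with the actual classes no more often than chance would. This is why Kappa is worth reading alongside accuracy when the classes are imbalanced.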
Fig. 10. Neural Network with 5 hidden layers

Fig. 11. Accuracy of ANN

V. CONCLUSION AND FUTURE WORK

In this project, different classification algorithms, namely KNN, C50 and ANN, were implemented to predict flight delay. The results of these algorithms were compared and C50 was found to be the best one, with an accuracy of 85%. Many factors contribute to flight delays. The C50 algorithm showed that NAS delay, Weather delay and Taxi-out were the main features contributing to flight delay. These models can be applied in real-world scenarios to make improvements in the airline industry. In future, we can try to improve the prediction model to gain higher accuracy. Further analysis can be done by identifying the airline companies in which delays occur the most. The time of the year during which delays occur can also be identified by combining the data with a weather-related dataset.

REFERENCES

[1] R. J. Hansman, “Identification, Characterization, and Prediction of Traffic Flow Patterns in Multi-Airport Systems,” pp. 1–14, 2018.
[2] M. Baluch and T. Bergstra, “Complex Analysis of United States Flight Data Using a Data Mining Approach,” pp. 1–6, 2017.
[3] F. Bus, “Application of Machine Learning Algorithms to Predict Flight Arrival Delays,” vol. 00, pp. 3992–3997, 2015.
[4] N. E. Md Isa, A. Amir, M. Z. Ilyas, and M. S. Razalli, “The Performance Analysis of K-Nearest Neighbors (K-NN) Algorithm for Motor Imagery Classification Based on EEG Signal,” MATEC Web Conf., vol. 140, p. 01024, 2017.
[5] M. S. B. Maind, “Research Paper on Basic of Artificial Neural Network,” Int. J. Recent Innov. Trends Comput. Commun., vol. 2, no. 1, pp. 96–100, 2014.
[6] S. Choi, Y. J. Kim, S. Briceno, and D. Mavris, “Prediction of weather-induced airline delays based on machine learning algorithms,” AIAA/IEEE Digit. Avion. Syst. Conf. - Proc., vol. 2016–December, pp. 1–6, 2016.
[7] Y. Ding, “Predicting flight delay based on multiple linear regression,” 2017.
[8] M. Balamurugan and S. Kannan, “Performance Analysis of Cart and C5.0 using Sampling Techniques,” 2016 IEEE Int. Conf. Adv. Comput. Appl., pp. 72–75, 2016.
[9] G. Costagliola, V. Fuccella, M. Giordano, and G. Polese, “Monitoring online tests through data visualization,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 6, pp. 773–784, 2009.
[10] A. Guerra-hern, “Explorations of the BDI Multi-Agent support for the Knowledge Discovery in Databases Process,” no. January, 2008.
[11] O. Niakšu, “CRISP Data Mining Methodology Extension for Medical Domain,” Balt. J. Mod. Comput., vol. 3, no. 2, pp. 92–109, 2015.
[12] S. Choi, Y. J. Kim, S. Briceno, and D. Mavris, “Cost-sensitive prediction of airline delays using machine learning,” AIAA/IEEE Digit. Avion. Syst. Conf. - Proc., vol. 2017–September, 2017.
[13] P. N. Patil, R. Lathi, and V. Chitre, “Comparison of C5.0 & CART Classification algorithms using pruning technique,” Int. J. Eng. Res. Technol., vol. 1, no. 4, pp. 1–5, 2012.
[14] N. Kuhn and N. Jamadagni, “Application of Machine Learning Algorithms to Predict Flight Arrival Delays,” pp. 1–6, 2017.
[15] S. B. Imandoust and M. Bolandraftar, “Application of K-Nearest Neighbor (KNN) Approach for Predicting Economic Events: Theoretical Background,” Int. J. Eng. Res. Appl., vol. 3, no. 5, pp. 605–610, 2013.
[16] Q. Li, W. Lei, F. Rong, W. Bin, and X. Hei, “An analysis method for flight delays based on Bayesian network,” Proc. 2015 27th Chinese Control Decis. Conf. CCDC 2015, pp. 2561–2565, 2015.
[17] P. Chandraa, N. Prabakaran, and R. Kannadasan, “Airline delay predictions using supervised machine learning,” Int. J. Pure Appl. Math., vol. 119, no. Special Issue 7A, 2018.
[18] A. Sternberg, J. Soares, D. Carvalho, and E. Ogasawara, “A Review on Flight Delay Prediction,” pp. 1–21, 2017.
[19] B. Thiagarajan, L. Srinivasan, A. V. Sharma, D. Sreekanthan, and V. Vijayaraghavan, “A machine learning approach for prediction of on-time performance of flights,” AIAA/IEEE Digit. Avion. Syst. Conf. - Proc., vol. 2017–September, 2017.
[20] Y. J. Kim, S. Choi, S. Briceno, and D. Mavris, “A deep learning approach to flight delay prediction,” AIAA/IEEE Digit. Avion. Syst. Conf. - Proc., vol. 2016–December, pp. 1–6, 2016.
[21] V. Sharma, S. Rai, and A. Dev, “A Comprehensive Study of Artificial Neural Networks,” Int. J. Adv. Res. Comput. Sci. Softw. Eng., vol. 2, no. 10, pp. 278–284, 2012.
[22] U. Shafique and H. Qaiser, “A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA),” Int. J. Innov. Sci. Res., vol. 12, no. 1, pp. 217–222, 2014.
[23] H. Jair et al., “A comparative between CRISP-DM and SEMMA through the construction of a MODIS repository for studies of land use and cover change,” Adv. Sci. Technol. Eng. Syst. J., vol. 2, no. 3, pp. 598–604, 2017.
[24] K. Gopalakrishnan and H. Balakrishnan, “A Comparative Analysis of Models for Predicting Delays in Air Traffic Networks,” Eur. Air Traffic Manag. Res. Dev. Semin., 2017.
[25] Transtats.bts.gov. (2018). OST_R | BTS | Transtats. [online] Available at: https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236 [Accessed 3 Aug. 2018].
[26] Data.world. (2018). data.world. [online] Available at: https://data.world/hoytick/2017-jan-ontimeflightdata-usa [Accessed 3 Aug. 2018].