Engineering Analytics Saurabh Kale
FLIGHT DELAY PREDICTION AND VISUALIZATION
ENGINEERING ANALYTICS COURSE PROJECT
FINAL RESEARCH PAPER
Project by: Saurabh Kale
Project advisor: Dr. Ying Lin
ABSTRACT
Air travel is very common in the USA, and it is important for travelers to choose flights that are both cheap and reliable. Air travel is also the fastest way to get from A to B. It carries some risk, but aircraft manufacturers and airlines do a great job of ensuring passenger safety. Delay is often an important consideration when traveling, especially for business travelers and professionals. Based on certain factors discussed later in this paper, it is possible to predict with some confidence whether a flight will be delayed.
The data for this study was downloaded from the Kaggle website. Its original source is the U.S. Department of Transportation's Bureau of Transportation Statistics (BTS) website. The data has more than 5 million rows and 31 columns. Columns not required for this study were removed during data preprocessing. The delay values of cancelled flights would be outliers, so all rows for cancelled flights were removed from the downloaded dataset. Some columns are obviously correlated with one another, and correlation plots are included later in this report.
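The base-R function below sketches the preprocessing described above. It drops cancelled flights, keeps only the columns a study needs and builds a binary Delay response. The column names CANCELLED and DEPARTURE_DELAY are assumed to follow the BTS flights file. The rule "Delay = 1 when DEPARTURE_DELAY > 0" is a hypothetical definition, because the paper does not state the threshold it used.

```r
# Preprocessing sketch (base R only).
# Assumed columns: CANCELLED (1 = cancelled) and DEPARTURE_DELAY (minutes).
# The "delayed if DEPARTURE_DELAY > 0" rule is a hypothetical definition.
preprocess_flights <- function(flights, keep_cols) {
  # Drop cancelled flights so their delay values cannot act as outliers
  flights <- flights[flights$CANCELLED == 0, , drop = FALSE]
  # Drop rows with no recorded departure delay
  flights <- flights[!is.na(flights$DEPARTURE_DELAY), , drop = FALSE]
  # Binary response: 1 = delayed, 0 = not delayed
  delay <- factor(as.integer(flights$DEPARTURE_DELAY > 0), levels = c(0, 1))
  # Keep only the columns the study needs, then attach the response
  out <- flights[, keep_cols, drop = FALSE]
  out$Delay <- delay
  rownames(out) <- NULL
  out
}

# Toy data for illustration
toy <- data.frame(
  MONTH = c(1, 1, 2, 2),
  AIRLINE = c("AA", "DL", "AA", "UA"),
  CANCELLED = c(0, 1, 0, 0),
  DEPARTURE_DELAY = c(12, NA, -3, 0)
)
clean <- preprocess_flights(toy, keep_cols = c("MONTH", "AIRLINE"))
print(clean)
```

The cancelled row is removed, and the remaining three rows get Delay values 1, 0 and 0.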
METHODOLOGY
The scope of the project is to fit a model that predicts whether or not a flight is delayed. The model uses features in the Flights dataset published by the Bureau of Transportation Statistics. One application of such a study would be an online dashboard built on the prediction model. It would take input from the user (the flyer) about attributes such as origin airport, destination airport, airline, time of day, day of week and day of month, and predict how long a given flight will be delayed. Such a tool could help flyers decide which airline to fly with, or which time of day has the fewest delays.
Because of the high model complexity, the project scope was narrowed. The model does not predict how long a flight will be delayed. Instead, it predicts a binary response: whether or not a flight is delayed.
The airport variables were found to have about 325 unique factor levels. This made it difficult, if not impossible, to fit a tree-based (CART) model to the data. For example, the randomForest package in R cannot handle categorical predictors with more than 53 levels. To solve this, each airport factor was converted into numeric latitude and longitude variables, which could easily be included in the model.

Intuitively, the origin and destination airports may be the most important features in this model. The average number of passengers per flight is likely to be highest at large cities such as New York, Chicago, Houston and Los Angeles. Origin and destination may also matter because some locations are prone to weather-related delays. Flight Duration and Distance are otherwise important features, but they are correlated with origin and destination. They were therefore dropped from the study, but may be reconsidered after evaluation.
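The airport-to-coordinates conversion can be sketched in base R with match() against a lookup table. Several details below are illustrative assumptions, not the paper's actual files:

- the lookup table and its columns (IATA_CODE, LATITUDE, LONGITUDE);
- the output column names;
- the coordinates, which are approximate values for JFK, ORD and LAX.

```r
# Replace airport codes with numeric latitude/longitude columns.
# `airports` is an assumed lookup table with IATA_CODE, LATITUDE, LONGITUDE.
add_coordinates <- function(flights, airports) {
  # match() keeps the original row order of `flights` (merge() would not)
  idx_o <- match(flights$ORIGIN_AIRPORT, airports$IATA_CODE)
  idx_d <- match(flights$DESTINATION_AIRPORT, airports$IATA_CODE)
  flights$ORIGIN_LAT <- airports$LATITUDE[idx_o]
  flights$ORIGIN_LON <- airports$LONGITUDE[idx_o]
  flights$DEST_LAT   <- airports$LATITUDE[idx_d]
  flights$DEST_LON   <- airports$LONGITUDE[idx_d]
  # The many-level airport factor columns are no longer needed
  flights$ORIGIN_AIRPORT <- NULL
  flights$DESTINATION_AIRPORT <- NULL
  flights
}

# Approximate coordinates, for illustration only
airports <- data.frame(
  IATA_CODE = c("JFK", "ORD", "LAX"),
  LATITUDE  = c(40.64, 41.98, 33.94),
  LONGITUDE = c(-73.78, -87.90, -118.41)
)
flights <- data.frame(
  ORIGIN_AIRPORT      = c("JFK", "LAX"),
  DESTINATION_AIRPORT = c("ORD", "JFK"),
  MONTH = c(1, 2)
)
geo <- add_coordinates(flights, airports)
print(geo)
```

Codes missing from the lookup table would produce NA coordinates. Checking for NAs after this step is therefore a cheap sanity test.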
The Airline factor was one-hot encoded. Each airline was added as its own column, set to 1 if that airline was the carrier for the flight in a given row and 0 otherwise.
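This one-hot encoding can be sketched in base R with model.matrix(). Removing the intercept gives every airline its own 0/1 column. The AIRLINE column name and the AIRLINE_ prefix on the new columns are assumptions for illustration.

```r
# One-hot encode the Airline factor: one 0/1 column per carrier.
one_hot_airline <- function(flights) {
  airline <- factor(flights$AIRLINE)
  # "- 1" drops the intercept, so every carrier gets its own column
  dummies <- model.matrix(~ airline - 1)
  colnames(dummies) <- sub("^airline", "AIRLINE_", colnames(dummies))
  # Keep every other column and append the indicator columns
  others <- flights[, setdiff(names(flights), "AIRLINE"), drop = FALSE]
  out <- cbind(others, dummies)
  rownames(out) <- NULL
  out
}

flights <- data.frame(MONTH = c(1, 1, 2), AIRLINE = c("AA", "DL", "AA"))
encoded <- one_hot_airline(flights)
print(encoded)
```

Each row has exactly one indicator set to 1.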
In summary, the model considers the following features:

- Month, Day of Month and Day of Week;
- Airline;
- Origin and Destination, each expressed as coordinates;
- Scheduled Departure Time, which was added after the first set of features.

Delay is the binary response being predicted.
Features Considered in the Model
1. Day of Month
2. Day of Week
3. Month
4. Airline
5. Origin Latitude
6. Origin Longitude
7. Destination Latitude
8. Destination Longitude
9. Time of Flight Departure (Scheduled Departure Time)
ISSUES WITH DATA
Data downloaded from the internet usually contains anomalies such as missing values and outliers. In this case, the airport columns contained both numeric codes and three-letter alphabetic codes, so one type of code had to be substituted for the other. Lookup tables were created in Excel and the data was imported into RStudio. The numeric codes were then replaced with the three-letter alphabetic codes.
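The numeric codes were replaced with three-letter codes using Excel lookup tables. An equivalent step in R might look like the sketch below. The lookup table, its column names and the code pairs in it are hypothetical and only illustrate the mechanism.

```r
# Replace numeric airport codes with 3-letter codes via a lookup table.
# The lookup contents are hypothetical, for illustration only.
recode_airports <- function(codes, lookup) {
  codes <- as.character(codes)
  idx <- match(codes, as.character(lookup$NUMERIC_CODE))
  hit <- !is.na(idx)
  # Only codes found in the lookup are replaced; 3-letter codes pass through
  codes[hit] <- as.character(lookup$IATA_CODE[idx[hit]])
  codes
}

lookup <- data.frame(
  NUMERIC_CODE = c("10397", "13930"),
  IATA_CODE    = c("ATL", "ORD")
)
recoded <- recode_airports(c("ATL", "10397", "13930", "LAX"), lookup)
print(recoded)   # "ATL" "ATL" "ORD" "LAX"
```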
Model Proposed
• Random forests (random decision forests) are an ensemble learning method for classification and regression. They construct many decision trees at training time. For classification they output the class that is the mode of the individual trees' predicted classes, and for regression the mean of the trees' predictions.
• Parameters of the randomForest function
- nodesize: minimum size of terminal nodes. Setting it too high produces small trees.
- ntree: number of trees to grow. Growing more trees does not lead to overfitting, which is an advantage of random forests.
- mtry: number of features randomly sampled as split candidates at each split (mtry < p, where p is the number of features in the model).
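The base-R sketch below makes the aggregation rule in the first bullet concrete. It does not grow any trees; it combines predictions that the individual trees are assumed to have already made. It computes:

- a majority vote for classification;
- the mean for regression;
- the share of trees voting "1", which is the kind of score a probability cutoff such as 0.45 is applied to.

The function names are hypothetical helpers, not part of the randomForest package.

```r
# Combine predictions from individual trees, as a random forest does.
# votes: character matrix, one row per flight, one column per tree.
forest_vote <- function(votes) {
  apply(votes, 1, function(v) {
    counts <- table(v)
    names(counts)[which.max(counts)]  # majority class; ties go to the first level
  })
}

# Share of trees voting for the positive class ("1" = delayed)
forest_prob <- function(votes, positive = "1") {
  rowMeans(votes == positive)
}

# Regression forests average the numeric predictions instead
forest_mean <- function(preds) rowMeans(preds)

votes <- matrix(c("1", "1", "0",
                  "0", "0", "1"), nrow = 2, byrow = TRUE)
majority <- forest_vote(votes)
prob1 <- forest_prob(votes)
avg <- forest_mean(matrix(c(10, 20, 30,
                            0, 5, 10), nrow = 2, byrow = TRUE))
print(majority)
print(prob1)
print(avg)
```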
VARIABLE SELECTION
The following plots help validate the selection of these variables:
Image 1
The maximum and minimum delay values differ across days of the week. This suggests that Day of Week can help explain whether a flight is delayed.
Image 2
The maximum and minimum delay values also differ across days of the month. This suggests that Day of Month can help explain whether a flight is delayed.
Image 3
The maximum and minimum delay values differ across months. This suggests that Month can help explain whether a flight will be delayed.
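The group-wise comparison behind these plots can be sketched in base R as follows. The column names DAY_OF_WEEK and DEPARTURE_DELAY and the toy values are assumptions. The same function can be called with day of month, month or airline as the grouping variable.

```r
# Per-group minimum, maximum and share of delayed flights.
delay_summary <- function(delay, group) {
  groups <- split(delay, group)
  # One row per group, one column per statistic
  t(sapply(groups, function(d) {
    c(min = min(d), max = max(d), share_delayed = mean(d > 0))
  }))
}

toy <- data.frame(
  DAY_OF_WEEK     = c(1, 1, 1, 5, 5),
  DEPARTURE_DELAY = c(-5, 0, 40, 10, 90)
)
summary_by_day <- delay_summary(toy$DEPARTURE_DELAY, toy$DAY_OF_WEEK)
print(summary_by_day)
```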
Image 4
The following three plots support the selection of Airline as a feature. They suggest that Airline can help explain whether a flight will be delayed.
Image 5
Image 6
As the plot above shows, delays increase between 0000 and 0500 hours. Scheduled departure time was not included in the first set of features but was later added to the model.
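To use the time-of-day effect as a feature, the scheduled departure time has to be turned into something like an hour of day. The sketch below makes three assumptions:

- times are stored as HHMM integers, for example 1430 for 2:30 pm;
- departures in the 0000-0459 window are the ones to flag;
- midnight may be recorded as 2400, so that case is handled defensively.

The encoding and the helper names are assumptions, not confirmed by the paper.

```r
# Hour of day from an HHMM-encoded time (assumed encoding, e.g. 1430 = 2:30 pm)
departure_hour <- function(hhmm) {
  hour <- as.integer(hhmm) %/% 100L
  # Fold a possible 2400 (midnight) back to hour 0
  hour[hour == 24L] <- 0L
  hour
}

# TRUE for departures scheduled between 0000 and 0459
is_early_morning <- function(hhmm) departure_hour(hhmm) < 5L

hours <- departure_hour(c(5, 45, 1430, 2359, 2400))
early <- is_early_morning(c(30, 459, 500, 1200))
print(hours)
print(early)
```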
Image 7
Flight Time is the air time of a flight. It is correlated with Origin and Destination: the farther apart the origin and destination airports are, the longer the flight takes. This feature was therefore not included in the model.
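The dependence of flight time on origin and destination can be made explicit. The great-circle distance between two airports is a function of the four coordinate features alone. Below is a minimal haversine sketch in base R. It assumes a mean Earth radius of 3958.8 miles and uses approximate JFK and LAX coordinates.

```r
# Great-circle (haversine) distance in statute miles between two points
haversine_miles <- function(lat1, lon1, lat2, lon2, radius = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Approximate JFK -> LAX coordinates, for illustration
jfk_lax <- haversine_miles(40.64, -73.78, 33.94, -118.41)
print(jfk_lax)   # roughly 2,470 miles
```

The function is vectorised, so it can be applied to whole coordinate columns at once.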
Image 8
This plot shows delay by state. It is informative because four of the model's features are airport coordinates, and it illustrates why those four features matter.
CORRELATIONS
The following plots show correlations between variables. They form the basis for the decision not to include these variables in the model.
Plot 1
X: Sch_Air_Time vs. Y: Elapsed_Time (Plot 1)
Plot 2
X: Sch_Air_Time vs. Y: Distance (Plot 2)
Plot 3
X: Sch_Air_Time vs. Y: Actual_Air_Time (Plot 3)
Similar correlation plots exist for other variables, such as Scheduled Departure Time, Actual Departure Time and Wheels Off Time. They are not included in this report because they closely resemble the plots above.
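This screening can be automated with a short base-R helper. It lists every pair of numeric columns whose Pearson correlation is at or above a threshold. The 0.9 threshold and the toy data are assumptions for illustration. The column names mirror those in the appendix, where the time and distance variables correlate above 0.98.

```r
# List pairs of columns whose absolute Pearson correlation is >= threshold
high_correlation_pairs <- function(df, threshold = 0.9) {
  r <- cor(df, use = "pairwise.complete.obs", method = "pearson")
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(abs(r) >= threshold & upper.tri(r), arr.ind = TRUE)
  data.frame(var1 = rownames(r)[idx[, "row"]],
             var2 = colnames(r)[idx[, "col"]],
             r = r[idx],
             stringsAsFactors = FALSE)
}

# Toy data: AIR_TIME is almost a linear function of SCHEDULED_TIME
x <- 1:10
toy <- data.frame(
  SCHEDULED_TIME = x,
  AIR_TIME = 2 * x + rep(c(0.1, -0.1), 5),
  OTHER = c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6)
)
pairs_found <- high_correlation_pairs(toy)
print(pairs_found)
```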
MODEL ASSESSMENT AND EVALUATION
In the raw data, the binary Delay response (0 = not delayed, 1 = delayed) is distributed as follows:
Delay    Frequency    Percentage
0        3,607,308    62.92%
1        2,125,618    37.07%
Total    5,732,926    100.00%
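A quick check of the class balance in the table above gives a useful reference point. A trivial model that always predicts "not delayed" would already be right about 62.92% of the time, so a fitted model's accuracy is best read against that baseline. The sketch below recomputes both figures from the reported counts. Rounding gives 37.08% for class 1, where the table shows 37.07%.

```r
# Class balance of the binary Delay response, from the reported counts
delay_counts <- c("0" = 3607308, "1" = 2125618)
shares <- delay_counts / sum(delay_counts)
percent <- round(100 * shares, 2)

# Accuracy of a trivial model that always predicts the majority class
baseline_accuracy <- max(shares)

print(percent)
print(baseline_accuracy)
```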
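The data was fitted with a random forest, using the randomForest package in R. The function call was:
library(randomForest)
# Delay is a factor, so randomForest() fits a classification forest
RFGeoSpa1 <- randomForest(Delay ~ ., data = Train1, ntree = 400, mtry = 15, nodesize = 1)
The ROC curve is as follows: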
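For a cutoff of 0.45, the confusion matrix is as follows (rows: predicted class, columns: actual class):
Predicted \ Actual    0        1
0                     18141    6748
1                     5699     7197
Treating "not delayed" (0) as the positive class, the TPR was 76.09% and the TNR was 51.60%. In other words, 76.09% of on-time flights and 51.60% of delayed flights were classified correctly.
The accuracy of the model is 67.05%.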
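The rates above can be recomputed from the confusion matrix. The base-R sketch below does that, and also shows how a probability cutoff such as 0.45 turns predicted delay probabilities into classes. It rests on two assumptions:

- rows are predicted classes and columns are actual classes, the orientation consistent with the reported 76.09% and 51.60%;
- a flight is labelled delayed when its probability is at least the cutoff.

Which class is treated as positive swaps TPR and TNR. With rounding, the rates come out as 51.61% and 76.09% (positive = "1") or 76.09% and 51.61% (positive = "0"), and accuracy as 67.06%. The paper's 51.60% and 67.05% therefore appear to be truncated rather than rounded.

```r
# Turn predicted delay probabilities into classes at a cutoff, then tabulate.
# Assumed convention: predict "1" (delayed) when the probability is >= cutoff.
confusion_at_cutoff <- function(prob_delay, actual, cutoff = 0.45) {
  predicted <- factor(as.integer(prob_delay >= cutoff), levels = c(0, 1))
  actual <- factor(actual, levels = c(0, 1))
  table(predicted = predicted, actual = actual)
}

# TPR, TNR and accuracy from a 2 x 2 table (rows = predicted, columns = actual)
classification_rates <- function(cm, positive = "1") {
  negative <- setdiff(colnames(cm), positive)
  tp <- as.numeric(cm[positive, positive])
  tn <- as.numeric(cm[negative, negative])
  c(TPR = tp / sum(cm[, positive]),   # share of actual positives caught
    TNR = tn / sum(cm[, negative]),   # share of actual negatives caught
    accuracy = (tp + tn) / sum(cm))
}

# Confusion matrix reported for the 0.45 cutoff
cm <- matrix(c(18141, 6748,
               5699,  7197),
             nrow = 2, byrow = TRUE,
             dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))

rates_delayed_pos <- round(100 * classification_rates(cm, positive = "1"), 2)
rates_ontime_pos  <- round(100 * classification_rates(cm, positive = "0"), 2)
print(rates_delayed_pos)
print(rates_ontime_pos)

# Toy example of the cutoff step (made-up probabilities)
toy_cm <- confusion_at_cutoff(c(0.9, 0.45, 0.2, 0.6), actual = c(1, 0, 0, 1))
print(toy_cm)
```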
LESSONS LEARNT
Random forests can give very good predictions when the data is free of noise, but no single model should be relied upon. Multiple models should be built and compared with one another to validate each model's accuracy. This not only validates the models that are built, but also prompts the question of why one model could not perform better than another.
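The base-R sketch below is a minimal illustration of comparing models on the same held-out data. It compares a majority-class baseline with a logistic regression (glm) on simulated data. The simulated hour-of-day effect and the choice of glm as the second model are assumptions for illustration only. With the real data, the same pattern would apply to the random forest and any competing model.

```r
# Compare two models on the same held-out data (simulated example)
set.seed(42)
n <- 600
sim <- data.frame(hour = sample(0:23, n, replace = TRUE))
# Simulated: later departures are more likely to be delayed
sim$Delay <- rbinom(n, 1, plogis(-3 + 0.25 * sim$hour))

train_rows <- sample(n, n / 2)
train <- sim[train_rows, , drop = FALSE]
test  <- sim[-train_rows, , drop = FALSE]

accuracy <- function(pred, actual) mean(pred == actual)

# Model A: always predict the most common class in the training data
majority <- as.integer(mean(train$Delay) >= 0.5)
acc_baseline <- accuracy(rep(majority, nrow(test)), test$Delay)

# Model B: logistic regression on departure hour
fit <- glm(Delay ~ hour, data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")
acc_glm <- accuracy(as.integer(prob >= 0.5), test$Delay)

print(c(baseline = acc_baseline, logistic = acc_glm))
```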
The random forest fitted in this study may not be performing optimally with the data provided to it. More variables may be needed, such as aircraft type, number of passengers and number of support staff. Adding them may improve model performance.
ERRORS IN PRESENTATION-
When fitting a random forest model, variable importance should only be evaluated with the importance() function in R when the forest is used for regression, not when it is used for classification. Stating otherwise was an error in the presentation, and this section corrects it.
USE OF SOFTWARE
1. R and R packages (in RStudio)
2. Excel and Excel Power Map
3. Tableau
REFERENCES
1. http://kellyjclifton.com/Research/EconImpactsofBicycling/OTRECReport-ConsBehavTravelChoices_Nov2012.pdf
2. An Introduction to Statistical Learning with Applications in R
3. Class Notes, Engineering Analytics, Dr. Lin
4. https://stackoverflow.com/
5. https://stackexchange.com/
6. https://www.bts.gov/
7. https://www.kaggle.com/datasets
8. https://www.rdocumentation.org/
APPENDIX-
Correlations-
Every cor() call below passes method = c("pearson","kendall","spearman"). cor() uses only the first entry of that vector, so every value reported is a Pearson correlation.
<- FlightDF[sample(nrow(FlightDF), 5000000, replace = FALSE, prob = NULL),]
> cor(FlightsDataSample1$WEATHER_DELAY, FlightsDataSample1$LATE_AIRCRAFT_DELAY, use = "na.or.complete", method = c("pearson","kendall","spearman"))
[1] -0.02135492
> cor(FlightsDataSample1$WEATHER_DELAY, FlightsDataSample1$AIRLINE_DELAY, use = "na.or.complete", method = c("pearson","kendall","spearman"))
[1] -0.05103192
> cor(FlightsDataSample1$WEATHER_DELAY, FlightsDataSample1$SECURITY_DELAY, use = "na.or.complete", method = c("pearson","kendall","spearman"))
[1] -0.004781347
> cor(FlightsDataSample1$WEATHER_DELAY, FlightsDataSample1$AIR_SYSTEM_DELAY, use = "na.or.complete", method = c("pearson","kendall","spearman"))
[1] -0.0005082514
> cor(FlightsDataSample1$SCHEDULED_TIME, FlightsDataSample1$ELAPSED_TIME, use = "na.or.complete", method = c("pearson","kendall","spearman"))
[1] 0.9852726
> cor(FlightsDataSample1$SCHEDULED_TIME, FlightsDataSample1$AIR_TIME, use = "na.or.complete", method = c("pearson","kendall","spearman"))
[1] 0.9907503
> cor(FlightsDataSample1$ELAPSED_TIME, FlightsDataSample1$ELAPSED_TIME, use = "na.or.complete", method = c("pearson","kendall","spearman"))
[1] 1
> cor(FlightsDataSample1$DISTANCE, FlightsDataSample1$AIR_TIME, use = "na.or.complete", method = c("pearson","kendall","spearman"))
[1] 0.9856394
> cor(FlightsDataSample1$DISTANCE, FlightsDataSample1$SCHEDULED_TIME, use = "na.or.complete", method = c("pearson","kendall","spearman"))
[1] 0.9843424
DATA FOR MODEL
The dataset (about 550 MB) and the code can be provided to the course instructor on request.

1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 

FLIGHT DELAY PREDICTION AND VISUALIZATION
ENGINEERING ANALYTICS COURSE PROJECT
FINAL RESEARCH PAPER
Project by: Saurabh Kale
Project advisor: Dr. Ying Lin
ABSTRACT
Air travel is very common in the USA, and it is important for travelers to choose a flight that is both cheap and reliable. It is the fastest way to get from A to B. Air travel carries risks, but aircraft manufacturers and airlines do a great job of ensuring passenger safety. Delay is often an important consideration for travelers, especially business travelers and professionals. It is possible to predict, with some confidence, how long a flight will be delayed based on certain factors, which are discussed later in the paper.

The data for this study was downloaded from Kaggle. Its original source is the website of the Department of Transportation's Bureau of Transportation Statistics. The dataset has more than 5 million rows and 31 columns. Columns not required for this study were removed during data preprocessing. Delay values for cancelled flights would be outliers, so all rows for cancelled flights were removed from the downloaded dataset. Some columns are clearly correlated, and correlation plots are included later in the report.
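The preprocessing described above, dropping cancelled flights and unneeded columns, can be sketched in base R as follows. This is a minimal illustration and not the author's code. The toy values are invented. The column names (MONTH, AIRLINE, CANCELLED, ARRIVAL_DELAY, TAIL_NUMBER) are assumed to follow the Kaggle/BTS flights file, whose naming the appendix suggests. The choice of columns to keep is hypothetical.

```r
# Toy stand-in for the flights table (values invented for illustration).
flights <- data.frame(
  MONTH = c(1, 1, 2, 2),
  AIRLINE = c("AA", "UA", "AA", "DL"),
  CANCELLED = c(0, 1, 0, 0),
  ARRIVAL_DELAY = c(5, NA, -3, 42),
  TAIL_NUMBER = c("N101", "N102", "N103", "N104"),
  stringsAsFactors = FALSE
)

# Drop cancelled flights: they have no meaningful delay value.
flights <- flights[flights$CANCELLED == 0, ]

# Keep only the columns needed later (a hypothetical selection).
keep <- c("MONTH", "AIRLINE", "ARRIVAL_DELAY")
flights <- flights[, keep]

print(flights)
```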
METHODOLOGY
The scope of the project is to fit a model that predicts whether a flight is delayed, based on features available in the Flights dataset published by the Bureau of Transportation Statistics. One application of such a study would be an online dashboard backed by a prediction model. The dashboard would take input from the user (the flyer) about attributes such as origin airport, destination airport, airline, time of day, day of week and day of month, and would predict how long a given flight will be delayed. Such a tool could help a flyer decide which airline to fly or which time of day has the fewest delays.

Because of high model complexity, the project scope was reduced. Instead of predicting how long a flight is delayed, the model predicts a binary response: whether a given flight is delayed or not.

The airport variables have ~325 unique factor levels, which made it difficult, if not impossible, to fit a CART model to the data. To solve this, the airport factor variables were converted to numeric latitude and longitude variables, which made it easy to include them in the model.

Intuitively, origin and destination airports may be the most important features in this model, because the average number of passengers per flight is highest at big cities such as New York, Chicago, Houston and Los Angeles. Origin and destination may also matter because of weather-related delays at certain locations. Two other potentially important features, flight duration and distance, are correlated with origin and destination. They have been dropped from the study but may be reconsidered after evaluation.
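A minimal sketch of replacing the airport factors with coordinates follows. It assumes a lookup table of airport codes with latitude and longitude. The airports table and its column names (IATA_CODE, LATITUDE, LONGITUDE) are assumptions here, and the coordinates are approximate and for illustration only. One concrete reason this step matters in R is that the randomForest package refuses factor predictors with more than 53 levels, far fewer than ~325 airports.

```r
flights <- data.frame(
  ORIGIN_AIRPORT = c("ORD", "JFK", "LAX"),
  DESTINATION_AIRPORT = c("LAX", "ORD", "JFK"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table; coordinates are approximate, for illustration only.
airports <- data.frame(
  IATA_CODE = c("ORD", "JFK", "LAX"),
  LATITUDE = c(41.98, 40.64, 33.94),
  LONGITUDE = c(-87.90, -73.78, -118.41),
  stringsAsFactors = FALSE
)

# match() returns, for each flight, the row of its airport in the lookup table.
o <- match(flights$ORIGIN_AIRPORT, airports$IATA_CODE)
d <- match(flights$DESTINATION_AIRPORT, airports$IATA_CODE)

flights$ORIGIN_LAT <- airports$LATITUDE[o]
flights$ORIGIN_LON <- airports$LONGITUDE[o]
flights$DEST_LAT <- airports$LATITUDE[d]
flights$DEST_LON <- airports$LONGITUDE[d]

# The high-cardinality factor columns are no longer needed.
flights$ORIGIN_AIRPORT <- NULL
flights$DESTINATION_AIRPORT <- NULL

print(flights)
```

match() is used instead of merge() because it keeps the original row order of the flights table.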
The factor levels of the Airline variable were encoded as separate columns. Each column holds a '0' or '1' depending on which airline was the carrier in a given row. In summary, the features considered in the model are month, day of month, weekday, airline, origin (as coordinates) and destination (as coordinates). Delay is the binary response being predicted.

Features Considered in the Model
1. Day of Month
2. Day of Week
3. Month
4. Airline
5. Origin Latitude
6. Origin Longitude
7. Destination Latitude
8. Destination Longitude
9. Time of Flight Departure (Scheduled Departure Time)
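The 0/1 encoding of Airline described above can be sketched with base R's model.matrix(). Removing the intercept with "- 1" gives one 0/1 column per airline. The toy data and the MONTH/AIRLINE column names are illustrative assumptions.

```r
flights <- data.frame(
  MONTH = c(1, 1, 2, 3),
  AIRLINE = c("AA", "DL", "UA", "AA"),
  stringsAsFactors = FALSE
)
flights$AIRLINE <- factor(flights$AIRLINE)

# "- 1" drops the intercept so every airline level gets its own 0/1 column.
dummies <- model.matrix(~ AIRLINE - 1, data = flights)

# Replace the AIRLINE factor with its indicator columns.
encoded <- cbind(flights[, "MONTH", drop = FALSE], dummies)
print(encoded)
```

Each row has exactly one "1" across the airline columns, which matches the description of one column per carrier.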
ISSUES WITH DATA
Data downloaded from the internet almost always has anomalies such as missing values and outliers. In this case, the airport code data contained both numeric and alphabetic codes, so one kind of code had to be substituted for the other. This was done in Excel by creating lookup tables. The data was then imported into RStudio, where the numeric codes were replaced with 3-letter alphabetic codes.

Model Proposed
• Random forests (random decision forests) are an ensemble learning method for classification and regression. They construct a multitude of decision trees at training time and output the mode of the individual trees' classes (classification) or the mean of their predictions (regression).
• Parameters in the randomForest function:
- nodesize: minimum size of terminal nodes. Setting it too high produces small trees.
- ntree: number of trees to grow. Growing more trees does not lead to overfitting, which is an advantage of random forests.
- mtry: number of features randomly sampled as split candidates at each split (mtry < p, where p is the number of features in the model).
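To make the aggregation step concrete, the sketch below combines hypothetical per-tree class votes for three flights into a majority-vote (mode) prediction and a vote share. It also shows how a cutoff such as the 0.45 reported later in this paper can change the decision for a borderline flight. The vote matrix is invented. In a fitted randomForest model the vote shares would come from the trees themselves, for example via predict() with type = "prob".

```r
# Hypothetical class votes from 11 trees (columns) for 3 flights (rows);
# 1 = delayed, 0 = not delayed.
tree_votes <- rbind(
  c(rep(1, 7), rep(0, 4)),  # 7 of 11 trees vote "delayed"
  c(rep(0, 8), 1, 1, 0),    # 2 of 11
  c(rep(1, 5), rep(0, 6))   # 5 of 11 (borderline)
)

# Share of trees voting "delayed" for each flight.
vote_share <- rowMeans(tree_votes == 1)

# Majority vote: the mode of the individual trees' classes.
majority <- as.integer(vote_share > 0.5)

# A custom cutoff replaces the implicit 0.5 majority threshold.
cutoff <- 0.45
pred_cutoff <- as.integer(vote_share >= cutoff)

print(data.frame(vote_share, majority, pred_cutoff))
```

The third flight (5/11 ≈ 0.455 of votes) is "not delayed" under a simple majority but "delayed" under the 0.45 cutoff.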
VARIABLE SELECTION
The following plots help validate the selection of these variables.
Image 1
The maximum and minimum delay values by day of week suggest that day of week helps explain whether a flight is delayed.
Image 2
The maximum and minimum delay values by day of month suggest that day of month helps explain whether a flight is delayed.
Image 3
The maximum and minimum delay values by month suggest that month helps explain whether a flight will be delayed.
Image 4
The following 3 plots explain the selection of airline as a feature. They show that airline can help explain whether a flight will be delayed.
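The per-group summaries behind plots like Images 1 to 4 can be sketched in base R as below. The data is invented. The binary label (Delay = 1 when ARRIVAL_DELAY > 0) is a hypothetical definition, because the paper does not state the threshold used for its Delay variable.

```r
flights <- data.frame(
  DAY_OF_WEEK = c(1, 1, 2, 2, 3, 3),
  AIRLINE = c("AA", "DL", "AA", "DL", "AA", "DL"),
  ARRIVAL_DELAY = c(30, 20, 0, 12, -10, -2),
  stringsAsFactors = FALSE
)

# Hypothetical binary label: delayed if arriving later than scheduled.
flights$Delay <- as.integer(flights$ARRIVAL_DELAY > 0)

# Minimum and maximum delay per day of week (a matrix column with min/max).
range_by_day <- aggregate(ARRIVAL_DELAY ~ DAY_OF_WEEK, data = flights,
                          FUN = function(x) c(min = min(x), max = max(x)))

# Share of delayed flights per day of week and per airline.
rate_by_day <- tapply(flights$Delay, flights$DAY_OF_WEEK, mean)
rate_by_airline <- tapply(flights$Delay, flights$AIRLINE, mean)

print(range_by_day)
print(rate_by_day)
print(rate_by_airline)
```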
Image 5
Image 6
As the plot above shows, delay increases between 0000 hours and 0500 hours. This feature (scheduled departure time) was not in the first set of features but was later included in the model.
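A sketch of deriving the hour of day from the scheduled departure time follows. It assumes the time is stored as an hhmm integer (e.g. 1430 for 2:30 pm), as in the Kaggle flights file. The values and the early-morning flag for the 0000 to 0500 window are illustrative only.

```r
flights <- data.frame(
  SCHEDULED_DEPARTURE = c(15, 130, 455, 905, 1430, 2359),
  ARRIVAL_DELAY = c(40, 25, 35, 5, 10, 20)
)

# hhmm integer -> hour of day (0 to 23) via integer division.
flights$DEP_HOUR <- flights$SCHEDULED_DEPARTURE %/% 100

# Mean delay per departure hour.
mean_by_hour <- tapply(flights$ARRIVAL_DELAY, flights$DEP_HOUR, mean)

# Flag departures scheduled between 0000 and 0500 hours.
flights$EARLY_MORNING <- as.integer(flights$DEP_HOUR < 5)
early_vs_rest <- tapply(flights$ARRIVAL_DELAY, flights$EARLY_MORNING, mean)

print(mean_by_hour)
print(early_vs_rest)
```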
Image 7
Flight time is the air time of a given flight. This feature is correlated with origin and destination: the farther apart the origin and destination are, the longer the flight takes. This feature has not been included in the model.
Image 8
This plot shows delay by state. It is important because four features in the model are the coordinates of airport locations, and the plot illustrates why those four features matter.
CORRELATIONS
The following plots show correlations between variables. They form the basis for the decision not to include these variables in the model.
Plot 1: X = Sch_Air_Time vs. Y = Elapsed_Time
Plot 2: X = Sch_Air_Time vs. Y = Distance
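The decision to drop strongly correlated variables can be sketched as a simple screen over the correlation matrix. The 0.9 threshold and the toy values are hypothetical. Which member of a correlated pair to drop remains a modeling choice; this study kept the airport coordinates and dropped flight time and distance.

```r
flights <- data.frame(
  DISTANCE = c(100, 200, 300, 400, 500),
  AIR_TIME = c(20, 35, 50, 65, 80),  # exactly linear in DISTANCE (toy data)
  DAY_OF_WEEK = c(1, 5, 2, 4, 3)
)

r <- cor(flights, method = "pearson")
threshold <- 0.9

# Look only above the diagonal so each pair is reported once.
idx <- which(abs(r) > threshold & upper.tri(r), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r = r[idx],
  stringsAsFactors = FALSE
)
print(high_pairs)

# Drop the second variable of each highly correlated pair.
reduced <- flights[, setdiff(names(flights), unique(high_pairs$var2)), drop = FALSE]
print(names(reduced))
```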
Plot 3: X = Sch_Air_Time vs. Y = Actual_Air_Time
There are more correlation plots for other variables, such as scheduled departure time, actual departure time and wheels-off time. They are not attached to the report because they are very similar to the plots above.

MODEL ASSESSMENT AND EVALUATION
Based on the raw data, the binary delay classes are distributed as follows:

Delay   Frequency   Percentage
0       3607308     62.92%
1       2125618     37.07%
Total   5732926     100.00%

The model used to fit this data was a random forest (R package randomForest). The function call is:

RFGeoSpa1 <- randomForest(Delay ~ ., data = Train1, ntree = 400, mtry = 15, nodesize = 1)

The ROC curve is as follows:
At a cutoff of 0.45, the true positive rate (TPR) was 76.09% and the true negative rate (TNR) was 51.60%. These rates correspond to reading the confusion matrix below with rows as the predicted class, columns as the actual class, and on-time flights (0) as the positive class:

              Actual 0   Actual 1
Predicted 0      18141       6748
Predicted 1       5699       7197

The accuracy of the model is 67.05%.

LESSONS LEARNT
Random forests give very good predictions when the data is relatively noise-free, but a single model should not be relied upon. Multiple models should be built and compared so that each validates the others' accuracy. This not only validates the models but also prompts the question of why one model performs better than another. The random forest fitted in this study may not be performing optimally with the data provided. More variables may be required, such as type of aircraft, number of passengers and number of support staff, and adding them may improve model performance.
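The reported rates can be recomputed from the confusion matrix. This base-R sketch assumes rows are predicted classes, columns are actual classes and on-time flights (0) are the positive class. That is the reading consistent with the reported TPR and TNR. Computed this way, TPR ≈ 0.7609, TNR ≈ 0.5161 and accuracy ≈ 0.6706, so the paper's 51.60% and 67.05% appear to be truncated rather than rounded.

```r
# Confusion matrix from the report; column-major fill gives
# row "0" = 18141, 6748 and row "1" = 5699, 7197.
cm <- matrix(c(18141, 5699, 6748, 7197), nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))

tp <- cm["0", "0"]  # on time, predicted on time
fn <- cm["1", "0"]  # on time, predicted delayed
fp <- cm["0", "1"]  # delayed, predicted on time
tn <- cm["1", "1"]  # delayed, predicted delayed

tpr <- tp / (tp + fn)                   # recall for on-time flights
tnr <- tn / (tn + fp)                   # equals recall for delayed flights
accuracy <- sum(diag(cm)) / sum(cm)

print(round(100 * c(TPR = tpr, TNR = tnr, Accuracy = accuracy), 2))
```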
ERRORS IN PRESENTATION
When fitting a random forest, variable importance should only be considered or evaluated with the importance() function in R when the forest is used for regression, not when classification is being performed. This was an error in the presentation, and this section is an attempt to correct the mistake.

USE OF SOFTWARE
1. R packages
2. Excel and Excel Power Map
3. Tableau
REFERENCES
1. http://kellyjclifton.com/Research/EconImpactsofBicycling/OTRECReport-ConsBehavTravelChoices_Nov2012.pdf
2. An Introduction to Statistical Learning with Applications in R
3. Class Notes, Engineering Analytics, Dr. Lin
4. https://stackoverflow.com/
5. https://stackexchange.com/
6. https://www.bts.gov/
7. https://www.kaggle.com/datasets
8. https://www.rdocumentation.org/
APPENDIX
Correlations
<- FlightDF[sample(nrow(FlightDF), 5000000, replace = FALSE, prob = NULL), ]
> cor(FlightsDataSample1$WEATHER_DELAY, FlightsDataSample1$LATE_AIRCRAFT_DELAY, use = "na.or.complete", method = c("pearson", "kendall", "spearman"))
[1] -0.02135492
> cor(FlightsDataSample1$WEATHER_DELAY, FlightsDataSample1$AIRLINE_DELAY, use = "na.or.complete", method = c("pearson", "kendall", "spearman"))
[1] -0.05103192
> cor(FlightsDataSample1$WEATHER_DELAY, FlightsDataSample1$SECURITY_DELAY, use = "na.or.complete", method = c("pearson", "kendall", "spearman"))
[1] -0.004781347
> cor(FlightsDataSample1$WEATHER_DELAY, FlightsDataSample1$AIR_SYSTEM_DELAY, use = "na.or.complete", method = c("pearson", "kendall", "spearman"))
[1] -0.0005082514
> cor(FlightsDataSample1$SCHEDULED_TIME, FlightsDataSample1$ELAPSED_TIME, use = "na.or.complete", method = c("pearson", "kendall", "spearman"))
[1] 0.9852726
> cor(FlightsDataSample1$SCHEDULED_TIME, FlightsDataSample1$AIR_TIME, use = "na.or.complete", method = c("pearson", "kendall", "spearman"))
[1] 0.9907503
> cor(FlightsDataSample1$ELAPSED_TIME, FlightsDataSample1$ELAPSED_TIME, use = "na.or.complete", method = c("pearson", "kendall", "spearman"))
[1] 1
> cor(FlightsDataSample1$DISTANCE, FlightsDataSample1$AIR_TIME, use = "na.or.complete", method = c("pearson", "kendall", "spearman"))
[1] 0.9856394
> cor(FlightsDataSample1$DISTANCE, FlightsDataSample1$SCHEDULED_TIME, use = "na.or.complete", method = c("pearson", "kendall", "spearman"))
[1] 0.9843424

These values are Pearson coefficients, because cor() uses only the first entry of its method vector. The ELAPSED_TIME call compares the column with itself, so its result of 1 is trivial.

DATA FOR MODEL
The dataset is ~550 MB in size and will be provided to the course instructor on request. The code can also be sent to the instructor on request.
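The sketch below uses invented toy data to show how to compute the Pearson, Kendall and Spearman coefficients explicitly, one call per method. The single call style used in the appendix is included for comparison. The vectors x and y are hypothetical and include one incomplete pair to exercise use = "na.or.complete".

```r
x <- c(1, 2, 3, 4, 5, NA)
y <- c(2, 1, 4, 3, 5, 7)

# Appendix style: cor() resolves `method` with match.arg(), so passing the
# full default vector selects only its first entry, "pearson".
r_default <- cor(x, y, use = "na.or.complete",
                 method = c("pearson", "kendall", "spearman"))

# One call per method; the incomplete pair (NA, 7) is dropped first.
methods <- c("pearson", "kendall", "spearman")
r_all <- sapply(methods, function(m) cor(x, y, use = "na.or.complete", method = m))

print(r_default)
print(r_all)
```

On these values Pearson and Spearman agree (0.8) because y has no ties and its ranks equal its values, while Kendall's tau is 0.6 (8 concordant and 2 discordant pairs).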