University of Cincinnati, Carl H. Lindner College of Business
MS BANA 2017-18
Statistical Computing Project
Study of factors affecting aircraft landing distance
Samrudh Keshava Kumar
M12420395
(003)
The aim of this project is to study simulated data on 950 commercial flight landings and to
understand the factors that affect landing distance. The data was first processed to remove
missing or abnormal values before proceeding with the analysis. Bivariate analysis between the
variables showed that the speed and the type of aircraft had a strong positive impact on the
landing distance. A regression model was built with all the available variables and then improved
based on its diagnostic plots. The regression analysis found that the speed, type, pitch and
height of the aircraft have a significant effect on the landing distance. The initial model had an
R-squared of 0.85 and a MAPE of 22.5%; in the final model the R-squared increased to 0.97 and
the MAPE fell to 10.8%.
Chapter 1
Data exploration and data cleaning
Aim: To verify data quality and correct any issues before proceeding with the analysis.
Loading the datasets into the SAS environment
PROC IMPORT DATAFILE='/home/samrudhkumar0/Project/FAA1.csv'
DBMS=CSV
OUT=FAA1;
GETNAMES=YES;
RUN;
PROC IMPORT DATAFILE='/home/samrudhkumar0/Project/FAA2.csv'
DBMS=CSV
REPLACE
OUT=FAA2;
GETNAMES=YES;
RUN;
/*Print the top 10 rows of data of each dataset*/
PROC PRINT DATA=faa1(obs=10);
RUN;
PROC PRINT DATA=faa2(obs=10);
RUN;
The datasets have been loaded into the SAS environment as FAA1 and FAA2. A summary of the
data is obtained using PROC MEANS.
PROC MEANS DATA = FAA1 N NMISS MAX MIN MEAN MEDIAN VAR;
TITLE "Basic Summary of FAA1";
RUN;
PROC MEANS DATA = FAA2 N NMISS MAX MIN MEAN MEDIAN VAR;
TITLE "Basic Summary of FAA2";
RUN;
/*Observed that FAA2 has a few empty rows; they are removed in the next step*/
DATA NO_DEADROWS;
SET FAA2;
IF MISSING(AIRCRAFT) THEN DELETE;
RUN;
/*The 50 empty observations have been removed; the dataset now contains 150
observations*/
The empty rows of data have been removed. The missing values under speed_air will be dealt with
later.
Combining data sets from different sources
Before the datasets can be merged, SAS requires that both be sorted in the same fashion.
Aircraft name and speed_ground are the variables that uniquely identify an observation, so the
two datasets can be merged on them.
/*Sorting the datasets before merging*/
PROC SORT DATA = FAA1;
BY aircraft speed_ground;
RUN;
PROC SORT DATA = NO_DEADROWS;
BY aircraft speed_ground;
RUN;
DATA MERGED;
MERGE FAA1 NO_DEADROWS;
BY aircraft speed_ground;
/*Merging by speed_ground since there is
repetition in the data; speed_ground has unique values, so it is well suited as a primary key*/
RUN;
/*850 OBSERVATIONS AFTER MERGE*/
The combined dataset should have had 800 + 150 = 950 observations, but it contains 850.
This shows that 100 observations were not unique. A summary of the
merged data is shown in the table below.
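The gap between the expected 950 rows and the 850 obtained can be inspected directly by stacking the two sources and separating the repeated keys. The following is a minimal sketch, assuming the FAA1 and NO_DEADROWS datasets created above; STACKED, DEDUPED and DUPES are hypothetical names introduced only for illustration.

```sas
/* Stack both sources so that repeated keys sit in one dataset. */
DATA STACKED;
    SET FAA1 NO_DEADROWS;
RUN;

/* NODUPKEY keeps the first row of each (aircraft, speed_ground) key in DEDUPED
   and DUPOUT= writes every discarded repeat to DUPES. */
PROC SORT DATA = STACKED OUT = DEDUPED DUPOUT = DUPES NODUPKEY;
    BY aircraft speed_ground;
RUN;

/* Inspect the repeated rows. */
PROC PRINT DATA = DUPES;
RUN;
```

If the counts reported here hold, DUPES should contain 100 rows and DEDUPED 850. Unlike MERGE, which overlays values from both sources for a matching key, NODUPKEY simply keeps the first row per key, so this is a check on the merge rather than a replacement for it.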
Performing the completeness check of each variable
The MEANS procedure is used with the N and NMISS options to display the number of observations
and the number of missing values in each variable.
PROC MEANS DATA = MERGED N NMISS;
RUN;
/*Treating missing values - duration, speed_air*/
642 and 50 values are missing from the variables speed_air and duration, respectively.
Performing the validity check of each variable
The UNIVARIATE procedure is run to determine the quantile ranges; any values above the 99% level
or below the 1% level can be treated as abnormal.
PROC UNIVARIATE DATA=MERGED PLOT;
RUN;
[Figure: PROC UNIVARIATE plots for No. of passengers, Speed_ground, Height, Pitch, Distance and Speed_air]
Data cleaning
Based on the understanding of the data from the previous steps, abnormal values of speed_ground,
height, duration and distance are deleted from the analytical dataset and moved to a new dataset
containing only the removed observations. For the variable ‘duration’, 50 values are missing
against 781 non-missing values (~6%); the missing values are replaced with the average value.
For the variable ‘speed_air’, 628 values are missing against only 203 non-missing values, so most
of the column is absent; these missing values are not replaced at this stage, since doing so would
introduce approximation errors.
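DATA TREATED_DATA;
SET MERGED;
/*Remove abnormal landings as defined in the variable dictionary*/
IF SPEED_GROUND < 30 THEN DELETE;
IF SPEED_GROUND > 140 THEN DELETE;
IF HEIGHT < 6 THEN DELETE;
/*DURATION > 0 keeps rows with a missing duration, which are imputed next*/
IF (DURATION < 40 AND DURATION > 0) THEN DELETE;
/*Impute missing duration with the average duration*/
IF MISSING(DURATION) THEN DURATION = 154.0065385;
IF DISTANCE > 6000 THEN DELETE;
RUN;
/*831 OBSERVATIONS REMAINING*/
PROC MEANS DATA = TREATED_DATA N NMISS;
RUN;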
The treated dataset contains 831 observations and 0 missing values for all variables except
‘speed_air’.
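The hard-coded mean in the DATA step above has to be recomputed by hand whenever the filters change. One alternative, shown here as a sketch rather than the approach used in this project, is to let PROC STDIZE fill the gaps: with the REPONLY option it replaces only missing values and does not standardize the rest. FILTERED and TREATED_ALT are hypothetical dataset names; MERGED is the dataset created earlier.

```sas
/* Apply the same filters as above, but leave missing DURATION values untouched. */
DATA FILTERED;
    SET MERGED;
    IF SPEED_GROUND < 30 OR SPEED_GROUND > 140 THEN DELETE;
    IF HEIGHT < 6 THEN DELETE;
    IF (DURATION < 40 AND DURATION > 0) THEN DELETE;
    IF DISTANCE > 6000 THEN DELETE;
RUN;

/* METHOD=MEAN with REPONLY: missing DURATION values are replaced by the
   mean of DURATION in FILTERED; non-missing values are left as they are. */
PROC STDIZE DATA = FILTERED OUT = TREATED_ALT METHOD = MEAN REPONLY;
    VAR DURATION;
RUN;

PROC MEANS DATA = TREATED_ALT N NMISS;
RUN;
```

The value imputed this way is the mean of DURATION after filtering, which need not match the constant 154.0065385 exactly.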
PROC SORT DATA = TREATED_DATA;
BY AIRCRAFT SPEED_GROUND;
RUN;
PROC SORT DATA = MERGED;
BY AIRCRAFT SPEED_GROUND;
RUN;
/*Keep the rows present in only one of the two datasets, i.e. the removed rows*/
DATA COMPLEMENT;
MERGE TREATED_DATA (IN = X) MERGED (IN = Y);
BY AIRCRAFT SPEED_GROUND;
IF (X = 1 AND Y = 0) OR (X = 0 AND Y = 1);
DROP DURATION SPEED_AIR;
RUN;
PROC PRINT DATA = COMPLEMENT;
RUN;
The above statements generate the table of observations that were removed from the main dataset.
It contains 19 observations, as expected.
Summarizing the distribution
To summarize the distribution of each variable, it is sufficient to look at the mean and
median values of each.
PROC MEANS DATA=TREATED_DATA N MEAN MEDIAN;
TITLE "MEAN AND MEDIAN OF TREATED DATA";
RUN;
The mean and median values of all the variables except distance are close to each other, which
suggests that their distributions are roughly symmetric and consistent with a normal distribution.
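Closeness of mean and median points to symmetry, which is a weaker property than normality. A more direct check, sketched below on the TREATED_DATA dataset from the previous steps, uses the NORMAL option of PROC UNIVARIATE, which prints tests for normality, together with histograms overlaid with a fitted normal curve. This is an illustrative addition, not part of the original analysis.

```sas
/* NORMAL requests tests for normality for each variable in the VAR list. */
PROC UNIVARIATE DATA = TREATED_DATA NORMAL;
    VAR NO_PASG SPEED_GROUND HEIGHT PITCH DURATION DISTANCE;
    /* Overlay a fitted normal density on the histogram of each variable. */
    HISTOGRAM / NORMAL;
RUN;
```

Small p-values in the normality tests, or a histogram that departs visibly from the fitted curve, would argue against treating a variable as normally distributed.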
Using the UNIVARIATE procedure, the distance variable is plotted to understand its distribution.
PROC UNIVARIATE DATA=TREATED_DATA PLOT;
VAR DISTANCE;
RUN;
The distance variable follows a skewed distribution, and most observations fall between 600 and
1000 feet.
It was observed that 100 observations were duplicates, and these were removed. The variable
speed_air had 628 missing observations; these missing values are treated during the data analysis steps.
Chapter 2
Data Visualization
Aim: To understand how the independent variables (factors) affect the dependent variable (distance)
being modelled.
Since the data is being modelled using linear regression, it is assumed that the independent variables
have a linear relationship with the predicted variable. The slope of each plot indicates the impact
the independent variable has on the dependent variable (the variable being predicted), and its
shape indicates the type of relationship (linear, quadratic, etc.) and the spread or variability of the
data.
/*Chapter 2 visualization*/
/*Plotting distance of landing against the other variables to understand the relationships*/
proc plot data = treated_data;
plot distance*pitch;
plot distance*height;
plot distance*speed_air;
plot distance*speed_ground;
plot distance*no_pasg;
plot distance*duration;
plot distance*aircraft;
run;
The plot indicates that the pitch of the aircraft does not have much of an impact on the landing
distance; the data is concentrated in the centre of the plot and has high variability.
The height of the aircraft above the threshold of the runway has a slight positive impact on the
landing distance.
The variable speed_air has a minimum value of 90 MPH, below which values have not
been captured in the data. It shows a high positive correlation with the
landing distance, and the spread of the data points looks minimal. In the regression
analysis carried out later, this variable should therefore have a high significance.
The speed_ground variable has a quadratic relationship with the landing distance: below 70 MPH
the impact is almost negligible, but above 70 MPH there seems to be a high positive correlation,
similar to that observed for the speed_air variable.
The no_pasg variable (number of passengers) does not seem to have an impact on the landing distance.
The duration of the flight seems to have a slight negative impact on the landing distance.
The type of aircraft seems to affect the landing distance; Airbus seems to exhibit shorter
landing distances than Boeing.
To further understand the strength of the relationships, the correlation between the variables is
calculated using the CORR procedure in SAS.
In the previous plot for speed_ground, the curve seems to be flat below 70 MPH. To test this, a subset
of the data below 70 MPH is taken and the correlation between speed_ground and
distance is calculated.
data ground_speed_low;
set treated_data;
if speed_ground > 70 then delete;
keep speed_ground distance;
run;
proc corr data = ground_speed_low;
run;
The correlation between the two variables is 0.11, meaning that the speed of the aircraft has minimal
impact on the landing distance below 70 MPH; it is 0.39 for speeds below 80 MPH and 0.65 for speeds
below 90 MPH. For speed_air, the missing values could be approximated as equal to the
speed_ground values.
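The three correlations quoted above come from repeating the same subset-and-correlate step with different cutoffs. A sketch of how this could be parameterised, assuming the treated_data dataset from Chapter 1; the macro name corr_below is hypothetical and not part of the original code.

```sas
/* Hypothetical helper: correlation of speed_ground and distance for
   landings at or below a given ground-speed cutoff (in MPH). */
%macro corr_below(cutoff);
    proc corr data = treated_data;
        where speed_ground <= &cutoff;   /* same subset as deleting speed_ground > cutoff */
        var speed_ground distance;
        title "speed_ground vs distance, speed_ground <= &cutoff MPH";
    run;
%mend corr_below;

%corr_below(70);
%corr_below(80);
%corr_below(90);
title;   /* clear the title afterwards */
```

Using a WHERE statement inside PROC CORR avoids creating a separate dataset for each cutoff.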
/*Calculating the correlation between the variables*/
proc corr data = treated_data;
run;
The highlighted values indicate variables which are highly correlated. The variables speed_air and
speed_ground are highly correlated with each other and with the predicted variable
(distance). One of the two should be eliminated to prevent multicollinearity.
The variables speed_ground, speed_air and aircraft type seem to have an impact on the landing
distance. ‘speed_ground’ and ‘speed_air’ have the highest correlation coefficients with distance and
are correlated with each other. The missing values of speed_air could be imputed with the values
from speed_ground, and the speed_ground variable could then be eliminated altogether.
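The overlap between the two speed variables can also be quantified with variance inflation factors. The following is a sketch, assuming the treated_data dataset; it is an illustrative addition rather than a step taken in the original analysis. PROC REG excludes rows with a missing speed_air, so only the landings with both speeds recorded enter this fit.

```sas
/* The VIF option prints a variance inflation factor for each regressor. */
proc reg data = treated_data;
    model distance = speed_air speed_ground height pitch / vif;
run;
quit;
```

Large VIF values for speed_air and speed_ground (values above 10 are a commonly used rule of thumb) would support dropping one of the two.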
Chapter 3
Statistical Modelling
Aim: Understand the variables significantly affecting the landing distance and fit a linear model to
predict the landing distance of the aircraft.
SAS Codes and outputs:
From the previous chapter, the variables speed_air, speed_ground and aircraft have a significant impact
on the landing distance. To include aircraft as a variable in the linear model, a dummy variable
called aircraft_type is created with values 0 and 1 for Airbus and Boeing, respectively.
/*Run a paired t-test on speed_ground and speed_air*/
data speeds_df;
set treated_data;
if missing(speed_air) then delete;
keep speed_air speed_ground;
run;
proc ttest data = speeds_df;
paired speed_air*speed_ground;
run;
The null hypothesis being tested is that the difference between the means of the two variables is
zero. The null hypothesis cannot be rejected because p > 0.05; therefore the two
variables can be treated as similar. The mean difference between the two is 0.0739 MPH and the
correlation is 0.987. Given this evidence, speed_ground is very similar to speed_air, and the missing
values of speed_air can be imputed with values from speed_ground.
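For reference, the statistic behind the PAIRED statement is the standard paired t statistic, computed on the per-landing differences. The symbols below are notation introduced here, not taken from the SAS output.

```latex
d_i = \text{speed\_air}_i - \text{speed\_ground}_i, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
t = \frac{\bar{d}}{s_d / \sqrt{n}}
```

Here $n$ is the number of landings with both speeds recorded, $s_d$ is the sample standard deviation of the differences $d_i$, and $t$ is compared against a $t$ distribution with $n-1$ degrees of freedom. A non-significant result does not by itself prove that the two variables are interchangeable, which is why the small mean difference and the high correlation are cited alongside it.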
A new dataset is created with the above-mentioned changes.
/*Creating a dummy variable for aircraft type to include aircraft type as a
*variable in the linear model
*/
data final_model_data;
set treated_data;
if aircraft = 'airbus' then aircraft_type = 0;
else aircraft_type = 1;
if missing(speed_air) then speed_air = speed_ground;
drop aircraft speed_ground;
run;
proc means data = final_model_data N Nmiss;
run;
/*Generate correlation matrix*/
proc corr data = final_model_data;
run;
Variables with a high correlation with distance are highlighted. None of the independent variables are
correlated with each other.
The final dataset has no missing values and 831 observations. The variables speed_ground and aircraft
have been eliminated; further analysis is performed on this dataset.
A regression model is fitted on the dataset.
/*Fitting a regression model*/
proc reg data = final_model_data;
model distance = speed_air aircraft_type no_pasg pitch height duration;
run;
Below is the summary of the correlation and the regression analysis of the independent variables.

Independent     Direction            Correlation   P-value of     Regression coefficient   P-value of regression
variable                             coefficient   correlation    (Distance ~ All)         coefficient (Distance ~ All)
--------------  -------------------  ------------  -------------  -----------------------  ----------------------------
speed_air       Strong positive      0.8675        <.0001         42.45547                 <.0001
aircraft_type   -                    0.2381        <.0001         481.22446                <.0001
no_pasg         No visible relation  -0.0177       0.6093         -2.15925                 0.1806
pitch           No visible relation  0.08703       0.0121         34.84949                 0.1552
height          No visible relation  0.09941       0.5082         14.07733                 <.0001
duration        Slight negative      -0.04995      0.1503         0.00415                  0.9871
The next step is to eliminate, one by one, the variables whose regression coefficients are not
significant (p-value > 0.05).
The resultant model uses speed_air, aircraft_type and height as independent variables. The R-squared
is 0.85.
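The one-by-one elimination described above can also be delegated to PROC REG. The following is a sketch under the assumption that a 0.05 significance level is the intended cutoff; it is not the code used in the original analysis.

```sas
/* Backward elimination: start from the full model and repeatedly drop the
   least significant variable until every remaining one has p <= 0.05. */
proc reg data = final_model_data;
    model distance = speed_air aircraft_type no_pasg pitch height duration
          / selection = backward slstay = 0.05;
run;
quit;
```

SLSTAY sets the significance level a variable needs in order to stay in the model. Given the p-values in the table above, this procedure would be expected to arrive at the same three variables, though the p-values change after each removal and should be checked in the output.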
Chapter 4
Model Validation
Aim: Diagnose the model performance by analysing the plot of the residuals, the R-squared and the
MAPE of the predicted values.
/*Model validation: check if the residuals are normally distributed*/
proc reg data = final_model_data;
model distance = speed_air aircraft_type height;
run;
The fit diagnostics for the predicted variable show that the residuals are not random. The
non-random pattern shows that the linear model is inappropriate and that the data needs some
transformation. The model underestimates the relationship at the extreme ranges of landing
distance.
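When only the residual diagnostics are of interest, the plots can be requested explicitly. This is a sketch, assuming ODS Graphics is available in the SAS environment in use; it refits the same three-variable model.

```sas
ods graphics on;

/* PLOTS(ONLY)= restricts the output to the listed diagnostics:
   residuals against predicted values, and a normal Q-Q plot of the residuals. */
proc reg data = final_model_data plots(only) = (residualbypredicted qqplot);
    model distance = speed_air aircraft_type height;
run;
quit;
```

A curved band in the residual-by-predicted plot is the kind of non-random pattern described above; points bending away from the reference line in the Q-Q plot indicate non-normal residuals.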
Calculation of MAPE
proc reg data = final_model_data;
model distance = speed_air aircraft_type height;
output out=predicted_values predicted=py;
run;
data predicted_values;
set predicted_values;
/*Absolute error as a fraction of the actual distance*/
error_abs = abs(distance - py)/distance;
keep distance py error_abs;
run;
proc means data = predicted_values N mean;
var error_abs;
run;
/*MAPE is 22.575%*/
With a MAPE this high, the model's prediction accuracy is poor; the predictions could be improved by
transforming a few predictor variables.
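For reference, the quantity computed by the steps above can be written as follows, where $y_i$ is the observed landing distance, $\hat{y}_i$ the predicted distance (py) and $n$ the number of observations; this notation is introduced here for illustration.

```latex
\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \frac{\lvert y_i - \hat{y}_i \rvert}{y_i}
```

The variable error_abs holds the individual terms $\lvert y_i - \hat{y}_i \rvert / y_i$ as fractions, so the mean reported by PROC MEANS has to be multiplied by 100 to be read as a percentage; a mean of about 0.22575 corresponds to the 22.575% quoted above.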
Chapter 5
Remodelling and model Validation
Aim: Transform the predictor variables and ensure that the residual plot is random. Compare the new models
with the base model.
SAS Codes:
data remodelling_data;
set final_model_data;
speed_air_4 = speed_air**4;
speed_air_3 = speed_air**3;
speed_air_2 = speed_air**2;
height_pitch = height*pitch;
run;
proc means data = remodelling_data N Nmiss Min Max Median;
run;
proc corr data = remodelling_data;
run;
From the correlation output, speed_air_4 gives the highest correlation with distance. height_pitch,
the product of height and pitch, has a higher correlation than either of the individual
variables. Both have been selected for the final model's list of independent variables.
/*speed_air has no missing values*/
proc plot data = remodelling_data;
plot distance*speed_air;
plot distance*speed_air_2;
plot distance*speed_air_3;
plot distance*speed_air_4;
plot distance*height_pitch;
run;
The transformed speed_air variable (speed_air^4) shows a linear relationship with the landing
distance. The speed_air variable will be replaced with speed_air_4.
/*Fitting a regression model*/
proc reg data = remodelling_data;
model distance = speed_air_4 aircraft_type height_pitch;
run;
The model has a better residual plot, though it underpredicts the longer
landing distances; this is acceptable given the lack of data points covering these scenarios. The
R-squared has improved from 0.85 to 0.97, indicating higher prediction accuracy.
proc reg data = remodelling_data;
model distance = speed_air_4 aircraft_type height_pitch;
output out=predicted_values predicted=py;
run;
data predicted_values;
set predicted_values;
error_abs = abs(distance - py)/distance;
keep distance py error_abs;
run;
proc means data = predicted_values N mean;
var error_abs;
run;
The MAPE (Mean Absolute Percentage Error) of the improved model is 10.88%.
The MAPE has reduced from 22.58% to 10.88%; the transformation of the data improved the
accuracy of the predictions.
The model can be further improved with more data points, especially in scenarios where the
landing distance is greater than 4000 feet, since these are the cases to be predicted. More variables,
such as the gross weight of the aircraft, the aircraft model number and the wind direction, could
improve this model significantly.
Appendix.
Variable dictionary:
Aircraft: The make of an aircraft (Boeing or Airbus).
Duration (in minutes): Flight duration between taking off and landing. The duration of a normal
flight should always be greater than 40 min.
No_pasg: The number of passengers in a flight.
Speed_ground (in miles per hour): The ground speed of an aircraft when passing over the threshold
of the runway. If its value is less than 30 MPH or greater than 140 MPH, then the landing would be
considered as abnormal.
Speed_air (in miles per hour): The air speed of an aircraft when passing over the threshold of the
runway. If its value is less than 30 MPH or greater than 140 MPH, then the landing would be
considered as abnormal.
Height (in meters): The height of an aircraft when it is passing over the threshold of the runway. The
landing aircraft is required to be at least 6 meters high at the threshold of the runway.
Pitch (in degrees): Pitch angle of an aircraft when it is passing over the threshold of the runway.
Distance (in feet): The landing distance of an aircraft. More specifically, it refers to the distance
between the threshold of the runway and the point where the aircraft can be fully stopped. The
length of the airport runway is typically less than 6000 feet.

More Related Content

Similar to Predicting aircraft landing distances using linear regression

Predicting landing distance: Adrian Valles
Predicting landing distance: Adrian VallesPredicting landing distance: Adrian Valles
Predicting landing distance: Adrian VallesAdrián Vallés
 
Comparative Study on the Prediction of Remaining Useful Life of an Aircraft E...
Comparative Study on the Prediction of Remaining Useful Life of an Aircraft E...Comparative Study on the Prediction of Remaining Useful Life of an Aircraft E...
Comparative Study on the Prediction of Remaining Useful Life of an Aircraft E...IRJET Journal
 
Flight Landing Analysis
Flight Landing AnalysisFlight Landing Analysis
Flight Landing AnalysisTauseef Alam
 
Exploring the prospect of operating low cost carriers and legacy carriers fro...
Exploring the prospect of operating low cost carriers and legacy carriers fro...Exploring the prospect of operating low cost carriers and legacy carriers fro...
Exploring the prospect of operating low cost carriers and legacy carriers fro...Nikhil Menon
 
Aircraft position estimation using angle of arrival of received radar signals
Aircraft position estimation using angle of arrival of received radar signalsAircraft position estimation using angle of arrival of received radar signals
Aircraft position estimation using angle of arrival of received radar signalsjournalBEEI
 
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET Journal
 
A study of the Behavior of Floating-Point Errors
A study of the Behavior of Floating-Point ErrorsA study of the Behavior of Floating-Point Errors
A study of the Behavior of Floating-Point Errorsijpla
 
Climate Visibility Prediction Using Machine Learning
Climate Visibility Prediction Using Machine LearningClimate Visibility Prediction Using Machine Learning
Climate Visibility Prediction Using Machine LearningIRJET Journal
 
Climate Visibility Prediction Using Machine Learning
Climate Visibility Prediction Using Machine LearningClimate Visibility Prediction Using Machine Learning
Climate Visibility Prediction Using Machine LearningIRJET Journal
 
Malta international airport 1
Malta international airport 1Malta international airport 1
Malta international airport 1Mohammed Hadi
 
A Method for Determining and Improving the Horizontal Accuracy of Geospatial ...
A Method for Determining and Improving the Horizontal Accuracy of Geospatial ...A Method for Determining and Improving the Horizontal Accuracy of Geospatial ...
A Method for Determining and Improving the Horizontal Accuracy of Geospatial ...Juan Tobar
 
GRADIENT OMISSIVE DESCENT IS A MINIMIZATION ALGORITHM
GRADIENT OMISSIVE DESCENT IS A MINIMIZATION ALGORITHMGRADIENT OMISSIVE DESCENT IS A MINIMIZATION ALGORITHM
GRADIENT OMISSIVE DESCENT IS A MINIMIZATION ALGORITHMijscai
 
Air Traffic Control And Management System
Air Traffic Control And Management SystemAir Traffic Control And Management System
Air Traffic Control And Management SystemJeff Brooks
 
AIAA-MAO-DSUS-2012
AIAA-MAO-DSUS-2012AIAA-MAO-DSUS-2012
AIAA-MAO-DSUS-2012OptiModel
 
j2 Universal - Modelling and Tuning Braking Characteristics
j2 Universal  - Modelling and Tuning Braking Characteristicsj2 Universal  - Modelling and Tuning Braking Characteristics
j2 Universal - Modelling and Tuning Braking CharacteristicsJohn Jeffery
 
DSUS_MAO_2012_Jie
DSUS_MAO_2012_JieDSUS_MAO_2012_Jie
DSUS_MAO_2012_JieMDO_Lab
 

Similar to Predicting aircraft landing distances using linear regression (20)

Predicting landing distance: Adrian Valles
Predicting landing distance: Adrian VallesPredicting landing distance: Adrian Valles
Predicting landing distance: Adrian Valles
 
Comparative Study on the Prediction of Remaining Useful Life of an Aircraft E...
Comparative Study on the Prediction of Remaining Useful Life of an Aircraft E...Comparative Study on the Prediction of Remaining Useful Life of an Aircraft E...
Comparative Study on the Prediction of Remaining Useful Life of an Aircraft E...
 
Flight Landing Analysis
Flight Landing AnalysisFlight Landing Analysis
Flight Landing Analysis
 
Exploring the prospect of operating low cost carriers and legacy carriers fro...
Exploring the prospect of operating low cost carriers and legacy carriers fro...Exploring the prospect of operating low cost carriers and legacy carriers fro...
Exploring the prospect of operating low cost carriers and legacy carriers fro...
 
Aircraft position estimation using angle of arrival of received radar signals
Aircraft position estimation using angle of arrival of received radar signalsAircraft position estimation using angle of arrival of received radar signals
Aircraft position estimation using angle of arrival of received radar signals
 
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
 
A study of the Behavior of Floating-Point Errors
A study of the Behavior of Floating-Point ErrorsA study of the Behavior of Floating-Point Errors
A study of the Behavior of Floating-Point Errors
 
Climate Visibility Prediction Using Machine Learning
Climate Visibility Prediction Using Machine LearningClimate Visibility Prediction Using Machine Learning
Climate Visibility Prediction Using Machine Learning
 
Climate Visibility Prediction Using Machine Learning
Climate Visibility Prediction Using Machine LearningClimate Visibility Prediction Using Machine Learning
Climate Visibility Prediction Using Machine Learning
 
Malta international airport 1
Malta international airport 1Malta international airport 1
Malta international airport 1
 
A Method for Determining and Improving the Horizontal Accuracy of Geospatial ...
A Method for Determining and Improving the Horizontal Accuracy of Geospatial ...A Method for Determining and Improving the Horizontal Accuracy of Geospatial ...
A Method for Determining and Improving the Horizontal Accuracy of Geospatial ...
 
Final project
Final projectFinal project
Final project
 
ml-05x01.pdf
ml-05x01.pdfml-05x01.pdf
ml-05x01.pdf
 
GRADIENT OMISSIVE DESCENT IS A MINIMIZATION ALGORITHM
GRADIENT OMISSIVE DESCENT IS A MINIMIZATION ALGORITHMGRADIENT OMISSIVE DESCENT IS A MINIMIZATION ALGORITHM
GRADIENT OMISSIVE DESCENT IS A MINIMIZATION ALGORITHM
 
Air Traffic Control And Management System
Air Traffic Control And Management SystemAir Traffic Control And Management System
Air Traffic Control And Management System
 
AIAA-MAO-DSUS-2012
AIAA-MAO-DSUS-2012AIAA-MAO-DSUS-2012
AIAA-MAO-DSUS-2012
 
Flight Delay Prediction
Flight Delay PredictionFlight Delay Prediction
Flight Delay Prediction
 
j2 Universal - Modelling and Tuning Braking Characteristics
j2 Universal  - Modelling and Tuning Braking Characteristicsj2 Universal  - Modelling and Tuning Braking Characteristics
j2 Universal - Modelling and Tuning Braking Characteristics
 
DSUS_MAO_2012_Jie
DSUS_MAO_2012_JieDSUS_MAO_2012_Jie
DSUS_MAO_2012_Jie
 
Flights Landing Overrun Project
Flights Landing Overrun ProjectFlights Landing Overrun Project
Flights Landing Overrun Project
 

Recently uploaded

怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjurptikerjasaptiker
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制vexqp
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制vexqp
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........EfruzAsilolu
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdftheeltifs
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 

Recently uploaded (20)

怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Predicting aircraft landing distances using linear regression
