The goal of this project is to study which factors affect the landing distance of a commercial flight, and how. For this I worked with a dataset containing details of BOEING and AIRBUS flights.
SUMMARY:
The goal of this project is to study which factors affect the landing distance of a
commercial flight, and how. For this I received a dataset containing details of BOEING and AIRBUS flights. My
first step was data preparation: remove empty rows, identify duplicate
rows, identify the sample size for each variable and take outliers out of the dataset. These
steps helped me understand the distribution of the variables, separate normal values from outliers and form
a new dataset with normal values.
To further understand the correlation of the variables with landing distance, I conducted a descriptive
study of the normal values with the response variable (distance) and all the predictor variables. This
study gave me some perspective on the relationship of speed_air and speed_ground with landing
distance. I used Pearson correlation to check collinearity between all the
variables, and it turned out that speed_air is linearly correlated with speed_ground, while
no other pair of variables shows any clear pattern with each other.
Finally, I fitted a linear regression model with intercept on the normal data after removing
speed_air. The equation is:
DISTANCE = -2826.33022 - 0.10338*(DURATION) - 2.35945*(NO_PASG) +
42.15245*(SPEED_GROUND) + 13.58277*(HEIGHT) + 187.99136*(PITCH)
We can see from the equation that flight duration and number of passengers have a
negative impact on landing distance, while ground speed, height of the aircraft and pitch have a positive
impact.
VARIABLE DICTIONARY:
Aircraft: The make of an aircraft (Boeing or Airbus).
Duration (in minutes): Flight duration between taking off and landing. The
duration of a normal flight should always be greater than 40 min.
No_pasg: The number of passengers in a flight.
Speed_ground (in miles per hour): The ground speed of an aircraft when
passing over the threshold of the runway. If its value is less than 30 MPH or
greater than 140 MPH, then the landing would be considered as abnormal.
Speed_air (in miles per hour): The air speed of an aircraft when passing over
the threshold of the runway. If its value is less than 30 MPH or greater than
140 MPH, then the landing would be considered as abnormal.
Height (in meters): The height of an aircraft when it is passing over the
threshold of the runway. The landing aircraft is required to be at least 6 meters
high at the threshold of the runway.
Pitch (in degrees): Pitch angle of an aircraft when it is passing over the
threshold of the runway.
Distance (in feet): The landing distance of an aircraft. More specifically, it
refers to the distance between the threshold of the runway and the point
where the aircraft can be fully stopped. The length of the airport runway is
typically less than 6000 feet.
CHAPTER 1: DATA PREPARATION
DESCRIPTION: Data preparation is a very important step in understanding the
sample size. My aim here is to retain as much data as possible to obtain the
best-fitted model.
STEP 1: Uploading Data Files:
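/* Import the first Excel file; the fileref FLIGHT1 points to FAA1.xls */
FILENAME flight1 '/home/sainijn0/Stat_computing/FAA1.xls';
PROC IMPORT DATAFILE=FLIGHT1
    DBMS=XLS
    OUT=FLIGHT1 REPLACE;  /* REPLACE lets the step be rerun */
    GETNAMES=YES;
RUN;
PROC PRINT DATA=FLIGHT1(obs=10);
RUN;
/* Import the second Excel file the same way */
FILENAME flight2 '/home/sainijn0/Stat_computing/FAA2.xls';
PROC IMPORT DATAFILE=FLIGHT2
    DBMS=XLS
    OUT=FLIGHT2 REPLACE;
    GETNAMES=YES;
RUN;
PROC PRINT DATA=FLIGHT2(obs=10);
RUN;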
STEP 2: Removing Empty Rows From Data Sets:
/* Delete a row only when every variable in it is missing */
DATA FLIGHT1;
    SET FLIGHT1;
    IF MISSING(NO_PASG) AND MISSING(DURATION) AND MISSING(AIRCRAFT)
       AND MISSING(SPEED_GROUND) AND MISSING(SPEED_AIR)
       AND MISSING(HEIGHT) AND MISSING(PITCH) AND MISSING(DISTANCE)
    THEN DELETE;
RUN;
PROC PRINT DATA=flight1(obs=10);
RUN;
/* Same all-missing check for the second data set */
DATA FLIGHT2;
    SET FLIGHT2;
    IF MISSING(NO_PASG) AND MISSING(DURATION) AND MISSING(AIRCRAFT)
       AND MISSING(SPEED_GROUND) AND MISSING(SPEED_AIR)
       AND MISSING(HEIGHT) AND MISSING(PITCH) AND MISSING(DISTANCE)
    THEN DELETE;
RUN;
PROC PRINT DATA=flight2(obs=10);
RUN;
STEP 3: Combining Data Sets:
/* Stack the rows of FLIGHT1 and FLIGHT2 into one data set */
DATA COMBINED_FLIGHT;
    SET FLIGHT1 FLIGHT2;
RUN;
PROC PRINT DATA=COMBINED_FLIGHT(OBS=10);
RUN;
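The Q&A at the end of this report states that 100 duplicate rows were removed, keyed on speed_ground, speed_air, height, pitch and distance, but the step that did this is not shown here. The following is a minimal sketch of one way it could be done, assuming PROC SORT with NODUPKEY fits the intent. The data set names DEDUP_SKETCH and DUPLICATE_ROWS are hypothetical and are not part of the original analysis.

```sas
/* Sketch only: keep the first row for each combination of the key
   variables and write the discarded duplicates to a separate data set.
   DEDUP_SKETCH and DUPLICATE_ROWS are illustrative names. */
PROC SORT DATA=COMBINED_FLIGHT
          OUT=DEDUP_SKETCH
          DUPOUT=DUPLICATE_ROWS
          NODUPKEY;
    BY SPEED_GROUND SPEED_AIR HEIGHT PITCH DISTANCE;
RUN;

/* Look at what was dropped before trusting the result */
PROC PRINT DATA=DUPLICATE_ROWS(OBS=10);
RUN;
```

The SAS log reports how many observations with duplicate key values were deleted, which can be compared with the count quoted in the Q&A. Note that the output is sorted by the key variables, so row order differs from COMBINED_FLIGHT.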
STEP 7: Identifying Abnormal Rows:
/* FLAG=1 marks a row that breaks any rule from the variable dictionary */
DATA VALIDATE1;
    SET COMBINED_FLIGHT;
    IF DURATION<40 THEN FLAG=1;
    ELSE IF HEIGHT<6 THEN FLAG=1;
    ELSE IF SPEED_GROUND<30 OR SPEED_GROUND>140 THEN FLAG=1;
    ELSE IF DISTANCE>6000 THEN FLAG=1;
    ELSE FLAG=0;
RUN;
PROC PRINT DATA=VALIDATE1(OBS=10);
RUN;
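One thing to keep in mind with the comparisons above: SAS treats a missing numeric value as smaller than any number, so a test such as DURATION<40 is also true when DURATION is missing. The variable dictionary also gives a 30 to 140 MPH rule for speed_air that the code above does not apply. The following is a hedged variant, not the code that produced the counts in this report, that guards each test with MISSING() and adds the speed_air rule. VALIDATE_SKETCH is a hypothetical data set name, and the resulting counts may differ from those reported here.

```sas
/* Sketch only: missing-aware version of the abnormal-row flag.
   VALIDATE_SKETCH is an illustrative name, not part of the original code. */
DATA VALIDATE_SKETCH;
    SET COMBINED_FLIGHT;
    FLAG = 0;
    /* Each rule is applied only when the value is actually present */
    IF NOT MISSING(DURATION) AND DURATION < 40 THEN FLAG = 1;
    IF NOT MISSING(HEIGHT) AND HEIGHT < 6 THEN FLAG = 1;
    IF NOT MISSING(SPEED_GROUND)
       AND (SPEED_GROUND < 30 OR SPEED_GROUND > 140) THEN FLAG = 1;
    /* Rule from the variable dictionary that the original step omits */
    IF NOT MISSING(SPEED_AIR)
       AND (SPEED_AIR < 30 OR SPEED_AIR > 140) THEN FLAG = 1;
    IF NOT MISSING(DISTANCE) AND DISTANCE > 6000 THEN FLAG = 1;
RUN;

/* Count flagged versus unflagged rows */
PROC FREQ DATA=VALIDATE_SKETCH;
    TABLES FLAG;
RUN;
```

Whether a row with a missing value should count as normal or abnormal is a judgment call; this sketch treats a missing value as "no evidence of a problem".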
STEP 8: Summary Of Abnormal And Normal Data:
/* Rows flagged as abnormal */
DATA FLAGGED_FLIGHTS;
    SET VALIDATE1;
    IF FLAG=1;
RUN;
PROC MEANS DATA=FLAGGED_FLIGHTS;
    TITLE 'ABNORMAL DATA';
RUN;
/* Rows that passed every check */
DATA NORMAL_FLIGHTS;
    SET VALIDATE1;
    IF FLAG=0;
RUN;
PROC MEANS DATA=NORMAL_FLIGHTS;
    TITLE 'NORMAL DATA';
RUN;
Observation: After removing duplicate and empty rows, and then
removing outliers, we get 195 rows for speed_air and 781 data rows for each of the following
variables:
Duration
No_pasg
Speed_ground
Height
Pitch
Distance
CHAPTER 2: DESCRIPTIVE STUDY
WORKING WITH NORMAL DATA
DESCRIPTION: The purpose of this chapter is to use scatterplots and Pearson
correlation to understand any correlation between variables.
STEP 1: Creating Scatterplots Between Response Variable And Predictor
Variables.
CODE:
PROC PLOT DATA=NORMAL_FLIGHTS;
PLOT DISTANCE*DURATION;
PLOT DISTANCE*NO_PASG;
PLOT DISTANCE*SPEED_GROUND;
PLOT DISTANCE*SPEED_AIR;
PLOT DISTANCE*HEIGHT;
PLOT DISTANCE*PITCH;
RUN;
STEP 2: Pearson Correlation Between Variables:
PROC CORR DATA=NORMAL_FLIGHTS;
    VAR DISTANCE DURATION NO_PASG SPEED_GROUND SPEED_AIR HEIGHT
        PITCH;
RUN;
OBSERVATION:
This turns out to be a very useful step, as it helps us identify the correlation between each pair of
variables.
The output shows that the correlation between "speed_ground" and "speed_air" is 0.988
(nearly perfect correlation), which tells us that they are highly collinear. Therefore, we can drop
one of these variables in the regression step.
Since "speed_air" has only 195 data rows filled, compared with 781 for "speed_ground",
we drop the "speed_air" variable from our regression model.
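Collinearity can also be checked inside the regression itself. The sketch below is an addition, not part of the original analysis: it asks PROC REG for variance inflation factors with both speed variables in the model. Because PROC REG leaves out any row with a missing value in a model variable, this fit uses only the rows where speed_air is present.

```sas
/* Sketch only: variance inflation factors with both speed variables kept.
   The VIF option on the MODEL statement adds a VIF column to the
   parameter estimates table. */
PROC REG DATA=NORMAL_FLIGHTS;
    MODEL DISTANCE = DURATION NO_PASG SPEED_GROUND SPEED_AIR HEIGHT PITCH / VIF;
RUN;
QUIT;
```

Large VIF values for SPEED_GROUND and SPEED_AIR would be consistent with the 0.988 correlation reported above and would support dropping one of the two.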
CHAPTER 3: STATISTICAL STUDY
DESCRIPTION: The statistical study is a very important step, as it lets us fit
a regression model with intercept to the present data and see whether each variable is
positively or negatively correlated with landing distance.
STEP 1: Fitting Linear Regression Model On Our Normal Dataset:
PROC REG DATA=NORMAL_FLIGHTS;
MODEL DISTANCE = DURATION NO_PASG SPEED_GROUND HEIGHT PITCH;
RUN;
OBSERVATION: Equation From Regression Of Normal Data Comes Out To Be:
DISTANCE = -2826.33022 - 0.10338*(DURATION) - 2.35945*(NO_PASG) +
42.15245*(SPEED_GROUND) + 13.58277*(HEIGHT) + 187.99136*(PITCH)
We can see from the equation that duration of flight and number of passengers travelling have a
negative impact on landing distance, while ground speed, height of the aircraft and pitch have a
positive impact.
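To make the equation concrete, the sketch below plugs one set of input values into it. The flight values are hypothetical and chosen only for illustration; the coefficients are copied from the equation above.

```sas
/* Sketch only: evaluate the fitted equation for one made-up flight */
DATA _NULL_;
    /* Hypothetical inputs, not taken from the data set */
    DURATION     = 150;  /* minutes */
    NO_PASG      = 60;   /* passengers */
    SPEED_GROUND = 80;   /* miles per hour */
    HEIGHT       = 30;   /* meters */
    PITCH        = 4;    /* degrees */

    /* Coefficients copied from the fitted regression equation */
    PRED_DISTANCE = -2826.33022
                    - 0.10338   * DURATION
                    - 2.35945   * NO_PASG
                    + 42.15245  * SPEED_GROUND
                    + 13.58277  * HEIGHT
                    + 187.99136 * PITCH;

    /* Write the result to the SAS log */
    PUT 'Predicted landing distance (feet): ' PRED_DISTANCE 10.2;
RUN;
```

For these inputs the predicted distance works out to about 1548.24 feet. The speed_ground term (42.15245 * 80 = 3372.196) is by far the largest contribution, which matches the scatterplot and correlation findings.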
Q&A
1. How many observations (flights) do you use to fit your final model? If not all 950 flights, why?
Solution: I used 781 observations out of 950 to fit the model. 69 rows were removed from the dataset
because they contained outliers and 100 rows were duplicates (based on keys: speed_ground,
speed_air, height, pitch and distance).
2. What factors impact the landing distance of a flight, and how?
Solution: Regression Equation
DISTANCE = -2826.33022 - 0.10338*(DURATION) - 2.35945*(NO_PASG) +
42.15245*(SPEED_GROUND) + 13.58277*(HEIGHT) + 187.99136*(PITCH)
We can see from the regression equation that duration of flight and number of passengers travelling
have a negative impact on landing distance, while ground speed, height of the aircraft and pitch have a
positive impact.
3. Is there any difference between the two makes Boeing and Airbus?
Solution:
Regression Equation for Boeing aircraft
DISTANCE = -1796.98903 + 0.45775*(DURATION) - 1.9524*(NO_PASG) +
42.28107*(SPEED_GROUND) + 14.23727*(HEIGHT) - 39.31818*(PITCH)
Regression Equation for Airbus aircraft
DISTANCE = -2788.83858 - 0.30974*(DURATION) - 0.33445*(NO_PASG) +
42.90888*(SPEED_GROUND) + 13.98867*(HEIGHT) + 80.78737*(PITCH)
We can see a wide shift in the intercept and pitch coefficients for the 2 makes, but what really gets my
attention is the change in sign of the coefficients of pitch and duration. This tells us that pitch and
duration have opposite impacts on landing distance when we compare Boeing and Airbus
aircraft.
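The code that produced the two per-make equations is not shown in this report. A minimal sketch of one way to fit them, assuming the same five predictors and the NORMAL_FLIGHTS data set, is to sort by AIRCRAFT and run PROC REG with a BY statement. NORMAL_BY_MAKE is a hypothetical data set name.

```sas
/* Sketch only: BY-group processing needs the data sorted by the BY variable.
   NORMAL_BY_MAKE is an illustrative name. */
PROC SORT DATA=NORMAL_FLIGHTS OUT=NORMAL_BY_MAKE;
    BY AIRCRAFT;
RUN;

/* One regression per aircraft make, same predictors as the overall model */
PROC REG DATA=NORMAL_BY_MAKE;
    BY AIRCRAFT;
    MODEL DISTANCE = DURATION NO_PASG SPEED_GROUND HEIGHT PITCH;
RUN;
QUIT;
```

The output contains a separate parameter estimates table for each value of AIRCRAFT, whose standard errors and p-values help judge whether a sign change in a coefficient such as pitch or duration is meaningful.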