BANA 6043 STATISTICAL COMPUTING Mid Term Project
Submitted by Tauseef Alam (M10666228)
Abstract:
Landing overrun is a problem for most flight landing operations. In this report we
identify the key factors affecting the landing distance of commercial flights. To
determine these factors and quantify their impact on landing distance, we fitted a
linear regression model with landing distance as the dependent variable.
Landing distance depends largely on the ground speed of the aircraft, the aircraft type and
the height of the aircraft as it passes over the threshold of the runway (listed in
descending order of their weight in the model).
Distance= -945.4486 + 460.522*aircraft_typ + 0.27335*speed_gr_sq + 14.056*height
1. For the ‘Boeing’ aircraft type the predicted landing distance is 460.522 feet greater
than for the ‘Airbus’ aircraft type.
2. For every one-unit increase in the square of ground speed the predicted landing
distance increases by 0.27335 feet.
3. For every one-unit (one-meter) increase in height above the threshold the predicted
landing distance increases by 14.056 feet.
On a standardized scale the coefficients appear to be 0.92932 for squared ground speed,
0.25644 for aircraft type and 0.15345 for height, which is the basis for the ranking above.
CHAPTER 1: Data Preparation
This chapter illustrates, step by step, the methods and techniques used for data
preparation. Each step is followed by the reasoning, the SAS code and then the output of
the SAS code or the log file.
Step1: Read landing data from the two excel files ‘FAA-1.xls’ and ‘FAA-2.xls’.
SAS Code:
/* import the first workbook */
PROC IMPORT OUT= FAA1 DATAFILE= "C:\Users\alamtf\Downloads\FAA1.xls"
DBMS=xls REPLACE;
SHEET="FAA1";
GETNAMES=YES;
RUN;
proc print data=FAA1;
run;
/* import the second workbook */
PROC IMPORT OUT= FAA2 DATAFILE= "C:\Users\alamtf\Downloads\FAA2.xls"
DBMS=xls REPLACE;
SHEET="FAA2";
GETNAMES=YES;
RUN;
proc print data=FAA2;
run;
/* delete observations in which every listed variable is missing */
data FAA2;
set FAA2;
if aircraft="" and no_pasg=. and speed_ground=. and speed_air=. and height=. and
pitch=. and distance=. then delete;
run;
data FAA1;
set FAA1;
if aircraft="" and no_pasg=. and speed_ground=. and speed_air=. and height=. and
pitch=. and distance=. then delete;
run;
Reasoning: The datasets are read from the Excel files and printed so that the
observations can be inspected. Printing is practical here because the datasets are small;
it would not be for large datasets. In the last two data steps we delete observations
which are completely blank.
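The blank-row rule described above can also be sketched outside SAS. The following is a minimal Python illustration and not the report's code: the sample rows and their values are invented, and `None` stands in for a SAS missing value.

```python
# Illustrative Python sketch of the blank-row rule; the rows below are invented.
FIELDS = ("aircraft", "no_pasg", "speed_ground", "speed_air",
          "height", "pitch", "distance")


def is_blank(row):
    """Return True when every field in FIELDS is missing (None or "")."""
    return all(row.get(name) in (None, "") for name in FIELDS)


def drop_blank_rows(rows):
    """Keep only rows that have at least one non-missing field."""
    return [row for row in rows if not is_blank(row)]


rows = [
    {"aircraft": "boeing", "no_pasg": 60, "speed_ground": 95.0,
     "speed_air": None, "height": 30.0, "pitch": 4.0, "distance": 2000.0},
    dict.fromkeys(FIELDS),  # completely blank row: every field is None
    {"aircraft": "airbus", "no_pasg": None, "speed_ground": None,
     "speed_air": None, "height": None, "pitch": None, "distance": None},
]

kept = drop_blank_rows(rows)
print(len(kept))  # only the completely blank row is dropped
```

As in the data steps, a row is dropped only when all of the listed fields are missing, so a partially filled row (the third one here) is kept.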
Logfile output:
69 /*deleting all missing observations*/
70 data FAA2;
71 set FAA2;
72 if aircraft="" and no_pasg=. and speed_ground=. and speed_air=. and height=. and
pitch=. and
72 ! distance=. then delete;
73 run;
NOTE: There were 150 observations read from the data set WORK.FAA2.
NOTE: The data set WORK.FAA2 has 150 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
74 data FAA1;
75 set FAA1;
76 if aircraft="" and no_pasg=. and speed_ground=. and speed_air=. and height=. and
pitch=. and
76 ! distance=. then delete;
77 run;
NOTE: There were 800 observations read from the data set WORK.FAA1.
NOTE: The data set WORK.FAA1 has 800 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Step2: Combining the datasets:
SAS Code:
data faa;
set faa1 faa2;
run;
proc print data=faa;
run;
Reasoning: The two datasets created from the files FAA1.xls and FAA2.xls
are appended because both hold the same kind of information. FAA2 does not have the variable
duration. Hence, in the combined file, duration is missing for all 150 observations that
come from FAA2.
Output:
Step3: Check for removing duplicate observations
SAS Code:
proc sort data= faa out= faa_n nodupkey;
by aircraft no_pasg speed_ground speed_air height pitch distance;
run;
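Reasoning: Duplicate observations are removed from the data because they do not add any
extra information. Duration is left out of the BY key because it is missing for every FAA2
observation; with duration in the key, a FAA2 copy of a FAA1 flight would not be recognized
as a duplicate.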
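The idea of de-duplicating on a key that excludes duration can be sketched as follows. This is a Python illustration under stated assumptions, not the report's method: the rows are invented, the function keeps the first row seen for each key, and it does not sort the data as proc sort does.

```python
# Illustrative sketch: drop rows whose key fields repeat an earlier row.
KEY = ("aircraft", "no_pasg", "speed_ground", "speed_air",
       "height", "pitch", "distance")


def dedup_by_key(rows, key=KEY):
    """Keep the first row for each distinct combination of key fields."""
    seen = set()
    unique = []
    for row in rows:
        signature = tuple(row.get(name) for name in key)
        if signature not in seen:
            seen.add(signature)
            unique.append(row)
    return unique


# Invented example: the same flight appears once with and once without duration.
faa1_row = {"aircraft": "boeing", "no_pasg": 60, "speed_ground": 95.0,
            "speed_air": None, "height": 30.0, "pitch": 4.0,
            "distance": 2000.0, "duration": 98.5}
faa2_row = {k: v for k, v in faa1_row.items() if k != "duration"}
other_row = dict(faa1_row, distance=1500.0)

result = dedup_by_key([faa1_row, faa2_row, other_row])
print(len(result))
```

Because duration is not part of the key, the two copies of the flight collapse into one row even though only one of them carries a duration value.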
Log File Output:
80 proc sort data= faa out= faa_n nodupkey;
81 by aircraft no_pasg speed_ground speed_air height pitch distance;
82 run;
NOTE: There were 950 observations read from the data set WORK.FAA.
NOTE: 100 observations with duplicate key values were deleted.
NOTE: The data set WORK.FAA_N has 850 observations and 8 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Comment: There are 850 total observations on which we will be doing our further
analysis.
Step3: Performing Completeness check for each variable: examine if missing values are
present.
SAS Code:
proc means data=faa_n n nmiss min mean p1 p5 p10 p25 p50 p75 p95 p99;
var no_pasg speed_ground speed_air height pitch distance duration;
run;
Reasoning: The n and nmiss options are used in proc means to get the counts of non-missing
and missing observations. The percentiles are requested to get a feel for the
distribution of the different variables:
Output:
Comments: Here we see that the variable speed_air is missing for 75% of the data and
duration is missing for 6% of the data. For now we keep both variables in
our further analysis. However, we may need to drop these variables or impute the missing
values before we do our final analysis.
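The completeness check amounts to counting non-missing and missing values per variable. A minimal Python sketch of that count is given below; the four sample rows are invented and chosen only so that one variable is 75% missing, and `None` again stands in for a SAS missing value.

```python
# Illustrative sketch of an n / nmiss style completeness report.
def completeness(rows, fields):
    """Return n, nmiss and percent missing for each field."""
    total = len(rows)
    report = {}
    for name in fields:
        nmiss = sum(1 for row in rows if row.get(name) is None)
        pct = 100.0 * nmiss / total if total else 0.0
        report[name] = {"n": total - nmiss, "nmiss": nmiss, "pct_missing": pct}
    return report


# Invented rows: speed_air is missing in 3 of 4, duration in 1 of 4.
rows = [
    {"speed_air": 101.2, "duration": 120.0},
    {"speed_air": None, "duration": 95.0},
    {"speed_air": None, "duration": None},
    {"speed_air": None, "duration": 150.0},
]

report = completeness(rows, ["speed_air", "duration"])
print(report["speed_air"]["pct_missing"], report["duration"]["pct_missing"])
```

A share of missing values as high as the one seen for speed_air is what motivates the later decision to drop that variable rather than impute it.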
Step4: Removing data abnormality.
SAS code:
/* treatment to remove abnormal observations */
data faa_fn;
set faa_n;
/* duration of a normal flight is always greater than 40 min */
if duration ne . and duration < 40 then delete;
/* speed_ground below 30 or above 140 is considered abnormal */
if speed_ground ne . and (speed_ground < 30 or speed_ground > 140) then delete;
/* speed_air below 30 or above 140 is considered abnormal */
if speed_air ne . and (speed_air < 30 or speed_air > 140) then delete;
/* height should be at least 6 meters at the threshold of the runway */
if height ne . and height < 6 then delete;
/* distance should be less than the airport runway length, which is 6000 feet */
if distance ne . and distance > 6000 then delete;
run;
Reasoning: Certain checks are applied to the data to remove the abnormal observations.
The definitions of abnormality are taken from the Project.pdf file that was provided.
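The abnormality rules can be restated as a single predicate. The Python sketch below mirrors the thresholds in the SAS code (a missing value never triggers a rule); the sample rows are invented for illustration.

```python
# Illustrative sketch of the abnormality rules; thresholds follow the SAS step.
def is_abnormal(row):
    """Return True when any non-missing value violates its rule."""
    duration = row.get("duration")
    speed_ground = row.get("speed_ground")
    speed_air = row.get("speed_air")
    height = row.get("height")
    distance = row.get("distance")
    return (
        (duration is not None and duration < 40)
        or (speed_ground is not None and not 30 <= speed_ground <= 140)
        or (speed_air is not None and not 30 <= speed_air <= 140)
        or (height is not None and height < 6)
        or (distance is not None and distance > 6000)
    )


# Invented rows: one normal landing, one too fast, one with missing values only.
rows = [
    {"duration": 120.0, "speed_ground": 95.0, "speed_air": None,
     "height": 30.0, "distance": 2000.0},
    {"duration": 120.0, "speed_ground": 145.0, "speed_air": None,
     "height": 30.0, "distance": 5500.0},
    {"duration": None, "speed_ground": None, "speed_air": None,
     "height": None, "distance": None},
]

cleaned = [row for row in rows if not is_abnormal(row)]
print(len(cleaned))
```

Rows with missing values pass through untouched, which is why the later steps still have to deal with the missing duration and speed_air values.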
Step5: Checking the cleaned data to validate the treatment applied
proc means data=faa_fn n nmiss min mean p1 p5 p10 p25 p50 p75 p95 p99;
var no_pasg speed_ground speed_air height pitch;
run;
Proc freq data=faa_fn;
tables aircraft/missing list;
run;
SAS output:
Comment: The distributions of the variables show that the abnormal values have been
removed. 831 observations remain after removing the abnormal observations from the
data.
Step6: Summarization of distribution of each variable:
1. Duration (in minutes): Flight duration between taking off and landing. The
duration of a normal flight should always be greater than 40 min.
Summarizing points:
a) The box plot, the extreme values table and the quantile
distribution show that there are some outliers in the data.
b) The QQ plot, the histogram and the test of normality table
suggest that the distribution of the variable is positively skewed.
However, if we remove the outliers the distribution may come closer to
a normal distribution.
c) There are 50 missing observations, which need to be treated before
the variable is used in our analysis.
2. No_pasg: The number of passengers in a flight.
Summary of variable:
a) The box plot, the extreme values table and the quantile
distribution show that there are some outliers in the data.
b) From the QQ plot and histogram the distribution seems negatively
skewed.
c) In the test of normality table the variable fails most of the
normality tests, as the p-value is less than .05; hence the variable is not
normally distributed.
d) There are no missing observations.
3. Speed_ground (in miles per hour): The ground speed of an aircraft when passing
over the threshold of the runway. If its value is less than 30 MPH or greater than
140 MPH, the landing is considered abnormal.
Summary of variable:
a) The box plot and the quantile distribution show that there
are very few outliers in the data.
b) From the QQ plot and histogram the distribution of the variable seems
positively skewed.
c) In the test of normality table the variable passes all the normality
tests, as the p-value for every test is greater than .05.
d) There are no missing observations.
4. Speed_air (in miles per hour): The air speed of an aircraft when passing over the
threshold of the runway. If its value is less than 30 MPH or greater than 140 MPH,
the landing is considered abnormal.
Summary of variable:
a) This variable is missing for 75% of the observations.
b) Little can be said about the distribution of this variable, as it is populated for
only a small part of the data.
5. Height (in meters): The height of an aircraft when it is passing over the threshold
of the runway. The landing aircraft is required to be at least 6 meters high at the
threshold of the runway.
Summary of variable:
a) The box plot and the quantile distribution show that there
are very few outliers in the data.
b) From the QQ plot and histogram the distribution of the variable is positively
skewed.
c) In the test of normality table the p-value of every test is
greater than .05; hence the distribution of this variable is taken to follow
a normal distribution.
d) There are no missing observations.
6. Pitch (in degrees): Pitch angle of an aircraft when it is passing over the threshold
of the runway.
Summary of variable:
a) From the QQ plot and histogram the variable seems to be normal.
In the test of normality table the p-value of most of the
tests is greater than .05; hence the variable is taken to be normally distributed.
b) There are no missing observations.
7. Distance (in feet): The landing distance of an aircraft. More specifically, it refers
to the distance between the threshold of the runway and the point where the
aircraft can be fully stopped. The length of the airport runway is typically less than
6000 feet.
Summary of variable:
a) The distribution of the variable is positively skewed. From the QQ plot and
the tests of normality we can say the distribution is not normal (the p-value of
all 4 tests is less than .05).
b) There are some extreme values which need to be treated before using this
variable in the analysis. The extreme values table, the box plot and the quantile
distribution all suggest the same.
Step7: List of questions which I had during the data preparation:
a) There is a lot of missing data for the speed_air and duration variables. We need to
either delete the observations with missing values or impute the missing values. It is
not yet clear to me which approach to take.
b) For certain variables, such as no_pasg, it was not clear whether the
distribution is normal, as the normality tests and the QQ plot give
contradictory indications.
c) We need to determine which variables should be dropped from the analysis. For
example, speed_air is missing for more than 75% of the data; hence we should drop
it.
CHAPTER 2: Data Exploration
This chapter will illustrate the step by step methods and techniques used for data
exploration. Each step will be followed by reasoning or observation, SAS code and then
output of SAS code or log file
Step1: Check the distribution of all the independent variables and the dependent variable to
identify the missing percentage and the outliers for each variable
SAS CODE:
proc means data=faa_fn n nmiss min mean p1 p5 p10 p25 p50 p75 p95 p99;
var distance no_pasg speed_ground speed_air height pitch duration;
run;
data faa_fn1;
set faa_fn;
drop speed_air;
run;
/* there are 781 observation */
proc means data=faa_fn1 n nmiss min mean p1 p5 p10 p25 p50 p75 p95 p99;
var distance no_pasg speed_ground height pitch duration;
run;
Observation:
1. Speed_air has 76 percent missing data; hence we are dropping this variable from
our further analysis.
2. Duration has 50 missing values. We are not deleting these observations for now.
After the first few iterations we will decide whether to keep or drop this
variable. If we drop the variable, we avoid losing the information in those
50 observations.
Step 2: See the trend of dependent variable with each independent variable
SAS CODE:
proc plot data= faa_fn1;
plot distance*(no_pasg speed_ground height pitch duration);
run;
Output:
1. We cannot see any strong linear trend between distance and the variables duration,
height, no_pasg and pitch.
2. speed_ground has a second order relationship with distance. We might need to
transform this variable into a second degree term. We will need to see the
residual plots from our initial modelling iterations to confirm this relationship.
Step 3: Before proceeding with model building we need to check the strength of the linear
relationship between the independent and dependent variables. We also need to check which
pairs of independent variables are highly correlated; from each such pair we will keep
only one variable.
SAS Code:
proc corr data= faa_fn1;
run;
Output:
Observations:
1. There is no significant correlation between the dependent variable distance
and the independent variables duration and no_pasg (the p-value for the test of
the hypothesis that there is no correlation is greater than the .05
significance level).
2. There exists a strong positive correlation between speed_ground and distance.
This suggests that speed_ground will be a strong predictor.
3. Pitch and height have a significant correlation with distance, but their
linear relationship with distance is weak.
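For reference, the Pearson correlation that these observations rely on can be computed as sketched below. This is an illustrative Python version with invented numbers, not output from the report; it skips pairs with a missing value, on the assumption that the correlations were computed pairwise on non-missing data.

```python
# Illustrative sketch of a Pearson correlation with pairwise deletion.
import math


def pearson_r(xs, ys):
    """Pearson correlation over pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Invented values: distance rises linearly with speed; one speed is missing.
speed = [60.0, 80.0, None, 100.0, 120.0]
dist = [1000.0, 1500.0, 1800.0, 2000.0, 2500.0]
r = pearson_r(speed, dist)
print(round(r, 4))
```

A coefficient near 1 or -1 indicates a strong linear relationship; it says nothing about curvature, which is why the scatter plots and residual plots are checked separately for speed_ground.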
Step4: Creating binary variables for categorical variables having two levels (for more
than two levels we need to create (number of levels - 1) indicators or use proc glm; for now
we will be using proc reg)
1. Variable aircraft has two values, ‘boeing’ and ‘airbus’.
2. Creating variable aircraft_typ, which is set to 1 when the aircraft is ‘boeing’ and
to 0 otherwise.
SAS CODE:
data faa_fn2;
set faa_fn1;
if aircraft='boeing' then aircraft_typ=1; else aircraft_typ=0;
run;
proc freq data= faa_fn2;
tables aircraft aircraft_typ;
run;
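The indicator coding can be sketched in Python as below. One assumption differs from the SAS step and is made deliberately: the sketch ignores letter case and surrounding spaces, whereas the comparison aircraft='boeing' matches only the lowercase spelling, so the proc freq check above is what confirms that the stored values really are lowercase.

```python
# Illustrative sketch of the 0/1 indicator for aircraft make.
def aircraft_indicator(aircraft):
    """Return 1 for Boeing and 0 otherwise (case and outer spaces ignored)."""
    if aircraft is None:
        return 0  # a missing make falls into the "else" branch, as in the SAS step
    return 1 if aircraft.strip().lower() == "boeing" else 0


codes = [aircraft_indicator(a) for a in ["boeing", "airbus", "Boeing ", None]]
print(codes)
```

With only two makes, a single indicator is enough: Airbus is the reference level, and the coefficient of aircraft_typ is the Boeing minus Airbus difference.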
CHAPTER 3: Data Modelling
Step1:
Run proc reg with all the variables from the end of chapter 2 step 4.
SAS Code:
proc reg data=faa_fn2;
model distance=aircraft_typ no_pasg pitch duration speed_ground height/vif;
output out=outp_reg r=res residual=outp_residual;
run;
Output1:
Observations:
1. There are 831 observations in the data used for this run, of which 50 have a missing
value for duration. These 50 observations are not used in any of the analysis in proc
reg.
2. The Analysis of Variance table shows that there is a relationship between the
independent and dependent variables. A p-value less than the .05 significance level
means we reject the null hypothesis that all the coefficients of the independent
variables are zero.
3. The R-Square and Adj-R-Sq values are greater than 80%, which indicates that a large
share of the variance in the dependent variable is explained by these independent variables.
4. Parameter estimate table: in this run the variables no_pasg,
pitch and duration are not significant, hence we will drop these variables in our
next iteration (a p-value > .05 means we fail to reject the null hypothesis that the
coefficient is zero).
5. Aircraft_typ, speed_ground and height are significant variables, which we will
keep in our next iteration.
6. The plot of residuals against the independent variables shows that there is a 2nd order
relationship between speed_ground and distance, which we also saw in the
scatterplot in chapter 2.
7. There is no multicollinearity between the variables, as the VIF is less than 10 for all
variables.
Output 2:
Observations:
The assumptions of linear regression state that the residuals should be identically
distributed and independent of each other.
1. Constant variance (homoscedasticity): in the plot of standardized residuals
against predicted values the residuals should show the same spread everywhere. Here
this condition is not met, i.e. the residuals are heteroscedastic.
2. Normality assumption: The QQ plot shows that the residuals do not follow a
normal distribution.
3. The U-shaped pattern in the plot of residuals against predicted values shows that
a nonlinear term is missing from the model. The speed_ground variable, which we
identified above, might be the variable causing this nonlinearity.
Step2.
Re run proc reg with learnings from first iteration
SAS Code:
/* squaring variable speed_ground as suggested by the residual plots */
data faa_fn3;
set faa_fn2;
speed_gr_sq=speed_ground*speed_ground;
run;
proc reg data=faa_fn3;
/* stb adds standardized coefficients to the report */
model distance=aircraft_typ speed_gr_sq height/vif stb;
output out=outp_reg r=res residual=outp_residual;
run;
Output1:
Observations:
1. After the transformation the adjusted R-square rose to 92%.
2. All the variables are significant and have an intuitive sign. For example, the higher
the speed, the longer the landing distance should be.
3. There is no multicollinearity between the variables, as the VIF is less than 10 for all
variables.
Output2:
Observations and Conclusion:
1. The spread of the standardized residuals against the predicted values has become
slightly more even, which means the non-constant variance is
slightly reduced.
Step 3.
Test of normality of residuals:
SAS Code:
PROC UNIVARIATE DATA=outp_reg
NORMAL PLOT;
VAR RES;
RUN;
Output:
Observation:
1. The residuals do not have zero mean. As the p-value of the location test is less than
.05, we have to reject the null hypothesis that the mean is zero.
2. The residuals fail the normality tests, which means they are not normally
distributed. This violates the normality assumption of linear regression.
Summary:
Landing distance depends largely on the ground speed of the aircraft, the aircraft type and
the height of the aircraft as it passes over the threshold of the runway (listed in
descending order of their weight in the model).
Distance= -945.4486 + 460.522*aircraft_typ + 0.27335*speed_gr_sq + 14.056*height
1. For the ‘Boeing’ aircraft type (aircraft_typ) the predicted landing distance is
460.522 feet greater than for the ‘Airbus’ aircraft type.
2. For every one-unit increase in the square of ground speed (speed_gr_sq) the
predicted landing distance increases by 0.27335 feet.
3. For every one-unit (one-meter) increase in height (height) above the threshold the
predicted landing distance increases by 14.056 feet.
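To make the fitted equation concrete, the sketch below evaluates it in Python using the coefficients reported above. The input values (100 mph, 30 m) are hypothetical and chosen only for the arithmetic; the function name is likewise an illustration.

```python
# Illustrative sketch: evaluate the reported regression equation.
INTERCEPT = -945.4486
B_AIRCRAFT = 460.522   # added when aircraft_typ = 1 (Boeing)
B_SPEED_SQ = 0.27335   # coefficient of speed_ground squared
B_HEIGHT = 14.056      # coefficient of height in meters


def predict_distance(is_boeing, speed_ground, height):
    """Predicted landing distance in feet from the final model."""
    aircraft_typ = 1 if is_boeing else 0
    return (INTERCEPT
            + B_AIRCRAFT * aircraft_typ
            + B_SPEED_SQ * speed_ground ** 2
            + B_HEIGHT * height)


# Hypothetical landing: 100 mph ground speed, 30 m over the threshold.
airbus = predict_distance(False, 100.0, 30.0)
boeing = predict_distance(True, 100.0, 30.0)
print(f"{airbus:.2f} {boeing:.2f}")  # 2209.73 2670.25

# Effect of one extra mph at 100 mph: 0.27335 * (101**2 - 100**2)
extra = predict_distance(False, 101.0, 30.0) - airbus
print(f"{extra:.2f}")  # 54.94
```

Because speed enters as a square, the effect of one additional mph is not constant: it is about 54.9 feet at 100 mph and grows with speed, while the Boeing versus Airbus gap stays at 460.522 feet for any speed and height.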
Answers to questions asked at the end
1. How many observations (flights) do you use to fit your final model? If not all 950
flights, why?
Answer: 831 observations were used to fit the final model. Of the 950 flights read,
100 were removed as duplicates and a further 19 were deleted because of abnormal
values (ground speed of the aircraft, flight duration, height above the threshold
of the runway, etc.).
2. What factors and how they impact the landing distance of a flight?
Answer: Landing distance depends largely on the ground speed of the aircraft, the
aircraft type and the height of the aircraft as it passes over the threshold of the
runway (listed in descending order of their weight in the model).
A. For the ‘Boeing’ aircraft type the predicted landing distance is 460.522
feet greater than for the ‘Airbus’ aircraft type.
B. For every one-unit increase in the square of ground speed the predicted
landing distance increases by 0.27335 feet.
C. For every one-unit (one-meter) increase in height above the threshold the
predicted landing distance increases by 14.056 feet.
3. Is there any difference between the two makes Boeing and Airbus?
Answer: Yes, there is a significant difference between the two makes of
commercial aircraft, Boeing and Airbus. From the final results of this report, we can
say that:
For the ‘Boeing’ aircraft type the predicted landing distance is 460.522 feet
greater than for the ‘Airbus’ aircraft type.

More Related Content

What's hot

Analytical Evaluation of Generalized Predictive Control Algorithms Using a Fu...
Analytical Evaluation of Generalized Predictive Control Algorithms Using a Fu...Analytical Evaluation of Generalized Predictive Control Algorithms Using a Fu...
Analytical Evaluation of Generalized Predictive Control Algorithms Using a Fu...inventy
 
Autonomous Perching Quadcopter
Autonomous Perching QuadcopterAutonomous Perching Quadcopter
Autonomous Perching QuadcopterYucheng Chen
 
j2 Universal - Modelling and Tuning Braking Characteristics
j2 Universal  - Modelling and Tuning Braking Characteristicsj2 Universal  - Modelling and Tuning Braking Characteristics
j2 Universal - Modelling and Tuning Braking CharacteristicsJohn Jeffery
 
Multivariable Parametric Modeling of a Greenhouse by Minimizing the Quadratic...
Multivariable Parametric Modeling of a Greenhouse by Minimizing the Quadratic...Multivariable Parametric Modeling of a Greenhouse by Minimizing the Quadratic...
Multivariable Parametric Modeling of a Greenhouse by Minimizing the Quadratic...TELKOMNIKA JOURNAL
 
Ds lab manual by s.k.rath
Ds lab manual by s.k.rathDs lab manual by s.k.rath
Ds lab manual by s.k.rathSANTOSH RATH
 
16.100Project
16.100Project16.100Project
16.100ProjectEric Tu
 
Reference Parameter, Passing object by reference, constant parameter & Defaul...
Reference Parameter, Passing object by reference, constant parameter & Defaul...Reference Parameter, Passing object by reference, constant parameter & Defaul...
Reference Parameter, Passing object by reference, constant parameter & Defaul...Meghaj Mallick
 
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaArima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaAmrinder Arora
 
Polymer Brush Data Processor
Polymer Brush Data ProcessorPolymer Brush Data Processor
Polymer Brush Data ProcessorCory Bethrant
 
Data structures lab manual
Data structures lab manualData structures lab manual
Data structures lab manualSyed Mustafa
 
Model predictive-fuzzy-control-of-air-ratio-for-automotive-engines
Model predictive-fuzzy-control-of-air-ratio-for-automotive-enginesModel predictive-fuzzy-control-of-air-ratio-for-automotive-engines
Model predictive-fuzzy-control-of-air-ratio-for-automotive-enginespace130557
 
Arima model
Arima modelArima model
Arima modelJassika
 

What's hot (20)

Arrays in SAS
Arrays in SASArrays in SAS
Arrays in SAS
 
Flights Landing Overrun Project
Flights Landing Overrun ProjectFlights Landing Overrun Project
Flights Landing Overrun Project
 
Code optimization
Code optimization Code optimization
Code optimization
 
PLSQL Practices
PLSQL PracticesPLSQL Practices
PLSQL Practices
 
Analytical Evaluation of Generalized Predictive Control Algorithms Using a Fu...
Analytical Evaluation of Generalized Predictive Control Algorithms Using a Fu...Analytical Evaluation of Generalized Predictive Control Algorithms Using a Fu...
Analytical Evaluation of Generalized Predictive Control Algorithms Using a Fu...
 
Autonomous Perching Quadcopter
Autonomous Perching QuadcopterAutonomous Perching Quadcopter
Autonomous Perching Quadcopter
 
j2 Universal - Modelling and Tuning Braking Characteristics
j2 Universal  - Modelling and Tuning Braking Characteristicsj2 Universal  - Modelling and Tuning Braking Characteristics
j2 Universal - Modelling and Tuning Braking Characteristics
 
Multivariable Parametric Modeling of a Greenhouse by Minimizing the Quadratic...
Multivariable Parametric Modeling of a Greenhouse by Minimizing the Quadratic...Multivariable Parametric Modeling of a Greenhouse by Minimizing the Quadratic...
Multivariable Parametric Modeling of a Greenhouse by Minimizing the Quadratic...
 
Presentation
PresentationPresentation
Presentation
 
Ds lab manual by s.k.rath
Ds lab manual by s.k.rathDs lab manual by s.k.rath
Ds lab manual by s.k.rath
 
16.100Project
16.100Project16.100Project
16.100Project
 
iteapaper
iteapaperiteapaper
iteapaper
 
Lecture4
Lecture4Lecture4
Lecture4
 
Reference Parameter, Passing object by reference, constant parameter & Defaul...
Reference Parameter, Passing object by reference, constant parameter & Defaul...Reference Parameter, Passing object by reference, constant parameter & Defaul...
Reference Parameter, Passing object by reference, constant parameter & Defaul...
 
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaArima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
 
Polymer Brush Data Processor
Polymer Brush Data ProcessorPolymer Brush Data Processor
Polymer Brush Data Processor
 
Data structures lab manual
Data structures lab manualData structures lab manual
Data structures lab manual
 
Model predictive-fuzzy-control-of-air-ratio-for-automotive-engines
Model predictive-fuzzy-control-of-air-ratio-for-automotive-enginesModel predictive-fuzzy-control-of-air-ratio-for-automotive-engines
Model predictive-fuzzy-control-of-air-ratio-for-automotive-engines
 
VTU Data Structures Lab Manual
VTU Data Structures Lab ManualVTU Data Structures Lab Manual
VTU Data Structures Lab Manual
 
Arima model
Arima modelArima model
Arima model
 

Similar to Flight Landing Analysis

Predicting landing distance: Adrian Valles
Predicting landing distance: Adrian VallesPredicting landing distance: Adrian Valles
Predicting landing distance: Adrian VallesAdrián Vallés
 
Statistical computing project
Statistical computing projectStatistical computing project
Statistical computing projectRashmiSubrahmanya
 
3DoF Helicopter Trim , Deceleration manouver simulation, Stability
3DoF Helicopter Trim , Deceleration manouver simulation, Stability 3DoF Helicopter Trim , Deceleration manouver simulation, Stability
3DoF Helicopter Trim , Deceleration manouver simulation, Stability Deepak Paul Tirkey
 
AIRCRAFT PITCH EECE 682 Computer Control Of Dynamic.docx
AIRCRAFT PITCH EECE 682  Computer Control Of Dynamic.docxAIRCRAFT PITCH EECE 682  Computer Control Of Dynamic.docx
AIRCRAFT PITCH EECE 682 Computer Control Of Dynamic.docxgalerussel59292
 
NT-33 Report Final
NT-33 Report FinalNT-33 Report Final
NT-33 Report FinalZack White
 
detail of flowchart and algorithm that are used in programmingpdf
detail of flowchart and algorithm that are used in programmingpdfdetail of flowchart and algorithm that are used in programmingpdf
detail of flowchart and algorithm that are used in programmingpdfssuserf86fba
 
Algorithm Flowchart Manual ALGORITHM FLOWCHART MANUAL For STUDENTS
Algorithm   Flowchart Manual ALGORITHM   FLOWCHART MANUAL For STUDENTSAlgorithm   Flowchart Manual ALGORITHM   FLOWCHART MANUAL For STUDENTS
Algorithm Flowchart Manual ALGORITHM FLOWCHART MANUAL For STUDENTSAlicia Edwards
 
Please follow the cod eand comments for description CODE #incl.pdf
Please follow the cod eand comments for description CODE #incl.pdfPlease follow the cod eand comments for description CODE #incl.pdf
Please follow the cod eand comments for description CODE #incl.pdfannaielectronicsvill
 
7-White Box Testing.ppt
7-White Box Testing.ppt7-White Box Testing.ppt
7-White Box Testing.pptHirenderPal
 
In this project you implement a program such that it simulates the p.pdf
In this project you implement a program such that it simulates the p.pdfIn this project you implement a program such that it simulates the p.pdf
In this project you implement a program such that it simulates the p.pdffathimafancy
 
Introductiontoflowchart 110630082600-phpapp01
Introductiontoflowchart 110630082600-phpapp01Introductiontoflowchart 110630082600-phpapp01
Introductiontoflowchart 110630082600-phpapp01VincentAcapen1
 
Introduction to flowchart
Introduction to flowchartIntroduction to flowchart
Introduction to flowchartJordan Delacruz
 
Computer Science Programming Assignment Help
Computer Science Programming Assignment HelpComputer Science Programming Assignment Help
Computer Science Programming Assignment HelpProgramming Homework Help
 
References1. HCS 2010 online manuals.2. Data Data provi.docx
References1. HCS 2010 online manuals.2. Data  Data provi.docxReferences1. HCS 2010 online manuals.2. Data  Data provi.docx
References1. HCS 2010 online manuals.2. Data Data provi.docxdebishakespeare
 
Triton UAS Technical Design Paper 2020-2021
Triton UAS Technical Design Paper 2020-2021Triton UAS Technical Design Paper 2020-2021
Triton UAS Technical Design Paper 2020-2021KennyPham19
 

Similar to Flight Landing Analysis (20)

Predicting landing distance: Adrian Valles
Predicting landing distance: Adrian VallesPredicting landing distance: Adrian Valles
Predicting landing distance: Adrian Valles
 
Statistical computing project
Statistical computing projectStatistical computing project
Statistical computing project
 
Flight Data Analysis
Flight Data AnalysisFlight Data Analysis
Flight Data Analysis
 
Flight Landing Risk Assessment Project
Flight Landing Risk Assessment ProjectFlight Landing Risk Assessment Project
Flight Landing Risk Assessment Project
 
3DoF Helicopter Trim , Deceleration manouver simulation, Stability
3DoF Helicopter Trim , Deceleration manouver simulation, Stability 3DoF Helicopter Trim , Deceleration manouver simulation, Stability
3DoF Helicopter Trim , Deceleration manouver simulation, Stability
 
White box testing
White box testingWhite box testing
White box testing
 
AIRCRAFT PITCH EECE 682 Computer Control Of Dynamic.docx
AIRCRAFT PITCH EECE 682  Computer Control Of Dynamic.docxAIRCRAFT PITCH EECE 682  Computer Control Of Dynamic.docx
AIRCRAFT PITCH EECE 682 Computer Control Of Dynamic.docx
 
NT-33 Report Final
NT-33 Report FinalNT-33 Report Final
NT-33 Report Final
 
detail of flowchart and algorithm that are used in programmingpdf
detail of flowchart and algorithm that are used in programmingpdfdetail of flowchart and algorithm that are used in programmingpdf
detail of flowchart and algorithm that are used in programmingpdf
 
Algorithm Flowchart Manual ALGORITHM FLOWCHART MANUAL For STUDENTS
Algorithm   Flowchart Manual ALGORITHM   FLOWCHART MANUAL For STUDENTSAlgorithm   Flowchart Manual ALGORITHM   FLOWCHART MANUAL For STUDENTS
Algorithm Flowchart Manual ALGORITHM FLOWCHART MANUAL For STUDENTS
 
Algorithm manual
Algorithm manualAlgorithm manual
Algorithm manual
 
Please follow the cod eand comments for description CODE #incl.pdf
Please follow the cod eand comments for description CODE #incl.pdfPlease follow the cod eand comments for description CODE #incl.pdf
Please follow the cod eand comments for description CODE #incl.pdf
 
7-White Box Testing.ppt
7-White Box Testing.ppt7-White Box Testing.ppt
7-White Box Testing.ppt
 
In this project you implement a program such that it simulates the p.pdf
In this project you implement a program such that it simulates the p.pdfIn this project you implement a program such that it simulates the p.pdf
In this project you implement a program such that it simulates the p.pdf
 
Introductiontoflowchart 110630082600-phpapp01
Introductiontoflowchart 110630082600-phpapp01Introductiontoflowchart 110630082600-phpapp01
Introductiontoflowchart 110630082600-phpapp01
 
Introduction to flowchart
Introduction to flowchartIntroduction to flowchart
Introduction to flowchart
 
Computer Science Programming Assignment Help
Computer Science Programming Assignment HelpComputer Science Programming Assignment Help
Computer Science Programming Assignment Help
 
Cpk problem solving_pcba smt machine
Cpk problem solving_pcba smt machineCpk problem solving_pcba smt machine
Cpk problem solving_pcba smt machine
 
References1. HCS 2010 online manuals.2. Data Data provi.docx
References1. HCS 2010 online manuals.2. Data  Data provi.docxReferences1. HCS 2010 online manuals.2. Data  Data provi.docx
References1. HCS 2010 online manuals.2. Data Data provi.docx
 
Triton UAS Technical Design Paper 2020-2021
Triton UAS Technical Design Paper 2020-2021Triton UAS Technical Design Paper 2020-2021
Triton UAS Technical Design Paper 2020-2021
 
