Project Report
Statistical Computing
Rashmi Subrahmanya
University of Cincinnati
Contents
Executive Summary
Chapter 1: Data Preparation
Variable dictionary
SAS Code
SAS Output
Observations
Conclusion
Chapter 2: Descriptive Study
SAS Code
SAS Output
Observations
Conclusion
Chapter 3: Statistical Modeling
SAS Code
SAS Output
Observations
Conclusion
Executive Summary
This project investigates which factors influence the landing distance of a flight, with the aim
of minimizing the risk of runway overrun. Chapter 1 explores the provided data and cleans it by
removing blank, duplicate and abnormal observations. It also briefly describes the variables
considered in the project, their distributions and their summary statistics. The variables
considered are duration of the flight, number of passengers, make of aircraft, speed of aircraft
on ground, speed of aircraft in air, height and pitch of aircraft.
Chapter 2 explores the relationships among the candidate factors and between each factor and
landing distance. This helps to identify which factors are strongly correlated with landing
distance. Chapter 3 uses regression analysis to find the factors that are significant for landing
distance and then fits a linear regression model based on those factors.
Chapter 1: Data Preparation
Goal: To explore and clean data
Data: Landing data from 950 commercial flights
Variable dictionary
Aircraft: The make of an aircraft (Boeing or Airbus).
Duration (in minutes): Flight duration between take-off and landing. The duration of a
normal flight should always be greater than 40 minutes.
No_pasg: The number of passengers in a flight.
Speed_ground (in miles per hour): The ground speed of an aircraft when passing over the
threshold of the runway. If its value is less than 30 mph or greater than 140 mph, the
landing is considered abnormal.
Speed_air (in miles per hour): The air speed of an aircraft when passing over the threshold of
the runway. If its value is less than 30 mph or greater than 140 mph, the landing is
considered abnormal.
Height (in meters): The height of an aircraft when it is passing over the threshold of the runway.
The landing aircraft is required to be at least 6 meters high at the threshold of the runway.
Pitch (in degrees): Pitch angle of an aircraft when it is passing over the threshold of the runway.
Distance (in feet): The landing distance of an aircraft. More specifically, it refers to the distance
between the threshold of the runway and the point where the aircraft can be fully stopped. The
length of the airport runway is typically less than 6000 feet.
SAS Code
/**Importing FAA1.xls**/
PROC IMPORT DATAFILE="~/Classwork/Project/FAA1.xls"
DBMS=xls
OUT=work.faa1;
GETNAMES=yes;
RUN;
PROC PRINT DATA=work.faa1;
RUN;
/**Importing FAA2.xls**/
PROC IMPORT DATAFILE="~/Classwork/Project/FAA2.xls"
DBMS=xls
OUT=work.faa2;
GETNAMES=yes;
RUN;
PROC PRINT DATA=work.faa2;
RUN;
/**Deleting blank rows which were imported in FAA2.xls**/
DATA faa2;
SET faa2;
IF aircraft='' THEN DELETE;
RUN;
PROC PRINT DATA=faa2;
RUN;
/**Combining two data sets by concatenation**/
DATA combined;
SET faa1 faa2;
RUN;
PROC PRINT DATA=combined;
RUN;
/**Checking for Duplicate Rows**/
PROC SORT DATA=combined OUT=sorted NODUPKEY;
BY speed_ground;
RUN;
PROC PRINT DATA=sorted;
RUN;
/** Checking for missing values**/
PROC SORT DATA=sorted OUT=sorted;
BY aircraft;
RUN;
PROC MEANS DATA=sorted N NMISS MEAN RANGE;
VAR duration no_pasg speed_ground speed_air height pitch distance;
RUN;
/**Checking for abnormal values**/
DATA validation;
SET sorted;
/* Numeric missing values are tested with MISSING() instead of being compared to ' ' */
IF duration>=40 THEN normal_duration='YES';
ELSE IF MISSING(duration) THEN normal_duration=' ';
ELSE normal_duration='NO';
IF speed_ground>=30 AND speed_ground<=140 THEN normal_speed_ground='YES';
ELSE normal_speed_ground='NO';
IF speed_air>=30 AND speed_air<=140 THEN normal_speed_air='YES';
ELSE IF MISSING(speed_air) THEN normal_speed_air=' ';
ELSE normal_speed_air='NO';
IF height>=6 THEN normal_height='YES';
ELSE normal_height='NO';
IF distance<6000 THEN normal_distance='YES';
ELSE normal_distance='NO';
RUN;
PROC PRINT DATA=validation;
RUN;
/**Counting abnormal values**/
PROC FREQ DATA=validation;
TABLE normal_duration normal_speed_ground normal_speed_air normal_height
normal_distance;
RUN;
/**Since number of observations with abnormal values is low, they are deleted**/
DATA combined_new;
SET validation;
IF normal_duration='NO' THEN DELETE;
IF normal_speed_ground='NO' THEN DELETE;
IF normal_speed_air='NO' THEN DELETE;
IF normal_height='NO' THEN DELETE;
IF normal_distance='NO' THEN DELETE;
RUN;
PROC PRINT DATA=combined_new;
RUN;
/**Summarizing the distribution of each variable**/
PROC UNIVARIATE DATA=combined_new;
VAR duration no_pasg speed_ground speed_air height pitch distance;
RUN;
/**Summary statistics of cleaned data**/
PROC MEANS DATA=combined_new;
VAR duration no_pasg speed_ground speed_air height pitch distance;
RUN;
/**Renaming the data set**/
DATA flight;
SET combined_new;
RUN;
SAS Output
Figure 1: Checking for missing values in the combined data set
Figure 2: Checking for number of abnormal values in the variables
Figure 3: Distribution of duration (histogram; x-axis: duration, y-axis: percent)
Figure 4: Distribution of no_pasg (histogram; x-axis: no_pasg, y-axis: percent)
Figure 5: Distribution of speed_ground (histogram; x-axis: speed_ground, y-axis: percent)
Figure 6: Distribution of speed_air (histogram; x-axis: speed_air, y-axis: percent)
Figure 7: Distribution of height (histogram; x-axis: height, y-axis: percent)
Figure 8: Distribution of pitch (histogram; x-axis: pitch, y-axis: percent)
Figure 9: Distribution of distance (histogram; x-axis: distance, y-axis: percent)
Observations
• The variable ‘duration’ is missing from the FAA2 data set, and the import of FAA2.xls
contained 50 blank rows, i.e., rows without any data. These blank rows were deleted and the
two data sets were then combined by concatenation. The resulting table had 950 observations
and 8 variables.
• The combined data contained duplicate rows. These were removed using NODUPKEY with
speed_ground as the key, since the chance that two different landings have exactly the same
speed_ground value up to 10 decimal places is very low. In total, 100 duplicate rows were
removed.
• A check for missing values was done using PROC MEANS. Figure 1 shows that there are no
missing values in the columns no_pasg, speed_ground, height, pitch and distance. There are
50 missing values in the column duration, because that column was not present in FAA2.xls,
which had 50 unique observations/rows. The column speed_air, however, has a very large
number of missing values: 642 out of 850 values are missing.
• A validity check of the variables gives the following numbers of abnormal values:
Variable        Abnormal values   Missing values   Percent abnormal (of non-missing rows)
duration        5                 50               0.63
speed_ground    3                 0                0.35
speed_air       1                 642              0.48
height          10                0                1.18
distance        2                 0                0.24
Table 1: Count of abnormal values
• PROC UNIVARIATE is used to obtain the basic statistical measures and the distribution of
each variable, and a histogram was created for each variable. The distributions of duration,
no_pasg, speed_ground, height and pitch are almost symmetrical, while that of speed_air is
highly right skewed and that of distance is also right skewed.
Summary statistics of the cleaned data:
Figure 10: Summary statistics of cleaned data
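The duplicate check described in the observations keys on speed_ground alone. A more defensive variant, sketched here as an alternative rather than as the code behind the reported results, keys on every measurement the two files share and saves the dropped rows for inspection. The data set name dups is an assumption introduced for illustration, and duration is left out of the key because FAA2.xls does not contain it.

```sas
/* Sketch: remove duplicates using all shared variables as the key.       */
/* "dups" is a hypothetical data set name that receives the removed rows. */
PROC SORT DATA=combined OUT=sorted NODUPKEY DUPOUT=dups;
    BY aircraft no_pasg speed_ground speed_air height pitch distance;
RUN;

/* Inspect what was removed before trusting the result */
PROC PRINT DATA=dups;
RUN;
```

With the default EQUALS behavior of PROC SORT, the row kept in each group should be the first one in input order, which here would be the FAA1 row that carries a duration value.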
Conclusion
• Most values of speed_air are missing, so we need to determine whether this variable is
important for assessing the risk of a landing. If it is, the missing values may have to be
imputed; if it is not, the variable can be dropped from further analysis. The variable
‘duration’ also has 50 missing values. At the data preparation stage, however, it is better
to keep both variables and study their impact on landing distance. Imputation of the missing
values can be done at a later stage.
• Table 1 shows that only a few rows have abnormal values. These observations can be deleted,
since they make up a very small percentage of the total number of rows. After deleting them,
831 observations/rows remain.
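The flag-then-delete steps used in this chapter can also be collapsed into a single DATA step. The sketch below is an alternative formulation, not the code used for the results in this report, and flight_alt is a hypothetical data set name. Because SAS treats a missing numeric value as smaller than any number, missing duration and speed_air values are guarded explicitly so that those rows are kept, as they are in the original code.

```sas
/* Sketch: delete abnormal rows in one pass, keeping rows whose */
/* duration or speed_air is merely missing.                     */
DATA flight_alt;
    SET sorted;
    IF NOT MISSING(duration) AND duration < 40 THEN DELETE;
    IF speed_ground < 30 OR speed_ground > 140 THEN DELETE;
    IF NOT MISSING(speed_air) AND (speed_air < 30 OR speed_air > 140) THEN DELETE;
    IF height < 6 THEN DELETE;
    IF distance >= 6000 THEN DELETE;
RUN;

/* Row count and missing-value count of the cleaned data */
PROC MEANS DATA=flight_alt N NMISS;
    VAR duration no_pasg speed_ground speed_air height pitch distance;
RUN;
```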
Chapter 2: Descriptive Study
Goal: To explore relationship between each variable and landing distance and among the
variables.
SAS Code
/**Observing the relationship between landing distance and each variable using plots**/
PROC PLOT DATA=flight;
PLOT distance*duration;
PLOT distance*no_pasg;
PLOT distance*speed_ground;
PLOT distance*speed_air;
PLOT distance*height;
PLOT distance*pitch;
RUN;
/**Computing correlation coefficient with landing distance**/
PROC CORR DATA=flight;
VAR duration no_pasg speed_ground speed_air height pitch;
WITH distance;
TITLE Correlation Coefficients with landing distance;
RUN;
/**Computing the correlation coefficient for all pairs of variables**/
PROC CORR DATA=flight;
VAR distance duration no_pasg speed_ground speed_air height pitch;
TITLE Pairwise Correlation Coefficients;
RUN;
/**Observing relationship between landing distance, speed_ground and speed_air**/
PROC PLOT DATA=flight;
PLOT distance*speed_ground;
PLOT distance*speed_air;
PLOT speed_ground*speed_air;
PLOT distance*speed_ground='*' distance*speed_air='$'/overlay;
RUN;
SAS Output
Plots showing relationship between landing distance and each factor:
Figure 11: Plot of distance vs duration
Figure 12: Plot of distance vs no_pasg
Figure 13: Plot of distance vs speed_ground
Figure 14: Plot of distance vs speed_air
Figure 15: Plot of distance vs height
Figure 16: Plot of distance vs pitch
Table 2: Correlation coefficient with landing distance
Table 3: Pairwise correlation coefficients
Figure 17: Plot showing correlation between speed_ground and speed_air
Figure 18: Overlaying of plots
Observations
• The plots show that speed_ground and speed_air are strongly and positively correlated with
landing distance, while the other factors are only weakly correlated with distance. Table 2
confirms this: the correlation coefficients for speed_ground and speed_air are 0.86624 and
0.94210 respectively.
• The p-values in table 2 indicate that speed_ground, speed_air, height and pitch are
significant at the 0.05 level of significance.
• The pairwise correlation coefficient table shows that speed_ground and speed_air are
strongly and positively correlated with each other, with an r value of 0.987.
• Imputing the missing values of speed_air with the mean changes the correlation between
speed_air and distance, so I did not impute missing values for speed_air.
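For reference, mean imputation of the kind mentioned in the last observation could be tried as follows. This is a sketch and not the code used in this report: flight_imp is a hypothetical data set name, and PROC STDIZE with REPONLY is only one of several ways to replace missing values with the mean.

```sas
/* Sketch: replace missing speed_air values with the column mean.           */
/* REPONLY replaces missing values only and leaves other values unchanged.  */
PROC STDIZE DATA=flight OUT=flight_imp REPONLY METHOD=MEAN;
    VAR speed_air;
RUN;

/* Compare the correlation with distance before and after imputation */
PROC CORR DATA=flight;
    VAR speed_air;
    WITH distance;
RUN;
PROC CORR DATA=flight_imp;
    VAR speed_air;
    WITH distance;
RUN;
```

Only 203 of the 831 cleaned rows have a speed_air value, so roughly three quarters of the column would be filled with one constant. That tends to distort the observed correlation, which is consistent with the decision not to impute.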
Conclusion
• Since speed_ground and speed_air are highly correlated, they may represent the same
information, and we may have to drop one of them. We will decide after looking at the
regression analysis.
Chapter 3: Statistical Modeling
Goal: To fit a linear regression model
SAS Code
/**Regression Analysis including only speed_ground**/
PROC REG DATA=flight;
MODEL distance=speed_ground;
TITLE Regression Analysis of the data set;
RUN;
/**Regression Analysis including only speed_air**/
PROC REG DATA=flight;
MODEL distance=speed_air;
TITLE Regression Analysis of the data set;
RUN;
/**Regression Analysis including speed_ground and speed_air**/
PROC REG DATA=flight;
MODEL distance=speed_ground speed_air;
TITLE Regression Analysis of the data set;
RUN;
/**Regression Analysis including all the factors**/
PROC REG DATA=flight;
MODEL distance=duration no_pasg speed_ground speed_air height pitch/vif;
TITLE Regression Analysis of the data set;
RUN;
/**Regression Analysis including significant factors (speed_ground, height, pitch)**/
PROC REG DATA=flight;
MODEL distance=speed_ground height pitch;
TITLE Regression Analysis of the data set;
RUN;
/**Regression Analysis including significant factors (speed_air, height, pitch)**/
PROC REG DATA=flight;
MODEL distance=speed_air height pitch;
TITLE Regression Analysis of the data set;
RUN;
/**Computing the correlation coefficient for all pairs of variables in the final model**/
PROC CORR DATA=flight;
VAR distance speed_ground height pitch;
TITLE Pairwise Correlation Coefficients in the final model;
RUN;
/**Model Diagnostics for the final model (flight is the renamed copy of combined_new)**/
PROC REG DATA=flight;
MODEL distance=speed_ground height pitch/r;
OUTPUT OUT=diagnostics r=residual;
RUN;
SAS Output
Figure 19: Regression analysis including only speed_ground
Figure 20: Regression analysis including only speed_air
Figure 21: Regression analysis including speed_ground and speed_air
Figure 22: Regression analysis of data set
Figure 23: Regression analysis with speed_ground, height and pitch
Figure 24: Regression analysis with speed_air, height and pitch
Observations
• Regression analysis using only speed_ground gives a positive coefficient. We get:
Distance = -1773.941 + 41.44*speed_ground
• Regression analysis using only speed_air also gives a positive coefficient, but only 203
observations are used because of the large number of missing values in speed_air. We get:
Distance = -5444.71 + 79.532*speed_air
• When both speed_ground and speed_air are included, the coefficient of speed_ground becomes
negative, while the coefficient of speed_air remains positive and increases in value. The
standard errors of both speed_ground and speed_air also increase. Compiling the results in a
table, we get:
Model                     Parameter estimate   Parameter estimate   Standard error    Standard error
                          of speed_ground      of speed_air         of speed_ground   of speed_air
Speed_ground              41.44                -                    0.83017           -
Speed_air                 -                    79.532               -                 1.9968
Speed_ground, speed_air   -14.37               93.958               12.68367          12.88610
Again, only 203 observations are used because of the large number of missing values in speed_air.
We get:
Distance = -5462.283 - 14.37*speed_ground + 93.958*speed_air
The p-value in this model suggests that speed_ground is an insignificant factor, although on
its own it is clearly significant. This indicates multicollinearity.
• The p-values from the regression analysis of all factors show that duration and no_pasg
are insignificant for the model. They are dropped from further analysis. The p-value of
speed_ground again suggests that it is insignificant, but this is due to multicollinearity.
• Scenario 1: Considering speed_ground, height and pitch. Regression analysis with these
three factors gives:
Distance = -3039.75 + 42.06925*speed_ground + 13.49852*height + 200.93948*pitch
However, the value of adjusted R-square drops from 91.47 to 78.59 when we drop the factor
speed_air.
• Scenario 2: Considering speed_air, height and pitch. Regression analysis with these three
factors gives:
Distance = -6478.3942 + 80.79711*speed_air + 12.81754*height + 124.29384*pitch
The value of adjusted R-square remains almost the same.
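The adjusted R-square values of the two scenarios are computed on different numbers of rows, since any model containing speed_air uses only the 203 rows where it is present. One way to put the scenarios on the same footing, sketched here as a suggestion and not as part of the reported analysis, is to refit Scenario 1 on those same rows. The title text is made up for illustration.

```sas
/* Sketch: refit Scenario 1 using only the rows where speed_air is present, */
/* so that both scenarios are compared on the same observations.            */
PROC REG DATA=flight;
    WHERE NOT MISSING(speed_air);
    MODEL distance=speed_ground height pitch;
    TITLE 'Scenario 1 restricted to rows with speed_air';
RUN;
QUIT;
```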
Conclusion
Speed_air and speed_ground are highly correlated, and including both of them in the final model
might result in an unstable model and incorrect predictions. Speed_air can be considered as
speed_ground plus the speed of the wind. Both represent almost the same information, so it is
better to drop one of the variables from the final model.
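The degree of collinearity can be expressed with the variance inflation factor, which is what the /vif option in the Chapter 3 code reports. As a rough illustration only, assume a model with just these two predictors and take the pairwise correlation r = 0.987 from Chapter 2:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad
R_j^2 = r^2 = 0.987^2 \approx 0.97417
\;\Rightarrow\;
\mathrm{VIF} \approx \frac{1}{1 - 0.97417} \approx 38.7
```

Values above about 10 are commonly read as a sign of serious multicollinearity. The VIF values in the SAS output come from the six-factor model and will differ from this two-predictor approximation.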
Looking at the values of adjusted R-square for scenarios 1 and 2, it would seem better to drop
speed_ground and keep speed_air, since the model without speed_air has the lower adjusted
R-square. A lower adjusted R-square implies that the independent variables explain a smaller
percentage of the variation in the dependent variable. But we need to consider that speed_air
has a lot of missing values and that its adjusted R-square was calculated using only the 203
available observations. It is therefore better to go with scenario 1, i.e., to use
speed_ground, height and pitch in the final model.
Final model is:
Distance = -3039.75 + 42.06925*speed_ground + 13.49852*height + 200.93948*pitch
To reduce the risk of overrun, the values of speed_ground, height and pitch should be such
that the predicted distance is less than 6000 feet.
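As a worked illustration of how the final model could be used, the sketch below evaluates it for one hypothetical landing. The input values, the data set name predict and the variable names distance_hat and overrun_risk are all assumptions made up for this example and are not taken from the data.

```sas
/* Sketch: evaluate the final model for one hypothetical landing.      */
/* The input values below are made up for illustration only.           */
DATA predict;
    speed_ground = 100;  /* miles per hour */
    height       = 30;   /* meters */
    pitch        = 4;    /* degrees */
    distance_hat = -3039.75 + 42.06925*speed_ground
                   + 13.49852*height + 200.93948*pitch;
    /* 1 if the predicted distance reaches 6000 feet, otherwise 0 */
    overrun_risk = (distance_hat >= 6000);
RUN;

PROC PRINT DATA=predict;
RUN;
```

For these inputs the predicted landing distance is about 2,376 feet, well below the 6000-foot limit, so overrun_risk is 0.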
Figure 25: Diagnostics for final model

From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 

Statistical computing project

  • 1. Project Report Statistical Computing Rashmi Subrahmanya University of Cincinnati
  • 2. Contents
      Executive Summary
      Chapter 1: Data Preparation (Variable dictionary, SAS Code, SAS Output, Observations, Conclusion)
      Chapter 2: Descriptive Study (SAS Code, SAS Output, Observations, Conclusion)
      Chapter 3: Statistical Modeling (SAS Code, SAS Output, Observations, Conclusion)
  • 3. Executive Summary This project aims to understand which factors influence the landing distance of flights, in order to minimize the risk of overrun. Chapter 1 explores the provided data and cleans it by removing blank, duplicate and abnormal observations. It also briefly describes the variables considered in the project, their distributions and their summary statistics. The variables considered are duration of the flight, number of passengers, make of aircraft, speed of aircraft on ground, speed of aircraft in air, height and pitch of aircraft. Chapter 2 explores the relationships among the factors and between each factor and landing distance, which shows which factors are strongly correlated with landing distance. Chapter 3 uses regression analysis to identify the factors that are significant for landing distance and then fits a linear regression model based on those factors.
  • 4. Chapter 1: Data Preparation
      Goal: To explore and clean data
      Data: Landing data from 950 commercial flights
      Variable dictionary
      Aircraft: The make of an aircraft (Boeing or Airbus).
      Duration (in minutes): Flight duration between taking off and landing. The duration of a normal flight should always be greater than 40 min.
      No_pasg: The number of passengers in a flight.
      Speed_ground (in miles per hour): The ground speed of an aircraft when passing over the threshold of the runway. If its value is less than 30 MPH or greater than 140 MPH, the landing is considered abnormal.
      Speed_air (in miles per hour): The air speed of an aircraft when passing over the threshold of the runway. If its value is less than 30 MPH or greater than 140 MPH, the landing is considered abnormal.
      Height (in meters): The height of an aircraft when it is passing over the threshold of the runway. The landing aircraft is required to be at least 6 meters high at the threshold of the runway.
      Pitch (in degrees): Pitch angle of an aircraft when it is passing over the threshold of the runway.
      Distance (in feet): The landing distance of an aircraft. More specifically, it refers to the distance between the threshold of the runway and the point where the aircraft can be fully stopped. The length of the airport runway is typically less than 6000 feet.
      SAS Code
      /**Importing FAA1.xls**/
      PROC IMPORT DATAFILE="~/Classwork/Project/FAA1.xls"
          DBMS=xls OUT=work.faa1;
          GETNAMES=yes;
      RUN;
      PROC PRINT DATA=work.faa1;
  • 5. RUN;
      /**Importing FAA2.xls**/
      PROC IMPORT DATAFILE="~/Classwork/Project/FAA2.xls"
          DBMS=xls OUT=work.faa2;
          GETNAMES=yes;
      RUN;
      PROC PRINT DATA=work.faa2;
      RUN;
      /**Deleting blank rows which were imported in FAA2.xls**/
      DATA faa2;
          SET faa2;
          IF aircraft='' THEN DELETE;
      RUN;
      PROC PRINT DATA=faa2;
      RUN;
      /**Combining two data sets by concatenation**/
      DATA combined;
          SET faa1 faa2;
      RUN;
      PROC PRINT DATA=combined;
      RUN;
      /**Checking for Duplicate Rows**/
      PROC SORT DATA=combined OUT=sorted NODUPKEY;
          BY speed_ground;
      RUN;
      PROC PRINT DATA=sorted;
      RUN;
      /**Checking for missing values**/
      PROC SORT DATA=sorted OUT=sorted;
          BY aircraft;
      RUN;
      PROC MEANS DATA=sorted N NMISS MEAN RANGE;
          VAR duration no_pasg speed_ground speed_air height pitch distance;
      RUN;
  • 6. /**Checking for abnormal values**/
      /* duration and speed_air are numeric, so missing values are tested with . rather than ' ' */
      DATA validation;
          SET sorted;
          IF duration>=40 THEN normal_duration='YES';
          ELSE IF duration=. THEN normal_duration=' ';
          ELSE normal_duration='NO';
          IF speed_ground>=30 AND speed_ground<=140 THEN normal_speed_ground='YES';
          ELSE normal_speed_ground='NO';
          IF speed_air>=30 AND speed_air<=140 THEN normal_speed_air='YES';
          ELSE IF speed_air=. THEN normal_speed_air=' ';
          ELSE normal_speed_air='NO';
          IF height>=6 THEN normal_height='YES';
          ELSE normal_height='NO';
          IF distance<6000 THEN normal_distance='YES';
          ELSE normal_distance='NO';
      RUN;
      PROC PRINT DATA=validation;
      RUN;
      /**Counting abnormal values**/
      PROC FREQ DATA=validation;
          TABLE normal_duration normal_speed_ground normal_speed_air normal_height normal_distance;
      RUN;
      /**Since number of observations with abnormal values is low, they are deleted**/
      DATA combined_new;
          SET validation;
          IF normal_duration='NO' THEN DELETE;
          IF normal_speed_ground='NO' THEN DELETE;
          IF normal_speed_air='NO' THEN DELETE;
          IF normal_height='NO' THEN DELETE;
          IF normal_distance='NO' THEN DELETE;
      RUN;
      PROC PRINT DATA=combined_new;
      RUN;
      /**Summarizing the distribution of each variable**/
      PROC UNIVARIATE DATA=combined_new;
  • 7.     VAR duration no_pasg speed_ground speed_air height pitch distance;
      RUN;
      /**Summary statistics of cleaned data**/
      PROC MEANS DATA=combined_new;
          VAR duration no_pasg speed_ground speed_air height pitch distance;
      RUN;
      /**Renaming the data set**/
      DATA flight;
          SET combined_new;
      RUN;
      SAS Output
      Figure 1: Checking for missing values in the combined data set
  • 8. Figure 2: Checking for number of abnormal values in the variables
      Figure 3: Distribution of duration (histogram; x-axis: duration, y-axis: percent)
      Figure 4: Distribution of no_pasg (histogram; x-axis: no_pasg, y-axis: percent)
  • 9. Figure 5: Distribution of speed_ground (histogram; x-axis: speed_ground, y-axis: percent)
      Figure 6: Distribution of speed_air (histogram; x-axis: speed_air, y-axis: percent)
      Figure 7: Distribution of height (histogram; x-axis: height, y-axis: percent)
      Figure 8: Distribution of pitch (histogram; x-axis: pitch, y-axis: percent)
      Figure 9: Distribution of distance (histogram; x-axis: distance, y-axis: percent)
  • 10. Observations
      • The variable 'duration' is missing from the FAA2 data set, and FAA2 was imported with 50 blank rows, i.e., rows that did not contain any data. These blank rows were deleted and the two data sets were then combined by concatenation. The resulting table had 950 observations and 8 variables.
      • The combined table contained duplicate rows. These were removed using NODUPKEY with speed_ground as the key, since the chance that two different flights have speed_ground values that agree to 10 decimal places is very low. In total, 100 duplicate rows were removed.
      • A check for missing values was done using PROC MEANS. Figure 1 shows that there are no missing values in the columns no_pasg, speed_ground, height, pitch and distance. There are 50 missing values in the column duration, because that column was not present in FAA2.xls, which contributed 50 unique rows. The column speed_air, however, has a very large number of missing values: 642 out of 850.
      • A validity check of the variables gives the following numbers of abnormal values:
      Variable       Abnormal values   Missing values   Percent abnormal (of non-missing values)
      duration       5                 50               0.63
      speed_ground   3                 0                0.35
      speed_air      1                 642              0.48
      height         10                0                1.18
      distance       2                 0                0.24
      Table 1: Count of abnormal values
      • PROC UNIVARIATE is used to obtain the basic statistical measures and the distribution of each variable. Histograms were created for each variable. The distributions of duration, no_pasg, speed_ground, height and pitch are almost symmetrical, while that of speed_air is highly right skewed and that of distance is also right skewed. The summary statistics of the cleaned data are shown in Figure 10.
      Figure 10: Summary Statistics of cleaned data
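The duplicate check above relies on speed_ground alone as the key. As a cross-check, a minimal sketch along the following lines could use all variables shared by both files as the key and save the removed rows for inspection. The data set names sorted_alt and dups are hypothetical, and duration is left out of the key on the assumption that a flight duplicated across FAA1 and FAA2 would differ only in that column.

```sas
/* Sketch only: sorted_alt and dups are hypothetical data set names */
PROC SORT DATA=combined OUT=sorted_alt NODUPKEY DUPOUT=dups;
    /* duration is excluded because it is missing for every FAA2 row */
    BY aircraft no_pasg speed_ground speed_air height pitch distance;
RUN;

/* Rows that were dropped as duplicates */
PROC PRINT DATA=dups;
RUN;
```

If the number of rows in dups matches the count obtained with the single-variable key, that would support using speed_ground alone.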
  • 11. Conclusion
      • Since most speed_air values are missing, we need to determine whether this variable is important for assessing the risk of a landing overrun. The 'duration' variable also has 50 missing values. If these variables are important, we may have to impute the missing values; if not, they can be dropped from further analysis. However, it is better to keep the variables at the data preparation stage and study their impact on landing distance. Imputation of the missing values can be done at a later stage.
      • Table 1 shows that a few rows have abnormal values. Such observations can be deleted because they make up a very small percentage of the total number of rows. After deleting them, we are left with 831 observations.
      Chapter 2: Descriptive Study
      Goal: To explore the relationship between each variable and landing distance, and among the variables.
      SAS Code
      /**Observing the relationship between landing distance and each variable using plots**/
      PROC PLOT DATA=flight;
          PLOT distance*duration;
          PLOT distance*no_pasg;
          PLOT distance*speed_ground;
          PLOT distance*speed_air;
          PLOT distance*height;
          PLOT distance*pitch;
      RUN;
      /**Computing correlation coefficient with landing distance**/
      PROC CORR DATA=flight;
          VAR duration no_pasg speed_ground speed_air height pitch;
          WITH distance;
          TITLE "Correlation Coefficients with landing distance";
      RUN;
      /**Computing the correlation coefficient for all pairs of variables**/
      PROC CORR DATA=flight;
          VAR distance duration no_pasg speed_ground speed_air height pitch;
          TITLE "Pairwise Correlation Coefficients";
      RUN;
      /**Observing relationship between landing distance, speed_ground and speed_air**/
  • 12. PROC PLOT DATA=flight;
          PLOT distance*speed_ground;
          PLOT distance*speed_air;
          PLOT speed_ground*speed_air;
          PLOT distance*speed_ground='*' distance*speed_air='$' / OVERLAY;
      RUN;
      SAS Output
      Plots showing relationship between landing distance and each factor:
      Figure 11: Plot of distance vs duration
      Figure 12: Plot of distance vs no_pasg
      Figure 13: Plot of distance vs speed_ground
      Figure 14: Plot of distance vs speed_air
  • 13. Figure 15: Plot of distance vs height
      Figure 16: Plot of distance vs pitch
      Table 2: Correlation coefficient with landing distance
  • 14. Table 3: Pairwise correlation coefficients
      Figure 17: Plot showing correlation between speed_ground and speed_air
      Figure 18: Overlaying of plots
      Observations
      • The plots show that speed_ground and speed_air are strongly and positively correlated with landing distance, while the other factors are weakly correlated with distance. Table 2 confirms this: the correlation coefficients for speed_ground and speed_air are 0.86624 and 0.94210 respectively.
  • 15. • The p-values in Table 2 indicate that speed_ground, speed_air, height and pitch are significant at the 0.05 level of significance.
      • The pairwise correlation coefficient table shows that speed_ground and speed_air are strongly and positively correlated with each other, with an r value of 0.987.
      • Imputing the missing values of speed_air with the mean changes the correlation between speed_air and distance, so the missing values of speed_air were not imputed.
      Conclusion
      • Since speed_ground and speed_air are highly correlated, they may represent the same information and we may have to drop one of them. We will decide after looking at the regression analysis.
      Chapter 3: Statistical Modeling
      Goal: To fit a linear regression model
      SAS Code
      /**Regression Analysis including only speed_ground**/
      PROC REG DATA=flight;
          MODEL distance=speed_ground;
          TITLE "Regression Analysis of the data set";
      RUN;
      /**Regression Analysis including only speed_air**/
      PROC REG DATA=flight;
          MODEL distance=speed_air;
          TITLE "Regression Analysis of the data set";
      RUN;
      /**Regression Analysis including speed_ground and speed_air**/
      PROC REG DATA=flight;
          MODEL distance=speed_ground speed_air;
          TITLE "Regression Analysis of the data set";
      RUN;
      /**Regression Analysis including all the factors**/
      PROC REG DATA=flight;
          MODEL distance=duration no_pasg speed_ground speed_air height pitch / VIF;
          TITLE "Regression Analysis of the data set";
      RUN;
  • 16. /**Regression Analysis including significant factors (speed_ground, height, pitch)**/
      PROC REG DATA=flight;
          MODEL distance=speed_ground height pitch;
          TITLE "Regression Analysis of the data set";
      RUN;
      /**Regression Analysis including significant factors (speed_air, height, pitch)**/
      PROC REG DATA=flight;
          MODEL distance=speed_air height pitch;
          TITLE "Regression Analysis of the data set";
      RUN;
      /**Computing the correlation coefficient for all pairs of variables in the final model**/
      PROC CORR DATA=flight;
          VAR distance speed_ground height pitch;
          TITLE "Pairwise Correlation Coefficients in the final model";
      RUN;
      /**Model Diagnostics**/
      PROC REG DATA=combined_new;
          MODEL distance=speed_ground height pitch / R;
          OUTPUT OUT=diagnostics R=residual;
      RUN;
      SAS Output
  • 17. Figure 19: Regression analysis including only speed_ground
      Figure 20: Regression analysis including only speed_air
      Figure 21: Regression analysis including speed_ground and speed_air
      Figure 22: Regression analysis of data set
  • 18. Figure 23: Regression analysis with speed_ground, height and pitch
      Figure 24: Regression analysis with speed_air, height and pitch
  • 19. Observations
      • Regression on speed_ground alone gives a positive coefficient:
        Distance = -1773.941 + 41.44*speed_ground
      • Regression on speed_air alone also gives a positive coefficient, but only 203 observations are used because of the large number of missing values in speed_air:
        Distance = -5444.71 + 79.532*speed_air
      • When both speed_ground and speed_air are included, the coefficient of speed_ground becomes negative and decreases in value, while the coefficient of speed_air remains positive and increases in value. The standard errors of both speed_ground and speed_air also increase. Compiling the results in a table, we get:
        Model                      Estimate: speed_ground   Estimate: speed_air   Std. error: speed_ground   Std. error: speed_air
        speed_ground only          41.44                    -                     0.83017                    -
        speed_air only             -                        79.532                -                          1.9968
        speed_ground, speed_air    -14.37                   93.958                12.68367                   12.88610
        Again, only 203 observations are used because of the missing values in speed_air. We get:
        Distance = -5462.283 - 14.37*speed_ground + 93.958*speed_air
        In this model the p-value of speed_ground suggests that it is an insignificant factor, even though it is clearly significant when it is the only predictor. This indicates multicollinearity.
      • The p-values from the regression on all factors show that duration and no_pasg are insignificant for the model, so they are dropped from further analysis. The p-value of speed_ground also indicates that it is insignificant, but this is due to multicollinearity.
      • Scenario 1: Considering speed_ground, height and pitch. Regression with these three factors gives:
        Distance = -3039.75 + 42.06925*speed_ground + 13.49852*height + 200.93948*pitch
        However, the value of adjusted r-square reduces from 91.47 to 78.59 when we drop the factor speed_air.
      • Scenario 2: Considering speed_air, height and pitch. Regression with these three factors gives:
        Distance = -6478.3942 + 80.79711*speed_air + 12.81754*height + 124.29384*pitch
        The value of adjusted r-square is almost the same.
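The multicollinearity noted above can be quantified with the variance inflation factor. The worked calculation below is a sketch that assumes the two-predictor model (speed_ground and speed_air only) and uses the rounded correlation r = 0.987 from the pairwise correlation table, so the result is approximate. In a model with exactly two predictors, each has the same VIF:

```latex
\mathrm{VIF} = \frac{1}{1 - r^{2}}
             = \frac{1}{1 - 0.987^{2}}
             = \frac{1}{1 - 0.974169}
             = \frac{1}{0.025831}
             \approx 38.7,
\qquad
\sqrt{\mathrm{VIF}} \approx 6.2
```

A VIF of this size is far above the commonly used rule-of-thumb threshold of 10. The square root, about 6.2, is of the same order as the observed growth in the standard error of speed_air between the single-predictor and two-predictor models (from 1.9968 to 12.88610, a factor of roughly 6.5), which is consistent with the instability described above.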
  • 20. Conclusion
      Speed_air and speed_ground are highly correlated, and including both of them in the final model might result in an unstable model and incorrect predictions. Speed_air can be considered as speed_ground plus the speed of the wind. Both represent almost the same information, so it is better to drop one of them in the final model.
      Looking at the adjusted r-square values for scenarios 1 and 2, it seems better to keep speed_air and drop speed_ground, since dropping speed_air reduces the adjusted r-square. A lower adjusted r-square means that the independent variables explain a smaller percentage of the variation in the dependent variable. However, speed_air has a lot of missing values, and the adjusted r-square for the models that include it was calculated using only the 203 available observations. It is therefore better to go with scenario 1, i.e., to use speed_ground, height and pitch in the final model.
      Final model is:
      Distance = -3039.75 + 42.06925*speed_ground + 13.49852*height + 200.93948*pitch
      To reduce the risk of overrunning the runway, the values of speed_ground, height and pitch should be such that the predicted distance is less than 6000 feet.
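As an illustration of how the final model could be applied, the sketch below scores a single landing in a DATA step. The input values (speed_ground = 100, height = 30, pitch = 4) are hypothetical and chosen only for illustration, as are the names predicted, distance_hat and overrun_risk; the coefficients are those of the final model above.

```sas
/* Sketch only: inputs are hypothetical, not taken from the FAA data */
DATA predicted;
    speed_ground = 100;  /* miles per hour */
    height = 30;         /* meters */
    pitch = 4;           /* degrees */
    /* Final model from the report */
    distance_hat = -3039.75 + 42.06925*speed_ground
                   + 13.49852*height + 200.93948*pitch;
    /* 1 if the predicted distance reaches the 6000 ft runway limit, else 0 */
    overrun_risk = (distance_hat >= 6000);
RUN;

PROC PRINT DATA=predicted;
RUN;
```

For these inputs the predicted distance works out to about 2375.89 feet, well under 6000 feet, so overrun_risk is 0. Predictions for inputs outside the ranges seen in the cleaned data should be treated with caution.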
  • 21. Figure 25: Diagnostics for final model
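The diagnostics data set written by the Model Diagnostics step contains the residuals of the final model in the variable residual. A minimal sketch of one further check, assuming that data set exists as created above, is to test the residuals for normality and plot them:

```sas
/* Sketch only: assumes the diagnostics data set from the Model Diagnostics step */
PROC UNIVARIATE DATA=diagnostics NORMAL;
    VAR residual;
    /* Histogram with a fitted normal curve */
    HISTOGRAM residual / NORMAL;
    /* Q-Q plot against a normal with estimated mean and standard deviation */
    QQPLOT residual / NORMAL(MU=EST SIGMA=EST);
RUN;
```

Marked departures from the reference line in the Q-Q plot, or small p-values in the normality tests, would suggest that the linear model's error assumptions deserve a closer look.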