Statistical Computing Final Project
Pranil Deone, MSBANA, M12412774
SUMMARY
The goal of this project is to identify the factors that affect the landing distance of a commercial flight and to study their effects. The motivation behind the study is to reduce the risk of landing overrun. We perform our analysis on two data sets, FAA1 and FAA2, following the usual steps of data analysis: data cleaning, data exploration, data visualization, modeling and model checking. During the data cleaning stage we remove blank records, duplicate observations and abnormal values. After studying the distribution of every variable (distance, duration, speed air, speed ground, height, pitch and number of passengers), we run a correlation analysis on all the variables to determine the relationships between them. We observe that ‘Distance’ is highly correlated with ‘speed air’ and ‘speed ground’. The regression analysis shows that speed ground, height and pitch significantly affect landing distance. We also determine which factors are significant when each aircraft make is considered separately, and conclude that for Boeing the predictor ‘pitch’ is not significant. A final model for the dependent variable ‘Distance’ is obtained in terms of the predictors ‘speed ground’, ‘height’ and ‘pitch’. Finally, we perform model diagnostics on the derived model.
Importing the data into SAS:
CODE:
Combining the data sets from different sources:
Rows in which every value is blank are also deleted in this step.
CODE:
OUTPUT:
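The combining step can be sketched in Python (the project itself used SAS, and that code is not reproduced here). This is a minimal illustration, assuming each data set is a list of dictionaries with None marking an empty cell; the column names echo the report's variables, the `aircraft` column name is hypothetical, and the sample rows are made up.

```python
# Hypothetical stand-ins for FAA1 and FAA2: each row is a dict, None marks an empty cell.
faa1 = [
    {"aircraft": "boeing", "speed_ground": 95.0, "height": 30.0, "distance": 2100.0},
    {"aircraft": None, "speed_ground": None, "height": None, "distance": None},  # all blank
]
faa2 = [
    {"aircraft": "airbus", "speed_ground": 88.0, "height": 25.0, "distance": 1500.0},
]


def is_blank(row):
    """True when every field in the row is missing (None or an empty string)."""
    return all(value is None or value == "" for value in row.values())


def combine(*datasets):
    """Stack the data sets vertically and drop rows in which every field is blank."""
    combined = []
    for dataset in datasets:
        combined.extend(row for row in dataset if not is_blank(row))
    return combined


faa = combine(faa1, faa2)
print(len(faa))  # 2
```

Partially filled rows are kept; only rows with no information at all are dropped at this stage.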
Removing the Duplicates from the data sets:
In this step, all duplicate rows (rows whose values are identical in every variable) are removed from the data set.
CODE:
OUTPUT:
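A minimal Python sketch of the duplicate-removal step, assuming a row counts as a duplicate only when every variable matches an earlier row. The helper name `drop_duplicates` is hypothetical and the rows are made up.

```python
def drop_duplicates(rows):
    """Remove rows identical in every variable, keeping the first occurrence in order."""
    seen = set()
    unique = []
    for row in rows:
        # Sort by column name so the key does not depend on dict insertion order.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"speed_ground": 95.0, "distance": 2100.0},
    {"distance": 2100.0, "speed_ground": 95.0},  # same values, different column order
    {"speed_ground": 88.0, "distance": 1500.0},
]
unique = drop_duplicates(rows)
print(len(unique))  # 2
```

If duplicates were instead defined on a subset of key variables, the key would be built from those columns only.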
Performing the completeness check of each variable – examine if missing values are present:
From the output below, we can see that the variable ‘duration’ has 50 null observations and the variable ‘speed_air’ has 642 null observations. The variables ‘duration’ and ‘speed_air’ are crucial for the analysis, as they directly affect the final goal of our study. So, at the data cleaning stage we do not delete these variables or the observations with missing values; we preserve them for later study and analysis.
CODE:
OUTPUT:
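The completeness check amounts to counting missing cells per variable. The Python sketch below assumes missing values are stored as None (the rows are made up). It also shows that 642 missing values out of the 850 observations that remain after removing blank and duplicate rows is consistent with the 75.53% quoted later for speed_air.

```python
def count_missing(rows, variables):
    """Count missing values (None, or the column absent from the row) for each variable."""
    return {var: sum(1 for row in rows if row.get(var) is None) for var in variables}


rows = [
    {"duration": 150.0, "speed_air": None},
    {"duration": None, "speed_air": 101.0},
    {"speed_air": None},  # a row without a duration key also counts as missing
]
missing = count_missing(rows, ["duration", "speed_air"])
print(missing)  # {'duration': 2, 'speed_air': 2}

# Share of missing speed_air values reported in this project: 642 of 850 rows.
print(round(100 * 642 / 850, 2))  # 75.53
```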
Performing the validity check of each variable – examine if abnormal values are present:
CODE:
OUTPUT:
From the above output, it can be seen that there are 19 observations with abnormal values, which constitute only 2.24% of the data set. As this percentage is very low, we can move these observations to a separate data set and delete them from the main data set.
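The 2.24% figure can be checked against counts reported elsewhere in this report: the final model uses 831 observations, so 831 + 19 = 850 observations remained after the blank and duplicate rows were removed, and 19 out of 850 is about 2.24%.

```python
n_final = 831                    # observations used to fit the final model
n_abnormal = 19                  # observations flagged as abnormal
n_before = n_final + n_abnormal  # rows left after removing blank and duplicate rows

share = 100 * n_abnormal / n_before
print(n_before, round(share, 2))  # 850 2.24
```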
Separating the abnormal values into another data set:
In this step, we create a data set ‘Abnormal’ containing all observations with abnormal values, which can be used later in the analysis or for testing the model.
CODE:
OUTPUT:
Removing the abnormal values from the main data set:
CODE:
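Separating and then removing the abnormal observations can be sketched as a single split in Python. The report does not list its exact validity criteria, so the ranges in `VALID_RANGES` below are hypothetical placeholders, not the rules used in the project. Missing values are deliberately not treated as abnormal, in line with the decision to keep observations with missing values.

```python
# Hypothetical placeholder ranges; the report does not state the criteria it used.
VALID_RANGES = {
    "speed_ground": (30.0, 140.0),
    "height": (6.0, float("inf")),
    "distance": (0.0, 6000.0),
}


def is_abnormal(row, ranges=VALID_RANGES):
    """True if any non-missing value lies outside its allowed range."""
    for var, (low, high) in ranges.items():
        value = row.get(var)
        if value is not None and not (low <= value <= high):
            return True
    return False


def split_abnormal(rows, ranges=VALID_RANGES):
    """Return (main, abnormal); abnormal rows are kept aside rather than discarded."""
    main = [row for row in rows if not is_abnormal(row, ranges)]
    abnormal = [row for row in rows if is_abnormal(row, ranges)]
    return main, abnormal


rows = [
    {"speed_ground": 95.0, "height": 30.0, "distance": 2100.0},
    {"speed_ground": 150.0, "height": 30.0, "distance": 2500.0},  # speed out of range
    {"speed_ground": 90.0, "height": -2.0, "distance": 1800.0},   # negative height
    {"speed_ground": 90.0, "height": None, "distance": 1800.0},   # missing, not abnormal
]
main, abnormal = split_abnormal(rows)
print(len(main), len(abnormal))  # 2 2
```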
Summarizing the distribution of each variable:
We use the UNIVARIATE procedure to summarize the distribution of each variable. Descriptive measures such as the mean, median, mode, standard deviation, variance, skewness, kurtosis, range and interquartile range help us understand the distribution of the data. A histogram has also been plotted for each variable to summarize and visualize its distribution.
From the values of skewness and kurtosis we can infer the following:
The variables ‘duration’, ‘no_pasg’, ‘speed_ground’, ‘height’ and ‘pitch’ are almost symmetrically distributed and approximately follow a normal distribution. The variable ‘height’ is slightly skewed to the right, and the variable ‘pitch’ has heavier tails.
The variable ‘speed_air’ is skewed to the right.
The variable ‘distance’ is heavily skewed to the right.
CODE:
OUTPUT:
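The descriptive measures listed above can be computed with Python's standard `statistics` module, as a minimal sketch. One assumption to note: skewness and kurtosis here are the simple moment-based versions, whereas SAS's default skewness and kurtosis include small-sample adjustments, so the values will differ slightly for small samples.

```python
import statistics


def summarize(values):
    """Descriptive measures for one variable, ignoring missing (None) values."""
    x = [v for v in values if v is not None]
    n = len(x)
    mean = statistics.mean(x)
    q1, _, q3 = statistics.quantiles(x, n=4)  # quartiles (default 'exclusive' method)
    # Central moments for moment-based skewness and excess kurtosis.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(x),
        "std_dev": statistics.stdev(x),     # sample standard deviation (n - 1)
        "variance": statistics.variance(x),
        "range": max(x) - min(x),
        "iqr": q3 - q1,
        "skewness": m3 / m2 ** 1.5,          # > 0 means a longer right tail
        "excess_kurtosis": m4 / m2 ** 2 - 3, # > 0 means heavier tails than normal
    }


print(summarize([1.0, 2.0, 3.0, 4.0, 5.0])["iqr"])  # 3.0
```

A right-skewed variable such as ‘distance’ would show a positive skewness and a mean above its median.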
Establishing the relation of Speed_Air with other variables:
The variable speed_air has 75.53% missing data. As this variable has a significant impact on the landing distance, we try to predict its missing values from other variables such as speed_ground, height, pitch and duration.
CODE:
OUTPUT:
INTERPRETATION:
From the above table we can see that speed_air has a high positive correlation with speed_ground, but negligible correlation with the other variables. Hence, we can try to predict the value of speed_air from speed_ground.
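A minimal Python sketch of the Pearson correlation used here. Because speed_air is mostly missing, the function uses only the pairs in which both values are present (pairwise deletion); that this matches how the report's correlation output handled missing values is an assumption. The example data are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation using only the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# The pair containing None is skipped, so the remaining points lie on a line.
print(pearson([1.0, 2.0, 3.0, None], [2.0, 4.0, 6.0, 8.0]))  # 1.0
```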
Predicting the value of Speed_Air:
We run a simple linear regression with speed_air as the dependent variable and speed_ground as the independent variable.
CODE:
OUTPUT:
INTERPRETATION:
The p-value from the analysis of variance shows that the independent variable ‘speed_ground’ can reliably predict the dependent variable ‘speed_air’. The R-square value indicates that about 98% of the variance in speed_air can be explained by speed_ground. The p-value from the parameter estimates suggests that the estimate for speed_ground is significantly different from zero. The model for speed_air is given as:
Speed_Air = 0.9754 (Speed_Ground) + 2.64036
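The simple linear regression can be sketched with the closed-form least-squares estimates, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). Rows with a missing value in either variable are skipped, so a fit like the one above uses only flights that recorded speed_air. The function name is hypothetical and the example data are made up.

```python
def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)
    ss_tot = sum((y - my) ** 2 for _, y in pairs)
    return intercept, slope, 1 - ss_res / ss_tot


# The last pair has a missing x and is ignored; the rest satisfy y = 2 + 3x exactly.
intercept, slope, r2 = fit_simple_ols([1, 2, 3, 4, None], [5, 8, 11, 14, 20])
print(intercept, slope, r2)  # 2.0 3.0 1.0
```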
Imputing the value of Speed_Air:
CODE:
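Imputation with the fitted equation Speed_Air = 0.9754 (Speed_Ground) + 2.64036 can be sketched as follows. The coefficients are the ones reported in this section; the function name is hypothetical. Observed speed_air values are left unchanged and only missing ones are filled.

```python
# Coefficients of the fitted model reported above: Speed_Air = 0.9754 * Speed_Ground + 2.64036
INTERCEPT = 2.64036
SLOPE = 0.9754


def impute_speed_air(rows):
    """Return copies of the rows with missing speed_air filled from speed_ground."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the original data set is left untouched
        if new_row.get("speed_air") is None and new_row.get("speed_ground") is not None:
            new_row["speed_air"] = INTERCEPT + SLOPE * new_row["speed_ground"]
        filled.append(new_row)
    return filled


rows = [
    {"speed_ground": 100.0, "speed_air": None},   # missing: imputed
    {"speed_ground": 100.0, "speed_air": 104.0},  # observed: kept as is
]
filled = impute_speed_air(rows)
print(round(filled[0]["speed_air"], 5), filled[1]["speed_air"])  # 100.18036 104.0
```

One design caveat: filling about three quarters of speed_air with exact predictions from speed_ground makes the two variables nearly collinear and understates the natural scatter of speed_air around the regression line.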
Establishing relation of Distance with other variables:
The purpose of this study is to model the dependent variable ‘Distance’ in terms of the independent variables duration, speed_air, speed_ground, pitch, height and no_pasg. We calculate the correlation matrix of all these variables to determine the relationships between them.
CODE:
OUTPUT:
CODE:
OUTPUT:
INTERPRETATION:
From the correlation matrix we observe that the correlation coefficients between ‘distance’ and ‘speed_air’ and between ‘distance’ and ‘speed_ground’ are high. The correlation of distance with the other variables is extremely small, which indicates that none of them has a strong linear relationship with distance on its own. The same is evident from the X-Y plots shown above. Also, the correlation coefficients between the other pairs of variables are extremely low, so we conclude that there is no appreciable linear relationship between them.
Creating a Model for Distance:
CODE:
OUTPUT:
INTERPRETATION:
From the above regression analysis we can see that 79% of the variance in Distance can be explained by the independent variables.
However, the p-values of speed_ground, duration and pitch indicate that the parameter estimates for these variables are not statistically significant in this model.
The p-value for speed_air is significant, but as about 75% of the values of speed_air were predicted from speed_ground, this significance is not meaningful on its own: the two variables are almost perfectly collinear, so the model cannot separate their effects. Since speed_air is modelled using speed_ground, we conclude that speed_ground has a significant impact on distance, and we will consider this variable (and not speed_air) along with height and pitch for modelling.
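The collinearity problem can be quantified with the variance inflation factor, VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others. As a rough illustration, assuming the R-square of about 98% from the speed_air-on-speed_ground regression is representative of the imputed data, the VIF is about 50, so the standard errors of these coefficients are inflated by a factor of about √50 ≈ 7. A common rule of thumb treats VIF values above 10 as a concern.

```python
import math


def vif(r_squared):
    """Variance inflation factor for a predictor with the given R^2 on the other predictors."""
    return 1.0 / (1.0 - r_squared)


r2 = 0.98  # approximate R-square of speed_air on speed_ground from this report
print(round(vif(r2), 1))             # 50.0
print(round(math.sqrt(vif(r2)), 2))  # 7.07: factor by which standard errors grow
```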
Revised Model for Distance: (Considering both Aircraft makes)
CODE:
OUTPUT:
INTERPRETATION:
The p-value from the analysis of variance shows that the model as a whole is significant, i.e. the independent variables jointly predict the dependent variable ‘Distance’. The R-square value indicates that about 79% of the variance in ‘Distance’ can be explained by the selected predictors. The p-values from the parameter estimates suggest that the estimates for all predictors are significantly different from zero.
The model for Distance can be given as:
Distance = (-3039.75) + (42.06925)*(speed_ground) + (13.49852)*(height) + (200.93948)*(pitch)
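The fitted equation can be turned into a prediction function. This is a sketch using the coefficients reported above; the input values in the example are hypothetical, and predictions are only trustworthy within the range of the observed data.

```python
# Coefficients of the combined (Airbus and Boeing) model reported above.
COEF = {
    "intercept": -3039.75,
    "speed_ground": 42.06925,
    "height": 13.49852,
    "pitch": 200.93948,
}


def predict_distance(speed_ground, height, pitch, coef=COEF):
    """Predicted landing distance from the fitted linear model."""
    return (coef["intercept"]
            + coef["speed_ground"] * speed_ground
            + coef["height"] * height
            + coef["pitch"] * pitch)


# Hypothetical inputs for illustration only.
print(round(predict_distance(100.0, 30.0, 4.0), 2))  # 2375.89
```

Because the model is linear, a one-unit increase in speed_ground adds about 42.07 to the predicted distance with height and pitch held fixed.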
Revised Regression analysis for Distance: (Separately for Airbus and Boeing)
CODE:
OUTPUT:
INTERPRETATION:
From the above tables we can conclude the following:
1. For Airbus, the significant predictors remain the same (speed_ground, height and pitch).
2. For Boeing, the variable ‘pitch’ is not significant and does not significantly affect ‘Distance’.
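Fitting the same model separately for each make can be sketched with the normal equations (X'X)b = X'y solved by Gaussian elimination. This is an illustration only: it returns coefficient estimates but not the standard errors and p-values needed to judge whether pitch is significant, the column names (including `make`) are assumed, and the example data are synthetic.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, response, predictors):
    """Multiple linear regression (with intercept) via the normal equations."""
    X = [[1.0] + [row[p] for p in predictors] for row in rows]
    y = [row[response] for row in rows]
    k = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(k)]
    return dict(zip(["intercept"] + predictors, solve(xtx, xty)))


def fit_by_group(rows, group_var, response, predictors):
    """Fit the same model separately within each level of group_var (e.g. aircraft make)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_var], []).append(row)
    return {level: fit_ols(g, response, predictors) for level, g in groups.items()}


# Synthetic example: distance depends on pitch for one make but not the other.
points = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 1, 0), (0, 2, 3)]
rows = []
for sg, h, p in points:
    rows.append({"make": "airbus", "speed_ground": sg, "height": h, "pitch": p,
                 "distance": 100 + 2 * sg + 3 * h + 5 * p})
    rows.append({"make": "boeing", "speed_ground": sg, "height": h, "pitch": p,
                 "distance": 50 + 1 * sg + 4 * h})
fits = fit_by_group(rows, "make", "distance", ["speed_ground", "height", "pitch"])
```

Solving the normal equations is simple but less numerically stable than a QR decomposition, which matters most when predictors are highly correlated.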
Model Diagnostics:
CODE:
OUTPUT:
INTERPRETATION:
The mean of the residuals is 0, as is always the case for a least-squares fit that includes an intercept. The Shapiro-Wilk test statistic in the Tests for Normality table above indicates that the distribution of the residuals is close to normal.
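A sketch of why a zero residual mean is guaranteed rather than informative: for any least-squares line with an intercept, the residuals sum to zero. The data below are made up.

```python
def residuals(xs, ys):
    """Residuals y - (a + b x) from a least-squares line fitted to the same data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [60.0, 75.0, 90.0, 105.0, 120.0]
ys = [700.0, 1100.0, 1900.0, 2800.0, 3900.0]  # made-up, deliberately curved data
res = residuals(xs, ys)
print(res)                  # [240.0, -170.0, -180.0, -90.0, 200.0]
print(sum(res) / len(res))  # 0.0
```

In these made-up data the residuals run positive, negative, negative, negative, positive, a curved pattern that a zero mean does not reveal; a residual plot or a normality test is still needed. If SciPy is available, `scipy.stats.shapiro(res)` returns the Shapiro-Wilk statistic and p-value.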
How many observations do you use to fit your final model? If not all 950 flights, why?
We use 831 observations to fit our final model. After deleting the blank rows, the duplicate observations and the observations containing abnormal values from the complete data set, we are left with 831 observations.
The variables speed_air and duration have missing values. However, neither is included in the final model: duration was not a significant predictor of distance, and the effect of speed_air is captured by speed_ground.
What factors impact the landing distance of the flight, and how?
The factors speed_ground, height and pitch impact the landing distance. All three have positive coefficients, so the predicted landing distance increases as each of them increases, holding the others constant.
Is there any difference between the two makes, Boeing and Airbus?
When we consider each aircraft make separately, the factors affecting the landing distance remain the same, with one exception: for Boeing, the factor ‘pitch’ becomes insignificant and does not significantly affect the landing distance.
