BANA 6043 Project
NAME: AYANK GUPTA UCID:M12388639
Background: Flight landing.
Motivation: To reduce the risk of landing overrun.
Goal: To study which factors affect the landing distance of a commercial flight, and how.
Data: Landing data (landing distance and other parameters) from 950 commercial flights (not a real data set, but simulated from statistical models). See the two Excel files ‘FAA-1.xls’ (800 flights) and ‘FAA-2.xls’ (150 flights).
Chapter 1: Data Preparation
1. Combining the data sets from the two sources
Output of both imports
/**FAA1**/
/**FAA2**/
/** Combining both data sets **/
/*Checking for duplicates and removing them from the combined dataset*/
Note: We found 100 duplicate entries in the combined dataset and removed them, leaving 850 flights.
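The combining and de-duplication code is not reproduced in this text. The following is a minimal Python sketch of the same idea, not the code actually used: it assumes each file has been loaded as a list of dicts, and it treats two rows as duplicates when they agree on every column the files share. The column names in KEY_COLS (including `aircraft`) are hypothetical placeholders.

```python
from typing import Any

Row = dict[str, Any]


def combine_and_dedup(faa1: list[Row], faa2: list[Row], key_cols: list[str]) -> list[Row]:
    """Stack FAA-1 and FAA-2, keeping the first copy of each duplicated flight.

    A row counts as a duplicate when it matches an earlier row on every
    column in key_cols. FAA-1 rows are scanned first, so when a flight
    appears in both files, the FAA-1 copy is the one kept.
    """
    seen: set[tuple] = set()
    combined: list[Row] = []
    for row in faa1 + faa2:
        key = tuple(row.get(col) for col in key_cols)
        if key in seen:
            continue  # duplicate of a flight already kept
        seen.add(key)
        combined.append(row)
    return combined


# Hypothetical key: every shared column except duration.
KEY_COLS = ["aircraft", "no_pasg", "speed_ground", "speed_air", "height", "pitch", "distance"]

if __name__ == "__main__":
    faa1 = [{"aircraft": "boeing", "no_pasg": 60, "speed_ground": 80.0, "speed_air": None,
             "height": 30.0, "pitch": 4.0, "distance": 1500.0, "duration": 150.0}]
    faa2 = [dict(faa1[0], duration=None),  # same flight, duration missing
            {"aircraft": "airbus", "no_pasg": 55, "speed_ground": 70.0, "speed_air": None,
             "height": 25.0, "pitch": 3.8, "distance": 1100.0, "duration": None}]
    rows = combine_and_dedup(faa1, faa2, KEY_COLS)
    print(len(rows))  # 2
```

Because FAA-1 rows are scanned first, in the example the FAA-1 copy of the repeated flight (the one with a duration value) is the one that survives.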
2. Performing the completeness check of each variable (examining whether missing values are present)
Variable       N     Missing   % Missing
duration       800   50        5.9%
no_pasg        850   0         0%
speed_ground   850   0         0%
speed_air      208   642       75.5%
height         850   0         0%
pitch          850   0         0%
distance       850   0         0%
Note:
1. About 5.9% of the duration values are missing: the 50 missing values all belong to flights from the FAA-2 data set.
2. About 75% of the speed_air values are missing, so this column needs further examination during data cleaning.
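The percentages in the table follow directly from the counts (50 / 850 ≈ 5.9% and 642 / 850 ≈ 75.5%). A small Python sketch of the completeness check, assuming the combined data is a list of dicts in which a missing value is stored as None (an assumption about how the data was loaded):

```python
from typing import Any

Row = dict[str, Any]


def missing_summary(rows: list[Row], columns: list[str]) -> dict[str, tuple[int, int, float]]:
    """Return {column: (n_present, n_missing, pct_missing)} for each column.

    A value counts as missing when the key is absent or the value is None.
    """
    total = len(rows)
    summary: dict[str, tuple[int, int, float]] = {}
    for col in columns:
        n_missing = sum(1 for row in rows if row.get(col) is None)
        pct = 100.0 * n_missing / total if total else 0.0
        summary[col] = (total - n_missing, n_missing, pct)
    return summary


if __name__ == "__main__":
    # Synthetic rows that reproduce only the counts in the table (850 flights,
    # 50 missing durations, 642 missing air speeds); the values are placeholders.
    rows = [{"duration": None if i < 50 else 1.0,
             "speed_air": None if i < 642 else 1.0} for i in range(850)]
    for col, (n, miss, pct) in missing_summary(rows, ["duration", "speed_air"]).items():
        print(f"{col:<10} N={n:<4} missing={miss:<4} {pct:.1f}%")
```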
3. Performing the validity check of each variable (examining whether abnormal values are present)
NOTE: A few of the height values are negative. We flag these observations and exclude them from the next analysis.
In the next step, we check every variable against the business rule given for it.
/*Checking for outliers in height*/
Note: This step identifies the observations with negative heights.
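A minimal sketch of a rule-based validity check in Python. Only the non-negative height rule comes from this report; the other business rules are not reproduced here, so the RULES table is a hypothetical placeholder to be filled in. Missing values are not counted as violations, matching the decision to keep rows with missing values for now.

```python
from typing import Any, Callable

Row = dict[str, Any]
Rule = Callable[[Any], bool]

# Only the height rule is taken from the report; the remaining business
# rules from the project brief would be added here (hypothetical table).
RULES: dict[str, Rule] = {
    "height": lambda h: h >= 0,
}


def split_by_validity(rows: list[Row], rules: dict[str, Rule]) -> tuple[list[Row], list[Row]]:
    """Split rows into (valid, flagged).

    A row is flagged if any present value breaks its rule. Missing values
    (absent key or None) are skipped, since those rows are kept for imputation.
    """
    valid: list[Row] = []
    flagged: list[Row] = []
    for row in rows:
        ok = all(row.get(col) is None or rule(row[col]) for col, rule in rules.items())
        (valid if ok else flagged).append(row)
    return valid, flagged
```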
4. Cleaning the data based on the results of steps 2 and 3
Note: We removed 18 observations that violated these rules, leaving 832 flights.
1. For now we are not removing the rows with missing values, because doing so could bias the data. Instead:
a. I plan to impute the missing values, or
b. I will fill them with a simple approximation such as the mean.
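Option (b) above, filling missing values with the column mean, can be sketched as follows in Python; the function name and data layout are assumptions. One caveat: mean imputation shrinks a variable's spread and weakens its correlations with other variables, which matters most for a column like speed_air, where most values would be filled in.

```python
from statistics import mean
from typing import Any

Row = dict[str, Any]


def impute_mean(rows: list[Row], column: str) -> list[Row]:
    """Return copies of rows with missing (None) values in `column` replaced
    by the mean of the observed values. The input rows are not modified."""
    observed = [row[column] for row in rows if row.get(column) is not None]
    if not observed:
        raise ValueError(f"no observed values in column {column!r}")
    fill = mean(observed)
    return [dict(row, **{column: fill}) if row.get(column) is None else dict(row)
            for row in rows]
```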
Summarizing the distribution of each variable
We examined the distribution of every variable to see which are approximately normally distributed and which are skewed.
Variable       N     Mean       Std Dev   Minimum   Maximum    Skewness
duration       782   154.731    48.335    41.949    305.622    0.192089
no_pasg        832   60.060     7.488     29.000    87.000     -0.015304
speed_ground   832   79.611     18.829    33.574    136.659    0.110191
speed_air      204   103.646    9.982     90.003    136.423    0.9447
height         832   30.474     9.791     6.228     59.946     0.125057
pitch          832   4.005      0.526     2.284     5.927      0.016221
distance       832   1,528.240  911.045   41.722    6,309.950  1.560395
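The report does not say which skewness estimator produced the last column. A common choice is the adjusted Fisher-Pearson sample skewness (the formula used by Excel's SKEW function); a stdlib-only Python sketch:

```python
from statistics import mean, stdev


def sample_skewness(xs: list[float]) -> float:
    """Adjusted Fisher-Pearson sample skewness:

        G1 = n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3)

    where s is the sample standard deviation (n - 1 denominator).
    Requires at least three values that are not all equal.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 values")
    m, s = mean(xs), stdev(xs)
    if s == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)
```

Read against the table, distance (1.56) is clearly right-skewed and speed_air (0.94) moderately so, while the other variables have skewness close to 0.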
[Figures: histograms of duration, no_pasg, speed_ground, speed_air, height, pitch and distance]
Chapter 2: Descriptive Study (XY plots and correlation studies)
[Figures: scatter plots of distance against duration, no_pasg, speed_ground, speed_air, height and pitch]
My interpretation of the XY plots
1. Distance vs duration: the points are widely scattered and show no clear linear relationship.
2. Distance vs no_pasg: no clear linear relationship.
3. Distance vs speed_ground: the relationship is roughly linear and clearly monotonic (distance increases with ground speed).
4. Distance vs speed_air: the relationship looks fairly linear, but speed_air is missing for most flights (only 204 values remain after cleaning), so this pattern rests on a small subset and should be interpreted with caution.
5. Distance vs height and distance vs pitch: the points are quite scattered, with at most a weak trend.
Correlation matrix between the variables and its interpretation
Interpretation of the correlations between the independent variables
➢ We check the correlations among all the independent variables to detect multicollinearity, which can make the coefficient estimates of a linear regression unstable.
➢ speed_air and speed_ground are highly correlated, so we need to be careful if both are included in the same regression.
➢ Apart from that pair, the variables are fairly uncorrelated with one another, which is a good sign for the regression model.
Note: Argument against using the air speed variable:
About 75% of the speed_air values are missing. Imputing them, whether by a simple rule or by predictive imputation, would mean predicting more than 75% of the values from the remaining 25%, which is not a sensible choice.
In addition, because ground speed and air speed are highly correlated, we can use only ground speed in the regression model.
Chapter 3: Statistical modelling
We use R² to compare the regression models with one another and to judge how well each one fits the data.
Our aim in improving the model is a higher R², while taking care not to overfit.
Note: For the next iteration of the model we consider only the variables speed_ground, height and pitch.
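The models in this chapter were fitted with statistical software whose output is not reproduced here. To make the R² comparison concrete, here is a stdlib-only Python sketch of ordinary least squares via the normal equations, returning both R² and adjusted R². Adjusted R² is not part of the original analysis: plain R² never decreases when a predictor is added, whereas adjusted R² penalizes extra predictors, which makes it a better guard against the overfitting mentioned above. Solving the normal equations directly is fine for a handful of predictors; a QR-based solver is numerically safer for larger problems.

```python
from typing import Sequence


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> tuple[list[float], float, float]:
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns (coefficients, r2, adjusted_r2); coefficients[0] is the intercept.
    Needs more rows than predictors + 1 for adjusted R² to be defined.
    """
    n, p = len(y), len(X[0])
    rows = [[1.0, *map(float, x)] for x in X]  # prepend the intercept column
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bi * xi for bi, xi in zip(beta, r)) for r in rows]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2
```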
Next we decide which variables to keep in the regression analysis.
Variables with p-values greater than 0.1 are not considered.
Variables that are only marginally significant should be selected carefully, since including them may overfit the model and hurt its performance on a test set.
Note:
The residuals appear normally distributed.
Since R² barely changes, we finalize the regression model with only the significant variables, and its R² indicates a good fit.
We still need to validate the model. The best way would be to test its accuracy on a held-out test data set. Since we do not have a test set at the moment, we perform a basic validation through model checking.
Model checking
Observations
1. The residuals are approximately normally distributed.
2. The mean of the residuals is 0 (as expected for least squares with an intercept).
3. The residual variance appears constant.
Hence, the model passes these basic model checks.
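The checks above were done visually from residual plots. A rough numeric companion, sketched in Python: the mean residual (0 by construction for least squares with an intercept, so it confirms the fit rather than validating the model) and a variance ratio between the residuals of the upper and lower halves of the fitted values. The ratio is a simplified version of the idea behind the Goldfeld-Quandt test, without its significance test, and is offered as an illustrative heuristic rather than the method used in the report.

```python
from statistics import mean, variance


def residual_checks(fitted: list[float], residuals: list[float]) -> dict[str, float]:
    """Rough numeric companions to the residual plots.

    - mean_residual: about 0 for least squares with an intercept.
    - variance_ratio: residual variance in the upper half of the fitted
      values divided by that in the lower half; values far from 1 hint at
      non-constant variance. Needs at least 4 points and a lower half whose
      residuals are not all equal.
    """
    order = sorted(range(len(fitted)), key=lambda i: fitted[i])
    half = len(order) // 2
    low = [residuals[i] for i in order[:half]]
    high = [residuals[i] for i in order[half:]]
    return {
        "mean_residual": mean(residuals),
        "variance_ratio": variance(high) / variance(low),
    }
```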
Chapter 4: Project Summary
Summary of the Project
Background: Flight landing.
Motivation: To reduce the risk of landing overrun.
Goal: To study which factors affect the landing distance of a commercial flight, and how.
Data: Landing data (landing distance and other parameters) from 950 commercial flights (not a real data set, but simulated from statistical models).
1. Data Preparation
a. Combined both data sets.
b. Removed duplicates from the combined data set.
c. Removed the abnormal observations.
d. Checked the distribution of each variable.
2. Descriptive Study (XY plots and correlation studies)
a. Studied the XY plots of distance against each of the other variables.
i. The relationship between distance and ground speed is strongly linear.
ii. The relationships of distance with height and with pitch are only weakly linear.
iii. Distance shows no clear linear relationship with duration or no_pasg.
b. Studied the correlations between the independent variables.
i. Only ground speed and air speed were strongly correlated. Since speed_air is mostly missing, we dropped it from the regression model, so multicollinearity is not a concern.
ii. All the other variables are largely uncorrelated.
3. Statistical modelling: linear regression
a. To study how the factors affect landing distance, we fitted a linear regression.
i. The R² of the model was roughly 0.84.
ii. Ground speed, height and aircraft make were significant, with p-values below 0.0001.
b. Refining the model: we kept only the significant variables and rechecked R², which increased slightly.
i. In the final model, the dependent variable (distance) depends on ground speed, height and aircraft make.
Our final regression model:
Distance = 42.7 * speed_ground + 14.5 * height - 501 * aircraft_flag - 2052
(aircraft_flag = 1 for Airbus, 0 for Boeing)
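The final equation can be turned into a small prediction function. A Python sketch that uses only the coefficients reported above; the function and argument names are hypothetical:

```python
def predicted_distance(speed_ground: float, height: float, is_airbus: bool) -> float:
    """Landing distance predicted by the final model in this report:

        distance = 42.7 * speed_ground + 14.5 * height - 501 * aircraft_flag - 2052

    where aircraft_flag is 1 for Airbus and 0 for Boeing.
    """
    aircraft_flag = 1 if is_airbus else 0
    return 42.7 * speed_ground + 14.5 * height - 501 * aircraft_flag - 2052


if __name__ == "__main__":
    # Illustrative inputs close to the sample means (speed_ground ~80, height ~30).
    for make, is_airbus in (("Boeing", False), ("Airbus", True)):
        print(make, round(predicted_distance(80, 30, is_airbus), 1))
```

With these inputs it prints 1799.0 for Boeing and 1298.0 for Airbus; the 501 gap is the aircraft_flag coefficient. The equation should only be used within the observed range of the predictors: near the lowest observed ground speeds it predicts negative distances.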
Answering the Questions
1. How many observations (flights) do you use to fit your final model? If not all 950 flights, why?
I used 832 observations to fit the final linear regression model:
a. We removed 100 observations because they were duplicates.
b. We removed a further 18 observations because they had abnormal values.
c. We did not remove the 50 observations with missing duration, because duration was not a significant predictor in the regression.
2. What factors affect the landing distance of a flight, and how?
The factors that affect landing distance are:
a. Ground speed: landing distance increases as ground speed increases.
b. Height: landing distance increases as height increases.
c. Aircraft make (air_craft_flag, where 1 stands for Airbus and 0 for Boeing): the two makes differ in landing distance. Holding ground speed and height fixed, the final model predicts Airbus landings to be about 501 shorter than Boeing landings.
3. Is there any difference between the two makes, Boeing and Airbus?
For Airbus, N = 444
For Boeing, N = 388
When we fit the regression model separately for each make, pitch is insignificant for Boeing but significant for Airbus.
More Related Content

What's hot

[ITP - Lecture 06] Operators, Arithmetic Expression and Order of Precedence
[ITP - Lecture 06] Operators, Arithmetic Expression and Order of Precedence[ITP - Lecture 06] Operators, Arithmetic Expression and Order of Precedence
[ITP - Lecture 06] Operators, Arithmetic Expression and Order of PrecedenceMuhammad Hammad Waseem
 
Piecewise Functions
Piecewise FunctionsPiecewise Functions
Piecewise Functionsktini
 
Lab 1 ball toss data analysis (physics with vernier experimen
Lab 1 ball toss data analysis (physics with vernier experimenLab 1 ball toss data analysis (physics with vernier experimen
Lab 1 ball toss data analysis (physics with vernier experimenADDY50
 
Unit ii chapter 1 operator and expressions in c
Unit ii chapter 1 operator and expressions in cUnit ii chapter 1 operator and expressions in c
Unit ii chapter 1 operator and expressions in cSowmya Jyothi
 
Operator precedence and associativity
Operator precedence and associativityOperator precedence and associativity
Operator precedence and associativityDr.Sandhiya Ravi
 
Piecewise and Step Functions
Piecewise and Step FunctionsPiecewise and Step Functions
Piecewise and Step Functionsktini
 

What's hot (12)

C++
C++ C++
C++
 
Oop using JAVA
Oop using JAVAOop using JAVA
Oop using JAVA
 
[ITP - Lecture 06] Operators, Arithmetic Expression and Order of Precedence
[ITP - Lecture 06] Operators, Arithmetic Expression and Order of Precedence[ITP - Lecture 06] Operators, Arithmetic Expression and Order of Precedence
[ITP - Lecture 06] Operators, Arithmetic Expression and Order of Precedence
 
Python Lecture 5
Python Lecture 5Python Lecture 5
Python Lecture 5
 
AUTO MPG Regression Analysis
AUTO MPG Regression AnalysisAUTO MPG Regression Analysis
AUTO MPG Regression Analysis
 
Piecewise Functions
Piecewise FunctionsPiecewise Functions
Piecewise Functions
 
Lab 1 ball toss data analysis (physics with vernier experimen
Lab 1 ball toss data analysis (physics with vernier experimenLab 1 ball toss data analysis (physics with vernier experimen
Lab 1 ball toss data analysis (physics with vernier experimen
 
Lecture4
Lecture4Lecture4
Lecture4
 
Presentation
PresentationPresentation
Presentation
 
Unit ii chapter 1 operator and expressions in c
Unit ii chapter 1 operator and expressions in cUnit ii chapter 1 operator and expressions in c
Unit ii chapter 1 operator and expressions in c
 
Operator precedence and associativity
Operator precedence and associativityOperator precedence and associativity
Operator precedence and associativity
 
Piecewise and Step Functions
Piecewise and Step FunctionsPiecewise and Step Functions
Piecewise and Step Functions
 

Similar to Stats computing project_final

FAA Flight Landing Distance Forecasting and Analysis
FAA Flight Landing Distance Forecasting and AnalysisFAA Flight Landing Distance Forecasting and Analysis
FAA Flight Landing Distance Forecasting and AnalysisQuynh Tran
 
Predicting aircraft landing overruns using quadratic linear regression
Predicting aircraft landing overruns using quadratic linear regressionPredicting aircraft landing overruns using quadratic linear regression
Predicting aircraft landing overruns using quadratic linear regressionPrerit Saxena
 
Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...
Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...
Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...Abimbola Ogundipe
 
Business Market Research on Instant Messaging -2013
Business Market Research on Instant Messaging -2013Business Market Research on Instant Messaging -2013
Business Market Research on Instant Messaging -2013Rajib Layek
 
Use of Linear Regression in Machine Learning for Ranking
Use of Linear Regression in Machine Learning for RankingUse of Linear Regression in Machine Learning for Ranking
Use of Linear Regression in Machine Learning for Rankingijsrd.com
 
A study of the Behavior of Floating-Point Errors
A study of the Behavior of Floating-Point ErrorsA study of the Behavior of Floating-Point Errors
A study of the Behavior of Floating-Point Errorsijpla
 
Predicting landing distance: Adrian Valles
Predicting landing distance: Adrian VallesPredicting landing distance: Adrian Valles
Predicting landing distance: Adrian VallesAdrián Vallés
 
Stats ca report_18180485
Stats ca report_18180485Stats ca report_18180485
Stats ca report_18180485sarthakkhare3
 
Statistics - Multiple Regression and Two Way Anova
Statistics - Multiple Regression and Two Way AnovaStatistics - Multiple Regression and Two Way Anova
Statistics - Multiple Regression and Two Way AnovaNisheet Mahajan
 
A statistical approach to predict flight delay
A statistical approach to predict flight delayA statistical approach to predict flight delay
A statistical approach to predict flight delayiDTechTechnologies
 
IRJET - Comparative Study of Flight Delay Prediction using Back Propagati...
IRJET -  	  Comparative Study of Flight Delay Prediction using Back Propagati...IRJET -  	  Comparative Study of Flight Delay Prediction using Back Propagati...
IRJET - Comparative Study of Flight Delay Prediction using Back Propagati...IRJET Journal
 
Air Passenger Prediction Using ARIMA Model
Air Passenger Prediction Using ARIMA Model Air Passenger Prediction Using ARIMA Model
Air Passenger Prediction Using ARIMA Model AkarshAvinash
 
Auto MPG Regression Analysis
Auto MPG Regression AnalysisAuto MPG Regression Analysis
Auto MPG Regression AnalysisAnirudh Srinath.V
 

Similar to Stats computing project_final (20)

FAA Flight Landing Distance Forecasting and Analysis
FAA Flight Landing Distance Forecasting and AnalysisFAA Flight Landing Distance Forecasting and Analysis
FAA Flight Landing Distance Forecasting and Analysis
 
Predicting aircraft landing overruns using quadratic linear regression
Predicting aircraft landing overruns using quadratic linear regressionPredicting aircraft landing overruns using quadratic linear regression
Predicting aircraft landing overruns using quadratic linear regression
 
Flight landing Project
Flight landing ProjectFlight landing Project
Flight landing Project
 
Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...
Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...
Optimized Multi model Fuzzy Altitude and Translational Velocity Controller fo...
 
Business Market Research on Instant Messaging -2013
Business Market Research on Instant Messaging -2013Business Market Research on Instant Messaging -2013
Business Market Research on Instant Messaging -2013
 
Use of Linear Regression in Machine Learning for Ranking
Use of Linear Regression in Machine Learning for RankingUse of Linear Regression in Machine Learning for Ranking
Use of Linear Regression in Machine Learning for Ranking
 
Time series project
Time series projectTime series project
Time series project
 
Flight Landing Risk Assessment Project
Flight Landing Risk Assessment ProjectFlight Landing Risk Assessment Project
Flight Landing Risk Assessment Project
 
A study of the Behavior of Floating-Point Errors
A study of the Behavior of Floating-Point ErrorsA study of the Behavior of Floating-Point Errors
A study of the Behavior of Floating-Point Errors
 
Predicting landing distance: Adrian Valles
Predicting landing distance: Adrian VallesPredicting landing distance: Adrian Valles
Predicting landing distance: Adrian Valles
 
2011-36-0091
2011-36-00912011-36-0091
2011-36-0091
 
Stats ca report_18180485
Stats ca report_18180485Stats ca report_18180485
Stats ca report_18180485
 
Statistics - Multiple Regression and Two Way Anova
Statistics - Multiple Regression and Two Way AnovaStatistics - Multiple Regression and Two Way Anova
Statistics - Multiple Regression and Two Way Anova
 
A statistical approach to predict flight delay
A statistical approach to predict flight delayA statistical approach to predict flight delay
A statistical approach to predict flight delay
 
IRJET - Comparative Study of Flight Delay Prediction using Back Propagati...
IRJET -  	  Comparative Study of Flight Delay Prediction using Back Propagati...IRJET -  	  Comparative Study of Flight Delay Prediction using Back Propagati...
IRJET - Comparative Study of Flight Delay Prediction using Back Propagati...
 
Multiple Regression
Multiple RegressionMultiple Regression
Multiple Regression
 
Air Passenger Prediction Using ARIMA Model
Air Passenger Prediction Using ARIMA Model Air Passenger Prediction Using ARIMA Model
Air Passenger Prediction Using ARIMA Model
 
Airline delay prediction
Airline delay predictionAirline delay prediction
Airline delay prediction
 
Auto MPG Regression Analysis
Auto MPG Regression AnalysisAuto MPG Regression Analysis
Auto MPG Regression Analysis
 
Coverage and Introduction to UVM
Coverage and Introduction to UVMCoverage and Introduction to UVM
Coverage and Introduction to UVM
 

Recently uploaded

꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 

Recently uploaded (20)

꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 

Stats computing project_final

  • 1. _ BANA 6043 Project NAME: AYANK GUPTA UCID:M12388639 Background: Flight landing. Motivation: To reduce the risk of landing overrun. Goal: To study what factors and how they would impact the landing distance of a commercial flight. Data: Landing data (landing distance and other parameters) from 950 commercial flights (not real data set but simulated from statistical models). See two Excel files ‘FAA-1.xls’ (800 flights) and ‘FAA-2.xls’ (150 flights).
  • 2. _ Chapter 1: Data Preparation 1. Combining of the data sets from different sources Output of both the imports /**FAA1**/
  • 3. _ /**FAA2**/ /** Combing both the data sets **/
  • 4. _ /*Checking for Duplicates and removing them from the combines datasets*/ Note: We observed 100 duplicates entries from the combines dataset hence removed it from it.
  • 5. _ 2. Performing the completeness check of each variable – examine if missing values are present; Variable N Missing Values % Missing Values Duration 800 50 5.8% no_pasg 850 0 0% speed_ground 850 0 0% speed_air 208 642 75% Height 850 0 0% Pitch 850 0 0% Distance 850 0 0% Note: 1. 16% of the values of the DURATION variable are missing because 50 rows are missing from the FAA2 datasets 2. 75% of the values of the speed_air are missing and we need to further examine the column for data cleaning Performing the validity check of each variable – examine if abnormal values are present;
  • 6. _ NOTE: Here we see that the height of the few values in height are negative and we need to flag them out from our next analysis. In our next analysis, we will perform the analysis on each and every variable based on the business rule given for each variables.
  • 7. _ /*Checking for outliers in height*/ Note: By performing the above step we are able to identify the heights with negative hieghts.
  • 8. _ Cleaning the data based on the results of Steps 2 and 3 Note : We are able to remove 18 Values according to the abnormalities 1. For now we are not removing the missing values rows because it will create bias in the data a. I am planning to impute the missing values. b. Or I will be using some approximations like mean to fill the missing values
  • 9. _ Summarizing the distribution of each variable We went ahead to see the distribution of each and every variable to see which of the variable shows a normal distribution and those variables who are in a way skewed or biased to Variable Label N Mean Std Dev Minimum Maximum Skweness duration duration 782 154.731 48.335 41.949 305.622 0.192089 no_pasg no_pasg 832 60.060 7.488 29.000 87.000 -0.015304 speed_ground speed_ground 832 79.611 18.829 33.574 136.659 0.110191 speed_air speed_air 204 103.646 9.982 90.003 136.423 0.9447 height height 832 30.474 9.791 6.228 59.946 0.125057 pitch pitch 832 4.005 0.526 2.284 5.927 0.016221 distance distance 832 1,528.240 911.045 41.722 6,309.950 1.560395 DURATION
  • 13. _ CHAPTER 2: Descriptive Study (XY plots and correlation studies) Distance Vs Duration Distance Vs NO_PASG
  • 14. _ Distance Vs Speed Ground Distance Vs Air Speed
  • 16. _ My Interpretation on the XY plot of the data 1. Distance Vs Duration: The values seem to scatter and the relationship doesn’t seem to be linear 2. Distance Vs No_Pasg: the relationship is not linear 3. Distance Vs Speed_Ground: The relation is linear or in other words the relationship shows a monotonic relationship 4. Distance Vs Speed air is fairly linear but we have a lot of missing values in the speed air, hence the relationship cannot be considered significant 5. Distance Vs Height and Pitch seems a bit scattered
  • 17. _ Correlation Matrix between the variables and their interpretation: Interpretation of the Correlation between the independent Variables ➢ We need to check the collinearity between all the independent variables to check for multi collinearity between the independent variables which might lead to some discrepancy in our linear regression models ➢ We observe that correlation between speed air and speed ground and hence while considering both the variables in regression we need to be extra carful ➢ Except of that we can observe that all the other variables are fairly uncorrelated with each other which is a good sign for our regression model Note: Argument against considering the Air speed variables: We observe that air speed variables have almost 70% missing values which means if we try to impute the variables using sensible imputation or through predictive imputation we will be predicting more that 70% of the values based on the remaining 30% values which may not be a wise or a sensible decision to do. Another factor since values of ground speed and air speed are very much correlated we can instead only use air ground for our regression model.
_
Chapter 3: Statistical modelling
We use R², the proportion of the variance in landing distance explained by the model, to compare the candidate regression models with one another. Our aim in improving the model is a higher R², with the caution that we must not overfit. Because R² never decreases when a variable is added, adjusted R², which penalizes extra predictors, is the fairer guide when comparing models of different sizes.
Note: For our next iteration of the model we will consider only the variables speed_ground, height and pitch.
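R² and adjusted R² can be computed directly from observed and fitted values. The minimal sketch below takes p as the number of predictors, excluding the intercept. The usage line plugs in the rough figures reported in the summary (R² ≈ 0.84, n = 832, three predictors) only to show the arithmetic. The small observed/fitted lists are made up.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot for observed y and fitted y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1); p excludes the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Made-up observed and fitted values.
print(r_squared([3.0, 5.0, 7.0, 9.0], [2.8, 5.3, 6.9, 9.1]))

# Arithmetic with the summary's rough figures: about 0.839.
print(round(adjusted_r_squared(0.84, 832, 3), 4))
```

With 832 observations and only three predictors, the penalty is tiny, so R² and adjusted R² differ only in the third decimal.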
_
Next, we decide which variables to keep in the regression analysis. Variables with p-values greater than 0.1 will not be considered. Variables that are only marginally significant should be selected with care: including them risks overfitting the model, which would hurt its performance on a test set.
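The slide applies a single p-value cut-off. A common refinement of the same rule is backward elimination: drop only the least significant variable above the threshold, refit, and repeat, because p-values change once a variable is removed. The minimal sketch below implements that loop. The model fitting is delegated to a caller-supplied function, and the stub with made-up p-values for hypothetical variables x1 to x4 is for illustration only. This procedure is a swapped-in technique, not necessarily what the project did.

```python
def backward_eliminate(variables, fit_pvalues, threshold=0.1):
    """Backward elimination on p-values.

    variables:   list of candidate predictor names.
    fit_pvalues: callable that refits the model on the given names and
                 returns {name: p_value} (supplied by the caller).
    Repeatedly removes the single least significant predictor while its
    p-value exceeds `threshold`, refitting after every removal.
    """
    current = list(variables)
    while current:
        pvalues = fit_pvalues(current)
        worst = max(current, key=lambda v: pvalues[v])
        if pvalues[worst] <= threshold:
            break  # every remaining predictor passes the cut-off
        current.remove(worst)
    return current


# Stub "model" with fixed, made-up p-values for hypothetical variables.
made_up = {"x1": 0.0001, "x2": 0.03, "x3": 0.45, "x4": 0.2}


def stub_fit(names):
    return {v: made_up[v] for v in names}


kept = backward_eliminate(["x1", "x2", "x3", "x4"], stub_fit)
print(kept)
```

A real `fit_pvalues` would rerun the regression each time. The stub returns fixed values, so it only exercises the control flow.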
_
Note: The residuals appear normally distributed. Since R² does not change after removing the insignificant variables, we finalize the regression model with the significant variables only, and the R² value indicates a good fit.
We still need to validate the model. The preferred way is to test its accuracy on a separate test data set. Since we do not have a test data set at the moment, we perform a basic validation through model checking instead.
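Without a separate test set, one option is to hold out a random part of the cleaned data. This was not done in the project. The minimal sketch below produces a reproducible 80/20 split of row indices. The 80/20 ratio and the seed are arbitrary assumptions; 832 is the cleaned row count reported later in the deck.

```python
import random


def train_test_split(n_rows, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) for a reproducible random holdout split."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed so the split can be reproduced
    n_test = round(n_rows * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = train_test_split(832)
print(len(train_idx), len(test_idx))
```

The model would be fitted on the training rows only, and its R² or prediction error would then be computed on the held-out rows.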
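_
Model checking
Observations:
1. The residuals are normally distributed.
2. The mean of the residuals is 0 (this holds by construction for least squares with an intercept).
3. The residuals have constant variance.
Hence, we conclude that the model's assumptions hold, which serves as our basic validation of the model.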
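The three checks can be summarized numerically alongside the plots. In the minimal sketch below, the constant-variance check is a crude one in the spirit of the Goldfeld-Quandt idea, without its formal F test. It sorts observations by fitted value and compares the residual variance in the upper half with the lower half; a ratio near 1 suggests constant variance. The inputs are made up, and none of this replaces the residual plots used in the project.

```python
import math


def _variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)


def residual_checks(fitted, residuals):
    """Numeric companions to residual plots (needs at least 4 observations):
    mean of residuals, their skewness, and an upper/lower-half variance ratio."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(_variance(residuals))
    skew = n / ((n - 1) * (n - 2)) * sum(((r - mean) / sd) ** 3 for r in residuals)
    # Crude constant-variance check: sort by fitted value, compare the two halves.
    order = sorted(range(n), key=lambda i: fitted[i])
    half = n // 2
    lower = [residuals[i] for i in order[:half]]
    upper = [residuals[i] for i in order[half:]]
    return {"mean": mean, "skewness": skew,
            "variance_ratio": _variance(upper) / _variance(lower)}


# Made-up residuals with constant spread ...
print(residual_checks(list(range(1, 9)), [1.0, -1.0] * 4))
# ... and with spread that grows with the fitted value.
print(residual_checks(list(range(1, 9)), [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]))
```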
_
Chapter 4: Project Summary
Summary of the Project
Background: Flight landing.
Motivation: To reduce the risk of landing overrun.
Goal: To study what factors and how they would impact the landing distance of a commercial flight.
Data: Landing data (landing distance and other parameters) from 950 commercial flights (not real data set but simulated from statistical models).
1. Data Preparation
   a. Combined both data sets.
   b. Removed duplicates from the combined data set.
   c. Removed the abnormal observations.
   d. Checked the distribution of each variable.
2. Descriptive Study (XY plots and correlation studies)
   a. Studied the X-Y plots between distance and each of the other variables.
      i. The relationship between distance and ground speed is highly linear.
      ii. The relationships of distance with height and with pitch are only slightly linear.
      iii. Distance shows no linear relationship with duration or no_pasg.
   b. Studied the correlations between the independent variables.
      i. Only ground speed and air speed are strongly collinear. Since speed_air is mostly missing, we removed it from our regression model, so multicollinearity is not a concern.
      ii. All the other variables are largely uncorrelated.
3. Statistical modelling: Linear regression
   a. To study which factors affect landing distance, we fitted a linear regression.
      i. The R² of the model was roughly 0.84.
      ii. Ground speed, height and aircraft make were significant, each with a p-value below .0001.
   b. Refining the model: we kept only the significant variables and rechecked the R², which remained essentially unchanged.
      i. The dependent variable, distance, now depends on three independent variables: ground speed, height and aircraft make (air_craft_flag). Our final regression model is:
         Distance = 42.7*(Ground Speed) + 14.5*(Height) - 501*(air_craft_flag) - 2052
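The final equation can be written as a prediction function. The minimal sketch below uses the rounded coefficients exactly as stated, with the flag coding given in the answers (1 = Airbus, 0 = Boeing). The function name and the input values are illustrative.

```python
def predicted_distance(speed_ground, height, air_craft_flag):
    """Final fitted model from the summary (rounded coefficients).
    air_craft_flag: 1 = Airbus, 0 = Boeing."""
    if air_craft_flag not in (0, 1):
        raise ValueError("air_craft_flag must be 0 (Boeing) or 1 (Airbus)")
    return 42.7 * speed_ground + 14.5 * height - 501 * air_craft_flag - 2052


# Illustrative inputs inside the observed ranges of speed_ground and height.
boeing = predicted_distance(80, 30, 0)  # ≈ 1799.0
airbus = predicted_distance(80, 30, 1)  # ≈ 1298.0
print(round(boeing, 1), round(airbus, 1), round(boeing - airbus, 1))
```

Holding speed and height fixed, the two makes differ by exactly the flag coefficient (501). Because the intercept is large and negative, predictions are only meaningful within the range of the data. At very low ground speeds, the equation can even produce negative distances.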
_
Answering the Questions
1. How many observations (flights) do you use to fit your final model? If not all 950 flights, why?
   I used 832 observations to fit the linear regression models:
   a. We removed 100 observations because they were duplicates.
   b. We further removed 18 observations because they had abnormal values.
   c. We could have removed the 50 observations with a missing duration, but we did not, because duration was not a significant variable in the regression.
2. What factors impact the landing distance of a flight, and how?
   a. Ground speed: as ground speed increases, landing distance increases (about 42.7 per unit of ground speed in the final model).
   b. Height: as height increases, landing distance increases (about 14.5 per unit of height).
   c. Air_Craft_flag (1 = Airbus, 0 = Boeing): the two makes behave differently. With ground speed and height held fixed, the final model predicts a landing distance 501 lower for Airbus than for Boeing.
3. Is there any difference between the two makes, Boeing and Airbus?
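The observation count in the first answer follows from simple bookkeeping, sketched below with the figures stated in the project. The same arithmetic shows why the duration column has N = 782 in the summary table: it is what remains if the 50 rows with a missing duration are also dropped.

```python
TOTAL_FLIGHTS = 950       # 800 (FAA-1) + 150 (FAA-2)
DUPLICATES = 100          # duplicate rows removed after combining
ABNORMAL = 18             # rows removed by the validity checks
MISSING_DURATION = 50     # rows kept, although duration is missing

rows_used = TOTAL_FLIGHTS - DUPLICATES - ABNORMAL
rows_if_duration_in_model = rows_used - MISSING_DURATION  # complete cases only
print(rows_used, rows_if_duration_in_model)  # 832 782
```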
_
For Airbus, N = 444; for Boeing, N = 388.
When we fit separate regression models for each aircraft make, pitch is insignificant in the Boeing model but quite significant in the Airbus model.
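Fitting one model per make starts with splitting the rows on air_craft_flag. The minimal sketch below splits the data by make and, as a deliberately simplified stand-in for the project's per-make multiple regressions, fits a single-predictor least-squares slope in each group. The records are hypothetical toy data, and the helper names are illustrative.

```python
from collections import defaultdict

MAKE = {1: "Airbus", 0: "Boeing"}  # coding stated in the answers above


def split_by_make(rows):
    """Group row dicts by aircraft make using air_craft_flag."""
    groups = defaultdict(list)
    for row in rows:
        groups[MAKE[row["air_craft_flag"]]].append(row)
    return dict(groups)


def ols_slope(rows, x_name, y_name="distance"):
    """Least-squares slope of y on a single predictor x (simplified illustration)."""
    xs = [r[x_name] for r in rows]
    ys = [r[y_name] for r in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rows = [  # hypothetical toy records
    {"air_craft_flag": 1, "speed_ground": 50.0, "distance": 500.0},
    {"air_craft_flag": 1, "speed_ground": 60.0, "distance": 600.0},
    {"air_craft_flag": 1, "speed_ground": 70.0, "distance": 700.0},
    {"air_craft_flag": 0, "speed_ground": 50.0, "distance": 1000.0},
    {"air_craft_flag": 0, "speed_ground": 60.0, "distance": 1200.0},
    {"air_craft_flag": 0, "speed_ground": 70.0, "distance": 1400.0},
]
for make, group in split_by_make(rows).items():
    print(make, len(group), ols_slope(group, "speed_ground"))
```

In the actual analysis, each group would go through the full multiple regression, including pitch. That is how pitch can be significant for one make and not the other.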