Modeling and Prediction using SAS
BANA 6043: PROJECT WORK
OCTOBER 2, 2017
JATIN SAINI
M12382157
SUMMARY:
The goal of this project is to study which factors affect the landing distance of a commercial flight, and how. For this, I received data sets containing details of Boeing and Airbus flights. My first step was data preparation: removing empty rows, identifying duplicate rows, identifying the sample size for each variable and taking outliers out of the data set. These steps helped me understand the distribution of each variable, separate normal values from outliers and form a new data set containing only normal values.
To better understand how the variables relate to landing distance, I conducted a descriptive study of the normal values, plotting the response variable (distance) against each of the predictor variables. This study gave me some perspective on the relationship of speed_air and speed_ground with landing distance. I then used Pearson correlation to check for collinearity among all the variables. It turned out that speed_air is strongly linearly correlated with speed_ground, and no other pair of variables shows a clear pattern.
Finally, I fitted a linear regression model with an intercept on the normal data after removing speed_air. The fitted equation is:
DISTANCE = -2826.33022 – 0.10338*(DURATION) - 2.35945*(NO_PASG) +
42.15245*(SPEED_GROUND) + 13.58277*(HEIGHT) + 187.99136*(PITCH)
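The equation shows that flight duration and number of passengers have negative coefficients (holding the other variables fixed, longer flights and more passengers are associated with shorter landing distances), while ground speed, aircraft height and pitch have positive coefficients.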
VARIABLE DICTIONARY:
Aircraft: The make of an aircraft (Boeing or Airbus).
Duration (in minutes): Flight duration between taking off and landing. The duration of a normal flight should always be greater than 40 min.
No_pasg: The number of passengers on a flight.
Speed_ground (in miles per hour): The ground speed of an aircraft when passing over the threshold of the runway. If its value is less than 30 MPH or greater than 140 MPH, the landing is considered abnormal.
Speed_air (in miles per hour): The air speed of an aircraft when passing over the threshold of the runway. If its value is less than 30 MPH or greater than 140 MPH, the landing is considered abnormal.
Height (in meters): The height of an aircraft when it is passing over the threshold of the runway. The landing aircraft is required to be at least 6 meters high at the threshold of the runway.
Pitch (in degrees): Pitch angle of an aircraft when it is passing over the threshold of the runway.
Distance (in feet): The landing distance of an aircraft. More specifically, it refers to the distance between the threshold of the runway and the point where the aircraft can be fully stopped. The length of the airport runway is typically less than 6000 feet.
CHAPTER 1: DATA PREPARATION
DESCRIPTION: Data preparation is an important step in understanding the sample size. My aim here is to retain as much data as possible in order to obtain the best-fitting model.
STEP 1: Importing Data Files:
/* Import the two FAA spreadsheets; GETNAMES=YES reads variable names from the first row */
FILENAME flight1 '/home/sainijn0/Stat_computing/FAA1.xls';
PROC IMPORT DATAFILE=FLIGHT1
    DBMS=XLS
    OUT=FLIGHT1
    REPLACE;
    GETNAMES=YES;
RUN;
PROC PRINT DATA=FLIGHT1(OBS=10);
RUN;

FILENAME flight2 '/home/sainijn0/Stat_computing/FAA2.xls';
PROC IMPORT DATAFILE=FLIGHT2
    DBMS=XLS
    OUT=FLIGHT2
    REPLACE;
    GETNAMES=YES;
RUN;
PROC PRINT DATA=FLIGHT2(OBS=10);
RUN;
STEP 2: Removing Empty Rows From Data Sets:
/* Delete rows in which every variable is missing (blank spreadsheet rows) */
DATA FLIGHT1;
    SET FLIGHT1;
    IF MISSING(NO_PASG) AND MISSING(DURATION) AND MISSING(AIRCRAFT)
       AND MISSING(SPEED_GROUND) AND MISSING(SPEED_AIR) AND MISSING(HEIGHT)
       AND MISSING(PITCH) AND MISSING(DISTANCE)
    THEN DELETE;
RUN;
PROC PRINT DATA=FLIGHT1(OBS=10);
RUN;
/* Same empty-row filter for the second file */
DATA FLIGHT2;
    SET FLIGHT2;
    IF MISSING(NO_PASG) AND MISSING(DURATION) AND MISSING(AIRCRAFT)
       AND MISSING(SPEED_GROUND) AND MISSING(SPEED_AIR) AND MISSING(HEIGHT)
       AND MISSING(PITCH) AND MISSING(DISTANCE)
    THEN DELETE;
RUN;
PROC PRINT DATA=FLIGHT2(OBS=10);
RUN;
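For readers more comfortable with Python, the empty-row filter above can be sketched as follows. This is a minimal illustration, not part of the original SAS workflow: it assumes each row has already been read into a Python dict, that a missing cell is represented as `None` (a blank string would need converting first, whereas SAS `MISSING()` treats a blank character value as missing), and the rows and the helper `drop_empty_rows` are hypothetical.

```python
# Hypothetical sketch: each row is a dict and None stands in for a SAS missing value.
COLUMNS = ["AIRCRAFT", "DURATION", "NO_PASG", "SPEED_GROUND",
           "SPEED_AIR", "HEIGHT", "PITCH", "DISTANCE"]


def drop_empty_rows(rows):
    """Drop a row only when every column is missing, like the IF ... AND ... THEN DELETE chain."""
    return [row for row in rows
            if not all(row.get(col) is None for col in COLUMNS)]


# Made-up illustrative rows, not taken from FAA1.xls or FAA2.xls.
flights = [
    {"AIRCRAFT": "Boeing", "DURATION": 98.5, "NO_PASG": 53, "SPEED_GROUND": 107.9,
     "SPEED_AIR": 109.3, "HEIGHT": 27.4, "PITCH": 4.0, "DISTANCE": 3369.8},
    {col: None for col in COLUMNS},  # a completely blank spreadsheet row
    {"AIRCRAFT": "Airbus", "DURATION": None, "NO_PASG": 60, "SPEED_GROUND": 75.0,
     "SPEED_AIR": None, "HEIGHT": 30.0, "PITCH": 3.5, "DISTANCE": 1200.0},
]

cleaned = drop_empty_rows(flights)
print(len(cleaned))  # 2: the partially missing Airbus row is kept
```

Because the conditions are joined with AND, a row survives as long as at least one field is present; partially missing rows are handled later by the missing-value count and the abnormal-row flags.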
STEP 3: Combining Data Sets:
DATA COMBINED_FLIGHT;
    SET FLIGHT1 FLIGHT2;   /* stack the rows of both files */
RUN;
PROC PRINT DATA=COMBINED_FLIGHT(OBS=10);
RUN;
STEP 4: Removing Duplicates:
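/* NODUPKEY keeps only the first observation for each unique combination of the BY variables */
PROC SORT DATA=COMBINED_FLIGHT NODUPKEY;
    BY SPEED_AIR SPEED_GROUND HEIGHT PITCH DISTANCE;
RUN;
PROC PRINT DATA=COMBINED_FLIGHT(OBS=10);
RUN;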
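The effect of `PROC SORT ... NODUPKEY` can be approximated in Python as below. This is a sketch under stated assumptions, not the SAS implementation: the helper names are hypothetical, rows are dicts with `None` for missing values, missing values are sorted first (as SAS does for numeric missing values), and Python's stable sort stands in for SAS keeping the first observation of each BY group in input order.

```python
KEY = ("SPEED_AIR", "SPEED_GROUND", "HEIGHT", "PITCH", "DISTANCE")


def sas_sort_key(value):
    # Hypothetical helper: put None (missing) before every number, as SAS sorts numeric missing first.
    return (False, 0.0) if value is None else (True, value)


def sort_nodupkey(rows, by=KEY):
    """Sort by the BY variables and keep only the first row of each BY group."""
    # sorted() is stable, so rows with equal keys keep their original relative order.
    ordered = sorted(rows, key=lambda r: tuple(sas_sort_key(r[c]) for c in by))
    result, seen = [], set()
    for row in ordered:
        key = tuple(row[c] for c in by)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Made-up rows: the first two agree on every key variable, so only the first survives.
rows = [
    {"DURATION": 120.0, "SPEED_AIR": None, "SPEED_GROUND": 80.0, "HEIGHT": 25.0, "PITCH": 4.0, "DISTANCE": 1500.0},
    {"DURATION": None, "SPEED_AIR": None, "SPEED_GROUND": 80.0, "HEIGHT": 25.0, "PITCH": 4.0, "DISTANCE": 1500.0},
    {"DURATION": 90.0, "SPEED_AIR": 110.0, "SPEED_GROUND": 108.0, "HEIGHT": 30.0, "PITCH": 3.8, "DISTANCE": 3400.0},
]
deduped = sort_nodupkey(rows)
print(len(deduped))             # 2
print(deduped[0]["DURATION"])   # 120.0: the first copy in input order is kept
```

Note that variables outside the BY list (here DURATION) play no part in deciding what counts as a duplicate.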
STEP 5: Finding Missing Values For Each Variable:
TITLE 'MISSING VALUES';
PROC MEANS DATA=COMBINED_FLIGHT N NMISS;   /* non-missing and missing counts per variable */
RUN;
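The N/NMISS counts reported by PROC MEANS amount to simple tallies. A minimal Python sketch, assuming rows are dicts with `None` for missing values (a key absent from a row is also counted as missing, similar to a variable that exists in only one of the SET data sets); `n_and_nmiss` and the rows are hypothetical.

```python
VARIABLES = ["DURATION", "NO_PASG", "SPEED_GROUND", "SPEED_AIR",
             "HEIGHT", "PITCH", "DISTANCE"]


def n_and_nmiss(rows, variables=VARIABLES):
    """Return {variable: (N, NMISS)}: counts of non-missing and missing (None) values."""
    summary = {}
    for var in variables:
        nmiss = sum(1 for row in rows if row.get(var) is None)
        summary[var] = (len(rows) - nmiss, nmiss)
    return summary


# Made-up rows: the second one lacks DURATION and SPEED_AIR.
rows = [
    {"DURATION": 98.5, "NO_PASG": 53, "SPEED_GROUND": 107.9, "SPEED_AIR": 109.3,
     "HEIGHT": 27.4, "PITCH": 4.0, "DISTANCE": 3369.8},
    {"DURATION": None, "NO_PASG": 60, "SPEED_GROUND": 75.0, "SPEED_AIR": None,
     "HEIGHT": 30.0, "PITCH": 3.5, "DISTANCE": 1200.0},
]
for var, (n, nmiss) in n_and_nmiss(rows).items():
    print(f"{var:<13} N={n} NMISS={nmiss}")
```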
STEP 6: Observing Variable Distributions:
PROC CHART DATA=COMBINED_FLIGHT;
    VBAR SPEED_AIR SPEED_GROUND HEIGHT PITCH DISTANCE;
RUN;
STEP 7: Identifying Abnormal Rows:
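/* Flag abnormal rows using the thresholds from the variable dictionary.
   SAS treats a missing numeric value as smaller than any number, so a row
   with missing DURATION, HEIGHT or SPEED_GROUND is also flagged (FLAG=1).
   The SPEED_AIR bounds (30-140 MPH) are not checked here. */
DATA VALIDATE1;
    SET COMBINED_FLIGHT;
    IF DURATION < 40 THEN FLAG = 1;
    ELSE IF HEIGHT < 6 THEN FLAG = 1;
    ELSE IF SPEED_GROUND < 30 OR SPEED_GROUND > 140 THEN FLAG = 1;
    ELSE IF DISTANCE > 6000 THEN FLAG = 1;
    ELSE FLAG = 0;
RUN;
PROC PRINT DATA=VALIDATE1(OBS=10);
RUN;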
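The flagging rules can be mirrored in Python to make the missing-value behavior explicit. This is a hedged sketch, not the author's code: it assumes rows are dicts with `None` for missing values and maps `None` to negative infinity to imitate SAS, where a missing numeric value compares as smaller than every number. The helpers `sas_value` and `flag` are hypothetical.

```python
import math


def sas_value(x):
    # SAS compares a numeric missing value as smaller than every number; -inf mimics that.
    return -math.inf if x is None else x


def flag(row):
    """Return 1 (abnormal) or 0 (normal), following the IF / ELSE IF chain of STEP 7."""
    duration = sas_value(row.get("DURATION"))
    height = sas_value(row.get("HEIGHT"))
    speed_ground = sas_value(row.get("SPEED_GROUND"))
    distance = sas_value(row.get("DISTANCE"))
    if duration < 40:
        return 1
    if height < 6:
        return 1
    if speed_ground < 30 or speed_ground > 140:
        return 1
    if distance > 6000:
        return 1
    return 0


normal = {"DURATION": 120.0, "HEIGHT": 30.0, "SPEED_GROUND": 90.0, "DISTANCE": 2500.0}
no_duration = dict(normal, DURATION=None)
print(flag(normal), flag(no_duration))  # 0 1
```

The second call shows why rows with a missing duration end up in FLAGGED_FLIGHTS even though they are not outliers in the usual sense; a missing DISTANCE, by contrast, is never flagged because `-inf > 6000` is false.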
STEP 8: Summary Of Abnormal And Normal Data:
DATA FLAGGED_FLIGHTS;
    SET VALIDATE1;
    IF FLAG = 1;   /* subsetting IF: keep abnormal rows only */
RUN;
TITLE 'ABNORMAL DATA';
PROC MEANS DATA=FLAGGED_FLIGHTS;
RUN;
DATA NORMAL_FLIGHTS;
    SET VALIDATE1;
    IF FLAG = 0;   /* keep normal rows only */
RUN;
TITLE 'NORMAL DATA';
PROC MEANS DATA=NORMAL_FLIGHTS;
RUN;
Observation: After removing empty rows and duplicates, and then removing outliers, we are left with 195 non-missing rows for speed_air and 781 rows for each of the following variables:
- Duration
- No_pasg
- Speed_ground
- Height
- Pitch
- Distance
CHAPTER 2: DESCRIPTIVE STUDY
WORKING WITH NORMAL DATA
DESCRIPTION: The purpose of this chapter is to use scatterplots and Pearson correlation to understand the relationships between variables.
STEP 1: Creating Scatterplots Between the Response Variable and Each Predictor Variable:
CODE:
PROC PLOT DATA=NORMAL_FLIGHTS;
PLOT DISTANCE*DURATION;
PLOT DISTANCE*NO_PASG;
PLOT DISTANCE*SPEED_GROUND;
PLOT DISTANCE*SPEED_AIR;
PLOT DISTANCE*HEIGHT;
PLOT DISTANCE*PITCH;
RUN;
STEP 2: Pearson Correlation Between Variables:
PROC CORR DATA=NORMAL_FLIGHTS;
    VAR DISTANCE DURATION NO_PASG SPEED_GROUND SPEED_AIR HEIGHT PITCH;
RUN;
OBSERVATION:
This turns out to be a very useful step, as it helps us identify the correlation between each pair of variables.
The output shows that the correlation between “speed_ground” and “speed_air” is 0.988 (nearly perfect correlation), which indicates strong collinearity. Because PROC CORR handles missing values pairwise by default, this value is computed on the rows where both speeds are present. Therefore, we can drop one of these two variables in the regression step.
Since “speed_air” has only 195 non-missing rows compared with 781 for “speed_ground”, we drop “speed_air” from our regression model.
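The Pearson coefficient behind this decision can be written out directly. The sketch below is illustrative only: `pearson_pairwise` is a hypothetical helper, the speed values are made up, and it assumes pairwise deletion (use only positions where both values are present), which is how the 0.988 figure relates to the 195 speed_air rows.

```python
import math


def pearson_pairwise(xs, ys):
    """Pearson r using only positions where both values are present (pairwise deletion)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n


# Made-up speeds: SPEED_AIR is missing for the first two flights.
speed_ground = [80.0, 90.0, 100.0, 110.0, 120.0]
speed_air = [None, None, 101.0, 111.5, 121.0]
r, n = pearson_pairwise(speed_ground, speed_air)
print(n, round(r, 3))
```

A value of r this close to 1 means the two predictors carry almost the same information, so keeping both would inflate the variance of their regression coefficients.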
CHAPTER 3: STATISTICAL STUDY
DESCRIPTION: The statistical study is an important step, as it fits a regression model with an intercept to the data and shows whether each variable is positively or negatively associated with landing distance.
STEP 1: Fitting a Linear Regression Model On Our Normal Dataset:
PROC REG DATA=NORMAL_FLIGHTS;
    MODEL DISTANCE = DURATION NO_PASG SPEED_GROUND HEIGHT PITCH;
RUN;
QUIT;   /* PROC REG is interactive; QUIT ends the procedure */
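To show what "linear regression with an intercept" computes, here is a small ordinary-least-squares solver in plain Python. It is a teaching sketch, not how PROC REG is implemented: the function `ols_with_intercept` is hypothetical, it solves the normal equations (AᵀA)b = Aᵀy by Gauss-Jordan elimination with partial pivoting, and that approach is less numerically stable than the QR-based solvers used by statistical libraries.

```python
def ols_with_intercept(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend the intercept column
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y] of the normal equations.
    M = [[sum(A[r][i] * A[r][j] for r in range(len(A))) for j in range(p)]
         + [sum(A[r][i] * y[r] for r in range(len(A)))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: check for collinear predictors")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row (Gauss-Jordan).
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


# Made-up data generated exactly from y = 1 + 2*x1 - 3*x2, so the fit recovers [1, 2, -3].
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
y = [1 + 2 * a - 3 * b for a, b in X]
coefs = ols_with_intercept(X, y)
print([round(c, 6) for c in coefs])  # [1.0, 2.0, -3.0]
```

The singular-matrix error is the extreme case of the collinearity problem discussed above: if one predictor were an exact linear function of another (as speed_air nearly is of speed_ground), the normal equations would have no unique solution.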
OBSERVATION: The Equation From the Regression on the Normal Data Is:
DISTANCE = -2826.33022 – 0.10338*(DURATION) - 2.35945*(NO_PASG) +
42.15245*(SPEED_GROUND) + 13.58277*(HEIGHT) + 187.99136*(PITCH)
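The equation shows that flight duration and number of passengers have negative coefficients, while ground speed, aircraft height and pitch have positive coefficients. For example, holding the other variables fixed, each additional 1 MPH of ground speed adds about 42 feet to the predicted landing distance.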
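Applying the fitted equation is plain arithmetic. The sketch below copies the coefficients from the equation above into a hypothetical `predict_distance` function and evaluates it for a made-up flight; the inputs should stay within the ranges of the normal data (for example, 30 to 140 MPH ground speed), since a linear model says nothing reliable outside them.

```python
# Coefficients copied from the fitted equation (normal data, speed_air excluded).
COEFFICIENTS = {
    "INTERCEPT": -2826.33022,
    "DURATION": -0.10338,
    "NO_PASG": -2.35945,
    "SPEED_GROUND": 42.15245,
    "HEIGHT": 13.58277,
    "PITCH": 187.99136,
}


def predict_distance(duration, no_pasg, speed_ground, height, pitch, coef=COEFFICIENTS):
    """Predicted landing distance in feet from the fitted linear model."""
    return (coef["INTERCEPT"]
            + coef["DURATION"] * duration
            + coef["NO_PASG"] * no_pasg
            + coef["SPEED_GROUND"] * speed_ground
            + coef["HEIGHT"] * height
            + coef["PITCH"] * pitch)


# Made-up flight: 150 min, 60 passengers, 80 MPH ground speed, 30 m height, 4 degrees pitch.
print(round(predict_distance(150, 60, 80, 30, 4), 2))  # 1548.24
```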
Q&A
1. How many observations (flights) do you use to fit your final model? If not all 950 flights, why?
Solution: I used 781 of the 950 observations to fit the model. 100 rows were duplicates (based on the keys speed_ground, speed_air, height, pitch and distance), and 69 rows were removed because they were flagged as abnormal (outliers) by the rules in Step 7.
2. What factors affect the landing distance of a flight, and how?
Solution: Regression Equation
DISTANCE = -2826.33022 – 0.10338*(DURATION) - 2.35945*(NO_PASG) +
42.15245*(SPEED_GROUND) + 13.58277*(HEIGHT) + 187.99136*(PITCH)
The regression equation shows that flight duration and number of passengers have negative coefficients, while ground speed, aircraft height and pitch have positive coefficients.
3. Is there any difference between the two makes, Boeing and Airbus?
Solution:
Regression Equation for Boeing aircraft
DISTANCE = -1796.98903 + 0.45775*(DURATION) – 1.9524*(NO_PASG) +
42.28107*(SPEED_GROUND) + 14.23727*(HEIGHT) – 39.31818*(PITCH)
Regression Equation for Airbus aircraft
DISTANCE = -2788.83858 – 0.30974*(DURATION) – 0.33445*(NO_PASG) +
42.90888*(SPEED_GROUND) + 13.98867*(HEIGHT) + 80.78737*(PITCH)
We can see a wide shift in the intercept and pitch coefficients between the two makes, but what stands out most is the change in sign of the pitch and duration coefficients. This suggests that pitch and duration are associated with landing distance in opposite directions for Boeing and Airbus aircraft.
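The sign comparison can be checked mechanically. In this hypothetical sketch the two dicts copy the Boeing and Airbus coefficients from the equations above, and `sign_flips` lists the terms whose coefficients point in opposite directions. Small coefficients such as those for duration may not be statistically significant, so the standard errors and p-values in each PROC REG output are worth checking before reading much into a flip.

```python
# Coefficients copied from the make-specific regression equations above.
BOEING = {"INTERCEPT": -1796.98903, "DURATION": 0.45775, "NO_PASG": -1.9524,
          "SPEED_GROUND": 42.28107, "HEIGHT": 14.23727, "PITCH": -39.31818}
AIRBUS = {"INTERCEPT": -2788.83858, "DURATION": -0.30974, "NO_PASG": -0.33445,
          "SPEED_GROUND": 42.90888, "HEIGHT": 13.98867, "PITCH": 80.78737}


def sign_flips(a, b):
    """Names of terms whose coefficients have opposite signs in the two models."""
    return [name for name in a if (a[name] > 0) != (b[name] > 0)]


print(sign_flips(BOEING, AIRBUS))  # ['DURATION', 'PITCH']
```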
