Dealing with Large Data Sets: Airline Departure and Arrival Data for Florida Airports (2012)
Group: Group 17
Group Members: Andrew Cruz, John Idasetima, David Battle, Michael Ritchhart, Erik Lopez
Course: LIS3706
Date: 4/24/2013
1. When is it more likely to depart and arrive on schedule to your assigned airport?

Flight schedules vary widely: on some days flights are delayed and on others they run ahead of schedule. Schedules often change because of unforeseen events, but some delays can be prevented. A detailed analysis was conducted on the Ft. Lauderdale airport to highlight when flights were more likely to depart and arrive on schedule. In the analysis the data was separated by day, month, and year, measuring the average number of minutes flights were delayed departing from and arriving at the airport. The number of flights that arrived on time and the number that departed on time were also recorded.

The data was originally sorted by the airport from which each flight originated. Because this left out some of the Ft. Lauderdale flights, the data had to be sorted again to extract the remaining information, which pertained to flights that departed from other airports but arrived at the Ft. Lauderdale airport. The departure and arrival data pertaining to the Ft. Lauderdale airport was then selected and moved into a new document.

An analysis was done for each month, initially focusing on the total number of flights per month, which ranged from 9,381 to 13,159. These totals were then separated into flights that departed early, on time, and late by placing a filter on the departure delay minutes. The early, on-time, and late flights were identified by values that were positive, zero, or negative, respectively. The same was done for the flights that arrived at the airport by placing a filter on the arrival delay minutes. These flights were separated in the same way, except that positive values marked late flights and negative values marked early flights; on-time flights were still those with a value of zero.
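The sign-based split described above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet filters the group used: the function names `classify` and `tally` and the sample delay values are hypothetical, and the sketch follows the arrival convention stated above (negative is early, zero is on time, positive is late), with blank cells treated as unknown.

```python
from collections import Counter

def classify(delay_minutes):
    """Label one delay value; None stands for a blank cell (an assumption)."""
    if delay_minutes is None:
        return "unknown"
    if delay_minutes < 0:
        return "early"
    if delay_minutes == 0:
        return "on time"
    return "late"

def tally(delays):
    """Count early, on-time, late, and unknown flights in a list of delays."""
    return Counter(classify(d) for d in delays)

# Hypothetical arrival delays in minutes, not taken from the 2012 data set.
sample = [-5, 0, 12, None, 30, -2, 0]
counts = tally(sample)
```

Dividing each count by the number of flights in the list would give the kind of early, on-time, and late percentages discussed in this section.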
Averages for each day and month were found by adding up the total number of delay minutes for each day. For departures the daily averages ranged from 2.3 to 19.8 minutes, and the monthly averages ranged from 4.8 to 14.6 minutes. The same was done for arrivals: the daily averages were between -6.4 and 18.4 minutes, and the monthly averages were between -0.68 and 12.1 minutes. Percentages were then found with a function that divided the number of early, on-time, and late flights by the total number of flights for the airport.

Another analysis measured the number of flights that departed early and on time for each day of the week in every month. For this analysis two filters were placed on the data, one on the day-of-week column and the other on the departure delay column, which shows the minutes each flight was delayed. A custom filter on the departure delay column selected all rows that were positive or equal to zero. After the filter was applied, a function was added to the column to count the flights for that day. The counts ranged from 361 to 1,140 flights. A similar procedure was followed for the flights that arrived early or on time. This procedure differed from the previous one in that the values selected were in the arrival delay column and the early and on time
numbers were negative and zero, respectively. The counts for this analysis ranged from 645 to 1,395.

Percentages for the flights that departed early and on time were found for each day of the week. The values were found by taking the number of flights that departed early or on time on that day of the week in a particular month and dividing it by the total number of early and on-time flights. The values ranged from 4.68 to 14.0 percent. These steps were repeated for the flights that arrived early or on time at the airport, which returned values ranging from 5.2 to 11.5 percent.

After carefully analyzing the flight schedules, it appeared that April had the lowest average departure delay and December had the highest. For arrival delays, February had the lowest average and December once again had the highest. September had the fewest flights depart on time or early, while December had the most. August had the fewest flights arrive on time or early, while March had the most. Tuesdays have the fewest flights that depart early or on time, while Fridays have the most. Fridays have the fewest flights that arrive on time or early, while Sundays have the most.

2. When is it more likely to get delayed (departure/arrival)?

Flights leaving Ft. Lauderdale International Airport, as well as flights arriving at the airport, can be delayed. A delayed arrival can be caused by a number of things, including but not limited to a late departure, lack of docking space, luggage mistakes, and poor weather. Flights departing from the airport can be delayed for many similar reasons.
To find out when flights departing from and arriving at the Ft. Lauderdale International Airport were more likely to be delayed, various data manipulations were performed on a data set containing detailed flight information about the airport over the course of a year.

By using filters to select only the flights departing from Ft. Lauderdale, we were able to determine the number of delayed departures over the course of the year. To obtain only the delayed flights, we filtered out all of the negative numbers, zeros, and blanks from the departure delay times. Results indicated a total of 22,635 delayed departures over the course of the year.

To obtain the delays for arrivals at the Ft. Lauderdale Airport, we used filters to select only Ft. Lauderdale as the destination airport and then applied the same filters to the arrival delay times to remove early, on-time, or unknown arrival times. Results of this manipulation indicated a total of 25,368 delayed arrivals over the course of the year.

Thus, it is evident that flights were more likely to be delayed when arriving at the Ft. Lauderdale International Airport. However, while arrivals had a larger number of delays over the course of the year, the data indicates that during specific months, the number of
departure delays exceeded the number of arrival delays. The only two months to fall into this category were January, which had exactly one more delayed departure, and March, which had exactly 25 more delayed departures than delayed arrivals. Interestingly, during both of these months, the mean time passengers spent waiting was approximately 27 minutes for delayed departures and 25 minutes for delayed arrivals.

Despite delays being more likely to occur during arrivals, the mean time spent waiting for delayed arrivals was less than the mean time spent waiting for delayed departures in every single month of the year. The graph below illustrates this, as well as the approximate delay times for each month and their respective delay types.

Evidently, wait times (delays) were highest for both arrivals and departures during December, most likely due to the traffic caused by the many holidays that take place during this month.
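As a rough sketch of the filtering described in this section, the Python below keeps only positive delay values (dropping negatives, zeros, and blanks), then counts the delayed flights and averages their delay for each month. The record layout (a month number paired with a delay in minutes), the function name `delayed_by_month`, and the sample values are assumptions for illustration, not the actual columns or figures from the data set.

```python
from collections import defaultdict

def delayed_by_month(records):
    """Return {month: (count, mean_delay)} for flights with a positive delay.

    `records` is an iterable of (month, delay_minutes) pairs; a delay of
    None stands for a blank cell. Both are assumed layouts.
    """
    totals = defaultdict(lambda: [0, 0.0])  # month -> [count, sum of minutes]
    for month, delay in records:
        if delay is None or delay <= 0:
            continue  # early, on-time, or unknown: not counted as a delay
        totals[month][0] += 1
        totals[month][1] += delay
    return {m: (n, s / n) for m, (n, s) in totals.items()}

# Hypothetical departure records: (month, departure delay in minutes).
departures = [(1, 30), (1, -4), (1, 10), (3, 0), (3, 45), (12, None), (12, 60)]
summary = delayed_by_month(departures)
```

Running the same function on arrival records would give the counts and mean delays needed to compare departures with arrivals month by month.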
3. What are the likely causes of delay?

According to the information provided, the most likely cause of delay at the Ft. Lauderdale airport is a carrier delay. According to the Federal Aviation Administration, “Carrier delay is within the control of the air carrier. Examples of occurrences that may determine carrier delay are: aircraft cleaning, aircraft damage, awaiting the arrival of connecting passengers or crew, baggage, bird strike, cargo loading, catering, computer, outage-carrier equipment, crew legality (pilot or attendant rest), damage by hazardous goods, engineering inspection, fueling, handling disabled passengers, late crew, lavatory servicing, maintenance, over sales, potable water servicing, removal of unruly passenger, slow boarding or seating, stowing carry-on baggage, weight and balance delays.” With such a long list of possible causes, it is easy to see why this type of delay is the most frequent.

After research I found that carrier delays are not usually caused by just one of the previously mentioned scenarios. For most delays, a sequence of these events is the culprit. For example, a bird strike (which doesn’t happen too often) could cause minor aircraft damage. Even minor aircraft damage is treated as a very serious matter for the safety of the passengers and the crew. The repair can take anywhere from 1 hour to a few days. Once the damage is repaired, a required engineering inspection has to be carried out. You can now see that the rare chance of hitting a bird that damages the plane can be the cause of a quite lengthy delay.

The table below displays the total number of carrier delays for each month of 2012. The total number of carrier delays (13,341) far exceeds the total for any other type of delay mentioned.
This data was gathered by separating the given information by airport and then sorting the carrier delays from greatest to smallest. Once sorted, the entries were counted to give the numbers presented.

Total Amount of Carrier Delays for 2012

January   February   March       April     May        June
1007      1222       1223        879       904        1078
July      August     September   October   November   December
1375      1254       713         958       975        1753

Total amount of carrier delays 2012: 13,341
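The monthly tally could also be produced programmatically rather than by sorting and counting by hand. The sketch below assumes each record carries a month number and a carrier delay in minutes (blank when no carrier delay was logged); the function name `carrier_delays_per_month`, these field choices, and the sample rows are hypothetical. The last lines simply re-add the monthly figures reported in the table above as a check on the stated total.

```python
from collections import Counter

def carrier_delays_per_month(records):
    """Count flights per month whose carrier delay is greater than zero."""
    counts = Counter()
    for month, carrier_delay in records:
        if carrier_delay is not None and carrier_delay > 0:
            counts[month] += 1
    return counts

# Hypothetical rows: (month, carrier delay in minutes, or None if blank).
rows = [(1, 15), (1, None), (1, 0), (2, 40), (2, 5)]
per_month = carrier_delays_per_month(rows)

# Monthly carrier delay counts as reported in the table for 2012.
reported = [1007, 1222, 1223, 879, 904, 1078, 1375, 1254, 713, 958, 975, 1753]
reported_total = sum(reported)
```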
4. What are the numbers of cancellations, delays per month/year?

According to the data provided, there were 0 flight cancellations for the Ft. Lauderdale International Airport in 2012. This is supported by the fact that every flight within the year has a scheduled departure time followed by an actual departure time. Because every flight has a departure, it is safe to assume that there were no cancellations. There are in fact flights that departed and have no arrival times, but there isn’t enough information to confirm that a cancellation was officially made. A flight could have no arrival time for many reasons, including wrecks, improper logging, etc. Nevertheless, the table below displays the number of flights that had no arrivals at the Ft. Lauderdale International Airport in 2012.

Flights with No Arrivals 2012

January   February   March       April     May        June
39        25         66          28        45         62
July      August     September   October   November   December
78        89         39          65        74         71

Total number of flights with no arrivals: 681

Unlike cancellations, delays were plentiful at the Ft. Lauderdale International Airport in 2012. Fortunately, we have been provided with an abundance of flight arrival and departure data. This data allows us to figure out how many delays and cancellations took place. Furthermore, the data specifies exactly what caused the delays. The categories for the delays are as follows:

1. Carrier Delay
2. Weather Delay
3. NAS Delay
4. Security Delay
5. Late Aircraft Delay

We can calculate the number of delays by searching for arrival and departure delays that are greater than zero. This is an easy task to accomplish with the filtered data. With that information, we can find the following:
Total Amount of Delays for 2012 (arrivals + departures)

January   February   March       April     May        June
4016      3918       4892        3917      3841       4181
July      August     September   October   November   December
4885      4477       2725        3806      4072       5678

Total Amount of Delays for 2012 (arrivals and departures): 50,408

The total number of delays relevant to the Ft. Lauderdale International Airport is 50,408. This number does seem quite high, but please note that it counts not only delayed departures from the airport but also late arrivals coming in to the airport. This data was extracted by sorting the delay minutes column for the Ft. Lauderdale Airport from greatest to least. Once this was done, I scrolled down to the last row with a positive delay (not including 0). If you look at the table you will notice that the summer months (June, July, and August) and holiday months (December and November) have some of the highest numbers of delays. This is easily believable because these are some of Florida’s most trafficked dates.

5. What are the number of on-time arrivals and departures?

The extracted flight data can be used to find the exact number of on-time arrivals and departures for Ft. Lauderdale International Airport. Filters must be applied to columns in order to sort the data. The origin airport ID, destination airport ID, and arrival/departure delay columns are used extensively in our analysis.

On-time arrivals and departures will have a value that’s equal to 0 in their respective delay column. The information is derived from each airline that operates at Fort Lauderdale International Airport. The following graph represents the number of on-time arrivals and departures for 2012:
[Figure: On-Time Arrivals and Departures for 2012. Monthly counts, January through December, for two series (Arrivals, Departures); vertical axis from 0 to 400.]

Analysis of On-Time Arrivals

The destination airport ID must match Ft. Lauderdale International Airport. These flights are inbound to the airport, so they’re arrivals. The value in the arrival delay column must equal 0. When the data is filtered this way, the records can be tallied and the number of on-time arrivals can be found. The average number of on-time arrivals per month for 2012 is 121.83. The total number of on-time arrivals for 2012 is 1,462.

Analysis of On-Time Departures

Different columns are used to find the departure data. The origin airport ID must match Ft. Lauderdale International Airport, because these flights are departing from that airport. The value in the departure delay column must be equal to 0. The number of on-time departures is derived from records that meet the filtering criteria.

The average number of on-time departures per month for 2012 is 304. With 3,648 total on-time departures, it’s apparent that Fort Lauderdale International Airport has a greater number of on-time departing flights than arriving flights.

Finding Airlines with More On-Time and Delayed Flights
[Figure: series Departure Delay %, Departure On-Time %, Arrival Delay %, and Arrival On-Time %; vertical axis from 0.00% to 60.00%.]

Analysis

Frontier Airlines has the highest number of departure delays. Southwest Airlines has the greatest number of on-time departures. American Airlines has the highest number of arrival delays. Frontier Airlines has the greatest number of on-time arrivals.

In order to find the number of on-time arrivals and departures for each airline, you must add an additional filter to the data. Once again, the destination and origin columns are used to determine the number of on-time flights. To find the number of on-time arrivals for Fort Lauderdale, you simply filter the destination to Fort Lauderdale International Airport and filter the arrival delay column for 0.

Unlike the previous analysis, the carrier column must also be filtered in order to find the number of on-time arrivals for an airline. For instance, American Airlines had 111 on-time arrivals for 2012. The code for American Airlines is "AA." Other codes can be found in the document that accompanies the data.

If there’s a delay or an early arrival, then the value in the arrival delay column will NOT be equal to zero. One must use the origin and departure delay columns to find the number of on-time departures from the airport. These are the same steps as before; the difference is that the data is also being filtered by the carrier column. Both sets of data must be recorded for each airline.

The same process must be repeated to find the number of delayed flights. The arrival and departure delay columns must be filtered for values greater than 0. A negative value indicates an early flight, a 0 indicates an on-time flight, and a value greater than zero indicates a delayed flight. Delays can also be further subdivided by specific cause (such as carrier, weather, etc.). The number of arrival and departure delays must be recorded.

After this is done, the total number of departures and arrivals for each airline is recorded. On-time and delayed departures are divided by the total number of departures for each airline. The same thing is done for arrivals. Once this is done, the percentages of on-time and delayed flights can be placed onto a graph.
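The per-airline percentages described above can be sketched as follows. This is an illustration under stated assumptions rather than the group's spreadsheet procedure: the function name `carrier_rates` is hypothetical, the delay values are made up, and "XX" is a placeholder carrier code. Each carrier's on-time and delayed counts are divided by that carrier's total number of flights in the list.

```python
from collections import defaultdict

def carrier_rates(flights):
    """Return {carrier: (on_time_pct, delayed_pct)} from (carrier, delay) pairs.

    A delay of 0 counts as on time and a delay above 0 counts as delayed;
    negative (early) flights count toward the total only.
    """
    stats = defaultdict(lambda: {"total": 0, "on_time": 0, "delayed": 0})
    for carrier, delay in flights:
        s = stats[carrier]
        s["total"] += 1
        if delay == 0:
            s["on_time"] += 1
        elif delay > 0:
            s["delayed"] += 1
    return {
        c: (100 * s["on_time"] / s["total"], 100 * s["delayed"] / s["total"])
        for c, s in stats.items()
    }

# Hypothetical arrivals at the airport: (carrier code, arrival delay in minutes).
# "XX" is a placeholder code, not a real airline in the data set.
arrivals = [("AA", 0), ("AA", 20), ("AA", -3), ("AA", 5), ("XX", 0), ("XX", 0)]
rates = carrier_rates(arrivals)
```

Passing departure records (carrier code with departure delay) to the same function would give the departure percentages for the graph.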