Flight Data Analysis Using Apache Pig: Final Year Project
1. Submitted By
SANJIB MITRA (150403074)
SANTANU SINGHA (150403076)
SHRUTI KULSHRESTHA (150403085)
SUBHAM KUMAR MAHANTY (150403101)
Bachelor of Technology
in
Electronics and Communication
Under the supervision of
Mr. Souvik Pal
Department of Computer Science and Engineering
2. CONTENTS
Abstract
Aim of the Project
What Is Big Data
Tools We Have Used in Our Project
What We Have Done in Our Project
Some Output of Our Project
Discussion
Conclusion
3. ABSTRACT
This project analyzes a large flight database to identify the
factors that drive an airline company into loss.
For the analysis we used big data technologies such as
Apache Pig and MapReduce.
We created a set of queries that give a clear view of the areas
in which an airline company should work or take action in
order to increase predictability.
We believe that our approach will help bring some growth to the
business of airline companies and will be useful to business analysts as well.
4. AIM OF THE PROJECT
The main aim of the project was optimization.
First, we had to analyze the data so that we could work on the obvious
problems that passengers face today while travelling by air.
We then generated a few queries and tried to optimize the travel time between
various destinations so that the results could be used for further
improvements.
We noticed that many flights are delayed over and over again for the same
reasons, so we collected data for a certain period of time, analyzed it,
and worked on certain areas.
5. What is big data
“A collection of data sets so large and
complex that it becomes difficult to
process using on-hand database
management tools or traditional
data processing applications.”
OR
“Big data is high-volume, high-velocity
and high-variety information assets
that demand cost-effective,
innovative forms of
information processing for enhanced
insight and decision making.”
7. WHAT WE HAVE DONE IN OUR PROJECT
I. Find the top five most visited destinations.
II. Which month saw the most cancellations due to bad weather?
III. Find the top ten origins with the highest average departure_delay.
IV. Which route (origin and destination) saw the most diversions?
V. In which month were the most flights cancelled?
VI. Find the top ten origins for which the reason for delay is "security_delay".
VII. Find the top ten destinations by average arrival_delay.
VIII. Find the twenty-five airports where the fewest flights landed.
IX. Find the average Air System delay for each route (origin and destination).
X. Find the top ten origins with the highest average WEATHER_DELAY.
XI. For which reason were the most flights cancelled?
XII. Which airport saw the most flight cancellations?
XIII. Find the top ten routes with the greatest distance between origin and destination.
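Most of the queries above share one shape: group the flight records by a key, aggregate each group, sort, and keep the first few rows (in Pig this roughly corresponds to GROUP, FOREACH ... GENERATE, ORDER and LIMIT). The Python sketch below illustrates that shape for queries I and III on a tiny in-memory sample. It is not the project's Pig script; the field names `origin`, `destination` and `departure_delay`, the airport codes, and the sample rows are assumptions made only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows; the real dataset and its column names may differ.
SAMPLE_FLIGHTS = [
    {"origin": "AAA", "destination": "BBB", "departure_delay": 10.0},
    {"origin": "AAA", "destination": "CCC", "departure_delay": 30.0},
    {"origin": "BBB", "destination": "CCC", "departure_delay": None},  # missing value
    {"origin": "BBB", "destination": "CCC", "departure_delay": 5.0},
    {"origin": "CCC", "destination": "BBB", "departure_delay": 50.0},
]


def top_destinations(flights, n):
    """Query I shape: count flights per destination, keep the n most frequent."""
    counts = Counter(f["destination"] for f in flights)
    # Sort by count (descending), then by name so that ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def top_origins_by_avg_delay(flights, n):
    """Query III shape: average departure delay per origin, highest first."""
    totals = defaultdict(lambda: [0.0, 0])  # origin -> [sum of delays, row count]
    for f in flights:
        delay = f["departure_delay"]
        if delay is None:
            continue  # rows without a delay value are left out of the average
        totals[f["origin"]][0] += delay
        totals[f["origin"]][1] += 1
    averages = {origin: s / c for origin, (s, c) in totals.items()}
    return sorted(averages.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    print(top_destinations(SAMPLE_FLIGHTS, 5))
    print(top_origins_by_avg_delay(SAMPLE_FLIGHTS, 10))
```

Queries VII, VIII and X would follow the same pattern with a different key, a different aggregated field, or an ascending sort instead of a descending one.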
9. Which route (origin & destination) has seen the maximum diversion?
[Figure: query and its answer]
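The route query groups on a pair of fields rather than a single one. A minimal Python sketch of that idea follows, assuming each record carries a 0/1 `diverted` flag; the field names, airport codes and rows are hypothetical and are not taken from the project's data or scripts.

```python
from collections import Counter

# Hypothetical rows; "diverted" is assumed to be a 0/1 flag per flight.
SAMPLE_ROUTES = [
    {"origin": "AAA", "destination": "BBB", "diverted": 1},
    {"origin": "AAA", "destination": "BBB", "diverted": 0},
    {"origin": "AAA", "destination": "BBB", "diverted": 1},
    {"origin": "BBB", "destination": "AAA", "diverted": 1},
    {"origin": "CCC", "destination": "AAA", "diverted": 0},
]


def most_diverted_route(flights):
    """Return ((origin, destination), count) for the route with most diversions.

    Returns None when no flight in the input was diverted.
    """
    counts = Counter(
        (f["origin"], f["destination"]) for f in flights if f["diverted"] == 1
    )
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically by route.
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(most_diverted_route(SAMPLE_ROUTES))
```

Because the key is the ordered pair (origin, destination), AAA to BBB and BBB to AAA are counted as two different routes in this sketch.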
10. Top twenty five airports where minimum numbers of flight landed?
[Figure: query and its answer]
11. Which airport has seen the maximum number of flights cancelled?
[Figure: query and its answer]
12. DISCUSSION
In this project we analyzed a flight dataset of 1 crore (10 million) rows and 31 columns.
After analyzing the data carefully, we framed around thirteen queries.
These queries mainly concern the reasons for delay, the number of flights, and their origins and
destinations.
After going through the problems, we tried our best to minimize losses so that flight
companies can increase their profits, and to reduce the inconvenience caused to
passengers by weather conditions and by air system, security, airline, late aircraft,
and weather delays.
Together with our project mentor, we took steps to examine the project in detail and
to bring out findings that had gone unnoticed until now.
13. CONCLUSION
Hadoop MapReduce is now a popular choice for
performing large-scale data analytics. Big data analytics
using Pig
sheds light on significant issues in flight data; for example,
we can find the number of flights cancelled per month.
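As an illustration of the per-month cancellation count mentioned above, here is a small Python sketch. It assumes each record has a numeric `month` (1 to 12) and a 0/1 `cancelled` flag; both field names and the sample rows are hypothetical, and the project itself used Pig rather than Python.

```python
from collections import Counter

# Hypothetical rows; "cancelled" is assumed to be a 0/1 flag per flight.
SAMPLE_CANCELLATIONS = [
    {"month": 1, "cancelled": 1},
    {"month": 1, "cancelled": 0},
    {"month": 2, "cancelled": 1},
    {"month": 2, "cancelled": 1},
    {"month": 3, "cancelled": 0},
]


def cancellations_per_month(flights):
    """Return a list of (month, cancelled flight count), ordered by month."""
    counts = Counter(f["month"] for f in flights if f["cancelled"] == 1)
    return sorted(counts.items())


def worst_month(flights):
    """Return (month, count) for the month with the most cancellations, or None."""
    per_month = cancellations_per_month(flights)
    if not per_month:
        return None
    # max() keeps the first maximum, so ties go to the earliest month.
    return max(per_month, key=lambda kv: kv[1])


if __name__ == "__main__":
    print(cancellations_per_month(SAMPLE_CANCELLATIONS))
    print(worst_month(SAMPLE_CANCELLATIONS))
```

Months with no cancellations (month 3 in the sample) do not appear in the result of `cancellations_per_month`.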