Halim Abbas
Rashid Ali
Amanda Gilbert
LaTia Jefferson
THINK BIG BOOTCAMP PROJECT
DATA INGESTION
METHODOLOGY 
• Created a Python MapReduce job to format the data for ingestion
• Used a Python dictionary to handle paired data
• Added logic to ignore lines with data issues
• Executed a Hadoop streaming job to ingest the data
• Loaded the data into tables via Hive
• Ingested the FAA's Aircraft Registry data
• Re-ingested the data by site
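The formatting step above can be sketched as a small Hadoop streaming mapper. This is only an illustration under stated assumptions: the slides do not show the raw feed layout, so the `key=value` comma-separated input, the field names in `REQUIRED_FIELDS`, and the helper names are all hypothetical, not the team's actual code.

```python
import sys

# Hypothetical field names; the real feed layout is not given in the slides.
REQUIRED_FIELDS = ("hexid", "speed", "alt")


def parse_line(line):
    """Turn 'key=value,key=value' text into a dict, or return None if malformed."""
    record = {}
    for pair in line.strip().split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            return None  # a token that is not a key=value pair
        record[key.strip()] = value.strip()
    if any(not record.get(name) for name in REQUIRED_FIELDS):
        return None  # a required field is missing or empty
    return record


def format_record(record):
    """Emit the required fields as one tab-separated row for a Hive table."""
    return "\t".join(record[name] for name in REQUIRED_FIELDS)


def run_mapper(lines, out=sys.stdout):
    """Write formatted rows to out and return (accepted, rejected) counts."""
    accepted = rejected = 0
    for line in lines:
        record = parse_line(line)
        if record is None:
            rejected += 1  # skip lines with data issues instead of failing the job
            continue
        out.write(format_record(record) + "\n")
        accepted += 1
    return accepted, rejected
```

Used as a streaming mapper, the script would call `run_mapper(sys.stdin)`. Returning the accepted and rejected counts makes it easy to report how many lines were dropped.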
PRELIMINARY ANALYSIS
INITIAL EXPLORATION 
• Most frequently reporting craft
• Percentage of records accepted and ingested: 99.80% (1924 lines rejected)
SITE COMPARISON 
                      Site 1      Site 2
Number of Sightings   563715      449904
Average Speed         342.02      395.37
Average Altitude      15919.28    20295.33
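A per-site comparison like this one comes down to a count and two means per site. The sketch below shows one way to compute them in plain Python. The `(site, speed, alt)` row layout, the function name, and the sample rows are assumptions made for illustration; they are not the project's data or code.

```python
from collections import defaultdict


def summarize_by_site(sightings):
    """Return {site: (count, mean speed, mean altitude)} for (site, speed, alt) rows."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])  # count, speed sum, altitude sum
    for site, speed, alt in sightings:
        entry = totals[site]
        entry[0] += 1
        entry[1] += speed
        entry[2] += alt
    return {
        site: (count, speed_sum / count, alt_sum / count)
        for site, (count, speed_sum, alt_sum) in totals.items()
    }


# Made-up rows for illustration only; they are not the project's data.
sample = [
    ("site1", 300.0, 12000.0),
    ("site1", 400.0, 18000.0),
    ("site2", 380.0, 21000.0),
]
summary = summarize_by_site(sample)
```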
SITE LATITUDE LONGITUDE DATA 
• Sampled latitude and longitude data from both sites
• Found the average latitude and longitude for each
• Site one: 42.22, -70.85
• Site two: 42.12, -71.49
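The averaging step can be sketched as a plain arithmetic mean of the sampled points, which is a reasonable approximation when the points cover a small area. The haversine helper is an addition for illustration, not something the slides describe; it estimates how far apart the two reported site averages are. Function names and the Earth radius default are assumptions.

```python
import math


def mean_position(points):
    """Arithmetic mean of (lat, lon) pairs; adequate for points spread over a small area."""
    lats, lons = zip(*points)
    return sum(lats) / len(lats), sum(lons) / len(lons)


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


site_one = (42.22, -70.85)  # averages reported on the slide
site_two = (42.12, -71.49)
distance_km = haversine_km(site_one, site_two)
```

With the two averages from the slide, the estimated separation comes out at a little over 50 km.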
SITE 1 RELATIVE LOCATION
SITE 2 RELATIVE LOCATION
MASTER DATA QUERIES 
CREATE TABLE model_summary AS
SELECT mdl_code, make, model, max(speed), max(alt)
FROM master_data
GROUP BY mdl_code, make, model;

CREATE TABLE aircraft_summary AS
SELECT ident, make, model, max(speed), max(alt)
FROM master_data
GROUP BY ident, make, model;

CREATE TABLE owner_summary AS
SELECT owner_name, count(DISTINCT hexid) AS count_hex
FROM master_data
GROUP BY owner_name;
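To see what these summary tables contain, two of the queries can be tried locally on a handful of rows. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Hive, so dialect details may differ. The column types, the sample rows, and the `max_speed` / `max_alt` aliases are assumptions added for illustration and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE master_data "
    "(hexid TEXT, ident TEXT, mdl_code TEXT, make TEXT, model TEXT, "
    "owner_name TEXT, speed REAL, alt REAL)"
)
# Made-up rows for illustration only; they are not the project's data.
rows = [
    ("A00001", "N100AA", "M1", "MAKE_A", "MODEL_X", "OWNER_1", 410.0, 35000.0),
    ("A00001", "N100AA", "M1", "MAKE_A", "MODEL_X", "OWNER_1", 455.0, 37000.0),
    ("A00002", "N200BB", "M1", "MAKE_A", "MODEL_X", "OWNER_1", 430.0, 39000.0),
    ("A00003", "N300CC", "M2", "MAKE_B", "MODEL_Y", "OWNER_2", 120.0, 4500.0),
]
conn.executemany("INSERT INTO master_data VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Highest observed speed and altitude per aircraft model.
conn.execute(
    "CREATE TABLE model_summary AS "
    "SELECT mdl_code, make, model, max(speed) AS max_speed, max(alt) AS max_alt "
    "FROM master_data GROUP BY mdl_code, make, model"
)
# Number of distinct transponder hex IDs seen per registered owner.
conn.execute(
    "CREATE TABLE owner_summary AS "
    "SELECT owner_name, count(DISTINCT hexid) AS count_hex "
    "FROM master_data GROUP BY owner_name"
)

model_rows = conn.execute(
    "SELECT * FROM model_summary ORDER BY mdl_code"
).fetchall()
owner_rows = conn.execute(
    "SELECT * FROM owner_summary ORDER BY owner_name"
).fetchall()
```

Each summary holds one row per group: `model_summary` keeps the maxima across every sighting of a model, and `owner_summary` counts each aircraft once no matter how often it was sighted.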
DATA SCIENCE & VISUALIZATIONS
FASTEST PLANES
TOP SPEED VS CRUISING ALTITUDE BY MAKE
UNIQUE FLIGHTS BY AIRLINE
NUMBER OF SIGHTINGS BY AIRCRAFT MAKE
Team3 presentation