Two beginners in Python analysed network data from the Lufthansa Open API and constructed web applications showcasing these results using Python, SQL and D3.js.
3. Objective
● To produce a map that shows the location of airports in Europe and the
direct flights between them
○ What we need...
■ list of European airports
■ direct flights between any two airports
● To analyse the importance of airports based on the number of
connections needed
○ What we need...
■ list of European airports
■ number of direct flights for each airport
■ rank of airports by number of destinations
4. Definitions
● Case study in Europe (defined as EU28 including Britain, plus Switzerland
and Norway)
● Connection: the smallest number of transfers needed to travel between
two airports
○ Connection = 0: direct flight (without transfer)
○ Connection = 1: with 1 transfer
○ Connection > 1: with >1 transfer
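The definition above can be sketched as a breadth-first search over the direct-flight network. This is only an illustration under assumptions: the four-airport network and the function name `connections_from` are hypothetical and are not taken from the project code.

```python
from collections import deque


def connections_from(origin, direct_flights):
    """Return {airport: smallest number of transfers} for airports reachable from origin."""
    # The origin is given -1 so that its direct neighbours end up with 0 transfers.
    transfers = {origin: -1}
    queue = deque([origin])
    while queue:
        airport = queue.popleft()
        for next_airport in direct_flights.get(airport, ()):
            if next_airport not in transfers:
                # Breadth-first order guarantees the first visit uses the fewest flights.
                transfers[next_airport] = transfers[airport] + 1
                queue.append(next_airport)
    del transfers[origin]
    return transfers


# Hypothetical network: direct flights A-B, B-C and C-D, in both directions.
direct_flights = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
result = connections_from("A", direct_flights)
# From A: B is a direct flight (0), C needs 1 transfer, D needs 2 transfers.
```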
[Figure: diagram with airports labelled A, B, C and D]
5. Lufthansa Open API
● Reference data: Countries, Cities, Airports
● Operations: Flight Schedules
In principle, the data are not limited to Lufthansa flights
6. Structure of data in the API
Example: Berlin-Tegel airport TXL in XML
Type of location: "Airport", "RailwayStation" or "BusStation"
8. Methodology
3 MOOCs on Coursera (University of Michigan)
- “Python Data Structures”
- “Using Python to Access Web Data”
- “Using Databases with Python”
2 books on Python:
- “Think Python” - Allen B. Downey
- “Python for Everybody” - Charles Severance
11. How to GET data
Step 1: Acquire all reference data on Countries, Cities and
Airports...
Problem: 1,261 airports in total, but the number of records returned
per request is limited (maximum is 100!)
→ get all records in several loops by altering the value of offset
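The offset loop can be sketched as follows. This is a minimal sketch, not the project's code: `fetch_page` is a hypothetical stand-in for the real HTTP request, and the list of 1,261 placeholder codes only simulates the API so that the example runs offline.

```python
def fetch_all(fetch_page, limit=100):
    """Collect every record by requesting pages of `limit` records at increasing offsets."""
    records = []
    offset = 0
    while True:
        page = fetch_page(offset=offset, limit=limit)
        records.extend(page)
        if len(page) < limit:
            # A short (or empty) page means there is nothing left to fetch.
            break
        offset += limit
    return records


# Simulated data source (assumption): 1,261 placeholder airport codes.
FAKE_AIRPORTS = [f"AP{i:04d}" for i in range(1261)]
calls = []


def fake_fetch_page(offset, limit):
    """Hypothetical replacement for one API request; records the offsets that were asked for."""
    calls.append(offset)
    return FAKE_AIRPORTS[offset:offset + limit]


airports = fetch_all(fake_fetch_page)
```

With 1,261 records and a maximum of 100 per request, the loop issues 13 requests, at offsets 0, 100, ..., 1200.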
12. How to GET data (2)
Step 2: Information on all flights between European airports
over a week (2017/01/20-2017/01/26)
Obtain a list of European airports by SQL
→ 2 loops to create all possible pairs
220 x 220 = 48400 pairs = 3 h of execution per day!
Each request must always include the origin, destination and date
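The two loops over airports, repeated for each date, can be sketched with `itertools.product`. The dictionary keys and the placeholder airport codes are assumptions for illustration; the actual request format of the API is not reproduced here.

```python
from datetime import date, timedelta
from itertools import product


def request_parameters(airports, first_day, n_days):
    """Yield one parameter set per (date, origin, destination) combination."""
    for day_index in range(n_days):
        day = first_day + timedelta(days=day_index)
        # product(..., repeat=2) plays the role of the two nested loops over airports.
        for origin, destination in product(airports, repeat=2):
            yield {
                "origin": origin,
                "destination": destination,
                "date": day.isoformat(),
            }


# Placeholder codes (assumption) standing in for the 220 European airports.
airports = [f"X{i:03d}" for i in range(220)]
one_day = list(request_parameters(airports, date(2017, 1, 20), n_days=1))
pairs_per_day = len(one_day)  # 220 * 220
seconds_per_request = 3 * 3600 / pairs_per_day  # average time per request for 3 h
```

Three hours for 48,400 requests works out to roughly 0.22 s per request on average. Note that the full product also contains the 220 pairs where origin and destination are the same airport; skipping them would leave 48,180 requests per day.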
20. Data analysis in SQL
- 5 airports as origin with the greatest number of direct
connections
- Data over a week
- Net flights per day
- Frequency by week
- 5 hubs based on Lufthansa DB
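A query of the first kind might look like the sketch below, run through Python's `sqlite3` module. The table name `flights`, its columns and the sample rows are all hypothetical; the schema of the actual Lufthansa DB is not shown in these slides.

```python
import sqlite3

# In-memory database with a hypothetical schema (assumption, not the project's schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (origin TEXT, destination TEXT, flight_date TEXT)"
)

# Invented sample rows, for illustration only.
sample_rows = [
    ("FRA", "MUC", "2017-01-20"),
    ("FRA", "MUC", "2017-01-21"),  # same route on another day
    ("FRA", "VIE", "2017-01-20"),
    ("FRA", "ZRH", "2017-01-20"),
    ("MUC", "FRA", "2017-01-20"),
    ("MUC", "VIE", "2017-01-20"),
    ("VIE", "FRA", "2017-01-20"),
]
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", sample_rows)

# Count distinct destinations per origin, so a route flown on several days counts once.
TOP_ORIGINS = """
SELECT origin, COUNT(DISTINCT destination) AS n_destinations
FROM flights
GROUP BY origin
ORDER BY n_destinations DESC, origin
LIMIT 5
"""
top_origins = conn.execute(TOP_ORIGINS).fetchall()
```

Swapping `origin` and `destination` in the query gives the corresponding ranking of airports as destination.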
21. Data analysis in SQL (2)
- 5 airports as destination with the greatest number of direct
connections
22. Data analysis in SQL (3)
- Airports in France as origin in Lufthansa DB
23. Data analysis in SQL (4)
- 5 airports as origin with the fewest direct connections
24. Data analysis in SQL (5)
- Connections of 5 hubs as origin in Lufthansa DB
Airport           Connection = 0   Connection = 1   Connection > 1
Frankfurt (FRA)   91               105              24
Munich (MUC)      92               103              25
Vienna (VIE)      58               132              30
Zurich (ZRH)      51               132              37
Brussels (BRU)    48               117              55
25. Data analysis in SQL (6)
Airports with Connection = 1 from Frankfurt, sorted by country
26. Visualisation: Force-directed graph in D3.js
● Physical model: forces of attraction and repulsion
● Algorithm defined in D3.js (JavaScript), a popular
library for data visualisation
Figure: drawings obtained with force-directed algorithms
Source: https://cs.brown.edu/~rt/gdhandbook/chapters/force-directed.pdf
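The physical model can be illustrated with a small pure-Python sketch: every pair of nodes repels, and every link acts as a spring that pulls its two ends together. This is not the algorithm implemented in D3.js; the function name, the force laws and all parameter values are assumptions chosen to keep the example short and stable.

```python
import math


def force_layout(nodes, links, iterations=300, spring=1.0, rest_length=1.0,
                 repulsion=0.1, step=0.05, max_move=0.1):
    """Return {node: [x, y]} after a simple spring-and-repulsion relaxation."""
    # Deterministic start (assumption): nodes evenly spaced on the unit circle.
    pos = {}
    for i, node in enumerate(nodes):
        angle = 2 * math.pi * i / len(nodes)
        pos[node] = [math.cos(angle), math.sin(angle)]

    for _ in range(iterations):
        force = {node: [0.0, 0.0] for node in nodes}

        # Repulsion: every pair of nodes pushes apart (inverse-square law).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = repulsion / dist ** 2
                force[a][0] += f * dx / dist
                force[a][1] += f * dy / dist
                force[b][0] -= f * dx / dist
                force[b][1] -= f * dy / dist

        # Attraction: each link is a spring pulling towards rest_length.
        for a, b in links:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            f = spring * (dist - rest_length)
            force[a][0] += f * dx / dist
            force[a][1] += f * dy / dist
            force[b][0] -= f * dx / dist
            force[b][1] -= f * dy / dist

        # Move each node along its net force, capping the displacement per step.
        for node in nodes:
            mx = step * force[node][0]
            my = step * force[node][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            pos[node][0] += mx
            pos[node][1] += my
    return pos


def distance(pos, a, b):
    """Euclidean distance between two laid-out nodes."""
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


# Hypothetical chain of airports A-B-C-D.
pos = force_layout(["A", "B", "C", "D"], [("A", "B"), ("B", "C"), ("C", "D")])
```

In this chain, the linked airports A and B end up closer together than the unlinked end points A and D, which is the visual effect a force-directed graph relies on.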
30. Limitations and perspectives
- Limitations
- Quality of data
- Exclusivity of data
- Perspectives
- A map that shows the frequency of service between airports
- Country profile: domestic VS international flights
- Airlines: legacy VS budget