Spread and PageRank
for Interactive Maps
Wm Leler - FlightStats, Inc.
wm@flightstats.com
openstreetmap.org, zoom level 6
Missing: San Francisco, San Jose, Los Angeles, Las Vegas,
Phoenix, Seattle, Vancouver, Detroit, Dallas, NYC, Miami
Stamen Terrain
Beaverton? Hillsboro? Forest Grove? Tigard?
Google Maps
Seattle? Denver? Salt Lake City? Las Vegas?
MapQuest (zoom 5 vs. 6)
Where did they go?
The Big Problem
• A map is a spatial display of a bunch of
objects: cities, highways, parks, airports, etc.
• At most zoom levels, there are far too
many objects to display.
• What is the best way to pick which objects
to display (per zoom level) on a map?
Our Immediate Problem
• At FlightStats we need to decide which
airports to draw on a map
Figure: every airport that supplies us with flight data (4180)
FAA Categories
Based on % of passenger enplanements (p-e)
1. Primary large hub (> 1% of p-e)
2. Primary medium hub (0.25% - 1%)
3. Primary small hub (0.05% - 0.25%)
4. Primary non-hub (< 0.05%)
5. Secondary (< 10,000 p-e / year)
Bad Solution
• Not so good for maps
• Airports bunch together and leave big
empty spaces
Figure: Delay Map, with areas of bunching and empty spaces labeled
Even Worse Internationally
• South America has only 3 large
primary airports
• Two serve the same city (Rio)
• Africa has one (JNB)
More Problems
• The set of airports shown should depend on zoom level
  • as you zoom in, you want to see more
• Number of passengers is a bad measure
  • short commuter flights to little airports
    should count less
  • flights to other major airports should
    count more
Goals
• Show “important” airports
  • major airports
  • plus smaller airports if they are the
    primary airport for an area
• Avoid airports “bunching up”
Solution
Importance of an airport is based on:
1. How connected it is to other airports
weighted by the importance of each
other airport (recursive)
2. Area for which it is the primary airport
3. The current map view
Connectivity
• Calculate connectivity using the PageRank
algorithm (by Larry Page at Google,
adapted by Steve Wilson at FlightStats)
Example airports: SEA, PDX, EUG, PDT, RDM
PR(PDX) = sum over each airport x of
    flights(PDX, x) * PR(x)
PageRank values shown on the example diagram:
0.0025, 0.0036, 0.00014, 0.0002, 0.0003
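A minimal sketch of flight-weighted PageRank by power iteration may help make the recursion concrete. The slides do not describe the FlightStats adaptation in detail, so the damping factor, the normalization by total departures, and the handling of airports with no departures are taken from standard PageRank and are assumptions here. The flight counts are invented for illustration.

```python
def flight_pagerank(flights, damping=0.85, iterations=100):
    """flights[(a, b)] = number of flights from airport a to airport b."""
    airports = sorted({code for pair in flights for code in pair})
    n = len(airports)

    # Total departures per airport, used to normalize outgoing weights.
    departures = {a: 0 for a in airports}
    for (origin, _dest), count in flights.items():
        departures[origin] += count

    rank = {a: 1.0 / n for a in airports}
    for _ in range(iterations):
        new_rank = {a: (1.0 - damping) / n for a in airports}
        # Each airport passes its rank along its routes, in proportion
        # to the share of its flights on each route.
        for (origin, dest), count in flights.items():
            new_rank[dest] += damping * rank[origin] * count / departures[origin]
        # Airports with no departures share their rank evenly (assumption).
        dangling = sum(rank[a] for a in airports if departures[a] == 0)
        for a in airports:
            new_rank[a] += damping * dangling / n
        rank = new_rank
    return rank


# Hypothetical daily flight counts, flown in both directions.
round_trips = {
    ("SEA", "PDX"): 30, ("SEA", "EUG"): 10, ("SEA", "RDM"): 10,
    ("SEA", "PDT"): 4, ("PDX", "EUG"): 8, ("PDX", "RDM"): 6,
    ("PDX", "PDT"): 3,
}
example_flights = {}
for (a, b), count in round_trips.items():
    example_flights[(a, b)] = count
    example_flights[(b, a)] = count

ranks = flight_pagerank(example_flights)
for code in sorted(ranks, key=ranks.get, reverse=True):
    print(code, round(ranks[code], 4))
```

The ranks sum to 1, so values computed over thousands of airports come out small, on the order of those shown on the slide.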
Spread
• (Page) Rank still suffers from bunching and
empty spaces
• Add Spread: the distance to the closest
airport of higher rank
  • a reasonable proxy for the area an airport serves
• The Spread for PDX is 111.98 nautical
miles (the distance to SEA)
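One way Spread could be computed is a brute-force scan over all airports using great-circle distance. This is a sketch under assumptions: the haversine formula, the approximate coordinates, and the illustrative rank values are not from the slides, and giving the top-ranked airport an infinite Spread is a choice made here because no higher-ranked airport exists for it.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def compute_spread(airports):
    """airports: {code: (rank, lat, lon)} -> {code: spread in nautical miles}."""
    spread = {}
    for code, (rank, lat, lon) in airports.items():
        # Distances to every airport that outranks this one.
        higher = [
            distance_nm(lat, lon, other_lat, other_lon)
            for other_rank, other_lat, other_lon in airports.values()
            if other_rank > rank
        ]
        # No higher-ranked airport: treat Spread as unbounded (assumption).
        spread[code] = min(higher) if higher else math.inf
    return spread


# Approximate coordinates and illustrative ranks (assumptions).
example_airports = {
    "SEA": (0.0036, 47.449, -122.309),
    "PDX": (0.0025, 45.589, -122.597),
    "EUG": (0.0003, 44.125, -123.212),
}
spread = compute_spread(example_airports)
print({code: round(value, 1) for code, value in spread.items()})
```

With these approximate coordinates the Spread for PDX comes out close to the 111.98 nautical miles quoted above. The scan is quadratic in the number of airports, which is acceptable for a few thousand airports that are processed ahead of time.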
Combining
Rank and Spread
      Rank     Spread (nm)   Rank x Spread
SFO   0.0040   293.5         1.19
OAK   0.0015   10.26         0.0158
SJC   0.0012   24.8          0.0287
SMF   0.0011   65.6          0.0702
The four biggest airports in Northern CA
Algorithm
1. Pre-calculate Rank and Spread for all
airports (they don’t change very often)
2. For each map view, display the N airports
with the largest product of Rank and
Spread
3. (optionally) set minimum Rank and Spread
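The three steps above can be sketched as a single selection function. The `Airport` class, the `airports_to_display` name, and the bounding-box representation of a map view are hypothetical. Rank and Spread are the Northern California values from the slides, while the coordinates are approximate. The sketch ignores views that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass
class Airport:
    code: str
    rank: float
    spread: float  # nautical miles
    lat: float
    lon: float


def airports_to_display(airports, view, n, min_rank=0.0, min_spread=0.0):
    """Return the n in-view airports with the largest rank * spread.

    view is (south, west, north, east) in degrees.
    """
    south, west, north, east = view
    candidates = [
        a for a in airports
        if south <= a.lat <= north and west <= a.lon <= east
        and a.rank >= min_rank and a.spread >= min_spread
    ]
    candidates.sort(key=lambda a: a.rank * a.spread, reverse=True)
    return candidates[:n]


northern_ca = [
    Airport("SFO", 0.0040, 293.5, 37.62, -122.38),
    Airport("OAK", 0.0015, 10.26, 37.72, -122.22),
    Airport("SJC", 0.0012, 24.8, 37.36, -121.93),
    Airport("SMF", 0.0011, 65.6, 38.70, -121.59),
]
view = (36.5, -123.5, 39.5, -120.5)

top_two = [a.code for a in airports_to_display(northern_ca, view, n=2)]
debunched = [a.code for a in
             airports_to_display(northern_ca, view, n=4, min_spread=20)]
print(top_two)    # ['SFO', 'SMF']
print(debunched)  # ['SFO', 'SMF', 'SJC']
```

SMF outranks the busier OAK and SJC in this ordering because it is the leading airport for a wider area.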
Flexibility
• Applications can pick their weighting of
Rank and Spread, and value of N
• N can depend on zoom level
• Rank and Spread can also be used as limits
  • minimum Spread to debunch airports
  • minimum Rank to hide small airports
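The slides do not say how the weighting is applied or how N varies with zoom, so the following is only one possible scheme. Both `weighted_score` (exponents on each factor of the product) and `n_for_zoom` (N doubling per zoom level up to a cap) are hypothetical, as are their default parameters.

```python
def weighted_score(rank, spread, rank_weight=1.0, spread_weight=1.0):
    """Hypothetical weighting: an exponent on each factor of the product.

    With both weights at 1.0 this is the plain Rank x Spread product.
    """
    return (rank ** rank_weight) * (spread ** spread_weight)


def n_for_zoom(zoom, base=10, growth=2, max_airports=500):
    """Hypothetical rule: N grows geometrically with zoom level, up to a cap."""
    return min(max_airports, base * growth ** zoom)


# SJC and SMF from the Northern California table.
sjc = (0.0012, 24.8)
smf = (0.0011, 65.6)

# Default weights: SMF wins because of its larger Spread.
print(weighted_score(*smf) > weighted_score(*sjc))
# Ignoring Spread entirely: SJC wins on Rank alone.
print(weighted_score(*sjc, spread_weight=0.0)
      > weighted_score(*smf, spread_weight=0.0))
print([n_for_zoom(z) for z in (0, 3, 10)])
```

Lowering `spread_weight` favors busy airports even when they cluster, and raising it favors even geographic coverage.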
Beyond Airports
• Can use Spread to space out almost
anything on a map
  • cities, neighborhoods, roads, parks,
    mountains, rivers, lakes, ...
  • your data!
• Rank emphasizes connectivity over raw
size or category
Connectivity (figure labels: gaps, noise)
DEMO
http://demo.flightstats-ops.com/spread
http://www.slideshare.net/wmleler/spread-20277955
