
Trending Places on OpenStreetMap

Presentation at State of the Map, Brussels, 24.9.2016, about a data engineering project with a Twitter bot. Its goal is to find significant viewing activity worldwide on the main web map ("slippy map") of OpenStreetMap (OSM.org). See http://2016.stateofthemap.org/2016/trending-places-on-openstreetmap/ and Twitter @trending_places.

Published in: Data & Analytics

  1. Trending Places on OpenStreetMap
     State of the Map, Brussels, 24.9.2016
     Stefan Keller, Geometa Lab at HSR University of Applied Sciences Rapperswil
  2. Trending Places on OpenStreetMap
     • A big data project with a Twitter bot
     • @trending_places (and GitHub)
     • Goal: find significant viewing activity worldwide on the main web map (“slippy map”) of OpenStreetMap (OSM)
     • This activity may be indicative of popular news or events in that region
  3. Log data
     • A web map consists of map tiles at different zoom levels
     • The views of these tiles are logged daily and published in anonymized form with a delay of 2 days
     • http://planet.openstreetmap.org/tile_logs/
  4. Log count
       for each line in all the logs
       {
         z, x, y = extract coordinate from line
         ip = extract source IP address from line
         counter[z, x, y] += 1
         source_addresses[z, x, y].append(ip)
       }
       for each (z, x, y) key in counter
       {
         if counter[z, x, y] >= 10
         {
           if count_unique(source_addresses[z, x, y]) >= 3
           {
             print z, x, y, counter[z, x, y]
           }
         }
       }
     File format (as TSV): date, z, x, y where z = zoom, x/y = TMS index
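The counting pseudocode on this slide translates fairly directly to Python. The following is a minimal sketch, not the project's actual code: it assumes each raw log line has already been parsed into a `(z, x, y, ip)` tuple (the raw log layout is not given in the slides, so parsing is left out), and the names `count_tile_views`, `MIN_VIEWS` and `MIN_UNIQUE_IPS` are made up for illustration. The thresholds 10 and 3 are the ones shown on the slide.

```python
from collections import Counter, defaultdict

MIN_VIEWS = 10       # minimum number of views per tile (from the slide)
MIN_UNIQUE_IPS = 3   # minimum number of distinct source addresses (from the slide)


def count_tile_views(records):
    """Count views per tile and keep only tiles that pass both thresholds.

    records: iterable of (z, x, y, ip) tuples (hypothetical pre-parsed input).
    Returns a dict mapping (z, x, y) to the number of views.
    """
    counter = Counter()
    sources = defaultdict(set)  # a set stores each address only once
    for z, x, y, ip in records:
        counter[(z, x, y)] += 1
        sources[(z, x, y)].add(ip)
    return {
        tile: views
        for tile, views in counter.items()
        if views >= MIN_VIEWS and len(sources[tile]) >= MIN_UNIQUE_IPS
    }
```

Using a set per tile instead of a list avoids a separate `count_unique` pass; the filtering by unique addresses is what keeps individual users from being identifiable in the published counts.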
  5. How?
     • For the previous 7 days, the tile view logs are aggregated up to zoom level 14
     • A T-score is calculated to standardize the data
     • Values above a certain threshold are selected in order to catch spikes
     • These spikes are ranked relative to the mean increase in views overall (this compensates for the growth of OSM)
     • Clustering eliminates locations that are near one another
     • Tile coordinates are reverse geocoded using Nominatim in order to get geographic names
     • A Twitter bot, @trending_places, announces the top 10 each day after 10 a.m. (or an error message if something went wrong)
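The standardize-and-threshold step can be illustrated with a plain standard score. This is a sketch under assumptions: the slides do not give the exact T-score formula or the threshold, so the code below simply compares the latest daily count of a tile against the mean and sample standard deviation of the preceding days. The function names and the default threshold of 3.0 are hypothetical.

```python
from statistics import mean, stdev


def standard_score(history, today):
    """Standardize today's count against the counts of the preceding days."""
    if len(history) < 2:
        return 0.0  # not enough data to estimate a spread
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        return 0.0 if today == mu else float("inf")
    return (today - mu) / sigma


def find_spikes(series_by_tile, threshold=3.0):
    """Return {tile: score} for tiles whose latest count exceeds the threshold.

    series_by_tile: {tile: [count_day1, ..., count_today]} (assumed layout).
    threshold: hypothetical cut-off, not taken from the slides.
    """
    spikes = {}
    for tile, counts in series_by_tile.items():
        *history, today = counts
        score = standard_score(history, today)
        if score > threshold:
            spikes[tile] = score
    return spikes
```

A tile that was flat for a week and then jumps tenfold gets a very large score, while ordinary day-to-day fluctuation stays well below the cut-off.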
  6. Challenges: Crawling activity on OSM
  7. Visualizing OSM.org's Map Views
     • Lukas Martinelli (@lukmartinelli), May 2016: http://lukasmartinelli.ch/python/2015/05/24/parsing-and-visualizing-osm-access-logs.html
     • Martin Raifer (@tyr_asd), September 2016: http://www.openstreetmap.org/user/tyr_asd/diary/39434
  8. Challenges: Clustering around a trending place
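One simple way to eliminate locations that are near one another is greedy suppression: visit spikes from strongest to weakest and drop any tile that lies close to a tile already kept. The slides do not say which clustering method the project uses, so this is only an illustrative sketch; the function name and the `radius` parameter (measured in tiles at a single zoom level) are assumptions.

```python
def suppress_neighbours(spikes, radius=2):
    """Keep only the strongest spike in each neighbourhood.

    spikes: {(x, y): score} for tiles at one zoom level (assumed layout).
    radius: hypothetical neighbourhood size in tiles (Chebyshev distance).
    """
    kept = {}
    # Strongest first, so a weaker neighbour never displaces a stronger tile.
    for (x, y), score in sorted(spikes.items(), key=lambda kv: kv[1], reverse=True):
        is_near_kept = any(
            max(abs(x - kx), abs(y - ky)) <= radius for kx, ky in kept
        )
        if not is_near_kept:
            kept[(x, y)] = score
    return kept
```

This reports one place per event even when a news story drives views to a whole block of adjacent tiles.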
  9. Challenges: Ranking trending places
     Normalization
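Ranking relative to the overall increase in views could look like the sketch below. The slides only state that spikes are ranked relative to the mean increase in views overall; the concrete ratio used here (per-tile increase divided by the increase of the total view count) is an assumption, as are the function name and the data layout.

```python
from statistics import mean


def rank_spikes(series_by_tile, candidates, top_n=10):
    """Rank candidate tiles by their increase relative to overall growth.

    series_by_tile: {tile: [count_day1, ..., count_today]}, all the same length
                    and with a non-zero overall total (assumed layout).
    candidates: tiles already flagged as spikes.
    """
    days = len(next(iter(series_by_tile.values())))
    totals = [sum(s[d] for s in series_by_tile.values()) for d in range(days)]
    # Overall growth: today's total relative to the mean of the earlier days.
    overall_growth = totals[-1] / mean(totals[:-1])

    ranked = []
    for tile in candidates:
        *history, today = series_by_tile[tile]
        baseline = mean(history) or 1  # avoid division by zero for new tiles
        ranked.append((today / baseline / overall_growth, tile))
    ranked.sort(reverse=True)
    return [tile for _, tile in ranked[:top_n]]
```

Dividing by the overall growth means a tile only ranks highly if it grew faster than the map as a whole, which is the compensation for the growth of OSM mentioned on the "How?" slide.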
  10. Challenges: Reverse Geocoding
      • Given coordinates (from a tile boundary)
      • Return the most relevant geographic name inside / nearby
      • Using place geographic names => Nominatim
      • (no POIs yet)
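To reverse geocode a tile, its index first has to be turned into a latitude and longitude. The sketch below uses the standard slippy-map tile formula and builds a request URL for Nominatim's `reverse` endpoint without sending it. Assumptions: y is counted from the north as in the usual OSM tile scheme (true TMS indices would need y flipped first), the tile centre is used rather than the tile boundary mentioned on the slide, and the function names and the default `zoom=10` detail level are illustrative choices.

```python
import math
from urllib.parse import urlencode


def tile_center(z, x, y):
    """Return (lat, lon) in degrees for the centre of slippy-map tile z/x/y."""
    n = 2 ** z
    lon = (x + 0.5) / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (y + 0.5) / n))))
    return lat, lon


def nominatim_reverse_url(lat, lon, zoom=10):
    """Build (but do not send) a Nominatim reverse-geocoding request URL."""
    query = urlencode({
        "format": "jsonv2",
        "lat": f"{lat:.5f}",
        "lon": f"{lon:.5f}",
        "zoom": zoom,  # level of detail of the returned place (assumed value)
    })
    return "https://nominatim.openstreetmap.org/reverse?" + query
```

For example, `nominatim_reverse_url(*tile_center(14, 8594, 5750))` yields a URL that can be fetched with any HTTP client, subject to Nominatim's usage policy.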
  11. Example of strong correlation: Fort McMurray (CA)
      1-3 May 2016: Wildfire across approximately 5900 square km (1/6 of Belgium, 2x Luxembourg), destroying ~2,400 homes
  12. Example of strong correlation #2: Flüelen (CH)
      1 June 2016: Switzerland celebrated the opening of the world's longest railway tunnel (“The Gotthard Base Tunnel”) through the Alps…
  13. Example of strong correlation #3: San Severino Marche (IT)
      24 August 2016: An earthquake of magnitude 6.2 (moment magnitude scale) hit Central Italy. Its epicentre was southeast of Perugia and north of L'Aquila, in an area near the borders of the Umbria, Lazio, Abruzzo and Marche regions. As of 16 September 2016, 297 people had been killed.
  14. More statistics…
      • Processing time: 5h (using SQLite / Python)
      • Reporting period: 2016-04-11 to 2016-09-18
      • No. of reports: 125 (out of 160 days)
      • Top 10 countries overall: RU 293, US 131, DE 70, UA 67, FR 46, PL 44, NO 43, ES 35, RO 33, GB 31
      • Top 10 place names overall: Saratovsky District (RU) 16, 57.04.53.26 (RU) 13, Stara Emetivka (UA) 13, Tatarstan (RU) 13, Jambyl Province (KZ) 11, Johor Bahru (MY) 11, Odessa (UA) 11, Shimen (TW) 11, N.N. 11, Black Point (US) 10
  15. Open questions
      • Why so many Russian places (and places from post-Soviet states)?
      • Influence of crawling?
      • Bias of places with spikes after zero activity vs. crowded places?
      • Other biases?
      • Better than T-score? E.g. with a Poisson distribution (multivariate ARIMA?)
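As an illustration of the Poisson idea raised in the last question, one could treat daily views of a tile as Poisson-distributed around its recent mean and ask how unlikely today's count is. This is a sketch of that alternative, not something the project is stated to implement; the function name is made up, and using the historical mean as the rate is an assumption.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam).

    k: today's view count; lam: expected count, e.g. the mean of previous days.
    Terms are computed in log space to avoid overflow for large counts.
    """
    if lam <= 0:
        return 0.0 if k > 0 else 1.0
    cdf = sum(
        math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        for i in range(k)
    )
    return max(0.0, 1.0 - cdf)
```

A very small tail probability marks a spike. Unlike a standard score, this does not need a variance estimate, which may help with the bias noted above for tiles that spike after near-zero activity.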
  16. Final open questions…
      • Do you know…
        – Sea Cliff (US),
        – Sitionuevo (CO),
        – Athens (GR),
        – Sacele (RO), or
        – Pretoria (ZA) …?
      • Wonder why!
  17. https://youtu.be/olmL1fUnQAQ
  18. Thanks
      Also to Bhavya Chandra (main author, NTU Singapore), Matt Amos, Lukas Martinelli (@lukasmartinelli), Pavel Tyslacki (@tbicr), Joost Schouppe (joostjakob)
      Stefan Keller, Geometa Lab at HSR University of Applied Sciences Rapperswil (Switzerland)
      www.hsr.ch/geometalab, @sfkeller
