Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set

  • 3,341 views
Uploaded on

Speaker: Marc Wick

Speaker: Marc Wick

More in: Technology , Travel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
3,341
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
211
Comments
0
Likes
3

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. GeoNames “Under the Hood: How GeoNames Aggregates many Sources into One Data Set“ GeoNames is ... aggregator of free geo data I am ... Marc Wick self employed software engineer, Switzerland
  • 2. GeoNames Feature Density Map GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 2
  • 3. GeoNames - Gazetteer Pragmatic, useful, ease of use Over 6.5 million features Cc-by licence 9 feature classes GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 3
  • 4. Screen shot Berlin GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 4
  • 5. Origins and Goal Proprietary application Team up together contribute modifications to central data base. applications switch to GeoNames from proprietary aggregation GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 5
  • 6. Challenge A lot of data IS available Many providers Languages Scripts GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 6
  • 7. GeoNames Ambassadors GeoNames contact Speak local language Know local situation GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 7
  • 8. Data Sources National Mapping Agencies Statistical Offices Postal codes National Geospatial-Intelligence Agency (NGA)‫‏‬ Applications using GeoNames − Data files − Manual modifications GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 8
  • 9. US vs Europe US data is freely available European data is not available Rest of the World? Consequences GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 9
  • 10. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 10
  • 11. Future of geodata availability We believe basic geodata will be free in most countries Why : − Economy − Traffic Policy and Road Safety (road signs)‫‏‬ GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 11
  • 12. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 12
  • 13. Free Availability is only a First Step GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 13
  • 14. Who aggregates data GeoNames Super national mapping agencies Super national organisations INSPIRE GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 14
  • 15. Problems and Solutions I Shape / GML FWTools/ GDAL/OGR Datum reprojection Postgis/epsg/native tools/custom impl GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 15
  • 16. Problems and Solutions II FeatureCodes not 1:1 Pattern matching non-ASCII Transliteration Country codes Admin1 codes GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 16
  • 17. Place name matching Geocoding Distance feature type and feature code Reverse geocoding, compare name similarity − levenshtein distance − letter pair similarity GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 17
  • 18. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 18
  • 19. Wikipedia GeoTemplates Proliferation of GeoFormats No consensus, Anarchy Examples − <geo>48 46 36 N 121 48 51 W</geo> − {{coor d|48.7767|N|121.8142|W|}} − Berlin : |lat_deg = 52|lat_min = 31 − ... (Any template you could possibly think of is used somewhere)‫‏‬ GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 19
  • 20. Alternate Names ... Italian : Berlino English : Berlin Arabic : ‫نيلرب‬ Korean : Thai : เบอรลิน Russian : Берлин Chinese : Marathi : बर् लि न ... (ca 100 names)‫‏‬ GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 20
  • 21. Postal codes Geocode – postal code numeric distance Accuracy, completeness ScribbleMaps by Robert Kosara GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 21
  • 22. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 22
  • 23. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 23
  • 24. Data Dump Flat csv files Simple format Ease of use Full daily dump daily modifications rdf GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 24
  • 25. Web Services Search − Ranking Tf idf Relevancy − I18n GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 25
  • 26. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 26
  • 27. Hierarchy Web Services Hierarchy Child Neighbour Sibling GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 27
  • 28. Apache mod rewrite ROME (RSS)‫‏‬ jdom.org (xml)‫ ‏‬JSON Tomcat (Java)‫‏‬ JMS activeMQ Lucene SRTM3 Gtopo30 JDBC Full Text Index TF-IDF Database : Postgres (postgis)‫‏‬ GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 28
  • 29. Libraries Java Drupal Ruby Php Perl Python Lisp GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 29
  • 30. Synchronization Dail dump Daily modification Jms Rdf dump, periodically GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 30
  • 31. Linked Data GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 31
  • 32. Applications using GeoNames thousands of applications search Site navigation geo-coding GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 32
  • 33. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 33
  • 34. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 34
  • 35. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 35
  • 36. Thank you for your attention. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 36