Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share

Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set

  • 4,975 views
Uploaded on

Speaker: Marc Wick

Speaker: Marc Wick

More in: Technology , Travel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
4,975
On Slideshare
4,970
From Embeds
5
Number of Embeds
1

Actions

Shares
Downloads
211
Comments
0
Likes
3

Embeds 5

http://www.slideshare.net 5

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. GeoNames “Under the Hood: How GeoNames Aggregates many Sources into One Data Set“ GeoNames is ... aggregator of free geo data I am ... Marc Wick self employed software engineer, Switzerland
  • 2. GeoNames Feature Density Map GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 2
  • 3. GeoNames - Gazetteer Pragmatic, useful, ease of use Over 6.5 million features Cc-by licence 9 feature classes GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 3
  • 4. Screen shot Berlin GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 4
  • 5. Origins and Goal Proprietary application Team up together contribute modifications to central data base. applications switch to GeoNames from proprietary aggregation GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 5
  • 6. Challenge A lot of data IS available Many providers Languages Scripts GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 6
  • 7. GeoNames Ambassadors GeoNames contact Speak local language Know local situation GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 7
  • 8. Data Sources National Mapping Agencies Statistical Offices Postal codes National Geospatial-Intelligence Agency (NGA)‫‏‬ Applications using GeoNames − Data files − Manual modifications GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 8
  • 9. US vs Europe US data is freely available European data is not available Rest of the World? Consequences GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 9
  • 10. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 10
  • 11. Future of geodata availability We believe basic geodata will be free in most countries Why : − Economy − Traffic Policy and Road Safety (road signs)‫‏‬ GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 11
  • 12. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 12
  • 13. Free Availability is only a First Step GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 13
  • 14. Who aggregates data GeoNames Super national mapping agencies Super national organisations INSPIRE GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 14
  • 15. Problems and Solutions I Shape / GML FWTools/ GDAL/OGR Datum reprojection Postgis/epsg/native tools/custom impl GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 15
  • 16. Problems and Solutions II FeatureCodes not 1:1 Pattern matching non-ASCII Transliteration Country codes Admin1 codes GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 16
  • 17. Place name matching Geocoding Distance feature type and feature code Reverse geocoding, compare name similarity − levenshtein distance − letter pair similarity GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 17
  • 18. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 18
  • 19. Wikipedia GeoTemplates Proliferation of GeoFormats No consensus, Anarchy Examples − <geo>48 46 36 N 121 48 51 W</geo> − {{coor d|48.7767|N|121.8142|W|}} − Berlin : |lat_deg = 52|lat_min = 31 − ... (Any template you could possibly think of is used somewhere)‫‏‬ GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 19
  • 20. Alternate Names ... Italian : Berlino English : Berlin Arabic : ‫نيلرب‬ Korean : Thai : เบอรลิน Russian : Берлин Chinese : Marathi : बर् लि न ... (ca 100 names)‫‏‬ GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 20
  • 21. Postal codes Geocode – postal code numeric distance Accuracy, completeness ScribbleMaps by Robert Kosara GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 21
  • 22. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 22
  • 23. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 23
  • 24. Data Dump Flat csv files Simple format Ease of use Full daily dump daily modifications rdf GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 24
  • 25. Web Services Search − Ranking Tf idf Relevancy − I18n GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 25
  • 26. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 26
  • 27. Hierarchy Web Services Hierarchy Child Neighbour Sibling GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 27
  • 28. Apache mod rewrite ROME (RSS)‫‏‬ jdom.org (xml)‫ ‏‬JSON Tomcat (Java)‫‏‬ JMS activeMQ Lucene SRTM3 Gtopo30 JDBC Full Text Index TF-IDF Database : Postgres (postgis)‫‏‬ GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 28
  • 29. Libraries Java Drupal Ruby Php Perl Python Lisp GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 29
  • 30. Synchronization Dail dump Daily modification Jms Rdf dump, periodically GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 30
  • 31. Linked Data GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 31
  • 32. Applications using GeoNames thousands of applications search Site navigation geo-coding GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 32
  • 33. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 33
  • 34. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 34
  • 35. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 35
  • 36. Thank you for your attention. GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 36