Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set
Speaker: Marc Wick

Presentation Transcript

  • GeoNames: “Under the Hood: How GeoNames Aggregates Many Sources into One Data Set”. GeoNames is ... an aggregator of free geo data. I am ... Marc Wick, self-employed software engineer, Switzerland.
  • GeoNames Feature Density Map GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 2
  • GeoNames - Gazetteer: pragmatic, useful, easy to use; over 6.5 million features; CC-BY licence; 9 feature classes.
  • Screenshot: Berlin
  • Origins and Goal: proprietary application; team up and contribute modifications to a central database; applications switch from proprietary aggregation to GeoNames.
  • Challenge: a lot of data IS available; many providers; languages; scripts.
  • GeoNames Ambassadors: GeoNames contact; speak the local language; know the local situation.
  • Data Sources: national mapping agencies; statistical offices; postal codes; National Geospatial-Intelligence Agency (NGA); applications using GeoNames (data files, manual modifications).
  • US vs Europe: US data is freely available; European data is not available; rest of the world?; consequences.
  • Future of geodata availability: we believe basic geodata will be free in most countries. Why: economy; traffic policy and road safety (road signs).
  • Free Availability is only a First Step
  • Who aggregates data: GeoNames; supranational mapping agencies; supranational organisations; INSPIRE.
  • Problems and Solutions I: Shape / GML files (FWTools / GDAL/OGR); datum reprojection (PostGIS / EPSG / native tools / custom implementation).
  • Problems and Solutions II: feature codes not 1:1 (pattern matching); non-ASCII names (transliteration); country codes; admin1 codes.
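
The transliteration item above can be illustrated with a minimal sketch. It assumes that folding accented Latin letters to plain ASCII is enough; the function name `ascii_fold` is hypothetical, and the talk does not say how GeoNames actually transliterates names.

```python
import unicodedata


def ascii_fold(name: str) -> str:
    """Return an ASCII approximation of a Latin-script place name."""
    # NFKD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (e.g. Cyrillic, Arabic) is simply dropped;
    # real transliteration of those scripts needs per-script mapping tables.
    return stripped.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for place in ("Zürich", "São Paulo", "Genève"):
        print(place, "->", ascii_fold(place))
```

This only covers letters that Unicode can decompose; names in non-Latin scripts come back empty and would need a different approach.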
  • Place name matching: geocoding (distance, feature type and feature code); reverse geocoding, compare name similarity (Levenshtein distance, letter-pair similarity).
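
The two name-similarity measures named on this slide can be sketched as follows. This is an illustration under assumptions, not the GeoNames implementation: letter-pair similarity is taken here to mean the Dice coefficient over adjacent character pairs, and both function names are made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the working row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def letter_pairs(text: str) -> list:
    """Adjacent letter pairs of each whitespace-separated word, upper-cased."""
    pairs = []
    for word in text.upper().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def letter_pair_similarity(a: str, b: str) -> float:
    """Dice coefficient over letter pairs: 1.0 for identical, 0.0 for disjoint."""
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = len(pairs_a) + len(pairs_b)
    if total == 0:
        return 0.0
    shared = 0
    for pair in pairs_a:
        if pair in pairs_b:
            shared += 1
            pairs_b.remove(pair)  # each pair may only be matched once
    return 2.0 * shared / total


if __name__ == "__main__":
    print(levenshtein("Berlin", "Berlino"))
    print(letter_pair_similarity("Berlin", "Berlino"))
```

Edit distance is sensitive to length differences, while the pair-based score tolerates reordered words, which may be why the slide lists both.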
  • Wikipedia GeoTemplates: proliferation of geo formats; no consensus, anarchy. Examples:
    − <geo>48 46 36 N 121 48 51 W</geo>
    − {{coor d|48.7767|N|121.8142|W|}}
    − Berlin: |lat_deg = 52|lat_min = 31
    − ... (any template you could possibly think of is used somewhere)
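
To show why the variety of templates is a problem, here is a minimal sketch that normalises the first two example formats to signed decimal degrees. The regular expressions and function names are assumptions written for this example; they cover only these two spellings and not the infobox style or the many other templates in use.

```python
import re

# <geo>48 46 36 N 121 48 51 W</geo>  (degrees minutes seconds hemisphere, twice)
GEO_TAG = re.compile(
    r"<geo>\s*(\d+)\s+(\d+)\s+(\d+)\s+([NS])\s+(\d+)\s+(\d+)\s+(\d+)\s+([EW])\s*</geo>"
)
# {{coor d|48.7767|N|121.8142|W|}}  (decimal degrees plus hemisphere)
COOR_D = re.compile(r"\{\{coor d\|([\d.]+)\|([NS])\|([\d.]+)\|([EW])[|}]")


def to_decimal(degrees: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def parse_coordinates(wikitext: str):
    """Return (latitude, longitude) or None if no known pattern matches."""
    match = GEO_TAG.search(wikitext)
    if match:
        d1, m1, s1, h1, d2, m2, s2, h2 = match.groups()
        return (to_decimal(int(d1), int(m1), int(s1), h1),
                to_decimal(int(d2), int(m2), int(s2), h2))
    match = COOR_D.search(wikitext)
    if match:
        lat, h1, lon, h2 = match.groups()
        return (to_decimal(float(lat), 0, 0, h1),
                to_decimal(float(lon), 0, 0, h2))
    return None


if __name__ == "__main__":
    print(parse_coordinates("<geo>48 46 36 N 121 48 51 W</geo>"))
    print(parse_coordinates("{{coor d|48.7767|N|121.8142|W|}}"))
```

Both example strings describe the same point, so the two results should agree to about four decimal places. Every further template needs its own pattern.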
  • Alternate Names: ... Italian: Berlino; English: Berlin; Arabic: برلين; Korean: 베를린; Thai: เบอร์ลิน; Russian: Берлин; Chinese: 柏林; Marathi: बर्लिन; ... (ca. 100 names)
  • Postal codes: geocode – postal code numeric distance; accuracy, completeness; ScribbleMaps by Robert Kosara.
  • Data Dump: flat CSV files; simple format; ease of use; full daily dump; daily modifications; RDF.
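
Reading such a flat file takes only a few lines. The sketch below assumes a tab-separated layout whose first nine columns are id, name, ASCII name, alternate names, latitude, longitude, feature class, feature code and country code. That column order is an assumption to check against the dump's own readme, and the sample row is made up for illustration, not copied from the dump.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Place(NamedTuple):
    geoname_id: int
    name: str
    latitude: float
    longitude: float
    feature_class: str
    feature_code: str
    country_code: str


def read_places(lines: Iterable[str]) -> Iterator[Place]:
    """Yield one Place per tab-separated line (column positions are assumed)."""
    # QUOTE_NONE: place names may contain quote characters that are not quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield Place(
            geoname_id=int(row[0]),
            name=row[1],
            latitude=float(row[4]),
            longitude=float(row[5]),
            feature_class=row[6],
            feature_code=row[7],
            country_code=row[8],
        )


if __name__ == "__main__":
    # Made-up sample row in the assumed layout.
    sample = ["0\tBerlin\tBerlin\tBerlino,Berlín\t52.52\t13.41\tP\tPPLC\tDE\n"]
    for place in read_places(sample):
        print(place)
```

For a real dump, pass an open file (for example `open(path, encoding="utf-8", newline="")`) instead of the sample list.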
  • Web Services: search, with ranking (TF-IDF, relevancy) and i18n.
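
As a rough illustration of TF-IDF ranking, the sketch below scores a few place names against a query. It uses the textbook formula (term frequency times the log of inverse document frequency) with whitespace tokenisation. The function name is hypothetical, and the scoring actually used by the search service, or by Lucene, differs in detail.

```python
import math
from collections import Counter
from typing import List


def tf_idf_scores(query: str, documents: List[str]) -> List[float]:
    """Score each document against the query with plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    terms = query.lower().split()
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each query term occur?
    doc_freq = {t: sum(1 for tokens in tokenized if t in tokens) for t in terms}
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in terms:
            if not tokens or doc_freq[term] == 0:
                continue  # term never occurs, contributes nothing
            tf = counts[term] / len(tokens)          # share of the name taken by the term
            idf = math.log(n_docs / doc_freq[term])  # rarer terms weigh more
            score += tf * idf
        scores.append(score)
    return scores


if __name__ == "__main__":
    names = ["Berlin", "Berlin Township", "Paris"]
    for name, score in zip(names, tf_idf_scores("berlin", names)):
        print(f"{name}: {score:.3f}")
```

With this formula the exact match "Berlin" outranks "Berlin Township", because the query term makes up a larger share of the shorter name.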
  • Hierarchy Web Services: hierarchy; child; neighbour; sibling.
  • Apache mod_rewrite; ROME (RSS); jdom.org (XML); JSON; Tomcat (Java); JMS, ActiveMQ; Lucene; SRTM3; GTOPO30; JDBC; full-text index, TF-IDF; database: Postgres (PostGIS).
  • Libraries: Java; Drupal; Ruby; PHP; Perl; Python; Lisp.
  • Synchronization: daily dump; daily modifications; JMS; RDF dump, periodically.
  • Linked Data
  • Applications using GeoNames: thousands of applications; search; site navigation; geocoding.
  • Thank you for your attention.