GeoNames
        “Under the Hood: How GeoNames Aggregates
             Many Sources into One Data Set”




             Ge...
GeoNames Feature Density Map




GeoNames, Marc Wick   Web 2.0 Expo - 8. Nov 2007 Berlin   2
GeoNames - Gazetteer
    Pragmatic, useful, ease of use
    Over 6.5 million features
    CC-BY licence
    9 feature clas...
Screen shot Berlin




Origins and Goal
    Proprietary application
    Team up together
    Contribute modifications to a central database.
    a...
Challenge
    A lot of data IS available
    Many providers
    Languages
    Scripts




GeoNames Ambassadors
                                             GeoNames contact
                                       ...
Data Sources
    National Mapping Agencies
    Statistical Offices
    Postal codes
    National Geospatial-Intelligence A...
US vs Europe
    US data is freely available
    European data is not available
    Rest of the World?
    Consequences


...
Future of geodata availability
    We believe basic geodata will be free in most
    countries


    Why :
      −   Econo...
Free Availability is only a First Step




Who aggregates data
    GeoNames
    Supranational mapping agencies
    Supranational organisations


    INSPIRE




Ge...
Problems and Solutions I
    Shape / GML                               FWTools/ GDAL/OGR
    Datum reprojection           ...
Problems and Solutions II
    FeatureCodes not 1:1                     Pattern matching
    non-ASCII                     ...
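The slide pairs non-ASCII names with transliteration as the fix. As a minimal sketch of the simplest case only, the snippet below strips diacritics from Latin-script names using Unicode decomposition. The function name `to_ascii_name` is hypothetical, and this is not claimed to be how GeoNames does it: names in other scripts (Cyrillic, Arabic, Thai) need real transliteration tables, and this sketch simply drops such characters.

```python
import unicodedata


def to_ascii_name(name: str) -> str:
    """Return an ASCII approximation of a Latin-script place name.

    Hypothetical helper: decomposes accented letters (NFKD), removes the
    combining marks, then drops anything that is still not ASCII.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.encode("ascii", "ignore").decode("ascii")


print(to_ascii_name("Zürich"))     # Zurich
print(to_ascii_name("São Paulo"))  # Sao Paulo
```

Letters without a decomposition (for example "ß" or "ø") are lost rather than mapped, which is one reason a production pipeline would need language-aware rules.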
Place name matching
    Geocoding
    Distance
    feature type and feature code
    Reverse geocoding, compare name simil...
Wikipedia GeoTemplates
    Proliferation of GeoFormats
    No consensus, Anarchy
    Examples
      −   <geo>48 46 36 N 12...
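To illustrate why the mix of formats is a problem, the sketch below parses just the first two examples from the full slide (the `<geo>` tag with degrees, minutes and seconds, and the `{{coor d|...}}` template) into signed decimal degrees. The regular expressions and function names are assumptions written for these two examples only; real Wikipedia markup has many more variants.

```python
import re

# Hypothetical patterns covering only the two example formats on the slide.
GEO_TAG = re.compile(
    r"<geo>\s*(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)\s+([NS])\s+"
    r"(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)\s+([EW])\s*</geo>"
)
COOR_D = re.compile(r"\{\{coor d\|([\d.]+)\|([NS])\|([\d.]+)\|([EW])\|[^}]*\}\}")


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def parse_coordinates(text):
    """Return (lat, lon) in decimal degrees, or None if nothing matches."""
    m = GEO_TAG.search(text)
    if m:
        lat = dms_to_decimal(int(m[1]), int(m[2]), float(m[3]), m[4])
        lon = dms_to_decimal(int(m[5]), int(m[6]), float(m[7]), m[8])
        return lat, lon
    m = COOR_D.search(text)
    if m:
        lat = float(m[1]) * (-1 if m[2] == "S" else 1)
        lon = float(m[3]) * (-1 if m[4] == "W" else 1)
        return lat, lon
    return None


print(parse_coordinates("<geo>48 46 36 N 121 48 51 W</geo>"))
print(parse_coordinates("{{coor d|48.7767|N|121.8142|W|}}"))
```

Both example strings describe the same point, so the two calls return nearly identical coordinates; each further template variant would need its own pattern.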
Alternate Names

  ...
  Italian : Berlino
  English : Berlin
  Arabic : برلين
  Korean : 베를린
  Thai          : เบอร์ลิน
  R...
Postal codes
    Geocode – postal code numeric distance
    Accuracy, completeness


    ScribbleMaps by Robert Kosara



...
Data Dump
    Flat csv files
    Simple format
    Ease of use
    Full daily dump
    daily modifications
    rdf
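A flat file like this can be read with a few lines of standard-library code. The sketch below assumes tab-separated fields and the 19-column layout of the present-day public dump; the slide itself only says "flat csv files", so the delimiter and column list are assumptions, and the 2007 layout may have differed. The sample row is illustrative, not copied from a real dump.

```python
import csv
import io

# Assumed column layout (not stated on the slide).
COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames",
    "latitude", "longitude", "feature_class", "feature_code",
    "country_code", "cc2", "admin1_code", "admin2_code",
    "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]


def read_features(handle):
    """Yield one dict per well-formed row of a tab-separated dump."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip rows that do not match the assumed layout
        record = dict(zip(COLUMNS, row))
        record["latitude"] = float(record["latitude"])
        record["longitude"] = float(record["longitude"])
        record["alternatenames"] = [
            n for n in record["alternatenames"].split(",") if n
        ]
        yield record


# Illustrative row only; the values are made up for the example.
SAMPLE = "\t".join([
    "2950159", "Berlin", "Berlin", "Berlino,Berlín", "52.52", "13.41",
    "P", "PPLC", "DE", "", "16", "", "", "", "0", "", "0",
    "Europe/Berlin", "2007-11-08",
]) + "\n"

features = list(read_features(io.StringIO(SAMPLE)))
print(features[0]["name"], features[0]["feature_code"])
```

For a real file, the same function can be passed a handle opened with `encoding="utf-8"` and `newline=""`.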



GeoN...
Web Services
    Search
      −   Ranking
              TF-IDF
              Relevancy
      −   I18n




Hierarchy Web Services
    Hierarchy
    Child
    Neighbour
    Sibling




Apache

                       mod_rewrite

                                   ROME (RSS)         jdom.org (XML) JSO...
Libraries
                                             Java
                                             Drupal
          ...
Synchronization
    Daily dump
    Daily modifications
    JMS


    RDF dump, periodically




Linked Data




Applications using GeoNames
    thousands of applications
    search
    Site navigation
    geo-coding




Thank you for your attention.




Under the Hood: How GeoNames Aggregates Over 35 Sources into One Data Set

Speaker: Marc Wick

Published in: Technology, Travel
  1. GeoNames: “Under the Hood: How GeoNames Aggregates Many Sources into One Data Set”. GeoNames is ... aggregator of free geo data. I am ... Marc Wick, self-employed software engineer, Switzerland.
  2. GeoNames Feature Density Map
  3. GeoNames - Gazetteer: Pragmatic, useful, ease of use; Over 6.5 million features; CC-BY licence; 9 feature classes
  4. Screen shot Berlin
  5. Origins and Goal: Proprietary application; Team up together; Contribute modifications to a central database; Applications switch to GeoNames from proprietary aggregation
  6. Challenge: A lot of data IS available; Many providers; Languages; Scripts
  7. GeoNames Ambassadors: GeoNames contact; Speak local language; Know local situation
  8. Data Sources: National Mapping Agencies; Statistical Offices; Postal codes; National Geospatial-Intelligence Agency (NGA); Applications using GeoNames (Data files, Manual modifications)
  9. US vs Europe: US data is freely available; European data is not available; Rest of the World?; Consequences
  11. Future of geodata availability: We believe basic geodata will be free in most countries. Why: Economy; Traffic Policy and Road Safety (road signs)
  13. Free Availability is only a First Step
  14. Who aggregates data: GeoNames; Supranational mapping agencies; Supranational organisations; INSPIRE
  15. Problems and Solutions I: Shape / GML -> FWTools / GDAL/OGR; Datum reprojection -> PostGIS / EPSG / native tools / custom impl
  16. Problems and Solutions II: FeatureCodes not 1:1 -> Pattern matching; non-ASCII -> Transliteration; Country codes; Admin1 codes
  17. Place name matching: Geocoding; Distance; Feature type and feature code; Reverse geocoding, compare name similarity (Levenshtein distance, letter pair similarity)
  19. Wikipedia GeoTemplates: Proliferation of GeoFormats; No consensus, Anarchy. Examples: <geo>48 46 36 N 121 48 51 W</geo>; {{coor d|48.7767|N|121.8142|W|}}; Berlin : |lat_deg = 52|lat_min = 31; ... (Any template you could possibly think of is used somewhere)
  20. Alternate Names: ... Italian : Berlino; English : Berlin; Arabic : برلين; Korean : 베를린; Thai : เบอร์ลิน; Russian : Берлин; Chinese : 柏林; Marathi : बर्लिन; ... (ca. 100 names)
  21. Postal codes: Geocode – postal code numeric distance; Accuracy, completeness; ScribbleMaps by Robert Kosara
  24. Data Dump: Flat csv files; Simple format; Ease of use; Full daily dump; Daily modifications; RDF
  25. Web Services: Search: Ranking (TF-IDF, Relevancy); I18n
  27. Hierarchy Web Services: Hierarchy; Child; Neighbour; Sibling
  28. Apache mod_rewrite; ROME (RSS); jdom.org (XML); JSON; Tomcat (Java); JMS ActiveMQ; Lucene; SRTM3; Gtopo30; JDBC; Full Text Index TF-IDF; Database: Postgres (PostGIS)
  29. Libraries: Java; Drupal; Ruby; PHP; Perl; Python; Lisp
  30. Synchronization: Daily dump; Daily modifications; JMS; RDF dump, periodically
  31. Linked Data
  32. Applications using GeoNames: Thousands of applications; Search; Site navigation; Geo-coding
  36. Thank you for your attention.