GeoNames
        “Under the Hood: How GeoNames Aggregates
             many Sources into One Data Set”




             GeoNames is ...
        aggregator of free geo data


                   I am ...
                 Marc Wick
self-employed software engineer, Switzerland
GeoNames Feature Density Map




GeoNames, Marc Wick   Web 2.0 Expo - 8. Nov 2007 Berlin
GeoNames - Gazetteer
    Pragmatic, useful, ease of use
    Over 6.5 million features
    CC-BY licence
    9 feature classes




Screen shot Berlin




Origins and Goal
    Proprietary application
    Team up together
    Contribute modifications to a central database
    Applications switch from proprietary aggregation to GeoNames




Challenge
    A lot of data is available
    Many providers
    Languages
    Scripts




GeoNames Ambassadors
    GeoNames contact
    Speak local language
    Know local situation




Data Sources
    National Mapping Agencies
    Statistical Offices
    Postal codes
    National Geospatial-Intelligence Agency (NGA)
    Applications using GeoNames
      −   Data files
      −   Manual modifications


US vs Europe
    US data is freely available
    European data is not available
    Rest of the World?
    Consequences




Future of geodata availability
    We believe basic geodata will be free in most countries


    Why:
      −   Economy
      −   Traffic Policy and Road Safety (road signs)




Free Availability is only a First Step




Who aggregates data
    GeoNames
    Supranational mapping agencies
    Supranational organisations


    INSPIRE




Problems and Solutions I
    Shape / GML                               FWTools / GDAL/OGR
    Datum reprojection                        PostGIS / EPSG / native tools / custom implementation




Problems and Solutions II
    FeatureCodes not 1:1                     Pattern matching
    non-ASCII                                Transliteration
    Country codes
    Admin1 codes
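
The "non-ASCII / Transliteration" row can be illustrated with a minimal sketch. It assumes plain diacritic folding via Unicode decomposition, which is only the simplest part of transliteration; the slides do not say which method GeoNames uses, and the function name `to_ascii` is made up for this example.

```python
import unicodedata


def to_ascii(name: str) -> str:
    """Fold a place name to ASCII by dropping combining marks.

    Assumption: diacritic folding only. Scripts without an ASCII
    decomposition (Cyrillic, Arabic, ...) are simply dropped here;
    a real transliteration step would need per-script mapping tables.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for original in ("Zürich", "Genève", "Берлин"):
        print(repr(original), "->", repr(to_ascii(original)))
```

The empty result for the Cyrillic name shows why folding alone is not enough for a gazetteer that covers many scripts.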




Place name matching
    Geocoding
    Distance
    Feature type and feature code
    Reverse geocoding, compare name similarity
      −   Levenshtein distance
      −   Letter pair similarity
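
The two name-similarity measures named above can be sketched as follows. This is an illustration rather than the GeoNames implementation: the slides do not define "letter pair similarity", so the sketch assumes one common definition (a Dice coefficient over adjacent letter pairs), and both function names are chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def letter_pairs(text: str) -> list:
    """Adjacent letter pairs of each whitespace-separated word, upper-cased."""
    pairs = []
    for word in text.upper().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def letter_pair_similarity(a: str, b: str) -> float:
    """Assumed definition: 2 * shared pairs / total pairs, in [0, 1]."""
    pairs_a = letter_pairs(a)
    remaining = letter_pairs(b)
    total = len(pairs_a) + len(remaining)
    if total == 0:
        return 0.0
    shared = 0
    for pair in pairs_a:
        if pair in remaining:
            shared += 1
            remaining.remove(pair)  # each pair may match only once
    return 2.0 * shared / total


if __name__ == "__main__":
    print(levenshtein("Berlin", "Berlino"))
    print(round(letter_pair_similarity("Berlin", "Berlino"), 3))
```

When matching records from two sources, scores like these would typically be combined with the distance and feature-code checks listed on the slide, so that a close name alone does not merge two different places.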




Wikipedia GeoTemplates
    Proliferation of GeoFormats
    No consensus, Anarchy
    Examples
      −   <geo>48 46 36 N 121 48 51 W</geo>
      −   {{coor d|48.7767|N|121.8142|W|}}
      −   Berlin : |lat_deg = 52|lat_min = 31
      −   ... (Any template you could possibly think of is used somewhere)
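
To show what handling these formats involves, here is a hedged sketch that normalises the first two examples to decimal degrees. The regular expressions cover only those two shapes exactly as they appear on the slide; the function names are invented for this example, and real wikitext would need many more patterns.

```python
import re
from typing import Optional, Tuple

# <geo>48 46 36 N 121 48 51 W</geo>  (degrees minutes seconds hemisphere)
GEO_TAG = re.compile(
    r"<geo>\s*(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)\s+([NS])"
    r"\s+(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)\s+([EW])\s*</geo>"
)
# {{coor d|48.7767|N|121.8142|W|}}  (decimal degrees plus hemisphere)
COOR_D = re.compile(
    r"\{\{coor d\|([\d.]+)\|([NS])\|([\d.]+)\|([EW])\|[^}]*\}\}"
)


def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemi: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemi in ("S", "W") else value


def parse_coordinates(wikitext: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) or None if no known pattern matches."""
    m = GEO_TAG.search(wikitext)
    if m:
        lat = dms_to_decimal(int(m[1]), int(m[2]), float(m[3]), m[4])
        lon = dms_to_decimal(int(m[5]), int(m[6]), float(m[7]), m[8])
        return lat, lon
    m = COOR_D.search(wikitext)
    if m:
        lat = float(m[1]) * (-1 if m[2] == "S" else 1)
        lon = float(m[3]) * (-1 if m[4] == "W" else 1)
        return lat, lon
    return None


if __name__ == "__main__":
    print(parse_coordinates("<geo>48 46 36 N 121 48 51 W</geo>"))
    print(parse_coordinates("{{coor d|48.7767|N|121.8142|W|}}"))
```

Both inputs describe the same point to within rounding, which is a useful sanity check when several templates have to be mapped onto one coordinate format.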



Alternate Names

  ...
  Italian : Berlino
  English : Berlin
  Arabic  : برلين
  Korean  : 베를린
  Thai    : เบอร์ลิน
  Russian : Берлин
  Chinese : 柏林
  Marathi : बर्लिन
  ... (ca. 100 names)
Postal codes
    Geocode – postal code numeric distance
    Accuracy, completeness


    ScribbleMaps by Robert Kosara




Data Dump
    Flat CSV files
    Simple format
    Ease of use
    Full daily dump
    Daily modifications
    RDF
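
Reading such a flat file takes only a few lines, as the sketch below suggests. Everything specific in it is an assumption for illustration: the tab delimiter, the shortened column list in `COLUMNS`, and the two sample rows are hypothetical, so check the readme shipped with the dump for the actual layout.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO

# Hypothetical, shortened column layout used only for this sketch.
COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames",
    "latitude", "longitude", "feature_class", "feature_code", "country_code",
]


@dataclass
class Feature:
    geonameid: int
    name: str
    alternate_names: List[str]
    latitude: float
    longitude: float
    feature_class: str
    feature_code: str
    country_code: str


def read_features(stream: TextIO) -> Iterator[Feature]:
    """Yield one Feature per line of a delimiter-separated dump."""
    reader = csv.DictReader(
        stream, fieldnames=COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        alternates = row["alternatenames"]
        yield Feature(
            geonameid=int(row["geonameid"]),
            name=row["name"],
            alternate_names=alternates.split(",") if alternates else [],
            latitude=float(row["latitude"]),
            longitude=float(row["longitude"]),
            feature_class=row["feature_class"],
            feature_code=row["feature_code"],
            country_code=row["country_code"],
        )


# Made-up rows, not real GeoNames records.
SAMPLE = (
    "1\tExampleville\tExampleville\tExample City,Exempelstad\t52.5\t13.4\tP\tPPL\tXX\n"
    "2\tSample Lake\tSample Lake\t\t46.0\t8.0\tH\tLK\tXX\n"
)

if __name__ == "__main__":
    for feature in read_features(io.StringIO(SAMPLE)):
        print(feature)
```

Because the reader is a generator, the same function could be pointed at a multi-million-line file without loading it into memory.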



Web Services
    Search
      −   Ranking
              TF-IDF
              Relevancy
      −   I18n
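
As a rough illustration of TF-IDF ranking, the sketch below scores a handful of place names against a query. It uses a textbook TF-IDF variant, not the scoring formula of any particular search library, and the function name and sample names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def tf_idf_scores(query: str, documents: List[str]) -> List[float]:
    """Score each document against the query with a textbook TF-IDF.

    tf  = term count / document length
    idf = ln(N / document frequency) + 1   (assumed smoothing choice)
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    doc_freq = Counter()
    for tokens in tokenised:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if counts[term]:
                tf = counts[term] / len(tokens)
                idf = math.log(n_docs / doc_freq[term]) + 1.0
                score += tf * idf
        scores.append(score)
    return scores


if __name__ == "__main__":
    names = ["Berlin", "Berlin Center", "New Berlin Township", "Paris"]
    for name, score in zip(names, tf_idf_scores("berlin", names)):
        print(f"{score:.3f}  {name}")
```

On its own, this favours short exact matches; the separate "Relevancy" item on the slide suggests that further signals are applied on top, though the slides do not say which.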




Hierarchy Web Services
    Hierarchy
    Child
    Neighbour
    Sibling
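
The relationships behind the hierarchy, child and sibling services can be modelled with a simple parent table, as in the sketch below. The class, its method names, and the tiny data set are assumptions made for illustration and do not reflect the actual service interface; neighbours are left out because they need border adjacency data that a parent table cannot provide.

```python
from collections import defaultdict
from typing import Dict, List


class PlaceHierarchy:
    """Toy containment hierarchy built from child -> parent links."""

    def __init__(self, parent_of: Dict[str, str]) -> None:
        self.parent_of = dict(parent_of)
        self.children_of = defaultdict(list)
        for child, parent in self.parent_of.items():
            self.children_of[parent].append(child)

    def hierarchy(self, place: str) -> List[str]:
        """Path from the root down to the place (assumes no cycles)."""
        chain = [place]
        while chain[-1] in self.parent_of:
            chain.append(self.parent_of[chain[-1]])
        return list(reversed(chain))

    def children(self, place: str) -> List[str]:
        """Direct children of the place, sorted by name."""
        return sorted(self.children_of.get(place, []))

    def siblings(self, place: str) -> List[str]:
        """Other places that share the same parent."""
        parent = self.parent_of.get(place)
        if parent is None:
            return []
        return [p for p in self.children(parent) if p != place]


PLACES = PlaceHierarchy({
    "Europe": "Earth",
    "Germany": "Europe",
    "Switzerland": "Europe",
    "Berlin": "Germany",
    "Bavaria": "Germany",
})

if __name__ == "__main__":
    print(PLACES.hierarchy("Berlin"))
    print(PLACES.children("Germany"))
    print(PLACES.siblings("Germany"))
```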




Architecture diagram (components shown)
    Apache, mod_rewrite
    Tomcat (Java)
    ROME (RSS), jdom.org (XML), JSON
    JMS, ActiveMQ
    Lucene: full-text index, TF-IDF
    JDBC
    Database: Postgres (PostGIS)
    SRTM3, GTOPO30


Libraries
    Java
    Drupal
    Ruby
    PHP
    Perl
    Python
    Lisp

Synchronization
    Daily dump
    Daily modifications
    JMS


    RDF dump, periodically




Linked Data




Applications using GeoNames
    Thousands of applications
    Search
    Site navigation
    Geocoding




Thank you for your attention.





Under the Hood: How GeoNames Aggregates Over 35 Sources into One Data Set

  • 1. GeoNames: “Under the Hood: How GeoNames Aggregates many Sources into One Data Set”. GeoNames is an aggregator of free geo data. I am Marc Wick, self-employed software engineer, Switzerland.
  • 2. GeoNames Feature Density Map (GeoNames, Marc Wick, Web 2.0 Expo, 8 Nov 2007, Berlin)
  • 3. GeoNames - Gazetteer: pragmatic, useful, ease of use; over 6.5 million features; CC-BY licence; 9 feature classes
  • 4. Screen shot Berlin
  • 5. Origins and Goal: proprietary application; team up together; contribute modifications to a central database; applications switch to GeoNames from proprietary aggregation
  • 6. Challenge: a lot of data is available; many providers; languages; scripts
  • 7. GeoNames Ambassadors: GeoNames contact; speak local language; know local situation
  • 8. Data Sources: National Mapping Agencies; Statistical Offices; postal codes; National Geospatial-Intelligence Agency (NGA); applications using GeoNames (data files, manual modifications)
  • 9. US vs Europe: US data is freely available; European data is not available; rest of the world?; consequences
  • 11. Future of geodata availability: we believe basic geodata will be free in most countries. Why: economy; traffic policy and road safety (road signs)
  • 13. Free Availability is only a First Step
  • 14. Who aggregates data: GeoNames; supranational mapping agencies; supranational organisations; INSPIRE
  • 15. Problems and Solutions I: Shape / GML; FWTools / GDAL/OGR; datum reprojection; PostGIS / EPSG / native tools / custom implementation
  • 16. Problems and Solutions II: feature codes not 1:1; pattern matching; non-ASCII; transliteration; country codes; admin1 codes
  • 17. Place name matching: geocoding; distance, feature type and feature code; reverse geocoding, compare name similarity (Levenshtein distance, letter pair similarity)
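The two name-similarity measures named on slide 17 can be sketched as follows. This is a minimal illustration, not the GeoNames implementation: the function names are hypothetical, and the letter-pair measure follows the commonly used "shared adjacent letter pairs" formulation, which is an assumption about what the slide refers to.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def letter_pairs(text: str) -> list[str]:
    """Adjacent letter pairs of every whitespace-separated word, upper-cased."""
    pairs = []
    for word in text.upper().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def letter_pair_similarity(a: str, b: str) -> float:
    """Share of letter pairs the two names have in common, from 0.0 to 1.0."""
    pairs_a = letter_pairs(a)
    pairs_b = letter_pairs(b)
    total = len(pairs_a) + len(pairs_b)
    if total == 0:
        return 0.0
    shared = 0
    for pair in pairs_a:
        if pair in pairs_b:
            pairs_b.remove(pair)  # each pair may only be matched once
            shared += 1
    return 2.0 * shared / total
```

Edit distance is sensitive to small spelling differences ("Berlin" vs "Berlino" is one edit), while the letter-pair measure is more tolerant of reordered words, which is why a matcher might reasonably consult both.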
  • 19. Wikipedia GeoTemplates: proliferation of geo formats; no consensus, anarchy. Examples: <geo>48 46 36 N 121 48 51 W</geo>; {{coor d|48.7767|N|121.8142|W|}}; Berlin: |lat_deg = 52|lat_min = 31; ... (any template you could possibly think of is used somewhere)
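To show what normalising these formats involves, here is a small sketch that converts the first two examples from slide 19 into decimal degrees. The regular expressions and function names are assumptions written for exactly these two spellings; the infobox-style Berlin example is left out because the slide only shows part of its fields.

```python
import re


def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds plus hemisphere to signed decimal degrees."""
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


# <geo>48 46 36 N 121 48 51 W</geo>
GEO_TAG = re.compile(
    r"<geo>\s*(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)\s+([NS])\s+"
    r"(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)\s+([EW])\s*</geo>"
)
# {{coor d|48.7767|N|121.8142|W|}}
COOR_D = re.compile(
    r"\{\{coor d\|(\d+(?:\.\d+)?)\|([NS])\|(\d+(?:\.\d+)?)\|([EW])"
)


def parse_coordinates(wikitext):
    """Return (latitude, longitude) in decimal degrees, or None if no known format matches."""
    match = GEO_TAG.search(wikitext)
    if match:
        lat = dms_to_decimal(match[1], match[2], match[3], match[4])
        lon = dms_to_decimal(match[5], match[6], match[7], match[8])
        return lat, lon
    match = COOR_D.search(wikitext)
    if match:
        lat = dms_to_decimal(match[1], hemisphere=match[2])
        lon = dms_to_decimal(match[3], hemisphere=match[4])
        return lat, lon
    return None
```

Both slide examples resolve to roughly the same point (about 48.78 N, 121.81 W), so they appear to be two spellings of one coordinate. Every further template variant would need its own pattern, which is the problem the slide describes.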
  • 20. Alternate Names: ... Italian: Berlino; English: Berlin; Arabic: برلين; Korean: 베를린; Thai: เบอร์ลิน; Russian: Берлин; Chinese: 柏林; Marathi: बर्लिन; ... (ca. 100 names)
  • 21. Postal codes: geocode - postal code numeric distance; accuracy, completeness; ScribbleMaps by Robert Kosara
  • 24. Data Dump: flat CSV files; simple format; ease of use; full daily dump; daily modifications; RDF
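As an illustration of how simple the flat-file dump is to consume, the sketch below reads one record with Python's standard `csv` module. The tab delimiter and the column list are assumptions based on the publicly documented layout of the main GeoNames table and may differ from the 2007 format. The sample row is illustrative, not copied from a real dump.

```python
import csv
import io

# Assumed column layout of the main table (not taken from the slides).
COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude", "longitude",
    "feature_class", "feature_code", "country_code", "cc2", "admin1_code",
    "admin2_code", "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]


def read_features(lines):
    """Yield one dict per well-formed row of a tab-delimited dump."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip rows that do not match the assumed layout
        record = dict(zip(COLUMNS, row))
        record["latitude"] = float(record["latitude"])
        record["longitude"] = float(record["longitude"])
        record["alternatenames"] = [
            name for name in record["alternatenames"].split(",") if name
        ]
        yield record


# Illustrative sample row; unspecified columns are left empty.
sample_values = {
    "geonameid": "1", "name": "Berlin", "asciiname": "Berlin",
    "alternatenames": "Berlino,Berlín", "latitude": "52.52", "longitude": "13.41",
    "feature_class": "P", "feature_code": "PPLC", "country_code": "DE",
}
sample_line = "\t".join(sample_values.get(column, "") for column in COLUMNS)
features = list(read_features(io.StringIO(sample_line + "\n")))
```

Turning quoting off (`csv.QUOTE_NONE`) makes the reader treat quote characters in place names as ordinary text, a sensible default for a file of names in many languages.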
  • 25. Web Services Search: ranking (tf-idf, relevancy); i18n
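The idea behind tf-idf ranking can be shown with a toy example. This is a textbook formulation, not Lucene's actual scoring formula and not the ranking used by the GeoNames search service. The function names and the smoothing term (`+ 1.0`) are assumptions made for the illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case and split on whitespace."""
    return text.lower().split()


def rank(query, documents):
    """Return the documents matching the query, best tf-idf score first."""
    tokenized = [tokenize(document) for document in documents]
    document_count = len(tokenized)
    document_frequency = Counter()
    for tokens in tokenized:
        document_frequency.update(set(tokens))

    scores = []
    for index, tokens in enumerate(tokenized):
        term_frequency = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term_frequency[term]:
                # Rare terms weigh more; the +1.0 keeps very common terms from scoring zero.
                idf = math.log(document_count / document_frequency[term]) + 1.0
                score += (term_frequency[term] / len(tokens)) * idf
        scores.append((score, index))

    # Highest score first; ties keep the original document order.
    ranked = sorted(scores, key=lambda item: (-item[0], item[1]))
    return [documents[index] for score, index in ranked if score > 0]
```

For place names this means that an exact match such as "Berlin" outranks longer names that merely contain the term. A real service would probably combine the text score with other relevancy signals; the slide lists relevancy separately without detail.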
  • 27. Hierarchy Web Services: hierarchy; child; neighbour; sibling
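A request to one of these hierarchy services might be built as sketched below. The base URL, the endpoint names (`hierarchyJSON`, `childrenJSON`, `neighboursJSON`, `siblingsJSON`) and the `geonameId` and `username` parameters are assumptions based on the current public GeoNames web service documentation, not on the slides. The id and the `demo` account are placeholders.

```python
from urllib.parse import urlencode

# Assumed base URL and endpoint names; check the current service documentation.
BASE_URL = "http://api.geonames.org"
ENDPOINTS = {
    "hierarchy": "hierarchyJSON",    # ancestors of a feature
    "children": "childrenJSON",      # features directly below it
    "neighbours": "neighboursJSON",  # neighbouring features
    "siblings": "siblingsJSON",      # features sharing the same parent
}


def hierarchy_url(relation, geoname_id, username="demo"):
    """Build the request URL for one hierarchy relation of a feature."""
    if relation not in ENDPOINTS:
        raise ValueError(f"unknown relation: {relation!r}")
    query = urlencode({"geonameId": geoname_id, "username": username})
    return f"{BASE_URL}/{ENDPOINTS[relation]}?{query}"
```

The function only builds the URL, so the example stays runnable offline; fetching it with any HTTP client and decoding the JSON response is left to the caller.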
  • 28. Apache mod_rewrite; ROME (RSS); jdom.org (XML); JSON; Tomcat (Java); JMS ActiveMQ; Lucene; SRTM3; GTOPO30; JDBC; full-text index; TF-IDF; database: Postgres (PostGIS)
  • 29. Libraries: Java; Drupal; Ruby; PHP; Perl; Python; Lisp
  • 30. Synchronization: daily dump; daily modification; JMS; RDF dump, periodically
  • 31. Linked Data
  • 32. Applications using GeoNames: thousands of applications; search; site navigation; geocoding
  • 36. Thank you for your attention.