Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards


  • GUI Description: The following picture depicts the browsing of a geospatial hierarchical facet. Any selection on the facet acts as a filter in the GUI and can be removed at any time. When a facet item is selected, all of its child facet items are displayed in sorted order with document counts.

    1. 1. Constructing Semantic Gazetteers: Managing GeoSpatial Vocabularies Using Open Semantic Web Standards. Stephane Fellah, Barry J. Glick, Yaser Bishr (smartRealm LLC). Association of American Geographers (AAG) Annual Conference, Washington DC, April 15, 2010
    2. 2. Agenda <ul><li>Gazetteer overview </li></ul><ul><li>Project overview </li></ul><ul><li>Open standards used </li></ul><ul><li>Geocoding Process </li></ul><ul><li>Prototype application </li></ul>
    3. 3. Gazetteer Overview
    4. 4. Role of gazetteer services (ADEPT, Smith, October 1999) [Figure labels: "Where is …?", "What’s there?", "What happened there?"; Books, News, Web, Publications, Archives; Geo-referencing using gazetteer services; Data] smartRealm LLC confidential 9/16/2009
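    5. 5. Semantic gazetteer vs. traditional gazetteers <ul><li>Multiple classification schemes: semantic gazetteer only </li></ul><ul><li>Point geometry: both </li></ul><ul><li>Multi geometry: semantic gazetteer only </li></ul><ul><li>Geo-spatial semantic relations: semantic gazetteer only </li></ul><ul><li>Time stamp for features: both </li></ul><ul><li>Time stamp for geometries: semantic gazetteer only </li></ul><ul><li>Time stamp for other properties: semantic gazetteer only </li></ul><ul><li>Semantic disambiguation: semantic gazetteer only </li></ul><ul><li>Profile gazetteer KB capable: semantic gazetteer only </li></ul><ul><li>Spatial queries (Bbox, AOI): both (traditional gazetteer services: only points) </li></ul>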
    6. 6. Project description
    7. 7. R&D Project Goals <ul><li>Demonstrate the value of geo-enabling a library database by: </li></ul><ul><ul><li>Geocoding and spatially indexing a complete library database, i.e., ASFA </li></ul></ul><ul><ul><li>Implementing geographic search of documents, integrated with topic and author search and map-based visualization of results </li></ul></ul><ul><ul><li>Assisting users in discovering relevant information by surfacing the controlled vocabularies of ASFA </li></ul></ul><ul><ul><li>Testing the prototype with users to assess utility, ease of use, etc. </li></ul></ul><ul><li>Demonstrate the value of linked data and semantics by: </li></ul><ul><ul><li>Enabling geospatial reasoning </li></ul></ul><ul><ul><li>Encoding taxonomies in a machine-processable format </li></ul></ul><ul><ul><li>Resolving ambiguity of terms </li></ul></ul><ul><ul><li>Reusing linked data </li></ul></ul>
    8. 8. ASFA: Aquatic Sciences and Fisheries Abstracts <ul><li>The ASFA series is the premier reference in the field of aquatic resources. </li></ul><ul><li>Input to ASFA is provided by a growing international network of information centers monitoring over 5,000 serial publications, books, reports, conference proceedings, translations and limited distribution literature. </li></ul><ul><li>ASFA is a component of the Aquatic Sciences and Fisheries Information System (ASFIS), formed by four United Nations agency sponsors of ASFA and a network of international and national partners. </li></ul><ul><li>1.3 million records encoded in XML. </li></ul>
    9. 9. ASFIS 6 <ul><li>Descriptors used for subject indexing and retrieval of information on all aspects of aquatic sciences and technology </li></ul><ul><li>6267 vocabulary terms allowing the </li></ul><ul><li>We used existing SKOS encoding of the taxonomy </li></ul>
    10. 10. ASFIS 7 <ul><li>Geographic descriptors used in the ASFA system </li></ul><ul><li>Not officially standardized </li></ul><ul><li>Inconsistencies due to manual entries </li></ul><ul><li>Hardwired in the system </li></ul><ul><li>Goals of this project: </li></ul><ul><ul><li>Semantically encode the ASFIS7 taxonomy </li></ul></ul><ul><ul><li>Geocode the taxonomy </li></ul></ul><ul><ul><li>Enable spatial search in the ASFA database. </li></ul></ul>
    11. 11. Support Multiple Use Cases <ul><li>Researcher has a specific research goal: provide a quicker, simpler way to filter results to get to the relevant documents </li></ul><ul><ul><li>I’m looking for research on coral reef diseases in the western Caribbean region </li></ul></ul><ul><li>Researcher has a specific area of interest: allow the user to define an area of interest with the map or with geographic terms and use it to find relevant research </li></ul><ul><ul><li>I am studying the Danube delta region…what research is available in ASFA for this area? (and what topics does the research address?) </li></ul></ul><ul><li>Geo-exploration of research: researcher is interested in a specific topic and uses the map to explore relevant documents. </li></ul><ul><ul><li>My research interest is oyster farming. Where in the world has research been conducted on this topic? </li></ul></ul><ul><li>Others: </li></ul><ul><ul><li>Where does a specific author conduct his/her research? </li></ul></ul><ul><ul><li>Which authors have published the most research on a specific area of interest? </li></ul></ul><ul><ul><li>What is the geographic distribution of research on a specific topic? (and where are the gaps?) </li></ul></ul>
    12. 12. Open Standards used
    13. 13. RDF: Graph Representation. Minimalist model: the TRIPLE. [Figure labels: Equivalent in relational model; Model; association (Object to Object); attribute (Object to Literal)]
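The triple model named on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Triple` and `Literal` types and the `objects` helper are hypothetical names introduced here, and the identifiers are borrowed from the SKOS encoding slide later in the deck.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A plain value at the object position of a triple."""
    value: str


class Triple(NamedTuple):
    subject: str               # resource identifier
    predicate: str             # property linking subject and object
    obj: Union[str, Literal]   # another resource (association) or a literal (attribute)


# Identifiers reused from the SKOS encoding slide, for illustration only.
SAN_DIEGO = "asfis7:USA/California/San_Diego_City"
graph = {
    Triple(SAN_DIEGO, "skos:broader", "asfis7:USA/California"),     # object to object
    Triple(SAN_DIEGO, "skos:prefLabel", Literal("San Diego City")),  # object to literal
}


def objects(graph, subject, predicate):
    """Return every object attached to `subject` by `predicate`."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}
```

Calling `objects(graph, SAN_DIEGO, "skos:broader")` returns the set holding the broader concept, which is the graph equivalent of following a foreign key in a relational table.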
    14. 14. Linked Open Data
    15. 15. Geospatial Semantic Web Architecture (source: Berners-Lee, AAAI, July 2006) [Figure labels: Geospatial Datatypes; Geospatial Functions; Geospatial Ontology Extensions; Geospatial Logic]
    16. 16. SKOS <ul><li>SKOS = Simple Knowledge Organization System </li></ul><ul><li>A common data model for sharing and linking knowledge organization systems (KOS) via the Semantic Web. </li></ul><ul><ul><li>KOS examples: thesauri, taxonomies, classification schemes, subject heading systems … … </li></ul></ul><ul><li>Machine processable and portable representation </li></ul><ul><li>Extensible </li></ul>
    17. 17. SKOS Thesaurus Example
    18. 18. Example of Classification Scheme <ul><ul><li>90. GEOPHYSICS, ASTRONOMY, AND ASTROPHYSICS </li></ul></ul><ul><ul><li>91. Solid Earth physics </li></ul></ul><ul><ul><li>91.10.-v Geodesy and gravity </li></ul></ul><ul><ul><li>91.10.Pp Gravimetric measurements and instruments </li></ul></ul>
    19. 19. Example of Classification Scheme
    20. 20. Semantic Geo-encoding <ul><li>Arrange geographic places in an order from most general to most specific, e.g. </li></ul><ul><ul><li>World/Continent/Country/State or Province/City </li></ul></ul><ul><ul><li>World/Ocean/Ocean Region/Sea/Bay </li></ul></ul><ul><ul><li>World/Continent/Country/River or Lake </li></ul></ul><ul><li>This allows the user to move up and down the hierarchy in search and to find related, more specific and more general terms </li></ul><ul><li>It also helps to distinguish ambiguous geographic place names, e.g. Mississippi as a river vs. Mississippi as a state </li></ul>
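A small sketch of how such general-to-specific paths might support navigation and disambiguation. The data structure, the function names, and the two sample entries are assumptions made for illustration; the slides do not show how the hierarchy is stored.

```python
from collections import defaultdict

# Each place is a path of (level, name) pairs, most general first.
# The two entries are illustrative and not taken from the ASFIS7 data.
PLACES = [
    (("World", "World"), ("Continent", "North America"), ("Country", "USA"),
     ("State or Province", "Mississippi")),
    (("World", "World"), ("Continent", "North America"), ("Country", "USA"),
     ("River or Lake", "Mississippi")),
]

# Index every path by the name of its most specific element.
by_name = defaultdict(list)
for path in PLACES:
    by_name[path[-1][1]].append(path)


def candidates(name):
    """All places whose most specific element carries this name."""
    return by_name.get(name, [])


def disambiguate(name, level):
    """Keep only the candidates of the requested level (e.g. river vs. state)."""
    return [p for p in candidates(name) if p[-1][0] == level]


def broader(path):
    """Move one step up the hierarchy, or return None at the top."""
    return path[:-1] if len(path) > 1 else None
```

Here `candidates("Mississippi")` yields two paths, and the level of the last element is what separates the river from the state.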
    21. 21. Geo-SKOS <ul><li>Define an extension of SKOS for geospatial concepts. </li></ul><ul><li>GeoConcept is a subclass of Concept </li></ul><ul><li>GeoConcept has location properties </li></ul><ul><li>Specialization of narrower, broader and related </li></ul><ul><ul><li>Narrower => Narrower-partitive,… </li></ul></ul><ul><ul><li>Broader => Broader-partitive,… </li></ul></ul><ul><ul><li>Related => Nearby, SW of, west of,… </li></ul></ul>
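The subclass and the specialised relations can be sketched in Python as follows. The class layout, the relation names in `SUB_PROPERTY_OF`, and the sample concept are hypothetical; the slide gives only the general idea, not the actual Geo-SKOS vocabulary.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    uri: str
    pref_label: str
    relations: dict = field(default_factory=dict)  # relation name -> set of target URIs


@dataclass
class GeoConcept(Concept):
    location: Optional[str] = None  # e.g. a WKT string


# Hypothetical specialised relations and the SKOS relation each one refines.
SUB_PROPERTY_OF = {
    "narrowerPartitive": "narrower",
    "broaderPartitive": "broader",
    "nearby": "related",
    "westOf": "related",
}


def related_by(concept, relation):
    """Targets linked by `relation` or by any relation that specialises it."""
    names = {relation} | {sub for sub, sup in SUB_PROPERTY_OF.items() if sup == relation}
    targets = set()
    for name in names:
        targets |= concept.relations.get(name, set())
    return targets


# Illustrative concept; the URIs are made up.
nepal = GeoConcept(
    "ex:Nepal", "Nepal",
    relations={"broaderPartitive": {"ex:Asia"}, "nearby": {"ex:India"}},
    location="POINT (84 28)",
)
```

Because each specialised relation refines a plain SKOS relation, a client that only asks for `broader` or `related` still gets an answer: `related_by(nepal, "broader")` includes the partitive parent.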
    22. 22. Geocoding process
    23. 23. Geocoding Process [Figure labels, in diagram order: ASFA XML; Q3 list; Q3 extraction; SKOS Encoding; Top Concepts (Countries, Sea Zones); ASFIS7 SKOS; Geocoder; Geocoded ASFIS7 SKOS; Post Processing (bbox, centroid); Reasoning; Post-processed Geocoded ASFIS7 SKOS; Indexing; Inferred Geocoded ASFIS7 SKOS; ASFIS7 Index; Indexing; Mapping; SmartRealm Gazetteer; Oracle Spatial Index]
    24. 24. Approach <ul><li>Encode legacy data from the Q3 fields in ASFA </li></ul><ul><li>The authoritative list is not used because its terms do not match the Q3 terms directly </li></ul><ul><li>Sea codes are not handled in the authoritative list </li></ul><ul><li>Polygons and linestrings take priority over points </li></ul>
    25. 25. ASFA Data <ul><li><rec id=&quot;16&quot; status=&quot;1&quot; type=&quot;Journal Article&quot; jdf=&quot;Q1;Y&quot;> </li></ul><ul><li>     <ti>Divergence Among Barking Frogs (Eleutherodactylus Augusti) In The Southwestern United States</ti> </li></ul><ul><li>     <ab>Barking frogs (Eleutherodactylus augusti) are distributed from southern Mexico along the Sierra Madre Occidental into Arizona and the Sierra Madre Oriental into Texas and New Mexico. ....</ab> </li></ul><ul><li>     <pt>Journal Article</pt> </li></ul><ul><li>     <q1> </li></ul><ul><li>         <term>Amphibiotic species</term> </li></ul><ul><li>         <term>Burrowing organisms</term> </li></ul><ul><li>         <term>Burrows</term> </li></ul><ul><li>         <term>Coloration</term> </li></ul><ul><li>         ...... </li></ul><ul><li>     </q1> </li></ul><ul><li>     <q2> </li></ul><ul><li>         <term>Anura</term> </li></ul><ul><li>         <term>Eleutherodactylus</term> </li></ul><ul><li>         <term>Eleutherodactylus augusti</term> </li></ul><ul><li>     </q2> </li></ul><ul><li>     <q3> </li></ul><ul><li>         <term>ISW,Mexico</term> </li></ul><ul><li>         <term>USA, Arizona</term> </li></ul><ul><li>         <term>USA, New Mexico</term> </li></ul><ul><li>     </q3> </li></ul>
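Pulling the geographic descriptors out of such a record could look like the sketch below, which uses Python's standard `xml.etree.ElementTree`. The `SAMPLE` record is a shortened, well-formed copy of the slide's excerpt, with a closing `</rec>` added as an assumption, and `q3_terms` is a hypothetical helper rather than the project's extractor.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record on the slide; the closing </rec> is assumed.
SAMPLE = """<rec id="16" status="1" type="Journal Article" jdf="Q1;Y">
  <pt>Journal Article</pt>
  <q3>
    <term>ISW,Mexico</term>
    <term>USA, Arizona</term>
    <term>USA, New Mexico</term>
  </q3>
</rec>"""


def q3_terms(record_xml):
    """Return the Q3 descriptors of one record, each split on commas and trimmed."""
    rec = ET.fromstring(record_xml)
    terms = []
    for term in rec.findall("./q3/term"):
        parts = [p.strip() for p in (term.text or "").split(",") if p.strip()]
        if parts:
            terms.append(parts)
    return terms
```

Splitting on commas and trimming already absorbs one of the inconsistencies visible in the data, the missing space in `ISW,Mexico`.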
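    26. 26. Q3 field extraction <ul><li>*--MED, Turkey, Bursa, Gemlik Bay </li></ul><ul><li>*--Turkey, Bursa </li></ul><ul><li>- British-Colimbia </li></ul><ul><li>- Canada </li></ul><ul><li>-Vancouver </li></ul><ul><li>A, America </li></ul><ul><li>A, America , East Coast </li></ul><ul><li>A, Antarctic Bottom Water </li></ul><ul><li>A, Atlantic </li></ul><ul><li>A, Atlantic Plate </li></ul><ul><li>A, Atlantic, Antarctic Bottom Water </li></ul><ul><li>A, Atlantic, Gulf Stream </li></ul><ul><li>A, Atlantic, Macaronesian Is. </li></ul><ul><li>A, Atlantic, Mid-Atlantic Ridge </li></ul><ul><li>A, Atlantic, Rio Grande Plateau </li></ul><ul><li>A, Central Atlantic </li></ul><ul><li>A, Mid-Atlantic Bight </li></ul><ul><li>A, Mid-Atlantic Ridge </li></ul><ul><li>A, Mid-Atlantic Ridge, Lucky Strike </li></ul><ul><li>A, Mid-Atlantic Ridge, Oceanographer Fracture Zone </li></ul><ul><li>A, North Atlantic </li></ul><ul><li>A, Northwest Atlantic Basin </li></ul><ul><li>A, Rockall Trough </li></ul><ul><li>A, Sargasso Sea </li></ul><ul><li>A, Southern Hemispere Oceans </li></ul><ul><li>A, atlantic </li></ul><ul><li>A,Atlantic </li></ul><ul><li>AE, Africa </li></ul><ul><li>AE, Atlantic </li></ul><ul><li>AE, Central Atlantic </li></ul>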
    27. 27. Challenge: Inconsistent names and conventions <ul><li>China, Nin gsia Hui Autonomous Region, Yinchwan </li></ul><ul><li>China, People'S Rep., Hubei Prov., Wuhan </li></ul><ul><li>China, People's R </li></ul><ul><li>China, People's R., Hailung Hsien </li></ul><ul><li>China, People's Rep, Changjiang Delta </li></ul><ul><li>China, People's. Rep., Xizang, Qing Zang Gaoyuan Plateau </li></ul><ul><li>China, Peoples Rep </li></ul><ul><li>China, Peoples Rep., Fuxian L. </li></ul><ul><li>China, Peoples rep., Dayawan Huizhou </li></ul><ul><li>China, Peoples's Rep., Yunnan Prov., Yuanjiang R. </li></ul><ul><li>China, Peoples, Rep., Ya-Er L. </li></ul><ul><li>China, Peoptes Rep. Qingdao </li></ul><ul><li>China, Reople's Rep., Yangtze R. </li></ul><ul><li>China, Rep., Donghu L. </li></ul><ul><li>China, people's Rep. </li></ul><ul><li>China, people's rep. </li></ul><ul><li>Chinea, People's Rep., Three Gorges Reservoir </li></ul>
    28. 28. Challenge: Legacy names <ul><ul><li>Germany, F.R </li></ul></ul><ul><ul><li>Germany, F.R., Westphalia </li></ul></ul><ul><ul><li>Germany, Fed. Rep </li></ul></ul><ul><ul><li>Germany, Fed. Rep. </li></ul></ul><ul><ul><li>Germany, Fed. Rep., Westphalia </li></ul></ul><ul><ul><li>Germany, Fed.Rep </li></ul></ul><ul><ul><li>Germany, Fed.Rep., Wuerttemberg </li></ul></ul><ul><ul><li>Germany, Feldbach Brook </li></ul></ul><ul><ul><li>Germany, D.R., Wipper R </li></ul></ul><ul><ul><li>Germany, Dem Rep </li></ul></ul><ul><ul><li>Germany, Dem. Rep </li></ul></ul><ul><ul><li>Germany, Dem. Rep., Helme R </li></ul></ul><ul><ul><li>Germany, Dem.Rep </li></ul></ul><ul><ul><li>Germany, Dem.Rep., Harz </li></ul></ul>
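One possible way to reconcile spelling and punctuation variants like those on the two challenge slides is to normalise each term and then match it approximately against a list of canonical labels. The slides do not say how the project handled this, so the sketch below is an assumption throughout: the `CANONICAL` list, the 0.8 cutoff, and both function names are made up, and the matching uses `difflib.get_close_matches` from the Python standard library.

```python
import re
from difflib import get_close_matches

# Hypothetical canonical labels; the project's target vocabulary is not shown.
CANONICAL = ["china, people's rep.", "germany, fed. rep.", "germany, dem. rep."]


def normalize(term):
    """Lowercase, then force exactly one space after each comma or period."""
    t = term.strip().lower()
    t = re.sub(r"\s*([,.])\s*", r"\1 ", t)   # "fed.rep" becomes "fed. rep"
    return re.sub(r"\s+", " ", t).strip()


def match(term, cutoff=0.8):
    """Return the closest canonical label, or None when nothing is similar enough."""
    hits = get_close_matches(normalize(term), CANONICAL, n=1, cutoff=cutoff)
    return hits[0] if hits else None
```

Approximate matching only addresses typing variants. Legacy names such as the two former German states still need an explicit decision about which current concept, if any, they map to.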
    29. 29. SKOS Encoding <ul><li>asfis7:USA/California/San_Diego_City       </li></ul><ul><li>a       skos:Concept ; </li></ul><ul><li>      skos:prefLabel &quot;San Diego City&quot;@en ; </li></ul><ul><li>      skos:altLabel &quot;San Diego Cty.&quot;@en ; </li></ul><ul><li>      skos:altLabel &quot;San Diego&quot;@en ;      </li></ul><ul><li>      skos:broader   asfis7:USA/California ; </li></ul><ul><li>      skos:narrower    asfis7:USA/California/San_Diego_City/San_Luis_Rey_River ,                               asfis7:USA/California/San_Diego_City/Point_Loma ,                               asfis7:USA/California/San_Diego_City/Los_Penasquito_Creek ; </li></ul><ul><li>      skos:inScheme <>.       </li></ul>
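A sketch of how a comma-separated Q3 term might be turned into a concept like the one above. The identifier convention (path parts joined with `/`, spaces replaced by `_`) is read off the slide; the functions `concept_id` and `encode` are hypothetical and emit simple (predicate, object) pairs instead of serialised Turtle.

```python
def concept_id(parts):
    """Join path parts into an identifier in the style shown on the slide."""
    return "asfis7:" + "/".join(p.strip().replace(" ", "_") for p in parts)


def encode(parts, alt_labels=()):
    """Return the concept identifier and its (predicate, object) statements."""
    statements = [
        ("rdf:type", "skos:Concept"),
        ("skos:prefLabel", parts[-1].strip()),
    ]
    statements += [("skos:altLabel", label) for label in alt_labels]
    if len(parts) > 1:
        # The parent path becomes the direct broader concept.
        statements.append(("skos:broader", concept_id(parts[:-1])))
    return concept_id(parts), statements
```

For example, `encode(["USA", "California", "San Diego City"], ["San Diego Cty.", "San Diego"])` produces the identifier, labels, and broader link shown on the slide; the narrower links would follow from encoding the child terms.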
    30. 30. Data sources used <ul><li>Geonames </li></ul><ul><li>Digital Chart of the World (DCW) </li></ul><ul><ul><li>Countries </li></ul></ul><ul><ul><li>Admin1 </li></ul></ul><ul><ul><li>Admin2 </li></ul></ul><ul><ul><li>World Seas </li></ul></ul><ul><ul><li>World Rivers </li></ul></ul><ul><ul><li>Continent </li></ul></ul><ul><ul><li>World Regions </li></ul></ul><ul><li>FAO Geonetwork </li></ul><ul><ul><li>ASFA Data </li></ul></ul>
    31. 31. Knowledge Integration Approach [Figure: Data Layer (Geonames RDBMS; Country.shp, Admin1.shp, World Seas and ASFA Sea Zones feature stores); each source passes through a KMS SQL Engine or KMS Feature Engine and a KMS Mapping (RDBMS Model: Table/Column; Feature Model: FeatureType, Attribute) into an RDF Graph; Ontological Layer (Hydrology Ontology, Administrative Division Ontology, Feature Model, Upper ontology); Data Product Layer; Semantic Gazetteer API]
    32. 32. Geocoding information <ul><li>Geometry (polygon, linestring or point) </li></ul><ul><li>Centroid </li></ul><ul><li>Bounding box </li></ul><ul><li>Feature types </li></ul><ul><li>Alternate names </li></ul><ul><li>Neighbor places (similar to RT) </li></ul>
    33. 33. Geocoded Concept <ul><li><> </li></ul><ul><li>       a       skos:Concept ; </li></ul><ul><li>skos:altLabel &quot;State of Nepal&quot;, &quot;Neipeal&quot;, &quot;Nepalia&quot;... </li></ul><ul><li>ft:centroid &quot;POINT (84 28)&quot;^^ks:wkt ; </li></ul><ul><li>       </li></ul><ul><li>ft:featureType <> ; </li></ul><ul><li>       </li></ul><ul><li>ft:geometry &quot;MULTIPOLYGON (((82.70109558105469 27.711105346679688, 82.65790557861328 ....... 82.59803771972656 27.69027328491211, 82.571755981445312 27.690410614013672, 82.70109558105469 27.711105346679688)))&quot;^^ks:wktMultiPolygon ; </li></ul><ul><li>     </li></ul><ul><li>owl:sameAs <> </li></ul><ul><li>  </li></ul>
    34. 34. Postprocessing <ul><li>Centroid computed from geometry </li></ul><ul><li>Bounding box computed from polygon geometry </li></ul><ul><li>If there is no polygon, the bounding box is inherited from the parent </li></ul><ul><li>Centroids are not inherited </li></ul>
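These post-processing rules can be sketched in plain Python. The sketch simplifies in ways the slide does not state: every geometry is a single closed polygon ring of (x, y) pairs, the centroid is the area-weighted one from the shoelace formula, and the dictionary layout and function names are hypothetical.

```python
def bbox(ring):
    """Bounding box (min_x, min_y, max_x, max_y) of a ring of (x, y) pairs."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def centroid(ring):
    """Area-weighted centroid of a simple closed ring (first point == last point)."""
    area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area *= 0.5
    return (cx / (6 * area), cy / (6 * area))


def post_process(concepts):
    """concepts: id -> {"parent": id or None, "polygon": ring or None}."""
    def resolve_bbox(cid):
        c = concepts[cid]
        if c.get("polygon"):
            return bbox(c["polygon"])
        if c.get("parent") is not None:
            return resolve_bbox(c["parent"])   # inherit from the nearest ancestor
        return None

    for cid, c in concepts.items():
        c["bbox"] = resolve_bbox(cid)
        # Centroids are never inherited: no polygon means no centroid.
        c["centroid"] = centroid(c["polygon"]) if c.get("polygon") else None
    return concepts


# Illustrative data: a rectangle and a child place without its own polygon.
SAMPLE = post_process({
    "A": {"parent": None, "polygon": [(0, 0), (4, 0), (4, 2), (0, 2), (0, 0)]},
    "A/B": {"parent": "A", "polygon": None},
})
```

Inheriting the parent's bounding box keeps a place without its own polygon reachable by a bounding-box query, while leaving its centroid empty avoids placing it at a misleading point.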
    35. 35. Inferencing <ul><li>asfis7:USA/California/San_Diego_City       a       skos:Concept ;       skos:prefLabel &quot;San Diego City&quot;@en ;       skos:altLabel &quot;San Diego Cty.&quot;@en ; </li></ul><ul><li>      skos:altLabel &quot;San Diego&quot;@en ;      </li></ul><ul><li> skos:broader asfis7:USA/California;       </li></ul><ul><li>skos:broaderTransitive  asfis7:USA ,                                                 asfis7:USA/California ;       skos:narrower asfis7:USA/California/San_Diego_City/San_Luis_Rey_River ,                     asfis7:USA/California/San_Diego_City/Point_Loma ,                      asfis7:USA/California/San_Diego_City/Los_Penasquito_Creek;       </li></ul><ul><li> skos:narrowerTransitive               asfis7:USA/California/San_Diego_City/San_Luis_Rey_River ,               asfis7:USA/California/San_Diego_City/Point_Loma ,               asfis7:USA/California/San_Diego_City/Los_Penasquito_Creek;       </li></ul><ul><li>skos:inScheme <>. </li></ul>
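The inferred `skos:broaderTransitive` and `skos:narrowerTransitive` statements follow from the direct links by a transitive closure. The sketch below shows that computation in plain Python. It is an illustration under assumptions, not the reasoner used in the project; the function names are hypothetical and the identifiers come from the slide.

```python
def transitive_closure(direct):
    """direct: concept -> set of direct broader concepts. Returns concept -> all ancestors."""
    closure = {}
    for concept in direct:
        reached, stack = set(), list(direct[concept])
        while stack:
            cur = stack.pop()
            if cur not in reached:
                reached.add(cur)
                stack.extend(direct.get(cur, ()))
        closure[concept] = reached
    return closure


def invert(relation):
    """Turn a broader-style mapping into the matching narrower-style mapping."""
    inverse = {}
    for concept, targets in relation.items():
        for target in targets:
            inverse.setdefault(target, set()).add(concept)
    return inverse


SD = "asfis7:USA/California/San_Diego_City"
BROADER = {
    SD + "/Point_Loma": {SD},
    SD: {"asfis7:USA/California"},
    "asfis7:USA/California": {"asfis7:USA"},
}

broader_transitive = transitive_closure(BROADER)
narrower_transitive = invert(broader_transitive)
```

With these links, `broader_transitive[SD]` contains both `asfis7:USA/California` and `asfis7:USA`, matching the inferred statement on the slide.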
    36. 36. SKOS Indexing <ul><li>Fields indexed in Lucene/Solr </li></ul><ul><ul><li>Id </li></ul></ul><ul><ul><li>Type </li></ul></ul><ul><ul><li>Preferred labels, alternate labels </li></ul></ul><ul><ul><li>Geometry </li></ul></ul><ul><ul><li>Centroid </li></ul></ul><ul><ul><li>Bounding box </li></ul></ul><ul><ul><li>Narrower, narrower transitive </li></ul></ul><ul><ul><li>Broader, broader transitive </li></ul></ul><ul><ul><li>Related </li></ul></ul><ul><ul><li>Feature types </li></ul></ul><ul><ul><li>Equivalent terms </li></ul></ul><ul><li>Ids, centroids and geometries are spatially indexed in Oracle Spatial </li></ul>
    37. 37. Prototype Application
    38. 39. Advantages of Faceted Search <ul><li>Lets the user decide how to start, and how to explore and group </li></ul><ul><li>After refinement, categories that are not relevant to the current results disappear </li></ul><ul><li>Seamlessly integrates keyword search with the organizational structure. </li></ul><ul><li>Very easy to expand out (loosen constraints) </li></ul><ul><li>Very easy to build up complex queries </li></ul>
    39. 40. Advantages of Faceted Search <ul><li>Can’t end up with empty result sets </li></ul><ul><ul><li>(except with keyword search) </li></ul></ul><ul><li>Helps avoid feelings of being lost </li></ul><ul><li>Easier to explore the collection </li></ul><ul><ul><li>Helps users infer what kinds of things are in the collection. </li></ul></ul><ul><ul><li>Evokes a feeling of “browsing the shelves” </li></ul></ul><ul><li>Is preferred over standard search for collection browsing in usability studies </li></ul><ul><ul><li>(Interface must be designed properly) </li></ul></ul>
    40. 41. Geospatial Hierarchical facet
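The facet behaviour described for this slide (selecting an item lists its children in sorted order with document counts) can be sketched as below. The document set and the function name are hypothetical, and a real deployment would more likely obtain these counts from the search index than from a Python loop.

```python
from collections import Counter

# Hypothetical documents, each tagged with one or more place paths.
DOCS = {
    "doc-1": [("USA", "California", "San Diego City")],
    "doc-2": [("USA", "California")],
    "doc-3": [("USA", "Arizona")],
    "doc-4": [("Mexico",)],
}


def child_facets(selected=()):
    """Return (child, document count) pairs under `selected`, sorted by name."""
    selected = tuple(selected)
    depth = len(selected)
    counts = Counter()
    for paths in DOCS.values():
        # A set ensures each document is counted once per child.
        children = {p[depth] for p in paths if len(p) > depth and p[:depth] == selected}
        counts.update(children)
    return sorted(counts.items())
```

With nothing selected the facet shows the top-level places; selecting `("USA",)` narrows it to the states that still have documents.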
    41. 42. Benefits of the semantic approach <ul><li>Unique identifier for each place </li></ul><ul><li>Distinction in search between direct places and indirect places (reached by transitivity) </li></ul><ul><li>Multilingual search </li></ul><ul><li>Searches on alternate names still point to the same URI (New York, NYC, Big Apple) </li></ul><ul><li>Linkable to other data (reusable for different applications) </li></ul><ul><li>Reasoning </li></ul><ul><li>Easy integration </li></ul>
    42. 43. Accomplishments <ul><li>Geo-semantic enabled ASFA prototype is a breakthrough </li></ul><ul><ul><li>Not just pins on a map – fully integrated geo-spatial and semantic search with GIS display and operations </li></ul></ul><ul><ul><li>Uses geographic knowledge base and map interface to aid search and discovery </li></ul></ul><ul><li>Unique aspects: </li></ul><ul><ul><li>Tagging research documents not just to points, but to linear features and areal regions on the earth’s surface </li></ul></ul><ul><ul><li>Allowing for user-defined areas of interest, including polygons </li></ul></ul><ul><ul><li>Creating a geo-semantic structure for the locations to enable enhanced search through inheritance and inference: </li></ul></ul><ul><ul><ul><li>e.g. if something is tagged with “Naked Island, Alaska” we know that it is part of North America and the USA, but also that it is within Prince William Sound, which is within the Gulf of Alaska, which is part of the eastern North Pacific ocean region. Thus a search for research on oil spills in Prince William Sound will also include any documents tagged with Naked Island, Alaska, even without any explicit mention of Prince William Sound in the document </li></ul></ul></ul>
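The Naked Island example can be turned into a small search sketch. The broader links follow the wording of the slide, with the USA placed under North America as an assumption; the document tags and function names are hypothetical. The result labels each hit as direct or indirect, in line with the distinction listed among the benefits of the semantic approach.

```python
# Direct "broader" links, following the example in the text.
BROADER = {
    "Naked Island, Alaska": {"USA", "Prince William Sound"},
    "USA": {"North America"},
    "Prince William Sound": {"Gulf of Alaska"},
    "Gulf of Alaska": {"Eastern North Pacific"},
}

# Hypothetical document tags.
DOC_TAGS = {
    "doc-1": {"Naked Island, Alaska"},
    "doc-2": {"Gulf of Alaska"},
}


def ancestors(place):
    """All places reachable from `place` by following broader links."""
    found, stack = set(), list(BROADER.get(place, ()))
    while stack:
        cur = stack.pop()
        if cur not in found:
            found.add(cur)
            stack.extend(BROADER.get(cur, ()))
    return found


def search(place):
    """Return {doc_id: "direct" or "indirect"} for documents about `place`."""
    hits = {}
    for doc, tags in sorted(DOC_TAGS.items()):
        if place in tags:
            hits[doc] = "direct"
        elif any(place in ancestors(tag) for tag in tags):
            hits[doc] = "indirect"
    return hits
```

A search for Prince William Sound returns `doc-1` as an indirect hit even though that document is tagged only with Naked Island, Alaska.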