Semantics-enhanced Geoscience Interoperability, Analytics, and Applications


Talk given by Prof. T.K. Prasad at the workshop on Semantics in Geospatial Architectures: Applications and Implementation. The workshop was held on October 28-29, 2013, at the Pyle Center (702 Langdon Street, Madison, WI), University of Wisconsin-Madison.

Published in: Education, Technology
  1. Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
     Krishnaprasad Thirunarayan and Amit Sheth
     Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
     Wright State University, Dayton, OH-45435
  2. Outline
     •  Semantics-empowered Cyberinfrastructure for Geoscience Applications
        –  Approaches, Benefits, and Challenges (reflecting cost/convenience/pay-off trade-offs)
     •  Expressive search and integration using Geospatial information
        –  SPARQL enhancements
        –  Practical applications using semantic technologies, sensor data streams, and spatial information
  3. Semantics-empowered Cyberinfrastructure for Geoscience applications
  4. Domain Goals and Challenges
     Data-driven understanding of the evolution of oceans, atmosphere, and solid earth over time through physical, chemical and biological processes.
     •  Cultural challenges
        –  Proper protection, control, and credit for sharing data
     •  Technological challenges
        –  Computational tools and repositories conducive to easy exchange, curation, and attribution of data
     Data sharing can promote re-analysis/re-interpretation of extant data, reducing “redundant” data collections.
  5. Category of Geoscience Data (with Characteristics, Strategy for Reuse, and CI Strategy)
     •  Short tail science data created by large organizations and projects
        –  Characteristics: Few, large (TB+), structured, spatially rich (e.g., remote sensing), largely homogeneous, highly visible, curated
        –  Strategy for Reuse: Planned integration strategies, could use formal ontologies / domain models and vocabularies, visualization tools and APIs
        –  CI Strategy: Data centers / grids generally using relational databases and files, maintained by people with significant IT skills
     •  Long tail science data created by individual scientists and small groups
        –  Characteristics: Many, small (GB+), heterogeneous, invisible (except via publications), poorly curated
        –  Strategy for Reuse: Multi-domain and broad vocabularies (including community established ones), create semantic metadata (annotations) and optionally publish, search and download legacy data, or use an open data initiative
        –  CI Strategy: Web-based easy to learn and use semantic tools for annotation, publication, search and download that can be used by individual scientists without significant IT skills
  6. Our Thesis
     Associating machine-processable semantics with the long tail of science data and documents can help overcome challenges associated with data discovery, integration and interoperability caused by data heterogeneity.
  7. What?: Nature of Data
     •  Structured Data (e.g., relational)
     •  Semi-structured, Heterogeneous Documents (e.g., Geoscience publications and technical specs, which usually include text, numerics, maps and images)
     •  Tabular data (e.g., ad hoc spreadsheets and complex tables incorporating “irregular” entries)
  8. What?: Granularity of Semantics and Associated Applications
     •  Lightweight semantics: File-level annotation to enable discovery and sharing of long tail of science data
     •  Richer semantics: Document-level annotation and extraction for semantic search and summarization
     •  Fine-grained semantics: Data integration, interoperability and reasoning in Linked Open Data
  9. Why?: Benefits of Lightweight Semantics
     •  Ease of use by domain experts
        –  Faster and wider adoption, promoting evolution
     •  Low upfront cost to support
     •  Shallow semantics has wider applicability to a range of documents/data and appeals to a broader community of geoscientists
     •  Bottom line: “Learn to Walk before we Run”
  10. How?: Ingredients for Semantics-based Cyber Infrastructure
      •  Use of community-ratified controlled vocabularies and lightweight ontologies (upper-level, hierarchies)
      •  Ease of self-publishing and discovery
      •  Data citation index to give credit for data sharing
      •  Semi-automatic annotation of data and documents: Manual + Automatic
  11. Example: Lightweight Semantic Registration of Data
      •  Title of data
      •  Keywords: selected from five-tier vocabulary provided
      •  Type of data: maps, excel files, images, text
      •  Data format: structured or unstructured
      •  Description of data: brief unstructured description of content
      •  Contact information of provider(s): name of provider(s), email for verification, lineage
      •  Spatial extent of data and reference system: location
      •  Temporal extent of data: date range in time or age range if not recent
      •  Date and type of Related Publication(s): Journal, Thesis, Agency report, not published
      •  Host site for publication: Journal, Library, Personal computer
      •  Access restrictions: copyright regulations
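The registration fields listed on this slide could be captured in a small record type with basic sanity checks. The following is a minimal sketch only: the class name, field names, bounding-box layout, and checks are all assumptions made for illustration, not the schema used by the authors' system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class DataRegistration:
    """File-level registration record; all field names are hypothetical."""
    title: str
    keywords: List[str]
    data_type: str                    # e.g. "map", "excel file", "image", "text"
    structured: bool                  # data format: structured or unstructured
    description: str                  # brief free-text description of content
    provider_name: str
    provider_email: str               # used for verification
    # Spatial extent as (min_lon, min_lat, max_lon, max_lat); layout is an assumption.
    bbox: Optional[Tuple[float, float, float, float]] = None
    # Temporal extent as (start, end) calendar dates; age ranges are not modeled here.
    date_range: Optional[Tuple[date, date]] = None
    publication_type: str = "not published"
    host_site: str = ""
    access_restrictions: str = ""

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the record passes."""
        issues = []
        if not self.title.strip():
            issues.append("title is empty")
        if not self.keywords:
            issues.append("no keywords given")
        if "@" not in self.provider_email:
            issues.append("provider email looks invalid")
        if self.bbox is not None:
            min_lon, min_lat, max_lon, max_lat = self.bbox
            if min_lon > max_lon or min_lat > max_lat:
                issues.append("spatial extent is inverted")
        if self.date_range is not None and self.date_range[0] > self.date_range[1]:
            issues.append("temporal extent is inverted")
        return issues


# Made-up example record, for illustration only.
record = DataRegistration(
    title="Example stream gauge readings",
    keywords=["hydrology", "stream flow"],
    data_type="excel file",
    structured=True,
    description="Made-up example record for illustration only.",
    provider_name="A. Scientist",
    provider_email="a.scientist@example.org",
    bbox=(-84.3, 39.6, -84.0, 39.9),
    date_range=(date(2012, 1, 1), date(2012, 12, 31)),
)
print(record.problems())
```

Keeping the record flat and mostly free-text is in the spirit of the lightweight, file-level annotation the talk argues for; richer validation (for example, checking keywords against a controlled vocabulary) could be layered on later.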
  12. System Architecture and Components
  13. Deeper Issues: Semantic Formalization of Tabular Data
      Problems and A Practical Approach (“When rubber meets the road”)
  14. Nature of tables
      •  Compact structures for sharing information
         –  Minimize duplication
      •  Types of Tables
         –  Regular: Dense grid with explicit schema information in terms of column and row headings => Tractable
         –  Irregular: Sparse grid with implicit schema and ad hoc placement of headings => Hard
  15. Challenges Associated with Typical Spreadsheet/Table
      •  Meant for human consumption
      •  Irregular:
         –  Not a simple rectangular grid
      •  Heterogeneous
         –  Not all rows interpreted similarly
      •  Complex
         –  Meaning of each row and each column is context dependent
      •  Footnotes modify meaning of entries (esp. in materials and process specifications)
  16. Practical Semi-Automatic Content Extraction
      •  DESIGN: Develop regular data structures that can be used to formalize tabular information.
         –  Provide a natural expression of data
         –  Provide semantics to data, thereby removing potential ambiguities
         –  Enable automatic translation
      •  USE: Manual population of regular tables and automatic translation into LOD
  17. Expressive search and integration using Geospatial information
  18. Outline
      •  Query Language Support for SpatioTemporal Context: SPARQL-ST (=> GeoSPARQL)
      •  Practical Applications that use SpatioTemporal information for joining Sensor Data to enable Machine Perception
  19. Overview: SPARQL-ST
      •  SPARQL
         –  W3C recommended query language for RDF data (as of Jan. 15, 2008)
         –  Graph pattern-based queries (subgraph match)
      •  SPARQL-ST
         –  Spatial variables
         –  Temporal variables
         –  Spatial filter expressions
         –  Temporal filter expressions
  20. Find all politicians that represent areas within 100 miles of the district represented by Nancy Pelosi.
      SELECT ?n
      WHERE {
        ?p foaf:name ?n .
        ?p usgov:hasRole ?r .
        ?r usgov:forOffice ?o .
        ?o usgov:represents ?q .
        ?q stt:located_at %g .
        ?a foaf:name "Nancy Pelosi" .
        ?a usgov:hasRole ?b .
        ?b usgov:forOffice ?c .
        ?c usgov:represents ?d .
        ?d stt:located_at %h .
        SPATIAL FILTER (distance(%g, %h) <= 100 miles)
      }
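To make the `SPATIAL FILTER (distance(%g, %h) <= 100 miles)` clause concrete, the sketch below applies the same kind of test to plain coordinates. It is an approximation under assumptions: each district is reduced to a single representative (longitude, latitude) point, distance is great-circle (haversine) distance, and the district names and coordinates are invented. How SPARQL-ST itself measures distance between geometries is not specified on the slide.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

Point = Tuple[float, float]  # (longitude, latitude) in degrees


def haversine_miles(a: Point, b: Point) -> float:
    """Great-circle distance between two lon/lat points, in miles."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def within_distance(places: Dict[str, Point], anchor: str, limit_miles: float) -> List[str]:
    """Names of all places within limit_miles of the anchor place.

    Like the query on the slide, this does not exclude the anchor itself.
    """
    origin = places[anchor]
    return sorted(name for name, point in places.items()
                  if haversine_miles(origin, point) <= limit_miles)


# Hypothetical representative points for three districts.
districts = {
    "district_a": (-122.42, 37.77),
    "district_b": (-122.27, 37.80),
    "district_c": (-118.24, 34.05),
}
nearby = within_distance(districts, "district_a", 100.0)
print(nearby)
```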
  21. Find all politicians representing congressional districts within a given geographical area at any time in October 2013
      SELECT ?p
      WHERE {
        ?p usgov:hasRole ?r #t1 .
        ?r usgov:forOffice ?o #t2 .
        ?o usgov:represents ?c #t3 .
        ?c stt:located_at %g #t4 .
        SPATIAL FILTER (inside(%g, GEOM(POLYGON ((-75.14 40.88, -70.77 40.88, -70.77 42.35, -75.14 42.35, -75.14 40.88)))))
        TEMPORAL FILTER (anyinteract(intersect(#t1, #t2, #t3, #t4), interval(10:01:2013, 10:31:2013, MM:DD:YYYY)))
      }
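The two filters in this query can be read as a containment test plus an interval-overlap test. The sketch below mimics them in plain Python under simplifying assumptions: the district is reduced to one representative point (the real `inside` operates on geometries), time is handled at day granularity with closed intervals, and the validity intervals for the four triples are invented. The polygon coordinates and the October 2013 window come from the slide.

```python
from datetime import date
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[date, date]   # closed interval (start, end)
Point = Tuple[float, float]    # (longitude, latitude)


def intersect(intervals: Sequence[Interval]) -> Optional[Interval]:
    """Common sub-interval of all inputs, or None if they do not all overlap."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start <= end else None


def any_interact(a: Optional[Interval], b: Interval) -> bool:
    """True if the two closed intervals share at least one day."""
    return a is not None and a[0] <= b[1] and b[0] <= a[1]


def point_in_ring(point: Point, ring: List[Point]) -> bool:
    """Ray-casting test: count ring edges crossed by a ray going in +x from the point."""
    x, y = point
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Polygon and time window taken from the slide.
REGION = [(-75.14, 40.88), (-70.77, 40.88), (-70.77, 42.35), (-75.14, 42.35), (-75.14, 40.88)]
OCTOBER_2013 = (date(2013, 10, 1), date(2013, 10, 31))

# Hypothetical validity intervals for the four triples (#t1..#t4).
valid = [
    (date(2013, 1, 3), date(2015, 1, 3)),
    (date(2013, 1, 3), date(2015, 1, 3)),
    (date(2012, 1, 1), date(2022, 1, 1)),
    (date(2012, 1, 1), date(2022, 1, 1)),
]
hit = point_in_ring((-73.0, 41.5), REGION) and any_interact(intersect(valid), OCTOBER_2013)
print(hit)
```

Intersecting the four intervals first means a match requires a period in which the whole chain (role, office, district, location) held simultaneously, which is the reading suggested by `intersect(#t1, #t2, #t3, #t4)` in the query.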
  22. Summary of SPARQL-ST
      •  Relationship-centric nature of the RDF data model extended for querying spatio-temporal-thematic (STT) data
      •  Querying
         –  Supports spatial and temporal relationships in graph pattern queries
         –  Integrates well with current standards
      •  Implementation
         –  Good scalability on large synthetic/real-world data
         –  Only system for spatial and temporal RDF
  23. OGC GeoSPARQL
      Slides by Matt Perry of Oracle (also: Kno.e.sis Alumnus)
      4th Annual Spatial Ontology Community of Practice Workshop (SOCoP)
      USGS, 12201 Sunrise Valley Drive, Reston VA
      December 2, 2011
  24. What Does GeoSPARQL Give Us?
      •  Vocabulary for Query Patterns
         –  Classes
            •  Spatial Object, Feature, Geometry
         –  Properties
            •  Topological relations
            •  Links between features and geometries
         –  Datatypes for geometry literals
            •  ogc:WKTLiteral, ogc:GMLLiteral
      •  Query Functions
         –  Topological relations, distance, buffer, intersection, …
      •  Entailment Components
         –  RIF rules to expand feature-feature query into geometry query
         –  Gives a common interface for qualitative and quantitative systems
  25. Example Query
      Find the three closest Mexican restaurants
      PREFIX : <>
      PREFIX ogc: <>
      PREFIX ogcf: <>
      PREFIX epsg: <>
      SELECT ?restaurant
      WHERE {
        ?restaurant rdf:type :Restaurant .
        ?restaurant :cuisine :Mexican .
        ?restaurant :pointGeometry ?rGeo .
        ?rGeo ogc:asWKT ?rWKT
      }
      ORDER BY ASC(ogcf:distance("POINT(…)"^^ogc:WKTLiteral, ?rWKT, ogc:KM))
      LIMIT 3
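The `ORDER BY ASC(ogcf:distance(...)) LIMIT 3` pattern is a k-nearest selection over WKT point literals. A rough equivalent outside a triple store might look like the sketch below. Everything in it is an assumption for illustration: the restaurant identifiers and coordinates are invented, WKT points are read as "longitude latitude", only simple `POINT(x y)` strings are parsed, and distance is great-circle distance in kilometres rather than whatever a given GeoSPARQL engine computes.

```python
import heapq
import math
import re
from typing import Dict, List, Tuple

_POINT = re.compile(r"\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*")


def parse_wkt_point(wkt: str) -> Tuple[float, float]:
    """Parse 'POINT(x y)' into (x, y); other WKT geometry types are rejected."""
    match = _POINT.fullmatch(wkt)
    if match is None:
        raise ValueError(f"not a simple WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


def distance_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lon, lat) points."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest(origin_wkt: str, candidates: Dict[str, str], k: int = 3) -> List[str]:
    """Names of the k candidates nearest to the origin, nearest first."""
    origin = parse_wkt_point(origin_wkt)
    return heapq.nsmallest(
        k, candidates,
        key=lambda name: distance_km(origin, parse_wkt_point(candidates[name])),
    )


# Made-up restaurants keyed by identifier, each with a WKT point.
restaurants = {
    "r1": "POINT(0 0.1)",
    "r2": "POINT(0 0.5)",
    "r3": "POINT(0 0.2)",
    "r4": "POINT(0 1.0)",
}
top3 = closest("POINT(0 0)", restaurants, k=3)
print(top3)
```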
  26. Practical Applications that use Spatio-Temporal information for joining Sensor Data to enable Machine Perception
  27. Applications using spatial and/or temporal information
      •  Location-aware applications
         –  Foursquare
         –  OpenStreetMap
      •  Spatio-temporal-thematic (STT) context-enhanced data integration, querying, and inferencing (machine perception)
         –  Semantic Sensor Web (+ SemSOS)
            •  Abstract weather sensor data streams to weather features
  28. (cont’d)
      •  Applications supporting expressive queries
         –  Human comprehensible vs machine processable
            •  Geonames (LOD) ↔ Lat-long, GPS data
               –  What is the current temperature or traffic delay at Dayton International Airport?
         –  Knowledge-based query expansion/reasoning
            •  Bridging vocabulary mismatches in the queries and the data, e.g., using semantic relationships between regions and landmark locations
               –  Find schools in OH
               –  Find schools near Wright State University
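Knowledge-based query expansion for a request such as "Find schools in OH" can be illustrated with a small containment hierarchy: the region named in the query is expanded to every place known to lie inside it before matching against the data. The sketch below is a toy under stated assumptions: the `PART_OF` and `ALIASES` tables are hand-written stand-ins for a gazetteer such as Geonames, and the school records are invented.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hand-written "is located in" facts standing in for a real gazetteer (assumption).
PART_OF = {
    "Wright State University": "Dayton",
    "Dayton": "Ohio",
    "Columbus": "Ohio",
    "Indianapolis": "Indiana",
}
ALIASES = {"OH": "Ohio"}  # maps query vocabulary onto data vocabulary


def expand_region(name: str) -> Set[str]:
    """Return the region plus every place transitively contained in it."""
    root = ALIASES.get(name, name)
    children = defaultdict(set)
    for child, parent in PART_OF.items():
        children[parent].add(child)
    found, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def find_in(records: Dict[str, str], region: str) -> List[str]:
    """Names of records whose recorded place falls inside the queried region."""
    places = expand_region(region)
    return sorted(name for name, place in records.items() if place in places)


# Made-up school records, each tagged only with a city name.
schools = {
    "School A": "Dayton",
    "School B": "Columbus",
    "School C": "Indianapolis",
}
in_ohio = find_in(schools, "OH")
print(in_ohio)
```

The records never mention "OH" or "Ohio"; the match succeeds only because the region-to-city relationships bridge the vocabulary gap, which is the point the slide makes. A "near Wright State University" query would additionally need a distance or neighborhood relation, which this sketch does not model.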
  29. Semantic Sensor Observation Service Architecture: Making the Data Smart
  30. SSW demo with Mesowest data (Machine Perception)
  31. Implementation of Perception Cycle
  32. Trusted Perception Cycle Demo
  33. Sensor Discovery on Linked Data Demo
  34. Thank you, and please visit us at Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA