Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
Krishnaprasad Thirunarayan and Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH-45435

Outline
•  Semantics-empowered Cyberinfrastructure for
Geoscience Applications
–  Approaches, Benefits, and Challenges
(reflecting cost/convenience/pay-off trade-offs)

•  Expressive search and integration using
Geospatial information
–  SPARQL enhancements
–  Practical applications using semantic technologies,
sensor data streams, and spatial information

Semantics-empowered
Cyberinfrastructure for Geoscience
applications

Domain Goals and Challenges
Data-driven understanding of the evolution of oceans,
atmosphere, and solid earth over time through physical,
chemical and biological processes.
•  Cultural challenges
–  Proper protection, control, and credit for sharing data

•  Technological challenges
–  Computational tools and repositories conducive to easy
exchange, curation, and attribution of data

Data sharing can promote re-analysis/re-interpretation of
extant data, reducing “redundant” data collections.

Category of Geoscience Data: Short tail science data, created by large organizations and projects
•  Characteristics: Few, large (TB+), structured, spatially rich (e.g., remote sensing), largely homogeneous, highly visible, curated
•  Strategy for Reuse: Planned integration strategies; could use formal ontologies / domain models and vocabularies, visualization tools and APIs
•  CI Strategy: Data centers / grids, generally using relational databases and files, maintained by people with significant IT skills

Category of Geoscience Data: Long tail science data, created by individual scientists and small groups
•  Characteristics: Many, small (GB+), heterogeneous, invisible (except via publications), poorly curated
•  Strategy for Reuse: Multi-domain and broad vocabularies (including community-established ones); create semantic metadata (annotations) and optionally publish, search and download legacy data, or use an open data initiative
•  CI Strategy: Web-based, easy to learn and use semantic tools for annotation, publication, search and download that can be used by individual scientists without significant IT skills

Our Thesis
Associating machine-processable semantics
with the long tail of science data and
documents can help overcome challenges
associated with data discovery, integration
and interoperability caused by data
heterogeneity.

What?: Nature of Data
•  Structured Data (e.g., relational)
•  Semi-structured, Heterogeneous Documents
(e.g., Geoscience publications and technical
specs, which usually include text, numerics, maps
and images)
•  Tabular data (e.g., ad hoc spreadsheets and
complex tables incorporating “irregular” entries)
What?: Granularity of Semantics and Associated Applications

•  Lightweight semantics: File-level annotation to
enable discovery and sharing of long tail of
science data
•  Richer semantics: Document-level annotation and
extraction for semantic search and summarization
•  Fine-grained semantics: Data integration,
interoperability and reasoning in Linked Open Data
Why?: Benefits of Lightweight Semantics
•  Ease of use by domain experts
–  Faster and wider adoption, promoting evolution

•  Low upfront cost to support
•  Shallow semantics has wider applicability to a
range of documents/data and appeal to a broader
community of geoscientists
•  Bottom-line: “Learn to Walk before we Run”
How?: Ingredients for Semantics-based Cyber Infrastructure

•  Use of community-ratified controlled vocabularies
and lightweight ontologies (upper-level,
hierarchies)
•  Ease self-publishing and discovery
•  Data citation index to give credit for data sharing
•  Semi-automatic annotation of data and
documents: Manual + Automatic

Example: Lightweight Semantic Registration of Data
•  Title of data
•  Keywords: selected from the five-tier vocabulary provided
•  Type of data: maps, Excel files, images, text
•  Data format: structured or unstructured
•  Description of data: brief unstructured description of content
•  Contact information of provider(s): name of provider(s), email for verification, lineage
•  Spatial extent of data and reference system: location
•  Temporal extent of data: date range in time, or age range if not recent
•  Date and type of related publication(s): Journal, Thesis, Agency report, not published
•  Host site for publication: Journal, Library, Personal computer
•  Access restrictions: copyright regulations
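
The registration fields above could be captured as a small record type. This is a minimal sketch only: the class name, field names, and the choice of which fields are optional are assumptions made for illustration, not the deployed schema.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional, Tuple

ALLOWED_FORMATS = {"structured", "unstructured"}


@dataclass
class DataRegistration:
    """File-level (lightweight) metadata for one shared data set.

    Field names are hypothetical; they mirror the rows of the form above.
    """
    title: str
    keywords: List[str]        # chosen from a controlled vocabulary
    data_type: str             # e.g. "map", "excel file", "image", "text"
    data_format: str           # "structured" or "unstructured"
    description: str
    providers: List[str]
    contact_email: str
    # The fields below are treated as optional in this sketch (an assumption).
    spatial_extent: Optional[Tuple[float, float, float, float]] = None  # west, south, east, north
    reference_system: Optional[str] = None
    temporal_extent: Optional[Tuple[str, str]] = None
    related_publication: Optional[str] = None
    host_site: Optional[str] = None
    access_restrictions: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the two formats listed on the form.
        if self.data_format not in ALLOWED_FORMATS:
            raise ValueError(f"data_format must be one of {sorted(ALLOWED_FORMATS)}")

    def missing_fields(self) -> List[str]:
        """Names of optional fields the provider has not filled in yet."""
        return [name for name, value in asdict(self).items() if value is None]
```

A form front end could call `missing_fields()` to prompt the provider for the remaining entries without blocking registration.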
System Architecture and Components

Deeper Issues: Semantic Formalization
of Tabular Data
Problems and A Practical Approach
(“When rubber meets the road”)

Nature of tables
•  Compact structures for sharing information
–  Minimize duplication

•  Types of Tables
–  Regular : Dense Grid with explicit schema
information in terms of column and row
headings => Tractable
–  Irregular: Sparse Grid with implicit schema and
ad hoc placement of heading => Hard
Challenges Associated with Typical Spreadsheet/Table

•  Meant for human consumption
•  Irregular
–  Not a simple rectangular grid
•  Heterogeneous
–  Not all rows are interpreted the same way
•  Complex
–  Meaning of each row and each column is
context-dependent
•  Footnotes modify meaning of entries (esp. in materials
and process specifications)
Practical Semi-Automatic Content Extraction
•  DESIGN: Develop regular data structures that
can be used to formalize tabular information.
–  Provide a natural expression of data
–  Provide semantics to data, thereby removing potential
ambiguities
–  Enable automatic translation

•  USE: Manual population of regular tables and
automatic translation into LOD
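
The "automatic translation" step can be sketched as follows, assuming a regular table supplied as CSV text whose first row holds the column headings and whose first column holds a unique row identifier. The `http://example.org/` namespace and the row/prop URI layout are hypothetical, and all cell values are emitted as plain literals.

```python
import csv
import io
from typing import Iterator
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def table_to_triples(csv_text: str, base: str = BASE) -> Iterator[str]:
    """Translate a regular table into N-Triples-style lines.

    Each non-empty cell becomes one triple:
    (row identifier, column heading, cell value).
    """
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)
    for row in rows:
        if not row:
            continue  # skip blank lines
        subject = f"<{base}row/{quote(row[0].strip())}>"
        for heading, cell in zip(header[1:], row[1:]):
            value = cell.strip()
            if value == "":
                continue  # empty cells produce no triple
            predicate = f"<{base}prop/{quote(heading.strip())}>"
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            yield f'{subject} {predicate} "{literal}" .'
```

Irregular tables would first have to be re-entered manually into this regular shape, which is the "USE" step above.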

Expressive search and integration
using Geospatial information

Outline
•  Query Language Support for SpatioTemporal Context: SPARQL-ST
(=> GeoSPARQL)
•  Practical Applications that use SpatioTemporal information for joining Sensor
Data to enable Machine Perception
Overview : SPARQL-ST
•  SPARQL
–  W3C recommended query language for RDF data (as of
Jan. 15, 2008)
–  Graph pattern-based queries (subgraph match)

•  SPARQL-ST
–  Spatial variables
–  Temporal variables
–  Spatial filter expressions
–  Temporal filter expressions
Find all politicians that represent areas within 100 miles of the
district represented by Nancy Pelosi.
SELECT ?n
WHERE {
  ?p foaf:name ?n .
  ?p usgov:hasRole ?r .
  ?r usgov:forOffice ?o .
  ?o usgov:represents ?q .
  ?q stt:located_at %g .
  ?a foaf:name "Nancy Pelosi" .
  ?a usgov:hasRole ?b .
  ?b usgov:forOffice ?c .
  ?c usgov:represents ?d .
  ?d stt:located_at %h .
  SPATIAL FILTER (distance(%g, %h) <= 100 miles)
}
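
To illustrate what the spatial filter evaluates, here is a minimal sketch that assumes each geometry has been reduced to a single representative point and uses great-circle (haversine) distance. An actual SPARQL-ST engine may compute distance between full geometries differently; the function names and the choice to exclude the anchor itself are assumptions.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_miles(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def within(locations: Dict[str, Point], anchor: str, limit_miles: float) -> List[str]:
    """Names whose point lies within limit_miles of the anchor's point.

    The anchor itself is excluded here; the query as written would also
    return it, since its distance to itself is zero.
    """
    origin = locations[anchor]
    return sorted(
        name
        for name, point in locations.items()
        if name != anchor and haversine_miles(origin, point) <= limit_miles
    )
```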

Find all politicians representing congressional districts within a
given geographical area at any time in October 2013
SELECT ?p
WHERE {
  ?p usgov:hasRole ?r #t1 .
  ?r usgov:forOffice ?o #t2 .
  ?o usgov:represents ?c #t3 .
  ?c stt:located_at %g #t4 .
  SPATIAL FILTER (inside(%g, GEOM(POLYGON ((
      -75.14 40.88, -70.77 40.88, -70.77 42.35,
      -75.14 42.35, -75.14 40.88))) ))
  TEMPORAL FILTER (
      anyinteract(intersect (#t1, #t2, #t3, #t4),
                  interval(10:01:2013, 10:31:2013, MM:DD:YYYY)))
}
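
The temporal filter can be read as: intersect the validity intervals of all matched statements, then test whether the result overlaps October 2013. A minimal sketch of that reading follows, assuming closed intervals at day granularity. The Python helpers borrow the names `intersect` and `anyinteract` from the query, but their exact SPARQL-ST semantics may differ.

```python
from datetime import date
from typing import Optional, Tuple

Interval = Tuple[date, date]  # closed interval [start, end]


def intersect(*intervals: Interval) -> Optional[Interval]:
    """Common sub-interval of all inputs, or None if they do not all overlap."""
    start = max(i[0] for i in intervals)
    end = min(i[1] for i in intervals)
    return (start, end) if start <= end else None


def anyinteract(a: Optional[Interval], b: Interval) -> bool:
    """True if the two intervals share at least one day."""
    return a is not None and a[0] <= b[1] and b[0] <= a[1]
```

Only a path through the graph whose statements were all valid at some common time inside the query interval satisfies the filter.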
Summary of SPARQL-ST
•  Relationship-centric nature of the RDF data model
extended for querying STT data
•  Querying
–  Supports spatial and temporal relationships in graph
pattern queries
–  Integrates well with current standards
•  Implementation
–  Good scalability on large synthetic/real-world data
–  Only system for spatial and temporal RDF
OGC GeoSPARQL
Slides by Matt Perry of Oracle
(also: Kno.e.sis Alumnus)
4th Annual Spatial Ontology Community of Practice Workshop (SOCoP)
USGS, 12201 Sunrise Valley Drive, Reston, VA
December 2, 2011

What Does GeoSPARQL Give Us?
•  Vocabulary for Query Patterns
–  Classes
•  Spatial Object, Feature, Geometry

–  Properties
•  Topological relations
•  Links between features and geometries

–  Datatypes for geometry literals
•  ogc:WKTLiteral, ogc:GMLLiteral

•  Query Functions
–  Topological relations, distance, buffer, intersection, …

•  Entailment Components
–  RIF rules to expand feature-feature query into geometry query
–  Gives a common interface for qualitative and quantitative systems

Example Query
Find the three closest Mexican restaurants
PREFIX :     <http://my.com/appSchema#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ogc:  <http://www.opengis.net/geosparql#>
PREFIX ogcf: <http://www.opengis.net/geosparql/functions#>
PREFIX epsg: <http://www.opengis.net/def/crs/EPSG/0/>

SELECT ?restaurant
WHERE { ?restaurant rdf:type       :Restaurant .
        ?restaurant :cuisine       :Mexican .
        ?restaurant :pointGeometry ?rGeo .
        ?rGeo       ogc:asWKT      ?rWKT }
ORDER BY ASC(ogcf:distance("POINT(…)"^^ogc:WKTLiteral,
                           ?rWKT, ogc:KM))
LIMIT 3

Practical Applications
that use Spatio-Temporal information
for joining Sensor Data to enable
Machine Perception

Applications using spatial and/or temporal information
•  Location-aware applications
–  Foursquare
–  OpenStreetMap
•  Spatio-temporal-thematic (STT) context-enhanced
data integration, querying, and
inferencing (machine perception)
–  Semantic Sensor Web (+ SemSOS)
•  Abstract weather sensor data streams to
weather features
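
Abstracting raw observations into weather features can be sketched with a few rules. The feature names and thresholds below are made up for illustration and are not the rules used in the Semantic Sensor Web or SemSOS.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    station: str
    temperature_f: float
    wind_mph: float
    precipitation: str  # "none", "rain" or "snow"


def weather_features(obs: Observation) -> List[str]:
    """Map one observation to zero or more higher-level features.

    All thresholds are hypothetical placeholders.
    """
    features: List[str] = []
    freezing = obs.temperature_f <= 32.0
    high_wind = obs.wind_mph >= 35.0
    if obs.precipitation == "snow" and freezing and high_wind:
        features.append("Blizzard")
    elif obs.precipitation == "rain" and freezing:
        features.append("FreezingRain")
    # Report high wind on its own only when it is not already part of a blizzard.
    if high_wind and "Blizzard" not in features:
        features.append("HighWind")
    return features
```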
(cont’d)
•  Applications supporting expressive queries
–  Human comprehensible vs machine processable
•  Geonames (LOD) ↔ Lat-long, GPS data
–  What is the current temperature or traffic delay at Dayton
International Airport?

–  Knowledge-based query expansion/reasoning
•  Bridging vocabulary mismatches in the queries and the data,
e.g., using semantic relationships between regions and
landmark locations
–  Find schools in OH
–  Find schools near Wright State University

Semantic Sensor Observation Service Architecture :
Making the Data Smart

SSW demo with Mesowest data (Machine Perception)

http://archive.knoesis.org/projects/sensorweb/demos/semsos_mesowest/ssos_demo.htm
Implementation of Perception Cycle

Trusted Perception Cycle Demo

http://www.youtube.com/watch?v=lTxzghCjGgU
Sensor Discovery on Linked Data Demo

http://archive.knoesis.org/projects/sensorweb/demos/sensor_discovery_on_lod/sample.htm
Kno.e.sis
Thank you, and please visit us at

http://knoesis.org/

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA

