Semantic Technologies for Big Sciences including Astrophysics

Semantic Technologies for Big Science and Astrophysics
Invited presentation: EarthCube Solar-Terrestrial End-User Workshop
NJIT, Newark NJ, August 13-15, 2014
Amit Sheth, T. K. Prasad
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing

Astrophysics
Lots of data
Heterogeneous
Complex
http://en.wikipedia.org/wiki/Astrophysics#mediaviewer/File:NGC_4414_%28NASA-med%29.jpg

Challenge
• How can we handle this vast, heterogeneous,
and complex data space?
• Focus on complexity rather than raw processing:
integration, collaboration, reuse
Can Semantic (Web) technologies ease
the challenges and empower the scientists?

The Semantic Web vision: 1999-2001
• Sir Tim Berners-Lee, in his 1999 book “Weaving the Web”,
emphasized the significance of metadata about Web
documents.
• A well-known May 2001 article presented an agent- and
AI-based vision for the “next generation of the World Wide Web”,
with content amenable to automation.
• With Taalee (later Voquette, then Semagix), which I founded in 1999, I
pursued a highly practical realization with semantic search,
browsing, and analysis products. We had commercial
applications starting in 2000 and a patent awarded in 2001.

1
• Agreement and Knowledge: Agreement about a
common vocabulary/nomenclature, conceptual
models and domain knowledge, ontology
– Codified as Schema + Knowledge Base.
– Agreement is what enables interoperability.
– Formal machine processable description is what
leads to automation.
– Manual, semi-automated, automated creation of
ontologies

2
• Semantic Annotation (Metadata Extraction):
Associating meaning with data, or labeling data so
it is more meaningful to the system and people.
– Manual
– Semi-automatic (automatic with human
verification)
– Automatic
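The annotation modes above can be illustrated with a minimal dictionary-based sketch. The vocabulary, the `astro:` identifiers, and the function names are all hypothetical and chosen only for illustration; they are not taken from any tool mentioned in this talk.

```python
import re

# Hypothetical controlled vocabulary: surface form -> concept identifier.
VOCABULARY = {
    "solar flare": "astro:SolarFlare",
    "coronal mass ejection": "astro:CoronalMassEjection",
    "sunspot": "astro:Sunspot",
}


def annotate(text, vocabulary=VOCABULARY):
    """Automatic mode: return (start, end, surface form, concept) per match."""
    annotations = []
    for term, concept in vocabulary.items():
        # Allow a simple plural "s"; real systems need proper normalization.
        pattern = r"\b" + re.escape(term) + r"s?\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(
                (match.start(), match.end(), match.group(0), concept)
            )
    return sorted(annotations)


def needs_review(annotation, confirmed_concepts):
    """Semi-automatic mode: flag annotations no human has confirmed yet."""
    return annotation[3] not in confirmed_concepts


SENTENCE = "A solar flare was followed by a coronal mass ejection near two sunspots."
ANNOTATIONS = annotate(SENTENCE)
```

In the semi-automatic setting, only the annotations for which `needs_review` returns `True` would be shown to a person for verification.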

3
• Reasoning/Computation, Applications:
– Semantics enabled search, browsing
– Data integration, collaboration
– Visualization
– Analyses including pattern discovery, mining, hypothesis
validation
– Answering complex queries, making connections (paths,
sub graphs), supporting discovery

How to integrate well? From Syntax to Semantics

Using Semantics to Climb Levels of Abstraction: an example
Level 3: Interpreted data (abductive) [in OWL], e.g., diagnosis: Hyperthyroidism, …
Level 2: Interpreted data (deductive) [in OWL], e.g., threshold: Elevated Blood Pressure
Level 1: Annotated data [in RDF], e.g., label: Systolic blood pressure of 150 mmHg
Level 0: Raw data [in TEXT], e.g., number: “150”
Figure labels: SSN Ontology, Intellego
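The climb from a raw number to a diagnosis can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the 140 mmHg cutoff and the small `EXPLAINS` table are placeholders invented for the example, not content from the SSN ontology or Intellego, and plain Python objects stand in for RDF and OWL.

```python
from dataclasses import dataclass

# Assumed cutoff for illustration only; not taken from the slides.
ELEVATED_SYSTOLIC_MMHG = 140


@dataclass
class Observation:
    property_name: str
    value: float
    unit: str


def annotate(raw, property_name, unit):
    """Level 0 -> 1: attach a label and a unit to a raw reading."""
    return Observation(property_name, float(raw), unit)


def interpret(obs):
    """Level 1 -> 2: deductive step, a threshold rule over annotated data."""
    if (obs.property_name == "systolic blood pressure"
            and obs.value > ELEVATED_SYSTOLIC_MMHG):
        return "Elevated Blood Pressure"
    return None


# Hypothetical background knowledge: disorder -> findings it can explain.
EXPLAINS = {
    "Hyperthyroidism": {"Elevated Blood Pressure", "Weight Loss"},
    "Hypothyroidism": {"Weight Gain"},
}


def explain(findings):
    """Level 2 -> 3: abductive step, disorders that cover every finding."""
    return sorted(d for d, covered in EXPLAINS.items() if findings <= covered)


OBSERVATION = annotate("150", "systolic blood pressure", "mmHg")
```

Calling `interpret(OBSERVATION)` yields the level-2 abstraction, and passing that finding to `explain` yields candidate level-3 explanations.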

Semantic Web technologies – in practice
● Ontologies to capture domain knowledge (sometimes
taxonomy/nomenclature is good enough)
● Languages to represent/capture domain knowledge
and data - OWL, RDF/RDFS.
● Data sharing and publishing online (e.g., LOD).
● Annotation, semantic search, semantic browsing
● Provenance,…
Widely used in biomedicine; quite a few applications in
healthcare, growing use and explorations in geosciences
and more…

In this talk, I will review/borrow from
• ScienceWISE at EPFL which uses semantic
technology to serve Physicists including
Astrophysicists: shared vocabulary, annotation,
browsing for related concepts
• Semantic (web) technologies for health care and
life sciences encompassing collaborative research,
prototypes, open source tools and ontologies,
deployed applications, commercialization,…
• MaterialWays: Our project in Materials Genome
Initiatives …

“Ontology” in physics domain – ScienceWISE
● ScienceWISE
WISE - Web based Interactive Semantic Environment
● An interactive and crowdsourced tool to capture
knowledge from scientists’ daily routine work.
● Core consists of a community built ontology.
● Literature gets annotated and bookmarked using
the ontology.

ontology
annotation
bookmarking &
recommendations
http://sciencewise.info/

Value Proposition
Associating machine-processable semantics
with scientific, engineering data and
documents can help overcome challenges
associated with data discovery, integration
and interoperability caused by data
heterogeneity.

Benefits of using semantics for Astrophysicists (and other sciences)
• Challenges
– Massive volume
– Heterogeneity (e.g., many sources, formats and structures, text,
images).
– Interoperability and sharing data
– Provenance and Access Control.
• Need techniques beyond ScienceWISE
– Interested in data beyond scientific publications
– Data sharing (and credit/data citation for data sharing)
– Provenance and Access control
– A framework to capture, search, and discover astrophysical
data

Nature of Data and Documents
Relational/Tabular Data
XML document
Image
Technical Specs
Irregular Tables
Publications

Granularity of Semantics and Applications: Examples
• Synonyms
– Chemistry, Chemical Composition, Chemical Analysis, ...
– Bend Test, Bending, ...
– Delivery Condition, Process/Surface Finish, Temper, "as received by
purchaser", ...
• Coreference vs broadening/narrowing
– Tubing vs welded tubing vs flash-welded part
• Capturing characteristic-value pairs
– Recognize and Normalize: “0.1 inch and under in nominal thickness”
is translated to “Thickness <= 0.1 in”.
– Glean elided characteristic: controlled term “solution heat treated”
implies the characteristic “heat treat type”.
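The "recognize and normalize" step for characteristic-value pairs could look roughly like the sketch below. The regular expression, the comparator and unit tables, and the function names are assumptions made for illustration; a real extractor would need far broader coverage of phrasings and units.

```python
import re

# Assumed phrase-to-operator and unit tables; illustrative, not exhaustive.
COMPARATORS = {"and under": "<=", "and over": ">=", "under": "<", "over": ">"}
UNITS = {"inches": "in", "inch": "in", "in.": "in", "mm": "mm"}

PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>inches|inch|in\.|mm)\s+"
    r"(?P<cmp>and under|and over|under|over)\s+in\s+"
    r"(?:nominal\s+)?(?P<characteristic>\w+)",
    re.IGNORECASE,
)

# Controlled terms that imply a characteristic which the text leaves out.
IMPLIED_CHARACTERISTIC = {"solution heat treated": "heat treat type"}


def normalize(phrase):
    """Turn a free-text limit into 'Characteristic <op> value unit'."""
    match = PATTERN.search(phrase)
    if match is None:
        return None
    characteristic = match.group("characteristic").capitalize()
    operator = COMPARATORS[match.group("cmp").lower()]
    unit = UNITS[match.group("unit").lower()]
    return f"{characteristic} {operator} {match.group('value')} {unit}"


def implied_pair(term):
    """Recover an elided characteristic for a controlled term, if known."""
    characteristic = IMPLIED_CHARACTERISTIC.get(term.lower())
    return (characteristic, term) if characteristic else None
```

For the example on this slide, `normalize("0.1 inch and under in nominal thickness")` produces `"Thickness <= 0.1 in"`.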

Granularity of Semantics and Associated Applications
• Lightweight semantics: File and document-level
annotation to enable discovery and sharing
• Richer semantics: Data-level annotation and
extraction for semantic search and summarization
• Fine-grained semantics: Data integration and
interoperability.

Using Semantic Web Technologies
Machine-processable semantics achieved by
addressing
• Syntactic Heterogeneity: Using XML syntax and
RDF datamodel (labelled graph structure)
• Semantic Heterogeneity:
– Using “common” controlled vocabularies, taxonomies
and ontologies
– Using federated data sources, exchanges, querying,
and services
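One way to picture how a labelled graph removes syntactic heterogeneity is to flatten a table and an XML document into the same (subject, property, value) shape. The sketch below uses only the Python standard library, with plain tuples standing in for RDF triples; the identifiers and values in the sample data are made up.

```python
import csv
import io
import xml.etree.ElementTree as ET


def triples_from_csv(text, id_column):
    """Each non-key cell of a table row becomes one triple."""
    for row in csv.DictReader(io.StringIO(text)):
        subject = row[id_column]
        for column, value in row.items():
            if column != id_column:
                yield (subject, column, value)


def triples_from_xml(text, id_attribute):
    """Each child element of a record becomes one triple."""
    root = ET.fromstring(text)
    for record in root:
        subject = record.get(id_attribute)
        for child in record:
            yield (subject, child.tag, child.text)


# Made-up sample data describing the same object in two formats.
CSV_DATA = "id,type,instrument\nobj-1,galaxy,telescope-A\n"
XML_DATA = (
    "<objects><object id='obj-1'>"
    "<observed_on>2014-08-13</observed_on>"
    "</object></objects>"
)

# Once both sources are triples, merging them is a set union.
GRAPH = set(triples_from_csv(CSV_DATA, "id")) | set(triples_from_xml(XML_DATA, "id"))
```

Semantic heterogeneity is a separate problem: the property names `type`, `instrument`, and `observed_on` would still have to be mapped to terms from a shared vocabulary.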

Ingredients for Semantics-based Cyber Infrastructure
• Use of community-ratified controlled
vocabularies and lightweight ontologies
(upper-level, hierarchies)
• Semi-automatic annotation of data and
documents
• Support for provenance and access control

A proposed “light-weight semantics” approach
(for highly distributed community, low start up time, long tail science)…

Our applications in
Materials Genome Initiative
MaterialWays (our project related to the Materials Genome Initiative):
http://wiki.knoesis.org/index.php/MaterialWays

Matvocab home page
• Search and discovery
• Annotate documents
• Visualize the knowledge base
• Create process assertions
• Query vocabulary
• View, edit, and add
Annotate, search, and track provenance
• Vocabulary is used to annotate documents.
• Annotated documents can be indexed.
• Documents can be integrated reliably based
on common terms of interest and
provenance information.

Annotate documents using standard vocabulary

Create process assertions (OnCET)
• Add information about inputs to and outputs
of a process as assertions in triple form
using standard vocabulary.
• Add assertions about materials domain
knowledge using vocabulary terms and
relationship among them, e.g., about
process control parameters and
performance characteristics.
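A minimal sketch of assertions in triple form, restricted to a standard vocabulary, might look like this. It does not reproduce the OnCET tool; the class, the vocabulary terms, and the predicates are hypothetical stand-ins.

```python
class AssertionStore:
    """Holds (subject, predicate, object) assertions over a fixed vocabulary."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)
        self.triples = set()

    def add(self, subject, predicate, obj):
        # Reject any term that is not part of the agreed vocabulary.
        for term in (subject, predicate, obj):
            if term not in self.vocabulary:
                raise ValueError(f"'{term}' is not a vocabulary term")
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return assertions matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical vocabulary terms and relationships.
STORE = AssertionStore([
    "autoclave cure", "uncured laminate", "cured laminate", "cure temperature",
    "has input", "has output", "has control parameter",
])
STORE.add("autoclave cure", "has input", "uncured laminate")
STORE.add("autoclave cure", "has output", "cured laminate")
STORE.add("autoclave cure", "has control parameter", "cure temperature")
```

Checking every term against the vocabulary at insertion time is one simple way to keep assertions from different contributors comparable.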

Provenance Metadata
• Explains the origin of an artifact, such as
– How was it created?
– Who created it?
– When was it created?
• Example: for a given material X
– Which processes are involved in making the material and
what are the relevant performance properties?
– What are the inputs, control parameters and outputs of a
process?
– Which research/engineering team performed an
experiment?

Capturing provenance metadata - iExplore
generic PMC prepreg -> subjected to -> generic hand lay-up -> yields -> generic PMC lay-up -> subjected to -> generic autoclave cure -> yields -> generic PMC

Vocabulary Provenance
Sources of vocabulary terms: ASM Handbook, MIL Handbook 5, MIL Handbook 17
Wiki-based Crowd-sourcing Vocabulary
Vocabulary term expressed in RDF and published online (http://knoesis.org/matvocab/A-basis)

Capturing Vocabulary Provenance - iExplore
Definition
Rights
Source
Vocabulary term

Our proposal - Astrophysics
• Tagging, annotation, search
• Knowledgebase -> Ontology
• Provenance at every data level
• Data access control
• Capture process flows
• Capture relationships between concept instances
• Visualization of process flows

ScienceWISE - Physics
• Tagging, annotation, search
• Ontology -> Knowledgebase
• Provenance

Our approach to help in Astrophysics
• Attach access control and provenance details at every
data level, so that the huge amount of astrophysics
data can be handled.
• Create relationships between concepts and
visualize them as a graph.
• Add facts or assertions about each concept.

Figure: Databases, personal desktops, and lab notebooks, with single access.

Public-Private Data Sharing
• Enhance publicly available datasets while
retaining intellectual property data privately for
businesses
– Private data and metadata (e.g., ongoing experimental processes, intellectual property data)
– Selectively shared data and metadata (e.g., with ongoing collaborators, licensed data)
– Public data and metadata (e.g., released products, material specifications)

Federated Architecture
Federal Endpoint
1. User Authentication
2. Federated Semantic Query Processor
3. Semantics Mappings
Components (each with its own AC Processor and Semantic Query Processor over Private, Shared, and Public data):
• OEM partner A
• OEM partner B
• OEM supplier C

Principles of a Federation
• Each component controls access to its local data
independently (local autonomy).
• A query is decomposed into multiple sub-queries;
each sub-query is executed at one component.
• Results from sub-queries are combined by the
federated query processor, which controls global access.
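These principles can be sketched with each component filtering its own data before anything leaves it, and a federated function that only merges results. The classes, the user record layout, and the sample values are hypothetical, and query decomposition is simplified here to sending the same pattern to every component.

```python
class Component:
    """One member of the federation; it alone decides what a user may see."""

    def __init__(self, name, records):
        self.name = name
        # Each record: (subject, predicate, object, visibility), where
        # visibility is "private", "shared", or "public".
        self.records = records

    def allowed_levels(self, user):
        levels = {"public"}
        if self.name in user["collaborates_with"]:
            levels.add("shared")
        if self.name == user["member_of"]:
            levels.update({"shared", "private"})
        return levels

    def run_subquery(self, predicate, user):
        """Local autonomy: access control is applied inside the component."""
        levels = self.allowed_levels(user)
        return [(s, p, o) for s, p, o, vis in self.records
                if p == predicate and vis in levels]


def federated_query(predicate, user, components):
    """Send one sub-query per component and combine what comes back."""
    combined = []
    for component in components:
        for triple in component.run_subquery(predicate, user):
            combined.append((component.name,) + triple)
    return sorted(combined)


# Made-up sample data.
PARTNER_A = Component("OEM partner A", [
    ("alloy-1", "tensile strength", "900 MPa", "public"),
    ("alloy-2", "tensile strength", "950 MPa", "shared"),
    ("alloy-3", "tensile strength", "990 MPa", "private"),
])
SUPPLIER_C = Component("OEM supplier C", [
    ("alloy-9", "tensile strength", "700 MPa", "public"),
    ("alloy-8", "tensile strength", "720 MPa", "shared"),
])
USER = {"member_of": "OEM partner B", "collaborates_with": {"OEM partner A"}}
RESULTS = federated_query("tensile strength", USER, [PARTNER_A, SUPPLIER_C])
```

The user here collaborates with partner A only, so shared data from supplier C and private data from partner A never reach the combined result.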

Can we choose any part of our
Semantic Web data
to share with the public community,
or with selected collaborators?

Different levels of granularity
– Individual resources
• Example: a material product, a manufacturing process
– Individual triples
• Example: properties of a product, or process
– Entire datasets
Enable flexible selection of any piece of data to be
shared at any time
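Sharing at several levels of granularity can be sketched as a lookup that checks the triple, then the resource, then the dataset. The "most specific grant wins, otherwise deny" rule is an assumption made for this example rather than the mechanism described in the talk, and all names are hypothetical.

```python
class SharingPolicy:
    """Grants at triple, resource, or dataset level; default is deny."""

    def __init__(self):
        self._grants = {"dataset": {}, "resource": {}, "triple": {}}

    def grant(self, level, key, group):
        """Allow a group to read a dataset, a resource, or a single triple."""
        self._grants[level].setdefault(key, set()).add(group)

    def can_read(self, group, dataset, triple):
        subject = triple[0]
        # Assumed rule: the most specific level with any grant decides.
        for level, key in (
            ("triple", triple),
            ("resource", subject),
            ("dataset", dataset),
        ):
            groups = self._grants[level].get(key)
            if groups is not None:
                return group in groups
        return False


POLICY = SharingPolicy()
POLICY.grant("dataset", "catalog", "public")
POLICY.grant("resource", "process-7", "collaborators")
POLICY.grant("triple", ("product-1", "cost", "12"), "owner")
```

With these grants, the public can read ordinary triples in the catalog, while the single cost triple and everything about `process-7` stay restricted.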

Federal Endpoint
1. Query Rewriting
2. AC-embedded Query Execution
Local Component A
• AC Processes: Creating Resources, Granting Permissions, Inferring Permissions
• Manager Y of component A
• User X of either Public group or Collaborators

Various Policies
• Role-based Access Control (RBAC)
• Mandatory Access Control (MAC)
• Attribute-based Access Control (ABAC)
• Discretionary Access Control (DAC)
1. Which policy? Depends on the
organization’s needs!
2. Our AC mechanism can be extended to
support any of these policies.

Advanced capability: semantic browsing
• Example of Scooner:
http://wiki.knoesis.org/index.php/Scooner
• Demo:
http://knoesis.wright.edu/library/demos/scooner-demo/

Take Away
Use of semantic web technologies
can help overcome challenges associated with
data discovery, integration, and interoperability,
caused by data heterogeneity.
Provenance and access control information
helps share and exchange data reliably.

Kno.e.sis
Thank you, and please visit us at
http://knoesis.org/
http://wiki.knoesis.org/index.php/MaterialWays
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
Special thanks (MaterialWays team): Clare Paul (AFRL),
Kalpa Gunaratna, Vinh Nguyen, Sarasi Lalithsena, Swapnil Soni, Nitisha Jayakumar, Siva Cheekula.
