Semantic Technologies for Big Science and Astrophysics 
Invited presentation: EarthCube Solar-Terrestrial End-User Workshop 
NJIT, Newark NJ, August 13-15, 2014 
Amit Sheth, T. K. Prasad 
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Astrophysics 
Lots of data 
Heterogeneous 
Complex 
http://en.wikipedia.org/wiki/Astrophysics#mediaviewer/File:NGC_4414_%28NASA-med%29.jpg
Challenge 
• How can we handle this vast, heterogeneous, 
and complex data space? 
• Focus on complexity rather than raw processing: 
integration, collaboration, reuse 
Can Semantic (Web) technologies ease 
the challenges and empower the scientists?
The Semantic Web vision: 1999-2001 
• Sir Tim Berners-Lee, in his 1999 book “Weaving the Web”, 
emphasized the significance of metadata about Web 
documents. 
• A well-known May 2001 article presented an agent- and AI-based 
vision for the “next generation of the World Wide Web”, 
with content amenable to automation. 
• With Taalee (later Voquette, Semagix), which I founded in 1999, I 
pursued a highly practical realization with semantic search, 
browsing and analysis products. It had commercial 
applications starting in 2000 and a patent awarded in 2001. 
1, 2, 3 of Semantic Web
1 
• Agreement and Knowledge: Agreement about a 
common vocabulary/nomenclature, conceptual 
models and domain knowledge, ontology 
– Codified as Schema + Knowledge Base. 
– Agreement is what enables interoperability. 
– Formal machine processable description is what 
leads to automation. 
– Manual, semi-automated, automated creation of 
ontologies
2 
• Semantic Annotation (Metadata Extraction): 
Associating meaning with data, or labeling data so 
it is more meaningful to the system and people. 
– Manual 
– Semi-automatic (automatic with human 
verification) 
– Automatic
3 
• Reasoning/Computation, Applications: 
– Semantics enabled search, browsing 
– Data integration, collaboration 
– Visualization 
– Analyses including pattern discovery, mining, hypothesis 
validation 
– Answering complex queries, making connections (paths, 
sub graphs), supporting discovery
How to integrate well? From Syntax to Semantics 
Using Semantics to Climb Levels of Abstraction: an example 
Figure labels: SSN Ontology, Intellego 
3 Interpreted data (abductive) [in OWL], e.g., diagnosis: Hyperthyroidism … 
2 Interpreted data (deductive) [in OWL], e.g., threshold: Elevated Blood Pressure 
1 Annotated data [in RDF], e.g., label: Systolic blood pressure of 150 mmHg 
0 Raw data [in TEXT], e.g., number: “150” 
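The climb from level 0 to level 2 can be sketched in a few lines of Python. This is an illustration only: the 140 mmHg cutoff, the `Observation` class and the function names are assumptions made for the sketch, not part of the SSN ontology or of Intellego, and the abductive step (level 3) is left out.

```python
from dataclasses import dataclass

# Hypothetical cutoff for this sketch only; the slide does not state one.
ELEVATED_SYSTOLIC_MMHG = 140.0


@dataclass(frozen=True)
class Observation:
    """Level 1: a raw value annotated with what it measures and its unit."""
    observed_property: str
    value: float
    unit: str


def annotate(raw: str) -> Observation:
    """Level 0 -> 1: attach a label and a unit to the raw text."""
    return Observation("systolic blood pressure", float(raw), "mmHg")


def interpret(obs: Observation) -> str:
    """Level 1 -> 2: deductive step, here a simple threshold rule."""
    is_systolic = obs.observed_property == "systolic blood pressure"
    if is_systolic and obs.value >= ELEVATED_SYSTOLIC_MMHG:
        return "Elevated Blood Pressure"
    return "No abstraction derived"


obs = annotate("150")          # raw text becomes annotated data
abstraction = interpret(obs)   # annotated data becomes an interpretation
print(obs, "->", abstraction)
```

In a real system the annotation would be RDF and the rule would live in an ontology or rule base; plain Python objects stand in for both here.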
Semantic Web technologies – in practice 
● Ontologies to capture domain knowledge (sometimes 
taxonomy/nomenclature is good enough) 
● Languages to represent/capture domain knowledge 
and data - OWL, RDF/RDFS. 
● Data sharing and publishing online (e.g., LOD). 
● Annotation, semantic search, semantic browsing 
● Provenance,… 
Widely used in biomedicine; quite a few applications in 
healthcare, growing use and explorations in geosciences 
and more… 
In this talk, I will review/borrow from 
• ScienceWISE at EPFL which uses semantic 
technology to serve Physicists including 
Astrophysicists: shared vocabulary, annotation, 
browsing for related concepts 
• Semantic (web) technologies for health care and 
life sciences encompassing collaborative research, 
prototypes, open source tools and ontologies, 
deployed applications, commercialization,… 
• MaterialWays: Our project in the Materials Genome 
Initiative … 
“Ontology” in physics domain – ScienceWISE 
● ScienceWISE 
WISE - Web based Interactive Semantic Environment 
● An interactive and crowdsourced tool to capture 
knowledge from scientists’ daily routine work. 
● Core consists of a community built ontology. 
● Literature gets annotated and bookmarked using 
the ontology. 
ontology 
annotation 
bookmarking & 
recommendations 
http://sciencewise.info/
Value Proposition 
Associating machine-processable semantics 
with scientific, engineering data and 
documents can help overcome challenges 
associated with data discovery, integration 
and interoperability caused by data 
heterogeneity. 
Benefits of using semantics for Astrophysicists (and other sciences) 
• Challenges 
– Massive volume 
– Heterogeneity (i.e., from many sources, format/structure, text, 
images). 
– Interoperability and sharing data 
– Provenance and Access Control. 
• Need techniques beyond ScienceWISE 
– Interested in data beyond scientific publications 
– Data sharing (and credit/data citation for data sharing) 
– Provenance and Access control 
– A framework to capture, search, and discover astrophysical 
data 
Nature of Data and Documents 
Relational/Tabular Data 
XML document 
Image 
Technical Specs 
Irregular Tables 
Publications
Granularity of Semantics and Applications: Examples 
• Synonyms 
– Chemistry, Chemical Composition, Chemical Analysis, ... 
– Bend Test, Bending, ... 
– Delivery Condition, Process/Surface Finish, Temper, "as received by 
purchaser", ... 
• Coreference vs broadening/narrowing 
– Tubing vs welded tubing vs flash-welded part 
• Capturing characteristic-value pairs 
– Recognize and Normalize: “0.1 inch and under in nominal thickness” 
is translated to “Thickness <= 0.1 in”. 
– Glean elided characteristic: controlled term “solution heat treated” 
implies the characteristic “heat treat type”. 
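The synonym and "recognize and normalize" examples above can be sketched as follows. The synonym table, the regular expression and the function names are assumptions made for illustration; a production extractor would need a far richer grammar and vocabulary. Treating "and over" as `>=` is also an assumption.

```python
import re

# Hypothetical synonym table built from the examples on the slide.
CANONICAL = {
    "chemistry": "Chemistry",
    "chemical composition": "Chemistry",
    "chemical analysis": "Chemistry",
    "bend test": "Bend Test",
    "bending": "Bend Test",
}


def canonical_term(term: str) -> str:
    """Map a synonym to its controlled term; unknown terms pass through."""
    cleaned = term.strip()
    return CANONICAL.get(cleaned.lower(), cleaned)


# Matches phrases shaped like "<number> inch and under in nominal <characteristic>".
PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>inch|in)\s+and\s+(?P<bound>under|over)"
    r"\s+in\s+nominal\s+(?P<characteristic>\w+)",
    re.IGNORECASE,
)


def normalize(phrase: str):
    """Turn a specification phrase into a characteristic-value constraint."""
    m = PATTERN.search(phrase)
    if m is None:
        return None
    op = "<=" if m.group("bound").lower() == "under" else ">="
    return f"{m.group('characteristic').capitalize()} {op} {m.group('value')} in"


print(canonical_term("Chemical Analysis"))
print(normalize("0.1 inch and under in nominal thickness"))
```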
Granularity of Semantics and Associated Applications 
• Lightweight semantics: File and document-level 
annotation to enable discovery and sharing 
• Richer semantics: Data-level annotation and 
extraction for semantic search and summarization 
• Fine-grained semantics: Data integration and 
interoperability. 
Using Semantic Web Technologies 
Machine-processable semantics achieved by 
addressing 
• Syntactic Heterogeneity: Using XML syntax and the 
RDF data model (labelled graph structure) 
• Semantic Heterogeneity: 
– Using “common” controlled vocabularies, taxonomies 
and ontologies 
– Using federated data sources, exchanges, querying, 
and services 
Ingredients for Semantics-based Cyber Infrastructure 
• Use of community-ratified controlled 
vocabularies and lightweight ontologies 
(upper-level, hierarchies) 
• Semi-automatic annotation of data and 
documents 
• Support for provenance and access control 
A proposed “light-weight semantics” approach 
(for highly distributed community, low start up time, long tail science)… 
Our applications in 
Materials Genome Initiative 
MaterialWays (our project related to the Materials Genome Initiative): 
http://wiki.knoesis.org/index.php/MaterialWays
Matvocab home page 
Search and discovery 
Annotate documents 
Visualize the 
knowledge base 
Create process 
assertions 
Query vocabulary 
View, edit, and add
Search & Discovery
Annotate, search, and track provenance 
• Vocabulary is used to annotate documents. 
• Annotated documents can be indexed. 
• Documents can be integrated reliably based 
on common terms of interest and 
provenance information. 
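A minimal sketch of vocabulary-based annotation and indexing, assuming simple substring matching. The vocabulary terms, document ids and texts are invented for illustration and are not taken from Matvocab.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary and documents, for illustration only.
VOCABULARY = {"autoclave cure", "hand lay-up", "prepreg"}

DOCUMENTS = {
    "report-1": "Panels were made by hand lay-up and then given an autoclave cure.",
    "spec-7": "Prepreg storage conditions before hand lay-up.",
}


def annotate(text: str, vocabulary: set) -> set:
    """Return the vocabulary terms that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return {term for term in vocabulary if term in lowered}


def build_index(documents: dict, vocabulary: set) -> dict:
    """Inverted index: vocabulary term -> ids of documents annotated with it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in annotate(text, vocabulary):
            index[term].add(doc_id)
    return index


index = build_index(DOCUMENTS, VOCABULARY)
print(sorted(index["hand lay-up"]))  # documents linked by a common term
```

Documents that share a term in the index are candidates for integration; provenance fields (source, author, date) could be stored alongside each document id.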
Annotate documents using standard vocabulary
Create process assertions (OnCET) 
• Add information about inputs to and outputs 
of a process as assertions in triple form 
using standard vocabulary. 
• Add assertions about materials domain 
knowledge using vocabulary terms and 
relationship among them, e.g., about 
process control parameters and 
performance characteristics.
Provenance Metadata 
• Explains the origin of an artifact, such as 
– How was it created? 
– Who created it? 
– When was it created? 
• Example: for a given material X 
– Which processes are involved in making the material and 
what are the relevant performance properties? 
– What are the inputs, control parameters and outputs of a 
process? 
– Which research/engineering team performed an 
experiment?
Capturing provenance metadata - iExplore 
generic PMC prepreg -> (subjected to) generic hand lay-up -> (yields) generic PMC lay-up -> (subjected to) generic autoclave cure -> (yields) generic PMC
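The process labels on this slide can be written as assertions in triple form and then walked backwards to answer "which processes were involved in making this material?". The ordering of the chain is inferred from the label order on the slide, and the helper names `match` and `lineage` are hypothetical; this is a sketch, not how iExplore stores its data.

```python
# Process assertions as (subject, predicate, object) triples.
TRIPLES = [
    ("generic PMC prepreg", "subjected to", "generic hand lay-up"),
    ("generic hand lay-up", "yields", "generic PMC lay-up"),
    ("generic PMC lay-up", "subjected to", "generic autoclave cure"),
    ("generic autoclave cure", "yields", "generic PMC"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def lineage(triples, material):
    """Walk back from a material through (input, process, output) steps."""
    steps, current, seen = [], material, set()
    while current not in seen:
        seen.add(current)  # guards against cycles in the assertions
        produced_by = match(triples, p="yields", o=current)
        if not produced_by:
            break
        process = produced_by[0][0]
        inputs = match(triples, p="subjected to", o=process)
        if not inputs:
            break
        source = inputs[0][0]
        steps.append((source, process, current))
        current = source
    return steps


for source, process, product in lineage(TRIPLES, "generic PMC"):
    print(f"{product} <- {process} <- {source}")
```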
Vocabulary Provenance 
ASM Handbook 
MIL Handbook 5 
MIL Handbook 17 
Vocabulary terms 
Wiki-based Crowd-sourcing Vocabulary 
Vocabulary term expressed in RDF and published online (http://knoesis.org/matvocab/A-basis)
Capturing Vocabulary Provenance - iExplore 
Definition 
Rights 
Source 
Vocabulary term
Our proposal - Astrophysics 
• Tagging, annotation, search 
• Knowledgebase -> 
Ontology 
• Provenance – at every data 
level 
• Data access control 
• Capture process flows 
• Capture relationships 
between concept instances 
• Visualization of process 
flows 
ScienceWISE - Physics 
• Tagging, annotation, search 
• Ontology -> 
Knowledgebase 
• Provenance 
Our approach to help in Astrophysics 
• Access control and provenance details at every 
data level -> handle huge amounts of astrophysics 
data. 
• Create relationships between concepts and 
visualize them in graph format. 
• Add facts or assertions about each concept.
Data Access
Databases, personal desktops, lab notebooks -> Single Access 
Public-Private Data Sharing 
• Enhance publicly available datasets while 
retaining intellectual property data privately for 
businesses 
Private data and metadata 
(e.g. ongoing experimental processes, intellectual property data) 
Selectively shared data and metadata 
(e.g. with ongoing collaborators, licensed data) 
Public data and metadata 
(e.g., released products, material specifications)
Federated Architecture 
Figure: three components (OEM partner A, OEM partner B, OEM supplier C), each holding Private, Shared, and Public data behind its own AC Processor and Semantic Query Processor, connected through a Federal Endpoint that provides: 
1. User Authentication 
2. Federated Semantic Query Processor 
3. Semantics Mappings
Principles of a Federation 
• Each component controls access to its local data 
independently (local autonomy). 
• A query is decomposed into multiple sub-queries; 
each sub-query is executed at one component. 
• Results from sub-queries are combined by the 
federated query processor, which controls global access.
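The three principles can be sketched as a toy federation in Python. Everything here is an assumption for illustration: the `Component` class, the visibility labels and the sample records are invented, and "decomposition" is reduced to sending the same predicate pattern to each component. A real deployment would more likely use SPARQL endpoints.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One member of the federation; it alone decides what it releases."""
    name: str
    records: list  # (subject, predicate, object, visibility) tuples
    collaborators: set = field(default_factory=set)

    def allowed(self, user: str, visibility: str) -> bool:
        if visibility == "public":
            return True
        if visibility == "shared":
            return user in self.collaborators
        return False  # private data never leaves the component

    def execute(self, user: str, predicate: str) -> list:
        """Run one sub-query locally, applying local access control."""
        return [
            (s, p, o) for (s, p, o, vis) in self.records
            if p == predicate and self.allowed(user, vis)
        ]


def federated_query(components, user, predicate):
    """Send a sub-query to every component and combine the results."""
    combined = []
    for component in components:
        for triple in component.execute(user, predicate):
            combined.append((component.name, triple))
    return combined


partner_a = Component("OEM partner A", [
    ("alloy-1", "hasProperty", "yield strength", "public"),
    ("alloy-2", "hasProperty", "fatigue life", "shared"),
    ("alloy-3", "hasProperty", "trial recipe", "private"),
], collaborators={"user-x"})
partner_b = Component("OEM partner B", [
    ("coating-1", "hasProperty", "hardness", "public"),
    ("coating-2", "hasProperty", "wear rate", "shared"),
])

results_x = federated_query([partner_a, partner_b], "user-x", "hasProperty")
results_guest = federated_query([partner_a, partner_b], "guest", "hasProperty")
print(len(results_x), len(results_guest))
```

The federated processor never sees records a component declines to release, which is the local-autonomy property in the first bullet.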
Can we choose any part of our 
Semantic Web data 
to share with the public community, 
or with selected collaborators?
Different levels of granularity 
– Individual resources 
• Example: a material product, a manufacturing process 
– Individual triples 
• Example: properties of a product, or process 
– Entire datasets 
Enable flexible selection of any piece of data to be 
shared at any time
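One way to picture the three granularities is a grant table with one entry per level. The dataset names, triples and the `GRANTS` structure are hypothetical, and treating a resource-level grant as "every triple whose subject is that resource" is an assumption of this sketch.

```python
# Hypothetical datasets of (subject, predicate, object) triples.
DATASETS = {
    "specifications": [
        ("spec-1", "covers", "product-1"),
    ],
    "products": [
        ("product-1", "hasProperty", "tensile strength"),
        ("product-2", "hasProperty", "density"),
    ],
    "processes": [
        ("process-9", "hasInput", "product-1"),
        ("process-9", "hasControlParameter", "cure temperature"),
    ],
}

# What the owner has chosen to share, at each level of granularity.
GRANTS = {
    "datasets": {"specifications"},                       # entire dataset
    "resources": {"product-1"},                           # individual resource
    "triples": {("process-9", "hasInput", "product-1")},  # individual triple
}


def is_shared(dataset: str, triple: tuple, grants: dict) -> bool:
    """A triple is visible if any of the three grant levels covers it."""
    subject = triple[0]
    return (
        dataset in grants["datasets"]
        or subject in grants["resources"]
        or triple in grants["triples"]
    )


def shared_view(datasets: dict, grants: dict) -> list:
    """Collect every triple the grants make visible."""
    return [
        triple
        for name, triples in datasets.items()
        for triple in triples
        if is_shared(name, triple, grants)
    ]


view = shared_view(DATASETS, GRANTS)
for triple in view:
    print(triple)
```

Changing what is shared then means editing `GRANTS` only; the data itself stays untouched.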
Figure labels: Federal Endpoint; Local Component A; AC Processes (Creating Resources, Granting Permissions, Inferring Permissions); Manager Y of component A; User X of either Public group or Collaborators 
1. Query Rewriting 
2. AC-embedded Query Execution
Various Policies 
• Role-based Access Control (RBAC) 
• Mandatory Access Control (MAC) 
• Attribute-based Access Control (ABAC) 
• Discretionary Access Control (DAC) 
1. Which policy? Depends on the 
organization’s needs! 
2. Our AC mechanism can be extended to 
support any of these policies.
Advanced capability: semantic browsing 
• Example of Scooner: 
http://wiki.knoesis.org/index.php/Scooner 
• Demo: 
http://knoesis.wright.edu/library/demos/scooner-demo/ 
Take Away 
Use of semantic web technologies 
can help overcome challenges associated with 
data discovery, integration, and interoperability, 
caused by data heterogeneity. 
Use of provenance and access control information 
helps share/exchange data reliably. 
Kno.e.sis 
Thank you, and please visit us at 
http://knoesis.org/ 
http://wiki.knoesis.org/index.php/MaterialWays 
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing 
Wright State University, Dayton, Ohio, USA 
Special Thanks (MaterialWays team): Clare Paul (AFRL), 
Kalpa Gunaratna, Vinh Nguyen, Sarasi Lalithsena, Swapnil Soni, Nitisha Jayakumar, Siva Cheekula.

 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
Cytoscape Network Visualization and Analysis
Cytoscape Network Visualization and AnalysisCytoscape Network Visualization and Analysis
Cytoscape Network Visualization and Analysis
 



Semantic Technologies for Big Sciences including Astrophysics

  • 1. Semantic Technologies for Big Science and Astrophysics Invited presentation: EarthCube Solar-Terrestrial End-User Workshop NJIT, Newark NJ, August 13-15, 2014 Amit Sheth, T. K. Prasad Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
  • 2. 2 Astrophysics Lots of data Heterogeneous Complex http://en.wikipedia.org/wiki/Astrophysics#mediaviewer/File:NGC_4414_%28NASA-med%29.jpg
  • 3. 3 Challenge • How can we handle this vast, heterogeneous, and complex data space? • Focus on complexity rather than raw processing: integration, collaboration, reuse Can Semantic (Web) technologies ease the challenges and empower the scientists?
  • 4. The Semantic Web vision: 1999-2001 • Sir Tim Berners-Lee, in his 1999 book “Weaving the Web”, emphasized the significance of metadata about Web documents. • A well-known May 2001 article presented an agent- and AI-based vision for the “next generation of the World Wide Web”, with content amenable to automation. • With Taalee (later Voquette, Semagix), which I founded in 1999, I pursued a highly practical realization with semantic search, browsing and analysis products. It had commercial applications starting in 2000 and a patent awarded in 2001.
  • 5. 1 2 3 of Semantic Web
  • 6. 1 • Agreement and Knowledge: Agreement about a common vocabulary/nomenclature, conceptual models and domain knowledge, ontology – Codified as Schema + Knowledge Base. – Agreement is what enables interoperability. – Formal machine processable description is what leads to automation. – Manual, semi-automated, automated creation of ontologies
  • 7. 2 • Semantic Annotation (Metadata Extraction): Associating meaning with data, or labeling data so it is more meaningful to the system and people. – Manual – Semi-automatic (automatic with human verification) – Automatic
  • 8. 3 • Reasoning/Computation, Applications: – Semantics enabled search, browsing – Data integration, collaboration – Visualization – Analyses including pattern discovery, mining, hypothesis validation – Answering complex queries, making connections (paths, sub graphs), supporting discovery
  • 9. How to integrate well? From Syntax to Semantics 9
  • 10. SSN Ontology Using Semantics to Climb Levels of Abstraction: an example 3 Interpreted data (abductive) [in OWL] e.g., diagnosis 2 Interpreted data (deductive) [in OWL] e.g., threshold 1 Annotated Data [in RDF] e.g., label 0 Raw Data [in TEXT] e.g., number Intellego Hyperthyroidism … … Elevated Blood Pressure Systolic blood pressure of 150 mmHg “150” 10
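The four levels on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 140 mmHg threshold, the `EXPLANATIONS` table, and all function names are hypothetical and are not taken from the SSN ontology or from Intellego.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: an illustrative threshold, not a clinical guideline.
SYSTOLIC_THRESHOLD_MMHG = 140
# Assumption: a toy knowledge base mapping a finding to conditions that could explain it.
EXPLANATIONS = {"Elevated Blood Pressure": ["Hyperthyroidism"]}


@dataclass
class Observation:
    prop: str
    value: float
    unit: str


def annotate(raw: str) -> Observation:
    """Level 0 -> 1: attach a property label and a unit to a raw reading."""
    return Observation("systolic blood pressure", float(raw), "mmHg")


def deduce(obs: Observation) -> Optional[str]:
    """Level 1 -> 2: apply a threshold rule to the annotated value."""
    if obs.prop == "systolic blood pressure" and obs.value > SYSTOLIC_THRESHOLD_MMHG:
        return "Elevated Blood Pressure"
    return None


def abduce(finding: Optional[str]) -> List[str]:
    """Level 2 -> 3: list candidate explanations for a finding."""
    return EXPLANATIONS.get(finding, [])


obs = annotate("150")
finding = deduce(obs)
diagnoses = abduce(finding)
```

Each step adds meaning without discarding the level below it, so the raw reading stays traceable from the final interpretation.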
  • 11. Semantic Web technologies – in practice ● Ontologies to capture domain knowledge (sometimes taxonomy/nomenclature is good enough) ● Languages to represent/capture domain knowledge and data - OWL, RDF/RDFS. ● Data sharing and publishing online (e.g., LOD). ● Annotation, semantic search, semantic browsing ● Provenance,… Widely used in biomedicine; quite a few applications in healthcare, growing use and explorations in geosciences and more… 11
  • 12. In this talk, I will review/borrow from • ScienceWISE at EPFL which uses semantic technology to serve Physicists including Astrophysicists: shared vocabulary, annotation, browsing for related concepts • Semantic (web) technologies for health care and life sciences encompassing collaborative research, prototypes, open source tools and ontologies, deployed applications, commercialization,… • MaterialWays: Our project in Materials Genome Initiatives … 12
  • 13. “Ontology” in physics domain – ScienceWISE ● ScienceWISE: WISE stands for Web based Interactive Semantic Environment ● An interactive and crowdsourced tool to capture knowledge from scientists’ daily routine work. ● Core consists of a community built ontology. ● Literature gets annotated and bookmarked using the ontology.
  • 14. 14 ontology annotation bookmarking & recommendations http://sciencewise.info/
  • 15. Value Proposition Associating machine-processable semantics with scientific, engineering data and documents can help overcome challenges associated with data discovery, integration and interoperability caused by data heterogeneity. 15
  • 16. Benefits of using semantics for Astrophysicists (and other sciences) • Challenges – Massive volume – Heterogeneity (i.e., from many sources, format/structure, text, images). – Interoperability and sharing data – Provenance and Access Control. • Need techniques beyond ScienceWISE – Interested in data beyond scientific publications – Data sharing (and credit/data citation for data sharing) – Provenance and Access control – A framework to capture, search, and discover astrophysical data 16
  • 17. Nature of Data and Documents 17 Relational/Tabular Data XML document Image Technical Specs Irregular Tables Publications
  • 18. Granularity of Semantics and Applications: Examples • Synonyms – Chemistry, Chemical Composition, Chemical Analysis, ... – Bend Test, Bending, ... – Delivery Condition, Process/Surface Finish, Temper, "as received by purchaser", ... • Coreference vs broadening/narrowing – Tubing vs welded tubing vs flash-welded part • Capturing characteristic-value pairs – Recognize and Normalize: “0.1 inch and under in nominal thickness” is translated to “Thickness <= 0.1 in”. – Glean elided characteristic: controlled term “solution heat treated” implies the characteristic “heat treat type”. 18
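The "recognize and normalize" step for characteristic-value pairs could look roughly like the sketch below. The regular expression, the lookup tables, and the `normalize` function are assumptions made for illustration; they cover only the phrasing quoted on the slide, whereas real specifications use many more variants.

```python
import re

# Assumption: a tiny, hand-picked mapping for illustration only.
COMPARATORS = {"and under": "<=", "and over": ">="}
UNITS = {"inch": "in", "inches": "in"}

PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s+(?P<unit>inch(?:es)?)\s+"
    r"(?P<cmp>and under|and over)\s+in\s+nominal\s+(?P<char>\w+)",
    re.IGNORECASE,
)


def normalize(phrase):
    """Turn a phrase such as '0.1 inch and under in nominal thickness'
    into a 'Characteristic <op> value unit' string, or None if unrecognized."""
    match = PATTERN.search(phrase)
    if match is None:
        return None
    op = COMPARATORS[match.group("cmp").lower()]
    unit = UNITS[match.group("unit").lower()]
    characteristic = match.group("char").capitalize()
    return f"{characteristic} {op} {match.group('value')} {unit}"


normalized = normalize("0.1 inch and under in nominal thickness")
```

Returning `None` for unrecognized phrases keeps the step semi-automatic: anything the patterns miss can be routed to a human for annotation.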
  • 19. Granularity of Semantics and Associated Applications • Lightweight semantics: File and document-level annotation to enable discovery and sharing • Richer semantics: Data-level annotation and extraction for semantic search and summarization • Fine-grained semantics: Data integration and interoperability. 19
  • 20. Using Semantic Web Technologies Machine-processable semantics achieved by addressing • Syntactic Heterogeneity: Using XML syntax and RDF datamodel (labelled graph structure) • Semantic Heterogeneity: – Using “common” controlled vocabularies, taxonomies and ontologies – Using federated data sources, exchanges, querying, and services 20
  • 21. Ingredients for Semantics-based Cyber Infrastructure • Use of community-ratified controlled vocabularies and lightweight ontologies (upper-level, hierarchies) • Semi-automatic annotation of data and documents • Support for provenance and access control 21
  • 22. A proposed “light-weight semantics” approach (for highly distributed community, low start up time, long tail science)… 22
  • 23. Our applications in the Materials Genome Initiative. MaterialWays (our project related to the Materials Genome Initiative): http://wiki.knoesis.org/index.php/MaterialWays
  • 24. Matvocab home page Search and discovery Annotate documents Visualize the knowledge base Create process assertions Query vocabulary View, edit, and add
  • 25. 25 Search & Discovery
  • 26. Annotate, search, and track provenance • Vocabulary is used to annotate documents. • Annotated documents can be indexed. • Documents can be integrated reliably based on common terms of interest and provenance information. 26
  • 27. 27 Annotate documents using standard vocabulary
  • 28. Create process assertions (OnCET) • Add information about the inputs to and outputs of a process as assertions in triple form using standard vocabulary. • Add assertions about materials domain knowledge using vocabulary terms and the relationships among them, e.g., about process control parameters and performance characteristics.
  • 29. Provenance Metadata • Explains the origin of an artifact, such as – How was it created? – Who created it? – When was it created? • Example: for a given material X – Which processes are involved in making the material and what are the relevant performance properties? – What are the inputs, control parameters and outputs of a process? – Which research/engineering team performed an experiment?
  • 30. 30 Capturing provenance metadata - iExplore generic PMC prepreg generic hand lay-up generic PMC lay-up generic autoclave cure generic PMC subjected to subjected to yields yields
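The process chain on this slide can be written down as triples and walked backwards to answer "which processes are involved in making the material?". This is a sketch in plain Python tuples rather than a real RDF store. The ordering of the chain is an assumption read off the flattened slide labels, and `lineage` is a hypothetical helper, not part of iExplore.

```python
# Assumption: the slide's labels form this chain of (subject, predicate, object) triples.
TRIPLES = [
    ("generic PMC prepreg", "subjected to", "generic hand lay-up"),
    ("generic hand lay-up", "yields", "generic PMC lay-up"),
    ("generic PMC lay-up", "subjected to", "generic autoclave cure"),
    ("generic autoclave cure", "yields", "generic PMC"),
]


def lineage(product, triples):
    """Walk backwards from a product and return (input, process, output) steps
    in the order in which they were carried out."""
    steps = []
    current = product
    seen = {product}
    while True:
        # Find the process that yields the current material.
        process = next((s for s, p, o in triples if p == "yields" and o == current), None)
        if process is None:
            break
        # Find the material that was subjected to that process.
        source = next((s for s, p, o in triples if p == "subjected to" and o == process), None)
        if source is None or source in seen:
            break
        steps.append((source, process, current))
        seen.add(source)
        current = source
    return list(reversed(steps))


history = lineage("generic PMC", TRIPLES)
```

The `seen` set guards against cycles, so a malformed chain ends the walk instead of looping forever.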
  • 31. Vocabulary Provenance • Sources: ASM Handbook, MIL Handbook 5, MIL Handbook 17 • Vocabulary terms • Wiki-based Crowd-sourcing Vocabulary • Vocabulary term expressed in RDF and published online (http://knoesis.org/matvocab/A-basis)
  • 32. 32 Capturing Vocabulary Provenance - iExplore Definition Rights Source Vocabulary term
  • 33. Our proposal - Astrophysics • Tagging, annotation, search • Knowledgebase -> Ontology • Provenance – at every data level • Data access control • Capture process flows • Capture relationships between concept instances • Visualization of process flows ScienceWISE - Physics • Tagging, annotation, search • Ontology -> Knowledgebase • Provenance 33
  • 34. Our approach to help in Astrophysics • Access control and provenance details at every data level, to handle the huge amount of astrophysics data. • Create relationships between concepts and visualize them in graph format. • Add facts or assertions about each concept.
  • 36. Databases Personal desktops Lab notebooks Single Access 36
  • 37. Public-Private Data Sharing • Enhance publicly available datasets while retaining intellectual property data privately for businesses Private data and metadata (e.g. ongoing experimental processes, intellectual property data) 37 Selectively shared data and metadata (e.g. with ongoing collaborators, licensed data) Public data and metadata (e.g., released products, material specifications)
  • 38. Federated Architecture OEM partner A 38 Private Shared Public Federal Endpoint 1. User Authentication 2. Federated Semantic Query Processor AC Processor Semantic Query Processor Private Shared Public AC Processor Semantic Query Processor OEM partner B 3. Semantics Mappings Private Shared Public AC Processor Semantic Query Processor OEM supplier C
  • 39. Principles of a Federation • Each component controls access to its local data independently (local autonomy). • A query is decomposed into multiple sub-queries; each sub-query is executed at one component. • Results from the sub-queries are combined by the federated query processor, which controls global access.
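These three principles can be sketched as follows. The `Component` class, the three visibility labels, and the example data are assumptions made for illustration. A real deployment would use SPARQL endpoints and a proper query planner rather than in-memory lists.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One member of the federation; it decides locally what a user may see."""
    name: str
    # Each record is (triple, visibility), with visibility in {"public", "shared", "private"}.
    records: list
    collaborators: set = field(default_factory=set)

    def allowed(self, user, visibility):
        if visibility == "public":
            return True
        if visibility == "shared":
            return user in self.collaborators
        return False  # private data never leaves the component

    def run_subquery(self, user, predicate):
        """Execute a sub-query locally, applying local access control."""
        return [t for t, vis in self.records if t[1] == predicate and self.allowed(user, vis)]


def federated_query(components, user, predicate):
    """Send the same sub-query to every component and combine the results."""
    results = []
    for component in components:
        results.extend(component.run_subquery(user, predicate))
    return results


# Hypothetical example data.
partner_a = Component(
    "OEM partner A",
    [
        (("alloy-1", "hasDensity", "2.7"), "public"),
        (("alloy-2", "hasDensity", "4.4"), "shared"),
        (("alloy-3", "hasDensity", "7.8"), "private"),
    ],
    collaborators={"alice"},
)
partner_b = Component("OEM partner B", [(("alloy-9", "hasDensity", "8.9"), "public")])

for_alice = federated_query([partner_a, partner_b], "alice", "hasDensity")
for_bob = federated_query([partner_a, partner_b], "bob", "hasDensity")
```

Filtering happens inside each component, so the federated processor only ever sees what each owner has chosen to release.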
  • 40. Can we choose any part of our Semantic Web data to share with the public community, or with selected collaborators?
  • 41. Different levels of granularity – Individual resources • Example: a material product, a manufacturing process – Individual triples • Example: properties of a product, or process – Entire datasets Enable flexible selection of any data piece to be shared at any time
  • 42. Federal Endpoint 2. AC-embedded Query Execution Local Component A Creating Resources Granting Permissions Inferring Permissions AC Processes User X of either Public group or Collaborators Manager Y of component A 1. Query Rewriting
  • 43. Various Policies • Role-based Access Control (RBAC) • Mandatory Access Control (MAC) • Attribute-based Access Control (ABAC) • Discretionary Access Control (DAC) 1. Which policy? Depends on the organization’s needs! 2. Our AC mechanism can be extended to support any of these policies.
  • 44. Advanced capability: semantic browsing • Example of Scooner: http://wiki.knoesis.org/index.php/Scooner • Demo: http://knoesis.wright.edu/library/demos/scooner-demo/
  • 45. Take Away Use of semantic web technologies can help overcome challenges associated with data discovery, integration, and interoperability, caused by data heterogeneity. Use of provenance and access control information helps share/exchange data reliably.
  • 46. Kno.e.sis Thank you, and please visit us at http://knoesis.org/ http://wiki.knoesis.org/index.php/MaterialWays Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing Wright State University, Dayton, Ohio, USA Special Thanks (MaterialWays team): Clare Paul (AFRL), Kalpa Gunaratna, Vinh Nguyen, Sarasi Lalithsena, Swapnil Soni, Nitisha Jayakumar, Siva Cheekula.

Editor's Notes

  1. Additional areas that can benefit => Geoscience. Syntactic (format) and semantic (domain models, perspectives) heterogeneity [text vs excel vs XML] (units of measure, well-entrenched vocabularies)
  2. Use Case: Materials and process specifications Variety challenge: Sources of heterogeneity syntactic (excel, XML, text) vs semantic (UOM, controlled terms) Attribute value-pairs : explicit vs implicit : conditioned on shape, dimension (making these connections explicit from text doc non-trivial) Table captions : Use text-based metadata to help mediate => tabular data
  3. (Ref: B 50T26 S7, Sections 1, 4.2, 4.4) Synonyms: stemming (syntactic) to richer thesaurus (simple KB) (to MAP doc / text strings to domain concepts / ontology) Coreference issues: Purpose of Semantics => What is literally given vs what is really meant? E.g., KB says welded tubing ISA tubing, but in a paragraph that describes ‘welded tubing’, one can refer to it using “the tubing”. RECALL: materials and process specs typically describe: composition, processing, testing, and packaging of material Formalizing a procedure (a process or a test) as an aggregation of characteristic/parameter-value pairs Besides determining related phrases using clause, line, paragraph boundary, etc. we may need to use semantic/domain model/ontology to normalize or fill-in implicit details ============================== PLUS NLP-lite issues: There is confusion regarding the distribution of “and” over “or”, and over the interpretation of “and” and “or”. For instance, is “X or Y and Z” = “X and Z or Y and Z”? Similarly, “and” in the context “P is X and Y” connotes intersection, while “and” in the context of “P and Q are X” connotes union. ------------ Ingot chemistry vs product chemistry
  4. Semantics at different levels of detail and developed in stages : “Rome was not built in a day”! : Cost-benefit trade-offs ------------------------------------------------------ ANALOGY: Table of content (top-down, prescribed, static) vs Index (bottom-up, gleaned to describe, dynamic) -------------------------------------------------------- Controlled vocabularies <= Lightweight ontologies [ legacy vocab + community agreed semantic relationships] <= Formal ontologies Original document vs its translation => traceability (provenance) --------- Past Research: We have dealt with top-down UMLS ontology vs bottom-up facts from Pubmed in HPCO (Literature-based discovery -> LBD) --------- Pick from existing upper-level ontology vocabulary => manual ; indexing table columns, rows, captions Semi-automatic metadata generation/embedding => annotation: mapping text to concept; summarization: triple extraction => semantic search with bg KB Translation and summarization - [Integration and Interoperation requires Alignment of vocabularies] Graphical representation and querying Literature-based discovery: navigate through the documents based on path search through their LOD renditions (extractions) ----------------------------- RECALL: materials and process specs typically describe: composition, processing, testing, and packaging of material Formalizing a procedure (a process or a test) as an aggregation of characteristic/parameter-value pairs = LOD  Eventually allows combining and comparing specs ============================== Biomaterials use case: Gold surface affinity of peptide sequence =================== -------------- Compare, manipulate, and combine specs
  5. Use Case: Materials and process specifications Variety challenge: Sources of heterogeneity syntactic (excel, XML, text) vs semantic (UOM, controlled terms) Attribute value-pairs : explicit vs implicit : conditioned on shape, dimension (making these connections explicit from text doc non-trivial) Table captions : Use text-based metadata to help mediate => tabular data ----------------------- Unification – integration vs federation – interoperation/mediation
  6. Less training ASTM, NIST, MIL-stds (Handbook 21, 5) Flat list of terms and their associated definitions Hierarchical organization of properties, alloys, performance metrics, … Cross relationships: (1) Qualitative dependencies (proportionality) (2) Quantitative dependencies (equations/formula)
  7. Vocabulary created in the previous step used to automatically annotate the set of documents
  8. Definition Example
  9. Our tool i-Explore allows users to browse how a particular output product is created, the processes involved, and the input materials
  10. We created a Mediawiki extension for managing and editing vocabulary terms. Due to the nature of the material science domain, one term may have been defined differently in multiple sources. The source/rights provenance metadata is captured in our provenance data model.
  11. Our tool i-Explore allows users to browse how a particular output product is created, the processes involved, and the input materials
  12. Data is spread across heterogeneous sources but is inaccessible to researchers and engineers: private lab info, a desktop, a notebook, behind a firewall. To make it easy for everyone, can there be a single access point to search for all publicly available information about materials?
  13. For each organization, such as a research lab or an industrial company, there are three kinds of data: private, selectively shared and public
  14. Semantics Mappings
  15. To meet customized needs of different organizations
  16. By capturing the access control primitive operators in processes: 1) A manager Y of a local component can grant access to individual users or to a group of users. The Public group is dedicated to the entire federated system; any resource granted to the Public group is available to everyone. 2) We are also able to track any access rights in the system. One important scenario: a manager Y can ask why a suspicious user has access to an important resource.
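
The grant-and-trace scenario in note 16 can be sketched as a small log of grants that can later be queried for an explanation. The `Grant` and `AccessLog` names, and the rule that the `Public` group matches every user, are assumptions made for illustration. The talk does not describe the actual data model at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    granted_by: str  # the manager who issued the grant
    grantee: str     # a user name or a group name
    resource: str


class AccessLog:
    """Records grants so that any access right can be explained afterwards."""

    def __init__(self):
        self.grants = []
        self.groups = {}  # group name -> set of member user names

    def add_member(self, group, user):
        self.groups.setdefault(group, set()).add(user)

    def grant(self, manager, grantee, resource):
        self.grants.append(Grant(manager, grantee, resource))

    def explain(self, user, resource):
        """Return every grant that gives `user` access to `resource`."""
        return [
            g
            for g in self.grants
            if g.resource == resource
            and (
                g.grantee == user
                or g.grantee == "Public"  # assumption: Public covers everyone
                or user in self.groups.get(g.grantee, set())
            )
        ]


# Hypothetical example: manager Y shares one resource with a collaborators group.
log = AccessLog()
log.add_member("Collaborators", "X")
log.grant("Y", "Collaborators", "spec-42")
log.grant("Y", "Public", "datasheet-7")

why_x = log.explain("X", "spec-42")
why_z = log.explain("Z", "spec-42")
```

Keeping the grants as immutable records means the answer to "why does this user have access?" is simply the matching subset of the log.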