Amarnath Gupta
University of California San Diego
NIF as a Multi-Model Semantic Information System
Part 1: Relational, XML, RDF and OWL models
Preamble – 1
• As we design and extend the NIF system, we recognize that
  • users will give us data in whatever form is convenient for them:
    • standard data may be stored in a flat file
    • web service output can be in XML
    • Semantic Web enthusiasts may represent data using proper RDF
• However, regardless of the form in which data are represented,
  • the NIF system must treat them (query, index, relate, ...) in a uniform manner
  • the NIF system must use the capabilities of the underlying systems
Preamble – 2
• In this presentation we intend to
  • explain our perspective on these different data models
  • provide background on the data models we consider
  • offer a sense of the “semantic character” of these data models
  • present our design philosophy on
    • where to keep them separate
    • where to transform them into a common model
What is a Data Model?
• A conceptual data model
  • A formal representation of the users’/application’s mental model of data elements and their relationships that should be put in a database, manipulated, queried and operated upon
• A logical data model
  • A formal description of the data model in a logical structure that a computer can use to perform queries and other operations. In many cases, the same conceptual model can be represented by different logical models
• A physical data model
  • An implementable version of the data model in terms of data structures, access structures (e.g., indices) and the set of low-level operations
A Conceptual Model
[Figure: ORM (Object-Role Modeling) notation, after Terry Halpin; labelled elements: Object, Relationship/Role, n-ary Role, Value Type, Value Constraint, Uniqueness Constraint, Inter-relationship Constraint]
A Logical Data Model
• A formal specification of
  • The structure of the data
    • The structure tells us how the data is organized, e.g., (123, “Purkinje Cell”, Cerebellum) and (828, Hippocampus, “Hilar Cell”)
    • Often the structure of the data, together with some constraints, represents some semantics
    • If the data are not structured (like free text), the techniques for handling them will be different
  • Operations on this structure
    • Every data model is based on some mathematical principles that define what you can do with the data
  • The nature of data values
    • Data domains and data types
    • Operations on data values
The Relational Data Model
Table: Neurons
NeuronID | NeuronName    | BrainRegion   | NeuroTransmitter | Current
1        | Purkinje Cell | Cerebellum    | Glutamate        | Transient Na+
2        | Hilar Cell    | Dentate Gyrus | GABA             | Ca2+
(Slide callouts: relation name = Neurons; attribute name = a column header; attribute value = a single cell, which cannot be complex; tuple = a row)
• Attribute domain: all possible values the attribute can take
• Candidate key: a minimal set of columns that uniquely determines a row
• The relational model is a set-of-tuples model (SQL systems actually use bags, i.e., multisets)
• Metadata are stored in a separate catalog, which is also relational
• Constraints are first-order
• All queries are some combination of
  • selecting rows and columns
  • combining tables by union, intersection, and join
  • computing derived values or aggregate functions
  • grouping and sorting
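As a minimal sketch of these query families, the Neurons table can be loaded into Python's built-in sqlite3 module, standing in here for any relational DBMS. The Neurons rows are copied from the slide; the Regions table and its ParentRegion values are hypothetical, added only so there is something to join against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The Neurons relation from the slide; NeuronID is the candidate (primary) key.
cur.execute("""CREATE TABLE Neurons (
    NeuronID INTEGER PRIMARY KEY,
    NeuronName TEXT NOT NULL,
    BrainRegion TEXT,
    NeuroTransmitter TEXT,
    "Current" TEXT)""")
cur.executemany("INSERT INTO Neurons VALUES (?, ?, ?, ?, ?)", [
    (1, "Purkinje Cell", "Cerebellum", "Glutamate", "Transient Na+"),
    (2, "Hilar Cell", "Dentate Gyrus", "GABA", "Ca2+"),
])

# Hypothetical second relation, added only to illustrate a join.
cur.execute("CREATE TABLE Regions (Region TEXT PRIMARY KEY, ParentRegion TEXT)")
cur.executemany("INSERT INTO Regions VALUES (?, ?)", [
    ("Cerebellum", "Hindbrain"),
    ("Dentate Gyrus", "Hippocampus"),
])

# Selection (rows) combined with projection (columns).
gaba = cur.execute(
    "SELECT NeuronName FROM Neurons WHERE NeuroTransmitter = 'GABA'").fetchall()

# Join two relations on a shared attribute value.
joined = cur.execute("""
    SELECT n.NeuronName, r.ParentRegion
    FROM Neurons AS n JOIN Regions AS r ON n.BrainRegion = r.Region
    ORDER BY n.NeuronID""").fetchall()

# Aggregation with grouping and sorting.
per_transmitter = cur.execute("""
    SELECT NeuroTransmitter, COUNT(*) FROM Neurons
    GROUP BY NeuroTransmitter ORDER BY NeuroTransmitter""").fetchall()

print(gaba)             # [('Hilar Cell',)]
print(joined)
print(per_transmitter)
```

Each result is itself a relation (a list of tuples), which is what lets relational operators be composed.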
Object Relational Model
• Eases some of the problems of the classical relational model
• Data values can be of arbitrary data types
  • Sets (e.g., multiple currents for a neuron)
  • Tuples (e.g., references ordered by year)
  • Time series (e.g., raw EEG data)
  • Spatial data (e.g., atlases in CCDB)
• Each data type can have its own operations
  • e.g., find all data points within a neighborhood of a spatial location
• Queries still operate on values
  • Catalog queries and data queries cannot be mixed in a single query
• All industrial-strength DBMSs use some version of this model
• One needs to be a skilled DB programmer to develop custom …
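A rough sketch of "each data type can have its own operations", under stated assumptions: SQLite is not an object-relational DBMS, but its Python binding can register a function with the engine via `create_function`, after which SQL can call it. The DataPoints table, its coordinates and labels, and the `distance` function are hypothetical; a point is modeled as two REAL columns rather than a true spatial type.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")


# A type-specific operation registered with the engine. An object-relational DBMS
# would attach this to a spatial data type; here a point is just two REAL columns.
def euclidean_distance(x1, y1, x2, y2):
    return math.hypot(x1 - x2, y1 - y2)


conn.create_function("distance", 4, euclidean_distance)

conn.execute("CREATE TABLE DataPoints (PointID INTEGER PRIMARY KEY, X REAL, Y REAL, Label TEXT)")
conn.executemany("INSERT INTO DataPoints VALUES (?, ?, ?, ?)", [
    (1, 0.0, 0.0, "soma"),
    (2, 3.0, 4.0, "dendrite"),
    (3, 10.0, 10.0, "axon terminal"),
])


def points_within(conn, x, y, radius):
    """Neighborhood query: labels of all points within `radius` of (x, y), nearest first."""
    rows = conn.execute(
        "SELECT Label, distance(X, Y, ?, ?) AS d FROM DataPoints "
        "WHERE distance(X, Y, ?, ?) <= ? ORDER BY d",
        (x, y, x, y, radius),
    )
    return [label for label, _ in rows]


print(points_within(conn, 0.0, 0.0, 6.0))  # ['soma', 'dendrite']
```

A production system would pair a spatial type with a spatial index (e.g., an R-tree) rather than evaluating the function on every row, as this sketch does.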
XML (Two Perspectives)
• Document Community
  • data = linear text documents
  • mark up (annotate) text pieces to describe the context, structure, and semantics of the marked text
<physiologicalCondition> Oxidative stress </physiologicalCondition> has been
proposed to be involved in the <biologicalProcess context="disease"> pathogenesis
</biologicalProcess> of <disease> Parkinson's disease</disease> (PD). A plausible
source of <physiologicalCondition> oxidative stress </physiologicalCondition> in
<brainRegion> nigral </brainRegion> <neuron> dopaminergic neurons </neuron> is
the redox reactions that specifically involve <chemical> dopamine </chemical> and
produce various <chemical context="biologicalAgent"> toxic </chemical> molecules.
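A minimal sketch of how such document-centric markup can be consumed, assuming the annotated passage is wrapped in a hypothetical `<abstract>` root so that it forms one well-formed document. It uses Python's standard xml.etree.ElementTree to group the annotated spans by tag and to recover the plain linear text.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The annotated passage from the slide, wrapped in a hypothetical <abstract> root.
ANNOTATED = """<abstract><physiologicalCondition> Oxidative stress </physiologicalCondition>
has been proposed to be involved in the <biologicalProcess context="disease">
pathogenesis </biologicalProcess> of <disease> Parkinson's disease</disease> (PD).
A plausible source of <physiologicalCondition> oxidative stress </physiologicalCondition>
in <brainRegion> nigral </brainRegion> <neuron> dopaminergic neurons </neuron> is
the redox reactions that specifically involve <chemical> dopamine </chemical> and
produce various <chemical context="biologicalAgent"> toxic </chemical> molecules.</abstract>"""


def annotations(xml_text):
    """Group marked-up text pieces by annotating tag, keeping the optional context attribute."""
    root = ET.fromstring(xml_text)
    found = defaultdict(list)
    for element in root.iter():
        if element is root:
            continue
        # element.text is the annotated span; element.tail is the unmarked prose after it.
        found[element.tag].append((element.text.strip(), element.get("context")))
    return dict(found)


def plain_text(xml_text):
    """Drop the markup and return the linear document with normalized whitespace."""
    return " ".join("".join(ET.fromstring(xml_text).itertext()).split())


ann = annotations(ANNOTATED)
print(ann["disease"])   # [("Parkinson's disease", None)]
print(ann["chemical"])  # [('dopamine', None), ('toxic', 'biologicalAgent')]
```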
XML (Two Perspectives)
• Database Community
  • XML as the most prominent example of the semi-structured data model
    => captures the whole spectrum from highly structured, regular data to unstructured data (relational, object-oriented, marked-up text, ...)
<?xml version="1.0" encoding="utf-8"?>
<NDTF_Annotation>
<description>A new annotation file </description>
<timeMarker>true</timeMarker>
<timeResolution>0.000001</timeResolution>
<interval group_id="04">
<eventNote timeOffset="1237888.230" attachedFile="sound1.wmv"
application="realplayer">Text message for the event
start.</eventNote>
<eventNote timeOffset="18958585.232">Text message for the event
end.</eventNote>
</interval>
</NDTF_Annotation>
From the CARMEN gro
XML as a Logical Data Model
• XML is a tree-structured document
  • Nodes
    • Element nodes
      • Children can be ordered
      • Recursive elements (parts under parts)
    • Attribute nodes
      • Mandatory or optional
  • Edges
    • Sub-element edges
    • Attribute edges
    • IDRef edges
  • Constraints
    • References
    • Value restrictions, OneOf
    • Cardinality
• Trees are more flexible than tables
• Any number of nodes can be added anywhere without breaking the model
XML as a Logical Data Model
• XML has its own schema language (XML Schema)
• It lets you specify a complex type system
• A database is a collection of XML trees
• Storing XML
  • Mostly in relational systems, with very clever indexing to encode the hierarchy, tree paths, and order
• Querying XML
  • Elements, attribute names, values and structure can be queried
  • Multiple trees can be joined by value
  • Example (XPath)
    • http://mousespinal.brain-map.org/imageseries/detail/100002661.xml
    • Find images of the spinal column:
      //image[.//structurelabel/text()="SPINAL COLUMN"]/ish_image_path
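The XPath example can be tried with Python's xml.etree.ElementTree, with one caveat: the real structure of the brain-map.org file is not reproduced here, so SAMPLE is a hypothetical document that only borrows the element names from the query. ElementTree's XPath subset does not allow a descendant step inside a predicate, so the test is written out in Python. The sketch also contrasts two readings of the predicate: in XPath 1.0 a path starting with // inside a predicate is evaluated from the document root, so every image matches as soon as any SPINAL COLUMN label exists anywhere, whereas .//structurelabel looks only inside the current image.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in document; element names follow the XPath, nesting is assumed.
SAMPLE = """<imageseries>
  <images>
    <image>
      <annotation><structurelabel>SPINAL COLUMN</structurelabel></annotation>
      <ish_image_path>/images/section_01.jpg</ish_image_path>
    </image>
    <image>
      <annotation><structurelabel>DORSAL HORN</structurelabel></annotation>
      <ish_image_path>/images/section_02.jpg</ish_image_path>
    </image>
  </images>
</imageseries>"""


def image_paths(xml_text, label="SPINAL COLUMN"):
    """//image[.//structurelabel/text()=label]/ish_image_path (test relative to each image)."""
    root = ET.fromstring(xml_text)
    paths = []
    for image in root.iter("image"):                              # //image
        if any((s.text or "").strip() == label
               for s in image.iter("structurelabel")):            # .//structurelabel
            paths.extend(p.text for p in image.findall("ish_image_path"))
    return paths


def image_paths_root_anchored(xml_text, label="SPINAL COLUMN"):
    """//image[//structurelabel/text()=label]/ish_image_path (test evaluated from the root)."""
    root = ET.fromstring(xml_text)
    anywhere = any((s.text or "").strip() == label for s in root.iter("structurelabel"))
    if not anywhere:
        return []
    return [p.text for image in root.iter("image") for p in image.findall("ish_image_path")]


print(image_paths(SAMPLE))                # ['/images/section_01.jpg']
print(image_paths_root_anchored(SAMPLE))  # both paths
```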
Misusing and Abusing XML
• Using XML when your data is relational
  • The result is flat trees that are awkward to query
• Encoding order and hierarchy in ways that need special parsing
<Brand_Mixtures count="2">
<Brand_Mixture_1> Apo-Levocarb (carbidopa + levodopa)
</Brand_Mixture_1>
<Brand_Mixture_2> Apo-Levocarb CR Controlled-Release Tablets
(carbidopa + levodopa) </Brand_Mixture_2>
</Brand_Mixtures>
• Using implicit multi-valuedness
<atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3">
<array dictRef="cml:calcCharge" dataType="xsd:decimal"
units="cml:electron">0.2 -0.3 0.1</array>
</atomArray>
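To make the cost of these encodings concrete, here is a sketch (Python standard library only) of the extra parsing each one forces on a consumer: order has to be recovered from numeric suffixes in element names, and per-atom records have to be rebuilt by splitting and re-aligning space-separated attribute strings. The snippets are the slide's, lightly cleaned; the function names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

BRANDS = """<Brand_Mixtures count="2">
<Brand_Mixture_1> Apo-Levocarb (carbidopa + levodopa) </Brand_Mixture_1>
<Brand_Mixture_2> Apo-Levocarb CR Controlled-Release Tablets (carbidopa + levodopa) </Brand_Mixture_2>
</Brand_Mixtures>"""

ATOMS = """<atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3">
<array dictRef="cml:calcCharge" dataType="xsd:decimal" units="cml:electron">0.2 -0.3 0.1</array>
</atomArray>"""


def ordered_brands(xml_text):
    """Order is hidden in element *names*, so it must be recovered by string parsing."""
    numbered = []
    for child in ET.fromstring(xml_text):
        match = re.fullmatch(r"Brand_Mixture_(\d+)", child.tag)
        if match:
            numbered.append((int(match.group(1)), child.text.strip()))
    return [name for _, name in sorted(numbered)]


def atoms(xml_text):
    """Multiple values are hidden in space-separated strings that must be split and re-aligned."""
    root = ET.fromstring(xml_text)
    ids = root.get("atomID").split()
    elements = root.get("elementType").split()
    hydrogens = [int(h) for h in root.get("hydrogenCount").split()]
    charges = [float(c) for c in root.find("array").text.split()]
    # zip() would silently truncate mismatched lists, so check the lengths first.
    if not len(ids) == len(elements) == len(hydrogens) == len(charges):
        raise ValueError("attribute lists have different lengths")
    return [{"id": i, "element": e, "hydrogens": h, "charge": c}
            for i, e, h, c in zip(ids, elements, hydrogens, charges)]


print(ordered_brands(BRANDS))
print(atoms(ATOMS)[1])  # {'id': 'a2', 'element': 'N', 'hydrogens': 1, 'charge': -0.3}
```

In a tree-oriented design, each mixture or atom would be its own element carrying its own attributes, and none of this string handling would be needed.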
Expressing Semantics in XML
• Adorning elements with namespaces
  • A namespace is identified by a unique URI (Uniform Resource Identifier)
  • Used to disambiguate between two elements that happen to share the same name
  • Used to group elements relating to a common idea together
<item xmlns:bp="http://www.biopax.org/release/biopax-level1.owl#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<bp:protein ID="Protein1">
<bp:NAME>Metalloelastase</bp:NAME>
<bp:XREF>
<bp:unificationXref rdf:ID="Xref1">
<bp:ID>NP_304845</bp:ID>
<bp:DB>RefSeq</bp:DB>
</bp:unificationXref>
</bp:XREF>
</bp:protein>
</item>
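A minimal sketch of namespace handling with Python's xml.etree.ElementTree, using the BioPAX fragment with the rdf prefix declared and `<item>` closed so that it parses. The parser replaces each prefix with its namespace URI, so a query may use any local prefix (here the hypothetical `biopax`) as long as it maps to the same URI.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level1.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

ITEM = f"""<item xmlns:bp="{BP}" xmlns:rdf="{RDF}">
<bp:protein ID="Protein1">
<bp:NAME>Metalloelastase</bp:NAME>
<bp:XREF>
<bp:unificationXref rdf:ID="Xref1">
<bp:ID>NP_304845</bp:ID>
<bp:DB>RefSeq</bp:DB>
</bp:unificationXref>
</bp:XREF>
</bp:protein>
</item>"""

root = ET.fromstring(ITEM)

# After parsing, names carry the full namespace URI ("{uri}local"), not the prefix.
protein = root.find(f"{{{BP}}}protein")
print(protein.tag)  # {http://www.biopax.org/release/biopax-level1.owl#}protein

# A query may use a different prefix, as long as it maps to the same URI.
ns = {"biopax": BP, "rdf": RDF}
name = root.findtext("biopax:protein/biopax:NAME", namespaces=ns)
xref = root.find(".//biopax:unificationXref", ns)
print(name, xref.get(f"{{{RDF}}}ID"), xref.findtext("biopax:ID", namespaces=ns))
# Metalloelastase Xref1 NP_304845
```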
The Problem with XML Semantics
• Two different XML representations of the same kind of information may not be easily unifiable
• What did XML not encode?
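Resource Description Framework (RDF)
[Figure: a reified RDF statement. A resource of type rdf:Statement has rdf:subject URI(CNTFR-…), rdf:predicate URI(modulates), and rdf:object URI(eSNCA-mediated neurotoxicity). The subject has rdf:type URI(membrane-protein), the object has rdf:type URI(protein-mediated toxicity), and the predicate has rdf:type rdf:Property.]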
The Basic Constructs of RDF
• RDF meta-model basic elements
  • Defined in the rdf namespace, http://www.w3.org/1999/02/22-rdf-syntax-ns# (except rdfs:Resource, which belongs to RDF Schema)
  • Types (or classes)
    • rdfs:Resource – everything that can be identified (with a URI)
    • rdf:Property – a specialization of resource expressing a binary relation between two resources
    • rdf:Statement – a triple with the properties rdf:subject, rdf:predicate, rdf:object
  • Properties
    • rdf:type – the subject is an instance of the category or class given by the value
    • rdf:subject, rdf:predicate, rdf:object – relate a resource of type rdf:Statement to the elements of the triple it describes
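A sketch of the reification vocabulary applied to the statement in the RDF figure, with triples modeled as plain Python tuples of URI strings. The `http://example.org/nif#` namespace and the local names (including `CNTFR`, since the figure's label is truncated) are hypothetical.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/nif#"  # hypothetical namespace for the figure's resources


def rdf(local):
    return RDF + local


def ex(local):
    return EX + local


_blank_ids = itertools.count(1)


def reify(triple):
    """Describe `triple` with a fresh blank node of type rdf:Statement."""
    s, p, o = triple
    node = f"_:stmt{next(_blank_ids)}"
    return node, {
        (node, rdf("type"), rdf("Statement")),
        (node, rdf("subject"), s),
        (node, rdf("predicate"), p),
        (node, rdf("object"), o),
    }


def value(graph, subject, predicate):
    """The single object of the pattern (subject, predicate, ?) in `graph`."""
    (obj,) = {o for s, p, o in graph if s == subject and p == predicate}
    return obj


statement = (ex("CNTFR"), ex("modulates"), ex("eSNCA-mediated_neurotoxicity"))
node, graph = reify(statement)

# rdf:type triples from the figure.
graph |= {
    (ex("CNTFR"), rdf("type"), ex("membrane-protein")),
    (ex("modulates"), rdf("type"), rdf("Property")),
    (ex("eSNCA-mediated_neurotoxicity"), rdf("type"), ex("protein-mediated_toxicity")),
}

rebuilt = tuple(value(graph, node, rdf(role)) for role in ("subject", "predicate", "object"))
print(rebuilt == statement)  # True
print(statement in graph)    # False
```

Reifying a statement only describes it; the last check shows that the original triple is not itself asserted in the graph.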
Relational Data vis-à-vis RDF
• The node-to-edge ratio is relatively small in many applications
• The number of relationships need not be fixed at design time
• The general tendency is to keep the number of edge labels small
• Graph-based operations (e.g., path queries) can be performed on RDF; on relational data they would require an unspecified number of joins
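A small illustration of the last point, assuming a hypothetical set of triples with mixed edge labels: a breadth-first search finds a path of any length between two nodes, whereas a fixed SQL query over a triples table needs one self-join per hop and so must know the path length in advance (standard SQL later added recursive queries, WITH RECURSIVE, for this case).

```python
from collections import deque

# Hypothetical triples; note that the edge labels are deliberately mixed.
TRIPLES = [
    ("PurkinjeCell", "located_in", "CerebellarCortex"),
    ("CerebellarCortex", "part_of", "Cerebellum"),
    ("Cerebellum", "part_of", "Hindbrain"),
    ("Hindbrain", "part_of", "Brain"),
    ("HilarCell", "located_in", "DentateGyrus"),
]


def find_path(triples, start, goal):
    """Breadth-first search over edges of any label; returns the shortest path as triples."""
    out_edges = {}
    for s, p, o in triples:
        out_edges.setdefault(s, []).append((p, o))
    previous = {start: None}  # node -> (parent, edge label) on the BFS tree
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while previous[node] is not None:
                parent, label = previous[node]
                path.append((parent, label, node))
                node = parent
            return list(reversed(path))
        for label, neighbour in out_edges.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (node, label)
                queue.append(neighbour)
    return None  # goal not reachable


for hop in find_path(TRIPLES, "PurkinjeCell", "Brain"):
    print(hop)
```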
RDF Blank Nodes
• RDF allows one to create anonymous objects whose existence is known but whose details are not
  • e.g., there exists some neuron to which both NeuronX and NeuronY connect
<neurons:NeuronX rdf:about="http://neurons.org/Neuron#NeuronX">
  <conn:connectsTo>
    <neurons:Neuron rdf:nodeID="n1"/>
  </conn:connectsTo>
</neurons:NeuronX>
<neurons:NeuronY rdf:about="http://neurons.org/Neuron#NeuronY">
  <conn:connectsTo>
    <neurons:Neuron rdf:nodeID="n1"/>
  </conn:connectsTo>
</neurons:NeuronY>
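A sketch of what the blank node means, with the RDF/XML rewritten as Python tuples using hypothetical short names: `_:n1` asserts that some neuron exists without naming it, and matching the pattern "NeuronX connectsTo ?z and NeuronY connectsTo ?z" returns that anonymous node. The NeuronZ triple is a hypothetical addition that shows an unshared target being filtered out.

```python
# "_:n1" is a blank node: it asserts that *some* neuron exists without naming it.
GRAPH = {
    ("NeuronX", "connectsTo", "_:n1"),
    ("NeuronY", "connectsTo", "_:n1"),
    ("_:n1", "rdf:type", "Neuron"),
    ("NeuronX", "connectsTo", "NeuronZ"),  # hypothetical, not shared with NeuronY
}


def is_blank(term):
    return term.startswith("_:")


def common_targets(graph, a, b):
    """Solutions for ?z in the pattern { a connectsTo ?z . b connectsTo ?z }."""
    targets_a = {o for s, p, o in graph if s == a and p == "connectsTo"}
    targets_b = {o for s, p, o in graph if s == b and p == "connectsTo"}
    return targets_a & targets_b


shared = common_targets(GRAPH, "NeuronX", "NeuronY")
print(shared)                            # {'_:n1'}
print(all(is_blank(t) for t in shared))  # True: the shared neuron is anonymous
```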
RDF Schema
• Declaration of vocabularies
  • Classes, properties, and relationships defined by a particular community
  • rdfs:Class, rdfs:subClassOf
• Property-related
  • rdfs:subPropertyOf
  • Relationship of properties to classes: rdfs:domain, rdfs:range
• Provides a basis for inferring new triples from existing ones
• NOT prescriptive, but descriptive
  • This is different from XML Schema, which validates documents
• The schema language is itself expressed in the basic RDF model
  • It uses the meta-model constructs: resources, statements, properties
Examples of RDF Inferencing
• From the premise triples shown on the slide [not recoverable], we can infer
  • (:alice rdf:type :parent)
  • (:betty rdf:type :parent)
  • (:eve rdf:type :female-person)
  • (:charles rdf:type :person)
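Because the slide's premises are not preserved, the sketch below uses hypothetical premise triples chosen so that four RDFS entailment rules (rdfs2 for domain, rdfs3 for range, rdfs9 and rdfs11 for subclasses) derive the four conclusions listed. It is a naive fixpoint loop over Python tuples, not a production reasoner.

```python
TYPE, DOMAIN, RANGE, SUBCLASS = "rdf:type", "rdfs:domain", "rdfs:range", "rdfs:subClassOf"

# Hypothetical premises; the slide's own premise triples are not available.
PREMISES = {
    (":hasChild", DOMAIN, ":parent"),
    (":hasChild", RANGE, ":person"),
    (":parent", SUBCLASS, ":person"),
    (":mother", SUBCLASS, ":female-person"),
    (":female-person", SUBCLASS, ":person"),
    (":alice", ":hasChild", ":betty"),
    (":betty", ":hasChild", ":charles"),
    (":eve", TYPE, ":mother"),
}


def rdfs_closure(triples):
    """Apply four RDFS entailment rules until no new triple appears (a fixpoint)."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if p2 == DOMAIN and s2 == p:                     # rdfs2: domain
                    new.add((s, TYPE, o2))
                if p2 == RANGE and s2 == p:                      # rdfs3: range
                    new.add((o, TYPE, o2))
                if p == TYPE and p2 == SUBCLASS and s2 == o:     # rdfs9: type via subclass
                    new.add((s, TYPE, o2))
                if p == SUBCLASS and p2 == SUBCLASS and s2 == o: # rdfs11: subclass transitivity
                    new.add((s, SUBCLASS, o2))
        if new <= graph:
            return graph
        graph |= new


inferred = rdfs_closure(PREMISES) - PREMISES
for triple in sorted(t for t in inferred if t[1] == TYPE):
    print(triple)
```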
RDF as a Logical Data Model
• RDF does not distinguish between different kinds of relationships
  • Instance-to-type
  • Instance-to-instance
  • Type-to-instance
• No transitivity inference is possible over, say, rdf:type
• RDF (as well as XML) has lost the notion of abstract data types such as spatial objects or time
  • Operations on object types do not mix well with RDF
• Constraints such as uniqueness or 1-to-1 relationships cannot be expressed
• SPARQL, the query language for RDF, is
  • An edge-only language – it cannot express the // construct of XML (SPARQL 1.1 later added property paths for this)
  • Blank nodes in a query act as variables whose bindings are not returned in the results
  • Parts of the language are undecidable!
(A problem is undecidable if it can be proved that there can be no algorithm that always terminates with a correct answer to it.)
OWL
• Components of an OWL ontology
  • Vocabulary (concepts)
  • Structure (attributes of concepts and hierarchy)
  • Concept-to-concept, concept-to-data, and property-to-property relationships
• Logical characteristics of relationships
  • Domain and range restrictions
  • Properties of relations (symmetry, transitivity)
  • Cardinality of relations
• Open-world vs. closed-world assumption
  • OWL assumes an open world, in contrast to most reasoning systems, which assume that anything absent from the knowledge base is false
  • Need to maintain monotonicity with tolerance for contradictions
• OWL classes [figure: the class of all classes]
Basic OWL Constructs
• Creating OWL classes
  • disjointWith
    • Neurons are not glial cells
  • equivalentClass (sameClassAs in DAML+OIL)
    • Class GABAergic neuron is exactly the same class as neurons that have GABA as their neurotransmitter
  • Enumerations (oneOf, on instances)
    • Class cerebellar lobules consists of Lobule I, Lobule II, …
  • Boolean set semantics (on classes)
    • Union (logical disjunction)
      • Class nerve cell is the union of neuron and glial cell
    • Intersection (logical conjunction of a class with properties)
      • Class hippocampal neuron is the conjunction of things of class Neuron and things that have the property has-soma-located-in with a value in (hippocampus union any class that is part-of hippocampus)
    • complementOf (logical negation)
      • Class ‘benign tumor’ is the complement of class ‘malignant tumor’ (strictly, relative to the class of tumors)
Properties of OWL Properties
• TransitiveProperty: $P(x,y) \land P(y,z) \Rightarrow P(x,z)$; e.g., subclassOf
• SymmetricProperty: $P(x,y) \Leftrightarrow P(y,x)$; e.g., is_functionally_related_to
• FunctionalProperty: $P(x,y) \land P(x,z) \Rightarrow y = z$; e.g., soma_located_in
• inverseOf: $P_1(x,y) \Leftrightarrow P_2(y,x)$; e.g., regulates / is_regulated_by
• InverseFunctionalProperty: $P(y,x) \land P(z,x) \Rightarrow y = z$; e.g., is_isoform_of
• Cardinality
  • Only 0 or 1 in OWL Lite; any non-negative integer in OWL DL and OWL Full
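A sketch of what each characteristic licenses, treating a property extensionally as a set of (x, y) pairs; the sample pairs are hypothetical. It ignores OWL's open-world semantics except in one place: for a FunctionalProperty, two distinct values do not by themselves signal an error, because OWL makes no unique-name assumption, so a reasoner such as Pellet would instead infer that the values denote the same individual (unless they are declared different).

```python
from itertools import product


def transitive_closure(pairs):
    """TransitiveProperty: P(x,y) and P(y,z) imply P(x,z)."""
    closure = set(pairs)
    while True:
        new = {(x, w) for (x, y), (z, w) in product(closure, closure) if y == z}
        if new <= closure:
            return closure
        closure |= new


def symmetric_closure(pairs):
    """SymmetricProperty: P(x,y) iff P(y,x)."""
    return set(pairs) | {(y, x) for x, y in pairs}


def inverse_of(pairs):
    """inverseOf: P2(y,x) holds exactly when P1(x,y) does."""
    return {(y, x) for x, y in pairs}


def functional_same_as(pairs):
    """FunctionalProperty: P(x,y) and P(x,z) imply y = z.

    Returns, per subject, the set of values a reasoner would conclude denote one individual.
    """
    values = {}
    for x, y in pairs:
        values.setdefault(x, set()).add(y)
    return {x: ys for x, ys in values.items() if len(ys) > 1}


# Hypothetical sample data.
subclass_of = {("BasketCell", "Interneuron"), ("Interneuron", "Neuron")}
related = {("GeneA", "GeneB")}
regulates = {("GeneA", "GeneC")}
soma_located_in = {("neuron1", "regionA"), ("neuron1", "regionB"), ("neuron2", "regionA")}

print(sorted(transitive_closure(subclass_of)))
print(sorted(symmetric_closure(related)))
print(inverse_of(regulates))  # the is_regulated_by pairs
print(functional_same_as(soma_located_in))
```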
Instances in OWL
• Instances are distinct from classes
• In RDF there is no distinction between classes and instances; the following is allowed in RDF:
  • <Species, type, Class>
  • <Lion, type, Species>
  • <MyLion, type, Lion>
• OWL DL restrictions
  • Type separation
    • A class cannot also be an individual or a property
    • A property cannot also be an individual or a class
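A minimal sketch of checking type separation on the slide's three triples, written as Python tuples; the function name and the set of built-in names are hypothetical. Lion is flagged because it is the object of one rdf:type triple (so it is used as a class) and the subject of another (so it is used as an individual).

```python
TYPE = "rdf:type"
BUILT_INS = {"rdfs:Class", "owl:Class"}  # built-in vocabulary, excluded from the check

TRIPLES = {
    ("Species", TYPE, "rdfs:Class"),
    ("Lion", TYPE, "Species"),
    ("MyLion", TYPE, "Lion"),
}


def type_separation_conflicts(triples):
    """Names used both as a class and as an individual (not allowed in OWL DL)."""
    classes = {o for s, p, o in triples if p == TYPE} - BUILT_INS
    individuals = {s for s, p, o in triples if p == TYPE and o not in BUILT_INS}
    return classes & individuals


print(type_separation_conflicts(TRIPLES))  # {'Lion'}
```

OWL 2 later relaxed this restriction through "punning", which lets the same name denote both a class and an individual.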
A Rough Comparison
[Comparison table: not recoverable from the slide]
RDF and OWL do not represent n-ary roles
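One common workaround, described in the W3C Working Group Note on defining n-ary relations on the Semantic Web, introduces an intermediate node for each n-ary tuple and attaches one binary property per role. The sketch applies it to the first row of the Neurons table from the relational slide; the NeuronRecord class and the property names are hypothetical.

```python
import itertools

_counter = itertools.count(1)


def nary_to_triples(relation, row, attributes):
    """Replace one n-ary tuple by a fresh node plus one binary edge per role."""
    node = f"_:{relation}{next(_counter)}"
    triples = [(node, "rdf:type", relation)]
    triples += [(node, attr, value) for attr, value in zip(attributes, row)]
    return node, triples


def triples_to_row(triples, node, attributes):
    """Reassemble the original tuple from the intermediate node's edges."""
    values = {p: o for s, p, o in triples if s == node}
    return tuple(values[a] for a in attributes)


ATTRIBUTES = ("hasNeuronID", "hasNeuronName", "inBrainRegion",
              "usesNeuroTransmitter", "hasCurrent")
ROW = (1, "Purkinje Cell", "Cerebellum", "Glutamate", "Transient Na+")

node, triples = nary_to_triples("NeuronRecord", ROW, ATTRIBUTES)
for t in triples:
    print(t)
```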
Querying OWL
• There are several languages in the making
  • SPARQL engines (e.g., Virtuoso) are often used
  • Pellet is used for reasoning tasks
    • Subsumption
    • Consistency
  • New, more advanced languages such as nSPARQL are coming up
  • vSPARQL is being developed to enable views on SPARQL, which will lead to nested SPARQL queries
• Our goal
  • Develop a query processor for these advanced languages
  • Part of OntoQuest, our ontological information …
Where does NIF stand in this?
• Not every model is directly inter-convertible with every other model
• NIF is designed to
  • Work with multiple models
    • Ensure that the modeling and query capabilities of every model are preserved in their native form
    • Queries in our system get translated to queries in the native forms of the databases we federate
  • Express the local semantics of any data appropriately by
    • Augmenting the semantic model of the data
    • Connecting the data to NIF’s ontology
    • Extending the NIF ontology in the process
  • Develop a mechanism to create a common integrated model over these models
    • This model is an ontological graph that incorporates object and temporal semantics
Example of an Ontological Extension
• Representing time and events
  • Phenotypes, physiology, …
  • Instants, intervals, and periods
  • Temporal granularity of observation
• Events
  • Multi-temporal observations based on conditions on properties
  • Modeling states, objects in state, and state transitions
  • One-only, repeatable, and time-deictic events
  • Subevents
• History of objects, events, roles
  • Subtype migration, temporal roles, and role migration
  • Progression of disease, symptom, or recovery states
  • Repeatability
(Considering TOWL and Temporal ORM)
Questions?

More Related Content

What's hot

Ijarcet vol-2-issue-2-676-678
Ijarcet vol-2-issue-2-676-678Ijarcet vol-2-issue-2-676-678
Ijarcet vol-2-issue-2-676-678
Editor IJARCET
 
Deductive Databases Presentation
Deductive Databases PresentationDeductive Databases Presentation
Deductive Databases Presentation
Maroun Baydoun
 

What's hot (19)

Oodbms ch 20
Oodbms ch 20Oodbms ch 20
Oodbms ch 20
 
Odbms concepts
Odbms conceptsOdbms concepts
Odbms concepts
 
Ordbms
OrdbmsOrdbms
Ordbms
 
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
 
Chapt 1 odbms
Chapt 1 odbmsChapt 1 odbms
Chapt 1 odbms
 
Ontology
OntologyOntology
Ontology
 
Ontology-based Data Integration
Ontology-based Data IntegrationOntology-based Data Integration
Ontology-based Data Integration
 
Object Oriented Database Management System
Object Oriented Database Management SystemObject Oriented Database Management System
Object Oriented Database Management System
 
ICS Part 2 Computer Science Short Notes
ICS Part 2 Computer Science Short NotesICS Part 2 Computer Science Short Notes
ICS Part 2 Computer Science Short Notes
 
Ijarcet vol-2-issue-2-676-678
Ijarcet vol-2-issue-2-676-678Ijarcet vol-2-issue-2-676-678
Ijarcet vol-2-issue-2-676-678
 
Database management system chapter5
Database management system chapter5Database management system chapter5
Database management system chapter5
 
Ontology For Data Integration
Ontology For Data IntegrationOntology For Data Integration
Ontology For Data Integration
 
Deductive Databases Presentation
Deductive Databases PresentationDeductive Databases Presentation
Deductive Databases Presentation
 
Journalism and the Semantic Web
Journalism and the Semantic WebJournalism and the Semantic Web
Journalism and the Semantic Web
 
RESTful Services
RESTful ServicesRESTful Services
RESTful Services
 
Semantics 101
Semantics 101Semantics 101
Semantics 101
 
Database system
Database system Database system
Database system
 
Semantic Web Nature
Semantic Web NatureSemantic Web Nature
Semantic Web Nature
 
Object oriented database concepts
Object oriented database conceptsObject oriented database concepts
Object oriented database concepts
 

Similar to NIF as a Multi-Model Semantic Information System

Dbms Lec Uog 02
Dbms Lec Uog 02Dbms Lec Uog 02
Dbms Lec Uog 02
smelltulip
 
Database Concepts & SQL(1).pdf
Database Concepts & SQL(1).pdfDatabase Concepts & SQL(1).pdf
Database Concepts & SQL(1).pdf
rsujeet169
 
Part2- The Atomic Information Resource
Part2- The Atomic Information ResourcePart2- The Atomic Information Resource
Part2- The Atomic Information Resource
JEAN-MICHEL LETENNIER
 

Similar to NIF as a Multi-Model Semantic Information System (20)

Dbms Lec Uog 02
Dbms Lec Uog 02Dbms Lec Uog 02
Dbms Lec Uog 02
 
Presentation1
Presentation1Presentation1
Presentation1
 
Expression of Query in XML object-oriented database
Expression of Query in XML object-oriented databaseExpression of Query in XML object-oriented database
Expression of Query in XML object-oriented database
 
Expression of Query in XML object-oriented database
Expression of Query in XML object-oriented databaseExpression of Query in XML object-oriented database
Expression of Query in XML object-oriented database
 
COMPUTERS Database
COMPUTERS Database COMPUTERS Database
COMPUTERS Database
 
DatabaseManagementSystem.pptx
DatabaseManagementSystem.pptxDatabaseManagementSystem.pptx
DatabaseManagementSystem.pptx
 
Database Concepts & SQL(1).pdf
Database Concepts & SQL(1).pdfDatabase Concepts & SQL(1).pdf
Database Concepts & SQL(1).pdf
 
Part2- The Atomic Information Resource
Part2- The Atomic Information ResourcePart2- The Atomic Information Resource
Part2- The Atomic Information Resource
 
Database systems Handbook by Muhammad Sharif.pdf
Database systems Handbook by Muhammad Sharif.pdfDatabase systems Handbook by Muhammad Sharif.pdf
Database systems Handbook by Muhammad Sharif.pdf
 
Database systems Handbook by Muhammad Sharif.pdf
Database systems Handbook by Muhammad Sharif.pdfDatabase systems Handbook by Muhammad Sharif.pdf
Database systems Handbook by Muhammad Sharif.pdf
 
Database systems Handbook by Muhammad Sharif.pdf
Database systems Handbook by Muhammad Sharif.pdfDatabase systems Handbook by Muhammad Sharif.pdf
Database systems Handbook by Muhammad Sharif.pdf
 
Database systems Handbook by Muhammad Sharif.pdf
Database systems Handbook by Muhammad Sharif.pdfDatabase systems Handbook by Muhammad Sharif.pdf
Database systems Handbook by Muhammad Sharif.pdf
 
Database systems Handbook by Muhammad Sharif.pdf
Database systems Handbook by Muhammad Sharif.pdfDatabase systems Handbook by Muhammad Sharif.pdf
Database systems Handbook by Muhammad Sharif.pdf
 
Database systems Handbook.pdf
Database systems Handbook.pdfDatabase systems Handbook.pdf
Database systems Handbook.pdf
 
Database systems Handbook.pdf
Database systems Handbook.pdfDatabase systems Handbook.pdf
Database systems Handbook.pdf
 
Database systems Handbook.pdf
Database systems Handbook.pdfDatabase systems Handbook.pdf
Database systems Handbook.pdf
 
Space efficient structures for json documents
Space efficient structures for json documentsSpace efficient structures for json documents
Space efficient structures for json documents
 
Presentation
PresentationPresentation
Presentation
 
Spatial Database and Database Management System
Spatial Database and Database Management SystemSpatial Database and Database Management System
Spatial Database and Database Management System
 
2. Chapter Two.pdf
2. Chapter Two.pdf2. Chapter Two.pdf
2. Chapter Two.pdf
 

More from Neuroscience Information Framework

More from Neuroscience Information Framework (20)

Why should my institution support RRIDs?
Why should my institution support RRIDs?Why should my institution support RRIDs?
Why should my institution support RRIDs?
 
Why should Journals ask fo RRIDs?
Why should Journals ask fo RRIDs?Why should Journals ask fo RRIDs?
Why should Journals ask fo RRIDs?
 
Funders and RRIDs
Funders and RRIDsFunders and RRIDs
Funders and RRIDs
 
Neuroscience as networked science
Neuroscience as networked scienceNeuroscience as networked science
Neuroscience as networked science
 
Martone acs presentation
Martone acs presentationMartone acs presentation
Martone acs presentation
 
Data Landscapes - Addiction
Data Landscapes - AddictionData Landscapes - Addiction
Data Landscapes - Addiction
 
INCF 2013 - Uniform Resource Layer
INCF 2013 - Uniform Resource LayerINCF 2013 - Uniform Resource Layer
INCF 2013 - Uniform Resource Layer
 
Neurosciences Information Framework (NIF): An example of community Cyberinfra...
Neurosciences Information Framework (NIF): An example of community Cyberinfra...Neurosciences Information Framework (NIF): An example of community Cyberinfra...
Neurosciences Information Framework (NIF): An example of community Cyberinfra...
 
The Neuroscience Information Framework: A Scalable Platform for Information E...
The Neuroscience Information Framework: A Scalable Platform for Information E...The Neuroscience Information Framework: A Scalable Platform for Information E...
The Neuroscience Information Framework: A Scalable Platform for Information E...
 
The Uniform Resource Layer
The Uniform Resource LayerThe Uniform Resource Layer
The Uniform Resource Layer
 
NIF services overview
NIF services overviewNIF services overview
NIF services overview
 
NIF Lexical Overview
NIF Lexical OverviewNIF Lexical Overview
NIF Lexical Overview
 
NIF Services
NIF ServicesNIF Services
NIF Services
 
NIF Data Registration
NIF Data RegistrationNIF Data Registration
NIF Data Registration
 
NIF Data Ingest
NIF Data IngestNIF Data Ingest
NIF Data Ingest
 
NIF Data Federation
NIF Data FederationNIF Data Federation
NIF Data Federation
 
NIF Overview
NIF Overview NIF Overview
NIF Overview
 
A Deep Survey of the Digital Resource Landscape
A Deep Survey of the Digital Resource LandscapeA Deep Survey of the Digital Resource Landscape
A Deep Survey of the Digital Resource Landscape
 
The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework
 
NIF: A vision for a uniform resource layer
NIF: A vision for a uniform resource layerNIF: A vision for a uniform resource layer
NIF: A vision for a uniform resource layer
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

NIF as a Multi-Model Semantic Information System

  • 1. Amarnath Gupta University of California San Diego NIF as a Multi-Model Semantic Information System Part 1: Relational, XML, RDF and OWL models
  • 2. Preamble – 1  As we design and extend the NIF system we recognize that  Users will give us data in any form that is convenient for them  Standard data may be stored in a flat file  Web service output can be in XML  Semantic Web enthusiasts may represent data using proper RDF  However, regardless of the form in which data may be represented  The NIF system must treat them (query, index, relate, ...) in a uniform manner  The NIF system must utilize the underlying systems
  • 3. Preamble – 2  In this presentation we intend to  Explain our perspective on these different data models  Provide a background on the data models we consider  Offer a sense of the “semantic character” of these data models  Present our design philosophy on  Where to keep them separate  Where to transform them into a common model
  • 4. What is a Data Model?  A conceptual data model  A formal representation of the users’/application’s mental model of data elements and their relationships that should be put in a database, manipulated, queried and operated upon  A logical data model  A formal description of the data model in a logical structure that a computer can use to perform the queries and other operations. In many cases, the same conceptual model can be represented by different logical models  A physical data model  An implementable version of the data model in terms of data structures, access structures (e.g., indices) and the set of low-level operations
  • 5. A Conceptual Model ORM Model – Terry Halpin Object Relationship/ Role Value Constraint Uniqueness Constraint Inter-relationship Constraint Value Type n-ary Role
  • 6. A Logical Data Model  A formal specification of  The structure of the data  The structure tells us how the data is organized (123, “Purkinje Cell”, Cerebellum) (828, Hippocampus, “Hilar Cell”)  Often the structure of the data, together with some constraints, represent some semantics  If the data are not structured (like free text), the techniques for handling them will be different.  Operations on this structure  Every data model is based on some mathematical principles that define what you can do with the data  the nature of data values  Data domains and data types  operations on data values is not structured
  • 7. The Relational Data Model NeuronID NeuronName BrainRegion NeuroTransmitte r Current 1 Purkinje Cell Cerebellum Glutamate Transient Na+ 2 Hilar Cell Dentate Gyrus GABA Ca2+  Attribute Domain all possible values the attribute can take  Candidate key: a set of columns that uniquely determines a row  Relational model is a set (bag) of tuples model  Metadata stored in a separate catalog which is also relational  First order constraints  All queries are about some combination of  Selecting rows, columns  Combining tables by union, intersection, join  Computing data or aggregate functions  Grouping and sorting Table: Neurons Attribute name Attribute value: Cannot be compl Relation name Tuple
  • 8. Object Relational Model
    • Eases some of the problems of the classical relational model
    • Data values can be of arbitrary data types
      • Sets (e.g., multiple currents for a neuron)
      • Tuples (e.g., references ordered by year)
      • Time-series (e.g., raw EEG data)
      • Spatial data (e.g., atlases in CCDB)
    • Each data type can have its own operations
      • e.g., find all data points within a neighborhood of a spatial location
    • Queries are still value-based
      • Catalog queries and data queries cannot be mixed in a single query
    • All industrial-strength DBMSs use some version of this model
    • One needs to be a skilled DB programmer to develop custom …
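SQLite is not an object-relational DBMS, so this Python sketch only approximates "each data type can have its own operations". It registers a hypothetical spatial predicate, `within_radius`, as a user-defined SQL function through `sqlite3.Connection.create_function`, and then calls it inside an ordinary query. A real object-relational system would attach such an operation to a spatial data type, and it would usually back the operation with a spatial index.

```python
import math
import sqlite3

def within_radius(x, y, cx, cy, r):
    """Hypothetical spatial operation: 1 if (x, y) lies within distance r of (cx, cy)."""
    return 1 if math.hypot(x - cx, y - cy) <= r else 0

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it like a built-in (5 arguments).
conn.create_function("within_radius", 5, within_radius)
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany("INSERT INTO points (x, y) VALUES (?, ?)",
                 [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)])

# Find all data points within a neighborhood (radius 5) of the origin.
near = [row[0] for row in conn.execute(
    "SELECT id FROM points WHERE within_radius(x, y, 0, 0, 5) ORDER BY id")]
print(near)  # [1, 2]
```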
  • 9. XML (Two Perspectives)  Document Community  data = linear text documents  mark up (annotate) text pieces to describe context, structure, semantics of the marked text <physiologicalCondition> Oxidative stress </physiologicalCondition> has been proposed to be involved in the <biologicalProcess context=“disease”> pathogenesis </biologicalProcess> of <disease> Parkinson's disease</disease> (PD). A plausible source of <physiologicalCondition> oxidative stress </physiologicalCondition> in <brainRegion> nigral </brainRegion> <neuron> dopaminergic neurons </neuron> is the redox reactions that specifically involve <chemical> dopamine </chemical> and produce various <chemical context=“biologicalAgent”> toxic </chemical> molecules.
  • 10. XML (Two Perspectives)  Database Community  XML as a (most prominent) example of the semi- structured data model => captures the whole spectrum from highly structured, regular data to unstructured data (relational, object-oriented, marked up text, ...)<?xml version="1.0" encoding="utf-8"?> <NDTF_Annotation> <description>A new annotation file </description> <timeMarker>true</timeMarker> <timeResolution>0.000001</timeResolution> <interval group_id="04"> <eventNote timeOffset="1237888.230” attachedFile="sound1.wmv” application="realplayer">Text message for the event start.</eventNote> <eventNote timeOffset="18958585.232">Text message for the event end.</eventNote> </interval> From the CARMEN gro
  • 11. XML as a Logical Data Model  XML is a tree-structured document  Nodes  Element nodes  Children can be ordered  Recursive elements (parts under parts)  Attribute nodes  Mandatory or optional  Edges  Sub-element edges  Attribute edges  IDRef edges  Constraints  References  Value restrictions, OneOf  Cardinality • Trees are more flexible than tables • Any number of nodes can be added anywhere without breaking the model
  • 12. XML as a Logical Data Model
    • XML has its own schema language (XML Schema)
      • Lets you specify a complex type system
    • A database is a collection of XML trees
    • Storing XML
      • Mostly relational, with some very clever indexing to encode the hierarchy, tree paths, and order
    • Querying XML
      • Elements, attribute names, values and structure can be queried
      • Multiple trees can be joined by value
    • Example (XPath) over http://mousespinal.brain-map.org/imageseries/detail/100002661.xml
      • Find images of the spinal column:
        //image[.//structurelabel/text()="SPINAL COLUMN"]/ish_image_path
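The XPath query above can be evaluated with Python's `xml.etree.ElementTree`, with one caveat. Predicates in ElementTree's XPath subset only test immediate children, so the descendant test `.//structurelabel` is written here as an explicit loop. The document below is a hypothetical stand-in, and the real image-series schema may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an image-series document; the real schema may differ.
doc = """<image-series>
  <images>
    <image><structurelabel>SPINAL COLUMN</structurelabel>
           <ish_image_path>/img/1.jp2</ish_image_path></image>
    <image><structurelabel>DORSAL HORN</structurelabel>
           <ish_image_path>/img/2.jp2</ish_image_path></image>
  </images>
</image-series>"""

root = ET.fromstring(doc)
# Equivalent of //image[.//structurelabel/text()="SPINAL COLUMN"]/ish_image_path
paths = [
    img.findtext("ish_image_path")
    for img in root.iter("image")
    if any(lbl.text == "SPINAL COLUMN" for lbl in img.iter("structurelabel"))
]
print(paths)  # ['/img/1.jp2']
```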
  • 13. Misusing and Abusing XML
    • Using XML when your data is relational
      • It results in flat trees that make querying unnecessarily complex
    • Encoding order and hierarchy in ways that need special parsing
        <Brand_Mixtures count="2">
          <Brand_Mixture_1> Apo-Levocarb (carbidopa + levodopa) </Brand_Mixture_1>
          <Brand_Mixture_2> Apo-Levocarb CR Controlled-Release Tablets (carbidopa + levodopa) </Brand_Mixture_2>
        </Brand_Mixtures>
    • Using implicit multi-valuedness
        <atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3">
          <array dictRef="cml:calcCharge" dataType="xsd:decimal" units="cml:electron">0.2 -0.3 0.1</array>
        </atomArray>
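The following Python sketch shows the extra parsing that implicit multi-valuedness forces on a consumer. The parser returns each packed attribute as a single string, so the application must split every list and align the lists by position. That positional link between the lists exists only by convention, outside anything XML itself records. The per-atom tuple layout used here is an assumption.

```python
import xml.etree.ElementTree as ET

# The slide's atomArray fragment: several values are packed into one attribute.
doc = """<atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3">
  <array dictRef="cml:calcCharge" dataType="xsd:decimal"
         units="cml:electron">0.2 -0.3 0.1</array>
</atomArray>"""

root = ET.fromstring(doc)
ids = root.get("atomID").split()
elements = root.get("elementType").split()
hydrogens = [int(h) for h in root.get("hydrogenCount").split()]
charges = [float(c) for c in root.findtext("array").split()]

# Position i in every list describes atom i; the parser knows nothing about this.
atoms = list(zip(ids, elements, hydrogens, charges))
print(atoms)
```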
  • 14. Expressing Semantics in XML
    • Adorning elements with namespaces
      • A namespace is identified by a unique URI (Uniform Resource Identifier)
      • To disambiguate between two elements that happen to share the same name
      • To group elements relating to a common idea together
        <item xmlns:bp="http://www.biopax.org/release/biopax-level1.owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
          <bp:protein ID="Protein1">
            <bp:NAME>Metalloelastase</bp:NAME>
            <bp:XREF>
              <bp:unificationXref rdf:ID="Xref1">
                <bp:ID>NP_304845</bp:ID>
                <bp:DB>RefSeq</bp:DB>
              </bp:unificationXref>
            </bp:XREF>
          </bp:protein>
        </item>
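After parsing, a namespace prefix is only a local alias. `xml.etree.ElementTree` expands every qualified name to `{URI}localname`, and a query binds its own prefix (here `b`, chosen arbitrarily) to the same URI. This Python sketch applies that to the BioPAX fragment, declaring the `rdf` prefix so that the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level1.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
doc = f"""<item xmlns:bp="{BP}" xmlns:rdf="{RDF}">
  <bp:protein ID="Protein1">
    <bp:NAME>Metalloelastase</bp:NAME>
    <bp:XREF><bp:unificationXref rdf:ID="Xref1">
      <bp:ID>NP_304845</bp:ID><bp:DB>RefSeq</bp:DB>
    </bp:unificationXref></bp:XREF>
  </bp:protein>
</item>"""

root = ET.fromstring(doc)
protein = root[0]
print(protein.tag)  # {http://www.biopax.org/release/biopax-level1.owl#}protein

# The query's prefix "b" differs from the document's "bp"; only the URI matters.
ns = {"b": BP}
name = root.findtext("b:protein/b:NAME", namespaces=ns)
xref = root.find(".//b:unificationXref", namespaces=ns)
print(name, xref.get(f"{{{RDF}}}ID"), xref.findtext("b:DB", namespaces=ns))
```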
  • 15. The Problem with XML Semantics
    • Two different XML representations of the same kind of information may not be easily unifiable
    • What did XML not encode?
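In the Python sketch below, two hypothetical XML documents state the same fact with opposite nesting. Nothing in either document says that `neuron/@name` and `brainRegion/cell` denote the same thing, so unifying them takes a hand-written mapping for each source. That missing link is what XML did not encode. The element names and the common `(cell, region)` form are assumptions.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same fact: a Purkinje cell is in the cerebellum.
doc_a = '<neuron name="Purkinje Cell"><region>Cerebellum</region></neuron>'
doc_b = '<brainRegion id="Cerebellum"><cell>Purkinje Cell</cell></brainRegion>'

# Each source needs its own mapping into a common (cell, region) form.
def from_a(xml):
    root = ET.fromstring(xml)
    return (root.get("name"), root.findtext("region"))

def from_b(xml):
    root = ET.fromstring(xml)
    return (root.findtext("cell"), root.get("id"))

print(from_a(doc_a) == from_b(doc_b))  # True
```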
  • 16. Resource Description Framework (RDF)
    [Figure: an RDF statement graph. An rdf:Statement node is linked by rdf:subject, rdf:predicate, and rdf:object to URI(CNTFR-…), URI(modulates), and URI(eSNCA-mediated neurotoxicity), respectively; rdf:type edges lead to URI(membrane-protein), URI(protein-mediated toxicity), and rdf:Property]
  • 17. The Basic Constructs of RDF
    • RDF meta-model basic elements
      • Defined in the rdf namespace, http://www.w3.org/1999/02/22-rdf-syntax-ns# (rdfs:Resource lives in the RDF Schema namespace)
    • Types (or classes)
      • rdfs:Resource – everything that can be identified (with a URI)
      • rdf:Property – a specialization of a resource expressing a binary relation between two resources
      • rdf:Statement – a triple with the properties rdf:subject, rdf:predicate, rdf:object
    • Properties
      • rdf:type – the subject is an instance of the category or class given by the value
      • rdf:subject, rdf:predicate, rdf:object – relate the elements of a statement triple to a resource of type rdf:Statement
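The rdf:Statement construct (reification) can be sketched with plain Python tuples as triples. A triple becomes a resource of type rdf:Statement, with one rdf:subject, one rdf:predicate, and one rdf:object triple. The `EX` namespace and the resource names are hypothetical, and no RDF library is used.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace for the example resources

def reify(stmt_id, s, p, o):
    """Describe the triple (s, p, o) as a resource of type rdf:Statement."""
    return {
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
    }

def unreify(triples, stmt_id):
    """Recover (s, p, o) from the rdf:subject/predicate/object triples of stmt_id."""
    parts = {p: o for (s, p, o) in triples if s == stmt_id}
    return (parts[RDF + "subject"], parts[RDF + "predicate"], parts[RDF + "object"])

triple = (EX + "ProteinX", EX + "modulates", EX + "eSNCA-mediated_neurotoxicity")
graph = reify(EX + "stmt1", *triple)
print(len(graph), unreify(graph, EX + "stmt1") == triple)  # 4 True
```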
  • 18. Relational Data vis-à-vis RDF
    • The node-to-edge ratio is relatively small in many applications
    • The number of relationships need not be fixed at design time
    • The general tendency is to keep the number of edge labels small
    • Graph-based operations (e.g., path traversal) can be performed directly on RDF; on relational data they require a number of joins that is not known in advance
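One way to see the "unknown number of joins" point is to store triples in a single generic relational table and ask for reachability. A fixed query with k self-joins only follows paths of length k. SQL:1999 recursion (`WITH RECURSIVE`, which SQLite supports) is what a relational engine offers for paths of arbitrary length. The neuron names in this Python sketch are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table holds every triple; a new edge label needs no schema change.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("NeuronA", "connectsTo", "NeuronB"),
    ("NeuronB", "connectsTo", "NeuronC"),
    ("NeuronC", "connectsTo", "NeuronD"),
    ("NeuronA", "locatedIn", "Hippocampus"),
])

# Everything reachable from NeuronA via connectsTo, whatever the path length.
reachable = [row[0] for row in conn.execute("""
    WITH RECURSIVE reach(n) AS (
        SELECT o FROM triples WHERE s = 'NeuronA' AND p = 'connectsTo'
        UNION
        SELECT t.o FROM triples t JOIN reach r ON t.s = r.n WHERE t.p = 'connectsTo'
    )
    SELECT n FROM reach ORDER BY n""")]
print(reachable)  # ['NeuronB', 'NeuronC', 'NeuronD']
```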
  • 19. RDF Blank Nodes
    • RDF allows one to create anonymous objects whose existence is known but whose details are not
      • e.g., there exists some neuron to which both NeuronX and NeuronY connect
        <neurons:NeuronX rdf:about="http://neurons.org/Neuron#NeuronX">
          <conn:connectsTo>
            <neurons:Neuron rdf:nodeID="n1"/>
          </conn:connectsTo>
        </neurons:NeuronX>
        <neurons:NeuronY rdf:about="http://neurons.org/Neuron#NeuronY">
          <conn:connectsTo>
            <neurons:Neuron rdf:nodeID="n1"/>
          </conn:connectsTo>
        </neurons:NeuronY>
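This Python sketch writes the blank-node example as plain tuples. It uses the `_:` label convention of N-Triples and Turtle and no RDF library. A two-pattern join finds the shared target. The answer only establishes that such a neuron exists, because `_:n1` is a document-local label, not a global URI.

```python
# Triples with a blank node "_:n1": it stands for *some* neuron without identifying it.
triples = {
    ("NeuronX", "connectsTo", "_:n1"),
    ("NeuronY", "connectsTo", "_:n1"),
    ("_:n1", "rdf:type", "Neuron"),
}

def common_targets(graph, a, b, prop="connectsTo"):
    """Nodes that both a and b reach through prop (a join of two triple patterns)."""
    targets_a = {o for (s, p, o) in graph if s == a and p == prop}
    targets_b = {o for (s, p, o) in graph if s == b and p == prop}
    return targets_a & targets_b

shared = common_targets(triples, "NeuronX", "NeuronY")
print(shared)  # {'_:n1'}
# Blank-node labels are local and may be renamed when the graph is re-serialized.
print(all(n.startswith("_:") for n in shared))  # True
```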
  • 20. RDF Schema
    • Declaration of vocabularies
      • Classes, properties, and relationships defined by a particular community
      • rdfs:Class, rdfs:subClassOf
    • Property-related
      • rdfs:subPropertyOf
      • Relationship of properties to classes: rdfs:domain, rdfs:range
    • Provides substructure for inferences based on existing triples
      • NOT prescriptive, but descriptive
      • This is different from XML Schema
    • The schema language is an expression of the basic RDF model
      • It uses the meta-model constructs: resources, statements, properties
  • 21. Examples of RDF Inferencing
    • From this we can infer
      • (:alice rdf:type parent)
      • (:betty rdf:type parent)
      • (:eve rdf:type female-person)
      • (:charles rdf:type :person)
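Inferences of this kind come from the RDFS entailment rules. This Python sketch forward-chains three of them to a fixpoint: rdfs2 (domain), rdfs3 (range), and rdfs9 (type propagation along subClassOf). The premise triples are invented for illustration and are not the slide's own premises. The sketch also omits the other RDFS rules, so it is not a complete RDFS reasoner.

```python
RDF_TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"

def rdfs_closure(triples):
    """Forward-chain rules rdfs2, rdfs3 and rdfs9 until no new triple appears."""
    graph = set(triples)
    while True:
        new = set()
        for (s, p, o) in graph:
            for (s2, p2, o2) in graph:
                if p2 == DOMAIN and s2 == p:                      # rdfs2
                    new.add((s, RDF_TYPE, o2))
                if p2 == RANGE and s2 == p:                       # rdfs3
                    new.add((o, RDF_TYPE, o2))
                if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:  # rdfs9
                    new.add((s, RDF_TYPE, o2))
        if new <= graph:
            return graph
        graph |= new

# Hypothetical premises, invented for this illustration.
premises = {
    (":hasChild", DOMAIN, ":parent"),
    (":hasChild", RANGE, ":person"),
    (":parent", SUBCLASS, ":person"),
    (":mother", SUBCLASS, ":female-person"),
    (":alice", ":hasChild", ":charles"),
    (":eve", RDF_TYPE, ":mother"),
}
inferred = rdfs_closure(premises) - premises
for t in sorted(inferred):
    print(t)
```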
  • 22. RDF as a Logical Data Model
    • RDF does not distinguish between different kinds of relationships
      • Instance-to-type
      • Instance-to-instance
      • Type-to-instance
    • No transitivity inference is possible over, say, rdf:type
    • RDF (as well as XML) has lost the notion of abstract data types such as spatial objects or time
      • Operations on object types do not mix well with RDF
    • Constraints like uniqueness or 1-to-1 relationships cannot be expressed
    • SPARQL, the query language for RDF, is
      • An edge-only language: it cannot express the // construct of XML (true of SPARQL 1.0; SPARQL 1.1 later added property paths)
      • Blank nodes in a query are treated as variables that cannot be output in the results
      • Parts of the language are undecidable! (A problem is undecidable if it can be proved that there can be no algorithm that decides it correctly for every input.)
  • 23. OWL  Components of an OWL Ontology  Vocabulary (concepts)  Structure (attributes of concepts and hierarchy)  Concept-to-concept, concept-to-data, property-to- property relationships  Logical characteristics of relationships  Domain and range restrictions  Properties of relations (symmetry, transitivity)  Cardinality of relations  Open world vs. Closed world assumptions  Contrast to most reasoning systems that assume anything absent from knowledge base is not true  Need to maintain monotonicity with tolerance for contradictions  OWL Classes Class of all classes
  • 24. Basic OWL Constructs
    • Creating OWL classes
    • disjointWith
      • Neurons are not glial cells
    • equivalentClass (called sameClassAs in DAML+OIL)
      • Class GABAergic neuron is exactly the same class as the class of neurons that have GABA as a neurotransmitter
    • Enumerations (oneOf, on instances)
      • Class cerebellar lobule consists exactly of Lobule I, Lobule II, …
    • Boolean set semantics (on classes)
      • unionOf (logical disjunction)
        • Class nerve cell is the union of neuron and glial cell
      • intersectionOf (logical conjunction of a class with property restrictions)
        • Class hippocampal neuron is the conjunction of class Neuron with the things that have property has-soma-located-in with a value in (hippocampus, or any class that is part-of hippocampus)
      • complementOf (logical negation)
        • Class ‘benign tumor’ is the intersection of class ‘tumor’ with the complement of class ‘malignant tumor’
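The set semantics of these constructors can be sketched by treating each class as its extension, a set of individuals. Python's set operators then stand in for unionOf, intersectionOf, complementOf, and disjointWith. All individuals are hypothetical. The complement is taken relative to a small finite domain, whereas OWL takes it relative to owl:Thing. The hippocampal-neuron restriction is simplified to one precomputed set.

```python
# Hypothetical individuals; each class is modeled by its extension.
domain = {"n1", "n2", "g1", "t1", "t2"}
neuron = {"n1", "n2"}
glial_cell = {"g1"}
tumor = {"t1", "t2"}
malignant_tumor = {"t2"}
soma_in_hippocampus = {"n1", "g1"}  # simplified has-soma-located-in restriction

nerve_cell = neuron | glial_cell                       # unionOf
hippocampal_neuron = neuron & soma_in_hippocampus      # intersectionOf
not_malignant = domain - malignant_tumor               # complementOf (finite domain)
benign_tumor = tumor & not_malignant                   # tumor AND NOT malignant tumor

def disjoint(a, b):
    """disjointWith: the two extensions share no individual."""
    return not (a & b)

print(sorted(nerve_cell), sorted(hippocampal_neuron), sorted(benign_tumor),
      disjoint(neuron, glial_cell))
```

The complement alone (`not_malignant`) also contains neurons and glial cells. For that reason, 'benign tumor' is intersected with 'tumor'.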
  • 25. Properties of OWL Properties
    • TransitiveProperty: $P(x,y) \land P(y,z) \Rightarrow P(x,z)$; e.g., subClassOf
    • SymmetricProperty: $P(x,y) \Leftrightarrow P(y,x)$; e.g., is_functionally_related_to
    • FunctionalProperty: $P(x,y) \land P(x,z) \Rightarrow y = z$; e.g., soma_located_in
    • inverseOf: $P_1(x,y) \Leftrightarrow P_2(y,x)$; e.g., regulates / is_regulated_by
    • InverseFunctionalProperty: $P(y,x) \land P(z,x) \Rightarrow y = z$; e.g., is_isoform_of
    • Cardinality
      • Only 0 or 1 in OWL Lite; any non-negative integer in OWL DL and OWL Full
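Each characteristic above is a condition on a binary relation. This Python sketch checks whether a finite set of asserted pairs satisfies each one. An OWL reasoner works the other way: it takes the declared characteristic as an axiom and infers new facts or detects inconsistencies. The example pairs are hypothetical.

```python
from itertools import product

def is_transitive(pairs):
    """P(x,y) and P(y,z) imply P(x,z)."""
    return all((x, z) in pairs
               for (x, y1), (y2, z) in product(pairs, pairs) if y1 == y2)

def is_symmetric(pairs):
    """P(x,y) iff P(y,x)."""
    return all((y, x) in pairs for (x, y) in pairs)

def is_functional(pairs):
    """P(x,y) and P(x,z) imply y = z: at most one value per subject."""
    subjects = [x for (x, _) in pairs]
    return len(subjects) == len(set(subjects))

def inverse(pairs):
    """inverseOf: P2(y,x) holds exactly when P1(x,y) does."""
    return {(y, x) for (x, y) in pairs}

def is_inverse_functional(pairs):
    """P(y,x) and P(z,x) imply y = z: at most one subject per value."""
    return is_functional(inverse(pairs))

# Hypothetical asserted pairs for some of the slide's example properties.
sub_class_of = {("PurkinjeCell", "GABAergicNeuron"), ("GABAergicNeuron", "Neuron"),
                ("PurkinjeCell", "Neuron")}
soma_located_in = {("purkinje_1", "Cerebellum"), ("hilar_1", "DentateGyrus")}
regulates = {("GeneA", "GeneB")}
is_regulated_by = inverse(regulates)
print(is_transitive(sub_class_of), is_functional(soma_located_in), is_regulated_by)
```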
  • 26. Instances in OWL
    • Instances are distinct from classes
    • In RDF there is no distinction between classes and instances; the following is allowed in RDF:
      • <Species, type, Class>
      • <Lion, type, Species>
      • <MyLion, type, Lion>
    • OWL DL restrictions
      • Type separation
        • A class cannot also be an individual or a property
        • A property cannot also be an individual or a class
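This Python sketch checks the slide's three triples for violations of type separation. Here "Class" stands in for the metaclass (as owl:Class would). A name counts as a class if something is typed by it or if it is itself declared a Class. A name counts as an individual if it is typed by anything other than the metaclass. Under these simplifying assumptions, `Lion` is flagged because it is used both as a class and as an individual.

```python
# The slide's triples; "Class" stands in for the metaclass (e.g., owl:Class).
triples = [
    ("Species", "type", "Class"),
    ("Lion", "type", "Species"),
    ("MyLion", "type", "Lion"),
]

def type_separation_violations(triples, metaclass="Class"):
    """Names used both as a class and as an individual (not allowed in OWL DL)."""
    classes = {o for (_, p, o) in triples if p == "type"}
    classes |= {s for (s, p, o) in triples if p == "type" and o == metaclass}
    individuals = {s for (s, p, o) in triples if p == "type" and o != metaclass}
    return (classes - {metaclass}) & individuals

print(type_separation_violations(triples))  # {'Lion'}
```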
  • 27. A Rough Comparison
    • RDF and OWL do not represent n-ary roles
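The usual workaround for missing n-ary roles is to turn the n-ary fact into an intermediate node with one binary property per role. The W3C Working Group Note "Defining N-ary Relations on the Semantic Web" describes this pattern. The Python sketch below encodes and decodes a hypothetical 3-ary fact that way. The relation name, the role names, and the blank-node naming scheme are all assumptions.

```python
import itertools

_counter = itertools.count(1)

def nary_to_triples(relation, **roles):
    """Encode an n-ary fact as an intermediate node plus one binary triple per role."""
    node = f"_:{relation}{next(_counter)}"
    triples = [(node, "type", relation)]
    triples += [(node, role, value) for role, value in roles.items()]
    return node, triples

def triples_to_nary(node, triples):
    """Collect the role/value pairs of one intermediate node back into a record."""
    return {p: o for (s, p, o) in triples if s == node and p != "type"}

# Hypothetical 3-ary fact: a current recorded in a neuron under some condition.
node, triples = nary_to_triples("Recording", neuron="HilarCell",
                                current="Ca2+", condition="baseline")
print(triples_to_nary(node, triples))
```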
  • 28. Querying OWL
    • There are several languages in the making
      • SPARQL engines (e.g., Virtuoso) are often used
      • Pellet is used for reasoning tasks
        • Subsumption
        • Consistency
      • New, more advanced languages like nSPARQL are coming up
      • vSPARQL is being developed to enable views on SPARQL, which will lead to nested SPARQL queries
    • Our goal
      • Develop a query processor for these advanced languages
      • Part of OntoQuest, our ontological information …
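As a much weaker stand-in for the subsumption task, this Python sketch computes only "told" subsumption: the transitive closure of explicitly asserted subClassOf axioms. A DL reasoner such as Pellet also derives subsumptions that follow from class definitions (intersections, restrictions, equivalences), which this sketch does not attempt. The axioms are hypothetical.

```python
from collections import defaultdict

def told_subsumers(axioms):
    """Map each class to every superclass reachable through asserted subClassOf axioms."""
    parents = defaultdict(set)
    for sub, sup in axioms:
        parents[sub].add(sup)
    closure = {}
    for cls in list(parents):
        seen, stack = set(), list(parents[cls])
        while stack:  # depth-first walk; "seen" also guards against cycles
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(parents.get(c, ()))
        closure[cls] = seen
    return closure

def is_subsumed(closure, sub, sup):
    """True if sub is (told to be) a subclass of sup."""
    return sub == sup or sup in closure.get(sub, set())

# Hypothetical axioms.
axioms = [("PurkinjeCell", "GABAergicNeuron"), ("GABAergicNeuron", "Neuron"), ("Neuron", "Cell")]
closure = told_subsumers(axioms)
print(sorted(closure["PurkinjeCell"]))  # ['Cell', 'GABAergicNeuron', 'Neuron']
```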
  • 29. Where does NIF stand in this?
    • Not every model is directly inter-convertible with every other model
    • NIF is designed to
      • Work with multiple models
        • Ensure that the modeling and query capabilities of every model are preserved in their native form
        • Queries in our system get translated to queries in the native forms of the databases we federate
      • Express the local semantics of any data appropriately by
        • Augmenting the semantic model of the data
        • Connecting the data to NIF’s ontology
        • Extending the NIF ontology in the process
      • Develop a mechanism to create a common integrated model over these models
        • This model is an ontological graph that incorporates object and temporal semantics
  • 30. Example of an Ontological Extension
    • Representing time and events
      • Phenotypes, physiology, …
      • Instants, intervals, and periods
      • Temporal granularity of observation
    • Events
      • Multi-temporal observations based on conditions on properties
      • Modeling states, objects in state, and state transitions
      • One-only, repeatable, and time-deictic events
      • Subevents
    • History of objects, events, roles
      • Subtype migration, temporal roles, and role migration
      • Progression of disease, symptom or recovery states
      • Repeatability
    • Considering TOWL and Temporal ORM