The document discusses different data models including relational, XML, RDF, and OWL models. It provides background on each model, explaining their logical structure and how semantics can be expressed. While the models are not directly inter-convertible, the NIF system is designed to work with multiple models by translating queries to the native format of each data source, augmenting models with semantic information, and developing a common integrated ontological graph.
NIF as a Multi-Model Semantic Information System
1. Amarnath Gupta
University of California San Diego
NIF as a Multi-Model Semantic
Information System
Part 1: Relational, XML, RDF and OWL models
2. Preamble – 1
As we design and extend the NIF system we
recognize that
Users will give us data in any form that is
convenient for them
Standard data may be stored in a flat file
Web service output can be in XML
Semantic Web enthusiasts may represent data using
proper RDF
However, regardless of the form in which data
may be represented
The NIF system must treat them
(query, index, relate, ...) in a uniform manner
The NIF system must utilize the underlying systems
3. Preamble – 2
In this presentation we intend to
Explain our perspective on these different
data models
Provide a background on the data models
we consider
Offer a sense of the “semantic character” of
these data models
Present our design philosophy on
Where to keep them separate
Where to transform them into a common model
4. What is a Data Model?
A conceptual data model
A formal representation of the users’/application’s
mental model of data elements and their
relationships that should be put in a
database, manipulated, queried and operated upon
A logical data model
A formal description of the data model in a logical
structure that a computer can use to perform the
queries and other operations. In many cases, the
same conceptual model can be represented by
different logical models
A physical data model
An implementable version of the data model in
terms of data structures, access structures
(e.g., indices) and the set of low-level operations
5. A Conceptual Model
ORM Model – Terry Halpin
[Figure: ORM diagram. Callout labels: Object, Relationship/Role, n-ary Role, Value Type, Value Constraint, Uniqueness Constraint, Inter-relationship Constraint]
6. A Logical Data Model
A formal specification of
The structure of the data
The structure tells us how the data is organized
(123, “Purkinje Cell”, Cerebellum)
(828, Hippocampus, “Hilar Cell”)
Often the structure of the data, together with some
constraints, represent some semantics
If the data are not structured (like free text), the techniques for
handling them will be different.
Operations on this structure
Every data model is based on some mathematical principles
that define what you can do with the data
the nature of data values
Data domains and data types
operations on data values
7. The Relational Data Model
NeuronID  NeuronName     BrainRegion    NeuroTransmitter  Current
1         Purkinje Cell  Cerebellum     Glutamate         Transient Na+
2         Hilar Cell     Dentate Gyrus  GABA              Ca2+
Attribute domain: all possible values the attribute can take
Candidate key: a set of columns that uniquely
determines a row
The relational model is a set (or bag) of tuples model
Metadata stored in a separate catalog which is also
relational
First order constraints
All queries are about some combination of
Selecting rows, columns
Combining tables by union, intersection, join
Computing data or aggregate functions
Grouping and sorting
Table: Neurons
Attribute name
Attribute value:
Cannot be compl
Relation name
Tuple
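The query classes listed on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table mirrors the two Neurons rows shown above, and sqlite_master is SQLite's own catalog table, used here as one example of metadata that is itself relational.

```python
import sqlite3

# In-memory database holding the Neurons relation from the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Neurons ("
    " NeuronID INTEGER PRIMARY KEY,"
    " NeuronName TEXT, BrainRegion TEXT,"
    " NeuroTransmitter TEXT, Current TEXT)"
)
conn.executemany(
    "INSERT INTO Neurons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Purkinje Cell", "Cerebellum", "Glutamate", "Transient Na+"),
        (2, "Hilar Cell", "Dentate Gyrus", "GABA", "Ca2+"),
    ],
)

# Selecting rows (WHERE) and columns (the SELECT list).
gaba_neurons = conn.execute(
    "SELECT NeuronName FROM Neurons WHERE NeuroTransmitter = ?", ("GABA",)
).fetchall()

# Grouping, aggregating, and sorting.
per_region = conn.execute(
    "SELECT BrainRegion, COUNT(*) FROM Neurons "
    "GROUP BY BrainRegion ORDER BY BrainRegion"
).fetchall()

# The catalog is itself a relation and is queried like one (SQLite-specific name).
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

Note that the catalog query and the data queries are issued separately, which matches the classical separation between metadata and data.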
8. Object Relational Model
Eases some of the problems of the classical relational
model
Data values can be of arbitrary data types
Sets (e.g., multiple currents for a neuron)
Tuples (e.g., references ordered by year)
Time-series (e.g., raw EEG data)
Spatial Data (e.g., atlases in CCDB)
Each data type can have its own operations
Find all data points within a neighborhood of a spatial location
Queries are still values
Catalog queries and data queries cannot be mixed in a single
query
All industrial-strength DBMSs use some version of this
model
Need to be a skilled DB programmer to develop custom
9. XML (Two Perspectives)
Document Community
data = linear text documents
mark up (annotate) text pieces to describe
context, structure, semantics of the marked text
<physiologicalCondition> Oxidative stress </physiologicalCondition> has been
proposed to be involved in the <biologicalProcess context="disease"> pathogenesis
</biologicalProcess> of <disease> Parkinson's disease</disease> (PD). A plausible
source of <physiologicalCondition> oxidative stress </physiologicalCondition> in
<brainRegion> nigral </brainRegion> <neuron> dopaminergic neurons </neuron> is
the redox reactions that specifically involve <chemical> dopamine </chemical> and
produce various <chemical context="biologicalAgent"> toxic </chemical> molecules.
10. XML (Two Perspectives)
Database Community
XML as a (most prominent) example of the semi-
structured data model
=> captures the whole spectrum from highly
structured, regular data to unstructured data
(relational, object-oriented, marked up text, ...)
<?xml version="1.0" encoding="utf-8"?>
<NDTF_Annotation>
<description>A new annotation file </description>
<timeMarker>true</timeMarker>
<timeResolution>0.000001</timeResolution>
<interval group_id="04">
<eventNote timeOffset="1237888.230" attachedFile="sound1.wmv"
application="realplayer">Text message for the event
start.</eventNote>
<eventNote timeOffset="18958585.232">Text message for the event
end.</eventNote>
</interval>
</NDTF_Annotation>
From the CARMEN gro
11. XML as a Logical Data Model
XML is a tree-structured
document
Nodes
Element nodes
Children can be ordered
Recursive elements
(parts under parts)
Attribute nodes
Mandatory or optional
Edges
Sub-element edges
Attribute edges
IDRef edges
Constraints
References
Value restrictions, OneOf
Cardinality
• Trees are more flexible than
tables
• Any number of nodes can be
added anywhere without
breaking the model
12. XML as a Logical Data Model
• XML has its own schema language
• Lets you specify a complex type system
• A database is a collection of XML trees
Storing XML
Mostly relational with some very clever indexing to encode
the hierarchy, tree paths, and order
Querying XML
Elements, attribute names, values and structure can be
queried
Multiple trees can be joined by value
Example (XPath)
http://mousespinal.brain-map.org/imageseries/detail/100002661.xml
Find images of the spinal column
//image[//structurelabel/text()="SPINAL COLUMN"]/ish_image_path
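A path query of this kind can be tried with Python's standard xml.etree.ElementTree, which supports a subset of XPath. The sketch below runs against a small hypothetical document: the nesting of image, structurelabel and ish_image_path is an assumption made for illustration and is not taken from the file at the URL above, and the image paths are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element nesting and the path values are assumed.
doc = """
<imageseries>
  <image>
    <structurelabel>SPINAL COLUMN</structurelabel>
    <ish_image_path>/img/a.jpg</ish_image_path>
  </image>
  <image>
    <structurelabel>DORSAL HORN</structurelabel>
    <ish_image_path>/img/b.jpg</ish_image_path>
  </image>
</imageseries>
"""
root = ET.fromstring(doc)

# Select every image that has a structurelabel child with the given text,
# then step down to its ish_image_path child.
matches = root.findall(".//image[structurelabel='SPINAL COLUMN']/ish_image_path")
paths = [element.text for element in matches]
```

The predicate here is relative to each image element, so only images whose own label matches are returned.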
13. Misusing and Abusing XML
Using XML if your data is relational
It will result in flat trees that will suffer from complex
querying
Encoding orders and hierarchies that need special
parsing
<Brand_Mixtures count="2">
<Brand_Mixture_1> Apo-Levocarb (carbidopa + levodopa)
</Brand_Mixture_1>
<Brand_Mixture_2> Apo-Levocarb CR Controlled-Release Tablets
(carbidopa + levodopa) </Brand_Mixture_2>
</Brand_Mixtures>
Using implicit multi-valuedness
<atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3">
<array dictRef="cml:calcCharge" dataType="xsd:decimal"
units="cml:electron">0.2 -0.3
0.1</array>
</atomArray>
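The cost of implicit multi-valuedness shows up as soon as the data is read: an XML parser returns each attribute as a single string, so the application has to split and re-align the values itself. A minimal sketch using only the attributes of the atomArray element above:

```python
import xml.etree.ElementTree as ET

# Only the attributes of the atomArray element from the slide are used here.
fragment = '<atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3"/>'
node = ET.fromstring(fragment)

# The parser sees three opaque strings; the positional alignment between
# them is a convention that this code has to know about and enforce.
ids = node.get("atomID").split()
elements = node.get("elementType").split()
hydrogens = node.get("hydrogenCount").split()

atoms = [
    {"id": atom_id, "element": element, "hydrogens": int(count)}
    for atom_id, element, count in zip(ids, elements, hydrogens)
]
```

None of this structure is visible to an XML query language, which is why the slide classes the pattern as misuse.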
14. Expressing Semantics in XML
Adorning elements with Namespaces
A namespace is identified by a unique URI (Uniform Resource
Identifier)
To disambiguate between two elements that happen to share
the same name
To group elements relating to a common idea together
<item xmlns:bp="http://www.biopax.org/release/biopax-level1.owl#">
<bp:protein ID="Protein1">
<bp:NAME>Metalloelastase</bp:NAME>
<bp:XREF>
<bp:unificationXref rdf:ID="Xref1">
<bp:ID>NP_304845</bp:ID>
<bp:DB>RefSeq</bp:DB>
</bp:unificationXref>
</bp:XREF>
</bp:protein>
15. The Problem with XML
Semantics
Two different XML
representations of the
same kind of
information may not
be easily unifiable
What did XML not
encode?
17. The Basic Constructs of RDF
RDF meta-model basic elements
All defined in rdf namespace
http://www.w3.org/1999/02/22-rdf-syntax-ns#
Types (or classes)
rdfs:Resource – everything that can be identified (with a
URI); strictly, this class is declared in the RDF Schema namespace
rdf:Property – specialization of a resource expressing a
binary relation between two resources
rdf:Statement – a triple with properties
rdf:subject, rdf:predicate, rdf:object
Properties
rdf:type - subject is an instance of that category or
class defined by the value
rdf:subject, rdf:predicate, rdf:object – relate elements
of statement tuple to a resource of type statement.
18. Relational Data vis-à-vis RDF
Node to edge ratio is relatively small in many applications
Number of relationships need not be fixed at design time
The general tendency is to keep the number of edge labels small
Graph-based operations can be performed on RDF; the same
operations would require an unspecified number of joins in
relational data
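The point about joins can be made concrete with a reachability query. Over a set of triples it is a plain graph traversal, whereas a classical relational query would need one self-join per hop and the number of hops is not known in advance. The triples and the part-of predicate below are illustrative assumptions, not data from the slides.

```python
from collections import deque

# Illustrative triples (subject, predicate, object); not taken from the slides.
triples = [
    ("Cerebellum", "part-of", "Hindbrain"),
    ("Hindbrain", "part-of", "Brain"),
    ("Brain", "part-of", "Central nervous system"),
    ("Purkinje Cell", "located-in", "Cerebellum"),
]

def reachable(start, predicate, triples):
    """Return every node reachable from start by following predicate edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subject, pred, obj in triples:
            if subject == node and pred == predicate and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen

containers = reachable("Cerebellum", "part-of", triples)
```

The loop runs for as many hops as the data happens to contain, which is exactly what a fixed-length join expression cannot do.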
19. RDF Blank Nodes
RDF allows one to create anonymous objects whose
existence is known but details are not
There exists some neuron to which both NeuronX and
NeuronY connect
<neurons:NeuronX
rdf:about="http://neurons.org/Neuron#NeuronX">
<conn:connectsTo>
<neurons:Neuron rdf:nodeID="n1"/>
</conn:connectsTo>
</neurons:NeuronX>
<neurons:NeuronY
rdf:about="http://neurons.org/Neuron#NeuronY">
<conn:connectsTo>
<neurons:Neuron rdf:nodeID="n1"/>
</conn:connectsTo>
</neurons:NeuronY>
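Read as triples, the example above says that NeuronX and NeuronY share a target whose identity is unknown. The sketch below represents the blank node with the conventional "_:" label prefix and finds the shared target in plain Python; the shortened names stand in for the full URIs and are a simplification.

```python
# Triples corresponding to the RDF/XML above; "_:n1" is the blank node.
triples = {
    ("NeuronX", "connectsTo", "_:n1"),
    ("NeuronY", "connectsTo", "_:n1"),
    ("_:n1", "rdf:type", "Neuron"),
}

def targets(subject, predicate, triples):
    """Objects that subject points to through predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def is_blank(node):
    """Blank nodes carry a local label only, never a URI."""
    return node.startswith("_:")

shared = targets("NeuronX", "connectsTo", triples) & targets("NeuronY", "connectsTo", triples)
anonymous_shared = {node for node in shared if is_blank(node)}
```

The label n1 is local to this graph: it establishes that both statements refer to the same neuron without naming it.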
20. RDF Schema
Declaration of vocabularies
classes, properties, and relationships defined by a
particular community
rdfs:Class, rdfs:subClassOf
Property-related
rdfs:subPropertyOf
relationship of properties to classes
rdfs:domain, rdfs:range
Provides substructure for inferences based on existing
triples
NOT prescriptive, but descriptive
This is different from XML Schema
Schema language is an expression of basic RDF model
uses meta-model constructs:
resources, statements, properties
21. Examples of RDF Inferencing
From this we can infer
(:alice rdf:type :parent)
(:betty rdf:type :parent)
(:eve rdf:type :female-person)
(:charles rdf:type :person)
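The premises for this slide are not preserved in the text, so the sketch below uses hypothetical ones: the :has-child and :has-mother properties and their domain and range declarations are assumptions chosen so that the standard RDFS domain and range rules yield the four conclusions listed above. It is not a reconstruction of the original figure.

```python
RDF_TYPE = "rdf:type"

# Hypothetical premises (schema statements followed by instance statements).
facts = {
    (":has-child", "rdfs:domain", ":parent"),
    (":has-child", "rdfs:range", ":person"),
    (":has-mother", "rdfs:range", ":female-person"),
    (":alice", ":has-child", ":betty"),
    (":betty", ":has-child", ":charles"),
    (":alice", ":has-mother", ":eve"),
}

def infer_types(facts):
    """Apply the RDFS domain and range rules once and return the new type triples."""
    domains = {s: o for s, p, o in facts if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in facts if p == "rdfs:range"}
    inferred = set()
    for s, p, o in facts:
        if p in domains:   # the subject of p belongs to the domain class of p
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:    # the object of p belongs to the range class of p
            inferred.add((o, RDF_TYPE, ranges[p]))
    return inferred

inferred = infer_types(facts)
```

Under these assumed premises the rules also produce (:betty rdf:type :person), which illustrates that RDFS declarations describe what follows from the data rather than restricting what may be asserted.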
22. RDF as a Logical Data Model
RDF does not distinguish between different
relationships
Instance-to-type
Instance-to-instance
Type-to-instance
No transitivity inference is possible over, say, rdf:type
RDF (as well as XML) has lost the notion of the
abstract data type like spatial object or time
Operations on object types do not mix well with RDF
Constraints like uniqueness, 1-to-1
relationships, cannot be expressed
SPARQL, the query language for RDF, is
An edge-only language – it cannot express the //
construct of XML
Blank nodes are treated as variables not output in the
results
Parts of the language are undecidable!
A problem is undecidable if it can be proved that there can be no algorithm
23. OWL
Components of an OWL Ontology
Vocabulary (concepts)
Structure (attributes of concepts and hierarchy)
Concept-to-concept, concept-to-data, property-to-
property relationships
Logical characteristics of relationships
Domain and range restrictions
Properties of relations (symmetry, transitivity)
Cardinality of relations
Open world vs. Closed world assumptions
Contrast to most reasoning systems that assume
anything absent from knowledge base is not true
Need to maintain monotonicity with tolerance for
contradictions
OWL Classes
Class of all classes
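The difference between the two assumptions can be sketched as two ways of answering the same question over the same facts. The statements below are illustrative, and the three-valued answer (True, False, or None for unknown) is a simplification of what an OWL reasoner does.

```python
# Illustrative knowledge base: statements known to hold and known not to hold.
known_true = {("Hilar Cell", "located-in", "Dentate Gyrus")}
known_false = set()

def closed_world(statement):
    """Anything absent from the knowledge base is taken to be false."""
    return statement in known_true

def open_world(statement):
    """Absent statements are unknown (None), not false."""
    if statement in known_true:
        return True
    if statement in known_false:
        return False
    return None

question = ("Hilar Cell", "releases", "Glutamate")
closed_answer = closed_world(question)
open_answer = open_world(question)
```

Adding a new fact can only turn an unknown into a definite answer under the open-world reading, which is the monotonicity the slide refers to.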
24. Basic OWL Constructs
Creating OWL Classes
disjointWith
Neurons are not glial cells
sameClassAs (equivalence)
Class Gabaergic neuron is exactly the same class as
neurons which have GABA as neurotransmitter
Enumerations (on instances)
Class Cerebellar lobules are Lobule I, Lobule II, …
Boolean set semantics (on classes)
Union (logical disjunction)
Class nerve cell is union of neuron, glial cell
Intersection (logical conjunction of class with properties)
Class hippocampal neurons is the conjunction of things that are of
class Neuron and have the property (has-soma-located-in)
(hippocampus union any class that is (part-of)
hippocampus)
complementOf (logical negation)
Class ‘benign tumor’ is the complement of class ‘malignant
tumor’
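As a rough intuition only, the class constructors on this slide behave like operations on sets of instances. The sketch below uses Python sets with invented instance names; it treats classes purely extensionally over a small closed universe, which is a simplification of OWL's open-world, intensional semantics.

```python
# Invented instances; each class is modelled as the set of its known members.
neurons = {"purkinje_1", "hilar_1"}
glial_cells = {"astrocyte_1"}
tumors = {"tumor_a", "tumor_b"}
malignant_tumors = {"tumor_a"}

# disjointWith: the two classes share no instance.
neurons_disjoint_from_glia = neurons.isdisjoint(glial_cells)

# Union: nerve cell is the union of neuron and glial cell.
nerve_cells = neurons | glial_cells

# Intersection of a class with a property restriction.
soma_located_in = {"purkinje_1": "cerebellum", "hilar_1": "hippocampus"}
in_hippocampus = {x for x, region in soma_located_in.items() if region == "hippocampus"}
hippocampal_neurons = neurons & in_hippocampus

# complementOf, taken relative to an assumed universe of all tumors.
benign_tumors = tumors - malignant_tumors
```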
25. Properties of OWL Properties
TransitiveProperty
P(x,y) and P(y,z) ⇒ P(x,z)   e.g., subclassOf
SymmetricProperty
P(x,y) iff P(y,x)   e.g., is_functionally_related_to
FunctionalProperty
P(x,y) and P(x,z) ⇒ y=z   e.g., soma_located_in
inverseOf
P1(x,y) iff P2(y,x)   e.g., regulates / is_regulated_by
InverseFunctionalProperty
P(y,x) and P(z,x) ⇒ y=z   e.g., is_isoform_of
Cardinality
Only 0 or 1 in OWL Lite; any non-negative integer in OWL DL and OWL Full
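The property characteristics on this slide can be read as rules over sets of (x, y) pairs. A minimal sketch, with illustrative pairs that are not taken from the slides:

```python
def transitive_closure(pairs):
    """Add (x, z) whenever (x, y) and (y, z) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

def symmetric_closure(pairs):
    """Add (y, x) for every (x, y)."""
    return set(pairs) | {(b, a) for a, b in pairs}

def inverse(pairs):
    """Extension of the inverse property: swap every pair."""
    return {(b, a) for a, b in pairs}

def is_functional(pairs):
    """True if no subject is related to two different objects."""
    seen = {}
    for a, b in pairs:
        if seen.setdefault(a, b) != b:
            return False
    return True

subclass_of = {("Purkinje Cell", "Neuron"), ("Neuron", "Cell")}
closed = transitive_closure(subclass_of)
regulated_by = inverse({("geneA", "geneB")})
```

One caveat: is_functional checks the rule as a constraint, which is a closed-world simplification. An OWL reasoner would instead conclude that the two objects denote the same individual unless they are stated to be different.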
26. Instances in OWL
Instances are distinct from Classes
In RDF there is no distinction between class and
instances
<Species, type, Class>
<Lion, type, Species>
<MyLion, type, Lion>
OWL DL restrictions
Type separation
Class can not also be an individual or property
Property can not also be an individual or class
is allowed in RDF
28. Querying OWL
There are several languages in the making
SPARQL engines (e.g., Virtuoso) are used often
Pellet is used for reasoning tasks
Subsumption
Consistency
New, more advanced languages like nSPARQL
are coming up
vSPARQL is being developed to enable views on
SPARQL, which will lead to nested SPARQL
queries
Our goal
Develop a query processor for these advanced
languages
Part of OntoQuest, our ontological information
29. Where does NIF stand in this?
Not every model is directly inter-convertible with every
other model
NIF is designed to
Work with multiple models
Ensure that the modeling capability and query capability of
every model is preserved in its native form
Queries in our system get translated to queries in the native
forms of the databases we federate
Express the local semantics of any data appropriately by
Augmenting the semantic model of the data
Connecting the data to NIF’s ontology
Extending the NIF ontology in the process
Develop a mechanism to create a common integrated
model over these models
this model is an ontological graph that incorporates object and
temporal semantics
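The idea of translating one query into the native form of each federated source can be sketched as follows. This is not NIF's actual translator: the Selection class, the three translator functions and the vocabulary are hypothetical, and the sketch covers only a single equality selection.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Selection:
    """Hypothetical source-independent query: return `output` of every
    `entity` whose `field` equals `value`."""
    entity: str
    field: str
    value: str
    output: str

def to_sql(q):
    # Illustration only; a real system would use bound parameters.
    return f"SELECT {q.output} FROM {q.entity} WHERE {q.field} = '{q.value}'"

def to_xpath(q):
    return f"//{q.entity}[{q.field}='{q.value}']/{q.output}"

def to_sparql(q):
    # Assumes a default prefix ":" has been declared for the vocabulary.
    return (
        f'SELECT ?out WHERE {{ ?x a :{q.entity} ; '
        f':{q.field} "{q.value}" ; :{q.output} ?out }}'
    )

query = Selection("Neurons", "NeuroTransmitter", "GABA", "NeuronName")
native_queries = {
    "relational": to_sql(query),
    "xml": to_xpath(query),
    "rdf": to_sparql(query),
}
```

Each source keeps its own model and query language; only the thin translation layer knows about all three.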
30. Example of An Ontological Extension
Representing time and events
Phenotypes, physiology, …
Instants, intervals, and periods
Temporal granularity of observation
Events
Multi-temporal observations based on conditions on properties
Modeling states, objects in state, and state transitions
One-only, repeatable, and time deictic events
Subevents
History of objects, events, roles
Subtype migration, Temporal roles and role migration
Progression of disease, symptom or recovery states
Repeatability
Considering TOWL and Temporal ORM