NIF as a Multi-Model Semantic Information System

Amarnath Gupta
University of California San Diego
Part 1: Relational, XML, RDF and OWL models

Preamble – 1
 As we design and extend the NIF system we
recognize that
 Users will give us data in any form that is
convenient for them
 Standard data may be stored in a flat file
 Web service output can be in XML
 Semantic Web enthusiasts may represent data using
proper RDF
 However, regardless of the form in which data
may be represented
 The NIF system must treat them
(query, index, relate, ...) in a uniform manner
 The NIF system must utilize the underlying systems

Preamble – 2
 In this presentation we intend to
 Explain our perspective on these different
data models
 Provide a background on the data models
we consider
 Offer a sense of the “semantic character” of
these data models
 Present our design philosophy on
 Where to keep them separate
 Where to transform them into a common model

What is a Data Model?
 A conceptual data model
 A formal representation of the users’/application’s
mental model of data elements and their
relationships that should be put in a
database, manipulated, queried and operated upon
 A logical data model
 A formal description of the data model in a logical
structure that a computer can use to perform the
queries and other operations. In many cases, the
same conceptual model can be represented by
different logical models
 A physical data model
 An implementable version of the data model in
terms of data structures, access structures
(e.g., indices) and the set of low-level operations

A Conceptual Model
ORM Model – Terry Halpin
Object
Relationship/
Role
Value
Constraint
Uniqueness
Constraint
Inter-relationship
Constraint
Value
Type
n-ary
Role

A Logical Data Model
 A formal specification of
 The structure of the data
 The structure tells us how the data is organized
(123, “Purkinje Cell”, Cerebellum)
(828, Hippocampus, “Hilar Cell”)
 Often the structure of the data, together with some
constraints, represents some semantics
 If the data are not structured (like free text), the techniques for
handling them will be different.
 Operations on this structure
 Every data model is based on some mathematical principles
that define what you can do with the data
 the nature of data values
 Data domains and data types
 operations on data values

The Relational Data Model
NeuronID  NeuronName     BrainRegion    NeuroTransmitter  Current
1         Purkinje Cell  Cerebellum     Glutamate         Transient Na+
2         Hilar Cell     Dentate Gyrus  GABA              Ca2+
 Attribute domain: all possible values the attribute can
take
 Candidate key: a set of columns that uniquely
determines a row
 The relational model is a set-of-tuples (or bag-of-tuples) model
 Metadata stored in a separate catalog which is also
relational
 First order constraints
 All queries are about some combination of
 Selecting rows, columns
 Combining tables by union, intersection, join
 Computing data or aggregate functions
 Grouping and sorting
Table: Neurons (figure callouts: relation name, attribute name, tuple, attribute value: cannot be compl...)
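
The query classes listed on this slide can be tried against the Neurons table. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database; the table and column names and the two rows are copied from the slide, everything else is illustrative.

```python
import sqlite3

# In-memory database (an assumption made for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Neurons ('
    ' NeuronID INTEGER PRIMARY KEY,'   # candidate key: uniquely determines a row
    ' NeuronName TEXT,'
    ' BrainRegion TEXT,'
    ' NeuroTransmitter TEXT,'
    ' "Current" TEXT)'                 # quoted to avoid any keyword clash
)
conn.executemany(
    "INSERT INTO Neurons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Purkinje Cell", "Cerebellum", "Glutamate", "Transient Na+"),
        (2, "Hilar Cell", "Dentate Gyrus", "GABA", "Ca2+"),
    ],
)

# Selecting rows and columns.
gaba_names = [
    row[0]
    for row in conn.execute(
        "SELECT NeuronName FROM Neurons WHERE NeuroTransmitter = 'GABA'"
    )
]

# Aggregating, grouping and sorting.
per_region = conn.execute(
    "SELECT BrainRegion, COUNT(*) FROM Neurons "
    "GROUP BY BrainRegion ORDER BY BrainRegion"
).fetchall()

# The catalog is itself relational: SQLite exposes it as the sqlite_master table.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

Note that the catalog lookup and the data queries are separate statements here, which matches the slide's point that metadata lives in its own relational catalog.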

Object Relational Model
 Eases some of the problems of the classical relational
model
 Data values can be of arbitrary data types
 Sets (e.g., multiple currents for a neuron)
 Tuples (e.g., references ordered by year)
 Time-series (e.g., raw EEG data)
 Spatial Data (e.g., atlases in CCDB)
 Each data type can have its own operations
 Find all data points within a neighborhood of a spatial location
 Queries are still values
 Catalog queries and data queries cannot be mixed in a single
query
 All industrial-strength DBMSs use some version of this
model
 Need to be a skilled DB programmer to develop custom

XML (Two Perspectives)
 Document Community
 data = linear text documents
 mark up (annotate) text pieces to describe
context, structure, semantics of the marked text
<physiologicalCondition> Oxidative stress </physiologicalCondition> has been
proposed to be involved in the <biologicalProcess context="disease"> pathogenesis
</biologicalProcess> of <disease> Parkinson's disease</disease> (PD). A plausible
source of <physiologicalCondition> oxidative stress </physiologicalCondition> in
<brainRegion> nigral </brainRegion> <neuron> dopaminergic neurons </neuron> is
the redox reactions that specifically involve <chemical> dopamine </chemical> and
produce various <chemical context="biologicalAgent"> toxic </chemical> molecules.
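
In the document view, the markup annotates a text that remains readable once the tags are removed. The following is a rough sketch of that idea, assuming Python's standard xml.etree.ElementTree; the wrapping <abstract> root element is hypothetical (the slide shows only a fragment), and the whitespace inside the tags has been trimmed.

```python
import xml.etree.ElementTree as ET

# First sentence of the slide's example, wrapped in a hypothetical <abstract>
# root so that it parses as a well-formed document.
fragment = (
    "<abstract>"
    "<physiologicalCondition>Oxidative stress</physiologicalCondition>"
    " has been proposed to be involved in the "
    "<biologicalProcess context=\"disease\">pathogenesis</biologicalProcess>"
    " of <disease>Parkinson's disease</disease> (PD)."
    "</abstract>"
)
root = ET.fromstring(fragment)

# The markup: (tag, marked text, optional context attribute) for each annotation.
annotations = [(el.tag, el.text.strip(), el.get("context")) for el in root]

# The linear text: concatenating all text nodes recovers the original sentence.
plain_text = "".join(root.itertext())
```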

XML (Two Perspectives)
 Database Community
 XML as a (most prominent) example of the semi-structured
data model
=> captures the whole spectrum from highly
structured, regular data to unstructured data
(relational, object-oriented, marked up text, ...)
<?xml version="1.0" encoding="utf-8"?>
<NDTF_Annotation>
<description>A new annotation file </description>
<timeMarker>true</timeMarker>
<timeResolution>0.000001</timeResolution>
<interval group_id="04">
<eventNote timeOffset="1237888.230" attachedFile="sound1.wmv"
application="realplayer">Text message for the event
start.</eventNote>
<eventNote timeOffset="18958585.232">Text message for the event
end.</eventNote>
</interval>
</NDTF_Annotation>
From the CARMEN gro

XML as a Logical Data Model
 XML is a tree-structured
document
 Nodes
 Element nodes
 Children can be ordered
 Recursive elements
(parts under parts)
 Attribute nodes
 Mandatory or optional
 Edges
 Sub-element edges
 Attribute edges
 IDRef edges
 Constraints
 References
 Value restrictions, OneOf
 Cardinality
• Trees are more flexible than
tables
• Any number of nodes can be
added anywhere without
breaking the model

XML as a Logical Data Model
• XML has its own schema language
• Lets you specify a complex type system
• A database is a collection of XML trees
 Storing XML
 Mostly relational with some very clever indexing to encode
the hierarchy, tree paths, and order
 Querying XML
 Elements, attribute names, values and structure can be
queried
 Multiple trees can be joined by value
 Example (XPath)
 http://mousespinal.brain-map.org/imageseries/detail/100002661.xml
 Find images of the spinal column
 //image[//structurelabel/text()="SPINAL COLUMN"]/ish_image_path
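
The XPath example can be approximated with Python's standard xml.etree.ElementTree, which supports only a subset of XPath. This is a sketch under stated assumptions: the element names come from the slide's query, but the nesting, the second image, and the path values are hypothetical and not taken from the real brain-map.org document. The predicate is written relative to each image, which differs from the slide's absolute //structurelabel.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the element names are taken from the slide.
doc = ET.fromstring("""
<imageseries>
  <image>
    <structurelabel>SPINAL COLUMN</structurelabel>
    <ish_image_path>/hypothetical/path/a.jpg</ish_image_path>
  </image>
  <image>
    <structurelabel>BRAIN STEM</structurelabel>
    <ish_image_path>/hypothetical/path/b.jpg</ish_image_path>
  </image>
</imageseries>
""")

# Structure (image, ish_image_path) and value ("SPINAL COLUMN") are queried together.
query = ".//image[structurelabel='SPINAL COLUMN']/ish_image_path"
paths = [el.text for el in doc.findall(query)]
```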

Misusing and Abusing XML
 Using XML if your data is relational
 It will result in flat trees that will suffer from complex
querying
 Encoding orders and hierarchies that need special
parsing
<Brand_Mixtures count="2">
<Brand_Mixture_1> Apo-Levocarb (carbidopa + levodopa)
</Brand_Mixture_1>
<Brand_Mixture_2> Apo-Levocarb CR Controlled-Release Tablets
(carbidopa + levodopa) </Brand_Mixture_2>
</Brand_Mixtures>
 Using implicit multi-valuedness
<atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3">
<array dictRef="cml:calcCharge" dataType="xsd:decimal"
units="cml:electron">0.2 -0.3
0.1</array>
</atomArray>
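
The cost of implicit multi-valuedness shows up as soon as the data is read back. The following is a small sketch, assuming Python's standard xml.etree.ElementTree and using only the atomArray attributes from the slide (the nested array element is omitted for brevity): the XML parser hands back each attribute as one string, so splitting and re-aligning the values becomes application code.

```python
import xml.etree.ElementTree as ET

atom_array = ET.fromstring(
    '<atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3"/>'
)

# Each attribute is a single opaque string as far as XML tooling is concerned.
raw_ids = atom_array.get("atomID")

# Special-purpose parsing: split on whitespace and line the lists up by position.
ids = raw_ids.split()
elements = atom_array.get("elementType").split()
hydrogens = [int(h) for h in atom_array.get("hydrogenCount").split()]

atoms = [
    {"id": i, "element": e, "hydrogens": h}
    for i, e, h in zip(ids, elements, hydrogens)
]
```

Nothing in the XML itself guarantees that the three lists have the same length; that constraint lives only in the consuming program.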

Expressing Semantics in XML
 Adorning elements with Namespaces
 A namespace is a unique URI (Uniform Resource
Identifier)
 To disambiguate between two elements that happen to share
the same name
 To group elements relating to a common idea together
<item xmlns:bp="http://www.biopax.org/release/biopax-level1.owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<bp:protein ID="Protein1">
<bp:NAME>Metalloelastase</bp:NAME>
<bp:XREF>
<bp:unificationXref rdf:ID="Xref1">
<bp:ID>NP_304845</bp:ID>
<bp:DB>RefSeq</bp:DB>
</bp:unificationXref>
</bp:XREF>
</bp:protein>
</item>
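
How a namespace disambiguates same-named elements can be sketched with Python's standard xml.etree.ElementTree. The BioPAX namespace URI and the bp:NAME element come from the slide; the unqualified NAME element is a hypothetical addition that creates the name clash.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level1.owl#"

xml_text = f"""
<item xmlns:bp="{BP}">
  <bp:protein ID="Protein1">
    <bp:NAME>Metalloelastase</bp:NAME>
  </bp:protein>
  <NAME>an unrelated NAME element</NAME>
</item>
"""
root = ET.fromstring(xml_text)

# The prefix is only shorthand; the parser stores the full URI with the tag.
expanded_tag = root.find(".//bp:NAME", {"bp": BP}).tag

# The two NAME elements never collide because their expanded names differ.
qualified = [el.text for el in root.findall(".//bp:NAME", {"bp": BP})]
unqualified = [el.text for el in root.findall(".//NAME")]
```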

The Problem with XML
Semantics
 Two different XML
representations of the
same kind of
information may not
be easily unifiable
 What did XML not
encode?

Resource Description Framework (RDF)
[Figure: an rdf:statement node whose rdf:subject is URI(CNTFR-, whose rdf:predicate is URI(modulates), and whose rdf:object is URI(eSNCA-mediated neurotoxicity); rdf:type edges lead to URI(membrane-protein), URI(protein-mediated toxicity), and rdf:property.]

The Basic Constructs of RDF
 RDF meta-model basic elements
 All defined in rdf namespace
 http://www.w3.org/1999/02/22-rdf-syntax-ns#
 Types (or classes)
 rdfs:Resource – everything that can be identified (with a
URI); strictly, this class is defined in the rdfs namespace
 rdf:Property – specialization of a resource expressing a
binary relation between two resources
 rdf:Statement – a triple with properties
rdf:subject, rdf:predicate, rdf:object
 Properties
 rdf:type - subject is an instance of that category or
class defined by the value
 rdf:subject, rdf:predicate, rdf:object – relate elements
of statement tuple to a resource of type statement.

Relational Data vis-à-vis RDF
 Node to edge ratio is
relatively small in
many applications
 Number of
relationships need not
be fixed at design time
 The general tendency
is to keep the number of
edge labels small
 Graph-based
operations can be
performed directly on
RDF; on relational data
they would require
an unspecified number
of joins
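
The last point can be made concrete with a reachability query. The following is a minimal sketch in plain Python over hypothetical triples (the neuron names and the connectsTo predicate are illustrative only): each step of the loop plays the role of one more self-join on a triple table, and the number of steps is not known before the data is seen.

```python
from collections import deque

# Hypothetical (subject, predicate, object) triples; names are illustrative only.
triples = [
    ("NeuronA", "connectsTo", "NeuronB"),
    ("NeuronB", "connectsTo", "NeuronC"),
    ("NeuronC", "connectsTo", "NeuronD"),
    ("NeuronX", "connectsTo", "NeuronA"),
]

def reachable(start, predicate, triples):
    """Return every node reachable from start by following predicate edges."""
    successors = {}
    for s, p, o in triples:
        if p == predicate:
            successors.setdefault(s, []).append(o)
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # One hop outward: the graph analogue of one more self-join.
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

downstream = reachable("NeuronA", "connectsTo", triples)
```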

RDF Blank Nodes
 RDF allows one to create anonymous objects whose
existence is known but details are not
 There exists some neuron to which both NeuronX and
NeuronY connect
 <neurons:NeuronX
rdf:about="http://neurons.org/Neuron#NeuronX">
<conn:connectsTo>
<neurons:Neuron rdf:nodeID="n1"/>
</conn:connectsTo>
</neurons:NeuronX>
 <neurons:NeuronY
rdf:about="http://neurons.org/Neuron#NeuronY">
<conn:connectsTo>
<neurons:Neuron rdf:nodeID="n1"/>
</conn:connectsTo>
</neurons:NeuronY>
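
Written as triples, the two descriptions share one blank node. The following is a small sketch in plain Python, assuming the common "_:n1" notation for the blank node and keeping the slide's prefixed names as plain strings; no RDF library is used.

```python
# The statements from the RDF/XML above as (subject, predicate, object) tuples.
triples = [
    ("http://neurons.org/Neuron#NeuronX", "conn:connectsTo", "_:n1"),
    ("http://neurons.org/Neuron#NeuronY", "conn:connectsTo", "_:n1"),
    ("_:n1", "rdf:type", "neurons:Neuron"),
]

def is_blank(term):
    """Blank nodes carry a local label instead of a URI."""
    return term.startswith("_:")

# Group the sources of connectsTo edges by their target.
sources_by_target = {}
for s, p, o in triples:
    if p == "conn:connectsTo":
        sources_by_target.setdefault(o, set()).add(s)

# Targets that more than one neuron connects to, including anonymous ones.
shared_targets = {t: srcs for t, srcs in sources_by_target.items() if len(srcs) > 1}
```

The shared target is known to exist and to be a neuron, yet it has no URI that could be looked up or reported as an identity.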

RDF Schema
 Declaration of vocabularies
 classes, properties, and relationships defined by a
particular community
 rdfs:Class, rdfs:subClassOf
 Property-related
 rdfs:subPropertyOf
 relationship of properties to classes
 rdfs:domain, rdfs:range
 Provides substructure for inferences based on existing
triples
 NOT prescriptive, but descriptive
 This is different from XML Schema
 Schema language is an expression of basic RDF model
 uses meta-model constructs:
resources, statements, properties

Examples of RDF Inferencing
 From this we can infer
 (:alice rdf:type :parent)
 (:betty rdf:type :parent)
 (:eve rdf:type :female-person)
 (:charles rdf:type :person)
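
The kind of inference listed here follows from rdfs:domain, rdfs:range and rdfs:subClassOf. The following is a rough sketch in plain Python; the input triples are hypothetical (the slide's own source triples are not part of the text), and only these three RDFS rules are implemented.

```python
# Hypothetical schema and data, invented for illustration.
schema = [
    (":has-child", "rdfs:domain", ":parent"),
    (":has-child", "rdfs:range", ":person"),
    (":parent", "rdfs:subClassOf", ":person"),
]
data = [(":alice", ":has-child", ":charles")]

def rdfs_closure(triples):
    """Apply the domain, range and subClassOf rules until nothing new appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for ps, pp, po in inferred:
                # (p domain C) and (s p o)  =>  (s type C)
                if ps == p and pp == "rdfs:domain":
                    new.add((s, "rdf:type", po))
                # (p range C) and (s p o)  =>  (o type C)
                if ps == p and pp == "rdfs:range":
                    new.add((o, "rdf:type", po))
                # (s type C) and (C subClassOf D)  =>  (s type D)
                if p == "rdf:type" and pp == "rdfs:subClassOf" and ps == o:
                    new.add((s, "rdf:type", po))
        if new <= inferred:
            return inferred
        inferred |= new

closure = rdfs_closure(schema + data)
```

The schema never rejects the data; it only adds triples, which is the descriptive rather than prescriptive behavior noted on the RDF Schema slide.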

RDF as a Logical Data Model
 RDF does not distinguish between different
relationships
 Instance-to-type
 Instance-to-instance
 Type-to-instance
 No transitivity inference is possible over, say, rdf:type
 RDF (as well as XML) has lost the notion of the
abstract data type like spatial object or time
 Operations on object types do not mix well with RDF
 Constraints like uniqueness, 1-to-1
relationships, cannot be expressed
 SPARQL, the query language for RDF, is
 An edge-only language – it cannot express the //
construct of XML
 Blank nodes are treated as variables not output in the
results
 Parts of the language are undecidable!
A problem is undecidable if it can be proved that there can be no algorithm

OWL
 Components of an OWL Ontology
 Vocabulary (concepts)
 Structure (attributes of concepts and hierarchy)
 Concept-to-concept, concept-to-data, property-to-property
relationships
 Logical characteristics of relationships
 Domain and range restrictions
 Properties of relations (symmetry, transitivity)
 Cardinality of relations
 Open world vs. Closed world assumptions
 Contrast to most reasoning systems that assume
anything absent from knowledge base is not true
 Need to maintain monotonicity with tolerance for
contradictions
 OWL Classes
Class of all classes
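
The difference between the two assumptions can be sketched in a few lines of plain Python. The facts and names below are hypothetical; the point is only that a closed-world system answers "false" for anything absent, while an open-world system answers "unknown".

```python
# Hypothetical knowledge base: one asserted fact and one explicitly negated fact.
asserted = {("PurkinjeCell", "located_in", "Cerebellum")}
negated = {("PurkinjeCell", "located_in", "Hippocampus")}

def closed_world(fact):
    """Closed world: whatever is not asserted is taken to be false."""
    return fact in asserted

def open_world(fact):
    """Open world: True or False only when stated; None means unknown."""
    if fact in asserted:
        return True
    if fact in negated:
        return False
    return None

query = ("PurkinjeCell", "uses_transmitter", "GABA")
cwa_answer = closed_world(query)
owa_answer = open_world(query)
```

Adding a new fact can only turn an "unknown" into a definite answer under the open-world reading, which is what keeps reasoning monotonic.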

Basic OWL Constructs
 Creating OWL Classes
 disjointWith
 Neurons are not glial cells
 sameClassAs (equivalence)
 Class Gabaergic neuron is exactly the same class as
neurons which have GABA as neurotransmitter
 Enumerations (on instances)
 Class Cerebellar lobules are Lobule I, Lobule II, …
 Boolean set semantics (on classes)
 Union (logical disjunction)
 Class nerve cell is union of neuron, glial cell
 Intersection (logical conjunction of class with properties)
 Class hippocampal neurons is conjunction of things of
class Neuron and have property (has-soma-located-in)
(hippocampus union any class that is (part-of)
hippocampus)
 complementOf (logical negation)
 Class ‘benign tumor’ is the complement of class ‘malignant
tumor’

Properties of OWL Properties
 TransitiveProperty
 P(x,y) and P(y,z) => P(x,z)   e.g., subclassOf
 SymmetricProperty
 P(x,y) iff P(y,x)   e.g., is_functionally_related_to
 FunctionalProperty
 P(x,y) and P(x,z) => y=z   e.g., soma_located_in
 inverseOf
 P1(x,y) iff P2(y,x)   e.g., regulates / is_regulated_by
 InverseFunctionalProperty
 P(y,x) and P(z,x) => y=z   e.g., is_isoform_of
 Cardinality
 Only 0 or 1 in OWL Lite; OWL DL and OWL Full allow any non-negative integer
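
Three of these characteristics (transitivity, symmetry, inverseOf) add new facts and can be sketched as a fixed-point computation. The following is a minimal illustration in plain Python, not how an OWL reasoner is implemented; the function name and the example individuals are hypothetical, and the functional and inverse-functional rules are left out because they derive equalities rather than new property assertions.

```python
def apply_characteristics(facts, transitive=(), symmetric=(), inverse_of=()):
    """Close a set of (property, x, y) facts under the declared characteristics.

    inverse_of is a list of (p1, p2) pairs with P1(x,y) iff P2(y,x).
    """
    facts = set(facts)
    inverses = {}
    for p1, p2 in inverse_of:
        inverses[p1] = p2
        inverses[p2] = p1
    while True:
        new = set()
        for p, x, y in facts:
            if p in symmetric:
                new.add((p, y, x))                # P(x,y) => P(y,x)
            if p in inverses:
                new.add((inverses[p], y, x))      # P1(x,y) => P2(y,x)
            if p in transitive:
                for q, y2, z in facts:
                    if q == p and y2 == y:
                        new.add((p, x, z))        # P(x,y), P(y,z) => P(x,z)
        if new <= facts:
            return facts
        facts |= new

# Hypothetical individuals; property names follow the slide's examples.
facts = {
    ("subclassOf", "PurkinjeCell", "Neuron"),
    ("subclassOf", "Neuron", "Cell"),
    ("regulates", "GeneA", "GeneB"),
    ("is_functionally_related_to", "GeneA", "GeneC"),
}
closed = apply_characteristics(
    facts,
    transitive={"subclassOf"},
    symmetric={"is_functionally_related_to"},
    inverse_of=[("regulates", "is_regulated_by")],
)
```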

Instances in OWL
 Instances are distinct from Classes
 In RDF there is no distinction between class and
instances
 <Species, type, Class>
 <Lion, type, Species>
 <MyLion, type, Lion>
 OWL DL restrictions
 Type separation
 A class cannot also be an individual or a property
 A property cannot also be an individual or a class
(The Species / Lion / MyLion triples above are allowed in RDF.)

A Rough Comparison
RDF and OWL do not represent n-ary roles

Querying OWL
 There are several languages in the making
 SPARQL engines (e.g., Virtuoso) are used often
 Pellet is used for reasoning tasks
 Subsumption
 Consistency
 New, more advanced languages like nSPARQL
are coming up
 vSPARQL is being developed to enable views on
SPARQL, which will lead to nested SPARQL
queries
 Our goal
 Develop a query processor for these advanced
languages
 Part of OntoQuest, our ontological information

Where does NIF stand in this?
 Not every model is directly inter-convertible with every
other model
 NIF is designed to
 Work with multiple models
 Ensure that the modeling capability and query capability of
every model is preserved in its native form
 Queries in our system get translated to queries in the native
forms of the databases we federate
 Express the local semantics of any data appropriately by
 Augmenting the semantic model of the data
 Connecting the data to NIF’s ontology
 Extending the NIF ontology in the process
 Develop a mechanism to create a common integrated
model over these models
 this model is an ontological graph that incorporates object and
temporal semantics

Example of An Ontological Extension
 Representing time and events
 Phenotypes, physiology, …
 Instants, intervals, and periods
 Temporal granularity of observation
 Events
 Multi-temporal observations based on conditions on properties
 Modeling states, objects in state, and state transitions
 One-only, repeatable, and time deictic events
 Subevents
 History of objects, events, roles
 Subtype migration, Temporal roles and role migration
 Progression of disease, symptom or recovery states
 Repeatability
Considering TOWL and Temporal ORM
