NIF as a Multi-Model Semantic Information System


Amarnath Gupta
NIF Webinar - March 30, 2010

Published in: Technology



  • 1. Amarnath Gupta, University of California San Diego. NIF as a Multi-Model Semantic Information System. Part 1: Relational, XML, RDF and OWL models
  • 2. Preamble – 1
      • As we design and extend the NIF system we recognize that users will give us data in any form that is convenient for them
          • Standard data may be stored in a flat file
          • Web service output can be in XML
          • Semantic Web enthusiasts may represent data using proper RDF
      • However, regardless of the form in which data may be represented
          • The NIF system must treat them (query, index, relate, ...) in a uniform manner
          • The NIF system must utilize the underlying systems
  • 3. Preamble – 2
      • In this presentation we intend to
          • Explain our perspective on these different data models
          • Provide a background on the data models we consider
          • Offer a sense of the “semantic character” of these data models
          • Present our design philosophy on
              • Where to keep them separate
              • Where to transform them into a common model
  • 4. What is a Data Model?
      • A conceptual data model
          • A formal representation of the users’/application’s mental model of data elements and their relationships that should be put in a database, manipulated, queried and operated upon
      • A logical data model
          • A formal description of the data model in a logical structure that a computer can use to perform the queries and other operations. In many cases, the same conceptual model can be represented by different logical models
      • A physical data model
          • An implementable version of the data model in terms of data structures, access structures (e.g., indices) and the set of low-level operations
  • 5. A Conceptual Model: ORM Model (Terry Halpin)
      [Figure labels: Object; Relationship/Role; Value Constraint; Uniqueness Constraint; Inter-relationship Constraint; Value Type; n-ary Role]
  • 6. A Logical Data Model
      • A formal specification of
          • The structure of the data
              • The structure tells us how the data is organized
                  (123, “Purkinje Cell”, Cerebellum)
                  (828, Hippocampus, “Hilar Cell”)
              • Often the structure of the data, together with some constraints, represents some semantics
              • If the data are not structured (like free text), the techniques for handling them will be different
          • Operations on this structure
              • Every data model is based on some mathematical principles that define what you can do with the data
          • The nature of data values
              • Data domains and data types
              • Operations on data values
      [Figure callout: is not structured]
  • 7. The Relational Data Model
      Table: Neurons
      NeuronID | NeuronName    | BrainRegion   | NeuroTransmitter | Current
      1        | Purkinje Cell | Cerebellum    | Glutamate        | Transient Na+
      2        | Hilar Cell    | Dentate Gyrus | GABA             | Ca2+
      [Figure callouts: Relation name; Attribute name; Tuple; Attribute value: Cannot be compl]
      • Attribute domain: all possible values the attribute can take
      • Candidate key: a set of columns that uniquely determines a row
      • The relational model is a set (bag) of tuples model
      • Metadata stored in a separate catalog which is also relational
      • First order constraints
      • All queries are about some combination of
          • Selecting rows, columns
          • Combining tables by union, intersection, join
          • Computing data or aggregate functions
          • Grouping and sorting
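The query categories listed on this slide can be tried out against the slide's own Neurons table. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the choice of SQLite and the column types are assumptions made for illustration, not something the talk specifies.

```python
import sqlite3

# In-memory database mirroring the slide's Neurons table.
# SQLite and the column types are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Neurons (
        NeuronID         INTEGER PRIMARY KEY,  -- a candidate key
        NeuronName       TEXT NOT NULL,
        BrainRegion      TEXT NOT NULL,
        NeuroTransmitter TEXT NOT NULL,
        "Current"        TEXT NOT NULL         -- quoted: CURRENT is an SQL keyword
    )
    """
)
conn.executemany(
    "INSERT INTO Neurons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Purkinje Cell", "Cerebellum", "Glutamate", "Transient Na+"),
        (2, "Hilar Cell", "Dentate Gyrus", "GABA", "Ca2+"),
    ],
)

# Selecting rows and columns.
gaba_neurons = conn.execute(
    "SELECT NeuronName FROM Neurons WHERE NeuroTransmitter = ?", ("GABA",)
).fetchall()

# Grouping, aggregating and sorting.
per_region = conn.execute(
    "SELECT BrainRegion, COUNT(*) FROM Neurons "
    "GROUP BY BrainRegion ORDER BY BrainRegion"
).fetchall()

# The catalog is itself relational: sqlite_master is a table describing tables.
catalog = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

Note that the catalog query and the data queries are separate statements, which matches the slide's point that metadata lives in its own relational catalog.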
  • 8. Object Relational Model
      • Eases some of the problems of the classical relational model
      • Data values can be of arbitrary data types
          • Sets (e.g., multiple currents for a neuron)
          • Tuples (e.g., references ordered by year)
          • Time-series (e.g., raw EEG data)
          • Spatial data (e.g., atlases in CCDB)
      • Each data type can have its own operations
          • Find all data points within a neighborhood of a spatial location
      • Queries are still values
          • Catalog queries and data queries cannot be mixed in a single query
      • All industrial-strength DBMSs use some version of this model
      • Need to be a skilled DB programmer to develop custom
  • 9. XML (Two Perspectives)
      • Document Community
          • data = linear text documents
          • mark up (annotate) text pieces to describe context, structure, semantics of the marked text
      <physiologicalCondition> Oxidative stress </physiologicalCondition> has been proposed to be involved in the <biologicalProcess context="disease"> pathogenesis </biologicalProcess> of <disease> Parkinson's disease </disease> (PD). A plausible source of <physiologicalCondition> oxidative stress </physiologicalCondition> in <brainRegion> nigral </brainRegion> <neuron> dopaminergic neurons </neuron> is the redox reactions that specifically involve <chemical> dopamine </chemical> and produce various <chemical context="biologicalAgent"> toxic </chemical> molecules.
  • 10. XML (Two Perspectives)
      • Database Community
          • XML as a (most prominent) example of the semi-structured data model
          • => captures the whole spectrum from highly structured, regular data to unstructured data (relational, object-oriented, marked up text, ...)
      <?xml version="1.0" encoding="utf-8"?>
      <NDTF_Annotation>
        <description>A new annotation file </description>
        <timeMarker>true</timeMarker>
        <timeResolution>0.000001</timeResolution>
        <interval group_id="04">
          <eventNote timeOffset="1237888.230" attachedFile="sound1.wmv" application="realplayer">Text message for the event start.</eventNote>
          <eventNote timeOffset="18958585.232">Text message for the event end.</eventNote>
        </interval>
      </NDTF_Annotation>
      From the CARMEN gro
  • 11. XML as a Logical Data Model
      • XML is a tree-structured document
      • Nodes
          • Element nodes
              • Children can be ordered
              • Recursive elements (parts under parts)
          • Attribute nodes
              • Mandatory or optional
      • Edges
          • Sub-element edges
          • Attribute edges
          • IDRef edges
      • Constraints
          • References
          • Value restrictions, OneOf
          • Cardinality
      • Trees are more flexible than tables
      • Any number of nodes can be added anywhere without breaking the model
  • 12. XML as a Logical Data Model
      • XML has its own schema language
      • Lets you specify a complex type system
      • A database is a collection of XML trees
      • Storing XML
          • Mostly relational with some very clever indexing to encode the hierarchy, tree paths, and order
      • Querying XML
          • Elements, attribute names, values and structure can be queried
          • Multiple trees can be joined by value
          • Example (XPath): find images of the spinal column
              //image[.//structurelabel/text()="SPINALCOLUMN"]/ish_image_path
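A query of the kind shown on this slide can be run with Python's standard xml.etree.ElementTree module, which implements only a subset of XPath. In the sketch below the element names follow the slide's example, while the document itself, its nesting (structurelabel as a direct child of image) and the file paths are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names follow the slide's XPath example,
# but the nesting and the file paths are assumptions.
doc = ET.fromstring(
    """
    <images>
      <image>
        <structurelabel>SPINALCOLUMN</structurelabel>
        <ish_image_path>img/0001.jpg</ish_image_path>
      </image>
      <image>
        <structurelabel>CEREBELLUM</structurelabel>
        <ish_image_path>img/0002.jpg</ish_image_path>
      </image>
    </images>
    """
)

# ElementTree supports a limited XPath: this predicate matches <image>
# elements with a direct child <structurelabel> whose text equals the value.
paths = [
    el.text
    for el in doc.findall(".//image[structurelabel='SPINALCOLUMN']/ish_image_path")
]
```

A full XPath engine would also accept a descendant step inside the predicate; the direct-child form is used here only because ElementTree does not support that.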
  • 13. Misusing and Abusing XML
      • Using XML if your data is relational
          • It will result in flat trees that will suffer from complex querying
      • Encoding orders and hierarchies that need special parsing
          <Brand_Mixtures count="2">
            <Brand_Mixture_1> Apo-Levocarb (carbidopa + levodopa) </Brand_Mixture_1>
            <Brand_Mixture_2> Apo-Levocarb CR Controlled-Release Tablets (carbidopa + levodopa) </Brand_Mixture_2>
          </Brand_Mixtures>
      • Using implicit multi-valuedness
          <atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3">
            <array dictRef="cml:calcCharge" dataType="xsd:decimal" units="cml:electron">0.2 -0.3 0.1</array>
          </atomArray>
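The cost of implicit multi-valuedness shows up as soon as the atomArray fragment is parsed: the XML parser sees each attribute as one opaque string, and the application has to split and re-align the values itself. A minimal sketch, with the inner array element left out so the fragment parses without namespace declarations:

```python
import xml.etree.ElementTree as ET

# The atomArray element from the slide; the inner <array> child is omitted
# (an assumption made to keep the snippet standalone).
atom_array = ET.fromstring(
    '<atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3"/>'
)

# The parser returns each attribute as a single string ...
raw_ids = atom_array.get("atomID")

# ... so the special parsing (splitting and re-aligning columns) falls on
# the application rather than on the XML tooling.
ids = raw_ids.split()
elements = atom_array.get("elementType").split()
hydrogens = [int(n) for n in atom_array.get("hydrogenCount").split()]
atoms = [
    {"id": i, "element": e, "hydrogens": h}
    for i, e, h in zip(ids, elements, hydrogens)
]
```

Nothing in the XML structure guarantees that the three attribute strings hold the same number of values, so a schema or an XPath query cannot check or address an individual atom.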
  • 14. Expressing Semantics in XML
      • Adorning elements with namespaces
      • A namespace is a unique URI (Uniform Resource Identifier)
          • To disambiguate between two elements that happen to share the same name
          • To group elements relating to a common idea together
      <item xmlns:bp="">
        <bp:protein ID="Protein1">
          <bp:NAME>Metalloelastase</bp:NAME>
          <bp:XREF>
            <bp:unificationXref rdf:ID="Xref1">
              <bp:ID>NP_304845</bp:ID>
              <bp:DB>RefSeq</bp:DB>
            </bp:unificationXref>
          </bp:XREF>
        </bp:protein>
      </item>
  • 15. The Problem with XML Semantics
      • Two different XML representations of the same kind of information may not be easily unifiable
      • What did XML not encode?
  • 16. Resource Description Framework (RDF)
      [Figure: an RDF statement drawn as a graph. Node labels: rdf:Statement; URI(CNTFR-; URI(modulates); URI(eSNCA-mediated neurotoxicity); URI(membrane-protein); URI(protein-mediated toxicity); rdf:Property. Edge labels: rdf:subject; rdf:predicate; rdf:object; rdf:type]
  • 17. The Basic Constructs of RDF
      • RDF meta-model basic elements
          • Defined in the rdf namespace, except rdfs:Resource, which comes from RDF Schema
      • Types (or classes)
          • rdfs:Resource – everything that can be identified (with a URI)
          • rdf:Property – specialization of a resource expressing a binary relation between two resources
          • rdf:Statement – a triple with properties rdf:subject, rdf:predicate, rdf:object
      • Properties
          • rdf:type – subject is an instance of the category or class defined by the value
          • rdf:subject, rdf:predicate, rdf:object – relate elements of the statement tuple to a resource of type statement
  • 18. Relational Data vis-à-vis RDF
      • Node to edge ratio is relatively small in many applications
      • Number of relationships need not be fixed at design time
      • The general tendency is to keep the number of edge labels small
      • Graph-based operations can be performed on RDF; the same operations would require an unspecified number of joins on relational data
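The last point can be made concrete with a reachability query. Over a triple set it is a plain graph traversal, whereas a fixed relational query would need one self-join per hop, and the number of hops is not known in advance. The triples, the connectsTo predicate and the neuron names below are hypothetical.

```python
from collections import deque

# Hypothetical triples; predicate and node names are illustrative only.
triples = {
    ("NeuronA", "connectsTo", "NeuronB"),
    ("NeuronB", "connectsTo", "NeuronC"),
    ("NeuronC", "connectsTo", "NeuronD"),
    ("NeuronA", "locatedIn", "Cerebellum"),
}

def reachable(start, predicate, triples):
    """Return every node reachable from start via one or more predicate edges."""
    # Index the edges once: subject -> set of objects for this predicate.
    edges = {}
    for s, p, o in triples:
        if p == predicate:
            edges.setdefault(s, set()).add(o)

    # Breadth-first traversal; the depth is decided by the data, not the query.
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

downstream = reachable("NeuronA", "connectsTo", triples)
```

In SQL the same question needs either a recursive query or a chain of joins whose length has to be guessed beforehand.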
  • 19. RDF Blank Nodes
      • RDF allows one to create anonymous objects whose existence is known but details are not
      • There exists some neuron to which both NeuronX and NeuronY connect
      <neurons:NeuronX rdf:about="">
        <conn:connectsTo>
          <neurons:Neuron rdf:nodeID="n1"/>
        </conn:connectsTo>
      </neurons:NeuronX>
      <neurons:NeuronY rdf:about="">
        <conn:connectsTo>
          <neurons:Neuron rdf:nodeID="n1"/>
        </conn:connectsTo>
      </neurons:NeuronY>
  • 20. RDF Schema
      • Declaration of vocabularies
          • classes, properties, and relationships defined by a particular community
          • rdfs:Class, rdfs:subClassOf
      • Property-related
          • rdfs:subPropertyOf
          • relationship of properties to classes: rdfs:domain, rdfs:range
      • Provides substructure for inferences based on existing triples
      • NOT prescriptive, but descriptive
          • This is different from XML Schema
      • Schema language is an expression of the basic RDF model
          • uses meta-model constructs: resources, statements, properties
  • 21. Examples of RDF Inferencing
      • From this we can infer
          • (:alice rdf:type parent)
          • (:betty rdf:type parent)
          • (:eve rdf:type female-person)
          • (:charles rdf:type :person)
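The premises behind these conclusions are not part of the transcript, so the sketch below makes up a set of premises (the :hasChild and :hasDaughter properties and their domain and range declarations are assumptions) and forward-chains three RDFS rules over them: rdfs:domain, rdfs:range and rdfs:subClassOf. It is a toy illustration of descriptive schema inference, not the reasoner used in NIF.

```python
RDF_TYPE = "rdf:type"

def rdfs_closure(triples):
    """Forward-chain the domain, range and subClassOf rules to a fixpoint."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            # Domain and range: look up schema triples about the predicate p.
            for p2, rel, cls in inferred:
                if p2 != p:
                    continue
                if rel == "rdfs:domain":
                    new.add((s, RDF_TYPE, cls))
                elif rel == "rdfs:range":
                    new.add((o, RDF_TYPE, cls))
            # subClassOf: an instance of a class is an instance of its superclass.
            if p == RDF_TYPE:
                for sub, rel, sup in inferred:
                    if rel == "rdfs:subClassOf" and sub == o:
                        new.add((s, RDF_TYPE, sup))
        if new <= inferred:
            return inferred
        inferred |= new

# Hypothetical premises, chosen so that the slide's conclusions follow.
premises = {
    (":hasChild", "rdfs:domain", ":parent"),
    (":hasChild", "rdfs:range", ":person"),
    (":hasDaughter", "rdfs:domain", ":parent"),
    (":hasDaughter", "rdfs:range", ":female-person"),
    (":alice", ":hasChild", ":charles"),
    (":betty", ":hasDaughter", ":eve"),
}
derived = rdfs_closure(premises) - premises
```

The schema triples do not reject any data; they only license new rdf:type statements, which is the sense in which RDF Schema is descriptive rather than prescriptive.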
  • 22. RDF as a Logical Data Model
      • RDF does not distinguish between different relationships
          • Instance-to-type
          • Instance-to-instance
          • Type-to-instance
          • No transitivity inference is possible over, say, rdf:type
      • RDF (as well as XML) has lost the notion of the abstract data type like spatial object or time
          • Operations on object types do not mix well with RDF
      • Constraints like uniqueness and 1-to-1 relationships cannot be expressed
      • SPARQL, the query language for RDF, is
          • An edge-only language – it cannot express the // construct of XML
          • Blank nodes are treated as variables and are not output in the results
          • Parts of the language are undecidable!
      • A problem is undecidable if it can be proved that there can be no algorithm
  • 23. OWL Components of an OWL Ontology Vocabulary (concepts) Structure (attributes of concepts and hierarchy) Concept-to-concept, concept-to-data, property-to-property relationships Logical characteristics of relationships Domain and range restrictions Properties of relations (symmetry, transitivity) Cardinality of relations Open world vs. Closed world assumptions Contrast to most reasoning systems that assumeanything absent from knowledge base is not true Need to maintain monotonicity with tolerance forcontradictions OWL ClassesClass of all classes
  • 24. Basic OWL Constructs
      • Creating OWL Classes
          • disjointWith
              • Neurons are not glial cells
          • equivalentClass (sameClassAs in DAML+OIL)
              • Class GABAergic neuron is exactly the same class as neurons which have GABA as neurotransmitter
          • Enumerations (on instances)
              • Class Cerebellar lobules are Lobule I, Lobule II, …
          • Boolean set semantics (on classes)
              • Union (logical disjunction)
                  • Class nerve cell is union of neuron, glial cell
              • Intersection (logical conjunction of class with properties)
                  • Class hippocampal neurons is the conjunction of things of class Neuron that have property (has-soma-located-in) (hippocampus union any class that is (part-of) hippocampus)
              • complementOf (logical negation)
                  • Class ‘benign tumor’ is the complement of class ‘malignant tumor’
  • 25. Properties of OWL Properties
      • TransitiveProperty
          • P(x,y) and P(y,z) ⇒ P(x,z)
          • Example: subclassOf
      • SymmetricProperty
          • P(x,y) iff P(y,x)
          • Example: is_functionally_related_to
      • FunctionalProperty
          • P(x,y) and P(x,z) ⇒ y=z
          • Example: soma_located_in
      • inverseOf
          • P1(x,y) iff P2(y,x)
          • Example: regulates and is_regulated_by
      • InverseFunctionalProperty
          • P(y,x) and P(z,x) ⇒ y=z
          • Example: is_isoform_of
      • Cardinality
          • Only 0 or 1 in OWL Lite; any non-negative integer in OWL DL and OWL Full
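Each of these property characteristics is a small rule over pairs (x, y), and can be sketched directly on sets of tuples. The example pairs below are hypothetical. One caveat is stated as an assumption in the code: OWL does not make a unique-name assumption, so a real reasoner would conclude y = z for a functional property rather than report a conflict.

```python
def transitive_closure(pairs):
    """TransitiveProperty: P(x,y) and P(y,z) imply P(x,z)."""
    closure = set(pairs)
    while True:
        new = {(x, z) for x, y in closure for y2, z in closure if y == y2}
        if new <= closure:
            return closure
        closure |= new

def symmetric_closure(pairs):
    """SymmetricProperty: P(x,y) iff P(y,x)."""
    return set(pairs) | {(y, x) for x, y in pairs}

def inverse(pairs):
    """inverseOf: P1(x,y) iff P2(y,x)."""
    return {(y, x) for x, y in pairs}

def functional_conflicts(pairs):
    """FunctionalProperty: P(x,y) and P(x,z) imply y = z.

    Assumption: distinct names denote distinct things, so a subject with
    two different values is reported instead of the values being merged.
    """
    values = {}
    for x, y in pairs:
        values.setdefault(x, set()).add(y)
    return {x for x, ys in values.items() if len(ys) > 1}

# Hypothetical example data.
subclass_of = {("Purkinje cell", "Neuron"), ("Neuron", "Cell")}
regulates = {("GeneA", "GeneB")}
soma_located_in = {("n1", "Cerebellum"), ("n1", "Hippocampus"), ("n2", "Cerebellum")}

all_subclass_of = transitive_closure(subclass_of)
is_regulated_by = inverse(regulates)
conflicts = functional_conflicts(soma_located_in)
```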
  • 26. Instances in OWL
      • Instances are distinct from classes
      • In RDF there is no distinction between class and instances; the following is allowed in RDF
          • <Species, type, Class>
          • <Lion, type, Species>
          • <MyLion, type, Lion>
      • OWL DL restrictions
          • Type separation
              • A class cannot also be an individual or property
              • A property cannot also be an individual or class
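The three triples on this slide are legal RDF but break OWL DL's type separation, because Lion is used both as an individual (an instance of Species) and as a class (the type of MyLion). A minimal check for that pattern, assuming classes are declared with the object "Class" as on the slide; the function name is hypothetical:

```python
def names_used_as_class_and_individual(triples, metaclass="Class"):
    """Return names that appear both as a class and as an individual."""
    classes, individuals = set(), set()
    for s, p, o in triples:
        if p != "type":
            continue
        if o == metaclass:
            classes.add(s)        # s is declared to be a class
        else:
            individuals.add(s)    # s is an instance of o ...
            classes.add(o)        # ... which makes o a class
    return classes & individuals

slide_triples = [
    ("Species", "type", "Class"),
    ("Lion", "type", "Species"),
    ("MyLion", "type", "Lion"),
]
violations = names_used_as_class_and_individual(slide_triples)
```

Species does not show up in the result: it is declared a class and used as one, and is never an instance of an ordinary class.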
  • 27. A Rough Comparison
      [Comparison table not recoverable from the transcript]
      • RDF and OWL do not represent n-ary roles
  • 28. Querying OWL
      • There are several languages in the making
          • SPARQL engines (e.g., Virtuoso) are used often
          • Pellet is used for reasoning tasks
              • Subsumption
              • Consistency
          • New, more advanced languages like nSPARQL are coming up
          • vSPARQL is being developed to enable views on SPARQL, which will lead to nested SPARQL queries
      • Our goal
          • Develop a query processor for these advanced languages
          • Part of OntoQuest, our ontological information
  • 29. Where does NIF stand in this?
      • Not every model is directly inter-convertible with every other model
      • NIF is designed to
          • Work with multiple models
              • Ensure that the modeling capability and query capability of every model is preserved in its native form
              • Queries in our system get translated to queries in the native forms of the databases we federate
          • Express the local semantics of any data appropriately by
              • Augmenting the semantic model of the data
              • Connecting the data to NIF’s ontology
              • Extending the NIF ontology in the process
          • Develop a mechanism to create a common integrated model over these models
              • This model is an ontological graph that incorporates object and temporal semantics
  • 30. Example of An Ontological Extension
      • Representing time and events
          • Phenotypes, physiology, …
      • Instants, intervals, and periods
          • Temporal granularity of observation
      • Events
          • Multi-temporal observations based on conditions on properties
          • Modeling states, objects in state, and state transitions
          • One-only, repeatable, and time deictic events
          • Subevents
      • History of objects, events, roles
          • Subtype migration, temporal roles and role migration
          • Progression of disease, symptom or recovery states
          • Repeatability
      [Figure callout: Considering TOWL and Temporal ORM]
  • 31. Questions?