NIF as a Multi-Model Semantic Information System
Amarnath Gupta
NIF Webinar - March 30, 2010

Published in: Technology


Transcript

  • 1. Amarnath Gupta, University of California San Diego: NIF as a Multi-Model Semantic Information System. Part 1: Relational, XML, RDF and OWL models
  • 2. Preamble – 1 As we design and extend the NIF system werecognize that Users will give us data in any form that isconvenient for them Standard data may be stored in a flat file Web service output can be in XML Semantic Web enthusiasts may represent data usingproper RDF However, regardless of the form in which datamay be represented The NIF system must treat them(query, index, relate, ...) in a uniform manner The NIF system must utilize the underlying systems
  • 3. Preamble – 2 In this presentation we intend to Explain our perspective on these differentdata models Provide a background on the data modelswe consider Offer a sense of the “semantic character” ofthese data models Present our design philosophy on Where to keep them separate Where to transform them into a common model
  • 4. What is a Data Model? A conceptual data model is a formal representation of the users’/application’s mental model of the data elements and relationships that should be put in a database, manipulated, queried and operated upon. A logical data model is a formal description of the data model in a logical structure that a computer can use to perform queries and other operations; in many cases, the same conceptual model can be represented by different logical models. A physical data model is an implementable version of the data model in terms of data structures, access structures (e.g., indices) and the set of low-level operations.
  • 5. A Conceptual Model: ORM (Object-Role Modeling), Terry Halpin. [Diagram of the ORM notation, labeled: Object; Relationship/Role; Value Constraint; Uniqueness Constraint; Inter-relationship Constraint; Value Type; n-ary Role]
  • 6. A Logical Data Model A formal specification of The structure of the data The structure tells us how the data is organized(123, “Purkinje Cell”, Cerebellum)(828, Hippocampus, “Hilar Cell”) Often the structure of the data, together with someconstraints, represent some semantics If the data are not structured (like free text), the techniques forhandling them will be different. Operations on this structure Every data model is based on some mathematical principlesthat define what you can do with the data the nature of data values Data domains and data types operations on data valuesis not structured
  • 7. The Relational Data Model:
    Table (relation name): Neurons
      NeuronID | NeuronName    | BrainRegion   | NeuroTransmitter | Current
      1        | Purkinje Cell | Cerebellum    | Glutamate        | Transient Na+
      2        | Hilar Cell    | Dentate Gyrus | GABA             | Ca2+
    Each row is a tuple; each column has an attribute name; an attribute value cannot be complex. An attribute's domain is all possible values the attribute can take. A candidate key is a set of columns that uniquely determines a row. The relational model is a set (bag) of tuples model. Metadata is stored in a separate catalog, which is also relational. Constraints are first-order. All queries are some combination of selecting rows and columns; combining tables by union, intersection, and join; computing data or aggregate functions; grouping and sorting.
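The query categories above can be tried directly on the slide's Neurons table. The following is a minimal sketch using Python's built-in sqlite3 module; the second table, Regions, and its rows are hypothetical and exist only to illustrate a join.

```python
import sqlite3

# In-memory copy of the Neurons relation from the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Neurons (
           NeuronID INTEGER PRIMARY KEY,  -- candidate key used as primary key
           NeuronName TEXT NOT NULL,
           BrainRegion TEXT NOT NULL,
           NeuroTransmitter TEXT,
           "Current" TEXT                 -- quoted: CURRENT is an SQL keyword
       )"""
)
conn.executemany(
    "INSERT INTO Neurons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Purkinje Cell", "Cerebellum", "Glutamate", "Transient Na+"),
        (2, "Hilar Cell", "Dentate Gyrus", "GABA", "Ca2+"),
    ],
)

# Hypothetical second relation, only to illustrate a join.
conn.execute("CREATE TABLE Regions (BrainRegion TEXT PRIMARY KEY, ParentRegion TEXT)")
conn.executemany(
    "INSERT INTO Regions VALUES (?, ?)",
    [("Cerebellum", "Hindbrain"), ("Dentate Gyrus", "Hippocampal Formation")],
)

# Selection + projection.
gaba = conn.execute(
    "SELECT NeuronName FROM Neurons WHERE NeuroTransmitter = 'GABA'"
).fetchall()

# Join on a shared attribute.
joined = conn.execute(
    """SELECT n.NeuronName, r.ParentRegion
       FROM Neurons AS n JOIN Regions AS r ON n.BrainRegion = r.BrainRegion
       ORDER BY n.NeuronID"""
).fetchall()

# Grouping, aggregation and sorting.
counts = conn.execute(
    """SELECT NeuroTransmitter, COUNT(*) FROM Neurons
       GROUP BY NeuroTransmitter ORDER BY NeuroTransmitter"""
).fetchall()

print(gaba)    # [('Hilar Cell',)]
print(counts)  # [('GABA', 1), ('Glutamate', 1)]
```

Every answer is again a table of values, which is what lets relational queries compose.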
  • 8. Object Relational Model Eases some of the problems of the classical relationalmodel Data values can be of arbitrary data types Sets (e.g., multiple currents for a neuron) Tuples (e.g., references ordered by year) Time-series (e.g., raw EEG data) Spatial Data (e.g., atlases in CCDB) Each data type can have its own operations Find all data points within a neighborhood of a spatial location Queries are still values Catalog queries and data queries cannot be mixed in a singlequery All industrial-strength DBMSs use some version thismodel Need to be a skilled DB programmer to develop custom
  • 9. XML (Two Perspectives) Document Community data = linear text documents mark up (annotate) text pieces to describecontext, structure, semantics of the marked text<physiologicalCondition> Oxidative stress </physiologicalCondition> has beenproposed to be involved in the <biologicalProcess context=“disease”> pathogenesis</biologicalProcess> of <disease> Parkinsons disease</disease> (PD). A plausiblesource of <physiologicalCondition> oxidative stress </physiologicalCondition> in<brainRegion> nigral </brainRegion> <neuron> dopaminergic neurons </neuron> isthe redox reactions that specifically involve <chemical> dopamine </chemical> andproduce various <chemical context=“biologicalAgent”> toxic </chemical> molecules.
  • 10. XML (Two Perspectives) Database Community XML as a (most prominent) example of the semi-structured data model=> captures the whole spectrum from highlystructured, regular data to unstructured data(relational, object-oriented, marked up text, ...)<?xml version="1.0" encoding="utf-8"?><NDTF_Annotation><description>A new annotation file </description><timeMarker>true</timeMarker><timeResolution>0.000001</timeResolution><interval group_id="04"><eventNote timeOffset="1237888.230” attachedFile="sound1.wmv”application="realplayer">Text message for the eventstart.</eventNote><eventNote timeOffset="18958585.232">Text message for the eventend.</eventNote></interval>From the CARMEN gro
  • 11. XML as a Logical Data Model XML is a tree-structureddocument Nodes Element nodes Children can be ordered Recursive elements(parts under parts) Attribute nodes Mandatory or optional Edges Sub-element edges Attribute edges IDRef edges Constraints References Value restrictions, OneOf Cardinality• Trees are more flexible thantables• Any number of nodes can beadded anywhere withoutbreaking the model
  • 12. XML as a Logical Data Model: XML has its own schema language, which lets you specify a complex type system. A database is a collection of XML trees. Storing XML: mostly relational, with some very clever indexing to encode the hierarchy, tree paths, and order. Querying XML: elements, attribute names, values and structure can be queried, and multiple trees can be joined by value. Example (XPath) over http://mousespinal.brain-map.org/imageseries/detail/100002661.xml, to find images of the spinal column: //image[.//structurelabel/text()="SPINALCOLUMN"]/ish_image_path
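The slide does not say which indexing scheme is meant. One well-known technique for storing trees in tables is pre/post-order numbering: node a is an ancestor of node d exactly when a comes before d in pre-order and after it in post-order. The sketch below encodes a small hypothetical document (shaped like an image series, not the real file's contents) into flat rows and evaluates the XPath query above as comparisons over those rows.

```python
import xml.etree.ElementTree as ET


def encode(root):
    """Number every element by pre-order and post-order rank.

    Each row is (pre, post, level, tag, text), a flat, table-friendly form.
    """
    rows, counters = [], {"pre": 0, "post": 0}

    def visit(elem, level):
        pre = counters["pre"]
        counters["pre"] += 1
        for child in elem:
            visit(child, level + 1)
        rows.append((pre, counters["post"], level, elem.tag, (elem.text or "").strip()))
        counters["post"] += 1

    visit(root, 0)
    return sorted(rows)  # document order


def is_ancestor(a, d):
    # a is an ancestor of d iff a starts before d and ends after it.
    return a[0] < d[0] and a[1] > d[1]


def is_parent(a, d):
    return is_ancestor(a, d) and d[2] == a[2] + 1


# Hypothetical document shaped like an image series (not the real file's contents).
doc = ET.fromstring(
    "<imageseries>"
    "<image><structurelabel>SPINALCOLUMN</structurelabel>"
    "<ish_image_path>a.jpg</ish_image_path></image>"
    "<image><structurelabel>BRAIN</structurelabel>"
    "<ish_image_path>b.jpg</ish_image_path></image>"
    "</imageseries>"
)
rows = encode(doc)


def select(tag, text=None):
    return [r for r in rows if r[3] == tag and (text is None or r[4] == text)]


# //image[.//structurelabel/text()="SPINALCOLUMN"]/ish_image_path, as joins over rows
hits = [
    path[4]
    for image in select("image")
    if any(is_ancestor(image, lab) for lab in select("structurelabel", "SPINALCOLUMN"))
    for path in select("ish_image_path")
    if is_parent(image, path)
]
print(hits)  # ['a.jpg']
```

Because ancestry becomes two integer comparisons, a relational engine can answer such path queries with ordinary range indexes.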
  • 13. Misusing and Abusing XML Using XML if your data is relational It will result in flat trees that will suffer from complexquerying Encoding orders and hierarchies that need specialparsing<Brand_Mixtures count=“2”><Brand_Mixture_1> Apo-Levocarb (carbidopa + levodopa)</Brand_Mixture_1><Brand_Mixture_2> Apo-Levocarb CR Controlled-Release Tablets(carbidopa + levodopa) </Brand_Mixture_2></Brand_Mixtures> Using implicit multi-valuedness<atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3"><array dictRef="cml:calcCharge" dataType="xsd:decimal"units="cml:electron">0.2 -0.30.1</array></atomArray>
  • 14. Expressing Semantics in XML: Adorning elements with namespaces. A namespace is a unique URI (Uniform Resource Identifier), used to disambiguate between two elements that happen to share the same name and to group elements relating to a common idea together: <item xmlns:bp="http://www.biopax.org/release/biopax-level1.owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><bp:protein ID="Protein1"><bp:NAME>Metalloelastase</bp:NAME><bp:XREF><bp:unificationXref rdf:ID="Xref1"><bp:ID>NP_304845</bp:ID><bp:DB>RefSeq</bp:DB></bp:unificationXref></bp:XREF></bp:protein></item>
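A namespace-aware parser resolves each prefix to its URI, so two elements with the same local name but different namespaces really are different. The sketch below uses the slide's BioPAX fragment with Python's `xml.etree.ElementTree`; the extra un-namespaced `<ID>local-42</ID>` element is a hypothetical addition to show the disambiguation.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level1.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's BioPAX fragment, plus a hypothetical un-namespaced <ID>
# to show two elements that share a local name.
doc = f"""<item xmlns:bp="{BP}" xmlns:rdf="{RDF}">
  <bp:protein ID="Protein1">
    <bp:NAME>Metalloelastase</bp:NAME>
    <bp:XREF>
      <bp:unificationXref rdf:ID="Xref1">
        <bp:ID>NP_304845</bp:ID>
        <bp:DB>RefSeq</bp:DB>
      </bp:unificationXref>
    </bp:XREF>
  </bp:protein>
  <ID>local-42</ID>
</item>"""

root = ET.fromstring(doc)
ns = {"bp": BP}

# The parser expands prefixes, so tags become "{namespace-URI}localname".
protein = root.find("bp:protein", ns)
print(protein.tag)  # {http://www.biopax.org/release/biopax-level1.owl#}protein

# Same local name "ID", different namespaces, different elements.
bp_ids = [e.text for e in root.findall(".//bp:ID", ns)]
plain_ids = [e.text for e in root.findall(".//ID")]
xref_id = root.find(".//bp:unificationXref", ns).get(f"{{{RDF}}}ID")
print(bp_ids, plain_ids, xref_id)  # ['NP_304845'] ['local-42'] Xref1
```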
  • 15. The Problem with XML Semantics: Two different XML representations of the same kind of information may not be easily unifiable. What did XML not encode?
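As a small illustration of the question above, take two hypothetical sources that state the same fact in differently shaped trees. Nothing in the XML says that `name` and `label` mean the same thing, so the correspondence has to be supplied by a hand-written mapping per source; the triple vocabulary (`located_in`) is our own assumption.

```python
import xml.etree.ElementTree as ET

# Two hypothetical sources stating the same fact in different shapes.
source_a = ET.fromstring('<neuron name="Purkinje Cell" region="Cerebellum"/>')
source_b = ET.fromstring(
    "<cell><label>Purkinje Cell</label>"
    "<location><structure>Cerebellum</structure></location></cell>"
)

# Structurally, the trees have nothing in common.
print(ET.tostring(source_a) == ET.tostring(source_b))  # False


# Nothing in either file says that name ~ label or region ~ location/structure,
# so a hand-written mapping per source has to supply that meaning.
def facts_from_a(e):
    return {(e.get("name"), "located_in", e.get("region"))}


def facts_from_b(e):
    return {(e.findtext("label"), "located_in", e.findtext("location/structure"))}


print(facts_from_a(source_a) == facts_from_b(source_b))  # True
```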
  • 16. Resource Description Framework (RDF): [Figure: an rdf:Statement node with rdf:subject, rdf:predicate and rdf:object edges to URI(CNTFR-…), URI(modulates) and URI(eSNCA-mediated neurotoxicity); rdf:type edges link these to URI(membrane-protein), rdf:Property and URI(protein-mediated toxicity)]
  • 17. The Basic Constructs of RDF: The basic elements of the RDF meta-model are defined in the rdf namespace http://www.w3.org/1999/02/22-rdf-syntax-ns# (except Resource, which lives in the rdfs namespace). Types (or classes): rdfs:Resource, everything that can be identified (with a URI); rdf:Property, a specialization of a resource expressing a binary relation between two resources; rdf:Statement, a triple with the properties rdf:subject, rdf:predicate, rdf:object. Properties: rdf:type states that the subject is an instance of the category or class defined by the value; rdf:subject, rdf:predicate and rdf:object relate the elements of a statement tuple to a resource of type rdf:Statement.
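The statement-as-resource construction in the figure (RDF reification) can be sketched with plain Python tuples as triples. The `EX` namespace and the resource names `ProteinA`, `ToxicityB` and `stmt1` are hypothetical stand-ins for the figure's URIs; only the `rdf:` terms are real RDF vocabulary.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace for illustrative terms


def rdf(name):
    return RDF + name


def reify(stmt, s, p, o):
    """Describe the triple (s, p, o) as a resource of type rdf:Statement."""
    return {
        (stmt, rdf("type"), rdf("Statement")),
        (stmt, rdf("subject"), s),
        (stmt, rdf("predicate"), p),
        (stmt, rdf("object"), o),
    }


def unreify(graph, stmt):
    """Read the (subject, predicate, object) back from a reified statement."""
    def value(prop):
        return next(o for s, p, o in graph if s == stmt and p == rdf(prop))
    return value("subject"), value("predicate"), value("object")


s, p, o = EX + "ProteinA", EX + "modulates", EX + "ToxicityB"
graph = reify(EX + "stmt1", s, p, o) | {
    (s, rdf("type"), EX + "MembraneProtein"),
    (p, rdf("type"), rdf("Property")),
    (o, rdf("type"), EX + "ProteinMediatedToxicity"),
}

print(unreify(graph, EX + "stmt1") == (s, p, o))  # True
print((s, p, o) in graph)  # False: describing a statement does not assert it
```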
  • 18. Relational Data vis-à-vis RDF: The node-to-edge ratio is relatively small in many applications. The number of relationships need not be fixed at design time. The general tendency is to keep the number of edge labels small. Graph-based operations can be performed on RDF that would require an unspecified number of joins on relational data.
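The "unspecified number of joins" point can be seen with a variable-length path. The sketch below stores hypothetical `part_of` facts as triples and finds everything reachable by one or more `part_of` edges with a breadth-first search, a graph operation whose depth does not have to be known in advance.

```python
from collections import deque

# Hypothetical part_of facts stored as (subject, predicate, object) triples.
triples = {
    ("Lobule_I", "part_of", "Anterior_Lobe"),
    ("Anterior_Lobe", "part_of", "Cerebellar_Cortex"),
    ("Cerebellar_Cortex", "part_of", "Cerebellum"),
    ("Cerebellum", "part_of", "Hindbrain"),
}


def reachable(graph, start, predicate):
    """Every node reachable from start via one or more `predicate` edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in graph:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


print(sorted(reachable(triples, "Lobule_I", "part_of")))
# ['Anterior_Lobe', 'Cerebellar_Cortex', 'Cerebellum', 'Hindbrain']
```

Over a relational table `part_of(child, parent)`, a fixed SQL query with k self-joins only finds ancestors exactly k hops away, so the same question needs either one query per depth or a recursive query.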
  • 19. RDF Blank Nodes RDF allows one to create anonymous objects whoseexistence is known but details are not There exists some neuron to which both NeuronX andNeuronY connect <neurons:NeuronXrdf:about="http://neurons.org/Neuron#NeuronX"><conn:connectsTo><neurons:Neuron rdf:nodeID=“n1"/></conn:connectsTo></neurons:NeuronX> <neurons:NeuronYrdf:about="http://neurons.org/Neuron#NeuronY"><conn:connectsTo><neurons:Neuron rdf:nodeID=“n1"/></conn:connectsTo></neurons:NeuronY>
  • 20. RDF Schema Declaration of vocabularies classes, properties, and relationships defined by aparticular community rdfs:Class, rdfs:subClassOf Property-related rdfs:subPropertyOf relationship of properties to classes rdfs:domain, rdfs:range Provides substructure for inferences based on existingtriples NOT prescriptive, but descriptive This is different from XML Schema Schema language is an expression of basic RDF model uses meta-model constructs:resources, statements, properties
  • 21. Examples of RDF Inferencing: [Figure: premise triples] From this we can infer (:alice rdf:type parent), (:betty rdf:type parent), (:eve rdf:type female-person), (:charles rdf:type :person).
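"Descriptive, not prescriptive" means that RDFS schema triples license new conclusions instead of rejecting data. The sketch below forward-chains four of the RDFS entailment rules (rdfs2 domain, rdfs3 range, rdfs9 type inheritance, rdfs11 subclass transitivity) to a fixpoint over a hypothetical vocabulary; the slide's own premise triples are not reproduced here.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def rdfs_closure(triples):
    """Forward-chain four RDFS rules (rdfs2, rdfs3, rdfs9, rdfs11) to a fixpoint."""
    g = set(triples)
    while True:
        new = set()
        for s, p, o in g:
            for s2, p2, o2 in g:
                if p2 == DOMAIN and s2 == p:                      # rdfs2
                    new.add((s, TYPE, o2))
                if p2 == RANGE and s2 == p:                       # rdfs3
                    new.add((o, TYPE, o2))
                if p == TYPE and p2 == SUBCLASS and s2 == o:      # rdfs9
                    new.add((s, TYPE, o2))
                if p == SUBCLASS and p2 == SUBCLASS and s2 == o:  # rdfs11
                    new.add((s, SUBCLASS, o2))
        if new <= g:
            return g
        g |= new


# Hypothetical vocabulary and data (not the slide's example).
facts = {
    ("ex:synapsesOn", DOMAIN, "ex:Neuron"),
    ("ex:synapsesOn", RANGE, "ex:Neuron"),
    ("ex:Neuron", SUBCLASS, "ex:Cell"),
    ("ex:cellA", "ex:synapsesOn", "ex:cellB"),
}
closed = rdfs_closure(facts)
print(sorted(t for t in closed if t[1] == TYPE))
```

Nothing declared `ex:cellA` to be a neuron; where XML Schema validation would reject an untyped element, RDFS simply infers the type from the property's domain.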
  • 22. RDF as a Logical Data Model RDF does not distinguish between differentrelationships Instance-to-type Instance-to-instance Type-to-instance No transitivity inference is possible over, say, rdf:type RDF (as well as XML) has lost the notion of theabstract data type like spatial object or time Operations on object types does not mix well with RDF Constraints like uniqueness, 1-to-1relationships, cannot be expressed SPARQL, the query language for RDF is An edge-only language – it cannot express the //construct of XML Blank nodes are treated as variables not output in theresults Parts of the language are undecidable!A problem is undecidable if it can be proved that there can be no algorithm
  • 23. OWL Components of an OWL Ontology Vocabulary (concepts) Structure (attributes of concepts and hierarchy) Concept-to-concept, concept-to-data, property-to-property relationships Logical characteristics of relationships Domain and range restrictions Properties of relations (symmetry, transitivity) Cardinality of relations Open world vs. Closed world assumptions Contrast to most reasoning systems that assumeanything absent from knowledge base is not true Need to maintain monotonicity with tolerance forcontradictions OWL ClassesClass of all classes
  • 24. Basic OWL Constructs Creating OWL Classes disjointWith Neurons are not glial cells sameClassAs (equivalence) Class Gabaergic neuron is exactly the same class asneuronswhich has GABA as neurotransmitter Enumerations (on instances) Class Cerebellar lobules are Lobule I, Lobule II, … Boolean set semantics (on classes) Union (logical disjunction) Class nerve cell is union of neuron, glial cell Intersection (logical conjunction of class with properties) Class hippocampal neurons is conjunction of things ofclass Neuron and have property (has-soma-located-in)(hippocampus union any class that is (part-of)hippocampus) complimentOf (logical negation) Class ‘benign tumor’ is disjunct of class ‘malignanttumor’
  • 25. Properties of OWL Properties: TransitiveProperty: P(x,y) ∧ P(y,z) → P(x,z), e.g., subClassOf. SymmetricProperty: P(x,y) ↔ P(y,x), e.g., is_functionally_related_to. FunctionalProperty: P(x,y) ∧ P(x,z) → y = z, e.g., soma_located_in. inverseOf: P1(x,y) ↔ P2(y,x), e.g., regulates / is_regulated_by. InverseFunctionalProperty: P(y,x) ∧ P(z,x) → y = z, e.g., is_isoform_of. Cardinality: only 0 or 1 in OWL Lite; arbitrary non-negative integers in OWL DL and OWL Full.
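Each characteristic above is a condition on the set of (x, y) pairs a property holds for, so it can be checked directly on a finite extension. The sketch below does this in Python for hypothetical property extensions; a reasoner would use these characteristics to infer new pairs rather than merely test them.

```python
from itertools import product


def is_transitive(r):
    return all((x, w) in r for (x, y), (z, w) in product(r, r) if y == z)


def is_symmetric(r):
    return all((y, x) in r for x, y in r)


def is_functional(r):
    return all(y == z for (x1, y), (x2, z) in product(r, r) if x1 == x2)


def is_inverse_functional(r):
    return all(y == z for (y, x1), (z, x2) in product(r, r) if x1 == x2)


def inverse(r):
    return {(y, x) for x, y in r}


def transitive_closure(r):
    closure = set(r)
    while True:
        step = {(x, w) for x, y in closure for z, w in closure if y == z}
        if step <= closure:
            return closure
        closure |= step


# Hypothetical property extensions.
sub_class_of = {("PurkinjeCell", "Neuron"), ("Neuron", "Cell")}
soma_located_in = {("p1", "Cerebellum"), ("h1", "DentateGyrus")}
functionally_related = {("geneA", "geneB"), ("geneB", "geneA")}
regulates = {("geneA", "geneB")}
is_isoform_of = {("isoform1", "proteinX"), ("isoform2", "proteinY")}

print(is_transitive(sub_class_of), is_transitive(transitive_closure(sub_class_of)))  # False True
print(is_symmetric(functionally_related), is_functional(soma_located_in))  # True True
print(inverse(regulates))  # {('geneB', 'geneA')}
print(is_inverse_functional(is_isoform_of))  # True
```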
  • 26. Instances in OWL Instances are distinct from Classes In RDF there is no distinction between class andinstances <Species, type, Class> <Lion, type, Species> <MyLion, type, Lion> OWL DL restrictions Type separation Class can not also be an individual or property Property can not also be an individual or classis allowed in RDF
  • 27. A Rough Comparison: [comparison table] Note: RDF and OWL do not represent n-ary roles.
  • 28. Querying OWL The are several languages in the making SPARQL engines (e.g., Virtuoso) are used often Pellet is used for reasoning tasks Subsumption Consistency New, more advanced languages like nSPARQLare coming up vSPARQL is being developed to enable views onSPARQL, which will lead to nested SPARQLqueries Our goal Develop a query processor for these advancedlanguages Part of OntoQuest, our ontological information
  • 29. Where does NIF stand in this? Not every model is directly inter-convertible with every other model. NIF is designed to: work with multiple models; ensure that the modeling capability and query capability of every model is preserved in its native form (queries in our system get translated to queries in the native forms of the databases we federate); express the local semantics of any data appropriately by augmenting the semantic model of the data, connecting the data to NIF’s ontology, and extending the NIF ontology in the process; and develop a mechanism to create a common integrated model over these models. This model is an ontological graph that incorporates object and temporal semantics.
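The idea of translating one query into the native forms of federated sources can be sketched as a tiny mediator. Everything below is hypothetical and is not NIF's implementation: two toy sources (a SQLite table and an XML document), one wrapper per source that answers the ontology-level question "which neurons are located_in a region?" in that source's own query form, and a mediator that merges the answers.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Source 1 (hypothetical): relational, queried in SQL by its own engine.
rel = sqlite3.connect(":memory:")
rel.execute("CREATE TABLE Neurons (NeuronName TEXT, BrainRegion TEXT)")
rel.executemany(
    "INSERT INTO Neurons VALUES (?, ?)",
    [("Purkinje Cell", "Cerebellum"), ("Hilar Cell", "Dentate Gyrus")],
)

# Source 2 (hypothetical): XML, navigated as a tree.
xml_src = ET.fromstring(
    "<cells>"
    "<cell region='Cerebellum'><label>Granule Cell</label></cell>"
    "<cell region='CA1'><label>Pyramidal Cell</label></cell>"
    "</cells>"
)


# Per-source wrappers translate one ontology-level question,
# "which neurons are located_in <region>?", into each native query form.
def from_relational(region):
    cur = rel.execute("SELECT NeuronName FROM Neurons WHERE BrainRegion = ?", (region,))
    return [name for (name,) in cur]


def from_xml(region):
    return [c.findtext("label") for c in xml_src.iterfind("cell") if c.get("region") == region]


WRAPPERS = [from_relational, from_xml]


def neurons_located_in(region):
    """Mediator: send the query to every source and merge the answers."""
    return sorted({name for wrapper in WRAPPERS for name in wrapper(region)})


print(neurons_located_in("Cerebellum"))  # ['Granule Cell', 'Purkinje Cell']
```

Each source keeps its own model and query engine; only the answers are brought into the shared vocabulary.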
  • 30. Example of an Ontological Extension: Representing time and events. Phenotypes, physiology, …: instants, intervals, and periods; temporal granularity of observation. Events: multi-temporal observations based on conditions on properties; modeling states, objects in state, and state transitions; one-only, repeatable, and time-deictic events; subevents. History of objects, events, roles: subtype migration; temporal roles and role migration; progression of disease, symptom or recovery states; repeatability. Considering TOWL and Temporal ORM.
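A few of the temporal notions above (intervals, states with validity periods, state transitions) can be sketched in a small Python example. It assumes half-open intervals and implements only a subset of Allen's interval relations; the disease-progression history is hypothetical, and this is not TOWL or Temporal ORM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open time interval [start, end)."""
    start: float
    end: float

    def contains(self, t):
        return self.start <= t < self.end

    def relation(self, other):
        """A few of Allen's 13 interval relations; everything else -> 'other'."""
        if self.end < other.start:
            return "before"
        if self.end == other.start:
            return "meets"
        if other.start < self.start and self.end < other.end:
            return "during"
        if self.start < other.start < self.end < other.end:
            return "overlaps"
        return "other"


# Hypothetical disease-progression history, times in days.
history = [
    ("healthy", Interval(0, 100)),
    ("symptomatic", Interval(100, 250)),
    ("recovering", Interval(250, 400)),
]


def state_at(t):
    """Which state was the object in at instant t (None if unrecorded)?"""
    return next((state for state, iv in history if iv.contains(t)), None)


def transitions():
    """(from_state, to_state, time) for each recorded state change."""
    return [(a, b, iv_b.start) for (a, _), (b, iv_b) in zip(history, history[1:])]


print(state_at(100), transitions()[0])  # symptomatic ('healthy', 'symptomatic', 100)
print(Interval(120, 130).relation(history[1][1]))  # during
```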
  • 31. Questions?