Data: evolution
● Increased complexity (2/2)
  ● Clinical records are not what they used to be :-)
  ● From plain text to structured info
  ● References to external sources (ICD,...)
  ● Multimedia (pacemaker, images, 3D)
  ● Soon: genetic info, links to ancestors' EHRs...
Data: evolution
● Increased sharing/reuse
  ● Possible now that data are available electronically
  ● Cumulative effect (especially in complex domains such as bio, with lots of interdependencies)
  ● Sometimes for purposes not originally foreseen
Data
● Increased quantity
● Increased complexity
● Increased sharing/reuse
Shifting from direct consumption by humans to consumption by program(s) for other programs or for humans
Requirement1: data annotation
● Proxy so that the whole dataset does not have to be examined at each query
  ● Annotations can be difficult or time-consuming to produce
  ● Easier or faster or better results when considering the annotations instead of the data
● Share not only the data, but their annotations as well!
  ● Annotations become data of their own (although we seldom annotate them :-)
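The "proxy" idea can be sketched as an index built over the annotations, so that a query never touches the raw records. This is only an illustration: the record contents, the tags and the helper names (`build_index`, `find`) are assumptions, not part of the original slides.

```python
from collections import defaultdict

# Hypothetical dataset: record id -> raw content (possibly large).
records = {
    "rec1": "Patient fell while skiing; X-ray shows a fracture of the left tibia.",
    "rec2": "Routine pacemaker check, no anomaly detected.",
    "rec3": "Follow-up X-ray: the tibia fracture is healing normally.",
}

# Hypothetical annotations, produced once by a person or a program.
annotations = {
    "rec1": {"Fracture", "Leg", "Radiology"},
    "rec2": {"Pacemaker"},
    "rec3": {"Fracture", "Leg", "Radiology"},
}


def build_index(annotations):
    """Map each annotation to the set of records that carry it."""
    index = defaultdict(set)
    for record_id, tags in annotations.items():
        for tag in tags:
            index[tag].add(record_id)
    return index


def find(index, *tags):
    """Return the ids of the records carrying all the requested annotations."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


index = build_index(annotations)
print(sorted(find(index, "Fracture", "Leg")))  # ['rec1', 'rec3']
```

The query only reads `index`; the text in `records` is never scanned, which is the point of sharing the annotations along with the data.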
Requirement1: data annotation
● Figuring out the correct and relevant information is easy for Homo Sapiens...
● Ex: how much does “The Semantic Web primer” cost?
Requirement1: data annotation
● Figuring out the correct and relevant information is easy for Homo Sapiens...
● ... but difficult for Programus Simplex
Requirement2: data integration
● Aggregate and compose information
  ● Ex: how old were the Nebula award winners when they won the prize?
  ● Ex: how many books had they published?
  ● Ex: average age of the Canadian Nebula winners
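A minimal sketch of what answering such questions involves: two independent sources have to be joined on a shared key before anything can be aggregated. All the records below are made-up placeholders ("Author A", years, countries), and `ages_at_award` is a hypothetical helper.

```python
# Source 1 (hypothetical placeholder data): award records.
awards = [
    {"winner": "Author A", "award": "Nebula", "year": 1990},
    {"winner": "Author B", "award": "Nebula", "year": 2001},
    {"winner": "Author C", "award": "Nebula", "year": 2010},
]

# Source 2 (hypothetical placeholder data): biographical records.
people = {
    "Author A": {"born": 1950, "country": "Canada"},
    "Author B": {"born": 1961, "country": "United States"},
    "Author C": {"born": 1980, "country": "Canada"},
}


def ages_at_award(awards, people, award="Nebula", country=None):
    """Join both sources on the winner's name and return approximate ages.

    The age is the difference between the award year and the birth year,
    so it can be off by one depending on the birthday.
    """
    ages = []
    for entry in awards:
        person = people.get(entry["winner"])
        if entry["award"] != award or person is None:
            continue
        if country is not None and person["country"] != country:
            continue
        ages.append(entry["year"] - person["born"])
    return ages


canadian = ages_at_award(awards, people, country="Canada")
print(sum(canadian) / len(canadian))  # 35.0
```

The join works here only because both sources happen to spell the winner's name identically; in practice, agreeing on identifiers is a large part of the integration problem.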
Requirement3: data interpretation
● Google query on owl
● Retrieve all the pictures of a sailing boat in a harbor in Brittany
● Retrieve all the radiological exams of a fracture of the leg
Requirement3: data interpretation
● Google for “owl”
  ● Noise: owl (bird) vs. OWL (DL language)
  ● Silence: a page mentioning “Web Ontology Language” but not “OWL” would be ignored
● How about looking for an OWL ontology about owls (the birds)? :-)
● Annotations are great but not enough
● The meaning associated with these annotations is important too
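Noise and silence can be reproduced with a naive keyword search. The three page texts below are invented for the illustration, and `keyword_search` is a hypothetical helper doing plain case-insensitive substring matching.

```python
# Hypothetical pages: page id -> text.
pages = {
    "birds": "The barn owl is a nocturnal bird of prey.",
    "spec": "OWL is a language for authoring ontologies.",
    "intro": "The Web Ontology Language builds on RDF Schema.",
}


def keyword_search(pages, keyword):
    """Return the ids of the pages whose text contains the keyword."""
    needle = keyword.lower()
    return {page_id for page_id, text in pages.items() if needle in text.lower()}


hits = keyword_search(pages, "owl")

# What someone interested in the ontology language actually wants.
relevant = {"spec", "intro"}

noise = hits - relevant    # retrieved but irrelevant
silence = relevant - hits  # relevant but not retrieved

print(sorted(noise))    # ['birds']
print(sorted(silence))  # ['intro']
```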
Requirement3: data interpretation
● Retrieve all the pictures of a sailing boat in a harbor in Brittany
:-) :-( :-)
Ontologies: what they are
● Ontologies: formal representation of a shared conceptualization [Gruber] [Chandrasekaran]
● The structure underlying annotations
● Oftentimes, everything that is implicit in a factual document (clinical record, factual report...)
Ontologies: what they are not
● Ontology (the branch of philosophy)
● Controlled vocabularies, terminologies,... (although both are useful)
● Sets of annotated data (genericity is the key)
Ontologies: principles
● Individuals: things
  ● They are instances of classes
  ● We hardly see them in ontologies (genericity)...
  ● ... except when they represent things that are widely reused (e.g. geographic entities)
Ontologies: principles
● Properties: binary relations between individuals
  ● Ontologies can specify their domain and range
  ● Additional features: transitivity, functionality, symmetry, reflexivity,...
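A small sketch of what domain, range and transitivity buy us, in plain Python rather than in an ontology language. The declarations are assumptions made for the illustration (that `authorOf` goes from Author to Book, that `citizenOf` goes from Person to Country, and a hypothetical transitive `locatedIn` property); `infer_types` and `transitive_closure` are hypothetical helpers.

```python
# Assumed property declarations: property name -> (domain, range).
properties = {
    "authorOf": ("Author", "Book"),
    "citizenOf": ("Person", "Country"),
}

# Facts as (subject, property, object).
facts = [
    ("Frank Herbert", "authorOf", "Dune"),
    ("Frank Herbert", "citizenOf", "United States"),
]


def infer_types(facts, properties):
    """Derive (individual, class) pairs from the domain and range of each property."""
    types = set()
    for subject, prop, obj in facts:
        if prop in properties:
            domain, range_ = properties[prop]
            types.add((subject, domain))  # the subject belongs to the domain
            types.add((obj, range_))      # the object belongs to the range
    return types


def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


print(sorted(infer_types(facts, properties)))

# Hypothetical transitive property "locatedIn".
located_in = {("Rennes", "Brittany"), ("Brittany", "France")}
print(("Rennes", "France") in transitive_closure(located_in))  # True
```

Note that domain and range are used here to infer types, not to reject facts: nobody stated that Dune is a Book, it follows from the declaration of `authorOf`.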
Ontologies: principles
● Classes: sets of things (think genericity)
  ● e.g. Rabbit (as opposed to Bugs Bunny)
  ● Organized hierarchically (taxonomy), from the more general to the more specific (multiple inheritance is allowed)
  ● Inheritance of properties
  ● True path rule: if class A annotates some data, then all the ancestors of A are also valid annotations (so if you tag a picture as BugsBunny, you do not need to mention Rabbit, CartoonCharacter,...)
  ● Can represent constraints on the properties of their instances
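The true path rule amounts to walking up the hierarchy from each tag. In the sketch below, BugsBunny, Rabbit and CartoonCharacter come from the slide; Mammal, Animal and FictionalCharacter are assumed ancestors added for the illustration, and `ancestors` and `valid_annotations` are hypothetical helpers.

```python
# Class -> direct parents. A class may have several parents (multiple inheritance).
parents = {
    "BugsBunny": ["Rabbit", "CartoonCharacter"],
    "Rabbit": ["Mammal"],                         # assumed
    "Mammal": ["Animal"],                         # assumed
    "CartoonCharacter": ["FictionalCharacter"],   # assumed
}


def ancestors(cls, parents):
    """Return every class reachable from cls by following parent links."""
    seen = set()
    stack = list(parents.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def valid_annotations(tags, parents):
    """Apply the true path rule: each tag implies all of its ancestors."""
    result = set(tags)
    for tag in tags:
        result |= ancestors(tag, parents)
    return result


print(sorted(valid_annotations({"BugsBunny"}, parents)))
# ['Animal', 'BugsBunny', 'CartoonCharacter', 'FictionalCharacter', 'Mammal', 'Rabbit']
```

A query for pictures of Rabbit can therefore match a picture that was only ever tagged BugsBunny.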
Data and ontologies: example
● Classes: general knowledge (RDFS realm)
  ● Sci-Fi Book rdfs:subClassOf Book
● Instance(s): data-specific, no generalization (RDF realm)
  ● Dune rdf:type Sci-Fi Book
  ● Dune rdf:type Book
Data and ontologies: example
● Sci-Fi Book rdfs:subClassOf Book
● Dune rdf:type Sci-Fi Book
● The semantics of RDFS allows us to infer that Dune is an instance of Book! (so we do not need to say it explicitly in the RDF file anymore)
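The inference on this slide can be sketched without any RDF library, by applying two RDFS entailment patterns (subclass transitivity and type propagation along rdfs:subClassOf) until nothing new is produced. Triples are plain Python tuples here and `rdfs_closure` is a hypothetical helper; a real application would more likely rely on an RDF toolkit and a reasoner.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# The two triples stated explicitly on the slide.
triples = {
    ("Sci-Fi Book", SUBCLASS, "Book"),
    ("Dune", TYPE, "Sci-Fi Book"),
}


def rdfs_closure(triples):
    """Return the triples plus everything derivable through rdfs:subClassOf."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 != SUBCLASS or o != s2:
                    continue
                if p == TYPE:
                    # x rdf:type A, A rdfs:subClassOf B  =>  x rdf:type B
                    new.add((s, TYPE, o2))
                elif p == SUBCLASS:
                    # A rdfs:subClassOf B, B rdfs:subClassOf C  =>  A rdfs:subClassOf C
                    new.add((s, SUBCLASS, o2))
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_closure(triples)
print(("Dune", TYPE, "Book") in closure)  # True
```

The triple `Dune rdf:type Book` never appears in the input; it is derived from the general knowledge about classes.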
Data and ontologies: example
[Figure: RDF graph of the example]
● Classes, organized with rdfs:subClassOf: Litterat. Award, Sci-Fi Award, Book, Person, Country, Nebula Award, Sci-Fi Book, Author
● Instances, attached to their classes with rdf:type: United States, Dune, Nebula Award 1965, Frank Herbert
● Properties between instances: citizenOf, authorOf, wonAward
Synthesis
● Annotations are important for efficient data description
  ● Integration (incl. future reuse)
  ● Interpretation
  ● Focus on describing data as precisely as possible
● Ontologies are important for interpreting these descriptions
  ● General knowledge about a domain
  ● Reusable
  ● Support automatic reasoning
Synthesis
● Building ontologies is difficult
  ● We have extensive experience in building bad ontologies…
  ● … but wide adoption is more important
  ● The lesson learned from the Gene Ontology