BRII Project @ Bristol 21/09/2009


Frankenstein ontologies, and the models that use them.
Choices, modelling and Frankenstein Ontologies

  1. BRII Project @ Bristol 21/09/2009: Frankenstein ontologies, and the models that use them.
  2. First things first
     - what the main issues are with this sort of information.
     Which leads quickly to:
     - how we internally represent Things and why we do it the way we do.
     - the vocabularies we use
     - and how we are helping the university to contribute back
  3. The Problem: Information about research and the people involved in it changes all the time. Otherwise, it wouldn't be research.
  5. Don't forget, information about Things should always include context.
     - Where did the information come from?
     - From whom?
     - How old is it?
     - How valid is it?
     - Who can see it?
     - When is it valid?
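
One way to picture these six questions is as fields that travel with every statement about a Thing. The Python sketch below is illustrative only: the class name `Statement`, its field names, the identifier `person:jane-doe` and all sample values (source, cataloguer, dates, groups) are assumptions made for the example, not part of the system described in the talk. Only the pen-name claim is borrowed from the slides.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Statement:
    """A piece of information about a Thing, together with its context."""
    about: str                          # the Thing being described
    claim: str                          # the information itself
    source: str                         # where did it come from?
    stated_by: str                      # from whom?
    recorded_on: date                   # how old is it?
    evidence: str                       # how valid is it?
    visible_to: FrozenSet[str]          # who can see it?
    valid_from: Optional[date] = None   # when is it valid? (None = open-ended)
    valid_until: Optional[date] = None

    def age_in_days(self, today: date) -> int:
        """Number of days between recording the statement and `today`."""
        return (today - self.recorded_on).days

    def visible_for(self, group: str) -> bool:
        """True if members of `group` are allowed to see this statement."""
        return group in self.visible_to


# Hypothetical sample data.
pen_name = Statement(
    about="person:jane-doe",
    claim="publishes under the name George Maxwell",
    source="publisher-feed",
    stated_by="library-cataloguer",
    recorded_on=date(2009, 9, 1),
    evidence="matched against two published title pages",
    visible_to=frozenset({"staff", "public"}),
)

print(pen_name.age_in_days(date(2009, 9, 21)))
print(pen_name.visible_for("public"))
```

Leaving `valid_from` and `valid_until` empty models a statement with no known validity window, which keeps "when is it valid?" distinct from "how old is it?".
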
  6. Validity and context
     Ranges from the simple: “Jane Smith (born Jane Doe) publishes under the name George Maxwell”
     They are all names after all; only the context lets us tell them apart: when to use them and when not.
  7. Validity and context
     To the precise: “Jane Doe, 35, married July 1995 to Richard Smith...”
  8. How we cope
     The system that holds the canonical version of the Things metadata does not provide the query technologies we are using at any given time. This allows for a clear separation of concepts and information from the forms in which we wish to ask questions of them.
  9. How we cope
     We are not bound to any one way of looking at or indexing our data. Currently, we use RDF to serialise our information in the store, with Lucene (Solr) and quadstore (4store) indexes built from it.
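
The separation described in the two "How we cope" slides can be sketched as a canonical store plus indexes derived from it. This is a minimal sketch under stated assumptions: neither Solr nor 4store is called here, and two in-memory dictionaries stand in for them. The identifiers (`thing:1`, `thing:1/ROOT`, `thing:1/persona`) and the function names are hypothetical.

```python
from collections import defaultdict

# Canonical store: graph name -> list of (subject, predicate, object) triples.
# All identifiers and values below are made up for illustration.
CANONICAL = {
    "thing:1/ROOT": [("thing:1", "foaf:name", "Jane Doe")],
    "thing:1/persona": [("thing:1", "foaf:name", "George Maxwell")],
}


def build_text_index(store):
    """Word -> subjects whose values contain it (stand-in for Lucene/Solr)."""
    index = defaultdict(set)
    for triples in store.values():
        for subject, _predicate, value in triples:
            for word in value.lower().split():
                index[word].add(subject)
    return dict(index)


def build_quad_index(store):
    """(subject, predicate) -> {(object, graph)} (stand-in for a quadstore)."""
    index = defaultdict(set)
    for graph, triples in store.items():
        for subject, predicate, value in triples:
            index[(subject, predicate)].add((value, graph))
    return dict(index)


# Both indexes are derived data: they can be thrown away and rebuilt,
# or swapped for another technology, without touching CANONICAL.
text_index = build_text_index(CANONICAL)
quad_index = build_quad_index(CANONICAL)

print(text_index.get("maxwell"))
print(quad_index[("thing:1", "foaf:name")])
```

Because each index is a pure function of the canonical store, replacing one query technology with another means writing a new `build_*` function rather than migrating the data.
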
  10. The basic model (axioms)
      ● There are Things
      ● These Things can change over time
      ● These Things can hold information that is valid in certain contexts
      ● Not just valid at a point in time
      ● (Things can hold more than just metadata – it's a bag of “stuff”)
  11. Current Implementation
      ● Bag of “Stuff” → Object-based storage
          ● Fedora http://www.fedora-commons.org/, or
          ● Pairtree (FS-based), or
          ● 'Bucket' (Sun Honeycomb, OpenStorage, Amazon S3)
      ● Contains ROOT and MANIFEST (serialised graphs)
      ● ROOT contains the RDF triples that are globally true (identifiers, birth names, that sort of thing mainly)
      ● MANIFEST contains triples describing the other objects in the bag, and their relationship to other objects/resources in or out of the bag.
      ● Most importantly, the MANIFEST contains the context of the other parts of the bag.
  12. “Revisions, personas, etc”
      ● Things can have different information about them which is valid in different situations.
      ● A Publication Thing (like an article) can have revisions
      ● A Person can have personas (between 1995 and 2003 person X published under 'Dr Jones')
      ● All Things are allowed this capability, and the current implementation handles these using named graphs in the bag.
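
The bag layout (ROOT, MANIFEST, named graphs) and the persona example can be sketched together. This is an illustration under assumptions rather than the project's implementation: a plain dictionary stands in for the object store, the `ex:` property names, the part name `persona-1`, the birth name and the exact day boundaries are invented, and only the 'Dr Jones' persona between 1995 and 2003 comes from the slide.

```python
from datetime import date

# A bag for one Thing. ROOT holds triples that are always true; every other
# part is a named graph whose context is recorded in MANIFEST.
bag = {
    "ROOT": [("person:x", "ex:birthName", "Alex Jones")],       # hypothetical
    "persona-1": [("person:x", "ex:publishesAs", "Dr Jones")],
    "MANIFEST": {
        "persona-1": {
            "valid_from": date(1995, 1, 1),
            "valid_until": date(2004, 1, 1),  # exclusive, so 1995 to 2003
        },
    },
}


def view_at(bag, when):
    """Return ROOT triples plus those of every named graph valid on `when`."""
    triples = list(bag["ROOT"])
    for part, context in bag["MANIFEST"].items():
        if context["valid_from"] <= when < context["valid_until"]:
            triples.extend(bag[part])
    return triples


print(view_at(bag, date(2000, 6, 1)))   # includes the 'Dr Jones' persona
print(view_at(bag, date(2005, 1, 1)))   # ROOT only
```

Keeping the validity window in the MANIFEST, rather than inside the persona graph itself, mirrors the slide's point that the MANIFEST carries the context of the other parts of the bag.
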
  13. Current Implementation
      ● E.g. (from a MANIFEST describing a named graph)
      ....
      <rdf:Description rdf:about="info:fedora/ora:1/first">
        <ov:validUntil rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-09-20T09:31:45.847065</ov:validUntil>
        <ov:validFrom rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-09-19T09:31:45.846982</ov:validFrom>
        <rdf:type rdf:resource="http://www.w3.org/2004/03/trix/rdfg-1/Graph"/>
        <dc:format>application/rdf+xml</dc:format>
        <dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-09-21T09:31:45.848289</dcterms:created>
        <foaf:primaryTopic rdf:resource="info:fedora/ora:1"/>
      </rdf:Description>
  14. Current Implementation
      ● Some of the important context for the graph at “info:fedora/ora:1/first”:
      <ov:validUntil [..] >2009-09-20T09:31:45.847065</ov:validUntil>
      <ov:validFrom [..] >2009-09-19T09:31:45.846982</ov:validFrom>
      <rdf:type rdf:resource="http://www.w3.org/2004/03/trix/rdfg-1/Graph"/>
      <foaf:primaryTopic rdf:resource="info:fedora/ora:1"/>
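
To show how such a MANIFEST entry might be consumed, the sketch below reads the validity window of the named graph with Python's standard library and checks whether the graph applies at a given moment. Several things here are assumptions: the slides never show the namespace bound to `ov:`, so `http://open.vocab.org/terms/` is a guess; the `rdf:RDF` wrapper is added only so the fragment parses on its own; and the function names are hypothetical. A real deployment would more likely use a full RDF parser than an XML one.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ov": "http://open.vocab.org/terms/",  # assumed; not shown on the slides
}

# The validity properties from the slide, wrapped in rdf:RDF so they parse alone.
MANIFEST_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ov="http://open.vocab.org/terms/">
  <rdf:Description rdf:about="info:fedora/ora:1/first">
    <ov:validUntil rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-09-20T09:31:45.847065</ov:validUntil>
    <ov:validFrom rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-09-19T09:31:45.846982</ov:validFrom>
  </rdf:Description>
</rdf:RDF>
"""


def validity_windows(xml_text):
    """Map each described graph URI to its (validFrom, validUntil) datetimes."""
    windows = {}
    root = ET.fromstring(xml_text)
    for desc in root.findall("rdf:Description", NS):
        about = desc.get("{%s}about" % NS["rdf"])
        start = desc.findtext("ov:validFrom", namespaces=NS)
        end = desc.findtext("ov:validUntil", namespaces=NS)
        windows[about] = (
            datetime.fromisoformat(start),
            datetime.fromisoformat(end),
        )
    return windows


def graphs_valid_at(windows, moment):
    """List the graph URIs whose window contains `moment` (end is exclusive)."""
    return [uri for uri, (start, end) in windows.items() if start <= moment < end]


windows = validity_windows(MANIFEST_XML)
print(graphs_valid_at(windows, datetime(2009, 9, 19, 12, 0)))
print(graphs_valid_at(windows, datetime(2009, 9, 21)))
```

Treating `validUntil` as an exclusive bound is a design choice made for this sketch; the slides do not say whether the end of the window is inclusive.
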
  15. Contexts to think about
      Context is not restricted to serialised graphs!
      ● dcterms:source
      ● dcterms:creator
      ● foaf:depiction
      ● dcterms:subject
      ● Geo:*
      ● Evidence (who stated this assertion with what evidence?)
  16. The Dr Frankenstein approach
      ● “A little from here, a little from there – making sure that the whole works...”
      ● FOAF
      ● BIO
      ● BIBO, DCTerms and DC
      ● RES – Researcher ontology – Ann Bowtell, Katie
      ● SKOS and taxonomies like LCSH (Library of Congress subject headings)
      ● Hartig and Zhao's provenance ontology - http://purl.org/NET/provenance/guide
  17. The Dr Frankenstein approach
      Not forgetting Dr Frankenstein added bits and pieces of his own devising
      ● http://vocab.ox.ac.uk - a home for ontologies, taxonomies, software to be used with them, and information about them
      ● Activities are afoot to gather domain area taxonomies and to provide simple APIs to maintain these for normal researchers.
          ● Ehumanities
          ● Maths
          ● The HASSET thesaurus (we are in contact with them, but legal uncertainty on their part is holding things up)
  18. The CERIF question
      Bottom line is that CERIF is an interchange format, originally conceived to allow commercial management systems to interchange data en masse. It does have certain design flaws due to its relational database legacy and lowest-common-denominator approach. Unfortunately, I foresee many people saying 'we need a CERIF system' and contractors giving them just that: a system that uses an interchange format as its datastore format.
  19. The CERIF question
      CERIF will allow data to be shared with a similar system (IMHO it will be like sharing a SQL dump between versions of WordPress). Linked Data starts with the premise that it is sharing the data already with anyone with a web browser.