2010 10 provxg_datagovuk

Presentation on provenance use cases from data.gov.uk and introduction of opmv for the prov-xg meeting on 2010-Oct-08.

  1. The Open Provenance Model Vocabulary
     Jun Zhao, University of Oxford, Jun.zhao@zoo.ox.ac.uk
  2. Outline
       • Background about data.gov.uk
       • The use cases
           • XML serialization
           • Data transformation on the fly
           • Complex and nested processes
           • Provenance of non-digital artifacts
       • The Open Provenance Model Vocabulary (OPMV)
           • The rationale
           • An overview
           • Examples
       • Future work
       • Summary
  3. data.gov.uk
       • Linking UK government data
       • Aims:
           • Provide a set of best practices for government agencies
           • Provide the minimum set of tooling and specification to facilitate the publication and consumption of data
           • Encourage “responsible” data publishing
  4. [Diagram; no slide title survives in the transcript. Recoverable labels: XSLT Processor, XSLT Parameter, XSLT Stylesheet, XSLT Template, Binding, RDF File; input; output; "Downloaded from; Unzipped from, etc."; "Made accessible"; "Who, when, which version, how"]
     Contributed by Jeni Tennison

  5. On-the-fly Transformation
       • Who, when, which version, how
       • [Diagram labels: http://mytransportatio.db/j10; Data transformation wrapper]
     Contributed by Stuart Williams
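
A minimal sketch of the wrapper idea on this slide: a transformation is wrapped so that every call also reports who ran it, when, which version, and how. The decorator name `with_provenance`, the record keys, and the stand-in `to_upper` transformation are all assumptions made for illustration; the slide does not describe the actual wrapper's interface.

```python
from datetime import datetime, timezone
from functools import wraps


def with_provenance(agent, version):
    """Wrap a transformation so each call also returns a provenance record."""
    def decorator(transform):
        @wraps(transform)
        def wrapper(data):
            result = transform(data)
            record = {
                "who": agent,                                    # responsible agent
                "when": datetime.now(timezone.utc).isoformat(),  # time of the call
                "which_version": version,                        # version of the transformation
                "how": transform.__name__,                       # which transformation ran
            }
            return result, record
        return wrapper
    return decorator


# Hypothetical agent name and version, chosen only for the example.
@with_provenance(agent="example-service", version="0.1")
def to_upper(text):
    """Stand-in transformation; a real wrapper would convert source data to RDF."""
    return text.upper()


result, record = to_upper("j10")
print(result)
print(record["who"], record["which_version"], record["how"])
```

Because the record is produced at call time, data that is never stored can still carry a provenance description alongside each response.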

  6. Complex Data Creation Pipeline
     [Diagram; labels as far as they can be recovered]
       • Stages: GATE Pipeline, GateXMLRegressionTransformation, GateXMLRdfaTransformation, RdfaRdfXmlTransformation
       • Pipeline components: Document Reset PR, ANNIE English Tokeniser, ANNIE English Splitter, ANNIE POS Tagger, Data.gov.uk Morphological Analyzer, Data.gov.uk Flexible Roof Gazetteer, Data.gov.uk Generic Gazetteer, GATE Noun Phrase Chunker, Data.gov.uk Generic Transducer, TSO Coreference
     Courtesy of Paul Appleby from TSO (Data Enrichment Service)

  7. [Diagram: provenance of executions at two levels of detail]
       • Level 1: Provenance of execution at a higher level
       • Level 0: Provenance of execution at a detailed level
       • Node labels: services S1, S2, S3 ("Services used by executions"); processes p1, p2, p3, p4, p5, p21, p22; data items d1 to d6 ("An artifact", "A data collection")
       • Edge labels: accessedService, wasTriggeredBy, iterationOfProcess, hasParentProcess, followed, wasGeneratedBy, wasDerivedFrom
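
The two-level diagram can be read as a set of triples that are walked in two ways: back along wasDerivedFrom to find an artifact's ancestors, and up along hasParentProcess to move from detailed to higher-level processes. The sketch below assumes a plain tuple representation, and the specific edges are invented for illustration, since the wiring between the diagram's nodes cannot be recovered from the transcript.

```python
from collections import defaultdict

# Hypothetical edges: the identifiers mirror the diagram's labels, but the
# wiring between them is assumed for illustration only.
TRIPLES = [
    ("d2", "wasGeneratedBy", "p21"),
    ("d3", "wasGeneratedBy", "p22"),
    ("d3", "wasDerivedFrom", "d2"),
    ("d2", "wasDerivedFrom", "d1"),
    ("p21", "hasParentProcess", "p2"),
    ("p22", "hasParentProcess", "p2"),
]


def index(triples):
    """Group objects by (subject, predicate) for quick lookup."""
    table = defaultdict(list)
    for s, p, o in triples:
        table[(s, p)].append(o)
    return table


def ancestors(artifact, table):
    """Return every artifact reachable via wasDerivedFrom, nearest first."""
    seen = []
    queue = list(table[(artifact, "wasDerivedFrom")])
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(table[(current, "wasDerivedFrom")])
    return seen


def high_level_processes(artifact, table):
    """Lift an artifact's generating processes to their parent processes."""
    parents = set()
    for process in table[(artifact, "wasGeneratedBy")]:
        parents.update(table[(process, "hasParentProcess")])
    return parents


table = index(TRIPLES)
print(ancestors("d3", table))
print(high_level_processes("d3", table))
```

Keeping both levels in one structure lets a consumer choose the granularity at query time instead of at publication time.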
  8. Non-digital Data Objects
       • Organizations
       • Organizational structure changes over time
       • Origin organization, resulting organization
       • Boundary
       • Legislation
     An organization ontology: http://www.epimorphics.com/public/vocabulary/org.html

  9. The Challenges
       • Data of different representations, physical forms, and granularity
       • No tooling support
       • Provenance across different types of systems
       • Identification
       • Different terminologies
  10. The Gaps
       • A vocabulary able to describe the provenance of all types of data, from different systems
       • A vocabulary providing enough terms to describe provenance accurately
       • Guidance on creating and publishing provenance on the Web
       • Tool support for creating and publishing provenance on the Web
       • Provenance access
  11. The Open Provenance Model Vocabulary
       • Based on the Open Provenance Model
       • Enables “responsible” data publication, in order to trace the responsible agents and to reproduce results
       • Enables describing the provenance of any type of data
       • An alternative implementation of the OWL OPM Serialization
  12. The Rationale
       • Grounded upon existing Semantic Web (SW) technologies
           • Does not explicitly define a graph, OPMGraph
           • Named Graphs
           • Reuses existing vocabularies
       • Lightweight
           • 3 classes and 12 properties
           • Reuses 3 classes from the W3C Time Ontology
       • Easy to use and extend
  13. Overview of the Vocabulary
       • Defined as a vocabulary expressed using OWL
       • Implements the core concepts of the Open Provenance Model
       • No specific granularity prescribed
       • Partitioned into:
           • The Core Module
           • Other typed modules: common, xml, gate, sparql
  14. Overview of OPMV
      [Diagram; recoverable labels]
       • Classes: Agent, Artifact, Process, time:TemporalEntity, time:Interval, time:Instant
       • Properties: wasDerivedFrom, used, wasGeneratedBy, wasControlledBy, wasTriggeredBy, wasUsedAt, wasGeneratedAt, wasPerformedAt, wasStartedAt, wasEndedAt, withRespectOf
       • Legend: object properties implementing OPM; object properties not exactly as defined in OPM; rdfs:subClassOf relationships
       • Note: prefix time: http://www.w3.org/2006/time#

  15. The When and Who of an Artifact
          _:d0
              rdf:type opmv:Artifact ;
              opmv:wasGeneratedAt _:t0 ;
              opmv:wasGeneratedBy [
                  rdf:type opmv:Process ;
                  opmv:wasPerformedBy _:p0
              ] .
          _:t0
              rdf:type time:Instant ;
              time:inXSDDateTime "2010-10-07T12:09:00Z"^^xsd:dateTime .
          _:p0
              rdf:type opmv:Agent, foaf:Agent .
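
The same "when and who" pattern can be produced programmatically. The following is a rough sketch that builds the statements as plain tuples and prints them one per line; the helper names `describe_artifact` and `to_lines` and the derived identifiers (`_:d0_time`, `_:d0_process`) are assumptions for illustration, not part of OPMV, and a real publisher would more likely use an RDF library.

```python
from datetime import datetime, timezone


def describe_artifact(artifact, agent, generated_at):
    """Return (s, p, o) triples recording when and by whom an artifact was made."""
    instant = artifact + "_time"        # hypothetical naming scheme
    process = artifact + "_process"     # hypothetical naming scheme
    stamp = generated_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        (artifact, "rdf:type", "opmv:Artifact"),
        (artifact, "opmv:wasGeneratedAt", instant),
        (artifact, "opmv:wasGeneratedBy", process),
        (process, "rdf:type", "opmv:Process"),
        (process, "opmv:wasPerformedBy", agent),
        (instant, "rdf:type", "time:Instant"),
        (instant, "time:inXSDDateTime", f'"{stamp}"^^xsd:dateTime'),
        (agent, "rdf:type", "opmv:Agent"),
    ]


def to_lines(triples):
    """Render one statement per line; prefix declarations are left to the caller."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = describe_artifact(
    "_:d0", "_:p0", datetime(2010, 10, 7, 12, 9, tzinfo=timezone.utc)
)
print(to_lines(triples))
```

Normalising the timestamp to UTC before formatting keeps the literal in the same form as the slide's example value.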
  16. The Creation of an Artifact (PC 3)
          pc1:p5 rdf:type opmv:Process ;
              rdfs:label "Reslice 1" .
          pc1:a3 rdf:type opmv:Artifact ;
              opmv:wasGeneratedBy [
                  rdf:type opmv:Process ;
                  opmv:used pc1:p1 ;
                  opmv:wasPerformedAt [
                      rdf:type time:Interval ;
                      time:hasBeginning [
                          a time:Instant ;
                          time:inXSDDateTime "{PROCESS START TIME}"^^xsd:dateTime
                      ] ;
                      time:hasEnd [
                          a time:Instant ;
                          time:inXSDDateTime "{PROCESS END TIME}"^^xsd:dateTime
                      ]
                  ]
              ] .
  17. The Provenance of an Organization
          @prefix org: <http://www.w3.org/ns/org#> .
          eg:org1 rdf:type org:Organization, opmv:Artifact ;
              org:resultedFrom [                        ### subPropertyOf opmv:wasGeneratedBy
                  rdf:type org:ChangeEvent, opmv:Process ;
                  org:originalOrganization eg:org0 ;    ### subPropertyOf opmv:used
              ] .
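
The comments on this slide state that org:resultedFrom and org:originalOrganization are sub-properties of opmv:wasGeneratedBy and opmv:used. A consumer that only understands OPMV can recover the provenance statements by applying RDFS sub-property entailment. The sketch below does this over plain tuples; the function name `entail` and the blank-node label `_:change` are assumptions for illustration.

```python
# Sub-property declarations taken from the comments on the slide.
SUBPROPERTY_OF = {
    "org:resultedFrom": "opmv:wasGeneratedBy",
    "org:originalOrganization": "opmv:used",
}

# The organization example as triples; "_:change" stands in for the
# anonymous change event and is a made-up label.
TRIPLES = {
    ("eg:org1", "org:resultedFrom", "_:change"),
    ("_:change", "org:originalOrganization", "eg:org0"),
}


def entail(triples, subproperty_of):
    """Add every triple implied by the given sub-property declarations."""
    inferred = set(triples)
    frontier = list(triples)
    while frontier:
        s, p, o = frontier.pop()
        parent = subproperty_of.get(p)
        if parent is not None and (s, parent, o) not in inferred:
            inferred.add((s, parent, o))
            # Re-queue so chains of sub-properties are followed to the top.
            frontier.append((s, parent, o))
    return inferred


closure = entail(TRIPLES, SUBPROPERTY_OF)
for triple in sorted(closure):
    print(triple)
```

This is the mechanism that lets a domain vocabulary keep its own terms while still being queryable as generic provenance.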
  18. Using Named Graphs for OPM Accounts
          pc1:gr_273 {
              pc1:p5 rdf:type opmv:Process ;
                  rdfs:label "Reslice 1" .
              pc1:a3 rdf:type opmv:Artifact ;
                  opmv:wasGeneratedBy [
                      rdf:type opmv:Process ;
                      opmv:used pc1:p1
                  ] .
              pc1:p1 rdf:type opmv:Artifact .
          }
          pc1:gr_273 rdf:type <http://www.w3.org/2004/03/trix/rdfg-1/Graph> .
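
One way to picture named graphs acting as OPM accounts is to store each statement as a quad whose fourth element names the graph, then select or compare accounts by that name. This is a simplified sketch over plain tuples: the second account `ex:other_account`, the blank-node label `_:b0`, and both helper functions are hypothetical and do not appear on the slide.

```python
QUADS = [
    ("pc1:p5", "rdf:type", "opmv:Process", "pc1:gr_273"),
    ("pc1:a3", "rdf:type", "opmv:Artifact", "pc1:gr_273"),
    ("pc1:a3", "opmv:wasGeneratedBy", "_:b0", "pc1:gr_273"),
    ("_:b0", "opmv:used", "pc1:p1", "pc1:gr_273"),
    # Hypothetical second account, added only to show the separation.
    ("pc1:a3", "opmv:wasDerivedFrom", "pc1:p1", "ex:other_account"),
]


def account(quads, graph):
    """Return the triples asserted inside one named graph (one account)."""
    return {(s, p, o) for s, p, o, g in quads if g == graph}


def accounts_for(quads, subject):
    """List the accounts that say anything about the given subject."""
    return sorted({g for s, _, _, g in quads if s == subject})


print(len(account(QUADS, "pc1:gr_273")))
print(accounts_for(QUADS, "pc1:a3"))
```

Membership is carried by the graph name alone, which matches the later observation that this approach gives no explicit semantics for graph membership.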
  19. Comparison with OPM OWL
       • A more intuitive OWL ontology and RDF representation
       • Takes full advantage of SW technologies
       • Lack of explicit semantics for graph membership
       • Less expressivity, e.g. no cardinality constraints
  20. Future Development
       • More typed modules
       • A guide on how to publish provenance
           • Where and how much
           • What is the minimum provenance
           • How to represent the information
  21. Summary
       • The vocabulary is well accepted by the data.gov.uk team, who find it easy to understand
       • Experimental adoption, not yet large-scale production
       • Guidance is missing on what provenance information should be created and published, and how
       • Lack of ideas about how provenance information will be used
  22. This work was created by Jun Zhao and is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)
