2010 10 provxg_datagovuk
Presentation on provenance use cases from data.gov.uk and an introduction to OPMV, given at the prov-xg meeting on 2010-Oct-08.

Presentation Transcript

  • The Open Provenance Model Vocabulary. Jun Zhao, University of Oxford, Jun.zhao@zoo.ox.ac.uk
  • Outline
      • Background about data.gov.uk
      • The use cases
          • XML serialization
          • Data transformation on the fly
          • Complex and nested processes
          • Provenance of non-digital artifacts
      • The Open Provenance Model Vocabulary (OPMV)
          • The rationale
          • An overview
          • Examples
      • Future work
      • Summary
  • data.gov.uk
      • Linking UK government data
      • Aims:
          • Provide a set of best practices for government agencies
          • Provide the minimum set of tooling and specification to facilitate the publication and consumption of data
          • Encourage “responsible” data publishing
  • [Figure: XSLT-based transformation to RDF. Labels: downloaded from; unzipped from, etc.; input; output; made accessible; XSLT Processor; XSLT Parameter; RDF File; Binding; XSLT Stylesheet; XSLT Template; "Who, when, which version, how". Contributed by Jeni Tennison.]
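The labels in this figure (input, output, who) map fairly directly onto OPMV core terms. The following is a minimal Python sketch, not taken from the slides, that records one XSLT run as plain triples. The `http://example.org/` namespace and every resource name under it are hypothetical, and the OPMV namespace URI should be checked against the vocabulary's own documentation.

```python
# Namespace URIs. EX is a made-up namespace used only for illustration.
OPMV = "http://purl.org/net/opmv/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"


def describe_xslt_run(process, source, stylesheet, output, agent):
    """Return (subject, predicate, object) triples for one transformation run."""
    return [
        (process, RDF_TYPE, OPMV + "Process"),
        (process, OPMV + "used", source),            # input document
        (process, OPMV + "used", stylesheet),        # XSLT stylesheet
        (process, OPMV + "wasControlledBy", agent),  # the "who"
        (output, RDF_TYPE, OPMV + "Artifact"),
        (output, OPMV + "wasGeneratedBy", process),
        (output, OPMV + "wasDerivedFrom", source),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = describe_xslt_run(
    process=EX + "process/xslt-run-1",
    source=EX + "data/source.xml",
    stylesheet=EX + "xslt/to-rdf.xsl",
    output=EX + "data/output.rdf",
    agent=EX + "agent/publisher",
)
print(to_ntriples(triples))
```

The sketch leaves out the "when" and "which version" parts of the figure; those would need time and version terms that the figure does not spell out.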

  • On-the-fly Transformation [Figure: a data transformation wrapper in front of http://mytransportatio.db/j10, annotated "Who, when, which version, how". Contributed by Stuart Williams.]
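One way to read the "data transformation wrapper" idea is as a function wrapper that emits a provenance record every time a transformation runs. The sketch below is an assumption about how such a wrapper could look, not the data.gov.uk implementation; `with_provenance`, `to_upper`, the agent URI, and the version string are all hypothetical.

```python
from datetime import datetime, timezone


def with_provenance(transform, agent, version):
    """Wrap a transformation so each call also returns a provenance record."""
    def wrapper(data):
        started = datetime.now(timezone.utc)
        result = transform(data)
        record = {
            "who": agent,                    # responsible agent
            "when": started.isoformat(),     # time the transformation started
            "which_version": version,        # version of the transformation
            "how": transform.__name__,       # which transformation was applied
        }
        return result, record
    return wrapper


def to_upper(rows):
    """Stand-in for a real on-the-fly data transformation."""
    return [row.upper() for row in rows]


wrapped = with_provenance(
    to_upper, agent="http://example.org/agent/alice", version="0.1"
)
result, record = wrapped(["a10", "j10"])
print(result, record)
```

Returning the record alongside the result keeps the wrapper stateless; a real service might instead publish the record next to the transformed data.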

  • Complex Data Creation Pipeline [Figure: pipeline diagram. Labels: GATE Pipeline; Document Reset PR; ANNIE English Tokeniser; ANNIE English Splitter; ANNIE POS Tagger; Data.gov.uk Morphological Analyzer; Data.gov.uk Flexible Roof Gazetteer; Data.gov.uk Generic Gazetteer; GATE Noun Phrase Chunker; Data.gov.uk Generic Transducer; TSO Coreference; GateXMLRegressionTransformation; GateXMLRdfaTransformation; RdfaRdfXmlTransformation. Courtesy of Paul Appleby from TSO (Data Enrichment Service).]

  • [Figure: provenance of executions at two levels. Services used by executions: S1, S2, S3 (accessedService). Level 1, provenance of execution at a higher level: p1, p2, p3 (wasTriggeredBy, iterationOfProcess). Level 0, provenance of execution at a detailed level: p4, p5, p21, p22 (hasParentProcess, followed). Artifacts: d1 (a data collection), d2, d3, d4, d5, d6 (an artifact) (wasGeneratedBy, wasDerivedFrom).]
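The two levels in this figure suggest that detailed provenance can be summarized by following hasParentProcess links upward. The sketch below illustrates that idea in Python; the specific edges are hypothetical, because the figure's actual connections cannot be recovered from the transcript, and `roll_up` is a made-up helper name.

```python
def roll_up(generated_by, parent_of):
    """Map each artifact to the top-level process containing its generator.

    generated_by: artifact -> process (wasGeneratedBy)
    parent_of:    process  -> parent process (hasParentProcess)
    """
    summary = {}
    for artifact, process in generated_by.items():
        seen = set()
        # Climb the parent chain; the 'seen' set guards against cycles.
        while process in parent_of and process not in seen:
            seen.add(process)
            process = parent_of[process]
        summary[artifact] = process
    return summary


# Hypothetical edges, using node names that appear in the figure.
generated_by = {"d2": "p21", "d3": "p22", "d5": "p4"}
parent_of = {"p21": "p2", "p22": "p2"}

summary = roll_up(generated_by, parent_of)
print(summary)
```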
  • Non-digital Data Objects
      • Organizations
          • Organizational structure changes over time
          • Origin organization, resulting organization
      • Boundary
      • Legislation
      • An organization ontology: http://www.epimorphics.com/public/vocabulary/org.html

  • The Challenges
      • Data of different representations, physical forms, and granularity
      • No tooling support
      • Provenance across different types of systems
      • Identification
      • Different terminologies
  • The Gaps
      • A vocabulary able to describe the provenance of all types of data, from different systems
      • A vocabulary providing enough terms to describe provenance accurately
      • Guidance on creating and publishing provenance on the Web
      • Tool support for creating and publishing provenance on the Web
      • Provenance access
  • The Open Provenance Model Vocabulary
      • Based on the Open Provenance Model
      • Enables “responsible” data publication, so that responsible agents can be traced and results reproduced
      • Can describe the provenance of any type of data
      • An alternative implementation of the OWL OPM Serialization
  • The Rationale
      • Grounded upon existing SW technologies
          • Does not explicitly define a graph, OPMGraph
          • Named Graphs
          • Reuse existing vocabularies
      • Lightweight
          • 3 classes and 12 properties
          • Reuses 3 classes from the W3C Time Ontology
      • Easy to use and extend
  • Overview of the Vocabulary
      • Defined as a vocabulary expressed using OWL
      • Implements the core concepts of the Open Provenance Model
      • No specific granularity prescribed
      • Partitioned into:
          • The Core Module
          • Other typed modules: common, xml, gate, sparql
  • Overview of OPMV [Figure: class and property diagram. Classes: Agent, Artifact, Process, time:TemporalEntity, time:Interval, time:Instant. Properties: wasDerivedFrom, wasUsedAt, wasGeneratedAt, wasControlledBy, used, wasGeneratedBy, wasPerformedAt, wasTriggeredBy, wasStartedAt, wasEndedAt, withRespectOf. Legend: object properties implementing OPM; object properties not exactly as defined in OPM; rdfs:subClassOf relationships. Note 1: prefix time: http://www.w3.org/2006/time#]

  • The When and Who of an Artifact
        _:d0 rdf:type opmv:Artifact ;
            opmv:wasGeneratedAt _:t0 ;
            opmv:wasGeneratedBy [
                rdf:type opmv:Process ;
                opmv:wasPerformedBy _:p0
            ] .
        _:t0 rdf:type time:Instant ;
            time:inXSDDateTime "2010-10-07T12:09:00Z"^^xsd:dateTime .
        _:p0 rdf:type opmv:Agent, foaf:Agent .
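To show how the "when" and "who" could be read back from the triples on this slide, here is a small Python sketch that mirrors them as tuples and follows the two property paths. The label `_:proc0` is a hypothetical name for the anonymous process node in the Turtle, and the helper names are made up for illustration; a real application would more likely use an RDF library and SPARQL.

```python
# The slide's triples as (subject, predicate, object) tuples.
# "_:proc0" stands in for the anonymous opmv:Process node.
triples = [
    ("_:d0", "rdf:type", "opmv:Artifact"),
    ("_:d0", "opmv:wasGeneratedAt", "_:t0"),
    ("_:d0", "opmv:wasGeneratedBy", "_:proc0"),
    ("_:proc0", "rdf:type", "opmv:Process"),
    ("_:proc0", "opmv:wasPerformedBy", "_:p0"),
    ("_:t0", "rdf:type", "time:Instant"),
    ("_:t0", "time:inXSDDateTime", "2010-10-07T12:09:00Z"),
    ("_:p0", "rdf:type", "opmv:Agent"),
    ("_:p0", "rdf:type", "foaf:Agent"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def when_and_who(triples, artifact):
    """Follow wasGeneratedAt and wasGeneratedBy/wasPerformedBy from an artifact."""
    times = [
        t
        for instant in objects(triples, artifact, "opmv:wasGeneratedAt")
        for t in objects(triples, instant, "time:inXSDDateTime")
    ]
    agents = [
        a
        for process in objects(triples, artifact, "opmv:wasGeneratedBy")
        for a in objects(triples, process, "opmv:wasPerformedBy")
    ]
    return times, agents


times, agents = when_and_who(triples, "_:d0")
print(times, agents)
```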
  • The Creation of an Artifact (PC 3)
        pc1:p5 rdf:type opmv:Process ;
            rdfs:label "Reslice 1" .
        pc1:a3 rdf:type opmv:Artifact ;
            opmv:wasGeneratedBy [
                rdf:type opmv:Process ;
                opmv:used pc1:p1 ;
                opmv:wasPerformedAt [
                    rdf:type time:Interval ;
                    time:hasBeginning [
                        a time:Instant ;
                        time:inXSDDateTime "{PROCESS START TIME}"^^xsd:dateTime
                    ] ;
                    time:hasEnd [
                        a time:Instant ;
                        time:inXSDDateTime "{PROCESS END TIME}"^^xsd:dateTime
                    ]
                ]
            ] .
  • The Provenance of an Organization
        @prefix org: <http://www.w3.org/ns/org#> .
        eg:org1 rdf:type org:Organization, opmv:Artifact ;
            org:resultedFrom [                      ### subPropertyOf opmv:wasGeneratedBy
                rdf:type org:ChangeEvent, opmv:Process ;
                org:originalOrganization eg:org0    ### subPropertyOf opmv:used
            ] .
  • Using Named Graphs for OPM Accounts
        pc1:gr_273 {
            pc1:p5 rdf:type opmv:Process ;
                rdfs:label "Reslice 1" .
            pc1:a3 rdf:type opmv:Artifact ;
                opmv:wasGeneratedBy [
                    rdf:type opmv:Process ;
                    opmv:used pc1:p1
                ] .
            pc1:p1 rdf:type opmv:Artifact .
        }
        pc1:gr_273 rdf:type <http://www.w3.org/2004/03/trix/rdfg-1/Graph> .
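Treating each named graph as an OPM account amounts to attaching a graph name to every triple and grouping by it. The Python sketch below illustrates this with quads; the second graph name `pc1:gr_999` is hypothetical and is added only to show two accounts describing the same artifact, and the anonymous process node from the slide is omitted for brevity.

```python
from collections import defaultdict

# (subject, predicate, object, graph) quads.
quads = [
    ("pc1:p5", "rdf:type", "opmv:Process", "pc1:gr_273"),
    ("pc1:p5", "rdfs:label", '"Reslice 1"', "pc1:gr_273"),
    ("pc1:a3", "rdf:type", "opmv:Artifact", "pc1:gr_273"),
    ("pc1:p1", "rdf:type", "opmv:Artifact", "pc1:gr_273"),
    # Hypothetical second account that also mentions pc1:a3.
    ("pc1:a3", "rdf:type", "opmv:Artifact", "pc1:gr_999"),
]


def group_by_account(quads):
    """Group triples by the named graph (account) they belong to."""
    accounts = defaultdict(list)
    for s, p, o, g in quads:
        accounts[g].append((s, p, o))
    return dict(accounts)


def accounts_mentioning(quads, resource):
    """Return the sorted names of accounts whose triples have this subject."""
    return sorted({g for s, _, _, g in quads if s == resource})


accounts = group_by_account(quads)
print(accounts_mentioning(quads, "pc1:a3"))
```

Because membership is carried by the graph name rather than by explicit statements, no separate OPMGraph class is needed, which matches the rationale given earlier in the slides.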
  • Comparison with OPM OWL
      • A more intuitive OWL ontology and RDF representation
      • Takes full advantage of SW technologies
      • Lacks explicit semantics for graph membership
      • Less expressive, e.g. no cardinality constraints
  • Future Development
      • More typed modules
      • A guide on how to publish provenance
          • Where and how much
          • What is the minimum provenance
          • How to represent the information
  • Summary
      • The vocabulary is well accepted and easy to understand for the data.gov.uk team
      • Experimental adoption, not yet large-scale production
      • Guidance is still missing on what provenance information should be created and published, and how
      • Few ideas so far about how provenance information will be used
  • This work is created by Jun Zhao and licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)