Provenance Requirements for the Next Version of RDF

Jun Zhao, University of Oxford
Christian Bizer, Freie Universität Berlin
Yolanda Gil, ISI, University of Southern California
Paolo Missier, University of Manchester
Satya Sahoo, Kno.e.sis Center, Wright State University

W3C Provenance Incubator Group
Outline
●   Background
●   Gathering provenance requirements
    ●   From both user and technical perspectives
    ●   From three dimensions: content, management and use
●   Requirements for RDF
●   Further information
The W3C Provenance Incubator Group
●   http://www.w3.org/2005/Incubator/prov/
●   Formed in September 2009 as part of the W3C
    Semantic Web Activity
●   Aims to provide
    ●   A state-of-the-art understanding, and
    ●   A roadmap in the area of provenance for Semantic Web
        technologies, development, and possible
        standardization
A Definition of Web Provenance

The initial sources of information used as well
as any entity and process involved in
producing a data item

The data can be any web resource: a document,
an image, a dataset, an RDF statement or a set
of RDF graphs, ....
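The definition names three kinds of provenance information: initial sources, involved entities, and processes. The sketch below shows how such a record could be flattened into RDF-style triples about the data item. It assumes a hypothetical `ProvenanceRecord` class and a placeholder `http://example.org/prov#` namespace; neither comes from the group's work.

```python
from dataclasses import dataclass, field

# Hypothetical namespace for illustration only; not a W3C vocabulary.
EX = "http://example.org/prov#"


@dataclass
class ProvenanceRecord:
    """One provenance record, following the three parts of the definition."""
    data_item: str                                 # the web resource described
    sources: list = field(default_factory=list)    # initial sources of information
    entities: list = field(default_factory=list)   # entities involved (people, tools)
    processes: list = field(default_factory=list)  # processes that produced the item

    def to_triples(self):
        """Flatten the record into (subject, predicate, object) tuples."""
        triples = [(self.data_item, EX + "source", s) for s in self.sources]
        triples += [(self.data_item, EX + "involvedEntity", e) for e in self.entities]
        triples += [(self.data_item, EX + "producedBy", p) for p in self.processes]
        return triples


record = ProvenanceRecord(
    "http://example.org/dataset/1",
    sources=["http://example.org/survey/raw"],
    processes=["http://example.org/process/cleaning"],
)
triples = record.to_triples()
```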
The Importance of Provenance
The Key Idea
●   We require additional capabilities that the current
    standard RDF model does not offer
    ●   Identity management of RDF statements
    ●   Annotation framework
    ●   ....
●   For interoperability we require standardized
    vocabularies and best practices for provenance
    descriptions
Where do our requirements come from?
Activities So Far
●   Collected more than 30 provenance use cases
●   Defined provenance dimensions
    ●   Content: attribution, evolution, process, entailment, etc.
    ●   Management: publication, access, scalability, etc.
    ●   Use: interoperability, trust, understanding, debugging, etc.
    ●   http://www.w3.org/2005/Incubator/prov/wiki/Provenance_Dimensions
●   Wrote a provenance requirements document
    ●   Three flagship use cases
    ●   http://www.w3.org/2005/Incubator/prov/wiki/User_Requirements
What are the requirements?
Requirements from Provenance Content: What Needs to Be Represented
Requirement 1: Identity
●   Ability to refer to the resource being described
    ●   An area of an image, an RDF graph, a set of RDF graphs...
●   Resolving equality: recognizing when different identifiers
    refer to the same resource
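Requirement 1 can be illustrated with named graphs, which are listed under State of the Art. A fourth element on each triple names the set of triples it belongs to, and provenance is then stated about that name. The sketch below is a stdlib-only model, not an RDF store. The graph URIs, the `urn:sha256:` identifier scheme and the helper names are hypothetical. The `statement_id` helper shows one way to give a single statement a content-derived identity. It only works when the triple contains no blank nodes or literals.

```python
import hashlib

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
G1 = "http://example.org/graph/g1"      # hypothetical graph name
META = "http://example.org/graph/meta"  # hypothetical graph holding provenance

# Quads: (subject, predicate, object, graph). The graph name gives the set of
# triples an identity, so provenance can be stated about it in another graph.
quads = [
    ("http://example.org/alice", FOAF_KNOWS, "http://example.org/bob", G1),
    ("http://example.org/bob", FOAF_KNOWS, "http://example.org/carol", G1),
    (G1, DCT_CREATOR, "http://example.org/publisherA", META),
]


def triples_in(graph, store):
    """All triples that belong to the named graph `graph`."""
    return {(s, p, o) for (s, p, o, g) in store if g == graph}


def statement_id(triple):
    """Content-derived identifier for one triple.

    Assumes IRIs only (no blank nodes or literals), so the N-Triples line is
    already canonical. The 'urn:sha256:' prefix is a hypothetical scheme.
    """
    s, p, o = triple
    line = f"<{s}> <{p}> <{o}> ."
    return "urn:sha256:" + hashlib.sha256(line.encode("utf-8")).hexdigest()
```

Because `statement_id` depends only on content, two publishers hashing the same IRI-only triple obtain the same identifier. That is one narrow answer to resolving equality. It does not help when different IRIs denote the same resource.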
Requirement 2: Evolution
●   How different versions are related
●   What transformations were applied
●   Best practices for minting new URIs
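The three points of Requirement 2 can be tied together in a version chain. Every version gets a freshly minted URI, a link to the version it came from, and a description of the transformation applied. The following is a minimal sketch. It assumes a hypothetical `{base}/v{n}` minting pattern and a single resource per `history` dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    uri: str
    previous: Optional[str]  # URI of the version this one was derived from
    transformation: str      # what was done to produce this version


def mint_version_uri(base: str, n: int) -> str:
    """Hypothetical minting pattern: a new, never-reused URI per version."""
    return f"{base}/v{n}"


def new_version(history: dict, base: str, transformation: str) -> Version:
    """Mint the next version of `base` and record how it relates to the last one."""
    n = len(history) + 1
    previous = mint_version_uri(base, n - 1) if n > 1 else None
    version = Version(mint_version_uri(base, n), previous, transformation)
    history[version.uri] = version
    return version


def lineage(history: dict, uri: str) -> list:
    """Walk back from `uri` to the first version, newest first."""
    chain = []
    current = uri
    while current is not None:
        v = history[current]
        chain.append((v.uri, v.transformation))
        current = v.previous
    return chain


history = {}
base = "http://example.org/dataset/census"
new_version(history, base, "initial import")
new_version(history, base, "removed duplicate rows")
latest = new_version(history, base, "converted units to metric")
```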
Requirements from Provenance Content (Cont.)

Requirement 3: Entailment
●   Represent the distinction between asserted and inferred
    provenance
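One way to keep asserted and inferred statements apart is to record two things for every triple: whether it was asserted, and, if it was inferred, which premises produced it. The sketch below applies only one RDFS rule, "x rdf:type C and C rdfs:subClassOf D imply x rdf:type D", and repeats it until nothing new is derived. The `example.org` resources are hypothetical, and a real reasoner covers many more rules.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def infer_types(asserted):
    """Apply one RDFS rule to a fixpoint and label every triple.

    Returns {triple: (status, premises)} where status is "asserted" or
    "inferred" and premises are the two triples an inference came from.
    """
    prov = {t: ("asserted", ()) for t in asserted}
    changed = True
    while changed:
        changed = False
        known = list(prov)  # snapshot; new triples are picked up next round
        for (x, p, c) in known:
            if p != RDF_TYPE:
                continue
            for (c2, p2, d) in known:
                if p2 == SUBCLASS and c2 == c:
                    new = (x, RDF_TYPE, d)
                    if new not in prov:
                        prov[new] = ("inferred", ((x, p, c), (c2, p2, d)))
                        changed = True
    return prov


EX = "http://example.org/"
facts = {
    (EX + "rex", RDF_TYPE, EX + "Dog"),
    (EX + "Dog", SUBCLASS, EX + "Mammal"),
    (EX + "Mammal", SUBCLASS, EX + "Animal"),
}
prov = infer_types(facts)
```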
Requirements from Provenance Management
Requirement 4: Publication
●   Linking provenance assertions to the resource
    ●   How to publish provenance: embed it or link to it?
●   Associating the publisher’s identity (e.g., via a digital
    signature)
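The embed-or-link question can be made concrete with N-Quads lines. In the sketch below, embedding places data and provenance in separate named graphs of one document. Linking publishes only the data plus one triple that points at a separate provenance document. The `hasProvenance` property and all `example.org` URIs are hypothetical. The `fingerprint` helper is only a SHA-256 digest over sorted lines, standing in for what a publisher's digital signature would cover. An actual signature needs public-key cryptography and proper graph canonicalization, and blank nodes are out of scope here.

```python
import hashlib

HAS_PROV = "http://example.org/vocab#hasProvenance"  # hypothetical linking property


def nquad(s, p, o, g=None):
    """Serialize one IRI-only statement; literals and blank nodes are out of scope."""
    graph = f" <{g}>" if g else ""
    return f"<{s}> <{p}> <{o}>{graph} ."


def embed(data, prov, data_graph, prov_graph):
    """One document carrying data and provenance in two named graphs."""
    return [nquad(*t, data_graph) for t in data] + [nquad(*t, prov_graph) for t in prov]


def link(data, resource, prov_doc):
    """Data only, plus one triple pointing at a separately published provenance document."""
    return [nquad(*t) for t in data] + [nquad(resource, HAS_PROV, prov_doc)]


def fingerprint(lines):
    """SHA-256 over sorted lines; a stand-in for what a signature would cover."""
    canonical = "\n".join(sorted(lines)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


data = [("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob")]
prov = [("http://example.org/graph/data", "http://purl.org/dc/terms/creator", "http://example.org/publisherA")]
embedded = embed(data, prov, "http://example.org/graph/data", "http://example.org/graph/prov")
linked = link(data, "http://example.org/graph/data", "http://example.org/prov.nq")
```

Sorting the lines makes the digest independent of serialization order, which is the minimum a signed provenance document would need.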
Requirement 5: Querying
●   Query formulation: a query may mix references to the
    resource and to its provenance
●   Efficient query execution
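Here is an example of a query that mixes the resource and its provenance: whom does Alice know, according to graphs created by a given publisher? The sketch below answers it over an in-memory list of quads. The publisher URIs and the `meta` graph are hypothetical. In SPARQL, the two parts would typically appear as separate `GRAPH` patterns.

```python
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
META = "http://example.org/graph/meta"  # hypothetical graph holding provenance

quads = [
    ("http://example.org/alice", FOAF_KNOWS, "http://example.org/bob", "http://example.org/graph/g1"),
    ("http://example.org/alice", FOAF_KNOWS, "http://example.org/carol", "http://example.org/graph/g2"),
    ("http://example.org/graph/g1", DCT_CREATOR, "http://example.org/publisherA", META),
    ("http://example.org/graph/g2", DCT_CREATOR, "http://example.org/publisherB", META),
]


def known_by(person, publisher, store):
    """People `person` knows, counting only graphs created by `publisher`."""
    # Provenance part: which graphs did this publisher create?
    graphs = {s for (s, p, o, g) in store if g == META and p == DCT_CREATOR and o == publisher}
    # Resource part: foaf:knows statements, restricted to those graphs.
    return sorted(o for (s, p, o, g) in store if s == person and p == FOAF_KNOWS and g in graphs)
```

Both scans are linear in the size of the store. Efficient execution would need indexes, for example from graph name to triples, which is the kind of concern the second bullet of Requirement 5 raises.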
Requirements from Provenance Use
●   No requirements were uncovered
State of the Art
●   Extensions of, and alternatives to, the RDF model
    ●   RDF reification
        –   Querying is cumbersome
        –   Others ...
    ●   Named Graphs
    ●   OWL annotations
    ●   RDF molecules, Temporal RDF, PaCE Model ....
    http://www.w3.org/2005/Incubator/prov/wiki/Relevant_Technologies
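The cost of RDF reification shows up clearly in code. Reifying one statement with the standard `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms yields four triples. Finding who asserted the statement means matching all three of its parts before reaching the provenance triple. A minimal sketch, with a hypothetical statement URI and publisher:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT_CREATOR = "http://purl.org/dc/terms/creator"


def reify(stmt_id, triple):
    """Describe `triple` with the RDF reification vocabulary (four triples)."""
    s, p, o = triple
    return [
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
    ]


def creators_via_reification(triple, graph):
    """Who created the statement? Needs a three-way match on the statement node first."""
    s, p, o = triple
    nodes = {x for (x, pp, oo) in graph if pp == RDF + "subject" and oo == s}
    nodes &= {x for (x, pp, oo) in graph if pp == RDF + "predicate" and oo == p}
    nodes &= {x for (x, pp, oo) in graph if pp == RDF + "object" and oo == o}
    return {oo for (x, pp, oo) in graph if x in nodes and pp == DCT_CREATOR}


t = ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob")
stmt = "http://example.org/stmt/1"  # hypothetical statement URI (often a blank node)
g = reify(stmt, t) + [(stmt, DCT_CREATOR, "http://example.org/publisherA")]
```

The reified description does not assert the original triple itself. With named graphs, the same question reduces to a single lookup on the graph name.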
State of the Art (Cont.)
●   Vocabularies/ontologies to express provenance
    information
    ●   The Open Provenance Model (OPM)
    ●   Inference Web - Proof Markup Language (PML)
    ●   The Provenance Vocabulary
    ●   Dublin Core
    ●   Open Archives Initiative - Object Reuse and Exchange (OAI-ORE)
    ●   Semantic Web Publishing Vocabulary
    ●   The SWAN-SIOC alignment
    ●   The Changeset Vocabulary
    ●   .......
    http://www.w3.org/2005/Incubator/prov/wiki/Relevant_Technologies
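As a small example of describing provenance with an existing vocabulary, the sketch below writes three Dublin Core terms (`dcterms:creator`, `dcterms:source`, `dcterms:created`) about a named graph as N-Triples lines. The resource URIs and the date are example values only. These three terms cover attribution and derivation, but they say nothing about the processes involved.

```python
DCT = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def dc_provenance(resource, creator, source, created):
    """N-Triples lines stating who created `resource`, what it derives from, and when.

    Assumes IRI arguments and a plain date string that needs no escaping.
    """
    return [
        f"<{resource}> <{DCT}creator> <{creator}> .",
        f"<{resource}> <{DCT}source> <{source}> .",
        f'<{resource}> <{DCT}created> "{created}"^^<{XSD_DATE}> .',
    ]


lines = dc_provenance(
    "http://example.org/graph/g1",
    "http://example.org/people/alice",
    "http://example.org/survey/raw",
    "2010-06-01",
)
```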
Provenance Requirements for the RDF Community
●   Identification
    ●   Of any artifact, be it a resource, a single RDF statement,
        a set of RDF statements or Web resources
    ●   Identity management
●   Annotations of RDF graphs
●   Standardized schemata, ontologies and vocabularies
Ongoing Activities
●   Mapping key terms from various provenance-related
    vocabularies
●   Report on the state of the art in the area of
    provenance
See Also
●   The incubator group:
    http://www.w3.org/2005/Incubator/prov/
●   Provenance requirement document:
    http://www.w3.org/2005/Incubator/prov/wiki/User_Requirements
●   Mapping provenance-related vocabularies:
    http://www.w3.org/2005/Incubator/prov/wiki/Provenance_Vocabulary_Mappings
Special thanks to members and invited experts of the
 W3C Provenance Incubator Group and UK EPSRC


           This work is licensed under a
Creative Commons Attribution-Share Alike 3.0 License
  (http://creativecommons.org/licenses/by-sa/3.0/)

2010 06 rdf_next


Editor's Notes

  • #12, #13 and #14 (same note on each slide): This applies when you want to say something about a set of RDF triples, covering not only their content but also their publication and querying. Key concerns: interoperability and a standard way of doing it.