Introduction
                    An Event-Centric Model
                                Summary




An Event-Centric Provenance Model for Digital Libraries

C. Tang, D. Castelli, L. Candela, P. Manghi, P. Pagano, C. Thanos
 Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” – CNR, Pisa - Italy
                         name.surname@isti.cnr.it

     6th Italian Research Conference on Digital Libraries
                Padua, Italy, 28-29 January 2010



                             C. Tang et al.   An Event-Centric Provenance Model


Outline



  1   Introduction
         Motivations


  2   An Event-Centric Model
        The Constituents
        Exploiting the Model






What is Provenance?



  Some pseudo-definitions:
      “a summary of the history and context of the data”
      “the parts of the input that influenced (or that explain) a
      part of the output”
      “the part of the input that shows where a part of the output
      came from”
      “a causal graph that shows how a result was computed”






What is Provenance?



     Provenance is thus information about
          source, derivation, influences, history
     ... of an object
          e.g. a program result, a database query result
     In e-Science (thus in DLs), it is essential for
          efficiency, reproducibility, accountability, explanation, data
          cleaning, certifying scientific value of data






What is the Problem?


  Many models are being developed
      Where-provenance, links output parts to equal input parts
      Why-provenance, explains “why” some data appears in the
      result
      How-provenance, explains “how” a result was calculated
      Workflow, describes result of a parallel/distributed program
  . . . using different assumptions, e.g. system scope, program,
  granularity
  Our goal: develop a “non-invasive” and “open” model
  supporting “provenance generation”





The Idea


  Add a layer dedicated to capturing provenance-oriented data

  [Figure: three layers, top to bottom: Reference Objects, Information Objects, Events]






The Model

  An Event is a happening that has an effect on a Reference Object
  (Event <happenedTo> Object)
  Each Event has a Type for filtering purposes
  Description captures the “how” of the Event
  Place captures the “where” of the Event
  Time captures the “when” of the Event
  The Agent controls the Event
  Rationale captures the “why” of the Event
  The Parameter is any additional information
  Don’t reinvent the wheel!!!
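
  The constituents of an Event listed on the "The Model" slides could be
  written down as a small record type. This is only a sketch: the field
  names, the use of plain strings as object and agent identifiers, and the
  sample values are assumptions made for illustration and do not come from
  the slides.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class Event:
    """One provenance record. Field names are illustrative assumptions."""
    happened_to: str                  # identifier of the affected object (<happenedTo>)
    type: str                         # Type, used for filtering
    description: str = ""             # the "how"
    place: Optional[str] = None       # the "where"
    time: Optional[datetime] = None   # the "when"
    agent: Optional[str] = None       # who or what controlled the Event
    rationale: Optional[str] = None   # the "why"
    parameters: Dict[str, Any] = field(default_factory=dict)  # any additional information


# Hypothetical sample record, invented for illustration.
harvest = Event(
    happened_to="Salmon",
    type="harvest",
    description="imported occurrence records",
    place="D4Science",
    time=datetime(2010, 1, 15, 9, 30),
    agent="harvester-service",
    rationale="scheduled refresh",
    parameters={"source": "OBIS"},
)
```

  Keeping Parameter as an open dictionary is one way to leave the record
  extensible without changing the other fields.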






Computing the provenance



  [Figure: diagram with elements numbered 1 to 5]






The granularity issue

  High flexibility by relying on the Information Object relationships

  [Figure: the three layers (Reference Objects, Information Objects, Events), with a part-of relationship]
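
  One possible reading of the granularity slide is that the provenance of a
  compound object can include the Events recorded on its parts. The sketch
  below works under that assumption; the part-of table, the object
  identifiers and the event tuples are hypothetical and invented for
  illustration.

```python
# Hypothetical part-of relationships between objects: child -> parent.
PART_OF = {
    "Salmon/occurrences": "Salmon",
    "Salmon/map": "Salmon",
    "Salmon/map/legend": "Salmon/map",
}

# Hypothetical events as (happened_to, type) pairs.
EVENTS = [
    ("Salmon", "create"),
    ("Salmon/map/legend", "update"),
    ("Tuna", "create"),
]


def with_parts(obj, part_of):
    """Return obj together with every object that is transitively part of it."""
    found = {obj}
    changed = True
    while changed:
        changed = False
        for child, parent in part_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def events_including_parts(obj, events, part_of):
    """Events on obj or on any of its parts, in the order they were recorded."""
    scope = with_parts(obj, part_of)
    return [e for e in events if e[0] in scope]


salmon_events = events_including_parts("Salmon", EVENTS, PART_OF)
```

  Querying "Salmon/map" instead of "Salmon" narrows the result to the
  finer-grained object, which is where the flexibility would come from.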






The AquaMaps scenario



     AquaMaps is one of the VREs (Virtual Research Environments) supported
     by the D4Science e-Infrastructure
         Aggregate data on species from multiple and evolving data
         sources (e.g. OBIS, GBIF)
         Curate aggregated data
         Generate species distribution and biodiversity prediction
         maps






Example 1



  Find the events that happened to the Salmon object
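
  A query of this kind can be sketched as a filter over an event log. The
  tuple layout, the optional Type filter and the sample records are
  assumptions made for illustration, not the representation used by the
  authors.

```python
from datetime import date

# Hypothetical event log: (happened_to, type, time, agent). Invented sample data.
EVENTS = [
    ("Salmon", "curate", date(2010, 1, 12), "curator-a"),
    ("Tuna", "harvest", date(2010, 1, 11), "harvester"),
    ("Salmon", "harvest", date(2010, 1, 10), "harvester"),
]


def events_of(obj, events, event_type=None):
    """Return the events that happened to obj, oldest first.

    If event_type is given, keep only events of that Type.
    """
    selected = [
        e for e in events
        if e[0] == obj and (event_type is None or e[1] == event_type)
    ]
    return sorted(selected, key=lambda e: e[2])


salmon_events = events_of("Salmon", EVENTS)
salmon_curations = events_of("Salmon", EVENTS, event_type="curate")
```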






Example 2



  Find the contributors to the Salmon object
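
  Assuming that a contributor is any Agent that controlled an Event on the
  object, this query reduces to collecting distinct Agents. The dictionary
  keys and the sample records below are hypothetical.

```python
# Hypothetical event records, invented for illustration.
EVENTS = [
    {"happened_to": "Salmon", "type": "harvest", "agent": "harvester"},
    {"happened_to": "Salmon", "type": "curate", "agent": "curator-a"},
    {"happened_to": "Tuna", "type": "curate", "agent": "curator-b"},
    {"happened_to": "Salmon", "type": "curate", "agent": "harvester"},
]


def contributors(obj, events):
    """Distinct agents of the events on obj, in order of first appearance."""
    seen = []
    for e in events:
        if e["happened_to"] == obj and e["agent"] not in seen:
            seen.append(e["agent"])
    return seen


salmon_contributors = contributors("Salmon", EVENTS)
```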






Example 3



  How to explain the existence of the Salmon object
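
  One way to approach such an explanation is to replay the Events on the
  object in time order and report the Agent, Description and Rationale of
  each. The sketch below assumes exactly that; the record layout, the
  sentence format and the sample data are invented for illustration.

```python
# Hypothetical records, invented for illustration; times are ISO date strings.
EVENTS = [
    {"happened_to": "Salmon", "type": "curate", "time": "2010-01-12",
     "agent": "curator-a", "description": "removed duplicate records",
     "rationale": "quality check"},
    {"happened_to": "Salmon", "type": "harvest", "time": "2010-01-10",
     "agent": "harvester", "description": "imported occurrence records",
     "rationale": "scheduled refresh"},
]


def explain(obj, events):
    """Return one sentence per event on obj, oldest first."""
    own = sorted(
        (e for e in events if e["happened_to"] == obj),
        key=lambda e: e["time"],  # ISO date strings sort chronologically
    )
    return [
        f'{e["time"]}: {e["agent"]} did "{e["type"]}" '
        f'({e["description"]}) because: {e["rationale"]}'
        for e in own
    ]


for line in explain("Salmon", EVENTS):
    print(line)
```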






Summary

    Provenance is an essential feature in Digital Libraries and
    eScience scenarios
    Many provenance models are being developed using
    different assumptions
    A DL-oriented provenance model that is event-based,
    “open” and “non-invasive”
    Future steps
        validation and consolidation of the model in the context of
        new DL application scenarios
        implementation of an infrastructural service realising the
        model in the D4Science infrastructure







               http://www.d4science.eu
                 http://www.dlorg.eu

An Event-Centric Provenance Model for Digital Libraries @ IRCDL 2010
