Scientific Workflow Management System


             Janus
             Provenance


Research objects, myExperiment, and Open Provenance for collaborative E-science
REPRISE workshop - IDCC’09
   Paolo Missier
    Information Management Group
    School of Computer Science, University of Manchester, UK


            with additional material by Sean Bechhofer and Matthew Gamble,
                                e-Labs design group, University of Manchester
                                                          IDCC’09, London - P.Missier
Momentum on sharing and collaboration
  Special issue of Nature on Data Sharing (Sept. 2009)
                          • timeliness requires rapid sharing
                          • repurposing
                          • the Human Genome project use case
• Ongoing debate in several communities
   – Clinical trials [1]
   – Earth Sciences -- ESIP - data preservation / stewardship, 2009
   – Long established in some communities - Atmospheric sciences,
     1998 [2]
• Science Commons recommendations for Open Science
   – Open Science recommendations from Science Commons (July 2008) [link]

The Toronto group: Toronto International Data Release Workshop Authors, Nature 461, 168–169 (2009)
Prepublication data sharing: Nature 461, 168-170 (10 September 2009) | doi:10.1038/461168a; published online 9 September 2009
http://www.nature.com/news/specials/datasharing/index.html

Reference scenario

[Diagram: a workflow + input dataset specification feeds a workflow execution, which yields two outcomes: data and provenance. Both go into Research Object Packaging. Others can then browse, query, unbundle, and reuse the package: data-mediated implicit collaboration.]

Collaboration through data

        What is needed for B to make sense of A’s data?


1. Packaging:
  – standards for self-descriptive data + metadata bundles:
    Research Objects

2. Content:
  – data format standardization efforts
  – metadata representation
     • process provenance
        – workflow provenance

3. Container:
  – a repository for Research Objects

Paul’s Pack: a Research Object

[Diagram: Paul’s Pack groups QTL, Workflow 16, Workflow 13, two sets of Results, Logs, Slides, Paper, Common pathways, and Metadata. Edge labels between items: produces, Feeds into, Included in, Published in. Three layers are called out: Representation, Domain Relations, Aggregation.]

ORE: representing generic aggregations




[Figure labels: Resource Map (descriptor); Data structure]

http://www.openarchives.org/ore/1.0/primer.html, section 4
A. Pepe, M. Mayernik, C.L. Borgman, and H.V. Sompel, "From Artifacts to Aggregations: Modeling Scientific Life Cycles on the Semantic Web," Journal of the American Society for Information Science and Technology (JASIST), to appear, 2009.

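A Research Object such as Paul’s Pack can be pictured as an ORE-style aggregation with domain relations layered on top. The sketch below is only an illustration in plain Python triples: the example.org URIs, the `ex:` relation names, and the particular edges between items are assumptions (the exact edges on the slide are not recoverable here). Only `ore:describes` and `ore:aggregates` are terms taken from the ORE vocabulary.

```python
# Hypothetical identifiers; only ore:describes / ore:aggregates come from ORE.
BASE = "http://example.org/pauls-pack/"
REM = BASE + "rem"            # Resource Map (the descriptor)
AGG = BASE + "aggregation"    # the aggregation it describes

items = ["workflow16", "workflow13", "results16", "results13",
         "logs", "slides", "paper"]

# Aggregation layer: the Resource Map describes an aggregation of items.
triples = [(REM, "ore:describes", AGG)]
triples += [(AGG, "ore:aggregates", BASE + name) for name in items]

# Domain relations layer (illustrative edges, not read off the slide).
triples += [
    (BASE + "workflow16", "ex:produces", BASE + "results16"),
    (BASE + "workflow13", "ex:produces", BASE + "results13"),
    (BASE + "results16", "ex:includedIn", BASE + "paper"),
]


def objects(subject, predicate):
    """Return every o such that (subject, predicate, o) is a known triple."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def unbundle(resource_map):
    """Follow the Resource Map to its aggregation and list its members."""
    (aggregation,) = objects(resource_map, "ore:describes")
    return objects(aggregation, "ore:aggregates")


members = unbundle(REM)
```

Keeping the aggregation triples separate from the domain relations mirrors the layering on the slide: a generic client can unbundle the pack without understanding what "produces" means.
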
Content: Workflow provenance

A detailed trace of workflow execution
- tasks performed, data transformations
- inputs used, outputs produced

[Workflow diagram labels: lister, get pathways by genes1, merge pathways, gene_id, concat gene pathway ids, output, pathway_genes]

Why provenance matters, if done right
• To establish quality, relevance, trust
• To track information attribution through complex transformations
• To describe one’s experiment to others, for understanding / reuse
• To provide evidence in support of scientific claims
• To enable post hoc process analysis for improvement, re-design




The W3C Incubator on Provenance has been collecting numerous use cases:
http://www.w3.org/2005/Incubator/prov/wiki/Use_Cases#




What users expect to learn

• Causal relations:
  - which pathways come from which genes?
  - which processes contributed to producing an image?
  - which process(es) caused data to be incorrect?
  - which data caused a process to fail?

• Process and data analytics:
  – analyze variations in output vs an input parameter sweep (multiple process runs)
  – how often has my favourite service been executed? on what inputs?
  – who produced this data?
  – how often does this pathway turn up when the input genes range over a certain set S?

[Workflow diagram labels: lister, get pathways by genes1, merge pathways, gene_id, concat gene pathway ids, output, pathway_genes]

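As a rough illustration of the analytics questions above ("how often has my favourite service been executed? on what inputs?"), the sketch below counts invocations over a made-up execution trace. The record layout, run numbers, and input values are all hypothetical; only the processor names are borrowed from the workflow diagram.

```python
from collections import Counter

# Hypothetical trace: one record per processor invocation.
trace = [
    {"run": 1, "processor": "get pathways by genes1", "input": "gene:A"},
    {"run": 1, "processor": "merge pathways", "input": "path:1"},
    {"run": 2, "processor": "get pathways by genes1", "input": "gene:B"},
    {"run": 2, "processor": "get pathways by genes1", "input": "gene:A"},
]

favourite = "get pathways by genes1"

# How often has each processor been executed, across all runs?
executions = Counter(record["processor"] for record in trace)

# On which distinct inputs has the favourite processor been executed?
inputs_seen = sorted(
    {record["input"] for record in trace if record["processor"] == favourite}
)
```
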
Open Provenance Model
• graph of causal dependencies involving data and processors
• not necessarily generated by a workflow!
• v1.0.1 currently open for comments


Edge types (R is a role):
  A --wasGeneratedBy (R)--> P
  P --used (R)--> A

Goal: standardize causal dependencies to enable provenance metadata exchange

[Example graph: A1 --wgb(R1)--> P3 --used(R3)--> A3 --wgb(R5)--> P1;  A2 --wgb(R2)--> P3 --used(R4)--> A4 --wgb(R6)--> P2]

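The causal-dependency graph can be sketched as a plain edge list. The example below encodes one reading of the small graph on this slide (artifacts A1..A4, processes P1..P3, roles R1..R6), with every edge pointing from effect to cause; the tuple layout and the `lineage` helper are assumptions made for illustration, not part of OPM itself.

```python
from collections import defaultdict

# Each edge points from effect to cause: (effect, relation, role, cause).
edges = [
    ("A1", "wasGeneratedBy", "R1", "P3"),
    ("A2", "wasGeneratedBy", "R2", "P3"),
    ("P3", "used", "R3", "A3"),
    ("P3", "used", "R4", "A4"),
    ("A3", "wasGeneratedBy", "R5", "P1"),
    ("A4", "wasGeneratedBy", "R6", "P2"),
]

# Index direct causes by effect for fast traversal.
causes = defaultdict(list)
for effect, _relation, _role, cause in edges:
    causes[effect].append(cause)


def lineage(node):
    """Return every artifact or process that `node` causally depends on."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for cause in causes[current]:
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


upstream_of_a1 = lineage("A1")
```

A question such as "which processes contributed to producing this artifact?" then reduces to filtering `lineage(...)` for process nodes.
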
The 3rd provenance challenge

• Chosen workflow from the Pan-STARRS project
  – Panoramic Survey Telescope & Rapid Response System

• http://twiki.ipaw.info/bin/view/Challenge/ThirdProvenanceChallenge

• Goal:
  – demonstrate “provenance interoperability” at query level

The 3rd provenance challenge workflow

[Workflow diagram, three stages from top to bottom: read input file, load database, verify]

OPM and query-interoperability

Team A:
  encode W as WA → run WA → prov(WA)
  execute query Q on prov(WA) → Q(prov(WA))
  export prov(WA) → OPM(prov(WA))

Team B:
  import → PWA = import(OPM(prov(WA)))
  execute query Q on PWA → Q(PWA)

? Does Q(PWA) agree with Q(prov(WA))?

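The round trip above can be sketched in a few lines. This is a toy under stated assumptions: JSON stands in for a real OPM serialization, the edge tuples and the query `query_q` are invented for illustration, and the artifact names (`db_table`, `csv_batch`) are hypothetical; only the stage names come from the challenge workflow slide.

```python
import json


def export_opm(edges):
    """Team A: serialize provenance edges to a neutral interchange form."""
    records = [
        {"effect": effect, "relation": relation, "cause": cause}
        for effect, relation, cause in sorted(edges)
    ]
    return json.dumps(records)


def import_opm(document):
    """Team B: rebuild an edge set from the interchange form."""
    return {
        (rec["effect"], rec["relation"], rec["cause"])
        for rec in json.loads(document)
    }


def query_q(edges, artifact):
    """Q: which processes generated `artifact`?"""
    return {
        cause
        for effect, relation, cause in edges
        if effect == artifact and relation == "wasGeneratedBy"
    }


# prov(WA): Team A's native provenance for its encoding of the workflow.
prov_wa = {
    ("db_table", "wasGeneratedBy", "load database"),
    ("load database", "used", "csv_batch"),
    ("csv_batch", "wasGeneratedBy", "read input file"),
}

# PWA = import(OPM(prov(WA))): what Team B sees after the exchange.
pwa = import_opm(export_opm(prov_wa))

# Query-level interoperability: both teams get the same answer to Q.
interoperable = query_q(prov_wa, "db_table") == query_q(pwa, "db_table")
```

In this toy the check holds trivially because both sides share identifiers; the open question on the slide is whether it still holds when each team uses its own system and naming.
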
OPM in Taverna

➡ the answer to any TP query can be viewed as an OPM graph
➡ encoded as RDF/XML (using the Tupelo provenance API)

Additional requirements
• Artifact values require uniform common identifier scheme
  – each group used artifacts to refer to its own data results
  – but those results were expressed using proprietary naming conventions
  – Linked Data in OPM?

• OPM accounts for structural causal relationships
  – additional domain-specific knowledge required
  – attaching semantic annotations to OPM graph nodes

• OPM graphs can grow very large
  – reduce size by exporting only query results
     • Taverna approach
  – multiple levels of abstraction
     • through OPM accounts (“points of view”)

Query results as OPM graphs

  encode W as WA → run WA → prov(WA)
  execute query Q on prov(WA) → Q(prov(WA))
  export Q(prov(WA)) → OPM(Q(prov(WA)))

- Approach implemented in Taverna 2.1
- Internal provenance DB with ad hoc query language
- To be released soon

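Exporting only the answer to a query, rather than the whole trace, can be sketched as extracting a subgraph. The example below is an assumption-laden illustration and not Taverna's implementation: the edge tuples, node names, and the choice of "everything upstream of one output" as the query are all hypothetical.

```python
def answer_subgraph(edges, start):
    """Return only the edges reachable from `start` by following effect -> cause."""
    kept = set()
    visited = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for effect, relation, cause in edges:
            if effect == node:
                kept.add((effect, relation, cause))
                if cause not in visited:
                    visited.add(cause)
                    frontier.append(cause)
    return kept


# prov(WA): a trace with two independent output lineages.
prov_wa = {
    ("out1", "wasGeneratedBy", "p2"),
    ("p2", "used", "mid1"),
    ("mid1", "wasGeneratedBy", "p1"),
    ("out2", "wasGeneratedBy", "p3"),
    ("p3", "used", "in2"),
}

# Q(prov(WA)): only the part of the graph that explains out1.
q_result = answer_subgraph(prov_wa, "out1")
```

The exported graph is a strict subset of the full trace, which is the size reduction the slide refers to.
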
Full-fledged data-mediated collaborations

[Diagram: experiment A (workflow A + input A) and experiment B (workflow B + input B) are linked by result A → input B. Their outcomes are packaged in one Research Object A+B holding result datasets A, result datasets B, and result provenance A+B.]

Provenance composition accounts for implicit collaboration

Aligned with focus of upcoming Provenance Challenge 4:
“connect my provenance to yours” into a whole OPM provenance graph.

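Provenance composition can be sketched as a union of two edge sets joined on shared data. The example below is a minimal illustration, assuming each experiment's provenance is a set of (effect, relation, cause) tuples and that the link "result A → input B" is supplied explicitly as a `same_as` mapping; all names are hypothetical.

```python
def compose(prov_a, prov_b, same_as):
    """Union two edge sets, renaming B's identifiers that denote A's data."""
    renamed_b = {
        (same_as.get(effect, effect), relation, same_as.get(cause, cause))
        for effect, relation, cause in prov_b
    }
    return set(prov_a) | renamed_b


def depends_on(edges, node, target):
    """True if `node` transitively depends on `target`."""
    stack, seen = [node], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(cause for effect, _rel, cause in edges if effect == current)
    return False


prov_a = {
    ("result_A", "wasGeneratedBy", "workflow_A"),
    ("workflow_A", "used", "input_A"),
}
prov_b = {
    ("result_B", "wasGeneratedBy", "workflow_B"),
    ("workflow_B", "used", "input_B"),
}

# result A → input B: B's input is the same data item as A's result.
combined = compose(prov_a, prov_b, same_as={"input_B": "result_A"})
```

Only in the combined graph can B's result be traced back to A's input. The renaming step assumes the two groups can agree on which identifiers denote the same data, which is the identifier-scheme requirement raised earlier in the talk.
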
Contacts


The myGrid Consortium (Manchester, Southampton)
http://mygrid.org.uk
http://www.myexperiment.org

Janus Provenance
Me: pmissier@acm.org

More Related Content

What's hot

Duplicate Detection of Records in Queries using Clustering
Duplicate Detection of Records in Queries using ClusteringDuplicate Detection of Records in Queries using Clustering
Duplicate Detection of Records in Queries using Clustering
IJORCS
 
Capabilities Brief Analytics
Capabilities Brief AnalyticsCapabilities Brief Analytics
Capabilities Brief Analytics
DataTactics
 
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
IJORCS
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...
BrianDeCost
 
Dmdw1
Dmdw1Dmdw1
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
PyData
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
Anubhav Jain
 
Assessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAssessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data Analysis
Anubhav Jain
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
Anubhav Jain
 
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Anubhav Jain
 
DataONE_cobb_hubbub2012_20120924_v05
DataONE_cobb_hubbub2012_20120924_v05DataONE_cobb_hubbub2012_20120924_v05
DataONE_cobb_hubbub2012_20120924_v05
John Cobb
 
Materials Informatics and Python
Materials Informatics and PythonMaterials Informatics and Python
Materials Informatics and Python
Shintaro Fukushima
 
Statistical Analysis of Web of Data Usage
Statistical Analysis of Web of Data UsageStatistical Analysis of Web of Data Usage
Statistical Analysis of Web of Data Usage
Markus Luczak-Rösch
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomate
Anubhav Jain
 
EDF2012 Peter Boncz - LOD benchmarking SRbench
EDF2012   Peter Boncz - LOD benchmarking SRbenchEDF2012   Peter Boncz - LOD benchmarking SRbench
EDF2012 Peter Boncz - LOD benchmarking SRbench
European Data Forum
 

What's hot (15)

Duplicate Detection of Records in Queries using Clustering
Duplicate Detection of Records in Queries using ClusteringDuplicate Detection of Records in Queries using Clustering
Duplicate Detection of Records in Queries using Clustering
 
Capabilities Brief Analytics
Capabilities Brief AnalyticsCapabilities Brief Analytics
Capabilities Brief Analytics
 
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...
 
Dmdw1
Dmdw1Dmdw1
Dmdw1
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
 
Assessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAssessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data Analysis
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
 
DataONE_cobb_hubbub2012_20120924_v05
DataONE_cobb_hubbub2012_20120924_v05DataONE_cobb_hubbub2012_20120924_v05
DataONE_cobb_hubbub2012_20120924_v05
 
Materials Informatics and Python
Materials Informatics and PythonMaterials Informatics and Python
Materials Informatics and Python
 
Statistical Analysis of Web of Data Usage
Statistical Analysis of Web of Data UsageStatistical Analysis of Web of Data Usage
Statistical Analysis of Web of Data Usage
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomate
 
EDF2012 Peter Boncz - LOD benchmarking SRbench
EDF2012   Peter Boncz - LOD benchmarking SRbenchEDF2012   Peter Boncz - LOD benchmarking SRbench
EDF2012 Peter Boncz - LOD benchmarking SRbench
 

How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 

Recently uploaded (20)

20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 

Invited talk @ DCC09 workshop

  • 1. Scientific Workflow Management System; Janus Provenance. Research objects, myExperiment, and Open Provenance for collaborative E-science. REPRISE workshop - IDCC’09. Paolo Missier, Information Management Group, School of Computer Science, University of Manchester, UK, with additional material by Sean Bechhofer and Matthew Gamble, e-Labs design group, University of Manchester. IDCC’09, London - P.Missier
  • 4. Momentum on sharing and collaboration. Special issue of Nature on Data Sharing (Sept. 2009) • timeliness requires rapid sharing • repurposing • the Human Genome project use case • Ongoing debate in several communities – Clinical trials [1] – Earth Sciences: ESIP, data preservation / stewardship, 2009 – Long established in some communities: Atmospheric sciences, 1998 [2] • Science Commons recommendations for Open Science – Open Science recommendations from Science Commons (July 2008). The Toronto group: Toronto International Data Release Workshop Authors, Nature 461, 168–169 (2009). Prepublication data sharing: Nature 461, 168-170 (10 September 2009) | doi:10.1038/461168a; published online 9 September 2009. http://www.nature.com/news/specials/datasharing/index.html
  • 11. Reference scenario (diagram labels): workflow specification + input dataset; workflow execution; outcome (data); outcome (provenance); Research Object packaging; browse, query, unbundle, reuse; data-mediated implicit collaboration.
  • 12. Collaboration through data. What is needed for B to make sense of A’s data? 1. Packaging: – standards for self-descriptive data + metadata bundles: Research Objects 2. Content: – data format standardization efforts – metadata representation • process provenance – workflow provenance 3. Container: – a repository for Research Objects
  • 22. Paul’s Pack / Paul’s Research Object (diagram). Contents: QTL, Workflow 16, Workflow 13, Results, Logs, Slides, Paper, Common pathways. Relations: produces, feeds into, included in, published in. Layers: Representation, Domain Relations, Aggregation, Metadata.
  • 23. ORE: representing generic aggregations. Resource Map: data structure (descriptor). See http://www.openarchives.org/ore/1.0/primer.html, section 4. A. Pepe, M. Mayernik, C.L. Borgman, and H.V. Sompel, "From Artifacts to Aggregations: Modeling Scientific Life Cycles on the Semantic Web," Journal of the American Society for Information Science and Technology (JASIST), to appear, 2009.
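The aggregation idea on this slide can be sketched in a few lines. This is only an illustration, assuming plain Python tuples stand in for RDF triples; the `ore:describes` and `ore:aggregates` terms come from the ORE vocabulary, while every example.org URI and the helper `aggregated_resources` are hypothetical.

```python
# Sketch: an ORE-style Resource Map as subject-predicate-object triples.
# All example.org URIs are hypothetical placeholders.
ORE = "http://www.openarchives.org/ore/terms/"

REM = "http://example.org/rem/pack1"          # the Resource Map (descriptor)
AGG = "http://example.org/aggregation/pack1"  # the Aggregation it describes

triples = [
    (REM, ORE + "describes", AGG),
    (AGG, ORE + "aggregates", "http://example.org/workflows/16"),
    (AGG, ORE + "aggregates", "http://example.org/results/1"),
    (AGG, ORE + "aggregates", "http://example.org/papers/1"),
]


def aggregated_resources(triples, aggregation):
    """Return the resources that the given aggregation aggregates."""
    return [o for s, p, o in triples
            if s == aggregation and p == ORE + "aggregates"]


members = aggregated_resources(triples, AGG)
```

The Resource Map and the Aggregation get separate identifiers here, so statements about the descriptor stay distinct from statements about the bundle it describes.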
  • 27. Content: Workflow provenance. A detailed trace of workflow execution - tasks performed, data transformations - inputs used, outputs produced. Example workflow (diagram labels): lister, get pathways by genes1, merge pathways, gene_id, concat gene pathway ids, output, pathway_genes.
  • 28. Why provenance matters, if done right • To establish quality, relevance, trust • To track information attribution through complex transformations • To describe one’s experiment to others, for understanding / reuse • To provide evidence in support of scientific claims • To enable post hoc process analysis for improvement, re-design. The W3C Incubator on Provenance has been collecting numerous use cases: http://www.w3.org/2005/Incubator/prov/wiki/Use_Cases#
  • 29. What users expect to learn • Causal relations: - which pathways come from which genes? - which processes contributed to producing an image? - which process(es) caused data to be incorrect? - which data caused a process to fail? • Process and data analytics: – analyze variations in output vs an input parameter sweep (multiple process runs) – how often has my favourite service been executed? on what inputs? – who produced this data? – how often does this pathway turn up when the input genes range over a certain set S?
  • 30. Open Provenance Model • graph of causal dependencies involving data and processors • not necessarily generated by a workflow! • v1.0.1 currently open for comments • Goal: standardize causal dependencies to enable provenance metadata exchange. Edge types (diagram): wasGeneratedBy(R), from artifact A to process P; used(R), from process P to artifact A. Example graph (diagram labels): artifacts A1 to A4, processes P1 to P3, edges wgb(R1), wgb(R2), used(R3), used(R4), wgb(R5), wgb(R6).
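A minimal sketch of the causal-dependency graph described on this slide, assuming each edge is stored as an (effect, role, cause) tuple. The node names, roles, and the helpers `build_causes` and `lineage` are illustrative only and do not reproduce the graph drawn on the slide or any OPM library API.

```python
from collections import defaultdict

# Sketch: OPM-style causal edges, stored as (effect, role, cause).
# used:           process  -> artifact it consumed
# wasGeneratedBy: artifact -> process that produced it
# Node names and roles below are illustrative only.
used = [("P1", "R3", "A1"), ("P2", "R4", "A2")]
was_generated_by = [("A3", "R1", "P1"), ("A4", "R2", "P2")]


def build_causes(used, was_generated_by):
    """Map each node to the set of nodes it directly depends on."""
    causes = defaultdict(set)
    for effect, _role, cause in used + was_generated_by:
        causes[effect].add(cause)
    return causes


def lineage(node, causes):
    """Return every node that `node` transitively depends on."""
    seen, stack = set(), [node]
    while stack:
        for cause in causes.get(stack.pop(), ()):
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


causes = build_causes(used, was_generated_by)
a3_lineage = lineage("A3", causes)  # the process and artifact behind A3
```

Because both edge types point from effect to cause, a single backward traversal answers questions such as "which data and processes contributed to this artifact?".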
  • 31. The 3rd provenance challenge • Chosen workflow from the Pan-STARRS project – Panoramic Survey Telescope & Rapid Response System • http://twiki.ipaw.info/bin/view/Challenge/ThirdProvenanceChallenge • Goal: – demonstrate “provenance interoperability” at query level
  • 32. The 3rd provenance challenge workflow (diagram): read input file, load database, verify.
  • 36. OPM and query-interoperability (diagram). Team A: encode W as WA; run WA to obtain prov(WA); execute query Q to obtain Q(prov(WA)); export prov(WA) as OPM(prov(WA)). Team B: import PWA = import(OPM(prov(WA))); execute query Q to obtain Q(PWA). Comparison: Q(prov(WA)) ? Q(PWA)
  • 39. OPM in Taverna (skippable) ➡ the answer to any TP query can be viewed as an OPM graph ➡ encoded as RDF/XML (using the Tupelo provenance API)
  • 43. Additional requirements • Artifact values require a uniform common identifier scheme – each group used artifacts to refer to its own data results – but those results were expressed using proprietary naming conventions – Linked Data in OPM? • OPM accounts for structural causal relationships – additional domain-specific knowledge required – attaching semantic annotations to OPM graph nodes • OPM graphs can grow very large – reduce size by exporting only query results • Taverna approach – multiple levels of abstraction • through OPM accounts (“points of view”)
  • 48. Query results as OPM graphs (diagram): encode W as WA; run WA to obtain prov(WA); execute query Q to obtain Q(prov(WA)); export the query result as OPM(Q(prov(WA))). - Approach implemented in Taverna 2.1 - Internal provenance DB with ad hoc query language - To be released soon
  • 52. Full-fledged data-mediated collaborations (diagram). Exp. A: workflow A + input A; Research Object A: result datasets A, result provenance A. Exp. B: workflow B + input B, where result A → input B; Research Object B: result datasets B, result provenance B.
  • 55. Full-fledged data-mediated collaborations (diagram). Workflow A + input A; workflow B + input B, where result A → input B. Combined Research Object: result datasets A, result datasets B, result provenance A+B. Provenance composition accounts for implicit collaboration. Aligned with the focus of the upcoming Provenance Challenge 4: “connect my provenance to yours” into a whole OPM provenance graph.
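Provenance composition can be sketched as a union of two dependency graphs that share an artifact. This is a rough illustration, assuming both groups refer to result A by the same identifier (the common identifier scheme the talk asks for); all identifiers and the helpers `compose` and `lineage` are hypothetical.

```python
# Sketch: composing two provenance graphs through a shared artifact.
# Each graph maps a node to the nodes it directly depends on.
# All identifiers are hypothetical; "data:result_A" is the dataset that
# experiment A produced and experiment B consumed as input.
prov_a = {
    "data:result_A": {"proc:workflow_A"},
    "proc:workflow_A": {"data:input_A"},
}
prov_b = {
    "data:result_B": {"proc:workflow_B"},
    "proc:workflow_B": {"data:result_A"},
}


def compose(*graphs):
    """Union the dependency edges of several provenance graphs."""
    merged = {}
    for graph in graphs:
        for node, deps in graph.items():
            merged.setdefault(node, set()).update(deps)
    return merged


def lineage(node, graph):
    """Return every node that `node` transitively depends on."""
    seen, stack = set(), [node]
    while stack:
        for dep in graph.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


prov_ab = compose(prov_a, prov_b)
alone = lineage("data:result_B", prov_b)      # stops at result A
composed = lineage("data:result_B", prov_ab)  # reaches back to input A
```

With B's graph alone the trace ends at result A; after composition the same query reaches A's workflow and input, which is the implicit collaboration the slide refers to.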
  • 56. Contacts. The myGrid Consortium (Manchester, Southampton): http://mygrid.org.uk, http://www.myexperiment.org. Janus Provenance. Me: pmissier@acm.org