Reconstructing Provenance
Sara Magliacane - VU University Amsterdam
Advisors: Paul Groth and Frank van Harmelen



Problem Statement

The provenance of a data item is the metadata describing how, when and by whom the data item was produced.

Provenance is crucial in many settings, but often it is not tracked, resulting in collections of files with only basic filesystem metadata, e.g. timestamps.

In this case, is it possible to reconstruct provenance post hoc?

[Figure: a log in which every entry reads "... you edited the file poster.ppt" (the leading characters of each entry are illegible), next to the question "How can we get more information?". The reconstructed entries read "... you edited the file poster.ppt adding an image from the file paper.pdf", "... adding [number illegible] paragraphs from the file techreport.pdf" and "... modified a paragraph similar to paper.pdf". A timeline from June to October shows paper.pdf, techreport.pdf and three poster.ppt nodes, with edges labelled "copy image", "copy paragraph" and "paraphrase paragraph".]

An initial prototype implementation

As a first step we focus on dependencies between files instead of sequences of operations.

We implemented a prototype of the pipeline using open-source components, such as Apache Lucene, Apache Tika and the Dropbox API. As signal detectors we used well-known similarity measures.

[Figure: the prototype pipeline, starting from Docs and ending in an output graph. Preprocessing: Extract metadata, Index content, Infer semantic types. Hypotheses Generation: Text similarity, Image similarity, Domain-specific similarity, Metadata similarity. Hypotheses Pruning: Filter temporal incoherence, Similarity thresholds, Domain-specific filtering. Aggregation and ranking: Weighted Sum.]
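As a rough illustration of how a similarity measure can act as a signal detector, the sketch below scores pairs of files with a bag-of-words cosine similarity, drops hypotheses that are temporally incoherent (a file cannot depend on one modified after it), and combines several signals with a weighted sum. It is not the prototype's code: the prototype relies on Apache Lucene and Apache Tika, whereas this sketch uses only the Python standard library. The names `FileRecord`, `text_similarity`, `generate_hypotheses` and `weighted_sum`, the threshold, the weights and the sample files are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    name: str
    text: str
    modified: float  # last-modified time, e.g. seconds since the epoch


def text_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two texts."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generate_hypotheses(files, threshold=0.3):
    """Yield (source, target, score) when target may depend on source."""
    for src in files:
        for dst in files:
            if src is dst:
                continue
            # Temporal filter: a file cannot depend on one modified after it.
            if src.modified >= dst.modified:
                continue
            score = text_similarity(src.text, dst.text)
            # Threshold filter (the value 0.3 is an arbitrary assumption).
            if score >= threshold:
                yield src.name, dst.name, score


def weighted_sum(signals, weights):
    """Aggregate the per-signal scores of one hypothesis into a confidence."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * signals.get(name, 0.0) for name, w in weights.items()) / total


# Made-up sample data, for illustration only.
files = [
    FileRecord("paper.pdf", "provenance reconstruction of files in a shared folder", 1.0),
    FileRecord("poster.ppt", "reconstruction of provenance for files in a shared folder", 2.0),
    FileRecord("notes.txt", "grocery list milk eggs bread", 3.0),
]
hypotheses = list(generate_hypotheses(files))
confidence = weighted_sum({"text": 0.5, "metadata": 1.0}, {"text": 1.0, "metadata": 1.0})
```

With this sample data only the pair (paper.pdf, poster.ppt) survives both filters. In a fuller version each similarity measure would contribute one entry to the `signals` dictionary before aggregation.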
Initial (encouraging) results

We performed an experiment with a small set of biomedical publications, annotated manually by two domain experts.
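The poster reports F1-scores for this experiment but does not spell out how they were computed. One common reading, assumed here, treats each dependency as a directed pair of document identifiers and compares the predicted set with the manually annotated one. The function `f1_score` and the sample edges are illustrative and are not taken from the experiment.

```python
def f1_score(predicted, annotated):
    """Return (precision, recall, F1) of predicted edges against annotated edges."""
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up edges between document identifiers, for illustration only.
annotated = {(0, 1), (1, 2), (3, 4), (5, 6)}
predicted = {(0, 1), (1, 2), (2, 3)}
precision, recall, f1 = f1_score(predicted, annotated)
```

Here two of the three predicted edges are correct and two of the four annotated edges are found, so precision is 2/3, recall is 1/2 and F1 is 4/7.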


[Figure: two dependency graphs over nodes numbered 0 to 24, one labelled "Original" and one labelled "Predicted". In both, the nodes are grouped into Cluster 1: Blood Cultures (EvidenceQ||), Cluster 2: Markers (EvidenceQX) and Cluster 3: General (Guideline).]

F1-score of 0.49 using text similarity only
F1-score of 0.70 using the aggregation of various similarities

Research Question

How can one automatically, accurately and efficiently reconstruct a plausible provenance of files in a shared folder, understood as the sequences of operations connecting the files?

Approach & Methodology

We propose a multi-signal pipeline approach that reconstructs plausible provenance traces using the contents of the files and their metadata as evidence of the relationships between files.

The pipeline consists of four stages, each containing several components that can be executed in parallel.
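A minimal sketch of such a staged pipeline follows, assuming that the components within a stage are independent of each other and that each stage consumes the combined output of the previous one. The poster does not describe an implementation at this level, so `run_stage`, `run_pipeline`, the dictionary-based data format and the toy components are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stage(components, data):
    """Run every component of one stage on the same input, in parallel."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(component, data) for component in components]
        # Collect results in submission order so the merge is deterministic.
        return [future.result() for future in futures]


def run_pipeline(stages, data):
    """Run the stages in sequence, merging each component's output into the data."""
    for components in stages:
        for result in run_stage(components, data):
            data = {**data, **result}
    return data


# Toy two-stage example; a real pipeline would pass four lists of components.
stages = [
    [lambda d: {"tokens": d["text"].split()}, lambda d: {"length": len(d["text"])}],
    [lambda d: {"n_tokens": len(d["tokens"])}],
]
result = run_pipeline(stages, {"text": "a b c"})
```

Threads are used here for brevity. They suit components that mostly wait on I/O; CPU-bound components would more likely need a process pool.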
[Figure: the pipeline takes a set of documents (Docs) through four stages. Preprocessing: Extract metadata, Index content, ... Hypotheses Generation: Signal Detector 1, Signal Detector 2, ... Hypotheses Pruning: Signal Filter 1, Signal Filter 2, ... Aggregation and ranking: Aggregator 1, Aggregator 2, ... The output is a graph whose edges carry labels such as "copy" and "image", each with a confidence value (digits illegible).]

The research methodology is an iterative process that will incrementally integrate existing approaches from the literature and evaluate their performance on benchmark corpora.

Future work

Following the planned methodology, we will explore additional components for each of the pipeline phases and also consider computational efficiency.

Bibliography

(1) Sara Magliacane: Reconstructing Provenance, ISWC Doctoral Consortium 2012
(2) Paul Groth, Yolanda Gil, Sara Magliacane: Automatic Metadata Annotation through Reconstructing Provenance, Third International Workshop on the role of Semantic Web in Provenance Management, ESWC 2012
Advisors: Paul Groth and Frank van Harmelen



                            Problem Statement                                                                                              An initial prototype im
The provenance of a data item is the metadata describing how,                                                                        As a first step we focus on dependen
when and by whom the data item was produced.                                                                                         sequences of operations.

Provenance is crucial in many settings, but often it is not tracked,                                                                 We implemented a prototype of the p
resulting in collections of files with only basic filesystem                                                                         components, like Apache Lucene, Ap
metadata, e.g. timestamps.                                                                                                           As signal detectors we used well-kno

In this case, is it possible to reconstruct provenance post hoc?                                                                <2,4%   C*.7*2,.4491;%                   D672)A.4.4%E.1.*+521%                          D672)A.4.4%C*F


                                                                                                                                                                                                                             @9:).*%).-72*
                                                                                                                                        '()*+,)%-.)+/+)+%%                      8.()%49-9:+*9)6%
                                                                                                                                                                                                                              91,2A.*.1,
       !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#                                                                                !
                                                                               @*A#<7"#A,#8,/#                                                                                                                                    B9-9:+*9)6%
                                                                                                                                 &       01/.(%,21).1)%                         0-+;.%49-9:+*9)6%
       !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/6#
                                                                               9*5,#.":*597B*"C#                                                                                                                                 )A*.4A2:/4
       !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#                                                                                "
                                                                                                                                         013.*%4.-+15,%                         <2-+91=47.,9>,%                              <2-+91=47.,
                                                                                                                                              )67.4%                              49-9:+*9)6%                                   >:).*91;%

        !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#7--."8#7"#.978,#:5*9#/0,#12,#373,563-:6#################                                                                           ?.)+/+)+%
                                                                                                                                                                                     49-9:+*9)6%
        !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#7--."8#;#3757857304#:5*9#/0,#12,#/,<05,3*5/63-:6#

        !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#9*-.1,-#7#375785730#4.9.275#/*#373,563-:6#




                                                  4,!5(
                                                 67"8#(
                                                                            4,!5(
                                                                          !"$"8$"!+(
                                                                                                   9"$"!+$"-#:
                                                                                                   !"$"8$"!+(
                                                                                                                                                        Initial (encouragin
                          )#*+$#!,$)%!&'(
      !"!#$%!&'(
        =+",# #       #       #       #      #       #       #        #        #       #       #      #      #></*?,5#
                                                                                                                                     We performed an experiment with a
                                                                 !,-)#$%!!)(               !,-)#$%!!)(            !,-)#$%!!)(        publications, annotated manually by
                                                                 ./01(                     ./31(                  ./21(


                                                                                                                                                             Cluster 1: Blood Cultures             Cluster 2: Markers              Cluster 3: G
                                                                                                                                                             EvidenceQ||                           EvidenceQX                      Guideline




                                                                                                                                                !"#$#%&'(
                                                                                                                                                                                                                   22




                                                                                                                                                                           23                                 17




                                                                                                                                                                                                    15




                              Research Question                                                                                                                13
                                                                                                                                                                    14          20




    [Figure: prototype pipeline. Docs -> Preprocessing (Extract metadata, Index content, Infer semantic types) -> Hypotheses Generation (Text similarity, Image similarity, Domain-specific similarity, Metadata similarity) -> Hypotheses Pruning (Filter temporal incoherence, Similarity thresholds, Domain-specific filtering) -> Aggregation and ranking]

                                 Research Question

    How can one automatically, accurately and efficiently
    reconstruct a plausible provenance of files in a shared folder,
    intended as the sequences of operations connecting the files?

    [Figure: the log entry "[...] You edited the file poster.ppt" is annotated with "How can we get more information?" and expanded into three richer entries:
      "[...] You edited the file poster.ppt adding an image from the file paper.pdf"
      "[...] You edited the file poster.ppt adding [...] paragraphs from the file techreport.pdf"
      "[...] You edited the file poster.ppt modified a paragraph similar to paper.pdf"]
                         Approach & Methodology

    We propose a multi-signal pipeline approach that reconstructs
    plausible provenance traces using the contents of the files and
    metadata as evidence of the relationships between files.

    [Figure: example provenance trace on a timeline from June to October, linking paper.pdf, techreport.pdf and three poster.ppt versions through "Copy Image", "Copy paragraph" and "Paraphrase paragraph" operations]

    The pipeline consists of four stages, each containing several
    components that can be executed in parallel:

      Docs -> Preprocessing -> Hypotheses Generation -> Hypotheses Pruning -> Aggregation and ranking

      Preprocessing:            Extract metadata
      Hypotheses Generation:    Signal Detectors
      Hypotheses Pruning:       Signal Filters
      Aggregation and ranking:  Aggregators

                         Initial (encouraging) [...]

    We performed an experiment with a sm[...]
    publications, annotated manually by two [...]

    F1-score of 0.49 for only text similarity
    F1-score of 0.70 for the aggregation of va[...]

    [Figure: graph of numbered nodes grouped into Cluster 1: Blood Cultures (EvidenceQ||), Cluster 2: Markers (EvidenceQX) and Cluster 3: General Guideline]

                                   Future work

    Following the planned methodology, we [...]
    components for each of the pipeline ph[...]
    computational efficiency.
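    To make the four-stage pipeline from Approach & Methodology concrete, here is a minimal sketch under stated assumptions. It is not the prototype's code: the Doc record, the two toy detectors (word and character-trigram Jaccard overlap), the 0.3 threshold, the mean aggregation and the use of a thread pool are all illustrative choices, standing in for the Lucene- and Tika-based components. The f1_score helper shows one plausible way to score reconstructed dependencies against manual annotations.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Doc:  # hypothetical record, assumed to be produced by preprocessing
    name: str        # file name
    text: str        # extracted text content
    modified: float  # last-modified timestamp, in seconds


def _jaccard(x: set, y: set) -> float:
    return len(x & y) / len(x | y) if (x | y) else 0.0


def word_similarity(a: Doc, b: Doc) -> float:
    """Toy signal detector: overlap of the word sets of two files."""
    return _jaccard(set(a.text.lower().split()), set(b.text.lower().split()))


def trigram_similarity(a: Doc, b: Doc) -> float:
    """Toy signal detector: overlap of character trigrams."""
    def grams(s: str) -> set:
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)}
    return _jaccard(grams(a.text), grams(b.text))


def reconstruct(docs, detectors, threshold=0.3):
    """Return (source, derived) edges ranked by aggregated similarity."""
    # Stage 1, preprocessing: assumed done, Doc already holds text and metadata.
    pairs = list(permutations(docs, 2))

    # Stage 2, hypotheses generation: every detector scores every pair;
    # the detectors are independent, so they run in parallel.
    def score(detector):
        return {(a.name, b.name): detector(a, b) for a, b in pairs}

    with ThreadPoolExecutor() as pool:
        signals = list(pool.map(score, detectors))

    # Stage 3, hypotheses pruning: drop temporally incoherent pairs (source
    # newer than derived file) and pairs where no signal reaches the threshold.
    kept = [
        (a.name, b.name)
        for a, b in pairs
        if a.modified <= b.modified
        and any(sig[(a.name, b.name)] >= threshold for sig in signals)
    ]

    # Stage 4, aggregation and ranking: mean of all signals, best first.
    ranked = [(e, sum(sig[e] for sig in signals) / len(signals)) for e in kept]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


def f1_score(predicted, gold) -> float:
    """F1 of predicted dependency edges against manually annotated ones."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


docs = [
    Doc("paper.pdf", "provenance reconstruction of shared files", 1.0),
    Doc("poster.ppt", "provenance reconstruction of shared files poster", 2.0),
    Doc("notes.txt", "grocery list milk eggs", 3.0),
]
ranked = reconstruct(docs, [word_similarity, trigram_similarity])
for edge, score in ranked:
    print(edge, round(score, 2))
```

    Averaging the signals is only one possible aggregator; a weighted or learned combination would slot into stage 4 without touching the other stages.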



                                                                                                                                                                                   EvidenceQ||
                                                                                                                                                                                        EvidenceQ||
                                                                                                                                                                                                                         Cluster 2: Markers
                                                                                                                                                                                                                                                            Future work   EvidenceQX
                                                                                                                                                                                                                                                                               EvidenceQX
                                                                                                                                                                                                                                                                                                                                           Cluster 3: General
                                                                                                                                                                                                                                                                                                                                                 Cluster 3: General
                                                                                                                                                                                                                                                                                                                                           Guideline
                                                                                                                                                                                                                                                                                                                                                Guideline




                                                                                                                                                                     !"#$#%&'(
                                                                                                                                                                     !"#$#%&'(
                                                             #$4:2-4#-';'<=>'                                                                                                                                                                                                                            22             22




                                                                                                                                            #$%&'         Following the planned methodology, we w                              23            23                                                     17             17




8$#A'      @1-%1$#-AA)4,'            B&%$0C-A-A'D-4-1+E$4'         B&%$0C-A-A'@1F4)4,'          G,,1-,+E$4'+42'1+4H)4,'
                                                                                                                                                          components for each of the pipeline phas                                                                         15             15                                                                                                 2             2        6




                                    Research Question
                                     Research Question
                                                                                                                                       !            "                                                14                    14       20            20                       18             18             19             19                                                                             4




           ./01+#0'*-0+2+0+''           6),4+7'8-0-#0$1!'              6),4+7'9)70-1!'               G,,1-,+0$1!'
                                                                                                                                                          computational efficiency.
                                                                                                                                       (
                                                                                                                                                                                        13                    13                    16            16                 21              21                                       0        0                                                                        3


                                                                                                                                                 )*+,-'
 !                                                                                                                                                                                                                                                                                                                                1        1




 (
     How can automatically, accurately and efficiently #$4:2-4#-';'<=?'
   How342-/'#$40-40' one automatically, 6),4+7'9)70-1('
         can one 6),4+7'8-0-#0$1('
                                                                                                                                                                                                                                                       24             24

                                                        G,,1-,+0$1('
                                         accurately and efficiently
 "
                                                                                                                                                                                                                                                       Bibliography
                                                                                                                                                                                                                   5                     5




                                                                     #$%&'
   reconstruct a a plausible provenance of files ===' a shared folder,
     reconstruct plausible provenance of files in in shared folder,
            5'               5'              5'               a                                                                                                                                                                                                                                                                                                    23                   23




                                                                                                                                                                     )"*+#,-*+(
                                                                                                                                                                     )"*+#,-*+(
                                                                                                                                                                                                                                                                20              20                                                                            17                   17




   intended as the sequences ofof operations connecting the!files?
     intended as the sequences operations connecting the files?
                                                                                                                                                                                                                                                                                                                             19       19




                                                                           "                                                                                                                         4                     4                                                                                                                   15                   15




                                                                                                                                                            (1) Sara Magliacane: Reconstructing Prove
                                                                                                                                                                               3             3                                                                                                                                                                                                   14            14




                                                                                                                                             (
                                                                                                                                                                                    2                     2                                                                                                                                          18                   18




                                                                                                                                                            Consortium 2012
                                                                                                                                                                                                 6                     6                                                                            22             22




                                                                                                                                                                                                                                                                                                                                                                                        21            21



                                                                                                                                                                                                                                                                                               16             16




                                                                                                                                                                                                                                                            0              0                                                                             13                   13




        The research methodology is an iterative process, that will                                                                                         (2) Paul Groth, Yolanda Gil, Sara Magliacan
                                                                                                                                                                                                                                                                                1              1




                            Approach &&Methodology
                             Approach Methodology
        incrementally integrate existing approaches in literature and                                                                                       Annotation through Reconstructing Provena
                                                                                                                                                                                        Cluster 1: BloodBlood Cultures
                                                                                                                                                                                             Cluster 1: Cultures

                                                                                                                                                            Workshop on the role of Semantic Web in P
                                                                                                                                                                                        EvidenceQ||
                                                                                                                                                                                             EvidenceQ||
                                                                                                                                                                                                                                                                                                         Cluster 2: Markers
                                                                                                                                                                                                                                                                                                              Cluster 2: Markers
                                                                                                                                                                                                                                                                                                         EvidenceQX
                                                                                                                                                                                                                                                                                                              EvidenceQX
                                                                                                                                                                                                                                                                                                                                                    24                   24




                                                                                                                                                                                                                                                                                                                                                                                                      Cluste
                                                                                                                                                                                                                                                                                                                                                                                                      Guide
                                                                                                                                                                                                                                                                                                                                                                                                           C
                                                                                                                                                                                                                                                                                                                                                                                                           G

        evaluate the performance on benchmark corpora.
                                                                                                                                                            ESWC 2012
     We propose a a multi-signal pipeline approach that reconstructs
       We propose multi-signal pipeline approach that reconstructs                                                                                        F1-score ofof 0.49 for only text similarity
                                                                                                                                                           F1-score 0.49 for only text similarity
     plausible provenance traces using the contents ofof the files and
       plausible provenance traces using the contents the files and                                                                                        F1-score ofof 0.70 for the aggregation of v
                                                                                                                                                           F1-score 0.70 for the aggregation of va
     metadata as evidence ofof the relationships between files.
       metadata as evidence the relationships between files.

     The pipeline consists ofof four stages, each containing several
       The pipeline consists four stages, each containing several
     components that can be executed in in parallel:
       components that can be executed parallel:
                                                                                                                                                                                                                                                            Future work
                                                                                                                                                                                                                                                             Future work
                                                              #$4:2-4#-';'<=>'
                                                                 #$4:2-4#-';'<=>'

                                                                                                                                            #$%&'
                                                                                                                                                #$%&'     Following the planned methodology, we w
                                                                                                                                                            Following the planned methodology, we
8$#A'
   8$#A'   @1-%1$#-AA)4,'
               @1-%1$#-AA)4,'           B&%$0C-A-A'D-4-1+E$4' B&%$0C-A-A'@1F4)4,'
                                     B&%$0C-A-A'D-4-1+E$4'       B&%$0C-A-A'@1F4)4,'            G,,1-,+E$4'+42'1+4H)4,'
                                                                                                    G,,1-,+E$4'+42'1+4H)4,'
                                                                                                                                       ! ! " "            components for each ofof the pipeline ph
                                                                                                                                                            components for each the pipeline phas
           ./01+#0'*-0+2+0+''
               ./01+#0'*-0+2+0+''       6),4+7'8-0-#0$1!'
                                            6),4+7'8-0-#0$1!'          6),4+7'9)70-1!'
                                                                           6),4+7'9)70-1!'           G,,1-,+0$1!'
                                                                                                         G,,1-,+0$1!'
                                                                                                                                                          computational efficiency.
                                                                                                                                                            computational efficiency.
                                                                                                                                       ( (
isors: Paul Groth and Frank van Harmelen



nt                                                      An initial prototype implementation
adata describing how,                             As a first step we focus on dependencies between files instead of
duced.                                            sequences of operations.

t often it is not tracked,                        We implemented a prototype of the pipeline using open-source
sic filesystem                                    components, like Apache Lucene, Apache Tika and Dropbox API.
                                                  As signal detectors we used well-known similarity measures.

ovenance post hoc?                           <2,4%   C*.7*2,.4491;%                   D672)A.4.4%E.1.*+521%                          D672)A.4.4%C*F191;%                             G;;*.;+521%+1/%*+1H91;%
                                                                                                                                                                                                                   !#$%


                                                                                                                                          @9:).*%).-72*+:%                                                     !          "
                                                     '()*+,)%-.)+/+)+%%                      8.()%49-9:+*9)6%                                                                             I.9;A)./%BF-%
                                                                                                                                           91,2A.*.1,.%
                                              !
<7"#A,#8,/#                                                                                                                                    B9-9:+*9)6%
                                                                                                                                                                                                               &      $#"%
                                              &       01/.(%,21).1)%                         0-+;.%49-9:+*9)6%
#.":*597B*"C#                                                                                                                                 )A*.4A2:/4%
                                              "
                                                      013.*%4.-+15,%                         <2-+91=47.,9>,%                              <2-+91=47.,9>,%
                                                           )67.4%                              49-9:+*9)6%                                   >:).*91;%

563-:6#################                                                                           ?.)+/+)+%
                                                                                                  49-9:+*9)6%
,<05,3*5/63-:6#

3,563-:6#




          9"$"!+$"-#:
          !"$"8$"!+(
                                                                     Initial (encouraging) results
    #          #          #></*?,5#
                                                  We performed an experiment with a small set of biomedical
,-)#$%!!)(                     !,-)#$%!!)(        publications, annotated manually by two domain experts.
 31(                           ./21(


                                                                          Cluster 1: Blood Cultures             Cluster 2: Markers              Cluster 3: General
                                                                          EvidenceQ||                           EvidenceQX                      Guideline
                                                             !"#$#%&'(




                                                                                                                                22




                                                                                        23                                 17




                                                                                                                 15                                           2              6                   7




on                                                                          13
                                                                                 14          20




                                                                                             16            21
                                                                                                                 18             19




                                                                                                                                      0




                                                                                                                                          1
                                                                                                                                                                     4




                                                                                                                                                                         3       5
                                                                                                                                                                                             8




                                                                                                                                                                                                     9    10




                                                                                                                                                                                                         11




                                                                                                      24                                                                             12
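
The poster describes a pipeline that scores candidate dependencies between files with several similarity signals, prunes implausible candidates, and aggregates the signals into a ranked list. The following is a minimal sketch of that idea under stated assumptions, not the prototype itself (which the poster says was built on Apache Lucene, Apache Tika and the Dropbox API). The `Doc` class, the two toy similarity functions, the weights and the threshold are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Doc:
    """Hypothetical file record: name, extracted text, modification time."""
    name: str
    text: str
    modified: float  # filesystem timestamp, seconds since epoch


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def text_similarity(a: Doc, b: Doc) -> float:
    """Toy text signal: Jaccard overlap of lowercase word sets."""
    return jaccard(set(a.text.lower().split()), set(b.text.lower().split()))


def name_similarity(a: Doc, b: Doc) -> float:
    """Toy metadata signal: Jaccard overlap of file-name character bigrams."""
    def bigrams(s: str) -> set:
        return {s[i:i + 2] for i in range(len(s) - 1)}
    return jaccard(bigrams(a.name.lower()), bigrams(b.name.lower()))


# Assumed weights and threshold; these are NOT values from the poster.
SIGNALS = [(text_similarity, 0.7), (name_similarity, 0.3)]
THRESHOLD = 0.2


def reconstruct(docs):
    """Return (source, derived, confidence) hypotheses, best first."""
    ranked = []
    for src, dst in permutations(docs, 2):
        # Pruning: a file cannot be derived from a file modified later.
        if src.modified > dst.modified:
            continue
        # Generation and aggregation: weighted sum of all signals.
        confidence = sum(w * signal(src, dst) for signal, w in SIGNALS)
        # Pruning: drop hypotheses below the similarity threshold.
        if confidence >= THRESHOLD:
            ranked.append((src.name, dst.name, confidence))
    return sorted(ranked, key=lambda h: h[2], reverse=True)
```

Calling `reconstruct` on a list of `Doc` objects yields dependency hypotheses ordered by confidence. A weighted sum keeps each signal independent, so detectors can run in parallel and new signals can be added by extending `SIGNALS`.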
ISWC DC poster "Reconstructing Provenance"
ISWC DC poster "Reconstructing Provenance"
ISWC DC poster "Reconstructing Provenance"

More Related Content

Similar to ISWC DC poster "Reconstructing Provenance"

Haiku licence experience - fossa2010
Haiku licence experience - fossa2010Haiku licence experience - fossa2010
Haiku licence experience - fossa2010
fOSSa - Free Open Source Software Academia Conference
 
Recomendação de Conteúdo para Redes Sociais Educativas
ISWC DC poster "Reconstructing Provenance"

  • 1. Reconstructing Provenance
Sara Magliacane - VU University Amsterdam
Advisors: Paul Groth and Frank van Harmelen

Problem Statement

The provenance of a data item is the metadata describing how, when and by whom the data item was produced.

Provenance is crucial in many settings, but often it is not tracked, resulting in collections of files with only basic filesystem metadata, e.g. timestamps.

In this case, is it possible to reconstruct provenance post hoc?

[Figure: a file-activity log on a June to October timeline involving poster.ppt, paper.pdf and techreport.pdf. A bare entry ("... you edited the file poster.ppt") raises the question "How can we get more information?". The reconstructed entries read "... you edited the file poster.ppt adding an image from the file paper.pdf", "... adding paragraphs from the file techreport.pdf" and "... you modified a paragraph similar to paper.pdf".]

Research Question

How can one automatically, accurately and efficiently reconstruct a plausible provenance of files in a shared folder, intended as the sequences of operations connecting the files?

Approach & Methodology

We propose a multi-signal pipeline approach that reconstructs plausible provenance traces using the contents of the files and metadata as evidence of the relationships between files.

The pipeline consists of four stages, each containing several components that can be executed in parallel:

[Figure: Docs -> Preprocessing (Extract metadata, Index content, ...) -> Hypotheses Generation (Signal Detector 1, Signal Detector 2, ...) -> Hypotheses Pruning (Signal Filter 1, Signal Filter 2, ...) -> Aggregation and ranking (Aggregator 1, Aggregator 2, ...). The output is a ranked list of hypotheses such as "copy" or "image", each with a confidence value.]

The research methodology is an iterative process, that will incrementally integrate existing approaches in literature and evaluate the performance on benchmark corpora.

An initial prototype implementation

As a first step we focus on dependencies between files instead of sequences of operations.

We implemented a prototype of the pipeline using open-source components, like Apache Lucene, Apache Tika and Dropbox API. As signal detectors we used well-known similarity measures.

[Figure: prototype pipeline. Docs -> Preprocessing (Extract metadata, Index content, Infer semantic types) -> Hypotheses Generation (Text similarity, Image similarity, Domain-specific similarity, Metadata similarity) -> Hypotheses Pruning (Filter temporal incoherence, Similarity thresholds, Domain-specific filtering) -> Aggregation and ranking (Weighted Sum).]

Initial (encouraging) results

We performed an experiment with a small set of biomedical publications, annotated manually by two domain experts.

[Figure: original and predicted dependency graphs over 25 documents (numbered 0 to 24), grouped into Cluster 1: Blood Cultures (EvidenceQ||), Cluster 2: Markers (EvidenceQX) and Cluster 3: General Guideline.]

F1-score of 0.49 for only text similarity
F1-score of 0.70 for the aggregation of various similarities

Future work

Following the planned methodology, we will explore additional components for each of the pipeline phases and consider also computational efficiency.

Bibliography

(1) Sara Magliacane: Reconstructing Provenance, ISWC Doctoral Consortium 2012
(2) Paul Groth, Yolanda Gil, Sara Magliacane: Automatic Metadata Annotation through Reconstructing Provenance, Third International Workshop on the role of Semantic Web in Provenance Management, ESWC 2012
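
The stages named on the poster (hypothesis generation from similarity signals, pruning by temporal incoherence and similarity thresholds, weighted-sum aggregation, and F1 evaluation against annotated dependencies) can be illustrated with a minimal Python sketch. This is not the authors' prototype, which used Apache Lucene, Apache Tika and the Dropbox API. The `Doc` class, the Jaccard token overlap standing in for text similarity, the `name_similarity` signal, the weights and the threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Doc:
    name: str
    text: str
    modified: float  # filesystem timestamp, seconds since the epoch


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; an assumed stand-in for a text similarity detector."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def name_similarity(a: str, b: str) -> float:
    """Hypothetical metadata signal: 1.0 when the file stems match."""
    return 1.0 if a.rsplit(".", 1)[0] == b.rsplit(".", 1)[0] else 0.0


def reconstruct(docs, weights=(0.8, 0.2), threshold=0.3):
    """Return (source, derived, confidence) hypotheses, best first."""
    ranked = []
    for src, dst in permutations(docs, 2):   # hypothesis generation
        if src.modified > dst.modified:      # pruning: temporal incoherence
            continue
        signals = (jaccard(src.text, dst.text),
                   name_similarity(src.name, dst.name))
        confidence = sum(w * s for w, s in zip(weights, signals))  # weighted sum
        if confidence >= threshold:          # pruning: similarity threshold
            ranked.append((src.name, dst.name, confidence))
    return sorted(ranked, key=lambda h: h[2], reverse=True)


def f1_score(predicted: set, gold: set) -> float:
    """F1 of predicted dependency edges against manually annotated ones."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


docs = [
    Doc("paper.pdf", "provenance reconstruction with a multi signal pipeline", 100.0),
    Doc("poster.ppt", "poster on provenance reconstruction with a multi signal pipeline", 200.0),
    Doc("notes.txt", "shopping list milk eggs", 150.0),
]
hypotheses = reconstruct(docs)
edges = {(src, dst) for src, dst, _ in hypotheses}
print(hypotheses)
```

Each signal is computed independently per file pair, so detectors could be run in parallel as the poster describes; the sequential loop here only keeps the sketch short.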