Slides based on my poster for the ISWC 2012 Doctoral Consortium, "Reconstructing Provenance". They summarize my plans for the next three years of my PhD.
1. Reconstructing Provenance
Sara Magliacane - VU University Amsterdam
Advisors: Paul Groth and Frank van Harmelen

Problem Statement
The provenance of a data item is the metadata describing how, when and by whom the data item was produced.
Provenance is crucial in many settings, but often it is not tracked, resulting in collections of files with only basic filesystem metadata, e.g. timestamps.
In this case, is it possible to reconstruct provenance post hoc?
[Figure: a file event log whose entries all read "you edited the file poster.ppt", with the question "How can we get more information?" and the richer descriptions one would like instead: adding an image from the file paper.pdf, adding paragraphs from the file techreport.pdf, modifying a paragraph similar to paper.pdf. A timeline from June to October relates poster.ppt, paper.pdf and techreport.pdf through edges labelled copy, image, paragraph and paraphrase.]

Research Question
How can one automatically, accurately and efficiently reconstruct a plausible provenance of files in a shared folder, intended as the sequences of operations connecting the files?

Approach & Methodology
We propose a multi-signal pipeline approach that reconstructs plausible provenance traces using the contents of the files and metadata as evidence of the relationships between files.
The pipeline consists of four stages, each containing several components that can be executed in parallel:
[Figure: pipeline from Docs through Preprocessing (Extract metadata, Index content, ...), Hypotheses Generation (several Signal Detectors), Hypotheses Pruning (several Signal Filters) and Aggregation and ranking (several Aggregators). The output is a set of hypotheses such as "copy" or "image", each with a confidence value.]
The research methodology is an iterative process that will incrementally integrate existing approaches from the literature and evaluate their performance on benchmark corpora.

An initial prototype implementation
As a first step we focus on dependencies between files instead of sequences of operations.
We implemented a prototype of the pipeline using open-source components, like Apache Lucene, Apache Tika and the Dropbox API.
As signal detectors we used well-known similarity measures.
[Figure: prototype pipeline. Docs; Preprocessing (Extract metadata, Index content, Infer semantic types); Hypotheses Generation (Text similarity, Image similarity, Domain-specific similarity, Metadata similarity); Hypotheses Pruning (Filter temporal incoherence, Similarity thresholds, Domain-specific filtering); Aggregation and ranking (Weighted Sum).]

Initial (encouraging) results
We performed an experiment with a small set of biomedical publications, annotated manually by two domain experts.
[Figure: "Original" and "Predicted" panels showing documents numbered 0 to 24, grouped into Cluster 1: Blood Cultures (EvidenceQ||), Cluster 2: Markers (EvidenceQX) and Cluster 3: General Guideline.]
F1-score of 0.49 for only text similarity
F1-score of 0.70 for the aggregation of various similarities

Future work
Following the planned methodology, we will explore additional components for each of the pipeline phases and also consider computational efficiency.

Bibliography
(1) Sara Magliacane: Reconstructing Provenance, ISWC Doctoral Consortium 2012
(2) Paul Groth, Yolanda Gil, Sara Magliacane: Automatic Metadata Annotation through Reconstructing Provenance, Third International Workshop on the role of Semantic Web in Provenance Management, ESWC 2012
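
As an illustration of the multi-signal pipeline the poster describes, the following is a minimal Python sketch, not the prototype itself. It assumes two toy signal detectors (a Jaccard word overlap standing in for text similarity, and closeness of modification times standing in for metadata similarity), a temporal-incoherence filter, a similarity threshold and a weighted-sum aggregator. The names `Doc`, `text_similarity`, `metadata_similarity` and `reconstruct`, the weights, the threshold and the sample files are all hypothetical; the actual prototype relied on Apache Lucene, Apache Tika and the Dropbox API.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Doc:
    name: str
    text: str
    modified: float  # last-modified time in days (hypothetical unit)


def text_similarity(a: Doc, b: Doc) -> float:
    """Jaccard overlap of the word sets of two documents."""
    wa, wb = set(a.text.lower().split()), set(b.text.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def metadata_similarity(a: Doc, b: Doc) -> float:
    """Closeness of modification times, mapped to (0, 1]."""
    return 1.0 / (1.0 + abs(a.modified - b.modified))


# Signal detectors and their weights (values chosen for illustration only).
DETECTORS = {text_similarity: 0.8, metadata_similarity: 0.2}


def reconstruct(docs, threshold=0.3):
    """Return (source, derived, confidence) hypotheses, best first."""
    hypotheses = []
    for src, dst in permutations(docs, 2):
        # Pruning (temporal incoherence): a file cannot depend on a
        # file that was modified after it.
        if src.modified > dst.modified:
            continue
        # Hypotheses generation: one score per signal detector.
        scores = {d: d(src, dst) for d in DETECTORS}
        # Aggregation: weighted sum of the individual signals.
        confidence = sum(DETECTORS[d] * s for d, s in scores.items())
        # Pruning (similarity threshold): drop weak hypotheses.
        if confidence >= threshold:
            hypotheses.append((src.name, dst.name, confidence))
    # Ranking: highest confidence first.
    return sorted(hypotheses, key=lambda h: h[2], reverse=True)


docs = [
    Doc("techreport.pdf", "provenance is metadata describing how data was produced", 0.0),
    Doc("paper.pdf", "provenance is metadata describing how a data item was produced", 10.0),
    Doc("poster.ppt", "reconstructing provenance of files in a shared folder", 20.0),
]
result = reconstruct(docs)
for src, dst, conf in result:
    print(f"{dst} depends on {src} (confidence {conf:.2f})")
```

With these toy inputs only the techreport.pdf to paper.pdf dependency survives pruning. Keeping the detectors in a single weighted table makes it easy to add or re-weight signals, which is in the spirit of the poster's plan to explore additional components for each stage.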