SlideShare a Scribd company logo
1 of 43
Download to read offline
Converting Scripts into Reproducible
Workflow Research Objects
Lucas A. M. C. Carvalho, Khalid Belhajjame, Claudia Bauzer Medeiros
lucas.carvalho@ic.unicamp.br
Baltimore, Maryland, USA
October 23-26, 2016
2
Background and Motivation
● Data-Intensive Experiments
– Collection of scripts, programs and (big) data
Papers
3
Background and Motivation
● Data-Intensive Experiments
– Collection of scripts, programs and (big) data
Papers
How to understand,
reproduce or reuse
data and models of
experiments?
4
Background and Motivation
● Data-Intensive Experiments
– Collection of scripts, programs and (big) data
Manual collection and
organization of data provenance
Papers
How to understand,
reproduce or reuse
data and models of
experiments?
5
Background and Motivation
● Script-based experiments
What are the inputs
and outputs?
How to change this
local program for a
similar web service?
Example of script code.
Difficult to
understand, to reuse,
and to reproduce.
6
Background and Motivation
● Scientific Workflows
Example of Scientific Workflow Management System.
7
Create
Understand
Reuse
Reproduce
Overview
8
Create
Understand
Reuse
Reproduce
Overview
+
9
Create
Understand
Reuse
Reproduce
Overview
+
Step 2
Step 1
Step 3
Step 4
Step 5
Methodology
10
Related Work
● Script-language specific.
● Workflow-engine specific.
● A new language is needed.
● Outcome is not an executable workflow.
● Do not collect provenance data of the
conversion process.
11
Two Kind of Experts
● Scientists
– Domain experts who understand the experiment, and
the script (sometimes called user);
● Curators:
– Scientists who are also familiar with workflow and
script programming or;
– Computer scientists who are familiar enough with the
domain to be able to implement our methodology;
– Responsible for authoring, documenting and
publishing workflows and associated resources.
12
Requirements
● Produce workflow-like view of the script.
● Create an executable workflow and compare
execution of workflow and script.
● Modify the workflow resources.
● Record provenance data.
● Aggregate all resources to support
Reproducibility and Reuse.
1
2
3
4
5
13
Requirements
● Produce workflow-like view of the script.1
Activity 1
Port 1 Port 2 Port 3
Port 1 Port 2
Activity 2
Port 3
Port 3
Activity n
Port n
Script-based experiment.
Abstract workflow.
14
Requirements
● Create executable workflow and compare
execution of workflow and script.
2
Executable workflow. Script-based experiment.
15
Requirements
● Modify the workflow resources.3
Local
(a)
(b)
Algorithm A Algorithm B
16
Requirements
● Record provenance data4
Activity 1
Output 1 Output 2
wasGeneratedBy wasGeneratedBy
Sample
used
“2012-06-01”
wasStartedAt
Activity 2
used
LucasWorkflow
Run
wasAssociatedWith
used
17
Requirements
● Aggregate all resources to support
Reproducibility and Reuse.
5
Abstract
workflows
Concrete
workflows
Annotations
Papers and
Reports
Provenance
Authors
Scripts
Data
18
Script
Generate Abstract
Workflow
Generate Abstract
Workflow
Create an
executable workflow
Create an
executable workflow
Refine workflowRefine workflow
Bundle Resources into
a Research Object
Bundle Resources into
a Research Object
Annotate and
check quality
Annotate and
check quality
Abstract
workflow
Concrete
workflow
2
1
3
4
5
Methodology
19
Workflow Research Object (WRO)
● Research Objects are
semantically rich
aggregations of resources
that bring together data,
methods and people in
scientific investigations.
● WROs encapsulate scientific
workflows and additional
information regarding their
context and resources.
Research Object Model
20
Running Example
● Molecular Dynamics Simulations
– Many branches of material sciences, computational
engineering, physics and chemistry.
– Scripts (shell script), programs (NAMD, VMD, Fortran)
– Phases: set up, simulation and analysis of trajectories.
– Inputs: protein structure, simulation parameters and
force field files.
– Output: trajectories and analysis results.
21
Step
Generate Abstract Workflow
1
Script code.
22
Step
Generate Abstract Workflow
1
Manually
annotate
Script code.
Annotated script code.
23
Step
Generate Abstract Workflow
1
Manually
annotate
Create
workflow-like
view
Script code.
Annotated script code.
Abstract workflow.
24
Step
Generate Abstract Workflow
1
code blocks
Input/ouput
YesWorkflow
McPhillips et. al, 2015
- Code comments
- Tags:
● @begin
● @end
● @desc
● @in
● @out
● ...
T. McPhillips et al. (2015), “Yesworkflow: A user-oriented, language-
independent tool for recovering workflow information from scripts,”
International Journal of Digital Curation, vol. 10, no. 1, pp. 298–313, 2015.
Create
Workflow-like
view
Abstract workflow.
Annotated script code.
25
Step
Generate Abstract Workflow
1
Create
Workflow-like
view
Abstract workflow.
Annotated script code.
26
Step
Create an executable workflow
2
Abstract workflow.
27
Step
Create an executable workflow
2
Create implementation
of activities
Copy code blocks from
the script.
Abstract workflow.
Executable workflow.
28
Step
Create an executable workflow
2
Create implementation
of activities
Copy code blocks from
the script.
Abstract workflow.
Executable workflow.
29
Step
Create an executable workflow
2
Create implementation
of activities
Copy code blocks from
the script.
Abstract workflow.
Executable workflow.
Script code.
30
Step
Refine executable workflow
3
Modify resources:
● Algorithms
● Data Sets
● Parallelization
● Web Services
● ...
Executable workflow.
New workflow version.
31
Step
Refine executable workflow
3
Create new
version
Modify resources:
● Algorithms
● Data Sets
● Parallelization
● Web Services
● ...
Executable workflow.
New workflow version.
32
Steps
Record provenance data: execution traces.
2 3
wasEnactedBy
split
Output 1 Output 2
wasGeneratedBy wasGeneratedBy
Sample
used
“2012-06-01”
wasStartedAt
psgen
used
LucasWorkflow
Run
wasAssociatedWith
used
hasSpecification
W3C PROV
Executable workflow.
33
Steps
Record provenance data: conversion process.
2 3
wasDerivedFrom
wasDerivedFrom
wasDerivedFrom
wasAssociatedWith
CuratorCurator
W3C PROV
Executable workflow.
New workflow version.
Script code.
34
Step
Annotate and check quality
● Annotations describing the workflow.
● Use provenance data
– To check the quality of the conversion process.
● Run checks to verify the soundness of the
workflow.
4
35
Step
Annotate and check quality
4
Script code.
Executable workflow.
36
Step
Annotate and check quality
4
Workflow version.
Initial Executable workflow.
37
Step
Annotate and check quality
● Common mistakes during the conversion:
– not clearly identified the main logical processing
units in the script;
– a mistake when migrating script code into the
corresponding activity;
– not provided the correct input files and parameters;
– the coding of the workflow itself contained errors.
4
38
Step
Bundle Resources into a Research Object
5
Script Abstract
workflow
Concrete
workflow(s)
Annotations
Paper
Provenance
Data
Attributions
39
Contributions
● A methodology that guides curators in a
principled manner to transform scripts into
reproducible and reusable WRO;
● This addresses an important issue in the area
of script provenance;
40
Conclusions
● We addressed issues wrt understanding, reuse and
reproducibility of script-based experiments.
● The methodology created was:
– elaborated based on requirements;
– showcased via a real world use case from the field of Molecular
Dynamics;
● We exploited tools and standards from the scientific
community:
– Scientific Workflows, YesWorkflow, Research Objects, the W3C
PROV recommendations and the Web Annotation Data Model.
● The bundle is available at http://w3id.org/w2share/s2rwro/
41
Next Steps
● Evaluation using other case studies;
● Evaluation of the cost of the effectiveness of
our methodology;
● Extension of YesWorkflow to support the
semantic annotation of blocks;
● Implementation of tools.
42
Acknowledgments
● FAPESP (grant # 2014/23861-4)
● CCES/CEPID (grant # 2013/08293-7)
– Center for Computational Engineering & Sciences
● LIS (Laboratory of Information Systems)
● Prof. Munir Skaf and his group from Institute of
Chemistry - Unicamp.
Converting Scripts into Reproducible
Workflow Research Objects
Lucas A. M. C. Carvalho, Khalid Belhajjame, Claudia Bauzer Medeiros
lucas.carvalho@ic.unicamp.br
Baltimore, Maryland, USA
October 23-26, 2016

More Related Content

What's hot

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesdgarijo
 
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...Axel Reichwein
 
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...OpenAIRE
 
Improving the chemistry content of Wikipedia using workflow tools
Improving the chemistry content of Wikipedia using workflow toolsImproving the chemistry content of Wikipedia using workflow tools
Improving the chemistry content of Wikipedia using workflow toolsMitch Miller
 
Achieving the digital thread through PLM and ALM integration using oslc
Achieving the digital thread through PLM and ALM integration using oslcAchieving the digital thread through PLM and ALM integration using oslc
Achieving the digital thread through PLM and ALM integration using oslcAxel Reichwein
 
Open Services for Lifecycle Collaboration (OSLC)
Open Services for Lifecycle Collaboration (OSLC) Open Services for Lifecycle Collaboration (OSLC)
Open Services for Lifecycle Collaboration (OSLC) Axel Reichwein
 
Research Object Community Update
Research Object Community UpdateResearch Object Community Update
Research Object Community UpdateCarole Goble
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...dgarijo
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationdgarijo
 
Introduction to Open Services for Lifecycle Collaboration (OSLC)
Introduction to Open Services for Lifecycle Collaboration (OSLC)Introduction to Open Services for Lifecycle Collaboration (OSLC)
Introduction to Open Services for Lifecycle Collaboration (OSLC)Axel Reichwein
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasetsdgarijo
 
FAIRness through a novel combination of Web technologies
FAIRness through a novel combination of Web technologiesFAIRness through a novel combination of Web technologies
FAIRness through a novel combination of Web technologiesResearch Data Alliance
 
Standard Web APIs for Multidisciplinary Collaboration
Standard Web APIs for Multidisciplinary CollaborationStandard Web APIs for Multidisciplinary Collaboration
Standard Web APIs for Multidisciplinary CollaborationAxel Reichwein
 
2017 06-01-eswc2017-ug
2017 06-01-eswc2017-ug2017 06-01-eswc2017-ug
2017 06-01-eswc2017-ugMonika Solanki
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadatadgarijo
 
OSLC & The Future of Interoperability
OSLC & The Future of InteroperabilityOSLC & The Future of Interoperability
OSLC & The Future of InteroperabilityKoneksys
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSValery Tkachenko
 

What's hot (20)

The CIARD RINGValeri
The CIARD RINGValeriThe CIARD RINGValeri
The CIARD RINGValeri
 
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
 
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
 
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
 
Improving the chemistry content of Wikipedia using workflow tools
Improving the chemistry content of Wikipedia using workflow toolsImproving the chemistry content of Wikipedia using workflow tools
Improving the chemistry content of Wikipedia using workflow tools
 
Achieving the digital thread through PLM and ALM integration using oslc
Achieving the digital thread through PLM and ALM integration using oslcAchieving the digital thread through PLM and ALM integration using oslc
Achieving the digital thread through PLM and ALM integration using oslc
 
Open Services for Lifecycle Collaboration (OSLC)
Open Services for Lifecycle Collaboration (OSLC) Open Services for Lifecycle Collaboration (OSLC)
Open Services for Lifecycle Collaboration (OSLC)
 
Research Object Community Update
Research Object Community UpdateResearch Object Community Update
Research Object Community Update
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
 
Introduction to Open Services for Lifecycle Collaboration (OSLC)
Introduction to Open Services for Lifecycle Collaboration (OSLC)Introduction to Open Services for Lifecycle Collaboration (OSLC)
Introduction to Open Services for Lifecycle Collaboration (OSLC)
 
Coming to terms to FAIR semantics
Coming to terms to FAIR semanticsComing to terms to FAIR semantics
Coming to terms to FAIR semantics
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasets
 
FAIRness through a novel combination of Web technologies
FAIRness through a novel combination of Web technologiesFAIRness through a novel combination of Web technologies
FAIRness through a novel combination of Web technologies
 
Standard Web APIs for Multidisciplinary Collaboration
Standard Web APIs for Multidisciplinary CollaborationStandard Web APIs for Multidisciplinary Collaboration
Standard Web APIs for Multidisciplinary Collaboration
 
2017 06-01-eswc2017-ug
2017 06-01-eswc2017-ug2017 06-01-eswc2017-ug
2017 06-01-eswc2017-ug
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
 
OSLC & The Future of Interoperability
OSLC & The Future of InteroperabilityOSLC & The Future of Interoperability
OSLC & The Future of Interoperability
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTS
 

Viewers also liked

元OracleMasterPlatinumがCloudSpanner触ってみた
元OracleMasterPlatinumがCloudSpanner触ってみた元OracleMasterPlatinumがCloudSpanner触ってみた
元OracleMasterPlatinumがCloudSpanner触ってみたKumano Ryo
 
2017年春のPerl
2017年春のPerl2017年春のPerl
2017年春のPerlcharsbar
 
Blenderで学ぶスカルプト勉強会
Blenderで学ぶスカルプト勉強会Blenderで学ぶスカルプト勉強会
Blenderで学ぶスカルプト勉強会kurosaurus
 
The Next Generation of AI and Deep Learning - GTC17
The Next Generation of AI and Deep Learning - GTC17The Next Generation of AI and Deep Learning - GTC17
The Next Generation of AI and Deep Learning - GTC17NVIDIA
 
55 New Features in JDK 9
55 New Features in JDK 955 New Features in JDK 9
55 New Features in JDK 9Simon Ritter
 
15 Tips for Compelling Company Updates on LinkedIn
15 Tips for Compelling Company Updates on LinkedIn15 Tips for Compelling Company Updates on LinkedIn
15 Tips for Compelling Company Updates on LinkedInLinkedIn
 
The Marketer's Guide To Customer Interviews
The Marketer's Guide To Customer InterviewsThe Marketer's Guide To Customer Interviews
The Marketer's Guide To Customer InterviewsGood Funnel
 
The Be-All, End-All List of Small Business Tax Deductions
The Be-All, End-All List of Small Business Tax DeductionsThe Be-All, End-All List of Small Business Tax Deductions
The Be-All, End-All List of Small Business Tax DeductionsWagepoint
 
ELSA France "Teaching is us!"
ELSA France "Teaching is us!" ELSA France "Teaching is us!"
ELSA France "Teaching is us!" Adrian Scarlett
 
Sharing fängt mit geben an - Interview im HR Performance Magazin
Sharing fängt mit geben an - Interview im HR Performance MagazinSharing fängt mit geben an - Interview im HR Performance Magazin
Sharing fängt mit geben an - Interview im HR Performance MagazinHarald Schirmer
 
Introduction to computer virus
Introduction to computer virusIntroduction to computer virus
Introduction to computer virusYouQue ™
 
Powering of bangladesh- Vision 2021
Powering of bangladesh- Vision 2021Powering of bangladesh- Vision 2021
Powering of bangladesh- Vision 2021Mukhlasur Rahman
 
Luchtvaartmaatschappijen: al miljoen euro Brusselse boetes
Luchtvaartmaatschappijen: al miljoen euro Brusselse boetesLuchtvaartmaatschappijen: al miljoen euro Brusselse boetes
Luchtvaartmaatschappijen: al miljoen euro Brusselse boetesThierry Debels
 
Sustainability Day Leeds 2017
Sustainability Day Leeds 2017Sustainability Day Leeds 2017
Sustainability Day Leeds 20174 All of Us
 
Gramatrix: El despertar (I)
Gramatrix: El despertar (I)Gramatrix: El despertar (I)
Gramatrix: El despertar (I)guest020711
 

Viewers also liked (19)

元OracleMasterPlatinumがCloudSpanner触ってみた
元OracleMasterPlatinumがCloudSpanner触ってみた元OracleMasterPlatinumがCloudSpanner触ってみた
元OracleMasterPlatinumがCloudSpanner触ってみた
 
2017年春のPerl
2017年春のPerl2017年春のPerl
2017年春のPerl
 
Blenderで学ぶスカルプト勉強会
Blenderで学ぶスカルプト勉強会Blenderで学ぶスカルプト勉強会
Blenderで学ぶスカルプト勉強会
 
The Next Generation of AI and Deep Learning - GTC17
The Next Generation of AI and Deep Learning - GTC17The Next Generation of AI and Deep Learning - GTC17
The Next Generation of AI and Deep Learning - GTC17
 
55 New Features in JDK 9
55 New Features in JDK 955 New Features in JDK 9
55 New Features in JDK 9
 
Technology Vision 2017 - Overview
Technology Vision 2017 - OverviewTechnology Vision 2017 - Overview
Technology Vision 2017 - Overview
 
15 Tips for Compelling Company Updates on LinkedIn
15 Tips for Compelling Company Updates on LinkedIn15 Tips for Compelling Company Updates on LinkedIn
15 Tips for Compelling Company Updates on LinkedIn
 
The Marketer's Guide To Customer Interviews
The Marketer's Guide To Customer InterviewsThe Marketer's Guide To Customer Interviews
The Marketer's Guide To Customer Interviews
 
The Be-All, End-All List of Small Business Tax Deductions
The Be-All, End-All List of Small Business Tax DeductionsThe Be-All, End-All List of Small Business Tax Deductions
The Be-All, End-All List of Small Business Tax Deductions
 
ELSA France "Teaching is us!"
ELSA France "Teaching is us!" ELSA France "Teaching is us!"
ELSA France "Teaching is us!"
 
Delhi State Report - February 2017
Delhi State Report - February 2017Delhi State Report - February 2017
Delhi State Report - February 2017
 
Sharing fängt mit geben an - Interview im HR Performance Magazin
Sharing fängt mit geben an - Interview im HR Performance MagazinSharing fängt mit geben an - Interview im HR Performance Magazin
Sharing fängt mit geben an - Interview im HR Performance Magazin
 
Introduction to computer virus
Introduction to computer virusIntroduction to computer virus
Introduction to computer virus
 
Powering of bangladesh- Vision 2021
Powering of bangladesh- Vision 2021Powering of bangladesh- Vision 2021
Powering of bangladesh- Vision 2021
 
Retrato do Brasil 2015
Retrato do Brasil 2015Retrato do Brasil 2015
Retrato do Brasil 2015
 
Nuts and Bolts of GMOs - Harold Trick
Nuts and Bolts of GMOs - Harold TrickNuts and Bolts of GMOs - Harold Trick
Nuts and Bolts of GMOs - Harold Trick
 
Luchtvaartmaatschappijen: al miljoen euro Brusselse boetes
Luchtvaartmaatschappijen: al miljoen euro Brusselse boetesLuchtvaartmaatschappijen: al miljoen euro Brusselse boetes
Luchtvaartmaatschappijen: al miljoen euro Brusselse boetes
 
Sustainability Day Leeds 2017
Sustainability Day Leeds 2017Sustainability Day Leeds 2017
Sustainability Day Leeds 2017
 
Gramatrix: El despertar (I)
Gramatrix: El despertar (I)Gramatrix: El despertar (I)
Gramatrix: El despertar (I)
 

Similar to Converting scripts into reproducible workflow research objects

Detecting common scientific workflow fragments using templates and execution ...
Detecting common scientific workflow fragments using templates and execution ...Detecting common scientific workflow fragments using templates and execution ...
Detecting common scientific workflow fragments using templates and execution ...dgarijo
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsdgarijo
 
Semantically enhanced quality assurance in the jurion business use case
Semantically enhanced quality assurance in the jurion  business use caseSemantically enhanced quality assurance in the jurion  business use case
Semantically enhanced quality assurance in the jurion business use caseDimitris Kontokostas
 
Reproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects helpReproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects helpCarole Goble
 
American Century (Revolution Analytics Customer Day)
American Century (Revolution Analytics Customer Day)American Century (Revolution Analytics Customer Day)
American Century (Revolution Analytics Customer Day)Revolution Analytics
 
"Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications""Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications"Pinar Alper
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in ProductionDataWorks Summit
 
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, RomeWorkflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, RomeCarole Goble
 
Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015dgarijo
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...Robert Grossman
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 
Replication and Benchmarking in Software Analytics
Replication and Benchmarking in Software AnalyticsReplication and Benchmarking in Software Analytics
Replication and Benchmarking in Software AnalyticsUniversity of Zurich
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityOscar Corcho
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsGaignard Alban
 
A task-based scientific paper recommender system for literature review and ma...
A task-based scientific paper recommender system for literature review and ma...A task-based scientific paper recommender system for literature review and ma...
A task-based scientific paper recommender system for literature review and ma...Aravind Sesagiri Raamkumar
 
Proof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityProof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityOpen Cyber University of Korea
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
20150814 Wrangling Data From Raw to Tidy vs
20150814 Wrangling Data From Raw to Tidy vs20150814 Wrangling Data From Raw to Tidy vs
20150814 Wrangling Data From Raw to Tidy vsIan Feller
 

Similar to Converting scripts into reproducible workflow research objects (20)

Detecting common scientific workflow fragments using templates and execution ...
Detecting common scientific workflow fragments using templates and execution ...Detecting common scientific workflow fragments using templates and execution ...
Detecting common scientific workflow fragments using templates and execution ...
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
03_aiops-1.pptx
03_aiops-1.pptx03_aiops-1.pptx
03_aiops-1.pptx
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
 
Semantically enhanced quality assurance in the jurion business use case
Semantically enhanced quality assurance in the jurion  business use caseSemantically enhanced quality assurance in the jurion  business use case
Semantically enhanced quality assurance in the jurion business use case
 
Reproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects helpReproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects help
 
American Century (Revolution Analytics Customer Day)
American Century (Revolution Analytics Customer Day)American Century (Revolution Analytics Customer Day)
American Century (Revolution Analytics Customer Day)
 
"Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications""Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications"
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, RomeWorkflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
 
Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Replication and Benchmarking in Software Analytics
Replication and Benchmarking in Software AnalyticsReplication and Benchmarking in Software Analytics
Replication and Benchmarking in Software Analytics
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
A task-based scientific paper recommender system for literature review and ma...
A task-based scientific paper recommender system for literature review and ma...A task-based scientific paper recommender system for literature review and ma...
A task-based scientific paper recommender system for literature review and ma...
 
Proof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityProof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics Interoperability
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
20150814 Wrangling Data From Raw to Tidy vs
20150814 Wrangling Data From Raw to Tidy vs20150814 Wrangling Data From Raw to Tidy vs
20150814 Wrangling Data From Raw to Tidy vs
 

More from Khalid Belhajjame

Lineage-Preserving Anonymization of the Provenance of Collection-Based Workflows
Lineage-Preserving Anonymization of the Provenance of Collection-Based WorkflowsLineage-Preserving Anonymization of the Provenance of Collection-Based Workflows
Lineage-Preserving Anonymization of the Provenance of Collection-Based WorkflowsKhalid Belhajjame
 
Privacy-Preserving Data Analysis Workflows for eScience
Privacy-Preserving Data Analysis Workflows for eSciencePrivacy-Preserving Data Analysis Workflows for eScience
Privacy-Preserving Data Analysis Workflows for eScienceKhalid Belhajjame
 
Linking the prospective and retrospective provenance of scripts
Linking the prospective and retrospective provenance of scriptsLinking the prospective and retrospective provenance of scripts
Linking the prospective and retrospective provenance of scriptsKhalid Belhajjame
 
Introduction to ProvBench @ Provenance Week 2014
Introduction to ProvBench @ Provenance Week 2014Introduction to ProvBench @ Provenance Week 2014
Introduction to ProvBench @ Provenance Week 2014Khalid Belhajjame
 
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...
Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...Khalid Belhajjame
 
Detecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow ResultsDetecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow ResultsKhalid Belhajjame
 
Research Object Model in Sepublica
Research Object Model in SepublicaResearch Object Model in Sepublica
Research Object Model in SepublicaKhalid Belhajjame
 
Case studyworkshoponprovenance
Case studyworkshoponprovenanceCase studyworkshoponprovenance
Case studyworkshoponprovenanceKhalid Belhajjame
 
Intégration incrémentale de données (Valenciennes juin 2010)
Intégration incrémentale de données (Valenciennes juin 2010)Intégration incrémentale de données (Valenciennes juin 2010)
Intégration incrémentale de données (Valenciennes juin 2010)Khalid Belhajjame
 

More from Khalid Belhajjame (20)

Provenance witha purpose
Provenance witha purposeProvenance witha purpose
Provenance witha purpose
 
Lineage-Preserving Anonymization of the Provenance of Collection-Based Workflows
Lineage-Preserving Anonymization of the Provenance of Collection-Based WorkflowsLineage-Preserving Anonymization of the Provenance of Collection-Based Workflows
Lineage-Preserving Anonymization of the Provenance of Collection-Based Workflows
 
Privacy-Preserving Data Analysis Workflows for eScience
Privacy-Preserving Data Analysis Workflows for eSciencePrivacy-Preserving Data Analysis Workflows for eScience
Privacy-Preserving Data Analysis Workflows for eScience
 
Irpb workshop
Irpb workshopIrpb workshop
Irpb workshop
 
Aussois bda-mdd-2018
Aussois bda-mdd-2018Aussois bda-mdd-2018
Aussois bda-mdd-2018
 
Anr cair meeting feb 2016
Anr cair meeting feb 2016Anr cair meeting feb 2016
Anr cair meeting feb 2016
 
Ikc 2015
Ikc 2015Ikc 2015
Ikc 2015
 
Linking the prospective and retrospective provenance of scripts
Linking the prospective and retrospective provenance of scriptsLinking the prospective and retrospective provenance of scripts
Linking the prospective and retrospective provenance of scripts
 
Reproducibility 1
Reproducibility 1Reproducibility 1
Reproducibility 1
 
Introduction to ProvBench @ Provenance Week 2014
Introduction to ProvBench @ Provenance Week 2014Introduction to ProvBench @ Provenance Week 2014
Introduction to ProvBench @ Provenance Week 2014
 
Tapp 2014 (belhajjame)
Tapp 2014 (belhajjame)Tapp 2014 (belhajjame)
Tapp 2014 (belhajjame)
 
Edbt2014 talk
Edbt2014 talkEdbt2014 talk
Edbt2014 talk
 
Credible workshop
Credible workshopCredible workshop
Credible workshop
 
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...
Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...
 
Why Workflows Break
Why Workflows BreakWhy Workflows Break
Why Workflows Break
 
D-prov use-case
D-prov use-caseD-prov use-case
D-prov use-case
 
Detecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow ResultsDetecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow Results
 
Research Object Model in Sepublica
Research Object Model in SepublicaResearch Object Model in Sepublica
Research Object Model in Sepublica
 
Case studyworkshoponprovenance
Case studyworkshoponprovenanceCase studyworkshoponprovenance
Case studyworkshoponprovenance
 
Intégration incrémentale de données (Valenciennes juin 2010)
Intégration incrémentale de données (Valenciennes juin 2010)Intégration incrémentale de données (Valenciennes juin 2010)
Intégration incrémentale de données (Valenciennes juin 2010)
 

Recently uploaded

_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 

Recently uploaded (20)

_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 

Converting scripts into reproducible workflow research objects

  • 1. Converting Scripts into Reproducible Workflow Research Objects Lucas A. M. C. Carvalho, Khalid Belhajjame, Claudia Bauzer Medeiros lucas.carvalho@ic.unicamp.br Baltimore, Maryland, USA October 23-26, 2016
  • 2. 2 Background and Motivation ● Data-Intensive Experiments – Collection of scripts, programs and (big) data Papers
  • 3. 3 Background and Motivation ● Data-Intensive Experiments – Collection of scripts, programs and (big) data Papers How to understand, reproduce or reuse data and models of experiments?
  • 4. 4 Background and Motivation ● Data-Intensive Experiments – Collection of scripts, programs and (big) data Manual collection and organization of data provenance Papers How to understand, reproduce or reuse data and models of experiments?
  • 5. 5 Background and Motivation ● Script-based experiments What are the inputs and outputs? How to change this local program for a similar web service? Example of script code. Difficult to understand, to reuse, and to reproduce.
  • 6. 6 Background and Motivation ● Scientific Workflows Example of Scientific Workflow Management System.
  • 10. 10 Related Work ● Script-language specific. ● Workflow-engine specific. ● A new language is needed. ● Outcome is not an executable workflow. ● Do not collect provenance data of the conversion process.
  • 11. 11 Two Kind of Experts ● Scientists – Domain experts who understand the experiment, and the script (sometimes called user); ● Curators: – Scientists who are also familiar with workflow and script programming or; – Computer scientists who are familiar enough with the domain to be able to implement our methodology; – Responsible for authoring, documenting and publishing workflows and associated resources.
  • 12. 12 Requirements ● Produce workflow-like view of the script. ● Create an executable workflow and compare execution of workflow and script. ● Modify the workflow resources. ● Record provenance data. ● Aggregate all resources to support Reproducibility and Reuse. 1 2 3 4 5
  • 13. 13 Requirements ● Produce workflow-like view of the script.1 Activity 1 Port 1 Port 2 Port 3 Port 1 Port 2 Activity 2 Port 3 Port 3 Activity n Port n Script-based experiment. Abstract workflow.
  • 14. 14 Requirements ● Create executable workflow and compare execution of workflow and script. 2 Executable workflow. Script-based experiment.
  • 15. 15 Requirements ● Modify the workflow resources.3 Local (a) (b) Algorithm A Algorithm B
  • 16. 16 Requirements ● Record provenance data4 Activity 1 Output 1 Output 2 wasGeneratedBy wasGeneratedBy Sample used “2012-06-01” wasStartedAt Activity 2 used LucasWorkflow Run wasAssociatedWith used
  • 17. 17 Requirements ● Aggregate all resources to support Reproducibility and Reuse. 5 Abstract workflows Concrete workflows Annotations Papers and Reports Provenance Authors Scripts Data
  • 18. 18 Script Generate Abstract Workflow Generate Abstract Workflow Create an executable workflow Create an executable workflow Refine workflowRefine workflow Bundle Resources into a Research Object Bundle Resources into a Research Object Annotate and check quality Annotate and check quality Abstract workflow Concrete workflow 2 1 3 4 5 Methodology
  • 19. 19 Workflow Research Object (WRO) ● Research Objects are semantically rich aggregations of resources that bring together data, methods and people in scientific investigations. ● WROs encapsulate scientific workflows and additional information regarding their context and resources. Research Object Model
  • 20. 20 Running Example ● Molecular Dynamics Simulations – Many branches of material sciences, computational engineering, physics and chemistry. – Scripts (shell script), programs (NAMD, VMD, Fortran) – Phases: set up, simulation and analysis of trajectories. – Inputs: protein structure, simulation parameters and force field files. – Output: trajectories and analysis results.
  • 24. 24 Step Generate Abstract Workflow 1 code blocks Input/ouput YesWorkflow McPhillips et. al, 2015 - Code comments - Tags: ● @begin ● @end ● @desc ● @in ● @out ● ... T. McPhillips et al. (2015), “Yesworkflow: A user-oriented, language- independent tool for recovering workflow information from scripts,” International Journal of Digital Curation, vol. 10, no. 1, pp. 298–313, 2015. Create Workflow-like view Abstract workflow. Annotated script code.
  • 26. 26 Step Create an executable workflow 2 Abstract workflow.
  • 27. 27 Step Create an executable workflow 2 Create implementation of activities Copy code blocks from the script. Abstract workflow. Executable workflow.
  • 28. 28 Step Create an executable workflow 2 Create implementation of activities Copy code blocks from the script. Abstract workflow. Executable workflow.
  • 29. 29 Step Create an executable workflow 2 Create implementation of activities Copy code blocks from the script. Abstract workflow. Executable workflow. Script code.
  • 30. 30 Step Refine executable workflow 3 Modify resources: ● Algorithms ● Data Sets ● Parallelization ● Web Services ● ... Executable workflow. New workflow version.
  • 31. 31 Step Refine executable workflow 3 Create new version Modify resources: ● Algorithms ● Data Sets ● Parallelization ● Web Services ● ... Executable workflow. New workflow version.
  • 32. 32 Steps Record provenance data: execution traces. 2 3 wasEnactedBy split Output 1 Output 2 wasGeneratedBy wasGeneratedBy Sample used “2012-06-01” wasStartedAt psgen used LucasWorkflow Run wasAssociatedWith used hasSpecification W3C PROV Executable workflow.
  • 33. 33 Steps Record provenance data: conversion process. 2 3 wasDerivedFrom wasDerivedFrom wasDerivedFrom wasAssociatedWith CuratorCurator W3C PROV Executable workflow. New workflow version. Script code.
  • 34. 34 Step Annotate and check quality ● Annotations describing the workflow. ● Use provenance data – To check the quality of the conversion process. ● Run checks to verify the soundness of the workflow. 4
  • 35. 35 Step Annotate and check quality 4 Script code. Executable workflow.
  • 36. 36 Step Annotate and check quality 4 Workflow version. Initial Executable workflow.
  • 37. 37 Step Annotate and check quality ● Common mistakes during the conversion: – not clearly identified the main logical processing units in the script; – a mistake when migrating script code into the corresponding activity; – not provided the correct input files and parameters; – the coding of the workflow itself contained errors. 4
  • 38. 38 Step Bundle Resources into a Research Object 5 Script Abstract workflow Concrete workflow(s) Annotations Paper Provenance Data Attributions
  • 39. 39 Contributions ● A methodology that guides curators in a principled manner to transform scripts into reproducible and reusable WRO; ● This addresses an important issue in the area of script provenance;
  • 40. 40 Conclusions ● We addressed issues wrt understanding, reuse and reproducibility of script-based experiments. ● The methodology created was: – elaborated based on requirements; – showcased via a real world use case from the field of Molecular Dynamics; ● We exploited tools and standards from the scientific community: – Scientific Workflows, YesWorkflow, Research Objects, the W3C PROV recommendations and the Web Annotation Data Model. ● The bundle is available at http://w3id.org/w2share/s2rwro/
  • 41. 41 Next Steps ● Evaluation using other case studies; ● Evaluation of the cost of the effectiveness of our methodology; ● Extension of YesWorkflow to support the semantic annotation of blocks; ● Implementation of tools.
  • 42. 42 Acknowledgments ● FAPESP (grant # 2014/23861-4) ● CCES/CEPID (grant # 2013/08293-7) – Center for Computational Engineering & Sciences ● LIS (Laboratory of Information Systems) ● Prof. Munir Skaf and his group from Institute of Chemistry - Unicamp.
  • 43. Converting Scripts into Reproducible Workflow Research Objects Lucas A. M. C. Carvalho, Khalid Belhajjame, Claudia Bauzer Medeiros lucas.carvalho@ic.unicamp.br Baltimore, Maryland, USA October 23-26, 2016