SlideShare a Scribd company logo
1 of 12
Reproducible scientific workflows
Tomasz Miksa
Vienna University of Technology
& SBA Research, Austria
Tomasz Miksa tmiksa@sba-research.org
eScience and Research Infrastructures
 Scientists exchange
- facilities
- resources
- services
- datasets
 Research requires
- special tooling and software
- workflows to
• capture
• transform
• visualize
• interpret the data
Tomasz Miksa tmiksa@sba-research.org
Taverna Workflow
Workflows and Context
 ‘Workflows’ can be
- ad hoc commands and scripts
executed manually
- well-structured processes
executed within a controlled environment
 Workflows
- share infrastructure with other processes
- delegate tasks to tools installed in the system
- require specific configurations
- can use distributed systems
#!/bin/bash
# fetch data
java -jar GestBarragensWSClientIQData.jar
unzip -o IQData.zip
# fix encoding
#iconv -f LATIN1 -t UTF-8 iq.r > iq_utf8.r
# generate references
R --vanilla < iq_utf8.r > IQout.txt
# create pdf
pdflatex iq.tex
pdflatex iq.tex
Script
Tomasz Miksa tmiksa@sba-research.org
Reproducibility
 Current studies show very low reproducibility in
- medicine
- economy
- computer science
 Reproducibility requires
- well documented research workflows
- precise information
on the experiment's environment
Tomasz Miksa tmiksa@sba-research.org
Reproducibility
Neuroanatomical studies
 FreeSurfer Software
- cortical thickness and volume of neuroanatomical structures
 Different
- FreeSurfer Versions
• v4.3.1, v4.5.0, v5.0.0
- Workstation
• Mac, Hewlett‐Packard
- Operating system version
• OSX 10.5, OSX 10.6
E. Gronenschild, P. Habets, H. I. L. Jacobs, R. Mengelers, N. Rozendaal, J. van Os, and M. Marcelis, “The effects
of freesurfer version, workstation type, and macintosh operating system version on anatomical volume and cortical
thickness measurements,” 2012.
Tomasz Miksa tmiksa@sba-research.org
Reproducibility
Computer Science
 613 papers in 8 ACM conferences
C. Collberg and T. Proebsting, “Measuring reproducibility in computer systems research,”
2014. [Online]. Available: http://reproducibility.cs.arizona.edu/tr.pdf
Tomasz Miksa tmiksa@sba-research.org
Reproducibility
Computer Science
 E-mail responses from authors
- Wrong version
- Code will be available soon
- Programmer left
- Bad backup practices
- Commercial code
- Proprietary academic code
- Intellectual property
- No intention to release
- …
Variety of solutions
 Workflow systems
 Interactive notebooks
 Virtualisation
 Containers
 Code repositories
 Automated builds
 Service monitoring
 Metadata standards
 Provenance
 Preservation planning
 Repositories
Tomasz Miksa tmiksa@sba-research.org
TIMBUS - Process preservation
 Digital preservation of business processes
 Based on risk management
 Context modelling is the key
Tomasz Miksa tmiksa@sba-research.org
TIMBUS - Context modelling
 Context model
 Automated extractors
 Process execution monitoring
 Service monitoring
TIMBUS - Risk mitigation
strategies
 Metadata and
documentation
 Migration
- File formats
- Storage media
- Alternative services
• Open source service
• In‐housing of services
 Emulation
 Virtualisation
 Mock‐up of systems
Tomasz Miksa tmiksa@sba-research.org
Summary
 Scientific experiments
- workflows for data processing with software dependencies
 Risks affecting reproducibility
- low due to insufficient experiment description
 Solutions for improving reproducibility
- improve data management, sharing and reuse
 TIMBUS approach for process preservation
- based on risk management practices
- using context modelling to evaluate preservation alternatives

More Related Content

Viewers also liked

PERICLES Domain Specific Modelling - ‘Eye of the Storm: Preserving Digital Co...
PERICLES Domain Specific Modelling - ‘Eye of the Storm: Preserving Digital Co...PERICLES Domain Specific Modelling - ‘Eye of the Storm: Preserving Digital Co...
PERICLES Domain Specific Modelling - ‘Eye of the Storm: Preserving Digital Co...PERICLES_FP7
 
Autoevaluacion real
Autoevaluacion realAutoevaluacion real
Autoevaluacion realkevinamores
 
PERICLES Modelling Policies - Acting on Change 2016
PERICLES Modelling Policies - Acting on Change 2016PERICLES Modelling Policies - Acting on Change 2016
PERICLES Modelling Policies - Acting on Change 2016PERICLES_FP7
 
Silabo de invopera2015
Silabo de invopera2015Silabo de invopera2015
Silabo de invopera2015Raul Logroño
 
Filling the Digital Preservation Gap - Acting on Change
Filling the Digital Preservation Gap - Acting on ChangeFilling the Digital Preservation Gap - Acting on Change
Filling the Digital Preservation Gap - Acting on ChangePERICLES_FP7
 
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016Detecting Semantic Drift for ontology maintenance - Acting on Change 2016
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016PERICLES_FP7
 

Viewers also liked (9)

La naturaleza es vida
La naturaleza es vidaLa naturaleza es vida
La naturaleza es vida
 
PERICLES Domain Specific Modelling - ‘Eye of the Storm: Preserving Digital Co...
PERICLES Domain Specific Modelling - ‘Eye of the Storm: Preserving Digital Co...PERICLES Domain Specific Modelling - ‘Eye of the Storm: Preserving Digital Co...
PERICLES Domain Specific Modelling - ‘Eye of the Storm: Preserving Digital Co...
 
Autoevaluacion real
Autoevaluacion realAutoevaluacion real
Autoevaluacion real
 
PERICLES Modelling Policies - Acting on Change 2016
PERICLES Modelling Policies - Acting on Change 2016PERICLES Modelling Policies - Acting on Change 2016
PERICLES Modelling Policies - Acting on Change 2016
 
Silabo de invopera2015
Silabo de invopera2015Silabo de invopera2015
Silabo de invopera2015
 
Filling the Digital Preservation Gap - Acting on Change
Filling the Digital Preservation Gap - Acting on ChangeFilling the Digital Preservation Gap - Acting on Change
Filling the Digital Preservation Gap - Acting on Change
 
Planning of a project
Planning of a projectPlanning of a project
Planning of a project
 
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016Detecting Semantic Drift for ontology maintenance - Acting on Change 2016
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016
 
Type of communication
Type of communicationType of communication
Type of communication
 

Similar to Reproducibile scientific workflows - Acting on Change 2016

Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...Research Data Alliance
 
Time Series Analysis… using an Event Streaming Platform
Time Series Analysis… using an Event Streaming PlatformTime Series Analysis… using an Event Streaming Platform
Time Series Analysis… using an Event Streaming Platformconfluent
 
Time Series Analysis Using an Event Streaming Platform
 Time Series Analysis Using an Event Streaming Platform Time Series Analysis Using an Event Streaming Platform
Time Series Analysis Using an Event Streaming PlatformDr. Mirko Kämpf
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOSQAware GmbH
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009Ian Foster
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaAlluxio, Inc.
 
Data & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureData & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureNiels Naglé
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...GeeksLab Odessa
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational WorkflowsCarole Goble
 
Choosing the right software for your research study : an overview of leading ...
Choosing the right software for your research study : an overview of leading ...Choosing the right software for your research study : an overview of leading ...
Choosing the right software for your research study : an overview of leading ...Merlien Institute
 
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...Rafael Ferreira da Silva
 
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...Codemotion
 
utdallas.edu
utdallas.eduutdallas.edu
utdallas.edubutest
 
utdallas.edu
utdallas.eduutdallas.edu
utdallas.edubutest
 

Similar to Reproducibile scientific workflows - Acting on Change 2016 (20)

H2020-AHTOOLS Use Case 3 Functional Design
H2020-AHTOOLS Use Case 3 Functional DesignH2020-AHTOOLS Use Case 3 Functional Design
H2020-AHTOOLS Use Case 3 Functional Design
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...
 
Data streaming at VRT
Data streaming at VRTData streaming at VRT
Data streaming at VRT
 
Time Series Analysis… using an Event Streaming Platform
Time Series Analysis… using an Event Streaming PlatformTime Series Analysis… using an Event Streaming Platform
Time Series Analysis… using an Event Streaming Platform
 
Time Series Analysis Using an Event Streaming Platform
 Time Series Analysis Using an Event Streaming Platform Time Series Analysis Using an Event Streaming Platform
Time Series Analysis Using an Event Streaming Platform
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOS
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
 
Data & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureData & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architecture
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
 
Choosing the right software for your research study : an overview of leading ...
Choosing the right software for your research study : an overview of leading ...Choosing the right software for your research study : an overview of leading ...
Choosing the right software for your research study : an overview of leading ...
 
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
 
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
utdallas.edu
utdallas.eduutdallas.edu
utdallas.edu
 
utdallas.edu
utdallas.eduutdallas.edu
utdallas.edu
 

More from PERICLES_FP7

Digital Ecosystem and Process Compiler - IDCC17
Digital Ecosystem and Process Compiler - IDCC17Digital Ecosystem and Process Compiler - IDCC17
Digital Ecosystem and Process Compiler - IDCC17PERICLES_FP7
 
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...PERICLES_FP7
 
Technical appraisal and change impact analysis - IDCC17 workshop
Technical appraisal and change impact analysis - IDCC17 workshopTechnical appraisal and change impact analysis - IDCC17 workshop
Technical appraisal and change impact analysis - IDCC17 workshopPERICLES_FP7
 
ForgetIT: human memory inspired Information Model
ForgetIT: human memory inspired Information ModelForgetIT: human memory inspired Information Model
ForgetIT: human memory inspired Information ModelPERICLES_FP7
 
Data quality, preservation and access: a DANS perspective
Data quality, preservation and access: a DANS perspectiveData quality, preservation and access: a DANS perspective
Data quality, preservation and access: a DANS perspectivePERICLES_FP7
 
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...PERICLES_FP7
 
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...PERICLES_FP7
 
Risk assessment for preservation in the active life of complex digital object...
Risk assessment for preservation in the active life of complex digital object...Risk assessment for preservation in the active life of complex digital object...
Risk assessment for preservation in the active life of complex digital object...PERICLES_FP7
 
Technical Appraisal Tool, MICE - Acting on Change 2016
Technical Appraisal Tool, MICE - Acting on Change 2016Technical Appraisal Tool, MICE - Acting on Change 2016
Technical Appraisal Tool, MICE - Acting on Change 2016PERICLES_FP7
 
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...PERICLES Workflow for the automated updating of Digital Ecosystem Models with...
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...PERICLES_FP7
 
Capability gap - Preservation isn't just throwing tools at the problem - Acti...
Capability gap - Preservation isn't just throwing tools at the problem - Acti...Capability gap - Preservation isn't just throwing tools at the problem - Acti...
Capability gap - Preservation isn't just throwing tools at the problem - Acti...PERICLES_FP7
 
Automatic policy application and change management - Acting on Change 2016
Automatic policy application and change management - Acting on Change 2016Automatic policy application and change management - Acting on Change 2016
Automatic policy application and change management - Acting on Change 2016PERICLES_FP7
 
Pro-active solutions for higher reproducibility of scientific experiments - A...
Pro-active solutions for higher reproducibility of scientific experiments - A...Pro-active solutions for higher reproducibility of scientific experiments - A...
Pro-active solutions for higher reproducibility of scientific experiments - A...PERICLES_FP7
 
PERICLES Policy management & ontology supported preservation - Acting on Chan...
PERICLES Policy management & ontology supported preservation - Acting on Chan...PERICLES Policy management & ontology supported preservation - Acting on Chan...
PERICLES Policy management & ontology supported preservation - Acting on Chan...PERICLES_FP7
 
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016PERICLES_FP7
 
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...PERICLES_FP7
 
PERICLES Building Digital Ecosystem Models - ‘Eye of the Storm: Preserving Di...
PERICLES Building Digital Ecosystem Models - ‘Eye of the Storm: Preserving Di...PERICLES Building Digital Ecosystem Models - ‘Eye of the Storm: Preserving Di...
PERICLES Building Digital Ecosystem Models - ‘Eye of the Storm: Preserving Di...PERICLES_FP7
 
Semi-automated metadata extraction in the long-term
Semi-automated metadata extraction in the long-termSemi-automated metadata extraction in the long-term
Semi-automated metadata extraction in the long-termPERICLES_FP7
 
The PeriCAT Framework
The PeriCAT FrameworkThe PeriCAT Framework
The PeriCAT FrameworkPERICLES_FP7
 
PERICLES - Choice of Information Encapsulation (IE) Technique
PERICLES - Choice of Information Encapsulation (IE) TechniquePERICLES - Choice of Information Encapsulation (IE) Technique
PERICLES - Choice of Information Encapsulation (IE) TechniquePERICLES_FP7
 

More from PERICLES_FP7 (20)

Digital Ecosystem and Process Compiler - IDCC17
Digital Ecosystem and Process Compiler - IDCC17Digital Ecosystem and Process Compiler - IDCC17
Digital Ecosystem and Process Compiler - IDCC17
 
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...
 
Technical appraisal and change impact analysis - IDCC17 workshop
Technical appraisal and change impact analysis - IDCC17 workshopTechnical appraisal and change impact analysis - IDCC17 workshop
Technical appraisal and change impact analysis - IDCC17 workshop
 
ForgetIT: human memory inspired Information Model
ForgetIT: human memory inspired Information ModelForgetIT: human memory inspired Information Model
ForgetIT: human memory inspired Information Model
 
Data quality, preservation and access: a DANS perspective
Data quality, preservation and access: a DANS perspectiveData quality, preservation and access: a DANS perspective
Data quality, preservation and access: a DANS perspective
 
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...
 
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...
 
Risk assessment for preservation in the active life of complex digital object...
Risk assessment for preservation in the active life of complex digital object...Risk assessment for preservation in the active life of complex digital object...
Risk assessment for preservation in the active life of complex digital object...
 
Technical Appraisal Tool, MICE - Acting on Change 2016
Technical Appraisal Tool, MICE - Acting on Change 2016Technical Appraisal Tool, MICE - Acting on Change 2016
Technical Appraisal Tool, MICE - Acting on Change 2016
 
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...PERICLES Workflow for the automated updating of Digital Ecosystem Models with...
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...
 
Capability gap - Preservation isn't just throwing tools at the problem - Acti...
Capability gap - Preservation isn't just throwing tools at the problem - Acti...Capability gap - Preservation isn't just throwing tools at the problem - Acti...
Capability gap - Preservation isn't just throwing tools at the problem - Acti...
 
Automatic policy application and change management - Acting on Change 2016
Automatic policy application and change management - Acting on Change 2016Automatic policy application and change management - Acting on Change 2016
Automatic policy application and change management - Acting on Change 2016
 
Pro-active solutions for higher reproducibility of scientific experiments - A...
Pro-active solutions for higher reproducibility of scientific experiments - A...Pro-active solutions for higher reproducibility of scientific experiments - A...
Pro-active solutions for higher reproducibility of scientific experiments - A...
 
PERICLES Policy management & ontology supported preservation - Acting on Chan...
PERICLES Policy management & ontology supported preservation - Acting on Chan...PERICLES Policy management & ontology supported preservation - Acting on Chan...
PERICLES Policy management & ontology supported preservation - Acting on Chan...
 
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016
 
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...
 
PERICLES Building Digital Ecosystem Models - ‘Eye of the Storm: Preserving Di...
PERICLES Building Digital Ecosystem Models - ‘Eye of the Storm: Preserving Di...PERICLES Building Digital Ecosystem Models - ‘Eye of the Storm: Preserving Di...
PERICLES Building Digital Ecosystem Models - ‘Eye of the Storm: Preserving Di...
 
Semi-automated metadata extraction in the long-term
Semi-automated metadata extraction in the long-termSemi-automated metadata extraction in the long-term
Semi-automated metadata extraction in the long-term
 
The PeriCAT Framework
The PeriCAT FrameworkThe PeriCAT Framework
The PeriCAT Framework
 
PERICLES - Choice of Information Encapsulation (IE) Technique
PERICLES - Choice of Information Encapsulation (IE) TechniquePERICLES - Choice of Information Encapsulation (IE) Technique
PERICLES - Choice of Information Encapsulation (IE) Technique
 

Recently uploaded

Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Reproducibile scientific workflows - Acting on Change 2016

  • 1. Reproducible scientific workflows Tomasz Miksa Vienna University of Technology & SBA Research, Austria
  • 2. Tomasz Miksa tmiksa@sba-research.org eScience and Research Infrastructures  Scientists exchange - facilities - resources - services - datasets  Research requires - special tooling and software - workflows to • capture • transform • visualize • interpret the data
  • 3. Tomasz Miksa tmiksa@sba-research.org Taverna Workflow Workflows and Context  ‘Workflows’ can be - ad hoc commands and scripts executed manually - well-structured processes executed within a controlled environment  Workflows - share infrastructure with other processes - delegate tasks to tools installed in the system - require specific configurations - can use distributed systems #!/bin/bash # fetch data java -jar GestBarragensWSClientIQData.jar unzip -o IQData.zip # fix encoding #iconv -f LATIN1 -t UTF-8 iq.r > iq_utf8.r # generate references R --vanilla < iq_utf8.r > IQout.txt # create pdf pdflatex iq.tex pdflatex iq.tex Script
  • 4. Tomasz Miksa tmiksa@sba-research.org Reproducibility  Current studies show very low reproducibility in - medicine - economy - computer science  Reproducibility requires - well documented research workflows - precise information on the experiment's environment
  • 5. Tomasz Miksa tmiksa@sba-research.org Reproducibility Neuroanatomical studies  FreeSurfer Software - cortical thickness and volume of neuroanatomical structures  Different - FreeSurfer Versions • v4.3.1, v4.5.0, v5.0.0 - Workstation • Mac, Hewlett‐Packard - Operating system version • OSX 10.5, OSX 10.6 E. Gronenschild, P. Habets, H. I. L. Jacobs, R. Mengelers, N. Rozendaal, J. van Os, and M. Marcelis, “The effects of freesurfer version, workstation type, and macintosh operating system version on anatomical volume and cortical thickness measurements,” 2012.
  • 6. Tomasz Miksa tmiksa@sba-research.org Reproducibility Computer Science  613 papers in 8 ACM conferences C. Collberg and T. Proebsting, “Measuring reproducibility in computer systems research,” 2014. [Online]. Available: http://reproducibility.cs.arizona.edu/tr.pdf
  • 7. Tomasz Miksa tmiksa@sba-research.org Reproducibility Computer Science  E-mail responses from authors - Wrong version - Code will be available soon - Programmer left - Bad backup practices - Commercial code - Proprietary academic code - Intellectual property - No intention to release - …
  • 8. Variety of solutions  Workflow systems  Interactive notebooks  Virtualisation  Containers  Code repositories  Automated builds  Service monitoring  Metadata standards  Provenance  Preservation planning  Repositories
  • 9. Tomasz Miksa tmiksa@sba-research.org TIMBUS - Process preservation  Digital preservation of business processes  Based on risk management  Context modelling is the key
  • 10. Tomasz Miksa tmiksa@sba-research.org TIMBUS - Context modelling  Context model  Automated extractors  Process execution monitoring  Service monitoring
  • 11. TIMBUS - Risk mitigation strategies  Metadata and documentation  Migration - File formats - Storage media - Alternative services • Open source service • In‐housing of services  Emulation  Virtualisation  Mock‐up of systems
  • 12. Tomasz Miksa tmiksa@sba-research.org Summary  Scientific experiments - workflows for data processing with software dependencies  Risks affecting reproducibility - low due to insufficient experiment description  Solutions for improving reproducibility - improve data management, sharing and reuse  TIMBUS approach for process preservation - based on risk management practices - using context modelling to evaluate preservation alternatives