Making computations reproducible
Tokyo.SciPy #6
2014-08-02
Abstract
Scientific computations tend to involve a number of experiments
under different conditions.
It is important to manage computational experiments so that their
results are reproducible.
In this talk we introduce 3 rules to make computations reproducible.
Outline
1. Introduction
2. Discipline
  Three elements
  Three rules
  Complements
3. Practice
4. Summary
1. Introduction
Background
A lab notebook is indispensable for experimental research in natural
science. One of its roles is to make experiments reproducible.
Why not for computational research?
Lack of reproducibility means lack of reliability.
Common problems
Common problems in computational experiments:
I lost track of which results were obtained under which conditions.
I unintentionally overwrote previous results.
I used inconsistent data and got invalid results.
...
Many problems are caused by inappropriate management
of experiments.
Goal
To archive all results of each experiment,
together with all the information required to reproduce them,
so that we can easily retrieve and restore them
in a systematic, low-cost way.
Note
What is introduced in this talk is not an established methodology,
but a collection of field techniques. The same goes for the terminology.
In this talk, we will not deal with:
distributed computation
documentation or testing
publishing a paper
releasing OSS
2. Discipline
Three elements
We distinguish the following elements, which affect the reproducibility of
computations:
Algorithm: an algorithm coded into a program
  (implemented by yourself, calling an external library, ...)
Data: input and output data, and intermediate data to reuse
Environment: the software and hardware environment
  (external libraries, server configuration, platform, ...)
Three rules
Give an Identifier to each element and archive them.
Record a machine-readable Recipe
with human-readable comments.
Make every manipulation Mechanized.
Identifier
Give an Identifier to each element and archive them.
Algorithm
  use a version control system
Data
  give a name to distinguish the kind of data
  give a version to distinguish the concrete content
Environment
  record platform information
  record the version (and optionally the build parameters) of each library
Keep track of all elements during the whole process:
  every piece of code under version control
  no data without an identifier
  no temporary environment
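As one possible reading of "a name for the kind, a version for the content", the sketch below derives the version from a hash of the file content. The function name `data_identifier`, the `name@version` format, and the 12-character digest length are assumptions made for illustration; they are not part of the talk.

```python
import hashlib
from pathlib import Path


def data_identifier(name: str, path: Path, digest_length: int = 12) -> str:
    """Return '<name>@<version>', where the version is a prefix of the SHA-256
    digest of the file content (hypothetical naming scheme)."""
    sha = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so that large data files do not have to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    return f"{name}@{sha.hexdigest()[:digest_length]}"
```

Two files with identical content get the same identifier regardless of where they are stored, and any change in content yields a new version.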
Recipe
Record a machine-readable Recipe
with human-readable comments.
A recipe should include all information
required to reproduce the results of an experiment
(other than the contents of Algorithm, Data and Environment,
which are stored elsewhere).
A recipe should be machine-readable so that the experiment can be re-run.
A recipe should include a human-readable comment
on the purpose and/or meaning of the experiment.
A recipe should be generated automatically by tracking
experiments.
Typically a recipe includes the following information:
  in which order
  which data is processed
  by which algorithm
  under which environment
  with which parameters
Typically a recipe consists of the following:
  a script file to run the whole process
  a configuration file which specifies parameters and identifiers
  a text file of comments
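A minimal sketch of how the information listed above could be held in one machine-readable structure. The class and field names (`Step`, `Recipe`, `to_json`) are hypothetical and chosen only to mirror the list; the talk does not prescribe a schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Step:
    algorithm: str                 # by which algorithm (e.g. script plus VCS revision)
    inputs: List[str]              # which data is processed (data identifiers)
    outputs: List[str]             # identifiers of the data this step produces
    parameters: Dict = field(default_factory=dict)  # with which parameters


@dataclass
class Recipe:
    comment: str                   # human-readable purpose of the experiment
    environment: Dict              # under which environment
    steps: List[Step] = field(default_factory=list)  # list order = execution order

    def to_json(self) -> str:
        """Serialize the whole recipe, including nested steps, to JSON."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Keeping the comment inside the same structure is a design choice of this sketch; the slide's alternative of a separate text file of comments works equally well.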
Mechanize
Make every manipulation Mechanized.
Run the whole process of an experiment by a single operation.
No manual manipulation of data.
No manual compilation of source code.
Automated provisioning of an environment.
Complement: Tentative experiment
An overly large archive detracts from the practical value of reproducibility.
Tentative experiments with ephemeral results
do not necessarily need to be recorded:
  tests of code
  trials on tiny data
  ...
If a result might be used, referred to,
or looked up afterward, then the experiment should be recorded.
Complement: Reuse of intermediate data
To reuse intermediate data, utilize an identifier.
Explicitly specify the intermediate data to reuse by its identifier.
Automatically detect available intermediate data
based on dependencies.
...
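The dependency-based variant can be sketched as follows: the identifier of an intermediate result is derived from everything it depends on, and the result is reused when a file with that identifier already exists. The function names, the text-file cache, and the hashing scheme are assumptions for illustration, not a method stated in the talk.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, List


def intermediate_id(step_name: str, input_ids: List[str], parameters: Dict) -> str:
    """Derive an identifier from the step, its input identifiers and its parameters."""
    payload = json.dumps(
        {"step": step_name, "inputs": input_ids, "parameters": parameters},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return f"{step_name}-{digest}"


def compute_or_reuse(cache_dir: Path, step_name: str, input_ids: List[str],
                     parameters: Dict, compute: Callable[[], str]) -> str:
    """Return the stored text for this identifier, computing and storing it if absent."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / (intermediate_id(step_name, input_ids, parameters) + ".txt")
    if target.exists():
        return target.read_text(encoding="utf-8")   # reuse: dependencies unchanged
    result = compute()
    target.write_text(result, encoding="utf-8")
    return result
```

This only works if the input identifiers themselves change whenever the input content changes, which is what the Identifier rule asks for.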
3. Practice
Identify Algorithm
Use a version control system such as Git or Mercurial
to manage source code.
It is easy to record the revision and any uncommitted changes
at each experiment.
(Learn the internals of your VCS if you need more flexible management.)
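Recording the revision and uncommitted changes might look like the sketch below for Git. It relies on `git rev-parse HEAD` and `git diff HEAD`; the function names and the returned dictionary layout are assumptions of this example. The command runner is passed in as a parameter so that the logic can be exercised without a repository.

```python
import subprocess
from typing import Callable, Dict, List


def _git(args: List[str]) -> str:
    """Run a git command in the current directory and return its output as text."""
    return subprocess.check_output(["git"] + args).decode("utf-8")


def algorithm_identifier(run_git: Callable[[List[str]], str] = _git) -> Dict[str, object]:
    """Collect the current revision and the uncommitted changes of the working tree."""
    revision = run_git(["rev-parse", "HEAD"]).strip()
    # Staged and unstaged changes to tracked files; untracked files are not included.
    diff = run_git(["diff", "HEAD"])
    return {"revision": revision, "uncommitted_diff": diff, "dirty": bool(diff.strip())}
```

Storing the diff next to the revision lets an experiment run from a dirty working tree be restored later by checking out the revision and applying the diff.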
Identify Data
File
  Give appropriate names to directories and files;
  a resolved absolute path can then be used as an identifier.
  If no meaningful name comes to mind, use a time-stamp or a hash.
DB or other API
  A pair of a URI and a query whose results are constant
  can be used as an identifier.
  If the API's results are not constant, keep a copy at hand (with a time-stamp).
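A small sketch of both cases, assuming local files and a locally stored copy of API results. The function names and the file-name pattern are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def file_identifier(path: str) -> str:
    """Use the resolved absolute path (symlinks and '..' removed) as the identifier."""
    return str(Path(path).resolve())


def snapshot_name(prefix: str, now: Optional[datetime] = None) -> str:
    """Build a time-stamped file name for a local copy of non-constant API results."""
    # Normalize to UTC so that names sort chronologically across machines.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return f"{prefix}_{now.strftime('%Y%m%dT%H%M%SZ')}.json"
```

For example, `snapshot_name("query", datetime(2014, 8, 2, 12, 0, 0, tzinfo=timezone.utc))` gives `query_20140802T120000Z.json`.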
Identify Environment
Python package
  Use PyPA tools (virtualenv, setuptools and pip) or Conda/enstaller.
Library
  Use HashDist.
  An alternative is to use CDE.
Platform
  Use platform, a module in the Python standard library.
Server configuration
  Use Ansible or another configuration management tool,
  and Vagrant or another provisioning tool.
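For the Platform and Python package items, a snapshot could be collected as in the sketch below. It uses the standard `platform` module named on the slide; using `importlib.metadata` (Python 3.8 or later) for package versions is an assumption of this example, as are the function names and the dictionary layout.

```python
import platform
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional


def package_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_snapshot(packages: Iterable[str]) -> Dict[str, object]:
    """Collect platform information and the versions of the named packages."""
    return {
        "platform": platform.platform(),        # OS name, release and architecture
        "machine": platform.machine(),
        "python": platform.python_version(),
        "executable": sys.executable,
        "packages": {name: package_version(name) for name in packages},
    }
```

The result is a plain dictionary, so it can be written into the recipe with the json module.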
HashDist
A tool for developing, building and managing software stacks.
A software stack is described in YAML.
We can create, copy, move and remove software stacks.
$ git checkout stack.yaml
$ hit build stack.yaml
Recipe: configuration file
The configuration in a recipe should be in a machine-readable format.
Use the ConfigParser, PyYAML or json module
to read/write parameters in INI, YAML or JSON format.
A recipe should include the following:
  command-line arguments
  environment variables
  random seed
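A minimal sketch of capturing these three items with the standard json module. The function names and the key names (`argv`, `environment`, `random_seed`) are assumptions made for this example.

```python
import json
import os
import random
import sys
from typing import Dict, Iterable, List, Optional


def capture_configuration(seed: int, env_names: Iterable[str],
                          argv: Optional[List[str]] = None) -> Dict[str, object]:
    """Collect command-line arguments, selected environment variables and the seed."""
    return {
        "argv": list(sys.argv if argv is None else argv),
        # Unset variables are recorded as None (null in JSON).
        "environment": {name: os.environ.get(name) for name in env_names},
        "random_seed": seed,
    }


def seeded_generator(config: Dict[str, object]) -> random.Random:
    """Return a random number generator seeded from a stored configuration."""
    return random.Random(config["random_seed"])


def save_configuration(config: Dict[str, object], path: str) -> None:
    """Write the configuration as JSON so that a later run can load it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, sort_keys=True)
```

Recording only a chosen list of environment variables, rather than all of them, is a choice of this sketch; it keeps machine-specific or sensitive values out of the archive.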
Recipe: script file
The script in a recipe should run the whole process
by a single operation.
There are several ways to realize such a script:
  use a build tool (such as Autotools, SCons, or maf)
  use a job-flow tool (such as Ruffus or Luigi)
  write a small script by hand (e.g. run.py)
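The hand-written option could look roughly like the sketch below. The two steps (`generate`, `summarize`), the file names and the configuration keys are placeholders invented for this example; a real run.py would call the project's own processing steps.

```python
import json
import random
from pathlib import Path
from typing import Dict, List


def generate(config: Dict, workdir: Path) -> None:
    """Step 1: produce data deterministically from the recorded seed."""
    rng = random.Random(config["random_seed"])
    samples = [rng.random() for _ in range(config["n_samples"])]
    (workdir / "samples.json").write_text(json.dumps(samples))


def summarize(config: Dict, workdir: Path) -> None:
    """Step 2: read the intermediate data and write the result."""
    samples = json.loads((workdir / "samples.json").read_text())
    summary = {"mean": sum(samples) / len(samples)}
    (workdir / "summary.json").write_text(json.dumps(summary))


STEPS = [generate, summarize]  # executed in this order


def run(config: Dict, workdir: Path) -> List[str]:
    """Run every step in order and store the configuration next to the results."""
    workdir.mkdir(parents=True, exist_ok=True)
    (workdir / "config.json").write_text(json.dumps(config, sort_keys=True))
    executed = []
    for step in STEPS:
        step(config, workdir)
        executed.append(step.__name__)
    return executed
```

Calling `run({"random_seed": 0, "n_samples": 10}, Path("results/example"))` under an `if __name__ == "__main__":` guard would make `python run.py` the single operation.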
maf
“maf is a waf extension for writing computational experiments.”
Conduct computational experiments as build processes.
Focus on machine learning:
list configurations
run programs with each configuration
aggregate and visualize their results
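To make the three steps concrete, here is the same workflow in plain Python. This does not use maf or its API; the function names and the grid format are assumptions of this illustration only.

```python
import itertools
from typing import Callable, Dict, Iterable, List


def list_configurations(grid: Dict[str, Iterable]) -> List[Dict]:
    """List every combination of parameter values (parameter names in sorted order)."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in itertools.product(*(grid[name] for name in names))]


def run_all(experiment: Callable[[Dict], float], configurations: List[Dict]) -> List[Dict]:
    """Run the experiment once per configuration and keep each score with its configuration."""
    return [{"configuration": c, "score": experiment(c)} for c in configurations]


def best(results: List[Dict]) -> Dict:
    """Aggregate: pick the result with the highest score."""
    return max(results, key=lambda r: r["score"])
```

A build-style tool adds what this sketch lacks: it tracks which configurations have already been run and reruns only what changed.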
Recipe: automatic generation
Do it yourself, or use Sumatra.
“Sumatra: automated tracking of scientific computations”
recording information about experiments, linking to data files
command line & web interface
integration with LaTeX/Sphinx
$ smt run --executable=python --main=main.py conf.param input.data
$ smt comment "..."
$ smt info
$ smt repeat
4. Summary
Summary
We have introduced three rules to manage computational experiments
so that their results are reproducible.
However, our method is just a makeshift patchwork of field
techniques.
We need a tool to manage experiments
in a more integrated, systematic and sophisticated manner
for reproducible computations.
Links
PyPA http://python-packaging-user-guide.readthedocs.org
Conda http://conda.pydata.org
enstaller https://github.com/enthought/enstaller
HashDist http://hashdist.github.io
CDE http://www.pgbovine.net/cde.html
Ansible http://www.ansible.com
Vagrant http://www.vagrantup.com
SCons http://www.scons.org
maf https://github.com/pfi/maf
Ruffus http://www.ruffus.org.uk
Luigi https://github.com/spotify/luigi
Sumatra http://neuralensemble.org/sumatra
fin.
Revision: f2b0e97 (2014-08-03)
