fasteners
psutils
$ corrcli watcher start
Launch watcher with ID: 6ea96596ac29
$ python my_script.py params.json 6ea96596ac29
$ corrcli jobs list
   label     status    time stamp         pid
0  27f427d2  finished  16-06-26 21:03:46  27800
$ corrcli sync
Daniel Wheeler (a) and Faïçal Yannick P. Congo (a, b)
(a) Materials Science and Engineering Division, Material Measurement Laboratory, NIST
(b) LIMOS, Blaise Pascal University, France
Email: daniel.wheeler@nist.gov
SIMULATION MANAGEMENT AND EXECUTION CONTROL
I-What is simulation management?
• Data context (metadata) is essential to verify scientific research claims.
• Version control and other workflow tools capture varying aspects of data context.
• Simulation management is concerned with the metadata for a scientific simulation or execution.
• Execution control tools capture simulation execution context in an analogous way to version control.
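As a rough illustration of what "execution context" can mean in practice, the sketch below gathers a few pieces of metadata for one run into a single record. The function name `capture_context` and the record fields are assumptions made for illustration only; they are not the schema used by Sumatra or CoRR.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def capture_context(script_source: str, parameters: dict) -> dict:
    """Build a hypothetical execution record for one simulation run."""
    return {
        # Hash of the code that ran, standing in for a version control revision.
        "code_sha1": hashlib.sha1(script_source.encode("utf-8")).hexdigest(),
        # The parameters the run was launched with.
        "parameters": parameters,
        # Interpreter and platform details, part of the environment context.
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        # When the record was created (UTC, ISO 8601).
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    record = capture_context("print('hello')", {"dt": 0.01, "steps": 100})
    print(json.dumps(record, indent=2))
```

Storing the record as plain JSON keeps it easy to log, search and compare between runs, which is the role version control plays for source code.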
Workflow Tools
Wrapping Tools
II-Execution control in context
Execution Control
Version Control
• Ubiquitous, robust
• Command line
• Web integration
• Highly collaborative
• Not suitable for capturing execution context
• Suitable for recording stable automated executions
• Provides log, search and view of execution history
• Capture entire simulation context
• Version environments
• Collaborative
• Not collaborative with current tools
• Not robust or ubiquitous
• Not suitable for log, search and view of history
• Suitable for building pipelines of distinct tasks
• Enables a clear division of tasks, allowing flexible and adjustable pipeline implementation by non-experts
• Black box design for each section of the pipeline
• Monolithic in nature, encouraging an isolated ecosystem of tools
IV-Summary
III-Execution Control Apps
$ smt init
$ smt configure --executable=python --main=script.py
$ smt run params.json
$ smt list
538732dcc47d
• Record saved at end of execution
• Relatively mature code base
• Local web view
• Computation launched as a subprocess
• Robust local record store on file system
• Uses a “watcher” to inspect the process; no subprocesses
• Continuously updates records
• Syncs to the CoRR API independently
• https://github.com/usnistgov/corr-cli/
• Separate API and Cloud apps for scalability
• Code base under construction
• Centralized cloud platform
• https://github.com/usnistgov/corr
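The two launch strategies listed above (computation launched as a subprocess with the record saved at the end, versus a watcher that inspects an independently started process and updates the record continuously) can be contrasted with a minimal sketch. This is not code from Sumatra or CoRR-cli: the function names, record fields and polling scheme are assumptions, and the liveness check relies on POSIX signal semantics.

```python
import json
import os
import subprocess
import sys
import tempfile
import time


def run_as_subprocess(command: list) -> dict:
    """Wrapper style: launch the computation, build the record when it ends."""
    start = time.time()
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": command,
        "returncode": result.returncode,
        "stdout": result.stdout,
        "duration_s": time.time() - start,
    }


def is_running(pid: int) -> bool:
    """POSIX-only check: signal 0 tests for existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def watch(pid: int, record_path: str, interval_s: float = 0.5,
          max_polls: int = 10) -> dict:
    """Watcher style: poll an existing process, rewriting the record each poll."""
    record = {"pid": pid, "status": "running", "polls": 0}
    for _ in range(max_polls):
        record["polls"] += 1
        record["status"] = "running" if is_running(pid) else "finished"
        with open(record_path, "w") as handle:
            json.dump(record, handle)  # the record on disk is always current
        if record["status"] == "finished":
            break
        time.sleep(interval_s)
    return record


if __name__ == "__main__":
    print(run_as_subprocess([sys.executable, "-c", "print('simulation done')"]))
    record_file = os.path.join(tempfile.mkdtemp(), "record.json")
    # Watching our own PID for two polls, purely to exercise the loop.
    print(watch(os.getpid(), record_file, interval_s=0.1, max_polls=2))
```

In the wrapper style nothing is recorded if the wrapper itself is killed mid-run, whereas the watcher leaves a partial but current record on disk after every poll.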
• Capture version control details
• Capture Python dependencies
• Capture input and output files
• Capture parameters
• Capture CPU and memory usage
• Capture record commentary
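Several of the capture features listed above can be approximated with the Python standard library alone. The sketch below is an assumption about how such capture might look, not the CoRR-cli implementation: the helper names are hypothetical, and `tracemalloc` reports only memory allocated by Python itself rather than whole-process memory.

```python
import hashlib
import os
import tempfile
import time
import tracemalloc
from importlib import metadata


def file_digest(path: str) -> str:
    """SHA-256 of a file, so an input or output file can be identified later."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def installed_packages() -> dict:
    """Map each installed distribution name to its version."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[name] = dist.version
    return packages


def measure(func, *args, **kwargs):
    """Run func and report CPU time and peak Python-level memory use."""
    tracemalloc.start()
    cpu_start = time.process_time()
    try:
        result = func(*args, **kwargs)
    finally:
        cpu_s = time.process_time() - cpu_start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, {"cpu_s": cpu_s, "peak_bytes": peak_bytes}


if __name__ == "__main__":
    # Write a small stand-in input file and fingerprint it.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".json") as tmp:
        tmp.write(b'{"steps": 100}')
    print("input sha256:", file_digest(tmp.name))
    os.remove(tmp.name)

    total, usage = measure(sum, range(1_000_000))
    print("result:", total, "usage:", usage)
    print("packages found:", len(installed_packages()))
```

Hashing files rather than copying them keeps records small while still allowing a later check that the same inputs were used.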
Features
What is CoRR?
• The Cloud of Reproducible Records is a web platform and command line tool client (CoRR-cli) for recording execution context as a set of records.
Future Work
• Overcome government security issues to host live app
• Implement searchable table view of data
• Implement common metadata standard and interoperability between CoRR and Sumatra
• Extend CoRR-cli metadata capture capabilities to match Sumatra for Python
