SlideShare a Scribd company logo
1 of 27
Techniques for Preserving
Scientific Software Executions:
Preserve the Mess
or Encourage Cleanliness?
Douglas Thain, Peter Ivie, and Haiyan Meng
http://ccl.cse.nd.edu
The Cooperative Computing Lab
The Cooperative Computing Lab
• We collaborate with people who have large
scale computing problems in science,
engineering, and other fields.
• We operate computer systems on the
O(10,000) cores: clusters, clouds, grids.
• We conduct computer science research in the
context of real people and problems.
• We release open source software for large
scale distributed computing.
3
http://ccl.cse.nd.edu
Some of Our Collaborators
K. Lannon: Analyze 2PB of
data produced by the LHC
experiment at CERN
J. Izaguirre: Simulate 10M
different configurations of
a complex protein.
S. Emrich: Analyze DNA in
thousands of genomes for
similar sub-sequences.
P. Flynn: Computational
experiments on millions of
acquired face videos.
Reproducibility in Scientific Computing
is Very Poor Today
• Can I re-run a result from a colleague from five
years ago, and obtain the same result?
• How about a student in my lab from last week?
• Today, are we preparing for our current results to
be re-used by others five years from now?
• Multiple reasons why not:
– Rapid technological change.
– No archival of artifacts.
– Many implicit dependencies.
– Lack of backwards compatibility.
– Lack of social incentives.
Our scientific collaborators see the
value in reproducibility…
But only if it can be done
- easily
- at large scale
- with high performance
In principle, preserving a
software execution is easy.
I want to preserve my simulation method
and results so other people can try it out.
data output
DOI:10.XXXX DOI:10.ZZZZ
DOI:10.CCCC
DOI:10.YYYY
… and repeat this 1M times
with different –p values.
mysim.exe –in data –out output –p 10
But it’s not that simple!
I want to preserve my simulation method
and results so other people can try it out.
data output
mysim.exe –in data –out output –p 10
config
calib HTTP GET
Green Goat Linux 57.83.09.B
libsim
ruby
X86-64 CPU / 64GB RAM / 200GB Disk
SIM_MODE=clever
The problem is
implicit dependencies:
(things you need but cannot see)
How do we find them?
How do we deliver them?
Two approaches:
Preserve the Mess
(VMs and Parrot)
Encourage Cleanliness
(Umbrella and Prune)
Preserve the Mess: Save a VM
data output
sim.exe –in data –out output –p 10
config
Green Goat Linux 57.83.09.B
libsim
ruby
Hardware
SIM_MODE=clever
Virtual Hardware
data output
sim.exe –in data –out output –p 10
config
Green Goat Linux 57.83.09.B
libsim
ruby
SIM_MODE=clever
VMM
DOI:10.XXXX
calib HTTP GET
Preserve the Mess: Save a VM
• Not a bad place to start, but:
– Captures more things than necessary.
– Duplication across similar VM images.
– Hard to disentangle things logically – what if you
want to run the same thing with a new OS?
– Doesn’t capture network interactions.
– May be coupled to a specific VM technology.
– VM services are not data archives.
Preserve the Mess:
Trace Individual Dependencies
data output
sim.exe –in data –out output –p 10
config
Green Goat Linux 57.83.09.B
libsim
ruby
Hardware
SIM_MODE=clever
calib HTTP GET
/home/fred/data
/home/fred/config
/home/fred/sim.exe
/usr/lib/libsim
/usr/bin/ruby
http://somewhere/calib
. . .
Resource Actually UsedOriginal Machine
A portable package that can be
re-executed using Docker,
Parrot, or Amazon
Parrot
Trace all
system calls
Case Study: 166TB reduced to a 21GB package.
Haiyan Meng, Matthias Wolf, Peter Ivie, Anna Woodard, Michael Hildreth, Douglas Thain,
A Case Study in Preserving a High Energy Physics Application with Parrot,
Journal of Physics: Conference Series, December, 2015.
Preserve the Mess:
Trace Individual Dependencies
• Solves some problems:
– Captures only what is necessary.
– Observes network interactions.
– Portable across VM technologies.
• Still has some of the same problems:
– Hard to disentangle things logically – what if you
want to run the same thing with a new OS?
– Duplication across similar VM images / packages.
– VM services are not data archives.
What we really want:
A structured way to compose an
application with all of its
dependencies.
Enable preservation and sharing of
data and images for efficiency.
Encourage Cleanliness: Umbrella
kernel: { name: “Linux”, version: “3.2”; }
opsys: { name: “Red Hat”, version: “6.1” }
software: {
mysim: {
url: “doi://10.WW/ZZZZ”
mount: “/soft/sim”,
}
}
data: {
input: {
url: “http://some.url”
mount : “/data/input”,
}
calib: {
url: “doi://10.XX/YYYY”
mount: “/data/calib”,
}
}
“umbrella run mysim.json”
mysim.json
RedHat 6.1
Linux 3.2.4
mysim
input calib
RedHat 6.1
input
mysim
calib
Online Data Archives
RedHat 6.1
RedHat 6.2
Mysim 3.1
Mysim 3.2
Institutional Repository
input1
calib1
input2
calib2
Linux 3.2.4
Linux 3.2.5
RedHat 6.1
Linux 3.2.4
Mysim 3.1
input1
Umbrella specifies a reproducible environment while
avoiding duplication and enabling precise adjustments.
Run the experiment
Same thing, but use
different input data.
Same thing, but
update the OS
RedHat 6.1
Linux 3.2.4
Mysim 3.1
input2
RedHat 6.2
Linux 3.2.4
Mysim 3.1
input2
Haiyan Meng and Douglas Thain, Umbrella: A Portable Environment Creator for Reproducible
Computing on Clusters, Clouds, and Grids, Virt. Tech. for Distributed Computing, June 2015.
Specification is More Important
Than Mechanism
• Umbrella can work in a variety of ways:
– Native Hardware: Just check for compatibility.
– Amazon: allocate VM, copy and unpack tarballs.
– Docker: create container, mount volumes.
– Parrot: Download tarballs, mount at runtime.
– Condor: Request compatible machine.
• More ways will be possible in the future as
technologies come and go.
• Key requirement: Efficient runtime composition,
rather than copying, to allow shared deps.
Encourage Cleanliness:
Construct workflows from carefully
specified building blocks.
Encouraging Cleanliness: PRUNE
• Observation: The Unix execution model is part of
the problem, because it allows implicit deps.
• Can we improve upon the standard command-
line shell interface to make it reproducible?
• Instead of interpreting an opaque string:
mysim.exe –in data –out calib
• Ask the user to invoke a function instead:
output = mysim( input, calib ) IN ENV mysim.json
PRUNE – Preserving Run Environment
PUT “/tmp/input1.dat” AS “input1” [id 3ba8c2]
PUT “/tmp/input2.dat” AS “input2” [id dab209]
PUT “/tmp/calib.dat” AS “calib” [id 64c2fa]
PUT “sim.function” AS “sim” [id fffda7]
out1 = sim( input1, calib ) IN ENV mysim.json
[out1 is bab598]
out2 = sim( input2, calib ) IN ENV mysim.json
[out2 is 392caf]
out3 = sim( input2, calib2 ) IN ENV bettersim.json
[out3 is 232768]
RedHat 6.1
RedHat 6.2
Mysim 3.1
Mysim 3.2
Online Data Archive
input1
calib1
outpu11
myenv1
Linux 3.2.4
Linux 3.2.5
PRUNE connects together precisely reproducible
executions and gives each item a unique identifier
myenv1
input1
calib1
output 1sim
output1 = sim( input1, calib1 ) IN ENV mysim.json
bab598 = fffda7 ( 3ba8c2, 64c2fa ) IN ENV c8c832
myenv1
sim2
Recapitulation
• Key problem: User cannot see the implicit
dependencies that are critical to their code.
• Preserve the Mess:
– VMs: Just capture everything present.
– Parrot: Capture only what is actually used.
• Encourage Cleanliness:
– Umbrella: Describe all deps in a simple specification.
– PRUNE: Use a specification with every task run.
• Preserving the mess may work for a handful of
tasks, but cleanliness is required for millions!
Many Open Problems!
• Naming: Tension between usability and
durability: DOIs, UUIDs, HMACs, . . .
• Overhead: Tools must be close to native
performance, or they won’t get used.
• Usability: Do users have to change behavior?
• Layers: Preserve program binaries, or sources +
compilers, or something else?
• Repositories: Will they take provisional data?
• Compatibility: Can we plug into existing tools?
• Composition: Connect systems together?
For more information…
27
Haiyan Meng
hmeng@nd.edu
Peter Ivie
pivie@nd.edu
Douglas Thain
dthain@nd.edu
Parrot Packaging:
http://ccl.cse.nd.edu/software/parrot
Umbrella Technology Preview:
http://ccl.cse.nd.edu/software/umbrella

More Related Content

What's hot

Installation of application server 10g in red hat 4
Installation of application server 10g in red hat 4Installation of application server 10g in red hat 4
Installation of application server 10g in red hat 4uzzzle
 
Ufo Ship for AWS ECS
Ufo Ship for AWS ECSUfo Ship for AWS ECS
Ufo Ship for AWS ECSTung Nguyen
 
Using docker for data science - part 2
Using docker for data science - part 2Using docker for data science - part 2
Using docker for data science - part 2Calvin Giles
 
Using python and docker for data science
Using python and docker for data scienceUsing python and docker for data science
Using python and docker for data scienceCalvin Giles
 
Building a DSL with GraalVM (VoxxedDays Luxembourg)
Building a DSL with GraalVM (VoxxedDays Luxembourg)Building a DSL with GraalVM (VoxxedDays Luxembourg)
Building a DSL with GraalVM (VoxxedDays Luxembourg)Maarten Mulders
 
XtraDB 5.7: key performance algorithms
XtraDB 5.7: key performance algorithmsXtraDB 5.7: key performance algorithms
XtraDB 5.7: key performance algorithmsLaurynas Biveinis
 
Automated Deployment with Fabric
Automated Deployment with FabricAutomated Deployment with Fabric
Automated Deployment with Fabrictanihito
 
Vancouver presentation
Vancouver presentationVancouver presentation
Vancouver presentationColleen_Murphy
 
Developing High Performance Application with Aerospike & Go
Developing High Performance Application with Aerospike & GoDeveloping High Performance Application with Aerospike & Go
Developing High Performance Application with Aerospike & GoChris Stivers
 
Puppet Camp Charlotte 2015: Exporting Resources: There and Back Again
Puppet Camp Charlotte 2015: Exporting Resources: There and Back AgainPuppet Camp Charlotte 2015: Exporting Resources: There and Back Again
Puppet Camp Charlotte 2015: Exporting Resources: There and Back AgainPuppet
 
Tensorflow in Docker
Tensorflow in DockerTensorflow in Docker
Tensorflow in DockerEric Ahn
 
A little systemtap
A little systemtapA little systemtap
A little systemtapyang bingwu
 
Weird things we've seen with OpenStack Neutron
Weird things we've seen with OpenStack NeutronWeird things we've seen with OpenStack Neutron
Weird things we've seen with OpenStack NeutronNick Jones
 
Go Profiling - John Graham-Cumming
Go Profiling - John Graham-Cumming Go Profiling - John Graham-Cumming
Go Profiling - John Graham-Cumming Cloudflare
 
Building Docker images with Puppet
Building Docker images with PuppetBuilding Docker images with Puppet
Building Docker images with PuppetNick Jones
 
Python Deployment with Fabric
Python Deployment with FabricPython Deployment with Fabric
Python Deployment with Fabricandymccurdy
 
MongoDB Database Replication
MongoDB Database ReplicationMongoDB Database Replication
MongoDB Database ReplicationMehdi Valikhani
 
Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)
Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)
Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)Stephane Jourdan
 
A Fabric/Puppet Build/Deploy System
A Fabric/Puppet Build/Deploy SystemA Fabric/Puppet Build/Deploy System
A Fabric/Puppet Build/Deploy Systemadrian_nye
 
2012 coscup - Build your PHP application on Heroku
2012 coscup - Build your PHP application on Heroku2012 coscup - Build your PHP application on Heroku
2012 coscup - Build your PHP application on Herokuronnywang_tw
 

What's hot (20)

Installation of application server 10g in red hat 4
Installation of application server 10g in red hat 4Installation of application server 10g in red hat 4
Installation of application server 10g in red hat 4
 
Ufo Ship for AWS ECS
Ufo Ship for AWS ECSUfo Ship for AWS ECS
Ufo Ship for AWS ECS
 
Using docker for data science - part 2
Using docker for data science - part 2Using docker for data science - part 2
Using docker for data science - part 2
 
Using python and docker for data science
Using python and docker for data scienceUsing python and docker for data science
Using python and docker for data science
 
Building a DSL with GraalVM (VoxxedDays Luxembourg)
Building a DSL with GraalVM (VoxxedDays Luxembourg)Building a DSL with GraalVM (VoxxedDays Luxembourg)
Building a DSL with GraalVM (VoxxedDays Luxembourg)
 
XtraDB 5.7: key performance algorithms
XtraDB 5.7: key performance algorithmsXtraDB 5.7: key performance algorithms
XtraDB 5.7: key performance algorithms
 
Automated Deployment with Fabric
Automated Deployment with FabricAutomated Deployment with Fabric
Automated Deployment with Fabric
 
Vancouver presentation
Vancouver presentationVancouver presentation
Vancouver presentation
 
Developing High Performance Application with Aerospike & Go
Developing High Performance Application with Aerospike & GoDeveloping High Performance Application with Aerospike & Go
Developing High Performance Application with Aerospike & Go
 
Puppet Camp Charlotte 2015: Exporting Resources: There and Back Again
Puppet Camp Charlotte 2015: Exporting Resources: There and Back AgainPuppet Camp Charlotte 2015: Exporting Resources: There and Back Again
Puppet Camp Charlotte 2015: Exporting Resources: There and Back Again
 
Tensorflow in Docker
Tensorflow in DockerTensorflow in Docker
Tensorflow in Docker
 
A little systemtap
A little systemtapA little systemtap
A little systemtap
 
Weird things we've seen with OpenStack Neutron
Weird things we've seen with OpenStack NeutronWeird things we've seen with OpenStack Neutron
Weird things we've seen with OpenStack Neutron
 
Go Profiling - John Graham-Cumming
Go Profiling - John Graham-Cumming Go Profiling - John Graham-Cumming
Go Profiling - John Graham-Cumming
 
Building Docker images with Puppet
Building Docker images with PuppetBuilding Docker images with Puppet
Building Docker images with Puppet
 
Python Deployment with Fabric
Python Deployment with FabricPython Deployment with Fabric
Python Deployment with Fabric
 
MongoDB Database Replication
MongoDB Database ReplicationMongoDB Database Replication
MongoDB Database Replication
 
Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)
Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)
Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)
 
A Fabric/Puppet Build/Deploy System
A Fabric/Puppet Build/Deploy SystemA Fabric/Puppet Build/Deploy System
A Fabric/Puppet Build/Deploy System
 
2012 coscup - Build your PHP application on Heroku
2012 coscup - Build your PHP application on Heroku2012 coscup - Build your PHP application on Heroku
2012 coscup - Build your PHP application on Heroku
 

Viewers also liked

Glenlyon norfolk school ib
Glenlyon norfolk school ibGlenlyon norfolk school ib
Glenlyon norfolk school ibiamprosperous
 
Pickering College - school and appearance&requirements and expectatins
Pickering College - school and appearance&requirements and expectatinsPickering College - school and appearance&requirements and expectatins
Pickering College - school and appearance&requirements and expectatinsiamprosperous
 
Alexander college general flyer english
Alexander college general flyer englishAlexander college general flyer english
Alexander college general flyer englishiamprosperous
 
UCW 加西大学 工商管理硕士
UCW 加西大学   工商管理硕士UCW 加西大学   工商管理硕士
UCW 加西大学 工商管理硕士iamprosperous
 
Queens university-handbook
Queens university-handbookQueens university-handbook
Queens university-handbookiamprosperous
 
Alexander academy tuition-and-fees-all-programs-2015
Alexander academy tuition-and-fees-all-programs-2015Alexander academy tuition-and-fees-all-programs-2015
Alexander academy tuition-and-fees-all-programs-2015iamprosperous
 
York School District unlocking potencial for learning
York School District unlocking potencial for learningYork School District unlocking potencial for learning
York School District unlocking potencial for learningiamprosperous
 
Canadian tourism college hrm diplomas + testimonials
Canadian tourism college hrm diplomas + testimonialsCanadian tourism college hrm diplomas + testimonials
Canadian tourism college hrm diplomas + testimonialsiamprosperous
 

Viewers also liked (15)

Glenlyon norfolk school ib
Glenlyon norfolk school ibGlenlyon norfolk school ib
Glenlyon norfolk school ib
 
Archiving Deferred Representations Using a Two-Tiered Crawling Approach. Just...
Archiving Deferred Representations Using a Two-Tiered Crawling Approach. Just...Archiving Deferred Representations Using a Two-Tiered Crawling Approach. Just...
Archiving Deferred Representations Using a Two-Tiered Crawling Approach. Just...
 
Pickering College - school and appearance&requirements and expectatins
Pickering College - school and appearance&requirements and expectatinsPickering College - school and appearance&requirements and expectatins
Pickering College - school and appearance&requirements and expectatins
 
UCW b comm brochure
UCW b comm brochureUCW b comm brochure
UCW b comm brochure
 
Alexander college general flyer english
Alexander college general flyer englishAlexander college general flyer english
Alexander college general flyer english
 
UCW 加西大学 工商管理硕士
UCW 加西大学   工商管理硕士UCW 加西大学   工商管理硕士
UCW 加西大学 工商管理硕士
 
Gds
GdsGds
Gds
 
Queens university-handbook
Queens university-handbookQueens university-handbook
Queens university-handbook
 
Alexander academy tuition-and-fees-all-programs-2015
Alexander academy tuition-and-fees-all-programs-2015Alexander academy tuition-and-fees-all-programs-2015
Alexander academy tuition-and-fees-all-programs-2015
 
York School District unlocking potencial for learning
York School District unlocking potencial for learningYork School District unlocking potencial for learning
York School District unlocking potencial for learning
 
Benchmarks for Digital Preservation tools. Kresimir Duretec, Artur Kulmukhame...
Benchmarks for Digital Preservation tools. Kresimir Duretec, Artur Kulmukhame...Benchmarks for Digital Preservation tools. Kresimir Duretec, Artur Kulmukhame...
Benchmarks for Digital Preservation tools. Kresimir Duretec, Artur Kulmukhame...
 
Canadian tourism college hrm diplomas + testimonials
Canadian tourism college hrm diplomas + testimonialsCanadian tourism college hrm diplomas + testimonials
Canadian tourism college hrm diplomas + testimonials
 
Preserving the Fruit of Our Labor: Establishing Digital Preservation Policies...
Preserving the Fruit of Our Labor: Establishing Digital Preservation Policies...Preserving the Fruit of Our Labor: Establishing Digital Preservation Policies...
Preserving the Fruit of Our Labor: Establishing Digital Preservation Policies...
 
One Core Preservation System for all your Data. No Exceptions! Marco Klindt a...
One Core Preservation System for all your Data. No Exceptions! Marco Klindt a...One Core Preservation System for all your Data. No Exceptions! Marco Klindt a...
One Core Preservation System for all your Data. No Exceptions! Marco Klindt a...
 
Educational Records of Practice: Preservation and Access Concerns. Elizabeth ...
Educational Records of Practice: Preservation and Access Concerns. Elizabeth ...Educational Records of Practice: Preservation and Access Concerns. Elizabeth ...
Educational Records of Practice: Preservation and Access Concerns. Elizabeth ...
 

Similar to Techniques for Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness? Douglas Thain, Peter Ivie and Haiyan Meng

Prosit google-cloud
Prosit google-cloudProsit google-cloud
Prosit google-cloudUC Davis
 
New Jersey Red Hat Users Group Presentation: Provisioning anywhere
New Jersey Red Hat Users Group Presentation: Provisioning anywhereNew Jersey Red Hat Users Group Presentation: Provisioning anywhere
New Jersey Red Hat Users Group Presentation: Provisioning anywhereRodrique Heron
 
The post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart NeedhamThe post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart NeedhamStewart Needham
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalabilityWim Godden
 
Why and How Powershell will rule the Command Line - Barcamp LA 4
Why and How Powershell will rule the Command Line - Barcamp LA 4Why and How Powershell will rule the Command Line - Barcamp LA 4
Why and How Powershell will rule the Command Line - Barcamp LA 4Ilya Haykinson
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualizationFranck Pachot
 
Penetrating Windows 8 with syringe utility
Penetrating Windows 8 with syringe utilityPenetrating Windows 8 with syringe utility
Penetrating Windows 8 with syringe utilityIOSR Journals
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Dayprogrammermag
 
Interview questions
Interview questionsInterview questions
Interview questionsxavier john
 
Performance Analysis of Idle Programs
Performance Analysis of Idle ProgramsPerformance Analysis of Idle Programs
Performance Analysis of Idle Programsgreenwop
 
Linux Desktop Automation
Linux Desktop AutomationLinux Desktop Automation
Linux Desktop AutomationRui Lapa
 
Ato2019 weave-services-istio
Ato2019 weave-services-istioAto2019 weave-services-istio
Ato2019 weave-services-istioLin Sun
 
Weave Your Microservices with Istio
Weave Your Microservices with IstioWeave Your Microservices with Istio
Weave Your Microservices with IstioAll Things Open
 
All Things Open 2019 weave-services-istio
All Things Open 2019 weave-services-istioAll Things Open 2019 weave-services-istio
All Things Open 2019 weave-services-istioLin Sun
 
Continuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsContinuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsMike Brittain
 
OpenNMS - My Notes
OpenNMS - My NotesOpenNMS - My Notes
OpenNMS - My Notesashrawi92
 
XPages on IBM Bluemix: The Do's and Dont's - ICS.UG 2016
XPages on IBM Bluemix: The Do's and Dont's - ICS.UG 2016XPages on IBM Bluemix: The Do's and Dont's - ICS.UG 2016
XPages on IBM Bluemix: The Do's and Dont's - ICS.UG 2016ICS User Group
 
Run alone: a standalone application attempt by Gabriel Sor
Run alone: a standalone application attempt by Gabriel SorRun alone: a standalone application attempt by Gabriel Sor
Run alone: a standalone application attempt by Gabriel SorFAST
 

Similar to Techniques for Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness? Douglas Thain, Peter Ivie and Haiyan Meng (20)

Prosit google-cloud
Prosit google-cloudProsit google-cloud
Prosit google-cloud
 
New Jersey Red Hat Users Group Presentation: Provisioning anywhere
New Jersey Red Hat Users Group Presentation: Provisioning anywhereNew Jersey Red Hat Users Group Presentation: Provisioning anywhere
New Jersey Red Hat Users Group Presentation: Provisioning anywhere
 
The post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart NeedhamThe post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart Needham
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalability
 
Why and How Powershell will rule the Command Line - Barcamp LA 4
Why and How Powershell will rule the Command Line - Barcamp LA 4Why and How Powershell will rule the Command Line - Barcamp LA 4
Why and How Powershell will rule the Command Line - Barcamp LA 4
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualization
 
Penetrating Windows 8 with syringe utility
Penetrating Windows 8 with syringe utilityPenetrating Windows 8 with syringe utility
Penetrating Windows 8 with syringe utility
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
Interview questions
Interview questionsInterview questions
Interview questions
 
TSRT Crashes
TSRT CrashesTSRT Crashes
TSRT Crashes
 
Performance Analysis of Idle Programs
Performance Analysis of Idle ProgramsPerformance Analysis of Idle Programs
Performance Analysis of Idle Programs
 
Linux Desktop Automation
Linux Desktop AutomationLinux Desktop Automation
Linux Desktop Automation
 
Ato2019 weave-services-istio
Ato2019 weave-services-istioAto2019 weave-services-istio
Ato2019 weave-services-istio
 
Weave Your Microservices with Istio
Weave Your Microservices with IstioWeave Your Microservices with Istio
Weave Your Microservices with Istio
 
All Things Open 2019 weave-services-istio
All Things Open 2019 weave-services-istioAll Things Open 2019 weave-services-istio
All Things Open 2019 weave-services-istio
 
Continuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsContinuous Delivery: The Dirty Details
Continuous Delivery: The Dirty Details
 
Handout3o
Handout3oHandout3o
Handout3o
 
OpenNMS - My Notes
OpenNMS - My NotesOpenNMS - My Notes
OpenNMS - My Notes
 
XPages on IBM Bluemix: The Do's and Dont's - ICS.UG 2016
XPages on IBM Bluemix: The Do's and Dont's - ICS.UG 2016XPages on IBM Bluemix: The Do's and Dont's - ICS.UG 2016
XPages on IBM Bluemix: The Do's and Dont's - ICS.UG 2016
 
Run alone: a standalone application attempt by Gabriel Sor
Run alone: a standalone application attempt by Gabriel SorRun alone: a standalone application attempt by Gabriel Sor
Run alone: a standalone application attempt by Gabriel Sor
 

More from 12th International Conference on Digital Preservation (iPRES 2015)

More from 12th International Conference on Digital Preservation (iPRES 2015) (9)

Project Chrysalis – Transforming the Digital Business of the National Archive...
Project Chrysalis – Transforming the Digital Business of the National Archive...Project Chrysalis – Transforming the Digital Business of the National Archive...
Project Chrysalis – Transforming the Digital Business of the National Archive...
 
DataNet Federation Consortium Preservation Policy Toolkit. Reagan Moore, Arco...
DataNet Federation Consortium Preservation Policy Toolkit. Reagan Moore, Arco...DataNet Federation Consortium Preservation Policy Toolkit. Reagan Moore, Arco...
DataNet Federation Consortium Preservation Policy Toolkit. Reagan Moore, Arco...
 
Characterization of CDROMs for Emulation-based Access. Klaus Rechert, Thomas ...
Characterization of CDROMs for Emulation-based Access. Klaus Rechert, Thomas ...Characterization of CDROMs for Emulation-based Access. Klaus Rechert, Thomas ...
Characterization of CDROMs for Emulation-based Access. Klaus Rechert, Thomas ...
 
Beyond the Binary: Pre-Ingest Preservation of Metadata. Jessica Moran and Jay...
Beyond the Binary: Pre-Ingest Preservation of Metadata. Jessica Moran and Jay...Beyond the Binary: Pre-Ingest Preservation of Metadata. Jessica Moran and Jay...
Beyond the Binary: Pre-Ingest Preservation of Metadata. Jessica Moran and Jay...
 
Experiment, Document & Decide: A Collaborative Approach to Preservation Plann...
Experiment, Document & Decide: A Collaborative Approach to Preservation Plann...Experiment, Document & Decide: A Collaborative Approach to Preservation Plann...
Experiment, Document & Decide: A Collaborative Approach to Preservation Plann...
 
Functional Access to Forensic Disk Images in a Web Service. Kam Woods, Christ...
Functional Access to Forensic Disk Images in a Web Service. Kam Woods, Christ...Functional Access to Forensic Disk Images in a Web Service. Kam Woods, Christ...
Functional Access to Forensic Disk Images in a Web Service. Kam Woods, Christ...
 
Towards a Common Approach for Access to Digital Archival Records in Europe. A...
Towards a Common Approach for Access to Digital Archival Records in Europe. A...Towards a Common Approach for Access to Digital Archival Records in Europe. A...
Towards a Common Approach for Access to Digital Archival Records in Europe. A...
 
Developing a Framework for File Format Migrations. Joey Heinen and Andrea Goe...
Developing a Framework for File Format Migrations. Joey Heinen and Andrea Goe...Developing a Framework for File Format Migrations. Joey Heinen and Andrea Goe...
Developing a Framework for File Format Migrations. Joey Heinen and Andrea Goe...
 
A Foundational Framework for Digital Curation: The Sept Domain Model. Stephen...
A Foundational Framework for Digital Curation: The Sept Domain Model. Stephen...A Foundational Framework for Digital Curation: The Sept Domain Model. Stephen...
A Foundational Framework for Digital Curation: The Sept Domain Model. Stephen...
 

Recently uploaded

Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸mathanramanathan2005
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...NETWAYS
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Krijn Poppe
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxJohnree4
 
The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringSebastiano Panichella
 
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)Basil Achie
 
James Joyce, Dubliners and Ulysses.ppt !
James Joyce, Dubliners and Ulysses.ppt !James Joyce, Dubliners and Ulysses.ppt !
James Joyce, Dubliners and Ulysses.ppt !risocarla2016
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...NETWAYS
 
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSebastiano Panichella
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...NETWAYS
 
miladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxmiladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxCarrieButtitta
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...NETWAYS
 
Anne Frank A Beacon of Hope amidst darkness ppt.pptx
Anne Frank A Beacon of Hope amidst darkness ppt.pptxAnne Frank A Beacon of Hope amidst darkness ppt.pptx
Anne Frank A Beacon of Hope amidst darkness ppt.pptxnoorehahmad
 
The Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationThe Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationNathan Young
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfhenrik385807
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSebastiano Panichella
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptssuser319dad
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxFamilyWorshipCenterD
 

Recently uploaded (20)

Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptx
 
The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software Engineering
 
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
 
James Joyce, Dubliners and Ulysses.ppt !
James Joyce, Dubliners and Ulysses.ppt !James Joyce, Dubliners and Ulysses.ppt !
James Joyce, Dubliners and Ulysses.ppt !
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
 
miladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxmiladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptx
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
 
Anne Frank A Beacon of Hope amidst darkness ppt.pptx
Anne Frank A Beacon of Hope amidst darkness ppt.pptxAnne Frank A Beacon of Hope amidst darkness ppt.pptx
Anne Frank A Beacon of Hope amidst darkness ppt.pptx
 
The Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationThe Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism Presentation
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation Track
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.ppt
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
 

Techniques for Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness? Douglas Thain, Peter Ivie and Haiyan Meng

  • 1. Techniques for Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness? Douglas Thain, Peter Ivie, and Haiyan Meng
  • 3. The Cooperative Computing Lab • We collaborate with people who have large scale computing problems in science, engineering, and other fields. • We operate computer systems on the O(10,000) cores: clusters, clouds, grids. • We conduct computer science research in the context of real people and problems. • We release open source software for large scale distributed computing. 3 http://ccl.cse.nd.edu
  • 4. Some of Our Collaborators K. Lannon: Analyze 2PB of data produced by the LHC experiment at CERN J. Izaguirre: Simulate 10M different configurations of a complex protein. S. Emrich: Analyze DNA in thousands of genomes for similar sub-sequences. P. Flynn: Computational experiments on millions of acquired face videos.
  • 5. Reproducibility in Scientific Computing is Very Poor Today • Can I re-run a result from a colleague from five years ago, and obtain the same result? • How about a student in my lab from last week? • Today, are we preparing for our current results to be re-used by others five years from now? • Multiple reasons why not: – Rapid technological change. – No archival of artifacts. – Many implicit dependencies. – Lack of backwards compatibility. – Lack of social incentives.
  • 6. Our scientific collaborators see the value in reproducibility… But only if it can be done - easily - at large scale - with high performance
  • 7. In principle, preserving a software execution is easy.
  • 8. I want to preserve my simulation method and results so other people can try it out. data output DOI:10.XXXX DOI:10.ZZZZ DOI:10.CCCC DOI:10.YYYY … and repeat this 1M times with different –p values. mysim.exe –in data –out output –p 10
  • 9. But it’s not that simple!
  • 10. I want to preserve my simulation method and results so other people can try it out. data output mysim.exe –in data –out output –p 10 config calib HTTP GET Green Goat Linux 57.83.09.B libsim ruby X86-64 CPU / 64GB RAM / 200GB Disk SIM_MODE=clever
  • 11. The problem is implicit dependencies: (things you need but cannot see) How do we find them? How do we deliver them?
  • 12. Two approaches: Preserve the Mess (VMs and Parrot) Encourage Cleanliness (Umbrella and Prune)
  • 13. Preserve the Mess: Save a VM data output sim.exe –in data –out output –p 10 config Green Goat Linux 57.83.09.B libsim ruby Hardware SIM_MODE=clever Virtual Hardware data output sim.exe –in data –out output –p 10 config Green Goat Linux 57.83.09.B libsim ruby SIM_MODE=clever VMM DOI:10.XXXX calib HTTP GET
  • 14. Preserve the Mess: Save a VM • Not a bad place to start, but: – Captures more things than necessary. – Duplication across similar VM images. – Hard to disentangle things logically – what if you want to run the same thing with a new OS? – Doesn’t capture network interactions. – May be coupled to a specific VM technology. – VM services are not data archives.
  • 15. Preserve the Mess: Trace Individual Dependencies data output sim.exe –in data –out output –p 10 config Green Goat Linux 57.83.09.B libsim ruby Hardware SIM_MODE=clever calib HTTP GET /home/fred/data /home/fred/config /home/fred/sim.exe /usr/lib/libsim /usr/bin/ruby http://somewhere/calib . . . Resource Actually UsedOriginal Machine A portable package that can be re-executed using Docker, Parrot, or Amazon Parrot Trace all system calls Case Study: 166TB reduced to a 21GB package. Haiyan Meng, Matthias Wolf, Peter Ivie, Anna Woodard, Michael Hildreth, Douglas Thain, A Case Study in Preserving a High Energy Physics Application with Parrot, Journal of Physics: Conference Series, December, 2015.
  • 16. Preserve the Mess: Trace Individual Dependencies • Solves some problems: – Captures only what is necessary. – Observes network interactions. – Portable across VM technologies. • Still has some of the same problems: – Hard to disentangle things logically – what if you want to run the same thing with a new OS? – Duplication across similar VM images / packages. – VM services are not data archives.
  • 17. What we really want: A structured way to compose an application with all of its dependencies. Enable preservation and sharing of data and images for efficiency.
  • 18. Encourage Cleanliness: Umbrella kernel: { name: “Linux”, version: “3.2”; } opsys: { name: “Red Hat”, version: “6.1” } software: { mysim: { url: “doi://10.WW/ZZZZ” mount: “/soft/sim”, } } data: { input: { url: “http://some.url” mount : “/data/input”, } calib: { url: “doi://10.XX/YYYY” mount: “/data/calib”, } } “umbrella run mysim.json” mysim.json RedHat 6.1 Linux 3.2.4 mysim input calib RedHat 6.1 input mysim calib Online Data Archives
  • 19. RedHat 6.1 RedHat 6.2 Mysim 3.1 Mysim 3.2 Institutional Repository input1 calib1 input2 calib2 Linux 3.2.4 Linux 3.2.5 RedHat 6.1 Linux 3.2.4 Mysim 3.1 input1 Umbrella specifies a reproducible environment while avoiding duplication and enabling precise adjustments. Run the experiment Same thing, but use different input data. Same thing, but update the OS RedHat 6.1 Linux 3.2.4 Mysim 3.1 input2 RedHat 6.2 Linux 3.2.4 Mysim 3.1 input2 Haiyan Meng and Douglas Thain, Umbrella: A Portable Environment Creator for Reproducible Computing on Clusters, Clouds, and Grids, Virt. Tech. for Distributed Computing, June 2015.
  • 20. Specification is More Important Than Mechanism • Umbrella can work in a variety of ways: – Native Hardware: Just check for compatibility. – Amazon: allocate VM, copy and unpack tarballs. – Docker: create container, mount volumes. – Parrot: Download tarballs, mount at runtime. – Condor: Request compatible machine. • More ways will be possible in the future as technologies come and go. • Key requirement: Efficient runtime composition, rather than copying, to allow shared deps.
  • 21. Encourage Cleanliness: Construct workflows from carefully specified building blocks.
  • 22. Encouraging Cleanliness: PRUNE • Observation: The Unix execution model is part of the problem, because it allows implicit deps. • Can we improve upon the standard command- line shell interface to make it reproducible? • Instead of interpreting an opaque string: mysim.exe –in data –out calib • Ask the user to invoke a function instead: output = mysim( input, calib ) IN ENV mysim.json
  • 23. PRUNE – Preserving Run Environment PUT “/tmp/input1.dat” AS “input1” [id 3ba8c2] PUT “/tmp/input2.dat” AS “input2” [id dab209] PUT “/tmp/calib.dat” AS “calib” [id 64c2fa] PUT “sim.function” AS “sim” [id fffda7] out1 = sim( input1, calib ) IN ENV mysim.json [out1 is bab598] out2 = sim( input2, calib ) IN ENV mysim.json [out2 is 392caf] out3 = sim( input2, calib2 ) IN ENV bettersim.json [out3 is 232768]
  • 24. RedHat 6.1 RedHat 6.2 Mysim 3.1 Mysim 3.2 Online Data Archive input1 calib1 outpu11 myenv1 Linux 3.2.4 Linux 3.2.5 PRUNE connects together precisely reproducible executions and gives each item a unique identifier myenv1 input1 calib1 output 1sim output1 = sim( input1, calib1 ) IN ENV mysim.json bab598 = fffda7 ( 3ba8c2, 64c2fa ) IN ENV c8c832 myenv1 sim2
  • 25. Recapitulation • Key problem: User cannot see the implicit dependencies that are critical to their code. • Preserve the Mess: – VMs: Just capture everything present. – Parrot: Capture only what is actually used. • Encourage Cleanliness: – Umbrella: Describe all deps in a simple specification. – PRUNE: Use a specification with every task run. • Preserving the mess may work for a handful of tasks, but cleanliness is required for millions!
  • 26. Many Open Problems! • Naming: Tension between usability and durability: DOIs, UUIDs, HMACs, . . . • Overhead: Tools must be close to native performance, or they won’t get used. • Usability: Do users have to change behavior? • Layers: Preserve program binaries, or sources + compilers, or something else? • Repositories: Will they take provisional data? • Compatibility: Can we plug into existing tools? • Composition: Connect systems together?
  • 27. For more information… 27 Haiyan Meng hmeng@nd.edu Peter Ivie pivie@nd.edu Douglas Thain dthain@nd.edu Parrot Packaging: http://ccl.cse.nd.edu/software/parrot Umbrella Technology Preview: http://ccl.cse.nd.edu/software/umbrella