Docker in Open Science
Data Analysis Challenges
Bruce Hoff
Principal Software Engineer,
Sage Bionetworks
Agenda
Open Science in Disease Research
Containerization as a tool for scientific reproducibility
Case Study: Docker in the 2015 ALS Stratification Challenge
Case Study: Docker in the 2016 Digital Mammography Challenge
Open Issues and Lessons Learned
This talk is about saving lives.
Disease research is data intensive…
… but published analyses often aren’t reproducible.
… and valuable data sets aren’t shared freely.
… which reduces the rate of progress.
Difficulties in science validation
• Amgen scientists tried to confirm 53 landmark papers in preclinical oncology research: only 6 (11%) were confirmed.[1]
• Bayer HealthCare reported that only about 25% of published preclinical studies could be validated.[2]
• "Potti-gate": genomics research at Duke during 2006-2010 led to the identification of diagnostic signatures that spurred clinical trials. The research was later deemed statistically flawed, and the clinical trials were stopped.
[1] C. Glenn Begley and Lee M. Ellis, Nature 483, 531 (2012)
[2] F. Prinz, T. Schlange and K. Asadullah, Nature Rev. Drug Discov. 10, 712 (2011)
Our Solution: Open Data Analysis Challenges
• Engage the community, rather than a select company or lab, to solve a problem in biomedical research.
• Obtain and expose a high-value data set that would otherwise be accessible only to a few.
• Require that participants share their code and document their algorithms; test for reproducibility.
[Chart: number of submitting teams and unique final submissions]
Measures of Impact
• 32 scientific challenges
• 50 partner institutions (since 2006)
• >5000 registered users
• 10 international conferences
• 2500 conference attendees
• >100 publications using DREAM data
• 25 journal articles
• 3 journal special issues
• 2 edited books
• 1,300 citations
• 20 PhD theses
• Challenges used in classrooms as problem sets
The Organization
Dialogue for Reverse Engineering Assessment and Methods (DREAM) is a crowdsourcing effort that poses quantitative challenges about systems biology modeling.
Sage Bionetworks (2009-) is a nonprofit biomedical research organization seeking to accelerate biomedical research through open systems, incentives, and standards.
The two organizations merged in 2013 to drive a continuing series of open science challenges.
Synapse: enabling collaborative research
• Web services that facilitate collaborative web science
– Projects for sharing resources (code, files, ideas)
– Wiki narratives
• Analysis provenance: linking data, code, and results; data versioning
• Web services that facilitate challenge logistics
– Registration, acceptance of data usage terms, acceptance of Challenge Terms and Conditions
– Real-time challenge leaderboards
– Discussion forum
– Formation of teams
– Online supplements for challenge papers, e.g.:
https://www.synapse.org/#!Synapse:syn2528824/wiki/
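The provenance and versioning idea on this slide can be sketched in a few lines. This is a minimal, hypothetical in-memory store, not the Synapse API: the names `ProvenanceStore`, `put`, `record` and `lineage` are assumptions made for illustration. It versions entities by content hash and links each result to the exact data and code versions that produced it.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Version:
    """One immutable version of a named entity (data file, script, or result)."""
    name: str
    number: int
    sha256: str


class ProvenanceStore:
    """Hypothetical in-memory store for versioned entities and their lineage."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[Version]] = {}
        # result version -> (data versions used, code version executed)
        self.generated_by: Dict[Version, Tuple[List[Version], Version]] = {}

    def put(self, name: str, content: bytes) -> Version:
        """Store content under a name; changed content gets a new version number."""
        history = self.versions.setdefault(name, [])
        digest = hashlib.sha256(content).hexdigest()
        if history and history[-1].sha256 == digest:
            return history[-1]  # unchanged content: reuse the latest version
        version = Version(name, len(history) + 1, digest)
        history.append(version)
        return version

    def record(self, used: List[Version], executed: Version, generated: Version) -> None:
        """Link a result to the exact data and code versions that produced it."""
        self.generated_by[generated] = (list(used), executed)

    def lineage(self, result: Version) -> Tuple[List[Version], Version]:
        return self.generated_by[result]
```

Because links point at specific versions rather than names, uploading a changed data file later does not silently alter the recorded lineage of an earlier result.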
2015 ALS Challenge
a case study in using Docker in a DREAM challenge
ALS is a rapidly progressing neurodegenerative disease that typically leads
to death within 3-5 years but for which disease progression is heterogeneous
across the patient population.
Data for 9000 ALS patients provided by the Pooled Resources Open-Access
Clinical Trial (PRO-ACT) database.
The challenge was to predict disease progression from clinical data.
$28,000 in prize money raised through a grass-roots fund drive
https://www.indiegogo.com/projects/fund-the-prize-solve-als-together
Nature Biotechnology agreed to publish the results.
In a typical challenge…
• Data is partitioned into
– training
– leaderboard
– validation
• Participants
– download training data
– apply statistical learning methods
– submit predictions
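The three-way partition above can be sketched as a seeded shuffle over patient IDs, so that no patient appears in more than one set. The 60/20/20 fractions and the seed are hypothetical; the talk does not give the actual split sizes.

```python
import random
from typing import Dict, List, Sequence, Tuple


def partition(
    patient_ids: Sequence[str],
    fractions: Tuple[float, float, float] = (0.6, 0.2, 0.2),  # hypothetical split
    seed: int = 2015,
) -> Dict[str, List[str]]:
    """Split IDs into disjoint training / leaderboard / validation sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(patient_ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = round(len(shuffled) * fractions[0])
    n_board = round(len(shuffled) * fractions[1])
    return {
        "training": shuffled[:n_train],                      # downloaded by participants
        "leaderboard": shuffled[n_train:n_train + n_board],  # scored during the challenge
        "validation": shuffled[n_train + n_board:],          # held back for final scoring
    }
```

Splitting by patient rather than by record keeps a single patient's measurements from leaking between the set a model is trained on and the sets it is scored on.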
Organizers want to constrain submitted models to work in a certain
way:
• Model has a ‘selector’ component to select predictive clinical features
• Model has a ‘predictor’ component to predict ALS outcome based on
selected features.
Organizers want to run each model themselves to:
- Ensure models are structured as prescribed
- Ensure reproducibility of output
Docker to the rescue!
[Diagram: Clinical Data → Model (Selector → Selected Features → Predictor) → Output]
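The selector/predictor structure the organizers required can be sketched as two small interfaces and a runner that only ever hands the predictor the features the selector chose. The feature names, the toy selector and the toy linear predictor are all hypothetical; they show the shape of the constraint, not any challenge entry.

```python
from typing import Dict, List, Protocol

Patient = Dict[str, float]  # hypothetical encoding: clinical feature name -> value


class Selector(Protocol):
    def select(self, patient: Patient) -> List[str]: ...


class Predictor(Protocol):
    def predict(self, features: Patient) -> float: ...


class FixedSelector:
    """Toy selector: keeps up to max_features named features, if present."""

    def __init__(self, names: List[str], max_features: int) -> None:
        self.names = names[:max_features]  # hypothetical cap on selected features

    def select(self, patient: Patient) -> List[str]:
        return [n for n in self.names if n in patient]


class LinearPredictor:
    """Toy predictor: intercept plus a weighted sum of the features it is given."""

    def __init__(self, weights: Dict[str, float], intercept: float = 0.0) -> None:
        self.weights = weights
        self.intercept = intercept

    def predict(self, features: Patient) -> float:
        return self.intercept + sum(self.weights.get(k, 0.0) * v for k, v in features.items())


def run_model(selector: Selector, predictor: Predictor, patient: Patient) -> float:
    # The predictor only sees the columns the selector chose: this is the
    # structural constraint the organizers wanted to enforce.
    chosen = selector.select(patient)
    return predictor.predict({name: patient[name] for name in chosen})
```

Because the runner, not the participant's code, does the hand-off, a predictor cannot quietly use features the selector did not declare. Running the packaged model inside a container is what lets organizers own that runner.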
The ‘Stone Soup’ of Open Challenges
• Scientific leadership
• High-value data set
• IT resources
• Prize money
• High-visibility publication
• Community participation
IBM Donates a Mainframe for ALS Challenge
IBM Cloud with a zEC12 system virtual machine running a Linux server with 32 processors, 240 GB memory and 9 TB storage space.
Using Docker with a Mainframe
Provision a container on a unique port for each participant. Participants log in with:
> ssh user_name@129.34.20.96 -p port_number
Provide a script that sends a “signal” to a process running Docker:
> create_model_snapshot
A back-end process runs “docker commit” to create a copy of the model for scoring.
The back end reruns the captured image as a new container, after mounting the leaderboard (or, later, the validation) data volume.
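A back-end process could drive these steps by building Docker command lines and passing them to `subprocess.run`. The sketch below only builds the argument lists; the container and image naming, the port scheme (`BASE_SSH_PORT` plus a registration slot) and the `/data` mount point are hypothetical. One useful property of `docker commit` here: it does not include data in volumes mounted into the container, so the snapshot captures the participant's code and environment while the scoring data is attached separately.

```python
import shlex
from dataclasses import dataclass
from typing import List

BASE_SSH_PORT = 20000  # hypothetical; the talk only says each port is unique


@dataclass(frozen=True)
class Participant:
    user_name: str
    slot: int  # hypothetical registration index used to derive a unique port

    @property
    def container(self) -> str:
        return f"als-{self.user_name}"

    @property
    def ssh_port(self) -> int:
        return BASE_SSH_PORT + self.slot


def snapshot_cmd(p: Participant, tag: str) -> List[str]:
    # 'docker commit CONTAINER REPOSITORY:TAG' saves the container's filesystem as an image.
    return ["docker", "commit", p.container, f"als-models/{p.user_name}:{tag}"]


def scoring_cmd(p: Participant, tag: str, data_dir: str) -> List[str]:
    # Start a fresh container from the snapshot with the scoring data mounted read-only.
    return [
        "docker", "run", "--rm",
        "-v", f"{data_dir}:/data:ro",
        f"als-models/{p.user_name}:{tag}",
    ]


if __name__ == "__main__":
    alice = Participant("alice", slot=3)
    print(alice.ssh_port)
    print(shlex.join(snapshot_cmd(alice, "v1")))
    print(shlex.join(scoring_cmd(alice, "v1", "/srv/leaderboard")))
```

Mounting the data read-only means a submitted model can read the leaderboard or validation set but cannot alter it between scoring runs.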
2016 Digital Mammography Challenge
a case study in using Docker in a DREAM challenge
• The Scientific Question: How can we reduce the rate of erroneous recalls (false positives)?
• Image analysis machine learning problem
• “Deep learning” algorithms expected
• $1.2M in prize money expected to attract 100s of serious
participants
• 600,000 mammography images donated (~20TB)
• Budget for 100s of GPU servers from two Cloud providers
(AWS, IBM)
Why use Docker?
1) Large data size
2) Sensitive data
3) Provisioned compute
(1) Allocate machine (e.g. own laptop).
(2) Retrieve base image.
(3) Retrieve small, pilot dataset.
(4) Create model.
(5) Verify model using pilot dataset.
(6) Push Dockerized model to registry.
(7) Submit model to Challenge.
(8) Receive trained model and score.
Submission queue built into Synapse
(1) Retrieve new submissions.
(2) Retrieve Docker image.
(3) Train / score model.
(4) Save trained model and score.
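The server-side loop above can be sketched as a worker that walks the submission queue. The image pull and the GPU train/score run are injected as callables, since the real calls (registry, Docker, Synapse) are not shown in the talk; the status strings are placeholders, and persisting the trained model is assumed to happen inside `train_and_score` (for example, via an output volume).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Submission:
    submission_id: str
    image: str                 # Docker image reference pushed by the participant
    status: str = "RECEIVED"   # placeholder status values
    score: Optional[float] = None


@dataclass
class ScoringWorker:
    # Injected callables stand in for the real registry pull and scoring run.
    pull_image: Callable[[str], None]
    train_and_score: Callable[[str], float]
    results: Dict[str, float] = field(default_factory=dict)

    def process(self, queue: Iterable[Submission]) -> None:
        for sub in queue:
            if sub.status != "RECEIVED":       # (1) only pick up new submissions
                continue
            try:
                self.pull_image(sub.image)     # (2) retrieve the Docker image
                score = self.train_and_score(sub.image)  # (3) train / score the model
            except Exception:
                sub.status = "INVALID"         # one failing container must not stop the queue
                continue
            sub.score = score                  # (4) save the score
            sub.status = "SCORED"
            self.results[sub.submission_id] = score
```

Injecting the two side-effecting steps keeps the queue logic testable without a Docker host, and the same worker can run unchanged on each cloud provider's GPU machines.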
Outcome
• We’ve implemented the data donor’s wish to maintain control of the data.
• We have obviated the need to download the large data set.
• We have democratized participation, making compute available to those who might not otherwise have it.
• After the challenge we have a library of rerunnable models, ensuring reproducibility.
Open questions
• How best to monitor a fleet of Docker hosts (incl. GPU usage)?
• How reproducible are models run on different GPU machines?
• How much of the software stack should be in the container?
• How should we limit submitted jobs?
• Are there networking issues as models access data?
• What are the security issues when running submitted containers?
/etc
• Images aren't always portable: System z images can't be used on Intel-based hardware.
• Reproducibility doesn't mean comprehensibility.
• Find out about all our challenges at www.synapse.org
• For those of you down in the trenches, see brucehoff/dockerauth for an example of how to do Docker registry delegated authorization in Java.
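In the Docker registry's token authentication scheme, the registry points the client at a separate authorization service, which receives a requested scope such as `repository:alice/model:pull,push` and returns a signed token whose `access` claim lists what was actually granted. The core decision such a service makes can be sketched as follows. This is Python, not the Java code in brucehoff/dockerauth; the permission table is hypothetical, and issuing and signing the JWT is omitted.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical permission table: user -> repository -> actions that user may perform.
PERMISSIONS: Dict[str, Dict[str, Set[str]]] = {
    "alice": {"alice/model": {"pull", "push"}},
    "scorer": {"alice/model": {"pull"}},
}


def parse_scope(scope: str) -> Tuple[str, str, List[str]]:
    """Split 'repository:NAME:ACTIONS' into its type, name and action list."""
    resource_type, rest = scope.split(":", 1)
    name, actions = rest.rsplit(":", 1)
    return resource_type, name, [a for a in actions.split(",") if a]


def grant(user: str, scope: str) -> Dict[str, object]:
    """Return the subset of requested actions this user is allowed, as one access entry."""
    resource_type, name, requested = parse_scope(scope)
    allowed: Set[str] = set()
    if resource_type == "repository":
        allowed = PERMISSIONS.get(user, {}).get(name, set())
    return {
        "type": resource_type,
        "name": name,
        "actions": [a for a in requested if a in allowed],
    }
```

Granting a subset rather than rejecting the whole request lets, for example, a scoring account pull a participant's image without ever being able to push over it.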
Acknowledgements
• Sage Bionetworks
– Stephen Friend
– Thea Norman
– Lara Mangravite
– Mike Kellen
– Mette Peters
– Arno Klein
– Solly Sieberts
– Abhi Pratap
– Chris Bare
– Bruce Hoff
• IBM
– Erhan Bilal
– Kely Norel
– Elise Blaese
– Pablo Meyer Rojas
– Kahn Rhrissorrakrai
• EBI
– Julio Saez Rodriguez
– Thomas Cokelaer
– Federica Eduati
– Michael Menden
• L. Maximilians University
– Robert Kueffner
• Univ Colorado, Denver
– Jim Costello
• OHSU
– Joe Gray
– Adam Margolin
– Mehmet Gonen
– Laura Heiser
• Prize4Life
– Melanie Leitner
– Neta Zach
• NCI
– Dinah Singer
– Dan Gallahan
• ISMMS
– Eli Stahl
– Gaurav Pandey
• Columbia University
– Andrea Califano
– Mukesh Bansal
– Chuck Karan
• Rice University
– Amina Qutub
– David Noren
– Byron Long
• MD Anderson
– Steven Kornblau
• Univ of Lausanne
– Daniel Marbach
• Broad Institute
– Bill Hahn
– Barbara Weir
– Aviad Tsherniak
• Merck
– Robert Plenge
• BYU
– Keoni Kauwe
• OICR
– Paul Boutros
• UCSC
– Josh Stuart
Thank you!
Challenge Assisted Peer Review Partners
• Science Translational Medicine (1 paper)
• Nature Biotechnology (4 papers)
• Nature Genetics (papers in preparation)
• Nature Methods (papers in preparation)
• Nature Neuroscience (papers in preparation)
• PLoS Computational Biology (papers in review and preparation)
• National Cancer Institute (contracts for Best Performers)
What are the DREAM Challenges?
• A crowdsourcing effort that poses quantitative challenges about systems biology modeling and data analysis on:
– Transcriptional and signaling networks
– Predictions of response to perturbations
– Translational research (tox, RA, AD, ALS, AML, …)
• Our mission is:
– to contribute to the solution of important biomedical problems
– to foster collaboration between research groups
– to democratize data
– to accelerate research
– to objectively assess algorithm performance
Peer review is subjective. But even if it were not, what reaches the reviewers may be biased:
• Bias against publishing negative results or results contrary to previously published ones
• An incentive structure that puts researchers under considerable pressure to keep trying until they find a positive result (multiple testing, overfitting, etc.)
Dani Brunner et al., Behavioural Processes 89, 187-195 (2012)
[Figure: Inflated Statistical Significance; Multiple Testing; Selective Reporting; Overfitting]
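The multiple-testing point can be made concrete with a short calculation. Assuming m independent tests, each at significance level α, and no real effect anywhere, the chance of at least one "significant" result is 1 − (1 − α)^m; for 20 tests at α = 0.05 that is 1 − 0.95^20, roughly 0.64. The independence assumption is a simplification for illustration.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) over m independent tests, all null hypotheses true."""
    if not 0.0 <= alpha <= 1.0 or m < 0:
        raise ValueError("alpha must be in [0, 1] and m non-negative")
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        print(m, round(familywise_error_rate(0.05, m), 3))
```

A researcher who keeps trying analyses until one works is, in effect, increasing m without reporting it, which is why a held-out, organizer-scored validation set is a useful check.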
Benefits of crowd-sourcing
• Performance Evaluation
– Unbiased, consistent, and rigorous method assessment
– Unbiased comparison and discovery of best methods
– Determine the solvability of a scientific question
• Sampling of the space of methods
– Understand the diversity of methodologies presently being
used to solve a problem
Benefits of crowd-sourcing
• Acceleration of Research
– The community of participants can do in 4 months what would take any single group 10 years
• Community Building
– Make high quality, well-annotated data accessible
– Foster community collaborations on fundamental research questions
– Determine robust solutions through community consensus: “The Wisdom
of Crowds”
The challenge of reproducibility
• Disease research is data intensive. A typical researcher has a PhD in multivariate statistics and does a lot of programming in languages like R, Python, and MATLAB, using libraries of established tools.
• So these analyses are software stacks of a sort, each piece with its own series of revisions.
• This makes reproducibility really challenging: to reproduce an analysis you need not only the original data and the statistical processing script written by the author, but also the correct versions of all the dependencies.
• Containerization offers a powerful tool for reproducibility: the entire software stack used in an analysis can be captured and tracked.
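To see why "the correct versions of all the dependencies" is hard to pin down by hand, here is a minimal sketch that records just one layer of the stack: the Python interpreter and every installed Python distribution. It deliberately illustrates the gap, not a solution: system libraries, R or MATLAB packages, compilers and GPU drivers are all missing from this manifest, which is exactly what freezing the whole stack in a container image addresses.

```python
import json
import platform
import sys
from importlib import metadata


def environment_manifest() -> dict:
    """Record interpreter, OS and every installed Python distribution with its version."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that carry no name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),  # e.g. 's390x' vs 'x86_64' affects image portability
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    print(json.dumps(environment_manifest(), indent=2))
```

Saving such a manifest next to published results is a cheap first step; shipping the container image itself is what lets someone else actually rerun the analysis.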

Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
 
Towards automated phenotypic cell profiling with high-content imaging
Towards automated phenotypic cell profiling with high-content imagingTowards automated phenotypic cell profiling with high-content imaging
Towards automated phenotypic cell profiling with high-content imaging
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
 
Services For Science April 2009
Services For Science April 2009Services For Science April 2009
Services For Science April 2009
 
Considerations and challenges in building an end to-end microbiome workflow
Considerations and challenges in building an end to-end microbiome workflowConsiderations and challenges in building an end to-end microbiome workflow
Considerations and challenges in building an end to-end microbiome workflow
 

More from Docker, Inc.

Containerize Your Game Server for the Best Multiplayer Experience
Containerize Your Game Server for the Best Multiplayer Experience Containerize Your Game Server for the Best Multiplayer Experience
Containerize Your Game Server for the Best Multiplayer Experience Docker, Inc.
 
How to Improve Your Image Builds Using Advance Docker Build
How to Improve Your Image Builds Using Advance Docker BuildHow to Improve Your Image Builds Using Advance Docker Build
How to Improve Your Image Builds Using Advance Docker BuildDocker, Inc.
 
Build & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSBuild & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSDocker, Inc.
 
Securing Your Containerized Applications with NGINX
Securing Your Containerized Applications with NGINXSecuring Your Containerized Applications with NGINX
Securing Your Containerized Applications with NGINXDocker, Inc.
 
How To Build and Run Node Apps with Docker and Compose
How To Build and Run Node Apps with Docker and ComposeHow To Build and Run Node Apps with Docker and Compose
How To Build and Run Node Apps with Docker and ComposeDocker, Inc.
 
Distributed Deep Learning with Docker at Salesforce
Distributed Deep Learning with Docker at SalesforceDistributed Deep Learning with Docker at Salesforce
Distributed Deep Learning with Docker at SalesforceDocker, Inc.
 
The First 10M Pulls: Building The Official Curl Image for Docker Hub
The First 10M Pulls: Building The Official Curl Image for Docker HubThe First 10M Pulls: Building The Official Curl Image for Docker Hub
The First 10M Pulls: Building The Official Curl Image for Docker HubDocker, Inc.
 
Monitoring in a Microservices World
Monitoring in a Microservices WorldMonitoring in a Microservices World
Monitoring in a Microservices WorldDocker, Inc.
 
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...Docker, Inc.
 
Predicting Space Weather with Docker
Predicting Space Weather with DockerPredicting Space Weather with Docker
Predicting Space Weather with DockerDocker, Inc.
 
Become a Docker Power User With Microsoft Visual Studio Code
Become a Docker Power User With Microsoft Visual Studio CodeBecome a Docker Power User With Microsoft Visual Studio Code
Become a Docker Power User With Microsoft Visual Studio CodeDocker, Inc.
 
How to Use Mirroring and Caching to Optimize your Container Registry
How to Use Mirroring and Caching to Optimize your Container RegistryHow to Use Mirroring and Caching to Optimize your Container Registry
How to Use Mirroring and Caching to Optimize your Container RegistryDocker, Inc.
 
Monolithic to Microservices + Docker = SDLC on Steroids!
Monolithic to Microservices + Docker = SDLC on Steroids!Monolithic to Microservices + Docker = SDLC on Steroids!
Monolithic to Microservices + Docker = SDLC on Steroids!Docker, Inc.
 
Kubernetes at Datadog Scale
Kubernetes at Datadog ScaleKubernetes at Datadog Scale
Kubernetes at Datadog ScaleDocker, Inc.
 
Labels, Labels, Labels
Labels, Labels, Labels Labels, Labels, Labels
Labels, Labels, Labels Docker, Inc.
 
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment ModelUsing Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment ModelDocker, Inc.
 
Build & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSBuild & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSDocker, Inc.
 
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...Docker, Inc.
 
Developing with Docker for the Arm Architecture
Developing with Docker for the Arm ArchitectureDeveloping with Docker for the Arm Architecture
Developing with Docker for the Arm ArchitectureDocker, Inc.
 

More from Docker, Inc. (20)

Containerize Your Game Server for the Best Multiplayer Experience
Containerize Your Game Server for the Best Multiplayer Experience Containerize Your Game Server for the Best Multiplayer Experience
Containerize Your Game Server for the Best Multiplayer Experience
 
How to Improve Your Image Builds Using Advance Docker Build
How to Improve Your Image Builds Using Advance Docker BuildHow to Improve Your Image Builds Using Advance Docker Build
How to Improve Your Image Builds Using Advance Docker Build
 
Build & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSBuild & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWS
 
Securing Your Containerized Applications with NGINX
Securing Your Containerized Applications with NGINXSecuring Your Containerized Applications with NGINX
Securing Your Containerized Applications with NGINX
 
How To Build and Run Node Apps with Docker and Compose
How To Build and Run Node Apps with Docker and ComposeHow To Build and Run Node Apps with Docker and Compose
How To Build and Run Node Apps with Docker and Compose
 
Hands-on Helm
Hands-on Helm Hands-on Helm
Hands-on Helm
 
Distributed Deep Learning with Docker at Salesforce
Distributed Deep Learning with Docker at SalesforceDistributed Deep Learning with Docker at Salesforce
Distributed Deep Learning with Docker at Salesforce
 
The First 10M Pulls: Building The Official Curl Image for Docker Hub
The First 10M Pulls: Building The Official Curl Image for Docker HubThe First 10M Pulls: Building The Official Curl Image for Docker Hub
The First 10M Pulls: Building The Official Curl Image for Docker Hub
 
Monitoring in a Microservices World
Monitoring in a Microservices WorldMonitoring in a Microservices World
Monitoring in a Microservices World
 
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...
 
Predicting Space Weather with Docker
Predicting Space Weather with DockerPredicting Space Weather with Docker
Predicting Space Weather with Docker
 
Become a Docker Power User With Microsoft Visual Studio Code
Become a Docker Power User With Microsoft Visual Studio CodeBecome a Docker Power User With Microsoft Visual Studio Code
Become a Docker Power User With Microsoft Visual Studio Code
 
How to Use Mirroring and Caching to Optimize your Container Registry
How to Use Mirroring and Caching to Optimize your Container RegistryHow to Use Mirroring and Caching to Optimize your Container Registry
How to Use Mirroring and Caching to Optimize your Container Registry
 
Monolithic to Microservices + Docker = SDLC on Steroids!
Monolithic to Microservices + Docker = SDLC on Steroids!Monolithic to Microservices + Docker = SDLC on Steroids!
Monolithic to Microservices + Docker = SDLC on Steroids!
 
Kubernetes at Datadog Scale
Kubernetes at Datadog ScaleKubernetes at Datadog Scale
Kubernetes at Datadog Scale
 
Labels, Labels, Labels
Labels, Labels, Labels Labels, Labels, Labels
Labels, Labels, Labels
 
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment ModelUsing Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
 
Build & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWSBuild & Deploy Multi-Container Applications to AWS
Build & Deploy Multi-Container Applications to AWS
 
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...
 
Developing with Docker for the Arm Architecture
Developing with Docker for the Arm ArchitectureDeveloping with Docker for the Arm Architecture
Developing with Docker for the Arm Architecture
 

Recently uploaded

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 

Recently uploaded (20)

Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 

Docker in Open Science Data Analysis Challenges by Bruce Hoff

  • 1. Docker in Open Science Data Analysis Challenges Bruce Hoff Principal Software Engineer, Sage Bionetworks
  • 2. Open Science in Disease Research Containerization as a tool for scientific reproducibility Case Study: Docker in the 2015 ALS Stratification Challenge Case Study: Docker in the 2016 Digital Mammography Challenge Open Issues and Lessons Learned Agenda
  • 3. This talk is about saving lives. Disease research is data intensive… … but published analyses often aren’t reproducible. … and valuable data sets aren’t shared freely. … which reduces the rate of progress.
  • 4. Difficulties in science validation • Amgen scientists tried to confirm 53 landmark papers in preclinical oncology research: only 6 (11%) were confirmed.[1] • Bayer HealthCare reported that only about 25% of published preclinical studies could be validated.[2] • Potti Gate: genomics research at Duke during 2006-2010 led to the identification of diagnostic signatures that spurred clinical trials. The research was later deemed statistically flawed, and the clinical trials were stopped. [1] C. Glenn Begley and Lee M. Ellis, Nature 483, 531 (2012) [2] F. Prinz, T. Schlange and K. Asadullah, Nature Rev. Drug Discov. 10, 712 (2011)
  • 5. Our Solution: Open Data Analysis Challenges • Engage the community, rather than a select company or lab, to solve a problem in biomedical research. • Obtain and expose a high-value data set that would otherwise be accessible only to a few. • Require that participants share their code and document their algorithms; test for reproducibility.
  • 7. Measures of Impact • 32 scientific challenges • 50 partner institutions (since 2006) • >5000 registered users • 10 international conferences • 2500 conference attendees • >100 publications using DREAM data • 25 journal articles • 3 journal special issues • 2 edited books • 1,300 Citations • 20 PhD theses • Use of Challenges in Classroom as problem sets
  • 8. Dialogue for Reverse Engineering Assessment and Methods (DREAM) is a crowdsourcing effort that poses quantitative challenges about systems biology modeling. Sage Bionetworks (2009-) is a nonprofit biomedical research organization that seeks to accelerate biomedical research through open systems, incentives, and standards. The two organizations merged in 2013 to drive a continuing series of open science challenges. The Organization
  • 9. Synapse: enabling collaborative research • Web services that facilitate collaborative web science – Projects for sharing resources (code, files, ideas) – Wiki narratives • Analysis provenance: linking data, code, and results; data versioning • Web services that facilitate challenge logistics – Registration, acceptance of data-usage terms and of the challenge Terms and Conditions – Real-time challenge leaderboards – Discussion forum – Team formation – Online supplements for challenge papers, e.g. https://www.synapse.org/#!Synapse:syn2528824/wiki/
  • 10. 2015 ALS Challenge a case study in using Docker in a DREAM challenge
  • 11. ALS is a rapidly progressing neurodegenerative disease that typically leads to death within 3-5 years, although the rate of progression varies widely across patients. Data for 9000 ALS patients was provided by the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) database. The challenge was to predict disease progression from clinical data. $28,000 in prize money was raised through a grass-roots fund drive: https://www.indiegogo.com/projects/fund-the-prize-solve-als-together Nature Biotechnology agreed to publish the results.
  • 12. In a typical challenge… • Data is partitioned into – training – leaderboard – validation • Participants – download training data – apply statistical learning methods – submit predictions
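As a minimal sketch of the three-way split described above, the Python function below shuffles record IDs with a fixed seed and cuts them into training, leaderboard and validation subsets. Only the training subset would be handed to participants. The leaderboard subset scores interim submissions, and the validation subset is held back for the final ranking. The 60/20/20 proportions and the function name `partition` are assumptions for illustration. The slides do not say how any DREAM challenge sized its partitions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition(records: Sequence[T], percents: Tuple[int, int, int] = (60, 20, 20),
              seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Split records into (training, leaderboard, validation).

    The percentages are illustrative assumptions, not DREAM's actual split.
    """
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n = len(shuffled)
    n_train = n * percents[0] // 100       # integer arithmetic avoids float rounding
    n_board = n * percents[1] // 100
    training = shuffled[:n_train]
    leaderboard = shuffled[n_train:n_train + n_board]
    validation = shuffled[n_train + n_board:]  # remainder absorbs any rounding
    return training, leaderboard, validation


# Example: 9000 stand-in patient IDs (the size of the ALS data set).
train, board, valid = partition(range(9000))
print(len(train), len(board), len(valid))  # 5400 1800 1800
```

The seeded shuffle matters for the reproducibility theme of this talk. Anyone holding the same records and seed can regenerate exactly the same partitions.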
  • 13. Organizers want to constrain submitted models to work in a certain way: • The model has a ‘selector’ component that selects predictive clinical features. • The model has a ‘predictor’ component that predicts ALS outcome from the selected features. Organizers want to run each model themselves to: – ensure models are structured as prescribed; – ensure reproducibility of output. Docker to the rescue! (Diagram: Clinical Data → Selector → Selected Features → Predictor → Model Output)
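The selector/predictor constraint can be pictured as a two-stage interface. This Python sketch is hypothetical: the `ChallengeModel` class, the variance-based selector, the averaging predictor and the `feature_*` names are all illustrative, not the challenge's actual API. The predictor only ever sees the columns the selector chose. That is the property organizers would want to verify by running each model themselves.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical record layout: one dict of numeric clinical features per patient.
Patient = Dict[str, float]


@dataclass
class ChallengeModel:
    """Two-stage model: a selector chooses features, a predictor uses only those."""
    selector: Callable[[List[Patient]], List[str]]
    predictor: Callable[[List[Patient], List[str]], List[float]]

    def run(self, patients: List[Patient]) -> List[float]:
        features = self.selector(patients)
        # Hand the predictor only the selected columns, so the prescribed
        # structure is enforced rather than merely requested.
        restricted = [{name: p[name] for name in features} for p in patients]
        return self.predictor(restricted, features)


def top_variance_selector(patients: List[Patient], k: int = 2) -> List[str]:
    """Toy selector: keep the k features with the largest variance."""
    def variance(name: str) -> float:
        values = [p[name] for p in patients]
        m = mean(values)
        return mean((v - m) ** 2 for v in values)
    return sorted(sorted(patients[0]), key=variance, reverse=True)[:k]


def average_predictor(patients: List[Patient], features: List[str]) -> List[float]:
    """Toy predictor: the mean of the selected feature values."""
    return [mean(p[name] for name in features) for p in patients]


model = ChallengeModel(top_variance_selector, average_predictor)
patients = [
    {"feature_a": 1.0, "feature_b": 10.0, "feature_c": 5.0},
    {"feature_a": 1.0, "feature_b": 20.0, "feature_c": 7.0},
]
print(model.run(patients))  # [7.5, 13.5]
```

Here `feature_a` never varies, so the selector drops it. The predictor then averages `feature_b` and `feature_c` for each patient.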
  • 14. Scientific Leadership High value data set IT Resources Prize Money High visibility Publication Community participation The ‘Stone Soup’ of Open Challenges
  • 15. IBM Cloud with a zEC12 system virtual machine running a Linux server with 32 processors, 240 GB of memory and 9 TB of storage. IBM Donates a Mainframe for ALS Challenge
  • 16. Provision a container on a unique port for each participant. Participants log in as: > ssh user_name@129.34.20.96 -p port_number Provide a script that sends a “signal” to a back-end process running Docker: > create_model_snapshot The back-end process runs “docker commit” to create a copy of the model for scoring. The back end then reruns the captured image as a new container, after mounting the leaderboard (or, later, validation) data volume. Using Docker with a Mainframe
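This is a rough Python sketch of the back-end commands for the snapshot-and-rerun step. It assumes the back end shells out to the Docker CLI. `docker commit CONTAINER REPOSITORY:TAG` and `docker run -v host:container:ro` are standard Docker CLI usage. The `als-models/...` repository naming, the `/data` mount point and the `/run_model.sh` entry point are hypothetical.

```python
import shlex
from typing import List


def commit_command(container: str, participant: str, snapshot: int) -> List[str]:
    """`docker commit` saves the container's current filesystem (the
    participant's model and its dependencies) as a new image."""
    # Repository names must be lowercase; the "als-models" prefix is hypothetical.
    repository = f"als-models/{participant.lower()}"
    return ["docker", "commit", container, f"{repository}:snapshot-{snapshot}"]


def score_command(image: str, data_dir: str, entry_point: str) -> List[str]:
    """Rerun a captured image as a fresh container with held-out data mounted.

    The /data mount point and the entry point are hypothetical conventions.
    """
    return [
        "docker", "run", "--rm",          # discard the scoring container afterwards
        "-v", f"{data_dir}:/data:ro",     # leaderboard or validation data, read-only
        image, entry_point,
    ]


snapshot = commit_command("participant-42", "Alice", 3)
print(shlex.join(snapshot))
print(shlex.join(score_command(snapshot[-1], "/mnt/leaderboard", "/run_model.sh")))
```

To actually execute a command, pass the list to `subprocess.run(cmd, check=True)`. Building argument lists rather than shell strings avoids quoting problems with participant-supplied names.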
  • 17. 2016 Digital Mammography Challenge a case study in using Docker in a DREAM challenge
  • 18. • The scientific question: how can we reduce the rate of erroneous recalls (false positives)? • An image-analysis machine-learning problem • “Deep learning” algorithms expected • $1.2M in prize money expected to attract hundreds of serious participants • 600,000 mammography images donated (~20 TB) • Budget for hundreds of GPU servers from two cloud providers (AWS, IBM)
  • 19. Why use Docker? 1) Large data size 2) Sensitive data 3) Provisioned compute
  • 20. (1) Allocate a machine (e.g. your own laptop). (2) Retrieve the base image. (3) Retrieve a small pilot dataset. (4) Create the model. (5) Verify the model using the pilot dataset.
  • 21. (6) Push Dockerized model to registry. (7) Submit model to Challenge. … (8) Receive trained model and score.
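On the participant side, step (6) amounts to building and pushing an image. This hedged Python sketch assembles the standard `docker build -t` and `docker push` commands. It also builds a digest-pinned reference (`repository@sha256:...`) that could be submitted in step (7), so the organizers score exactly the image that was pushed. The registry host and repository names are placeholders. The slides do not say whether the challenge accepted digest references.

```python
from typing import List


def build_and_push(registry: str, repository: str, tag: str,
                   context: str = ".") -> List[List[str]]:
    """Commands for step (6): build the model image, then push it to a registry."""
    image = f"{registry}/{repository}:{tag}"
    return [
        ["docker", "build", "-t", image, context],
        ["docker", "push", image],
    ]


def pinned_reference(registry: str, repository: str, digest: str) -> str:
    """A tag can later be pointed at a different image; a content digest cannot,
    so a digest reference identifies exactly one image build."""
    if not digest.startswith("sha256:"):
        raise ValueError("expected a digest of the form sha256:<hex>")
    return f"{registry}/{repository}@{digest}"


# Hypothetical names; the registry host is a placeholder.
for cmd in build_and_push("registry.example.org", "my-team/mammo-model", "v1"):
    print(" ".join(cmd))
```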
  • 22. Submission queue built into Synapse
  • 23. (1) Retrieve new submissions. … (2) Retrieve Docker image. (3) Train / score model. (4) Save trained model and score.
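The four back-end steps can be sketched as a single pass over a submission queue. Everything here is hypothetical. The `Submission` record, the status strings and the injected `pull` / `train_and_score` callables stand in for the real Synapse queue and Docker calls, which the slides do not detail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Submission:
    submission_id: str
    image: str                # Docker image reference the participant submitted
    status: str = "RECEIVED"  # hypothetical status values


def process_new_submissions(
    queue: List[Submission],
    pull: Callable[[str], None],
    train_and_score: Callable[[str], Tuple[str, float]],
    results: Dict[str, Tuple[str, float]],
) -> None:
    """One pass of the back-end loop over the submission queue."""
    for sub in queue:
        if sub.status != "RECEIVED":       # (1) only handle new submissions
            continue
        try:
            pull(sub.image)                # (2) retrieve the Docker image
            trained_model, score = train_and_score(sub.image)  # (3) train / score
        except Exception:
            sub.status = "INVALID"         # one failing container must not stop the loop
            continue
        results[sub.submission_id] = (trained_model, score)  # (4) save model and score
        sub.status = "SCORED"
```

Injecting `pull` and `train_and_score` keeps the loop testable without Docker. In a deployment they might wrap `docker pull` and `docker run`.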
  • 24. • We’ve implemented the data donor’s wish to maintain control of the data. • We have obviated the need to download the large data set. • We have democratized participation, making compute available to those who might not otherwise have it. • After the challenge we have a library of rerunnable models ensuring reproducibility. Outcome
  • 25. • How best to monitor a fleet of Docker hosts (incl. GPU usage)? • How reproducible are models run on different GPU machines? How much of the software stack should be in the container? • How shall we limit submitted jobs? • Are there networking issues as models access data? • What are the security issues when running submitted containers? Open questions
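On limiting submitted jobs, network access and security, the Docker CLI already exposes per-container controls. This sketch is one possible starting point, not what the organizers did. It assembles a `docker run` command from real flags (`--network none`, `--memory`, `--cpus`, `--pids-limit`, `--read-only`, `--tmpfs`, `--cap-drop ALL`) with arbitrary example values. With `--network none`, data must arrive through mounted volumes rather than over the network. `docker run` has no wall-clock timeout flag, so a time limit would have to be enforced by the caller, for example by running `docker stop` after a deadline.

```python
from typing import List


def sandboxed_run(image: str, data_dir: str, output_dir: str,
                  memory: str = "16g", cpus: float = 4.0) -> List[str]:
    """Assemble a `docker run` command with per-container limits.

    Limit values are arbitrary examples, not the challenge's settings.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",        # no network: data must come from mounted volumes
        "--memory", memory,         # hard memory cap
        "--cpus", str(cpus),        # CPU quota
        "--pids-limit", "512",      # cap the number of processes
        "--read-only",              # read-only root filesystem
        "--tmpfs", "/tmp",          # writable scratch space despite --read-only
        "--cap-drop", "ALL",        # drop all Linux capabilities
        "-v", f"{data_dir}:/data:ro",
        "-v", f"{output_dir}:/output",
        image,
    ]


print(" ".join(sandboxed_run("my-team/model:v1", "/mnt/validation", "/mnt/out/42")))
```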
  • 26. • Images aren't always portable. System Z images can't be used on Intel-based hardware. • Reproducibility doesn't mean comprehensibility • Find out about all our challenges at www.synapse.org • For those of you down in the trenches, see brucehoff/dockerauth for an example of how to do registry delegated authorization in Java. /etc
  • 27. Acknowledgements • Sage Bionetworks: Stephen Friend, Thea Norman, Lara Mangravite, Mike Kellen, Mette Peters, Arno Klein, Solly Sieberts, Abhi Pratap, Chris Bare, Bruce Hoff • IBM: Erhan Bilal, Kely Norel, Elise Blaese, Pablo Meyer Rojas, Kahn Rhrissorrakrai • EBI: Julio Saez Rodriguez, Thomas Cokelaer, Federica Eduati, Michael Menden • L. Maximilians University: Robert Kueffner • Univ Colorado, Denver: Jim Costello • OHSU: Joe Gray, Adam Margolin, Mehmet Gonen, Laura Heiser • Prize4Life: Melanie Leitner, Neta Zach • NCI: Dinah Singer, Dan Gallahan • ISMMS: Eli Stahl, Gaurav Pandey • Columbia University: Andrea Califano, Mukesh Bansal, Chuck Karan • Rice University: Amina Qutub, David Noren, Byron Long • MD Anderson: Steven Kornblau • Univ of Lausanne: Daniel Marbach • Broad Institute: Bill Hahn, Barbara Weir, Aviad Tsherniak • Merck: Robert Plenge • BYU: Keoni Kauwe • OICR: Paul Boutros • UCSC: Josh Stuart
  • 29. • Science Translational Medicine (1 paper) • Nature Biotechnology (4 papers) • Nature Genetics (papers in preparation) • Nature Methods (papers in preparation) • Nature Neuroscience (papers in preparation) • PLoS Computational Biology (papers in review and preparation) • National Cancer Institute (contracts for Best Performers) Challenge Assisted Peer Review Partners
  • 30. A crowdsourcing effort that poses quantitative challenges about systems biology modeling and data analysis on: • transcriptional and signaling networks, • predictions of response to perturbations, • translational research (tox, RA, AD, ALS, AML, …). Our mission is: • to contribute to the solution of important biomedical problems • to foster collaboration between research groups • to democratize data • to accelerate research • to objectively assess algorithm performance. What are the DREAM Challenges
  • 31. Peer review is subjective. But even if it were not, what reaches the reviewers may be biased: • bias against publishing negative results or results contrary to published results; • an incentive structure that puts researchers under considerable pressure to keep trying until they find a positive result (multiple testing, overfitting, etc.). Dani Brunner et al., Behavioural Processes 89, 187-195 (2012). (Diagram labels: Inflated Statistical Significance; Multiple Testing; Selective Reporting; Overfitting)
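Here is a concrete case of the multiple-testing point. Suppose each of m independent tests of a true null hypothesis uses threshold alpha. The chance of at least one false positive is then 1 - (1 - alpha)^m. The short Python sketch below computes that formula. It then checks the result with a simulation that draws uniform p-values, which is the distribution of a valid p-value under the null. With alpha = 0.05 and 20 tests, roughly two out of three studies would report at least one "significant" finding by chance alone.

```python
import random


def family_wise_error_rate(alpha: float, m: int) -> float:
    """P(at least one p < alpha) across m independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m


def simulated_fwer(alpha: float, m: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: under a true null, a valid p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(m))  # any "significant" result?
        for _ in range(trials)
    )
    return hits / trials


print(round(family_wise_error_rate(0.05, 20), 3))  # 0.642
print(simulated_fwer(0.05, 20))
```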
  • 32. Benefits of crowd-sourcing • Performance Evaluation – Unbiased, consistent, and rigorous method assessment – Unbiased comparison and discovery of best methods – Determine the solvability of a scientific question • Sampling of the space of methods – Understand the diversity of methodologies presently being used to solve a problem
  • 33. Benefits of crowd-sourcing • Acceleration of Research – The community of participants can do in 4 months what would take any single group 10 years • Community Building – Make high-quality, well-annotated data accessible – Foster community collaborations on fundamental research questions – Determine robust solutions through community consensus: “The Wisdom of Crowds”
  • 34. The challenge of reproducibility • Disease research is data intensive. A typical researcher has a PhD in multivariate statistics and does a lot of programming in languages like R, Python, and MATLAB, using libraries of established tools. • So these analyses are software stacks of a sort, each piece having its own series of revisions. • This makes reproducibility really challenging: to reproduce an analysis you need not only the original data and the author's statistical processing script, but also the correct versions of all the dependencies. • Containerization therefore offers a powerful tool for reproducibility: the entire software stack used in an analysis can be captured and tracked.
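Slide 31 blames inflated significance partly on multiple testing. A minimal Python sketch, not taken from the talk, makes the arithmetic concrete: if each of k independent tests of true null hypotheses is run at level α, the chance that at least one comes out "significant" is 1 − (1 − α)^k. The function names are hypothetical, and the simulation assumes independent tests whose p-values are uniform on [0, 1] under the null.

```python
import random


def familywise_error_rate(alpha: float, k: int) -> float:
    """Exact probability that at least one of k independent tests of
    true null hypotheses is 'significant' at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def simulate_fishing(alpha: float, k: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo check (hypothetical helper): under a true null a p-value
    is uniform on [0, 1], so draw k of them per trial and count how often
    the smallest one falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if min(rng.random() for _ in range(k)) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        exact = familywise_error_rate(0.05, k)
        approx = simulate_fishing(0.05, k, trials=20_000)
        print(f"k={k:3d}  exact={exact:.3f}  simulated={approx:.3f}")
```

At α = 0.05, twenty independent null tests give roughly a 64% chance of at least one false positive, which is why trying analyses until one reaches significance is so misleading.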

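Slide 34's point that reproducing an analysis requires the correct versions of every dependency can be illustrated with a small Python sketch that records the interpreter, OS, and installed package versions, then reports drift against a saved manifest. This is an illustration under stated assumptions, not the mechanism used in the challenges: it only sees Python packages, whereas a Docker image captures the whole stack, including system libraries and other language runtimes. `capture_environment` and `diff_environments` are hypothetical names.

```python
import json
import platform
from importlib import metadata


def capture_environment() -> dict:
    """Hypothetical helper: record the interpreter version, OS string and
    the version of every installed Python distribution."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a Name field
            packages[name.lower()] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def diff_environments(recorded: dict, current: dict) -> list[str]:
    """Hypothetical helper: list differences between a saved manifest and
    the current one (Python version and package versions only)."""
    problems = []
    if recorded["python"] != current["python"]:
        problems.append(f"python: {recorded['python']} -> {current['python']}")
    for name, version in recorded["packages"].items():
        now = current["packages"].get(name)
        if now is None:
            problems.append(f"{name}: {version} -> missing")
        elif now != version:
            problems.append(f"{name}: {version} -> {now}")
    return problems


if __name__ == "__main__":
    manifest = capture_environment()
    print(json.dumps(manifest, indent=2)[:500])  # first part of the manifest
    print(diff_environments(manifest, capture_environment()))
```

Saving such a manifest as JSON next to the results and diffing it before a rerun flags silent upgrades; a container image, identified by its content digest, goes further by freezing the entire stack.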
Editor's Notes

  1. Think in terms of mining data sets that incorporate complete genomic profiles from thousands of subjects. Today someone working in disease research may have a PhD in statistics and never see a wet lab.
  2. Synapse provides a layer of web services that allow researchers to easily record and collaborate on their research (as widely or narrowly as they choose) in real time and across institutional boundaries. These services include not only the Synapse web portal, but also programmatic clients that talk to the same web services. By leveraging Synapse provenance services, analysts are able to provide an analysis trail of the data, code, and results associated with a research project. This helps everyone involved in the project see clearly what has been done, and by whom. By operating the Synapse platform and its services free of charge as a service to the scientific community, Sage Bionetworks hopes to catalyze new collaborations as well as exciting and reproducible scientific discoveries. Then maybe just mention that Brian Bot, Chris Bare, and Thea Norman are all at the meeting and would be happy to talk to anyone interested, and that they can stop by our poster.
  4. For the reproduced findings, authors had paid close attention to controls, reagents, investigator bias, and describing the complete data set. For the non-reproduced findings, data were not routinely analyzed by investigators blinded to the experimental versus control groups, there were no guidelines requiring that all data be reported, etc. In the Bayer study, 70% of the studies analyzed were in cancer research.