This presentation was provided by Neil Thakur of the NIH during the NISO virtual conference, The Preprint: Integrating the Form into the Scholarly Ecosystem, held on February 14, 2018.
This presentation was provided by John Inglis of Cold Spring Harbor Laboratory during the NISO virtual conference, The Preprint: Integrating the Form into the Scholarly Ecosystem, held on February 14, 2018.
UVa Library Scientific Data Consulting Group (SciDaC): New Partnerships and... - Andrew Sallans
The UVA Library Scientific Data Consulting Group (SciDaC) provides new partnerships and services to support scientific data management in research. SciDaC was formed in 2010 to focus on data consulting after restructuring from the Research Computing Lab. SciDaC conducts data interviews and assessments, assists with NSF Data Management Plan requirements, and works to integrate research data into the institutional repository. Future work includes expanding disciplinary support, integrating into the research proposal process, and advising on data policy.
BioMed Central is a large open access publisher that is committed to open data initiatives. They have implemented several solutions to promote open data practices, including data journals, an open data award, and enabling data citation. They also work to integrate data hosting and deposition, address data licensing issues, and provide guidance on best practices. Future goals include adding more value to text and data mining applications and building business models around open data.
Strand SmartLab - Enabling Precision Medicine at Community Hospitals - Harsha Rajasimha
Strand SmartLab is a complete, end-to-end solution that enables a community hospital to establish precision medicine testing services in-house, retaining revenues internally rather than losing them to external third-party laboratories. Genomics-driven precision medicine for cancer and other diseases requires highly skilled people, lab equipment, processes, regulatory experts, big-data software, databases and curation, medical geneticists to interpret results in clinical settings, and genetic counselors. Strand SmartLab brings all of these to an institution in a pre-packaged solution.
This document summarizes a presentation by Timothy Hoctor, VP of Professional Services at Elsevier, about Elsevier's strategic vision and professional services. The key points are:
1) Elsevier aims to increase R&D productivity by linking data across the development spectrum and increase return on information through enhanced search and visualization tools.
2) Elsevier's Professional Services team leverages Elsevier's capabilities to provide customized data management and analysis solutions.
3) Elsevier's strategic objective is to become a leading collaborator in R&D data management through services like data mapping, gap analysis, data governance, and integrated data management.
Improving Integrity, Transparency, and Reproducibility Through Connection of ... - Andrew Sallans
The Center for Open Science (COS) was founded as a non-profit technology start-up in 2013 with the goal of improving transparency and reproducibility by connecting the scholarly workflow. COS achieves this goal through the development of a free, open-source web application called the Open Science Framework (OSF), providing features like file sharing and citing, persistent URLs, provenance tracking, and automated versioning. Initial workflow API connections focused on storage services and included Figshare, GitHub, Amazon S3, Dropbox, and Dataverse. The team is now working to connect other parts of the workflow with services like DMPTool, Databib/re3data, and Databrary. This session will introduce the core architecture and the problems that it solves, and illustrate how connecting services can benefit everyone involved in supporting the research ecosystem. COS is funded through the generosity of grants from the Laura and John Arnold Foundation, the John Templeton Foundation, the Alfred P. Sloan Foundation, the Association of Research Libraries, and others.
Presented at CNI Fall 2014, Washington, DC.
The document discusses the central role of scholarly societies in developing preprint servers. It describes how the American Chemical Society (ACS) established ChemRxiv, a preprint server for chemistry. The ACS engaged widely with the chemistry community and other stakeholders. ChemRxiv offers a simple submission process and makes all preprints freely available. It provides version tracking, citations, and metrics to authors. The ACS aims for ChemRxiv to be sustainable and integrated into the scholarly communication cycle.
Introduction to research data management - Michael Day
Slides from a presentation given at the JIBS User Group / RLUK joint event "Demystifying research data: don't be scared, be prepared" held at the SOAS Brunei Gallery, London, 17 July 2012.
The document provides guidance on writing scientific research papers. It discusses the objectives of scientific research which include observing phenomena, developing hypotheses, testing hypotheses through experiments, and explaining results. It also outlines the typical structure of a research paper, including the introduction, literature review, methodology, results, discussion, and conclusion sections. Tips are provided for writing each section effectively, such as stating the research question or hypothesis in the introduction and interpreting findings in the discussion section.
PLOS Biology is launching a new section focused on meta-research to increase transparency in biosciences research. Meta-research examines issues related to research design, methods, reporting, evaluation and rewards. This will include exploring sources of bias, data sharing standards, and assessment metrics. Registered Reports will also be introduced, which accept studies for publication based on proposed methods rather than results, reducing bias against negative findings. However, most research data is lost within 10-15 years, highlighting the need for improved data sharing policies to maximize the value of research findings.
This slideshow was used in an Introduction to Research Data Management course for the Social Sciences Division, University of Oxford, on 2015-05-27. It provides an overview of some key issues, looking at both day-to-day data management, and longer term issues, including sharing, and curation.
Research Skills Session 10: Improve a Research Paper Quality - Nader Ale Ebrahim
In this workshop, Dr. Nader Ale Ebrahim introduces tools from his Research Tools Mind Map for improving the quality of a research paper. The Research Tools enable researchers to follow the correct path in research and ultimately produce high-quality research outputs with greater accuracy and efficiency. Beyond introducing the tools, he emphasizes ten techniques for improving research paper quality: collaborate with excellent researchers, choose a good research team, focus on quality instead of quantity, use recent and relevant references, avoid obvious errors, don’t forget a storytelling style, write clearly, concisely, and smartly, read your paper several times, target the top journals, and follow the patterns of well-written papers in your field.
Enhancing Our Capacity for Large Health Dataset Analysis - CTSI at UCSF
Overview of UCSF-CTSI Comparative Effectiveness Large Dataset Analysis Core, which offers resources for the analysis of large, public data sets on health and health care.
This presentation gives a quick insight into how Scopus can benefit the scientific community and which value it adds to research institutions.
Increasing the speed to discovery and making resources more visible are just a few key drivers for the worldwide success of www.scopus.com.
Read more at http://info.scopus.com
The document summarizes talks given by Jez Cope at the Oxford Open Science meeting about technology training provided to postgraduate students at the University of Bath. It describes two projects Cope worked on: 1) Connected Researcher @ Bath which involved workshops to promote social media use among researchers, and 2) research data management workshops for postgraduate students to develop practical data management strategies. Feedback from students indicated high satisfaction with the courses but low levels of actual implementation of data management plans. The document concludes that postgraduate students are aware of technologies' potential but wary of risks, and are influenced by supervisors but also influence them.
This document provides an overview of the publishing process for the Journal of Advanced Nursing (JAN). It discusses the peer review process, submission requirements, reasons for rejection, revisions, and production. Metrics on submissions and acceptances are presented. Guidelines that JAN adheres to are outlined, including those around authorship, plagiarism detection, and retractions. What authors can expect during submission and post-acceptance is also reviewed.
The document discusses the importance of managing research data. It notes that data management saves time, makes long-term data preservation easier, and supports sharing data with others. Data sharing is now required by most major funding agencies and academic journals. The document provides examples of problems caused by poor data management practices and outlines the key components of a data management plan, such as describing the data, file formats, sharing and archiving policies, and responsibilities. Researchers are encouraged to seek help from scientific consulting services for creating data management plans.
Research Transparency in the Social Sciences: DA-RT - ARDC
Transparency Protects the Legitimacy of Research
Transparency and Openness Promotion (TOP) Guidelines for Journals
What are we afraid of?
What can be gained?
This slideshow was used in an Introduction to Research Data Management course taught for the Mathematical, Physical and Life Sciences Division, University of Oxford, on 2016-02-03. It provides an overview of some key issues, looking at both day-to-day data management, and longer term issues, including sharing, and curation.
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014 - Susanna-Assunta Sansone
- The document discusses the need for open and accessible data in research. It notes that over 50% of studies are not published due to selective reporting of results.
- There is a movement for "FAIR data" in life and medical sciences, where data is findable, accessible, interoperable, and reusable. However, not much data currently meets these standards.
- Publishers can play a role in incentivizing data sharing by implementing policies requiring data availability and format standards for publishing research. This includes supporting data citations and data journals.
This slideshow was used in a Preparing Your Research Material for the Future course for the Humanities Division, University of Oxford, on 2017-02-22. It provides an overview of some key issues, focusing on the long-term management of data and other research material, including sharing and curation.
The document summarizes a workshop hosted by the NIH Associate Director for Data Science to discuss charting the future of data science at NIH. The workshop goals were to get input from all stakeholders, identify strategic directions, policies, and funding initiatives, and have participants leave as advocates and supporters. The agenda included providing background, open discussion, identifying topics for breakout groups, subgroup discussions, and reporting back. The document provides context on current NIH data science efforts and examples of collaborators in building a biomedical research digital enterprise.
This document discusses the role of libraries in assessing and reporting research impact. It begins by outlining common metrics used to measure impact, such as citations, social media mentions, and altmetrics. It then discusses frameworks that can help contextualize impact, such as the Becker Model. The document emphasizes moving beyond simple counts to understand the true impact of research. It proposes strategies libraries can take, such as integrating altmetrics into institutional repositories and using research networking systems to map the diffusion of research outputs. Overall, the key points are that understanding research impact is complex, libraries can play an important role by supporting impact assessment, and stakeholder engagement is critical for local success.
June 18, 2014
NISO Virtual Conference: Transforming Assessment: Alternative Metrics and Other Trends
Assessing and Reporting Research Impact – A Role for the Library
- Kristi L. Holmes, Ph.D., Director, Galter Health Sciences Library, Northwestern University, Feinberg School of Medicine
Presentation slides on Open Science and research reproducibility. Presented by Gareth Knight (LSHTM Research Data Manager) on 18th September 2018, as part of an Open Science event for LSHTM Week 2018.
Curlew Research Brussels 2014 Electronic Data & Knowledge Management - Nick Lynch
An overview of Life Science externalisation and collaboration, and the challenges that Life Science companies face in delivering successful data sharing with their partners in either Open Innovation or pre-competitive workflows.
From logic model to data model: real and perceived barriers to research asses... - ORCID, Inc
The document discusses barriers to research assessment and describes how a web-based data collection and analysis system called iTRAQR helped address those barriers for the Physical Sciences-Oncology Centers (PS-OC) program. It summarizes how iTRAQR allowed automated collection of publication, collaboration, and other data; linking of individuals' contributions over time; and generation of charts and graphs to analyze outputs and outcomes at individual, center, and network levels. The document concludes that evaluation is improved by early design, engagement with participants, and consideration of follow-up actions informed by the evaluation.
Librarians can provide valuable data management services to researchers on campus. An effective strategy includes surveying researchers to identify needs, communicating service offerings through workshops and consultations, and providing in-depth guidance on data management plans and long-term data preservation. Developing workshops involves setting learning objectives, evaluating content, and securing resources like space and food. Consultations allow librarians to help with specific topics like choosing file formats or finding metadata standards. Creating a data management plan requires detailing a data inventory, metadata description, long-term preservation and access methods. Trusted disciplinary repositories and use of stable identifiers help ensure long-term findability and access.
The document summarizes the development of an evaluation framework and data collection system for the Physical Sciences-Oncology Centers (PS-OC) program. It describes how the program initially collected data manually but transitioned to an automated system called iTRAQR that allows for structured data entry and visualization of outputs like publications, collaborations, and personnel. The system helps analyze activities at the individual, center, and network levels. Lessons learned include starting with a logic model, having a flexible approach, and recognizing that evaluation depends on available data. Overall, the document outlines how the PS-OC program developed its evaluation strategy and an in-house system to systematically track outputs and outcomes over time.
This presentation covers:
Introduction: What Is “Research Data”? and the Data Lifecycle
Part 1:
Why Manage Your Data?
Formatting and organizing the data
Storage and Security of Data
Data documentation and metadata
Quality Control
Version control
Working with sensitive data
Controlled Vocabulary
Centralized Data Management
Part 2:
Data sharing
What are publishers & funders saying about data sharing?
Researchers’ Attitudes
Benefits of data sharing
Considerations before data sharing
Methods of Data Sharing
Uses of Shared Data and Their Limitations
Data management plans
Brief summary
Acknowledgments, References
A 45min presentation given at the 'Getting published in Nature's Scientific Data journal', hosted by the University of Cambridge Research Data Management team (www.data.cam.ac.uk). Presented on Monday 11th January 2016.
Getting to grips with research data management - Wendy Mears
This document provides an overview of research data management. It defines research data management and discusses its importance. It also outlines the data lifecycle model and provides guidance on sharing data, working with data, planning for data management, and useful resources for research data management. The document aims to help researchers effectively manage the data created throughout the research process.
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving... - Sarah Anna Stewart
Presentation given at the M25 Consortium of Academic Libraries, CPD25 Event on 'The Role of the Library in Supporting Research'. Provides an introduction to data, software and PIDs and a brief look at how libraries can enable researchers to gain impact and credit for their research data and software.
Identification of Early Career Researchers: How Universities and Funding Orga... - ORCID, Inc
Funding agencies, universities, and research institutes all face challenges in reliably identifying their researchers and monitoring outcomes over time. All researchers—and especially early career researchers seeking to establish their careers—need to be reliably connected to their research outputs, without the confusion that common or changeable names create. Graduate students and postdoctoral researchers supported by grants also face specific challenges: if they are not the PI, they are not included in grant information; they may not even know which grant(s) support them; and as a result, the existing challenges of reliably tying publications to grant funding are even more problematic. The use of the unique, persistent ORCID identifier can help support outcomes tracking and evaluation.
In 2012, the U.S. National Institutes of Health Biomedical Research Workforce Working Group made recommendations that the NIH should take to support a sustainable biomedical research workforce in the U.S. In the course of its study, working group members were “frustrated and sometimes stymied” by the lack of quality, comprehensive data about biomedical researchers. In response, NIH has recommended the development of a simple, comprehensive tracking system for trainees, implemented a shared, voluntary researcher profile system called the Science Experts Network Curriculum Vitae (SciENcv), and encouraged the adoption of unique, persistent ORCID identifiers for researchers. Additionally, NIH has begun collecting data about individuals in graduate and undergraduate student project roles who are supported by NIH grants.
Research universities like Texas A&M are also responding by incorporating the ORCID identifier into their systems, enabling the improved identification, data collection, and career outcome tracking of students and postdoctoral researchers--and educating these early career researchers about the benefits they will receive from a unique, persistent research identifier. They are also beginning to link Electronic Theses and Dissertations (ETDs) to early career researchers' ORCID records.
ORCID is an independent, non-profit organization that provides an open registry of unique and persistent identifiers for researchers and scholars. ORCID collaborates with the community to integrate ORCID identifiers into research systems and workflows, improving data management and accuracy across systems. ORCID enables interoperability between research systems worldwide, ensuring that researchers are correctly and automatically linked to their contributions. Since its launch in October 2012, ORCID has seen rapid adoption by more than 670,000 researchers and 130+ member organizations.
From Webinar 4/23/14, https://orcid.org/content/identification-early-career-researchers-how-universities-and-funding-organizations-are-using
This document discusses implementing ORCID identifiers at Northumbria University. It describes Northumbria as a research-rich university with over 1,300 academic staff across four faculties. The Scholarly Publications team provides support for research activities including the institutional repository and research data management. ORCID was first promoted in 2013 and is now integrated into the postgraduate researcher workflow and upcoming staff publishing workflows. ORCID helps with accurate attribution of authors in research metrics reporting and identifying collaborations. Maintaining central support and emphasizing the benefits to individuals have helped adoption.
This document discusses reproducible research and provides guidance on how to conduct research in a reproducible manner. It covers:
1. The importance of reproducible research due to large datasets, computational analyses, and the potential for human error. Ensuring reproducibility requires new expertise and infrastructure.
2. Key aspects of reproducible research include data management plans, version control, use of file formats and software/tools that allow reproducibility, and publishing data and code to allow others to replicate results.
3. Reproducible research benefits the scientific community by increasing transparency and allows researchers to re-analyze their own data in the future. Journals and funders are increasingly requiring reproducibility.
Closing the Loop in Healthcare Analytics - Correlating Clinical and Administrative Systems with Research Efforts to Deliver Clinical Efficiency in Real Time
2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking, Bruce Kozuma
The document describes building a low-cost sample tracking system using G Suite and Jira Cloud. It discusses using current off-the-shelf technology to create a serverless solution, how low-cost solutions can accelerate academic research, and developing the minimum viable product through iterative delivery. Permission to learn new skills can help develop capabilities to address problems and move research forward.
Perceptions of Project Managers in the Job Marketplace (and what to do about it), Bruce Kozuma
Given to the PMI Central Mass chapter on 2015-01-13. Can also be downloaded from here: https://pmicmass.org/document-repository/meetings-archive/2015-meetings-archive/381-2015-01-13-perceptions-of-pms-in-the-job-marketplace-bruce-kozuma
IT-focused Project Management in a Biopharmaceutical Manufacturing Environment, Bruce Kozuma
This document provides an overview of project management in a biopharmaceutical manufacturing environment. It discusses the key drivers in this environment beyond typical cost, schedule, and resource constraints, including supplying product to patients and compliance with regulations. The presentation focuses on an overview of a project to support manufacturing, differing aspects of quality between PMBOK and cGMP standards, and skills for project managers to thrive in a cGMP environment.
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1, Bruce Kozuma
The document discusses establishing a common cell line metadata registry at the Broad Institute to facilitate collaboration. It proposes using an institutional database as the canonical source, and ingesting data into local systems to link project-specific information to parental cell line data. This would create a shared registry of parental cell lines available to all groups, along with project-specific daughter cell lines. The goals are to standardize metadata, enable discovery of related work, and accelerate research progress.
2016 Bio-IT World Cell Line Coordination 2016-04-06v1, Bruce Kozuma
The document discusses enabling cross-group collaboration on cell lines at the Broad Institute by developing a common platform for sharing cell line metadata. It proposes establishing a Cell Line Master Data Review Board to set standards for metadata categories and curation. A framework is described using an institutional database as the canonical source of cell line definitions, with local data management systems ingesting this data and linking it to project-specific metadata. This approach aims to facilitate collaboration by providing a shared understanding of cell lines across different research groups.
Essentials of Automations: The Art of Triggers and Actions in FME, Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
TrustArc Webinar - 2024 Global Privacy Survey, TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence, IndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Full-RAG: A modern architecture for hyper-personalization, Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Driving Business Innovation: Latest Generative AI Advancements & Success Story, Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf, Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack, shyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
HCL Notes and Domino License Cost Reduction in the World of DLAU, panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024, Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
UiPath Test Automation using UiPath Test Suite series, part 6, DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only that both are building blocks: dependencies of creative and software projects. In reality, a Lego brick and the XZ backdoor case share much more than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate for free software and for standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in various LibreOffice-related events, migrations, and training. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager; when not following her passion for computers and for Geeko, she cultivates her curiosity about astronomy (the source of her nickname, deneb_alpha).
2018 Bio-IT World Agile in Wet Labs Speeds Big Data
1. Using Agile Techniques in Wet Labs to Speed the Creation of Even More Big Data
Bruce Kozuma, Principal System Analyst
Kendra West, Scrum Master, Data Sciences and Data Engineering
Thursday 2018/05/17, Bio-IT World
2. About the Authors
• Bruce Kozuma is a Principal Systems Analyst in IT
• Connect via LinkedIn: https://linkedin.com/in/bkozuma
• Kendra West is a Scrum Master in Data Sciences and Data Engineering
• Connect via LinkedIn: https://linkedin.com/in/kendraleighwest
3. Core Members: ~10
Institute Members: ~38
Associate Members: ~322
Employees: ~1000
Post-Docs, Fellows & Scholars in Residence: ~100
Visiting Scientists, Staff & Researchers: ~750
Students: ~550
Post-Docs/Partner Institutions: ~600
Over 3,400 Broadies working together
4. About the Broad Institute of MIT and Harvard
• Propelling the understanding and
treatment of disease
• Collaborating deeply
• Reaching globally
• Empowering scientists
• Building partnerships
• Sharing data and knowledge
• Promoting inclusion
5. The Agile Manifesto
Individuals & Interactions > Processes & Tools
*Delivering Value > Comprehensive Documentation
Customer Collaboration > Contract Negotiation
Responding to Change > Following a Plan
*adapted to fit organizational needs
6. What is the Agile approach?
• We follow the Twelve Agile Principles behind the Manifesto:
• Our highest priority is to satisfy the customer through early and continuous delivery of value
• Welcome changing requirements, even late in development; Agile processes harness change for the customer's competitive advantage
• Deliver frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale
• Value delivery is the primary measure of progress
Frequent delivery and feedback
7. What is the Agile approach?
• We follow the Twelve Agile Principles behind the Manifesto:
• Business people and developers must work together daily throughout the project
• The most efficient and effective method of conveying information to and within a development team is face-to-face conversation
• The best architectures, requirements, and designs emerge from self-organizing teams
• At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly
Teams communicating openly
8. What is the Agile approach?
• We follow the Twelve Agile Principles behind the Manifesto:
• Build projects around motivated individuals; give them the environment and support they need, and trust them to get the job done
• Agile processes promote sustainable development; the sponsors, developers, and users should be able to maintain a constant pace indefinitely
• Continuous attention to technical excellence and good design enhances agility
• Simplicity – the art of maximizing the amount of work not done – is essential
Doing our best work
9. What is Scrum?
• An Agile framework
• Born in Boston
• 90% of Agile teams worldwide use Scrum
• Borrows its name from rugby
10. Scrum Values, Pillars, and Elements
Scrum values
OpeneSs
Courage
Respect
FocUs
ComMitment
Scrum pillars
• Transparency
• Inspection
• Adaptation
Scrum team
• Product Owner
• Scrum Master
• Development Team
Scrum events
• The Sprint
• Sprint Planning
• Daily Scrum
• Sprint Review
• Sprint Retrospective
Scrum artifacts
• Product Backlog
• Sprint Backlog
• Increment
• Definition of Done
11. The Broad’s mission embodies many Agile values!
Broad Mission
• Propelling the understanding
and treatment of disease
• Collaborating deeply
• Reaching globally
• Empowering scientists
• Building partnerships
• Sharing data and knowledge
• Promoting inclusion
Agile themes
• Frequent delivery & feedback
• Teams communicating openly
• Doing our best work
Too many arrows!
12. How to measure Big Data?
• Classic way is via Doug Laney’s Volume, Velocity, Variety model
• Volume: size of data (e.g., total size of a data set, number of records, number of files, size of files)
• Velocity: rate at which data is produced and changed (e.g., production of BAMs, changes in UCSC genome releases GRCh37 vs hg17)
• Variety:
• Diversity of formats (e.g., FASTQ, BAM, VCF, CRAM)
• Non-aligned data structures (e.g., CDISC)
• Inconsistent data semantics (e.g., cell line names)
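The three-V model above can be made concrete with a short sketch. This is illustrative only — the function name `three_vs_snapshot` and the format-by-file-extension heuristic are our assumptions, not from the talk. It snapshots volume (bytes and file count) and variety (distribution of formats) for a directory of data files; velocity would be measured by comparing snapshots over time.

```python
from collections import Counter
from pathlib import Path

def three_vs_snapshot(root: str) -> dict:
    """Summarize a directory of data files along two of Laney's three V's."""
    sizes = []
    formats = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            sizes.append(path.stat().st_size)
            # Treat the final suffix (e.g., .bam, .vcf, .fastq) as the format.
            formats[path.suffix.lower() or "<none>"] += 1
    return {
        "volume_bytes": sum(sizes),   # Volume: total size of the data set
        "volume_files": len(sizes),   # Volume: number of files
        "variety_formats": dict(formats),  # Variety: diversity of formats
    }
```

Velocity could then be derived as the difference in `volume_bytes` between two snapshots divided by the elapsed time.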
13. Thesis of this talk!
• Using Agile techniques in wet labs and computational science speeds production of big data in multiple dimensions
• Volume
• Increases number of samples sequenced
• Lowers cost of sequencing/analysis and barriers to clinical sequencing
• Velocity
• Reduces cycle time of physical sample preparation prior to sequencing
• Improves use of people and resources in lab work
• Variety
• Increases types of samples being sequenced (e.g., types of cells, diseases, ethnic and geographic diversity, nomenclatures, APIs, and repositories)
14. Volume – Size of sequenced sample x # samples
Broad Institute timeline (years: 2002, 2004, 2006, 2007, 2008, 2009, 2010, 2012, 2013, 2014, 2015):
• Two major research groups come together: Whitehead/MIT Center for Genome Research; Harvard Institute of Chemistry and Cell Biology
• Broad Institute launched: initial $100M gift from Broad Foundations; a 10-year “experiment” in collaborative science
• Broad doubles in size: governed by MIT-Harvard leadership; administratively managed within MIT
• Headquarters building opens: 250,000 sq. ft. at 415 Main Street
• Broads double initial gift to $200M: unrestricted for Broad research and operations
• Creation of Stanley Center: founding $100M, 10-year gift from Stanley Medical Research Institute
• Broad Institute, Inc. established: 501(c)3 formed 9/08; operations begin 7/09
• “Experiment” declared a success: Broads announce new endowment of $400M; combined $600M current-use + endowment gift
• Carlos Slim Foundation provides $65M: new initiative in genomic disease research; 1st U.S. collaboration to receive funding
• Stanley building opens at 75 Ames Street
• Second gift of $74M: Slim Initiative for Genomic Medicine for the Americas
• 10th anniversary: $100M gift from Broad Foundations to launch next decade of science
• Creation of the Klarman Cell Observatory: Klarman Family Foundation gift of $33M
• Commitment of $650M: Ted Stanley invests in psychiatric research
• Broad Genomics: GP and DSP align; Genomics Platform; BSP Arrays and Sequencing merge
Volume: 100,000 genomes; ~70 PB of data; ~825K BAM files; ~1.2 billion hours of streaming music
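As a back-of-envelope check of the volume figures on this slide (100,000 genomes, ~70 PB), the implied average is about 700 GB of data per genome. The arithmetic below is ours, not the slide's, and assumes decimal storage units (1 PB = 10^6 GB):

```python
# Assumed decimal units: 1 PB = 1,000,000 GB.
genomes = 100_000
total_pb = 70
pb_in_gb = 1_000_000

gb_per_genome = total_pb * pb_in_gb / genomes
print(f"~{gb_per_genome:.0f} GB per genome")  # ~700 GB per genome
```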
15. Velocity
• Sequencing cost per genome has fallen to ~$1K
• Cost to analyze a genome has also fallen to ~$5
• Why does this matter? Precision/personalized medicine involves more sequencing
• Assert: Agile increases velocity of reducing costs via shorter cycle times, cheaper reagents, reusable software, better use of people, etc.
16. Velocity – Sample preparation and sequencing
• Reduces cycle time of physical sample preparation prior to sequencing
• Improves use of people and resources in lab work
• How? Using Dynamic Work Design
• Principle #1: Constant reconciliation of intent and activity
• Principle #2: Regular use of structured problem solving
• Principle #3: Optimal challenge
• Principle #4: Connect the human chain
17. Velocity – Sample preparation and sequencing
• Genomics Platform achieves these results through better technology:
• Instruments
• Software
• Reagents
• Training
• Organization
18. Velocity – Sample preparation and sequencing
• Dynamic Work Design shares many similarities with Agile/Scrum and uses many of the same techniques:
• Visual management
• Morning production meeting
• Pull system (Kanban)
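The pull system (Kanban) mentioned above can be sketched in a few lines. This is a generic illustration, not the Genomics Platform's actual tooling; the `KanbanStation` class and its work-in-progress (WIP) limit are hypothetical:

```python
from collections import deque

class KanbanStation:
    """A single station with a backlog and a work-in-progress (WIP) limit."""

    def __init__(self, wip_limit: int):
        self.wip_limit = wip_limit
        self.backlog = deque()
        self.in_progress = []

    def add(self, item):
        """Queue new work; nothing starts until the station pulls it."""
        self.backlog.append(item)

    def pull(self):
        """Pull the next item only when there is WIP capacity; else None."""
        if len(self.in_progress) < self.wip_limit and self.backlog:
            item = self.backlog.popleft()
            self.in_progress.append(item)
            return item
        return None

    def finish(self, item):
        """Completing an item frees capacity, signaling the next pull."""
        self.in_progress.remove(item)
```

The design point is that the WIP limit, not upstream pressure, decides when new work starts: a station pulls the next sample only when it has capacity, which keeps cycle times short and bottlenecks visible.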
21. Velocity – People and resources
• PRISM for multiplexed screening of compounds against cancer cell lines (wet lab)
• Dependency Map, a public portal for cancer data (wet lab, COTS software, software development)
Agile practices used
• Retrospectives
• Standups
• Sprints
• Kaizen
• Visual board
22. Velocity – People and resources
• Improving use of people and resources in data science by enabling reuse
• Data Biosphere: modular and interoperable components that can be assembled into diverse data environments. The Data Biosphere should be based on four governing principles. It should be:
• (1) modular, composed of functional components with well-specified interfaces;
• (2) community-driven, created by many groups to foster a diversity of ideas;
• (3) open, developed under open-source licenses that enable extensibility and reuse, with users able to add custom, proprietary modules as needed;
• (4) standards-based, consistent with standards developed by coalitions such as the Global Alliance for Genomics and Health (GA4GH)
Agile values
• Deliver value
• Work together
• Self-organizing teams
• Simplicity
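The first Data Biosphere principle — modular components with well-specified interfaces — can be illustrated with a minimal sketch. The `DataStore` interface below is purely hypothetical, not a Data Biosphere or GA4GH API; it shows how a well-specified interface lets implementations (a local store, a cloud bucket, a proprietary module) be swapped without changing the surrounding environment:

```python
from abc import ABC, abstractmethod
from typing import Iterable

class DataStore(ABC):
    """A well-specified interface any storage component must implement."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def keys(self) -> Iterable[str]: ...

class InMemoryStore(DataStore):
    """A trivial implementation, standing in for a cloud-bucket module."""

    def __init__(self):
        self._data = {}

    def put(self, key, data):
        self._data[key] = data

    def get(self, key):
        return self._data[key]

    def keys(self):
        return self._data.keys()
```

Any environment written against `DataStore` works unchanged with a different backend, which is the sense in which modules become reusable and composable.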
23. Variety
• Increases types of samples being sequenced in additional dimensions, e.g.,
• Types and sources of cells
• Types of diseases
• Ethnic and geographic diversity
• Nomenclatures, APIs, and repositories
• Agile practices being applied in each case, speeding the processing of samples and the creation of both sample metadata and genomic data
24. Variety – Types and sources of cells
• Agile principles being used by Broad labs involved in the Human Cell Atlas to manage wet lab work (e.g., visual boards, retrospectives)
• Agile used to develop portals to enable patients, at scale, to sign up and consent for studies, and for sample processing
25. Variety – Ethnic and geographic diversity
• In 2016, 81% of participants in genome-wide association studies (GWAS) were of European descent, while people of African, Latin American, or native/indigenous ancestry made up less than 4%
• Agile practices used to further studies in under-represented populations (e.g., visual management, short delivery cycles)
26. Variety – Types of diseases
• Agile practices used to aid the study of a wider range of diseases, e.g.,
• The Sabeti Lab uses Agile practices in their work on infectious diseases to enable real-time sharing of genomic data
27. Variety – Nomenclatures, APIs, and repositories
• Nomenclatures are critically important to sharing data and promoting collaboration (e.g., cell lines)
• Broad scientists, both wet lab and data, are key contributors to organizations and alliances that enable and promote sharing of data through public (and coordinated) APIs
• Agile practices used by both groups in their daily work!
28. How the Broad encourages adoption of Agile
• Encourages collaboration within the Broad, e.g.,
• Platforms (e.g., Genomics, Data Sciences)
• Programs (e.g., Cancer, Infectious Disease and Microbiome)
• Academic labs (e.g., Sabeti Lab, Regev Lab)
• Employs Agile within scientific groups and administration, e.g.,
• Data Sciences Platform has Agile coaches, Scrum Masters, and Product Owners as job descriptions/titles
• Broad Information Technology Services employs Scrum for specific projects
• Supports affinity groups and offers related training
• Agile Academia, focused specifically on educating and spreading use of Agile
• PM@Broad, focused on traditional project management, but PMI is embracing Agile…
• People Development workshops (e.g., Influencing without Authority, Matrix Management)
29. Recapitulation – Thesis of this talk!
• Using Agile techniques in wet labs and computational science speeds production of big data in multiple dimensions
• Volume
• Increases number of samples sequenced
• Lowers cost of sequencing/analysis and barriers to clinical sequencing
• Velocity
• Reduces cycle time of physical sample preparation prior to sequencing
• Improves use of people and resources in lab work
• Variety
• Increases types of samples being sequenced (e.g., types of cells, diseases, ethnic and geographic diversity, nomenclatures, APIs, and repositories)
30. Acknowledgements
Thank you to the many people who helped pave the way for current and future success! A few notable individuals:
• Mark Baker
• Michelle Campo
• Jean Chang
• Raymond Coderre
• Sheila Dodge
• Vicky Guo
• Andrew Hollinger
• Eric Jones
• Jen Lapan
• Yenarae Lee
• Anthony Losada
• William Mayo
• Peter Ragone
• Jennifer Roth
• Katie Shakun
• David Siedzik
• Rocky Stroud
• Diolinda Vaz
• Sarah Winnicki
Broad Alumni
• Sadiya Akasha
• Zeyna Haddad
Editor's Notes
All citations in modified MLA format: <author>. <title of source>. <title of container>, <other contributors>, <version>, <number>, <publisher>, <publication date in format <year>, <month> <day>. Retrieved from <url> on <year>, <month> <day>.
“Manifesto for Agile Software Development”, 2001. Retrieved from http://agilemanifesto.org on 2018, May 14.
“Principles behind the Agile Manifesto”, 2001. Retrieved from http://agilemanifesto.org/principles.html on 2018, May 14
Kendra West, Zeyna Haddad. “Agile ToolKit @ Broad Workshop”, 2017, October 12.
“Principles behind the Agile Manifesto”, 2001. Retrieved from http://agilemanifesto.org/principles.html on 2018, May 14.
Kendra West, Zeyna Haddad. “Agile ToolKit @ Broad Workshop”, 2017, October 12.
“Principles behind the Agile Manifesto”, 2001. Retrieved from http://agilemanifesto.org/principles.html on 2018, May 14.
Kendra West, Zeyna Haddad. “Agile ToolKit @ Broad Workshop”, 2017, October 12.
Diego Lo Giudice, Holger Kisker, Nasry Angel. “How Can You Scale Your Agile Adoption?”, 2014, February 05. Forrester. Retrieved from https://www.forrester.com/report/How+Can+You+Scale+Your+Agile+Adoption/-/E-RES110444#AST962998%202013 on 2018, May 14.
Jeff Sutherland, J.J. Sutherland, “SCRUM The Art of Doing Twice the Work in Half the Time”, 2014. Retrieved from https://www.scruminc.com/new-scrum-the-book/ on 2018, May 14.
Kendra West, Zeyna Haddad. “Agile ToolKit @ Broad Workshop”, 2017, October 12.
The actual order is Commitment, Courage, Focus, Openness, Respect
It’s not an acrostic, it’s a mesostic.
Wikipedia. Acrostic. Retrieved from https://en.wikipedia.org/wiki/Acrostic on 2018, May 14.
Wikipedia. Mesostic. Retrieved from https://en.wikipedia.org/wiki/Mesostic on 2018, May 14.
Ken Schwaber, Jeff Sutherland. “The Scrum Guide™”, 2017, November. Retrieved from http://www.scrumguides.org/scrum-guide.html on 2018, May 14.
We are OPEN about what we’re working on and our progress.
We have COURAGE to change; to take on new challenges; to have frank conversations.
We RESPECT each other’s time; ideas; skills. We respect our customers.
We are FOCUSED on our goal; shield each other from distractions.
We COMMIT to completing our work; to delivering value to the customer.
Ken Schwaber, Jeff Sutherland. “The Scrum Guide™”, 2017, November. Retrieved from http://www.scrumguides.org/scrum-guide.html on 2018, May 14.
Doug Laney. “3D Data Management: Controlling Data Volume, Velocity, and Variety”. META Group (now Gartner Group). 2001, February 06. Retrieved from https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf on 2018, May 14.
Gartner, Inc. “Big Data”. Gartner IT Glossary > Big Data. Retrieved from https://www.gartner.com/it-glossary/big-data/ on 2018, May 14.
Seth Grimes. “4 VS for Big Data Analytics”. Breakthrough Analysis (blog). 2013, July 31. Retrieved from https://breakthroughanalysis.com/2013/07/31/4-vs-for-big-data-analytics/ on 2018, May 14.
International Business Machines Corp. “The Four V’s of Big Data”. Retrieved from http://www.ibmbigdatahub.com/infographic/four-vs-big-data on 2018, May 14.
“Dimensions of Big Data”. Klarity, Social Media Broadcasts (SMB) Limited. 2015, July 27. Retrieved from http://www.klarity-analytics.com/2015/07/27/dimensions-of-big-data/ on 2018, May 14.
Gil Press. “A Very Short History of Big Data”. Forbes Media LLC. 2013, December 21. Retrieved from https://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data/#897211d65a18 on 2018, May 14.
Steve Lohr. “The Origins of ‘Big Data’: An Etymological Detective Story”. The New York Times Company. 2013, February 01. Retrieved from https://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/?_r=0 on 2018, May 14.
University of California Santa Cruz, “List of UCSC genome releases”, “Frequently Asked Questions: Assembly Releases and Versions”. Retrieved from https://genome.ucsc.edu/FAQ/FAQreleases.html#release1 on 2018, May 14.
You can skip the rest of the talk if you get this
BAM files are 80–90 GB each
Just establishing Broad’s Big Data credentials in terms of size
Zachary D. Stephens, Skylar Y. Lee, Faraz Faghri, et al. “Big Data: Astronomical or Genomical?”, PLOS Biology, 2015, July 15. Retrieved from http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195 on 2018, May 14.
“DNA Sequencing Costs: Data”, National Human Genome Research Institute. Retrieved from https://www.genome.gov/sequencingcostsdata/ on 2018, May 14.
“Cost per Genome”, National Human Genome Research Institute. Retrieved from https://www.genome.gov/images/content/costpermb_2017.jpg on 2018, May 14.
Sheila Dodge, Don Kieffer, Nelson Repenning, et al. “Using Dynamic Work Design to Help Cure Cancer (and other diseases)”, MIT Sloan School of Management, 2016, June. Retrieved from http://mitsloan.mit.edu/shared/ods/documents/Repenning_Cancer_full.pdf&PubID=15032 on 2018, May 14.
MIT Sloan Executive Education. “Speeding the cure for cancer: Financial engineering and Dynamic Work Design”, 2017, August 25. Retrieved from https://executive.mit.edu/blogpost/speeding-the-cure-for-cancer-financial-engineering-and-dynamic-work-design on 2018, May 14.
Alix Stuart, “From Cogs to Creators: Fueling employee engagement with dynamic work design”, Alumni Magazine, 2016. Retrieved from http://mitsloan.mit.edu/alumnimagazine/2016/fall/innovation-at-work.php on 2018, May 14.
Sheila Dodge, et al. “Using Dynamic Work Design to Help Cure Cancer (and other diseases)”, 2016, June. Retrieved from mitsloan.mit.edu/shared/ods/documents/Repenning_Cancer_full.pdf&PubID=15032 on 2018, May 14.
MIT Sloan Executive Education. “Speeding the cure for cancer: Financial engineering and Dynamic Work Design”, 2017, August 25. Retrieved from https://executive.mit.edu/blogpost/speeding-the-cure-for-cancer-financial-engineering-and-dynamic-work-design on 2018, May 14.
Alix Stuart, “From Cogs to Creators: Fueling employee engagement with dynamic work design”, Alumni Magazine, 2016. Retrieved from http://mitsloan.mit.edu/alumnimagazine/2016/fall/innovation-at-work.php on 2018, May 14.
Sheila Dodge, et al. “Using Dynamic Work Design to Help Cure Cancer (and other diseases)”, 2016, June. Retrieved from mitsloan.mit.edu/shared/ods/documents/Repenning_Cancer_full.pdf&PubID=15032 on 2018, May 14.
MIT Sloan Executive Education. “Speeding the cure for cancer: Financial engineering and Dynamic Work Design”, 2017, August 25. Retrieved from https://executive.mit.edu/blogpost/speeding-the-cure-for-cancer-financial-engineering-and-dynamic-work-design on 2018, May 14.
Alix Stuart, “From Cogs to Creators: Fueling employee engagement with dynamic work design”, Alumni Magazine, 2016. Retrieved from http://mitsloan.mit.edu/alumnimagazine/2016/fall/innovation-at-work.php on 2018, May 14.
Achilles. Retrieved from https://portals.broadinstitute.org/achilles on 2018, May 14.
PRISM. Retrieved from https://www.broadinstitute.org/news/7944 on 2018, May 14.
Dependency Map. Retrieved from https://depmap.org/portal/ on 2018, May 14.
Jamie Ducharme, “Local Researchers Mapped the Many Ways Cancer Cells Dodge Death”, Boston Magazine, 2017, July 31. Retrieved from https://www.bostonmagazine.com/health/2017/07/31/broad-cancer-dependency-map/ on 2018, May 14.
Benedict Paten, et al. “A Data Biosphere for Biomedical Research”, Medium, 2017, Oct 16. Retrieved from https://medium.com/@benedictpaten/a-data-biosphere-for-biomedical-research-d212bbfae95d on 2018, May 14.
Human Cell Atlas. Retrieved from https://www.humancellatlas.org on 2018, May 14.
Metastatic Prostate Cancer Project. Retrieved from https://mpcproject.org/home on 2018, May 14.
Emily Mullin. “Solving the Lack of Diversity in Genomic Research”, MIT Technology Review, 2016, October 25. Retrieved from https://www.technologyreview.com/s/602671/solving-the-lack-of-diversity-in-genomic-research/ on 2018, May 14.
Heather Lindsey, “Bringing Diversity to Genomic Data: Under-Represented Ethnic Minorities Sometimes Misclassified, Misdiagnosed”, Clinical Laboratory News, 2017, June 1. Retrieved from https://www.aacc.org/publications/cln/articles/2017/june/bringing-diversity-to-genomic-data-under-represented-ethnic-minorities on 2018, May 14.
Zhai Yun Tan. “Genetic test accuracy stymied by lack of diversity in genomic research”, MedCityNews, 2016, August 18. Retrieved from https://medcitynews.com/2016/08/ack-of-diversity-in-genomic-research/?rf=1 on 2018, May 14.
NeuroGAP-Psychosis. Retrieved from https://www.broadinstitute.org/neurogap/neurogap-psychosis on 2018, May 14.
“Sherlock: Detecting disease with CRISPR”. Retrieved from https://www.broadinstitute.org/videos/sherlock-detecting-disease-crispr on 2018, May 14.
Sabeti Lab. Retrieved from https://www.sabetilab.org/ on 2018, May 14.
PRISM. Retrieved from https://www.broadinstitute.org/news/7944 on 2018, May 14.
Global Alliance for Genomics & Health. Retrieved from https://www.ga4gh.org/ on 2018, May 14.
Genomic Data Commons. Retrieved from https://www.cancer.gov/about-nci/organization/ccg/research/computational-genomics/gdc on 2018, May 14.
Horia Slusanschi, “Introducing the PMI Agile Practice Guide”, Projectmanagement.com, 2017, May 21. Retrieved from https://www.projectmanagement.com/blog-post/29761/Introducing-the-PMI-Agile-Practice-Guide on 2018, May 14.