Leveraging the cloud to transform and
streamline informatics processes
Laboratory Informatics Summit
December 5, 2017
Chris Dwan (chris@dwan.org)
Traveller there is no path. The path is made by walking.
Antonio Machado
Conclusions
Nobody cares about the cloud
People care about business, scientific, and clinical outcomes
“Cloud” is a means to an end. Nothing more.
Enterprise CIO
Sr. Director, Research IT
The future is already here – it’s just not very well distributed
William Gibson
My Cloud Journey
1998: Army Research Lab
– Java framework to distribute a target recognition workflow across multiple DoD research sites
2002: Minnesota Center for Computational Biology and Genomics
– Campus-wide “grid” unifying three compute clusters to run BLAST analyses for crop genomics
2008: BioTeam
– “Inquiry” HPC product ported to AWS
– My first real “Infrastructure as code” moment
2012: New York Genome Center
– Work to make a new genome center “cloud ready” (though limited initial adoption)
2014 – 2017: Broad Institute of MIT and Harvard
– Transition production genomics workflows to Google’s cloud
Geek Cred: My First Petabyte,
2008
2012: On-premise petabytes are no longer so interesting to me
Genomic Data Production in Context
Genomic data production @ Broad
I joined the Broad in 2014
Caveat: This plot looked very similarly scary back in 2007
My first Exabyte: 2014
Genomes on the Cloud (April 2016)
Testing the genome analysis pipeline
“Go-live”
8 months in the cloud
“If you aim for simplicity, master complexity.”
The Mustard Seed Garden Manual of Painting, 1679
Senior leadership and “cloud”
Removes a major support burden from in-house staff
Vastly simplified licensing and budget planning
Automatic technology updates rather than annual fire-drills
Unlimited scale, no more forklift upgrades
Products are familiar to the end-user rather than opaque technology
What is the cloud?
“Amazon Web Services is the cloud”*
Chris Dagdigian
Bio-IT World, November 2009
* He has revised this opinion in the last 8 years
What is the cloud?
“Cloud computing is a model for enabling ubiquitous, convenient,
on-demand network access to a shared pool of configurable computing
resources (e.g., networks, servers, storage, applications, and services)
that can be rapidly provisioned and released with minimal
management effort or service provider interaction.”
NIST Special Publication 800-145
Pizza as a Service
Homemade: On-Premises (legacy!)
Take and Bake: Infrastructure as a Service (IaaS)
Delivery: Platform as a Service (PaaS)
Restaurant: Software as a Service (SaaS)
Layers shown for each option: Cheese, Tomato Sauce, Pizza Dough, Fire, Oven, Electricity / Gas, Drinks, Table
Each layer is marked either “You Manage” or “Vendor Manages”.
Credit: Everybody on the Internet.
Cloud based killer apps
• Team chat / messaging: Slack, Skype, HipChat, …
• File sharing: OneDrive, Dropbox, Box, Egnyte, Google Drive, …
• Video conferencing: Zoom, Chime, Skype, Hangouts, …
• Office productivity: G Suite, Office 365
• Databases: Both SQL and NoSQL
Maslow’s Hierarchy of Needs
Creativity, Purpose
Confidence, achievement
Friendship, connectedness, belonging
Safety, physical and economic stability
Air, food, shelter, sleep
Wireless Internet, Fully charged battery
If you lack this, you don’t get to engage here
IT Hierarchy of Needs
“Thought Partner”
Automation and compliance
Productivity and Security, Applications, disaster preparedness
Files, formats, naming conventions, access controls
Phones, Projectors, Internet, Email, Chat
Power, Building Access, Laptops, Wifi, Identity
If you lack this, you don’t get to engage here
Cloud Hosted Legacy Architecture
Diagram labels: Office, Colocated Data Center, AWS US-East-2, Active Directory Master, Sysadmin Team, Data Center Team, ServerFarm, Silos of POSIX Storage
ALL NEW! 70% MORE CLOUD!
Merely virtualizing your infrastructure provides none of the executive level benefits of “cloud”:
– Removes a major support burden from in-house staff
– Vastly simplified licensing and budget planning
– Automatic technology updates rather than annual fire-drills
What about the data?
Elasticity
Compute:
– Wal-mart parking lot
– Spiky, unpredictable demand
– Elasticity in compute is capacity
– For variable compute needs and agility, cloud compute is a
slam-dunk.
Data:
– Grows without bound
– Elasticity in data is mobility and latency
– Egress charges and lock-in present a structural challenge
for cloud as a long term data storage strategy.
The right side of history
• Applications are containerized (Docker, Singularity)
• Data is accessed RESTfully (S3)
• Identity management is federated (OAuth2, …)
• Analytics are ubiquitous (HDFS / Spark)
• Public clouds (AWS, GCP, Azure) provide flexible commodity infrastructure and surge capacity
• Data flow operations adopt serverless architectures (Lambda)
• Technologists are embedded in project teams (DevOps / staff rotations)
This is a multi-year journey.
Start today.
The opposite of play is not work, it’s depression
Jane McGonigal, Reality is Broken
Financial Governance
$$ !!
Financial Controls
• Shifting from CapEx to OpEx can put spending
power in the hands of individual contributors,
with little to no oversight.
• Cloud providers have robust tools for setting
and tracking budgets, but you must use them.
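A minimal sketch of the second point, assuming a hypothetical per-team spend export. The `Budget` record, the team names, and the 80% warning level are all illustrative assumptions; this stands in for, and does not imitate, any provider's own budgeting tools.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    owner: str            # team or project (hypothetical names below)
    monthly_limit: float  # USD; 0 means nobody ever set a budget
    spent: float          # USD spent so far this month


def budget_alerts(budgets, warn_fraction=0.8):
    """Return (owner, status) pairs for every budget that needs attention."""
    alerts = []
    for b in budgets:
        if b.monthly_limit <= 0:
            alerts.append((b.owner, "no budget set"))
        elif b.spent >= b.monthly_limit:
            alerts.append((b.owner, "over budget"))
        elif b.spent >= warn_fraction * b.monthly_limit:
            alerts.append((b.owner, "warning"))
    return alerts


if __name__ == "__main__":
    sample = [
        Budget("sequencing-pipeline", 10000.0, 9000.0),
        Budget("imaging", 5000.0, 1200.0),
        Budget("sandbox", 0.0, 430.0),
    ]
    for owner, status in budget_alerts(sample):
        print(f"{owner}: {status}")
```

Treating a missing budget as an alert in its own right reflects the slide's concern: spending with no oversight is the failure mode, not just overspending.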
Data Deletion @ Scale
Me: “Blah Blah … I think we’re cool to delete about 600TB of data from a cloud bucket. What do you think?”
Ray: “BOOM!”
• This was my first deliberate data deletion at this scale.
• It scared me how fast / easy it was.
• Look for single accounts / roles that can destroy everything.
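One way to act on the last bullet, sketched under stated assumptions: the `grants` mapping below is a hypothetical export of who may do what to which bucket, not any provider's real policy format. The function flags every principal whose grants add up to delete rights on all buckets.

```python
def destructive_principals(grants, all_buckets):
    """Return the principals able to delete from every bucket.

    grants: dict of principal -> list of (bucket, action) pairs, where
    "*" is a wildcard for either field (an assumed, simplified format).
    """
    everything = set(all_buckets)
    risky = []
    for principal, pairs in grants.items():
        deletable = set()
        for bucket, action in pairs:
            if action not in ("delete", "*"):
                continue
            if bucket == "*":
                deletable = set(everything)
                break
            deletable.add(bucket)
        # Only flag when there is something to destroy and all of it is reachable.
        if everything and deletable >= everything:
            risky.append(principal)
    return sorted(risky)


if __name__ == "__main__":
    grants = {
        "admin": [("*", "*")],
        "archiver": [("raw", "write"), ("raw", "delete")],
        "cleanup": [("raw", "delete"), ("derived", "delete")],
        "reader": [("raw", "read")],
    }
    print(destructive_principals(grants, ["raw", "derived"]))
```

Note that `cleanup` is flagged even though it holds no wildcard: several narrow grants can add up to the same blast radius as one broad one.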
Identity and Authorization
Compliance and Security
Compliance:
– Things have changed a lot since 2014.
– All major cloud providers will now sign a BAA and share liability
– All major cloud providers can now support HIPAA, HITECH, FISMA, and other audit standards
Security:
– Cloud based systems can be substantially more secure than on premises.
– They can also be substantially less secure.
Premature optimization is the root of all evil (or at least most of it)
Donald Knuth – Computer Programming as an Art, 1974
Specific Recommendations
Do not waste time on IaaS vendor bake-offs.
– Choose one (GCP, AWS, Azure) based on in-house expertise and enterprise relationships.
Do not expect “cloud” to make things simpler or cheaper on day one.
– There will be substantial work to deploy any useful “as a service” product for your particular process.
Hosted legacy doesn’t cut it.
– Achieving the benefit of cloud technologies will require you to re-architect your legacy systems and re-tool your development / deployment processes.
Trust the lab, seriously.
– If they cling to Excel, it means that Excel is better from their perspective.
– Ask them. They do not care about the cloud.
When in doubt, focus on the basics. Don’t overthink it.
If you have four groups working on a compiler, you’ll get a four pass
compiler
Eric S Raymond, The New Hacker’s Dictionary, 1996
Day One Commitments
Centralize Identity:
Integrate AD / Centrify / Okta.
Yes, the lab account too.
Roles, not Individuals:
You will eventually have to clean it up
Automate your archives:
Unless it’s sequencing or imaging, dump it all to S3.
1TB on full fare S3 is $25/month. Don’t optimize yet.
Capture Metadata:
Scrape headers and whatever you can find into a simple database (NoSQL is fine)
Include links to the S3 archive.
Curate:
Establish a regular meeting to review data architecture and cloud costs.
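A rough sketch of the “Capture Metadata” commitment, with several assumptions: the bucket name `example-lab-archive` is hypothetical, the “header” is simply the first line of each file, and SQLite holding one JSON document per file stands in for whatever simple (possibly NoSQL) store you prefer. The cost helper reuses the $25 per TB per month figure quoted above; it is an estimate, not a price list. No upload happens here; the sketch only records where each file would live in the archive.

```python
import json
import sqlite3
from pathlib import Path

ARCHIVE_BUCKET = "example-lab-archive"  # hypothetical bucket name
USD_PER_TB_MONTH = 25.0                 # figure quoted on the slide


def scrape_record(path, root):
    """Build one metadata document: name, size, first line, archive link."""
    rel = path.relative_to(root).as_posix()
    with path.open("r", errors="replace") as fh:
        header = fh.readline().rstrip("\n")
    return {
        "name": path.name,
        "bytes": path.stat().st_size,
        "header": header,
        "s3_uri": f"s3://{ARCHIVE_BUCKET}/{rel}",
    }


def build_catalog(root, db_path=":memory:"):
    """Walk root and store one JSON document per file in a SQLite table."""
    root = Path(root)
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS catalog (s3_uri TEXT PRIMARY KEY, doc TEXT)"
    )
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rec = scrape_record(path, root)
        conn.execute(
            "INSERT OR REPLACE INTO catalog VALUES (?, ?)",
            (rec["s3_uri"], json.dumps(rec)),
        )
    conn.commit()
    return conn


def estimated_monthly_cost(conn):
    """Estimate storage cost in USD from the total cataloged bytes."""
    rows = conn.execute("SELECT doc FROM catalog")
    total_bytes = sum(json.loads(doc)["bytes"] for (doc,) in rows)
    return total_bytes / 1e12 * USD_PER_TB_MONTH
```

Keying the table on the archive link means re-running the scrape updates records in place, which suits a job that runs on a schedule alongside the archive itself.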
This stuff is important
We have an opportunity to change lives and health
outcomes, and to realize the gains of genomic medicine,
this year.
We also have an opportunity to waste vast amounts of
money and still not really help the world.
I would like to work together with you to build a better
future, sooner.
chris@dwan.org
Thank You
chris@dwan.org
https://dwan.org
