2015 04 bio it world

  1. 1. Research Computing@Broad An Update: Bio-IT World Expo, April 2015 Chris Dwan (cdwan@broadinstitute.org) Director, Research Computing and Data Services
  2. 2. Take Home Messages • Go ahead and do the legwork to federate your environment to at least one public cloud. – It’s “just work” at this point. • Spend a lot of time understanding your data lifecycle, then stuff the overly bulky 95% of it in an object store fronted by a middleware application. – The latency sensitive, constantly used bits fit in RAM • Human issues of consent, data privacy, and ownership are still the hardest part of the picture. – We must learn to work together from a shared, standards based framework – The time is now.
  3. 3. The world is quite ruthless in selecting between the dream and the reality, even where we will not. Cormac McCarthy, All the Pretty Horses
  4. 4. • The Broad Institute is a non-profit biomedical research institute founded in 2004 • Fifty core faculty members and hundreds of associate members from MIT and Harvard • ~1000 research and administrative personnel, plus ~2,400+ associated researchers Programs and Initiatives focused on specific disease or biology areas Cancer Genome Biology Cell Circuits Psychiatric Disease Metabolism Medical and Population Genetics Infectious Disease Epigenomics Platforms focused on technological innovation and application Genomics Therapeutics Imaging Metabolite Profiling Proteomics Genetic Perturbation The Broad Institute
  5. 5. • The Broad Institute is a non-profit biomedical research institute founded in 2004 • Twelve core faculty members and more than 200 associate members from MIT and Harvard • ~1000 research and administrative personnel, plus ~1000 associated researchers Programs and Initiatives focused on specific disease or biology areas Cancer Genome Biology Cell Circuits Psychiatric Disease Metabolism Medical and Population Genetics Infectious Disease Epigenomics Platforms focused on technological innovation and application Genomics Therapeutics Imaging Metabolite Profiling Proteomics Genetic Perturbation The Broad Institute 60+ Illumina HiSeq instruments, including 14 ‘X’ sequencers 700,000+ genotyped samples ~18PB unique data / ~30PB usable file storage
  6. 6. The HPC Environment Shared Everything: A reasonable architecture Network • ~10,000 Cores of Linux servers • 10Gb/sec ethernet backplane. • All storage is available as files (NAS) from all servers.
  7. 7. Monitoring and metrics Matt Nicholson joins the Broad Chris Dwan joins the Broad
  8. 8. Monitoring and metrics Matt Nicholson joins the Broad Gradual puppetization Chris Dwan joins the Broad
  9. 9. Monitoring and metrics Matt Nicholson joins the Broad Gradual puppetization: increased visibility We’re pretty sure that we actually have ~15,000 cores Chris Dwan joins the Broad
  10. 10. Metal Boot Image Provisioning (PXE / Cobbler, Kickstart) Hardware Provisioning (UCS, Xcat) Broad specific system configuration (Puppet) User or execution environment (Dotkit, docker, JVM, Tomcat) Bare Metal OS and vendor patches (Red Hat / yum, plus satellite) Private Cloud Public Cloud Containerized Wonderland Management Stack: Bare Metal Network topology (VLANS, et al) Many specific technical decisions do not matter, so long as you choose something and make it work (Dagdigian, 2015)
  11. 11. Shared Everything: Ugly reality 10 Gb/sec Network • At least six discrete compute farms running at least five versions of batch schedulers (LSF and SGE) • Nodes “shared” by mix-and-match between owners. • Nine Isilon clusters • Five Infinidat filers • ~19 distinct storage technologies. Genomics Platform Cancer Program Shared “Farm” Overlapping usage = Potential I/O bottleneck when multiple groups are doing heavy analysis
  12. 12. Metal Boot Image Provisioning (PXE / Cobbler, Kickstart) Hardware Provisioning (UCS, Xcat) Broad specific configuration (Puppet) User or execution environment (Dotkit, docker, JVM, Tomcat) Hypervisor OS Instance Provisioning (Openstack) Bare Metal OS and vendor patches (Red Hat / yum, plus satellite) Private Cloud Public Cloud Containerized Wonderland Configuration Stack: Now with Private Cloud! Network topology (VLANS, et al) Re-use everything possible from the bare metal environment. While inserting things that make our life easier.
  13. 13. Openstack (RHEL, Icehouse) Openstack@Broad: The least cloudy cloud. 10 Gb/sec Network Genomics Platform Cancer Program Shared “Farm” Least “cloudy” implementation possible. IT / DevOps staff as users Simply virtualizing and abstracting away hardware from user facing OS. Note that most former problems remain intact Incrementally easier to manage with very limited staff (3 FTE Linux admins).
  14. 14. Openstack monitoring / UI is primitive at best
  15. 15. Openstack: open issues Excluded from our project: – Software defined networking (Neutron) – “Cloud” storage (Cinder / Swift) – Monitoring / Billing (Ceilometer, Heat) – High Availability on Controllers Custom: – Most deployment infrastructure / scripting, including DNS – Network encapsulation – Active Directory integration – All core systems administration functions Core Message: – Do not change both what you do and how you do it at the same time. – Openstack could have been a catastrophe without rather extreme project scoping.
  16. 16. I need “telemetry,” rather than logs*. Jisto: Software startup with smart, smart monitoring and potential for containerized cycle harvesting à la Condor. *Logs let you know why you crashed. Telemetry lets you steer.
  17. 17. Trust no one: This machine was not actually the hypervisor of anything.
  18. 18. Broad Institute Firewall NAS Filers Compute Internet Internet 2 Edge Router Router We need elastic computing, a more cloudy cloud.
  19. 19. Broad Institute Firewall NAS Filers Compute Internet Internet 2 Edge Router Router Subnet: 10.200.x.x Domain: openstack.broadinstitute.org Hostname: tenant-x-x Starting at the bottom of the stack There is no good answer to the question of “DNS in your private cloud” Private BIND domains are your friend The particular naming scheme does not matter. Just pick a scheme
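To make the “just pick a scheme” point concrete, here is a minimal sketch (Python 3.5+) that derives a deterministic hostname and the matching BIND A/PTR records from a tenant name and a 10.200.x.x address. The tenant-number-number convention mirrors the slide, but the function and zone handling are illustrative, not Broad's actual tooling.

```python
# Minimal sketch: derive a deterministic hostname and BIND records for a
# private-cloud instance from its tenant name and 10.200.x.x address.
# The naming convention here is illustrative, not Broad's actual scheme.
import ipaddress

PRIVATE_DOMAIN = "openstack.broadinstitute.org"

def dns_records(tenant: str, ip: str):
    addr = ipaddress.ip_address(ip)
    if addr not in ipaddress.ip_network("10.200.0.0/16"):
        raise ValueError(f"{ip} is outside the private cloud subnet")
    # e.g. tenant "cancer", 10.200.3.17 -> cancer-3-17.openstack.broadinstitute.org
    third, fourth = str(addr).split(".")[2:]
    fqdn = f"{tenant}-{third}-{fourth}.{PRIVATE_DOMAIN}"
    a_record = f"{fqdn}. IN A {ip}"
    ptr_record = f"{addr.reverse_pointer}. IN PTR {fqdn}."
    return a_record, ptr_record

if __name__ == "__main__":
    for rec in dns_records("cancer", "10.200.3.17"):
        print(rec)
```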
  20. 20. Broad Institute Firewall NAS Filers Compute Internet Internet 2 Edge Router Router Amazon VPC VPN Endpoint VPN Endpoint More Compute! Subnet: 10.200.x.x Domain: openstack.broadinstitute.org Hostname: tenant-x-x Subnet: 10.199.x.x Domain: aws.broadinstitute.org Network Engineering: You don’t have to replace everything.
  21. 21. Broad Institute Firewall NAS Filers Compute Internet Internet 2 Edge Router Router Amazon VPC VPN Endpoint VPN Endpoint More Compute! Subnet: 10.200.x.x Domain: openstack.broadinstitute.org Hostname: tenant-x-x Subnet: 10.199.x.x Domain: aws.broadinstitute.org Ignore issues of latency, network transport costs, and data locality for the moment. We’ll get to those later. Differentiate Layer 2 from Layer 3 connectivity. We are not using Amazon Direct Connect. We don’t need to, because AWS is routable via Internet 2. Network Engineering Makes Everything Simpler
  22. 22. Physical Network Layout: More bits! Markley Data Centers (Boston) Internap Data Centers (Somerville) Broad: Main St. Broad: Charles St. Between data centers: • 80 Gb/sec dark fiber Internet: 1 Gb/sec 10 Gb/sec Internet 2 10 Gb/sec 100 Gb/sec Failover Internet: 1 Gb/sec Metro Ring: • 20 Gb/sec dark fiber
  23. 23. My Metal Boot Image Provisioning (PXE / Cobbler, Kickstart) Hardware Provisioning (UCS, Xcat) Broad specific configuration (Puppet) User or execution environment (Dotkit, docker, JVM, Tomcat) Hypervisor OS Instance Provisioning (Openstack) Bare Metal OS and vendor patches (Red Hat / yum, plus satellite) Private Cloud Public Cloud Containerized Wonderland Configuration Stack: Private Hybrid Cloud! Network topology (VLANS, et al) Public Cloud Infrastructure Instance Provisioning (CycleCloud)
  24. 24. CycleCloud provides straightforward, recognizable cluster functionality with autoscaling and a clean management UI. Do not be fooled by the 85-page “quick start guide”; it’s just a cluster.
  25. 25. A social digression on cloud resources • Researchers are generally: – Remarkably hardworking – Responsible, good stewards of resources – Not terribly engaged with IT strategy These good character traits present social barriers to cloud adoption Researchers Need – Guidance and guard rails. – Confidence that they are not “wasting” resources – A sufficiently familiar environment to get started
  26. 26. Multiple Public Clouds Openstack Batch Compute Farm: 2015 Edition Production Farm Shared Research Farm Two clusters, running the same batch scheduler (Univa’s Grid Engine). Production: Small number of humans operating several production systems for business critical data delivery. Research: Many humans running ad-hoc tasks.
  27. 27. Multiple Public Clouds Openstack End State: Compute Clusters Production Farm Shared Research Farm A financially inelastic portion of the clusters is governed by traditional fairshare scheduling. Fairshare allocations change slowly (month to month) based on conversation, investment, and discussion of both business and emotional factors. This allows consistent budgeting, dynamic exploration, and ad-hoc use without fear or guilt.
  28. 28. Multiple public clouds support auto-scaling queues for projects with funding and urgency Openstack plus public clouds provides a consistent capacity End State: Compute Clusters Production Farm Shared Research Farm On a project basis, funds can be allocated for truly elastic burst computing. This allows business logic to drive delivery based on investment. A financially inelastic portion of the clusters is governed by traditional fairshare scheduling. Fairshare allocations are changed slowly (month to month, perhaps) based on substantial conversation, investment, and discussion of both business logic and feelings. This allows consistent budgeting, dynamic exploration, and ad-hoc use without fear or guilt.
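As a rough illustration of the policy on these two slides, here is a hedged sketch of the burst decision: unfunded work stays in the fixed fairshare pool, while a project with an explicit burst budget gets only as many cloud nodes as its backlog and its funds justify. The Project fields, thresholds, and cost figures are hypothetical, not Broad's scheduler configuration.

```python
# Sketch of the burst policy described above: the on-premise fairshare pool
# is fixed; a funded project may add cloud nodes when its backlog justifies
# the spend. All thresholds and Project fields are hypothetical.
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    pending_core_hours: float    # work waiting in this project's queue
    burst_budget_dollars: float  # funds explicitly set aside for cloud burst
    cost_per_core_hour: float    # blended cloud price for this workload

def burst_nodes(project: Project, core_hours_per_node: float = 8.0,
                max_nodes: int = 200) -> int:
    """How many cloud nodes to request for a funded project right now."""
    if project.burst_budget_dollars <= 0:
        return 0  # unfunded work stays in the fixed fairshare pool
    affordable = project.burst_budget_dollars / (
        project.cost_per_core_hour * core_hours_per_node)
    needed = project.pending_core_hours / core_hours_per_node
    return int(min(affordable, needed, max_nodes))

print(burst_nodes(Project("exome-reprocess", 5000, 2000.0, 0.05)))  # -> 200
```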
  29. 29. Broad Institute Amazon VPC The term I’ve heard for this is “intercloud.” End State: Multiple Interconnected Public Clouds for collaboration Google Cloud Sibling Institutions Long term goal: • Seamless collaboration inside and outside of Broad • With elastic compute and storage • With little or no copying of files or ad-hoc, one-off hacks
  30. 30. My Metal Boot Image Provisioning (PXE / Cobbler, Kickstart) Hardware Provisioning (UCS, Xcat) Broad configuration (Puppet) User or execution environment (Dotkit, docker, JVM, Tomcat) Hypervisor OS Instance Provisioning (Openstack) Bare Metal End User visible OS and vendor patches (Red Hat, plus satellite) Private Cloud Public Cloud Containerized Wonderland Configuration Stack: Now with containers! Network topology (VLANS, et al) Public Cloud Infrastructure Instance Provisioning (CycleCloud) ??? … Docker / Mesos Kubernetes / Cloud Foundry / Common Workflow Language / …
  31. 31. What about the data?
  32. 32. Scratch Space: “Pod local,” SSD filers 10 Gb/sec Network 80+ Gb/sec Network Scratch Space: • 3 x 70TB filers from Scalable Informatics. • Running a relative of Lustre • Over multiple 40Gb/sec interfaces • Managed using hostgroups, workload affinities, and an attentive operations team Openstack Production Farm
  33. 33. For small data: Lots of SSD / Flash • Unreasonable requirement: Make it impossible for spindles to be my bottleneck – 8 GByte per second throughput • Multiple quotes with fewer than 8 x 10Gb/sec ports – ~100 TByte usable capacity • I am not asking about large volume storage • Give me sustainable pricing. – On a NAS style file share • Raw SAN / block device / iSCSI is not the deal. • Solution: Scalable Informatics “Unison” filers – A lot of vendors failed basic sanity checks on this one. – Please listen carefully when I state requirements. I do not believe in single monolithic solutions anymore.
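For the record, the arithmetic behind the “fewer than 8 x 10Gb/sec ports” sanity check is simple; a quick worked check:

```python
# Quick check of the scratch-filer requirement above: 8 GByte/sec of
# sustained throughput cannot fit through fewer than seven 10 Gb/sec ports,
# even before any protocol overhead.
required_gbyte_per_sec = 8
required_gbit_per_sec = required_gbyte_per_sec * 8               # 64 Gb/s
port_gbit_per_sec = 10
ports_needed = -(-required_gbit_per_sec // port_gbit_per_sec)    # ceiling division -> 7
print(f"{required_gbit_per_sec} Gb/s needs at least {ports_needed} x 10 Gb/s ports")
```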
  34. 34. Caching edge filers for shared references 10 Gb/sec Network 80+ Gb/sec Network Scratch Space: • 3 x 70TB filers from Scalable Informatics. • Workload managed by hostgroups, workload affinities, and an attentive operations team Openstack Production Farm Avere Edge Filer (physical) On premise data stores Shared Research Farm Coherence on small volumes of files provided by a combination of clever network routing and Avere’s caching algorithms.
  35. 35. Plus caching edge filers for shared references 10 Gb/sec Network 80+ Gb/sec Network Scratch Space: • 3 x 70TB filers from Scalable Informatics. • Workload managed by hostgroups, workload affinities, and an attentive operations team Openstack Production Farm Multiple Public Clouds Avere Edge Filer (physical) On premise data stores Cloud backed data stores Shared Research Farm Coherence on small volumes of files provided by a combination of clever network routing and Avere’s caching algorithms.
  36. 36. Plus caching edge filers for shared references 10 Gb/sec Network 80+ Gb/sec Network Scratch Space: • 3 x 70TB filers from Scalable Informatics. • Workload managed by hostgroups, workload affinities, and an attentive operations team Openstack Production Farm Multiple Public Clouds Avere Edge Filer (physical) On premise data stores Avere Edge Filer (virtual) Cloud backed data stores Shared Research Farm Coherence on small volumes of files provided by a combination of clever network routing and Avere’s caching algorithms.
  37. 37. Geek Cred: My First Petabyte, 2008
  38. 38. Cool thing: Avere • Avere sells software with optional hardware: • NFS front end whose block-store is an S3 bucket. • It was born as a caching accelerator, and it does that well, so the network considerations are in the right place. • Since the hardware is optional …
  39. 39. Cool thing: Avere • Avere sells software with optional hardware: • NFS front end whose block-store is an S3 bucket. • It was born as a caching accelerator, and it does that well, so the network considerations are in the right place. • Since the hardware is optional … • An NFS share that bridges on-premise and cloud.
  40. 40. But what about the big data?
  41. 41. Broad Data Production, 2015: ~100TB /wk Data production will continue to grow year over year We can easily keep up with it, if we adopt appropriate technologies. 100 TB/wk ~= 1.3 Gb/sec, but 1 PB @ 1 GB/sec ~= 12 days.
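The bandwidth arithmetic on this slide, spelled out as a small check of the two approximations:

```python
# The arithmetic behind the slide: steady-state production is modest in
# Gb/sec terms, but moving a petabyte in one lump is still slow.
SECONDS_PER_WEEK = 7 * 24 * 3600
weekly_bytes = 100e12                      # 100 TB of new data per week
steady_gbit_per_sec = weekly_bytes * 8 / SECONDS_PER_WEEK / 1e9
print(f"100 TB/week ~= {steady_gbit_per_sec:.1f} Gb/sec")          # ~1.3

petabyte_bytes = 1e15
days_at_1_gbyte_per_sec = petabyte_bytes / 1e9 / 86400
print(f"1 PB at 1 GB/sec ~= {days_at_1_gbyte_per_sec:.0f} days")   # ~12
```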
  42. 42. Broad Data Production, 2015: ~100TB /wk of unique information “Data is heavy: It goes to the cheapest, closest place, and it stays there” Jeff Hammerbacher
  43. 43. Data Sizes for one 30x Whole Genome • Base calls from a single lane of an Illumina HiSeq X (~95 GB): approximately the coverage required for 30x on a whole human genome; the record of a laboratory event; totally immutable; almost never directly used. • Aligned reads from that same lane (~60 GB): substantially better compression because of putting like with like. • Aggregated, topped up, and re-normalized BAM (~145 GB): near doubling in file size because of multiple quality scores per base. • Variant File (VCF) and other directly usable formats: tiny, and even smaller when we cast the distilled information into a database of some sort.
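A quick tally of the artifact sizes above; the VCF figure is a placeholder, since the slide only calls it “tiny”:

```python
# Rough per-genome storage footprint, using the sizes from the slide above.
# The VCF size is a placeholder; the slide only calls it "tiny".
artifacts_gb = {
    "raw base calls (one HiSeq X lane)": 95,
    "aligned reads from that lane": 60,
    "aggregated / re-normalized BAM": 145,
    "VCF and other distilled formats": 1,   # assumption: order of a GB
}
total_gb = sum(artifacts_gb.values())
print(f"~{total_gb} GB retained per 30x genome if every stage is kept")
```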
  44. 44. Under the hood: ~1TB of MongoDB
  45. 45. And now for something completely different
  46. 46. File based storage: The Information Limits • Single namespace filers hit real-world limits at: – ~5PB (restriping times, operational hotspots, MTBF headaches) – ~10^9 files: Directories must either be wider or deeper than human brains can handle. • Filesystem paths are presumed to persist forever – Leads inevitably to forests of symbolic links • Access semantics are inadequate for the federated world. – We need complex, dynamic, context sensitive semantics including consent for research use.
  47. 47. We’re all familiar with this
  48. 48. Limits of File Based Organization
  49. 49. Limits of File Based Organization • The fact that whatever.bam and whatever.log are in the same directory implies a vast amount about their relationship. • The suffixes “bam” and “log” are also laden with meaning • That implicit organization and metadata must be made explicit in order to transcend the boundaries of file based storage
  50. 50. Limits of File Based Organization • Broad hosts genotypes derived from perhaps 700,000 individuals • These genotypes are organized according to a variety of standards (~1,000 cohorts), and are spread across a variety of filesystems • Metadata about consent, phenotype, etc. is scattered across dozens to hundreds of “databases.”
  51. 51. Limits of File Based Organization • This lack of organization is holding us back from: • Collaboration and Federation between sibling organizations • Substantial cost savings using policy based data motion • Integrative research efforts • Large scale discoveries that are currently in reach
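One way to read this run of slides is that the facts a path and a suffix carry implicitly have to become an explicit record before the data can leave the filesystem. A minimal sketch, with illustrative field names and an example consent code rather than Broad's actual schema:

```python
# Sketch: make the metadata that a path and suffix carry implicitly
# (the whatever.bam / whatever.log example above) explicit, so the object
# can live anywhere. Field names here are illustrative, not Broad's schema.
import os
import uuid
from datetime import datetime, timezone

SUFFIX_MEANING = {".bam": "aligned_reads", ".vcf": "variants", ".log": "process_log"}

def describe(path: str, cohort: str, consent_code: str) -> dict:
    stem, suffix = os.path.splitext(path)
    return {
        "object_id": str(uuid.uuid4()),           # opaque name for the object store
        "source_path": path,                      # provenance only, not an address
        "data_type": SUFFIX_MEANING.get(suffix, "unknown"),
        "related_to": os.path.basename(stem),     # ties whatever.bam to whatever.log
        "cohort": cohort,
        "consent_for_research_use": consent_code,
        "registered": datetime.now(timezone.utc).isoformat(),
    }

print(describe("/seq/project_x/whatever.bam", "cohort-0042", "GRU"))
```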
  52. 52. We’re all familiar with this Early 2014: Conversations about object storage with: • Amazon • Google • EMC • Cleversafe • Avere • Amplidata • Data Direct Networks • Infinidat • …
  53. 53. My object storage opinions • The S3 standard defines object storage – Any application that uses any special / proprietary features is a nonstarter – including clever metadata stuff. • All object storage must be durable to the loss of an entire data center – Conversations about sizing / usage need to be incredibly simple • Must be cost effective at scale – Throughput and latency are considerations, not requirements – This breaks the data question into stewardship and usage • Must not merely re-iterate the failure modes of filesystems
  54. 54. The dashboard should look opaque
  55. 55. The dashboard should look opaque • Object “names” should be a bag of UUIDs • Object storage should be basically unusable without the metadata index. • Anything else recapitulates the failure mode of file based storage.
  56. 56. The dashboard should look opaque • Object “names” should be a bag of UUIDs • Object storage should be basically unusable without the metadata index. • Anything else recapitulates the failure mode of file based storage. • This should scare you.
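A minimal sketch of the “bag of UUIDs” idea, assuming an S3-compatible store accessed through boto3; the bucket name and attributes are placeholders, the dict stands in for a real metadata index, and running it needs real credentials and a real bucket:

```python
# Sketch of the "bag of UUIDs" idea above: the object key carries no meaning,
# and the only route back to the data is a separate metadata index (a plain
# dict here; something like the MongoDB index mentioned earlier in practice).
import uuid
import boto3

BUCKET = "example-genomics-objects"   # placeholder bucket
metadata_index = {}                   # stand-in for a real index service

def store(s3_client, payload: bytes, attrs: dict) -> str:
    key = str(uuid.uuid4())           # no sample ID, cohort, or suffix in the name
    s3_client.put_object(Bucket=BUCKET, Key=key, Body=payload)
    metadata_index[key] = attrs       # all meaning lives in the index, not the key
    return key

if __name__ == "__main__":
    s3 = boto3.client("s3")           # requires AWS credentials
    key = store(s3, b"...bam bytes...", {"data_type": "aligned_reads",
                                         "cohort": "cohort-0042"})
    print(key, metadata_index[key])
```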
  57. 57. Current Object Storage Architecture “Boss” Middleware Consent for Research Use Phenotype LIMS Legacy File (file://) Cloud Providers (AWS / Google) • Domain specific middleware (“BOSS”) fronts the objects • Mediates access by issuing pre-signed URLs • Provides secured, time-limited links • A work in progress. On Premise Object Store (2.6PB of EMC)
  58. 58. Current Object Storage Architecture “Boss” Middleware Consent for Research Use Phenotype LIMS Legacy File (file://) Cloud Providers (AWS / Google) On Premise Object Store (2.6PB of EMC) • Broad is currently decanting our two iRODS archives into 2.6PB of on-premise object storage. • This will free up 4PB of NAS filer (enough for a year of data production). • Have pushed at petabyte scale to Google’s cloud storage • At every point: Challenging but possible.
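For readers unfamiliar with pre-signed URLs, a hedged sketch of the pattern (not the BOSS code itself) using boto3's generate_presigned_url; the consent check and bucket name are placeholders:

```python
# A minimal sketch of the pre-signed URL pattern the middleware relies on:
# check whether the caller may see the object, then hand back a time-limited
# link directly to the backing store rather than proxying the bytes.
import boto3

BUCKET = "example-genomics-objects"   # placeholder bucket

def allowed_by_consent(requester: str, object_key: str) -> bool:
    # Placeholder: a real check would consult the consent-for-research-use data.
    return True

def signed_download_url(object_key: str, requester: str, ttl_seconds: int = 900) -> str:
    if not allowed_by_consent(requester, object_key):
        raise PermissionError(f"{requester} may not access {object_key}")
    s3 = boto3.client("s3")           # needs AWS credentials to sign
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": object_key},
        ExpiresIn=ttl_seconds,        # the link stops working after this many seconds
    )
```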
  59. 59. Data Deletion @ Scale Me: “Blah Blah … I think we’re cool to delete about 600TB of data from a cloud bucket. What do you think?”
  60. 60. Data Deletion @ Scale Blah Blah … I think we’re cool to delete about 600TB of data from a cloud bucket Ray: “BOOM!”
  61. 61. Data Deletion @ Scale Blah Blah … I think we’re cool to delete about 600TB of data from a cloud bucket • This was my first deliberate data deletion at this scale. • It scared me how fast / easy it was. • Considering a “pull request” model for large scale deletions.
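One hedged sketch of what a “pull request” model for deletion could look like in practice: a dry-run stage that reports the blast radius, and an execute stage that only accepts a manifest someone else has reviewed. Bucket, prefix, and the review step itself are assumptions, not Broad's process:

```python
# One way to make bulk deletion deliberately slow: stage a manifest of what
# would be removed, report the blast radius, and only ever delete keys that a
# second person has reviewed and approved. Bucket and prefix are placeholders.
import boto3

def stage_deletion(s3_client, bucket, prefix):
    """Dry run: list what would be deleted and report the total size."""
    keys, total_bytes = [], 0
    pages = s3_client.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix)
    for page in pages:
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
            total_bytes += obj["Size"]
    print(f"would delete {len(keys)} objects, {total_bytes / 1e12:.1f} TB")
    return keys

def execute_deletion(s3_client, bucket, approved_keys):
    """Run only against a manifest someone else has signed off on."""
    for i in range(0, len(approved_keys), 1000):      # delete_objects caps at 1000 keys
        batch = [{"Key": k} for k in approved_keys[i:i + 1000]]
        s3_client.delete_objects(Bucket=bucket, Delete={"Objects": batch})

# Usage (needs credentials):
# manifest = stage_deletion(boto3.client("s3"), "example-bucket", "to-retire/")
```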
  62. 62. Files must give way to APIs At large scale, the file/folder model for managing data on computers becomes ineffective as a human interface, and eventually a hindrance to programmatic access. The solution: object storage + metadata. Regulatory Issues Ethical Issues Technical Issues
  63. 63. Federated Identity Management • This one is not solved. • I have the names of various technologies that I think are involved: OpenAM, Shibboleth, NIH Commons, … • It is up to us to understand the requirements and build a system that meets them. • Requirements are: – Regulatory / legal – Organizational – Ethical.
  64. 64. Genomic data is not de-identifiable Regulatory Issues Ethical Issues Technical Issues
  65. 65. This stuff is important We have an opportunity to change lives and health outcomes, and to realize the gains of genomic medicine, this year. We also have an opportunity to waste vast amounts of money and still not really help the world. I would like to work together with you to build a better future, sooner. cdwan@broadinstitute.org
  66. 66. Standards are needed for genomic data “The mission of the Global Alliance for Genomics and Health is to accelerate progress in human health by helping to establish a common framework of harmonized approaches to enable effective and responsible sharing of genomic and clinical data, and by catalyzing data sharing projects that drive and demonstrate the value of data sharing.” Regulatory Issues Ethical Issues Technical Issues
  67. 67. Thank You Research Computing Ops: Katie Shakun, David Altschuler, Dave Gregoire, Steve Kaplan, Kirill Lozinskiy, Paul McMartin, Zach Shulte, Brett Stogryn, Elsa Tsao Scientific Computing Services: Eric Jones, Jean Chang, Peter Ragone, Vince Ryan DevOps: Lukas Karlsson, Marc Monnar, Matt Nicholson, Ray Pete, Andrew Teixeira DSDE Ops: Kathleen Tibbetts, Sam Novod, Jason Rose, Charlotte Tolonen, Ellen Winchester Emeritus: Tim Fennell, Cope Frazier, Eric Golin, Jay Weatherell, Ken Streck BITS: Matthew Trunnell, Rob Damian, Cathleen Bonner, Kathy Dooley, Katey Falvey, Eugene Opredelennov, Ian Poynter, (and many more) DSDE: Eric Banks, David An, Kristian Cibulskis, Gabrielle Franceschelli, Adam Kiezun, Nils Homer, Doug Voet, (and many more) KDUX: Scott Sutherland, May Carmichael, Andrew Zimmer (and many more)
  68. 68. Partner Thank Yous • Accunet (Nick Brown) • Amazon • Avere • Cisco (Skip Giles) • Cycle Computing • EMC (Melissa Crichton, Patrick Combes) • Google (Will Brockman) • Infinidat • Intel (Mark Bagley) • Internet 2 • Red Hat • Scalable Informatics (Joe Landman) • Solina • Violin Memory • …
  69. 69. Take Home Messages • Go ahead and do the legwork to federate your environment to at least one public cloud. – It’s “just work” at this point. • Spend a lot of time understanding your data lifecycle, then stuff the overly bulky 95% of it in an object store fronted by a middleware application. – The latency sensitive, constantly used bits fit in RAM • Human issues of consent, data privacy, and ownership are still the hardest part of the picture. – We must learn to work together from a shared, standards based framework – The time is now.
  70. 70. The opposite of play is not work, it’s depression Jane McGonigal, Reality Is Broken
