Multi-Tenant Pharma HPC Clusters

BioITWorld 2013 presentation - Best practices for building multi-tenant HPC clusters for Pharma/BioTech

Essentially a mini case study of a recent deployment of a multi-petabyte, 1000+ CPU core Linux cluster in the Boston area.

Please email me at chris@bioteam.net if you would like the PDF file.

Multi-Tenant Pharma HPC Clusters

  1. Multi-Tenant Research Clusters v 2.0
  2. I'm Chris. I'm an infrastructure geek. I work for the BioTeam. www.bioteam.net - Twitter: @chris_dag
  3. Your substitute host: Speaking on behalf of others
    ‣ Original speaker can't make it today
    ‣ Stepping in as substitute due to involvement in the Assessment & Deployment project phases
    ‣ Just about everything in this presentation is the work of other, smarter people!
  4. Our Case Study: Sanofi S.A.
  5. Pharma Multi-Tenant HPC (Case Study: Sanofi S.A.)
    ‣ Sanofi: multinational pharmaceutical with worldwide research & commercial locations
    ‣ 7 major therapeutic areas: cardiovascular, central nervous system, diabetes, internal medicine, oncology, thrombosis and vaccines
    ‣ Other Sanofi S.A. companies: Merial, Chattem, Genzyme & Sanofi Pasteur
  6. History (Case Study: Sanofi S.A.)
    ‣ System discussed here is among the 1st major outcomes of a late-2011 global review effort called "HPC-O" (HPC Optimization)
    ‣ HPC-O involved:
      • Revisiting prior HPC recommendations
      • Intensive data-gathering & cataloging of HPC resources
      • North America: interviews with 30+ senior scientists, IT & scientific leadership, system operators & all levels of global IT and infrastructure support services
      • Similar effort across EU operations as well
  7. HPC-O Recommendations (Case Study: Sanofi S.A.)
    ‣ Build a new shared-services HPC environment
      • Model/prototype for future global HPC
      • Designed to meet scientific & business requirements
      • Multiple concurrent users, groups and business units
      • ... use IT building blocks that are globally approved and supported 24x7 by the Global IS ("GIS") organization
    ‣ Site the initial system in the Boston area
  8. Current Status
    ‣ Online since November 2012
    ‣ Approaching the end of the initial round of testing, optimization and user acceptance work
    ‣ Today: running large-scale workloads, but not yet formally in Production status
  9. Why Multi-Tenant Cluster? (Case Study: Sanofi S.A.)
    ‣ HPC Mission & Scope Creep
      • 11+ separate HPC systems in North America alone
      • Wildly disparate technology, product & support models
      • "Islands" of HPC used by single business units
      • Almost no detectable cross-system or cross-site usage
      • Huge variance in refresh/upgrade cycles
  10. Why Multi-Tenant Cluster, cont. (Case Study: Sanofi S.A.)
    ‣ Utilization & Efficiency
      • Islands of HPC tended to be underutilized most of the time and oversubscribed by a single business unit during peak demand times
      • Hardware age and capability varied widely due to huge differences in maintenance cycles by business unit
      • Cost of commercial software licensing (globally) is hugely significant; difficult to maximize ROI & use of very expensive software entitlements across "islands" of HPC
  11. Why Multi-Tenant Cluster, cont. (Case Study: Sanofi S.A.)
    ‣ Need for "Opportunistic Capacity"
      • Difficult to perform exploratory research outside the normal scope of business unit activities
    ‣ Avoid "Shadow IT" problems
      • Frustrated users will find their own DIY solutions
      • ... "the cloud" is just a departmental credit card away
    ‣ "Chaperoned" cloud-bursting
      • Centrally managed "chaperoned" utilization of IaaS cloud resources for specific workloads & data
  12. In-house vs. Cloud (Case Study: Sanofi S.A.)
    ‣ Cloud options studied intensively; still made the decision to invest in significant regional HPC
    ‣ Some reasons:
      • Baseline "always available" capability
      • Ability to obsessively tune performance
      • Security
      • Cost control (many factors here ...)
      • Data size, movement & lifecycle issues
      • Agility
  13. In-house vs. Cloud (Case Study: Sanofi S.A.)
    ‣ HPC Storage: "Center of Gravity" for Scientific Data
      • Compute power is pretty easy and not super expensive
      • Storage need, even just for the Boston region, is peta-scale
      • Mapping data flows and access patterns reveals a very complex web of researcher, instrument, workstation and pipeline interactions with storage
    ‣ In a nutshell:
      • Engineer the heck out of a robust peta-scale R&D storage platform for the Boston region
      • Drop a reasonable amount of HPC capability near this storage
      • Bias all engineering/design efforts to facilitate agility/change
      • Use the cloud only when it is the best fit
  14. Picture Tour: What it actually looks like ...
  15.-21. (Photo slides: picture tour of the facility; no text content)
  22. Key Enabling Technologies
  23. Enabling Technologies: Facility (Case Study: Sanofi S.A.)
    ‣ A Sanofi company has a suitable local colo suite
      • ... already under long-term lease
      • ... and with a bit of server consolidation, lots of space for HPC compute and storage
      • ... plenty of room for "adjunct" systems that will likely be attracted to the storage "center of gravity"
    ‣ Can't reveal exact size, but this facility can handle double-digit numbers of additional HPC compute, storage and network cabinets
  24. Enabling Technologies: WAN (Case Study: Sanofi S.A.)
    ‣ Regional consolidated HPC is not possible without MAN/WAN efforts to connect all sites and users
      • ... direct routing required; not optimal to route HPC traffic through Corporate Tier-1 facilities that may be thousands of miles away
      • Existing MAN/WAN network links upgraded where there was a business/scientific justification
      • All other MAN/WAN links verified to ensure expansion is easy/possible should a business need arise
  25. Enabling Technologies: WAN (Case Study: Sanofi S.A.)
    ‣ Regional Networking Result
      • Most sites: bonded 1-Gigabit path to the regional HPC hub
      • A Cambridge building has a direct 10-Gigabit Ethernet link to the HPC hub; used for heavy data movement as well as ingest of data arriving on physical media
      • Special routing (HTTP, FTP) in place for satellite locations not yet on the converged Enterprise WAN/MAN
      • HPC Hub Facility:
        - Dedicated HPC-only internet link for open-data downloads
        - Internet2 connection being pursued for EDU collaboration
  26. Architecture: Case Study
  27. Architecture: Philosophy
    ‣ Intense desire to keep things simple
    ‣ Commodity works very well; avoid the expensive and the exotic when we can
    ‣ Extra commodity capacity compensates for performance lost by not choosing the exotic competition
      • Also delivers more agility and easier reuse/repurposing
    ‣ If we build from globally-blessed IT components we can eventually turn basic operation, maintenance and monitoring over to the Global IS organization
      • ... freeing Research IT staff to concentrate on science & users
  28. Architecture: HPC Stack
    ‣ Explicit decision made to source the HPC cluster stack from a commercial provider
      • This is actually a radical departure from prior HPC efforts
    ‣ Many evaluated; one chosen
    ‣ Primary drivers:
      • 24x7 commercial support
      • Research IT staff needs to concentrate on apps/users
      • "Single SKU" out-of-the-box functionality and features (bare-metal provisioning, etc.) that reduce operational burden
  29. Architecture: HPC Stack - Bright Computing
    ‣ Bright Computing selected
      • Hardware neutral
      • Scheduler neutral
      • Full API, CLI and lightweight monitoring stack
      • Web GUIs for non-experts
      • Single dashboard for advanced monitoring and management
      • Data-aware scheduling & native support for AWS cloud bursting
  30. Architecture: Compute Hardware
  31. Architecture: Compute Hardware
    ‣ Key Design Goals
      • Use common server config for as many nodes as possible
      • Modular & extensible design
      • "Blessed" by the Global IS (GIS) organization
  32. Architecture: Compute Hardware
    ‣ HP C7000 Blade Enclosures
      • Our basic building block
      • Very flexible on network, interconnect and blade configuration
      • Sanofi GIS approved
      • "Lights-out" facility approved
      • Pre-negotiated preferential pricing on almost everything we needed
  33. Architecture: Compute Hardware
    ‣ HP C7000 Blade Enclosure becomes the smallest modular unit in the HPC design
    ‣ Big cluster built from smaller preconfigured "blocks" of C7000s
    ‣ 4 standard "blocks":
      • M-Block
      • C-Block
      • G-Block
      • X-Block
  34. Architecture: Compute Hardware
    ‣ M-Block (Mgmt)
      • HP BL460c Blades - dual-socket quad-core with 96GB RAM & 1TB mirrored OS disks
    ‣ 2x HA Master Node(s)
    ‣ 1x Mgmt Node
    ‣ 3x HPC Login Node(s)
    ‣ ... plenty of room ...
  35. Architecture: Compute Hardware
    ‣ C-Block (Compute)
      • HP BL460c Blades - dual-socket quad-core with 96GB RAM & 1TB mirrored OS disks
    ‣ Fully populated with 16 blades per enclosure
    ‣ Set of 8 C-Blocks = 1024 CPU cores
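The core count quoted on slide 35 follows directly from the blade spec: a dual-socket, quad-core BL460c gives 8 cores per blade (the standard config noted on slide 38), a fully populated C7000 holds 16 blades, and 8 such C-Blocks reach 1024 cores. A quick sketch of that arithmetic:

    # Core-count arithmetic for the C-Block design (figures taken from slides 35 and 38).
    SOCKETS_PER_BLADE = 2    # dual-socket BL460c
    CORES_PER_SOCKET = 4     # quad-core CPUs
    BLADES_PER_C7000 = 16    # fully populated enclosure
    C_BLOCKS = 8             # "Set of 8 C-Blocks"

    cores_per_blade = SOCKETS_PER_BLADE * CORES_PER_SOCKET   # 8
    cores_per_c_block = cores_per_blade * BLADES_PER_C7000   # 128
    total_cores = cores_per_c_block * C_BLOCKS               # 1024

    print(f"{cores_per_c_block} cores per C-Block, {total_cores} cores across {C_BLOCKS} C-Blocks")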
  36. Architecture: Compute Hardware
    ‣ G-Block (GPU)
      • No C7000; HP s6500 enclosure used for G-Block units
    ‣ HP SL250s Servers
    ‣ 3x Tesla GPUs per SL250s server
    ‣ ... 15 Tflop per G-Block
  37. Architecture: Compute Hardware
    ‣ X-Block C7000
      • Hosting of "Adjunct Servers"
      • X-Block for unique requirements that don't fit into a standard C-, G- or M-Block configuration, or for servers supplied by business units
    ‣ Big Memory Nodes
    ‣ Virtualization Platform(s)
    ‣ Big SMP Nodes
    ‣ Graphics/Viz Nodes
    ‣ Application Servers
    ‣ Database Servers
  38. Architecture: Compute Hardware
    ‣ Modular design can grow into double-digit numbers of datacenter cabinets
      • C-Blocks and G-Blocks for compute; M-Blocks and X-Blocks for Mgmt and special cases
      • The 8-core, 96GB RAM, 1TB BL460c blade is the standard individual server config; deviation only when required
  39. Architecture: Network Hardware
  40. Architecture: Network Hardware
    ‣ Key design decision:
      • 10 Gigabit Ethernet only
      • No InfiniBand*
    ‣ Fully redundant Cisco Nexus 10Gb fabric
  41. Architecture: Network Hardware
    ‣ Cisco Nexus 10G
      • Redundant everything
      • C7000 enclosures (M-, X- and C-Blocks) have 40Gb uplinks (80Gb possible)
      • G-Blocks and misc systems have 10Gb links
      • 20Gb bandwidth to each storage node
      • Easily expanded; centrally managed, monitored and controlled
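Those uplink numbers can be put in per-blade terms. Assuming traffic is spread evenly across a fully populated 16-blade C7000 (the blade count comes from slide 35; the even-sharing assumption is mine), the sketch below shows roughly what each blade can count on:

    # Rough per-blade share of the C7000 uplink bandwidth (enclosure figures from slides 35 and 41).
    # Assumes traffic is spread evenly across a fully populated 16-blade enclosure.
    BLADES_PER_C7000 = 16

    for label, uplink_gbps in [("standard 40Gb uplink", 40), ("80Gb uplink option", 80)]:
        per_blade = uplink_gbps / BLADES_PER_C7000
        print(f"{label}: ~{per_blade:.1f} Gb/s per blade")
    # standard 40Gb uplink: ~2.5 Gb/s per blade
    # 80Gb uplink option: ~5.0 Gb/s per blade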
  42. Architecture: Storage Hardware
  43. Architecture: Storage Hardware
    ‣ EMC Isilon Scale-out NAS
      • ~1 petabyte raw for active use
      • ~1 petabyte raw for backup
    ‣ Why Isilon?
      • Large, single-namespace scaling beyond our most aggressive capacity projections
      • Easy to manage / GIS approved
      • Aggregate throughput increases with capacity expansion
      • Tiering & SSD options
  44. Architecture: External Connectivity
    ‣ Dedicated Internet circuit for the new HPC Hub
      • Direct download/ingest of large public datasets without affecting other business users
      • Downloads don't hit MAN/WAN networks & avoid the centrally routed Enterprise internet egress point located hundreds of miles away
      • Very handy for Cloud/VPN efforts as well
    ‣ Internet2
      • I2 and other high-speed academic network connectivity planned
  45. Architecture: Physical Data Ingest
    ‣ Large-Scale Data Ingest & Export
      • Often overlooked; very important!
    ‣ Dedicated Data Station
      • 10 Gig link to the HPC Hub
      • Fast CPUs for checksum and integrity operations
      • Removable SATA/SAS bays
      • Lots of USB & eSATA ports
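The slide lists "checksum and integrity operations" but not how they are run; below is a minimal sketch of the general idea using only Python's standard library. The paths and manifest filename are hypothetical, not taken from the deployment.

    """Sketch: checksum manifest for data arriving on removable media.

    Hypothetical illustration of the kind of integrity check a data-ingest
    station performs; the paths and manifest name are invented.
    """
    import hashlib
    from pathlib import Path

    def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
        """Stream the file in 1 MiB chunks so multi-terabyte files fit in memory."""
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def write_manifest(source_dir: Path, manifest: Path) -> None:
        """Record a checksum for every file on the removable volume before copying."""
        with manifest.open("w") as out:
            for path in sorted(p for p in source_dir.rglob("*") if p.is_file()):
                out.write(f"{sha256sum(path)}  {path.relative_to(source_dir)}\n")

    if __name__ == "__main__":
        write_manifest(Path("/media/ingest_volume"), Path("/tmp/ingest_manifest.sha256"))

Re-running the same manifest against the copy after it lands on the cluster storage confirms the transfer was bit-exact.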
  46. One More Thing ...
  47. One more thing ... Not just a single cluster
    ‣ Single cluster? Nope.
  48. One more thing ... Not just a single cluster
    ‣ Single cluster? Nope.
    ‣ The secret sauce is in the facility, storage and network core
    ‣ Petabytes of scientific data have a "gravitational pull" within an enterprise
    ‣ ... we expect many new users and use cases to follow
  49. One more thing ... Not just a single cluster
    ‣ We can support:
      • Additional clusters & analytic platforms grafted onto our network and storage core
      • Validated server, software and cluster environments collocated in close proximity
      • Integration with private cloud and virtualization environments
      • Integration with public IaaS clouds
      • Dedicated Hadoop / Big Data environments
      • On-demand reconfiguration of C-Blocks into HDFS/Hadoop-optimized mini clusters
      • And much more ...
  50. Beyond the hardware bits ...
  51. Beyond the hardware ... Many other critical factors involved
    ‣ Let's discuss:
      • Requirements Gathering
      • Building Trust
      • Governance
      • Support Model
  52. Requirements Gathering
    ‣ When seven-figure CapEx amounts are involved you can't afford to make a mistake
    ‣ Capturing business & scientific requirements is non-trivial
      • ... especially when trying to account for future needs
    ‣ Not a 1-person / 1-department job
      • ... requires significant expertise and insider knowledge spanning science, software, business plans and both research and global IT staff
  53. Requirements Gathering: Our approach
    1. Keep the core project team small & focused
       • Engage niche resources (legal, security, etc.) on demand
    2. Promiscuous ("meet with anyone") data gathering, meeting & discussion philosophy
    3. Strong project management / oversight
    4. Public support from senior leadership
    5. Frequent sync-up with key leaders & groups
       • Global facility/network/storage/support orgs, Research budget & procurement teams, senior scientific leadership, etc.
  54. Building Trust: Consolidated HPC requires trust
    ‣ Previous: many independent islands of HPC
      • ... often built/supported/run by local resources
    ‣ Moving to a shared-services model requires great trust among users & scientific leadership
      • Researchers have low tolerance for BS/incompetence
      • Informatics is essential; users need to be reassured that current capabilities will be maintained while new capabilities are gained
      • Enterprise IT must be willing to prove it understands & can support the unique needs and operational requirements of research informatics
  55. Building Trust: Our approach
    ‣ Our Approach:
      • Strong project team with deep technical & institutional experience. Team members could answer any question coming from researchers or business units professionally and with an aura of expertise & competence
      • Explicit vocal support from senior IT and research leadership ("We will make this work. Promise.")
      • Willingness to accept & respond to criticism & feedback
        - ... especially when someone smashes a poor assumption or finds a gap in the planned design
  56. Governance
    ‣ Tied for first place among "reasons why centralized HPC deployments fail"
    ‣ Multi-Tenant HPC governance is essential
    ‣ ... and often overlooked
  57. Governance: The basic issue
    ‣ ... in research HPC settings there are certain things that should NEVER be dictated by IT
    ‣ It is not appropriate for an IT SysAdmin to ...
      • Create or alter resource allocation policies & quotas
      • Decide what users/groups get special treatment
      • Decide what software can and cannot be used
      • ... etc.
    ‣ A governance structure involving scientific leadership and user representation is essential
  58. Governance: Our Approach
    ‣ Two committees: "Ops" and "Overlord"
    ‣ Ops Committee: users & HPC IT staff coordinating HPC operations jointly
    ‣ Overlord Committee: invoked as needed for tiebreaker decisions, busting through political/organizational walls and approving funding/expansion decisions
  59. Governance: Our Approach
    ‣ Ops Committee communicates frequently and is consulted before any user-affecting changes occur
      • Membership is drawn from interested/engaged HPC "power users" from each business unit + the HPC Admin Team
    ‣ Ops Committee "owns" HPC scheduler & queue policies and approves/denies any requests for special treatment. All scheduler/policy changes are blessed by Ops before implementation
    ‣ This is the primary ongoing governance group
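The deck stays scheduler-neutral, so purely as a hypothetical illustration of the kind of policy the Ops Committee "owns", the sketch below turns per-group share weights into normalized fair-share targets. The group names and weights are invented, and the real policy would live in whatever scheduler the cluster stack provides.

    # Hypothetical fair-share table of the sort an Ops Committee might own.
    # Group names and weights are invented for illustration only.
    SHARE_WEIGHTS = {
        "business_unit_a": 30,
        "business_unit_b": 30,
        "business_unit_c": 20,
        "exploratory": 20,   # the "opportunistic capacity" pool from slide 11
    }

    def normalized_shares(weights):
        """Convert raw weights into the fraction of the cluster each group is entitled to."""
        total = sum(weights.values())
        return {group: weight / total for group, weight in weights.items()}

    for group, share in normalized_shares(SHARE_WEIGHTS).items():
        print(f"{group}: {share:.0%} fair-share target")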
  60. Governance: Our Approach
    ‣ Overlord Committee meets only as needed
      • Membership: the scariest heavy hitters we could recruit from senior scientific and IT leadership
        - VP or Director level is not unreasonable
      • This group needs the most senior people you can find. Heavy hitters required when mediating between conflicting business units or busting through political/organizational barriers
        - Committee does not need to be large, just powerful
  61. Support Model: Our Approach
    ‣ Often overlooked or under-resourced
    ‣ We are still working on this ourselves
    ‣ General model
      • Transition server, network and storage maintenance & monitoring over to Global IS as soon as possible
      • Free up rare HPC Support FTE resources to concentrate on enabling science & supporting users
      • Offer frequent training and local "HPC mentor" attention
      • Online/portal tools that facilitate user communication, best-practice advice and collaborative "self-support" for common issues
      • Still TBD: Helpdesk, Ticketing & Dashboards
  62. end; Thanks! Slides: http://slideshare.net/chrisdag/
