How UC San Francisco Delivered ‘Science as a Service’ with Private Cloud for HPC
Brad Dispensa, University of California
Andrew Nelson, VMware
VSVC5272 #VSVC5272
Agenda
 Who we are
 Motivation
 Project
 Architecture
 Next steps
Who We Are
 Andrew Nelson
 Staff Systems Engineer
• VMware
• VCDX#33
Who We Are
 Brad Dispensa
 Director IT / IS UCSF
• Department of Anesthesia
• Institute for Human Genetics
http://www.flickr.com/photos/43021516@N06/7467742818/
What This Is Not…
 We are not launching a new product
 This is a collaboration to determine the use cases and limitations of running, in a virtual environment, workloads that have historically run on HPC clusters
 We will share what we find so you can make your own choices
Motivation
 A need to deploy HPC as a service
• *Where the use case makes sense
 Where could it make sense?
• Jobs that are not dependent on saturating all I/O
• Jobs that don’t require all available resources
• Jobs that require bleeding-edge packages
• Users who want to run as root (Really?!)
• Users who want to run an unsupported OS
• Development / QA
• Jobs where integrity matters more than run time
• Funding issues (grant-based)
Bias?
 Why VM people think they wouldn’t do this
• “You will saturate my servers and cause slowdowns in production systems”
• “I don’t have an HPC fabric”
• “VM sprawl would take over my datacenter”
• “How would I begin to scope for a use case that does not fit the usual 20% utilization model?”
 Why HPC people think they wouldn’t do this
• “It’s not high performance”
• “It will be slow and unwieldy”
• “My app has to run on dedicated hardware”
• “The hypervisor introduces latency”
• “That won’t work for my weird use case”
Motivation
 Here’s the thing…
• Most life science jobs are single-threaded
• Most “programmers” are grad students
• HPC in life sciences is not the same as HPC for oil and gas or other engineering users
• We are not “critical”; it’s research, and five 9s is not our deal
• When do most long runs start? Friday. It’s nice to use hardware that was otherwise going to idle all weekend
• How is this any different from any other discussion in HPC?
• We often debate which file system, chipset, or controller is better
• It’s never one size fits all
• We spend more time sizing than just running the job
• The hardware should really be agnostic
• Should we buy …. or ….
http://frabz.com/meme-generator/caption/10406-Willy-Wonka/
Run Any Software Stacks
[Diagram: different app/OS combinations running side by side on the virtualization layer]
 Support groups with disparate software requirements, including root access
Separate Workloads
[Diagram: workloads isolated in separate VMs on virtualized hosts]
 Secure multi-tenancy
 Fault isolation
 …and sometimes performance
Use Resources More Efficiently
[Diagram: multiple app/OS combinations consolidated onto shared virtualized hardware]
 Avoid killing or pausing jobs
 Increase overall throughput
Workload Agility
[Diagram: applications moving from a single physical OS onto virtualized hosts]
Multi-tenancy with Resource Guarantees
 Define policies to manage resource sharing between groups (see the illustrative sketch below)
[Diagram: per-group resource pools of app/OS stacks sharing virtualized hardware]
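To make the policy idea concrete, here is a minimal, illustrative Python sketch of how shares, reservations, and limits could translate into per-group CPU entitlements under contention. It is a toy model only: the group names and numbers are invented, and it is not the algorithm vSphere or DRS actually implements.

```python
# Toy model of "resource guarantees": reservations are honored first, the
# remaining capacity is split in proportion to shares, and limits cap each
# group. Illustration of the policy concept only -- NOT the vSphere/DRS
# algorithm, and all names and numbers below are made up.
from dataclasses import dataclass

@dataclass
class GroupPolicy:
    name: str
    shares: int          # relative weight under contention
    reservation: float   # guaranteed GHz
    limit: float         # hard cap in GHz; float("inf") means unlimited

def entitlements(capacity_ghz: float, groups: list[GroupPolicy]) -> dict[str, float]:
    """Split cluster CPU among groups under full contention."""
    out = {g.name: min(g.reservation, g.limit) for g in groups}
    remaining = capacity_ghz - sum(out.values())
    active = [g for g in groups if out[g.name] < g.limit]
    while remaining > 1e-9 and active:
        total_shares = sum(g.shares for g in active)
        if total_shares == 0:
            break
        grant = {g.name: remaining * g.shares / total_shares for g in active}
        for g in active:
            take = min(grant[g.name], g.limit - out[g.name])
            out[g.name] += take
            remaining -= take
        active = [g for g in active if out[g.name] < g.limit - 1e-9]
    return out

if __name__ == "__main__":
    groups = [
        GroupPolicy("genomics", shares=8000, reservation=40.0, limit=float("inf")),
        GroupPolicy("imaging",  shares=4000, reservation=20.0, limit=80.0),
        GroupPolicy("dev-qa",   shares=1000, reservation=0.0,  limit=20.0),
    ]
    print(entitlements(200.0, groups))  # e.g. a cluster with 200 GHz of CPU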
Protect Applications from Hardware Failures
[Diagram: a failed VM restarting on a surviving virtualized host]
 Reactive Fault Tolerance: “Fail and Recover”
Protect Applications from Hardware Failures
[Diagram: MPI-0, MPI-1, and MPI-2 VMs migrating away from a failing host]
 Proactive Fault Tolerance: “Move and Continue”
Elastic Application Layers
[Diagram: separate compute and data VMs sized and scaled independently]
 Ability to decouple compute and data and size each appropriately
 Multi-threading vs. multi-VMs
http://siliconangle.com/blog/2011/05/24/the-basic-qos-myth-myth-3-of-the-good-enough-network/fuzzy-tv/
Agenda
 Who we are
 Motivation
 Project
 Architecture
 Next steps
http://www.examiner.com/article/best-barbecue-books
Project Overview
 Collaborative research effort between UCSF and the VMware Field and CTO Office
• Additional participation by NVIDIA, EMC/Isilon, and DDN
 Prove out the value of a private cloud solution for HPC life sciences workloads
 Stand up a small private cloud on customer-supplied hardware
• Dell M1000e blade chassis
• Dell M610 blades
• FDR InfiniBand
• EqualLogic VMDK storage
• DDN GPFS store
• EMC/Isilon store (NFS)
 Testing to include an array of life sciences applications important to UCSF, including some testing of VMware VDI to move science desktop workloads into the private cloud
Project Overview
 Desktop visualization
• Could we also replace expensive desktops with thin-client-like devices for users who need to visualize complex imaging datasets or 3D instrument datasets?
Project Overview – Success Factors
 Didn’t have to be as fast as bare metal, but can’t be significantly slower
 The end product must allow a user to self-provision a host from a vetted list of options (see the sketch after this list)
• “I want 10 Ubuntu machines that I can run as root with X packages installed”
 The environment must be agile, allowing different workloads to cohabit a single hardware environment
• i.e., you can run an R workload on the same blade that is running a desktop visualization job
 Whatever you could do on metal, you have to be able to do in virtualization (*)
 Users must be fully sandboxed to prevent “bad stuff” from leaving their workloads
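As a sketch of the “vetted list of options” requirement, the hypothetical Python below checks a request such as “10 Ubuntu machines with root and package X” against an approved blueprint before anything is provisioned. The blueprint names, fields, and limits are invented for illustration; this is not the vCloud Automation Center API.

```python
# Hypothetical validation of a self-service request against a vetted catalog.
# Every name and limit here is invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Blueprint:
    os_template: str
    max_instances: int
    allow_root: bool
    approved_packages: set[str] = field(default_factory=set)

CATALOG = {
    "ubuntu-hpc": Blueprint("ubuntu-12.04-x64", max_instances=16, allow_root=True,
                            approved_packages={"r-base", "bowtie", "blast"}),
}

def validate_request(blueprint_id: str, count: int, want_root: bool,
                     packages: set[str]) -> Blueprint:
    """Raise ValueError if the request falls outside the vetted options."""
    bp = CATALOG.get(blueprint_id)
    if bp is None:
        raise ValueError(f"unknown blueprint: {blueprint_id}")
    if count > bp.max_instances:
        raise ValueError(f"request exceeds per-user limit of {bp.max_instances}")
    if want_root and not bp.allow_root:
        raise ValueError("root access is not allowed for this blueprint")
    unapproved = packages - bp.approved_packages
    if unapproved:
        raise ValueError(f"packages not on the vetted list: {sorted(unapproved)}")
    return bp

# Example: the request from the slide -- 10 Ubuntu VMs with root and R installed.
blueprint = validate_request("ubuntu-hpc", count=10, want_root=True, packages={"r-base"})
```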
Agenda
 Who we are
 Motivation
 Project
 Architecture
 Next steps
Secure Private Cloud for HPC
[Diagram: research-group users reach per-group user portals and catalogs in VMware vCloud Automation Center (vCAC); IT gets programmatic control and integrations; security is provided by VMware vCNS; research clusters 1…n run on VMware vSphere under VMware vCenter Server, with connections to public clouds]
Architecture
 The components are “off the shelf”
• Standard Dell servers
• Mellanox FDR switches
• Isilon and DDN are tuned as normal
 No custom workflows
• We tune the nodes the same way you normally would in your virtual and HPC environments
 There is no “next-gen” black-box appliance used; what we have, you can have
Architecture
 Why blades?
• It’s what we have...
• The chassis lets us isolate hardware more easily for initial testing, and blades are also commonly deployed both in dense virtualization environments and in HPC
Agenda
 Who we are
 Motivation
 Project
 Architecture
 Next steps
Next Steps
 The results will report performance comparisons between bare metal and virtualized for a set of life sciences applications important to UCSF:
• BLAST – running a synthetic dataset
• Bowtie
• Affymetrix and Illumina genomics pipelines (both with vendor-supplied test datasets)
• R – with a stem-cell dataset (likely) or a hypertension dataset (possibly)
• Desktop virtualization
 The results will also report on the use of VDI to move current workstation science applications onto the proof-of-concept server cluster
 An important part of this will be an assessment of the hypothesized value props: self-provisioning, multi-tenancy, etc.
Next Steps
 Complete initial benchmarking (a harness sketch follows this slide)
• Capture core metrics on the physical hardware, then capture the same data on a virtualized host
 Does it work?
• What happens when we start to scale it upward – does performance stay linear?
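A minimal benchmarking-harness sketch of the “same metrics on physical, then on virtual” approach: run each job several times, record wall-clock time and exit status, and write one CSV per environment to compare afterwards. The commands, file names, and datasets below are placeholders, not the actual UCSF pipelines or data.

```python
# Run the same job list on bare metal and inside a VM, timing each run.
# Command lines and input files are placeholders for illustration.
import csv, platform, subprocess, time

JOBS = {
    "blastn": ["blastn", "-query", "queries.fasta", "-db", "synthetic_db",
               "-outfmt", "6", "-out", "blast.tsv"],
    "bowtie": ["bowtie", "-S", "genome_index", "reads.fastq", "aligned.sam"],
    "r-job":  ["Rscript", "analysis.R"],
}

def run_suite(label: str, repeats: int = 3) -> None:
    """Run each job `repeats` times and write wall-clock timings to a CSV."""
    with open(f"results-{label}.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["host", "job", "run", "wall_seconds", "returncode"])
        for job, cmd in JOBS.items():
            for i in range(repeats):
                start = time.perf_counter()
                proc = subprocess.run(cmd, capture_output=True)
                elapsed = time.perf_counter() - start
                writer.writerow([platform.node(), job, i, f"{elapsed:.2f}",
                                 proc.returncode])

if __name__ == "__main__":
    # Run once as run_suite("bare-metal") on the physical node, then as
    # run_suite("virtual") on the equivalently sized VM, and compare the CSVs.
    run_suite("bare-metal")
```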
Conclusions and Future Directions
http://commons.wikimedia.org/wiki/File:20_questions_1954.JPG
THANK YOU
How UC San Francisco Delivered ‘Science as a Service’ with Private Cloud for HPC
Brad Dispensa, University of California
Andrew Nelson, VMware
VSVC5272 #VSVC5272