Inside the Atlassian OnDemand Private Cloud


In order to launch Atlassian OnDemand, we needed to rethink the way we did infrastructure. Join Atlassian SaaS Platform Architect George Barnett as he discusses how we delivered a scalable platform that runs tens of thousands of JVMs, all while reducing costs ten-fold. This talk covers design decisions, technology choices and the lessons learned during the build-out.


Transcript of "Inside the Atlassian OnDemand Private Cloud"

Inside the Atlassian OnDemand private cloud
George Barnett, SaaS Platform Architect

In 2010 a team of engineers moved into our secret lair (above a pub) to re-imagine our hosted platform.

Launch - October 2011: 1,000 VMs. 6 months later: 13,500 VMs.

We have a cloud. So what?

We also had a cloud... and:
• VM sprawl
• Poor performance
• Over-provisioning
• Slow deployments
• Low visibility into the full stack

Virtualisation often creates new challenges but does nothing about existing ones.
Focus

Be less flexible about what infrastructure you provide.

"You can use any database you like, as long as it's PostgreSQL 8.4." #summit12

• Stop trying to be everything to everyone
• (we have other clouds within Atlassian)
• Lower operational complexity
• Easier to provide a deeply integrated, well-supported toolchain
• Small test matrix

Fail fast. Learn quickly.

Do as little as possible. Deploy and use it.

Block-1: a small-scale model of the initially proposed platform architecture - 4 desktop machines and a switch.
Purpose: validate design, evaluate failure modes.
http://history.nasa.gov/Apollo204/blocks.html

Block-1 results: applications do not fall over. Network boot assumptions validated. Creation of VMs over NFS too resource- and time-intensive. (More on this later.)

Block-2: a large-scale model of the platform architecture.
Purpose: validate hardware resource assumptions and compare CPU vendors.
http://history.nasa.gov/Apollo204/blocks.html

Block-2 results: customers-per-GB-of-RAM metric validated. VM distribution and failover tools work. Initial specs of compute hardware were too conservative; decided to add 50% more RAM.
Hardware

Challenge: existing platform hardware was a poor fit for our workload. Memory and IO were heavily constrained, but CPU was not.

Monitoring: we took 6 months' worth of monitoring data from our existing platform and used it to determine the right mix of hardware.

• 10 x compute nodes (144GB RAM, 12 cores, NO disks)
• 3 x storage nodes (24 disks)
• Each rack delivered fully assembled
• Unwrap, provide power and networking
• Connected to customers in ~2 hours

Advantage #1: Reliable. Each machine goes through a 2-day burn-in before it goes into the rack.

Advantage #2: Neat.

Advantage #3: Consistent.

Advantage #4: Easy to deploy.

No disks.

Wait. What?

Challenge: existing compute infrastructure used local disk for swap and hypervisor boot. Once we got the memory density right, it was only needed for boot.
• No disks in compute infrastructure
• Avoid spinning 20 more disks per rack for a hypervisor OS
• Evaluated booting from:
  • USB drives (unreliable and slow!)
  • NFS (what if the network goes away?)
  • Custom binary initrd image + kernel

• Image is a ~170MB gzipped filesystem
• Download on boot, extract into RAM (~400MB)
• No external dependencies after boot
• All compute nodes boot from the same image
• Reboot to known state
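A minimal sketch of how such a boot image might be built, assuming the hypervisor root filesystem is assembled in a chroot and packed as a gzipped cpio archive (an initramfs). The talk doesn't show the tooling, so the paths and filenames here are illustrative:

    # Hypothetical build step: pack a root filesystem assembled under
    # /build/hypervisor-root into a gzipped cpio archive that the kernel
    # unpacks straight into RAM as an initramfs.
    cd /build/hypervisor-root
    find . | cpio -o -H newc | gzip -9 > /srv/netboot/hypervisor-image-v1.gz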
Boot sequence (compute node <-> netboot server):
1. PXE firmware: DHCP request -> DHCP response
2. TFTP: fetch gPXE
3. gPXE: DHCP request -> Etherboot DHCP response
4. HTTP: fetch bootscript, then kernel & boot image
5. Boot
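The HTTP bootscript stage might look like the following gPXE script; the dhcp/kernel/initrd/boot commands are standard gPXE, but the hostname and filenames are assumptions:

    #!gpxe
    # Hypothetical bootscript served over HTTP (the URLs are made up).
    dhcp net0
    kernel http://netboot/vmlinuz-hypervisor
    initrd http://netboot/hypervisor-image-v1.gz
    boot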
Sharp edges:
• No swap == provision carefully
• Not a problem if you automate provisioning
• Treat the running hypervisor image like an appliance
• Don't change code - rebuild the image and reboot
• Doing this often? Too many services in the hypervisor.

Software

Challenge: virtualisation is often inefficient. There's a memory and CPU penalty which is hard to avoid.

OpenVZ:
• Linux containers
• Basis for Parallels Virtuozzo Containers
• LXC isn't there yet
• No guest OS kernels
• No performance hit
• Better resource sharing
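For anyone who hasn't driven OpenVZ, the container lifecycle is managed with vzctl, roughly as below. The commands are standard vzctl; the container ID and OS template name are made up:

    # Hypothetical vzctl session (the ID and template are illustrative).
    vzctl create 101 --ostemplate centos-5-x86_64   # create a container
    vzctl start 101                                 # boot it - no guest kernel
    vzctl exec 101 uptime                           # run a command inside it
    vzctl stop 101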
Performance

http://wiki.openvz.org/Performance/vConsolidate-SMP

http://wiki.openvz.org/Performance/LAMP
Resource de-duping

"Don't load the same thing twice."

Challenge: Java VMs aren't lightweight.

• Full virtualisation does a poor job at this
• 50 VMs = 50 kernels + 50 caches + 50 shared libs!
• Memory de-dupe combats this, but burns CPU
• Memory de-dupe works across all OSes
• We don't use Windows
• By being less flexible, we can exploit Linux-specific features

OpenVZ containers all share the same kernel.

• Provide a single OS image to all - free benefits:
  • Shared libraries only load once
  • OS is cached only once
  • OS image is the same on every instance

Challenge: if all containers share the same OS image, then managing state is a nightmare! One bad change in one container would break them all!

• But managing state on multiple machines is a solved problem!
• What if you have >10,000 machines?
• Why are you modifying the OS anyway?

Does your iPhone upgrade iOS when you install an app?

"Fix problems by removing them, not by adding systems to manage them." #summit12

Read-only OS images

Data classes in a system:
• OS and system daemon code
• Application code
• Application and user data
The next slides build this picture up a layer at a time; the final layout:

  Container
    /data (read/write) - application and user data, e.g. /data/service/
    /sw   (read-only)  - applications, JVMs, configs
    /     (read-only)  - OS tools, system-supplied code
  OpenVZ Kernel (shared by every container on the node)
How?
• Storage nodes export /e/ro/ & /e/rw
• Build an OS distro inside a chroot
• Use whatever tools you are comfortable with
• Put this chroot tree in the RO location on the storage nodes
• Make a "data" dir in the RW location for each container
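A sketch of that build step. The slide deliberately leaves the tooling open, so debootstrap here is one arbitrary choice, and the per-container RW layout is a guess:

    # Hypothetical image build: assemble a distro in a chroot directly
    # inside the read-only export (debootstrap is just one option).
    mkdir -p /e/ro/os/linux-image-v1
    debootstrap stable /e/ro/os/linux-image-v1

    # Per-container writable area in the RW export (layout assumed).
    mkdir -p /e/rw/101/data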
How?
• On container start, bind mount:
  /net/storage-n/e/ro/os/linux-image-v1/ -> /vz/<ctid>/root
• Replace etc, var & tmp with a memfs
  • Linux expects to be able to write to these
• Mount the container's data dir (RW) to /data
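As a rough shell sketch of that container-start hook. The paths follow the slides; making the bind mount read-only and seeding the tmpfs mounts are details the talk doesn't cover, so treat this as an outline:

    #!/bin/sh
    # Hypothetical container-start hook; CTID is the OpenVZ container ID.
    CTID="$1"
    ROOT="/vz/$CTID/root"

    # Shared, read-only OS image for every container on the node.
    mount --bind /net/storage-n/e/ro/os/linux-image-v1 "$ROOT"
    mount -o remount,ro,bind "$ROOT"

    # Linux expects to write to these, so back them with memory
    # filesystems (a real hook would seed /etc from a template).
    for d in etc var tmp; do
        mount -t tmpfs tmpfs "$ROOT/$d"
    done

    # Per-container read/write data.
    mount --bind "/net/storage-n/e/rw/$CTID/data" "$ROOT/data"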
More benefits:
• Distribute OS images as a simple directory
• Prove that environments (Dev, Stg, Prd) are identical using MD5sum
• Flip between OS versions by changing a variable
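The identity check can be as simple as hashing every file in the image and then hashing the result - one digest per environment. The exact invocation below is an assumption:

    # Hypothetical: if these digests match across Dev/Stg/Prd, the
    # environments are running byte-identical OS images.
    OS_IMAGE=linux-image-v1   # flip OS versions by changing this variable
    cd "/net/storage-n/e/ro/os/$OS_IMAGE"
    find . -type f | sort | xargs md5sum | md5sum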
The Swear Wall

The swear wall helps prevent death by a thousand cuts. Your team has a gut feeling about what's hurting them - this helps you quantify that feeling and act on the pain.

1. !@&*^# Solaris!
2. Solaris gets a mark
3. Repeat
4. Periodically throw out offensive technology
5. ...
6. PROFIT!! (swear less)

Optimise for the task at hand. Don't layer solutions onto problems. Get rid of them.

Thank you!