Inside the Atlassian OnDemand Private Cloud

In order to launch Atlassian OnDemand, we needed to rethink the way we did infrastructure. Join Atlassian SaaS Platform Architect George Barnett as he discusses how we delivered a scalable platform that runs tens of thousands of JVMs, all while reducing the cost tenfold. This talk will cover design decisions, technology choices and the lessons learned during the build-out.

  1. Tuesday, July 10, 12
  2. Inside the Atlassian OnDemand private cloud. George Barnett, SaaS Platform Architect
  3. In 2010 a team of engineers moved into our secret lair (above a pub) to re-imagine our hosted platform.
  4. Launch - October 2011: 1000 VMs. 6 months later: 13,500 VMs.
  5. We have a cloud. So what?
  6. We also had a cloud... and... • VM sprawl • Poor performance • Over provisioning • Slow deployments • Low visibility into the full stack
  7. Virtualisation often creates new challenges but does nothing about existing ones.
  12. Focus
  13. Be less flexible about what infrastructure you provide.
  14. “You can use any database you like, as long as it’s PostgreSQL 8.4.” #summit12
  15. • Stop trying to be everything to everyone • (we have other clouds within Atlassian) • Lower operational complexity • Easier to provide a deeply integrated, well supported toolchain • Small test surface matrix
  16. Fail fast. Learn quickly.
  17. Do as little as possible. Deploy and use it.
  18. Block-1: A small scale model of the initial proposed platform architecture. 4 desktop machines and a switch. Purpose: Validate design, evaluate failure modes. http://history.nasa.gov/Apollo204/blocks.html
  19. Block-1: Applications do not fall over. Network boot assumptions validated. Creation of VMs over NFS is too resource and time intensive. (more on this later)
  20. Block-2: A large scale model of the platform architecture. Purpose: Validate hardware resource assumptions and compare CPU vendors. http://history.nasa.gov/Apollo204/blocks.html
  21. Block-2: Customers per GB of RAM metric validated. VM distribution and failover tools work. Initial specs of compute hardware too conservative. Decided to add 50% more RAM.
  22. Hardware
  23. Challenge: Existing platform hardware was a poor fit for our workload. Memory and IO were heavily constrained, but CPU was not.
  24. Monitoring: We took 6 months’ worth of monitoring data from our existing platform. We used this data to determine the right mix of hardware.
  25. • 10 x Compute nodes (144G RAM, 12 cores, NO disks) • 3 x Storage nodes (24 disks) • Each rack delivered fully assembled • Unwrap, provide power, networking • Connected to customers in ~2 hours
  26. Advantage #1: Reliable. Each machine goes through a 2-day burn-in before it goes into the rack.
  27. Advantage #2: Neat.
  28. Advantage #3: Consistent.
  29. Advantage #4: Easy to deploy.
  30. No disks.
  31. Wait. What?
  32. Challenge: Existing compute infrastructure used local disk for swap and hypervisor boot. Once we got the memory density right, it’s only boot.
  33. • No disks in compute infrastructure • Avoid spinning 20 more disks per rack for a hypervisor OS • Evaluated booting from: • USB drives • NFS • Custom binary initrd image + kernel
  34. • No disks in compute infrastructure • Avoid spinning 20 more disks per rack for a hypervisor OS • Evaluated booting from: • USB drives (unreliable and slow!) • NFS (what if the network goes away?) • Custom binary initrd image + kernel
  35. • Image is ~170 MB gzipped filesystem • Download on boot, extract into RAM - ~400 MB • No external dependencies after boot • All compute nodes boot from the same image • Reboot to known state
  36. Boot sequence between Compute Node and Netboot Server: • PXE: dhcp, DHCP response • TFTP: gpxe • Etherboot: dhcp, DHCP response • HTTP: bootscript, kernel & boot image • Boot
  37. Sharp Edges. • No swap == provision carefully • Not a problem if you automate provisioning • Treat running hypervisor image like an appliance • Don’t change code - rebuild image and reboot • Doing this often? Too many services in the hypervisor
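
The "No swap == provision carefully" point can be read as an admission check run by whatever automates provisioning. The following is a minimal sketch under stated assumptions: only the 144G node size and the ~400 MB in-RAM boot image come from the slides; the function name `can_place`, the `reserve_mb` headroom parameter and the idea of per-container memory limits are illustrative and not taken from the talk.

```python
NODE_RAM_MB = 144 * 1024     # compute node RAM, as stated on the hardware slide
HYPERVISOR_IMAGE_MB = 400    # boot image extracted into RAM, as stated on the image slide


def can_place(existing_limits_mb, new_limit_mb, reserve_mb=0):
    """Return True if a new container fits on a node that has no swap.

    existing_limits_mb: memory limits (MB) of containers already on the node.
    new_limit_mb: memory limit (MB) of the container to be placed.
    reserve_mb: hypothetical extra headroom kept free for the hypervisor.
    """
    committed = HYPERVISOR_IMAGE_MB + reserve_mb + sum(existing_limits_mb)
    # With no swap there is nothing to fall back on, so never overcommit.
    return committed + new_limit_mb <= NODE_RAM_MB
```

A scheduler built along these lines would refuse a placement rather than let the node run out of memory.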
  38. Software
  39. Challenge: Virtualisation is often inefficient. There’s a memory and CPU penalty which is hard to avoid.
  40. OpenVZ • Linux containers • Basis for Parallels Virtuozzo Containers • LXC isn’t there yet • No guest OS kernels • No performance hit • Better resource sharing
  41. Performance
  42. http://wiki.openvz.org/Performance/vConsolidate-SMP
  43. http://wiki.openvz.org/Performance/LAMP
  44. Resource de-duping
  45. “Don’t load the same thing twice”
  46. Challenge: Java VMs aren’t lightweight.
  47. • Full virtualisation does a poor job of this • 50 VMs = 50 kernels + 50 caches + 50 shared libs! • Memory de-dupe combats this, but burns CPU. • Memory de-dupe works across all OSes • We don’t use Windows. • By being less flexible, we can exploit Linux-specific features.
  48. OpenVZ containers all share the same kernel.
  49. • Provide a single OS image to all - free benefits: • Shared libraries only load once. • OS is cached only once. • OS image is the same on every instance.
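
The "50 VMs = 50 kernels + 50 caches + 50 shared libs" arithmetic, compared with a single shared kernel and OS image, can be written out as a small model. This is only a sketch: the function and parameter names are illustrative, and the per-component sizes are inputs the caller supplies, not measurements from the talk.

```python
def total_memory_mb(instances, kernel_mb, os_cache_mb, shared_libs_mb,
                    app_mb, shared_os):
    """Estimate total memory for `instances` application instances.

    shared_os=False models full virtualisation: every VM carries its own
    kernel, OS cache and shared libraries.
    shared_os=True models containers on one kernel and one OS image:
    those three components are counted once for the whole node.
    """
    os_footprint = kernel_mb + os_cache_mb + shared_libs_mb
    os_copies = 1 if shared_os else instances
    return os_copies * os_footprint + instances * app_mb
```

With placeholder sizes of 50, 200 and 50 MB for kernel, cache and libraries and 1000 MB per application, 50 instances come to 65,000 MB under full virtualisation and 50,300 MB with a shared OS image; the numbers are made up purely to show the shape of the saving.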
  50. Challenge: If all containers share the same OS image, then managing state is a nightmare! One bad change in one container would break them all!
  51. • But managing state on multiple machines is a solved problem! • What if you have >10,000 machines? • Why are you modifying the OS anyway?
  52. Does your iPhone upgrade iOS when you install an app?
  53. “Fix problems by removing them, not by adding systems to manage them.” #summit12
  54. Read-only OS images
  55. Data classes in a system • OS and system daemon code • Application code • Application and user data
  58-71. Container layout diagram, built up one layer per slide. Final state, top to bottom: • Container • Application and user data - /data (R/W), /data/service/ • OS tools and system supplied code on / - Read Only • Applications, JVMs and configs on /sw - Read Only • OpenVZ Kernel
  72. How? • Storage nodes export /e/ro/ & /e/rw • Build an OS distro inside a chroot. • Use whatever tools you are comfortable with. • Put this chroot tree in the RO location on storage nodes • Make a “data” dir in the RW location for each container
  73. How? • On container start, bind mount: /net/storage-n/e/ro/os/linux-image-v1/ -> /vz/<ctid>/root • Replace etc, var & tmp with a memfs • Linux expects to be able to write to these • Mount container’s data dir (RW) to /data
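
The container-start steps above can be sketched as a function that builds the list of mount commands without running them. Several details are assumptions rather than anything shown in the talk: the slides say "memfs", and tmpfs is used here as one plausible in-memory filesystem; the per-container data path under /e/rw is a hypothetical layout; the storage host name is a parameter because the slides only show the placeholder storage-n; and the read-only property is assumed to come from the /e/ro export itself.

```python
def mount_plan(ctid, storage_host, os_image="linux-image-v1"):
    """Return the mount commands for one container as argv lists.

    Nothing is executed; a caller could pass each entry to subprocess.run
    as root. Changing `os_image` switches the OS version a container boots.
    """
    root = f"/vz/{ctid}/root"
    ro_image = f"/net/{storage_host}/e/ro/os/{os_image}/"
    # Hypothetical layout for the per-container "data" dir in the RW export.
    rw_data = f"/net/{storage_host}/e/rw/{ctid}/data"

    # 1. Bind mount the shared OS image as the container root.
    plan = [["mount", "--bind", ro_image, root]]
    # 2. Overlay the directories Linux expects to write to with an
    #    in-memory filesystem (tmpfs assumed for the slides' "memfs").
    for writable in ("etc", "var", "tmp"):
        plan.append(["mount", "-t", "tmpfs", "tmpfs", f"{root}/{writable}"])
    # 3. Bind mount the container's own read-write data directory at /data.
    plan.append(["mount", "--bind", rw_data, f"{root}/data"])
    return plan
```

Returning a plan instead of mounting directly keeps the sketch testable without root and makes the order of operations easy to inspect.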
  74. More benefits • Distribute OS images as a simple directory. • Prove that environments (Dev, Stg, Prd) are identical using md5sum. • Flip between OS versions by changing a variable.
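
Because an OS image is just a directory tree, checking that two environments are identical can be reduced to comparing one digest per tree. The following is a minimal sketch of that idea, not the tooling used in the talk: the function name `tree_md5` is illustrative, it hashes relative paths and file contents only, and it ignores symlinks, permissions and ownership.

```python
import hashlib
import os


def tree_md5(root):
    """Return one MD5 hex digest covering every regular file under `root`."""
    digest = hashlib.md5()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files in this sketch
            # Include the relative path so renames change the digest.
            rel = os.path.relpath(path, root)
            digest.update(rel.encode("utf-8") + b"\0")
            with open(path, "rb") as fh:
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
            digest.update(b"\0")
    return digest.hexdigest()
```

Two environments would then be considered identical when `tree_md5` returns the same value for the image directory each of them mounts.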
  75. The Swear Wall
  76. The swear wall helps prevent death by a thousand cuts. Your team has a gut feeling about what’s hurting them - this helps you quantify that feeling and act on the pain.
  78. 1. !@&*^# Solaris! 2. Solaris gets a mark 3. Repeat 4. Periodically throw out offensive technology 5. ... 6. PROFIT!! (swear less)
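
The swear wall procedure is essentially a tally with a periodic review. A toy sketch follows, with a hypothetical `SwearWall` class that is not from the talk; it only shows how the gut feeling becomes a count that can be ranked.

```python
from collections import Counter


class SwearWall:
    """Tally of how often each technology made someone swear."""

    def __init__(self):
        self.marks = Counter()

    def swear(self, technology):
        """Steps 1-3: someone swears, the technology gets a mark."""
        self.marks[technology] += 1

    def worst(self, n=3):
        """Rank the most offensive technologies for the periodic review."""
        return self.marks.most_common(n)

    def throw_out(self, technology):
        """Step 4: remove a technology and return the marks it had."""
        return self.marks.pop(technology, 0)
```

Calling `worst()` at each review gives the team a ranked list to act on instead of an impression.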
  79. Optimise for the task at hand. Don’t layer solutions onto problems. Get rid of them.
  80. Thank you!