Making clouds go faster, for fun and profit!

Everyone loves it when things are fast, whether you're visiting http://www.livingsocial.com or hitting the OpenStack Nova API and requesting, "Please show me all the instances which I've got running". Nobody ever writes in to support saying, "All of my API calls are completing far too quickly. Slow it down!"

Optimizing the performance of software is arguably a never-ending crusade. At some point you'll get things fast enough that you can say, "Any effort invested beyond this point is not adding value for the business". Then along comes new code which adds a zillion awesome features but also regresses performance to a level where it needs another tune-up.
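
One way to catch that kind of regression early is to compare latency samples from a known-good build against samples from the new code. The sketch below is only an illustration: the function names, the 95th-percentile choice, and the 10% tolerance are assumptions, not something taken from the talk.

```python
from statistics import quantiles


def p95(samples):
    """Return the 95th percentile of a list of latency samples.

    quantiles(..., n=20) yields 19 cut points; the last one is the
    95th percentile. At least two samples are required.
    """
    return quantiles(samples, n=20)[-1]


def has_regressed(baseline, candidate, tolerance=0.10):
    """True if the candidate's p95 latency exceeds the baseline's p95
    by more than `tolerance` (0.10 means 10%). Hypothetical policy."""
    return p95(candidate) > p95(baseline) * (1 + tolerance)


# Latencies in milliseconds, made up for illustration.
baseline_ms = [100] * 20
candidate_ms = [200] * 20
print(has_regressed(baseline_ms, candidate_ms))
```

A check like this could run after every merge, failing the build when `has_regressed` returns `True`.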

In the process of transforming our infrastructure and preparing our new OpenStack IaaS to host all our applications, we've been looking for performance wins across the whole stack. We've got some aggressive targets to meet. We've investigated many hardware options and chosen what we think is the best fit for us, instrumented some of the OpenStack APIs, and benchmarked them to produce interesting results. We're not done yet, but we do have a "Half-Time Match Report".
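
As a rough idea of what instrumenting an API call can look like, here is a minimal sketch that records how long a block of code takes. It is not the tooling used in the talk (the slides name Tracelytics and Boundary); `timed`, `timings`, and `fake_list_instances` are hypothetical names used only for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name, sink):
    """Record the wall-clock duration of the enclosed block, in seconds,
    by appending it to sink[name]. The duration is recorded even if the
    block raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink.setdefault(name, []).append(time.perf_counter() - start)


def fake_list_instances():
    """Stand-in for a real API call such as listing instances."""
    time.sleep(0.01)
    return []


timings = {}
with timed("list_instances", timings):
    fake_list_instances()

print(timings)
```

Collecting many such samples per call name gives the raw data needed for benchmarking and for spotting slow outliers.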

Join me as I walk through what we've learned so far and propose follow-on areas for investigation and optimization.

Making clouds go faster, for fun and profit!

  1. This slide intentionally left blank. Wednesday, 17 October 12
  2. MAKING CLOUDS GO FASTER FOR FUN AND PROFIT
  3. (no text)
  4. Speakers: Who crafted this talk?
  5. Alex Howells (@nixgeek), Technical Operations, LivingSocial. alex.howells@livingsocial.com, http://github.com/agh
  6. Paul Thomas (@ftergl0w), Technical Operations, LivingSocial. paul.thomas@livingsocial.com, http://github.com/AfterGlow
  7. Bedtime Reading: You can get a copy of these slides after the talk - https://speakerdeck.com/u/nixgeek
  8. Problem?
  9. Performance: It doesn’t need to be rocket science. It does matter though! I promise I’m not trolling you.
  10. In a parallel universe... “Oh man, that was too fast! It’s so much better now it’s slow!!” -- Average User
  11. YEAH RIGHT: I wish I had users who were that easy to please! But since we live in the real world...
  12. In our universe... “Why is that dude smiling?! This is too slow! Why can’t it be faster?” -- Average Users
  13. THINGS ARE IMPROVING: Cactus => Diablo => Essex => Folsom. But things can improve faster with focus!
  14. Today: Mostly reliable, but can be a bit slow!
  15. The Future? Faster. More scalable. A real driving experience.
  16. What’s the big deal? Why should I listen to you?
  17. WE’RE A LOT LIKE YOU! Developers. Operators. Engineers. Users. We see potential. We see opportunities.
  18. (no text)
  19. Airspace: LivingSocial PaaS. We care about speed because ...
      * Scaling services up/down needs to happen fast!
      * Needing to maintain huge pools of “slack capacity” to account for sudden spikes in traffic sucks.
      * Upgrading applications should be fast.
      What does fast mean to us? One example? New instances online in under 10 seconds.
  20. Performance Matters: What could your business do if instances came online in under 5 seconds vs. 50 seconds?
      > Makes integration tests leveraging the Cloud complete much faster.
      > Seasonal spikes? React to them faster - happier customers spend more money.
      > Engineers who don’t grumble that “getting servers is a pain in the ass”.
      > Deploy new applications and services more quickly and easily.
      Along with many other things ...
  21. What do we do?
  22. Think Positive: Because solutions are better than problems!
  23. (no text)
  24. Two-Pronged Approach: Hardware & Software, “A Love Story”
  25. Warning! Picking the right hardware is quite hard. It’s often individual to your users’ needs. What works for us may not rock your world.
  26. Hardware
  27. Our Servers: Supermicro 1027R-WRFT+
      * 2x Intel Xeon E5-2670 (8C/16T 2.60GHz)
      * 16 x 8GB 1600MHz ECC Memory
      * LSI 9266-8i (1-LD RAID-10)
      * 8 x Intel 520-series 240GB SSD
      * Dual-Port Intel X540 10GBASE-T
  28. Benefits
      * ‘Just right’ balance of CPU/RAM for us.
      * Exceptional ephemeral I/O performance
        > Not using eMLC - trade off?
        > We can think about SQL on IaaS
      * A surplus of network bandwidth
      Servers are not a bottleneck!
  29. Our Network
      * Top of Rack - Arista Networks 7050T: 48-port 10GBASE-T Switch + 4-port 40GbE (uplinks)
      * Zone Spine - Arista Networks 7050Q: 16-port 40GbE Switch
  30. Benefits
      * A network which runs Linux!
      * Ability to automate it via ZTP and Chef
      * Non-blocking communication in a rack.
      * Provision 160Gbps to spine via four cables.
      * Under 2:1 contention for comms in/out of rack.
      * Less need to think about QoS!
      Network is not a bottleneck!
  31. Software
  32. Production
      * Ubuntu 12.04 LTS (‘Precise Pangolin’)
      * Hypervisor -- KVM
      * CloudScaling OCS 1.3 .. based off OpenStack Essex ..
      * Moving to OCS 2.0 in near future... .. that one is OpenStack Folsom ..
  33. Ubuntu 12.04 LTS (‘Precise Pangolin’). Hypervisor -- KVM. Useful for development and testing .. we’re running OpenStack Folsom now .. Most of the data shown later was grabbed with help from DevStack running on similar hardware to our production environment.
  34. WHAT NOW? We’ve picked the hardware stack. It’s awesome. We’ve got our software installed. It’s looking great.
  35. Monitoring: Support calls are imprecise. We need data!
  36. Old School
      * Is my service (API) responding on TCP/8774?
      * Am I able to make a GET and fetch instance info?
      * Is my server running all the processes it should?
      * Are there any errors on my network ports?
      If any of this looks broken, send me alerts saying so!
  37. New Thinking: “End-User Experience Monitoring”
      * “How long did my website take to show?”
      * Individual performance of each click or API call
      * Inspection of latency within the application
      If lots of users’ interactions are slow, then I want you to alert me. If it’s just an outlier - log it and shut up.
  38. DEMO TIME! Because pretty pictures are awesome. We’ll call the slowest transactions our “Disaster Porn”.
  39. Boundary “AppViz”
      * Port-to-port throughput/latency
      * How much SQL traffic are you doing?
      Updates in real-time. Look backwards in time. Powered by IPFIX (RFC 5101)
  40. Tracelytics: Latency Trends
      * Over the last 60 minutes
      * Over the last 24 hours
      * Over the last 7 days
      Lots more cool stuff to help ... We’ll blitz through a few more things next ... Top Tip: This is bad news.
  41. Tracelytics Patches: If you want to try out OpenStack APM - https://github.com/Afterglow/tracelytics-openstack Any questions? Just open an issue!
  42. Glance
  43. Keystone
  44. Nova
  45. Nova
  46. Nova
  47. Nova
  48. “Call to Arms”
      > Performance regression tests as an OpenStack CI gate?
      > More people talking about “How I fixed those >5 second outliers!”
      > Better ‘shared knowledge’ about what settings to tweak for added oomph
      > Architectural analysis asking about “big picture” (big impact) changes
      Reminder about those patches - https://github.com/Afterglow/tracelytics-openstack
  49. Credits: Because these folks are awesome. N.B. Not intended as an exhaustive list of all the awesome people in the world/room!
  50. Credits: http://www.livingsocial.com
  51. Credits: http://www.cloudscaling.com
  52. Credits: http://www.aristanetworks.com
  53. Credits: http://www.tracelytics.com
  54. We’re done talking, thanks for listening! Any questions?
  55. Interested? E-mail Ken - ken.persel@livingsocial.com Or just find me! Reminder that these slides are over at - https://speakerdeck.com/u/nixgeek
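
The rack bandwidth figures on slides 29 and 30 can be sanity-checked with a little arithmetic. The sketch below assumes contention means total server-facing bandwidth divided by total uplink bandwidth. The slides do not say how many of the 48 ports are populated, so the port counts used here are assumptions for illustration only.

```python
def oversubscription(server_ports, port_gbps=10, uplinks=4, uplink_gbps=40):
    """Ratio of server-facing bandwidth to uplink bandwidth for one
    top-of-rack switch. Defaults follow slides 29-30: 10GBASE-T server
    ports and four 40GbE uplinks (160Gbps to the spine)."""
    return (server_ports * port_gbps) / (uplinks * uplink_gbps)


# All 48 ports in use: 480Gbps down vs 160Gbps up.
print(oversubscription(48))  # 3.0
# 32 ports in use gives exactly 2:1, so "under 2:1" would mean
# fewer than 32 populated ports (an inference, not stated in the deck).
print(oversubscription(32))  # 2.0
print(oversubscription(31))
```

Under these assumptions, the "under 2:1" claim is consistent with racks that leave some of the 48 ports unused.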
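
The first "Old School" check on slide 36 (is the API responding on TCP/8774?) can be sketched with nothing but the standard library. This is a minimal illustration, not the monitoring setup used by the speakers. The function name and the two-second timeout are assumptions.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within
    `timeout` seconds, False on refusal, timeout, or other socket error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A call such as `port_is_open("127.0.0.1", 8774)` only tells you that something is listening. It says nothing about how long real requests take, which is the gap the "New Thinking" slide addresses.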
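
The alerting rule on slide 37 (alert when lots of interactions are slow, only log an isolated outlier) might be expressed as below. The 5-second cutoff echoes the ">5 second outliers" on slide 48. The 20% fraction and the function name are assumptions chosen purely for illustration.

```python
def classify(latencies, slow_after=5.0, alert_fraction=0.2):
    """Decide what to do with a window of request latencies (seconds).

    Returns "ok" if nothing was slow, "alert" if at least
    `alert_fraction` of requests were slower than `slow_after`,
    and "log" if there were only a few slow outliers.
    """
    slow = [x for x in latencies if x > slow_after]
    if not slow:
        return "ok"
    if len(slow) / len(latencies) >= alert_fraction:
        return "alert"
    return "log"


print(classify([0.2] * 99 + [9.0]))     # one outlier in 100 requests
print(classify([6.0] * 5 + [0.2] * 5))  # half of the requests are slow
```

The thresholds would need tuning per service. The point is that the decision is made on the distribution of user-visible latency, not on a single up/down probe.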
