Active Queue Management (for Cloud Services)

Peak load and bursty traffic are problem spaces that are often (and tragically) confused with each other, invariably to the detriment of both ops and users. While peak load is all about capacity management, in a bursty situation you might have to prioritize (or even drop!) requests. Knowing which requests to process, and how to actually process them, is the world of Active Queue Management (AQM). While AQM has long been exclusively the domain of the TCP/IP crowd, it has slowly been making its way into the world of cloud services, albeit with much (faulty!) wheel-reinventing.
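Here is a concrete version of prioritizing, or even dropping, requests. This is a minimal sketch of a bounded request queue that, once full, discards the least important request instead of blindly rejecting whatever arrives last. It assumes each request carries a small integer priority where lower means more important. `PriorityDropQueue`, `offer`, and `poll` are hypothetical names, not anything from the talk. Within a single priority level it degrades to plain tail drop.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(order=True)
class _Entry:
    priority: int                        # lower number = more important (assumed convention)
    seq: int                             # arrival order; FIFO tie-breaker within a priority
    request: Any = field(compare=False)  # payload never takes part in ordering


class PriorityDropQueue:
    """Hypothetical bounded queue that sheds the least important request when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._heap: List[_Entry] = []
        self._seq = itertools.count()
        self.dropped: List[Any] = []  # kept only so callers can inspect what was shed

    def offer(self, request: Any, priority: int) -> bool:
        """Enqueue a request; return False if the incoming request itself was dropped."""
        entry = _Entry(priority, next(self._seq), request)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        # Full: the "worst" entry has the largest priority number and, among equals,
        # arrived last. This is an O(n) scan, which is fine for a sketch.
        worst = max(self._heap)
        if entry < worst:
            # The newcomer is more important: evict the worst queued request.
            self._heap.remove(worst)
            heapq.heapify(self._heap)
            heapq.heappush(self._heap, entry)
            self.dropped.append(worst.request)
            return True
        # Otherwise behave like plain tail drop and reject the newcomer.
        self.dropped.append(request)
        return False

    def poll(self) -> Optional[Any]:
        """Return the most important (then oldest) request, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).request
```

Take a capacity of 2 as an example. Offering `a` (priority 1), then `b` (priority 2), then `c` (priority 0) evicts `b`. A later `d` at priority 1 is rejected, because it ties with `a` but arrived later. The business decides what the priorities mean. The queue only enforces them.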
Join me as I take you through the world of Active Queue Management, back-pressure, load ramping, and tactical avoidance: things that most people should be architecting into their services, but aren't.
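Load ramping and back-pressure can be sketched together. After a restart, admit only a small number of requests per second and raise that ceiling linearly to full capacity. Everyone else is told to come back later. This minimal sketch rests on illustrative assumptions: the linear ramp, the one-second fixed window, and the names `RampingAdmission`, `start_rps`, `full_rps`, and `ramp_seconds` are all hypothetical. The clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable


class RampingAdmission:
    """Hypothetical admission gate whose per-second limit ramps up after a restart."""

    def __init__(self, start_rps: float, full_rps: float, ramp_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.start_rps = start_rps        # requests/second allowed right after coming up
        self.full_rps = full_rps          # steady-state requests/second
        self.ramp_seconds = ramp_seconds  # how long the linear ramp takes
        self.clock = clock
        self.t0 = clock()                 # the moment the service came back up
        self._window_start = self.t0
        self._admitted = 0                # requests admitted in the current one-second window

    def current_limit(self) -> float:
        """Linear ramp from start_rps to full_rps over ramp_seconds."""
        if self.ramp_seconds <= 0:
            return self.full_rps
        frac = min(1.0, (self.clock() - self.t0) / self.ramp_seconds)
        return self.start_rps + frac * (self.full_rps - self.start_rps)

    def try_admit(self) -> bool:
        """True: process the request now. False: push back (e.g. 'retry later')."""
        now = self.clock()
        if now - self._window_start >= 1.0:  # start a fresh one-second window
            self._window_start = now
            self._admitted = 0
        if self._admitted < self.current_limit():
            self._admitted += 1
            return True
        return False
```

Back-pressure comes from a `False` returned by `try_admit`. The caller has to slow down or retry later instead of piling more work into a queue that is still draining.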

Notes
  • An overall approach to Preparedness
  • The Buddha nature of Erlang – Fault Tolerance
  • Hosted PBX
  • And you always get asked this
  • Don’t let it spread
  • Gen 2 – “Hosted PBXs”
  • Our story starts on a happy Saturday in February
  • It’s still Friday
  • Just part of one cluster failed, but a threshold had been passed
  • No worries, we’ll just bounce that one cluster, it’ll all be good
  • Total System Meltdown
  • All the calls keep retrying, causing memory utilization to go through the roof
  • Voicemail conversion was going on independent of everything else, causing CPU utilization to spike
  • Eventually, the cache timed out, and tried to reload stuff from the disk.
  • And then everyone tries the Apps, and the Twitters and the Facebooks and the everythings.
  • Some of us have been confronted by this
  • Total System Meltdown
  • And you always get asked this
  • There is only so much planning you can do. At some point, the 1000-year flood hits.
  • The point being, shit will happen. The question is, when shit happens, can you clean up?
  • It’s not just us
  • It’s not just us!!! We are not alone! (Breaking Benjamin) (5 of 5 leading providers…)
  • Do you have disks in the loop? Maybe humans? Or large data? (Postgres data moved to a backup datacenter?)
  • Yeah, right. It’s what everybody says. And then shit happens.
  • How fast are you? How quickly can you come back up? Can you store enough state to survive?
  • Is bufferbloat a problem?
  • Once you are up, can you draw down the queue fast enough? Or at all, for that matter?
  • Is backpressure going to be a problem?
  • If the answer is “Yes”, then the talk is over, because it just works.
  • What if the answer is “No”? (Now we have a story)
  • Programmable: If you’re lucky, your infrastructure will automagically support ramping.
  • Fake it. People respond subconsciously to these, and actually wait. You can even get away with dropping the request (this assumes that you can recover in time).
  • This happens inside the airport too! Passengers self-select the best gates to enter (intelligent routing).
  • (Programmable, Behavioral, & Self-managed) The plane moves around different runways before leaving, to free up gates and make passengers think something is happening (always take the first flight out! And the last flight back!)
  • Surprisingly, airlines are ridiculously good at AQM.
  • The question is, what do you do when you can’t come up in time? A 3-gallon bucket, 5 gallons of water…
  • Just start dropping when the queue fills up. This is pretty bad: global synchronization becomes a problem. Planes don’t take off until they get clearance from the other end.
  • Slow Start, AQM, RED, CoDel, … Why don’t we learn from networks?
  • RED / SRED (RED in a different light – toilet bowl)
  • The 3rd priority airport always gets the shaft
  • F(low) RED: RED on a per-flow basis (the entire route map). Kinda the default (discard the second request).
  • RED-P(referential) D(rop): Does RED only for high-bandwidth flows (high-traffic routes). (Throttle spammy clients. Or features.)
  • Fixed two bugs in RED. Made it feedback-based (self-tuning). The toilet diagram caused problems.
  • Sliced bread has nothing on it (Dave Taht)
  • Know when something breaks
  • Know what broke
  • And then everyone tries the Apps, and the Twitters and the Facebooks and the everythings.
  • Don’t bother logging, checking, testing, etc.
  • So, no transcoding, so no CPU
  • The “replug”
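The notes above run from tail drop through RED and its variants. As a point of reference, here is a minimal sketch of the classic RED (Random Early Detection) drop decision as described by Floyd and Jacobson. It keeps an exponentially weighted moving average of the queue length. Below `min_th` it never drops, and above `max_th` it always drops. In between, it drops with a probability that grows with the average and with the number of requests admitted since the last drop. The class name `RedDropper` and the parameter values are illustrative assumptions. The idle-period correction to the average is omitted.

```python
import random
from typing import Optional


class RedDropper:
    """Hypothetical sketch of the classic RED drop decision."""

    def __init__(self, min_th: float, max_th: float, max_p: float = 0.1,
                 weight: float = 0.002, rng: Optional[random.Random] = None) -> None:
        if not 0 <= min_th < max_th:
            raise ValueError("need 0 <= min_th < max_th")
        self.min_th = min_th
        self.max_th = max_th
        self.max_p = max_p      # drop probability as avg approaches max_th
        self.weight = weight    # EWMA weight for the average queue length
        self.rng = rng or random.Random()
        self.avg = 0.0          # smoothed queue length
        self.count = -1         # arrivals since the last drop (-1 = outside the RED band)

    def should_drop(self, queue_len: int) -> bool:
        """Call once per arriving request with the current (instantaneous) queue length."""
        # Smooth the queue length so short bursts pass through untouched.
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        if self.avg < self.min_th:
            self.count = -1
            return False
        if self.avg >= self.max_th:
            self.count = 0
            return True
        # Between the thresholds: the base probability rises linearly with avg...
        self.count += 1
        p_b = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        # ...and grows with count, so drops are spread out rather than clustered.
        denom = 1 - self.count * p_b
        p_a = 1.0 if denom <= 0 else p_b / denom
        if self.rng.random() < p_a:
            self.count = 0
            return True
        return False
```

The randomness is what avoids the global synchronization that plain tail drop causes. Clients are shed a few at a time, early, instead of all at once when the queue overflows. Per-flow variants such as FRED or RED-PD would keep one of these decisions (or its statistics) per client or route rather than one for the whole queue.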

    1. Active Queue Management. Mahesh Paolini-Subramanya (@dieswaytoofast), V.P. R&D, Ubiquiti Networks. V 1.0 © Ubiquiti Networks, Inc. All Rights Reserved
    2. The Metrics
    3. The Metrics: Phone calls per second
    4. The Metrics: Phone calls per second x 1000
    5. The Metrics: Simultaneous phone calls
    6. The Metrics: Simultaneous phone calls x 10,000
    7. The Metrics: API requests
    8. The Metrics: ∞ API requests
    9. Multi-Site
    10. My Vacation
    11. (Actually, the day before)
    12. A small failure…
    13.
    14. The Horror! The Horror!
    15. Why are my calls failing?
    16. You better call me back!
    17. I’m still p***ed off!
    18. And your stupid Apps don’t work!
    19. Dude! WTF?!?!
    20. The Horror! The Horror!
    21. Surely you tested?
    22. Romney 2012
    23. (Lack of) Speed Kills
    24. !!!!!Queues!!!!!!
    25. Queues: Can you recover quickly? Bufferbloat doesn’t matter, right? Once up, can you deal with the backlog? Back-pressure isn’t an issue, right?
    26. Queues: Can you recover quickly? Bufferbloat doesn’t matter, right? NOPE. Once up, can you deal with the backlog? Back-pressure isn’t an issue, right?
    27. Programmable
    28. Behavioral
    29. Self Managed
    30. Queues Queue Mgmt.
    31. Queues Active Queue Mgmt.
    32. Something’s gotta give
    33. Tail Drop
    34. God (category – TCP/IP)
    35. RED
    36. RED
    37. Newark Airport
    38. FRED
    39. RED-PD
    40. Queues: RED in a Different Light (1999)
    41. Queues: CoDel
    42. What about Testing?
    43. DUH … What about Testing?
    44. The Bottom Line: Black swans will occur – Oh Yes!
    45. The Bottom Line: Black swans will occur – Oh Yes! You can only improve what you control.
    46. The Bottom Line: Black swans will occur – Oh Yes! You can only improve what you control. Your business will define your discards.
    47. The Bottom Line: Black swans will occur – Oh Yes! You can only improve what you control. Your business will define your discards. Agility is (always!) your friend.
    48. The Business Beware the Black Swan
    49. Questions mahesh@dieswaytoofast.com @dieswaytoofast
    50. You, apparently, forgot about me
    51. Free Calling
    52. No Voicemail
    53. “Active” Queue Management
    54. Questions mahesh@dieswaytoofast.com @dieswaytoofast
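CoDel (slide 41) takes a different tack from RED. It ignores queue length and watches how long each item has sat in the queue, its sojourn time. It only starts dropping once that time has stayed above a small target for a whole interval. It then drops more often, spacing drops by interval / sqrt(count), until the delay falls back under target. Below is a minimal Python sketch that loosely follows the pseudocode in RFC 8289 and uses the RFC's default 5 ms target and 100 ms interval.

This sketch makes three simplifications of its own. The byte-based MAXPACKET check is replaced by "never drop the last queued item". There is no ECN marking. The caller passes in the current time explicitly. `CoDelQueue` and its methods are hypothetical names.

```python
import math
from collections import deque
from typing import Any, Deque, List, Optional, Tuple


class CoDelQueue:
    """Hypothetical CoDel sketch; times are in seconds and passed in by the caller."""

    def __init__(self, target: float = 0.005, interval: float = 0.100) -> None:
        self.target = target        # acceptable standing delay (RFC 8289 default: 5 ms)
        self.interval = interval    # how long delay must stay high before acting (100 ms)
        self._q: Deque[Tuple[float, Any]] = deque()  # (enqueue time, item)
        self.first_above_time: Optional[float] = None
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0
        self.lastcount = 0
        self.dropped: List[Any] = []

    def enqueue(self, item: Any, now: float) -> None:
        self._q.append((now, item))

    def _control_law(self, t: float) -> float:
        # Drops get closer together as count grows: interval / sqrt(count).
        return t + self.interval / math.sqrt(self.count)

    def _dodequeue(self, now: float) -> Tuple[Optional[Any], bool]:
        """Pop the head item and report whether CoDel may drop it."""
        if not self._q:
            self.first_above_time = None
            return None, False
        tstamp, item = self._q.popleft()
        sojourn = now - tstamp
        ok_to_drop = False
        # Stand-in for RFC 8289's MAXPACKET test: never drop the last queued item.
        if sojourn < self.target or not self._q:
            self.first_above_time = None
        elif self.first_above_time is None:
            self.first_above_time = now + self.interval
        elif now >= self.first_above_time:
            ok_to_drop = True
        return item, ok_to_drop

    def dequeue(self, now: float) -> Optional[Any]:
        """Return the next item to serve, dropping stale ones per CoDel; None if empty."""
        item, ok_to_drop = self._dodequeue(now)
        if self.dropping:
            if not ok_to_drop:
                self.dropping = False  # delay is back under control
            while self.dropping and now >= self.drop_next:
                self.dropped.append(item)
                self.count += 1
                item, ok_to_drop = self._dodequeue(now)
                if not ok_to_drop:
                    self.dropping = False
                else:
                    self.drop_next = self._control_law(self.drop_next)
        elif ok_to_drop:
            # Enter the dropping state: drop this item and serve the next one.
            self.dropped.append(item)
            item, _ = self._dodequeue(now)
            self.dropping = True
            delta = self.count - self.lastcount
            self.count = 1
            # If we were dropping recently, resume near the previous drop rate.
            if delta > 1 and now - self.drop_next < 16 * self.interval:
                self.count = delta
            self.drop_next = self._control_law(now)
            self.lastcount = self.count
        return item
```

Unlike tail drop, a burst that drains within one interval loses nothing. Only a queue whose delay stays above target for a full interval starts shedding, and that kind of standing queue is exactly what bufferbloat produces.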
