Building a cloud service on a cloud infrastructure. Also, cloud.

  1. 1. Building a cloud service on a cloud infrastructure. Also, cloud. Mikhail Panchenko, Surge 2011
  2. 2. Who Am I? Pancakes; Infrastructure Engineer at SimpleGeo; Backend Engineer at Flickr before that; Backend and Frontend Engineer at Yahoo! Ops/Tools before that; Philosophy, Economics, and French major before that; @mihasya; pancakes@simplegeo.com
  3. 3. Tools for mobile/geo developers; Primarily focused on services, some data-oriented APIs; PaaS, I guess? I've lost track a bit; Availability, redundancy part of brand: Our outage = your outage; No pressure
  4. 4. Agenda: Goals; A little bit of theory; Challenges in The Cloud; General Architecture; Implementation Details
  5. 5. Architectural Goals: High availability; Linear scalability; Elasticity/Flexibility; Redundancy/Fault Tolerance
  6. 6. Read: don't wake me up, please
  7. 7. Sound Familiar?
  8. 8. Some Theory, Food for Thought
  9. 9. The Internets as Complex Systems
  10. 10. http://www.amazon.com/Normal-Accidents-Living-High-Risk-Technologies/dp/0691004129
  11. 11. "Complex interactions are those of unfamiliar sequences, or unplanned and unexpected sequences, and either not visible or not immediately comprehensible." Charles Perrow. Normal Accidents: Living with High-Risk Technologies (p. 78). Kindle Edition.
  12. 12. "The notion of baffling interactions is increasingly familiar to all of us. [...] As systems grow in size and in the number of diverse functions they serve, and are built to function in ever more hostile environments, increasing their ties to other systems, they experience more and more incomprehensible or unexpected interactions. They become more vulnerable to unavoidable system accidents." Charles Perrow. Normal Accidents: Living with High-Risk Technologies (p. 72). Kindle Edition.
  13. 13. Fortunately, This Is Only The Internet
  14. 14. "The beauty of this is its simplicity. Once a plan gets too complex, everything can go wrong." Walter Sobchak, The Big Lebowski
  15. 15. Interactions: Linear vs Complex
  16. 16. Coupling: Tight vs Loose
  17. 17. Three Mile Island: "... they found that radioactive water was not traveling to the tank they intended, but because of complex flow and pressure interactions, was going to a different, wrong tank, which also overflowed, this time in the auxiliary building." Charles Perrow. Normal Accidents: Living with High-Risk Technologies (pp. 22-23). Kindle Edition.
  18. 18. Amazon Web Services: "The traffic shift was executed incorrectly and rather than routing the traffic to the other router on the primary network, the traffic was routed onto the lower capacity redundant EBS network." "Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region" http://aws.amazon.com/message/65648/
  19. 19. Common Theme: Previously independent systems become coupled as a result of unanticipated interactions, leading to fundamentally surprising results
  20. 20. When pumping radioactive water into the wrong tank, the behavior of the program is undefined
  21. 21. But where does The Cloud come in??
  22. 22. The Trifle Analogy Photo by mathematically_impossible
  23. 23. The Trifle Analogy Photo by mathematically_impossible
  24. 24. A complex system consisting of complex subsystems
  25. 25. Photo by wwarby
  26. 26. The Trifle Analogy. Original photos by mathematically_impossible and miheco
  27. 27. Tightly coupled to a complex system over which you have no control and into which you have no insight
  28. 28. Photo by 20after4
  29. 29. Recall: "Baffling Interactions"
  30. 30. "The notion of baffling interactions is increasingly familiar to all of us. [...] As systems grow in size and in the number of diverse functions they serve, and are built to function in ever more hostile environments, increasing their ties to other systems, they experience more and more incomprehensible or unexpected interactions. They become more vulnerable to unavoidable system accidents." Charles Perrow. Normal Accidents: Living with High-Risk Technologies (p. 72). Kindle Edition.
  31. 31. DECOUPLE DECOUPLE DECOUPLE ( also, simplify )
  32. 32. Photo by erikcharlton
  33. 33. Decouple Your Subsystems: Shared resources are the most common source of unexpected interaction; Resist temptation to double up on roles; Use queues, caches as buffers (NOTE: those are complex subsystems of their own)
  34. 34. Decouple Your Subsystems: Explicit Decoupling; CPU Affinity: Webserver on 1-7, SSH etc. on 8; Crude, but gets the job done; More robust solutions: containers
  35. 35. Decouple Your Functionality: Service architecture; Each service does one thing well; Easier to measure, understand, and accommodate resource demands; Reduce potential for interactions, cross-functional failure
  36. 36. Decouple from Your Environment with Configuration Management: Decouple from your platform (OS/kernel); Easy to test/bench potential candidates; Easy to migrate if you find a winner; This is especially important when dealing with cloud; Automate as much of deploy/bootstrap process as possible; Probably won't help much during a provider outage due to stampede; BUT: DirectConnect; You might not always be in the cloud..
  37. 37. Decouple Your Datacenters: Most robust redundancy mechanism; Hot-hot keeps you on your toes; Simplifies, not just for the cloud; Yahoo! now forgoing datacenter features like HVAC: "If it gets too hot in Washington, turn that DC off for a while"; I'm sure they're not the only ones
  38. 38. Decouple Your Datacenters: "AZ" - Basic building block for EC2; This is the level they (theoretically) decouple at; They are probably thinking along the same lines we are - must be able to turn off one AZ without impact in the other
  39. 39. ( there's a hidden interaction there )
  40. 40. Every datacenter as an independent microcosm of your overall architecture
  41. 41. The Birds 'n' the Bees
  42. 42. Bird's Eye View
  43. 43. Photo by reschroederimages
  44. 44. Bird's Eye View
  45. 45. ( note the absence of specifics )
  46. 46. Bird's Eye View
  47. 47. Maintenance - Divide & Conquer
  48. 48. Local Degradation - Divide & Conquer
  49. 49. Incompatible Upgrade - Guess!
  50. 50. Incompatible Upgrade - Guess!
  51. 51. Incompatible Upgrade - Yay!
  52. 52. Baffling Single Node Failure
  53. 53. 202 Accepted
  54. 54. Spike in Write Traffic
  55. 55. Really simple operational steps for stressful tasks & situations
  56. 56. Temporally decouple the problem from the resolution
  57. 57. Go back to sleep Photo by joshme17
  58. 58. Now, how about those specifics?
  59. 59. Write Path
  60. 60. ELB: Dynamic Load Balancing; Flexible virtual IP; Easy to add/remove AZs; Uses healthchecks to automatically evict nodes
  61. 61. Gate - "Layer 8 Proxy": Lightweight Node.js daemon; OAuth; Rate Limiting; Basic routing to actual services
  62. 62. Recall: "Decouple Your Functionality"
  63. 63. Services - Pick Your Own Adventure: Node.js and Python (Some people just hate Node.js); Can be anything, as long as Gate can talk to it ( another reason to decouple ); Highly specialized
  64. 64. RabbitMQ: A grenade for our knife-fight; Very flexible - more than we need (Simplification candidate); New persistor in >= 1.3 - degradation over failure; See talk at 1:30PM
  65. 65. Cassandra: A mostly-textbook DHT; Homogenous distributed model; Random load distribution; Partition tolerance; A perfect foundation for our architecture
  66. 66. Partition Tolerance: It's not just for outages
  67. 67. Recall: "Divide & Conquer"
  68. 68. This too is a partition
  69. 69. Thank You! @mihasya; pancakes@simplegeo.com
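
Slides 33 and 53-56 talk about using queues as buffers, answering writes with "202 Accepted", and temporally decoupling the problem from the resolution. The slides give no code, so the following is only a rough Python sketch of that idea. It assumes an in-process `queue.Queue` standing in for the message broker and a plain dict standing in for the datastore; the names `accept_write` and `drain`, the buffer size, and the 503 fallback are all hypothetical and not taken from the talk.

```python
import queue

# Hypothetical stand-ins: a bounded in-process queue plays the broker,
# a plain dict plays the datastore. Neither is from the talk.
write_buffer = queue.Queue(maxsize=1000)
datastore = {}


def accept_write(key, value):
    """Buffer a write and return an HTTP-style status code.

    202 means "accepted for later processing", not "stored".
    """
    try:
        write_buffer.put_nowait((key, value))
    except queue.Full:
        # Assumption: shed load once the buffer is full.
        return 503
    return 202


def drain(max_items=None):
    """Apply buffered writes to the datastore; return how many were applied."""
    applied = 0
    while max_items is None or applied < max_items:
        try:
            key, value = write_buffer.get_nowait()
        except queue.Empty:
            break
        datastore[key] = value
        applied += 1
    return applied


if __name__ == "__main__":
    # A spike of writes is acknowledged immediately...
    statuses = [accept_write(f"place:{i}", {"n": i}) for i in range(5)]
    print(statuses)
    # ...and persisted later, whenever the storage tier is ready.
    print(drain(), len(datastore))
```

The point of the sketch is that the caller gets an answer whether or not the storage tier is healthy at that moment; the buffer absorbs the spike and `drain` can run once someone has had time to look at the problem. As slide 33 notes, the buffer is itself a complex subsystem that needs its own care.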
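
Slide 65 describes Cassandra as a mostly-textbook DHT with random load distribution. The sketch below illustrates only the textbook idea: hash each key onto a ring and walk clockwise to the node that owns it. It is not Cassandra's implementation. The `Ring` class, the node names, and the one-token-per-node layout are illustrative assumptions, and MD5 is used here simply as a convenient uniform hash.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring using MD5 as a uniform hash."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: each node owns the arc that ends at its token."""

    def __init__(self, nodes):
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, key: str) -> str:
        # First node token >= the key's token, wrapping past the end.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._entries[i % len(self._entries)][1]

    def replicas(self, key: str, n: int):
        """Owner plus the next distinct nodes clockwise, up to n in total."""
        i = bisect.bisect_left(self._tokens, token(key))
        count = min(n, len(self._entries))
        return [self._entries[(i + k) % len(self._entries)][1] for k in range(count)]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c"])  # hypothetical node names
    print(ring.owner("place:42"), ring.replicas("place:42", 2))
```

Because every node plays the same role, removing one node only moves the keys that node owned; keys owned by the remaining nodes stay where they were. That property is what makes treating a maintenance window or a lost datacenter as "just another partition" (slides 66-68) plausible.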
