Building a cloud service on a cloud infrastructure. Also, cloud.
Mikhail Panchenko, Surge 2011
Who Am I?
  Pancakes
  Infrastructure Engineer at SimpleGeo
  Backend Engineer at Flickr before that
  Backend and Frontend Engineer at Yahoo!
  Ops/Tools before that
  Philosophy, Economics, and French major before that
  @email@example.com
Tools for mobile/geo developers
  Primarily focused on services, some data-oriented APIs
  PaaS, I guess? I've lost track a bit
  Availability, redundancy part of brand
    Our outage = your outage
    No pressure
Agenda
  Goals
  A little bit of theory
  Challenges in The Cloud
  General Architecture
  Implementation Details
"Complex interactions are those of unfamiliar sequences, or unplanned and unexpected sequences, and either not visible or not immediately comprehensible."
Charles Perrow. Normal Accidents: Living with High-Risk Technologies (p. 78). Kindle Edition.
"The notion of baffling interactions is increasingly familiar to all of us. [...] As systems grow in size and in the number of diverse functions they serve, and are built to function in ever more hostile environments, increasing their ties to other systems, they experience more and more incomprehensible or unexpected interactions. They become more vulnerable to unavoidable system accidents."
Charles Perrow. Normal Accidents: Living with High-Risk Technologies (p. 72). Kindle Edition.
Three Mile Island
"... they found that radioactive water was not traveling to the tank they intended, but because of complex flow and pressure interactions, was going to a different, wrong tank, which also overflowed, this time in the auxiliary building."
Charles Perrow. Normal Accidents: Living with High-Risk Technologies (pp. 22-23). Kindle Edition.
Amazon Web Services
"The traffic shift was executed incorrectly and rather than routing the traffic to the other router on the primary network, the traffic was routed onto the lower capacity redundant EBS network."
"Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region" http://aws.amazon.com/message/65648/
Common Theme
  Previously independent systems become coupled as a result of unanticipated interactions, leading to fundamentally surprising results
When pumping radioactive water into the wrong tank, the behavior of the program is undefined
Decouple Your Subsystems
  Shared resources are the most common source of unexpected interaction
  Resist temptation to double up on roles
  Use queues, caches as buffers
    NOTE: those are complex subsystems of their own
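To make the "queues as buffers" point concrete, here is a minimal sketch, assuming an in-process bounded queue stands in for whatever real queueing system sits between two subsystems. The helper names `offer` and `drain` are hypothetical, chosen for illustration. The bound is the important part: a full buffer rejects new work instead of letting a slow consumer drag the producer down with it.

```python
import queue


def offer(buffer, item):
    """Try to enqueue without blocking; return False if the buffer is full."""
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        # Shed load instead of blocking the producer on a slow consumer.
        return False


def drain(buffer, handler, max_items):
    """Consume up to max_items from the buffer at the consumer's own pace."""
    handled = 0
    while handled < max_items:
        try:
            item = buffer.get_nowait()
        except queue.Empty:
            break
        handler(item)
        handled += 1
    return handled


# Hypothetical usage: a buffer of 3 slots, a producer offering 5 items.
buffer = queue.Queue(maxsize=3)
accepted = [offer(buffer, n) for n in range(5)]

processed = []
drained = drain(buffer, processed.append, max_items=10)
```

The producer learns right away which items were dropped, and the consumer never sees more than the buffer can hold. As the NOTE on the slide says, the buffer is itself a subsystem that needs sizing and monitoring.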
Decouple Your Subsystems
  Explicit Decoupling
  CPU Affinity
    Webserver on 1-7; SSH etc on 8
    Crude, but gets the job done
  More robust solutions - containers
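A rough sketch of the CPU affinity split, assuming Linux and Python's `os.sched_setaffinity`. The helper names `split_cpus` and `pin_current_process` are made up for this example, and Linux numbers CPUs from 0, so the slide's "1-7 and 8" becomes 0-6 and 7 here. This is one way to do it, not necessarily how it was done in production.

```python
import os


def split_cpus(cpus):
    """Reserve the highest-numbered CPU for admin tasks, the rest for serving."""
    ordered = sorted(cpus)
    if len(ordered) < 2:
        # Nothing to split on a single-CPU machine; both roles share it.
        return set(ordered), set(ordered)
    return set(ordered[:-1]), {ordered[-1]}


def pin_current_process(cpus):
    """Pin this process to the given CPUs where the platform supports it."""
    if hasattr(os, "sched_setaffinity"):  # only available on some Unix platforms
        os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
        return True
    return False


# Hypothetical 8-core box: serving gets CPUs 0-6, admin (SSH etc) gets CPU 7.
serving, admin = split_cpus(range(8))
```

A webserver worker would call `pin_current_process(serving)` at startup, and the admin tooling would be pinned to `admin`. The call is not made at module level here because it raises `OSError` if the host does not actually have the requested CPUs.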
Decouple Your Functionality
  Service architecture
  Each service does one thing well
  Easier to measure, understand, and accommodate resource demands
  Reduce potential for interactions, cross-functional failure
Decouple from Your Environment with Configuration Management
  Decouple from your platform (OS/kernel)
    Easy to test/bench potential candidates
    Easy to migrate if you find a winner
  This is especially important when dealing with cloud
  Automate as much of deploy/bootstrap process as possible
    Probably won't help much during a provider outage due to stampede
    BUT: DirectConnect
  You might not always be in the cloud...
Decouple Your Datacenters
  Most robust redundancy mechanism
  Hot-hot keeps you on your toes
  Simplifies, not just for the cloud
    Yahoo! now forgoing datacenter features like HVAC
    "If it gets too hot in Washington, turn that DC off for a while"
    I'm sure they're not the only ones
Decouple Your Datacenters
  "AZ" - Basic building block for EC2
  This is the level they (theoretically) decouple at
  They are probably thinking along the same lines we are - must be able to turn off one AZ without impact in the other
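"Turn off one AZ without impact in the other" implies a capacity budget, which the arithmetic below tries to spell out. This is a back-of-the-envelope sketch, assuming load can be spread evenly across zones and that capacity is a single number per zone. The function names, zone names, and figures are all hypothetical.

```python
def max_safe_utilization(zones, zones_lost=1):
    """Highest steady-state utilization that still absorbs losing zones_lost zones."""
    if zones <= zones_lost:
        raise ValueError("need more zones than can be lost")
    return (zones - zones_lost) / zones


def survives_any_single_loss(load_by_zone, capacity_by_zone):
    """True if total load still fits after removing any one zone's capacity."""
    total_load = sum(load_by_zone.values())
    for lost in capacity_by_zone:
        remaining = sum(c for z, c in capacity_by_zone.items() if z != lost)
        if total_load > remaining:
            return False
    return True


# Two equal zones: each may run at no more than 50% to cover for the other.
two_zone_budget = max_safe_utilization(2)

capacity = {"zone-a": 100, "zone-b": 100}
ok = survives_any_single_loss({"zone-a": 40, "zone-b": 40}, capacity)
too_hot = survives_any_single_loss({"zone-a": 60, "zone-b": 60}, capacity)
```

With two zones the budget is half of each zone's capacity. Adding a third zone raises it to two thirds, which is part of why hot-hot across more locations gets cheaper per unit of redundancy.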