High Availability Clouds-Cloud Computing Expo



Published in Technology


  • 1. High Availability Clouds “Moving mission critical applications to the cloud.” Jeremy Hitchcock, CEO Dynamic Network Services Cloud Computing Expo 2009
  • 2. Who cares? Why Relevant? • Enterprises and service providers: “now what”? • Desire to move business- or mission-critical apps – That’s most of them • Clouds have an “unstable” feel
  • 3. Who cares? Why Relevant? • Still, benefits to virtualizing computing resources • Most don’t care about raw hardware • Becoming more software/resource integrators – Less concerned with software/hardware integration • Better use of hardware resources – Most systems are idle most of the time • Hardware is getting expensive (well, power is)
  • 4. Where are Clouds? You Are Here
  • 5. Where we are going (or would like to be) • Cloud adoption going to be like this? – Limited to spiky demand or distributed processing • Will more services move to cloud environments? • Even between clouds and traditional hosting? • No hardware? – Someone has to worry about infrastructure though
  • 6. Background on me • Internet infrastructure: DNS for other people – DynDNS.com, Dynect Platform • Do traffic management, dynamic "routing" for clouds • Work with a lot of cloud providers to get domain.com to node-19334 but not node-49291 • Background in networking, software engineering • Use all unmanaged hosting (but do have a VPS offering for consumers; it was a dev project)
  • 7. Terms • Unmanaged hosting – corporate/outsourced datacenter, your own everything • Managed hosting – Hardware is provided with ping, port, and power • Cloud hosting – Using virtual resources to accomplish the same as the above two items
  • 8. Goals with High Availability • Availability: Users do not see outages • Scaling: Not impossible or easy – Does not mean more resources available – Important when you think “on demand” • Efficient use of resources (more on that) • Institutionalized operations practices – Monitoring, security regimes
  • 9. Highly Available What? • Well, anything? • Applications • File systems • CPU, I/O, and network – I/O is both storage space and retrieval
  • 10. HA Availability
  • 11. Early Days of Hosting • Been here before: mainframes to 1U servers • Copy over redundancy in larger systems – “That’s how larger systems were so accessible” • Expensive 1Us led to commodity hardware • “We just take our application and move it over here” • And that was when things took a turn…
  • 13. Ouch! • Lots of cheap hardware, gained efficiency – Most of the time anyway • Applications were not available – Up and down all of the time • DB admins, network admins, system admins all pointing fingers
  • 14. Ouch! • Needed more 1Us to do the job • 1U equipment quality was not as good • More people, more operations issues • Security concerns, DB admins having system access • Failures and scaling became a problem until…
  • 15. Ah Ha Moment! It’s ok if a 1U fails. It happens all the time!
  • 16. Ah Ha Moment! • Make the system more redundant, fault-tolerant • Break apart units to create working spaces – N+1 redundancy, whatever your risk tolerance is • Specialized hardware to maintain efficiency • Monitor the units of work – Ping, port, power separately
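
The N+1 idea on slide 16 can be sketched as a small sizing calculation. This is only an illustration, assuming load and per-unit capacity are measured in the same unit (for example, requests per second); the function name `units_needed` and its parameters are hypothetical and do not come from the talk. The `spares` argument stands in for "whatever your risk tolerance is".

```python
import math


def units_needed(peak_load: float, unit_capacity: float, spares: int = 1) -> int:
    """Return how many units to provision for N+spares redundancy.

    N is the number of units required to carry the peak load; `spares`
    extra units are added so that many units can fail without an outage.
    """
    if peak_load < 0 or unit_capacity <= 0 or spares < 0:
        raise ValueError("load and spares must be >= 0, capacity must be > 0")
    base_units = math.ceil(peak_load / unit_capacity)  # the "N" in N+1
    return base_units + spares


if __name__ == "__main__":
    # 950 req/s of peak load on units that each handle 100 req/s.
    print(units_needed(950, 100))            # N+1
    print(units_needed(950, 100, spares=2))  # N+2, for a lower risk tolerance
```

Raising `spares` trades cost for availability, which is the same trade-off the talk returns to in its key points.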
  • 17. Ah Ha Moment! • Separate DB/app/file into clusters – That makes scaling and failover easy • Filers for DB and large scale storage • Demand SLAs for network transit • Get the NOC to work on cross-system outages
  • 18. Still Some Lingering Issues • Architectures grew to match applications – Tightly coupled, is that good? – Makes it hard to move around – Specialized hardware pieces • Do you look like Flickr? – If you do, their hosting platform will work for your app
  • 20. Still Some Lingering Issues • Systems are more complicated – Yahoo 9/11 Memorial site cascade failures • Fix was a load balancer/DNS tweak • Lots of “glue” to make sure everything works • Each architecture is [slightly] different
  • 21. Finally: Some Lingering Issues • Therefore: – Failure handling works if the application is sharded – Scaling is application-specific, with different bottlenecks – Reasonable efficiency, limited specialized hardware – More people to maintain “the system” but secure
  • 22. Now Onto Clouds… • Promise: – On-demand resources (true if you can use it) – Greater computer efficiency (all costs are internalized) – More flexibility for development and peak usage – Greater availability • Reality: – Your responsibility to throw in more hardware – Trade specialization for generalization (bottlenecks) – Limited by tools provided and consumed – Maybe
  • 23. Availability
  • 24. Availability is Defined by Outages
  • 25. Amazon/Cloud Outages? • Not clear: – “There was this one in July 2008” – “Some DNS issues yesterday” • How often? How regular? • Out of 500,000 hard drives, x will fail in 3.243 years • Out of 1 cloud provider? (or maybe 5) – We don’t know.
  • 26. Cloud Realities • “Best effort” to provide services • Ever ask for an SLA? – I’m sure it’s coming but not soon enough for some • Remember, Amazon is providing a service – Unmanaged environment • Relax, that’s the Internet, we’ll figure it out
  • 27. Cloud Realities • No physical access to systems • No guarantee that systems will be available • No guarantee that new systems will be available • No continuity guarantee – Great performance one moment, maybe not the next – Shared resources • Everything is local, security is a lot different
  • 28. But Clouds are Virtualized 1Us! • Well, they are, but not really • Used to be: – Ping, port, power – raw access – Hybrids: corporate datacenter, managed, unmanaged • Now: – Ping, port, power, file I/O – virtual access • Outsourcing network, hardware, and OS
  • 29. Why is it different? • Hardware becomes a service – Depending on the application, that may matter • More vendors in the mix – Network, hardware, OS much more packaged • Simpler presentation but complicated behind the scenes • Library issues, security issues, OS upgrades?
  • 30. Availability • Goal: Eliminate single points of failure – Clouds are consolidations of services – Solution is to split it apart • Achieve true diversity – Business continuity diversity – Geographic diversity – Network diversity – OS diversity • More layers make interactions hard to predict
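
A back-of-the-envelope sketch of why slide 30 pushes both for diversity and for fewer hidden layers. It rests on a strong assumption that the talk does not state: failures are statistically independent, which is roughly what geographic, network, and OS diversity try to approximate. The function names and the 99% figures are illustrative only.

```python
from typing import Iterable


def redundant_availability(availabilities: Iterable[float]) -> float:
    """Availability of independent, redundant copies: down only if ALL are down."""
    all_down = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        all_down *= 1.0 - a
    return 1.0 - all_down


def stacked_availability(availabilities: Iterable[float]) -> float:
    """Availability of dependent layers: up only if EVERY layer is up."""
    all_up = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        all_up *= a
    return all_up


if __name__ == "__main__":
    # Two independent providers at an assumed 99% each, used as alternatives.
    print(redundant_availability([0.99, 0.99]))
    # Two layers at an assumed 99% each, where both must work.
    print(stacked_availability([0.99, 0.99]))
```

If the copies share a failure cause (same provider, same network, same OS bug), the independence assumption breaks and the redundant figure is optimistic.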
  • 31. Eliminate Points of Failure • Cloud diversity • Cloud outages are typically binary • Interoperability needed to make it easier – That will come in several ways
  • 32. Failover Events • Failure events happen (more frequently in clouds?) • Trick is detecting and redirecting – “Once is a mistake, twice is jazz” – Miles Davis • Needs to be seamless and automatic • Good provisioning and monitoring in place – Server builds, revisioning, server configurations – Everything more modular
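
"Detecting and redirecting" from slide 32 might look roughly like the sketch below. It is not the speaker's implementation: `port_is_open` and `pick_target` are hypothetical helpers, the node names are borrowed from slide 6 purely as placeholders, and a plain TCP connect is assumed to be an adequate health signal.

```python
import socket
from typing import Callable, Iterable, Optional


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Detect: return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_target(targets: Iterable[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Redirect: return the first healthy target in priority order, else None."""
    for target in targets:
        if is_healthy(target):
            return target
    return None


if __name__ == "__main__":
    # Placeholder node names; the preferred node comes first.
    nodes = ["node-19334", "node-49291"]
    chosen = pick_target(nodes, lambda host: port_is_open(host, 80))
    print(chosen)
```

Passing the health check in as a function keeps the redirect logic testable without a network, and lets the check be swapped for an application-level transaction later.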
  • 33. Scaling • Go from 1 to 2 to 4 to 10,000 units • Split apart work units • Have to do it sooner rather than later • More sharding, less efficient • Not all units are going to be equal or constant
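
One common way to "split apart work units" is to hash a key onto a fixed number of shards. The talk does not prescribe a scheme, so treat this as an assumed example; `shard_for` and the `user-…` keys are made up for illustration.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in [0, shard_count) using a stable hash."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is stable across runs.
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for user in ("user-1", "user-2", "user-3"):
        print(user, "->", shard_for(user, 4))
```

With plain modulo hashing, changing `shard_count` moves most keys to a different shard, which is one reason splitting early tends to be cheaper than re-sharding a large system later.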
  • 34. Provisioning • Everything needs to be automatic (or at least close) • As you grow, this hurts more and more • Provisioning means lab, dev, and production • This becomes a critical system – Monitoring and backups should work with provisioning
  • 35. Hardware Considerations • Hardware-optimized software packages may change • Security patches – Default images vs. custom images • Physical access not granted to you but others – Physical access means all access – Encrypted data on disk – Fewer recovery options • Do you really have access to your data? – See backups
  • 36. Host Issues • Host system security vulnerabilities • Everything is local – VLANing becoming more available • Underlying systems need maintenance – Live migrations
  • 37. Monitoring • System-related outages because units will fail • Normal tools are based on physical limitations • Cloud environments not always clear where the failure is • Test from the last mile • Performance testing important too • System testing and transactions • May not pinpoint problems but it does send pages
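
A rough sketch of turning last-mile check results into a paging decision. The rule used here, page only after several consecutive failed checks, is an assumption added for illustration and is not stated in the talk; `should_page` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def should_page(results: Sequence[bool], threshold: int = 2) -> bool:
    """Return True if the most recent `threshold` checks all failed.

    `results` is ordered oldest to newest; True means the check passed.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    recent = list(results)[-threshold:]
    # Require a full window of failures so a single blip does not page anyone.
    return len(recent) == threshold and not any(recent)


if __name__ == "__main__":
    history = [True, True, False, False]  # two failed checks in a row
    print(should_page(history))
```

This matches the slide's caveat: the check says that something is wrong as seen from outside, not where in the cloud the fault lies.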
  • 38. Backups • Incremental backups much more important • Backup within the same cloud? – Probably not, but where? • Data files, application files, configuration files – Version everything – Document how they all go together • But you already do that so it’s ok
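
An incremental backup needs to know which files changed since the last run. The sketch below assumes a simple approach, comparing SHA-256 content hashes against a manifest saved from the previous backup; the talk names no tool or method, and `file_digest` and `changed_files` are hypothetical names. Copying the changed files to a location outside the same cloud is left out.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def changed_files(root: Path, manifest: Dict[str, str]) -> Dict[str, str]:
    """Return {relative_path: digest} for files that are new or modified.

    `manifest` maps relative paths to digests recorded by the previous backup.
    """
    changed: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digest = file_digest(path)
            if manifest.get(relative) != digest:
                changed[relative] = digest
    return changed


if __name__ == "__main__":
    # With an empty manifest every file counts as changed (a full backup).
    for name in changed_files(Path("."), {}):
        print(name)
```

Keeping the manifest under version control alongside configuration files fits the slide's "version everything" advice. This sketch does not detect deleted files.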
  • 39. Migrations • Be able to take your data (server image) – Server import and export • Live migration, underlying software provides it • This is all interoperability needs
  • 40. Disaster Planning • When things go really wrong: – Need to communicate using other means • Social networking like Twitter (are they affected as well?) – Have a plan B, diversity of cloud providers – Seek SLAs?
  • 41. Some Things External • DNS – Point domain.com to your plan B • Backups and files – When you want to publish content at plan B • Customer communications – Tell customers and users what’s going on • Last-mile monitoring – Everything might look ok in the cloud • Want options if there is an outage
  • 42. Key Points • Clouds are great for applications, even mission-critical ones • Best practices for server farms aren’t always best practices for clouds • Need to rely on software to make hardware assumptions work right • Constant trade-off of cost and availability, what’s the risk tolerance
  • 43. Questions Jeremy Hitchcock jeremy@dyn-inc.com http://dyn-inc.com/