High Availability Clouds – Cloud Computing Expo

1. High Availability Clouds
“Moving mission critical applications to the cloud.”
Jeremy Hitchcock, CEO, Dynamic Network Services
Cloud Computing Expo 2009
2. Who cares? Why Relevant?
• Enterprises and service providers: “now what”?
• Desire to move business or mission critical apps
   – That’s most of them
• Clouds have an “unstable” feel
3. Who cares? Why Relevant?
• Still, benefits to virtualizing computing resources
• Most don’t care about raw hardware
• Becoming more software/resource integrators
   – Less concerned with software/hardware integration
• Better use of hardware resources
   – Most systems are pretty idle all the time
• Hardware is getting expensive (well, power is)
4. Where are Clouds?
[Figure: “You Are Here”]
5. Where we are going (or like to be)
• Cloud adoption going to be like this?
   – Limited to spiky demand or distributed processing
• Will more services move to cloud environments?
• Even between clouds and traditional hosting?
• No hardware?
   – Someone has to worry about infrastructure though
6. Background on me
• Internet infrastructure: DNS for other people
   – DynDNS.com, Dynect Platform
• Do traffic management, dynamic “routing” for clouds
• Work with a lot of cloud providers to get domain.com to node-19334 but not node-49291
• Background in networking, software engineering
• Use all unmanaged hosting (but do have a consumer VPS offering; it was a dev project)
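Slide 6 describes steering domain.com toward a healthy node (node-19334) and away from a failed one (node-49291). A minimal Python sketch of that idea, assuming a hypothetical in-memory table of capacity weights and health-check results; it is not Dyn's implementation, and a real DNS service would also have to account for record TTLs and resolver caching.

```python
import random

# Hypothetical pool for domain.com: node name -> (weight, passed last health check?).
# The node names echo the slide; the weights and health states are made up.
NODES: dict[str, tuple[int, bool]] = {
    "node-19334": (3, True),
    "node-49291": (3, False),  # failed its health check, so it is never handed out
    "node-50112": (1, True),
}


def healthy_nodes(nodes: dict[str, tuple[int, bool]]) -> list[str]:
    """Names of nodes whose last health check passed, in a stable order."""
    return sorted(name for name, (_, ok) in nodes.items() if ok)


def pick_answer(nodes: dict[str, tuple[int, bool]], rng: random.Random) -> str:
    """Pick one healthy node for a DNS answer, weighted by its capacity weight."""
    candidates = healthy_nodes(nodes)
    if not candidates:
        raise RuntimeError("no healthy nodes left; time for plan B")
    weights = [nodes[name][0] for name in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded only so the demo is repeatable
answers = [pick_answer(NODES, rng) for _ in range(1000)]
print({name: answers.count(name) for name in healthy_nodes(NODES)})
```

Filtering first and weighting second keeps the two concerns separate: health decides whether a node is eligible at all, while weight only shapes how traffic is spread across the eligible ones.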
7. Terms
• Unmanaged hosting – corporate/outsourced datacenter, your own everything
• Managed hosting – hardware is provided with ping, port, and power
• Cloud hosting – using virtual resources to accomplish the same as the above two
8. Goals with High Availability
• Availability: Users do not see outages
• Scaling: Not impossible or easy
   – Does not mean more resources available
   – Important when you think “on demand”
• Efficient use of resources (more on that)
• Institutionalized operations practices
   – Monitoring, security regimes
9. Highly Available What?
• Well, anything?
• Applications
• File systems
• CPU, I/O, and network
   – I/O is both storage space and retrieval
10. HA Availability
11. Early Days of Hosting
• Been here before: mainframes to 1U servers
• Copy over redundancy in larger systems
   – “That’s how larger systems were so accessible”
• Expensive 1Us led to commodity hardware
• “We just take our application and move it over here”
• And that was when things took a turn…
12. [Image-only slide]
13. Ouch!
• Lots of cheap hardware, gained efficiency
   – Most of the time anyway
• Applications were not available
   – Up and down all the time
• DB admins, network admins, system admins all pointing fingers
14. Ouch!
• Needed more 1Us to do the job
• 1U equipment quality was not as good
• More people, more operations issues
• Security concerns, DB admins having system access
• Failures and scaling became a problem until…
15. Ah Ha Moment!
It’s ok if a 1U fails. It happens all the time!
16. Ah Ha Moment!
• Make the system more redundant, fault-tolerant
• Break apart units to create working spaces
   – N+1 redundancy, whatever your risk tolerance is
• Specialized hardware to maintain efficiency
• Monitor the units of work
   – Ping, port, power separately
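N+1 sizing on slide 16 is simple arithmetic: N is the smallest number of units that covers peak load, and the extra units are the failures you choose to tolerate. A small sketch; the load and per-unit capacity figures are hypothetical.

```python
import math


def units_needed(peak_load: float, capacity_per_unit: float, spares: int = 1) -> int:
    """Units required to carry peak_load with `spares` extra units (N+spares).

    N is the smallest number of units whose combined capacity covers the peak;
    the spares let the system lose that many units at peak and still keep up.
    """
    if peak_load < 0 or capacity_per_unit <= 0 or spares < 0:
        raise ValueError("invalid sizing inputs")
    n = math.ceil(peak_load / capacity_per_unit)
    return n + spares


# Hypothetical numbers: 4,500 requests/s at peak, 1,000 requests/s per 1U.
print(units_needed(4500, 1000))            # 6  (N+1)
print(units_needed(4500, 1000, spares=2))  # 7  (N+2, for a lower risk tolerance)
```

The `spares` parameter is where “whatever your risk tolerance is” shows up: it is a business decision, not something the formula can pick for you.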
17. Ah Ha Moment!
• Separate DB/app/file into clusters
   – That makes scaling and failover easy
• Filers for DB and large scale storage
• Demand SLAs for network transit
• Get the NOC to work on cross system outages
18. Still Some Lingering Issues
• Architectures grew to match applications
   – Tightly coupled, is that good?
   – Makes it hard to move around
   – Specialized hardware pieces
• Do you look like Flickr?
   – If you do, their hosting platform will work for your app
19. [Image-only slide]
20. Still Some Lingering Issues
• Systems are more complicated
   – Yahoo 9/11 Memorial site cascade failures
      • Fix was a load balancer/DNS tweak
• Lots of “glue” to make sure everything works
• Each architecture is [slightly] different
21. Finally: Some Lingering Issues
• Therefore:
   – Failures are handled if an application is in shards
   – Scaling is application specific, different bottlenecks
   – Reasonable efficiency, limited specialized hardware
   – More people to maintain “the system” but secure
22. Now On to Clouds…
• Promise:
   – On demand resources (true if you can use it)
   – Greater computer efficiency (all costs are internalized)
   – More flexibility for development and peak usage
   – Greater availability
• Reality:
   – Your responsibility to throw in more hardware
   – Trade specialization for generalization (bottlenecks)
   – Limited by tools provided and consumed
   – Maybe
23. Availability
24. Availability is Defined by Outages
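If availability is defined by outages, an availability target translates directly into an outage budget. A rough worked example, assuming a 365-day year and counting only full outages (degraded performance is ignored):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_budget_minutes(availability: float) -> float:
    """Maximum minutes of outage per year allowed by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} -> {downtime_budget_minutes(target):.1f} min/year")
```

Each extra “nine” cuts the yearly budget by a factor of ten, which is why a single unexplained provider outage can consume a whole year’s allowance.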
25. Amazon/Cloud Outages?
• Not clear:
   – “There was this one in July 2008”
   – “Some DNS issues yesterday”
• How often? How regular?
• Out of 500,000 hard drives, x will fail in 3.243 years
• Out of 1 cloud provider? (or maybe 5)
   – We don’t know.
26. Cloud Realities
• “Best effort” to provide services
• Ever ask for an SLA?
   – I’m sure it’s coming but not soon enough for some
• Remember, Amazon is providing a service
   – Unmanaged environment
• Relax, that’s the Internet, we’ll figure it out
27. Cloud Realities
• No physical access to systems
• No guarantee for systems to be available
• No guarantee that new systems will be available
• No continuity guarantee
   – Great performance one moment, maybe not the next
   – Shared resources
• Everything is local, security is a lot different
28. But Clouds are Virtualized 1Us!
• Well, they are, but not really
• Used to be:
   – Ping, port, power – raw access
   – Hybrids: corporate datacenter, managed, unmanaged
• Now:
   – Ping, port, power, file I/O – virtual access
• Outsourcing network, hardware, and OS
29. Why is it different?
• Hardware becomes a service
   – Depending on the application, that may matter
• More vendors in the mix
   – Network, hardware, OS much more packaged
• Simpler presentation but complicated behind the scenes
• Library issues, security issues, OS upgrades?
30. Availability
• Goal: Eliminate single points of failure
   – Clouds are consolidations of services
   – Solution is to split it apart
• Achieve true diversity
   – Business continuity diversity
   – Geographic diversity
   – Network diversity
   – OS diversity
• More layers make interactions hard to predict
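Why splitting things apart helps can be shown with the standard series/parallel availability formulas: components that must all be up multiply their availabilities, while redundant copies fail only if every copy fails. A sketch with hypothetical percentages; the parallel formula assumes failures are independent, which is exactly what “true diversity” (different provider, geography, network, OS) is trying to buy.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Availability when ALL components must be up (each is a single point of failure)."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Availability when ANY ONE redundant component being up is enough.

    Assumes independent failures; shared dependencies make this optimistic.
    """
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical figures: a cloud provider at 99.5%, and the stack inside it
# (load balancer -> app -> DB) at 99.9% each.
one_cloud = series(0.995, 0.999, 0.999, 0.999)
two_clouds = parallel(one_cloud, one_cloud)
print(f"single provider:       {one_cloud:.4%}")
print(f"two diverse providers: {two_clouds:.4%}")
```

Note that the series chain is worse than its weakest link, which is the arithmetic behind “more layers make interactions hard to predict”: every added layer multiplies in another factor below 1.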
31. Eliminate Points of Failure
• Cloud diversity
• Cloud outages are typically binary
• Interoperability needed to make it easier
   – That will come in several ways
32. Failover Events
• Failure events happen (more frequently in clouds?)
• Trick is detecting and redirecting
   – “Once is a mistake, twice is jazz” – Miles Davis
• Needs to be seamless and automatic
• Good provisioning and monitoring in place
   – Server builds, revisioning, server configurations
   – Everything more modular
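The Miles Davis line maps neatly onto a common detection pattern: treat one failed health check as a mistake and only redirect after repeated failures, then require several good checks before moving traffic back so the system does not flap. A minimal sketch; the class name and the thresholds (2 failures down, 3 successes up) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class FailoverDetector:
    """Declare an endpoint down only after `fail_threshold` consecutive failed
    checks, and back up only after `recover_threshold` consecutive passes.
    A single blip does not trigger a redirect."""
    fail_threshold: int = 2
    recover_threshold: int = 3
    healthy: bool = True
    _streak: int = 0  # consecutive results that disagree with the current state

    def observe(self, check_passed: bool) -> bool:
        """Record one health-check result; return True if the state just changed."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with current state; reset the streak
            return False
        self._streak += 1
        needed = self.recover_threshold if check_passed else self.fail_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
            return True
        return False


detector = FailoverDetector()
for result in [True, False, True, False, False, True, True, True]:
    if detector.observe(result):
        target = "primary" if detector.healthy else "backup"
        print(f"state change -> send traffic to {target}")
```

In this run the lone failure at the second check is ignored, the two back-to-back failures switch traffic to the backup, and three straight passes switch it back. Making recovery slower than failure is a deliberate asymmetry: redirecting away from a sick node is cheap, flapping back and forth is not.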
33. Scaling
• Go from 1 to 2 to 4 to 10,000 units
• Split apart work units
• Have to do it sooner than later
• More sharding, less efficient
• Not all units are going to be equal nor constant
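Splitting work units only scales if adding a unit does not reshuffle everything. Consistent hashing is one widely used way to assign keys to shards so that a new shard takes over only its own slice; it is swapped in here as an illustration, not something the talk prescribes. The shard names and the 100 virtual nodes per shard are assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping keys (e.g. user IDs) to shard names.

    Adding a shard moves only the keys that fall on its new ring positions,
    instead of reshuffling nearly everything as `hash(key) % n_shards` would.
    """

    def __init__(self, shards: list[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []
        for shard in shards:
            self.add(shard)

    def add(self, shard: str) -> None:
        # Several virtual points per shard smooth out the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{shard}#{i}"), shard))

    def shard_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring point at or after the key's hash, wrapping around the end.
        idx = bisect.bisect(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["db-1", "db-2", "db-3", "db-4"])
keys = [f"user-{n}" for n in range(10_000)]
before = {k: ring.shard_for(k) for k in keys}
ring.add("db-5")
moved = sum(before[k] != ring.shard_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved to the new shard")
```

Every key that moves lands on the new shard, and roughly its fair share moves; the rest stay put. That is what makes going from 4 to 5 (or 10,000) units an incremental operation rather than a full data migration.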
34. Provisioning
• Everything needs to be automatic (or at least close)
• As you grow, this hurts more and more
• Provisioning means lab, dev, and production
• This becomes a critical system
   – Monitoring and backups should work with provisioning
35. Hardware Considerations
• Hardware optimized software packages may change
• Security patches
   – Default images v. custom images
• Physical access is not granted to you, but it is to others
   – Physical access means all access
   – Encrypted data on disk
   – Fewer recovery options
• Do you really have access to your data?
   – See backups
36. Host Issues
• Host system security vulnerabilities
• Everything is local
   – VLANing becoming more available
• Underlying systems need maintenance
   – Live migrations
37. Monitoring
• System related outages because units will fail
• Normal tools are based on physical limitations
• Cloud environments not always clear where the failure is
• Test from the last mile
• Performance testing important too
• System testing and transactions
• May not pinpoint problems but it does send pages
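“Test from the last mile” means judging health from where users sit rather than from inside the cloud. A hypothetical sketch of the decision side only: the probe results are assumed to come from end-to-end transactions run at several outside vantage points (the vantage names and the 2-second threshold are made up), and the function just turns them into a status to page on. As the slide says, this may not pinpoint the problem, but it tells someone to look.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    vantage: str       # where the probe ran, e.g. a consumer ISP (hypothetical names)
    ok: bool           # did the end-to-end transaction succeed?
    latency_ms: float  # total time the transaction took


def assess(results: list[ProbeResult], slow_ms: float = 2000.0) -> str:
    """Summarize last-mile probe results into a status suitable for paging."""
    if not results:
        return "PAGE: no probe results (monitoring itself may be down)"
    failed = sorted(r.vantage for r in results if not r.ok)
    slow = sorted(r.vantage for r in results if r.ok and r.latency_ms > slow_ms)
    if len(failed) == len(results):
        return "PAGE: down from every vantage point"
    if failed:
        return "PAGE: failing from " + ", ".join(failed)
    if slow:
        return "WARN: slow from " + ", ".join(slow)
    return "OK"


probes = [
    ProbeResult("isp-east", True, 310.0),
    ProbeResult("isp-west", False, 0.0),
    ProbeResult("mobile-eu", True, 2450.0),
]
print(assess(probes))  # PAGE: failing from isp-west
```

A partial failure (one vantage point) usually points at a network path or regional issue, while failure from everywhere points at the service itself, which is the kind of distinction in-cloud checks cannot make.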
38. Backups
• Incremental backups much more important
• Backup within the same cloud?
   – Probably not, but where?
• Data files, application files, configuration files
   – Version everything
   – Document how they all go together
• But you already do that so it’s ok
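An incremental backup only ships what changed since the last run. One simple way to decide that, sketched here under assumptions (content hashes per file, a snapshot kept as a plain dict), is to compare SHA-256 digests against the previous snapshot. Where the copies are sent, ideally outside the same cloud, is deliberately left out.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its content digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def incremental_plan(previous: dict[str, str],
                     current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (files to copy, files deleted) since the previous snapshot."""
    changed = sorted(k for k, d in current.items() if previous.get(k) != d)
    deleted = sorted(k for k in previous if k not in current)
    return changed, deleted


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.conf").write_text("port=80\n")
    (root / "data.db").write_text("v1")
    base = snapshot(root)
    (root / "data.db").write_text("v2")     # modified
    (root / "new.log").write_text("hello")  # added
    (root / "app.conf").unlink()            # removed
    plan = incremental_plan(base, snapshot(root))
    print(plan)  # (['data.db', 'new.log'], ['app.conf'])
```

Keeping each snapshot alongside the backup also gives you the “version everything” record: the digests say exactly which file contents made up a given restore point.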
39. Migrations
• Be able to take your data (server image)
   – Server import and export
• Live migration, underlying software provides it
• These are all interoperability needs
40. Disaster Planning
• When things go really wrong:
   – Need to communicate using other means
      • Social networking like Twitter (are they affected as well?)
   – Have a plan B, diversity of cloud providers
   – Seek SLAs?
41. Some Things External
• DNS
   – Point domain.com to your plan B
• Backups and files
   – When you want to publish content at plan B
• Customer communications
   – Tell customers and users what’s going on
• Last-mile monitoring
   – Everything might look ok in the cloud
• Want options if there is an outage
42. Key Points
• Clouds are great for applications, even mission critical ones
• Best practices for server farms aren’t always best practices for clouds
• Need to rely on software to make hardware assumptions work right
• Constant trade-off of cost and availability: what’s your risk tolerance?
43. Questions
Jeremy Hitchcock
jeremy@dyn-inc.com
http://dyn-inc.com/