Availability, The Cloud and Everything (version 2, Surge2010)

  1. Availability, the Cloud and Everything Joe Williams Saturday, October 2, 2010
  2. Me • Joe Williams • Infrastructure Engineer • Cloudant • @williamsjoe • joeandmotorboat.com
  3. • Distributed database built on CouchDB • Real-time Search and Analytics • Sign Up! (Free to 256MB) • cloudant.com • http://github.com/cloudant/bigcouch
  4. Bias • Distributed Databases (CouchDB) • Amazon EC2 • Chef • Erlang
  5. Availability
  6. Availability • What is Availability?
  7. Availability
  8. Availability “System availability refers to the accessibility of system services to users. A system is available if it is operational for an overwhelming fraction of the time. Unlike reliability, availability is instantaneous.”
  9. Availability “System reliability refers to the property of tolerating constituent component failures, for the longest time. A system is perfectly reliable if it never fails.”
  10. Availability • Reliability * Availability = Dependability
  11. Availability • Availability & Reliability • Mean time to failure • Mean time to repair • Durability • Fault isolation • Fault tolerance
  12. Availability • Uptime / Downtime • Perceived • Actual
  13. Availability • Probabilistic Risk Assessment • Event Tree Analysis • Fault Tree Analysis Apthorpe (http://www.usenix.org/events/lisa01/tech/apthorpe/apthorpe.ps)
  14. The Cloud
  15. The Cloud “It never gets easier, you just go faster.” - Greg LeMond
  16. The Cloud • Abstraction • Commoditization • Homogeneous • Ephemeral
  17. The Cloud • Costs • Loss of Control • Single Points of Failure • Network Partitions / Data Locality • Unreliable • Performance
  18. The Cloud • Benefits • API to everything • Fast and Flexible Resource Mgmt • “Unlimited” Resources
  19. The Cloud • Bootstrapping • Time and Effort Adam Jacob and Ezra Zygmuntowicz (http://blip.tv/file/2285124/)
  20. The Cloud • Nodes are stateless and disposable.
  21. The Cloud "Clouds are systems ... and with systems, you have to think hard and know how to deal with issues in that environment. The scale is so much bigger, and you don't have the physical control. But we think people should be optimistic about what we can do here. If we are clever about deploying cloud computing with a clear-eyed notion of what the risk models are, maybe we can actually save the economy through technology." - "Security in the Ether" by David Talbot, MIT Technology Review, Jan/Feb 2010
  22. What’s Next • Distributed Systems • Automation • Data Driven Operations
  23. Distributed Systems Baran (http://www.rand.org/pubs/research_memoranda/RM3420/)
  24. Distributed Systems • RAID ain’t as redundant as it used to be. Leventhal (http://queue.acm.org/detail.cfm?id=1670144)
  25. Distributed Systems • Redundancy • Duplication • Distribution
  26. Distributed Systems • Alphabet Soup • ACID, CAP, BASE, 2PC, MVCC • Vector Clocks, Eventual Consistency • Dynamo, Paxos, Chandra, Byzantine
  27. Distributed Systems • CAP == Availability
  28. Distributed Systems • Erlang • Distributed • Concurrent • Fault Tolerant
  29. Distributed Systems • Erlang • Supervision Trees
  30. Distributed Systems • Erlang • Hot Code Upgrades • Distributed Upgrades are HARD
  31. Distributed Systems • Future Work • Erlang Supervision Trees • PRA / FTA / ETA Apthorpe (http://www.usenix.org/events/lisa01/tech/apthorpe/apthorpe.ps)
  32. Automation
  33. Automation • Optimal use of the cloud.
  34. Automation • Frequent deployment.
  35. Automation • Tools • Chef • Puppet • Cfengine • Bcfg2
  36. Automation • Erlang + Chef (as of v0.8) • erl_call Provider
  37. Data Driven Operations
  38. Data Driven Operations “What gets measured, gets managed.” - Peter Drucker
  39. Data Driven Operations • Instrumentation
  40. Data Driven Operations • Logging
  41. Data Driven Operations • Visualization
  42. Data Driven Operations • Demo!
  43. Data Driven Operations • Modeling • Analysis • Universal Law of Computational Scalability • Amdahl’s Law
  44. Data Driven Operations • Modeling isn’t just for capacity planning. Montagne (http://queue.acm.org/detail.cfm?id=1862187)
  45. The End
  46. Questions? Joe Williams - @williamsjoe
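
The quantitative ideas named on slides 11, 25 and 43 can be sketched in a few lines of Python. This is an illustration only, not material from the talk: the function names and sample numbers are assumptions, and the formulas are the usual textbook forms (steady-state availability as MTTF / (MTTF + MTTR), series and parallel composition under an independent-failure assumption, Amdahl's Law, and the Universal Scalability Law with contention and coherency coefficients).

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability: the fraction of time a component is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def series(*avail):
    """Every component must be up, so each one is a single point of failure."""
    result = 1.0
    for a in avail:
        result *= a
    return result


def parallel(*avail):
    """At least one redundant replica must be up.

    Assumes replicas fail independently, which is optimistic when they
    share a rack, a zone, or a provider.
    """
    all_down = 1.0
    for a in avail:
        all_down *= 1.0 - a
    return 1.0 - all_down


def amdahl_speedup(n, serial_fraction):
    """Amdahl's Law: speedup on n workers given a serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def usl_capacity(n, sigma, kappa):
    """Universal Scalability Law: relative capacity at n nodes.

    sigma models contention, kappa models coherency (crosstalk) cost.
    With kappa == 0 this reduces to Amdahl's Law.
    """
    return n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))


if __name__ == "__main__":
    # Hypothetical node: fails about every 999 hours, takes 1 hour to repair.
    node = availability(999, 1)
    print("single node:", node)
    print("three redundant nodes:", parallel(node, node, node))
    print("load balancer in front of them:", series(0.999, parallel(node, node, node)))
    print("Amdahl, 10 workers, 10% serial:", amdahl_speedup(10, 0.1))
    print("USL, 10 nodes:", usl_capacity(10, 0.1, 0.01))
```

The series case shows why a single non-redundant component in front of redundant nodes caps the availability of the whole stack, and the nonzero `kappa` term shows how coherency cost makes capacity peak and then decline as nodes are added.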
