Availability, The Cloud and Everything (version 2, Surge2010)

Presentation Transcript

  • Availability, the Cloud and Everything • Joe Williams • Saturday, October 2, 2010
  • Me • Joe Williams • Infrastructure Engineer • Cloudant • @williamsjoe • joeandmotorboat.com
  • Distributed database built on CouchDB • Real-time Search and Analytics • Sign Up! (Free to 256MB) • cloudant.com • http://github.com/cloudant/bigcouch
  • Bias • Distributed Databases (CouchDB) • Amazon EC2 • Chef • Erlang
  • Availability
  • Availability • What is Availability?
  • Availability
  • Availability “System availability refers to the accessibility of system services to users. A system is available if it is operational for an overwhelming fraction of the time. Unlike reliability, availability is instantaneous.”
  • Availability “System reliability refers to the property of tolerating constituent component failures, for the longest time. A system is perfectly reliable if it never fails.”
  • Availability • Reliability * Availability = Dependability
  • Availability • Availability & Reliability • Mean time to failure • Mean time to repair • Durability • Fault isolation • Fault tolerance
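
The mean-time terms on the slide above are commonly combined into a steady-state availability figure. The slides do not state a formula, so the sketch below assumes the standard textbook relation, availability = MTTF / (MTTF + MTTR). The function names and the sample numbers are hypothetical, chosen only for illustration.

```python
HOURS_PER_YEAR = 8760.0  # 365 days; leap years ignored for simplicity


def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a system is up, assuming availability = MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0:
        raise ValueError("mean times must be non-negative")
    total = mttf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTTF and MTTR cannot both be zero")
    return mttf_hours / total


def expected_downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year for a given availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical node: fails on average every 999 hours, takes 1 hour to repair.
    a = steady_state_availability(mttf_hours=999.0, mttr_hours=1.0)
    print(f"availability: {a:.4f}")
    print(f"downtime per year: {expected_downtime_hours_per_year(a):.2f} hours")
```

Under this relation, shortening repair time raises availability as surely as lengthening time between failures does.
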
  • Availability • Uptime / Downtime • Perceived • Actual
  • Availability • Probabilistic Risk Assessment • Event Tree Analysis • Fault Tree Analysis Apthorpe (http://www.usenix.org/events/lisa01/tech/apthorpe/apthorpe.ps)
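
As a rough illustration of fault tree analysis, the sketch below combines component failure probabilities through AND and OR gates. It assumes the basic events are statistically independent, which real systems often violate. The topology (one load balancer in front of two database replicas) and every probability are hypothetical and do not come from the talk or the cited paper.

```python
from math import prod
from typing import Iterable


def and_gate(probabilities: Iterable[float]) -> float:
    """Output event occurs only if ALL independent input events occur."""
    return prod(probabilities)


def or_gate(probabilities: Iterable[float]) -> float:
    """Output event occurs if ANY independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


if __name__ == "__main__":
    # Hypothetical per-period failure probabilities.
    load_balancer_fails = 0.001
    replica_fails = 0.01

    # The database tier is lost only if both replicas fail together.
    db_tier_fails = and_gate([replica_fails, replica_fails])

    # The service is down if the load balancer OR the database tier fails.
    service_fails = or_gate([load_balancer_fails, db_tier_fails])

    print(f"database tier failure probability: {db_tier_fails:.6f}")
    print(f"service failure probability: {service_fails:.6f}")
```

In this made-up tree the single load balancer dominates the result, which is how a fault tree surfaces a single point of failure.
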
  • The Cloud
  • The Cloud “It never gets easier, you just go faster.” - Greg LeMond
  • The Cloud • Abstraction • Commoditization • Homogeneous • Ephemeral
  • The Cloud • Costs • Loss of Control • Single Points of Failure • Network Partitions / Data Locality • Unreliable • Performance
  • The Cloud • Benefits • API to everything • Fast and Flexible Resource Mgmt • “Unlimited” Resources
  • The Cloud • Bootstrapping • Time and Effort Adam Jacob and Ezra Zygmuntowicz (http://blip.tv/file/2285124/)
  • The Cloud • Nodes are stateless and disposable.
  • The Cloud "Clouds are systems ... and with systems, you have to think hard and know how to deal with issues in that environment. The scale is so much bigger, and you don't have the physical control. But we think people should be optimistic about what we can do here. If we are clever about deploying cloud computing with a clear-eyed notion of what the risk models are, maybe we can actually save the economy through technology." - Security in the Ether By David Talbot - MIT Technology Review Jan/Feb 2010
  • What’s Next • Distributed Systems • Automation • Data Driven Operations
  • Distributed Systems Baran (http://www.rand.org/pubs/research_memoranda/RM3420/)
  • Distributed Systems • RAID ain’t as redundant as it used to be. Leventhal (http://queue.acm.org/detail.cfm?id=1670144)
  • Distributed Systems • Redundancy • Duplication • Distribution
  • Distributed Systems • Alphabet Soup • ACID, CAP, BASE, 2PC, MVCC • Vector Clocks, Eventual Consistency • Dynamo, Paxos, Chandra, Byzantine
  • Distributed Systems • CAP == Availability
  • Distributed Systems • Erlang • Distributed • Concurrent • Fault Tolerant
  • Distributed Systems • Erlang • Supervision Trees
  • Distributed Systems • Erlang • Hot Code Upgrades • Distributed Upgrades are HARD
  • Distributed Systems • Future Work • Erlang Supervision Trees • PRA / FTA / ETA Apthorpe (http://www.usenix.org/events/lisa01/tech/apthorpe/apthorpe.ps)
  • Automation
  • Automation • Optimal use of the cloud.
  • Automation • Frequent deployment.
  • Automation • Tools • Chef • Puppet • Cfengine • Bcfg2
  • Automation • Erlang + Chef (as of v0.8) • erl_call Provider
  • Data Driven Operations
  • Data Driven Operations “What gets measured, gets managed.” -Peter Drucker
  • Data Driven Operations • Instrumentation
  • Data Driven Operations • Logging
  • Data Driven Operations • Visualization
  • Data Driven Operations • Demo!
  • Data Driven Operations • Modeling • Analysis • Universal Law of Computational Scalability • Amdahl’s Law
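
The two scalability models named on the slide above can be written as short functions. The slides give no formulas, so this sketch assumes the usual published forms: Amdahl's Law as N / (1 + σ(N − 1)) with serial fraction σ, and the Universal Scalability Law as N / (1 + σ(N − 1) + κN(N − 1)), where κ is a coherency (crosstalk) penalty. The parameter values in the example are hypothetical; in practice they would be fitted to measured throughput.

```python
def amdahl_speedup(n: int, serial_fraction: float) -> float:
    """Speedup on n workers when `serial_fraction` of the work cannot be parallelized."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n / (1.0 + serial_fraction * (n - 1))


def usl_capacity(n: int, sigma: float, kappa: float) -> float:
    """Relative capacity at n workers under the Universal Scalability Law.

    sigma models contention (serialization); kappa models coherency delay.
    With kappa == 0 this reduces to Amdahl's Law.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))


if __name__ == "__main__":
    sigma, kappa = 0.05, 0.001  # hypothetical fitted parameters
    for n in (1, 8, 16, 32, 64):
        print(
            f"n={n:3d}  amdahl={amdahl_speedup(n, sigma):6.2f}  "
            f"usl={usl_capacity(n, sigma, kappa):6.2f}"
        )
```

With a non-zero κ the modeled capacity eventually falls as nodes are added, a retrograde region that Amdahl's Law alone cannot express.
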
  • Data Driven Operations • Modeling isn’t just for capacity planning. Montagne (http://queue.acm.org/detail.cfm?id=1862187)
  • The End
  • Questions? Joe Williams - @williamsjoe