
Designing for the Cloud Tutorial - QCon SF 2009

Slides from the cloud design tutorial I was supposed to give at QCon SF 2009.

Published in: Technology

  1. 1. Designing for the Cloud: A Tutorial<br />Stuart Charlton, CTO, Elastra<br />
  2. 2. Tutorial Objectives<br />What has cloud computing done to IT systems design & architecture?<br />“The future is already here, it’s just not very evenly distributed” (Gibson)<br />How should new systems be designed with the new constraints?<br />Such as: parallelism, availability, on-demand infrastructure<br />Where can I find practical frameworks, tools, and techniques, and what are the tradeoffs?<br />Hadoop, Cassandra, Parallel DBs, Actors, Caches, Containers, and Configuration Management<br />
  3. 3. About Your Presenter<br />Stuart Charlton<br /><ul><li>Canadian, now in San Francisco</li></ul>CTO, Elastra<br /><ul><li>Focus on Customers, Products, Technology Directions</li></ul>In prior lives... <br /><ul><li>BEA Systems, Rogers Communications, Financial Services, global training & consulting</li></ul>RESTafarian and Data geek<br />Blog: Stu Says Stuff, http://stucharlton.com/blog<br />
  4. 4. Tutorial Agenda, in 4 Words<br />Clouds<br />Service<br />Data<br />Control<br />
  5. 5. Agenda – Part 1<br />Clouds: Fear of a Fluffy Planet<br />What has changed, and what remains the same?<br />Designing applications in this world<br />A Cloud Design Reference Architecture<br />(aka. A cheat sheet to categorize thinking in the clouds)<br />Service: Foundations for Systems<br />Solving Big Problems vs. Little Problems<br />Amdahl’s Law & The Universal Scalability Law <br />Actor-Based Concurrency: Dr. Strangelanguage, (or How I Learned to Stop Worrying and Love Erlang)<br />
  6. 6. Agenda – Part 2<br />Data: Management & Access<br />Contrasting Philosophies<br />Persistence vs. Management; Scale-Up vs. Scale-Out<br />Shared Disk vs. Shared Nothing<br />A survey of solutions (from clustered DBMS to K/V stores)<br />Consistency, Availability, Partitioning (CAP) Tradeoffs<br />Deep dig into what these really imply<br />Control: Containers, Configuration & Modeling<br />The Dev/Ops Tennis Match<br />The Evolution of Automation<br />From Scripts to Runbooks to FSMs to HTNs<br />
  7. 7. Caveats<br />Audience Assumption: IT Devs & Architects<br />Some exposure to cloud, but not necessarily advanced<br />The technology is a fast moving target<br />Especially state of the specific tools & frameworks<br />Theory vs. practice<br />I try to balance the two; both are essential<br />Time is limited<br />Only scratching the surface of certain topics<br />Missing topics are usually full tutorials in their own right<br />Much of the subject matter is up for debate<br />And, this is a tutorial, not a workshop…. <br />
  8. 8. Clouds<br />Fear of a Fluffy Planet<br />
  9. 9. (Courtesy of browsertoolkit.com)<br />
  10. 10. The Freedom!<br />On Demand Infrastructure via API calls<br />Inside or outside my data centres (Private / Public Cloud)<br />Pay-per-use pricing models<br />Great for temporary growth needs<br />Platform-as-a-Service<br />Scalability without Skill, Availability without Avarice<br />Large Scale, Always On<br />New opportunities due to cheaper scale & availability<br />
  11. 11. The Horror!<br />Hype Overdrive<br />Cloud Running Shoes! Cloud Chewing Gum! GOOG! Werner Vogels Action Figures! (well, not quite yet)<br />Standards Support<br />So many to choose from!<br />OCCI, vCloud + OVF, EC2, WBEM, WS-Management<br />Platform-as-a-Service<br />What color would you like for your locked trunk’s interior?<br />Crazy Talk<br />No SQL! Eventual Consistency! Infrastructure as Code!<br />
  12. 12. Will the Real Slim Cloudy Please Stand Up?<br />“I, for one, welcome our new outsourced overlords”<br />Finer-grained outsourcing<br />Metered resource usage<br />APIs & self-service UIs<br />… but isn’t outsourcing often a shell game?<br />See Distributed Computing Economics, Jim Gray (2003)<br />“Scale without skill, availability without avarice”<br />Insert constrained code [here]<br />Magically scalable & available<br />GAE, Azure (some day)<br />… but aren’t you locked in?<br />
  13. 13. Will the Real Slim Cloudy Please Stand Up?<br />“I like Big *aaS and I cannot lie”<br />“My name is… what? Slim Cloudy!”<br />Private, Public, or Community Clouds<br />Multiple stack levels<br />“Real” SOA, not just web services<br />… haven’t I heard this before?<br />Reduced lead times to change<br />Agile Operations / Lean IT<br />Revolution in systems management<br />… can we really change IT?<br />
  14. 14. Designing Applications in this World<br />Distributed & networked systems have triumphed<br />The fallacies must be taken seriously now<br />Network is unreliable, latency &gt; 0, bandwidth is finite, topology might change, etc.<br />Scale-out & fault tolerance: the new design center<br />Versus productive business logic, data management, etc.<br />What’s old is new<br />Some challengers to mainstream ideas are old ideas being reapplied<br />e.g. Erlang, Map/Reduce, distributed file systems, replication<br />
  15. 15. Designing Applications in this World<br />Autonomous services constitute most systems<br />Full-stack services, not just bits of code<br />Design for constant operations<br />Interdependence + Distribution + Autonomy = Pain<br />FCAPS (Fault, Configuration, Accounting, Performance & Security Management) <br />Security & Privacy<br />Multi-tenancy, data-in-transit vs. data-at-rest, etc.<br />
  16. 16. Solving for one’s own problems<br />Mainstream tools, platforms, and servers have not consistently caught up<br />LOTS of software experimentation in:<br />Web servers, containers, caches, databases, network configuration, systems management<br />The danger is to view new solutions as the better way of doing things in general<br />It’s possible; but stuff is changing quickly<br />New territory always involves a level of reinvention<br />The tech world has not rebooted due to cloud computing<br />Beware Fanbois/Fangrrls, Pundits & The Press<br />
  17. 17. A Cloud Design Reference Architecture<br />Web – WebArch & REST<br />Service, Data, & Control – this tutorial<br />Resource – virtualization, management & infrastructure clouds<br />WEB<br />SERVICE<br />DATA<br />CONTROL<br />RESOURCE<br />
  18. 18. Service<br />Organizing your computing domain for<br />fault<br />scale<br />management<br />WEB<br />SERVICE<br />DATA<br />CONTROL<br />RESOURCE<br />
  19. 19. Data<br />Storage, retrieval, integrity, recovery given<br />Distributed systems<br />Large scale<br />High availability<br />(possible) Multi-tenancy<br />WEB<br />SERVICE<br />DATA<br />CONTROL<br />RESOURCE<br />
  20. 20. Control<br />Provision, configuration, governance, and optimization of infrastructure<br />Resource brokerage<br />Policy constraints<br />Dependency management<br />Software configuration<br />Authorization & Auditability<br />WEB<br />SERVICE<br />DATA<br />CONTROL<br />RESOURCE<br />
  21. 21. Service<br />Foundation for Systems<br />
  22. 22. Designing a Service, circa 1998-2008<br />Multi-Tier Hybrid Architecture<br />Some stateless, some stateful computing<br />Session state is replicated<br />Independent servers / applications<br />Low-level redundancy (RAID, 2x NICs, etc.)<br />“Put your eggs into a small number of baskets, and watch those baskets”<br />General assumptions<br />Failure at the service layer shouldn’t lead to downtime<br />Failure at the data layer may be catastrophic<br />
  23. 23. Designing a Service, circa 2008+<br />Autonomous services <br />Divide system into areas of functional responsibility (tiers irrelevant)<br />Interdependent servers / applications<br />Software-level redundancy and fault handling <br />“Many, many servers breaking big problems down or distributing lots of little problems around”<br />New realities<br />Partial failure is a regular, normal occurrence; no excuse for downtime from any service<br />
  24. 24. Breaking or bridging a problem across resources<br />Big Problems (Parallel)<br />Theory: Amdahl’s law; shared memory or disk vs. shared nothing<br />New Practice: MapReduce (e.g. Hadoop), Spaces, Master/Worker<br />Retro: Linda, MPI, OpenMP, IPC or Threads<br />Little Problems (Concurrent)<br />Theory: Actor-model & process calculi<br />New Practice: Lightweight Messaging, Spaces, Erlang & Scala Actors<br />Retro: IPC, Thread pools, Components (COM+/EJB), Big Messaging (MQ, TIB, JMS)<br />
  25. 25. Case Study in “Big Problem” Solving: MapReduce & Apache Hadoop<br />Input<br />Read your data from files as a K/V map<br />Distribute Mapping Function<br />Input one (k, v) pair<br />Return a new K/V list<br />Partition & Sort<br />Handled by framework (e.g. Hadoop)<br />Provide a comparator<br />Distribute Reduce Function<br />Input one (k, list of values) pair<br />Return a list of output values<br />Output<br />Save the list as a file<br />
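The map, partition-and-sort, and reduce steps on this slide can be sketched in a single process with plain Python. This is only an illustration under stated assumptions: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, not Hadoop APIs, and a real framework would distribute each phase across many machines and read and write files rather than in-memory lists.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: take one (k, v) pair and return a new list of (k, v) pairs."""
    return [(word.lower(), 1) for word in value.split()]


def reduce_fn(key, values):
    """Reduce: take one (k, list of values) pair and return a list of outputs."""
    return [sum(values)]


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical single-process driver standing in for the framework."""
    # Partition: group every mapped value under its output key.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # Sort: visit keys in order (default ordering stands in for the comparator).
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


# Input: (line id, line text) pairs, as if read from a file.
records = [("line1", "the cloud is the future"), ("line2", "the future is here")]
counts = run_mapreduce(records, map_fn, reduce_fn)
```

Because the map and reduce functions share no state, the driver is free to run them on different nodes; only the partition and sort step needs to move data between them.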
  26. 26. … But how fast can I get? Theory Interlude: Amdahl’s Law<br />How fast can I speed up a sequential process?<br />Time = Serial part + Parallel part <br />Thus, the speedup is $S(N) = \frac{1}{(1 - P) + P/N}$<br />Where P is the fraction of the program that can be parallel (e.g. 0.95 for 95%)<br />N is the number of processors<br />What happens when P is 95%? Maximum of 20x <br />How about 99.99%?<br />
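As a worked example of Amdahl's Law, the sketch below evaluates the standard formula, speedup = 1 / ((1 − P) + P / N), and its limit 1 / (1 − P) as N grows. The function names are hypothetical, chosen only for this illustration.

```python
def amdahl_speedup(p, n):
    """Speedup on n processors when a fraction p of the work can run in parallel."""
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p):
    """Upper bound on speedup as the processor count grows without limit."""
    return 1.0 / (1.0 - p)


for p in (0.95, 0.9999):
    print(f"P={p}: 100 processors give {amdahl_speedup(p, 100):.1f}x, "
          f"limit is {amdahl_limit(p):.0f}x")
```

The serial fraction dominates quickly: with P = 0.95 the limit is about 20x no matter how many processors are added, and even P = 0.9999 caps out near 10,000x.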
  27. 27. Gunther’s Universal Scalability Law<br />It gets worse…<br />Most scale-out experiences retrograde behavior at peak loads<br />$\mathrm{Capacity}(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$<br />α is the contention <br />β is the coherency delay<br />http://www.perfdynamics.com/Manifesto/gcaprules.html<br />
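To see the retrograde behavior numerically, the sketch below evaluates the Universal Scalability Law, Capacity(N) = N / (1 + α(N − 1) + βN(N − 1)). Setting the derivative to zero gives a peak at N* = sqrt((1 − α) / β). The α and β values are made-up illustrative numbers, not measurements, and the function names are hypothetical.

```python
import math


def usl_capacity(n, alpha, beta):
    """Relative capacity of n nodes under contention alpha and coherency delay beta."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak(alpha, beta):
    """Node count at which capacity peaks; adding nodes past this point loses capacity."""
    return math.sqrt((1.0 - alpha) / beta)


ALPHA, BETA = 0.05, 0.001  # assumed values for illustration only
peak = usl_peak(ALPHA, BETA)
for n in (1, 10, 31, 100):
    print(n, round(usl_capacity(n, ALPHA, BETA), 2))
```

With these assumed coefficients capacity peaks at roughly 31 nodes, and 100 nodes deliver less capacity than 31 do.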
  28. 28. Case study in solving “little problems”. Actors: The Basic Idea<br />Programmable entities are concurrent, share nothing, communicate through messages<br />Actors can<br />Send messages<br />Create other actors<br />Specify how they respond to messages<br />Very lightweight (actors = objects)<br />Usually no ordering guarantees<br />At the language level<br />
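The actor idea can be approximated in Python with a thread and a mailbox queue. This is a rough sketch, not how Erlang or Scala implement actors: `CounterActor` is a hypothetical class, operating-system threads are far heavier than language-level actors, and the FIFO queue used here gives stronger ordering than actor systems usually promise.

```python
import queue
import threading


class CounterActor:
    """An actor owns private state and changes it only in response to messages."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # private state, only touched by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message, reply_to=None):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put((message, reply_to))

    def _run(self):
        # Process one message at a time, so no locks are needed on _count.
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get":
                reply_to.put(self._count)
            elif message == "stop":
                break

    def join(self):
        self._thread.join()


counter = CounterActor()
for _ in range(100):
    counter.send("increment")
reply = queue.Queue()
counter.send("get", reply_to=reply)
total = reply.get(timeout=5)
counter.send("stop")
counter.join()
```

The caller never reads `_count` directly; it asks for the value by message and waits on a reply queue, which is the share-nothing discipline the slide describes.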
  29. 29. Erlang Supervisors: Assuming failure will occur<br />Failures require cleanup & restart<br />Supervisor relationships can ensure the system tolerates faults<br />Hot-swap patches<br />Fundamentally in the language libraries<br />
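The restart-on-failure idea behind supervisors can be sketched in a few lines of Python. This is a loose analogy only: `supervise` and `flaky_worker` are hypothetical names, and Erlang/OTP supervisors manage separate processes with configurable restart strategies, which this single-threaded loop does not attempt.

```python
def supervise(task, max_restarts=3):
    """Run task; on an unhandled exception, restart it up to max_restarts times.

    Returns (result, number_of_restarts). Re-raises once the restart budget is spent.
    """
    restarts = 0
    while True:
        try:
            return task(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise  # give up and let a higher-level supervisor decide
            restarts += 1


attempts = {"n": 0}


def flaky_worker():
    """Fails on its first two runs to simulate a transient fault."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient fault")
    return "done"


result, restarts = supervise(flaky_worker)
```

The worker contains no recovery code of its own; it simply fails fast and relies on the supervisor to restart it.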
  30. 30. What kinds of failures? A Simplification.<br />Exceptional Conditions<br />Conditions that a programmer did not or should not handle<br />Tolerated through replication, fast failure, and/or restart(s)<br />Examples<br />Hardware failures, network outages, “Heisenbugs”, rare software conditions<br />Error Conditions<br />Conditions that the programmer can handle<br />Handled through cleanup or “catch” code<br />Examples<br />File not found, type conversion, bad arithmetic (divide by zero), malformed input<br />
  31. 31. Data<br />Management & Access<br />
  32. 32. Evolving the Database: Two Philosophies<br />Data Persistence Systems and Frameworks<br />Goal: Store & retrieve data quickly, reliably, with minimal hassle to the programmer<br />Often uses application tools & languages to manage & access data<br />Focused set of features<br />Database Management Systems (DBMS)<br />Goal: Manage the access, integrity, security, and reliability of data, independently of applications<br />Hard separation of tools & languages (e.g. SQL, DBA tools)<br />Broad set of features<br />
  33. 33. Scaling the Database: Two Philosophies<br />Scale-Up<br />Concurrent processing & parallelism through hardware<br />SMP, NUMA, MPP<br />RAID Arrays (SAN & NAS)<br />Shared disk or memory<br />Benefit: It worked in the 90s.<br />Drawback: Expensive, often bespoke, forklift upgrades<br />Scale-Out<br />Concurrent processing & parallelism through software<br />Commodity hardware<br />Software provides the engine<br />Shared nothing<br />Benefit: Linear scale, easy to standardize, easy to replicate / upgrade<br />Drawback: Traditionally, the software sucked.<br />
  34. 34. … What happens when database clustering software stops sucking? (i.e. now)<br />A flurry of programmer-oriented approaches<br />Persistence engines rule the bleeding edge in 2009<br />Key/Value Stores, JSON Document stores, etc.<br />Declarative/Imperative impedance mismatch (the “Vietnam” of the software tools industry) gets conflated with distributed data<br />Lots of practical confusion<br /><ul><li>What are the tradeoffs with a widely scaled out database system?
  35. 35. Too many choices, with idiosyncratic design histories
  36. 36. Let’s detangle this…</li></ul><br />
  37. 37. When should I share components?<br />Shared Disk<br />Partition compute across nodes<br />Storage is shared through NAS or SAN<br />Good for:<br />Mixed workload<br />Small random access reads<br />Worst case:<br />Inter-node network chatter caps scalability<br />Disk pings to propagate writes (e.g. Oracle pre-RAC)<br />Shared Nothing<br />Partition data across nodes<br />Each node owns its data<br />Good for:<br />Read-mostly<br />Parallel reads of huge data volumes<br />Consistent writes go to one partition<br />Worst case:<br />Repartitioning<br />Hotspot records don’t scale<br />Writes that span partitions<br />
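A small sketch shows why repartitioning is a worst case for shared-nothing designs. It assumes the simplest possible scheme, a hash of the key modulo the node count, which is an assumption made for illustration and not necessarily what any product named in this tutorial uses. `partition_for` and `moved_keys` are hypothetical helpers.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to the node that owns it, using a stable hash modulo node count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def moved_keys(keys, old_n, new_n):
    """Keys whose owning node changes when the cluster grows from old_n to new_n."""
    return [k for k in keys if partition_for(k, old_n) != partition_for(k, new_n)]


keys = [f"user-{i}" for i in range(1000)]
moved = moved_keys(keys, 4, 5)
print(f"{len(moved)} of {len(keys)} keys change owner when going from 4 to 5 nodes")
```

With naive modulo placement most keys typically change owner when a single node is added, so the data behind them has to be copied across the network. A single very hot key also always lands on one node, which is the hotspot problem the slide mentions.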
  38. 38. Modern Data Persistence Systems <br />Object Persistence<br />“Navigational databases in Java, Smalltalk, C++”<br />GemStone, Versant, Objectivity<br />Distributed Key-Value Stores<br />“Structured data with lesser need for complex queries”<br />Consistent: BigTable, HBase, Voldemort<br />Eventually Consistent: Dynamo, Cassandra<br />Document and/or Blob Stores<br />“Indexed structured data + binaries/fulltext”<br />CouchDB, BerkeleyDB, MongoDB<br />
  39. 39. Clustered DBMS for Transactions<br />Oracle Real Application Clusters (RAC)<br />Shared disk, Replicated Memory (“Cache Fusion”)<br />Limited by mesh interconnect to disk (partitioning possible)<br />IBM DB2 Data Partitioning Feature<br />Shared nothing database cluster, high number of nodes<br />IBM DB2 pureScale<br />New (Oct 2009) technology that ports IBM DB2 mainframe shared-disk clustering to the DB2 for open systems<br />Microsoft SQL Server 2008<br />“Federated” Shared Nothing Database a longtime feature<br />
  40. 40. Clustered DBMS for Parallel Queries<br />Teradata<br />The old standard data warehouse, hardware + software<br />Netezza<br />Data warehousing appliance (hw + software)<br />Vertica<br />Column-oriented, shared nothing clustered database<br />Mike Stonebraker’s new company<br />Greenplum<br />Column-oriented, shared nothing clustered database<br />Based on PostgreSQL with MapReduce engine<br />
  41. 41. Scaling to Internet-Scale<br />Single Control Domain<br />One Database Site<br />Consistency is built-in<br />Scalable with tradeoffs among different workloads<br />Scale to the limits of network bandwidth & manageability<br />Main Example:<br />Clustered DBMS<br />Multiple Control Domains<br />Many Database Sites<br />Consistency requires agreement protocol<br />Scalable only if consistency is relaxed<br />Nearly limitless (global) scale<br />Main Examples:<br />DNS <br />The Web<br />
  42. 42. How do I make consistency tradeoffs? Theory interlude: The CAP theorem<br />Consistency (A+C in ACID)<br />There’s a total ordering on all operations on the data; i.e. like a sequence<br />Availability<br />Every request on non-failed servers must have a response<br />Tolerance to Network Partitions<br />All messages might be lost between server nodes<br />Choose at most two of these (as a spectrum).<br />
  43. 43. CAP Tradeoffs: Consistency & Availability<br /><ul><li>The common case.
  44. 44. Fault tolerance through replicas </li></ul> & fast fail + fast recovery<br /><ul><li>Implication:
  45. 45. network outage between servers might halt the system
  46. 46. generally requires a single domain of control
  47. 47. Examples that emphasize C+A:
  48. 48. Single-site cluster databases
  49. 49. Google BigTable
  50. 50. Hadoop’s HBase
  51. 51. Oracle RAC, IBM DB2 Parallel
  52. 52. Clustered file systems
  53. 53. Google File System & HDFS
  54. 54. Distributed Spaces & Caches
  55. 55. Coherence, Gigaspaces & Terracotta</li></li></ul><li>CAP Tradeoffs: Consistency & Partitions<br /><ul><li>Common approach for traditional distributed systems
  56. 56. Implication:
  57. 57. multiple domains of control
  58. 58. clients can’t always read/write
  59. 59. failures degrade scale & performance due to negotiation
  60. 60. Examples that emphasize C+P:
  61. 61. Distributed shared nothing databases
  62. 62. Two-phase commit
  63. 63. Distributed locks & file systems
  64. 64. Chubby & Hadoop’s ZooKeeper
  65. 65. Paxos & consensus protocols
  66. 66. Synchronous Log Shipping</li></li></ul><li>CAP Tradeoffs: Partitions & Availability<br /><ul><li>New approach for Internet-scale systems
  67. 67. Implication:
  68. 68. multiple domains of control
  69. 69. reads & writes always succeed (eventually)
  70. 70. clients may read inconsistent (old or undone) data
  71. 71. Examples that emphasize A+P:
  72. 72. Internet DNS
  73. 73. Web Caching & Content Delivery Networks
  74. 74. Amazon Dynamo (and clones)
  75. 75. Cassandra (Facebook, Digg)
  76. 76. CouchDB (BBC)
  77. 77. Asynchronous log shipping</li></li></ul><li>Summary of the CAP Tradeoffs<br />Mix & match the tradeoffs where appropriate<br />Google’s search engine uses all three!<br />The tradeoffs are a spectrum, and are not static choices<br />e.g. there are adjustable levels of consistency to consider <br />Strict, causal, snapshot / epoch, eventual, weak…<br />The main tradeoff: writes to multiple sites / domains of control (with or without high availability)<br />Single Domain (don’t tolerate network partitions), or Agreement Protocol (reduces availability), or Relaxed Consistency (stale/inaccurate data is possible)<br />Weaker consistency is where the idea of a DBMS falters (it is contrary to its main purpose in life)<br />
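One common way to get adjustable consistency, used by Dynamo-style stores, is quorum replication: with N replicas, a write waits for W acknowledgements and a read consults R replicas, and reads are guaranteed to see the latest write only when R + W > N. The toy model below illustrates that rule. It is an assumption-laden sketch, not any product's API: `write` and `read` are hypothetical helpers, and the read deliberately picks the replicas least likely to have the write, to show the worst case.

```python
def write(replicas, w, version, value):
    """Apply a write to the first w replicas only (the rest lag behind)."""
    for i in range(w):
        replicas[i] = (version, value)


def read(replicas, r):
    """Read the last r replicas (worst case) and return the newest value seen."""
    return max(replicas[-r:])[1]


N = 3
replicas = [(0, "old")] * N
write(replicas, w=2, version=1, value="new")

strong = read(replicas, r=2)  # R + W = 4 > N: the read set overlaps the write set
weak = read(replicas, r=1)    # R + W = 3 = N: the read may miss the write
```

Here `strong` is `"new"` and `weak` is `"old"`. Lowering R or W makes requests faster and more available at the cost of possibly stale reads, which is the spectrum the summary slide refers to.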
  78. 78. Please don’t throw out logical/relational data design! (unless you have to)<br />“Future users of massive datasets should be protected from having to know how the data is organized in the computing cloud….<br />…. Activities of users through web agents and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed.”<br />Paraphrasing Ed Codd – 39 years ago!<br />
  79. 79. Control<br />Containers, Configuration, & Modeling<br />
  80. 80. The Dev / Ops Game<br />
  81. 81. Example: Why can’t these two servers communicate?<br />Possible areas of problems<br />Security<br />Bad credentials<br />Server Configuration<br />Wrong IP or Port<br />Bad setup to listen or call<br />Network Configuration<br />Wrong duplex<br />Bad DNS or DHCP<br />Firewall Configuration<br />Ports or protocols not open<br />
  82. 82. Example: What do I need to do to make this change?<br />Desired Change<br />Scale-out this cluster<br />But…<br />Impacts on other systems<br />Security Systems<br />Load Balancers<br />Monitoring<br />CMDB / Service Desk<br />Architecture issues<br />Stateful or stateless nodes<br />Repartitioning?<br />Limits/constraints on scale out?<br />
  83. 83. Example: What is the authoritative reality?<br />Desired State<br />Configuration Template<br />Model<br />Script<br />Workflow<br />CMDB<br />Code<br />Current State<br />On the server<br />Might not be in a file<br />Might get changed at runtime<br />And when you do change…<br />It may not actually change<br />It might change to an undesirable setting<br />It might affect other settings that you didn’t think about<br />
  84. 84. Configuration Code, Files, and Models<br />Bottom Up<br />Scripts & Recipes<br />Hand-grown automation<br />Runbooks<br />Workflow, policy<br />Frameworks<br />Chef<br />Puppet, Cfengine<br />Build Dependency Systems<br />Maven<br />Top Down<br />Modeled Viewpoints<br />E.g. Microsoft Oslo, UML, Enterprise Architecture<br />Modular Containers<br />E.g. OSGi, Spring, Azure roles<br />Configuration Models<br />SML, CIM<br />ECML, EDML<br />
  85. 85. An Evolution of Automation<br />Scripts<br />For automating common cases<br />Run-Book Automation<br />Scripts as visual workflow<br />Declarative<br />Separate what you want from how you want it done<br />Finite State Machines<br />Organize scripts into described states & transitions<br />Hierarchical Task Networks (Planning)<br />Assemble a plan by exploring hypothetical strategic paths<br />
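The finite-state-machine step in this evolution can be sketched as a table of allowed transitions plus a search for a path between states. The states and actions below are invented for illustration, and the `plan` function is a plain breadth-first search over that table. It is a much simpler stand-in for, not an implementation of, hierarchical task network planning.

```python
from collections import deque

# Hypothetical server lifecycle: (current state, action) -> next state.
TRANSITIONS = {
    ("stopped", "provision"): "provisioned",
    ("provisioned", "configure"): "configured",
    ("configured", "start"): "running",
    ("running", "stop"): "stopped",
}


def next_state(state, action):
    """Apply one action, rejecting anything the state machine does not allow."""
    if (state, action) not in TRANSITIONS:
        raise ValueError(f"action {action!r} is not allowed in state {state!r}")
    return TRANSITIONS[(state, action)]


def plan(start, goal):
    """Breadth-first search for the shortest action sequence from start to goal."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions
        for (src, action), dst in TRANSITIONS.items():
            if src == state and dst not in seen:
                seen.add(dst)
                frontier.append((dst, actions + [action]))
    return None  # goal is unreachable from start


steps = plan("stopped", "running")
```

Describing the states and transitions declaratively lets the tool, rather than the script author, work out which steps are needed: `steps` here is `["provision", "configure", "start"]`.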
  86. 86. An Approach to Integrated Design and Ops<br />
  87. 87. Wrap-up<br />Cloudy, with a chance of …<br />
  88. 88. Revisiting the Cloud Design Reference Architecture<br />Service – Big vs. Little Problems, MapReduce & Actors, Amdahl’s Law<br />Data – persistence vs. management, scale-up vs. scale-out, CAP tradeoffs<br />Control – containers, configuration, automation<br />WEB<br />SERVICE<br />DATA<br />CONTROL<br />RESOURCE<br />
  89. 89. For More Information<br />Hadoop<br />http://hadoop.apache.org/<br />CAP Theorem Proof Paper<br />http://people.csail.mit.edu/sethg/pubs/BrewersConjecture-SigAct.pdf<br />Google’s papers on Distributed & Parallel Computing<br />http://research.google.com/pubs/DistributedSystemsandParallelComputing.html<br />Neil Gunther’s “Taking the Pith out of Performance” Blog<br />http://perfdynamics.blogspot.com/<br />A Comparison of Approaches to Large-Scale Data Analytics<br />http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf<br />Model-Driven Operations for the Cloud<br />http://www.stucharlton.com/stuff/oopsla09.pdf<br />
