Architecting for the Cloud: Storage and Misc Topics



This is day 4 of the Architecting for the Cloud course. It covers storage solutions and a collection of miscellaneous topics.



  1. 1. © Matthew Bass 2013 Architecting for the Cloud Len and Matt Bass Storage in the Cloud
  2. 2. © Matthew Bass 2013 Outline This section will focus on storage in the cloud • We will first look at relational databases • What solutions emerged for the cloud • Storage options for NoSQL databases • Architecture of typical NoSQL databases
  3. 3. © Matthew Bass 2013 Outline This section will focus on storage in the cloud • We will first look at relational databases • What solutions emerged for the cloud • Storage options for NoSQL databases • Architecture of typical NoSQL databases
  4. 4. © Matthew Bass 2013 History • The relational data model was created in the late 1960s • In the 1980s relational databases became commercially successful – Replacing hierarchical and network databases • Relational databases remain the dominant database model today
  5. 5. © Matthew Bass 2013 Relational Databases • The relational model is a mathematical model for describing the structure of data – We will not go into this model • Let’s quickly review first and second normal form, however
  6. 6. © Matthew Bass 2013 Example Imagine you sell car parts – You have warehouses – You have part inventories – You have orders What’s the problem? [Table columns: Warehouse | Warehouse Address | Part]
  7. 7. © Matthew Bass 2013 What Happens Here? [Single row: Warehouse 1 | 123 Main Street | Transmission, Steering wheel, Brake pads, …] What about here? [Rows: Warehouse 1 | 123 Main Street | Transmission; Warehouse 1 | 123 Main Street | Steering wheel; Warehouse 1 | 123 Main Street | Brake Pads]
  8. 8. © Matthew Bass 2013 The Solution … Warehouse Table: (Warehouse ID, Warehouse Address) Parts Table: (Part ID, Part Description) Relations Table: (Warehouse ID, Part ID)
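The normalized design on this slide can be sketched with sqlite3 from the Python standard library. The table and column names follow the slide; the sample data is illustrative:

```python
import sqlite3

# In-memory database with the three normalized tables from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse (warehouse_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE part (part_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE warehouse_part (warehouse_id INTEGER REFERENCES warehouse,
                                 part_id INTEGER REFERENCES part);
""")
conn.execute("INSERT INTO warehouse VALUES (1, '123 Main Street')")
conn.executemany("INSERT INTO part VALUES (?, ?)",
                 [(1, 'Transmission'), (2, 'Steering wheel'), (3, 'Brake pads')])
conn.executemany("INSERT INTO warehouse_part VALUES (1, ?)", [(1,), (2,), (3,)])

# The address is now stored exactly once; a join recovers the flat view.
rows = conn.execute("""
    SELECT w.address, p.description
    FROM warehouse w
    JOIN warehouse_part wp ON wp.warehouse_id = w.warehouse_id
    JOIN part p ON p.part_id = wp.part_id
    ORDER BY p.part_id
""").fetchall()
```

Note that updating the warehouse address now touches a single row, which is exactly the redundancy problem the previous slide illustrated.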
  9. 9. © Matthew Bass 2013 This Works • We have a standard language for querying the data (SQL) • We can now extract data in a very flexible way • We can read, write, update, and delete data pretty efficiently – Joins add some overhead
  10. 10. © Matthew Bass 2013 Moreover We Have RDBMS • We have robust software systems that manage the data • These systems provide many advanced features including: – Behavior – Concurrency control – Transactions – Referential integrity – Optimization
  11. 11. © Matthew Bass 2013 Behavior • DBMSs provide mechanisms for building in behavior • These are mechanisms like – Stored procedures – PL/SQL • This allows you to simplify the application logic
  12. 12. © Matthew Bass 2013 Concurrency Control • DBMSs support access by multiple users • They will lock tables during updates to ensure that writes are complete prior to reads • They will manage multiple updates to ensure integrity and consistency of data
  13. 13. © Matthew Bass 2013 Transactions • Transactions are supported • This ensures that updates either happen completely or not at all – Often an atomic update is a set of updates to individual records across multiple tables – If only some of these updates happen the integrity of the overall database is compromised
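A minimal sketch of this all-or-nothing property, again with Python's sqlite3. The schema and the simulated mid-transaction failure are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (part TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE orders (part TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('brake pads', 10)")
conn.commit()

try:
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute("UPDATE inventory SET qty = qty - 4 WHERE part = 'brake pads'")
        conn.execute("INSERT INTO orders VALUES ('brake pads', 4)")
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass

# Both updates were rolled back together; the database stays consistent.
qty = conn.execute("SELECT qty FROM inventory").fetchone()[0]
n_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Using the connection as a context manager commits on success and rolls back on any exception, which is exactly the behavior the slide describes.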
  14. 14. © Matthew Bass 2013 Referential Integrity • Ensures that references from one table refer to a valid entry in another table
  15. 15. © Matthew Bass 2013 Optimization • Database systems will perform a variety of actions to optimize based on usage patterns • They will – Create indexes – Create virtual tables – Cache values – …
  16. 16. © Matthew Bass 2013 Impedance Mismatch • There is however, a mismatch – We need to translate between the relational structure and the organizational needs • Think about the reports needed for the warehouse – Purchase orders – History of orders for customer – Parts inventory per warehouse – … • This means we will need lots of Joins – This isn’t too much of an issue until we scale …
  17. 17. © Matthew Bass 2013 Speaking of Scaling … Do relational databases scale?
  18. 18. © Matthew Bass 2013 Internet Scale Is Difficult • We can “shard” the data – Split the data across the machines • This is very difficult to do efficiently • This makes joins more costly – Remember joins are common • This also has a practical limit – At some point you will need to replicate the data • The database becomes slow …
  19. 19. © Matthew Bass 2013 Change is Needed • For this reason internet scale applications moved to distributed file systems – Google was the first – Many others followed • This allowed the data to be partitioned across nodes more efficiently – We’ll talk about this in a minute
  20. 20. © Matthew Bass 2013 Outline This section will focus on storage in the cloud • We will first look at relational databases • What solutions emerged for the cloud • Storage options for NoSQL databases • Architecture of typical NoSQL databases
  21. 21. © Matthew Bass 2013 Needs • Let’s explore the needs in a bit more detail • The file system needed to: – Be fault-tolerant – Handle large files – Accommodate extremely large data sets – Accommodate many concurrent clients – Be flexible enough to handle multiple kinds of applications
  22. 22. © Matthew Bass 2013 Fault-Tolerance • Because of their scale, these systems were deployed on hundreds or thousands of servers • This meant that at any given time some of these nodes would not be operational • Problems from application bugs, operating system bugs, human error, hardware failures, and networks are common
  23. 23. © Matthew Bass 2013 Large Files/Large Data Sets • It’s common for files in these systems to be multiple GBs • Each file could have millions of objects – E.g. many individual web pages • The data sets grow quickly • The data sets can be multiple terabytes or petabytes
  24. 24. © Matthew Bass 2013 Many Concurrent Clients • The system needs to efficiently handle multiple clients • These clients could be reading or writing
  25. 25. © Matthew Bass 2013 Multiple Applications • Additionally the system needs to be flexible enough to handle multiple applications • Applications have a variety of needs – Long streaming reads – Throughput oriented operations – Low latency reads – …
  26. 26. © Matthew Bass 2013 Addressing Needs • There were a number of things that were done to address the needs • One primary decision was the de-normalization of the data – We’ll talk about this more in the next slides • Other decisions include (we’ll talk about these in a bit) – Block size – Replication strategy – Data consistency checks – API and capability of the system
  27. 27. © Matthew Bass 2013 De-Normalizing Data • Remember what was difficult with relational models? – Joins across nodes are expensive – As is synchronization for replicated data • If the data is de-normalized it can be “localized” – Data that will likely be accessed together can be collocated – In other words store it as you will use it
  28. 28. © Matthew Bass 2013 Example • Imagine a Purchase Order • Typically this would contain – Customer information – Product information – Pricing
  29. 29. © Matthew Bass 2013 Relational Purchase Order • The data would be split across multiple tables such as – Customer – Product Catalog – Inventory – … • If the data set is large enough the data would be distributed
  30. 30. © Matthew Bass 2013 De-Normalized Purchase Order • In a file system without a relational model the data doesn’t need to be split up • The purchase order data would be co-located • If the data set was very large purchase orders would still be co-located – Different purchase orders could be distributed – A single purchase order, however, would not be
  31. 31. © Matthew Bass 2013 Relational vs NoSQL [Diagram: the relational model partitions data by table – Customers, Product Catalog, Inventory – while NoSQL partitions by aggregate – Orders 1–100, Orders 101–200, Orders 201–300]
  32. 32. © Matthew Bass 2013 What Does This Mean? • Data has no explicit structure (not entirely true … but we’ll talk about this) – Data is largely treated as a blob • This has several implications – You can change the nature of the data as needed – You can collocate the data as desired – The application now has increased burden
  33. 33. © Matthew Bass 2013 Back to Purchase Order [Key → Value table, keyed by PO Number: 1 → Contents of PO1 …; 2 → Contents of PO2 …; 3 → Contents of PO3 …; 4 → Contents of PO4 …]
  34. 34. © Matthew Bass 2013 Retrieving Data • To retrieve the purchase order data you provide the reference key • The file system routes you to the appropriate node (more later) • The single node returns the entire purchase order • This can happen quickly … regardless of how many purchase orders you have Do you see any potential issues?
  35. 35. © Matthew Bass 2013 Data Locality • First, being able to retrieve the data quickly depends on the location of the data • If the data is distributed it’s difficult to retrieve quickly – Imagine you want to get the number of times a customer ordered product X – More on this later • While there is not an explicit structure there is an implicit structure – Design of this structure is important
  36. 36. © Matthew Bass 2013 Data Processing • As the file system treats the data as unstructured it’s not able to preprocess the data • Getting an ordered list, for example, has to be done in the application • The validity of the data needs to be checked by the application
  37. 37. © Matthew Bass 2013 Updating Data • What happens if you want to change the data? – Imagine trying to update the customer’s address • Updates tend to be difficult • In this environment you tend to not update data – Instead you will append the new data – You can establish rules for the lifetime of the data
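The append-instead-of-update pattern described above can be sketched as follows. `AppendOnlyStore` and its record layout are hypothetical, not any particular product's API:

```python
import time

class AppendOnlyStore:
    """Writes append a (timestamp, value) record; reads return the newest."""

    def __init__(self):
        self._log = {}  # key -> list of (timestamp, value), append-only

    def put(self, key, value, ts=None):
        record = (ts if ts is not None else time.time(), value)
        self._log.setdefault(key, []).append(record)

    def get(self, key):
        # Latest write wins; older versions remain in the log.
        return max(self._log[key])[1]

    def history(self, key):
        # Full version history, oldest first.
        return [value for _, value in sorted(self._log[key])]

store = AppendOnlyStore()
store.put("customer:8790:address", "123 Main St", ts=1)
store.put("customer:8790:address", "456 Oak Ave", ts=2)  # "update" = append
```

A retention rule (e.g. discard records older than some age) would be a small extension of `history`, matching the slide's point about data-lifetime rules.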
  38. 38. © Matthew Bass 2013 Other Issues • Things like data integrity are not managed by the file system • You don’t (typically) have full support for transactions • There is no notion of referential integrity • There is support for some concurrent access, but with built in assumptions • Consistency is not typically guaranteed (more later)
  39. 39. © Matthew Bass 2013 A New Tool in Your Toolbox • You’ve been given a new kind of hammer – Remember that everything is not a nail – In other words these kinds of data stores are good for some things … and not others • Today there are many different flavors of these data stores – Both in terms of structures and features
  40. 40. © Matthew Bass 2013 Multiple Data Structures • Today many options exist – Key value stores – Document centric data stores – Column databases • We’ve also started to see old models reemerge e.g. – Hierarchical data stores
  41. 41. © Matthew Bass 2013 Key Value Databases • Basically you have a key that maps to some “value” • This value is just a blob – The database doesn’t care about the content or structure of this value • The operations are quite simple e.g. – Read (get the value given a key) – Insert (inserts a key/value pair) – Remove (removes the value associated with a given key)
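The three operations listed above can be sketched with an in-memory Python class. This mirrors only the interface the slide describes; a real key-value store adds persistence, replication, and concurrency control:

```python
class KeyValueStore:
    """Minimal sketch of the read / insert / remove interface."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        # The value is an opaque blob; the store never inspects it.
        return self._data[key]

    def insert(self, key, value):
        self._data[key] = value

    def remove(self, key):
        del self._data[key]

kv = KeyValueStore()
kv.insert("session:42", b'{"cart": ["brake pads"]}')  # session data as a blob
blob = kv.read("session:42")
kv.remove("session:42")
```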
  42. 42. © Matthew Bass 2013 Key Value Databases II • There is no real schema – Basically you query a key and get the value – This can be useful when accessing things like user sessions, shopping carts, … • Concurrency – Concurrency only makes sense at the level of a single key – Can have either optimistic write or eventual consistency – we’ll talk about this more later • Replication – Can be handled by the client or the data store – more about this later
  43. 43. © Matthew Bass 2013 Uses • Very fast reads • Scales well • Good for quick access of data without complex querying needs – The classic example is for session management • Not good for – Situations where data integrity is critical – Data with complex querying needs
  44. 44. © Matthew Bass 2013 Document Centric Databases • Stores a “document” ID : 123 Customer : 8790 Line Items : [{product id: 2, quantity: 2} {product id: 34, quantity: 1}] …
  45. 45. © Matthew Bass 2013 Document Centric • No schema • You can query the data store – Can return all or part of the document – Typically query the store by using the id (or key) • As with key value, discussing concurrency only makes sense at the level of a single document
  46. 46. © Matthew Bass 2013 Advantages • A document centric data store is similar in many ways to a key/value data store • It does, however, allow for more complex queries – For example you can query using a non-primary key
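A sketch of querying by a non-primary key over documents shaped like the purchase order above. The `find` helper and its filter syntax are illustrative Python, not any real database's query language:

```python
documents = [
    {"id": 123, "customer": 8790,
     "line_items": [{"product_id": 2, "quantity": 2},
                    {"product_id": 34, "quantity": 1}]},
    {"id": 124, "customer": 5555,
     "line_items": [{"product_id": 2, "quantity": 1}]},
]

def find(docs, **criteria):
    # Return every document whose top-level fields match all criteria —
    # here "customer" is a non-primary key, as in the slide.
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

matches = find(documents, customer=8790)
```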
  47. 47. © Matthew Bass 2013 Column Databases • Row key maps to “column families” [Example: row key 1234 maps to a Profile column family (Name: Matt; Billing Address: 123 Main st; Phone: 412 770-4145) and an Orders column family (Order Data …, Order Data …, Order Data …)]
  48. 48. © Matthew Bass 2013 Column Databases - Rows • Rows are grouped together to form units of load balancing – Row keys are ordered and grouped together by locality – In this example consecutive rows would be from the same domain (CNN) • Concurrency makes sense at the level of a row Key Contents com.cnn.www Html page … …
  49. 49. © Matthew Bass 2013 Column Databases – Columns • Columns are grouped into “column families” • Column families form the unit of access control – Clients may or may not have access to all column families • Column keys can be used to query data
  50. 50. © Matthew Bass 2013 Column Databases – Timestamps • The cells in a column database can be versioned with a timestamp • The cells can contain multiple versions – The application can typically specify how many versions to keep or when a version times out • You can use either a client-generated timestamp or one generated by the storage node
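The timestamp-versioning scheme can be sketched as below. `VersionedCell` is a hypothetical name, and keeping two versions is an arbitrary choice for the example:

```python
class VersionedCell:
    """A cell holding (timestamp, value) versions, newest-N retained."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._versions = []  # (timestamp, value), kept sorted oldest-first

    def put(self, value, timestamp):
        self._versions.append((timestamp, value))
        self._versions.sort()
        del self._versions[:-self.max_versions]  # evict oldest beyond the limit

    def get(self, timestamp=None):
        # Newest version, or the newest version at-or-before `timestamp`.
        candidates = [v for v in self._versions
                      if timestamp is None or v[0] <= timestamp]
        return max(candidates)[1]

cell = VersionedCell(max_versions=2)
cell.put("v1", timestamp=1)
cell.put("v2", timestamp=2)
cell.put("v3", timestamp=3)  # this write evicts the v1 version
```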
  51. 51. © Matthew Bass 2013 Examples Document Centric • MongoDB • CouchDB • RavenDB Key Value • DynamoDB • Azure Table • Redis • Riak Column • HBase • Cassandra • Hypertable • SimpleDB
  52. 52. © Matthew Bass 2013 NoSQL vs RDBMS • Explicit vs Implicit Schema – NoSQL databases do have an implicit schema – at least in most cases • Distribution of data • Consistency • Efficiency of storage • Additional capabilities
  53. 53. © Matthew Bass 2013 Schema • Clearly with Relational DB there is an explicit schema • You do have an implicit schema with NoSQL db as well – You typically want to do something with the data • With relational schema distributed data has a big performance impact • Data model of NoSQL data impacts performance as well – It is easier to distribute data so that related data is co-located
  54. 54. © Matthew Bass 2013 Consistency - CAP Theorem • When data becomes distributed you need to worry about a network partition – Essentially this means that instances of your data store can’t communicate • When this happens you need to choose between availability or consistency
  55. 55. © Matthew Bass 2013 Let’s Demonstrate • Imagine we start a store that takes orders – Who wants to work at this store? • The operators need to be able to: – Take orders – Give order history – Modify orders • We will start with one operator until business grows …
  56. 56. © Matthew Bass 2013 Consistency in the Cloud • Many NoSQL databases give you options – Eventual consistency – Optimistic consistency – … • They all come with different trade offs • You must understand the needs of your system to ensure appropriate behavior – We’ll talk more about this later
  57. 57. © Matthew Bass 2013 Outline This section will focus on storage in the cloud • We will first look at relational databases • What solutions emerged for the cloud • Storage options for NoSQL databases • Architecture of typical NoSQL databases
  58. 58. © Matthew Bass 2013 Fault Tolerance • As we said earlier fault tolerance was a prime motivator for many of the decisions • These systems are built with commodity components that are prone to failure • They also need to deal with other issues (previously mentioned) that arise • We’ll look at a representative example of such a system to understand what decisions have been made
  59. 59. © Matthew Bass 2013 Google File System • Grew out of “BigFiles” • Distributed, scalable and portable file system • Implemented in C++ (its open-source descendant, HDFS, is written in Java) • Supports the kinds of applications we discussed earlier – Search – Large data retrieval
  60. 60. © Matthew Bass 2013 Leads to following requirements 1. High reliability through commodity hardware – Even with RAID, disks will still have one failure per day. If the system has to deal with failure smoothly in any case, it is much more economical to use commodity hardware. – Even if disks do not fail, data blocks may get corrupted. 2. Minimal synchronization on writes – Require each application process to write to a distinct file. File merge can take place after files are written. – This means minimal locking during the write process (or read process). 3. Data blocks are all the same size – Streaming data. ALL blocks are 64 MBytes. – GFS is unaware of any internal structure of the data; that structure must be managed by the application
  61. 61. © Matthew Bass 2013 GFS Interfaces • Supports the following commands – Open – Create – Read – Write – Close – Append – Snapshot
  62. 62. © Matthew Bass 2013 Organization of GFS • Organized into clusters • Each cluster might have thousands of machines • Within each cluster you have the following kinds of entities – Clients – Master servers – Chunk servers
  63. 63. © Matthew Bass 2013 GFS Clients • Clients are any entity that makes a file request • Requests are often to retrieve existing files • They might also include manipulating or creating files • Clients are other computers or applications – Think of the web server that serves your search engine as a client
  64. 64. © Matthew Bass 2013 Chunk Servers • Responsible for storing the data “chunks” – These chunks are all 64 MB blocks • These chunk servers are the workhorses of the file system • They receive requests for data and send the chunks directly to the client • The client also writes the files directly to the appropriate chunk servers – The references for replicas come from the master as well • The chunk server is responsible for determining the correctness of the write (more later)
  65. 65. © Matthew Bass 2013 Master Servers • Acts as a coordinator for the cluster • Keeps track of the metadata – This is data that describes the data blocks (or chunks) – Maps each file to the chunks it consists of • Master tells the client where the chunk is located • Master keeps an operations log – Logs the activities of the cluster – One of the mechanisms used to keep service outages to a minimum (more later)
  66. 66. © Matthew Bass 2013 Two Additional Concepts Lease: • Lease is the minimal locking that is performed. Client receives lease on a file when it is opened and, until file is closed or lease expires, no other process can write to that file. This prevents accidentally using the same file name twice. • Client must renew lease periodically (~ 1 minute) or lease is expired. Block: • Every file managed by GFS is divided into 64 MByte blocks. Each read/write is in terms of <file, block #> • Each block is replicated – three is the default number of replicas. • As far as GFS is concerned there is no internal structure to a block. The application must perform any parsing of the data that is necessary.
  67. 67. © Matthew Bass 2013 Basic Read Operation [Diagram: 1. Client requests the location of a file from the Master 2. Master returns the location 3. Client sends a read request to the appropriate Chunk Server 4. Chunk Server returns the file content]
  68. 68. © Matthew Bass 2013 Basic Write Operation [Diagram: 1. Client requests the locations of the primary and secondary replicas from the Master 2. Master returns the locations 3. Client caches the locations 4. Client sends the data to write to the Chunk Servers 5. Chunk Servers apply the mutations]
  69. 69. © Matthew Bass 2013 Reliability Mechanisms • Master and chunk replication • Rebalancing • Stale replication detection • Checksumming • Garbage removal
  70. 70. © Matthew Bass 2013 Master Replication • One active Master per cluster • “Shadow” masters exist on other machines – These shadows may perform limited functions (e.g. reads) • Monitors the operations of the active master – Through the operations log • Maintains contact with the Chunk Servers by polling – Does this to keep track of data • If the Master fails the shadow takes over
  71. 71. © Matthew Bass 2013 Data Replication/Rebalancing • File system replicates chunks of data • It stores data on different machines across different racks – That way if a machine or rack fails another replica exists • Master also monitors the cluster as a whole • It periodically rebalances the load across the cluster – All chunk servers run at near capacity but never at full capacity • Master also monitors each chunk to ensure data is current – If not it’s designated as a stale replica – The stale replica becomes garbage
  72. 72. © Matthew Bass 2013 Checksum • In order to detect data corruption checksumming is used • The system breaks each 64 MB chunk into 64 KB blocks • Each block has its own 32 bit checksum • The Master monitors the checksums for each block • If the checksums don’t match what the Master has on record the block is deleted and a new replica created
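The per-block checksum scheme can be sketched with `zlib.crc32` from the standard library. CRC-32 is an assumption here; the slides only say the checksums are 32 bits:

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB blocks, as in the slide

def block_checksums(chunk: bytes):
    # One 32-bit checksum per 64 KB block of the chunk.
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk), BLOCK_SIZE)]

def find_corrupt_blocks(chunk: bytes, expected):
    # Indices of blocks whose checksum no longer matches the record.
    return [i for i, c in enumerate(block_checksums(chunk)) if c != expected[i]]

chunk = bytes(3 * BLOCK_SIZE)       # a 192 KB chunk of zeros
expected = block_checksums(chunk)   # checksums recorded at write time

# Flip one byte inside block 1 to simulate on-disk corruption.
corrupted = chunk[:BLOCK_SIZE] + b"\xff" + chunk[BLOCK_SIZE + 1:]
bad = find_corrupt_blocks(corrupted, expected)
```

Only the corrupted 64 KB block is flagged, so a repair needs to re-replicate just that block rather than the whole 64 MB chunk.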
  73. 73. © Matthew Bass 2013 Failure Scenarios • Let’s look at the following failure scenarios to see what happens – Client failure – Corrupt disk – Chunk server failure – Master failure
  74. 74. © Matthew Bass 2013 Client Failure • Client fails while file open • Master recognizes this because lease expires • File is placed in intermediate state where client can re-activate lease • After intermediate state expires (~hour), Master informs Chunk Servers that have blocks for that file to delete them • Master removes all entries associated with file • Chunk Server deletes blocks
  75. 75. © Matthew Bass 2013 Corrupt Disk This is the case where a block becomes corrupted after writing. Replica1 writes a checksum for every 64 KB in a parallel file. Replica1 returns checksums along with the block during a read. Client checks the checksum when the block is returned. If there is an error then Client: • Retries read from a different replica, Replica2 • Informs Master of corrupt block on Replica1 Master: • Allocates new replica for that block on Replica3 • Informs Replica2, which has a valid replica, to copy it to Replica3. • Informs Replica1, which has the corrupted block, to delete that block.
  76. 76. © Matthew Bass 2013 Chunk Server Failure Master sends Heartbeat request to Chunk Server • Active Replica responds with a list of block #, replica #s it has. • Failed Replica does not respond Master recognizes Replica’s failure. Master maintains block #, replica # -> Chunk Server mapping from last Heartbeat. Master queues all of the blocks replicated on the failed Chunk Server to generate an additional replica. The generation of an additional replica of Block A: • Allocate new replica on an active Chunk Server, say Replica1 • Instruct one of the Chunk Servers with a valid replica of Block A to copy it to Replica1.
  77. 77. © Matthew Bass 2013 Master Failure • Backup Master maintains copy of log • Responsible for creating checkpoint image and trimming EditLog • BackupNode takes over in case of Master failure • BackupNode may also fail [Diagram: Master and BackupNode, each holding the EditLog and Checkpoint Image]
  78. 78. © Matthew Bass 2013 More about Master Structure Four Threads: • Main – perform file management operations. • Ping/Echo – check on status of Chunk Servers and receive responses from Chunk Servers • Replica Management – manage new replica creation and replica deletion • Lease Management – cancel leases when they expire. Queues replicas for replica deletion for files where the client has failed. Three Modes • Normal operations • Safe mode – when Master is restarted then no new requests are accepted until percentage of Chunk Servers have reported their block allocations • Backup – act as Master backup
  79. 79. © Matthew Bass 2013 Summary • Relational databases are difficult to distribute efficiently – Scalability can be problematic • NoSQL databases offer an alternative – Data is typically schema-less • Aggregates of data that mirror primary use cases are considered a unit of data • Queries across nodes require an efficient mechanism for aggregation
  80. 80. © Matthew Bass 2013 Architecting for the Cloud Misc Topics
  81. 81. © Matthew Bass 2013 Topics These are topics that have architectural implications and do not fit neatly into one of the other lectures. • Zookeeper • Failure in the cloud • Business continuity • Release planning • Managing configuration parameters • Monitoring
  82. 82. © Matthew Bass 2013 Zookeeper • Zookeeper is intended to manage distributed coordination – Synchronization – Data
  83. 83. © Matthew Bass 2013 Distributed applications • Zookeeper provides a guaranteed consistent (mostly) data structure for every instance of a distributed application. – Definition of “mostly” is within the eventual consistency lag (but this is small). More on eventual consistency later. • Zookeeper deals with managing failure as well as consistency. – Done using the Zab atomic broadcast protocol (a relative of Paxos). • Zookeeper guarantees that service requests are linearly ordered and processed in FIFO order
  84. 84. © Matthew Bass 2013 Model • Zookeeper maintains a file type data structure – Hierarchical – Data in every node (called a znode) – Amount of data in each node assumed small (< 1 MB) – Intended for metadata • Configuration • Location • Group
  85. 85. © Matthew Bass 2013 Zookeeper znode structure / <data> /b1 <data> /b1/c1 <data> /b1/c2 <data> /b2 <data> /b2/c1 <data>
  86. 86. © Matthew Bass 2013 API Function – Type: create – write; delete – write; exists – read; get children – read; get data – read; set data – write; + others • All calls return atomic views of state – they either succeed or fail; no partial state is returned. Writes are also atomic: either they succeed or they fail, and if they fail there are no side effects.
  87. 87. © Matthew Bass 2013 Example - Group membership • Remember the load balancer. It has a list of registered servers. • The load balancer wants to know which of its servers are – Alive – Providing service • The list must be – highly available – Reflect failure of individual servers • Strict performance requirements on list manager
  88. 88. © Matthew Bass 2013 Using Zookeeper to manage group membership • Load balancer on initialization – connects to Zookeeper – Gets list of Zookeeper servers – Creates session (if a server fails – automatic fail over) • Load balancer issues Create /Servers call • If it already exists it gets a failure • Servers register by creating /Servers/my_IP • Load balancer can list children of /Servers and get their IPs. • Watcher will inform Load balancer if a server fails or leaves. • Latency is low (order of microseconds) since Zookeeper keeps data structures in memory.
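A rough in-memory simulation of the scheme above, showing how ephemeral registrations plus a watcher give the load balancer a live membership list. This only mimics the semantics described on the slide; a real client would use a ZooKeeper library such as kazoo, and the class and method names here are invented:

```python
class MembershipRegistry:
    """Simulated /Servers parent node with ephemeral children and watchers."""

    def __init__(self):
        self._children = {}   # node name -> session owning the ephemeral node
        self._watchers = []

    def register(self, name, session):
        # A server creates an ephemeral child, e.g. /Servers/<my_IP>.
        self._children[name] = session
        self._notify()

    def session_expired(self, session):
        # Ephemeral nodes vanish with their session, as on server failure.
        for name in [n for n, s in self._children.items() if s == session]:
            del self._children[name]
        self._notify()

    def watch(self, callback):
        self._watchers.append(callback)

    def list_children(self):
        return sorted(self._children)

    def _notify(self):
        for callback in self._watchers:
            callback(self.list_children())

registry = MembershipRegistry()
seen = []
registry.watch(seen.append)             # the load balancer watches /Servers
registry.register("10.0.0.1", session="s1")
registry.register("10.0.0.2", session="s2")
registry.session_expired("s1")          # server 1 dies; its node disappears
```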
  89. 89. © Matthew Bass 2013 Other use cases • Leader election • Distributed locks • Synchronization • Configuration
  90. 90. © Matthew Bass 2013 Topics • Zookeeper • Failure in the cloud • Business continuity • Release planning • Managing configuration parameters • Monitoring
  91. 91. © Matthew Bass 2013 Failures in the cloud • Cloud failures large and small • The Long Tail • Techniques for dealing with the long tail
  92. 92. © Matthew Bass 2013 Sometimes the whole cloud fails …
  93. 93. © Matthew Bass 2013 Selected Cloud Outages - 2013 • July 10, Google down for 10 minutes • June 18, Facebook down for 30 minutes • Aug 14-17 offline for three days • Aug 19, down for 40-45 minutes • Aug 22, Apple iCloud down for 11 hours • Aug 16, Google down for 5 minutes • Sept 13, AWS down for ~two hours • Nov 21, Microsoft services intermittent for ~2 hours
  94. 94. © Matthew Bass 2013 And sometimes just a part of it fails …
  95. 95. © Matthew Bass 2013 A year in the life of a Google datacenter • Typical first year for a new cluster: – ~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover) – ~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back) – ~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back) – ~5 racks go wonky (40-80 machines see 50% packetloss) – ~8 network maintenances (4 might cause ~30-minute random connectivity losses) – ~12 router reloads (takes out DNS and external vips for a couple minutes) – ~3 router failures (have to immediately pull traffic for an hour) – ~dozens of minor 30-second blips for dns – ~1000 individual machine failures – ~thousands of hard drive failures • slow disks, bad memory, misconfigured machines, flaky machines, dead horses, etc.
  96. 96. © Matthew Bass 2013 Amazon failure statistics • In a data center with ~64,000 servers with 2 disks each ~5 servers and ~17 disks fail every day.
  97. 97. © Matthew Bass 2013 What does this mean for a consumer of the cloud? • You need to be concerned about “long tail” distribution for requests due to piecewise failure • You need to be concerned about business continuity due to overall failure.
  98. 98. © Matthew Bass 2013 Short digression into probability • A distribution describes the probability that any given reading will have a particular value. • Many phenomena in nature are “normally distributed”. • Most values will cluster around the mean with progressively smaller numbers of values going toward the edges. • In a normal distribution the mean is equal to the median
  99. 99. © Matthew Bass 2013 Long Tail • In a long tail distribution, there are some values far from the mean. • These values are sufficient to influence the mean. • The mean and the median are dramatically different in a long tail distribution.
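A quick numeric illustration with made-up latencies: a handful of stragglers is enough to pull the mean well above the median:

```python
import statistics

# 95 typical requests plus 5 long-tail stragglers (invented numbers).
latencies = [23.0] * 95 + [200.0] * 5

mean = statistics.mean(latencies)      # dragged upward by the 5 stragglers
median = statistics.median(latencies)  # unaffected by the tail
```

The median stays at the typical value while the mean moves far above it, which is the signature of a long-tail distribution.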
  100. 100. © Matthew Bass 2013 What does this mean? • If there is a partial failure of the cloud some activities will take a long time to complete and exhibit a long tail. • The figure shows the distribution of 1000 AWS “launch instance” calls. • 4.5% of calls were “long tail” [EC2 launch instance: mean 27.81 s, median 23.10 s, std 25.12 s, max 202.3 s]
  101. 101. © Matthew Bass 2013 What can you do to prevent long tail problems? • “Hedged” request. Suppose you wish to launch 10 instances. Issue 11 requests. Terminate the request that has not completed when 10 are completed. • “Alternative” request. In the above scenario, issue 10 requests. When 8 requests have completed issue 2 more. Cancel the last 2 to respond. • Using these techniques reduces the time of the longest of the 1000 launch instance requests from 202 sec to 51 sec.
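The hedged-request technique above can be sketched with `concurrent.futures`. Here `launch_instance` is a stand-in that just sleeps, and the timings are invented; issuing N + 1 requests and keeping the first N to complete bounds the damage a single straggler can do:

```python
import concurrent.futures
import random
import time

def launch_instance(i):
    # Stand-in for a cloud API call: usually fast, occasionally a straggler.
    time.sleep(random.choice([0.01, 0.01, 0.01, 0.3]))
    return i

def hedged_launch(n_needed, n_extra=1):
    """Issue n_needed + n_extra requests; return the first n_needed results."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_needed + n_extra) as pool:
        futures = [pool.submit(launch_instance, i) for i in range(n_needed + n_extra)]
        done = []
        for future in concurrent.futures.as_completed(futures):
            done.append(future.result())
            if len(done) == n_needed:
                # Abandon whichever requests have not finished yet.
                for f in futures:
                    f.cancel()
                return done
    return done

results = hedged_launch(10)
```

The "alternative request" variant would instead submit the 2 extra requests only after 8 of the original 10 have completed, trading a little latency for fewer wasted calls.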
  102. 102. © Matthew Bass 2013 Topics • Zookeeper • Failure in the cloud • Business continuity • Release planning • Managing configuration parameters • Monitoring
  103. 103. © Matthew Bass 2013 Business continuity • Business continuity means that the business should continue to provide service even if a disaster such as a fire, flood, or cloud outage occurs. • Two numbers characterize a business continuity strategy – RTO is the Recovery Time Objective – how long before the service is available again – RPO is the Recovery Point Objective – the point in time that the system rolls back to, i.e. how much data can potentially be lost • Allows for cost/benefit trade offs. • Many industries such as banks have compliance rules that require business continuity policies and practices.
  104. 104. © Matthew Bass 2013 How does business continuity work? • Replicate site in physically distant location. • Recall DNS server with multiple sites • If first site does not respond promptly, client will try second site. [Diagram: DNS holds both addresses; when Site 1 (123.45.67.89) fails, the client fails over to Site 2 (456.77.88.99)]
  105. 105. © Matthew Bass 2013 What does it mean to “replicate site”? • Must have a parallel datacenter • Data must be replicated within RPO – If RPO is small or zero this implies DB replication – If RPO is larger then other means can be used to replicate data • Software must also be replicated. – Versions must be identical in both sites • Using different versions in different sites may result in different results. • Configurations in the two sites will be different but must yield the same results. • Replication of a site incurs costs. You may wish to increase the RPO and just copy (back up) data to another site.
  106. 106. © Matthew Bass 2013 Recall discussion about DNS servers • There is a hierarchy of DNS servers. • Local DNS servers are under the control of the local organization. • When a disaster happens, the new data center can be made operative by changing the IP address in the local DNS server.
  107. 107. © Matthew Bass 2013 What are the architectural implications? • State maintained in servers will be lost if a disaster happens • Dependencies other than configuration parameters must be identical in a replicated site. • Applications must be architected to be movable from one environment to another.
  108. 108. © Matthew Bass 2013 Topics • Zookeeper • Failure in the cloud • Business continuity • Release planning • Managing configuration parameters • Monitoring
  109. 109. © Matthew Bass 2013 Dependencies • There exist many different types of dependencies within a system, e.g. – Inter-component – Version – Configuration parameters – Hardware – Location – Names – DB schemas – Platform – Libraries • Inconsistency among these dependencies is a common source of production-time errors.
  110. 110. © Matthew Bass 2013 For example • You develop some code on your desktop. – You have installed the latest Java update – You configure your code to use a Python script to do some data cleansing – You depend on a component that your colleagues are simultaneously developing. • You deploy your code into production. – The latest Java version has not been installed. – Python has not been installed in the production environment. – Your colleagues are delayed in their development.
  111. 111. © Matthew Bass 2013 You finally get your code into production • A user has a problem and calls the help desk. • The help desk doesn’t know how to solve the problem and escalates it back to you. • You have gone on vacation.
  112. 112. © Matthew Bass 2013 Problems lead to a requirement for a formal “release plan” 1. Define and agree release and deployment plans with customers/stakeholders. 2. Ensure that each release package consists of a set of related assets and service components that are compatible with each other. 3. Ensure that the integrity of a release package and its constituent components is maintained throughout the transition activities and recorded accurately in the configuration management system. 4. Ensure that all release and deployment packages can be tracked, installed, tested, verified, and/or uninstalled or backed out, if appropriate. 5. Ensure that change is managed during the release and deployment activities. 6. Record and manage deviations, risks, and issues related to the new or changed service, and take necessary corrective action. 7. Ensure that there is knowledge transfer to enable the customers and users to optimise their use of the service to support their business activities. 8. Ensure that skills and knowledge are transferred to operations and support staff to enable them to effectively and efficiently deliver, support and maintain the service, according to required warranties and service levels
  113. 113. © Matthew Bass 2013 Release planning is labor intensive • Note the requirements for coordination in the release plan • Each item requires multiple people and time-consuming activities – Time-consuming activities delay introducing the features included in the release • Open questions – Which items are dealt with through process? – Which items are dealt with through tool support? – Which items are dealt with through architecture design? – Which items are dealt with through a combination of the above? • We will see an architecture designed to reduce team coordination in a subsequent lecture.
  114. 114. © Matthew Bass 2013 Topics • Zookeeper • Failure in the cloud • Business continuity • Release planning • Managing configuration parameters • Monitoring
  115. 115. © Matthew Bass 2013 What is a configuration parameter? • A configuration parameter or environment variable is a parameter for an application that either controls the behavior of the application or specifies a connection of the application to its environment – Thread pool or database connection pool sizes control the behavior of the application. – A database URL specifies a connection of the app to a database.
  116. 116. © Matthew Bass 2013 When are configuration parameters bound? • Recommended practice is to bind these at initialization time for the app. – The app is loaded into an execution environment – The app is told where to find configuration parameters through language, OS, or environment specific means, e.g. the arguments to main in C – The app reads configuration parameters from the specified location. • The virtue of this approach is that an app can be loaded into different execution environments and doesn’t need to be aware of which environment it is in.
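A minimal sketch of initialization-time binding, assuming the parameters arrive as environment variables; the variable names and defaults are invented for illustration:

```python
import os

def load_config(environ=os.environ):
    """Bind configuration parameters at initialization time.

    The app reads its parameters from whatever environment it was loaded
    into, so the same code runs unchanged in test and production."""
    return {
        "db_url": environ.get("APP_DB_URL", "postgres://localhost/dev"),
        "pool_size": int(environ.get("APP_DB_POOL_SIZE", "5")),
    }

# In integration test, the environment points the app at the test database:
cfg = load_config({"APP_DB_URL": "postgres://testhost/testdb"})
print(cfg["db_url"])     # postgres://testhost/testdb
print(cfg["pool_size"])  # 5 (the default)
```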
  117. 117. © Matthew Bass 2013 Use the DB as an example – Unit test • The app is given the URL of a database access component. • In the case of a unit test, the database access component maintains some fake data in memory for fast access without the overhead of the full DB.
  118. 118. © Matthew Bass 2013 Integration Test • A test database is maintained for integration testing. • The test database has a subset of the full database. • The URL of the test database is provided to the app • The app can read or write the test database
  119. 119. © Matthew Bass 2013 Performance testing • A special database access component exists for performance testing – Passes reads through to the production database – Writes to a mirror database • The app is given the URL of the special database access component • Allows testing with real data while blocking writes to the real database • The mirror database is checked at the end of the test for correctness.
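The unit-test and performance-test database access components described above can be sketched like this; the class names, interface, and data are hypothetical:

```python
class FakeDB:
    """Unit test: fake data kept in memory, no real database behind it."""
    def __init__(self, rows=None):
        self.rows = dict(rows or {})
    def read(self, key):
        return self.rows.get(key)
    def write(self, key, value):
        self.rows[key] = value

class MirroringDB:
    """Performance test: reads pass through to the production store,
    writes are diverted to a mirror that is checked after the test."""
    def __init__(self, production, mirror):
        self.production = production
        self.mirror = mirror
    def read(self, key):
        return self.production.read(key)
    def write(self, key, value):
        self.mirror.write(key, value)  # never touches production

# The app only sees whichever component its configuration names:
prod = FakeDB({"part-1": "transmission"})  # stands in for the real DB here
mirror = FakeDB()
db = MirroringDB(prod, mirror)
print(db.read("part-1"))     # transmission (read through to "production")
db.write("part-2", "brake pads")
print(prod.read("part-2"))   # None — production untouched
print(mirror.read("part-2")) # brake pads — captured for the post-test check
```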
  120. 120. © Matthew Bass 2013 Other configuration parameters • Other configuration parameters should be identical from integration test through to production. • Reduces possibility of incorrect specification of configuration parameters. – Incorrect specification of configuration parameters is a major source of deployment errors.
  121. 121. © Matthew Bass 2013 Topics • Zookeeper • Failure in the cloud • Business continuity • Release planning • Managing configuration parameters • Monitoring
  122. 122. © Matthew Bass 2013 Monitoring • When is this done? • Why is it done? • What can you get from monitoring? • Data sources – monitors/logs
  123. 123. © Matthew Bass 2013 What is monitoring? • Monitoring is the collection of data from individual or collections of systems during the runtime of these systems. • Isn’t this an operations problem and not an architectural problem? – No. • Operators are first-class stakeholders and their needs should be considered when designing the system. • In the modern world, difficult runtime problems are solved by the architect, so it’s to your advantage that the correct information is available. • Other reasons are implicit in the uses of monitoring information, which we are about to go into.
  124. 124. © Matthew Bass 2013 Why monitor? 1. Identifying failures and the associated faults both at runtime and during post-mortems held after a failure has occurred. 2. Identifying performance problems both of individual systems and collections of interacting systems. 3. Characterizing workload for both short term and long term billing and capacity planning purposes. 4. Measuring user reactions to various types of interfaces or business offerings. We will discuss A/B testing later. 5. Detecting intruders who are attempting to break into the system. (outside of our scope).
  125. 125. © Matthew Bass 2013 Basic metrics • Per VM instance, the provider will collect – CPU utilization – Disk reads/writes – Messages in/out • These metrics are used for – Charging – Scaling – Mapping utilization to workload • Similar types of metrics exist for storage and utilities • These metrics can be aggregated over autoscaling groups, regions, accounts, etc.
  126. 126. © Matthew Bass 2013 Other metrics • The problem with the basic metrics is that they are not related to particular activities whether business or internal. • Other things to monitor – Transactions – transactions per second gives the business an idea of how many customers are utilizing the system. – Transactions by type. – Messages from one portion of the system to another. – Error conditions detected by different portions of the system – … anything you want
  127. 127. © Matthew Bass 2013 How do I decide what to monitor? • Look at the reasons for monitoring – Failure detection – Performance degradation – Workload characterization – User reactions • For each reason, – decide what symptoms you would like reported – place responsibilities to detect symptoms in various modules – decide on active/passive monitoring (discussed soon) – decide what constitutes an alarm (discussed soon) – keep the reporting logic (levels of reporting) under configuration control
  128. 128. © Matthew Bass 2013 Metadata is crucial • Data by itself is not that useful. • It must be tagged with identifying information, including a timestamp. • For example – VM CPU usage divided among which processes – I/O requests to which disks triggered from which VM process – Messages from which component to which other component in response to what user requests. • Ideal – each user request is given a tag, and all monitoring information produced as a consequence of satisfying that request is tagged with the request ID. • Other monitoring activities are tagged with an ID that identifies why the activity was triggered.
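The request-tagging ideal can be sketched in a few lines. The record format, field names, and in-memory sink are invented for illustration:

```python
import time
import uuid

records = []  # stands in for the monitoring system's sink

def new_request_id():
    return uuid.uuid4().hex[:8]

def monitor(event, request_id, **fields):
    """Emit one monitoring record tagged with a timestamp and the ID of the
    user request that caused it, so effects can later be joined to causes."""
    record = {"ts": time.time(), "request_id": request_id, "event": event, **fields}
    records.append(record)
    return record

# Every record produced while serving one user request carries the same tag:
rid = new_request_id()
monitor("order.received", rid, component="frontend")
monitor("db.write", rid, component="order-service", table="orders")
print(all(r["request_id"] == rid for r in records))  # True
```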
  129. 129. © Matthew Bass 2013 Why this emphasis on metadata? • Any of the uses enumerated for monitoring data require associating effect with its cause. • The monitoring data represents the effect. • The metadata enables determining the cause.
  130. 130. © Matthew Bass 2013 Active/Passive • Active data collection: the component that generates the data emits it itself, periodically or based on a triggering event – To a key-value store – To a file – As a message to a known location • Passive data collection: the component that generates the data makes it available to an agent in the same address space. The agent emits the data, either periodically or based on events.
  131. 131. © Matthew Bass 2013 Data Collection • Whether collection is active or passive, the data is emitted from a component to a known location, periodically or based on events. (Diagram: an agent on each system/application forwards the data to a central monitoring system.)
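The two collection styles can be contrasted in a small sketch; the components, the metric, and the list standing in for the monitoring system are all simulated:

```python
import time

class ActiveComponent:
    """Active collection: the component itself pushes its metrics on each event."""
    def __init__(self, sink):
        self.sink = sink
        self.requests = 0
    def handle_request(self):
        self.requests += 1
        self.sink.append(("requests", self.requests, time.time()))  # push immediately

class PassiveComponent:
    """Passive collection: the component only exposes data; a co-located
    agent reads and emits it on its own schedule."""
    def __init__(self):
        self.requests = 0
    def handle_request(self):
        self.requests += 1

def agent_poll(component, sink):
    sink.append(("requests", component.requests, time.time()))

sink = []                # stands in for the monitoring system
a = ActiveComponent(sink)
a.handle_request()       # pushed at the moment it happens
p = PassiveComponent()
p.handle_request()
p.handle_request()
agent_poll(p, sink)      # the agent pulls periodically
print(len(sink))  # 2
```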
  132. 132. © Matthew Bass 2013 Monitoring Systems • Data collecting tools – Nagios – Sensu – Icinga – CloudWatch (AWS specific)
  133. 133. © Matthew Bass 2013 Volumes of data • It is possible to generate huge amounts of data. • That is the purpose of data collating tools – Logstash – Splunk • Features of such tools – Collating data from different instances – Visualization – Filtering – Organizing data – Reports
  134. 134. © Matthew Bass 2013 Alarms • An alarm is a specific message about some condition needing attention. – Can be e-mail, text, or on screen for operators. • Problems with alarms – False positives – an alarm is raised without justification – False negatives – justification exists but no alarm is raised.
  135. 135. © Matthew Bass 2013 Summary • Distributed coordination problems are simplified when using a tool such as Zookeeper • You must expect failure in the cloud and prepare for it. • A disaster is when everything has failed and you need to have business continuity plans • Flexibility in the cloud is managed by setting configuration parameters and they need to be managed. • Monitoring lets you know what is going on with your system from whatever perspective you wish. But, you must choose your perspective.
  136. 136. © Matthew Bass 2013 Questions??