Service Primitives for Internet Scale Applications



A general framework to describe internet scale applications and characterize the functional properties that can be traded away to improve the following operational metrics:

* Throughput (how many user requests/sec?)

* Interactivity (latency, how fast user requests finish?)

* Availability (% of time user perceives service as up), including fast recovery to improve availability

* TCO (Total Cost of Ownership)

Published in: Technology, Business

1. Service Primitives for Internet Scale Applications
   Amr Awadallah, Armando Fox, Ben Ling. Computer Systems Lab, Stanford University.
2. Interactive Internet-Scale Application?
   * Millions of users.
   * [Architecture diagram: geoplexed data centers behind a global load balancer; within each data center, a local load balancer fronts presentation servers and application servers with caches ($), backed by fail-over state replicas.]
3. Motivation
   * A general framework to describe IIAs and characterize the functional properties that can be traded away to improve the following operational metrics:
     * Throughput (how many user requests/sec?)
     * Interactivity (latency: how fast do user requests finish?)
     * Availability (% of time the user perceives the service as up), including fast recovery to improve availability
     * TCO (Total Cost of Ownership)
   * In particular, enumerate architectural primitives that expose partial degradation of functional properties and illustrate how they can be built with "commodity" HW.
4. Recall ACID
   * Atomicity: For a transaction involving two or more discrete pieces of information, either all pieces changed are committed or none are.
   * Consistency: A transaction creates a new valid state obeying all user integrity constraints.
   * Isolation: Changes from non-committed transactions remain hidden from all other concurrent transactions (isolation levels: Serializable, Repeatable Read, Read Committed, Read Uncommitted).
   * Durability: Committed data survives beyond system restarts and storage failures.
5. ACID is too much for Internet scale
   * Yahoo UDB: tens of thousands of reads/sec, up to 10k writes/sec.
   * Geoplexing is used for both disaster recovery and scalability, but eager replication (strong consistency) across replicas scales poorly:
     * If total DB size grows with the number of nodes, the deadlock rate increases at the same rate as the number of nodes.
     * If DB size grows sublinearly, the deadlock rate increases as the cube of the number of nodes.
   * Even if we could use transactional DBs and eager replication, the cost would be too high.
6. The New Properties
   * Durability (State): Hard, Soft, Stateless
   * Consistency: Strong, Eventual, Weak, Non-Consistent
   * Completeness: Full, Incomplete-Reads, Lossy-Writes
   * Visibility: User, Entity, World
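To make the taxonomy concrete, the four properties could be encoded as enumerations that tag each service. This is an illustrative sketch only; the class and member names are my own, not from the deck.

```python
from enum import Enum
from dataclasses import dataclass

class Durability(Enum):
    HARD = "hard"            # permanent, ACID-style durable state
    SOFT = "soft"            # RAM-like; lost on failure, cheap to rebuild
    STATELESS = "stateless"  # no per-user state kept at all

class Consistency(Enum):
    STRONG = "strong"        # single-copy ACID consistency
    EVENTUAL = "eventual"    # all reads see the new value after some time t
    WEAK = "weak"            # TACT-style ordering error / persistent staleness
    NON_CONSISTENT = "non-consistent"

class Completeness(Enum):
    FULL = "full"
    INCOMPLETE_READS = "incomplete-reads"  # lossy reads over partitioned state
    LOSSY_WRITES = "lossy-writes"          # some committed writes may be dropped

class Visibility(Enum):
    USER = "user"      # visible only to the interacting user
    ENTITY = "entity"  # visible to a group of users / data subset
    WORLD = "world"    # visible to everyone

@dataclass
class ServiceProperties:
    durability: Durability
    consistency: Consistency
    completeness: Completeness
    visibility: Visibility

# Example classification: an online poll counter is soft state with
# weak consistency, lossy writes, and world visibility.
poll = ServiceProperties(Durability.SOFT, Consistency.WEAK,
                         Completeness.LOSSY_WRITES, Visibility.WORLD)
```

Classifying a service this way makes the trade-offs explicit before choosing primitives for it.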
7. Durability (Hard, Soft, Stateless)
   * Hard: This is permanent state in the original sense of the D in ACID.
   * Soft: This is temporary storage in the RAM sense, i.e. if power fails the data is lost. This is cheaper, and acceptable if the user can rebuild the state quickly.
   * Stateless: No need to store state on behalf of the user.
8. Consistency (Strong, Eventual, Weak)
   * Eventual: After a write, there is some time t after which all reads see the new value (e.g. caching).
   * Strong: In addition, before time t, no reads see the new value (single-copy ACID consistency).
   * Weak: Weak consistency in the TACT sense; captures ordering inaccuracies or persistent staleness.
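The "some time t" in eventual consistency can be illustrated with a toy two-replica store where writes reach the replica only after a fixed delay, standing in for asynchronous replication. This is a hypothetical sketch, not from the deck.

```python
import time

class EventuallyConsistentStore:
    """Toy store: writes hit the primary immediately and reach the
    replica only after `lag` seconds (simulated async replication)."""

    def __init__(self, lag=0.05):
        self.lag = lag
        self.primary = {}
        self.replica = {}
        self.pending = []  # (apply_at, key, value) replication events

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((time.monotonic() + self.lag, key, value))

    def read_replica(self, key):
        # Apply any replication events whose delay has elapsed.
        now = time.monotonic()
        still_pending = []
        for apply_at, k, v in self.pending:
            if apply_at <= now:
                self.replica[k] = v
            else:
                still_pending.append((apply_at, k, v))
        self.pending = still_pending
        return self.replica.get(key)

store = EventuallyConsistentStore(lag=0.05)
store.write("x", 1)
stale = store.read_replica("x")   # may still be None: write not propagated
time.sleep(0.1)
fresh = store.read_replica("x")   # after time t, all reads see the write
```

Under strong consistency the stale read would instead be blocked (or served the old value everywhere) until the write is applied at every copy.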
9. Completeness (Full, Incomplete, Lossy)
   * Complete: All updates either succeed or fail synchronously. All queries return 100% accurate data.
   * Incomplete Queries: Aggregated lossy reads over partitioned state, or state sampling. The best example is Inktomi's distributed search, where it is acceptable for some partitions to return no results under load.
   * Lossy Updates: It is acceptable for some committed writes to not make it. Examples: lossy counters and online polls.
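An incomplete query in the Inktomi style can be sketched as an aggregation over index partitions that skips overloaded partitions instead of failing the whole request. Everything here (function name, data shapes) is illustrative.

```python
def lossy_aggregate(partitions, query, failed=()):
    """Aggregate search results across index partitions; partitions in
    `failed` (overloaded or down) are skipped, so the answer may be
    incomplete. Returns the hits plus the fraction of the index consulted."""
    results, answered = [], 0
    for name, docs in partitions.items():
        if name in failed:
            continue  # drop this partition's results rather than fail
        results.extend(d for d in docs if query in d)
        answered += 1
    coverage = answered / len(partitions)
    return results, coverage

partitions = {
    "p0": ["internet scale", "soft state"],
    "p1": ["internet services", "lossy counters"],
    "p2": ["internet archive"],
}
hits, coverage = lossy_aggregate(partitions, "internet", failed={"p2"})
# hits covers p0 and p1 only; coverage reports 2/3 of the index answered
```

Reporting the coverage fraction alongside the hits lets the caller decide whether a degraded answer is good enough.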
10. Visibility (World, Entity, User)
    * World: The state and changes to it are visible to the whole world, e.g. listing a product on eBay.
    * Entity: State is only visible to a group of users, or within a specific subset of the data (e.g. eBay Jewelry).
    * User: The state and changes to it are only visible to the user interacting with it, e.g. the MyYahoo user profile. This could be simpler to implement using read-my-writes techniques.
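One common read-my-writes technique is to have each session remember the version of its own last write and fall back to the primary whenever a replica is older than that, so only the writing user pays for freshness. A minimal sketch, with all names and the versioning scheme assumed:

```python
class ReadMyWritesSession:
    """Per-user session guaranteeing the user sees their own writes,
    even when reads are normally served from a lagging replica."""

    def __init__(self, primary, replica):
        self.primary = primary   # dict-like store, always current
        self.replica = replica   # dict-like store, may lag behind
        self.last_write_version = 0

    def write(self, key, value):
        version = self.primary.get("_version", 0) + 1
        self.primary[key] = value
        self.primary["_version"] = version
        self.last_write_version = version  # remember what we must see

    def read(self, key):
        # Replica caught up to this user's writes: serve the cheap copy.
        if self.replica.get("_version", 0) >= self.last_write_version:
            return self.replica.get(key)
        # Otherwise pay for a primary read to honor read-my-writes.
        return self.primary.get(key)

primary, replica = {}, {}
session = ReadMyWritesSession(primary, replica)
session.write("profile.theme", "dark")
seen = session.read("profile.theme")  # served from primary: replica is stale
```

Other users' sessions, which never wrote, can keep reading the stale replica, which is exactly why user-visible state is cheaper than world-visible state.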
11. Architectural Primitives

    | Primitives                | Trades               | Gains                                   |
    |---------------------------|----------------------|-----------------------------------------|
    | Caching, Replication      | Eventual Consistency | Interactivity, Availability, Throughput |
    | Partitioning              | Entity Visibility    | Interactivity, Graceful Degradation     |
    | Lossy/Sampled Aggregation | Weak Consistency     | Interactivity, Graceful Degradation     |
12. Examples of Primitives
    * LossyUpdate(key, newVal)
    * LossyAccumulator(key, updateOp) - for commutative ops
    * LossyAggregate(searchKeys) - lossy search of an index
13. LossyUpdate implementation
    * LossyUpdate
      * Steve Gribble's DHT: atomic ops, single-copy consistency; during failure recovery, reads are slower and writes are refused.
      * If an update occurs while the updated partition is recovering => fail.
      * Otherwise, the update is persistent.
      * When is this useful?
    * LossyAccumulator (for hit counters, online polls, etc.)
      * Every period T, in-memory sub-accumulators from worker nodes are swept to a persistent copy.
      * At the same time, the current value of the master accumulator is read back by each worker node, to serve reads locally.
      * Worker nodes don't back up the in-memory copy => fast restart.
      * Can bound the loss rate of the accumulator and the inconsistency in reads.
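The LossyAccumulator sweep described above can be sketched for a single worker node as follows. The persistent master is modeled as a plain field and the sweep is called explicitly rather than on a timer; all names are illustrative.

```python
import threading

class LossyAccumulator:
    """Sketch of the swept counter: updates accumulate in an in-memory
    delta (not backed up), and a periodic sweep folds the delta into the
    persistent master copy while refreshing a local read cache. A crash
    between sweeps loses at most one period T of updates (bounded loss),
    and restart is fast because the delta simply resets to zero."""

    def __init__(self):
        self.master = 0          # stand-in for the persistent copy
        self.delta = 0           # in-memory sub-accumulator
        self.cached_master = 0   # master value read back at sweep time
        self.lock = threading.Lock()

    def increment(self, n=1):
        # Commutative update op, so sweep order across workers is irrelevant.
        with self.lock:
            self.delta += n

    def sweep(self):
        """Run every period T: push delta to master, refresh read cache."""
        with self.lock:
            self.master += self.delta
            self.delta = 0
            self.cached_master = self.master

    def read(self):
        # Served locally; stale by at most one period plus whatever
        # other workers have not yet swept.
        return self.cached_master + self.delta

acc = LossyAccumulator()
acc.increment()
acc.increment()
before_sweep = acc.read()  # local delta already visible to this worker
acc.sweep()
after_sweep = acc.read()   # now reflected in the persistent master
```

A multi-worker deployment would run one such instance per node and have the sweep also merge the other workers' contributions into `cached_master`.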
14. What is given up
    * What is given up:
      * Strict consistency of read copies of the accumulator
      * Precision of the accumulator value (lost updates)
    * What is gained: fast recovery for each node, continuous operation despite transient per-node failures