HELLO DEVDAY




PDX 2010
Justin Marney
           HELLO DEVDAY



PDX 2010
Viget Labs
           Justin Marney
           HELLO DEVDAY


PDX 2010
GMU BCS ‘06
           Viget Labs
           Justin Marney
           HELLO DEVDAY
PDX 2010
Ruby ‘07
           GMU BCS ‘06
           Viget Labs
           Justin Marney
PDX 2010
Viget ‘08
           Ruby ‘07
           GMU BCS ‘06
           Viget Labs
PDX 2010
PDX 2010
PDX 2010
DISTRIBUTING
           YOUR DATA



PDX 2010
WHY?
           DISTRIBUTING
           YOUR DATA


PDX 2010
web applications are judged
           by their level of availability

           WHY?
           DISTRIBUTING
           ...
ability to continue operating
           during failure scenarios
           web applications are judged
           by the...
ability to manage availability
           during failure scenarios
           ability to continue operating
           dur...
increase throughput
           ability to manage availability
           during node failure
           ability to continu...
increase durability
           increase throughput
           ability to manage availability
           during node failur...
increase scalability
           increase durability
           increase throughput
           ability to manage availabili...
SCALABILITY
           "I can add twice as much X
           and get twice as much Y."
           X = processor, RAM, disk...
SCALABILITY
           scalability is a ratio.
           2:2 = linear scalability ratio
           scalability ratio allo...
SCALABILITY
           UP/DOWN/VERTICAL/
           HORIZONTAL/L/R/L/R/A/
           B/START




PDX 2010
SCALABILITY
           UP
           grow your infrastructure
           multiple data centers
           higher bandwidth...
SCALABILITY
           DOWN
           shrink your infrastructure
           mobile
           set-top
           laptop

...
SCALABILITY
           VERTICAL
           add to a single node
           CPU
           RAM
           RAID




PDX 2010
SCALABILITY
           HORIZONTAL
           add more nodes
           distribute the load
           commodity cost
     ...
@gary_hustwit: Dear
           Twitter: when a World Cup
           match is at the 90th
           minute, you might want...
ASYNCHRONOUS
     A distributed transaction is bound
     by availability of all nodes.




PDX 2010
ASYNCHRONOUS
     A distributed transaction is bound
     by availability of all nodes.

     (.99^1) = .99
     (.99^2) =...
ASYNCHRONOUS
     Asynchronous systems operate
     without the concept of global state.
     The concurrency model more
 ...
ASYNCHRONOUS
     Asynchronous systems operate
     without the concept of global state.
     The concurrency model more
 ...
ACID
     Atomic
     Series of database operations either all occur, or nothing occurs.


     Consistent
     Transactio...
ACID
     Defines a set of characteristics that
     aim to ensure consistency.
     What happens when we realize that
    ...
ACID
     Without global state, no Atomicity.
     Without a linear timeline, no
     transactions and no Isolation.
     ...
Without A, I, or D, Consistency in
     terms of entity integrity is no longer
     guaranteed.




PDX 2010
CAP Theorem
     Eric Brewer @ 2000 Principles of
     Distributed Computing (PODC).
     Seth Gilbert and Nancy Lynch
   ...
CAP Acronym
     Consistency: Multiple values for the
     same piece of data are not allowed.
     Availability: If a non...
CAP Theorem

     Consistency, Availability, Partition-
     Tolerance: Choose One...




PDX 2010
CAP Theorem
     Single node systems bound by CAP.
     100% Partition-tolerant
     100% Consistent
     No Availability ...
CAP Theorem
     Multi-node systems bound by CAP.
     CA : DT, 2PC, ACID
     CP : Quorum, distributed databases
     AP ...
CAP Theorem
     CAP doesn't say AP systems are
     the solution to your problem.
     Not an absolute decision.
     Mos...
CAP Theorem
     Understand the trade-offs and use
     that understanding to build a
     system that fails predictably.
...
BASE
     Dan Pritchett
     BASE: An ACID Alternative
     Associate for Computing Machinery
     Queue, 2008



PDX 2010
BASE
     BASE: An ACID Alternative
     Basically Available
     Soft State
     Eventually Consistent




PDX 2010
BASE
     BASE: An ACID Alternative
     Basically Available
     Soft State
     Eventually Consistent




PDX 2010
Eventually Consistent
     Rename to Managed Consistency.
     Does not mean probable or hopeful
     or indefinite time in...
Eventually Consistent
     During certain scenarios a decision
     must be made to either return
     inconsistent data o...
Eventually Consistent
     In order to achieve availability in an
     asynchronous system, accept that
     failures are ...
How can we model the operations
     we perform on our data to be
     asynchronous & EC?




PDX 2010
Model system as a network of
     independent components.
     Partition components along
     functional boundaries.
    ...
This doesn't meant every part of
     your system must operate this way!
     Use ACID 2.0 to help identify and
     archi...
ACID 2.0
     Associative
     Order of operations does not change the result.


     Commutative
     Operations can be a...
OPS BROS
     Incremental scalability
     Homogeneous node responsibilities
     Heterogeneous node capabilities




PDX ...
LINKS
     Base: An ACID Alternative
     Into the Clouds on New Acid
     Brewer's CAP theorem
     Embracing Concurrency...
ME
     http://sorescode.com
     http://github.com/gotascii
     http://spkr8.com/s/1
     @vigemarn



PDX 2010
Upcoming SlideShare
Loading in...5
×

Distributing Your Data

1,161
-1

Published on

A presentation on a few of the fundamental concepts behind distributed data stores. Presented at Devnation Portland 2010 http://devnation.us/events/10.

Published in: Technology, Design
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
1,161
On Slideshare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
7
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Distributing Your Data

  1. 1. HELLO DEVDAY PDX 2010
  2. 2. Justin Marney HELLO DEVDAY PDX 2010
  3. 3. Viget Labs Justin Marney HELLO DEVDAY PDX 2010
  4. 4. GMU BCS ‘06 Viget Labs Justin Marney HELLO DEVDAY PDX 2010
  5. 5. Ruby ‘07 GMU BCS ‘06 Viget Labs Justin Marney PDX 2010
  6. 6. Viget ‘08 Ruby ‘07 GMU BCS ‘06 Viget Labs PDX 2010
  7. 7. PDX 2010
  8. 8. PDX 2010
  9. 9. DISTRIBUTING YOUR DATA PDX 2010
  10. 10. WHY? DISTRIBUTING YOUR DATA PDX 2010
  11. 11. web applications are judged by their level of availability WHY? DISTRIBUTING YOUR DATA PDX 2010
  12. 12. ability to continue operating during failure scenarios web applications are judged by their level of availability WHY? DISTRIBUTING PDX 2010 YOUR DATA
  13. 13. ability to manage availability during failure scenarios ability to continue operating during failure scenarios web applications are judged by their level of availability WHY? PDX 2010
  14. 14. increase throughput ability to manage availability during node failure ability to continue operating during failure scenarios web applications are judged by their level of availability PDX 2010 WHY?
  15. 15. increase durability increase throughput ability to manage availability during node failure ability to continue operating during failure scenarios web applications are judged by their level of availability PDX 2010
  16. 16. increase scalability increase durability increase throughput ability to manage availability during node failure ability to continue operating during failure scenarios PDX 2010
  17. 17. SCALABILITY "I can add twice as much X and get twice as much Y." X = processor, RAM, disks, servers, bandwidth Y = throughput, storage space, uptime PDX 2010
  18. 18. SCALABILITY scalability is a ratio. 2:2 = linear scalability ratio scalability ratio allows you to predict how much it will cost you to grow. PDX 2010
  19. 19. SCALABILITY UP/DOWN/VERTICAL/ HORIZONTAL/L/R/L/R/A/ B/START PDX 2010
  20. 20. SCALABILITY UP grow your infrastructure multiple data centers higher bandwidth faster machines PDX 2010
  21. 21. SCALABILITY DOWN shrink your infrastructure mobile set-top laptop PDX 2010
  22. 22. SCALABILITY VERTICAL add to a single node CPU RAM RAID PDX 2010
  23. 23. SCALABILITY HORIZONTAL add more nodes distribute the load commodity cost limited only by capital PDX 2010
  24. 24. @gary_hustwit: Dear Twitter: when a World Cup match is at the 90th minute, you might want to turn on a few more servers. PDX 2010
  25. 25. ASYNCHRONOUS A distributed transaction is bound by availability of all nodes. PDX 2010
  26. 26. ASYNCHRONOUS A distributed transaction is bound by availability of all nodes. (.99^1) = .99 (.99^2) = .98 (.99^3) = .97 PDX 2010
  27. 27. ASYNCHRONOUS Asynchronous systems operate without the concept of global state. The concurrency model more accurately reflects the real world. PDX 2010
  28. 28. ASYNCHRONOUS Asynchronous systems operate without the concept of global state. The concurrency model more accurately reflects the real world. What about my ACID!? PDX 2010
  29. 29. ACID Atomic Series of database operations either all occur, or nothing occurs. Consistent Transaction does not violate any integrity constraints during execution. Isolated Cannot access data that is modified during an incomplete transaction. Durable Transactions that have committed will survive permanently. PDX 2010
  30. 30. ACID Defines a set of characteristics that aim to ensure consistency. What happens when we realize that in order scale we need to distribute our data and handle asynchronous operations? PDX 2010
  31. 31. ACID Without global state, no Atomicity. Without a linear timeline, no transactions and no Isolation. The canonical location of data might not exist, therefore no D. PDX 2010
  32. 32. Without A, I, or D, Consistency in terms of entity integrity is no longer guaranteed. PDX 2010
  33. 33. CAP Theorem Eric Brewer @ 2000 Principles of Distributed Computing (PODC). Seth Gilbert and Nancy Lynch published a formal proof in 2002. PDX 2010
  34. 34. CAP Acronym Consistency: Multiple values for the same piece of data are not allowed. Availability: If a non-failing node can be reached the system functions. Partition-Tolerance: Regardless of packet loss, if a non-failing node is reached the system functions. PDX 2010
  35. 35. CAP Theorem Consistency, Availability, Partition- Tolerance: Choose One... PDX 2010
  36. 36. CAP Theorem Single node systems bound by CAP. 100% Partition-tolerant 100% Consistent No Availability Guarantee PDX 2010
  37. 37. CAP Theorem Multi-node systems bound by CAP. CA : DT, 2PC, ACID CP : Quorum, distributed databases AP : Dynamo, no ACID PDX 2010
  38. 38. CAP Theorem CAP doesn't say AP systems are the solution to your problem. Not an absolute decision. Most systems are a hybrid of CA, CP, & AP. PDX 2010
  39. 39. CAP Theorem Understand the trade-offs and use that understanding to build a system that fails predictably. Enables you to build a system that degrades gracefully during a failure. PDX 2010
  40. 40. BASE Dan Pritchett BASE: An ACID Alternative Associate for Computing Machinery Queue, 2008 PDX 2010
  41. 41. BASE BASE: An ACID Alternative Basically Available Soft State Eventually Consistent PDX 2010
  42. 42. BASE BASE: An ACID Alternative Basically Available Soft State Eventually Consistent PDX 2010
  43. 43. Eventually Consistent Rename to Managed Consistency. Does not mean probable or hopeful or indefinite time in the future. Describes what happens during a failure. PDX 2010
  44. 44. Eventually Consistent During certain scenarios a decision must be made to either return inconsistent data or deny a request. EC allows you control the level of consistency vs. availability in your application. PDX 2010
  45. 45. Eventually Consistent In order to achieve availability in an asynchronous system, accept that failures are going to happen. Understand failure points and know what you are willing to give up in order to achieve availability. PDX 2010
  46. 46. How can we model the operations we perform on our data to be asynchronous & EC? PDX 2010
  47. 47. Model system as a network of independent components. Partition components along functional boundaries. Don't interact with your data as one big global state. PDX 2010
  48. 48. This doesn't meant every part of your system must operate this way! Use ACID 2.0 to help identify and architect components than can. PDX 2010
  49. 49. ACID 2.0 Associative Order of operations does not change the result. Commutative Operations can be aggregated in any order. Idempotent Operation can be applied multiple times without changing the result. Distributed Operations are distributed and processed asynchronously. PDX 2010
  50. 50. OPS BROS Incremental scalability Homogeneous node responsibilities Heterogeneous node capabilities PDX 2010
  51. 51. LINKS Base: An ACID Alternative Into the Clouds on New Acid Brewer's CAP theorem Embracing Concurrency At Scale Amazon's Dynamo PDX 2010
  52. 52. ME http://sorescode.com http://github.com/gotascii http://spkr8.com/s/1 @vigemarn PDX 2010
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×