Distributing Your Data

1,352 views

Published on

A presentation on a few of the fundamental concepts behind distributed data stores. Presented at Devnation Portland 2010 http://devnation.us/events/10.

Published in: Technology, Design
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,352
On SlideShare
0
From Embeds
0
Number of Embeds
209
Actions
Shares
0
Downloads
8
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Distributing Your Data

  1. 1. HELLO DEVDAY PDX 2010
  2. 2. Justin Marney HELLO DEVDAY PDX 2010
  3. 3. Viget Labs Justin Marney HELLO DEVDAY PDX 2010
  4. 4. GMU BCS ‘06 Viget Labs Justin Marney HELLO DEVDAY PDX 2010
  5. 5. Ruby ‘07 GMU BCS ‘06 Viget Labs Justin Marney PDX 2010
  6. 6. Viget ‘08 Ruby ‘07 GMU BCS ‘06 Viget Labs PDX 2010
  7. 7. PDX 2010
  8. 8. PDX 2010
  9. 9. DISTRIBUTING YOUR DATA PDX 2010
  10. 10. WHY? DISTRIBUTING YOUR DATA PDX 2010
  11. 11. web applications are judged by their level of availability WHY? DISTRIBUTING YOUR DATA PDX 2010
  12. 12. ability to continue operating during failure scenarios web applications are judged by their level of availability WHY? DISTRIBUTING PDX 2010 YOUR DATA
  13. 13. ability to manage availability during failure scenarios ability to continue operating during failure scenarios web applications are judged by their level of availability WHY? PDX 2010
  14. 14. increase throughput ability to manage availability during node failure ability to continue operating during failure scenarios web applications are judged by their level of availability PDX 2010 WHY?
  15. 15. increase durability increase throughput ability to manage availability during node failure ability to continue operating during failure scenarios web applications are judged by their level of availability PDX 2010
  16. 16. increase scalability increase durability increase throughput ability to manage availability during node failure ability to continue operating during failure scenarios PDX 2010
  17. 17. SCALABILITY "I can add twice as much X and get twice as much Y." X = processor, RAM, disks, servers, bandwidth Y = throughput, storage space, uptime PDX 2010
  18. 18. SCALABILITY scalability is a ratio. 2:2 = linear scalability ratio scalability ratio allows you to predict how much it will cost you to grow. PDX 2010
  19. 19. SCALABILITY UP/DOWN/VERTICAL/ HORIZONTAL/L/R/L/R/A/ B/START PDX 2010
  20. 20. SCALABILITY UP grow your infrastructure multiple data centers higher bandwidth faster machines PDX 2010
  21. 21. SCALABILITY DOWN shrink your infrastructure mobile set-top laptop PDX 2010
  22. 22. SCALABILITY VERTICAL add to a single node CPU RAM RAID PDX 2010
  23. 23. SCALABILITY HORIZONTAL add more nodes distribute the load commodity cost limited only by capital PDX 2010
  24. 24. @gary_hustwit: Dear Twitter: when a World Cup match is at the 90th minute, you might want to turn on a few more servers. PDX 2010
  25. 25. ASYNCHRONOUS A distributed transaction is bound by availability of all nodes. PDX 2010
  26. 26. ASYNCHRONOUS A distributed transaction is bound by availability of all nodes. (.99^1) = .99 (.99^2) = .98 (.99^3) = .97 PDX 2010
  27. 27. ASYNCHRONOUS Asynchronous systems operate without the concept of global state. The concurrency model more accurately reflects the real world. PDX 2010
  28. 28. ASYNCHRONOUS Asynchronous systems operate without the concept of global state. The concurrency model more accurately reflects the real world. What about my ACID!? PDX 2010
  29. 29. ACID Atomic Series of database operations either all occur, or nothing occurs. Consistent Transaction does not violate any integrity constraints during execution. Isolated Cannot access data that is modified during an incomplete transaction. Durable Transactions that have committed will survive permanently. PDX 2010
  30. 30. ACID Defines a set of characteristics that aim to ensure consistency. What happens when we realize that in order scale we need to distribute our data and handle asynchronous operations? PDX 2010
  31. 31. ACID Without global state, no Atomicity. Without a linear timeline, no transactions and no Isolation. The canonical location of data might not exist, therefore no D. PDX 2010
  32. 32. Without A, I, or D, Consistency in terms of entity integrity is no longer guaranteed. PDX 2010
  33. 33. CAP Theorem Eric Brewer @ 2000 Principles of Distributed Computing (PODC). Seth Gilbert and Nancy Lynch published a formal proof in 2002. PDX 2010
  34. 34. CAP Acronym Consistency: Multiple values for the same piece of data are not allowed. Availability: If a non-failing node can be reached the system functions. Partition-Tolerance: Regardless of packet loss, if a non-failing node is reached the system functions. PDX 2010
  35. 35. CAP Theorem Consistency, Availability, Partition- Tolerance: Choose One... PDX 2010
  36. 36. CAP Theorem Single node systems bound by CAP. 100% Partition-tolerant 100% Consistent No Availability Guarantee PDX 2010
  37. 37. CAP Theorem Multi-node systems bound by CAP. CA : DT, 2PC, ACID CP : Quorum, distributed databases AP : Dynamo, no ACID PDX 2010
  38. 38. CAP Theorem CAP doesn't say AP systems are the solution to your problem. Not an absolute decision. Most systems are a hybrid of CA, CP, & AP. PDX 2010
  39. 39. CAP Theorem Understand the trade-offs and use that understanding to build a system that fails predictably. Enables you to build a system that degrades gracefully during a failure. PDX 2010
  40. 40. BASE Dan Pritchett BASE: An ACID Alternative Associate for Computing Machinery Queue, 2008 PDX 2010
  41. 41. BASE BASE: An ACID Alternative Basically Available Soft State Eventually Consistent PDX 2010
  42. 42. BASE BASE: An ACID Alternative Basically Available Soft State Eventually Consistent PDX 2010
  43. 43. Eventually Consistent Rename to Managed Consistency. Does not mean probable or hopeful or indefinite time in the future. Describes what happens during a failure. PDX 2010
  44. 44. Eventually Consistent During certain scenarios a decision must be made to either return inconsistent data or deny a request. EC allows you control the level of consistency vs. availability in your application. PDX 2010
  45. 45. Eventually Consistent In order to achieve availability in an asynchronous system, accept that failures are going to happen. Understand failure points and know what you are willing to give up in order to achieve availability. PDX 2010
  46. 46. How can we model the operations we perform on our data to be asynchronous & EC? PDX 2010
  47. 47. Model system as a network of independent components. Partition components along functional boundaries. Don't interact with your data as one big global state. PDX 2010
  48. 48. This doesn't meant every part of your system must operate this way! Use ACID 2.0 to help identify and architect components than can. PDX 2010
  49. 49. ACID 2.0 Associative Order of operations does not change the result. Commutative Operations can be aggregated in any order. Idempotent Operation can be applied multiple times without changing the result. Distributed Operations are distributed and processed asynchronously. PDX 2010
  50. 50. OPS BROS Incremental scalability Homogeneous node responsibilities Heterogeneous node capabilities PDX 2010
  51. 51. LINKS Base: An ACID Alternative Into the Clouds on New Acid Brewer's CAP theorem Embracing Concurrency At Scale Amazon's Dynamo PDX 2010
  52. 52. ME http://sorescode.com http://github.com/gotascii http://spkr8.com/s/1 @vigemarn PDX 2010

×