Distributing Your Data
 


A presentation on a few of the fundamental concepts behind distributed data stores. Presented at Devnation Portland 2010 http://devnation.us/events/10.



    Distributing Your Data: Presentation Transcript

    • HELLO DEVDAY PDX 2010
    • Justin Marney, Viget Labs: GMU BCS ‘06, Ruby ‘07, Viget ‘08
    • DISTRIBUTING YOUR DATA
    • WHY? Web applications are judged by their level of availability: the ability to continue operating during failure scenarios and the ability to manage availability during node failure. Increase throughput. Increase durability. Increase scalability.
    • SCALABILITY "I can add twice as much X and get twice as much Y." X = processor, RAM, disks, servers, bandwidth. Y = throughput, storage space, uptime.
    • SCALABILITY Scalability is a ratio. 2:2 = linear scalability ratio. The scalability ratio allows you to predict how much it will cost you to grow.
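The ratio idea on the slide above can be sketched in a few lines. This is only an illustration: the function names `scalability_ratio` and `resources_needed` are hypothetical, and the cost estimate assumes the measured ratio stays constant as the system grows, which real systems rarely guarantee.

```python
def scalability_ratio(resource_growth: float, output_growth: float) -> float:
    """Output growth per unit of resource growth; 1.0 means linear (2:2)."""
    if resource_growth <= 0:
        raise ValueError("resource_growth must be positive")
    return output_growth / resource_growth


def resources_needed(target_output_growth: float, ratio: float) -> float:
    """Estimate the resource multiplier needed for a target output multiplier.

    Assumption: the ratio measured so far keeps holding as the system grows.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return target_output_growth / ratio


# Doubling X doubles Y: the linear 2:2 case from the slide.
linear = scalability_ratio(2.0, 2.0)

# Doubling X only yields 1.5 times as much Y: sub-linear scaling.
sublinear = scalability_ratio(2.0, 1.5)

# With the sub-linear ratio, tripling Y is estimated to need 4 times as much X.
servers_factor = resources_needed(3.0, sublinear)
```

A ratio below 1.0 is the signal that each additional unit of X buys less Y, so growth will cost more than a linear estimate suggests.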
    • SCALABILITY UP/DOWN/VERTICAL/HORIZONTAL/L/R/L/R/A/B/START
    • SCALABILITY UP Grow your infrastructure: multiple data centers, higher bandwidth, faster machines.
    • SCALABILITY DOWN Shrink your infrastructure: mobile, set-top, laptop.
    • SCALABILITY VERTICAL Add to a single node: CPU, RAM, RAID.
    • SCALABILITY HORIZONTAL Add more nodes: distribute the load, commodity cost, limited only by capital.
    • @gary_hustwit: Dear Twitter: when a World Cup match is at the 90th minute, you might want to turn on a few more servers.
    • ASYNCHRONOUS A distributed transaction is bound by the availability of all nodes: (.99^1) = .99, (.99^2) = .98, (.99^3) = .97
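The arithmetic on this slide can be reproduced with a short sketch. It assumes every node has the same availability and that nodes fail independently; neither assumption is stated on the slide, and the function name `transaction_availability` is hypothetical.

```python
def transaction_availability(node_availability: float, node_count: int) -> float:
    """Availability of a transaction that needs every participating node up.

    Assumes identical, independently failing nodes, so the individual
    availabilities simply multiply.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return node_availability ** node_count


# Rounded to two places, these match the slide: 0.99, 0.98, 0.97.
for nodes in (1, 2, 3):
    print(nodes, round(transaction_availability(0.99, nodes), 2))
```

Each node added to the transaction lowers the availability of the whole, which is the motivation for the asynchronous approach that follows.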
    • ASYNCHRONOUS Asynchronous systems operate without the concept of global state. The concurrency model more accurately reflects the real world. What about my ACID!?
    • ACID Atomic: a series of database operations either all occur, or nothing occurs. Consistent: a transaction does not violate any integrity constraints during execution. Isolated: cannot access data that is modified during an incomplete transaction. Durable: transactions that have committed will survive permanently.
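As a single-node illustration of the Atomic and Consistent properties, here is a minimal sketch using Python's standard `sqlite3` module. The `accounts` table, the `transfer` and `balances` helpers, and the sample data are all hypothetical and not part of the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause is the integrity constraint that must never be violated.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move funds in one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balance of every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# The debit would make alice's balance negative, so the earlier credit
# to bob is rolled back as well.
overdraft_ok = transfer(conn, "alice", "bob", 150)
after_overdraft = balances(conn)

# A valid transfer commits both updates together.
valid_ok = transfer(conn, "alice", "bob", 40)
after_valid = balances(conn)
```

The rollback works here because one process sees one global state, which is exactly what a distributed, asynchronous system gives up.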
    • ACID Defines a set of characteristics that aim to ensure consistency. What happens when we realize that in order to scale we need to distribute our data and handle asynchronous operations?
    • ACID Without global state, no Atomicity. Without a linear timeline, no transactions and no Isolation. The canonical location of data might not exist, therefore no Durability.
    • Without A, I, or D, Consistency in terms of entity integrity is no longer guaranteed.
    • CAP Theorem Eric Brewer @ 2000 Principles of Distributed Computing (PODC). Seth Gilbert and Nancy Lynch published a formal proof in 2002.
    • CAP Acronym Consistency: Multiple values for the same piece of data are not allowed. Availability: If a non-failing node can be reached the system functions. Partition-Tolerance: Regardless of packet loss, if a non-failing node is reached the system functions.
    • CAP Theorem Consistency, Availability, Partition-Tolerance: Choose Two...
    • CAP Theorem Single node systems bound by CAP: 100% Partition-tolerant, 100% Consistent, No Availability Guarantee.
    • CAP Theorem Multi-node systems bound by CAP. CA: DT, 2PC, ACID. CP: Quorum, distributed databases. AP: Dynamo, no ACID.
    • CAP Theorem CAP doesn't say AP systems are the solution to your problem. Not an absolute decision. Most systems are a hybrid of CA, CP, & AP.
    • CAP Theorem Understand the trade-offs and use that understanding to build a system that fails predictably. Enables you to build a system that degrades gracefully during a failure.
    • BASE Dan Pritchett, "BASE: An ACID Alternative", Association for Computing Machinery (ACM) Queue, 2008
    • BASE: An ACID Alternative. Basically Available, Soft State, Eventually Consistent.
    • Eventually Consistent Rename to Managed Consistency. Does not mean probable or hopeful or indefinite time in the future. Describes what happens during a failure.
    • Eventually Consistent During certain scenarios a decision must be made to either return inconsistent data or deny a request. EC allows you to control the level of consistency vs. availability in your application.
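The choice between returning possibly inconsistent data and denying a request can be sketched as a tunable read. Everything here is hypothetical and simplified: the `Replica` class, the `read` function, and the `required_responses` knob are illustrative and do not model any particular data store.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    value: str
    version: int
    reachable: bool = True


class Unavailable(Exception):
    """Raised when a request is denied rather than answered from too few replicas."""


def read(replicas: List[Replica], required_responses: int) -> str:
    """Return the newest value among reachable replicas.

    A low required_responses favors availability and may return stale data.
    A high one favors consistency and may deny the request during a failure.
    """
    responses = [r for r in replicas if r.reachable]
    if len(responses) < required_responses:
        raise Unavailable(
            f"{len(responses)} replicas answered, {required_responses} required"
        )
    return max(responses, key=lambda r: r.version).value


# The replica holding the newest write is cut off by a failure.
replicas = [
    Replica("old", version=1),
    Replica("old", version=1),
    Replica("new", version=2, reachable=False),
]

# Favor availability: the request succeeds but returns stale data.
stale = read(replicas, required_responses=1)

# Favor consistency: requiring every replica denies the request instead.
try:
    read(replicas, required_responses=3)
    denied = False
except Unavailable:
    denied = True
```

Making the knob explicit is one way to decide, per operation, what the application is willing to give up during a failure.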
    • Eventually Consistent In order to achieve availability in an asynchronous system, accept that failures are going to happen. Understand failure points and know what you are willing to give up in order to achieve availability.
    • How can we model the operations we perform on our data to be asynchronous & EC?
    • Model system as a network of independent components. Partition components along functional boundaries. Don't interact with your data as one big global state.
    • This doesn't mean every part of your system must operate this way! Use ACID 2.0 to help identify and architect components that can.
    • ACID 2.0 Associative: operations can be grouped (aggregated) in any way without changing the result. Commutative: the order of operations does not change the result. Idempotent: an operation can be applied multiple times without changing the result. Distributed: operations are distributed and processed asynchronously.
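A small sketch of an operation that satisfies the first three ACID 2.0 properties: merging state by set union. The `merge` and `apply_all` helpers and the sample updates are hypothetical and chosen only to make the properties checkable.

```python
from itertools import permutations
from typing import FrozenSet, Iterable


def merge(a: FrozenSet[str], b: FrozenSet[str]) -> FrozenSet[str]:
    """Combine two states with set union."""
    return a | b


def apply_all(ops: Iterable[FrozenSet[str]]) -> FrozenSet[str]:
    """Fold a sequence of updates into an initially empty state."""
    state: FrozenSet[str] = frozenset()
    for op in ops:
        state = merge(state, op)
    return state


updates = [frozenset({"x"}), frozenset({"y"}), frozenset({"x", "z"})]
a, b, c = updates

# Associative: grouping the merges differently gives the same state.
is_associative = merge(merge(a, b), c) == merge(a, merge(b, c))

# Commutative: every delivery order converges on the same state.
is_commutative = len({apply_all(order) for order in permutations(updates)}) == 1

# Idempotent: redelivering every update changes nothing.
is_idempotent = apply_all(updates + updates) == apply_all(updates)
```

Because of these properties, nodes can receive the updates late, reordered, or more than once and still reach the same state without coordinating, which is what lets the operations be distributed and processed asynchronously.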
    • OPS BROS Incremental scalability. Homogeneous node responsibilities. Heterogeneous node capabilities.
    • LINKS Base: An ACID Alternative; Into the Clouds on New Acid; Brewer's CAP theorem; Embracing Concurrency At Scale; Amazon's Dynamo
    • ME http://sorescode.com http://github.com/gotascii http://spkr8.com/s/1 @vigemarn