On the Papers of Giants






Usage Rights

© All Rights Reserved

  • The problem is, databases are complicated, and I’m just not that smart. But here I am, talking to you about databases, so I must have fixed that somehow.
  • My solution was to start a user group called ChicagoDB. Every month we read an academic paper and then discuss it as a group. I saw a problem and found a solution for it.
  • The problem I’m talking about is fixating on solutions instead of problems. We tend to see this a lot when new gems are released.
  • We’re talking about the core of your application: its data. This is the heart of your app, and it shouldn’t be left to ‘just picking something cool’.
  • I think most of us would call ourselves responsible programmers. Part of that is having at least a basic understanding of the tools and technologies you use and recommend: knowledge that extends beyond the bullet points of a website or the README of a gem.
  • Particular systems were designed with different goals in mind. Do those goals match yours?
  • What kind of data model is it? Does it only offer a simple key-value store, or does it offer a richer model? Maybe it’s a graph database?

    What kind of querying?
    Can I only do a primary key lookup?
    What about secondary indexes?
    What about ad hoc querying for something like BI reporting? Embedding documents in Mongo makes this difficult. Does it still match my problem space well, or can I live with that?

    The more of these questions you can ask up front, the better off you will be.
  • While I was on the train going to the airport to fly over here, I saw this sign, and I think it sums it up pretty well. Either way you are investing time, so you might as well gain as much knowledge as you can along the way.

    Investigate before you invest: true for stock portfolios and databases.
  • With all of that said, let’s do a little bit of investigation.
  • The Dynamo paper was published in 2007. It outlined the building blocks of a distributed system: consistent hashing, vector clocks, gossip protocols, and hinted handoff were all described, although none of these things were new.
  • Riak, developed by Basho, is an open source implementation of the Dynamo system.
  • Eventually this setup will fail. Something will happen to your master database and your site will effectively go down. At this point, I think it would be pretty common for people to start looking at a master-master setup, or maybe some sort of sharding. But there are other solutions that live outside of the relational world that can help.
  • At the heart of consistent hashing is the hash function. In our oversimplified version, it only returns values between 1 and 100. Riak has a 160-bit integer space.
  • Each available node in the cluster is pseudo-randomly mapped onto the ring.
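The ring lookup described in these notes can be sketched in a few lines of Python. This is a minimal illustration and not Riak's implementation: the `Ring` class, the `ring_position` helper, and the node names are all hypothetical, SHA-1 is assumed only because its output fills a 160-bit space, and "the first node clockwise from the key owns it" is one common convention rather than something the talk specifies.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map any string to a 160-bit ring position (SHA-1 is an assumption here)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical consistent-hashing ring: nodes and keys share one hash space."""

    def __init__(self, nodes):
        # Hashing each node's name places it pseudo-randomly on the ring.
        self._points = sorted((ring_position(name), name) for name in nodes)

    def owner(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        positions = [position for position, _ in self._points]
        index = bisect.bisect_left(positions, ring_position(key))
        return self._points[index % len(self._points)][1]

    def remove(self, node: str) -> None:
        """Drop a node; its keys fall through to the next node on the ring."""
        self._points = [(p, name) for p, name in self._points if name != node]
```

Removing a node only reassigns the keys that node owned. Every other key keeps its owner, which is the property that makes the hashing "consistent".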
  • Or in other words, we now have a system with extremely high, transparent availability.
  • In order to help distribute the load evenly, we break the ring into sections and assign each section a vnode. Then those vnodes are mapped back to a real server in the cluster.
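The vnode idea in the note above can be sketched as follows. This is an assumption-laden illustration: the partition count of 64, the round-robin assignment, and the function names are all hypothetical choices made for readability, not a description of how any particular database hands out vnodes.

```python
import hashlib

NUM_PARTITIONS = 64   # hypothetical; kept small so the mapping is easy to inspect
RING_SIZE = 2 ** 160  # size of the SHA-1 hash space


def partition_for(key: str) -> int:
    """Find which equal-sized section of the ring (vnode) a key falls into."""
    position = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")
    return position * NUM_PARTITIONS // RING_SIZE


def assign_vnodes(nodes):
    """Map every vnode to a physical server, round-robin (an assumed policy)."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(key: str, assignment) -> str:
    """Resolve key -> vnode -> physical server."""
    return assignment[partition_for(key)]
```

Because each server holds many small sections instead of one large one, the load a departing server leaves behind can be spread across several peers rather than landing on a single neighbor.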
  • We all know that nothing comes for free, except for drinks last night, so what exactly have we given up?
  • We’ve all been raised on acid.

    Atomicity - Modifications must follow an all-or-nothing rule: they either complete or do not complete, and no data is left in some in-between state.
    Consistency - Nothing in your transaction will violate the rules of the database (integrity). More importantly, the database goes from one consistent state to the next.
    Isolation - Each transaction operates independently of every other transaction. Other transactions cannot see the intermediate state of a transaction that is still in progress.
    Durability - Once the database says that data is committed, there is no opportunity for that to be undone. Things like a database restart or kill -9ing the process shouldn’t cause any data loss.

    The properties of ACID are always desirable; they’re just not always possible as transaction volume grows.
  • Eric Brewer proposed this in 2000, and a formal proof was published in 2002.

    Consistency - The client perceives that a set of operations has occurred all at once.
    Availability - All nodes are available for reading and writing data.
    Partition tolerance - Operations will complete, even if individual components are unavailable.

    At any given point in time, we can only have two of these.
  • Just so we’re all clear: a partition is any time a node can’t communicate with the rest of the cluster, whether that be from hardware failure or high latency.
  • Since we’ve already determined that a node can die, we can go ahead and circle ‘P’. Really, the CAP theorem needs to be updated to reflect this: any distributed system needs tolerance for failures, and not having it is just not acceptable.
  • This is an AP system: high availability, but a weak consistency guarantee.
  • A CP system: strong consistency, but low availability. Requests only return true once a quorum is met.

    What would our system look like if we wanted to switch it to CP?
  • In order to guarantee strong consistency, we would need to do a blocking write.
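A blocking quorum write along these lines might look like the sketch below. Everything here is a simplifying assumption: the `Replica` class, the `quorum_write` function, and the parameter `w` are hypothetical names, replicas are in-process objects rather than network peers, and there is no rollback when the quorum is missed.

```python
class Replica:
    """Hypothetical in-memory replica that can be marked unreachable."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def put(self, key, value):
        """Store the value and acknowledge, or fail if the replica is down."""
        if not self.up:
            return False
        self.data[key] = value
        return True


def quorum_write(replicas, key, value, w):
    """Blocking write: report success only if at least `w` replicas acknowledge.

    Note: replicas that did acknowledge keep the value even when the
    quorum is missed; this sketch does not attempt to undo them.
    """
    acks = sum(1 for replica in replicas if replica.put(key, value))
    return acks >= w
```

With three replicas and `w=2`, the write still succeeds when one replica is down, but it fails once two are unreachable. That is the availability being traded for the stronger consistency guarantee.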
  • BASE is the opposite of ACID, get it?

    Basically Available - The system seems to be working all the time.
    Soft state - It’s not consistent all the time.
    Eventual consistency - It becomes consistent at some point in time.

    Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the database consistency will be in a state of flux.
  • A vector clock is an algorithm for generating a partial ordering of events, and can be used to detect causality violations. Each time a node updates the value for a specific key, it increments its own part of the counter.
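The comparison rules behind the vector clock slides can be sketched with plain dictionaries. This is an illustrative sketch only: the function names are hypothetical, clocks are modeled as `{node: counter}` dicts, and real systems attach extra metadata that is left out here.

```python
def increment(clock, node):
    """Return a copy of the clock with `node`'s counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock `a` has seen every update that clock `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two clocks as equal, ordered, or concurrent (a conflict)."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"


def merge(a, b):
    """Combine two clocks by taking the highest counter seen for each node."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}
```

For example, `compare({"a": 2}, {"a": 2, "b": 1})` reports that the second clock is newer because it is a superset, while `compare({"a": 4}, {"a": 2, "b": 1})` reports a conflict because neither clock has seen everything the other has.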
  • If merging doesn’t fit your domain model, the other option is to bubble this up to the application layer and have it figure things out.
  • Hinted handoffs are a way of dealing with node failure and recovery.
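One way to picture hinted handoff is the sketch below. It is a deliberately small model under stated assumptions: `StorageNode`, `write`, and `handoff` are hypothetical names, there is exactly one fallback node, and the hint is simply a record of who the write was really meant for.

```python
class StorageNode:
    """Hypothetical node holding its own data plus hints for absent peers."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (intended_owner_name, key, value) tuples


def write(owner, fallback, key, value):
    """Write to the owner, or park the write on the fallback with a hint."""
    if owner.up:
        owner.data[key] = value
    else:
        fallback.hints.append((owner.name, key, value))


def handoff(fallback, recovered):
    """Replay parked writes to a recovered node and drop the delivered hints."""
    remaining = []
    for owner_name, key, value in fallback.hints:
        if owner_name == recovered.name and recovered.up:
            recovered.data[key] = value
        else:
            remaining.append((owner_name, key, value))
    fallback.hints = remaining
```

While the owner is down the write is accepted anyway, and once it comes back the fallback hands over what it missed.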
  • Like schoolgirls, the nodes like to gossip.

    Each node randomly communicates with the other nodes in the cluster about its view of the state of the cluster.
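The gossip exchange described above can be sketched as follows. The details are assumptions made for illustration: `GossipNode` and `gossip_round` are hypothetical names, each view entry carries a version number so the newer report wins, and peers are picked with Python's `random` module rather than any real membership protocol.

```python
import random


class GossipNode:
    """Hypothetical node that tracks a versioned view of cluster membership."""

    def __init__(self, name):
        self.name = name
        # view maps member name -> (version, status); the higher version wins
        self.view = {name: (1, "up")}

    def mark(self, member, status):
        """Record a new local observation about a member, bumping its version."""
        version, _ = self.view.get(member, (0, "unknown"))
        self.view[member] = (version + 1, status)

    def exchange(self, peer):
        """Two-way merge: both sides keep the higher-versioned entry per member."""
        for member in set(self.view) | set(peer.view):
            mine = self.view.get(member, (0, "unknown"))
            theirs = peer.view.get(member, (0, "unknown"))
            newest = max(mine, theirs, key=lambda entry: entry[0])
            self.view[member] = newest
            peer.view[member] = newest


def gossip_round(nodes, rng):
    """Each node gossips with one randomly chosen peer."""
    for node in nodes:
        peer = rng.choice([other for other in nodes if other is not node])
        node.exchange(peer)
```

Repeating `gossip_round` spreads an observation such as "node 1 is down" through the cluster without any central coordinator.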
  • By doing a little investigation, we’ve now not only learned about consistent hashing, vector clocks, gossip protocols, and read repair, but we’ve also learned a bit about the problems they solve and what we’re giving up in return. In other words, we’re winning.

On the Papers of Giants: Presentation Transcript

  • On the Papers of Giants Understanding Data Storage @ethangunderson SRC 2011
  • NoSQL?
  • NoGreenland?
  • Is a key-value store
  • Is a document store
  • Is just difficult to use
  • </rant>
  • Hi, I’m Ethan. I work at
  • Hi, I’m Ethan. I <3 databases
  • Databases are complicated
  • ChicagoDB
  • The Ruby community has a similar problem
  • And no, I’m not talking about testing frameworks
  • Solutions Problems
  • BombDB! (mock product page) Home | Download. Features: Scale! Maintenance Free! Query language that is not SQL! Fire all of your DBAs! Chart: Cool Factor (0 to 10), BombDB vs. SQL
  • O. M. G. I must use this!
  • It’s not about picking a technology because it’s cool
  • It’s about doing your homework
  • What design goals were in mind?
  • What kind of data model? What kind of querying?
  • Investigate before you invest
  • Let’s do a little investigation
  • Questions? Comments? Shout out
  • Dynamo
  • Let’s start simple Reads Writes
  • Consistent Hashing hash(‘key’) => {1-100}
  • 100 1
  • hash(‘scotruby’) => 15
  • Our insert
  • 76-100 51-75 1-25 Canonical 26-50
  • 76-100 51-75 1-25 Replica Canonical 26-50 Replica
  • 76-100 51-75 1-50
  • As long as one node is up, we can read/write data
  • Distributing Load: Vnodes
  • Tradeoffs
  • Dropping ACID
  • ACID: Atomicity, Consistency, Isolation
  • CAP: Consistency, Availability
  • Partition Tolerance
  • CA P
  • 76-100 51-75 1-25 Replica Canonical 26-50 Replica
  • BASE: Basically Available, Soft State
  • Read Repair Canonical Replica read(‘scotruby’) Replica
  • Read Repair “#winning” Canonical “#winning” Replica read(‘scotruby’) Replica “#killingit”
  • Read Repair Canonical Replica read(‘scotruby’) “#winning” Replica
  • Read Repair “Awesome” Canonical “#winning” Replica read(‘scotruby’) Replica “#killingit”
  • Vector Clocks Canonical Replica (a:1) (a:1) (a:2) (a:2) (a:3) (a:3)
  • Vector Clocks: superset Canonical Replica (a:1) (a:1) (a:2) (a:2) (a:2) (a:2, b:1) (a:3, b:1) (a:3, b:1)
  • Vector Clocks: merging Canonical Replica (a:1) (a:1) (a:2) (a:2) (a:3) (a:2, b:1) (a:3, b:1) (a:3, b:1)
  • Vector Clocks: conflict Canonical Replica (a:1) (a:1) (a:2) (a:2) (a:3) (a:2, b:1) (a:4) (a:2, b:1)
  • Hinted Handoff
  • Hinted Handoff Hey slacker, here’s the stuff you missed
  • Gossip Protocol: “Yo, looks like node 1 is down.” “Don’t worry about it, I’ll pick up its key range.”
  • So that’s Dynamo in a nutshell
  • Takeaways: Investigate before you invest. Choose the right tool for the job.
  • Homework
    Codd’s Relational Model: http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
    CAP: http://citeseerx.ist.psu.edu/viewdoc/download?doi=
    Dynamo: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
    Google’s BigTable: http://labs.google.com/papers/bigtable-osdi06.pdf
    Werner Vogels’ Blog: http://www.allthingsdistributed.com/
  • Thanks!