Same, same but different: Riak solves similar problems to MongoDB. Semi-structured data modeled as "documents". Storage of non-document data in the database. High write availability. Riak is intrinsically multi-node scalable; Mongo in comparison is a single system (plus sharding). Riak achieves availability via quorum writes; Mongo uses performant in-place writes. Riak uses "masterless" replication.
N/R/W (Dynamo): N = number of replicas to store. R = number of replicas needed to read. W = number of replicas needed to write. These principles first appeared in an Amazon research paper known as Dynamo.
Number of replicas: a 160-bit integer key space. Each node that joins is assigned part of that space for consistent hashing. Hashing means any node can service any request, making the cluster masterless and eventually consistent.
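The key-space idea above can be sketched in a few lines of Python. This is a simplified illustration, not Riak's actual implementation: the partition count, the round-robin assignment, and the names `ring_position`, `build_ring` and `preference_list` are all assumptions made for the example. SHA-1 is used only because it conveniently yields a 160-bit integer.

```python
import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 160  # SHA-1 digests map onto a 160-bit integer key space


def ring_position(key: bytes) -> int:
    """Hash a key to an integer position on the ring."""
    return int.from_bytes(hashlib.sha1(key).digest(), "big")


def build_ring(nodes, partitions=64):
    """Split the key space into equal partitions, assigned round-robin.

    Hypothetical layout: returns a sorted list of (partition_start, node).
    """
    step = RING_SIZE // partitions
    return [(i * step, nodes[i % len(nodes)]) for i in range(partitions)]


def preference_list(ring, key: bytes, n=3):
    """Walk clockwise from the key's partition, collecting n distinct nodes."""
    starts = [start for start, _ in ring]
    first = bisect_right(starts, ring_position(key)) - 1
    owners = []
    for offset in range(len(ring)):
        node = ring[(first + offset) % len(ring)][1]
        if node not in owners:
            owners.append(node)
        if len(owners) == n:
            break
    return owners


ring = build_ring(["n0", "n1", "n2", "n3"])
owners = preference_list(ring, b"bucket/key", n=3)
```

Because every node can compute the same preference list from the key alone, no master is needed to route a request.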
Reads: R is the number of replies required before Riak gives the client a successful reply. Riak tries to access all nodes, but as soon as R is satisfied a response is given.
Writes: same as reads; W is the number of nodes that must reply successfully before the write is considered successful by the client.
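A toy in-memory model may make the read and write quorums concrete. It is a sketch under stated assumptions: `Replica`, `quorum_write` and `quorum_read` are hypothetical names, not part of any Riak client, and a real cluster contacts replicas in parallel over the network.

```python
class Replica:
    """Hypothetical stand-in for one node holding a copy of the data."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}


def quorum_write(replicas, key, value, w):
    """Send the write to every replica; succeed once at least w acknowledge."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = value  # reachable replicas keep the write
            acks += 1
    return acks >= w


def quorum_read(replicas, key, r):
    """Ask every replica; answer as soon as r replies have arrived."""
    replies = []
    for replica in replicas:
        if replica.up:
            replies.append(replica.data.get(key))
        if len(replies) == r:
            return replies
    raise RuntimeError("fewer than r replicas replied")


# N=3 replicas with one node down, R=W=2
replicas = [Replica("n0"), Replica("n1"), Replica("n2", up=False)]
write_ok = quorum_write(replicas, "doc", "v1", w=2)
read_result = quorum_read(replicas, "doc", r=2)
```

With one of three replicas down, both the write and the read still meet their quorum of 2.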
Extreme example: given N=10 and R=W=2, we could have 8 nodes down and the cluster would still be fully available to all clients.
What does this all mean? N/R/W are specified at request time, so each client can specify its own tolerance for outages dynamically. Despite any outages within the cluster, the whole cluster can still appear available based on N/R/W. Given N=3 and R=W=2, we can have 3-2=1 node down, unreachable or laggy in the cluster. Stupidly high availability, complete with eventual consistency controlled by dynamic clients.
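The arithmetic behind both examples (N=10 with R=W=2, and N=3 with R=W=2) can be written as a one-line helper. The function name `tolerated_failures` is made up for this sketch, and it assumes each of the N replicas lives on a distinct node.

```python
def tolerated_failures(n: int, r: int, w: int) -> int:
    """Replicas that can be down while reads and writes still reach quorum.

    Assumes each replica sits on its own node. Reads need r replies and
    writes need w, so the stricter of the two sets the limit.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return n - max(r, w)


extreme = tolerated_failures(10, 2, 2)  # the N=10, R=W=2 case
typical = tolerated_failures(3, 2, 2)   # the N=3, R=W=2 case
```

Since R and W are chosen per request, a client that wants more tolerance can lower them, and one that wants stronger guarantees can raise them.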
Brewer's CAP Theorem: Consistency, Availability, Partition Tolerance. You can't have all things, all the time... but you can have some of each, all the time! Riak is about choosing your own levels of each according to your use case.
Consistency: start with document version zero. Things get redistributed, and n0 and n2 are sitting in NYC while n1 and n3 are in London. What if stuff changes?
Consistency: uh oh, inconsistency. Both parts of the cluster are still fully available: NYC serves v1 whilst London serves v0. When the network resumes, Riak determines the latest version by using vector clocks.
Consistency: what if both sides of the Atlantic changed? Riak is unable to determine which is the right document, so both are returned to the client with an indication of the inconsistency.
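The NYC/London scenario can be illustrated with a minimal vector clock comparison. This is a generic sketch of the technique, not Riak's internal format: clocks are modeled as plain dicts of per-node counters, and `increment`, `descends` and `resolve` are names invented for the example.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def resolve(left, right):
    """Pick the newer (clock, value) pair, or return both values on conflict."""
    clock_l, value_l = left
    clock_r, value_r = right
    if descends(clock_l, clock_r):
        return [value_l]
    if descends(clock_r, clock_l):
        return [value_r]
    return [value_l, value_r]  # concurrent edits: the client must decide


v0_clock = increment({}, "n0")            # document version zero
nyc = (increment(v0_clock, "n0"), "v1")   # NYC updates during the partition
london_stale = (v0_clock, "v0")           # London still serves v0
london_edit = (increment(v0_clock, "n1"), "v1-london")  # London also edits

one_sided = resolve(nyc, london_stale)
both_sides = resolve(nyc, london_edit)
```

When only NYC changed, its clock descends from London's, so v1 wins. When both sides changed, neither clock descends from the other, and both values go back to the client.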
Riak Search: distributed, fault-tolerant full-text searching. Lucene syntax for queries. No need for index sharding. Linear scaling: double the number of nodes to get double the search capacity (awesome!). Search via fields, wildcards, fuzzy text or token proximity.