Bathcamp 2010-riak


Published on

Published in: Technology, News & Politics
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Bathcamp 2010-riak

  1. 1. Timothy Perrett Bath Camp 2010
  2. 2. What is Riak? • Documented orientated database • Written in Erlang • Based on Dynamo[1] and CAP Theorem[2] • Highly fault tolerant • HTTP and ProtoBuff interface • Write MapReduce in Erlang or JavaScript 1. 2.
  3. 3. Same, Same but different • Riak solves similar problems to MongoDB • Semi-structured data modeled as "documents” • Storage of non-document data in the database • High write-availability • Riak is intrinsically multi-node scalable • Mongo in comparison is single system (+ sharding) • Riak achieves availability via quorum writes • Mongo uses performant in-place writes • Riak uses “masterless” replication
  4. 4. N/R/W – Dynamo N = Number of replicas to store R = Number of replicas needed to read W = Number of replicas needed to read • These principals first appeared in an Amazon research paper known as Dynamo
  5. 5. • 160bit integer key space. Each node that joins is assigned part of that space for consistent hashing • Hashing means any node can service any request making the cluster masterless and eventually consistant Number of replicas
  6. 6. • Number of replies before Riak gives the client a successful reply. • Tries to access all nodes, but as soon as the N/R is satisfied a response is given Reads
  7. 7. • Same as reads; W implies the number of successful nodes that must reply before the write is considered consistent by the client Writes
  8. 8. Extreme example • Given N=10, R=W=2 we could have 8 nodes down and the cluster would still be fully available to all clients
  9. 9. What does this all mean? • N/R/W specified at request time, so each client can specify its own tolerance for outages dynamically • Despite any outages within the cluster, the whole cluster can still appear available based on N/R/W • Given N=3 and R=W=2, we can have 3-2=1 node down/unreachable/laggy in the cluster • Stupidly high availability complete with eventual consistency controlled by dynamic clients
  10. 10. Brewer’s CAP Theorem • Consistency • Availability • Partition Tolerance • You cant have all things, all the time… • …but you can have some of each, all the time! • Riak is about choosing your own levels of each according to your use case
  11. 11. Consistency • Start with document version zero • Things get redistributed and n0 and n2 are sitting in NYC and n1 and n3 are in London • What if stuff changes??
  12. 12. Consistency • Uh oh: inconsistency • Both parts of the cluster are still fully available • NYC serves v1 whilst London serves v0 • The network resumes and Riak determines the latest version by using vector clocks
  13. 13. Consistency • What if both sides of the Atlantic changed? • Riak is unable to determine which is the right document, both are returned to the client with an indication of the inconsistency
  14. 14. • Distributed, fault-tolerant full-text searching • Lucene syntax for queries • No need for index sharding • Linier scaling • Double the number of nodes to get double the search capacity (awesome!) • Search via: • Fields, wildcards, fuzzy text or token proximity Riak Search
  15. 15. Questions?