Timothy Perrett
Bath Camp 2010
What is Riak?
• Documented orientated database
• Written in Erlang
• Based on Dynamo[1] and CAP Theorem[2]
• Highly fault tolerant
• HTTP and ProtoBuff interface
• Write MapReduce in Erlang or JavaScript
1. http://goo.gl/r8Np
2. http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
Same, Same but different
• Riak solves similar problems to MongoDB
• Semi-structured data modeled as "documents”
• Storage of non-document data in the database
• High write-availability
• Riak is intrinsically multi-node scalable
• Mongo in comparison is single system (+ sharding)
• Riak achieves availability via quorum writes
• Mongo uses performant in-place writes
• Riak uses “masterless” replication
N/R/W – Dynamo
N = Number of replicas to store
R = Number of replicas needed to read
W = Number of replicas needed to read
• These principals first appeared in an Amazon
research paper known as Dynamo
• 160bit integer key
space. Each node that
joins is assigned part
of that space for
consistent hashing
• Hashing means any
node can service any
request making the
cluster masterless and
eventually consistant
Number of replicas
• Number of replies
before Riak gives
the client a
successful reply.
• Tries to access all
nodes, but as soon
as the N/R is
satisfied a response
is given
Reads
• Same as reads; W
implies the number
of successful nodes
that must reply
before the write
is considered
consistent by
the client
Writes
Extreme example
• Given N=10, R=W=2 we
could have 8 nodes
down and the cluster
would still be fully
available to all clients
What does this all mean?
• N/R/W specified at request time, so each
client can specify its own tolerance for
outages dynamically
• Despite any outages within the cluster, the whole
cluster can still appear available based on N/R/W
• Given N=3 and R=W=2, we can have 3-2=1 node
down/unreachable/laggy in the cluster
• Stupidly high availability complete with eventual
consistency controlled by dynamic clients
Brewer’s CAP Theorem
• Consistency
• Availability
• Partition Tolerance
• You cant have all things, all the time…
• …but you can have some of each, all the time!
• Riak is about choosing your own levels of
each according to your use case
Consistency
• Start with document
version zero
• Things get redistributed
and n0 and n2 are
sitting in NYC and n1
and n3 are in London
• What if stuff changes??
Consistency
• Uh oh: inconsistency
• Both parts of the cluster
are still fully available
• NYC serves v1 whilst
London serves v0
• The network resumes
and Riak determines
the latest version by
using vector clocks
Consistency
• What if both sides of
the Atlantic changed?
• Riak is unable to
determine which is the
right document, both
are returned to the
client with an indication
of the inconsistency
• Distributed, fault-tolerant full-text searching
• Lucene syntax for queries
• No need for index sharding
• Linier scaling
• Double the number of nodes to get double the
search capacity (awesome!)
• Search via:
• Fields, wildcards, fuzzy text or token proximity
Riak Search
Questions?
basho.com/riak.html
github.com/basho/riak
twitter.com/timperrett
github.com/timperrett
blog.getintheloop.eu

Bathcamp 2010-riak

  • 1.
  • 2.
    What is Riak? •Documented orientated database • Written in Erlang • Based on Dynamo[1] and CAP Theorem[2] • Highly fault tolerant • HTTP and ProtoBuff interface • Write MapReduce in Erlang or JavaScript 1. http://goo.gl/r8Np 2. http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
  • 3.
    Same, Same butdifferent • Riak solves similar problems to MongoDB • Semi-structured data modeled as "documents” • Storage of non-document data in the database • High write-availability • Riak is intrinsically multi-node scalable • Mongo in comparison is single system (+ sharding) • Riak achieves availability via quorum writes • Mongo uses performant in-place writes • Riak uses “masterless” replication
  • 4.
    N/R/W – Dynamo N= Number of replicas to store R = Number of replicas needed to read W = Number of replicas needed to read • These principals first appeared in an Amazon research paper known as Dynamo
  • 5.
    • 160bit integerkey space. Each node that joins is assigned part of that space for consistent hashing • Hashing means any node can service any request making the cluster masterless and eventually consistant Number of replicas
  • 6.
    • Number ofreplies before Riak gives the client a successful reply. • Tries to access all nodes, but as soon as the N/R is satisfied a response is given Reads
  • 7.
    • Same asreads; W implies the number of successful nodes that must reply before the write is considered consistent by the client Writes
  • 8.
    Extreme example • GivenN=10, R=W=2 we could have 8 nodes down and the cluster would still be fully available to all clients
  • 9.
    What does thisall mean? • N/R/W specified at request time, so each client can specify its own tolerance for outages dynamically • Despite any outages within the cluster, the whole cluster can still appear available based on N/R/W • Given N=3 and R=W=2, we can have 3-2=1 node down/unreachable/laggy in the cluster • Stupidly high availability complete with eventual consistency controlled by dynamic clients
  • 10.
    Brewer’s CAP Theorem •Consistency • Availability • Partition Tolerance • You cant have all things, all the time… • …but you can have some of each, all the time! • Riak is about choosing your own levels of each according to your use case
  • 11.
    Consistency • Start withdocument version zero • Things get redistributed and n0 and n2 are sitting in NYC and n1 and n3 are in London • What if stuff changes??
  • 12.
    Consistency • Uh oh:inconsistency • Both parts of the cluster are still fully available • NYC serves v1 whilst London serves v0 • The network resumes and Riak determines the latest version by using vector clocks
  • 13.
    Consistency • What ifboth sides of the Atlantic changed? • Riak is unable to determine which is the right document, both are returned to the client with an indication of the inconsistency
  • 14.
    • Distributed, fault-tolerantfull-text searching • Lucene syntax for queries • No need for index sharding • Linier scaling • Double the number of nodes to get double the search capacity (awesome!) • Search via: • Fields, wildcards, fuzzy text or token proximity Riak Search
  • 15.