3. What is Riak?
Riak is an open source, distributed NoSQL database implementing
the principles from Amazon’s Dynamo paper.
4. Dynamo Model
Dynamo is a highly available, proprietary key-value structured storage
system, or distributed data store. It has properties of both
databases and distributed hash tables (DHTs). It is not directly
exposed as a web service, but is used to power parts of other
Amazon Web Services such as Amazon S3.
Implementations
Apache Cassandra
Project Voldemort
Riak
Amazon DynamoDB
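The DHT half of the Dynamo model is consistent hashing. The following is a minimal Python illustration, not Riak's code. It assumes a SHA-1 ring split into 64 equal partitions (Riak's default ring size) and a replica count `n_val` of 3. It hashes a plain `"bucket/key"` string, whereas Riak hashes its own internal encoding of the pair. `partition_for` and `preference_list` are hypothetical helper names.

```python
import hashlib

RING_SIZE = 2 ** 160                        # SHA-1 produces 160-bit digests
NUM_PARTITIONS = 64                         # assumed ring size (Riak's default)
PARTITION_WIDTH = RING_SIZE // NUM_PARTITIONS


def partition_for(bucket: str, key: str) -> int:
    """Hash a bucket/key pair to a point on the ring and return its partition index."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    point = int.from_bytes(digest, "big")   # 0 <= point < 2**160
    return point // PARTITION_WIDTH


def preference_list(bucket: str, key: str, n_val: int = 3) -> list:
    """Partitions holding the n_val replicas: the key's own partition
    followed by the next n_val - 1 partitions clockwise (wrapping around)."""
    first = partition_for(bucket, key)
    return [(first + i) % NUM_PARTITIONS for i in range(n_val)]


print(preference_list("users", "alice"))
```

Because the partition count is fixed in this scheme, a key always hashes to the same partitions. Adding machines only changes which node owns each partition.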
9. This talk
Front Matter
Dynamo (and NoSQL) are nothing new
Much of Dynamo was invented more than 10 years ago
Dynamo chooses AP (availability and partition tolerance) of CAP
This talk will focus on properties of Dynamo-inspired systems
(Riak, Cassandra, Voldemort)
11. CAP theorem
Consistent: writes are atomic, and all subsequent requests
retrieve the new value
Available: the database will always return a value as long as a
single server is running
Partition Tolerant: the system will still function even if server
communication is temporarily lost, that is, during a network
partition
You can have only two at once.
16. Eventual Consistency
Distributed databases must be partition tolerant, so the
choice between availability and consistency can be difficult.
The real world is eventually consistent and works (mostly) fine
*Eventual* doesn’t mean minutes, days, or even seconds in
non-failure cases
Examples: DNS, HTTP caching with the Expires: header
How you model the real world matters!
17. Riak Architecture
Based on Amazon’s Dynamo architecture
Distributed, Scalable, No single point of failure
No transactions; trade strong consistency for eventual consistency
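Trading strong consistency for eventual consistency means that replicas can accept conflicting writes. Dynamo-style stores such as Riak attach causality metadata (vector clocks) to each object, which lets them tell a newer version apart from a concurrent one. The following is a minimal Python sketch. The function names are hypothetical, and each clock is a plain dict mapping an actor ID to a counter.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of the clock with this actor's counter bumped by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify two versions of the same object."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a_newer"
    if descends(b, a):
        return "b_newer"
    return "concurrent"  # neither saw the other: keep both as siblings


base = increment({}, "client-x")
left = increment(base, "client-y")    # one client updates the object...
right = increment(base, "client-z")   # ...while another updates it concurrently
print(compare(left, base))   # a_newer
print(compare(left, right))  # concurrent
```

When two versions are concurrent, neither can be discarded safely. Riak can keep both as siblings and let the application decide how to merge them.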
18. Simple Key/Value Data Store: Store Anything
Anything from plain text, JSON, or XML to images or video clips, all
accessible through a simple HTTP interface
Riak KV
20. Fault-Tolerant
Riak is also fault-tolerant. Servers can go up or down at any
moment with no single point of failure.
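One way a Dynamo-style system keeps accepting writes while servers go up and down is hinted handoff. A write meant for a down node lands on a fallback node together with a hint, and the fallback hands it back once the owner returns. The following Python is a simplified, per-key illustration with hypothetical names (`Node`, `write`, `handoff`). Riak itself performs handoff per partition.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)
    hints: list = field(default_factory=list)  # (intended_owner, key, value)


def write(key, value, preferred, fallbacks):
    """Store on every preferred node; a down node's copy goes to a live
    fallback node together with a hint naming the intended owner."""
    spare = (node for node in fallbacks if node.up)
    for owner in preferred:
        if owner.up:
            owner.data[key] = value
            continue
        stand_in = next(spare, None)
        if stand_in is None:
            raise RuntimeError(f"no fallback available for {owner.name}")
        stand_in.data[key] = value
        stand_in.hints.append((owner, key, value))


def handoff(node):
    """Return hinted data to owners that are back up, then drop the local copy."""
    pending = []
    for owner, key, value in node.hints:
        if owner.up:
            owner.data[key] = value
            node.data.pop(key, None)
        else:
            pending.append((owner, key, value))
    node.hints = pending


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
b.up = False                                   # b is down during the write
write("cart:42", "3 items", preferred=[a, b, c], fallbacks=[d])
b.up = True                                    # b recovers
handoff(d)
print(b.data, d.data)  # {'cart:42': '3 items'} {}
```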
21. Riak loves the web
Query Riak via URLs, headers, and verbs, and Riak returns assets
and standard HTTP response codes.
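To illustrate the HTTP interface, the following Python builds `urllib` requests for storing, fetching, and deleting an object. It does not contact a server. It assumes a node listening on `localhost:8098` (Riak's default HTTP port) and the `/buckets/<bucket>/keys/<key>` URL layout. Riak 2.x also accepts bucket-type-qualified paths under `/types/<type>/...`. The helper names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

RIAK_URL = "http://localhost:8098"  # assumed node address


def object_url(bucket: str, key: str) -> str:
    """Build the URL that names one object; bucket and key are percent-encoded."""
    return f"{RIAK_URL}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"


def store(bucket: str, key: str, body: bytes, content_type: str) -> Request:
    """PUT stores the value; the Content-Type header is kept with the object."""
    return Request(object_url(bucket, key), data=body, method="PUT",
                   headers={"Content-Type": content_type})


def fetch(bucket: str, key: str) -> Request:
    """GET returns the stored value with its original Content-Type."""
    return Request(object_url(bucket, key), method="GET")


def delete(bucket: str, key: str) -> Request:
    """DELETE removes the object."""
    return Request(object_url(bucket, key), method="DELETE")
```

Passing one of these requests to `urllib.request.urlopen` sends it to a running node. A fetch of a missing key comes back as `404 Not Found`, and a fetch of an object with conflicting versions comes back as `300 Multiple Choices`.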
23. Queryability
REST API
Easy, quick and dirty
Map Reduce
Provide a set of starting keys, then filter via map
MapReduce is meant for calculations/aggregations, not queries.
Riak Search
Full text search in Riak
Opinionated
Roll your own indices
Difficult to get it right
More code to maintain
Often introduces SPOFs
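The following minimal Python sketch shows why rolling your own indices is hard to get right. It keeps a hand-maintained index in the same key/value namespace as the data. A plain dict stands in for a Riak bucket, and the key scheme (`user:<id>`, `idx:city:<city>`) and function names are hypothetical.

```python
kv = {}  # stands in for a key/value bucket


def put_user(user_id: str, user: dict) -> None:
    old = kv.get(f"user:{user_id}")
    kv[f"user:{user_id}"] = user                      # write 1: the object itself
    # The remaining writes maintain the index. Nothing makes them atomic with
    # write 1: a crash in between leaves the index stale, and two clients
    # updating the same index key concurrently can overwrite each other.
    if old is not None and old["city"] != user["city"]:
        kv[f"idx:city:{old['city']}"].discard(user_id)
    kv.setdefault(f"idx:city:{user['city']}", set()).add(user_id)


def users_in(city: str) -> list:
    return sorted(kv.get(f"idx:city:{city}", set()))


put_user("1", {"city": "Oslo"})
put_user("2", {"city": "Oslo"})
put_user("1", {"city": "Bergen"})    # user 1 moves; both index entries change
print(users_in("Oslo"), users_in("Bergen"))  # ['2'] ['1']
```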
34. The tl;dr version
MapReduce is a framework for processing embarrassingly parallel
problems across huge datasets using a large number of computers
(nodes), collectively referred to as a cluster (if all nodes are on the
same local network and use similar hardware) or a grid (if the
nodes are shared across geographically and administratively
distributed systems and use more heterogeneous hardware).
Computational processing can occur on data stored either in a
filesystem (unstructured) or in a database (structured).
MapReduce can take advantage of locality of data, processing data
on or near the storage assets to decrease transmission of data.
35. Map
"Map" step: The master node takes the input, divides it into
smaller sub-problems, and distributes them to worker nodes. A
worker node may do this again in turn, leading to a multi-level tree
structure. The worker node processes the smaller problem, and
passes the answer back to its master node.
36. Reduce
"Reduce" step: The master node then collects the answers to all
the sub-problems and combines them in some way to form the
output – the answer to the problem it was originally trying to solve.
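A word count illustrates the two steps. This is a language-neutral sketch in Python and not Riak's MapReduce API, since Riak runs map and reduce phases written in Erlang or JavaScript. A thread pool stands in for the worker nodes, and `map_phase` and `reduce_phase` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_phase(document: str) -> Counter:
    """Map: a worker turns one input chunk into partial word counts."""
    return Counter(document.lower().split())


def reduce_phase(partials) -> Counter:
    """Reduce: the master merges the partial answers into the final result."""
    return reduce(lambda total, part: total + part, partials, Counter())


docs = ["Riak is a database", "Riak is masterless", "a key value store"]

# Each chunk is mapped independently. In a cluster this would run on the
# node that already holds the data, avoiding a transfer over the network.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(map_phase, docs))

totals = reduce_phase(partials)
print(totals["riak"], totals["is"], totals["store"])  # 2 2 1
```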
52. Riak …
Best used: If you want something Cassandra-like (Dynamo-like),
but no way you’re gonna deal with the bloat and complexity. If you
need very good single-site scalability, availability and
fault-tolerance, but you’re ready to pay for multi-site replication.
53. Riak …
For example: Point-of-sale data collection. Factory control
systems. Places where even seconds of downtime hurt. Could be
used as an easily updatable web server.
55. Why use Riak
Scalable
Riak is built so you can add more capacity as your app or platform
grows. When you add new machines, Riak distributes data
automatically around the cluster with no downtime and a
near-linear increase in performance and throughput
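The following is a rough illustration of how data can be redistributed when a machine joins. The ring keeps a fixed number of partitions, and a join only changes which node owns each one. The Python below is a hypothetical greedy rebalance over an assumed 64-partition ring. Riak's real claim algorithm is more involved; for example, it also tries to keep a key's replicas on distinct nodes.

```python
NUM_PARTITIONS = 64  # assumed ring size


def rebalance(owners: dict, nodes: list) -> dict:
    """Give every node a near-equal share of partitions while moving as few
    partitions as possible away from their current owner."""
    base, extra = divmod(len(owners), len(nodes))
    quota = {node: base + (1 if i < extra else 0) for i, node in enumerate(nodes)}
    load = {node: 0 for node in nodes}
    new_owners, unplaced = {}, []
    for partition in sorted(owners):              # first pass: keep what fits
        owner = owners[partition]
        if owner in load and load[owner] < quota[owner]:
            new_owners[partition] = owner
            load[owner] += 1
        else:
            unplaced.append(partition)
    for partition in unplaced:                    # second pass: fill spare quota
        target = next(node for node in nodes if load[node] < quota[node])
        new_owners[partition] = target
        load[target] += 1
    return new_owners


old_nodes = ["a", "b", "c", "d"]
before = {p: old_nodes[p % 4] for p in range(NUM_PARTITIONS)}  # 16 partitions each
after = rebalance(before, old_nodes + ["e"])                   # node "e" joins
moved = sum(1 for p in before if before[p] != after[p])
print(moved)  # 12
```

Here exactly 12 of the 64 partitions change owner, all of them moving to the new node. That is the number the new node needs to reach its share, and every other partition stays where it was.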
56. Why use Riak
Simple Ops
Riak is the most boring database you’ll ever run in production. No
sharding required, just horizontal scaling and straightforward
capacity planning. The same operational tasks apply to small
clusters and large clusters. More machines do not mean more
ops.
57. Why use Riak
Masterless
A Riak cluster is masterless. No node is special and any node can
handle requests for any other node in the cluster. All requests to
Riak happen concurrently and developers don’t have to spend time
worrying about read/write locks or single points of failure.
58. Why use Riak
Fault Tolerant
Decide how many replicas of the data you want (start at 3). If
nodes go down, requests are routed transparently to other nodes.
Riak uses proven architectural principles like hinted handoff and
read repair so you can always write and read data, even in failure
conditions.
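The replica count N works together with per-request read and write quorums R and W. Riak's defaults are N = 3 with R and W each at a quorum of 2. Under strict quorums, choosing R + W > N makes every read overlap the latest successful write, although Riak's sloppy quorums and fallback nodes weaken that guarantee. The Python below is a hypothetical sketch of a quorum read with read repair. An integer `version` stands in for the vector clocks Riak actually compares, and fallback nodes are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    up: bool = True
    value: Optional[str] = None
    version: int = 0  # stand-in for a vector clock: a higher number is newer


def quorum_write(replicas, value: str, version: int, w: int) -> None:
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.value, replica.version = value, version
            acks += 1
    if acks < w:  # live replicas keep the write, but the client sees a failure
        raise RuntimeError(f"write got {acks} acks, needed {w}")


def quorum_read(replicas, r: int) -> Optional[str]:
    # Ask every live replica; the request succeeds only if at least r answer.
    answers = [replica for replica in replicas if replica.up]
    if len(answers) < r:
        raise RuntimeError(f"read got {len(answers)} answers, needed {r}")
    newest = max(answers, key=lambda replica: replica.version)
    for replica in answers:  # read repair: bring stale replicas up to date
        if replica.version < newest.version:
            replica.value, replica.version = newest.value, newest.version
    return newest.value


nodes = [Replica(), Replica(), Replica()]    # N = 3
nodes[2].up = False
quorum_write(nodes, "v1", version=1, w=2)    # succeeds with 2 of 3 acks
nodes[2].up = True                           # the stale replica returns...
nodes[0].up = False                          # ...while another one fails
print(quorum_read(nodes, r=2), nodes[2].value)  # v1 v1
```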
59. Why use Riak
Complex Queries
In addition to key/value access to your data, Riak has built-in
support for MapReduce, Full Text Search, and Secondary Indexes,
giving developers various ways to store and query their data.
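With Secondary Indexes over HTTP, index entries travel as headers on the write and are queried through an `/index/` URL. The following Python builds the `urllib` requests without contacting a server. It assumes a node at `localhost:8098`, a hypothetical `users` bucket with an `email_bin` index, and a storage backend that supports secondary indexes (LevelDB or Memory, not Bitcask). The helper names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

RIAK_URL = "http://localhost:8098"  # assumed node address


def store_indexed(bucket: str, key: str, body: bytes, indexes: dict) -> Request:
    """PUT an object with one x-riak-index-* header per index entry.
    Index names end in _bin (string values) or _int (integer values)."""
    headers = {"Content-Type": "application/json"}
    for name, value in indexes.items():
        headers[f"x-riak-index-{name}"] = str(value)
    url = f"{RIAK_URL}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    return Request(url, data=body, method="PUT", headers=headers)


def query_index(bucket: str, index: str, value: str) -> Request:
    """Exact-match query; the JSON response lists the matching keys."""
    url = (f"{RIAK_URL}/buckets/{quote(bucket, safe='')}"
           f"/index/{index}/{quote(value, safe='')}")
    return Request(url)  # no body, so urllib sends a GET
```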
60. Why use Riak
Flexible APIs
Riak is equipped with fully-featured HTTP and Protocol Buffers
APIs, with support for both of these transports in all of our
supported client libraries. You can also write your own client layer
to wrap our APIs if your use case or preference calls for it.
61. Why use Riak
1000s of Users
Comcast, Yammer, Voxer, Boeing, BestBuy, SEOMoz, Joyent,
Kiip, DotCloud, Formspring, GitHub, and the Danish Government
are just a few of the thousands of startups and enterprises that
have deployed Riak.
62. Who uses it and why?
GitHub: git.io
Comcast: internal object storage
Yammer: notifications
Disqus: analytics
AOL: 1.5 billion data objects per day
Mozilla: user reviews
Joyent, Best Buy …
63. Why use Riak
Language Support
Basho and the Riak Community maintain libraries for most major
languages including Java, Node.js, Python, Ruby, PHP, C/C++,
and many more. All the client code, like the core Riak code, is
available on GitHub.
64. Why use Riak
Powerful Community
The Riak community is composed of smart, passionate hackers
who are contributing code, support, and evangelism every day.
There are hundreds of companies and individuals testing, breaking,
running, and building Riak.