Intro to riak
  • Distributed computation on Dynamo-style distributed storage: Riak Pipe. An introduction to Riak. Jaseem Abid, September 19, 2012
  • Riak
  • What is Riak? Riak is an open source, distributed NoSQL database implementing the principles from Amazon’s Dynamo paper.
  • Dynamo Model: Dynamo is a highly available, proprietary key-value structured storage system (a distributed data store) built by Amazon. It has properties of both databases and distributed hash tables (DHTs). It is not directly exposed as a web service, but is used to power parts of other Amazon Web Services such as Amazon S3. Implementations: Apache Cassandra, Project Voldemort, Riak, Amazon DynamoDB.
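Dynamo-style stores place data with consistent hashing: each bucket/key is hashed onto a fixed ring split into equal partitions, and the value is stored on the partition it lands in plus the next ones around the ring. The sketch below is a toy model of that idea, not Riak's code. Assumptions: it hashes the string "bucket/key" with SHA-1 (Riak hashes its own encoding of the pair), uses a 64-partition ring, and assigns partitions to nodes round-robin; `preferenceList` and `ownerNodes` are hypothetical names.

```javascript
// Toy model of Dynamo-style consistent hashing (illustrative, not Riak's code).
const crypto = require("crypto");

const RING_SIZE = 1n << 160n; // SHA-1 produces 160-bit hashes

// Assumption: hash the string "bucket/key"; Riak hashes its own encoding.
function ringPosition(bucket, key) {
  const hex = crypto.createHash("sha1").update(`${bucket}/${key}`).digest("hex");
  return BigInt("0x" + hex);
}

// Index of the equal-width partition that contains a ring position.
function partitionIndex(position, numPartitions) {
  const width = RING_SIZE / BigInt(numPartitions);
  // Clamp in case numPartitions does not divide the ring evenly.
  return Math.min(Number(position / width), numPartitions - 1);
}

// Preference list: the owning partition plus the next n - 1 partitions clockwise.
function preferenceList(bucket, key, numPartitions, n) {
  const first = partitionIndex(ringPosition(bucket, key), numPartitions);
  const list = [];
  for (let i = 0; i < n; i++) {
    list.push((first + i) % numPartitions);
  }
  return list;
}

// Hypothetical round-robin claim: partition p lives on nodes[p % nodes.length].
function ownerNodes(bucket, key, nodes, numPartitions = 64, n = 3) {
  return preferenceList(bucket, key, numPartitions, n).map((p) => nodes[p % nodes.length]);
}

console.log(ownerNodes("favs", "db", ["dev1", "dev2", "dev3", "dev4"]));
```

Because the replicas sit on consecutive partitions and partitions are spread across nodes, the three copies land on three different machines in this model; adding a node changes which node claims some partitions, not where keys hash.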
  • This talk (front matter): Dynamo (and NoSQL) are nothing new; much of Dynamo was invented more than 10 years ago. Dynamo chooses AP of CAP. This talk will focus on properties of Dynamo-inspired systems (Riak, Cassandra, Voldemort).
  • CAP theorem
      Consistent: writes are atomic and all subsequent requests retrieve the new value.
      Available: the database will always return a value as long as a single server is running.
      Partition tolerant: the system will still function even if server communication is temporarily lost, that is, during a network partition.
      You can have only two at once.
  • CAP theorem: Riak picks AP of CAP.
  • Eventual Consistency: Distributed databases must be partition tolerant, so the choice between availability and consistency can be difficult. The real world is eventually consistent and works (mostly) fine. In non-failure cases, *eventual* doesn’t mean minutes, days, or even seconds. Examples: DNS, HTTP with an Expires: header. How you model the real world matters!
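Eventual consistency means two replicas can accept different writes for the same key, for example on opposite sides of a partition. Riak tracks causality with vector clocks (sent to HTTP clients in the `X-Riak-Vclock` header) and keeps concurrent writes as siblings instead of silently dropping one. A minimal sketch of the comparison, assuming a clock is a plain object mapping actor id to counter; the function names and representation are illustrative, not Riak's internal format.

```javascript
// Minimal vector clock sketch: a clock maps actor id -> update counter.

// Record one more update by `actor`, returning a new clock.
function increment(clock, actor) {
  return { ...clock, [actor]: (clock[actor] || 0) + 1 };
}

// Compare two clocks: "equal", "before" (a happened before b),
// "after" (a descends from b), or "concurrent" (neither descends).
function compare(a, b) {
  let aAhead = false;
  let bAhead = false;
  for (const actor of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[actor] || 0;
    const y = b[actor] || 0;
    if (x > y) aAhead = true;
    if (y > x) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

// Per-actor maximum; a client resolving siblings writes back a value
// tagged with the merged (then incremented) clock.
function merge(a, b) {
  const out = { ...a };
  for (const [actor, n] of Object.entries(b)) {
    out[actor] = Math.max(out[actor] || 0, n);
  }
  return out;
}

// Two clients update the same object from a common ancestor.
const base = increment({}, "client-x");
const left = increment(base, "client-a");
const right = increment(base, "client-b");
console.log(compare(left, right)); // concurrent: keep both as siblings
```

Once a client reads both siblings and writes a resolved value under the merged clock, that value descends from both branches and the conflict disappears.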
  • Riak Architecture: Amazon’s Dynamo architecture. Distributed, scalable, no single point of failure. No transactions; trades strong consistency for eventual consistency.
  • Simple key-value data store: store anything, from plain text, JSON, or XML to images or video clips, all accessible through a simple HTTP interface (Riak KV).
  • Fault-Tolerant: Riak is also fault-tolerant. Servers can go up or down at any moment with no single point of failure.
  • Riak loves the web: query Riak via URLs, headers, and verbs, and Riak returns assets and standard HTTP response codes.
  • Scalable
      # start 4 nodes
      $ dev/dev1/bin/riak start
      $ dev/dev2/bin/riak start
      $ dev/dev3/bin/riak start
      $ dev/dev4/bin/riak start
      # scale out: join the other nodes to dev1
      $ dev/dev2/bin/riak-admin join dev1@127.0.0.1
      $ dev/dev3/bin/riak-admin join dev1@127.0.0.1
      $ dev/dev4/bin/riak-admin join dev1@127.0.0.1
      # scale back in: run on the node that should leave
      $ dev/dev4/bin/riak-admin leave
  • Queryability
      REST API: easy, quick and dirty.
      MapReduce: provide a set of starting keys, filter via map. MapReduce is meant for calculations/aggregations, not queries.
      Riak Search: full-text search in Riak; opinionated.
      Roll your own indices: difficult to get right, more code to maintain, often introduces SPOFs.
  • Riak REST API: Representational State Transfer. Create, Read, Update, and Delete verbs. Riak and cURL.
  • REST API w/ curl
      $ curl http://localhost:8091/ping
      OK
      # Let’s issue a bad query.
      # -I tells cURL that we want only the header response.
      $ curl -I http://localhost:8091/riak/no_bucket/no_key
      HTTP/1.1 404 Object Not Found
      Server: MochiWeb/1.1 WebMachine/1.7.3 (participate in the frantic)
      Date: Thu, 04 Aug 2011 01:25:49 GMT
      Content-Type: text/plain
      Content-Length: 10
  • Let’s PUT something in the DB
      $ curl -v -X PUT http://localhost:8091/riak/favs/db \
          -H "Content-Type: text/html" \
          -d "<html><body><h1>My new favorite DB is RIAK</h1></body></html>"
  • Now get it back
      $ curl -X GET http://localhost:8091/riak/favs/db
    Or just point a browser at the same URL.
  • Or POST it
      # -i includes the response headers; -I would request a HEAD, which conflicts with -d
      $ curl -i -X POST http://localhost:8091/riak/animals \
          -H "Content-Type: application/json" \
          -d '{"nickname": "Sergeant Stubby", "breed": "Terrier"}'
      HTTP/1.1 201 Created
      Vary: Accept-Encoding
      Server: MochiWeb/1.1 WebMachine/1.7.3 (participate in the frantic)
      Location: /riak/animals/6VZc2o7zKxq2B34kJrm1S0ma3PO
      Date: Tue, 05 Apr 2011 07:45:33 GMT
      Content-Type: application/json
      Content-Length: 0
  • Dump anything
      $ curl -X PUT http://127.0.0.1:8091/riak/images/1.jpg \
          -H "Content-Type: image/jpeg" \
          --data-binary @image_name.jpg
  • Or DELETE it
      $ curl -I -X DELETE http://localhost:8091/riak/animals/6VZc2o7zKxq2B34kJrm1S0ma3PO
      HTTP/1.1 204 No Content
      Vary: Accept-Encoding
      Server: MochiWeb/1.1 WebMachine/1.7.3 (participate in the frantic)
      Date: Mon, 11 Apr 2011 05:08:39 GMT
      Content-Type: application/x-www-form-urlencoded
      Content-Length: 0
    DELETE won’t return any body, but the HTTP code will be 204 if successful. Otherwise, as you’d expect, it returns a 404.
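The same round trips can be scripted from Node instead of cURL. A minimal sketch, assuming Node 18+ for the global `fetch` and the dev node on port 8091 used above; `objectUrl`, `parseLocation`, and `putObject` are hypothetical helper names. `parseLocation` recovers the server-generated key from the `Location` header of a `201 Created` POST response.

```javascript
// Hypothetical helpers for Riak's /riak/<bucket>/<key> HTTP URLs.

// Build an object URL, escaping bucket and key for use in a path.
function objectUrl(base, bucket, key) {
  return `${base}/riak/${encodeURIComponent(bucket)}/${encodeURIComponent(key)}`;
}

// Split a Location header such as "/riak/animals/<key>" into its parts.
function parseLocation(location) {
  const m = /^\/riak\/([^/]+)\/([^/]+)$/.exec(location);
  if (m === null) {
    throw new Error(`unexpected Location header: ${location}`);
  }
  return { bucket: decodeURIComponent(m[1]), key: decodeURIComponent(m[2]) };
}

// PUT a value and return the HTTP status (needs a running node; not called here).
async function putObject(base, bucket, key, body, contentType) {
  const res = await fetch(objectUrl(base, bucket, key), {
    method: "PUT",
    headers: { "Content-Type": contentType },
    body,
  });
  return res.status;
}

console.log(objectUrl("http://localhost:8091", "favs", "db"));
console.log(parseLocation("/riak/animals/6VZc2o7zKxq2B34kJrm1S0ma3PO"));
```

Against a live node, `await putObject("http://localhost:8091", "favs", "db", "<h1>hi</h1>", "text/html")` should resolve to a 2xx status; which one depends on request options such as asking for the body back.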
  • MapReduce: Google :), *large* data sets, functional programming, Apache Hadoop. Why C sucks here.
  • The tl;dr version: MapReduce is a framework for processing embarrassingly parallel problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster (if all nodes are on the same local network and use similar hardware) or a grid (if the nodes are shared across geographically and administratively distributed systems, and use more heterogeneous hardware). Computational processing can occur on data stored either in a filesystem (unstructured) or in a database (structured). MapReduce can take advantage of locality of data, processing data on or near the storage assets to decrease transmission of data.
  • Map: in the "Map" step, the master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem, and passes the answer back to its master node.
  • Reduce: in the "Reduce" step, the master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve.
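The map and reduce steps above fit in a few lines when everything runs in one process. This sketch uses the classic Google/Hadoop shape (map emits key/value pairs, the framework groups them by key, reduce folds each group); it is a single-machine illustration with hypothetical names, not a distributed framework, and Riak's own MapReduce passes lists between phases rather than grouping by key.

```javascript
// Single-process MapReduce sketch: map -> group by key -> reduce.
function mapReduce(inputs, mapFn, reduceFn) {
  // "Map" step: each input (a worker's sub-problem) emits [key, value] pairs.
  const groups = new Map();
  for (const input of inputs) {
    for (const [key, value] of mapFn(input)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  // "Reduce" step: combine every group's values into one answer per key.
  const output = {};
  for (const [key, values] of groups) {
    output[key] = reduceFn(key, values);
  }
  return output;
}

// Word count over two input chunks.
const counts = mapReduce(
  ["riak is a database", "riak is masterless"],
  (text) => text.split(/\s+/).map((word) => [word, 1]),
  (_word, ones) => ones.reduce((sum, n) => sum + n, 0)
);
console.log(counts); // { riak: 2, is: 2, a: 1, database: 1, masterless: 1 }
```

In a real cluster each chunk would be mapped on the node that stores it, which is the data-locality point made above.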
  • JSON
      {
        "name": "John Smith",
        "age": 25,
        "address": {
          "streetAddress": "21 2nd Street",
          "city": "New York",
          "postalCode": "10021"
        },
        "phoneNumber": [
          { "type": "home", "number": "212 555-1234" },
          { "type": "fax", "number": "646 555-4567" }
        ]
      }
  • Map
      The Riak way (the stored JSON arrives as a string in v.values[0].data):
      function (v) {
        var data = JSON.parse(v.values[0].data);
        return [data.phoneNumber];
      }
      The CouchDB way:
      function (v) {
        emit("phone", v.phoneNumber);
      }
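To try a Riak-style map function without a cluster, a tiny harness can feed it objects shaped the way Riak hands them to JavaScript map functions (the stored body as a string in `values[0].data`). The harness `runJob`, the `mapCount`/`reduceSum` pair, and the sample objects are hypothetical; a real job is submitted to Riak, which runs the map phase on the nodes holding the data. The functions meant to run inside Riak stick to ES5-style syntax to be safe with the older JavaScript engine Riak embeds, and the reduce is written so its output can be fed back into itself, since Riak may re-reduce partial results.

```javascript
// Hypothetical local harness for Riak-style JavaScript MapReduce functions.

// Map from the slide: return the object's phone numbers as a one-element list.
function mapPhoneNumbers(v) {
  var data = JSON.parse(v.values[0].data);
  return [data.phoneNumber];
}

// Map that emits one count per object, paired with a re-reduce-safe sum.
function mapCount(v) {
  return [JSON.parse(v.values[0].data).phoneNumber.length];
}
function reduceSum(values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) sum += values[i];
  return [sum];
}

// Stand-in for the cluster: run the map on every object, concatenate, then reduce.
function runJob(objects, mapFn, reduceFn) {
  const mapped = [].concat(...objects.map((o) => mapFn(o)));
  return reduceFn ? reduceFn(mapped) : mapped;
}

// Sample objects in the shape described above (made-up data).
function riakObject(bucket, key, doc) {
  return { bucket, key, values: [{ data: JSON.stringify(doc) }] };
}
const people = [
  riakObject("people", "john", {
    name: "John Smith",
    phoneNumber: [
      { type: "home", number: "212 555-1234" },
      { type: "fax", number: "646 555-4567" },
    ],
  }),
  riakObject("people", "jane", {
    name: "Jane Doe",
    phoneNumber: [{ type: "home", number: "212 555-0000" }],
  }),
];

console.log(runJob(people, mapCount, reduceSum)); // [ 3 ]
console.log(reduceSum(reduceSum([2]).concat(reduceSum([1])))); // re-reduce: [ 3 ]
```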
  • Riak Search ...
  • Riak Secondary Indices ...
  • Riak … Written in: Erlang & C, some JavaScript
  • Riak … Protocol: HTTP/REST or custom binary
  • Riak … Tunable trade-offs for distribution and replication (N, R, W)
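N, R, and W are simple arithmetic: N replicas hold each key, a read waits for R of them, and a write waits for W acknowledgements. When R + W > N, every read set and write set share a replica, and N - W (or N - R) replicas can be unreachable while writes (or reads) still succeed. A sketch with hypothetical function names; Riak's defaults are N = 3 with R and W set to a quorum (2). With sloppy quorums and hinted handoff, the overlap rule is a strong tendency rather than a strict consistency guarantee.

```javascript
// Quorum arithmetic for Dynamo-style replication (illustrative).
// n: replicas per key; r: replies a read waits for; w: acks a write waits for.

// Majority of n replicas.
function quorum(n) {
  return Math.floor(n / 2) + 1;
}

function tradeoffs(n, r, w) {
  if (![n, r, w].every(Number.isInteger) || r < 1 || w < 1 || r > n || w > n) {
    throw new RangeError("expected integers with 1 <= r, w <= n");
  }
  return {
    readOverlapsWrite: r + w > n, // read and write sets must share a replica
    writeToleratesDown: n - w, // replicas that may be down while writes succeed
    readToleratesDown: n - r, // replicas that may be down while reads succeed
  };
}

console.log(tradeoffs(3, quorum(3), quorum(3)));
console.log(tradeoffs(3, 1, 1));
```

With N = 3, R = 1, W = 1 reads and writes survive two failed replicas, but a read may miss the latest write: the availability/consistency dial from the CAP slides, which Riak lets you set per bucket or per request.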
  • Riak … Pre- and post-commit hooks in JavaScript or Erlang, for validation and security.
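A pre-commit hook sees each write before it is stored and can reject it. The sketch assumes the JavaScript hook contract as we understand it: the hook receives the object being written, with its body as a string in `values[0].data`, and returns the object to accept the write or `{fail: reason}` to reject it. The validation rule (animals need a `nickname`) and the name `validateAnimal` are hypothetical, and the sketch does not special-case deletes, which a real hook would need to let through.

```javascript
// Sketch of a JavaScript pre-commit validation hook (hypothetical rule).
// Assumed contract: return the object to allow the write, or {fail: reason}.
function validateAnimal(object) {
  var data;
  try {
    data = JSON.parse(object.values[0].data);
  } catch (e) {
    return { fail: "body is not valid JSON" };
  }
  if (typeof data.nickname !== "string" || data.nickname.length === 0) {
    return { fail: "nickname is required" };
  }
  return object;
}

// Local check with mock objects shaped like the ones the hook would receive.
const stubby = { values: [{ data: '{"nickname": "Sergeant Stubby", "breed": "Terrier"}' }] };
console.log(validateAnimal(stubby) === stubby); // true
console.log(validateAnimal({ values: [{ data: '{"breed": "Terrier"}' }] }));
```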
  • Riak … Map/reduce in JavaScript or Erlang
  • Riak … Links & link walking: use it as a graph database
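Links travel in an HTTP `Link` header, one entry per link, for example `</riak/people/bob>; riaktag="friend"`, and link walking follows the entries with a given tag. The sketch parses such a header and performs one walk step over an in-memory graph. The parser is deliberately minimal: it assumes exactly this shape and skips entries without a `riaktag`, such as the bucket's `rel="up"` link. `parseLinks`, `walk`, and the sample data are hypothetical.

```javascript
// Minimal parser for Riak-style Link header entries (not a full Link header parser).
function parseLinks(header) {
  const links = [];
  const re = /<\/riak\/([^/>]+)\/([^>]+)>;\s*riaktag="([^"]*)"/g;
  let m;
  while ((m = re.exec(header)) !== null) {
    links.push({ bucket: m[1], key: m[2], tag: m[3] });
  }
  return links;
}

// One link-walking step: targets of `start`'s links that carry `tag`.
function walk(graph, start, tag) {
  return (graph[start] || [])
    .filter((link) => link.tag === tag)
    .map((link) => `${link.bucket}/${link.key}`);
}

const header =
  '</riak/people>; rel="up", </riak/people/bob>; riaktag="friend", </riak/pets/rex>; riaktag="owns"';
const graph = { "people/alice": parseLinks(header) };
console.log(walk(graph, "people/alice", "friend")); // [ 'people/bob' ]
```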
  • Riak … Secondary indices, but only one index per query. Large object support (Luwak).
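Secondary indexes tag each object with extra index terms (in Riak, index names end in `_bin` for strings or `_int` for integers), and a query asks one index for an exact value or a range. The in-memory model below is a hypothetical illustration of those query semantics, not Riak's storage; combining two indexes means running two queries and intersecting the keys client-side.

```javascript
// In-memory model of secondary-index queries (hypothetical, not Riak's storage).
class SecondaryIndex {
  constructor() {
    this.entries = new Map(); // index name -> list of { value, key }
  }

  add(key, indexName, value) {
    if (!this.entries.has(indexName)) this.entries.set(indexName, []);
    this.entries.get(indexName).push({ value, key });
  }

  // Exact-match query on a single index.
  exact(indexName, value) {
    return this.select(indexName, (v) => v === value);
  }

  // Inclusive range query on a single index.
  range(indexName, low, high) {
    return this.select(indexName, (v) => v >= low && v <= high);
  }

  select(indexName, predicate) {
    const keys = (this.entries.get(indexName) || [])
      .filter((entry) => predicate(entry.value))
      .map((entry) => entry.key);
    return [...new Set(keys)].sort();
  }
}

// Made-up sample data.
const idx = new SecondaryIndex();
idx.add("stubby", "breed_bin", "terrier");
idx.add("stubby", "weight_int", 20);
idx.add("rex", "breed_bin", "beagle");
idx.add("rex", "weight_int", 12);
idx.add("fido", "breed_bin", "terrier");
idx.add("fido", "weight_int", 8);

console.log(idx.exact("breed_bin", "terrier")); // [ 'fido', 'stubby' ]
console.log(idx.range("weight_int", 10, 25)); // [ 'rex', 'stubby' ]

// Two indexes at once: query each, then intersect on the client.
const midweight = new Set(idx.range("weight_int", 10, 25));
console.log(idx.exact("breed_bin", "terrier").filter((k) => midweight.has(k))); // [ 'stubby' ]
```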
  • Riak … Comes in "open source" and "enterprise" editions
  • Riak … Full-text search, indexing, and querying with the Riak Search server (beta)
  • Riak … In the process of migrating the storage backend from "Bitcask" to Google’s "LevelDB"
  • Riak … Masterless multi-site replication and SNMP monitoring are commercially licensed
  • Riak … Best used: if you want something Cassandra-like (Dynamo-like), but there’s no way you’re gonna deal with the bloat and complexity; if you need very good single-site scalability, availability, and fault tolerance, but you’re ready to pay for multi-site replication.
  • Riak … For example: point-of-sale data collection, factory control systems, places where even seconds of downtime hurt. Could be used as an easily updatable web server.
  • Review: multi-node clustering, MapReduce processing, integrated full-text search, secondary indexing, masterless multi-site replication, large object support, implementation architecture.
  • Why use Riak: Scalable. Riak is built so you can add more capacity as your app or platform grows. When you add new machines, Riak distributes data automatically around the cluster with no downtime and a near-linear increase in performance and throughput.
  • Why use Riak: Simple Ops. Riak is the most boring database you’ll ever run in production. No sharding required, just horizontal scaling and straightforward capacity planning. The same operational tasks apply to small clusters and large clusters. More machines do not mean more ops.
  • Why use Riak: Masterless. A Riak cluster is masterless. No node is special and any node can handle requests for any other node in the cluster. All requests to Riak happen concurrently and developers don’t have to spend time worrying about read/write locks or single points of failure.
  • Why use Riak: Fault Tolerant. Decide how many replicas of the data you want (start at 3). If nodes go down, requests are routed transparently to other nodes. Riak uses proven architectural principles like hinted handoff and read repair so you can always write and read data, even in failure conditions.
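Read repair happens at read time: the coordinating node compares the replies from the replicas, returns the newest value, and writes it back to any replica that answered with an older one. A simplified sketch, assuming each reply carries a vector clock (a plain actor-to-counter object) and treating a missing object as an empty clock; real Riak also handles the concurrent case by returning siblings, which this sketch only flags. `readRepair` and the sample replies are hypothetical.

```javascript
// Simplified read-repair decision over replica replies (illustrative).

// True if clock `a` has seen every update recorded in clock `b`.
function descends(a, b) {
  return Object.keys(b).every((actor) => (a[actor] || 0) >= b[actor]);
}

// replies: [{ node, clock, value }]
function readRepair(replies) {
  // The winner is a reply whose clock descends from every other reply's clock.
  const winner = replies.find((r) => replies.every((other) => descends(r.clock, other.clock)));
  if (winner === undefined) {
    // Concurrent updates: Riak would surface siblings instead of picking one.
    return { conflict: true, value: undefined, repair: [] };
  }
  // Any replica whose clock does not descend from the winner's is stale.
  const repair = replies.filter((r) => !descends(r.clock, winner.clock)).map((r) => r.node);
  return { conflict: false, value: winner.value, repair };
}

const replies = [
  { node: "dev1", clock: { a: 2 }, value: "new" },
  { node: "dev2", clock: { a: 1 }, value: "old" },
  { node: "dev3", clock: {}, value: undefined }, // this replica missed the writes
];
console.log(readRepair(replies));
```

Here dev1's reply wins, and dev2 and dev3 are the replicas the coordinator would rewrite with the new value.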
  • Why use Riak: Complex Queries. In addition to key/value access to your data, Riak has built-in support for MapReduce, Full Text Search, and Secondary Indexes, giving developers various ways to store and query their data.
  • Why use Riak: Flexible APIs. Riak is equipped with fully-featured HTTP and Protocol Buffers APIs, with support for both of these transports in all of our supported client libraries. You can also write your own client layer to wrap our APIs if your use case or preference calls for it.
  • Why use Riak: 1000s of Users. Comcast, Yammer, Voxer, Boeing, BestBuy, SEOMoz, Joyent, Kiip, DotCloud, Formspring, GitHub, and the Danish Government are just a few of the thousands of startups and enterprises that have deployed Riak.
  • Who uses it and why?
      GitHub: git.io
      Comcast: internal object storage
      Yammer: notifications
      DISQUS: analytics
      AOL: 1.5 billion data objects per day
      Mozilla: user reviews
      Joyent, Best Buy …
  • Why use Riak: Language Support. Basho and the Riak Community maintain libraries for most major languages including Java, Node.js, Python, Ruby, PHP, C/C++, and many more. All the client code, like the core Riak code, is available on GitHub.
  • Why use Riak: Powerful Community. The Riak community is composed of smart, passionate hackers who are contributing code, support, and evangelism every day. There are hundreds of companies and individuals testing, breaking, running, and building Riak.
  • Questions?