Riak: A friendly key/value store for the web.

  • 12,968 views
Uploaded on

Talk at DevNation Portland, July 10th, 2010. …

Talk at DevNation Portland, July 10th, 2010.

Keynote available for download at:
http://dl.dropbox.com/u/458036/presentations/DevNation%20Portland.zip

Sources and additional information in the presenter notes.

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
12,968
On Slideshare
0
From Embeds
0
Number of Embeds
4

Actions

Shares
Downloads
152
Comments
0
Likes
14

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. riak A friendly key/value store for the web. ION EV NAT 010D D 2 N TLA POR A primer by Bruce Williams
  • 2. D PO EV R N TLA A N TI D O N My name is Bruce Williams. ct ed di g ad din I’m lee a nd e b th e. to e dg
  • 3. D PO EV R N TLA A N TI D O N 2001 - Present Day wa yyy before it was a viable job choice.
  • 4. D PO EV R N TLA A N TI D O N But I use other languages, too. rom . y f ms all ig ci ad es pe ar rp o the
  • 5. D PO EV R N TLA A N TI D O N Photo by oddsteph - http://flic.kr/p/6vWPBU me su of as e Le t’s on ll a is ba J av base t he ats. b Choose the Right Weapon
  • 6. D PO EV R N TLA A N TI D O N Based in the D.C. area. (but I’m not.)
  • 7. You may find the following conspicuously missing in this talk: r y! o r S
  • 8. D PO EV R N TLA I will not be A N TI D O N presenting a paper on Dynamo, the CAP theorem, vector clocks, merkle trees, etc. These are explained elsewhere by my alg orithmic betters.
  • 9. D PO EV R N TLA A N TI D O N I will not be dwelling on performance or redundancy. Expect some vague statements like “very fa st” and “very robust.”
  • 10. D PO EV R N TLA A N TI D O N I will not try to convince you that “NoSQL” is the messiah. I t’s an alternative that m akes sense in some situations.
  • 11. D PO EV R N TLA A N TI D O N I will not be conducting a large-scale comparison of competing technologies. b ut I’d love to hear abou t what you use, and why
  • 12. What is Riak?
  • 13. D PO EV R N TLA A N TI D O N NoSQL and of the Dynamo persuasion.
  • 14. D PO EV R N TLA A N TI D O N Open Source & a commercial “EnterpriseDS” version with some proprietary pieces
  • 15. D PO EV R N TLA A N TI D O N Key/Value Store With some metadata.
  • 16. D PO EV R N TLA A N TI D O N Schema-less Great for sparse data, but requires more discipline.
  • 17. D PO EV R N TLA A N TI D O N Datatype Agnostic Con tent-Type is King.
  • 18. D PO EV R N TLA A N TI D O N Language Agnostic REST & PBC Erlang, Javascript, Java, PHP, Python, Ruby, ...
  • 19. D PO EV R N TLA A N TI D O N Distributed It’s [mostly] Erlang, what did you expect?
  • 20. D PO EV R N TLA A N TI D O N Masterless All nodes are equal
  • 21. D PO EV R N TLA A N TI D O N Scalable o r “easy to scale.”
  • 22. D PO EV R N TLA A N TI D O N Eventually Consistent and CAP tunable.
  • 23. D PO EV R N TLA A N TI D O N Uses Map/Reduce and “Link.”
  • 24. Getting Up & Running
  • 25. N O TI D A N N TLA http://riak.basho.com EV R D PO
  • 26. N O TI D A N N TLA EV R D PO hg & git
  • 27. D PO EV R A Quick Local Cluster N TLA A N TI D O N $ ./riak1/bin/riak start $ ./riak2/bin/riak start $ ./riak3/bin/riak start Start three “nodes” $ ./riak2/bin/riak-admin join riak1@127.0.0.1 $ ./riak3/bin/riak-admin join riak1@127.0.0.1 Join them in to a cluster
  • 28. Your Data
  • 29. D PO EV R Object N TLA A N TI D O N Content Type Body + Links The thing you’re storing.
  • 30. D PO EV R ca Key N TLA n A N TI D de be O N au fi to ne use ge ma d o r- ne tic r ra a te lly d pic1 The identifier for the object.
  • 31. D PO EV R Bucket N TLA A N TI D O N “p thin ic wi 1” “im is ag un es iq ” pic1 ue pic2 pic3 images The type or category of object.
  • 32. D PO EV R Addressability N TLA A N TI D O N <i ma ge images s/ pi c1 > pic1 Refer to objects by bucket and key.
  • 33. D PO EV R Example N TLA A N TI D O N require 'riak' client = Riak::Client.new client.bucket('images').new('pic1').tap do |pic1| pic1.content_type = 'image/jpeg' pic1.data = File.read('/path/to/jpg') pic1.store end $g em install riak-client
  • 34. D PO EV R Example N TLA A N TI D O N client.bucket('people').new('bruce').tap do |bruce| bruce.data = { name: 'Bruce Williams', email: 'bruce@codefluency.com' } bruce.store end puts client['people']['bruce'].data['name'] “application/json” is the d efault for riak-client
  • 35. D PO EV R Links N TLA A N TI D O N st or ed images people he re pic1 bruce can also be “tagged” Connect objects
  • 36. D PO EV R Example N TLA A N TI D O N client['people']['bruce'].tap do |bruce| bruce.links << client['images']['pic1'].to_link('avatar') bruce.store end client['people']['bruce'].walk(:tag => 'avatar')
  • 37. Hooks pre-commit reject or transform an object to be committed post-commit notify external services, build your own indexe
  • 38. Where does it go?
  • 39. D PO EV R The Ring N TLA A N TI D O N A 160-bit integer space
  • 40. D PO EV R The Ring N TLA A N TI D O N broken into equal sized partitions.
  • 41. N O TI D A N N TLA EV R D PO st more functional) looks kinda like this The Ring (it’s ju It Photo by marchdoe - http://flickr.com/photos/marchdoe/457741149
  • 42. D PO EV R The Ring N TLA A N TI D O N Each partition is managed by a vnode (virtual node),
  • 43. D PO EV R The Ring N TLA A N TI D O N Each vnode runs on a [physical] node.
  • 44. D PO EV R The Ring N TLA A N TI D O N 1 2 3 4 Each node owns an equal share of vnodes (& partitions)
  • 45. D PO EV R Replication N TLA A N TI D O N 3 is th e de fa ult n_val = 3 Objects are written to multiple partitions.
  • 46. , ils N O TI D A N N TLA fa EV R ” up “2 ck Uses Hinted Handoff to deal with D PO de pi k. no s n er lac he th s W e o the th Availability node failures. 4 2 3 1
  • 47. D PO EV R Persistence N TLA A N TI D O N dets ets fs gb_trees innostore bitcask multi + Supports pluggable backends
  • 48. CAP Tuning
  • 49. D PO EV R GET N TLA A N TI D O N r how many replicas need to agree (default: 2)
  • 50. D PO EV R PUT N TLA A N TI D O N r how many replicas need to agree when retrieving an existing object before the write (default: 2) w how many replicas to write to before returning a successful response (default: 2). dw how many replicas to commit to durable storage before returning a successful response (default: 0)
  • 51. (Map|Link)*Reduce
  • 52. D PO EV R Map N TLA A N TI D O N obj [result, ...] your function Map functions take one piece of data as input, and produce zero or more results as output.
  • 53. Data-locality is important in Riak. Map phases are run where the data is stored. You can have multiple map phases. The input to a map definition is a series of [bucket, key] names. unlike CouchDB
  • 54. D PO EV R Link N TLA A N TI D O N obj [linked_obj, ...] link walk, using a pattern A special kind of map phase; links matching a pattern are “walked” to find objects to be output.
  • 55. D PO EV R Reduce N TLA A N TI D O N [obj, ...] [result] your function Reduce functions combine the output of many "map" step evaluations, into one result
  • 56. The reduce phase occurs on the “coordinating node.” Reduces may be run multiple times as more input comes in (eg, re- reduce)
  • 57. D PO EV R Example N TLA A N TI D O N bruce = client['people']['bruce'] melissa = client['people']['melissa'] lets assume these have ages addy = client['addresses'].new('123fake') addy.data = { street: '123 Fake St', city: 'Portland', state: 'OR', zip: '97214' } addy.links << bruce.to_link('resident') addy.links << melissa.to_link('resident') addy.store
  • 58. D PO EV R Example N TLA A N TI D O N Riak::MapReduce.new(client).add(addy). link(tag: 'resident'). map("function (v) { return [Riak.mapValuesJson(v)[0]['age'] || 0] }"). reduce(function: 'Riak.reduceSum', keep: true). run We should get an array with one value
  • 59. Hurdles
  • 60. D PO EV R N TLA No range queries. A N TI D O N Sorry, Cassandra fans Things like time series data require creative approaches. like bucket and key naming, etc
  • 61. D PO EV R N TLA A N Don’t list keys. TI D O N ever, if you can avoid it. Processing an entire bucket is more expensive than you might think. because it lists keys
  • 62. D PO EV R N TLA A N TI D O N Watch your encoding. MapReduce Javascript phases need your data to be in valid Unicode. you’ll get a “bad encoding” error
  • 63. sy a E Questions?
  • 64. N O TI D A N N TLA EV R D PO @wbruce Thanks!