Riak at The NYC Cloud Computing Meetup Group

An in-depth look at the NoSQL product Riak.


Transcript

  • 1. A Walk Down NOSQL Lane in the Cloud, Part 2: Riak
    NYC Cloud Computing Group, March 2011
    Alexander Sicular, @siculars
  • 2. Who is this blowhard?
    Columbia University pays my mortgage
    For the better part of a decade in Medical Informatics
    Am not shilling for any of these companies
    Am not a computer scientist
    Am a computer science enthusiast, particularly in the area of Informatics
  • 3. Riak, eh?
    Dynamo inspired
    Homogeneous
    Single key-space
    Distributed
    Replicated
    Predictable scalability
  • 4. Origins
    Show me your friends...
    Amazon’s Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
    Akamai: http://www.basho.com/bios.html
    Paramount Home Video
  • 5. CAP Theorem
    http://en.wikipedia.org/wiki/CAP_theorem
    Consistency, Availability, Partition tolerance
    Pick two? http://guide.couchdb.org/draft/consistency.html
    Riak says: pick two at a time.
  • 6. Homogeneous
    Every node is the same
    Any node can service any request
    Nodes gossip on their own port
  • 7. One Ring to Rule Them
    Single 160 bit key space
    Huh?
    No Sharding!
  • 8. Distributed (!= replicated)
    riak is not sharded
    vnodes = units of distribution
    vnodes != physical nodes (pnodes)
    vnodes map to pnodes
    data is distributed at the vnode level
    ★ Considerations:
    - must plan maximum ring size
    - think about number of vnodes per pnode
    - generally no less than 10 vnodes per pnode
  • 9. Conflict Resolution
    Vector Clocks
    ancestry / divergency maintained
    automatic or manual resolution
    ★ Considerations:
    - X-Riak-ClientId, X-Riak-Vclock
    - allow_mult
  • 10. Replicated (!= distributed)
    configurable replication values (“N”)
    configurable consistency and availability values at read and write time
    - read
    - write
    - durable write
  • 11. Predictable Scalability
    How much performance per node?
    Scale in both directions
      > bin/riak-admin
      Usage: riak-admin { join | leave | backup | restore | test | status | reip | js_reload | wait-for-service | ringready | transfers }
  • 12. Data Agnostic
    schemaless
    data objects may be of any type: binary, text (json, xml)
    use content types
      > curl -v -d "this is a test" -H "Content-Type: text/plain" http://127.0.0.1:8098/riak/testBucket/testKey
  • 13. Extra Goodies
    Erlang: http://www.pragprog.com/titles/jaerlang/programming-erlang
    Code Architecture
    basho_bench
    Multiple backends: bitcask, innodb, mem
  • 14. Code architecture
    Highly modularized: riak_core, riak_kv, bitcask, erlang_js
    http://bitbucket.org/basho
  • 15. basho_bench
    Performance profiling
    highly customizable
    pretty pictures
    key/value store generalized
    https://wiki.basho.com/display/RIAK/Benchmarking+with+Basho+Bench
    http://pics.livejournal.com/demmonoid/pic/00001sa7
  • 16. Bitcask
    Riak’s default disk backend
    Write Only Log
    Heavy updates will grow your footprint
    - Look into compaction/merging settings
    Keys are cached in memory with disk offsets
    https://spreadsheets.google.com/ccc?key=0Ak4OBkABJPsxdEowYXc2akxnYU9xNkJmbmZscnhaTFE&hl=en&authkey=CMHw8tYO
  • 17. Speak my language?
    HTTP: http://wiki.basho.com/display/RIAK/REST+API
    Protocol Buffers: http://wiki.basho.com/display/RIAK/PBC+API
    Native Erlang: http://wiki.basho.com/display/RIAK/Erlang+Client+PBC
    http://www.zazzle.com/speak_to_me_in_tagalog_tshirt-235376204895796392
  • 18. Ok sounds good. How do I get it?
      > git|hg clone http://bitbucket.org/basho/riak
      > cd riak
      > make all && make rel
    OR if you’re on a mac:
      > brew install riak
  • 19. Ok sounds good. How do I get it?
      > git|hg clone http://bitbucket.org/basho/riak_search
      > cd riak_search
      > make all && make rel
    OR if you’re on a mac:
      > brew install riak-search
  • 20. What does that get me?
    Fully functional
    Self contained (<3)
    Default configuration
    - 64 vnodes, “riak” cookie, N = 3
  • 21. Work... like so.
    Config files: http://wiki.basho.com/display/RIAK/Configuration+Files
    app.config
    - ring_creation_size
    vm.args
    - -name, -settings
  • 22. Fire it up
      > bin/riak
      Usage: riak {start|stop|restart|reboot|ping|console|attach}
      > bin/riak start
  • 23. Do Stuff!
    GET:
      > curl -v http://127.0.0.1:8098/ping
      > curl -v http://127.0.0.1:8098/stats
      > curl -v http://127.0.0.1:8098/riak/myBucket
      > curl -v http://127.0.0.1:8098/riak/myBucket/myKey
    PUT:
      > curl -v -X PUT -H "Content-Type: application/json" -d '{"backend": "ets"}' http://127.0.0.1:8098/riak/myBucket
      > curl -v -X PUT -d "test key" http://127.0.0.1:8098/riak/myBucket/myKey
      > curl -v -X POST -d "autogen key" http://127.0.0.1:8098/riak/myBucket
  • 24. Links
    Lightweight Graphing
    Practical limitations re. number of links per object
    Unidirectional object linking
    relationship modeling (one to one, one to many)
    Returns “Content-Type: multipart/mixed;”
    - Library needs to be multipart aware
    - nodejs, formidable
  • 25. Link Walking
    First level depth
      > curl http://localhost:8098/riak/myBucket/myKey/_,_,_
    Via Map/Reduce
      > curl -X POST -H "content-type:application/json" http://localhost:8098/mapred --data @-
      {"inputs":[["myBucket","myKey"]],"query":[{"link":{}},{"map":{"language":"javascript","source":"function(v){ return [v]; }"}}]}
      ^D
    N level depth
      > curl http://localhost:8098/riak/myBucket/myKey/_,_,_/_,_,_
    More Info:
    http://blog.basho.com/2010/02/24/link-walking-by-example/
    http://wiki.basho.com/display/RIAK/Links
    http://wiki.basho.com/display/RIAK/REST+API#RESTAPI-Linkwalking
  • 26. Map/Reduce
    Functions written in either Erlang or JavaScript
    Map is distributed to where the data lives
    Reduce is run on the node coordinating the M/R
    Erlang > JavaScript
    Tweak JavaScript settings in app.config
  • 27. M/R in Riak
    An input to start from
    - bucket
    - list of keys / keyfilter
    ★ keys > bucket
    possible link phase
    one or more map phases
    (many) possible reduce phase(s)
      function(v, keydata, args) {
        if (v.values) {
          var ret = [], o = {};
          o = Riak.mapValuesJson(v)[0];
          o.lastModifiedParsed = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);
          o.key = v["key"];
          ret.push(o);
          return ret;
        } else {
          return [];
        }
      };
    Map = SQL Select/Where clause
    Reduce = SQL Aggregates (SUM, COUNT, GROUP BY)
  • 28. Pre/Post Commit Hooks
    Pre Commit (JavaScript or Erlang)
    - Validation
    - Modify data
    - Kill writes
    Post Commit (Erlang)
    - Indexing
    - Messaging
  • 29. Chief complaints
    No index
    No native sort
    No increment
    No native data structures
  • 30. Riak Search
    Betalicious
    Superset of Riak
    Full text search
    http://wiki.basho.com/display/RIAK/Riak+Search
    http://www.slideshare.net/rklophaus/riak-search-erlang-factory-london-2010
    http://www.seowebworx.co.uk/
  • 31. Riak Search... more
    uses a modified bitcask backend called merge_index
    enabled on a per bucket basis
    access via http and command line
  • 32. Riak-JS
    NodeJS Riak module
    Written in Coffeescript
    HTTP and Protobuf
    Customizable via “meta” options
    http://riakjs.org
  • 33. Code demo
    nodejs
    riak-js
    redis
    simple post site
    tags
    json data passing
  • 34. Javascript Map
      var map = function(v, keydata, args) {
        if (v.values) {
          var ret = [], o = {};
          o = Riak.mapValuesJson(v)[0];
          o.key = v["key"]; // put the key in the returned data object
          o.lastModified = v["values"][0]["metadata"]["X-Riak-Last-Modified"];
          ret.push(o);
          return ret;
        } else {
          return [];
        }
      };
  • 35. Javascript Reduce
      var sortInt = function(data, args) {
        // args is optional: { field: <property to sort by>, order: "desc" }
        var sortBy = (typeof args === "undefined" || args === null) ? undefined : args.field;
        var desc = ((typeof args === "undefined" || args === null) ? undefined : args.order) === "desc";
        data.sort(function(a, b) {
          if (desc) {
            // swap the operands to reverse the sort order
            var _ref = [b, a];
            a = _ref[0];
            b = _ref[1];
          }
          return a[sortBy] - b[sortBy];
        });
        return data;
      };
  • 36. Putting it all together
      riak
        .add("bucket")
        // map function
        .map(map)
        // reduce function
        .reduce(sortInt, { field: "lastModified", order: "desc" })
        .run(function(err, response) {
          // send out an error if there is one, and stop here
          if (err) return res.simpleJSON(400, { errortxt: "mapreduce gone bad :(" });
          // otherwise send the data back...
          res.simpleJSON(200, { response: response });
        });
  • 37. Hybrid architectures are the future!
    Use tools like Redis to augment shortcomings!
  • 38. 1,456,023 Or “A Lot”
    At scale, precision does not matter in practice.
    Google
    Twitter
    http://photography.nationalgeographic.com/photography/enlarge/okavango-cape-buffalo_pod_image.html
  • 39. Google
    Look Ma! No exact counts!
  • 40. Twitter
    No Totals! No Pagination!
  • 41. Questions?
    NYC Cloud Computing Group, March 2011
    Alexander Sicular, @siculars
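
The ring slides (7, 8, 10 and 20) describe a single 160-bit key space split into vnodes, vnodes mapped onto physical nodes, and N replicas per object. The Node.js sketch below is one way to picture that, not Riak's actual implementation. Every function name here is hypothetical. The round-robin vnode assignment is a simplification standing in for Riak's real claim logic, and hashing the string `bucket + "/" + key` with SHA-1 is an assumption chosen only because SHA-1 yields 160 bits. The defaults of 64 vnodes and N = 3 come from slide 20.

```javascript
const crypto = require("crypto");

const RING_SIZE = 64;          // default vnode count from slide 20
const N = 3;                   // default replication value from slide 20
const KEY_SPACE = 2n ** 160n;  // single 160-bit key space from slide 7

// Hypothetical helper: hash a bucket/key pair to a point in the key space.
// The exact bytes Riak hashes are not reproduced here (assumption).
function ringPosition(bucket, key) {
  const hex = crypto.createHash("sha1").update(bucket + "/" + key).digest("hex");
  return BigInt("0x" + hex);
}

// Each vnode owns one equal-sized slice of the key space.
function partitionIndex(position, ringSize = RING_SIZE) {
  return Number(position / (KEY_SPACE / BigInt(ringSize)));
}

// Simplified assumption: vnodes are handed to physical nodes round-robin.
function assignVnodes(pnodes, ringSize = RING_SIZE) {
  return Array.from({ length: ringSize }, (_, i) => pnodes[i % pnodes.length]);
}

// The n replicas go to the owning vnode and the next n - 1 vnodes on the ring.
function preferenceList(bucket, key, owners, n = N) {
  const start = partitionIndex(ringPosition(bucket, key), owners.length);
  const list = [];
  for (let i = 0; i < n; i++) {
    const vnode = (start + i) % owners.length;
    list.push({ vnode: vnode, pnode: owners[vnode] });
  }
  return list;
}

// Usage: a four-node cluster with made-up node names.
const cluster = ["riak1", "riak2", "riak3", "riak4"];
const owners = assignVnodes(cluster);
console.log(preferenceList("myBucket", "myKey", owners));
```

With four physical nodes and 64 vnodes, each node carries 16 vnodes, which satisfies the "no less than 10 vnodes per pnode" guideline from slide 8. Because the ring size is fixed when the ring is created, adding a physical node only changes which pnode owns which vnodes; the key-to-vnode mapping stays the same.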