Riak is a distributed key-value store inspired by Amazon's Dynamo. It is homogeneous (every node plays the same role), has a single key space, and is distributed and replicated across nodes. Riak aims to provide predictable scalability and high availability while letting you tune the tradeoff between consistency and availability. It uses a ring topology for data distribution and vector clocks for conflict resolution. Riak supports schemaless data storage and provides links for basic graph capabilities and map/reduce functions for querying data.
1. A Walk Down NOSQL Lane in the Cloud
Part 2: Riak
NYC Cloud Computing Group, March 2011
Alexander Sicular
@siculars
2. Who is this blowhard?
Columbia University pays my mortgage
For the better part of a decade in Medical Informatics
Am not shilling for any of these companies
Am not a computer scientist
Am a computer science enthusiast, particularly in the area of Informatics
4. Origins
Show me your friends...
Amazon’s Dynamo
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Akamai
http://www.basho.com/bios.html
Paramount Home Video
5. CAP Theorem
http://en.wikipedia.org/wiki/CAP_theorem
Consistency
Availability
Partition tolerance
Pick two?
http://guide.couchdb.org/draft/consistency.html
Riak says: pick two at a time.
7. One Ring to Rule Them
Single 160 bit key space
Huh?
No Sharding!
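To make "single 160 bit key space" concrete: each bucket/key pair is hashed with SHA-1, which yields a 160-bit integer, and that ring of 2^160 positions is cut into equal partitions (vnodes). The sketch below assumes a plain `bucket/key` string as the hash input (Riak actually hashes an Erlang-encoded {Bucket, Key} term, so real positions differ) and a power-of-2 ring size such as the default 64; `ringPosition` and `partitionFor` are hypothetical helper names.

```javascript
const crypto = require("crypto");

// The ring covers every 160-bit integer: 0 .. 2^160 - 1.
const RING_TOP = 2n ** 160n;

// Hash a bucket/key pair to a position on the ring.
// ASSUMPTION: hashing the string "bucket/key"; Riak hashes an Erlang
// term_to_binary({Bucket, Key}) instead, so positions will differ.
function ringPosition(bucket, key) {
  const hex = crypto.createHash("sha1").update(`${bucket}/${key}`).digest("hex");
  return BigInt("0x" + hex); // 40 hex digits = 160 bits
}

// Map a ring position to one of `ringSize` equal partitions.
// ASSUMPTION: ringSize is a power of 2, so the ring divides evenly.
function partitionFor(position, ringSize) {
  const width = RING_TOP / BigInt(ringSize);
  return Number(position / width);
}

console.log(partitionFor(ringPosition("testBucket", "testKey"), 64));
```

Every node can compute the same answer from the key alone, which is why no separate shard map is needed.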
8. Distributed (!= replicated)
riak is not sharded
vnodes = units of distribution
vnodes != physical nodes (pnodes)
vnodes map to pnodes
data is distributed at the vnode level
★Considerations:
-must plan maximum ring size
-think about number of vnodes per pnode
-generally no less than 10 vnodes per pnode
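The considerations above come down to simple arithmetic. A sketch assuming partitions are handed to pnodes round-robin (Riak's real claim algorithm is more involved): with a 64-partition ring, keeping at least 10 vnodes per pnode caps the cluster at 6 physical nodes, which is why the ring size has to be planned for the largest cluster you expect. `maxPnodes` and `assignVnodes` are hypothetical names.

```javascript
// Largest cluster that still gives every pnode at least `minPerPnode` vnodes.
function maxPnodes(ringSize, minPerPnode = 10) {
  return Math.floor(ringSize / minPerPnode);
}

// Hand partitions 0..ringSize-1 to pnodes round-robin.
// ASSUMPTION: round-robin claim; Riak's claim algorithm differs in detail.
function assignVnodes(ringSize, pnodes) {
  const owners = new Map(pnodes.map((p) => [p, []]));
  for (let i = 0; i < ringSize; i++) {
    owners.get(pnodes[i % pnodes.length]).push(i);
  }
  return owners;
}

console.log(maxPnodes(64));  // 6
console.log(maxPnodes(512)); // 51
const owners = assignVnodes(64, ["node1", "node2", "node3", "node4"]);
console.log(owners.get("node1").length); // 16
```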
10. Replicated (!= distributed)
configurable replication values (“N”)
configurable consistency and availability values at read and write time
-read (R)
-write (W)
-durable write (DW)
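How N, R and W interact can be sketched in a few lines. For a given key, the N replicas sit on N consecutive partitions around the ring (its preference list), and R and W say how many of those replicas must answer before a read or write succeeds. A common rule of thumb: when R + W > N, every read quorum overlaps every write quorum, so a read reaches at least one replica that saw the latest acknowledged write (under normal operation; hinted handoff during failures can weaken this). The sketch assumes partitions numbered 0..ringSize-1 and that the list starts at the key's own partition; the helper names are hypothetical.

```javascript
// Preference list: N consecutive partitions, wrapping around the ring.
// ASSUMPTION: the list starts at the key's own partition.
function preferenceList(partition, ringSize, n = 3) {
  const list = [];
  for (let i = 0; i < n; i++) list.push((partition + i) % ringSize);
  return list;
}

// Does every read quorum of size R share a replica with every write quorum of size W?
function quorumsOverlap(n, r, w) {
  if (r < 1 || w < 1 || r > n || w > n) throw new RangeError("need 1 <= R, W <= N");
  return r + w > n;
}

console.log(preferenceList(62, 64));  // [ 62, 63, 0 ]
console.log(quorumsOverlap(3, 2, 2)); // true  (majority reads and writes)
console.log(quorumsOverlap(3, 1, 1)); // false (fast, but reads may be stale)
```

Picking R and W per request is how Riak lets you "pick two at a time": low values favor availability and latency, high values favor consistency.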
11. Predictable Scalability
How much performance per node?
Scale in both directions
> bin/riak-admin
> Usage: riak-admin { join | leave | backup | restore | test | status | reip | js_reload | wait-for-service | ringready | transfers }
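Scaling "in both directions" works because a join or leave moves whole vnodes instead of rehashing every key. A sketch of the idea under a simplified claim rule (not Riak's actual claim algorithm): the joining node takes one partition at a time from the currently most loaded owner until it holds its fair share, so only about 1/(nodes + 1) of the ring changes hands. `joinNode` is a hypothetical helper.

```javascript
// owners[partition] = node name. Returns new ownership and the partitions that moved.
// ASSUMPTION: simplified "take from the most loaded node" rule.
function joinNode(owners, newNode) {
  const next = owners.slice(); // leave the input ring untouched
  const nodeCount = new Set(owners).size + 1;
  const target = Math.floor(owners.length / nodeCount); // fair share for the new node
  const moved = [];
  while (moved.length < target) {
    // Count current ownership of the existing nodes.
    const counts = new Map();
    next.forEach((n) => counts.set(n, (counts.get(n) || 0) + 1));
    counts.delete(newNode);
    // Hand one partition from the most loaded node to the new node.
    const [donor] = [...counts.entries()].sort((a, b) => b[1] - a[1])[0];
    const partition = next.indexOf(donor);
    next[partition] = newNode;
    moved.push(partition);
  }
  return { owners: next, moved };
}

const ring = Array.from({ length: 64 }, (_, i) => `node${(i % 4) + 1}`);
const { owners, moved } = joinNode(ring, "node5");
console.log(moved.length); // 12 of 64 partitions move
```

Every other partition keeps its owner, so the data transferred on a join is proportional to the new node's share, not to the size of the cluster.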
12. Data Agnostic
schemaless
data objects may be of any type
binary, text (json, xml)
use content types
>curl -v -d 'this is a test' -H "Content-Type: text/plain" http://127.0.0.1:8098/riak/testBucket/testKey
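Because Riak stores the bytes you send along with the Content-Type you declared, the client decides how each value is encoded and decoded. A minimal sketch of that decision, assuming JSON for objects, text/plain for strings and application/octet-stream for binary Buffers; `encodeValue` and `decodeValue` are hypothetical helpers, not part of any Riak client.

```javascript
// Pick a body and Content-Type for a value before storing it.
// ASSUMPTION: this type mapping is illustrative, not a Riak convention.
function encodeValue(value) {
  if (Buffer.isBuffer(value)) {
    return { body: value, contentType: "application/octet-stream" };
  }
  if (typeof value === "string") {
    return { body: Buffer.from(value, "utf8"), contentType: "text/plain" };
  }
  // Everything else (objects, arrays, numbers) is stored as JSON.
  return { body: Buffer.from(JSON.stringify(value), "utf8"), contentType: "application/json" };
}

// Reverse the encoding using the Content-Type returned on a fetch.
function decodeValue(body, contentType) {
  const type = contentType.split(";")[0].trim().toLowerCase();
  if (type === "application/json") return JSON.parse(body.toString("utf8"));
  if (type.startsWith("text/")) return body.toString("utf8");
  return body; // unknown types stay as raw bytes
}

const stored = encodeValue({ name: "riak", vnodes: 64 });
console.log(stored.contentType); // application/json
console.log(decodeValue(stored.body, "application/json; charset=utf-8").vnodes); // 64
```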
16. Bitcask
Riak’s default disk backend
Append-only log (records are written once, never updated in place)
Heavy updates will grow your footprint
- Look into compaction/merging settings
All keys are held in memory with their on-disk offsets, so the key set must fit in RAM
https://spreadsheets.google.com/ccc?key=0Ak4OBkABJPsxdEowYXc2akxnYU9xNkJmbmZscnhaTFE&hl=en&authkey=CMHw8tYO
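The Bitcask bullets fit together like this: every put appends a record to the log, an in-memory directory maps each key to the offset of its newest record, and a merge rewrites only the live records. The toy class below keeps the "log" in a JavaScript array instead of files and ignores CRCs, timestamps and hint files; `ToyBitcask` is a hypothetical name, not the Bitcask API.

```javascript
// Toy model of an append-only log with an in-memory key directory.
// ASSUMPTION: an array stands in for the on-disk data files.
class ToyBitcask {
  constructor() {
    this.log = [];           // append-only list of records
    this.keydir = new Map(); // key -> offset of the newest record for that key
  }

  put(key, value) {
    this.log.push({ key, value, deleted: false }); // never overwrite in place
    this.keydir.set(key, this.log.length - 1);
  }

  get(key) {
    const offset = this.keydir.get(key);
    return offset === undefined ? undefined : this.log[offset].value; // a single lookup
  }

  delete(key) {
    this.log.push({ key, value: null, deleted: true }); // tombstone record
    this.keydir.delete(key);
  }

  // Merge: keep only the records the keydir still points at.
  merge() {
    const fresh = [];
    for (const [key, offset] of this.keydir) {
      fresh.push(this.log[offset]);
      this.keydir.set(key, fresh.length - 1);
    }
    this.log = fresh;
  }
}

const db = new ToyBitcask();
for (let i = 0; i < 1000; i++) db.put("counter", i); // heavy updates
console.log(db.log.length); // 1000
db.merge();
console.log(db.log.length, db.get("counter")); // 1 999
```

The 1000-record log for a single live key is the "footprint" growth the slide warns about, and merge is what reclaims it.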
18. Ok sounds good. How do I get it?
>git|hg clone http://bitbucket.org/basho/riak
>cd riak
>make all && make rel
OR if you’re on a mac:
>brew install riak
19. Ok sounds good. How do I get it?
>git|hg clone http://bitbucket.org/basho/riak_search
>cd riak_search
>make all && make rel
OR if you’re on a mac:
>brew install riak-search
20. What does that get me?
Fully functional
Self contained (<3)
Default configuration
-64 vnodes, “riak” cookie, N = 3
24. Links
Lightweight Graphing
Practical limitations re. number of links per object
Unidirectional object linking
relationship modeling (one to one, one to many)
Returns “Content-Type: multipart/mixed;”
- Library needs to be multipart aware
- nodejs, formidable
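Links travel in an HTTP Link header whose entries look like `</riak/bucket/key>; riaktag="tag"`, separated by commas. A sketch of a parser assuming that format; it skips entries without a riaktag (such as a `rel="up"` link back to the bucket) and does not handle commas inside quoted tags. `parseLinks` is a hypothetical helper.

```javascript
// Parse a Riak-style Link header into { bucket, key, tag } objects.
// ASSUMPTION: entries look like </riak/bucket/key>; riaktag="tag"
// and tags never contain commas.
function parseLinks(header) {
  const links = [];
  for (const part of header.split(",")) {
    const m = part.trim().match(/^<\/riak\/([^\/>]+)\/([^>]+)>;\s*riaktag="([^"]*)"$/);
    if (m) {
      links.push({
        bucket: decodeURIComponent(m[1]),
        key: decodeURIComponent(m[2]),
        tag: m[3],
      });
    }
  }
  return links;
}

const header = '</riak/people>; rel="up", </riak/people/bob>; riaktag="friend", ' +
  '</riak/people/carol>; riaktag="coworker"';
console.log(parseLinks(header).map((l) => l.key)); // [ 'bob', 'carol' ]
```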
25. Link Walking
First level depth
>curl http://localhost:8098/riak/myBucket/myKey/_,_,_
Via Map/Reduce
>curl -X POST -H "content-type:application/json" http://localhost:8098/mapred --data @-
{"inputs":[["myBucket","myKey"]],"query":[{"link":{}},{"map":{"language":"javascript","source":"function(v) { return [v]; }"}}]}
^D
N level depth
>curl http://localhost:8098/riak/myBucket/myKey/_,_,_/_,_,_
More Info:
http://blog.basho.com/2010/02/24/link-walking-by-example/
http://wiki.basho.com/display/RIAK/Links
http://wiki.basho.com/display/RIAK/REST+API#RESTAPI-Linkwalking
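Each `bucket,tag,keep` segment in those URLs is one walk step: `_` matches any bucket or tag, and keep says whether that step's results are returned (with `_`, only the last step's are). A sketch that walks a small in-memory graph using that spec syntax; the graph, the `walk` function and the keep default are modeled on the URL format above and are assumptions, not Riak's code.

```javascript
// Hypothetical in-memory graph: "bucket/key" -> outgoing links.
const graph = {
  "people/alice": [{ bucket: "people", key: "bob", tag: "friend" },
                   { bucket: "people", key: "carol", tag: "coworker" }],
  "people/bob": [{ bucket: "people", key: "dave", tag: "friend" }],
  "people/carol": [],
  "people/dave": [],
};

// Walk from bucket/key following a spec such as "_,friend,_/_,_,1".
function walk(bucket, key, spec) {
  const steps = spec.split("/").map((s) => s.split(","));
  let frontier = [`${bucket}/${key}`];
  const kept = [];
  steps.forEach(([b, tag, keep], i) => {
    const isLast = i === steps.length - 1;
    const next = [];
    for (const node of frontier) {
      for (const link of graph[node] || []) {
        if ((b === "_" || b === link.bucket) && (tag === "_" || tag === link.tag)) {
          next.push(`${link.bucket}/${link.key}`);
        }
      }
    }
    // ASSUMPTION: keep "_" means "return results only from the last step".
    if (keep === "1" || (keep === "_" && isLast)) kept.push(next);
    frontier = next;
  });
  return kept;
}

console.log(walk("people", "alice", "_,_,_"));                 // [ [ 'people/bob', 'people/carol' ] ]
console.log(walk("people", "alice", "_,friend,_/_,friend,_")); // [ [ 'people/dave' ] ]
```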
26. Map/Reduce
Functions written in either Erlang or JavaScript
Map is distributed to where the data lives
Reduce is run on the node coordinating the M/R
Erlang > JavaScript (Erlang phases skip the JavaScript VM and the JSON conversion)
Tweak JavaScript VM settings in app.config
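The placement rules above can be simulated in a few lines: each node runs the map function over only the objects it holds, the coordinator gathers the map output, and reduce runs there. Riak may call a reduce function more than once, on partial results and on its own earlier output, so reduce functions should give the same answer when re-applied; the sum below does. Node names, data and `runMapReduce` are hypothetical.

```javascript
// Hypothetical objects already spread across physical nodes.
const nodes = {
  node1: [{ key: "a", bytes: 120 }, { key: "b", bytes: 80 }],
  node2: [{ key: "c", bytes: 200 }],
  node3: [{ key: "d", bytes: 50 }, { key: "e", bytes: 50 }],
};

// Map: applied on each node to its local objects only.
const mapBytes = (obj) => [obj.bytes];

// Reduce: applied on the coordinator; safe to re-apply to its own output.
const sumReduce = (values) => [values.reduce((acc, v) => acc + v, 0)];

function runMapReduce(nodes, map, reduce) {
  // Each node maps its own objects; in Riak only these map results
  // would be sent to the coordinating node.
  const mapped = Object.values(nodes).flatMap((objs) => objs.flatMap(map));
  return reduce(mapped);
}

const result = runMapReduce(nodes, mapBytes, sumReduce);
console.log(result);            // [ 500 ]
console.log(sumReduce(result)); // [ 500 ] (re-reducing gives the same answer)
```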
27. M/R in Riak
An input to start from:
-bucket
-list of keys / keyfilter
★ keys > bucket (a whole-bucket input has to list every key in the bucket)
A possible link phase
One or more map phases
(Many) possible reduce phase(s)
    function(v, keydata, args) {
      if (v.values) {
        var ret = [], o = {};
        o = Riak.mapValuesJson(v)[0];
        o.lastModifiedParsed = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);
        o.key = v["key"];
        ret.push(o);
        return ret;
      } else {
        return [];
      }
    };
Map = SQL Select/Where clause
Reduce = SQL Aggregates (SUM, COUNT, GROUP BY)
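To make the SQL analogy concrete: the map below plays the role of `SELECT ... WHERE` and the reduce the role of `COUNT(*) ... GROUP BY`, roughly `SELECT status, COUNT(*) FROM orders WHERE total > 100 GROUP BY status`. It keeps the deck's `function(v, keydata, args)` style, but the objects are simplified stand-ins (a parsed `data` field instead of Riak's `values[0].data` JSON string), and the orders and field names are hypothetical.

```javascript
// Map ~ SELECT status ... WHERE total > args.minTotal
// ASSUMPTION: each object is { key, data } with data already parsed.
function mapBigOrders(v, keydata, args) {
  const order = v.data;
  return order.total > args.minTotal ? [{ [order.status]: 1 }] : [];
}

// Reduce ~ COUNT(*) ... GROUP BY status
// Inputs are partial counts, so re-reducing its own output is safe.
function reduceCountByStatus(values, args) {
  const counts = {};
  for (const partial of values) {
    for (const [status, n] of Object.entries(partial)) {
      counts[status] = (counts[status] || 0) + n;
    }
  }
  return [counts];
}

const orders = [
  { key: "o1", data: { status: "shipped", total: 250 } },
  { key: "o2", data: { status: "pending", total: 40 } },
  { key: "o3", data: { status: "pending", total: 180 } },
  { key: "o4", data: { status: "shipped", total: 120 } },
];
const mapped = orders.flatMap((o) => mapBigOrders(o, null, { minTotal: 100 }));
console.log(reduceCountByStatus(mapped)); // [ { shipped: 2, pending: 1 } ]
```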
28. Pre/Post Commit Hooks
Pre Commit
- JavaScript or Erlang
- Validation
- Modify data
- Kill writes
Post Commit
- Erlang
- Indexing
- Messaging
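A JavaScript pre-commit hook receives the object being written and either returns it (optionally modified) to let the write proceed, or returns `{"fail": "reason"}` to reject it. A sketch of a validation hook in that style; the required fields, the assumed object shape, and the `runPrecommit` driver that stands in for Riak are all hypothetical.

```javascript
// Validation hook: reject bodies missing required fields ("Kill writes")
// and stamp accepted ones ("Modify data").
// ASSUMPTION: object.values[0].data holds the JSON body as a string.
function validateUser(object) {
  let doc;
  try {
    doc = JSON.parse(object.values[0].data);
  } catch (e) {
    return { fail: "body is not valid JSON" };
  }
  for (const field of ["name", "email"]) {
    if (!(field in doc)) return { fail: `missing field: ${field}` };
  }
  doc.validated = true;
  object.values[0].data = JSON.stringify(doc);
  return object;
}

// Hypothetical stand-in for the store: run the hook, then write or refuse.
function runPrecommit(hook, object, store) {
  const result = hook(object);
  if (result === "fail") return { ok: false, error: "fail" };
  if (result.fail) return { ok: false, error: result.fail };
  store.set(result.key, result.values[0].data);
  return { ok: true };
}

const store = new Map();
console.log(runPrecommit(validateUser, { key: "u1", values: [{ data: '{"name":"Ann"}' }] }, store));
// { ok: false, error: 'missing field: email' }
console.log(runPrecommit(validateUser, { key: "u2", values: [{ data: '{"name":"Bo","email":"b@x.org"}' }] }, store));
// { ok: true }
```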
34. JavaScript Map
    var map = function(v, keydata, args) {
      if (v.values) {
        var ret = [], o = {};
        o = Riak.mapValuesJson(v)[0];
        o.key = v["key"]; // put the key in the returned data object
        // parse the HTTP date into milliseconds so the reduce can sort it numerically
        o.lastModified = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);
        ret.push(o);
        return ret;
      } else {
        return [];
      }
    };
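Outside Riak, a map function like this can be exercised by supplying a stand-in for the built-in `Riak.mapValuesJson` and a hand-built object in the `v` shape it expects (`key`, plus `values` entries carrying `data` and `metadata`). The stub and the sample object are assumptions that mimic that shape; the map is restated so the block runs on its own, with `Date.parse` applied to the last-modified header.

```javascript
// Minimal stand-in for Riak's built-in helper.
// ASSUMPTION: it JSON-decodes the data of each value (sibling).
const Riak = {
  mapValuesJson: (v) => v.values.map((value) => JSON.parse(value.data)),
};

// The map function from the slide.
const map = function (v, keydata, args) {
  if (v.values) {
    const o = Riak.mapValuesJson(v)[0];
    o.key = v.key;
    o.lastModified = Date.parse(v.values[0].metadata["X-Riak-Last-Modified"]);
    return [o];
  }
  return [];
};

// Hypothetical input shaped like an object handed to a map phase.
const sample = {
  bucket: "bucket",
  key: "post1",
  values: [{
    data: '{"title":"Hello"}',
    metadata: { "X-Riak-Last-Modified": "Tue, 01 Mar 2011 12:00:00 GMT" },
  }],
};

console.log(map(sample)[0].lastModified); // 1298980800000
```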
35. JavaScript Reduce
    var sortInt = function(data, args) {
      var sortBy = (typeof args === "undefined" || args === null) ? undefined : args.field;
      var desc = ((typeof args === "undefined" || args === null) ? undefined : args.order) === 'desc';
      data.sort(function(a, b) {
        if (desc) {
          // swap the operands to reverse the order
          var _ref = [b, a];
          a = _ref[0];
          b = _ref[1];
        }
        return a[sortBy] - b[sortBy];
      });
      return data;
    };
36. Putting it all together
    riak
      .add("bucket")
      // map function
      .map(map)
      // reduce function
      .reduce(sortInt, { field: "lastModified", order: "desc" })
      .run(function(err, response) {
        // send out an error if there is one, and stop there
        if (err) return res.simpleJSON(400, { errortxt: 'mapreduce gone bad :(' });
        // otherwise send the data back...
        res.simpleJSON(200, { response: response });
      });
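Under the hood, a chain like this becomes a JSON job POSTed to `/mapred`, in the same shape as the curl example on the link-walking slide: an `inputs` value plus a `query` list of phases, with JavaScript functions shipped as source strings and the reduce argument passed as `arg`. A sketch that builds such a job body from local functions; `buildJob` and the placeholder phase functions are hypothetical, and a client library may encode the job differently.

```javascript
// Build a /mapred job body from local JavaScript functions.
// ASSUMPTION: one map phase followed by one reduce phase.
function buildJob(bucket, mapFn, reduceFn, reduceArg) {
  return JSON.stringify({
    inputs: bucket, // a whole bucket; a list of [bucket, key] pairs also works
    query: [
      { map: { language: "javascript", source: mapFn.toString() } },
      { reduce: { language: "javascript", source: reduceFn.toString(), arg: reduceArg } },
    ],
  });
}

// Placeholder phases; in the slide these would be map and sortInt.
const identityMap = function (v) { return [v]; };
const passThrough = function (values, arg) { return values; };

const job = buildJob("bucket", identityMap, passThrough, { field: "lastModified", order: "desc" });
console.log(JSON.parse(job).query.map((p) => Object.keys(p)[0])); // [ 'map', 'reduce' ]
```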
38. 1,456,023 Or “A Lot”
At scale, precision does not matter in practice.
Google
Twitter
http://photography.nationalgeographic.com/photography/enlarge/okavango-cape-buffalo_pod_image.html