Riak
An awesome NoSQL data store:
• Super easy to scale up AND down
• Fault tolerant – no SPoF
• Flexible schema
• Full-text search out of the box
• Can be fixed and improved in Erlang (the Basho folks awesomely take our commits)
Riak – Basics
• Data in Riak is grouped into buckets (effectively namespaces)
• Basic operations are:
  • Get, save, delete, search, map, reduce
• Eventual consistency is managed through the N, R, and W bucket parameters
• Everything we put in Riak is JSON
• We talk to Riak through the excellent riak-js node library by Francisco Treacy
Data Model – Clips
Clip fields: title, ctime, domain, author, mentions, annotation, tags
Data Model - Clips
Clips are the gateway to all of our data
[Diagram: a Clip (key: abc) alongside its Blob (key: abc) and its Comment Cache; labels shown: <html> … </html>, Comments on Clip ‘abc’, “F1rst”, “Nice clip yo!”, “Saw this on Reddit…”]
Riak Search
• Gets many things out of Riak by something other than the primary key
• You specify a schema (the types for the fields within a JSON object)
• Works great, but with one big gotcha:
  – The index uses term-based partitioning instead of document-based partitioning
  – Implication: joins + sort + pagination suck
  – We know how to work around this
Riak Search – Querying
• Query syntax based on Lucene
• Basic query: text:funny
• Compound query: login:greg OR (login:gary AND tags:riak)
• Range query: ctime:[98685879630026 TO 98686484430026]
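The three query shapes above can be assembled from small string helpers. This is a minimal sketch only: the helper names (`term`, `time_range`, `all_of`, `any_of`) are made up for illustration and are not part of Riak or riak-js.

```python
def term(field, value):
    """Build a single field:value term, e.g. text:funny."""
    return f"{field}:{value}"


def time_range(field, start, end):
    """Build an inclusive range term, e.g. ctime:[a TO b]."""
    return f"{field}:[{start} TO {end}]"


def all_of(*clauses):
    """AND the clauses together, parenthesised so they nest safely."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses):
    """OR the clauses together."""
    return " OR ".join(clauses)


basic = term("text", "funny")
compound = any_of(term("login", "greg"),
                  all_of(term("login", "gary"), term("tags", "riak")))
ranged = time_range("ctime", 98685879630026, 98686484430026)
```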
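Clipboard App Flow (Client, node.js, Riak)
1. Client: go to clipboard.com/home
2. node.js → Riak: search clips bucket, query = login:greg
3. Riak → node.js: top 20 results
4. node.js → Client: top 20 results
5. Client: start rendering
6. For each clip:
   1. Client → node.js: API request for blob
   2. node.js → Riak: GET from blobs bucket
   3. node.js → Client: return blob to client
   4. Client: render blob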
Clipboard Queries
login:greg
mentions:greg
ctime:[98685879630026 TO 98686484430026]
(Search)
Uh oh
login:greg AND private:false
• login:greg matches only my clips
• private:false matches 20% of all clips!
login:greg AND text:iPhone
Doc Partition Query Processing
1. x AND y (sort z, start = 990, count = 10)
2. On each node:
   1. Perform x AND y
   2. Sort on z
   3. Slice [ 0 .. 1000 ]
   4. Send to aggregator
3. On aggregator:
   1. Merge all results (N x 1000)
   2. Slice [ 990 .. 1000 ]
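The steps above can be simulated in a few lines. This is a sketch under stated assumptions, not search-engine code: nodes are plain Python lists, the `matches` predicate stands in for "x AND y", and the sample data is invented.

```python
def doc_partition_query(nodes, matches, sort_key, start, count):
    """Run a paginated query over document-partitioned nodes.

    nodes: list of per-node document lists (each node holds whole documents).
    matches: predicate standing in for "x AND y".
    """
    window = start + count
    partials = []
    for docs in nodes:
        # Each node filters and sorts locally, then ships at most start+count rows.
        local = sorted((d for d in docs if matches(d)), key=sort_key)
        partials.extend(local[:window])
    # The aggregator merges at most N * (start+count) rows and takes the page.
    merged = sorted(partials, key=sort_key)
    return merged[start:start + count]


# Hypothetical data: 300 documents spread over 5 nodes, with unique z values.
all_docs = [{"id": i, "x": i % 2 == 0, "y": i % 3 == 0, "z": (i * 37) % 311}
            for i in range(300)]
nodes = [all_docs[n::5] for n in range(5)]
page = doc_partition_query(nodes, lambda d: d["x"] and d["y"],
                           lambda d: d["z"], start=10, count=5)
```

Because every node returns its own top start+count rows, the merged list is guaranteed to contain the global top start+count rows, so the final slice is correct.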
Term Partition Query Processing
1. x AND y (sort z, start = 990, count = 10)
2. On x node: search for x (and send all)
3. On y node: search for y (and send all)
4. On aggregator:
   1. Do x AND y
   2. Sort on z
   3. Slice to [ 990 .. 1000 ]
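For contrast, here is the same kind of toy simulation for term partitioning. It is a sketch with invented data: `index` maps each term to the set of matching keys held by that term's node, and `shipped` counts how many entries travel to the aggregator.

```python
def term_partition_query(index, docs, terms, sort_field, start, count):
    """Simulate a term-partitioned AND query with sort and pagination."""
    # Each term's node sends ALL of its matches to the aggregator.
    postings = [index[t] for t in terms]
    shipped = sum(len(p) for p in postings)
    # The aggregator does the AND, then must read documents to sort on the field.
    matched = set.intersection(*postings)
    ordered = sorted(matched, key=lambda key: docs[key][sort_field])
    return ordered[start:start + count], shipped


# Hypothetical data: 1000 clips, 3 of them greg's, 20% of them public.
docs = {i: {"ctime": 1000 - i} for i in range(1000)}
index = {
    "login:greg": {5, 50, 500},
    "private:false": set(range(0, 1000, 5)),
}
page, shipped = term_partition_query(
    index, docs, ["login:greg", "private:false"], "ctime", start=0, count=2)
```

Only three clips match, yet 203 index entries are shipped, which is the cost the slide is pointing at.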
Riak Search Issues
1. For any singular term, all results must be sent back to the aggregator.
2. Performs sort and slice in the wrong order (it should sort, then slice).
3. ANDs take time O(MAX(|x|, |y|)) instead of O(MIN(|x|, |y|)).
4. All matches must be read to get the sort field.
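To see why the order of sort and slice matters for pagination, here is a minimal sketch with made-up rows. It is not Riak code, and the function names are illustrative only.

```python
# Rows arrive in arbitrary order; the second element is the sort field.
rows = [("c", 3), ("a", 1), ("e", 5), ("b", 2), ("d", 4)]


def slice_then_sort(rows, start, count):
    """Take a page first, then sort only that page."""
    return sorted(rows[start:start + count], key=lambda r: r[1])


def sort_then_slice(rows, start, count):
    """Sort the full result set, then take the page."""
    return sorted(rows, key=lambda r: r[1])[start:start + count]


wrong_page = slice_then_sort(rows, 0, 2)
right_page = sort_then_slice(rows, 0, 2)
```

Slicing first yields a sorted page of whatever happened to arrive first, not the first page of the sorted results.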
Riak Search Fixes
1. Inline fields for short and common attributes.
2. Dynamic fields for precomputed ANDs.
3. PRESORT option for sorting without document reads.
Inline Fields
Nifty feature added recently to Riak Search
Fields only used to prune the result set can be made inline for a big perf win
Normal query applied first – then results filtered quickly with inline “filter” query
High storage cost – only viable for small fields!
Riak Search – Inline Fields cont.
login:greg AND private:false
becomes
Query – login:greg
Filter Query – private:false
private:false is efficiently applied only to results of login:greg. Hooray!
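One way to picture the query/filter split is a toy model where each posting carries its small inline values, so the filter runs over the query's results without a second index lookup. The data layout below is a simplifying assumption for illustration, not Riak Search's actual storage format, and `query_with_inline_filter` is a made-up name.

```python
# Toy postings list: each entry is (key, inline field values stored with it).
postings = {
    "login:greg": [
        ("k1", {"private": "false"}),
        ("k2", {"private": "true"}),
        ("k3", {"private": "false"}),
    ],
}


def query_with_inline_filter(postings, query_term, filter_field, filter_value):
    """Run the normal query, then prune its results using inline values."""
    return [key
            for key, inline in postings.get(query_term, [])
            if inline.get(filter_field) == filter_value]


public_clips = query_with_inline_filter(postings, "login:greg", "private", "false")
```

The filter only ever touches greg's three postings, never the much larger set of all public clips. Storing the inline values next to every posting is also where the storage cost comes from.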
Fixing ANDs
But what about login:greg AND text:iPhone?
The text field is too large to inline!
We had to get creative.
Dynamic Fields
Our solution: create a new field, text_u (u for user)
Values in text_u have the user’s name appended
In greg’s clip: text:iPhone → text_greg:iPhone
In bob’s clip: text:iPhone → text_bob:iPhone
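A minimal sketch of the idea, assuming each clip is a JSON-like dict with `login` and `text` fields (the field names come from the queries shown earlier; the helper names are hypothetical): the text is indexed a second time under a per-user field, so the AND becomes a single-term lookup.

```python
def add_dynamic_text_field(clip):
    """Return a copy of the clip with its text duplicated under text_<login>."""
    indexed = dict(clip)
    indexed[f"text_{clip['login']}"] = clip["text"]
    return indexed


def user_text_query(login, word):
    """Rewrite 'login:<login> AND text:<word>' as one term on the per-user field."""
    return f"text_{login}:{word}"


greg_clip = add_dynamic_text_field({"login": "greg", "text": "iPhone"})
query = user_text_query("greg", "iPhone")
```

The trade-off is extra index space, since the text is stored under two fields.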
Presort on Keys
• Our addition to the Riak code base
• Does sort before slice
• If PRESORT=key, then never reads the docs
• Tremendous win (> 100x compared to M/R approaches)
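The saving can be illustrated with a toy store that counts document reads. This is a sketch, not the actual patch: `CountingStore` and both paging functions are invented for illustration, and the keys are assumed to start with a zero-padded timestamp so that key order equals time order.

```python
class CountingStore:
    """In-memory stand-in for a bucket that counts document reads."""

    def __init__(self, docs):
        self.docs = docs
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.docs[key]


def page_by_field(store, keys, field, start, count):
    """Sort on a document field: every matching document must be read."""
    ordered = sorted(keys, key=lambda k: store.get(k)[field])
    return ordered[start:start + count]


def page_by_key(keys, start, count):
    """Sort on the key itself: no document reads at all."""
    return sorted(keys)[start:start + count]


docs = {f"{t:04d}-clip": {"ctime": t} for t in range(100)}
store = CountingStore(docs)
by_field = page_by_field(store, list(docs), "ctime", 90, 10)
reads_for_field_sort = store.reads
by_key = page_by_key(list(docs), 90, 10)
```

Both calls return the same page, but the field sort reads all 100 documents while the key sort reads none.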
Clip Keys
<Time (ms)><User (guid)><SHA1 of Value>
• Base-64 encode each component
• Only use first 4 characters of user & content
• Only 16 bytes
Collisions? 1 in 17M if clipped the same thing at the same time.
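A sketch of how such a key could be built, with several assumptions that the slide does not confirm: the timestamp is packed into 6 bytes (8 Base64 characters), the URL-safe Base64 alphabet is used, and `clip_key` is a hypothetical name. Four Base64 characters carry 24 bits, i.e. 64^4 = 16,777,216 values, which is where the "1 in 17M" figure comes from.

```python
import base64
import hashlib
import uuid


def b64(data: bytes) -> str:
    """URL-safe Base64 text for the given bytes (alphabet choice is an assumption)."""
    return base64.urlsafe_b64encode(data).decode("ascii")


def clip_key(time_ms: int, user_guid: uuid.UUID, value: bytes) -> str:
    """Build a 16-character key: 8 chars of time, 4 of user, 4 of content hash."""
    time_part = b64(time_ms.to_bytes(6, "big"))            # 6 bytes -> 8 chars
    user_part = b64(user_guid.bytes)[:4]                   # first 4 chars only
    content_part = b64(hashlib.sha1(value).digest())[:4]   # first 4 chars only
    return time_part + user_part + content_part


key = clip_key(98685879630026, uuid.UUID(int=1), b"<html>clip</html>")
```

Note that the standard Base64 alphabets are not in ASCII order, so keys built this way do not by themselves sort lexicographically by time. A key scheme that implies sort order needs an order-preserving alphabet, and the slides do not say which one was used.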
Our Query Processing
1. w AND (x AND y) (sort z, start = 990, count = 10)
2. On w_x node: search and send w_x
3. On w_y node: search and send all w_y
4. On aggregator:
   1. Do w_x AND w_y
   2. Sort on z
   3. Slice to [ 990 .. 1000 ]
Summary
• Use inline fields for short and common bits
• Use dynamic fields for prebuilt ANDs
• Use keys that imply sort order
• Use same techniques for pagination
• Our approach yields search throughput that is 100x better than out of the box (and better as you scale outward).