Riak perf wins
How the team at got more than 100x better search performance with some simple changes to riak search.


Published in: Technology
  • 1. Riak Search Performance Wins: How we got > 100x improvement in query throughput. Gary Flake, Founder
  • 2. Demo Introduction
  • 3. Architecture
      • web-01, web-02, web-03 (each Node.js + Nginx)
      • riak-01, riak-02, riak-03, riak-04, riak-05
      • cache-01, cache-02, cache-03
      • redis-01, redis-02
      • admin-01
      • thumb-01, thumb-02
      • job-01, job-02
  • 4. Riak: An awesome NoSQL data store
      • Super easy to scale up AND down
      • Fault tolerant: no SPoF
      • Flexible schema
      • Full-text search out of the box
      • Can be fixed and improved in Erlang (the Basho folks awesomely take our commits)
  • 5. Riak – Basics
      • Data in Riak is grouped into buckets (effectively namespaces)
      • Basic operations are: get, save, delete, search, map, reduce
      • Eventual consistency is managed through the N, R, and W bucket parameters
      • Everything we put in Riak is JSON
      • We talk to Riak through the excellent riak-js node library by Francisco Treacy
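As a rough illustration of how N, R, and W relate, here is a minimal sketch of the usual strict-quorum rule of thumb: a read of R replicas is guaranteed to see at least one replica touched by a write of W replicas when R + W > N. The rule is standard quorum reasoning rather than something stated on the slide, and `quorums_overlap` is a hypothetical helper, not part of Riak or riak-js.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when any R read replicas must share a replica with any W write replicas.

    n: number of replicas per object, r: replicas consulted on read,
    w: replicas that must acknowledge a write.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    # Pigeonhole argument: two subsets of sizes r and w drawn from n replicas
    # must intersect whenever r + w exceeds n.
    return r + w > n
```

For example, with N = 3, choosing R = 2 and W = 2 gives overlapping quorums, while R = 1 and W = 1 trades that overlap for lower latency.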
  • 6. Data Model – Clips
      • Fields: title, ctime, domain, author, mentions, annotation, tags
  • 7. Data Model – Clips
      • Clips are the gateway to all of our data
      • Clip (key: abc)
      • Blob (key: abc): <html> … </html>
      • Comments on Clip ‘abc’: “F1rst”, “Nice clip yo!”, “Saw this on Reddit…”
      • Comment Cache
  • 8. Other Buckets
      • Users
      • Blobs
      • Comments
      • Templates
      • Counts
      • Search Caches
      • Transactions
  • 9. Riak Search
      • Gets many things out of Riak by something other than the primary key.
      • You specify a schema (the types for the fields within a JSON object).
      • Works great but with one big gotcha:
          – Index uses term-based partitioning instead of document-based partitioning
          – Implication: joins + sort + pagination sucks
          – We know how to work around this
  • 10. Riak Search – Querying
      • Query syntax based on Lucene
      • Basic query: text:funny
      • Compound query: login:greg OR (login:gary AND tags:riak)
      • Range query: ctime:[98685879630026 TO 98686484430026]
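The three query shapes above can be assembled from small string helpers. This is only a sketch: `term`, `all_of`, and `range_query` are hypothetical names, and the helpers do no escaping of special characters, which a real client would need.

```python
def term(field: str, value: str) -> str:
    """Build a basic field:value query, e.g. text:funny."""
    return f"{field}:{value}"


def all_of(*clauses: str) -> str:
    """Join clauses with AND and wrap them in parentheses."""
    return "(" + " AND ".join(clauses) + ")"


def range_query(field: str, low: int, high: int) -> str:
    """Build an inclusive range query, e.g. ctime:[1 TO 9]."""
    return f"{field}:[{low} TO {high}]"


# The compound query from the slide, assembled from the helpers.
compound = term("login", "greg") + " OR " + all_of(term("login", "gary"), term("tags", "riak"))
```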
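  • 11. Clipboard App Flow (Client, node.js, Riak)
      1. Client → node.js: Go to
      2. node.js → Riak: Search clips bucket, query = login:greg
      3. Riak → node.js: Top 20 results
      4. node.js → Client: Top 20 results; client starts rendering
      5. (For each clip) Client → node.js: API request for blob
      6. node.js → Riak: GET from blobs bucket
      7. node.js → Client: Return blob to client; client renders blob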
  • 12. Clipboard Queries
      • login:greg
      • mentions:greg
      • ctime:[98685879630026 TO 98686484430026]
  • 13. Clipboard Queries cont.
      • login:greg AND tags:riak
      • login:greg AND text:node AND text:javascript
  • 14. Uh oh
      • login:greg AND private:false
      • login:greg AND text:iPhone
      • One term matches only my clips; the other matches 20% of all clips!
  • 15. Index Partitioning Schemes
  • 16. Doc Partition Query Processing
      1. x AND y (sort z, start = 990, count = 10)
      2. On each node:
          1. Perform x AND y
          2. Sort on z
          3. Slice [ 0 .. 1000 ]
          4. Send to aggregator
      3. On aggregator:
          1. Merge all results (N x 1000)
          2. Slice [ 990 .. 1000 ]
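The document-partitioned steps above can be simulated in a few lines. This is a toy, in-memory sketch and not how any particular search engine implements it; `doc_partition_query`, the `nodes` list, and the `matches` predicate are illustrative assumptions.

```python
import heapq
from typing import Callable, Dict, List

Doc = Dict[str, object]


def doc_partition_query(
    nodes: List[List[Doc]],
    matches: Callable[[Doc], bool],
    sort_field: str,
    start: int,
    count: int,
) -> List[Doc]:
    """Each node filters, sorts, and slices locally; the aggregator merges and slices again."""
    limit = start + count
    partials = []
    for docs in nodes:
        # On each node: evaluate the whole query, sort on the sort field, keep the top `limit`.
        hits = sorted((d for d in docs if matches(d)), key=lambda d: d[sort_field])
        partials.append(hits[:limit])
    # On the aggregator: merge at most len(nodes) * limit already-sorted results.
    merged = list(heapq.merge(*partials, key=lambda d: d[sort_field]))
    return merged[start:limit]
```

The point of the scheme is that each node ships at most start + count results, no matter how many documents match.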
  • 17. Term Partition Query Processing
      1. x AND y (sort z, start = 990, count = 10)
      2. On x node: search for x (and send all)
      3. On y node: search for y (and send all)
      4. On aggregator:
          1. Do x AND y
          2. Sort on z
          3. Slice to [ 990 .. 1000 ]
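For comparison, here is a toy sketch of the term-partitioned steps above. The `postings` map (term to set of document keys) stands in for the per-term nodes, and `docs` stands in for the document store; both, along with the function name, are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def term_partition_query(
    postings: Dict[str, Set[str]],
    docs: Dict[str, Dict[str, int]],
    terms: List[str],
    sort_field: str,
    start: int,
    count: int,
) -> Tuple[List[str], int]:
    """Return the requested page of keys plus the number of postings shipped to the aggregator."""
    # Each term's node sends its complete posting list, however large.
    shipped = [postings.get(t, set()) for t in terms]
    # The aggregator performs the AND itself.
    matched = set.intersection(*shipped) if shipped else set()
    # Sorting requires reading every matching document to fetch the sort field.
    ordered = sorted(matched, key=lambda key: docs[key][sort_field])
    return ordered[start:start + count], sum(len(s) for s in shipped)
```

The second return value makes the cost visible: it grows with the size of every posting list involved, not with the size of the page requested.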
  • 18. Riak Search Issues
      1. For any single term, all results must be sent back to the aggregator.
      2. Performs sort and slice in the wrong order (slices, then sorts).
      3. ANDs take time O(MAX(|x|, |y|)) instead of O(MIN(|x|, |y|)).
      4. All matches must be read to get the sort field.
  • 19. Riak Search Fixes
      1. Inline fields for short and common attributes.
      2. Dynamic fields for precomputed ANDs.
      3. PRESORT option for sorting without document reads.
  • 20. Inline Fields
      • Nifty feature added recently to Riak Search
      • Fields only used to prune the result set can be made inline for a big perf win
      • Normal query applied first, then results filtered quickly with inline “filter” query
      • High storage cost: only viable for small fields!
  • 21. Riak Search – Inline Fields cont.
      • login:greg AND private:false becomes:
          – Query: login:greg
          – Filter query: private:false
      • private:false is efficiently applied only to the results of login:greg. Hooray!
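The idea can be sketched as follows, under the assumption that each posting for the main query term carries a small copy of the inline field values. The data layout and the name `query_with_inline_filter` are illustrative, not Riak Search internals.

```python
from typing import Dict, List, Tuple

# Each posting is (document key, inline field values stored alongside the posting).
Posting = Tuple[str, Dict[str, str]]


def query_with_inline_filter(
    postings: Dict[str, List[Posting]],
    query_term: str,
    filter_field: str,
    filter_value: str,
) -> List[str]:
    """Run the normal query first, then prune its results using the inline values."""
    results = postings.get(query_term, [])
    # The filter only touches the results of the main query; the (possibly huge)
    # posting list for the filter term itself is never fetched.
    return [key for key, inline in results if inline.get(filter_field) == filter_value]
```

Storing the inline values with every posting is what makes the storage cost high, which is why the slide limits the technique to small fields.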
  • 22. Fixing ANDs
      • But what about login:greg AND text:iPhone?
      • text field is too large to inline!
      • We had to get creative.
  • 23. Dynamic Fields
      • Our solution: create a new field, text_u (u for user)
      • Values in text_u have the user’s name appended
      • In greg’s clip: text:iPhone → text_greg:iPhone
      • In bob’s clip: text:iPhone → text_bob:iPhone
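A minimal sketch of the precomputed AND, following the text_greg:iPhone form shown above. It assumes each clip is a JSON-like dict with `login` and `text` fields; the function names are hypothetical.

```python
from typing import Dict


def add_user_text_field(clip: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the clip with an extra per-user text field to be indexed."""
    indexed = dict(clip)
    # The user's name is baked into the field name at write time, so the
    # index already contains the result of "login AND text".
    indexed[f"text_{clip['login']}"] = clip["text"]
    return indexed


def user_text_query(login: str, word: str) -> str:
    """Rewrite login:<login> AND text:<word> into a single-term query."""
    return f"text_{login}:{word}"
```

With this in place, a query such as login:greg AND text:iPhone becomes the single term text_greg:iPhone, whose posting list contains only greg's clips.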
  • 24. Presort on Keys
      • Our addition to the Riak code base.
      • Does sort before slice
      • If PRESORT=key, then never reads the docs
      • Tremendous win (> 100x compared to M/R approaches)
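The effect of sorting on keys can be sketched without any document store at all, which is the point. This is an illustration of the idea only, not the code added to Riak; `presort_by_key` is a hypothetical name.

```python
from typing import Iterable, List


def presort_by_key(matching_keys: Iterable[str], start: int, count: int) -> List[str]:
    """Sort matching keys, then slice; no document is read at any point."""
    # Sorting happens before slicing, so the page is taken from the full ordering.
    ordered = sorted(matching_keys)
    return ordered[start:start + count]
```

This only gives a useful ordering if the keys themselves encode the sort field, for example a timestamp prefix.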
  • 25. Clip Keys: <Time (ms)><User (guid)><SHA1 of Value>
      • Base-64 encode each component
      • Only use first 4 characters of user & content
      • Only 16 bytes
      • Collisions? 1 in 17M if clipped the same thing at same time.
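One way such a key could be built is sketched below. Several details are assumptions rather than facts from the slides: the 8-character time component (inferred from 16 bytes minus two 4-character parts), the order-preserving 64-character alphabet (standard Base64 does not sort in numeric order), and the names `encode_fixed` and `clip_key`. The collision figure matches the arithmetic: 4 characters of a 64-symbol alphabet give 64^4 = 16,777,216 values, roughly 17M.

```python
import hashlib

# Assumed alphabet: 64 symbols listed in ascending ASCII order, so that
# comparing encoded strings gives the same result as comparing the numbers.
ALPHABET = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"


def encode_fixed(value: int, width: int) -> str:
    """Encode a non-negative integer as exactly `width` base-64 characters."""
    chars = []
    for _ in range(width):
        value, digit = divmod(value, 64)
        chars.append(ALPHABET[digit])
    if value:
        raise ValueError("value does not fit in the requested width")
    return "".join(reversed(chars))


def clip_key(time_ms: int, user_guid: bytes, content: bytes) -> str:
    """Build a 16-character key: 8 chars of time, 4 of user, 4 of content hash."""
    time_part = encode_fixed(time_ms, 8)
    # 3 bytes = 24 bits = 4 base-64 characters for each of the short components.
    user_part = encode_fixed(int.from_bytes(user_guid[:3], "big"), 4)
    digest = hashlib.sha1(content).digest()
    content_part = encode_fixed(int.from_bytes(digest[:3], "big"), 4)
    return time_part + user_part + content_part
```

Because the time component leads and is fixed width, sorting keys as strings orders clips by creation time, which is what a key-based presort relies on.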
  • 26. Our Query Processing
      1. w AND (x AND y) (sort z, start = 990, count = 10)
      2. On w_x node: search and send w_x
      3. On w_y node: search and send all w_y
      4. On aggregator:
          1. Do w_x AND w_y
          2. Sort on z
          3. Slice to [ 990 .. 1000 ]
  • 27. Summary
      • Use inline fields for short and common bits
      • Use dynamic fields for prebuilt ANDs
      • Use keys that imply sort order
      • Use same techniques for pagination
      • Our approach yields search throughput that is 100x better than out of the box (and better as you scale outward).
  • 28. Questions?
  • 29. We’re hiring! Invitation Code: just4u Or talk to us right now! Thanks!