Riak Search - Berlin Buzzwords 2010

2,672 views

Published on

Riak Search is a distributed data indexing and search platform built on top of Riak. The talk will introduce Riak Search, covering overall goals, architecture, and core functionality.

Published in: Technology, Business
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,672
On SlideShare
0
From Embeds
0
Number of Embeds
18
Actions
Shares
0
Downloads
21
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

Riak Search - Berlin Buzzwords 2010

  1. 1. Riak Search A Full-Text Search and Indexing Engine based on Riak Berlin Buzzwords· June 2010 Basho Technologies Rusty Klophaus - @rklophaus
  2. 2. Why did we build it? What are the major goals? How does it work? 2
  3. 3. Part One Why did we build Riak Search? 3
  4. 4. Riak is a scalable, highly-available, networked, open-source key/value store. 4
  5. 5. Writing to a Key/Value Store Key/Value CLIENT RIAK 5
  6. 6. Writing to a Key/Value Store Object CLIENT RIAK 6
  7. 7. Querying a Key/Value Store Key Object CLIENT RIAK 7
  8. 8. Querying Riak via LinkWalking Key + Instructions Walk to Related Keys Object(s) CLIENT RIAK 8
  9. 9. Querying Riak via Map/Reduce Key(s) + JS Functions Map Map Reduce Computed Value(s) CLIENT RIAK 9
  10. 10. Key/Value Stores like Key-Based Queries 10
  11. 11. Query by Secondary Index where Category == "Shoes" WTF!? I'm a KV store! CLIENT RIAK 11
  12. 12. Full-Text Query "Converse AND Shoes" This is getting old. CLIENT RIAK 12
  13. 13. These kinds of queries need an Index. *Market Opportunity!* 13
  14. 14. Part Two What are the major goals of Riak Search? 14
  15. 15. An application built on Riak. Your Riak Application 15
  16. 16. Hrm... I need an index. Your Riak Application Index Object 16
  17. 17. Hrm... I need an index with more features. Your ??? Riak Application 17
  18. 18. Lucene should do the trick... Your Lucene Riak Application 18
  19. 19. ...shard to add more storage capacity... Your Application Lucene Lucene Lucene Riak 19
  20. 20. ...replicate to add more throughput. Lucene Lucene Lucene Your Application Lucene Lucene Lucene Riak Lucene Lucene Lucene 20
  21. 21. ...replicate to add more throughput. Lucene Lucene Lucene Your Application Lucene Lucene Lucene Riak Lucene Lucene Lucene Operations nightmare! 21
  22. 22. What do we really want? Your Riak-ified Riak Application Lucene 22
  23. 23. What do we really want? Your Riak Riak Application Search 23
  24. 24. Functionality? Be like Lucene (and more). • Lucene Syntax • Leverages Java Lucene Analyzers • Solr Endpoints • Integration via Riak Post-Commit Hook (Index) • Integration via Riak Map/Reduce (Query) • Near-Realtime • Schema-less 24
  25. 25. Operations? Be like Riak. • No special nodes • Add nodes, get more compute and storage • Automatically load balance • Replicas for durability and performance • Index and query in parallel • Swappable storage backends 25
  26. 26. Part Three How do we do it? 26
  27. 27. A Gentle Introduction to Document Indexing 27
  28. 28. The Inverted Index Document Inverted Index day, 1 dog, 1 #1 Every dog has his day. every, 1 has, 1 his, 1 28
  29. 29. The Inverted Index Documents Combined Inverted Index and, 4 #1 Every dog has his day. bag, 3 bark, 2 bite, 2 The dog's bark #2 cat, 3 is worse than his bite. cat, 4 day, 1 dog, 1 #3 Let the cat out of the bag. dog, 2 dog, 4 every, 1 #4 It's raining cats and dogs. has, 1 ... 29
  30. 30. At Query Time... "dog AND cat" AND dog cat 30
  31. 31. At Query Time... AND dog cat dog, 1 cat, 3 dog, 2 cat, 4 dog, 4 31
  32. 32. At Query Time... Result: 4 AND (Merge Intersection) 1 3 2 4 4 32
  33. 33. At Query Time... Result: 1, 2, 3, 4 OR (Merge Union) 1 3 2 4 4 33
  34. 34. Complex Behavior from Simple Structures 34
  35. 35. Storage Approaches... 35
  36. 36. Riak Search uses Consistent Hashing to store data on Partitions 36
  37. 37. Introduction to Consistent Hashing and Partitions Partitions = 10 Number of Nodes = 5 Partitions per Node = 2 Replicas (NVal) = 2 37
  38. 38. Introduction to Consistent Hashing and Partitions Object 38
  39. 39. Document Partitioning vs. Term Partitioning 39
  40. 40. ...and the Resulting Tradeoffs 40
  41. 41. Document Partitioning @ Index Time #1 Every dog has his day. 41
  42. 42. Document Partitioning @ Query Time "dog OR cat" 42
  43. 43. Term Partitioning @ Index Time day, 1 dog, 1 #1 Every dog has his day. every, 1 has, 1 his, 1 43
  44. 44. Term Partitioning @ Index Time dog, 1 day, 1 has, 1 his, 1 every, 1 44
  45. 45. Term Partitioning @ Query Time "dog OR cat" 45
  46. 46. Tradeoffs... Document Partitioning Term Partitioning + Lower Latency Queries - Higher Latency Queries - Lower Throughput + Higher Throughput - Lots of Disk Seeks - Hotspots in Ring (the "Obama" problem) 46
  47. 47. Riak Search: Term Partitioning Term-partitioning is the most viable approach for our beta clients’ needs: high throughput on Really Big Datasets. Optimizations: • Term splitting to reduce hot spots • Bloom filters & caching to save query-time bandwidth • Batching to save query-time & index-time bandwidth Support for either approach eventually. 47
  48. 48. Part Four Review 48
  49. 49. Riak Search turns this... "Converse AND Shoes" WTF!? I'm a KV store! CLIENT RIAK 49
  50. 50. ...into this... "Converse AND Shoes" Gladly! CLIENT RIAK 50
  51. 51. ...into this... "Converse AND Shoes" Keys or Objects CLIENT RIAK 51
  52. 52. ...while keeping operations easy. Your Riak Riak Application Search 52
  53. 53. Thanks! Questions? Search Team: John Muellerleile - @jrecursive Rusty Klophaus - @rklophaus Kevin Smith - @kevsmith Currently working with a small set of Beta users. Open-source release planned for Q3. www.basho.com

×