Riak Search
the next generation
Tuesday, September 17, 13
tsantero
@basho.com

2.0 coming soon...

the history of Riak Search

home grown full-text search

lucene

SCALE

Naive Hashing
NODE # = HASH(KEY) % NUM_NODES
NH(Ka) = 0
NH(Kb) = 1
NH(Kc) = 2
NH(Kd) = 0
...

Naive Hashing
[diagram: keys Ka to Kr spread across NODE 0, NODE 1, NODE 2]

Naive Hashing
[diagram: the same keys Ka to Kr spread across NODE 0, NODE 1, NODE 2 and an added NODE 3]

Naive Hashing
K * (NN - 1) / NN => K
• K = # OF KEYS
• NN = # OF NODES
• AS NN GROWS, THE FACTOR (NN - 1) / NN ESSENTIALLY BECOMES 1, THUS NEARLY ALL KEYS MOVE
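
A rough way to check the K * (NN - 1) / NN estimate is to hash a batch of keys, apply HASH(KEY) % NUM_NODES before and after a node is added, and count how many keys change node. This is only a sketch: SHA-1 as the hash and the key names are assumptions, not something the slides specify.

```python
import hashlib

def naive_node(key: str, num_nodes: int) -> int:
    """NODE # = HASH(KEY) % NUM_NODES (SHA-1 is an assumed hash choice)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes

def fraction_moved(keys, old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose node changes when the cluster is resized."""
    moved = sum(
        1 for k in keys if naive_node(k, old_nodes) != naive_node(k, new_nodes)
    )
    return moved / len(keys)

keys = [f"key-{i}" for i in range(10_000)]  # hypothetical keys
moved_3_to_4 = fraction_moved(keys, 3, 4)
print(f"3 -> 4 nodes: {moved_3_to_4:.2%} of keys move")
```

With a well-mixed hash, growing from 3 to 4 nodes should move roughly three quarters of the keys, which is (NN - 1) / NN for NN = 4.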

Consistent Hashing
PARTITION # = HASH(KEY) % PARTITIONS
• # PARTITIONS REMAINS CONSTANT
• KEY ALWAYS MAPS TO SAME PARTITION
• NODES OWN PARTITIONS
• PARTITIONS CONTAIN KEYS
• EXTRA LEVEL OF INDIRECTION

Consistent Hashing
[diagram: partitions P1 to P9 owned by NODE 0, NODE 1, NODE 2, holding keys Ka to Kr]

Consistent Hashing
[diagram: the same partitions and keys with NODE 3 added]

Consistent Hashing
NN * K/Q => K/Q
• K = # OF KEYS
• NN = # OF NODES
• Q = # OF PARTITIONS
• AS K GROWS NN BECOMES CONSTANT, THUS K/Q KEYS MOVE
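
The partition indirection can be sketched as two lookups: key to partition (fixed), then partition to node (changes when the cluster does). The code below is a minimal illustration under assumptions: Q = 9 to match P1 to P9 on the slides, SHA-1 as the hash, a round-robin starting layout, and an arbitrary choice of which two partitions the new node claims.

```python
import hashlib

Q = 9  # number of partitions, fixed for the life of the cluster

def partition_for(key: str) -> int:
    """PARTITION # = HASH(KEY) % PARTITIONS (SHA-1 is an assumed hash choice)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % Q

def transfer(owners: dict, partitions: set, new_node: int) -> dict:
    """Return a new ownership table with the given partitions handed to new_node."""
    return {p: (new_node if p in partitions else n) for p, n in owners.items()}

keys = [f"key-{i}" for i in range(9_000)]      # hypothetical keys
before = {p: p % 3 for p in range(Q)}          # 3 nodes, 3 partitions each
claimed = {0, 4}                               # arbitrary partitions for NODE 3
after = transfer(before, claimed, new_node=3)

# A key moves only if the partition it lives in changed owner.
moved = [k for k in keys if before[partition_for(k)] != after[partition_for(k)]]
print(f"{len(moved) / len(keys):.2%} of keys move")
```

Each transferred partition carries about K / Q keys, and keys in partitions that did not change owner stay where they are.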

Consistent Hashing
uniform distribution
logical vs physical
partitioning scheme
even division of keys

the future of Riak Search

persistence
distributing Solr
querying
indexing

Each Riak node runs an instance of Solr

Solr index = Riak bucket
document = RObj value
plaintext, JSON, XML
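
One way to picture "document = RObj value" is a small extractor that turns a stored value into flat field/value pairs an index could accept. The sketch below is an assumption-heavy illustration, not the actual extractor code: the function names, the `text` field for plaintext, and the dotted names for nested JSON are all chosen here for the example, and XML is left out for brevity.

```python
import json

def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested JSON objects into dotted field names (assumed convention)."""
    fields = {}
    for name, value in obj.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            fields.update(flatten(value, path))
        else:
            fields[path] = value
    return fields

def extract_fields(value: bytes, content_type: str) -> dict:
    """Map an object's raw value to index fields based on its content type."""
    if content_type == "text/plain":
        return {"text": value.decode("utf-8")}
    if content_type == "application/json":
        # Assumes the top-level JSON value is an object.
        return flatten(json.loads(value))
    raise ValueError(f"no extractor for {content_type}")

doc = extract_fields(b'{"name": {"first": "Tom"}, "age": 30}', "application/json")
print(doc)
```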

Distributed Searching in Solr
• query
• faceting
• highlighting
• stats
• spell check
• term vectors

SolrCloud

Harvest vs Yield

A better measure of Availability

Yield = Queries Issued / Queries Offered

Harvest = Data Available / Total Dataset
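
The two ratios are easy to compute side by side. In this sketch the numerator of yield is read as the queries that actually got an answer, which is an interpretation of the slide's wording, and the node and document counts are made up. It models four index nodes with no replicas, one of them unreachable.

```python
def yield_ratio(answered: int, offered: int) -> float:
    """Yield: fraction of offered queries that received an answer."""
    return answered / offered

def harvest_ratio(available: int, total: int) -> float:
    """Harvest: fraction of the total dataset reflected in an answer."""
    return available / total

docs_per_node = [250, 250, 250, 250]    # hypothetical index sizes
reachable = [True, True, False, True]   # one node is down

available_docs = sum(n for n, up in zip(docs_per_node, reachable) if up)
query_yield = yield_ratio(answered=1_000, offered=1_000)
query_harvest = harvest_ratio(available_docs, sum(docs_per_node))
print(query_yield, query_harvest)  # prints 1.0 0.75
```

Every query is still answered, so yield stays at 1.0, but each answer reflects only three quarters of the data.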

Harvest
Yield

Manage Harvest by storing Index Replicas

Term vs Document Partitioning Schemes

Term Based Partitioning
[diagram: Node 0, Node 1, Node 2]

Document Based Partitioning
[diagram: Node 0, Node 1, Node 2]
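
The difference between the two schemes shows up in where a posting list lives and how many nodes a query has to touch. The following is a toy sketch with made-up documents, three nodes, and CRC32 as an assumed way to place terms; it is not how any particular search engine lays out its index.

```python
import zlib
from collections import defaultdict

NUM_NODES = 3
DOCS = {  # hypothetical documents
    1: "riak search scale",
    2: "solr search index",
    3: "riak solr yokozuna",
    4: "index replicas",
}

def build_index(docs: dict) -> dict:
    """Inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

def term_node(term: str) -> int:
    return zlib.crc32(term.encode("utf-8")) % NUM_NODES

def term_partitioned(docs: dict) -> list:
    """Each node holds the complete posting list for a subset of terms."""
    nodes = [dict() for _ in range(NUM_NODES)]
    for term, postings in build_index(docs).items():
        nodes[term_node(term)][term] = postings
    return nodes

def doc_partitioned(docs: dict) -> list:
    """Each node indexes only its own subset of documents."""
    shards = [dict() for _ in range(NUM_NODES)]
    for doc_id, text in docs.items():
        shards[doc_id % NUM_NODES][doc_id] = text
    return [build_index(shard) for shard in shards]

def query_term_partitioned(nodes: list, term: str) -> set:
    # One node owns the term, so a single-term query touches one node.
    return set(nodes[term_node(term)].get(term, set()))

def query_doc_partitioned(nodes: list, term: str) -> set:
    # Any node may hold matches, so the query fans out and results are merged.
    hits = set()
    for index in nodes:
        hits |= index.get(term, set())
    return hits

by_term = term_partitioned(DOCS)
by_doc = doc_partitioned(DOCS)
print(query_term_partitioned(by_term, "riak"), query_doc_partitioned(by_doc, "riak"))
```

Both layouts return the same hits. What changes is the fan-out at query time and which node has to be updated when a document is written.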

Replicas
[diagram: Node 0, Node 1, Node 2]

Quorums
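
The slide only names the topic, so the following assumes the usual N/R/W formulation: N replicas, R replies required for a read, W acknowledgements required for a write. Under that assumption, a brute-force check over every possible read set and write set shows when the two are guaranteed to share a replica.

```python
from itertools import combinations

def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every R-sized read set shares a replica with every W-sized write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )

print(quorums_always_overlap(3, 2, 2))  # prints True
print(quorums_always_overlap(3, 1, 1))  # prints False
```

The exhaustive check agrees with the shortcut R + W > N: when it holds, a read always reaches at least one replica that saw the latest acknowledged write.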

Concurrency => Siblings
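
A compact way to see why concurrency produces siblings is to track causality with vector clocks: a write replaces the values it has seen and is kept alongside the ones it has not. This is a simplified sketch with clocks as plain dicts of actor to counter; the helper names are hypothetical and it is not Riak's implementation.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def reconcile(stored: list, incoming: tuple) -> list:
    """Apply a write to a list of (clock, value) siblings and return the new list."""
    in_clock, _ = incoming
    if any(descends(clock, in_clock) for clock, _ in stored):
        return stored  # stale or duplicate write, already covered
    # Drop every stored value the incoming write has seen; keep the rest as siblings.
    survivors = [(c, v) for c, v in stored if not descends(in_clock, c)]
    return survivors + [incoming]

state = [({"a": 1}, "v1")]
# Two clients read v1 and write back without seeing each other.
state = reconcile(state, ({"a": 1, "x": 1}, "from-x"))
state = reconcile(state, ({"a": 1, "y": 1}, "from-y"))
siblings = [value for _, value in state]
print(siblings)  # prints ['from-x', 'from-y']

# A later write that has seen both siblings resolves them.
state = reconcile(state, ({"a": 1, "x": 2, "y": 1}, "merged"))
print([value for _, value in state])  # prints ['merged']
```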

Read Repair
(Anti-Entropy)
[diagram: three replicas]
[diagram: three replicas, one marked X]
[diagram: two rows of three replicas]
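
Read repair can be sketched as: read every replica, pick the newest copy, and write it back to any replica that is stale or missing the key. To keep the example short it assumes a plain integer version per value, whereas a real system would compare vector clocks, and the replica layout is made up.

```python
def read_with_repair(replicas: list, key: str):
    """Read key from all replicas, repair stale copies, and return the newest value.

    Each replica is a dict mapping key -> (version, value).
    """
    replies = [(i, replica.get(key)) for i, replica in enumerate(replicas)]
    found = [reply for _, reply in replies if reply is not None]
    if not found:
        return None
    newest = max(found, key=lambda reply: reply[0])
    for i, reply in replies:
        if reply != newest:
            replicas[i][key] = newest  # repair the stale or missing copy
    return newest[1]

replicas = [
    {"k": (2, "new")},
    {"k": (1, "old")},   # stale copy
    {},                  # missed the write entirely
]
result = read_with_repair(replicas, "k")
print(result)  # prints new
```

After the read, all three replicas hold version 2, so the repair happens as a side effect of normal traffic and only for keys that are actually read.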

Active Anti-Entropy
(self healing clusters)

real-time updates
persistent
non-blocking
disk-based

[diagram legend] = hashes marked “dirty”

[diagram legend] = keys to read-repair
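
The two legends suggest a hash-tree exchange: writes mark hashes dirty, dirty hashes are recomputed later, and comparing trees between replicas narrows down the keys that need read repair. Below is a minimal in-memory sketch of that idea with a two-level tree. The segment count, class name, and hashing details are assumptions for illustration, and nothing here is persisted to disk.

```python
import hashlib

SEGMENTS = 8  # assumed small segment count for illustration

def _h(data: str) -> str:
    return hashlib.sha1(data.encode("utf-8")).hexdigest()

class HashTree:
    def __init__(self):
        self.segments = [dict() for _ in range(SEGMENTS)]  # key -> hash of value
        self.segment_hashes = [_h(repr([]))] * SEGMENTS
        self.dirty = set()

    def _segment_of(self, key: str) -> int:
        return int(_h(key), 16) % SEGMENTS

    def insert(self, key: str, value: str) -> None:
        seg = self._segment_of(key)
        self.segments[seg][key] = _h(value)
        self.dirty.add(seg)  # segment hash is stale until update() runs

    def update(self) -> None:
        """Recompute only the segment hashes that were marked dirty."""
        for seg in self.dirty:
            self.segment_hashes[seg] = _h(repr(sorted(self.segments[seg].items())))
        self.dirty.clear()

    def root(self) -> str:
        return _h("".join(self.segment_hashes))

def keys_to_repair(a: HashTree, b: HashTree) -> set:
    """Compare two trees top-down and return the keys whose hashes differ."""
    a.update()
    b.update()
    if a.root() == b.root():
        return set()
    diff = set()
    for seg in range(SEGMENTS):
        if a.segment_hashes[seg] != b.segment_hashes[seg]:
            sa, sb = a.segments[seg], b.segments[seg]
            diff |= {k for k in sa.keys() | sb.keys() if sa.get(k) != sb.get(k)}
    return diff

left, right = HashTree(), HashTree()
for i in range(100):
    left.insert(f"key-{i}", f"value-{i}")
    if i != 7:                                   # right replica missed key-7
        right.insert(f"key-{i}", "stale" if i == 42 else f"value-{i}")
repair = keys_to_repair(left, right)
print(sorted(repair))  # prints ['key-42', 'key-7']
```

Only segments whose hashes differ are inspected key by key, which keeps the comparison cheap when replicas mostly agree.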

Questions?
make it so!
Riak Search - The Next Generation

Tom Santero, Technical Evangelist at Basho Technologies, delivered this presentation at a recent Big Data Warehousing Meetup held with Caserta Concepts.

Full-text search capabilities have existed in Riak since 2010. Known simply as Riak Search, the implementation is a homegrown adaptation inspired by Lucene. While Riak Search has been used in production for many years now, Basho has been developing a replacement over the course of the past 12 months in a project codenamed 'Yokozuna'. In this talk, Tom discussed the motivation behind Yokozuna, how it works and why you want to run it in production.

For more information, visit http://basho.com/ or http://www.casertaconcepts.com/.

Published in: Technology, News & Politics