Riak Search - The Next Generation

Tom Santero, Technical Evangelist at Basho Technologies, delivered this presentation at a recent Big Data Warehousing Meetup held with Caserta Concepts.

Full-text search capabilities have existed in Riak since 2010. Known simply as Riak Search, the implementation is a homegrown adaptation inspired by Lucene. While Riak Search has been used in production for many years now, Basho has been developing a replacement over the course of the past 12 months in a project codenamed 'Yokozuna'. In this talk, Tom discussed the motivation behind Yokozuna, how it works and why you want to run it in production.

For more information, visit http://basho.com/ or http://www.casertaconcepts.com/.

Riak Search - The Next Generation Presentation Transcript

  • 1. Riak Search the next generation Tuesday, September 17, 13
  • 2. tsantero @basho.com
  • 5. 2.0 coming soon...
  • 6. the history of Riak Search
  • 7. home grown full-text search
  • 8. lucene
  • 9. SCALE
  • 10. Naive Hashing: NODE # = HASH(KEY) % NUM_NODES, e.g. NH(Ka) = 0, NH(Kb) = 1, NH(Kc) = 2, NH(Kd) = 0, ...
  • 11. Naive Hashing (diagram): keys Ka through Kr spread across NODE 0, NODE 1, and NODE 2
  • 12. Naive Hashing (diagram): keys Ka through Kr redistributed after NODE 3 joins NODE 0, NODE 1, and NODE 2
  • 13. Naive Hashing: K * (NN - 1) / NN => K, where K = # of keys and NN = # of nodes. As NN grows, the factor essentially becomes 1, thus all keys move.
  • 14. Consistent Hashing: PARTITION # = HASH(KEY) % PARTITIONS. The # of partitions remains constant; a key always maps to the same partition; nodes own partitions; partitions contain keys; this adds an extra level of indirection.
  • 15. Consistent Hashing (diagram): partitions P1 through P9 and keys Ka through Kr spread across NODE 0, NODE 1, and NODE 2
  • 16. Consistent Hashing (diagram): partitions P1 through P9 and keys Ka through Kr after NODE 3 joins NODE 0, NODE 1, and NODE 2
  • 17. Consistent Hashing: NN * K/Q => K/Q, where K = # of keys, NN = # of nodes, and Q = # of partitions. As K grows, NN becomes constant, thus K/Q keys move.
  • 18. Consistent Hashing: uniform distribution; logical vs physical partitioning scheme; even division of keys
  • 19. the future of Riak Search
  • 21. persistence distributing Solr querying indexing
  • 22. Each Riak node runs an instance of Solr
  • 23. Solr index = Riak bucket; document = RObj value (plaintext, JSON, XML)
  • 24. Distributed Searching in Solr: query, faceting, highlighting, stats, spell check, term vectors
  • 25. SolrCloud
  • 26. SolrCloud
  • 27. Harvest vs Yield
  • 28. A better measure of Availability
  • 29. Yield = Queries Issued / Queries Offered
  • 30. Harvest = Data Available / Total Dataset
  • 31. Harvest Yield
  • 32. Manage Harvest by storing Index Replicas
  • 33. Term vs Document Partitioning Schemes
  • 34. Term Based Partitioning (diagram): Node 0, Node 1, Node 2
  • 35. Document Based Partitioning (diagram): Node 0, Node 1, Node 2
  • 36. Replicas (diagram): Node 0, Node 1, Node 2
  • 37. Quorums
  • 38. Concurrency => Siblings
  • 39. Read Repair (Anti-Entropy)
  • 40. replica replica replica
  • 41. replica replica replica X
  • 42. replica replica replica replica replica replica
  • 43. Active Anti-Entropy (self healing clusters)
  • 44. real-time updates, persistent, non-blocking, disk-based
  • 46. = hashes marked “dirty”
  • 51. = keys to read-repair
  • 52. Questions? make it so!
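
The contrast between the Naive Hashing and Consistent Hashing slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Riak's implementation: the SHA-1 based `stable_hash`, the round-robin `initial_owners`, the claim rule in `add_node`, the choice of 64 partitions, and the `key-N` key names are all hypothetical. It counts how many keys change nodes when a fourth node joins a three-node cluster. With modulo-by-node-count placement, roughly three quarters of the keys should move. With a fixed number of partitions, a key never changes partition, so only keys in partitions that change owner move (about K/Q keys per reassigned partition).

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic integer hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


def naive_node(key: str, num_nodes: int) -> int:
    """Naive hashing: NODE # = HASH(KEY) % NUM_NODES."""
    return stable_hash(key) % num_nodes


def partition_of(key: str, num_partitions: int) -> int:
    """PARTITION # = HASH(KEY) % PARTITIONS; stable because the partition count is fixed."""
    return stable_hash(key) % num_partitions


def initial_owners(num_partitions: int, num_nodes: int) -> list:
    """Hypothetical round-robin assignment: owners[p] is the node that owns partition p."""
    return [p % num_nodes for p in range(num_partitions)]


def add_node(owners: list) -> list:
    """Hypothetical claim rule: the new node takes every partition p where
    p % new_node_count == new_node_id; all other partitions keep their owner."""
    new_id = max(owners) + 1
    new_count = new_id + 1
    return [new_id if p % new_count == new_id else owner
            for p, owner in enumerate(owners)]


def moved_fractions(keys, num_partitions: int = 64, old_nodes: int = 3):
    """Return (naive, partitioned) fractions of keys that change node
    when the cluster grows from old_nodes to old_nodes + 1."""
    new_nodes = old_nodes + 1
    naive_moved = sum(
        naive_node(k, old_nodes) != naive_node(k, new_nodes) for k in keys
    )
    before = initial_owners(num_partitions, old_nodes)
    after = add_node(before)
    partitioned_moved = sum(
        before[partition_of(k, num_partitions)] != after[partition_of(k, num_partitions)]
        for k in keys
    )
    return naive_moved / len(keys), partitioned_moved / len(keys)


if __name__ == "__main__":
    sample_keys = [f"key-{i}" for i in range(10_000)]
    naive_frac, partitioned_frac = moved_fractions(sample_keys)
    print(f"naive hashing:      {naive_frac:.1%} of keys moved")
    print(f"partitioned scheme: {partitioned_frac:.1%} of keys moved")
```

The extra level of indirection is the `owners` list: adding a node only rewrites entries in that list, while the key-to-partition mapping stays untouched.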