Riak Search - The Next Generation

Tom Santero, Technical Evangelist, Basho Technologies delivered this presentation at a recent Big Data Warehousing Meetup that was held with Caserta Concepts.

Full-text search capabilities have existed in Riak since 2010. Known simply as Riak Search, the implementation is a homegrown adaptation inspired by Lucene. While Riak Search has been used in production for many years now, Basho has been developing a replacement over the course of the past 12 months in a project codenamed 'Yokozuna'. In this talk, Tom discussed the motivation behind Yokozuna, how it works and why you want to run it in production.

For more information, visit http://basho.com/ or http://www.casertaconcepts.com/.

Usage Rights

© All Rights Reserved


    Riak Search - The Next Generation: Presentation Transcript

    • Riak Search the next generation Tuesday, September 17, 13
    • tsantero @basho.com
    • 2.0 coming soon...
    • the history of Riak Search
    • home grown full-text search
    • lucene
    • SCALE
    • Naive Hashing: NODE # = HASH(KEY) % NUM_NODES
        • NH(Ka) = 0
        • NH(Kb) = 1
        • NH(Kc) = 2
        • NH(Kd) = 0
        • ...
    • Naive Hashing (diagram): keys Ka through Kr spread across NODE 0, NODE 1, and NODE 2
    • Naive Hashing (diagram): the same keys redistributed across NODE 0, NODE 1, NODE 2, and a new NODE 3
    • Naive Hashing: K * (NN - 1) / NN => K
        • K = # OF KEYS
        • NN = # OF NODES
        • AS NN GROWS, THE FACTOR ESSENTIALLY BECOMES 1, THUS ALL KEYS MOVE
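The key-movement claim on the naive hashing slides can be checked with a small simulation. This is a minimal sketch, not anything from the talk: the helper names (`stable_hash`, `naive_node`, `fraction_moved`) and the synthetic key set are assumptions made for illustration, and SHA-1 stands in for whatever hash function a real system would use.

```python
import hashlib


def stable_hash(key):
    """Deterministic hash; Python's built-in hash() is salted per process."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


def naive_node(key, num_nodes):
    """NODE # = HASH(KEY) % NUM_NODES, as on the slide."""
    return stable_hash(key) % num_nodes


def fraction_moved(keys, old_nodes, new_nodes):
    """Fraction of keys whose owning node changes when the node count changes."""
    moved = sum(
        1 for k in keys if naive_node(k, old_nodes) != naive_node(k, new_nodes)
    )
    return moved / len(keys)


# Hypothetical key set, used only to exercise the functions above.
keys = [f"key-{i}" for i in range(10_000)]

moved_3_to_4 = fraction_moved(keys, 3, 4)        # expected near (4 - 1) / 4
moved_99_to_100 = fraction_moved(keys, 99, 100)  # expected near (100 - 1) / 100

print(f"3 -> 4 nodes:    {moved_3_to_4:.2%} of keys moved")
print(f"99 -> 100 nodes: {moved_99_to_100:.2%} of keys moved")
```

With NN read as the node count after the change, the measured fractions should land close to (NN - 1) / NN, which approaches 1 as the cluster grows.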
    • Consistent Hashing: PARTITION # = HASH(KEY) % PARTITIONS
        • # PARTITIONS REMAINS CONSTANT
        • KEY ALWAYS MAPS TO SAME PARTITION
        • NODES OWN PARTITIONS
        • PARTITIONS CONTAIN KEYS
        • EXTRA LEVEL OF INDIRECTION
    • Consistent Hashing (diagram): partitions P1 through P9, holding keys Ka through Kr, owned by NODE 0, NODE 1, and NODE 2
    • Consistent Hashing (diagram): the same partitions P1 through P9 and keys Ka through Kr, with NODE 3 added
    • Consistent Hashing: NN * K/Q => K/Q
        • K = # OF KEYS
        • NN = # OF NODES
        • Q = # OF PARTITIONS
        • AS K GROWS NN BECOMES CONSTANT, THUS K/Q KEYS MOVE
    • Consistent Hashing: uniform distribution, logical vs physical partitioning scheme, even division of keys
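The extra level of indirection described on the consistent hashing slides can be sketched as follows. This is an illustrative model only and not Riak's actual ring or claim algorithm: the round-robin initial ownership, the `add_node` hand-off policy, and all function names are assumptions. The point it shows is that a key's partition never changes, so only keys in reassigned partitions move.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 9  # Q: fixed for the life of the cluster (the slides show P1..P9)


def stable_hash(key):
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


def partition_for(key):
    """PARTITION # = HASH(KEY) % PARTITIONS; independent of the node count."""
    return stable_hash(key) % NUM_PARTITIONS


def initial_owners(num_nodes):
    """owners[p] is the node that owns partition p (assumed round-robin layout)."""
    return [p % num_nodes for p in range(NUM_PARTITIONS)]


def add_node(owners, new_node):
    """Give the new node a fair share by taking partitions from the fullest nodes."""
    owners = list(owners)
    fair_share = len(owners) // (len(set(owners)) + 1)
    for _ in range(fair_share):
        counts = Counter(o for o in owners if o != new_node)
        donor = counts.most_common(1)[0][0]
        # Hand over the donor's highest-numbered partition (arbitrary choice).
        partition = max(p for p, o in enumerate(owners) if o == donor)
        owners[partition] = new_node
    return owners


before = initial_owners(3)
after = add_node(before, new_node=3)
moved_partitions = [p for p in range(NUM_PARTITIONS) if before[p] != after[p]]

# Hypothetical key set, used only to measure how many keys change owner.
keys = [f"key-{i}" for i in range(10_000)]
moved_keys = sum(
    1 for k in keys if before[partition_for(k)] != after[partition_for(k)]
)

print("partitions reassigned:", moved_partitions)
print(f"keys moved: {moved_keys / len(keys):.2%}")
```

In this sketch only the reassigned partitions change owner, and each one carries roughly K/Q keys with it; every other key stays where it was.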
    • the future of Riak Search
    • persistence distributing Solr querying indexing
    • Each Riak node runs an instance of Solr
    • Solr index = Riak bucket; document = RObj value (plaintext, JSON, XML)
    • Distributed Searching in Solr: query, faceting, highlighting, stats, spell check, term vectors
    • SolrCloud
    • Harvest vs Yield
    • A better measure of Availability
    • Yield = Queries Issued / Queries Offered
    • Harvest = Data Available / Total Dataset
    • Harvest Yield
    • Manage Harvest by storing Index Replicas
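The two ratios on the harvest and yield slides, and the effect of index replicas on harvest, can be illustrated with a short sketch. The function names and the replica layouts below are assumptions made for this example. The yield function uses completed queries in the numerator, following the usual Fox and Brewer definition, which may differ from the exact wording on the slide.

```python
def yield_ratio(queries_completed, queries_offered):
    """Yield: fraction of offered queries that were answered at all."""
    return queries_completed / queries_offered


def harvest_ratio(replica_map, down_nodes):
    """Harvest: fraction of index partitions with at least one live replica.

    replica_map maps a partition name to the list of nodes holding a copy.
    """
    reachable = sum(
        1
        for nodes in replica_map.values()
        if any(node not in down_nodes for node in nodes)
    )
    return reachable / len(replica_map)


# Hypothetical layouts: one copy per partition versus two copies per partition.
no_replicas = {"p0": [0], "p1": [1], "p2": [2]}
two_replicas = {"p0": [0, 1], "p1": [1, 2], "p2": [2, 0]}

down = {1}  # assume node 1 is unreachable

harvest_without = harvest_ratio(no_replicas, down)
harvest_with = harvest_ratio(two_replicas, down)

print("yield:", yield_ratio(queries_completed=98, queries_offered=100))
print("harvest without replicas:", harvest_without)
print("harvest with replicas:   ", harvest_with)
```

With a single copy per partition, losing one node removes a third of the index from every answer; with a second copy on a neighbouring node, the same failure leaves the full dataset reachable.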
    • Term vs Document Partitioning Schemes
    • Term Based Partitioning (diagram): Node 0, Node 1, Node 2
    • Document Based Partitioning (diagram): Node 0, Node 1, Node 2
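The difference between the two partitioning schemes can be sketched with a toy inverted index. Everything here is illustrative: the three sample documents, the CRC32-based `bucket` placement, and the function names are assumptions, not details from the talk.

```python
import zlib
from collections import defaultdict

NUM_NODES = 3

# Hypothetical documents: id -> text.
docs = {
    "d1": "riak search",
    "d2": "solr search",
    "d3": "riak solr yokozuna",
}


def bucket(name):
    """Assumed placement function: map a term or document id to a node."""
    return zlib.crc32(name.encode("utf-8")) % NUM_NODES


def term_partitioned(documents):
    """Each node holds the complete posting list for a subset of terms."""
    nodes = [defaultdict(set) for _ in range(NUM_NODES)]
    for doc_id, text in documents.items():
        for term in text.split():
            nodes[bucket(term)][term].add(doc_id)
    return nodes


def doc_partitioned(documents):
    """Each node holds a full local index for a subset of documents."""
    nodes = [defaultdict(set) for _ in range(NUM_NODES)]
    for doc_id, text in documents.items():
        for term in text.split():
            nodes[bucket(doc_id)][term].add(doc_id)
    return nodes


def query_term_partitioned(nodes, term):
    """A single-term query touches only the node that owns the term."""
    return set(nodes[bucket(term)].get(term, set()))


def query_doc_partitioned(nodes, term):
    """A query fans out to every node and merges the partial results."""
    result = set()
    for node in nodes:
        result |= node.get(term, set())
    return result


by_term = term_partitioned(docs)
by_doc = doc_partitioned(docs)

print(sorted(query_term_partitioned(by_term, "riak")))
print(sorted(query_doc_partitioned(by_doc, "riak")))
```

Both layouts return the same matches. The trade-off is where the work lands: with term partitioning, indexing one document writes to several nodes, while with document partitioning, indexing is local to one node and queries must be merged across nodes.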
    • Replicas (diagram): Node 0, Node 1, Node 2
    • Quorums
    • Concurrency => Siblings
    • Read Repair (Anti-Entropy)
    • replica replica replica
    • replica replica replica X
    • replica replica replica replica replica replica
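The read repair slides can be illustrated with a deliberately simplified sketch. It assumes each replica is a plain dictionary and that versions are comparable integers; Riak itself resolves versions with vector clocks and may produce siblings, which this example does not model. The function name and data layout are hypothetical.

```python
def read_with_repair(replicas, key):
    """Read a key from every replica, return the newest value, and
    overwrite any replica that is missing the key or holds an older version.

    Each replica is a dict mapping key -> (version, value).
    """
    responses = [replica.get(key) for replica in replicas]
    present = [r for r in responses if r is not None]
    if not present:
        return None
    newest = max(present, key=lambda versioned: versioned[0])
    for replica, response in zip(replicas, responses):
        if response != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[1]


# Hypothetical state: one replica is current, one is stale, one lost the key.
replicas = [
    {"user:1": (2, "new")},
    {"user:1": (1, "old")},
    {},
]

value = read_with_repair(replicas, "user:1")
print(value)
print(replicas)
```

The repair happens as a side effect of the read, so only keys that are actually read get fixed this way.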
    • Active Anti-Entropy (self healing clusters)
    • real-time updates, persistent, non-blocking, disk-based
    • = hashes marked “dirty”
    • = keys to read-repair
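The hash comparison behind active anti-entropy can be sketched with a two-level hash tree. This is a simplified model under stated assumptions: the segment count, the helper names, and the full rebuild on every comparison are choices made for the example. A persistent, incrementally updated tree, as the slides describe, would instead mark the hashes on a written key's path as dirty and recompute only those.

```python
import hashlib

NUM_SEGMENTS = 4  # assumed number of leaf segments


def digest(data):
    return hashlib.sha1(data).hexdigest()


def segment_of(key):
    return int(digest(key.encode("utf-8")), 16) % NUM_SEGMENTS


def build_tree(store):
    """Return (root_hash, segment_hashes, segments) for a key -> value store."""
    segments = [dict() for _ in range(NUM_SEGMENTS)]
    for key, value in store.items():
        segments[segment_of(key)][key] = digest(value.encode("utf-8"))
    segment_hashes = [
        digest(repr(sorted(seg.items())).encode("utf-8")) for seg in segments
    ]
    root = digest("".join(segment_hashes).encode("utf-8"))
    return root, segment_hashes, segments


def keys_to_repair(store_a, store_b):
    """Compare two replicas top-down and return the keys that differ."""
    root_a, seg_hashes_a, segs_a = build_tree(store_a)
    root_b, seg_hashes_b, segs_b = build_tree(store_b)
    if root_a == root_b:
        return set()  # replicas agree; nothing below the root is examined
    differing = set()
    for i in range(NUM_SEGMENTS):
        if seg_hashes_a[i] == seg_hashes_b[i]:
            continue  # whole segment matches; skip its keys
        for key in segs_a[i].keys() | segs_b[i].keys():
            if segs_a[i].get(key) != segs_b[i].get(key):
                differing.add(key)
    return differing


# Hypothetical replicas: b holds a stale value for "b" and is missing "c".
replica_a = {"a": "1", "b": "2", "c": "3"}
replica_b = {"a": "1", "b": "OLD"}

repair_set = keys_to_repair(replica_a, replica_b)
print(sorted(repair_set))
```

The resulting key set is what would then be handed to read repair, so divergent replicas are fixed in the background without waiting for a client to read those keys.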
    • Questions? make it so!