Riak Search
A Full-Text Search and Indexing Engine based on Riak

Berlin Buzzwords · June 2010

Riak Search is a distributed data indexing and search platform built on top of Riak. The talk will introduce Riak Search, covering overall goals, architecture, and core functionality.

Published in: Technology, Business

Riak Search - Berlin Buzzwords 2010

  1. Riak Search: A Full-Text Search and Indexing Engine based on Riak. Berlin Buzzwords · June 2010. Basho Technologies. Rusty Klophaus - @rklophaus
  2. Why did we build it? What are the major goals? How does it work?
  3. Part One: Why did we build Riak Search?
  4. Riak is a scalable, highly-available, networked, open-source key/value store.
  5. Writing to a Key/Value Store. [Diagram: CLIENT, Key/Value, RIAK]
  6. Writing to a Key/Value Store. [Diagram: CLIENT, Object, RIAK]
  7. Querying a Key/Value Store. [Diagram: CLIENT, Key, Object, RIAK]
  8. Querying Riak via LinkWalking. [Diagram: CLIENT, Key + Instructions, Walk to Related Keys, Object(s), RIAK]
  9. Querying Riak via Map/Reduce. [Diagram: CLIENT, Key(s) + JS Functions, Map, Map, Reduce, Computed Value(s), RIAK]
  10. Key/Value Stores like Key-Based Queries
  11. Query by Secondary Index: where Category == "Shoes". "WTF!? I'm a KV store!" [Diagram: CLIENT, RIAK]
  12. Full-Text Query: "Converse AND Shoes". "This is getting old." [Diagram: CLIENT, RIAK]
  13. These kinds of queries need an Index. *Market Opportunity!*
  14. Part Two: What are the major goals of Riak Search?
  15. An application built on Riak. [Diagram: Your Application, Riak]
  16. Hrm... I need an index. [Diagram: Your Application, Riak, Index Object]
  17. Hrm... I need an index with more features. [Diagram: Your Application, ???, Riak]
  18. Lucene should do the trick... [Diagram: Your Application, Lucene, Riak]
  19. ...shard to add more storage capacity... [Diagram: Your Application; Lucene, Lucene, Lucene; Riak]
  20. ...replicate to add more throughput. [Diagram: Your Application; three rows of Lucene, Lucene, Lucene; Riak]
  21. ...replicate to add more throughput. Operations nightmare! [Diagram: Your Application; three rows of Lucene, Lucene, Lucene; Riak]
  22. What do we really want? [Diagram: Your Application, Riak-ified Lucene, Riak]
  23. What do we really want? [Diagram: Your Application, Riak Search, Riak]
  24. Functionality? Be like Lucene (and more).
        • Lucene Syntax
        • Leverages Java Lucene Analyzers
        • Solr Endpoints
        • Integration via Riak Post-Commit Hook (Index)
        • Integration via Riak Map/Reduce (Query)
        • Near-Realtime
        • Schema-less
  25. Operations? Be like Riak.
        • No special nodes
        • Add nodes, get more compute and storage
        • Automatically load balance
        • Replicas for durability and performance
        • Index and query in parallel
        • Swappable storage backends
  26. Part Three: How do we do it?
  27. A Gentle Introduction to Document Indexing
  28. The Inverted Index
        Document: #1 Every dog has his day.
        Inverted Index: day, 1; dog, 1; every, 1; has, 1; his, 1
  29. The Inverted Index
        Documents:
          #1 Every dog has his day.
          #2 The dog's bark is worse than his bite.
          #3 Let the cat out of the bag.
          #4 It's raining cats and dogs.
        Combined Inverted Index: and, 4; bag, 3; bark, 2; bite, 2; cat, 3; cat, 4; day, 1; dog, 1; dog, 2; dog, 4; every, 1; has, 1; ...
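The combined index on slide 29 can be reproduced in a few lines of Python. This is a minimal sketch, not Riak Search code. The `analyze` function and the `STEMS` table are hypothetical stand-ins for a real analyzer, included only so that "dog's", "dogs", and "cats" collapse to the terms shown on the slide.

```python
import re
from collections import defaultdict

# The four example documents from the slide.
DOCS = {
    1: "Every dog has his day.",
    2: "The dog's bark is worse than his bite.",
    3: "Let the cat out of the bag.",
    4: "It's raining cats and dogs.",
}

# Hypothetical stem table standing in for a real analyzer/stemmer.
STEMS = {"dog's": "dog", "dogs": "dog", "cats": "cat"}


def analyze(text):
    """Lowercase, split into word tokens, and apply the toy stem table."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [STEMS.get(token, token) for token in tokens]


def build_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_index(DOCS)
for term in ("and", "bag", "cat", "dog"):
    print(term, index[term])
# and [4]
# bag [3]
# cat [3, 4]
# dog [1, 2, 4]
```

Keeping each posting list sorted by document id is what makes the merge operations at query time cheap.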
  30. At Query Time... "dog AND cat" [Query tree: AND over dog, cat]
  31. At Query Time... [Query tree: AND over dog, cat] dog: dog, 1; dog, 2; dog, 4. cat: cat, 3; cat, 4.
  32. At Query Time... Result: 4. AND (Merge Intersection) of 1, 2, 4 and 3, 4.
  33. At Query Time... Result: 1, 2, 3, 4. OR (Merge Union) of 1, 2, 4 and 3, 4.
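The AND and OR steps on the query-time slides are merges over sorted posting lists. The sketch below assumes each list is sorted and free of duplicates. The function names are illustrative and are not taken from Riak Search.

```python
def merge_intersection(left, right):
    """AND: keep ids present in both sorted lists."""
    i = j = 0
    result = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            result.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return result


def merge_union(left, right):
    """OR: keep ids present in either sorted list, without duplicates."""
    i = j = 0
    result = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            result.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list is exhausted; the rest of the other is already in order.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


dog = [1, 2, 4]  # postings for "dog"
cat = [3, 4]     # postings for "cat"
print(merge_intersection(dog, cat))  # [4]
print(merge_union(dog, cat))         # [1, 2, 3, 4]
```

Both functions walk each list once, so the cost grows with the combined length of the two lists rather than their product.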
  34. Complex Behavior from Simple Structures
  35. Storage Approaches...
  36. Riak Search uses Consistent Hashing to store data on Partitions
  37. Introduction to Consistent Hashing and Partitions. Partitions = 10; Number of Nodes = 5; Partitions per Node = 2; Replicas (NVal) = 2
  38. Introduction to Consistent Hashing and Partitions. [Diagram: Object]
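A rough sketch of the ring from slide 37, using the same numbers (10 partitions, 5 nodes, NVal = 2). Several details here are assumptions made for illustration and are not stated on the slides: SHA-1 as the hash function, round-robin assignment of partitions to nodes, and placing replicas on consecutive partitions.

```python
import hashlib

NUM_PARTITIONS = 10
NUM_NODES = 5
N_VAL = 2
RING_SIZE = 2 ** 160  # size of the SHA-1 output space (assumed hash)


def partition_for(key):
    """Hash the key onto the ring and return the partition it falls in."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position * NUM_PARTITIONS // RING_SIZE


def owner(partition):
    """Assumed round-robin claim: partition p belongs to node p mod 5."""
    return partition % NUM_NODES


def preference_list(key):
    """The key's partition plus the next N_VAL - 1 partitions on the ring."""
    first = partition_for(key)
    return [(first + k) % NUM_PARTITIONS for k in range(N_VAL)]


for key in ("shoes", "converse"):
    partitions = preference_list(key)
    print(key, partitions, [owner(p) for p in partitions])
```

With round-robin ownership, neighbouring partitions always belong to different nodes, so the two replicas of an object land on two distinct machines.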
  39. Document Partitioning vs. Term Partitioning
  40. ...and the Resulting Tradeoffs
  41. Document Partitioning @ Index Time. [Diagram: #1 Every dog has his day.]
  42. Document Partitioning @ Query Time. "dog OR cat"
  43. Term Partitioning @ Index Time. #1 Every dog has his day. Terms: day, 1; dog, 1; every, 1; has, 1; his, 1
  44. Term Partitioning @ Index Time. [Diagram: dog, 1; day, 1; has, 1; his, 1; every, 1]
  45. Term Partitioning @ Query Time. "dog OR cat"
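The difference between the two schemes comes down to which value is hashed to pick a partition. The following is an illustrative sketch only. The `bucket` helper, the modulo placement, and the partition count are assumptions, and real systems differ in detail.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 10


def bucket(value):
    """Deterministically map a term or document id to a partition."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


# Postings for document #1, "Every dog has his day."
POSTINGS = [("every", 1), ("dog", 1), ("has", 1), ("his", 1), ("day", 1)]


def place_by_document(postings):
    """Document partitioning: every posting of a document shares a partition."""
    partitions = defaultdict(list)
    for term, doc_id in postings:
        partitions[bucket(doc_id)].append((term, doc_id))
    return dict(partitions)


def place_by_term(postings):
    """Term partitioning: every posting of a term shares a partition."""
    partitions = defaultdict(list)
    for term, doc_id in postings:
        partitions[bucket(term)].append((term, doc_id))
    return dict(partitions)


def partitions_to_query(terms, scheme):
    """Which partitions must answer a query over the given terms."""
    if scheme == "document":
        # Any partition may hold a matching document, so ask them all.
        return set(range(NUM_PARTITIONS))
    # Only the partitions that own the query terms are involved.
    return {bucket(term) for term in terms}


print(len(partitions_to_query(["dog", "cat"], "document")))
print(len(partitions_to_query(["dog", "cat"], "term")))
```

Under term partitioning a two-term query touches at most two partitions, which leaves the rest of the cluster free for other queries. The price is that every posting for a popular term lands on the same partition.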
  46. Tradeoffs...
        Document Partitioning     | Term Partitioning
        + Lower Latency Queries   | - Higher Latency Queries
        - Lower Throughput        | + Higher Throughput
        - Lots of Disk Seeks      | - Hotspots in Ring (the "Obama" problem)
  47. Riak Search: Term Partitioning
        Term-partitioning is the most viable approach for our beta clients’ needs: high throughput on Really Big Datasets.
        Optimizations:
        • Term splitting to reduce hot spots
        • Bloom filters & caching to save query-time bandwidth
        • Batching to save query-time & index-time bandwidth
        Support for either approach eventually.
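The slide does not say how Riak Search applies Bloom filters, so the following is only one plausible reading. For an AND query, the partition holding one term could send a compact filter of its document ids instead of the full posting list, and the other partition would forward only ids that might match. The `BloomFilter` class, its sizes, and the hashing scheme are all hypothetical.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a single integer

    def _positions(self, item):
        # Derive several bit positions by salting the hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for position in self._positions(item):
            self.bits |= 1 << position

    def might_contain(self, item):
        return all((self.bits >> position) & 1 for position in self._positions(item))


# The partition that owns "cat" summarizes its postings in a filter...
cat_filter = BloomFilter()
for doc_id in [3, 4]:
    cat_filter.add(doc_id)

# ...and the partition that owns "dog" forwards only possible matches.
dog_postings = [1, 2, 4]
candidates = [doc_id for doc_id in dog_postings if cat_filter.might_contain(doc_id)]
print(candidates)
```

Because a Bloom filter can report false positives, the candidates still need an exact check against the real posting list before they are returned.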
  48. Part Four: Review
  49. Riak Search turns this... "Converse AND Shoes". "WTF!? I'm a KV store!" [Diagram: CLIENT, RIAK]
  50. ...into this... "Converse AND Shoes". "Gladly!" [Diagram: CLIENT, RIAK]
  51. ...into this... "Converse AND Shoes". Keys or Objects. [Diagram: CLIENT, RIAK]
  52. ...while keeping operations easy. [Diagram: Your Application, Riak Search, Riak]
  53. Thanks! Questions?
        Search Team:
        • John Muellerleile - @jrecursive
        • Rusty Klophaus - @rklophaus
        • Kevin Smith - @kevsmith
        Currently working with a small set of Beta users. Open-source release planned for Q3. www.basho.com