• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Riak Search - Erlang Factory London 2010
 

Riak Search - Erlang Factory London 2010

on

  • 9,836 views

Riak Search is a distributed data indexing and search platform built on top of Riak. The talk will introduce Riak Search, covering overall goals, architecture, and core functionality, with specific ...

Riak Search is a distributed data indexing and search platform built on top of Riak. The talk will introduce Riak Search, covering overall goals, architecture, and core functionality, with specific focus on how Erlang is used to manage and execute an ever-changing population of ad hoc query processes.

Statistics

Views

Total Views
9,836
Views on SlideShare
7,780
Embed Views
2,056

Actions

Likes
16
Downloads
91
Comments
0

11 Embeds 2,056

http://nosql.mypopescu.com 1770
http://www.clipboard.com 157
http://www.slideshare.net 55
http://baradates.tumblr.com 43
http://stage.clpbrd.com 15
http://a0.twimg.com 8
http://paper.li 3
http://translate.googleusercontent.com 2
http://thinkery.me 1
http://twitter.com 1
http://www.linkedin.com 1
More...

Accessibility

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel

Riak Search - Erlang Factory London 2010 Riak Search - Erlang Factory London 2010 Presentation Transcript

  • Riak Search A Full-Text Search and Indexing Engine based on Riak Erlang Factory· London· June 2010 Basho Technologies Rusty Klophaus - @rklophaus
  • Why did we build it? What are the major goals? How does it work? 2
  • Part One Why did we build Riak Search? 3
  • Riak is a scalable, highly-available, networked, open-source key/value store. 4
  • Writing to a Key/Value Store Key/Value CLIENT RIAK 5
  • Writing to a Key/Value Store Object CLIENT RIAK 6
  • Querying a Key/Value Store Key Object CLIENT RIAK 7
  • Querying Riak via LinkWalking Key + Instructions Walk to Related Keys Object(s) CLIENT RIAK 8
  • Querying Riak via Map/Reduce Key(s) + JS Functions Map Map Reduce Computed Value(s) CLIENT RIAK 9
  • Key/Value Stores like Key-Based Queries 10
  • Query by Secondary Index where Category == "Shoes" WTF!? I'm a KV store! CLIENT RIAK 11
  • Full-Text Query "Converse AND Shoes" This is getting old. CLIENT RIAK 12
  • These kinds of queries need an Index. *Market Opportunity!* 13
  • Part Two What are the major goals of Riak Search? 14
  • An application built on Riak. Your Riak Application 15
  • Hrm... I need an index. Your Riak Application Index Object 16
  • Hrm... I need an index with more features. Your ??? Riak Application 17
  • Lucene should do the trick... Your Lucene Riak Application 18
  • ...shard to add more storage capacity... Your Application Lucene Lucene Lucene Riak 19
  • ...replicate to add more throughput. Lucene Lucene Lucene Your Application Lucene Lucene Lucene Riak Lucene Lucene Lucene 20
  • ...replicate to add more throughput. Lucene Lucene Lucene Your Application Lucene Lucene Lucene Riak Lucene Lucene Lucene Operations nightmare! 21
  • What do we really want? Your Riak-ified Riak Application Lucene 22
  • What do we really want? Your Riak Riak Application Search 23
  • Functionality? Be like Lucene (and more). • Lucene Syntax • Leverages Java Lucene Analyzers • Solr Endpoints • Integration via Riak Post-Commit Hook (Index) • Integration via Riak Map/Reduce (Query) • Near-Realtime • Schema-less 24
  • Operations? Be like Riak. • No special nodes • Add nodes, get more compute and storage • Automatically load balance • Replicas for durability and performance • Index and query in parallel • Swappable storage backends 25
  • Part Three How do we do it? 26
  • A Gentle Introduction to Document Indexing 27
  • The Inverted Index Document Inverted Index day, 1 dog, 1 #1 Every dog has his day. every, 1 has, 1 his, 1 28
  • The Inverted Index Documents Combined Inverted Index and, 4 #1 Every dog has his day. bag, 3 bark, 2 bite, 2 The dog's bark #2 cat, 3 is worse than his bite. cat, 4 day, 1 dog, 1 #3 Let the cat out of the bag. dog, 2 dog, 4 every, 1 #4 It's raining cats and dogs. has, 1 ... 29
  • At Query Time... "dog AND cat" AND dog cat 30
  • At Query Time... AND dog cat dog, 1 cat, 3 dog, 2 cat, 4 dog, 4 31
  • At Query Time... Result: 4 AND (Merge Intersection) 1 3 2 4 4 32
  • At Query Time... Result: 1, 2, 3, 4 OR (Merge Union) 1 3 2 4 4 33
  • Complex Behavior from Simple Structures 34
  • Storage Approaches... 35
  • Riak Search uses Consistent Hashing to store data on Partitions 36
  • Introduction to Consistent Hashing and Partitions Partitions = 10 Number of Nodes = 5 Partitions per Node = 2 Replicas (NVal) = 2 37
  • Introduction to Consistent Hashing and Partitions Object 38
  • Document Partitioning vs. Term Partitioning 39
  • ...and the Resulting Tradeoffs 40
  • Document Partitioning @ Index Time #1 Every dog has his day. 41
  • Document Partitioning @ Query Time "dog OR cat" 42
  • Term Partitioning @ Index Time day, 1 dog, 1 #1 Every dog has his day. every, 1 has, 1 his, 1 43
  • Term Partitioning @ Index Time dog, 1 day, 1 has, 1 his, 1 every, 1 44
  • Term Partitioning @ Query Time "dog OR cat" 45
  • Tradeoffs... Document Partitioning Term Partitioning + Lower Latency Queries - Higher Latency Queries - Lower Throughput + Higher Throughput - Lots of Disk Seeks - Hotspots in Ring (the "Obama" problem) 46
  • Riak Search: Term Partitioning Term-partitioning is the most viable approach for our beta clients’ needs: high throughput on Really Big Datasets. Optimizations: • Term splitting to reduce hot spots • Bloom filters & caching to save query-time bandwidth • Batching to save query-time & index-time bandwidth Support for either approach eventually. 47
  • Diving Deeper: The Lifecycle of a Query 48
  • Parse the Query 49
  • The Query meeting AND (face OR phone) 50
  • The Query as an Erlang Term (Parse w/ Leex and Yecc) [{land, [ {term,"meeting",[]}, {lor,[ {term,"face",[]}, {term,"phone",[]} ]} ]}] 51
  • The Query as a Graph #land #term #lor "meeting" #term #term "phone" "face" 52
  • Plan the Query 53
  • System Catalog Catalog Catalog Catalog Catalog Catalog Catalog Catalog Catalog 54
  • System Catalog Term Weight Term TermID & File Offset 55
  • Consult the System Catalog for Term/Node Weights #land #term #lor "meeting" #term #term 23 @ node B "phone" "face" 17 @ node A 13 @ node C 56
  • Use Term Weights to Plan the Query #land #term #lor "meeting" #term #term 23 @ node B "phone" "face" 17 @ node A 13 @ node C 57
  • Use Term Weights to Plan the Query #land #term #lor "meeting" #term #term 23 @ node B "phone" "face" 17 @ node A 13 @ node C 58
  • Use Term Weights to Plan the Query #node@B #land #term #node@A "meeting" #lor #term #term 23 @ node B "phone" "face" 13 @ node C 17 @ node A 59
  • The Node-Assigned Query as an Erlang Term [{node, {land, [ {node, {lor, [ {term,{"email","body","face"}, [ {node_weight,'node_c@127.0.0.1', 13} ]}, {term,{"email","body", "phone"}, [ {node_weight,'node_a@127.0.0.1', 17} ]} ]}, 'node_a@127.0.0.1' }, {term, {"email","body","meeting"}, [ {node_weight,'node_b@127.0.0.1', 23} ]} ]}, 'node_b@127.0.0.1' }] 60
  • Execute the Query 61
  • Spawn the Query Processes #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 62
  • Spawn the Query Processes #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 63
  • Spawn the Query Processes #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 64
  • Spawn the Query Processes #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 65
  • Spawn the Query Processes #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 66
  • Spawn the Query Processes & Stream the Results #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 67
  • Spawn the Query Processes & Stream the Results #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 68
  • Spawn the Query Processes & Stream the Results #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 69
  • Terminate When Finished #node@B #land #term #node@A 'disconnect' #lor #term #term 'disconnect' 'disconnect' 70
  • Message Format 71
  • The Message Format Message :: {results, [Result]} | {results, disconnect} Result :: {DocID, Properties} DocID :: term() Properties :: proplist() 72
  • The Message Format {results, [ {375, []}, {961, [{color, "red"}]}, {155, [{pos, [1,2,5]}]} ]} 73
  • Yay for Erlang! • Clean lines between load balancing and logic, single- and multi-node look the same • Easy to create new operators, rapid development of experimental features • Linked processes make cleanup a breeze • Significant code reduction over early Java prototypes 74
  • Part Four Review 75
  • Riak Search turns this... "Converse AND Shoes" WTF!? I'm a KV store! CLIENT RIAK 76
  • ...into this... "Converse AND Shoes" Gladly! CLIENT RIAK 77
  • ...into this... "Converse AND Shoes" Keys or Objects CLIENT RIAK 78
  • ...while keeping operations easy. Your Riak Riak Application Search 79
  • Thanks! Questions? Search Team: John Muellerleile - @jrecursive Rusty Klophaus - @rklophaus Kevin Smith - @kevsmith Currently working with a small set of Beta users. Open-source release planned for Q3. www.basho.com