Lucene/Solr Revolution 2015


Distributed Search in Riak - Integrating Search in a NoSQL Database: Presented by Fred Dushin, Basho Technologies

  1. October 13-16, 2015 • Austin, TX
  2. Distributed Search in Riak
     Integrating search in a NoSQL database
     Fred Dushin, Member of Technical Staff, Basho Technologies
  3. About Me
     CORBA -> Web Services -> MoM
     Joined Basho Jan 2015
     Reach out! github://fadushin, lr2015@dushin.net
  4. What I want to talk about
     How is query even possible in a distributed NoSQL database?
     What happens when things break?
     How does Riak distribute data?
     How does Riak repair divergence?
     What is Riak?
     What is Riak Search?
     What does Solr bring to Riak?
     What does Riak bring to Solr?
  5. What is Riak?
     A distributed key-value store
     Prioritizes availability over consistency
     Provides elasticity without downtime
  6. A Riak Glossary
     • Key: any sequence of bytes
     • Value: any opaque blob of data
     • Bucket: an organizing namespace for keys
     • Bucket Type: an organizing namespace for buckets
     {{BucketType, Bucket}, Key} -> Value, where {{BucketType, Bucket}, Key} is the "BKey"
  7. Riak Partitions
     ring_size=8
     [Diagram: a ring divided into partitions 1 to 8, with the ring positions 0, 2^160/4, 2^160/2, and 2^160 * 3/4 marked; BKey_1, BKey_2, BKey_3, BKey_4, ..., BKey_n are hashed onto the ring]
     sha1(BKey_i) = 3671 A68E 1098 CDEE 9F4B
     A 20 byte hash, or, a really big number between 0 and 2^160 - 1
     2^160 = 1461501637330902918203684832716283019655932542976
     Partition number and starting ring position:
     {1, 0}
     {2, 182687704666362864775460604089535377456991567872}
     {3, 365375409332725729550921208179070754913983135744}
     {4, 548063113999088594326381812268606132370974703616}
     {5, 730750818665451459101842416358141509827966271488}
     {6, 913438523331814323877303020447676887284957839360}
     {7, 1096126227998177188652763624537212264741949407232}
     {8, 1278813932664540053428224228626747642198940975104}
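The ring arithmetic on this slide can be sketched in a few lines of Python. This is an illustration only: the way the BKey is turned into bytes before hashing (`"/".join(...)`) is an assumption made for the example, since Riak hashes its own binary encoding of the BKey, and the helper names `partition_starts`, `ring_position`, and `partition_containing` are hypothetical. The sketch only reports which partition's range contains a hash; how Riak then derives the "responsible" partition from that position is not modelled here.

```python
import hashlib

RING_SIZE = 8
RING_TOP = 2 ** 160  # SHA-1 produces 160-bit values


def partition_starts(ring_size=RING_SIZE):
    """Return (partition number, starting ring position) pairs, as listed on the slide."""
    width = RING_TOP // ring_size
    return [(i + 1, i * width) for i in range(ring_size)]


def ring_position(bucket_type, bucket, key):
    """Hash a BKey to an integer in [0, 2^160 - 1].

    Assumption: the three parts are joined with "/" for illustration only;
    Riak hashes its own binary encoding of the BKey.
    """
    data = "/".join((bucket_type, bucket, key)).encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def partition_containing(position, ring_size=RING_SIZE):
    """Return the 1-based partition whose range contains the ring position."""
    return position // (RING_TOP // ring_size) + 1


if __name__ == "__main__":
    for number, start in partition_starts():
        print(number, start)
    pos = ring_position("default", "agents", "agentp")
    print("partition range containing the hash:", partition_containing(pos))
```

Because 2^160 divides evenly by any power-of-two ring size, the integer division is exact and the starting positions match the table above.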
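  8. How Partitions are distributed
     ring_size=8 num_nodes=5
     [Diagram: the ring of partitions 1 to 8 next to Node 1 through Node 5; the partitions are spread across the nodes in the groups (1, 6), (2, 7), (3, 8), (4), (5)]
     Together the nodes form a Riak "cluster"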
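A minimal sketch of how eight partitions could end up in the groups shown in the diagram, assuming a plain round-robin assignment. This is not Riak's actual claim algorithm, and the node numbering is inferred: the query plan later in the deck has node3 serving partitions 3 and 8 and node5 serving partition 5, which is consistent with this layout. The function name `assign_partitions` is hypothetical.

```python
def assign_partitions(ring_size, num_nodes):
    """Assign partitions 1..ring_size to nodes 1..num_nodes in round-robin order.

    Assumption: simple round-robin, used only to reproduce the grouping
    in the diagram; Riak's real claim algorithm is more involved.
    """
    owners = {node: [] for node in range(1, num_nodes + 1)}
    for partition in range(1, ring_size + 1):
        node = (partition - 1) % num_nodes + 1
        owners[node].append(partition)
    return owners


if __name__ == "__main__":
    for node, partitions in assign_partitions(ring_size=8, num_nodes=5).items():
        print("Node %d owns partitions %s" % (node, partitions))
```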
  9. How entries are replicated
     [Diagram: the ring of partitions 1 to 8]
     sha1(BKey) -> 6, the "responsible" partition
     The object is stored on the "primary" replicas
     n_val = 3
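As a rough sketch of the replica placement described here, the primary replicas can be modelled as the responsible partition plus the next n_val - 1 partitions around the ring. That reading is an assumption, though it agrees with the query slide later in the deck, where the three indexed copies carry partition numbers 6, 7, and 8. The function name `primary_partitions` is hypothetical.

```python
def primary_partitions(responsible, n_val, ring_size):
    """Return the responsible partition and its successors, wrapping around the ring.

    Partitions are numbered 1..ring_size, as on the slides.
    """
    return [(responsible - 1 + i) % ring_size + 1 for i in range(n_val)]


if __name__ == "__main__":
    print(primary_partitions(responsible=6, n_val=3, ring_size=8))
    # A responsible partition near the end of the ring wraps to the start
    print(primary_partitions(responsible=8, n_val=3, ring_size=8))
```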
  10. Riak/KV Put
      [Diagram: partitions 1 to 8 spread across Node 1 through Node 5; the client talks to node4 and the object appears on three partitions]

          #!/usr/bin/env python
          import riak

          # Connect to any node in the cluster
          client = riak.RiakClient(host='node4')
          bucket = client.bucket('agents')
          key = 'agentp'
          value = {'name_s': "perry", 'type_s': "reptile"}
          # Create the object locally, then write it to the cluster
          obj = bucket.new(key, value)
          obj.store()

      Result: ok
      sha1({agents, agentp}) -> 6
      n_val = 3, w = quorum (⌊n_val/2⌋ + 1)
      Each of the three replicas holds:
          bucket: agents
          key: agentp
          value: {"name_s": "perry", "type_s": "reptile"}
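The quorum formula on this slide is easy to check by hand; the sketch below just spells it out. The helper `write_succeeds` is a hypothetical name used for illustration and only models "enough replicas acknowledged", not any of Riak's other write options.

```python
def quorum(n_val):
    """Quorum as given on the slide: floor(n_val / 2) + 1."""
    return n_val // 2 + 1


def write_succeeds(acks, n_val, w=None):
    """Return True when enough replicas acknowledged the write.

    w defaults to quorum; pass an explicit integer to model a different w.
    """
    required = quorum(n_val) if w is None else w
    return acks >= required


if __name__ == "__main__":
    print("quorum for n_val=3:", quorum(3))
    print("2 of 3 acks is enough:", write_succeeds(acks=2, n_val=3))
    print("1 of 3 acks is enough:", write_succeeds(acks=1, n_val=3))
```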
  11. Riak/KV Write Availability
      n_val = 3, w = quorum
      [Diagram: partitions 1 to 8 spread across Node 1 through Node 5, with the labels "Hinted Handoff" and "fallback"]
      Before the write, the diagram shows three copies of:
          bucket: agents
          key: agentp
          value: {"name_s": "perry", "type_s": "reptile"}

          #!/usr/bin/env python
          import riak

          client = riak.RiakClient(host='node4')
          bucket = client.bucket('agents')
          key = 'agentp'
          # Overwrite the existing object with a new value
          value = {'name_s': "perry", 'type_s': "mammal"}
          obj = bucket.new(key, value)
          obj.store()

      Result: ok
      After the write, the diagram shows three copies of:
          bucket: agents
          key: agentp
          value: {"name_s": "perry", "type_s": "mammal"}
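The slide only names "fallback" and "Hinted Handoff", so the following is a loose sketch of the idea rather than Riak's implementation: when the node owning a primary partition is down, a later partition on a reachable node stands in for it, so the write can still reach n_val replicas. The round-robin `owner` function, the choice of stand-in (the next partitions around the ring), and all names here are assumptions made for illustration.

```python
def owner(partition, num_nodes):
    """Assumed round-robin ownership: partition p lives on node ((p - 1) mod num_nodes) + 1."""
    return (partition - 1) % num_nodes + 1


def preflist_with_fallbacks(first, n_val, ring_size, num_nodes, down_nodes):
    """Return (partition, role) pairs for a write, substituting fallbacks for unreachable primaries."""
    ring_order = [(first - 1 + i) % ring_size + 1 for i in range(ring_size)]
    primaries, rest = ring_order[:n_val], ring_order[n_val:]
    # Candidate stand-ins: later partitions whose owning node is reachable
    spare = (p for p in rest if owner(p, num_nodes) not in down_nodes)
    preflist = []
    for p in primaries:
        if owner(p, num_nodes) not in down_nodes:
            preflist.append((p, "primary"))
        else:
            stand_in = next(spare, None)
            if stand_in is not None:
                preflist.append((stand_in, "fallback for %d" % p))
    return preflist


if __name__ == "__main__":
    # All nodes up: the three primaries take the write
    print(preflist_with_fallbacks(6, 3, 8, 5, down_nodes=set()))
    # Node 3 (assumed owner of partition 8) is down: a fallback stands in
    print(preflist_with_fallbacks(6, 3, 8, 5, down_nodes={3}))
```

In this model the fallback would later hand the data back to the primary once it returns, which is what "hinted handoff" refers to; that step is not modelled here.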
  12. Riak/KV Read Repair
      n_val = 3, r = quorum
      [Diagram: partitions 1 to 8 spread across Node 1 through Node 5, with copies of the object on the replica partitions]

          #!/usr/bin/env python
          import riak

          client = riak.RiakClient(host='node4')
          bucket = client.bucket('agents')
          # Read the object back
          obj = bucket.get('agentp')

      Returned value: {'name_s': "perry", 'type_s': "mammal"}
      The replicas hold:
          bucket: agents
          key: agentp
          value: {"name_s": "perry", "type_s": "mammal"}
  13. Riak K/V Active Anti-Entropy
      [Diagram: a hash tree. The leaves are {BKey, #} pairs grouped into segments #Segment_1, #Segment_2, ..., #Segment_k, ..., #Segment_n; segment hashes are combined into interior hashes (#Seg_1..k, #Seg_j..n, ...), which are combined into a single #root hash]
  14. Riak/KV AAE
      [Diagram: partitions 1 to 8 spread across Node 1 through Node 5, with three copies of the object below]
          bucket: agents
          key: agentp
          value: {"name_s": "perry", "type_s": "mammal"}
      Riak maintains multiple hashtrees for each partition, one for each "replica set" that can overlap on the partition.
      • Hashtrees are stored persistently on disk
      • Asynchronously updated on data inserts
      • Periodically exchanged between neighbors
      • Divergence in values triggers read repair
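To make the hashtree exchange more concrete, here is a deliberately simplified sketch: keys are bucketed into segments, each segment gets a hash, and the segment hashes roll up into one root hash. Two replicas compare roots first and only descend into segments whose hashes differ. The real trees have more levels and live on disk; the in-memory layout, the segment count, and the names `build_hashtree` and `differing_keys` are all assumptions for illustration.

```python
import hashlib


def sha1_hex(data):
    """Hex SHA-1 digest of a bytes value."""
    return hashlib.sha1(data).hexdigest()


def build_hashtree(objects, num_segments=4):
    """Build a two-level hash tree from a dict of BKey string -> value bytes."""
    segments = [{} for _ in range(num_segments)]
    for bkey, value in objects.items():
        index = int(sha1_hex(bkey.encode("utf-8")), 16) % num_segments
        segments[index][bkey] = sha1_hex(value)  # the {BKey, #} leaf
    segment_hashes = [
        sha1_hex(repr(sorted(seg.items())).encode("utf-8")) for seg in segments
    ]
    root = sha1_hex("".join(segment_hashes).encode("utf-8"))
    return {"root": root, "segment_hashes": segment_hashes, "segments": segments}


def differing_keys(tree_a, tree_b):
    """Compare two trees top-down and return the BKeys whose hashes diverge."""
    if tree_a["root"] == tree_b["root"]:
        return set()  # identical roots: nothing to repair
    keys = set()
    pairs = zip(tree_a["segment_hashes"], tree_b["segment_hashes"])
    for i, (hash_a, hash_b) in enumerate(pairs):
        if hash_a != hash_b:
            seg_a, seg_b = tree_a["segments"][i], tree_b["segments"][i]
            keys |= {k for k in seg_a.keys() | seg_b.keys() if seg_a.get(k) != seg_b.get(k)}
    return keys


if __name__ == "__main__":
    fresh = {"agents/agentp": b'{"type_s": "mammal"}', "agents/agentq": b"{}"}
    stale = dict(fresh, **{"agents/agentp": b'{"type_s": "reptile"}'})
    print(differing_keys(build_hashtree(fresh), build_hashtree(stale)))
```

In this model, each key returned by `differing_keys` would be a candidate for read repair.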
  15. Yokozuna
  16. What is Yokozuna?
      An extension of Riak which provides search capability over values stored in Riak
      Data stored and replicated in Riak is automatically indexed in Solr
      Solr queries are distributed across the Riak cluster
      http://github.com/basho/yokozuna
  17. Yokozuna
      [Architecture diagram. Inside the Erlang BEAM: Riak K/V and Yokozuna, with the Yokozuna components Admin API, Query API, Solr Monitor, Solr Query, extractors, and YZ AAE. Connections are labelled http, protobuf, and pipe. Operations sent to Solr: index/delete/repair. On the Solr side: analysis, Indexing]
  18. Indexing
      [Diagram: partitions 1 to 8 spread across Node 1 through Node 5; each node holds a copy of the index my_index]

          #!/usr/bin/env python
          import riak

          client = riak.RiakClient(host='node4')
          # Create the index, then associate it with the bucket
          client.create_search_index('my_index')
          bucket = client.bucket('agents')
          bucket.set_properties({'search_index': 'my_index'})
          key = 'agentp'
          value = {'name_s': "perry", 'type_s': "mammal"}
          obj = bucket.new(key, value)
          obj.store()

      Each of the three replicas holds:
          bucket: agents
          key: agentp
          value: {"name_s": "perry", "type_s": "mammal"}
      Each replica is indexed as:
          name_s: ["perry"]
          type_s: ["mammal"]
  19. Default/Custom Schema
      https://github.com/basho/yokozuna/blob/develop/priv/default_schema.xml

      <!-- XML -->
      <schema name="default" version="1.5">
       <fields>
        ...
        <dynamicField name="*_s" type="string" indexed="true" stored="true" multiValued="false"/>
        <dynamicField name="*_ss" type="string" indexed="true" stored="true" multiValued="true"/>
        ...
        <!-- Required fields -->
        <field name="_yz_id" type="_yz_str" indexed="true" stored="true" multiValued="false" required="true"/>
        <field name="_yz_rt" type="_yz_str" indexed="true" stored="true" multiValued="false"/>
        <field name="_yz_rb" type="_yz_str" indexed="true" stored="true" multiValued="false"/>
        <field name="_yz_rk" type="_yz_str" indexed="true" stored="true" multiValued="false"/>
        <field name="_yz_pn" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
        <field name="_yz_fpn" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
        <field name="_yz_vtag" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
        <field name="_yz_err" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
        <field name="_yz_ed" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
       </fields>
       <uniqueKey>_yz_id</uniqueKey>
       <types>
        ...
        <fieldType name="_yz_str" class="solr.StrField" sortMissingLast="true" />
       </types>
      </schema>
  20. Riak Query
      • All Solr queries are made on the Riak endpoint
      • Riak uses distributed (legacy) Solr to route queries to nodes in the Riak cluster using the shards parameter
      • Solr aggregates results and returns the result through Riak
      • Riak supports all query features supported in distributed Solr*
      * Protobuf interfaces currently have some limitations.
  21. Covering Sets
      [Diagram: partitions 1 to 8 spread across Node 1 through Node 5, with three copies of the object below]
          bucket: agents
          key: agentp
          value: {"name_s": "perry", "type_s": "reptile"}
      A Covering Set is a subset of all partitions such that for all BKeys in the keyspace, there is exactly one partition in the covering set in which that BKey can be found.
      Covering Sets are not unique!
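The definition above can be turned into a small sketch. It assumes that an object whose first (responsible) partition is f is stored on f and the next n_val - 1 partitions, and that a partition in the cover may be restricted to a subset of first partitions when the ring size is not a multiple of n_val. The query slide later in the deck shows such a restriction as a filter on `_yz_fpn`, which is read here as the first partition number. The function names and the greedy stepping strategy are assumptions for illustration, not Yokozuna's actual cover planner.

```python
def covering_plan(ring_size, n_val, start=1):
    """Greedily build a covering set.

    Returns a list of (partition, first_partitions) pairs: the partition to
    query and the first-partition numbers it is responsible for in this plan.
    """
    uncovered = set(range(1, ring_size + 1))
    plan = []
    partition = start
    while uncovered:
        # First partitions of all objects replicated on this partition
        held = {(partition - 1 - i) % ring_size + 1 for i in range(n_val)}
        take = held & uncovered
        plan.append((partition, sorted(take)))
        uncovered -= take
        partition = (partition - 1 + n_val) % ring_size + 1
    return plan


def is_covering(plan, ring_size, n_val):
    """Check that every first partition is covered exactly once by a partition that holds it."""
    seen = []
    for partition, first_partitions in plan:
        held = {(partition - 1 - i) % ring_size + 1 for i in range(n_val)}
        if not set(first_partitions) <= held:
            return False
        seen.extend(first_partitions)
    return sorted(seen) == list(range(1, ring_size + 1))


if __name__ == "__main__":
    print(covering_plan(ring_size=8, n_val=3, start=3))
    # The plan implied by the filters on the query slide: partitions 3, 8, and a filtered 5
    slide_plan = [(3, [1, 2, 3]), (8, [6, 7, 8]), (5, [4, 5])]
    print(is_covering(slide_plan, ring_size=8, n_val=3))
```

Different values of `start` give different valid plans, which is one way to see that covering sets are not unique.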
  22. Query
      [Diagram: partitions 1 to 8 spread across Node 1 through Node 5; each node holds a copy of the index my_index]

          #!/usr/bin/env python
          import riak

          client = riak.RiakClient(host='node4')
          bucket = client.bucket('agents')
          # Run a Solr query against the index through Riak
          results = bucket.search(
              'type_s:mammal',
              index='my_index'
          )

      The three indexed copies carry the partition numbers _yz_pn: 6, _yz_pn: 7, and _yz_pn: 8:
          name_s: ["perry"]
          type_s: ["mammal"]
  23. Query

          prompt$ curl 'http://node4:8098/search/query/my_index?wt=json&indent=true&q=type_s:mammal'
          {
            "responseHeader":{
              "status": 0,
              "QTime": 88,
              "params":{
                "q": "type_s:mammal",
                "shards": "node3:8093/internal_solr/my_index,node5:8093/internal_solr/my_index",
                "node5:8093": "(_yz_pn:5 AND (_yz_fpn:5 OR _yz_fpn:4))",
                "node3:8093": "_yz_pn:8 OR _yz_pn:3",
                "indent": "true",
                "wt": "json"}},
            "response":{"numFound":1,"start":0,"maxScore":0.30685282,"docs":[
              {
                "name_s": "perry",
                "type_s": "mammal",
                "_yz_id": "1*default*agents*agentp*8",
                "_yz_rk": "agentp",
                "_yz_rt": "default",
                "_yz_rb": "agents"}]
            }
          }
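The `params` section of the response shows what a distributed query plan looks like once it reaches Solr: a `shards` list plus one partition filter per node. The sketch below rebuilds those strings from a plan. It is an illustration of the string format visible in the response only; the function names and the shape of `node_plan` are hypothetical, and nothing here calls Riak or Solr.

```python
def partition_filter(partition, first_partitions=None):
    """Build the filter for one partition, optionally restricted by first partition number."""
    if first_partitions is None:
        return "_yz_pn:%d" % partition
    fpn = " OR ".join("_yz_fpn:%d" % f for f in first_partitions)
    return "(_yz_pn:%d AND (%s))" % (partition, fpn)


def distributed_params(index, node_plan):
    """Build the shards parameter and per-node filters.

    node_plan maps "host:port" to a list of (partition, first_partitions or None).
    """
    params = {
        "shards": ",".join("%s/internal_solr/%s" % (node, index) for node in node_plan)
    }
    for node, entries in node_plan.items():
        params[node] = " OR ".join(partition_filter(p, f) for p, f in entries)
    return params


if __name__ == "__main__":
    # The plan visible in the response above
    plan = {
        "node3:8093": [(8, None), (3, None)],
        "node5:8093": [(5, [5, 4])],
    }
    for name, value in distributed_params("my_index", plan).items():
        print("%s = %s" % (name, value))
```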
  24. Yokozuna Java Components
      Running in the JVM: Jetty, Monitor, Shard Translator, Entropy Data
  25. YZ AAE
      [Diagram: partitions 1 to 8 spread across Node 1 through Node 5; each node holds a copy of the index my_index]
      Each of the three replicas holds:
          bucket: agents
          key: agentp
          value: {"name_s": "perry", "type_s": "mammal"}
      Each replica is indexed as:
          name_s: ["perry"]
          type_s: ["mammal"]
  26. YZ AAE
      [Diagram: Node 2, holding partitions 2 and 7 and a copy of my_index]
      Stored object:
          bucket: agents
          key: agentp
          value: {"name_s": "perry", "type_s": "mammal"}
      Indexed document:
          name_s: ["perry"]
          type_s: ["mammal"]
      Yokozuna maintains its own set of AAE trees for data stored in Solr.
      • Hashtrees are stored persistently on disk
      • Updated on indexing operations
      • Periodically exchanged with the K/V AAE tree on the same node
      • If a value is missing in Solr, it is reindexed; if a value is indexed when it shouldn't be, it is deleted. Riak K/V is canonical.
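The repair rule on this slide can be sketched as a comparison of two hash maps, with Riak K/V treated as the source of truth. The slide covers the "missing in Solr" and "indexed when it shouldn't be" cases; treating a hash mismatch as a reason to reindex is an added assumption, and the function name `reconcile` and the dict-based inputs are hypothetical stand-ins for the real tree exchange.

```python
def reconcile(kv_hashes, solr_hashes):
    """Decide which keys to reindex and which to delete from Solr.

    Both arguments map a BKey string to an object hash. Riak K/V is canonical.
    """
    # Missing in Solr, or (assumption) indexed with a stale hash: reindex from K/V
    reindex = sorted(k for k, h in kv_hashes.items() if solr_hashes.get(k) != h)
    # Present in Solr but not in K/V: remove from the index
    delete = sorted(k for k in solr_hashes if k not in kv_hashes)
    return reindex, delete


if __name__ == "__main__":
    kv = {"agents/agentp": "g2IHDNr2", "agents/agentq": "hash-q"}
    solr = {"agents/agentq": "hash-q", "agents/old": "hash-old"}
    to_reindex, to_delete = reconcile(kv, solr)
    print("reindex:", to_reindex)
    print("delete:", to_delete)
```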
  27. Entropy Data Field

      <!-- XML -->
      <schema name="default" version="1.5">
       <fields>
        ...
        <!-- Required fields -->
        <field name="_yz_id" type="_yz_str" indexed="true" stored="true" multiValued="false" required="true"/>
        <field name="_yz_rt" type="_yz_str" indexed="true" stored="true" multiValued="false"/>
        <field name="_yz_rb" type="_yz_str" indexed="true" stored="true" multiValued="false"/>
        <field name="_yz_rk" type="_yz_str" indexed="true" stored="true" multiValued="false"/>
        <field name="_yz_pn" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
        <field name="_yz_fpn" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
        <field name="_yz_vtag" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
        <field name="_yz_err" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
        <field name="_yz_ed" type="_yz_str" indexed="true" stored="false" multiValued="false"/>
       </fields>
       <uniqueKey>_yz_id</uniqueKey>
       <types>
        ...
        <fieldType name="_yz_str" class="solr.StrField" sortMissingLast="true" />
       </types>
      </schema>
  28. _yz_ed field
      Example value: 2 default my_bucket agentp 8 g2IHDNr2
      • version: 2
      • bucket type: default
      • bucket name: my_bucket
      • key: agentp
      • partition: 8
      • object hash: g2IHDNr2
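A small sketch of splitting the example value into named fields. The field order is taken from the example string on this slide and may not match the layout used in every Yokozuna version; the `EntropyData` tuple and `parse_entropy_data` function are hypothetical names, and the split on single spaces assumes that none of the fields contains a space.

```python
from collections import namedtuple

# Field order follows the example value on the slide (an assumption)
EntropyData = namedtuple(
    "EntropyData",
    "version bucket_type bucket_name key partition object_hash",
)


def parse_entropy_data(raw):
    """Split a space-separated _yz_ed value into its six fields."""
    parts = raw.split(" ")
    if len(parts) != 6:
        raise ValueError("expected 6 space-separated fields, got %d" % len(parts))
    version, bucket_type, bucket_name, key, partition, object_hash = parts
    return EntropyData(version, bucket_type, bucket_name, key, int(partition), object_hash)


if __name__ == "__main__":
    print(parse_entropy_data("2 default my_bucket agentp 8 g2IHDNr2"))
```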
  29. Entropy Data Query

          prompt$ curl 'http://node3:8093/internal_solr/my_index/entropy_data?partition=8&limit=1000&wt=json&indent=true'
          {
            "responseHeader":{
              "status":0,
              "QTime":1},
            "response":{"numFound":135,"start":0,"docs":[
              ...
              {
                "vsn":"2",
                "riak_bucket_type":"default",
                "riak_bucket_name":"my_bucket",
                "riak_key":"agentp",
                "base64_hash":"g2IHDNr2"
              },
              ...
            ]},
            "more":false}
  30. Yokozuna Java Components
      Running in the JVM: Jetty, Monitor, Shard Translator, Entropy Data
  31. What does Solr bring to Riak?
      What does Riak bring to Solr?
  32. Thanks!