RediSearch / CRDT: Kyle Davis, Meir Shpilraien

PRESENTED BY
RediSearch / CRDT
Kyle Davis (@stockholmux)
Redis Labs, Head of Developer Advocacy
Meir Shpilraien (@Meir_Shpilraien)
Redis Labs, Senior Software Engineer

Overview of RediSearch
Kyle

RediSearch can be used for two [primary] things:
• Full-text search
• Secondary index

RediSearch can do three things with a query:
• Search
• Aggregate
• Autocomplete

Data Lifecycle in RediSearch / Search and Aggregate
• Create a schema using four types
– Text
– Numeric
– Tag
– Geospatial
• Add documents in real time
– Directly
– From a Hash
– Index only
• Search & aggregate
• Delete documents as needed
• Drop the whole index
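The lifecycle above maps onto one command per step. The sketch below only builds the argument lists, assuming the RediSearch 1.x command set of this talk (FT.CREATE, FT.ADD, FT.ADDHASH, FT.SEARCH, FT.AGGREGATE, FT.DEL, FT.DROP); the index name `cars`, the field names and the values are hypothetical examples.

```python
def lifecycle_commands(index="cars"):
    """Return RediSearch 1.x commands (as argument lists) for one full data lifecycle.

    The index name, field names and values are hypothetical examples.
    """
    return [
        # 1. Create a schema using the four field types
        ["FT.CREATE", index, "SCHEMA",
         "model", "TEXT", "year", "NUMERIC", "condition", "TAG", "location", "GEO"],
        # 2a. Add a document directly: fields are stored and indexed (document score 1.0)
        ["FT.ADD", index, "car:1", "1.0", "FIELDS",
         "model", "ford truck", "year", "2005", "condition", "good", "location", "-73.9,40.7"],
        # 2b. Index an existing Redis Hash stored under the key car:2
        ["FT.ADDHASH", index, "car:2", "1.0"],
        # 2c. Index only: NOSAVE indexes the fields without storing the document
        ["FT.ADD", index, "car:3", "1.0", "NOSAVE", "FIELDS", "model", "chevy truck"],
        # 3. Search and aggregate
        ["FT.SEARCH", index, "ford truck"],
        ["FT.AGGREGATE", index, "*", "GROUPBY", "1", "@condition",
         "REDUCE", "COUNT", "0", "AS", "n"],
        # 4. Delete a document, then 5. drop the whole index
        ["FT.DEL", index, "car:1"],
        ["FT.DROP", index],
    ]
```

With redis-py, each list could be sent as `r.execute_command(*cmd)` to a server with RediSearch 1.x loaded. Later 2.x releases deprecate FT.ADD, FT.ADDHASH and FT.DROP in favor of indexing Hashes by key prefix, so check the command reference for your version.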

Query Language
• Goals
– Intentionally not SQL
– But familiar
– Exposable to end users
• Simple
– No knowledge of data/structure needed
• Powerful
– With knowledge, zero in on data

Query Syntax
The simplest query is just terms: ford truck

Query Syntax – more advanced
AND / OR / NOT / Exact Phrase / Geospatial / Tags / Prefix / Number Ranges / Optional Terms & more
And combine them all into one query:
(chev*|ford) -explorer ~truck @year:[2001 2011] @location:[74 40 100 km] @condition:{good | verygood}
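To make the combined query concrete, here is a hand-written Python filter that mimics what each clause asks for. It is not RediSearch's engine: it ignores stemming and real scoring, and the document shape (`text`, `year`, `location` as a (lon, lat) pair, `condition`) is a hypothetical stand-in for an indexed schema. The geo clause is read as longitude, latitude, radius, unit.

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points."""
    lat1_r, lat2_r = math.radians(lat1), math.radians(lat2)
    d_lat = lat2_r - lat1_r
    d_lon = math.radians(lon2 - lon1)
    a = math.sin(d_lat / 2) ** 2 + math.cos(lat1_r) * math.cos(lat2_r) * math.sin(d_lon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def matches(doc):
    """Hand-written equivalent of the combined query's filters (a space means AND)."""
    words = doc["text"].lower().split()
    # (chev*|ford): a word with the prefix "chev" OR the exact term "ford"
    if not any(w.startswith("chev") or w == "ford" for w in words):
        return False
    # -explorer: exclude documents that contain "explorer"
    if "explorer" in words:
        return False
    # @year:[2001 2011]: inclusive numeric range
    if not 2001 <= doc["year"] <= 2011:
        return False
    # @location:[74 40 100 km]: within 100 km of longitude 74, latitude 40
    if haversine_km(74, 40, *doc["location"]) > 100:
        return False
    # @condition:{good | verygood}: the tag must be one of these values
    return doc["condition"] in {"good", "verygood"}

def rank(docs):
    """~truck is optional: it never filters, it only moves documents containing it up."""
    hits = [d for d in docs if matches(d)]
    return sorted(hits, key=lambda d: "truck" in d["text"].lower().split(), reverse=True)
```

The optional `~truck` term is the interesting design point: it contributes to ranking but a document without it can still match.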

Full-text Search
• Stop words:
– “a fox in the woods” -> “fox woods”
• Stemming:
– Query “going” -> find “going”, “go”, “gone”
• Slop:
– Query “glass pitcher”, slop 2 -> “glass gallon beer pitcher”
• With or without content:
– Query “To be or not to be” -> Hamlet (without the whole play)
• Matched text highlight/summary:
– Query “To be or not to be” -> Hamlet. <b>To be, or not to be</b>: that is the question
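Two of these behaviors, stop words and slop, are easy to sketch. The stop-word list below is a hypothetical small subset (RediSearch ships a larger default list), and the slop check assumes in-order matching, as with RediSearch's INORDER flag; without INORDER, RediSearch also accepts the terms in any order.

```python
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "or", "the", "to"}  # hypothetical small list

def tokenize(text):
    """Lower-case words with surrounding punctuation stripped."""
    return [w.strip('.,!?;:"').lower() for w in text.split()]

def remove_stop_words(text):
    return [w for w in tokenize(text) if w not in STOP_WORDS]

def phrase_within_slop(query, text, slop):
    """True if the query terms occur in order with at most `slop` intervening words in total."""
    terms, words = tokenize(query), tokenize(text)
    if not terms:
        return True
    for start, word in enumerate(words):
        if word != terms[0]:
            continue
        pos, gaps, found = start, 0, True
        for term in terms[1:]:
            try:
                # The earliest later occurrence keeps the total gap as small as possible.
                nxt = words.index(term, pos + 1)
            except ValueError:
                found = False
                break
            gaps += nxt - pos - 1
            pos = nxt
        if found and gaps <= slop:
            return True
    return False
```

For example, “glass gallon beer pitcher” has two words between “glass” and “pitcher”, so it matches with slop 2 but not with slop 1.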

Full-text Search
• Synonyms
– Query “Bob” -> find documents with “Robert”
• Query spell check
– “a fxo in the woods” -> Did you mean “a fox in the woods”?
• Phonetic search
– “John Smith” -> “Jon Smyth”
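Phonetic search works by comparing sound-based codes instead of spellings. RediSearch's phonetic fields use Double Metaphone; the sketch below swaps in the much simpler American Soundex algorithm, only to show the idea that “John Smith” and “Jon Smyth” reduce to the same codes. `sounds_alike` is a hypothetical helper, not a RediSearch API.

```python
# Soundex digit for each consonant; vowels, y, h and w have no digit.
CODES = {ch: digit
         for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                "4": "l", "5": "mn", "6": "r"}.items()
         for ch in letters}

def soundex(word):
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    word = word.lower()
    result = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w are skipped without separating equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after a vowel is kept
    return (result + "000")[:4]

def sounds_alike(a, b):
    """Word-by-word Soundex comparison of two names (hypothetical helper)."""
    wa, wb = a.split(), b.split()
    return len(wa) == len(wb) and all(soundex(x) == soundex(y) for x, y in zip(wa, wb))
```

Soundex is coarse (it collides many unrelated names), which is one reason search engines prefer Metaphone-family encoders.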

Scoring, Weights, and Sorting
• Each field can have a weight that influences a document's rank in the results
• Each document can have a score to influence rank
• Built-in scoring functions
– Default: TF-IDF (term frequency–inverse document frequency)
• Variant: DOCNORM
• Variant: BM25
– DISMAX (Solr’s default)
– DOCSCORE
– HAMMING for binary payloads
• Fields can be independently sortable; sorting by a field overrides any built-in scorer
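The interplay of field weights, document score and TF-IDF can be shown with a textbook formula. This is a sketch, not RediSearch's exact TFIDF scorer (which also normalizes term frequencies); the `log(1 + N/df)` form of IDF and the document layout are assumptions chosen for clarity.

```python
import math

def tfidf_rank(docs, query_terms, field_weights):
    """Rank doc ids for query_terms with a textbook, field-weighted TF-IDF.

    docs: {doc_id: {"score": float, "fields": {field_name: text}}}
    """
    n = len(docs)
    tokenized = {doc_id: {f: text.lower().split() for f, text in d["fields"].items()}
                 for doc_id, d in docs.items()}
    # Document frequency: number of documents containing the term in any field.
    df = {t: sum(1 for fields in tokenized.values()
                 if any(t in words for words in fields.values()))
          for t in query_terms}
    scores = {}
    for doc_id, fields in tokenized.items():
        total = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + n / df[t])
            # Weighted term frequency: each occurrence counts with its field's weight.
            tf = sum(field_weights.get(f, 1.0) * words.count(t) for f, words in fields.items())
            total += tf * idf
        # The per-document score (default 1.0) multiplies the text score.
        scores[doc_id] = total * docs[doc_id].get("score", 1.0)
    return sorted(scores, key=scores.get, reverse=True)
```

With a title weight of 5, a single title hit outranks two body hits; lowering that document's own score to 0.1 flips the order.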

Aggregations
• Processes and transforms
• Same query language as search
• Can group, sort and apply transformations
• Follows a pipeline of composable actions:
Filter → Group (Reduce, Reduce) → Apply → Sort → Apply

Grouping & Applications
• Reducers:
– COUNT
– COUNT_DISTINCT
– COUNT_DISTINCTISH
– SUM
– MIN
– MAX
– AVG
– STDDEV
– QUANTILE
– TOLIST
– FIRST_VALUE
– RANDOM_SAMPLE
• Manipulate
– Strings
• substr(upper('hello world'),0,3) -> “HEL”
– Numbers w/ arithmetic
• sqrt(log(foo) * floor(@bar/baz)) + (3^@qaz % 6)
– Timestamp to calendar
• timefmt(@mytimestamp, "%b %d %Y %H:%M:%S") -> Feb 24 2018 00:05:48
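A GROUPBY followed by REDUCE clauses can be pictured as “bucket the rows, then fold each bucket”. The sketch implements a handful of the listed reducers in plain Python with simplified semantics (for example, FIRST_VALUE just takes the first row seen); `group_reduce` and the row layout are hypothetical, not the RediSearch implementation.

```python
# Simplified stand-ins for some of the reducers listed above.
REDUCERS = {
    "COUNT": len,
    "COUNT_DISTINCT": lambda vals: len(set(vals)),
    "SUM": sum,
    "MIN": min,
    "MAX": max,
    "AVG": lambda vals: sum(vals) / len(vals),
    "TOLIST": lambda vals: list(dict.fromkeys(vals)),  # distinct values, first-seen order
    "FIRST_VALUE": lambda vals: vals[0],
}

def group_reduce(rows, key, reducers):
    """Group rows by `key`, then apply each {alias: (reducer_name, field)} to every group.

    A field of None passes whole rows to the reducer (like COUNT 0).
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    results = []
    for group_value, members in groups.items():
        out = {key: group_value}
        for alias, (name, field) in reducers.items():
            values = [m[field] for m in members] if field else members
            out[alias] = REDUCERS[name](values)
        results.append(out)
    return results
```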

RediSearch in Action: FT.AGGREGATE
FT.AGGREGATE shipments "@box_area:[300 +inf]"
APPLY "year(@shipment_timestamp / 1000)" AS shipment_year
GROUPBY 1 @shipment_year REDUCE COUNT 0 AS shipment_count
SORTBY 2 @shipment_count DESC
LIMIT 0 3
APPLY "format('%sk+ Shipments', floor(@shipment_count / 1000))" AS shipment_count
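Each stage of that pipeline has a direct plain-Python counterpart. The sketch assumes documents are dicts with `box_area` and a millisecond `shipment_timestamp`, and that years are computed in UTC; it mirrors the command step by step rather than reproducing RediSearch internals.

```python
import math
from collections import Counter
from datetime import datetime, timezone

def top_shipment_years(shipments, min_area=300, limit=3):
    """Plain-Python mirror of the FT.AGGREGATE pipeline above."""
    # Query: "@box_area:[300 +inf]"
    matching = [s for s in shipments if s["box_area"] >= min_area]
    # APPLY "year(@shipment_timestamp / 1000)": milliseconds -> calendar year (UTC assumed)
    years = [datetime.fromtimestamp(s["shipment_timestamp"] / 1000, tz=timezone.utc).year
             for s in matching]
    # GROUPBY 1 @shipment_year REDUCE COUNT 0 AS shipment_count
    counts = Counter(years)
    # SORTBY 2 @shipment_count DESC, then LIMIT 0 3
    top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:limit]
    # APPLY "format('%sk+ Shipments', floor(@shipment_count / 1000))"
    return [(year, "%sk+ Shipments" % math.floor(count / 1000)) for year, count in top]
```

Note how the final APPLY reuses the alias `shipment_count`, replacing the raw count with a display string only after sorting has used it.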

Autocomplete/Suggestions
• In the module, but separate storage
• Radix trie-based, optimized for real-time, as-you-type completions
• Simple API
– Add a suggestion (FT.SUGADD)
– Get a suggestion (FT.SUGGET)
– Delete a suggestion (FT.SUGDEL)
• Specify or increment the “score” of each item to create custom sortings
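The suggestion API can be modeled with a prefix trie. This sketch uses a plain, uncompressed trie (RediSearch uses a compressed radix trie) and hypothetical method names; `add`, `get` and `delete` loosely correspond to FT.SUGADD (with INCR), FT.SUGGET and FT.SUGDEL.

```python
class SuggestionTrie:
    """Plain (uncompressed) trie of scored suggestions; a simplified model."""
    END = None  # dict key marking "a suggestion ends here"; never collides with a character

    def __init__(self):
        self.root = {}

    def add(self, text, score, incr=False):
        node = self.root
        for ch in text.lower():
            node = node.setdefault(ch, {})
        prev = node[self.END][1] if self.END in node else 0.0
        # With incr=True the score accumulates, like FT.SUGADD ... INCR
        node[self.END] = (text, prev + score if incr else score)

    def delete(self, text):
        node = self.root
        for ch in text.lower():
            if ch not in node:
                return False
            node = node[ch]
        # Empty branches are left in place; a real implementation would prune them.
        return node.pop(self.END, None) is not None

    def get(self, prefix, max_results=5):
        node = self.root
        for ch in prefix.lower():
            if ch not in node:
                return []
            node = node[ch]
        found, stack = [], [node]
        while stack:  # collect every suggestion below the prefix node
            current = stack.pop()
            for key, child in current.items():
                if key is self.END:
                    found.append(child)
                else:
                    stack.append(child)
        found.sort(key=lambda item: item[1], reverse=True)  # highest score first
        return [text for text, _ in found[:max_results]]
```

Because lookups walk only the prefix path, completion cost depends on the prefix length and the size of the matching subtree, not on the total number of suggestions.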

RediSearch CRDT / Benchmark
Meir

Agenda:
1 RediSearch Benchmark
2 What is CRDT?
3 Search & CRDT

Use case 1: Building an Index and Running a Simple Query

Dataset
• Indexing a Wikipedia dataset: 5.6M docs, 5 GB
• Date of dump: Feb 7, 2019

RediSearch vs Elasticsearch: indexing time
RediSearch: 58% faster

RediSearch vs Elasticsearch: two-word search
RediSearch: 4x faster

Use case 2: A Multi-Tenant Search

What is a multi-tenant search?
• Serving a multi-tenant application
• Each tenant has its own dedicated, isolated search index
• Number of docs per index: 500
• Total number of tenants: 50k
• Total number of indexed documents: 25M (500 × 50k)
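The key property of this setup is isolation: a query runs against one tenant's index and can never see another tenant's documents, even when document ids collide. The in-memory model below is hypothetical (a tiny inverted index per tenant) and only illustrates that property, not how RediSearch stores its indexes.

```python
from collections import defaultdict

class MultiTenantSearch:
    """One small inverted index per tenant (hypothetical in-memory model)."""

    def __init__(self):
        # tenant_id -> term -> set of doc ids
        self.indexes = defaultdict(lambda: defaultdict(set))

    def add(self, tenant, doc_id, text):
        for term in text.lower().split():
            self.indexes[tenant][term].add(doc_id)

    def search(self, tenant, query):
        """AND of all query terms, looked up only in this tenant's index."""
        index = self.indexes.get(tenant)  # .get does not create an index for unknown tenants
        if index is None:
            return set()
        postings = [index.get(t, set()) for t in query.lower().split()]
        return set.intersection(*postings) if postings else set()
```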

Multi-tenant Results
RediSearch: 3 min 21 s

Elasticsearch: crashed after 921 indices.

• Natively in memory (*Elasticsearch was running with cache enabled)
• C (RediSearch) vs. Java (Elasticsearch)
• Extremely optimized search engine built from the ground up vs. the less optimized, 20-year-old Lucene search engine
• Lightweight Redis RESP protocol vs. Elasticsearch's HTTP-based protocol

Setup
• Client & Server: AWS c4.8xlarge (36 vCPU, 60 GB RAM)
• Diagram: Elasticsearch client, Redis client, RediSearch

Configuration Settings
Elasticsearch:
• shards: 5
• JVM settings (Xms and Xmx)
• indices.memory.index_buffer_size
• index.refresh_interval (triggers flushes)
• index.number_of_replicas
RediSearch:
• Doc table size: 10M
• No thread concurrency (handled using the Enterprise cluster)

Multi-site Active-Active replication
Consensus-based replication
A single instance needs to know that a majority of parties agreed on an operation before applying it.
Advantages:
- Guaranteed strong consistency
- Well-known algorithms: Paxos, Raft, ...
Disadvantage:
- Reaching agreement takes time (especially on a worldwide scale)
* Shapiro, Marc; Preguiça, Nuno; Baquero, Carlos; Zawirski, Marek (2011), Conflict-Free Replicated Data Types, Lecture Notes in Computer Science 6976
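The “majority of parties” rule is a small piece of arithmetic. With n parties a majority is n // 2 + 1, and any two majorities must share at least one party, which is why a later decision cannot silently contradict an earlier one. The helper names below are hypothetical.

```python
def majority(n):
    """Smallest number of parties that forms a strict majority of n."""
    return n // 2 + 1

def can_apply(acks, n):
    """An operation may be applied once a majority of the n parties acknowledged it."""
    return acks >= majority(n)
```

The cost noted on the slide follows directly: every operation waits for the slowest member of the fastest majority, which on a worldwide deployment means at least one cross-region round trip.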

What is a CRDT?
Conflict-Free Replicated Data Types
• A consensus-free technique that satisfies the “eventual consistency” properties
- No need to coordinate with other parties in advance → better performance
- Once all parties have received the same updates, their states are aligned → strong eventual consistency
Example: one site runs INCRBY x 5 while another runs DECRBY x 3; after the sites sync, both read x = 2
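The INCRBY/DECRBY example can be reproduced with the textbook PN-Counter from Shapiro et al.: each replica tallies its own increments and decrements, and merging takes the per-replica maximum. This is a sketch of the classic CRDT, not a description of how Redis CRDB implements counters internally.

```python
class PNCounter:
    """Textbook PN-Counter: per-replica tallies of increments (p) and decrements (n)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.p = {}  # replica id -> total increments made at that replica
        self.n = {}  # replica id -> total decrements made at that replica

    def incrby(self, amount):
        # Amounts are assumed non-negative; decrements go through decrby.
        self.p[self.replica_id] = self.p.get(self.replica_id, 0) + amount

    def decrby(self, amount):
        self.n[self.replica_id] = self.n.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge no matter how often or in what order they sync.
        for mine, theirs in ((self.p, other.p), (self.n, other.n)):
            for rid, count in theirs.items():
                mine[rid] = max(mine.get(rid, 0), count)
```

Neither site had to ask the other before writing, yet after exchanging state both agree on x = 2, which is exactly the “no coordination in advance” trade-off above.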

CRDT & RediSearch

RediSearch & CRDT
RediSearch and CRDB (Redis CRDT) → a multi-site Active-Active search engine
• RediSearch (FT.ADD) saves the raw data as a Hash.
• CRDT replicates the Hash between the sites.
• When a Hash is received, CRDT notifies RediSearch, which reindexes the new data.
• RediSearch is notified only after CRDT has resolved any conflicts.
Diagram (Site 1 → Site 2):
Site 1: FT.ADD idx doc1 1.0 FIELDS name Danny → stored as HSET doc1 name Danny in CRDT
CRDT replicates the data to the other replica
Site 2: CRDT notifies RediSearch when the new data arrives → RediSearch indexes doc1
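The flow above can be modeled with two in-memory sites. Everything here is hypothetical: replication is synchronous instead of asynchronous, and the conflict rule is an assumed per-field last-writer-wins on a caller-supplied timestamp (equal timestamps keep the existing value). What it does show is the ordering from the slide: the Hash is merged first, and only then is the search index rebuilt from the merged result.

```python
class Site:
    """Hypothetical model of one Active-Active site: a replicated Hash store plus a search index."""

    def __init__(self, name):
        self.name = name
        self.hashes = {}  # doc_id -> {field: (value, timestamp)}
        self.index = {}   # term -> set of doc_ids
        self.peers = []

    def ft_add(self, doc_id, fields, ts):
        # Local write: save the raw data as a Hash, index it, then replicate it.
        self._apply(doc_id, fields, ts)
        for peer in self.peers:
            peer.receive(doc_id, fields, ts)

    def receive(self, doc_id, fields, ts):
        # Replicated write arriving from another site.
        self._apply(doc_id, fields, ts)

    def _apply(self, doc_id, fields, ts):
        stored = self.hashes.setdefault(doc_id, {})
        for field, value in fields.items():
            # Assumed conflict rule: last writer wins per field.
            if field not in stored or ts > stored[field][1]:
                stored[field] = (value, ts)
        # The search side is notified only after the conflict is resolved.
        self._reindex(doc_id)

    def _reindex(self, doc_id):
        for docs in self.index.values():
            docs.discard(doc_id)
        for value, _ in self.hashes[doc_id].values():
            for term in str(value).lower().split():
                self.index.setdefault(term, set()).add(doc_id)

    def search(self, term):
        return self.index.get(term.lower(), set())
```

Because the index is always derived from the merged Hash, every site that has received the same writes returns the same search results, even if the writes arrived in a different order.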
