Ronil Merchant and Apoorva Gaurav presented on how Bounce, a mobility startup providing dockless scooter sharing, implemented real-time geo-searching at scale using RediSearch. They faced challenges with high throughput location updates on PostgreSQL and evaluated Elasticsearch and RediSearch. RediSearch indexed documents by geohash for location and performed better than Elasticsearch or scaling PostgreSQL. Testing showed tagging fields and omitting latitude/longitude indexing improved performance. They now use AWS DMS and Kinesis to sync data from PostgreSQL to RediSearch for low latency bike discovery.
4. What is Bounce?
We are a mobility startup providing dock-less scooter sharing solutions to consumers. It is basically a service that enables you to pick up a scooter from anywhere and drop it anywhere.
https://bounceshare.com/
6. Agenda
7. Problem Statement
Bike listing lies at the core of the user experience at Bounce, since it marks the beginning of the user booking flow. To give our users the best experience we need to list the bikes that are nearest to them and best match their preferences. Hence bike discovery needs to be highly accurate as well as have minimal latency.
9. Challenges with the current implementation
The rapidly changing nature of our dataset (e.g. we receive around 4k location updates per second) is coupled with the high throughput required at low latency ⚡ (we are currently at 1000 listings/second).
It is difficult to handle this amount of scale on Postgres once we hit 10x scale (which has been the trend over the last 10 months), as it will impact all db operations and become a serious headache for the Dev-ops and platform teams 😭😤!
Hence the use case essentially boiled down to having a db that performs extremely well at low-latency geo searching and document filtering in high-write scenarios.
11. Elasticsearch
● Ran a geo distance filter query along with some other filters (term queries).
● Were able to achieve ~1300 ops/s with set and get operations on a single r4.4xlarge instance, with CPU hovering around 50% and close to 1 lakh (~100k) keys.
● Load tests were performed on a single-node r4.4xlarge. Achieved response times of ~14 ms for both reads and writes at 1200 requests per second, with more than 80k docs overall and a document structure along the lines of the sketch below.
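The document structure and query from the original slide are not reproduced here; the following is a minimal sketch of what they could look like, assuming hypothetical field names (bike_id, status, a geo_point field called location) and the standard Elasticsearch geo_distance / term query DSL sent over the REST API.

```python
# Minimal sketch (not the original slide's query): a hypothetical bike document
# and a geo_distance + term filter query sent to the _search endpoint.
import requests

ES = "http://localhost:9200"   # assumed local cluster
INDEX = "bikes"                # hypothetical index name

# Hypothetical document structure; "location" is assumed to be a geo_point field.
bike_doc = {
    "bike_id": "BLR-0042",
    "status": "AVAILABLE",
    "location": {"lat": 12.9716, "lon": 77.5946},
}
requests.put(f"{ES}/{INDEX}/_doc/{bike_doc['bike_id']}", json=bike_doc)

# Available bikes within 2 km of the user: geo_distance filter + term filter.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"status": "AVAILABLE"}},
                {"geo_distance": {
                    "distance": "2km",
                    "location": {"lat": 12.9720, "lon": 77.5950},
                }},
            ]
        }
    }
}
hits = requests.post(f"{ES}/{INDEX}/_search", json=query).json()
```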
12. Why not go ahead with Postgres scaling?
It would have been a stop-gap solution, as at a certain scale we would eventually have faced the same problems again. We are already on a db.m4.10xlarge. We also wanted to move towards a document-based structure.
13. Motivation behind choosing Redis
Redis is known to thrive under huge loads with excellent read and write performance. But bike discovery needs more than mere key-value fetching; it needs a certain amount of document querying capability as well.
Enter RediSearch to the rescue 🚑!
14. What is RediSearch?
As the name suggests, it is a Redis-powered search engine. It has full-text search, filtering and geo-filtering capabilities.
https://oss.redislabs.com/redisearch/index.html
15. Approach taken
There were 2 options to go ahead with:
1. Use the RediSearch geo index
https://oss.redislabs.com/redisearch/Overview.html#geo_index
2. Use geohashes, index them, and query them via the search index
https://en.wikipedia.org/wiki/Geohash
We ran some test evaluations of both approaches and decided to go ahead with the 2nd approach, as it gave better performance for our use case. The test overview will be shared later in the presentation.
16. What is a Geo Index?
As per the documentation, geo indexes utilize Redis' own geo-indexing capabilities. At query time, the geographical part of the query (a radius filter) is sent to Redis, which returns only the ids of documents that are within that radius.
An example geo query looks like the sketch below.
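The example query image from the slide is not reproduced here; the following is a minimal sketch of the geo-index approach, assuming hypothetical index and field names and RediSearch 2.x-style hash indexing (the version used in the talk may differ), issued through redis-py's generic execute_command.

```python
# Minimal sketch of the GEO-index approach (hypothetical index and field names).
import redis

r = redis.Redis(host="localhost", port=6379)

# Index with a GEO field for location and a plain TEXT field for status.
r.execute_command(
    "FT.CREATE", "bikes_geo", "ON", "HASH", "PREFIX", "1", "bike:",
    "SCHEMA", "location", "GEO", "status", "TEXT",
)

# One bike document; GEO fields are stored as "lon,lat".
r.hset("bike:BLR-0042", mapping={"location": "77.5946,12.9716", "status": "AVAILABLE"})

# The radius filter is evaluated by Redis itself: bikes within 2 km of the user.
results = r.execute_command(
    "FT.SEARCH", "bikes_geo", "@status:AVAILABLE @location:[77.5950 12.9720 2 km]"
)
```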
17. What is a GeoHash? 🤔
Geohash is a public-domain geocode system which encodes a geographic location into a short string of letters and digits. It is a hierarchical spatial data structure which subdivides space into buckets of grid shape.
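To make the bucketing concrete, here is a small self-contained encoder (illustrative, not from the slides): bits alternate between longitude and latitude, every 5 bits become one base32 character, so a longer hash means a smaller cell and a shared prefix means "nearby".

```python
# Illustrative geohash encoder: interleave lon/lat bisection bits, emit base32.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, ch, bit = [], 0, 0
    even = True  # even bit positions refine longitude, odd ones latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch = ch << 1
            rng[1] = mid
        even, bit = not even, bit + 1
        if bit == 5:  # 5 bits -> one base32 character
            chars.append(BASE32[ch])
            ch, bit = 0, 0
    return "".join(chars)

# Precision 6 gives roughly a 1.2 km x 0.6 km cell, a reasonable bike-search bucket.
# e.g. geohash_encode(12.9716, 77.5946, 6) starts with "tdr" (the Bengaluru region).
```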
18. As geohash searches are basically text-based searches, they should be faster than the geo index, which has to compute whether the given coordinates fall within the search radius.
We ran some test evaluations to back this up; the test overview will be shared later in the presentation.
19. ● Basically, a document with the relevant bike data for bike discovery was indexed into Redis, and filtering was done based on the geohash and other additional filters.
● Geohashes and most of the other string-based filters (status, availability, bike_type, etc.) were indexed as the TAG data type (a sketch of such a schema follows below).
● The reason being that we didn't need to leverage the full-text searching capabilities of RediSearch, and hence did not index the fields as FULLTEXT.
● Significant performance gains were also observed using TAG over FULLTEXT, due to having a limited key set.
● Geohash also has querying advantages and can be used to aggregate based on location for various use cases like demand pricing, etc.
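A minimal sketch of what such a schema could look like (hypothetical field names, RediSearch 2.x-style hash indexing; the exact syntax in the deck may differ): the geohash and the string filters as TAG fields, lat/lon stored but kept out of the inverted index.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Geohash and string filters as TAG; lat/lon declared NOINDEX (per the slides) so
# they are stored with the document but excluded from the inverted index.
r.execute_command(
    "FT.CREATE", "bikes", "ON", "HASH", "PREFIX", "1", "bike:",
    "SCHEMA",
    "geohash", "TAG",
    "status", "TAG",
    "availability", "TAG",
    "bike_type", "TAG",
    "lat", "NUMERIC", "NOINDEX",
    "lon", "NUMERIC", "NOINDEX",
)

# Indexing one bike: the geohash is derived from lat/lon (see the encoder above).
r.hset("bike:BLR-0042", mapping={
    "geohash": "tdr1wn",
    "status": "ACTIVE",
    "availability": "AVAILABLE",
    "bike_type": "SCOOTER",
    "lat": 12.9716,
    "lon": 77.5946,
})

# TAG filters use exact matches inside {}.
results = r.execute_command(
    "FT.SEARCH", "bikes", "@geohash:{tdr1wn} @availability:{AVAILABLE}"
)
```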
20. Tests run
Stress tests were run with several types of schemas:
1. With a GEO field for location, and other filters as FULLTEXT.
2. Indexing filters as FULLTEXT and running exact-match queries (lat/lon being indexed), using a geohash for location representation.
3. Indexing filters as TAG (lat/lon being indexed).
4. Indexing filters as FULLTEXT and running wildcard search queries (lat/lon as NOINDEX).
5. Lat/lon fields as NOINDEX and all fields to be filtered upon as TAG.
21. Results
Case 1: Geo query, with exact search for the other filters.
Reads: 85/s, latency: 120 ms
Writes: 85/s, latency: 125 ms
Case 2: All filter fields defined as FULLTEXT, using exact search.
Reads: 320/s, latency: 30 ms
Writes: 380/s, latency: 26 ms
Case 3: Filter fields defined as TAG fields; lat and lon fields also indexed.
Reads: 400/s, latency: 18.42 ms
Writes: 1.01K/s, latency: 19.19 ms
22. A notably high response time is seen, attributed to a key explosion in the inverted index due to updates with randomly changing lat/lon values for documents.
23. Case 4: Filter fields defined as FULLTEXT, using wildcard search; lat/lon fields marked as NOINDEX.
Reads: 1.4k/s, latency: 6.5 ms
Writes: 1.75k/s, latency: 5.5 ms
24. Case 5: Filter fields defined as TAG fields; lat/lon fields marked as NOINDEX.
For 1.4k reads/s and 4.5k writes/s:
Read latency: 3.5 ms 💪🙌
Write latency: 4 ms 👏😍
25. As we can see, significant performance gains were observed when the lat/lon fields were marked as NOINDEX. The reason can be attributed to the following excerpt from the documentation:
https://oss.redislabs.com/redisearch/Overview.html#index_garbage_collection
In a nutshell, a reduced keyspace in the inverted index resulted in a commendable boost in performance.
26. Final Implementation Overview
● Currently using an AWS DMS + Kinesis infrastructure to live-sync data from Postgres to Redis.
● A DMS task does CDC (Change Data Capture) for certain tables and loads the changes into Kinesis. A consumer application reads from Kinesis and updates the data in RediSearch (a sketch of such a consumer follows below).
● Eventually, certain high-frequency updates will be moved completely to Redis.
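The production consumer is not shown in the deck; the following is a minimal single-shard sketch of what the Kinesis side could look like, assuming boto3, a hypothetical stream name, a DMS-style JSON payload shape, and the hash layout sketched earlier. A real consumer would enumerate shards, checkpoint, and handle retries.

```python
import json
import time

import boto3
import redis

kinesis = boto3.client("kinesis", region_name="ap-south-1")  # assumed region
r = redis.Redis(host="localhost", port=6379)

STREAM = "bike-location-cdc"  # hypothetical stream fed by the DMS task

# Single-shard sketch only.
shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM, ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]

while True:
    out = kinesis.get_records(ShardIterator=iterator, Limit=500)
    for record in out["Records"]:
        change = json.loads(record["Data"])   # CDC payload; exact shape is assumed
        bike = change["data"]                 # hypothetical key carrying the row data
        r.hset(f"bike:{bike['bike_id']}", mapping={
            "geohash": bike["geohash"],
            "status": bike["status"],
            "availability": bike["availability"],
            "bike_type": bike["bike_type"],
            "lat": bike["lat"],
            "lon": bike["lon"],
        })
    iterator = out["NextShardIterator"]
    time.sleep(0.2)  # stay under per-shard read limits
```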
28. Final Implementation Overview
One of the caveats of using a geohash is that a user can be at the edge of a geohash grid cell. To make the behaviour closer to a radial search, we also consider the neighbouring grid cells in our search query.
The query looks something along the lines of the sketch below.
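The query from the slide is not reproduced; a minimal sketch of the idea, using a hypothetical geohash_neighbors() helper (available in common geohash libraries or implementable with a small lookup-table algorithm) and OR-ing the cells inside a single TAG filter:

```python
# Sketch: search the user's geohash cell plus its 8 neighbours.
import redis

r = redis.Redis(host="localhost", port=6379)

def geohash_neighbors(cell: str) -> list:
    """Hypothetical helper: return the 8 cells surrounding `cell`.
    In practice this comes from a geohash library."""
    raise NotImplementedError

user_cell = "tdr1wn"  # geohash of the user's location
cells = [user_cell] + geohash_neighbors(user_cell)

# TAG fields accept alternatives separated by | inside the braces.
query = "@geohash:{%s} @availability:{AVAILABLE}" % "|".join(cells)
results = r.execute_command("FT.SEARCH", "bikes", query)
```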
29. Conclusion
● Moving towards a document-based RediSearch structure allowed extremely fast querying on multiple filters.
● It will also help us reduce quite a huge chunk of writes on our primary Postgres database.
● Moving to a document-based structure also helped us move away from expensive joins.
● Additional use cases, like filtering based on fuel or condition parameters, can be easily implemented as shown below.
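The slide's example is not reproduced; a minimal sketch of how such a filter could be added, assuming a hypothetical numeric fuel_level field appended to the schema sketched earlier:

```python
# Sketch: extend the earlier index with a numeric fuel level and filter on it.
import redis

r = redis.Redis(host="localhost", port=6379)

# Hypothetical extra field; name and threshold are illustrative.
r.execute_command("FT.ALTER", "bikes", "SCHEMA", "ADD", "fuel_level", "NUMERIC")

# Available bikes in the cell with at least 30% fuel.
results = r.execute_command(
    "FT.SEARCH", "bikes",
    "@geohash:{tdr1wn} @availability:{AVAILABLE} @fuel_level:[30 +inf]",
)
```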
30. A few drawbacks: sorting based on distance was not possible with this approach, hence we had to implement it at the application level.