Scaling Search at Lendingkart discusses how Lendingkart scaled their search capabilities to handle large increases in data volume. They initially tried scaling databases vertically and horizontally, but searches were still slow at around 8 seconds. They implemented Elasticsearch for its near-real-time search, high scalability, and out-of-the-box functionality. Logstash and mongo-connector were used to seed data from MySQL and MongoDB into Elasticsearch, and custom analyzers and mappings were developed. Search times then dropped to ~230ms and aggregations to ~200ms, allowing the business to scale as transactional data grew 3000% and leads 250%.
1. Scaling Search at Lendingkart
Shivendra Singh, Swapnil Bagadia and Nitesh Kumar
16 June 2018
2. Scaling
Scaling/Scalability is the capability of a system to handle a
growing amount of work, or its potential to be enlarged to
accommodate that growth. — Wikipedia
4. Scaling and High Availability (Application)
● Application does not change too often (static)
● If we need more performance, we add more resources
● Easy to scale and achieve High Availability
● But what happens with the database?
5. Scaling and High Availability (Databases)
● We have to distribute the changes to all the databases in real time
● It has to be available for all the applications
● The application has to be able to make changes
7. Horizontal Scaling
● Master-Master setup
○ MySQL Cluster
○ MariaDB / Galera / Percona XtraDB
○ Problems: messy, hard to identify/fix when issues arise, auto-increment conflicts
● Sharding databases
○ Problem: complexity of managing shards at the application level
● Multiple read replicas
○ Single master with multiple read replicas used by the application
○ Separate DB for analytics
○ Problem: replication lag
9. Query Optimizations
● Use indexes for better read performance
○ Multiple non-clustered/secondary indexes
○ Too many and too few indexes are both bad
○ Check for duplicate and unused indexes
○ Queries can run without indexes, but may take far longer
○ Best if all WHERE and JOIN clauses use an index for lookups
● Monitor and force use of indexes if required
○ FORCE INDEX to make MySQL use a specific index
● Fix top offenders (repeatedly)
○ Slow query logs (using long_query_time)
○ Use explain on these queries
■ Using Index - Good
■ Using Filesort, Using temporary - Bad
10. Server Side Optimizations for performance
● Sensible timeout for queries
○ Max query execution time
○ Lock wait timeouts
● Changed the transaction isolation level from Repeatable Read to Read Committed
11. Separating out Databases
● Smaller databases that are completely decoupled and independent
● Pros
○ Simplicity
○ More cost effective
○ High Availability
○ Enforces loose coupling across data stores
○ Allows better usage of connections to DB
● Cons
○ Hard to maintain referential integrity across different DBs
○ Analytics/reporting across stores becomes harder
○ Transaction management across DBs is harder
○ Does not solve the problem of a single table growing really large
12. Monitoring and Key Metrics
● Memory Usage
○ Often most important for performance
○ Your working set should fit comfortably in memory
○ Less memory = more pressure on IO
● IOPS
○ 1x IOPS/GB, burstable up to 3x, for General Purpose SSD
○ Provisioned IOPS for better performance
● CPU Usage
● Free disk space
● Replication Lag (in Read Replicas)
● Database Connections
13. Challenges of direct search in DB
● Searching on non indexed columns
● Perils of using LIKE queries
○ Full table scan
● Returning all columns
● Aggregations were killing the database performance
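The LIKE peril can be made concrete (again with SQLite as a runnable stand-in for MySQL, and an invented table): even with an index on the column, a pattern with a leading wildcard cannot use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (name TEXT)")
conn.execute("CREATE INDEX idx_name ON leads (name)")

# Equality lookup: satisfied from the index.
eq_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM leads WHERE name = 'lendingkart'"
).fetchall()

# Leading-wildcard LIKE: the engine has to scan every row.
like_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM leads WHERE name LIKE '%kart%'"
).fetchall()
```

A B-tree index orders values left-to-right, so `'%kart%'` gives it no prefix to seek on; this is the full-table-scan behaviour the slide calls out.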
15. Challenges of direct search on mongoDB
● Single index
○ MongoDB supports the creation of user-defined ascending/descending indexes on a single field of a document.
○ Default index on the _id field during the creation of a collection.
○ Problem: 24 searchable fields in a document. So 24 indexes???
● Case insensitive search
○ LendingKart or LENdingkart or lendingkart or lendingKART etc.
○ Problem: MongoDB's case-insensitive regular expression search cannot use indexes efficiently.
● Prefix search
○ Search query: John
■ First/middle/last name: John
■ Company name: Johnson
■ Email: johnkumar@gmail.com
○ Problem: slow performance
● Sorting and pagination
○ Sorting on specific fields like date or some id.
○ Pagination to separate a big result set into smaller chunks.
○ Problem: MongoDB has an in-memory sort limit when no index supports the sort.
16. Chain of thought for search improvement
● Compound index
○ An index that contains references to multiple fields within a document.
○ MongoDB imposes a limit of 31 fields for any compound index.
○ Example:
{
  "_id": ObjectId(...),
  "leadId": 1234,
  "companyName": "lendingkart",
  "city": "bangalore",
  "email": "lendingkart@abc.com",
  "phone": "9999999999"
}
db.leads.createIndex( { "leadId": 1, "companyName": 1, "email": 1 } )
● Elastic search
○ An open-source, broadly-distributable, readily-scalable, enterprise-grade search engine.
17. Why we needed some magic!!!
● Searches in MySQL were slow
○ Around 8 seconds for normal search
● Searches in MongoDB were slow
○ Around 8 seconds for normal search
● Aggregations were slow
○ Taking 21 seconds - 36 seconds for aggregations
● Data Growth
○ Transactional/Application from 0.04M to 1.2M
○ Non Transactional/Leads from 0.6M to 2M
● Our goal was to get searches to happen within 250ms
18. ElasticSearch - You know for search....
Wer Ordnung hält, ist nur zu faul zum Suchen.
(If you keep things tidily ordered, you're just too lazy to go
searching.)
—German proverb
20. What is Elasticsearch?
● Full-text search and analytics engine
○ It allows you to store, search, and analyze big volumes of data quickly.
● Near Real Time(NRT)
○ Slight latency (normally one second) from the time you index a document until the time it becomes
searchable.
● Highly scalable
○ Elastic, as the name suggests. It’s clustered by default— you call it a cluster even if you run it on a
single server.
○ Increase/Decrease nodes as per requirement
● It just works...
○ Open-source/Free built on top of Apache Lucene, in Java(inherently cross-platform)
○ Ships with sensible defaults, keeping complex theories for leisure reading
○ Mostly, plug and play.
○ Much more than Lucene - JSON Based, Distributed, web server.
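A minimal sketch of the "it just works" claim (the `leads` index and field names here are hypothetical, not from the deck): indexing a document and searching it back is two JSON-over-HTTP calls, with no schema declared up front.

```
PUT /leads/_doc/1
{ "companyName": "lendingkart", "city": "bangalore" }

GET /leads/_search
{ "query": { "match": { "companyName": "lendingkart" } } }
```

Elasticsearch infers a default mapping for the fields on first write, which is the plug-and-play behaviour the slide describes.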
21. Sharding for scalability
○ To add data to Elasticsearch, we need an index—a place to store related data. In reality, an index is just
a logical namespace that points to one or more physical shards.
○ Each shard can have zero or more replicas
○ Replicas live on different servers (server pools) for failover
○ One node in the cluster goes down? No problem.
○ Master: automatic master detection + failover
○ The master is responsible for distribution/balancing of shards
24. Data Seeding from MySQL to ES
● What were the options?
○ A binlog processor service syncing your MySQL data into Elasticsearch automatically
○ An asynchronous Kafka (as a queue) pipeline
● Why go through all that pain when we can get the same from the ELK stack itself?
○ Logstash was a perfect fit for our requirements
○ 100% Config Based
○ Not a single Line of Code
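A config-only pipeline of the kind described above might look like the following sketch, using Logstash's JDBC input and Elasticsearch output plugins. The table, columns, paths, and credentials are placeholders, not Lendingkart's actual setup.

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/leads_db"
    jdbc_user => "user"
    jdbc_password => "password"
    schedule => "* * * * *"   # poll every minute
    statement => "SELECT * FROM leads WHERE updated_at > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "leads"
    document_id => "%{lead_id}"   # stable id keeps re-runs idempotent
  }
}
```

The `:sql_last_value` bookmark means each scheduled run picks up only rows changed since the last run, which is what makes this "not a single line of code".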
25. Simplicity at its best- Logstash
● How does Logstash work?
○ Ah, just like the others, Logstash has input/filter/output plugins.
○ Attention: Logstash processes events, not (only) log lines!
○ "Inputs generate events, filters modify them, outputs ship them elsewhere." — [the life of an event in Logstash]
● Plugin Architecture
○ Input plugins: capture external data and transform it into Logstash events
○ Filter plugins: process/transform events
○ Output plugins: format events and ship them to an external destination
○ All Plugins
27. Logstash Configurations - introducing multiple pipelines
● Lack of congestion isolation: backpressure
● One size does not fit all: TCP-to-TCP (fast and light) vs JDBC-to-ES (large and low volume)
● The solution before Logstash 6.0: multiple Logstash instances (RPM/DEB, multi-JVM instances)
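With Logstash 6.0's multiple pipelines, the isolation can be expressed in a single `pipelines.yml`. This is a sketch; the pipeline ids, paths, and worker counts are illustrative assumptions.

```yaml
# pipelines.yml - a slow JDBC pipeline and a fast TCP pipeline,
# isolated so backpressure in one does not stall the other
- pipeline.id: jdbc_to_es
  path.config: "/etc/logstash/conf.d/jdbc_es.conf"
  pipeline.workers: 1
  queue.type: persisted      # survive restarts, absorb bursts
- pipeline.id: tcp_to_tcp
  path.config: "/etc/logstash/conf.d/tcp.conf"
  pipeline.workers: 4
  pipeline.batch.size: 125
```

Each pipeline gets its own queue and worker pool inside one JVM, replacing the old multi-instance workaround.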
28. Data seeding from mongo to elastic cluster
● How to copy data from mongo to elastic cluster?
○ Mongo-connector
● Do we need to copy all fields and their values of a document from mongo to elastic cluster?
○ Useful(or searchable) data on cluster
● What is Oplog (operations log)?
● How mongo-connector reads oplog to copy documents (new or updated documents) on elastic cluster?
● Can we use a custom configuration file to specify some options to mongo-connector?
● How to track whether mongo-connector has stopped syncing data?
29. MongoDB Connector
Mongo-connector creates a pipeline from a MongoDB cluster to an Elasticsearch cluster, copying your documents from MongoDB to the target system.
30. OpLog(operations log)
● The oplog (operations log) keeps a rolling record of all operations that modify the data stored in your databases. Example:
> use test //switched to db test
> db.leads.insert({"leadId":1})
> db.leads.update({"leadId":1}, {$set : {"city": "bangalore"}})
● Oplog entries for the operations above:
Insert:
{ "ts" : { "t" : 1286821977000, "i" : 1 }, "h" : NumberLong("1722870850266333201"), "op" : "i",
"ns" : "test.leads", "o" : { "_id" : ObjectId("4cb35859007cc1f4f9f7f85d"), "leadId" : 1 } }
Update:
{ "ts" : { "t" : 1286821984000, "i" : 1 }, "h" : NumberLong("1633487572904743924"), "op" : "u",
"ns" : "test.leads", "o2" : { "_id" : ObjectId("4cb35859007cc1f4f9f7f85d") },
"o" : { "$set" : { "city": "bangalore" } } }
op: the write operation [i: insert, u: update]
31. How mongo-connector reads oplog to copy documents (new or updated
documents) on elastic cluster?
● Mongo Connector creates an oplog progress file (oplog.timestamp).
● The oplog progress file keeps track of the latest oplog entry seen for each replica set to which Mongo
Connector is connected.
● Mongo Connector uses this file to decide, where to begin reading the oplog on startup.
● When the oplog progress file cannot be found, or is empty, Mongo Connector will begin pulling data from all MongoDB collections in the "collection dump" phase.
● The oplog progress file is then updated with the most recent timestamp from before the dump
happened.
● Mongo Connector then applies all oplog operations from before the dump, so that the copied
documents will be up-to-date with what's on MongoDB.
32. Can we use a custom configuration file to specify some options to mongo-connector?
● You can use a custom configuration file to specify some options to mongo-connector.
● To invoke mongo-connector with a configuration file option, run:
○ mongo-connector -c config.json
● Configuration options:
○ excludeFields: comma-separated list of fields to exclude from MongoDB documents (i.e. not read from MongoDB). Example [database: test, collection: leads]:
"test.leads": {
  "excludeFields": ["isSynced", "comments", "dndMobile", "isDuplicateLead"]
}
○ oplogFile: The path to the oplog progress file.
○ batchSize: Number of records processed from the oplog before updating the timestamp file.
■ default bulk size is 1000 docs
33. How to track whether mongo-connector has stopped syncing data?
● Causes:
○ High write-load.
○ Mongo-connector connection with mongoDB or cluster got interrupted.
● Solution:
○ Write a script which runs at a scheduled time.
○ The script queries the total document counts from both Mongo and Elastic.
○ If the difference in counts is greater than a threshold, it sends a notification.
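The scheduled check above can be sketched as follows. In production the two counts would come from the real clients (e.g. pymongo's `count_documents({})` and Elasticsearch's `_count` API); here they are plain parameters so the drift logic itself is testable, and the threshold is an assumed value.

```python
def sync_alert_needed(mongo_count, es_count, threshold=100):
    """Return True when the two stores have drifted apart by more than threshold docs."""
    return abs(mongo_count - es_count) > threshold

# A cron-scheduled job would do something like (clients are hypothetical):
# if sync_alert_needed(get_mongo_count(), get_es_count()):
#     send_notification("mongo-connector may have stopped syncing")
```

Comparing absolute counts tolerates a small lag from normal write load while still catching a stalled connector.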
34. ES Analyzers
● An analyzer (whether built-in or custom) is just a package of three lower-level building blocks: character filters (zero or more), exactly one tokenizer, and token filters (zero or more).
● Character filters - A character filter receives the original text as a stream of characters and can
transform the stream by adding, removing, or changing characters.
● A tokenizer receives a stream of characters, breaks it up into individual tokens (usually
individual words), and outputs a stream of tokens.
● A token filter receives the token stream and may add, remove, or change tokens.
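The three building blocks compose into a pipeline. This toy sketch mirrors that shape in plain Python (real Elasticsearch analyzers run inside Lucene; the specific filters here are illustrative, not the deck's actual configuration):

```python
import re

def char_filter(text):
    # character filter: strip HTML-like tags from the raw character stream
    return re.sub(r"<[^>]+>", "", text)

def tokenizer(text):
    # tokenizer: break the character stream into word tokens
    return re.findall(r"\w+", text)

def token_filter(tokens):
    # token filter: lowercase every token
    return [t.lower() for t in tokens]

def analyze(text):
    # analyzer = char filters -> tokenizer -> token filters
    return token_filter(tokenizer(char_filter(text)))

print(analyze("<b>LendingKart</b> Bangalore"))  # ['lendingkart', 'bangalore']
```

The lowercase token filter is exactly what makes the case-insensitive search from the MongoDB slides free in Elasticsearch: both indexed text and query strings pass through the same pipeline.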
36. Default Analyzers
● Standard Analyzer
The standard analyzer divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm. It removes most
punctuation, lowercases terms, and supports removing stop words.
● Simple Analyzer
The simple analyzer divides text into terms whenever it encounters a character which is not a letter. It lowercases all terms.
● Whitespace Analyzer
The whitespace analyzer divides text into terms whenever it encounters any whitespace character. It does not lowercase terms.
● Stop Analyzer
The stop analyzer is like the simple analyzer, but also supports removal of stop words.
● Keyword Analyzer
The keyword analyzer is a “noop” analyzer that accepts whatever text it is given and outputs the exact same text as a single term.
● Language Analyzers
Elasticsearch provides many language-specific analyzers like english or french.
● Fingerprint Analyzer
The fingerprint analyzer is a specialist analyzer which creates a fingerprint which can be used for duplicate detection.
39. Analyzed Mappings
● How to analyze a field?
● How to analyze using an analyzer?
● How to analyze your query string?
● Term Query vs Match Query
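As a sketch of the term-vs-match distinction (the `leads` index and `companyName` field are hypothetical): a term query is not analyzed and must match a stored token exactly, while a match query first runs the query string through the field's analyzer.

```
GET /leads/_search
{ "query": { "term": { "companyName.keyword": "LendingKart" } } }

GET /leads/_search
{ "query": { "match": { "companyName": "LendingKart" } } }
```

With a standard analyzer the stored token is "lendingkart", so the term query above finds nothing while the match query (whose input gets lowercased the same way) matches.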
40. Search-Sort-Filter Operations
● Where to perform sorting/pagination?
○ Direct mongoDB or elastic cluster
● How to perform prefix/match/smart search?
○ Searchable fields: first/middle/last name, company name, status/substatus, leadId, email, phone
○ Search queries
■ Query1: Lendingkart
■ Query2: +91 9999999999
■ Query3: LEA-1234
■ Query4: 9999999999lkart@gmail.com
● How to perform case insensitive search?
○ LendingKart
○ LENdingkart
○ LendingKARt
○ lendingkart
○ LENDINGKART
41. Aggregations
Aggregations allow us to ask sophisticated questions of our data. A combination of buckets and metrics.
Snapshot performance improvement: 21-36 sec to ~200ms
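The buckets-and-metrics combination looks like the following sketch (field names are illustrative assumptions, not from the deck): a terms aggregation builds one bucket per city, and a nested avg computes a metric inside each bucket.

```
GET /leads/_search
{
  "size": 0,
  "aggs": {
    "leads_per_city": {
      "terms": { "field": "city.keyword" },
      "aggs": {
        "avg_loan_amount": { "avg": { "field": "loanAmount" } }
      }
    }
  }
}
```

`"size": 0` skips returning documents entirely, so the response is just the aggregation tree, which is part of why these run in milliseconds rather than the 21-36 seconds the database took.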
42. Relevance Score
● Boolean model to find matching documents:
full AND text AND search AND (elasticsearch OR lucene)
● Term frequency/inverse document frequency
tf(t in d) = √frequency
idf(t) = 1 + log ( numDocs / (docFreq + 1))
44. Numbers
● Data Growth
● Transactional/Application: from 0.04M to 1.2M+ (~3000%)
● Non-transactional/Leads: from 0.6M to 2M+ (~250%)
● Speed of search
● Searches came down from 8 seconds to ~230ms
● Aggregations came down from 21-36 seconds to ~200ms
Scaling is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth.
It is the ability of a system to handle efficiently more work than it typically performs without going down, or otherwise the ability to be enlarged in order to perform more work efficiently
A system whose performance improves after adding hardware, proportionally to the capacity added, is said to be a scalable system.
Most of the time we are concerned with load scalability: the ability of a distributed system to easily expand and contract its resource pool to accommodate heavier or lighter loads. Alternatively, the ease with which a system or component can be modified (added or removed) to accommodate changing load.
Also referred to as scaling up (increasing the hardware capability of the machine).
Storage and instance type are decoupled in RDS.
There is minimal downtime when scaling up in a Multi-AZ environment, because the standby database gets upgraded first and then a failover occurs to the newly sized database. A Single-AZ instance will be unavailable during the scale operation.
You can scale up/down when needed, but RDS does not allow downgrading the size of the disk; you can only upgrade it.
Also referred to as scaling out(add more machines). To scale horizontally (or scale out) means to add more nodes to a system, such as adding a new computer to a distributed software application. An example might be scaling out from one web server system to three.
Problems with a master-master setup: management difficulty, the need for rigorous monitoring, and issues that are hard to diagnose and fix when they arise.
Multiple read replicas: replication lag under write-heavy load.
Sharding challenges: complexity at the application level.
We are using a multiple-read-replica setup.
There are MySQL Connectors that allow you to do read/write splitting.
In addition to using a MySQL Connector, you can add a load balancer between your application and database servers. You make this addition so that you have a single database endpoint presented to the application. This approach allows for a more dynamic environment where you can transparently add or remove read replicas behind the load balancer without constantly updating the database connection string of the application. You can also perform a custom health check by using scripts.
- Run large, repeating reporting queries and batch jobs on the slave instead
- Point completely read-only pages to serve from the slave
- When a crawler such as Google is identified in the headers, hardwire all queries to go to the slave
We changed the transaction isolation level in MySQL from "Repeatable Read" to "Read Committed", a lower level of isolation.
max_execution_time = 120000 (the execution timeout for SELECT statements, in milliseconds)
innodb_lock_wait_timeout = 60 (timeout in seconds an InnoDB transaction may wait for a row lock before giving up; increase for reliability, decrease for performance)
tx_isolation = READ-COMMITTED (allowed values: READ-UNCOMMITTED, READ-COMMITTED, REPEATABLE-READ, SERIALIZABLE. Repeatable Read is a higher isolation level: in addition to the guarantees of Read Committed, it also guarantees that any data read cannot change if the transaction reads the same data again.)
general_log = 0
long_query_time = 10
log_queries_not_using_indexes = 0
log_output = NONE (allowed values: TABLE, FILE, NONE)
We debated whether to do caching with NoSQL solutions or in memory (memcached or Redis), versus replication with read/write splitting and load balancing. We finally came to the conclusion that going from one server to two was:
- a lot simpler from an application-design standpoint,
- more cost effective, as each DB can be scaled up/down independently under high/low load,
- better in terms of high availability: an outage in one database will affect only the related services.
It also enforces loose coupling by preventing direct access to the DB from developers.
Cons: sharding or another solution is still needed if a table grows really huge.
Memory is most important. To tell whether your working set is almost all in memory, check the ReadIOPS metric while the DB instance is under load. The value of ReadIOPS should be small and stable. If scaling up the DB instance class (to a class with more RAM) results in a dramatic drop in ReadIOPS, your working set was not almost completely in memory. Continue to scale up until ReadIOPS no longer drops dramatically after a scaling operation, or until ReadIOPS is reduced to a very small amount.
Typical IOPS should stay within the baseline for consistent performance.
The alarm limit is around 75% for CPU, memory, and storage metrics. If a metric goes up, CloudWatch alarms are triggered, and we take action if the limit is breached consistently.
QPS (not currently monitored).
Context switching to go to Mongo from MySQL.
We are using MongoDB for non-transactional data, which is highly unstructured.
A separate Leads module was using MongoDB; its use cases were different, as the data is highly unstructured.
Application data is transactional; Leads data is non-transactional.
Transactional data grew 3000%; non-transactional data grew by 250%.
The goal was to get searches to happen within 250ms.
Nitesh started on the non-transactional side, which we released a year back, and then Swapnil picked it up for transactional data.
Lives easier, Machines Lazier
No need for an external load balancer, since the cluster does its own routing: ask any server in the cluster and it will delegate to the correct node. What if we need more? More data: more shards. More availability: more replicas per shard.
What does it add to Lucene? A RESTful service: a JSON API over HTTP. Want to use it from PHP? Just make cURL requests, as you would to the Facebook Graph API. High availability and performance through clustering. Long-term persistency: write-through to a persistent storage system.
Lucene is a java library. You can include it in your project and refer to its functions using function calls.
Elasticsearch is a JSON-based, distributed web server built over Lucene. Though it is Lucene doing the actual work beneath, Elasticsearch provides a convenient layer over it. Each shard in Elasticsearch is a separate Lucene instance. So, to summarize:
Elasticsearch is built over Lucene and provides a JSON-based REST API to access Lucene features.
Elasticsearch provides a distributed system on top of Lucene. A distributed system is not something Lucene is aware of or built for. Elasticsearch provides this abstraction of distributed structure.
Provides other supporting features like thread-pool, queues, node/cluster monitoring API, data monitoring API, Cluster management, etc.
The number of primary shards in an index is fixed at the time that an index is created, but the number of replica shards can be changed at any time.
Let’s create an index called blogs in our empty one-node cluster. By default, indices are assigned five primary shards, but for the purpose of this demonstration, we’ll assign just three primary shards and one replica (one replica of every primary shard):
PUT /blogs
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
shard = hash(routing) % number_of_primary_shards
it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
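The routing formula above can be sketched directly. Elasticsearch actually uses a Murmur3 hash of the routing value (the document `_id` by default); Python's built-in `hash()` stands in for it here.

```python
def shard_for(routing, number_of_primary_shards=3):
    # shard = hash(routing) % number_of_primary_shards
    return hash(routing) % number_of_primary_shards

# The same routing value always lands on the same shard. This is also why
# the primary shard count is fixed at index creation: changing the modulus
# would re-map every document already stored.
```

For example, `shard_for("lead-1234")` is stable for the life of the index, so both writes and reads for that document go to one shard.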
Logstash's in-memory queue size is capped at an arbitrary 20 events (non-configurable).
Back pressure: in-flight events can be lost, hence the small queue size; persistent queues in the pipeline address this.
# How many events to retrieve from inputs before sending to filters+workers
# pipeline.batch.size: 125

# How long to wait in milliseconds while polling for the next event
# before dispatching an undersized batch to filters+outputs
# pipeline.batch.delay: 50
Data has grown immensely.
Search times have dropped significantly.
This also means reduced load on the primary data stores (MySQL/Mongo): no scaling up has been needed in a year.