Senior Software Engineer at Google.
Previously worked with several startups.
Works on distributed systems, system architecture, etc.
We’ll talk about databases, with a case study of SMS Gyan.
● SMS Gyan was launched by Innoz in 2008.
● An SMS-based answering engine that came to be known as “Internet on SMS”.
Data modelling is perhaps the most important part of developing software.
Decisions on how to structure, store, and retrieve data can affect the entire
application throughout its life.
There are several factors to consider when choosing a database, such as:
● Structure of the data
● Expected data volume
● Performance requirements
Relational vs Non-Relational Databases
Relational databases
● For structured data.
● Store data in tables that may share information.
● Use JOIN queries to access data in different tables.
● Performance tuning becomes necessary with large volumes of data.
● Relatively difficult to scale out.
● Lack flexibility in how data is stored.
● Provide Atomicity, Consistency, Isolation, and Durability (ACID) guarantees.
Non-relational databases
● For unstructured data (documents).
● No concept of tables or fields/columns.
● MongoDB and Elasticsearch store data as JSON-like documents.
● Support data locality.
● Can easily support very large volumes of data.
● Easier to scale out, because of native support for replication, sharding, etc.
● Can support changes to the structure of the stored data, making it easier to modify the application layer.
● Typically no transactions, so no ACID guarantees.
● Some provide eventual consistency.
Consistency or Availability?
● Network partitions will inevitably happen in a distributed system.
● Choosing between a relational and a non-relational database can boil down to this trade-off.
● The first version was a simple PHP app with a MySQL database.
● Supported a few hundred users and a few hundred queries a day.
MySQL
● A Relational Database Management System (RDBMS).
● One of the most popular databases.
● Free and open-source, easy to get started.
● Reliable and scalable.
Need to store
● The queries from users
● The answers to the queries (as a local cache)
● User details (network operator, whether a subscriber, etc.)
phone       network  is_subscribed  query  result        source     query_ts
9876543210  airtel   1              MySQL  MySQL is ...  wikipedia  2009-11-10 12:00:00
phone       query  query_ts
9876543210  MySQL  2009-11-10 12:00:00

phone       network  is_subscribed  last_active
9876543210  airtel   1              2009-11-10

query  result                       source
MySQL  MySQL is an open-source ...
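
As a rough illustration of the split above, the sketch below builds the three tables in an in-memory SQLite database (standing in for MySQL so that it runs without a server) and uses a JOIN to reassemble the original wide row. The table names `users`, `queries`, and `answers`, the column types, and the keys are assumptions; the slides show only the column names and one sample row.

```python
import sqlite3


def build_db():
    """Create the three normalised tables and load the sample row from the slides."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (
            phone         TEXT PRIMARY KEY,
            network       TEXT,
            is_subscribed INTEGER,
            last_active   TEXT
        );
        CREATE TABLE answers (
            query  TEXT PRIMARY KEY,
            result TEXT,
            source TEXT
        );
        CREATE TABLE queries (
            phone    TEXT REFERENCES users(phone),
            query    TEXT REFERENCES answers(query),
            query_ts TEXT
        );
        """
    )
    conn.execute(
        "INSERT INTO users VALUES (?, ?, ?, ?)",
        ("9876543210", "airtel", 1, "2009-11-10"),
    )
    conn.execute(
        "INSERT INTO answers VALUES (?, ?, ?)",
        ("MySQL", "MySQL is an open-source ...", "wikipedia"),
    )
    conn.execute(
        "INSERT INTO queries VALUES (?, ?, ?)",
        ("9876543210", "MySQL", "2009-11-10 12:00:00"),
    )
    return conn


def wide_rows(conn):
    """Rebuild the single wide row by joining the three tables."""
    return conn.execute(
        """
        SELECT q.phone, u.network, u.is_subscribed,
               q.query, a.result, a.source, q.query_ts
        FROM queries AS q
        JOIN users   AS u ON u.phone = q.phone
        JOIN answers AS a ON a.query = q.query
        """
    ).fetchall()
```

With this layout, an answer is stored once however many users ask the same question, and user details are stored once per phone number rather than once per query.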
High volume of Airtel 121 requests
● The application was receiving a large number of requests (> 1000 qps).
● This slowed the database and caused requests to fail (an SLA violation).
[Diagram: App, DB, Airtel]
● A MySQL FULLTEXT index was used for search.
● The results were sometimes inaccurate, especially for queries that were sentences or phrases.
● MySQL performance deteriorated as the data volume increased.
Improving search results
query  result           source
MySQL  MySQL is an ...
Elasticsearch (ES)
● Designed for really fast text searches. Supports stemming, ranking, etc.
● Data is stored as documents. Provides REST APIs to read and write data.
● Highly available, scalable, and (relatively) easy to configure.
● Natively supports sharding and replication.
Cluster: Consists of one or more nodes.
Node: An instance of ES.
Index: A logical namespace that maps to one or more primary shards and can have zero or more replica shards.
Document: A record stored in ES.
Shard: A single low-level worker unit managed by ES.
Primary Shard: Each document is stored in a primary shard.
Replica Shard: A copy of a primary shard. Each primary shard can have zero or more replicas.
Replicas help distribute ES’s load, and can help in failover if a primary shard is unavailable.
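
To make "each document is stored in a primary shard" concrete, here is a minimal sketch of hash-based routing: a document's ID is hashed and taken modulo the number of primary shards. This is a simplification, not ES's actual implementation. ES uses its own hash function and routing details, and `zlib.crc32`, `pick_primary_shard`, and `place_documents` are stand-ins chosen for illustration.

```python
import zlib


def pick_primary_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to one primary shard number.

    crc32 is only a stand-in for the hash function; the point is that the
    same ID always lands on the same shard.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def place_documents(doc_ids, num_primary_shards):
    """Group document IDs by the primary shard they would be routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_primary_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards
```

Because the shard number depends on the primary shard count, a scheme like this cannot change that count without moving documents, whereas replicas can be added or removed without affecting where a document is routed.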
Pagination of results
● SMS replies limit the length of content, so a whole Wikipedia article would be returned as several pages.
● Users need to send an SMS to retrieve each page.
● A distributed, in-memory data structure store.
○ Can store simple key-value pairs, Sets, Lists, Ordered Lists, etc., and can perform operations such
as Set union/intersection and push/pop on Lists.
● Can be used as an in-memory key-value db, cache, and message broker.
● Durability is optional.
● Different function from the databases discussed earlier.
In SMS Gyan
1. Fetch query result (database, or source on internet)
2. Write the entire result into cache, with user’s phone number as key.
3. Extract a page (up to 240 characters) and send it to the user; remove the served
page from the content in the cache.
4. If the user requests more pages, repeat step 3.
5. Clear the key if:
a. The user sends a different query, or
b. There is no request from user for a specific period of time.
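
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the SMS Gyan code: a plain Python dict stands in for the in-memory store, the `PageCache` class and its method names are hypothetical, and `IDLE_SECONDS` is an invented value because the slides say only "a specific period of time".

```python
import time

PAGE_SIZE = 240      # characters per page, from step 3
IDLE_SECONDS = 600   # hypothetical timeout; the slides do not give a value


class PageCache:
    """Per-user paging cache keyed by phone number (dict stands in for the store)."""

    def __init__(self, now=time.monotonic):
        self._store = {}   # phone -> (remaining_text, last_access_time)
        self._now = now    # injectable clock, which makes expiry easy to test

    def start(self, phone, full_result):
        """Steps 2 and 5a: cache the whole result, replacing any pending pages."""
        self._store[phone] = (full_result, self._now())
        return self.next_page(phone)

    def next_page(self, phone):
        """Steps 3 and 4: return the next page and remove it from the cache."""
        entry = self._store.get(phone)
        if entry is None:
            return None
        remaining, last_access = entry
        if self._now() - last_access > IDLE_SECONDS:   # step 5b: idle too long
            del self._store[phone]
            return None
        page, rest = remaining[:PAGE_SIZE], remaining[PAGE_SIZE:]
        if rest:
            self._store[phone] = (rest, self._now())
        else:
            del self._store[phone]                     # nothing left to serve
        return page
```

A real key-value store would normally handle step 5b with a per-key expiry instead of the timestamp check shown here, and a production version would probably split pages on word boundaries rather than at exactly 240 characters.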