OLTP in Supply Chain
Vijay Babu
Rajesh Kannan
OLTP in FSG
● From Promise to Customer Doorstep
● Many microservices and multiple OLTP stores
● Two problems
○ High scale OLTP
○ Change propagation
Part 1: High Scale OLTP
High Scale OLTP
● OLTP system with high throughput and a large dataset
● Inventory Management System in Flipkart
● Manages the inventory view of hundreds of millions of listings
● Handles reservations for every order taken on Flipkart
Inventory Management (diagram slides)
High Scale OLTP
● High throughput
● High concurrency
● Low latency
● Data consistency
● Large dataset
High Throughput
Concurrent Transactions
● Multiple users try to reserve the same item
● The problem is more prominent at scale, during sale events
Traditional solutions, improvised
● How to scale for higher throughput at low latency?
● Cache the data?
○ The data is too dynamic to cache
○ The dataset is too large to cache
○ Instead, use an in-memory store
Traditional solutions, improvised
● How to scale for a large data size?
● Shard the data?
○ Even after sharding, the data size is too big
○ So use a key-value store with encoded data
Data store topology
Data Encoding
● Space savings of up to 5X
● Savings are more pronounced when records have more attributes
Simple encoding example:
{"quantity":100,"reservations":50}
encodes to (with "::" as the delimiter)
"100::50"
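A minimal sketch of this positional encoding in Java; the codec below assumes exactly the two fields from the example, with "::" as the delimiter, and is illustrative rather than the production code:

```java
// Minimal sketch of the positional "::" encoding from the example above.
// Assumes exactly two fields (quantity, reservations); illustrative only.
public final class InventoryCodec {
    private static final String DELIM = "::";

    // {"quantity":100,"reservations":50} -> "100::50"
    static String encode(int quantity, int reservations) {
        return quantity + DELIM + reservations;
    }

    // "100::50" -> [100, 50]
    static int[] decode(String encoded) {
        String[] tokens = encoded.split(DELIM);
        return new int[] { Integer.parseInt(tokens[0]), Integer.parseInt(tokens[1]) };
    }
}
```

Dropping the JSON keys and relying on token position is where the ~5X space saving comes from; the more attributes a record has, the more per-key overhead is removed.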
Concurrency with encoding
● Encoding/decoding at the app layer leads to concurrency issues
(read-modify-write races between clients)
● Instead, update the encoded data with partial tokenizing logic in the
data store (see the sketch below)
○ Exploit data store capabilities to serialize/deserialize the tokens
and operate on them
○ Lua for Redis/Aerospike, or a MySQL stored procedure
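As a hedged illustration of pushing the tokenizing into the store, here is a Redis variant: a Lua script does the decode-check-update atomically server-side, invoked from Java via Jedis. The key layout, field order (quantity::reservations), and names are assumptions for the sketch, not the actual inventory schema.

```java
import java.util.Collections;
import redis.clients.jedis.Jedis;

public class AtomicReservation {
    // Lua executes atomically inside Redis, so the decode/check/update on the
    // encoded "quantity::reservations" value cannot interleave with other
    // clients. Assumes the key already exists in the encoded form.
    private static final String RESERVE_LUA =
        "local v = redis.call('GET', KEYS[1]) " +
        "local qty, res = v:match('(%d+)::(%d+)') " +
        "qty, res = tonumber(qty), tonumber(res) " +
        "local want = tonumber(ARGV[1]) " +
        "if res + want > qty then return 0 end " +   // not enough inventory left
        "redis.call('SET', KEYS[1], qty .. '::' .. (res + want)) " +
        "return 1";

    // Returns true when the reservation was applied atomically.
    public static boolean reserve(Jedis jedis, String listingKey, int units) {
        Object reply = jedis.eval(RESERVE_LUA,
                Collections.singletonList(listingKey),
                Collections.singletonList(String.valueOf(units)));
        return Long.valueOf(1L).equals(reply);
    }
}
```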
Unique Challenges
● Scale for business-as-usual (BAU), super-scale for sale events
● Hotspot scenario during such events
● High concurrency on a single resource
Hotspot scenario
● Very high throughput on a single resource
● Sharding will not distribute the load
● Network packets-per-second (PPS) limits become the bottleneck
PPS Impact
Working with PPS limits
● The impact of PPS limits can be reduced by two approaches
○ Slave reduction
○ Transaction Buffering
Slave reduction
● Isolate read and write scaling
● Precomputed stores serve the majority of use cases
● Only critical reads use the write store
● Pros
○ PPS limit avoided
● Cons
○ Data consistency issues between write and read stores
Transaction buffering
● Buffer transactions at the app layer (see the sketch below)
○ Hit the data store in batches
○ Flush after a bounded time window
○ Flush at an upper bound on batch size
● Pros
○ PPS limit avoided
● Cons
○ Increased app latencies
○ More threads and memory consumed in the app
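A minimal sketch of the buffering idea, assuming a generic flush callback that makes one batched data-store call; the batch size and flush interval are illustrative knobs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class TransactionBuffer<T> {
    private final int maxBatchSize;                 // upper bound on batch size
    private final Consumer<List<T>> flushToStore;   // one batched data-store call
    private final List<T> buffer = new ArrayList<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public TransactionBuffer(int maxBatchSize, long maxWaitMillis,
                             Consumer<List<T>> flushToStore) {
        this.maxBatchSize = maxBatchSize;
        this.flushToStore = flushToStore;
        // Time bound: flush whatever has accumulated every maxWaitMillis.
        scheduler.scheduleAtFixedRate(this::flush, maxWaitMillis,
                maxWaitMillis, TimeUnit.MILLISECONDS);
    }

    public synchronized void add(T txn) {
        buffer.add(txn);
        if (buffer.size() >= maxBatchSize) {        // size bound reached
            flush();
        }
    }

    private synchronized void flush() {
        if (buffer.isEmpty()) return;
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        flushToStore.accept(batch);                 // one round trip, fewer packets
    }
}
```

The cons listed above fall out of this shape directly: a caller may wait up to maxWaitMillis for its write to reach the store, and the buffer plus scheduler consume extra memory and threads in the app.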
Other Challenges
● Skewed Throughput
● Atomicity at scale
Learnings summary
● Traditional strategies might not suffice once a certain scale is hit
● Mix more than one common strategy
● Or use a variation of a traditional strategy
Part 2: Change Propagation
in OLTP
What we do
● The City Logistics team handles
○ The first and last legs of the Flipkart supply chain
○ Delivering shipments to customers
○ Collecting returns
● Built on top of multiple microservices backed by MySQL
Business Problem => Tech Problem
● Need for real-time dashboards to track ground operations
● Achieve real-time replication, as opposed to a traditional ETL job
● Need to query a unified data view built from the data sources of
multiple microservices
● Ability to easily manage schema changes without change propagation
failing because of them
● Isolation of source and consumer
● Sending notifications based on domain data changes
● Ability to alter/enrich data from multiple data streams
Vertica for operational dashboards
Vertica as a centralized analytical datastore for all operational needs
○ Columnar data store providing high compression of data
○ Supports Massively Parallel Processing (MPP) and scales linearly
Need for a Change Propagation System!
Options for Change Data Propagation

| Option | Message generation | Relay API | Latency | IOPS/IO overhead | Others |
|---|---|---|---|---|---|
| Outbound store | Sync | Async | High | High | Implementation is relatively easy |
| Binary log replication | Async | Async | N/A | Low | More control, but complex to implement |
Tungsten
● https://github.com/vmware/tungsten-replicator
● Provides eventual consistency with exactly-once delivery
● Native support for DDLs and DMLs
● Replication connectors available for
○ Source (Extractor): MySQL, Oracle, Amazon RDS
○ Target (Applier): Vertica, MySQL, Oracle, HDFS, and other NoSQL and
data-warehouse stores
Tungsten - Contd.
● Supports filters (JavaScript/Java) where transactional data can be
enriched, altered & monitored
● Supports parallel replication
● Simple to use, with rich administration tools
● Good code documentation, so adding or enhancing features is easy
● Active community support
Tungsten Architecture
● Master-Slave topology
● Master (Extractor)
○ Pulls the binlog from the source
○ Generates THL (Transaction History Log)
● Slave (Applier)
○ Pulls THL from the master
○ Applies it to the target store using a JDBC/native connector
Operational Dashboard
Tungsten Adoption: Key challenges
Problem - Hard dependency on MySQL master
Context:
Tungsten persists its checkpoints in the source database.
Problem:
It is not a good idea to compromise the master's sanctity for this
purpose; better to use a read-only slave as the source. But Tungsten
does not support replicating from a MySQL slave server.
Problem - Hard dependency on MySQL master - contd
Solution:
Added the ability to use any remote JDBC store to maintain
checkpointing details.
Problem: No bootstrap support for onboarding
Problem:
No native support for onboarding an existing MySQL database to Vertica
Solution:
Automated scripts to
● Create tables in Vertica
● Export MySQL data as CSV and import it into Vertica (issues with
zero dates and the bit type)
● Spawn new Tungsten master and slave instances
● Enable live replication from the binlog position
● Validate data between source & target
Now the onboarding process requires just 5 minutes of manual intervention
Problem: Rollback transactions in binlog
Problem:
● An app uses temporary tables
● During transaction rollback, the reverted rows are still applied to
Vertica
● Hence inconsistency in Vertica
Replication during STATEMENT binlog format - Transaction Commit
(rows in time order)

| Queries executing at master | Inside the binlog | Inside the MySQL slave |
|---|---|---|
| <<start Txn1>> Create temp table tmp1; Insert into tmp1 values(...); Commit | All queries logged; Commit | tmp1 table created with data |
| <<start Txn2>> Insert into mainTable1 select * from tmp1 | All queries logged | Insertion into mainTable1 |
| Drop temp table tmp1 | All queries logged | tmp1 dropped |
| Commit | Commit | |
Replication during STATEMENT binlog format - Transaction Rollback
(rows in time order)

| Queries executing at master | Inside the binlog | Inside the MySQL slave |
|---|---|---|
| <<start Txn1>> Create temp table tmp1; Insert into tmp1 values(...); Commit | All queries logged; Commit | tmp1 table created with data |
| <<start Txn2>> Insert into mainTable1 select * from tmp1 | All queries logged | Insertion into mainTable1 |
| Drop temp table tmp1 | All queries logged | tmp1 dropped |
| Rollback | Rollback | Undo mainTable1 inserts |
Problem: Rollback transactions in binlog - Contd
Context:
● Temp tables are connection scoped; temp tables are replicated in the
binlog for the STATEMENT format, but not for ROW
● Binlog formats can be changed on the fly
● Hence the transaction containing the drop of the temp table is logged
for backward compatibility
Solution:
1. The Vertica applier lacks support for the ROLLBACK statement, and
implementing rollback in it would need a redesign
2. So we avoided the problem by keeping the drop of the temp table in a
separate transaction
Problem: Replication breaks during schema change
Problem:
● Vertica replication fails when a table is created/altered
● Tungsten does not propagate schema changes across platforms
Solution:
● Added the ability to replicate major and minor DDL changes
● Achieved by extracting table metadata from MySQL and generating the
equivalent SQL (see the sketch below)
● Supports most of the DDL commands in Vertica
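A hedged sketch of that metadata-driven approach: read column definitions from MySQL's information_schema and emit an equivalent Vertica CREATE TABLE. The type mapping below is deliberately minimal and illustrative, not the actual mapping used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class VerticaDdlGenerator {
    // Illustrative MySQL -> Vertica type mapping; real coverage is much wider.
    static String verticaType(String mysqlType) {
        switch (mysqlType) {
            case "int": case "bigint":         return "INT";
            case "datetime": case "timestamp": return "TIMESTAMP";
            case "bit":                        return "BOOLEAN";
            default:                           return "VARCHAR(65000)";
        }
    }

    // Builds a Vertica CREATE TABLE from MySQL's information_schema metadata.
    static String createTableSql(Connection mysql, String schema, String table)
            throws SQLException {
        StringBuilder ddl = new StringBuilder(
                "CREATE TABLE " + schema + "." + table + " (");
        String q = "SELECT column_name, data_type FROM information_schema.columns"
                 + " WHERE table_schema = ? AND table_name = ?"
                 + " ORDER BY ordinal_position";
        try (PreparedStatement ps = mysql.prepareStatement(q)) {
            ps.setString(1, schema);
            ps.setString(2, table);
            try (ResultSet rs = ps.executeQuery()) {
                boolean first = true;
                while (rs.next()) {
                    if (!first) ddl.append(", ");
                    ddl.append(rs.getString("column_name")).append(' ')
                       .append(verticaType(rs.getString("data_type")));
                    first = false;
                }
            }
        }
        return ddl.append(")").toString();
    }
}
```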
Problem - Operational overhead during MySQL server switch
Problem:
● When a MySQL slave is down, switching to another is not trivial
because binlog positions are local to each server
● Replaying overlapping events causes duplicate-key errors at Vertica
Solution:
● Introduced a skip-replication-error flag, enabled during the switch
window
● It overwrites existing data (INSERT => UPDATE, UPDATE => double
UPDATE)
Problem - JMX metrics collection
Problem:
● Tungsten publishes replication metrics as MBean operations
● Flipkart's in-house metrics collector (Cosmos) understands MBean
attributes
● Hence no visibility into the system
Solution:
Encapsulated the metrics as MBean attributes (see the sketch below)
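An illustrative sketch of the fix using a standard MBean: the metric becomes a getter on the *MBean interface, i.e. a JMX attribute that attribute-based collectors can scrape. The names here (ReplicationStats, appliedLatencyMillis, the ObjectName) are hypothetical, not Tungsten's actual MBeans.

```java
import java.lang.management.ManagementFactory;
import javax.management.ObjectName;

// Standard MBean convention: Foo implements FooMBean, and every getter on the
// interface is exposed as a JMX *attribute* (not an operation).
// Shown together for brevity; each public type would live in its own file.
public interface ReplicationStatsMBean {
    long getAppliedLatencyMillis();
}

public class ReplicationStats implements ReplicationStatsMBean {
    private volatile long appliedLatencyMillis;

    @Override
    public long getAppliedLatencyMillis() {
        return appliedLatencyMillis;     // read by the collector as an attribute
    }

    public void record(long latencyMillis) {
        appliedLatencyMillis = latencyMillis;
    }

    public static void main(String[] args) throws Exception {
        ReplicationStats stats = new ReplicationStats();
        ManagementFactory.getPlatformMBeanServer().registerMBean(
                stats, new ObjectName("replicator:type=ReplicationStats"));
    }
}
```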
Where we are now
Production environment details:
● 20 Tungsten masters reading from source systems
● Replicating 50 databases
● Dataset of size 5 TB
● Processing 100 GB of binlogs every day
● Processing throughput of 10K row-change events/sec
Legacy Stack migration - Sync Bridge
Reference:
Branch containing enhancements & fixes:
https://github.fkinternal.com/Flipkart/ekl-tungsten-replicator/tree/cl_changes
Confluence space
https://confluence.fkinternal.com/display/ECLO/Data+Framework
Thank You!!