OLTP in Supply Chain
Vijay Babu
Rajesh Kannan
OLTP in FSG
● From Promise to Customer Doorstep
● Many microservices and multiple OLTP stores
● Two problems
○ High scale OLTP
○ Change propagation
Part 1: High Scale OLTP
High Scale OLTP
● OLTP system with high throughput and a large dataset
● Inventory Management System in Flipkart
● Manages the inventory view of hundreds of millions of listings
● Handles reservations for every order taken on Flipkart
Inventory Management (diagram slides)
High Scale OLTP
● High throughput
● High concurrency
● Low latency
● Data consistency
● Large dataset
High Throughput
Concurrent Transactions
● Multiple users try to reserve the same item
● The problem is more prominent at scale, during sale events
Traditional solutions, improvised
● How to scale for higher throughput at low latency?
● Cache the data?
○ The data is too dynamic to cache
○ The dataset is too large to cache
○ Instead, use an in-memory store
Traditional solutions, improvised
● How to scale for a large data size?
● Shard the data?
○ Even after sharding, the data size is too big
○ So use a key-value store with encoded data
Data store topology
Data Encoding
● Space savings of up to 5X
● Savings are more pronounced when records have more attributes
Simple encoding example:
{"quantity":100,"reservations":50}
encodes to (with "::" as the delimiter)
"100::50"
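A minimal sketch of this positional encoding in Java; the codec below assumes exactly the two fields from the example, with "::" as the delimiter, and is illustrative rather than the production code:

```java
// Minimal sketch of the positional "::" encoding from the example above.
// Assumes exactly two fields (quantity, reservations); illustrative only.
public final class InventoryCodec {
    private static final String DELIM = "::";

    // {"quantity":100,"reservations":50} -> "100::50"
    static String encode(int quantity, int reservations) {
        return quantity + DELIM + reservations;
    }

    // "100::50" -> [100, 50]
    static int[] decode(String encoded) {
        String[] tokens = encoded.split(DELIM);
        return new int[] { Integer.parseInt(tokens[0]), Integer.parseInt(tokens[1]) };
    }
}
```

Dropping the JSON keys and relying on token position is where the ~5X space saving comes from; the more attributes a record has, the more per-key overhead is removed.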
Concurrency with encoding
● Encoding/decoding at the app layer leads to concurrency issues
(read-modify-write races between clients)
● Instead, update the encoded data with partial tokenizing logic in the
data store (see the sketch below)
○ Exploit data store capabilities to serialize/deserialize the tokens
and operate on them
○ Lua for Redis/Aerospike, or a MySQL stored procedure
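As a hedged illustration of pushing the tokenizing into the store, here is a Redis variant: a Lua script does the decode-check-update atomically server-side, invoked from Java via Jedis. The key layout, field order (quantity::reservations), and names are assumptions for the sketch, not the actual inventory schema.

```java
import java.util.Collections;
import redis.clients.jedis.Jedis;

public class AtomicReservation {
    // Lua executes atomically inside Redis, so the decode/check/update on the
    // encoded "quantity::reservations" value cannot interleave with other
    // clients. Assumes the key already exists in the encoded form.
    private static final String RESERVE_LUA =
        "local v = redis.call('GET', KEYS[1]) " +
        "local qty, res = v:match('(%d+)::(%d+)') " +
        "qty, res = tonumber(qty), tonumber(res) " +
        "local want = tonumber(ARGV[1]) " +
        "if res + want > qty then return 0 end " +   // not enough inventory left
        "redis.call('SET', KEYS[1], qty .. '::' .. (res + want)) " +
        "return 1";

    // Returns true when the reservation was applied atomically.
    public static boolean reserve(Jedis jedis, String listingKey, int units) {
        Object reply = jedis.eval(RESERVE_LUA,
                Collections.singletonList(listingKey),
                Collections.singletonList(String.valueOf(units)));
        return Long.valueOf(1L).equals(reply);
    }
}
```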
Unique Challenges
● Scale for business-as-usual (BAU), super-scale for sale events
● Hotspot scenario during such events
● High concurrency on a single resource
Hotspot scenario
● Very high throughput on a single resource
● Sharding will not distribute the load
● Network packets-per-second (PPS) limits become the bottleneck
PPS Impact
Working with PPS limits
● The impact of PPS limits can be reduced by two approaches
○ Slave reduction
○ Transaction Buffering
Slave reduction
● Isolate read and write scaling
● Precomputed stores serve the majority of use cases
● Only critical reads use the write store
● Pros
○ PPS limit avoided
● Cons
○ Data consistency issues between write and read stores
Transaction buffering
● Buffer transactions at the app layer (see the sketch below)
○ Hit the data store in batches
○ Flush after a bounded time window
○ Flush at an upper bound on batch size
● Pros
○ PPS limit avoided
● Cons
○ Increased app latencies
○ More threads and memory consumed in the app
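A minimal sketch of the buffering idea, assuming a generic flush callback that makes one batched data-store call; the batch size and flush interval are illustrative knobs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class TransactionBuffer<T> {
    private final int maxBatchSize;                 // upper bound on batch size
    private final Consumer<List<T>> flushToStore;   // one batched data-store call
    private final List<T> buffer = new ArrayList<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public TransactionBuffer(int maxBatchSize, long maxWaitMillis,
                             Consumer<List<T>> flushToStore) {
        this.maxBatchSize = maxBatchSize;
        this.flushToStore = flushToStore;
        // Time bound: flush whatever has accumulated every maxWaitMillis.
        scheduler.scheduleAtFixedRate(this::flush, maxWaitMillis,
                maxWaitMillis, TimeUnit.MILLISECONDS);
    }

    public synchronized void add(T txn) {
        buffer.add(txn);
        if (buffer.size() >= maxBatchSize) {        // size bound reached
            flush();
        }
    }

    private synchronized void flush() {
        if (buffer.isEmpty()) return;
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        flushToStore.accept(batch);                 // one round trip, fewer packets
    }
}
```

The cons listed above fall out of this shape directly: a caller may wait up to maxWaitMillis for its write to reach the store, and the buffer plus scheduler consume extra memory and threads in the app.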
Other Challenges
● Skewed Throughput
● Atomicity at scale
Learnings summary
● Traditional strategies might not suffice once a certain scale is hit
● Mix more than one common strategy
● Or use a variation of a traditional strategy
Part 2: Change Propagation
in OLTP
What we do
● The City Logistics team handles
○ The first and last legs of the Flipkart supply chain
○ Delivering shipments to customers
○ Collecting returns
● Built on top of multiple microservices backed by MySQL
Business Problem => Tech Problem
● Need for real-time dashboards to track ground operations
● Achieve real-time replication, as opposed to a traditional ETL job
● Need to query a unified data view built from the data sources of
multiple microservices
● Ability to easily manage schema changes without change propagation
failing because of them
● Isolation of source and consumer
● Sending notifications based on domain data changes
● Ability to alter/enrich data from multiple data streams
Vertica for operational dashboards
Vertica as a centralized analytical datastore for all operational needs
○ Columnar data store providing high compression of data
○ Supports Massively Parallel Processing (MPP) and scales linearly
Need for a Change Propagation System!
Options for Change Data Propagation

| Option | Message generation | Relay API | Latency | IOPS/IO overhead | Others |
|---|---|---|---|---|---|
| Outbound store | Sync | Async | High | High | Implementation is relatively easy |
| Binary log replication | Async | Async | N/A | Low | More control, but complex to implement |
Tungsten
● https://github.com/vmware/tungsten-replicator
● Provides eventual consistency with exactly-once delivery
● Native support for DDLs and DMLs
● Replication connectors available for
○ Source (Extractor): MySQL, Oracle, Amazon RDS
○ Target (Applier): Vertica, MySQL, Oracle, HDFS, and other NoSQL and
data-warehouse stores
Tungsten - Contd.
● Supports filters (JavaScript/Java) where transactional data can be
enriched, altered & monitored
● Supports parallel replication
● Simple to use, with rich administration tools
● Good code documentation, so adding or enhancing features is easy
● Active community support
Tungsten Architecture
● Master-Slave topology
● Master (Extractor)
○ Pulls the binlog from the source
○ Generates THL (Transaction History Log)
● Slave (Applier)
○ Pulls THL from the master
○ Applies it to the target store using a JDBC/native connector
Operational Dashboard
Tungsten Adoption: Key challenges
Problem - Hard dependency on MySQL master
Context:
Tungsten persists its checkpoints in the source database.
Problem:
It is not a good idea to compromise the master's sanctity for this
purpose; better to use a read-only slave as the source. But Tungsten
does not support replicating from a MySQL slave server.
Problem - Hard dependency on MySQL master - contd
Solution:
Added the ability to use any remote JDBC store to maintain
checkpointing details.
Problem: No bootstrap support for onboarding
Problem:
No native support for onboarding an existing MySQL database to Vertica
Solution:
Automated scripts to
● Create tables in Vertica
● Export MySQL data as CSV and import it into Vertica (issues with
zero dates and the bit type)
● Spawn new Tungsten master and slave instances
● Enable live replication from the binlog position
● Validate data between source & target
Now the onboarding process requires just 5 minutes of manual intervention
Problem: Rollback transactions in binlog
Problem:
● An app uses temporary tables
● During transaction rollback, the reverted rows are still applied to
Vertica
● Hence inconsistency in Vertica
Replication during STATEMENT binlog format - Transaction Commit
(rows in time order)

| Queries executing at master | Inside the binlog | Inside the MySQL slave |
|---|---|---|
| <<start Txn1>> Create temp table tmp1; Insert into tmp1 values(...); Commit | All queries logged; Commit | tmp1 table created with data |
| <<start Txn2>> Insert into mainTable1 select * from tmp1 | All queries logged | Insertion into mainTable1 |
| Drop temp table tmp1 | All queries logged | tmp1 dropped |
| Commit | Commit | |
Replication during STATEMENT binlog format - Transaction Rollback
(rows in time order)

| Queries executing at master | Inside the binlog | Inside the MySQL slave |
|---|---|---|
| <<start Txn1>> Create temp table tmp1; Insert into tmp1 values(...); Commit | All queries logged; Commit | tmp1 table created with data |
| <<start Txn2>> Insert into mainTable1 select * from tmp1 | All queries logged | Insertion into mainTable1 |
| Drop temp table tmp1 | All queries logged | tmp1 dropped |
| Rollback | Rollback | Undo mainTable1 inserts |
Problem: Rollback transactions in binlog - Contd
Context:
● Temp tables are connection scoped; temp tables are replicated in the
binlog for the STATEMENT format, but not for ROW
● Binlog formats can be changed on the fly
● Hence the transaction containing the drop of the temp table is logged
for backward compatibility
Solution:
1. The Vertica applier lacks support for the ROLLBACK statement, and
implementing rollback in it would need a redesign
2. So we avoided the problem by keeping the drop of the temp table in a
separate transaction
Problem: Replication breaks during schema change
Problem:
● Vertica replication fails when a table is created/altered
● Tungsten does not propagate schema changes across platforms
Solution:
● Added the ability to replicate major and minor DDL changes
● Achieved by extracting table metadata from MySQL and generating the
equivalent SQL (see the sketch below)
● Supports most of the DDL commands in Vertica
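A hedged sketch of that metadata-driven approach: read column definitions from MySQL's information_schema and emit an equivalent Vertica CREATE TABLE. The type mapping below is deliberately minimal and illustrative, not the actual mapping used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class VerticaDdlGenerator {
    // Illustrative MySQL -> Vertica type mapping; real coverage is much wider.
    static String verticaType(String mysqlType) {
        switch (mysqlType) {
            case "int": case "bigint":         return "INT";
            case "datetime": case "timestamp": return "TIMESTAMP";
            case "bit":                        return "BOOLEAN";
            default:                           return "VARCHAR(65000)";
        }
    }

    // Builds a Vertica CREATE TABLE from MySQL's information_schema metadata.
    static String createTableSql(Connection mysql, String schema, String table)
            throws SQLException {
        StringBuilder ddl = new StringBuilder(
                "CREATE TABLE " + schema + "." + table + " (");
        String q = "SELECT column_name, data_type FROM information_schema.columns"
                 + " WHERE table_schema = ? AND table_name = ?"
                 + " ORDER BY ordinal_position";
        try (PreparedStatement ps = mysql.prepareStatement(q)) {
            ps.setString(1, schema);
            ps.setString(2, table);
            try (ResultSet rs = ps.executeQuery()) {
                boolean first = true;
                while (rs.next()) {
                    if (!first) ddl.append(", ");
                    ddl.append(rs.getString("column_name")).append(' ')
                       .append(verticaType(rs.getString("data_type")));
                    first = false;
                }
            }
        }
        return ddl.append(")").toString();
    }
}
```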
Problem - Operational overhead during MySQL server switch
Problem:
● When a MySQL slave is down, switching to another is not trivial
because binlog positions are local to each server
● Replaying overlapping events causes duplicate-key errors at Vertica
Solution:
● Introduced a skip-replication-error flag, enabled during the switch
window
● It overwrites existing data (INSERT => UPDATE, UPDATE => double
UPDATE)
Problem - JMX metrics collection
Problem:
● Tungsten publishes replication metrics as MBean operations
● Flipkart's in-house metrics collector (Cosmos) understands MBean
attributes
● Hence no visibility into the system
Solution:
Encapsulated the metrics as MBean attributes (see the sketch below)
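An illustrative sketch of the fix using a standard MBean: the metric becomes a getter on the *MBean interface, i.e. a JMX attribute that attribute-based collectors can scrape. The names here (ReplicationStats, appliedLatencyMillis, the ObjectName) are hypothetical, not Tungsten's actual MBeans.

```java
import java.lang.management.ManagementFactory;
import javax.management.ObjectName;

// Standard MBean convention: Foo implements FooMBean, and every getter on the
// interface is exposed as a JMX *attribute* (not an operation).
// Shown together for brevity; each public type would live in its own file.
public interface ReplicationStatsMBean {
    long getAppliedLatencyMillis();
}

public class ReplicationStats implements ReplicationStatsMBean {
    private volatile long appliedLatencyMillis;

    @Override
    public long getAppliedLatencyMillis() {
        return appliedLatencyMillis;     // read by the collector as an attribute
    }

    public void record(long latencyMillis) {
        appliedLatencyMillis = latencyMillis;
    }

    public static void main(String[] args) throws Exception {
        ReplicationStats stats = new ReplicationStats();
        ManagementFactory.getPlatformMBeanServer().registerMBean(
                stats, new ObjectName("replicator:type=ReplicationStats"));
    }
}
```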
Where we are now
Production environment details:
● 20 Tungsten masters reading from source systems
● Replicating 50 databases
● Dataset of size 5 TB
● Processing 100 GB of binlogs every day
● Processing throughput of 10K row-change events/sec
Legacy Stack migration - Sync Bridge
Reference:
Branch containing enhancements & fixes:
https://github.fkinternal.com/Flipkart/ekl-tungsten-replicator/tree/cl_changes
Confluence space
https://confluence.fkinternal.com/display/ECLO/Data+Framework
Thank You!!