Aesop Change Data Propagation :
Bridging SQL and NoSQL Systems
Regunath B, Principal Architect, Flipkart
github.com/regunathb twitter.com/RegunathB
What data store?
• In interviews: I need to scale and therefore will use a
NoSQL database
• Avoids the overheads of RDBMS!?
• XX product brochure:
• Y million ops/sec (Lies, Damn Lies, and Benchmarks)
• In-memory, Flash optimised
• In architecture reviews:
• Durability of data, disk-to-memory ratios
• How many nodes in a single cluster?
• CAP tradeoffs: Consistency vs. Availability
SCALING AN E-COMMERCE
WEBSITE
Source: Wayback Machine (ca. 2007)
Scaling the data store
Source: http://www.slideshare.net/slashn/slash-n-tech-talk-track-2-website-architecturemistakes-learnings-siddhartha-reddy
[Diagram: three MySQL replication topologies]
• Master (website writes) replicating to one slave (website reads and analytics reads)
• Master (website writes) replicating to Slave 1 (website reads) and Slave 2 (analytics reads)
• Master (writes) replicating to one slave (reads)
Scaling the data store (polyglot persistence)
Data               Store
User session       Memcached, HBase
Product            HBase, Elastic Search, Redis
Cart               MySQL, MongoDB
Notifications      HBase
Search             Solr, Neo4j
Recommendations    Hadoop MR, Redis
Pricing (WIP)      MySQL, Aerospike
DATA CONSISTENCY IN
POLYGLOT PERSISTENCE
Caching/Serving Layer challenges
There are only two hard things in Computer Science: cache
invalidation and naming things.
-- Phil Karlton
• Cache TTLs, High Request concurrency, Lazy caching
• Thundering herds
• Availability of primary data store
• Cache size, distribution, no. of replicas
• Feasibility of write-through
• Serving layer is Eventually Consistent, at best
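One common way to blunt a thundering herd under lazy caching is to let a single caller per key reload a missing or expired entry while the other callers wait for its result. The following is a minimal in-process sketch of that idea, not something taken from the talk; `SingleFlightCache`, `loader` and `ttl_seconds` are hypothetical names chosen for illustration.

```python
import threading
import time


class SingleFlightCache:
    """Lazy (read-through) cache where only one thread loads a missing key."""

    def __init__(self, loader, ttl_seconds=60.0):
        self._loader = loader            # hypothetical function: key -> value from the primary store
        self._ttl = ttl_seconds
        self._entries = {}               # key -> (value, expiry timestamp)
        self._key_locks = {}             # key -> lock guarding the load of that key
        self._guard = threading.Lock()   # protects the two dicts above

    def _fresh(self, key):
        """Return the cached (value, expiry) pair if it has not expired, else None."""
        entry = self._entries.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key):
        with self._guard:
            entry = self._fresh(key)
            if entry is not None:
                return entry[0]
            lock = self._key_locks.setdefault(key, threading.Lock())
        # Only one thread per key gets past this lock at a time.
        with lock:
            with self._guard:
                entry = self._fresh(key)
            if entry is not None:
                return entry[0]          # another thread loaded it while we waited
            value = self._loader(key)    # single trip to the primary data store
            with self._guard:
                self._entries[key] = (value, time.monotonic() + self._ttl)
            return value
```

With this shape, a burst of concurrent requests for the same cold key costs the primary data store one read instead of one read per request. It does not help when the primary store itself is unavailable, which is the availability concern listed above.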
Eventual Consistency
• Replicas converge over time
• Pros
• Scale reads through multiple replicas
• Higher overall data availability
• Cons
• Reads may return stale data before convergence
• Need to implement Strong Eventual Consistency
when timeline-consistent view of data is needed
• Achieving Eventual Consistency is not easy
• At a minimum, requires an at-least-once delivery guarantee of
updates to all replicas
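At-least-once delivery means a replica can see the same update more than once, and retries can reorder updates. Replicas still converge if applying an update is idempotent. A minimal sketch of one way to do that, assuming every update carries a per-key `version` assigned at the source (a hypothetical field, as are the `Update` and `Replica` names):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    key: str
    version: int   # assumed: a monotonically increasing sequence number assigned at the source
    value: str


class Replica:
    def __init__(self):
        self._rows = {}   # key -> (version, value)

    def apply(self, update):
        """Apply an update; return False when it is a duplicate or older than what is stored."""
        current = self._rows.get(update.key)
        if current is not None and current[0] >= update.version:
            return False
        self._rows[update.key] = (update.version, update.value)
        return True

    def snapshot(self):
        """Return the current key -> value view of this replica."""
        return {key: value for key, (_, value) in self._rows.items()}
```

Two replicas that receive the same set of updates, in any order and with any number of repeats, end up with the same snapshot.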
AESOP - CHANGE DATA
CAPTURE, PROPAGATION
Introduction
• A keen observer of changes that can also relay change events reliably
to interested parties. Provides useful infrastructure for building
Eventually Consistent data sources and systems.
• Open Source : https://github.com/Flipkart/aesop
• Support : aesop-users@googlegroups.com
• Production Deployments at Flipkart :
• Payments : Multi-tiered datastore spanning MySQL, HBase
• ETL : Move changes on User accounts to data analysis platform/
warehouse
• Data Serving : Capture Wishlist data updates on MySQL and index
in Elastic Search
• WIP : Accounting, Pricing, Order management etc.
Change Propagation Approach
• Core Tech stack
• LinkedIn Databus
• Open Replicator (FK fork)
• NGData HBase SEP
• Netflix Zeno
• Apache Helix
Aesop Components
• Producer : Uses Log Mining (Old wine in new bottle?)
• "Durability is typically implemented via logging and
recovery." - Architecture of a Database System
• "The contents of the DB are a cache of the latest records in
the log. The truth is the log. The database is a cache of a
subset of the log." - Jay Kreps (creator of Kafka)
• WAL (write ahead log) ensures:
• Each modification is flushed to disk
• Log records are in order
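The two WAL properties above, and the idea that the database is a cache of the log, can be illustrated with a toy append-only log. This is a sketch under simplifying assumptions (one JSON record per line, a single writer) and not how MySQL or HBase implement their logs; `WriteAheadLog` and `replay` are hypothetical names.

```python
import json
import os


class WriteAheadLog:
    def __init__(self, path):
        self._path = path
        self._file = open(path, "a", encoding="utf-8")

    def append(self, record):
        """Append one record and force it to disk before returning."""
        self._file.write(json.dumps(record) + "\n")
        self._file.flush()              # push Python's buffer to the OS
        os.fsync(self._file.fileno())   # ask the OS to persist it to the device

    def close(self):
        self._file.close()


def replay(path):
    """Rebuild the key/value state by re-applying records in log order."""
    state = {}
    with open(path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            if record["op"] == "put":
                state[record["key"]] = record["value"]
            elif record["op"] == "delete":
                state.pop(record["key"], None)
    return state
```

Because records are appended in order and flushed before the write is acknowledged, a reader that tails the same file sees every change exactly in commit order, which is what makes log mining usable as a change producer.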
Aesop Components
• Databus Relay : Ring-Buffer holding Avro
serialised change events
• Memory mapped
• Similar to a Broker in a pub-sub system
• Enhanced in Aesop for configurability, metrics
collection and admin console
• Databus Consumer(s) : Sinks for change events
• Enhanced in Aesop for bootstrapping,
configurability, data transformation
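The relay's role can be pictured as a bounded buffer that consumers read from, starting at the last sequence number (SCN) they processed. The sketch below is a simplified stand-in: it keeps plain Python objects rather than Avro-serialised events in memory-mapped storage, and `RelayBuffer` and `read_since` are hypothetical names, not the Databus API.

```python
class RelayBuffer:
    """Fixed-capacity ring buffer of (scn, event) pairs; the oldest pair is overwritten first."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._capacity = capacity
        self._count = 0           # total events ever appended
        self._evicted_scn = -1    # highest SCN that has been overwritten so far

    def append(self, scn, event):
        index = self._count % self._capacity
        if self._slots[index] is not None:
            self._evicted_scn = self._slots[index][0]   # about to overwrite the oldest event
        self._slots[index] = (scn, event)
        self._count += 1

    def read_since(self, last_scn, max_events=100):
        """Return up to max_events pairs with scn > last_scn, oldest first.

        Raises LookupError when events after last_scn have already been
        overwritten, in which case the consumer has to bootstrap instead.
        """
        if last_scn < self._evicted_scn:
            raise LookupError("events after SCN %d are no longer in the buffer" % last_scn)
        start = max(0, self._count - self._capacity)
        retained = [self._slots[i % self._capacity] for i in range(start, self._count)]
        return [pair for pair in retained if pair[0] > last_scn][:max_events]
```

A consumer that falls too far behind gets an error rather than a silent gap, which is why a separate bootstrap path for cold-start consumers is needed alongside the relay.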
Aesop Architecture
Event Consumption
• Data transformation
• Data Layer : Multiple
destinations
• MySQL
• Elastic Search
• HBase
Client Clustering : HA & Load
Balancing
Monitoring Console
Aesop Utilities
• Blocking Bootstrap
• Cold start
consumers
• Avro schema
generator
• SCN Generator
• Generational SCN
generator (to
handle MySQL
mastership
transfer)
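After a MySQL mastership transfer, log positions on the new master are not comparable with those on the old one, so a sequence number based on log position alone could move backwards. One plausible way to keep SCNs increasing is to pack a generation counter into the high bits and the log offset into the low bits. The sketch below illustrates that idea only; the bit width and the names `make_scn` and `split_scn` are assumptions, not Aesop's actual layout.

```python
OFFSET_BITS = 40                      # assumed width, chosen only for this sketch
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def make_scn(generation, offset):
    """Pack a mastership generation and a log offset into one integer SCN."""
    if generation < 0:
        raise ValueError("generation must be non-negative")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in %d bits" % OFFSET_BITS)
    return (generation << OFFSET_BITS) | offset


def split_scn(scn):
    """Recover (generation, offset) from a packed SCN."""
    return scn >> OFFSET_BITS, scn & OFFSET_MASK
```

Bumping the generation at each mastership transfer makes every SCN from the new master larger than any SCN from the old one, even when the new master's offsets start low.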
Performance (Lies, Damn Lies, and Benchmarks)
• MySQL -> HBase
• Relay : 1 XL VM (8 core, 32GB)
• Consumers : 4 XL, 200 partitions
• Throughput : 30K Inserts per sec.
• Data size : 800 GB
• Time : 60 hrs
• Observations:
• Busy Relay - 95% CPU (serving data to 200 partitions)
• High producer throughput - Log read operates at disk transfer
rate
• High consumer throughput - Append-only writes of HBase
• Better scale possible with larger machine for Relay
• Partitioning Relay might be tricky - to preserve WAL edits ordering
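As a rough cross-check of the figures above, the arithmetic can be worked through directly. This assumes the 30K inserts per second was sustained for the whole 60 hours and that 1 GB is 10^9 bytes; the slide states neither, so the derived values are indicative only.

```python
SECONDS_PER_HOUR = 3600

inserts_per_sec = 30_000      # from the slide
data_size_gb = 800            # from the slide
duration_hours = 60           # from the slide
partitions = 200              # from the slide

duration_sec = duration_hours * SECONDS_PER_HOUR
total_inserts = inserts_per_sec * duration_sec            # assumes the rate was sustained throughout
bytes_per_insert = data_size_gb * 10**9 / total_inserts   # assumes 1 GB = 10^9 bytes
mb_per_sec = data_size_gb * 1000 / duration_sec           # average data rate in MB/s
inserts_per_partition = inserts_per_sec / partitions      # average per-partition rate

print(f"total inserts:          {total_inserts:,}")
print(f"bytes per insert:       {bytes_per_insert:.0f}")
print(f"average MB/s:           {mb_per_sec:.1f}")
print(f"inserts/sec/partition:  {inserts_per_partition:.0f}")
```

Under these assumptions the run amounts to about 6.5 billion inserts of roughly 123 bytes each, an average of about 3.7 MB/s, and 150 inserts per second per partition.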
Future Work
• Enhance, Implement:
• Producers
• HBase, MongoDB, etc.
• Data Layers
• Redis, Aerospike, etc.
• Document Operational best practices
• e.g. MySQL mastership transfer
• Infra component for building tiered data stores
• Sharded, Secondary indices, Low Latency, HW
optimized (high Disk-Memory ratios)
Aesop change data propagation

More Related Content

What's hot

HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big dataHBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
Michael Stack
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
✔ Eric David Benari, PMP
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon
 
Membase Meetup 2010
Membase Meetup 2010Membase Meetup 2010
Membase Meetup 2010Membase
 
An Engineering Approach to Database Evaluations
An Engineering Approach to Database EvaluationsAn Engineering Approach to Database Evaluations
An Engineering Approach to Database Evaluations
SingleStore
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dan Lynn
 
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsightHBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon
 
Google mesa
Google mesaGoogle mesa
Google mesa
Sameer Tiwari
 
HBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at MeituanHBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at Meituan
Michael Stack
 
RubiX
RubiXRubiX
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the Cloud
Shubham Tagra
 
Rebuilding from MongoDB for Scale on HBase
Rebuilding from MongoDB for Scale on HBaseRebuilding from MongoDB for Scale on HBase
Rebuilding from MongoDB for Scale on HBase
Robert Roland
 
Data Management on Hadoop at Yahoo!
Data Management on Hadoop at Yahoo!Data Management on Hadoop at Yahoo!
Data Management on Hadoop at Yahoo!
Seetharam Venkatesh
 
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applicationsFOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
Ashnikbiz
 
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullySQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
Md Kamaruzzaman
 
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeHBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
Michael Stack
 
Bigdata antipatterns
Bigdata antipatternsBigdata antipatterns
Bigdata antipatterns
Anurag S
 
Chicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBaseChicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBase
Cloudera, Inc.
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
Alluxio, Inc.
 
Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart...
 Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart... Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart...
Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart...
Ashish Tadose
 

What's hot (20)

HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big dataHBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
 
Membase Meetup 2010
Membase Meetup 2010Membase Meetup 2010
Membase Meetup 2010
 
An Engineering Approach to Database Evaluations
An Engineering Approach to Database EvaluationsAn Engineering Approach to Database Evaluations
An Engineering Approach to Database Evaluations
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
 
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsightHBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
 
Google mesa
Google mesaGoogle mesa
Google mesa
 
HBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at MeituanHBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at Meituan
 
RubiX
RubiXRubiX
RubiX
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the Cloud
 
Rebuilding from MongoDB for Scale on HBase
Rebuilding from MongoDB for Scale on HBaseRebuilding from MongoDB for Scale on HBase
Rebuilding from MongoDB for Scale on HBase
 
Data Management on Hadoop at Yahoo!
Data Management on Hadoop at Yahoo!Data Management on Hadoop at Yahoo!
Data Management on Hadoop at Yahoo!
 
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applicationsFOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
 
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullySQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
 
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeHBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
 
Bigdata antipatterns
Bigdata antipatternsBigdata antipatterns
Bigdata antipatterns
 
Chicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBaseChicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBase
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
 
Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart...
 Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart... Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart...
Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart...
 

Viewers also liked

Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaar
Regunath B
 
Building the Flipkart phantom
Building the Flipkart phantomBuilding the Flipkart phantom
Building the Flipkart phantom
Regunath B
 
Srikanth Nadhamuni
Srikanth NadhamuniSrikanth Nadhamuni
Srikanth Nadhamuni
eletseditorial
 
Aadhaar
AadhaarAadhaar
Unique identification authority of india uid
Unique identification authority of india   uidUnique identification authority of india   uid
Unique identification authority of india uid
Ajit Dadresa
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3
Regunath B
 
practical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome thempractical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome them
saipriyadonthula
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)
Ali Raw
 

Viewers also liked (9)

Uid
UidUid
Uid
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaar
 
Building the Flipkart phantom
Building the Flipkart phantomBuilding the Flipkart phantom
Building the Flipkart phantom
 
Srikanth Nadhamuni
Srikanth NadhamuniSrikanth Nadhamuni
Srikanth Nadhamuni
 
Aadhaar
AadhaarAadhaar
Aadhaar
 
Unique identification authority of india uid
Unique identification authority of india   uidUnique identification authority of india   uid
Unique identification authority of india uid
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3
 
practical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome thempractical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome them
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)
 

Similar to Aesop change data propagation

SharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 PerformanceSharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 Performance
Brian Culver
 
Иван Глушков (Echo)
Иван Глушков (Echo)Иван Глушков (Echo)
Иван Глушков (Echo)Ontico
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
acelyc1112009
 
MySQL :What's New #GIDS16
MySQL :What's New #GIDS16MySQL :What's New #GIDS16
MySQL :What's New #GIDS16
Sanjay Manwani
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
HBaseCon
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Amazon Web Services
 
Membase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase
 
Architecture Patterns - Open Discussion
Architecture Patterns - Open DiscussionArchitecture Patterns - Open Discussion
Architecture Patterns - Open Discussion
Nguyen Tung
 
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
European SharePoint Conference
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
Chris Purrington
 
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Alluxio, Inc.
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
StampedeCon
 
Is OLAP Dead?: Can Next Gen Tools Take Over?
Is OLAP Dead?: Can Next Gen Tools Take Over?Is OLAP Dead?: Can Next Gen Tools Take Over?
Is OLAP Dead?: Can Next Gen Tools Take Over?
Senturus
 
How companies use NoSQL and Couchbase - NoSQL Now 2013
How companies use NoSQL and Couchbase - NoSQL Now 2013How companies use NoSQL and Couchbase - NoSQL Now 2013
How companies use NoSQL and Couchbase - NoSQL Now 2013
Dipti Borkar
 
SharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 PerformanceSharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 Performance
Brian Culver
 
high performance databases
high performance databaseshigh performance databases
high performance databases
mahdi_92
 
Membase East Coast Meetups
Membase East Coast MeetupsMembase East Coast Meetups
Membase East Coast Meetups
Membase
 
Severalnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IXSeveralnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IX
Severalnines
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
Michael Stack
 

Similar to Aesop change data propagation (20)

SharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 PerformanceSharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 Performance
 
Иван Глушков (Echo)
Иван Глушков (Echo)Иван Глушков (Echo)
Иван Глушков (Echo)
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
 
MySQL :What's New #GIDS16
MySQL :What's New #GIDS16MySQL :What's New #GIDS16
MySQL :What's New #GIDS16
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Membase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase Meetup - Silicon Valley
Membase Meetup - Silicon Valley
 
Architecture Patterns - Open Discussion
Architecture Patterns - Open DiscussionArchitecture Patterns - Open Discussion
Architecture Patterns - Open Discussion
 
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
 
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
Is OLAP Dead?: Can Next Gen Tools Take Over?
Is OLAP Dead?: Can Next Gen Tools Take Over?Is OLAP Dead?: Can Next Gen Tools Take Over?
Is OLAP Dead?: Can Next Gen Tools Take Over?
 
How companies use NoSQL and Couchbase - NoSQL Now 2013
How companies use NoSQL and Couchbase - NoSQL Now 2013How companies use NoSQL and Couchbase - NoSQL Now 2013
How companies use NoSQL and Couchbase - NoSQL Now 2013
 
SharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 PerformanceSharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 Performance
 
high performance databases
high performance databaseshigh performance databases
high performance databases
 
Membase East Coast Meetups
Membase East Coast MeetupsMembase East Coast Meetups
Membase East Coast Meetups
 
Severalnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IXSeveralnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IX
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
 

Recently uploaded

Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Jay Das
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 

Recently uploaded (20)

Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Aesop change data propagation

  • 1. Aesop Change Data Propagation: Bridging SQL and NoSQL Systems
      • Regunath B, Principal Architect, Flipkart
      • github.com/regunathb, twitter.com/RegunathB
  • 2. What data store?
      • In interviews: I need to scale and therefore will use a NoSQL database
          • Avoids the overheads of RDBMS!?
      • XX product brochure:
          • Y million ops/sec (Lies, Damn Lies, and Benchmarks)
          • In-memory, Flash optimised
      • In architecture reviews:
          • Durability of data, disk-to-memory ratios
          • How many nodes in a single cluster?
          • CAP tradeoffs: Consistency vs. Availability
  • 4. Scaling the data store
      • Source: Wayback Machine (ca. 2007)
      • Source: http://www.slideshare.net/slashn/slash-n-tech-talk-track-2-website-architecturemistakes-learnings-siddhartha-reddy
      • [Diagrams: three MySQL replication topologies. (1) Master takes website writes; one slave serves website reads and analytics reads. (2) Master takes website writes; slave 1 serves website reads, slave 2 serves analytics reads. (3) Master takes writes; slave serves reads.]
  • 5. Scaling the data store (polyglot persistence)
      • Data: Store
          • User session: Memcached, HBase
          • Product: HBase, Elastic Search, Redis
          • Cart: MySQL, MongoDB
          • Notifications: HBase
          • Search: Solr, Neo4j
          • Recommendations: Hadoop MR, Redis
          • Pricing (WIP): MySQL, Aerospike
  • 7. Caching/Serving Layer challenges
      • "There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton
      • Cache TTLs, High Request concurrency, Lazy caching
          • Thundering herds
          • Availability of primary data store
      • Cache size, distribution, no. of replicas
      • Feasibility of write-through
      • Serving layer is Eventually Consistent, at best
  • 8. Eventual Consistency
      • Replicas converge over time
      • Pros
          • Scale reads through multiple replicas
          • Higher overall data availability
      • Cons
          • Reads return live data before convergence
          • Need to implement Strong Eventual Consistency when timeline-consistent view of data is needed
      • Achieving Eventual Consistency is not easy
          • Trivially requires At-least-once delivery guarantee of updates to all replicas
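
At-least-once delivery means a replica may see the same update more than once, so convergence depends on applying updates idempotently. The following is a minimal sketch of that idea, not Aesop code: the `Update` and `Replica` classes and the per-key `version` field are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    key: str
    value: str
    version: int  # hypothetical: increases with every change to the key


class Replica:
    """Applies updates idempotently: duplicates and older versions are no-ops."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, update):
        current = self.data.get(update.key)
        if current is None or update.version > current[0]:
            self.data[update.key] = (update.version, update.value)


def deliver_at_least_once(updates, replica, duplicate_every=2):
    """Simulates redelivery: every `duplicate_every`-th update is sent twice."""
    for i, update in enumerate(updates):
        replica.apply(update)
        if i % duplicate_every == 0:
            replica.apply(update)  # redelivered duplicate


updates = [
    Update("cart:42", "1 item", 1),
    Update("cart:42", "2 items", 2),
    Update("user:7", "active", 1),
]

# Reference replica that sees each update exactly once.
exactly_once = Replica()
for u in updates:
    exactly_once.apply(u)

# Replica that sees duplicates, plus a late redelivery of an old update.
with_duplicates = Replica()
deliver_at_least_once(updates, with_duplicates)
with_duplicates.apply(updates[0])
```

Both replicas end in the same state because a redelivered or older update never overwrites a newer one.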
  • 9. AESOP - CHANGE DATA CAPTURE, PROPAGATION
  • 10. Introduction
      • A keen observer of changes that can also relay change events reliably to interested parties. Provides useful infrastructure for building Eventually Consistent data sources and systems.
      • Open Source: https://github.com/Flipkart/aesop
      • Support: aesop-users@googlegroups.com
      • Production Deployments at Flipkart:
          • Payments: Multi-tiered datastore spanning MySQL, HBase
          • ETL: Move changes on User accounts to data analysis platform/warehouse
          • Data Serving: Capture Wishlist data updates on MySQL and index in Elastic Search
          • WIP: Accounting, Pricing, Order management etc.
  • 11. Change Propagation Approach
      • Core Tech stack
          • LinkedIn Databus
          • Open Replicator (FK fork)
          • NGData HBase SEP
          • Netflix Zeno
          • Apache Helix
  • 12. Aesop Components
      • Producer: Uses Log Mining (Old wine in new bottle?)
          • "Durability is typically implemented via logging and recovery." - Architecture of a Database System
          • "The contents of the DB are a cache of the latest records in the log. The truth is the log. The database is a cache of a subset of the log." - Jay Kreps (creator of Kafka)
      • WAL (write ahead log) ensures:
          • Each modification is flushed to disk
          • Log records are in order
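
The two WAL guarantees on this slide (flush before apply, records in order) are what make log mining possible: anything that reads the log in order sees every change. Below is a toy sketch of that principle, assuming a JSON-lines log file; `TinyWalStore` and its record layout are invented for illustration and do not reflect the MySQL binlog format or Aesop's producer.

```python
import json
import os
import tempfile


class TinyWalStore:
    """Toy key-value store whose in-memory table is rebuilt from its log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.table = {}
        self.next_seq = 1
        self._recover()

    def _recover(self):
        # Replay the log in order: the table is just a cache of the log.
        for record in self.read_log():
            self._apply(record)
            self.next_seq = record["seq"] + 1

    def _apply(self, record):
        if record["op"] == "put":
            self.table[record["key"]] = record["value"]
        elif record["op"] == "delete":
            self.table.pop(record["key"], None)

    def _append(self, record):
        record["seq"] = self.next_seq
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())  # on disk before the change becomes visible
        self.next_seq += 1
        self._apply(record)

    def put(self, key, value):
        self._append({"op": "put", "key": key, "value": value})

    def delete(self, key):
        self._append({"op": "delete", "key": key})

    def read_log(self):
        """What a log-mining reader would see: every change, in order."""
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path, "r", encoding="utf-8") as log:
            return [json.loads(line) for line in log]


log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyWalStore(log_path)
store.put("a", "1")
store.put("b", "2")
store.delete("a")

# A fresh instance recovers the same state purely from the log.
recovered = TinyWalStore(log_path)
```

Here `read_log` plays the role of the log miner: it needs no access to the table, only to the ordered log.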
  • 13. Aesop Components
      • Databus Relay: Ring-Buffer holding Avro serialised change events
          • Memory mapped
          • Similar to a Broker in a pub-sub system
          • Enhanced in Aesop for configurability, metrics collection and admin console
      • Databus Consumer(s): Sinks for change events
          • Enhanced in Aesop for bootstrapping, configurability, data transformation
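
A ring buffer keeps only the most recent events, so a consumer that falls too far behind can no longer be served from it. The sketch below illustrates that behaviour with a plain in-memory list; it is not the Databus relay, which is memory mapped and holds Avro-serialised events. The class name, the method names, and the use of `LookupError` are assumptions for illustration.

```python
class RingBufferRelay:
    """Fixed-capacity buffer of (scn, payload) events, oldest overwritten first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.count = 0  # total number of events ever appended
        self.last_evicted_scn = None

    def append(self, scn, payload):
        index = self.count % self.capacity
        if self.count >= self.capacity:
            # The slot is occupied: remember which event is being dropped.
            self.last_evicted_scn = self.slots[index][0]
        self.slots[index] = (scn, payload)
        self.count += 1

    def events_since(self, since_scn):
        """Returns retained events with scn greater than `since_scn`, in order."""
        if self.last_evicted_scn is not None and since_scn < self.last_evicted_scn:
            # Events the consumer has not seen were already overwritten.
            raise LookupError("consumer fell behind the relay; bootstrap required")
        start = max(0, self.count - self.capacity)
        retained = (self.slots[i % self.capacity] for i in range(start, self.count))
        return [event for event in retained if event[0] > since_scn]


relay = RingBufferRelay(capacity=3)
for scn in (10, 20, 30, 40):
    relay.append(scn, {"id": scn})

# A consumer that has processed up to SCN 20 can catch up from the buffer.
caught_up = relay.events_since(20)
```

A consumer asking for events after SCN 5 would get the `LookupError`, because the event with SCN 10 has already been overwritten.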
  • 15. Event Consumption
      • Data transformation
      • Data Layer: Multiple destinations
          • MySQL
          • Elastic Search
          • HBase
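
Event consumption combines a transformation step with writes to one or more destinations. A minimal sketch of that shape follows, with dictionaries standing in for the real destinations; `transform`, `DictDataLayer`, `consume`, and the event layout are hypothetical names, not Aesop APIs.

```python
def transform(event):
    """Hypothetical mapping from a source row change to a destination document."""
    row = event["row"]
    return {"id": row["id"], "title": row["name"].strip().lower()}


class DictDataLayer:
    """Stand-in for a destination such as a search index or a key-value table."""

    def __init__(self):
        self.docs = {}

    def write(self, op, doc):
        if op == "delete":
            self.docs.pop(doc["id"], None)
        else:  # treat inserts and updates alike as an upsert
            self.docs[doc["id"]] = doc


def consume(events, data_layers):
    """Transforms each change event once and applies it to every destination."""
    for event in events:
        doc = transform(event)
        for layer in data_layers:
            layer.write(event["op"], doc)


events = [
    {"op": "upsert", "row": {"id": 1, "name": "  Red Shoes "}},
    {"op": "upsert", "row": {"id": 2, "name": "Blue Hat"}},
    {"op": "delete", "row": {"id": 1, "name": "  Red Shoes "}},
]

index_layer = DictDataLayer()
table_layer = DictDataLayer()
consume(events, [index_layer, table_layer])
```

Using upserts keeps the write path safe under redelivery, since applying the same event twice leaves the destination unchanged.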
  • 16. Client Clustering: HA & Load Balancing
  • 19. Aesop Utilities
      • Blocking Bootstrap
          • Cold start consumers
      • Avro schema generator
      • SCN Generator
          • Generational SCN generator (to handle MySQL mastership transfer)
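
One way a generational SCN can stay monotonic across a mastership transfer is to place a generation counter in the most significant bits, so that a new master's lower log positions still produce a larger SCN. The sketch below illustrates that idea only. The bit widths, field names, and functions are assumptions and should not be read as Aesop's actual SCN layout.

```python
# Hypothetical layout: [generation | log file number | offset within file]
GENERATION_BITS = 16
FILE_BITS = 16
OFFSET_BITS = 32


def make_scn(generation, log_file_number, offset):
    """Packs the three fields into one integer that sorts by generation first."""
    if not 0 <= generation < (1 << GENERATION_BITS):
        raise ValueError("generation out of range")
    if not 0 <= log_file_number < (1 << FILE_BITS):
        raise ValueError("log file number out of range")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset out of range")
    return (
        (generation << (FILE_BITS + OFFSET_BITS))
        | (log_file_number << OFFSET_BITS)
        | offset
    )


def split_scn(scn):
    """Inverse of make_scn: returns (generation, log_file_number, offset)."""
    offset = scn & ((1 << OFFSET_BITS) - 1)
    log_file_number = (scn >> OFFSET_BITS) & ((1 << FILE_BITS) - 1)
    generation = scn >> (FILE_BITS + OFFSET_BITS)
    return generation, log_file_number, offset


# Old master is far into its logs; the new master starts from a low position.
old_master_scn = make_scn(generation=1, log_file_number=250, offset=9000)
new_master_scn = make_scn(generation=2, log_file_number=1, offset=4)
```

With this layout the generation has to be incremented on every mastership transfer, which is an operational step outside the sketch.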
  • 20. Performance (Lies, Damn Lies, and Benchmarks)
      • MySQL -> HBase
          • Relay: 1 XL VM (8 core, 32GB)
          • Consumers: 4 XL, 200 partitions
          • Throughput: 30K Inserts per sec.
          • Data size: 800 GB
          • Time: 60 hrs
      • Observations:
          • Busy Relay - 95% CPU (serving data to 200 partitions)
          • High producer throughput - Log read operates at disk transfer rate
          • High consumer throughput - Append-only writes of HBase
          • Better scale possible with larger machine for Relay
          • Partitioning Relay might be tricky - to preserve WAL edits ordering
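
The benchmark figures can be cross-checked with some back-of-envelope arithmetic. The sketch below assumes the 30K inserts per second was sustained for the full 60 hours, that inserts were spread evenly over the 200 partitions, and that GB means 10^9 bytes. None of these assumptions is stated on the slide, so the derived values are rough estimates only.

```python
throughput_per_sec = 30_000  # inserts per second (from the slide)
duration_hours = 60          # total run time (from the slide)
data_size_gb = 800           # data moved (from the slide)
partitions = 200             # consumer partitions (from the slide)

duration_sec = duration_hours * 3600

# Total inserts if the quoted throughput held for the whole run (assumption).
total_inserts = throughput_per_sec * duration_sec

# Implied average size of one insert, taking 1 GB as 10**9 bytes (assumption).
avg_bytes_per_insert = data_size_gb * 10**9 / total_inserts

# Implied average data rate through the relay, in MB per second.
avg_mb_per_sec = data_size_gb * 1000 / duration_sec

# Inserts per second handled by each partition, if evenly spread (assumption).
inserts_per_partition_per_sec = throughput_per_sec / partitions

print(f"total inserts: {total_inserts:,}")
print(f"average bytes per insert: {avg_bytes_per_insert:.1f}")
print(f"average MB/s: {avg_mb_per_sec:.2f}")
print(f"inserts/s per partition: {inserts_per_partition_per_sec:.0f}")
```

Under these assumptions the run works out to about 6.48 billion inserts of roughly 123 bytes each, or about 3.7 MB/s on average.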
  • 21. Future Work
      • Enhance, Implement:
          • Producers
              • HBase, MongoDB, etc.
          • Data Layers
              • Redis, Aerospike, etc.
      • Document Operational best practices
          • e.g. MySQL mastership transfer
      • Infra component for building tiered data stores
          • Sharded, Secondary indices, Low Latency, HW optimized (high Disk-Memory ratios)