Aesop Change Data Propagation :
Bridging SQL and NoSQL Systems
Regunath B, Principal Architect, Flipkart
Slides from my talk at #GIDS 2015 Data Conference


  1. Aesop Change Data Propagation: Bridging SQL and NoSQL Systems
     Regunath B, Principal Architect, Flipkart
     github.com/regunathb, twitter.com/RegunathB
  2. What data store?
     • In interviews: I need to scale and therefore will use a NoSQL database
       • Avoids the overheads of RDBMS!?
     • XX product brochure:
       • Y million ops/sec (Lies, Damn Lies, and Benchmarks)
       • In-memory, Flash optimised
     • In architecture reviews:
       • Durability of data, disk-to-memory ratios
       • How many nodes in a single cluster?
       • CAP tradeoffs: Consistency vs. Availability
  3. SCALING AN E-COMMERCE WEBSITE
  4. Scaling the data store
     Source: Wayback Machine (ca. 2007)
     Source: http://www.slideshare.net/slashn/slash-n-tech-talk-track-2-website-architecturemistakes-learnings-siddhartha-reddy
     [Diagrams, three replication topologies:
       (a) MySQL Master takes Writes; replication to one MySQL Slave serving Reads.
       (b) MySQL Master takes Website Writes; replication to one MySQL Slave serving Website Reads and Analytics Reads.
       (c) MySQL Master takes Website Writes; replication to MySQL Slave 1 serving Website Reads and to MySQL Slave 2 serving Analytics Reads.]
  5. Scaling the data store (polyglot persistence)

     Data              Store
     User session      Memcached, HBase
     Product           HBase, Elastic Search, Redis
     Cart              MySQL, MongoDB
     Notifications     HBase
     Search            Solr, Neo4j
     Recommendations   Hadoop MR, Redis
     Pricing (WIP)     MySQL, Aerospike
  6. DATA CONSISTENCY IN POLYGLOT PERSISTENCE
  7. Caching/Serving Layer challenges
     "There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton
     • Cache TTLs, High Request concurrency, Lazy caching
     • Thundering herds
     • Availability of primary data store
     • Cache size, distribution, no. of replicas
     • Feasibility of write-through
     • Serving layer is Eventually Consistent, at best
  8. Eventual Consistency
     • Replicas converge over time
     • Pros
       • Scale reads through multiple replicas
       • Higher overall data availability
     • Cons
       • Reads can return data that has not yet converged
       • Need to implement Strong Eventual Consistency when a timeline-consistent view of data is needed
     • Achieving Eventual Consistency is not easy
       • At a minimum, requires an at-least-once delivery guarantee of updates to all replicas
  9. AESOP - CHANGE DATA CAPTURE, PROPAGATION
  10. Introduction
      • A keen observer of changes that can also relay change events reliably to interested parties. Provides useful infrastructure for building Eventually Consistent data sources and systems.
      • Open Source: https://github.com/Flipkart/aesop
      • Support: aesop-users@googlegroups.com
      • Production Deployments at Flipkart:
        • Payments: Multi-tiered datastore spanning MySQL, HBase
        • ETL: Move changes on User accounts to data analysis platform/warehouse
        • Data Serving: Capture Wishlist data updates on MySQL and index in Elastic Search
        • WIP: Accounting, Pricing, Order management etc.
  11. Change Propagation Approach
      • Core Tech stack
        • LinkedIn Databus
        • Open Replicator (FK fork)
        • NGData HBase SEP
        • Netflix Zeno
        • Apache Helix
  12. Aesop Components
      • Producer: Uses Log Mining (Old wine in new bottle?)
        • "Durability is typically implemented via logging and recovery." - Architecture of a Database System
        • "The contents of the DB are a cache of the latest records in the log. The truth is the log. The database is a cache of a subset of the log." - Jay Kreps (creator of Kafka)
        • WAL (write-ahead log) ensures:
          • Each modification is flushed to disk
          • Log records are in order
  13. Aesop Components
      • Databus Relay: Ring buffer holding Avro-serialised change events
        • Memory mapped
        • Similar to a Broker in a pub-sub system
        • Enhanced in Aesop for configurability, metrics collection and admin console
      • Databus Consumer(s): Sinks for change events
        • Enhanced in Aesop for bootstrapping, configurability, data transformation
  14. Aesop Architecture
  15. Event Consumption
      • Data transformation
      • Data Layer: Multiple destinations
        • MySQL
        • Elastic Search
        • HBase
  16. Client Clustering: HA & Load Balancing
  17. Monitoring Console
  18. Monitoring Console
  19. Aesop Utilities
      • Blocking Bootstrap
        • Cold start consumers
      • Avro schema generator
      • SCN Generator
        • Generational SCN generator (to handle MySQL mastership transfer)
  20. Performance (Lies, Damn Lies, and Benchmarks)
      • MySQL -> HBase
        • Relay: 1 XL VM (8 core, 32GB)
        • Consumers: 4 XL, 200 partitions
        • Throughput: 30K inserts per sec.
        • Data size: 800 GB
        • Time: 60 hrs
      • Observations:
        • Busy Relay - 95% CPU (serving data to 200 partitions)
        • High producer throughput - Log read operates at disk transfer rate
        • High consumer throughput - Append-only writes of HBase
        • Better scale possible with larger machine for Relay
        • Partitioning Relay might be tricky - to preserve WAL edits ordering
  21. Future Work
      • Enhance, Implement:
        • Producers
          • HBase, MongoDB, etc.
        • Data Layers
          • Redis, Aerospike, etc.
      • Document Operational best practices
        • e.g. MySQL mastership transfer
      • Infra component for building tiered data stores
        • Sharded, Secondary indices, Low Latency, HW optimized (high Disk-Memory ratios)
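
Slides 8 and 13 describe a relay that buffers ordered change events and consumers that need at-least-once delivery. The following is a minimal in-memory sketch of that idea, not Aesop or Databus code: the names `ChangeEvent`, `Relay`, `Consumer` and `events_after` are hypothetical, the buffer is a plain bounded deque rather than a memory-mapped ring buffer, and events are not Avro-serialised. It is meant to show why a consumer that saves its checkpoint only after applying a batch gets redelivery after a crash, and why applying events as idempotent upserts makes that redelivery harmless.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    scn: int    # sequence number; strictly increasing in log order
    key: str
    value: str


class Relay:
    """Bounded buffer of change events (hypothetical stand-in for a relay)."""

    def __init__(self, capacity: int) -> None:
        self._events = deque(maxlen=capacity)  # oldest event is dropped when full
        self._last_scn = 0      # SCN of the newest event accepted so far
        self._evicted_scn = 0   # SCN of the newest event dropped so far

    def append(self, event: ChangeEvent) -> None:
        if event.scn <= self._last_scn:
            raise ValueError("events must arrive in increasing SCN order")
        if len(self._events) == self._events.maxlen:
            self._evicted_scn = self._events[0].scn
        self._events.append(event)
        self._last_scn = event.scn

    def events_after(self, scn: int) -> list:
        # A consumer whose checkpoint is older than an evicted event has
        # missed data and cannot catch up from the buffer alone.
        if scn < self._evicted_scn:
            raise LookupError("checkpoint fell out of the buffer; bootstrap needed")
        return [e for e in self._events if e.scn > scn]


class Consumer:
    """Applies events to a key-value sink and tracks its own checkpoint."""

    def __init__(self) -> None:
        self.checkpoint = 0
        self.store = {}
        self.deliveries = 0

    def poll(self, relay: Relay, crash_before_checkpoint: bool = False) -> int:
        batch = relay.events_after(self.checkpoint)
        for event in batch:
            self.store[event.key] = event.value  # idempotent upsert
            self.deliveries += 1
        # The checkpoint advances only after the whole batch is applied, so a
        # crash at this point causes the same batch to be delivered again.
        if batch and not crash_before_checkpoint:
            self.checkpoint = batch[-1].scn
        return len(batch)
```

In this sketch a consumer that falls too far behind gets a `LookupError`, which is the situation the bootstrap utilities on slide 19 are presumably meant to address.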
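
Slide 19 mentions a generational SCN generator for handling MySQL mastership transfer, without giving details. One way such a scheme could work, sketched here purely as an assumption, is to pack a generation counter into the high bits of the SCN above the log position, so that SCNs keep increasing even when a new master's log positions restart from a smaller value. The bit widths and the function names `make_scn` and `split_scn` are hypothetical and are not taken from Aesop.

```python
GENERATION_BITS = 16   # assumed width; high bits hold the master generation
POSITION_BITS = 48     # assumed width; low bits hold the log position
POSITION_MASK = (1 << POSITION_BITS) - 1


def make_scn(generation: int, position: int) -> int:
    """Combine a master generation and a log position into one ordered number."""
    if not 0 <= generation < (1 << GENERATION_BITS):
        raise ValueError("generation out of range")
    if not 0 <= position <= POSITION_MASK:
        raise ValueError("position out of range")
    return (generation << POSITION_BITS) | position


def split_scn(scn: int) -> tuple:
    """Recover (generation, position) from a packed SCN."""
    return scn >> POSITION_BITS, scn & POSITION_MASK
```

With this layout, any event from generation 1 compares greater than every event from generation 0, regardless of log position, which is the ordering property a consumer checkpoint needs across a mastership change.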