Using Kafka to scale database replication

How LinkedIn used Kafka to scale database infrastructure
Basavaiah Thambara (Basu)
Staff SRE
https://www.linkedin.com/in/basavaiaht

Today’s agenda
Introduction to Espresso - DataStore
Espresso - Replication needs
Limitations of using MySQL Replication
Espresso Replication using Kafka
Advantages of using Kafka
How Kafka based replication works
Conclusion & References

Espresso
● Document store
● Built on top of MySQL
● Bridges gap between RDBMS & k-v stores
● Features
■ Multi-colo
■ Secondary Indexing
■ Schema evolution
■ Change data capture
■ ETL to and from Hadoop
● Use cases
■ Profiles, Invitations, InMails, etc.

Scale of usage
● 80% of site-facing databases
● No. of clusters: 145; servers: ~19k
● No. of databases: ~300
● Data size: ~12 PB
● Peak QPS: ~3.4M on a single data store

Database Sharding
● Database
● Shard or partition
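
A minimal sketch of how a document key could map to a shard, assuming simple hash-mod partitioning. The partition count, key format, and function name are hypothetical; the slides do not describe Espresso's actual partitioning function.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical partition count for one database


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a document key to a partition using a stable hash (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the
    # partition count so the same key always lands on the same partition.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Because the hash is stable, every router instance computes the same partition for a given key without any coordination.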

Espresso Basic Architecture
● Storage node
● Apache Helix
● Zookeeper
● Router
● Client/application

The need for replication
● Read scaling
● High availability
● Disaster Recovery
● Multi-colo support
● Backups

Espresso - local replication
● MySQL replication
● Per node replication
● Master
● Slave
● Master serves
● Node failure
old design

Espresso - cross colo replication
● Multi-colo writes
● Last writer wins
● Databus
● Data Replicator
● Colo failure
old design
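
A last-writer-wins rule for multi-colo writes can be sketched as follows. This is an illustration only: the `Version` fields, the use of a millisecond timestamp, and the tie-break on colo name are assumptions, not details given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    timestamp_ms: int  # hypothetical write timestamp
    colo: str          # hypothetical colo name, used only to break ties
    value: str


def resolve(local: Version, remote: Version) -> Version:
    """Keep the later write; break timestamp ties deterministically by colo."""
    return max(local, remote, key=lambda v: (v.timestamp_ms, v.colo))
```

The deterministic tie-break matters: both colos must pick the same winner independently, or the copies diverge.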

Limitations of using MySQL replication
● Poor resource utilization

● Cluster expansion is complex

● Upon master failure, single node gets traffic
● Human intervention to bring up slaves
● A slave-less situation might lead to an outage
(Figures: when the master goes down; when slaves go down)

● Databus operational complexity
● Databus maintenance cost

Espresso: Replication using Kafka
● Per partition replication
● Flexible partition placement
● Every node serves traffic
● Data replicator uses Kafka
New design

Advantages of using Kafka
● Better h/w utilization
● Cluster expansion is as easy as:
■ add node(s) to cluster
■ rebalance
● No human intervention
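
The "add node(s), then rebalance" step can be illustrated with a toy placement function. This is a sketch under stated assumptions: round-robin placement, three replicas per partition, and the node names are all hypothetical, and a production rebalancer would also try to minimize how many partitions move.

```python
from typing import Dict, List


def assign(partitions: int, nodes: List[str], replicas: int = 3) -> Dict[int, List[str]]:
    """Place each partition's replicas on consecutive nodes; first entry is the master."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        p: [nodes[(p + r) % len(nodes)] for r in range(replicas)]
        for p in range(partitions)
    }


def masters_per_node(placement: Dict[int, List[str]]) -> Dict[str, int]:
    """Count how many partitions each node masters."""
    counts: Dict[str, int] = {}
    for owners in placement.values():
        counts[owners[0]] = counts.get(owners[0], 0) + 1
    return counts
```

With 12 partitions on 3 nodes each node masters 4 partitions; after adding a fourth node and reassigning, each masters 3, so every node keeps serving traffic and the load spreads automatically.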

Advantages of using Kafka (continued)
● Node failure
■ parallel mastership handoff
■ parallel restore of slaves
● Databus complexity eliminated
● Huge cost savings
● Single platform for
■ internal replication
■ cross colo replication

Kafka based replication
● Delivery must be
■ Guaranteed
■ In order
■ Exactly once

GTIDs and SCNs
● Global transaction identifier
● Unique
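
One way a unique, ordered sequence number supports in-order, exactly-once application on the consuming side is sketched below. It assumes, purely for illustration, that each partition's SCN is a dense integer that increases by one per transaction; the class and method names are hypothetical.

```python
class PartitionApplier:
    """Applies replicated changes for one partition, at most once and in order."""

    def __init__(self) -> None:
        self.last_scn = 0   # highest SCN applied so far (assumed to start at 0)
        self.rows = {}      # stand-in for the local copy of the partition

    def apply(self, scn: int, key: str, value: str) -> bool:
        if scn <= self.last_scn:
            return False    # duplicate delivery: already applied, skip it
        if scn != self.last_scn + 1:
            # A gap means a message was lost or reordered; refuse to continue.
            raise RuntimeError(f"expected SCN {self.last_scn + 1}, got {scn}")
        self.rows[key] = value
        self.last_scn = scn
        return True
```

Skipping already-seen SCNs turns at-least-once delivery into exactly-once application, and the gap check enforces ordering.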

Espresso Kafka Producer
● Part of the storage node
● Uses Open Replicator
● Single-threaded

Message protocol - Mastership Handoff

Producer configuration
● acks = all
● Infinite retries
● block.on.buffer.full = true
● max.in.flight.requests.per.connection = 1
● linger.ms = 0
● On a non-retryable exception:
■ destroy producer
■ create new producer
■ resume from last checkpoint
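
The settings and the recovery steps above can be sketched without a Kafka client. In this illustration `make_producer`, `NonRetryableError`, and the list-index checkpoint are hypothetical stand-ins; only the configuration keys are real Kafka producer property names, and "infinite retries" is approximated by the largest 32-bit integer.

```python
PRODUCER_CONFIG = {
    "acks": "all",                                 # wait for all in-sync replicas
    "retries": 2**31 - 1,                          # stands in for "infinite retries"
    "block.on.buffer.full": True,                  # as on the slide; later clients deprecate it in favor of max.block.ms
    "max.in.flight.requests.per.connection": 1,    # a retry cannot overtake a later batch
    "linger.ms": 0,                                # the slide's linger setting: send immediately
}


class NonRetryableError(Exception):
    """Stand-in for a fatal producer error (hypothetical)."""


def replicate(events, make_producer, checkpoint=0):
    """Send events in order; on a fatal error, rebuild the producer and resume."""
    producer = make_producer()
    position = checkpoint
    while position < len(events):
        try:
            producer.send(events[position])
            position += 1          # advance the checkpoint only after success
        except NonRetryableError:
            producer.close()       # destroy the broken producer
            producer = make_producer()
            # The loop retries from the last checkpoint with the new producer.
    return position
```

Resuming from the checkpoint can resend an event that had in fact reached the broker, which is why the consuming side must be able to discard duplicates.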

Kafka broker config and spec
● Kafka broker config
■ Replication factor = 3
■ min.insync.replicas = 2
■ Unclean leader election disabled
● Kafka broker node spec
■ 256 GB RAM
■ 8 cores, Intel @ 2.00 GHz
■ 19 TB HDD with RAID
■ OS: RHEL 6
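
A rough way to read the broker settings: with a replication factor of 3 and a minimum of 2 in-sync replicas, a producer using acks = all can keep writing with one replica down, but not with two. The sketch below simplifies by treating every failed replica as out of sync; the function name is hypothetical.

```python
REPLICATION_FACTOR = 3
MIN_INSYNC_REPLICAS = 2


def can_accept_write(failed_replicas: int) -> bool:
    """With acks=all, a write succeeds only if enough replicas remain in sync."""
    in_sync = REPLICATION_FACTOR - failed_replicas
    return in_sync >= MIN_INSYNC_REPLICAS
```

Disabling unclean leader election complements this: an out-of-sync replica is never promoted to leader, so acknowledged writes are not silently dropped.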

Kafka replication stats
● Kafka cluster in each colo
● No. of Kafka brokers: 336
● Peak 500 MB per sec, 36 TB per day
● Peak 1.5M messages per sec, 34 billion per day
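
For context, the daily totals can be turned into averages to compare against the quoted peaks. This is back-of-the-envelope arithmetic that assumes decimal units (1 TB = 10^12 bytes) and an even spread over 24 hours; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = 36 * 10**12       # 36 TB per day, decimal units assumed
messages_per_day = 34 * 10**9     # 34 billion messages per day

avg_mb_per_sec = bytes_per_day / SECONDS_PER_DAY / 10**6    # about 417 MB/s
avg_msgs_per_sec = messages_per_day / SECONDS_PER_DAY       # about 394k messages/s
avg_message_bytes = bytes_per_day / messages_per_day        # about 1 KB per message
```

Under these assumptions the average byte rate is fairly close to the 500 MB per sec peak, while the average message rate is roughly a quarter of the 1.5M per sec peak.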

Conclusion
● Kafka is used for database replication at scale
● LinkedIn leveraged Kafka to scale Espresso
● Kafka helped unify data pipelines
● Saved $$$

References
1. https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store
2. https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin
3. https://www.slideshare.net/ConfluentInc/espresso-database-replication-with-kafka-tom-quiggle
4. https://www.slideshare.net/JiangjieQin/no-data-loss-pipeline-with-apache-kafka-49753844
