The document discusses building an event-driven architecture using Apache Kafka and Kafka Connect. It describes how VeXeRe uses this approach to stream data from their MS SQL database into Kafka. Key points covered include event sourcing, how Kafka Connect works using connectors and tasks, best practices for monitoring connectors, and handling database schema evolution. Real-world use cases at VeXeRe like syncing data to data warehouses and search indexes are also examined.
Grokking Techtalk #39: How to build an event driven architecture with Kafka & Kafka Connect
1. How to build an event driven architecture
with Kafka & Kafka Connect
Nov 12, 2020
Lợi Nguyễn - Technical Architect @ VeXeRe
1
2. Vietnam’s largest online bus booking systemvexere.com
Name: Nguyễn Văn Lợi
Company:
● Vexere - #1 Saas based bus ticket platform in Vietnam
● Chotot - #1 Classified Marketplace in Vietnam
● Blue Orchid - A start-up founded by ex-Grab CTO
● Softfoundry - VoIP product
2
3. Vietnam’s largest online bus booking systemvexere.com
VeXeRe.com is a Vietnamese online bus ticket booking system that operates through
many transportation companies.
3
13. Vietnam’s largest online bus booking systemvexere.com
● Two representation of the world:
○ Application State: the current representation of the world, and
○ log of all the events: that changed that world
● The test definition of Event Sourcing:
○ at any time we can blow away the application state and confidently rebuild it from
the log.
● Benefit:
○ Audits
○ Debugging
13
14. Vietnam’s largest online bus booking systemvexere.com
● What is “Event Driven” architecture?
○ Event-carried State Transfer
○ Event Sourcing
14
15. Vietnam’s largest online bus booking systemvexere.com
● Event Sourcing in real world
○ What is 2 phase write?
○ MSSQL / transaction log
○ Postgresql / WAL
15
20. Vietnam’s largest online bus booking systemvexere.com
● Event Sourcing in real world
○ What is 2 phase write?
○ MSSQL / transaction log
○ Postgresql / WAL
20
21. Vietnam’s largest online bus booking systemvexere.com
● What is Kafka & Kafka Connect?
○ Connector/Task/Worker
○ Transform
○ How we use Kafka and Kafka Connect @ vexere
○ Pros/Cons of Kafka Connect vs Custom Producer
21
22. Vietnam’s largest online bus booking systemvexere.com
● topic
● producer
● consumer
● broker
● partition
● consumer group
22
29. Vietnam’s largest online bus booking systemvexere.com
Kafka Connect is a framework to stream data into and out of Apache Kafka
● Connectors – the high level abstraction that coordinates data streaming by managing tasks
● Tasks – the implementation of how data is copied to or from Kafka
● Workers – the running processes that execute connectors and tasks
● Converters – the code used to translate data between Connect and the system sending or receiving data
● Transforms – simple logic to alter each message produced by or sent to a connector
● Dead Letter Queue – how Connect handles connector errors
29
37. Vietnam’s largest online bus booking systemvexere.com
Pros Cons
● Many Connectors (source/sink)
● No coding required
● Simple transform only
● Hard to customize or write your own
connector
37
38. Vietnam’s largest online bus booking systemvexere.com
● What is Kafka & Kafka Connect?
○ Connector/Task/Worker
■ MSSQL Source Connector
○ Transform
○ How we use Kafka and Kafka Connect @ vexere
○ Pros/Cons of Kafka Connect vs Custom Producer
38
41. Vietnam’s largest online bus booking systemvexere.com
In Kafka Connect, task is being killed and will not recover until manually
restarted
Solution:
● Cronjob to monitor task status, then restart task by calling restful to
task api
● Dead Letter Queue to handle error in:
○ Convert
○ Transform
41
42. Vietnam’s largest online bus booking systemvexere.com 42
Reference: https://debezium.io/documentation/reference/connectors/sqlserver.html#sqlserver-schema-evolution
44. Vietnam’s largest online bus booking systemvexere.com 44
● NVARCHAR(max) is not supported in CDC table (cannot record before value,
only have after update value)
48. Vietnam’s largest online bus booking systemvexere.com
Separate read & write model
Write model: MSSQL
Read model: Elasticsearch
MSSQL ⇒ Kafka ⇒ Kafka consumer ⇒ Elasticsearch
48
49. Vietnam’s largest online bus booking systemvexere.com
Example: Real Time sync data from MSSQL ⇒ Stagging Postgres ⇒ Bigquery
Note:
● be careful when backfill data
● If we new column, we have to trigger dummy update to trigger all record event
=> a lot of trash in transaction log ==> need to write your own job
49