14. Kafka Connect to the Rescue
Kafka Connect is a tool for scalably and reliably streaming data
between Apache Kafka and other data systems. It makes it
simple to quickly define connectors that move large data sets
into and out of Kafka.
https://www.confluent.io/hub/
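As a rough illustration of what "simple to define" means here, this is a hedged sketch of registering a JDBC source connector through the Kafka Connect REST API from Java; the Connect host, DB connection, table names, and topic prefix are all made-up placeholders:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical example: registering a JDBC source connector via the
// Kafka Connect REST API. All hosts, tables and names are placeholders.
public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        String config = """
            {
              "name": "users-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:oracle:thin:@db-host:1521/ORCL",
                "mode": "timestamp",
                "timestamp.column.name": "UPDATED_AT",
                "table.whitelist": "USERS,PLANS",
                "topic.prefix": "charlie-"
              }
            }""";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://connect-host:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}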
15. Let’s look at an example from our system
16. What do we have now?
[Diagram: the current system. Labeled components: Keeper, Users, Web, DB, Router, Kafka; the two sites communicate over the WAN. Caption: getting user data from Charlie.]
17. What problems do we have?
LATENCY
1. A synchronous request from Keeper to Charlie over the WAN
2. A synchronous request between the Web and DB services
3. A synchronous request between the DB service and the Oracle DB
4. ...and the response all the way back
MUTUAL DEPENDENCY
1. Multiple requests from Keeper can slow down (or kill) the Charlie DB (Oracle)
2. Slowness in the Charlie DB can cause slowness or failures in Keeper; that triggers Quartz retries, which go back to the Charlie Oracle DB, and so on
18. Let’s try to solve this
[Diagram: the same system with a cache with TTL added. Labeled components: Keeper, Users, Web, DB, Router, WAN, Kafka, cache with TTL.]
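A minimal sketch of the read-through "cache with TTL" idea (the speaker notes suggest Redis; a ConcurrentHashMap stands in here, and the loader callback that falls back to Charlie is a hypothetical placeholder):

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Read-through cache with TTL: serve from the cache while the entry is
// fresh; on a miss (or expiry) go to Charlie and repopulate the entry.
public class TtlCache<K, V> {
    private record Entry<V>(V value, Instant expiresAt) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final Function<K, V> loader; // e.g. a synchronous call to Charlie

    public TtlCache(Duration ttl, Function<K, V> loader) {
        this.ttl = ttl;
        this.loader = loader;
    }

    public V get(K key) {
        Entry<V> e = entries.get(key);
        if (e != null && Instant.now().isBefore(e.expiresAt())) {
            return e.value(); // fresh enough: no WAN round trip
        }
        V value = loader.apply(key); // miss or expired: ask Charlie
        entries.put(key, new Entry<>(value, Instant.now().plus(ttl)));
        return value;
    }
}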
19. Let’s try to solve this
[Diagram: the same system with a Scheduler added to maintain the cache. Labeled components: Keeper, Users, Web, DB, Router, WAN, Kafka, cache, Scheduler.]
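A sketch of what the "Scheduler" box could look like, assuming a plain ScheduledExecutorService and hypothetical CharlieClient / UserCache interfaces (the system described in the talk uses Quartz, which would work the same way):

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Periodically pull users and their plans from Charlie and rewrite the
// cache, so reads in Keeper never cross the WAN.
public class CacheRefreshScheduler {
    interface CharlieClient { List<String> fetchUsersWithPlans(); }
    interface UserCache { void replaceAll(List<String> users); }

    public static void start(CharlieClient charlie, UserCache cache, long periodMinutes) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> {
            try {
                cache.replaceAll(charlie.fetchUsersWithPlans());
            } catch (Exception e) {
                // Charlie may be down: keep serving the (possibly stale) cache
                System.err.println("refresh failed, keeping stale cache: " + e);
            }
        }, 0, periodMinutes, TimeUnit.MINUTES);
    }
}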
20. What did we solve?
Mutual Dependency (solved partially) - now only the “Scheduler” depends on Charlie, and it runs once per X time; its frequency can be tuned to reduce the impact on Charlie.
Latency (solved) - the Router now sends its requests inside Tlx without interacting directly with Charlie (Savis).
What problems do we still have, and which new ones have we added?
Staleness - data can now be outdated for some (long) period of time (even when everything is running well). We can shorten the stale window by making the “Scheduler” run more frequently, but that takes us back to the “Mutual Dependency” problem.
Mutual Dependency - we now have to take all users and their plans from the DB every time the Scheduler makes its requests to Charlie, which can take much more time, and we need to manage retries and so on. We also depend on our own Web services.
21. Let’s try to solve this
[Diagram: next attempt. Labeled components: Keeper, Users, DB, Router, WAN, Kafka, cache, a “ForKeeper” step (flow 1 → 2), Cache Updater.]
22. Let’s try to solve this
[Diagram: as before, plus a Cache Updater Controller in front of the cache and Cache Updater. Labeled components: Keeper, Users, DB, Router, WAN, Kafka, cache, “ForKeeper” (flow 1 → 2), Cache Updater Controller, Cache Updater.]
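One reading of the “Cache Updater Controller” box (see the Editor’s Notes) is a single well-defined endpoint on Keeper that Charlie posts updates to, hiding Keeper’s internals behind it. A minimal sketch using the JDK’s built-in HTTP server; the path and port are assumptions:

import java.io.IOException;
import java.net.InetSocketAddress;
import com.sun.net.httpserver.HttpServer;

// A single endpoint Charlie can call; everything behind it (cache,
// Kafka) stays Keeper's private business.
public class CacheUpdaterController {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/users/updates", exchange -> {
            byte[] body = exchange.getRequestBody().readAllBytes();
            // Hand the update to the internal cache updater / Kafka producer.
            System.out.println("received update: " + new String(body));
            exchange.sendResponseHeaders(204, -1); // 204 No Content
            exchange.close();
        });
        server.start();
    }
}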
23. Let’s try to solve this
[Diagram: as before, with a Kafka topic added in the flow between Charlie and Keeper (steps 1, 2, 3). Labeled components: Keeper, Users, DB, Router, WAN, Kafka, cache, “ForKeeper”, Kafka, Cache Updater, Cache Updater Controller.]
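Step 3 amounts to publishing an event to a Kafka topic on every user/plan change, with the Cache Updater consuming it. A hedged producer-side sketch; the broker address, topic name, and payload shape are assumptions:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Publish a user-changed event; keying by user id keeps all updates for
// one user in a single partition, preserving their order for consumers.
public class UserUpdatePublisher {
    private final KafkaProducer<String, String> producer;

    public UserUpdatePublisher(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    public void publishUserChanged(String userId, String payloadJson) {
        producer.send(new ProducerRecord<>("user-updates", userId, payloadJson));
    }
}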
24. What did we solve?
Mutual Dependency (solved partially) - now only the “Scheduler” depends on Charlie, and it runs once per X time; its frequency can be tuned to reduce the impact on Charlie.
Staleness (solved) - data could be outdated for some (long) period of time (even when everything was running well), and we could only shorten the stale window by running the “Scheduler” more frequently, which took us back to the “Mutual Dependency” problem.
Latency (solved) - the Router now sends its requests inside Tlx without interacting directly with Charlie (Savis).
What problems do we still have, and which new ones have we added?
Inverse Dependency - Charlie knows about Keeper, despite the fact that Keeper depends on Charlie and NOT the other way around (and that is bad).
Data loss - if the Charlie services (DB or Web) are down we lose data, or we add latency to Charlie (if we wait for the commit until the data has been sent to Keeper).
25. Let’s try to solve this
[Diagram: the final design. The Oracle DB, with its append log, is the single source of truth; CDC (Kafka Connect) streams its changes over the WAN into Kafka, and the Cache Updater maintains Keeper’s cache. Labeled components: Keeper, Users, DB, Router, WAN, Kafka, cache, CDC (Kafka Connect), Cache Updater, Oracle append log.]
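On the Keeper side, the Cache Updater then just consumes the change events that CDC writes from the Oracle append log. A sketch assuming string-serialized events and an illustrative per-table topic name; note that a null value (a tombstone) signals a delete, which the polling scheduler could never detect:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

// Consume CDC change events and apply them to Keeper's cache.
public class CdcCacheUpdater {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("group.id", "cache-updater");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("charlie.USERS")); // one CDC topic per table
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    if (record.value() == null) {
                        System.out.println("delete user " + record.key()); // tombstone
                    } else {
                        System.out.println("upsert user " + record.key());
                    }
                }
            }
        }
    }
}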
33. Resources
Apache Kafka use cases from Apache Kafka site
Kafka Connect Confluent Hub
Kafka Connect Confluent Docs
Blog: The Simplest Useful Kafka Connect Data Pipeline In The World
Tutorial for Debezium
Devoxx: Data Streaming for Microservices using Debezium
ETL Is Dead, Long Live Streams: real-time streams w/ Apache Kafka
#ApacheKafkaTLV hosting Gwen Shapira
Editor's Notes
From sync REST communication to an async messaging bus: Kafka is scalable, fault tolerant, and persistent.
From batched scheduler jobs to real-time event processing.
This is our current system overview. We have Keeper, which depends on Charlie. They are located in 2 different zones (cities, states), and networking between them goes through the WAN (Internet).
Actually, we have Web on Tlx, but it calls the DB service on Savis in any case, so it’s much the same situation.
Long RTT between two geo zones (aka latency).
An increased number of possible failures (we have to manage retries, recovery and so on).
Tight coupling between the 2 systems, which is error prone (they affect each other: pauses, downtime and so on).
What is the correct term for “mutual influence”, where 2 systems can hurt each other (mutual dependency or something similar)?
So let’s add a cache.
So let’s add a cache with TTL (Redis, for example); when there is no data in the cache, we go to Charlie.
But now we need to update this cache, so we need a service to maintain it: at some pre-defined frequency, it requests users and their plans and updates the cache.
We could try to fetch only the changes (by timestamp or some marker data), but we can’t detect deletes that way.
Let’s assume we solved latency
Mutual influence (dependency): solved partially, since we can continue to work if Charlie is down, but the data may not be updated (by the way, that seems to be OK for our use cases).
We added staleness, and if we want to reduce the staleness effect by increasing the polling frequency, we trade that off against going back to “mutual influence”, where both systems affect each other.
Every time we update a user and/or their plan, we can add an async (or sync) action to, for example, the Partner service, which sends an update directly to Kafka.
Let’s say it’s a kind of pub/sub managed by our own code.
Another problem is our services: they are not persistent, so if DB or Partner is down, we lose data.
If the DB service synchronously sends requests to Keeper before committing, we increase the latency of Charlie’s transactions, meaning Keeper affects Charlie.
Another problem: Charlie knows Keeper’s inner structure, adds Kafka dependencies to its trunk, and so on.
By adding an endpoint on Keeper, we remove Charlie’s knowledge of Keeper’s inner structure, but Charlie still knows about Keeper, despite the fact that it shouldn’t.
Still, we need to wait for the commit in the DB before we put data into Kafka (or we’ll lose data when the DB is restarted/crashes).
We flipped the dependency between the 2 systems. Actually, Keeper depends on Charlie, but the implementation goes the opposite way. We prefer not to do this, since it means that Charlie knows Keeper (even if we reduced the coupling by introducing a, hopefully, well-defined endpoint for getting data into Keeper).
We need to add (in any case, and for any solution) some integration tests to verify that changes in Charlie don’t affect Keeper.
CDC: change data capture (e.g. Kafka Connect). (We currently use CDC for replicating the main Oracle DB to our DR site, using the Oracle GoldenGate replication solution.)
The data log was first used for recovering from a crash without losing data.
Now, let’s look into our system more precisely. Actually, we have 3 different systems: the 1st is Charlie, the 2nd is Keeper, and the 3rd (actually the 1st one) is the Oracle DB.
The Oracle DB is our “single source of truth”. It knows nothing about the other system elements and depends on nothing, while the other elements of TeleMessage depend on it. For example, Oracle doesn’t know what Charlie does, and Charlie accesses Oracle to get data from it.
Keeper doesn’t actually depend on Charlie; it depends on the Oracle DB, so neither Charlie nor Keeper should know about the other or interact with it directly.
Despite being part of the Keeper system, Kafka Connect is an “invisible” element, so without extra attention we could “hurt” Keeper. A lot of people like to call this “transparency” (Ilya calls it “magic”). When your system has “transparency”, it’s very easy to forget about this element, and it can become difficult to maintain. This situation is bug-prone, so it should be documented well, and we should find a way to test it to avoid bugs.