coolmeen, grinfeld
What is Kafka?
Async communication messaging bus
Near-real-time data extraction and transformation (ETL) platform
Near-real-time, multi-source event aggregation
My Data is in Kafka, now what?
Kafka ecosystem:
1. Kafka simple Producer/Consumer
2. Kafka Streams
3. KSQL
What about data that is not in Kafka?
delete everything and use oracle
hand made by © coolmeen
Kafka Connect to the Rescue
Kafka Connect is a tool for scalably and reliably streaming data
between Apache Kafka and other data systems. It makes it
simple to quickly define connectors that move large data sets
into and out of Kafka.
https://www.confluent.io/hub/
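As a rough illustration of what "defining a connector" looks like, the sketch below builds the JSON body that the Kafka Connect REST API accepts on POST /connectors. It uses the FileStreamSource example connector from the Apache Kafka quickstart. The connector name, file path, topic, and the localhost:8083 worker address are assumptions chosen for illustration and are not part of our system.

```python
import json
import urllib.request


def build_connector_request(name, config):
    """Return the JSON document expected by POST /connectors."""
    return {"name": name, "config": dict(config)}


# Hypothetical connector: stream lines of a local file into a Kafka topic.
file_source = build_connector_request(
    "demo-file-source",
    {
        "connector.class": "FileStreamSource",
        "tasks.max": "1",
        "file": "/tmp/demo.txt",
        "topic": "connect-demo",
    },
)


def register_connector(body, connect_url="http://localhost:8083"):
    """Send the definition to a Connect worker (assumed to be running)."""
    request = urllib.request.Request(
        connect_url + "/connectors",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling register_connector(file_source) would only work against a live Connect worker. A real source such as an Oracle CDC connector takes the same name/config shape with its own properties.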
Let’s look at an example from our system
What do we have now?
[Diagram. Components: Keeper, Router, Kafka, WAN, Web, DB, Users]
Getting user data from Charlie
What problems do we have?
1. Synchronous request from Keeper to Charlie over the WAN
2. Synchronous request between the Web and DB services
3. Synchronous request between the DB service and the Oracle DB
4. ...and the response travels all the way back
LATENCY
1. Multiple requests from Keeper can slow down (or kill) the Charlie DB (Oracle)
2. Slowness in the Charlie DB can cause slowness or failures in Keeper, which triggers retries with Quartz
that go to Charlie's Oracle again, and so on
MUTUAL DEPENDENCY
Let’s try to solve this
[Diagram. Components: Keeper, Router, Kafka, WAN, Web, DB, Users, Cache with TTL]
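A minimal sketch of the "cache with TTL" idea, assuming an in-memory dictionary and a caller-supplied loader function that stands in for the synchronous request to Charlie. The class and parameter names are hypothetical, and the clock is injectable only so that the expiry can be demonstrated without waiting.

```python
import time


class TtlCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or expiry."""
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh: no request leaves the process
        value = loader(key)  # stands in for the remote call to Charlie
        self._entries[key] = (value, now + self._ttl)
        return value
```

Every miss or expired entry still pays the full synchronous round trip, so this design reduces the load on Charlie but does not remove the dependency on it.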
Let’s try to solve this
[Diagram. Components: Keeper, Router, Kafka, Cache, Scheduler, WAN, Web, DB, Users]
What did we solve?
Mutual Dependency - now only the “Scheduler” depends on Charlie, and it runs once every X amount of time. The
interval can be adjusted to reduce the load on Charlie.
What problems do we still have, and which new ones might we have added?
Staleness - data can now be outdated for some (long) period of time, even when everything is
working well. We can shorten that period by making the “Scheduler” run more frequently,
but that takes us back to the “Mutual Dependency” problem.
Mutual Dependency - every time the scheduler calls Charlie, we have to fetch all users and their plans from the DB,
which can take much longer, and we need to manage retries and so on.
We still depend on our Web Services.
Latency - the router now sends its requests inside Tlx without interacting directly with Charlie (Savis)
Solved Partially
Solved
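The trade-off between staleness and load on Charlie can be made concrete with a little arithmetic. This is a rough sketch under stated assumptions: the scheduler runs at a fixed interval, a refresh takes a fixed time, and a change made just after a refresh has read the data is not seen until the next refresh finishes. The function names and the sample numbers are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def worst_case_staleness(interval_s, refresh_duration_s):
    """Longest time a change in Charlie can stay invisible in the cache."""
    return interval_s + refresh_duration_s


def refreshes_per_day(interval_s):
    """How many full reloads per day the scheduler asks Charlie for."""
    return SECONDS_PER_DAY / interval_s


# Halving the interval roughly halves staleness but doubles the full reloads.
for interval in (600, 300):
    print(interval, worst_case_staleness(interval, 20), refreshes_per_day(interval))
```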
Let’s try to solve this
[Diagram. Components: Keeper, Router, Kafka, Cache, WAN, Users DB, ForKeeper, Cache Updater; steps 1 and 2]
Let’s try to solve this
[Diagram. Components: Keeper, Router, Kafka, Cache, WAN, Users DB, ForKeeper, Cache Updater, Cache Updater Controller; steps 1 and 2]
Let’s try to solve this
[Diagram. Components: Keeper, Router, Kafka, Cache, WAN, Users DB, ForKeeper, Cache Updater, Cache Updater Controller; steps 1, 2 and 3]
What did we solve?
Mutual Dependency - now only the “Scheduler” depends on Charlie, and it runs once every X amount of time. The
interval can be adjusted to reduce its impact on Charlie.
What problems do we still have, and which new ones might we have added?
Staleness - data can now be outdated for some (long) period of time, even when everything is
working well. We can shorten that period by making the “Scheduler” run more frequently,
but that takes us back to the “Mutual Dependency” problem.
Inverse Dependency - Charlie now knows about Keeper, even though Keeper depends on Charlie and NOT
the other way around (and that is bad)
Latency - the router now sends its requests inside Tlx without interacting directly with Charlie (Savis)
Solved Partially
Solved
Solved
Data loss if the Charlie services (DB or Web) are down, or added latency in Charlie (if we wait to commit
until the data has been sent to Keeper)
Let’s try to solve this
[Diagram. Components: Keeper, Router, Kafka, Cache, Cache Updater, CDC (Kafka Connect), WAN, Users DB, Oracle append log (single source of truth)]
Short example from Debezium
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "struct",
        "fields": [
          { "type": "int32", "optional": false, "field": "ID" },
          { "type": "string", "optional": false, "field": "FIRST_NAME" },
          { "type": "string", "optional": false, "field": "LAST_NAME" },
          { "type": "string", "optional": true, "field": "COMPANY" }
        ],
        "optional": true,
        "name": "server1.DEBEZIUM.USERS.Value",
        "field": "before"
      },
      {
        "type": "struct",
        "fields": [
          { "type": "int32", "optional": false, "field": "ID" },
          { "type": "string", "optional": false, "field": "FIRST_NAME" },
          { "type": "string", "optional": false, "field": "LAST_NAME" },
          { "type": "string", "optional": true, "field": "COMPANY" }
        ],
        "optional": true,
        "name": "server1.DEBEZIUM.USERS.Value",
        "field": "after"
      },
      {
        "type": "struct",
        "fields": [
          { "type": "string", "optional": true, "field": "version" },
          { "type": "string", "optional": false, "field": "name" },
          { "type": "int64", "optional": true, "field": "ts_ms" },
          { "type": "string", "optional": true, "field": "txId" },
          { "type": "int64", "optional": true, "field": "scn" },
          { "type": "boolean", "optional": true, "field": "snapshot" }
        ],
        "optional": false,
        "name": "io.debezium.connector.oracle.Source",
        "field": "source"
      },
      { "type": "string", "optional": false, "field": "op" },
      { "type": "int64", "optional": true, "field": "ts_ms" }
    ],
    "optional": false,
    "name": "server1.DEBEZIUM.USERS.Envelope"
  },
  "payload": {
    "before": null,
    "after": {
      "ID": 1004,
      "FIRST_NAME": "Ilya",
      "LAST_NAME": "Morgenshtern",
      "COMPANY": "TeleMessage"
    },
    "source": {
      "version": "0.9.0.Alpha1",
      "name": "server1",
      "ts_ms": 1520085154000,
      "txId": "6.28.807",
      "scn": 2122185,
      "snapshot": false
    },
    "op": "c",
    "ts_ms": 1532592105975
  }
}
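To show how a Cache Updater might consume such an event, here is a minimal sketch that applies the payload part of a Debezium change event to an in-memory dictionary. It relies on Debezium's "op" codes ("c" create, "u" update, "d" delete, "r" snapshot read). Keying the cache by the ID column and the function name are assumptions made for illustration, and the event is trimmed to the fields the sketch needs.

```python
import json


def apply_change_event(cache, event):
    """Apply one Debezium-style change event to a dict keyed by row ID."""
    payload = event["payload"]
    op = payload["op"]
    if op in ("c", "u", "r"):  # create, update, snapshot read
        row = payload["after"]
        cache[row["ID"]] = row
    elif op == "d":  # delete: only "before" carries the row
        cache.pop(payload["before"]["ID"], None)
    return cache


# Trimmed copy of the sample event (payload only), as it would arrive from Kafka.
raw_event = json.dumps({
    "payload": {
        "before": None,
        "after": {
            "ID": 1004,
            "FIRST_NAME": "Ilya",
            "LAST_NAME": "Morgenshtern",
            "COMPANY": "TeleMessage",
        },
        "op": "c",
        "ts_ms": 1532592105975,
    }
})

users_cache = apply_change_event({}, json.loads(raw_event))
```

Because every committed change arrives as its own event, the cache follows the database without polling it and without Charlie having to know that Keeper exists.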
What next?
[Diagram. Components: Users, Devices, Products, Cache]
Resources
Apache Kafka use cases from Apache Kafka site
Kafka Connect Confluent Hub
Kafka Connect Confluent Docs
Blog: The Simplest Useful Kafka Connect Data Pipeline In The World
Tutorial for Debezium
Devoxx: Data Streaming for Microservices using Debezium
ETL Is Dead, Long Live Streams: real-time streams w/ Apache Kafka
#ApacheKafkaTLV hosting Gwen Shapira
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 

Way to kafka connect

  • 6. Async communication messaging bus 6 coolmeen, grinfeld
  • 7. Async communication messaging bus
  • 8. Async communication messaging bus
  • 9. Near real time Data Extraction and Transformation (ETL) platform
  • 10. Near real time multi source event aggregation
  • 11. My Data is in Kafka, now what?
  • 12. Kafka eco-system: 1. Kafka Simple Producer/Consumer 2. Kafka Streams 3. KSQL. What about data that is not in Kafka?
  • 13. delete everything and use oracle (hand made by © coolmeen)
  • 14. Kafka Connect to the Rescue. Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other data systems. It makes it simple to quickly define connectors that move large data sets into and out of Kafka. https://www.confluent.io/hub/
  • 15. Let’s look at an example from our system
  • 16. What do we have now? Getting user data from Charlie. [Diagram labels: Keeper, Users, Web, DB, ROUTER, WAN, Kafka]
  • 17. What problems do we have? LATENCY: 1. Synchronous request from Keeper to Charlie over the WAN 2. Synchronous request between Web and DB services 3. Synchronous request between DB service and Oracle DB 4. and the response back. MUTUAL DEPENDENCY: 1. Multiple requests from Keeper can slow (or kill) Charlie DB (Oracle) 2. Slowness in Charlie DB can cause slowness or failures in Keeper, which triggers retries with Quartz that go to Charlie's Oracle again, and so on.
  • 18. Let’s try to solve this. [Diagram labels: Keeper, Users, Web, DB, ROUTER, WAN, Kafka, Cache with TTL]
  • 19. Let’s try to solve this. [Diagram labels: Keeper, Users, Web, DB, ROUTER, WAN, Kafka, Cache, Scheduler]
  • 20. What did we solve? Mutual Dependency (partially solved): now only the “Scheduler” depends on Charlie, and it runs once per X time. The interval can be adjusted to put less load on Charlie. What problems do we still have, and which new ones may we have added? Staleness: data is now outdated for some (long) period of time, even when everything is working well. We can shorten the stale period by making the “Scheduler” run more frequently, but that takes us back to the “Mutual Dependency” problem. Mutual Dependency: every time the scheduler makes requests to Charlie we have to fetch all users and their plans from the DB, which can take much more time, and we need to manage retries and so on. We also depend on our Web Services. Latency (solved): the router now sends its requests inside Tlx without interacting directly with Charlie (Savis).
  • 21. Let’s try to solve this. [Diagram labels: Keeper, Users, DB, ROUTER, WAN, Kafka, Cache, 1, ForKeeper, 2, Cache Updater]
  • 22. Let’s try to solve this. [Diagram labels: Keeper, Users, DB, ROUTER, WAN, Kafka, Cache, 1, ForKeeper, 2, Cache Updater, Controller, Cache, Cache Updater]
  • 23. Let’s try to solve this. [Diagram labels: Keeper, Users, DB, ROUTER, WAN, Kafka, Cache, 1, ForKeeper, 2, Kafka, 3, Cache Updater, Cache Updater, Controller]
  • 24. What did we solve? Mutual Dependency: now only the “Scheduler” depends on Charlie, and it runs once per X time. The interval can be adjusted to put less load on Charlie. What problems do we still have, and which new ones may we have added? Staleness: data is now outdated for some (long) period of time, even when everything is working well. We can shorten the stale period by making the “Scheduler” run more frequently, but that takes us back to the “Mutual Dependency” problem. Inverse Dependency: Charlie knows about Keeper, even though Keeper depends on Charlie and NOT the other way around (and that’s bad). Latency: the router now sends its requests inside Tlx without interacting directly with Charlie (Savis). Data loss if Charlie services (DB or Web) are down, or added latency on Charlie (if we hold the commit until data is sent to Keeper). Status labels on the slide: Solved, Partially Solved, Solved.
  • 25. Let’s try to solve this. [Diagram labels: Keeper, Users, DB, ROUTER, WAN, Kafka, Cache, CDC (Kafka Connect), Cache Updater, Single source of truth, Oracle append log, DB]
  • 27. Short example from Debezium (part 1 of 4): { "schema": { "type": "struct", "fields": [ { "type": "struct", "fields": [ { "type": "int32", "optional": false, "field": "ID" }, { "type": "string", "optional": false, "field": "FIRST_NAME" }, { "type": "string", "optional": false, "field": "LAST_NAME" }, { "type": "string", "optional": true, "field": "COMPANY" } ], "optional": true, "name": "server1.DEBEZIUM.USERS.Value", "field": "before" },
  • 28. Short example from Debezium (part 2 of 4): { "type": "struct", "fields": [ { "type": "int32", "optional": false, "field": "ID" }, { "type": "string", "optional": false, "field": "FIRST_NAME" }, { "type": "string", "optional": false, "field": "LAST_NAME" }, { "type": "string", "optional": true, "field": "COMPANY" } ], "optional": true, "name": "server1.DEBEZIUM.USERS.Value", "field": "after" },
  • 29. Short example from Debezium (part 3 of 4): { "type": "struct", "fields": [ { "type": "string", "optional": true, "field": "version" }, { "type": "string", "optional": false, "field": "name" }, { "type": "int64", "optional": true, "field": "ts_ms" }, { "type": "string", "optional": true, "field": "txId" }, { "type": "int64", "optional": true, "field": "scn" }, { "type": "boolean", "optional": true, "field": "snapshot" } ], "optional": false, "name": "io.debezium.connector.oracle.Source", "field": "source" },
  • 30. Short example from Debezium (part 4 of 4): { "type": "string", "optional": false, "field": "op" }, { "type": "int64", "optional": true, "field": "ts_ms" } ], "optional": false, "name": "server1.DEBEZIUM.USERS.Envelope" }, "payload": { "before": null, "after": { "ID": 1004, "FIRST_NAME": "Ilya", "LAST_NAME": "Morgenshtern", "COMPANY": "TeleMessage" }, "source": { "version": "0.9.0.Alpha1", "name": "server1", "ts_ms": 1520085154000, "txId": "6.28.807", "scn": 2122185, "snapshot": false }, "op": "c", "ts_ms": 1532592105975 } }
  • 31. What next? [Diagram labels: Users, Devices, Products, Cache]
  • 33. Resources: Apache Kafka use cases from Apache Kafka site; Kafka Connect Confluent Hub; Kafka Connect Confluent Docs; Blog: The Simplest Useful Kafka Connect Data Pipeline In The World; Tutorial for Debezium; Devoxx: Data Streaming for Microservices using Debezium; ETL Is Dead, Long Live Streams: real-time streams w/ Apache Kafka; #ApacheKafkaTLV hosting Gwen Shapira
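
The Debezium example on slides 27-30 can be turned into a cache update along the lines of the “Cache Updater” from slide 25. The following is only a minimal sketch, not the deck's actual implementation: the in-memory `users_cache` dict, the `apply_change` helper, and keying by `ID` are assumptions made for illustration, and a real updater would read events from a Kafka topic rather than from a string. The `op` codes handled here are Debezium's create (`c`), update (`u`), delete (`d`) and snapshot read (`r`).

```python
import json

# Payload copied from the slide-30 example; the "schema" part is omitted for brevity.
EVENT = json.dumps({
    "payload": {
        "before": None,
        "after": {"ID": 1004, "FIRST_NAME": "Ilya",
                  "LAST_NAME": "Morgenshtern", "COMPANY": "TeleMessage"},
        "source": {"version": "0.9.0.Alpha1", "name": "server1",
                   "ts_ms": 1520085154000, "txId": "6.28.807",
                   "scn": 2122185, "snapshot": False},
        "op": "c",
        "ts_ms": 1532592105975,
    }
})


def apply_change(cache: dict, raw_event: str) -> None:
    """Apply one Debezium-style change event to a cache keyed by ID (hypothetical helper)."""
    payload = json.loads(raw_event)["payload"]
    op = payload["op"]
    if op in ("c", "u", "r"):
        # Create, update, or snapshot read: "after" holds the current row.
        row = payload["after"]
        cache[row["ID"]] = row
    elif op == "d":
        # Delete: "after" is null, so the key comes from "before".
        cache.pop(payload["before"]["ID"], None)
    else:
        raise ValueError(f"unexpected op code: {op!r}")


users_cache: dict = {}
apply_change(users_cache, EVENT)
print(users_cache[1004]["FIRST_NAME"])
```

Because deletes arrive as explicit events with the old row in `before`, the cache can drop removed users, which the scheduler-based polling approach could not do.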

Editor's Notes

  1. From sync REST communication to an async messaging bus. Kafka is scalable, fault tolerant, and persistent.
  2. From sync REST communication to an async messaging bus. Kafka is scalable, fault tolerant, and persistent.
  3. From sync REST communication to an async messaging bus. Kafka is scalable, fault tolerant, and persistent.
  4. From batch scheduler jobs to real-time event processing
  5. This is our current system overview. Keeper depends on Charlie. They are located in 2 different zones (cities, states), and traffic between them goes over the WAN (Internet). Actually, we have Web on Tlx, but it calls the DB service on Savis in any case, so the situation is much the same.
  6. Long RTT between two geo zones (i.e., latency). Increased number of possible failures (we have to manage retries, recovery and so on). Tight coupling of 2 systems, which is error prone (they affect each other: pauses, downtime and so on). What is the correct term for “mutual influence”, when 2 systems can hurt each other (mutual dependency or something similar)?
  7. So let’s add a cache with TTL (Redis, for example), and when the data is not in the cache, let’s go to Charlie.
  8. But now we need to keep this cache updated, so we need a service to maintain it: at some pre-defined frequency, it requests users and their plans and updates the cache. We could try to fetch only the changes, by time or some other data, but then we can’t detect deletes.
  9. Let’s assume we solved latency. Mutual influence (dependency) is solved partially, since we can continue to work if Charlie is down, but the data may be out of date (by the way, that seems OK for our use cases). We added staleness, and if we want to reduce it by increasing the polling frequency, the trade-off is going back to “mutual influence”, where both systems affect each other.
  10. Every time we update a user and/or their plan, we can add an async (or sync) action to, for example, the Partner service, and it will send an update directly to Kafka. Let’s say it’s a kind of pub/sub managed by our code. Another problem is our services: they are not persistent, so if DB or Partner is down, we lose data. If DB synchronously sends requests to Keeper before commit, we increase latency on Charlie’s transactions, meaning Keeper affects Charlie. Another problem: Charlie knows Keeper’s inner structure, Kafka dependencies get added to trunk, and so on.
  11. By adding an endpoint on Keeper, we remove Charlie’s knowledge of Keeper’s inner structure, but Charlie still knows about Keeper, even though it shouldn’t.
  12. Still, we need to hold the commit in the DB until we put the data in Kafka (or we’ll lose data when the DB restarts or crashes).
  13. We flipped the dependency between the 2 systems. Actually, Keeper depends on Charlie, but the implementation goes the opposite way. We prefer not to do this, since it means that Charlie knows Keeper (even if we reduced coupling by introducing a, hopefully, well-defined endpoint to get data into Keeper). In any case, and for any solution, we need to add some integration tests to verify that changes in Charlie don’t affect Keeper.
  14. CDC: change data capture (e.g. with Kafka Connect). We currently use CDC to replicate the main Oracle DB to our DR with the Oracle GoldenGate replication solution. Data log: it was first used for recovering from a crash without losing data. Now, let’s look at our system more precisely. Actually, we have 3 different systems: the 1st is Charlie, the 2nd is Keeper, and the 3rd (actually the 1st one) is the Oracle DB. The Oracle DB is our “single source of truth”. It knows nothing about other system elements and depends on nothing. Other elements of TeleMessage depend on it. For example, Oracle doesn’t know what Charlie does, and Charlie accesses Oracle to get data from it. Keeper, actually, doesn’t depend on Charlie; it depends on the Oracle DB, so neither Charlie nor Keeper should know about or interact with the other.
  15. Despite being part of the Keeper system, Kafka Connect is an “invisible” element, so without extra attention we could “hurt” Keeper. A lot of people like to call this “transparency” (Ilya calls it “magic”). When your system has “transparency”, it’s very easy to forget about this element, and it can be difficult to maintain. This situation is bug-prone, so it should be well documented, and we should find a way to test it to avoid bugs.
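
The cache with TTL from note 7 could look roughly like the sketch below. It is an in-process stand-in, not the Redis setup the note mentions: the `TtlCache` class, the `fake_charlie` loader and the injectable clock are all hypothetical names introduced here so the behaviour can be shown without a network call.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Read-through cache: serve fresh entries, otherwise ask the loader (i.e. Charlie)."""

    def __init__(self, ttl_seconds: float,
                 loader: Callable[[int], Optional[dict]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._loader = loader
        self._clock = clock
        # user_id -> (expiry time, cached value)
        self._entries: Dict[int, Tuple[float, Optional[dict]]] = {}

    def get(self, user_id: int) -> Optional[dict]:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]                      # cache hit, still fresh
        value = self._loader(user_id)            # miss or expired: go to Charlie
        self._entries[user_id] = (now + self._ttl, value)
        return value


calls = []

def fake_charlie(user_id: int) -> dict:
    """Stand-in for the synchronous request to Charlie over the WAN."""
    calls.append(user_id)
    return {"ID": user_id, "PLAN": "basic"}

now = [0.0]                                      # fake clock, advanced by hand
cache = TtlCache(60.0, fake_charlie, clock=lambda: now[0])
cache.get(1)
cache.get(1)                                     # served from the cache
now[0] = 61.0
cache.get(1)                                     # entry expired, Charlie is called again
print(len(calls))
```

Every expiry still produces a synchronous call to Charlie, which is why this step alone does not remove the dependency described in note 6.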
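
Note 8 says that pulling only the changes cannot detect deletes. A small sketch of why, assuming a hypothetical `UPDATED_AT` column and hypothetical `poll_changes` and `refresh` helpers (none of these appear in the deck): a deleted row simply stops showing up in the query, so nothing tells the cache to remove it.

```python
from typing import Dict, List


def poll_changes(source_rows: List[dict], since: int) -> List[dict]:
    """Return rows modified after `since`, i.e. what a 'changes only' query can see."""
    return [row for row in source_rows if row["UPDATED_AT"] > since]


def refresh(cache: Dict[int, dict], source_rows: List[dict], since: int) -> None:
    """Upsert every changed row into the cache; there is no signal for removed rows."""
    for row in poll_changes(source_rows, since):
        cache[row["ID"]] = row


source = [{"ID": 1, "UPDATED_AT": 10}, {"ID": 2, "UPDATED_AT": 10}]
cache: Dict[int, dict] = {}
refresh(cache, source, since=0)

# At the source, row 2 is deleted and row 1 is updated.
source = [{"ID": 1, "UPDATED_AT": 20}]
refresh(cache, source, since=10)

print(sorted(cache))  # row 2 is still cached although it no longer exists
```

A log-based CDC feed avoids this because a delete is itself an event in the log.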