Distributed caching has become a popular technique for improving performance and simplifying the data access layer when working with databases. Bringing data as close as possible to the CPU enables very fast execution as well as horizontal scalability. This approach works well in microservices designs where the cache is accessed by only a single API. However, it becomes more challenging when multiple applications are involved and other applications change the database directly. The data held in the cache eventually becomes stale and inconsistent with the underlying database. When consistency problems arise, the engineering team must address them with additional coding, which jeopardizes the team's ability to stay agile between releases. This talk presents a set of patterns for cache-based architectures that keep the caches always hot, using Apache Kafka and its connectors to accomplish that goal. It shows how to set up these patterns across different IMDGs such as Hazelcast, Apache Ignite, and Coherence. The patterns can be used in conjunction with different cache topologies such as cache-aside, read-through, write-behind, and refresh-ahead, making them reusable enough to serve as a framework for achieving data consistency in any architecture that relies on distributed caches.
Keeping Your Data Close and Your Caches Hotter (Ricardo Ferreira, Confluent) Kafka Summit NYC 2019
1. Keep your Data Close and your
Caches Hotter using Apache
Kafka, Connect and KSQL
Ricardo Ferreira, Developer Advocate
@riferrei #KafkaSummit
2. About Me:
● Hi, my name is Ricardo Ferreira
● Developer Advocate @ Confluent
● Currently into Cloud & DevOps
● Ex-Oracle, Red Hat, IONA Tech
● https://riferrei.net
3. Data is only useful
if it is Fresh and
Contextual
4. There are three parts
in an airbag system:
● The bag itself.
● The sensors, which tell the bag to
inflate when a collision is likely,
based on speed.
● The inflation system, which
combines two compounds [Sodium
Azide (NaN3) and Potassium
Nitrate (KNO3)] to produce
nitrogen gas and inflate the bag.
What if the airbag
deploys 30 seconds
after the collision?
5. December 6th, 2010:
Commuter rail train
hits elderly driver
● A 70-year-old woman hears on the
news that there will be no commuter
rail train that day.
● She tries to beat the train as it
speeds through Groove Street,
but there is not enough time to
brake.
● Luckily, she survives.
What if the information
about the commuter rail
train is outdated?
7. APIs need to access
data freely and easily
● Data should never be treated as a
scarce resource in applications
● Latency should be kept minimal to
ensure a better user experience
● Data should not be static: keep
it continuously fresh
● Find ways to handle large amounts
of data without breaking the APIs
[Diagram: an API reading from and writing to a cache]
8. Caches can be either
built-in or distributed
● If the data fits in the API's memory,
you should use a built-in cache
● Otherwise, you may need a
distributed cache for larger data sets
● Some cache implementations
provide the best of both worlds
● For distributed caches, make sure
lookups always stay O(1)
[Diagrams: a built-in cache living inside the API process, vs. distributed caches accessed by the API over the network]
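To make the built-in case concrete, here is a minimal sketch of an in-process cache with O(1) reads and writes, built on `java.util.LinkedHashMap` in access order so it evicts the least-recently-used entry. The capacity, key names, and values are arbitrary choices for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal built-in (in-process) LRU cache sketch: O(1) get/put,
// bounded so the working set fits in the API's own heap.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true turns this map into an LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict least-recently-used beyond capacity
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("user:1", "Alice");
        cache.put("user:2", "Bob");
        cache.get("user:1");          // touch user:1, so user:2 is now eldest
        cache.put("user:3", "Carol"); // evicts user:2
        System.out.println(cache.keySet()); // prints [user:1, user:3]
    }
}
```

Libraries such as Caffeine or Guava offer production-grade versions of the same idea, but the point is the same: when the data fits in the process heap, the cache lookup never leaves the JVM.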
10. Let’s Tweet the Song!
1. Access your Twitter account.
2. Use #KafkaSummit in your tweet.
3. Put the name of the song within
brackets.
14. Caching Pattern:
Refresh Ahead
● Proactively updates the cache
● Keeps the entries always in sync
● Ideal for latency-sensitive cases
● Ideal when data reads are costly
● It may need initial data loading
[Diagram: database → Kafka Connect → Kafka → Kafka Connect → cache → API]
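A refresh-ahead pipeline like this can be wired with two connector configs: a source that streams table changes into Kafka, and a sink that writes them into the cache. The configs below are an illustrative sketch, not a tested deployment: the JDBC source class is Confluent's, while the Redis sink class, hostnames, table, and topic names are assumptions — check the documentation of the connectors you actually install.

```properties
# Source side: stream rows whose updated_at column changed into topic "db-products".
name=products-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:postgresql://db:5432/inventory
mode=timestamp
timestamp.column.name=updated_at
table.whitelist=products
topic.prefix=db-
```

```properties
# Sink side (assumption: an open-source Redis sink from Confluent Hub; the
# class and property names vary by connector version).
name=products-cache-sink
connector.class=com.github.jcustenborder.kafka.connect.redis.RedisSinkConnector
topics=db-products
redis.hosts=redis:6379
```

With both connectors running, every committed change in the `products` table lands in the cache shortly after, so the API never reads stale entries and never pays the cost of a cache miss against the database.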
15. Caching Pattern:
Refresh Ahead / Adapt
● Proactively updates the cache
● Keeps the entries always in sync
● Ideal for latency-sensitive cases
● Ideal when data reads are costly
● It may need initial data loading
[Diagram: the same refresh-ahead pipeline, with an application transforming and adapting records before delivery and Schema Registry providing the canonical models]
16. Caching Pattern:
Write Behind
● Removes I/O pressure from the app
● Allows true horizontal scalability
● Ensures ordering and persistence
● Minimizes DB code complexity
● Gracefully handles DB unavailability
[Diagram: the API writes to the cache, and Kafka Connect streams the changes through Kafka down to the database]
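The write-behind mechanics can be sketched in a few lines of plain Java: writes hit the cache synchronously and are flushed to the database asynchronously, in order. This is a self-contained toy, not the talk's actual stack — in the real architecture Kafka plays the role of the changelog queue (durable, ordered, replayable) and a Kafka Connect sink plays the role of the flusher thread.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Write-behind sketch: put() returns as soon as the cache is updated;
// a background thread applies the change to the "database" later.
public class WriteBehindCache {
    private static final String[] POISON = new String[0]; // shutdown marker

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database; // stands in for the real DB
    private final BlockingQueue<String[]> changelog = new LinkedBlockingQueue<>();
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();

    public WriteBehindCache(Map<String, String> database) {
        this.database = database;
        flusher.submit(() -> {
            try {
                while (true) {
                    String[] change = changelog.take(); // preserves write order
                    if (change == POISON) break;
                    this.database.put(change[0], change[1]);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    public void put(String key, String value) {
        cache.put(key, value);                    // fast path for the caller
        changelog.add(new String[] {key, value}); // DB write happens later
    }

    public String get(String key) {
        return cache.get(key);
    }

    public void close() throws InterruptedException {
        changelog.add(POISON); // flush everything queued so far, then stop
        flusher.shutdown();
        flusher.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> db = new ConcurrentHashMap<>();
        WriteBehindCache cache = new WriteBehindCache(db);
        cache.put("user:1", "Alice");
        System.out.println(cache.get("user:1")); // served from the cache
        cache.close();
        System.out.println(db.get("user:1"));    // flushed to the "database"
    }
}
```

Replacing the in-memory queue with a Kafka topic is what buys the properties on the slide: persistence, ordering per key, and tolerance of database downtime, since unflushed changes simply wait in the topic.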
17. Caching Pattern:
Write Behind / Adapt
● Removes I/O pressure from the app
● Allows true horizontal scalability
● Ensures ordering and persistence
● Minimizes DB code complexity
● Gracefully handles DB unavailability
[Diagram: the same write-behind pipeline, with an application transforming and adapting records before delivery and Schema Registry providing the canonical models]
18. Caching Pattern:
Event Federation
● Replicates data across regions
● Keeps multiple regions in sync
● Great for improving RPO and RTO
● Handles slow or laggy networks well
● Works well when used along with the
Read-Through and Write-Through
patterns.
[Diagram: Confluent Replicator (MirrorMaker-style) federating events between regions]
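Confluent Replicator is a commercial component, but the same federation can be expressed with Apache Kafka's own MirrorMaker 2. A minimal configuration sketch follows; the cluster aliases, bootstrap addresses, and topic pattern are made-up examples.

```properties
# connect-mirror-maker.properties
clusters = us-east, us-west
us-east.bootstrap.servers = kafka-east:9092
us-west.bootstrap.servers = kafka-west:9092

# Replicate the cache-feeding topics from us-east to us-west.
us-east->us-west.enabled = true
us-east->us-west.topics = cache-updates.*
```

Started with `bin/connect-mirror-maker.sh connect-mirror-maker.properties`, this keeps the remote region's topics (and therefore its caches, fed by local connectors) in sync, which is what improves RPO and RTO when a region fails.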
20. Kafka Connect support
for In-Memory Caches
● The connector for Redis is open
source and available on Confluent Hub
● The connector for Memcached is open
source and available on Confluent Hub
● Connectors exist for both GridGain and
Apache Ignite
● The connector for Infinispan is open
source and maintained by Red Hat
21. Frameworks for other
In-Memory Caches
● Oracle provides HotCache, built on
GoldenGate, for Oracle Coherence
● Hazelcast has the Jet framework,
which provides support for Kafka
● Pivotal GemFire (Apache Geode)
has good support from Spring
● Good news: you can always write
your own sink using the Connect API
[Diagram: Oracle GoldenGate, Hazelcast Jet, Spring Data/Spring Kafka, or a custom Connect framework, each feeding a cache]
22. Interested in DB CDC?
Then meet Debezium!
● Amazing CDC technology for pulling
data out of databases into Kafka
● Works at the transaction-log level,
which means true CDC for your
projects instead of record polling
● Open source, maintained by Red Hat,
with broad support for many popular
databases
● It is built on top of Kafka Connect
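Because Debezium runs on Kafka Connect, it is deployed like any other connector, by POSTing a JSON config to the Connect REST API. The sketch below follows the property names of the Debezium MySQL connector of that era; all hostnames, credentials, and identifiers are hypothetical placeholders.

```json
{
  "name": "inventory-cdc",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz-secret",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

Once registered (e.g. `POST /connectors` on the Connect worker), every committed change read from the MySQL binlog is published to a per-table topic, ready to feed the refresh-ahead pipelines shown earlier.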
23. Support for Running
Kafka Connect Servers
● Run it yourself on bare metal:
https://kafka.apache.org/downloads
https://www.confluent.io/download
● IaaS on AWS or Google Cloud:
https://github.com/confluentinc/ccloud-tools
● Running with Docker containers:
https://hub.docker.com/r/confluentinc/cp-kafka-connect/
● Running on Kubernetes:
https://github.com/confluentinc/cp-helm-charts
https://www.confluent.io/confluent-operator/