Distributed caching has become a popular technique for improving performance and simplifying the data access layer when working with databases. Bringing data as close as possible to the CPU enables very fast execution as well as horizontal scalability. This approach works well in microservices designs where the cache is accessed by only a single API. However, it becomes more challenging when multiple applications are involved and other applications change the database directly. The data held in the cache eventually becomes stale and inconsistent with the underlying database. When consistency problems arise, the engineering team must address them with additional coding, which jeopardizes the team's ability to stay agile between releases. This talk presents a set of patterns for cache-based architectures that keep the caches always hot, using Apache Kafka and its connectors to accomplish that goal. It shows how to set up these patterns across different IMDGs such as Hazelcast, Apache Ignite, and Coherence. The patterns can be used in conjunction with different cache topologies such as cache-aside, read-through, write-behind, and refresh-ahead, making them reusable enough to serve as a framework for achieving data consistency in any architecture that relies on distributed caches.
Keeping Your Data Close and Your Caches Hotter (Ricardo Ferreira, Confluent) Kafka Summit NYC 2019
1. Keep your Data Close and your
Caches Hotter using Apache
Kafka, Connect and KSQL
Ricardo Ferreira, Developer Advocate
@riferrei #KafkaSummit
2. About Me:
● Hi, my name is Ricardo Ferreira
● Developer Advocate @ Confluent
● Currently into Cloud & DevOps
● Ex-Oracle, Red Hat, IONA Tech
● https://riferrei.net
3. Data is only useful
if it is Fresh and
Contextual
4. There are three parts
in an airbag system:
● The bag itself.
● The sensors, which tell the bag to
inflate when a collision is likely,
based on speed.
● The inflation system, which
combines two compounds [Sodium
Azide (NaN3) and Potassium
Nitrate (KNO3)] to produce
nitrogen gas and inflate the bag.
What if the airbag
deploys 30 seconds
after the collision?
5. December 6th, 2010:
Commuter rail train
hits elderly driver
● A 70-year-old woman hears on the
news that there will be no commuter
rail train that day.
● She tries to beat the train as it
speeds through Groove Street,
but there is not enough time to
brake.
● Luckily, she survives.
What if the information
about the commuter rail
train is outdated?
7. APIs need to access
data freely and easily
● Data should never be treated as a
scarce resource in applications
● Latency should be kept minimal to
ensure a better user experience
● Data should not be static: keep
it continuously fresh
● Find ways to handle large amounts
of data without breaking the APIs
[Diagram: an API reading from and writing to a cache]
8. Caches can be either
built-in or distributed
● If the data fits in the API's memory,
you should use a built-in cache
● Otherwise, you may need a
distributed cache for larger data sets
● Some cache implementations
provide the best of both worlds
● For distributed caches, make sure
lookups always stay O(1)
[Diagrams: a built-in cache living inside the API process, vs. distributed caches accessed by the API over the network]
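To make the built-in case concrete, here is a minimal sketch of an in-process cache with O(1) reads and writes, built on `java.util.LinkedHashMap` in access order so it evicts the least-recently-used entry. The capacity, key names, and values are arbitrary choices for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal built-in (in-process) LRU cache sketch: O(1) get/put,
// bounded so the working set fits in the API's own heap.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true turns this map into an LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict least-recently-used beyond capacity
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("user:1", "Alice");
        cache.put("user:2", "Bob");
        cache.get("user:1");          // touch user:1, so user:2 is now eldest
        cache.put("user:3", "Carol"); // evicts user:2
        System.out.println(cache.keySet()); // prints [user:1, user:3]
    }
}
```

Libraries such as Caffeine or Guava offer production-grade versions of the same idea, but the point is the same: when the data fits in the process heap, the cache lookup never leaves the JVM.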
10. Let’s Tweet the Song!
1. Access your Twitter account.
2. Use #KafkaSummit in your tweet.
3. Put the name of the song within
brackets.
14. Caching Pattern:
Refresh Ahead
● Proactively updates the cache
● Keeps the entries always in sync
● Ideal for latency-sensitive cases
● Ideal when data reads are costly
● It may need initial data loading
[Diagram: database → Kafka Connect → Kafka → Kafka Connect → cache → API]
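A refresh-ahead pipeline like this can be wired with two connector configs: a source that streams table changes into Kafka, and a sink that writes them into the cache. The configs below are an illustrative sketch, not a tested deployment: the JDBC source class is Confluent's, while the Redis sink class, hostnames, table, and topic names are assumptions — check the documentation of the connectors you actually install.

```properties
# Source side: stream rows whose updated_at column changed into topic "db-products".
name=products-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:postgresql://db:5432/inventory
mode=timestamp
timestamp.column.name=updated_at
table.whitelist=products
topic.prefix=db-
```

```properties
# Sink side (assumption: an open-source Redis sink from Confluent Hub; the
# class and property names vary by connector version).
name=products-cache-sink
connector.class=com.github.jcustenborder.kafka.connect.redis.RedisSinkConnector
topics=db-products
redis.hosts=redis:6379
```

With both connectors running, every committed change in the `products` table lands in the cache shortly after, so the API never reads stale entries and never pays the cost of a cache miss against the database.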
15. Caching Pattern:
Refresh Ahead / Adapt
● Proactively updates the cache
● Keeps the entries always in sync
● Ideal for latency-sensitive cases
● Ideal when data reads are costly
● It may need initial data loading
[Diagram: the same refresh-ahead pipeline, with an application transforming and adapting records before delivery and Schema Registry providing the canonical models]
16. Caching Pattern:
Write Behind
● Removes I/O pressure from the app
● Allows true horizontal scalability
● Ensures ordering and persistence
● Minimizes DB code complexity
● Gracefully handles DB unavailability
[Diagram: the API writes to the cache, and Kafka Connect streams the changes through Kafka down to the database]
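The write-behind mechanics can be sketched in a few lines of plain Java: writes hit the cache synchronously and are flushed to the database asynchronously, in order. This is a self-contained toy, not the talk's actual stack — in the real architecture Kafka plays the role of the changelog queue (durable, ordered, replayable) and a Kafka Connect sink plays the role of the flusher thread.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Write-behind sketch: put() returns as soon as the cache is updated;
// a background thread applies the change to the "database" later.
public class WriteBehindCache {
    private static final String[] POISON = new String[0]; // shutdown marker

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database; // stands in for the real DB
    private final BlockingQueue<String[]> changelog = new LinkedBlockingQueue<>();
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();

    public WriteBehindCache(Map<String, String> database) {
        this.database = database;
        flusher.submit(() -> {
            try {
                while (true) {
                    String[] change = changelog.take(); // preserves write order
                    if (change == POISON) break;
                    this.database.put(change[0], change[1]);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    public void put(String key, String value) {
        cache.put(key, value);                    // fast path for the caller
        changelog.add(new String[] {key, value}); // DB write happens later
    }

    public String get(String key) {
        return cache.get(key);
    }

    public void close() throws InterruptedException {
        changelog.add(POISON); // flush everything queued so far, then stop
        flusher.shutdown();
        flusher.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> db = new ConcurrentHashMap<>();
        WriteBehindCache cache = new WriteBehindCache(db);
        cache.put("user:1", "Alice");
        System.out.println(cache.get("user:1")); // served from the cache
        cache.close();
        System.out.println(db.get("user:1"));    // flushed to the "database"
    }
}
```

Replacing the in-memory queue with a Kafka topic is what buys the properties on the slide: persistence, ordering per key, and tolerance of database downtime, since unflushed changes simply wait in the topic.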
17. Caching Pattern:
Write Behind / Adapt
● Removes I/O pressure from the app
● Allows true horizontal scalability
● Ensures ordering and persistence
● Minimizes DB code complexity
● Gracefully handles DB unavailability
[Diagram: the same write-behind pipeline, with an application transforming and adapting records before delivery and Schema Registry providing the canonical models]
18. Caching Pattern:
Event Federation
● Replicates data across regions
● Keeps multiple regions in sync
● Great for improving RPO and RTO
● Handles slow or laggy networks well
● Works well when used along with the
Read-Through and Write-Through
patterns.
[Diagram: Confluent Replicator (MirrorMaker-style) federating events between regions]
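Confluent Replicator is a commercial component, but the same federation can be expressed with Apache Kafka's own MirrorMaker 2. A minimal configuration sketch follows; the cluster aliases, bootstrap addresses, and topic pattern are made-up examples.

```properties
# connect-mirror-maker.properties
clusters = us-east, us-west
us-east.bootstrap.servers = kafka-east:9092
us-west.bootstrap.servers = kafka-west:9092

# Replicate the cache-feeding topics from us-east to us-west.
us-east->us-west.enabled = true
us-east->us-west.topics = cache-updates.*
```

Started with `bin/connect-mirror-maker.sh connect-mirror-maker.properties`, this keeps the remote region's topics (and therefore its caches, fed by local connectors) in sync, which is what improves RPO and RTO when a region fails.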
20. Kafka Connect support
for In-Memory Caches
● The connector for Redis is open
source and available on Confluent Hub
● The connector for Memcached is open
source and available on Confluent Hub
● Connectors exist for both GridGain and
Apache Ignite
● The connector for Infinispan is open
source and maintained by Red Hat
21. Frameworks for other
In-Memory Caches
● Oracle provides HotCache, built on
GoldenGate, for Oracle Coherence
● Hazelcast has the Jet framework,
which provides support for Kafka
● Pivotal GemFire (Apache Geode)
has good support from Spring
● Good news: you can always write
your own sink using the Connect API
[Diagram: Oracle GoldenGate, Hazelcast Jet, Spring Data/Spring Kafka, or a custom Connect framework, each feeding a cache]
22. Interested in DB CDC?
Then meet Debezium!
● Amazing CDC technology for pulling
data out of databases into Kafka
● Works at the transaction-log level,
which means true CDC for your
projects instead of record polling
● Open source, maintained by Red Hat,
with broad support for many popular
databases
● It is built on top of Kafka Connect
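Because Debezium runs on Kafka Connect, it is deployed like any other connector, by POSTing a JSON config to the Connect REST API. The sketch below follows the property names of the Debezium MySQL connector of that era; all hostnames, credentials, and identifiers are hypothetical placeholders.

```json
{
  "name": "inventory-cdc",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz-secret",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

Once registered (e.g. `POST /connectors` on the Connect worker), every committed change read from the MySQL binlog is published to a per-table topic, ready to feed the refresh-ahead pipelines shown earlier.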
23. Support for Running
Kafka Connect Servers
● Run it yourself on bare metal:
https://kafka.apache.org/downloads
https://www.confluent.io/download
● IaaS on AWS or Google Cloud:
https://github.com/confluentinc/ccloud-tools
● Running with Docker containers:
https://hub.docker.com/r/confluentinc/cp-kafka-connect/
● Running on Kubernetes:
https://github.com/confluentinc/cp-helm-charts
https://www.confluent.io/confluent-operator/