Watch full webinar here: https://buff.ly/43PDVsz
In today's fast-paced, data-driven world, organizations need real-time data pipelines and streaming applications to make informed decisions. Apache Kafka, a distributed streaming platform, provides a powerful solution for building such applications while scaling without downtime and handling high volumes of data. At the heart of Apache Kafka lie Kafka Topics, which enable communication between clients and brokers in the Kafka cluster.
Join us for this session with Pooja Dusane, Data Engineer at Denodo, where we will explore the critical role that Kafka listeners play in enabling connectivity to Kafka Topics. We'll dive deep into the technical details, discussing the key concepts of Kafka listeners, including their role in enabling real-time communication between consumers and producers. We'll also explore the various configuration options available for Kafka listeners and demonstrate how they can be customized to suit specific use cases.
Attend and Learn:
- The critical role that Kafka listeners play in enabling connectivity in Apache Kafka.
- Key concepts of Kafka listeners and how they enable real-time communication between clients and brokers.
- Configuration options available for Kafka listeners and how they can be customized to suit specific use cases.
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time Data Enrichment
1. UNLOCKING THE POWER OF APACHE KAFKA: HOW KAFKA LISTENERS FACILITATE REAL-TIME DATA ENRICHMENT
Pooja Dusane
Data Engineer | Denodo
2. AGENDA
1. Kafka
a. Why is Kafka Popular?
b. Kafka History
c. What is Kafka
d. Kafka Key Terminologies
2. Kafka Listener
a. What are Kafka Listeners
b. How Kafka Listeners facilitate real-time data enrichment
c. Denodo Kafka Listener
d. Difference between Custom Wrapper and Listener
3. Demo
4. Closing Remarks
4. More than 80% of all Fortune 100 companies trust and use Kafka.
5. WHY IS KAFKA POPULAR?
Architecture - Kafka uses a partitioned log model, which combines the messaging-queue and publish-subscribe approaches.
Scalability - Kafka scales by distributing a topic's partitions across different servers.
Zero Downtime - Kafka clusters can be expanded, upgraded, and rebalanced while continuing to deliver in-order, continuous messaging.
Low Latency & High Throughput - Without requiring especially powerful hardware, Apache Kafka can handle high-volume, high-speed data with millisecond latency, which is what most new use cases require.
Fault Tolerance - If an instance running a job fails, Kafka Streams automatically resumes the work on one of the remaining running instances of the application.
Extensibility - Kafka's prominence has prompted numerous other programs to develop integrations with it over time.
Guaranteed Delivery - Kafka ensures that no duplicate messages are created in a topic and that messages sent by a producer to a specific topic partition are appended in the order in which they were sent.
6. HISTORY
● 2010: LinkedIn developed Kafka.
● 2012: Kafka is donated to the Apache Software Foundation.
● 2015: Kafka version 0.8.2 is released.
● 2017: Kafka version 1.0.0 stable release.
● 2019: Confluent raised money to expand.
● 2021: Kafka version 2.8.0 is released (improvements).
7. WHAT IS KAFKA
Apache Kafka is a distributed data store optimized for ingesting and processing streaming data in real time.
Different models are available:
▪ Publish-Subscribe model
▪ Queuing model
Apache Kafka is horizontally scalable, highly available, and fault tolerant. It supports clustered architectures and load-balancer configuration, and its topics are partitioned.
8. KEY TERMINOLOGY
● Broker : Apache Kafka runs as a cluster on one or more servers that can span multiple data centers.
● Producer : Writes data to the brokers.
● Consumer : Consumes data from the brokers.
● Topics : A topic is a category/feed name to which messages are stored and published.
● Partitions : Kafka topics are divided into a number of partitions, which contain immutable messages.
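The relationship between these terms can be sketched in a few lines of code. This is a minimal in-memory illustration of the partitioned-log model, not the real broker implementation: a topic is a set of append-only partitions, a message's position in its partition is its offset, and a keyed message always hashes to the same partition.

```python
import zlib

class Topic:
    """Toy model of a Kafka topic: a set of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Keyed messages hash to a fixed partition, so all messages with the
        # same key stay ordered within a single partition.
        index = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index, len(self.partitions[index]) - 1  # (partition, offset)

orders = Topic("orders", num_partitions=3)
p1, o1 = orders.append("customer-42", "order placed")
p2, o2 = orders.append("customer-42", "order shipped")
assert p1 == p2      # same key -> same partition
assert o2 == o1 + 1  # offsets grow monotonically within a partition
```

This also shows why ordering in Kafka is a per-partition guarantee: only messages sharing a key are guaranteed to land in the same log.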
11. WHAT ARE KAFKA LISTENERS
● Kafka listeners are the part of an application that consumes data from Kafka topics.
● They continuously poll Kafka for new messages in near real time.
● Kafka listeners retrieve messages and process them according to the application's logic.
● Kafka listeners can be configured to listen to one or more topics and use consumer groups for fault tolerance and load balancing.
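The poll-and-dispatch loop described above can be sketched as follows. This is an illustrative stand-in, not a real Kafka client: an in-memory deque plays the role of the topic, and `run_listener` and `max_polls` are hypothetical names for the loop and its stopping condition.

```python
from collections import deque

def run_listener(source, handler, max_polls):
    """Poll `source` up to `max_polls` times, handing each new message to `handler`."""
    processed = 0
    for _ in range(max_polls):
        while source:                 # drain whatever arrived since the last poll
            message = source.popleft()
            handler(message)          # application-specific processing logic
            processed += 1
    return processed

topic = deque(["event-1", "event-2", "event-3"])
seen = []
count = run_listener(topic, seen.append, max_polls=2)
assert seen == ["event-1", "event-2", "event-3"]
```

A real listener would call a Kafka consumer's poll method inside the loop and typically run forever; the bounded `max_polls` is only there to make the sketch terminate.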
12. HOW KAFKA LISTENERS FACILITATE REAL-TIME DATA ENRICHMENT
● Real-time data enrichment is the process of adding additional information to incoming data in real time.
● Kafka listeners allow applications to consume data from Kafka topics and process it in real time.
● When a Kafka listener is configured to listen to a particular Kafka topic, it receives a stream of messages as they are published to the topic.
● The listener can then process each message and add additional information to it before passing it on to downstream systems or a consuming Kafka topic.
● With Kafka listeners, organizations can build highly performant and scalable applications that handle large volumes of data in real time.
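The enrichment step itself is just the listener's handler joining each incoming message with reference data before forwarding it. A minimal sketch, with hypothetical field names and an in-memory list standing in for the downstream topic:

```python
# Reference data the listener enriches against (illustrative).
customer_lookup = {"42": {"name": "Acme Corp", "tier": "gold"}}

def enrich(message):
    """Join a raw event with reference attributes in real time."""
    extra = customer_lookup.get(message["customer_id"], {})
    return {**message, **extra}

downstream = []  # stands in for the consuming Kafka topic
for raw in [{"customer_id": "42", "amount": 99.5}]:
    downstream.append(enrich(raw))

assert downstream[0]["tier"] == "gold"
```

In production the lookup would typically hit a cache, a database, or (in Denodo's case) a virtual view, and the enriched record would be produced to another Kafka topic.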
13. Overview
KAFKA LISTENERS IN DENODO
● Component in the Denodo Platform that allows receiving events from, and sending events to, Apache Kafka
● Executes statements against Denodo based on the information received in Apache Kafka events
● Extension of the VQL language to allow configuring the created components
● Graphical component in Design Studio to manage the created components
14. Overview
KAFKA LISTENER IN DENODO
In Virtual DataPort you can create a Kafka listener to subscribe to data originating from a Kafka server. The listener can:
● Execute the VQL statements received from the Kafka server, or
● Run a query defined with the interpolation variable (@LISTENEREXPRESSION)
15. Difference between Kafka Listener and Kafka Custom Wrapper
Custom Wrapper
● Enables "pull" (or query-based) access
● Allows access to topic information in the same way as a conventional data source
● Access is incremental, or from a certain point onward, to obtain all the requested data
● Only reads from Kafka topics, so the data can be combined with other views
● Key Use Case - Accessing Kafka topics as a data source for publishing data in web services or reporting tools
Listener
● Enables "push" (or event-based) access
● The listener's objective is to process the information from the topics it subscribes to
● Access is through VQL statements or the interpolation variable
● Reads from and writes to Kafka topics
● Key Use Case - Data enrichment of producer data
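The pull-versus-push contrast above can be made concrete with a small sketch. The class and method names are illustrative, and an in-memory object stands in for Kafka: a wrapper-style client queries the log on demand, while a listener is invoked as each event is published.

```python
class InMemoryTopic:
    """Toy topic supporting both access styles."""

    def __init__(self):
        self.log = []
        self.listeners = []

    def publish(self, event):
        self.log.append(event)
        for callback in self.listeners:   # push: listeners react immediately
            callback(event)

    def query(self, from_offset=0):       # pull: caller asks when it wants data
        return self.log[from_offset:]

topic = InMemoryTopic()
pushed = []
topic.listeners.append(pushed.append)     # register a listener
topic.publish("a")
topic.publish("b")
assert pushed == ["a", "b"]               # push: events arrived as published
assert topic.query(from_offset=1) == ["b"]  # pull: incremental query from an offset
```

The offset parameter mirrors the wrapper's incremental access "from a certain point", while the callback mirrors the listener's event-driven processing.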
19. CLOSING REMARKS
● Kafka listeners continuously poll Kafka for new messages in near real time.
● The listener can process each message and add additional information, enriching the data before passing it on to a consuming Kafka topic.
● In Denodo, a Kafka listener can execute VQL statements received from the Kafka server, or you can use a query with the interpolation variable (@LISTENEREXPRESSION).