2. Schedule
Tech Talks | Date/Time
TT#1 Dive into Apache Kafka® | June 4th (Thursday), 10:30am - 11:30am AEST
TT#2 Introduction to Streaming Data and Stream Processing with Apache Kafka | July 2nd (Thursday), 10:30am - 11:30am AEST
TT#3 Confluent Schema Registry | August 6th (Thursday), 10:30am - 11:30am AEST
TT#4 Kafka Connect | September 3rd (Thursday), 10:30am - 11:30am AEST
TT#5 Avoiding Pitfalls with Large-Scale Kafka Deployments | October 1st (Thursday), 10:30am - 11:30am AEST
3. Disclaimer…
• Some of you may know what Kafka is or have used it already...
• If that’s the case, sit back, enjoy a refresher on Kafka, and learn about Confluent
4. Business Digitization Trends are Revolutionizing your Data Flow
Drivers: Mobile | Cloud | Microservices | Internet of Things | Machine Learning
● Massive volumes of new data generated every day
● Distributed across apps, devices, datacenters, clouds
● Structured, unstructured, polymorphic
5. Legacy Data Infrastructure Solutions Have Architectural Flaws
[Diagram: data flowing between apps, transactional databases, analytics databases and a DWH via MOM, ETL and an ESB]
These solutions can be:
● Batch-oriented, instead of event-oriented in real time
● Complex to scale at high throughput
● Connected point-to-point, instead of publish/subscribe
● Lacking data persistence and retention
● Incapable of in-flight message processing
6. Modern Architectures are Adapting to New Data Requirements
[Diagram: the same legacy architecture, now extended with NoSQL DBs and Big Data Analytics]
But how do we revolutionize data flow in a world of exploding, distributed and ever-changing data?
7. The Solution is a Streaming Platform for Real-Time Data Processing
A Streaming Platform provides a single source of truth about your data to everyone in your organization.
[Diagram: apps, transactional and analytics databases, DWH, NoSQL DBs and Big Data Analytics all connected through a central Streaming Platform]
8. Apache Kafka®: Open Source Streaming Platform Battle-Tested at Scale
At LinkedIn, the birthplace of Apache Kafka:
● More than 1 petabyte of data in Kafka
● Over 4.5 trillion messages per day
● 60,000+ data streams
● Source of all data warehouse & Hadoop data
● Over 300 billion user-related events per day
23. Creating a Topic
$ kafka-topics --zookeeper zk:2181 \
    --create \
    --topic my-topic \
    --replication-factor 3 \
    --partitions 3
Or use the new AdminClient API!
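A minimal sketch of the same topic creation via the Java AdminClient (the broker address is an assumption; the topic name, partitions and replication factor mirror the CLI example above):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // assumed broker address

try (AdminClient admin = AdminClient.create(props)) {
    // my-topic with 3 partitions and replication factor 3, as in the CLI example
    NewTopic topic = new NewTopic("my-topic", 3, (short) 3);
    admin.createTopics(Collections.singleton(topic)).all().get(); // block until created
}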
30. The Serializer
Kafka doesn’t care about what you send to it as long as it’s been converted to a byte stream beforehand.
JSON | CSV | Avro | Protobuf | XML (if you must)
→ SERIALIZERS →
01001010 01010011 01001111 01001110 … (raw bytes on the wire)
Reference: https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
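Under the hood, a serializer is just a function from your type to bytes. A minimal sketch, essentially what the built-in StringSerializer does:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.serialization.Serializer;

// Minimal custom serializer: a String becomes its UTF-8 bytes.
public class SimpleStringSerializer implements Serializer<String> {
    @Override
    public byte[] serialize(String topic, String data) {
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }
}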
31. The Serializer
private Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker1:9092,broker2:9092");
kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
kafkaProps.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
producer = new KafkaProducer<String, SpecificRecord>(kafkaProps);
Reference: https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
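With those properties set, sending a record is a single call. A sketch only: the topic name, key, and the Avro-generated Payment class are hypothetical:

// "payments", "customer-123" and the Avro-generated Payment class are illustrative
Payment payment = new Payment("txn-42", 100.0d);
producer.send(new ProducerRecord<String, SpecificRecord>("payments", "customer-123", payment));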
32. Record Keys and why they’re important - Ordering
Producer Record: Topic | [Partition] | [Key] | Value → partitioner
Record keys determine the partition with the default Kafka partitioner.
If a key isn’t provided, messages will be produced in a round-robin fashion.
33.–36. Record Keys and why they’re important - Ordering
[Animation: records with keys AAAA, BBBB, CCCC and DDDD are each consistently routed to the same partition]
Record keys determine the partition with the default Kafka partitioner, and therefore guarantee order for a key.
Keys are used in the default partitioning algorithm:
partition = hash(key) % numPartitions
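In the Java client, hash(key) is murmur2. A simplified sketch of the keyed path of the default partitioner (the real implementation also handles records with no key):

import org.apache.kafka.common.utils.Utils;

// Keyed records: murmur2-hash the key bytes, force non-negative,
// then take the result modulo the number of partitions.
static int partitionForKey(byte[] keyBytes, int numPartitions) {
    return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}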
37. Record Keys and why they’re important – Key Cardinality
Key cardinality affects the amount of work done by consumers in a group. Poor key choice can lead to uneven workloads.
Keys in Kafka don’t have to be primitives, like strings or ints. Like values, they can be anything: JSON, Avro, etc… So create a key that will evenly distribute groups of records around the partitions.
Car·di·nal·i·ty /ˌkärdəˈnalədē/ noun: the number of elements in a set or other grouping, as a property of that grouping.
39. A Basic Java Consumer
final Consumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList(topic));
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            // do some work with each record
            System.out.printf("offset=%d, key=%s, value=%s%n",
                    record.offset(), record.key(), record.value());
        }
    }
} finally {
    consumer.close();
}
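For reference, a minimal props setup that the snippet above assumes (broker addresses and group id are placeholders; group.id is required for group management):

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder brokers
props.put("group.id", "my-consumer-group");                  // placeholder group id
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");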
40. Consuming From Kafka – Single Consumer
One consumer will consume from all partitions, maintaining partition offsets.
41. Consuming From Kafka – Grouped Consumers
Consumer groups (here C1 and C2) are separate, operating independently.
42. Consuming From Kafka – Grouped Consumers
Consumers in a consumer group share the workload.
45. Consuming From Kafka – Grouped Consumers
[Diagram: the failed consumer’s partitions are reassigned among the remaining consumers]
Another consumer in the group picks up for the failed consumer. This is a rebalance.
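To observe or react to a rebalance in code, subscribe with a ConsumerRebalanceListener. A minimal sketch, reusing the consumer and topic from slide 39:

import java.util.Arrays;
import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

consumer.subscribe(Arrays.asList(topic), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Called before partitions are taken away; a good place to commit offsets
        System.out.println("Revoked: " + partitions);
    }
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Called once the rebalance has assigned (possibly different) partitions
        System.out.println("Assigned: " + partitions);
    }
});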
46. Use a Good Kafka Client!
Clients
● Java/Scala - the default clients, shipped with Kafka
● C/C++ - https://github.com/edenhill/librdkafka
● C#/.Net - https://github.com/confluentinc/confluent-kafka-dotnet
● Python - https://github.com/confluentinc/confluent-kafka-python
● Golang - https://github.com/confluentinc/confluent-kafka-go
● Node/JavaScript - https://github.com/Blizzard/node-rdkafka (not supported by Confluent!)
New Kafka features will only be available to modern, updated clients!
47. Without Confluent and Kafka
[Diagram: siloed integrations across Line of Business 01, Line of Business 02 and public cloud]
Data architecture is rigid, complicated, and expensive, making it too hard and cost-prohibitive to get mission-critical apps to market quickly.
48. Confluent & Kafka reimagine this as the central nervous system of your business
Universal Event Pipeline: data stores, logs, 3rd-party apps and custom apps/microservices (mainframes, device logs, Hadoop, data warehouse, Splunk, …)
Contextual Event-Driven Applications: real-time inventory, real-time fraud detection, real-time customer 360, machine learning models, real-time data transformation, …
49. Apache Kafka is one of the most popular open source projects in the world
Confluent are the Kafka Experts: founded by the creators of Apache Kafka, Confluent continues to be the major contributor.
Confluent invests in Open Source: the 2020 re-architecture removes the scalability-limiting use of ZooKeeper in Apache Kafka.
50. Future-proof event streaming
Kafka re-engineered as a fully managed, cloud-native service by the original creators of, and major contributors to, Kafka
● Global: automated disaster recovery; global applications with geo-awareness
● Infinite: efficient and infinite data retention with tiered storage; unlimited horizontal scalability for clusters
● Elastic: easy multi-cloud orchestration; persistent bridge to cloud from on-prem
51. Make your applications more valuable with real-time insights enabled by next-gen architecture
Data integration: database changes, log events, IoT events, web events
Apps driven by real-time data: connected car, fraud detection, customer 360, personalized promotions, quality assurance, SIEM/SOC, inventory management, proactive patient care, sentiment analysis, capital management
Modernize your apps
52. Build a bridge to the cloud for your data
Ensure availability and connectivity regardless of where your data lives
● Private Cloud: deploy on premises with Confluent Platform
● Public/Multi-Cloud: leverage a fully managed service with Confluent Cloud
● Hybrid Cloud: build a persistent bridge from datacenter to cloud
53. Confluent Platform
Developer - Unrestricted Developer Productivity:
● Event Streaming Database: ksqlDB
● Rich Pre-built Ecosystem: Connectors | Hub | Schema Registry
● Multi-language Development: Non-Java Clients | REST Proxy
Operator - Efficient Operations at Scale:
● Dynamic Performance & Elasticity: Auto Data Balancer | Tiered Storage
● Flexible DevOps Automation: Operator | Ansible
● GUI-driven Management & Monitoring: Control Center
Architect - Production-stage Prerequisites:
● Global Resilience: Multi-region Clusters | Replicator
● Data Compatibility: Schema Registry | Schema Validation
● Enterprise-grade Security: RBAC | Secrets | Audit Logs
Foundation: Apache Kafka (Open Source | Community licensed), with Freedom of Choice and Committer-driven Expertise
Delivery: self-managed software or fully managed cloud service, backed by Training, Partners, Enterprise Support and Professional Services
54. Project Metamorphosis
Unveiling the next-gen event streaming platform
Listen to the replay and sign up for updates: cnfl.io/pm
Jay Kreps, Co-founder and CEO, Confluent
55. Download your Apache Kafka and Stream Processing O'Reilly Book Bundle
Download at: https://www.confluent.io/apache-kafka-stream-processing-book-bundle/