SlideShare a Scribd company logo
1 of 56
Tech Talk #1
Dive into Apache Kafka
Brett Randall
Solutions Engineer
brett.randall@confluent.io
Schedule
Tech Talks Date/Time
TT#1 Dive into Apache Kafka® June 4th (Thursday)
10:30am - 11:30am AEST
TT#2 Introduction to Streaming Data and Stream Processing with Apache
Kafka
July 2nd (Thursday)
10:30am - 11:30am AEST
TT#3 Confluent Schema Registry August 6th (Thursday)
10:30am - 11:30am AEST
TT#4 Kafka Connect September 3rd (Thursday)
10:30am - 11:30am AEST
TT#5 Avoiding Pitfalls with Large-Scale Kafka Deployments October 1st (Thursday)
10:30am - 11:30am AEST
Disclaimer… • Some of you may know what Kafka
is or have used it already...
• If that’s the case, sit back and take a
refresher on Kafka and learn about
Confluent
Business Digitization Trends are Revolutionizing your
Data Flow
Massive volumes of
new data generated
every day
Mobile Cloud Microservices Internet of
Things
Machine
Learning
Distributed across
apps, devices,
datacenters, clouds
Structured,
unstructured
polymorphic
Legacy Data Infrastructure Solutions Have
Architectural Flaws
App App
DWH
Transactional
Databases
Analytics
Databases
Data Flow
DB DB
App App
MOM MOM
ETL
ETL
ESB
These solutions can be
● Batch-oriented, instead of event-
oriented in real time
● Complex to scale at high
throughput
● Connected point-to-point,
instead of publish / subscribe
● Lacking data persistence and
retention
● Incapable of in-flight message
processing
App App
Modern Architectures are Adapting to New Data
Requirements
NoSQL DBs Big Data Analytics
But how do we
revolutionize data
flow in a world of
exploding,
distributed and ever
changing data?
App App
DWH
Transactional
Databases
Analytics
Databases
Data Flow
DB DB
App App
MOM MOM
ETL
ETL
ESB
App App
The Solution is a Streaming Platform for Real-Time
Data Processing
A Streaming Platform
provides a single
source of truth
about your data to
everyone in your
organization
NoSQL DBs Big Data Analytics
App App
DWH
Transactional
Databases
Analytics
Databases
Data Flow
DB DB
App AppApp App
Streaming Platform
Apache Kafka®: Open Source Streaming Platform
Battle-Tested at Scale
More than 1
petabyte of
data in Kafka
Over 4.5 trillion
messages per
day
60,000+ data
streams
Source of all
data warehouse
& Hadoop data
Over 300 billion
user-related
events per day
The birthplace of Apache Kafka
“Architectural Patterns with Apache
Kafka
Analytics - Database Offload
RDBMS
CDC
Data Lakes
Stream Processing with Apache Kafka and
ksqlDB
payment event
customer
customer payments
Stream
Processing
RDBMS CDC
Transform Once, Use Many
customer
Stream
Processin
g
RDBMS
customer payments
payment event
Evolve processing from old systems to new
Stream
Processing
RDBMS
Existing
App
New App
<x>
Evolve processing from old systems to new
Stream
Processing
RDBMS
Existing
App
New App
<x>
New App
<y>
Writers
Kafka
cluster
Readers
KAFKA
A MODERN, DISTRIBUTED
PLATFORM FOR DATA STREAMS
Scalability of a Filesystem
• Hundreds of MB/s throughput
• Many TB per server
• Commodity hardware
Guarantees of a Database
• Strict ordering
• Persistence
Rewind & Replay
Reset to any point in the shared narrative
Distributed by Design
• Replication
• Fault Tolerance
• Partitioning
• Elastic Scaling
Kafka Topics
my-topic
my-topic-partition-0
my-topic-partition-1
my-topic-partition-2
broker-1
broker-2
broker-3
Creating a Topic
$ kafka-topics --zookeeper zk:2181
--create 
--topic my-topic 
--replication-factor 3 
--partitions 3
Or use the new AdminClient API!
Producing to Kafka
Time
Producing to Kafka
Time
C CC
Partition Leadership and Replication
Broker 1
Topic1
partition1
Broker 2 Broker 3 Broker 4
Topic1
partition1
Topic1
partition1
Leader Follower
Topic1
partition2
Topic1
partition2
Topic1
partition2
Topic1
partition3
Topic1
partition4
Topic1
partition3
Topic1
partition3
Topic1
partition4
Topic1
partition4
Partition Leadership and Replication – Node Failure
Broker 1
Topic1
partition1
Broker 2 Broker 3 Broker 4
Topic1
partition1
Topic1
partition1
Leader Follower
Topic1
partition2
Topic1
partition2
Topic1
partition2
Topic1
partition3
Topic1
partition4
Topic1
partition3
Topic1
partition3
Topic1
partition4
Topic1
partition4
Producing to Kafka
Clients – Producer Design
ProducerProducer Record
Topic
[Partition]
[Timestamp]
Value
Serializer Partitioner
Topic A
Partition 0
Batch 0
Batch 1
Batch 2
Topic B
Partition 1
Batch 0
Batch 1
Batch 2
Kafka
Broker
Send()
Retry
?
Fail
?
Yes
No
Can’t retry,
throw exception
Success: return
metadata
Yes
[Headers]
[Key]
The Serializer
Kafka doesn’t care about what you send to it as long as it’s been
converted to a byte stream beforehand.
JSON
CSV
Avro
XML
SERIALIZERS
01001010 01010011 01001111 01001110
01000011 01010011 01010110
01001010 01010011 01001111 01001110
01010000 01110010 01101111 01110100 ...
01011000 01001101 01001100
(if you must)
Reference
https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
Protobuf
The Serializer
private Properties kafkaProps = new Properties();
kafkaProps.put(“bootstrap.servers”, “broker1:9092,broker2:9092”);
kafkaProps.put(“key.serializer”, “org.apache.kafka.common.serialization.StringSerializer”);
kafkaProps.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
producer = new KafkaProducer<String, SpecificRecord>(kafkaProps);
Reference
https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
Record Keys and why they’re important - Ordering
Producer Record
Topic
[Partition]
[Key]
Value
Record keys determine the partition with the
default kafka partitioner
If a key isn’t provided, messages will be
produced in a round robin fashion
partitioner
Record Keys and why they’re important - Ordering
Producer Record
Topic
[Partition]
AAAA
Value
Record keys determine the partition with the default kafka
partitioner, and therefore guarantee order for a key
Keys are used in the default partitioning algorithm:
partition = hash(key) % numPartitions
partitioner
Record Keys and why they’re important - Ordering
Producer Record
Topic
[Partition]
BBBB
Value
Keys are used in the default partitioning algorithm:
partition = hash(key) % numPartitions
partitioner
Record keys determine the partition with the default kafka
partitioner, and therefore guarantee order for a key
Record Keys and why they’re important - Ordering
Producer Record
Topic
[Partition]
CCCC
Value
Keys are used in the default partitioning algorithm:
partition = hash(key) % numPartitions
partitioner
Record keys determine the partition with the default kafka
partitioner, and therefore guarantee order for a key
Record Keys and why they’re important - Ordering
Producer Record
Topic
[Partition]
DDDD
Value
Keys are used in the default partitioning algorithm:
partition = hash(key) % numPartitions
partitioner
Record keys determine the partition with the default kafka
partitioner, and therefore guarantee order for a key
Record Keys and why they’re important – Key Cardinality
Consumers
Key cardinality affects the
amount of work done by
consumers in a group. Poor key
choice can lead to uneven
workloads.
Keys in Kafka don’t have to be
primitives, like strings or ints.
Like values, they can be be
anything: JSON, Avro, etc… So
create a key that will evenly
distribute groups of records
around the partitions.
Car·di·nal·i·ty
/ˌkärdəˈnalədē/
Noun
the number of elements in a set or other grouping, as a property of that grouping.
Consuming from Kafka
A Basic Java Consumer
final Consumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList(topic));
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
-- Do Some Work --
}
}
} finally {
consumer.close();
}
}
Consuming From Kafka – Single Consumer
C
One consumer will
consume from all
partitions,
maintaining
partition offsets
Consuming From Kafka – Grouped Consumers
CC
C1
CC
consumers are
separate,
operating
independently
C2
Consuming From Kafka – Grouped Consumers
C C
C C
Consumers in a
consumer group
share the
workload
Consuming From Kafka – Grouped Consumers
0 1
2 3
They organize
themselves by ID
Consuming From Kafka – Grouped Consumers
0 1
2 3
Failures will occur
Consuming From Kafka – Grouped Consumers
0, 3 1
2 3
Another consumer in
the group picks up for
the failed consumer.
This is a rebalance.
Use a Good Kafka Client!
Clients
● Java/Scala - default clients, comes with Kafka
● C/C++ - https://github.com/edenhill/librdkafka
● C#/.Net - https://github.com/confluentinc/confluent-kafka-dotnet
● Python - https://github.com/confluentinc/confluent-kafka-python
● Golang - https://github.com/confluentinc/confluent-kafka-go
● Node/JavaScript - https://github.com/Blizzard/node-rdkafka (not supported by Confluent!)
New Kafka features will only be available to modern, updated clients!
Without Confluent and Kafka
LINE OF BUSINESS 01 LINE OF BUSINESS 02 PUBLIC CLOUD
Data architecture is rigid, complicated, and expensive - making it too hard
and cost-prohibitive to get mission-critical apps to market quickly
Confluent & Kafka reimagine this as the central
nervous system of your business
Hadoop ...
Device
Logs ... App ...MicroserviceMainframes
Data
Warehouse
Splunk ...
Data Stores Logs 3rd Party Apps Custom Apps / Microservices
Real-time
Inventory
Real-time Fraud
Detection
Real-time
Customer 360
Machine
Learning
Models
Real-time Data
Transformation ...
Contextual Event-Driven Applications
Universal Event Pipeline
Apache Kafka is one of the most popular open
source projects in the world
49
Confluent are the
Kafka Experts
Founded by the creators of
Apache Kafka, Confluent
continues to be the major
contributor.
Confluent invests in
Open Source
2020 re-architecture
removes the
scalability-limiting
use of Zookeeper
in Apache Kafka
Future-proof event streaming
Kafka re-engineered as a fully-managed, cloud-native service by its
original creators and major contributors of Kafka
Global
Automated disaster
recovery
Global applications with
geo-awareness
Infinite
Efficient and infinite data
with tiered storage
Unlimited horizontal
scalability for clusters
Elastic
Easy multi-cloud
orchestration
Persistent bridge to
cloud from on-prem
Make your applications
more valuable with
real time insights
enabled by next-gen
architecture
DATA INTEGRATION
Database changes
Log
events
IoT
events
Web events
Connected car
Fraud detection
Customer 360
Personalized
promotions
Apps driven by
real time data
Quality
assurance
SIEM/SOC
Inventory
management
Proactive
patient care
Sentiment
analysis
Capital
management
Modernize
your apps
Build a bridge to the cloud for your data
Ensure availability and connectivity regardless of where your data lives
53
Private Cloud
Deploy on premises with
Confluent Platform
Public/Multi-Cloud
Leverage a fully managed
service with Confluent Cloud
Hybrid Cloud
Build a persistent bridge
from datacenter to cloud
Confluent Platform
Dynamic Performance & Elasticity
Auto Data Balancer | Tiered Storage
Flexible DevOps Automation
Operator | Ansible
GUI-driven Mgmt & Monitoring
Control Center
Efficient
Operations at Scale
Freedom of Choice
Committer-driven Expertise
Event Streaming Database
ksqlDB
Rich Pre-built Ecosystem
Connectors | Hub | Schema Registry
Multi-language Development
Non-Java Clients | REST Proxy
Global Resilience
Multi-region Clusters | Replicator
Data Compatibility
Schema Registry | Schema Validation
Enterprise-grade Security
RBAC | Secrets | Audit Logs
ARCHITECTOPERATORDEVELOPER
Open Source | Community licensed
Unrestricted
Developer Productivity
Production-stage
Prerequisites
Fully Managed Cloud ServiceSelf-managed Software
Training Partners
Enterprise
Support
Professional
Services
Apache Kafka
Project Metamorphosis
Unveiling the next-gen event
streaming platform
Listen to replay and
Sign up for updates
cnfl.io/pm
Jay Kreps
Co-founder and CEO
Confluent
Download your Apache Kafka and
Stream Processing O'Reilly Book Bundle
Download at: https://www.confluent.io/apache-kafka-stream-processing-book-
bundle/
Brett Randall
Solutions Engineer
brett.randall@confluent.io

More Related Content

What's hot

Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
confluent
 

What's hot (20)

Deploying Confluent Platform for Production
Deploying Confluent Platform for ProductionDeploying Confluent Platform for Production
Deploying Confluent Platform for Production
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Scaling Microservices with Kubernetes
Scaling Microservices with KubernetesScaling Microservices with Kubernetes
Scaling Microservices with Kubernetes
 
Helm - Application deployment management for Kubernetes
Helm - Application deployment management for KubernetesHelm - Application deployment management for Kubernetes
Helm - Application deployment management for Kubernetes
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Intro to Helm for Kubernetes
Intro to Helm for KubernetesIntro to Helm for Kubernetes
Intro to Helm for Kubernetes
 
Helm 3
Helm 3Helm 3
Helm 3
 
Introduction to Google Cloud Platform
Introduction to Google Cloud PlatformIntroduction to Google Cloud Platform
Introduction to Google Cloud Platform
 
Shift Deployment Security Left with Weave GitOps & Upbound’s Universal Crossp...
Shift Deployment Security Left with Weave GitOps & Upbound’s Universal Crossp...Shift Deployment Security Left with Weave GitOps & Upbound’s Universal Crossp...
Shift Deployment Security Left with Weave GitOps & Upbound’s Universal Crossp...
 
Infrastructure-as-Code with Pulumi - Better than all the others (like Ansible)?
Infrastructure-as-Code with Pulumi- Better than all the others (like Ansible)?Infrastructure-as-Code with Pulumi- Better than all the others (like Ansible)?
Infrastructure-as-Code with Pulumi - Better than all the others (like Ansible)?
 
knolx of KubeCost & Infracost
knolx of KubeCost & Infracostknolx of KubeCost & Infracost
knolx of KubeCost & Infracost
 
CI/CD Tools Universe: The Ultimate List
CI/CD Tools Universe: The Ultimate ListCI/CD Tools Universe: The Ultimate List
CI/CD Tools Universe: The Ultimate List
 
Kubernetes Application Deployment with Helm - A beginner Guide!
Kubernetes Application Deployment with Helm - A beginner Guide!Kubernetes Application Deployment with Helm - A beginner Guide!
Kubernetes Application Deployment with Helm - A beginner Guide!
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Run Jenkins as Managed Product on ECS - AWS Meetup
Run Jenkins as Managed Product on ECS - AWS MeetupRun Jenkins as Managed Product on ECS - AWS Meetup
Run Jenkins as Managed Product on ECS - AWS Meetup
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka
 
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
 
Terraform
TerraformTerraform
Terraform
 

Similar to Westpac Bank Tech Talk 1: Dive into Apache Kafka

Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
confluent
 
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
HostedbyConfluent
 

Similar to Westpac Bank Tech Talk 1: Dive into Apache Kafka (20)

What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LMESet your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDB
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Fraud Detection for Israel BigThings Meetup
Fraud Detection  for Israel BigThings MeetupFraud Detection  for Israel BigThings Meetup
Fraud Detection for Israel BigThings Meetup
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
 
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
 
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
 

More from confluent

More from confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Recently uploaded (20)

Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Westpac Bank Tech Talk 1: Dive into Apache Kafka

  • 1. Tech Talk #1 Dive into Apache Kafka Brett Randall Solutions Engineer brett.randall@confluent.io
  • 2. Schedule Tech Talks Date/Time TT#1 Dive into Apache Kafka® June 4th (Thursday) 10:30am - 11:30am AEST TT#2 Introduction to Streaming Data and Stream Processing with Apache Kafka July 2nd (Thursday) 10:30am - 11:30am AEST TT#3 Confluent Schema Registry August 6th (Thursday) 10:30am - 11:30am AEST TT#4 Kafka Connect September 3rd (Thursday) 10:30am - 11:30am AEST TT#5 Avoiding Pitfalls with Large-Scale Kafka Deployments October 1st (Thursday) 10:30am - 11:30am AEST
  • 3. Disclaimer… • Some of you may know what Kafka is or have used it already... • If that’s the case, sit back and take a refresher on Kafka and learn about Confluent
  • 4. Business Digitization Trends are Revolutionizing your Data Flow Massive volumes of new data generated every day Mobile Cloud Microservices Internet of Things Machine Learning Distributed across apps, devices, datacenters, clouds Structured, unstructured polymorphic
  • 5. Legacy Data Infrastructure Solutions Have Architectural Flaws App App DWH Transactional Databases Analytics Databases Data Flow DB DB App App MOM MOM ETL ETL ESB These solutions can be ● Batch-oriented, instead of event- oriented in real time ● Complex to scale at high throughput ● Connected point-to-point, instead of publish / subscribe ● Lacking data persistence and retention ● Incapable of in-flight message processing App App
  • 6. Modern Architectures are Adapting to New Data Requirements NoSQL DBs Big Data Analytics But how do we revolutionize data flow in a world of exploding, distributed and ever changing data? App App DWH Transactional Databases Analytics Databases Data Flow DB DB App App MOM MOM ETL ETL ESB App App
  • 7. The Solution is a Streaming Platform for Real-Time Data Processing A Streaming Platform provides a single source of truth about your data to everyone in your organization NoSQL DBs Big Data Analytics App App DWH Transactional Databases Analytics Databases Data Flow DB DB App AppApp App Streaming Platform
  • 8. Apache Kafka®: Open Source Streaming Platform Battle-Tested at Scale More than 1 petabyte of data in Kafka Over 4.5 trillion messages per day 60,000+ data streams Source of all data warehouse & Hadoop data Over 300 billion user-related events per day The birthplace of Apache Kafka
  • 10. Analytics - Database Offload RDBMS CDC Data Lakes
  • 11. Stream Processing with Apache Kafka and ksqlDB payment event customer customer payments Stream Processing RDBMS CDC
  • 12. Transform Once, Use Many customer Stream Processin g RDBMS customer payments payment event
  • 13. Evolve processing from old systems to new Stream Processing RDBMS Existing App New App <x>
  • 14. Evolve processing from old systems to new Stream Processing RDBMS Existing App New App <x> New App <y>
  • 15.
  • 18. Scalability of a Filesystem • Hundreds of MB/s throughput • Many TB per server • Commodity hardware
  • 19. Guarantees of a Database • Strict ordering • Persistence
  • 20. Rewind & Replay Reset to any point in the shared narrative
  • 21. Distributed by Design • Replication • Fault Tolerance • Partitioning • Elastic Scaling
  • 23. Creating a Topic $ kafka-topics --zookeeper zk:2181 --create --topic my-topic --replication-factor 3 --partitions 3 Or use the new AdminClient API!
  • 26. Partition Leadership and Replication Broker 1 Topic1 partition1 Broker 2 Broker 3 Broker 4 Topic1 partition1 Topic1 partition1 Leader Follower Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition4 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4
  • 27. Partition Leadership and Replication – Node Failure Broker 1 Topic1 partition1 Broker 2 Broker 3 Broker 4 Topic1 partition1 Topic1 partition1 Leader Follower Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition4 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4
  • 29. Clients – Producer Design ProducerProducer Record Topic [Partition] [Timestamp] Value Serializer Partitioner Topic A Partition 0 Batch 0 Batch 1 Batch 2 Topic B Partition 1 Batch 0 Batch 1 Batch 2 Kafka Broker Send() Retry ? Fail ? Yes No Can’t retry, throw exception Success: return metadata Yes [Headers] [Key]
  • 30. The Serializer Kafka doesn’t care about what you send to it as long as it’s been converted to a byte stream beforehand. JSON CSV Avro XML SERIALIZERS 01001010 01010011 01001111 01001110 01000011 01010011 01010110 01001010 01010011 01001111 01001110 01010000 01110010 01101111 01110100 ... 01011000 01001101 01001100 (if you must) Reference https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html Protobuf
  • 31. The Serializer private Properties kafkaProps = new Properties(); kafkaProps.put(“bootstrap.servers”, “broker1:9092,broker2:9092”); kafkaProps.put(“key.serializer”, “org.apache.kafka.common.serialization.StringSerializer”); kafkaProps.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer"); producer = new KafkaProducer<String, SpecificRecord>(kafkaProps); Reference https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
  • 32. Record Keys and why they’re important - Ordering Producer Record Topic [Partition] [Key] Value Record keys determine the partition with the default kafka partitioner If a key isn’t provided, messages will be produced in a round robin fashion partitioner
  • 33. Record Keys and why they’re important - Ordering Producer Record Topic [Partition] AAAA Value Record keys determine the partition with the default kafka partitioner, and therefore guarantee order for a key Keys are used in the default partitioning algorithm: partition = hash(key) % numPartitions partitioner
  • 34. Record Keys and why they’re important - Ordering Producer Record Topic [Partition] BBBB Value Keys are used in the default partitioning algorithm: partition = hash(key) % numPartitions partitioner Record keys determine the partition with the default kafka partitioner, and therefore guarantee order for a key
  • 35. Record Keys and why they’re important - Ordering Producer Record Topic [Partition] CCCC Value Keys are used in the default partitioning algorithm: partition = hash(key) % numPartitions partitioner Record keys determine the partition with the default kafka partitioner, and therefore guarantee order for a key
  • 36. Record Keys and why they’re important - Ordering Producer Record Topic [Partition] DDDD Value Keys are used in the default partitioning algorithm: partition = hash(key) % numPartitions partitioner Record keys determine the partition with the default kafka partitioner, and therefore guarantee order for a key
  • 37. Record Keys and why they’re important – Key Cardinality Consumers Key cardinality affects the amount of work done by consumers in a group. Poor key choice can lead to uneven workloads. Keys in Kafka don’t have to be primitives, like strings or ints. Like values, they can be be anything: JSON, Avro, etc… So create a key that will evenly distribute groups of records around the partitions. Car·di·nal·i·ty /ˌkärdəˈnalədē/ Noun the number of elements in a set or other grouping, as a property of that grouping.
  • 39. A Basic Java Consumer final Consumer<String, String> consumer = new KafkaConsumer<String, String>(props); consumer.subscribe(Arrays.asList(topic)); try { while (true) { ConsumerRecords<String, String> records = consumer.poll(100); for (ConsumerRecord<String, String> record : records) { -- Do Some Work -- } } } finally { consumer.close(); } }
  • 40. Consuming From Kafka – Single Consumer C One consumer will consume from all partitions, maintaining partition offsets
  • 41. Consuming From Kafka – Grouped Consumers CC C1 CC consumers are separate, operating independently C2
  • 42. Consuming From Kafka – Grouped Consumers C C C C Consumers in a consumer group share the workload
  • 43. Consuming From Kafka – Grouped Consumers 0 1 2 3 They organize themselves by ID
  • 44. Consuming From Kafka – Grouped Consumers 0 1 2 3 Failures will occur
  • 45. Consuming From Kafka – Grouped Consumers 0, 3 1 2 3 Another consumer in the group picks up for the failed consumer. This is a rebalance.
  • 46. Use a Good Kafka Client! Clients ● Java/Scala - default clients, comes with Kafka ● C/C++ - https://github.com/edenhill/librdkafka ● C#/.Net - https://github.com/confluentinc/confluent-kafka-dotnet ● Python - https://github.com/confluentinc/confluent-kafka-python ● Golang - https://github.com/confluentinc/confluent-kafka-go ● Node/JavaScript - https://github.com/Blizzard/node-rdkafka (not supported by Confluent!) New Kafka features will only be available to modern, updated clients!
  • 47. Without Confluent and Kafka LINE OF BUSINESS 01 LINE OF BUSINESS 02 PUBLIC CLOUD Data architecture is rigid, complicated, and expensive - making it too hard and cost-prohibitive to get mission-critical apps to market quickly
  • 48. Confluent & Kafka reimagine this as the central nervous system of your business Hadoop ... Device Logs ... App ...MicroserviceMainframes Data Warehouse Splunk ... Data Stores Logs 3rd Party Apps Custom Apps / Microservices Real-time Inventory Real-time Fraud Detection Real-time Customer 360 Machine Learning Models Real-time Data Transformation ... Contextual Event-Driven Applications Universal Event Pipeline
  • 49. Apache Kafka is one of the most popular open source projects in the world 49 Confluent are the Kafka Experts Founded by the creators of Apache Kafka, Confluent continues to be the major contributor. Confluent invests in Open Source 2020 re-architecture removes the scalability-limiting use of Zookeeper in Apache Kafka
  • 50. Future-proof event streaming Kafka re-engineered as a fully-managed, cloud-native service by its original creators and major contributors of Kafka Global Automated disaster recovery Global applications with geo-awareness Infinite Efficient and infinite data with tiered storage Unlimited horizontal scalability for clusters Elastic Easy multi-cloud orchestration Persistent bridge to cloud from on-prem
  • 51. Make your applications more valuable with real time insights enabled by next-gen architecture DATA INTEGRATION Database changes Log events IoT events Web events Connected car Fraud detection Customer 360 Personalized promotions Apps driven by real time data Quality assurance SIEM/SOC Inventory management Proactive patient care Sentiment analysis Capital management Modernize your apps
  • 52. Build a bridge to the cloud for your data Ensure availability and connectivity regardless of where your data lives 53 Private Cloud Deploy on premises with Confluent Platform Public/Multi-Cloud Leverage a fully managed service with Confluent Cloud Hybrid Cloud Build a persistent bridge from datacenter to cloud
  • 53. Confluent Platform Dynamic Performance & Elasticity Auto Data Balancer | Tiered Storage Flexible DevOps Automation Operator | Ansible GUI-driven Mgmt & Monitoring Control Center Efficient Operations at Scale Freedom of Choice Committer-driven Expertise Event Streaming Database ksqlDB Rich Pre-built Ecosystem Connectors | Hub | Schema Registry Multi-language Development Non-Java Clients | REST Proxy Global Resilience Multi-region Clusters | Replicator Data Compatibility Schema Registry | Schema Validation Enterprise-grade Security RBAC | Secrets | Audit Logs ARCHITECTOPERATORDEVELOPER Open Source | Community licensed Unrestricted Developer Productivity Production-stage Prerequisites Fully Managed Cloud ServiceSelf-managed Software Training Partners Enterprise Support Professional Services Apache Kafka
  • 54. Project Metamorphosis Unveiling the next-gen event streaming platform Listen to replay and Sign up for updates cnfl.io/pm Jay Kreps Co-founder and CEO Confluent
  • 55. Download your Apache Kafka and Stream Processing O'Reilly Book Bundle Download at: https://www.confluent.io/apache-kafka-stream-processing-book- bundle/