@SimonAubury
KSQL Ops!
Running ksqlDB in the wild
linkedin.com/in/simonaubury
Simon Aubury
Kafka Summit 2020
Why am I here?
I am Simon Aubury
Principal Data Engineer @ ThoughtWorks
Why are you here?
◉ You’re a developer using ksqlDB
◉ You’re on the operations side
◉ You’ve had the need to 🔍🛠🤷 data
What are we trying to solve?
◉ Expectations have changed
◉ Kafka - Central nervous system for data
◉ Frameworks are great
◉ However, there isn’t an out-of-the-box solution for everything
Where does ksqlDB sit? Choose an API
Diagram: the APIs around Kafka: Kafka Producer, Kafka Connect Source, Kafka Consumer, Kafka Connect Sink, K-Streams and ksqlDB, with Schema Registry
Styles of enterprise workload
◉ Data Pipelines / ETL
◉ Data Enrichment
◉ Measurement, Detection & Audit
◉ Transformation
Why ksqlDB?
Diagram: K-Streams vs ksqlDB
“The entire history of software engineering can be characterized as one of rising levels of abstraction.”
G. Booch ("The Limits of Software")
Deployment options: Interactive vs Headless
Headless / Application
• All KSQL statements in a file
• The ksqlDB server runs this file, passed to it as an argument
• Not aware of any streams or tables you defined in other interactive sessions
Interactive mode
• Command line (CLI), REST API or Confluent Control Center
• Interactively explore existing topics
• Write queries & inspect results
• React and evolve over time
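As a sketch of headless mode, assuming a hypothetical pageviews topic with JSON values: every statement for the application lives in one file (the name queries.sql is illustrative) that is handed to the server at start-up, for example with ksql-server-start ksql-server.properties --queries-file queries.sql, or through the ksql.queries.file server property.

```sql
-- queries.sql (hypothetical file name): all statements for this application.
-- A headless server runs these at start-up and does not accept new
-- statements from the CLI or REST API.

-- Read the input topic from the beginning when the queries are first created.
SET 'auto.offset.reset' = 'earliest';

-- Declare a stream over an existing topic (topic and columns are assumptions).
CREATE STREAM pageviews (user_id VARCHAR KEY, page VARCHAR, view_time BIGINT)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');

-- Persistent query: a continuously updated count of views per page.
CREATE TABLE pageviews_per_page AS
  SELECT page, COUNT(*) AS views
  FROM pageviews
  GROUP BY page;
```

Because the file is the whole application, it can be versioned and reviewed like any other code.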
Let’s make some assumptions
Structured
◉ Schemas: Avro, Protobuf
Standard
◉ Data is accessible and discoverable
Security
◉ Authentication
◉ Authorization
Silo
◉ Domains
◉ Isolated workloads
Scalable
◉ Distributed infrastructure
◉ Resilient & redundant
Introducing
◉ Alice: Developer, Online Retailer
◉ Bob: Data Engineer, Insurer
◉ Carol: DBA, Bank
◉ Dan: Product Owner, Utility
1. Streaming ETL
Alice needs to analyse digital web traffic
Streaming ETL
Situation – Online Retailer
• Analyse digital web traffic and customer behaviour
• Review the sales funnel
• Measure drop-off points
Build pipeline
Development
• CLI locally
• Git push
• Unit test
Build
• PR
• Unit tests
• Integration test
• Schema promotion
Deploy
• Performance test
• Acceptance test
Let’s develop
☐ Start/connect to development Kafka cluster
☐ Start/connect to development KSQL cluster
☐ Create or locate topics
☐ Configure security (ACLs)
☐ Define/evolve schema
☐ Find data; produce data; set up a Kafka Connect source
☐ Test case
    ☐ ksqlDB testing tool
☐ Create application
    ☐ Develop in interactive CLI
    ☐ Deploy in headless mode
☐ Ensure tests pass
    ☐ Compare results
☐ Commit and push code
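A minimal sketch of the "develop in interactive CLI" step, assuming a hypothetical pageviews topic with JSON values. PARTITIONS=1 makes ksqlDB create the topic if it does not exist yet, so hand-written test events can be produced with INSERT INTO ... VALUES before real data is available. Once the statements behave as expected, they can be saved to a file and checked with ksqlDB's ksql-test-runner tool, which runs a statements file against JSON input and expected-output files.

```sql
-- Create the stream; PARTITIONS=1 creates the topic if it is missing (dev only).
CREATE STREAM pageviews (user_id VARCHAR KEY, page VARCHAR, view_time BIGINT)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON', PARTITIONS=1);

-- Produce a couple of hand-written test events (values are made up).
INSERT INTO pageviews (user_id, page, view_time) VALUES ('u1', '/basket', 1600000000000);
INSERT INTO pageviews (user_id, page, view_time) VALUES ('u1', '/checkout', 1600000060000);

-- Inspect the raw records on the topic.
PRINT 'pageviews' FROM BEGINNING LIMIT 2;

-- Push query: streams matching rows to the CLI until cancelled (Ctrl-C).
SET 'auto.offset.reset' = 'earliest';
SELECT user_id, page FROM pageviews WHERE page = '/checkout' EMIT CHANGES;
```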
Let’s deploy
☐ Pull request
☐ Conformance tests
☐ Complexity tests
☐ Data readiness
☐ Schema readiness
☐ Security readiness
☐ Start headless
    ☐ All KSQL statements in a file
☐ Localize tenant
    ☐ ksql.service.id
☐ Test case passes
    ☐ Validate result
☐ Monitor
Managing dependencies
“Apache Gradle to Build and Automate KSQL and Kafka Streams”, Stewart Bryson, Kafka Summit NYC 2019
https://github.com/RedPillAnalytics/gradle-confluent
Plan for failure
Diagram: KSQL Server 1 and KSQL Server 2 (consumer/producer) in front of Kafka Broker 1, Kafka Broker 2 and Kafka Broker 3, running:
create stream customer_behaviour as
select ...
from userprofile;
2. Data Enrichment
Bob needs to find customers impacted by a storm
Data Enrichment
Situation – Insurer
• A storm has impacted a large rural area
• Outbound calls to affected customers
Business ready
◉ Burstable architecture
◉ Tenant separation
◉ Geospatial encoding functions
◉ Flexible loading
◉ Easy mechanism for loading data files
◉ Quickly join customer and reference data
○ And apply geospatial functions
○ Time-ordered, immutable data is invaluable
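As a sketch of joining customer and reference data and then applying a geospatial function, assuming hypothetical customers and policies topics with JSON values and the same partition count (ksqlDB requires co-partitioned inputs for joins): each insured property is enriched with contact details, and ksqlDB's built-in GEO_DISTANCE keeps only those within 50 km of a placeholder storm centre.

```sql
-- Reference data: latest contact details per customer (hypothetical topic/columns).
CREATE TABLE customers (customer_id VARCHAR PRIMARY KEY, name VARCHAR, phone VARCHAR)
  WITH (KAFKA_TOPIC='customers', VALUE_FORMAT='JSON');

-- One event per insured property (hypothetical topic/columns).
CREATE STREAM policies (policy_id VARCHAR KEY, customer_id VARCHAR, lat DOUBLE, lon DOUBLE)
  WITH (KAFKA_TOPIC='policies', VALUE_FORMAT='JSON');

-- Call list: properties within 50 km of the storm centre, with contact details.
-- The storm coordinates (-33.87, 151.21) are placeholders.
-- The join key (customer_id) is kept in the projection as the result's key.
CREATE STREAM storm_call_list AS
  SELECT p.customer_id AS customer_id,
         p.policy_id AS policy_id,
         c.name AS name,
         c.phone AS phone,
         GEO_DISTANCE(p.lat, p.lon, -33.87, 151.21, 'KM') AS distance_km
  FROM policies p
  JOIN customers c ON p.customer_id = c.customer_id
  WHERE GEO_DISTANCE(p.lat, p.lon, -33.87, 151.21, 'KM') < 50;
```

A stream-table join is used because the call list should react to each property event while looking up the current contact details, rather than joining two unbounded streams within a time window.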
Visualization of topologies
https://zz85.github.io/kafka-streams-viz/
Confluent Control Center
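One way to get a topology description to paste into the kafka-streams-viz tool is from the ksqlDB CLI: SHOW QUERIES lists the running persistent queries and their IDs, and EXPLAIN prints a query's execution plan together with its underlying Kafka Streams processing topology. The query ID below is hypothetical.

```sql
-- List persistent queries, their IDs and status.
SHOW QUERIES;

-- Show the execution plan and Kafka Streams topology for one query.
-- CSAS_STORM_CALL_LIST_0 is a hypothetical ID; use one returned by SHOW QUERIES.
EXPLAIN CSAS_STORM_CALL_LIST_0;
```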
3. Measurement & Audit
Carol needs to verify that a new system's data load is accurate
Measurement & Audit
Situation – Compliance
• Commissioning new systems
• Audit systems to verify data loads
Business ready
◉ Controlled use of the CLI
○ Interactive commands
◉ Connect framework
○ With classpath mapped to sidecars
◉ Use ksqlDB connector management
◉ Window clauses
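A minimal sketch of a window clause used for load verification, assuming a hypothetical loaded_records topic written by the newly commissioned system: an hourly tumbling window counts and sums records per source system, giving totals that can be reconciled against the figures reported by the source.

```sql
-- Records written by the newly commissioned system (hypothetical topic/columns).
CREATE STREAM loaded_records (record_id VARCHAR KEY, source_system VARCHAR, amount DOUBLE)
  WITH (KAFKA_TOPIC='loaded_records', VALUE_FORMAT='JSON');

-- Hourly audit totals: one row per source system per one-hour window.
-- Windows are based on each record's timestamp (ROWTIME) by default.
CREATE TABLE load_audit AS
  SELECT source_system,
         COUNT(*) AS record_count,
         SUM(amount) AS total_amount
  FROM loaded_records
  WINDOW TUMBLING (SIZE 1 HOUR)
  GROUP BY source_system;
```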
Kafka Connect Secrets Management with ksqlDB
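A sketch of creating a connector through ksqlDB without putting credentials in the statement, assuming the Kafka Connect workers are configured with Kafka's FileConfigProvider (config.providers=file and config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider) and that the Confluent JDBC source connector is installed. The connector name, database URL, file path and keys are hypothetical. Newer ksqlDB releases also use ${...} for their own variable substitution, so check how your version treats these placeholders.

```sql
-- Create a JDBC source connector via ksqlDB's connector management.
-- ${file:<path>:<key>} placeholders are resolved by the Connect worker's
-- FileConfigProvider, so the credentials never appear in this statement.
CREATE SOURCE CONNECTOR legacy_orders_source WITH (
  'connector.class'          = 'io.confluent.connect.jdbc.JdbcSourceConnector',
  'connection.url'           = 'jdbc:postgresql://legacy-db:5432/orders',
  'connection.user'          = '${file:/secrets/legacy-db.properties:user}',
  'connection.password'      = '${file:/secrets/legacy-db.properties:password}',
  'table.whitelist'          = 'orders',
  'mode'                     = 'incrementing',
  'incrementing.column.name' = 'id',
  'topic.prefix'             = 'legacy_'
);

-- Check the connector's state and tasks.
DESCRIBE CONNECTOR legacy_orders_source;
```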
4. Data Transformation
Dan needs to fix data … in a hurry
Data Transformation
Situation – Product Launch
• New system
• 5% of users are failing verification
• Unable to release a patch quickly to the app store
When you need to hack production data
◉ Some users are (unexpectedly) entering their email address in uppercase (jane@EXAMPLE.COM)
◉ The third-party verification service can’t be changed
◉ On demand, via the CLI
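A minimal sketch of the on-demand fix, assuming sign-up events land on a hypothetical signups topic and that the verification service can be pointed at a new, also hypothetical, signups_normalised topic: a persistent query lower-cases every email address with ksqlDB's built-in LCASE function. Lower-casing the whole address covers the jane@EXAMPLE.COM case, although strictly only the domain part of an address is case-insensitive.

```sql
-- Sign-up events as produced by the app (hypothetical topic/columns).
CREATE STREAM signups (user_id VARCHAR KEY, email VARCHAR, name VARCHAR)
  WITH (KAFKA_TOPIC='signups', VALUE_FORMAT='JSON');

-- Also reprocess sign-ups already on the topic, not just new ones.
SET 'auto.offset.reset' = 'earliest';

-- Copy of the stream with lower-cased emails,
-- e.g. 'jane@EXAMPLE.COM' becomes 'jane@example.com'.
CREATE STREAM signups_normalised
  WITH (KAFKA_TOPIC='signups_normalised') AS
  SELECT user_id, LCASE(email) AS email, name
  FROM signups;
```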
Bounded Context / Plan for scale
Diagram: tenant structure, with KSQL Instance 1 and KSQL Instance 2 forming one KSQL server, the KSQL CLI, another app, and Kafka Broker 1, Kafka Broker 2 and Kafka Broker 3. Server configuration:
bootstrap.servers=kb1:9092,kb2:9092,kb3:9092
ksql.service.id=customer_service
Learnings
Quick take-aways . . .
ksqlDB – Design learnings
◉ Good software engineering
◉ Know when not to use
◉ System & capacity planning
ksqlDB – People learnings
◉ ksqlDB can be misleading
◉ Be informed, not scared
◉ ksqlDB is a great tool to have in the streaming toolbox
Any questions?
Thanks!
Presentation template by SlidesCarnival

KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summit 2020