Use Apache Gradle to Build and Automate KSQL and Apache Kafka Streams
twitter: @stewartbryson
medium: @stewartbryson
linkedin: stewartbryson
Owner & CEO
Red Pill Analytics
@redpilla
What We Do
Data Warehouse Analytics
ANALYTICS
Data-engineering & ETL
Agenda
Project Overview
KSQL
KSQL User Defined Functions (demo-only)
Kafka Streams (demo-only)
Project Inspiration
Data Integration: Streaming In
Data Integration: Streaming Out
Packaging of Payloads for OWMC
KSQL Primer
KSQL Registration Queries
CREATE STREAM clickstream
  ( _time bigint,
    time varchar,
    ip varchar,
    request varchar,
    status int,
    userid int,
    bytes bigint,
    agent varchar )
  WITH
  ( kafka_topic = 'clickstream',
    value_format = 'json' );
KSQL Persistent Queries
CREATE TABLE events_per_min AS
  SELECT userid,
         count(*) AS events
  FROM clickstream
  WINDOW TUMBLING (SIZE 60 SECOND)
  GROUP BY userid;
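To make the tumbling window in the query above concrete, here is a minimal Python sketch of the same aggregation, not KSQL itself. It assumes each event is a hypothetical `(timestamp_ms, userid)` pair and that windows are aligned to multiples of 60 seconds; the function name and sample data are illustrative only.

```python
from collections import Counter

WINDOW_MS = 60_000  # 60-second tumbling window, mirroring SIZE 60 SECOND


def events_per_min(events):
    """Count events per (window_start_ms, userid) pair.

    `events` is an iterable of (timestamp_ms, userid) tuples; this input
    shape is an assumption made for the sketch.
    """
    counts = Counter()
    for ts_ms, userid in events:
        # Tumbling windows do not overlap, so each event falls in exactly
        # one window: the one starting at the previous multiple of WINDOW_MS.
        window_start = ts_ms - (ts_ms % WINDOW_MS)
        counts[(window_start, userid)] += 1
    return dict(counts)


# Illustrative data: user 1 has two events in the first minute and one in
# the second; user 2 has one event in the second minute.
sample = [(0, 1), (59_999, 1), (60_000, 1), (61_000, 2)]
result = events_per_min(sample)
```

Each key in `result` corresponds to one row of the table: one count per user per one-minute window.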
A pipeline is a group of SQL statements that work together to define an end-to-end process.
KSQL Dependencies
KSQL Pipelines
[Diagram labeled "Streams and Tables" and "SQL Scripts"]
clickstream
clickstream_codes
enriched_error_codes
enriched_error_codes_count
customer_clickstream
user_clickstream
web_users
click_user_sessions
Dependencies within a pipeline
Dependencies across pipelines
Manage all KSQL dependencies in one place
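One way to picture managing dependencies in one place is as a dependency graph that yields a safe create order and a safe drop order. The sketch below uses Python's standard `graphlib` for this. The object names come from the slides, but the edges in `DEPENDS_ON` are assumptions made for illustration, and this is not how any particular plugin is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each stream or table -> the objects it reads.
# The names appear on the slides; the edges are assumed for this example.
DEPENDS_ON = {
    "clickstream": [],
    "clickstream_codes": [],
    "web_users": [],
    "enriched_error_codes": ["clickstream", "clickstream_codes"],
    "enriched_error_codes_count": ["enriched_error_codes"],
    "user_clickstream": ["clickstream", "web_users"],
    "customer_clickstream": ["clickstream", "web_users"],
    "click_user_sessions": ["user_clickstream"],
}


def create_order(depends_on):
    """Return an order in which every object follows what it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def drop_order(depends_on):
    """Return the reverse order, so dependents are dropped before sources."""
    return list(reversed(create_order(depends_on)))


creates = create_order(DEPENDS_ON)
drops = drop_order(DEPENDS_ON)
```

With one graph covering every pipeline, the same ordering logic handles dependencies both within a pipeline and across pipelines.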
Multiple statements per pipeline
Statements are in script files
Script files are in Git
Developers need to iterate
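Because several statements live in one script file, any tooling has to break a script into individual statements before it can submit them one at a time. A minimal Python sketch of that step follows. It naively splits on `;` and skips `--` comment lines, so it assumes no semicolons appear inside string literals; the function name and the sample script are hypothetical.

```python
def split_statements(script: str) -> list[str]:
    """Naively split a KSQL script into single-line statements."""
    statements = []
    for chunk in script.split(";"):
        # Keep non-empty lines that are not full-line "--" comments.
        lines = [
            line.strip()
            for line in chunk.splitlines()
            if line.strip() and not line.strip().startswith("--")
        ]
        if lines:
            statements.append(" ".join(lines) + ";")
    return statements


# Illustrative script holding two statements of one pipeline.
script = """
-- register the source stream
CREATE STREAM clickstream (userid int) WITH (kafka_topic = 'clickstream', value_format = 'json');

CREATE TABLE events_per_min AS
  SELECT userid, count(*) AS events
  FROM clickstream WINDOW TUMBLING (SIZE 60 SECOND)
  GROUP BY userid;
"""
statements = split_statements(script)
```

A production parser would also need to handle quoted semicolons and block comments, which this sketch deliberately ignores.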
What is Gradle?
Gradle is an open-source build automation tool for the Age of Continuous Delivery (CD). Building software is no longer just about compiling, linking and packaging.
Gradle Plugins
gradle-confluent for KSQL pipelines
shadow for KSQL user defined functions
application for Kafka Streams
Plugin
github.com/RedPillAnalytics/gradle-confluent
Unfortunately…
I only have 40 minutes
Unfortunately…
I only have 33 minutes
Demo
...using a Jupyter Notebook
Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryson, Red Pill Analytics) Kafka Summit NYC 2019