SlideShare a Scribd company logo
1 of 36
1
Beyond the DSL - #process
If you’re PAPI and you know it, merge your streams!
Antony Stubbs
Solution Architect EMEA
Confluent
MÜNCHEN - 09. OKTOBER 2018
3
Kafka Streams DSL - the Easy Path
4
DSL - but eventually...
5
Quick Scientific™ survey
Who in the audience uses Kafka in prod?
- Stream processing frameworks?
- Kafka Streams?
- PAPI?
6
Antony Stubbs - New Zealand Made
● @psynikal
● github.com/astubbs
● Confluent for ~2 years
● Consultant in EMEA
● Kafka Streams in my favourite
8
Topologies, Trolls and Troglodytes
9
KStream<Integer, Integer> input =
builder.stream("numbers-topic");
// Stateless computation
KStream<Integer, Integer> doubled =
input.mapValues(v -> v * 2);
// Stateful computation
KTable<Integer, Integer> sumOfOdds = input
.filter((k,v) -> v % 2 != 0)
.selectKey((k, v) -> 1)
.groupByKey()
.reduce((v1, v2) -> v1 + v2, "sum-of-odds");
What is the DSL?
10
What is the DSL?
11
What is #process?
Flexibility
12
What is #process?
Freedom
13
What is #process?
Power
But with great power...
but not that much… :)
14
What is #process?
KStream#process({magic})
15
What is #process?
interface Processor<K, V> {
void process(K key, V value)
}
16
What is #transform?
interface Transformer<K, V, R> {
R transform(K key, V value)
}
17
What is #process?
interface Processor<K, V> {
public void init(ProcessorContext context)
void process(K key, V value)
<... snip ...>
}
19
PAPI vs DSL
When should you use which?
● “It depends”
● DSL
○ Easy
○ Can do a lot with the blocks
■ Clever with data structures
○ If it fits nicely, use it
● PAPI can be more advanced, but also super flexible
○ Build reusable processors for your company
○ Doesn’t have the “nice“ utility functions like count - but that’s the point
○ Can ”shoot your own foot”
○ Be responsible
● Don’t bend over backwards to fit your model to the DSL
20
A Combination - the Best of Both Worlds
ppvStream.groupByKey().reduce({ newState, ktableEntry ->
<.... reduce function …>
}).toStream().transform({
new Transformer<>((){
<.... do something complicated…>
}
}).mapValues({ v ->
<.... map function …>
}).to("output-deltas-v2")
21
What is a State Store? IQ?
A local database - RocksDB
● K/V Store
● High speed
● Spills to disk
An optimisation?
- Moves the state to the processing
What are Interactive Queries (IQ)?
By nature of Kafka
● Distributed
● Redundant
● Backed onto Kafka (optionally)
● Highly available (Kafka + Standby tasks)
22
Simple:
- Deduplication (vs EOS)
- Secondary indices
- Need to do something periodically
- TTL Caches
- Synchronous state checking
Working With Processors and State Stores
Advanced:
- State recalculation
- Expiring data efficiently
- Global KTable triggered joins (DSL work around)
- Probabilistic counting with custom store implementations...
24
State Subselects - Compound Keys
State stores have the #put, #get and #range method call - this brings some new magic...
Order Items
- Select all orders items from my state store, for this order key
- Avoids building larger and larger compounded values
- Which will take longer and longer to update - using individual entries instead
Time
- Timestamp compounded with key
- Great for retrieving entries within computable time windows (hint hint)
- Great for scanning over a range of entries
25
State Subselects - Secondary Indices
● Think ~“tables”...
● Serving an “Interactive Query” for a few possible fields...
○ Can’t construct compound keys for multiple combinations
○ One State Store per combination
○ Upon inserting the primary key/value
■ Also insert into the extra stores the field<->key mappings
■ Upon query, query against the appropriate store that holds the mappings for the requested field
■ Collect the possibly many values (keys) and retrieve the entire object(s) from the primary store
KEY → VALUE
TYPE+KEY → KEY
EMAIL → KEY
29
Case Study - leveraging state stores
WIPERS = ON WIPERS = OFF WIPERS = ?
32
Dynamic,
dependent,
aggregate,
on demand recalculation,
from -
out of order data.
So, need to go back and recalculate...
33
DSL despair!
Can’t update aggregates outside of the aggregate that has been triggered...
Potential messy DSL solution (bend over backwards) - synthetic events!
- Publish an event back to the topic with the correct time stamp and information needed to retrigger the
other aggregates
- You need to calculate / all / the correct time stamps
- Pollutes the stream with fake data
- Is unnatural / smells
- Breaks out of the KS transaction (EOS)
34
Enter #process()
Keep track of our aggregates ourselves
- Need to calculate our own time buckets
- Time query or store for possible future buckets
- All kept with in the KS context (no producer break out for synthetic events)
35
Case Study - leveraging state stores
36
Punc’tuation.?
● What are Punctuations?
● What is Wall Time?
● What is Event Time?
37
● DSL has window retention periods
○ We need state - but for how long?
○ KTable TTL? (Bounded vs unbounded keyset)
#process window retention period you may ask?
● #process TTL?
○ Using punctuation - scan periodically through the state
store and delete all buckets that are beyond our retention
period
○ Do TTL on “KTable” type data
● How? Compound keys...
38
Future Event Triggers
Expiring credit cards in real time or some other future timed action
- don’t want to poll all entries and check action times
- need to be able to expire tasks
Time as secondary index (either compound key or secondary state store)
- range select on all keys with time value before now
- take action
- emit action taken event (context.forward to specific node or emit)
- delete entry
- poll state store with range select every ~second,
- or schedule next punctuator to run at timestamp of next event
- need to update
39
Database Changelog Derivation
Problem:
● DB CDC doesn’t emit deltas, only full state
● Can’t see what’s changed in the document
Solution:
● Derive deltas from full state, stored in a stateful stream processor
● Can use KTable tuples
Issue:
● No TTL - enter PAPI
40
Distributed one to MANY MANY MANY late (maybe) joins
What’s the problem?
- Effectively re-joining missed joins once the right hand side arrives
52
Sometimes it’s useful to avoid some DSL overheads
● Combine operators
● Avoid repartitioning in some cases
● etc...
Optimising Topologies
Beware inconsistent hashing...
53
Speaking of topology optimisation...
Two phase topology building
- First optimisation is reusing intermediate rekey topics
- Avoids branched on demand rekey further down the DAG by detecting the rekey and moving it up
immediately
- manually achievable by forcing rekey straight away with #through
Global topology optimisation coming in 2.1
54
Now I did ask about KSQL after all...
Check out it KSQL if you haven’t already…
● Abstraction over Kafka Streams
● Languages outside of the JVM
● Non programmers
● Among others...
KSQL User Defined Functions in CP 5.0!
● Parallels with Processors combined with the DSL, you can now insert more complex functionality
into ksql
○ Eg trained machine learning model and use as UDF in KSQL
55
Where to next? & We’re hiring!
● github.com/confluentinc/
kafka-streams-examples
● “Ben Stopford” on youtube.com
● Kafka Streams playlist on confluentinc
youtube
● Consulting services? Contact
sales@confluent.io
Further reading
● confluent.io/resources/
● docs.confluent.io/current/streams/
● confluentinc on Youtube
● github.com/astubbs
● @psynikal
Come find me for Q&A later...
Don’t be afraid of #process and do drop down from the DSL for some
operations!
Join us! https://www.confluent.io/careers/
56
THANK YOU!
Learn more:
confluent.io/download
confluent.io/product/ksql/
confluent.io/confluent-cloud/

More Related Content

What's hot

Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeFlink Forward
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingKostas Tzoumas
 
Samza memory capacity_2015_ieee_big_data_data_quality_workshop
Samza memory capacity_2015_ieee_big_data_data_quality_workshopSamza memory capacity_2015_ieee_big_data_data_quality_workshop
Samza memory capacity_2015_ieee_big_data_data_quality_workshopTao Feng
 
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...Flink Forward
 
Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...
Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...
Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...Flink Forward
 
HPBigData2015 PSTL kafka spark vertica
HPBigData2015 PSTL kafka spark verticaHPBigData2015 PSTL kafka spark vertica
HPBigData2015 PSTL kafka spark verticaJack Gudenkauf
 
Visualizing a global DNS network with open source tools
Visualizing a global DNS network with open source toolsVisualizing a global DNS network with open source tools
Visualizing a global DNS network with open source toolsAPNIC
 
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Guozhang Wang
 
Aggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream WindowsAggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream WindowsParis Carbone
 
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkTran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkFlink Forward
 
Predictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkPredictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkDongwon Kim
 
Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingKostas Tzoumas
 
Streaming Data from Cassandra into Kafka
Streaming Data from Cassandra into KafkaStreaming Data from Cassandra into Kafka
Streaming Data from Cassandra into KafkaAbrar Sheikh
 
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulBetter Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulHostedbyConfluent
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comknowbigdata
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream ProcessingGyula Fóra
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...Flink Forward
 
Looking towards an official cassandra sidecar netflix
Looking towards an official cassandra sidecar   netflixLooking towards an official cassandra sidecar   netflix
Looking towards an official cassandra sidecar netflixVinay Kumar Chella
 
Apache Gearpump next-gen streaming engine
Apache Gearpump next-gen streaming engineApache Gearpump next-gen streaming engine
Apache Gearpump next-gen streaming engineTianlun Zhang
 

What's hot (20)

Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Samza memory capacity_2015_ieee_big_data_data_quality_workshop
Samza memory capacity_2015_ieee_big_data_data_quality_workshopSamza memory capacity_2015_ieee_big_data_data_quality_workshop
Samza memory capacity_2015_ieee_big_data_data_quality_workshop
 
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...
 
Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...
Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...
Flink Forward Berlin 2017: Andreas Kunft - Efficiently executing R Dataframes...
 
HPBigData2015 PSTL kafka spark vertica
HPBigData2015 PSTL kafka spark verticaHPBigData2015 PSTL kafka spark vertica
HPBigData2015 PSTL kafka spark vertica
 
Visualizing a global DNS network with open source tools
Visualizing a global DNS network with open source toolsVisualizing a global DNS network with open source tools
Visualizing a global DNS network with open source tools
 
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
 
Aggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream WindowsAggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream Windows
 
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkTran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
 
Predictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkPredictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache Flink
 
Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream Processing
 
Streaming Data from Cassandra into Kafka
Streaming Data from Cassandra into KafkaStreaming Data from Cassandra into Kafka
Streaming Data from Cassandra into Kafka
 
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulBetter Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
 
Zurich Flink Meetup
Zurich Flink MeetupZurich Flink Meetup
Zurich Flink Meetup
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.com
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream Processing
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
 
Looking towards an official cassandra sidecar netflix
Looking towards an official cassandra sidecar   netflixLooking towards an official cassandra sidecar   netflix
Looking towards an official cassandra sidecar netflix
 
Apache Gearpump next-gen streaming engine
Apache Gearpump next-gen streaming engineApache Gearpump next-gen streaming engine
Apache Gearpump next-gen streaming engine
 

Similar to Beyond the DSL - Unlocking the power of Kafka Streams with the Processor API

Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...confluent
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataRyan Bosshart
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Data Con LA
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsVoltDB
 
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp0220140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02Francisco Gonçalves
 
Get More Out of MySQL with TokuDB
Get More Out of MySQL with TokuDBGet More Out of MySQL with TokuDB
Get More Out of MySQL with TokuDBTim Callaghan
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streamingdatamantra
 
Testing Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with SherlockTesting Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with SherlockScyllaDB
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Intel® Software
 
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...confluent
 
Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit
 
Software Engineering Advice from Google's Jeff Dean for Big, Distributed Systems
Software Engineering Advice from Google's Jeff Dean for Big, Distributed SystemsSoftware Engineering Advice from Google's Jeff Dean for Big, Distributed Systems
Software Engineering Advice from Google's Jeff Dean for Big, Distributed Systemsadrianionel
 
Distributed Task Scheduling with Akka, Kafka and Cassandra
Distributed Task Scheduling with Akka, Kafka and CassandraDistributed Task Scheduling with Akka, Kafka and Cassandra
Distributed Task Scheduling with Akka, Kafka and CassandraDavid van Geest
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Shared Database Concurrency
Shared Database ConcurrencyShared Database Concurrency
Shared Database ConcurrencyAivars Kalvans
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteChris Baynes
 
Spil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NLSpil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NLThijs Terlouw
 

Similar to Beyond the DSL - Unlocking the power of Kafka Streams with the Processor API (20)

Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast Data
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming Aggregations
 
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp0220140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
 
Get More Out of MySQL with TokuDB
Get More Out of MySQL with TokuDBGet More Out of MySQL with TokuDB
Get More Out of MySQL with TokuDB
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 
Testing Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with SherlockTesting Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with Sherlock
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...
 
Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni Schiefer
 
Software Engineering Advice from Google's Jeff Dean for Big, Distributed Systems
Software Engineering Advice from Google's Jeff Dean for Big, Distributed SystemsSoftware Engineering Advice from Google's Jeff Dean for Big, Distributed Systems
Software Engineering Advice from Google's Jeff Dean for Big, Distributed Systems
 
Distributed Task Scheduling with Akka, Kafka and Cassandra
Distributed Task Scheduling with Akka, Kafka and CassandraDistributed Task Scheduling with Akka, Kafka and Cassandra
Distributed Task Scheduling with Akka, Kafka and Cassandra
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Shared Database Concurrency
Shared Database ConcurrencyShared Database Concurrency
Shared Database Concurrency
 
Postgres clusters
Postgres clustersPostgres clusters
Postgres clusters
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
Spil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NLSpil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NL
 

More from confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

More from confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 

Beyond the DSL - Unlocking the power of Kafka Streams with the Processor API

  • 1. 1 Beyond the DSL - #process If you’re PAPI and you know it, merge your streams! Antony Stubbs Solution Architect EMEA Confluent MÜNCHEN - 09. OKTOBER 2018
  • 2. 3 Kafka Streams DSL - the Easy Path
  • 3. 4 DSL - but eventually...
  • 4. 5 Quick Scientific™ survey Who in the audience uses Kafka in prod? - Stream processing frameworks? - Kafka Streams? - PAPI?
  • 5. 6 Antony Stubbs - New Zealand Made ● @psynikal ● github.com/astubbs ● Confluent for ~2 years ● Consultant in EMEA ● Kafka Streams in my favourite
  • 7. 9 KStream<Integer, Integer> input = builder.stream("numbers-topic"); // Stateless computation KStream<Integer, Integer> doubled = input.mapValues(v -> v * 2); // Stateful computation KTable<Integer, Integer> sumOfOdds = input .filter((k,v) -> v % 2 != 0) .selectKey((k, v) -> 1) .groupByKey() .reduce((v1, v2) -> v1 + v2, "sum-of-odds"); What is the DSL?
  • 11. 13 What is #process? Power But with great power... but not that much… :)
  • 13. 15 What is #process? interface Processor<K, V> { void process(K key, V value) }
  • 14. 16 What is #transform? interface Transformer<K, V, R> { R transform(K key, V value) }
  • 15. 17 What is #process? interface Processor<K, V> { public void init(ProcessorContext context) void process(K key, V value) <... snip ...> }
  • 16. 19 PAPI vs DSL When should you use which? ● “It depends” ● DSL ○ Easy ○ Can do a lot with the blocks ■ Clever with data structures ○ If it fits nicely, use it ● PAPI can be more advanced, but also super flexible ○ Build reusable processors for your company ○ Doesn’t have the “nice“ utility functions like count - but that’s the point ○ Can ”shoot your own foot” ○ Be responsible ● Don’t bend over backwards to fit your model to the DSL
  • 17. 20 A Combination - the Best of Both Worlds ppvStream.groupByKey().reduce({ newState, ktableEntry -> <.... reduce function …> }).toStream().transform({ new Transformer<>((){ <.... do something complicated…> } }).mapValues({ v -> <.... map function …> }).to("output-deltas-v2")
  • 18. 21 What is a State Store? IQ? A local database - RocksDB ● K/V Store ● High speed ● Spills to disk An optimisation? - Moves the state to the processing What are Interactive Queries (IQ)? By nature of Kafka ● Distributed ● Redundant ● Backed onto Kafka (optionally) ● Highly available (Kafka + Standby tasks)
  • 19. 22 Simple: - Deduplication (vs EOS) - Secondary indices - Need to do something periodically - TTL Caches - Synchronous state checking Working With Processors and State Stores Advanced: - State recalculation - Expiring data efficiently - Global KTable triggered joins (DSL work around) - Probabilistic counting with custom store implementations...
  • 20. 24 State Subselects - Compound Keys State stores have the #put, #get and #range method call - this brings some new magic... Order Items - Select all orders items from my state store, for this order key - Avoids building larger and larger compounded values - Which will take longer and longer to update - using individual entries instead Time - Timestamp compounded with key - Great for retrieving entries within computable time windows (hint hint) - Great for scanning over a range of entries
  • 21. 25 State Subselects - Secondary Indices ● Think ~“tables”... ● Serving an “Interactive Query” for a few possible fields... ○ Can’t construct compound keys for multiple combinations ○ One State Store per combination ○ Upon inserting the primary key/value ■ Also insert into the extra stores the field<->key mappings ■ Upon query, query against the appropriate store that holds the mappings for the requested field ■ Collect the possibly many values (keys) and retrieve the entire object(s) from the primary store KEY → VALUE TYPE+KEY → KEY EMAIL → KEY
  • 22. 29 Case Study - leveraging state stores WIPERS = ON WIPERS = OFF WIPERS = ?
  • 23. 32 Dynamic, dependent, aggregate, on demand recalculation, from - out of order data. So, need to go back and recalculate...
  • 24. 33 DSL despair! Can’t update aggregates outside of the aggregate that has been triggered... Potential messy DSL solution (bend over backwards) - synthetic events! - Publish an event back to the topic with the correct time stamp and information needed to retrigger the other aggregates - You need to calculate / all / the correct time stamps - Pollutes the stream with fake data - Is unnatural / smells - Breaks out of the KS transaction (EOS)
  • 25. 34 Enter #process() Keep track of our aggregates ourselves - Need to calculate our own time buckets - Time query or store for possible future buckets - All kept with in the KS context (no producer break out for synthetic events)
  • 26. 35 Case Study - leveraging state stores
  • 27. 36 Punc’tuation.? ● What are Punctuations? ● What is Wall Time? ● What is Event Time?
  • 28. 37 ● DSL has window retention periods ○ We need state - but for how long? ○ KTable TTL? (Bounded vs unbounded keyset) #process window retention period you may ask? ● #process TTL? ○ Using punctuation - scan periodically through the state store and delete all buckets that are beyond our retention period ○ Do TTL on “KTable” type data ● How? Compound keys...
  • 29. 38 Future Event Triggers Expiring credit cards in real time or some other future timed action - don’t want to poll all entries and check action times - need to be able to expire tasks Time as secondary index (either compound key or secondary state store) - range select on all keys with time value before now - take action - emit action taken event (context.forward to specific node or emit) - delete entry - poll state store with range select every ~second, - or schedule next punctuator to run at timestamp of next event - need to update
  • 30. 39 Database Changelog Derivation Problem: ● DB CDC doesn’t emit deltas, only full state ● Can’t see what’s changed in the document Solution: ● Derive deltas from full state, stored in a stateful stream processor ● Can use KTable tuples Issue: ● No TTL - enter PAPI
  • 31. 40 Distributed one to MANY MANY MANY late (maybe) joins What’s the problem? - Effectively re-joining missed joins once the right hand side arrives
  • 32. 52 Sometimes it’s useful to avoid some DSL overheads ● Combine operators ● Avoid repartitioning in some cases ● etc... Optimising Topologies Beware inconsistent hashing...
  • 33. 53 Speaking of topology optimisation... Two phase topology building - First optimisation is reusing intermediate rekey topics - Avoids branched on demand rekey further down the DAG by detecting the rekey and moving it up immediately - manually achievable by forcing rekey straight away with #through Global topology optimisation coming in 2.1
  • 34. 54 Now I did ask about KSQL after all... Check out it KSQL if you haven’t already… ● Abstraction over Kafka Streams ● Languages outside of the JVM ● Non programmers ● Among others... KSQL User Defined Functions in CP 5.0! ● Parallels with Processors combined with the DSL, you can now insert more complex functionality into ksql ○ Eg trained machine learning model and use as UDF in KSQL
  • 35. 55 Where to next? & We’re hiring! ● github.com/confluentinc/ kafka-streams-examples ● “Ben Stopford” on youtube.com ● Kafka Streams playlist on confluentinc youtube ● Consulting services? Contact sales@confluent.io Further reading ● confluent.io/resources/ ● docs.confluent.io/current/streams/ ● confluentinc on Youtube ● github.com/astubbs ● @psynikal Come find me for Q&A later... Don’t be afraid of #process and do drop down from the DSL for some operations! Join us! https://www.confluent.io/careers/