Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluent) Kafka Summit London 2019

1
Riddles of Streaming
Code Puzzles for Fun and Profit
Nick Dearden

4
Out-of-Order Data
After reading these 3 records into a KTable, what is the value for Alice?
Time  Key    Value
T1    Alice  Chelsea
T3    Alice  Liverpool
T2    Alice  Barcelona
A. Depends on the setting of max.task.idle.ms
B. Liverpool
C. Barcelona
D. Who cares, I’m from Manchester!
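The puzzle can be explored with a small stand-in. This is only a sketch: it assumes the table applies every record as an upsert in arrival order and ignores the record timestamp, which is my understanding of how a plain KTable behaved around the time of the talk (the slide itself does not state the answer). `ArrivalOrderTable` is a hypothetical class, not part of Kafka Streams.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a KTable: a plain map updated in arrival order.
final class ArrivalOrderTable {
    private final Map<String, String> latest = new LinkedHashMap<>();

    // Upsert semantics: the timestamp is accepted but deliberately ignored (assumption).
    void upsert(long timestamp, String key, String value) {
        latest.put(key, value);
    }

    String get(String key) {
        return latest.get(key);
    }

    // Replays the three records from the slide in the order they were read.
    static String aliceAfterSlideRecords() {
        ArrivalOrderTable table = new ArrivalOrderTable();
        table.upsert(1L, "Alice", "Chelsea");
        table.upsert(3L, "Alice", "Liverpool");
        table.upsert(2L, "Alice", "Barcelona"); // older timestamp, but read last
        return table.get("Alice");
    }
}
```

Under that assumption the last record read wins, whatever its timestamp says.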

5
How to re-order? (Aside)
● Use a state-store
● Define a window of maximum-lateness
● Use punctuate() to flush
● Similar to the DeDuplication pattern and example (see https://github.com/confluentinc/kafka-streams-examples)
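The bullets above could be sketched roughly as follows. This is a plain-Java model, not Kafka Streams code: a `TreeMap` stands in for the state store, `flush()` plays the role that `punctuate()` would play, and the names `ReorderBuffer` and `maxLatenessMs` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical re-ordering buffer; the TreeMap stands in for a state store.
final class ReorderBuffer {
    private final long maxLatenessMs;
    private final TreeMap<Long, List<String>> byTimestamp = new TreeMap<>();
    private long streamTime = Long.MIN_VALUE;

    ReorderBuffer(long maxLatenessMs) {
        this.maxLatenessMs = maxLatenessMs;
    }

    // Buffer a record and advance the highest timestamp seen so far.
    void add(long timestamp, String value) {
        byTimestamp.computeIfAbsent(timestamp, t -> new ArrayList<>()).add(value);
        streamTime = Math.max(streamTime, timestamp);
    }

    // Called periodically: emit, in timestamp order, every record that is at
    // least maxLatenessMs older than the newest record seen.
    List<String> flush() {
        List<String> out = new ArrayList<>();
        if (byTimestamp.isEmpty()) {
            return out;
        }
        long cutoff = streamTime - maxLatenessMs;
        Map<Long, List<String>> ready = byTimestamp.headMap(cutoff, true);
        for (List<String> values : ready.values()) {
            out.addAll(values);
        }
        ready.clear(); // headMap is a view, so this also removes from the buffer
        return out;
    }
}
```

Records arriving later than the lateness window would still come out in the wrong position, so the window size is a trade-off between latency and ordering.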

6
Stateful Operations
I heard there’s a RocksDB created for every stateful operation, like Joins and Aggregations. How do I manage/backup/recover those DBs?
A. Ensure they are on redundant storage and take periodic snapshots
B. Implement the RocksDBConfigSetter interface
C. Kafka Streams takes care of it for me auto-magically
D. Who needs backups anyway?

7
Fault-Tolerance, powered by Kafka
A key challenge of distributed stream processing is fault-tolerant state.
[Diagram: two KSQL / Kafka Streams App servers, A and B, connected through a Changelog Topic in Kafka]
Server A: “I do stateful stream processing, like tables, joins, aggregations.”
Changelog Topic: “streaming backup” of A’s local state
Changelog Topic: “streaming restore” of A’s local state to B
Server B: “I restore the state and continue processing where server A stopped.”
State is automatically migrated in case of server failure
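The backup-and-restore idea on this slide can be illustrated with a toy model. It is only a sketch under simplifying assumptions: a `List` stands in for the changelog topic and a `HashMap` for the local state, and `ChangelogBackedStore` is a hypothetical name rather than a Kafka Streams class.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a List stands in for the changelog topic, a Map for local state.
final class ChangelogBackedStore {
    private final Map<String, Long> local = new HashMap<>();
    private final List<Map.Entry<String, Long>> changelog;

    ChangelogBackedStore(List<Map.Entry<String, Long>> changelog) {
        this.changelog = changelog;
    }

    // Every local write is also appended to the changelog ("streaming backup").
    void put(String key, long value) {
        local.put(key, value);
        changelog.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
    }

    Long get(String key) {
        return local.get(key);
    }

    // A replacement instance rebuilds its state by replaying the log ("streaming restore").
    static ChangelogBackedStore restore(List<Map.Entry<String, Long>> changelog) {
        ChangelogBackedStore store = new ChangelogBackedStore(changelog);
        for (Map.Entry<String, Long> entry : changelog) {
            store.local.put(entry.getKey(), entry.getValue());
        }
        return store;
    }
}
```

In this model server B never needs access to server A's disk; the log alone is enough to rebuild the state.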

8
RocksDBConfigSetter
Properties streamsConfig = new Properties();
streamsConfig.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class);

9
Handling Bad Input Data
How do you deal with malformed incoming data records, which may cause the library to choke before your own code is ever reached?
A. No need, my data is perfect!
B. Configure an UncaughtExceptionHandler
C. Use a DeserializationExceptionHandler
D. Use a custom exception handler for each topic

10
DeserializationExceptionHandler
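The code from this slide did not survive extraction. As a hedged illustration of the general idea (not a reconstruction of the slide), a handler is typically selected through the Streams configuration. The sketch below uses plain string keys so that it compiles without Kafka on the classpath; the key and the built-in `LogAndContinueExceptionHandler` class are the names used by Kafka Streams releases of that period, and `BadInputConfig` is a hypothetical wrapper class.

```java
import java.util.Properties;

// Hypothetical helper that builds only the relevant fragment of a Streams config.
final class BadInputConfig {
    static Properties build() {
        Properties props = new Properties();
        // Log malformed records and skip them instead of stopping the application.
        props.put("default.deserialization.exception.handler",
                  "org.apache.kafka.streams.errors.LogAndContinueExceptionHandler");
        return props;
    }
}
```

A real application would merge these properties with its usual settings (application id, bootstrap servers, serdes) before creating the KafkaStreams instance.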

11
Exception Handling In Your Own Code
How should you handle possible exceptions in your user code to ensure your app can always continue to the next record?
A. Use a custom exception handler for each operation
B. Configure an UncaughtExceptionHandler
C. I’m an l33t coder, my app has no bugs!
D. It depends, maybe I can’t?

12
User Exception Handling
• Some operations can return a default or fall-back value
- e.g. a filter() takes a predicate function. Inside that function you could try..catch..return false;
• Many operations, like map() or a ValueJoiner(), must return something. Note that null has special meaning here!
- you could add a subsequent filter() to remove the nulls
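Both patterns above can be shown in miniature. This sketch uses `java.util.stream` as a stand-in for a KStream so that it runs without Kafka; the class and method names are hypothetical, and the same try/catch shape would go inside the lambda passed to the real filter() or map().

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

final class SafeOperations {
    // Predicate with a fall-back: a record that cannot be parsed is simply dropped.
    static boolean isPositive(String raw) {
        try {
            return Integer.parseInt(raw) > 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // A mapper must return something; here null marks a record that failed.
    static Integer parseOrNull(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // map() followed by a filter() that removes the nulls.
    static List<Integer> parseAll(List<String> input) {
        return input.stream()
                .map(SafeOperations::parseOrNull)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }
}
```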

13
Handling Bad Output Data
How do you handle the case where output records cannot be produced?
A. Use a SerializationExceptionHandler
B. No need, my brokers are guaranteed 100% available!
C. Configure an UncaughtExceptionHandler
D. Use a ProductionExceptionHandler

15
CPUs, Threads, Topologies
When thinking about scaling out (and ignoring standby tasks for a moment), the maximum number of useful stream-threads I can assign to my app is...?
A. Max(partition count for any input topic)
B. Sum(partition count of all my input topics)
C. It depends…
D. The number of vCPUs in each server

16
Topologies, Tasks, & Partitions
• Topologies are divided into sub-topologies at read-write boundaries
- Read-process-write loop
• Within a sub-topology, tasks created for the max input partition count
- If multiple input topics, they are being co-processed, e.g. joins
- Internal topics, such as *-rekey ones, are counted too
• Each task is assigned to at most one StreamThread
- A StreamThread results in at least 3 JVM threads being created
- A StreamThread has its own Consumer and Producer instance
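The task-count rule above can be written down as a small calculation. This is a sketch that takes the bullets at face value: one task per partition of the widest input topic in each sub-topology, summed over sub-topologies. `TaskMath` and its input shape are hypothetical.

```java
import java.util.Collections;
import java.util.List;

final class TaskMath {
    // Each inner list holds the partition counts of one sub-topology's input
    // topics, including any internal re-key topics it reads from.
    static int totalTasks(List<List<Integer>> subTopologies) {
        int tasks = 0;
        for (List<Integer> inputPartitionCounts : subTopologies) {
            tasks += Collections.max(inputPartitionCounts);
        }
        return tasks;
    }
}
```

Since each task is assigned to at most one StreamThread, threads beyond this total would have nothing to do, which is why the answer depends on the topology rather than on a single topic or on the CPU count.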

17
Topologies, Tasks, & Partitions
Divide a topology into read-process-write sub-topologies
Thanks to Andy Bryant for the diagram!

18
Joins
What else should I do, if both my input topics have the same keys, to join them together?
A. Ensure they have the same partition counts
B. Nothing - It Just Works™
C. Dance naked around an oak tree at full moon
D. It depends…

19
Joining Out-of-Order Data
● Joins are always made in event-time
● Event-time advances as new input records are consumed
● Prior to KIP-353 (Apache Kafka 2.1) this could be problematic, especially if uneven traffic rates across partitions
● New config max.task.idle.ms to trade latency vs possibility of out-of-order data across partitions
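The latency-versus-ordering trade-off behind max.task.idle.ms can be modelled in a few lines. This is a deliberately simplified sketch, not Kafka Streams' actual implementation: it picks the buffered record with the lowest timestamp, and when some partition has nothing buffered it waits until the idle budget is used up before going ahead. `PartitionPicker`, `idledMs`, and `maxTaskIdleMs` are hypothetical names.

```java
import java.util.Deque;
import java.util.List;
import java.util.OptionalInt;

final class PartitionPicker {
    // Each deque holds the buffered record timestamps of one input partition, oldest first.
    // Returns the index of the partition to process next, or empty to keep waiting.
    static OptionalInt next(List<Deque<Long>> partitions, long idledMs, long maxTaskIdleMs) {
        boolean anyEmpty = false;
        int best = -1;
        for (int i = 0; i < partitions.size(); i++) {
            Long head = partitions.get(i).peekFirst();
            if (head == null) {
                anyEmpty = true;
            } else if (best < 0 || head < partitions.get(best).peekFirst()) {
                best = i;
            }
        }
        // An empty partition might still deliver an older record, so wait for it
        // until the idle budget is exhausted, then process what is available.
        if (best < 0 || (anyEmpty && idledMs < maxTaskIdleMs)) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(best);
    }
}
```

A larger idle budget gives slow partitions more time to catch up, at the cost of added latency when one of them is genuinely quiet.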

20
Joining Same-Time Data
● E.g. capturing DBMS change events via CDC
● Data changes occur within a transaction, at the exact same timestamp
● In the case of a stream-table join, it matters for the join which event is consumed first
● Consider a Custom TimestampExtractor – but not a panacea!

21
Re-keying and re-partitioning a topic
Do I need to explicitly create a re-key topic, used with .to("topic") or through("topic")?
A. Only if the moon is full
B. Only if there is no subsequent join or groupBy step
C. Kafka Streams takes care of it for me auto-magically
D. Yes, always

22
Re-Partitioning
• Kafka Streams auto-creates a *-rekey topic for you, with correct
partition count, when you selectKey followed by a key-dependent op
• The third line is internally re-written like this:
• If no subsequent join or groupBy, you have to pre-provision the topic

25
Uncaught Exception Handling

26
CREATE STREAM vip_actions AS
SELECT userid, page, action, zipcode
FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
