With increasing data volumes typically comes a corresponding increase in (non-windowed) batch processing times, and many companies have looked to streaming as a way of delivering data processing results faster and more reliably. Event Driven Architectures further enhance these offerings by breaking centralised Data Platforms into loosely coupled and distributed solutions, supported by linearly scalable technologies such as Apache Kafka™ and Kafka Streams.
However, there remains a problem of how to handle changes to operational systems: if a record is the result of business logic, and that business logic changes, what do we do? Do we recalculate everything on the fly, adding in additional latencies for all data requests and potentially breaching non-functional requirements? Or do we run a batch job, risking that incorrect data will be served whilst the job is running?
This talk covers how 6point6 leveraged Kafka and Kafka Streams to transition a customer from a traditional business flow onto an Event Driven Architecture, with business logic triggered directly by real-time events across over 3,000 loosely coupled business services, whilst ensuring that the active development of these services (and the logic and models they contain) would not affect components which relied on data served by the platform.
Learn how we:
– Used versioning of topics, data and business logic to facilitate iterative development, ensuring that the reprocessing of large volumes of data would not result in incorrect or stale data being delivered.
– Handled distributed versioning of JSON event messages between separate teams/services, using discovery, automated contract negotiation and version porting.
– Developed technical patterns and libraries to allow rapid development and deployment of new event driven services using Kafka Streams.
– Developed functionality and approaches for deploying defensive services, including strategies for event retry and failure.
2. The Problem
Changes such as business logic modifications, added enrichments, or bug fixes could
invalidate cached and persisted data.
Consumers also need to be protected from changes made by upstream producers.
How do we:
• Support active development and iterative releases
• Deal with failure pragmatically
• Only serve current and correct data
• Avoid breaching non-functional requirements
With agile approaches we (typically) don’t do big-bang releases but develop business
functionality through frequent incremental releases.
3. Classic approaches don’t meet these needs
Calculate on-demand (pull model): business logic is applied on request.
• No risk of staleness, as data is always fresh
• Cannot use pure streaming, as it needs random access to data
• May breach NFRs if the calculation takes too long or dependent services are slow
Bulk recalculation (λ, push model): the result of the business logic is stored in a persisted store, with the calculation running as a large job.
• Can be used in conjunction with streaming
• Depending on the scenario, tends to be more efficient for large reprocessing
• Risk of staleness or missing data, as a request may happen whilst results are being updated
Stream replay (κ, push model): business logic is applied to single messages or small windows and then sent to another stream or stored/cached.
• Streaming!
• Risk of staleness or missing data
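For concreteness, a minimal Kafka Streams sketch of the push model used by the stream-replay approach above: business logic runs per message and the result is written to a downstream topic. The topic names and the applyBusinessLogic step are illustrative placeholders, not taken from the talk.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class PushModelSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "business-logic-v1");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()))
               // Business logic is applied to each message (or small window)...
               .mapValues(PushModelSketch::applyBusinessLogic)
               // ...and the result is pushed to a downstream topic / store.
               .to("events-enriched", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }

    // Stand-in for the real business logic.
    static String applyBusinessLogic(String payload) {
        return payload.toUpperCase();
    }
}
```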
5. Our approach
• Everything is versioned:
• Importantly, the persisted data is as well
• REST service requests appropriately
versioned persisted data
• Persisted data updated after business
logic change through bulk processing:
• Not really a lambda architecture → only run
bulk processing during uplift
• REST service could request data waiting to be recalculated:
• Have a mechanism to ‘fast track’ calculations of requested stale data (see the sketch below)
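A hedged sketch of the versioning idea on this slide, assuming persisted results carry the business-logic version they were computed with and anything older than the current version is treated as stale. The ResultStore and RecalcRequester interfaces, and CURRENT_LOGIC_VERSION, are illustrative names, not from the talk.

```java
public class VersionedReadSketch {

    static final int CURRENT_LOGIC_VERSION = 7;   // assumption: bumped on each business-logic change

    record PersistedResult(String key, String payload, int logicVersion) {}

    interface ResultStore {
        PersistedResult find(String key);          // e.g. backed by the persisted store
    }

    interface RecalcRequester {
        void fastTrack(String key);                // e.g. puts a message onto a recalc topic
    }

    // Returns fresh data, or triggers a fast-tracked recalculation if the stored result
    // was produced by an older version of the business logic.
    static PersistedResult fetch(String key, ResultStore store, RecalcRequester recalc) {
        PersistedResult result = store.find(key);
        if (result == null || result.logicVersion() < CURRENT_LOGIC_VERSION) {
            recalc.fastTrack(key);                 // stale: ask for it to be recalculated first
            return null;                           // caller decides whether to block, poll or retry
        }
        return result;                             // already computed with the current logic
    }
}
```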
6. Fully streaming
• Single code-base supports both BAU
and stale-data requests
• Need a separate topic and job for fast
tracked recalculation requests to allow
them to be prioritised:
• Cannot currently configure Consumers &
Streams to prioritise a topic (KIP-349)
• Simpler streaming jobs but increased
complexity in the REST services:
• REST services have to poll the database until the version is updated
• REST services have to know how to put messages onto the recalc topic (see the sketch below)
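A minimal sketch of that fully streaming read path, assuming the REST service publishes a fast-track request to a dedicated recalc topic and then polls its database until the stored logic version catches up. The topic name, timeout, and ResultStore interface are assumptions for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FastTrackRecalcSketch {

    static final int CURRENT_LOGIC_VERSION = 7;

    interface ResultStore {
        Integer logicVersionFor(String key);   // version of the persisted result, or null if absent
        String payloadFor(String key);
    }

    static String fetchWithFastTrack(String key, ResultStore store,
                                     KafkaProducer<String, String> producer) throws InterruptedException {
        Integer stored = store.logicVersionFor(key);
        if (stored == null || stored < CURRENT_LOGIC_VERSION) {
            // Dedicated topic so fast-tracked work can be prioritised over the bulk uplift
            // (Kafka cannot currently prioritise one topic over another - see KIP-349).
            producer.send(new ProducerRecord<>("recalc-requests", key, key));

            // Poll until the streaming job has written a result at the current logic version.
            Instant deadline = Instant.now().plus(Duration.ofSeconds(10));
            while (Instant.now().isBefore(deadline)) {
                Integer now = store.logicVersionFor(key);
                if (now != null && now >= CURRENT_LOGIC_VERSION) {
                    return store.payloadFor(key);
                }
                Thread.sleep(100);
            }
            throw new IllegalStateException("Timed out waiting for recalculation of " + key);
        }
        return store.payloadFor(key);
    }

    static KafkaProducer<String, String> producer() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new KafkaProducer<>(props);
    }
}
```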
7. Encapsulated REST services
• Streaming jobs are simple co-ordinators:
• REST services contain the business logic
and do all data processing
• REST services are simpler and self-contained for stale data requests:
• Don’t need to put messages back onto a
topic
• Requests simply block until the data is
updated
• Topic management is much simpler…
• But the complexity has moved into the
streaming jobs:
• Have to handle failure states for REST calls such as overload, timeouts, transactions, etc.
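A minimal sketch of the coordinator pattern on this slide, assuming the Kafka Streams job holds no business logic and simply delegates each message to a REST call, writing the response downstream. The service URL and topic names are placeholders; real code would add the retry/pause/park/fail handling described later.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class CoordinatorJobSketch {

    static final HttpClient HTTP = HttpClient.newHttpClient();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "coordinator-v1");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()))
               // The streaming job only co-ordinates: business logic lives in the REST service.
               .mapValues(CoordinatorJobSketch::callBusinessService)
               .to("events-processed", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }

    static String callBusinessService(String payload) {
        try {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://business-service/process"))
                    .POST(HttpRequest.BodyPublishers.ofString(payload))
                    .build();
            return HTTP.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (Exception e) {
            // Real code would apply the retry/pause/park/fail strategies from later slides.
            throw new RuntimeException("Business service call failed", e);
        }
    }
}
```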
8. Bulk reprocessing
• λ-style: batch job
• Requires original messages to have been saved to non-streaming store
• If business logic is held within REST services, can simply call them
• If stream job, can copy data back to the original stream or a new one
• Could also have the bulk job run the business logic, but then you have duplication issues
• κ-style: stream replay
• Have to be careful of caches: single topic can’t efficiently serve both the head and random access
• May be better to have two topics: one for live data and one for replays
• Simpler deployment stack as it doesn’t require extra technology → simply reset the offset (see the sketch below)
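One possible shape of the κ-style replay with two topics, as suggested above: rewind to the start of the original events and copy them onto a dedicated replay topic, so the live topic keeps serving the head while the replay is consumed separately. Topic names and the single-partition assignment are simplifications for illustration.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class StreamReplaySketch {

    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "replay-copier");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

            // "Simply reset the offset": assign partition 0 explicitly and rewind to the start
            // (a real replay would cover every partition of the source topic).
            TopicPartition source = new TopicPartition("events", 0);
            consumer.assign(List.of(source));
            consumer.seekToBeginning(List.of(source));

            // Copy historic messages onto the replay topic; a replay-specific streaming job
            // consumes "events-replay" with the new business logic.
            ConsumerRecords<String, String> records;
            do {
                records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    producer.send(new ProducerRecord<>("events-replay", record.key(), record.value()));
                }
            } while (!records.isEmpty());   // sketch only: stops at the first empty poll
        }
    }
}
```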
10. Defensive strategies
• Producers validate all messages against a JSON schema before sending
• Consumers accept a fixed range of schema versions and validate payloads:
• Ensure producers’ standards compliance by ignoring incorrectly headered messages
• Will ignore any messages which fall outside of configured schema/library range
• Messages which fail schema validation are sent to a DLQ
• Schema failures over a configured windowed threshold could kill the job
• Retry and failure functionality
• All interactions with services/systems use patterns such as circuit breakers and
back-pressure
• Baked in libraries such as Micrometer and OpenTracing to enforce integration with the
operations stack
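A sketch of the consumer-side defensive checks described above: verify the schema-version header is within the range this service supports, validate the payload, and route validation failures to a DLQ. The SchemaValidator interface stands in for whichever JSON Schema library is used; header and topic names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

public class DefensiveConsumerSketch {

    static final int MIN_SCHEMA_VERSION = 3;   // fixed range accepted by this consumer
    static final int MAX_SCHEMA_VERSION = 5;

    interface SchemaValidator {
        boolean isValid(String payload, int schemaVersion);
    }

    /** Returns true if the record should be processed, false if it was ignored or dead-lettered. */
    static boolean accept(ConsumerRecord<String, String> record,
                          SchemaValidator validator,
                          KafkaProducer<String, String> dlqProducer) {
        Header versionHeader = record.headers().lastHeader("schema-version");
        if (versionHeader == null) {
            return false;   // not standards compliant: ignore incorrectly headered messages
        }
        int version = Integer.parseInt(new String(versionHeader.value(), StandardCharsets.UTF_8));
        if (version < MIN_SCHEMA_VERSION || version > MAX_SCHEMA_VERSION) {
            return false;   // outside the configured schema/library range: ignore
        }
        if (!validator.isValid(record.value(), version)) {
            // Claims a supported version but fails schema validation: send to the DLQ.
            dlqProducer.send(new ProducerRecord<>("events-dlq", record.key(), record.value()));
            return false;
        }
        return true;
    }
}
```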
11. Retry & failure
• Developers could choose from multiple
retry and failure options
• Could combine strategies:
• Retry 5 times, then pause 5 times, then fail
• Pause and park had exponential time strategies:
• A short pause at first, then give a service progressively more time to recover
• Retry:
• Retry message immediately
• Pause:
• Pauses the job entirely
• Used when ordering is important
• Park:
• Puts the message on hold but keeps processing the topic
• Could use multiple topics or RocksDB
• Used when ordering isn’t important
• Fail:
• Sends message to DLQ immediately
• Schema validation errors and the like
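A hedged sketch of how the retry, pause, park and fail options could compose, e.g. “retry 5 times, then pause 5 times (with exponential backoff), then fail”. The Outcome names and thresholds are illustrative, not the actual library from the talk.

```java
public class RetryStrategySketch {

    enum Outcome { RETRY, PAUSE, PARK, FAIL }

    static final int MAX_RETRIES = 5;
    static final int MAX_PAUSES = 5;

    // Decide what to do with a message after its n-th failed attempt.
    static Outcome decide(int failedAttempts, boolean orderingMatters) {
        if (failedAttempts <= MAX_RETRIES) {
            return Outcome.RETRY;                              // retry the message immediately
        }
        if (failedAttempts <= MAX_RETRIES + MAX_PAUSES) {
            return orderingMatters ? Outcome.PAUSE             // pause the whole job to keep ordering
                                   : Outcome.PARK;             // park the message, keep processing the topic
        }
        return Outcome.FAIL;                                   // give up: send the message to the DLQ
    }

    // Exponential pause: short pause now, then give the downstream service more time to recover.
    static long pauseMillis(int pauseNumber) {
        return (long) (500 * Math.pow(2, pauseNumber - 1));    // 500ms, 1s, 2s, 4s, 8s...
    }
}
```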
12. DLQs
• If all else fails, ask a human to intervene
• May need to kill the job, if message ordering is important
• Need to provide tooling for admins to investigate and replay messages from the DLQ
• Have to think about replays:
• Track attempts so that a message doesn’t keep looping through
• What happens if you replay the topic: do you want to replay dead messages?
• Typically only saw messages on the DLQ during pre-prod testing:
• Retry strategies dealt with most service issues
• Schema enforcement in producer removed chance of corrupt messages
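A sketch of the replay tooling and attempt tracking mentioned above: each replay increments a header so an unprocessable message doesn’t keep looping between the DLQ and the source topic. The header and topic names and the max-replay threshold are assumptions.

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

public class DlqReplaySketch {

    static final int MAX_REPLAYS = 3;

    // Admin tooling calls this to push a dead message back onto its source topic.
    static void replay(ConsumerRecord<String, String> deadMessage,
                       String sourceTopic,
                       KafkaProducer<String, String> producer) {
        int attempts = replayAttempts(deadMessage);
        if (attempts >= MAX_REPLAYS) {
            // Don't keep looping: leave it on the DLQ for a human to investigate.
            throw new IllegalStateException("Message already replayed " + attempts + " times");
        }
        ProducerRecord<String, String> replayed =
                new ProducerRecord<>(sourceTopic, deadMessage.key(), deadMessage.value());
        replayed.headers().add("replay-attempts",
                String.valueOf(attempts + 1).getBytes(StandardCharsets.UTF_8));
        producer.send(replayed);
    }

    static int replayAttempts(ConsumerRecord<String, String> record) {
        Header header = record.headers().lastHeader("replay-attempts");
        return header == null ? 0 : Integer.parseInt(new String(header.value(), StandardCharsets.UTF_8));
    }
}
```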
13. Outcomes
• Reduced data risks surrounding deployments:
• Only correct data would be served via services
• Platform downtime was reduced:
• Allowing upgrade scripts to be run during live hours reduced deployment pressures
• Provided reassurance around NFRs during upgrades:
• Platform was sized to ensure NFRs weren’t breached during deployments
• The customer bought into the trade-off of increased latency for increased correctness
• Provided strong contracts between teams:
• Defensive measures allowed developers to focus on business logic