From Zero to Streaming Healthcare
Alex Kouznetsov
Invitae
Overview
● Our first Kafka Streams project
● Techniques, challenges and lessons learned
● An ambitious number of slides — a fast, high-level overview, less technical detail
Data Engineering
The project
● Sendout to a partner lab
● Us: Python, Django REST, RMQ
● Them: poll-only HTTP API
○ create order for a sample → ok
○ query sample status → status
○ query for test results (by time range) → list of order IDs
○ get report for an order ID → report payload
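A sketch of how that poll-only surface can be modeled in Scala. All names here (PartnerApi, OrderId, SampleId, Report, ApiError) are illustrative, not the lab's actual contract:

    import java.time.Instant

    final case class OrderId(value: String)
    final case class SampleId(value: String)
    final case class Report(orderId: OrderId, payload: String)
    final case class ApiError(message: String)

    // The four poll-only endpoints, with errors as values rather than exceptions.
    trait PartnerApi {
      def createOrder(sample: SampleId): Either[ApiError, OrderId]   // create order for a sample
      def sampleStatus(sample: SampleId): Either[ApiError, String]   // query sample status
      def completedOrders(from: Instant, to: Instant): Either[ApiError, List[OrderId]] // results by time range
      def report(order: OrderId): Either[ApiError, Report]           // report for an order ID
    }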
The plan
● Kafka, Schema Registry, KStreams, Scala
Lesson 1: Data is the Application
Data-first mindset
● Code is transient, data lives on
● … so don’t hide it.
● Data defines behavior (especially in streaming apps)
● Data defines architecture
● Abolition of data ownership
● Every executable unit of business logic is a transformation of state
● Your business is a function
● Doesn’t that look like FP?
○ Applications are pure functions
○ Applications form a declarative pipeline
○ Inputs and outputs are strongly typed
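To make the FP analogy concrete, a minimal sketch (all event names are illustrative) of an application as a pure, strongly typed function from input events to output events:

    sealed trait OrderEvent
    final case class OrderCreated(id: String)                 extends OrderEvent
    final case class ReportReceived(id: String, body: String) extends OrderEvent

    sealed trait Notification
    final case class ReportReady(id: String) extends Notification

    // Stripped of plumbing, the whole application is just this function:
    val notify: OrderEvent => List[Notification] = {
      case ReportReceived(id, _) => List(ReportReady(id))
      case _                     => Nil
    }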
Solving for data-first
● Breaking away from imperative mindset
● Model data to directly represent business logic (i.e. truth)
● Shifting logic from code to data by increasing granularity and precision
○ Higher precision → simpler transformations
○ Harder to refactor data than code
● How much design is enough? Until:
○ It makes sense on paper
○ Ephemeral state is eliminated
○ You don’t need logs any more
● Hypothesis: we can build the whole thing as a streaming app
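One way the "shift logic from code to data" step can look in Scala: replace a loose status string with a precise ADT, so invalid states are unrepresentable and downstream transformations get simpler (a sketch with illustrative names):

    sealed trait SendoutState
    case object Created                                 extends SendoutState
    final case class Submitted(partnerOrderId: String)  extends SendoutState
    final case class Reported(partnerOrderId: String,
                              reportPayload: String)    extends SendoutState
    final case class Failed(reason: String)             extends SendoutState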
Solving it on paper
The core
Lesson 2: Total State
Kafka != Messaging System
● Expect to see those messages again
● Everything must replay consistently
● Everything must replay safely
○ Trickle-down idempotency and boundaries
● Remember what you did — contribute to total state, then aggregate
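A sketch of "contribute to total state, then aggregate" in the Kafka Streams Scala DSL (topic names are hypothetical, and a real app would use Avro values rather than strings):

    import org.apache.kafka.streams.scala.StreamsBuilder
    import org.apache.kafka.streams.scala.ImplicitConversions._
    import org.apache.kafka.streams.scala.Serdes._

    val builder = new StreamsBuilder
    // Every attempt is appended as data; nothing is overwritten.
    val attempts = builder.stream[String, String]("sendout-attempts")
    // The current view is always an aggregate over the full history.
    val state = attempts
      .groupByKey
      .aggregate("")((_, attempt, history) => s"$history|$attempt")
    state.toStream.to("sendout-state")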
Knowing when (and how) to crash
● Almost never :)
● Two crash reasons: bad circumstance, bad data
● Crashing on data is the same as not crashing, plus one manual step
● Don’t waste state: turn everything into data and write to a topic
● Every transformation should produce something (i.e., don’t .foreach())
● Data granularity helps layer strictness in smaller increments and minimizes failure impact
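A sketch of "bad data becomes data": parse failures are routed to an error topic instead of crashing the app, so every transformation produces something (topic names hypothetical):

    import scala.util.Try
    import org.apache.kafka.streams.scala.StreamsBuilder
    import org.apache.kafka.streams.scala.ImplicitConversions._
    import org.apache.kafka.streams.scala.Serdes._

    val builder = new StreamsBuilder
    val raw = builder.stream[String, String]("partner-responses")

    // Parse into Either: the error is a value, not an exception.
    val parsed = raw.mapValues(v => Try(v.toInt).toEither.left.map(_ => s"unparseable: $v"))

    parsed.flatMapValues(_.left.toOption.toList).to("parse-errors") // bad data, kept as data
    parsed.flatMapValues(_.toOption.toList).to("parsed-values")     // good data flows on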
Configuration as code
● Why not our application code?
Topics, schemas and SerDes
Topics and schemas: nice things
● Topic objects are values → “Find usages”
● Serde config is embedded in topic type
● StreamBuilder.from[K,V](Topic[_, _]): KStream[K,V] fixes K,V
● Having to think about (de)serializers: never!
● Topic creation, schema registration and compatibility checks: automated!
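A sketch of the topics-as-values idea (Topic and from are our illustrative names here, not stock Kafka Streams API): the serdes travel with the topic value, so K and V are fixed at every use site:

    import org.apache.kafka.common.serialization.Serde
    import org.apache.kafka.streams.kstream.Consumed
    import org.apache.kafka.streams.scala.StreamsBuilder
    import org.apache.kafka.streams.scala.kstream.KStream

    object TopicSyntax {
      // A topic is a value carrying its name and both serdes.
      final case class Topic[K, V](name: String)(implicit val keySerde: Serde[K],
                                                 val valueSerde: Serde[V])

      implicit class RichBuilder(private val builder: StreamsBuilder) extends AnyVal {
        // Consuming a Topic[K, V] fixes the stream's types; no serde thinking.
        def from[K, V](topic: Topic[K, V]): KStream[K, V] =
          builder.stream[K, V](topic.name)(Consumed.`with`(topic.keySerde, topic.valueSerde))
      }
    }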
Opsing topics
Opsing schemas
Solving scheduling
● Need to periodically rerun aggregates to produce new messages
● Problem: Streams client reacts only to new messages
● Solution: send a “trigger” message and .leftJoin() with KTable (sketched below)
● Needs a repartitioning trick
○ aggregate into a single-valued table, value is a set, or
○ make buckets, or
○ fan out trigger to match all keys (if known in advance)
● 👍 can source triggers from anywhere (we made a cron-like connector)
● Avoid reacting to accumulated triggers
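A sketch of the trigger pattern with the single-key variant of the repartitioning trick (topic names hypothetical; real values would be Avro collections, not joined strings):

    import org.apache.kafka.streams.scala.StreamsBuilder
    import org.apache.kafka.streams.scala.ImplicitConversions._
    import org.apache.kafka.streams.scala.Serdes._

    val builder = new StreamsBuilder

    // Pending work aggregated under one constant key; the value is the "set".
    val pending = builder.stream[String, String]("pending-work")
      .groupBy((_, v) => "all")
      .aggregate("")((_, v, acc) => if (acc.isEmpty) v else s"$acc,$v")

    // Triggers arrive with the same constant key, e.g. from a cron-like connector.
    builder.stream[String, String]("ticks")
      .leftJoin(pending)((_, pendingSet) => pendingSet)
      .filter((_, v) => v != null) // nothing pending, nothing to schedule
      .to("scheduled-batches")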
Solving IO
● Problem: need to call APIs
● Connectors?
● Problem: want to handle API calls as side effects in streaming context
● Solution: doing it in place works
○ For fast/idempotent effects
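Doing IO in place, sketched: the call happens inside mapValues and the outcome (success or failure) is emitted as data, which only works for fast, idempotent effects. callApi is a stand-in for a real client:

    import scala.util.Try
    import org.apache.kafka.streams.scala.StreamsBuilder
    import org.apache.kafka.streams.scala.ImplicitConversions._
    import org.apache.kafka.streams.scala.Serdes._

    def callApi(payload: String): String = s"ok: $payload" // hypothetical, fast + idempotent

    val builder = new StreamsBuilder
    builder.stream[String, String]("api-requests")
      .mapValues(p => Try(callApi(p)).fold(e => s"error: ${e.getMessage}", identity))
      .to("api-responses") // the outcome is recorded as data either way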
Solving ordering
● Sometimes request and response objects are dependent enough to need co-partitioning
● Total ordering ensures consistency of the log
● Case study: consecutive time interval queries
Transparency: topology diagrams
● Writing stream apps is fun at first, but then the topology grows
● Problem: understanding how the large application works
● Solution: Topology.describe() all the things and glue
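Topology.describe() is stock Kafka Streams API; the "glue" is rendering its output into diagrams. A minimal sketch:

    import org.apache.kafka.streams.scala.StreamsBuilder
    import org.apache.kafka.streams.scala.ImplicitConversions._
    import org.apache.kafka.streams.scala.Serdes._

    val builder = new StreamsBuilder
    builder.stream[String, String]("orders").to("orders-out")
    // Prints every source, processor, store and sink, ready for graphing.
    println(builder.build().describe())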
Transparency: metrics
● JMX/Prometheus metrics from CP Helm charts are OK for many cases
● For others, we make a streaming app :)
● Aggregate the metrics we need, then two options:
○ Spin up a Prometheus server thread, reading from state store
○ Push to push-gateway
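A sketch of the push-gateway option, assuming the standard Prometheus Java simpleclient; in the real app the gauge value comes from the aggregated state store:

    import io.prometheus.client.{CollectorRegistry, Gauge}
    import io.prometheus.client.exporter.PushGateway

    val registry = new CollectorRegistry
    val pendingGauge = Gauge.build("sendout_pending", "Pending sendouts").register(registry)
    pendingGauge.set(42) // would be read from the streams state store
    new PushGateway("pushgateway:9091").pushAdd(registry, "sendout-metrics")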
Transparency: tracing
● Zipkin with customized Brave integration
● Write traces to a topic (using a Kafka Reporter from Zipkin API)
● Provide DSL support for emitting spans with tags and annotations
Scala + FP + cats = MEOW
● FP: interesting buy-in dynamics, safer and more generic code in the long run
● Errors as values, natural conversion of effects/errors to streams
● Cats makes FP better, also excellent onboarding tool
● Type class pattern helps solve Avro and Serdes
● Managed errors, IO and Effects in tests
● Tagless Final → implement and test components with ease
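How the type class pattern can solve serdes, sketched (AvroCodec is our illustrative type class, not a library API): one instance per data type, and a Kafka Serde is derived for free:

    import org.apache.kafka.common.serialization.{Serde, Serdes => JSerdes}

    trait AvroCodec[A] {
      def encode(a: A): Array[Byte]
      def decode(bytes: Array[Byte]): A
    }

    object AvroCodec {
      // Any A with an AvroCodec instance gets a Serde summoned implicitly.
      implicit def serde[A](implicit c: AvroCodec[A]): Serde[A] =
        JSerdes.serdeFrom(
          (_: String, a: A) => c.encode(a),
          (_: String, bytes: Array[Byte]) => c.decode(bytes)
        )
    }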
Testing
● TopologyTestDriver + multiple simultaneous topologies = integration-like unit tests
● Modular emulators for external systems (both embeddable and standalone)
● Time provider connected to TTD’s own time base
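A minimal TopologyTestDriver sketch (createInputTopic/createOutputTopic are the Kafka 2.4+ test-utils API; topic names are illustrative):

    import java.util.Properties
    import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}
    import org.apache.kafka.streams.scala.StreamsBuilder
    import org.apache.kafka.streams.scala.ImplicitConversions._
    import org.apache.kafka.streams.scala.Serdes._
    import org.apache.kafka.streams.{StreamsConfig, TopologyTestDriver}

    val builder = new StreamsBuilder
    builder.stream[String, String]("orders").mapValues(_.toUpperCase).to("orders-upper")

    val props = new Properties
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234")

    val driver = new TopologyTestDriver(builder.build(), props)
    val in  = driver.createInputTopic("orders", new StringSerializer, new StringSerializer)
    val out = driver.createOutputTopic("orders-upper", new StringDeserializer, new StringDeserializer)

    in.pipeInput("k1", "report")
    assert(out.readValue() == "REPORT")
    driver.close()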
Topic planning
● Not all topics are created equal (primary vs derived, internal vs exposed)
● Consider long-term retention and long-lived schemas
● Think about injection points
How did we do?
● Pretty well
● Launched on time with few surprises
○ Sudden offset loss, some non-idempotency leakage
○ IO outliving transactions, checkpoints, internal stores
○ Request/response dependency guesses were almost all correct
● System is fault-tolerant and self-recovering
● Framework for onboarding
● Were Streams the right choice?
○ Principled functional pipeline
○ EXACTLY_ONCE
○ Tracing, topology generation
○ Not all use cases, but many
Future TODOs (WIP)
● Open source more, externalize blog
● Improved topology derivation
● Better declarative side effects (KStreams DSL, topology)
● Formalize decoupled IO apps (micro sagas)
https://github.com/invitae/
unthingable@GH
alexk@invitae.com
Q & A

From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invitae) Kafka Summit SF 2019
