Case-Study: Building Real-Time Applications at Scale-Cyclist Crash Detection with Tomas Neubauer

Quix Streams — Kafka Summit 2023 | 1
Quix Streams
Building Real-Time Applications at Scale

Tomas Neubauer
Previously McLaren technical lead
CTO & Co-founder, Quix
Hello, nice to meet you! 👋

Racing background
Roots in real-time data processing in the most extreme, time-critical environment.
● 50,000 channels per car
● 1.5 kHz per channel
● 1,000s of real-time models and simulations

Now raise your hand if you are using…

Kafka

Streaming

Python

Goal
Crash detection
[Diagram labels: fitness app, phone-data, crashes]

Content
● Architecture
● ML deployment
● Streaming landscape
● How it works
● Demo - Let's build it

ML Deployment
REST API vs Streaming

Architecture
Crash detection
[Diagram labels: phone-data, Websocket gateway, alerts, Websocket gateway, analysis & training, trained model, SageMaker]

ML Deployment with API
API request (sent to the web API service):
gX   gY   gZ   gTotal
0.5  0.3  0.1  0.9
API response:
gX   gY   gZ   gTotal  Crash
0.5  0.3  0.1  0.9     1
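The request/response exchange on this slide can be sketched with a plain function standing in for the web API handler. This is only an illustration: `handle_request` and `CRASH_THRESHOLD_G` are hypothetical names, and the threshold rule is an assumed stand-in for the trained model, chosen only so that the slide's example row comes back flagged.

```python
import json

# Hypothetical threshold standing in for the trained model; chosen only so
# that the example row from the slide (gTotal = 0.9) is flagged as a crash.
CRASH_THRESHOLD_G = 0.8


def handle_request(body: str) -> str:
    """Simulate the web API: parse the request, score it, return the response."""
    row = json.loads(body)
    row["Crash"] = 1 if row["gTotal"] >= CRASH_THRESHOLD_G else 0
    return json.dumps(row)


# The request carries the four input columns; the response adds "Crash".
request = json.dumps({"gX": 0.5, "gY": 0.3, "gZ": 0.1, "gTotal": 0.9})
response = json.loads(handle_request(request))
print(response)
```

Every reading costs one full round trip in this model, which is the overhead the next slides discuss.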

Issues with REST APIs
REST API vs Streaming

Problems with REST API
[Diagram: API request (gX, gY, gZ, gTotal) sent to the web API service]
● CPU overhead
● Added latency
● Requests get lost during service downtime or slow performance

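The lost-request problem can be illustrated with a small simulation. It is a sketch under stated assumptions, not a model of any real client: the `Service` class is hypothetical, the outage is scripted, and a `deque` stands in for a topic that buffers readings while the service is down.

```python
from collections import deque


class Service:
    """A scoring service that can be taken offline (hypothetical)."""

    def __init__(self) -> None:
        self.up = True
        self.processed = []

    def handle(self, reading: dict) -> None:
        if not self.up:
            raise ConnectionError("service unavailable")
        self.processed.append(reading)


readings = [{"id": i, "gTotal": 0.9} for i in range(5)]

# REST style: the caller pushes each reading; anything sent during the outage
# is lost unless the client implements its own retry logic.
rest_service = Service()
lost = []
for reading in readings:
    rest_service.up = reading["id"] not in (2, 3)  # scripted outage
    try:
        rest_service.handle(reading)
    except ConnectionError:
        lost.append(reading["id"])

# Streaming style: readings are appended to a buffer (standing in for a topic)
# and the service pulls them whenever it is available.
topic = deque(readings)
stream_service = Service()
stream_service.up = False  # outage: nothing is consumed, nothing is lost
backlog_during_outage = len(topic)
stream_service.up = True   # service recovers and drains the backlog
while topic:
    stream_service.handle(topic.popleft())

print("REST processed:", [r["id"] for r in rest_service.processed], "lost:", lost)
print("Stream processed:", [r["id"] for r in stream_service.processed])
```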

Stream processing applications
An overview of stream processing approaches

Stream processing applications
When you build stream processing applications with Kafka, there are two options:
1. Build an application that uses the Kafka producer and consumer APIs directly
2. Adopt a full-fledged stream processing framework (Flink, Spark Streaming, Beam, etc.)

Kafka producer and consumer APIs
● Works for simple tasks such as one-message-at-a-time processing
● No external dependencies such as the JVM
● Gets very complicated when stateful processing is needed, such as calculating aggregations or joining multiple streams
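One-message-at-a-time processing can be sketched as a pure function applied to each message in a loop. The sketch below deliberately avoids a real Kafka client: plain lists stand in for the input and output topics, and `add_g_total` and `run` are hypothetical names. Computing gTotal as the sum of absolute axis values is an assumption that matches the example row on the slides (0.5, 0.3, 0.1 giving 0.9); the talk does not state the formula.

```python
import json
from typing import Iterable, Iterator


def add_g_total(message: bytes) -> bytes:
    """Process one message in isolation: decode, enrich, re-encode."""
    row = json.loads(message)
    # Assumed formula: sum of absolute axis values, rounded for readability.
    row["gTotal"] = round(abs(row["gX"]) + abs(row["gY"]) + abs(row["gZ"]), 3)
    return json.dumps(row).encode("utf-8")


def run(consumed: Iterable[bytes]) -> Iterator[bytes]:
    """Stand-in for the consume -> process -> produce loop."""
    for message in consumed:
        yield add_g_total(message)


input_topic = [json.dumps({"gX": 0.5, "gY": 0.3, "gZ": 0.1}).encode("utf-8")]
output_topic = list(run(input_topic))
print(output_topic[0])
```

Because each message is handled without reference to any other, no state needs to be stored or recovered, which is why this case stays simple.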

Stream processing frameworks
● Full-fledged stream processing frameworks solve stateful, more complex operations
● But this comes at the cost of increased complexity in many dimensions:
○ Java dependency
○ Deployment gets difficult because the code does not run on its own but in a server-side cluster (Flink cluster or Spark cluster)
○ Debugging is difficult
○ Performance optimization is difficult
○ Gets even worse when synchronous and asynchronous architectures are combined in one application

JAR files…

Connecting Flink to Kafka is difficult

SQL looks easy to use but…

UDFs are nasty
● Poor development experience
○ Logs are only accessible from the server; no debugging possible
● Performance hit caused by the interface between the JVM and Python

DEBUGGING!!! 🐛🐛🐛

Is there a third way?
● Combining the Kafka API approach with a stream processing library
● Abstraction from the key-value messages of the Kafka API to virtual tables
● Standalone library that runs:
○ Locally for development and debugging
○ In Docker or Kubernetes for production deployments at scale

Stateful processing with Pub & Sub client libraries
1. Messages in topic
2. Split messages into individual streams
3. Message converted to tables
4. Messages decomposed into rows
5. Memory state updated from incoming rows/series
6. State persistence
7. State and incoming data are combined into output that is sent to the output topic
Commit offsets
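The seven steps can be traced in a few lines of plain Python to show how much bookkeeping falls on the application when only client libraries are used. Everything here is a stand-in chosen for illustration: tuples represent Kafka records, dictionaries represent the in-memory state and the persistent store, and the per-rider count and maximum are hypothetical aggregations not taken from the talk.

```python
import json
from collections import defaultdict

# 1. Messages in the topic: (offset, key, value) tuples stand in for records.
topic = [
    (0, "rider-a", json.dumps({"gTotal": 0.2})),
    (1, "rider-b", json.dumps({"gTotal": 0.4})),
    (2, "rider-a", json.dumps({"gTotal": 0.9})),
]

state = defaultdict(lambda: {"count": 0, "max_g": 0.0})  # 5. in-memory state
state_store = {}       # 6. stand-in for a persistent state store
output_topic = []      # 7. stand-in for the output topic
committed_offset = -1  # last offset acknowledged to the broker

for offset, key, value in topic:
    stream_state = state[key]              # 2. one state per stream (message key)
    row = json.loads(value)                # 3./4. decode the message into a row
    stream_state["count"] += 1             # 5. update state from the incoming row
    stream_state["max_g"] = max(stream_state["max_g"], row["gTotal"])
    state_store[key] = dict(stream_state)  # 6. persist a copy of the state
    output_topic.append((key, {**row, **stream_state}))  # 7. state + row -> output
    committed_offset = offset              # commit only after state and output are written

print(output_topic[-1], committed_offset)
```

The ordering at the end of the loop matters: committing before the state is persisted would lose an update if the process crashed between the two steps.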

Quix Streams
1. Messages in topic
2. Messages decomposed as rows available via pandas API
3. Messages processed through pipeline defined as pandas operations. Output streamed to output topic.
● Automatic state management
● Automatic checkpointing
● Automatic message serialization/deserialization
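To make "pipeline defined as pandas operations" concrete, here is a minimal sketch that uses pandas alone. It does not call the Quix Streams API, whose function names are not shown in these slides; the `pipeline` function, the threshold rule standing in for the trained model, and the second sample row are all assumptions for illustration. The column names and the first row come from the slides.

```python
import pandas as pd

# Rows as they might look once messages have been deserialized into a table.
rows = pd.DataFrame(
    {
        "gForceX": [0.5, 0.1],
        "gForceY": [0.3, 0.1],
        "gForceZ": [0.1, 0.1],
    }
)

CRASH_THRESHOLD_G = 0.8  # hypothetical stand-in for the trained model


def pipeline(df: pd.DataFrame) -> pd.DataFrame:
    """Add the derived columns shown on the slides using plain pandas operations."""
    out = df.copy()
    # Assumed formula: sum of absolute axis values, matching the slide's example row.
    out["gForceTotal"] = out[["gForceX", "gForceY", "gForceZ"]].abs().sum(axis=1)
    out["Crash"] = (out["gForceTotal"] >= CRASH_THRESHOLD_G).astype(int)
    return out


result = pipeline(rows)
print(result)
```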

How it works
Kafka + Kubernetes + Python

Our approach to stream processing
Containers
Containers running in Kubernetes scale hand in hand with Kafka for compute scalability.
Kafka
Handle your data reliably and efficiently in memory with Kafka, using Kafka partitions, replication, and persistence to deliver scalability and robustness.
Python
Python gives you flexibility. It lets you transform data, not just query it, from simple filtering to ML use cases like video processing.

Processing with streaming
[Diagram: the app subscribes (SUB) to the input topic and publishes (PUB) to the output topic]
Input topic:
gForceX  gForceY  gForceZ
0.5      0.3      0.1
Output topic:
gForceX  gForceY  gForceZ  gForceTotal  Crash
0.5      0.3      0.1      0.9          1

Scale
[Diagram: SUB from the input topic (gForceX, gForceY, gForceZ = 0.5, 0.3, 0.1), PUB to the output topic (gForceX, gForceY, gForceZ, gForceTotal, Crash = 0.5, 0.3, 0.1, 0.9, 1)]
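Scaling with Kafka relies on partitions: messages with the same key land in the same partition, and partitions are divided among the replicas of a service. The sketch below illustrates that idea only. CRC32 is a stand-in for the hash a real client would use, round-robin is a simplification of consumer-group assignment, and the partition count, replica names, and rider keys are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4             # hypothetical topic configuration
REPLICAS = ["app-0", "app-1"]  # hypothetical service replicas


def partition_for(key: str) -> int:
    """Map a message key to a partition; CRC32 stands in for the client's hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def assign(partitions: int, replicas: list) -> dict:
    """Spread partitions across replicas round-robin (a simplification)."""
    assignment = defaultdict(list)
    for p in range(partitions):
        assignment[replicas[p % len(replicas)]].append(p)
    return dict(assignment)


assignment = assign(NUM_PARTITIONS, REPLICAS)
owner = {p: replica for replica, parts in assignment.items() for p in parts}

# Every message from one rider maps to one partition, hence to one replica.
for rider in ["rider-a", "rider-b", "rider-c"]:
    p = partition_for(rider)
    print(rider, "-> partition", p, "-> handled by", owner[p])
```

Because a given key always reaches the same replica, per-rider state can stay local to that replica.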

Fault tolerant
[Diagram: SUB from the input topic (gForceX, gForceY, gForceZ = 0.5, 0.3, 0.1), PUB to the output topic (gForceX, gForceY, gForceZ, gForceTotal, Crash = 0.5, 0.3, 0.1, 0.9, 1)]
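Fault tolerance in this setup comes from committed offsets: a restarted replica resumes from the last committed position instead of losing or skipping messages. The simulation below is a sketch of that behaviour under simple assumptions. A list stands in for one partition, a dictionary holds the committed offset, and the crash is scripted; `run_consumer` is a hypothetical name.

```python
log = [f"reading-{i}" for i in range(6)]  # stand-in for one partition's messages
state = {"committed": 0}                  # next offset to read, kept outside the process
processed = []


def run_consumer(log, state, processed, crash_at=None):
    """Process from the committed offset; optionally crash before a given offset."""
    for offset in range(state["committed"], len(log)):
        if offset == crash_at:
            raise RuntimeError("replica crashed")
        processed.append(log[offset])
        state["committed"] = offset + 1  # commit after the message is processed


try:
    run_consumer(log, state, processed, crash_at=3)
except RuntimeError:
    pass  # a replacement replica starts and reads the committed offset

offset_after_crash = state["committed"]
run_consumer(log, state, processed)  # resumes where the failed replica stopped
print(processed)
```

In this simplified version each message is processed exactly once because the commit immediately follows processing; a real deployment that commits in batches would typically reprocess a few messages after a crash.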

Let’s build it!
Demo

GitHub
Try Quix Streams

info@quix.io | www.quix.io
Thank you
