Bangalore Meetup - Enable realtime machine learning with streaming data

Fresh Predictions: Using Real-Time Data for Machine Learning
With Redpanda Data Transforms
Christina Lin
The Redpanda Lady

Christina Lin
Developer Advocate, Redpanda
a.k.a. The Redpanda Lady
© 2024 REDPANDA DATA
SOA
WebSphere
DB2
Sybase
Oracle
MQ
J2EE
EJB
DevOps
Microservice
EIP
K8s
Agile
Integration
Data
Mesh
ActiveMQ
Living data stack
Resilience - handle failures and scale gracefully
Elasticity - infrastructure that can scale dynamically
Decentralization - data ownership, empowering individual teams
Performance - low latency and high throughput
Autonomy - self-service, define quality and access
Nimble - efficient data movement
Distributed - distributed data processing for cloud native
Agility - quickly respond to changes in data

Agenda
• Streamlined data ingestion and transformation
• Real-time machine learning
• Demo

LLM
RAG
GenAI
Prompt
Engineering
Natural
Language
Generation
Natural
Language
Processing
Deep
Learning
Vector/ Semantic
search
Neural
Network

Application
LLM
LLM
LLM
How do you build applications with AI?

When is the next eclipse, and where is the best
place to see it?
April 8, 2024 are in
Exmouth, Australia and
East Timor

Application
LLM
LLM
LLM
• Performance problem
• Incorrect, unpredictable result
• Text-based, hard to customize with a small set of data
• $$$$$$$

Events
Events
Events
Event
Data Layer
Model
Prediction
Model
Testing
Model
Training
Machine Learning
Events
Events
Events
Event
Dataset
Dataset
Dataset
Dataset
Dataset
Events
Events
Events
Reference
data
Inference
Model
Registry
APP
APP
Model
Model
Model
Streaming Architecture for AI

Customized
Model
Customized
Model
Customized
Model
LLM
LLM
Better AI implementation
Retrieval
Augmented
Generation
Customized Domain
trained models
Customized Domain
trained models
Fine-tuned

RAG & Stream & EDA
Broker
APP
LLM
Vector
DB
APP
Model
Service
APP
Model
Broker
Aggregate
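
The retrieval step behind the RAG diagram can be illustrated with a minimal sketch. It assumes the documents have already been embedded as vectors. The toy three-dimensional vectors, the sample texts, and the `retrieve` and `build_prompt` helpers are all hypothetical. A real setup would use an embedding model and a vector database rather than an in-memory list.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context):
    """Prepend retrieved context so the LLM can answer from fresh data."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {question}"

# Hypothetical store: (text, embedding) pairs kept up to date from a stream.
store = [
    ("Order 17 was delivered in 25 minutes", [0.9, 0.1, 0.0]),
    ("Courier 4 is offline", [0.0, 0.2, 0.9]),
    ("Order 18 is delayed by traffic", [0.8, 0.3, 0.1]),
]

top = retrieve([1.0, 0.2, 0.0], store, k=2)
prompt = build_prompt("How are recent orders doing?", top)
```

Only the two order-related texts end up in the prompt. The unrelated courier record is ranked last and dropped.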

RAG & Stream & EDA
Broker
NPC1
LLM
Broker
NPC2
LLM
NPC3
LLM
WebSocket
Topic
Topic
Topic

Redpanda in 3 mins
Broker
Zookeeper/
KRaft
JVM
Page
Cache
Page
Cache
Page
Cache
Schema
Registry
Http Proxy Client
Connector
Debezium
Client
Disk

Redpanda in 3 mins
Broker
Zookeeper/
KRaft
JVM
Page
Cache
Page
Cache
Page
Cache
Schema
Registry
Http Proxy Client
Connector
Debezium
Client
Disk
WASM

Stateless
Streaming Pipeline
Transform
Format change, masking, filtering, validating
Dispatch, Wiretap
Split, multiple destination
Control
reroute
Normalize/ Denormalize Enrich
Multiple ingestion
Stateful
Streaming Pipeline
Complex event processing
Time-window based processing
Enrich
Multiple ingestion
Micro batch Pipeline
Transform for large output (Dataset)
Partitioning Split workload
Analytics
batch
Pipeline
Analytics large volume (legacy)
Transform large output (Dataset, legacy)
Transport large unstructured data
Better scalability for pipelines
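
The difference between the stateless and stateful pipelines listed above can be sketched in a few lines. This is an illustration only: the event fields `ts` and `amount` and both helper names are assumptions, not part of the talk.

```python
from collections import defaultdict

def stateless_filter(events, min_amount):
    """Stateless: each record is judged on its own, with no memory between records."""
    return [e for e in events if e["amount"] >= min_amount]

def tumbling_window_counts(events, window_seconds):
    """Stateful: per-window counts must be remembered across records."""
    counts = defaultdict(int)
    for e in events:
        # Align each timestamp to the start of its fixed, non-overlapping window.
        window_start = (e["ts"] // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

events = [
    {"ts": 0, "amount": 5},
    {"ts": 12, "amount": 40},
    {"ts": 59, "amount": 15},
    {"ts": 61, "amount": 8},
]
kept = stateless_filter(events, min_amount=10)
windows = tumbling_window_counts(events, window_seconds=60)
```

The filter could run on any record in isolation. The window count needs state that survives between records, which is why time-window processing sits on the stateful side.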

Data
Pipeline
Broker
Data Ping-Pong
Data
Pipeline
Over the Network - Slow
Data
Pipeline

Redpanda Data Transform
Stateless
Streaming Pipeline Transform
Format change, masking, filtering, validating
Dispatch, Wiretap
Split, multiple destination
Control
reroute
Normalize/ Denormalize
Enrich
Multiple ingestion
WASM
WebAssembly
Binary instruction format for a stack-based VM.
Portable compilation
Go
Rust
JS
Python
Ruby

rpk cloud login
rpk transform init
Choose my fav language! Define transformation rules
rpk transform build
Builds the WebAssembly module
rpk transform deploy --input-topic=customer --output-topic=customer_masked
Deploy transformation to cluster
customer → customer_masked
Replicate across clusters
Redpanda Data Transforms

customer
partition 1
customer_masked
partition 1
Load to cache
Customer age: 34
↓
Customer age: 3*
Transform
Write back to disk with DMA
Thread per Core
(Quick to process data)
Redpanda Data Transform
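
The masking step on this slide (age 34 becomes 3*) can be sketched as a per-record function. This is plain Python that shows only the logic, and it does not use the Redpanda transform SDK. The JSON record shape and the `mask_age` and `transform` names are assumptions. A real transform would live in the project scaffolded by `rpk transform init`.

```python
import json

def mask_age(age):
    """Keep the first digit and replace the rest with '*', e.g. 34 -> '3*'."""
    digits = str(age)
    return digits[0] + "*" * (len(digits) - 1)

def transform(record_value: bytes) -> bytes:
    """Per-record mapping: input topic value in, masked output value out."""
    customer = json.loads(record_value)
    customer["age"] = mask_age(customer["age"])
    return json.dumps(customer).encode("utf-8")

# One record from the hypothetical `customer` topic.
masked = transform(b'{"name": "Ana", "age": 34}')
```

The function is stateless, so it matches the kind of work the slides assign to in-broker transforms.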

Demo - Real-Time Data for Machine Learning
Machine Learning
lifecycle
Data ETL
Feature
Engineering
Model Training
Deploy/Experiment
Prediction
Monitor
Problem

Application
MLOps
Real-time food delivery result - raw data
In-broker processing: process data on the fly in the broker to avoid data ping-pong
Process cleaned features and parameter data set
Continuous real-time data training for ML
Dynamic model updating
Real-time inference
bit.ly/redpanda-india
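
The idea of continuous training with dynamic model updating can be sketched as a model that takes one gradient step per incoming event. The demo itself uses TensorFlow in a notebook. This sketch swaps in a hand-written one-feature linear model to stay dependency-free, and the `(distance_km, delivery_minutes)` event shape is an assumption.

```python
class OnlineLinearModel:
    """One-feature linear regression updated one event at a time with SGD."""

    def __init__(self, learning_rate=0.01):
        self.weight = 0.0
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x + self.bias

    def update(self, x, y):
        """Take one gradient step on the squared error for a single event."""
        error = self.predict(x) - y
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error

# Hypothetical noise-free stream: delivery_minutes = 5 * distance_km + 10.
stream = [(x, 5 * x + 10) for x in (1, 2, 3, 4, 5)] * 2000

model = OnlineLinearModel(learning_rate=0.01)
for distance_km, minutes in stream:
    model.update(distance_km, minutes)  # the model is refreshed as each event arrives

estimate = model.predict(3.5)
```

Because the model is updated per event, the inference side can use the latest weights at any point in the stream without waiting for a batch retraining job.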

redpanda-0 redpanda-1
redpanda-2
redpanda-
console
Redpanda Cluster
Jupyter
Notebook
TensorFlow
Simulator
producer.py
Redpanda Cluster
Simulator
producer.py
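
The simulator in this diagram is not shown in the slides, so the following is only a guess at its general shape: generate fake food-delivery events and hand them to a producer. The field names, the `orders` topic, and the `send` callable (standing in for a Kafka-compatible client) are all hypothetical.

```python
import json
import random

def simulate_orders(count, seed=42):
    """Yield fake food-delivery events; field names are illustrative only."""
    rng = random.Random(seed)
    for order_id in range(count):
        distance_km = round(rng.uniform(0.5, 10.0), 2)
        yield {
            "order_id": order_id,
            "distance_km": distance_km,
            "delivery_minutes": round(10 + 5 * distance_km + rng.gauss(0, 2), 1),
        }

def produce(events, send):
    """Serialize each event to JSON bytes and pass it to `send`.

    `send` is any callable taking (topic, value_bytes); in a real producer it
    would wrap a Kafka-compatible client pointed at the Redpanda cluster.
    """
    sent = 0
    for event in events:
        send("orders", json.dumps(event).encode("utf-8"))
        sent += 1
    return sent

# Collect messages in a list instead of sending them over the network.
outbox = []
sent = produce(simulate_orders(3), lambda topic, value: outbox.append((topic, value)))
```

Keeping `send` pluggable lets the simulator be tested without a running cluster.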

redpanda-
console
Redpanda Cluster
Simulator
producer.py
redpanda-0
L L
redpanda-1
L L
redpanda-2
L L

redpanda-2
redpanda-
console
Redpanda Cluster
Jupyter
Notebook
TensorFlow
Simulator
producer.py
Redpanda
Transforms
Redpanda
Transforms
build
deploy
Redpanda Cluster
Simulator
producer.py
Redpanda
Transforms
Redpanda
Transforms

redpanda-2
redpanda-
console
Redpanda Cluster
Jupyter
Notebook
TensorFlow
Simulator
producer.py
ML Model
training
consumer.py
model
Real-time
inference
app.py
model

Redpanda University
Free, self-paced online learning
https://university.redpanda.com
•Learn the fundamentals of data streaming
and Redpanda
•Install Redpanda and use the rpk CLI to
configure it
•Create producers and consumers
in Java, Python and NodeJS
•Sign up today for free!
