Enabling Data Scientists to easily
create and own Kafka Consumers
Stefan Krawczyk
Mgr. Data Platform - Model Lifecycle
@stefkrawczyk
linkedin.com/in/skrawczyk
Try out Stitch Fix → goo.gl/Q3tCQ3
- What is Stitch Fix?
- Data Science @ Stitch Fix
- Stitch Fix’s opinionated Kafka consumer
- Learnings & Future Directions
Agenda
What is Stitch Fix
What does the company do?
Kafka Summit 2021 4
Stitch Fix is a personal styling service.
Shop at your personal curated store. Check out what you like.
Kafka Summit 2021 5
Data Science is behind everything we do.
algorithms-tour.stitchfix.com
Algorithms Org.
- 145+ Data Scientists and Platform Engineers
- 3 main verticals + platform
Data Platform
Kafka Summit 2021 6
whoami
Stefan Krawczyk
Mgr. Data Platform - Model Lifecycle
Pre-covid look
Data Science
@ Stitch Fix
Expectations we have on DS @ Stitch Fix
Kafka Summit 2021 8
Most common approach to Data Science
Typical organization:
● Horizontal teams
● Hand off
● Coordination required
DATA SCIENCE /
RESEARCH TEAMS
ETL TEAMS
ENGINEERING TEAMS
Kafka Summit 2021 9
At Stitch Fix:
● Single Organization
● No handoff
● End to end ownership
● We have a lot of them!
● Built on top of data platform tools & abstractions.
Full Stack Data Science
See https://cultivating-algos.stitchfix.com/
DATA SCIENCE
ETL
ENGINEERING
Kafka Summit 2021 10
Full Stack Data Science
A typical DS flow at Stitch Fix
Typical flow:
● Idea / Prototype
● ETL
● “Production”
● Eval/Monitoring/Oncall
● Start on next iteration
Kafka Summit 2021 11
Full Stack Data Science
A typical DS flow at Stitch Fix
Production can mean:
● Web service
● Batch job / Table
● Kafka consumer
Heavily biased towards Python.
Kafka Summit 2021 12
Example use cases DS have built kafka consumers for
Example Kafka Consumers
● A/B testing bucket allocation
● Transforming raw inputs into features
● Saving data into feature stores
● Event driven model prediction
● Triggering workflows
Stitch Fix’s opinionated
Kafka consumer
Code first, explanation second
Kafka Summit 2021 14
Our “Hello world”
Consumer Code [ ]
Architecture [ ]
Mechanics [ ]
Kafka Summit 2021 15
To run this:
> pip install sf_kafka
> python -m sf_kafka.server hello_world_consumer
A simple example
Hello world consumer
hello_world_consumer.py
import json
from typing import List

import sf_kafka
@sf_kafka.register(kafka_topic='some.topic', output_schema={})
def hello_world(messages: List[str]) -> dict:
"""Hello world example
:param messages: list of strings, which are JSON objects.
:return: empty dict, as we don't need to emit any events.
"""
list_of_dicts = [json.loads(m) for m in messages]
print(f'Hello world I have consumed the following {list_of_dicts}')
return {}
Kafka Summit 2021 16
So what is this doing?
1. Python function that takes in a list of strings called messages.
2. We’re processing the messages into dictionaries.
3. Printing them to console. (A DS would replace this with a call to their own function.)
4. We are registering this function to consume from ‘some.topic’ with no output.
Kafka Summit 2021 21
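As a rough mental model of how such a register decorator can work (a hedged sketch, not the actual sf_kafka internals), it only needs to record the function plus its kafka metadata and hand the function back unchanged:

# Hedged sketch of what a register() decorator could look like under the hood.
# This is NOT the real sf_kafka implementation -- just the pattern: store the
# function and its kafka metadata in a registry, return the function untouched.
from typing import Callable, Dict, List

_REGISTRY: Dict[str, dict] = {}

def register(kafka_topic: str, output_schema: dict) -> Callable:
    def decorator(func: Callable[[List[str]], dict]) -> Callable[[List[str]], dict]:
        _REGISTRY[func.__name__] = {
            'function': func,
            'kafka_topic': kafka_topic,
            'output_schema': output_schema,
        }
        return func  # the decorated function is unchanged, so it stays easy to test
    return decorator

The server process can then import the DS’s module, look up what was registered, and wire each function to a consumer loop.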
So what’s really going on?
When someone runs python -m sf_kafka.server hello_world_consumer
Consumer Code ✅
Architecture [ ]
Mechanics [ ]
Kafka Summit 2021 22
(Architecture diagram, built up over four numbered steps, showing which pieces fall under Platform Concerns and which under DS Concerns.)
Kafka Summit 2021 28
Platform Concerns vs DS Concerns
Consumer Code ✅
Architecture ✅
Mechanics [ ]
Kafka Summit 2021 29
What does each side own
Platform Concerns vs DS Concerns

Platform Concerns
● Kafka consumer operation:
  ○ Which python kafka client to use
  ○ Kafka client configuration
  ○ Processing assumptions
    ■ At least once or at most once
  ○ How to write back to kafka
    ■ Direct to cluster or via a proxy?
    ■ Message serialization format
● Production operations:
  ○ Topic partitioning
  ○ Deployment vehicle for consumers
  ○ Topic monitoring hooks & tools

DS Concerns
● Configuration:
  ○ App name [required]
  ○ Which topic(s) to consume from [required]
  ○ Process from beginning/end? [optional]
  ○ Processing “batch” size [optional]
  ○ Number of consumers [optional]
● Python function that operates over a list
● Output topic & message [if any]
● Oncall

Most platform concerns can change without DS involvement -- just need to rebuild their app; some require coordination with DS.
Kafka Summit 2021 34
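To make the DS-side configuration concrete, a registration carrying those options might look roughly like the sketch below. Only kafka_topic and output_schema appear in the talk’s examples; every other parameter name here is hypothetical and shown commented out purely to illustrate the list above.

# Illustrative only: the commented keyword arguments are NOT confirmed
# sf_kafka parameters; they just mirror the configuration list above.
import json
from typing import List

import sf_kafka

@sf_kafka.register(
    kafka_topic='client.signed_up',  # required: which topic(s) to consume from
    output_schema={},                # required: output topic & schema, if any
    # Hypothetical ways the remaining configuration might be supplied:
    # app_name='my-consumer-app',    # app name [required]
    # from_beginning=False,          # process from beginning/end? [optional]
    # batch_size=500,                # processing "batch" size [optional]
    # num_consumers=2,               # number of consumers [optional]
)
def my_handler(messages: List[str]) -> dict:
    events = [json.loads(m) for m in messages]
    # DS logic over the batch of events would go here.
    return {}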
Salient choices we made on Platform

Platform Concern: Kafka Client
● Choice: python confluent-kafka (librdkafka)
● Benefit: librdkafka is very performant & stable.

Platform Concern: Processing assumption
● Choice: At least once; functions should be idempotent.
● Benefit: Enables a very easy error recovery strategy:
  ○ Consumer app breaks until it is fixed; can usually wait until business hours.
  ○ No loss of events.
  ○ Monitoring trigger is consumer lag.

Platform Concern: Message serialization format
● Choice: JSON
● Benefit: Easy mapping to and from python dictionaries. Easy to grok for DS.
  (* python support for other formats wasn’t great.)

Platform Concern: Do we want to write back to kafka directly?
● Choice: Write via a proxy service first.
● Benefit -- enabled:
  ○ Not having producer code in the engine.
  ○ Ability to validate/introspect all messages.
  ○ Ability to augment/change minor format structure without having to redeploy all consumers.
Kafka Summit 2021 40
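To make the at-least-once choice concrete, here is a minimal sketch of the kind of loop such an engine runs, written against the public confluent-kafka client (illustrative, not Stitch Fix’s actual engine): consume a batch, hand it to the registered function, and commit offsets only after the function succeeds, so a crash replays the batch instead of dropping it.

# Minimal at-least-once batch loop around a registered handler function.
from confluent_kafka import Consumer

def run_consumer(handler, topic: str, group_id: str, batch_size: int = 100) -> None:
    # handler is the registered DS function, e.g. hello_world
    consumer = Consumer({
        'bootstrap.servers': 'localhost:9092',  # assumption: broker address comes from platform config
        'group.id': group_id,
        'auto.offset.reset': 'earliest',
        'enable.auto.commit': False,  # commit manually, only after processing succeeds
    })
    consumer.subscribe([topic])
    try:
        while True:
            batch = consumer.consume(num_messages=batch_size, timeout=1.0)
            messages = [m.value().decode('utf-8') for m in batch if m.error() is None]
            if not messages:
                continue
            handler(messages)  # if this raises, offsets are never committed -> the batch is replayed
            consumer.commit(asynchronous=False)
    finally:
        consumer.close()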
Completing the production story
Consumer Code ✅
Architecture ✅
Mechanics ✅
What’s missing? Self-service.
Kafka Summit 2021 43
The self-service story of how a DS gets a consumer to production
Completing the production story
Example Use Case: Event Driven Model Prediction
1. Client signs up & fills out profile.
2. Event is sent - client.signed_up.
3. Predict something about the client.
4. Emit predictions back to kafka.
5. Use this for email campaigns.
-> $$
Kafka Summit 2021 44
The self-service story of how a DS gets a consumer to production
Completing the production story
1. Determine the topic(s) to consume.
2. Write code:
   a. Create a function & decorate it to process events
   b. If outputting an event, write a schema
   c. Commit code to a git repository

my_prediction.py

import json
from typing import List

import sf_kafka

def create_prediction(client: dict) -> dict:
    # DS would write side effects or fetches here.
    # E.g. grab features, predict, create the output message.
    prediction = ...
    return make_output_event(client, prediction)  # helper not shown on the slide

@sf_kafka.register(
    kafka_topic='client.signed_up',
    output_schema={'predict.topic': schema})  # `schema` is defined further down in my_prediction.py
def predict_foo(messages: List[str]) -> dict:
    """Predict XX about a client. ..."""
    clients = [json.loads(m) for m in messages]
    predictions = [create_prediction(c) for c in clients]
    return {'predict.topic': predictions}
Kafka Summit 2021 50
my_prediction.py -- the output event schema referenced in step 2b:

"""Schema that we want to validate against."""
schema = {
    'metadata': {
        'timestamp': str,
        'id': str,
        'version': str
    },
    'payload': {
        'some_prediction_value': float,
        'client': int
    }
}
Kafka Summit 2021 51
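An illustrative sketch of how an output event could be checked against a schema dict like the one above before it is emitted (this mirrors the idea, not the platform’s actual validation code):

def validate(event: dict, schema: dict, path: str = '') -> None:
    """Recursively check that `event` has every field in `schema` with the right type."""
    for key, expected in schema.items():
        if key not in event:
            raise ValueError(f'missing field {path}{key}')
        value = event[key]
        if isinstance(expected, dict):  # nested block, e.g. 'metadata' / 'payload'
            validate(value, expected, path=f'{path}{key}.')
        elif not isinstance(value, expected):  # leaf: `expected` is a type like str or float
            raise TypeError(f'{path}{key} should be {expected.__name__}, got {type(value).__name__}')

Validating before producing means a malformed message fails loudly in the consumer that created it, rather than downstream.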
The self-service story of how a DS gets a consumer to production
Completing the production story
1. Determine the topic(s) to consume.
2. Write code
3. Deploy via command line:
a. Handles python environment creation
b. Builds docker container
c. Deploys
Kafka Summit 2021 52
Self-service deployment via command line
Completing the production story
(Deployment flow diagram built up over three steps, highlighting the DS touch points.)
Can be in production in < 1 hour
Self-service!
Kafka Summit 2021 57
The self-service story of how a DS gets a consumer to production
Completing the production story
1. Determine the topic(s) to consume.
2. Write code
3. Deploy via command line
4. Oncall:
   a. Small runbook
Learnings &
Future Directions
What we learned from this and where we’re looking to go.
Kafka Summit 2021 62
What? Learning
Do they use it? ✅ 👍
Focusing on the function 1. All they need to know about kafka is that it’ll give
them a list of events.
2. Leads to better separation of concerns:
a. Can split driver code versus their logic.
b. Test driven development is easy.
At least once processing 1. They enjoy easy error recovery; gives DS time to fix
things.
2. Idempotency requirement not an issue.
Learnings - DS Perspective
Kafka Summit 2021 66
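The idempotency requirement is usually easy to meet because most handlers reduce to keyed writes. An illustrative sketch (the store object and field names are made up for this example):

# Replaying the same message twice leaves the same end state, because the
# write is keyed on the event's unique id rather than appended blindly.
import json
from typing import List

def save_predictions(messages: List[str], store) -> None:
    for raw in messages:
        event = json.loads(raw)
        # Upsert keyed on the event id: a replay overwrites the same record
        # instead of creating a duplicate.
        store.upsert(key=event['metadata']['id'], value=event['payload'])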
Kafka Summit 2021 68
What? Learning
Writing back via proxy service 1. Helped early on with some minor message format
adjustments & validation.
2. Would recommend writing back directly if we were to
start again.
a. Writing back directly leads to better performance.
Central place for all things kafka Very useful to have a central place to:
1. Understand topics & topic contents.
2. Materialize a stream to a datastore (e.g. elasticsearch, data warehouse, feature store) “off the shelf”, removing the need for DS to manage/optimize this process.
Learnings - Platform Perspective (1/2)
Kafka Summit 2021 69
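Writing back directly, as recommended above, is the standard producer pattern in the public confluent-kafka client -- a generic sketch, not the Stitch Fix engine:

# Minimal sketch of producing output events straight to the cluster with
# confluent-kafka instead of routing them through a proxy service.
import json
from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': 'localhost:9092'})  # assumption: local broker

def emit(topic: str, events: list) -> None:
    for event in events:
        producer.produce(topic, value=json.dumps(event).encode('utf-8'))
    producer.flush()  # block until the broker has acknowledged the writes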
What? Learning
Using internal async libraries Using internal asyncio libs is cumbersome for DS.
Native asyncio framework would feel better.*
Lineage & Lineage Impacts If there is a chain of consumers*, didn’t have easy
introspection into:
● Processing speed of full chain
● Knowing what the chain was
Learnings - Platform Perspective (2/2)
* we ended up creating a narrowly focused micro-framework using aiokafka to address these two issues.
Kafka Summit 2021 71
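For reference, a minimal asyncio-native consumer with aiokafka looks like this (a generic sketch, not the micro-framework mentioned in the footnote):

# Minimal aiokafka consumer loop: asyncio-native, so DS code that awaits
# feature-store or model-service calls fits in without extra glue.
import asyncio
from aiokafka import AIOKafkaConsumer

async def consume(topic: str) -> None:
    consumer = AIOKafkaConsumer(
        topic,
        bootstrap_servers='localhost:9092',  # assumption: local broker
        group_id='example-app',
    )
    await consumer.start()
    try:
        async for msg in consumer:
            print(msg.value)  # DS processing would go here
    finally:
        await consumer.stop()

if __name__ == '__main__':
    asyncio.run(consume('some.topic'))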
What? Why?
Being able to replace different
subcomponents & assumptions of the
system more easily.
Cleaner abstractions & modularity:
● Want to stop business logic leaking into the engine.
● Making parts pluggable means we can easily change/swap out e.g. schema validation, serialization format, how we write back to kafka, processing assumptions, asyncio support, etc.
Exploring stream processing like kafka streams & faust.
Stream processing over windows is slowly becoming something more DS ask about.
Writing an open source version Hypothesis that this is valuable and that the
community would be interested; would you be?
Future Directions
Summary
TL;DR:
Kafka Summit 2021 75
Kafka + Data Scientists @ Stitch Fix:
● We have a self-service platform for Data Scientists to deploy kafka consumers
● We achieve self-service through a separation of concerns:
○ Data Scientists focus on functions to process events
○ Data Platform provides guardrails for kafka operations
Questions?
Find me at:
@stefkrawczyk
linkedin.com/in/skrawczyk/ Try out Stitch Fix → goo.gl/Q3tCQ3

How to convert PDF to text with Nanonets
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

Data Scientists easily create Kafka Consumers

  • 1. Enabling Data Scientists to easily create and own Kafka Consumers Stefan Krawczyk Mgr. Data Platform - Model Lifecycle @stefkrawczyk linkedin.com/in/skrawczyk Try out Stitch Fix → goo.gl/Q3tCQ3
  • 2. 2 - What is Stitch Fix? - Data Science @ Stitch Fix - Stitch Fix’s opinionated Kafka consumer - Learnings & Future Directions Agenda
  • 3. What is Stitch Fix What does the company do?
  • 4. Kafka Summit 2021 4 Stitch Fix is a personal styling service. Shop at your personal curated store. Check out what you like.
  • 5. Kafka Summit 2021 5 Data Science is behind everything we do. algorithms-tour.stitchfix.com Algorithms Org. - 145+ Data Scientists and Platform Engineers - 3 main verticals + platform Data Platform
  • 6. Kafka Summit 2021 6 whoami Stefan Krawczyk Mgr. Data Platform - Model Lifecycle Pre-covid look
  • 7. Data Science @ Stitch Fix Expectations we have on DS @ Stitch Fix
  • 8. Kafka Summit 2021 8 Most common approach to Data Science Typical organization: ● Horizontal teams ● Hand off ● Coordination required DATA SCIENCE / RESEARCH TEAMS ETL TEAMS ENGINEERING TEAMS
  • 9. Kafka Summit 2021 9 At Stitch Fix: ● Single Organization ● No handoff ● End to end ownership ● We have a lot of them! ● Built on top of data platform tools & abstractions. Full Stack Data Science See https://cultivating-algos.stitchfix.com/ DATA SCIENCE ETL ENGINEERING
  • 10. Kafka Summit 2021 10 Full Stack Data Science A typical DS flow at Stitch Fix Typical flow: ● Idea / Prototype ● ETL ● “Production” ● Eval/Monitoring/Oncall ● Start on next iteration
  • 11. Kafka Summit 2021 11 Full Stack Data Science A typical DS flow at Stitch Fix Production can mean: ● Web service ● Batch job / Table ● Kafka consumer Heavily biased towards Python.
  • 12. Kafka Summit 2021 12 Example use cases DS have built kafka consumers for Example Kafka Consumers ● A/B testing bucket allocation ● Transforming raw inputs into features ● Saving data into feature stores ● Event driven model prediction ● Triggering workflows
  • 13. Stitch Fix’s opinionated Kafka consumer Code first, explanation second
  • 14. Kafka Summit 2021 14 Our “Hello world” Consumer Code [ ] Architecture [ ] Mechanics [ ]
  • 15. Kafka Summit 2021 15 To run this: > pip install sf_kafka > python -m sf_kafka.server hello_world_consumer A simple example Hello world consumer hello_world_consumer.py import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {}
  • 16. Kafka Summit 2021 16 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} A simple example Hello world consumer hello_world_consumer.py So what is this doing? To run this: > pip install sf_kafka > python -m sf_kafka.server hello_world_consumer
  • 17. Kafka Summit 2021 17 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} 1. Python function that takes in a list of strings called messages. A simple example Hello world consumer hello_world_consumer.py
  • 18. Kafka Summit 2021 18 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} 1. Python function that takes in a list of strings called messages. 2. We’re processing the messages into dictionaries. A simple example Hello world consumer hello_world_consumer.py
  • 19. Kafka Summit 2021 19 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} 1. Python function that takes in a list of strings called messages. 2. We’re processing the messages into dictionaries. 3. Printing them to console. (DS would replace this with a call to their function()) A simple example Hello world consumer hello_world_consumer.py
  • 20. Kafka Summit 2021 20 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} 1. Python function that takes in a list of strings called messages. 2. We’re processing the messages into dictionaries. 3. Printing them to console. (DS would replace this with a call to their function()) 4. We are registering this function to consume from ‘some.topic’ with no output. A simple example Hello world consumer hello_world_consumer.py
  • 21. Kafka Summit 2021 21 So what’s really going on? When someone runs python -m sf_kafka.server hello_world_consumer Consumer Code ✅ Architecture [ ] Mechanics [ ]
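
The slides that follow in the deck are architecture diagrams, so their transcript text is mostly numbered callouts. As a rough illustration of the mechanics they describe, here is a minimal sketch of the register-then-run idea in plain Python; the names and structure are hypothetical and are not the actual sf_kafka internals.

import importlib
import json
from typing import Callable, Dict, List

_REGISTRY: Dict[str, List[Callable]] = {}  # maps kafka_topic -> registered handler functions

def register(kafka_topic: str, output_schema: dict):
    """Hypothetical decorator: record a handler function against a topic."""
    def wrapper(func: Callable) -> Callable:
        _REGISTRY.setdefault(kafka_topic, []).append(func)
        func.output_schema = output_schema  # remembered for later validation
        return func
    return wrapper

def run(module_name: str) -> None:
    """Hypothetical runner: importing the module executes the decorators,
    which populates the registry; the engine then polls and dispatches."""
    importlib.import_module(module_name)
    for topic, handlers in _REGISTRY.items():
        batch = [json.dumps({'example': True})]  # stand-in for messages polled from the topic
        for handler in handlers:
            handler(batch)  # DS code only ever sees a list of JSON strings
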
  • 25. Kafka Summit 2021 25 1 2 3
  • 26. Kafka Summit 2021 26 1 2 3 4
  • 27. Kafka Summit 2021 27 1 2 3 4 Platform Concerns DS Concerns
  • 28. Kafka Summit 2021 28 Platform Concerns vs DS Concerns Consumer Code ✅ Architecture ✅ Mechanics [ ]
  • 29. Kafka Summit 2021 29 Platform Concerns DS Concerns ● Kafka consumer operation: ○ What python kafka client to use ○ Kafka client configuration ○ Processing assumptions ■ At least once or at most once ○ How to write back to kafka ■ Direct to cluster or via a proxy? ■ Message serialization format What does each side own Platform Concerns vs DS Concerns
  • 30. Kafka Summit 2021 30 Platform Concerns DS Concerns ● Kafka consumer operation: ○ What python kafka client to use ○ Kafka client configuration ○ Processing assumptions ■ At least once or at most once ○ How to write back to kafka ■ Direct to cluster or via a proxy? ■ Message serialization format ● Production operations: ○ Topic partitioning ○ Deployment vehicle for consumers ○ Monitoring hooks & tools What does each side own Platform Concerns vs DS Concerns
  • 31. Kafka Summit 2021 31 Platform Concerns DS Concerns ● Kafka consumer operation: ○ What python kafka client to use ○ Kafka client configuration ○ Processing assumptions ■ At least once or at most once ○ How to write back to kafka ■ Direct to cluster or via a proxy? ■ Message serialization format ● Production operations: ○ Topic partitioning ○ Deployment vehicle for consumers ○ Monitoring hooks & tools ● Configuration: ○ App name [required] ○ Which topic(s) to consume from [required] ○ Process from beginning/end? [optional] ○ Processing “batch” size [optional] ○ Number of consumers [optional] ● Python function that operates over a list ● Output topic & message [if any] ● Oncall What does each side own Platform Concerns vs DS Concerns
  • 32. Kafka Summit 2021 32 Platform Concerns DS Concerns ● Kafka consumer operation: ○ What python kafka client to use ○ Kafka client configuration ○ Processing assumptions ■ At least once or at most once ○ How to write back to kafka ■ Direct to cluster or via a proxy? ■ Message serialization format ● Production operations: ○ Topic partitioning ○ Deployment vehicle for consumers ○ Monitoring hooks & tools ● Configuration: ○ App name [required] ○ Which topic(s) to consume from [required] ○ Process from beginning/end? [optional] ○ Processing “batch” size [optional] ○ Number of consumers [optional] ● Python function that operates over a list ● Output topic & message [if any] ● Oncall What does each side own Platform Concerns vs DS Concerns Can change without DS involvement -- just need to rebuild their app.
  • 33. Kafka Summit 2021 33 Platform Concerns DS Concerns ● Kafka consumer operation: ○ What python kafka client to use ○ Kafka client configuration ○ Processing assumptions ■ At least once or at most once ○ How to write back to kafka ■ Direct to cluster or via a proxy? ■ Message serialization format ● Production operations: ○ Topic partitioning ○ Deployment vehicle for consumers ○ Topic monitoring hooks & tools ● Configuration: ○ App name [required] ○ Which topic(s) to consume from [required] ○ Process from beginning/end? [optional] ○ Processing “batch” size [optional] ○ Number of consumers [optional] ● Python function that operates over a list ● Output topic & message [if any] ● Oncall What does each side own Platform Concerns vs DS Concerns Requires coordination with DS
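
To make the DS-side column above concrete, here is a hypothetical illustration of how that configuration could surface around the decorator; only kafka_topic and output_schema appear in the talk, so the optional knobs from the table are shown purely as comments rather than as real sf_kafka arguments.

from typing import List

import sf_kafka  # internal library; only the two documented arguments are passed below

@sf_kafka.register(
    kafka_topic='client.signed_up',  # required: topic to consume from
    output_schema={},                # required: {} when no events are emitted
    # The remaining knobs from the table would live somewhere like here (names invented):
    #   app_name='my_consumer_app'   # required: names the consumer application
    #   start_from='latest'          # optional: process from beginning or end
    #   batch_size=100               # optional: size of the list handed to the function
    #   num_consumers=2              # optional: parallel consumer instances
)
def my_handler(messages: List[str]) -> dict:
    return {}
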
  • 34. Kafka Summit 2021 34 Platform Concern Choice Benefit Kafka Client Processing assumption Salient choices we made on Platform
  • 35. Kafka Summit 2021 35 Platform Concern Choice Benefit Kafka Client python confluent-kafka (librdkafka) librdkafka is very performant & stable. Processing assumption Salient choices we made on Platform
  • 36. Kafka Summit 2021 36 Platform Concern Choice Benefit Kafka Client python confluent-kafka (librdkafka) librdkafka is very performant & stable. Processing assumption At least once; functions should be idempotent. Enables very easy error recovery strategy: ● Consumer app breaks until it is fixed; can usually wait until business hours. ● No loss of events. ● Monitoring trigger is consumer lag. Salient choices we made on Platform
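
For readers who want the at-least-once pattern spelled out, below is a minimal confluent-kafka sketch, not Stitch Fix's actual engine: offsets are committed only after the registered function returns, so a failure re-delivers the batch instead of dropping it. The broker address and group id are placeholders, and hello_world refers to the function from the earlier slides.

from confluent_kafka import Consumer

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',  # placeholder broker address
    'group.id': 'hello_world_consumer',     # placeholder consumer group
    'enable.auto.commit': False,            # commit manually, only after processing
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['some.topic'])

try:
    while True:
        batch = consumer.consume(num_messages=100, timeout=1.0)
        messages = [m.value().decode('utf-8') for m in batch if m.error() is None]
        if not messages:
            continue
        hello_world(messages)                # the registered, idempotent DS function
        consumer.commit(asynchronous=False)  # acknowledge the batch only after success
finally:
    consumer.close()
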
  • 37. Kafka Summit 2021 37 Platform Concern Choice Benefit Message serialization format Do we want to write back to kafka directly? Salient choices we made on Platform
  • 38. Kafka Summit 2021 38 Platform Concern Choice Benefit Message serialization format JSON Easy mapping to and from python dictionaries. Easy to grok for DS. * python support for other formats wasn’t great. Do we want to write back to kafka directly? Salient choices we made on Platform
  • 39. Kafka Summit 2021 39 Platform Concern Choice Benefit Message serialization format JSON Easy mapping to and from python dictionaries. Easy to grok for DS. * python support for other formats wasn’t great. Do we want to write back to kafka directly? Write via proxy service first. Enabled: ● Not having producer code in the engine. ● Ability to validate/introspect all messages. ● Ability to augment/change minor format structure without having to redeploy all consumers. Salient choices we made on Platform
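
The write-back-via-proxy choice could look roughly like the sketch below, assuming a simple HTTP proxy that accepts JSON events for a topic; the endpoint, payload shape, and helper name are hypothetical, since the talk does not show the proxy's API.

import requests

PROXY_URL = 'http://event-proxy.internal/publish'  # hypothetical proxy endpoint

def publish_via_proxy(topic: str, events: list) -> None:
    """Hypothetical helper: hand the events a handler returned to the proxy,
    which validates them and writes them to Kafka on the consumer's behalf."""
    for event in events:
        response = requests.post(PROXY_URL, json={'topic': topic, 'message': event})
        response.raise_for_status()  # surface failures so the batch gets retried

# e.g. for a handler that returned {'predict.topic': predictions}:
#   for topic, events in result.items():
#       publish_via_proxy(topic, events)
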
  • 40. Kafka Summit 2021 40 Consumer Code ✅ Architecture ✅ Mechanics ✅ What’s missing? ⍰
  • 41. Kafka Summit 2021 41 Completing the production story Consumer Code ✅ Architecture ✅ Mechanics ✅ What’s missing? ⍰
  • 42. Kafka Summit 2021 42 Completing the production story Consumer Code ✅ Architecture ✅ Mechanics ✅ What’s missing? ⍰ Self-service ^
  • 43. Kafka Summit 2021 43 The self-service story of how a DS gets a consumer to production Completing the production story Example Use Case: Event Driven Model Prediction 1. Client signs up & fills out profile. 2. Event is sent - client.signed_up. 3. Predict something about the client. 4. Emit predictions back to kafka. 5. Use this for email campaigns. -> $$
  • 44. Kafka Summit 2021 44 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume.
  • 45. Kafka Summit 2021 45 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code: a. Create a function & decorate it to process events b. If outputting an event, write a schema c. Commit code to a git repository Event
  • 46. Kafka Summit 2021 46 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code: a. Create a function & decorate it to process events b. If outputting an event, write a schema c. Commit code to a git repository Event def create_prediction(client: dict) -> dict: # DS would write side effects or fetches here. # E.g. grab features, predict, # create output message; prediction = ... return make_output_event(client, prediction) @sf_kafka.register( kafka_topic='client.signed_up', output_schema={'predict.topic': schema}) def predict_foo(messages: List[str]) -> dict: """Predict XX about a client. ...""" clients = [json.loads(m) for m in messages] predictions = [create_prediction(c) for c in clients] return {'predict.topic': predictions} my_prediction.py
  • 47. Kafka Summit 2021 47 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code: a. Create a function & decorate it to process events b. If outputting an event, write a schema c. Commit code to a git repository Event def create_prediction(client: dict) -> dict: # DS would write side effects or fetches here. # E.g. grab features, predict, # create output message; prediction = ... return make_output_event(client, prediction) @sf_kafka.register( kafka_topic='client.signed_up', output_schema={'predict.topic': schema}) def predict_foo(messages: List[str]) -> dict: """Predict XX about a client. ...""" clients = [json.loads(m) for m in messages] predictions = [create_prediction(c) for c in clients] return {'predict.topic': predictions} my_prediction.py
  • 48. Kafka Summit 2021 48 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code: a. Create a function & decorate it to process events b. If outputting an event, write a schema c. Commit code to a git repository Event def create_prediction(client: dict) -> dict: # DS would write side effects or fetches here. # E.g. grab features, predict, # create output message; prediction = ... return make_output_event(client, prediction) @sf_kafka.register( kafka_topic='client.signed_up', output_schema={'predict.topic': schema}) def predict_foo(messages: List[str]) -> dict: """Predict XX about a client. ...""" clients = [json.loads(m) for m in messages] predictions = [create_prediction(c) for c in clients] return {'predict.topic': predictions} my_prediction.py
  • 49. Kafka Summit 2021 49 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code: a. Create a function & decorate it to process events b. If outputting an event, write a schema c. Commit code to a git repository Event def create_prediction(client: dict) -> dict: # DS would write side effects or fetches here. # E.g. grab features, predict, # create output message; prediction = ... return make_output_event(client, prediction) @sf_kafka.register( kafka_topic='client.signed_up', output_schema={'predict.topic': schema}) def predict_foo(messages: List[str]) -> dict: """Predict XX about a client. ...""" clients = [json.loads(m) for m in messages] predictions = [create_prediction(c) for c in clients] return {'predict.topic': predictions} my_prediction.py
  • 50. Kafka Summit 2021 50 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code: a. Create a function & decorate it to process events b. If outputting an event, write a schema c. Commit code to a git repository """Schema that we want to validate against.""" schema = { 'metadata': { 'timestamp': str, 'id': str, 'version': str }, 'payload': { 'some_prediction_value': float, 'client': int } } Event my_prediction.py
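
For illustration, a minimal recursive check against a type-based schema dict like the one above could look as follows; this is a sketch only, not the validation the platform actually performs, and it assumes the schema dict from the slide is in scope.

def validate(message: dict, schema: dict) -> None:
    """Recursively check that every schema field is present with the expected type."""
    for key, expected in schema.items():
        if key not in message:
            raise ValueError(f'missing field: {key}')
        if isinstance(expected, dict):
            validate(message[key], expected)  # nested object: recurse
        elif not isinstance(message[key], expected):
            raise TypeError(f'field {key} should be of type {expected.__name__}')

validate(
    {'metadata': {'timestamp': '2021-05-01T00:00:00Z', 'id': 'abc123', 'version': '1'},
     'payload': {'some_prediction_value': 0.42, 'client': 123}},
    schema,  # the schema dict defined above
)
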
  • 51. Kafka Summit 2021 51 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code 3. Deploy via command line: a. Handles python environment creation b. Builds docker container c. Deploys
  • 52. Kafka Summit 2021 52 Self-service deployment via command line Completing the production story
  • 53. Kafka Summit 2021 53 Self-service deployment via command line Completing the production story 1
  • 54. Kafka Summit 2021 54 Self-service deployment via command line Completing the production story 2 1
  • 55. Kafka Summit 2021 55 Self-service deployment via command line Completing the production story 2 3 1
  • 56. Kafka Summit 2021 56 Self-service deployment via command line Completing the production story DS Touch Points Can be in production in < 1 hour Self-service!
  • 57. Kafka Summit 2021 57 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code 3. Deploy via command line 4. Oncall: a. Small runbook 1
  • 58. Kafka Summit 2021 58 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code 3. Deploy via command line 4. Oncall: a. Small runbook 1 2
  • 59. Kafka Summit 2021 59 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code 3. Deploy via command line 4. Oncall: a. Small runbook 1 2 3
  • 60. Kafka Summit 2021 60 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code 3. Deploy via command line 4. Oncall: a. Small runbook 1 2 3 4
  • 61. Learnings & Future Directions What we learned from this and where we’re looking to go.
  • 62. Kafka Summit 2021 62 What? Learning Learnings - DS Perspective
  • 63. Kafka Summit 2021 63 What? Learning Do they use it? ✅ 👍 Learnings - DS Perspective
  • 64. Kafka Summit 2021 64 What? Learning Do they use it? ✅ 👍 Focusing on the function 1. All they need to know about kafka is that it’ll give them a list of events. 2. Leads to better separation of concerns: a. Can split driver code versus their logic. b. Test driven development is easy. Learnings - DS Perspective
  • 65. Kafka Summit 2021 65 What? Learning Do they use it? ✅ 👍 Focusing on the function 1. All they need to know about kafka is that it’ll give them a list of events. 2. Leads to better separation of concerns: a. Can split driver code versus their logic. b. Test driven development is easy. At least once processing 1. They enjoy easy error recovery; gives DS time to fix things. 2. Idempotency requirement not an issue. Learnings - DS Perspective
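
The test-driven-development point is easy to show: because the DS-owned code is just a function over a list of JSON strings, it can be unit tested without any Kafka infrastructure. A small pytest-style sketch, assuming the hello_world function from the earlier slides is importable:

import json

def test_hello_world_returns_no_output_events():
    messages = [json.dumps({'client': 1}), json.dumps({'client': 2})]
    assert hello_world(messages) == {}  # no events to emit back to Kafka
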
  • 66. Kafka Summit 2021 66 What? Learning Learnings - Platform Perspective (1/2)
  • 67. Kafka Summit 2021 67 What? Learning Writing back via proxy service 1. Helped early on with some minor message format adjustments & validation. 2. Would recommend writing back directly if we were to start again. a. Writing back directly leads to better performance. Learnings - Platform Perspective (1/2)
  • 68. Kafka Summit 2021 68 What? Learning Writing back via proxy service 1. Helped early on with some minor message format adjustments & validation. 2. Would recommend writing back directly if we were to start again. a. Writing back directly leads to better performance. Central place for all things kafka Very useful to have a central place to: 1. Understand topics & topic contents. 2. Having “off the shelf” ability to materialize stream to a datastore removed need for DS to manage/optimize this process. E.g. elasticsearch, data warehouse, feature store. Learnings - Platform Perspective (1/2)
  • 69. Kafka Summit 2021 69 What? Learning Using internal async libraries Using internal asyncio libs is cumbersome for DS. Native asyncio framework would feel better.* Learnings - Platform Perspective (2/2) * we ended up creating a very narrow focused micro-framework addressing these two issues using aiokafka.
  • 70. Kafka Summit 2021 70 What? Learning Using internal async libraries Using internal asyncio libs is cumbersome for DS. Native asyncio framework would feel better.* Lineage & Lineage Impacts If there is a chain of consumers*, didn’t have easy introspection into: ● Processing speed of full chain ● Knowing what the chain was Learnings - Platform Perspective (2/2) * we ended up creating a very narrow focused micro-framework addressing these two issues using aiokafka.
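
For context on the footnote, a bare-bones aiokafka consumer loop looks roughly like the sketch below; this shows the underlying library only, not the micro-framework Stitch Fix built, and the broker address and group id are placeholders.

import asyncio
from aiokafka import AIOKafkaConsumer

async def consume():
    consumer = AIOKafkaConsumer(
        'some.topic',
        bootstrap_servers='localhost:9092',  # placeholder
        group_id='hello_world_consumer',     # placeholder
        enable_auto_commit=False,
    )
    await consumer.start()
    try:
        async for msg in consumer:
            print(msg.value)         # hand the message to the registered function here
            await consumer.commit()  # commit after processing, keeping at-least-once semantics
    finally:
        await consumer.stop()

asyncio.run(consume())
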
  • 71. Kafka Summit 2021 71 What? Why? Being able to replace different subcomponents & assumptions of the system more easily. Cleaner abstractions & modularity: ● Want to remove leaking business logic into engine. ● Making parts pluggable means we can easily change/swap out e.g. schema validation, or serialization format, or how we write back to kafka, processing assumptions, support asyncio, etc. Future Directions
  • 72. Kafka Summit 2021 72 What? Why? Being able to replace different subcomponents & assumptions of the system more easily. Cleaner abstractions & modularity: ● Want to remove leaking business logic into engine. ● Making parts pluggable means we can easily change/swap out e.g. schema validation, or serialization format, or how we write back to kafka, processing assumptions, support asyncio, etc. Exploring stream processing like kafka streams & faust. Streaming processing over windows is slowly becoming something more DS ask about. Future Directions
  • 73. Kafka Summit 2021 73 What? Why? Being able to replace different subcomponents & assumptions of the system more easily. Cleaner abstractions & modularity: ● Want to remove leaking business logic into engine. ● Making parts pluggable means we can easily change/swap out e.g. schema validation, or serialization format, or how we write back to kafka, processing assumptions, support asyncio, etc. Exploring stream processing like kafka streams & faust. Streaming processing over windows is slowly becoming something more DS ask about. Writing an open source version Hypothesis that this is valuable and that the community would be interested; would you be? Future Directions
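
As a taste of the stream-processing direction mentioned above, a minimal faust consumer of the same signup topic could look like this; the app id and broker are placeholders, and this is an illustration rather than an existing Stitch Fix consumer.

import faust

app = faust.App('sf-example', broker='kafka://localhost:9092')  # placeholder app id & broker
signups = app.topic('client.signed_up', value_type=bytes)

@app.agent(signups)
async def process(stream):
    async for event in stream:
        print(event)  # windowed aggregations or predictions would go here

if __name__ == '__main__':
    app.main()
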
  • 75. Kafka Summit 2021 75 TL;DR: Summary Kafka + Data Scientists @ Stitch Fix: ● We have a self-service platform for Data Scientists to deploy kafka consumers ● We achieve self-service through a separation of concerns: ○ Data Scientists focus on functions to process events ○ Data Platform provides guardrails for kafka operations