Enabling Data Scientists to easily
create and own Kafka Consumers
Stefan Krawczyk
Mgr. Data Platform - Model Lifecycle
@stefkrawczyk
linkedin.com/in/skrawczyk
Try out Stitch Fix → goo.gl/Q3tCQ3
Agenda
- What is Stitch Fix?
- Data Science @ Stitch Fix
- Stitch Fix’s opinionated Kafka consumer
- Learnings & Future Directions
What is Stitch Fix?
What does the company do?
Kafka Summit 2021 4
Stitch Fix is a personal styling service.
Shop at your personal curated store. Check out what you like.
Data Science is behind everything we do.
algorithms-tour.stitchfix.com
Algorithms Org.
- 145+ Data Scientists and Platform Engineers
- 3 main verticals + platform
Data Platform
whoami
Stefan Krawczyk
Mgr. Data Platform - Model Lifecycle
Pre-covid look
Data Science
@ Stitch Fix
Expectations we have on DS @ Stitch Fix
Most common approach to Data Science
Typical organization:
● Horizontal teams
● Hand off
● Coordination required
(Diagram: Data Science / Research teams → ETL teams → Engineering teams)
At Stitch Fix:
● Single Organization
● No handoff
● End to end ownership
● We have a lot of them!
● Built on top of data
platform tools &
abstractions.
Full Stack Data Science
See https://cultivating-algos.stitchfix.com/
(Diagram: one team owning Data Science, ETL, and Engineering end to end)
Full Stack Data Science
A typical DS flow at Stitch Fix
Typical flow:
● Idea / Prototype
● ETL
● “Production”
● Eval/Monitoring/Oncall
● Start on next iteration
Full Stack Data Science
A typical DS flow at Stitch Fix
Production can mean:
● Web service
● Batch job / Table
● Kafka consumer
Heavily biased towards Python.
Example use cases DS have built Kafka consumers for
Example Kafka Consumers
● A/B testing bucket allocation
● Transforming raw inputs into features
● Saving data into feature stores
● Event driven model prediction
● Triggering workflows
Stitch Fix’s opinionated
Kafka consumer
Code first, explanation second
Our “Hello world”
Consumer Code [ ]
Architecture [ ]
Mechanics [ ]
To run this:
> pip install sf_kafka
> python -m sf_kafka.server hello_world_consumer
A simple example
Hello world consumer
hello_world_consumer.py
import json
from typing import List

import sf_kafka

@sf_kafka.register(kafka_topic='some.topic', output_schema={})
def hello_world(messages: List[str]) -> dict:
    """Hello world example.

    :param messages: list of strings, which are JSON objects.
    :return: empty dict, as we don't need to emit any events.
    """
    list_of_dicts = [json.loads(m) for m in messages]
    print(f'Hello world, I have consumed the following: {list_of_dicts}')
    return {}
So what is this doing?
1. A Python function that takes in a list of strings called messages.
2. We process the messages into dictionaries.
3. We print them to the console. (A DS would replace this with a call to their own function.)
4. We register this function to consume from ‘some.topic’ with no output.
So what’s really going on?
When someone runs python -m sf_kafka.server hello_world_consumer
Consumer Code ✅
Architecture [ ]
Mechanics [ ]
(Architecture diagram, built up in four numbered steps, showing where the split between Platform Concerns and DS Concerns falls.)
Platform Concerns vs DS Concerns
Consumer Code ✅
Architecture ✅
Mechanics [ ]
What does each side own

Platform Concerns (can change without DS involvement -- just need to rebuild their app):
● Kafka consumer operation:
  ○ What Python Kafka client to use
  ○ Kafka client configuration
  ○ Processing assumptions
    ■ At least once or at most once
  ○ How to write back to Kafka
    ■ Direct to cluster or via a proxy?
    ■ Message serialization format
● Production operations:
  ○ Topic partitioning
  ○ Deployment vehicle for consumers
  ○ Topic monitoring hooks & tools

DS Concerns (changing these requires coordination with DS):
● Configuration:
  ○ App name [required]
  ○ Which topic(s) to consume from [required]
  ○ Process from beginning/end? [optional]
  ○ Processing “batch” size [optional]
  ○ Number of consumers [optional]
● Python function that operates over a list
● Output topic & message [if any]
● Oncall
Salient choices we made on Platform

Kafka Client → python confluent-kafka (librdkafka). Benefit: librdkafka is very performant & stable.

Processing assumption → At least once; functions should be idempotent. Benefit: enables a very easy error-recovery strategy:
● The consumer app breaks until it is fixed; this can usually wait until business hours.
● No loss of events.
● The monitoring trigger is consumer lag.
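The at-least-once choice above can be sketched in a few lines: commit the offset only after the handler succeeds, so a crash replays the last batch, which is harmless when the handler is idempotent. This is a stand-in with an in-memory log rather than a real Kafka client; the function names are hypothetical.

```python
from typing import Callable, List

def consume_at_least_once(log: List[str],
                          committed_offset: int,
                          handler: Callable[[List[str]], None],
                          batch_size: int = 2) -> int:
    """Process a message log from the last committed offset.

    The offset is "committed" only AFTER the handler succeeds, so a crash
    mid-batch replays that batch on restart -- hence handlers must be
    idempotent. Returns the new committed offset.
    """
    offset = committed_offset
    while offset < len(log):
        batch = log[offset:offset + batch_size]
        handler(batch)        # may raise; the offset is then NOT advanced
        offset += len(batch)  # commit only after successful processing
    return offset

# An idempotent handler: keyed upserts are safe to replay.
store = {}
def upsert(batch: List[str]) -> None:
    for msg in batch:
        key, value = msg.split('=')
        store[key] = value
```

Running the loop twice over the same log (simulating a crash before the commit) leaves `store` in exactly the same state, which is why the recovery story is "just fix it and restart".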
Message serialization format → JSON. Benefit: easy mapping to and from Python dictionaries; easy to grok for DS. (* Python support for other formats wasn’t great.)

Do we want to write back to Kafka directly? → Write via a proxy service first. This enabled:
● Not having producer code in the engine.
● The ability to validate/introspect all messages.
● The ability to augment/change minor format structure without having to redeploy all consumers.
Consumer Code ✅
Architecture ✅
Mechanics ✅
What’s missing? → Self-service
The self-service story of how a DS gets a consumer to production
Completing the production story
Example Use Case: Event Driven Model Prediction
1. Client signs up & fills out profile.
2. Event is sent - client.signed_up.
3. Predict something about the client.
4. Emit predictions back to kafka.
5. Use this for email campaigns.
-> $$
1. Determine the topic(s) to consume.
2. Write code:
   a. Create a function & decorate it to process events.
   b. If outputting an event, write a schema.
   c. Commit the code to a git repository.
my_prediction.py

import json
from typing import List

import sf_kafka

def create_prediction(client: dict) -> dict:
    # DS would write side effects or fetches here.
    # E.g. grab features, predict, create the output message.
    prediction = ...
    return make_output_event(client, prediction)

@sf_kafka.register(
    kafka_topic='client.signed_up',
    output_schema={'predict.topic': schema})
def predict_foo(messages: List[str]) -> dict:
    """Predict XX about a client. ..."""
    clients = [json.loads(m) for m in messages]
    predictions = [create_prediction(c) for c in clients]
    return {'predict.topic': predictions}
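The return shape `{'predict.topic': predictions}` suggests the engine routes each key of the output dict to the topic declared in `output_schema`. A hedged sketch of that dispatch step follows; the engine's actual behavior is not shown in this deck, and `dispatch`, `fake_produce`, and `echo` are names invented here for illustration.

```python
import json
from typing import Callable, List

def dispatch(handler: Callable[[List[str]], dict],
             raw_messages: List[str],
             produce: Callable[[str, str], None]) -> None:
    """Call a registered handler on a batch and route its output.

    Each key of the returned dict is assumed to be a topic name and each
    value a list of event dicts to emit -- mirroring the
    {'predict.topic': predictions} convention in the slide above.
    """
    output = handler(raw_messages)
    for topic, events in output.items():
        for event in events:
            produce(topic, json.dumps(event))

# Stand-in producer that just collects what would be sent to Kafka.
sent = []
def fake_produce(topic: str, payload: str) -> None:
    sent.append((topic, payload))

def echo(messages: List[str]) -> dict:
    """Trivial handler: parse each message and re-emit it on 'out.topic'."""
    return {'out.topic': [json.loads(m) for m in messages]}
```

In this split, the DS only ever writes the handler; batching, serialization, and producing stay on the platform side of the line.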
"""Schema that we want to validate against."""
schema = {
'metadata': {
'timestamp': str,
'id': str,
'version': str
},
'payload': {
'some_prediction_value': float,
'client': int
}
}
Event
my_prediction.py
Kafka Summit 2021 51
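The schema above maps field names to Python types (or nested dicts of the same shape). A small recursive checker sketches how such a schema could be enforced against an outgoing event; the platform's real validation code is not shown here, so treat this `validate` as an assumption about its behavior.

```python
def validate(event: dict, schema: dict) -> bool:
    """Recursively check an event against a {field: type-or-nested-dict} schema."""
    if set(event) != set(schema):
        return False  # missing or unexpected fields
    for field, expected in schema.items():
        value = event[field]
        if isinstance(expected, dict):
            # Nested schema: recurse into the sub-dictionary.
            if not (isinstance(value, dict) and validate(value, expected)):
                return False
        elif not isinstance(value, expected):
            return False
    return True

schema = {
    'metadata': {'timestamp': str, 'id': str, 'version': str},
    'payload': {'some_prediction_value': float, 'client': int},
}
```

With the proxy-service write path described earlier, a check like this could run once, centrally, for every message emitted.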
1. Determine the topic(s) to consume.
2. Write code.
3. Deploy via command line:
   a. Handles Python environment creation.
   b. Builds a Docker container.
   c. Deploys.
Self-service deployment via command line

(Diagram: the deployment flow built up in three numbered steps, annotated with the DS touch points.)

Can be in production in < 1 hour. Self-service!
The full self-service path, end to end:
1. Determine the topic(s) to consume.
2. Write code.
3. Deploy via command line.
4. Oncall:
   a. Small runbook.
Learnings &
Future Directions
What we learned from this and where we’re looking to go.
Learnings - DS Perspective

Do they use it? ✅ 👍

Focusing on the function:
1. All they need to know about Kafka is that it’ll give them a list of events.
2. Leads to better separation of concerns:
   a. Can split driver code versus their logic.
   b. Test-driven development is easy.

At least once processing:
1. They enjoy easy error recovery; it gives DS time to fix things.
2. The idempotency requirement is not an issue.
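The "test-driven development is easy" point follows from `register` returning the plain function: a DS can unit-test a handler by calling it directly with JSON strings, with no Kafka anywhere in the test. A sketch (the handler below is invented for illustration, mirroring the handler shape used throughout this deck):

```python
import json
from typing import List

def double_values(messages: List[str]) -> dict:
    """Handler under test: doubles each event's 'value' field."""
    events = [json.loads(m) for m in messages]
    out = [{'value': e['value'] * 2} for e in events]
    return {'doubled.topic': out}

def test_double_values():
    # Plain unit test: feed JSON strings in, assert on the output dict.
    result = double_values(['{"value": 2}', '{"value": 5}'])
    assert result == {'doubled.topic': [{'value': 4}, {'value': 10}]}
```

The driver code (polling, batching, committing) never appears in the test, which is exactly the separation of concerns the learning describes.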
Learnings - Platform Perspective (1/2)

Writing back via a proxy service:
1. Helped early on with some minor message format adjustments & validation.
2. We would recommend writing back directly if we were to start again.
   a. Writing back directly leads to better performance.

A central place for all things Kafka. It was very useful to have a central place to:
1. Understand topics & topic contents.
2. Materialize a stream to a datastore “off the shelf” (e.g. Elasticsearch, data warehouse, feature store), which removed the need for DS to manage/optimize this process.
Learnings - Platform Perspective (2/2)

Using internal async libraries: using internal asyncio libs is cumbersome for DS; a native asyncio framework would feel better.*

Lineage & lineage impacts: if there is a chain of consumers, we didn’t have easy introspection into:
● the processing speed of the full chain
● knowing what the chain was

* We ended up creating a very narrowly focused micro-framework addressing these two issues using aiokafka.
Future Directions

Being able to replace different subcomponents & assumptions of the system more easily. Why? Cleaner abstractions & modularity:
● We want to stop business logic leaking into the engine.
● Making parts pluggable means we can easily change/swap out e.g. schema validation, serialization format, how we write back to Kafka, processing assumptions, asyncio support, etc.

Exploring stream processing like Kafka Streams & Faust. Why? Stream processing over windows is slowly becoming something more DS ask about.

Writing an open-source version. Why? Our hypothesis is that this is valuable and that the community would be interested; would you be?
Summary
TL;DR:
Kafka + Data Scientists @ Stitch Fix:
● We have a self-service platform for Data Scientists to deploy kafka consumers
● We achieve self-service through a separation of concerns:
○ Data Scientists focus on functions to process events
○ Data Platform provides guardrails for kafka operations
Questions?
Find me at:
@stefkrawczyk
linkedin.com/in/skrawczyk/ Try out Stitch Fix → goo.gl/Q3tCQ3
HostedbyConfluent
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
HostedbyConfluent
 
Event Driven Microservices
Event Driven MicroservicesEvent Driven Microservices
Event Driven Microservices
Fabrizio Fortino
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDK
Shu-Jeng Hsieh
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
Guido Schmutz
 
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
GoDataDriven
 
Apache Pulsar Development 101 with Python
Apache Pulsar Development 101 with PythonApache Pulsar Development 101 with Python
Apache Pulsar Development 101 with Python
Timothy Spann
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache Kafka
Abhinav Singh
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Yaroslav Tkachenko
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
HostedbyConfluent
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
confluent
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
Follow the (Kafka) Streams
Follow the (Kafka) StreamsFollow the (Kafka) Streams
Follow the (Kafka) Streams
confluent
 

Similar to Enabling Data Scientists to easily create and own Kafka Consumers (20)

Updating materialized views and caches using kafka
Updating materialized views and caches using kafkaUpdating materialized views and caches using kafka
Updating materialized views and caches using kafka
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Containerizing Distributed Pipes
Containerizing Distributed PipesContainerizing Distributed Pipes
Containerizing Distributed Pipes
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
 
Making your Life Easier with MongoDB and Kafka (Robert Walters, MongoDB) Kafk...
Making your Life Easier with MongoDB and Kafka (Robert Walters, MongoDB) Kafk...Making your Life Easier with MongoDB and Kafka (Robert Walters, MongoDB) Kafk...
Making your Life Easier with MongoDB and Kafka (Robert Walters, MongoDB) Kafk...
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
 
Event Driven Microservices
Event Driven MicroservicesEvent Driven Microservices
Event Driven Microservices
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDK
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
 
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
 
Apache Pulsar Development 101 with Python
Apache Pulsar Development 101 with PythonApache Pulsar Development 101 with Python
Apache Pulsar Development 101 with Python
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache Kafka
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Follow the (Kafka) Streams
Follow the (Kafka) StreamsFollow the (Kafka) Streams
Follow the (Kafka) Streams
 

Recently uploaded

The Role of DevOps in Digital Transformation.pdf
The Role of DevOps in Digital Transformation.pdfThe Role of DevOps in Digital Transformation.pdf
The Role of DevOps in Digital Transformation.pdf
mohitd6
 
TMU毕业证书精仿办理
TMU毕业证书精仿办理TMU毕业证书精仿办理
TMU毕业证书精仿办理
aeeva
 
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom KittEnhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
Peter Caitens
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
ShulagnaSarkar2
 
The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024
Yara Milbes
 
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSISDECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
Tier1 app
 
Boost Your Savings with These Money Management Apps
Boost Your Savings with These Money Management AppsBoost Your Savings with These Money Management Apps
Boost Your Savings with These Money Management Apps
Jhone kinadey
 
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Paul Brebner
 
Optimizing Your E-commerce with WooCommerce.pptx
Optimizing Your E-commerce with WooCommerce.pptxOptimizing Your E-commerce with WooCommerce.pptx
Optimizing Your E-commerce with WooCommerce.pptx
WebConnect Pvt Ltd
 
Penify - Let AI do the Documentation, you write the Code.
Penify - Let AI do the Documentation, you write the Code.Penify - Let AI do the Documentation, you write the Code.
Penify - Let AI do the Documentation, you write the Code.
KrishnaveniMohan1
 
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
Luigi Fugaro
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
XfilesPro
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
dakas1
 
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
kalichargn70th171
 
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
The Third Creative Media
 
Voxxed Days Trieste 2024 - Unleashing the Power of Vector Search and Semantic...
Voxxed Days Trieste 2024 - Unleashing the Power of Vector Search and Semantic...Voxxed Days Trieste 2024 - Unleashing the Power of Vector Search and Semantic...
Voxxed Days Trieste 2024 - Unleashing the Power of Vector Search and Semantic...
Luigi Fugaro
 
Building API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructureBuilding API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructure
confluent
 
ppt on the brain chip neuralink.pptx
ppt  on   the brain  chip neuralink.pptxppt  on   the brain  chip neuralink.pptx
ppt on the brain chip neuralink.pptx
Reetu63
 
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
dakas1
 

Recently uploaded (20)

The Role of DevOps in Digital Transformation.pdf
The Role of DevOps in Digital Transformation.pdfThe Role of DevOps in Digital Transformation.pdf
The Role of DevOps in Digital Transformation.pdf
 
TMU毕业证书精仿办理
TMU毕业证书精仿办理TMU毕业证书精仿办理
TMU毕业证书精仿办理
 
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom KittEnhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
 
The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024
 
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSISDECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
 
Boost Your Savings with These Money Management Apps
Boost Your Savings with These Money Management AppsBoost Your Savings with These Money Management Apps
Boost Your Savings with These Money Management Apps
 
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
 
Optimizing Your E-commerce with WooCommerce.pptx
Optimizing Your E-commerce with WooCommerce.pptxOptimizing Your E-commerce with WooCommerce.pptx
Optimizing Your E-commerce with WooCommerce.pptx
 
Penify - Let AI do the Documentation, you write the Code.
Penify - Let AI do the Documentation, you write the Code.Penify - Let AI do the Documentation, you write the Code.
Penify - Let AI do the Documentation, you write the Code.
 
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
 
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
 
Voxxed Days Trieste 2024 - Unleashing the Power of Vector Search and Semantic...
Voxxed Days Trieste 2024 - Unleashing the Power of Vector Search and Semantic...Voxxed Days Trieste 2024 - Unleashing the Power of Vector Search and Semantic...
Voxxed Days Trieste 2024 - Unleashing the Power of Vector Search and Semantic...
 
Building API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructureBuilding API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructure
 
ppt on the brain chip neuralink.pptx
ppt  on   the brain  chip neuralink.pptxppt  on   the brain  chip neuralink.pptx
ppt on the brain chip neuralink.pptx
 
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
 
bgiolcb
bgiolcbbgiolcb
bgiolcb
 

Enabling Data Scientists to easily create and own Kafka Consumers

  • 13. Stitch Fix’s opinionated Kafka consumer Code first, explanation second
  • 14. Kafka Summit 2021 14 Our “Hello world” Consumer Code [ ] Architecture [ ] Mechanics [ ]
  • 15. Kafka Summit 2021 15 To run this: > pip install sf_kafka > python -m sf_kafka.server hello_world_consumer A simple example Hello world consumer hello_world_consumer.py import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {}
  • 16. Kafka Summit 2021 16 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} A simple example Hello world consumer hello_world_consumer.py So what is this doing? To run this: > pip install sf_kafka > python -m sf_kafka.server hello_world_consumer
  • 17. Kafka Summit 2021 17 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} 1. Python function that takes in a list of strings called messages. A simple example Hello world consumer hello_world_consumer.py
  • 18. Kafka Summit 2021 18 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} 1. Python function that takes in a list of strings called messages. 2. We’re processing the messages into dictionaries. A simple example Hello world consumer hello_world_consumer.py
  • 19. Kafka Summit 2021 19 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} 1. Python function that takes in a list of strings called messages. 2. We’re processing the messages into dictionaries. 3. Printing them to console. (DS would replace this with a call to their function()) A simple example Hello world consumer hello_world_consumer.py
  • 20. Kafka Summit 2021 20 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} 1. Python function that takes in a list of strings called messages. 2. We’re processing the messages into dictionaries. 3. Printing them to console. (DS would replace this with a call to their function()) 4. We are registering this function to consume from ‘some.topic’ with no output. A simple example Hello world consumer hello_world_consumer.py
  • 21. Kafka Summit 2021 21 So what’s really going on? When someone runs python -m sf_kafka.server hello_world_consumer Consumer Code ✅ Architecture [ ] Mechanics [ ]
  • 25.–27. Kafka Summit 2021 25–27 (Architecture diagrams: the consumer pipeline’s four numbered components, built up step by step and then annotated as Platform Concerns vs. DS Concerns.)
  • 28. Kafka Summit 2021 28 Platform Concerns vs DS Concerns Consumer Code ✅ Architecture ✅ Mechanics [ ]
  • 29.–33. Kafka Summit 2021 29–33 What does each side own — Platform Concerns vs DS Concerns
  Platform Concerns:
  ● Kafka consumer operation:
    ○ What python kafka client to use
    ○ Kafka client configuration
    ○ Processing assumptions
      ■ At least once or at most once
    ○ How to write back to kafka
      ■ Direct to cluster or via a proxy?
      ■ Message serialization format
  ● Production operations:
    ○ Topic partitioning
    ○ Deployment vehicle for consumers
    ○ Topic monitoring hooks & tools
  DS Concerns:
  ● Configuration:
    ○ App name [required]
    ○ Which topic(s) to consume from [required]
    ○ Process from beginning/end? [optional]
    ○ Processing “batch” size [optional]
    ○ Number of consumers [optional]
  ● Python function that operates over a list
  ● Output topic & message [if any]
  ● Oncall
  Slide notes: some platform-side choices can change without DS involvement -- just need to rebuild their app; others require coordination with DS.
  • 34.–36. Kafka Summit 2021 34–36 Salient choices we made on Platform (Platform Concern → Choice → Benefit):
  ● Kafka Client → python confluent-kafka (librdkafka): librdkafka is very performant & stable.
  ● Processing assumption → At least once; functions should be idempotent: enables a very easy error recovery strategy — consumer app breaks until it is fixed, and can usually wait until business hours; no loss of events; monitoring trigger is consumer lag.
  • 37.–39. Kafka Summit 2021 37–39 Salient choices we made on Platform (continued):
  ● Message serialization format → JSON: easy mapping to and from python dictionaries; easy to grok for DS. (* python support for other formats wasn’t great.)
  ● Do we want to write back to kafka directly? → Write via proxy service first. Enabled: not having producer code in the engine; ability to validate/introspect all messages; ability to augment/change minor format structure without having to redeploy all consumers.
  • 40. Kafka Summit 2021 40 Consumer Code ✅ Architecture ✅ Mechanics ✅ What’s missing? ⍰
  • 41. Kafka Summit 2021 41 Completing the production story Consumer Code ✅ Architecture ✅ Mechanics ✅ What’s missing? ⍰
  • 42. Kafka Summit 2021 42 Completing the production story Consumer Code ✅ Architecture ✅ Mechanics ✅ What’s missing? ⍰ Self-service ^
  • 43. Kafka Summit 2021 43 The self-service story of how a DS gets a consumer to production Completing the production story Example Use Case: Event Driven Model Prediction 1. Client signs up & fills out profile. 2. Event is sent - client.signed_up. 3. Predict something about the client. 4. Emit predictions back to kafka. 5. Use this for email campaigns. -> $$
  • 44. Kafka Summit 2021 44 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume.
  • 45. Kafka Summit 2021 45 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code: a. Create a function & decorate it to process events b. If outputting an event, write a schema c. Commit code to a git repository Event
  • 46. Kafka Summit 2021 46 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code: a. Create a function & decorate it to process events b. If outputting an event, write a schema c. Commit code to a git repository Event def create_prediction(client: dict) -> dict: # DS would write side effects or fetches here. # E.g. grab features, predict, # create output message; prediction = ... return make_ouput_event(client, prediction) @sf_kaka.register( kafka_topic='client.signed_up', output_schema={'predict.topic': schema}) def predict_foo(messages: List[str]) -> dict: """Predict XX about a client. ...""" clients = [json.loads(m) for m in messages] predictions = [create_prediction(c) for c in clients] return {'predict.topic': predictions} my_prediction.py
  • 50. Kafka Summit 2021 50 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code: a. Create a function & decorate it to process events b. If outputting an event, write a schema c. Commit code to a git repository my_prediction.py:

      """Schema that we want to validate against."""
      schema = {
          'metadata': {
              'timestamp': str,
              'id': str,
              'version': str
          },
          'payload': {
              'some_prediction_value': float,
              'client': int
          }
      }
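A schema in this field-to-type style can be checked with a short recursive walk. The `validate` function below is a minimal sketch of the idea, assuming nested dicts map to nested objects and leaf values are Python types; the platform's real validator is not shown in the talk.

```python
def validate(message: dict, schema: dict) -> bool:
    """Recursively check that message matches a {field: type-or-nested-schema} dict."""
    for key, expected in schema.items():
        if key not in message:
            return False
        if isinstance(expected, dict):
            # Nested schema: the value must itself be a dict that validates.
            if not isinstance(message[key], dict) or not validate(message[key], expected):
                return False
        elif not isinstance(message[key], expected):
            return False
    return True

schema = {
    'metadata': {'timestamp': str, 'id': str, 'version': str},
    'payload': {'some_prediction_value': float, 'client': int},
}

event = {
    'metadata': {'timestamp': '2021-05-11T00:00:00Z', 'id': 'abc', 'version': '1'},
    'payload': {'some_prediction_value': 0.87, 'client': 42},
}
ok = validate(event, schema)
```

A message missing a required field (or with a wrong leaf type) fails the check, which is how an output event could be rejected before it is written back to Kafka.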
  • 51. Kafka Summit 2021 51 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code 3. Deploy via command line: a. Handles python environment creation b. Builds docker container c. Deploys
  • 52. Kafka Summit 2021 52 Self-service deployment via command line Completing the production story
  • 56. Kafka Summit 2021 56 Self-service deployment via command line Completing the production story DS Touch Points Can be in production in < 1 hour Self-service!
  • 57. Kafka Summit 2021 57 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code 3. Deploy via command line 4. Oncall: a. Small runbook
  • 61. Learnings & Future Directions What we learned from this and where we’re looking to go.
  • 65. Kafka Summit 2021 65 What? Learning Do they use it? ✅ 👍 Focusing on the function 1. All they need to know about kafka is that it’ll give them a list of events. 2. Leads to better separation of concerns: a. Can split driver code versus their logic. b. Test driven development is easy. At least once processing 1. They enjoy easy error recovery; gives DS time to fix things. 2. Idempotency requirement not an issue. Learnings - DS Perspective
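The at-least-once learning can be made concrete with a tiny sketch (invented for illustration, not Stitch Fix's code): if the handler keys its writes by a stable event id, a redelivered event simply overwrites the store with the same value, so reprocessing after an error is a no-op rather than a duplicate.

```python
# Toy "store" keyed by event id; any keyed datastore works the same way.
processed: dict = {}

def handle(event: dict) -> None:
    # Idempotent write: the same event id always maps to the same value,
    # so at-least-once redelivery cannot create duplicates.
    processed[event["id"]] = event["prediction"]

event = {"id": "evt-1", "prediction": 0.9}
handle(event)
handle(event)  # redelivered after a crash/restart; state is unchanged
```

This is why the slide notes the idempotency requirement "not an issue": consumers that write keyed outputs get safe replay essentially for free.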
  • 68. Kafka Summit 2021 68 What? Learning Writing back via proxy service 1. Helped early on with some minor message format adjustments & validation. 2. Would recommend writing back directly if we were to start again. a. Writing back directly leads to better performance. Central place for all things kafka Very useful to have a central place to: 1. Understand topics & topic contents. 2. Having “off the shelf” ability to materialize stream to a datastore removed need for DS to manage/optimize this process. E.g. elasticsearch, data warehouse, feature store. Learnings - Platform Perspective (1/2)
  • 70. Kafka Summit 2021 70 What? Learning Using internal async libraries Using internal asyncio libs is cumbersome for DS. A native asyncio framework would feel better.* Lineage & Lineage Impacts If there is a chain of consumers, we didn’t have easy introspection into: ● Processing speed of the full chain ● Knowing what the chain was Learnings - Platform Perspective (2/2) * We ended up creating a narrowly focused micro-framework addressing these two issues using aiokafka.
  • 73. Kafka Summit 2021 73 What? Why? Being able to replace different subcomponents & assumptions of the system more easily. Cleaner abstractions & modularity: ● Want to stop business logic leaking into the engine. ● Making parts pluggable means we can easily change or swap out e.g. schema validation, serialization format, how we write back to Kafka, processing assumptions, asyncio support, etc. Exploring stream processing like Kafka Streams & Faust. Stream processing over windows is slowly becoming something more DS ask about. Writing an open source version Hypothesis that this is valuable and that the community would be interested; would you be? Future Directions
  • 75. Kafka Summit 2021 75 TL;DR: Summary Kafka + Data Scientists @ Stitch Fix: ● We have a self-service platform for Data Scientists to deploy kafka consumers ● We achieve self-service through a separation of concerns: ○ Data Scientists focus on functions to process events ○ Data Platform provides guardrails for kafka operations