SlideShare a Scribd company logo
1 of 36
Download to read offline
Streaming
Why should I care?
Christian Trebing
Blue Yonder GmbH
@ctrebing
1
Agenda
Motivation
Streaming Intro
Implementation
Challenges
2
Data Processing
- The Monolith
3
THE Database
Data Input
Data Validation
Machine Learning
Data Output
Pain
Several teams are developing this application
• Customer desperately wants new feature in machine
learning
• But the data validation team is in the midst of
refactoring their database structure (‚will be finished
in two weeks‘)
• So you wait…
4
Could Microservices Help?
Great:
• No Dependency on single state
• Independent Development
• Independent Upgrades
Difficult:
• Too much data to transfer
• Too much data to store in each
service
5
Data Input Data
Machine Learning Data
Data Validation Data
Data Output Data
Are There Other Possibilities?
6
Streaming Intro
7
Databases and Streams
Same information in table and stream:
8
A=1 B=5 C=3 A=8 A=4 C=2
A: 1 A: 1
B: 5
A: 1
B: 5
C: 3
A: 8
B: 5
C: 3
A: 4
B: 5
C: 3
A: 4
B: 5
C: 2
Why does it matter?
Different services can be in different states
• Each service can consume the stream in its own
speed
• One service can be updated while the other runs
9
A=1 B=5 C=3 A=8 A=4 C=2
A: 1 A: 1
B: 5
A: 1
B: 5
C: 3
A: 8
B: 5
C: 3
A: 4
B: 5
C: 3
A: 4
B: 5
C: 2
Service 1
on index 3
Service 2
on index 5
Partitioned Streams
Sales stream, partitioned by location
Each partition could be handled by diff. processors
10
location: Rimini
product: Spaghetti
sales_date: 2017-07-10
quantity: 5
location: Rimini
product: Ravioli
sales_date: 2017-07-10
quantity: 8
location: Rimini
product: Pizza
sales_date: 2017-07-11
quantity: 1
location: Rimini
product: Spaghetti
sales_date: 2017-07-11
quantity: 7
location: Bilbao
product: Pizza
sales_date: 2017-07-10
quantity: 3
location: Bilbao
product: Ravioli
sales_date: 2017-07-11
quantity: 9
location: Bilbao
product: Spaghetti
sales_date: 2017-07-11
quantity: 5
location: Karlsruhe
product: Pizza
sales_date: 2017-07-10
quantity: 8
location: Karlsruhe
product: Ravioli
sales_date: 2017-07-10
quantity: 3
location: Karlsruhe
product: Ravioli
sales_date: 2017-07-11
quantity: 7
location: Karlsruhe
product: Spaghetti
sales_date: 2017-07-11
quantity: 7
Data Processing
- With Streaming
11
Data Input Data
Machine Learning Data
Data Validation Data
Data Output Data
Streaming Platform
What did we gain?
Independent Development
Independent Upgrade
Scalability
Did we throw out databases completely?
• Let’s see…
12
Is it magic?
No, it’s a tradeoff:
• A database is so powerful: ACID guarantess, SQL
language. You can do nearly everything
• This comes at a price
• Dependency on single state
• Scaling is hard
So let’s more think what we really need
13
What do we loose?
14
Database Stream
ACID Ordering on stream partition
SQL Queries
Service is responsible of keeping
its state
You have to decide whether you can live with that.
Implementation
15
Apache Kafka
16
Kafka Clients in Python
• pykafka, python-kafka, confluent-kafka-client
• Nice comparison has been done here:
• http://activisiongamescience.github.io/2016/06/15/Kafka-
Client-Benchmarking/
• Most performant currently is confluent-kafka-client
• Uses the c library librdkafka
17
Producer
from confluent_kafka import Producer
p = Producer({'bootstrap.servers': 'mybroker,mybroker2'})
for data in some_data_source:
p.produce('mytopic', data.encode('utf-8'))
p.flush()
18
Consumer
19
Apache Avro
Data Serialization, Enabling Schema Evolution
Clearly defined schema with:
• Schema evolution
• Schema registry
20
Data Type Field Name
string location
string product
string sales_date
int quantity
Writer’s schema Reader’s schema
Data Type Field Name
string location
string product
string sales_date
int quantity
int, default=0 delivery_id
Avro Schema
21
AvroProducer
22
AvroConsumer
23
Example: Data Validation
Separate valid and invalid sales records
24
Additional Processors
Need to evolve your application:
• Add processors for evaluation topics
• Try new variant of validation logic
database
• remember processing state, same for each processor
streaming
• offset in each processor, can work independently
25
Challenge
- Machine Learning Still Batch
26
How to get the input data?
Remember: There is no possibility to query a stream.
Somewhere all this data needs to be. Options:
• Memory of the service
• Serving database
• Blob store
Yes, that’s duplication. We’ll have to live with it.
27
Write Path / Read Path
28
write path read path
THE databasedata validation
machine learning
query
machine learning
write path read path
Blob storedata validation
machine learning
query
machine learning
Machine Learning Input Data
29
sales_validated products_validatedlocations_validated
join
tabletable
append to file
Challenge
- State in Processors
30
State - Nightmare of every
distributed systems engineer
Streaming: Data just rushes through
Why do we need state?
• Time window processing
• Data you want to join with
Formerly, the database did it for you
31
State - Some Challenges?
Failure of a processor
Scaling
32
State in Stream Processors
- Possible Solutions
• Just keep in memory. Reprocess stream to warm up
• Each processor to keep its own db
• Save condensed in stream
• Get it from other service
Frameworks exist in other languages:
• for example: Kafka Streams, Apache Samza
Up to yesterday: none in python. Then heard about
https://github.com/wintoncode/winton-kafka-streams
33
Summary
• You have more options for your data processing
applications than you might have thought
• As always, there are some tradeoffs
• You know the challenges
34
Questions?
35
Blue Yonder GmbH
Ohiostraße 8
76149 Karlsruhe
Germany
+49 721 383117 0
Blue Yonder Software Limited
19 Eastbourne Terrace
London, W2 6LG
United Kingdom
+44 20 3626 0360
Blue Yonder
Best decisions,
delivered daily
Blue Yonder Analytics, Inc.
5048 Tennyson Parkway
Suite 250
Plano, Texas 75024
USA
36

More Related Content

What's hot

An Engineering Approach to Database Evaluations
An Engineering Approach to Database EvaluationsAn Engineering Approach to Database Evaluations
An Engineering Approach to Database EvaluationsSingleStore
 
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...DataStax Academy
 
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Databricks
 
Modernization sql server 2016
Modernization   sql server 2016Modernization   sql server 2016
Modernization sql server 2016Kiki Noviandi
 
Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...
Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...
Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...Flink Forward
 
Whats new in Columnstore Indexes for SQL Server 2017
Whats new in Columnstore Indexes for SQL Server 2017Whats new in Columnstore Indexes for SQL Server 2017
Whats new in Columnstore Indexes for SQL Server 2017Niko Neugebauer
 
Scaling a SaaS backend with PostgreSQL - A case study
Scaling a SaaS backend with PostgreSQL - A case studyScaling a SaaS backend with PostgreSQL - A case study
Scaling a SaaS backend with PostgreSQL - A case studyOliver Seemann
 
Building Identity Graphs over Heterogeneous Data
Building Identity Graphs over Heterogeneous DataBuilding Identity Graphs over Heterogeneous Data
Building Identity Graphs over Heterogeneous DataDatabricks
 
Ebay: DB Capacity planning at eBay
Ebay: DB Capacity planning at eBayEbay: DB Capacity planning at eBay
Ebay: DB Capacity planning at eBayDataStax Academy
 
Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016Niko Neugebauer
 
Clustered Columnstore Introduction
Clustered Columnstore IntroductionClustered Columnstore Introduction
Clustered Columnstore IntroductionNiko Neugebauer
 
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn confluent
 
VoltDB : A Technical Overview
VoltDB : A Technical OverviewVoltDB : A Technical Overview
VoltDB : A Technical OverviewTim Callaghan
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastSingleStore
 
Тарас Кльоба "ETL — вже не актуальна; тривалі живі потоки із системою Apache...
Тарас Кльоба  "ETL — вже не актуальна; тривалі живі потоки із системою Apache...Тарас Кльоба  "ETL — вже не актуальна; тривалі живі потоки із системою Apache...
Тарас Кльоба "ETL — вже не актуальна; тривалі живі потоки із системою Apache...Lviv Startup Club
 
Why you really want SQL in a Real-Time Enterprise Environment
Why you really want SQL in a Real-Time Enterprise EnvironmentWhy you really want SQL in a Real-Time Enterprise Environment
Why you really want SQL in a Real-Time Enterprise EnvironmentVoltDB
 
The Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsThe Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsDan Lynn
 
ClustrixDB: how distributed databases scale out
ClustrixDB: how distributed databases scale outClustrixDB: how distributed databases scale out
ClustrixDB: how distributed databases scale outMariaDB plc
 

What's hot (19)

An Engineering Approach to Database Evaluations
An Engineering Approach to Database EvaluationsAn Engineering Approach to Database Evaluations
An Engineering Approach to Database Evaluations
 
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
 
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
 
Modernization sql server 2016
Modernization   sql server 2016Modernization   sql server 2016
Modernization sql server 2016
 
Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...
Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...
Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...
 
Whats new in Columnstore Indexes for SQL Server 2017
Whats new in Columnstore Indexes for SQL Server 2017Whats new in Columnstore Indexes for SQL Server 2017
Whats new in Columnstore Indexes for SQL Server 2017
 
Scaling a SaaS backend with PostgreSQL - A case study
Scaling a SaaS backend with PostgreSQL - A case studyScaling a SaaS backend with PostgreSQL - A case study
Scaling a SaaS backend with PostgreSQL - A case study
 
Building Identity Graphs over Heterogeneous Data
Building Identity Graphs over Heterogeneous DataBuilding Identity Graphs over Heterogeneous Data
Building Identity Graphs over Heterogeneous Data
 
Ebay: DB Capacity planning at eBay
Ebay: DB Capacity planning at eBayEbay: DB Capacity planning at eBay
Ebay: DB Capacity planning at eBay
 
Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016
 
Clustered Columnstore Introduction
Clustered Columnstore IntroductionClustered Columnstore Introduction
Clustered Columnstore Introduction
 
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
 
VoltDB : A Technical Overview
VoltDB : A Technical OverviewVoltDB : A Technical Overview
VoltDB : A Technical Overview
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks Webcast
 
Тарас Кльоба "ETL — вже не актуальна; тривалі живі потоки із системою Apache...
Тарас Кльоба  "ETL — вже не актуальна; тривалі живі потоки із системою Apache...Тарас Кльоба  "ETL — вже не актуальна; тривалі живі потоки із системою Apache...
Тарас Кльоба "ETL — вже не актуальна; тривалі живі потоки із системою Apache...
 
Why you really want SQL in a Real-Time Enterprise Environment
Why you really want SQL in a Real-Time Enterprise EnvironmentWhy you really want SQL in a Real-Time Enterprise Environment
Why you really want SQL in a Real-Time Enterprise Environment
 
The Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsThe Holy Grail of Data Analytics
The Holy Grail of Data Analytics
 
ClustrixDB: how distributed databases scale out
ClustrixDB: how distributed databases scale outClustrixDB: how distributed databases scale out
ClustrixDB: how distributed databases scale out
 

Similar to Streaming - Why Should I Care?

Games Industry Analytics Forum 2 - Plumbee
Games Industry Analytics Forum 2 - PlumbeeGames Industry Analytics Forum 2 - Plumbee
Games Industry Analytics Forum 2 - PlumbeeGIAF
 
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Sid Anand
 
Sql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.pptSql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.pptQingsong Yao
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Precisely
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyAlluxio, Inc.
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 
PPWT2019 - EmPower your BI architecture
PPWT2019 - EmPower your BI architecturePPWT2019 - EmPower your BI architecture
PPWT2019 - EmPower your BI architectureRiccardo Perico
 
Pinterest hadoop summit_talk
Pinterest hadoop summit_talkPinterest hadoop summit_talk
Pinterest hadoop summit_talkKrishna Gade
 
50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven ProductsDataWorks Summit
 
DC Migration and Hadoop Scale For Big Billion Days
DC Migration and Hadoop Scale For Big Billion DaysDC Migration and Hadoop Scale For Big Billion Days
DC Migration and Hadoop Scale For Big Billion DaysRahul Agarwal
 
GPPB2020 - Milan - Power BI dataflows deep dive
GPPB2020 - Milan - Power BI dataflows deep diveGPPB2020 - Milan - Power BI dataflows deep dive
GPPB2020 - Milan - Power BI dataflows deep diveRiccardo Perico
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Databricks
 
Acting on Real-time Behavior: How Peak Games Won Transactions
Acting on Real-time Behavior: How Peak Games Won TransactionsActing on Real-time Behavior: How Peak Games Won Transactions
Acting on Real-time Behavior: How Peak Games Won TransactionsVoltDB
 
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & RedshiftIntroduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & RedshiftDataKitchen
 
Warehousing Your Hits - The Why and How of Owning Your Data
Warehousing Your Hits - The Why and How of Owning Your DataWarehousing Your Hits - The Why and How of Owning Your Data
Warehousing Your Hits - The Why and How of Owning Your DataScott Arbeitman
 
Die Another Day: Scaling from 0 to 4 million daily requests as a lone develop...
Die Another Day: Scaling from 0 to 4 million daily requests as a lone develop...Die Another Day: Scaling from 0 to 4 million daily requests as a lone develop...
Die Another Day: Scaling from 0 to 4 million daily requests as a lone develop...itnig
 
Data Pipelines -Big Data Meets Salesforce
Data Pipelines -Big Data Meets SalesforceData Pipelines -Big Data Meets Salesforce
Data Pipelines -Big Data Meets SalesforceCarolEnLaNube
 
Geek Sync I In Database Automation We Trust
Geek Sync I In Database Automation We TrustGeek Sync I In Database Automation We Trust
Geek Sync I In Database Automation We TrustIDERA Software
 
Accelerate Develoment with VIrtual Data
Accelerate Develoment with VIrtual DataAccelerate Develoment with VIrtual Data
Accelerate Develoment with VIrtual DataKyle Hailey
 

Similar to Streaming - Why Should I Care? (20)

Games Industry Analytics Forum 2 - Plumbee
Games Industry Analytics Forum 2 - PlumbeeGames Industry Analytics Forum 2 - Plumbee
Games Industry Analytics Forum 2 - Plumbee
 
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
 
Sql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.pptSql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.ppt
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
MicroStrategy at Badoo
MicroStrategy at BadooMicroStrategy at Badoo
MicroStrategy at Badoo
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
PPWT2019 - EmPower your BI architecture
PPWT2019 - EmPower your BI architecturePPWT2019 - EmPower your BI architecture
PPWT2019 - EmPower your BI architecture
 
Pinterest hadoop summit_talk
Pinterest hadoop summit_talkPinterest hadoop summit_talk
Pinterest hadoop summit_talk
 
50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products
 
DC Migration and Hadoop Scale For Big Billion Days
DC Migration and Hadoop Scale For Big Billion DaysDC Migration and Hadoop Scale For Big Billion Days
DC Migration and Hadoop Scale For Big Billion Days
 
GPPB2020 - Milan - Power BI dataflows deep dive
GPPB2020 - Milan - Power BI dataflows deep diveGPPB2020 - Milan - Power BI dataflows deep dive
GPPB2020 - Milan - Power BI dataflows deep dive
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
 
Acting on Real-time Behavior: How Peak Games Won Transactions
Acting on Real-time Behavior: How Peak Games Won TransactionsActing on Real-time Behavior: How Peak Games Won Transactions
Acting on Real-time Behavior: How Peak Games Won Transactions
 
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & RedshiftIntroduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
 
Warehousing Your Hits - The Why and How of Owning Your Data
Warehousing Your Hits - The Why and How of Owning Your DataWarehousing Your Hits - The Why and How of Owning Your Data
Warehousing Your Hits - The Why and How of Owning Your Data
 
Die Another Day: Scaling from 0 to 4 million daily requests as a lone develop...
Die Another Day: Scaling from 0 to 4 million daily requests as a lone develop...Die Another Day: Scaling from 0 to 4 million daily requests as a lone develop...
Die Another Day: Scaling from 0 to 4 million daily requests as a lone develop...
 
Data Pipelines -Big Data Meets Salesforce
Data Pipelines -Big Data Meets SalesforceData Pipelines -Big Data Meets Salesforce
Data Pipelines -Big Data Meets Salesforce
 
Geek Sync I In Database Automation We Trust
Geek Sync I In Database Automation We TrustGeek Sync I In Database Automation We Trust
Geek Sync I In Database Automation We Trust
 
Accelerate Develoment with VIrtual Data
Accelerate Develoment with VIrtual DataAccelerate Develoment with VIrtual Data
Accelerate Develoment with VIrtual Data
 

Recently uploaded

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 

Recently uploaded (20)

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 

Streaming - Why Should I Care?

  • 1. Streaming Why should I care? Christian Trebing Blue Yonder GmbH @ctrebing 1
  • 3. Data Processing - The Monolith 3 THE Database Data Input Data Validation Machine Learning Data Output
  • 4. Pain Several teams are developing this application • Customer desperately wants new feature in machine learning • But the data validation team is in the midst of refactoring their database structure (‚will be finished in two weeks‘) • So you wait… 4
  • 5. Could Microservices Help? Great: • No Dependency on single state • Independent Development • Independent Upgrades Difficult: • Too much data to transfer • Too much data to store in each service 5 Data Input Data Machine Learning Data Data Validation Data Data Output Data
  • 6. Are There Other Possibilities? 6
  • 8. Databases and Streams Same information in table and stream: 8 A=1 B=5 C=3 A=8 A=4 C=2 A: 1 A: 1 B: 5 A: 1 B: 5 C: 3 A: 8 B: 5 C: 3 A: 4 B: 5 C: 3 A: 4 B: 5 C: 2
  • 9. Why does it matter? Different services can be in different states • Each service can consume the stream in its own speed • One service can be updated while the other runs 9 A=1 B=5 C=3 A=8 A=4 C=2 A: 1 A: 1 B: 5 A: 1 B: 5 C: 3 A: 8 B: 5 C: 3 A: 4 B: 5 C: 3 A: 4 B: 5 C: 2 Service 1 on index 3 Service 2 on index 5
  • 10. Partitioned Streams Sales stream, partitioned by location Each partition could be handled by diff. processors 10 location: Rimini product: Spaghetti sales_date: 2017-07-10 quantity: 5 location: Rimini product: Ravioli sales_date: 2017-07-10 quantity: 8 location: Rimini product: Pizza sales_date: 2017-07-11 quantity: 1 location: Rimini product: Spaghetti sales_date: 2017-07-11 quantity: 7 location: Bilbao product: Pizza sales_date: 2017-07-10 quantity: 3 location: Bilbao product: Ravioli sales_date: 2017-07-11 quantity: 9 location: Bilbao product: Spaghetti sales_date: 2017-07-11 quantity: 5 location: Karlsruhe product: Pizza sales_date: 2017-07-10 quantity: 8 location: Karlsruhe product: Ravioli sales_date: 2017-07-10 quantity: 3 location: Karlsruhe product: Ravioli sales_date: 2017-07-11 quantity: 7 location: Karlsruhe product: Spaghetti sales_date: 2017-07-11 quantity: 7
  • 11. Data Processing - With Streaming 11 Data Input Data Machine Learning Data Data Validation Data Data Output Data Streaming Platform
  • 12. What did we gain? Independent Development Independent Upgrade Scalability Did we throw out databases completely? • Let’s see… 12
  • 13. Is it magic? No, it’s a tradeoff: • A database is so powerful: ACID guarantess, SQL language. You can do nearly everything • This comes at a price • Dependency on single state • Scaling is hard So let’s more think what we really need 13
  • 14. What do we loose? 14 Database Stream ACID Ordering on stream partition SQL Queries Service is responsible of keeping its state You have to decide whether you can live with that.
  • 17. Kafka Clients in Python • pykafka, python-kafka, confluent-kafka-client • Nice comparison has been done here: • http://activisiongamescience.github.io/2016/06/15/Kafka- Client-Benchmarking/ • Most performant currently is confluent-kafka-client • Uses the c library librdkafka 17
  • 18. Producer from confluent_kafka import Producer p = Producer({'bootstrap.servers': 'mybroker,mybroker2'}) for data in some_data_source: p.produce('mytopic', data.encode('utf-8')) p.flush() 18
  • 20. Apache Avro Data Serialization, Enabling Schema Evolution Clearly defined schema with: • Schema evolution • Schema registry 20 Data Type Field Name string location string product string sales_date int quantity Writer’s schema Reader’s schema Data Type Field Name string location string product string sales_date int quantity int, default=0 delivery_id
  • 24. Example: Data Validation Separate valid and invalid sales records 24
  • 25. Additional Processors Need to evolve your application: • Add processors for evaluation topics • Try new variant of validation logic database • remember processing state, same for each processor streaming • offset in each processor, can work independently 25
  • 27. How to get the input data? Remember: There is no possibility to query a stream. Somewhere all this data needs to be. Options: • Memory of the service • Serving database • Blob store Yes, that’s duplication. We’ll have to live with it. 27
  • 28. Write Path / Read Path 28 write path read path THE databasedata validation machine learning query machine learning write path read path Blob storedata validation machine learning query machine learning
  • 29. Machine Learning Input Data 29 sales_validated products_validatedlocations_validated join tabletable append to file
  • 30. Challenge - State in Processors 30
  • 31. State - Nightmare of every distributed systems engineer Streaming: Data just rushes through Why do we need state? • Time window processing • Data you want to join with Formerly, the database did it for you 31
  • 32. State - Some Challenges? Failure of a processor Scaling 32
  • 33. State in Stream Processors - Possible Solutions • Just keep in memory. Reprocess stream to warm up • Each processor to keep its own db • Save condensed in stream • Get it from other service Frameworks exist in other languages: • for example: Kafka Streams, Apache Samza Up to yesterday: none in python. Then heard about https://github.com/wintoncode/winton-kafka-streams 33
  • 34. Summary • You have more options for your data processing applications than you might have thought • As always, there are some tradeoffs • You know the challenges 34
  • 36. Blue Yonder GmbH Ohiostraße 8 76149 Karlsruhe Germany +49 721 383117 0 Blue Yonder Software Limited 19 Eastbourne Terrace London, W2 6LG United Kingdom +44 20 3626 0360 Blue Yonder Best decisions, delivered daily Blue Yonder Analytics, Inc. 5048 Tennyson Parkway Suite 250 Plano, Texas 75024 USA 36