Essential Ingredients of Stream Processing @ Scale
Kartik Paramasivam
About Me
• ‘Streams Infrastructure’ at LinkedIn
– Pub-sub messaging: Apache Kafka
– Change capture from various data systems: Databus
– Stream processing platform: Apache Samza
• Previous
– Built Microsoft Cloud Messaging (EventHub) and Enterprise Messaging (Queues/Topics)
– .NET WebServices and Workflow stack
– BizTalk Server
Agenda
• What is Stream Processing?
• Scenarios
• Canonical Architecture
• Essential Ingredients of Stream Processing
• Close
Response latency
(Diagram: a latency axis starting at 0 ms. RPC: synchronous. Stream processing: milliseconds to minutes. Far end of the axis: later, possibly much later.)
Agenda
• Stream processing Intro
• Scenarios
• Canonical Architecture
• Essential Ingredients of Stream Processing
• Close
Newsfeed
Cyber-security
Internet of Things
Agenda
• Stream processing Intro
• Scenarios
• Canonical Architecture
• Essential Ingredients of Stream Processing
• Close
CANONICAL ARCHITECTURE
(Diagram labels: Clients (browser, devices, sensors ...); Services Tier; Espresso; Databus; Kafka; Real Time Processing (Samza); Batch Processing (Hadoop/Spark); Bulk upload; Voldemort R/O; e.g. Espresso. The stages are labelled Ingestion, Processing and Serving.)
Agenda
• Stream processing Intro
• Scenarios
• Canonical Architecture
• Essential Ingredients of Stream Processing
• Close
Essential Ingredients to Stream Processing
1. Scale
2. Reprocessing
3. Accuracy of results
4. Easy to program
SCALE... but not at any cost
Basics: Scaling Ingestion
- Streams are partitioned
- Messages are sent to partitions based on PartitionKey
- Time-based message retention
- e.g. Kafka, AWS Kinesis, Azure EventHub
(Diagram: producers send messages with Pkey=10, Pkey=25 and Pkey=45 to the partitions of Stream A; consumerA instances on machine1 and machine2 read from them.)
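Key-based partitioning can be sketched in a few lines. This is a minimal illustration, not the partitioner of Kafka, Kinesis or EventHub: the function names `partition_for` and `route` are hypothetical, and CRC32 is only a stand-in for whatever hash a real system uses. The property that matters is that the same PartitionKey always lands on the same partition.

```python
import zlib


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index in [0, num_partitions)."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(partition_key.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    """Group (partition_key, payload) pairs into per-partition lists."""
    partitions = [[] for _ in range(num_partitions)]
    for key, payload in messages:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


if __name__ == "__main__":
    stream_a = route([("10", "a"), ("25", "b"), ("45", "c"), ("10", "d")], 3)
    for index, contents in enumerate(stream_a):
        print(index, contents)
```

Because every message for a key is in one partition, a single consumer sees that key's messages in order, which is what lets consumers scale out by taking a subset of partitions each.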
Scaling Processing, e.g. Samza
(Diagram: a Samza Job made of Task 1, Task 2 and Task 3, drawn between Stream A and Stream B.)
Samza – Streaming Dataflow
(Diagram: a dataflow of Job 1 and Job 2 connected by Stream A, Stream B, Stream C and Stream D.)
Horizontal Scaling is great! But...
• More machines mean more $$
• Need to do more with less
• So what’s the key bottleneck during Event/Stream Processing?
Key Bottleneck: “Accessing Data”
• Big impact on CPU, Network, Disk
• Types of Data Access
1. Adjunct data – Read-only data
2. Scratchpad/derived data – Read-Write data
Adjunct Data – typical access
(Diagram: a Processing Job consumes AdClicks from Kafka, reads Member Info from the Member Database, and writes AdQuality updates to Kafka.)
Concerns
1. Latency
2. CPU
3. Network
4. DDOS
Scratch pad/Derived Data – typical access
(Diagram: a Processing Job consumes Sensor Data from Kafka, does a Read + Update per Device Info against the Device State Database, and writes Alerts to Kafka.)
Concerns
1. Latency
2. CPU
3. Network
4. DDOS
Adjunct Data – with Samza
(Diagram: a Processing Job runs Task1, Task2 and Task3, each with a local RocksDb store. It consumes AdClicks from Kafka and Member Updates captured from the Member Database (espresso) by Databus, and writes its output to Kafka.)
Kafka, Databus, Database, Samza Job are all partitioned by MemberId
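The local-state approach on this slide can be sketched as follows. This is an illustrative model only, not Samza's API: the class `AdClickTask` and its method names are hypothetical, and a plain dict stands in for the task's RocksDB store. It assumes, as the slide states, that both streams are partitioned by MemberId, so a task only ever needs the member rows of its own partition.

```python
class AdClickTask:
    """One task instance; it holds member rows for its own partition only."""

    def __init__(self):
        self.members = {}  # stand-in for the task's local RocksDB store

    def on_member_update(self, member_id, profile):
        # Change-capture event: overwrite the local copy of the member row.
        self.members[member_id] = profile

    def on_ad_click(self, member_id, ad_id):
        # Local lookup instead of a remote database read per event.
        profile = self.members.get(member_id)
        return {"member_id": member_id, "ad_id": ad_id, "profile": profile}


if __name__ == "__main__":
    task = AdClickTask()
    task.on_member_update(7, {"title": "Engineer"})
    print(task.on_ad_click(7, "ad-1"))
```

The remote read in the "typical access" picture becomes an in-process lookup, which is where the latency, CPU, network and DDOS concerns are expected to ease.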
Fault Tolerance in a stateful Samza job
(Diagram sequence: Task-0, Task-1, Task-2 and Task-3 run on Host-A, Host-B and Host-C. Partitions P0 to P3 are shown both for the input and for the Changelog Stream.)
1. Stable State
2. Host A dies/fails
3. YARN allocates the tasks to a container on a different host (Host-E in the diagram)!
4. Restore local state by reading from the ChangeLog
5. Back to Stable State
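The recovery sequence above can be modelled with a toy example. This is a sketch under simplifying assumptions, not Samza code: `StatefulTask` is a hypothetical class, the changelog is a plain in-memory list rather than a Kafka topic, and the local store is a dict rather than RocksDB.

```python
class StatefulTask:
    """Toy task whose local state is backed by a changelog."""

    def __init__(self, changelog):
        self.changelog = changelog  # durable list of (key, value) writes
        self.store = {}             # local state, lost when the host dies

    def put(self, key, value):
        self.store[key] = value
        self.changelog.append((key, value))  # every write is also logged

    def restore(self):
        # Replay the changelog in order; later entries overwrite earlier ones.
        self.store = {}
        for key, value in self.changelog:
            self.store[key] = value


if __name__ == "__main__":
    changelog = []
    on_host_a = StatefulTask(changelog)
    on_host_a.put("device-1", "ok")
    on_host_a.put("device-1", "overheating")

    # Host A fails; the task is started elsewhere with an empty store.
    on_host_e = StatefulTask(changelog)
    on_host_e.restore()
    print(on_host_e.store)
```

Restore time grows with the length of the changelog, which is one reason a compacted changelog is attractive.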
Performance Numbers with Samza
Hardware Spec: 24 cores, 1Gig NIC, SSD
• (Baseline) Simple pass-through job with no local state
– 1.2 Million msg/sec
• Samza job with local state
– 400k msg/sec
• Samza job with local state with Kafka backup
– 300k msg/sec
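To put the three throughput figures in proportion, the short calculation below expresses each as a fraction of a reference. Only the three numbers come from the slide; the helper `relative` is a hypothetical name used for illustration.

```python
baseline = 1_200_000          # pass-through job, no local state
local_state = 400_000         # job with local state
local_state_backup = 300_000  # job with local state and Kafka backup


def relative(throughput, reference):
    """Throughput as a fraction of a reference throughput."""
    return throughput / reference


if __name__ == "__main__":
    print(f"local state vs baseline:          {relative(local_state, baseline):.0%}")
    print(f"local state + backup vs baseline: {relative(local_state_backup, baseline):.0%}")
    print(f"backup vs local state only:       {relative(local_state_backup, local_state):.0%}")
```

On these figures, local state runs at about a third of the pass-through rate, and adding the Kafka backup keeps three quarters of the local-state rate.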
Local State - Summary
• Great for both read-only data and read-write data
• Secret sauce to make local state work
1. Change Capture System: Databus/DynamoDB streams
2. Durable backup with Kafka log-compacted topics
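The idea behind a log-compacted backup is that only the latest value per key needs to survive. The function below is a simplified sketch of that idea, not Kafka's compaction algorithm: `compact` is a hypothetical name, the log is a plain list of (key, value) pairs, and deletes are ignored.

```python
def compact(log):
    """Keep only the most recent value per key, in order of last write."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # a newer write replaces the older one
    survivors = sorted(latest.items(), key=lambda item: item[1][0])
    return [(key, value) for key, (_offset, value) in survivors]


if __name__ == "__main__":
    changelog = [("a", 1), ("b", 2), ("a", 3)]
    print(compact(changelog))
```

Replaying the compacted log rebuilds the same final state as replaying the full log, while its size is bounded by the number of distinct keys rather than the number of writes.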
Essential Ingredients to Stream Processing
1. Scale
2. Reprocessing
3. Accuracy of results
4. Easy to program
REPROCESSING
Why do we need it?
• Software upgrades. Yes, bugs are a reality
• Business logic changes
• First time job deployment
Reprocessing Data – with Samza
(Diagram: Databus captures Member Updates from the Member Database (espresso) and feeds the Company/Title/Location Standardization Job, which uses a Machine Learning model and writes its output to Kafka. The label "bootstrap" also appears in the diagram.)
Reprocessing – Caveats
• Stream processors are fast. They can DoS the system if you reprocess
– Control max-concurrency of your job
– Quotas for Kafka, Databases
– Async load into databases (Project Venice)
• Capacity
– Reprocessing a 100 TB source?
• Doesn’t reprocessing mean you are no longer being real-time?
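One simple way to keep a reprocessing run from overwhelming a downstream system is to cap its write rate. The sketch below uses a basic rate limiter as a stand-in for the concurrency controls and quotas listed above. It is not a Samza or Kafka feature; `RateLimiter` and `reprocess` are hypothetical names.

```python
import time


class RateLimiter:
    """Allow at most `rate` operations per second by sleeping between them."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def acquire(self):
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)  # wait for the next slot
        self.next_allowed = max(now, self.next_allowed) + self.interval


def reprocess(records, write, limiter):
    """Replay records into `write`, never faster than the limiter allows."""
    for record in records:
        limiter.acquire()
        write(record)


if __name__ == "__main__":
    written = []
    reprocess(range(5), written.append, RateLimiter(rate=100))
    print(written)
```

The clock and sleep functions are injectable so the limiter can be exercised in tests without real waiting.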
Essential Ingredients to Stream Processing
1. Scale, but not at any cost
2. Reprocessing
3. Accuracy of results
4. Easy to program
ACCURACY OF RESULTS
Querying over an infinite stream
(Diagram: User1 produces an Ad View Event at 1:00 pm and an Ad Click Event at 1:01 pm; both reach the Ad Quality Processor.)
Did the user click the Ad within 2 minutes of seeing the Ad?
WHY DELAYS HAPPEN?
(Diagram 1: DATACENTER 1 and DATACENTER 2 each run a Services Tier, Kafka and an Ad Quality Processor (Samza); the Kafka clusters are labelled Mirrored. An AdViewEvent from user kartik passes through the LB.)
(Diagram 2: the same two-datacenter layout with Real Time Processing (Samza); an AdClick Event from user kartik passes through the LB.)
What do we need to do to get accurate results?
Deal with
• Late arrivals
– E.g. AdClick event showed up 5 minutes late.
• Out-of-order arrival
– E.g. AdClick event showed up before AdView event
• Influenced by “Google MillWheel”
Solution
(Diagram: a Processing Job runs Task1, Task2 and Task3, each with a local Message Store. It consumes AdClicks and AdView from Kafka and writes its output to Kafka.)
1. All events are stored locally
2. Find impacted ‘window/s’ for late arrivals
3. Recompute result
4. Choose strategy for emitting results (absolute or relative)
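The four steps can be illustrated with the ad view/click question from earlier in the deck. This is a minimal sketch, not the Samza implementation: `AdQualityTask` is a hypothetical class, dicts stand in for the local message store, event times are plain seconds, and the result is emitted in the "absolute" style (the full recomputed answer each time).

```python
from collections import defaultdict

WINDOW_SECONDS = 120  # "within 2 minutes of seeing the Ad"


class AdQualityTask:
    def __init__(self):
        # Local message store: every event is kept, keyed by (user, ad).
        self.views = defaultdict(list)   # event times of AdView events
        self.clicks = defaultdict(list)  # event times of AdClick events

    def on_event(self, kind, user, ad, event_time):
        key = (user, ad)
        store = self.views if kind == "view" else self.clicks
        store[key].append(event_time)
        # Only this key's window is impacted, so only it is recomputed.
        return self.recompute(key)

    def recompute(self, key):
        # True if some click falls within the window after some view.
        return any(
            0 <= click - view <= WINDOW_SECONDS
            for view in self.views[key]
            for click in self.clicks[key]
        )


if __name__ == "__main__":
    task = AdQualityTask()
    # Out of order: the 1:01 pm click arrives before the 1:00 pm view.
    print(task.on_event("click", "User1", "ad-1", 13 * 3600 + 60))
    print(task.on_event("view", "User1", "ad-1", 13 * 3600))
```

Because the comparison uses event time rather than arrival time, the answer is corrected as soon as the delayed view shows up.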
Myth: This isn’t a problem with Lambda Architecture...
• Theory: Since the processing happens 1 hour or several hours later, delays are not a problem.
• OK, but what about the “edges”?
– Some “sessions” start before the cut-off time for processing and end after the cut-off time.
– Delays and out-of-order processing make things worse on the edges.
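The edge problem can be shown with a tiny example. It is a sketch under assumed values: the helper names `clicked_within` and `split_at` are hypothetical, events are (kind, time-in-seconds) pairs, and the batch cut-off is placed at 3600 seconds purely for illustration.

```python
def clicked_within(events, window=120):
    """True if any click falls within `window` seconds after any view."""
    views = [t for kind, t in events if kind == "view"]
    clicks = [t for kind, t in events if kind == "click"]
    return any(0 <= c - v <= window for v in views for c in clicks)


def split_at(events, cutoff):
    """Split events into the batch before the cut-off and the batch after it."""
    before = [e for e in events if e[1] < cutoff]
    after = [e for e in events if e[1] >= cutoff]
    return before, after


if __name__ == "__main__":
    session = [("view", 3590), ("click", 3620)]  # straddles the cut-off
    before, after = split_at(session, cutoff=3600)
    print(clicked_within(session))  # whole session seen together
    print(clicked_within(before), clicked_within(after))  # per batch
```

The view and click match when seen together, but neither batch finds the match on its own, so batches processed independently would miss this session.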
Essential Ingredients to Stream Processing
1. Scale, but not at any cost
2. Reprocessing
3. Accuracy of results
4. Easy Programmability
Easy Programmability
• Support for “accurate” Windowing/Joins (Google Cloud Dataflow)
• Ability to express workflows/DAGs in config and DSL (e.g. Storm)
• SQL support for querying over streams
– Azure Stream Insight
• Apache Samza – working on the above
Agenda
• Stream processing Intro
• Scenarios
• Canonical Architecture
• Essential Ingredients of Stream Processing
• Close
Some scale numbers at LinkedIn
• 1.3 Trillion messages get ingested into Kafka per day
– Each message gets consumed 4-5 times
• Database change capture:
– A few Trillion messages get consumed per week
• Samza jobs in production which process more than 1 Million messages/sec
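As a rough sense of scale, the daily Kafka figure can be converted into an average per-second rate. The calculation below assumes traffic is spread evenly over the day, which real traffic is not, so peaks would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

messages_per_day = 1_300_000_000_000  # 1.3 trillion, from the slide

# Average ingest rate, assuming an even spread over the day.
avg_ingest_per_sec = messages_per_day / SECONDS_PER_DAY

# Each message is consumed 4-5 times, so reads are a multiple of ingest.
avg_reads_per_sec_low = 4 * avg_ingest_per_sec
avg_reads_per_sec_high = 5 * avg_ingest_per_sec

if __name__ == "__main__":
    print(f"ingest: {avg_ingest_per_sec / 1e6:.1f} M msg/sec")
    print(f"reads:  {avg_reads_per_sec_low / 1e6:.1f} to "
          f"{avg_reads_per_sec_high / 1e6:.1f} M msg/sec")
```

That works out to roughly 15 million messages ingested per second on average, and roughly 60 to 75 million consumed per second.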
References
• http://samza.apache.org/
• http://kafka.apache.org/
• https://github.com/linkedin/databus
• http://cs.brown.edu/~ugur/8rulesSigRec.pdf
• http://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/p734-akidau.pdf
Thank You!

Essential ingredients for real time stream processing @Scale by Kartik Paramasivam at Big Data Spain 2015

  • 2. Essential Ingredients of Stream Processing @ Scale Kartik Paramasivam
  • 3. About Me • ‘Streams Infrastructure’ at LinkedIn – Pub-sub messaging : Apache Kafka – Change Capture from various data systems: Databus – Stream Processing platform : Apache Samza • Previous – Built Microsoft Cloud Messaging (EventHub) and Enterprise Messaging(Queues/Topics) – .NET WebServices and Workflow stack – BizTalk Server
  • 4. Agenda • What is Stream Processing ? • Scenarios • Canonical Architecture • Essential Ingredients of Stream Processing • Close
  • 6. Agenda • Stream processing Intro • Scenarios • Canonical Architecture • Essential Ingredients of Stream Processing • Close
  • 10. Agenda • Stream processing Intro • Scenarios • Canonical Architecture • Essential Ingredients of Stream Processing • Close
  • 12. Agenda • Stream processing Intro • Scenarios • Canonical Architecture • Essential Ingredients of Stream Processing • Close
  • 13. Essential Ingredients to Stream Processing 1. Scale 2. Reprocessing 3. Accuracy of results 4. Easy to program
  • 14. SCALE.. but not at any cost
  • 15. Basics: Scaling Ingestion - Streams are partitioned - Messages are sent to partitions based on PartitionKey - Time-based message retention - e.g. Kafka, AWS Kinesis, Azure EventHub [Diagram labels: producers; Stream A; Pkey=10, Pkey=25, Pkey=45; consumerA (machine1); consumerA (machine2)]
  • 16. Scaling Processing.. E.g. Samza [Diagram labels: Stream A; Samza Job; Task 1, Task 2, Task 3; Stream B]
  • 17. Samza – Streaming Dataflow [Diagram labels: Stream A, Stream B, Stream C, Stream D; Job 1, Job 2]
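The key-based routing on this slide can be sketched in a few lines. This is a minimal illustration, not how Kafka, Kinesis or EventHub implement it: the function names `partition_for` and `route` are hypothetical, and `zlib.crc32` is used only as a convenient stable hash (each real system has its own hashing scheme).

```python
import zlib


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index using a stable hash."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Assumption: crc32 stands in for the broker's real hash function.
    return zlib.crc32(partition_key.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    """Group (key, value) messages by the partition their key hashes to."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions
```

Because the hash is deterministic, every message with the same key lands in the same partition and keeps its relative order there, which is what lets one consumer instance own all traffic for a given key.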
  • 18. Horizontal Scaling is great! But.. • More machines mean more $$ • Need to do more with less • So what’s the key bottleneck during Event/Stream Processing?
  • 19. Key Bottleneck: “Accessing Data” • Big impact on CPU, Network, Disk • Types of Data Access 1. Adjunct data – Read-only data 2. Scratchpad/derived data – Read-write data
  • 20. Adjunct Data – typical access [Diagram: Kafka (AdClicks), Processing Job, Kafka (AdQuality update), Member Database; the job reads Member Info from the database] Concerns 1. Latency 2. CPU 3. Network 4. DDOS
  • 21. Scratchpad/Derived Data – typical access [Diagram: Kafka (Sensor Data), Processing Job, Kafka (Alerts), Device State Database; the job reads and updates per-device info in the database] Concerns 1. Latency 2. CPU 3. Network 4. DDOS
  • 22. Adjunct Data – with Samza [Diagram: Kafka (AdClicks), Processing Job with Task1, Task2 and Task3 backed by RocksDb, Kafka (output); Member Updates reach the job from the Member Database (Espresso) via Databus] Kafka, Databus, Database, Samza Job are all partitioned by MemberId
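A rough sketch of the idea on this slide: each task keeps a local copy of the member data for its own partition, fed by the change-capture stream, and looks up members locally instead of calling the remote database per event. The class and method names are hypothetical, and a plain dict stands in for the embedded RocksDB store; this is not the Samza API.

```python
class AdClickTask:
    """One task instance, owning a single MemberId partition."""

    def __init__(self):
        # Stand-in for the task's embedded local store (RocksDB on the slide).
        self.member_store = {}

    def on_member_update(self, member_id, profile):
        """Apply a change-capture event (Databus on the slide) to local state."""
        self.member_store[member_id] = profile

    def on_ad_click(self, member_id, ad_id):
        """Enrich a click with member info using a local lookup only."""
        profile = self.member_store.get(member_id)  # no remote database call
        return {"member_id": member_id, "ad_id": ad_id, "member": profile}
```

The lookup is only safe because both input streams are partitioned by MemberId, so the update for a member and the clicks by that member always arrive at the same task.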
  • 23. Fault Tolerance in a stateful Samza job: Stable State [Diagram: input partitions P0-P3; Task-0, Task-1, Task-2 and Task-3 running on Host-A, Host-B and Host-C; Changelog Stream partitions P0-P3]
  • 24. Fault Tolerance in a stateful Samza job: Host A dies/fails [Same diagram]
  • 25. Fault Tolerance in a stateful Samza job: YARN allocates the tasks to a container on a different host! [Same diagram, with Host-E in place of Host-A]
  • 26. Fault Tolerance in a stateful Samza job: Restore local state by reading from the ChangeLog [Same diagram, with Host-E in place of Host-A]
  • 27. Fault Tolerance in a stateful Samza job: Back to Stable State [Same diagram, with Host-E in place of Host-A]
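The recovery sequence on these slides can be illustrated with a small sketch, assuming every local write is also appended to a changelog and a replacement task rebuilds its state by replaying that log. `ChangelogBackedStore` and `compact` are hypothetical names, a Python list stands in for one Kafka changelog partition, and `None` is used as a delete marker.

```python
class ChangelogBackedStore:
    """Local key-value state whose writes are mirrored to a changelog."""

    def __init__(self, changelog):
        self.changelog = changelog  # append-only list of (key, value) records
        self.state = {}

    def put(self, key, value):
        self.state[key] = value
        self.changelog.append((key, value))

    def delete(self, key):
        self.state.pop(key, None)
        self.changelog.append((key, None))  # None marks a deletion

    @classmethod
    def restore(cls, changelog):
        """Rebuild local state on a new host by replaying the changelog."""
        store = cls(changelog)
        for key, value in changelog:
            if value is None:
                store.state.pop(key, None)
            else:
                store.state[key] = value
        return store


def compact(changelog):
    """Keep only the latest value per key, dropping deleted keys."""
    latest = {}
    for key, value in changelog:
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]
```

Replaying a compacted log yields the same state as replaying the full log, but the restore time then depends on the number of live keys rather than on the total number of writes.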
  • 28. Performance Numbers with Samza • Hardware Spec: 24 cores, 1Gig NIC, SSD • (Baseline) Simple pass-through job with no local state – 1.2 Million msg/sec • Samza job with local state – 400k msg/sec • Samza job with local state with Kafka backup – 300k msg/sec
  • 29. Local State - Summary • Great for both read-only data and read-write data • Secret sauce to make local state work 1. Change Capture System: Databus/DynamoDB streams 2. Durable backup with Kafka Log Compacted topics
  • 30. Essential Ingredients to Stream Processing 1.Scale 2.Reprocessing 3.Accuracy of results 4.Easy to program
  • 32. Why do we need reprocessing? • Software upgrades.. Yes, bugs are a reality • Business logic changes • First-time job deployment
  • 33. Reprocessing Data – with Samza [Diagram: Member Updates flow from the Member Database (Espresso) through Databus into the Company/Title/Location Standardization Job, which uses a Machine Learning model and writes output to Kafka; one path is labeled “bootstrap”]
  • 34. Reprocessing – Caveats • Stream processors are fast.. They can DoS the system if you reprocess – Control max-concurrency of your job – Quotas for Kafka, Databases – Async load into databases (Project Venice) • Capacity – Reprocessing a 100 TB source? • Doesn’t reprocessing mean you are no longer being real-time?
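One way to picture the quota idea above is a simple throttle that spaces out calls to a downstream system during reprocessing. This is an illustrative sketch only: `Throttle` and `reprocess` are hypothetical names, not part of Samza or Kafka, and the clock and sleep functions are injectable purely so the behavior can be checked without real waiting.

```python
import time


class Throttle:
    """Space calls so that at most `max_per_second` happen per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self):
        """Block until the next call is allowed, then reserve a slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
        # Schedule the following slot one interval after this one.
        self.next_allowed = max(now, self.next_allowed) + self.interval


def reprocess(records, handler, throttle):
    """Replay records through handler without exceeding the quota."""
    results = []
    for record in records:
        throttle.wait()
        results.append(handler(record))
    return results
```

A call such as `reprocess(old_events, write_to_db, Throttle(1000))` would cap the replay at roughly 1000 writes per second, trading reprocessing time for protection of the serving database.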
  • 35. Essential Ingredients to Stream Processing 1.Scale but not at any cost 2.Reprocessing 3.Accuracy of results 4.Easy to Program
  • 37. Querying over an infinite stream [Diagram: User1 generates an Ad View Event at 1:00 pm and an Ad Click Event at 1:01 pm; both go to the Ad Quality Processor] Did the user click the Ad within 2 minutes of seeing the Ad?
  • 38. WHY DELAYS HAPPEN? [Diagram: user kartik behind a load balancer (LB); DATACENTER 1 and DATACENTER 2 each contain a Services Tier, Kafka and an Ad Quality Processor (Samza); Kafka is mirrored between the datacenters; event shown: AdViewEvent]
  • 39. WHY DELAYS HAPPEN? [Diagram: user kartik behind a load balancer (LB); DATACENTER 1 and DATACENTER 2 each contain a Services Tier, Kafka and Real Time Processing (Samza); Kafka is mirrored between the datacenters; event shown: AdClick Event]
  • 40. What do we need to do to get accurate results? Deal with • Late Arrivals – E.g. AdClick event showed up 5 minutes late. • Out of order arrival – E.g. AdClick event showed up before AdView event • Influenced by “Google MillWheel”
  • 41. Solution [Diagram: Kafka (AdClicks) and Kafka (AdView) feed a Processing Job with Task1, Task2 and Task3, each with a local Message Store; results go to Kafka (output)] 1. All events are stored locally 2. Find impacted ‘window/s’ for late arrivals 3. Recompute result 4. Choose strategy for emitting results (absolute or relative)
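The four steps above can be sketched for the "click within 2 minutes of a view" question. This is a minimal illustration under stated assumptions: `AdQualityTask` is a hypothetical class, in-memory dicts stand in for the local message store, event times are plain seconds, and the task emits the full recomputed answer whenever it changes, which is only one possible reading of the slide's "absolute" strategy.

```python
from collections import defaultdict

WINDOW_SECONDS = 120  # "within 2 minutes of seeing the Ad"


class AdQualityTask:
    def __init__(self):
        # Step 1: keep every event locally, keyed by (user, ad).
        self.views = defaultdict(list)
        self.clicks = defaultdict(list)
        self.results = {}  # last emitted answer per (user, ad)

    def _recompute(self, key):
        # Steps 2 and 3: only this key is impacted, so recompute it
        # from event times, regardless of arrival order.
        clicked = any(
            0 <= click - view <= WINDOW_SECONDS
            for view in self.views[key]
            for click in self.clicks[key]
        )
        previous = self.results.get(key)
        self.results[key] = clicked
        # Step 4: emit the full corrected result only when it changes.
        return None if previous == clicked else (key, clicked)

    def on_view(self, user, ad, event_time):
        key = (user, ad)
        self.views[key].append(event_time)
        return self._recompute(key)

    def on_click(self, user, ad, event_time):
        key = (user, ad)
        self.clicks[key].append(event_time)
        return self._recompute(key)
```

If a click with event time 60 arrives before its view with event time 0, the task first emits `False` for that pair and then corrects it to `True` once the view shows up, so out-of-order arrival does not change the final answer.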
  • 42. Myth: This isn’t a problem with Lambda Architecture.. • Theory: Since the processing happens 1 hour or several hours later delays are not a problem. • Ok.. But what about the “edges” – Some “sessions” start before the cut off time for processing.. And end after the cut off time. – Delays and out of order processing make things worse on the edges
  • 43. Essential Ingredients to Stream Processing 1.Scale but not at any cost 2.Reprocessing 3.Accuracy of results 4.Easy Programmability
  • 44. Easy Programmability • Support for “accurate” Windowing/Joins (Google Cloud Dataflow) • Ability to express workflows/DAGs in config and DSL (e.g. Storm) • SQL support for querying over streams – Azure Stream Insight • Apache Samza – working on the above
  • 46. Some scale numbers at LinkedIn • 1.3 Trillion Messages get ingested into Kafka per day – Each message gets consumed 4-5 times • Database change capture : – A few Trillion Messages get consumed per week • Samza jobs in production which process more than 1 Million messages/sec
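To put the daily figure in per-second terms, here is a short back-of-the-envelope calculation. It assumes load is spread evenly over the day, which real traffic is not, so these are averages rather than peaks; the variable names are illustrative.

```python
MESSAGES_PER_DAY = 1.3e12          # "1.3 Trillion Messages ... per day"
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400

# Average ingest rate into Kafka, in messages per second.
avg_ingest_per_sec = MESSAGES_PER_DAY / SECONDS_PER_DAY

# Each message is consumed 4-5 times, so average read traffic is 4x to 5x.
avg_consume_low = avg_ingest_per_sec * 4
avg_consume_high = avg_ingest_per_sec * 5

print(f"ingest:  {avg_ingest_per_sec / 1e6:.1f} million msg/sec")
print(f"consume: {avg_consume_low / 1e6:.1f} to {avg_consume_high / 1e6:.1f} million msg/sec")
```

That works out to an average of about 15 million messages ingested per second, and roughly 60 to 75 million consumed per second.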
  • 47. References • http://samza.apache.org/ • http://kafka.apache.org/ • https://github.com/linkedin/databus • http://cs.brown.edu/~ugur/8rulesSigRec.pdf • http://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/p734-akidau.pdf