Stateful Stream
Processing at In-Memory
Speed
Jamie Grier
@jamiegrier
jamie@data-artisans.com
Who am I?
• Director of Applications Engineering at data
Artisans
• Previously working on streaming computation at
Twitter, Gnip and Boulder Imaging
• Involved in various kinds of stream processing for
about a decade
• High-speed video, social media streaming, general
frameworks for stream processing
Overview
• In stateful stream processing the bottleneck has often
been the key-value store
• Accuracy has been sacrificed for speed
• Lambda Architecture was developed to address
shortcomings of stream processors
• Can we remove the key-value store bottleneck and
enable processing at in-memory speeds?
• Can we do this accurately without Lambda Architecture?
Problem statement
• Incoming message rate: 1.5 million/sec
• Group by several dimensions and aggregate
over 1 hour event-time windows
• Write hourly time series data to database
• Respond to queries both over historical data and
the live in-flight aggregates
Input and Queries
Stream
• tweet-id: 1, event: url-click, time: 01:01:01
• tweet-id: 2, event: url-click, time: 01:01:02
• tweet-id: 1, event: impression, time: 01:01:03
• tweet-id: 2, event: url-click, time: 02:01:01
• tweet-id: 1, event: impression, time: 02:02:02
Query → Result
• tweet-id: 1, event: url-click, time: 01:00:00 → 1
• tweet-id: 1, event: *, time: 01:00:00 → 2
• tweet-id: *, event: *, time: 01:00:00 → 3
• tweet-id: *, event: impression, time: 02:00:00 → 1
• tweet-id: 2, event: *, time: 02:00:00 → 1
Input and Queries
Stream
• tweet-id: 1, event: url-click, time: 01:01:03
• tweet-id: 2, event: url-click, time: 01:01:02
• tweet-id: 1, event: impression, time: 01:01:01
• tweet-id: 2, event: url-click, time: 02:02:01
• tweet-id: 1, event: impression, time: 02:01:02
Query → Result
• tweet-id: 1, event: url-click, time: 01:00:00 → 1
• tweet-id: 1, event: *, time: 01:00:00 → 2
• tweet-id: *, event: *, time: 01:00:00 → 3
• tweet-id: *, event: impression, time: 02:00:00 → 1
• tweet-id: 2, event: *, time: 02:00:00 → 1
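The grouping shown on the Input and Queries slides can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the code used in the talk: the function names `hour_bucket` and `aggregate` are made up for this sketch, and the "*" rollups are computed eagerly for every event, which is an assumption about how the wildcard queries might be served. Each event is assigned to a one-hour window by its own timestamp (event time), so arrival order does not change the result.

```python
from collections import Counter
from itertools import product


def hour_bucket(time_str):
    """Truncate an HH:MM:SS event time to the start of its hour."""
    return time_str[:2] + ":00:00"


def aggregate(events):
    """Count events per (tweet-id, event, hour), including '*' rollups."""
    counts = Counter()
    for tweet_id, event, time_str in events:
        window = hour_bucket(time_str)
        # Each event contributes to its exact key and to every wildcard rollup.
        for t, e in product((tweet_id, "*"), (event, "*")):
            counts[(t, e, window)] += 1
    return counts


# The two sample streams from the slides: in order, then out of order.
in_order = [
    ("1", "url-click", "01:01:01"),
    ("2", "url-click", "01:01:02"),
    ("1", "impression", "01:01:03"),
    ("2", "url-click", "02:01:01"),
    ("1", "impression", "02:02:02"),
]
out_of_order = [
    ("1", "url-click", "01:01:03"),
    ("2", "url-click", "01:01:02"),
    ("1", "impression", "01:01:01"),
    ("2", "url-click", "02:02:01"),
    ("1", "impression", "02:01:02"),
]

for stream in (in_order, out_of_order):
    counts = aggregate(stream)
    print(
        counts[("1", "url-click", "01:00:00")],
        counts[("1", "*", "01:00:00")],
        counts[("*", "*", "01:00:00")],
        counts[("*", "impression", "02:00:00")],
        counts[("2", "*", "02:00:00")],
    )  # prints "1 2 3 1 1" for both streams
```

Both streams produce the query results listed on the slides, because bucketing depends only on the timestamp carried by each event.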
Input and Queries
Time Series Data
[Chart: Tweet Impressions over time for Tweet 1 and Tweet 2; x-axis from 01:00:00 to 04:00:00, y-axis from 0 to 125]
Any questions so far?
Legacy System (Lambda Architecture)
[Diagram labels: Stream Processor, Hadoop, Streaming, Batch]
• Aggregates built directly in
key/value store
• Read/modify/write for every
message
• Inaccurate: double-counting,
lost pre-aggregated data
• Hadoop job improves results
after 24 hours
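A minimal sketch of why read/modify/write against an external store can double count. Everything here is hypothetical: `ExternalStore` is an in-memory stand-in for a remote key/value store, and the failure is simulated by simply processing part of a batch and then replaying the whole batch, as an at-least-once source would after an unacknowledged failure.

```python
class ExternalStore:
    """In-memory stand-in for a remote key/value store; counts round trips."""

    def __init__(self):
        self.data = {}
        self.requests = 0

    def get(self, key):
        self.requests += 1
        return self.data.get(key, 0)

    def put(self, key, value):
        self.requests += 1
        self.data[key] = value


def process(store, messages):
    """Read/modify/write once per message."""
    for key in messages:
        current = store.get(key)     # read
        store.put(key, current + 1)  # modify and write back


messages = ["tweet-1", "tweet-2", "tweet-1", "tweet-1"]
store = ExternalStore()

# First attempt: the job fails after three messages, before acknowledging them.
process(store, messages[:3])
# At-least-once delivery: the whole unacknowledged batch is replayed.
process(store, messages)

print(store.data)      # {'tweet-1': 5, 'tweet-2': 2}, true counts are 3 and 1
print(store.requests)  # 14 round trips for 4 distinct input messages
```

The store has no way to tell a replayed increment from a new one, so the replay inflates the counts, and every message costs two requests to the store.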
Any questions so far?
Goals for Prototype
System
• Feature parity with existing system
• Attempt to reduce hardware footprint by 100x
• Exactly once semantics: compute correct results in real-
time with or without failures. Failures should not lead to
missing data or double counting
• Satisfy realtime queries with low latency
• One system: No Lambda Architecture!
• Eliminate the key/value store bottleneck (big win)
My road to
Apache Flink
• Interested in Google Cloud Dataflow
• Google nailed the semantics for stream processing
• Unified batch and stream processing with one model
• Dataflow didn’t exist in open source at the time (or so I
thought) and I wanted to build it.
• My wife wouldn’t let me quit my job!
• Dataflow SDK is now open source as Apache Beam and
Flink is the most complete runner.
Why Apache Flink?
• Basically identical semantics to Google Cloud Dataflow
• Flink is a true fault-tolerant stateful stream processor
• Exactly once guarantees for state updates
• The state management features might allow us to eliminate the key-value
store
• Windowing is built-in which makes time series easy
• Native event time support / correct time based aggregations
• Very fast data shuffling in benchmarks: 83 million msgs/sec on 30 machines
• Flink “just works” with no tuning - even at scale!
Prototype System
[Diagram labels: Apache Flink, Streaming, Query Service]
We now have a sharded key/value store inside the stream processor
Why not just query that?
Prototype System
• Eliminates the key-value store
bottleneck
• Eliminates the batch layer
• No more Lambda Architecture!
• Realtime queries over in-flight
aggregates
• Hourly aggregates written to
database
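The shape of the prototype can be sketched as follows, assuming a hash-sharded in-memory map for in-flight aggregates and a plain dictionary standing in for the database. None of this is Flink API: `WindowedCounter`, `shard_for`, `close_window` and `NUM_SHARDS` are hypothetical names for illustration, and a real deployment would spread the shards across machines.

```python
import zlib

NUM_SHARDS = 4  # hypothetical degree of parallelism


def shard_for(key):
    """Stable hash so that a given key always maps to the same shard."""
    return zlib.crc32(repr(key).encode("utf-8")) % NUM_SHARDS


class WindowedCounter:
    def __init__(self):
        self.shards = [dict() for _ in range(NUM_SHARDS)]  # in-flight aggregates
        self.database = {}  # stand-in for the external time series database

    def add(self, tweet_id, event, hour):
        """Update the in-flight aggregate locally; no external request."""
        key = (tweet_id, event, hour)
        shard = self.shards[shard_for(key)]
        shard[key] = shard.get(key, 0) + 1

    def query(self, tweet_id, event, hour):
        """Serve completed hours from the database, live hours from state."""
        key = (tweet_id, event, hour)
        if key in self.database:
            return self.database[key]
        return self.shards[shard_for(key)].get(key, 0)

    def close_window(self, hour):
        """Write each completed aggregate to the database once, then drop it."""
        for shard in self.shards:
            for key in [k for k in shard if k[2] == hour]:
                self.database[key] = shard.pop(key)


counter = WindowedCounter()
counter.add("1", "url-click", "01:00:00")
counter.add("1", "url-click", "01:00:00")
counter.add("1", "impression", "01:00:00")
live = counter.query("1", "url-click", "01:00:00")    # answered from in-flight state
counter.close_window("01:00:00")                       # the hour is complete
stored = counter.query("1", "url-click", "01:00:00")  # answered from the database
print(live, stored, len(counter.database))  # prints "2 2 2"
```

Three input events result in two database writes, one per completed aggregate, while queries see the same value before and after the window closes.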
The Results
• Uses 0.5% of the resources of the legacy system:
An improvement of 200x with zero tuning!
• Exactly once analytics in realtime
• Complete elimination of batch layer and Lambda
Architecture
• Successfully eliminated the key-value store
bottleneck
How is 200x improvement
possible?
• The key is making use of fault-tolerant state inside the
stream processor
• Computation proceeds at in-memory speeds
• No need to make requests over the network to update
values in external store
• Dramatically less load on the database because only the
completed window aggregates are written there.
• Flink is extremely efficient at network I/O and data shuffling,
and has a highly optimized serialization architecture
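One way to picture fault-tolerant state inside the stream processor is a counter that snapshots its state together with its input position. This is a deliberately simplified, single-process sketch and not Flink's actual checkpointing mechanism; `StatefulCounter` and its methods are hypothetical names. The point it illustrates: because state and offset are restored together, replayed messages are applied to a rolled-back state and each message is counted exactly once.

```python
import copy


class StatefulCounter:
    """Keeps counts in memory and checkpoints them with the input offset."""

    def __init__(self):
        self.counts = {}
        self.offset = 0            # position in the replayable input log
        self._snapshot = ({}, 0)   # last checkpoint: (counts, offset)

    def process(self, log, upto):
        """Apply messages from the current offset up to (not including) upto."""
        while self.offset < upto:
            key = log[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1  # in-memory update
            self.offset += 1

    def checkpoint(self):
        # State and input position are captured together.
        self._snapshot = (copy.deepcopy(self.counts), self.offset)

    def restore(self):
        # After a failure, roll back both state and input position.
        counts, offset = self._snapshot
        self.counts = copy.deepcopy(counts)
        self.offset = offset


log = ["tweet-1", "tweet-2", "tweet-1", "tweet-1"]
op = StatefulCounter()
op.process(log, upto=2)
op.checkpoint()
op.process(log, upto=3)         # work done after the checkpoint
op.restore()                    # simulated failure: that work is discarded
op.process(log, upto=len(log))  # the input is replayed from the checkpoint
print(op.counts)  # {'tweet-1': 3, 'tweet-2': 1}
```

Every update is a local dictionary write, with no network request per message, and the final counts match the input even though one message was processed twice.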
Does this matter
at smaller scale?
• YES it does!
• Much larger problems on the same hardware
investment
• Exactly-once semantics and state management
are important at any scale!
• Engineering time invested can be expensive at
any scale if things don’t “just work”.
Summary
• Used stateful operator features in Flink to remove
the key/value store bottleneck
• Dramatic reduction in hardware costs (200x)
• Maintained feature parity by providing low-latency
queries for in-flight aggregates as well as long-
term storage of hourly time series data
• Actually improved accuracy of aggregations:
exactly-once vs. at-least-once semantics
Questions?
Thanks!

 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 

Recently uploaded (20)

Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 

Stateful Stream Processing at In-Memory Speed

  • 1. Stateful Stream Processing at In-Memory Speed Jamie Grier @jamiegrier jamie@data-artisans.com
  • 2. Who am I? • Director of Applications Engineering at data Artisans • Previously working on streaming computation at Twitter, Gnip and Boulder Imaging • Involved in various kinds of stream processing for about a decade • High-speed video, social media streaming, general frameworks for stream processing
  • 3. Overview • In stateful stream processing the bottleneck has often been the key-value store • Accuracy has been sacrificed for speed • Lambda Architecture was developed to address shortcomings of stream processors • Can we remove the key-value store bottleneck and enable processing at in-memory speeds? • Can we do this accurately without Lambda Architecture?
  • 4. Problem statement • Incoming message rate: 1.5 million/sec • Group by several dimensions and aggregate over 1 hour event-time windows • Write hourly time series data to database • Respond to queries both over historical data and the live in-flight aggregates
  • 5. Input and Queries • Stream: (tweet-id: 1, event: url-click, time: 01:01:01); (tweet-id: 2, event: url-click, time: 01:01:02); (tweet-id: 1, event: impression, time: 01:01:03); (tweet-id: 2, event: url-click, time: 02:01:01); (tweet-id: 1, event: impression, time: 02:02:02) • Query → Result: (tweet-id: 1, event: url-click, time: 01:00:00) → 1; (tweet-id: 1, event: *, time: 01:00:00) → 2; (tweet-id: *, event: *, time: 01:00:00) → 3; (tweet-id: *, event: impression, time: 02:00:00) → 1; (tweet-id: 2, event: *, time: 02:00:00) → 1
  • 6. Input and Queries • Stream: (tweet-id: 1, event: url-click, time: 01:01:03); (tweet-id: 2, event: url-click, time: 01:01:02); (tweet-id: 1, event: impression, time: 01:01:01); (tweet-id: 2, event: url-click, time: 02:02:01); (tweet-id: 1, event: impression, time: 02:01:02) • Query → Result: (tweet-id: 1, event: url-click, time: 01:00:00) → 1; (tweet-id: 1, event: *, time: 01:00:00) → 2; (tweet-id: *, event: *, time: 01:00:00) → 3; (tweet-id: *, event: impression, time: 02:00:00) → 1; (tweet-id: 2, event: *, time: 02:00:00) → 1
  • 11. Time Series Data [Chart: hourly tweet impressions for Tweet 1 and Tweet 2; x-axis 01:00:00 to 04:00:00, y-axis 0 to 125]
  • 13. Legacy System (Lambda Architecture) [Diagram labels: Stream Processor (Streaming), Hadoop (Batch)]
  • 22. Legacy System (Lambda Architecture) • Aggregates built directly in key/value store • Read/modify/write for every message • Inaccurate: double-counting, lost pre-aggregated data • Hadoop job improves results after 24 hours
  • 24. Goals for Prototype System • Feature parity with existing system • Attempt to reduce hardware footprint by 100x • Exactly-once semantics: compute correct results in real time with or without failures. Failures should not lead to missing data or double counting • Satisfy realtime queries with low latency • One system: No Lambda Architecture! • Eliminate the key/value store bottleneck (big win)
  • 25. My road to Apache Flink • Interested in Google Cloud Dataflow • Google nailed the semantics for stream processing • Unified batch and stream processing with one model • Dataflow didn’t exist in open source at the time (or so I thought) and I wanted to build it. • My wife wouldn’t let me quit my job! • Dataflow SDK is now open source as Apache Beam and Flink is the most complete runner.
  • 26. Why Apache Flink? • Basically identical semantics to Google Cloud Dataflow • Flink is a true fault-tolerant stateful stream processor • Exactly-once guarantees for state updates • The state management features might allow us to eliminate the key-value store • Windowing is built-in which makes time series easy • Native event time support / correct time-based aggregations • Very fast data shuffling in benchmarks: 83 million msgs/sec on 30 machines • Flink “just works” with no tuning - even at scale!
  • 36. Prototype System [Diagram labels: Apache Flink, Streaming] We now have a sharded key/value store inside the stream processor
  • 37. Prototype System [Diagram labels: Apache Flink, Streaming] We now have a sharded key/value store inside the stream processor. Why not just query that!
  • 38. Prototype System [Diagram labels: Apache Flink, Query Service] We now have a sharded key/value store inside the stream processor. Why not just query that!
  • 39. Prototype System • Eliminates the key-value store bottleneck • Eliminates the batch layer • No more Lambda Architecture! • Realtime queries over in-flight aggregates • Hourly aggregates written to database
  • 40. The Results • Uses 0.5% of the resources of the legacy system: An improvement of 200x with zero tuning! • Exactly once analytics in realtime • Complete elimination of batch layer and Lambda Architecture • Successfully eliminated the key-value store bottleneck
  • 41. How is 200x improvement possible? • The key is making use of fault-tolerant state inside the stream processor • Computation proceeds at in-memory speeds • No need to make requests over the network to update values in external store • Dramatically less load on the database because only the completed window aggregates are written there. • Flink is extremely efficient at network I/O and data shuffling, and has highly optimized serialization architecture
  • 42. Does this matter at smaller scale? • YES it does! • Much larger problems on the same hardware investment • Exactly-once semantics and state management are important at any scale! • Engineering time invested can be expensive at any scale if things don’t “just work”.
  • 43. Summary • Used stateful operator features in Flink to remove the key/value store bottleneck • Dramatic reduction in hardware costs (200x) • Maintained feature parity by providing low-latency queries for in-flight aggregates as well as long-term storage of hourly time series data • Actually improved accuracy of aggregations: Exactly-once vs. at-least-once semantics
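
The "Input and Queries" slides can be made concrete with a small sketch. The Python below is not the prototype's code and does not use Flink; it is a toy, single-process illustration that assumes wildcard queries are served by pre-computing rollup counts per 1-hour event-time window (the deck does not say how wildcards are actually answered). The names `aggregate`, `window_start`, and `WILDCARD` are hypothetical, and watermarks and late data are ignored. It shows that grouping by event time yields the same per-window counts for the in-order stream of slide 5 and the out-of-order stream of slide 6.

```python
from collections import Counter
from itertools import product

WILDCARD = "*"  # hypothetical marker for "any value" in a query dimension


def window_start(time_str):
    """Truncate an HH:MM:SS event time to the start of its 1-hour window."""
    return time_str[:2] + ":00:00"


def aggregate(events):
    """Count events per (tweet_id, event, window), including wildcard rollups."""
    counts = Counter()
    for tweet_id, event, time_str in events:
        window = window_start(time_str)
        # Each event contributes to its exact key and to every wildcard rollup.
        for t, e in product((tweet_id, WILDCARD), (event, WILDCARD)):
            counts[(t, e, window)] += 1
    return counts


# Stream from slide 5 (events arrive in event-time order).
in_order = [
    (1, "url-click", "01:01:01"),
    (2, "url-click", "01:01:02"),
    (1, "impression", "01:01:03"),
    (2, "url-click", "02:01:01"),
    (1, "impression", "02:02:02"),
]

# Stream from slide 6 (events arrive out of order within each hour).
out_of_order = [
    (1, "url-click", "01:01:03"),
    (2, "url-click", "01:01:02"),
    (1, "impression", "01:01:01"),
    (2, "url-click", "02:02:01"),
    (1, "impression", "02:01:02"),
]

counts = aggregate(in_order)
print(counts[(1, "url-click", "01:00:00")])        # 1
print(counts[(1, WILDCARD, "01:00:00")])           # 2
print(counts[(WILDCARD, WILDCARD, "01:00:00")])    # 3
print(counts[(WILDCARD, "impression", "02:00:00")])  # 1
print(counts[(2, WILDCARD, "02:00:00")])           # 1
print(aggregate(in_order) == aggregate(out_of_order))  # True
```

Pre-computing rollups trades extra state per event for constant-time lookups at query time; a real system could equally compute wildcard results at query time by summing over matching keys.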

Editor's Notes

  1. Aggregates built directly in key/value store. Inaccurate: double-counting, lost aggregates. Hadoop batch job “fixes” later (Lambda Architecture). Hadoop job runs every 24 hours.
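
The double-counting mentioned in the note above can be illustrated with a toy simulation. This is a simplified sketch, not the legacy system or Flink itself: it assumes a single key-count stream, a crash after a fixed number of messages, and a source that replays from an earlier offset on recovery. The function names and parameters are hypothetical. The first function models read/modify/write against an external store that keeps its increments across the crash; the second models state that is snapshotted together with the input offset and restored on recovery, which is the general idea behind checkpointed operator state.

```python
def run_external_store(messages, crash_after, committed_offset):
    """Read/modify/write against an external store, with replay after a crash."""
    store = {}
    # First attempt: process messages until the crash.
    for key in messages[:crash_after]:
        store[key] = store.get(key, 0) + 1
    # Recovery: the source rewinds to the last committed offset, but the
    # external store still holds the earlier increments, so the messages
    # between committed_offset and crash_after are counted twice.
    for key in messages[committed_offset:]:
        store[key] = store.get(key, 0) + 1
    return store


def run_checkpointed_state(messages, crash_after, checkpoint_offset):
    """Keep counts in local state and snapshot them with the input offset.

    Assumes checkpoint_offset <= crash_after.
    """
    state = {}
    for key in messages[:checkpoint_offset]:
        state[key] = state.get(key, 0) + 1
    snapshot = dict(state)  # state exactly as of checkpoint_offset
    for key in messages[checkpoint_offset:crash_after]:
        state[key] = state.get(key, 0) + 1
    # Recovery: restore the snapshot and resume from its offset, so the
    # state and the input position roll back together.
    state = dict(snapshot)
    for key in messages[checkpoint_offset:]:
        state[key] = state.get(key, 0) + 1
    return state


messages = ["a", "b", "a", "a", "b"]  # true counts: a=3, b=2

print(run_external_store(messages, crash_after=4, committed_offset=2))
# {'a': 5, 'b': 2}
print(run_checkpointed_state(messages, crash_after=4, checkpoint_offset=2))
# {'a': 3, 'b': 2}
```

In the first run, key `a` is over-counted because two of its increments are applied both before and after the crash; in the second, rolling state and offset back together gives the true counts.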