SlideShare a Scribd company logo
1 of 47
Kostas Tzoumas
@kostas_tzoumas
Apache FlinkTM
Counting elements in streams
Introduction
2
3
Data streaming is becoming
increasingly popular*
*Biggest understatement of 2016
4
Streaming technology is enabling the
obvious: continuous processing on data that
is continuously produced
5
Streaming is the next programming
paradigm for data applications, and you
need to start thinking in terms of streams
Counting
6
Continuous counting
 A seemingly simple application,
but generally an unsolved
problem
 E.g., count visitors, impressions,
interactions, clicks, etc
 Aggregations and OLAP cube
operations are generalizations
of counting
7
Counting in batch architecture
 Continuous
ingestion
 Periodic (e.g.,
hourly) files
 Periodic
batch jobs
8
Problems with batch architecture
 High latency
 Too many moving parts
 Implicit treatment of time
 Out of order event handling
 Implicit batch boundaries
9
Counting in λ architecture
 "Batch layer": what
we had before
 "Stream layer":
approximate early
results
10
Problems with batch and λ
 Way too many moving parts (and code dup)
 Implicit treatment of time
 Out of order event handling
 Implicit batch boundaries
11
Counting in streaming architecture
 Message queue ensures stream durability and replay
 Stream processor ensures consistent counting
12
Counting in Flink DataStream API
Number of visitors in last hour by country
13
DataStream<LogEvent> stream = env
.addSource(new FlinkKafkaConsumer(...)); // create stream from Kafka
.keyBy("country"); // group by country
.timeWindow(Time.minutes(60)) // window of size 1 hour
.apply(new CountPerWindowFunction()); // do operations per window
Counting hierarchy of needs
14
Continuous counting
... with low latency,
... efficiently on high volume streams,
... fault tolerant (exactly once),
... accurate and
repeatable,
... queryable
Based on Maslow's
hierarchy of needs
Counting hierarchy of needs
15
Continuous counting
Counting hierarchy of needs
16
Continuous counting
... with low latency,
Counting hierarchy of needs
17
Continuous counting
... with low latency,
... efficiently on high volume streams,
Counting hierarchy of needs
18
Continuous counting
... with low latency,
... efficiently on high volume streams,
... fault tolerant (exactly once),
Counting hierarchy of needs
19
Continuous counting
... with low latency,
... efficiently on high volume streams,
... fault tolerant (exactly once),
... accurate and
repeatable,
Counting hierarchy of needs
20
Continuous counting
... with low latency,
... efficiently on high volume streams,
... fault tolerant (exactly once),
... accurate and
repeatable,
... queryable
1.1+
Rest of this talk
21
Continuous counting
... with low latency,
... efficiently on high volume streams,
... fault tolerant (exactly once),
... accurate and
repeatable,
... queryable
Latency
22
Yahoo! Streaming Benchmark
23
 Storm, Spark Streaming, and Flink benchmark by the
Storm team at Yahoo!
 Focus on measuring end-to-end latency at low
throughputs
 First benchmark that was modeled after a real
application
 Read more:
• https://yahooeng.tumblr.com/post/135321837876/benchma
rking-streaming-computation-engines-at
Benchmark task: counting!
24
 Count ad impressions
grouped by campaign
 Compute aggregates
over last 10 seconds
 Make aggregates
available for queries
(Redis)
Results (lower is better)
25
Flink and Storm at
sub-second latencies
Spark has a latency-
throughput tradeoff
170k events/sec
Efficiency, and scalability
26
Handling high-volume streams
 Scalability: how many events/sec can a
system scale to, with infinite resources?
 Scalability comes at a cost (systems add
overhead to be scalable)
 Efficiency: how many events/sec can a
system scale to, with limited resources?
27
Extending the Yahoo! benchmark
28
 Yahoo! benchmark is a great starting point to
understand engine behavior.
 However, benchmark stops at low write throughput
and programs are not fault tolerant.
 We extended the benchmark to (1) high volumes,
and (2) to use Flink's built-in state.
• http://data-artisans.com/extending-the-yahoo-
streaming-benchmark/
Results (higher is better)
29
Also: Flink jobs are correct under failures (exactly once), Storm jobs are not
500k events/sec
3mi events/sec
15mi events/sec
Fault tolerance and repeatability
30
Stateful streaming applications
 Most interesting stream applications are stateful
 How to ensure that state is correct after failures?
31
Fault tolerance definitions
 At least once
• May over-count
• In some cases even under-count with different definition
 Exactly once
• Counts are the same after failure
 End-to-end exactly once
• Counts appear the same in an external sink (database, file
system) after failure
32
Fault tolerance in Flink
 Flink guarantees exactly once
 End-to-end exactly once supported with
specific sources and sinks
• E.g., Kafka  Flink  HDFS
 Internally, Flink periodically takes consistent
snapshots of the state without ever stopping
the computation
33
Savepoints
 Maintaining stateful
applications in production is
challenging
 Flink savepoints: externally
triggered, durable checkpoints
 Easy code upgrades (Flink or
app), maintenance, migration,
and debugging, what-if
simulations, A/B tests
34
Explicit handling of time
35
Notions of time
36
1977 1980 1983 1999 2002 2005 2015
Episode IV:
A New Hope
Episode V:
The Empire
Strikes Back
Episode VI:
Return of
the Jedi
Episode I:
The Phantom
Menace
Episode II:
Attach of
the Clones
Episode III:
Revenge of
the Sith
Episode VII:
The Force
Awakens
This is called event time
This is called processing time
Out of order streams
37
Event time windows
Arrival time windows
Instant event-at-a-time
First burst of events
Second burst of events
Why event time
 Most stream processors are limited to
processing/arrival time, Flink can operate on
event time as well
 Benefits of event time
• Accurate results for out of order data
• Sessions and unaligned windows
• Time travel (backstreaming)
38
What's coming up in Flink
39
Evolution of streaming in Flink
 Flink 0.9 (Jun 2015): DataStream API in beta, exactly-once
guarantees via checkpoiting
 Flink 0.10 (Nov 2015): Event time support, windowing
mechanism based on Dataflow/Beam model, graduated
DataStream API, high availability, state interface,
new/updated connectors (Kafka, Nifi, Elastic, ...), improved
monitoring
 Flink 1.0 (Mar 2015): DataStream API stability, out of core
state, savepoints, CEP library, improved monitoring, Kafka 0.9
support
40
Upcoming features
 SQL: ongoing work in collaboration with Apache
Calcite
 Dynamic scaling: adapt resources to stream volume,
historical stream processing
 Queryable state: ability to query the state inside the
stream processor
 Mesos support
 More sources and sinks (e.g., Kinesis, Cassandra)
41
Queryable state
42
Using the
stream
processor as a
database
Closing
43
Summary
 Stream processing gaining momentum, the right
paradigm for continuous data applications
 Even seemingly simple applications can be complex at
scale and in production – choice of framework crucial
44
 Flink: unique
combination of
capabilities,
performance, and
robustness
Flink in the wild
45
30 billion events daily 2 billion events in
10 1Gb machines
Picked Flink for "Saiki"
data integration & distribution
platform
See talks by at
Join the community!
46
 Follow: @ApacheFlink, @dataArtisans
 Read: flink.apache.org/blog, data-artisans.com/blog
 Subscribe: (news | user | dev) @ flink.apache.org
2 meetups next week in Bay Area!
47
April 6, San JoseApril 5, San Francisco
What's new in Flink 1.0 & recent performance benchmarks with Flink
http://www.meetup.com/Bay-Area-Apache-Flink-Meetup/

More Related Content

What's hot

Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016Till Rohrmann
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Flink Forward
 
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Stephan Ewen
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Robert Metzger
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Apache Flink Taiwan User Group
 
Flink Streaming Berlin Meetup
Flink Streaming Berlin MeetupFlink Streaming Berlin Meetup
Flink Streaming Berlin MeetupMárton Balassi
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseKostas Tzoumas
 
Tech Talk @ Google on Flink Fault Tolerance and HA
Tech Talk @ Google on Flink Fault Tolerance and HATech Talk @ Google on Flink Fault Tolerance and HA
Tech Talk @ Google on Flink Fault Tolerance and HAParis Carbone
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonStephan Ewen
 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkFlink Forward
 
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015Till Rohrmann
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkFlink Forward
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkDataWorks Summit
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Till Rohrmann
 
Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingKostas Tzoumas
 
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...Flink Forward
 
Pulsar connector on flink 1.14
Pulsar connector on flink 1.14Pulsar connector on flink 1.14
Pulsar connector on flink 1.14宇帆 盛
 

What's hot (20)

Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
 
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
 
Flink Streaming Berlin Meetup
Flink Streaming Berlin MeetupFlink Streaming Berlin Meetup
Flink Streaming Berlin Meetup
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
 
Tech Talk @ Google on Flink Fault Tolerance and HA
Tech Talk @ Google on Flink Fault Tolerance and HATech Talk @ Google on Flink Fault Tolerance and HA
Tech Talk @ Google on Flink Fault Tolerance and HA
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache Flink
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?
 
Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream Processing
 
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
 
A look at Flink 1.2
A look at Flink 1.2A look at Flink 1.2
A look at Flink 1.2
 
Big Data Warsaw
Big Data WarsawBig Data Warsaw
Big Data Warsaw
 
Pulsar connector on flink 1.14
Pulsar connector on flink 1.14Pulsar connector on flink 1.14
Pulsar connector on flink 1.14
 

Similar to Apache Flink at Strata San Jose 2016

Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in StreamsJamie Grier
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Ververica
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingKostas Tzoumas
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorAljoscha Krettek
 
Data analytics at scale implementing stateful stream processing - publish
Data analytics at scale implementing stateful stream processing - publishData analytics at scale implementing stateful stream processing - publish
Data analytics at scale implementing stateful stream processing - publishCodeValue
 
GOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkGOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkRobert Metzger
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkRobert Metzger
 
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextKostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextVerverica
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward
 
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Till Rohrmann
 
The Power of Distributed Snapshots in Apache Flink
The Power of Distributed Snapshots in Apache FlinkThe Power of Distributed Snapshots in Apache Flink
The Power of Distributed Snapshots in Apache FlinkC4Media
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...Ververica
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Kai Wähner
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Flink Forward
 
Chicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureChicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureRobert Metzger
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Devoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en basDevoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en basFlorent Ramiere
 

Similar to Apache Flink at Strata San Jose 2016 (20)

Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in Streams
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream Processor
 
Data analytics at scale implementing stateful stream processing - publish
Data analytics at scale implementing stateful stream processing - publishData analytics at scale implementing stateful stream processing - publish
Data analytics at scale implementing stateful stream processing - publish
 
GOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkGOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache Flink
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache Flink
 
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextKostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
 
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
 
The Power of Distributed Snapshots in Apache Flink
The Power of Distributed Snapshots in Apache FlinkThe Power of Distributed Snapshots in Apache Flink
The Power of Distributed Snapshots in Apache Flink
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
 
Chicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureChicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architecture
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Devoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en basDevoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en bas
 

Recently uploaded

Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 

Recently uploaded (20)

Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 

Apache Flink at Strata San Jose 2016

  • 3. 3 Data streaming is becoming increasingly popular* *Biggest understatement of 2016
  • 4. 4 Streaming technology is enabling the obvious: continuous processing on data that is continuously produced
  • 5. 5 Streaming is the next programming paradigm for data applications, and you need to start thinking in terms of streams
  • 7. Continuous counting  A seemingly simple application, but generally an unsolved problem  E.g., count visitors, impressions, interactions, clicks, etc  Aggregations and OLAP cube operations are generalizations of counting 7
  • 8. Counting in batch architecture  Continuous ingestion  Periodic (e.g., hourly) files  Periodic batch jobs 8
  • 9. Problems with batch architecture  High latency  Too many moving parts  Implicit treatment of time  Out of order event handling  Implicit batch boundaries 9
  • 10. Counting in λ architecture  "Batch layer": what we had before  "Stream layer": approximate early results 10
  • 11. Problems with batch and λ  Way too many moving parts (and code dup)  Implicit treatment of time  Out of order event handling  Implicit batch boundaries 11
  • 12. Counting in streaming architecture  Message queue ensures stream durability and replay  Stream processor ensures consistent counting 12
  • 13. Counting in Flink DataStream API Number of visitors in last hour by country 13 DataStream<LogEvent> stream = env .addSource(new FlinkKafkaConsumer(...)); // create stream from Kafka .keyBy("country"); // group by country .timeWindow(Time.minutes(60)) // window of size 1 hour .apply(new CountPerWindowFunction()); // do operations per window
  • 14. Counting hierarchy of needs 14 Continuous counting ... with low latency, ... efficiently on high volume streams, ... fault tolerant (exactly once), ... accurate and repeatable, ... queryable Based on Maslow's hierarchy of needs
  • 15. Counting hierarchy of needs 15 Continuous counting
  • 16. Counting hierarchy of needs 16 Continuous counting ... with low latency,
  • 17. Counting hierarchy of needs 17 Continuous counting ... with low latency, ... efficiently on high volume streams,
  • 18. Counting hierarchy of needs 18 Continuous counting ... with low latency, ... efficiently on high volume streams, ... fault tolerant (exactly once),
  • 19. Counting hierarchy of needs 19 Continuous counting ... with low latency, ... efficiently on high volume streams, ... fault tolerant (exactly once), ... accurate and repeatable,
  • 20. Counting hierarchy of needs 20 Continuous counting ... with low latency, ... efficiently on high volume streams, ... fault tolerant (exactly once), ... accurate and repeatable, ... queryable 1.1+
  • 21. Rest of this talk 21 Continuous counting ... with low latency, ... efficiently on high volume streams, ... fault tolerant (exactly once), ... accurate and repeatable, ... queryable
  • 23. Yahoo! Streaming Benchmark 23  Storm, Spark Streaming, and Flink benchmark by the Storm team at Yahoo!  Focus on measuring end-to-end latency at low throughputs  First benchmark that was modeled after a real application  Read more: • https://yahooeng.tumblr.com/post/135321837876/benchma rking-streaming-computation-engines-at
  • 24. Benchmark task: counting! 24  Count ad impressions grouped by campaign  Compute aggregates over last 10 seconds  Make aggregates available for queries (Redis)
  • 25. Results (lower is better) 25 Flink and Storm at sub-second latencies Spark has a latency- throughput tradeoff 170k events/sec
  • 27. Handling high-volume streams  Scalability: how many events/sec can a system scale to, with infinite resources?  Scalability comes at a cost (systems add overhead to be scalable)  Efficiency: how many events/sec can a system scale to, with limited resources? 27
  • 28. Extending the Yahoo! benchmark 28  Yahoo! benchmark is a great starting point to understand engine behavior.  However, benchmark stops at low write throughput and programs are not fault tolerant.  We extended the benchmark to (1) high volumes, and (2) to use Flink's built-in state. • http://data-artisans.com/extending-the-yahoo- streaming-benchmark/
  • 29. Results (higher is better) 29 Also: Flink jobs are correct under failures (exactly once), Storm jobs are not 500k events/sec 3mi events/sec 15mi events/sec
  • 30. Fault tolerance and repeatability 30
  • 31. Stateful streaming applications  Most interesting stream applications are stateful  How to ensure that state is correct after failures? 31
  • 32. Fault tolerance definitions  At least once • May over-count • In some cases even under-count with different definition  Exactly once • Counts are the same after failure  End-to-end exactly once • Counts appear the same in an external sink (database, file system) after failure 32
  • 33. Fault tolerance in Flink  Flink guarantees exactly once  End-to-end exactly once supported with specific sources and sinks • E.g., Kafka  Flink  HDFS  Internally, Flink periodically takes consistent snapshots of the state without ever stopping the computation 33
  • 34. Savepoints  Maintaining stateful applications in production is challenging  Flink savepoints: externally triggered, durable checkpoints  Easy code upgrades (Flink or app), maintenance, migration, and debugging, what-if simulations, A/B tests 34
  • 36. Notions of time 36 1977 1980 1983 1999 2002 2005 2015 Episode IV: A New Hope Episode V: The Empire Strikes Back Episode VI: Return of the Jedi Episode I: The Phantom Menace Episode II: Attach of the Clones Episode III: Revenge of the Sith Episode VII: The Force Awakens This is called event time This is called processing time
  • 37. Out of order streams 37 Event time windows Arrival time windows Instant event-at-a-time First burst of events Second burst of events
  • 38. Why event time  Most stream processors are limited to processing/arrival time, Flink can operate on event time as well  Benefits of event time • Accurate results for out of order data • Sessions and unaligned windows • Time travel (backstreaming) 38
  • 39. What's coming up in Flink 39
  • 40. Evolution of streaming in Flink  Flink 0.9 (Jun 2015): DataStream API in beta, exactly-once guarantees via checkpoiting  Flink 0.10 (Nov 2015): Event time support, windowing mechanism based on Dataflow/Beam model, graduated DataStream API, high availability, state interface, new/updated connectors (Kafka, Nifi, Elastic, ...), improved monitoring  Flink 1.0 (Mar 2015): DataStream API stability, out of core state, savepoints, CEP library, improved monitoring, Kafka 0.9 support 40
  • 41. Upcoming features  SQL: ongoing work in collaboration with Apache Calcite  Dynamic scaling: adapt resources to stream volume, historical stream processing  Queryable state: ability to query the state inside the stream processor  Mesos support  More sources and sinks (e.g., Kinesis, Cassandra) 41
  • 44. Summary  Stream processing gaining momentum, the right paradigm for continuous data applications  Even seemingly simple applications can be complex at scale and in production – choice of framework crucial 44  Flink: unique combination of capabilities, performance, and robustness
  • 45. Flink in the wild 45 30 billion events daily 2 billion events in 10 1Gb machines Picked Flink for "Saiki" data integration & distribution platform See talks by at
  • 46. Join the community! 46  Follow: @ApacheFlink, @dataArtisans  Read: flink.apache.org/blog, data-artisans.com/blog  Subscribe: (news | user | dev) @ flink.apache.org
  • 47. 2 meetups next week in Bay Area! 47 April 6, San JoseApril 5, San Francisco What's new in Flink 1.0 & recent performance benchmarks with Flink http://www.meetup.com/Bay-Area-Apache-Flink-Meetup/

Editor's Notes

  1. 3 systems (batch), or 5 systems (streaming), Need to add a new system for millisecond alerts What If I want to count every 5 minutes, not 1 hour? Just ignores out of order What if I wanna do sessions?
  2. 3 systems (batch), or 5 systems (streaming), Need to add a new system for millisecond alerts What If I want to count every 5 minutes, not 1 hour? Just ignores out of order What if I wanna do sessions?