SlideShare a Scribd company logo
1 of 37
Kostas Tzoumas
@kostas_tzoumas
Hadoop Summit San Jose
June 6, 2016
Streaming in the Wild with
Apache FlinkTM
2
Streaming technology is enabling the
obvious: continuous processing on data that
is continuously produced
Hint: you are already doing streaming
Why embrace streaming?
 Monitor your business and react in real time
 Implement robust continuous applications
 Adopt a decentralized architecture
 Consolidate analytics infrastructure
3
React in real time
4
Streaming versus real-time
 Streaming != Real-time
 E.g., streaming that is not real time:
continuous applications with large
windows
 E.g., real-time that is not streaming: very
fast data warehousing queries
 However: streaming applications can be
fast
5
Streaming
Real time
How real-time is Flink?
6
Yahoo! benchmark* data Artisans benchmarks**
* https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
** http://data-artisans.com/extending-the-yahoo-streaming-benchmark/ and http://data-artisans.com/high-throughput-
low-latency-and-exactly-once-stream-processing-with-apache-flink/
When and why does this matter?
 Immediate reaction to life
• E.g., generate alerts on
anomaly/pattern/special event
 Avoid unnecessary tradeoffs
• Even if application is not latency-critical
• With Flink you do not pay a price for latency!
7
Bouygues Telecom – LUX
8
One of the largest telcos in
France. System (among
others) used for real time
diagnostics and alarming.
Read more: http://data-
artisans.com/flink-at-
bouygues-html/
Robust continuous
applications
9
Continuous application
 A production data application that needs to
be live 24/7 feeding other systems (perhaps
customer-facing)
 Need to be efficient, consistent, correct, and
manageable
 Stream processing is a great way to
implement continuous applications robustly
10
Continuous apps with “batch”
11
file 1
file 2
Job 1
Job 2
time
file 3 Job 3
Scheduler
Serve&store
Continuous apps with “lambda”
12
file 1
file 2
Job 1
Job 2
Scheduler
Streaming job
Serve&
store
Problems with batch and λ
 Way too many moving parts (and code dup)
 Implicit treatment of time
 Out of order event handling
 Implicit batch boundaries
13
Continuous apps with streaming
14
Streaming job
Serve&
store
Extending the Yahoo! benchmark
 Work of Jamie Grier, inspired by a real continuous
application at Twitter
15
http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
What is the use case?
 Counting!
• Tweet impressions or ad views
 Most analytics is continuous counting and
aggregations grouped by dimensions
• E.g., anomaly detection
16
Requirements
 Performance: millions of events/sec, millions of
keys
 Correctness: counts correlated with timestamps
 Consistency: counts should be correct under
failures
 Manageability: ability to pause & restart,
reprocess, change code, etc
17
Before Flink
 Performance: 1000s of cores needed to sustain
workload
 Correctness: time handled in application code (or
not)
 Consistency: approximate results during the day,
exact results once a day (lambda)
 Manageability: acceptable
18
After Flink
 Performance: 10s of cores needed to sustain
workload
 Correctness: time handled by framework
 Consistency: correct results on demand
 Manageability: acceptable
19
Results (yet to be beaten!)
 Same program as Yahoo! benchmark
 30x over Storm, plus consistent results
20
Manageability
 Flink savepoints (Flink 1.0): consistent
snapshots of stateful applications
• Planned downtime for code upgrades,
maintenance, migration, debugging, etc
 Monitoring (Flink 1.1)
 Dynamic scaling (Flink 1.2+)
21
Decentralized architecture
22
Streaming and microservices
23
App App
App
local statelocal state
Archive
A decentralized architecture favors
a streaming-based data
infrastructure with local application
state
Zalando
24
Slides at http://www.slideshare.net/ZalandoTech/flink-in-zalandos-world-of-microservices-62376341
Zalando
25
Transitioning from monolithic
architecture to microservices
New BI stack
26
Flink @ Zalando (present & future)
 Business process monitoring
• Check if Zalando platform works
• Order & delivery velocities
• SLAs of related events
 Continuous ETL
• Transformation, combination, pre-aggregation
• Data cleansing and validation
 Complex Event Processing
 Sales monitoring
27
Consolidate analytics
28
Stream Processing as a Service
 How do we make stream processing more
accessible to the data analyst?
 More familiar interfaces
• Flink 1.1 includes the first version of SQL for
static data sets and data streams
 Easier deployment
29
King.com
30
King.com - RBEA
 RBEA – a platform
designed to make
stream processing
available inside
King.com
 Data scientists submit
scripts in Groovy
 Flink backend executes
these scripts
31
https://techblog.king.com/rbea-scalable-real-time-analytics-king/
Netflix
 Netflix plans to offer
Stream Processing as a
Service internally in the
company
 Currently testing Flink
and Apache Beam
32
http://www.slideshare.net/mdaxini/netflix-keystone-streaming-data-pipeline-scale-in-the-clouddbtb2016-62076009
Closing
33
Disclaimer
 A lot of this presentation is based on the work of
very talented engineers building data products
with Flink
 Special thanks to:
• Amine Abdessemed (Bouygues Telecom)
• Mihail Vieru, Javier Lopez (Zalando)
• Gyula Fora, Mattias Andersson (King.com)
• Monal Daxini (Netflix)
34
More Flink tales at Hadoop Summit
35
Xiaowei Jiang
Blink−Improved Runtime for Flink and its
Application in Alibaba Search
Wednesday, June 29, 2016, 2:10PM - 2:50PM
210C
Stephan Ewen
Turning the Stream Processor into a Database:
Building Online Applications on Streams
Thursday, June 30, 2016, 12:20PM - 1:00PM
212
Flink Forward 2016, Berlin
Submission deadline: June 30, 2016 (watch website)
Early bird deadline: July 15, 2016
www.flink-forward.org
We are hiring!
data-artisans.com/careers

More Related Content

What's hot

Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Robert Metzger
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream ProcessingGyula Fóra
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingKostas Tzoumas
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorAljoscha Krettek
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Stephan Ewen
 
Stream Processing with Apache Flink
Stream Processing with Apache FlinkStream Processing with Apache Flink
Stream Processing with Apache FlinkC4Media
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkDataWorks Summit
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkRobert Metzger
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsStephan Ewen
 
Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingKostas Tzoumas
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Flink Forward
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Fabian Hueske
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseKostas Tzoumas
 
Extending the Yahoo Streaming Benchmark
Extending the Yahoo Streaming BenchmarkExtending the Yahoo Streaming Benchmark
Extending the Yahoo Streaming BenchmarkJamie Grier
 
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkTran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkFlink Forward
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)Robert Metzger
 
The Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data ProcessingThe Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data ProcessingAljoscha Krettek
 
Don't Cross The Streams - Data Streaming And Apache Flink
Don't Cross The Streams  - Data Streaming And Apache FlinkDon't Cross The Streams  - Data Streaming And Apache Flink
Don't Cross The Streams - Data Streaming And Apache FlinkJohn Gorman (BSc, CISSP)
 

What's hot (20)

Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream Processing
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream Processor
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
 
Stream Processing with Apache Flink
Stream Processing with Apache FlinkStream Processing with Apache Flink
Stream Processing with Apache Flink
 
Big Data Warsaw
Big Data WarsawBig Data Warsaw
Big Data Warsaw
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache Flink
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and Friends
 
Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream Processing
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
 
Extending the Yahoo Streaming Benchmark
Extending the Yahoo Streaming BenchmarkExtending the Yahoo Streaming Benchmark
Extending the Yahoo Streaming Benchmark
 
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkTran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
 
A look at Flink 1.2
A look at Flink 1.2A look at Flink 1.2
A look at Flink 1.2
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
 
The Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data ProcessingThe Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data Processing
 
Don't Cross The Streams - Data Streaming And Apache Flink
Don't Cross The Streams  - Data Streaming And Apache FlinkDon't Cross The Streams  - Data Streaming And Apache Flink
Don't Cross The Streams - Data Streaming And Apache Flink
 

Similar to Streaming in the Wild with Apache Flink

The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futuremarkgrover
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesKarthik Murugesan
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyYaroslav Tkachenko
 
Parallel Processing in TM1 - QueBIT Consulting
Parallel Processing in TM1 - QueBIT ConsultingParallel Processing in TM1 - QueBIT Consulting
Parallel Processing in TM1 - QueBIT ConsultingQueBIT Consulting
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyftmarkgrover
 
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextKostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextVerverica
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Impetus Technologies
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in StreamsJamie Grier
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Flink Forward
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Partner Connect APAC - 2022 - April
Partner Connect APAC - 2022 - AprilPartner Connect APAC - 2022 - April
Partner Connect APAC - 2022 - Aprilconfluent
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineTrieu Nguyen
 
HL7 Survival Guide - Chapter 10 – Process and Workflow
HL7 Survival Guide - Chapter 10 – Process and WorkflowHL7 Survival Guide - Chapter 10 – Process and Workflow
HL7 Survival Guide - Chapter 10 – Process and WorkflowCaristix
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...confluent
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Ververica
 
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiTowards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiBowen Li
 

Similar to Streaming in the Wild with Apache Flink (20)

Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at Shopify
 
Parallel Processing in TM1 - QueBIT Consulting
Parallel Processing in TM1 - QueBIT ConsultingParallel Processing in TM1 - QueBIT Consulting
Parallel Processing in TM1 - QueBIT Consulting
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
 
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextKostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in Streams
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Partner Connect APAC - 2022 - April
Partner Connect APAC - 2022 - AprilPartner Connect APAC - 2022 - April
Partner Connect APAC - 2022 - April
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data Pipeline
 
HL7 Survival Guide - Chapter 10 – Process and Workflow
HL7 Survival Guide - Chapter 10 – Process and WorkflowHL7 Survival Guide - Chapter 10 – Process and Workflow
HL7 Survival Guide - Chapter 10 – Process and Workflow
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®
 
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiTowards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
 

Recently uploaded

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Recently uploaded (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Streaming in the Wild with Apache Flink

  • 1. Kostas Tzoumas @kostas_tzoumas Hadoop Summit San Jose June 6, 2016 Streaming in the Wild with Apache FlinkTM
  • 2. 2 Streaming technology is enabling the obvious: continuous processing on data that is continuously produced Hint: you are already doing streaming
  • 3. Why embrace streaming?  Monitor your business and react in real time  Implement robust continuous applications  Adopt a decentralized architecture  Consolidate analytics infrastructure 3
  • 4. React in real time 4
  • 5. Streaming versus real-time  Streaming != Real-time  E.g., streaming that is not real time: continuous applications with large windows  E.g., real-time that is not streaming: very fast data warehousing queries  However: streaming applications can be fast 5 Streaming Real time
  • 6. How real-time is Flink? 6 Yahoo! benchmark* data Artisans benchmarks** * https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at ** http://data-artisans.com/extending-the-yahoo-streaming-benchmark/ and http://data-artisans.com/high-throughput- low-latency-and-exactly-once-stream-processing-with-apache-flink/
  • 7. When and why does this matter?  Immediate reaction to life • E.g., generate alerts on anomaly/pattern/special event  Avoid unnecessary tradeoffs • Even if application is not latency-critical • With Flink you do not pay a price for latency! 7
  • 8. Bouygues Telecom – LUX 8 One of the largest telcos in France. System (among others) used for real time diagnostics and alarming. Read more: http://data- artisans.com/flink-at- bouygues-html/
  • 10. Continuous application  A production data application that needs to be live 24/7 feeding other systems (perhaps customer-facing)  Need to be efficient, consistent, correct, and manageable  Stream processing is a great way to implement continuous applications robustly 10
  • 11. Continuous apps with “batch” 11 file 1 file 2 Job 1 Job 2 time file 3 Job 3 Scheduler Serve&store
  • 12. Continuous apps with “lambda” 12 file 1 file 2 Job 1 Job 2 Scheduler Streaming job Serve& store
  • 13. Problems with batch and λ  Way too many moving parts (and code dup)  Implicit treatment of time  Out of order event handling  Implicit batch boundaries 13
  • 14. Continuous apps with streaming 14 Streaming job Serve& store
  • 15. Extending the Yahoo! benchmark  Work of Jamie Grier, inspired by a real continuous application at Twitter 15 http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
  • 16. What is the use case?  Counting! • Tweet impressions or ad views  Most analytics is continuous counting and aggregations grouped by dimensions • E.g., anomaly detection 16
  • 17. Requirements  Performance: millions of events/sec, millions of keys  Correctness: counts correlated with timestamps  Consistency: counts should be correct under failures  Manageability: ability to pause & restart, reprocess, change code, etc 17
  • 18. Before Flink  Performance: 1000s of cores needed to sustain workload  Correctness: time handled in application code (or not)  Consistency: approximate results during the day, exact results once a day (lambda)  Manageability: acceptable 18
  • 19. After Flink  Performance: 10s of cores needed to sustain workload  Correctness: time handled by framework  Consistency: correct results on demand  Manageability: acceptable 19
  • 20. Results (yet to be beaten!)  Same program as Yahoo! benchmark  30x over Storm, plus consistent results 20
  • 21. Manageability  Flink savepoints (Flink 1.0): consistent snapshots of stateful applications • Planned downtime for code upgrades, maintenance, migration, debugging, etc  Monitoring (Flink 1.1)  Dynamic scaling (Flink 1.2+) 21
  • 23. Streaming and microservices 23 App App App local statelocal state Archive A decentralized architecture favors a streaming-based data infrastructure with local application state
  • 27. Flink @ Zalando (present & future)  Business process monitoring • Check if Zalando platform works • Order & delivery velocities • SLAs of related events  Continuous ETL • Transformation, combination, pre-aggregation • Data cleansing and validation  Complex Event Processing  Sales monitoring 27
  • 29. Stream Processing as a Service  How do we make stream processing more accessible to the data analyst?  More familiar interfaces • Flink 1.1 includes the first version of SQL for static data sets and data streams  Easier deployment 29
  • 31. King.com - RBEA  RBEA – a platform designed to make stream processing available inside King.com  Data scientists submit scripts in Groovy  Flink backend executes these scripts 31 https://techblog.king.com/rbea-scalable-real-time-analytics-king/
  • 32. Netflix  Netflix plans to offer Stream Processing as a Service internally in the company  Currently testing Flink and Apache Beam 32 http://www.slideshare.net/mdaxini/netflix-keystone-streaming-data-pipeline-scale-in-the-clouddbtb2016-62076009
  • 34. Disclaimer  A lot of this presentation is based on the work of very talented engineers building data products with Flink  Special thanks to: • Amine Abdessemed (Bouygues Telecom) • Mihail Vieru, Javier Lopez (Zalando) • Gyula Fora, Mattias Andersson (King.com) • Monal Daxini (Netflix) 34
  • 35. More Flink tales at Hadoop Summit 35 Xiaowei Jiang Blink−Improved Runtime for Flink and its Application in Alibaba Search Wednesday, June 29, 2016, 2:10PM - 2:50PM 210C Stephan Ewen Turning the Stream Processor into a Database: Building Online Applications on Streams Thursday, June 30, 2016, 12:20PM - 1:00PM 212
  • 36. Flink Forward 2016, Berlin Submission deadline: June 30, 2016 (watch website) Early bird deadline: July 15, 2016 www.flink-forward.org

Editor's Notes

  1. 3 systems (batch), or 5 systems (streaming), Need to add a new system for millisecond alerts What If I want to count every 5 minutes, not 1 hour? Just ignores out of order What if I wanna do sessions?