Aljoscha Krettek
@aljoscha
Big Data Spain
November 17, 2016

Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics


Thursday 17th, from 18:00 to 18:40, Theatre 19 - Keynote

In this talk I’ll give a very short introduction to stream processing in general and then dive into event-time based stream processing. I will outline why this is important for IoT applications and also why it is such a challenging topic. Afterwards we’ll look at some real-world IoT use cases that are enabled by the robust event-time based stream processing that Apache Flink™ provides. We will focus especially on ease of use and on correctness of results in the face of errors.

In the first half of the talk we’ll cover the basics of stream processing. We will look at the differences between event-time based and processing-time based processing, and at stateful stream processing. Along the way, we’ll also highlight how the combination of these features is essential for robust stream processing in an IoT setting.
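To make the difference between the two notions of time concrete, here is a minimal sketch in plain Python (it is not Flink code). The event list, the window length and the helper `tumbling_counts` are all made up for illustration. Each event carries the time it happened and the time it reached the processor, and the same events are counted once per tumbling window under each notion of time.

```python
from collections import Counter

WINDOW = 5  # window length in abstract time units (assumed for this sketch)

# (event_time, arrival_time) pairs; the values are invented for illustration.
events = [(1, 1), (3, 2), (2, 4), (7, 6), (4, 8), (9, 9), (6, 11)]


def tumbling_counts(timestamps, size=WINDOW):
    """Count timestamps per tumbling window [k*size, (k+1)*size)."""
    counts = Counter((t // size) * size for t in timestamps)
    return dict(sorted(counts.items()))


# Group by when the event happened versus when it was seen by the processor.
by_event_time = tumbling_counts(e for e, _ in events)
by_processing_time = tumbling_counts(a for _, a in events)

print("event-time windows:     ", by_event_time)
print("processing-time windows:", by_processing_time)
```

With this input, the event-time grouping puts four events in the first window, while the processing-time grouping spreads the same events over three windows, so the two results disagree even though the data is identical.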

In the second part, we will look at how Flink solves some of the challenges that arise in event-time based processing and how that enables novel applications in the IoT space. We will do the latter by looking at a collection of real-world IoT use cases.
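One common way to handle out-of-order events is to track a watermark, a lower bound on the event times still expected, and to emit a window only once the watermark has passed its end. Flink uses watermarks for this, but the sketch below is plain Python and does not use Flink's API. The function `event_time_windows`, the window length, the tolerated delay and the input values are assumptions chosen for illustration.

```python
from collections import defaultdict

WINDOW = 5     # tumbling window length (assumed)
MAX_DELAY = 3  # how far out of order events may arrive (assumed)


def event_time_windows(timestamps, size=WINDOW, max_delay=MAX_DELAY):
    """Group event timestamps, given in arrival order, into event-time windows.

    Returns (fired, late): fired is a list of (window_start, sorted events)
    in firing order; late holds events whose window had already fired.
    """
    open_windows = defaultdict(list)
    fired, late = [], []
    watermark = float("-inf")
    for ts in timestamps:
        start = (ts // size) * size
        if start + size <= watermark:
            late.append(ts)  # the window for this event is already closed
        else:
            open_windows[start].append(ts)
        # The watermark trails the highest timestamp seen by max_delay.
        watermark = max(watermark, ts - max_delay)
        for s in sorted(w for w in open_windows if w + size <= watermark):
            fired.append((s, sorted(open_windows.pop(s))))
    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        fired.append((s, sorted(open_windows.pop(s))))
    return fired, late


fired, late = event_time_windows([1, 3, 2, 7, 4, 9, 6, 11, 12, 2])
print(fired)
print(late)
```

In this run the events 2, 4 and 6 arrive out of order but still land in the correct windows, because the watermark had not yet passed the end of their window. The final 2 arrives after its window has fired and is reported as late. A larger `max_delay` tolerates more disorder at the cost of higher latency.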

Some of the topics covered will be:
- Apache Flink
- Stateful Stream Processing
- Event Time vs. Processing Time Windowing
- Processing of out-of-order events
- IoT use cases
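As a rough illustration of the stateful stream processing topic, the following plain-Python sketch keeps a per-key count together with its position in the input, takes a snapshot, and restores it after a simulated failure. The class `KeyedCounter`, the sensor names and the replay logic are hypothetical and simplified. This is not how Flink implements checkpoints, only the general idea of recovering state and replaying input.

```python
import copy


class KeyedCounter:
    """Counts events per key and remembers how much of the input was consumed."""

    def __init__(self):
        self.counts = {}
        self.offset = 0  # position in the input log

    def process(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        self.offset += 1

    def snapshot(self):
        # State and input position are saved together so they stay consistent.
        return copy.deepcopy((self.counts, self.offset))

    def restore(self, snap):
        self.counts, self.offset = copy.deepcopy(snap)


log = ["sensor-a", "sensor-b", "sensor-a", "sensor-a"]

op = KeyedCounter()
for key in log[:2]:
    op.process(key)
checkpoint = op.snapshot()

op.process(log[2])      # work done after the checkpoint ...
op.restore(checkpoint)  # ... is discarded by a simulated failure

for key in log[op.offset:]:  # replay from the restored position
    op.process(key)

print(op.counts)
```

Because the counts and the offset are restored together, replaying from the restored offset counts every event exactly once. Restoring only the offset, or only the counts, would double-count or lose the third event.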

Published in: Data & Analytics

Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics

  1. Aljoscha Krettek (@aljoscha), Big Data Spain, November 17, 2016. Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics
  2. What I’d Like to Talk About
     • Streaming architecture and Flink
     • IoT and event-time stream processing
     • Use-case examples
  3. Original creators of Apache Flink®. Providers of the dA Platform, a supported Flink distribution
  4. Intro: The Streaming Architecture
  5. Big Data Architecture
     • Collect events in HDFS (or similar)
     • Periodically run (batch) jobs to process
     • Problems: huge latency; natural boundaries in data don’t match batch boundaries
  6. Rethinking Data Architecture
     • Real-time reaction to events
     • Continuous applications
     • Process both real-time and historical data
  7. What is (Distributed) Streaming
     • Streaming: computations on never-ending “streams” of data records (“events”)
     • Distributed: computation spread across many machines
     (Diagram: four parallel instances of “Your code”.)
  8. What is Stateful Streaming
     • Result depends on history of stream
     • A stateful stream processor should give you the tools to manage state: recover, roll back, version, upgrade, etc.
     (Diagram: “Your code” with state.)
  9. What is Event-Time Streaming
     • Events have timestamps
     • Processing depends on timestamps
     • An event-time stream processor should give you the tools to reason about time: handle streams that are out of order
     (Diagram: “Your code” with state; out-of-order events t1 to t4 grouped into windows t1-t2 and t3-t4.)
  10. (Diagram labels: event log, three instances of app state, query service.)
  11. Recap: What is Streaming?
     • Continuous processing of data that is continuously generated
     • I.e., pretty much all “big” data
     • It’s all about state and time
     • Flink does all of that
  12. IoT and Event-time Stream Processing
  13. “The 'Internet Of Everything' Will Generate $14.4 Trillion Of Value Over The Next Decade.” Source: read.bi/1yDOQQ3
  14. Example Event Sources
  15. A Simple Definition. IoT use cases from the system’s perspective: a large number of (distributed) things continuously generating a large amount of data.
  16. IoT: Some Insights
     • Data is continuously produced → Stream Processing
     • Events have a timestamp → Event-time based processing
     • Data/Events can arrive with huge delays/out-of-order
     • Most analyses happen on time windows
  17. What Is Event-Time Processing
     • Processing time: 1977, 1980, 1983, 1999, 2002, 2005, 2015
     • Event time, in the same order: Episode IV, Episode V, Episode VI, Episode I, Episode II, Episode III, Episode VII
  18. What Is Event-Time Processing
     (Figure: a message queue of events with out-of-order event timestamps, shown against processing time 1 to 14.)
  19. What’s The Problem?
     (Figure: the same events grouped into processing-time windows and into event-time windows.)
     • Mismatch between event time and processing time.
  20. Sources of Time Mismatch
     • Big Mismatch: network disconnects; slow network
     • Small Mismatch: the nature of distributed systems; differing system clock time
  21. Small Event-Time Mismatch. See “Robust Stream Processing with Apache Flink®: A Simple Walkthrough”: http://data-artisans.com/robust-stream-processing-flink-walkthrough/
  22. (No text on slide.)
  23. (No text on slide.)
  24. (No text on slide.)
  25. Recap: Event-Time
     • IoT use cases need event-time processing
     • Even small mismatch of event time/processing time will lead to wrong results
  26. Use-Case Examples
  27. • 30 Flink applications in production for more than one year. 10 billion events (2TB) processed daily
     • Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees
  28. King
     • Challenges: many games (Candy Crush, Farm Heroes, Pet Rescue, and Bubble Witch…); 300 million monthly unique users; 30 billion events received every day
     • Need event-time based statistics
     Source: https://techblog.king.com/rbea-scalable-real-time-analytics-king/
  29. Solution: RBEA. Source: https://techblog.king.com/rbea-scalable-real-time-analytics-king/
  30. Solution: RBEA
     • Multiplexing of multiple data scientist requests into a single Flink job
     • Groovy as language for analysis scripts
     • Event-time windowing
     Source: https://techblog.king.com/rbea-scalable-real-time-analytics-king/
  31. Bouygues Telecom
     • 2015: ~120 users*, 5 Flink production apps, 750 TB storage, 4 billion events/day
     • 2016: ~300 users*, 30 Flink production apps, 2 PB storage, 10 billion events/day
     * Users of the information system
     Source: http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/
  32. Bouygues: Challenges
     • Low latency & streaming fashion counters
     • Massive amounts of data + bursty loads
     • Reliability
     • Multiple flow correlation
     • Time management: out of order & late events → our worst enemies
     Source: http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/
  33. Source: http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/
  34. In Summary
     • If you need to ask: you already have a streaming use case!
     • IoT requires Proper Time Management
     • Apache Flink has done that for a long time now (since version 0.10)
  35. Thank you! @aljoscha @ApacheFlink @dataArtisans
  36. One day of hands-on Flink training. One day of conference. Tickets are on sale. Call for Papers is already open. Please visit our website: http://sf.flink-forward.org. Follow us on Twitter: @FlinkForward
  37. We are hiring! data-artisans.com/careers
