Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics

Keynote, Thursday 17th, from 18:00 to 18:40, Theatre 19

In this talk I’ll give a very short introduction to stream processing in general and then dive into event-time based stream processing. I will outline why this is important for IoT applications and why it is such a challenging topic. Afterwards we’ll look at some real-world IoT use cases that are enabled by the robust event-time based stream processing support provided by Apache Flink™. We will especially focus on ease of use and on correctness of results in the face of errors.

In the first half of the talk we’ll cover the basics of stream processing. We will look at the differences between event-time and processing-time based processing, and at stateful stream processing. Along the way, we’ll also highlight why the combination of these features is essential for robust stream processing in an IoT setting.

In the second half, we will look at how Flink solves some of the challenges that arise in event-time based processing, and at how this enables novel applications in the IoT space, using a collection of real-world IoT use cases.

Some of the topics covered will be:
- Apache Flink
- Stateful Stream Processing
- Event Time vs. Processing Time Windowing
- Processing of out-of-order events
- IoT use cases

Speaker note: examples of state include counters, windows of past events, state machines, and trained ML models.

    1. Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics. Aljoscha Krettek (@aljoscha), Big Data Spain, November 17, 2016
    2. What I’d Like to Talk About
        • Streaming architecture and Flink
        • IoT and event-time stream processing
        • Use-case examples
    3. Original creators of Apache Flink®; providers of the dA Platform, a supported Flink distribution
    4. Intro: The Streaming Architecture
    5. Big Data Architecture
        • Collect events in HDFS (or similar)
        • Periodically run (batch) jobs to process them
        • Problems: huge latency; natural boundaries in the data don’t match batch boundaries
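Slide 5 notes that natural boundaries in the data don't match batch boundaries. A small sketch can make this concrete. It assumes hypothetical readings timestamped in minutes since midnight, hourly batch jobs, and a 15-minute inactivity gap as the "natural" boundary. `batch_index` and `sessionize` are illustrative names, not Flink API.

```python
# Sketch: a natural "session" of activity straddles a fixed batch boundary.
# Timestamps are hypothetical minutes since midnight; the hourly batch size
# and the 15-minute inactivity gap are illustrative assumptions.

def batch_index(ts: int, batch_minutes: int = 60) -> int:
    """Which periodic (e.g. hourly) batch job would see this event."""
    return ts // batch_minutes

def sessionize(timestamps: list[int], gap: int = 15) -> list[list[int]]:
    """Group timestamps into sessions separated by inactivity longer than `gap`."""
    sessions: list[list[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])    # gap exceeded: start a new session
    return sessions

readings = [50, 55, 62, 68, 130]
for session in sessionize(readings):
    batches = sorted({batch_index(ts) for ts in session})
    print(f"session {session} spans batches {batches}")
```

Here the first burst of activity (minutes 50 to 68) forms one session. An hourly batch job would see it split across batches 0 and 1.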
    6. Rethinking Data Architecture
        • Real-time reaction to events
        • Continuous applications
        • Process both real-time and historical data
    7. What is (Distributed) Streaming
        • Streaming: computations on never-ending “streams” of data records (“events”)
        • Distributed: computation spread across many machines
        (Diagram: several parallel instances of “Your code”)
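Here is a minimal sketch of the "distributed" part of slide 7. It assumes that events carry a key and that the parallel instance for an event is chosen by hashing that key. This is a simplification for illustration, not Flink's actual partitioning scheme. `partition_for` and `distribute` are hypothetical names.

```python
import zlib
from collections import defaultdict

def partition_for(key: str, parallelism: int) -> int:
    """Route a key to one of `parallelism` parallel instances of "your code".
    CRC32 keeps the mapping stable across processes (Python's built-in
    hash() for str is randomized per process)."""
    return zlib.crc32(key.encode("utf-8")) % parallelism

def distribute(events, parallelism: int) -> dict:
    """Split a stream of (key, value) events across parallel instances by key."""
    instances = defaultdict(list)
    for key, value in events:
        instances[partition_for(key, parallelism)].append((key, value))
    return dict(instances)

events = [("sensor-a", 1), ("sensor-b", 2), ("sensor-a", 3), ("sensor-c", 4)]
for instance, assigned in sorted(distribute(events, parallelism=4).items()):
    print(instance, assigned)
```

Routing by key sends all events for one key to the same instance. That is what allows per-key state to stay local to a single machine.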
    8. What is Stateful Streaming
        • Result depends on the history of the stream
        • A stateful stream processor should give you the tools to manage state: recover, roll back, version, upgrade, etc.
        (Diagram: “Your code” with attached state)
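To illustrate slide 8, here is a toy stateful operator whose output depends on the history of the stream. Its snapshot and restore methods stand in for the "recover, roll back" tools the slide mentions. The class and method names are hypothetical. Flink's real checkpointing is distributed and considerably more involved than this single-process sketch.

```python
import copy

class KeyedCounter:
    """Toy stateful operator: counts events per key. Each result depends on
    the history of the stream, which is kept in `self.state`."""

    def __init__(self):
        self.state: dict[str, int] = {}

    def process(self, key: str) -> int:
        self.state[key] = self.state.get(key, 0) + 1
        return self.state[key]

    def snapshot(self) -> dict[str, int]:
        """Take an independent copy of the state (in the spirit of a checkpoint)."""
        return copy.deepcopy(self.state)

    def restore(self, snapshot: dict[str, int]) -> None:
        """Roll the state back to a snapshot, e.g. after a failure."""
        self.state = copy.deepcopy(snapshot)

op = KeyedCounter()
for key in ["door-1", "door-2", "door-1"]:
    op.process(key)
checkpoint = op.snapshot()   # {"door-1": 2, "door-2": 1}
op.process("door-1")         # processed after the checkpoint ...
op.restore(checkpoint)       # ... then a failure forces a rollback
print(op.state)
```

In a setup with replayable sources, recovery would also rewind the source to the checkpointed position. The events after the snapshot would then be processed again rather than lost.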
    9. What is Event-Time Streaming
        • Events have timestamps
        • Processing depends on timestamps
        • An event-time stream processor should give you the tools to reason about time, e.g., to handle streams that are out of order
        (Diagram: events t3, t1, t2, t4 arriving out of order and grouped into windows t1-t2 and t3-t4)
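Slide 9's diagram shows events t3, t1, t2, t4 arriving out of order but still landing in windows t1-t2 and t3-t4. The following sketch corresponds roughly to that picture. It assumes integer timestamps and 10-unit tumbling windows (both arbitrary choices). It deliberately ignores the question of when a window is complete.

```python
from collections import defaultdict

def window_start(ts: int, size: int) -> int:
    """Start of the tumbling event-time window of length `size` containing ts."""
    return ts - (ts % size)

def event_time_counts(events, size: int) -> dict[int, int]:
    """Count events per tumbling window using each event's own timestamp."""
    counts = defaultdict(int)
    for ts in events:
        counts[window_start(ts, size)] += 1
    return dict(sorted(counts.items()))

in_order = [1, 2, 3, 4, 11, 12, 13]
out_of_order = [3, 1, 12, 2, 11, 4, 13]  # same events, arriving shuffled
print(event_time_counts(in_order, size=10))
print(event_time_counts(out_of_order, size=10))
```

Both calls yield four events in the window starting at 0 and three in the window starting at 10. Because assignment uses the event's own timestamp, arrival order does not change the result.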
    10. (Diagram: an event log, several applications each with its own app state, and a query service)
    11. Recap: What is Streaming?
        • Continuous processing of data that is continuously generated, i.e., pretty much all “big” data
        • It’s all about state and time
        • Flink does all of that
    12. IoT and Event-time Stream Processing
    13. The 'Internet Of Everything' Will Generate $14.4 Trillion Of Value Over The Next Decade (source: read.bi/1yDOQQ3)
    14. Example Event Sources
    15. A Simple Definition: IoT use cases from the system’s perspective are a large number of (distributed) things continuously generating a large amount of data.
    16. IoT: Some Insights
        • Data is continuously produced → stream processing
        • Events have a timestamp → event-time based processing
        • Data/events can arrive with huge delays or out of order
        • Most analyses happen on time windows
    17. What Is Event-Time Processing
        (Diagram: the Star Wars films by release year as processing time, 1977 Episode IV, 1980 Episode V, 1983 Episode VI, 1999 Episode I, 2002 Episode II, 2005 Episode III, 2015 Episode VII, versus their episode order as event time)
    18. What Is Event-Time Processing
        (Diagram: events carrying event timestamps pass through a message queue and arrive out of order in processing time)
    19. What’s The Problem?
        (Diagram: the same out-of-order events grouped into processing-time windows versus event-time windows)
        • Mismatch between event time and processing time.
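The mismatch on slide 19 can be reproduced with a few hypothetical events. Each event carries an event time and an arrival (processing) time. One reading taken at t=9 is assumed to be delayed by the network until t=13, so grouping by arrival puts it in the wrong window. The window size and all data are illustrative assumptions.

```python
from collections import defaultdict

# Each event: (event_time, arrival_time) in seconds. Hypothetical data: the
# reading taken at t=9 is delayed and only arrives at t=13.
EVENTS = [(1, 2), (4, 5), (9, 13), (11, 12), (15, 16)]
WINDOW = 10  # tumbling window size in seconds (illustrative)

def count_per_window(events, time_of) -> dict[int, int]:
    """Count events per tumbling window, using `time_of` to pick the timestamp."""
    counts = defaultdict(int)
    for event in events:
        t = time_of(event)
        counts[t - t % WINDOW] += 1
    return dict(sorted(counts.items()))

processing_time = count_per_window(EVENTS, lambda e: e[1])  # by arrival
event_time = count_per_window(EVENTS, lambda e: e[0])       # by occurrence
print("processing-time windows:", processing_time)
print("event-time windows:", event_time)
```

Processing-time windowing reports 2 events for [0, 10) and 3 for [10, 20). Event-time windowing reports the correct 3 and 2. A single delayed event is enough to shift the result.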
    20. Sources of Time Mismatch
        • Big mismatch: network disconnects, slow network
        • Small mismatch: the nature of distributed systems, differing system clock times
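Small mismatches like those on slide 20 are commonly handled with watermarks, the mechanism Flink uses to track event-time progress. A watermark asserts that no more events at or below a given timestamp are expected. The sketch below assumes the out-of-orderness is bounded by a fixed `max_delay`. It emits a tumbling window once the watermark reaches the window's end. Events that exceed the bound (a big mismatch) end up in a `late` list. The class name and the late-event policy are hypothetical, not Flink API.

```python
from collections import defaultdict

class WatermarkedWindows:
    """Hypothetical sketch: tumbling event-time windows driven by a watermark
    that assumes events are at most `max_delay` time units out of order."""

    def __init__(self, size: int, max_delay: int):
        self.size = size
        self.max_delay = max_delay
        self.max_seen = None             # highest event timestamp seen so far
        self.pending = defaultdict(int)  # window start -> event count
        self.late = []                   # events arriving after their window was complete

    def watermark(self):
        """Timestamp below which no more events are expected (by assumption)."""
        if self.max_seen is None:
            return None
        return self.max_seen - self.max_delay

    def on_event(self, ts: int) -> list:
        start = ts - ts % self.size
        wm = self.watermark()
        if wm is not None and start + self.size <= wm:
            self.late.append(ts)  # its window was already considered complete
            return []
        self.pending[start] += 1
        self.max_seen = ts if self.max_seen is None else max(self.max_seen, ts)
        return self._emit_complete()

    def _emit_complete(self) -> list:
        """Emit (window_start, count) for every window whose end the watermark has reached."""
        wm = self.watermark()
        ready = sorted(s for s in self.pending if s + self.size <= wm)
        return [(s, self.pending.pop(s)) for s in ready]

op = WatermarkedWindows(size=10, max_delay=3)
emitted = []
for ts in [1, 4, 12, 9, 14, 3, 25]:  # 9 and 3 arrive out of order
    emitted.extend(op.on_event(ts))
print(emitted)   # [(0, 3), (10, 2)]
print(op.late)   # [3]
```

The event at 9 is only 3 units behind the maximum seen so far (12), so it is still counted. The event at 3 arrives after the watermark has reached 11. By then window [0, 10) has been emitted, so the event is treated as late. Collecting late events is only one possible policy.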
    21. Small Event-Time Mismatch. See “Robust Stream Processing with Apache Flink®: A Simple Walkthrough”, http://data-artisans.com/robust-stream-processing-flink-walkthrough/
    22. (image only)
    23. (image only)
    24. (image only)
    25. Recap: Event-Time
        • IoT use cases need event-time processing
        • Even a small mismatch between event time and processing time will lead to wrong results
    26. Use-Case Examples
    27. 30 Flink applications in production for more than one year; 10 billion events (2TB) processed daily. Complex jobs of more than 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees
    28. King
        • Challenges: many games (Candy Crush, Farm Heroes, Pet Rescue, and Bubble Witch…); 300 million monthly unique users; 30 billion events received every day
        • Need event-time based statistics
        Source: https://techblog.king.com/rbea-scalable-real-time-analytics-king/
    29. Solution: RBEA (source: https://techblog.king.com/rbea-scalable-real-time-analytics-king/)
    30. Solution: RBEA
        • Multiplexing of multiple data scientist requests into a single Flink job
        • Groovy as language for analysis scripts
        • Event-time windowing
        Source: https://techblog.king.com/rbea-scalable-real-time-analytics-king/
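The multiplexing idea on slide 30 can be sketched as a single job that passes every event once to many independently registered analyses, each with its own isolated state. This is a hypothetical illustration of the pattern only, not RBEA's implementation (which, per the slide, runs Groovy analysis scripts inside a Flink job). `MultiplexedJob` and the event fields are assumptions.

```python
from collections import defaultdict
from typing import Callable

Event = dict  # hypothetical event shape, e.g. {"user": "u1", "game": "candy"}

class MultiplexedJob:
    """One job, one pass over the stream, many independently registered analyses."""

    def __init__(self):
        self.analyses: dict[str, Callable[[Event, dict], None]] = {}
        self.state: dict[str, dict] = defaultdict(dict)  # isolated state per analysis

    def register(self, name: str, fn: Callable[[Event, dict], None]) -> None:
        self.analyses[name] = fn

    def process(self, event: Event) -> None:
        for name, fn in self.analyses.items():
            fn(event, self.state[name])  # every analysis sees every event once

def events_per_game(event: Event, state: dict) -> None:
    state[event["game"]] = state.get(event["game"], 0) + 1

def distinct_users(event: Event, state: dict) -> None:
    state.setdefault("users", set()).add(event["user"])

job = MultiplexedJob()
job.register("per_game", events_per_game)
job.register("users", distinct_users)
for e in [{"user": "u1", "game": "candy"}, {"user": "u2", "game": "farm"},
          {"user": "u1", "game": "candy"}]:
    job.process(e)
print(job.state["per_game"])             # {'candy': 2, 'farm': 1}
print(len(job.state["users"]["users"]))  # 2
```

In this sketch, adding an analysis is just another `register` call on the same job object. The stream is read once, however many analyses are attached.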
    31. Bouygues Telecom
        • 2015: ~120 users*, 5 Flink production apps, 750 TB storage, 4 billion events/day
        • 2016: ~300 users*, 30 Flink production apps, 2 PB storage, 10 billion events/day
        * Users of the information system
        Source: http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/
    32. Bouygues: Challenges
        • Low latency & streaming-fashion counters
        • Massive amounts of data + bursty loads
        • Reliability
        • Multiple flow correlation
        • Time management: out-of-order & late events → “our worst enemies”
        Source: http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/
    33. (image only; source: http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/)
    34. In Summary
        • If you need to ask: you already have a streaming use case!
        • IoT requires proper time management
        • Apache Flink has done that for a long time now (since version 0.10)
    35. Thank you! @aljoscha @ApacheFlink @dataArtisans
    36. One day of hands-on Flink training, one day of conference. Tickets are on sale; the Call for Papers is already open. Please visit our website: http://sf.flink-forward.org. Follow us on Twitter: @FlinkForward
    37. We are hiring! data-artisans.com/careers
