
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek


https://www.bigdataspain.org/2016/program/thu-from-data-numbers-knowledge-semantic-embeddings.html

https://www.youtube.com/watch?v=1sZFrHUgUw8&list=PL6O3g23-p8Tr5eqnIIPdBD_8eE5JBDBik&t=7s&index=36

Published in: Technology


  1. Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics
       Aljoscha Krettek (@aljoscha), Big Data Spain, November 17, 2016
  2. What I’d Like to Talk About
       § Streaming Architecture and Flink
       § IoT and Event-Time based stream processing
       § Use-Case Examples
  3. Original creators of Apache Flink®; providers of the dA Platform, a supported Flink distribution
  4. Intro: The Streaming Architecture
  5. Rethinking Data Architecture
       § Better app isolation
       § Real-time reaction to events
       § Robust continuous applications
       § Process both real-time and historical data
  6. [Diagram labels: app with state (three times), event log, query service]
  7. What is (Distributed) Streaming
       § Streaming: Computations on never-ending “streams” of data records (“events”)
       § Distributed: Computation spread across many machines
       [Diagram: four parallel instances of “Your code”]
  8. What is Stateful Streaming
       § Computation and state
           • E.g., counters, windows of past events, state machines, trained ML models
       § Result depends on history of stream
       § A stateful stream processor should give you the tools to manage state
           • Recover, roll back, version, upgrade, etc.
       [Diagram: “Your code” with attached state]
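The idea of state that can be recovered and rolled back can be illustrated in plain Python. This is a minimal sketch, not Flink code: `CountingOperator`, `snapshot`, and `restore` are hypothetical names, and a deep copy of a dictionary stands in for a real checkpoint.

```python
import copy


class CountingOperator:
    """Keeps one running count per key; each result depends on stream history."""

    def __init__(self):
        self.counts = {}

    def process(self, key):
        """Update the state for `key` and return its new count."""
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def snapshot(self):
        """Return an independent copy of the state (a stand-in for a checkpoint)."""
        return copy.deepcopy(self.counts)

    def restore(self, snapshot):
        """Roll the operator back to a previously taken snapshot."""
        self.counts = copy.deepcopy(snapshot)


op = CountingOperator()
for sensor in ["a", "b", "a"]:
    op.process(sensor)

checkpoint = op.snapshot()
op.process("a")          # state moves on after the checkpoint
op.restore(checkpoint)   # simulate recovery after a failure
```

After `restore`, the operator holds the counts it had when the snapshot was taken, so reprocessing can resume from that point.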
  9. What is Event-Time Streaming
       § Data records associated with timestamps (time series data)
       § Processing depends on timestamps
       § An event-time stream processor should give you the tools to reason about time
           • Handle streams that are out of order
           • Core feature is watermarks: a clock to measure event time
       [Diagram: events t3, t1, t2, t4 arrive out of order at “Your code” with state and are grouped into windows t1-t2 and t3-t4]
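A watermark can be pictured as a clock that only moves forward as events are seen. The following is a minimal sketch in plain Python, assuming one simple rule: the watermark trails the largest timestamp seen so far by a fixed `max_delay`. `WatermarkTracker` and `max_delay` are hypothetical names for illustration, not Flink API.

```python
class WatermarkTracker:
    """Tracks event-time progress as 'largest timestamp seen minus max_delay'."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.max_seen = None

    def observe(self, timestamp):
        """Record one event timestamp and return the current watermark."""
        if self.max_seen is None or timestamp > self.max_seen:
            self.max_seen = timestamp
        return self.max_seen - self.max_delay


tracker = WatermarkTracker(max_delay=2)
# Timestamps arrive out of order, as in the t3, t1, t2, t4 picture.
watermarks = [tracker.observe(t) for t in [3, 1, 2, 4, 7, 5]]
```

Out-of-order timestamps never move the watermark backwards; it only advances when a new largest timestamp arrives.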
  10. Recap: What is Streaming?
       § Continuous processing on data that is continuously generated
       § I.e., pretty much all “big” data
       § It’s all about state and time
       § Flink does all of what we just saw
  11. IoT and Event-time Stream Processing
  12. The 'Internet Of Everything' Will Generate $14.4 Trillion Of Value Over The Next Decade. [1]
       [1] read.bi/1yDOQQ3
  13. Example Event Sources
  14. A Simple Definition
       IoT use cases from the system’s perspective: A large number of (distributed) things generating a large amount of data.
  15. Important Properties
       § Data is continuously produced → Stream Processing
       § Events have a timestamp that has to be considered → Event-time based processing
       § Data/Events can arrive with huge delays
       § Most analyses happen on time windows
  16. Remember: Streaming technology is enabling the obvious: continuous processing on data that is continuously produced
       Hint: you already have streaming data
  17. What Is Event-Time Processing
       [Figure: events with event timestamps arriving out of order from a message queue, plotted against processing time]
  18. What’s The Problem?
       [Figure: the same events grouped into processing-time windows and into event-time windows; the two groupings differ]
       Mismatch between event time and processing time.
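The mismatch can be reproduced with a few lines of plain Python. This is a sketch with made-up data: each event is a hypothetical `(event_time, arrival_time)` pair, and `group_by_window` is an illustrative helper that assigns events to tumbling windows of a fixed `size`, keyed either by event time or by arrival (processing) time.

```python
def group_by_window(events, size, use_event_time):
    """Group event timestamps into tumbling windows [start, start + size)."""
    windows = {}
    for event_time, arrival_time in events:
        t = event_time if use_event_time else arrival_time
        windows.setdefault(t - t % size, []).append(event_time)
    return windows


# Hypothetical (event_time, arrival_time) pairs; events 7 and 4 are displaced.
events = [(1, 1), (2, 2), (7, 3), (3, 4), (5, 6), (9, 7), (6, 8), (4, 9)]

by_processing_time = group_by_window(events, size=5, use_event_time=False)
by_event_time = group_by_window(events, size=5, use_event_time=True)
```

With processing-time windows, event 7 lands in the first window and event 4 in the second, so any per-window aggregate is computed over the wrong events. Event-time windows place each event according to its own timestamp.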
  19. Sources of Time Mismatch
       § Big Mismatch
           • Network disconnects
           • Slow network
       § Small Mismatch
           • The nature of distributed systems
           • Differing system clock time
  20. Big Event-Time Mismatch
       Processing Time → Event Time
           • 1977 → Episode IV
           • 1980 → Episode V
           • 1983 → Episode VI
           • 1999 → Episode I
           • 2002 → Episode II
           • 2005 → Episode III
           • 2015 → Episode VII
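The episode example can be worked through in plain Python. This sketch takes the release years and episode numbers from the slide, treats the release year as processing time and the episode number as event time, and then asks a hypothetical question: if results must be emitted in event-time order, in which year can each episode be emitted? The names `releases` and `emit_year` are illustrative.

```python
# (processing time = release year, event time = episode number)
releases = [
    (1977, 4), (1980, 5), (1983, 6),
    (1999, 1), (2002, 2), (2005, 3),
    (2015, 7),
]

# Reorder by event time (episode number) instead of arrival order.
in_event_time_order = sorted(releases, key=lambda r: r[1])
release_years = [year for year, _ in in_event_time_order]

# An episode can only be emitted once all earlier episodes have arrived.
emit_year = {}
latest = 0
for year, episode in in_event_time_order:
    latest = max(latest, year)
    emit_year[episode] = latest
```

Episode IV arrives in 1977 but cannot be emitted in order until Episode III arrives in 2005. A large event-time mismatch therefore forces either long buffering or handling of late data.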
  21. Small Event-Time Mismatch
       Robust Stream Processing with Apache Flink®: A Simple Walkthrough
       http://data-artisans.com/robust-stream-processing-flink-walkthrough/
  25. Recap: Event-Time
       § IoT use cases need event-time processing
       § Even a small mismatch between event time and processing time will lead to wrong results
  26. Use-Case Examples
  27. Flink in production (three deployments)
       • 30 Flink applications in production for more than one year. 10 billion events (2TB) processed daily
       • Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees
       • Largest job has > 20 operators, runs on > 5000 vCores in 1000-node cluster, processes millions of events per second
  28. King
       § Challenges:
           • Many games (Candy Crush, Farm Heroes, Pet Rescue, and Bubble Witch…)
           • 300 million monthly unique users
           • 30 billion events received every day
       § Need Event-Time Based statistics
       https://techblog.king.com/rbea-scalable-real-time-analytics-king/
  29. Solution: RBEA
       https://techblog.king.com/rbea-scalable-real-time-analytics-king/
  30. Solution: RBEA
       § Multiplexing of multiple data scientist requests into a single Flink job
       § Groovy as language for analysis scripts
       § Event-time windowing
       https://techblog.king.com/rbea-scalable-real-time-analytics-king/
  31. Bouygues Telecom
       • 2015: ~120 users*, 5 Flink Production Apps, 750 TB Storage, 4 billion Events/day
       • 2016: ~300 users*, 30 Flink Production Apps, 2 PB Storage, 10 billion Events/day
       * Users of the information system
       http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/
  32. Bouygues: Challenges
       § Low latency & streaming fashion counters
       § Massive amounts of data + bursty loads
       § Reliability
       § Multiple flow correlation
       § Time management:
           • Out of order & late events → our worst enemies
           • Flexible window management
       http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/
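One common way to cope with out-of-order and late events is to keep each window open for a grace period after the watermark has passed its end. The sketch below is plain Python under simplified assumptions and is not the approach described by Bouygues or Flink's exact semantics: `window_counts`, `max_delay`, and `allowed_lateness` are hypothetical names, the watermark trails the largest timestamp by `max_delay`, and an event is dropped once the watermark has reached its window end plus `allowed_lateness`.

```python
def window_counts(timestamps, size, max_delay, allowed_lateness):
    """Count events per tumbling window and collect events that arrive too late."""
    counts = {}
    dropped = []
    watermark = float("-inf")
    for t in timestamps:
        # The watermark trails the largest timestamp seen so far.
        watermark = max(watermark, t - max_delay)
        window_end = t - t % size + size
        if window_end + allowed_lateness <= watermark:
            dropped.append(t)  # the window's grace period is over
        else:
            start = window_end - size
            counts[start] = counts.get(start, 0) + 1
    return counts, dropped


counts, dropped = window_counts(
    [1, 3, 2, 11, 4, 17, 3], size=5, max_delay=2, allowed_lateness=5
)
```

The event with timestamp 4 arrives after the watermark has passed the end of its window but within the grace period, so it is still counted. The final event with timestamp 3 arrives after the grace period and is dropped.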
  33. http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/
  34. In Summary
       § If you need to ask: you already have a streaming use case!
       § IoT requires Proper Time Management
       § Apache Flink has done that for a long time now*
       * Since version 0.10
  35. Thank you! @aljoscha @ApacheFlink @dataArtisans
  36. One day of hands-on Flink training. One day of conference. Tickets are on sale. Call for Papers is already open.
       Please visit our website: http://sf.flink-forward.org
       Follow us on Twitter: @FlinkForward
  37. We are hiring! data-artisans.com/careers
  38. Appendix
