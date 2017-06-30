© 2017 MapR TechnologiesMapR Confidential 1 Introduction to Stream Processing with Apache Flink Tugdual Grall @tgrall
Introduction to Streaming with Apache Flink

  1. Introduction to Stream Processing with Apache Flink. Tugdual Grall, @tgrall. © 2017 MapR Technologies, MapR Confidential.
  2. {"about" : "me"} Tugdual "Tug" Grall • MapR: Technical Evangelist • MongoDB, Couchbase, eXo, Oracle • NantesJUG co-founder • @tgrall • http://tgrall.github.io • tug@mapr.com / tugdual@gmail.com
  3. MapR Converged Data Platform [diagram labels]: Open Source Engines & Tools • Commercial Engines & Applications • Cloud and Managed Services • Custom Apps • Data Processing • Web-Scale Storage • Database • Event Streaming • Search and Others • MapR-FS • MapR-DB • MapR Streams • Utility-Grade Platform Services • Real Time • Unified Security • Multi-tenancy • Disaster Recovery • Global Namespace • High Availability • Unified Management and Monitoring
  4. Streaming: streaming technology is enabling the obvious: continuous processing on data that is continuously produced. Hint: you already have streaming data.
  5. Decoupling • [Diagram: App A, App B, App C] State managed centrally • [Diagram: App A, App B, App C] Applications build their own state
  6. Event Stream = Data Pipelines
  7. Streaming and Batch [Figure: event timeline split into hourly partitions, from 2016-3-1 12:00 am to 2016-3-12 3:00 am …]
  8. Streaming and Batch [Figure: the same partitioned timeline, labeled Stream (low latency) and Stream (high latency)]
  9. Streaming and Batch [Figure: the same partitioned timeline, labeled Stream (low latency), Batch (bounded stream), and Stream (high latency)]
  10. Processing • Request / Response
  11. Processing • Request / Response • Batch
  12. Processing • Request / Response • Batch • Stream Processing
  13. Processing • Request / Response • Batch • Stream Processing: real-time reaction to events; continuous applications; process both real-time and historical data
  15. Flink Architecture
  16. Flink Architecture • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
  17. Flink Architecture • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google) • Core: Runtime (Distributed Streaming Dataflow)
  18. Flink Architecture • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google) • Core: Runtime (Distributed Streaming Dataflow) • API & Libraries: DataSet API (Batch Processing)
  19. Flink Architecture • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google) • Core: Runtime (Distributed Streaming Dataflow) • API & Libraries: DataSet API (Batch Processing); FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational)
  20. Flink Architecture • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google) • Core: Runtime (Distributed Streaming Dataflow) • API & Libraries: DataSet API (Batch Processing), DataStream API (Stream Processing); FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational)
  21. Flink Architecture • Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google) • Core: Runtime (Distributed Streaming Dataflow) • API & Libraries: DataSet API (Batch Processing), DataStream API (Stream Processing); FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational), CEP (Event Processing), Table (Relational)
  22. Demonstration: Flink Basics
  23. Batch & Stream
      case class Word(word: String, frequency: Int)
      // DataSet API - Batch
      val lines: DataSet[String] = env.readTextFile(…)
      lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
        .groupBy("word").sum("frequency")
        .print()
      // DataStream API - Streaming (5-second window, evaluated every second)
      val lines: DataStream[String] = env.socketTextStream(...)
      lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
        .keyBy("word").timeWindow(Time.seconds(5), Time.seconds(1))
        .sum("frequency")
        .print()
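
The streaming half of the word count on slide 23 can be illustrated without a Flink cluster. The following is a minimal plain-Scala sketch of what a sliding window does to the words: each word is assigned to every window that covers its timestamp, then counted per window. It is not Flink API; `TimedLine`, `windowStarts`, and the second-based timestamps are assumptions made for illustration.

```scala
// Plain-Scala simulation of a sliding-window word count (no Flink dependency).
// TimedLine and the helper names are hypothetical, chosen for this sketch only.
object SlidingWordCount {
  case class Word(word: String, frequency: Int)
  case class TimedLine(timestampSec: Long, text: String)

  // Start times of every window [start, start + sizeSec) that contains ts.
  // Assumes non-negative timestamps; windows may start before time 0.
  def windowStarts(ts: Long, sizeSec: Long, slideSec: Long): Seq[Long] = {
    val lastStart = ts - (ts % slideSec)
    lastStart until (ts - sizeSec) by -slideSec
  }

  // Returns window start -> (word -> count) for all windows that saw data.
  def count(lines: Seq[TimedLine], sizeSec: Long, slideSec: Long): Map[Long, Map[String, Int]] = {
    val assigned = for {
      line  <- lines
      word  <- line.text.split(" ").toSeq if word.nonEmpty
      start <- windowStarts(line.timestampSec, sizeSec, slideSec)
    } yield (start, Word(word, 1))

    assigned.groupBy(_._1).map { case (start, pairs) =>
      start -> pairs.map(_._2).groupBy(_.word).map { case (w, ws) => w -> ws.map(_.frequency).sum }
    }
  }
}
```

With a 2-second window sliding every second, a word seen at second 0 and again at second 1 is counted twice in the window starting at 0, which is the overlap a sliding window is meant to capture.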
  24. Stream Processing: Source → Filter / Transform → Sink
  25. Flink Ecosystem • Source: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Twitter, Apache Bahir, … • Sink: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Elasticsearch, HDFS/MapR-FS, …
  26. Stateful Stream Processing: Source → Filter / Transform (state read/write) → Sink
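
The "state read/write" step on slide 26 can be sketched in plain Scala as a fold that carries per-key state from one event to the next. This is only an illustration of the idea, not Flink's state API; `Event` and `runningTotals` are hypothetical names.

```scala
// Plain-Scala sketch of a stateful transform: a running total per key.
// Event and runningTotals are assumptions for illustration, not Flink classes.
object KeyedState {
  case class Event(key: String, value: Int)

  // For each incoming event, read the key's state, update it, and emit the new total.
  def runningTotals(events: Seq[Event]): Seq[(String, Int)] = {
    val start = (Map.empty[String, Int], Vector.empty[(String, Int)])
    val (_, emitted) = events.foldLeft(start) {
      case ((state, out), Event(key, value)) =>
        val updated = state.getOrElse(key, 0) + value            // state read
        (state.updated(key, updated), out :+ (key -> updated))   // state write, then emit
    }
    emitted
  }
}
```

The output for each event depends on everything seen earlier for the same key, which is what separates a stateful operator from a plain filter or map.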
  27. Is Flink used?
  28. Powered by Flink
  29. 10 billion events/day • 2 TB of data/day • 30 applications • 2 PB of storage and growing. Source: Bouygues Telecom, http://berlin.flink-forward.org/wp-content/uploads/2016/07/Thomas-Lamirault_Mohamed-Amine-Abdessemed-A-brief-history-of-time-with-Apache-Flink.pdf
  30. Stream Processing: Windowing
  31. Stream Windows
  32. Stream Windows
  33. Stream Windows
  34. Stream Windows
  35. Stream Windows
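
The window diagrams on slides 31 to 35 did not survive in this transcript. As a stand-in, here is a minimal plain-Scala sketch of the simplest window type, a tumbling (fixed-size, non-overlapping) window: each reading belongs to exactly one window, identified by its start time. `Reading`, `windowStart`, and `sumPerWindow` are hypothetical names, and this is a simulation rather than Flink API.

```scala
// Plain-Scala sketch of tumbling windows: fixed size, no overlap.
// Reading and the function names are assumptions for illustration.
object TumblingWindows {
  case class Reading(timestampSec: Long, value: Double)

  // The single window [start, start + sizeSec) that contains ts (ts assumed non-negative).
  def windowStart(ts: Long, sizeSec: Long): Long = ts - (ts % sizeSec)

  // Sum of the values that fall into each window, keyed by window start.
  def sumPerWindow(readings: Seq[Reading], sizeSec: Long): Map[Long, Double] =
    readings
      .groupBy(r => windowStart(r.timestampSec, sizeSec))
      .map { case (start, rs) => start -> rs.map(_.value).sum }
}
```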
  36. Demonstration: Flink Windowing
  37. What about it? Time
  38. Time in Flink • Multiple notions of "Time" in Flink • Event Time • Ingestion Time • Processing Time
  39. What Is Event-Time Processing? Processing Time: 1977, 1980, 1983, 1999, 2002, 2005, 2015 • Event Time: Episode IV, Episode V, Episode VI, Episode I, Episode II, Episode III, Episode VII
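
The Star Wars analogy on slide 39 can be written down directly: the release year plays the role of processing time (when the event reached the audience) and the episode number plays the role of event time (when it happened in the story). A small plain-Scala sketch, with `Film` and the two sort helpers as hypothetical names:

```scala
// Plain-Scala sketch of event time vs. processing time, using the data from slide 39.
// Film and the helper names are assumptions for illustration.
object StarWarsTime {
  // episode: position in the story (event time); releaseYear: arrival order (processing time)
  case class Film(episode: Int, releaseYear: Int)

  val films: Seq[Film] = Seq(
    Film(4, 1977), Film(5, 1980), Film(6, 1983),
    Film(1, 1999), Film(2, 2002), Film(3, 2005), Film(7, 2015))

  // Order in which a processing-time system sees the events.
  def byProcessingTime(fs: Seq[Film]): Seq[Int] = fs.sortBy(_.releaseYear).map(_.episode)

  // Order an event-time system reconstructs from the timestamps carried by the events.
  def byEventTime(fs: Seq[Film]): Seq[Int] = fs.sortBy(_.episode).map(_.episode)
}
```

The two orderings differ, which is the point of the slide: results computed on arrival order can disagree with results computed on the time the events actually occurred.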
  40. Time in Flink
  41. Complex Event Processing
  42. Complex Event Processing • Analyzing a stream of events and drawing conclusions • "if A and then B → infer event C" • Demanding requirements on the stream processor • Low latency! • Exactly-once semantics & event-time support
  43. Use Case
  44. Order Events: the process is reflected in a stream of order events • Order(orderId, tStamp, "received") • Shipment(orderId, tStamp, "shipped") • Delivery(orderId, tStamp, "delivered") • orderId: identifies the order • tStamp: time at which the event happened
  45. Real-time Warnings
  46. CEP to the Rescue: define processing and delivery intervals (SLAs) • ProcessSucc(orderId, tStamp, duration) • ProcessWarn(orderId, tStamp) • DeliverySucc(orderId, tStamp, duration) • DeliveryWarn(orderId, tStamp) • orderId: identifies the order • tStamp: time when the event happened • duration: duration of the processing/delivery
  47. CEP Example
  48. Processing: Order → Shipment
  49. Processing: Order → Shipment
      val processingPattern = Pattern
        .begin[Event]("received").subtype(classOf[Order])
        .followedBy("shipped").where(_.status == "shipped")
        .within(Time.hours(1))
  50. Processing: Order → Shipment
      val processingPattern = Pattern
        .begin[Event]("received").subtype(classOf[Order])
        .followedBy("shipped").where(_.status == "shipped")
        .within(Time.hours(1))
      val processingPatternStream = CEP.pattern(
        input.keyBy("orderId"),
        processingPattern)
  51. Processing: Order → Shipment
      val processingPattern = Pattern
        .begin[Event]("received").subtype(classOf[Order])
        .followedBy("shipped").where(_.status == "shipped")
        .within(Time.hours(1))
      val processingPatternStream = CEP.pattern(
        input.keyBy("orderId"),
        processingPattern)
      val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
        processingPatternStream.select {
          (pP, timestamp) => // Timeout handler
            ProcessWarn(pP("received").orderId, timestamp)
        } {
          fP => // Select function
            ProcessSucc(
              fP("received").orderId, fP("shipped").tStamp,
              fP("shipped").tStamp - fP("received").tStamp)
        }
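
To make the semantics of the pattern on slide 51 concrete, here is a plain-Scala simulation over a finished list of events: for each order, a shipment inside the time limit yields a `ProcessSucc`, otherwise a `ProcessWarn`. It is a sketch under stated assumptions, not the Flink CEP library: the event classes are simplified, the limit is treated as exclusive, and the warning timestamp is assumed to be the moment the limit expires.

```scala
// Plain-Scala simulation of "Order followed by Shipment within one hour".
// Not Flink CEP: the types, the exclusive limit, and the warning timestamp are assumptions.
object OrderSla {
  sealed trait Event { def orderId: String; def tStamp: Long }
  case class Order(orderId: String, tStamp: Long) extends Event
  case class Shipment(orderId: String, tStamp: Long) extends Event

  case class ProcessSucc(orderId: String, tStamp: Long, duration: Long)
  case class ProcessWarn(orderId: String, tStamp: Long)

  val OneHourMs: Long = 60L * 60L * 1000L

  // One result per order that was received, sorted by orderId for a stable output.
  def evaluate(events: Seq[Event], limitMs: Long = OneHourMs): Seq[Either[ProcessWarn, ProcessSucc]] =
    events.groupBy(_.orderId).toSeq.sortBy(_._1).flatMap { case (id, evs) =>
      evs.collectFirst { case o: Order => o }.toList.map { o =>
        // First shipment that follows the order inside the time limit, if any.
        val shipped = evs.collectFirst {
          case s: Shipment if s.tStamp >= o.tStamp && s.tStamp - o.tStamp < limitMs => s
        }
        val result: Either[ProcessWarn, ProcessSucc] = shipped match {
          case Some(s) => Right(ProcessSucc(id, s.tStamp, s.tStamp - o.tStamp))
          case None    => Left(ProcessWarn(id, o.tStamp + limitMs)) // timeout branch
        }
        result
      }
    }
}
```

As on the slide, the `Left` side carries the timeout case and the `Right` side the successful match, so downstream code can split warnings from completed orders with a single pattern match.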
  52. Count Delayed Shipments
  53. Compute Avg Processing Time
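
Slides 52 and 53 carry only their titles in this transcript. A minimal plain-Scala sketch of the two metrics they name, computed over a finished sequence of `Either[ProcessWarn, ProcessSucc]` results, might look as follows. The case classes mirror the fields listed on slide 46; `delayedCount` and `avgProcessingTime` are hypothetical names, and a streaming job would presumably compute these per window rather than over a whole list.

```scala
// Plain-Scala sketch of the two metrics named on slides 52 and 53.
// Function names are assumptions; the case classes follow the fields on slide 46.
object SlaMetrics {
  case class ProcessSucc(orderId: String, tStamp: Long, duration: Long)
  case class ProcessWarn(orderId: String, tStamp: Long)

  // Number of orders that missed the processing interval.
  def delayedCount(results: Seq[Either[ProcessWarn, ProcessSucc]]): Int =
    results.count(_.isLeft)

  // Mean duration of the orders processed in time; None when there are none.
  def avgProcessingTime(results: Seq[Either[ProcessWarn, ProcessSucc]]): Option[Double] = {
    val durations = results.collect { case Right(s) => s.duration }
    if (durations.isEmpty) None else Some(durations.sum.toDouble / durations.size)
  }
}
```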
  54. Demonstration: Streaming Analytics
  55. Demonstration • https://github.com/mapr-demos/mapr-streams-flink-demo • https://github.com/mapr-demos/wifi-sensor-demo • http://tgrall.github.io/blog/2016/10/12/getting-started-with-apache-flink-and-kafka/ • http://tgrall.github.io/blog/2016/10/17/getting-started-with-apache-flink-and-mapr-streams/ • more soon…
  56. Thanks to: Kostas Tzoumas, Stephan Ewen, Fabian Hueske, Till Rohrmann, Jamie Grier
  57. Free ebooks & Online training • Streaming Architecture: http://mapr.com/ebooks/ • http://mapr.com/training/
  58. Stream Processing with Apache Flink. Tugdual Grall, @tgrall

