
K. Tzoumas & S. Ewen – Flink Forward Keynote


Flink Forward 2015

Published in: Technology


  1. 1. Welcome to The first conference on Apache Flink Sponsored by
  2. 2. Some practical info §  Registration, cloakroom, and meals are in Palais §  Information point always staffed §  WiFi is FlinkForward §  Twitter hashtag is #ff15 §  Follow @FlinkForward
  3. 3. Some practical info § Need help? Look for a volunteer (pink badges) § All sessions are recorded and will be made available online § This includes the training sessions
  4. 4. Getting around: Please go around while talks are in progress
  5. 5. Our speaker organizations
  6. 6. Kostas Tzoumas and Stephan Ewen @kostas_tzoumas | @StephanEwen Apache Flink™: From Incubation to Flink 1.0
  7. 7. 1. A bit of history 2. The streaming era and Flink 3. Inside Flink 0.10 4. Towards Flink 1.0 and beyond
  8. 8. A bit of history: From incubation until now
  9. 9. Timeline (Apr 2014, Dec 2014, Jun 2015, Oct 2015): releases 0.5, 0.6, 0.7, 0.8, 0.9-m1, 0.9, 0.10; promotion to top-level project. Initial stack: DataSet API (Java/Scala); Flink core; Local, Remote, Yarn. Current stack: Gelly, Table, FlinkML, SAMOA, Hadoop M/R, Dataflow, Cascading, Storm; DataSet (Java/Scala/Python), DataStream (Java/Scala); Flink dataflow engine; Local, Remote, Yarn, Tez, Embedded.
  10. 10. Community growth: Flink is one of the largest and most active Apache big data projects, with well over 120 contributors
  11. 11. Flink meetups around the globe
  12. 12. Featured in
  13. 13. Welcome to the streaming era
  14. 14. Batch: well served. Event based: need new systems.
  15. 15. Streaming is the biggest change in data infrastructure since Hadoop
  16. 16. 1. Radically simplified infrastructure 2. Internet of Things, on-demand services 3. Can completely subsume batch
  17. 17. In a world of events and isolated apps, the stream processor is the backbone of the data infrastructure. Diagram labels: three apps, each with a local view; consistent movement, analytics; three apps with a global view; consistent store.
  18. 18. § Until now, stream processors were less mature than batch processors § This led to • in-house solutions • abuse of batch processors • Lambda architectures § This is no longer the case
  19. 19. Flink 0.10: With the upcoming 0.10 release, Flink significantly surpasses the state of the art in open source stream processing systems. And we are heading to Flink 1.0 after that.
  20. 20. § Streaming technology has matured • e.g., Flink, Kafka, Dataflow § Flink and Dataflow duality • a Google technology • an open source Apache project
  21. 21. § Streaming is happening § Better adapt now § Flink 0.10: a ready-to-use open source stream processor
  22. 22. Flink 0.10: Flink for the streaming era
  23. 23. Improved DataStream API § Stream data analysis differs from batch data analysis by introducing time § Streams are unbounded and produce data over time § As simple as the batch API if handling time in a simple way § Powerful if you want to handle time in an advanced way (out-of-order records, preliminary results, etc.)
  24. 24. Improved DataStream API

          case class Event(location: Location, numVehicles: Long)

          val stream: DataStream[Event] = …

          stream
            .filter { evt => isIntersection(evt.location) }

  25. 25. Improved DataStream API

          case class Event(location: Location, numVehicles: Long)

          val stream: DataStream[Event] = …

          stream
            .filter { evt => isIntersection(evt.location) }
            .keyBy("location")
            .timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))
            .sum("numVehicles")
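
The `timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))` call on slide 25 describes sliding windows: 15 minutes long, advancing every 5 minutes, so each event falls into three overlapping windows. The following is a plain-Scala sketch of that idea on an in-memory list, not Flink's implementation. The `timestampMs` field, the `String` location, and the names `windowStarts` and `slidingSums` are assumptions made for illustration, and window boundaries are assumed to be aligned to multiples of the slide.

```scala
// Sketch of sliding-window sums per key on a bounded list (no Flink dependency).
object SlidingWindowSketch {
  // Mirrors the slide's Event, with a hypothetical timestamp and a String location.
  final case class Event(location: String, numVehicles: Long, timestampMs: Long)

  /** Start times of every window [start, start + sizeMs) that contains `ts`.
    * Assumes ts >= 0 and window starts aligned to multiples of `slideMs`. */
  def windowStarts(ts: Long, sizeMs: Long, slideMs: Long): List[Long] = {
    val lastStart = ts - (ts % slideMs) // latest aligned start that is <= ts
    Iterator.iterate(lastStart)(_ - slideMs).takeWhile(_ > ts - sizeMs).toList
  }

  /** Sum of numVehicles per (location, window start). */
  def slidingSums(events: Seq[Event], sizeMs: Long, slideMs: Long): Map[(String, Long), Long] =
    events
      .flatMap { e =>
        // one copy of the event per window it belongs to
        windowStarts(e.timestampMs, sizeMs, slideMs).map(start => ((e.location, start), e.numVehicles))
      }
      .groupBy(_._1)
      .map { case (key, values) => key -> values.map(_._2).sum }
}
```

With a 15-minute size and a 5-minute slide, an event at minute 7 is counted in the windows starting at minutes 5, 0, and -5. A stream processor produces the same sums incrementally instead of regrouping a full list.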
  26. 26. Improved DataStream API

          case class Event(location: Location, numVehicles: Long)

          val stream: DataStream[Event] = …

          stream
            .filter { evt => isIntersection(evt.location) }
            .keyBy("location")
            .timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))
            .trigger(new Threshold(200))
            .sum("numVehicles")

  27. 27. Improved DataStream API

          case class Event(location: Location, numVehicles: Long)

          val stream: DataStream[Event] = …

          stream
            .filter { evt => isIntersection(evt.location) }
            .keyBy("location")
            // 15-minute windows that slide every 5 minutes
            .timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))
            .trigger(new Threshold(200))
            .sum("numVehicles")
            .keyBy( evt => evt.location.grid )
            .mapWithState { (evt, state: Option[Model]) => {
              // start from a fresh model if this key has no state yet
              val model = state.getOrElse(new Model())
              (model.classify(evt), Some(model.update(evt)))
            }}
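
The `mapWithState` step on slide 27 pairs every event with the state stored for its key and returns both an output and the next state. As a rough illustration of those semantics only, here is a minimal sketch over an in-memory sequence. The helper below shares the name `mapWithState` but is a hypothetical stand-in, not Flink's operator, and `runningTotals` is an invented usage example.

```scala
// Sketch of per-key state handling on a bounded sequence (no Flink dependency).
object KeyedStateSketch {
  /** Applies `f` to each input in order, keeping at most one state value per key.
    * Returning None from `f` clears the state for that key. */
  def mapWithState[K, I, O, S](inputs: Seq[I], keyOf: I => K)(
      f: (I, Option[S]) => (O, Option[S])): Seq[O] = {
    val (_, outputs) = inputs.foldLeft((Map.empty[K, S], Vector.empty[O])) {
      case ((states, out), in) =>
        val key = keyOf(in)
        val (result, newState) = f(in, states.get(key))
        val updated = newState match {
          case Some(s) => states.updated(key, s)
          case None    => states - key
        }
        (updated, out :+ result)
    }
    outputs
  }

  /** Usage: running vehicle total per location, given (location, numVehicles) pairs. */
  def runningTotals(events: Seq[(String, Long)]): Seq[Long] =
    mapWithState[String, (String, Long), Long, Long](events, _._1) { (evt, state) =>
      val total = state.getOrElse(0L) + evt._2
      (total, Some(total))
    }
}
```

For the input `Seq(("a", 1L), ("b", 10L), ("a", 2L))`, `runningTotals` yields `1, 10, 3`: the two keys accumulate independently. In a distributed engine the state map would be partitioned by key and covered by checkpoints; the sketch leaves both out.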
  28. 28. IoT / Mobile Applications: Events occur on devices. Events stored in a log (Queue / Log). Events analyzed in a data streaming system (Stream Analysis).
  29. 29. IoT / Mobile Applications
  30. 30. IoT / Mobile Applications
  31. 31. IoT / Mobile Applications
  32. 32. IoT / Mobile Applications: Out of order! First burst of events, second burst of events.
  33. 33. IoT / Mobile Applications: Flink supports out-of-order (event time) windows, arrival time windows (and mixtures), plus low-latency processing. Diagram labels: event time windows, arrival time windows, instant event-at-a-time; first burst of events, second burst of events.
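
The difference between event time windows and arrival time windows on slides 32 and 33 can be seen with a small in-memory example. The sketch below is plain Scala under stated assumptions: the `Reading` type, its field names, and `countPerWindow` are hypothetical, and it ignores how a real stream processor decides that an event time window is complete.

```scala
// Sketch: counting the same events by event time and by arrival time.
object TimeSemanticsSketch {
  // eventTimeMs: when the event happened on the device.
  // arrivalTimeMs: when it reached the stream processor.
  final case class Reading(deviceId: String, eventTimeMs: Long, arrivalTimeMs: Long)

  /** Event counts per tumbling window, keyed by window start, using the given clock. */
  def countPerWindow(readings: Seq[Reading], windowMs: Long)(clock: Reading => Long): Map[Long, Int] =
    readings
      .groupBy(r => clock(r) / windowMs * windowMs) // round down to the window start
      .map { case (start, rs) => start -> rs.size }

  // A first burst recorded in the first minute but delivered late,
  // arriving interleaved with a second burst from the second minute.
  val example: Seq[Reading] = Seq(
    Reading("d1", 1000L, 61000L),
    Reading("d1", 2000L, 62000L),
    Reading("d1", 3000L, 63000L),
    Reading("d1", 61000L, 61500L),
    Reading("d1", 62000L, 62500L)
  )
}
```

With one-minute windows, `countPerWindow(example, 60000L)(_.eventTimeMs)` puts three events in the first window and two in the second, while `countPerWindow(example, 60000L)(_.arrivalTimeMs)` puts all five in the second window. The event time result reflects when things happened; the arrival time result reflects only when the data showed up.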
  34. 34. High Availability and Consistency: No single point of failure any more (ZooKeeper ensemble, multiple masters, failover). Exactly-once processing semantics across the pipeline. Checkpoints/fault tolerance is decoupled from windows → allows for highly flexible window implementations.
  35. 35. Performance: Continuous streaming, latency-bound buffering, distributed snapshots. High throughput and low latency, with a configurable throughput/latency tradeoff.
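
One way to read "latency-bound buffering" on slide 35: records are grouped into buffers for throughput, and a buffer is shipped either when it is full or when its oldest record has waited too long. The sketch below simulates that policy over records with explicit arrival timestamps. It is an illustration of the tradeoff, not Flink's network stack; the name `batches` and its parameters are assumptions, and the timeout is only checked when a new record arrives, whereas a real system would use a timer.

```scala
// Sketch: flush a buffer when it is full or when its oldest record is too old.
object BufferingSketch {
  /** Groups (arrivalTimeMs, record) pairs into flushed buffers.
    * Simplification: the timeout is evaluated only on the arrival of the next record. */
  def batches[A](records: Seq[(Long, A)], capacity: Int, timeoutMs: Long): Vector[Vector[A]] = {
    // fold state: (flushed buffers, current buffer, arrival time of its oldest record)
    val start = (Vector.empty[Vector[A]], Vector.empty[A], 0L)
    val (done, pending, _) = records.foldLeft(start) {
      case ((flushed, buffer, oldest), (now, record)) =>
        // ship the current buffer first if its oldest record has waited long enough
        val (flushed1, buffer1) =
          if (buffer.nonEmpty && now - oldest >= timeoutMs) (flushed :+ buffer, Vector.empty[A])
          else (flushed, buffer)
        val oldest1 = if (buffer1.isEmpty) now else oldest
        val buffer2 = buffer1 :+ record
        // ship immediately once the buffer reaches capacity
        if (buffer2.size >= capacity) (flushed1 :+ buffer2, Vector.empty[A], oldest1)
        else (flushed1, buffer2, oldest1)
    }
    if (pending.nonEmpty) done :+ pending else done
  }
}
```

A large `capacity` with a long `timeoutMs` favors throughput (fewer, fuller buffers); a short `timeoutMs` caps how long any record can sit in a buffer, which favors latency.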
  36. 36. Batch and Streaming: Batch Word Count in the DataStream API

          case class WordCount(word: String, count: Int)

          val text: DataStream[String] = …

          text
            .flatMap { line => line.split(" ") }
            .map { word => new WordCount(word, 1) }
            .keyBy("word")
            .window(GlobalWindows.create())
            .trigger(new EOFTrigger())
            .sum("count")

  37. 37. Batch and Streaming

          case class WordCount(word: String, count: Int)

          val text: DataStream[String] = …

          text
            .flatMap { line => line.split(" ") }
            .map { word => new WordCount(word, 1) }
            .keyBy("word")
            .window(GlobalWindows.create())
            .trigger(new EOFTrigger())
            .sum("count")

      Batch Word Count in the DataSet API

          val text: DataSet[String] = …

          text
            .flatMap { line => line.split(" ") }
            .map { word => new WordCount(word, 1) }
            .groupBy("word")
            .sum("count")
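
Slides 36 and 37 make the point that a batch job is a streaming job whose single window closes at end of input. A minimal sketch of that equivalence using only Scala collections follows; `batchCount` and `streamingCount` are hypothetical names, and the fold stands in for a global window that fires once the input is exhausted.

```scala
// Sketch: word count as a batch grouping and as an incremental fold over a stream.
object WordCountSketch {
  /** Batch style: see all words at once, then group and count. */
  def batchCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" ").toSeq)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  /** Streaming style: update counts one word at a time, emit only at end of input. */
  def streamingCount(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" ").toSeq)
      .filter(_.nonEmpty)
      .foldLeft(Map.empty[String, Int]) { (counts, word) =>
        counts.updated(word, counts.getOrElse(word, 0) + 1)
      }
}
```

On any finite input both functions return the same map, which is the sense in which the streaming model subsumes the batch one here. The streaming version never needs the whole input in memory, only the running counts.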
  38. 38. Batch and Streaming. DataSet: relational optimizer; pipelined and blocking operators; batch parameters (schedule lazily, recompute whole operators). DataStream: window optimization; pipelined and windowed operators; streaming parameters (schedule eagerly, periodic checkpoints). Shared streaming dataflow runtime: streaming data movement, stateful operations, DAG recovery, fully buffered streams, DAG resource management.
  39. 39. Batch and Streaming: A full-fledged batch processor as well. Stack diagram: Gelly, Table, FlinkML, SAMOA, Hadoop M/R, Dataflow, Cascading, Storm; DataSet (Java/Scala/Python), DataStream (Java/Scala); Flink dataflow engine; Local, Remote, Yarn, Tez, Embedded.
  40. 40. Batch and Streaming: A full-fledged batch processor as well. Stack diagram: Gelly, Table, FlinkML, SAMOA, Hadoop M/R, Dataflow, Cascading, Storm; DataSet (Java/Scala/Python), DataStream (Java/Scala); Flink dataflow engine; Local, Remote, Yarn, Tez, Embedded. More details in Dongwon Kim's talk "A comparative performance evaluation of Flink".
  41. 41. Integration (picture not complete): POSIX, Java/Scala Collections
  42. 42. Monitoring: Live system metrics and user-defined accumulators/statistics. Monitoring REST API for custom monitoring tools.

          GET http://flink-m:8081/jobs/7684be6004e4e955c2a558a9bc463f65/accumulators

          {
            "id": "dceafe2df1f57a1206fcb907cb38ad97",
            "user-accumulators": [
              { "name": "avglen", "type": "DoubleCounter", "value": "123.03259440000001" },
              { "name": "genwords", "type": "LongCounter", "value": "75000000" }
            ]
          }
  43. 43. Flink 0.10 Summary § Focus on operational readiness • high availability • monitoring • integration with other systems § First-class support for event time § Refined DataStream API: easy and powerful
  44. 44. Towards Flink 1.0 and beyond: Where we see the project going
  45. 45. Towards Flink 1.0 § Flink 1.0 is around the corner § Focus on defining public APIs and automatic API compatibility checks § Guarantee backwards compatibility in all Flink 1.X versions
  46. 46. Beyond Flink 1.0 § Flink engine has most features in place § Focus on usability features on top of DataStream API • e.g., SQL, ML, more connectors § Continue work on elasticity and memory management
  47. 47. Enjoy the rest of the first conference on Apache Flink
