
Apache Flink Berlin Meetup May 2016


A look at some of the upcoming Apache Flink features


  1. Stephan Ewen (@stephanewen): What's coming up in Apache Flink? A quick teaser of some of the upcoming features.
  2. Disclaimer: This list of threads is incomplete. This is not an Apache Flink roadmap!
  3. What's coming up? Topics across APIs, integration, and operations:
     • Stream SQL
     • Queryable State
     • Cassandra
     • Deployment and management (YARN, Mesos, Docker, …)
     • Dynamically scaling streaming programs
     • Metrics
     • File system sources
     • Side inputs (joining streams and static data)
     • BigTop integration
     • Kinesis
     • State scalability
  4. Stream SQL
  5. Two definitions of Stream SQL: (1) run a continuous SQL query that reads an infinite stream and continuously produces results; (2) continuously ingest streams into a warehouse and query the real-time data in the warehouse.
  6. Of these two definitions, the first (a continuous SQL query over an infinite stream) is Flink's Stream SQL; the second (ingest into a warehouse, then query) is a good use case for Kafka + Flink + Druid.
  7. An example:

     val execEnv = StreamExecutionEnvironment.getExecutionEnvironment
     val tableEnv = TableEnvironment.getTableEnvironment(execEnv)

     // define a JSON-encoded Kafka topic as an external table
     val sensorSource = new KafkaJsonSource[(String, Long, Double)](
       "sensorTopic", kafkaProps, ("location", "time", "tempF"))

     // register the external table
     tableEnv.registerTableSource("sensorData", sensorSource)

     // define a query on the external table
     val roomSensors: Table = tableEnv.sql("""
       SELECT STREAM time, location AS room, (tempF - 32) * 0.556 AS tempC
       FROM sensorData
       WHERE location LIKE 'room%'
     """)

     // write the table back to Kafka as JSON
     roomSensors.toSink(new KafkaJsonSink(...))
  8. The implementation (diagram: what exists in Flink 1.0 vs. what is added in Flink 1.1+).
  9. Queryable State
  10. Sharing state with applications: to give applications access to the stream aggregates with a latency bound, write the aggregates to a key/value store.
  11. Sharing state with applications (continued): writing to the key/value store is often the biggest bottleneck.
  12. Queryable State: send queries to Flink's internal state. Optional, and only at the end of windows.
  13. What does it bring?
      • Fewer moving parts in the infrastructure
      • Performance! From an extension of Yahoo!'s streaming benchmark:
        with a key/value store, 280,000 events/s; with queryable state, 15,000,000 events/s
      • What's the secret? No synchronous distributed communication, and persistence via Flink's checkpoints (async snapshots)
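As a rough sketch of how queryable state might look from the API side: the pipeline below is invented for illustration, and `asQueryableState` is the method name Flink later shipped for exposing keyed state to external queries, so treat the exact signatures as assumptions rather than the final API previewed in this talk.

```scala
import org.apache.flink.streaming.api.scala._

object QueryableStateSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // a keyed running aggregate, e.g. a count per word (hypothetical source)
    val counts = env
      .socketTextStream("localhost", 9999)
      .map(word => (word, 1L))
      .keyBy(_._1)
      .sum(1)

    // instead of writing each update to an external key/value store,
    // expose the state under a name that external clients can query directly
    counts
      .keyBy(_._1)
      .asQueryableState("word-counts")

    env.execute("queryable state sketch")
  }
}
```

The benchmark numbers on the slide come from cutting out exactly that external write: queries go against state Flink already maintains and checkpoints, with no synchronous round-trip to a store.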
  14. Dynamic Scaling
  15. Adjust the parallelism of streaming programs: start from an initial configuration, scale out under load, and scale in to save resources.
  16. Adjusting parallelism without (significantly) interrupting the program:
      • Initial version: savepoint → stop → restart with different parallelism
      • Stateless operators: trivial
      • Stateful operators: repartition state (state is reorganized by key for key/value state and windows)
  17. Consistent Hashing (diagram)
  18. Redistribution via Key Groups (diagram)
  19. Redistribution via Key Groups:
      • Flink 1.0: hash keys into parallel partitions; the finest granularity is a partition.
      • Flink 1.1: hash keys into KeyGroups, and assign KeyGroups to parallel partitions. A change of parallelism means a change of the assignment of KeyGroups to parallel partitions.
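The key-group assignment described above can be sketched in a few lines. This is a simplified illustration with invented names, not Flink's actual implementation (which uses murmur hashing on the key hash), but the range arithmetic is the same idea:

```scala
// Simplified sketch of key-group-based state assignment.
// Names (KeyGroups, keyGroupFor, subtaskFor) are illustrative only.
object KeyGroups {
  // maxParallelism = number of key groups, fixed for the lifetime of the job
  def keyGroupFor(key: Any, maxParallelism: Int): Int =
    math.abs(key.hashCode % maxParallelism)

  // Each parallel subtask owns a contiguous range of key groups, so a
  // change of parallelism re-assigns whole key groups, never single keys.
  def subtaskFor(keyGroup: Int, maxParallelism: Int, parallelism: Int): Int =
    keyGroup * parallelism / maxParallelism
}

// With 128 key groups, scaling from 2 to 4 subtasks splits each subtask's
// key-group range in half instead of rehashing every individual key:
val kg = KeyGroups.keyGroupFor("sensor-42", 128)
val before = KeyGroups.subtaskFor(kg, 128, 2)
val after  = KeyGroups.subtaskFor(kg, 128, 4)
```

Because state is snapshotted per key group, restoring at a new parallelism only requires handing contiguous key-group ranges to the new subtasks.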
  20. Flink Forward 2016, Berlin. Submission deadline: June 30, 2016. Early bird deadline: July 15, 2016. www.flink-forward.org
  21. 21. We are hiring! data-artisans.com/careers
