Apache Flink Berlin Meetup May 2016

Stephan Ewen

A look at some of the upcoming Apache Flink features

Stephan Ewen
@stephanewen
What's coming up in Apache Flink?
Quick teaser of some of the upcoming features

Disclaimer
This list of threads is incomplete
This is not an Apache Flink roadmap!

What's coming up?
APIs, Integration, Operations
• Stream SQL
• Queryable State
• Cassandra
• Deployment and Management (YARN, Mesos, Docker, …)
• Dynamically Scaling Streaming Programs
• Metrics
• File System Sources
• Side Inputs: joining streams and static data
• BigTop Integration
• Kinesis
• State Scalability

Two definitions of Stream SQL
1. Run a continuous SQL query that reads an infinite stream and continuously produces results. That's Flink's Stream SQL.
2. Continuously ingest streams into a warehouse, then query the real-time data in the warehouse. Good use case for Kafka + Flink + Druid.

An Example
val execEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = TableEnvironment.getTableEnvironment(execEnv)

// define a JSON-encoded Kafka topic as an external table
val sensorSource = new KafkaJsonSource[(String, Long, Double)](
  "sensorTopic", kafkaProps, ("location", "time", "tempF"))

// register the external table
tableEnv.registerTableSource("sensorData", sensorSource)

// define a query on the external table
val roomSensors: Table = tableEnv.sql("""
  SELECT STREAM time, location AS room, (tempF - 32) * 0.556 AS tempC
  FROM sensorData
  WHERE location LIKE 'room%' """)

// write the table back to Kafka as JSON
roomSensors.toSink(new KafkaJsonSink(...))
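
To make the per-record logic of the query above concrete, here is a minimal sketch in plain Scala with no Flink involved. The names SensorReading, RoomReading and RoomSensorsSketch are hypothetical and exist only for this illustration; an Iterator stands in for the unbounded stream.

```scala
// Plain-Scala sketch of what the SQL query computes for each record.
// SensorReading, RoomReading and RoomSensorsSketch are hypothetical names,
// not part of any Flink API.
final case class SensorReading(location: String, time: Long, tempF: Double)
final case class RoomReading(time: Long, room: String, tempC: Double)

object RoomSensorsSketch {
  // WHERE location LIKE 'room%'   -> prefix match on the location
  // (tempF - 32) * 0.556 AS tempC -> Fahrenheit to Celsius (0.556 approximates 5/9)
  def roomSensors(readings: Iterator[SensorReading]): Iterator[RoomReading] =
    readings
      .filter(_.location.startsWith("room"))
      .map(r => RoomReading(r.time, r.location, (r.tempF - 32) * 0.556))

  def main(args: Array[String]): Unit = {
    val input = Iterator(
      SensorReading("room1", 1L, 212.0),
      SensorReading("hallway", 2L, 70.0),
      SensorReading("room2", 3L, 32.0)
    )
    // Only the two "room..." readings pass the filter.
    roomSensors(input).foreach(println)
  }
}
```

Because the transformation is applied lazily, record by record, results are produced continuously as input arrives, which is the first of the two Stream SQL definitions.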

The Implementation
[Figure: two panels, "Flink 1.0" and "Flink 1.1+"]

Sharing State with Applications
Access to the stream aggregates with a latency bound
• Write them to a key/value store (often the biggest bottleneck)

Queryable State
• Send queries to Flink's internal state
• Writes to the key/value store become optional, and happen only at the end of windows

What does it bring?
• Fewer moving parts in the infrastructure
• Performance!
• From an extension of Yahoo!'s streaming benchmark:
  • With key/value store: 280,000 events/s
  • Queryable state: 15,000,000 events/s
• What's the secret?
  • No synchronous distributed communication
  • Persistence via Flink's checkpoint (async snapshots)
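
The two "secret" points can be pictured with a small plain-Scala sketch. This is not Flink's queryable-state API: LocalCountState and its methods are hypothetical, TrieMap from the Scala standard library stands in for operator state, and its readOnlySnapshot only loosely stands in for an asynchronous checkpoint.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical illustration of "query the operator's own state".
// Not Flink's API; single-writer updates are assumed (one operator thread per key).
final class LocalCountState {
  private val counts = TrieMap.empty[String, Long]

  // Called for each event: a purely local update, no remote round trip.
  def update(key: String): Unit =
    counts.put(key, counts.getOrElse(key, 0L) + 1L)

  // Called by a querying application: reads the live state directly.
  def query(key: String): Option[Long] = counts.get(key)

  // Point-in-time view that later updates do not modify, standing in
  // for persistence via snapshots taken off the processing path.
  def snapshot(): scala.collection.Map[String, Long] = counts.readOnlySnapshot()
}

object LocalCountStateDemo {
  def main(args: Array[String]): Unit = {
    val state = new LocalCountState
    List("a", "b", "a").foreach(state.update)
    val snap = state.snapshot()
    state.update("a")
    println(s"live a=${state.query("a")}, snapshot a=${snap.get("a")}")
  }
}
```

The point of the sketch is that neither the update path nor the query path waits on an external store, and the snapshot does not block further updates.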

Adjust parallelism of Streaming Programs
[Figure: three panels: initial configuration; scale out (for load); scale in (save resources)]

Adjust parallelism of Streaming Programs
• Adjusting parallelism without (significantly) interrupting the program
• Initial version:
  • Savepoint -> stop -> restart-with-different-parallelism
• Stateless operators: Trivial
• Stateful operators: Repartition state
  • State reorganized by key for key/value state and windows
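
As a rough illustration of "state reorganized by key", the sketch below regroups per-partition key/value state for a new parallelism by re-hashing every key. It is plain Scala under stated assumptions: RepartitionSketch and both functions are hypothetical, the nested Map stands in for state captured in a savepoint, and hashCode modulo parallelism is an assumed partitioning function, not necessarily the one Flink uses.

```scala
// Hypothetical sketch: regroup keyed state when the parallelism changes.
object RepartitionSketch {
  // Assumption: a key maps directly to one of `parallelism` partitions.
  def partitionFor(key: String, parallelism: Int): Int =
    java.lang.Math.floorMod(key.hashCode, parallelism)

  // `state` maps partition index -> that partition's key/value state.
  // Every entry is re-hashed and placed into its partition under the new parallelism.
  def repartition[V](state: Map[Int, Map[String, V]],
                     newParallelism: Int): Map[Int, Map[String, V]] =
    state.values.flatten
      .groupBy { case (key, _) => partitionFor(key, newParallelism) }
      .map { case (partition, entries) => partition -> entries.toMap }

  def main(args: Array[String]): Unit = {
    val keys = List("a", "b", "c", "d", "e")
    // State as it would be laid out with parallelism 2.
    val before = keys.groupBy(partitionFor(_, 2))
      .map { case (p, ks) => p -> ks.map(k => k -> k.length).toMap }
    // Restart with parallelism 3: every key is looked at again.
    println(repartition(before, 3))
  }
}
```

Note that this approach touches every individual key on rescaling, which is the cost that a coarser unit of redistribution avoids.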

Redistribution via Key Groups
• Flink 1.0: Hash keys into parallel partitions.
• Finest granularity is a partition.
• Flink 1.1: Hash keys into KeyGroups.
• Assign KeyGroups to parallel partitions
• Change of parallelism means change of assignment of KeyGroups to parallel partitions
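
The two-step mapping described above can be sketched in plain Scala. This is an illustration under assumptions, not Flink's implementation: KeyGroupSketch and its functions are hypothetical, the hash is plain hashCode, and key groups are assigned to partitions in contiguous ranges, which requires the parallelism to be no larger than the number of key groups.

```scala
// Hypothetical sketch of key -> KeyGroup -> parallel partition.
object KeyGroupSketch {
  // Step 1: hash a key into one of a fixed number of key groups.
  // This does not depend on the current parallelism.
  def keyGroupFor(key: String, numKeyGroups: Int): Int =
    java.lang.Math.floorMod(key.hashCode, numKeyGroups)

  // Step 2: assign key groups to partitions in contiguous ranges (assumed scheme).
  def partitionFor(keyGroup: Int, numKeyGroups: Int, parallelism: Int): Int =
    keyGroup * parallelism / numKeyGroups

  // Full assignment: partition index -> key groups it owns.
  def assignment(numKeyGroups: Int, parallelism: Int): Map[Int, Seq[Int]] =
    (0 until numKeyGroups).groupBy(partitionFor(_, numKeyGroups, parallelism))

  def main(args: Array[String]): Unit = {
    // Same 8 key groups, two different parallelism settings.
    println(assignment(8, 2))
    println(assignment(8, 4))
  }
}
```

Changing the parallelism only changes the result of the second step: whole key groups move between partitions, and individual keys never need to be re-hashed.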

Flink Forward 2016, Berlin
Submission deadline: June 30, 2016
Early bird deadline: July 15, 2016
www.flink-forward.org

We are hiring!
data-artisans.com/careers

