Robert Metzger - Connecting Apache Flink to the World - Reviewing the streaming connectors

Flink Forward
Robert Metzger, Aljoscha Krettek
@rmetzger_
Connecting Apache Flink® to the World: Reviewing the streaming connectors
What to expect from this talk
• Overview of all available connectors
• Kafka connector internals
• End-to-end exactly-once
• Apache Bahir and the future of connectors
• [Bonus] Message Queues and the Message Acknowledging Source
Connectors in Apache Flink®
“Hello World, let’s connect”
Connectors in Flink 1.1
Connector          Notes
Streaming files    Both source and sink are exactly-once
Apache Kafka       Consumers (sources) exactly-once
Amazon Kinesis     Consumers (sources) exactly-once
RabbitMQ / AMQP    Consumers (sources) exactly-once
Elasticsearch      No guarantees
Apache Cassandra   Exactly-once with idempotent updates
Apache NiFi        No guarantees
Redis              No guarantees
There is also a Twitter Source and an ActiveMQ connector in Apache Bahir
Streaming connectors by activity
Streaming connectors ordered by number of threads/mentions on the user@flink list:
• Apache Kafka (250+) (since 0.7)
• Apache Cassandra (38) (since 1.1)
• Elasticsearch (34) (since 0.10)
• File sources (~30) (since 0.10)
• Redis (27) (since 1.0)
• RabbitMQ (11) (since 0.7)
• Kinesis (10) (since 1.1)
• Apache NiFi (5) (since 0.10)
Date of evaluation: 5.9.2016
The Apache Kafka Connector
Apache Kafka connector: Intro
“Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.”
This page contains material copied from http://kafka.apache.org/documentation.html#introduction
Apache Kafka connector: Consumer
• Flink has two main Kafka consumer implementations
  • For Kafka 0.8: an implementation against Kafka's “SimpleConsumer” API
  • For Kafka 0.9+: we are using the new Kafka consumer (KAFKA-1326)
• The producers are basically the same
Kafka 0.8 Consumer
[Diagram: a Kafka cluster with four brokers holding the partitions (topicB:1, topicB:3), (topicA:3, topicB:6, topicB:5), (topicB:4, topicB:2, topicA:1) and (topicA:2, topicB:0, topicA:0); a Flink cluster with two TaskManagers, each running one Consumer Thread and two Fetcher Threads that read the partitions of one broker each]
Each TaskManager has one Consumer Thread, coordinating Fetcher Threads for each Kafka broker
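The thread layout can be sketched in a few lines of Python. This is only an illustration, not Flink's code: the broker names and the `plan_fetchers` helper are made up, and the split of partitions between TaskManagers is one reading of the diagram.

```python
from collections import defaultdict

# Partition leaders as drawn in the diagram (broker names are hypothetical).
LEADERS = {
    "topicB:1": "broker-1", "topicB:3": "broker-1",
    "topicA:3": "broker-2", "topicB:6": "broker-2", "topicB:5": "broker-2",
    "topicB:4": "broker-3", "topicB:2": "broker-3", "topicA:1": "broker-3",
    "topicA:2": "broker-4", "topicB:0": "broker-4", "topicA:0": "broker-4",
}


def plan_fetchers(assigned_partitions, leaders):
    """Group a consumer's partitions by leader broker: one fetcher per broker."""
    fetchers = defaultdict(list)
    for partition in assigned_partitions:
        fetchers[leaders[partition]].append(partition)
    return dict(fetchers)


# Partitions handled by one TaskManager's Consumer Thread (assumed split).
task_manager_1 = ["topicA:2", "topicB:0", "topicA:0", "topicB:1", "topicB:3"]
plan = plan_fetchers(task_manager_1, LEADERS)
for broker, partitions in sorted(plan.items()):
    print(broker, "->", partitions)
```

Each key of `plan` stands for one Fetcher Thread, so this Consumer Thread would coordinate two of them.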
Kafka 0.8 Broker rebalance
The consumer is able to handle broker failures. On a failure, the Consumer Thread re-assigns partitions and spawns new threads as needed:
1 Broker fails
2 Thread returns partitions
3 Kafka reassigns partitions
4 Flink assigns partitions to existing or new threads
[Diagram: a Kafka cluster with four brokers and a Flink cluster with two TaskManagers; the broker holding topicB:4, topicB:2 and topicA:1 fails, its Fetcher Thread returns these partitions, Kafka moves them to the remaining brokers, and Flink hands them to existing or new Fetcher Threads]
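The four rebalance steps can be mimicked with a small Python function. It is a sketch under assumptions: the `rebalance` helper, the broker names and the new leaders are invented for illustration, and no real Kafka or Flink API is involved.

```python
def rebalance(fetchers, failed_broker, new_leaders):
    """Return a new fetcher plan after `failed_broker` went away.

    fetchers:    dict broker -> list of partitions (one fetcher per broker)
    new_leaders: dict partition -> broker chosen by Kafka after the failure
    """
    # Step 1: the broker fails, so its fetcher is dropped from the plan.
    plan = {b: list(p) for b, p in fetchers.items() if b != failed_broker}
    # Step 2: the fetcher thread returns its partitions.
    returned = fetchers.get(failed_broker, [])
    for partition in returned:
        # Step 3: Kafka has picked a new leader for the partition.
        leader = new_leaders[partition]
        # Step 4: reuse the fetcher for that broker or start a new one.
        plan.setdefault(leader, []).append(partition)
    return plan


before = {
    "broker-2": ["topicA:3", "topicB:6", "topicB:5"],
    "broker-3": ["topicB:4", "topicB:2", "topicA:1"],
}
# Hypothetical result of Kafka's leader election after broker-3 fails.
new_leaders = {"topicB:4": "broker-2", "topicB:2": "broker-1", "topicA:1": "broker-2"}
after = rebalance(before, "broker-3", new_leaders)
print(after)
```

In this run, two partitions join the existing fetcher for `broker-2`, and a new fetcher is created for `broker-1`.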
Kafka 0.9+ Consumer
[Diagram: a Kafka cluster with four brokers; each of the two TaskManagers in the Flink cluster runs a single Consumer Thread, labelled “New Kafka Consumer Magic”]
Since Kafka 0.9, the new Consumer API handles broker failures/rebalancing, offset committing, topic querying, …
Exactly-once for Kafka consumers
• The mechanism is the same for all connector versions
• Offsets are committed to ZooKeeper / the broker so that a restart with the same group.id and external tools can use them (at-least-once)
• Offsets are checkpointed with Flink state for exactly-once
[Diagram state: two Flink Kafka Consumers feed one Flink Map Operator; partition 0 = a b c d e, partition 1 = a b c d e; consumer offsets = 0, 0; counter = 0; ZooKeeper/Broker offsets = 0, 0; checkpoint coordinator: nothing pending, nothing completed]
This toy example reads from a Kafka topic with two partitions, each containing “a”, “b”, “c”, … as messages. The offset is set to 0 for both partitions, and a counter is initialized to 0.
[Diagram state: consumer offsets = 1, 0; counter = 0; ZooKeeper/Broker offsets = 0, 0]
The Kafka consumer starts reading messages from partition 0. Message “a” is in-flight, and the offset for the first consumer has been set to 1.
[Diagram state: consumer offsets = 2, 1; counter = 1; ZooKeeper/Broker offsets = 0, 0; label: Trigger Checkpoint at source]
Message “a” arrives at the counter, which is set to 1. The consumers both read the next records (“b” and “a”). The offsets are set accordingly. In parallel, the checkpoint coordinator decides to trigger a checkpoint at the source …
[Diagram state: consumer offsets = 3, 1; counter = 2; ZooKeeper/Broker offsets = 0, 0; checkpoint coordinator holds offsets = 2, 1]
The source has created a snapshot of its state (“offset=2,1”), which is now stored in the checkpoint coordinator. The sources emitted a checkpoint barrier after messages “a” and “b”.
[Diagram state: consumer offsets = 3, 2; counter = 3; ZooKeeper/Broker offsets = 0, 0; checkpoint coordinator holds offsets = 2, 1 and counter = 3]
The map operator has received checkpoint barriers from both sources. It checkpoints its state (counter=3) in the coordinator. At the same time, the consumers keep reading more data from the Kafka partitions.
[Diagram state: consumer offsets = 3, 2; counter = 4; ZooKeeper/Broker offsets = 0, 0; checkpoint coordinator holds offsets = 2, 1 and counter = 3; label: Notify checkpoint complete]
The checkpoint coordinator informs the Kafka consumer that the checkpoint has been completed. The consumer commits the checkpoint's offsets into ZooKeeper. Note that Flink does not rely on the Kafka offsets in ZooKeeper for restoring from failures.
[Diagram state: consumer offsets = 3, 2; counter = 4; ZooKeeper/Broker offsets = 2, 1; checkpoint coordinator holds offsets = 2, 1 and counter = 3; label: Checkpoint in Zookeeper/Broker]
The checkpoint's offsets are now persisted in ZooKeeper. External tools such as the Kafka Offset Checker can see the lag of the consumer group.
[Diagram state: consumer offsets = 4, 2; counter = 5; ZooKeeper/Broker offsets = 2, 1; checkpoint coordinator holds offsets = 2, 1 and counter = 3]
The processing advances further.
[Diagram state: consumer offsets = 4, 2; counter = 5; ZooKeeper/Broker offsets = 2, 1; checkpoint coordinator holds offsets = 2, 1 and counter = 3; label: Failure]
Some failure has happened (such as a worker failure).
[Diagram state: consumer offsets = 2, 1; counter = 3; ZooKeeper/Broker offsets = 2, 1; checkpoint coordinator holds offsets = 2, 1 and counter = 3; label: Reset all operators to last completed checkpoint]
The checkpoint coordinator restores the state of all operators that participated in the checkpoint. The Kafka sources start from offsets 2 and 1, and the counter’s value is 3.
[Diagram state: consumer offsets = 3, 1; counter = 3; ZooKeeper/Broker offsets = 2, 1; checkpoint coordinator holds offsets = 2, 1 and counter = 3; label: Continue processing …]
The system continues processing. The counter’s value is consistent across the worker failure.
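The walkthrough can be replayed with a toy Python model. This is a deliberately simplified sketch, not Flink's implementation: checkpoint barriers and in-flight records are not modelled (the snapshot is taken atomically), and `ToyJob` and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

PARTITIONS = [list("abcde"), list("abcde")]  # two partitions, as in the example


@dataclass
class ToyJob:
    offsets: list = field(default_factory=lambda: [0, 0])    # consumer state
    counter: int = 0                                          # map operator state
    committed: list = field(default_factory=lambda: [0, 0])  # ZooKeeper/broker view
    checkpoint: Optional[dict] = None                         # last completed checkpoint

    def process(self, partition: int) -> str:
        """Read the next record of a partition and count it."""
        record = PARTITIONS[partition][self.offsets[partition]]
        self.offsets[partition] += 1
        self.counter += 1
        return record

    def complete_checkpoint(self) -> None:
        """Snapshot offsets and counter together, then publish the offsets."""
        self.checkpoint = {"offsets": list(self.offsets), "counter": self.counter}
        self.committed = list(self.offsets)  # only for external tools, not for recovery

    def restore(self) -> None:
        """Reset all state to the last completed checkpoint."""
        self.offsets = list(self.checkpoint["offsets"])
        self.counter = self.checkpoint["counter"]


job = ToyJob()
for p in (0, 1, 0):          # "a", "a", "b" are counted
    job.process(p)
job.complete_checkpoint()    # offsets = 2, 1 and counter = 3
job.process(0)
job.process(1)               # counter = 5, not yet checkpointed
job.restore()                # failure: back to offsets = 2, 1 and counter = 3
while job.offsets[0] < 5:
    job.process(0)
while job.offsets[1] < 5:
    job.process(1)
print(job.counter)           # prints 10: each of the 10 records is counted once
```

Because the offsets and the counter are restored from the same snapshot, the records read after the checkpoint are counted once after the replay, not twice.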
End-to-end exactly-once
Consistently move and process data
Sources, exactly-once:
• Apache Kafka
• Kinesis
• RabbitMQ / ActiveMQ
• File monitoring
Flink: Process, Transform, Analyze
Sinks, exactly-once:
• Rolling file sink
• With idempotent updates
  • Apache Cassandra
  • Elasticsearch
  • Redis
Sinks, at-least-once (duplicates):
• Apache Kafka
• Flink allows moving data between systems while keeping consistency
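Why idempotent updates turn replayed writes into an exactly-once result can be shown with a small Python comparison. The record layout, the `replay_into` helper and the in-memory "sinks" are assumptions made for illustration only.

```python
def replay_into(sink_write, records, fail_after, restart_from):
    """Write records, 'fail' after `fail_after` writes, then replay from `restart_from`."""
    for record in records[:fail_after]:
        sink_write(record)
    for record in records[restart_from:]:   # replay after recovery
        sink_write(record)


records = [("user-1", 10), ("user-2", 20), ("user-1", 30), ("user-3", 40)]

append_log = []                       # append-only sink: duplicates stay visible
replay_into(append_log.append, records, fail_after=3, restart_from=1)

table = {}                            # idempotent sink: upsert by key


def upsert(record):
    key, value = record
    table[key] = value


replay_into(upsert, records, fail_after=3, restart_from=1)

print(len(append_log))   # 6 writes for 4 records
print(table)             # same content as a failure-free run
```

The append-only sink ends up with duplicates (at-least-once), while the keyed upsert converges to the same final table as a run without failure.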
Continuous File Monitoring
[Diagram: a monitoring task periodically queries some file system and passes file path + offset to three parallel file readers, which emit records]
• The monitoring task checkpoints the last “modification time”
• The file readers checkpoint the current file + offset and the list of pending files to read
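The monitoring side can be sketched in Python as a function that only forwards files newer than the checkpointed modification time. The in-memory `listing` dictionaries and the `discover_new_files` helper are hypothetical stand-ins for a real file system query, and the reader state (file + offset) is not modelled here.

```python
def discover_new_files(listing, last_mod_time):
    """Return files modified after `last_mod_time` (oldest first) and the new high-water mark."""
    new_files = sorted(
        (mtime, path) for path, mtime in listing.items() if mtime > last_mod_time
    )
    if new_files:
        last_mod_time = new_files[-1][0]
    return [path for _, path in new_files], last_mod_time


# Hypothetical directory listing: path -> modification time.
state = {"last_mod_time": 0}          # what the monitoring task would checkpoint
listing = {"/data/a.log": 100, "/data/b.log": 105}
first, state["last_mod_time"] = discover_new_files(listing, state["last_mod_time"])
print(first)

listing["/data/c.log"] = 110          # a new file shows up before the next query
second, state["last_mod_time"] = discover_new_files(listing, state["last_mod_time"])
print(second)
```

After a restore, starting again from the checkpointed `last_mod_time` keeps already forwarded files from being forwarded a second time.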
Rolling / Bucketing File Sink
• System time bucketing
• Bucketing based on record data
[Diagram: one Bucketing Operator writes to hourly buckets (8:00, 9:00, 10:00, 11:00); another routes the records 9 5 1 1 8 4 2 4 6 2 3 4 into the buckets 0-4 and 5-9]
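Both bucketing styles boil down to a function from "something" to a bucket name. A minimal Python sketch, with hypothetical function names and an arbitrary example timestamp:

```python
from collections import defaultdict
from datetime import datetime


def time_bucket(now: datetime) -> str:
    """System-time bucketing: one bucket per hour."""
    return now.strftime("%H:00")


def value_bucket(record: int) -> str:
    """Data-based bucketing: split records into the ranges 0-4 and 5-9."""
    return "0-4" if record <= 4 else "5-9"


records = [9, 5, 1, 1, 8, 4, 2, 4, 6, 2, 3, 4]   # values from the slide
buckets = defaultdict(list)
for record in records:
    buckets[value_bucket(record)].append(record)

print(dict(buckets))
print(time_bucket(datetime(2016, 9, 12, 9, 30)))  # arbitrary timestamp
```

With system-time bucketing the bucket depends on when a record is written; with data-based bucketing it depends only on the record itself.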
Bucketing File Sink exactly-once
• On Hadoop 2.7+, we call truncate() to remove invalid data on restore
• On earlier versions, we write a metadata file with the valid offsets
• Downstream consumers must take the valid-offset metadata into account
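The two recovery strategies can be illustrated on a local file in Python. This is a sketch of the idea only: it uses the local file system instead of Hadoop, and the file name and the `read_valid` helper are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "part-0")

with open(path, "wb") as f:
    f.write(b"record-1\nrecord-2\n")
    f.flush()
    valid_length = f.tell()          # stored in the checkpoint
    f.write(b"record-3\n")           # written after the checkpoint, then the job fails


def read_valid(file_path, length):
    """Without truncate support: a reader honours the recorded valid length."""
    with open(file_path, "rb") as handle:
        return handle.read(length)


via_metadata = read_valid(path, valid_length)

# With truncate support: drop everything past the checkpointed length on restore.
os.truncate(path, valid_length)
with open(path, "rb") as f:
    restored = f.read()

print(via_metadata == restored)
```

Either way, the bytes written after the last completed checkpoint are not visible to readers, so replayed records do not show up twice.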
Kafka Producer: Avoid data loss
• Apache Kafka currently does not provide the infrastructure to produce in an exactly-once fashion
• By avoiding data loss, we can guarantee at-least-once
[Diagram: the Flink Kafka Producer sends records to a Kafka partition on a Kafka broker and receives ACKs; unacknowledged=7]
• On checkpoint, Flink calls flush() and waits for unack == 0
• This guarantees that the data has been written
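The flush-on-checkpoint rule can be modelled with a small Python class. `ToyProducer` and its method names are hypothetical; ACKs are delivered synchronously inside `flush()` here, whereas a real producer receives them asynchronously from the broker.

```python
class ToyProducer:
    """Tracks records that were sent but not yet acknowledged by the broker."""

    def __init__(self):
        self.in_flight = []      # records waiting for an ACK
        self.acked = []          # records the broker has confirmed

    @property
    def unacknowledged(self):
        return len(self.in_flight)

    def send(self, record):
        self.in_flight.append(record)               # asynchronous send: no ACK yet

    def on_ack(self):
        self.acked.append(self.in_flight.pop(0))    # callback for one ACK

    def flush(self):
        while self.in_flight:                       # wait until unacknowledged == 0
            self.on_ack()

    def snapshot_state(self):
        """On checkpoint: the checkpoint may only succeed once nothing is pending."""
        self.flush()
        assert self.unacknowledged == 0


producer = ToyProducer()
for i in range(7):
    producer.send(f"record-{i}")
before_checkpoint = producer.unacknowledged   # 7, as in the diagram
producer.snapshot_state()
after_checkpoint = producer.unacknowledged    # 0
print(before_checkpoint, after_checkpoint)
```

Since a checkpoint never completes with pending records, a restore from that checkpoint cannot lose data; records sent after it may be written again, which is why the guarantee is at-least-once.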
Apache Bahir and the future of connectors
What’s next
Future of Connectors in Flink
• Kafka 0.10 support, with timestamps
• Dynamic scaling support for Kafka and other connectors
• Refactor Kafka connector API
Apache Bahir™
• Bahir is a community specialized in connectors, allowing faster releases independent of engine releases.
• Apache Bahir™ has been created to give community-contributed connectors a platform, following Apache governance.
• The Flink community decided to move some of our connectors there. Kafka, Kinesis, streaming files, … will stay in Flink!
• Flink connectors in Bahir: ActiveMQ, Redis, Flume sink, RethinkDB (incoming), streaming HBase (incoming).
• New connector contributions are welcome!
Disclaimer: The description of the Bahir community is my personal view. I am not a representative of the project.
Time for questions…
Connectors in Apache Flink
• Ask me now!
• Follow me on Twitter: @rmetzger_
• Ask the Flink community on user@flink.apache.org
• Ask me privately on rmetzger@apache.org
Exactly-once for Message Queues
Message Queues supported by Flink
• Traditional message queues have different semantics than Kafka, Kinesis, etc.
• RabbitMQ
  • Advanced Message Queuing Protocol (AMQP)
  • Available in Apache Flink
• ActiveMQ
  • Java Message Service (JMS)
  • Available in Apache Bahir (no release yet)
Image source: http://www.instructables.com/id/Spark-Core-Photon-and-CloudMQTT/step1/What-is-Message-Queuing/
Message Queue Semantics
[Diagram: the Flink RabbitMQ Source takes messages off a queue, while the Flink Kafka Consumer reads at an offset]
• In MQs, messages are removed once they are consumed, so replay is not possible
Message Acknowledging
• Once a checkpoint has been completed by all operators, the messages in the queue are acknowledged, leading to their removal from the queue.
[Diagram, before: the queue holds id=1 … id=8; the Flink RabbitMQ Source has recorded id=1, id=2, id=3 for checkpoint 1 and id=4, id=5, id=6 for checkpoint 2]
[Diagram, after “Checkpoint 1 completed”: the source sends ACKs for id=1, id=2, id=3 to the message queue, which now holds id=4 … id=8]
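The acknowledge-on-checkpoint behaviour can be sketched as a toy Python source. The classes `ToyQueue` and `ToyAckSource` and their methods are invented for this illustration and do not mirror the RabbitMQ client or Flink's source classes.

```python
class ToyQueue:
    """A broker-side queue: messages stay until they are acknowledged."""

    def __init__(self, ids):
        self.unacked = list(ids)

    def ack(self, message_id):
        self.unacked.remove(message_id)


class ToyAckSource:
    def __init__(self, queue):
        self.queue = queue
        self.since_last_checkpoint = []   # ids emitted since the last snapshot
        self.pending = {}                 # checkpoint id -> ids to acknowledge

    def emit(self, message_id):
        self.since_last_checkpoint.append(message_id)

    def snapshot(self, checkpoint_id):
        self.pending[checkpoint_id] = self.since_last_checkpoint
        self.since_last_checkpoint = []

    def notify_checkpoint_complete(self, checkpoint_id):
        # Only now is it safe to let the queue delete the messages.
        for message_id in self.pending.pop(checkpoint_id):
            self.queue.ack(message_id)


queue = ToyQueue(range(1, 9))             # id=1 .. id=8
source = ToyAckSource(queue)
for message_id in (1, 2, 3):
    source.emit(message_id)
source.snapshot(1)
for message_id in (4, 5, 6):
    source.emit(message_id)
source.snapshot(2)
source.notify_checkpoint_complete(1)
print(queue.unacked)                      # [4, 5, 6, 7, 8]
```

Messages of checkpoint 2 stay in the queue until that checkpoint completes, so a failure before then leads to their redelivery instead of their loss.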
Message Acknowledging
• In case of a failure, all the unacknowledged messages are consumed again
[Diagram, before: the queue holds id=1 … id=8; the Flink RabbitMQ Source has recorded id=1, id=2, id=3 for checkpoint 1 and id=4, id=5, id=6 for checkpoint 2; then a system failure occurs]
[Diagram, after: the source is left with checkpoint 1 (id=1, id=2, id=3); the message queue still holds id=1 … id=8]
Messages are not lost and are sent again after recovery
Message Acknowledging
• What happens if the system fails after a checkpoint is completed, but before all messages have been acknowledged?
[Diagram: after “Checkpoint 1 completed” the source sends ACKs for id=1, id=2, id=3 while the queue shows id=4 … id=8; the system fails (FAIL) while the ACKs are being sent; id=3 is shown separately]
• Flink stores a correlation ID of each (un-acked) message to de-duplicate on restore
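De-duplication on restore can be sketched as a filter over the redelivered messages. The `restore_and_consume` helper is hypothetical, and the scenario assumes the ACK for id=3 was lost in the failure, so the queue delivers id=3 again.

```python
def restore_and_consume(redelivered, ids_in_completed_checkpoint):
    """Split redelivered messages into ones to emit and ones to drop as duplicates."""
    already_processed = set(ids_in_completed_checkpoint)
    emitted, dropped = [], []
    for message_id in redelivered:
        if message_id in already_processed:
            dropped.append(message_id)    # its effect is already in the restored state
        else:
            emitted.append(message_id)
    return emitted, dropped


checkpoint_1 = [1, 2, 3]                  # correlation IDs stored with the checkpoint
redelivered = [3, 4, 5, 6, 7, 8]          # assumed: the ACK for id=3 never arrived
emitted, dropped = restore_and_consume(redelivered, checkpoint_1)
print(emitted)   # [4, 5, 6, 7, 8]
print(dropped)   # [3]
```

A dropped message would still need to be acknowledged so that the queue finally removes it.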
TDEA 2018 Kafka EOS (Exactly-once)
Erhwen Kuo520 views
Cruise Control: Effortless management of Kafka clusters by Prateek Maheshwari
Cruise Control: Effortless management of Kafka clustersCruise Control: Effortless management of Kafka clusters
Cruise Control: Effortless management of Kafka clusters
Prateek Maheshwari279 views
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ... by confluent
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
confluent2.3K views
[네이버오픈소스세미나] Maglev Hashing Scheduler in IPVS, Linux Kernel - 송인주 by NAVER Engineering
[네이버오픈소스세미나] Maglev Hashing Scheduler in IPVS, Linux Kernel - 송인주[네이버오픈소스세미나] Maglev Hashing Scheduler in IPVS, Linux Kernel - 송인주
[네이버오픈소스세미나] Maglev Hashing Scheduler in IPVS, Linux Kernel - 송인주
NAVER Engineering18.8K views
From a Kafkaesque Story to The Promised Land at LivePerson by LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
LivePerson3.2K views
A Deep Dive into Kafka Controller by confluent
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
confluent10.2K views
Reliability Guarantees for Apache Kafka by confluent
Reliability Guarantees for Apache KafkaReliability Guarantees for Apache Kafka
Reliability Guarantees for Apache Kafka
confluent4.6K views

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin... by
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
856 views56 slides
Evening out the uneven: dealing with skew in Flink by
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
2.5K views35 slides
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i... by
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
185 views13 slides
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ... by
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
582 views34 slides
Introducing the Apache Flink Kubernetes Operator by
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
784 views37 slides
Autoscaling Flink with Reactive Mode by
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
926 views17 slides

More from Flink Forward(20)

Building a fully managed stream processing platform on Flink at scale for Lin... by Flink Forward
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward856 views
Evening out the uneven: dealing with skew in Flink by Flink Forward
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward2.5K views
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i... by Flink Forward
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward185 views
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ... by Flink Forward
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward582 views
Introducing the Apache Flink Kubernetes Operator by Flink Forward
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward784 views
Autoscaling Flink with Reactive Mode by Flink Forward
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward926 views
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli... by Flink Forward
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward266 views
One sink to rule them all: Introducing the new Async Sink by Flink Forward
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
Flink Forward316 views
Tuning Apache Kafka Connectors for Flink.pptx by Flink Forward
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward431 views
Flink powered stream processing platform at Pinterest by Flink Forward
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward224 views
Apache Flink in the Cloud-Native Era by Flink Forward
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward174 views
Where is my bottleneck? Performance troubleshooting in Flink by Flink Forward
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward541 views
Using the New Apache Flink Kubernetes Operator in a Production Deployment by Flink Forward
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward656 views
The Current State of Table API in 2022 by Flink Forward
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
Flink Forward173 views
Dynamic Rule-based Real-time Market Data Alerts by Flink Forward
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward759 views
Exactly-Once Financial Data Processing at Scale with Flink and Pinot by Flink Forward
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward697 views
Processing Semantically-Ordered Streams in Financial Services by Flink Forward
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
Flink Forward169 views
Tame the small files problem and optimize data layout for streaming ingestion... by Flink Forward
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward810 views
Batch Processing at Scale with Flink & Iceberg by Flink Forward
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
Flink Forward592 views
Welcome to the Flink Community! by Flink Forward
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
Flink Forward127 views

Recently uploaded

Construction Accidents & Injuries by
Construction Accidents & InjuriesConstruction Accidents & Injuries
Construction Accidents & InjuriesBisnar Chase Personal Injury Attorneys
6 views5 slides
Infomatica-MDM.pptx by
Infomatica-MDM.pptxInfomatica-MDM.pptx
Infomatica-MDM.pptxKapil Rangwani
11 views16 slides
shivam tiwari.pptx by
shivam tiwari.pptxshivam tiwari.pptx
shivam tiwari.pptxAanyaMishra4
7 views14 slides
Employees attrition by
Employees attritionEmployees attrition
Employees attritionMaryAlejandraDiaz
7 views5 slides
Listed Instruments Survey 2022.pptx by
Listed Instruments Survey  2022.pptxListed Instruments Survey  2022.pptx
Listed Instruments Survey 2022.pptxsecretariat4
121 views12 slides
Shreyas hospital statistics.pdf by
Shreyas hospital statistics.pdfShreyas hospital statistics.pdf
Shreyas hospital statistics.pdfsamithavinal
5 views9 slides

Recently uploaded(20)

Listed Instruments Survey 2022.pptx by secretariat4
Listed Instruments Survey  2022.pptxListed Instruments Survey  2022.pptx
Listed Instruments Survey 2022.pptx
secretariat4121 views
Shreyas hospital statistics.pdf by samithavinal
Shreyas hospital statistics.pdfShreyas hospital statistics.pdf
Shreyas hospital statistics.pdf
samithavinal5 views
Product Research sample.pdf by AllenSingson
Product Research sample.pdfProduct Research sample.pdf
Product Research sample.pdf
AllenSingson33 views
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init... by DataScienceConferenc1
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...
CRM stick or twist.pptx by info828217
CRM stick or twist.pptxCRM stick or twist.pptx
CRM stick or twist.pptx
info82821711 views
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an... by StatsCommunications
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ... by DataScienceConferenc1
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language... by patiladiti752
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language...Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language...
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language...
patiladiti7528 views
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P... by DataScienceConferenc1
[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...
LIVE OAK MEMORIAL PARK.pptx by ms2332always
LIVE OAK MEMORIAL PARK.pptxLIVE OAK MEMORIAL PARK.pptx
LIVE OAK MEMORIAL PARK.pptx
ms2332always7 views

Robert Metzger - Connecting Apache Flink to the World - Reviewing the streaming connectors

  • 1. Robert Metzger, Aljoscha Krettek @rmetzger_ Connecting Apache Flink® to the World: Reviewing the streaming connectors
  • 2. What to expect from this talk
      • Overview of all available connectors
      • Kafka connector internals
      • End-to-end exactly-once
      • Apache Bahir and the future of connectors
      • [Bonus] Message Queues and the Message Acknowledging Source
  • 3. Connectors in Apache Flink®: “Hello World, let’s connect”
  • 4. Connectors in Flink 1.1 (connector: notes)
      • Streaming files: both source and sink are exactly-once
      • Apache Kafka: consumers (sources) exactly-once
      • Amazon Kinesis: consumers (sources) exactly-once
      • RabbitMQ / AMQP: consumers (sources) exactly-once
      • Elasticsearch: no guarantees
      • Apache Cassandra: exactly-once with idempotent updates
      • Apache Nifi: no guarantees
      • Redis: no guarantees
    There is also a Twitter Source and an ActiveMQ connector in Apache Bahir.
  • 5. Streaming connectors by activity. Streaming connectors ordered by number of threads/mentions on the user@flink list (date of evaluation: 5.9.2016):
      • Apache Kafka (250+) (since 0.7)
      • Apache Cassandra (38) (since 1.1)
      • ElasticSearch (34) (since 0.10)
      • File sources (~30) (since 0.10)
      • Redis (27) (since 1.0)
      • RabbitMQ (11) (since 0.7)
      • Kinesis (10) (since 1.1)
      • Apache Nifi (5) (since 0.10)
  • 6. The Apache Kafka Connector
  • 7. Apache Kafka connector: Intro. “Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.” This page contains material copied from http://kafka.apache.org/documentation.html#introduction
  • 8. Apache Kafka connector: Consumer
      • Flink has two main Kafka consumer implementations
          • For Kafka 0.8, an implementation against the “SimpleConsumer” API of Kafka
          • For Kafka 0.9+, we are using the new Kafka consumer (KAFKA-1326)
      • The producers are basically the same
  • 10. Kafka 0.8 Broker rebalance. [Diagram: a Kafka cluster of four brokers holding the partitions of topicA and topicB, and a Flink cluster with Consumer Threads and Fetcher Threads reading those partitions.] The consumer is able to handle broker failures: (1) Broker fails; (2) Thread returns partitions.
  • 11. Kafka 0.8 Broker rebalance. [Same diagram; the partitions topicB:4, topicB:2 and topicA:1 have been returned.] On a failure, the Consumer Thread re-assigns partitions and spawns new threads as needed: (1) Broker fails; (2) Thread returns partitions.
  • 12. Kafka 0.8 Broker rebalance. [Same diagram; topicB:4, topicB:2 and topicA:1 are picked up again by Fetcher Threads.] On a failure, the Consumer Thread re-assigns partitions and spawns new threads as needed: (3) Kafka reassigns partitions; (4) Flink assigns partitions to existing or new threads.
  • 13. Kafka 0.9+ Consumer. [Diagram: the same four-broker Kafka cluster; in the Flink cluster, each TaskManager runs a Consumer Thread on top of the “New Kafka Consumer Magic”.] Since Kafka 0.9, the new Consumer API handles broker failures/rebalancing, offset committing, topic querying, …
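The four rebalance steps on slides 10 to 12 can be sketched as a toy model. This is not the connector's code: the function names, the broker names (`broker1` to `broker4`) and the choice of new leaders are assumptions made for illustration, and the model keeps one fetcher per leader broker.

```python
from collections import defaultdict


def assign_fetchers(leaders):
    """Group partitions by their leader broker: one fetcher per broker."""
    fetchers = defaultdict(set)
    for partition, broker in leaders.items():
        fetchers[broker].add(partition)
    return dict(fetchers)


def handle_broker_failure(fetchers, failed_broker, new_leaders):
    """Return a new assignment after `failed_broker` has gone away."""
    # Work on a copy so the caller's assignment is left untouched.
    fetchers = {broker: set(parts) for broker, parts in fetchers.items()}
    # Steps 1 and 2: the broker fails and its fetcher returns its partitions.
    returned = fetchers.pop(failed_broker, set())
    for partition in returned:
        # Step 3: Kafka has chosen a new leader (passed in as new_leaders).
        broker = new_leaders[partition]
        # Step 4: reuse the fetcher for that broker or create a new one.
        fetchers.setdefault(broker, set()).add(partition)
    return fetchers


# Hypothetical layout matching the partition names in the diagram.
leaders = {
    "topicB:1": "broker1", "topicB:3": "broker1",
    "topicA:3": "broker2", "topicB:6": "broker2", "topicB:5": "broker2",
    "topicB:4": "broker3", "topicB:2": "broker3", "topicA:1": "broker3",
    "topicA:2": "broker4", "topicB:0": "broker4", "topicA:0": "broker4",
}
fetchers = assign_fetchers(leaders)
after = handle_broker_failure(
    fetchers,
    "broker3",
    {"topicB:4": "broker1", "topicB:2": "broker2", "topicA:1": "broker4"},
)
```

No partition is dropped: every partition of the failed broker ends up with some fetcher again, which is the property the slides illustrate.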
  • 14. Exactly-once for Kafka consumers
      • The mechanism is the same for all connector versions
      • Offsets are committed to Zookeeper / the broker for restarts with the same group.id and for external tools (at-least-once)
      • Offsets are checkpointed with Flink state for exactly-once
  • 15. [State: consumer offsets = 0, 0; counter = 0; Zookeeper/Broker offsets = 0, 0.] This toy example reads from a Kafka topic with two partitions, each containing “a”, “b”, “c”, … as messages. The offset is set to 0 for both partitions, and a counter is initialized to 0.
  • 16. [State: consumer offsets = 1, 0; counter = 0; Zookeeper/Broker offsets = 0, 0.] The Kafka consumer starts reading messages from partition 0. Message “a” is in flight, and the offset for the first consumer has been set to 1.
  • 17. [State: consumer offsets = 2, 1; counter = 1; Zookeeper/Broker offsets = 0, 0; “Trigger Checkpoint at source”.] Message “a” arrives at the counter, which is set to 1. Both consumers read their next records (“b” and “a”), and the offsets are set accordingly. In parallel, the checkpoint coordinator decides to trigger a checkpoint at the source …
  • 18. [State: consumer offsets = 3, 1; counter = 2; Zookeeper/Broker offsets = 0, 0; checkpoint coordinator holds offsets = 2, 1.] The source has created a snapshot of its state (“offset=2,1”), which is now stored in the checkpoint coordinator. The sources emitted a checkpoint barrier after messages “a” and “b”.
  • 19. [State: consumer offsets = 3, 2; counter = 3; Zookeeper/Broker offsets = 0, 0; checkpoint coordinator holds offsets = 2, 1 and counter = 3.] The map operator has received checkpoint barriers from both sources. It checkpoints its state (counter=3) in the coordinator. At the same time, the consumers keep reading more data from the Kafka partitions.
  • 20. [State: consumer offsets = 3, 2; counter = 4; Zookeeper/Broker offsets = 0, 0; checkpoint coordinator holds offsets = 2, 1 and counter = 3; “Notify checkpoint complete”.] The checkpoint coordinator informs the Kafka consumer that the checkpoint has been completed. The consumer commits the checkpoint’s offsets to Zookeeper. Note that Flink does not rely on the Kafka offsets in ZK for restoring from failures.
  • 21. [State: consumer offsets = 3, 2; counter = 4; Zookeeper/Broker offsets = 2, 1; checkpoint coordinator holds offsets = 2, 1 and counter = 3.] The checkpoint is now persisted in Zookeeper. External tools such as the Kafka Offset Checker can see the lag of the consumer group.
  • 22. [State: consumer offsets = 4, 2; counter = 5; Zookeeper/Broker offsets = 2, 1; checkpoint coordinator holds offsets = 2, 1 and counter = 3.] The processing advances further.
  • 23. [State: unchanged from the previous slide; “Failure”.] Some failure has happened (such as a worker failure).
  • 24. [State: consumer offsets = 2, 1; counter = 3; Zookeeper/Broker offsets = 2, 1; “Reset all operators to last completed checkpoint”.] The checkpoint coordinator restores the state of all operators participating in the checkpoint. The Kafka sources start from offsets 2 and 1, and the counter’s value is 3.
  • 25. [State: consumer offsets = 3, 1; counter = 3; Zookeeper/Broker offsets = 2, 1; “Continue processing …”.] The system continues processing, and the counter’s value is consistent across the worker failure.
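The walk-through on slides 15 to 25 can be replayed with a small toy model. It is a sketch under simplifying assumptions, not Flink code: `ToyJob` and its methods are made-up names, records are counted the moment they are read (so there are no in-flight messages or barriers), and taking a checkpoint and committing its offsets happen in one step.

```python
class ToyJob:
    """Two-partition source feeding a counting map operator."""

    def __init__(self, partitions):
        self.partitions = partitions
        self.offsets = [0] * len(partitions)      # source state
        self.counter = 0                          # map operator state
        # Last completed checkpoint: (offsets, counter).
        self.completed = ([0] * len(partitions), 0)
        # Offsets visible to external tools; never used for recovery.
        self.committed = [0] * len(partitions)

    def read(self, partition):
        """Read the next record of one partition and count it."""
        record = self.partitions[partition][self.offsets[partition]]
        self.offsets[partition] += 1
        self.counter += 1
        return record

    def checkpoint(self):
        """Snapshot offsets and counter together, then commit the offsets."""
        self.completed = (list(self.offsets), self.counter)
        self.committed = list(self.offsets)

    def restore(self):
        """Reset all state to the last completed checkpoint."""
        offsets, counter = self.completed
        self.offsets = list(offsets)
        self.counter = counter


job = ToyJob([list("abcde"), list("abcde")])
job.read(0)
job.read(1)
job.read(0)        # offsets = 2, 1 and counter = 3
job.checkpoint()   # checkpoint holds offsets = 2, 1 and counter = 3
job.read(0)
job.read(1)        # offsets = 3, 2 and counter = 5
job.restore()      # failure: back to offsets = 2, 1 and counter = 3
```

Because offsets and counter are restored together, the records read after the checkpoint are read and counted again exactly once. The `committed` offsets play no role in `restore()`, mirroring the note that Flink does not rely on the offsets in ZK for recovery.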
  • 27. Consistently move and process data (Process, Transform, Analyze)
      • Sources, exactly-once: Apache Kafka, Kinesis, RabbitMQ / ActiveMQ, file monitoring
      • Sinks, exactly-once: rolling file sink; with idempotent updates: Apache Cassandra, Elasticsearch, Redis
      • Sinks, at-least-once (duplicates): Apache Kafka
      • Flink lets you move data between systems while keeping consistency
  • 28. Continuous File Monitoring. [Diagram: a monitoring task periodically queries a file system and passes file path and offset to parallel file readers, which emit records.]
      • The monitoring task checkpoints the last “modification time”
      • The file readers checkpoint the current file + offset and the list of pending files to read
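The two pieces of checkpointed state named on slide 28 can be sketched as follows. This is a toy model, not Flink's file source: `discover_new_files`, `FileReader`, the file paths and the integer timestamps are all assumptions for illustration.

```python
def discover_new_files(listing, last_mod_time):
    """Return files modified after the checkpointed time, oldest first,
    together with the new modification time to checkpoint."""
    fresh = sorted(
        (mtime, path) for path, mtime in listing.items() if mtime > last_mod_time
    )
    new_time = fresh[-1][0] if fresh else last_mod_time
    return [path for _, path in fresh], new_time


class FileReader:
    """Reader whose state is (current file, offset, pending files)."""

    def __init__(self):
        self.current, self.offset, self.pending = None, 0, []

    def snapshot(self):
        """State to store in a checkpoint."""
        return (self.current, self.offset, list(self.pending))

    def restore(self, state):
        """Resume from a checkpointed state."""
        self.current, self.offset, pending = state
        self.pending = list(pending)


listing = {"/data/a.log": 100, "/data/b.log": 200}
first, mod_time = discover_new_files(listing, 0)
listing["/data/c.log"] = 300
second, mod_time = discover_new_files(listing, mod_time)

reader = FileReader()
reader.current, reader.offset, reader.pending = "/data/a.log", 42, ["/data/b.log"]
recovered = FileReader()
recovered.restore(reader.snapshot())
```

One limitation of this sketch: with a strict `>` comparison, a file that shares its modification time with the last file already seen would be skipped.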
  • 29. Rolling / Bucketing File Sink
      • System time bucketing (diagram: a Bucketing Operator writing into hourly buckets 8:00, 9:00, 10:00, 11:00)
      • Bucketing based on record data (diagram: a Bucketing Operator writing numeric records into buckets 0-4 and 5-9)
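Both bucketing strategies from slide 29 come down to a function from a record (or the clock) to a bucket name. A minimal sketch, with hypothetical function names and a bucket width of 5 chosen to match the 0-4 and 5-9 buckets in the diagram:

```python
from datetime import datetime


def time_bucket(now):
    """System-time bucketing: one bucket per hour of processing time."""
    return now.strftime("%H:00")


def range_bucket(value, width=5):
    """Data-based bucketing: group integer values into fixed-width ranges."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


hourly = time_bucket(datetime(2016, 9, 5, 9, 30))
low_range = range_bucket(3)
high_range = range_bucket(8)
```

In the first case the bucket depends only on when the record is written; in the second it depends only on the record's content.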
  • 30. Bucketing File Sink exactly-once
      • On Hadoop 2.7+, we call truncate() to remove invalid data on restore
      • On earlier versions, we’ll write a metadata file with valid offsets
      • Downstream consumers must take valid offset metadata into account
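The two recovery paths on slide 30 can be illustrated on a local file. This sketch uses Python's `os.truncate` as a stand-in for the Hadoop truncate() call; the helper names, the file name `part-0` and the record contents are assumptions.

```python
import os
import tempfile


def restore_by_truncate(path, valid_length):
    """Cut the file back to the length recorded in the checkpoint."""
    os.truncate(path, valid_length)


def read_valid(path, valid_length):
    """Fallback without truncate: readers honour the recorded valid length."""
    with open(path, "rb") as f:
        return f.read(valid_length)


with tempfile.TemporaryDirectory() as tmp:
    part = os.path.join(tmp, "part-0")
    with open(part, "wb") as f:
        f.write(b"rec1\nrec2\n")
        valid_length = f.tell()        # checkpoint taken here
        f.write(b"rec3-uncommitted")   # written after the checkpoint
    # A failure happens now; bytes past valid_length are invalid.
    visible = read_valid(part, valid_length)
    restore_by_truncate(part, valid_length)
    size_after = os.path.getsize(part)
```

Either way, a downstream consumer sees only the bytes that were covered by the last completed checkpoint.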
  • 31. Kafka Producer: Avoid data loss
      • Apache Kafka does not currently provide the infrastructure to produce in an exactly-once fashion
      • By avoiding data loss, we can guarantee at-least-once
      • On checkpoint, Flink calls flush() and waits for unack == 0, which guarantees that the data has been written
    [Diagram: the Flink Kafka Producer sends records to a Kafka partition on a Kafka broker, tracks unacknowledged=7, and receives ACKs back.]
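The flush-on-checkpoint rule from slide 31 can be sketched with a counter of unacknowledged records. `ToyProducer` is a made-up class, not the Flink or Kafka producer API, and its `flush()` simulates the broker's acknowledgements instead of blocking on a network.

```python
class ToyProducer:
    """Tracks records sent but not yet acknowledged by the broker."""

    def __init__(self):
        self.unacknowledged = 0
        self.in_flight = []

    def send(self, record):
        """Hand a record to the (simulated) broker without waiting."""
        self.in_flight.append(record)
        self.unacknowledged += 1

    def on_ack(self):
        """Called once per broker acknowledgement."""
        self.in_flight.pop(0)
        self.unacknowledged -= 1

    def flush(self):
        """Stand-in for a blocking flush: deliver all outstanding ACKs."""
        while self.unacknowledged > 0:
            self.on_ack()

    def snapshot_state(self):
        """On checkpoint: flush, and let the checkpoint proceed only
        once nothing is left unacknowledged."""
        self.flush()
        return self.unacknowledged == 0


producer = ToyProducer()
for i in range(7):
    producer.send(f"record-{i}")
pending_before = producer.unacknowledged
checkpoint_ok = producer.snapshot_state()
```

Records sent after the checkpoint may still be written twice if the job fails and replays them, which is why this gives at-least-once rather than exactly-once.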
  • 32. Apache Bahir and the future of connectors: What’s next
  • 33. Future of Connectors in Flink
      • Kafka 0.10 support, with timestamps
      • Dynamic scaling support for Kafka and other connectors
      • Refactor Kafka connector API
  • 34. Apache Bahir™
      • Bahir is a community specialized in connectors, allowing faster releases independent of engine releases.
      • Apache Bahir™ has been created for providing community-contributed connectors a platform, following Apache governance.
      • The Flink community decided to move some of our connectors there. Kafka, Kinesis, streaming files, … will stay in Flink!
      • Flink connectors in Bahir: ActiveMQ, Redis, Flume sink, RethinkDB (incoming), streaming Hbase (incoming).
      • New connector contributions are welcome!
    Disclaimer: The description of the Bahir community is my personal view. I am not a representative of the project.
  • 36. Connectors in Apache Flink
      • Ask me now!
      • Follow me on Twitter: @rmetzger_
      • Ask the Flink community on user@flink.apache.org
      • Ask me privately on rmetzger@apache.org
  • 38. Message Queues supported by Flink
      • Traditional message queues have different semantics than Kafka, Kinesis, etc.
      • RabbitMQ
          • Advanced Message Queuing Protocol (AMQP)
          • Available in Apache Flink
      • ActiveMQ
          • Java Message Service (JMS)
          • Available in Apache Bahir (no release yet)
    Image source: http://www.instructables.com/id/Spark-Core-Photon-and-CloudMQTT/step1/What-is-Message-Queuing/
  • 39. Message Queue Semantics. [Diagram: a Flink RabbitMQ Source reading from a queue, next to a Flink Kafka Consumer reading at an offset.]
      • In MQs, messages are removed once they are consumed
      • Replay not possible
  • 40. Message Acknowledging. Once a checkpoint has been completed by all operators, the messages in the queue are acknowledged, leading to their removal from the queue. [Diagram: the queue holds id=1 to id=8; the Flink RabbitMQ Source tracks checkpoint 1 (id=1, id=2, id=3) and checkpoint 2 (id=4, id=5, id=6). Once checkpoint 1 has completed, the source sends ACK id=1, ACK id=2, ACK id=3, and the queue holds id=4 to id=8.]
  • 41. Message Acknowledging. In case of a failure, all the unacknowledged messages are consumed again. [Diagram: after a system failure, the source is restored with checkpoint 1 (id=1, id=2, id=3) and the queue still holds id=1 to id=8.] Messages are not lost and are sent again after recovery.
  • 42. Message Acknowledging. What happens if the system fails after a checkpoint is completed, but before all messages have been acknowledged? [Diagram: checkpoint 1 (id=1, id=2, id=3) has completed; the failure hits during ACK id=1, ACK id=2, ACK id=3, and id=3 arrives again.] Flink stores a correlation ID of each (un-acked) message to de-duplicate on restore.
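The acknowledge-on-checkpoint scheme from slides 40 to 42, including de-duplication by correlation ID, can be sketched as a toy source. `ToyAckSource` and its methods are made-up names rather than Flink's Message Acknowledging Source API, and the choice to acknowledge a redelivered duplicate with the next checkpoint is an assumption of this sketch.

```python
class ToyAckSource:
    """Source that acknowledges messages only after a completed checkpoint."""

    def __init__(self):
        self.pending = {}    # checkpoint id -> ids to acknowledge on completion
        self.current = []    # ids received since the last snapshot
        self.seen = set()    # ids emitted but not yet acknowledged

    def receive(self, correlation_id):
        """Return True if the message should be emitted downstream."""
        duplicate = correlation_id in self.seen
        self.seen.add(correlation_id)
        # Duplicates are not emitted, but still acknowledged later.
        self.current.append(correlation_id)
        return not duplicate

    def snapshot(self, checkpoint_id):
        """Bind the ids read so far to this checkpoint; return state to store."""
        self.pending[checkpoint_id] = self.current
        self.current = []
        return set(self.seen)

    def notify_checkpoint_complete(self, checkpoint_id, ack):
        """Acknowledge every message covered by the completed checkpoint."""
        for correlation_id in self.pending.pop(checkpoint_id):
            ack(correlation_id)
            self.seen.discard(correlation_id)

    def restore(self, seen):
        """Resume from the ids stored in the last completed checkpoint."""
        self.seen, self.current, self.pending = set(seen), [], {}


acked = []
source = ToyAckSource()
emitted = [i for i in (1, 2, 3) if source.receive(i)]
state = source.snapshot(checkpoint_id=1)
# Checkpoint 1 completes, but the system fails after ACK id=1 and ACK id=2.
acked.extend([1, 2])

restored = ToyAckSource()
restored.restore(state)
# The queue still holds id=3 and delivers it again, followed by new messages.
emitted_after = [i for i in (3, 4, 5) if restored.receive(i)]
restored.snapshot(checkpoint_id=2)
restored.notify_checkpoint_complete(2, acked.append)
```

After recovery id=3 is filtered out because its correlation ID is part of the restored state, so downstream operators see each message once. This only works if correlation IDs are unique.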

Editor's Notes

  1. Take home for attendees: understand connector internals to configure them properly; learn how to implement your own connectors.
  2. As always in computer science: simple but powerful concepts are superior. Kafka is simple but very efficient.
  3. Simple Consumer for 0.8: we need control over offset committing, consumer group behavior, … The high-level consumer didn’t offer that.
  4. Pre 1.0, Flink didn’t handle broker failures transparently. The Kafka consumer relied on the fault tolerance mechanisms
  5. Correlation IDs need to be unique, so producers have to act accordingly.