NICO KRUBER
SOLUTION ARCHITECT / SOFTWARE ENGINEER @ DATA ARTISANS,
APACHE FLINK COMMITTER
IMPROVING THROUGHPUT AND LATENCY
WITH FLINK’S NETWORK STACK
© 2018 data Artisans
FLINK DATA TRANSPORT (LOGICAL)
Stream partition: an abstraction over:
• Subtask output
‒ pipelined-bounded
‒ pipelined-unbounded
‒ blocking
• Scheduling type
‒ all at once
‒ next stage on complete output
‒ next stage on first output
• Transport
‒ high throughput via buffers
‒ low latency via buffer timeout
[Figure: subtasks 1 to 4 connected by a stream partition]
FLINK DATA TRANSPORT (PHYSICAL)
[Figure: TaskManager 1 and TaskManager 2 linked by a TCP connection. Subtasks 1 and 2 each have a buffer pool and queues labelled 3 and 4; subtasks 3 and 4 each have a buffer pool and queues labelled 1 and 2. Legend: empty buffer; buffer with data in queue.]
FLINK DATA TRANSPORT (PHYSICAL)
[Figure: the two-TaskManager layout again (subtasks 1 to 4, buffer pools, queues), with the link labelled "TCP Connection" and "Backpressure".]
FLINK DATA TRANSPORT (PHYSICAL)
[Figure: the two-TaskManager layout with the labels "TCP Connection" and "Backpressure", plus a "Zoom in" marker.]
CREDIT-BASED FLOW CONTROL
CREDIT-BASED FLOW CONTROL (FLINK 1.5+)
[Figure: the channel between subtask 2 and subtask 4 across the TCP connection. Labels: buffer pool, exclusive buffers, floating buffers, backlog.]
CREDIT-BASED FLOW CONTROL (FLINK 1.5+)
[Figure: credit exchange between subtask 2 and subtask 4 over the TCP connection, in three steps. The receiver announces credit; the sender sends buffers and announces its backlog size (4 in the example); the receiver asks for floating buffers. Labels: unannounced credit, channel credit, backlog size, floating buffers, buffer pool.]
CREDIT-BASED FLOW CONTROL (FLINK 1.5+)
• Never blocks the TCP connection
➢ Better resource utilization with data skew in multiplexed connections
• Avoids overloading slow receivers (direct control over the amount of buffered data)
➢ Improves checkpoint alignment
• Cost: additional announce messages (piggy-backed), potential round-trip latency
[Chart: checkpoint duration, without flow control vs. with flow control]
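The credit exchange described on the previous slides can be sketched as a toy model. This is a minimal sketch, not Flink code: the function name `simulate_credit_flow` and the round-based timing are assumptions made for illustration, and the receiver is assumed to consume and recycle every buffer before the next round. The defaults of 2 exclusive and 8 floating buffers match the values shown later in the tuning options.

```python
from collections import deque


def simulate_credit_flow(total_buffers, exclusive=2, floating=8):
    """Toy model of one channel under credit-based flow control.

    Hypothetical helper, not a Flink API. Each loop iteration is one
    round: the receiver announces credit, the sender sends at most that
    many buffers and piggy-backs its remaining backlog size.
    """
    queue = deque(range(total_buffers))  # sender-side buffers waiting to go out
    backlog = 0                          # last backlog size the sender announced
    delivered, sent_per_round = [], []
    while queue:
        # Receiver: exclusive buffers are always available; floating
        # buffers are requested only to cover the announced backlog.
        extra = min(floating, max(0, backlog - exclusive))
        credit = exclusive + extra
        sent = 0
        # Sender: never sends more buffers than it has credit for.
        while credit > 0 and queue:
            delivered.append(queue.popleft())
            credit -= 1
            sent += 1
        backlog = len(queue)             # announced together with the data
        sent_per_round.append(sent)
    return delivered, sent_per_round


delivered, sent_per_round = simulate_credit_flow(25)
print(sent_per_round)
```

The first round is limited to the exclusive buffers because no backlog has been announced yet; later rounds ramp up once floating buffers are granted. The sender never has more data in flight than the receiver has buffers for, which is why the shared TCP connection is not blocked by one slow channel.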
LOW LATENCY IMPROVEMENTS
NETWORK STACK (EXTENDED)
[Figure: the two-TaskManager layout (subtasks 1 to 4, queues, TCP connection) extended with these components: RecordWriter, RecordReader, NettyServer, NettyClient, and buffer pools. A "Zoom in" marker leads to the following slides.]
FROM RECORD TO NETWORK
[Figure: inside subtask 1. StreamRecordWriter: write data & update writer index; notify new data; get new buffer (from the buffer pool). NettyServer: take data & remove buffer.]
FROM RECORD TO NETWORK
[Figure: inside subtask 1, now with the output flusher. StreamRecordWriter: write data & update writer index. Output flusher: flush; notify new data. NettyServer: take data & update reader index. Buffers come from the buffer pool.]
BUFFER BUILDER & CONSUMER
• Producer-consumer structure with lightweight synchronization
• BufferBuilder (writes into a MemorySegment; volatile int writePosition)
‒ append()
‒ commit()
‒ finish()
• BufferConsumer (int readPosition)
‒ build() → Buffer.readOnlySlice() (a Buffer wrapping the MemorySegment)
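The builder/consumer split can be sketched as follows. This is a simplified, single-threaded Python model under stated assumptions, not Flink's implementation: the class and method names mirror the slide, but the bodies are illustrative, the `bytearray` stands in for a MemorySegment, and `build()` returns a copy where the slide shows a read-only slice. The point it shows is that the consumer only ever sees data up to the last committed write position.

```python
class BufferBuilder:
    """Producer side: appends into a fixed-size segment (illustrative model)."""

    def __init__(self, segment: bytearray):
        self.segment = segment
        self._pending = 0          # bytes appended but not yet published
        self.write_position = 0    # published position (volatile in the slide)
        self.finished = False

    def append(self, data: bytes) -> int:
        """Copy as much of data as fits; return the number of bytes written."""
        n = min(len(data), len(self.segment) - self._pending)
        self.segment[self._pending:self._pending + n] = data[:n]
        self._pending += n
        return n

    def commit(self) -> None:
        """Publish everything appended so far to the consumer."""
        self.write_position = self._pending

    def finish(self) -> None:
        """Commit and mark the buffer as complete."""
        self.commit()
        self.finished = True


class BufferConsumer:
    """Consumer side: reads only what the builder has committed."""

    def __init__(self, builder: BufferBuilder):
        self.builder = builder
        self.read_position = 0

    def build(self) -> bytes:
        """Return the newly committed bytes and advance the read position."""
        end = self.builder.write_position
        chunk = bytes(self.builder.segment[self.read_position:end])
        self.read_position = end
        return chunk


builder = BufferBuilder(bytearray(8))
consumer = BufferConsumer(builder)
builder.append(b"abc")
print(consumer.build())   # nothing committed yet
builder.commit()
print(consumer.build())   # the three committed bytes
```

Because the only shared state is a single position that the producer writes and the consumer reads, the two sides need no lock; that is the "lightweight synchronization" the slide refers to.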
LATENCY VS. THROUGHPUT
▪ low latency via buffer timeout
*100 nodes x 8 slots
▪ high throughput through buffers
CONNECTION TYPES
LOCAL VS. REMOTE CONNECTIONS
• Every (unchained) connection:
‒ Requires serialization
‒ Assembles serialized records into buffers
‒ Forwards a buffer when it is full or when the buffer timeout is hit
• Remote connection:
‒ Sent via multiplexed Netty TCP connections (one per pair of tasks and task managers)
‒ As soon as a buffer is on the wire, it can be re-used
➢ Allows credit-based flow control to control amount of buffered data
• Local connection:
‒ Direct connection between sender and receiver: buffers are shared
➢ No need for further flow control (buffered data = sender buffers)
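The "forward when full or when the timeout is hit" rule can be modeled in a few lines. This is a sketch under assumptions, not Flink's output flusher: the class `OutputBuffer` and its methods are hypothetical, time is passed in explicitly instead of using a background thread, and the timeout is measured from the oldest unflushed data.

```python
class OutputBuffer:
    """Toy model of one output buffer with a flush timeout (not a Flink class)."""

    def __init__(self, capacity, timeout_ms):
        self.capacity = capacity      # buffer size in bytes
        self.timeout_ms = timeout_ms  # maximum time data may wait
        self.used = 0                 # bytes currently waiting
        self.oldest_ms = None         # when the waiting data started to wait
        self.flushes = []             # (reason, size, time) per forwarded buffer

    def write(self, size, now_ms):
        """Add a serialized record; forward full buffers immediately."""
        if self.used == 0:
            self.oldest_ms = now_ms
        self.used += size
        while self.used >= self.capacity:
            self._flush("full", self.capacity, now_ms)

    def tick(self, now_ms):
        """Called periodically; forwards a partial buffer once the timeout is hit."""
        if self.used > 0 and now_ms - self.oldest_ms >= self.timeout_ms:
            self._flush("timeout", self.used, now_ms)

    def _flush(self, reason, size, now_ms):
        self.flushes.append((reason, size, now_ms))
        self.used -= size
        self.oldest_ms = now_ms


buf = OutputBuffer(capacity=100, timeout_ms=10)
buf.write(60, now_ms=0)
buf.write(60, now_ms=1)   # buffer fills up and is forwarded
buf.tick(now_ms=5)        # too early for the leftover
buf.tick(now_ms=11)       # leftover forwarded by the timeout
print(buf.flushes)
```

A busy channel is flushed by the "full" rule and never waits for the timeout, while a quiet channel is bounded by the timeout alone. This is why the timeout acts as a latency bound mainly for low-throughput channels.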
TUNING OPTIONS
CREDIT-BASED FLOW CONTROL
• taskmanager.network.credit-model: true/false
• taskmanager.network.memory.buffers-per-channel: 2
• taskmanager.network.memory.floating-buffers-per-gate: 8
• Number of exclusive buffers should be enough to saturate the network for a full
round-trip-time (2 x network latency)
➢ #exclBuffers * segmentSize = round-trip-time * throughput
[Figure: credit exchange between subtask 2 and subtask 4: announce credit; send buffers & announce backlog size. Labels: unannounced credit, channel credit, backlog size.]
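The sizing rule `#exclBuffers * segmentSize = round-trip-time * throughput` can be turned into a quick estimate. This is a back-of-the-envelope sketch: the function name is hypothetical, the 32 KiB segment size is an assumed value that should be replaced with the configured one, and the throughput and round-trip numbers in the example are made up for illustration.

```python
import math


def exclusive_buffers_needed(round_trip_s, throughput_bytes_per_s,
                             segment_size_bytes=32 * 1024):
    """Smallest number of exclusive buffers that covers one round trip.

    Rearranges  #exclBuffers * segmentSize = round-trip-time * throughput
    and rounds up, with a minimum of one buffer.
    """
    in_flight_bytes = round_trip_s * throughput_bytes_per_s
    return max(1, math.ceil(in_flight_bytes / segment_size_bytes))


# Illustrative numbers only: 1 ms round trip, 100 MB/s on one channel.
print(exclusive_buffers_needed(0.001, 100_000_000))
```

With these example numbers about 100 KB is in flight during one round trip, so more than three 32 KiB segments are needed and the estimate rounds up to four. Measure the real round-trip time and per-channel throughput before changing `taskmanager.network.memory.buffers-per-channel`.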
CREDIT-BASED FLOW CONTROL
• Number of exclusive buffers too high
➢ higher number of required network buffers
➢ buffering more during checkpoint alignment
➢ BUT: faster ramp-up (before floating buffers kick in)
• Number of exclusive buffers too low
➢ periods of inactivity during ramp-up
BUFFER TIMEOUT
• StreamExecutionEnvironment#setBufferTimeout()
• Affects every unchained connection: remote or local
➢ Upper bound on latency for low throughput channels(!)
➢ Trade-off throughput vs. latency (see earlier)
NETWORK THREADS
• taskmanager.network.netty.client.numThreads (default: number of slots)
• taskmanager.network.netty.server.numThreads (default: number of slots)
• May become a bottleneck if the thread(s) are overloaded
• BUT: too many threads may add overhead of their own
➢ Do your own benchmarks and verify for your job!
USE LINUX-NATIVE EPOLL (FLINK 1.6+)
• taskmanager.network.netty.transport: AUTO | NIO | EPOLL
• EPOLL may reduce the channel polling overhead between user space and
kernel/system space
• There should be no downside in activating this or at least AUTO.
➢ Do your own benchmarks for your job!
• Please give feedback in FLINK-10177 so that we can decide whether to use
AUTO by default.
METRICS
NETWORK STACK METRICS
• Backpressure monitor
‒ Web UI or REST API (/jobs/:jobid/vertices/:vertexid/backpressure)
• [input, output]QueueLength
• numRecords[In, Out]
• numBytesOut, numBytesIn[Local, Remote]
• numBuffersOut, numBuffersIn[Local, Remote] (Flink 1.5.3+, 1.6.1+)
LATENCY MARKERS
• ExecutionConfig#setLatencyTrackingInterval() (default: every 2s)
• Sources periodically emit a LatencyMarker with a timestamp
• These flow with the stream and properly queue behind records
• Latency markers bypass operators, e.g. windows
• Once received, they will be re-emitted onto a random output channel
• We create one histogram per source ↔ operator pair (window size: 128)
• source_id.<sourceId>.source_subtask_index.<subtaskIdx>.operator_id.<operatorId>.operator_subtask_index.<subtaskIdx>
➢ 10 operators, parallelism 100 = 9 * 100 * 100 = 90,000 histograms!
https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#latency-tracking
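The histogram count above follows from "one histogram per source ↔ operator pair", counted per subtask. A small sketch of that arithmetic, with a hypothetical function name and the assumption (implied by the slide's 9 * 100 * 100) that one of the 10 operators is the source:

```python
def latency_histograms(source_operators, other_operators, parallelism):
    """Histograms created when every source subtask is paired with every
    subtask of every non-source operator (illustrative helper)."""
    source_subtasks = source_operators * parallelism
    operator_subtasks = other_operators * parallelism
    return source_subtasks * operator_subtasks


# 10 operators at parallelism 100, one of them being the source.
print(latency_histograms(source_operators=1, other_operators=9, parallelism=100))
```

The count grows with the square of the parallelism, so doubling the parallelism quadruples the number of histograms.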
COMMON ANTIPATTERNS
REPEATED KEYBY’S ON THE SAME KEY
• KeyedStream is not retained
‒ UDF could have changed the key
• Additional keyBy() is necessary to gain access to keyed state, but:
‒ Prevents chaining
‒ Adds an additional shuffle
➢ DataStreamUtils#reinterpretAsKeyedStream
.keyBy("location")
.keyBy("location")
CREDIT-BASED FLOW CONTROL (FLINK 1.5+)
[Figure: subtask 2 and subtask 4 connected over TCP, with buffer pool, exclusive buffers, and floating buffers; a LatencyMarker is shown in the stream.]
• Buffers: #channels * 2 + 8
➢ synchronization overhead for the output flusher!
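The buffer formula `#channels * 2 + 8` can be read as exclusive buffers per channel plus floating buffers per gate, using the values 2 and 8 from the tuning options. A minimal sketch of the resulting numbers, assuming a 32 KiB segment size (an assumption; substitute the configured value) and hypothetical function names:

```python
def buffers_per_gate(channels, buffers_per_channel=2, floating_buffers=8):
    """Network buffers for one gate: exclusive per channel plus floating."""
    return channels * buffers_per_channel + floating_buffers


def gate_memory_bytes(channels, segment_size_bytes=32 * 1024):
    """Memory held by those buffers, assuming one segment per buffer."""
    return buffers_per_gate(channels) * segment_size_bytes


# Example: a gate fed by 100 channels.
print(buffers_per_gate(100), gate_memory_bytes(100))
```

The exclusive part scales linearly with the number of channels, so jobs with high parallelism and all-to-all connections need correspondingly more network memory.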
WHAT’S UP NEXT?
NETWORK SERIALIZATION STACK (FLINK 1.7?)
• Broadcast records are serialized once per record, not once per channel
• Only one intermediate serialization buffer (on heap)
➢ significantly reduces the memory footprint
• see FLINK-9913
OPENSSL-BASED SSL ENGINE (FLINK 1.7?)
• Runs native code
• Uses advanced CPU instruction sets
➢ May reduce encryption/decryption overhead (needs verification)
• see FLINK-9816
MOVE OUTPUT FLUSHER TO NETTY
• Current implementation may have (GC) problems with many channels
➢ schedule the output flusher inside the Netty event loop
• see FLINK-8625
THANK YOU!
@dataArtisans
@ApacheFlink
WE ARE HIRING
data-artisans.com/careers

More Related Content

What's hot

Materialize: a platform for changing data
Materialize: a platform for changing dataMaterialize: a platform for changing data
Materialize: a platform for changing data
Altinity Ltd
 
Same plan different performance
Same plan different performanceSame plan different performance
Same plan different performance
Mauro Pagano
 
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Aaron Shilo
 
Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)
Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)
Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)
Johann Lombardi
 
Awr + 12c performance tuning
Awr + 12c performance tuningAwr + 12c performance tuning
Awr + 12c performance tuning
AiougVizagChapter
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
Altinity Ltd
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Ververica
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
Chasing the optimizer
Chasing the optimizerChasing the optimizer
Chasing the optimizer
Mauro Pagano
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
Flink Forward
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
StreamNative
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
Taro L. Saito
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)
Marco Pas
 
Oracle Performance Tuning Fundamentals
Oracle Performance Tuning FundamentalsOracle Performance Tuning Fundamentals
Oracle Performance Tuning Fundamentals
Carlos Sierra
 
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentalsDB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
John Beresniewicz
 
How a Developer can Troubleshoot a SQL performing poorly on a Production DB
How a Developer can Troubleshoot a SQL performing poorly on a Production DBHow a Developer can Troubleshoot a SQL performing poorly on a Production DB
How a Developer can Troubleshoot a SQL performing poorly on a Production DB
Carlos Sierra
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Flink Forward
 
Flink Complex Event Processing
Flink Complex Event ProcessingFlink Complex Event Processing
Flink Complex Event Processing
Dawid Wysakowicz
 

What's hot (20)

Materialize: a platform for changing data
Materialize: a platform for changing dataMaterialize: a platform for changing data
Materialize: a platform for changing data
 
Same plan different performance
Same plan different performanceSame plan different performance
Same plan different performance
 
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
 
Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)
Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)
Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)
 
Awr + 12c performance tuning
Awr + 12c performance tuningAwr + 12c performance tuning
Awr + 12c performance tuning
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Chasing the optimizer
Chasing the optimizerChasing the optimizer
Chasing the optimizer
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)
 
Oracle Performance Tuning Fundamentals
Oracle Performance Tuning FundamentalsOracle Performance Tuning Fundamentals
Oracle Performance Tuning Fundamentals
 
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentalsDB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
 
How a Developer can Troubleshoot a SQL performing poorly on a Production DB
How a Developer can Troubleshoot a SQL performing poorly on a Production DBHow a Developer can Troubleshoot a SQL performing poorly on a Production DB
How a Developer can Troubleshoot a SQL performing poorly on a Production DB
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 
Flink Complex Event Processing
Flink Complex Event ProcessingFlink Complex Event Processing
Flink Complex Event Processing
 

Similar to Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency with Flink's network stack"

Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
Apache Apex
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Apache Apex
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Apache Apex
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
Apache Apex
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Apex
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
DataWorks Summit/Hadoop Summit
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
Comsysto Reply GmbH
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
Thomas Weise
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQ
Xin Wang
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Apache Apex
 
BigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache ApexBigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache Apex
Thomas Weise
 
Multi-Layer DDoS Mitigation Strategies
Multi-Layer DDoS Mitigation StrategiesMulti-Layer DDoS Mitigation Strategies
Multi-Layer DDoS Mitigation Strategies
Logan Best
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka Streams
Kevin Webber
 
MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)
MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)
MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)Art Schanz
 
Hungary Usergroup - Midonet overlay programming
Hungary Usergroup - Midonet overlay programmingHungary Usergroup - Midonet overlay programming
Hungary Usergroup - Midonet overlay programming
Marton Kiss
 
(NET404) Making Every Packet Count
(NET404) Making Every Packet Count(NET404) Making Every Packet Count
(NET404) Making Every Packet Count
Amazon Web Services
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
DataWorks Summit/Hadoop Summit
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
Fabian Hueske
 
Multi-Layer DDoS Mitigation Strategies
Multi-Layer DDoS Mitigation StrategiesMulti-Layer DDoS Mitigation Strategies
Multi-Layer DDoS Mitigation Strategies
Sagi Brody
 

Similar to Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency with Flink's network stack" (20)

Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQ
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
BigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache ApexBigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache Apex
 
Multi-Layer DDoS Mitigation Strategies
Multi-Layer DDoS Mitigation StrategiesMulti-Layer DDoS Mitigation Strategies
Multi-Layer DDoS Mitigation Strategies
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka Streams
 
MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)
MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)
MQTC V2.0.1.3 - WMQ & TCP Buffers – Size DOES Matter! (pps)
 
Hungary Usergroup - Midonet overlay programming
Hungary Usergroup - Midonet overlay programmingHungary Usergroup - Midonet overlay programming
Hungary Usergroup - Midonet overlay programming
 
(NET404) Making Every Packet Count
(NET404) Making Every Packet Count(NET404) Making Every Packet Count
(NET404) Making Every Packet Count
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Multi-Layer DDoS Mitigation Strategies
Multi-Layer DDoS Mitigation StrategiesMulti-Layer DDoS Mitigation Strategies
Multi-Layer DDoS Mitigation Strategies
 

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
Flink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
Flink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
Flink Forward
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
Flink Forward
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
Flink Forward
 

Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency with Flink's network stack"

  • 1. IMPROVING THROUGHPUT AND LATENCY WITH FLINK’S NETWORK STACK
      NICO KRUBER, SOLUTION ARCHITECT / SOFTWARE ENGINEER @ DATA ARTISANS, APACHE FLINK COMMITTER
  • 2. FLINK DATA TRANSPORT (LOGICAL)
      © 2018 data Artisans
      Abstraction over:
      • Subtask output
        ‒ pipelined-bounded
        ‒ pipelined-unbounded
        ‒ blocking
      • Scheduling type
        ‒ all at once
        ‒ next stage on complete output
        ‒ next stage on first output
      • Transport
        ‒ high throughput via buffers
        ‒ low latency via buffer timeout
      [Diagram labels: Subtask 1, Subtask 2, Subtask 3, Subtask 4, Stream Partition]
  • 3. FLINK DATA TRANSPORT (PHYSICAL)
      [Diagram: TaskManager 1 and TaskManager 2 joined by a TCP connection; Subtasks 1 to 4, each with its own buffer pool. Legend: empty buffer, buffer with data in queue.]
  • 4. FLINK DATA TRANSPORT (PHYSICAL)
      [Same diagram, with backpressure marked. Legend: empty buffer, buffer with data in queue.]
  • 5. FLINK DATA TRANSPORT (PHYSICAL)
      [Same diagram, with backpressure marked and a "Zoom in" callout.]
  • 6. CREDIT-BASED FLOW CONTROL
  • 7. CREDIT-BASED FLOW CONTROL (FLINK 1.5+)
      [Diagram: Subtask 2 and Subtask 4 across TaskManager 1 and TaskManager 2, joined by a TCP connection. Labels: buffer pool, floating buffers, exclusive buffers, backlog.]
  • 8. CREDIT-BASED FLOW CONTROL (FLINK 1.5+)
      [Diagram: Subtask 2 and Subtask 4 across TaskManager 1 and TaskManager 2, joined by a TCP connection. Labels: buffer pool, floating buffers, 0 unannounced credit, 2 channel credit, 4 backlog size. Numbered steps: announce credit; send buffers & announce backlog size; ask for floating buffers.]
  • 9. CREDIT-BASED FLOW CONTROL (FLINK 1.5+)
      • Never blocks the TCP connection
        ➢ Better resource utilization with data skew in multiplexed connections
      • Avoids overloading of slow receivers (direct control over amount of buffered data)
        ➢ Improves checkpoint alignment
      • Cost: additional announce messages (piggybacked), potential round-trip latency
      [Chart: checkpoint duration, without flow control vs. with flow control]
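The credit exchange from slides 7 to 9 can be sketched as a toy model. This is a simplified single-channel illustration in plain Java, not Flink's implementation: the class `CreditChannelSketch` and its methods are hypothetical names, floating buffers are treated as belonging to one channel rather than being shared per gate, and buffer recycling is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy single-channel model of credit-based flow control (hypothetical, not a Flink class). */
final class CreditChannelSketch {
    private final Deque<Integer> backlog = new ArrayDeque<>(); // buffers queued on the sender
    private int credit;        // sender's view of free receive buffers for this channel
    private int floatingFree;  // floating buffers the receiver can still hand out
    private int received;      // buffers that reached the receiver

    CreditChannelSketch(int exclusiveBuffers, int floatingBuffers) {
        this.credit = exclusiveBuffers;      // initial announcement: the exclusive buffers
        this.floatingFree = floatingBuffers;
    }

    /** Sender side: queue the given number of finished buffers. */
    void produce(int buffers) {
        for (int i = 0; i < buffers; i++) {
            backlog.add(i);
        }
    }

    /** Sender side: send while credit lasts; returns the backlog size announced to the receiver. */
    int send() {
        while (credit > 0 && !backlog.isEmpty()) {
            backlog.poll();
            credit--;      // each buffer on the wire consumes one credit
            received++;
        }
        return backlog.size();
    }

    /** Receiver side: ask for floating buffers to cover the backlog and announce them as credit. */
    int requestFloating(int announcedBacklog) {
        int granted = Math.min(announcedBacklog, floatingFree);
        floatingFree -= granted;
        credit += granted;
        return granted;
    }

    int received() {
        return received;
    }

    public static void main(String[] args) {
        CreditChannelSketch channel = new CreditChannelSketch(2, 8);
        channel.produce(6);
        int announced = channel.send();                 // only 2 buffers may go out
        System.out.println("backlog after first send: " + announced);
        System.out.println("floating buffers granted: " + channel.requestFloating(announced));
        System.out.println("backlog after second send: " + channel.send());
    }
}
```

The sender never writes more than the receiver has announced, which is why a slow receiver cannot block the shared TCP connection in this model.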
  • 10. LOW LATENCY IMPROVEMENTS
  • 11. NETWORK STACK (EXTENDED)
      [Diagram: TaskManager 1 and TaskManager 2 joined by a TCP connection; Subtasks 1 to 4 with buffer pools. Added components: RecordWriter, NettyServer, NettyClient, RecordReader. "Zoom in" callout.]
  • 12. FROM RECORD TO NETWORK
      [Diagram: Subtask 1, StreamRecordWriter, Buffer Pool, NettyServer. Step labels: get new buffer; write data & update writer index; notify new data; take data & remove buffer.]
  • 13. FROM RECORD TO NETWORK
      [Diagram: Subtask 1, StreamRecordWriter, Output Flusher, Buffer Pool, NettyServer. Step labels: write data & update writer index; flush; notify new data; take data & update reader index.]
  • 14. BUFFER BUILDER & CONSUMER
      • Producer-Consumer structure with lightweight synchronization
      • BufferBuilder (volatile int writePosition)
        ‒ append()
        ‒ commit()
        ‒ finish()
      • BufferConsumer (int readPosition)
        ‒ build() → Buffer.readOnlySlice()
      [Diagram: BufferBuilder and BufferConsumer sharing one MemorySegment; Buffer (wrapping MemorySegment)]
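The producer-consumer structure on slide 14 can be illustrated with a minimal sketch for one writer thread and one reader thread. `SegmentSketch` is a hypothetical class written for this example, with a plain byte array standing in for a MemorySegment. It is not Flink's BufferBuilder or BufferConsumer, only the same idea: the writer publishes its progress with a single volatile write, and the reader takes a read-only slice of whatever has been committed.

```java
import java.nio.ByteBuffer;

/** Hypothetical single-writer/single-reader sketch; not Flink's BufferBuilder/BufferConsumer. */
final class SegmentSketch {
    private final byte[] segment;        // stands in for a MemorySegment
    private int pendingWritePosition;    // touched only by the writer thread
    private volatile int writePosition;  // published by commit(), read by the consumer
    private int readPosition;            // touched only by the consumer thread

    SegmentSketch(int capacity) {
        this.segment = new byte[capacity];
    }

    /** Writer: copy as many bytes as fit; they stay invisible to the reader until commit(). */
    int append(byte[] data) {
        int n = Math.min(data.length, segment.length - pendingWritePosition);
        System.arraycopy(data, 0, segment, pendingWritePosition, n);
        pendingWritePosition += n;
        return n;
    }

    /** Writer: make everything appended so far visible with one volatile write. */
    void commit() {
        writePosition = pendingWritePosition;
    }

    /** Reader: read-only view of the bytes committed since the previous build(). */
    ByteBuffer build() {
        int committed = writePosition;   // one volatile read
        ByteBuffer slice = ByteBuffer
                .wrap(segment, readPosition, committed - readPosition)
                .slice()
                .asReadOnlyBuffer();
        readPosition = committed;
        return slice;
    }

    public static void main(String[] args) {
        SegmentSketch s = new SegmentSketch(16);
        s.append(new byte[] {1, 2, 3});
        System.out.println("before commit: " + s.build().remaining());
        s.commit();
        System.out.println("after commit: " + s.build().remaining());
    }
}
```

No lock is needed because each position has exactly one writer; the volatile field is the only point where the two threads synchronize.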
  • 15. LATENCY VS. THROUGHPUT
      ▪ low latency via buffer timeout
      ▪ high throughput through buffers
      [Chart; footnote: *100 nodes x 8 slots]
  • 16. CONNECTION TYPES
  • 17. LOCAL VS. REMOTE CONNECTIONS
      • Every (unchained) connection:
        ‒ Requires serialization
        ‒ Assembles serialized records into buffers
        ‒ Forwards a buffer when it is full or the buffer timeout is hit
      • Remote connection:
        ‒ Sent via multiplexed Netty TCP connections (one per pair of tasks and task managers)
        ‒ As soon as a buffer is on the wire, it can be re-used
        ➢ Allows credit-based flow control to control amount of buffered data
      • Local connection:
        ‒ Direct connection between sender and receiver: buffers are shared
        ➢ No need for further flow control (buffered data = sender buffers)
  • 18. TUNING OPTIONS
  • 19. CREDIT-BASED FLOW CONTROL
      • taskmanager.network.credit-model: true/false
      • taskmanager.network.memory.buffers-per-channel: 2
      • taskmanager.network.memory.floating-buffers-per-gate: 8
      • Number of exclusive buffers should be enough to saturate the network for a full round-trip-time (2 x network latency)
        ➢ #exclBuffers * segmentSize = round-trip-time * throughput
      [Diagram: Subtask 2 and Subtask 4 across TaskManager 1 and TaskManager 2. Labels: 0 unannounced credit, 2 channel credit, 0 backlog size. Steps: announce credit; send buffers & announce backlog size.]
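The sizing rule on slide 19 can be worked through numerically. The sketch below is plain arithmetic under stated assumptions: a segment size of 32 KiB and a per-channel throughput of 1 Gbit/s are example inputs, and `ExclusiveBufferSizing` with its method names is hypothetical. The second helper adds up the buffers one input gate needs when each channel gets its exclusive buffers and the gate gets its floating buffers, using the two values listed on the slide.

```java
/** Worked example for "#exclBuffers * segmentSize = round-trip-time * throughput" (hypothetical helper). */
final class ExclusiveBufferSizing {

    /** Smallest buffer count whose total size covers the bytes in flight during one round trip. */
    static long exclusiveBuffers(double roundTripSeconds, double throughputBytesPerSecond,
                                 long segmentSizeBytes) {
        double bytesInFlight = roundTripSeconds * throughputBytesPerSecond;
        return (long) Math.ceil(bytesInFlight / segmentSizeBytes);
    }

    /** Buffers for one input gate: exclusive buffers per channel plus the gate's floating buffers. */
    static long buffersPerGate(long channels, long buffersPerChannel, long floatingBuffersPerGate) {
        return channels * buffersPerChannel + floatingBuffersPerGate;
    }

    public static void main(String[] args) {
        long segmentSize = 32 * 1024;          // assumption: 32 KiB segments
        double throughput = 125_000_000.0;     // assumption: 1 Gbit/s = 125,000,000 bytes/s

        // 1 ms round trip: 125,000 bytes in flight, about 3.8 segments
        System.out.println(exclusiveBuffers(0.001, throughput, segmentSize));
        // 0.5 ms round trip: 62,500 bytes in flight, about 1.9 segments
        System.out.println(exclusiveBuffers(0.0005, throughput, segmentSize));
        // 100 channels with 2 exclusive buffers each plus 8 floating buffers
        System.out.println(buffersPerGate(100, 2, 8));
    }
}
```

Under these example inputs, two exclusive buffers are only enough for round trips of up to about half a millisecond; longer round trips rely on floating buffers or a higher per-channel setting.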
  • 20. CREDIT-BASED FLOW CONTROL
      • Number of exclusive buffers too high
        ➢ higher number of required network buffers
        ➢ buffering more during checkpoint alignment
        ➢ BUT: faster ramp-up (before floating buffers kick in)
      • Number of exclusive buffers too low
        ➢ times of inactivity during ramp-up
      [Diagram: Subtask 2 and Subtask 4 across TaskManager 1 and TaskManager 2. Labels: 0 unannounced credit, 2 channel credit, 0 backlog size. Steps: announce credit; send buffers & announce backlog size.]
  • 21. BUFFER TIMEOUT
      • StreamExecutionEnvironment#setBufferTimeout()
      • Affects every unchained connection: remote or local
        ➢ Upper bound on latency for low throughput channels(!)
        ➢ Trade-off throughput vs. latency (see earlier)
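Why the buffer timeout only matters for low-throughput channels can be seen with a small calculation. The sketch below is a rough model, not Flink code: `BufferTimeoutSketch` and its methods are hypothetical, the 32 KiB segment size, 100-byte records and 100 ms timeout are assumed example values, and the timeout is treated as a plain positive number of milliseconds. A record waits in the sender's buffer until the buffer fills up or the timeout flushes it, whichever comes first.

```java
/** Rough model of sender-side buffering delay (hypothetical helper, not a Flink class). */
final class BufferTimeoutSketch {

    /** Milliseconds needed to fill one buffer at the given record rate and record size. */
    static double fillTimeMillis(long segmentSizeBytes, double recordsPerSecond,
                                 long recordSizeBytes) {
        return 1000.0 * segmentSizeBytes / (recordsPerSecond * recordSizeBytes);
    }

    /** Worst-case wait of a record in the buffer: until it is full or the timeout hits. */
    static double maxBufferingDelayMillis(long segmentSizeBytes, double recordsPerSecond,
                                          long recordSizeBytes, long bufferTimeoutMillis) {
        double fill = fillTimeMillis(segmentSizeBytes, recordsPerSecond, recordSizeBytes);
        return Math.min(fill, bufferTimeoutMillis);
    }

    public static void main(String[] args) {
        long segmentSize = 32 * 1024;  // assumption: 32 KiB segments
        long recordSize = 100;         // assumption: 100-byte serialized records
        long timeoutMillis = 100;      // assumption: 100 ms buffer timeout

        // Low-throughput channel: filling a buffer takes seconds, so the timeout is the bound.
        System.out.println(maxBufferingDelayMillis(segmentSize, 100, recordSize, timeoutMillis));
        // High-throughput channel: the buffer fills long before the timeout is reached.
        System.out.println(maxBufferingDelayMillis(segmentSize, 1_000_000, recordSize, timeoutMillis));
    }
}
```

In this model, lowering the timeout only helps channels whose fill time exceeds it; for busy channels it changes nothing except how often partially filled buffers are shipped.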
  • 22. NETWORK THREADS
      • netty.client.numThreads (default: number of slots)
      • netty.server.numThreads (default: number of slots)
      • May become a bottleneck if thread(s) are overloaded
      • BUT: may also become an overhead if too many
        ➢ Do your own benchmarks and verify for your job!
  • 23. USE LINUX-NATIVE EPOLL (FLINK 1.6+)
      • taskmanager.network.netty.transport: AUTO | NIO | EPOLL
      • EPOLL may reduce the channel polling overhead between user space and kernel/system space
      • There should be no downside in activating this or at least AUTO.
        ➢ Do your own benchmarks for your job!
      • Please give feedback in FLINK-10177 so that we can decide whether to use AUTO by default.
  • 24. METRICS
  • 25. NETWORK STACK METRICS
      • Backpressure monitor
        ‒ Web UI / REST (/jobs/:jobid/vertices/:vertexid/backpressure)
      • [input, output]QueueLength
      • numRecords[In, Out]
      • numBytesOut, numBytesIn[Local, Remote]
      • numBuffersOut, numBuffersIn[Local, Remote] (Flink 1.5.3+, 1.6.1+)
  • 26. LATENCY MARKERS
      • ExecutionConfig#setLatencyTrackingInterval() (default: every 2s)
      • Sources periodically emit a LatencyMarker with a timestamp
      • These flow with the stream and properly queue behind records
      • Latency markers bypass operators, e.g. windows
      • Once received, they will be re-emitted onto a random output channel
      • We create one histogram per source ↔ operator pair (window size: 128)
      • source_id.<sourceId>.source_subtask_index.<subtaskIdx>.operator_id.<operatorId>.operator_subtask_index.<subtaskIdx>
        ➢ 10 operators, parallelism 100 = 9 * 100 * 100 = 90,000 histograms!
      https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#latency-tracking
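The histogram count on slide 26 follows from one histogram per pair of source subtask and operator subtask. The sketch below reproduces that arithmetic and assembles the metric name pattern shown on the slide. It assumes a single source among the operators and the same parallelism everywhere, as in the slide's example; `LatencyHistogramCount` and the ids passed to it are hypothetical.

```java
/** Arithmetic behind the latency histogram count (hypothetical helper, not a Flink class). */
final class LatencyHistogramCount {

    /**
     * Histograms for a job with one source: every subtask of each of the
     * (operators - 1) downstream operators keeps one histogram per source subtask.
     */
    static long histograms(int operators, int parallelism) {
        return (long) (operators - 1) * parallelism * parallelism;
    }

    /** Metric name following the pattern on the slide. */
    static String metricName(String sourceId, int sourceSubtask,
                             String operatorId, int operatorSubtask) {
        return "source_id." + sourceId
                + ".source_subtask_index." + sourceSubtask
                + ".operator_id." + operatorId
                + ".operator_subtask_index." + operatorSubtask;
    }

    public static void main(String[] args) {
        System.out.println(histograms(10, 100));   // the slide's example: 9 * 100 * 100
        System.out.println(metricName("src", 0, "op", 3));
    }
}
```

Because the count grows with the square of the parallelism, doubling the parallelism in this example quadruples the number of histograms.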
  • 27. COMMON ANTIPATTERNS
  • 28. REPEATED KEYBY’S ON THE SAME KEY
      • KeyedStream is not retained
        ‒ UDF could have changed the key
      • Additional keyBy() is necessary to gain access to keyed state, but:
        ‒ Prevents chaining
        ‒ Adds an additional shuffle
        ➢ DataStreamUtils#reinterpretAsKeyedStream
      [Diagram labels: .keyBy(“location”), .keyBy(“location”)]
  • 29. CREDIT-BASED FLOW CONTROL (FLINK 1.5+)
      [Diagram: Subtask 2 and Subtask 4 across TaskManager 1 and TaskManager 2, joined by a TCP connection. Labels: buffer pool, floating buffers, exclusive buffers, LatencyMarker.]
      • Buffers: #channels * 2 + 8
        ➢ synchronization overhead for the output flusher!
  • 30. WHAT’S UP NEXT?
  • 31. NETWORK SERIALIZATION STACK (FLINK 1.7?)
      • Serialization for broadcasts once per record, not channel
      • Only one intermediate serialization buffer (on heap)
        ➢ significantly reduces the memory footprint
      • see FLINK-9913
      [Diagram labels: TaskManager 1, Subtask 2, RecordWriter]
  • 32. OPENSSL-BASED SSL ENGINE (FLINK 1.7?)
      • Runs native code
      • Uses advanced CPU instruction sets
        ➢ May reduce encryption/decryption overhead (needs verification)
      • see FLINK-9816
  • 33. MOVE OUTPUT FLUSHER TO NETTY
      • Current implementation may have (GC) problems with many channels
        ➢ schedule the output flusher inside the Netty event loop
      • see FLINK-8625
  • 34. THANK YOU! @dataArtisans @ApacheFlink WE ARE HIRING data-artisans.com/careers