Apache Flink Hands On

Robert Metzger, Co-Founder and Engineering Lead at data Artisans
Hands on Apache Flink
How to run, debug and speed up
Flink applications
Robert Metzger
rmetzger@apache.org
@rmetzger_
This talk
• Frequently asked questions + their
answers
• An overview of the tooling in Flink
• An outlook into the future
flink.apache.org 1
“One week of trials and errors
can save up to half an hour of
reading the documentation.”
– Paris Hilton
WRITE AND TEST YOUR JOB
The first step
Get started with an empty project
• Generate a skeleton project with Maven
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-java \
  -DarchetypeVersion=0.9-SNAPSHOT
(You can also use “flink-quickstart-scala” as the artifact ID, or “0.8.1” as the version.)
• No need for manually downloading any
.tgz or .jar files for now
Local Development
• Start Flink in your IDE for local
development & debugging.
final ExecutionEnvironment env =
ExecutionEnvironment.createLocalEnvironment();
• Use our testing framework
@RunWith(Parameterized.class)
class YourTest extends MultipleProgramsTestBase {
    @Test
    public void testRunWithConfiguration() {
        expectedResult = "1 11\n";
    }
}
Debugging with the IDE
RUN YOUR JOB ON A (FAKE)
CLUSTER
Get your hands dirty
Got no cluster? – Renting options
• Google Compute Engine [1]
  ./bdutil -e extensions/flink/flink_env.sh deploy
• Amazon EMR or any other cloud provider
with preinstalled Hadoop YARN [2]
  wget http://stratosphere-bin.amazonaws.com/flink-0.9-SNAPSHOT-bin-hadoop2.tgz
  tar xvzf flink-0.9-SNAPSHOT-bin-hadoop2.tgz
  cd flink-0.9-SNAPSHOT/
  ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096
• Install Flink yourself on the machines
[1] http://ci.apache.org/projects/flink/flink-docs-master/setup/gce_setup.html
[2] http://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html
Got no money?
• Listen closely to this talk and become a
freelance “Big Data Consultant”
• Start a cluster locally in the meantime
$ tar xzf flink-*.tgz
$ cd flink
$ bin/start-cluster.sh
Starting Job Manager
Starting task manager on host
$ jps
5158 JobManager
5262 TaskManager
assert hasCluster;
• Submitting a job
– bin/flink (Command Line)
– RemoteExecutionEnvironment
(From a local or remote Java app)
– Web Frontend (GUI)
– Per job on YARN (Command Line, directly to
YARN)
– Scala Shell
Web Frontends – Web Job Client
Select jobs and
preview plan
Understand Optimizer choices
Web Frontends – Job Manager
Overall system status
Job execution details
Task Manager resource
utilization
Debugging on a cluster
• Good old system out debugging
– Get a logger
– Start logging
– You can also use System.out.println().
private static final Logger LOG =
LoggerFactory.getLogger(YourJob.class);
LOG.info("elementCount = {}", elementCount);
Getting logs on a cluster
• Non-YARN (=bare metal installation)
– The logs are located in each TaskManager’s
log/ directory.
– ssh there and read the logs.
• YARN
– Make sure YARN log aggregation is enabled
– Retrieve logs from YARN (once app is
finished)
$ yarn logs -applicationId <application ID>
Flink Logs
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - --------------------------------------------------------------------------------
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager (Version: 0.9-SNAPSHOT, Rev:2e515fc, Date:27.05.2015 @ 11:24:23 CEST)
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - Current user: robert
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.75-b04
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - Maximum heap size: 736 MiBytes
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - JAVA_HOME: (not set)
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options:
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - -XX:MaxPermSize=256m
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms768m
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx768m
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlog.file=/home/robert/incubator-flink/build-target/bin/../log/flink-robert-jobmanager-robert-da.log
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlog4j.configuration=file:/home/robert/incubator-flink/build-target/bin/../conf/log4j.properties
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlogback.configurationFile=file:/home/robert/incubator-flink/build-target/bin/../conf/logback.xml
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - Program Arguments:
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - --configDir
11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - /home/robert/incubator-flink/build-target/bin/../conf
11:42:39,234 INFO org.apache.flink.runtime.jobmanager.JobManager - --executionMode
11:42:39,234 INFO org.apache.flink.runtime.jobmanager.JobManager - local
11:42:39,234 INFO org.apache.flink.runtime.jobmanager.JobManager - --streamingMode
11:42:39,234 INFO org.apache.flink.runtime.jobmanager.JobManager - batch
11:42:39,234 INFO org.apache.flink.runtime.jobmanager.JobManager - --------------------------------------------------------------------------------
11:42:39,469 INFO org.apache.flink.runtime.jobmanager.JobManager - Loading configuration from /home/robert/incubator-flink/build-target/bin/../conf
11:42:39,525 INFO org.apache.flink.runtime.jobmanager.JobManager - Security is not enabled. Starting non-authenticated JobManager.
11:42:39,525 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager
11:42:39,527 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor system at localhost:6123.
11:42:40,189 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
11:42:40,316 INFO Remoting - Starting remoting
11:42:40,569 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@127.0.0.1:6123]
11:42:40,573 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor
11:42:40,580 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-50f75dc9-3001-4c1b-bc2a-6658ac21322b
11:42:40,581 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:51194 - max concurrent requests: 50 - max backlog: 1000
11:42:40,613 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting embedded TaskManager for JobManager's LOCAL execution mode
11:42:40,615 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager at akka://flink/user/jobmanager#205521910.
11:42:40,663 INFO org.apache.flink.runtime.taskmanager.TaskManager - Messages between TaskManager and JobManager have a max timeout of 100000 milliseconds
11:42:40,666 INFO org.apache.flink.runtime.taskmanager.TaskManager - Temporary file directory '/tmp': total 7 GB, usable 7 GB (100.00% usable)
11:42:41,092 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768).
11:42:41,511 INFO org.apache.flink.runtime.taskmanager.TaskManager - Using 0.7 of the currently free heap space for Flink managed memory (461 MB).
11:42:42,520 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-4c6f4364-1975-48b7-99d9-a74e4edb7103 for spill files.
11:42:42,523 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManger web frontend
Build Information
JVM details
Init messages
Get logs of a running YARN
application
Debugging on a cluster -
Accumulators
• Useful to verify your assumptions about
the data
class Tokenizer extends RichFlatMapFunction<String, String> {
    @Override
    public void flatMap(String value, Collector<String> out) {
        getRuntimeContext()
            .getLongCounter("elementCount").add(1L);
        // do more stuff.
    }
}
Use “Rich*Functions” to get RuntimeContext
Debugging on a cluster -
Accumulators
• Where can I get the accumulator results?
– returned by env.execute()
– displayed when executed with bin/flink
– in the JobManager web frontend
JobExecutionResult result = env.execute("WordCount");
long ec = result.getAccumulatorResult("elementCount");
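To make the accumulator model concrete: each parallel instance of a function updates its own local counter, and the partial counts are merged into one result when the job finishes. The sketch below mimics that in plain Java, without Flink. LocalCounter and mergeAll are hypothetical names chosen for illustration; they are not Flink classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-subtask counter; not a Flink class.
final class LocalCounter {
    private long value;

    // Adds to the local (per-subtask) count.
    void add(long delta) {
        value += delta;
    }

    long localValue() {
        return value;
    }

    // Merges the partial counts of all subtasks into one global result.
    static long mergeAll(List<LocalCounter> partials) {
        long total = 0L;
        for (LocalCounter c : partials) {
            total += c.localValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // Three parallel subtasks, each seeing a different share of the input.
        int[] elementsPerSubtask = {4, 2, 5};
        List<LocalCounter> partials = new ArrayList<>();
        for (int n : elementsPerSubtask) {
            LocalCounter c = new LocalCounter();
            for (int i = 0; i < n; i++) {
                c.add(1L); // same call shape as getLongCounter("elementCount").add(1L)
            }
            partials.add(c);
        }
        System.out.println("elementCount = " + mergeAll(partials));
    }
}
```

The merged value plays the role of what result.getAccumulatorResult("elementCount") returns in a real job.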
Excursion: RichFunctions
• The default functions are SAM (single abstract
method) interfaces: they have exactly one
method, so Java 8 lambdas can implement them
• There is a “Rich” variant for each function.
– RichFlatMapFunction, …
– Methods
• open(Configuration c) & close()
• getRuntimeContext()
Excursion: RichFunctions &
RuntimeContext
• The RuntimeContext provides some useful
methods
• getIndexOfThisSubtask() /
getNumberOfParallelSubtasks() – who am
I, and if yes how many?
• getExecutionConfig()
• Accumulators
• DistributedCache
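One way the two subtask methods can be used together is to let each parallel instance pick its own share of some shared work list. The following plain-Java sketch assumes a round-robin split and a zero-based subtask index; SubtaskShare and shareOf are hypothetical helpers for illustration, not part of Flink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the two int parameters stand in for the values
// returned by getIndexOfThisSubtask() and getNumberOfParallelSubtasks().
final class SubtaskShare {

    // Returns the items a subtask would handle under round-robin assignment.
    static <T> List<T> shareOf(List<T> items,
                               int indexOfThisSubtask,
                               int numberOfParallelSubtasks) {
        if (numberOfParallelSubtasks <= 0
                || indexOfThisSubtask < 0
                || indexOfThisSubtask >= numberOfParallelSubtasks) {
            throw new IllegalArgumentException("invalid subtask index or parallelism");
        }
        List<T> mine = new ArrayList<>();
        for (int i = indexOfThisSubtask; i < items.size(); i += numberOfParallelSubtasks) {
            mine.add(items.get(i));
        }
        return mine;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a", "b", "c", "d", "e");
        for (int subtask = 0; subtask < 2; subtask++) {
            System.out.println("subtask " + subtask + " handles " + shareOf(files, subtask, 2));
        }
    }
}
```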
Attaching a remote debugger to
Flink in a Cluster
Attaching a debugger to Flink in a
cluster
• Add JVM start option in flink-conf.yaml
env.java.opts: "-agentlib:jdwp=…."
• Open an SSH tunnel to the machine:
ssh -f -N -L 5005:127.0.0.1:5005 user@host
• Use your IDE to start a remote debugging
session
JOB TUNING
Make it run faster
Tuning options
• CPU
– Processing slots, threads, …
• Memory
– How to adjust memory usage on the
TaskManager
• I/O
– Specifying temporary directories for spilling
Tell Flink how many CPUs you
have
• taskmanager.numberOfTaskSlots
– number of parallel job instances
– number of pipelines per TaskManager
• recommended: number of CPU cores
[Figure: three Task Managers, each with three slots (Slot 1 to Slot 3); the slots run Map -> Reduce pipelines. Task Managers: 3. Total number of processing slots: 9.]
flink-conf.yaml:
taskmanager.numberOfTaskSlots: 3
(Recommended value: Number of CPU cores)
or
./bin/yarn-session.sh --slots 3 -n 3
Processing slots
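The slot arithmetic on this slide can be written down in a few lines. This is only a sketch: SlotMath is a hypothetical helper, and the fits check assumes, as the slide pictures suggest, that one slot holds one parallel instance of the whole pipeline.

```java
// Hypothetical helper illustrating the slot arithmetic; not a Flink class.
final class SlotMath {

    // Total processing slots = number of TaskManagers x slots per TaskManager.
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        if (taskManagers < 0 || slotsPerTaskManager < 0) {
            throw new IllegalArgumentException("counts must not be negative");
        }
        return taskManagers * slotsPerTaskManager;
    }

    // A job fits if its parallelism does not exceed the available slots.
    static boolean fits(int parallelism, int totalSlots) {
        return parallelism >= 1 && parallelism <= totalSlots;
    }

    public static void main(String[] args) {
        int total = totalSlots(3, 3); // the cluster drawn on the slide
        System.out.println("total slots: " + total);
        System.out.println("parallelism 9 fits: " + fits(9, total));
        System.out.println("parallelism 10 fits: " + fits(10, total));
    }
}
```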
Slots – Wordcount with
parallelism=1
[Figure: three Task Managers with three slots each; a single slot runs the whole pipeline Source -> flatMap, Reduce, Sink.]
When no argument is given,
parallelism.default from
flink-conf.yaml is used.
Default value = 1
Slots – Wordcount with higher
parallelism (= 2 here)
[Figure: three Task Managers with three slots each; two slots each run one parallel instance of the pipeline Source -> flatMap, Reduce, Sink.]
Places to set parallelism for a job
flink-conf.yaml:
parallelism.default: 2
or Flink Client:
./bin/flink run -p 2
or ExecutionEnvironment:
env.setParallelism(2)
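When parallelism is set in several of these places at once, the effective value has to be resolved somehow. The sketch below assumes that the more specific setting wins (operator over ExecutionEnvironment over client over the configuration default); that order is an assumption stated for illustration, and ParallelismResolution is a hypothetical helper, not Flink code.

```java
import java.util.OptionalInt;

// Hypothetical helper: resolves the effective parallelism, assuming the
// most specific setting wins. Not part of Flink.
final class ParallelismResolution {

    static int effective(OptionalInt operatorLevel,
                         OptionalInt environmentLevel,
                         OptionalInt clientLevel,
                         int configDefault) {
        if (operatorLevel.isPresent()) {
            return operatorLevel.getAsInt();      // e.g. sink.setParallelism(1)
        }
        if (environmentLevel.isPresent()) {
            return environmentLevel.getAsInt();   // e.g. env.setParallelism(2)
        }
        if (clientLevel.isPresent()) {
            return clientLevel.getAsInt();        // e.g. the -p option of the client
        }
        return configDefault;                     // parallelism.default
    }

    public static void main(String[] args) {
        int p = effective(OptionalInt.empty(), OptionalInt.of(2), OptionalInt.of(4), 1);
        System.out.println("effective parallelism: " + p);
    }
}
```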
Slots – Wordcount using all
resources (parallelism = 9)
[Figure: three Task Managers with three slots each; all nine slots each run one parallel instance of the pipeline Source -> flatMap, Reduce, Sink.]
Slots – Setting parallelism on a per
operator basis
[Figure: three Task Managers with three slots each; all nine slots run Source -> flatMap and Reduce, and a single Sink runs in one slot.]
The parallelism of each operator can be set individually in the APIs:
counts.writeAsCsv(outputPath, "\n", " ").setParallelism(1);
Slots – Setting parallelism on a per
operator basis
[Figure: three Task Managers with three slots each; all nine slots run Source -> flatMap and Reduce, and a single Sink runs in one slot.]
The data is streamed to this Sink
from all the other slots on the
other TaskManagers
Memory in Flink - Theory
Memory in Flink - Configuration
• TaskManager heap: taskmanager.heap.mb,
or the "-tm" argument for bin/yarn-session.sh
• Network buffers: taskmanager.network.numberOfBuffers
• Managed memory, relative: taskmanager.memory.fraction
• Managed memory, absolute: taskmanager.memory.size
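The startup log shown earlier gives two numbers that make these settings tangible: 2048 network buffers of 32768 bytes each, and a managed-memory fraction of 0.7 of the free heap. The sketch below redoes that arithmetic in plain Java. MemoryMath is a hypothetical helper, and the 600 MiB free heap in main is an invented example value, not a figure from the log.

```java
// Hypothetical helper for the memory arithmetic; not a Flink class.
final class MemoryMath {

    // Memory taken by the network buffer pool.
    static long networkBufferBytes(int numberOfBuffers, int bytesPerSegment) {
        return (long) numberOfBuffers * bytesPerSegment;
    }

    // Managed memory when configured as a fraction of the free heap.
    static long managedMemoryBytes(long freeHeapBytes, double fraction) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        return Math.round(freeHeapBytes * fraction);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Values from the log: 2048 segments of 32768 bytes.
        System.out.println("network buffers: "
                + networkBufferBytes(2048, 32768) / mib + " MiB");
        // Assumed free heap of 600 MiB, purely for illustration.
        System.out.println("managed memory: "
                + managedMemoryBytes(600L * mib, 0.7) / mib + " MiB");
    }
}
```

Lowering the fraction leaves more of the heap to user code, which is the remedy the next slide suggests for an OutOfMemoryError.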
Memory in Flink - OOM
2015-02-20 11:22:54 INFO JobClient:345 - java.lang.OutOfMemoryError: Java heap space
at org.apache.flink.runtime.io.network.serialization.DataOutputSerializer.resize(DataOutputSerializer.java:249)
at org.apache.flink.runtime.io.network.serialization.DataOutputSerializer.write(DataOutputSerializer.java:93)
at org.apache.flink.api.java.typeutils.runtime.DataOutputViewStream.write(DataOutputViewStream.java:39)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:163)
at com.esotericsoftware.kryo.io.Output.require(Output.java:142)
at com.esotericsoftware.kryo.io.Output.writeBoolean(Output.java:613)
at com.twitter.chill.java.BitSetSerializer.write(BitSetSerializer.java:42)
at com.twitter.chill.java.BitSetSerializer.write(BitSetSerializer.java:29)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:599)
at org.apache.flink.api.java.typeutils.runtime.KryoSerializer.serialize(KryoSerializer.java:155)
at org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:91)
at org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:30)
at org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:51)
at org.apache.flink.runtime.io.network.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:76)
at org.apache.flink.runtime.io.network.api.RecordWriter.emit(RecordWriter.java:82)
at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:88)
at org.apache.flink.api.scala.GroupedDataSet$$anon$2.reduce(GroupedDataSet.scala:262)
at org.apache.flink.runtime.operators.GroupReduceDriver.run(GroupReduceDriver.java:124)
at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:493)
at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:360)
at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257)
at java.lang.Thread.run(Thread.java:745)
Memory is missing here
Reduce managed memory: reduce taskmanager.memory.fraction
Memory in Flink – Network buffers
Memory is missing here
Managed memory will shrink automatically
Error: java.lang.Exception: Failed to deploy the task CHAIN Reduce(org.okkam.flink.maintenance.deduplication.blocking.RemoveDuplicateReduceGroupFunction) -> Combine(org.apache.flink.api.java.operators.DistinctOperator$DistinctFunction) (15/28) - execution #0 to slot SubSlot 5 (cab978f80c0cb7071136cd755e971be9 (5) - ALLOCATED/ALIVE): org.apache.flink.runtime.io.network.InsufficientResourcesException: okkam-nano-2.okkam.it has not enough buffers to safely execute CHAIN Reduce(org.okkam.flink.maintenance.deduplication.blocking.RemoveDuplicateReduceGroupFunction) -> Combine(org.apache.flink.api.java.operators.DistinctOperator$DistinctFunction) (36 buffers missing)
Increase "taskmanager.network.numberOfBuffers"
What are these buffers needed for?
A small Flink cluster with 4 processing slots (on 2 Task Managers, 2 slots each)
A simple MapReduce job in Flink: Map -> Reduce
What are these buffers needed for?
MapReduce job with a parallelism of 2 and 2 processing slots per machine
[Figure: TaskManager 1 and TaskManager 2, each with Slot 1 and Slot 2 running Map and Reduce tasks; the tasks exchange data through network buffers: 8 buffers for outgoing data, 8 buffers for incoming data.]
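The "8 outgoing, 8 incoming" count can be reproduced with a small calculation. The sketch assumes that every slot in the picture runs one Map and one Reduce subtask (four of each across the two TaskManagers), that the shuffle is all-to-all, and that each channel needs one buffer. NetworkBufferMath is a hypothetical helper. The ruleOfThumb method encodes the sizing formula given in the Flink configuration documentation of that era (slots per TaskManager squared, times TaskManagers, times 4); treat it as a starting point, not an exact requirement.

```java
// Hypothetical helper for counting shuffle channels; not a Flink class.
final class NetworkBufferMath {

    // Each local sender keeps one channel to every receiver in the cluster.
    static int outgoingPerTaskManager(int slotsPerTaskManager, int taskManagers) {
        int receiversInCluster = slotsPerTaskManager * taskManagers;
        return slotsPerTaskManager * receiversInCluster;
    }

    // Each local receiver keeps one channel from every sender in the cluster.
    static int incomingPerTaskManager(int slotsPerTaskManager, int taskManagers) {
        int sendersInCluster = slotsPerTaskManager * taskManagers;
        return slotsPerTaskManager * sendersInCluster;
    }

    // Rule-of-thumb value for taskmanager.network.numberOfBuffers.
    static int ruleOfThumb(int slotsPerTaskManager, int taskManagers) {
        return slotsPerTaskManager * slotsPerTaskManager * taskManagers * 4;
    }

    public static void main(String[] args) {
        System.out.println("outgoing per TaskManager: " + outgoingPerTaskManager(2, 2));
        System.out.println("incoming per TaskManager: " + incomingPerTaskManager(2, 2));
        System.out.println("rule of thumb: " + ruleOfThumb(2, 2));
    }
}
```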
Disk I/O
• Sometimes your data doesn’t fit into main
memory, so we have to spill to disk
– taskmanager.tmp.dirs: /mnt/disk1,/mnt/disk2
• Use real local disks only (no tmpfs or
NAS)
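Before starting a cluster it can help to check that every configured spill directory really exists and is writable. The following is a hypothetical pre-flight check in plain Java, not something Flink ships; it assumes the comma-separated form of taskmanager.tmp.dirs used on this slide.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for spill directories; not a Flink class.
final class TmpDirsCheck {

    // Splits a comma-separated taskmanager.tmp.dirs value into paths.
    static List<Path> parse(String tmpDirs) {
        List<Path> dirs = new ArrayList<>();
        for (String part : tmpDirs.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(Paths.get(trimmed));
            }
        }
        return dirs;
    }

    // Returns the directories that are missing or not writable.
    static List<Path> unusable(List<Path> dirs) {
        List<Path> bad = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir) || !Files.isWritable(dir)) {
                bad.add(dir);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String value = args.length > 0 ? args[0] : "/mnt/disk1,/mnt/disk2";
        System.out.println("unusable spill directories: " + unusable(parse(value)));
    }
}
```

The check cannot tell a real local disk from tmpfs or a network mount, so the advice on the slide still has to be verified by hand.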
[Figure: a Task Manager with one reader thread and one writer thread per disk (Disk 1, Disk 2).]
Outlook
• Per job monitoring & metrics
• Fewer configuration values with dynamic
memory management
• Download operator results to debug them
locally
Join our community
• RTFM (= read the documentation)
• Mailing lists
– Subscribe: user-subscribe@flink.apache.org
– Ask: user@flink.apache.org
• Stack Overflow
– tag with “flink” so that we get an email
notification ;)
• IRC: freenode#flink
• Read the code, it's open source
Flink Forward registration & call for
abstracts are open now
• 12/13 October 2015
• Kulturbrauerei Berlin
• With Flink Workshops / Trainings!
Flink Community Update December 2015: Year in Review
Robert Metzger955 views
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int... by Robert Metzger
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Robert Metzger842 views
Chicago Flink Meetup: Flink's streaming architecture by Robert Metzger
Chicago Flink Meetup: Flink's streaming architectureChicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architecture
Robert Metzger1.1K views
Flink September 2015 Community Update by Robert Metzger
Flink September 2015 Community UpdateFlink September 2015 Community Update
Flink September 2015 Community Update
Robert Metzger829 views
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015 by Robert Metzger
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Robert Metzger4.4K views
Click-Through Example for Flink’s KafkaConsumer Checkpointing by Robert Metzger
Click-Through Example for Flink’s KafkaConsumer CheckpointingClick-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer Checkpointing
Robert Metzger57.9K views
August Flink Community Update by Robert Metzger
August Flink Community UpdateAugust Flink Community Update
August Flink Community Update
Robert Metzger504 views
Flink Cummunity Update July (Berlin Meetup) by Robert Metzger
Flink Cummunity Update July (Berlin Meetup)Flink Cummunity Update July (Berlin Meetup)
Flink Cummunity Update July (Berlin Meetup)
Robert Metzger815 views
Apache Flink First Half of 2015 Community Update by Robert Metzger
Apache Flink First Half of 2015 Community UpdateApache Flink First Half of 2015 Community Update
Apache Flink First Half of 2015 Community Update
Robert Metzger850 views
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA by Robert Metzger
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CAApache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Robert Metzger2.2K views
Berlin Apache Flink Meetup May 2015, Community Update by Robert Metzger
Berlin Apache Flink Meetup May 2015, Community UpdateBerlin Apache Flink Meetup May 2015, Community Update
Berlin Apache Flink Meetup May 2015, Community Update
Robert Metzger464 views

Recently uploaded

DRBD Deep Dive - Philipp Reisner - LINBIT by
DRBD Deep Dive - Philipp Reisner - LINBITDRBD Deep Dive - Philipp Reisner - LINBIT
DRBD Deep Dive - Philipp Reisner - LINBITShapeBlue
110 views21 slides
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T by
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&TCloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&TShapeBlue
81 views34 slides
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ... by
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...ShapeBlue
48 views17 slides
The Power of Heat Decarbonisation Plans in the Built Environment by
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built EnvironmentIES VE
67 views20 slides
Future of AR - Facebook Presentation by
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook PresentationRob McCarty
54 views27 slides
Why and How CloudStack at weSystems - Stephan Bienek - weSystems by
Why and How CloudStack at weSystems - Stephan Bienek - weSystemsWhy and How CloudStack at weSystems - Stephan Bienek - weSystems
Why and How CloudStack at weSystems - Stephan Bienek - weSystemsShapeBlue
172 views13 slides

Recently uploaded(20)

DRBD Deep Dive - Philipp Reisner - LINBIT by ShapeBlue
DRBD Deep Dive - Philipp Reisner - LINBITDRBD Deep Dive - Philipp Reisner - LINBIT
DRBD Deep Dive - Philipp Reisner - LINBIT
ShapeBlue110 views
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T by ShapeBlue
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&TCloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
ShapeBlue81 views
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ... by ShapeBlue
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
ShapeBlue48 views
The Power of Heat Decarbonisation Plans in the Built Environment by IES VE
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built Environment
IES VE67 views
Future of AR - Facebook Presentation by Rob McCarty
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
Rob McCarty54 views
Why and How CloudStack at weSystems - Stephan Bienek - weSystems by ShapeBlue
Why and How CloudStack at weSystems - Stephan Bienek - weSystemsWhy and How CloudStack at weSystems - Stephan Bienek - weSystems
Why and How CloudStack at weSystems - Stephan Bienek - weSystems
ShapeBlue172 views
Confidence in CloudStack - Aron Wagner, Nathan Gleason - Americ by ShapeBlue
Confidence in CloudStack - Aron Wagner, Nathan Gleason - AmericConfidence in CloudStack - Aron Wagner, Nathan Gleason - Americ
Confidence in CloudStack - Aron Wagner, Nathan Gleason - Americ
ShapeBlue58 views
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue by ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlueCloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
ShapeBlue63 views
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive by Network Automation Forum
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLiveAutomating a World-Class Technology Conference; Behind the Scenes of CiscoLive
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... by James Anderson
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
James Anderson142 views
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ... by ShapeBlue
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
ShapeBlue114 views
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti... by ShapeBlue
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
ShapeBlue69 views
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ... by ShapeBlue
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...
ShapeBlue121 views
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue by ShapeBlue
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue
ShapeBlue75 views
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ... by ShapeBlue
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
ShapeBlue97 views
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue by ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
ShapeBlue191 views
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker50 views

Apache Flink Hands On

  • 1. Hands on Apache Flink How to run, debug and speed up Flink applications Robert Metzger rmetzger@apache.org @rmetzger_
  • 2. This talk • Frequently asked questions + their answers • An overview of the tooling in Flink • An outlook into the future flink.apache.org
  • 3. “One week of trials and errors can save up to half an hour of reading the documentation.” – Paris Hilton
  • 4. WRITE AND TEST YOUR JOB The first step
  • 5. Get started with an empty project • Generate a skeleton project with Maven: mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=0.9-SNAPSHOT (you can also use “flink-quickstart-scala” as the artifact ID, or “0.8.1” as the version) • No need to manually download any .tgz or .jar files for now
  • 6. Local Development • Start Flink in your IDE for local development & debugging: final ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(); • Use our testing framework: @RunWith(Parameterized.class) class YourTest extends MultipleProgramsTestBase { @Test public void testRunWithConfiguration() { expectedResult = "1 11\n"; } }
  • 7. Debugging with the IDE
  • 8. RUN YOUR JOB ON A (FAKE) CLUSTER Get your hands dirty
  • 9. Got no cluster? – Renting options • Google Compute Engine [1]: ./bdutil -e extensions/flink/flink_env.sh deploy • Amazon EMR or any other cloud provider with preinstalled Hadoop YARN [2]: wget http://stratosphere-bin.amazonaws.com/flink-0.9-SNAPSHOT-bin-hadoop2.tgz && tar xvzf flink-0.9-SNAPSHOT-bin-hadoop2.tgz && cd flink-0.9-SNAPSHOT/ && ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096 • Install Flink yourself on the machines • [1] http://ci.apache.org/projects/flink/flink-docs-master/setup/gce_setup.html • [2] http://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html
  • 10. Got no money? • Listen closely to this talk and become a freelance “Big Data Consultant” • Start a cluster locally in the meantime: $ tar xzf flink-*.tgz $ cd flink $ bin/start-cluster.sh Starting Job Manager Starting task manager on host $ jps 5158 JobManager 5262 TaskManager
  • 11. assert hasCluster; • Submitting a job – ./bin/flink (Command Line) – RemoteExecutionEnvironment (from a local or remote Java app) – Web Frontend (GUI) – Per job on YARN (Command Line, directly to YARN) – Scala Shell
  • 12. Web Frontends – Web Job Client • Select jobs and preview plan • Understand Optimizer choices
  • 13. Web Frontends – Job Manager • Overall system status • Job execution details • Task Manager resource utilization
  • 14. Debugging on a cluster • Good old system out debugging – Get a logger: private static final Logger LOG = LoggerFactory.getLogger(YourJob.class); – Start logging: LOG.info("elementCount = {}", elementCount); – You can also use System.out.println().
  • 15. Getting logs on a cluster • Non-YARN (= bare metal installation) – The logs are located in each TaskManager’s log/ directory. – ssh there and read the logs. • YARN – Make sure YARN log aggregation is enabled – Retrieve logs from YARN (once the app is finished): $ yarn logs -applicationId <application ID>
  • 16. Flink Logs (slide annotations: Build Information, JVM details, Init messages)
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - --------------------------------------------------------------------------------
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager (Version: 0.9-SNAPSHOT, Rev:2e515fc, Date:27.05.2015 @ 11:24:23 CEST)
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - Current user: robert
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.75-b04
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - Maximum heap size: 736 MiBytes
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - JAVA_HOME: (not set)
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options:
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - -XX:MaxPermSize=256m
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms768m
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx768m
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlog.file=/home/robert/incubator-flink/build-target/bin/../log/flink-robert-jobmanager-robert-da.log
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlog4j.configuration=file:/home/robert/incubator-flink/build-target/bin/../conf/log4j.properties
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlogback.configurationFile=file:/home/robert/incubator-flink/build-target/bin/../conf/logback.xml
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - Program Arguments:
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - --configDir
      11:42:39,233 INFO org.apache.flink.runtime.jobmanager.JobManager - /home/robert/incubator-flink/build-target/bin/../conf
      11:42:39,234 INFO org.apache.flink.runtime.jobmanager.JobManager - --executionMode
      11:42:39,234 INFO org.apache.flink.runtime.jobmanager.JobManager - local
      11:42:39,234 INFO org.apache.flink.runtime.jobmanager.JobManager - --streamingMode
      11:42:39,234 INFO org.apache.flink.runtime.jobmanager.JobManager - batch
      11:42:39,234 INFO org.apache.flink.runtime.jobmanager.JobManager - --------------------------------------------------------------------------------
      11:42:39,469 INFO org.apache.flink.runtime.jobmanager.JobManager - Loading configuration from /home/robert/incubator-flink/build-target/bin/../conf
      11:42:39,525 INFO org.apache.flink.runtime.jobmanager.JobManager - Security is not enabled. Starting non-authenticated JobManager.
      11:42:39,525 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager
      11:42:39,527 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor system at localhost:6123.
      11:42:40,189 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
      11:42:40,316 INFO Remoting - Starting remoting
      11:42:40,569 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@127.0.0.1:6123]
      11:42:40,573 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor
      11:42:40,580 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-50f75dc9-3001-4c1b-bc2a-6658ac21322b
      11:42:40,581 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:51194 - max concurrent requests: 50 - max backlog: 1000
      11:42:40,613 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting embedded TaskManager for JobManager's LOCAL execution mode
      11:42:40,615 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager at akka://flink/user/jobmanager#205521910.
      11:42:40,663 INFO org.apache.flink.runtime.taskmanager.TaskManager - Messages between TaskManager and JobManager have a max timeout of 100000 milliseconds
      11:42:40,666 INFO org.apache.flink.runtime.taskmanager.TaskManager - Temporary file directory '/tmp': total 7 GB, usable 7 GB (100.00% usable)
      11:42:41,092 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768).
      11:42:41,511 INFO org.apache.flink.runtime.taskmanager.TaskManager - Using 0.7 of the currently free heap space for Flink managed memory (461 MB).
      11:42:42,520 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-4c6f4364-1975-48b7-99d9-a74e4edb7103 for spill files.
      11:42:42,523 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManger web frontend
  • 17. Get logs of a running YARN application
  • 18. Debugging on a cluster - Accumulators • Useful to verify your assumptions about the data • Use “Rich*Functions” to get the RuntimeContext: class Tokenizer extends RichFlatMapFunction<String, String> { @Override public void flatMap(String value, Collector<String> out) { getRuntimeContext().getLongCounter("elementCount").add(1L); // do more stuff. } }
  • 19. Debugging on a cluster - Accumulators • Where can I get the accumulator results? – returned by env.execute() – displayed when executed with ./bin/flink – in the JobManager web frontend • Example: JobExecutionResult result = env.execute("WordCount"); long ec = result.getAccumulatorResult("elementCount");
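
To make the accumulator idea concrete, here is a minimal plain-Java model. It does not use the Flink API. It assumes that each parallel instance of a function keeps its own local counter and that the local values are summed into the single result reported for the job. The class name, the helper methods and the round-robin distribution of elements are hypothetical and chosen only for illustration.

```java
// Plain-Java model of a long counter accumulator; this is not Flink code.
final class AccumulatorModelSketch {

    // Each simulated parallel subtask counts the elements it sees.
    // Assumption: elements are handed to subtasks round-robin.
    static long[] countPerSubtask(String[] elements, int parallelism) {
        long[] localCounts = new long[parallelism];
        for (int i = 0; i < elements.length; i++) {
            localCounts[i % parallelism] += 1L;
        }
        return localCounts;
    }

    // The merged accumulator value is the sum of all local counters.
    static long merge(long[] localCounts) {
        long total = 0L;
        for (long count : localCounts) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        String[] lines = {"to be", "or not", "to be", "that is", "the question"};
        long[] local = countPerSubtask(lines, 2);
        System.out.println("subtask 0 saw " + local[0] + ", subtask 1 saw " + local[1]);
        System.out.println("elementCount = " + merge(local));
    }
}
```

If the merged count differs from what you expected the input to contain, that is usually the first hint that an assumption about the data is wrong.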
  • 20. Excursion: RichFunctions • The default functions are SAM (single abstract method) interfaces, i.e. interfaces with one method, so they can be written as Java 8 lambdas • There is a “Rich” variant for each function – RichFlatMapFunction, … – Methods: open(Configuration c) & close(), getRuntimeContext()
  • 21. Excursion: RichFunctions & RuntimeContext • The RuntimeContext provides some useful methods • getIndexOfThisSubtask() / getNumberOfParallelSubtasks() – who am I, and if yes how many? • getExecutionConfig() • Accumulators • DistributedCache
  • 22. Attaching a remote debugger to Flink in a Cluster
  • 23. Attaching a debugger to Flink in a cluster • Add a JVM start option in flink-conf.yaml: env.java.opts: "-agentlib:jdwp=…" • Open an SSH tunnel to the machine: ssh -f -N -L 5005:127.0.0.1:5005 user@host • Use your IDE to start a remote debugging session
  • 24. JOB TUNING Make it run faster
  • 25. Tuning options • CPU – Processing slots, threads, … • Memory – How to adjust memory usage on the TaskManager • I/O – Specifying temporary directories for spilling
  • 26. Tell Flink how many CPUs you have • taskmanager.numberOfTaskSlots – number of parallel job instances – number of pipelines per TaskManager • recommended: number of CPU cores • (figure: several parallel Map -> Reduce pipelines)
  • 27. Processing slots • Task Manager 1: Slot 1, Slot 2, Slot 3 • Task Manager 2: Slot 1, Slot 2, Slot 3 • Task Manager 3: Slot 1, Slot 2, Slot 3 • Task Managers: 3, total number of processing slots: 9 • flink-conf.yaml: taskmanager.numberOfTaskSlots: 3 (recommended value: number of CPU cores) or ./bin/yarn-session.sh --slots 3 -n 3
  • 28. Slots – Wordcount with parallelism=1 • (figure: 3 Task Managers with 3 slots each; one slot runs Source -> flatMap, Reduce, Sink) • When no argument is given, parallelism.default from flink-conf.yaml is used. Default value = 1
  • 29. Slots – Wordcount with higher parallelism (= 2 here) • (figure: 3 Task Managers with 3 slots each; two slots each run Source -> flatMap, Reduce, Sink) • Places to set parallelism for a job: flink-conf.yaml: parallelism.default: 2, or Flink Client: ./bin/flink -p 2, or ExecutionEnvironment: env.setParallelism(2)
  • 30. Slots – Wordcount using all resources (parallelism = 9) • (figure: 3 Task Managers with 3 slots each; every one of the 9 slots runs Source -> flatMap, Reduce, Sink)
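
The slot arithmetic from the slides above can be sketched in plain Java. This is not Flink code. It assumes, as the figures suggest, that one parallel pipeline (Source -> flatMap, Reduce, Sink) occupies one slot, so a job with parallelism p needs p free slots. The class and method names are hypothetical.

```java
// Plain-Java sketch of the processing-slot arithmetic; not Flink code.
final class SlotCapacitySketch {

    // Total slots = number of TaskManagers * taskmanager.numberOfTaskSlots.
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    // Assumption: one parallel pipeline occupies exactly one slot.
    static boolean fits(int parallelism, int taskManagers, int slotsPerTaskManager) {
        return parallelism >= 1
                && parallelism <= totalSlots(taskManagers, slotsPerTaskManager);
    }

    public static void main(String[] args) {
        int taskManagers = 3;
        int slotsPerTaskManager = 3;
        System.out.println("total slots: " + totalSlots(taskManagers, slotsPerTaskManager));
        for (int parallelism : new int[] {1, 2, 9, 10}) {
            System.out.println("parallelism " + parallelism + " fits: "
                    + fits(parallelism, taskManagers, slotsPerTaskManager));
        }
    }
}
```

With 3 TaskManagers and 3 slots each, parallelism 9 uses every slot, and anything above 9 cannot be scheduled under this model.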
  • 31. Slots – Setting parallelism on a per operator basis • (figure: 3 Task Managers with 3 slots each; every slot runs Source -> flatMap, Reduce; there is a single Sink) • The parallelism of each operator can be set individually in the APIs: counts.writeAsCsv(outputPath, "\n", " ").setParallelism(1);
  • 32. Slots – Setting parallelism on a per operator basis • (figure: same layout, with the single Sink placed in one slot) • The data is streamed to this Sink from all the other slots on the other TaskManagers
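
The slides list four places where parallelism can be set, but not which one wins when several are set. The plain-Java sketch below assumes the order of precedence that Flink documents: operator setting first, then the ExecutionEnvironment, then the client's -p flag, then parallelism.default. Treat that order, and all names in the code, as assumptions to check against the documentation for your Flink version.

```java
// Plain-Java sketch of resolving the effective parallelism; not Flink code.
final class ParallelismResolutionSketch {

    // A null argument means "not set at this level".
    // Assumed precedence: operator > environment > client (-p) > config default.
    static int effectiveParallelism(Integer operatorLevel, Integer environmentLevel,
                                    Integer clientLevel, int configDefault) {
        if (operatorLevel != null) {
            return operatorLevel;
        }
        if (environmentLevel != null) {
            return environmentLevel;
        }
        if (clientLevel != null) {
            return clientLevel;
        }
        return configDefault;
    }

    public static void main(String[] args) {
        // A sink with setParallelism(1) in a job whose environment uses 9.
        System.out.println(effectiveParallelism(1, 9, null, 1));
        // An operator without its own setting inherits the environment value.
        System.out.println(effectiveParallelism(null, 9, 2, 1));
        // Nothing set anywhere: fall back to parallelism.default.
        System.out.println(effectiveParallelism(null, null, null, 1));
    }
}
```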
  • 33. Tuning options • CPU – Processing slots, threads, … • Memory – How to adjust memory usage on the TaskManager • I/O – Specifying temporary directories for spilling
  • 35. Memory in Flink - Configuration • TaskManager heap: taskmanager.heap.mb or the “-tm” argument for bin/yarn-session.sh • Network buffers: taskmanager.network.numberOfBuffers • Managed memory: relative: taskmanager.memory.fraction, absolute: taskmanager.memory.size
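
The TaskManager log shown earlier reports "Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768)" and "Using 0.7 of the currently free heap space for Flink managed memory". The plain-Java sketch below reproduces that arithmetic. It is not Flink code: the class and method names are hypothetical, and the free-heap figure used in main is an illustrative assumption, not a value from the talk.

```java
// Plain-Java sketch of the TaskManager memory arithmetic; not Flink code.
final class MemoryBudgetSketch {

    // Network buffer pool size = taskmanager.network.numberOfBuffers * bytes per buffer.
    static long networkPoolBytes(int numberOfBuffers, int bytesPerBuffer) {
        return (long) numberOfBuffers * bytesPerBuffer;
    }

    // Managed memory = taskmanager.memory.fraction * heap that is still free.
    static long managedMemoryBytes(long freeHeapBytes, double memoryFraction) {
        return (long) (freeHeapBytes * memoryFraction);
    }

    public static void main(String[] args) {
        long megabyte = 1024L * 1024L;

        long pool = networkPoolBytes(2048, 32768);
        System.out.println("network buffer pool: " + pool / megabyte + " MB"); // prints 64 MB

        // Assumption: roughly 660 MB of heap is free after the network buffers.
        long freeHeap = 660L * megabyte;
        long managed = managedMemoryBytes(freeHeap, 0.7);
        System.out.println("managed memory: about " + managed / megabyte + " MB");
        System.out.println("left for user code: about " + (freeHeap - managed) / megabyte + " MB");
    }
}
```

Lowering the fraction moves heap from managed memory to user code; raising the number of buffers takes heap away from both.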
  • 36. Memory in Flink - OOM • Memory is missing here: reduce managed memory by lowering taskmanager.memory.fraction
      2015-02-20 11:22:54 INFO JobClient:345 - java.lang.OutOfMemoryError: Java heap space
      at org.apache.flink.runtime.io.network.serialization.DataOutputSerializer.resize(DataOutputSerializer.java:249)
      at org.apache.flink.runtime.io.network.serialization.DataOutputSerializer.write(DataOutputSerializer.java:93)
      at org.apache.flink.api.java.typeutils.runtime.DataOutputViewStream.write(DataOutputViewStream.java:39)
      at com.esotericsoftware.kryo.io.Output.flush(Output.java:163)
      at com.esotericsoftware.kryo.io.Output.require(Output.java:142)
      at com.esotericsoftware.kryo.io.Output.writeBoolean(Output.java:613)
      at com.twitter.chill.java.BitSetSerializer.write(BitSetSerializer.java:42)
      at com.twitter.chill.java.BitSetSerializer.write(BitSetSerializer.java:29)
      at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:599)
      at org.apache.flink.api.java.typeutils.runtime.KryoSerializer.serialize(KryoSerializer.java:155)
      at org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:91)
      at org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:30)
      at org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:51)
      at org.apache.flink.runtime.io.network.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:76
      at org.apache.flink.runtime.io.network.api.RecordWriter.emit(RecordWriter.java:82)
      at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:88)
      at org.apache.flink.api.scala.GroupedDataSet$$anon$2.reduce(GroupedDataSet.scala:262)
      at org.apache.flink.runtime.operators.GroupReduceDriver.run(GroupReduceDriver.java:124)
      at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:493)
      at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:360)
      at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257)
      at java.lang.Thread.run(Thread.java:745)
  • 37. Memory in Flink – Network buffers • Error: java.lang.Exception: Failed to deploy the task CHAIN Reduce(org.okkam.flink.maintenance.deduplication.blocking.RemoveDuplicateReduceGroupFunction) -> Combine(org.apache.flink.api.java.operators.DistinctOperator$DistinctFunction) (15/28) - execution #0 to slot SubSlot 5 (cab978f80c0cb7071136cd755e971be9 (5) - ALLOCATED/ALIVE): org.apache.flink.runtime.io.network.InsufficientResourcesException: okkam-nano-2.okkam.it has not enough buffers to safely execute CHAIN Reduce(org.okkam.flink.maintenance.deduplication.blocking.RemoveDuplicateReduceGroupFunction) -> Combine(org.apache.flink.api.java.operators.DistinctOperator$DistinctFunction) (36 buffers missing) • Memory is missing here: increase “taskmanager.network.numberOfBuffers” • Managed memory will shrink automatically
  • 38. What are these buffers needed for? • A small Flink cluster with 4 processing slots (on 2 Task Managers) • A simple MapReduce job in Flink: Map -> Reduce
  • 39. What are these buffers needed for? • MapReduce job with a parallelism of 2 and 2 processing slots per machine • (figure: Map and Reduce instances in Slot 1 and Slot 2 of TaskManager 1 and TaskManager 2, connected through network buffers) • 8 buffers for outgoing data • 8 buffers for incoming data
  • 40. What are these buffers needed for? • MapReduce job with a parallelism of 2 and 2 processing slots per machine • (figure: Map and Reduce instances in Slot 1 and Slot 2 of TaskManager 1 and TaskManager 2)
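
The slides do not give a sizing formula for taskmanager.network.numberOfBuffers. The plain-Java sketch below assumes the rule of thumb from the Flink configuration guide of that era: slots per TaskManager squared, times the number of TaskManagers, times 4. Treat the formula and all names in the code as assumptions and verify them against the documentation for your version.

```java
// Plain-Java sketch of a network buffer sizing rule of thumb; not Flink code.
final class NetworkBufferSketch {

    // Assumed rule of thumb: slotsPerTaskManager^2 * taskManagers * 4.
    static int recommendedBuffers(int slotsPerTaskManager, int taskManagers) {
        return slotsPerTaskManager * slotsPerTaskManager * taskManagers * 4;
    }

    // True if the configured value covers the rule of thumb.
    static boolean isSufficient(int configuredBuffers, int slotsPerTaskManager, int taskManagers) {
        return configuredBuffers >= recommendedBuffers(slotsPerTaskManager, taskManagers);
    }

    public static void main(String[] args) {
        // The small cluster from the slides: 2 TaskManagers with 2 slots each.
        System.out.println("small cluster: " + recommendedBuffers(2, 2));
        // A hypothetical larger cluster: 20 TaskManagers with 8 slots each.
        System.out.println("larger cluster: " + recommendedBuffers(8, 20));
        System.out.println("2048 buffers enough for the larger cluster: "
                + isSufficient(2048, 8, 20));
    }
}
```

The quadratic term is why a value that works on a small cluster can produce the "not enough buffers" error once slots per machine are increased.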
  • 41. Tuning options • CPU – Processing slots, threads, … • Memory – How to adjust memory usage on the TaskManager • I/O – Specifying temporary directories for spilling
  • 42. Tuning options • Memory – How to adjust memory usage on the TaskManager • CPU – Processing slots, threads, … • I/O – Specifying temporary directories for spilling
  • 43. Disk I/O • Sometimes your data doesn’t fit into main memory, so we have to spill to disk – taskmanager.tmp.dirs: /mnt/disk1,/mnt/disk2 • Use real local disks only (no tmpfs or NAS) • (figure: a Task Manager with a reader thread and a writer thread for each of Disk 1 and Disk 2)
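
As an illustration of how several spill directories spread the I/O load, the plain-Java sketch below parses a taskmanager.tmp.dirs value and hands out directories in round-robin order. It is not Flink's implementation: the class, its methods and the round-robin policy are assumptions made for the example.

```java
// Plain-Java sketch of rotating over spill directories; not Flink code.
final class SpillDirectorySketch {

    private final String[] directories;
    private int next = 0;

    // configValue is a comma-separated list such as "/mnt/disk1,/mnt/disk2".
    SpillDirectorySketch(String configValue) {
        String[] parts = configValue.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        this.directories = parts;
    }

    // Assumption: each new spill file goes to the next directory in turn.
    String nextDirectory() {
        String directory = directories[next];
        next = (next + 1) % directories.length;
        return directory;
    }

    public static void main(String[] args) {
        SpillDirectorySketch spill = new SpillDirectorySketch("/mnt/disk1,/mnt/disk2");
        for (int file = 0; file < 4; file++) {
            System.out.println("spill file " + file + " -> " + spill.nextDirectory());
        }
    }
}
```

Under this model, consecutive spill files alternate between the two disks, so both devices share the read and write load.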
  • 44. Outlook • Per job monitoring & metrics • Fewer configuration values thanks to dynamic memory management • Download operator results to debug them locally
  • 45. Join our community • RTFM (= read the documentation) • Mailing lists – Subscribe: user-subscribe@flink.apache.org – Ask: user@flink.apache.org • Stack Overflow – tag with “flink” so that we get an email notification ;) • IRC: freenode#flink • Read the code, it's open source
  • 46. Flink Forward registration & call for abstracts is open now • 12/13 October 2015 • Kulturbrauerei Berlin • With Flink Workshops / Trainings!

Editor's Notes

  1. My goal: Everybody finds a new, useful feature of Flink in this talk!
  2. scripts, no typing required
  3. An entire slide about cloud computing without having “cloud” on it
  4. bin/start-cluster.sh is also the option for those with Flink “on premise”
  5. this way you can also start multiple threads per disk