Introduction to Apache Flink™
Maximilian Michels
mxm@apache.org
@stadtlegende
EIT ICT Summer School 2015
Ufuk Celebi
uce@apache.org
@iamuce
About Us
 Ufuk Celebi <uce@apache.org>
 Maximilian Michels <mxm@apache.org>
 Apache Flink Committers
 Working full-time on Apache Flink at dataArtisans
2
Structure
Goal: Get you started with Apache Flink
 Overview
 Internals
 APIs
 Exercise
 Demo
 Advanced features
3
Exercises
 We will ask you to do an exercise using Flink
 Please check out the website for this session:
http://dataArtisans.github.io/eit-summerschool-15
 The website also contains a solution, which we will present later on, but first try to solve the exercise yourself.
4
1 year of Flink - code: April 2014 vs. April 2015
Flink Community
[Chart: #unique contributor ids by git commits, Aug-10 through Jul-15]
In top 5 of Apache's big data projects after one year in the Apache Software Foundation
The Apache Way
 Independent, non-profit organization
 Community-driven open source software development approach
 Consensus-based decision making
 Public communication and open to new contributors
7
Integration into the Big Data Stack
8
9
Apache Flink: a stream processor with many applications, built on a streaming dataflow runtime
What can I do with Flink?
10
Basic API Concept
How do I write a Flink program?
1. Bootstrap sources
2. Apply operations
3. Output to a sink
(a minimal sketch of these three steps follows below)
11
Source → Data Stream → Operation → Data Stream → Sink
Source → Data Set → Operation → Data Set → Sink
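For illustration, a minimal sketch of these three steps with the Java DataSet API; the input/output paths and the uppercasing operation are placeholders, not part of the original slides:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class ThreeStepSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // 1. bootstrap a source
        DataSet<String> lines = env.readTextFile("/path/to/input");
        // 2. apply an operation
        DataSet<String> upper = lines.map(new MapFunction<String, String>() {
            @Override
            public String map(String value) {
                return value.toUpperCase();
            }
        });
        // 3. output to a sink
        upper.writeAsText("/path/to/output");
        env.execute("Three-step sketch");
    }
}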
Stream & Batch Processing
 DataStream API example: a stock feed (Name, Price: Microsoft 124, Google 516, Apple 235, …) flowing through operations such as "Alert if Microsoft > 120", "Write event to database", "Sum every 10 seconds", "Alert if sum > 10000"
 DataSet API example: a Map/Reduce job over tabular data
12
Streaming & Batch
13
              Batch                  Streaming
Input         finite                 infinite
Data transfer blocking or pipelined  pipelined
Latency       high                   low
Scaling out
14
[Diagram: the same Source → Data Set → Operation → Data Set → Sink pipeline replicated across many parallel instances]
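As a hedged illustration (the value 8 is arbitrary): scaling out usually just means raising the program's parallelism, so that each operation is split into more parallel tasks.

// Sketch: run every operator of this program with 8 parallel instances,
// assuming the cluster has enough task slots available.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(8);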
Scaling up
15
Flink’s Architecture
16
Architecture Overview
 Client
 Master (Job Manager)
 Worker (Task Manager)
17
[Diagram: Client → Job Manager → Task Manager, Task Manager, Task Manager]
Client
 Optimize
 Construct job graph
 Pass job graph to job manager
 Retrieve job results
18
[Diagram: the Client runs type extraction and the optimizer and sends the resulting job graph to the Job Manager]

case class Path(from: Long, to: Long)

val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges)
    .where("to")
    .equalTo("from") { (path, edge) =>
      Path(path.from, edge.to)
    }
    .union(paths)
    .distinct()
  next
}

[Diagram: optimized execution plan: DataSource orders.tbl → Filter → Map and DataSource lineitem.tbl feed a hash-partitioned Hybrid Hash Join (build HT / probe), followed by GroupReduce (sort) and a forward exchange]
Job Manager
 Parallelization: Create Execution Graph
 Scheduling: Assign tasks to task managers
 State tracking: Supervise the execution
19
[Diagram: the Job Manager splits the optimized plan (DataSource orders.tbl → Filter → Map, DataSource lineitem.tbl, hash-partitioned Hybrid Hash Join with build/probe sides, GroupReduce with sort, forward exchange) into parallel tasks and assigns them to the Task Managers]
Task Manager
 Operations are split up into tasks depending on the specified parallelism
 Each parallel instance of an operation runs in a separate task slot
 The scheduler may run several tasks from different operators in one task slot
20
[Diagram: Task Managers with one or more task slots each]
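As a hedged illustration of task slots: the number of slots per Task Manager is a configuration setting (the key is from flink-conf.yaml; the value 4 is arbitrary):

# flink-conf.yaml (sketch)
taskmanager.numberOfTaskSlots: 4

With this setting, one Task Manager can execute up to four parallel tasks, possibly belonging to different operators, at the same time.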
From Program to Execution
21
case class Path(from: Long, to: Long)

val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges)
    .where("to")
    .equalTo("from") { (path, edge) =>
      Path(path.from, edge.to)
    }
    .union(paths)
    .distinct()
  next
}

[Diagram: Pre-flight (Client): program → type extraction → optimizer → dataflow graph (the optimized plan shown before); the Job Manager schedules the tasks using the dataflow metadata, deploys operators to the Task Managers, and tracks intermediate results]
Flink's execution model
22
Flink execution model
 A program is a graph (DAG) of operators
 Operators = computation + state
 Operators produce intermediate results = logical streams of records
 Other operators can consume those
23
[Diagram: map and join operators feed a sum operator through intermediate results ID1, ID2, ID3]
24
A map-reduce job with Flink
[Diagram: ExecutionGraph in the JobManager: map tasks M1, M2 and reduce tasks R1, R2 spread over TaskManager 1 and TaskManager 2, connected through "blocked" result partitions RP1, RP2]
25
Execution Graph
[Diagram: the same Execution Graph, managed by the Job Manager across Task Manager 1 and Task Manager 2; for streaming, the result partitions are "pipelined" instead of blocked]
Non-native streaming
26
[Diagram: a stream discretizer repeatedly issues small batch jobs]
while (true) {
  // get next few records
  // issue batch job
}
Batch on Streaming
 Batch programs are a special kind of streaming program
27
Streaming Programs       Batch Programs
Infinite Streams         Finite Streams
Stream Windows           Global View
Pipelined Data Exchange  Pipelined or Blocking Exchange
Stream processor applications
28
Stream processing
Batch processing
Machine Learning at scale
Graph Analysis
Demo
29
Introduction to the DataSet API
30
Flink’s APIs
31
[Diagram: DataStream (Java/Scala) and DataSet (Java/Scala) APIs on top of the streaming dataflow runtime]
API Preview
32
case class Word (word: String, frequency: Int)

DataStream API (streaming):

val lines: DataStream[String] = env.fromSocketStream(...)
lines.flatMap { line => line.split(" ")
      .map(word => Word(word, 1)) }
     .window(Time.of(5, SECONDS)).every(Time.of(1, SECONDS))
     .groupBy("word").sum("frequency")
     .print()

DataSet API (batch):

val lines: DataSet[String] = env.readTextFile(...)
lines.flatMap { line => line.split(" ")
      .map(word => Word(word, 1)) }
     .groupBy("word").sum("frequency")
     .print()
WordCount: main method
public static void main(String[] args) throws Exception {
// set up the execution environment
final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
// get input data either from file or use example data
DataSet<String> inputText = env.readTextFile(args[0]);
DataSet<Tuple2<String, Integer>> counts =
// split up the lines in tuples containing: (word,1)
inputText.flatMap(new Tokenizer())
// group by the tuple field "0"
.groupBy(0)
//sum up tuple field "1"
.reduceGroup(new SumWords());
// emit result
counts.writeAsCsv(args[1], "\n", " ");
// execute program
env.execute("WordCount Example");
}
33
Execution Environment
public static void main(String[] args) throws Exception {
// set up the execution environment
final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
// get input data either from file or use example data
DataSet<String> inputText = env.readTextFile(args[0]);
DataSet<Tuple2<String, Integer>> counts =
// split up the lines in tuples containing: (word,1)
inputText.flatMap(new Tokenizer())
// group by the tuple field "0"
.groupBy(0)
//sum up tuple field "1"
.reduceGroup(new SumWords());
// emit result
counts.writeAsCsv(args[1], "\n", " ");
// execute program
env.execute("WordCount Example");
}
34
Data Sources
public static void main(String[] args) throws Exception {
// set up the execution environment
final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
// get input data either from file or use example data
DataSet<String> inputText = env.readTextFile(args[0]);
DataSet<Tuple2<String, Integer>> counts =
// split up the lines in tuples containing: (word,1)
inputText.flatMap(new Tokenizer())
// group by the tuple field "0"
.groupBy(0)
//sum up tuple field "1"
.reduceGroup(new SumWords());
// emit result
counts.writeAsCsv(args[1], "\n", " ");
// execute program
env.execute("WordCount Example");
}
35
Data types
public static void main(String[] args) throws Exception {
// set up the execution environment
final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
// get input data either from file or use example data
DataSet<String> inputText = env.readTextFile(args[0]);
DataSet<Tuple2<String, Integer>> counts =
// split up the lines in tuples containing: (word,1)
inputText.flatMap(new Tokenizer())
// group by the tuple field "0"
.groupBy(0)
//sum up tuple field "1"
.reduceGroup(new SumWords());
// emit result
counts.writeAsCsv(args[1], "\n", " ");
// execute program
env.execute("WordCount Example");
}
36
Transformations
public static void main(String[] args) throws Exception {
// set up the execution environment
final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
// get input data either from file or use example data
DataSet<String> inputText = env.readTextFile(args[0]);
DataSet<Tuple2<String, Integer>> counts =
// split up the lines in tuples containing: (word,1)
inputText.flatMap(new Tokenizer())
// group by the tuple field "0"
.groupBy(0)
//sum up tuple field "1"
.reduceGroup(new SumWords());
// emit result
counts.writeAsCsv(args[1], "\n", " ");
// execute program
env.execute("WordCount Example");
}
37
User functions
public static void main(String[] args) throws Exception {
// set up the execution environment
final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
// get input data either from file or use example data
DataSet<String> inputText = env.readTextFile(args[0]);
DataSet<Tuple2<String, Integer>> counts =
// split up the lines in tuples containing: (word,1)
inputText.flatMap(new Tokenizer())
// group by the tuple field "0"
.groupBy(0)
//sum up tuple field "1"
.reduceGroup(new SumWords());
// emit result
counts.writeAsCsv(args[1], "\n", " ");
// execute program
env.execute("WordCount Example");
}
38
DataSinks
public static void main(String[] args) throws Exception {
// set up the execution environment
final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
// get input data either from file or use example data
DataSet<String> inputText = env.readTextFile(args[0]);
DataSet<Tuple2<String, Integer>> counts =
// split up the lines in tuples containing: (word,1)
inputText.flatMap(new Tokenizer())
// group by the tuple field "0"
.groupBy(0)
//sum up tuple field "1"
.reduceGroup(new SumWords());
// emit result
counts.writeAsCsv(args[1], "\n", " ");
// execute program
env.execute("WordCount Example");
}
39
Execute!
public static void main(String[] args) throws Exception {
// set up the execution environment
final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
// get input data either from file or use example data
DataSet<String> inputText = env.readTextFile(args[0]);
DataSet<Tuple2<String, Integer>> counts =
// split up the lines in tuples containing: (word,1)
inputText.flatMap(new Tokenizer())
// group by the tuple field "0"
.groupBy(0)
//sum up tuple field "1"
.reduceGroup(new SumWords());
// emit result
counts.writeAsCsv(args[1], "\n", " ");
// execute program
env.execute("WordCount Example");
}
40
WordCount: Map
public static class Tokenizer
implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value,
Collector<Tuple2<String, Integer>> out) {
// normalize and split the line
String[] tokens = value.toLowerCase().split("\\W+");
// emit the pairs
for (String token : tokens) {
if (token.length() > 0) {
out.collect(
new Tuple2<String, Integer>(token, 1));
}
}
}
} 41
WordCount: Map: Interface
public static class Tokenizer
implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value,
Collector<Tuple2<String, Integer>> out) {
// normalize and split the line
String[] tokens = value.toLowerCase().split("\\W+");
// emit the pairs
for (String token : tokens) {
if (token.length() > 0) {
out.collect(
new Tuple2<String, Integer>(token, 1));
}
}
}
} 42
WordCount: Map: Types
public static class Tokenizer
implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value,
Collector<Tuple2<String, Integer>> out) {
// normalize and split the line
String[] tokens = value.toLowerCase().split("\\W+");
// emit the pairs
for (String token : tokens) {
if (token.length() > 0) {
out.collect(
new Tuple2<String, Integer>(token, 1));
}
}
}
} 43
WordCount: Map: Collector
public static class Tokenizer
implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value,
Collector<Tuple2<String, Integer>> out) {
// normalize and split the line
String[] tokens = value.toLowerCase().split("\\W+");
// emit the pairs
for (String token : tokens) {
if (token.length() > 0) {
out.collect(
new Tuple2<String, Integer>(token, 1));
}
}
}
} 44
WordCount: Reduce
public static class SumWords implements
GroupReduceFunction<Tuple2<String, Integer>,
Tuple2<String, Integer>> {
@Override
public void reduce(Iterable<Tuple2<String, Integer>> values,
Collector<Tuple2<String, Integer>> out) {
int count = 0;
String word = null;
for (Tuple2<String, Integer> tuple : values) {
word = tuple.f0;
count++;
}
out.collect(new Tuple2<String, Integer>(word, count));
}
}
45
WordCount: Reduce: Interface
public static class SumWords implements
GroupReduceFunction<Tuple2<String, Integer>,
Tuple2<String, Integer>> {
@Override
public void reduce(Iterable<Tuple2<String, Integer>> values,
Collector<Tuple2<String, Integer>> out) {
int count = 0;
String word = null;
for (Tuple2<String, Integer> tuple : values) {
word = tuple.f0;
count++;
}
out.collect(new Tuple2<String, Integer>(word, count));
}
}
46
WordCount: Reduce: Types
public static class SumWords implements
GroupReduceFunction<Tuple2<String, Integer>,
Tuple2<String, Integer>> {
@Override
public void reduce(Iterable<Tuple2<String, Integer>> values,
Collector<Tuple2<String, Integer>> out) {
int count = 0;
String word = null;
for (Tuple2<String, Integer> tuple : values) {
word = tuple.f0;
count++;
}
out.collect(new Tuple2<String, Integer>(word, count));
}
}
47
WordCount: Reduce: Collector
public static class SumWords implements
GroupReduceFunction<Tuple2<String, Integer>,
Tuple2<String, Integer>> {
@Override
public void reduce(Iterable<Tuple2<String, Integer>> values,
Collector<Tuple2<String, Integer>> out) {
int count = 0;
String word = null;
for (Tuple2<String, Integer> tuple : values) {
word = tuple.f0;
count++;
}
out.collect(new Tuple2<String, Integer>(word, count));
}
}
48
DataSet API Concepts
49
Data Types
 Basic Java Types
• String, Long, Integer, Boolean,…
• Arrays
 Composite Types
• Tuple
• POJO (Java objects)
• Custom type
50
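A hedged sketch of a POJO that Flink can treat as a composite type; the class and its fields are illustrative, not from the slides:

// Flink treats a class as a POJO type if it is public, has a public
// no-argument constructor, and all fields are public or have getters/setters.
public class Person {
    public String name;
    public int age;

    public Person() {}

    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

Such a type can then be grouped by field name, e.g. persons.groupBy("age").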
Tuples
 The easiest, most lightweight, and generic way of encapsulating data in Flink
 Tuple1 up to Tuple25
Tuple3<String, String, Integer> person =
new Tuple3<>("Max", "Mustermann", 42);
// zero based index!
String firstName = person.f0;
String secondName = person.f1;
Integer age = person.f2;
51
Transformations: Map
DataSet<Integer> integers = env.fromElements(1, 2, 3, 4);
// Regular Map - Takes one element and produces one element
DataSet<Integer> doubleIntegers =
integers.map(new MapFunction<Integer, Integer>() {
@Override
public Integer map(Integer value) {
return value * 2;
}
});
doubleIntegers.print();
> 2, 4, 6, 8
// Flat Map - Takes one element and produces zero, one, or more elements.
DataSet<Integer> doubleIntegers2 =
integers.flatMap(new FlatMapFunction<Integer, Integer>() {
@Override
public void flatMap(Integer value, Collector<Integer> out) {
out.collect(value * 2);
}
});
doubleIntegers2.print();
> 2, 4, 6, 8
52
Transformations: Filter
// The DataSet
DataSet<Integer> integers = env.fromElements(1, 2, 3, 4);
DataSet<Integer> filtered =
integers.filter(new FilterFunction<Integer>() {
@Override
public boolean filter(Integer value) {
return value != 3;
}
});
filtered.print();
> 1, 2, 4
53
Groupings and Reduce
// (name, age) of employees
DataSet<Tuple2<String, Integer>> employees = …
// group by second field (age)
DataSet<Tuple2<Integer, Integer>> grouped = employees.groupBy(1)
// return a list of age groups with their counts
.reduceGroup(new CountSameAge());
54
Name Age
Stephan 18
Fabian 23
Julia 27
Romeo 27
Anna 18
• DataSets can be split into groups
• Groups are defined using a common key
AgeGroup Count
18 2
23 1
27 2
GroupReduce
public static class CountSameAge implements GroupReduceFunction <Tuple2<String,
Integer>, Tuple2<Integer, Integer>> {
@Override
public void reduce(Iterable<Tuple2<String, Integer>> values,
Collector<Tuple2<Integer, Integer>> out) {
Integer ageGroup = 0;
Integer countsInGroup = 0;
for (Tuple2<String, Integer> person : values) {
ageGroup = person.f1;
countsInGroup++;
}
out.collect(new Tuple2<Integer, Integer>
(ageGroup, countsInGroup));
}
}
55
Joining two DataSets
// authors (id, name, email)
DataSet<Tuple3<Integer, String, String>> authors = ..;
// posts (title, content, author_id)
DataSet<Tuple3<String, String, Integer>> posts = ..;
DataSet<Tuple2<
Tuple3<Integer, String, String>,
Tuple3<String, String, Integer>
>> archive = authors.join(posts).where(0).equalTo(2);
56
Authors
Id Name email
1 Fabian fabian@..
2 Julia julia@...
3 Max max@...
4 Romeo romeo@.
Posts
Title Content Author id
… … 2
.. .. 4
.. .. 4
.. .. 1
.. .. 2
Joining two DataSets
// authors (id, name, email)
DataSet<Tuple3<Integer, String, String>> authors = ..;
// posts (title, content, author_id)
DataSet<Tuple3<String, String, Integer>> posts = ..;
DataSet<Tuple2<
Tuple3<Integer, String, String>,
Tuple3<String, String, Integer>
>> archive = authors.join(posts).where(0).equalTo(2);
57
Archive
Id Name email Title Content Author id
1 Fabian fabian@.. .. .. 1
2 Julia julia@... .. .. 2
2 Julia julia@... .. .. 2
4 Romeo romeo@... .. .. 4
4 Romeo romeo@. .. .. 4
Join with join function
// authors (id, name, email)
DataSet<Tuple3<Integer, String, String>> authors = ..;
// posts (title, content, author_id)
DataSet<Tuple3<String, String, Integer>> posts = ..;
// (title, author name)
DataSet<Tuple2<String, String>> archive =
authors.join(posts).where(0).equalTo(2)
.with(new PostsByUser());
public static class PostsByUser implements
JoinFunction<Tuple3<Integer, String, String>,
Tuple3<String, String, Integer>,
Tuple2<String, String>> {
@Override
public Tuple2<String, String> join(
Tuple3<Integer, String, String> left,
Tuple3<String, String, Integer> right) {
return new Tuple2<String, String>(left.f1, right.f0);
}
}
58
Archive
Name Title
Fabian ..
Julia ..
Julia ..
Romeo ..
Romeo ..
Data Sources / Data Sinks
59
Data Sources
Text
 readTextFile("/path/to/file")
CSV
 readCsvFile("/path/to/file")
Collection
 fromCollection(collection)
 fromElements(1,2,3,4,5)
60
Data Sources: Collections
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// read from elements
DataSet<String> names = env.fromElements("Some", "Example", "Strings");
// read from Java collection
List<String> list = new ArrayList<String>();
list.add("Some");
list.add("Example");
list.add("Strings");
DataSet<String> names = env.fromCollection(list);
61
Data Sources: File-Based
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// read text file from local or distributed file system
DataSet<String> localLines =
    env.readTextFile("/path/to/my/textfile");
// read a CSV file with three fields
DataSet<Tuple3<Integer, String, Double>> csvInput =
    env.readCsvFile("/the/CSV/file")
        .types(Integer.class, String.class, Double.class);
// read a CSV file with five fields, taking only two of them
DataSet<Tuple2<String, Double>> csvInput =
    env.readCsvFile("/the/CSV/file")
        // take the first and the fourth field
        .includeFields("10010")
        .types(String.class, Double.class);
62
Data Sinks
Text
 writeAsText("/path/to/file")
 writeAsFormattedText("/path/to/file", formatFunction)
CSV
 writeAsCsv("/path/to/file")
Return data to the Client
 print()
 collect()
 count()
63
Data Sinks (lazy)
 Lazily executed when env.execute() is called
DataSet<…> result;
// write DataSet to a file on the local file system
result.writeAsText("/path/to/file");
// write DataSet to a file and overwrite the file if it exists
result.writeAsText("/path/to/file", FileSystem.WriteMode.OVERWRITE);
// tuples as lines with pipe as the separator "a|b|c"
result.writeAsCsv("/path/to/file", "\n", "|");
// this writes values as strings using a user-defined TextFormatter object
result.writeAsFormattedText("/path/to/file",
    new TextFormatter<Tuple2<Integer, Integer>>() {
        public String format(Tuple2<Integer, Integer> value) {
            return value.f1 + " - " + value.f0;
        }
    });
64
Data Sinks (eager)
 Eagerly executed
DataSet<Tuple2<String, Integer>> result;
// print
result.print();
// count
long numberOfElements = result.count();
// collect
List<Tuple2<String, Integer>> materializedResults = result.collect();
65
Execution Setups
66
Ways to Run a Flink Program
67
[Diagram: DataSet (Java/Scala/Python) and DataStream (Java/Scala) APIs on top of the streaming dataflow runtime, which can run Local, Remote, on Yarn, on Tez, or Embedded]
Local Execution
 Starts a local Flink cluster
 All processes run in the same JVM
 Behaves just like a regular cluster
 Very useful for developing and debugging
68
[Diagram: Job Manager and several Task Managers all inside a single JVM]
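A hedged sketch of asking for a local environment explicitly; getExecutionEnvironment() already returns one when the program is started from the IDE:

// Everything (job manager, task managers, user code) runs inside the current JVM.
ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();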
Remote Execution
 The cluster mode
 Submit a job remotely
 Monitors the status of the job
70
[Diagram: the Client submits the job to the Job Manager of a cluster of Task Managers]
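A hedged sketch of submitting directly from the client to a remote cluster; the host name, port, and jar path are placeholders:

// The jar with the program's user functions is shipped to the cluster.
// 6123 is Flink's default Job Manager RPC port.
ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
        "jobmanager-host", 6123, "/path/to/program.jar");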
YARN Execution
 Multi-user scenario
 Resource sharing
 Uses YARN containers to run a Flink cluster
 Very easy to set up Flink
71
[Diagram: the Client talks to the YARN Resource Manager; Node Managers host the Flink Job Manager, the Task Managers, and other applications]
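A hedged command-line sketch of starting a Flink YARN session (flag names as in the Flink 0.9 YARN client; the numbers are illustrative):

# start a YARN session with 4 Task Manager containers of 1024 MB each
./bin/yarn-session.sh -n 4 -tm 1024
# jobs are then submitted to this session as usual
./bin/flink run /path/to/program.jar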
Exercises and Hands-on
http://dataArtisans.github.io/eit-summerschool-15
73
Closing
74
tl;dr: What was this about?
 The case for Flink
• Low latency
• High throughput
• Fault-tolerant
• Easy to use APIs, library ecosystem
• Growing community
 A stream processor that is great for batch analytics as well
75
I ♥ Flink, do you?
76
 Get involved and start a discussion on Flink's mailing list
 { user, dev }@flink.apache.org
 Subscribe to news@flink.apache.org
 Follow flink.apache.org/blog and
@ApacheFlink on Twitter
77
flink-forward.org
Call for papers deadline:
August 14, 2015
October 12-13, 2015
Discount code: FlinkEITSummerSchool25
Thank you for listening!
78

More Related Content

What's hot

Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
Apache flink
Apache flinkApache flink
Apache flink
pranay kumar
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
Slim Baltagi
 
Apache flink
Apache flinkApache flink
Apache flink
Ahmed Nader
 
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Zalando Technology
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
Ververica
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
Till Rohrmann
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
Kostas Tzoumas
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
Fabian Hueske
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Batch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkBatch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache Flink
Vasia Kalavri
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Kai Wähner
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
confluent
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
DataWorks Summit/Hadoop Summit
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
Gwen (Chen) Shapira
 

What's hot (20)

Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Apache flink
Apache flinkApache flink
Apache flink
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Apache flink
Apache flinkApache flink
Apache flink
 
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Batch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkBatch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache Flink
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
 

Similar to Introduction to Apache Flink

Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and Friends
Stephan Ewen
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
Databricks
 
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Databricks
 
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CAApache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Robert Metzger
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
Albert Bifet
 
(2) c sharp introduction_basics_part_i
(2) c sharp introduction_basics_part_i(2) c sharp introduction_basics_part_i
(2) c sharp introduction_basics_part_i
Nico Ludwig
 
Unified batch and stream processing with Flink @ Big Data Beers Berlin May 2015
Unified batch and stream processing with Flink @ Big Data Beers Berlin May 2015Unified batch and stream processing with Flink @ Big Data Beers Berlin May 2015
Unified batch and stream processing with Flink @ Big Data Beers Berlin May 2015
Robert Metzger
 
A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark
Anyscale
 
Data Analysis With Apache Flink
Data Analysis With Apache FlinkData Analysis With Apache Flink
Data Analysis With Apache Flink
DataWorks Summit
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Aljoscha Krettek
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
Iván Fernández Perea
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Robert Metzger
 
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
confluent
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
Databricks
 
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016 A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
Databricks
 
A Brief Conceptual Introduction to Functional Java 8 and its API
A Brief Conceptual Introduction to Functional Java 8 and its APIA Brief Conceptual Introduction to Functional Java 8 and its API
A Brief Conceptual Introduction to Functional Java 8 and its API
Jörn Guy Süß JGS
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonStephan Ewen
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
Databricks
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
DataWorks Summit
 

Similar to Introduction to Apache Flink (20)

Flink internals web
Flink internals web Flink internals web
Flink internals web
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and Friends
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
 
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
 
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CAApache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
(2) c sharp introduction_basics_part_i
(2) c sharp introduction_basics_part_i(2) c sharp introduction_basics_part_i
(2) c sharp introduction_basics_part_i
 
Unified batch and stream processing with Flink @ Big Data Beers Berlin May 2015
Unified batch and stream processing with Flink @ Big Data Beers Berlin May 2015Unified batch and stream processing with Flink @ Big Data Beers Berlin May 2015
Unified batch and stream processing with Flink @ Big Data Beers Berlin May 2015
 
A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark
 
Data Analysis With Apache Flink
Data Analysis With Apache FlinkData Analysis With Apache Flink
Data Analysis With Apache Flink
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
 
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016 A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
 
A Brief Conceptual Introduction to Functional Java 8 and its API
A Brief Conceptual Introduction to Functional Java 8 and its APIA Brief Conceptual Introduction to Functional Java 8 and its API
A Brief Conceptual Introduction to Functional Java 8 and its API
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 

Recently uploaded

哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 

Recently uploaded (20)

哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 

Introduction to Apache Flink

  • 1. Introduction to Apache Flink™ Maximilian Michels mxm@apache.org @stadtlegende EIT ICT Summer School 2015 Ufuk Celebi uce@apache.org @iamuce
  • 2. About Us  Ufuk Celebi <uce@apache.org>  Maximilian Michels <mxm@apache.org>  Apache Flink Committers  Working full-time on Apache Flink at dataArtisans 2
  • 3. Structure Goal: Get you started with Apache Flink  Overview  Internals  APIs  Exercise  Demo  Advanced features 3
  • 4. Exercises  We will ask you to do an exercise using Flink  Please check out the website for this session: http://dataArtisans.github.io/eit-summerschool-15  The website also contains a solution which we will present later on but, first, try to solve the exercise. 4
  • 5. 1 year of Flink - code April 2014 April 2015
  • 6. Flink Community 0 20 40 60 80 100 120 Aug-10 Feb-11 Sep-11 Apr-12 Oct-12 May-13 Nov-13 Jun-14 Dec-14 Jul-15 #unique contributor ids by git commits In top 5 of Apache's big data projects after one year in the Apache Software Foundation
  • 7. The Apache Way  Independent, non-profit organization  Community-driven open source software development approach  Consensus-based decision making  Public communication and open to new contributors 7
  • 8. Integration into the Big Data Stack 8
  • 9. 9 A stream processor with many applications Streaming dataflow runtime Apache Flink
  • 10. What can I do with Flink? 10
  • 11. Basic API Concept How do I write a Flink program? 1. Bootstrap sources 2. Apply operations 3. Output to source 11 Data Stream Operation Data Stream Source Sink Data Set Operation Data Set Source Sink
  • 12. Stream & Batch Processing  DataStream API 12 Stock Feed Name Price Microsoft 124 Google 516 Apple 235 … … Alert if Microsoft > 120 Write event to database Sum every 10 seconds Alert if sum > 10000 Microsoft 124 Google 516 Apple 235 Microsoft 124 Google 516 Apple 235 b h 2 1 3 5 7 4 … … Map Reduce a 1 2 …  DataSet API
  • 13. Streaming & Batch 13 Batch finite blocking or pipelined high Streaming infinite pipelined low Input Data transfer Latency
  • 14. Scaling out 14 Data Set Operation Data Set Source Sink Data Set Operation Data Set Source SinkData Set Operation Data Set Source SinkData Set Operation Data Set Source SinkData Set Operation Data Set Source SinkData Set Operation Data Set Source SinkData Set Operation Data Set Source SinkData Set Operation Data Set Source Sink
  • 17. Architecture Overview  Client  Master (Job Manager)  Worker (Task Manager) 17 Client Job Manager Task Manager Task Manager Task Manager
  • 18. Client  Optimize  Construct job graph  Pass job graph to job manager  Retrieve job results 18 Job Manager Client case class Path (from: Long, to: Long) val tc = edges.iterate(10) { paths: DataSet[Path] => val next = paths .join(edges) .where("to") .equalTo("from") { (path, edge) => Path(path.from, edge.to) } .union(paths) .distinct() next } Optimizer Type extraction Data Source orders.tbl Filter Map DataSourc e lineitem.tbl Join Hybrid Hash build HT probe hash-part [0] hash-part [0] GroupRed sort forward
  • 19. Job Manager  Parallelization: Create Execution Graph  Scheduling: Assign tasks to task managers  State tracking: Supervise the execution 19 Job Manager Data Source orders.tbl Filter Map DataSour ce lineitem.tbl Join Hybrid Hash build HT prob e hash-part [0] hash-part [0] GroupRed sort forwar d Task Manager Task Manager Task Manager Task Manager Data Source orders.tbl Filter Map DataSour ce lineitem.tbl Join Hybrid Hash build HT prob e hash-part [0] hash-part [0] GroupRed sort forwar d Data Source orders.tbl Filter Map DataSour ce lineitem.tbl Join Hybrid Hash build HT prob e hash-part [0] hash-part [0] GroupRed sort forwar d Data Source orders.tbl Filter Map DataSour ce lineitem.tbl Join Hybrid Hash build HT prob e hash-part [0] hash-part [0] GroupRed sort forwar d Data Source orders.tbl Filter Map DataSour ce lineitem.tbl Join Hybrid Hash build HT prob e hash-part [0] hash-part [0] GroupRed sort forwar d
  • 20. Task Manager  Operations are split up into tasks depending on the specified parallelism  Each parallel instance of an operation runs in a separate task slot  The scheduler may run several tasks from different operators in one task slot 20 Task Manager S l o t Task ManagerTask Manager S l o t S l o t
  • 21. From Program to Execution 21 case class Path (from: Long, to: Long) val tc = edges.iterate(10) { paths: DataSet[Path] => val next = paths .join(edges) .where("to") .equalTo("from") { (path, edge) => Path(path.from, edge.to) } .union(paths) .distinct() next } Optimizer Type extraction stack Task scheduling Dataflow metadata Pre-flight (Client) Job Manager Task Managers Data Source orders.tbl Filter Map DataSourc e lineitem.tbl Join Hybrid Hash build HT probe hash-part [0] hash-part [0] GroupRed sort forward Program Dataflow Graph deploy operators track intermediate results
  • 23. Flink execution model  A program is a graph (DAG) of operators  Operators = computation + state  Operators produce intermediate results = logical streams of records  Other operators can consume those 23 map join sum ID1 ID2 ID3
  • 24. 24 A map-reduce job with Flink ExecutionGraph JobManager TaskManager 1 TaskManager 2 M1 M2 RP1RP2 R1 R2 1 2 3a 3b 4a 4b 5a 5b "Blocked" result partition
  • 25. 25 Execution Graph Job Manager Task Manager 1 Task Manager 2 M1 M2 RP1RP2 R1 R2 1 2 3a 3b 4a 4b 5a 5b Streaming "Pipelined" result partition
  • 26. Non-native streaming 26 stream discretizer Job Job Job Job while (true) { // get next few records // issue batch job }
  • 27. Batch on Streaming  Batch programs are a special kind of streaming program 27 Infinite Streams Finite Streams Stream Windows Global View Pipelined Data Exchange Pipelined or Blocking Exchange Streaming Programs Batch Programs
  • 30. Introduction to the DataSet API 30
  • 31. Flink’s APIs 31 DataStream (Java/Scala) Streaming dataflow runtime DataSet (Java/Scala)
  • 32. API Preview 32 case class Word (word: String, frequency: Int) val lines: DataStream[String] = env.fromSocketStream(...) lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .window(Time.of(5,SECONDS)).every(Time.of(1,SECONDS)) .groupBy("word").sum("frequency") .print() val lines: DataSet[String] = env.readTextFile(...) lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .groupBy("word").sum("frequency") .print() DataSet API (batch): DataStream API (streaming):
  • 33. WordCount: main method public static void main(String[] args) throws Exception { // set up the execution environment final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); // get input data either from file or use example data DataSet<String> inputText = env.readTextFile(args[0]); DataSet<Tuple2<String, Integer>> counts = // split up the lines into tuples containing: (word, 1) inputText.flatMap(new Tokenizer()) // group by the tuple field "0" .groupBy(0) // sum up tuple field "1" .reduceGroup(new SumWords()); // emit result counts.writeAsCsv(args[1], "\n", " "); // execute program env.execute("WordCount Example"); } 33
  • 34. Execution Environment public static void main(String[] args) throws Exception { // set up the execution environment final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); // get input data either from file or use example data DataSet<String> inputText = env.readTextFile(args[0]); DataSet<Tuple2<String, Integer>> counts = // split up the lines into tuples containing: (word, 1) inputText.flatMap(new Tokenizer()) // group by the tuple field "0" .groupBy(0) // sum up tuple field "1" .reduceGroup(new SumWords()); // emit result counts.writeAsCsv(args[1], "\n", " "); // execute program env.execute("WordCount Example"); } 34
  • 35. Data Sources public static void main(String[] args) throws Exception { // set up the execution environment final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); // get input data either from file or use example data DataSet<String> inputText = env.readTextFile(args[0]); DataSet<Tuple2<String, Integer>> counts = // split up the lines into tuples containing: (word, 1) inputText.flatMap(new Tokenizer()) // group by the tuple field "0" .groupBy(0) // sum up tuple field "1" .reduceGroup(new SumWords()); // emit result counts.writeAsCsv(args[1], "\n", " "); // execute program env.execute("WordCount Example"); } 35
  • 36. Data types public static void main(String[] args) throws Exception { // set up the execution environment final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); // get input data either from file or use example data DataSet<String> inputText = env.readTextFile(args[0]); DataSet<Tuple2<String, Integer>> counts = // split up the lines into tuples containing: (word, 1) inputText.flatMap(new Tokenizer()) // group by the tuple field "0" .groupBy(0) // sum up tuple field "1" .reduceGroup(new SumWords()); // emit result counts.writeAsCsv(args[1], "\n", " "); // execute program env.execute("WordCount Example"); } 36
  • 37. Transformations public static void main(String[] args) throws Exception { // set up the execution environment final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); // get input data either from file or use example data DataSet<String> inputText = env.readTextFile(args[0]); DataSet<Tuple2<String, Integer>> counts = // split up the lines into tuples containing: (word, 1) inputText.flatMap(new Tokenizer()) // group by the tuple field "0" .groupBy(0) // sum up tuple field "1" .reduceGroup(new SumWords()); // emit result counts.writeAsCsv(args[1], "\n", " "); // execute program env.execute("WordCount Example"); } 37
  • 38. User functions public static void main(String[] args) throws Exception { // set up the execution environment final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); // get input data either from file or use example data DataSet<String> inputText = env.readTextFile(args[0]); DataSet<Tuple2<String, Integer>> counts = // split up the lines into tuples containing: (word, 1) inputText.flatMap(new Tokenizer()) // group by the tuple field "0" .groupBy(0) // sum up tuple field "1" .reduceGroup(new SumWords()); // emit result counts.writeAsCsv(args[1], "\n", " "); // execute program env.execute("WordCount Example"); } 38
  • 39. Data Sinks public static void main(String[] args) throws Exception { // set up the execution environment final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); // get input data either from file or use example data DataSet<String> inputText = env.readTextFile(args[0]); DataSet<Tuple2<String, Integer>> counts = // split up the lines into tuples containing: (word, 1) inputText.flatMap(new Tokenizer()) // group by the tuple field "0" .groupBy(0) // sum up tuple field "1" .reduceGroup(new SumWords()); // emit result counts.writeAsCsv(args[1], "\n", " "); // execute program env.execute("WordCount Example"); } 39
  • 40. Execute! public static void main(String[] args) throws Exception { // set up the execution environment final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); // get input data either from file or use example data DataSet<String> inputText = env.readTextFile(args[0]); DataSet<Tuple2<String, Integer>> counts = // split up the lines into tuples containing: (word, 1) inputText.flatMap(new Tokenizer()) // group by the tuple field "0" .groupBy(0) // sum up tuple field "1" .reduceGroup(new SumWords()); // emit result counts.writeAsCsv(args[1], "\n", " "); // execute program env.execute("WordCount Example"); } 40
  • 41. WordCount: Map public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> { @Override public void flatMap(String value, Collector<Tuple2<String, Integer>> out) { // normalize and split the line String[] tokens = value.toLowerCase().split("\\W+"); // emit the pairs for (String token : tokens) { if (token.length() > 0) { out.collect( new Tuple2<String, Integer>(token, 1)); } } } } 41
  • 42. WordCount: Map: Interface public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> { @Override public void flatMap(String value, Collector<Tuple2<String, Integer>> out) { // normalize and split the line String[] tokens = value.toLowerCase().split("\\W+"); // emit the pairs for (String token : tokens) { if (token.length() > 0) { out.collect( new Tuple2<String, Integer>(token, 1)); } } } } 42
  • 43. WordCount: Map: Types public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> { @Override public void flatMap(String value, Collector<Tuple2<String, Integer>> out) { // normalize and split the line String[] tokens = value.toLowerCase().split("\\W+"); // emit the pairs for (String token : tokens) { if (token.length() > 0) { out.collect( new Tuple2<String, Integer>(token, 1)); } } } } 43
  • 44. WordCount: Map: Collector public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> { @Override public void flatMap(String value, Collector<Tuple2<String, Integer>> out) { // normalize and split the line String[] tokens = value.toLowerCase().split("\\W+"); // emit the pairs for (String token : tokens) { if (token.length() > 0) { out.collect( new Tuple2<String, Integer>(token, 1)); } } } } 44
  • 45. WordCount: Reduce public static class SumWords implements GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>> { @Override public void reduce(Iterable<Tuple2<String, Integer>> values, Collector<Tuple2<String, Integer>> out) { int count = 0; String word = null; for (Tuple2<String, Integer> tuple : values) { word = tuple.f0; count++; } out.collect(new Tuple2<String, Integer>(word, count)); } } 45
  • 46. WordCount: Reduce: Interface public static class SumWords implements GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>> { @Override public void reduce(Iterable<Tuple2<String, Integer>> values, Collector<Tuple2<String, Integer>> out) { int count = 0; String word = null; for (Tuple2<String, Integer> tuple : values) { word = tuple.f0; count++; } out.collect(new Tuple2<String, Integer>(word, count)); } } 46
  • 47. WordCount: Reduce: Types public static class SumWords implements GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>> { @Override public void reduce(Iterable<Tuple2<String, Integer>> values, Collector<Tuple2<String, Integer>> out) { int count = 0; String word = null; for (Tuple2<String, Integer> tuple : values) { word = tuple.f0; count++; } out.collect(new Tuple2<String, Integer>(word, count)); } } 47
  • 48. WordCount: Reduce: Collector public static class SumWords implements GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>> { @Override public void reduce(Iterable<Tuple2<String, Integer>> values, Collector<Tuple2<String, Integer>> out) { int count = 0; String word = null; for (Tuple2<String, Integer> tuple : values) { word = tuple.f0; count++; } out.collect(new Tuple2<String, Integer>(word, count)); } } 48
  • 50. Data Types  Basic Java Types • String, Long, Integer, Boolean,… • Arrays  Composite Types • Tuple • POJO (Java objects) • Custom type 50
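A hedged sketch of a custom POJO type (the Person class and its fields are made up for illustration): Flink can use a plain Java class as a data type if the class is public, has a public no-argument constructor, and its fields are either public or reachable through getters and setters.

public class Person {
    // fields must be public or exposed via getters/setters
    public String name;
    public int age;

    // a public no-argument constructor is required
    public Person() {}

    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// POJO fields can then be referenced by name as keys, e.g.
// DataSet<Person> people = ...;
// people.groupBy("age");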
  • 51. Tuples  The easiest and most lightweight generic way of encapsulating data in Flink  Tuple1 up to Tuple25 Tuple3<String, String, Integer> person = new Tuple3<>("Max", "Mustermann", 42); // zero-based index! String firstName = person.f0; String secondName = person.f1; Integer age = person.f2; 51
  • 52. Transformations: Map DataSet<Integer> integers = env.fromElements(1, 2, 3, 4); // Regular Map - Takes one element and produces one element DataSet<Integer> doubleIntegers = integers.map(new MapFunction<Integer, Integer>() { @Override public Integer map(Integer value) { return value * 2; } }); doubleIntegers.print(); > 2, 4, 6, 8 // Flat Map - Takes one element and produces zero, one, or more elements. DataSet<Integer> doubleIntegers2 = integers.flatMap(new FlatMapFunction<Integer, Integer>() { @Override public void flatMap(Integer value, Collector<Integer> out) { out.collect(value * 2); } }); doubleIntegers2.print(); > 2, 4, 6, 8 52
  • 53. Transformations: Filter // The DataSet DataSet<Integer> integers = env.fromElements(1, 2, 3, 4); DataSet<Integer> filtered = integers.filter(new FilterFunction<Integer>() { @Override public boolean filter(Integer value) { return value != 3; } }); filtered.print(); > 1, 2, 4 53
  • 54. Groupings and Reduce // (name, age) of employees DataSet<Tuple2<String, Integer>> employees = … // group by second field (age) DataSet<Tuple2<Integer, Integer>> grouped = employees.groupBy(1) // return a list of age groups with their counts .reduceGroup(new CountSameAge()); 54 Name Age Stephan 18 Fabian 23 Julia 27 Romeo 27 Anna 18 • DataSets can be split into groups • Groups are defined using a common key AgeGroup Count 18 2 23 1 27 2
  • 55. GroupReduce public static class CountSameAge implements GroupReduceFunction <Tuple2<String, Integer>, Tuple2<Integer, Integer>> { @Override public void reduce(Iterable<Tuple2<String, Integer>> values, Collector<Tuple2<Integer, Integer>> out) { Integer ageGroup = 0; Integer countsInGroup = 0; for (Tuple2<String, Integer> person : values) { ageGroup = person.f1; countsInGroup++; } out.collect(new Tuple2<Integer, Integer> (ageGroup, countsInGroup)); } } 55
  • 56. Joining two DataSets // authors (id, name, email) DataSet<Tuple3<Integer, String, String>> authors = ..; // posts (title, content, author_id) DataSet<Tuple3<String, String, Integer>> posts = ..; DataSet<Tuple2< Tuple3<Integer, String, String>, Tuple3<String, String, Integer> >> archive = authors.join(posts).where(0).equalTo(2); 56 Authors Id Name email 1 Fabian fabian@.. 2 Julia julia@... 3 Max max@... 4 Romeo romeo@. Posts Title Content Author id … … 2 .. .. 4 .. .. 4 .. .. 1 .. .. 2
  • 57. Joining two DataSets // authors (id, name, email) DataSet<Tuple3<Integer, String, String>> authors = ..; // posts (title, content, author_id) DataSet<Tuple3<String, String, Integer>> posts = ..; DataSet<Tuple2< Tuple3<Integer, String, String>, Tuple3<String, String, Integer> >> archive = authors.join(posts).where(0).equalTo(2); 57 Archive Id Name email Title Content Author id 1 Fabian fabian@.. .. .. 1 2 Julia julia@... .. .. 2 2 Julia julia@... .. .. 2 4 Romeo romeo@... .. .. 4 4 Romeo romeo@. .. .. 4
  • 58. Join with join function // authors (id, name, email) DataSet<Tuple3<Integer, String, String>> authors = ..; // posts (title, content, author_id) DataSet<Tuple3<String, String, Integer>> posts = ..; // (title, author name) DataSet<Tuple2<String, String>> archive = authors.join(posts).where(0).equalTo(2) .with(new PostsByUser()); public static class PostsByUser implements JoinFunction<Tuple3<Integer, String, String>, Tuple3<String, String, Integer>, Tuple2<String, String>> { @Override public Tuple2<String, String> join( Tuple3<Integer, String, String> left, Tuple3<String, String, Integer> right) { return new Tuple2<String, String>(left.f1, right.f0); } } 58 Archive Name Title Fabian .. Julia .. Julia .. Romeo .. Romeo ..
  • 59. Data Sources / Data Sinks 59
  • 60. Data Sources Text  readTextFile(“/path/to/file”) CSV  readCsvFile(“/path/to/file”) Collection  fromCollection(collection)  fromElements(1,2,3,4,5) 60
  • 61. Data Sources: Collections ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); // read from elements DataSet<String> names = env.fromElements("Some", "Example", "Strings"); // read from Java collection List<String> list = new ArrayList<String>(); list.add("Some"); list.add("Example"); list.add("Strings"); DataSet<String> names = env.fromCollection(list); 61
  • 62. Data Sources: File-Based ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); // read text file from local or distributed file system DataSet<String> localLines = env.readTextFile("/path/to/my/textfile"); // read a CSV file with three fields DataSet<Tuple3<Integer, String, Double>> csvInput = env.readCsvFile("/the/CSV/file") .types(Integer.class, String.class, Double.class); // read a CSV file with five fields, taking only two of them DataSet<Tuple2<String, Double>> csvInput = env.readCsvFile("/the/CSV/file") // take the first and the fourth field .includeFields("10010") .types(String.class, Double.class); 62
  • 63. Data Sinks Text  writeAsText("/path/to/file")  writeAsFormattedText("/path/to/file", formatFunction) CSV  writeAsCsv("/path/to/file") Return data to the Client  print()  collect()  count() 63
  • 64. Data Sinks (lazy)  Lazily executed when env.execute() is called DataSet<…> result; // write DataSet to a file on the local file system result.writeAsText("/path/to/file"); // write DataSet to a file and overwrite the file if it exists result.writeAsText("/path/to/file", FileSystem.WriteMode.OVERWRITE); // tuples as lines with pipe as the separator "a|b|c" result.writeAsCsv("/path/to/file", "\n", "|"); // this writes values as strings using a user-defined TextFormatter object result.writeAsFormattedText("/path/to/file", new TextFormatter<Tuple2<Integer, Integer>>() { public String format (Tuple2<Integer, Integer> value) { return value.f1 + " - " + value.f0; } }); 64
  • 65. Data Sinks (eager)  Eagerly executed DataSet<Tuple2<String, Integer>> result; // print result.print(); // count long numberOfElements = result.count(); // collect List<Tuple2<String, Integer>> materializedResults = result.collect(); 65
  • 67. Ways to Run a Flink Program 67 DataSet (Java/Scala/Python) DataStream (Java/Scala) Local Remote Yarn Tez Embedded Streaming dataflow runtime
  • 68. Local Execution  Starts a local Flink cluster  All processes run in the same JVM  Behaves just like a regular cluster  Very useful for developing and debugging 68 Job Manager Task Manager Task Manager Task Manager Task Manager JVM
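As a minimal sketch (assuming the Flink Java API is on the classpath), a local environment can also be created explicitly; getExecutionEnvironment() returns the same kind of environment automatically when the program is started from an IDE:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class LocalExample {
    public static void main(String[] args) throws Exception {
        // starts an embedded Flink cluster inside this JVM
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();

        DataSet<String> data = env.fromElements("runs", "inside", "one", "JVM");
        data.print(); // eager sink: triggers execution and prints to stdout
    }
}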
  • 69. Remote Execution  The cluster mode  Submit a job remotely  Monitor the status of the job 70 Client Job Manager Cluster Task Manager Task Manager Task Manager Task Manager Submit job
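A hedged sketch of remote submission from code (the host name, port, and jar path are placeholders; 6123 is Flink's default JobManager RPC port): the program jar is shipped to the JobManager of an already running cluster.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class RemoteExample {
    public static void main(String[] args) throws Exception {
        // connect to the JobManager of a running cluster and ship the program jar
        ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
                "jobmanager-host", 6123, "/path/to/program.jar");

        DataSet<String> lines = env.readTextFile("hdfs:///path/to/input");
        lines.writeAsText("hdfs:///path/to/output");

        env.execute("Remote example");
    }
}

Alternatively, a packaged program is usually submitted with the command-line client (bin/flink run), which talks to the same JobManager.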
  • 70. YARN Execution  Multi-user scenario  Resource sharing  Uses YARN containers to run a Flink cluster  Very easy to set up Flink 71 Client Node Manager Job Manager YARN Cluster Resource Manager Node Manager Task Manager Node Manager Task Manager Node Manager Other Application
  • 73. tl;dr: What was this about?  The case for Flink • Low latency • High throughput • Fault-tolerant • Easy to use APIs, library ecosystem • Growing community  A stream processor that is great for batch analytics as well 75
  • 74. I ♥ Flink, do you? 76  Get involved and start a discussion on Flink's mailing list  { user, dev }@flink.apache.org  Subscribe to news@flink.apache.org  Follow flink.apache.org/blog and @ApacheFlink on Twitter
  • 75. 77 flink-forward.org Call for papers deadline: August 14, 2015 October 12-13, 2015 Discount code: FlinkEITSummerSchool25
  • 76. Thank you for listening! 78

Editor's Notes

  1. dev list: 300-400 messages/month. record 1000 messages on