Apache Flink 
Fast and reliable big data processing 
Aljoscha Krettek 
aljoscha@apache.org
What is Apache Flink? 
• Project undergoing incubation in the Apache Software 
Foundation 
• Originating from the Stratosphere research project 
started at TU Berlin in 2009 
• http://flink.incubator.apache.org 
• 59 contributors (doubled in ~4 months) 
• Has awesome squirrel logo
What is Apache Flink? 
Flink Client
Apache Flink 
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); 
… 
env.execute();
Apache Flink 
DataSet<String> input = env.readTextFile("/hello/there");
Apache Flink 
DataSet<String> input = env.readTextFile("hdfs:///hello/there");
Apache Flink 
DataSet<Tuple2<String, Integer>> input = env.readCsvFile("/hello/there")
    .fieldDelimiter('|')
    .lineDelimiter("\n")
    .ignoreFirstLine()
    .types(String.class, Integer.class);
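To make the reader settings above concrete, here is a plain-Java sketch (no Flink dependency) of how a file would be interpreted under them: records separated by "\n", fields separated by '|', the first line skipped, and two columns typed as String and Integer. The class name CsvSketch, the method parse, and the sample data are hypothetical and only serve the illustration; this is not how Flink's reader is implemented.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Flink's CSV reader.
class CsvSketch {
    // Parses "name|rating" records and skips the header line.
    static List<Map.Entry<String, Integer>> parse(String content) {
        List<Map.Entry<String, Integer>> rows = new ArrayList<>();
        String[] lines = content.split("\n");            // line delimiter "\n"
        for (int i = 1; i < lines.length; i++) {         // start at 1: ignore first line
            if (lines[i].isEmpty()) {
                continue;
            }
            String[] fields = lines[i].split("\\|");     // field delimiter '|'
            rows.add(new SimpleEntry<>(fields[0], Integer.parseInt(fields[1].trim())));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumed sample input: a header line followed by two records.
        String sample = "name|rating\naugustiner|5\npils|3\n";
        System.out.println(parse(sample));
    }
}
```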
Apache Flink 
DataSet<Beer> beers = input.map( new MapFunction<Tuple2<String, Integer>, Beer>() {
    public Beer map(Tuple2<String, Integer> in) {
        return new Beer(in.f0, in.f1);
    }
});
Apache Flink 
DataSet<Beer> beers = input.map( in -> new Beer(in.f0, in.f1) ); 
Apache Flink 
val beers = input.map( in => new Beer(in._1, in._2) ) 
Apache Flink 
DataSet<Beer> filtered = beers.filter( new FilterFunction<Beer>() {
    public boolean filter(Beer in) {
        return in.name.contains("augustiner");
    }
});
Apache Flink 
DataSet<Beer> grouped = beers
    .groupBy("name")
    .sortGroup("rating", Order.DESCENDING)
    .reduceGroup( new GroupReduceFunction<Beer, Beer>() {
        public void reduce(Iterable<Beer> in, Collector<Beer> out) {
            // Groups are sorted by rating, so the first element is the top-rated one.
            out.collect(in.iterator().next());
        }
    });
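The group, sort, and take-first pattern above keeps one element per name: the one with the highest rating. As a rough plain-Java analogue (no Flink dependency, single-threaded), the same result can be sketched with a map. The class TopRatedSketch, its nested Beer stand-in, and the field names are assumptions made for this illustration, not part of the slides' code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java analogue; this is not Flink code.
class TopRatedSketch {
    // Stand-in for the Beer type used on the slides (fields assumed).
    static final class Beer {
        final String name;
        final int rating;

        Beer(String name, int rating) {
            this.name = name;
            this.rating = rating;
        }
    }

    // For each name, keeps the beer with the highest rating, i.e. the element
    // that would come first after sorting the group by rating in descending order.
    static Map<String, Beer> topRatedPerName(List<Beer> beers) {
        Map<String, Beer> best = new TreeMap<>();
        for (Beer b : beers) {
            best.merge(b.name, b, (x, y) -> x.rating >= y.rating ? x : y);
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed sample data.
        List<Beer> beers = Arrays.asList(
                new Beer("augustiner", 3),
                new Beer("augustiner", 5),
                new Beer("pils", 4));
        for (Beer b : topRatedPerName(beers).values()) {
            System.out.println(b.name + " " + b.rating);
        }
    }
}
```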
Apache Flink 
DataSet<Tuple2<String, Integer>> aggregated = input.groupBy(0).sum(1); 
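Grouping on field 0 and summing field 1 yields one total per distinct key. A minimal plain-Java sketch of that semantics (no Flink dependency, single-threaded) could look like the following; the class SumSketch, the method sumPerKey, and the sample rows are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java analogue; this is not Flink code.
class SumSketch {
    // Sums the Integer field for every distinct String key.
    static Map<String, Integer> sumPerKey(List<Map.Entry<String, Integer>> rows) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> row : rows) {
            sums.merge(row.getKey(), row.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Assumed sample rows of (name, rating).
        List<Map.Entry<String, Integer>> rows = Arrays.asList(
                new SimpleEntry<>("augustiner", 5),
                new SimpleEntry<>("pils", 3),
                new SimpleEntry<>("augustiner", 4));
        System.out.println(sumPerKey(rows));
    }
}
```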
Apache Flink 
result.print();
Apache Flink 
result.writeAsText("/ciao/for/now");
github.com/aljoscha/beer-analysis 
www.filedropper.com/beerdatacsv 
www.filedropper.com/beerdata (large) 
flink.incubator.apache.org 
github.com/apache/incubator-flink 
meetup.com/Apache-Flink-Meetup

Apache Flink Hands-On

Editor's Notes

  • #4 A data processing engine that lets you write programs in a functional style and executes them automatically in parallel.