Apache Flink Hands-On

Apache Flink
Fast and reliable big data processing
Aljoscha Krettek
aljoscha@apache.org

What is Apache Flink?
• Project undergoing incubation in the Apache Software Foundation
• Originating from the Stratosphere research project started at TU Berlin in 2009
• http://flink.incubator.apache.org
• 59 contributors (doubled in ~4 months)
• Has awesome squirrel logo

What is Apache Flink?
[Diagram: only the label "Flink Client" is recoverable]

Apache Flink
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
…
env.execute();

Apache Flink
DataSet<String> input = env.readTextFile("/hello/there");

Apache Flink
DataSet<String> input = env.readTextFile("hdfs:///hello/there");

Apache Flink
DataSet<Tuple2<String, Integer>> input = env.readCsvFile("/hello/there")
    .fieldDelimiter('|')
    .lineDelimiter("\n")
    .ignoreFirstLine()
    .types(String.class, Integer.class);
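
As a rough illustration of what the reader options above ask for, here is a plain-Java sketch without Flink. The class name CsvSketch, the method parse and the sample rows are made up for this example. It only mirrors the idea of a field delimiter, a line delimiter and a skipped header line, not how Flink reads files internally.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class CsvSketch {
    // Splits text into rows by lineDelimiter and into fields by fieldDelimiter.
    // The first row is treated as a header and skipped; each remaining row
    // becomes a (String, Integer) pair.
    static List<Map.Entry<String, Integer>> parse(String text, char fieldDelimiter, String lineDelimiter) {
        List<Map.Entry<String, Integer>> rows = new ArrayList<>();
        String[] lines = text.split(Pattern.quote(lineDelimiter));
        for (int i = 1; i < lines.length; i++) { // i = 1 skips the header row
            if (lines[i].isEmpty()) {
                continue; // ignore blank lines
            }
            String[] fields = lines[i].split(Pattern.quote(String.valueOf(fieldDelimiter)));
            rows.add(new SimpleEntry<>(fields[0], Integer.parseInt(fields[1].trim())));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data in the same shape as a name|rating file.
        String sample = "name|rating\nAugustiner Helles|5\nSample Pils|3\n";
        for (Map.Entry<String, Integer> row : parse(sample, '|', "\n")) {
            System.out.println(row.getKey() + " -> " + row.getValue());
        }
    }
}
```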

Apache Flink
DataSet<Beer> beers = input.map( new MapFunction<Tuple2<String, Integer>, Beer>() {
    @Override
    public Beer map(Tuple2<String, Integer> in) {
        return new Beer(in.f0, in.f1);
    }
});

Apache Flink
DataSet<Beer> beers = input.map( in -> new Beer(in.f0, in.f1) );

Apache Flink
val beers = input.map( in => new Beer(in._1, in._2) )

Apache Flink
DataSet<Beer> filtered = beers.filter( new FilterFunction<Beer>() {
    @Override
    public boolean filter(Beer in) {
        return in.name.contains("augustiner");
    }
});
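
One thing worth keeping in mind with the filter above: String.contains is case-sensitive, so a lower-case search term does not match a capitalised name. A small plain-Java sketch of the difference follows. NameFilterSketch and both helper methods are hypothetical names, and the case-insensitive variant is an assumption about what may be wanted, not something the slides ask for.

```java
import java.util.Locale;

final class NameFilterSketch {
    // Case-sensitive check, the same String.contains call the filter uses.
    static boolean containsExact(String name, String needle) {
        return name.contains(needle);
    }

    // Case-insensitive variant: lower-case both sides before comparing.
    static boolean containsIgnoreCase(String name, String needle) {
        return name.toLowerCase(Locale.ROOT).contains(needle.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String name = "Augustiner Helles"; // made-up sample value
        System.out.println(containsExact(name, "augustiner"));      // false
        System.out.println(containsIgnoreCase(name, "augustiner")); // true
    }
}
```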

Apache Flink
DataSet<Beer> grouped = beers
    .groupBy("name")
    .sortGroup("rating", Order.DESCENDING)
    .reduceGroup( new GroupReduceFunction<Beer, Beer>() {
        @Override
        public void reduce(Iterable<Beer> in, Collector<Beer> out) {
            // each group arrives sorted by rating, so the first element is the top-rated beer
            out.collect(in.iterator().next());
        }
    });
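
To make the result of this pipeline concrete, here is a plain-Java sketch (no Flink) of the same three steps on an in-memory list: group by name, sort each group by rating in descending order, keep the first element. The Beer class is not shown on the slides, so the one below is a hypothetical stand-in with just a name and a rating. TopRatedSketch and topRatedPerName are likewise made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopRatedSketch {
    // Hypothetical stand-in for the Beer type used on the slides.
    static final class Beer {
        String name;
        int rating;

        Beer() {}

        Beer(String name, int rating) {
            this.name = name;
            this.rating = rating;
        }
    }

    // Returns, for every distinct name, the beer with the highest rating.
    static List<Beer> topRatedPerName(List<Beer> beers) {
        // Step 1 (groupBy "name"): collect the beers of each name into one group.
        Map<String, List<Beer>> groups = new LinkedHashMap<>();
        for (Beer b : beers) {
            groups.computeIfAbsent(b.name, k -> new ArrayList<>()).add(b);
        }
        List<Beer> result = new ArrayList<>();
        for (List<Beer> group : groups.values()) {
            // Step 2 (sortGroup "rating", descending): highest rating first.
            group.sort(Comparator.comparingInt((Beer b) -> b.rating).reversed());
            // Step 3 (reduceGroup): emit only the first element of the sorted group.
            result.add(group.get(0));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<Beer> beers = Arrays.asList(
                new Beer("helles", 3), new Beer("pils", 4), new Beer("helles", 5));
        for (Beer b : topRatedPerName(beers)) {
            System.out.println(b.name + " " + b.rating);
        }
    }
}
```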

Apache Flink
DataSet<Tuple2<String, Integer>> aggregated = input.groupBy(0).sum(1);
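
Read as plain Java, groupBy(0).sum(1) amounts to summing the second tuple field for every distinct value of the first field. The sketch below shows that on an in-memory list. SumSketch, sumByKey and the sample pairs are assumptions made for this example and say nothing about how Flink executes the aggregation.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SumSketch {
    // Sums the Integer values of all pairs that share the same String key.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> input) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : input) {
            // merge adds the value to the running total of this key
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Made-up sample pairs standing in for (name, rating) tuples.
        List<Map.Entry<String, Integer>> input = Arrays.asList(
                new SimpleEntry<>("helles", 3),
                new SimpleEntry<>("pils", 4),
                new SimpleEntry<>("helles", 5));
        System.out.println(sumByKey(input)); // {helles=8, pils=4}
    }
}
```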

Apache Flink
result.print();

Apache Flink
result.writeAsText("/ciao/for/now");

github.com/aljoscha/beer-analysis
www.filedropper.com/beerdatacsv
www.filedropper.com/beerdata (large)
github.com/apache/incubator-flink
meetup.com/Apache-Flink-Meetup
