Flink September 2015 Community Update

Berlin Apache Flink Meetup #11
Community Update
September 2015
Robert Metzger
Committer and PMC Member
rmetzger@apache.org
@rmetzger_

Apache Flink is an open source platform for scalable batch and stream data processing.
Apache Flink is …
flink.apache.org 1
• The core of Flink is a distributed streaming dataflow engine.
• Executing dataflows in parallel on clusters
• Providing a reliable foundation for various workloads
• DataSet and DataStream programming abstractions are the foundation for user programs and higher layers

One engine for many use cases
Real time streaming topologies
Machine Learning at scale
Graph Analysis
Long batch pipelines

What happened?
• New Committer: Matthias Sax
• 0.9.1 released
• Discussions for releasing 0.10 started
• Cascading on Flink released: https://github.com/dataArtisans/cascading-flink
• Flink+NiFi integration pull request opened

Now in master (0.10-SNAPSHOT)
• Flink dropped Hadoop 2.2.0 support (we require 2.3.0)
• Scala 2.11 artifacts are now available
• Support for allocating off-heap memory
• New window operators (general purpose and processing time windows)
  • old implementation: 50K / sec / core (gets slower over time, high GC overhead)
  • new implementation without pre-aggregation: 800K / sec / core (moderate GC overhead)
  • new implementation with pre-aggregation: 3 million / sec / core (low GC overhead)
• Rolling HDFS file sink for DataStream API
• Sink for ElasticSearch
• New JobManager dashboard
• New FlinkKafkaProducer
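
To illustrate the "pre-aggregation" mentioned for the new window operators, here is a minimal sketch in plain Python. It is not Flink code and does not reflect Flink's actual implementation; the function names and the count-based tumbling window are assumptions chosen for illustration. The first variant buffers every element until the window closes, while the second folds each element into a running aggregate, so its per-window state is a single value. Less retained state per window is one plausible reason for lower GC overhead.

```python
from typing import Iterable, Iterator, List


def tumbling_sum_buffered(values: Iterable[int], size: int) -> Iterator[int]:
    """Buffer every element of a window, then aggregate when the window closes."""
    buffer: List[int] = []
    for v in values:
        buffer.append(v)           # state grows with the window size
        if len(buffer) == size:
            yield sum(buffer)
            buffer = []            # the old buffer becomes garbage
    if buffer:
        yield sum(buffer)          # emit the final partial window


def tumbling_sum_preaggregated(values: Iterable[int], size: int) -> Iterator[int]:
    """Fold each element into a running aggregate as it arrives."""
    acc, count = 0, 0
    for v in values:
        acc += v                   # state is one number per window
        count += 1
        if count == size:
            yield acc
            acc, count = 0, 0
    if count:
        yield acc                  # emit the final partial window


events = [1, 2, 3, 4, 5, 6, 7]
print(list(tumbling_sum_buffered(events, 3)))
print(list(tumbling_sum_preaggregated(events, 3)))
```

Both variants emit the same window results; they differ only in how much state is held while a window is open.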

Flink among “The best open source big data tools”

Articles
• data Artisans blog: Kafka + Flink: A practical, how-to guide [1]
• Gartner blog: Apache Flink Offers a Challenge to Spark [2]
• data Artisans blog: Batch is a special case of streaming [3]
• Flink blog: Off-heap Memory in Apache Flink and the curious JIT compiler [4]
• MapR blog: Apache Flink: A New Way to Handle Streaming Data [5]
• Big Data Knowledge Base: Happenings in the Flink Community - September 2015 [6]
[1] http://data-artisans.com/kafka-flink-a-practical-how-to/
[2] http://blogs.gartner.com/nick-heudecker/apache-flink-offers-a-challenge-to-spark/
[3] http://data-artisans.com/batch-is-a-special-case-of-streaming/
[4] http://flink.apache.org/news/2015/09/16/off-heap-memory.html
[5] https://www.mapr.com/blog/apache-flink-new-way-handle-streaming-data
[6] http://sparkbigdata.com/102-spark-blog-slim-baltagi/17-happenings-in-the-flink-community-september-2015

Events in September
VLDB 2015 Conference Workshop
Flink Training in Berlin
Washington DC Meetup
Meetup in Belgium
Milwaukee Meetup
Budapest: 2 ApacheCon Talks, BigTop Workshop
data2day Conference in Karlsruhe
Chicago Meetup

GitHub stats

Flink Forward: two-day conference with free training in Berlin, Germany
• Schedule: http://flink-forward.org/?post_type=day
