Berlin Apache Flink Meetup #11
Community Update
September 2015
Robert Metzger
Committer and PMC Member
rmetzger@apache.org
@rmetzger_
Apache Flink is …
Apache Flink is an open source platform for scalable batch and stream data processing.
flink.apache.org 1
• The core of Flink is a distributed streaming dataflow engine.
• Executing dataflows in parallel on clusters
• Providing a reliable foundation for various workloads
• DataSet and DataStream programming abstractions are the foundation for user programs and higher layers
One engine for many use cases
• Real-time streaming topologies
• Machine Learning at scale
• Graph Analysis
• Long batch pipelines
What happened?
• New Committer: Matthias Sax
• 0.9.1 released
• Discussions for releasing 0.10 started
• Cascading on Flink released:
https://github.com/dataArtisans/cascading-flink
• Flink+NiFi integration pull request opened
Now in master (0.10-SNAPSHOT)
• Flink dropped Hadoop 2.2.0 support (we require 2.3.0)
• Scala 2.11 artifacts are now available
• Support for allocating off-heap memory
• New window operators (general purpose and processing time windows)
  • old implementation: 50K / sec / core (gets slower over time, high GC overhead)
  • new implementation w/o pre-aggregation: 800K / sec / core (moderate GC overhead)
  • new implementation w/ pre-aggregation: 3 million / sec / core (low GC overhead)
• Rolling HDFS file sink for DataStream API
• Sink for Elasticsearch
• New JobManager dashboard
• New FlinkKafkaProducer
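
The window operator figures above suggest that pre-aggregation matters for both throughput and GC overhead. The following is a minimal sketch in plain Python, not Flink's implementation or API, of one plausible reason: a buffering window keeps every element until the window is evaluated, while a pre-aggregating window folds each element into a single running value per window. The function names, the tumbling-window assignment, and the sample events are all assumptions made for illustration.

```python
from collections import defaultdict


def window_sum_buffered(events, window_size):
    """Buffer every element per window, then aggregate once all input is seen."""
    buffers = defaultdict(list)  # window start -> all elements of that window
    for timestamp, value in events:
        window_start = timestamp - (timestamp % window_size)
        buffers[window_start].append(value)  # state grows with element count
    return {start: sum(values) for start, values in sorted(buffers.items())}


def window_sum_preaggregated(events, window_size):
    """Fold each element into a running per-window sum as it arrives."""
    sums = defaultdict(int)  # window start -> one partial aggregate
    for timestamp, value in events:
        window_start = timestamp - (timestamp % window_size)
        sums[window_start] += value  # constant state per window
    return dict(sorted(sums.items()))


# Hypothetical (timestamp, value) events; tumbling windows of 10 time units.
events = [(1, 5), (4, 3), (12, 7), (19, 1), (25, 2)]
buffered = window_sum_buffered(events, 10)
preaggregated = window_sum_preaggregated(events, 10)
```

Both functions produce the same per-window sums; they differ only in how much state they hold while a window is open, which is the kind of difference that would show up as garbage collection pressure in a JVM-based engine.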
Flink among “The best open source big data tools”
Articles
• data Artisans blog: Kafka + Flink: A practical, how-to guide [1]
• Gartner blog: Apache Flink Offers a Challenge to Spark [2]
• data Artisans blog: Batch is a special case of streaming [3]
• Flink blog: Off-heap Memory in Apache Flink and the curious JIT compiler [4]
• MapR blog: Apache Flink: A New Way to Handle Streaming Data [5]
• Big Data Knowledge Base: Happenings in the Flink Community - September 2015 [6]
[1] http://data-artisans.com/kafka-flink-a-practical-how-to/
[2] http://blogs.gartner.com/nick-heudecker/apache-flink-offers-a-challenge-to-spark/
[3] http://data-artisans.com/batch-is-a-special-case-of-streaming/
[4] http://flink.apache.org/news/2015/09/16/off-heap-memory.html
[5] https://www.mapr.com/blog/apache-flink-new-way-handle-streaming-data
[6] http://sparkbigdata.com/102-spark-blog-slim-baltagi/17-happenings-in-the-flink-community-september-2015
Events in September
• VLDB 2015 Conference Workshop
• Flink Training in Berlin
• Washington DC Meetup
• Meetup in Belgium
• Milwaukee Meetup
• Budapest: 2 ApacheCon Talks, BigTop Workshop
• data2day Conference in Karlsruhe
• Chicago Meetup
GitHub stats
Flink Forward: two-day conference with free training in Berlin, Germany
• Schedule: http://flink-forward.org/?post_type=day
