Berlin Apache Flink Meetup #10
Community Update
August 2015
Robert Metzger
Committer and PMC Member
rmetzger@apache.org
@rmetzger_
Apache Flink is an open source platform for
scalable batch and stream data processing.
Apache Flink is …
flink.apache.org 1
• The core of Flink is a distributed streaming dataflow engine.
• Executing dataflows in parallel on clusters
• Providing a reliable foundation for various workloads
• DataSet and DataStream programming abstractions are the foundation for user programs and higher layers
One engine for many use cases
• Real-time streaming topologies
• Machine Learning at scale
• Graph Analysis
• Long batch pipelines
What happened?
• New Committer: Chesnay Schepler
• Discussions for a 0.9.1 release started
• Flink is becoming more popular:
– 1000+ Twitter followers
– 500+ GitHub stars
– Named an “open source Big Data project to watch” by ZDNet
– Flink Forward schedule with great speakers announced
Now in master (0.10-SNAPSHOT)
• Gelly Scala API
• Flink dropped Java 6 support
• Streaming connector for Elasticsearch
• Sampling operation on DataSet API
• A lot of bug fixes:
– Streaming: APIs, general stability, Kafka connector
Articles and Meetups
• High-throughput, low-latency, and exactly-once stream processing with Apache Flink [1]
• Introducing Gelly: Graph Processing with Apache Flink [2]
• Apache Flink and the case for stream processing [3]
• Crunching Parquet Files with Apache Flink [4]
• The morning paper: Asynchronous Distributed Snapshots for Distributed Dataflows [5]
• Five open source Big Data projects to watch [6]
• Big Data Performance Engineering: Examples from Hadoop, Pig, HBase, Flink and Spark [7]
[1] http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
[2] http://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
[3] http://www.kdnuggets.com/2015/08/apache-flink-stream-processing.html
[4] https://medium.com/@istanbul_techie/crunching-parquet-files-with-apache-flink-200bec90d8a7
[5] http://blog.acolyer.org/2015/08/19/asynchronous-distributed-snapshots-for-distributed-dataflows/
[6] http://www.zdnet.com/article/five-open-source-big-data-projects-to-watch/
[7] http://www.bigsynapse.com/addressing-big-data-performance
GitHub stats
Upcoming
• Sept 15: Washington DC Area Apache Flink Meetup
• Sept 17: StreamProcessing.be meetup
• Sept 28-30: Flink talks at ApacheCon Big Data Budapest
New Meetup groups:
• New York
• Boston
Flink Forward schedule published
• http://flink-forward.org/?post_type=day
• Talks by Huawei, Capital One, Bouygues, Ericsson, Amadeus, ResearchGate, Red Hat, and many more.
