Bay Area Apache Flink Meetup Community Update August 2015

Bay Area Apache Flink Meetup #2
Distributed Stream and Graph Processing
Community Update
August 2015
Henry Saputra
Committer and PMC Member
hsaputra@apache.org
@Kingwulf

Apache Flink is …
Apache Flink is an open source platform for
scalable batch and stream data processing.
• The core of Apache Flink is a
distributed streaming dataflow
engine.
• Executing dataflows in
parallel on clusters
• Providing a reliable
foundation for various
workloads
• DataSet and DataStream
programming abstractions are
the foundation for user programs
and higher layers

One engine for many use cases
• Real time streaming topologies
• Machine Learning at scale
• Graph Analysis
• Long batch pipelines

What happened? - 1
• New PMC: Maximilian Michels
• New Committer: Chesnay Schepler
• Discussions for a 0.9.1 release have started
• Apache Flink is becoming more popular:
– 1000+ Twitter followers
– 500+ GitHub stars
– Named an “open source Big Data project to
watch” by ZDNet
– Flink Forward schedule with great speakers
announced

What happened? - 2
• Apache Flink on Wikipedia:
https://en.wikipedia.org/wiki/Apache_Flink
• New JobManager Dashboard
• Apache SAMOA 0.3.0-incubating with Flink
integration
• New “Features” page
• Contributors list (can you spot your name?)
https://cwiki.apache.org/confluence/display/FLINK/List+of+contributors

New Website Redesign and
New Features page

New Architecture diagram in 0.10
documentation

More contents in the Wiki for
Internal Information

In master (0.10-SNAPSHOT) - 1
• Gelly Scala API
• More improvements and fixes for YARN
• Flink dropped Java 6 support
• Streaming connector for Elasticsearch
• Sampling operation on DataSet API
• A lot of bug fixes:
– Streaming: APIs, general stability, Kafka
connector

In master (0.10-SNAPSHOT) - 2
• Low watermarks / Event time
• New JM Dashboard
• Akka messages are now aware of leader
IDs (for HA)
• ZooKeeper integration (for HA)
• Live accumulators (runtime only)
• Stability improvements

Articles and Mentions
• High-throughput, low-latency, and exactly-once stream
processing with Apache Flink [1]
• Introducing Gelly: Graph Processing with Apache Flink [2]
• Apache Flink and the case for stream processing [3]
• Crunching Parquet Files with Apache Flink [4]
• The morning paper: Asynchronous Distributed Snapshots for
Distributed Dataflows [5]
• Five open source Big Data projects to watch [6]
• Big Data Performance Engineering: Examples from Hadoop,
Pig, HBase, Flink and Spark [7]
[1] http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
[2] http://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
[3] http://www.kdnuggets.com/2015/08/apache-flink-stream-processing.html
[4] https://medium.com/@istanbul_techie/crunching-parquet-files-with-apache-flink-200bec90d8a7
[5] http://blog.acolyer.org/2015/08/19/asynchronous-distributed-snapshots-for-distributed-dataflows/
[6] http://www.zdnet.com/article/five-open-source-big-data-projects-to-watch/
[7] http://www.bigsynapse.com/addressing-big-data-performance

New Meetups and Events
• Chicago: Flink Training @ Capital One
• Bay Area: Stream & Graph Processing @
MapR

Upcoming
• Sept 15: Washington DC Area Apache
Flink Meetup
• Sept 17: StreamProcessing.be meetup
• Sept 28-30: Flink Talks at ApacheCon Big
Data Budapest
New Meetup groups:
• New York
• Boston

Flink Forward schedule published
• http://flink-forward.org/?post_type=day
• Talks by Google, Data Artisans, Huawei,
Capital One, Bouygues, Ericsson, Amadeus,
ResearchGate, Red Hat, and many more.
50% off for this meetup's guests: FlinkMeetupBayArea50
