Apache Flink Community Update for March 2016 - Slim Baltagi

This Apache Flink Community Update for March 2016 was given by Slim Baltagi at the New York City Apache Flink Meetup.

Apache Flink Community Update
February 2016
Slim Baltagi
@SlimBaltagi
sbaltagi@gmail.com
New York City (NYC) Apache Flink Meetup,
March 3rd, 2016

Flink 1.0
The Apache Flink 1.0.0 release is being finalized. It is being voted on and will be available sometime this month: March 2016!
The Apache Flink 0.10 to 1.0 Migration Guide will contain the API-breaking changes and deprecated components.

Flink in Action
Apache Flink in Action is probably the first book on Apache Flink!
• It will be published by Manning. It is being co-authored by Sameer Wadkar (@wadkar_sameer), Slim Baltagi (@SlimBaltagi) and Suneel Marthi (@suneelmarthi).
Please stay tuned for the MEAP: Manning Early Access Program!

Reading List
• Dataflow/Beam & Spark: A Programming Model Comparison. February 3rd, 2016. https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison
• How Apache Flink enables new streaming applications, Part II: State and versioning, by Ufuk Celebi and Kostas Tzoumas. February 3rd, 2016. http://data-artisans.com/how-apache-flink-enables-new-streaming-applications/
• The Essential Guide to Streaming-first Processing with Apache Flink, by Fabian Hueske and Kostas Tzoumas. https://www.mapr.com/blog/essential-guide-streaming-first-processing-apache-flink
• Apache Flink vs Apache Spark - Reproducible experiments on cloud. Slides: http://www.slideshare.net/shelan1/apache-flink-vs-apache-spark-reproducible-experiments-on-cloud

Global Flink Meetup Community
• New Meetups:
  • Europe
    • Apache Flink en Madrid: Feb. 4th, 2016
    • Apache Flink London Meetup: Feb. 10th, 2016
  • Asia
    • Bengaluru Apache Flink Meetup: Feb. 15th, 2016
  • America
    • Seattle Apache Flink Meetup: Feb. 29th, 2016
• New York City (NYC) Apache Flink Meetup is now the world’s largest Apache Flink meetup! Sorry Berlin and Bay Area!

Upcoming talks at Conferences – March 2016
• QCon London, March 7-9, 2016. Talk by Robert Metzger, data Artisans:
  • Stream Processing with Apache Flink https://qconlondon.com/presentation/stream-processing-apache-flink
• Strata + Hadoop World (1 talk), March 31, 2016. Stephan Ewen, data Artisans:
  • Apache Flink: Streaming done right http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/detail/47190

Upcoming talks at Conferences – April 2016
• Hadoop Summit Dublin, April 13-14, 2016. 2 talks: http://hadoopsummit.org/dublin/agenda/
  • Overview of Apache Flink: the 4G of Big Data Analytics Frameworks, Slim Baltagi, Capital One
  • Unified Stream & Batch Processing with Apache Flink, Ufuk Celebi, data Artisans
• Kafka Summit, April 26, 2016
  • Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen, data Artisans http://kafka-summit.org/sessions/advanced-streaming-analytics-with-apache-flink-and-apache-kafka/

A Few Upcoming Meetups
• Talk about Apache Flink 1.0 and its new features by Stephan Ewen, CTO of data Artisans:
  • Washington DC Area Apache Flink Meetup, April 7th, 2016. http://www.meetup.com/Washington-DC-Area-Apache-Flink-Meetup/
  • New York City (NYC) Apache Flink Meetup, April 5th or April 6th. Date to be confirmed. http://www.meetup.com/New-York-City-NYC-Apache-Flink-Meetup/

Happenings in the Flink Community
This is a monthly list, updated daily, as Flink articles and blogs are published and announcements are made about Apache Flink releases, meetups, conferences, ...
• February 2016: http://sparkbigdata.com/102-spark-blog-slim-baltagi/28-happenings-in-the-flink-community-february-2016
• March 2016: http://www.sparkbigdata.com/102-spark-blog-slim-baltagi/29-happenings-in-the-flink-community-march-2016
