Apache Flink 101
- The Rise of Stream Processing and Beyond
Bowen Li
Committer@Apache Flink, Senior Engineer@Alibaba
Nov 20, 2019 @Big Data Bellevue
Agenda
● Flink use cases
● Flink in a nutshell: what makes Flink successful in stream processing
● Beyond stream processing
○ Batch
○ Data warehousing and Notebook
○ AI/Machine Learning
○ Serverless
Time = Value
Business demands Real Time Computation
Flink at Alibaba
● Powers real-time computation for all business units of Alibaba Group
● Powers all search and recommendation, both online and offline
● Offered to the public as a cloud service on Alibaba Cloud
Singles’ Day Global Shopping Festival on 11/11
Singles’ Day Stats - Nov 11, 2019
Alibaba
○ GMV
■ $14 million in the first 21s
■ $1.4 billion in the first 96s
■ $38 billion in 24h
○ 982 PB of data generated in total
○ 544,000 transactions/sec at peak
Flink @ Alibaba
○ 2 billion events/sec, 3 TB/sec - up 111% from 2018
Flink at Alibaba
Use Case 1: online ML
○ hundreds of millions of events
○ 100+ billion features
○ end-to-end latency in seconds
○ real-time training, feature and model updates
Flink at Alibaba
Use Case 2: Real Time GMV dashboard
Flink at Scale
Real Time AI / ML
Real Time Analytics
Real Time Fraud Detection/Risk Management
Real Time Dynamic Pricing
Flink at Scale
Real Time Compute Service on public cloud backed by Flink
Kinesis Analytics
Flink in a Nutshell
- key differentiators from other open source solutions
ancient squirrel from Ice Age!
Flink in a Nutshell
Stateful Computations …...
Why does (built-in) state matter?
● computation with context, rather than on single records in isolation
● state is local: no network I/O, lower latency
● no external dependency: the framework has full control, which enables exactly-once semantics
● ...
Flink provides built-in state backends that support rich, arbitrary data
structures in a fully fault-tolerant way
○ In-memory, spillable backend
○ RocksDB backend
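Why local state helps can be sketched in plain Java, with no Flink dependency. The hypothetical `LocalKeyedState` below stands in for a keyed state slot kept next to the operator. `DistinctUsers` uses it for the same "distinct users" logic as the API examples later in this talk, so deciding whether a user is new is a local lookup rather than a call to an external database. In Flink itself this would be something like a `ValueState` inside a keyed function, stored by the heap or RocksDB state backend and included in checkpoints. The sketch leaves all of that out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for keyed state: one value slot per key, held locally. */
final class LocalKeyedState<K, V> {
    private final Map<K, V> values = new HashMap<>();

    V get(K key) { return values.get(key); }

    void put(K key, V value) { values.put(key, value); }
}

/** Emits each user id only the first time it is seen (a streaming DISTINCT). */
final class DistinctUsers {
    private final LocalKeyedState<Long, Boolean> seen = new LocalKeyedState<>();

    /** Processes one record; returns true if it should be emitted downstream. */
    boolean process(long userId) {
        if (seen.get(userId) != null) {
            return false;                // already emitted: drop the duplicate
        }
        seen.put(userId, Boolean.TRUE);  // remember the user in local state
        return true;
    }

    /** Runs the operator over a finite sequence of user ids. */
    static List<Long> run(long... userIds) {
        DistinctUsers op = new DistinctUsers();
        List<Long> out = new ArrayList<>();
        for (long id : userIds) {
            if (op.process(id)) {
                out.add(id);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(1, 2, 1, 3, 2)); // [1, 2, 3]
    }
}
```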
Stateful Computations over Event Streams…...
Flink in a Nutshell
It means a few things…
1. All your data is data streams!
○ batch vs. streaming: just execution models
○ bounded vs. unbounded data streams: the key difference
○ technically, all data processing is stream processing
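Here is a minimal sketch of "batch is a bounded stream", built on a simple record-at-a-time operator. The class name is hypothetical. The same `RunningSum` logic handles two inputs. On a finite list, its last update is the batch answer. On an endless generator, it simply keeps emitting updates.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.LongStream;

/** A record-at-a-time operator; it never needs to know whether its input ends. */
final class RunningSum {
    private long sum = 0; // operator state

    /** Handles one record and returns the updated result. */
    long onRecord(long value) {
        sum += value;
        return sum;
    }

    /** Bounded stream ("batch"): once the input ends, the last update is the final answer. */
    static long overBoundedStream(List<Long> input) {
        RunningSum op = new RunningSum();
        long result = 0;
        for (long v : input) {
            result = op.onRecord(v);
        }
        return result;
    }

    /** Unbounded stream: results keep updating; here we only observe the first n updates. */
    static List<Long> firstUpdatesOfUnboundedStream(int n) {
        Iterator<Long> infinite = LongStream.iterate(1, i -> i + 1).boxed().iterator();
        RunningSum op = new RunningSum();
        List<Long> updates = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            updates.add(op.onRecord(infinite.next()));
        }
        return updates;
    }

    public static void main(String[] args) {
        System.out.println(overBoundedStream(List.of(3L, 4L, 5L))); // 12
        System.out.println(firstUpdatesOfUnboundedStream(5));       // [1, 3, 6, 10, 15]
    }
}
```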
2. Streaming-first, pipelined execution
○ records flow through the system one at a time
-> extremely high throughput, very fast, ultra-low latency
○ fundamentally different from batch-first, staged-execution frameworks
○ cannot be achieved with a mini-batch workaround
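To illustrate pipelined versus staged execution, the hypothetical helper below runs the same two steps both ways and records a trace. The steps are: multiply by 10, then keep values above 10. In the pipelined run, the first result is emitted as soon as the second record has been mapped. In the staged run, nothing is emitted until the whole first stage has finished. This is only a single-threaded illustration of ordering, not a model of Flink's runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

/** Contrasts pipelined (record-at-a-time) and staged (stage-at-a-time) execution. */
final class ExecutionModels {
    static final IntUnaryOperator MAP = x -> x * 10; // stage 1
    static final IntPredicate FILTER = x -> x > 10;  // stage 2

    /** Pipelined: each record passes through every stage before the next record starts. */
    static List<String> pipelined(int[] records) {
        List<String> trace = new ArrayList<>();
        for (int r : records) {
            int mapped = MAP.applyAsInt(r);
            trace.add("map " + r);
            if (FILTER.test(mapped)) {
                trace.add("emit " + mapped); // result is available immediately
            }
        }
        return trace;
    }

    /** Staged: a stage processes all records before the next stage begins. */
    static List<String> staged(int[] records) {
        List<String> trace = new ArrayList<>();
        int[] mapped = new int[records.length];
        for (int i = 0; i < records.length; i++) {
            mapped[i] = MAP.applyAsInt(records[i]);
            trace.add("map " + records[i]);
        }
        for (int m : mapped) { // second stage waits for the first to finish
            if (FILTER.test(m)) {
                trace.add("emit " + m);
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3};
        System.out.println(pipelined(input)); // [map 1, map 2, emit 20, map 3, emit 30]
        System.out.println(staged(input));    // [map 1, map 2, map 3, emit 20, emit 30]
    }
}
```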
3. Events come with time!
○ event time vs. processing time
○ window aggregation, sessionization, pattern recognition, time-based joins
○ out-of-order and late data
Flink has natively supported the most comprehensive time semantics
from the beginning
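The sketch below shows how event time, out-of-order data and late data interact. It assumes tumbling windows of 10 time units and a watermark that trails the highest timestamp seen by 5 units, similar in spirit to Flink's bounded-out-of-orderness watermarks. A window fires once the watermark passes its end. An event for a window that has already fired is treated as late and dropped, which matches Flink's default when no allowed lateness is configured. For simplicity, the watermark here advances after every event, whereas Flink typically emits watermarks periodically. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Event-time tumbling-window counts driven by a bounded-out-of-orderness watermark. */
final class EventTimeWindows {
    private final long windowSize;
    private final long maxOutOfOrderness;
    private long maxTimestamp = Long.MIN_VALUE;                         // highest event time seen
    private final TreeMap<Long, Integer> openWindows = new TreeMap<>(); // window start -> count
    final List<String> fired = new ArrayList<>();                       // emitted window results
    final List<Long> dropped = new ArrayList<>();                       // events that arrived too late

    EventTimeWindows(long windowSize, long maxOutOfOrderness) {
        this.windowSize = windowSize;
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    /** Watermark: no event with an older timestamp is expected any more. */
    long watermark() {
        return maxTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : maxTimestamp - maxOutOfOrderness;
    }

    /** Handles one event, identified here only by its event-time timestamp. */
    void onEvent(long timestamp) {
        long start = timestamp - Math.floorMod(timestamp, windowSize);
        long windowEnd = start + windowSize;               // exclusive
        if (windowEnd - 1 <= watermark()) {                // window already fired: late data
            dropped.add(timestamp);
            return;
        }
        openWindows.merge(start, 1, Integer::sum);
        maxTimestamp = Math.max(maxTimestamp, timestamp);  // may advance the watermark
        fireCompleteWindows();
    }

    /** Emits every window whose end the watermark has passed, oldest first. */
    private void fireCompleteWindows() {
        long wm = watermark();
        while (!openWindows.isEmpty() && openWindows.firstKey() + windowSize - 1 <= wm) {
            Map.Entry<Long, Integer> w = openWindows.pollFirstEntry();
            fired.add("[" + w.getKey() + "," + (w.getKey() + windowSize) + ") count=" + w.getValue());
        }
    }

    public static void main(String[] args) {
        EventTimeWindows windows = new EventTimeWindows(10, 5);
        for (long ts : new long[] {1, 3, 12, 7, 16, 4, 25}) {
            windows.onEvent(ts);
        }
        System.out.println(windows.fired);   // [[0,10) count=3, [10,20) count=2]
        System.out.println(windows.dropped); // [4]
    }
}
```

In this run, event 7 is out of order but still lands in window [0,10), because it arrives before the watermark passes 9. Event 4 arrives after that window has fired, so it is dropped.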
Stateful Computations over Event Streams
in an Expressive …...
Flink in a Nutshell
Layered APIs with the most comprehensive semantics
Flink’s layered APIs, from highest to lowest level: Streaming SQL, Table API, DataStream API, ProcessFunction
● Toward Streaming SQL, APIs are more:
○ declarative
○ optimizable
○ understandable
○ stable
○ unified for streaming and batch
● Toward ProcessFunction, APIs are more:
○ advanced
○ precise in control
○ optimized (if you know what you are doing)
Streaming SQL
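Flink SQL> CREATE TABLE test (
  `user` BIGINT,  -- USER is a reserved keyword, so it must be escaped
  msg VARCHAR,
  ts VARCHAR
) WITH (
  'connector.type' = 'kafka',
  'connector.topic' = 'topic_name',
  'format.type' = 'avro',
  'connector.startup-mode' = 'earliest-offset'
  -- further required properties (Kafka version, bootstrap servers, Avro schema) omitted
);
Flink SQL> SELECT * FROM test;
-- DISTINCT over a stream yields an updating result; unique_user must be a registered sink that accepts it
Flink SQL> INSERT INTO unique_user SELECT DISTINCT `user` FROM test;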
Table API
// Java/Scala/Python (Java shown, Flink 1.9-era descriptor API)
tableEnv
    .connect(
        new Kafka()
            .version("universal")     // "universal" targets Kafka 1.0+
            .topic("topic_name")
            .startFromEarliest())
    .withFormat(...)                  // e.g. Avro, as in the SQL example
    .withSchema(...)
    .inAppendMode()
    .registerTableSource("test");
Table test = tableEnv.scan("test");
test.select("user").distinct().insertInto("unique_user");
DataStream API
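// Java/Scala
FlinkKafkaConsumer<...> consumer = new FlinkKafkaConsumer<>("test", ...);
consumer.setStartFromEarliest();                    // read the topic from the beginning
DataStream<...> stream = env.addSource(consumer);
stream
    .keyBy("user")                                  // partition by user, so state is kept per user
    .flatMap(new DataStreamDistinctReduce() { … }); // user-defined: emits each user only once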
Does SQL Make Sense in Streaming?
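Stream-Table Duality
Stream (changelog of word counts):
Word Count
Hello 1
World 1
Hello 2
World 2
Hello 3
Hello 4
Table (materialized from the stream):
Word Count
Hello 4
World 2
The stream is the changelog of the table, and the table is the materialized result of the stream. The same idea is popular as change data capture (CDC) in database replication.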
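The duality can be reproduced in a few lines of plain Java; the names are hypothetical. `changelog` turns the word stream into the update stream shown above. `materialize` replays that changelog into a table by keeping the latest count per word, which gives Hello 4 and World 2.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stream-table duality for word count: a changelog stream and its materialized table. */
final class StreamTableDuality {
    /** Turns a stream of words into a changelog of "word newCount" updates. */
    static List<String> changelog(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        List<String> log = new ArrayList<>();
        for (String w : words) {
            int c = counts.merge(w, 1, Integer::sum); // running count per word
            log.add(w + " " + c);
        }
        return log;
    }

    /** Materializes a changelog into a table: the latest update per key wins. */
    static Map<String, Integer> materialize(List<String> changelog) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (String entry : changelog) {
            String[] parts = entry.split(" ");
            table.put(parts[0], Integer.parseInt(parts[1])); // upsert by key
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> log = changelog(List.of("Hello", "World", "Hello", "World", "Hello", "Hello"));
        System.out.println(log);              // [Hello 1, World 1, Hello 2, World 2, Hello 3, Hello 4]
        System.out.println(materialize(log)); // {Hello=4, World=2}
    }
}
```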
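Streaming: Stream Data -> Dynamic Table -> Continuous Query -> Dynamic Table -> Stream Data
Batch: Batch Data -> Static Table -> Static Query -> Static Table -> Batch Data
A snapshot of a dynamic table is a static table, so a continuous query over a dynamic table produces the same result as the same static query run on a snapshot of it. Flink runs both kinds of query.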
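The equivalence between a continuous query and a static query on a snapshot can be checked with a small sketch. The names are hypothetical, and the query counts words, as in the word-count example. The continuous version updates its result incrementally as each row arrives. After every row, that result is compared with a batch GROUP BY recomputed over everything seen so far.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** A continuous query over a growing table vs. the same static query over its snapshot. */
final class ContinuousVsStatic {
    /** Static (batch) query, like SELECT word, COUNT(*) FROM snapshot GROUP BY word. */
    static Map<String, Long> staticQuery(List<String> snapshot) {
        return snapshot.stream().collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    /** Runs the continuous query and checks it against the static one after every row. */
    static boolean agreesAtEveryStep(List<String> stream) {
        Map<String, Long> result = new HashMap<>();
        for (int i = 0; i < stream.size(); i++) {
            result.merge(stream.get(i), 1L, Long::sum);                    // incremental update
            Map<String, Long> batch = staticQuery(stream.subList(0, i + 1)); // snapshot so far
            if (!result.equals(batch)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> words = List.of("Hello", "World", "Hello", "World", "Hello", "Hello");
        System.out.println(agreesAtEveryStep(words)); // true
    }
}
```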
Stateful Computations over Event Streams
in an Expressive, Scalable …...
Flink in a Nutshell
● Horizontally scalable
● Battle tested
○ trillions of records per day
○ terabytes of state
○ running on thousands of cores
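Horizontal scaling of keyed state rests on partitioning by key. The sketch below is a simplified version of the key-group idea Flink uses. Each key hashes into one of `maxParallelism` key groups, and each parallel subtask owns a contiguous range of groups. As a result, rescaling moves whole ranges of state instead of rehashing every key. The sketch simplifies in one place: Flink additionally applies a murmur hash to `key.hashCode()`, while this version uses the plain hash code. The class name is hypothetical.

```java
/** Simplified key-group assignment used to spread keyed state across parallel subtasks. */
final class KeyGroups {
    /** Maps a key to one of maxParallelism key groups; independent of the current parallelism. */
    static int keyGroupFor(Object key, int maxParallelism) {
        // Simplification: Flink also murmur-hashes key.hashCode() before taking the modulo.
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Maps a key group to the subtask that owns it; each subtask owns a contiguous range. */
    static int subtaskFor(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    static int subtaskForKey(Object key, int parallelism, int maxParallelism) {
        return subtaskFor(keyGroupFor(key, maxParallelism), parallelism, maxParallelism);
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        // Scaling from 2 to 4 subtasks: each new subtask takes over half of one old range.
        for (int g : new int[] {0, 40, 70, 127}) {
            System.out.println("key group " + g + ": subtask " + subtaskFor(g, 2, maxParallelism)
                    + " -> subtask " + subtaskFor(g, 4, maxParallelism));
        }
    }
}
```

Because the mapping from key to key group does not depend on the parallelism, a key's state never has to be re-split when the job rescales. The same design is also why `maxParallelism` caps how far a job can be scaled out.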
Stateful Computations over Event Streams
in an Expressive, Scalable, Operational-focused
…...
Flink in a Nutshell
● Deploy anywhere
○ Kubernetes, YARN, Mesos, standalone
● Deploy flexibly
○ per-job mode, session mode
● Highly available
○ with an HA setup
Stateful Computations over Event Streams
in an Expressive, Scalable, Operational-focused,
Fault Tolerant way
Flink in a Nutshell
● Checkpoint/Savepoint
○ taken on the fly, without sacrificing performance
○ incremental checkpoints supported
● Exactly-once
○ state consistency
○ end-to-end with transactional connectors
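The core idea behind exactly-once state can be sketched without Flink: snapshot the source position and the operator state together, and on failure restore both and replay. The hypothetical `CheckpointReplay` below sums a replayable log, fails once, and recovers. Rewinding the source without restoring state double-counts the replayed records instead. Flink's real mechanism takes such snapshots asynchronously across a distributed job using checkpoint barriers, and it needs transactional sinks for end-to-end guarantees. None of that is modeled here.

```java
import java.util.List;

/** Checkpoint-and-replay over a replayable source (think: a Kafka topic with offsets). */
final class CheckpointReplay {
    /** A consistent snapshot: source position and operator state, taken together. */
    private static final class Checkpoint {
        final int offset; // next position to read from the source
        final long sum;   // operator state after consuming everything before offset

        Checkpoint(int offset, long sum) {
            this.offset = offset;
            this.sum = sum;
        }
    }

    /**
     * Sums the log, checkpointing every {@code interval} records. Right after consuming
     * {@code crashAfter} records, the job fails once and recovers by rewinding the source
     * to the last checkpoint. If {@code restoreState} is false, the state is not rolled
     * back, so replayed records are counted twice (at-least-once behavior).
     */
    static long run(List<Long> log, int interval, int crashAfter, boolean restoreState) {
        Checkpoint last = new Checkpoint(0, 0);
        long sum = 0;
        int offset = 0;
        boolean crashed = false;
        while (offset < log.size()) {
            sum += log.get(offset++);
            if (!crashed && offset == crashAfter) {
                crashed = true;
                if (restoreState) {
                    sum = last.sum;   // roll state back together with the source position
                }
                offset = last.offset; // replay from the checkpointed position
                continue;
            }
            if (offset % interval == 0) {
                last = new Checkpoint(offset, sum);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> log = List.of(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L);
        System.out.println(run(log, 3, 5, true));  // 55: every record counted exactly once
        System.out.println(run(log, 3, 5, false)); // 64: records 4 and 5 counted twice
    }
}
```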
Apache Flink:
Stateful Computations over Event Streams
in an
Expressive,
Scalable,
Operational-focused,
Fault Tolerant way
The only open source framework that provides all of the above capabilities
Going Beyond Stream Processing
● Batch -> Unified Data Processing
● Data Warehousing and Notebook
● Machine Learning / AI / DL
● Serverless
Why Unified Streaming and Batch Data Processing?
Recap: the lambda architecture ......
[Diagram: MQ / Pub-Sub -> Stream Processing (online); HDFS / S3 -> Batch Processing (offline); both -> Combine Results (serving)]
○ infra: high operational cost
○ dev: costly maintenance, and two or more stacks to learn
○ business: hard to keep both paths in sync and guarantee consistent logic
Why Flink?
Flink’s philosophy: streaming first, with batch as a special case of streaming
State-of-the-Art Batch Processing on a Stream Processor
[Figure: “<= Flink 1.8” vs. “from Flink 1.9”]
Performance of Blink versus Spark in the TPC-DS benchmark, aggregate time for all queries together.
Presentation by Xiaowei Jiang at Flink Forward Beijing, 2018.
Data Warehousing
Initial integration in Flink 1.9 for Hive 2.3.4 and 1.2.1
Full integration (read, write, UDFs) in Flink 1.10 for all Hive 1.x, 2.x, and 3.x versions
Notebook
Machine Learning & AI & DL
Machine Learning & AI
Recap the “lambda” architecture, again, for ML
[Diagram: data from MQ / Pub-Sub and HDFS / S3 goes through preprocessing into online training (dynamic model) and offline training (static model), followed by model validation and inference]
Flink is popular for online ML now
Streaming-Batch Unified ML
Use Flink everywhere to reduce the maintenance and operational cost of code and infrastructure
Machine Learning & AI
ML Flow: [Diagram: MQ / Pub-Sub and HDFS / S3 sources, preprocessing, online and offline training, dynamic and static models, model validation, inference]
ML Stages:
● Data Acquisition
● Preprocessing
● Model Training
● Model Validation & Serving
● Inference
Efforts & Requirements:
● Rich Connectors
● Dataset Management
● Stream-Batch unification
● Strong API & SQL Support
● Enhanced Iteration
● Flink ML lib
● DL on Flink (TF, PyTorch)
● Model Serving
● Model Registry & Management
● Rollout/Rollback
● Online Evaluation
● Flink ML Pipeline
● Python API support
Flink ML Libs
● Completely rewritten
● Based on ML Pipeline, powered by the Table API
● Battle-tested algorithms
○ K-means
○ Naive Bayes
○ Linear regression
○ GBDT
○ Decision tree
○ PCA
○ Random forest
○ Correlation
○ ….
Flink ML Pipeline
● Training: pipeline.fit(input1)
○ input table 1 runs through the Estimator Pipeline (Transformers and Estimators); each Estimator is fitted into a Model, yielding a Model Pipeline
● Inference: pipeline.transform(input2)
○ input table 2 runs through the Model Pipeline (Transformers and Models) to produce the result table
Two types of operators
● Transformer: data -> data
● Estimator: data -> model
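The Transformer/Estimator split can be sketched with a few hypothetical Java interfaces. They operate on lists of doubles rather than tables. This is not the Flink ML API, only an illustration of the pattern. Fitting a `Pipeline` fits each Estimator on the output of the stages before it. The result is a `PipelineModel` made only of Transformers, which is then applied to new data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Marker for anything that can be placed in a pipeline (hypothetical, simplified). */
interface Stage {}

/** Transformer: data -> data. */
interface Transformer extends Stage {
    List<Double> transform(List<Double> data);
}

/** Estimator: data -> model, where the fitted model is itself a Transformer. */
interface Estimator extends Stage {
    Transformer fit(List<Double> data);
}

/** A fitted pipeline containing only Transformers, applied in order (the "Model Pipeline"). */
final class PipelineModel implements Transformer {
    private final List<Transformer> stages;

    PipelineModel(List<Transformer> stages) { this.stages = stages; }

    @Override
    public List<Double> transform(List<Double> data) {
        List<Double> out = data;
        for (Transformer t : stages) {
            out = t.transform(out);
        }
        return out;
    }
}

/** An unfitted pipeline (the "Estimator Pipeline"); fitting it yields a PipelineModel. */
final class Pipeline implements Estimator {
    private final List<Stage> stages;

    Pipeline(List<Stage> stages) { this.stages = stages; }

    @Override
    public PipelineModel fit(List<Double> data) {
        List<Transformer> fitted = new ArrayList<>();
        List<Double> current = data;
        for (Stage s : stages) {
            // Estimators are fitted on the output of the stages before them.
            Transformer t = (s instanceof Estimator) ? ((Estimator) s).fit(current) : (Transformer) s;
            fitted.add(t);
            current = t.transform(current);
        }
        return new PipelineModel(fitted);
    }
}

/** Example Estimator: learns the mean of its training data; the model subtracts it. */
final class MeanCenterer implements Estimator {
    @Override
    public Transformer fit(List<Double> data) {
        double mean = data.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        return d -> d.stream().map(x -> x - mean).collect(Collectors.toList());
    }
}

final class PipelineDemo {
    public static void main(String[] args) {
        Transformer doubler = d -> d.stream().map(x -> x * 2).collect(Collectors.toList());
        Pipeline pipeline = new Pipeline(List.of(new MeanCenterer(), doubler));
        PipelineModel model = pipeline.fit(List.of(1.0, 2.0, 3.0)); // training on input1
        System.out.println(model.transform(List.of(4.0, 5.0)));    // inference on input2: [4.0, 6.0]
    }
}
```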
Deep Learning Pipeline
Data Acquisition -> Data Processing & Transformation -> Model Training -> Model Validation -> Model Serving
(plus Parameter Tuning)
Deep Learning Pipeline
[Diagram: a Flink cluster (source 1, source 2 -> join -> udtf) hands data through an external MQ / FS to a separate TensorFlow cluster of workers and parameter servers (PS)]
Flink Deep Learning Pipeline
Flink + Tensorflow Integration
[Diagram: the separate Flink and TensorFlow clusters, connected through an external MQ / FS, are replaced by a single Flink cluster that runs both the data processing (source 1, source 2 -> join -> udtf) and the TensorFlow workers and parameter servers (PS)]
DL on Flink ML Pipeline
[Diagram: the data processing (source 1, source 2 -> join -> udtf) and the TensorFlow workers and PS, wrapped as Transformer and Estimator stages of a Flink ML Pipeline]
Check out flink-ai-extended: https://github.com/alibaba/flink-ai-extended
Serverless
Event Driven Function as a Service
Benefits:
● elastic
● lightweight
Challenges:
● state
○ consistency
○ IO
○ capacity
● hard to build complex applications
Event Driven + State Management + Composable
isn’t that… Stream Processing!
Check out the Stateful Functions project, announced in Oct 2019!
https://statefun.io/
It officially became part of Apache Flink last week.
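The "isn't that stream processing?" argument can be illustrated with a toy, single-process runtime. It is entirely hypothetical; the real Stateful Functions SDK has its own API. Functions are addressed by type and id and are invoked only when a message arrives. Each keeps private state per id, and functions compose by sending messages to each other.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy event-driven runtime: functions addressed by (type, id), each with private state. */
final class ToyStatefulFunctions {
    /** What a function can do while handling one message. */
    interface Context {
        Map<String, Object> state();                       // state of this (type, id) only
        void send(String type, String id, String payload); // message another function
    }

    /** A function type; one logical instance exists per id. */
    interface Function {
        void invoke(Context ctx, String payload);
    }

    private static final class Message {
        final String type;
        final String id;
        final String payload;

        Message(String type, String id, String payload) {
            this.type = type;
            this.id = id;
            this.payload = payload;
        }
    }

    private final Map<String, Function> functions = new HashMap<>();
    private final Map<String, Map<String, Object>> stateByAddress = new HashMap<>();
    private final Queue<Message> mailbox = new ArrayDeque<>();

    void register(String type, Function function) {
        functions.put(type, function);
    }

    /** Ingress: an external event addressed to a function instance. */
    void send(String type, String id, String payload) {
        mailbox.add(new Message(type, id, payload));
    }

    /** Delivers pending messages one at a time, including those sent by functions. */
    void runUntilIdle() {
        while (!mailbox.isEmpty()) {
            Message m = mailbox.poll();
            Map<String, Object> state =
                    stateByAddress.computeIfAbsent(m.type + "/" + m.id, k -> new HashMap<>());
            Context ctx = new Context() {
                @Override
                public Map<String, Object> state() {
                    return state;
                }

                @Override
                public void send(String type, String id, String payload) {
                    mailbox.add(new Message(type, id, payload));
                }
            };
            functions.get(m.type).invoke(ctx, m.payload);
        }
    }

    public static void main(String[] args) {
        ToyStatefulFunctions runtime = new ToyStatefulFunctions();
        runtime.register("greeter", (ctx, name) -> {
            int seen = (Integer) ctx.state().getOrDefault("seen", 0) + 1; // per-user state
            ctx.state().put("seen", seen);
            ctx.send("printer", "console", "Hello " + name + " #" + seen); // compose functions
        });
        runtime.register("printer", (ctx, line) -> System.out.println(line));
        runtime.send("greeter", "alice", "alice");
        runtime.send("greeter", "bob", "bob");
        runtime.send("greeter", "alice", "alice");
        runtime.runUntilIdle(); // prints Hello alice #1, Hello bob #1, Hello alice #2
    }
}
```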
Thanks!
Twitter: @Bowen__Li
Mailing lists: dev@flink.apache.org, user@flink.apache.org
Meetup: Seattle Flink Meetup
https://www.meetup.com/seattle-apache-flink/
Q & A

Stream processing with Apache Flink @ OfferUp
 
Apache Flink @ Alibaba - Seattle Apache Flink Meetup
Apache Flink @ Alibaba - Seattle Apache Flink MeetupApache Flink @ Alibaba - Seattle Apache Flink Meetup
Apache Flink @ Alibaba - Seattle Apache Flink Meetup
 
Opening - Seattle Apache Flink Meetup
Opening - Seattle Apache Flink MeetupOpening - Seattle Apache Flink Meetup
Opening - Seattle Apache Flink Meetup
 

Recently uploaded

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 

Recently uploaded (20)

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 

Apache Flink 101 - the rise of stream processing and beyond

  • 1. Apache Flink 101 - The Rise of Stream Processing and Beyond Bowen Li Committer@Apache Flink, Senior Engineer@Alibaba Nov 20, 2019 @Big Data Bellevue
  • 2. ● Flink use cases ● Flink in a nutshell - what makes Flink successful in stream processing ● Beyond stream processing ○ Batch ○ Data warehousing and Notebook ○ AI/Machine Learning ○ Serverless Agenda
  • 3. Time = Value Business demands Real Time Computation
  • 4.
  • 5. Flink at Alibaba ● Powers real time computations of all business units at Alibaba group ● Powers all search and recommendations, both online and offline ● Provided as cloud service to public on Ali Cloud
  • 6. Singles’ Day Global Shopping Festival on 11/11
  • 7. Singles’ Day Stats - Nov 11, 2019 Alibaba ○ GMV ■ $14 million in the first 21s ■ $1.4 billion in the first 96s ■ $38 billion in 24h ○ 982 PB data generated in total ○ 544,000 transactions/sec at peak Flink @ Alibaba ○ 2 billion events/sec, 3 TB/sec - up 111% from 2018
  • 8. Flink at Alibaba Use Case 1: online ML
  • 9. ○ hundreds of millions of events ○ 100+ billion features ○ end-to-end latency of seconds ○ real-time training, feature and model updates Flink at Alibaba Use Case 1: online ML
  • 10. Flink at Alibaba Use Case 2: Real Time GMV dashboard
  • 11. Flink at Alibaba Use Case 2: Real Time GMV dashboard
  • 12. Flink at Scale Real Time AI / ML Real Time Analytics Real Time Fraud Detection/Risk Management Real Time Dynamic Pricing
  • 13. Flink at Scale Real Time Compute Service on public cloud backed by Flink Kinesis Analytics
  • 14. Flink in a Nutshell - key differentiators from other open source solutions ancient squirrel from Ice Age!
  • 15. Flink in a Nutshell Stateful Computations …...
  • 16. Stateful Computations …... Why does (built-in) state matter? ● computation with context, rather than a single record ● no network IO, lower latency ● no external dependency; the framework has full control, which enables exactly-once semantics ● ...
  • 17. Stateful Computations …... Flink provides built-in state backends that support rich, arbitrary data structures in a fully fault-tolerant way ○ In-memory, spillable backend ○ RocksDB backend
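As a rough illustration of "computation with context", the plain-Java sketch below keeps one counter per key, the way a keyed operator would on the in-memory backend. `KeyedCounter` and its `HashMap` are hypothetical stand-ins, not Flink classes; in Flink the runtime scopes state to the current key and snapshots it for fault tolerance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a keyed operator whose state lives in memory.
// In Flink, the runtime would scope this state to the current key for us.
class KeyedCounter {
    private final Map<String, Long> state = new HashMap<>();

    // Updates the state for `key` with one more record and returns the new count.
    long process(String key) {
        long updated = state.getOrDefault(key, 0L) + 1;
        state.put(key, updated);
        return updated;
    }

    public static void main(String[] args) {
        KeyedCounter counter = new KeyedCounter();
        // Prints "Hello 1", "World 1", "Hello 2": each result depends on earlier records.
        for (String word : new String[] {"Hello", "World", "Hello"}) {
            System.out.println(word + " " + counter.process(word));
        }
    }
}
```

Because the state sits next to the computation, no remote lookup is needed per record, which is the "no network IO" point on slide 16.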
  • 18. Stateful Computations over Event Streams…... Flink in a Nutshell
  • 19. Stateful Computations over Event Streams…... It means a few things… 1. All your data is data streams! ○ batch vs. streaming - just execution models ○ bounded vs. unbounded data streams - the key difference ○ technically, all data processing is stream processing
  • 20. Stateful Computations over Event Streams…... 2. Streaming-first, pipelined execution ○ records flow through the system -> very high throughput, ultra-low latency ○ fundamentally different from batch-first, staged-execution frameworks ○ can’t be achieved with a mini-batch workaround
  • 21. Stateful Computations over Event Streams…... 3. Events come with time! ○ event time vs. processing time ○ window aggregations, sessionization, pattern recognition, time-based joins ○ out-of-order and late data Flink has natively supported the most comprehensive time semantics from the beginning
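One way to read "batch is just bounded streaming" from slide 19: write the logic once against an `Iterator`, which may or may not end. The hypothetical `RunningSum` below emits an updated result per record, streaming style. On a bounded input, the last emitted value equals what a one-shot batch sum would return. The names are illustrative and not part of any Flink API.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical operator: the same code handles bounded and unbounded inputs.
class RunningSum {
    // Emits the running total to `sink` after every record. A bounded input simply
    // stops returning elements; an unbounded one would keep the loop going forever.
    static void process(Iterator<Long> input, Consumer<Long> sink) {
        long sum = 0;
        while (input.hasNext()) {
            sum += input.next();
            sink.accept(sum);
        }
    }

    public static void main(String[] args) {
        List<Long> bounded = List.of(3L, 4L, 5L);
        // Prints 3, 7 and 12 on separate lines; 12 is the "batch" answer.
        process(bounded.iterator(), total -> System.out.println(total));
    }
}
```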
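To make event time, out-of-order data and late data on slide 21 concrete, here is a simplified, self-contained model of tumbling event-time windows driven by a watermark that trails the largest timestamp seen by a fixed bound. It is an assumption-laden sketch of the semantics only. The class, fields and the "count per window" aggregation are hypothetical, and this is not how Flink implements windowing internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified model of event-time tumbling windows driven by a watermark that trails
// the largest timestamp seen so far by a fixed bound. Semantics only, not Flink's code.
class EventTimeWindows {
    final long size;               // window length, in the same unit as timestamps
    final long maxOutOfOrderness;  // how far the watermark trails the max timestamp
    long maxTimestamp = Long.MIN_VALUE;
    long watermark = Long.MIN_VALUE;
    final TreeMap<Long, Integer> open = new TreeMap<>(); // window start -> event count
    final List<String> fired = new ArrayList<>();         // results as "[start,end)=count"
    int lateDropped = 0;

    EventTimeWindows(long size, long maxOutOfOrderness) {
        this.size = size;
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    void onEvent(long timestamp) {
        long start = timestamp - Math.floorMod(timestamp, size);
        if (start + size - 1 <= watermark) {
            lateDropped++;                  // its window has already fired: late data
            return;
        }
        open.merge(start, 1, Integer::sum); // out-of-order but not late: still counted
        maxTimestamp = Math.max(maxTimestamp, timestamp);
        advanceWatermark(maxTimestamp - maxOutOfOrderness);
    }

    private void advanceWatermark(long candidate) {
        if (candidate <= watermark) return; // watermarks only move forward
        watermark = candidate;
        // Fire every window whose last timestamp (end - 1) the watermark has reached.
        while (!open.isEmpty() && open.firstKey() + size - 1 <= watermark) {
            long start = open.firstKey();
            int count = open.remove(start);
            fired.add("[" + start + "," + (start + size) + ")=" + count);
        }
    }

    public static void main(String[] args) {
        EventTimeWindows windows = new EventTimeWindows(10, 5);
        for (long ts : new long[] {1, 4, 12, 8, 17, 3, 26}) {
            windows.onEvent(ts);
        }
        // Prints "[[0,10)=3, [10,20)=2] late=1"; window [20,30) is still open.
        System.out.println(windows.fired + " late=" + windows.lateDropped);
    }
}
```

In this run, the event at 8 arrives after 12 but is still counted because the watermark (7) has not passed its window. The event at 3 arrives after the watermark reached 12, so it is treated as late. A real job would also flush the remaining open window when the input ends.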
  • 22. Stateful Computations over Event Streams in an Expressive …... Flink in a Nutshell
  • 23. Stateful Computations over Event Streams in an Expressive …... Layered APIs with the most comprehensive semantics
  • 24. Flink’s layered APIs, from top to bottom: Streaming SQL, Table API, DataStream API, ProcessFunction Toward the top, more: declarative, optimizable, understandable, stable, unified for streaming and batch Toward the bottom, more: advanced, precise control, optimized (if you know what you are doing)
  • 25. Streaming SQL Flink SQL> CREATE TABLE test(`user` BIGINT, msg VARCHAR, ts VARCHAR) WITH ( 'connector.type' = 'kafka', 'connector.topic' = 'topic_name', 'format.type' = 'avro', 'connector.startup-mode' = 'earliest-offset' ) Flink SQL> SELECT * FROM test; Flink SQL> INSERT INTO unique_user SELECT DISTINCT `user` FROM test;
  • 26. Table API // Java/Scala/Python tableEnv .connect( new Kafka().version("universal") .topic("topic_name") .startFromEarliest()) .withSchema(...) .inAppendMode() .registerTableSource("test"); Table test = tableEnv.scan("test"); test.select("user").distinct().insertInto("unique_user");
  • 27. DataStream API // Java/Scala FlinkKafkaConsumer<...> consumer = new FlinkKafkaConsumer<>("test", ...); consumer.setStartFromEarliest(); DataStream<...> stream = env.addSource(consumer); stream .keyBy("user") .flatMap(new DataStreamDistinctReduce() { … });
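The body of `DataStreamDistinctReduce` is elided on slide 27. The plain-Java sketch below shows only the per-key idea any such distinct operator needs: remember whether a user was seen and emit on the first occurrence. `DistinctUsers` is hypothetical, not the slide's class, and the `HashSet` stands in for a per-key flag that Flink would keep in keyed state.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical per-key dedup logic, the same idea as "SELECT DISTINCT `user`".
// `seen` stands in for a per-key boolean flag held in Flink keyed state.
class DistinctUsers {
    private final Set<Long> seen = new HashSet<>();

    // Returns true (emit downstream) only the first time a user id is observed.
    boolean process(long user) {
        return seen.add(user);
    }

    static List<Long> distinct(List<Long> users) {
        DistinctUsers op = new DistinctUsers();
        List<Long> out = new ArrayList<>();
        for (long user : users) {
            if (op.process(user)) out.add(user);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints "[7, 3, 9]": duplicates are suppressed, first-seen order is kept.
        System.out.println(distinct(List.of(7L, 3L, 7L, 9L, 3L)));
    }
}
```

On an unbounded stream this state grows with the number of distinct users, which is why the choice of state backend (in-memory vs. RocksDB) matters for jobs like this.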
  • 28. Does SQL Make Sense in Streaming? Stream-Table Duality ○ Stream (Word, Count): Hello 1, World 1, Hello 2, World 2, Hello 3, Hello 4 ○ Table (Word, Count), the materialized changelog: Hello 4, World 2 ○ popular as CDC in database replication
  • 29. Does SQL Make Sense in Streaming? Stream Data → Dynamic Table → Continuous Query → Dynamic Table → Stream Data; Batch Data → Static Table → Static Query → Static Table → Batch Data; a static table is a snapshot of a dynamic table, and Flink runs both kinds of query
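The stream-to-table direction on slide 28 is an upsert per key: replaying the changelog and keeping the latest value for each word yields the table. The sketch below reproduces the slide's data in plain Java. `ChangelogTable` is a hypothetical name and illustrates only the duality, not Flink's dynamic-table implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Materializes an upsert changelog into a table: the latest value per key wins.
class ChangelogTable {
    static Map<String, Long> materialize(List<Map.Entry<String, Long>> changelog) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Map.Entry<String, Long> change : changelog) {
            table.put(change.getKey(), change.getValue()); // upsert by key
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> changelog = List.of(
                Map.entry("Hello", 1L), Map.entry("World", 1L), Map.entry("Hello", 2L),
                Map.entry("World", 2L), Map.entry("Hello", 3L), Map.entry("Hello", 4L));
        // Prints "{Hello=4, World=2}", the table on the slide.
        System.out.println(materialize(changelog));
    }
}
```

The reverse direction is just as simple: emitting every `put` as it happens turns the table back into the changelog.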
  • 30. Stateful Computations over Event Streams in an Expressive, Scalable …... Flink in a Nutshell
  • 31. Stateful Computations over Event Streams in an Expressive, Scalable …... ● Horizontally scalable ● Battle tested ○ trillions of records per day ○ terabytes of state ○ run on thousands of cores
  • 32. Stateful Computations over Event Streams in an Expressive, Scalable, Operational-focused …... Flink in a Nutshell
  • 33. Stateful Computations over Event Streams in an Expressive, Scalable, Operational-focused …... ● Deploy anywhere ○ Kubernetes, YARN, Mesos, standalone ● Deploy flexibly ○ per-job mode, session mode ● Highly available ○ with HA setup
  • 34. Stateful Computations over Event Streams in an Expressive, Scalable, Operational-focused, Fault Tolerant way Flink in a Nutshell
  • 35. Stateful Computations over Event Streams in an Expressive, Scalable, Operational-focused, Fault Tolerant way ● Checkpoint/Savepoint ○ taken on the fly, without sacrificing performance ○ incremental checkpoints supported ● Exactly-once ○ State consistency ○ End-to-end with transactional connectors
  • 36. Apache Flink: Stateful Computations over Event Streams in an Expressive, Scalable, Operational-focused, Fault Tolerant way The only open source framework that provides all the above capabilities
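Why checkpoints give exactly-once *state* consistency (slide 35): a snapshot records the source offset together with the operator state, so after a failure both are rewound to the same point and each record affects the state exactly once. The sketch below models only that idea. `CheckpointedSum` and its parameters are hypothetical. Flink's real checkpoints are taken asynchronously using barriers that flow through the dataflow, which this model does not attempt to reproduce.

```java
import java.util.List;

// Simplified model of checkpoint-based recovery for a running sum.
// A checkpoint stores (source offset, operator state) as one consistent pair.
class CheckpointedSum {
    // Processes `input`, checkpointing every `checkpointEvery` records, and simulates
    // one crash when the source reaches `failAtOffset` (-1 means no failure).
    static long run(List<Long> input, int checkpointEvery, int failAtOffset) {
        long sum = 0;            // operator state
        int offset = 0;          // source position
        long snapshotSum = 0;    // last completed checkpoint: the state...
        int snapshotOffset = 0;  // ...and the matching source offset
        boolean failed = false;

        while (offset < input.size()) {
            if (!failed && offset == failAtOffset) {
                failed = true;
                sum = snapshotSum;        // in-flight state is lost; restore the snapshot
                offset = snapshotOffset;  // rewind the source to the same point
                continue;
            }
            sum += input.get(offset);
            offset++;
            if (offset % checkpointEvery == 0) {
                snapshotSum = sum;
                snapshotOffset = offset;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> input = List.of(1L, 2L, 3L, 4L, 5L);
        // Both print 15: recovery does not double-count the replayed record.
        System.out.println(run(input, 2, -1));
        System.out.println(run(input, 2, 3));
    }
}
```

For contrast, rewinding only the offset while keeping the in-memory sum (6 at the crash) would replay the third record on top of it and end at 18, not 15. End-to-end exactly-once additionally needs sinks that participate in the checkpoint, such as the transactional connectors the slide mentions.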
  • 37. Going Beyond Stream Processing ● Batch -> Unified Data Processing ● Data Warehousing and Notebook ● Machine Learning / AI / DL ● Serverless
  • 38. Why Unified Streaming and Batch Data Processing? Recap the lambda architecture: MQ / Pub-Sub → Stream Processing (online); HDFS / S3 → Batch Processing (offline); Combine Results (serving) ○ infra: high operational cost ○ dev: costly maintenance, and hard to learn 2+ stacks ○ business: hard to keep both pipelines in sync to guarantee consistent logic
  • 39. Why Flink? Flink’s philosophy: streaming first, with batch as a special case of streaming
  • 40. State-of-the-Art Batch Processing on a Stream Processor <= Flink 1.8 from Flink 1.9
  • 41. Performance of Blink versus Spark in the TPC-DS benchmark, aggregate time for all queries together. Presentation by Xiaowei Jiang at Flink Forward Beijing, 2018.
  • 42. Data Warehousing Initial integration in Flink 1.9 for Hive 2.3.4 and 1.2.1 Full integration (read, write, udf) in Flink 1.10 for all Hive 1.x, 2.x, and 3.x
  • 45. Machine Learning & AI Recap the “lambda” architecture, again, for ML MQ / Pub-Sub HDFS / S3 Online Training Offline Training Model Validation Preprocessing Dynamic Model Static Model Static Model Preprocessing Inference
  • 46. Flink is popular for online ML now MQ / Pub-Sub HDFS / S3 Online Training Offline Training Model Validation Preprocessing Dynamic Model Static Model Static Model Preprocessing Inference
  • 47. Streaming-Batch Unified ML Use Flink everywhere to reduce maintenance and operation cost of code and infra MQ / Pub-Sub HDFS / S3 Online Training Offline Training Model Validation Preprocessing Dynamic Model Static Model Static Model Preprocessing Inference
  • 48. Machine Learning & AI ML Stage ML Flow Efforts & Requirements MQ / Pub-Sub HDFS / S3 Online Training Offline Training Model Validation Preprocessing Dynamic Model Static Model Static Model Preprocessing Inference Data Acquisition Preprocessing Model Training Model Validation & Serving Inference Rich Connectors Dataset Management Stream-Batch unification Strong API & SQL Support Enhanced Iteration Flink ML lib DL on Flink (TF, PyTorch) Model Serving Model Registry & Management Rollout/Rollback Online Evaluation Flink ML Pipeline Python API support
  • 49. Machine Learning & AI ML Stage ML Flow Efforts & Requirements MQ / Pub-Sub HDFS / S3 Online Training Offline Training Model Validation Preprocessing Dynamic Model Static Model Static Model Preprocessing Inference Data Acquisition Preprocessing Model Training Model Validation & Serving Inference Rich Connectors Dataset Management Stream-Batch unification Strong API & SQL Support Enhanced Iteration Flink ML lib DL on Flink (TF, PyTorch) Model Serving Model Management Rollout/Rollback Online Evaluation Flink ML Pipeline Python API support
  • 50. Flink ML Libs ● Completely rewritten ● Based on ML pipeline, powered by Table API ● Battle-tested algorithms ○ K-means ○ Naive Bayes ○ Linear regression ○ GBDT ○ Decision tree ○ PCA ○ Random forest ○ Correlation ○ ….
  • 51. Flink ML Pipeline Training: input table 1 → Transformer → Estimator → Transformer (pipeline.fit(input1) turns the Estimator Pipeline into a Pipeline Model) Inference: input table 2 → Transformer → Model Transformer → result table (pipeline.transform(input2)) Two types of operators ● Transformer: data -> data ● Estimator: data -> model
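The two operator kinds on slide 51 can be sketched with two tiny interfaces: a Transformer maps data to data, and an Estimator fits data into a model that is itself a Transformer. The code below is a hypothetical, simplified illustration on `double[]` instead of Tables. `Transformer`, `Estimator`, `MeanCenterer`, `Scale` and `Pipeline` here are not the Flink ML API, and their signatures are assumptions made for the sketch.

```java
import java.util.Arrays;

// Hypothetical, simplified interfaces mirroring the slide's two operator kinds.
// They work on double[] instead of Flink Tables and are NOT the Flink ML API.
interface Transformer {
    double[] transform(double[] data);  // data -> data
}

interface Estimator {
    Transformer fit(double[] data);     // data -> model (a model is a Transformer)
}

// Estimator that learns the mean and returns a model that centers data on it.
class MeanCenterer implements Estimator {
    public Transformer fit(double[] data) {
        double mean = Arrays.stream(data).average().orElse(0.0);
        return input -> Arrays.stream(input).map(x -> x - mean).toArray();
    }
}

// Stateless Transformer: multiplies every value by a constant.
class Scale implements Transformer {
    private final double factor;

    Scale(double factor) { this.factor = factor; }

    public double[] transform(double[] data) {
        return Arrays.stream(data).map(x -> x * factor).toArray();
    }
}

// A two-stage pipeline: fitting it on training data yields a pipeline model.
class Pipeline {
    private final Transformer preprocess;
    private final Estimator estimator;

    Pipeline(Transformer preprocess, Estimator estimator) {
        this.preprocess = preprocess;
        this.estimator = estimator;
    }

    // Training: preprocess, fit the estimator, and bundle both into one Transformer.
    Transformer fit(double[] training) {
        Transformer model = estimator.fit(preprocess.transform(training));
        return input -> model.transform(preprocess.transform(input));
    }

    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline(new Scale(2.0), new MeanCenterer());
        Transformer pipelineModel = pipeline.fit(new double[] {1.0, 2.0, 3.0});
        // Scaled training data is [2, 4, 6] (mean 4), so 5.0 -> 10.0 -> 6.0.
        System.out.println(Arrays.toString(pipelineModel.transform(new double[] {5.0})));
    }
}
```

Returning the fitted pipeline as a Transformer keeps training and inference symmetric: the same object that was produced by `fit` is what serves predictions.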
  • 52. Deep Learning Pipeline Data Acquisition Data Process & Transformation Model Training Model Validation Model Serving Parameter Tuning
  • 53. Deep Learning Pipeline source 1 source 2 join udtf Flink Cluster External MQ / FS Tensorflow Cluster worker worker worker PS PS
  • 54. Flink Deep Learning Pipeline Data Acquisition Data Process & Transformation Model Training Model Validation Model Serving Parameter Tuning
  • 55. Flink + Tensorflow Integration source 1 source 2 join udtf Flink Cluster External MQ / FS Tensorflow Cluster worker worker worker PS PS source 1 source 2 join udtf a single Flink Cluster worker worker worker PS PS
  • 56. DL on Flink ML Pipeline source 1 source 2 join udtf worker worker worker PS PS Transformer Estimator check out flink-ai-extended https://github.com/alibaba/flink-ai-extended
  • 58. Serverless Event Driven Function as a Service Benefits: ● elastic ● lightweight Challenges: ○ state ■ consistency ■ IO ■ capacity ○ hard to build complex applications
  • 59. Event Driven State Management Composable isn’t that….
  • 60. Event Driven State Management Composable isn’t that…. Stream Processing!
  • 61. Check out the Stateful Functions project, announced in Oct 2019! https://statefun.io/ It officially became part of Apache Flink last week.
  • 62. Thanks! Twitter: @Bowen__Li ML: dev / user@flink.apache.org Meetup: Seattle Flink Meetup https://www.meetup.com/seattle-apache-flink/
  • 63. Q & A