© 2019 Ververica
Seth Wiesman
Senior Solutions Architect @ Ververica
Committer Apache Flink
Unified Data Processing with Apache Flink and Apache Pulsar
About Ververica (the company formerly known as “data Artisans”)
Original Creators of Apache Flink®
Enterprise Stream Processing with Ververica Platform
Subsidiary of Alibaba Group
Apache Flink
Apache Flink at the "Singles Day" (11/11/2019)
•throughput: 2.5 B events / sec
•containers: 2M
•data size: 985 PB
•latency: sub-second
•state size: 100 TB
Why Stream Processing?
Stream Processing is real-time data processing and real-time data-driven actions
Stream Processing is the unification of real-time and offline analytics
Stream Processing is the intersection of data analytics and applications
Stream Processing is to event-driven applications what the database is to request/response apps
Stream Processing is a flexible and extensible architecture for data-driven applications
Stream Processing changes how Applications and Data interact
Batch Proc. or Req/resp.: the Application / Business Logic sends a request/trigger to the data store (Datalake, Database) and receives a result/response
Stream Processing: an event stream enters the Stream Processor, which runs the Application / Business Logic and emits an event stream
•events are the data
•events act as triggers
•application logic triggered by events/changes
What is Stream Processing for?
Batch Proc. or Req/resp.: data changes slowly, query/logic changes fast (ad-hoc queries, data exploration, ML model training)
Continuous Streaming: data changes fast, query/logic changes slowly (most business logic)
The Spectrum of Streaming Data Use Cases
Axis: more lag time ↔ more real time. Use cases on the slide:
•data warehousing
•OLAP / BI / reporting
•continuous monitoring (position, risk, …)
•real-time ML model training/evaluation
•distributed OLTP-style apps
•continuous ETL
•real-time behavior modeling (recommenders, pricing, ..)
•machine learning model training
•unified offline/real-time analytics
•real-time alerts (fraud, security, …)
Stateful Single Record Processing
Everything is a Stream
•Streams of records in a log or MQ
•Stream of requests/responses to/from services (Service, DB) → event sourcing architecture; requests such as GET /a/b, POST /b/c, PUT /e/f and responses such as 200, 404, 200, 200, 403
•Stream of rows in a table or in files, e.g. rows timestamped 2016-3-1 12:00 am, 1:00 am, 2:00 am, …, 2016-3-11 10:00pm, 11:00pm, 2016-3-12 12:00am, 1:00am, 2:00am, 3:00am, …; a contiguous range of these rows is a batch
•Streams may span storage systems: the more distant past (e.g., compressed files in DFS/Object Store, Parquet files) and the recent past (e.g., events in MQ/Log, Avro records)
Bounded and Unbounded Streams
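One way to read the bounded/unbounded distinction is that the same processing logic can run over both kinds of input. The following is a minimal sketch in plain Scala, not Flink code; the names `BoundedUnbounded`, `runningSum`, `bounded`, and `unbounded` are made up for illustration.

```scala
object BoundedUnbounded {
  // Running sum: emits one updated total per input element.
  // It only needs an Iterator, so it does not care whether the input ever ends.
  def runningSum(events: Iterator[Int]): Iterator[Int] =
    events.scanLeft(0)(_ + _).drop(1)

  // A bounded stream: a finite batch of records.
  def bounded: Iterator[Int] = Iterator(1, 2, 3, 4)

  // An unbounded stream: 1, 2, 3, ... with no end.
  def unbounded: Iterator[Int] = Iterator.from(1)
}
```

On the bounded input the result can be fully materialized with `runningSum(bounded).toList`; on the unbounded input only a prefix can be observed, for example `runningSum(unbounded).take(4).toList`.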
Components of a Streaming Data Architecture
•Event producers (applications, servers, databases, sensors)
•Log / Stream Storage (Pulsar)
•Stream Processing (Apache Flink)
•Results (Views) (K/V stores, databases)
•Triggered Applications
Apache Flink: Analytics and Applications on Streaming Data
•Stateful Stream Processing: Streams, State, Time
•Event-driven Applications: Stateful Functions
•Streaming Analytics: SQL and Tables
•Flink Runtime: Stateful Computations over Data Streams
Stateful Stream Processing
Stateful Stream Processing
[Figure: dataflow graph in which Source (Stream) and Source (Static) feed Transformation / Computation nodes, each with its own State, which write to Sinks]
Example Use Cases
•Real time search and recommendation models (e.g., Alibaba)
•Build a real-time session behavior profile of users (e.g., Netflix)
•Real time trade settlement dashboard (e.g., UBS)
•Real time revenue accounting (various AdTechs)
•Machine Learning-based anomaly/fraud detection (e.g., ING, Microsoft)
•Real-time data refinement and data pipelines (many)
DataStream API
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
// Source
val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer011(…))
// Transformation
val events: DataStream[Event] = lines.map((line) => parse(line))
// Windowed Transformation: a custom function goes through aggregate(); sum() takes a field
val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .aggregate(new MyAggregationFunction())
// Sink
stats.addSink(new RollingSink(path))
Streaming Dataflow: Source → Transform → Window (state read/write) → Sink
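To make the windowed transformation concrete (key by sensor, 5-second time windows, then an aggregation), here is a minimal sketch in plain Scala of what such a pipeline computes. It is not Flink code and it processes a finished collection rather than a live stream. The `Event` fields, `windowStart`, and `sumPerWindow` are assumptions for illustration, since the slide does not show the `Event` class or the aggregation function.

```scala
object TumblingWindowSketch {
  // Hypothetical event type: the slide's Event class is not shown.
  final case class Event(sensor: String, timestampMs: Long, value: Double)

  // Start of the tumbling window containing the timestamp
  // (assumes non-negative timestamps).
  def windowStart(timestampMs: Long, sizeMs: Long): Long =
    timestampMs - (timestampMs % sizeMs)

  // Group events by (sensor, window start) and sum the values in each group.
  def sumPerWindow(events: Seq[Event], sizeMs: Long): Map[(String, Long), Double] =
    events
      .groupBy(e => (e.sensor, windowStart(e.timestampMs, sizeMs)))
      .map { case (key, group) => key -> group.map(_.value).sum }
}
```

A stream processor would keep the per-key, per-window sums as state and update them incrementally as events arrive; the sketch leaves out that part, along with late data and watermarks.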
DataStream API Process Functions
Streaming Analytics
Example Use Cases
•Realtime Analytics Platforms (e.g., Alibaba, Uber, Lyft, Yelp!, Tencent)
•Materializing Views (dashboards, data marts)
•ETL - batch and continuous
•Machine Learning Training (Alibaba, new ML library)
SQL / Table API – Batch Queries
SQL Query → Batch Query Execution
SELECT
room,
TUMBLE_END(rowtime, INTERVAL '1' HOUR),
AVG(temperature)
FROM
sensors
GROUP BY
TUMBLE(rowtime, INTERVAL '1' HOUR), room
Full TPC-DS support in Flink 1.10
Interpreting Streams as Tables
© 2019 Ververica36
SQL / Table API – Streaming Data Case
SELECT
room,
TUMBLE_END(rowtime, INTERVAL '1' HOUR),
AVG(temperature)
FROM
sensors
GROUP BY
TUMBLE(rowtime, INTERVAL '1' HOUR), room
Interpret Stream as Table → SQL Query → Incremental Query Execution → output result changes as stream, or update database with changes
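Incremental query execution can be illustrated with a simpler, non-windowed variant of the query (an average temperature per room, without the TUMBLE window). The sketch below is plain Scala, not Flink's implementation. It keeps a running (sum, count) per room and emits one result change per input row. The `Reading` and `Update` types and the `updates` function are hypothetical names.

```scala
object IncrementalAvgSketch {
  // Hypothetical row type for the `sensors` table.
  final case class Reading(room: String, temperature: Double)
  // One change to the result table: the new average for a room.
  final case class Update(room: String, avg: Double)

  // Consume rows one at a time; the state is (sum, count) per room.
  // Each input row yields the updated average for its room.
  def updates(rows: Iterator[Reading]): Iterator[Update] = {
    val state = scala.collection.mutable.Map.empty[String, (Double, Long)]
    rows.map { r =>
      val (sum, count) = state.getOrElse(r.room, (0.0, 0L))
      val next = (sum + r.temperature, count + 1)
      state.update(r.room, next)
      Update(r.room, next._1 / next._2)
    }
  }
}
```

The emitted updates can be forwarded as a stream of changes or applied as upserts to a database keyed by room, which corresponds to the two outputs named on the slide.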
FLIP-72
Add Pulsar connectors and Catalog to Apache Flink
> CREATE CATALOG my_pulsar WITH (
'type' = 'pulsar',
'adminUrl' = 'localhost:9092'
);
> USE my_pulsar;
> INSERT INTO aggregations
SELECT
room,
TUMBLE_END(rowtime, INTERVAL '1' HOUR),
AVG(temperature)
FROM
sensors
GROUP BY
TUMBLE(rowtime, INTERVAL '1' HOUR), room
Materialized Views Example
[Figure: CDC events flow into a log; continuous SQL queries read the log and maintain materialized views and an archive]
•View Materialization (streaming): a continuous SQL query keeps the materialized views up to date
•Dashboard: many short queries (batch) against the materialized views
Many handy SQL features: Temporal Joins, Pattern Matching, …
SELECT tf.time,
tf.price * rh.rate AS conv_fare
FROM taxiFare AS tf,
LATERAL TABLE (Rates(tf.time)) AS rh
WHERE tf.currency = rh.currency;
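The temporal join above pairs each fare with the exchange rate that was valid at the fare's own time. A minimal sketch of that lookup in plain Scala follows. It is not how Flink executes the join, and the `Fare` and `Rate` fields as well as `ratesAsOf` and `convert` are assumptions modeled on the column names in the query.

```scala
object TemporalJoinSketch {
  // Hypothetical row types; field names follow the query on the slide.
  final case class Fare(time: Long, price: Double, currency: String)
  final case class Rate(time: Long, currency: String, rate: Double)

  // Rates(t): for each currency, the latest rate whose time is <= t.
  def ratesAsOf(history: Seq[Rate], t: Long): Map[String, Double] =
    history
      .filter(_.time <= t)
      .groupBy(_.currency)
      .map { case (currency, rs) => currency -> rs.maxBy(_.time).rate }

  // Join each fare with the rate valid at the fare's time.
  // Fares without a matching rate are dropped, as in an inner join.
  def convert(fares: Seq[Fare], history: Seq[Rate]): Seq[(Long, Double)] =
    for {
      tf <- fares
      rate <- ratesAsOf(history, tf.time).get(tf.currency).toSeq
    } yield (tf.time, tf.price * rate)
}
```

The sketch recomputes the rate table for every fare, which keeps it short. A streaming engine would instead keep the versioned rates as state and look them up as fares arrive.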
Event-driven Applications
Classical Tiered Application Architecture
Consistency in Database Applications
For any failure in any call, it becomes hard to reason about what effects did or did not already happen
Applying the Stream Processing Approach to Applications
Stateful Functions
Stream Processing and F-a-a-S
Properties: simplicity / generality, state management, composability, lightweight resources, performance, event-driven
Can we combine some of these properties?
Stateful Functions
[Figure: functions f(a,b) exchange messages between event ingress and event egress; their state is snapshotted to mass storage (S3, GCF, ECS, HDFS, …)]
Stateful Functions compared to Stream Proc. & Apache Flink
[Figure: Apache Flink DataStream/Table jobs next to Stateful Functions f(a,b) running on a Pool of Resources (Apache Flink Cluster)]
Solves two major challenges
•Arbitrary Function-to-Function messaging. Not restricted to a DAG.
•Functions are multiplexed and share resources. Makes it possible to run many very small jobs.
Example: Ride Sharing App
•Inputs: Driver status updates, Passenger ride requests, Ride status update
•Functions: Driver, Ride, Passenger, Geo-index
•Messages: update, create, bill, inform / book, bid, lookup, update cell
•States shown: seeking, confirmed, riding, free, bidding, booked
•Stream Processing / Streaming SQL (event/stream-centric): data preparation; combining knowledge/information; filtering, enriching, aggregating, joining events
•Stateful Functions (state-centric): coordination, (interacting) state machines; complex event/state interactions
•F-a-a-S (stateless / compute-centric): “occasional” actions or spiky loads; compute-intensive or blocking
Putting it all together
•FaaS: render map/route image, create a receipt PDF, send email
•Stateful Functions: ride life-cycle, driver-to-ride matching
•Stream Processing: traffic models, demand forecast & pricing
•Other labels on the diagram: Billing, Passenger updates, Driver position updates, Driver status updates
❤
Thank You!

Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 

Unified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman

  • 1. © 2019 Ververica Seth Wiesman Senior Solutions Architect @ Ververica Committer Apache Flink Unified Data Processing with Apache Flink and Apache Pulsar
  • 2. About Ververica (the company formerly known as “data Artisans”): original creators of Apache Flink®; Enterprise Stream Processing with Ververica Platform; subsidiary of Alibaba Group
  • 5. Apache Flink at the "Singles Day" (11/11/2019): 2M containers; 985 PB data size; 2.5 B events / sec throughput; sub-second latency; 100 TB state size
  • 6. Why Stream Processing?
  • 7. Stream Processing is real-time data processing and real-time data-driven actions
  • 8. Stream Processing is the unification of real-time and offline analytics
  • 9. Stream Processing is the intersection of data analytics and applications
  • 10. Stream Processing is to event-driven applications what the database is to request/response apps
  • 11. Stream Processing is a flexible and extensible architecture for data-driven applications
  • 12. Stream Processing changes how Applications and Data interact. Batch Proc. or Req/resp.: Application / Business Logic sends a request/trigger to (Datalake, Database) and receives a result/response. Stream Processing: event stream in, Stream Processor (Application / Business Logic), event stream out; events are the data; events act as triggers; application logic triggered by events/changes
  • 13. What is Stream Processing for? Batch Proc. or Req/resp. (data changes slowly, query/logic changes fast): ad-hoc queries, data exploration, ML model training. Continuous Streaming (data changes fast, query/logic changes slowly): most business logic
  • 14. The Spectrum of Streaming Data Use Cases (ranging from more lag time to more real time): data warehousing; OLAP / BI / reporting; continuous monitoring (position, risk, …); real-time ML model training/evaluation; distributed OLTP-style apps; continuous ETL; real-time behavior modeling (recommenders, pricing, ..); machine learning model training; unified offline/real-time analytics; real-time alerts (fraud, security, …)
  • 15. Stateful Single Record Processing
  • 16. Everything is a Stream: Streams of Records in a Log or MQ
  • 17. Everything is a Stream: Stream of Requests/Responses to/from Services (Service, DB) → event sourcing architecture. Requests: GET /a/b, POST /b/c, PUT /e/f. Responses: 200, 404, 200, 200, 403
  • 18. Everything is a Stream: Stream of Rows in a Table or in Files (2016-3-1 12:00 am, 2016-3-1 1:00 am, 2016-3-1 2:00 am, 2016-3-11 11:00pm, 2016-3-12 12:00am, 2016-3-12 1:00am, 2016-3-11 10:00pm, 2016-3-12 2:00am, 2016-3-12 3:00am, …)
  • 19. Everything is a Stream: Stream of Rows in a Table or in Files (2016-3-1 12:00 am, 2016-3-1 1:00 am, 2016-3-1 2:00 am, 2016-3-11 11:00pm, 2016-3-12 12:00am, 2016-3-12 1:00am, 2016-3-11 10:00pm, 2016-3-12 2:00am, 2016-3-12 3:00am, …); a batch
  • 20. Everything is a Stream: Streams may span storage systems (2016-3-1 12:00 am, 2016-3-1 1:00 am, 2016-3-1 2:00 am, 2016-3-11 11:00pm, 2016-3-11 10:00pm, …). Parquet files; Avro records; more distant past (e.g., compressed files in DFS/Object Store); recent past (e.g., events in MQ/Log)
  • 22. Bounded and Unbounded Streams
  • 23. Components of a Streaming Data Architecture: Event producers (applications, servers, databases, sensors) → Log / Stream Storage (Pulsar) → Stream Processing (Apache Flink) → Results (Views) (K/V stores, databases) and Triggered Applications
  • 24. Apache Flink: Analytics and Applications on Streaming Data. Flink Runtime: Stateful Computations over Data Streams. Stateful Stream Processing (Streams, State, Time); Event-driven Applications (Stateful Functions); Streaming Analytics (SQL and Tables)
  • 25. Stateful Stream Processing
  • 26. Apache Flink: Analytics and Applications on Streaming Data. Flink Runtime: Stateful Computations over Data Streams. Stateful Stream Processing (Streams, State, Time); Event-driven Applications (Stateful Functions); Streaming Analytics (SQL and Tables)
  • 27. Stateful Stream Processing: Source (Stream) and Source (Static) → Transformation / Computation (with State) → Sink
  • 28. Example Use Cases: real-time search and recommendation models (e.g., Alibaba); build a real-time session behavior profile of users (e.g., Netflix); real-time trade settlement dashboard (e.g., UBS); real-time revenue accounting (various AdTechs); Machine Learning-based anomaly/fraud detection (e.g., ING, Microsoft); real-time data refinement and data pipelines (many)
  • 29. DataStream API. Streaming Dataflow: Source → Transform → Window (state read/write) → Sink
      // Source
      val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer011(…))
      // Transformation
      val events: DataStream[Event] = lines.map((line) => parse(line))
      // Windowed Transformation (applied to the parsed events)
      val stats: DataStream[Statistic] = events.keyBy("sensor").timeWindow(Time.seconds(5)).aggregate(new MyAggregationFunction())
      // Sink
      stats.addSink(new RollingSink(path))
  • 30. DataStream API: Process Functions
  • 32. Apache Flink: Analytics and Applications on Streaming Data. Flink Runtime: Stateful Computations over Data Streams. Stateful Stream Processing (Streams, State, Time); Event-driven Applications (Stateful Functions); Streaming Analytics (SQL and Tables)
  • 33. Example Use Cases: real-time analytics platforms (e.g., Alibaba, Uber, Lyft, Yelp!, Tencent); materializing views (dashboards, data marts); ETL, batch and continuous; Machine Learning training (Alibaba, new ML library)
  • 34. SQL / Table API – Batch Queries: SQL Query → Batch Query Execution. Full TPC-DS support in Flink 1.10
      SELECT room, TUMBLE_END(rowtime, INTERVAL '1' HOUR), AVG(temperature)
      FROM sensors
      GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), room
  • 35. Interpreting Streams as Tables
  • 36. SQL / Table API – Streaming Data Case: SQL Query → Interpret Stream as Table → Incremental Query Execution; output result changes as stream; update database with changes
      SELECT room, TUMBLE_END(rowtime, INTERVAL '1' HOUR), AVG(temperature)
      FROM sensors
      GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), room
  • 37. FLIP-72: Add Pulsar connectors and Catalog to Apache Flink
      > CREATE CATALOG my_pulsar ( 'type' = 'pulsar', 'adminUrl' = 'localhost:9092' );
      > USE my_pulsar;
      > INSERT INTO aggregations
        SELECT room, TUMBLE_END(rowtime, INTERVAL '1' HOUR), AVG(temperature)
        FROM sensors
        GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), room
  • 38. Materialized Views Example: log / CDC → Continuous SQL Query (three queries) → Materialized View, Materialized View, Archive
  • 39. Materialized Views Example: log / CDC → Continuous SQL Query → Materialized Views. View Materialization (streaming); Dashboard: Many short queries (batch)
  • 40. Many handy SQL features: Temporal Joins, Pattern Matching, …
      SELECT tf.time, tf.price * rh.rate AS conv_fare
      FROM taxiFare AS tf, LATERAL TABLE (Rates(tf.time)) AS rh
      WHERE tf.currency = rh.currency;
  • 42. Apache Flink: Analytics and Applications on Streaming Data. Flink Runtime: Stateful Computations over Data Streams. Stateful Stream Processing (Streams, State, Time); Event-driven Applications (Stateful Functions); Streaming Analytics (SQL and Tables)
  • 43. Classical Tiered Application Architecture (diagram: three Apps)
  • 44. Consistency in Database Applications (diagram: three Apps)
  • 45. Consistency in Database Applications: for any failure in any call, it becomes hard to reason about what effects did or did not already happen
  • 46. Applying the Stream Processing Approach to Applications (diagram: three Apps)
  • 48. Stream Processing and F-a-a-S (λ): simplicity / generality; state management; composability; lightweight resources; performance; event-driven. Can we combine some of these properties?
  • 49. Stateful Functions: event ingress → f(a,b) functions → event egress; snapshot state to mass storage (S3, GCF, ECS, HDFS, …)
  • 50. Stateful Functions compared to Stream Proc. & Apache Flink: Apache Flink DataStream/Table vs. Stateful Functions (f(a,b)) on a Pool of Resources (Apache Flink Cluster). Solves two major challenges: arbitrary function-to-function messaging, not restricted to a DAG; functions are multiplexed and share resources, which makes it possible to run many very small jobs
  • 51. Example: Ride Sharing App. Inputs: Driver status updates; Passenger ride requests; Ride status update. Functions: Driver, Ride, Passenger, Geo-index. Diagram labels: update, create bill, inform / book, bid, lookup, update cell; seeking, confirmed, riding, free, bidding, booked
  • 52. Stream Processing / Streaming SQL (event/stream-centric): data preparation; combining knowledge/information; filtering, enriching, aggregating, joining events. Stateful Functions (f(a,b), state-centric): coordination, (interacting) state machines; complex event/state interactions. F-a-a-S (λ, stateless / compute-centric): “occasional” actions or spiky loads; compute-intensive or blocking
  • 53. Putting it all together. FaaS (λ): render map/route image; create a receipt PDF; send email. Stateful Functions (f(a,b)): ride life-cycle; driver-to-ride matching. Stream Processing: traffic models; demand forecast & pricing. Billing; Passenger updates; Driver position updates; Driver status updates
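
As a rough illustration of the hourly tumbling-window query shown on slides 34 and 36, the following plain-Scala sketch computes the same per-room averages over an in-memory collection. It does not use Flink and is not how Flink executes the query; `Reading`, `windowEnd`, and `tumblingAverages` are hypothetical names, and timestamps are assumed to be non-negative epoch milliseconds.

```scala
// Plain-Scala sketch (no Flink dependency) of an hourly tumbling-window average.
// All names here are hypothetical and exist only for illustration.
object TumblingWindowSketch {
  final case class Reading(room: String, rowtimeMillis: Long, temperature: Double)

  val HourMillis: Long = 60L * 60L * 1000L

  // Exclusive end of the hourly window [start, end) containing `ts`.
  // Assumes ts >= 0 (integer division truncates toward zero).
  def windowEnd(ts: Long): Long = (ts / HourMillis) * HourMillis + HourMillis

  // Group by (room, window end) and average the temperatures, mirroring
  // GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), room.
  def tumblingAverages(readings: Seq[Reading]): Map[(String, Long), Double] =
    readings
      .groupBy(r => (r.room, windowEnd(r.rowtimeMillis)))
      .map { case (key, rs) => key -> rs.map(_.temperature).sum / rs.size }

  def main(args: Array[String]): Unit = {
    val minute = 60L * 1000L
    val readings = Seq(
      Reading("kitchen", 0L, 20.0),
      Reading("kitchen", 30L * minute, 22.0), // same hour as the first reading
      Reading("kitchen", 90L * minute, 25.0), // falls into the next hour
      Reading("lab", 10L * minute, 18.0)
    )
    // Print one line per (room, window end) pair.
    tumblingAverages(readings).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Because the grouping key is only (room, window end), the same function applies to a bounded batch and to any finite prefix of a stream, which is the point the batch and streaming SQL slides make with an identical query.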
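
The temporal join on slide 40 pairs each taxi fare with the exchange rate that was valid at the fare's time. The sketch below imitates that lookup in plain Scala over in-memory data, under the assumption that the rate history is a list of timestamped updates per currency. `RateUpdate`, `Fare`, `rateAt`, and `convert` are hypothetical names; none of this is Flink API.

```scala
// Plain-Scala sketch (no Flink dependency) of a temporal join lookup.
// All names here are hypothetical and exist only for illustration.
object TemporalJoinSketch {
  final case class RateUpdate(currency: String, validFrom: Long, rate: Double)
  final case class Fare(time: Long, currency: String, price: Double)

  // Rate in effect for `currency` at `time`: the latest update with validFrom <= time.
  def rateAt(history: Seq[RateUpdate], currency: String, time: Long): Option[Double] =
    history
      .filter(u => u.currency == currency && u.validFrom <= time)
      .sortBy(_.validFrom)
      .lastOption
      .map(_.rate)

  // Convert each fare with the rate valid at the fare's time.
  // Fares without a matching rate are dropped, as in an inner join.
  def convert(fares: Seq[Fare], history: Seq[RateUpdate]): Seq[(Long, Double)] =
    fares.flatMap { f =>
      rateAt(history, f.currency, f.time).map(rate => (f.time, f.price * rate))
    }

  def main(args: Array[String]): Unit = {
    val history = Seq(RateUpdate("EUR", 0L, 2.0), RateUpdate("EUR", 100L, 3.0))
    val fares = Seq(Fare(50L, "EUR", 10.0), Fare(150L, "EUR", 10.0), Fare(60L, "USD", 5.0))
    // The USD fare has no rate history and is therefore absent from the output.
    convert(fares, history).foreach(println)
  }
}
```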