Stream Processing Advanced
with Flink and Pulsar
Till Rohrmann, Engineering Lead at Ververica, @stsffap
Addison Higham, Chief Architect at StreamNative, @addisonjh
Why is Stream Processing Important?
Spectrum of data streaming applications
(offline → real time)
● Data warehousing, OLAP / BI / reporting
● Machine learning & model training
● Continuous ETL
● Unified offline / real-time analytics
● Continuous monitoring (position, risk, …)
● Real-time behaviour modeling (pricing, recommenders, …)
● Real-time alerts (fraud, security, …)
● Real-time ML model training/evaluation
● Distributed OLTP applications
Choice of data architecture defines which use cases you can cover!
Modern Streaming Data Architecture
[Architecture diagram: Event Producers feed Stream Storage (backed by Long Term Storage); Stream Processing jobs consume it and produce Results / Views and Triggered Applications]
Reference Streaming Data Architecture: Flink + Pulsar
[Architecture diagram: the same pipeline with Apache Pulsar providing Stream Storage and Long Term Storage, and Apache Flink (including Stateful Functions) processing events from Event Producers into Results / Views]
Apache Flink: Analytics and Applications on Streaming Data
Flink Runtime: Stateful Computations over Data Streams
● Stateful Stream Processing (Streams, State, Time)
● Streaming Analytics (SQL and Tables)
● Event-driven Applications (Stateful Functions)
Apache Flink in Numbers
● Contributors: 887
● Most active Apache mailing list
● #2 most active Apache source repository by visits and commits
● Github stars: 16.4k
● Commits: >27k
● Releases: 1.13 latest major release
● Maven downloads per month: ~ 170k
● LOC: > 1.8 million
● Latency: < 1s
● Throughput: 4 billion events/s, 7 TB/s
● Input size: 100 TB for batch job
Some Apache Flink Users
One of The Most Advanced Stream Processors
● First class support for state
○ Asynchronous barrier checkpointing algorithm to create globally consistent checkpoints (a minimal sketch follows this list)
● Event-time support
○ Correctness under delayed events
● End-to-end exactly once processing guarantees
○ Correctness under faults
● Resource elastic
○ A Flink application’s resources can be adjusted to its actual needs
● Unified batch-streaming APIs
○ A single query to process historical as well as live data
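To make the checkpointing bullet concrete, here is a minimal DataStream sketch that turns it on; the 10-second interval and the toy pipeline are illustrative only:

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Draw a globally consistent checkpoint every 10 seconds via Flink's
        // asynchronous barrier snapshotting; EXACTLY_ONCE is the default mode.
        env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);

        // Toy pipeline so the sketch is runnable end to end.
        env.fromElements(1, 2, 3).print();
        env.execute("checkpointing-sketch");
    }
}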
Everything Is a Stream
● Stream: a sequence of data that is made available over time
● All computation processes chunks of data over time, producing results over time → stream processing
○ E.g. reading data from disk is done in a streaming fashion
● Events can take various forms
● The decisive difference: is my stream bounded or not?
SQL / Table API: Unified Batch & Stream Processing
SQL query → batch query execution

SELECT
  room,
  TUMBLE_END(rowtime, INTERVAL '1' HOUR),
  AVG(temperature)
FROM sensors
GROUP BY
  TUMBLE(rowtime, INTERVAL '1' HOUR),
  room
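As a rough sketch of what batch query execution looks like in code, the query can be submitted to a TableEnvironment running in batch mode. This assumes a bounded sensors table has already been registered (connector options are omitted and not part of the slide):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BatchSensorQuery {
    public static void main(String[] args) {
        // Batch execution mode: the bounded `sensors` table is read once and the
        // hourly averages per room are emitted as a single final result.
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .inBatchMode()
                .build();
        TableEnvironment tEnv = TableEnvironment.create(settings);

        // Assumes `sensors` has been registered beforehand (connector details omitted).
        tEnv.executeSql(
                "SELECT room, TUMBLE_END(rowtime, INTERVAL '1' HOUR), AVG(temperature) " +
                "FROM sensors " +
                "GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), room")
            .print();
    }
}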
Stream-Table Duality
Table snapshot (earlier):
  User   lastLogin
  Alice  2021-01-01
  Bob    2021-01-02

Table snapshot (later):
  User   lastLogin
  Alice  2021-01-14
  Bob    2021-02-01
  Eve    2021-01-21

Changelog stream:
  Alice, 2021-01-01
  Bob, 2021-01-02
  Alice, 2021-01-14
  Eve, 2021-01-21
  Bob, 2021-02-01
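A small sketch of this duality in the Table API, using the hypothetical login changelog above; the fromChangelogStream/upsert pattern follows the Flink 1.13 documentation:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Schema;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.types.Row;
import org.apache.flink.types.RowKind;

public class StreamTableDuality {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // The changelog stream from the slide: inserts plus later per-user updates.
        DataStream<Row> changelog = env.fromElements(
                Row.ofKind(RowKind.INSERT,       "Alice", "2021-01-01"),
                Row.ofKind(RowKind.INSERT,       "Bob",   "2021-01-02"),
                Row.ofKind(RowKind.UPDATE_AFTER, "Alice", "2021-01-14"),
                Row.ofKind(RowKind.INSERT,       "Eve",   "2021-01-21"),
                Row.ofKind(RowKind.UPDATE_AFTER, "Bob",   "2021-02-01"))
            .returns(Types.ROW(Types.STRING, Types.STRING));

        // Interpreting the stream as an upserting table keyed on the user column
        // yields the later table snapshot once all events have been applied.
        Table users = tEnv.fromChangelogStream(
                changelog,
                Schema.newBuilder().primaryKey("f0").build(),
                ChangelogMode.upsert());

        // And the table can be turned back into a changelog stream.
        tEnv.toChangelogStream(users).print();
        env.execute();
    }
}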
SQL / Table API: Running The Same Query On Streams
SQL query → incremental query execution (interpret the stream as a table)

SELECT
  room,
  TUMBLE_END(rowtime, INTERVAL '1' HOUR),
  AVG(temperature)
FROM sensors
GROUP BY
  TUMBLE(rowtime, INTERVAL '1' HOUR),
  room
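Only the environment mode changes for incremental execution; a sketch under the same assumptions as the batch example above:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class StreamingSensorQuery {
    public static void main(String[] args) {
        // Streaming (incremental) execution: the identical query now runs continuously,
        // emitting one result row per room whenever an hourly window closes.
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .inStreamingMode()
                .build();
        TableEnvironment tEnv = TableEnvironment.create(settings);

        // `sensors` is assumed to be registered on an unbounded source (e.g. a Pulsar topic).
        tEnv.executeSql(
                "SELECT room, TUMBLE_END(rowtime, INTERVAL '1' HOUR), AVG(temperature) " +
                "FROM sensors " +
                "GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), room")
            .print();
    }
}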
Data Has to Come From Somewhere
● Flink is a powerful compute engine for unified batch and stream processing that is pushing the boundaries of data processing
● For Flink to really shine, it needs a storage system that supports efficient stream and batch ingestion

Pulsar offers exactly that!

● Apache Pulsar is evolving to meet the needs of a system like Flink, offering both a low-latency, high-bandwidth streaming API and a flexible architecture that supports batch access
Why is Pulsar a Great Fit For Streaming And Batch?
Streaming
● Like other streaming systems, Pulsar can be used as a distributed scale-out log, providing consumer-controlled offsets and high throughput through parallel topic partitions

Batch
● In addition, Pulsar’s segment-based, multi-layer architecture lets applications access historical data more directly via the underlying storage (see the reader sketch after the diagram)
[Diagram: producers and consumers connect to brokers (Broker 1-3) serving the partitions of Topic1; each partition is stored as a sequence of segments replicated across bookies (Bookie 1-3), with older segments offloaded to long-term storage]
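As a sketch of the consumer-controlled access described above, a Pulsar Reader can position itself at the earliest retained message and replay history, including offloaded segments; the service URL and topic name below are placeholders:

import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;
import org.apache.pulsar.client.api.Schema;

public class HistoricalRead {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder service URL
                .build();

        // A Reader gives the application explicit control over its position in the topic;
        // starting at MessageId.earliest replays everything still retained, including
        // segments that have been offloaded to long-term storage.
        Reader<String> reader = client.newReader(Schema.STRING)
                .topic("persistent://public/default/sensors")  // placeholder topic
                .startMessageId(MessageId.earliest)
                .create();

        while (reader.hasMessageAvailable()) {
            System.out.println(reader.readNext().getValue());
        }

        reader.close();
        client.close();
    }
}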
Flink + Pulsar
Adoption at Scale
While Flink and Pulsar are both widely used independently, together they offer a
strong technology for unified batch and stream storage and compute
Many large organizations are adopting Flink + Pulsar for complex workloads that
can take advantage of the strengths of both systems
State of The Pulsar-Flink Connector Today
● The Pulsar-Flink connector supports Flink 1.11 and 1.12, with 1.13 support coming soon (a minimal usage sketch follows the link below)
● Exactly-once source and sink via producer deduplication
● Full support for Flink schemas with integration into the Pulsar Schema Registry
● Full Flink SQL support for batch and streaming modes
○ Also supports an “upsert” mode that can reason about inserts, updates, and deletes
● Support for KeyShared subscriptions for higher source parallelism than the number of topic partitions
● https://github.com/streamnative/pulsar-flink
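A rough usage sketch of the connector’s DataStream source. The constructor arguments, import paths, and the PulsarDeserializationSchema.valueOnly helper are assumptions based on the connector’s README at the time, so verify them against the release you actually use:

// NOTE: package names and the constructor signature below follow the connector
// README and may differ between connector versions.
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.pulsar.FlinkPulsarSource;
import org.apache.flink.streaming.util.serialization.PulsarDeserializationSchema;

public class PulsarFlinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("topic", "persistent://public/default/sensors"); // placeholder topic

        // Service and admin URLs are placeholders for a local broker.
        FlinkPulsarSource<String> source = new FlinkPulsarSource<>(
                "pulsar://localhost:6650",
                "http://localhost:8080",
                PulsarDeserializationSchema.valueOnly(new SimpleStringSchema()),
                props);

        env.addSource(source)
           .print();

        env.execute("pulsar-flink-connector-sketch");
    }
}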
Demo
The Road Ahead
Many in-progress features are being developed
● Pulsar-Flink connector being upstreamed into Flink
○ Targeted for Flink 1.14
● Pulsar Transaction Support
○ Pulsar transactions allow for stronger exactly-once processing guarantees
● Native Pulsar Watermarking
○ Pulsar is able to broker watermarks between producers and consumers
● Parallel Batch Source
○ Query multiple segments in parallel for higher throughput
● Unified Batch + Streaming Source
○ Use Pulsar’s batch mode for catch-up, then switch over to streaming ingestion
Flink Support For Pulsar Transactions
Pulsar transactions are GA in Pulsar 2.8.0
Support in Flink is nearing completion, which will allow for end-to-end exactly-once processing guarantees!
Learn more about transactions at the talk Exactly-Once Made Easy: Transactional Messaging in
Apache Pulsar at 11:30 AM
[Diagram: several Flink jobs (Job 1-3) chained through Pulsar topics, coordinated by a Pulsar transaction (tx)]
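To make the guarantee concrete, here is a minimal consume-transform-produce sketch against the raw Pulsar 2.8 transaction API (not the in-progress Flink integration). The URLs, topic names, and subscription name are placeholders, and the broker must run with the transaction coordinator enabled:

import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.transaction.Transaction;

public class TxnSketch {
    public static void main(String[] args) throws Exception {
        // Requires transactionCoordinatorEnabled=true on the broker (Pulsar >= 2.8.0).
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")         // placeholder URL
                .enableTransaction(true)
                .build();

        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/input")    // placeholder topics
                .subscriptionName("txn-demo")
                .subscribe();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://public/default/output")
                .sendTimeout(0, TimeUnit.SECONDS)               // required for transactional sends
                .create();

        Transaction txn = client.newTransaction()
                .withTransactionTimeout(5, TimeUnit.MINUTES)
                .build()
                .get();

        // Consume, transform and produce atomically: the output message and the
        // acknowledgement of the input only become visible when the txn commits.
        Message<String> msg = consumer.receive();
        producer.newMessage(txn).value(msg.getValue().toUpperCase()).send();
        consumer.acknowledgeAsync(msg.getMessageId(), txn);
        txn.commit().get();

        producer.close();
        consumer.close();
        client.close();
    }
}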
The Importance of Watermarks
High-quality watermarks are crucial for correct and stable stream processing jobs.

For results to be correct, we need to take “Event Time” into account rather than relying solely on “Processing Time”. This is especially important when replaying or processing older streaming data.

“Watermarks” advance the event-time clock; if the clock does not “tick” accurately, the results will not be correct.
[Plot: processing time vs. event time, showing the ideal watermark, the actual watermark, and the skew between them]
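Today this event-time clock is usually driven on the consumer side, for example with a bounded-out-of-orderness watermark strategy in Flink; the record type and the five-second bound below are illustrative:

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EventTimeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical (room, epoch-millis timestamp) readings.
        DataStream<Tuple2<String, Long>> readings = env.fromElements(
                Tuple2.of("roomA", 1_620_000_000_000L),
                Tuple2.of("roomB", 1_620_000_004_000L));

        // The watermark generator asserts "no event older than (max seen - 5s) will arrive";
        // if that assertion is wrong, late data is dropped and results become incorrect.
        DataStream<Tuple2<String, Long>> withEventTime = readings.assignTimestampsAndWatermarks(
                WatermarkStrategy.<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((event, previousTimestamp) -> event.f1));

        withEventTime.print();
        env.execute("event-time-sketch");
    }
}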
Native Pulsar Watermarking
Pulsar is adding support for watermarks
● Producers get a new API for injecting watermarks into the stream, with multiple producers potentially producing watermarks for the same topic
● Watermarks are broadcast across all
partitions, which allows the broker to
dispatch more accurate watermarks to
consumers, in both realtime and
historical scenarios
● The API is designed to be simple to
integrate with Flink, removing the
complexity of accurate watermark
generation from the developer
// Pulsar producer: attach event time to messages (proposed watermark API)
for (int i = 0; i < NUM_MESSAGES; i++) {
    producer.newMessage()
        .value("hello with event time! " + i)
        .eventTime(System.currentTimeMillis())
        .sendAsync();
}

producer.newWatermark()
    .eventTime(System.currentTimeMillis())
    .sendAsync();

// Flink: consume Pulsar-brokered watermarks (proposed API)
FlinkPulsarSource<String> pulsarSource = new FlinkPulsarSource<>(topic, ...);
pulsarSource.assignTimestampsAndWatermarks(
    PulsarWatermarkStrategy.forPulsarWatermarks());
Shared Event-Time Domain
End-to-End Watermarks
[Diagram: a producer and three chained Flink jobs (Job 1-3) sharing one event-time domain, with watermarks (W|4, W|8, W|12, W|17, W|24) propagating end-to-end through Pulsar topics]
Pulsar + Flink and StreamNative + Ververica
Pulsar + Flink community collaboration
● Contributing Pulsar-Flink connector to Flink repository
● Evolve connector to support more advanced Pulsar and Flink features
● Build the best-in-class open source stream processing platform
StreamNative + Ververica Cloud partnership
● Help customers unlock the full potential of stream processing
Thanks a lot for your attention!
Questions?
Editor's Notes
  1. Faster results → faster business decisions → competitive advantage → more money
  2. Stream processing as a generalization of batch processing: processing of historical data, processing of live data, and processing of historical and live data together
  3. Brings together the worlds of data analytics and event-driven applications, enabling powerful applications that benefit from real-time analytics (e.g. fraud detection, match-making, etc.)
  4. Example users and use cases: real-time search and recommendation models (e.g. Alibaba), real-time session behavior profiles of users (e.g. Netflix), real-time trade settlement dashboards (e.g. UBS), real-time revenue accounting (various AdTechs), machine-learning-based anomaly/fraud detection (e.g. ING, Microsoft), real-time data refinement and data pipelines (many), real-time analytics platforms (e.g. Alibaba, Uber, Lyft, Yelp!, Tencent), materializing views (dashboards, data marts), ETL (batch and continuous), and machine learning training (Alibaba, new ML library). Scale figures: sub-second latency, peak throughput of 4B events/s and 7 TB/s, largest batch job input size of 100 TB, Flink cluster size of 30K.
  5. If everything is a stream, can it be handled with a single API? The Table API, SQL, and DataStream can handle bounded (batch) and unbounded streams: write the program once and run it on live and historic data, lowering development and maintenance costs. Having a single engine for batch and stream processing means batch processing is simply what you do when catching up with the stream: one engine runs batch and streaming workloads, one scheduler schedules bounded and unbounded tasks, and the bounded-stream property can be exploited for runtime optimizations (more batching of results, out-of-order processing, stage-wise execution).
  6. In this section, I will give a brief overview of watermarks and discuss correctness and skew a bit
  7. Watermarks are assertions about the progression of event time, according to the data producer. Pulsar is brokering the assertions from producer to consumer. The quality of the watermarks (and hence the job output) is determined by the producer; system internals (partitioning, I/O performance, etc) are effectively factored out.