Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
Keynote
Event-Driven Applications Done Right
Sijie Guo
Co-Founder & CEO • StreamNative
Matteo Merli
CTO • StreamNative
Sijie Guo
Co-Founder & CEO
StreamNative @sijieg @sijie
● Co-Founder & CEO, StreamNative
● PMC Member of Apache Pulsar
● PMC Chair of Apache BookKeeper
● Ex Co-Founder, Streamlio
● Ex Twitter, Yahoo!
HUGE GROWTH IN THE COMMUNITY
● Bar Chart Race - Pull Requests
● Top-10 PRs in messaging and streaming
● Pulsar ranked as a Top 5 ASF project
● 560+ Contributors
● 10,000+ Commits
● 7,000+ Slack Members
● 1,000+ Organizations using Pulsar
● Pulsar has already been deployed by thousands of companies across the globe
New Platform Around
Messaging and Streaming
SAY WHAT YOU MEAN
Apache Pulsar
Applications (Application Domain)
● Payment service
● Inventory management
● Personal recommendations
● Shipment tracking/alerting
● Fleet management
● Customer communication
● Dynamic pricing
● Geofencing
Pipelines (Data Domain)
● Real-time analytics
● Customer 360
● Streaming ETL
● Cloud Lakehouse Ingestion
● Change data capture
● Complex event processing
● Log aggregation
● ML Pipelines
The Evolution of Microservices
● Applications are built with HTTP, REST, or some other protocol made from requests and replies (Request / Reply)
● This works well when ecosystems are small …
● But gets harder as they grow more complex and more interconnected
● Services are tightly coupled
● So if one service fails …
● … or even just runs slowly
● The fall-out could be much larger
● Others end up feeling that pain
● At company scale, the majority of processes are asynchronous to one another (Billing, Inventory, Fulfillment, Fraud)
● So it makes sense to DECOUPLE services from one another
● It evolves towards event-driven architecture (Billing, Inventory, Fulfillment, Fraud on a MESSAGING & STREAMING PLATFORM)
● Messaging & Streaming Technologies Old and New
1. Data abstraction: What is the right data abstraction?
2. API: What is the right API to provide? Is it just Streams?
3. Primitives: What are the primitives to provide?
4. Processing Semantics: How to meet business needs?
5. Tools: What are the tools to offer?
Matteo Merli
CTO
StreamNative @merlimat @merlimat
● CTO, StreamNative
● Co-creator and PMC Chair of Apache Pulsar
● Ex Co-Founder, Streamlio
● Ex Splunk, Yahoo!
How event-driven applications have evolved
Example of a complex event-driven application
5 FUNDAMENTALS OF
Modern
Event-Driven
Applications
1. Data abstraction
2. API
3. Primitives
4. Processing Semantics
5. Tools
5 FUNDAMENTALS OF
Modern
Event-Driven
Applications
1. Topics
2. Stream, Queue, and Table
3. Shared vs Exclusive
4. Processing Semantics
5. 3 Layers of Abstraction
Topics - PUBLISH/SUBSCRIBE, IN THE LARGE
● “Asynchronously broadcast messages to a set of consumers”
● Decouples producers from consumers
● Subscriber state is managed by the system
Topics - Replayable Streams
● Infinite Retention
● Source of truth
● Can be read from any position
● Events can be stored:
○ By row
○ Columnar format → Lakehouse Integration
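As a rough illustration of the two topic slides, here is a minimal in-memory sketch (plain Python, not the Pulsar client). It assumes a topic can be modeled as an append-only log with one cursor per subscription; the `Topic` class and its `publish`, `receive`, and `read_from` methods are hypothetical names chosen for this example.

```python
from collections import defaultdict


class Topic:
    """In-memory stand-in for a topic: an append-only log plus per-subscription cursors."""

    def __init__(self):
        self.log = []                      # retained events, never deleted in this sketch
        self.cursors = defaultdict(int)    # subscription name -> next position to read

    def publish(self, event):
        """Append an event and return the position it was stored at."""
        self.log.append(event)
        return len(self.log) - 1

    def receive(self, subscription):
        """Return the next unread event for this subscription, or None if caught up."""
        pos = self.cursors[subscription]
        if pos >= len(self.log):
            return None
        self.cursors[subscription] = pos + 1   # the "system" tracks subscriber state
        return self.log[pos]

    def read_from(self, position):
        """Replay: read every retained event starting at any position."""
        return list(self.log[position:])


topic = Topic()
for event in ["order-created", "order-paid", "order-shipped"]:
    topic.publish(event)

# Two independent subscriptions each see the full stream at their own pace.
billing_first = topic.receive("billing")
fraud_first = topic.receive("fraud")
billing_second = topic.receive("billing")
replayed = topic.read_from(1)
```

The producer never learns who is subscribed, and each subscription's position lives in the topic rather than in the consumer, which is the decoupling the slides describe.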
Stream, Queue, and Table - Stream
● Unbounded sequence of events
● Ordered
● Replayable
● Partitioned for scalability
Stream, Queue, and Table - Queue
● Queue is the fundamental “decoupling” mechanism
● The “Queue” is the unit to distribute work across a set of workers
● Multiple routing strategies:
○ Round-robin
○ Per-key ordering
● Individual message tracking
● Delivery delay
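The two routing strategies named on the Queue slide can be sketched in a few lines of plain Python. This is an illustration only: the function names are hypothetical, and hashing the key with CRC32 is an assumption made for the example, not a statement about how Pulsar picks a consumer.

```python
import zlib
from itertools import cycle


def route_round_robin(messages, workers):
    """Spread messages evenly across workers, ignoring keys."""
    assignment = {w: [] for w in workers}
    for worker, msg in zip(cycle(workers), messages):
        assignment[worker].append(msg)
    return assignment


def route_by_key(messages, workers):
    """Send every message with the same key to the same worker, preserving per-key order."""
    assignment = {w: [] for w in workers}
    for key, value in messages:
        # Assumption for this sketch: a stable hash of the key selects the worker.
        index = zlib.crc32(key.encode("utf-8")) % len(workers)
        assignment[workers[index]].append((key, value))
    return assignment


messages = [("user-a", 1), ("user-b", 1), ("user-a", 2), ("user-b", 2), ("user-a", 3)]
workers = ["w0", "w1"]

rr = route_round_robin(messages, workers)
keyed = route_by_key(messages, workers)
```

Round-robin balances load but lets events for one key land on different workers; per-key routing gives up some balance so that each key is processed in order by a single worker.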
Stream, Queue, and Table - Table
● View the data in the topic as a “table”
● Access last value for each message key
● Can be used for:
○ Sharing state between processes
○ Distributed cache with local reads
● Integrated with transactions
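The "last value for each message key" idea on the Table slide amounts to folding a keyed stream into a dictionary. The sketch below is plain Python; `materialize` is a hypothetical helper, and treating a `None` value as a delete is an assumption made for the example.

```python
def materialize(events):
    """Fold a keyed stream into a table holding the latest value per key."""
    table = {}
    for key, value in events:
        if value is None:
            table.pop(key, None)   # assumption: a None value deletes the key
        else:
            table[key] = value     # later events overwrite earlier ones
    return table


events = [
    ("sku-1", {"stock": 10}),
    ("sku-2", {"stock": 4}),
    ("sku-1", {"stock": 9}),
    ("sku-2", None),
]
table = materialize(events)
```

Every process that replays the same topic builds the same table, which is what makes it usable as shared state or as a cache with local reads.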
Shared vs Exclusive
Three combinations:
1. Shared Producers, Shared Subscribers
2. Shared Producers, Exclusive Subscribers
3. Exclusive Producers, Exclusive Subscribers
● Exclusive access for producers:
○ Allows managing the producer epoch
○ Allows implementing leader election
● For consumers the choice is:
○ Shared subscribers → Distribute work
○ Exclusive subscribers → Receive all messages in order
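One way to picture how exclusive producer access leads to leader election and epoch management is the toy model below. It is plain Python and does not use the Pulsar client; `ExclusiveTopic` and its methods are hypothetical, and the rule that the epoch increases by one on every handover is an assumption of the sketch.

```python
class ExclusiveTopic:
    """Toy topic that admits one producer at a time and bumps an epoch on each handover."""

    def __init__(self):
        self.owner = None
        self.epoch = 0

    def try_acquire(self, name):
        """Return the new epoch if `name` became the exclusive producer, else None."""
        if self.owner is not None:
            return None
        self.owner = name
        self.epoch += 1
        return self.epoch

    def release(self, name):
        """Give up ownership; only the current owner can release."""
        if self.owner == name:
            self.owner = None

    def send(self, name, epoch, message):
        """Accept a write only from the current owner at the current epoch."""
        return self.owner == name and self.epoch == epoch


topic = ExclusiveTopic()
epoch_a = topic.try_acquire("node-a")    # node-a becomes leader
epoch_b = topic.try_acquire("node-b")    # rejected while node-a holds the topic
topic.release("node-a")
epoch_b2 = topic.try_acquire("node-b")   # node-b takes over with a higher epoch
stale_write = topic.send("node-a", epoch_a, "late update")
fresh_write = topic.send("node-b", epoch_b2, "new update")
```

Whoever holds the exclusive producer slot is the leader, and the epoch lets the topic reject writes from a former leader that has not yet noticed it lost the slot.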
Semantics
AT MOST ONCE
● Message pulled once
● May or may not be received
● No duplicates
● Possible missing data
AT LEAST ONCE
● Message pulled once or more times; processed each time
● Receipt guaranteed
● No missing data
● Duplicates are possible
EXACTLY ONCE
● Message pulled once or more times; processed ONLY once
● Receipt guaranteed
● No duplicates
● No missing data
● PIP-30: Pulsar Transactions
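The difference between the three guarantees can be seen with a small simulation in plain Python. Note the assumptions: lost and redelivered messages are modeled with explicit ID sets, and the "processed only once" case is approximated by deduplicating on message ID in the consumer, which is a stand-in technique for illustration and not the transaction mechanism of PIP-30. All function names are hypothetical.

```python
def deliver(messages, redeliver_ids):
    """Yield (id, amount) pairs, repeating those whose acknowledgement was 'lost'."""
    for msg_id, amount in messages:
        yield msg_id, amount
        if msg_id in redeliver_ids:
            yield msg_id, amount       # redelivery after a missed acknowledgement


def process_at_most_once(messages, lost_ids):
    """Each message is pulled once; a lost message is never seen again."""
    return sum(amount for msg_id, amount in messages if msg_id not in lost_ids)


def process_at_least_once(deliveries):
    """Every pull is processed, so a redelivery is counted twice."""
    return sum(amount for _, amount in deliveries)


def process_only_once(deliveries):
    """Pulls may repeat, but the effect of each message ID is applied a single time."""
    seen, total = set(), 0
    for msg_id, amount in deliveries:
        if msg_id in seen:
            continue                   # duplicate pull, effect already applied
        seen.add(msg_id)
        total += amount
    return total


payments = [(1, 100), (2, 250), (3, 50)]   # true total is 400
at_most_once_total = process_at_most_once(payments, lost_ids={2})
at_least_once_total = process_at_least_once(deliver(payments, {2}))
only_once_total = process_only_once(deliver(payments, {2}))
```

With message 2 affected in each case, at-most-once undercounts, at-least-once overcounts, and only the third variant recovers the true total.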
Three Layers of Abstraction - Client
● A simple, high-performance, language-agnostic TCP protocol
● The protocol is versioned and maintains backward compatibility with older versions
● Layers of abstraction build upon simple client operations
Three Layers of Abstraction - Client - REST
● admin requests: pulsar-admin, pulsarctl, API
● data requests: PIP-64: Introduce REST endpoints for producing, consuming, and reading messages
Three Layers of Abstraction - Functions - Serverless Event Processing
Diagram: input topic 1, input topic 2, …, input topic N → output topic 1, output topic 2
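To make the input-topics-to-output-topics picture concrete, here is a sketch of a function written as a plain Python `process(input)` callable, the shape Pulsar Functions accepts for simple Python functions. The event format, the topic names, and the `run_locally` harness are hypothetical, and the harness assumes that returning `None` means no output message is produced.

```python
def process(input):
    """Keep only payment events and normalize them; other events produce no output."""
    if not input.startswith("payment:"):
        return None
    return input.upper()


def run_locally(function, input_topics):
    """Hypothetical local harness: feed events from several input topics through the function."""
    output = []
    for events in input_topics.values():
        for event in events:
            result = function(event)
            if result is not None:     # assumption: None means "publish nothing"
                output.append(result)
    return output


inputs = {
    "input-topic-1": ["payment:42", "login:alice"],
    "input-topic-2": ["payment:7"],
}
results = run_locally(process, inputs)
```

The function itself contains only business logic; subscribing to the inputs and publishing to the outputs is left to the runtime, which is where the reduction in complexity comes from.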
Three Layers of Abstraction
● PULSAR CLIENT
● PULSAR FUNCTIONS
● SQL
Trade-off across the layers: “Less complexity, Less flexibility” vs “More flexibility, More complexity”
Three Layers of Abstraction - Introducing (pf)SQL - Pulsar Functions made easy
● Designed for Pulsar Functions
● Filtering / Transformation / Routing
● Transformations for Pulsar IO connectors
Related session: “Simplify Pulsar Functions Development with SQL”, Neng Lu, Platform Engineering Lead, StreamNative, 3:20 PM - 3:50 PM
5 FUNDAMENTALS OF
Modern
Event-Driven
Applications
1. Topics
2. Stream, Queue, and Table
3. Shared vs Exclusive
4. Semantics
5. 3 Layers of Abstraction
Concluding
● Complex event-driven applications require a wide range of abstractions, APIs, and semantics for each specific problem
● Pulsar is the system that solves all these problems in the most natural and comprehensive way
Thank you!
