What can Apache Pulsar do for FinTech?
streamnative.io
Tim Spann
Developer Advocate
StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems & Data Architecture Expert
● Experience:
○ 15+ years of experience with streaming technologies including Apache
Pulsar, Apache Flink, Apache Spark, Apache NiFi, Big Data, Cloud,
Trino, Aerospike, IoT and more.
John Kinson
Head of Sales, EMEA
StreamNative
● Startup, Scale-up and Large Enterprise expert
● Building the StreamNative Sales function in EMEA
● Experience:
○ 25+ years of building and selling distributed and embedded systems in
the telecoms, digital media and cloud enterprise software industries
Agenda
01 Welcome
02 Introduction to Messaging + Data Streaming
03 Introduction to Apache Pulsar
04 Why Open Source
05 Resources
06 Q&A
➔ Asynchronous messages triggered by
events
➔ Consuming messages regardless of
Language, System, Sender
➔ Queueing
➔ Routing
➔ Work Queues
➔ JPMorgan Chase AMQP
MESSAGING
➔ Perform in Real-Time
➔ Process Events as They Happen
➔ Joining Streams with SQL
➔ Find Anomalies Immediately
➔ Ordering and Arrival Semantics
➔ Continuous Streams of Data
DATA STREAMING
Accessing historical as well as
real-time data
Pub/sub model enables event streams
to be sent from multiple producers,
and consumed by multiple consumers
To process large amounts of data in a
highly scalable way
When is Messaging and
Streaming used?
Industry trends
Banking: transforming from siloed systems to combined data streams
Insurance: providing faster claim processing, fraud detection and system integration
IoT: handling huge volumes of data from sensors
Apache Pulsar is a Cloud-Native Messaging
and Event-Streaming Platform.
Messaging
Ideal for work queues that do not
require tasks to be performed in a
particular order—for example,
sending one email message to many
recipients.
RabbitMQ and Amazon SQS are
examples of popular queue-based
message systems.
Pulsar: Unified Messaging + Data Streaming
.. and Streaming
Works best in situations where the
order of messages is important—for
example, data ingestion.
Kafka and Amazon Kinesis are
examples of messaging systems that
use streaming semantics for
consuming messages.
Unified Messaging and Streaming
[Diagram labels: StreamNative Hub; StreamNative Cloud; Unified Batch and Stream COMPUTING (Batch + Stream); Unified Batch and Stream STORAGE (Queuing + Streaming) with Offload to Tiered Storage; Protocols: Pulsar, KoP, MoP, WebSocket; Pulsar Sink; Streaming Edge Gateway; CDC; Apps]
Pulsar Benefits
Building Microservices
Asynchronous Communication
Building Real Time Applications
Highly Resilient
Tiered storage
Pulsar Global Adoption
Using Pulsar with Fintech
Low latency
Geo-replication
Data integrity
High availability
Durability
Multi-tenancy
Multiple data consumers: transactions, payment processing, alerts, analytics, KYC, fraud detection with ML & AI
Large data volumes, high scalability
Financial event messaging
Many topics, producers, consumers
Why Open Source Pulsar?
Sijie Guo
ASF Member
Pulsar/BookKeeper PMC
Founder and CEO
Jia Zhai
Pulsar/BookKeeper PMC
Co-Founder
Matteo Merli
ASF Member
Pulsar/BookKeeper PMC
CTO
OUR BETS AND EARLY DECISIONS
● We would get many benefits from an open source model
○ Other companies would help develop the product
○ Better security, code escrow, longevity
● We would keep the core features in the OSS version
● We could build commercial offerings and services around the core product
C/OSS Model
Benefits: many developers; security, longevity, escrow
Challenges: why pay?; multiple roadmaps
RESOURCES
Here are resources to continue your journey
with Apache Pulsar
Now available: on-demand Pulsar training at Academy.StreamNative.io
[On-Demand Video] Introduction to Pulsar
Free ebook: Apache Pulsar in Action
Q&A
Tim Spann, Developer Advocate
@PaaSDev
linkedin.com/in/timothyspann
github.com/tspannhw
John Kinson, Head of Sales, EMEA
john@streamnative.io
linkedin.com/in/johnkinson
+44 207 072 1095
Thank you
streamnative.io
Industry trends
Notable industries and sectors using data streaming:
Banking - transforming from siloed systems to combined data streams
○ Typical applications of event streaming include banking sector processing of
financial transactions, with multiple customer touchpoints, notifications, and
support for mobile devices
○ Banking data (transactions and metadata) can be streamed in parallel for
fraud detection using ML and AI in near real-time
Insurance - building a single view from multiple data sources to provide faster claim
processing, fraud detection and system integration
IoT - handling huge volumes of data from sensors
Pulsar Adoption Use Cases
Adopted Pulsar to replace Kafka in their DSP (Data Streaming Platform).
● 1.5-2x lower capex cost
● 5-50x improvement in latency
● 2-3x lower opex due to layered architecture
● Processes 10 petabytes/day
Tencent adopted Pulsar to power its billing platform, Midas, which processes hundreds of billions of financial transactions daily. Adoption then expanded to Tencent’s Federated Learning Platform and Tencent Gaming.
Applied Materials is one of the biggest semiconductor hardware and software suppliers in the industry. They adopted Pulsar to build a message bus that ties all of their data together. They previously used Tibco.
Agenda
Welcome
Introduction to Messaging + Data Streaming
● What is messaging and data streaming?
● When is it used?
● What are the industry trends?
Introduction to Apache Pulsar
● What it is
● What it enables
● Who uses it today?
● Using Apache Pulsar in FinTech applications
Why Open Source
● Why open source Apache Pulsar?
● What have been the benefits and challenges?
Resources
Q&A
Pulsar Adoption Spreads
Tencent serves billions of users and over a million merchants.
Use Case #1: Payments. Early 2019, Tencent adopts Pulsar to power their billing platform, Midas, processing hundreds of billions of financial transactions daily.
Use Case #2: ML/AI. Pulsar adoption spreads to Tencent’s Federated Learning Platform, where it supports trillions of concurrent federated learnings every day.
Use Case #3: Gaming. Tencent’s Gaming Department replaces Kafka with Pulsar for its logging pipeline.
Founded By The
Creators Of Apache Pulsar
Sijie Guo
ASF Member
Pulsar/BookKeeper PMC
Founder and CEO
Jia Zhai
Pulsar/BookKeeper PMC
Co-Founder
Matteo Merli
ASF Member
Pulsar/BookKeeper PMC
CTO
Data veterans with extensive industry experience
Messages - the basic unit of Pulsar
Component: Description
Value / data payload: The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
Key: Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
Properties: An optional key/value map of user-defined properties.
Producer name: The name of the producer that produced the message. If you do not specify a producer name, a default name is used. Used for message de-duplication.
Sequence ID: Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Used for message de-duplication.
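The message components above, and the way producer name plus sequence ID can support de-duplication, can be sketched in plain Python. This is a toy model only: the `Message` and `Deduplicator` names are hypothetical, and the "keep the highest sequence ID seen per producer" rule is a simplified assumption, not Pulsar's broker code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Message:
    """Toy stand-in for the message components listed above (hypothetical class)."""
    value: bytes                      # raw payload bytes
    producer_name: str                # name of the producer that sent the message
    sequence_id: int                  # position in that producer's sequence
    key: Optional[str] = None         # optional key (partitioning, compaction)
    properties: Dict[str, str] = field(default_factory=dict)


class Deduplicator:
    """Rejects a message whose sequence ID is not newer than the last one
    accepted from the same producer (simplified, illustrative rule)."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def accept(self, msg: Message) -> bool:
        last = self._last_seen.get(msg.producer_name, -1)
        if msg.sequence_id <= last:
            return False  # duplicate or replayed message
        self._last_seen[msg.producer_name] = msg.sequence_id
        return True


dedup = Deduplicator()
first = Message(b"pay:100", "payments-producer", 0, key="acct-1")
retry = Message(b"pay:100", "payments-producer", 0, key="acct-1")  # resend of the same message
second = Message(b"pay:250", "payments-producer", 1, key="acct-1")
results = [dedup.accept(first), dedup.accept(retry), dedup.accept(second)]
```

Here the retried message is rejected because its sequence ID was already seen for that producer, while the next message in the sequence is accepted.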
Producer-Consumer
Producer: The publisher sends data and doesn't know about the subscribers or their status. All interactions go through Pulsar, which handles all communication.
Consumer: The subscriber receives data from the publisher and never directly interacts with it.
[Diagram label: Topic]
Pulsar’s Publish-Subscribe model
[Diagram labels: Producer 1, Producer 2, Topic, Broker, Subscription, Consumer 1, Consumer 2, Consumer 3]
● Producers send messages.
● A topic is an ordered, named channel that producers use to transmit messages to subscribed consumers.
● Messages belong to a topic and contain an arbitrary payload.
● Brokers handle connections and route messages between producers and consumers.
● Subscriptions are named configuration rules that determine how messages are delivered to consumers.
● Consumers receive messages.
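The roles above can be illustrated with a small in-memory sketch. `ToyBroker` is a hypothetical class, not the Pulsar client API. It only shows the idea that producers publish to a topic without knowing the consumers, and that each named subscription sees every message. The topic and subscription names are made up for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Consumer = Callable[[bytes], None]


class ToyBroker:
    """In-memory stand-in for a broker: routes each published message
    to every subscription on the topic (illustrative only)."""

    def __init__(self) -> None:
        # topic name -> subscription name -> consumers attached to it
        self._subscriptions: Dict[str, Dict[str, List[Consumer]]] = defaultdict(dict)

    def subscribe(self, topic: str, subscription: str, consumer: Consumer) -> None:
        self._subscriptions[topic].setdefault(subscription, []).append(consumer)

    def publish(self, topic: str, payload: bytes) -> None:
        # Each subscription receives the message once; this sketch always
        # hands it to the subscription's first consumer.
        for consumers in self._subscriptions[topic].values():
            consumers[0](payload)


fraud_seen: List[bytes] = []
analytics_seen: List[bytes] = []

broker = ToyBroker()
broker.subscribe("payments", "fraud-detection", fraud_seen.append)
broker.subscribe("payments", "analytics", analytics_seen.append)

# The producer only names the topic; it never references a consumer.
broker.publish("payments", b"txn-1")
broker.publish("payments", b"txn-2")
```

Both subscriptions end up with the same two transactions in publish order, which is the "multiple data consumers" pattern from the FinTech slide.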
Pulsar Subscription Modes
Different subscription modes
have different semantics:
Exclusive/Failover - guaranteed
order, single active consumer
Shared - multiple active
consumers, no order
Key_Shared - multiple active
consumers, order for given key
[Diagram: Producer 1 and Producer 2 publish to a Pulsar topic with four subscriptions.
Subscription A (Exclusive): Consumer A.
Subscription B (Failover): Consumer B-1 and Consumer B-2, labelled "In case of failure in Consumer B-1".
Subscription C (Shared): Consumer C-1 and Consumer C-2; message groups (K1,V10), (K2,V21), (K1,V12) and (K2,V20), (K1,V11), (K2,V22).
Subscription D (Key-Shared): Consumer D-1 and Consumer D-2; message groups (K1,V10), (K1,V11), (K1,V12) and (K2,V20), (K2,V21), (K2,V22).]
Messaging Ordering Guarantees
Topic Ordering Guarantees:
● Messages sent to a single topic or partition DO have an ordering guarantee.
● Messages sent to different partitions DO NOT have an ordering guarantee.
Subscription Mode Guarantees:
● A single consumer can receive messages from the same partition in order using an exclusive or failover subscription mode.
● Multiple consumers can receive messages from the same key in order using the key_shared subscription mode.
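The difference between shared and key_shared delivery can be simulated in a few lines. This is a sketch under stated assumptions: round-robin stands in for shared dispatch and a CRC32 hash of the key stands in for key_shared routing. Both are illustrative choices, not Pulsar's actual dispatch logic, and the function names are hypothetical.

```python
import zlib
from itertools import cycle
from typing import Dict, List, Tuple

Msg = Tuple[str, int]  # (key, value)


def dispatch_shared(messages: List[Msg], consumers: List[str]) -> Dict[str, List[Msg]]:
    """Round-robin delivery: work is spread out, per-key order is not preserved."""
    delivered: Dict[str, List[Msg]] = {c: [] for c in consumers}
    for msg, consumer in zip(messages, cycle(consumers)):
        delivered[consumer].append(msg)
    return delivered


def dispatch_key_shared(messages: List[Msg], consumers: List[str]) -> Dict[str, List[Msg]]:
    """Hash-by-key delivery: all messages for one key go to one consumer, in order."""
    delivered: Dict[str, List[Msg]] = {c: [] for c in consumers}
    for key, value in messages:
        index = zlib.crc32(key.encode("utf-8")) % len(consumers)
        delivered[consumers[index]].append((key, value))
    return delivered


messages = [("K1", 10), ("K1", 11), ("K2", 20), ("K1", 12), ("K2", 21), ("K2", 22)]
shared = dispatch_shared(messages, ["C-1", "C-2"])
key_shared = dispatch_key_shared(messages, ["D-1", "D-2"])
```

In the shared result the K1 messages are split across both consumers, so no single consumer sees K1 in full order. In the key_shared result each key is handled by exactly one consumer, which is what keeps per-account or per-customer events ordered.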
Unified Messaging Model
[Diagram: Producer 1 and Producer 2 publish messages m0 to m4 to a Pulsar topic/partition, grouped under the headings Streaming and Messaging.
Subscription A (Exclusive): Consumer A-0 and Consumer A-1, one marked with an X.
Subscription B (Failover): Consumer B-0 and Consumer B-1, labelled "In case of failure in Consumer B-0".
Subscription C (Shared): Consumer C-1, Consumer C-2, Consumer C-3.
Subscription D (Key-Shared): Consumer D-1, Consumer D-2, Consumer D-3; messages (k2,v1), (k2,v3), (k3,v2), (k1,v0), (k1,v4).]
Connectivity
• Libraries - (Java, Python, Go, NodeJS,
WebSockets, C++, C#, Scala, Rust,...)
• Functions - Lightweight Stream
Processing (Java, Python, Go)
• Connectors - Sources & Sinks
(Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP
(Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark,
Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage - (S3)
hub.streamnative.io
Use Cases
Multi-Tenant Data
Infrastructure
AdTech
Fraud Detection
FinTech
IoT Analytics
Microservices Development
Schema Registry
[Diagram: the schema registry holds schema-1, schema-2 and schema-3 (value = Avro/Protobuf/JSON). Producers and consumers each keep a local cache for schemas, and each message carries a schema ID plus data.]
Producers: send (register) the schema if it is not in the local cache, then send schema-1 data serialized per schema ID.
Consumers: get the schema by ID if it is not in the local cache, then read schema-1 data deserialized per schema ID.
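The "local cache for schemas" idea can be sketched as follows. This is a toy model with hypothetical class names (`ToySchemaRegistry`, `CachingConsumer`), and JSON is used for the payload purely to keep the example self-contained. It is not the Pulsar schema registry API.

```python
import json
from typing import Dict, Tuple


class ToySchemaRegistry:
    """Stores schema definitions by numeric ID and counts lookups (illustrative)."""

    def __init__(self) -> None:
        self._by_id: Dict[int, str] = {}
        self._ids: Dict[str, int] = {}
        self.lookups = 0

    def register(self, definition: str) -> int:
        # Registering the same definition twice returns the same ID.
        if definition not in self._ids:
            schema_id = len(self._ids) + 1
            self._ids[definition] = schema_id
            self._by_id[schema_id] = definition
        return self._ids[definition]

    def get(self, schema_id: int) -> str:
        self.lookups += 1
        return self._by_id[schema_id]


class CachingConsumer:
    """Fetches a schema from the registry only when it is not cached locally."""

    def __init__(self, registry: ToySchemaRegistry) -> None:
        self._registry = registry
        self._cache: Dict[int, str] = {}

    def decode(self, schema_id: int, payload: bytes) -> Tuple[str, dict]:
        if schema_id not in self._cache:
            self._cache[schema_id] = self._registry.get(schema_id)
        return self._cache[schema_id], json.loads(payload)


registry = ToySchemaRegistry()
schema_id = registry.register('{"type": "record", "name": "Payment"}')
consumer = CachingConsumer(registry)
first = consumer.decode(schema_id, b'{"amount": 100}')
second = consumer.decode(schema_id, b'{"amount": 250}')
```

Two messages are decoded but the registry is consulted only once, because the second decode is served from the consumer's local cache.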
Pulsar Functions
A serverless event streaming framework
● Lightweight computation similar to AWS Lambda.
● Specifically designed to use Apache Pulsar as a message bus.
● Function runtime can be located within Pulsar Broker.
● Consume messages from one or more Pulsar topics.
● Apply user-supplied processing logic to each message.
● Publish the results of the computation to another topic.
● Support multiple programming languages (Java, Python, Go).
● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
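The consume, process, publish cycle can be sketched as a plain Python function. This assumes the plain-function style for Python functions, where a module-level `process(input)` receives each message and its return value is published to the output topic. Input and output topics are set at deployment time and are not shown. The JSON message shape and the `THRESHOLD` value are hypothetical.

```python
import json

# Hypothetical rule for the example: flag transactions above this amount.
THRESHOLD = 10_000


def process(input):
    """Called once per incoming message; the returned string is the result
    that would be published to the function's output topic."""
    txn = json.loads(input)
    txn["flagged"] = txn.get("amount", 0) > THRESHOLD
    return json.dumps(txn, sort_keys=True)


# Local check of the logic, without a Pulsar cluster.
large = process('{"account": "acct-1", "amount": 25000}')
small = process('{"account": "acct-2", "amount": 50}')
```

Keeping the logic in a pure function like this makes it easy to unit test before it is deployed against real topics.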
Moving Data In and Out of Pulsar
IO/Connectors are a simple way to integrate with external systems and move
data in and out of Pulsar. https://pulsar.apache.org/docs/en/io-jdbc-sink/
● Built on top of Pulsar Functions
● Built-in connectors - hub.streamnative.io
Source Sink
Kafka-on-Pulsar (KoP)
Pulsar SQL
Presto/Trino workers can read
segments directly from
bookies (or offloaded storage)
in parallel.
[Diagram labels: Producer, Consumer; Broker 1 (Topic1-Part1), Broker 2 (Topic1-Part2), Broker 3 (Topic1-Part3); Segments 1, 2, 3, 4 ... X, each stored on Bookie 1, Bookie 2 and Bookie 3; Query, Coordinator, Topic Metadata; SQL Workers]
[Diagram: the unified messaging and streaming architecture shown earlier (protocols, unified storage and computing, tiered storage, StreamNative Hub, StreamNative Cloud, CDC, apps), with Streaming FLiPS Apps exchanging events with Pulsar.]
Review: Key Pulsar Terminology
● Producer is a process that publishes messages to a topic.
● Consumer is a process that establishes a subscription to a topic
and processes messages published to that topic.
● Subscription: A subscription is a named configuration rule that
determines how messages are delivered to consumers. Four
subscription modes are available in Pulsar: exclusive, shared,
failover, and key-shared.
● Brokers handle the connections and route messages.
● Topics are named channels for transmitting messages from
producers to consumers. Partitioned Topics are “virtual” topics
composed of multiple topics.
● Messages belong to a topic and contain an arbitrary payload.
● Instance is a group of clusters that
act together as a single unit.
● Cluster is a set of Pulsar brokers,
ZooKeeper quorum, and an
ensemble of BookKeeper bookies.
● Tenants are the administrative unit
for allocating capacity and enforcing
an authentication/ authorization
scheme.
● Namespaces are a grouping
mechanism for related topics.
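The tenant, namespace and topic hierarchy shows up directly in Pulsar topic names, which take the form `persistent://tenant/namespace/topic` (or `non-persistent://...`). A minimal sketch of splitting such a name into its parts follows. The `parse_topic` helper and the example names are hypothetical, not part of any Pulsar client.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str      # administrative unit
    namespace: str   # grouping of related topics
    topic: str       # the named channel itself


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified topic name into its four parts."""
    domain, separator, rest = name.partition("://")
    if not separator or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unsupported topic name: {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, *parts)


parsed = parse_topic("persistent://bank/payments/transactions")
```

In this example `bank` would be the tenant, `payments` the namespace and `transactions` the topic.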
The Need For Real-Time Data
Hybrid and multi-cloud
strategies with native
geo-replication
Seamlessly build
microservice architectures
with support for streaming
and messaging workloads
Built for Kubernetes
CloudNative
migrations with tools
360 degree customer data
multi-tenancy, infinite
retention, and extensive
connector ecosystem
Background
● Provides a data platform
for the cloud
● Customers include 92 of
the Fortune 100
● Core use cases include
real-time monitoring,
interactive applications,
log processing & analytics,
IoT analytics, streaming
data transformation,
real-time analytics &
event-driven workflows
Why Pulsar
● Scalability
● Durability
● Fault Tolerance
● High Availability
● Sharing & Isolation
● Messaging Models
● Persistence
● Client Languages
● Deployment in k8s
● Operability
● Disaster Recovery
● TCO
● Community & Adoption
Benefits
● 1.5-2x lower in capex
cost
● 5-50x improvement in
latency
● 2-3x lower in opex due to
layered architecture
● Processes billions of
messages/day in
production
Background
● The third-largest payment
provider in China behind
Alipay and WeChat
Payment
● 500 million registered users
and 41.9 million active users
● Need to improve the
efficiency of fraud detection
for mobile payments
● Current lambda architecture
of Kafka + Hive is complex
and difficult to maintain
Benefits
● Reduce complexity by 33%
(clusters reduced from six to
four)
● Improve production
efficiency by 11 times
● Higher stability due to the
unified architecture
Why Pulsar
● Cloud-native architecture
and segment-centric
storage
● Pulsar is able to do both
streaming and batch
processing
● Able to build a unified
data processing stack
with Pulsar and Spark,
streamlining messy
operations problems
StreamNative Customer Spotlight:
Background
● Flipkart is the largest
e-commerce company
in India with $6B+ in
annual revenue
● Company-wide
messaging platform,
supporting different
types of streaming use
cases, including:
payment processing,
order tracking,
warehouse, logistics, etc.
Why StreamNative
● Work with the original
developers of Pulsar and
top Pulsar engineers
● Experience operating
large scale,
geo-replicated
messaging systems
● 24 x 7 support to
support mission-critical
business applications
Benefits
● Able to handle spikes in
traffic without manual
rebalancing or system failure
● Reduced operational
complexity and total cost of
ownership
● Support the move to cloud
StreamNative Customer Spotlight:
Background
● Narvar provides
e-commerce supply chain
management software,
powering 300 retailers and
650 brands
● Core use case:
asynchronous processing
to distribute tasks between
the various systems,
including individual
retailers’ ordering and
warehouse management
applications
Why StreamNative
● Work with the original
developers of Pulsar and
top Pulsar engineers
● “Before we began working
with StreamNative, Sijie
Guo and his team helped us
work out some production
issues. We were very
impressed by how quickly
they solved our problems
and their willingness to
help.” - Ankush Goyal
Benefits
● Accelerate application
development
● Able to handle spikes in
traffic without manual
rebalancing or system failure
● Reduced customer issues
streamnative.io
Passionate and dedicated team.
Founded by the original developers of
Apache Pulsar.
StreamNative helps teams to capture,
manage, and leverage data using Pulsar’s
unified messaging and streaming
platform.
Building An App
Code Along With Tim
<<DEMO>>
Geo-Replication
Pulsar has built-in cross
data center replication
that is used in production
already.

Open Source Bristol 30 March 2022

  • 1.
  • 2.
    streamnative.io Tim Spann Developer Advocate StreamNative ●FLiP(N) Stack = Flink, Pulsar and NiFi Stack ● Streaming Systems & Data Architecture Expert ● Experience: ○ 15+ years of experience with streaming technologies including Apache Pulsar, Apache Flink, Apache Spark, Apache NiFi, Big Data, Cloud, Trino, Aerospike, IoT and more. John Kinson Head of Sales, EMEA StreamNative ● Startup, Scale-up and Large Enterprise expert ● Building the StreamNative Sales function in EMEA ● Experience: ○ 25+ years of building and selling distributed and embedded systems in the telecoms, digital media and cloud enterprise software industries
  • 3.
    Agenda 01 Welcome 02 Introductionto Messaging + Data Streaming 03 Introduction to Apache Pulsar 04 Why Open Source 05 Resources 06 Q&A 3
  • 4.
    4 ➔ Asynchronous messagestriggered by events ➔ Consuming messages regardless of Language, System, Sender ➔ Queueing ➔ Routing ➔ Work Queues ➔ JPMorgan Chase AMQP MESSAGING
  • 5.
    5 ➔ Perform inReal-Time ➔ Process Events as They Happen ➔ Joining Streams with SQL ➔ Find Anomalies Immediately ➔ Ordering and Arrival Semantics ➔ Continuous Streams of Data DATA STREAMING
  • 6.
    streamnative.io Accessing historical aswell as real-time data Pub/sub model enables event streams to be sent from multiple producers, and consumed by multiple consumers To process large amounts of data in a highly scalable way When is Messaging and Streaming used?
  • 7.
    Industry trends Banking Transforming from siloedsystems to combined data streams Provide faster claim processing, fraud detection and system integration Insurance Handle huge columns of data from sensors IoT 7
  • 8.
    Apache Pulsar isa Cloud-Native Messaging and Event-Streaming Platform.
  • 9.
    Messaging Ideal for workqueues that do not require tasks to be performed in a particular order—for example, sending one email message to many recipients. RabbitMQ and Amazon SQS are examples of popular queue-based message systems. Pulsar: Unified Messaging + Data Streaming
  • 10.
    Messaging Ideal for workqueues that do not require tasks to be performed in a particular order—for example, sending one email message to many recipients. RabbitMQ and Amazon SQS are examples of popular queue-based message systems. Pulsar: Unified Messaging + Data Streaming .. and Streaming Works best in situations where the order of messages is important—for example, data ingestion. Kafka and Amazon Kinesis are examples of messaging systems that use streaming semantics for consuming messages.
  • 11.
    Unified Messaging andStreaming StreamNative Hub StreamNative Cloud Unified Batch and Stream COMPUTING Batch (Batch + Stream) Unified Batch and Stream STORAGE Offload (Queuing + Streaming) Tiered Storage Pulsar --- KoP --- MoP --- Websocket Pulsar Sink Streaming Edge Gateway Protocols CDC Apps
  • 12.
  • 13.
  • 14.
    Using Pulsar withFintech 14 Low latency Geo-replication Data integrity High availability Durability Multi-tenancy Multiple data consumers: Transactions, payment processing, alerts, analytics, KYC, fraud detection with ML & AI Large data volumes, high scalability Financial event messaging Many topics, producers, consumers
  • 15.
    Why Open Source Pulsar? SijieGuo ASF Member Pulsar/BookKeeper PMC Founder and CEO Jia Zhai Pulsar/BookKeeper PMC Co-Founder Matteo Merli ASF Member Pulsar/BookKeeper PMC CTO
  • 16.
    16 ● We wouldget many benefits from an open source model ○ Other companies would help develop the product ○ Better security, code escrow, longevity ● We would keep the core features in the OSS version ● We could build commercial offerings, services around the core product OUR BETS AND EARLY DECISIONS Why Open Source Pulsar?
  • 17.
    17 C/OSS Model Benefits Challenges Manydevelopers Security, Longevity, Escrow Why pay? Multiple roadmaps
  • 18.
    RESOURCES Here are resourcesto continue your journey with Apache Pulsar
  • 19.
  • 20.
  • 21.
    John Kinson Head ofSales EMEA Q&A Tim Spann Developer Advocate @PaaSDev linkedin.com/in/ timothyspann github.com/tspannhw john@streamnative.io linkedin.com/in/ johnkinson +44 207 072 1095
  • 22.
  • 23.
    streamnative.io Industry trends Notable industriesand sectors using data streaming: Banking - transforming from siloed systems to combined data streams ○ Typical applications of event streaming include banking sector processing of financial transactions, with multiple customer touchpoints, notifications, and support for mobile devices ○ Banking data (transactions and meta data) can be streamed in parallel for fraud detection using ML and AI in near real-time Insurance - building a single view from multiple data sources to provide faster claim processing, fraud detection and system integration IoT - handling huge volumes of data from sensors
  • 24.
    Adopted Pulsar toreplace Kafka in their DSP (Data Streaming Platform). ● 1.5-2x lower in capex cost ● 5-50x improvement in latency ● 2-3x lower in opex due ● Process 10 petabytes/day Adopted Pulsar to power their billing platform, Midas, which processing hundreds of billions of financial transactions daily. Adoption then expanded to Tencent’s Federated Learning Platform and Tencent Gaming. Applied Materials is one of the biggest semiconductor hardware and software supplier in the industry. They adopted Pulsar to enable them to build a message bus to tie all of their data together. They previously used Tibco. Pulsar Adoption Use Cases
  • 25.
    Agenda Welcome Introduction to Messaging+ Data Streaming ● What is messaging and data streaming? ● When is it used? ● What are the industry trends? Introduction to Apache Pulsar ● What it is ● What it enables ● Who uses it today? ● Using Apache Pulsar in FinTech applications Why Open Source ● Why open source Apache Pulsar? ● What have been the benefits and challenges? Resources Q&A
  • 26.
    Industry trends Banking Transforming from siloedsystems to combined data streams Provide faster claim processing, fraud detection and system integration Insurance Handle huge columns of data from sensors IoT 26
  • 27.
    Pulsar Adoption Spreads Tencentserves billions of users and over a million merchants. Use Case #1: Payments Early 2019, Tencent adopts Pulsar to power their billing platform, Midas, processing hundreds of billions of financial transactions daily. Use Case #2: ML/AI Pulsar adoption spreads to Tencent’s Federated Learning Platform where it supports trillions of concurrent federated learnings every day. Use Case #3: Gaming Tencent’s Gaming Department replaces Kafka with Pulsar for its logging pipeline.
  • 28.
    Founded By The CreatorsOf Apache Pulsar Sijie Guo ASF Member Pulsar/BookKeeper PMC Founder and CEO Jia Zhai Pulsar/BookKeeper PMC Co-Founder Matteo Merli ASF Member Pulsar/BookKeeper PMC CTO Data veterans with extensive industry experience
  • 29.
    Messages - thebasic unit of Pulsar Component Description Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Message De-Duplication. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Message De-Duplication.
  • 30.
    Producer-Consumer Producer Consumer Publisher sendsdata and doesn't know about the subscribers or their status. All interactions go through Pulsar and it handles all communication. Subscriber receives data from publisher and never directly interacts with it Topic Topic
  • 31.
    Pulsar’s Publish-Subscribe model Broker Subscription Consumer1 Consumer 2 Consumer 3 Topic Producer 1 Producer 2 ● Producers send messages. ● Topics are an ordered, named channel that producers use to transmit messages to subscribed consumers. ● Messages belong to a topic and contain an arbitrary payload. ● Brokers handle connections and routes messages between producers / consumers. ● Subscriptions are named configuration rules that determine how messages are delivered to consumers. ● Consumers receive messages.
  • 32.
    Pulsar Subscription Modes Differentsubscription modes have different semantics: Exclusive/Failover - guaranteed order, single active consumer Shared - multiple active consumers, no order Key_Shared - multiple active consumers, order for given key Producer 1 Producer 2 Pulsar Topic Subscription D Consumer D-1 Consumer D-2 Key-Shared < K 1, V 10 > < K 1, V 11 > < K 1, V 12 > < K 2 ,V 2 0 > < K 2 ,V 2 1> < K 2 ,V 2 2 > Subscription C Consumer C-1 Consumer C-2 Shared < K 1, V 10 > < K 2, V 21 > < K 1, V 12 > < K 2 ,V 2 0 > < K 1, V 11 > < K 2 ,V 2 2 > Subscription A Consumer A Exclusive Subscription B Consumer B-1 Consumer B-2 In case of failure in Consumer B-1 Failover
  • 33.
    Messaging Ordering Guarantees Topic OrderingGuarantees: ● Messages sent to a single topic or partition DO have an ordering guarantee. ● Messages sent to different partitions DO NOT have an ordering guarantee. 33 Subscription Mode Guarantees: ● A single consumer can receive messages from the same partition in order using an exclusive or failover subscription mode. ● Multiple consumers can receive messages from the same key in order using the key_shared subscription mode.
  • 34.
    Messaging Ordering Guarantees Topic OrderingGuarantees: ● Messages sent to a single topic or partition DO have an ordering guarantee. ● Messages sent to different partitions DO NOT have an ordering guarantee. 34 Subscription Mode Guarantees: ● A single consumer can receive messages from the same partition in order using an exclusive or failover subscription mode. ● Multiple consumers can receive messages from the same key in order using the key_shared subscription mode.
  • 35.
    Unified Messaging Model Streaming Messaging Producer1 Producer 2 Pulsar Topic/Partition m0 m1 m2 m3 m4 Consumer D-1 Consumer D-2 Consumer D-3 Subscription D < k 2 , v 1 > < k 2 , v 3 > <k3,v2> < k 1 , v 0 > < k 1 , v 4 > Key-Shared Consumer C-1 Consumer C-2 Consumer C-3 Subscription C m1 m2 m3 m4 m0 Shared Failover Consumer B-1 Consumer B-0 Subscription B m1 m2 m3 m4 m0 In case of failure in Consumer B-0 Consumer A-1 Consumer A-0 Subscription A m1 m2 m3 m4 m0 Exclusive X
  • 36.
    Connectivity • Libraries -(Java, Python, Go, NodeJS, WebSockets, C++, C#, Scala, Rust,...) • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3) hub.streamnative.io
  • 37.
    Use Cases Multi-Tenant Data Infrastructure AdTech FraudDetection FinTech IoT Analytics Microservices Development
  • 38.
    Schema Registry Schema Registry schema-1(value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
  • 39.
    Pulsar Functions ● Lightweightcomputation similar to AWS Lambda. ● Specifically designed to use Apache Pulsar as a message bus. ● Function runtime can be located within Pulsar Broker. A serverless event streaming framework
  • 40.
    ● Consume messagesfrom one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries to support the execution of ML models on the edge. Pulsar Functions
  • 41.
    Moving Data Inand Out of Pulsar IO/Connectors are a simple way to integrate with external systems and move data in and out of Pulsar. https://pulsar.apache.org/docs/en/io-jdbc-sink/ ● Built on top of Pulsar Functions ● Built-in connectors - hub.streamnative.io Source Sink
  • 42.
  • 43.
    Pulsar SQL Presto/Trino workerscan read segments directly from bookies (or offloaded storage) in parallel. Bookie 1 Segment 1 Producer Consumer Broker 1 Topic1-Part1 Broker 2 Topic1-Part2 Broker 3 Topic1-Part3 Segment 2 Segment 3 Segment 4 Segment X Segment 1 Segment 1 Segment 1 Segment 3 Segment 3 Segment 3 Segment 2 Segment 2 Segment 2 Segment 4 Segment 4 Segment 4 Segment X Segment X Segment X Bookie 2 Bookie 3 Query Coordinator ... ... SQL Worker SQL Worker SQL Worker SQL Worker Query Topic Metadata
  • 44.
    <-> Events <-> StreamingFLiPS Apps StreamNative Hub StreamNative Cloud Unified Batch and Stream COMPUTING Batch (Batch + Stream) Unified Batch and Stream STORAGE Offload (Queuing + Streaming) Tiered Storage Pulsar --- KoP --- MoP --- Websocket Pulsar Sink Streaming Edge Gateway Protocols <-> Events <-> CDC Apps
  • 45.
    Review: Key PulsarTerminology ● Producer is a process that publishes messages to a topic. ● Consumer is a process that establishes a subscription to a topic and processes messages published to that topic. ● Subscription: A subscription is a named configuration rule that determines how messages are delivered to consumers. Four subscription modes are available in Pulsar: exclusive, shared, failover, and key-shared. ● Brokers handle the connections and routes messages. ● Topics are named channels for transmitting messages from producers to consumers. Partitioned Topics are “virtual” topics composed of multiple topics. ● Messages belong to a topic and contain an arbitrary payload. ● Instance is a group of clusters that act together as a single unit. ● Cluster is a set of Pulsar brokers, ZooKeeper quorum, and an ensemble of BookKeeper bookies. ● Tenants are the administrative unit for allocating capacity and enforcing an authentication/ authorization scheme. ● Namespaces are a grouping mechanism for related topics.
  • 46.
    The Need For Real-Time Data
    Hybrid and multi-cloud strategies with native geo-replication
    Seamlessly build microservice architectures with support for streaming and messaging workloads
    Built for Kubernetes
    Cloud-native migrations with tools
    360-degree customer data
    Multi-tenancy, infinite retention, and extensive connector ecosystem
  • 47.
  • 48.
    Background
    ● Provides a data platform for the cloud
    ● Customers include 92 of the Fortune 100
    ● Core use cases include real-time monitoring, interactive applications, log processing & analytics, IoT analytics, streaming data transformation, real-time analytics & event-driven workflows
    Why Pulsar
    ● Scalability
    ● Durability
    ● Fault Tolerance
    ● High Availability
    ● Sharing & Isolation
    ● Messaging Models
    ● Persistence
    ● Client Languages
    ● Deployment in k8s
    ● Operability
    ● Disaster Recovery
    ● TCO
    ● Community & Adoption
    Benefits
    ● 1.5-2x lower capex cost
    ● 5-50x improvement in latency
    ● 2-3x lower opex due to layered architecture
    ● Processes billions of messages/day in production
  • 49.
    Background
    ● The third-largest payment provider in China behind Alipay and WeChat Payment
    ● 500 million registered users and 41.9 million active users
    ● Need to improve the efficiency of fraud detection for mobile payments
    ● Current lambda architecture of Kafka + Hive is complex and difficult to maintain
    Benefits
    ● Reduce complexity by 33% (clusters reduced from six to four)
    ● Improve production efficiency by 11 times
    ● Higher stability due to the unified architecture
    Why Pulsar
    ● Cloud-native architecture and segment-centric storage
    ● Pulsar is able to do both streaming and batch processing
    ● Able to build a unified data processing stack with Pulsar and Spark, streamlining messy operations problems
  • 50.
    StreamNative Customer Spotlight:
    Background
    ● Flipkart is the largest e-commerce company in India with $6B+ in annual revenue
    ● Company-wide messaging platform, supporting different types of streaming use cases, including payment processing, order tracking, warehouse, logistics, etc.
    Why StreamNative
    ● Work with the original developers of Pulsar and top Pulsar engineers
    ● Experience operating large-scale, geo-replicated messaging systems
    ● 24x7 support for mission-critical business applications
    Benefits
    ● Able to handle spikes in traffic without manual rebalancing or system failure
    ● Reduced operational complexity and total cost of ownership
    ● Support the move to cloud
  • 51.
    StreamNative Customer Spotlight:
    Background
    ● Narvar provides e-commerce supply chain management software, powering 300 retailers and 650 brands
    ● Core use case: asynchronous processing to distribute tasks between the various systems, including individual retailers’ ordering and warehouse management applications
    Why StreamNative
    ● Work with the original developers of Pulsar and top Pulsar engineers
    ● “Before we began working with StreamNative, Sijie Guo and his team helped us work out some production issues. We were very impressed by how quickly they solved our problems and their willingness to help.” - Ankush Goyal
    Benefits
    ● Accelerate application development
    ● Able to handle spikes in traffic without manual rebalancing or system failure
    ● Reduced customer issues
  • 52.
    streamnative.io
    Passionate and dedicated team. Founded by the original developers of Apache Pulsar.
    StreamNative helps teams to capture, manage, and leverage data using Pulsar’s unified messaging and streaming platform.
  • 53.
    Building An App: Code Along With Tim <<DEMO>>
  • 54.
    Geo-Replication
    Pulsar has built-in cross-data-center replication that is already used in production.
  • 55.
    Why Open Source Pulsar?
    Sijie Guo: ASF Member, Pulsar/BookKeeper PMC, Founder and CEO
    Jia Zhai: Pulsar/BookKeeper PMC, Co-Founder
    Matteo Merli: ASF Member, Pulsar/BookKeeper PMC, CTO
    ● Other companies would help develop the product
    ● We could build commercial offerings, services around the core product
    ● We would get many benefits from an open source model