Matteo Merli and Sijie Guo from Streamlio gave a hands-on workshop on Apache Pulsar: a #fast #durable #pubsub #messaging system and a low-latency alternative to #kafka.
Messaging, storage, or both? The real time story of Pulsar and Apache Distri... (Streamlio)
Modern enterprises produce data at increasingly high volume and velocity. To process data in real time, new types of storage systems have been designed, implemented, and deployed. This presentation from Strata 2017 in New York provides an overview of Apache DistributedLog and Pulsar, real-time storage systems built using Apache BookKeeper and used heavily in production.
High performance messaging with Apache Pulsar (Matteo Merli)
Apache Pulsar is being used for an increasingly broad array of data ingestion tasks. When operating at scale, it's very important to ensure that the system can make use of all the available resources. Karthik Ramasamy and Matteo Merli share insights into the design decisions and the implementation techniques that allow Pulsar to achieve high performance with strong durability guarantees.
Effectively-once semantics in Apache Pulsar (Matteo Merli)
“Exactly-once” is a controversial term in the messaging landscape. In this presentation we offer a detailed look at effectively-once delivery semantics in Apache Pulsar and how this is achieved without sacrificing performance.
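The mechanism behind effectively-once publishing in Pulsar combines broker-side message deduplication with a stable producer name and per-message sequence IDs, so retried sends are discarded rather than duplicated. Below is a minimal, hedged sketch with the Pulsar Java client; the service URL, topic, producer name, and sequence ID source are illustrative, and deduplication must also be enabled on the namespace or broker.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

public class EffectivelyOncePublisher {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // assumed local broker
                .build();

        // A stable producer name plus per-message sequence IDs let the broker
        // recognize and drop retried duplicates (namespace deduplication assumed on).
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/payments")
                .producerName("payments-publisher-1")
                .sendTimeout(0, TimeUnit.SECONDS)   // retry indefinitely instead of timing out
                .create();

        long sequenceId = 42L;  // e.g. derived from the source offset being published
        producer.newMessage()
                .sequenceId(sequenceId)
                .value("charge:order-1001".getBytes())
                .send();

        producer.close();
        client.close();
    }
}
```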
LinkedIn Stream Processing Meetup - Apache Pulsar (Karthik Ramasamy)
Apache Pulsar is a next-generation messaging system that uses a fundamentally different architecture to achieve durability, performance, scalability, efficiency, multi-tenancy, and geo-replication.
Data processing use cases, from transformation to analytics, perform tasks that require various combinations of queuing, streaming, and lightweight processing steps. Until now, supporting all of those needs has required a different system for each task: stream processing engines, message queuing middleware, and streaming messaging systems. That has led to increased complexity for development and operations.
In this session, we'll discuss the need to unify these capabilities in a single system and how Apache Pulsar was designed to address that. Apache Pulsar is a next-generation distributed pub-sub system that was developed and deployed at Yahoo. Streamlio's Karthik Ramasamy will explain how the architecture and design of Pulsar provide the flexibility to support developers and applications needing any combination of queuing, messaging, streaming, and lightweight compute.
Pulsar - flexible pub-sub for internet scale (Matteo Merli)
Pub-sub messaging is a very convenient abstraction that allows system and application developers to decouple components and let them communicate, by acting as a durable buffer for transient data or as a persistent log from which to recover after crashes. This talk will present an overview of Apache Pulsar, the reasons that led to its development, and how it enabled many teams at Yahoo to build scalable and reliable applications. Apache Pulsar has become the de facto pub-sub messaging system at Yahoo, serving 100+ applications and processing hundreds of billions of messages for over 3 years.
In this talk, we will explore in detail different categories of use cases that highlight how Pulsar can be applied to solve a broad range of problems thanks to its flexible messaging model that supports both queuing and streaming semantics with a focus on durability and transaction guarantees.
Apache Pulsar, Supporting the Entire Lifecycle of Streaming Data (StreamNative)
We have long stressed that there is a growing need for unified messaging and streaming, and that Apache Pulsar is the platform that best supports this vision and makes it possible at large scale. In his talk, Matteo Merli will show how we can take this unified messaging and streaming paradigm one step further, to fully take advantage of their integration. The result is a drastically simplified architecture: a single system that is able to support the data throughout its entire lifecycle, from the moment the event happens down to historical archiving. The ramifications of this shift are big: Pulsar is in the perfect spot to enable tighter integration between the online and offline worlds.
This is an overview of interesting features from Apache Pulsar. Keep in mind that at the time I gave this presentation I had not used Pulsar yet. These are just my first impressions from the list of features.
Scalability, fault tolerance, distributed log… these are terms we hear more and more these days. Making them happen is quite a challenge sometimes, especially if our business needs to be data-intensive, agile, and fast to market.
One way to answer this challenge is microservices. These are small services that communicate with each other to deliver business value. The key word here is _communication_. Without communication, all the power of microservices falls apart. And communication is not trivial when it involves multiple data systems talking to one another over many channels, each channel requiring its own protocol and communication methods. This is where communication can become a bottleneck if not handled properly.
One answer to this problem is Kafka, a distributed messaging system providing fast, highly scalable, and redundant message exchange using a publish-subscribe model. And when we say fast, we mean one of the fastest messaging systems out there.
This presentation will show you an alternative way of building microservices with an event-driven architecture using Kafka.
Presenters:
Laszlo-Robert Albert (albertlaszlorobert [at] gmail [dot] com)
Dan Balescu (dfbalescu [at] gmail [dot] com)
I Heart Log: Real-time Data and Apache Kafka (Jay Kreps)
This presentation discusses how logs and stream-processing can form a backbone for data flow, ETL, and real-time data processing. It will describe the challenges and lessons learned as LinkedIn built out its real-time data subscription and processing infrastructure. It will also discuss the role of real-time processing and its relationship to offline processing frameworks such as MapReduce.
This presentation provides an introduction to Apache Kafka and describes best practices for working with fast data streams in Kafka and MapR Streams.
The code examples used during this talk are available at github.com/iandow/design-patterns-for-fast-data.
Author:
Ian Downard
Presented at the Portland Java User Group on Tuesday, October 18 2016.
Reducing Microservice Complexity with Kafka and Reactive Streams (jimriecken)
My talk from ScalaDays 2016 in New York on May 11, 2016:
Transitioning from a monolithic application to a set of microservices can help increase performance and scalability, but it can also drastically increase complexity. Layers of inter-service network calls add latency and an increasing risk of failure where previously only local function calls existed. In this talk, I'll speak about how to tame this complexity using Apache Kafka and Reactive Streams to:
- Extract non-critical processing from the critical path of your application to reduce request latency
- Provide back-pressure to handle both slow and fast producers/consumers
- Maintain high availability, high performance, and reliable messaging
- Evolve message payloads while maintaining backwards and forwards compatibility.
Fundamentals and Architecture of Apache Kafka (Angelo Cesaro)
Fundamentals and Architecture of Apache Kafka.
This presentation explains Apache Kafka's architecture and internal design giving an overview of Kafka internal functions, including:
Brokers, replication, partitions, producers, consumers, the commit log, and a comparison with traditional message queues.
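For readers new to these terms, here is a minimal, hedged sketch using the standard Kafka Java clients: a producer writes keyed records to a partitioned topic (the commit log managed by the brokers), and a consumer in a consumer group reads them back. The broker address, topic, and group ID are illustrative.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaBasics {
    public static void main(String[] args) {
        // Producer: the record key determines the partition, giving per-key ordering.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("orders", "user-42", "order created"));
        }

        // Consumer: part of a group; each partition of the commit log is
        // assigned to exactly one consumer within the group.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "order-processors");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("orders"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}
```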
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_... (StreamNative)
Nowadays, real-time computation is heavily used in cases such as online product recommendation, online payment fraud detection, and so on. In the streaming pipeline, Kafka is normally used to store a day's or a week's worth of data, but not years of data for looking at trends historically. So a batch pipeline is needed for historical data computation, and that is where the Lambda architecture comes in. Lambda has proved to be effective, and a good balance of speed and reliability, and we have been running many systems with a Lambda architecture for many years. But the biggest drawback of the Lambda architecture has been the need to maintain two distinct (and possibly complex) systems to generate both batch and streaming layers. With that, we have to split our business logic into many segments across different places, which is a challenge to maintain as the business grows and also increases communication overhead. Secondly, the data is duplicated in two different systems, and we have to move data among different systems for processing. With those challenges, we have been searching for alternatives and found Apache Pulsar a great fit. In this talk, I will show how we solve those problems with Apache Pulsar by making Pulsar a unified storage backend for both the batch and streaming pipelines, a solution that simplifies the software stack, improves our work efficiency, and lowers cost at the same time.
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with... (confluent)
BY Jun Rao
From the Bay Area Apache Kafka September 2016 Meetup.
Abstract: To manage the ever-increasing volume and velocity of data within your company you have successfully made the transition from single machines and one-off solutions to large, distributed stream infrastructures in your data center powered by Apache Kafka. But what needs to be done if one data center is not enough? In this session we describe building resilient data pipelines with Apache Kafka that span multiple data centers and points of presence. We provide an overview of best practices and common patterns while covering key areas such as architecture guidelines, data replication and mirroring as well as disaster scenarios and failure handling.
Hello, kafka! (an introduction to apache kafka) (Timothy Spann)
Hello Apache Kafka
An Introduction to Apache Kafka with Timothy Spann and Carolyn Duby, Cloudera Principal Engineers.
We also demo Flink SQL, SMM, SSB, Schema Registry, Apache Kafka, Apache NiFi and Public Cloud - AWS.
Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl... (confluent)
What’s easier than building a data pipeline nowadays? You add a few Apache Kafka clusters and a way to ingest data (probably over HTTP), design a way to route your data streams, add a few stream processors and consumers, integrate with a data warehouse… wait, this looks like a lot of things, doesn’t it? And you probably want to make it highly scalable and available too. Join this session to learn best practices for building a data pipeline, drawn from my experience at Activision/Demonware. I’ll share the lessons learned about scaling pipelines, not only in terms of volume, but also in terms of supporting more games and more use cases. You’ll also hear about message schemas and envelopes, Apache Kafka organization, topic naming conventions, routing, reliable and scalable producers and the ingestion layer, as well as stream processing.
Nozomi from Yahoo! Japan gave a presentation on how Yahoo! Japan uses Apache Pulsar to build their internal messaging platform for processing tens of billions of messages every day. He explains why Yahoo! Japan chose Pulsar, what their Apache Pulsar use cases are, and their best practices.
#PulsarBeijingMeetup
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ... (Lucas Jellema)
Introduction to Apache Kafka - the open-source platform for real-time message queuing and reliable, scalable, distributed event handling and high-volume pub/sub implementation.
See GitHub https://github.com/MaartenSmeets/kafka-workshop for the workshop resources.
Kafka's basic terminology, its architecture, its protocol, and how it works.
Kafka at scale: its caveats, guarantees, and the use cases it supports.
How we use it @ZaprMediaLabs.
At Hootsuite, we've been transitioning from a single monolithic PHP application to a set of scalable Scala-based microservices. To avoid excessive coupling between services, we've implemented an event system using Apache Kafka that allows events to be reliably produced + consumed asynchronously from services as well as data stores.
In this presentation, I talk about:
- Why we chose Kafka
- How we set up our Kafka clusters to be scalable, highly available, and multi-data-center aware.
- How we produce + consume events
- How we ensure that events can be understood by all parts of our system (some of which are implemented in other programming languages like PHP and Python) and how we handle evolving event payload data.
October 2016 HUG: Pulsar, a highly scalable, low latency pub-sub messaging s... (Yahoo Developer Network)
Yahoo recently open-sourced Pulsar, a highly scalable, low latency pub-sub messaging system running on commodity hardware. It provides simple pub-sub messaging semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication. Pulsar is used across various Yahoo applications for large scale data pipelines. Learn more about Pulsar architecture and use-cases in this talk.
Speakers:
Matteo Merli from Pulsar team at Yahoo
Unleashing Real-time Power with Kafka.pptx (Knoldus Inc.)
Unlock the potential of real-time data streaming with Kafka in this session. Learn the fundamentals, architecture, and seamless integration with Scala, empowering you to elevate your data processing capabilities. Perfect for developers at all levels, this hands-on experience will equip you to harness the power of real-time data streams effectively.
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth (NETWAYS)
Roland Hochmuth is the Project Tech Lead (PTL) and software architect of Monasca, the open-source Monitoring-as-a-Service (at-scale) OpenStack project (https://wiki.openstack.org/wiki/Monasca). He focuses on developing a powerful, scalable, and reliable turn-key monitoring solution that draws on leading industry trends and innovations in data streaming, analytics, and big data. He is also responsible for the metrics processing pipeline for HP's public cloud. He has experience in several software areas and domains, from 3-D computer graphics to remote desktop visualization, cloud computing, and monitoring.
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth (NETWAYS)
Monasca (monasca.io) is a turn-key, open-source OpenStack Monitoring-as-a-Service platform that supports authentication and multi-tenancy via the OpenStack Keystone identity service. Monasca is a highly scalable, performant, and fault-tolerant Monitoring-as-a-Service solution that supports push-based streaming metrics, health/status, alarming/thresholding, and notifications. Logging-as-a-Service is under development, and the goal is to provide a comprehensive and integrated monitoring solution for OpenStack clouds that also covers metrics, events, and logs.
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum... (HostedbyConfluent)
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrumgaard | Current 2022
At home, I monitor the temperature, humidity, gas levels, ozone, air quality, and other features around my desk.
Let's bring this to the different spots around the conference including lunch tables, vendor booths, hotel rooms, and more. I need to know about these readings now, not when I get back home from the conference. We need to get these sensor readings immediately in case we need to turn on a fan or move to another area. We will also see if my talk produces a lot of hot air!?!??
My setup is pretty simple: a Raspberry Pi, a breakout garden sensor mount, and as many sensors as I am willing to fly to Austin. The software stack is Python and Java, Apache Pulsar, MQTT, HTML, jQuery, and Apache Kafka.
https://dzone.com/articles/five-sensors-real-time-with-pulsar-and-python-on-a
https://www.datainmotion.dev/2022/04/flip-py-pi-enviroplus-using-apache.html
https://dzone.com/articles/pulsar-in-python-on-pi
(Current22) Let's Monitor The Conditions at the Conference (Timothy Spann)
(Current22) Let's Monitor The Conditions at the Conference
Let's Monitor The Conditions at the Conference
Session Time: 11:15 am - 12:00 pm | Session Date: Wednesday, 5 October 2022 | Session Type: In-Person | Location: Ballroom G
Session Description:
At home, I monitor the temperature, humidity, gas levels, ozone, air quality, and other features around my desk. Let's bring this to the different spots around the conference including lunch tables, vendor booths, hotel rooms, and more. I need to know about these readings now, not when I get back home from the conference. We need to get these sensor readings immediately in case we need to turn on a fan or move to another area. We will also see if my talk produces a lot of hot air!? My setup is pretty simple, a raspberry pi, a breakout garden sensor mount, and as many sensors as I am willing to fly to Austin. The software stack is Python and Java, Apache Pulsar, MQTT, HTML, JQuery, and Apache Kafka.
Timothy Spann, StreamNative
Developer Advocate
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC.
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio (Alluxio, Inc.)
Alluxio Bay Area Meetup March 14th
Join the Alluxio Meetup group: https://www.meetup.com/Alluxio
Alluxio Community slack: https://www.alluxio.org/slack
What are the key considerations people should look at to decide on the right technology to meet their messaging and queuing need? This presentation provides an overview of key requirements and introduces Apache Pulsar, the open source messaging and queuing solution.
World of Tanks Experience of Using Kafka (Levon Avakyan)
In this paper I talk about BigWorld technology, the WoT server, Apache Kafka, and how we started to use them together: what difficulties we had and how we solved them.
3. What is Apache Pulsar?
• Ordering: guaranteed ordering
• Multi-tenancy: a single cluster can support many tenants and use cases
• High throughput: can reach 1.8 M messages/s in a single partition
• Durability: data replicated and synced to disk
• Geo-replication: out-of-the-box support for geographically distributed applications
• Unified messaging model: supports both topic and queue semantics in a single model
• Delivery guarantees: at-least-once, at-most-once, and effectively-once
• Low latency: publish latency of 5 ms at the 99th percentile
• Highly scalable: can support millions of topics
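To make the unified messaging model concrete, here is a minimal sketch using the Pulsar Java client: the same topic can be read with an Exclusive subscription (streaming semantics, strict ordering for a single consumer) or a Shared subscription (queue semantics, messages load-balanced across many consumers). The service URL, topic, and subscription names are illustrative.

```java
import org.apache.pulsar.client.api.*;

public class SubscriptionModes {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // assumed local broker
                .build();

        // Streaming style: a single consumer with strict ordering.
        Consumer<byte[]> streamConsumer = client.newConsumer()
                .topic("persistent://public/default/events")
                .subscriptionName("stream-sub")
                .subscriptionType(SubscriptionType.Exclusive)
                .subscribe();

        // Queuing style: many consumers share this subscription and
        // messages are load-balanced across them.
        Consumer<byte[]> queueConsumer = client.newConsumer()
                .topic("persistent://public/default/events")
                .subscriptionName("queue-sub")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        Message<byte[]> msg = streamConsumer.receive();
        streamConsumer.acknowledge(msg);

        streamConsumer.close();
        queueConsumer.close();
        client.close();
    }
}
```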
4. Usage of Pulsar
• In production for 3+ years at Yahoo
• Powering critical products like Yahoo Mail, Yahoo Finance, Gemini Ads, Flickr and Sherpa (NoSQL database)
• 80+ tenants
• 2.3 million topics
• 100 B messages / day
• Full-mesh replication in 8 data centers
5. Why build a new system?
• No existing solution satisfied the requirements: multi-tenancy, 1M topics, low latency, durability, geo-replication
• Kafka doesn't scale well with many topics:
  - Storage model based on an individual directory per topic partition
  - Enabling durability kills performance
  - Many other choking points: getting stats, access to metadata, flow control
• Operations are not very convenient:
  - e.g. replacing a server takes manual commands to copy the data and involves clients
  - client access to ZK clusters is not desirable
• Ability to manage large backlogs:
  - No scalable support for keeping the consumer position
6. Architecture view
Separate layers between brokers and bookies:
• Brokers and bookies can be added independently
• Traffic can be shifted very quickly across brokers
• New bookies will ramp up on traffic quickly
[Diagram: producers and consumers connect to a layer of Pulsar brokers (Apache Pulsar), which persist data on a layer of bookies, Bookie 1-5 (Apache BookKeeper).]
7. Apache BookKeeper
• A replicated log storage
• Low-latency durable writes
• Simple, repeatable read consistency
• Highly available
• Stores many logs per node
• I/O isolation
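As a rough illustration of the replicated-log abstraction, here is a minimal sketch using the classic BookKeeper Java client: create a ledger replicated across several bookies, append entries durably, then re-open the ledger and read them back. The ZooKeeper address, quorum sizes, and password are assumptions, and the exact (builder-style) API differs between BookKeeper versions.

```java
import java.util.Enumeration;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;

public class LedgerExample {
    public static void main(String[] args) throws Exception {
        // Connect via ZooKeeper (address assumed for a local setup).
        BookKeeper bk = new BookKeeper("localhost:2181");

        // Create a ledger stored on an ensemble of 3 bookies with write/ack quorum of 2.
        LedgerHandle ledger = bk.createLedger(3, 2, 2,
                BookKeeper.DigestType.CRC32, "secret".getBytes());

        // Durable, low-latency append: each entry is synced to the bookies' journals.
        long entryId = ledger.addEntry("hello bookkeeper".getBytes());
        long ledgerId = ledger.getId();
        ledger.close();

        // Re-open the ledger and read its entries back.
        LedgerHandle reader = bk.openLedger(ledgerId,
                BookKeeper.DigestType.CRC32, "secret".getBytes());
        Enumeration<LedgerEntry> entries =
                reader.readEntries(0, reader.getLastAddConfirmed());
        while (entries.hasMoreElements()) {
            System.out.println(new String(entries.nextElement().getEntry()));
        }
        reader.close();
        bk.close();
    }
}
```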
9. BookKeeper - Storage
• A single bookie can serve and store thousands of ledgers
• Write and read paths are separated:
  - Avoids read activity impacting write latency
  - Writes are added to an in-memory write cache and committed to the journal
  - The write cache is flushed in the background to a separate device
  - Entries are sorted to allow for mostly sequential reads
20. Use cases - Message Queue
• Decouple online / background processing
• Provide high availability
• Reliable data transport
[Diagram: online events are published with low latency to Pulsar topic 1; Workers 1-3 consume them for long-running tasks and publish notifications to Pulsar topic 2.]
21. Use cases - Message Queue
• Async processing of time-intensive background tasks
• Publish the task and complete the HTTP request
• Examples: image / video transcoding, bulk operations (large folder deletions)
22. Use cases - Message Queue
• Delegate the processing to multiple compute jobs which interact with micro-services
• Wait until the response is pushed back to the HTTP server
• Examples: breaking monolithic applications into micro-services
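A minimal sketch of this work-queue pattern with the Pulsar Java client: a front end publishes tasks and returns immediately, while several workers consume them through a Shared subscription so the backlog is load-balanced, and a message is only acknowledged once the task succeeds. Topic and subscription names are illustrative.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

public class WorkQueue {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Front end: publish the task and complete the HTTP request right away.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/transcode-jobs")
                .create();
        producer.send("video-1234.mp4".getBytes());

        // Worker: several instances of this loop share the same subscription,
        // so Pulsar distributes tasks across them.
        Consumer<byte[]> worker = client.newConsumer()
                .topic("persistent://public/default/transcode-jobs")
                .subscriptionName("transcoder-workers")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        Message<byte[]> task = worker.receive(5, TimeUnit.SECONDS);
        if (task != null) {
            try {
                // ... run the long-running job here ...
                worker.acknowledge(task);          // remove it from the backlog
            } catch (Exception e) {
                worker.negativeAcknowledge(task);  // redeliver to another worker
            }
        }

        producer.close();
        worker.close();
        client.close();
    }
}
```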
23. Use cases - Notifications
• Change data capture
• Listeners are frequently different tenants
• Quotas are needed to ensure the producer is not affected
[Diagram: an event is published to a Pulsar topic and fanned out to listener Components 1-3.]
24. Use cases - Notifications
• Replicating DB transactions across regions
• Notification to multiple interested tenants (example: new user account, …)
25. Use cases - Feedback system
• Coordinate a large number of machines
• Propagate state
[Diagram: external inputs, a controller, and several serving systems exchange updates and feedback through Pulsar topic 1 and Pulsar topic 2.]
30. Geo-Replication
• Scalable asynchronous replication
• Integrated in the broker message flow
• Simple configuration to add/remove regions
[Diagram: topic T1 exists in Data Centers A, B, and C and is replicated between them; producers P1-P3 publish and consumers C1-C2 consume via subscription S1 from their local data centers.]
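To illustrate the "simple configuration to add/remove regions" point, here is a hedged sketch using the Pulsar admin Java API to set the clusters a namespace replicates to. The namespace, cluster names, and admin URL are illustrative, and the exact method signature can vary between Pulsar versions.

```java
import java.util.HashSet;
import java.util.Set;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class EnableGeoReplication {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // assumed admin endpoint
                .build();

        // Replicate every topic in this namespace to the listed clusters;
        // removing a cluster from the set stops replication to that region.
        Set<String> clusters = new HashSet<>();
        clusters.add("us-west");
        clusters.add("us-east");
        clusters.add("us-central");
        admin.namespaces().setNamespaceReplicationClusters("my-tenant/my-namespace", clusters);

        admin.close();
    }
}
```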
32. Pulsar - Conclusion
• A fast, durable, distributed pub/sub messaging system
• Flexible traditional messaging: queuing and pub/sub
  - Focus on message dispatch and consumption
  - Remove data as soon as possible if it is not needed
• Backed by a scalable log store
  - Durable message store, zero data loss
  - Allows rewinding / reprocessing messages for backfill, bootstrapping systems, and stream computing
• It carries all the fantastic features from the log store.
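As a final illustration of the rewinding / reprocessing point, here is a minimal sketch using the Pulsar Java Reader API to replay a topic from the earliest retained message; a Reader manages its own position instead of a broker-side subscription cursor. The topic name and service URL are illustrative.

```java
import org.apache.pulsar.client.api.*;

public class ReplayTopic {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // A Reader starts exactly where we tell it to,
        // here the earliest message still retained in the backlog.
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/events")
                .startMessageId(MessageId.earliest)
                .create();

        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            System.out.println("replayed: " + new String(msg.getData()));
        }

        reader.close();
        client.close();
    }
}
```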