How Pulsar Stores Your Data - Pulsar Summit NA 2021

Who am I?
● SME for pulsar at Nutanix
● Love
○ Distributed systems
○ Open source
● Hands-on developer, aspiring architect
● Love spending time with data (stores, streams, analytics etc)
● Contributions to pulsar & MySQL
https://www.linkedin.com/in/shivjijha/
https://twitter.com/ShivjiJha

Catalogue
• Pulsar modules
• Data stores
• Distributed storage
• Storage internals (Read / Write)
• Tying together (Metadata layer)

Pulsar Modules
SERVING LAYER
METADATA LAYER
STORAGE LAYER
CLIENT

Pulsar Modules
SERVING LAYER (BROKER)
METADATA LAYER (ZOOKEEPER)
STORAGE LAYER (BOOKKEEPER)
CLIENT

Pulsar Data Stores
STORAGE LAYER (BOOKKEEPER)
CLIENT

Pulsar Data Stores
BOOKKEEPER
CLIENT
OBJECT STORE

Pulsar Data Stores
SERVING LAYER
(BROKER)
BOOKKEEPER
CLIENT
OBJECT STORE
CACHE

Pulsar Data Stores : Overview
• Broker Cache
• Single broker owner for topic
• Primary Store
• Bookkeeper
• Cold Store
• Object Store
• Metadata Layer
• Zookeeper

Pulsar @ scale
scale compute
scale storage

Pulsar Data Stores : Bookkeeper
LEDGER
Ledger Metadata
Status : Open
Last Entry Id : -1
Ensemble Size
Write Quorum Size
Ack Quorum Size
Ensembles : [ [], [] ]

LEDGER
entry 0
Ledger Metadata
Status : Open
Last Entry Id : 0
Ensemble Size
Write Quorum Size
Ack Quorum Size

LEDGER
entry 0
entry 1
entry 2
Ledger Metadata
Status : Closed
Last Entry Id : 2
Ensemble Size
Write Quorum Size
Ack Quorum Size

LEDGER
entry 0
entry 1
entry 2
Entry Data
Metadata:
1.LedgerId
2.EntryId
3.Last Add Confirmed
4.Digest (CRC32 / CRC32C)
Data : byte []
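
The entry layout above can be sketched in a few lines. This is a minimal illustration, not BookKeeper's wire format: the `Entry` class, the field widths, and the use of `zlib.crc32` as the digest are assumptions made for the example.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    ledger_id: int
    entry_id: int
    last_add_confirmed: int
    data: bytes

    def encode(self) -> bytes:
        # Hypothetical layout: three signed 64-bit fields, the payload, then a CRC32 digest.
        header = struct.pack(">qqq", self.ledger_id, self.entry_id, self.last_add_confirmed)
        body = header + self.data
        return body + struct.pack(">I", zlib.crc32(body))

    @staticmethod
    def decode(raw: bytes) -> "Entry":
        body, (digest,) = raw[:-4], struct.unpack(">I", raw[-4:])
        # The digest lets a reader detect corruption before trusting the entry.
        if zlib.crc32(body) != digest:
            raise ValueError("digest mismatch: entry is corrupt")
        ledger_id, entry_id, lac = struct.unpack(">qqq", body[:24])
        return Entry(ledger_id, entry_id, lac, body[24:])
```

Usage: `Entry.decode(Entry(1, 0, -1, b"payload").encode())` returns an equal `Entry`; flipping any byte of the encoded form makes `decode` raise.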

Pulsar Data Stores : Write Path
1.Client sends write on topic
2.The owner broker (bookkeeper client) writes to current ensemble of bookies
3.Broker waits for #ackQuorum acks from bookies
4.Acknowledge to client.
New ledger registered in zookeeper against topicName.
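
A rough model of steps 2 and 3, assuming round-robin striping of entries over the ensemble. The function names and the `alive` set are hypothetical; a real broker sends the writes asynchronously and handles bookie replacement, which this sketch leaves out.

```python
def write_set(entry_id, ensemble, write_quorum):
    """Bookies that should receive this entry: round-robin striping over the ensemble."""
    n = len(ensemble)
    return [ensemble[(entry_id + i) % n] for i in range(write_quorum)]


def add_entry(entry_id, ensemble, write_quorum, ack_quorum, alive):
    """Return True once at least ack_quorum bookies in the write set acknowledged.

    `alive` is the set of bookies that respond; the others never ack.
    """
    acks = sum(1 for bookie in write_set(entry_id, ensemble, write_quorum) if bookie in alive)
    return acks >= ack_quorum
```

With `ensemble = ["b1", "b2", "b3"]` and a write quorum of 2, entry 0 goes to `b1, b2` and entry 2 goes to `b3, b1`, so consecutive entries spread over all bookies.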

Pulsar Data Stores : Read Path
1.Client sends read on topic
2.The owner broker searches local cache.
3.If cache miss, read from closest bookie.
a. On failure try other bookie
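
The lookup order can be sketched as below. Plain dicts stand in for the broker cache and for each bookie, and a `KeyError` stands in for a failed or missing read; `read_entry` and its arguments are hypothetical names for the example.

```python
def read_entry(key, broker_cache, bookies):
    """Look up (ledger_id, entry_id) in the broker cache, then in bookies.

    `bookies` is a list of (name, store) pairs ordered closest first.
    Returns (data, source) so the caller can see where the read was served.
    """
    if key in broker_cache:
        return broker_cache[key], "broker-cache"
    for name, store in bookies:
        try:
            return store[key], name
        except KeyError:
            continue  # this bookie failed or lacks the entry: try the next one
    raise LookupError(f"entry {key} not found on any bookie")
```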

• Internally, the journal acts as a short-lived write queue
• Facilitates quick writes (appends)
• Write cache organizes writes before they are flushed
• Read cache for quick reads
• used if data is not in the broker cache
• and not in the write cache
• Ledger storage holds organized data, still interleaved across topics.
• RocksDB indexes the data ledger by ledger
• Topic to ledger(s) mapping is kept in zookeeper.

● Consistency over availability
● Durability
○ Ensemble
○ Write Quorum
○ Ack Quorum
● Configure how much / how long to store in bookkeeper
● Configure replication factor
● Choose durability vs latency / throughput
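
The three durability knobs relate as ensemble >= write quorum >= ack quorum. A small check, with a hypothetical helper name, makes the trade-off concrete: a higher ack quorum means more copies exist before the client is acknowledged, at the cost of waiting for more bookies.

```python
def check_quorums(ensemble, write_quorum, ack_quorum):
    """Validate the quorum settings and summarize what they imply."""
    if not (ensemble >= write_quorum >= ack_quorum >= 1):
        raise ValueError("expected ensemble >= write quorum >= ack quorum >= 1")
    return {
        "copies_targeted": write_quorum,        # bookies each entry is sent to
        "copies_at_ack_time": ack_quorum,       # copies guaranteed when the write is acked
        "striped": ensemble > write_quorum,     # entries spread over more bookies than copies
    }
```

For example, `check_quorums(3, 2, 2)` describes a setup where every entry is sent to 2 of 3 bookies and both must ack, while `check_quorums(3, 3, 2)` acks after the 2 fastest of 3 bookies.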

Pulsar Data Stores : Cold Store
• Apache jclouds library
• aws-s3
• google-cloud-storage
• filesystem
• Broker writes to cold store, not bookkeeper.
• bookkeeper => broker => object store
• too much bandwidth?
• Why not offload directly from bookie to object store?
• Schema not stored with data in object store.
• consumers like flink, presto etc can work with just bookkeeper

Pulsar Data Stores : Distribution
tenant
namespace
bundle1 bundle2 bundle3
topic1 topic2 topic3 topic4 topic5 topic6
• Shards (aka namespace Bundles) for load balancing
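
One way to picture bundles is as contiguous ranges of a 32-bit hash space, with each topic name hashed into one range. The sketch below uses `zlib.crc32` as a stand-in hash and a hypothetical `bundle_for` helper; the hash function Pulsar actually uses is not covered in these slides.

```python
import bisect
import zlib


def bundle_for(topic, boundaries):
    """Map a topic name to a bundle.

    `boundaries` are sorted split points of the hash space,
    e.g. [0, 0x80000000, 0x100000000] describes two bundles.
    Returns (bundle index, (range start, range end)).
    """
    h = zlib.crc32(topic.encode("utf-8"))  # stand-in hash, always in [0, 2**32)
    idx = bisect.bisect_right(boundaries, h) - 1
    return idx, (boundaries[idx], boundaries[idx + 1])
```

Because ownership is assigned per bundle rather than per topic, moving load between brokers means moving a hash range, and splitting a hot bundle means adding a boundary.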

tenant
namespace
bundles
topic: ledgers[], schemaLedgers[], compactedLedgers[]
ledgers: ledgerId, entries range, ledger size, offloaded?
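
The per-topic metadata can be modeled as a list of ledger records. The class and field names below are hypothetical, chosen to mirror the labels on the slide; the point is that a position in the topic resolves to one ledger, and the `offloaded` flag tells the reader which store to go to.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LedgerInfo:
    ledger_id: int
    first_entry: int      # topic position of the ledger's first entry (assumed field)
    entries: int          # number of entries in the ledger
    size_bytes: int
    offloaded: bool = False


@dataclass
class TopicMetadata:
    ledgers: List[LedgerInfo] = field(default_factory=list)
    schema_ledgers: List[int] = field(default_factory=list)
    compacted_ledgers: List[int] = field(default_factory=list)


def locate(topic: TopicMetadata, position: int) -> Optional[tuple]:
    """Return (ledger_id, store) for a topic position, or None if out of range."""
    for info in topic.ledgers:
        if info.first_entry <= position < info.first_entry + info.entries:
            return info.ledger_id, "object store" if info.offloaded else "bookkeeper"
    return None
```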

Bookkeeper: Client & Server
• Bookkeeper has no leader / follower.
• All bookies have same responsibilities.
• Thick bookie client implements replication, consistency etc
• Simple bookie APIs
• Resources
• Ledger
• Entry

[Diagram: BROKER (BOOKKEEPER CLIENT) -> BOOKKEEPER, JOURNAL 0 on DISK, holding interleaved entries]
ENTRY <L1 E0>
ENTRY <L1 E1>
ENTRY <L1 E2>
ENTRY <L2 E0>
ENTRY <L2 E1>
ENTRY <L1 E3>
ENTRY <L2 E2>
ENTRY <L3 E0>

• First step in pulsar write path
• Distributed WAL
• like databases (MySQL, postgres etc)
• sequential writes
• no reads from journal
• write cache
• write isolation with separate disks for journal and ledger
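
A toy write-ahead log shows why the journal is fast: every entry, whatever its ledger, is appended to the end of one file and synced. The `Journal` class and record layout are assumptions for illustration; `replay` models recovery after a restart, which is the only time this sketch reads the journal back.

```python
import os
import struct


class Journal:
    """Append-only log; entries from different ledgers are interleaved in arrival order."""

    def __init__(self, path):
        self._f = open(path, "ab")

    def append(self, ledger_id, entry_id, data):
        # Hypothetical record: ledger id, entry id, payload length, payload.
        self._f.write(struct.pack(">qqI", ledger_id, entry_id, len(data)) + data)
        self._f.flush()
        os.fsync(self._f.fileno())  # durable before the write is acknowledged

    def close(self):
        self._f.close()


def replay(path):
    """Read the journal sequentially from the start, as recovery would."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(20)
            if len(header) < 20:
                break
            ledger_id, entry_id, size = struct.unpack(">qqI", header)
            records.append((ledger_id, entry_id, f.read(size)))
    return records
```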

[Diagram: the same journal entries as above, with a WRITE CACHE in MEMORY next to JOURNAL 0 on DISK inside BOOKKEEPER]

Bookkeeper: Read Cache
[Diagram labels: write cache, read cache, entry log, L1 index, L2 index, flush; entry => message batch]

Pulsar Data Stores : Read Write Internals

Bookkeeper: Ledgers
• Sequential reads
• Still interleaved across topics
• indexed
• rocksDB
• (ledger, entry id => log file, offset)
• Read path:
• Broker cache
• no n/w trip,
• no disk access
• Bookkeeper
• Write Cache
• Read Cache
• RocksDB index
• Access from disk

Bookkeeper: LAC & LAP
• LAC : Last add confirmed
• In response to write().
• This entry and all previous written (cumulative ack).
• As a result of the sequential write.
• LAP : Last add pushed
• Readers can read until LAC
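
The cumulative nature of LAC can be shown with a small helper. `advance_lac` is a hypothetical name; `acked` is the set of entry ids that have reached their ack quorum, and LAP would simply be the highest entry id sent so far.

```python
def advance_lac(lac, acked):
    """Move LAC forward while the next entry id is acknowledged.

    LAC never skips a gap: entry N is confirmed only if 0..N are all confirmed.
    """
    while lac + 1 in acked:
        lac += 1
    return lac
```

If entries 0, 1 and 3 are acked but 2 is still in flight, LAC stays at 1 even though LAP is 3, so readers cannot see entry 3 yet.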

Bookkeeper: Fencing
• Recovery after
• bookie failure
• network partition b/w broker and bookkeeper
• New owner (broker, acting as bookkeeper client)
• Puts ledger state in recovery,
• Fences the ledger with consensus
• Writes to new ledger.
• Old owner can’t write to ledger anymore.
• Consistency
• No split brain
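
A heavily simplified model of fencing, with hypothetical class and function names: once a bookie has marked a ledger as fenced, it rejects further adds, so the old owner cannot extend the ledger even if it is still running. Real recovery also reads forward from the LAC and needs only a quorum of responses, which this sketch skips.

```python
class FencedError(Exception):
    pass


class BookieLedger:
    """One bookie's copy of a single ledger."""

    def __init__(self):
        self.fenced = False
        self.entries = {}

    def add(self, entry_id, data):
        if self.fenced:
            raise FencedError("ledger is fenced; writer must give up ownership")
        self.entries[entry_id] = data

    def fence(self):
        self.fenced = True
        return max(self.entries, default=-1)  # last entry id this bookie holds


def recover(bookies):
    """New owner fences every copy, then returns the entry id to close the ledger at."""
    return max(b.fence() for b in bookies)
```

After `recover`, the new owner closes the old ledger and opens a new one for further writes; any late write from the old owner fails with `FencedError`.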

Metadata Layer
1.Pointers to data
a.Topic ledgers mapping
b.Ledger topics mapping
c.Topic schema mapping
2.Service Discovery
a.List of available bookies (read / write / both?)
b.List of available brokers
c.Which broker owns which topic
d.How much load on which topic etc
3.Distributed coordination
a.Locks
b.Leader election

Metadata Layer
4. System Configuration
a.Dynamic configurations for hot reload
b.feature flags
5. Provisioning Configuration
a.Metadata for tenants, namespaces etc
b.Namespace policies

References:
• https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works
• Pulsar without Zookeeper: Introducing the Metadata Access Layer in Pulsar
• TGI Pulsar 009: Introduction of Apache BookKeeper

Q & A time
Drop me a hello at:
https://www.linkedin.com/in/shivjijha/
https://twitter.com/ShivjiJha
Pulsar community
https://apache-pulsar.slack.com/
users@pulsar.apache.org
dev@pulsar.apache.org
