Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Summit NA 2021

Pulsar Virtual Summit North America 2021
Exactly-Once Made Easy:
Transactional Messaging
in Apache Pulsar
Sijie Guo
Co-Founder and CEO @ StreamNative
Addison Higham
Chief Architect @ StreamNative

Who are we?
● Sijie Guo (@sijieg)
● CEO, StreamNative
● PMC Member of Pulsar/BookKeeper
● Ex-Streamlio, Ex-Twitter
● Addison Higham (@addisonjh)
● Chief Architect, StreamNative
● Pulsar Committer
● Formerly Architect at Instructure

StreamNative
Founded by the creators of Apache Pulsar, StreamNative provides a
cloud-native, unified messaging and streaming platform powered by
Apache Pulsar to support multi-cloud and hybrid-cloud strategies

Messaging Semantics
✓ At-most once
✓ At-least once
✓ Exactly once

Messaging Semantics
✓ At-most once
✓ At-least once
✓ Exactly once
Since Pulsar was released

Messaging Semantics
✓ At-most once
✓ At-least once
✓ Exactly once
Idempotent Producer - 1.20.0-incubating
(PIP-6: Guaranteed Message Deduplication)

Revisit Existing Semantics

Pulsar’s Existing Semantics
Producer Broker Topic (Log)
send(m1)

append(m1)

m1

m1
ack(m1)

send(m2)
m1

append(m2)
m1

m1
m2
ack(m2)

m1
m2
ack(m2)
What do we do now?

At-least Once: Resend (m2)
m1
m2
send(m2)

m1
m2
append(m2)

m1
m2
m2
Duplicates!!

Why the duplicates are introduced?
✓ Broker can fail
✓ The request from Producer to Broker can fail
✓ Producer or Consumer can fail

At-most Once: Don’t resend
m1
m2

Message Deduplication
✓ Producer: Idempotent Producer
✓ Broker: Guaranteed Message Deduplication (PIP-6)
✓ Consumer: Reader + Checkpoints (Flink / Spark)

Idempotent Producer
✓ Producer Name - Identify who is producing the messages
✓ Sequence ID: Identify the message
✓ Producer Name + Sequence ID: The unique identifier for a message

Guaranteed Message Deduplication
✓ Broker maintains a map between Producer Name and last
produced sequence ID
✓ Broker accepts a message if its sequence ID is larger than the last
produced sequence ID
✓ Broker discards a message whose sequence ID is smaller than the
last produced Sequence ID
✓ Broker keeps a map between Producer Name and last Sequence ID
in a deduplication cursor (stored in Apache BookKeeper)

Exactly Once
send(1, m1)

Exactly Once Producer
append(1, m1)
1,
m1

append(2, m2)
1,
m1
2,
m2

1,
m1
2,
m2
ack(2, m2)
What do we do now?

1,
m1
2,
m2
send(2, m2)

1,
m1
2,
m2
append(2, m2)

1,
m1
2,
m2
append(2, m2)
Duplicate detected

1,
m1
2,
m2
ack(2, m2)

Enable Exactly Once
✓ Enable deduplication: `bin/pulsar-admin namespaces set-
deduplication -e tenant/namespace`
✓ Set producer name when creating a producer
✓ Specify increasing sequence id when producing messages (optional)

Exactly Once Consumer
Consumer Broker Topic (Log)
1,
m1
2,
m2
receive_after(m1)
Last Received: m1

Exactly Once Consumer
Consumer Broker Topic (Log)
1,
m1
2,
m2
dispatch(m2)
Last Received: m2

Limitations
✓ It only works when producing messages to one partition
✓ It only works for producing one message
✓ There is no atomicity when producing multiple messages to one
partition or many partitions
✓ Consumers are required to store the Message ID along with its state
and seek back to the Message ID when restoring the state

Introducing Transactions

PulsarCash, powered by Apache Pulsar
✓ Transfer Topic: record all the transfer requests
✓ Cash Transfer Function: perform the cash transfer action
✓ BalanceUpdate Topic: record the balance-update requests

PulsarCash
Transfer Topic
User:bob, credit($10)
BalanceUpdate
Topic
BalanceUpdate
Topic
User:alice, debit($10)
Cash Transfer
Function
(100,0,0): transfer ($10, alice -> bob)

Ack Transfer
Transfer Topic
BalanceUpdate
Topic
BalanceUpdate
Topic
Cash Transfer
Function
Ack: (100, 0, 0)

Reprocessed
Transfer!
Transfer Topic
BalanceUpdate
Topic
BalanceUpdate
Topic
Cash Transfer
Function
Ack: (100, 0, 0)

Lost Money!
Transfer Topic
BalanceUpdate
Topic
BalanceUpdate
Topic
Cash Transfer
Function
Ack: (100, 0, 0)

Pulsar Transaction Explained

Transaction Semantics
✓ Atomic writes across multiple topic partitions
✓ Atomic acknowledgments across multiple topic partitions
✓ All the operations made within one transaction either all succeed or
all fail
✓ Conditional acknowledgement to handle network partition
✓ Consumers are *ONLY* allowed to read committed messages

Without Transaction API
Message<Transfer> tf = inputConsumer.receive();
MessageId msg1 = producer1.newMessage().value(
BalanceTransfer(tf.sender, tf.amount, “debit”)).send();
MessageId msg2 = producer2.newMessage().value(
BalanceTransfer(tf.receiver, tf.amount, “credit”)).send();
inputConsumer.acknowledge(tf.getMessageId());

Pulsar Client
Cursor
Input Topic Output Topic 1 Output Topic 2
Broker 0 Broker 1
Input
Consumer
Producer 1 Producer 2
1) Receive Message 2) Produce Messages
3) Ack Message

Transaction API
Message<Transfer> message = inputConsumer.receive();
Transaction txn =
client.newTransaction().withTransactionTimeout(...).build().get();
MessageId msg1 = producer1.newMessage(txn).value(
BalanceTransfer(tf.sender, tf.amount, “debit”)).send;
MessageId msg2 = producer2.newMessage(txn).value(
BalanceTransfer(tf.receiver, tf.amount, “credit”)).send();
inputConsumer.acknowledge(message.getMessageId(), txn);
txn.commit().get();

Pulsar Client
Cursor
Broker 0 Broker 1
Input
Consumer
Producer 1 Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer

Transaction Coordinator (TC)
✓ TC: Transaction manager, coordinating committing and aborting
transactions
✓ In-Memory + Transaction Log
✓ Transaction Log is powered by a partitioned Pulsar topic
✓ Locating a TC is locating a partition of the transaction log topic

Transaction Buffer (TB)
✓ TB: store and index transaction data (per topic partition)
✓ TB is implemented using another managed-ledger (ML)
✓ Transactional messages are appended to TB
✓ Transaction index is maintained in memory and snapshotted to
ledgers
✓ Transaction index can be rebuilt from TB

Transactional Subscription State (TSS)
✓ Introduce ACK_PENDING state
✓ Add response for acknowledgment, aka Ack-on-Ack
✓ Acknowledgment state is updated to cursor ledger
✓ Acknowledgment state can be replayed from cursor ledger

Transaction Execution Flow

Pulsar Client
Cursor
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
1. New Txn

Pulsar Client
Cursor
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
2.2 Produced Messages To
Topics with Txn
2.1 Add Produced
Topics To Txn

Pulsar Client
Cursor
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
3.1 Add Acked
Subscriptions to Txn
Tx1: ACK (M0)
3.2 Ack messages with Txn
Tx1: add [S0]

Pulsar Client
Cursor
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
4.0 Commit Txn
Tx1: ACK (M0)
Tx1: add [S0]
4.0.1 Committing Txn
Tx1: Committing

Pulsar Client
Cursor
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
Tx1: ACK (M0)
Tx1: add [S0]
Tx1: Committing
Tx1: Committed Tx1: (c) Tx1: (c)
4.1.1 Commit Txn
on Subscriptions
4.1.0 Commit Txn
on Topics

Pulsar Client
Cursor
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
Tx1: ACK (M0)
Tx1: add [S0]
Tx1: Committing
Tx1: Committed
Tx1: Committed Tx1: (c) Tx1: (c)
4.2 Commit Txn

Failure Handling in Transaction

Transaction API - Async Example
inputConsumer.receiveAsync.thenCompose(tf -> {
return
client.newTransaction().withTransactionTimeout(...).build().thenCompose(txn ->{
producer1.newMessage(txn).value(BalanceTransfer(...)).sendAsync();
producer2.newMessage(txn).value(BalacneTransfer(...)).sendAsync();
inputConsumer.acknowledgeAsync(tf.getMessageId(), txn);
return txn.commit();
});
});

PulsarCash
Transfer Topic
BalanceUpdate
Topic
BalanceUpdate
Topic
Cash Transfer
Function
Ack: (100, 0, 0)

PulsarCash
Transfer Topic
BalanceUpdate
Topic
BalanceUpdate
Topic
Cash Transfer
Function
Ack: (100, 0, 0)
Transaction

Pulsar Transaction
Makes Messaging and Streaming
easy and reliable for everyone

What’s Next
✓ Transaction Support in other languages (e.g. C++, Go)
✓ Transaction in Pulsar Functions & Pulsar IO
✓ Transaction in Kafka-on-Pulsar, AMQP-on-Pulsar, MQTT-on-Pulsar
✓ Transaction with State Storage in Pulsar Functions
✓ ...

Credits
✓ Developers: Penghui, Bo Cong, Ran Gao, Yong Zhang, Marvin Cai
✓ Reviewers: Jia Zhai, Matteo Merli, Addison Higham, Sijie Guo
✓ … and many other Pulsar users & contributors

Try Pulsar Transaction today!
✓ GA Release: 2.8.0
✓ Try it today!
✓ StreamNative Cloud - Fully managed SaaS service
✓ StreamNative Platform - Self-managed enterprise software

StreamNative Platform
Self-managed enterprise offering of Pulsar
✓ Kafka-on-Pulsar
✓ Function Mesh for serverless streaming
✓ Enterprise-ready security
✓ Pulsar Operators
✓ Seamless StreamNative Cloud
experience
https://streamnative.io/platform

StreamNative Cloud
Fully-managed Pulsar-as-a-Service
✓ Massive scale without the ops overhead
✓ Built for hybrid and multi-cloud
✓ Cloud-Hosted & Cloud-Managed
✓ Stream across public clouds for multi-
cloud applications
✓ Elastic, consumption-based pricing with
‘pay as you go’ model
✓ Reliably scale mission-critical apps
https://streamnative.io/cloud

We’re hiring
Build Pulsar with the team that builds Pulsar
✓ Work with the creators of Pulsar
✓ Exciting, growth-stage company
✓ Open and collaborative environment
✓ Competitive compensation and benefits
✓ Best teammates on earth
https://streamnative.io/careers

Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Summit NA 2021

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Summit NA 2021

Similar to Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Summit NA 2021 (20)

More from StreamNative

More from StreamNative (20)

Recently uploaded

Recently uploaded (20)

Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Summit NA 2021