Pulsar Virtual Summit North America 2021
Exactly-Once Made Easy:
Transactional Messaging
in Apache Pulsar
Sijie Guo
Co-Founder and CEO @ StreamNative
Addison Higham
Chief Architect @ StreamNative
Who are we?
● Sijie Guo (@sijieg)
● CEO, StreamNative
● PMC Member of Pulsar/BookKeeper
● Ex-Streamlio, Ex-Twitter
● Addison Higham (@addisonjh)
● Chief Architect, StreamNative
● Pulsar Committer
● Formerly Architect at Instructure
StreamNative
Founded by the creators of Apache Pulsar, StreamNative provides a
cloud-native, unified messaging and streaming platform powered by
Apache Pulsar to support multi-cloud and hybrid-cloud strategies
Messaging Semantics
✓ At-most once: since Pulsar was released
✓ At-least once: since Pulsar was released
✓ Exactly once: Idempotent Producer, since 1.20.0-incubating
(PIP-6: Guaranteed Message Deduplication)
Revisit Existing Semantics
Pulsar’s Existing Semantics
Producer Broker Topic (Log)
send(m1)
Pulsar’s Existing Semantics
Producer Broker Topic (Log)
append(m1)
Pulsar’s Existing Semantics
Producer Broker Topic (Log)
m1
Pulsar’s Existing Semantics
Producer Broker Topic (Log)
m1
ack(m1)
Pulsar’s Existing Semantics
Producer Broker Topic (Log)
send(m2)
m1
Pulsar’s Existing Semantics
Producer Broker Topic (Log)
append(m2)
m1
Pulsar’s Existing Semantics
Producer Broker Topic (Log)
m1
m2
ack(m2)
What do we do now?
At-least Once: Resend (m2)
Producer Broker Topic (Log)
m1
m2
send(m2)
At-least Once: Resend (m2)
Producer Broker Topic (Log)
m1
m2
append(m2)
At-least Once: Resend (m2)
Producer Broker Topic (Log)
m1
m2
m2
Duplicates!!
Why are duplicates introduced?
✓ Broker can fail
✓ The request from Producer to Broker can fail
✓ Producer or Consumer can fail
At-most Once: Don’t resend
Producer Broker Topic (Log)
m1
m2
I want exactly-once
Message Deduplication
✓ Producer: Idempotent Producer
✓ Broker: Guaranteed Message Deduplication (PIP-6)
✓ Consumer: Reader + Checkpoints (Flink / Spark)
Idempotent Producer
✓ Producer Name - Identify who is producing the messages
✓ Sequence ID: Identify the message
✓ Producer Name + Sequence ID: The unique identifier for a message
Guaranteed Message Deduplication
✓ Broker maintains a map between Producer Name and last
produced sequence ID
✓ Broker accepts a message if its sequence ID is larger than the last
produced sequence ID
✓ Broker discards a message whose sequence ID is smaller than or equal
to the last produced sequence ID
✓ Broker keeps a map between Producer Name and last Sequence ID
in a deduplication cursor (stored in Apache BookKeeper)
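The deduplication rule above can be sketched as a small in-memory model. This is a minimal illustration, not Pulsar's broker code: the class name `DedupBroker`, the `publish` method, and the use of a plain `HashMap` in place of the deduplication cursor are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of broker-side deduplication (not Pulsar's actual classes).
class DedupBroker {
    // Highest sequence ID persisted so far, keyed by producer name.
    private final Map<String, Long> lastSequenceId = new HashMap<>();
    // Stand-in for the topic log.
    private final List<String> log = new ArrayList<>();

    // Returns true if the message was appended, false if it was dropped as a duplicate.
    // In both cases the producer can be acked, so a resend is harmless.
    boolean publish(String producerName, long sequenceId, String payload) {
        long last = lastSequenceId.getOrDefault(producerName, -1L);
        if (sequenceId <= last) {
            return false; // already persisted: discard
        }
        log.add(payload);
        lastSequenceId.put(producerName, sequenceId);
        return true;
    }

    List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        DedupBroker broker = new DedupBroker();
        broker.publish("producer-a", 1, "m1");
        broker.publish("producer-a", 2, "m2");
        // The ack for m2 was lost, so the producer resends it.
        boolean appended = broker.publish("producer-a", 2, "m2");
        System.out.println(appended + " " + broker.log()); // prints "false [m1, m2]"
    }
}
```

The resend of `(2, m2)` is rejected because its sequence ID is not larger than the last one recorded for that producer name, so the log keeps a single copy.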
Exactly Once
Producer Broker Topic (Log)
send(1, m1)
Exactly Once Producer
Producer Broker Topic (Log)
append(1, m1)
1,
m1
Exactly Once Producer
Producer Broker Topic (Log)
append(2, m2)
1,
m1
2,
m2
Exactly Once Producer
Producer Broker Topic (Log)
1,
m1
2,
m2
ack(2, m2)
What do we do now?
Exactly Once Producer
Producer Broker Topic (Log)
1,
m1
2,
m2
send(2, m2)
Exactly Once Producer
Producer Broker Topic (Log)
1,
m1
2,
m2
append(2, m2)
Exactly Once Producer
Producer Broker Topic (Log)
1,
m1
2,
m2
append(2, m2)
Duplicate detected
Exactly Once Producer
Producer Broker Topic (Log)
1,
m1
2,
m2
ack(2, m2)
Enable Exactly Once
✓ Enable deduplication: `bin/pulsar-admin namespaces set-deduplication -e tenant/namespace`
✓ Set producer name when creating a producer
✓ Specify increasing sequence id when producing messages (optional)
Exactly Once Consumer
Consumer Broker Topic (Log)
1,
m1
2,
m2
receive_after(m1)
Last Received: m1
Exactly Once Consumer
Consumer Broker Topic (Log)
1,
m1
2,
m2
dispatch(m2)
Last Received: m2
Limitations
✓ It only works when producing messages to one partition
✓ It only works for producing one message
✓ There is no atomicity when producing multiple messages to one
partition or many partitions
✓ Consumers are required to store the Message ID along with its state
and seek back to the Message ID when restoring the state
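The last limitation can be illustrated with a sketch of a consumer that snapshots its state together with the last Message ID it processed. Everything here is a stand-in chosen for the example: the list index plays the role of the Message ID, the running sum is the consumer state, and `CheckpointedReader` is a hypothetical class, not part of the Pulsar client.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model: state and read position are checkpointed together.
class CheckpointedReader {
    static final class Checkpoint {
        final long sum;
        final int lastMessageId;

        Checkpoint(long sum, int lastMessageId) {
            this.sum = sum;
            this.lastMessageId = lastMessageId;
        }
    }

    private long sum = 0;           // application state
    private int lastMessageId = -1; // list index standing in for a Message ID

    Checkpoint checkpoint() {
        return new Checkpoint(sum, lastMessageId);
    }

    // Restore the state and "seek back" to the stored position in one step.
    void restore(Checkpoint cp) {
        sum = cp.sum;
        lastMessageId = cp.lastMessageId;
    }

    // Process messages after lastMessageId, up to endExclusive.
    void readUntil(List<Integer> log, int endExclusive) {
        for (int id = lastMessageId + 1; id < endExclusive; id++) {
            sum += log.get(id);
            lastMessageId = id;
        }
    }

    long sum() {
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> log = Arrays.asList(10, 20, 30, 40);
        CheckpointedReader reader = new CheckpointedReader();
        reader.readUntil(log, 2);
        Checkpoint cp = reader.checkpoint(); // sum = 30, lastMessageId = 1
        reader.readUntil(log, 3);            // processes 30, then the reader "crashes"

        CheckpointedReader recovered = new CheckpointedReader();
        recovered.restore(cp);               // work done after the checkpoint is discarded
        recovered.readUntil(log, log.size());
        System.out.println(recovered.sum()); // prints 100
    }
}
```

Because the position is restored with the state, the message with value 30 is counted once even though it was processed twice. The burden of keeping the two in sync falls on the application, which is the limitation the slide points at.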
Introducing Transactions
PulsarCash
PulsarCash
PulsarCash, powered by Apache Pulsar
✓ Transfer Topic: record all the transfer requests
✓ Cash Transfer Function: perform the cash transfer action
✓ BalanceUpdate Topic: record the balance-update requests
PulsarCash
Transfer Topic
User:bob, credit($10)
BalanceUpdate
Topic
BalanceUpdate
Topic
User:alice, debit($10)
Cash Transfer
Function
(100,0,0): transfer ($10, alice -> bob)
Ack Transfer
Transfer Topic
User:bob, credit($10)
BalanceUpdate
Topic
BalanceUpdate
Topic
User:alice, debit($10)
Cash Transfer
Function
(100,0,0): transfer ($10, alice -> bob)
Ack: (100, 0, 0)
Reprocessed
Transfer!
Transfer Topic
User:bob, credit($10)
BalanceUpdate
Topic
BalanceUpdate
Topic
User:alice, debit($10)
Cash Transfer
Function
(100,0,0): transfer ($10, alice -> bob)
Ack: (100, 0, 0)
Lost Money!
Transfer Topic
User:bob, credit($10)
BalanceUpdate
Topic
BalanceUpdate
Topic
User:alice, debit($10)
Cash Transfer
Function
(100,0,0): transfer ($10, alice -> bob)
Ack: (100, 0, 0)
Pulsar Transaction Explained
Transaction Semantics
✓ Atomic writes across multiple topic partitions
✓ Atomic acknowledgments across multiple topic partitions
✓ All the operations made within one transaction either all succeed or
all fail
✓ Conditional acknowledgement to handle network partition
✓ Consumers are *ONLY* allowed to read committed messages
Without Transaction API
Message<Transfer> tf = inputConsumer.receive();
Transfer transfer = tf.getValue();
MessageId msg1 = producer1.newMessage().value(
    new BalanceTransfer(transfer.sender, transfer.amount, "debit")).send();
MessageId msg2 = producer2.newMessage().value(
    new BalanceTransfer(transfer.receiver, transfer.amount, "credit")).send();
inputConsumer.acknowledge(tf.getMessageId());
Pulsar Client
Cursor
Input Topic Output Topic 1 Output Topic 2
Broker 0 Broker 1
Input
Consumer
Producer 1 Producer 2
1) Receive Message 2) Produce Messages
3) Ack Message
Transaction API
Message<Transfer> message = inputConsumer.receive();
Transfer tf = message.getValue();
Transaction txn =
    client.newTransaction().withTransactionTimeout(...).build().get();
MessageId msg1 = producer1.newMessage(txn).value(
    new BalanceTransfer(tf.sender, tf.amount, "debit")).send();
MessageId msg2 = producer2.newMessage(txn).value(
    new BalanceTransfer(tf.receiver, tf.amount, "credit")).send();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn).get();
txn.commit().get();
Pulsar Client
Cursor
Input Topic Output Topic 1 Output Topic 2
Broker 0 Broker 1
Input
Consumer
Producer 1 Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Transaction Coordinator (TC)
✓ TC: Transaction manager, coordinating committing and aborting
transactions
✓ In-Memory + Transaction Log
✓ Transaction Log is powered by a partitioned Pulsar topic
✓ Locating a TC is locating a partition of the transaction log topic
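The "In-Memory + Transaction Log" design can be sketched as a coordinator that writes every state change to a log before updating its in-memory map, and rebuilds the map by replaying the log on startup. This is a simplified model under stated assumptions: `CoordinatorSketch`, the `seq:STATUS` log format, and the status names are hypothetical, and a `List<String>` stands in for one partition of the transaction log topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a transaction coordinator backed by a log.
class CoordinatorSketch {
    enum Status { OPEN, COMMITTING, COMMITTED, ABORTING, ABORTED }

    private final List<String> txnLog;                       // stand-in for one log partition
    private final Map<Long, Status> txns = new HashMap<>();  // in-memory view
    private long nextSequence = 0;

    CoordinatorSketch(List<String> txnLog) {
        this.txnLog = txnLog;
        // Recovery: replay the log to rebuild the in-memory state.
        for (String entry : txnLog) {
            String[] parts = entry.split(":");
            long seq = Long.parseLong(parts[0]);
            txns.put(seq, Status.valueOf(parts[1]));
            nextSequence = Math.max(nextSequence, seq + 1);
        }
    }

    long newTxn() {
        long seq = nextSequence++;
        record(seq, Status.OPEN);
        return seq;
    }

    void endTxn(long seq, boolean commit) {
        if (txns.get(seq) != Status.OPEN) {
            throw new IllegalStateException("transaction " + seq + " is not open");
        }
        record(seq, commit ? Status.COMMITTING : Status.ABORTING);
        // Notifying the participating topics and subscriptions is omitted here.
        record(seq, commit ? Status.COMMITTED : Status.ABORTED);
    }

    Status status(long seq) {
        return txns.get(seq);
    }

    // Log first, then memory, so a restart never sees less than was acknowledged.
    private void record(long seq, Status status) {
        txnLog.add(seq + ":" + status);
        txns.put(seq, status);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        CoordinatorSketch tc = new CoordinatorSketch(log);
        long txn = tc.newTxn();
        tc.endTxn(txn, true);

        // A new coordinator instance taking over the same log sees the same state.
        CoordinatorSketch recovered = new CoordinatorSketch(log);
        System.out.println(recovered.status(txn)); // prints COMMITTED
    }
}
```

Since all coordinator state lives in the log, whichever broker owns the log partition can act as that coordinator, which is one way to read "locating a TC is locating a partition of the transaction log topic".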
Transaction Buffer (TB)
✓ TB: store and index transaction data (per topic partition)
✓ TB is implemented using another managed-ledger (ML)
✓ Transactional messages are appended to TB
✓ Transaction index is maintained in memory and snapshotted to
ledgers
✓ Transaction index can be rebuilt from TB
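A rough sketch of the transaction buffer idea: transactional messages are appended to a log, an in-memory index records which positions belong to which transaction, and only messages of committed transactions are handed to readers. The names (`TxnBufferSketch`, `Entry`, `readCommitted`) and the marker-entry layout are assumptions for illustration, and a `List` stands in for the managed ledger.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a per-partition transaction buffer.
class TxnBufferSketch {
    enum Type { DATA, COMMIT, ABORT }

    static final class Entry {
        final long txnId;
        final Type type;
        final String payload;

        Entry(long txnId, Type type, String payload) {
            this.txnId = txnId;
            this.type = type;
            this.payload = payload;
        }
    }

    private final List<Entry> log;                                  // stand-in for the ledger
    private final Map<Long, List<Integer>> index = new HashMap<>(); // txn -> data positions
    private final Set<Long> committed = new LinkedHashSet<>();      // in commit order

    TxnBufferSketch(List<Entry> log) {
        this.log = log;
        // The index is derived data: it can be rebuilt by scanning the log.
        for (int pos = 0; pos < log.size(); pos++) {
            apply(pos, log.get(pos));
        }
    }

    void append(long txnId, String payload) { add(new Entry(txnId, Type.DATA, payload)); }
    void commit(long txnId) { add(new Entry(txnId, Type.COMMIT, null)); }
    void abort(long txnId) { add(new Entry(txnId, Type.ABORT, null)); }

    private void add(Entry entry) {
        log.add(entry);
        apply(log.size() - 1, entry);
    }

    private void apply(int pos, Entry entry) {
        switch (entry.type) {
            case DATA:
                index.computeIfAbsent(entry.txnId, k -> new ArrayList<>()).add(pos);
                break;
            case COMMIT:
                committed.add(entry.txnId);
                break;
            case ABORT:
                index.remove(entry.txnId);
                break;
        }
    }

    // Readers only ever see payloads of committed transactions.
    List<String> readCommitted() {
        List<String> out = new ArrayList<>();
        for (long txnId : committed) {
            for (int pos : index.getOrDefault(txnId, Collections.emptyList())) {
                out.add(log.get(pos).payload);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> ledger = new ArrayList<>();
        TxnBufferSketch buffer = new TxnBufferSketch(ledger);
        buffer.append(1, "M1");
        buffer.append(2, "X");
        buffer.abort(2);
        System.out.println(buffer.readCommitted()); // prints []
        buffer.commit(1);
        System.out.println(buffer.readCommitted()); // prints [M1]
        // Rebuilding from the same ledger yields the same view.
        System.out.println(new TxnBufferSketch(ledger).readCommitted()); // prints [M1]
    }
}
```

The sketch skips snapshotting; a snapshot would only shorten the scan needed to rebuild the index.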
Transactional Subscription State (TSS)
✓ Introduce ACK_PENDING state
✓ Add response for acknowledgment, aka Ack-on-Ack
✓ Acknowledgment state is updated to cursor ledger
✓ Acknowledgment state can be replayed from cursor ledger
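The ACK_PENDING state and the acknowledgment response can be sketched as follows. This is an illustrative model only: `PendingAckSketch` and its methods are hypothetical, in-memory collections replace the cursor ledger, and the boolean return value stands in for the Ack-on-Ack response.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of transactional subscription state.
class PendingAckSketch {
    // Messages acked inside a still-open transaction (ACK_PENDING), with the owning txn.
    private final Map<String, Long> pending = new HashMap<>();
    // Messages whose acknowledgment has been committed.
    private final Set<String> acked = new HashSet<>();

    // Returns the ack response: true if accepted, false if the message is already
    // acked or is pending under a transaction (a conflict).
    boolean ack(String messageId, long txnId) {
        if (acked.contains(messageId) || pending.containsKey(messageId)) {
            return false;
        }
        pending.put(messageId, txnId);
        return true;
    }

    // Commit: pending acks of this transaction become permanent.
    void commit(long txnId) {
        pending.entrySet().removeIf(e -> {
            if (e.getValue() == txnId) {
                acked.add(e.getKey());
                return true;
            }
            return false;
        });
    }

    // Abort: pending acks are dropped, so the messages can be acked again.
    void abort(long txnId) {
        pending.values().removeIf(owner -> owner == txnId);
    }

    boolean isAcked(String messageId) {
        return acked.contains(messageId);
    }

    public static void main(String[] args) {
        PendingAckSketch sub = new PendingAckSketch();
        System.out.println(sub.ack("M0", 1)); // prints true: M0 is ACK_PENDING under txn 1
        System.out.println(sub.ack("M0", 2)); // prints false: conflicting ack is rejected
        sub.abort(1);
        System.out.println(sub.ack("M0", 2)); // prints true: M0 was released by the abort
        sub.commit(2);
        System.out.println(sub.isAcked("M0")); // prints true
    }
}
```

Rejecting the second ack while the first is pending is what lets two consumers that both received the same message, for example after a network partition, agree on which transaction owns it.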
Transaction Execution Flow
Transaction API
Message<Transfer> message = inputConsumer.receive();
Transfer tf = message.getValue();
Transaction txn =
    client.newTransaction().withTransactionTimeout(...).build().get();
MessageId msg1 = producer1.newMessage(txn).value(
    new BalanceTransfer(tf.sender, tf.amount, "debit")).send();
MessageId msg2 = producer2.newMessage(txn).value(
    new BalanceTransfer(tf.receiver, tf.amount, "credit")).send();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn).get();
txn.commit().get();
Pulsar Client
Cursor
Input Topic Output Topic 1 Output Topic 2
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
1. New Txn
Transaction API
Message<Transfer> message = inputConsumer.receive();
Transfer tf = message.getValue();
Transaction txn =
    client.newTransaction().withTransactionTimeout(...).build().get();
MessageId msg1 = producer1.newMessage(txn).value(
    new BalanceTransfer(tf.sender, tf.amount, "debit")).send();
MessageId msg2 = producer2.newMessage(txn).value(
    new BalanceTransfer(tf.receiver, tf.amount, "credit")).send();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn).get();
txn.commit().get();
Pulsar Client
Cursor
Input Topic Output Topic 1 Output Topic 2
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
2.2 Produced Messages To
Topics with Txn
2.1 Add Produced
Topics To Txn
Transaction API
Message<Transfer> message = inputConsumer.receive();
Transfer tf = message.getValue();
Transaction txn =
    client.newTransaction().withTransactionTimeout(...).build().get();
MessageId msg1 = producer1.newMessage(txn).value(
    new BalanceTransfer(tf.sender, tf.amount, "debit")).send();
MessageId msg2 = producer2.newMessage(txn).value(
    new BalanceTransfer(tf.receiver, tf.amount, "credit")).send();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn).get();
txn.commit().get();
Pulsar Client
Cursor
Input Topic Output Topic 1 Output Topic 2
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
3.1 Add Acked
Subscriptions to Txn
Tx1: ACK (M0)
3.2 Ack messages with Txn
Tx1: add [S0]
Transaction API
Message<Transfer> message = inputConsumer.receive();
Transfer tf = message.getValue();
Transaction txn =
    client.newTransaction().withTransactionTimeout(...).build().get();
MessageId msg1 = producer1.newMessage(txn).value(
    new BalanceTransfer(tf.sender, tf.amount, "debit")).send();
MessageId msg2 = producer2.newMessage(txn).value(
    new BalanceTransfer(tf.receiver, tf.amount, "credit")).send();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn).get();
txn.commit().get();
Pulsar Client
Cursor
Input Topic Output Topic 1 Output Topic 2
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
4.0 Commit Txn
Tx1: ACK (M0)
Tx1: add [S0]
4.0.1 Committing Txn
Tx1: Committing
Pulsar Client
Cursor
Input Topic Output Topic 1 Output Topic 2
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
Tx1: ACK (M0)
Tx1: add [S0]
Tx1: Committing
Tx1: Committed Tx1: (c) Tx1: (c)
4.1.1 Commit Txn
on Subscriptions
4.1.0 Commit Txn
on Topics
Pulsar Client
Cursor
Input Topic Output Topic 1 Output Topic 2
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
Tx1: ACK (M0)
Tx1: add [S0]
Tx1: Committing
Tx1: Committed
Tx1: Committed Tx1: (c) Tx1: (c)
4.2 Commit Txn
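The numbered steps in the diagrams (1, 2.1, 3.1, 4.0, 4.1, 4.2) can be traced with a small sketch of the coordinator side of the flow. The class `TxnFlowSketch`, the `Participant` interface, and the log strings are hypothetical and only mirror the labels on the slides; real participants would be the transaction buffers of the output topics and the subscription state of the input topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trace of the commit flow from the coordinator's point of view.
class TxnFlowSketch {
    // A topic partition or subscription that takes part in a transaction.
    interface Participant {
        void commit(String txnId);
    }

    private final List<String> txnLog = new ArrayList<>();
    private final Map<String, List<Participant>> participants = new HashMap<>();
    private int nextId = 1;

    // Step 1: open a new transaction.
    String newTxn() {
        String txnId = "Tx" + nextId++;
        txnLog.add(txnId + ": Open");
        participants.put(txnId, new ArrayList<>());
        return txnId;
    }

    // Steps 2.1 and 3.1: register a produced topic or an acked subscription.
    void addParticipant(String txnId, String name, Participant participant) {
        txnLog.add(txnId + ": add [" + name + "]");
        participants.get(txnId).add(participant);
    }

    // Steps 4.0 to 4.2: log the intent, commit on every participant, log the result.
    void commit(String txnId) {
        txnLog.add(txnId + ": Committing");
        for (Participant participant : participants.get(txnId)) {
            participant.commit(txnId);
        }
        txnLog.add(txnId + ": Committed");
    }

    List<String> txnLog() {
        return txnLog;
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        TxnFlowSketch tc = new TxnFlowSketch();
        String txn = tc.newTxn();
        tc.addParticipant(txn, "T1", id -> events.add("T1 marks " + id + " committed"));
        tc.addParticipant(txn, "T2", id -> events.add("T2 marks " + id + " committed"));
        tc.addParticipant(txn, "S0", id -> events.add("S0 applies acks of " + id));
        tc.commit(txn);
        System.out.println(tc.txnLog());
        System.out.println(events);
    }
}
```

Writing "Committing" before touching any participant means the decision is durable first, so a coordinator that restarts mid-commit can finish the remaining participants instead of leaving the transaction half applied.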
Failure Handling in Transaction
Pulsar Client
Cursor
Input Topic Output Topic 1 Output Topic 2
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
1. New Txn
Pulsar Client
Cursor
Input Topic Output Topic 1 Output Topic 2
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
2.2 Produced Messages To
Topics with Txn
2.1 Add Produced
Topics To Txn
Pulsar Client
Cursor
Input Topic Output Topic 1 Output Topic 2
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
3.1 Add Acked
Subscriptions to Txn
Tx1: ACK (M0)
3.2 Ack messages with Txn
Tx1: add [S0]
Pulsar Client
Cursor
Input Topic Output Topic 1 Output Topic 2
Broker 0 Broker 1
Input
Consumer
Producer 2
Coordinator
Transaction Log
Txn
Buffer
Txn
Buffer
Txn
New Txn
Producer 1
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
Tx1: ACK (M0)
Tx1: add [S0]
Tx1: Committing
Tx1: Committed Tx1: (c) Tx1: (c)
4.1.1 Commit Txn
on Subscriptions
4.1.0 Commit Txn
on Topics
Transaction API - Async Example
inputConsumer.receiveAsync().thenCompose(tf -> {
    return client.newTransaction().withTransactionTimeout(...).build().thenCompose(txn -> {
        producer1.newMessage(txn).value(new BalanceTransfer(...)).sendAsync();
        producer2.newMessage(txn).value(new BalanceTransfer(...)).sendAsync();
        inputConsumer.acknowledgeAsync(tf.getMessageId(), txn);
        return txn.commit();
    });
});
PulsarCash
Transfer Topic
User:bob, credit($10)
BalanceUpdate
Topic
BalanceUpdate
Topic
User:alice, debit($10)
Cash Transfer
Function
(100,0,0): transfer ($10, alice -> bob)
Ack: (100, 0, 0)
PulsarCash
Transfer Topic
User:bob, credit($10)
BalanceUpdate
Topic
BalanceUpdate
Topic
User:alice, debit($10)
Cash Transfer
Function
(100,0,0): transfer ($10, alice -> bob)
Ack: (100, 0, 0)
Transaction
Pulsar Transaction
Makes Messaging and Streaming
easy and reliable for everyone
What’s Next
✓ Transaction Support in other languages (e.g. C++, Go)
✓ Transaction in Pulsar Functions & Pulsar IO
✓ Transaction in Kafka-on-Pulsar, AMQP-on-Pulsar, MQTT-on-Pulsar
✓ Transaction with State Storage in Pulsar Functions
✓ ...
Credits
✓ Developers: Penghui, Bo Cong, Ran Gao, Yong Zhang, Marvin Cai
✓ Reviewers: Jia Zhai, Matteo Merli, Addison Higham, Sijie Guo
✓ … and many other Pulsar users & contributors
Try Pulsar Transaction today!
✓ GA Release: 2.8.0
✓ Try it today!
✓ StreamNative Cloud - Fully managed SaaS service
✓ StreamNative Platform - Self-managed enterprise software
StreamNative Platform
Self-managed enterprise offering of Pulsar
✓ Kafka-on-Pulsar
✓ Function Mesh for serverless streaming
✓ Enterprise-ready security
✓ Pulsar Operators
✓ Seamless StreamNative Cloud
experience
https://streamnative.io/platform
StreamNative Cloud
Fully-managed Pulsar-as-a-Service
✓ Massive scale without the ops overhead
✓ Built for hybrid and multi-cloud
✓ Cloud-Hosted & Cloud-Managed
✓ Stream across public clouds for multi-cloud applications
✓ Elastic, consumption-based pricing with
‘pay as you go’ model
✓ Reliably scale mission-critical apps
https://streamnative.io/cloud
We’re hiring
Build Pulsar with the team that builds Pulsar
✓ Work with the creators of Pulsar
✓ Exciting, growth-stage company
✓ Open and collaborative environment
✓ Competitive compensation and benefits
✓ Best teammates on earth
https://streamnative.io/careers
