
Transaction preview of Apache Pulsar

Published in: Data & Analytics

  1. 1. Transaction Support in Pulsar: Penghui Li (Apache Pulsar PMC Member), Yong Zhang (Apache Pulsar Contributor)
  2. 2. What is Apache Pulsar?
  3. 3. Pub/Sub Messaging
  4. 4. “Flexible Pub/Sub messaging backed by durable log/stream storage”
  5. 5. 2012: Pulsar idea started at Yahoo!
     5 years in production, 100+ applications, 10+ data centers
     2016/09: Yahoo open sourced Pulsar
     2017/06: Yahoo donated Pulsar to ASF
     2018/09: Pulsar graduated as a Top-Level project
     2018/09: InfoWorld Best Open Source Project
  6. 6. Pulsar Community
  7. 7. Pulsar Community
  8. 8. Messaging Semantics: • At-most once • At-least once • Exactly once
  9. 9. Messaging Semantics: • At-most once • At-least once • Exactly once (Before 1.20.0-incubating)
  10. 10. Messaging Semantics: • At-most once • At-least once • Exactly once (PIP-6: Guaranteed Message Deduplication)
  11. 11. Revisit Existing Semantics
  12. 12. Pulsar’s Existing Semantics (diagram: Producer, Broker, Log): send(m1)
  13. 13. Pulsar’s Existing Semantics (diagram: Producer, Broker, Log): append(m1)
  14. 14. Pulsar’s Existing Semantics (diagram: Producer, Broker, Log): log: m1
  15. 15. Pulsar’s Existing Semantics (diagram: Producer, Broker, Log): ack(m1); log: m1
  16. 16. Pulsar’s Existing Semantics (diagram: Producer, Broker, Log): ack(m1); log: m1
  17. 17. Pulsar’s Existing Semantics (diagram: Producer, Broker, Log): send(m2); log: m1
  18. 18. Pulsar’s Existing Semantics (diagram: Producer, Broker, Log): append(m2); log: m1, m2
  19. 19. Pulsar’s Existing Semantics (diagram: Producer, Broker, Log): ack(m2); log: m1, m2
  20. 20. Pulsar’s Existing Semantics (diagram: Producer, Broker, Log): ack(m2); log: m1, m2. What do we do now?
  21. 21. At Least Once (diagram: Producer, Broker, Log): send(m2); log: m1, m2
  22. 22. At Least Once (diagram: Producer, Broker, Log): append(m2); log: m1, m2, m2
  23. 23. At Least Once (diagram: Producer, Broker, Log): append(m2); log: m1, m2, m2. Duplicates!
  24. 24. Why are duplicates introduced? • The Broker can fail • The request from Producer to Broker can fail • The Producer or Consumer can fail
  25. 25. I want exactly-once
  26. 26. Message Deduplication: • Producer: Idempotent Producer • Broker: Guaranteed Message Deduplication (PIP-6) • Consumer: Reader + Checkpoints (Flink / Spark)
  27. 27. Idempotent Producer: • Producer Name: identifies who is producing the messages • Sequence ID: identifies the message • Producer Name + Sequence ID: the unique identifier for a message
  28. 28. Guaranteed Message Deduplication: • Broker maintains a map between Producer Name and Last-Produced-Sequence-ID • Broker accepts a message if its sequence id is larger than the last produced sequence id for that producer • Broker treats messages whose sequence id is not larger as duplicates • Broker keeps the map in a de-duplication cursor (stored in BookKeeper)
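The broker-side check described on slide 28 can be sketched with a plain map. This is a minimal, hypothetical in-memory model (the class `DedupBrokerSketch` and its methods are illustrative names, not Pulsar classes), and it assumes that a resend with the same sequence id counts as a duplicate, as the walkthrough on the following slides suggests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the broker-side deduplication check; not Pulsar code.
class DedupBrokerSketch {
    // Producer name -> last sequence id that was appended to the log.
    private final Map<String, Long> lastSequenceId = new HashMap<>();
    // The "log": payloads that were actually appended.
    private final List<String> log = new ArrayList<>();

    /** Returns true if the message was appended, false if it was treated as a duplicate. */
    boolean publish(String producerName, long sequenceId, String payload) {
        long last = lastSequenceId.getOrDefault(producerName, -1L);
        if (sequenceId <= last) {
            // Already persisted: nothing is appended a second time.
            return false;
        }
        log.add(payload);
        lastSequenceId.put(producerName, sequenceId);
        return true;
    }

    List<String> log() {
        return log;
    }
}
```

Publishing `(1, m1)`, `(2, m2)` and then `(2, m2)` again from the same producer leaves the log with two entries, because the third call is rejected as a duplicate. Sequence ids are tracked per producer name, so a second producer can start again at 1.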
  29. 29. Exactly Once (diagram: Producer, Broker, Log): send(1, m1)
  30. 30. Exactly Once (diagram: Producer, Broker, Log): append(1, m1); log: (1, m1)
  31. 31. Exactly Once (diagram: Producer, Broker, Log): append(2, m2); log: (1, m1), (2, m2)
  32. 32. Exactly Once (diagram: Producer, Broker, Log): ack(2, m2); log: (1, m1), (2, m2). What do we do now?
  33. 33. Exactly Once (diagram: Producer, Broker, Log): send(2, m2); log: (1, m1), (2, m2)
  34. 34. Exactly Once (diagram: Producer, Broker, Log): append(2, m2); log: (1, m1), (2, m2)
  35. 35. Exactly Once (diagram: Producer, Broker, Log): append(2, m2); log: (1, m1), (2, m2). Duplicate detected
  36. 36. Exactly Once (diagram: Producer, Broker, Log): ack(2, m2); log: (1, m1), (2, m2)
  37. 37. Enable Exactly Once: • `bin/pulsar-admin namespaces set-deduplication -e tenant/namespace` • Set a producer name when creating a Producer • Specify an increasing sequence id when producing messages
  38. 38. Limitations: • It only works when producing messages to one partition • It only covers producing a single message • There is no atomicity when producing multiple messages to one partition or to many partitions • Consumers are required to store the MessageId along with their state and seek back to that MessageId when restoring the state
  39. 39. Introducing Transactions
  40. 40. PulsarCash
  41. 41. PulsarCash Transfer $10 Alice Bob
  42. 42. PulsarCash, powered by Apache Pulsar: • Transfer Topic: records the transfer requests • Cash Transfer Function: performs the cash transfer action • BalanceUpdate Topic: records the balance-update requests
  43. 43. PulsarCash (diagram): Transfer Topic: (100,0,0): transfer($10, alice -> bob); Cash Transfer Function; balance updates: user:alice, debit($10) and user:bob, credit($10); Balance
  44. 44. Ack Transfer (diagram): (100,0,0): transfer($10, alice -> bob); Cash Transfer function; balance updates: user:alice, debit($10) and user:bob, credit($10); Balance; Ack: (100,0,0)
  45. 45. Reprocessed Transfer! (same diagram as slide 44)
  46. 46. Lost Money! (same diagram as slide 44)
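One plausible reading of the "Reprocessed Transfer!" and "Lost Money!" slides is that the function's three actions (publish the debit, publish the credit, ack the transfer) are independent, so a crash between any two of them leaves a partial result. The sketch below simulates that with plain lists; the class name and the `failAfterStep` parameter are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the cash transfer function without transactions.
class CashTransferSketch {
    // Stand-in for the BalanceUpdate topic.
    final List<String> balanceUpdates = new ArrayList<>();
    // Whether the transfer request was acknowledged on the Transfer topic.
    boolean acked = false;

    /**
     * failAfterStep: 0 = no failure, 1 = crash after the debit,
     * 2 = crash after the credit but before the ack.
     */
    void process(String from, String to, int amount, int failAfterStep) {
        balanceUpdates.add("user:" + from + ", debit($" + amount + ")");
        if (failAfterStep == 1) {
            return; // debit is published, credit is not: money is lost
        }
        balanceUpdates.add("user:" + to + ", credit($" + amount + ")");
        if (failAfterStep == 2) {
            return; // both updates published, ack missing: transfer will be redelivered
        }
        acked = true;
    }
}
```

Calling `process("alice", "bob", 10, 2)` and then `process("alice", "bob", 10, 0)` for the redelivered request yields four balance updates for a single transfer, which is the reprocessing case. A crash at step 1 leaves only the debit.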
  47. 47. Pulsar Transaction Explained
  48. 48. Transaction Semantics: • Atomic writes across multiple partitions • Atomic acknowledgements across multiple subscriptions • All the actions made within one transaction either all succeed or all fail • Consumers are *ONLY* allowed to read committed messages
  49. 49. Without Transaction API
      Message<String> message = inputConsumer.receive();
      CompletableFuture<MessageId> sendFuture1 = producer1.newMessage().value("output-message-1").sendAsync();
      CompletableFuture<MessageId> sendFuture2 = producer2.newMessage().value("output-message-2").sendAsync();
      inputConsumer.acknowledgeAsync(message.getMessageId());
  50. 50. Broker-0 Broker-1 InputTopic OutputTopic-1 OutputTopic-2 Cursor Data Log Data Log Pulsar Client Input Consumer Producer 1 Producer 2 0) Receive Message 1) Produce Messages 2) Ack Messages
  51. 51. Transaction API
      Message<String> message = inputConsumer.receive();
      Transaction txn = client.newTransaction().withTransactionTimeout(…).build().get();
      CompletableFuture<MessageId> sendFuture1 = producer1.newMessage(txn).value("output-message-1").sendAsync();
      CompletableFuture<MessageId> sendFuture2 = producer2.newMessage(txn).value("output-message-2").sendAsync();
      inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
      txn.commit().get();
      MessageId msgId1 = sendFuture1.get();
      MessageId msgId2 = sendFuture2.get();
  52. 52. Architecture diagram: Coordinator, Broker-0, Broker-1; InputTopic, OutputTopic-1, OutputTopic-2; Cursor, Transaction Log, two Data Log / Txn Buffer pairs; Pulsar Client: Input Consumer, Producer 1, Producer 2
  53. 53. Architecture diagram (same components as slide 52)
  54. 54. Transaction Coordinator (TC): • TC is the transaction manager; it coordinates committing and aborting transactions • State is kept in memory and backed by a Transaction Log • The Transaction Log is powered by a partitioned Pulsar topic: `pulsar/system/__transaction_coordinator_log` • Locating a TC means locating a partition of the transaction log topic
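The "in memory plus transaction log" design on slide 54 can be illustrated with a small model in which every status change is appended to a log before the in-memory map is updated, so that a restarted coordinator can rebuild its state by replay. This is only a sketch: `CoordinatorSketch`, its `OPEN` status and the list-backed log are assumptions made for illustration, standing in for one partition of the transaction log topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a transaction coordinator backed by an append-only log.
class CoordinatorSketch {
    enum Status { OPEN, COMMITTING, COMMITTED }

    static final class LogEntry {
        final long txnId;
        final Status status;
        LogEntry(long txnId, Status status) {
            this.txnId = txnId;
            this.status = status;
        }
    }

    private final List<LogEntry> txnLog;
    private final Map<Long, Status> statusByTxn = new HashMap<>();
    private long nextTxnId = 0;

    CoordinatorSketch(List<LogEntry> txnLog) {
        this.txnLog = txnLog;
        // Recovery: rebuild the in-memory state by replaying the log.
        for (LogEntry entry : txnLog) {
            statusByTxn.put(entry.txnId, entry.status);
            nextTxnId = Math.max(nextTxnId, entry.txnId + 1);
        }
    }

    long newTxn() {
        long id = nextTxnId++;
        record(id, Status.OPEN);
        return id;
    }

    void commit(long txnId) {
        if (statusByTxn.get(txnId) != Status.OPEN) {
            throw new IllegalStateException("txn " + txnId + " is not open");
        }
        record(txnId, Status.COMMITTING);
        // A real coordinator would now ask the participating topics and
        // subscriptions to commit; that part is omitted here.
        record(txnId, Status.COMMITTED);
    }

    Status status(long txnId) {
        return statusByTxn.get(txnId);
    }

    private void record(long txnId, Status status) {
        txnLog.add(new LogEntry(txnId, status)); // log first
        statusByTxn.put(txnId, status);          // then update memory
    }
}
```

Constructing a second `CoordinatorSketch` over the same list models a coordinator restart: it sees the committed status without any state having been handed over in memory.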
  55. 55. Architecture diagram (same components as slide 52)
  56. 56. Transaction Buffer (TB): • TB stores and indexes transaction data per topic partition • TB is implemented using another ML (managed ledger) as the TB log • Messages are appended to the TB log • The Transaction Index is maintained in memory and snapshotted to ledgers • The Transaction Index can be replayed from the TB log
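A toy model may help make the TB log and the transaction index concrete. In the sketch below the log is a list, the index maps a transaction id to positions in that list, and readers only see payloads of committed transactions. All names (`TxnBufferSketch`, `readCommitted`, `rebuildIndex`) are hypothetical, and snapshotting is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of a transaction buffer; not Pulsar's implementation.
class TxnBufferSketch {
    static final class Entry {
        final String txnId;
        final String payload;
        Entry(String txnId, String payload) {
            this.txnId = txnId;
            this.payload = payload;
        }
    }

    // TB log: every transactional message in arrival order.
    private final List<Entry> tbLog = new ArrayList<>();
    // Transaction index: txn id -> positions of its messages in the TB log.
    private final Map<String, List<Integer>> index = new HashMap<>();
    private final Set<String> committed = new HashSet<>();

    void append(String txnId, String payload) {
        tbLog.add(new Entry(txnId, payload));
        index.computeIfAbsent(txnId, k -> new ArrayList<>()).add(tbLog.size() - 1);
    }

    void commit(String txnId) {
        committed.add(txnId);
    }

    void abort(String txnId) {
        index.remove(txnId); // its messages never become visible
    }

    /** Only messages of committed transactions are readable. */
    List<String> readCommitted() {
        List<String> visible = new ArrayList<>();
        for (Entry entry : tbLog) {
            if (committed.contains(entry.txnId)) {
                visible.add(entry.payload);
            }
        }
        return visible;
    }

    /** Replays the TB log to rebuild the txn id -> positions index. */
    Map<String, List<Integer>> rebuildIndex() {
        Map<String, List<Integer>> rebuilt = new HashMap<>();
        for (int i = 0; i < tbLog.size(); i++) {
            rebuilt.computeIfAbsent(tbLog.get(i).txnId, k -> new ArrayList<>()).add(i);
        }
        return rebuilt;
    }
}
```

In this simplified version the replay covers every logged entry, including those of aborted transactions, because abort markers are not written to the log.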
  57. 57. Architecture diagram (same components as slide 52)
  58. 58. Transactional Subscription State: • Introduce an ACK_PENDING state • Add a response for acknowledgements, aka ack-on-ack • Ack state is updated in the cursor ledger • Ack state can be replayed from the cursor ledger
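The ACK_PENDING state can be pictured as a small per-message state machine: a transactional ack moves a message to ACK_PENDING, commit makes the ack final, and abort returns the message to its unacknowledged state. The sketch below is an assumption-laden illustration; the class, the `UNACKED` and `ACKED` state names, and the rule that a second transaction cannot ack a pending message are not taken from the slides.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical model of transactional ack state on one subscription.
class TxnSubscriptionSketch {
    enum AckState { UNACKED, ACK_PENDING, ACKED }

    private final Map<String, AckState> state = new HashMap<>();
    // Message id -> transaction that currently holds its pending ack.
    private final Map<String, String> pendingOwner = new HashMap<>();

    void deliver(String messageId) {
        state.put(messageId, AckState.UNACKED);
    }

    /** Returns the ack response: true if accepted, false if the message is not ackable. */
    boolean ackWithTxn(String messageId, String txnId) {
        if (state.get(messageId) != AckState.UNACKED) {
            return false;
        }
        state.put(messageId, AckState.ACK_PENDING);
        pendingOwner.put(messageId, txnId);
        return true;
    }

    void commit(String txnId) {
        resolve(txnId, AckState.ACKED);
    }

    void abort(String txnId) {
        resolve(txnId, AckState.UNACKED);
    }

    AckState stateOf(String messageId) {
        return state.get(messageId);
    }

    // Moves every pending ack owned by txnId to the given final state.
    private void resolve(String txnId, AckState result) {
        Iterator<Map.Entry<String, String>> it = pendingOwner.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            if (entry.getValue().equals(txnId)) {
                state.put(entry.getKey(), result);
                it.remove();
            }
        }
    }
}
```

The boolean returned by `ackWithTxn` plays the role of the ack response ("ack-on-ack"): the client learns whether its transactional ack was accepted instead of firing and forgetting.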
  59. 59. Transaction Execution Flow
  60. 60. Transaction API - New Transaction (code as on slide 51; step in focus):
      Transaction txn = client.newTransaction().withTransactionTimeout(…).build().get();
  61. 61. Architecture diagram (components as on slide 52). 1. New Txn. Entries shown: Tx1
  62. 62. Transaction API - Produce Messages (code as on slide 51; step in focus):
      CompletableFuture<MessageId> sendFuture1 = producer1.newMessage(txn).value("output-message-1").sendAsync();
      CompletableFuture<MessageId> sendFuture2 = producer2.newMessage(txn).value("output-message-2").sendAsync();
  63. 63. Architecture diagram (components as on slide 52). 2.0 Add Produced Topics To Txn; 2.1 Produced Messages To Topics with Txn. Entries shown: Tx1; Tx1: add [T1, T2]; Tx1: M1; Tx1: M2
  64. 64. Transaction API - Acknowledges (code as on slide 51; step in focus):
      inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
  65. 65. Architecture diagram (components as on slide 52). 3.0 Add Acked Subscriptions To Txn; 3.0 Ack messages with Txn. Entries shown: Tx1; Tx1: add [T1, T2]; Tx1: M1; Tx1: M2; Tx1: ACK (M0); Tx1: add [S0]
  66. 66. Transaction API - Commit (code as on slide 51; step in focus):
      txn.commit().get();
  67. 67. Architecture diagram (components as on slide 52). 4.0 Commit Txn; 4.0 Committing Txn. Entries shown: Tx1; Tx1: add [T1, T2]; Tx1: M1; Tx1: M2; Tx1: ACK (M0); Tx1: add [S0]; Tx1: Committing
  68. 68. Architecture diagram (components as on slide 52). 4.1.0 Commit Txn On Topics; 4.1.1 Commit Txn On Subscriptions. Entries shown: Tx1; Tx1: add [T1, T2]; Tx1: M1; Tx1: M2; Tx1: ACK (M0); Tx1: add [S0]; Tx1: Committing; Tx1 (c); Tx1 (c); Tx1: Committed; Tx1: Committed
  69. 69. Architecture diagram (components as on slide 52). 4.2 Committed Txn. Entries shown: Tx1; Tx1: add [T1, T2]; Tx1: M1; Tx1: M2; Tx1: ACK (M0); Tx1: add [S0]; Tx1: Committing; Tx1 (c); Tx1 (c); Tx1: Committed; Tx1: Committed
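The numbered steps on the preceding diagram slides (new transaction, add produced topics, add acked subscriptions, committing, commit on topics and subscriptions, committed) can be traced with a short simulation. Everything here is a hypothetical model: `CommitFlowSketch` only records strings in the order the slides suggest, and the "New" log entry is an invented label for step 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical trace of the transaction execution flow; not Pulsar code.
class CommitFlowSketch {
    final String txnId;
    // Stand-in for the coordinator's transaction log.
    final List<String> transactionLog = new ArrayList<>();
    final List<String> topics = new ArrayList<>();
    final List<String> subscriptions = new ArrayList<>();
    // Participants that were told to commit, in notification order.
    final List<String> committedOn = new ArrayList<>();

    CommitFlowSketch(String txnId) {
        this.txnId = txnId;
        transactionLog.add(txnId + ": New"); // step 1
    }

    void addProducedTopics(String... names) { // step 2.0
        topics.addAll(Arrays.asList(names));
        transactionLog.add(txnId + ": add " + Arrays.toString(names));
    }

    void addAckedSubscriptions(String... names) { // step 3.0
        subscriptions.addAll(Arrays.asList(names));
        transactionLog.add(txnId + ": add " + Arrays.toString(names));
    }

    void commit() {
        transactionLog.add(txnId + ": Committing"); // step 4.0
        for (String topic : topics) { // step 4.1.0
            committedOn.add("topic " + topic);
        }
        for (String subscription : subscriptions) { // step 4.1.1
            committedOn.add("subscription " + subscription);
        }
        transactionLog.add(txnId + ": Committed"); // step 4.2
    }
}
```

Logging "Committing" before any participant is contacted is what would let a restarted coordinator finish the commit instead of leaving some topics committed and others not.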
  70. 70. Transaction API - Async Example
      inputConsumer.receiveAsync().thenCompose(message -> {
          return client.newTransaction().withTransactionTimeout(…).build().thenCompose(txn -> {
              producer1.newMessage(txn).value("output-message-1").sendAsync();
              producer2.newMessage(txn).value("output-message-2").sendAsync();
              inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
              return txn.commit();
          });
      });
  71. 71. PulsarCash (diagram): (100,0,0): transfer($10, alice -> bob); Cash Transfer function; balance updates: user:alice, debit($10) and user:bob, credit($10); Balance; Ack: (100,0,0)
  72. 72. PulsarCash (same diagram as slide 71, with the updates and the ack wrapped in a Transaction)
  73. 73. Pulsar Transaction: Make Event Streaming easy, simple, and reliable for everyone
  74. 74. When is it available? Available to use in Pulsar 2.6.0
  75. 75. Roadmap: • Transaction support in other languages (e.g. C++, Go) • Transaction in Pulsar Functions & Pulsar IO • Transaction in Kafka-on-Pulsar (KOP) • Transaction for Flink / Spark jobs • Transaction for State storage in Pulsar Functions • …
  76. 76. Credits: • Ivan Kelly • Matteo Merli • Jia Zhai • Penghui Li • Marvin Cai • Yong Zhang • … and many other Pulsar users & contributors
  77. 77. WeChat Subscription: ApachePulsar
      Mailing Lists: dev@pulsar.apache.org, users@pulsar.apache.org
      Slack: https://apache-pulsar.slack.com (#china), register: https://apache-pulsar.herokuapp.com/
      https://github.com/apache/pulsar
      https://github.com/apache/bookkeeper
  78. 78. Thanks! Penghui Li Yong Zhang
