SlideShare a Scribd company logo
1 of 37
Download to read offline
Matteo Merli
Guaranteed “effectively-once” messaging semantic
What is Apache Pulsar?
• Distributed pub/sub messaging
• Backed by a scalable log store — Apache BookKeeper
• Streaming & Queuing
• Low latency
• Multi-tenant
• Geo-Replication
2
Architecture view
• Separate layers
between brokers
bookies
• Broker and bookies
can be added
independently
• Traffic can be shifted
very quickly across
brokers
• New bookies will ramp
up on traffic quickly
3
Pulsar Broker 1 Pulsar Broker 2 Pulsar Broker 3
Bookie 1 Bookie 2 Bookie 3 Bookie 4 Bookie 5
Apache BookKeeper
Apache Pulsar
Producer Consumer
Messaging model
4
Messaging semantics
At most once
At least once
Exactly once
5
“Exactly once”
• There is no agreement in industry on what it really means
• Any vendor has claimed exactly once at some point
• Many caveats… “only if there are no crashes…”
• No formal definition of exactly once — unlike “consensus” or “atomic
broadcast”
6
“Effectively once”
• Identify and discard duplicated messages with 100% accuracy
• In presence of any kind of failures
• Messages can be received and processed more than once
• …but effects on the resulting state will be observed only once
7
What can fail?
8
What can fail?
9
What can fail?
10
What can fail?
11
What can fail? — Geo-Replication
12
Breaking the problem
1. Store the message once — ”producer idempotency”
2. Allow applications to “process data only-once”
13
Idempotent producer
• Pulsar broker detects and discards messages that are being retransmitted
• It works when a broker crashes and topic is reassigned
• It works when a producer application crashes
14
Identifying producers
• Use “sequence ids” to detect retransmissions
• Each producer on a topic has it own sequence of messages
• Use “producer-name” to identify producers
15
Detecting duplicates
16
Detecting duplicates
17
Detecting duplicates
18
Detecting duplicates
19
Sequence Id snapshot
20
Sequence Id snapshot
21
Sequence Id snapshot
• Snapshots are taken every N entries to limit recovery time
• Snapshot & cursor updates are atomic
• Cursor updates are stored in BookKeeper — durable & replicated
• On recovery
• Load the snapshot from the cursor
• Replay the entries from the cursor position
22
What if application producer crashes?
• Pulsar needs to identify the new producer as being the same “logical”
producer as before
• In practice, this is only useful if you have a “replayable” source (eg: file,
stream, …)
23
Resuming a producer session
ProducerConfiguration conf = new ProducerConfiguration();
conf.setProducerName("my-producer-name");
conf.setSendTimeout(0, TimeUnit.SECONDS);
Producer producer = client.createProducer(MY_TOPIC, conf);
// Get last committed sequence id before crash
long lastSequenceId = producer.getLastSequenceId();
24
Using sequence Ids
// Fictitious record reader class
RecordReader source = new RecordReader("/my/file/path");
long fileOffset = producer.getLastSequenceId();
source.seekToOffset(fileOffset);
while (source.hasNext()) {
long currentOffset = source.currentOffset();
Message msg = MessageBuilder.create()
.setSequenceId(currentOffset)
.setContent(source.next()).build();
producer.send(msg);
}
25
Consuming messages only once
• Pulsar Consumer API is very convenient
• Managed subscription — tracking individual messages
Consumer consumer = client.subscribe(MY_TOPIC, MY_SUBSCRIPTION_NAME);
while (true) {
Message msg = consumer.receive();
// Process the message...
consumer.acknowledge(msg);
}
26
Effectively-once with Consumer
• Consumer is very simple but doesn’t allow a large degree of control
• Processing and acknowledge are not atomic
• To achieve “effectively once” we need to rely on an external system to
deduplicate the processing results. Eg:
• RDBMS — Keep the message id as a column with a “unique” index
• Critical write to update the state — compareAndSet() or similar
27
Pulsar Reader
• Reader is a low level API to receive data from a Pulsar topic
• There is no managed subscription
• Application always specifies the message id where it wants to start reading
from
28
Reader example
MessageId lastMessageId = recoverLastMessageIdFromDB();
Reader reader = client.createReader(MY_TOPIC, lastMessageId,
new ReaderConfiguration());
while (true) {
Message msg = reader.readNext();
byte[] msgId = msg.getMessageId().toByteArray();
// Process the message and store msgId atomically
}
29
Example — Pulsar Functions
30
Pulsar Functions
• A function gets messages from 1 or more topics
• An instance of the function is invoked to process the event
• The output of the function is published on 1 or more topics
• Super simple to use — No SDK required — Python example:
def process(input):
return input + '!'
31
Pulsar Functions
32
Effectively once with functions
• Use the message id from source topic as sequence id for sink topic
• Works with “Consumer” API
• When consuming from multiple topics or partitions, creates 1 producer per
each source topic/partition, to ensure monotonic sequence ids
33
Performance
• Pulsar approach guarantees deduplication in all failure scenarios
• Overhead is minimal: 2 in memory hashmap updates
• No reduction in throughput — No increased latency
• Controllable increase in recovery time
34
Performance — Benchmark
OpenMessaging
Benchmark
1 Topic / 1 Partition
1 Partition / 1
Consumer
1Kb msg
35
Difference with Kafka approach
36
Kafka Pulsar
Producer Idempotency Best-effort (in memory only) Guaranteed after crash
Transactions 2 phase commit No transactions
Dedup across producer
sessions
No Yes
Dedup with geo-
replication
No Yes
Throughput
Lower (1 in-flight message/batch for
ordering)
Equal
Curious to Learn More?
• Apache Pulsar — https://pulsar.incubator.apache.org
• Follow Us — @apache_pulsar
• Streamlio blog — https://streaml.io/blog
37

More Related Content

What's hot

Using or not using magic onion
Using or not using magic onionUsing or not using magic onion
Using or not using magic onionGoichi Shinohara
 
Free GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsFree GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsWeaveworks
 
Getting started with Apache Camel presentation at BarcelonaJUG, january 2014
Getting started with Apache Camel presentation at BarcelonaJUG, january 2014Getting started with Apache Camel presentation at BarcelonaJUG, january 2014
Getting started with Apache Camel presentation at BarcelonaJUG, january 2014Claus Ibsen
 
Kubernetes Networking - Sreenivas Makam - Google - CC18
Kubernetes Networking - Sreenivas Makam - Google - CC18Kubernetes Networking - Sreenivas Makam - Google - CC18
Kubernetes Networking - Sreenivas Makam - Google - CC18CodeOps Technologies LLP
 
CI/CD (DevOps) 101
CI/CD (DevOps) 101CI/CD (DevOps) 101
CI/CD (DevOps) 101Hazzim Anaya
 
Quic을 이용한 네트워크 성능 개선
 Quic을 이용한 네트워크 성능 개선 Quic을 이용한 네트워크 성능 개선
Quic을 이용한 네트워크 성능 개선NAVER D2
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
 
Accelerating Ceph with RDMA and NVMe-oF
Accelerating Ceph with RDMA and NVMe-oFAccelerating Ceph with RDMA and NVMe-oF
Accelerating Ceph with RDMA and NVMe-oFinside-BigData.com
 
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDBHow to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDBInfluxData
 
Introduction to AMQP Messaging with RabbitMQ
Introduction to AMQP Messaging with RabbitMQIntroduction to AMQP Messaging with RabbitMQ
Introduction to AMQP Messaging with RabbitMQDmitriy Samovskiy
 
Architecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash WorkshopArchitecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash WorkshopSudhir Tonse
 
ContainerDays Hamburg 2023 — Cilium Workshop.pdf
ContainerDays Hamburg 2023 — Cilium Workshop.pdfContainerDays Hamburg 2023 — Cilium Workshop.pdf
ContainerDays Hamburg 2023 — Cilium Workshop.pdfRaphaël PINSON
 
Cloud Native Bern 05.2023 — Zero Trust Visibility
Cloud Native Bern 05.2023 — Zero Trust VisibilityCloud Native Bern 05.2023 — Zero Trust Visibility
Cloud Native Bern 05.2023 — Zero Trust VisibilityRaphaël PINSON
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonIgor Anishchenko
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Low Code Integration with Apache Camel.pdf
Low Code Integration with Apache Camel.pdfLow Code Integration with Apache Camel.pdf
Low Code Integration with Apache Camel.pdfClaus Ibsen
 
SIP Attack Handling (Kamailio World 2021)
SIP Attack Handling (Kamailio World 2021)SIP Attack Handling (Kamailio World 2021)
SIP Attack Handling (Kamailio World 2021)Fred Posner
 

What's hot (20)

Using or not using magic onion
Using or not using magic onionUsing or not using magic onion
Using or not using magic onion
 
Free GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsFree GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOps
 
Getting started with Apache Camel presentation at BarcelonaJUG, january 2014
Getting started with Apache Camel presentation at BarcelonaJUG, january 2014Getting started with Apache Camel presentation at BarcelonaJUG, january 2014
Getting started with Apache Camel presentation at BarcelonaJUG, january 2014
 
Kubernetes Networking - Sreenivas Makam - Google - CC18
Kubernetes Networking - Sreenivas Makam - Google - CC18Kubernetes Networking - Sreenivas Makam - Google - CC18
Kubernetes Networking - Sreenivas Makam - Google - CC18
 
CI/CD (DevOps) 101
CI/CD (DevOps) 101CI/CD (DevOps) 101
CI/CD (DevOps) 101
 
Quic을 이용한 네트워크 성능 개선
 Quic을 이용한 네트워크 성능 개선 Quic을 이용한 네트워크 성능 개선
Quic을 이용한 네트워크 성능 개선
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Accelerating Ceph with RDMA and NVMe-oF
Accelerating Ceph with RDMA and NVMe-oFAccelerating Ceph with RDMA and NVMe-oF
Accelerating Ceph with RDMA and NVMe-oF
 
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDBHow to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
 
Introduction to AMQP Messaging with RabbitMQ
Introduction to AMQP Messaging with RabbitMQIntroduction to AMQP Messaging with RabbitMQ
Introduction to AMQP Messaging with RabbitMQ
 
Docker swarm
Docker swarmDocker swarm
Docker swarm
 
Architecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash WorkshopArchitecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash Workshop
 
ContainerDays Hamburg 2023 — Cilium Workshop.pdf
ContainerDays Hamburg 2023 — Cilium Workshop.pdfContainerDays Hamburg 2023 — Cilium Workshop.pdf
ContainerDays Hamburg 2023 — Cilium Workshop.pdf
 
Cloud Native Bern 05.2023 — Zero Trust Visibility
Cloud Native Bern 05.2023 — Zero Trust VisibilityCloud Native Bern 05.2023 — Zero Trust Visibility
Cloud Native Bern 05.2023 — Zero Trust Visibility
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Low Code Integration with Apache Camel.pdf
Low Code Integration with Apache Camel.pdfLow Code Integration with Apache Camel.pdf
Low Code Integration with Apache Camel.pdf
 
SIP Attack Handling (Kamailio World 2021)
SIP Attack Handling (Kamailio World 2021)SIP Attack Handling (Kamailio World 2021)
SIP Attack Handling (Kamailio World 2021)
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 
"DevOps > CI+CD "
"DevOps > CI+CD ""DevOps > CI+CD "
"DevOps > CI+CD "
 

Similar to Effectively-once semantics in Apache Pulsar

Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupSnehal Nagmote
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Apache Kafka Reliability
Apache Kafka Reliability Apache Kafka Reliability
Apache Kafka Reliability Jeff Holoman
 
Introduction to Akka - Atlanta Java Users Group
Introduction to Akka - Atlanta Java Users GroupIntroduction to Akka - Atlanta Java Users Group
Introduction to Akka - Atlanta Java Users GroupRoy Russo
 
TDC2016POA | Trilha Arquitetura - Apache Kafka: uma introdução a logs distrib...
TDC2016POA | Trilha Arquitetura - Apache Kafka: uma introdução a logs distrib...TDC2016POA | Trilha Arquitetura - Apache Kafka: uma introdução a logs distrib...
TDC2016POA | Trilha Arquitetura - Apache Kafka: uma introdução a logs distrib...tdc-globalcode
 
Treasure Data Summer Internship 2016
Treasure Data Summer Internship 2016Treasure Data Summer Internship 2016
Treasure Data Summer Internship 2016Yuta Iwama
 
Ruby Microservices with RabbitMQ
Ruby Microservices with RabbitMQRuby Microservices with RabbitMQ
Ruby Microservices with RabbitMQZoran Majstorovic
 
Linked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarLinked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarKarthik Ramasamy
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Christopher Curtin
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...confluent
 
Kafka zero to hero
Kafka zero to heroKafka zero to hero
Kafka zero to heroAvi Levi
 
Apache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka TLV
 
Apache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupApache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupKarthik Ramasamy
 
Exactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache KafkaExactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache Kafkaconfluent
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producerconfluent
 
Apache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know AboutApache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know AboutYaroslav Tkachenko
 
Injection on Steroids: Codeless code injection and 0-day techniques
Injection on Steroids: Codeless code injection and 0-day techniquesInjection on Steroids: Codeless code injection and 0-day techniques
Injection on Steroids: Codeless code injection and 0-day techniquesenSilo
 
MULTI-THREADING in python appalication.pptx
MULTI-THREADING in python appalication.pptxMULTI-THREADING in python appalication.pptx
MULTI-THREADING in python appalication.pptxSaiDhanushM
 

Similar to Effectively-once semantics in Apache Pulsar (20)

Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
 
Kafka reliability velocity 17
Kafka reliability   velocity 17Kafka reliability   velocity 17
Kafka reliability velocity 17
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Apache Kafka Reliability
Apache Kafka Reliability Apache Kafka Reliability
Apache Kafka Reliability
 
Introduction to Akka - Atlanta Java Users Group
Introduction to Akka - Atlanta Java Users GroupIntroduction to Akka - Atlanta Java Users Group
Introduction to Akka - Atlanta Java Users Group
 
TDC2016POA | Trilha Arquitetura - Apache Kafka: uma introdução a logs distrib...
TDC2016POA | Trilha Arquitetura - Apache Kafka: uma introdução a logs distrib...TDC2016POA | Trilha Arquitetura - Apache Kafka: uma introdução a logs distrib...
TDC2016POA | Trilha Arquitetura - Apache Kafka: uma introdução a logs distrib...
 
Treasure Data Summer Internship 2016
Treasure Data Summer Internship 2016Treasure Data Summer Internship 2016
Treasure Data Summer Internship 2016
 
Ruby Microservices with RabbitMQ
Ruby Microservices with RabbitMQRuby Microservices with RabbitMQ
Ruby Microservices with RabbitMQ
 
Linked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarLinked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache Pulsar
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
Kafka zero to hero
Kafka zero to heroKafka zero to hero
Kafka zero to hero
 
Apache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka - From zero to hero
Apache Kafka - From zero to hero
 
Apache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupApache Pulsar Seattle - Meetup
Apache Pulsar Seattle - Meetup
 
Exactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache KafkaExactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache Kafka
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
Apache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know AboutApache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know About
 
Injection on Steroids: Codeless code injection and 0-day techniques
Injection on Steroids: Codeless code injection and 0-day techniquesInjection on Steroids: Codeless code injection and 0-day techniques
Injection on Steroids: Codeless code injection and 0-day techniques
 
MULTI-THREADING in python appalication.pptx
MULTI-THREADING in python appalication.pptxMULTI-THREADING in python appalication.pptx
MULTI-THREADING in python appalication.pptx
 

Recently uploaded

Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 

Recently uploaded (20)

Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 

Effectively-once semantics in Apache Pulsar

  • 2. What is Apache Pulsar? • Distributed pub/sub messaging • Backed by a scalable log store — Apache BookKeeper • Streaming & Queuing • Low latency • Multi-tenant • Geo-Replication 2
  • 3. Architecture view • Separate layers between brokers bookies • Broker and bookies can be added independently • Traffic can be shifted very quickly across brokers • New bookies will ramp up on traffic quickly 3 Pulsar Broker 1 Pulsar Broker 2 Pulsar Broker 3 Bookie 1 Bookie 2 Bookie 3 Bookie 4 Bookie 5 Apache BookKeeper Apache Pulsar Producer Consumer
  • 5. Messaging semantics At most once At least once Exactly once 5
  • 6. “Exactly once” • There is no agreement in industry on what it really means • Any vendor has claimed exactly once at some point • Many caveats… “only if there are no crashes…” • No formal definition of exactly once — unlike “consensus” or “atomic broadcast” 6
  • 7. “Effectively once” • Identify and discard duplicated messages with 100% accuracy • In presence of any kind of failures • Messages can be received and processed more than once • …but effects on the resulting state will be observed only once 7
  • 12. What can fail? — Geo-Replication 12
  • 13. Breaking the problem 1. Store the message once — ”producer idempotency” 2. Allow applications to “process data only-once” 13
  • 14. Idempotent producer • Pulsar broker detects and discards messages that are being retransmitted • It works when a broker crashes and topic is reassigned • It works when a producer application crashes 14
  • 15. Identifying producers • Use “sequence ids” to detect retransmissions • Each producer on a topic has it own sequence of messages • Use “producer-name” to identify producers 15
  • 22. Sequence Id snapshot • Snapshots are taken every N entries to limit recovery time • Snapshot & cursor updates are atomic • Cursor updates are stored in BookKeeper — durable & replicated • On recovery • Load the snapshot from the cursor • Replay the entries from the cursor position 22
  • 23. What if application producer crashes? • Pulsar needs to identify the new producer as being the same “logical” producer as before • In practice, this is only useful if you have a “replayable” source (eg: file, stream, …) 23
  • 24. Resuming a producer session ProducerConfiguration conf = new ProducerConfiguration(); conf.setProducerName("my-producer-name"); conf.setSendTimeout(0, TimeUnit.SECONDS); Producer producer = client.createProducer(MY_TOPIC, conf); // Get last committed sequence id before crash long lastSequenceId = producer.getLastSequenceId(); 24
  • 25. Using sequence Ids // Fictitious record reader class RecordReader source = new RecordReader("/my/file/path"); long fileOffset = producer.getLastSequenceId(); source.seekToOffset(fileOffset); while (source.hasNext()) { long currentOffset = source.currentOffset(); Message msg = MessageBuilder.create() .setSequenceId(currentOffset) .setContent(source.next()).build(); producer.send(msg); } 25
  • 26. Consuming messages only once • Pulsar Consumer API is very convenient • Managed subscription — tracking individual messages Consumer consumer = client.subscribe(MY_TOPIC, MY_SUBSCRIPTION_NAME); while (true) { Message msg = consumer.receive(); // Process the message... consumer.acknowledge(msg); } 26
  • 27. Effectively-once with Consumer • Consumer is very simple but doesn’t allow a large degree of control • Processing and acknowledge are not atomic • To achieve “effectively once” we need to rely on an external system to deduplicate the processing results. Eg: • RDBMS — Keep the message id as a column with a “unique” index • Critical write to update the state — compareAndSet() or similar 27
  • 28. Pulsar Reader • Reader is a low level API to receive data from a Pulsar topic • There is no managed subscription • Application always specifies the message id where it wants to start reading from 28
  • 29. Reader example MessageId lastMessageId = recoverLastMessageIdFromDB(); Reader reader = client.createReader(MY_TOPIC, lastMessageId, new ReaderConfiguration()); while (true) { Message msg = reader.readNext(); byte[] msgId = msg.getMessageId().toByteArray(); // Process the message and store msgId atomically } 29
  • 30. Example — Pulsar Functions 30
  • 31. Pulsar Functions • A function gets messages from 1 or more topics • An instance of the function is invoked to process the event • The output of the function is published on 1 or more topics • Super simple to use — No SDK required — Python example: def process(input): return input + '!' 31
  • 33. Effectively once with functions • Use the message id from source topic as sequence id for sink topic • Works with “Consumer” API • When consuming from multiple topics or partitions, creates 1 producer per each source topic/partition, to ensure monotonic sequence ids 33
  • 34. Performance • Pulsar approach guarantees deduplication in all failure scenarios • Overhead is minimal: 2 in memory hashmap updates • No reduction in throughput — No increased latency • Controllable increase in recovery time 34
  • 35. Performance — Benchmark OpenMessaging Benchmark 1 Topic / 1 Partition 1 Partition / 1 Consumer 1Kb msg 35
  • 36. Difference with Kafka approach 36 Kafka Pulsar Producer Idempotency Best-effort (in memory only) Guaranteed after crash Transactions 2 phase commit No transactions Dedup across producer sessions No Yes Dedup with geo- replication No Yes Throughput Lower (1 in-flight message/batch for ordering) Equal
  • 37. Curious to Learn More? • Apache Pulsar — https://pulsar.incubator.apache.org • Follow Us — @apache_pulsar • Streamlio blog — https://streaml.io/blog 37