SlideShare a Scribd company logo
1 of 22
Download to read offline
Infinite* Topic Backlogs
with Apache Pulsar
Ivan Kelly
@ivankelly
Peter Norvig - Google
We don’t have better algorithms than anyone else; we
just have more data
An example system
Stream
Processing
Engine
Model
User
events
Serving
System
Songs user has
listened to
Recommendation algorithm
User listening model
Recommended songs
Stream
Processing
Engine
Partial Model
Live User
events
Serving
System
Historical
User Events
Batch
Processing
Engine
Partial Model
Combined
Model
Stream
Processing
Engine
Live User
events
Serving
System
Partial Model
Historical
User Events
Batch
Processing
Engine
Partial Model
Combined
Model
Model
Apache Pulsar
Messaging System
Pubsub and queuing
Strong guarantees on message persistence
Unlimited topic backlog size
Tiered Storage
Lightweight compute
SQL interface available
Anatomy of a topic backlog
A B C
Broker
Client produces message
Broker acknowledges
message to client
Broker writes message
to topic backlog
D E F G
Topic Backlog
Anatomy of a topic backlog
A B C D E F G
Closed Open
Topic Backlog
A B C
Storage 2
Storage 3
Storage 1
Storage layer
Storage 1 Storage 2 Storage 3
Serving layer
Broker 1 Broker 2
Storage layer
Add nodes to add storage
Topic Backlog
Mirroring
1/n = 0.5 space efficiency
Tolerates n - 1 = 1 failures
Gets worse with more replicas
Striping with parity
1 - 1/n = 0.833 space efficiency
Tolerates 1 failure
Space efficiency increases with nParity
Tiered Storage
Broker
Client writes
messages
Storage nodes
Long-term storage
Client reads
messages
1. Pulsar topic metadata updated with potential new location
2. All messages from segment copied to data object in long-term storage
3. Copying process tracks message offsets at defined interval
4. Offsets written to index object in long-term storage
5. Pulsar topic metadata updated with offload completed status
6. Original segment deleted from Pulsar storage after grace period
Available long-term storage options
Since v2.1.0
Since v2.2.0
Implemented but not merged
Implementing your own
public interface LedgerOffloader {
String getOffloadDriverName();
Map<String, String> getOffloadDriverMetadata();
CompletableFuture<Void> offload(ReadHandle ledger, UUID uid,
Map<String, String> extraMetadata);
CompletableFuture<ReadHandle> readOffloaded(long ledgerId, UUID uid,
Map<String, String> offloadDriverMetadata);
CompletableFuture<Void> deleteOffloaded(long ledgerId, UUID uid,
Map<String, String> offloadDriverMetadata);
}
1. Implement LedgerOffloader interface
2. Drop NAR file in <pulsar-directory>/offloaders
Demo
Bloom
Filter
Function
Twitter topic
Query topic
Result topic
Querying massive backlogs
Pulsar provides SQL interface from v2.2.0
Based on PrestoDB
Table schema from topic schema
Queries data at rest
class ListenEvent {
UUID userId;
String artist;
String title;
String album;
}
Schema schema = Schema.AVRO(ListenEvent.class);
String topic = “user_profiling”;
try (Producer<ListenEvent> p = client.newProducer(schema)
.topic(topic).create()) {
p.send(new ListenEvent(myId, "Metallica",
"Fade to Black", "Ride the lightning"));
p.send(new ListenEvent(myId, "Black Sabbath",
"Sweet Leaf", "Master of Reality"));
p.send(new ListenEvent(myId, "Dio", "Holy Diver", "Holy Diver"));
}
presto> SELECT * from pulsar."public/default".user_profiling WHERE artist = 'Dio';
userId | artist | title | album |
--------------------------------------+--------+------------+------------+
79a6540c-e113-11e8-a275-636165afe980 | Dio | Holy Diver | Holy Diver |
Long-term
storage
Broker
StorageStorageStorageStorageStorageStorage
BrokerBrokerBrokerBroker
Presto
Worker
Presto
Worker
Presto
Worker
Presto
client
Summary
Apache Pulsar
• Unlimited backlog size
• Offload older data to cheaper storage
• Query with an SQL interface
Other uses cases for massive backlogs
• CQRS Event sourcing
• Data marts
• Audit logs
• Security logs
Blog: https://streaml.io/blog
@streamlio @ivankelly

More Related Content

More from J On The Beach

The big data Universe. Literally.
The big data Universe. Literally.The big data Universe. Literally.
The big data Universe. Literally.J On The Beach
 
Streaming to a New Jakarta EE
Streaming to a New Jakarta EEStreaming to a New Jakarta EE
Streaming to a New Jakarta EEJ On The Beach
 
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...J On The Beach
 
Pushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and BlazorPushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and BlazorJ On The Beach
 
Axon Server went RAFTing
Axon Server went RAFTingAxon Server went RAFTing
Axon Server went RAFTingJ On The Beach
 
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...J On The Beach
 
Madaari : Ordering For The Monkeys
Madaari : Ordering For The MonkeysMadaari : Ordering For The Monkeys
Madaari : Ordering For The MonkeysJ On The Beach
 
Servers are doomed to fail
Servers are doomed to failServers are doomed to fail
Servers are doomed to failJ On The Beach
 
Interaction Protocols: It's all about good manners
Interaction Protocols: It's all about good mannersInteraction Protocols: It's all about good manners
Interaction Protocols: It's all about good mannersJ On The Beach
 
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...J On The Beach
 
Leadership at every level
Leadership at every levelLeadership at every level
Leadership at every levelJ On The Beach
 
Machine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind LibrariesMachine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind LibrariesJ On The Beach
 
Getting started with Deep Reinforcement Learning
Getting started with Deep Reinforcement LearningGetting started with Deep Reinforcement Learning
Getting started with Deep Reinforcement LearningJ On The Beach
 
Toward Predictability and Stability At The Edge Of Chaos
Toward Predictability and Stability At The Edge Of ChaosToward Predictability and Stability At The Edge Of Chaos
Toward Predictability and Stability At The Edge Of ChaosJ On The Beach
 
Numeric programming with spire
Numeric programming with spireNumeric programming with spire
Numeric programming with spireJ On The Beach
 
Istio Service Mesh & pragmatic microservices architecture
Istio Service Mesh & pragmatic microservices architectureIstio Service Mesh & pragmatic microservices architecture
Istio Service Mesh & pragmatic microservices architectureJ On The Beach
 
GraalVM: Run Programs Faster Everywhere
GraalVM: Run Programs Faster EverywhereGraalVM: Run Programs Faster Everywhere
GraalVM: Run Programs Faster EverywhereJ On The Beach
 
Continuous delivery in the real world
Continuous delivery in the real world Continuous delivery in the real world
Continuous delivery in the real world J On The Beach
 
Free the Functions with Fn project!
Free the Functions with Fn project!Free the Functions with Fn project!
Free the Functions with Fn project!J On The Beach
 
Leveraging AI for facilitating refugee integration
Leveraging AI for facilitating refugee integrationLeveraging AI for facilitating refugee integration
Leveraging AI for facilitating refugee integrationJ On The Beach
 

More from J On The Beach (20)

The big data Universe. Literally.
The big data Universe. Literally.The big data Universe. Literally.
The big data Universe. Literally.
 
Streaming to a New Jakarta EE
Streaming to a New Jakarta EEStreaming to a New Jakarta EE
Streaming to a New Jakarta EE
 
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
 
Pushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and BlazorPushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and Blazor
 
Axon Server went RAFTing
Axon Server went RAFTingAxon Server went RAFTing
Axon Server went RAFTing
 
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
 
Madaari : Ordering For The Monkeys
Madaari : Ordering For The MonkeysMadaari : Ordering For The Monkeys
Madaari : Ordering For The Monkeys
 
Servers are doomed to fail
Servers are doomed to failServers are doomed to fail
Servers are doomed to fail
 
Interaction Protocols: It's all about good manners
Interaction Protocols: It's all about good mannersInteraction Protocols: It's all about good manners
Interaction Protocols: It's all about good manners
 
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
 
Leadership at every level
Leadership at every levelLeadership at every level
Leadership at every level
 
Machine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind LibrariesMachine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind Libraries
 
Getting started with Deep Reinforcement Learning
Getting started with Deep Reinforcement LearningGetting started with Deep Reinforcement Learning
Getting started with Deep Reinforcement Learning
 
Toward Predictability and Stability At The Edge Of Chaos
Toward Predictability and Stability At The Edge Of ChaosToward Predictability and Stability At The Edge Of Chaos
Toward Predictability and Stability At The Edge Of Chaos
 
Numeric programming with spire
Numeric programming with spireNumeric programming with spire
Numeric programming with spire
 
Istio Service Mesh & pragmatic microservices architecture
Istio Service Mesh & pragmatic microservices architectureIstio Service Mesh & pragmatic microservices architecture
Istio Service Mesh & pragmatic microservices architecture
 
GraalVM: Run Programs Faster Everywhere
GraalVM: Run Programs Faster EverywhereGraalVM: Run Programs Faster Everywhere
GraalVM: Run Programs Faster Everywhere
 
Continuous delivery in the real world
Continuous delivery in the real world Continuous delivery in the real world
Continuous delivery in the real world
 
Free the Functions with Fn project!
Free the Functions with Fn project!Free the Functions with Fn project!
Free the Functions with Fn project!
 
Leveraging AI for facilitating refugee integration
Leveraging AI for facilitating refugee integrationLeveraging AI for facilitating refugee integration
Leveraging AI for facilitating refugee integration
 

Recently uploaded

OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAOpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAShane Coughlan
 
Crafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM IntegrationCrafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM IntegrationWave PLM
 
The Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion ProductionThe Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion ProductionWave PLM
 
Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024
Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024
Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024Primacy Infotech
 
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...naitiksharma1124
 
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Andrea Goulet
 
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1KnowledgeSeed
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAlluxio, Inc.
 
architecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdfarchitecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdfWSO2
 
INGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignINGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignNeo4j
 
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdfStrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdfsteffenkarlsson2
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...rajkumar669520
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAlluxio, Inc.
 
CompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdfCompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdfFurqanuddin10
 
Naer Toolbar Redesign - Usability Research Synthesis
Naer Toolbar Redesign - Usability Research SynthesisNaer Toolbar Redesign - Usability Research Synthesis
Naer Toolbar Redesign - Usability Research Synthesisparimabajra
 
Workforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfWorkforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfDeskTrack
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Andreas Granig
 
how-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfhow-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfMehmet Akar
 
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdfImplementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdfVictor Lopez
 

Recently uploaded (20)

OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAOpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
 
Crafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM IntegrationCrafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM Integration
 
The Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion ProductionThe Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion Production
 
5 Reasons Driving Warehouse Management Systems Demand
5 Reasons Driving Warehouse Management Systems Demand5 Reasons Driving Warehouse Management Systems Demand
5 Reasons Driving Warehouse Management Systems Demand
 
Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024
Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024
Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024
 
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
 
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
 
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
architecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdfarchitecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdf
 
INGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignINGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by Design
 
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdfStrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
 
CompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdfCompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdf
 
Naer Toolbar Redesign - Usability Research Synthesis
Naer Toolbar Redesign - Usability Research SynthesisNaer Toolbar Redesign - Usability Research Synthesis
Naer Toolbar Redesign - Usability Research Synthesis
 
Workforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfWorkforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdf
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
 
how-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfhow-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdf
 
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdfImplementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
 

Infinite Topic Backlogs with Apache Pulsar