SlideShare a Scribd company logo
Apache Pulsar: Next Generation
Cloud-Native Messaging & Streaming
January 2019
Connected World
Increasingly connected world
!3
Internet of Things
30 B connected devices by 2020
Health Care
153 Exabytes (2013) -> 2314 Exabytes
(2020)
Machine Data
40% of digital universe by 2020
Connected Vehicles
Data transferred per vehicle per month
4 TB -> 10 TB
Digital Assistants (Predictive Analytics)
$2B (2012) -> $6.5B (2019) [1]
Siri/Cortana/Google Now
Augmented/Virtual Reality
$150B by 2020 [2]
Oculus/HoloLens/Magic Leap
Ñ
+
>
• Events are analyzed and processed as they arrive
• Decisions are timely, contextual and based on fresh data
• Decision latency is eliminated
• Data in motion
Fast data processing
!4
Ingest/
Buffer
Analyze Act
Elements of stream processing
!5
ComputeMessaging
Storage
Data Ingestion Data Processing
Results StorageData Storage
Data
Serving
Apache Pulsar
!6
Flexible Messaging + Streaming System
backed by durable log storage
Apache Pulsar: 

Messaging + Storage
Apache Pulsar: Tenants, namespaces, topics
!8
Apache Pulsar Cluster
Product Safety
ETL
Fraud
Detection
Topic-1
Account History
Topic-2
User Clustering
Topic-1
Risk Classification
Marketing
Campaigns
ETL
Topic-1
Budgeted Spend
Topic-2
Demographic Classification
Topic-1
Location Resolution
Data
Serving
Microservice
Topic-1
Customer Authentication
Tenants
Namespaces
Apache Pulsar: Topics
!9
Topic
Producers
Consumers
Time
Apache Pulsar: Topic partitions
!10
Topic - P0 Producers
Consumers
Time
Topic - P1
Topic - P2
Apache Pulsar: Segments
!11
Producers
Consumers
Time
Segment 1 Segment 2 Segment 3
Segment 1 Segment 2 Segment 3 Segment 4
Segment 1 Segment 2 Segment 3
P0
P1
P2
Apache Pulsar
!12
Bookie Bookie Bookie
Broker Broker Broker
Producer Consumer
• Layered Architecture
• Independent Scalability

• Fault Tolerance

• Instant Scalability
Apache Pulsar: Segment-centric storage
!13
• Logical Partition

• Partition divided into Segments

• Size-based & Time-based

• Uniformly distributed across the
cluster 

• Broker is the only point of interaction for clients (producers and
consumers)
• Brokers acquire ownership of group of topics and “serve” them
• Broker has no durable state
• Provides service discovery mechanism for client to connect to right
broker
Apache Pulsar: Broker
!14
Apache Pulsar: Broker
!15
Apache Pulsar: Broker failure recovery
!16
• Topic is reassigned to an available
broker based on load
• Can reconstruct the previous state
consistently
• No data needs to be copied
• Failover handled transparently by
client library
Apache Pulsar: Bookie failure recovery
!17
• After a write failure, BookKeeper will
immediately switch write to a new
bookie, within the same segment.
• As long as we have any 3 bookies in
the cluster, we can continue to write
Apache Pulsar: Bookie failure recovery
!18
In background, starts a many-to-
many recovery process to regain
the configured replication factor
Apache Pulsar: Seamless cluster expansion
!19
1234…20212223…40414243…60616263…
Segment 1
Segment 3
Segment 2
Segment 2
Segment 1
Segment 3
Segment 4
Segment 3
Segment 2
Segment 1
Segment 4
Segment 4
Segment Y
Segment Z
Segment X
Apache Pulsar: Tiered storage
!20
Low Cost Storage
1234…20212223…40414243…60616263…
Segment 3
Segment 2Segment 3
Segment 4
Segment 3
Segment 1
Segment 4 Segment 4
Partitions vs segments - why should you care?
!21
Legacy Architectures
● Storage co-resident with processing
● Partition-centric
● Cumbersome to scale--data
redistribution, performance impact
Logical
View
Apache Pulsar
● Storage decoupled from processing
● Partitions stored as segments
● Flexible, easy scalability
Partition
Processing
& Storage
Segment 1 Segment 3Segment 2 Segment n
Partition
Broker
Partition
(primary)
Broker
Partition
(copy)
Broker
Partition
(copy)
Broker Broker Broker
Segment 1
Segment 2
Segment n
.
.
.
Segment 2
Segment 3
Segment n
.
.
.
Segment 3
Segment 1
Segment n
.
.
.
Segment 1
Segment 2
Segment n
.
.
.
Processing
(brokers)
Storage
• In Kafka, partitions are assigned to brokers “permanently”
• A single partition is stored entirely in a single node
• Retention is limited by a single node storage capacity
• Failure recovery and capacity expansion require expensive
“rebalancing”
• Rebalancing has a big impact over the system, affecting regular
traffic
Partitions vs Segments: Why should you care?
!22
Apache Pulsar: Durability
!23
Bookie
Bookie
BookieBrokerProducer
Journal
Journal
Journal
fsync
fsync
fsync
Apache Pulsar: Isolation
!24
Unified messaging model: Streaming
!25
Pulsar topic/
partition
Producer 2
Producer 1
Consumer 1
Consumer 2
Subscription
A
M4
M3
M2
M1
M0
M4
M3
M2
M1
M0
X
Exclusive
Unified messaging model: Streaming
!26
Pulsar topic/
partition
Producer 2
Producer 1
Consumer 1
Consumer 2
Subscription
B
M4
M3
M2
M1
M0
M4
M3
M2
M1
M0
Failover
In case of failure in
consumer 1
Unified messaging model: Queuing
!27
Pulsar topic/
partition
Producer 2
Producer 1
Consumer 2
Consumer 3
Subscription
C
M4
M3
M2
M1
M0
Shared
Traffic is equally distributed
across consumers
Consumer 1
M4M3
M2M1M0
Replication for disaster recovery
!28
Topic (T1) Topic (T1)
Topic (T1)
Subscription
(S1)
Subscription
(S1)
Producer
(P1)
Consumer
(C1)
Producer
(P3)
Producer
(P2)
Consumer
(C2)
Data Center A Data Center B
Data Center C
Integrated in the broker
message flow
Simple configuration to
add/remove regions
Asynchronous (default) and
synchronous replication
Apache Pulsar: Multi-tenancy
!29
Apache Pulsar Cluster
Product
Safety
ETL
Fraud
Detection
Topic-1
Account History
Topic-2
User Clustering
Topic-1
Risk Classification
MarketingCampaigns
ETL
Topic-1
Budgeted Spend
Topic-2
Demographic
Classification
Topic-1
Location Resolution
Data
Serving
Microservice
Topic-1
Customer
Authentication
10 TB
7 TB
5 TB
• Authentication
• Authorization
• Software isolation
• Storage quotas, flow control, back pressure, rate limiting
• Hardware isolation
• Constrain some tenants on a subset of brokers/bookies
Pulsar clients
!30
Apache Pulsar Cluster
Java
Python
Go
C++ C
• Provides type safety to applications built on top of Pulsar
• Two approaches
• Client side - type safety enforcement up to the application
• Server side - system enforces type safety and ensures that producers and consumers
remain synced
• Schema registry enables clients to upload data schemas on a topic basis.
• Schemas dictate which data types are recognized as valid for that topic
Schema registry
!31
Compute
• Consume data as it is produced (pub/sub)
• Heavy weight compute - continuous data processing (DAG Processing)
• Light weight compute - transform and react to data as it arrives
• Interactive query of stored streams
How to process data modeled as streams
!33
Significant set of processing tasks are exceedingly simple
• Data transformations
• Data classification
• Data enrichment
• Data routing
• Data extraction and loading
• Real time aggregation
• Microservices
Lessons learned: Use cases
!34
Light weight compute
!35
f(x)
Incoming Messages Output Messages
ABSTRACT VIEW OF COMPUTE REPRESENTATION
Applying insight gained from serverless
• Simplest possible API function or procedure
• Support for multi language
• Use native API for each language
• Scale developers
• Use of message bus native concepts - input and output as topics
• Flexible runtime - simple standalone applications vs managed system
applications
Stream native compute using functions
!36
SDK-LESS API
import java.util.function.Function;
public class ExclamationFunction implements Function<String, String> {
@Override
public String apply(String input) {
return input + "!";
}
}
Pulsar Functions
!37
• ATMOST_ONCE
• Message acked to Pulsar as soon as we receive it
• ATLEAST_ONCE
• Message acked to Pulsar after the function completes
• Default behavior - don’t want people to loose data
• EFFECTIVELY_ONCE
• Uses Pulsar’s inbuilt effectively once semantics
• Controlled at runtime by user
Processing guarantees
!38
Deploying functions: Broker
!39
Broker 1
Worker
Function
wordcount-1
Function
transform-2
Broker 1
Worker
Function
transform-1
Function
dataroute-1
Broker 1
Worker
Function
wordcount-2
Function
transform-3
Node 1 Node 2 Node 3
Deploying functions: Worker nodes
!40
Worker
Function
wordcount-1
Function
transform-2
Worker
Function
transform-1
Function
dataroute-1
Worker
Function
wordcount-2
Function
transform-3
Node 1 Node 2 Node 3
Broker 1 Broker 2 Broker 3
Node 4 Node 5 Node 6
Deploying functions: Kubernetes
!41
Function
wordcount-1
Function
transform-1
Function
transform-3
Pod 1 Pod 2 Pod 3
Broker 1 Broker 2 Broker 3
Pod 7 Pod 8 Pod 9
Function
dataroute-1
Function
wordcount-2
Function
transform-2
Pod 4 Pod 5 Pod 6
Interactive querying of streams: Pulsar SQL
!42
1234…20212223…40414243…60616263…
Segment 1
Segment 3
Segment 2
Segment 2
Segment 1
Segment 3
Segment 4
Segment 3
Segment 2
Segment 1
Segment 4
Segment 4
Segment
Reader
Segment
Reader
Segment
Reader
Segment
Reader
Coordina
tor
Pulsar performance: Publish rate
!43
Pulsar performance: Latency
!44
Apache Pulsar VS. Apache Kafka
!45
Multi-tenancy
A single cluster can support
many tenants and use cases
Seamless Cluster Expansion
Expand the cluster without any
down time
High throughput & Low Latency
Can reach 1.8 M messages/s in
a single partition and publish
latency of 5ms at 99pct
Durability
Data replicated and synced to
disk
Geo-replication
Out of box support for
geographically distributed
applications
Unified messaging model
Support both Topic & Queue
semantic in a single model
Tiered Storage
Hot/warm data for real time access
and cold event data in cheaper
storage
Pulsar Functions
Flexible light weight compute
Highly scalable
Can support millions of topics, makes
data modeling easier
Apache Pulsar VS. Apache Kafka
!46
https://jack-vanlightly.com/sketches/2018/10/2/kafka-vs-pulsar-rebalancing-sketch
Thanks to JACK VANLIGHTLY
Apache Pulsar: Tying solutions together
!47
Tiered Storage
Stream Storage
AWS S3
Google Cloud
Storage
Azure Blob
Storage
HDFS
BookKeeper
Analytics
Presto SQL
Messaging
Pulsar Brokers
Event Processing
Pulsar Functions Complex Stream
Pulsar IO
Cassandra Kinesis MySQL MongoDB
Other
Frameworks
• 4+ years
• Serves 2.3 million topics
• 500 billion messages/day
• 400+ bookie nodes
• 150+ broker nodes
• Average latency < 5 ms
• 99.9% 15 ms (strong durability guarantees)
• Zero data loss
• 150+ applications
• Self served provisioning
• Full-mesh cross-datacenter replication - 8+ data centers
Apache Pulsar in production at scale
!48
• Twitter: @apache_pulsar
• Wechat Subscription: ApachePulsar
• Mailing Lists

dev@pulsar.apache.org, users@pulsar.apache.org
• Slack

https://apache-pulsar.slack.com
• Localization

https://crowdin.com/project/apache-pulsar
• Github

https://github.com/apache/pulsar

https://github.com/apache/bookkeeper
Apache Pulsar community
!49
• Understanding How Pulsar Works

https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works
• How To (Not) Lose Messages on Apache Pulsar Cluster

https://jack-vanlightly.com/blog/2018/10/21/how-to-not-lose-messages-on-an-apache-
pulsar-cluster
More readings
!50
• Unified queuing and streaming

https://streaml.io/blog/pulsar-streaming-queuing
• Segment centric storage

https://streaml.io/blog/pulsar-segment-based-architecture
• Messaging, Storage or Both

https://streaml.io/blog/messaging-storage-or-both
• Access patterns and tiered storage

https://streaml.io/blog/access-patterns-and-tiered-storage-in-apache-pulsar
• Tiered Storage in Apache Pulsar

https://streaml.io/blog/tiered-storage-in-apache-pulsar
More readings
!51
Conclusion
!52
ComputeMessaging
Storage
Apache Pulsar - Cloud Native
@karthikz

More Related Content

What's hot

From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
confluent
 
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best PracticesOracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
Markus Michalewicz
 
Apache Spark PDF
Apache Spark PDFApache Spark PDF
Apache Spark PDF
Naresh Rupareliya
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
Jiangjie Qin
 
RubiX
RubiXRubiX
Oracle RAC features on Exadata
Oracle RAC features on ExadataOracle RAC features on Exadata
Oracle RAC features on Exadata
Anil Nair
 
Introduction to Grafana Loki
Introduction to Grafana LokiIntroduction to Grafana Loki
Introduction to Grafana Loki
Julien Pivotto
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
Aparna Pillai
 
Kafka on Pulsar
Kafka on Pulsar Kafka on Pulsar
Kafka on Pulsar
StreamNative
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
Amita Mirajkar
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
 
Apache Bookkeeper and Apache Zookeeper for Apache Pulsar
Apache Bookkeeper and Apache Zookeeper for Apache PulsarApache Bookkeeper and Apache Zookeeper for Apache Pulsar
Apache Bookkeeper and Apache Zookeeper for Apache Pulsar
Enrico Olivelli
 
Why Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik RamasamyWhy Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik Ramasamy
StreamNative
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...
HostedbyConfluent
 
Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in Kafka
Joel Koshy
 
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
StreamNative
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
Clement Demonchy
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
Databricks
 

What's hot (20)

From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best PracticesOracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
 
Apache Spark PDF
Apache Spark PDFApache Spark PDF
Apache Spark PDF
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
RubiX
RubiXRubiX
RubiX
 
Oracle RAC features on Exadata
Oracle RAC features on ExadataOracle RAC features on Exadata
Oracle RAC features on Exadata
 
Introduction to Grafana Loki
Introduction to Grafana LokiIntroduction to Grafana Loki
Introduction to Grafana Loki
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Kafka on Pulsar
Kafka on Pulsar Kafka on Pulsar
Kafka on Pulsar
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Apache Bookkeeper and Apache Zookeeper for Apache Pulsar
Apache Bookkeeper and Apache Zookeeper for Apache PulsarApache Bookkeeper and Apache Zookeeper for Apache Pulsar
Apache Bookkeeper and Apache Zookeeper for Apache Pulsar
 
Why Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik RamasamyWhy Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik Ramasamy
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...
 
Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in Kafka
 
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 

Similar to Apache Pulsar Overview

Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache PulsarUnifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Karthik Ramasamy
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Apache Apex
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Apache Apex
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
Apache Apex
 
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
StreamNative
 
Devoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en basDevoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en bas
Florent Ramiere
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
Comsysto Reply GmbH
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
Thomas Weise
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
Circonus: Design failures - A Case Study
Circonus: Design failures - A Case StudyCirconus: Design failures - A Case Study
Circonus: Design failures - A Case Study
Heinrich Hartmann
 
BigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache ApexBigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache Apex
Thomas Weise
 
Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices
ZalandoHayley
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices
Zalando Technology
 
Concepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaConcepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with Kafka
QAware GmbH
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
Apache Apex
 
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
confluent
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
Spark Summit
 
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
SolarWinds Loggly
 
Springone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and Reactor
Stéphane Maldini
 

Similar to Apache Pulsar Overview (20)

Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache PulsarUnifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
 
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
 
Devoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en basDevoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en bas
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Circonus: Design failures - A Case Study
Circonus: Design failures - A Case StudyCirconus: Design failures - A Case Study
Circonus: Design failures - A Case Study
 
BigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache ApexBigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache Apex
 
Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices
 
Concepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaConcepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with Kafka
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
 
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
 
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
 
Springone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and Reactor
 

More from Streamlio

Infinite Topic Backlogs with Apache Pulsar
Infinite Topic Backlogs with Apache PulsarInfinite Topic Backlogs with Apache Pulsar
Infinite Topic Backlogs with Apache Pulsar
Streamlio
 
Streamlio and IoT analytics with Apache Pulsar
Streamlio and IoT analytics with Apache PulsarStreamlio and IoT analytics with Apache Pulsar
Streamlio and IoT analytics with Apache Pulsar
Streamlio
 
Strata London 2018: Multi-everything with Apache Pulsar
Strata London 2018:  Multi-everything with Apache PulsarStrata London 2018:  Multi-everything with Apache Pulsar
Strata London 2018: Multi-everything with Apache Pulsar
Streamlio
 
Self Regulating Streaming - Data Platforms Conference 2018
Self Regulating Streaming - Data Platforms Conference 2018Self Regulating Streaming - Data Platforms Conference 2018
Self Regulating Streaming - Data Platforms Conference 2018
Streamlio
 
Introduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed StorageIntroduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed Storage
Streamlio
 
Event Data Processing with Streamlio
Event Data Processing with StreamlioEvent Data Processing with Streamlio
Event Data Processing with Streamlio
Streamlio
 
Stream-Native Processing with Pulsar Functions
Stream-Native Processing with Pulsar FunctionsStream-Native Processing with Pulsar Functions
Stream-Native Processing with Pulsar Functions
Streamlio
 
Building data-driven microservices
Building data-driven microservicesBuilding data-driven microservices
Building data-driven microservices
Streamlio
 
Distributed Crypto-Currency Trading with Apache Pulsar
Distributed Crypto-Currency Trading with Apache PulsarDistributed Crypto-Currency Trading with Apache Pulsar
Distributed Crypto-Currency Trading with Apache Pulsar
Streamlio
 
Evaluating Streaming Data Solutions
Evaluating Streaming Data SolutionsEvaluating Streaming Data Solutions
Evaluating Streaming Data Solutions
Streamlio
 
Autopiloting Realtime Processing in Heron
Autopiloting Realtime Processing in HeronAutopiloting Realtime Processing in Heron
Autopiloting Realtime Processing in Heron
Streamlio
 
Introduction to Apache Heron
Introduction to Apache HeronIntroduction to Apache Heron
Introduction to Apache Heron
Streamlio
 
Messaging, storage, or both? The real time story of Pulsar and Apache Distri...
Messaging, storage, or both?  The real time story of Pulsar and Apache Distri...Messaging, storage, or both?  The real time story of Pulsar and Apache Distri...
Messaging, storage, or both? The real time story of Pulsar and Apache Distri...
Streamlio
 

More from Streamlio (13)

Infinite Topic Backlogs with Apache Pulsar
Infinite Topic Backlogs with Apache PulsarInfinite Topic Backlogs with Apache Pulsar
Infinite Topic Backlogs with Apache Pulsar
 
Streamlio and IoT analytics with Apache Pulsar
Streamlio and IoT analytics with Apache PulsarStreamlio and IoT analytics with Apache Pulsar
Streamlio and IoT analytics with Apache Pulsar
 
Strata London 2018: Multi-everything with Apache Pulsar
Strata London 2018:  Multi-everything with Apache PulsarStrata London 2018:  Multi-everything with Apache Pulsar
Strata London 2018: Multi-everything with Apache Pulsar
 
Self Regulating Streaming - Data Platforms Conference 2018
Self Regulating Streaming - Data Platforms Conference 2018Self Regulating Streaming - Data Platforms Conference 2018
Self Regulating Streaming - Data Platforms Conference 2018
 
Introduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed StorageIntroduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed Storage
 
Event Data Processing with Streamlio
Event Data Processing with StreamlioEvent Data Processing with Streamlio
Event Data Processing with Streamlio
 
Stream-Native Processing with Pulsar Functions
Stream-Native Processing with Pulsar FunctionsStream-Native Processing with Pulsar Functions
Stream-Native Processing with Pulsar Functions
 
Building data-driven microservices
Building data-driven microservicesBuilding data-driven microservices
Building data-driven microservices
 
Distributed Crypto-Currency Trading with Apache Pulsar
Distributed Crypto-Currency Trading with Apache PulsarDistributed Crypto-Currency Trading with Apache Pulsar
Distributed Crypto-Currency Trading with Apache Pulsar
 
Evaluating Streaming Data Solutions
Evaluating Streaming Data SolutionsEvaluating Streaming Data Solutions
Evaluating Streaming Data Solutions
 
Autopiloting Realtime Processing in Heron
Autopiloting Realtime Processing in HeronAutopiloting Realtime Processing in Heron
Autopiloting Realtime Processing in Heron
 
Introduction to Apache Heron
Introduction to Apache HeronIntroduction to Apache Heron
Introduction to Apache Heron
 
Messaging, storage, or both? The real time story of Pulsar and Apache Distri...
Messaging, storage, or both?  The real time story of Pulsar and Apache Distri...Messaging, storage, or both?  The real time story of Pulsar and Apache Distri...
Messaging, storage, or both? The real time story of Pulsar and Apache Distri...
 

Recently uploaded

Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
varshanayak241
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
Jelle | Nordend
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
XfilesPro
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Hivelance Technology
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
KrzysztofKkol1
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
MayankTawar1
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
NaapbooksPrivateLimi
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 

Recently uploaded (20)

Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 

Apache Pulsar Overview

  • 1. Apache Pulsar: Next Generation Cloud-Native Messaging & Streaming January 2019
  • 3. Increasingly connected world !3 Internet of Things 30 B connected devices by 2020 Health Care 153 Exabytes (2013) -> 2314 Exabytes (2020) Machine Data 40% of digital universe by 2020 Connected Vehicles Data transferred per vehicle per month 4 TB -> 10 TB Digital Assistants (Predictive Analytics) $2B (2012) -> $6.5B (2019) [1] Siri/Cortana/Google Now Augmented/Virtual Reality $150B by 2020 [2] Oculus/HoloLens/Magic Leap Ñ + >
  • 4. • Events are analyzed and processed as they arrive • Decisions are timely, contextual and based on fresh data • Decision latency is eliminated • Data in motion Fast data processing !4 Ingest/ Buffer Analyze Act
  • 5. Elements of stream processing !5 ComputeMessaging Storage Data Ingestion Data Processing Results StorageData Storage Data Serving
  • 6. Apache Pulsar !6 Flexible Messaging + Streaming System backed by durable log storage
  • 8. Apache Pulsar: Tenants, namespaces, topics !8 Apache Pulsar Cluster Product Safety ETL Fraud Detection Topic-1 Account History Topic-2 User Clustering Topic-1 Risk Classification Marketing Campaigns ETL Topic-1 Budgeted Spend Topic-2 Demographic Classification Topic-1 Location Resolution Data Serving Microservice Topic-1 Customer Authentication Tenants Namespaces
  • 10. Apache Pulsar: Topic partitions !10 Topic - P0 Producers Consumers Time Topic - P1 Topic - P2
  • 11. Apache Pulsar: Segments !11 Producers Consumers Time Segment 1 Segment 2 Segment 3 Segment 1 Segment 2 Segment 3 Segment 4 Segment 1 Segment 2 Segment 3 P0 P1 P2
  • 12. Apache Pulsar !12 Bookie Bookie Bookie Broker Broker Broker Producer Consumer • Layered Architecture • Independent Scalability
 • Fault Tolerance
 • Instant Scalability
  • 13. Apache Pulsar: Segment-centric storage !13 • Logical Partition
 • Partition divided into Segments
 • Size-based & Time-based
 • Uniformly distributed across the cluster 

  • 14. • Broker is the only point of interaction for clients (producers and consumers) • Brokers acquire ownership of group of topics and “serve” them • Broker has no durable state • Provides service discovery mechanism for client to connect to right broker Apache Pulsar: Broker !14
  • 16. Apache Pulsar: Broker failure recovery !16 • Topic is reassigned to an available broker based on load • Can reconstruct the previous state consistently • No data needs to be copied • Failover handled transparently by client library
  • 17. Apache Pulsar: Bookie failure recovery !17 • After a write failure, BookKeeper will immediately switch write to a new bookie, within the same segment. • As long as we have any 3 bookies in the cluster, we can continue to write
  • 18. Apache Pulsar: Bookie failure recovery !18 In background, starts a many-to- many recovery process to regain the configured replication factor
  • 19. Apache Pulsar: Seamless cluster expansion !19 1234…20212223…40414243…60616263… Segment 1 Segment 3 Segment 2 Segment 2 Segment 1 Segment 3 Segment 4 Segment 3 Segment 2 Segment 1 Segment 4 Segment 4 Segment Y Segment Z Segment X
  • 20. Apache Pulsar: Tiered storage !20 Low Cost Storage 1234…20212223…40414243…60616263… Segment 3 Segment 2Segment 3 Segment 4 Segment 3 Segment 1 Segment 4 Segment 4
  • 21. Partitions vs segments - why should you care? !21 Legacy Architectures ● Storage co-resident with processing ● Partition-centric ● Cumbersome to scale--data redistribution, performance impact Logical View Apache Pulsar ● Storage decoupled from processing ● Partitions stored as segments ● Flexible, easy scalability Partition Processing & Storage Segment 1 Segment 3Segment 2 Segment n Partition Broker Partition (primary) Broker Partition (copy) Broker Partition (copy) Broker Broker Broker Segment 1 Segment 2 Segment n .
.
. Segment 2 Segment 3 Segment n .
.
. Segment 3 Segment 1 Segment n .
.
. Segment 1 Segment 2 Segment n .
.
. Processing (brokers) Storage
  • 22. • In Kafka, partitions are assigned to brokers “permanently” • A single partition is stored entirely in a single node • Retention is limited by a single node storage capacity • Failure recovery and capacity expansion require expensive “rebalancing” • Rebalancing has a big impact over the system, affecting regular traffic Partitions vs Segments: Why should you care? !22
  • 25. Unified messaging model: Streaming !25 Pulsar topic/ partition Producer 2 Producer 1 Consumer 1 Consumer 2 Subscription A M4 M3 M2 M1 M0 M4 M3 M2 M1 M0 X Exclusive
  • 26. Unified messaging model: Streaming !26 Pulsar topic/ partition Producer 2 Producer 1 Consumer 1 Consumer 2 Subscription B M4 M3 M2 M1 M0 M4 M3 M2 M1 M0 Failover In case of failure in consumer 1
  • 27. Unified messaging model: Queuing !27 Pulsar topic/ partition Producer 2 Producer 1 Consumer 2 Consumer 3 Subscription C M4 M3 M2 M1 M0 Shared Traffic is equally distributed across consumers Consumer 1 M4M3 M2M1M0
  • 28. Replication for disaster recovery !28 Topic (T1) Topic (T1) Topic (T1) Subscription (S1) Subscription (S1) Producer (P1) Consumer (C1) Producer (P3) Producer (P2) Consumer (C2) Data Center A Data Center B Data Center C Integrated in the broker message flow Simple configuration to add/remove regions Asynchronous (default) and synchronous replication
  • 29. Apache Pulsar: Multi-tenancy !29 Apache Pulsar Cluster Product Safety ETL Fraud Detection Topic-1 Account History Topic-2 User Clustering Topic-1 Risk Classification MarketingCampaigns ETL Topic-1 Budgeted Spend Topic-2 Demographic Classification Topic-1 Location Resolution Data Serving Microservice Topic-1 Customer Authentication 10 TB 7 TB 5 TB • Authentication • Authorization • Software isolation • Storage quotas, flow control, back pressure, rate limiting • Hardware isolation • Constrain some tenants on a subset of brokers/bookies
  • 30. Pulsar clients !30 Apache Pulsar Cluster Java Python Go C++ C
  • 31. • Provides type safety to applications built on top of Pulsar • Two approaches • Client side - type safety enforcement up to the application • Server side - system enforces type safety and ensures that producers and consumers remain synced • Schema registry enables clients to upload data schemas on a topic basis. • Schemas dictate which data types are recognized as valid for that topic Schema registry !31
  • 33. • Consume data as it is produced (pub/sub) • Heavy weight compute - continuous data processing (DAG Processing) • Light weight compute - transform and react to data as it arrives • Interactive query of stored streams How to process data modeled as streams !33
  • 34. Significant set of processing tasks are exceedingly simple • Data transformations • Data classification • Data enrichment • Data routing • Data extraction and loading • Real time aggregation • Microservices Lessons learned: Use cases !34
  • 35. Light weight compute !35 f(x) Incoming Messages Output Messages ABSTRACT VIEW OF COMPUTE REPRESENTATION
  • 36. Applying insight gained from serverless • Simplest possible API function or procedure • Support for multi language • Use native API for each language • Scale developers • Use of message bus native concepts - input and output as topics • Flexible runtime - simple standalone applications vs managed system applications Stream native compute using functions !36
  • 37. SDK-LESS API import java.util.function.Function; public class ExclamationFunction implements Function<String, String> { @Override public String apply(String input) { return input + "!"; } } Pulsar Functions !37
  • 38. • ATMOST_ONCE • Message acked to Pulsar as soon as we receive it • ATLEAST_ONCE • Message acked to Pulsar after the function completes • Default behavior - don’t want people to loose data • EFFECTIVELY_ONCE • Uses Pulsar’s inbuilt effectively once semantics • Controlled at runtime by user Processing guarantees !38
  • 39. Deploying functions: Broker !39 Broker 1 Worker Function wordcount-1 Function transform-2 Broker 1 Worker Function transform-1 Function dataroute-1 Broker 1 Worker Function wordcount-2 Function transform-3 Node 1 Node 2 Node 3
  • 40. Deploying functions: Worker nodes !40 Worker Function wordcount-1 Function transform-2 Worker Function transform-1 Function dataroute-1 Worker Function wordcount-2 Function transform-3 Node 1 Node 2 Node 3 Broker 1 Broker 2 Broker 3 Node 4 Node 5 Node 6
  • 41. Deploying functions: Kubernetes !41 Function wordcount-1 Function transform-1 Function transform-3 Pod 1 Pod 2 Pod 3 Broker 1 Broker 2 Broker 3 Pod 7 Pod 8 Pod 9 Function dataroute-1 Function wordcount-2 Function transform-2 Pod 4 Pod 5 Pod 6
  • 42. Interactive querying of streams: Pulsar SQL !42 1234…20212223…40414243…60616263… Segment 1 Segment 3 Segment 2 Segment 2 Segment 1 Segment 3 Segment 4 Segment 3 Segment 2 Segment 1 Segment 4 Segment 4 Segment Reader Segment Reader Segment Reader Segment Reader Coordina tor
  • 45. Apache Pulsar VS. Apache Kafka !45 Multi-tenancy A single cluster can support many tenants and use cases Seamless Cluster Expansion Expand the cluster without any down time High throughput & Low Latency Can reach 1.8 M messages/s in a single partition and publish latency of 5ms at 99pct Durability Data replicated and synced to disk Geo-replication Out of box support for geographically distributed applications Unified messaging model Support both Topic & Queue semantic in a single model Tiered Storage Hot/warm data for real time access and cold event data in cheaper storage Pulsar Functions Flexible light weight compute Highly scalable Can support millions of topics, makes data modeling easier
  • 46. Apache Pulsar VS. Apache Kafka !46 https://jack-vanlightly.com/sketches/2018/10/2/kafka-vs-pulsar-rebalancing-sketch Thanks to JACK VANLIGHTLY
  • 47. Apache Pulsar: Tying solutions together !47 Tiered Storage Stream Storage AWS S3 Google Cloud Storage Azure Blob Storage HDFS BookKeeper Analytics Presto SQL Messaging Pulsar Brokers Event Processing Pulsar Functions Complex Stream Pulsar IO Cassandra Kinesis MySQL MongoDB Other Frameworks
  • 48. • 4+ years • Serves 2.3 million topics • 500 billion messages/day • 400+ bookie nodes • 150+ broker nodes • Average latency < 5 ms • 99.9% 15 ms (strong durability guarantees) • Zero data loss • 150+ applications • Self served provisioning • Full-mesh cross-datacenter replication - 8+ data centers Apache Pulsar in production at scale !48
  • 49. • Twitter: @apache_pulsar • Wechat Subscription: ApachePulsar • Mailing Lists
 dev@pulsar.apache.org, users@pulsar.apache.org • Slack
 https://apache-pulsar.slack.com • Localization
 https://crowdin.com/project/apache-pulsar • Github
 https://github.com/apache/pulsar
 https://github.com/apache/bookkeeper Apache Pulsar community !49
  • 50. • Understanding How Pulsar Works
 https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works • How To (Not) Lose Messages on Apache Pulsar Cluster
 https://jack-vanlightly.com/blog/2018/10/21/how-to-not-lose-messages-on-an-apache- pulsar-cluster More readings !50
  • 51. • Unified queuing and streaming
 https://streaml.io/blog/pulsar-streaming-queuing • Segment centric storage
 https://streaml.io/blog/pulsar-segment-based-architecture • Messaging, Storage or Both
 https://streaml.io/blog/messaging-storage-or-both • Access patterns and tiered storage
 https://streaml.io/blog/access-patterns-and-tiered-storage-in-apache-pulsar • Tiered Storage in Apache Pulsar
 https://streaml.io/blog/tiered-storage-in-apache-pulsar More readings !51