Pulsar - Distributed pub/sub platform

Tech talk on Pulsar

Published in: Software

  1. Distributed pub/sub platform
     github.com/yahoo/pulsar
     Matteo Merli, mmerli@yahoo-inc.com
     11/17/2016
  2. Agenda
     1. Pulsar Overview
     2. Common use cases
     3. Messaging API
     4. Architecture
     5. Future
     6. Q&A
  3. What is Pulsar?
     ▪ Hosted pub-sub messaging
     ▪ Simple messaging model
     ▪ Highly scalable
       › Topics, message throughput
     ▪ Ordering, durability & delivery guarantees
     ▪ Supports multi-tenancy
     ▪ Geo-replication
     ▪ Easy to operate (admin APIs, add capacity, replace machines)
     Diagram labels: Pulsar Cluster (Broker, Bookie, ZK), Global ZK, Producer, Consumer, Replication
  4. Pulsar production usage stats
     ▪ 1.5+ years in production
     ▪ 1.4 million topics
     ▪ Publishes 100 billion messages/day (delivery 7x)
     ▪ Average latency < 5 ms, 99th percentile 15 ms
     ▪ Zero data loss
     ▪ 80+ applications
     ▪ Critical component of major Yahoo systems:
       › Mail, Finance, Sports, Gemini Ads
     ▪ Self-service provisioning
     ▪ Full-mesh cross-datacenter replication across 8 data centers
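As a rough sanity check, the daily totals on this slide convert to average per-second rates as shown below. This assumes traffic is spread evenly over the 86,400 seconds in a day and reads "delivery 7x" as each published message being delivered seven times on average. Both are assumptions, not statements from the talk, and peak rates would be higher than these averages.

```latex
\begin{align*}
\text{publish rate} &\approx \frac{100 \times 10^{9}\ \text{msg/day}}{86{,}400\ \text{s/day}}
  \approx 1.16 \times 10^{6}\ \text{msg/s} \\
\text{delivery rate} &\approx 7 \times 1.16 \times 10^{6}\ \text{msg/s}
  \approx 8.1 \times 10^{6}\ \text{msg/s}
\end{align*}
```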
  5. Why build a new system?
     ▪ No existing solution satisfied the requirements
       › Multi-tenant, 1M topics, low latency, durability, geo-replication
     ▪ Kafka doesn't scale well with many topics:
       › Storage model based on an individual directory per topic partition
       › Enabling durability severely degrades performance
     ▪ Operations are not very convenient
       › e.g. replacing a server requires manual commands to copy the data and involves clients
       › Client access to ZK clusters is not desirable
     ▪ Ability to manage large backlogs
     ▪ No scalable support to keep consumer position
  6. Common use cases
  7. Message queue
     ▪ Decouple online / background
     ▪ Provide high-availability
     ▪ Reliable data transport
     Diagram labels: Online events, Pulsar topic 1, Worker 1, Worker 2, Worker 3, Pulsar topic 2, Low latency publish, Long running task, Notification
  8. Notifications
     ▪ Listeners are frequently different tenants
     ▪ Quotas are needed to ensure the producer is not affected
     Diagram labels: Event, Pulsar topic, Listeners (Component 1, Component 2, Component 3)
  9. Feedback system
     ▪ Coordinate a large number of machines
     ▪ Propagate state
     Diagram labels: External inputs, Pulsar topic 1, Serving system (x3), Pulsar topic 2, Controller, Updates, Feedback
  10. Geo replication
      ▪ Asynchronous replication
      ▪ Integrated in the broker message flow
      ▪ Simple configuration to add/remove regions
  11. Platforms
      ▪ Pulsar is used to build other platforms
      ▪ Provides high-level abstraction with strict guarantees
      ▪ Example: Sherpa distributed key-value store
        › Massive database powering most of Yahoo's online data serving applications
        › Built upon the concept of a common message bus
        › Pulsar provides:
          • Durable log
          • Replication within and across geo-locations
  12. Messaging API
  13. Messaging Model
      Consumer-A1 receives all messages published on T; B1, B2 and B3 receive one third each.
      Diagram: Producer-X and Producer-Y publish to Topic-T. Subscription-A (exclusive) → Consumer-A1. Subscription-B (shared) → Consumer-B1, Consumer-B2, Consumer-B3.
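The delivery semantics on this slide can be illustrated with a small, self-contained model. This is only a sketch: the class and method names are hypothetical, and round-robin is an assumed policy for spreading messages across the consumers of a shared subscription (the slide only says each receives one third). It is not Pulsar's broker code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of subscription dispatch; not Pulsar's broker implementation. */
class SubscriptionModelSketch {

    /** A subscription whose consumers are served round-robin (assumed policy). */
    static final class Subscription {
        private final List<String> consumers;
        private final Map<String, List<String>> delivered = new LinkedHashMap<>();
        private int next = 0;

        Subscription(List<String> consumers) {
            this.consumers = consumers;
            for (String consumer : consumers) {
                delivered.put(consumer, new ArrayList<>());
            }
        }

        /** Hands the message to exactly one consumer of this subscription. */
        void dispatch(String message) {
            String target = consumers.get(next);
            next = (next + 1) % consumers.size();
            delivered.get(target).add(message);
        }

        List<String> receivedBy(String consumer) {
            return delivered.get(consumer);
        }
    }

    /** Publishes n messages on a topic with subscriptions A (one consumer) and B (three). */
    static Map<String, Subscription> run(int n) {
        Map<String, Subscription> subs = new LinkedHashMap<>();
        subs.put("Subscription-A", new Subscription(Arrays.asList("Consumer-A1")));
        subs.put("Subscription-B", new Subscription(
                Arrays.asList("Consumer-B1", "Consumer-B2", "Consumer-B3")));
        for (int i = 0; i < n; i++) {
            String message = "msg-" + i;
            // Every subscription on the topic sees each published message once.
            for (Subscription subscription : subs.values()) {
                subscription.dispatch(message);
            }
        }
        return subs;
    }

    public static void main(String[] args) {
        Map<String, Subscription> subs = run(9);
        System.out.println("A1 received "
                + subs.get("Subscription-A").receivedBy("Consumer-A1").size());
        System.out.println("B1 received "
                + subs.get("Subscription-B").receivedBy("Consumer-B1").size());
    }
}
```

With nine published messages, the single consumer on Subscription-A ends up with all nine, while each consumer on Subscription-B ends up with three.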
  14. Producer example

      PulsarClient client = PulsarClient.create(
              "http://broker.usw.example.com:8080");
      Producer producer = client.createProducer(
              "persistent://my-property/us-west/my-namespace/my-topic");

      // handles retries in case of failure
      producer.send("my-message".getBytes());

      // Async version:
      producer.sendAsync("my-message".getBytes()).thenRun(() -> {
          // Message was persisted
      });
  15. Consumer example

      PulsarClient client = PulsarClient.create(
              "http://broker.usw.example.com:8080");
      Consumer consumer = client.subscribe(
              "persistent://my-property/us-west/my-namespace/my-topic",
              "my-subscription-name");

      while (true) {
          // Wait for a message
          Message msg = consumer.receive();

          // getData() returns a byte[]; decode it before printing
          System.out.println("Received message: " + new String(msg.getData()));

          // Acknowledge the message so that it can be deleted by the broker
          consumer.acknowledge(msg);
      }
  16. Additional client library features
      ▪ Partitioned topics
      ▪ Transparent batching of messages
      ▪ Compression
      ▪ End-to-end checksum
      ▪ TLS encryption
      ▪ Individual and cumulative acknowledgment
      ▪ Client side stats
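The difference between individual and cumulative acknowledgment can be sketched with a small tracker of unacknowledged messages. The class below is hypothetical and uses plain sequence numbers as message ids; it illustrates the two semantics only and is not the Pulsar client implementation.

```java
import java.util.TreeSet;

/** Toy tracker for unacknowledged messages; hypothetical, not the Pulsar client. */
class AckTrackerSketch {
    private final TreeSet<Long> pending = new TreeSet<>();

    /** Records that a message with this sequence id was delivered to the consumer. */
    void delivered(long id) {
        pending.add(id);
    }

    /** Individual ack: only this message is marked as processed. */
    void acknowledge(long id) {
        pending.remove(id);
    }

    /** Cumulative ack: this message and every earlier one are marked as processed. */
    void acknowledgeCumulative(long id) {
        pending.headSet(id, true).clear();
    }

    /** Sequence ids still awaiting acknowledgment, in ascending order. */
    TreeSet<Long> pending() {
        return pending;
    }

    public static void main(String[] args) {
        AckTrackerSketch tracker = new AckTrackerSketch();
        for (long id = 1; id <= 5; id++) {
            tracker.delivered(id);
        }
        tracker.acknowledge(3);            // 1, 2, 4 and 5 remain pending
        System.out.println(tracker.pending());
        tracker.acknowledgeCumulative(4);  // only 5 remains pending
        System.out.println(tracker.pending());
    }
}
```

Individual acknowledgment suits consumers that process messages out of order, while cumulative acknowledgment lets an in-order consumer confirm a whole prefix with a single call.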
  17. Architecture
  18. Architecture / 1
      Broker
      ‣ Clients interact only with brokers
      ‣ No durable state
      Bookie
      ‣ Apache BookKeeper storage nodes
      ‣ Distributed write-ahead log
      ‣ Each machine stores data from many topics
      Diagram labels: Pulsar Cluster, ZK, Producer, Consumer, Broker 1, Broker 2, Broker 3, Bookie 1, Bookie 2, Bookie 3, Bookie 4, Bookie 5
  19. Architecture / 2
      Separate layers between brokers and bookies
      ‣ Brokers and bookies can be added independently
      ‣ Traffic can be shifted very quickly across brokers
      ‣ New bookies will ramp up on traffic quickly
      Diagram labels: Pulsar Cluster, ZK, Producer, Consumer, Broker 1, Broker 2, Broker 3, Bookie 1, Bookie 2, Bookie 3, Bookie 4, Bookie 5
  20. Architecture / 3
      Client library
      ‣ Lookup correct broker through service discovery
      ‣ Direct connection to broker
      ‣ When connection is established, authentication and authorization are enforced
      ‣ Reconnect with back-off strategy
      Diagram labels: Pulsar Cluster, Broker (Load Balancer, Dispatcher, Cache, Managed Ledger, BK Client, Global replicators), Bookie, ZK, Global ZK, Service discovery, Producer App with Pulsar lib, Consumer App with Pulsar lib, Replication
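The slide does not say which back-off strategy the client library uses. A common choice is exponential back-off with an upper bound, sketched below under that assumption. The class name and the 100 ms / 1000 ms values are hypothetical and chosen only for illustration.

```java
/** Exponential back-off with an upper bound; a generic sketch, not Pulsar's client code. */
class BackoffSketch {
    private final long initialMs;
    private final long maxMs;
    private long nextMs;

    BackoffSketch(long initialMs, long maxMs) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
        this.nextMs = initialMs;
    }

    /** Returns the delay to wait before the next attempt, then doubles it up to maxMs. */
    long nextDelayMs() {
        long delay = nextMs;
        nextMs = Math.min(maxMs, nextMs * 2);
        return delay;
    }

    /** Called once a connection succeeds, so the next failure starts from the initial delay. */
    void reset() {
        nextMs = initialMs;
    }

    public static void main(String[] args) {
        BackoffSketch backoff = new BackoffSketch(100, 1000); // assumed values
        for (int attempt = 1; attempt <= 6; attempt++) {
            System.out.println("attempt " + attempt + ": wait " + backoff.nextDelayMs() + " ms");
        }
    }
}
```

Capping the delay keeps a client from waiting arbitrarily long after an extended outage, and resetting on success keeps short blips cheap.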
  21. Architecture / 4
      Dispatcher
      ‣ End-to-end async message processing
      ‣ Messages are relayed across producers, bookies and consumers with no copies
      ‣ Pooled ref-counted buffers
      Managed Ledger
      ‣ Abstraction for single topic storage
      ‣ Cache recent messages
      Diagram labels: Pulsar Cluster, Broker (Load Balancer, Dispatcher, Cache, Managed Ledger, BK Client, Global replicators), Bookie, ZK, Global ZK, Service discovery, Producer App with Pulsar lib, Consumer App with Pulsar lib, Replication
  22. BookKeeper
      ▪ Replicated log service
      ▪ Offers consistency and durability
      ▪ Restores replication factor after node failures
      ▪ Why is it a good choice for Pulsar?
        › Very efficient storage for sequential data
        › Very good distribution of IO across all bookies
          • Multiple ledgers are created for each topic over time
        › Isolation of writes and reads
        › Flexible model for quorum writes with different tradeoffs
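The "flexible model for quorum writes" can be sketched as follows. BookKeeper describes a ledger with an ensemble size, a write quorum and an ack quorum; those three terms come from BookKeeper's documentation rather than from this slide. The round-robin striping below is a simplified illustration of how entries can be spread over the ensemble, with hypothetical method names, and is not BookKeeper's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified sketch of quorum striping; illustrative only, not BookKeeper code. */
class QuorumStripingSketch {

    /**
     * Returns the indexes (within the ensemble) of the bookies that store an entry.
     * Consecutive entries start at consecutive bookies, which spreads IO evenly.
     */
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorum) {
        List<Integer> bookies = new ArrayList<>();
        for (int i = 0; i < writeQuorum; i++) {
            bookies.add((int) ((entryId + i) % ensembleSize));
        }
        return bookies;
    }

    /** An entry is confirmed to the writer once enough bookies have acknowledged it. */
    static boolean isConfirmed(int acksReceived, int ackQuorum) {
        return acksReceived >= ackQuorum;
    }

    public static void main(String[] args) {
        int ensembleSize = 5; // bookies used by the ledger (assumed value)
        int writeQuorum = 3;  // copies written per entry (assumed value)
        int ackQuorum = 2;    // acks needed before confirming (assumed value)
        for (long entryId = 0; entryId < 5; entryId++) {
            System.out.println("entry " + entryId + " -> bookies "
                    + writeSet(entryId, ensembleSize, writeQuorum));
        }
        System.out.println("confirmed after 2 acks: " + isConfirmed(2, ackQuorum));
    }
}
```

The tradeoff is visible in the parameters: a larger write quorum costs more bandwidth per entry but tolerates more failures, while an ack quorum smaller than the write quorum lets the writer proceed without waiting for the slowest bookie.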
  23. BookKeeper - Storage
      ▪ A single bookie can serve and store thousands of ledgers
      ▪ Write and read paths are separated:
        › Prevents read activity from impacting write latency
        › Writes are added to an in-memory write cache and committed to the journal
        › Write cache is flushed in the background to a separate device
      ▪ Entries are sorted to allow for mostly sequential reads
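The write path described on this slide can be modeled in a few lines. In the sketch below the journal, write cache and entry log are in-memory collections standing in for the real devices, and all names are hypothetical. It shows only the ordering idea: the journal keeps arrival order, while the flush writes entries grouped by ledger so that later reads of one ledger are mostly sequential.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a bookie's write path; illustrative only, not BookKeeper code. */
class BookieWritePathSketch {
    // Journal: appended in arrival order (stands in for the journal device).
    private final List<String> journal = new ArrayList<>();
    // Write cache: kept sorted by ledger id, then entry id.
    private final TreeMap<Long, TreeMap<Long, String>> writeCache = new TreeMap<>();
    // Entry log: filled by flushes (stands in for the separate storage device).
    private final List<String> entryLog = new ArrayList<>();

    /** Adds an entry to the write cache and appends it to the journal. */
    void addEntry(long ledgerId, long entryId, String payload) {
        journal.add(ledgerId + ":" + entryId);
        writeCache.computeIfAbsent(ledgerId, id -> new TreeMap<>()).put(entryId, payload);
    }

    /** Drains the write cache into the entry log in (ledger, entry) order. */
    List<String> flush() {
        List<String> flushed = new ArrayList<>();
        for (Map.Entry<Long, TreeMap<Long, String>> ledger : writeCache.entrySet()) {
            for (Long entryId : ledger.getValue().keySet()) {
                flushed.add(ledger.getKey() + ":" + entryId);
            }
        }
        writeCache.clear();
        entryLog.addAll(flushed);
        return flushed;
    }

    List<String> journal() {
        return journal;
    }

    public static void main(String[] args) {
        BookieWritePathSketch bookie = new BookieWritePathSketch();
        bookie.addEntry(2, 0, "a");
        bookie.addEntry(1, 0, "b");
        bookie.addEntry(2, 1, "c");
        bookie.addEntry(1, 1, "d");
        System.out.println("journal order: " + bookie.journal());
        System.out.println("flush order:   " + bookie.flush());
    }
}
```

Interleaved writes from two ledgers land in the journal in arrival order but reach the entry log grouped per ledger, which is what makes later reads of a single ledger mostly sequential.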
  24. Single topic: throughput and latency
      Chart: throughput and 99th-percentile publish latency, 1 topic, 1 producer. Y axis: latency (ms), 0 to 6. X axis: throughput (msg/s), ticks from 1,000 to 10,000,000, with 1,800,000 marked. Series: 10-byte, 100-byte and 1 KB messages.
  25. Future
  26. Future
      ▪ WebSocket API
        › More language bindings based on top of it
      ▪ C++ API
        › Existing C++ client library is being prepared for OSS release
      ▪ End-to-end data encryption
        › Use symmetric/asymmetric encryption from producer to consumer
        › Data encrypted in flight and at rest
        › No need to trust the service for security
      ▪ Globally consistent topics
        › Store the data in multiple regions
        › Can migrate across regions with consistency
  27. Final Remarks
      • Check out the code and docs at github.com/yahoo/pulsar
      • Give feedback or ask for more details on the mailing lists:
        • Pulsar-Users
        • Pulsar-Dev
