
Apache Pulsar First Overview

This is an overview of interesting features of Apache Pulsar. Keep in mind that when I gave this presentation I had not used Pulsar yet. These are just my first impressions from the list of features.

Published in: Technology

  1. Apache Pulsar First Overview: first impressions of Apache Pulsar features from someone who has never used it. :) (Ricardo Paiva)
  2. Motivation
  3. Motivation
     • Kafka is an amazing tool, with incredible throughput and resilience, but it has some drawbacks and lacks a few features:
       • The capacity of a partition is limited by the smallest node
       • Ops: adding or removing a broker requires cluster rebalancing
       • No long-term storage
       • Only the pub/sub client pattern (no work queue)
       • No namespace or tenancy management
       • No multi-cluster replication
  4. Key concepts
  5. Tiered Storage (uses Apache jclouds)
  6. Multi-tenant and Namespace
  7. Pulsar Components
  8. Brokers
  9. Bookies
  10. Producer
  11. Consumer
  12. ZooKeeper
  13. Schema Registry
     • Uses BookKeeper, but another schema registry can be plugged in
     • A schema can be uploaded when a typed producer is created or via the REST API
     • Versioned
     • Defined at the topic level
     • Format types:
       • String (used for UTF-8-encoded strings)
       • JSON
       • Protobuf
       • Avro
     • Only works with Java
  14. Subscription modes
  15. Message Acknowledgment
  16. Retention
     • Message retention
       • Applies to messages that are marked as acknowledged and set to be deleted
       • It is a time limit applied on a topic
     • TTL
       • Applies to messages that were not consumed
       • It is a time limit on consumption within a subscription
  17. Exclusive
  18. Failover
  19. Shared (work queue)
     • Message ordering is not guaranteed
     • You cannot use cumulative acknowledgment with shared mode
  20. Internals
  21. Bookie Storage
  22. Cold storage
  23. SQL with Presto
  24. Other features
  25. Geo Replication (Sync)
     • Requires a global ZooKeeper installation
     • Region-aware placement policy
     • Higher latency
  26. Geo Replication (Async)
     • Rack-aware placement policy
     • Messages are first persisted to the local cluster and then replicated asynchronously to the remote clusters
     • Enabled on a per-tenant basis
     • Types:
       • master-slave replication
       • active-active bidirectional replication
       • full-mesh replication between multiple data centers
  27. Deduplication
     • Per producer/topic sequence numbers are used to detect duplicates
     • Each topic owner broker maintains an in-memory hashmap of the latest sequence number per topic/producer
     • The broker periodically snapshots the latest sequence number to a cursor, which allows the map to be reconstructed by another broker after a fail-over
     • https://jack-vanlightly.com/blog/2018/10/25/testing-producer-deduplication-in-apache-kafka-and-apache-pulsar
  28. Pulsar Functions
     • Lightweight compute framework for Pulsar
     • Can run inside or outside the cluster
     • State storage is handled by BookKeeper
     • "Serverless" idea
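
To give a feel for how lightweight the Pulsar Functions idea is, here is a minimal sketch of a function written in the plain-Python style, where the function is just a module-level `process` that receives one message and returns the value to publish. The exclamation-mark logic is only an illustration; deployment, topic wiring, and state handling are left out and would depend on the cluster setup.

```python
# Minimal sketch of a Pulsar Function in plain-Python style.
# The runtime is expected to call process() once per incoming message
# and publish the return value to the configured output topic.

def process(input):
    # Illustrative transformation only: append an exclamation mark.
    return f"{input}!"
```

Because the function is an ordinary Python callable, it can be exercised locally, for example `process("hello")`, before it is ever deployed.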
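
The deduplication scheme from the Deduplication slide can be sketched as a toy model. This is not Pulsar's implementation: the class `DedupBroker`, its methods, and the dictionary standing in for the cursor snapshot are all hypothetical names chosen for illustration. The sketch also ignores messages published after the last snapshot, which a real broker would have to account for during recovery.

```python
class DedupBroker:
    """Toy model of broker-side deduplication (hypothetical, not Pulsar code)."""

    def __init__(self, last_seq=None):
        # In-memory map: (topic, producer) -> latest accepted sequence number.
        self.last_seq = dict(last_seq or {})
        # Stand-in for the snapshot that is periodically persisted to a cursor.
        self.snapshot = {}
        # Accepted message payloads, in arrival order.
        self.log = []

    def publish(self, topic, producer, seq, payload):
        """Accept the message unless its sequence number was already seen."""
        key = (topic, producer)
        if seq <= self.last_seq.get(key, -1):
            return False  # duplicate: drop it
        self.log.append(payload)
        self.last_seq[key] = seq
        return True

    def take_snapshot(self):
        """Copy the current map, imitating the periodic snapshot to a cursor."""
        self.snapshot = dict(self.last_seq)

    @classmethod
    def recover(cls, snapshot):
        """Build a new broker from a snapshot, as after a fail-over."""
        return cls(last_seq=snapshot)
```

A producer that resends sequence number 0 after a timeout would have the second `publish` call rejected, and a broker rebuilt with `DedupBroker.recover(old.snapshot)` would keep rejecting any sequence number at or below the snapshotted value.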
