
Five years of operating a large scale globally replicated Pulsar installation — Joe Francis & Ludwig Pummer


  1. June 18, 2020 — Five Years of Operating a Large Scale Globally Replicated Pulsar Installation. Ludwig Pummer ludwig@verizonmedia.com, Joe Francis joef@verizonmedia.com
  2. People: Ludwig Pummer ludwig@verizonmedia.com, Principal Production Engineer, Verizon Media; Joe Francis joef@verizonmedia.com, Director, Core Platforms, Verizon Media
  3. Agenda: 1. Focus & Use Cases 2. Scaling Up 3. Provisioning and Capacity 4. Hardware Evolution 5. JVM GC Experiences 6. Metrics and Monitoring 7. Deployment 8. Broker Isolation Policies 9. BookKeeper Storage Utilization 10. BookKeeper Rack Awareness
  4. Our Focus ● Operate a hosted pub-sub service within VMG ○ open-sourced as Pulsar ● Global presence ○ 6 DCs (Asia, Europe, US) ○ full mesh replication ● Business-critical use cases ○ serving use cases ○ low-latency bus for other low-latency services ○ write availability
  5. Use Cases ● Application integration ○ server-to-server control, status, and notification messages ● Persistent queue ○ buffering, feed ingestion, task distribution ● Message bus for large-scale data stores ○ durable log ○ replication within and across geo-locations
  6. Trajectory — 2015: 1 tenant; 2 clusters, 2 DCs; 60K wps @ 2KB; 60K rps; <100 topics. 2016: 20 tenants; 12 clusters, 6 DCs; 500K wps avg; 1.1M rps avg; 1.4M topics. 2020: 100+ tenants; 18 clusters, 6 DCs; 1.3M wps avg / 3M peak; 2M rps avg / 6M peak; 2.8M topics.
  7. Scaling Up a Cluster — more deliveries (increased fanout) → add brokers; more publishes → add bookies and brokers; more storage → add bookies; massively more topics → add clusters (SuperCluster). See PIP 8 for SuperCluster (peer clusters). A sketch of sizing up the current cluster follows below.
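Before growing a tier, it helps to inventory what the cluster is carrying. A minimal sketch using the Pulsar Java admin client; the endpoint, cluster name, and namespace below are illustrative placeholders, not from the talk:

    // Sketch: inventory broker and topic counts before scaling a tier.
    import org.apache.pulsar.client.admin.PulsarAdmin;

    public class ClusterInventory {
        public static void main(String[] args) throws Exception {
            PulsarAdmin admin = PulsarAdmin.builder()
                    .serviceHttpUrl("http://broker.example.com:8080") // placeholder
                    .build();
            // Brokers carry both publish and dispatch (fanout) load.
            System.out.println("active brokers: "
                    + admin.brokers().getActiveBrokers("cluster-a").size());
            // Topic count per namespace hints at when a SuperCluster split is due.
            System.out.println("topics in tenant-one/ns1: "
                    + admin.topics().getList("tenant-one/ns1").size());
            admin.close();
        }
    }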
  8. Storage & I/O Auto-Balancing [charts: wps by namespace and wps by bookie, showing automatic redistribution of write load as a bookie goes out for reimage and comes back in]
  9. Provisioning Model — a new tenant provides: average message size; peak publishes per second; steady-state deliveries per second (fanout); per cluster/DC. Tenants are x509 principals via Athenz (https://www.athenz.io/), an open-source platform for X.509 certificate-based service authentication and authorization. From these inputs we calculate broker messages/sec, broker bandwidth, and bookie MB/sec (see the sketch below).
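A hedged sketch of the capacity arithmetic implied by that model. The fanout and write-quorum factors are assumptions for illustration, and the tenant numbers are made up:

    // Back-of-the-envelope provisioning math; all constants are illustrative.
    public class CapacityEstimate {
        public static void main(String[] args) {
            double avgMsgKB = 2.0;    // tenant-provided average message size
            double peakWps = 60_000;  // tenant-provided peak publishes/sec
            double fanout = 3.0;      // steady-state deliveries per publish
            int writeQuorum = 2;      // assumed BookKeeper write quorum

            // Brokers see each message once in and 'fanout' times out.
            double brokerMsgsPerSec = peakWps * (1 + fanout);
            double brokerMBps = avgMsgKB * peakWps * (1 + fanout) / 1024;
            // Bookies absorb writeQuorum copies of every published byte.
            double bookieMBps = avgMsgKB * peakWps * writeQuorum / 1024;

            System.out.printf("broker msgs/s=%.0f broker MB/s=%.1f bookie MB/s=%.1f%n",
                    brokerMsgsPerSec, brokerMBps, bookieMBps);
        }
    }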
  10. Hardware Evolution: Brokers — Pre-2015: 8-core, 24GB RAM, 1G NIC. 2015: 12-core, 48GB RAM, 10G NIC. 2020: 12-core, 96GB RAM, 10G or 25G NIC.
  11. Hardware Evolution: BookKeepers — Pre-2015: 12-core, 32GB RAM, 1G NIC, 12 x 300GB 15K RPM SAS drives (2 x HW RAID-10 of 6 drives). 2015: 12-core, 64GB RAM, 10G NIC, 10 x 4TB 7.2K RPM SATA plus 2 x 120GB SSD (1 x HW RAID-10 of the 10 drives, 1 x HW RAID-1 of the 2 SSDs). 2020: 36-core, 192GB RAM, 25G NIC, 4 x 4TB NVMe, 2 x 128GB Optane Persistent Memory.
  12. Hardware Evolution: ZooKeepers — Pre-2015: 8-core, 24GB RAM, 1G NIC, 240GB SSD. 2015: 12-core, 64GB RAM, 10G NIC, 2 x 240GB SSD. 2020: 12-core, 64GB RAM, 10G NIC, 2 x 960GB SSD.
  13. JVM Garbage Collector [charts on AdoptOpenJDK 11 comparing G1GC on Pulsar 1.x, G1GC on Pulsar 2.x, and ZGC on Pulsar 2.x: GC pause and GC events in the previous minute, GC time and GC events in the previous minute, and safepoint pause and safepoint events in the previous minute]
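The per-minute GC counters graphed on that slide can be sampled from the standard JDK management beans; collector choice itself is a JVM flag (-XX:+UseG1GC, or -XX:+UnlockExperimentalVMOptions -XX:+UseZGC on JDK 11, where ZGC was still experimental). A minimal sampler sketch:

    // Sketch: report GC events and GC time for the previous minute using
    // the standard GarbageCollectorMXBean (bean names differ per collector).
    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcSampler {
        public static void main(String[] args) throws InterruptedException {
            long prevCount = 0, prevTime = 0;
            while (true) {
                long count = 0, time = 0;
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    count += gc.getCollectionCount(); // cumulative GC events
                    time += gc.getCollectionTime();   // cumulative GC time, ms
                }
                System.out.printf("gc events prev minute=%d, gc time prev minute=%d ms%n",
                        count - prevCount, time - prevTime);
                prevCount = count;
                prevTime = time;
                Thread.sleep(60_000);
            }
        }
    }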
  14. Metrics and Monitoring
  15. Metrics and Monitoring: Too Much Data
  16. Metrics and Monitoring: Tenant Metrics [diagram: a collector polls the brokers' /admin/destinations endpoint and publishes the per-tenant stats to a topic consumed by a monitor]
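A hedged sketch of that collector stage using the Java admin and client APIs: poll per-topic stats and republish them to a monitoring topic. The service URLs, namespace, and topic names are placeholders:

    // Sketch: poll topic stats from the admin API, forward to a monitor topic.
    import org.apache.pulsar.client.admin.PulsarAdmin;
    import org.apache.pulsar.client.api.Producer;
    import org.apache.pulsar.client.api.PulsarClient;

    public class TenantMetricsCollector {
        public static void main(String[] args) throws Exception {
            PulsarAdmin admin = PulsarAdmin.builder()
                    .serviceHttpUrl("http://broker.example.com:8080").build();
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://broker.example.com:6650").build();
            Producer<byte[]> producer = client.newProducer()
                    .topic("monitoring/metrics/tenant-stats").create();

            for (String topic : admin.topics().getList("tenant-one/ns1")) {
                // Serialize however the monitor expects; toString() as a stand-in.
                String line = topic + " " + admin.topics().getStats(topic);
                producer.send(line.getBytes());
            }
            producer.close();
            client.close();
            admin.close();
        }
    }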
  17. Metrics and Monitoring: Tenant Metrics (continued)
  18. Deployment ● No downtime; manage risk ● Staged, sequenced ● Low parallelism ● Managed by Screwdriver jobs (https://docs.screwdriver.cd/), an open-source build platform designed for continuous delivery ● Screwdriver launches an Ansible-like tool
  19. Deployment — deploy order and post-deploy checks: 1. ZooKeeper (not prod): rejoined quorum. 2. BookKeeper: port up, ZK node present, bookiesanity. 3. Broker: ports up, sanity check. (A sketch of such checks follows.)
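A minimal sketch of the bookie checks named above; the hostnames, ports, and znode path are assumptions (the /ledgers/available path is BookKeeper's default registration root), and the bookiesanity write/read round trip is left to the bookkeeper shell:

    // Sketch: post-deploy bookie checks — client port open, ZK registration present.
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import org.apache.zookeeper.ZooKeeper;

    public class PostDeployCheck {
        static boolean portOpen(String host, int port) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 2000);
                return true;
            } catch (Exception e) {
                return false;
            }
        }

        public static void main(String[] args) throws Exception {
            System.out.println("bookie port: " + portOpen("bookie1.example.com", 3181));
            // Did the bookie re-register its ephemeral znode after the reimage?
            ZooKeeper zk = new ZooKeeper("zk1.example.com:2181", 10_000, event -> {});
            System.out.println("zk node: "
                    + (zk.exists("/ledgers/available/bookie1.example.com:3181", false) != null));
            zk.close();
        }
    }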
  20. 20. { "namespaces" : [ "tenant-one/.*" ], "primary" : [ "broker6[7-9].example.com" ], "secondary" : [ "none" ], "auto_failover_policy" : { "policy_type" : "min_available", "parameters" : { "min_limit" : "0", "usage_threshold" : "100" } } } 20 Broker Isolation Policies Uses ● High Profile/Reserved capacity ● Misbehaving tenants ● Debugging
  21. BookKeeper Storage Utilization — factors and configuration impacting cluster storage utilization: ● number of topics × write throughput ● increased write quorum ● increased topic TTL ● increased retention period ● compaction thresholds and intervals ● over-replication ● MinLedgerRolloverTime, MaxLedgerRolloverTime, CursorRolloverTime ● crossing the BookKeeper compaction threshold. (A rough footprint sketch follows.)
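Most of those factors multiply together. A rough, hedged estimate of steady-state bookie footprint; every input below is an illustrative placeholder:

    // Sketch: steady-state storage ≈ write rate × write quorum × retention window,
    // inflated by space not yet reclaimed by BookKeeper GC/compaction.
    public class StorageEstimate {
        public static void main(String[] args) {
            double writeMBps = 100;     // aggregate write throughput across topics
            int writeQuorum = 2;        // each entry stored writeQuorum times
            double retentionHours = 48; // TTL/retention window, whichever dominates
            double gcSlack = 1.3;       // assumed overhead awaiting compaction

            double tb = writeMBps * writeQuorum * retentionHours * 3600 * gcSlack
                    / (1024.0 * 1024.0);
            System.out.printf("approximate cluster footprint: %.1f TB%n", tb);
        }
    }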
  22. BookKeeper Rack Awareness — rack as a failure domain: 2 logical racks vs. N logical racks [diagram: example ensemble placements of bookies A-F across logical racks under each scheme]
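Rack-aware placement requires the brokers to know each bookie's rack. A hedged sketch of tagging bookies with logical racks via the Java admin client (brokers must also enable bookkeeperClientRackawarePolicyEnabled in broker.conf); hostnames and rack names are placeholders:

    // Sketch: assign bookies to logical racks so ensembles spread across
    // failure domains; more logical racks means wider spread per write quorum.
    import org.apache.pulsar.client.admin.PulsarAdmin;
    import org.apache.pulsar.common.policies.data.BookieInfo;

    public class TagBookieRacks {
        public static void main(String[] args) throws Exception {
            PulsarAdmin admin = PulsarAdmin.builder()
                    .serviceHttpUrl("http://broker.example.com:8080").build();
            admin.bookies().updateBookieRackInfo("bookie1.example.com:3181", "default",
                    BookieInfo.builder().rack("rack-1").hostname("bookie1.example.com").build());
            admin.bookies().updateBookieRackInfo("bookie2.example.com:3181", "default",
                    BookieInfo.builder().rack("rack-2").hostname("bookie2.example.com").build());
            admin.close();
        }
    }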
  23. Thank you. Ludwig Pummer ludwig@verizonmedia.com, Joe Francis joef@verizonmedia.com
