
Apache Pulsar at Yahoo! Japan

Nozomi from Yahoo! Japan presents how Yahoo! Japan uses Apache Pulsar to build its internal messaging platform, which processes tens of billions of messages every day. He explains why Yahoo! Japan chose Pulsar, what its use cases for Apache Pulsar are, and its best practices.

#PulsarBeijingMeetup

Published in: Internet


  1. Apache Pulsar at Yahoo! JAPAN
     Yahoo Japan Corporation, Nozomi Kurihara, Aug. 17, 2019
  2. Who am I? Nozomi Kurihara
     • Software engineer at Yahoo! JAPAN (April 2012 ~)
     • Working on internal messaging platform using Apache Pulsar
     • Committer of Apache Pulsar
     • (Hobby: Board / video games!)
     Copyright (C) 2018 Yahoo Japan Corporation. All Rights Reserved.
  3. Agenda
     1. What is Apache Pulsar?
     2. Why did Yahoo! JAPAN choose Apache Pulsar?
     3. How does Yahoo! JAPAN use Apache Pulsar?
     4. Future plans
  4. What is Apache Pulsar?
  5. Apache Pulsar
     Flexible pub-sub system backed by durable log storage
     ▪ History:
       › 2014 Development started at Yahoo! Inc.
       › 2015 Available in production at Yahoo! Inc.
       › Sep. 2016 Open-sourced (Apache License 2.0)
       › June 2017 Moved to Apache Incubator Project
       › Sep. 2018 Graduated as Top Level Project!
     ▪ Users: Verizon Media (Yahoo! Inc.), Comcast, The Weather Channel, Mercado Libre, Streamlio, Yahoo! JAPAN, etc.
     ▪ Competitors: Apache Kafka, RabbitMQ, Apache ActiveMQ, Apache RocketMQ, etc.
  6. Pub-Sub messaging
     Message transmission from one system to another via a Topic
     ▪ Producers publish messages to Topics
     ▪ Consumers receive messages only from Topics to which they subscribe
     ▪ Decoupled (no need to know each other) → asynchronous, scalable, resilient
     [Diagram: a Producer publishes a message (log, notification, etc.) to a Topic in the pub-sub system; Consumers 1, 2, and 3 subscribe to the Topic.]
  7. Architecture
     ■ 3 components:
       ‣ Broker
       ‣ Bookie
       ‣ ZooKeeper
     [Diagram: Producer, Consumer, Brokers 1-3, Bookies 1-3, Local ZK (Pulsar Cluster), and Configuration Store (Global ZK).]
  8. Architecture - Broker
     ■ Broker
       ‣ Serving node for clients' requests
       ‣ No data locality (stateless)
     [Diagram: cluster architecture, as on slide 7.]
  9. Architecture - Bookie
     Apache BookKeeper: distributed write-ahead log system
     ■ Bookie (Apache BookKeeper)
       ‣ Storage node for messages
       ‣ Durable, Scalable, Consistent, Fault-tolerant, Low-latency
     [Diagram: cluster architecture, as on slide 7.]
     Copyright © 2016 - 2018 The Apache Software Foundation, licensed under the Apache License, version 2.0.
  10. Architecture - ZooKeeper
      ■ Apache ZooKeeper
        ‣ Stores metadata and configuration
        ‣ Local ZK: within local cluster
        ‣ Configuration Store: across all clusters
      [Diagram: cluster architecture, as on slide 7.]
  11. Why did Yahoo! JAPAN choose Apache Pulsar?
  12. Yahoo! JAPAN https://www.yahoo.co.jp/
  13. Yahoo! JAPAN – 3 numbers
      • 100+ services
      • 150,000+ servers (real)
      • 93,000,000+ Unique Browsers (avg in 2018/7-9)
      (image: aflo)
  14. Why did Yahoo! JAPAN choose Pulsar?
      ▪ Large number of customers → High performance & scalability
      ▪ Large number of services → Multi-tenancy
      ▪ Sensitive/mission-critical messages → Durability
      ▪ Multiple data centers → Geo-replication
      Pulsar meets all these requirements!
  15. Scalability
      Just adding Brokers/Bookies increases serving/storage capacity!
      (no special operation, e.g. data rebalancing, is required)
      [Diagram: cluster architecture, as on slide 7, with Broker X added for more serving capacity and Bookie Y added for more storage capacity.]
  16. Multi-tenancy
      Multiple services can share one Pulsar system
      ▪ Just use Pulsar as a "Tenant" → no need to maintain own messaging system
      ▪ Authentication/Authorization mechanism protects messages from interception
      [Diagram: Services A-D, each with a Producer and a Consumer, use Topics A-D in one Pulsar system; Authentication/Authorization blocks unauthorized access.]
  17. Geo-replication
      Pulsar can replicate messages to another cluster
      1. Producers only have to publish messages to Pulsar in the same data center
      2. Pulsar asynchronously replicates messages to another cluster
      3. Consumers can receive messages from the same data center
      [Diagram: in Data center A, a Producer publishes to a Topic on Pulsar Cluster A, which local Consumers read; geo-replication copies the Topic to Pulsar Cluster B in Data center B, which its local Consumers read.]
  18. How does Yahoo! JAPAN use Apache Pulsar?
  19. System architecture in Yahoo! JAPAN
      [Diagram: East and West clusters, each with Broker, Bookie, ZK, and WebSocket Proxy, connected by geo-replication; clients are Service A (Node.js), Service B (Java), and Service C (C++); Prometheus + Grafana collect and visualize metrics.]
      For each cluster:
      ・20 WSs
      ・15 Brokers
      ・10 Bookies
      ・5 ZKs
  20. Users
      More and more services are starting to use Pulsar!
      • 210+ tenants
      • 4000+ topics
      • ~100K publishes/s
      • ~180K subscribes/s
      Typical use cases:
      • Notification
      • Job queueing
      • Log pipeline
  21. Case 1 – Notification of content updates
      ▪ Various content files are pushed from partner companies to Yahoo! JAPAN
      ▪ A notification is sent to a topic when content is updated
      ▪ Once services receive the notification, they fetch the content from the file server
      [Diagram: partner companies (weather, map, news, etc.) push files to an FTP server (ftpd); ① a notification is sent through a Producer to a Pulsar Topic; ② Consumers in Services A, B, and C receive the notification; ③ the services fetch the content files.]
  22. Case 2 – Job queueing in mail service
      ▪ Indexing of mail can be heavy → it can be executed asynchronously
      ▪ Producers register jobs to Pulsar
      ▪ Consumers take jobs from Pulsar at their own pace
      [Diagram: on a request, a Mail BE server's Producer registers a job on a Pulsar Topic; the Consumer in the handler for indexing takes and processes the job, and the job is re-registered if it fails.]
  23. Case 3 – Log pipeline
      ▪ Publisher: computing platforms on which YJ applications are running
      ▪ Subscriber: data platforms (monitoring, analyzing, storing etc.)
      [Diagram: service developers deploy apps (app1, app2, app3, …) to the computing PFs (PaaS, CaaS containers) and check logs; the computing PFs publish logs to the Pulsar topics PaaS_logs and CaaS_logs; the data PFs (Monitoring, Analyzing, Storing, …) subscribe.]
  24. Case 3 – Log pipeline
      Logs can have destinations → Consumers need to filter them
      [Diagram: same pipeline as slide 23; each data PF filters the logs it receives by destination (To: Monitoring, To: Analyzing, …) and discards the rest.]
  25. Case 3' – Log pipeline + filtering (Future plan)
      ▪ Filtering on the Pulsar side
      ▪ Pulsar Functions are helpful for filtering!
      [Diagram: inside Pulsar, Pulsar Functions filter PaaS_logs and CaaS_logs into separate outputs (For Monitoring, For Analyzing, For Storing), which the Monitoring, Analyzing, and Storing data PFs consume.]
  26. Migration from Kafka
      ▪ We have an internal FaaS system using Apache OpenWhisk
      ▪ Problem: FaaS team had to maintain Apache Kafka
      ▪ Solution: migrate from Kafka to our internal Pulsar
      ▪ Pulsar Kafka Wrapper needs only a few configuration changes (pom.xml, topic name, etc.)

         <dependency>
        -  <groupId>org.apache.kafka</groupId>
        -  <artifactId>kafka-clients</artifactId>
        -  <version>0.10.2.1</version>
        +  <groupId>org.apache.pulsar</groupId>
        +  <artifactId>pulsar-client-kafka</artifactId>
        +  <version>2.4.0</version>
         </dependency>
  27. Future plans
  28. Node.js Client
      Node.js users can easily use Pulsar!
      Implementation:
      • https://github.com/apache/pulsar-client-node
      • Based on C++ Client
      Done:
      ✅ basic functionalities (producer, consumer, reader)
      ✅ test codes
      ✅ performance scripts
      Todo:
      • publish to npm registry
      • fix release flow
      • support more features (multi-topic consume, unack etc.)
  29. Admin WebUI (under development)
      Administrators can easily and intuitively manage Pulsar topics!
      Implementation:
      • https://gist.github.com/massakam/8e9bd3ca62874f18cf3ce3ecb6db1473
      • Vue.js + Express
      Done:
      ✅ basic pages (tenants, namespaces, topics etc.)
      Todo:
      • open repository
      • advanced commands (unload, skip-messages etc.)
      • authentication to Broker
  34. Looking for the best name! pulsar-ui? pulsar-console? pulsar-manager?
  35. Conclusion
  36. Conclusion
      ▪ Apache Pulsar is a fast, durable, scalable, multi-tenant messaging platform with useful built-in features such as geo-replication and Pulsar Functions
      ▪ Yahoo! JAPAN uses it as a centralized platform for various services
      ▪ The Node.js Client is already open-sourced and the Admin UI will come soon
      Contributions are welcome, because it's open source!
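
Case 2 (slide 22) describes jobs that are registered on a topic, taken by consumers at their own pace, and re-registered if processing fails. The sketch below illustrates only that retry pattern with Python's standard `queue` module; it is an in-memory toy model, not the Pulsar client API. The names `run_worker`, `make_flaky_indexer`, and `max_attempts` are hypothetical and do not come from the slides.

```python
import queue


def run_worker(jobs, handler, max_attempts=3):
    """Drain the queue, re-registering a job whenever the handler fails.

    Each queue entry is a (job_id, attempt) pair. A job that still fails
    on its last allowed attempt is dropped (an assumption for this sketch).
    """
    done, dropped = [], []
    while True:
        try:
            job_id, attempt = jobs.get_nowait()
        except queue.Empty:
            break
        try:
            handler(job_id)
            done.append(job_id)
        except Exception:
            if attempt + 1 < max_attempts:
                # "Re-register if it fails": put the job back on the queue.
                jobs.put((job_id, attempt + 1))
            else:
                dropped.append(job_id)
    return done, dropped


def make_flaky_indexer(fail_once):
    """Hypothetical indexing handler that fails once for the given job ids."""
    seen = set()

    def index_mail(job_id):
        if job_id in fail_once and job_id not in seen:
            seen.add(job_id)
            raise RuntimeError(f"indexing failed for {job_id}")

    return index_mail


# Producers register jobs; here three mail ids are queued up front.
jobs = queue.Queue()
for mail_id in ("mail-1", "mail-2", "mail-3"):
    jobs.put((mail_id, 0))

done, dropped = run_worker(jobs, make_flaky_indexer({"mail-2"}))
print(done, dropped)
```

In this run, `mail-2` fails on its first attempt, goes back to the end of the queue, and succeeds after `mail-3`, so a slow or failing job does not block the ones behind it.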

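
Slides 24 and 25 describe logs that carry a destination and need to be filtered, either by each consumer or, as a future plan, on the Pulsar side. The sketch below shows the routing logic alone in plain Python. It does not use the Pulsar Functions API, and the log layout (`to`, `body`), the `ROUTES` table, and the output topic names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from a log's destination tag to an output topic name.
ROUTES = {
    "monitoring": "for_monitoring",
    "analyzing": "for_analyzing",
    "storing": "for_storing",
}


def route_logs(logs):
    """Group log bodies by output topic; unknown destinations are discarded."""
    topics = defaultdict(list)
    for log in logs:
        topic = ROUTES.get(log.get("to"))
        if topic is not None:
            topics[topic].append(log["body"])
    return dict(topics)


logs = [
    {"to": "monitoring", "body": "cpu=93%"},
    {"to": "analyzing", "body": "GET /index 200"},
    {"to": "debug", "body": "noise"},  # no route: discarded
    {"to": "monitoring", "body": "mem=71%"},
]
routed = route_logs(logs)
print(routed)
```

Doing this once on the platform side, rather than in every consumer, means each data platform receives only the logs addressed to it.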