What's New in Apache Pulsar 2.4.0

We are very glad to see that the Apache Pulsar community has released Pulsar 2.4.0 after a few months of hard work. It is a great milestone for this fast-growing project and for the whole Pulsar community. Sijie shares a selection of the most interesting and important features the community added in this release.

  1. Sijie Guo (@sijieg): What’s new in Pulsar 2.4.0
  2. Who am I
      ❏ Pulsar PMC Member
      ❏ BookKeeper PMC Chair
      ❏ Ex-Twitter, Ex-Yahoo
      ❏ Interested in event streaming technologies
  3. Pulsar Releases
      ❏ 2.2.0 - 10/27/2018
      ❏ 2.2.1 - 12/14/2018
      ❏ 2.3.0 - 02/22/2019
      ❏ 2.3.1 - 04/12/2019
      ❏ 2.3.2 - 05/02/2019
      ❏ 2.4.0 - 06/29/2019
  4. Event Streaming Platform
  5. Highlights of Pulsar 2.4.0
      ❏ Key_Shared Subscription
      ❏ Delayed/Scheduled Messages
      ❏ Replicated Subscription
      ❏ Kerberos Authentication
      ❏ Configurable MaxMessageSize
      ❏ Go Functions
      ❏ Schema Enhancements
  6. Key_Shared Subscription (1)
  7. Key_Shared Subscription (2)
      ❏ Existing subscription modes
          ❏ Streaming modes: Exclusive / Failover
              ❏ Pro: partition-based ordered consumption
              ❏ Con: consumption parallelism is limited by the number of partitions
          ❏ Queuing mode: Shared
              ❏ Pro: scales beyond the number of partitions
              ❏ Con: unordered consumption
  8. Key_Shared Subscription (3)
  9. Key_Shared Subscription (4)
      ❏ Key-based ordering
          ❏ The key can be the message key or a separate *order* key
      ❏ HashRing-based routing
      ❏ Key-based batcher
      ❏ Policies for messages without *keys*
      https://github.com/apache/pulsar/wiki/PIP-34:-Add-new-subscribe-type-Key_shared
  10. Key_Shared Subscription (5)
      ❏ Future tasks
          ❏ Sticky Key-Range Consumer
              ❏ Use case: EO & auto-scaling for Flink
          ❏ Consumer selector policy
          ❏ Consumer priority
          ❏ Auto scale up-and-down partitions (*)
      https://github.com/apache/pulsar/issues/4077
  11. Delayed / Scheduled Messages (1)
      deliverAfter:
          // held by the broker for 3 minutes before it can be dispatched
          producer.newMessage()
                  .deliverAfter(3L, TimeUnit.MINUTES)
                  .value("Hello Pulsar after 3 minutes!")
                  .send();
      deliverAt:
          // dispatched at an absolute time, given in epoch milliseconds
          producer.newMessage()
                  .deliverAt(new Date(...).getTime())
                  .value("Hello Pulsar at ...")
                  .send();
      https://github.com/apache/pulsar/wiki/PIP-26:-Delayed-Message-Delivery
  12. Delayed / Scheduled Messages (2)
      ❏ DelayedDeliveryTracker abstraction
          ❏ In-memory priority queue implementation
      ❏ Plan for memory and resource usage
          ❏ A persistent hashed-wheel implementation in 2.5.0+
      https://github.com/apache/pulsar/wiki/PIP-26:-Delayed-Message-Delivery
  13. Replicated Subscription (1)
      ❏ Problem
          ❏ Data are replicated asynchronously between regions
          ❏ Subscriptions are local to regions
          ❏ Subscription state is not replicated across regions
          ❏ Moving a subscription from one region to another requires seeking back by time
  14. Replicated Subscription (2)
      ❏ Solution: replicate subscription state
          ❏ Establish an association between message IDs from different regions
          ❏ Distributed snapshots
              ❏ Snapshots are stored as “marker” messages
              ❏ Marker messages are inline with other messages and are replicated across regions
  15. Replicated Subscription (3)
      ❏ Enable replicated subscription
      https://github.com/apache/pulsar/wiki/PIP-33:-Replicated-subscriptions
  16. Kerberos Authentication
      ❏ PIP-30: Mutual Authentication
  17. Configurable MaxMessageSize
      ❏ MaxMessageSize was fixed at 5 MB
      ❏ Introduce a new broker setting `maxMessageSize`
      ❏ Introduce a new field in the client protocol, `max_message_size`
      ❏ The client discovers the broker's max_message_size while connecting and sizes its batch buffer accordingly
      https://github.com/apache/pulsar/wiki/PIP-36%3A-Max-Message-Size
  18. Go Functions
  19. Schema Enhancements
      ❏ Schema versioning support
      ❏ Transitive schema compatibility check strategies
          ❏ BACKWARD_TRANSITIVE, FORWARD_TRANSITIVE, FULL_TRANSITIVE
      ❏ SchemaBuilder, RecordBuilder
      ❏ AUTO_CONSUME
      ❏ KeyValue Schema
  20. Pulsar Ecosystem
      ❏ Clients
      ❏ Connectors
      ❏ Pulsar UI & management tools
      ❏ Tracing integration
      ❏ Camel integration
      ❏ Big data ecosystem integration
  21. Clients
      ❏ Golang: https://github.com/apache/pulsar-client-go
          ❏ @merlimat @wolfstudy
      ❏ Node.js: https://github.com/apache/pulsar-client-node
          ❏ Yahoo Japan
      ❏ Ruby: https://github.com/apache/pulsar-client-ruby
      ❏ Rust: https://github.com/wyyerd/pulsar-rs
  22. Connectors
      ❏ Flume Source and Sink #3597
      ❏ Redis Sink #3700
      ❏ Solr Sink #3885
      ❏ RabbitMQ Sink #3967
      ❏ InfluxDB Sink #4017
      ❏ Flume NG Pulsar Sink
      ❏ Elastic Beats output to Pulsar
  23. Pulsar UI & management tools
      ❏ Pulsar Express
      ❏ Pulsar Web UI from Yahoo Japan
      ❏ Pulsar Manager
  24. Tracing Integration
      ❏ Zipkin Pulsar Transport: https://github.com/openzipkin/zipkin/issues/2297
      ❏ SkyWalking Pulsar integration (by Zhaopin.com)
  25. Camel Integration
      ❏ Contributed by The Hut Group
      ❏ Available in Camel 3 and Camel 2.24
      ❏ A feather in their cap
  26. Pulsar-Spark (1)
      ❏ Spark Structured Streaming connector
      ❏ Spark SQL support
      ❏ Integration with Pulsar schema
      https://github.com/streamnative/pulsar-spark
  27. Pulsar-Spark (2) - Streaming Queries
      https://github.com/streamnative/pulsar-spark
  28. Pulsar-Spark (3) - Batch Queries
      https://github.com/streamnative/pulsar-spark
  29. Pulsar-Spark (4) - Schema of Pulsar Source
      ❏ Topics without schema or with primitive schemas
          ❏ `value` field for message payload
      ❏ Topics with struct schemas (AVRO, JSON)
          ❏ Field names and types are kept in the row
      ❏ Metadata fields
          ❏ __key: Binary
          ❏ __topic: String
          ❏ __messageId: Binary
          ❏ __publishTime: Timestamp
          ❏ __eventTime: Timestamp
      https://github.com/streamnative/pulsar-spark
  30. Pulsar-Spark (5) - Schema Examples
      ❏ Primitive Schema
      ❏ Avro Schema
      https://github.com/streamnative/pulsar-spark
  31. Pulsar-Spark (6) - Write query results to Pulsar
      https://github.com/streamnative/pulsar-spark
  32. BigData Ecosystem Integration
      ❏ pulsar-flink
          ❏ Schema integration, State integration
      ❏ pulsar-hive
          ❏ External tables, managed tables
          ❏ Partitioned tables
  33. Community
      ❏ Recurring Pulsar Meetups
      ❏ Conferences
          ❏ KubeCon Pulsar Booth from Yahoo Japan
          ❏ ApacheCon 2019: 5 Pulsar talks accepted
          ❏ Flink Forward SF & Berlin 2019
          ❏ QCon, ArchSummit, …
  34. 2.5.0, Ecosystem & More
      ❏ Transaction support
      ❏ Batch receive interface
      ❏ HDFS Offloader
      ❏ Columnar Offloader
      ❏ Auto partition scaling up-and-down
      ❏ Schema & EO support in pulsar-flink integration
      ❏ Go Functions (metrics, secrets, …)
      ❏ Kafka Gateway
  35. Resources
      ❏ Twitter: @apache_pulsar / @sijieg / @streamnativeio
      ❏ Slack: https://apache-pulsar.herokuapp.com
      ❏ 2.4.0 Release Notes
      ❏ What’s New in Apache Pulsar 2.4.0
      ❏ Pulsar Spark SQL & Structured Streaming Connector
      ❏ Curated list of Pulsar resources
  36. Thanks!
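
The Key_Shared Subscription (4) slide lists key-based ordering and HashRing-based routing: every message with the same key goes to the same consumer, while different keys spread across consumers. The sketch below illustrates that idea with a generic consistent-hash ring built on a `TreeMap`. The class `HashRingRouter`, the virtual-node count, and the hash mixing function are assumptions for illustration only; Pulsar's own consumer selector may divide the hash space differently.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical consistent-hash router sketching key-based routing for a
// Key_Shared-style subscription; this is NOT Pulsar's actual consumer selector.
class HashRingRouter {
    // Ring position -> consumer name; each consumer owns several virtual points.
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRingRouter(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Assumed hash: String.hashCode with extra bit mixing to spread similar strings.
    private static int hash(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h;
    }

    void addConsumer(String consumer) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(consumer + "#" + i), consumer);
        }
    }

    void removeConsumer(String consumer) {
        for (int i = 0; i < virtualNodes; i++) {
            // Remove a point only if this consumer still owns it (guards against hash collisions).
            ring.remove(hash(consumer + "#" + i), consumer);
        }
    }

    // The owner of a key is the first ring point at or after the key's hash, wrapping around.
    String select(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no consumers connected");
        }
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        Integer point = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(point);
    }
}
```

Because removing a consumer deletes only that consumer's ring points, keys owned by the remaining consumers keep their owner, so ordering for those keys is not disturbed by the membership change.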
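
The Key_Shared Subscription (4) slide also mentions a key-based batcher. With ordinary batching, one batch can mix several keys, yet a batch is stored and dispatched as a single entry, so it would reach one consumer and break the key-to-consumer mapping. A minimal sketch of a key-based batch container, assuming string keys and payloads; the class `KeyBasedBatcher` is hypothetical and not the client's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical key-based batch container: one batch per key, per-key order preserved.
class KeyBasedBatcher {
    // LinkedHashMap keeps keys in first-seen order so flush output is deterministic.
    private final Map<String, List<String>> byKey = new LinkedHashMap<>();

    void add(String key, String value) {
        byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Emits one batch per key (each batch holds only that key's messages) and clears the buffer.
    List<List<String>> flush() {
        List<List<String>> batches = new ArrayList<>(byKey.values());
        byKey.clear();
        return batches;
    }
}
```

As far as we know, the 2.4 Java client exposes this behaviour through `BatcherBuilder.KEY_BASED` on the producer builder; check the client Javadoc before relying on that name.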
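
The Delayed / Scheduled Messages (2) slide describes a `DelayedDeliveryTracker` abstraction backed by an in-memory priority queue. A minimal sketch of that data structure: a min-heap ordered by delivery time, from which the dispatcher pops only messages whose time has arrived. The class and method names, and the explicit `nowMillis` parameter (used instead of a real clock to keep the example deterministic), are assumptions; only the tracker concept comes from the slide.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical in-memory delayed-delivery tracker: a min-heap keyed by delivery time.
class InMemoryDelayedTracker {
    static final class Scheduled {
        final long deliverAtMillis;
        final long ledgerId;
        final long entryId;

        Scheduled(long deliverAtMillis, long ledgerId, long entryId) {
            this.deliverAtMillis = deliverAtMillis;
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }
    }

    // Earliest delivery time first; ties broken by (ledgerId, entryId) to keep publish order.
    private final PriorityQueue<Scheduled> queue = new PriorityQueue<>(
            Comparator.comparingLong((Scheduled s) -> s.deliverAtMillis)
                    .thenComparingLong(s -> s.ledgerId)
                    .thenComparingLong(s -> s.entryId));

    // Returns false when the message is already due, so the caller can dispatch it right away.
    boolean addMessage(long ledgerId, long entryId, long deliverAtMillis, long nowMillis) {
        if (deliverAtMillis <= nowMillis) {
            return false;
        }
        queue.add(new Scheduled(deliverAtMillis, ledgerId, entryId));
        return true;
    }

    boolean hasMessageAvailable(long nowMillis) {
        Scheduled head = queue.peek();
        return head != null && head.deliverAtMillis <= nowMillis;
    }

    // Pops up to maxMessages entries whose delivery time has passed, earliest first.
    List<Scheduled> getScheduledMessages(int maxMessages, long nowMillis) {
        List<Scheduled> due = new ArrayList<>();
        while (due.size() < maxMessages && hasMessageAvailable(nowMillis)) {
            due.add(queue.poll());
        }
        return due;
    }

    int size() {
        return queue.size();
    }
}
```

Every delayed message occupies heap memory until it is due, which is why the slide flags memory planning and a persistent hashed-wheel implementation as follow-up work.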
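
The Replicated Subscription (2) slide describes associating message IDs across regions through snapshots carried as marker messages. A simplified model of how such snapshots can be used: each completed snapshot pairs a local position with the corresponding position in a remote region, and once the local subscription has acknowledged everything up to a snapshot's local position, the remote subscription can move to that snapshot's remote position. Positions are reduced to plain `long`s and the class `SubscriptionSnapshots` is hypothetical; PIP-33 defines the actual protocol. On the client side, our understanding is that the 2.4 Java client enables this per consumer with `replicateSubscriptionState(true)` on the consumer builder.

```java
import java.util.ArrayDeque;

// Hypothetical model of replicated-subscription snapshots; positions simplified to longs.
class SubscriptionSnapshots {
    static final class Snapshot {
        final long localPosition;
        final long remotePosition;

        Snapshot(long localPosition, long remotePosition) {
            this.localPosition = localPosition;
            this.remotePosition = remotePosition;
        }
    }

    // Completed snapshots, oldest first; local positions strictly increase.
    private final ArrayDeque<Snapshot> pending = new ArrayDeque<>();

    void addSnapshot(long localPosition, long remotePosition) {
        Snapshot last = pending.peekLast();
        if (last != null && localPosition <= last.localPosition) {
            throw new IllegalArgumentException("snapshots must arrive in local-position order");
        }
        pending.addLast(new Snapshot(localPosition, remotePosition));
    }

    // Called when the local mark-delete position advances (everything <= it is acknowledged).
    // Returns the remote position the remote subscription may move to,
    // or -1 if no new snapshot has been fully passed.
    long onLocalMarkDelete(long localMarkDelete) {
        long remote = -1;
        while (!pending.isEmpty() && pending.peekFirst().localPosition <= localMarkDelete) {
            remote = pending.pollFirst().remotePosition;
        }
        return remote;
    }
}
```

In this model the remote cursor only jumps to the last snapshot the local cursor has fully passed, so it lags slightly behind: after a failover some messages may be delivered again, but the cursor is never moved past unacknowledged data.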
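
The Configurable MaxMessageSize slide says the client learns the broker's `max_message_size` when it connects and sizes its batch buffer accordingly. A minimal sketch of that sizing rule: a batch accepts payloads until adding one more would exceed the advertised limit, at which point the caller flushes. The class `SizeBoundedBatch` is hypothetical, and it counts payload bytes only; a real client also has to account for per-message metadata and the batch header.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch buffer bounded by the max message size the broker advertised at connect time.
class SizeBoundedBatch {
    private final int maxMessageSize;
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;

    SizeBoundedBatch(int maxMessageSize) {
        this.maxMessageSize = maxMessageSize;
    }

    // Returns false if the payload would push the batch past maxMessageSize;
    // the caller should then flush() and retry. Oversized single payloads are rejected.
    boolean tryAdd(byte[] payload) {
        if (payload.length > maxMessageSize) {
            throw new IllegalArgumentException("payload of " + payload.length
                    + " bytes exceeds max message size " + maxMessageSize);
        }
        if (pendingBytes + payload.length > maxMessageSize) {
            return false;
        }
        pending.add(payload);
        pendingBytes += payload.length;
        return true;
    }

    // Hands the current batch to the caller (e.g. to send) and resets the buffer.
    List<byte[]> flush() {
        List<byte[]> batch = new ArrayList<>(pending);
        pending.clear();
        pendingBytes = 0;
        return batch;
    }

    int pendingBytes() {
        return pendingBytes;
    }
}
```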
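
The Schema Enhancements slide introduces transitive compatibility strategies. The difference from the non-transitive strategy is which versions a new schema is checked against: BACKWARD checks only the latest registered version, while BACKWARD_TRANSITIVE checks every previous version. The sketch below uses a deliberately simplified schema model (a map from field name to whether the field has a default) and a simplified reader rule; real checks on Avro schemas also compare types and more, so treat `SchemaCompat` as hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified schema model: field name -> whether the field has a default.
class SchemaCompat {
    // Simplified rule: a reader schema can decode data written with a writer schema
    // if every reader field exists in the writer schema or has a default value.
    static boolean canRead(Map<String, Boolean> reader, Map<String, Boolean> writer) {
        for (Map.Entry<String, Boolean> field : reader.entrySet()) {
            boolean inWriter = writer.containsKey(field.getKey());
            boolean hasDefault = field.getValue();
            if (!inWriter && !hasDefault) {
                return false;
            }
        }
        return true;
    }

    // BACKWARD: the candidate must read data written with the latest existing version only.
    static boolean backward(Map<String, Boolean> candidate, List<Map<String, Boolean>> history) {
        return history.isEmpty() || canRead(candidate, history.get(history.size() - 1));
    }

    // BACKWARD_TRANSITIVE: the candidate must read data written with every previous version.
    static boolean backwardTransitive(Map<String, Boolean> candidate, List<Map<String, Boolean>> history) {
        for (Map<String, Boolean> old : history) {
            if (!canRead(candidate, old)) {
                return false;
            }
        }
        return true;
    }
}
```

With v1 = {a}, v2 = {a, b with default} and v3 = {a, b without default}, v3 passes BACKWARD (it can read v2 data) but fails BACKWARD_TRANSITIVE, because it cannot decode v1 data that lacks `b`.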
