
How Zhaopin contributes to Pulsar community


At the Apache Pulsar Beijing Meetup, Penghui Li and Bo Cong from Zhaopin.com shared how they work with the Pulsar community and how they contribute features to it. They also gave a deep dive into the features they have contributed.


  1. Zhaopin in the Pulsar community. Penghui Li (李鹏辉), messaging platform leader at zhaopin.com, Apache Pulsar Committer
  2. Our team: Penghui Li, Bo Cong
  3. Apache Pulsar in zhaopin.com: 2018/08 first service for online applications; 2018/10 1 billion messages/day; 2019/02 6 billion/day; 2019/08 20 billion/day; 50+ namespaces, 5000+ topics
  4. Agenda: 1. Features Zhaopin contributed to the community 2. Details of Key_shared subscription 3. Releasing Pulsar 4. Details of Pulsar's multiple schema versions 5. Details of the HDFS offloader
  5. Dead letter topic: when a message fails processing too many times, it is routed from the original topic to a separate Topic-DLQ instead of being redelivered again.
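The routing decision on this slide can be sketched as a small model. This is an illustration only, not Pulsar's implementation (the real client exposes it through a dead letter policy on the consumer); `maxRedeliverCount` and the `-DLQ` topic suffix follow the slide's diagram.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of dead-letter routing: after a message fails
// processing maxRedeliverCount times, it is routed to "<topic>-DLQ"
// instead of being redelivered to the original topic again.
class DeadLetterModel {
    private final int maxRedeliverCount;
    private final Map<Long, Integer> failures = new HashMap<>();

    DeadLetterModel(int maxRedeliverCount) {
        this.maxRedeliverCount = maxRedeliverCount;
    }

    /** Records one processing failure; returns the topic the message goes to next. */
    String onProcessingFailed(String topic, long messageId) {
        int count = failures.merge(messageId, 1, Integer::sum);
        return count >= maxRedeliverCount ? topic + "-DLQ" : topic;
    }
}
```

The per-message failure counter is the essential state: once it crosses the limit, the consumer stops retrying and hands the message off for separate inspection.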
  6. Client interceptors: hooks on the producer Send path and the consumer Receive/Acknowledge path.
  7. Time-partitioned un-ack message tracker: un-acked messages are added to the current partition (p-0 … p-4); when a partition times out, a redelivery request is sent for its messages.
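The partition rotation described above can be sketched like this. This is a simplified model for illustration, not Pulsar's tracker code; "tick" stands in for the timer that advances the current partition.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

// Simplified model of a time-partitioned un-ack tracker: un-acked
// message ids go into the newest partition; each tick rotates the
// partitions, and the oldest (timed-out) partition's messages are
// returned for redelivery.
class TimePartitionedTracker {
    private final ArrayDeque<Set<Long>> partitions = new ArrayDeque<>();

    TimePartitionedTracker(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.addLast(new HashSet<>());
        }
    }

    /** Adds an un-acked message id to the current (newest) partition. */
    void add(long messageId) {
        partitions.peekLast().add(messageId);
    }

    /** Advances time by one partition; returns the ids that timed out. */
    Set<Long> tick() {
        Set<Long> timedOut = partitions.pollFirst();
        partitions.addLast(new HashSet<>());
        return timedOut;
    }
}
```

The point of the design is that timeouts are checked per partition rather than per message, so the tracker never scans every outstanding message.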
  8. Message redelivery optimization: messages redelivered by the broker are merged into the consumer's internal queue.
  9. Key_shared subscription: a new subscription mode in 2.4.0. Messages from all producers are dispatched so that entries with the same key (e.g. <k1,v0> and <k1,v4>) always go to the same consumer on the subscription.
  10. Start with Key_shared subscription: Consumer consumer = client.newConsumer().topic("my-topic").subscriptionName("my-sub").subscriptionType(SubscriptionType.Key_Shared).subscribe();
  11. How Key_shared subscription works: a sticky key dispatcher maps keys into the hash range [0, 65536) and auto-splits it; initially Consumer-1 owns the whole range.
  12. When Consumer-2 joins, the range is split between Consumer-1 and Consumer-2.
  13. When Consumer-3 joins, the range is split again among Consumer-1, Consumer-2, and Consumer-3.
  14. As consumers come and go, the range can end up shared by Consumer-1 and Consumer-4.
  15. And later by Consumer-1, Consumer-3, and Consumer-4.
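The range-based dispatch in these slides can be sketched as follows. This is a simplified, even-split model for illustration only: the actual dispatcher splits ranges incrementally as consumers join and hashes keys with Murmur3 rather than `String.hashCode()`.

```java
import java.util.List;

// Sketch of sticky-key dispatch over the hash range [0, 65536):
// each consumer owns a contiguous slice of the range, and a key is
// dispatched to whichever consumer's slice its hash falls into.
class StickyKeyDispatch {
    static final int HASH_RANGE = 65536;

    /** Maps a key to one of the consumers via its position in the hash range. */
    static String selectConsumer(String key, List<String> consumers) {
        int hash = Math.floorMod(key.hashCode(), HASH_RANGE);
        int slice = HASH_RANGE / consumers.size();
        int index = Math.min(hash / slice, consumers.size() - 1);
        return consumers.get(index);
    }
}
```

While the consumer set is unchanged, the same key always lands on the same consumer, which is what gives Key_shared its per-key ordering guarantee.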
  16. Key-based message batcher: messages <k1,v1> … <k6,v1> arrive spread across partitions p-0, p-1, and p-2.
  17. The key-based batcher groups messages that share a key (e.g. <k2,v0> and <k2,v1>) into the same batch.
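The grouping step can be sketched as below. This is an illustration of the idea, not Pulsar's batcher code; the (key, value) pair representation is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of key-based batching: instead of batching
// messages in arrival order, messages sharing a key are grouped into
// the same batch, so a batch never mixes keys. This keeps batching
// compatible with Key_shared dispatch, where each key belongs to
// exactly one consumer.
class KeyBasedBatcher {
    /** Groups (key, value) pairs into per-key batches, preserving first-seen key order. */
    static Map<String, List<String>> batchByKey(List<String[]> messages) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (String[] kv : messages) {
            batches.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return batches;
    }
}
```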
  18. Use the key-based message batcher (configured on the producer): Producer<byte[]> producer = client.newProducer().topic("my-topic").batcherBuilder(BatcherBuilder.KEY_BASED).create();
  19. Pulsar SQL improvements: ✓ Namespace delimiter rewriter ✓ Partition as internal column ✓ Primitive schema handling ➡ Multiple-version schema handling
  20. Some other improvements: ✓ Service URL provider ✓ Consumer reconnect limiter ➡ Batch message receive
  21. Next: ★ Topic-level policy ★ Sticky consumer
  22. 2.4.0 release: 1. New branch and tag 2. Stage release (check -> sign -> stage) 3. Move master to the new version and write release notes 4. Start vote 5. Promote release and publish 6. Update site and announce the release
  23. Schema versioning & HDFS offloader. Bo Cong (丛搏), message platform engineer at zhaopin.com, Apache Pulsar contributor
  24. The meaning of multi-version schema: a message's schema is not immutable; messages 1 to 5 may carry schema versions 0 to 4.
  25. Problems caused by version changes: Version 0 is class Person { int id; }. Version 1 adds String name with no default and cannot read version-0 data. Version 2 adds @AvroDefault("Zhang San") String name and can read both earlier versions.
  26. Change in compatibility policy: with Backward, version 2 can read version 1 and version 1 can read version 0; with Backward Transitive, version 2 must also read version 0 directly.
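The difference between the two policies can be sketched generically. This is a simplified model, not the schema registry's code: `canRead` stands in for a real schema-compatibility test, and the integer "schemas" in the usage are placeholders.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Simplified model of the two compatibility policies: BACKWARD checks
// the new schema against only the latest existing version, while
// BACKWARD_TRANSITIVE checks it against every existing version.
class CompatibilityCheck {
    static <S> boolean backward(S newSchema, List<S> existing, BiPredicate<S, S> canRead) {
        return existing.isEmpty()
                || canRead.test(newSchema, existing.get(existing.size() - 1));
    }

    static <S> boolean backwardTransitive(S newSchema, List<S> existing, BiPredicate<S, S> canRead) {
        return existing.stream().allMatch(old -> canRead.test(newSchema, old));
    }
}
```

A schema that only reads its immediate predecessor passes BACKWARD but fails BACKWARD_TRANSITIVE, which is exactly the gap the slide's "can read" arrows show.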
  27. Schema creation process: schema data from the admin client/REST API, producer create, or consumer subscribe goes to the SchemaRegistryService, where the new schema is compatibility-checked against the stored schema versions; incompatible versions are rejected.
  28. Multi-version use in Pulsar (Avro schema): messages 1, 2, and 3 carry schema versions 0, 1, and 2, and a single consumer must read them all.
  29. Multi-version use in Pulsar (Avro schema): on read, the consumer checks the ReaderCache for the message's schema version; if it is missing, the SchemaInfoProvider fetches that version's schema from the broker and a new AvroReader is created with new ReflectDatumReader<>(writerSchema, readerSchema). When the writer and reader schemas differ, the reader must resolve between the two.
  30. Multi-version use in Pulsar (auto-consume schema): AUTO_CONSUME supports only AvroSchema and JsonSchema, yielding GenericAvroRecord / GenericJsonRecord with getField; unlike JsonSchema or AvroSchema, the reader only needs the writer schema. Consumer<GenericRecord> consumer = client.newConsumer(Schema.AUTO_CONSUME()).topic("test").subscriptionName("test").subscribe();
  31. The use of schema definition: class Person { @Nullable String name; } SchemaDefinition<Person> schemaDefinition = SchemaDefinition.<Person>builder().withAlwaysAllowNull(false).withPojo(Person.class).build(); Producer<Person> producer = client.newProducer(Schema.AVRO(schemaDefinition)).topic("persistent://public/default/test").create();
  32. Why we need the HDFS offloader: hot/cold data separation. The broker's ManagedLedger keeps hot data in BookKeeper (SSD/HDD, high throughput, low latency) and offloads cold data to HDFS (massive data storage).
  33. Offload topic ledgers to HDFS: each of a topic's ledgers is stored as an index plus data under the relative path tenant/namespace/topic/ledgerId + "-" + uuid.toString().
  34. HDFS offloader storage structure: storage uses org.apache.hadoop.io.MapFile, with an index of entryIDs pointing into data records of (entryID, entryData) pairs.
  35. Configuring the HDFS offloader in broker.conf: managedLedgerOffloadDriver=filesystem offloadersDirectory=./offloaders fileSystemURI=hdfs://127.0.0.1:9000 fileSystemProfilePath=../conf/filesystem_offload_core_site.xml
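With the broker configured as above, offloading can be triggered from the pulsar-admin CLI. These commands require a running cluster; the topic name, namespace, and thresholds below are examples, not values from the talk.

```shell
# Manually offload a topic's ledgers once it exceeds the size threshold
bin/pulsar-admin topics offload --size-threshold 10M persistent://public/default/my-topic

# Check progress of the offload
bin/pulsar-admin topics offload-status persistent://public/default/my-topic

# Or set a namespace-level threshold so offloading happens automatically
bin/pulsar-admin namespaces set-offload-threshold --size 10G public/default
```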
  36. Thanks!
