Jun. 22, 2021

How Tencent Applies Apache Pulsar to Apache InLong: A Streaming Data Integration Platform (Pulsar Summit NA 2021)

As the largest provider of Internet products and services in China, Tencent serves billions of users across the world. Such a huge number of users gives unprecedented value to the data they generate.

Serving as the front line of Tencent Big Data, Apache InLong is a one-stop streaming data integration solution that is mainly responsible for data collection, distribution, preprocessing, and management.

Apache InLong chose Pulsar as its data middleware for its high reliability and for capabilities such as multi-tenancy, read-write separation, cross-region replication, and flexible fault tolerance.

The Tencent Big Data team will share their journey of adopting Pulsar in their core data engine to handle data integration workloads in the tens of billions. They will also share some of the problems they encountered along the way and the improvements they made to Pulsar, as a reference for future Pulsar users.


  1. How Tencent Applies Apache Pulsar to Apache InLong - A Streaming Data Integration Platform
  2. Self Intro: ZIRUI PENG
     • Software engineer, mainly focused on storage
     • Working in the Tencent big data group (my first year of work!)
     • Contributing to Apache InLong, Apache Pulsar, etc.
     • Fascinated by big data technology, message queues, etc.
  3. Apache InLong
     • Apache InLong is a one-stop data streaming platform that provides automatic, secure, distributed, and efficient data publishing and subscription capabilities.
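  4. InLong Structure
     [Architecture diagram. Stage labels: Ingest, Converge, Cache, Sort, Consume. Component labels: SDK, Agent, Http, Bus, Manager, TubeMQ, Pulsar, KOP, Sort, HBase, Flink. Source labels: file, db, message.]
     Surviving caption fragments: "Agent, supports TubeMQ, developed by Tencent Big Data group and"; "Bus, Sort, Source".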
  5. Requirements for MQ
     • Volume: 55 trillion lines of data are generated per day, which puts a heavy burden on storage.
     • Reliability: input data may involve payments or user privacy, so no data loss or error can be tolerated.
     • Performance: machine learning and other real-time computing tasks require short consumption times.
  6. Comparison (some test results)

     |                | TubeMQ | Kafka | Pulsar |
     |----------------|--------|-------|--------|
     | Latency        | Very low, 10 ms | Low, 250 ms | Very low, 10 ms |
     | TPS            | High, 140,000+/s | Normal, 100,000+/s | High, 140,000+/s |
     | Filter consume | Supports client filter or server filter | Supports client filter | Supports client filter |
     | Data copy      | No copies | Multiple copies | Multiple copies |
     | Reliability    | Low | Low | High, auto recovery |
     | Stability      | High; running at Tencent for almost 7 years, supporting a data volume of 33 trillion | Unstable when the number of topics grows | Considering we [text cut off in source] |
     | User friendly  | Supports Java or C++ libraries | Only 1 client | Multiple clients |
     | CAP model      | AP | AP or CP | CP or AP |
  7. Case: Tencent Ads
     • Background: advertisers' account statements can be used as data input for analysis or reconciliation.
     • Data flow (diagram labels): Binlog, InLong Agent, Pulsar, Flink, Pulsar Client, InLong Sort, db, es, hdfs
     • Low latency: no more than 10 ms of consumption delay for Flink jobs and data analysis
     • Key shared: the advertiser name is used as the key to make sure binlogs are consumed sequentially
     • Massive consumers: thousands of consumers on one topic for various use cases
     • Dead letter queue: errors and exceptions can be consumed later for monitoring
  8. Cluster Disaster Tolerance
     [Diagram: in Shenzhen, IDC-A and IDC-B each run zk, a broker, and BookKeeper, with replication between the two.]
     • Scenario: produce and consume in one region; support fault tolerance
     • Pros: easy to deploy and uses fewer resources
     • Cons: higher latency for requests coming from other regions
     • Case: Tencent Marvel ads
  9. Cluster Disaster Tolerance
     [Diagram: Beijing, Shenzhen, and Shanghai each run zk, a broker, and BookKeeper, with replication between the regions.]
     • Scenario: produce and consume across regions; support fault tolerance across regions
     • Pros: lower latency when requests come from different clusters
     • Cons: deployment uses more resources
     • Case: Tencent QQ minor world
  10. Http Proxy Layer
      [Diagram labels: Node, PHP sdk etc.; http proxy, Native, KOP; broker, bookie, ZK; queue, channel, session; topic, partition, Fragment, Entry.]
      • Question: how can various SDKs be supported?
      • Answer: add a proxy layer.
  11. Filter consume using Tag
      [Diagram labels: Producer, Filter, Consumer; broker, bookie, Tag, ZK; topic, partition, Fragment, Entry.]
      • Question: as business lines multiply, the number of topics grows out of control. How can this be contained?
      • Answer: add a tag to the message properties and filter on it when consuming.
  12. Improvements – KOP
      • Question: how can we move from Kafka to Pulsar without changing code?
      • Answer: Kafka on Pulsar (PIP 70)
  13. Improvements – KOP
  14. • Auto Recovery:
      • broker throttle:
      • topic auto:
      • auto split:
      • ttl, retention:
      • topic uploading:
      • autocreation/deleteInactive:
  15. Pull requests

      | pull request | brief note | type |
      |--------------|------------|------|
      | 5603 | [client] retry when get meta failed | enhancement |
      | 5960 | [cli] add option to support message key and message properties | enhancement |
      | 6187 | [cli] support unload partitioned topic | enhancement |
      | 6431 | [broker] fix publish buffer limit not take effect | bugfix |
      | 6469 | [dep] security upgrade | enhancement |
      | 6524 | [client] fix client connection leak | bugfix |
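
The "Key shared" point on slide 7 (advertiser name as the message key, so that binlogs are consumed sequentially) can be illustrated with a small simulation. This is only a sketch in plain Python, not Pulsar code: the `route` and `dispatch` helpers, the CRC32-modulo hashing, and the sample events are all assumptions made for illustration, and Pulsar's real Key_Shared dispatch logic differs in detail. The property it shows is that every message with the same key goes to the same consumer, so per-key order is preserved even with several consumers.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (key, payload)


def route(key: str, num_consumers: int) -> int:
    """Map a key to a consumer index with a stable hash (CRC32, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_consumers


def dispatch(events: List[Event], num_consumers: int) -> Dict[int, List[Event]]:
    """Deliver events in arrival order to per-consumer queues, chosen by key."""
    queues: Dict[int, List[Event]] = defaultdict(list)
    for key, payload in events:
        queues[route(key, num_consumers)].append((key, payload))
    return queues


# Hypothetical binlog events keyed by advertiser name.
events = [
    ("advertiser-a", "insert#1"),
    ("advertiser-b", "insert#1"),
    ("advertiser-a", "update#2"),
    ("advertiser-b", "update#2"),
    ("advertiser-a", "delete#3"),
]
queues = dispatch(events, num_consumers=3)

# Each advertiser's events sit on exactly one consumer, in original order.
for key in ("advertiser-a", "advertiser-b"):
    owner = route(key, 3)
    seen = [payload for k, payload in queues[owner] if k == key]
    print(key, "-> consumer", owner, seen)
```

Because the consumer is derived from the key alone, adding more advertisers spreads load across consumers without ever splitting one advertiser's binlog stream.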
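
The tag-based filtering from slide 11 (add a tag to the message properties, filter when consuming) can be sketched in the same way. The code below is a plain-Python illustration under stated assumptions: the `Message` class is a stand-in for a broker message, the property name `tag` is hypothetical (the slides do not name it), and the filter runs on the consumer side, whereas the slide's diagram may place it elsewhere.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Set

TAG_KEY = "tag"  # hypothetical property name; not given in the slides


@dataclass
class Message:
    """Stand-in for a broker message: a payload plus string properties."""
    payload: bytes
    properties: Dict[str, str] = field(default_factory=dict)


def produce(payload: bytes, tag: str) -> Message:
    """Producer side: attach the business tag as a message property."""
    return Message(payload=payload, properties={TAG_KEY: tag})


def filter_by_tag(messages: Iterable[Message], wanted: Set[str]) -> Iterator[Message]:
    """Consumer side: yield only the messages whose tag is in `wanted`."""
    for msg in messages:
        if msg.properties.get(TAG_KEY) in wanted:
            yield msg


# One shared topic carries several business streams, told apart by tag.
topic = [
    produce(b"click-1", "clicks"),
    produce(b"bill-1", "billing"),
    produce(b"click-2", "clicks"),
]
clicks = [m.payload for m in filter_by_tag(topic, {"clicks"})]
print(clicks)
```

The design trade-off is that one topic replaces many, at the cost of each consumer reading and discarding the messages it does not want.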
