Performance and BigData
dongeforever@apache.org
dongeforever
Apache RocketMQ PMC/Committer.
Interested in algorithms and
performance.
Now spending more time in open
source, including MQ and BigData
Content Map
❖ Challenge of Big Data Stream
❖ Kafka Overview
❖ Batch Through
❖ Compression Through
❖ Structural Compression
❖ BigData Eco-system
Challenge of Big Data Stream
❖ High Throughput: Million TPS
❖ IO Bandwidth: Network & Disk
❖ Storage Cost
Kafka Overview
❖ Open sourced in early 2011, graduated from the Apache
Incubator on 23 October 2012
❖ Log Aggregation, Messaging, Stream Processing, Event
Sourcing
❖ Widely used in BigData processing, integrates with
Storm, Spark, Flink, Samza, Hadoop, Flume, etc.
Kafka Domain Model
Kafka Store Structure
Batch Through
Kafka Producer TPS vs Msg Size
Proxy Producer TPS vs Msg Size
RocketMQ Batch
❖ send(Collection<Message> msgs)
❖ Atomic
❖ Achieves about 1.5 million TPS
❖ https://rocketmq.apache.org/docs/batch-example/
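A producer that sends collections of messages usually has to cap how much goes into one call. The following is a minimal sketch of that grouping step only, not RocketMQ code: `split_batches` and `max_batch_bytes` are hypothetical names, and the limit used in the demo is a placeholder rather than a RocketMQ constant.

```python
from typing import Iterator, List


def split_batches(messages: List[bytes], max_batch_bytes: int) -> Iterator[List[bytes]]:
    """Group messages into consecutive batches whose payload stays under a byte limit.

    A single message larger than the limit is emitted as a batch of its own.
    """
    batch: List[bytes] = []
    batch_bytes = 0
    for msg in messages:
        # Close the current batch if adding this message would exceed the limit.
        if batch and batch_bytes + len(msg) > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(msg)
        batch_bytes += len(msg)
    if batch:
        yield batch


if __name__ == "__main__":
    msgs = [b"x" * 40 for _ in range(5)]
    for i, group in enumerate(split_batches(msgs, max_batch_bytes=100)):
        # Each group would be handed to one batch send call.
        print(i, len(group))
```

Each yielded group corresponds to one collection passed to a batch send, so the per-request cost is paid once per group instead of once per message.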
Compression Through
• Clients handle compression and decompression
• The broker needs only one operation
• For log collection, the compression ratio is about 5x to 10x
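The idea of compression handled entirely by clients can be sketched as follows. This is an illustration under assumptions, not Kafka's or RocketMQ's wire format: the length-prefixed framing, the use of `zlib`, and the function names `client_compress`, `broker_store`, and `client_decompress` are all hypothetical. The point is that the broker stores the compressed batch as one opaque blob and never inspects it.

```python
import zlib
from typing import List


def client_compress(messages: List[bytes]) -> bytes:
    """Producer side: frame each message with a 4-byte length, then compress the whole batch."""
    framed = b"".join(len(m).to_bytes(4, "big") + m for m in messages)
    return zlib.compress(framed)


def broker_store(log: List[bytes], payload: bytes) -> None:
    """Broker side: append the compressed batch as-is, with no decompression."""
    log.append(payload)


def client_decompress(payload: bytes) -> List[bytes]:
    """Consumer side: decompress the batch and split it back into messages."""
    framed = zlib.decompress(payload)
    out: List[bytes] = []
    pos = 0
    while pos < len(framed):
        size = int.from_bytes(framed[pos:pos + 4], "big")
        pos += 4
        out.append(framed[pos:pos + size])
        pos += size
    return out


if __name__ == "__main__":
    # Synthetic, repetitive log-like lines; real ratios depend on the data.
    lines = [f"INFO request id={i} status=200 path=/api/v1/items".encode() for i in range(1000)]
    log: List[bytes] = []
    broker_store(log, client_compress(lines))
    raw = sum(len(m) for m in lines)
    print("raw bytes:", raw, "stored bytes:", len(log[0]))
    assert client_decompress(log[0]) == lines
```

Compressing a whole batch rather than individual messages lets the compressor exploit repetition across messages, which is why repetitive log data tends to compress well.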
Structural Compression
❖ Assume each msg has N bytes, each batch has B msgs
❖ Size of 0.10: (34 + N) * B
❖ Size of 0.11: 61 + (7 + N) * B
❖ For N <= 100, saves roughly 20%~50% of storage
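The two size formulas above can be checked with a short calculation. This sketch only evaluates those formulas; the function names and the sample values of N and B are chosen for illustration.

```python
def size_v010(n: int, b: int) -> int:
    """Batch size in bytes under the 0.10 formula: 34 bytes of overhead per message."""
    return (34 + n) * b


def size_v011(n: int, b: int) -> int:
    """Batch size in bytes under the 0.11 formula: 61 bytes per batch plus 7 per message."""
    return 61 + (7 + n) * b


def saving(n: int, b: int) -> float:
    """Fraction of storage saved by the 0.11 formula relative to 0.10."""
    old = size_v010(n, b)
    return (old - size_v011(n, b)) / old


if __name__ == "__main__":
    for n in (20, 50, 100):
        print(f"N={n:3d} B=1000 saving={saving(n, 1000):.1%}")
```

With B = 1000 the saving is about 50% for N = 20, about 32% for N = 50, and about 20% for N = 100. The gain depends on batching: with B = 1 the 61-byte batch overhead outweighs the per-message saving, and the 0.11 formula gives the larger size.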
External ecosystem
https://github.com/apache/rocketmq-externals

3.1.Performance and BigData Ecosystem