6. Key Terms Introduction
• Broker: a Kafka server process; the smallest unit of a Kafka cluster
• Topic: a named category of messages; the stream that data is stored in
• Producer: pushes messages to a broker (writes data)
• Consumer: pulls messages from a broker (reads data)
• ConsumerGroup: provides fault tolerance, scalability, and parallelism for consumers
• Partition: provides fault tolerance (through replicas), scalability, and parallelism for a topic
• Offset: the position of a message within a partition
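To make the relationship between topics, partitions, offsets and consumer groups concrete, here is a toy in-memory model in Python. It is a sketch, not Kafka's implementation: `Topic`, `append` and `assign` are hypothetical names, the key hash uses `zlib.crc32` only as a stand-in (the Java client's default partitioner uses murmur2), and `assign` is a simplified round-robin assignment.

```python
import zlib


class Topic:
    """Toy model of a Kafka topic: one append-only list (log) per partition."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> tuple[int, int]:
        """Append a record and return (partition, offset)."""
        # Stand-in partitioner: hash the key, modulo the partition count.
        # Kafka's Java client uses murmur2; crc32 is used here for simplicity.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        # Offsets are per partition and start at 0.
        return partition, len(log) - 1


def assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Simplified round-robin assignment of partitions within one consumer group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment


topic = Topic("orders", num_partitions=3)
print(topic.append("user-1", "created"))
print(topic.append("user-1", "paid"))
print(assign(3, ["c1", "c2", "c3", "c4"]))  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

Records with the same key land in the same partition, so their relative order is kept; with four consumers and three partitions, one consumer in the group has nothing to read.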
7. Message Delivery
At most once: Messages may be lost but are never redelivered
At least once: Messages are never lost but may be redelivered
Exactly once: Each message is delivered once and only once (supported since 0.11.x)
Messages sent by a producer to a particular topic partition are
appended in the order they are sent.
A consumer instance sees records in the order they are stored in
the log.
A topic with replication factor N tolerates up to N-1 server failures without losing committed records.
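At most once versus at least once comes down to whether the consumer commits its offset before or after processing a record. The sketch below simulates a crash between those two steps and a restart from the last committed offset. `run`, `simulate` and `Crash` are hypothetical names, and exactly once (which relies on the idempotent and transactional producer added in 0.11) is not modeled.

```python
class Crash(Exception):
    """Simulated consumer crash; args[0] is the last committed offset."""


def run(log, start, processed, commit_first, crash_at=None):
    """Consume log[start:] one record at a time and return the committed offset.

    commit_first=True:  commit the offset, then process  -> at most once
    commit_first=False: process, then commit the offset  -> at least once
    crash_at: offset at which to simulate a crash between the two steps.
    """
    committed = start
    for offset in range(start, len(log)):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                raise Crash(committed)  # committed but never processed: lost
            processed.append(log[offset])
        else:
            processed.append(log[offset])
            if offset == crash_at:
                raise Crash(committed)  # processed but not committed: redelivered
            committed = offset + 1
    return committed


def simulate(commit_first):
    log = ["m0", "m1", "m2"]
    processed = []
    try:
        run(log, 0, processed, commit_first, crash_at=1)
    except Crash as crash:
        # Restart from the last committed offset, as a replacement consumer would.
        run(log, crash.args[0], processed, commit_first)
    return processed


print(simulate(commit_first=True))   # ['m0', 'm2']
print(simulate(commit_first=False))  # ['m0', 'm1', 'm1', 'm2']
```

Committing first loses `m1`; processing first delivers `m1` twice, which is why at-least-once consumers are usually written to be idempotent.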
11. Producer
• Load balancing: the producer sends data directly to the broker that is the leader for the
target partition
• acks=0: the producer does not wait for any acknowledgment from the broker.
Lowest latency, but no durability guarantee and the highest risk of data loss.
• acks=1: the producer gets an acknowledgment once the leader has written the
record to its local log, without waiting for acknowledgment from the followers.
If the leader fails after acknowledging but before the followers have copied the
record, the record is lost.
• acks=-1 (all): the producer gets an acknowledgment only after all in-sync replicas
have received the record. Strongest guarantee: the record is not lost as long as at
least one in-sync replica stays alive.
• batch.size=100 (.NET client)
• send.buffer.bytes=100*1024
• producer.type=async
• compression.type=none
• max.in.flight.requests.per.connection=3
Note: set min.insync.replicas>=2 when using acks=-1
acks   Throughput   Latency   Durability
0      High         Low       No guarantee
1      Medium       Medium    Leader only
-1     Low          High      All in-sync replicas (ISR)
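The acks table and the min.insync.replicas note can be summarized as a small decision function. These are hypothetical helpers written for illustration, not a client API; they assume, as in Kafka, that the broker enforces min.insync.replicas only for acks=-1 writes and rejects them with a NotEnoughReplicas error when the ISR has shrunk below it.

```python
def produce_outcome(acks: int, isr_size: int, min_insync_replicas: int) -> str:
    """Describe what a produce request gets back under a given acks setting."""
    if acks == 0:
        return "no ack: the producer never learns whether the write succeeded"
    if acks == 1:
        return "acked after the leader write; lost if the leader dies before followers copy it"
    if acks == -1:
        # min.insync.replicas is only enforced for acks=-1 (all) writes.
        if isr_size < min_insync_replicas:
            return "rejected: NotEnoughReplicas"
        return f"acked after all {isr_size} in-sync replicas have the record"
    raise ValueError("acks must be 0, 1 or -1")


def replica_failures_tolerated_for_writes(replication_factor: int, min_insync_replicas: int) -> int:
    """How many replicas can drop out of the ISR while acks=-1 writes are still accepted."""
    return max(replication_factor - min_insync_replicas, 0)


print(produce_outcome(-1, isr_size=3, min_insync_replicas=2))
print(produce_outcome(-1, isr_size=1, min_insync_replicas=2))  # rejected: NotEnoughReplicas
print(replica_failures_tolerated_for_writes(3, 2))  # 1
```

With replication factor 3 and min.insync.replicas=2, one replica can be lost while acks=-1 writes keep succeeding, and every acknowledged record exists on at least two brokers.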
12. Broker
• More partitions = more concurrent processing, but also more memory and more
I/O: throughput increases, and so can latency (the broker has to spread work
across every partition). P.S. keep a single topic below 1024 partitions.
• Replication factor: at least 2, which requires at least two brokers
num.io.threads=8
num.network.threads=3
background.threads=10
queued.max.requests=500
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
num.recovery.threads.per.data.dir=2
log.retention.hours=24
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.cleanup.policy=delete
log.cleaner.enable=true
log.cleaner.threads=1
log.cleaner.backoff.ms=30000
log.segment.bytes=1073741824
replica.fetch.min.bytes=1
replica.high.watermark.checkpoint.interval.ms=5000
replica.fetch.wait.max.ms=500
min.insync.replicas=2
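A few of the broker guidelines above can be checked mechanically. The sketch below parses simple `key=value` lines (a subset of the .properties format, not a full parser) and flags risky combinations. `parse_properties` and `check_broker` are hypothetical helpers; replication factor and partition count are passed in as arguments because they are set per topic.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_broker(props: dict[str, str], replication_factor: int,
                 partitions_per_topic: int) -> list[str]:
    """Return warnings for settings that conflict with the guidelines above."""
    warnings = []
    for key in props:
        if key != key.lower():
            warnings.append(f"{key}: Kafka config names are lowercase, so this key will not match")
    min_isr = int(props.get("min.insync.replicas", "1"))
    if replication_factor < 2:
        warnings.append("replication factor < 2: no copy survives a broker failure")
    if min_isr >= replication_factor:
        warnings.append("min.insync.replicas >= replication factor: "
                        "acks=-1 writes fail as soon as one replica leaves the ISR")
    if partitions_per_topic >= 1024:
        warnings.append("topic has 1024 or more partitions")
    return warnings


props = parse_properties("min.insync.replicas=2\nlog.retention.hours=24\n")
print(check_broker(props, replication_factor=3, partitions_per_topic=12))  # []
print(check_broker(props, replication_factor=2, partitions_per_topic=12))
```

The second call shows why min.insync.replicas=2 pairs best with a replication factor of at least 3: with only two replicas, losing one broker blocks acks=-1 writes.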
13. Consumer
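• Create enough partitions to keep up with the message rate from producers
• The number of consumers in a group should not exceed the number of partitions
(extra consumers sit idle); a partition count that is a multiple of the broker
count balances load better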
• max.poll.records=5000
• enable.auto.commit=true
• auto.commit.interval.ms=5000
• fetch.max.wait.ms=500
• fetch.min.bytes=1
• Keep a small batch size in our .NET client (for real-time consumption of
data)
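A commonly cited rule of thumb for sizing partitions takes the larger of (target throughput / per-partition producer throughput) and (target throughput / per-partition consumer throughput). The sketch below applies it and rounds up to a multiple of the broker count, following the balance advice above. The function name and all throughput figures are hypothetical; replace them with numbers measured on your own producers and consumers.

```python
import math


def partitions_needed(target_mb_s: float, producer_mb_s_per_partition: float,
                      consumer_mb_s_per_partition: float, brokers: int) -> int:
    """Rule-of-thumb partition count, rounded up to a multiple of the broker count."""
    raw = max(target_mb_s / producer_mb_s_per_partition,
              target_mb_s / consumer_mb_s_per_partition)
    per_broker = math.ceil(math.ceil(raw) / brokers)
    return per_broker * brokers


# Hypothetical throughput figures, in MB/s.
partitions = partitions_needed(100, 20, 10, brokers=3)
print(partitions)  # 12
```

Here max(100 / 20, 100 / 10) = 10 partitions, rounded up to 12 for three brokers, so at most 12 consumers in one group receive a partition.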
14. JVM
• Avoid out-of-memory errors
• Avoid triggering GC too frequently
-Xmx8g -Xms8g -XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20
-XX:MaxMetaspaceFreeRatio=80 -XX:MinMetaspaceFreeRatio=50
-XX:G1HeapRegionSize=16M -XX:InitiatingHeapOccupancyPercent=35
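-Xms: initial Java heap size
-Xmx: maximum Java heap size (setting it equal to -Xms avoids heap resizing at runtime)
-XX:MetaspaceSize: initial metaspace threshold; the first metaspace-triggered GC happens when class metadata reaches it
-XX:+UseG1GC: enable the G1 garbage collector
-XX:MaxGCPauseMillis: target maximum GC pause time in milliseconds (a soft goal, not a guarantee)
-XX:MaxMetaspaceFreeRatio: maximum percentage of metaspace left free after a GC; above it, metaspace is shrunk
-XX:MinMetaspaceFreeRatio: minimum percentage of metaspace left free after a GC; below it, metaspace is grown
-XX:G1HeapRegionSize: size of each G1 heap region
-XX:InitiatingHeapOccupancyPercent: heap occupancy percentage at which G1 starts a concurrent marking cycle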
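The G1 flags above interact, and a few lines of arithmetic show how. This sketch assumes the heap stays at its fixed 8 GB size (-Xms equals -Xmx). On JDK 9 and later, G1 adapts the initiating threshold at runtime by default (G1UseAdaptiveIHOP), so 35% is only the starting point there.

```python
heap_mb = 8 * 1024   # -Xmx8g / -Xms8g
region_mb = 16       # -XX:G1HeapRegionSize=16M
ihop_percent = 35    # -XX:InitiatingHeapOccupancyPercent=35

# Number of equally sized regions G1 divides the heap into.
regions = heap_mb // region_mb  # 512

# Heap occupancy at which G1 starts concurrent marking.
marking_threshold_mb = heap_mb * ihop_percent / 100

print(f"{regions} regions, concurrent marking starts near {marking_threshold_mb:.0f} MB")
```

Starting marking at roughly 2.8 GB leaves headroom for marking to finish before the heap fills, which serves the "avoid out of memory" goal at the cost of more frequent concurrent cycles.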