
Kafka meetup JP #3 - Engineering Apache Kafka at LINE

Presentation material of Kafka meetup JP #3
https://kafka-apache-jp.connpass.com/event/58619/


  1. Engineering Apache Kafka at LINE
     Yuto Kawamura
  2. Speaker introduction
     - Name: Yuto Kawamura (kawamuray)
     - Software engineer @ LINE
       - Designing and implementing inter-service data passing infrastructure
       - ❤ Apache Kafka
     - Apache Kafka contributor
       - KAFKA-4614: Improved broker's response time
       - KAFKA-4024: Removed unnecessary blocking behavior of producer
     - Past publications
       - Applying Kafka Streams for internal message delivery pipeline
         - https://engineering.linecorp.com/en/blog/detail/80
       - Monitoring Apache Kafka w/ Prometheus
         - https://www.slideshare.net/kawamuray/monitoring-kafka-w-prometheus
  3. Apache Kafka - a new layer for service isolation
     - [Diagram: application servers, stats system, and abuser detection system exchange data through Apache Kafka: produce once, unified data format, all data in one path]
     - A failed destination can resume consuming data after recovery
     - Publishers can stay agnostic to:
       - who uses the data
       - remote failures
     - Subscribers can:
       - stay isolated from unexpectedly high workload on the publisher
       - stop consuming temporarily and resume after recovering from a failure
       - access all data from different services in the same way
  4. Data and usage
     - What kind of data is stored?
       - Structured events
         - Application server's request log (!= access log)
         - Mutation logs of the HBase datastore
       - Task processing requests
     - How is that data used?
       - Data replication
       - Protecting service storage from unexpected access
       - Abuser detection
       - User-action-based processing
       - Metrics generation
       - Statistics
       - Asynchronous task processing
  5. Facts
     - Kafka version: 0.10.0.1 + in-house patches and backports
     - 140+ billion messages/day
     - 38+ TB incoming data/day
     - 3.5+ million messages/sec at peak
     - Tens of broker servers
     - Single, multi-tenant cluster (+ a secondary cluster for backup)
     - Target latency (produce response time):
       - < 1 ms for 50th %ile
       - < 10 ms for 99th %ile
  6. Monitoring
     - Prometheus + Grafana
       - ref: https://www.slideshare.net/kawamuray/monitoring-kafka-w-prometheus
     - jmx_exporter and node_exporter for Kafka broker metrics
     - simpleclient_java for producer/consumer metrics
     - kafka-consumer-group-exporter for general consumer monitoring
       - https://github.com/kawamuray/prometheus-kafka-consumer-group-exporter
     - Templated Grafana dashboard for generalized consumer monitoring
       - Teams can start monitoring their consumers' health immediately after deployment
  7. Hello, experts :D
  8. Advanced observations
     - Apache Kafka performs really well. However...
     - It relies heavily on various system primitives:
       - page cache
         - https://kafka.apache.org/documentation/#design_filesystem
       - sendfile(2) (Linux)
         - https://kafka.apache.org/documentation/#maximizingefficiency
     - FileRecords#writeTo:

           bytesTransferred = channel.transferTo(position, count, destChannel);
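The zero-copy path named above can be sketched outside the JVM. This is an illustrative Python sketch (not from the talk; assumes Linux): `FileChannel.transferTo` ultimately calls sendfile(2), which moves file bytes to a socket entirely inside the kernel via the page cache, with no user-space buffer. Here a socketpair stands in for a consumer connection:

```python
# Sketch of the zero-copy transfer Kafka relies on (assumption: Linux,
# where sendfile(2) accepts an AF_UNIX stream socket as destination).
import os
import socket
import tempfile

# A stand-in for a log segment file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 4096)
    path = f.name

src = os.open(path, os.O_RDONLY)
out_sock, in_sock = socket.socketpair()  # stand-in for a consumer connection

# One syscall moves the bytes kernel-side; Kafka never copies them
# into its own heap on this path.
sent = os.sendfile(out_sock.fileno(), src, 0, 4096)
received = in_sock.recv(8192)
print(sent, len(received))  # on Linux: 4096 4096

os.close(src)
out_sock.close()
in_sock.close()
os.unlink(path)
```

Because the copy happens inside the kernel, a disk read triggered by a page-cache miss is invisible to JVM-level profiling, which is exactly the observability gap the next slides address.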
  9. Observing Kafka's performance in detail
     - Sometimes built-in metrics and JVM tools (jstack, jmap, jvisualvm) aren't enough to learn everything
       - e.g., reading from disk is unusual in our cluster
         - it happens only when a consumer falls behind and starts requesting old offsets
     - Dive into OS metrics
       - node_disk_bytes_read (from node_exporter)
  10. The problem is...
     - A slow FetchConsumer response caused by a disk read can delay other responses being sent within the same network processor
     - The disk read happens "transparently" inside the sendfile(2) syscall
       - => not observable from the application (Kafka) level
     - Some consumer implementations, like old storm-kafka, don't checkpoint their consumption offsets to the broker
       - => the currently consumed offset can't be observed from the broker side
  11. Dive into the kernel - SystemTap
     - SystemTap
       - A tool for dynamic tracing on the Linux operating system
       - Can hook arbitrary execution points inside the Linux kernel
       - Much lower overhead compared to other traditional tools
     - Similar tools:
       - DTrace
       - eBPF
  12. Observing syscall duration
     - syscall-duration.stp:

           global s
           global records
           global sampled

           probe begin {
               printf("Start observing syscall %s duration... Press C-c to exit\n", @1)
           }
           probe syscall.$1 {
               s[tid()] = gettimeofday_us()
           }
           probe syscall.$1.return {
               elapsed = gettimeofday_us() - s[tid()]
               delete s[tid()]
               records <<< elapsed
               sampled++
           }
           probe timer.s(1) {
               printf("Sampled %d syscalls...\n", sampled)
           }
           probe end {
               printf("syscall %s duration(us):\n", @1)
               print(@hist_log(records))
           }

     - Running it against the broker process:

           $ stap -x PID syscall-duration.stp sendfile
           Start observing syscall sendfile duration... Press C-c to exit
           ...
           Sampled 307489 syscalls...
           ^Csyscall sendfile duration(us):
           value |-------------------------------------------------- count
               0 |                                                        0
               1 |                                                        0
               2 |@@@@@@@                                             36801
               4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 230072
               8 |@@@@@@@@                                            40797
              16 |@                                                    8550
              32 |                                                      535
              64 |                                                      407
             128 |                                                       35
             256 |                                                        7
             512 |                                                        0
            1024 |                                                        0
            2048 |                                                        3
            4096 |                                                        1
            8192 |                                                        3
           16384 |                                                        0
           32768 |                                                        1
           65536 |                                                        1
          131072 |                                                        0
          262144 |                                                        0
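To make the histogram above easier to read, here is an illustrative Python analogue (not part of the talk) of what SystemTap's `@hist_log` aggregate does: each sample falls into a power-of-two bucket, so the fast majority and the slow tail fit in one compact view. The sample durations below are hypothetical:

```python
# Rough Python analogue of SystemTap's @hist_log power-of-two histogram.
from collections import Counter

def log2_bucket(v):
    """Lower bound of the power-of-two bucket containing v (v >= 0)."""
    # 0 -> 0, 1 -> 1, 2-3 -> 2, 4-7 -> 4, 8-15 -> 8, ...
    return 0 if v == 0 else 1 << (v.bit_length() - 1)

def hist_log(samples):
    """Render a @hist_log-style histogram for a list of integer samples."""
    counts = Counter(log2_bucket(v) for v in samples)
    peak = max(counts.values())
    lines = []
    for bucket in sorted(counts):
        bar = "@" * (50 * counts[bucket] // peak)  # scale bars to 50 chars
        lines.append(f"{bucket:>8} |{bar:<50} {counts[bucket]}")
    return "\n".join(lines)

# Hypothetical sendfile durations in microseconds: mostly a few us,
# plus slow outliers like the ones seen on the broker.
print(hist_log([3, 4, 5, 5, 6, 7, 9, 12, 1800, 40000]))
```

Reading the broker's real output this way: almost all sendfile calls land in the 2-16 us buckets, but the handful of samples out at 2048 us and beyond are the tail worth chasing.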
  13. Observing slow sendfile(2)
     - Now we know most sendfile calls complete within a few microseconds
     - So what are the slow ones doing?
     - sendfile-trace.stp:

           global s

           probe syscall.sendfile {
               s[tid()] = gettimeofday_us()
           }
           probe syscall.sendfile.return {
               elapsed = gettimeofday_us() - s[tid()]
               delete s[tid()]
               if (elapsed >= $1) {
                   printf("sendfile took %d us: dest=%d src=%d offset=%d size=%d\n",
                          elapsed, $out_fd, $in_fd, $offset, $count)
               }
           }

     - Reporting only calls slower than 1000 us:

           $ stap -x PID sendfile-trace.stp 1000
           sendfile took 2517 us: dest=11388 src=4427 offset=139647745139744 size=724183
           sendfile took 11784 us: dest=11388 src=4415 offset=139647745139744 size=183748
           sendfile took 5961 us: dest=11388 src=4427 offset=139647745139744 size=847618
  14. Finding who is reading disk
     - Resolve the file descriptors reported by the trace:

           $ ls -l /proc/PID/fd/{4415,11388}
           lrwx------ 1 user user 64 May 19 14:08 /proc/PID/fd/4415 -> /path/to/kafka-data/topic-foo-bar-9/00000000077927876918.log
           lrwx------ 1 user user 64 May 19 14:01 /proc/PID/fd/11388 -> socket:[311161013]
           $ netstat -npe | grep 311161013
           tcp        0      0 LOCAL_IP:12345   REMOTE_IP:60258   ESTABLISHED 123 311161013 PID/java

     - Gotcha!
     - This technique was leveraged while working on https://issues.apache.org/jira/browse/KAFKA-4614
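The `ls -l /proc/PID/fd` step above can also be scripted. This is a small illustrative sketch (assumes Linux and its /proc filesystem; `describe_fds` is a hypothetical helper, not from the talk) that resolves a process's numeric descriptors back to whatever they point at, a log segment or a socket:

```python
# Resolve a process's open file descriptors via /proc (Linux only).
import os

def describe_fds(pid):
    """Map each open fd of `pid` to the target of its /proc symlink."""
    fd_dir = f"/proc/{pid}/fd"
    out = {}
    for fd in os.listdir(fd_dir):
        try:
            out[int(fd)] = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            pass  # fd was closed between listdir() and readlink()
    return out

# Inspect our own process; on a broker, a log segment shows up as
# ".../topic-partition-N/NNN.log" and a connection as "socket:[inode]".
for fd, target in sorted(describe_fds(os.getpid()).items()):
    print(fd, "->", target)
```

A `socket:[inode]` target can then be matched against `netstat -npe` (or `ss -pe`) output, as on the slide, to identify which remote consumer is behind the slow reads.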
  15. Conclusion
     - However...
     - Kafka is incredibly stable and performant under sane traffic
       - even with the default config!
     - Kafka serves our 140+ billion daily messages with only tens of servers
     - Having a single data hub makes:
       - inter-service collaboration easier
       - workloads isolated
       - the data format destination-agnostic
       - data more reusable
       - resource utilization efficient
     - When going into deeper analysis, an OS-level tool like SystemTap helps you a lot
  16. Small notice...
     - How many of you were at Kafka Summit NY...?
     - Kafka Summit SF will be held on 8/28 in San Francisco
       - https://kafka-summit.org/events/kafka-summit-sf/
       - https://kafka-summit.org/sessions/single-data-hub-services-feed-100-billion-messages-per-day/
     - I'm going to give a presentation about our Kafka engineering chronicle
     - Come and join us!
  17. End of presentation. Questions?
