
Continuous Go Profiling & Observability

This presentation is for Go developers and operators of Go applications who are interested in reducing costs and latency, or in debugging problems such as memory leaks, infinite loops, and performance regressions in such applications. We'll start with a brief description of the unique aspects of the Go runtime, and then take a look at the built-in profilers as well as Go's execution tracer. Additionally we'll look at interoperability with popular observability tools such as Linux perf and bpftrace. After this presentation you should have a good idea of the various tools you can use, and which ones might be the most useful to you in a production environment.

  1. Continuous Go Profiling & Observability
     Brought to you by Felix Geisendörfer, Staff Engineer at Datadog
  2. Target Audience
     ■ Go developers and operators of Go applications
     ■ Interested in reducing costs and latency, or debugging problems such as memory leaks, infinite loops and performance regressions
     ■ Focus is on Go’s built-in tools, but we’ll also cover Linux perf and eBPF
  3. Felix Geisendörfer, Staff Engineer at Datadog
     ■ Working on continuous Go profiling as a product
     ■ Previously spent 6.5 years working for Apple (Factory Traceability)
     ■ Open Source Contributor (node.js, Go): github.com/felixge
  4. Slides: https://dtdg.co/p99-go-profiling
  5. What is profiling?
     ■ Anything that produces a weighted list of stack traces
     ■ Example: a CPU profiler that interrupts the process every 10ms of CPU time, captures a stack trace, and aggregates the counts

     stack trace      count
     main;foo         5
     main;foo;bar     4
     main;foobar      4
  6. What is Continuous Profiling?
     ■ Profiling in production
     ■ Continuously upload profiles to a backend for later analysis
  7. Why profile in production?
     ■ Data distributions have a big impact on performance
     ■ Production profiles can help mitigate and root cause incidents
     ■ Profiling is usually low overhead (1-10%)
  8. About Go
     ■ Compiled language like C/C++/Rust
     ■ Should work well with industry standard observability tools … right?
  9. Does Go pass the Duck Test?
  10. Goroutines
      ■ Green threads scheduled onto OS threads by the Go runtime
      ■ Tightly integrated with Go’s network stack (epoll on Linux)
      ■ Tiny 2 KiB stacks that grow dynamically
      ■ Fast context switching (~170ns), 10x faster than Linux threads, see https://dtdg.co/3n6kBoC
      ■ Data sharing via mutexes and channels (CSP)
  11. The trouble with goroutines

      uprobe:./example:main.Foo {
        @start[tid] = nsecs;
      }
      uretprobe:./example:main.Foo {
        @msecs = hist((nsecs - @start[tid]) / 1000000);
        delete(@start[tid]);
      }
      END {
        clear(@start);
      }
  12. uretprobes + dynamic stacks = 💣

      $ sudo bpftrace -c ./example funclatency.bpf
      Attaching 3 probes...
      SIGILL: illegal instruction
      PC=0x7fffffffe001 m=4 sigcode=128
      instruction bytes: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0

      goroutine 1 [running]:
      runtime: unknown pc 0x7fffffffe001
      stack: frame={sp:0xc00006cf70, fp:0x0} stack=[0xc00006c000,0xc00006d000)
      000000c00006ce70: 000000c000014010 0000000000000010
      000000c00006ce80: 000000c000018000 000000000000004b
      000000c00006ce90: 000000c00001a000 0000000000000013

      see: runtime: ebpf uretprobe support #22008: https://dtdg.co/3s4vnfn
  13. Thread IDs? Goroutine IDs!

      uprobe:./example:main.Foo {
        @start[tid] = nsecs;
      }
      uretprobe:./example:main.Foo {
        @msecs = hist((nsecs - @start[tid]) / 1000000);
        delete(@start[tid]);
      }
      END {
        clear(@start);
      }
  14. Thread IDs? Goroutine IDs!

      struct stack { uintptr_t lo; uintptr_t hi; };

      struct gobuf {
        uintptr_t sp; uintptr_t pc; uintptr_t g; uintptr_t ctxt;
        uintptr_t ret; uintptr_t lr; uintptr_t bp;
      };

      struct g {
        struct stack stack;
        uintptr_t stackguard0; uintptr_t stackguard1;
        uintptr_t _panic; uintptr_t _defer; uintptr_t m;
        struct gobuf sched;
        uintptr_t syscallsp; uintptr_t syscallpc; uintptr_t stktopsp;
        uintptr_t param;
        uint32_t atomicstatus; uint32_t stackLock;
        uint64_t goid;
      };

      uprobe:./example:runtime.execute {
        @gids[tid] = ((struct g *)sarg0)->goid;
      }
  15. Go’s Calling Convention
      ■ Does not follow the System V AMD64 ABI 🙈
      ■ Arguments are passed on the stack rather than in registers (slowish)
      ■ Go 1.17 switched to a register calling convention, but it is still idiosyncratic (to support goroutine scalability, multiple return values, etc.)
      ■ ABI0 remains in use to support legacy assembly code
      See Proposal: Register-based Go calling convention: https://dtdg.co/2VIPOSV
  16. Calling C Code
      ■ Requires a separate stack for C call frames, which needs to be static
      ■ High complexity and some overhead (~60ns) to switch between stacks
      See https://dtdg.co/2X1HvTq
  17. Less odd: Stack Traces
      ■ Go pushes frame pointers onto the stack, has no -fomit-frame-pointer
      ■ Go also generates DWARF unwind/symbol tables by default
      ■ Leads to good interoperability with tools such as Linux perf
      ■ The Go runtime itself uses idiosyncratic gopclntab unwinding and symbol tables (DWARF is strippable and $@!%^# turing complete, so this is good)
  18. Duck Test: Go is an odd duck
      Pay attention when using 3rd party tools in production
      (Gopher illustration: Ashley Willis, CC BY-NC-SA 4.0)
  19. So why bother with Go?
      ■ Quirky runtime, pedestrian language, limited type system, but ...
      ■ What Go lacks as a language, it makes up for in tooling
      ■ Built-in documentation, testing, benchmarking, code formatting, tracing, profiling and more!
  20. Built-in observability tools
      ■ Five different profilers: CPU, Heap, Mutex, Block, Goroutine
        go test -cpuprofile cpu.prof -memprofile mem.prof -bench .
      ■ pprof visualization and analysis tool
        go tool pprof -http=:6060 cpu.prof
  21. Built-in observability tools
      ■ Runtime execution tracer (⚠ overhead can be > 10%)
        go test -trace trace.out -bench .
  22. Built-in Profilers
  23. Profilers measuring time
      ■ Three profilers that measure time:
        ● CPU
        ● Block
        ● Mutex
  24. CPU Profiler
  25. CPU Profiler: Labels
      ■ Annotate goroutines with arbitrary key/value pairs
      ■ Understand CPU consumption of individual requests, users, endpoints, etc.

      labels := pprof.Labels("user_id", "123")
      pprof.Do(ctx, labels, func(ctx context.Context) {
        // handle request
        go update(ctx) // child goroutine inherits labels
      })
  26. CPU Profiler: Implementation Details
      ■ Uses setitimer(2) to receive a SIGPROF signal for every 10ms of CPU time
      ■ The signal handler takes stack traces and aggregates them into a profile
      ■ setitimer(2) has thread delivery bias and can’t keep up when utilizing more than 2.5 cores 🙄
      ■ Rhys Hiltner (Twitch) and myself are working on an upstream patch to use timer_create(2)
      See: runtime/pprof: Linux CPU profiles inaccurate beyond 250% CPU use #35057: https://dtdg.co/3CAeApm
  27. Mutex & Block Profiler
      ■ Samples mutex wait (both) and channel wait (block profiler) events
      ■ Why the overlap?
        ● Block captures Lock(), i.e. the blocked mutexes
        ● Mutex captures Unlock(), i.e. the mutexes doing the blocking
      ■ The block profile used to be biased; a fix was contributed for Go 1.17
      see https://go-review.googlesource.com/c/go/+/299991
  28. Recap: Profilers measuring time
  29. Allocation & Heap Profiler

      func malloc(size):
        object = ... // alloc magic
        if poisson_sample(size):
          s = stacktrace()
          profile[s].allocs++
          profile[s].alloc_bytes += sizeof(object)
          track_profiled(object, s)
        return object

      func sweep(object):
        // do gc stuff to free object
        if is_profiled(object):
          s = alloc_stacktrace(object)
          profile[s].frees++
          profile[s].free_bytes += sizeof(object)
        return object
  30. Allocation & Heap Profiler
      ■ Allocations per stack trace
      ■ Memory remaining inuse on the heap (allocs - frees)
      ■ Can identify the source of memory leaks, but not the references retaining the memory
  31. Allocation & Heap Profiler
      ■ Can sometimes guide CPU optimizations better than the CPU profiler
      (code screenshot made using tweetpik.com)
  32. Allocation & Heap Profiler
      ■ Second-Order Effects: Reducing allocs can make unrelated code faster (!)
      ■ 💡 Reduce allocations and the number of pointers on the heap
      (code screenshot made using tweetpik.com)
  33. Goroutine Profiler
      ■ Briefly stops all goroutines and captures their stack traces (⚠ Latency)
      ■ Useful for debugging goroutine leaks
      ■ The text output format also includes waiting times, useful for debugging “stuck” programs (block/mutex don’t show a blocking event until it has finished)
      ■ fgprof captures goroutine profiles at 100 Hz -> Wallclock Profile
      https://github.com/felixge/fgprof
  34. Bonus: Linux perf & eBPF
  35. Linux perf
      ■ Frame pointers & DWARF tables lead to good interoperability
      ■ perf offers better accuracy (but the accuracy of the built-in profilers is decent enough)
      ■ Deals with dual Go and C stacks (no need for runtime.SetCgoTraceback())
      ■ Downsides: Linux only, security, permissions, lack of profiler labels
      ■ Example: perf record -F 99 -g ./myapp && perf report
  36. eBPF (bpftrace)
      ■ Example: bpftrace -e 'profile:hz:99 { @[ustack()] = count(); }' -c ./myapp
      ■ Should require less context switching, since stacks are aggregated in the kernel
      ■ Otherwise similar caveats as Linux perf
  37. Recap
  38. Recap
      ■ Go is a bit odd for a compiled language, but ...
      ■ A wide variety of profiling and observability tools can be used with it
      ■ Most should be safe for production (⚠ goroutine profiler, execution tracer, uretprobes)
      ■ Continuous Profiling makes sure you always have the data at your fingertips
  39. Check out github.com/DataDog/go-profiler-notes for more in-depth Go profiling research
  40. Brought to you by Felix Geisendörfer
      p99@felixge.de / @felixge
