Gathering all profiling information unconditionally for every piece of processing in a distributed service can be overwhelming and costly, especially when most of that information is neither relevant nor actionable.
The idea of contextual profiling is to utilize the context captured by tracers to drive the collection and representation of the profiling data in a way that relates directly to the customer's application and business processes.
This talk sheds more light on how contextual profiling is implemented in the Datadog Java Profiler and muses on the idea of bringing the concept to OpenJDK JFR so that it may be used by all Java users.
3. Introduction
- Active in JVM performance area since 2006
- NetBeans Profiler
- VisualVM [1]
- BTrace [2]
- JMX, Serviceability
- OpenJDK member and Reviewer
- Currently at Datadog in charge of JVM profiling
- In-house profiling agent
- Heavily based on async-profiler [3]
- Participating in OpenJDK
- JFR fixes, backports
- Proposing/implementing new features
Disclaimer
- Examples will be shown in Datadog UI
- Yet, not a Datadog pitch!
[1] https://visualvm.github.io
[2] https://github.com/btraceio/btrace
[3] https://github.com/async-profiler/async-profiler
4. Continuous Profiling
- Single execution profiling
- Traditional profilers - JProfiler, VisualVM, YourKit etc.
- Used in development phase
- Overhead not a big concern
- Results restricted by the development environment
- Continuous profiling
- Cloud deployments, Continuous delivery etc.
- Profiling in development environment not sufficient
- Profiler is ‘always on’
- Past performance can be inspected and analyzed
- Very overhead sensitive
5. JDK Flight Recorder
- Available in JDK 9+ and JDK 8 after update 272
- Capture profiling data on demand
- Jcmd
- JMX
- Streaming available since JDK 14 [1]
Start JFR at JVM startup
> java -XX:StartFlightRecording=name=rec1,filename=my_recording.jfr -jar myapp.jar
More JFR options are described in Java Tools Reference [2]
Capture profiling data via JCMD
> jcmd myapp.jar JFR.dump name=rec1 filename=dump1.jfr
…
> jcmd myapp.jar JFR.dump name=rec1 filename=dump2.jfr
More JCMD related options are described in JCMD Tool Reference [3]
[1] https://openjdk.org/jeps/349
[2] https://docs.oracle.com/en/java/javase/11/tools/java.html
[3] https://docs.oracle.com/javacomponents/jmc-5-5/jfr-command-reference/diagnostic-command-reference.htm
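The same kind of on-demand dump can also be triggered from inside the JVM through the public `jdk.jfr` API. The following is a minimal sketch, assuming a JDK 11+ runtime with JFR available; the class name `OnDemandDump`, the temporary directory, and the file name are illustrative and not taken from the talk.

```java
import jdk.jfr.Recording;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class OnDemandDump {
    /**
     * Starts a recording named "rec1", runs the workload, and dumps what has
     * been recorded so far, similar in spirit to `jcmd <pid> JFR.dump name=rec1`.
     */
    static Path recordAndDump(Runnable workload) throws IOException {
        // Hypothetical output location chosen for this sketch only.
        Path out = Files.createTempDirectory("jfr-sketch").resolve("dump1.jfr");
        try (Recording rec = new Recording()) {
            rec.setName("rec1");
            rec.start();
            workload.run();
            rec.dump(out); // the recording is still running at this point
        }                  // close() stops the recording and releases its data
        return out;
    }
}
```

Calling `OnDemandDump.recordAndDump(() -> runSomeWork())` (where `runSomeWork` is a placeholder) yields a `.jfr` file that can be opened in JDK Mission Control.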
6. JVMTI Agent
- AsyncGetCallTrace
- ‘Unofficial’ API to get non-biased stacktraces
- Not really maintained
- Lurking bugs can crash your JVM
- async-profiler [1]
- Widely used fully functional profiler
- Exports to multiple formats
- CSV
- Flamegraph [2]
- JFR binary format
[1] https://github.com/async-profiler/async-profiler
[2] https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
7. Datadog Continuous Profiler
- Agent, Backend, UI
- Backend and UI are proprietary, closed source
- Agent is open source
- https://github.com/DataDog/dd-trace-java
- Opportunistic Agent
- Use any available datasource
- Combines JFR and in-house profiling agent
- Subject to availability
- AsyncGetCallTrace can be crashy on older JVMs
- JFR is not available from all Java vendors
- J9, Zing
- Integrated Agent
- Distributed as a part of the tracer agent
- Integrates with the tracer
- ! Important for context !
9. Introducing Context
- ‘Context’ is:
- A set of simple values describing current workload
- Can be thought of as tags
- User specific meaning
- ‘Context’ allows:
- Mapping performance data back to
- HTTP requests
- REST API calls
- gRPC calls
- etc.
- Slice’n’Dice analysis of the performance data
- ‘Context’ is difficult because it must be:
- Of acceptable cardinality
- Fully propagated between threads
- Executors
- Fork-Join
- Reactive frameworks
- Loom!
10. Implementing Context
- Labels in pprof
- Ready to use
- Profile size implications
- Go runtime has native support
- Nothing in JVM
- ‘Thread-coloring’ approach was considered in JRockit
- Never implemented
- JFR is still not aware of context
- Custom implementation is needed
11. Datadog Profiling Context
- Context propagation
- Implemented in Java tracer
- Context associated with a unit of work
- Independent of executing thread
- Context persistence
- Implemented in the profiler agent
- Store context in JFR events
- Easy and fast Java<->Native interop is mandatory
- No JNI calls, please!
- Shared memory buffer
- Relying on Java and native side being tightly coupled
- Semi-custom context
- Capped at ten custom tags
- Custom tag types/names
- Must be defined before profiler is started
- Stored in the JFR recording
12. Shared Memory Context
- One context per thread
- Sparse thread-page map
- Static size
- Efficient memory layout
- 64 bytes to match the common x64 cache line size
- Checksum
- Used to detect tearing and partial writes
- 64 bits / 8 bytes
- Context Content
- Provides 10 slots (currently)
- Each slot is 4 bytes
- Possibly up to 14 slots (56 bytes)
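The layout above can be sketched in Java with a direct `ByteBuffer`. This is an illustration under stated assumptions, not the Datadog implementation: the checksum position (first 8 bytes), the checksum function, the use of 0 as an "update in progress" marker, and the class name `ContextPage` are all hypothetical.

```java
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ContextPage {
    static final int PAGE_SIZE = 64;      // one common x64 cache line
    static final int CHECKSUM_OFFSET = 0; // assumption: checksum stored first
    static final int SLOTS_OFFSET = 8;    // slots follow the 8-byte checksum
    static final int SLOT_COUNT = 10;     // 10 x 4 bytes = 40 bytes in use

    // Direct (off-heap) memory, the kind a native reader could access without JNI calls.
    final ByteBuffer page =
            ByteBuffer.allocateDirect(PAGE_SIZE).order(ByteOrder.nativeOrder());

    /** Hypothetical checksum; never returns 0, which marks an update in progress. */
    static long checksum(int[] slots) {
        long h = 1469598103934665603L;
        for (int s : slots) {
            h = (h ^ s) * 1099511628211L;
        }
        return h == 0 ? 1 : h;
    }

    void write(int[] slots) {
        if (slots.length != SLOT_COUNT) {
            throw new IllegalArgumentException("expected " + SLOT_COUNT + " slots");
        }
        page.putLong(CHECKSUM_OFFSET, 0L);              // 1. invalidate the page
        VarHandle.storeStoreFence();                    // keep the stores in order
        for (int i = 0; i < SLOT_COUNT; i++) {          // 2. write the slots
            page.putInt(SLOTS_OFFSET + 4 * i, slots[i]);
        }
        VarHandle.storeStoreFence();
        page.putLong(CHECKSUM_OFFSET, checksum(slots)); // 3. publish
    }

    /** Returns the slots, or null when the page is empty, mid-update, or torn. */
    int[] read() {
        long stored = page.getLong(CHECKSUM_OFFSET);
        int[] slots = new int[SLOT_COUNT];
        for (int i = 0; i < SLOT_COUNT; i++) {
            slots[i] = page.getInt(SLOTS_OFFSET + 4 * i);
        }
        return (stored != 0 && stored == checksum(slots)) ? slots : null;
    }
}
```

A reader that interrupts the writer between steps 1 and 3 sees either a zero checksum or a mismatch and discards the sample's context instead of attributing it to the wrong tags.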
14. JFR Event Context
- Contextual JFR events
- Context slots in use are stored as event attributes
- Event schema generated at startup
- Store context slot names/ids
- Use Settings event
- Dictionarized context contents
- Strings mapped to unique IDs
- Context content is the ID
- Strings stored as JFR constant pool
- Standard JFR binary format feature
- Custom context is fully restorable
- Context slot name
- Context slot value
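The idea of generating an event schema at startup from the registered slot names can be approximated with the JDK's public `jdk.jfr.EventFactory` API. This is only an analogy: the talk describes a native agent that writes the JFR format itself, whereas the sketch below goes through the Java API. The event name `sketch.ContextualSample` and the class name are hypothetical, and slot names are assumed to be valid Java identifiers.

```java
import jdk.jfr.AnnotationElement;
import jdk.jfr.Event;
import jdk.jfr.EventFactory;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.ValueDescriptor;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class ContextualEventSketch {
    static final String EVENT_NAME = "sketch.ContextualSample"; // hypothetical name

    /** Builds an event type with one String attribute per context slot. */
    static EventFactory schemaFor(List<String> slotNames) {
        List<ValueDescriptor> fields = new ArrayList<>();
        for (String slot : slotNames) {
            fields.add(new ValueDescriptor(String.class, slot,
                    List.of(new AnnotationElement(Label.class, slot))));
        }
        List<AnnotationElement> eventAnnotations = List.of(
                new AnnotationElement(Name.class, EVENT_NAME),
                new AnnotationElement(Label.class, "Contextual Sample"));
        return EventFactory.create(eventAnnotations, fields);
    }

    /** Records one event carrying the given values, then restores "name=value" pairs. */
    static List<String> roundTrip(List<String> slotNames, List<String> values)
            throws IOException {
        EventFactory factory = schemaFor(slotNames);
        Path file = Files.createTempDirectory("ctx-jfr").resolve("context.jfr");
        try (Recording recording = new Recording()) {
            recording.enable(EVENT_NAME).withoutStackTrace();
            recording.start();
            Event event = factory.newEvent();
            for (int i = 0; i < values.size(); i++) {
                event.set(i, values.get(i)); // field index follows slot order
            }
            event.commit();
            recording.stop();
            recording.dump(file);
        }
        List<String> restored = new ArrayList<>();
        for (RecordedEvent e : RecordingFile.readAllEvents(file)) {
            if (e.getEventType().getName().equals(EVENT_NAME)) {
                for (String slot : slotNames) {
                    restored.add(slot + "=" + e.getString(slot));
                }
            }
        }
        return restored;
    }
}
```

Because the slot names are part of the event type stored in the recording, a reader can recover both the name and the value of each context slot from the file alone.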
15. Java API
- ContextSetter
- Register context before profiling is started
- Count
- Names
- Set context values
- Register dictionarized strings
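The shape of such an API might look like the sketch below. It is not Datadog's actual `ContextSetter` signature: the class `SketchContextSetter`, its method names, the per-thread `int[]` storage, and the choice of 0 as "no value" are assumptions made for illustration. Only the cap of ten attributes, registration before profiling starts, and the string-to-ID dictionary come from the talk.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class SketchContextSetter {
    static final int MAX_SLOTS = 10; // cap on custom tags mentioned in the talk

    private final List<String> names; // fixed once, before profiling starts
    private final Map<String, Integer> dictionary = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(1); // 0 means "no value"
    private final ThreadLocal<int[]> slots;                    // one context per thread

    SketchContextSetter(List<String> attributeNames) {
        if (attributeNames.size() > MAX_SLOTS) {
            throw new IllegalArgumentException("at most " + MAX_SLOTS + " attributes");
        }
        this.names = List.copyOf(attributeNames);
        int count = this.names.size();
        this.slots = ThreadLocal.withInitial(() -> new int[count]);
    }

    /** Registers a string in the dictionary and returns its stable ID. */
    int encode(String value) {
        return dictionary.computeIfAbsent(value, v -> nextId.getAndIncrement());
    }

    /** Stores the dictionary ID of the value in the slot of the named attribute. */
    boolean setContextValue(String name, String value) {
        int slot = names.indexOf(name);
        if (slot < 0) {
            return false; // attribute was not registered up front
        }
        slots.get()[slot] = encode(value);
        return true;
    }

    /** Returns the ID currently stored for the attribute on this thread, or 0. */
    int slotValue(String name) {
        int slot = names.indexOf(name);
        return slot < 0 ? 0 : slots.get()[slot];
    }
}
```

Storing a 4-byte ID rather than the string itself is what lets each slot fit the fixed-size per-thread layout; the strings only need to be written once, alongside the recording.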
16. Context Propagation
- Context is bound to a work item
- Work item can be processed by multiple threads
- Manual threads
- Thread pools
- Reactive frameworks
- Context must be carried from thread to thread
- Concept of activate/deactivate
- Piggy-back on distributed tracing
- Datadog Tracer already does propagation
- Context propagation ‘for free’
- Profiling context needs more detailed propagation
- Tracer needs to be aware of profiler needs
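The activate/deactivate concept can be illustrated with a task wrapper that captures the context at submission time and restores it on whichever thread runs the task. This is a generic sketch of the technique, not the Datadog tracer's mechanism; `WorkContext`, `Scope`, and `propagate` are hypothetical names, and the context is reduced to a single string.

```java
import java.util.concurrent.Callable;

final class WorkContext {
    private static final ThreadLocal<String> ACTIVE = new ThreadLocal<>();

    /** Handle returned by activate(); closing it deactivates the context. */
    static final class Scope implements AutoCloseable {
        private final String previous;

        private Scope(String previous) {
            this.previous = previous;
        }

        @Override
        public void close() {
            // Restore whatever was active before this scope was opened.
            if (previous == null) {
                ACTIVE.remove();
            } else {
                ACTIVE.set(previous);
            }
        }
    }

    static String current() {
        return ACTIVE.get();
    }

    /** Makes the context active on the calling thread until the scope is closed. */
    static Scope activate(String context) {
        Scope scope = new Scope(ACTIVE.get());
        if (context == null) {
            ACTIVE.remove();
        } else {
            ACTIVE.set(context);
        }
        return scope;
    }

    /** Captures the submitter's context and re-activates it around the task. */
    static <T> Callable<T> propagate(Callable<T> task) {
        String captured = current();
        return () -> {
            try (Scope ignored = activate(captured)) {
                return task.call();
            }
        };
    }
}
```

With this wrapper, `pool.submit(WorkContext.propagate(task))` runs `task` under the submitter's context, and the worker thread returns to its previous state afterwards. Instrumentation typically applies this kind of wrapping automatically to executors and reactive schedulers rather than asking users to call it.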
18. Wrap-up
- Benefits of using profiling context
- Captures additional information about the environment during profiling
- Enables slicing and dicing of profiling data to identify performance issues
- Associates profiling data with specific users or inputs for informed decisions
- Provides connection between parallel code execution to identify interactions
- Overall, helps developers optimize code more effectively
- Next steps
- Bring the benefits to JDK/JFR
- Many built-in events would benefit from this
- Locks, I/O, etc.
- Standardized implementation