Future-Proof JVM Profiling
Evolving the platform profiling support
Jaroslav Bachorik
Staff Software Engineer
Datadog
Profiling Is …
● Capturing application performance data
○ … and using it to improve resource usage
● Can be classified into
○ Execution Profiling
Where is the CPU time being spent?
○ Memory Profiling
■ Allocation profiling
Which code allocates the most, and of which types?
■ Heap profiling
Which objects are retained, and who allocated them?
○ Latency Profiling
What is causing the application to do ‘nothing’?
■ Wallclock profiling
■ Lock profiling
■ (Synchronous) I/O profiling
■ Syscall profiling
Sampling vs. Tracing Profiling
● Sampling profiling
○ collects random samples
○ creates a statistical representation of the process behaviour
○ lightweight
○ does not provide exact durations or a call graph
● Tracing profiling
○ traces each method invocation with its stack trace and timing
○ highly intrusive
■ overhead
■ interference with the JIT and memory management
○ provides an exact call graph and duration information
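Because a sampling profiler only sees a random subset of moments, every number it reports is an estimate whose precision depends on the sample count. A minimal Java sketch (Java 16+ for the record), assuming samples are independent, which real profilers only approximate: it turns hypothetical per-method sample counts into an estimated share of time with a normal-approximation 95% confidence interval. The class, method and frame names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class SampleShare {
    /** Estimated share p = k/n with a 95% normal-approximation interval. */
    public record Estimate(double share, double low, double high) {}

    public static Estimate estimate(long hits, long totalSamples) {
        if (totalSamples <= 0 || hits < 0 || hits > totalSamples) {
            throw new IllegalArgumentException("need 0 <= hits <= totalSamples and totalSamples > 0");
        }
        double p = (double) hits / totalSamples;
        // Standard error of a binomial proportion; it shrinks with 1/sqrt(n).
        double se = Math.sqrt(p * (1 - p) / totalSamples);
        double margin = 1.96 * se;
        return new Estimate(p, Math.max(0, p - margin), Math.min(1, p + margin));
    }

    public static void main(String[] args) {
        // Hypothetical histogram: top-of-stack method -> number of samples.
        Map<String, Long> histogram = new LinkedHashMap<>();
        histogram.put("Parser.parse", 600L);
        histogram.put("Codec.encode", 300L);
        histogram.put("Other", 100L);
        long total = histogram.values().stream().mapToLong(Long::longValue).sum();
        histogram.forEach((method, hits) -> {
            Estimate e = estimate(hits, total);
            System.out.printf("%-14s %5.1f%% (95%% CI %.1f%%..%.1f%%)%n",
                    method, 100 * e.share(), 100 * e.low(), 100 * e.high());
        });
    }
}
```

With 1,000 samples, a method seen in 600 of them is estimated at 60% with a margin of roughly ±3 percentage points; quadrupling the sample count halves that margin.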
Sampling Profiling!
● Good enough results
● Acceptable overhead
● In practice ‘profiling’ == ‘sampling profiling’
Profiling In Cloud
● System deployments are complex
○ Cloud, K8s
● Profiling in isolation is not enough
○ Same service, multiple instances
○ Same service, multiple environments
○ How to correlate and distinguish them?
● Enter APM - Application Performance Management
○ Combined tracing/profiling
■ Tracing provides ‘coarse’ information about operations
■ Profiling fills in the gory details of code execution
○ User in control of what is traced
■ Profiles scoped to traces allow causal analysis
○ Frame-level information exposed by the profiler
■ Can drive debugging sessions via dynamic instrumentation
Demo Time!
JVM Profiling Support
- JMX
- A complex management and observability framework
- Since 2003, JSR 160
- Easily used from Java
- JVMTI
- Low level tooling interface
- Since 2004, JSR 163
- Requires native agent
- JFR
- One-stop solution for JVM (and application) observability
- In OpenJDK since 2018 (JDK 11; backported to JDK 8 in 2020, update 262/272)
- No special agent required
- AsyncGetCallTrace (ASGCT)
- A ‘special’ way to get a ‘raw’ stack trace
- Introduced for Sun One Studio in 2004
- Requires native agent and custom profiling infrastructure
JMX
- Execution profiling
- GetAllStackTraces
- Safe-point biased
- Overhead grows with the number of sampled threads
- Outdated method
- Ubiquitous
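The JMX-based approach can be sketched in a few lines of Java. On the Java side, the closest equivalents of GetAllStackTraces are Thread.getAllStackTraces() and ThreadMXBean.dumpAllThreads(), and the sketch uses the latter. The sampler class is hypothetical and only counts top-of-stack frames. As far as HotSpot is concerned, dumping all threads is a safepoint operation. That is why the reported frames are safepoint-biased and each sample gets more expensive as the thread count grows.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

public final class JmxSampler {
    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    private final Map<String, Integer> topFrames = new HashMap<>();

    /** Takes one sample of every live thread; returns how many non-empty stacks were seen. */
    public int sampleOnce() {
        // dumpAllThreads(lockedMonitors, lockedSynchronizers) walks every thread,
        // so the cost of a single sample grows with the number of threads.
        ThreadInfo[] infos = threads.dumpAllThreads(false, false);
        int seen = 0;
        for (ThreadInfo info : infos) {
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length == 0) continue; // e.g. some JVM-internal threads
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            topFrames.merge(top, 1, Integer::sum);
            seen++;
        }
        return seen;
    }

    public Map<String, Integer> topFrames() {
        return topFrames;
    }

    public static void main(String[] args) throws InterruptedException {
        JmxSampler sampler = new JmxSampler();
        for (int i = 0; i < 50; i++) {
            sampler.sampleOnce();
            Thread.sleep(10); // wallclock-driven sampling interval
        }
        sampler.topFrames().entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(5)
                .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```

Note that the sampling thread also appears in its own samples, which is one more source of noise in this style of profiler.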
JVMTI
- Focused on tracing profiling
- very high overhead
- Execution Sampling
- GetAllStackTraces or GetStackTrace
- Frames are referenced via jmethodID, which has validity issues
- becomes invalid if the declaring class is unloaded
- strong references to all classes in a stack trace can’t be taken atomically
- Safe-point biased (as JMX)
- Allocation sampling since JDK 11
- JEP 331
- Not biased towards TLAB size
- Modern sampler with known statistical properties
- Samples can be ‘upscaled’ to estimates of real allocation sizes
- Profiling support feels very 2000s
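The ‘upscaling’ idea can be illustrated without JVMTI. The following is a pure-Java simulation, not HotSpot code. It assumes that sample points form a Poisson process over allocated bytes, with a configurable mean interval. JEP 331 describes a randomized interval whose default mean is 512 KiB, but the exact distribution HotSpot uses is not reproduced here. An object of size s is sampled with probability 1 - exp(-s/R). Dividing its size by that probability gives an unbiased estimate of the total allocated bytes. AllocationSampler is a hypothetical name.

```java
import java.util.Random;

/** Simulated allocation sampler: on average one sample point per meanIntervalBytes allocated. */
public final class AllocationSampler {
    private final double meanIntervalBytes;
    private final Random random;
    private double bytesUntilSample;
    private double estimatedBytes; // sum of upscaled sample weights

    public AllocationSampler(double meanIntervalBytes, long seed) {
        this.meanIntervalBytes = meanIntervalBytes;
        this.random = new Random(seed);
        this.bytesUntilSample = nextInterval();
    }

    // Exponentially distributed gap between sample points (mean = meanIntervalBytes).
    private double nextInterval() {
        return -meanIntervalBytes * Math.log(1.0 - random.nextDouble());
    }

    /** Records an allocation; returns true if this object was sampled. */
    public boolean onAllocation(long sizeBytes) {
        bytesUntilSample -= sizeBytes;
        if (bytesUntilSample > 0) return false;
        // At least one sample point fell inside this object: record it once and
        // skip any further points that also fall inside the same object.
        do {
            bytesUntilSample += nextInterval();
        } while (bytesUntilSample <= 0);
        // P(object sampled) = 1 - exp(-size / mean); dividing by it upscales the sample.
        double probability = 1.0 - Math.exp(-sizeBytes / meanIntervalBytes);
        estimatedBytes += sizeBytes / probability;
        return true;
    }

    public double estimatedBytes() {
        return estimatedBytes;
    }

    public static void main(String[] args) {
        AllocationSampler sampler = new AllocationSampler(512 * 1024, 42);
        long actual = 0;
        int samples = 0;
        for (int i = 0; i < 1_000_000; i++) {
            long size = 1024;
            actual += size;
            if (sampler.onAllocation(size)) samples++;
        }
        System.out.printf("samples=%d actual=%d estimated=%.0f%n",
                samples, actual, sampler.estimatedBytes());
    }
}
```

With one million 1 KiB allocations and a 512 KiB mean interval, roughly two thousand objects are sampled. The upscaled sum should still land within a few percent of the true total of about 1 GB.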
JFR
- JDK Flight Recorder
- Low overhead observability framework
- Supports all profiling modes
- Execution Sampling
- low overhead
- avoids safe-point bias
- not a ‘true’ CPU profiler
- sampler driven by wallclock interval
- failed samples are not reported
- separate sampler thread - possibility of starvation
- non-trivial upscaling to CPU time per sample
- Allocation Sampling
- low overhead
- biased by TLAB size
- non-uniform sampling = no easy upscaling
- Heap Profiling
- biased by TLAB size
- can collect reference chains (lightweight heap dump)
- Lock Profiling
- Only blocking above a minimum duration threshold is reported
- Prevents swamping the recording with lock events
- Makes the profiler blind to latency induced by very many short lock events
- Syscall Profiling
- Only indirectly: a wallclock profile of threads running JNI native code
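JFR's execution sampling and lock thresholds are configured per event type. The following is a minimal sketch using the public jdk.jfr API (JDK 11+). It enables jdk.ExecutionSample with a 10 ms period and jdk.JavaMonitorEnter with a 10 ms threshold. Both values are chosen for illustration and are not claimed to be JFR's defaults. The sketch then records a CPU-bound loop and counts events per type. Any contended lock held for less than the threshold would not appear in this recording at all.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public final class JfrProfileDemo {
    private static volatile long sink; // keeps the busy loop from being optimized away

    /** Records a CPU-bound workload and returns the number of events per event type. */
    public static Map<String, Integer> recordBusyWork(Duration busyFor) {
        try {
            Path file = Files.createTempFile("profile", ".jfr");
            try (Recording recording = new Recording()) {
                // Execution samples are taken on a fixed wallclock period, not per CPU time.
                recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10));
                // Contended monitor entries are only recorded above this duration.
                recording.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(10));
                recording.start();
                spin(busyFor);
                recording.stop();
                recording.dump(file);
            }
            Map<String, Integer> counts = new TreeMap<>();
            for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
                counts.merge(event.getEventType().getName(), 1, Integer::sum);
            }
            Files.deleteIfExists(file);
            return counts;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void spin(Duration duration) {
        long end = System.nanoTime() + duration.toNanos();
        long x = 0;
        while (System.nanoTime() < end) {
            for (int i = 0; i < 100_000; i++) {
                x = x * 31 + i; // pure Java work, so samples land in Java frames
            }
        }
        sink = x;
    }

    public static void main(String[] args) {
        System.out.println(recordBusyWork(Duration.ofMillis(500)));
    }
}
```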
AsyncGetCallTrace (ASGCT)
- ‘Unofficial’ API to get stack traces without safepoint bias
- Added for Sun One Studio many years ago
- Not really maintained
- Lurking bugs can crash your JVM
- Called from a signal handler -> the stack may be in an inconsistent state
- Some have been fixed
- Innocent-looking methods mutating global state
- Asserts and guards for invariants that do not hold when called from ASGCT
- Still, the foundation of almost all 3rd-party Java profilers
Can I Just Use JFR?
- TL;DR - almost
- Some parts are still missing and have to be provided by 3rd-party profilers
- ‘Proper’ CPU profiler
- driven by CPU time rather than wallclock time
- Unbiased allocation profiler
- the JFR allocation sampler is biased by TLAB size
- Unbiased heap profiler
- JFR trades its unbiased nature for reference chains
- Profiling context
- required for tracer-profiler integration (think OTel)
- labelling events by context
- guarding events by context
- e.g. use the presence of a context instead of a duration threshold
- JFR is currently largely closed to enhancements
- Adding the support that contemporary profiling needs is excruciatingly slow
Really, Can I Just Use JFR?
● Yes, once the following features are implemented
○ [Proper] CPU profiler
○ Profiling context
● An API to request event emission from native code would be great!
○ Custom sampling policies
○ Integration with perf events (woohoo!)
○ eBPF, anyone?
○ Prototyping concepts in 3rd-party code before adding them to the JFR core
Improved JFR CPU Profiler
● Use a CPU-time-based sampler driver (perf_event_open, timer_create)
○ Subject to availability
■ Prefer perf_event_open, if available
■ Fall back to timer_create, if available
■ If not on Linux, or if neither perf_event_open nor timer_create is available, fall back to the dedicated sampler thread
○ Alternatively, provide a way to request an ExecutionSample event from a native signal handler
● Make stack trace acquisition safe to run in a signal handler
○ JEP 435: Asynchronous Stack Trace VM API
■ Samples recorded at the exact PC
■ But the stack is walked only at the method-exit safe-point
(credits to Erik Oesterlund for this idea!)
■ Johannes Bechberger is making great progress
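The proposed driver relies on native CPU-time timers (perf_event_open, or timer_create on a thread CPU-time clock) that interrupt the running thread. Pure Java cannot do that. The difference between wallclock-driven and CPU-time-driven sampling can still be sketched by polling ThreadMXBean.getThreadCpuTime. In the sketch, a thread earns one ‘sample’ per millisecond of CPU it consumes, so a spinning thread accumulates samples while a sleeping one does not. This polling emulation is an illustrative assumption only. It records how much CPU each thread used but not where, because it takes no stack traces, and it is not how the proposed JFR sampler would work. CpuTimeSampler is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

/** Emulated CPU-time-driven sampling: one "sample" per CPU interval a thread consumes. */
public final class CpuTimeSampler {
    private static volatile long sink;

    private final ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    private final long intervalNanos;
    private final Map<Long, Long> lastCpu = new HashMap<>();
    private final Map<Long, Long> samples = new HashMap<>();

    public CpuTimeSampler(long intervalNanos) {
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time not supported");
        }
        mx.setThreadCpuTimeEnabled(true);
        this.intervalNanos = intervalNanos;
    }

    /** Credits each thread with one sample per full CPU interval consumed since the last poll. */
    public void poll(long... threadIds) {
        for (long id : threadIds) {
            long cpu = mx.getThreadCpuTime(id); // -1 if the thread is not alive
            if (cpu < 0) continue;
            Long last = lastCpu.putIfAbsent(id, cpu);
            if (last == null) continue; // first sighting: only record a baseline
            long ticks = (cpu - last) / intervalNanos;
            if (ticks > 0) {
                samples.merge(id, ticks, Long::sum);
                lastCpu.put(id, last + ticks * intervalNanos); // keep the remainder
            }
        }
    }

    public long samplesFor(long threadId) {
        return samples.getOrDefault(threadId, 0L);
    }

    /** Runs a spinning and a sleeping thread side by side; returns {busySamples, idleSamples}. */
    public static long[] demo(long durationMillis) {
        CpuTimeSampler sampler = new CpuTimeSampler(1_000_000); // one sample per 1 ms of CPU
        long end = System.nanoTime() + durationMillis * 1_000_000;
        Thread busy = new Thread(() -> {
            long x = 0;
            while (System.nanoTime() < end) x = x * 31 + 1;
            sink = x;
        });
        Thread idle = new Thread(() -> sleepQuietly(durationMillis));
        busy.start();
        idle.start();
        while (System.nanoTime() < end) {
            sampler.poll(busy.getId(), idle.getId());
            sleepQuietly(5); // wallclock polling period, independent of the CPU interval
        }
        joinQuietly(busy);
        joinQuietly(idle);
        return new long[] {sampler.samplesFor(busy.getId()), sampler.samplesFor(idle.getId())};
    }

    private static void sleepQuietly(long millis) {
        try { Thread.sleep(millis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    private static void joinQuietly(Thread t) {
        try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```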
JFR Context
● What is context?
○ Trace ID
○ REST endpoint
○ Request URL
○ … and anything else allowing to scope JFR events
● Start simple
○ Context is attached to a (virtual) thread
○ Context value is a plain string
○ A small, finite number of context values
○ Context values are represented as augmented event fields
○ No automatic context propagation
■ It is up to the API user to make sure continuations carry the right context
● There is prior art, e.g. in Go
○ Profiler labels
○ A first-class runtime citizen
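The ‘start simple’ model above can be sketched in plain Java with a ThreadLocal. In that model, context lives on the thread, values are plain strings, the set of attributes is small and fixed, and nothing propagates automatically. This is a hypothetical illustration, not the proposed JFR API: ProfilingContext, its attribute names and wrap() are invented here. Because nothing propagates automatically, the caller has to wrap continuations explicitly to carry the context across threads.

```java
import java.util.List;

/** Hypothetical per-thread profiling context with a small, fixed set of string attributes. */
public final class ProfilingContext {
    // Attribute names are declared up front; values are plain strings.
    private static final List<String> ATTRIBUTES = List.of("trace_id", "endpoint");
    private static final ThreadLocal<String[]> CURRENT =
            ThreadLocal.withInitial(() -> new String[ATTRIBUTES.size()]);

    private ProfilingContext() {}

    public static void set(String attribute, String value) {
        CURRENT.get()[slot(attribute)] = value;
    }

    public static String get(String attribute) {
        return CURRENT.get()[slot(attribute)];
    }

    private static int slot(String attribute) {
        int slot = ATTRIBUTES.indexOf(attribute);
        if (slot < 0) throw new IllegalArgumentException("undeclared attribute: " + attribute);
        return slot;
    }

    /**
     * No automatic propagation: a continuation only carries the context if the
     * caller wraps it. The snapshot is taken on the calling thread.
     */
    public static Runnable wrap(Runnable task) {
        String[] captured = CURRENT.get().clone();
        return () -> {
            String[] current = CURRENT.get();
            String[] saved = current.clone();
            System.arraycopy(captured, 0, current, 0, captured.length);
            try {
                task.run();
            } finally {
                System.arraycopy(saved, 0, current, 0, saved.length); // restore previous context
            }
        };
    }
}
```

For example, `executor.submit(ProfilingContext.wrap(task))` would run the task with the submitter's trace_id and restore the worker thread's previous context afterwards. Forgetting the wrap silently loses the context, which is the cost of not propagating it automatically.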
JFR Context Alternatives
● We (Datadog) tried to work around the lack of context with
○ Special ‘Context’ events
■ Each event spans the time between setting and unsetting the context
■ Huge number of such events for reactive/async apps
■ Gets very complex when tracking more than one context attribute
■ Thresholding does not help, as it leads to unpredictable context loss
■ Difficult to correlate with the rest of the data, which may be sampled
○ Special ‘Context Change’ events
■ Each event represents a state transition
■ Easier to correlate with potentially sampled data
■ The number of events turned out to be unbearable (millions per minute)
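The first workaround, ‘Context’ events that span a context, can be sketched with the public JFR custom-event API (jdk.jfr.Event, JDK 11+). The sketch also uses Stream.toList from JDK 16. The event name example.ContextSpan and the class names are hypothetical, not Datadog's actual events. Every set/unset pair produces one event, which is why reactive or async code that switches context constantly ends up generating huge event volumes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;

public final class ContextSpans {
    /** One event per "context set ... context unset" interval, carrying the context value. */
    @Name("example.ContextSpan")
    @Label("Context Span")
    public static class ContextSpanEvent extends Event {
        @Label("Trace ID")
        public String traceId;
    }

    /** Runs work inside a context span; the event's duration covers the time the context was set. */
    public static void inContext(String traceId, Runnable work) {
        ContextSpanEvent span = new ContextSpanEvent();
        span.traceId = traceId;
        span.begin();
        try {
            work.run();
        } finally {
            span.commit(); // ends the span and writes one event (if enabled)
        }
    }

    /** Records the given number of spans and returns the trace IDs read back from the file. */
    public static List<String> recordSpans(int count) {
        try {
            Path file = Files.createTempFile("context", ".jfr");
            try (Recording recording = new Recording()) {
                recording.enable(ContextSpanEvent.class); // no threshold: every span is kept
                recording.start();
                for (int i = 0; i < count; i++) {
                    inContext("trace-" + i, () -> { /* a unit of work */ });
                }
                recording.stop();
                recording.dump(file);
            }
            List<String> ids = RecordingFile.readAllEvents(file).stream()
                    .filter(e -> e.getEventType().getName().equals("example.ContextSpan"))
                    .map(e -> e.getString("traceId"))
                    .toList();
            Files.deleteIfExists(file);
            return ids;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Adding a duration threshold to the event would cut the volume, but as the slide notes it drops exactly the short spans, so context is lost unpredictably.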
External Context Implementation
● Implement the context in a separate profiler
● Datadog profiler has such an implementation
● It comes at the cost of
○ Replicating the JFR writer implementation
○ Replicating several JFR provided events
○ Missing context for low-level events, such as most of the thread-halting events (ThreadPark, MonitorWait, etc.)
○ Relying on ASGCT, which may crash the profiled app
● Still, our customers love the feature for the increased clarity of the profiling data
TL;DR Datadog Profiling Context
Datadog Profiling Context
- Context propagation
- Implemented in Java tracer
- Context associated with a unit of work
- Independent of executing thread
- Context persistence
- Implemented in the profiler agent
- Store context in JFR events
- Easy and fast Java<->Native interop is mandatory
- No JNI calls, please!
- Shared memory buffer
- Relies on the Java and native sides being tightly coupled
- Tags are plain strings
- Dictionarized
- No custom types
- Semi-custom context
- Capped at ten custom tags
- Custom tag types/names
- Must be defined before profiler is started
- Stored in the JFR recording
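The following hypothetical Java registry sketches the tag handling above: values are plain strings, dictionary-encoded, with at most ten tag names fixed before the profiler starts. ContextTags is not Datadog's API. It only shows why dictionary encoding fits a shared-memory design: each value becomes a small int id, so a tag fits in a fixed 4-byte slot. The strings only need to be resolved when the recording is written or read.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical registry: fixed tag names, string values encoded as int ids. */
public final class ContextTags {
    public static final int MAX_TAGS = 10;

    private final List<String> tagNames;
    private final Map<String, Integer> dictionary = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(1); // 0 means "no value"

    /** Tag names are fixed here, before profiling starts. */
    public ContextTags(List<String> tagNames) {
        if (tagNames.size() > MAX_TAGS) {
            throw new IllegalArgumentException("at most " + MAX_TAGS + " tags");
        }
        this.tagNames = List.copyOf(tagNames);
    }

    /** Slot index of a tag, i.e. its position in the per-thread context record. */
    public int slotOf(String tagName) {
        int slot = tagNames.indexOf(tagName);
        if (slot < 0) throw new IllegalArgumentException("unknown tag: " + tagName);
        return slot;
    }

    /** Dictionary-encodes a value; the same string always maps to the same id. */
    public int encode(String value) {
        return dictionary.computeIfAbsent(value, v -> nextId.getAndIncrement());
    }

    /** Reverse mapping (id -> string), e.g. for resolving ids when the recording is read. */
    public Map<Integer, String> decodeTable() {
        Map<Integer, String> table = new TreeMap<>();
        dictionary.forEach((value, id) -> table.put(id, value));
        return table;
    }
}
```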
Shared Memory Context
- One context per thread
- Sparse thread-page map
- Static size
- Efficient memory layout
- 64 bytes to match the common x64 cache line size
- Checksum
- Used to detect tearing (partial writes)
- 64 bit/8 bytes
- Context Content
- Provides 10 slots (currently)
- Each slot is 4 bytes
- Possibly up to 14 slots (56 bytes)
Shared Memory Context
[Figure: thread page map (Thread 1, Thread 2, …, Thread N); each thread owns a 64-byte record (e.g. one cache line) holding the context data (10 slots, 40 bytes) and a 64-bit checksum]
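The record layout above can be mimicked in Java with a direct ByteBuffer. This is a sketch under stated assumptions, not Datadog's implementation:

- threads are addressed by a dense index instead of a sparse page map
- the checksum (64-bit FNV-1a here) sits in the last 8 bytes of the 64-byte record
- 0 is reserved to mean ‘write in progress’

The reader recomputes the checksum over the slots and rejects the record on a mismatch, which is how a torn or partial write is detected.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch of per-thread, cache-line-sized context records guarded by a checksum. */
public final class SharedContext {
    static final int RECORD_BYTES = 64;    // one (typical x86-64) cache line per thread
    static final int SLOTS = 10;           // 10 slots x 4 bytes = 40 bytes of context data
    static final int CHECKSUM_OFFSET = 56; // assumption: checksum in the last 8 bytes

    private final ByteBuffer buffer;

    public SharedContext(int maxThreads) {
        // Direct (off-heap) memory, the kind of buffer native code can read without a JNI call per update.
        buffer = ByteBuffer.allocateDirect(maxThreads * RECORD_BYTES).order(ByteOrder.nativeOrder());
    }

    /** Writer (owning thread): invalidate the checksum, write the slots, then publish a new checksum. */
    public void write(int threadIndex, int[] values) {
        if (values.length != SLOTS) throw new IllegalArgumentException("expected " + SLOTS + " values");
        int base = threadIndex * RECORD_BYTES;
        buffer.putLong(base + CHECKSUM_OFFSET, 0L); // 0 = write in progress
        for (int i = 0; i < SLOTS; i++) {
            buffer.putInt(base + 4 * i, values[i]);
        }
        buffer.putLong(base + CHECKSUM_OFFSET, checksum(values));
    }

    /** Reader (profiler): returns the slot values, or null if the record is unset or torn. */
    public int[] read(int threadIndex) {
        int base = threadIndex * RECORD_BYTES;
        long stored = buffer.getLong(base + CHECKSUM_OFFSET);
        int[] values = new int[SLOTS];
        for (int i = 0; i < SLOTS; i++) {
            values[i] = buffer.getInt(base + 4 * i);
        }
        return stored != 0 && stored == checksum(values) ? values : null;
    }

    /** Test hook: changes one slot without updating the checksum, simulating a torn write. */
    void corruptSlot(int threadIndex, int slot, int value) {
        buffer.putInt(threadIndex * RECORD_BYTES + 4 * slot, value);
    }

    /** 64-bit FNV-1a over the slot bytes; 0 is reserved, so it is mapped to 1. */
    static long checksum(int[] values) {
        long h = 0xcbf29ce484222325L;
        for (int v : values) {
            for (int b = 0; b < 4; b++) {
                h ^= (v >>> (8 * b)) & 0xFF;
                h *= 0x100000001b3L;
            }
        }
        return h == 0 ? 1 : h;
    }
}
```

ByteBuffer accesses are plain memory operations with no ordering guarantees between threads. A real Java/native protocol would need explicit ordering, for example VarHandle release/acquire access modes. Alternatively, the design could ensure that a record is only read while its owning thread is interrupted.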
Java API

More Related Content

What's hot

Automated testing with Cypress
Automated testing with CypressAutomated testing with Cypress
Automated testing with CypressYong Shean Chong
 
マイクロサービスに至る歴史とこれから - XP祭り2021
マイクロサービスに至る歴史とこれから - XP祭り2021マイクロサービスに至る歴史とこれから - XP祭り2021
マイクロサービスに至る歴史とこれから - XP祭り2021Yusuke Suzuki
 
Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例
Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例
Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例Yahoo!デベロッパーネットワーク
 
Google BigQuery クエリの処理の流れ - #bq_sushi
Google BigQuery クエリの処理の流れ - #bq_sushi Google BigQuery クエリの処理の流れ - #bq_sushi
Google BigQuery クエリの処理の流れ - #bq_sushi Google Cloud Platform - Japan
 
Productionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingProductionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingDatabricks
 
Akkaで分散システム入門
Akkaで分散システム入門Akkaで分散システム入門
Akkaで分散システム入門Shingo Omura
 
LibreOffice API について
LibreOffice API についてLibreOffice API について
LibreOffice API について健一 辰濱
 
困らない程度のJDK入門
困らない程度のJDK入門困らない程度のJDK入門
困らない程度のJDK入門Yohei Oda
 
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)NTT DATA Technology & Innovation
 
Swaggerで始めるモデルファーストなAPI開発
Swaggerで始めるモデルファーストなAPI開発Swaggerで始めるモデルファーストなAPI開発
Swaggerで始めるモデルファーストなAPI開発Takuro Sasaki
 
新人Git/Github研修公開用スライド(その1)
新人Git/Github研修公開用スライド(その1)新人Git/Github研修公開用スライド(その1)
新人Git/Github研修公開用スライド(その1)pupupopo88
 
REST API のコツ
REST API のコツREST API のコツ
REST API のコツpospome
 
分析指向データレイク実現の次の一手 ~Delta Lake、なにそれおいしいの?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
分析指向データレイク実現の次の一手 ~Delta Lake、なにそれおいしいの?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)分析指向データレイク実現の次の一手 ~Delta Lake、なにそれおいしいの?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
分析指向データレイク実現の次の一手 ~Delta Lake、なにそれおいしいの?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)NTT DATA Technology & Innovation
 
初心者向けMongoDBのキホン!
初心者向けMongoDBのキホン!初心者向けMongoDBのキホン!
初心者向けMongoDBのキホン!Tetsutaro Watanabe
 
Graylog Engineering - Design Your Architecture
Graylog Engineering - Design Your ArchitectureGraylog Engineering - Design Your Architecture
Graylog Engineering - Design Your ArchitectureGraylog
 
Micrometer/Prometheusによる大規模システムモニタリング #jsug #sf_26
Micrometer/Prometheusによる大規模システムモニタリング #jsug #sf_26Micrometer/Prometheusによる大規模システムモニタリング #jsug #sf_26
Micrometer/Prometheusによる大規模システムモニタリング #jsug #sf_26Yahoo!デベロッパーネットワーク
 
劇的改善 CI 4時間から5分へ〜私がやった10のこと〜
劇的改善 CI 4時間から5分へ〜私がやった10のこと〜劇的改善 CI 4時間から5分へ〜私がやった10のこと〜
劇的改善 CI 4時間から5分へ〜私がやった10のこと〜Recruit Lifestyle Co., Ltd.
 

What's hot (20)

Understanding MLOps
Understanding MLOpsUnderstanding MLOps
Understanding MLOps
 
Automated testing with Cypress
Automated testing with CypressAutomated testing with Cypress
Automated testing with Cypress
 
マイクロサービスに至る歴史とこれから - XP祭り2021
マイクロサービスに至る歴史とこれから - XP祭り2021マイクロサービスに至る歴史とこれから - XP祭り2021
マイクロサービスに至る歴史とこれから - XP祭り2021
 
Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例
Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例
Yahoo!ニュースにおけるBFFパフォーマンスチューニング事例
 
Google BigQuery クエリの処理の流れ - #bq_sushi
Google BigQuery クエリの処理の流れ - #bq_sushi Google BigQuery クエリの処理の流れ - #bq_sushi
Google BigQuery クエリの処理の流れ - #bq_sushi
 
Mavenの真実とウソ
Mavenの真実とウソMavenの真実とウソ
Mavenの真実とウソ
 
Productionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingProductionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model Serving
 
At least onceってぶっちゃけ問題の先送りだったよね #kafkajp
At least onceってぶっちゃけ問題の先送りだったよね #kafkajpAt least onceってぶっちゃけ問題の先送りだったよね #kafkajp
At least onceってぶっちゃけ問題の先送りだったよね #kafkajp
 
Akkaで分散システム入門
Akkaで分散システム入門Akkaで分散システム入門
Akkaで分散システム入門
 
LibreOffice API について
LibreOffice API についてLibreOffice API について
LibreOffice API について
 
困らない程度のJDK入門
困らない程度のJDK入門困らない程度のJDK入門
困らない程度のJDK入門
 
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
 
Swaggerで始めるモデルファーストなAPI開発
Swaggerで始めるモデルファーストなAPI開発Swaggerで始めるモデルファーストなAPI開発
Swaggerで始めるモデルファーストなAPI開発
 
新人Git/Github研修公開用スライド(その1)
新人Git/Github研修公開用スライド(その1)新人Git/Github研修公開用スライド(その1)
新人Git/Github研修公開用スライド(その1)
 
REST API のコツ
REST API のコツREST API のコツ
REST API のコツ
 
分析指向データレイク実現の次の一手 ~Delta Lake、なにそれおいしいの?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
分析指向データレイク実現の次の一手 ~Delta Lake、なにそれおいしいの?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)分析指向データレイク実現の次の一手 ~Delta Lake、なにそれおいしいの?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
分析指向データレイク実現の次の一手 ~Delta Lake、なにそれおいしいの?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
 
初心者向けMongoDBのキホン!
初心者向けMongoDBのキホン!初心者向けMongoDBのキホン!
初心者向けMongoDBのキホン!
 
Graylog Engineering - Design Your Architecture
Graylog Engineering - Design Your ArchitectureGraylog Engineering - Design Your Architecture
Graylog Engineering - Design Your Architecture
 
Micrometer/Prometheusによる大規模システムモニタリング #jsug #sf_26
Micrometer/Prometheusによる大規模システムモニタリング #jsug #sf_26Micrometer/Prometheusによる大規模システムモニタリング #jsug #sf_26
Micrometer/Prometheusによる大規模システムモニタリング #jsug #sf_26
 
劇的改善 CI 4時間から5分へ〜私がやった10のこと〜
劇的改善 CI 4時間から5分へ〜私がやった10のこと〜劇的改善 CI 4時間から5分へ〜私がやった10のこと〜
劇的改善 CI 4時間から5分へ〜私がやった10のこと〜
 

Similar to Java Profiling Future

ContextualContinuous Profilng
ContextualContinuous ProfilngContextualContinuous Profilng
ContextualContinuous ProfilngJaroslav Bachorik
 
Spark to Production @Windward
Spark to Production @WindwardSpark to Production @Windward
Spark to Production @WindwardDemi Ben-Ari
 
Towards "write once - run whenever possible" with Safety Critical Java af Ben...
Towards "write once - run whenever possible" with Safety Critical Java af Ben...Towards "write once - run whenever possible" with Safety Critical Java af Ben...
Towards "write once - run whenever possible" with Safety Critical Java af Ben...InfinIT - Innovationsnetværket for it
 
Scala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkScala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkDemi Ben-Ari
 
Microservices with Micronaut
Microservices with MicronautMicroservices with Micronaut
Microservices with MicronautQAware GmbH
 
Java performance monitoring
Java performance monitoringJava performance monitoring
Java performance monitoringSimon Ritter
 
Spark in the Maritime Domain
Spark in the Maritime DomainSpark in the Maritime Domain
Spark in the Maritime DomainDemi Ben-Ari
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with SparkRoger Rafanell Mas
 
A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceTim Ellison
 
Java troubleshooting thread dump
Java troubleshooting thread dumpJava troubleshooting thread dump
Java troubleshooting thread dumpejlp12
 
Spark 101 - First steps to distributed computing
Spark 101 - First steps to distributed computingSpark 101 - First steps to distributed computing
Spark 101 - First steps to distributed computingDemi Ben-Ari
 
Threading Successes 03 Gamebryo
Threading Successes 03   GamebryoThreading Successes 03   Gamebryo
Threading Successes 03 Gamebryoguest40fc7cd
 
Microservices with Micronaut
Microservices with MicronautMicroservices with Micronaut
Microservices with MicronautQAware GmbH
 
Microservices with Micronaut
Microservices with MicronautMicroservices with Micronaut
Microservices with MicronautQAware GmbH
 
Jvm profiling under the hood
Jvm profiling under the hoodJvm profiling under the hood
Jvm profiling under the hoodRichardWarburton
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterTim Ellison
 
Breaking The Clustering Limits @ AlphaCSP JavaEdge 2007
Breaking The Clustering Limits @ AlphaCSP JavaEdge 2007Breaking The Clustering Limits @ AlphaCSP JavaEdge 2007
Breaking The Clustering Limits @ AlphaCSP JavaEdge 2007Baruch Sadogursky
 
Load testing in Zonky with Gatling
Load testing in Zonky with GatlingLoad testing in Zonky with Gatling
Load testing in Zonky with GatlingPetr Vlček
 
High performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbHigh performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbWei Shan Ang
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
 

Similar to Java Profiling Future (20)

ContextualContinuous Profilng
ContextualContinuous ProfilngContextualContinuous Profilng
ContextualContinuous Profilng
 
Spark to Production @Windward
Spark to Production @WindwardSpark to Production @Windward
Spark to Production @Windward
 
Towards "write once - run whenever possible" with Safety Critical Java af Ben...
Towards "write once - run whenever possible" with Safety Critical Java af Ben...Towards "write once - run whenever possible" with Safety Critical Java af Ben...
Towards "write once - run whenever possible" with Safety Critical Java af Ben...
 
Scala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkScala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache spark
 
Microservices with Micronaut
Microservices with MicronautMicroservices with Micronaut
Microservices with Micronaut
 
Java performance monitoring
Java performance monitoringJava performance monitoring
Java performance monitoring
 
Spark in the Maritime Domain
Spark in the Maritime DomainSpark in the Maritime Domain
Spark in the Maritime Domain
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark Performance
 
Java troubleshooting thread dump
Java troubleshooting thread dumpJava troubleshooting thread dump
Java troubleshooting thread dump
 
Spark 101 - First steps to distributed computing
Spark 101 - First steps to distributed computingSpark 101 - First steps to distributed computing
Spark 101 - First steps to distributed computing
 
Threading Successes 03 Gamebryo
Threading Successes 03   GamebryoThreading Successes 03   Gamebryo
Threading Successes 03 Gamebryo
 
Microservices with Micronaut
Microservices with MicronautMicroservices with Micronaut
Microservices with Micronaut
 
Microservices with Micronaut
Microservices with MicronautMicroservices with Micronaut
Microservices with Micronaut
 
Jvm profiling under the hood
Jvm profiling under the hoodJvm profiling under the hood
Jvm profiling under the hood
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark faster
 
Breaking The Clustering Limits @ AlphaCSP JavaEdge 2007
Breaking The Clustering Limits @ AlphaCSP JavaEdge 2007Breaking The Clustering Limits @ AlphaCSP JavaEdge 2007
Breaking The Clustering Limits @ AlphaCSP JavaEdge 2007
 
Load testing in Zonky with Gatling
Load testing in Zonky with GatlingLoad testing in Zonky with Gatling
Load testing in Zonky with Gatling
 
High performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbHigh performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodb
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 

Recently uploaded

chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
software engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxsoftware engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxnada99848
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 

Recently uploaded (20)

chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
software engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxsoftware engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptx
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 

Java Profiling Future

  • 1. Future-Proof JVM Profiling Evolving the platform profiling support Jaroslav Bachorik Staff Software Engineer Datadog
  • 2. ● Capturing the appliccation performance related data ○ … and using it to improve resource usage ● Can be classified into ○ Execution Profiling Where is the CPU time being spent ○ Memory Profiling ■ Allocation profiling Which code is allocating the most and of which type ■ Heap profiling Which objects are retained and who is allocating them ○ Latency Profiling What is causing the application to do ‘nothing’ ■ Wallclock profiling ■ Lock profiling ■ (Synchronous) I/O profiling ■ Syscall profiling Profiling Is …
  • 3. ● Sampling profiling ○ collects random samples ○ creates a statistical representation of the process behaviour ○ light-weight ○ does not provide exact duration information and call-graph ● Tracing profiling ○ traces exact method invocation with stackttrace and timing ○ heavily intruding ■ overhead ■ JIT and memory management interference ○ provides exact call-graph and duration information Sampling vs. Tracing Profiling
  • 4. ● Good enough results ● Acceptable overhead ● In practice ‘profiling’ == ‘sampling profiling’ Sampling Profiling!
  • 5. ● System deployments are complex ○ Cloud, K8s ● Profiling in-isolation is not enough ○ Same-service, multiple-instances ○ Same-service, multiple envs ○ How to correlate and distinguish? ● Enter APM - Application Performance Management ○ Combined tracing/profiling ■ Tracing provides ‘coarse’ information about operations ■ Profiling fills in the gory details about the code execution ○ User in control of what is traced ■ Profiles scoped to traces allow causal analysis ○ Frame level information exposed by profiler ■ Can be used to drive debug session by dynamic instrumentation Profiling In Cloud
  • 7. JVM Profiling Support
    - JMX
      - A complex management and observability framework
      - Since 2003, JSR 160
      - Easily used from Java
    - JVMTI
      - Low-level tooling interface
      - Since 2004, JSR 163
      - Requires a native agent
    - JFR
      - One-stop solution for JVM (and application) observability
      - In OpenJDK since 2018 (JDK 11), backported to JDK 8 in 2020 (update 262/272)
      - No special agent required
    - AsyncGetCallTrace (ASGCT)
      - A ‘special’ way to get a ‘raw’ stack trace
      - Introduced for Sun One Studio in 2004
      - Requires a native agent and custom profiling infrastructure
  • 8. JMX
    - Execution profiling
      - GetAllStackTraces
      - Safe-point biased
      - Overhead grows with the number of sampled threads
      - Obsolete method
      - Ubiquitous
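Here is a rough sketch of JMX-based execution sampling. It takes one sample of all live threads through the standard `ThreadMXBean.dumpAllThreads` call and counts the top frame of each runnable thread. Every call walks every thread, so the cost grows with the thread count. The stacks are captured when the VM stops the threads, which is where the safe-point bias noted above comes from. The class name and the 10 × 10 ms sampling loop are illustrative choices only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

/** One JMX-based execution sample: top frame of every RUNNABLE thread. */
final class JmxSampler {
    static Map<String, Integer> sampleOnce() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> topFrames = new HashMap<>();
        // (false, false): skip locked monitors/synchronizers to keep the dump cheaper
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.RUNNABLE) continue;
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length == 0) continue;
            String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
            topFrames.merge(frame, 1, Integer::sum);
        }
        return topFrames;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> profile = new HashMap<>();
        for (int i = 0; i < 10; i++) { // 10 samples, roughly 10 ms apart
            sampleOnce().forEach((frame, n) -> profile.merge(frame, n, Integer::sum));
            Thread.sleep(10);
        }
        System.out.println(profile);
    }
}
```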
  • 9. JVMTI
    - Focused on tracing profiling
      - very high overhead
    - Execution Sampling
      - GetAllStackTraces or GetStackTrace
      - Frame references via jmethodID have validity issues
        - a jmethodID becomes invalid if its declaring class is unloaded
        - strong references to all classes in a stack trace can't be forced atomically
      - Safe-point biased (as JMX)
    - Allocation sampling since JDK 11
      - JEP 331
      - Not biased towards TLAB size
      - Modern sampler with known statistical properties
      - Samples can be ‘upscaled’ to estimates of real allocation sizes
    - Profiling support is very 2000-ish
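The ‘upscaling’ of JEP 331 samples can be sketched as follows. The sketch assumes that the distance between sampling points is approximately exponentially distributed with a mean interval R, which per JEP 331 defaults to 512 KiB and can be changed with JVMTI `SetHeapSamplingInterval`. Under that assumption, an object of size s is sampled with probability 1 - e^(-s/R), so each sample is weighted by the inverse of that probability. This is the common Poisson-sampling correction, not code from the JDK, and the class name is hypothetical.

```java
/** Upscales allocation samples taken with a randomized mean sampling interval. */
final class AllocationUpscaler {
    /**
     * Probability that an allocation of {@code size} bytes is hit by a sampling point,
     * assuming exponentially distributed gaps (mean {@code meanInterval} bytes).
     */
    static double sampleProbability(long size, long meanInterval) {
        // 1 - e^(-size / meanInterval); expm1 keeps precision when size << meanInterval
        return -Math.expm1(-(double) size / meanInterval);
    }

    /** Estimated bytes allocated at this site, represented by one sample of {@code size} bytes. */
    static double estimatedBytes(long size, long meanInterval) {
        return size / sampleProbability(size, meanInterval);
    }

    /** Estimated number of objects represented by one sample of {@code size} bytes. */
    static double estimatedCount(long size, long meanInterval) {
        return 1.0 / sampleProbability(size, meanInterval);
    }

    public static void main(String[] args) {
        long meanInterval = 512 * 1024; // JEP 331 default mean interval
        // A small sampled object stands for roughly meanInterval bytes of such allocations...
        System.out.printf("16 B sample   -> ~%.0f bytes%n", estimatedBytes(16, meanInterval));
        // ...while objects much larger than the interval are almost always sampled.
        System.out.printf("64 MiB sample -> ~%.0f bytes%n", estimatedBytes(64L << 20, meanInterval));
    }
}
```

A uniform sampler with known statistics makes this correction possible. A TLAB-size-biased sampler has no such simple closed form.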
  • 10. JFR - JDK Flight Recorder
    - Low overhead observability framework
    - Supports all profiling modes
    - Execution Sampling
      - low overhead
      - avoids safe-point bias
      - not a ‘true’ CPU profiler
        - sampler driven by a wallclock interval
        - failed samples are not reported
        - separate sampler thread
          - possibility of starvation
        - non-trivial upscaling to CPU time per sample
    - Allocation Sampling
      - low overhead
      - biased on TLAB size
      - non-uniform sampling = no easy upscaling
    - Heap Profiling
      - biased on TLAB size
      - can collect reference chains (lightweight heap dump)
    - Lock Profiling
      - thresholded on a minimum blocking duration to report
        - prevents swamping the recording with lock events
        - makes the profiler blind to latency induced by very many short lock events
    - Syscall Profiling
      - kind of: wallclock profile for threads executing JNI native code
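For comparison with the lower-level APIs, this sketch drives JFR from Java through the `jdk.jfr` API (JDK 11 or later). It enables the built-in `jdk.ExecutionSample` event with a sampling period and `jdk.JavaMonitorEnter` with a threshold. That threshold is what makes JFR blind to many short lock contentions. The 10 ms and 20 ms values, the busy loop, and the class name are arbitrary choices for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;

final class JfrSamplingDemo {
    /** Records the workload with execution sampling and thresholded lock events. */
    static Path record(Runnable workload) throws Exception {
        try (Recording recording = new Recording()) {
            // Execution sampler: driven by a wallclock period on JFR's sampler thread
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10));
            // Lock profiling: only contentions longer than the threshold are recorded
            recording.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(20));
            recording.start();
            workload.run();
            recording.stop();
            Path out = Files.createTempFile("profile", ".jfr");
            recording.dump(out);
            return out;
        }
    }

    /** Counts recorded events of one type, e.g. "jdk.ExecutionSample". */
    static long countEvents(Path jfrFile, String eventName) throws Exception {
        return RecordingFile.readAllEvents(jfrFile).stream()
                .filter(e -> e.getEventType().getName().equals(eventName))
                .count();
    }

    public static void main(String[] args) throws Exception {
        Path file = record(() -> {
            long end = System.nanoTime() + Duration.ofMillis(500).toNanos();
            double x = 0;
            while (System.nanoTime() < end) x += Math.sqrt(x + 1); // keep one CPU busy
            System.out.println("work done: " + (x > 0));
        });
        System.out.println("execution samples: " + countEvents(file, "jdk.ExecutionSample"));
    }
}
```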
  • 11. AsyncGetCallTrace (ASGCT)
    - ‘Unofficial’ API to get non-safe-point-biased stack traces
    - Added for Sun One Studio many years ago
    - Not really maintained
    - Lurking bugs can crash your JVM
      - can be called from a signal handler -> the stack may be inconsistent
      - some have been fixed
      - innocent-looking methods mutating global state
      - asserts and guards for invariants that do not hold when using ASGCT
    - Still, the foundation of almost all 3rd-party Java profilers
  • 12. Can I Just Use JFR?
    - TL;DR: almost
    - There are still some parts missing, provided by 3rd-party profilers
      - ‘Proper’ CPU profiler
        - driven by CPU time rather than wallclock time
      - Non-biased allocation profiler
        - the JFR allocation sampler is biased on TLAB size
      - Non-biased heap profiler
        - JFR trades off the non-biased nature for reference chains
      - Profiling context
        - required for tracer-profiler integration (think OTel)
        - labelling events by context
        - guarding events by context
          - e.g. use the presence of context instead of a threshold
    - JFR is currently very closed to enhancements
      - adding the support required for contemporary profiling needs is excruciatingly slow
  • 13. Really, Can I Just Use JFR?
    ● Yes! If the following features are implemented
      ○ [Proper] CPU profiler
      ○ Profiling context
    ● Having an API to request event emission from native code would be great!
      ○ Custom sampling policies
      ○ Integration with perf events (woohoo!)
      ○ eBPF, anyone?
      ○ Prototyping concepts in 3rd-party code before adding them to the JFR core
  • 14. Improved JFR CPU Profiler
    ● Use a CPU-time-based sampler driver (perf_event_open, timer_create)
      ○ Subject to availability
        ■ Prefer perf_event_open, if available
        ■ Fall back to timer_create, if available
        ■ If not on Linux, or if neither perf_event_open nor timer_create is available, fall back to the dedicated sampler thread
      ○ Alternatively, provide a way to request an ExecutionSample event from a native signal handler
    ● Make stack-trace acquisition safe to run in a signal handler
      ○ JEP 435: Asynchronous Stack Trace VM API
        ■ Samples recorded at the exact PC
        ■ But the stack is walked only at the method-exit safe-point (credits to Erik Oesterlund for this idea!)
        ■ Johannes Bechberger is making great progress
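The fallback order proposed above can be written as a small decision function. This sketch covers only the selection logic. The enum, the method name, and the boolean probes are hypothetical. Real availability checks would happen in native code, for example whether `perf_event_open` is permitted by the `perf_event_paranoid` setting or a container's seccomp profile.

```java
import java.util.Locale;

/** Chooses the driver for a CPU-time execution sampler, following the slide's fallback order. */
final class SamplerDriverSelector {
    enum Driver { PERF_EVENT_OPEN, TIMER_CREATE, SAMPLER_THREAD }

    static Driver select(String osName, boolean perfEventOpenUsable, boolean timerCreateUsable) {
        boolean linux = osName.toLowerCase(Locale.ROOT).startsWith("linux");
        if (linux && perfEventOpenUsable) {
            return Driver.PERF_EVENT_OPEN; // per-thread CPU-clock events delivered as signals
        }
        if (linux && timerCreateUsable) {
            return Driver.TIMER_CREATE;    // POSIX timers on a thread CPU-time clock
        }
        return Driver.SAMPLER_THREAD;      // the existing wallclock-driven sampler thread
    }

    public static void main(String[] args) {
        // The probe results are hard-coded here; a real profiler would detect them at startup.
        System.out.println(select(System.getProperty("os.name"), false, true));
    }
}
```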
  • 15. JFR Context
    ● What is context?
      ○ Trace ID
      ○ REST endpoint
      ○ Request URL
      ○ … and anything else that allows scoping JFR events
    ● Start simple
      ○ Context is attached to a (virtual) thread
      ○ Context value is a plain string
      ○ Finite, small number of context values
      ○ Context values are represented as augmented event fields
      ○ No automatic context propagation
        ■ It is up to the API user to make sure continuations carry the right context
    ● There is prior art, e.g. in Go
      ○ Profiler labels
      ○ A first-class runtime citizen
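The ‘start simple’ model (context attached to a thread, plain string values, a small fixed number of attributes, no automatic propagation) might look like this from the API user's side. `ProfilingContext`, its methods, and the limit of four attributes are hypothetical and are not an existing JDK API. The `wrap` helper shows the manual propagation the slide leaves to the user: the task captures the submitter's context and restores it on whichever thread runs it.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical per-thread profiling context: a few string attributes in fixed slots. */
final class ProfilingContext {
    static final int MAX_ATTRIBUTES = 4; // "finite small number" (illustrative value)
    private static final ThreadLocal<String[]> CURRENT =
            ThreadLocal.withInitial(() -> new String[MAX_ATTRIBUTES]);

    static void set(int slot, String value) { CURRENT.get()[slot] = value; }
    static String get(int slot) { return CURRENT.get()[slot]; }
    static void clear() { Arrays.fill(CURRENT.get(), null); }

    /** Copy of the current thread's context, e.g. to label an event with it. */
    static String[] snapshot() { return CURRENT.get().clone(); }

    /** No automatic propagation: the task must carry the submitter's context explicitly. */
    static Runnable wrap(Runnable task) {
        String[] captured = snapshot();
        return () -> {
            String[] previous = snapshot();
            System.arraycopy(captured, 0, CURRENT.get(), 0, MAX_ATTRIBUTES);
            try {
                task.run();
            } finally {
                System.arraycopy(previous, 0, CURRENT.get(), 0, MAX_ATTRIBUTES);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        set(0, "trace-1234"); // e.g. slot 0 holds the trace ID
        pool.submit(() -> System.out.println("plain:   " + get(0))).get();       // plain:   null
        pool.submit(wrap(() -> System.out.println("wrapped: " + get(0)))).get(); // wrapped: trace-1234
        pool.shutdown();
    }
}
```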
  • 16. JFR Context Alternatives
    ● We (Datadog) tried to work around the lack of context with
      ○ Special ‘Context’ events
        ■ An event spans the time between context set and context unset
        ■ Huge number of such events for reactive/async apps
        ■ Gets very complex when tracking more than one context attribute
        ■ Thresholding does not help, as it leads to unpredictable context loss
        ■ Difficult to correlate with the rest of the data, which may be sampled
      ○ Special ‘Context Change’ events
        ■ Each event represents a state transition
        ■ Easier to correlate with potentially sampled data
        ■ The number of events turned out to be unbearable (millions per minute)
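The first workaround, a ‘Context’ event spanning the time between setting and unsetting the context, maps naturally onto a custom JFR event with `begin()`, `end()` and `commit()`. The event name `example.Context`, its fields, and the helper class are hypothetical. The sketch shows why the approach gets expensive: every context switch in a reactive or async application produces one such event, and each extra attribute means another field.

```java
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

@Name("example.Context")
@Label("Context")
@Category("Profiling")
class ContextEvent extends Event {
    @Label("Trace ID")
    String traceId;

    @Label("Endpoint")
    String endpoint;
}

final class ContextSpans {
    /** Runs {@code work} inside a context span recorded as one JFR event. */
    static void withContext(String traceId, String endpoint, Runnable work) {
        ContextEvent event = new ContextEvent();
        event.traceId = traceId;
        event.endpoint = endpoint;
        event.begin();      // context set
        try {
            work.run();
        } finally {
            event.end();    // context unset
            event.commit(); // written only if the event is enabled and passes its threshold
        }
    }

    public static void main(String[] args) {
        // Run with e.g. -XX:StartFlightRecording to capture the events.
        for (int i = 0; i < 3; i++) {
            withContext("trace-" + i, "/orders", () -> Math.sqrt(42));
        }
    }
}
```

Correlating such spans with sampled events means matching time intervals per thread after the fact. That matching is the difficulty the slide describes.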
  • 17. External Context Implementation
    ● Implement the context in a separate profiler
    ● The Datadog profiler has such an implementation
    ● It comes at the cost of
      ○ Replicating the JFR writer implementation
      ○ Replicating several JFR-provided events
      ○ Missing context for low-level events like most of the thread-halting events (ThreadPark, MonitorWait, etc.)
      ○ Relying on ASGCT, which may crash the profiled app
    ● Still, our customers love the feature for the increased clarity of the profiling data
  • 19. Datadog Profiling Context
    - Context propagation
      - Implemented in the Java tracer
      - Context associated with a unit of work
        - independent of the executing thread
    - Context persistence
      - Implemented in the profiler agent
      - Store context in JFR events
      - Easy and fast Java<->native interop is mandatory
        - No JNI calls, please!
        - Shared memory buffer
        - Relying on the Java and native sides being tightly coupled
      - Tags are plain strings
        - Dictionarized
        - No custom types
    - Semi-custom context
      - Capped at ten custom tags
      - Custom tag types/names
        - must be defined before the profiler is started
        - stored in the JFR recording
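The persistence rules above (plain string tags, dictionary-encoded, at most ten tag names fixed before the profiler starts) can be sketched as a small registry. All names are hypothetical, and this is not the Datadog implementation. The sketch only shows why dictionarizing pays off. A thread's context becomes a handful of small integers that can be written into shared memory without JNI calls, while the string table is stored once in the recording.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical context-tag registry: fixed tag names, dictionary-encoded values. */
final class ContextTags {
    static final int MAX_TAGS = 10;

    private final List<String> tagNames; // frozen before the profiler starts
    private final Map<String, Integer> dictionary = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(1); // 0 is reserved for "unset"

    ContextTags(List<String> tagNames) {
        if (tagNames.size() > MAX_TAGS) {
            throw new IllegalArgumentException("at most " + MAX_TAGS + " custom tags");
        }
        this.tagNames = List.copyOf(tagNames);
    }

    /** Slot index of a declared tag; undeclared names are rejected, not added. */
    int slotOf(String tagName) {
        int slot = tagNames.indexOf(tagName);
        if (slot < 0) throw new IllegalArgumentException("undeclared tag: " + tagName);
        return slot;
    }

    /** Dictionarizes a tag value: the same string always maps to the same id. */
    int encode(String value) {
        return dictionary.computeIfAbsent(value, v -> nextId.getAndIncrement());
    }

    /** The string table that would be written once into the recording. */
    Map<String, Integer> constantPool() {
        return Map.copyOf(dictionary);
    }

    public static void main(String[] args) {
        ContextTags tags = new ContextTags(List.of("endpoint", "customer"));
        int[] slots = new int[MAX_TAGS]; // one thread's context, as plain integers
        slots[tags.slotOf("endpoint")] = tags.encode("GET /orders");
        slots[tags.slotOf("customer")] = tags.encode("acme");
        System.out.println(Arrays.toString(slots)); // [1, 2, 0, 0, 0, 0, 0, 0, 0, 0]
    }
}
```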
  • 20. Shared Memory Context
    - One context per thread
      - Sparse thread-page map
      - Static size
    - Efficient memory layout
      - 64 bytes, to match the common x64 cache-line size
    - Checksum
      - Used to detect tearing and partial writes
      - 64 bits/8 bytes
    - Context content
      - Provides 10 slots (currently)
      - Each slot is 4 bytes
      - Possibly up to 14 slots (56 bytes)
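The per-thread record can be sketched with a direct `ByteBuffer`: ten 4-byte slots plus an 8-byte checksum in one 64-byte, cache-line-sized block. Several details are assumptions for illustration: the offsets (slots first, checksum in the last 8 bytes so the slot area could grow to 14 slots), the checksum function, and the "zero the checksum while writing" protocol. A real Java/native shared-memory implementation would also need explicit memory ordering (for example release/acquire), which this single-threaded sketch omits.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical 64-byte per-thread context record: 10 x 4-byte slots + 64-bit checksum. */
final class ContextRecord {
    static final int RECORD_SIZE = 64;     // common x86-64 cache-line size
    static final int SLOTS = 10;           // 10 x 4 bytes = 40 bytes of context data
    static final int CHECKSUM_OFFSET = 56; // assumption: checksum in the last 8 bytes

    private final ByteBuffer shared;       // e.g. memory also visible to native code
    private final int base;                // byte offset of this record in the buffer

    ContextRecord(ByteBuffer shared, int index) {
        this.shared = shared;
        this.base = index * RECORD_SIZE;
    }

    /** Illustrative FNV-style mix over the 32-bit slots; never returns 0. */
    static long checksum(int[] slots) {
        long h = 0xcbf29ce484222325L;      // FNV-1a 64-bit offset basis
        for (int v : slots) {
            h ^= v & 0xFFFFFFFFL;
            h *= 0x100000001b3L;           // FNV 64-bit prime, applied per slot (not per byte)
        }
        return h == 0 ? 1 : h;             // 0 is reserved for "empty or being written"
    }

    /** Writer (Java side): invalidate, store the slots, then publish the checksum. */
    void write(int[] slots) {
        shared.putLong(base + CHECKSUM_OFFSET, 0L);
        for (int i = 0; i < SLOTS; i++) {
            shared.putInt(base + 4 * i, slots[i]);
        }
        shared.putLong(base + CHECKSUM_OFFSET, checksum(slots));
    }

    /** Reader (e.g. a sampler): returns null if the record is empty or torn. */
    int[] read() {
        long stored = shared.getLong(base + CHECKSUM_OFFSET);
        int[] slots = new int[SLOTS];
        for (int i = 0; i < SLOTS; i++) {
            slots[i] = shared.getInt(base + 4 * i);
        }
        return stored == checksum(slots) ? slots : null;
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.allocateDirect(RECORD_SIZE * 4).order(ByteOrder.nativeOrder());
        ContextRecord rec = new ContextRecord(shared, 2);
        System.out.println(rec.read());                  // null: nothing written yet
        rec.write(new int[] {1, 2, 0, 0, 0, 0, 0, 0, 0, 0});
        System.out.println(Arrays.toString(rec.read())); // [1, 2, 0, 0, 0, 0, 0, 0, 0, 0]
        shared.putInt(2 * RECORD_SIZE + 4, 99);          // simulate a torn write of slot 1
        System.out.println(rec.read());                  // null: checksum mismatch
    }
}
```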
  • 21. Shared Memory Context (diagram)
    Thread page map: Thread 1, Thread 2, …, Thread N, each pointing to one 64-byte record (e.g. a cache line):
    [ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | chksum (64b) ]
      context data: 10 slots, 40 bytes
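The ‘sparse thread-page map’ in the diagram can be read as a two-level table. The thread ID selects a lazily allocated page, and the page holds one 64-byte record per thread, while the top-level table has a static size. This sketch assumes small, dense integer thread IDs (for example an index assigned by the profiler) and 1024 records per page. Both of these choices, like the class name, are illustrative and not details from the slides.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical sparse map: thread id -> 64-byte context record, pages allocated on demand. */
final class ThreadPageMap {
    static final int RECORD_SIZE = 64;
    static final int RECORDS_PER_PAGE = 1024; // 64 KiB per page

    private final AtomicReferenceArray<ByteBuffer> pages; // static size, entries start null

    ThreadPageMap(int maxThreads) {
        int pageCount = (maxThreads + RECORDS_PER_PAGE - 1) / RECORDS_PER_PAGE;
        pages = new AtomicReferenceArray<>(pageCount);
    }

    /** Returns a 64-byte view of the record for {@code tid}, allocating its page on first use. */
    ByteBuffer recordFor(int tid) {
        int pageIndex = tid / RECORDS_PER_PAGE;
        ByteBuffer page = pages.get(pageIndex);
        if (page == null) {
            ByteBuffer fresh = ByteBuffer.allocateDirect(RECORD_SIZE * RECORDS_PER_PAGE)
                    .order(ByteOrder.nativeOrder());
            // Another thread may have published a page first; keep whichever one won.
            page = pages.compareAndSet(pageIndex, null, fresh) ? fresh : pages.get(pageIndex);
        }
        int offset = (tid % RECORDS_PER_PAGE) * RECORD_SIZE;
        return page.duplicate().position(offset).limit(offset + RECORD_SIZE)
                .slice().order(ByteOrder.nativeOrder());
    }

    /** Number of pages actually allocated so far. */
    int allocatedPages() {
        int n = 0;
        for (int i = 0; i < pages.length(); i++) {
            if (pages.get(i) != null) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ThreadPageMap map = new ThreadPageMap(1 << 20);
        map.recordFor(3).putInt(0, 42); // first slot of thread 3's record
        map.recordFor(700_000);         // a distant thread id touches a second page
        System.out.println(map.recordFor(3).getInt(0) + " " + map.allocatedPages()); // 42 2
    }
}
```

Only the pages that live threads touch cost memory, and the fixed-size top-level array keeps lookup to one division and one array read.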