
JVM Profiling Under the Hood

Profilers find performance bottlenecks in your app but provide confusing information. Let's give you insights into how your profiler and your app are really interacting. What profiling APIs are available, how they work, and what their implementation on the JVM (OpenJDK) side looks like:

Stack sampling profilers: stop motion view of your app
GetCallTrace (JVisualVM case study): The official stack sampling API
Safepoints and safepoint sampling bias
AsyncGetCallTrace (Honest Profiler case study): The unofficial API
JVM Profilers vs System Profilers: No API needed?

Published in: Technology

  1. 1. JVM Profiling Under da Hood Richard Warburton - @RichardWarburto Nitsan Wakart - @nitsanw
  2. 2. Why Profile? Lies, Damn Lies and Statistical Profiling Under the Hood Conclusion
  3. 3. Measure data from your application
  4. 4. Exploratory Profiling
  5. 5. Execution Profiling = Where in code is my application spending time?
  6. 6. CPU Profiling Limitations ● Finds CPU bound bottlenecks ● Many problems not CPU Bound ○ Networking ○ Database or External Service ○ I/O ○ Garbage Collection ○ Insufficient Parallelism ○ Blocking & Queuing Effects
  7. 7. Why Profile? Lies, Damn Lies and Statistical Profiling Under the Hood Conclusion
  8. 8. Different Execution Profilers ● Instrumenting ○ Adds timing code to application ● Sampling ○ Collects thread dumps periodically
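The "adds timing code" approach on the slide above can be sketched by hand. This is a minimal illustration only, assuming a hypothetical `TimedCall` helper; real instrumenting profilers usually inject equivalent code automatically rather than asking you to wrap calls yourself.

```java
import java.util.function.Supplier;

// Hypothetical helper: wraps a call with timing code, the way an
// instrumenting profiler conceptually does around each method.
final class TimedCall {
    private long totalNanos;
    private long calls;

    // Run the body, adding its wall-clock duration to the running total.
    <T> T record(Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            totalNanos += System.nanoTime() - start;
            calls++;
        }
    }

    long calls() { return calls; }

    long totalNanos() { return totalNanos; }
}
```

The timing code itself costs time on every call, which is why instrumenting short, hot methods can distort the very numbers being measured.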
  9. 9. Sampling Profilers WebServerThread.run() Controller.doSomething() Controller.next() Repo.readPerson() new Person() View.printHtml()
  10. 10. Periodicity Bias ● Bias from sampling at a fixed interval ● Periodic operations with the same frequency as the samples ● Timed operations
  11. 11. Periodicity Bias a() ??? a() ??? a() ??? a() ???
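Periodicity bias can be reproduced with a small deterministic simulation. The sketch below does no real profiling; the timeline (a() runs for 1 ms out of every 10 ms, b() for the rest) and all names are assumptions made for illustration.

```java
// Deterministic simulation of periodicity bias; nothing is actually profiled.
// Assumed timeline: in every 10 ms period the app runs a() for the first
// 1 ms and b() for the remaining 9 ms.
final class PeriodicityBias {
    static final int PERIOD_MS = 10;
    static final int A_MS = 1;

    // Which method is on-CPU at time t (in milliseconds)?
    static String methodAt(long tMillis) {
        return (tMillis % PERIOD_MS) < A_MS ? "a" : "b";
    }

    // Fraction of samples attributed to a() when sampling every
    // intervalMs milliseconds, starting at offsetMs.
    static double fractionInA(long intervalMs, long offsetMs, int samples) {
        int hits = 0;
        for (int i = 0; i < samples; i++) {
            if (methodAt(offsetMs + i * intervalMs).equals("a")) {
                hits++;
            }
        }
        return (double) hits / samples;
    }

    public static void main(String[] args) {
        System.out.println("interval 10, offset 0: " + fractionInA(10, 0, 1000));
        System.out.println("interval 10, offset 5: " + fractionInA(10, 5, 1000));
        System.out.println("interval 7,  offset 0: " + fractionInA(7, 0, 1000));
    }
}
```

In this model a() really takes 10% of the time. A sampler whose interval matches the period reports either 100% or 0% depending on where it happens to start, while an interval that does not line up with the period recovers the true share.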
  12. 12. Stack Trace Sampling ● JVMTI interface: GetCallTrace ○ Trigger a global safepoint (not on Zing) ○ Collect stack trace ● Large impact on application ● Samples only at safepoints
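A stack sampling profiler of this kind can be approximated from inside the JVM with the standard `Thread.getAllStackTraces()` call. The sketch below is not how JVisualVM is implemented; the class and method names are hypothetical, and the assumption that this call is served at a global safepoint applies to HotSpot as far as I understand it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal sampler: tallies the top frame of every live thread.
final class StackSampler {
    // Take one sample of all threads and count each thread's top frame.
    static void sampleOnce(Map<String, Integer> counts) {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = e.getValue();
            if (stack.length == 0) {
                continue; // no Java frames reported for this thread
            }
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            counts.merge(top, 1, Integer::sum);
        }
    }

    // Sample `samples` times, sleeping `intervalMillis` between samples.
    static Map<String, Integer> profile(int samples, long intervalMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            sampleOnce(counts);
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                break;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        profile(20, 10).forEach((frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```

Because every sample dumps all threads, the cost grows with the thread count, which is one way to read the "large impact on application" bullet.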
  13. 13. Example private static void outer() { for (int i = 0; i < OUTER; i++) { hotMethod(i); } } // https://github.com/RichardWarburton/profiling-samples
  14. 14. Example (2) private static void hotMethod(final int i) { for (int k = 0; k < N; k++) { final int[] array = SafePointBias.array; final int index = i % SIZE; for (int j = index; j < SIZE; j++) { array[index] += array[j]; } } }
  15. 15. -XX:+PrintSafepointStatistics ThreadDump 48 Maximum sync time 985 ms
  16. 16. What's a safepoint? ● Java threads poll a global flag ○ At the back edge of ‘uncounted’ loops ○ At method entry/exit ● A safepoint poll can be delayed by: ○ Large methods ○ Long running ‘counted’ loops ○ BONUS: Page faults/thread suspension
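The counted/uncounted distinction can be illustrated with two loops that compute the same sum. This is a sketch under assumptions: on HotSpot versions from around the time of this talk, a loop with an int induction variable and a loop-invariant bound was typically compiled as a 'counted' loop without a safepoint poll, while a long induction variable typically kept the poll at the back edge. Newer JVMs and different flags may behave differently, and the class name is hypothetical.

```java
// Two loops with identical results; what differs is how the JIT may treat them.
final class LoopKinds {
    // int index, loop-invariant bound: typically a 'counted' loop, so the
    // compiled code may contain no safepoint poll inside the loop.
    static long sumCounted(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // long index: typically treated as 'uncounted' on older HotSpot versions,
    // so a safepoint poll stays at the loop back edge.
    static long sumUncounted(int[] data) {
        long sum = 0;
        for (long i = 0; i < data.length; i++) {
            sum += data[(int) i];
        }
        return sum;
    }
}
```

A thread stuck in a long-running loop of the first kind cannot reach a safepoint until the loop ends, which delays every other thread waiting for the global safepoint.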
  17. 17. Safepoint Bias WebServerThread.run() Controller.doSomething() Controller.next() Repo.readPerson() new Person() View.printHtml() ???
  18. 18. Let sleeping dogs lie? ● ‘GetCallTrace’ profilers will sample ALL threads ● Even sleeping threads...
  19. 19. This Application Mostly Sleeps JVisualVM snapshot
  20. 20. No CPU? No profile! JMC profile
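To see how many of the sampled threads are actually doing work, the thread states can be tallied alongside the stack dump. This is an illustrative sketch with a hypothetical class name; note that `RUNNABLE` in Java also covers threads blocked in native code such as socket reads, so it only approximates "on CPU".

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: counts live threads by java.lang.Thread.State.
final class ThreadStateTally {
    static Map<Thread.State, Integer> tally() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        tally().forEach((state, n) -> System.out.println(state + "\t" + n));
    }
}
```

A sampler that ignores states will attribute time to frames such as `Thread.sleep` or `Object.wait`, which is how a mostly idle application can appear busy in a snapshot.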
  21. 21. Why Profile? Lies, Damn Lies and Statistical Profiling Under the Hood Conclusion
  22. 22. Honest Profiler https://github.com/richardwarburton/honest-profiler
  23. 23. AsyncGetCallTrace ● Used by Oracle Solaris Studio ● Adapted into an open source prototype by Google’s Jeremy Manson ● Unsupported, Undocumented … Underestimated
  24. 24. SIGPROF - Interrupt Handlers ● OS-managed, timer-based interrupt ● Interrupts the thread and directly calls an event handler ● Used by the profilers we’ll be talking about
  25. 25. Design Log File Processor Thread Graphical UI Console UI Signal Handler Signal Handler OS Timer Thread
  26. 26. “You are in a maze of twisty little stack frames, all alike”
  27. 27. AsyncGetCallTrace under the hood ● A Java thread is ‘possessed’ ● You have the PC/FP/SP ● What is the call trace? ○ jmethodId - Java Method Identifier ○ bci - Byte Code Index -> used to find line number
  28. 28. Where Am I? ● Given a PC what is the current method? ● Is this a Java method? ○ Each method ‘lives’ in a range of addresses ● If not, what do we do?
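Since each compiled method occupies a range of addresses, mapping a PC to a method is a range lookup. The sketch below shows the idea with a sorted map; it is not the JVM's code cache implementation, and the class name, addresses and method names are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index from code address ranges to method names.
final class CodeRangeIndex {
    private static final class Range {
        final long endExclusive;
        final String method;

        Range(long endExclusive, String method) {
            this.endExclusive = endExclusive;
            this.method = method;
        }
    }

    // Keyed by the start address of each range.
    private final TreeMap<Long, Range> byStart = new TreeMap<>();

    void register(long start, long endExclusive, String method) {
        byStart.put(start, new Range(endExclusive, method));
    }

    // Returns the method whose range contains pc, or null if pc falls
    // outside every known range (for example, in native code).
    String methodAt(long pc) {
        Map.Entry<Long, Range> e = byStart.floorEntry(pc);
        if (e == null || pc >= e.getValue().endExclusive) {
            return null;
        }
        return e.getValue().method;
    }
}
```

The `null` case corresponds to the slide's last question: when the PC is not inside any Java method, the profiler needs another strategy to find its way back to Java frames.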
  29. 29. Java Method? Which line? ● Given a PC, what is the current line? ○ Not all instructions map directly to a source line ● Given super-scalar CPUs what does PC mean? ● What are the limits of PC accuracy?
  30. 30. “> I think Andi mentioned this to me last year -- > that instruction profiling was no longer reliable. It never was.” http://permalink.gmane.org/gmane.linux.kernel.perf.user/1948 Exchange between Brendan Gregg and Andi Kleen
  31. 31. Skid ● The PC indicated will be >= the PC at sample time ● Known limitation of instruction profiling ● Leads to harder ‘blame analysis’
  32. 32. Limits of line number accuracy ● The line number is derived from the closest attributable BCI to the PC (-XX:+DebugNonSafepoints) ● The PC itself is within some skid distance of the actual sampled instruction
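The last step, from BCI to source line, follows the shape of the class file's LineNumberTable: each entry gives the bytecode index at which a source line starts, and a BCI belongs to the entry with the greatest start index not exceeding it. The sketch below models that lookup only; the class name and table contents are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a per-method line number table.
final class LineTable {
    // Maps the starting bytecode index of a line to its source line number.
    private final TreeMap<Integer, Integer> startBciToLine = new TreeMap<>();

    void add(int startBci, int line) {
        startBciToLine.put(startBci, line);
    }

    // Returns the source line for bci, or -1 if no entry covers it.
    int lineFor(int bci) {
        Map.Entry<Integer, Integer> e = startBciToLine.floorEntry(bci);
        return e == null ? -1 : e.getValue();
    }
}
```

Any inaccuracy in the BCI, whether from skid or from missing debug info, therefore shows up as a shift to a neighbouring line rather than as an obvious error.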
  33. 33. The Stack ● Divided into frames ○ frame { sender*, stack*, pc } ● A singly linked list: root(null, s0, pc1) <- call1(root, s1, pc2) <- call2(call1, s2, pc3) ● Convert to: (jmethodId, lineno)
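Walking the stack then amounts to following sender links from the top frame towards the root. The sketch below models the frame triple from the slide in plain Java; the `Frame` and `StackWalk` names and the depth limit are assumptions for illustration, not the JVM's internal types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a frame: a link to its caller plus SP and PC.
final class Frame {
    final Frame sender; // caller's frame, null for the root
    final long sp;
    final long pc;

    Frame(Frame sender, long sp, long pc) {
        this.sender = sender;
        this.sp = sp;
        this.pc = pc;
    }
}

final class StackWalk {
    // Collect PCs from the top frame down to the root, stopping at maxDepth
    // so that a corrupted or cyclic chain cannot loop forever.
    static List<Long> pcsFromTop(Frame top, int maxDepth) {
        List<Long> pcs = new ArrayList<>();
        for (Frame f = top; f != null && pcs.size() < maxDepth; f = f.sender) {
            pcs.add(f.pc);
        }
        return pcs;
    }
}
```

Each collected PC would then go through the method and line lookups to become a (jmethodId, lineno) pair.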
  34. 34. A typical stack ● JVM Thread runner infra: ○ JavaThread::run to JavaCalls::call_helper ● Interleaved Java frames: ○ Interpreted ○ Compiled ○ Java to Native and back ● Top frame may be Java or Native
  35. 35. Native frames ● Ignored, but need to navigate through ● Use a dedicated FP register to find sender ● But only if compiled to do so… ● Use a last remembered Java frame instead See: http://duartes.org/gustavo/blog/post/journey-to-the-stack/
  36. 36. Java Compiled Frames ● C1/C2 produce native code ● No FP register: use set frame size ● Challenge: methods can move (GC) ● Challenge: methods can get recompiled
  37. 37. Java Interpreter frames ● Separately managed by the runtime ● Make an effort to look like normal frames ● Challenge: may be interrupted half-way through construction...
  38. 38. Virtual Frames ● C1/C2 inline code (intrinsics/other methods) ● No data on stack ● Must use JVM debug info
  39. 39. AsyncGetCallTrace Limitations ● Only profiles running threads ● Accuracy of line info limited by reality ● Only reports Java frames/threads ● Must lookup debug info during call
  40. 40. Compilers: Friend or Fiend? void safe_reset(void *start, size_t size) { char *base = reinterpret_cast<char *>(start); char *end = base + size; for (char *p = base; p < end; p++) { *p = 0; } }
  41. 41. Compilers: Friend or Fiend? safe_reset(void*, unsigned long): lea rdx, [rdi+rsi] cmp rdi, rdx jae .L3 sub rdx, rdi xor esi, esi jmp memset .L3: rep ret
  42. 42. Concurrency Bug ● Even simple concurrency bugs are hard to spot ● Unspotted race condition in the ring buffer ● Spotted thanks to open source & Rajiv Signal
  43. 43. Writer Reader
  44. 44. Writer Reader
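The writer/reader hand-off in a ring buffer can be sketched as a single-producer, single-consumer queue. This is not the Honest Profiler implementation and not the code that contained the race; it is a minimal Java illustration, with hypothetical names, of the ordering that has to hold: the writer fills a slot before publishing the new tail, and the reader consumes a slot before publishing the new head.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-producer, single-consumer ring buffer of long values.
final class SpscRing {
    private final long[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next index to read
    private final AtomicLong tail = new AtomicLong(); // next index to write

    SpscRing(int capacityPowerOfTwo) {
        if (Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Writer side: returns false when the buffer is full.
    boolean offer(long value) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = value; // write the slot first...
        tail.set(t + 1);                 // ...then publish it to the reader
        return true;
    }

    // Reader side: returns null when the buffer is empty.
    Long poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        long value = slots[(int) (h & mask)]; // read the slot first...
        head.set(h + 1);                      // ...then release it to the writer
        return value;
    }
}
```

Swapping either pair of steps lets the other side observe a slot that is not ready, which is the kind of bug that passes casual testing and only shows up under load.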
  45. 45. Extra Credit!
  46. 46. Native Profiling Tools ● Profile native methods ● Profile at the instruction level ● Profile hardware counters
  47. 47. Perf ● A Linux profiling tool ● Can be made to work with Java ● JMH integration ● Ongoing integration efforts
  48. 48. Solaris Studio ● Works on Linux! ● Secret Weapon! ● Give it a go!
  49. 49. ZVision ● Works for Zing ● No HWC support ● Very informative
  50. 50. Why Profile? Lies, Damn Lies and Statistical Profiling Under the Hood Conclusion
  51. 51. What did we cover? ● Biases in Profilers ● More accurate sampling ● Alternative Profiling Approaches
  52. 52. Don’t just blindly trust your tooling.
  53. 53. Test your measuring instruments
  54. 54. Open Source enables implementation review
  55. 55. Q & A @nitsanw psy-lob-saw.blogspot.co.uk @richardwarburto insightfullogic.com java8training.com www.pluralsight.com/author/richard-warburton
  56. 56. Slides after here just for reference, don’t delete or show
