Øredev 2011 - JVM JIT for Dummies (What the JVM Does With Your Bytecode When You're Not Looking)
 

A talk on how the JVM's JIT works, how to monitor it, and how to interpret the results for great justice.




Presentation Transcript

  • What the JVM Does With Your Bytecode When Nobody’s Looking
  • JVM JIT for Dummies And the rest of you, too.
  • Intro• Charles Oliver Nutter • “JRuby Guy” • Sun Microsystems 2006-2009 • Engine Yard 2009-• Primarily responsible for compiler, perf • Lots of bytecode generation
  • What We Won’t• GC tuning• GC monitoring with VisualVM • Google ‘visualgc’, it’s awesome
  • What We Will Learn• How the JVM’s JIT works• Monitoring the JIT• Finding problems• Dumping assembly (don’t be scared!)
  • JIT• Just-In-Time compilation• Compiled when needed • Maybe immediately before execution • ...or when we decide it’s important • ...or never?
  • Mixed-Mode• Interpreted • Bytecode-walking • Artificial stack• Compiled • Direct native operations • Native registers, memory, etc
  • Profiling• Gather data about code while interpreting • Invariants (types, constants, nulls) • Statistics (branches, calls)• Use that information to optimize • Educated guess?
  • The Golden Rule of Optimization Don’t do unnecessary work.
  • Optimization• Method inlining• Loop unrolling• Lock coarsening/eliding• Dead code elimination• Duplicate code elimination• Escape analysis
  • Inlining?• Combine caller and callee into one unit • e.g. based on profile • Perhaps with a guard/test• Optimize as a whole • More code means better visibility
  • Inlining
    int addAll(int max) {
      int accum = 0;
      for (int i = 0; i < max; i++) {
        accum = add(accum, i);  // only one target is ever seen
      }
      return accum;
    }
    int add(int a, int b) { return a + b; }
  • Inlining
    int addAll(int max) {
      int accum = 0;
      for (int i = 0; i < max; i++) {
        accum = accum + i;  // don’t bother making the call
      }
      return accum;
    }
  • Loop Unrolling• Works for small, constant loops• Avoid tests, branching• Allow inlining a single call as many
  • Loop Unrolling
    private static final String[] options = { "yes", "no", "maybe" };
    public void looper() {
      // small loop, constant stride, constant size
      for (String option : options) {
        process(option);
      }
    }
  • Loop Unrolling
    private static final String[] options = { "yes", "no", "maybe" };
    public void looper() {
      // unrolled!
      process(options[0]);
      process(options[1]);
      process(options[2]);
    }
  • Lock Coarsening
    public void needsLocks() {
      for (String option : options) {
        process(option);  // repeatedly locking
      }
    }
    private synchronized String process(String option) {
      // some wacky thread-unsafe code
    }
  • Lock Coarsening
    public void needsLocks() {
      synchronized (this) {  // lock once
        for (String option : options) {
          // some wacky thread-unsafe code
        }
      }
    }
  • Lock Eliding
    public void overCautious() {
      List l = new ArrayList();
      synchronized (l) {  // synchronize on new Object
        for (String option : options) {
          l.add(process(option));
        }
      }
    }
    But we know it never escapes this thread...
  • Lock Eliding
    public void overCautious() {
      List l = new ArrayList();
      for (String option : options) {
        l.add(/* process()’s code */);
      }
    }
    No need to lock
  • Escape Analysis
    private static class Foo {
      public String a;
      public String b;
      Foo(String a, String b) {
        this.a = a;
        this.b = b;
      }
    }
  • Escape Analysis
    public void bar() {
      Foo f = new Foo("Hello", "Øredev");
      baz(f);
    }
    public void baz(Foo f) {
      System.out.print(f.a);
      System.out.print(", ");
      quux(f);
    }
    public void quux(Foo f) {
      System.out.print(f.b);
      System.out.println("!");
    }
    Same object all the way through; it never “escapes” these methods.
  • Escape Analysis
    public secret awesome inlinedBarBazQuux() {
      System.out.print("Hello");
      System.out.print(", ");
      System.out.print("Øredev");
      System.out.println("!");
    }
    Don’t bother allocating Foo object!
  • Perf Sinks• Memory accesses • By far the biggest expense• Calls • Memory ref + branch kills pipeline • Call stack, register juggling costs• Locks and volatile writes
  • Volatile?• Each CPU maintains a memory cache• Caches may be out of sync • If it doesn’t matter, no problem • If it does matter, threads disagree!• Volatile forces synchronization of cache • Across cores and to main memory
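The visibility point on the slide above can be tried out with a small sketch. The class and method names here (VolatileFlagSketch, runAndStop) are made up for illustration and are not from the talk. The worker spins until it observes the flag change; because the flag is volatile, the write from the main thread is guaranteed to become visible. Without volatile the loop might never terminate once the JIT compiles it, though that outcome is not guaranteed either way.

```java
import java.util.concurrent.TimeUnit;

final class VolatileFlagSketch {
    // volatile: the worker must re-read this field on every loop iteration
    private static volatile boolean running = true;
    // plain field, only touched by the worker thread
    private static long spins = 0;

    /** Starts a spinning worker, flips the flag, and reports whether the worker stopped. */
    static boolean runAndStop() {
        running = true;
        Thread worker = new Thread(() -> {
            while (running) {
                spins++;
            }
        });
        worker.setDaemon(true);
        worker.start();
        try {
            TimeUnit.MILLISECONDS.sleep(50);
            running = false;      // volatile write, visible to the worker
            worker.join(2000);    // give the worker time to notice
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + runAndStop());
    }
}
```

Running it with -XX:+PrintAssembly (covered later in the talk) is one way to look for the LOCK-prefixed instruction that the volatile write produces on x86.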
  • Call Site• The place where you make a call• Monomorphic (“one shape”) • Single target class• Bimorphic (“two shapes”)• Polymorphic (“many shapes”)• Megamorphic (“you’re screwed”)
  • Blah.java
    System.currentTimeMillis(); // static, monomorphic
    List list1 = new ArrayList(); // constructor, monomorphic
    List list2 = new LinkedList();
    for (List list : new List[]{ list1, list2 }) {
      list.add("hello"); // bimorphic
    }
    for (Object obj : new Object[]{ foo, list1, new Object() }) {
      obj.toString(); // polymorphic
    }
  • Hotspot• -client mode (C1) inlines, less aggressive • Fewer opportunities to optimize• -server mode (C2) inlines aggressively • Based on richer runtime profiling • We’ll focus on this• Tiered mode combines them • -XX:+TieredCompilation
  • C2 “server” Inlining• Profile to find “hot spots” • Call sites • Branch statistics • Profile until 10k calls• Inline mono/bimorphic calls• Other mechanisms for polymorphic calls
  • Tuning Inlining• -XX:MaxInlineSize=35 • Largest inlinable method (bytecode)• -XX:InlineSmallCode=# • Largest inlinable compiled method• -XX:FreqInlineSize=# • Largest frequently-called method...
  • Tuning Inlining• -XX:MaxInlineLevel=9 • How deep does the rabbit hole go?• -XX:MaxRecursiveInlineLevel=# • Recursive inlining
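Before changing any of these settings it can help to see what the running VM actually uses. The following is a minimal sketch, assuming a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name InlineFlagsSketch and the fallback text are invented for this example. Defaults differ between JVM versions, so the values printed will not necessarily match the numbers on the slides.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class InlineFlagsSketch {
    /** Returns the current value of a HotSpot flag, or a placeholder if it cannot be read. */
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Unknown flag, or a VM without this bean (assumption: treat both the same)
            return "(not available on this VM)";
        }
    }

    public static void main(String[] args) {
        String[] flags = {
            "MaxInlineSize", "InlineSmallCode", "FreqInlineSize",
            "MaxInlineLevel", "MaxRecursiveInlineLevel"
        };
        for (String flag : flags) {
            System.out.println("-XX:" + flag + "=" + flagValue(flag));
        }
    }
}
```

Running the same class with, say, -XX:MaxInlineSize=50 on the command line should show the overridden value, which is a quick check that a flag was spelled correctly.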
  • Now it gets fun!
  • Monitoring the JIT• Dozens of flags• Reams of output• Always evolving• How can you understand it?
  • public class Accumulator {
      public static void main(String[] args) {
        int max = Integer.parseInt(args[0]);
        System.out.println(addAll(max));
      }
      static int addAll(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
          accum = add(accum, i);
        }
        return accum;
      }
      static int add(int a, int b) {
        return a + b;
      }
    }
  • $ java -version
    openjdk version "1.7.0-b147"
    OpenJDK Runtime Environment (build 1.7.0-b147-20110927)
    OpenJDK 64-Bit Server VM (build 21.0-b17, mixed mode)
    $ javac Accumulator.java
    $ java Accumulator 1000
    499500
  • Print Compilation• -XX:+PrintCompilation• Print methods as they JIT • Class + name + size
  • $ java -XX:+PrintCompilation Accumulator 1000
    53 1 java.lang.String::hashCode (67 bytes)
    499500
    Where’s our code?!? Remember...10k calls before JIT
  • 10k loop, 10k calls to add
    $ java -XX:+PrintCompilation Accumulator 10000
    53 1 java.lang.String::hashCode (67 bytes)
    64 2 Accumulator::add (4 bytes)
    49995000
    Hooray!
  • But what’s this?
    $ java -XX:+PrintCompilation Accumulator 10000
    53 1 java.lang.String::hashCode (67 bytes)
    64 2 Accumulator::add (4 bytes)
    49995000
    Class loading, security logic, other stuff...
  • Dear god...there’s zombies in my code?!?
    1401 70   java.util.concurrent.ConcurrentHashMap::hash (49 bytes)
    1412 71   java.lang.String::indexOf (7 bytes)
    1420 72 ! java.io.BufferedReader::readLine (304 bytes)
    1420 73   sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes)
    1422 42   java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant
    1435 74 n java.lang.Object::hashCode (0 bytes)
    1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie
    1443 25   sun.misc.URLClassPath::getResource (74 bytes) made zombie
    1443 36   sun.misc.URLClassPath::getResource (74 bytes) made not entrant
    1443 43   java.util.zip.ZipCoder::encoder (35 bytes) made not entrant
    1449 75   java.lang.String::endsWith (15 bytes)
    1631 1  % sun.misc.URLClassPath::getResource @ 39 (74 bytes)
    1665 76   java.lang.ClassLoader::checkName (43 bytes)
    Not entrant? What the heck?
  • Optimistic Compilers• Assume profile is accurate• Aggressively optimize based on profile• Bail out if we’re wrong • ...and hope that we’re usually right
  • Deoptimization• Bail out of running code• Monitoring flags describe process • “uncommon trap” - we were wrong • “not entrant” - don’t let new calls enter • “zombie” - on its way to deadness
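One way to watch the optimistic-compile-then-bail-out cycle is to let a call site see a single receiver type for a long time and then hand it a second one. The sketch below is an illustration only; DeoptSketch, Shape, Triangle and Square are invented names, and whether a “made not entrant” line actually appears depends on the JVM version and flags in use.

```java
import java.util.Arrays;

final class DeoptSketch {
    interface Shape { int sides(); }
    static final class Triangle implements Shape { public int sides() { return 3; } }
    static final class Square implements Shape { public int sides() { return 4; } }

    // One call site (s.sides()); its type profile depends on what we pass in.
    static int totalSides(Shape[] shapes) {
        int total = 0;
        for (Shape s : shapes) {
            total += s.sides();
        }
        return total;
    }

    static int run(int warmupRounds) {
        // Phase 1: only Triangle is ever seen, so the call site looks monomorphic.
        Shape[] triangles = new Shape[1000];
        Arrays.fill(triangles, new Triangle());
        int total = 0;
        for (int round = 0; round < warmupRounds; round++) {
            total += totalSides(triangles);
        }
        // Phase 2: Square shows up for the first time, contradicting the profile.
        Shape[] mixed = new Shape[1000];
        for (int i = 0; i < mixed.length; i++) {
            mixed[i] = (i % 2 == 0) ? new Triangle() : new Square();
        }
        total += totalSides(mixed);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(run(20000));
    }
}
```

Running it as `java -XX:+PrintCompilation DeoptSketch` and looking for DeoptSketch::totalSides in the output is the experiment; on a HotSpot build like the one in the talk, a compiled version of that method may be reported as made not entrant once Square appears.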
  • No JIT At All?• Code is too big• Code isn’t called enough
  • That looks exciting!
    1401 70   java.util.concurrent.ConcurrentHashMap::hash (49 bytes)
    1412 71   java.lang.String::indexOf (7 bytes)
    1420 72 ! java.io.BufferedReader::readLine (304 bytes)
    1420 73   sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes)
    1422 42   java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant
    1435 74 n java.lang.Object::hashCode (0 bytes)
    1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie
    1443 25   sun.misc.URLClassPath::getResource (74 bytes) made zombie
    1443 36   sun.misc.URLClassPath::getResource (74 bytes) made not entrant
    1443 43   java.util.zip.ZipCoder::encoder (35 bytes) made not entrant
    1449 75   java.lang.String::endsWith (15 bytes)
    1631 1  % sun.misc.URLClassPath::getResource @ 39 (74 bytes)
    1665 76   java.lang.ClassLoader::checkName (43 bytes)
    “!”: Exception handling in here (boring!)
    What’s this “n” all about? This method is native...maybe “intrinsic”. We’ll come back to that...
    And “%”? Method has been replaced while running (OSR)
    First column: millis from JVM start. Second column: sequence number of compilation
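The column layout just described can be captured in a small parser, which is handy when a log runs to thousands of lines. This is a sketch under assumptions: the regular expression targets the line format shown on these slides, other JDK versions print different columns, and the class name PrintCompilationLine is hypothetical. The flag letters s and b are accepted by the pattern but not interpreted here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrintCompilationLine {
    // Assumed layout: millis, compile id, optional flags, Class::method,
    // optional "@ bci" for OSR, "(N bytes)", optional trailing status text.
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(\\d+)\\s+(\\d+)\\s+((?:[%!snb]+\\s+)*)(\\S+::\\S+)"
        + "(?:\\s+@\\s+(\\d+))?\\s+\\((\\d+) bytes\\)\\s*(.*)$");

    final int millis;                  // millis from JVM start
    final int compileId;               // sequence number of compilation
    final boolean osr;                 // "%": replaced while running
    final boolean hasExceptionHandler; // "!"
    final boolean isNative;            // "n"
    final String method;
    final int bytecodeSize;
    final String status;               // e.g. "made not entrant", or empty

    private PrintCompilationLine(Matcher m) {
        millis = Integer.parseInt(m.group(1));
        compileId = Integer.parseInt(m.group(2));
        String flags = m.group(3);
        osr = flags.contains("%");
        hasExceptionHandler = flags.contains("!");
        isNative = flags.contains("n");
        method = m.group(4);
        bytecodeSize = Integer.parseInt(m.group(6));
        status = m.group(7).trim();
    }

    /** Returns the parsed line, or null if the text does not match the assumed layout. */
    static PrintCompilationLine parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? new PrintCompilationLine(m) : null;
    }

    public static void main(String[] args) {
        PrintCompilationLine p =
            parse("1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant");
        System.out.println(p.method + " [" + p.status + "]");
    }
}
```

Filtering parsed lines on a non-empty status is a quick way to list every method that was deoptimized during a run.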
  • Print Inlining• -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining• Display hierarchy of inlined methods• Include reasons for not inlining• More, better output on OpenJDK 7
  • $ java -XX:+UnlockDiagnosticVMOptions \
    > -XX:+PrintInlining \
    > Accumulator 10000
    49995000
    Um...I don’t see anything inlining
  • public class Accumulator {
      public static void main(String[] args) {
        int max = Integer.parseInt(args[0]);
        System.out.println(addAll(max));
      }
      static int addAll(int max) {  // called only once
        int accum = 0;
        for (int i = 0; i < max; i++) {
          accum = add(accum, i);  // called 10k times
        }
        return accum;
      }
      static int add(int a, int b) {  // JITs as expected, but makes no calls!
        return a + b;
      }
    }
  • static double addAllSqrts(int max) {
      double accum = 0;
      for (int i = 0; i < max; i++) {
        accum = addSqrt(accum, i);
      }
      return accum;
    }
    static double addSqrt(double a, int b) {
      return a + sqrt(b);
    }
    static double sqrt(int a) {
      return Math.sqrt(a);
    }
  • $ java -XX:+UnlockDiagnosticVMOptions \
    > -XX:+PrintInlining \
    > -XX:+PrintCompilation \
    > Accumulator 10000
    53 1 java.lang.String::hashCode (67 bytes)
    65 2 Accumulator::addSqrt (7 bytes)
           @ 3 Accumulator::sqrt (6 bytes) inline (hot)
             @ 2 java.lang.Math::sqrt (5 bytes) (intrinsic)
    65 3 Accumulator::sqrt (6 bytes)
           @ 2 java.lang.Math::sqrt (5 bytes) (intrinsic)
    666616.4591971082
    HOT HOT HOT!
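To reproduce a run like the one above, the three methods from the earlier slide need a class and a main method around them. A self-contained version might look like this; the class name SqrtAccumulator and the default argument of 10000 are choices made for this sketch, and the return type of addSqrt is double so that the code compiles.

```java
final class SqrtAccumulator {
    public static void main(String[] args) {
        // Default to 10000 iterations so the 10k-call threshold is reached
        int max = args.length > 0 ? Integer.parseInt(args[0]) : 10000;
        System.out.println(addAllSqrts(max));
    }

    // Called once; contains the hot loop
    static double addAllSqrts(int max) {
        double accum = 0;
        for (int i = 0; i < max; i++) {
            accum = addSqrt(accum, i);
        }
        return accum;
    }

    // Called once per iteration; makes a call of its own, so there is something to inline
    static double addSqrt(double a, int b) {
        return a + sqrt(b);
    }

    // Thin wrapper over Math.sqrt, which the JIT treats as an intrinsic
    static double sqrt(int a) {
        return Math.sqrt(a);
    }
}
```

Compile it and run `java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintCompilation SqrtAccumulator 10000`; the exact lines printed will vary with the JVM version.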
  • LogCompilation• -XX:+LogCompilation• Worst. XML. Evar.• <JDK>/hotspot/src/share/tools/LogCompilation • or http://github.com/headius/logc
  • scopes_pcs_offset=1384 dependencies_offset=1576 handler_table_offset=1592 nul_chk_table_offset=1736oops_offset=992 method=org/jruby/lexer/yacc/ByteArrayLexerSource$ByteArrayCursor read ()I bytes=49count=5296 backedge_count=1 iicount=10296 stamp=0.412/><writer thread=4425007104/><nmethod compile_id=21 compiler=C2 entry=4345862528 size=1152 address=4345862160relocation_offset=288 insts_offset=368 stub_offset=688 scopes_data_offset=840 scopes_pcs_offset=904dependencies_offset=1016 handler_table_offset=1032 oops_offset=784 method=org/jruby/lexer/yacc/ByteArrayLexerSource forward (I)I bytes=111 count=5296 backedge_count=1 iicount=10296 stamp=0.412/><writer thread=4300214272/><task_queued compile_id=22 method=org/jruby/lexer/yacc/ByteArrayLexerSource read ()I bytes=10count=5000 backedge_count=1 iicount=10000 stamp=0.433 comment=count hot_count=10000/><writer thread=4426067968/><nmethod compile_id=22 compiler=C2 entry=4345885984 size=1888 address=4345885584relocation_offset=288 insts_offset=400 stub_offset=912 scopes_data_offset=1104scopes_pcs_offset=1496 dependencies_offset=1704 handler_table_offset=1720 nul_chk_table_offset=1864oops_offset=1024 method=org/jruby/lexer/yacc/ByteArrayLexerSource read ()I bytes=10 count=5044backedge_count=1 iicount=10044 stamp=0.435/><writer thread=4300214272/><task_queued compile_id=23 method=java/util/HashMap hash (I)I bytes=23 count=5000 backedge_count=1iicount=10000 stamp=0.442 comment=count hot_count=10000/><writer thread=4425007104/><nmethod compile_id=23 compiler=C2 entry=4345887808 size=440 address=4345887504relocation_offset=288 insts_offset=304 stub_offset=368 scopes_data_offset=392 scopes_pcs_offset=400dependencies_offset=432 method=java/util/HashMap hash (I)I bytes=23 count=5039 backedge_count=1iicount=10039 stamp=0.442/><writer thread=4300214272/><dependency_failed type=abstract_with_unique_concrete_subtype ctxk=org/jruby/lexer/yacc/LexerSourcex=org/jruby/lexer/yacc/ByteArrayLexerSource 
witness=org/jruby/lexer/yacc/InputStreamLexerSourcestamp=0.456/><dependency_failed type=abstract_with_unique_concrete_subtype ctxk=org/jruby/lexer/yacc/LexerSourcex=org/jruby/lexer/yacc/ByteArrayLexerSource witness=org/jruby/lexer/yacc/InputStreamLexerSourcestamp=0.456/><dependency_failed type=abstract_with_unique_concrete_subtype ctxk=org/jruby/lexer/yacc/LexerSourcex=org/jruby/lexer/yacc/ByteArrayLexerSource witness=org/jruby/lexer/yacc/InputStreamLexerSourcestamp=0.456/><dependency_failed type=abstract_with_unique_concrete_subtype ctxk=org/jruby/lexer/yacc/LexerSourcex=org/jruby/lexer/yacc/ByteArrayLexerSource witness=org/jruby/lexer/yacc/InputStreamLexerSourcestamp=0.456/>
  • $ java -jar logc.jar hotspot.log
    1 java.lang.String::hashCode (67 bytes)
    2 Accumulator::addSqrt (7 bytes)
    3 Accumulator::sqrt (6 bytes)
  • $ java -jar logc.jar hotspot.log
    1 java.lang.String::hashCode (67 bytes)
    2 Accumulator::addSqrt (7 bytes)
        @ 2 Accumulator::sqrt (6 bytes) (end time: 0.0660 nodes: 36)
        @ 2 java.lang.Math::sqrt (5 bytes)
    3 Accumulator::sqrt (6 bytes)
        @ 2 java.lang.Math::sqrt (5 bytes)
  • 8 sun.nio.cs.UTF_8$Encoder::encode (361 bytes)6 uncommon trap null_check make_not_entrant @8 java/lang/String equals (Ljava/lang/Object;)Z6 make_not_entrant9 java.lang.String::equals (88 bytes)10 java.util.LinkedList::indexOf (73 bytes)
  • Hotspot sees it’s 100% String 10 java.util.LinkedList::indexOf (73 bytes) @ 52 java.lang.Object::equals (11 bytes) type profile java/lang/Object -> java/lang/String (100%) @ 52 java.lang.String::equals (88 bytes) 11 java.lang.String::indexOf (87 bytes) @ 83 java.lang.String::indexOfSupplementary too big Too big to inline! Could be bad?
  • Intrinsic?• Known to the JIT • Don’t inline bytecode • Do insert “best” native code • e.g. kernel-level memory operation • e.g. optimized sqrt in machine code
  • $ java -XX:+UnlockDiagnosticVMOptions \
    > -XX:+PrintInlining \
    > -XX:+PrintCompilation \
    > Accumulator 10000
    53 1 java.lang.String::hashCode (67 bytes)
    65 2 Accumulator::addSqrt (7 bytes)
           @ 3 Accumulator::sqrt (6 bytes) inline (hot)
             @ 2 java.lang.Math::sqrt (5 bytes) (intrinsic)
    65 3 Accumulator::sqrt (6 bytes)
           @ 2 java.lang.Math::sqrt (5 bytes) (intrinsic)
    666616.4591971082
    Calls treated specially by JIT
  • Common Intrinsics• String#equals• Most (all?) Math methods• System.arraycopy• Object#hashCode• Object#getClass• sun.misc.Unsafe methods
  • Did someone say MACHINE CODE?!
  • The Red Pill• Knowing code compiles is good• Knowing code inlines is better• Seeing the actual assembly is best!
  • Caveat• I don’t really know assembly.• But I fake it really well.
  • Print Assembly• -XX:+PrintAssembly• Google “hotspot printassembly”• http://wikis.sun.com/display/HotSpotInternals/PrintAssembly• Assembly-dumping plugin for Hotspot
  • Alternative• -XX:+PrintOptoAssembly• Only in debug/fastdebug builds• Not as pretty
  • Wednesday, July 27, 2011
    ~/oscon ! java -XX:+UnlockDiagnosticVMOptions \
    > -XX:+PrintAssembly \
    > Accumulator 10000
    OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output
    Loaded disassembler from hsdis-amd64.dylib
    ...
  • Decoding compiled method 11343cbd0:
    Code:
    [Disassembling for mach=i386:x86-64]
    [Entry Point]
    [Verified Entry Point]
    [Constants]
    # {method} add (II)I in Accumulator
    # parm0: rsi = int
    # parm1: rdx = int
    # [sp+0x20] (sp of caller)
    11343cd00: push %rbp
    11343cd01: sub $0x10,%rsp
    11343cd05: nop                          ;*synchronization entry
                                            ; - Accumulator::add@-1 (line 16)
    11343cd06: mov %esi,%eax
    11343cd08: add %edx,%eax                ;*iadd
                                            ; - Accumulator::add@2 (line 16)
    11343cd0a: add $0x10,%rsp
    11343cd0e: pop %rbp
    11343cd0f: test %eax,-0x1303fd15(%rip)  # 1003fd000
                                            ; {poll_return}
    11343cd15: retq
  • Woah there buddy...
  • x86_64 Assembly 101
    add                    Two’s complement add
    sub                    ...subtract
    mov*                   Move data from a to b
    jmp                    goto
    je, jne, jl, jge, ...  Jump if ==, !=, <, >=, ...
    push, pop              Call stack operations
    call*, ret*            Call, return from subroutine
    eax, ebx, esi, ...     32-bit registers
    rax, rbx, rsi, ...     64-bit registers
  • Register Machine• Instead of stack moves, we have “slots”• Move data into slots• Trigger operations that manipulate data• Get new data out of slots• JVM stack, locals end up as register ops
  • Stack?• Native code has a stack too • Maintains registers from call to call• Various calling conventions • Caller saves registers? • Callee saves registers?
  • Decoding compiled method 11343cbd0:    <= address of new compiled code
    Code:
    [Disassembling for mach=i386:x86-64]   <= architecture
    [Entry Point]
    [Verified Entry Point]
    [Constants]
    # {method} add (II)I in Accumulator    <= method, signature, class
    # parm0: rsi = int                     <= first parm to method goes in rsi
    # parm1: rdx = int                     <= second parm goes in rdx
    # [sp+0x20] (sp of caller)             <= caller’s pointer into native stack
  • 11343cd00: push %rbp11343cd01: sub $0x10,%rsp11343cd05: nop ;*synchronization entry ; - Accumulator::add@-1 (line 16)11343cd06: mov %esi,%eax11343cd08: add %edx,%eax ;*iadd ; - Accumulator::add@2 (line 16)11343cd0a: add $0x10,%rsp11343cd0e: pop %rbp11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000 ; {poll_return}11343cd15: retq rbp points at current stack frame, so we save it off.
  • 11343cd00: push %rbp11343cd01: sub $0x10,%rsp11343cd05: nop ;*synchronization entry ; - Accumulator::add@-1 (line 16)11343cd06: mov %esi,%eax11343cd08: add %edx,%eax ;*iadd ; - Accumulator::add@2 (line 16)11343cd0a: add $0x10,%rsp11343cd0e: pop %rbp11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000 ; {poll_return}11343cd15: retq Two args, so we bump stack pointer by 0x10.
  • 11343cd00: push %rbp11343cd01: sub $0x10,%rsp11343cd05: nop ;*synchronization entry ; - Accumulator::add@-1 (line 16)11343cd06: mov %esi,%eax11343cd08: add %edx,%eax ;*iadd ; - Accumulator::add@2 (line 16)11343cd0a: add $0x10,%rsp11343cd0e: pop %rbp11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000 ; {poll_return}11343cd15: retq Do nothing, e.g. to memory-align code.
  • 11343cd00: push %rbp11343cd01: sub $0x10,%rsp11343cd05: nop ;*synchronization entry ; - Accumulator::add@-1 (line 16)11343cd06: mov %esi,%eax11343cd08: add %edx,%eax ;*iadd ; - Accumulator::add@2 (line 16)11343cd0a: add $0x10,%rsp11343cd0e: pop %rbp11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000 ; {poll_return}11343cd15: retq At the “-1” instruction of our add() method... i.e. here we go!
  • 11343cd00: push %rbp
    11343cd01: sub $0x10,%rsp
    11343cd05: nop                          ;*synchronization entry
                                            ; - Accumulator::add@-1 (line 16)
    11343cd06: mov %esi,%eax
    11343cd08: add %edx,%eax                ;*iadd
                                            ; - Accumulator::add@2 (line 16)
    11343cd0a: add $0x10,%rsp
    11343cd0e: pop %rbp
    11343cd0f: test %eax,-0x1303fd15(%rip)  # 1003fd000
                                            ; {poll_return}
    11343cd15: retq
    Move parm0 (esi) into eax.
  • 11343cd00: push %rbp11343cd01: sub $0x10,%rsp11343cd05: nop ;*synchronization entry ; - Accumulator::add@-1 (line 16)11343cd06: mov %esi,%eax11343cd08: add %edx,%eax ;*iadd ; - Accumulator::add@2 (line 16)11343cd0a: add $0x10,%rsp11343cd0e: pop %rbp11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000 ; {poll_return}11343cd15: retq Add parm0 and parm1, store result in eax.
  • 11343cd00: push %rbp11343cd01: sub $0x10,%rsp11343cd05: nop ;*synchronization entry ; - Accumulator::add@-1 (line 16)11343cd06: mov %esi,%eax11343cd08: add %edx,%eax ;*iadd ; - Accumulator::add@2 (line 16)11343cd0a: add $0x10,%rsp11343cd0e: pop %rbp11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000 ; {poll_return}11343cd15: retq How nice, Hotspot shows us this is our “iadd” op!
  • 11343cd00: push %rbp11343cd01: sub $0x10,%rsp11343cd05: nop ;*synchronization entry ; - Accumulator::add@-1 (line 16)11343cd06: mov %esi,%eax11343cd08: add %edx,%eax ;*iadd ; - Accumulator::add@2 (line 16)11343cd0a: add $0x10,%rsp11343cd0e: pop %rbp11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000 ; {poll_return}11343cd15: retq Put stack pointer back where it was.
  • 11343cd00: push %rbp11343cd01: sub $0x10,%rsp11343cd05: nop ;*synchronization entry ; - Accumulator::add@-1 (line 16)11343cd06: mov %esi,%eax11343cd08: add %edx,%eax ;*iadd ; - Accumulator::add@2 (line 16)11343cd0a: add $0x10,%rsp11343cd0e: pop %rbp11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000 ; {poll_return}11343cd15: retq Restore rbp from stack.
  • 11343cd00: push %rbp11343cd01: sub $0x10,%rsp11343cd05: nop ;*synchronization entry ; - Accumulator::add@-1 (line 16)11343cd06: mov %esi,%eax11343cd08: add %edx,%eax ;*iadd ; - Accumulator::add@2 (line 16)11343cd0a: add $0x10,%rsp11343cd0e: pop %rbp11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000 ; {poll_return}11343cd15: retq Poll a “safepoint”...give JVM a chance to GC, etc.
  • 11343cd00: push %rbp11343cd01: sub $0x10,%rsp11343cd05: nop ;*synchronization entry ; - Accumulator::add@-1 (line 16)11343cd06: mov %esi,%eax11343cd08: add %edx,%eax ;*iadd ; - Accumulator::add@2 (line 16)11343cd0a: add $0x10,%rsp11343cd0e: pop %rbp11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000 ; {poll_return}11343cd15: retq All done!
  • Things to Watch For• CALL operations • Indicates something failed to inline• LOCK operations • Cache-busting, e.g. volatility
  • CALL
    1134858f5: xchg %ax,%ax
    1134858f7: callq 113414aa0  ; OopMap{off=316}
                                ;*invokespecial addAsBignum
                                ; - org.jruby.RubyFixnum::addFixnum@29 (line 348)
                                ; {optimized virtual_call}
    1134858fc: jmpq 11348586d
    Ruby integer adds might overflow into Bignum, leading to addAsBignum call. In this case, it’s never called, so Hotspot emits callq assuming we won’t hit it.
  • LOCK
    Code from a RubyBasicObject’s default constructor.
    11345d823: mov 0x70(%r8),%r9d     ;*getstatic NULL_OBJECT_ARRAY
                                      ; - org.jruby.RubyBasicObject::<init>@5 (line 76)
                                      ; - org.jruby.RubyObject::<init>@2 (line 118)
                                      ; - org.jruby.RubyNumeric::<init>@2 (line 111)
                                      ; - org.jruby.RubyInteger::<init>@2 (line 95)
                                      ; - org.jruby.RubyFixnum::<init>@5 (line 112)
                                      ; - org.jruby.RubyFixnum::newFixnum@25 (line 173)
    11345d827: mov %r9d,0x14(%rax)
    11345d82b: lock addl $0x0,(%rsp)  ;*putfield varTable
                                      ; - org.jruby.RubyBasicObject::<init>@8 (line 76)
                                      ; - org.jruby.RubyObject::<init>@2 (line 118)
                                      ; - org.jruby.RubyNumeric::<init>@2 (line 111)
                                      ; - org.jruby.RubyInteger::<init>@2 (line 95)
                                      ; - org.jruby.RubyFixnum::<init>@5 (line 112)
                                      ; - org.jruby.RubyFixnum::newFixnum@25 (line 173)
    Why are we doing a volatile write in the constructor?
  • LOCK
    public class RubyBasicObject ... {
      private static final boolean DEBUG = false;
      private static final Object[] NULL_OBJECT_ARRAY = new Object[0];
      // The class of this object
      protected transient RubyClass metaClass;
      // zeroed by jvm
      protected int flags;
      // variable table, lazily allocated as needed (if needed)
      private volatile Object[] varTable = NULL_OBJECT_ARRAY;
    Maybe it’s not such a good idea to pre-init a volatile?
  • LOCK
    ~/projects/jruby ! git log 2f935de1e40bfd8b29b3a74eaed699e519571046 -1 | cat
    commit 2f935de1e40bfd8b29b3a74eaed699e519571046
    Author: Charles Oliver Nutter <headius@headius.com>
    Date: Tue Jun 14 02:59:41 2011 -0500
        Do not eagerly initialize volatile varTable field in RubyBasicObject; speeds object creation significantly.
    LEVEL UP!
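The shape of that fix can be sketched in a few lines. This is not JRuby’s actual code; LazyVarTableSketch and its methods are hypothetical, and the sketch only illustrates the idea from the commit message: leave the volatile field at its JVM default (null) so that constructing an object performs no volatile store, and substitute the shared empty array on the read side instead.

```java
import java.util.Arrays;

final class LazyVarTableSketch {
    private static final Object[] EMPTY = new Object[0];

    // Not initialized here: the field starts out null (zeroed by the JVM),
    // so object construction involves no volatile write.
    private volatile Object[] varTable;

    /** Readers see the shared empty array until something is actually stored. */
    private Object[] table() {
        Object[] current = varTable;
        return current == null ? EMPTY : current;
    }

    Object getVariable(int index) {
        Object[] current = table();
        return index < current.length ? current[index] : null;
    }

    /** Copy-on-write update: the only volatile write happens when a variable is set. */
    synchronized void setVariable(int index, Object value) {
        Object[] old = table();
        Object[] grown = Arrays.copyOf(old, Math.max(old.length, index + 1));
        grown[index] = value;
        varTable = grown;
    }

    public static void main(String[] args) {
        LazyVarTableSketch obj = new LazyVarTableSketch();
        System.out.println(obj.getVariable(0));  // nothing stored yet
        obj.setVariable(2, "hello");
        System.out.println(obj.getVariable(2));
    }
}
```

Dumping the constructor with -XX:+PrintAssembly before and after such a change is how one would check whether the LOCK-prefixed instruction has gone away.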
  • invokedynamic?• Largely, it works the same • MethodHandles optimize to x86 asm • Inlining as normal• Performance nearly the same as static!• And that’s exactly the point!
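A small experiment along these lines is to route the earlier add loop through a method handle and compare what the JIT does with it. The sketch below uses the standard java.lang.invoke API directly rather than an invokedynamic bytecode, which plain Java source cannot emit by hand; the class name MethodHandleSketch is invented here. Keeping the handle in a static final field is a common way to let the JIT treat it as a constant.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class MethodHandleSketch {
    static int add(int a, int b) {
        return a + b;
    }

    // Constant handle pointing at add(int, int)
    static final MethodHandle ADD;
    static {
        try {
            ADD = MethodHandles.lookup().findStatic(
                MethodHandleSketch.class, "add",
                MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int addAll(int max) {
        int accum = 0;
        try {
            for (int i = 0; i < max; i++) {
                // invokeExact requires the call to match (int, int) -> int exactly
                accum = (int) ADD.invokeExact(accum, i);
            }
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
        return accum;
    }

    public static void main(String[] args) {
        int max = args.length > 0 ? Integer.parseInt(args[0]) : 10000;
        System.out.println(addAll(max));
    }
}
```

Running it with the PrintInlining flags from earlier shows how the handle call is treated on a given JVM; the output format for method handle frames differs between versions.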
  • What Have We Learned? • How Hotspot’s JIT works • How to monitor the JIT • How to find problems • How to fix problems we find
  • What We Missed• Tuning GC settings in JVM• Monitoring GC with VisualVM • Google ‘visualgc’...it’s awesome
  • You’re no dummy now! ;-)
  • Thank you!• headius@headius.com, @headius• http://blog.headius.com• “java virtual machine specification”• “jvm opcodes”