H2O.ai
Machine Intelligence
Fast, Scalable In-Memory Machine and Deep Learning
For Smarter Applications
Bits of Advice
For the VM Writer
Cliff Click
Who Am I?
Cliff Click
CTO, Co-Founder H2O.ai
cliffc@h2o.ai
40 yrs coding
35 yrs building compilers
30 yrs distributed computation
20 yrs OS, device drivers, HPC, HotSpot
10 yrs Low-latency GC, custom java hardware,
NonBlockingHashMap
20 patents, dozens of papers
100s of public talks
PhD Computer Science
1995 Rice University
HotSpot JVM Server Compiler
“showed the world JITing is possible”
Some History of HotSpot
● HotSpot started as a Self VM ported to Java
– Simple GC, simple JIT, single-threaded, X86 only
● I've been working on HotSpot for ~20 yrs now
● Watched HotSpot become 'thread-robust'
● Many ports: X86, Sparc, IA64, X86-64, MIPS, Azul
● Pick up 2 new compilers (C1 & C2, -client & -server)
● Pick up many new GC's (serial, train, parallel, CMS, G1,
Azul's Pauseless)
● Other Cool Stuff: reflection as bytecodes, thin locks, JMM,
JNI, JVM{DI,TI,PI}, ....
Nature of This Talk
● Much hard experience learned building HotSpot
– Too much info for one talk!
● Many topics are crucial for
– Ultimate speed, power, or footprint
– Engineering complexity & cost
● Interlock badly, in non-obvious ways
● I'm gonna breeze through! (limits of time)
– Happy to stop for questions
– ASK: because if you're confused, so is the next person
Agenda
● Some Choices To Make
● Native Calls – a Deep Dive
● Things that Worked Well
● Hard Things but Worth Doing
● Things I Won't Do Again
● Q&A
(if time, A Deep Dive into Code Unloading)
VMs are Big Complex Beasties
● Or Really Really Small (e.g., Squawk, OOVM)
● Depends on the Feature Set
● Many features interact in Bad Ways
– Usually interactions not obvious
● I went the Big Desktop/Server route
– Very different choices from the cell-phone guys
– Must solve a different set of problems
● I suspect most of this applies to .Net as well
Some Choices to make...
• Portable or not
─ Native code vs JIT'd code
─ Calling conventions, Endianness
─ Threads, POSIX, stacks
• Footprint
─ Embedded (1G), desktop (32G+), server (16G-256G)
• X86 vs RISC (ARM? GPU? DSP?)
• JIT or Interpret (or no Interpreter!)
• Multi-threaded
─ Cooperative vs preemption
• Multi-CPU
Interpreter Choices...
• Interpreter
• Simple, software only
─ Pure C
─ GCC labels-as-values (about 2x faster than pure C)
─ Pure ASM (about 2x faster again)
• Hardware support (ARM Jazelle, picoJava?)
• Fancy: inline & schedule dispatch
─ Seen at least 2 different ways to slice this
• Requires stack-oriented layout, bad for JITs
• or none at all...
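To make the tiers concrete: every variant above is speeding up the same fetch-decode-dispatch loop. A minimal sketch in Java (which has no computed goto, so this corresponds to the "pure C" switch tier); the opcodes and stack machine here are made up purely for illustration, not real JVM bytecodes.

```java
// Minimal stack-machine interpreter: one switch-based dispatch loop.
// Label-vars and hand-ASM interpreters speed up exactly this path.
class MiniInterp {
    static final int PUSH = 0, ADD = 1, MUL = 2, RET = 3;

    static int run(int[] code) {
        int[] stack = new int[64];
        int sp = 0, pc = 0;
        while (true) {
            switch (code[pc++]) {                  // fetch + dispatch
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  { int b = stack[--sp], a = stack[--sp]; stack[sp++] = a + b; break; }
                case MUL:  { int b = stack[--sp], a = stack[--sp]; stack[sp++] = a * b; break; }
                case RET:  return stack[--sp];
                default:   throw new IllegalStateException("bad opcode");
            }
        }
    }
}
```

The per-bytecode cost is dominated by the dispatch branch, which is why replacing the switch with threaded dispatch buys the ~2x the slide quotes.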
Some JIT choices to make...
• JIT
─ None? (gives up peak performance)
─ stage0, template style?
─ stage1, light optimization, linear-scan allocator?
─ stage2, all the optimizations, graph-coloring?
• Portable or targeted (usual X86 + Vendor)?
• Intercalls with native?
• Mixed-mode with Interpreter?
─ Also messes with calling convention
• Class loading vs inlining non-final methods?
More JIT choices to make...
• Stage-0 JIT + No Interpreter
• Template generation, very fast low quality code
• No funny stack layouts
• Easy calling conventions for all
• Still slower than just interpreting run-once code
• Lots of run-once code at startup
• Lots of bulky code to startup
─ Big footprint
─ Can throw away & regenerate
Some GC choices to make...
• GC
─ Simple? (single generation, StopTheWorld?)
─ Fast? (Throughput? Low pause?)
─ Exact? (allows moving objects, compaction, de-frag, bump-pointer allocation)
─ Conservative? (No need to track all objects)
─ Tried & true? (e.g. Mark/Sweep)
─ Fancy new algorithm?
• Parallel? (really hard!)
• Concurrent? (really really hard!)
• Both? (really⁴ hard!)
More GC choices...
● Stop-Anywhere vs Safepoints
● Stop-Anywhere
– OOP Maps at each PC? (bulky data)
– Interpret instructions? (hard/slow on X86)
– Fixed OOP registers & stack areas? (bad register usage)
● Safepoints
– Cooperative? (polling costs?)
● Polling in software vs hardware?
– Preempt at wrong place?
– Roll forward vs step forward vs interpret?
Threading Issues
• Multi-threading costs
─ All operations no longer atomic
─ May be preempted inconveniently
─ Need locking (never needed it before!)
• Threads can block for I/O, or deadlock
• GC requires SP (& usually PC) for stack roots
• Hard to find the SP+PC of remote thread
─ Common problem of OS's
─ ! Very surprising to me, until I tried it !
─ Fails (w/low frequency) on many “robust” OSes!
─ Caught in page fault handler, nested in tlb handler, nested
in stack overflow handler nested in ....
Multiple CPU Issues
• Multi-CPUs?
─ Now need Atomic operations
─ Coherency, memory fences
─ Low-frequency data race bugs
• Need scale-able locks in JVM
─ Not just correctness locks
• Need scale-able locks for Java/JIT'd code
─ Spinning & retries
─ 'fair' locks under super high contention
• GC can be in-progress when a thread awakens
─ Threads now need to take a form of GC lock to run
64-bit Math Choices...
• Long (64bit int) math vs JIT
─ Major user is BigInteger package
─ Called by crypto routines
─ Called by web services everywhere
• BigInteger usage is as a 'pair of ints w/carry'
─ Lots of masking to high or low halves
─ Lots of shift-by-32
• Optimizes really well as a pair of ints
─ All those mask/shifts turn into simple register selection
─ Better code on 32-bit machines by far (X86!)
─ Almost tied with “good” 64b code on 64-bit machines
─ (add+addc vs add8)
(diagram: the two low halves combine with add, the two high halves with addc, yielding the 64-bit H/L result)
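The "pair of ints" view can be sketched as plain arithmetic: add the low halves, fold the carry into the high halves. A hedged Java sketch (names are mine; the correctness check is simply agreement with the machine's single 64-bit add):

```java
// 64-bit add done as the 32-bit pair the slide describes:
// "add" on the low halves, then "addc" folding the carry into the highs.
class PairAdd {
    static long add(int aHi, int aLo, int bHi, int bLo) {
        int lo = aLo + bLo;                                       // add
        int carry = Integer.compareUnsigned(lo, aLo) < 0 ? 1 : 0; // carry-out of low half
        int hi = aHi + bHi + carry;                               // addc
        return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
    }
}
```

All the BigInteger-style masks and shift-by-32s become register selection in this representation, which is why it JITs so well on 32-bit machines.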
Agenda
● Some Choices To Make
● Native Calls – a Deep Dive
● Things that Worked Well
● Hard Things but Worth Doing
● Things I Won't Do Again
● Q&A
(if time, A Deep Dive into Code Unloading)
Native Calls: A Deep Dive
save 64 # register window push
mov i0,o0 # Move incoming arg
# o0 holds live OOP at call
call foo # The Native Call
nop # (really fill delay slot)
mov o0,i0 # Move outgoing arg
return #
restore # pop sparc window
● Ideally: simple call
– Only true for fixed application set
– Often holds true for cell phones, small embedded
● Reality: Too complex, wants own frame
● Example: native Object foo( double d );
The Argument Shuffle
● First problem: JIT and Native calling conventions
– Eg: SparcV8 passes float in int regs
– Java normally floats passed in float regs
– No Java varargs or prototype-less code
● Want fast argument shuffle
– Auto-generate asm shuffle code from sig
std d0,[sp+64] # float args passed in int regs
# 'this' pointer is in o0
ldw [sp+64],o1 # misaligned, need 2 loads
ldw [sp+68],o2 # double now in o1/o2
Handlizing OOP Arguments
● Cannot pass in OOPs to Native
– Lest GC be unable to find them
– (only for moving collector)
– Handlize OOPs – part of arg shuffle code
– Need an OOP-map as well: [sp+72] live across call
set 0,o0 # assume null arg
beq i0,skip # handle of null is null
stw i0,[sp+72] # handlize 'this'
add sp,72,o0 # ptr to 'this' passed in o0
skip:
● Reverse for returning an OOP after the call
beq o0,is_null
ldw [o0],o0 # de-handlize return value
is_null:
Synchronized Native
● May need locking, i.e. synchronized keyword
ldw [i0+0],l1 # Load 'this' header word
or l1,2,l2 # set locked-bit
stw l2,[sp+76] # save header on stack
cas [i0],l1,l2 # Attempt to fast-lock
cmp l1,l2 # success or fail
bne slow_lock # CAS failed? Recursive lock?
# not shown: inline recursive-lock handling
# Both [sp+72] and i0 hold a live OOP across call
# [sp+76] holds 'displaced header' – needed for inflation
call foo # The Native Call
nop # (really: fill in delay slot)
cas [i0],l2,l1 # Attempt to fast-unlock
cmp l1,l2 #
bne slow_unlock # (awake waiting threads)
Allowing Stack Crawls
● Next problem: Native code may block
– We must allow a GC (if multi-threaded)
– But GC needs SP+PC to crawl stack
– Which portable OS + native compiler won't tell us
● So store SP/PC before calling native
– Storing SP is also trigger that allows GC
setlo #ret_pc,l0
sethi #ret_pc,l0 # set 32-bit PC before allow GC
stw l0,[g7+&pc] # Rely on Sparc TSO here (IA64!)
stw sp,[g7+&sp] # Enable GC from here on
call foo # The Native Call!!!
nop # (really fill delay slot)
Returning while GC-in-Progress
● Next: Native code returns while GC active
– Must block until GC completes
– No good to spin (eats CPU)
● Similar to 'lock acquire'
● Requires real Atomic operation
setlo #ret_pc,l0
sethi #ret_pc,l0 # set 32-bit PC before allow GC
stw l0,[g7+&pc] # Rely on Sparc TSO here (IA64!)
stw sp,[g7+&sp] # Enable GC from here on
call foo # The Native Call!!!
nop # (really fill delay slot)
ret_pc: add g7+&sp,l0 # CAS does not take an offset
cas g0,sp,[l0] # if SP still there, store 0 to disable gc
bne gc_in_progress
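The store/CAS protocol above can be modeled with one atomic word per thread: a published SP means "GC may scan me", and the CAS on return atomically takes that permission back, failing if a GC claimed the thread in the meantime. A hedged Java sketch (the field names and the CLAIMED sentinel are illustrative, not HotSpot's):

```java
import java.util.concurrent.atomic.AtomicLong;

// GC hand-off around a native call: publishing the last Java SP enables GC
// scanning of this thread; the CAS on return disables it atomically, and a
// CAS failure means a GC claimed the thread while we were in native code.
class GcHandoff {
    static final long CLAIMED = -1L;           // a GC writes this to claim the thread
    final AtomicLong lastJavaSp = new AtomicLong(0);
    boolean cleanReturn;                       // got back without needing to block?

    void callNative(long sp, Runnable nativeCode) {
        lastJavaSp.set(sp);                    // "stw sp,[g7+&sp]": GC enabled from here
        nativeCode.run();                      // the native call
        cleanReturn = lastJavaSp.compareAndSet(sp, 0); // the CAS: disable GC iff unclaimed
        if (!cleanReturn) {
            /* gc_in_progress: would block here until the GC releases us */
        }
    }
}
```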
Odds-N-Ends
ldw [g7+&gjni_sp],l3
stw g0,[l3+&top]
....
call foo
nop # delay slot
stw g0,[l3+&top]
● Need JNIEnv* argument, really offset from thread
● Reset temp handles before and after
● Profiling tags (optional)
add g7,&jnienv_offset,o0 # JNIEnv* is 1st arg, shuffle others to o1,o2,etc...
A Sample Native Call
save 64 # register window push
add g7,&jnienv_offset,o0 # JNIEnv* in o0
set 0,o1 # assume null arg
beq i0,skip # handle of null is null
stw i0,[sp+72] # handlize 'this'
add sp,72,o1 # ptr to 'this' passed in o1
skip:
std d0,[sp+64] # float args passed in int regs
ldd [sp+64],o2 # double now in o2/o3
setlo #ret_pc,l0
sethi #ret_pc,l0 # set 32-bit PC before allow GC
stw l0,[g7+&pc] # Rely on Sparc TSO here (IA64!)
stw sp,[g7+&sp] # Enable GC from here on
call foo # The Native Call
ret_pc: add g7+&sp,l0 # CAS does not take an offset
cas g0,sp,[l0] # if SP still there, store 0 to disable gc
bne gc_in_progress
set 0,i0 # assume null result
beq o0,is_null
ldw [o0],i0 # de-handlize return value
is_null:
return #
restore # pop sparc window
(slide callouts marking the code regions: arg-shuffle / JIT vs native convention; handlize args, GC-unsafe natives; stack-crawl GC lock; the call; GC unlock; de-handlize)
Agenda
● Some Choices To Make
● Native Calls – a Deep Dive
● Things that Worked Well
● Hard Things but Worth Doing
● Things I Won't Do Again
● Q&A
(if time, A Deep Dive into Code Unloading)
Safepoints, Preemption & Polling
● Safepoint notion
– Easy for Server compiler to track, optimize
– Good optimization
– Threads stop at 'convenient' spots
– Threads do many self-service tasks
● Self-stack is hot in local CPU cache
● Polling for Safepoints
– Software polling not so expensive
● Cooperative Preemption
– most stopped threads already Safepointed
Heavy weight JIT Compiler
• Heavy weight compiler
─ Needed for peak performance
─ Can be really heavyweight and still OK
─ Loop optimizations (unrolling, peeling, invariant motion)
─ Actually plenty cheap and payoff well
• C2's Graph IR
─ Very non-traditional
─ But very fast & light
─ And very easy to extend
• C2's Graph-Coloring Allocator
─ Robust in the face of over-inlining
Portable Stack Manipulation
● Portable stack crawling code
– Need SP & PC
– Need notion of 'next' frame, 'no more frames'
● Frame iterator
– Works for wide range of CPUs and OSs
● Less-portable bits:
– Flush register windows
● Must lazy flush & track flushing
– Two kinds of stack on IA64, Azul
● Frame adapters for mixing JIT, interpreter
– Custom asm bits for reorganizing call args
– Really cheap, once you figure it out
The Code Cache, Debugging
● CodeCache notion
– All code in same 4Gig space
– Only need a 32-bit PC everywhere
– All calls use 'cheap' local call
– Big savings on X86 (&Sparc, RISC) vs 'far' call
● BlahBlahBlah-ALot debugging flags
– SafepointALot, CompileALot, GCAlot, etc...
– Stress all sorts of interesting things
– Easy for QA to run big apps long time w/flags
– Catches zillions of bugs, usually quite easily
Thin Locks
● Thin-lock notion
– HotSpot's, Bacon-bits, whatever
– Key: single CAS on object word to own lock
– CAS on unlock hardly matters; it's all cache-hot now
● Actually, want a thinner-lock:
– Thread speculative 'owns' lock until contention
– No atomic ops, ever (until contention)
– Pain-in-neck to 'steal' to another thread
– BUT toooo many Java locks never ever contend
● JMM worked out nicely as well
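The fast path above is just one compare-and-swap on the object's header word. A hedged Java sketch of that fast path only; the slow paths the slide alludes to (recursion counts, contention/inflation, the biased "thinner" lock) are deliberately left out:

```java
import java.util.concurrent.atomic.AtomicLong;

// Thin-lock sketch: a single CAS installs the owning thread's ID in the
// object's header word; 0 means unlocked. A failed CAS would fall into the
// slow path (recursive lock, or inflate under contention).
class ThinLock {
    final AtomicLong header = new AtomicLong(0);

    boolean tryLock(long tid)   { return header.compareAndSet(0, tid); }
    boolean tryUnlock(long tid) { return header.compareAndSet(tid, 0); }
}
```

The unlock CAS is cheap in practice because the header word is already cache-hot from the lock, which matches the slide's observation.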
Agenda
● Some Choices To Make
● Native Calls – a Deep Dive
● Things that Worked Well
● Hard Things but Worth Doing
● Things I Won't Do Again
● Q&A
(if time, A Deep Dive into Code Unloading)
Porting to Many CPUs+OSs
● Portable
– Sparc (windows, RISC)
– X86 (CISC, tiny register set)
– Both endiannesses
● System more robust for handling all flavors
– Requires better code discipline
– Separates out idea from implementation better
● No middle compilation tier
– HS has needed a middle tier for years
Deoptimization
● No runtime cost to inline non-finals
● No runtime cost if you DO override code
– Must recompile of course
● Must flip compiled frame into interpreted frame
● Not Rocket Science
– But darned tricky to get right
– Only HS does it
● Others pay a 'no-hoisting' cost at runtime
Self-Modifying Code
● Code Patching
– Inline-caches
– “Not-entrant” code
● Patch in the face of racing Java threads, of course
– Must be legit for Java threads to see partial patches
● Almost easy on RISC
– Still must do I-cache shoot-down
● Pain on X86
– Variable-size (does it fit?)
– Instructions span cache-lines
– No atomic update
HLL Assembler
● Hand ASM in HLL
– Turns out need lots of hand ASM
● Want tight integration to runtime & VM invariants
– External ASM doesn't provide this
● Fairly easy to make ASM 'look' like ASM
– But actually valid HLL code (C, Java)
– Which, when run, emits ASM to a buffer
– And proves invariants, and provides other support
64b Object Header
● Single word on 64-bit VM
● Large memory savings plus speed in caches
● Needs dense KlassID vs 64b KlassPtr
● Thread ID for locking, speculative locking
● HashCode – want all 32bits when hashing more
than a few billion objects
Dense Thread ID
● Align stacks on 2Mb boundary
– Stack overflow/underflow: TLB 4K page protection
– Protect whole stack for various GC asserts
● Mask SP to get Thread ptr; shift for Thread ID
– Plus Thread Local Storage for VM
● TID in Object headers for locking
● Thread ptr very common in core VM code
– Must be fast
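The mask-and-shift trick above is pure arithmetic. A sketch with simulated addresses (constant names are mine; 21 is log₂(2MB)):

```java
// The 2MB-aligned-stack trick: mask the SP to recover the thread base,
// shift it to get a dense thread ID suitable for object headers.
class DenseTid {
    static final long STACK_ALIGN = 1L << 21;            // 2MB

    static long threadBase(long sp) { return sp & ~(STACK_ALIGN - 1); }
    static long threadId(long sp)   { return sp >>> 21; }
}
```

Because any SP within the stack masks to the same base, the current thread is always one AND away, with no TLS lookup on the hot path.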
Safepointing Single Threads
● Software polling of single TLS word
– 1-clock on X86, predicted branch, L1 cache hit load
● Set a bit to stop a thread at safepoint
● Thread does many self-service tasks
– Crawl self stack for GC roots, GC phase flip
– Install remote exception, or stack overflow
– Revoke biased lock
– Debugger hooks; stop/start/conditional breakpoints
– Clean inline caches
– Cooperative preemption at safepoint
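The poll itself is tiny: one load of a per-thread word and a predicted-not-taken branch. A hedged Java sketch of the shape (the field stands in for the TLS word; all the interesting self-service work lives on the rarely taken path):

```java
// Safepoint-poll sketch: each thread polls one flag; the VM sets it to stop
// the thread at its next poll. In JIT'd code this is the ~1-clock
// predicted-branch + L1-hit load the slide prices out.
class SafepointPoll {
    volatile boolean stopRequested;       // the per-thread word the VM writes
    int safepointsTaken;

    void poll() {                         // emitted at loop backedges & call sites
        if (stopRequested) {              // almost always false
            safepointsTaken++;            // self-service: crawl own stack, etc.
            stopRequested = false;
        }
    }
}
```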
Agenda
● Some Choices To Make
● Native Calls – a Deep Dive
● Things that Worked Well
● Hard Things but Worth Doing
● Things I Won't Do Again
● Q&A
(if time, A Deep Dive into Code Unloading)
Things I won't do again...
● Write a VM in C/C++
– Java plenty fast now
– Mixing OOPS in a non-GC language a total pain
– Forgetting 'this' is an OOP
● Across a GC-allowable call
– Roll-your-own malloc pointless now
● C2's BURS patterning-matching
– Harkens back to VAX days
– Never needed on RISC
– Not needed on X86 for a long time now
– Adds an extra indirection in the JIT engineering
Things I won't do again...
● Patch & roll-forward Safepoints
– Hideously complex
– Very heavy-weight to 'safepoint' a single thread
– Multiple OS suspend/resumes
– Patching non-trivial
– Required duplicate code
● And so required dup PC handling throughout VM
● Generic callee-save registers
– Real mess to crawl stacks and track
– X86 doesn't need them
– Register windows work fine, both variable and fixed
– Only PPC & ARM common+many regs+no windows
Things I won't do again...
● Adapter frames
– For intercalling JIT'd code and interpreter
– Contrast to 'frame adapters'
– These leave a frame on stack
– Extra frame screws up all kinds of stack crawls
– Occasionally end up with unbounded extra frames
● No adapter frames means: interpreter & JIT'd
code must agree on return value register
● Only an issue for SparcV8 long values
Things I won't do again...
● Constant OOPs in code
– Looks good on X86 as 32-bit immediate
– Split instructions on Sparc, PPC, other RISC
– Moving collector requires patching the constant
● Multi-instruction patch?
– Must patch with all threads stopped
– Requires Stop-The-World pause in GC
● Better answer: pay to load from table every time
– Easy to schedule around
– Concurrent GC
– Fewer instructions, especially for 64-bit VMs
– No patching, no tracking of where OOP is
Things I won't do again...
● Lock'd object header in stack
– Means no tracking recursion count
– Means total pain when inserting hashcode,
– or inflating lock due to contention,
– or moving during concurrent GC
Agenda
● Some Choices To Make
● Native Calls – a Deep Dive
● Things that Worked Well
● Hard Things but Worth Doing
● Things I Won't Do Again
● Q&A
(if time, A Deep Dive into Code Unloading)
Summary
● The need to run any code, including native
● Run it fast, well, defensively
● Handle extremes of threading, GC, code
volume
● ---- Be General Purpose ----
● Forces difficult tradeoffs
● Things that are cheap & easy ....
– when you own the whole stack
● Are hard when you don't!
Open Questions for a Big JVM
● Graph-coloring vs linear-scan allocator?
– I can argue strongly both ways
● 'Right size' for an object header?
– Azul HotSpot: 1 (64bit) word, not two
● Interpreter vs stage0 JIT?
– Tracing stage1 JIT vs Heavy-weight vs multi-tier
● OS Threads vs Roll-Your-Own (e.g. Green)?
● High-Throughput vs Low-Pause GC?
Fast, Scalable In-Memory Machine and Deep Learning
For Smarter Applications
Bits of Advice
- Deep Secrets
Cliff Click
Inline Caches
● Wonderful – for fixing a horrible problem
– Virtual calls: both very common and SLOW
● Horrible – for causing a wonderful? problem
● All major JVMs (Sun, IBM, BEA) do it
● “Makes Java as fast as C”
● Or at least, make virtual calls as cheap as static
– A key performance trick
vs Virtual Calls
● What's wrong with v-calls?
● OFF by default in C++
● ON by default in Java
● So Java sees a zillion more of 'em than C++
● And so MUST be fast or “Java is slow”
● Typical (good) implementation:
– LD / LD / JR
● Dependent loads (no OOO), so cost is:
– 3 + 3 + 25 = 31 clks
– Actually the 1st load probably misses in L1 cache ~50% of the time
A Tiny Fast Cache
● But – 95% of static v-calls are to Same Class 'this'
● Means jump to same target address
● Predict 'this' class with 1-entry cache
● On a hit, use static call – NOT jump-register
● Cost is:
– LD / (build class immediate) / CMP / TRAP / CALL
● cmp predicts nicely, OOO covers rest:
– Max(3,1 + 1) = 3
● Now 95% of v-calls nearly same cost as C calls!
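The hit/miss logic above can be sketched as data rather than patched code. A hedged Java model (in a real JIT the "cache" is the call instruction itself, patched in place; here a field stands in for it):

```java
// Monomorphic inline-cache sketch: remember the last receiver class and the
// target bound for it; a hit replaces the LD/LD/JR virtual dispatch with a
// compare plus direct call.
class InlineCache {
    Class<?> cachedKlass;                 // predicted receiver class
    Runnable cachedTarget;                // direct-call target for that class
    int hits, misses;

    void call(Object receiver, Runnable fullLookup) {
        if (receiver.getClass() == cachedKlass) {
            hits++;
            cachedTarget.run();           // fast path: cmp + static call
        } else {
            misses++;                     // prediction failed (or cache empty):
            cachedKlass = receiver.getClass();
            cachedTarget = fullLookup;    // patch to the 1st caller's target
            fullLookup.run();
        }
    }
}
```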
But ....
● On prediction failure, patch code to Do It Right
– (that LD / LD / JR sequence)
● Actually, start with cache empty & patch to fill
– Target chosen lazily – important!
– First call fails prediction
– Patches to the 1st caller's target
● Patching must be safe in presence of racing CPUs
– Always a multi-instruction patch
– CPUs can be pre-empted between any 2 instructions
– Sleep during patching, then awaken and...
– See any 'cut' of the patch sequence
Code patching
● Fairly easy to patch -
– Empty to static (no prediction needed)
– Empty to predicted (install key/value into cache)
– Predicted to full-lookup
● Impossible to patch:
– (predicted or full-lookup) to empty!
– Must guarantee no thread is mid-call!
● Without a full stop-all-threads at safepoints
– Or some other roll-forwards / roll-backwards scheme
– Very expensive
– So don't clean caches
Inline Caches
● Means inline caches hold on to busted predictions
– Until we bother to stop and clean
– Or a thread stumbles across it and fails prediction
● Means one piece of JIT'd code holds a pointer to
another – indefinitely!
– At any time a running CPU might load the address
– Keep it in his PC register
– Get context-switched out
– Wake up sometime later....
● Means I cannot trivially remove dead JIT'd code!
NMethods
● HotSpot jargon for “JIT'd code”
● Complex data structures
● Very heavy with multi-threaded issues
– Hence full of subtle bugs
● More than one active per top-level Java method
● Very complex lifetimes
– Constantly produced (as new classes loaded) so...
– Must be constantly culled as well
NMethods
● Generated code can be in active use
– CPUs are actively fetching instructions
– Might be OS hard-preempted at ANY point
– Live code in I$
● Return PCs active in frames for long times
– GC must find PCs to find NMethods to find OOP-Maps
● Code is patched
– Inline-caches are VERY common, VERY hot
– Direct calls from one NMethod to another
● Code contains OOPs
– GC is actively moving objects
NMethods - Data
● The Code
– And relocation info
● Class dependencies
– e.g. invalidate this code when a subclass of X is loaded
● Inline caches
– direct calls from one NMethod to another
● OOP-Maps at Safepoints
– OOPs in code – roots for GC, maybe changed
● Other info
– Deopt maps, exception handling tables, inlining info
NMethods - Lifetime
● Construction & Installation
● Publish
● Actively Used
● NotEntrant
● Deoptimized
● Zombie
● Flush
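The lifetime above can be written down as a tiny state machine. A hedged sketch: real NMethods can skip stages (e.g. straight from Active to Deoptimized on a class load), so this model only enforces that the lifetime moves forward, never back.

```java
// The NMethod lifetime as a forward-only state machine, states in the
// slide's order. advance() rejects any backward transition.
class NMethodLife {
    enum State { CONSTRUCTED, PUBLISHED, ACTIVE, NOT_ENTRANT, DEOPTIMIZED, ZOMBIE, FLUSHED }
    State state = State.CONSTRUCTED;

    void advance(State next) {
        if (next.ordinal() <= state.ordinal())
            throw new IllegalStateException("lifetime only moves forward: " + state + " -> " + next);
        state = next;
    }
}
```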
NMethods – Make & Installation
● During construction
– embedded OOPs must be kept alive
– NMethod can not do it
– Need “hand-off” with compiler
● Race during Install with invalidating class load
– Compiler has made assumptions about Classes
– If class gets loaded, must invalidate NMethod
● Maybe need frame adapters
– To map call signature from interpreter to JIT'd layout
NMethods – Publish
● Make a 'strong ref' from owning Method
● Other running threads can instantly find
– And start executing
– And instantly need GC info
● Must fence appropriately
NMethods – Active
● Running threads find via calls to owning Method
● Inline-caches directly reference
● PC register in CPUs directly reference
● Stack frame return PCs (hardware stacks!) ref
● GC info needed
● Deoptimization info needed
● Exception handling tables needed
NMethods – NotEntrant
● No new calls can be made
– But old calls are legit, can complete
● Happens when compiler makes poor (not
illegal) decisions
– And wants to try again after profiling
● Patch entry point with a trap
– Callers hit trap & fixup
– Generally fixup calling code to not call again and
– Find another way to execute
● All data remains alive
NMethods – Deoptimized
● No OLD calls either
– Class-loading makes existing calls illegal to
continue
● Wipe Code out with NOPs
● Existing nested calls run full speed till return
– And fall thru NOPs into a handler
● No more OOPs in code, no exception handlers
– No OOP-Maps, inline caches, dependency info
● But need deopt info (which may include OOPs)
● Inline caches, I$, CPU PCs still point here!
NMethods – Zombie
● No more pending Deoptimizations remain
– Generally found by GC at next full Safepoint
● But Inline-caches still point here
– So CPU PCs still point to the 1st instruction
● Must sweep all existing NMethods
– Could be slow, megabytes of JIT'd code
– But done concurrently, parallel
● After sweep, must flush all I$'s
– Could be slow, impacts all running CPUs
NMethods – Flush
● Remove Code from CodeCache
● Reclaim all other datastructures
● It's Dead Jim, Finally!
Introduction to CUDA
 
Java Jit. Compilation and optimization by Andrey Kovalenko
Java Jit. Compilation and optimization by Andrey KovalenkoJava Jit. Compilation and optimization by Andrey Kovalenko
Java Jit. Compilation and optimization by Andrey Kovalenko
 
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
 
Streaming 101: Hello World
Streaming 101:  Hello WorldStreaming 101:  Hello World
Streaming 101: Hello World
 
Arduino delphi 2014_7_bonn
Arduino delphi 2014_7_bonnArduino delphi 2014_7_bonn
Arduino delphi 2014_7_bonn
 
Random tips that will save your project's life
Random tips that will save your project's lifeRandom tips that will save your project's life
Random tips that will save your project's life
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
 
How to build Open Hardware self-navigating car robot
How to build Open Hardware self-navigating car robotHow to build Open Hardware self-navigating car robot
How to build Open Hardware self-navigating car robot
 
One library for all Java encryption
One library for all Java encryptionOne library for all Java encryption
One library for all Java encryption
 

Recently uploaded

MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 

Recently uploaded (20)

MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 

Fast, Scalable In-Memory Machine and Deep Learning

• 7. Some Choices to Make...
  • Portable or not
    ─ Native code vs JIT'd code
    ─ Calling conventions, endianness
    ─ Threads, POSIX, stacks
  • Footprint
    ─ Embedded (1G), desktop (32G+), server (16G-256G)
  • X86 vs RISC (ARM? GPU? DSP?)
  • JIT or Interpret (or no Interpreter!)
  • Multi-threaded
    ─ Cooperative vs preemption
  • Multi-CPU

• 8. Interpreter Choices...
  • Interpreter: simple, software only
    ─ Pure C
    ─ GCC label vars (about 2x faster than pure C)
    ─ Pure ASM (about 2x faster again)
  • Hardware support (ARM Jazelle, picoJava?)
  • Fancy: inline & schedule dispatch
    ─ Seen at least 2 different ways to slice this
  • Requires stack-oriented layout, bad for JITs
  • Or none at all...
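The "GCC label vars" option above is computed-goto (threaded) dispatch. A minimal sketch under stated assumptions — a hypothetical three-opcode stack machine using the `&&label` address-of-label extension (GCC/Clang only); each handler jumps straight to the next opcode's handler instead of returning to a central switch:

```c
#include <assert.h>

/* Hypothetical 3-opcode stack machine using GCC's labels-as-values
 * ("label vars") for threaded dispatch: each handler jumps directly
 * to the next opcode's handler, skipping a central switch. */
enum { OP_PUSH, OP_ADD, OP_HALT };

static int run(const int *code) {
  static void *dispatch[] = { &&op_push, &&op_add, &&op_halt };
  int stack[64], *sp = stack;
  #define NEXT goto *dispatch[*code++]
  NEXT;
op_push: *sp++ = *code++; NEXT;        /* push immediate operand */
op_add:  sp--; sp[-1] += sp[0]; NEXT;  /* pop two, push sum */
op_halt: return sp[-1];                /* result on top of stack */
  #undef NEXT
}
```

The win over a plain `switch` is that the indirect branch sits at the end of every handler, giving the branch predictor one predictor slot per opcode pair rather than one shared slot.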
• 9. Some JIT Choices to Make...
  • JIT
    ─ None? (gives up peak performance)
    ─ Stage 0, template style?
    ─ Stage 1, light optimization, linear-scan allocator?
    ─ Stage 2, all the optimizations, graph-coloring?
  • Portable or targeted (usually X86 + vendor)?
  • Intercalls with native?
  • Mixed-mode with Interpreter?
    ─ Also messes with the calling convention
  • Class loading vs inlining non-final methods?

• 10. More JIT Choices to Make...
  • Stage-0 JIT + no Interpreter
  • Template generation: very fast, low-quality code
  • No funny stack layouts
  • Easy calling conventions for all
  • Still slower than just interpreting run-once code
  • Lots of run-once code at startup
  • Lots of bulky code at startup
    ─ Big footprint
    ─ Can throw away & regenerate

• 11. Some GC Choices to Make...
  • GC
    ─ Simple? (single generation, Stop-The-World?)
    ─ Fast? (Throughput? Low pause?)
    ─ Exact? (allows moving objects, compaction, de-frag, bump-pointer allocation)
    ─ Conservative? (No need to track all objects)
    ─ Tried & true? (e.g. Mark/Sweep)
    ─ Fancy new algorithm?
  • Parallel? (really hard!)
  • Concurrent? (really really hard!)
  • Both? (really⁴ hard!)

• 12. More GC Choices...
  ● Stop-Anywhere vs Safepoints
  ● Stop-Anywhere
    – OOP maps at each PC? (bulky data)
    – Interpret instructions? (hard/slow on X86)
    – Fixed OOP registers & stack areas? (bad register usage)
  ● Safepoints
    – Cooperative? (polling costs?)
      ● Polling in software vs hardware?
    – Preempt at the wrong place?
    – Roll forward vs step forward vs interpret?

• 13. Threading Issues
  • Multi-threading costs
    ─ All operations no longer atomic
    ─ May be preempted inconveniently
    ─ Need locking (never needed it before!)
  • Threads can block for I/O, or deadlock
  • GC requires SP (& usually PC) for stack roots
  • Hard to find the SP+PC of a remote thread
    ─ Common problem across OSes
    ─ ! Very surprising to me, until I tried it !
    ─ Fails (w/low frequency) on many "robust" OSes!
    ─ Caught in page-fault handler, nested in TLB handler, nested in stack-overflow handler, nested in ....

• 14. Multiple CPU Issues
  • Multi-CPUs?
    ─ Now need atomic operations
    ─ Coherency, memory fences
    ─ Low-frequency data-race bugs
  • Need scalable locks in the JVM
    ─ Not just correctness locks
  • Need scalable locks for Java/JIT'd code
    ─ Spinning & retries
    ─ 'Fair' locks under super-high contention
  • GC can be in-progress when a thread awakens
    ─ Threads now need to take a form of GC lock to run

• 15. 64-bit Math Choices...
  • Long (64-bit int) math vs JIT
    ─ Major user is the BigInteger package
    ─ Called by crypto routines
    ─ Called by web services everywhere
  • BigInteger usage is as a 'pair of ints w/carry'
    ─ Lots of masking to the high or low halves
    ─ Lots of shift-by-32
  • Optimizes really well as a pair of ints
    ─ All those masks/shifts turn into simple register selection
    ─ Better code on 32-bit machines by far (X86!)
    ─ Almost tied with "good" 64b code on 64-bit machines
    ─ (add+addc vs add8)
  (figure: 64-bit add drawn as paired 32-bit add / add-with-carry over H/L register halves)
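The "pair of ints w/carry" representation can be sketched in C. A hypothetical `u64pair` type models a Java long split into 32-bit halves; the add shows exactly where the add+addc instruction pair comes from, and why shift-by-32 and mask-to-low-32 become plain register picks:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of 64-bit add on a 32-bit machine: the low halves add first,
 * and the carry out feeds the high-half add (add + addc in hardware).
 * Representing a long as an explicit (hi,lo) pair is what lets the
 * JIT turn shift-by-32 and mask-to-low-32 into register selection. */
typedef struct { uint32_t hi, lo; } u64pair;

static u64pair add64(u64pair a, u64pair b) {
  u64pair r;
  r.lo = a.lo + b.lo;
  uint32_t carry = (r.lo < a.lo);   /* unsigned wrap-around => carry */
  r.hi = a.hi + b.hi + carry;
  return r;
}
```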
• 16. Agenda
  ● Some Choices To Make
  ● Native Calls – a Deep Dive
  ● Things that Worked Well
  ● Hard Things but Worth Doing
  ● Things I Won't Do Again
  ● Q&A (if time, A Deep Dive into Code Unloading)

• 17. Native Calls: A Deep Dive
      save 64         # register window push
      mov i0,o0       # Move incoming arg
                      # o0 holds live OOP at call
      call foo        # The Native Call
      nop             # (really fill delay slot)
      mov o0,i0       # Move outgoing arg
      return          # restore; pop sparc window
  ● Ideally: a simple call
    – Only true for a fixed application set
    – Often holds true for cell phones, small embedded
  ● Reality: too complex, wants its own frame
  ● Example: native Object foo( double d );

• 18. The Argument Shuffle
  ● First problem: JIT and native calling conventions differ
    – E.g.: SparcV8 passes floats in int regs
    – Java normally passes floats in float regs
    – No Java varargs or prototype-less code
  ● Want a fast argument shuffle
    – Auto-generate asm shuffle code from the signature
      std d0,[sp+64]  # float args passed in int regs
                      # 'this' pointer is in o0
      ldw [sp+64],o1  # misaligned, need 2 loads
      ldw [sp+68],o2  # double now in o1/o2

• 19. Handlizing OOP Arguments
  ● Cannot pass OOPs into native code
    – Lest GC be unable to find them
    – (only for a moving collector)
    – Handlize OOPs – part of the arg-shuffle code
    – Need an OOP-map as well: [sp+72] live across the call
      set 0,o0        # assume null arg
      beq i0,skip     # handle of null is null
      stw i0,[sp+72]  # handlize 'this'
      add sp,72,o0    # ptr to 'this' passed in o0
    skip:
  ● Reverse for returning an OOP after the call
      beq o0,is_null
      ldw [o0],o0     # de-handlize return value
    is_null:
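The handle indirection above can be sketched in C (names like `Handle` and `native_read` are hypothetical, not HotSpot's actual types): native code receives a pointer to a root-visible stack slot rather than the OOP itself, so a moving GC that knows the slot via the OOP-map can rewrite it mid-call, and the native code's extra load still finds the object:

```c
#include <assert.h>
#include <stddef.h>

/* Why handlizing works (sketch): native code gets a pointer to a
 * stack slot holding the OOP, not the OOP itself.  A moving GC that
 * knows the slot (via the OOP-map) can update it while the native
 * call is blocked, and the double indirection still reaches the
 * (relocated) object. */
typedef struct { int field; } Obj;
typedef Obj **Handle;                 /* pointer to a root-visible slot */

static int native_read(Handle h) {
  if (h == NULL) return -1;           /* handle of null is null */
  return (*h)->field;                 /* de-handlize: one extra load */
}
```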
• 20. Synchronized Native
  ● May need locking, i.e. the synchronized keyword
      ldw [i0+0],l1   # Load 'this' header word
      or  l1,2,l2     # set locked-bit
      stw l2,[sp+76]  # save header on stack
      cas [i0],l1,l2  # Attempt to fast-lock
      cmp l1,l2       # success or fail
      bne slow_lock   # CAS failed? Recursive lock?
      # not shown: inline recursive-lock handling
      # Both [sp+72] and i0 hold a live OOP across the call
      # [sp+76] holds the 'displaced header' – needed for inflation
      call foo        # The Native Call
      nop             # (really: fill in delay slot)
      cas [i0],l2,l1  # Attempt to fast-unlock
      cmp l1,l2       #
      bne slow_unlock # (awake waiting threads)

• 21. Allowing Stack Crawls
  ● Next problem: native code may block
    – We must allow a GC (if multi-threaded)
    – But GC needs SP+PC to crawl the stack
    – Which a portable OS + native compiler won't tell us
  ● So store SP/PC before calling native
    – Storing SP is also the trigger that allows GC
      setlo #ret_pc,l0
      sethi #ret_pc,l0  # set 32-bit PC before allowing GC
      stw l0,[g7+&pc]   # Rely on Sparc TSO here (IA64!)
      stw sp,[g7+&sp]   # Enable GC from here on
      call foo          # The Native Call!!!
      nop               # (really fill delay slot)

• 22. Returning while GC-in-Progress
  ● Next: native code returns while GC is active
    – Must block until GC completes
    – No good to spin (eats CPU)
  ● Similar to 'lock acquire'
  ● Requires a real atomic operation
      setlo #ret_pc,l0
      sethi #ret_pc,l0  # set 32-bit PC before allowing GC
      stw l0,[g7+&pc]   # Rely on Sparc TSO here (IA64!)
      stw sp,[g7+&sp]   # Enable GC from here on
      call foo          # The Native Call!!!
      nop               # (really fill delay slot)
    ret_pc:
      add g7+&sp,l0     # CAS does not take an offset
      cas g0,sp,[l0]    # if SP still there, store 0 to disable GC
      bne gc_in_progress

• 23. Odds-N-Ends
  ● Need the JNIEnv* argument – really an offset from the thread
      # JNIEnv* is 1st arg, shuffle others to o1,o2,etc...
      add g7,&jnienv_offset,o0
  ● Reset temp handles before and after
      ldw [g7+&gjni_sp],l3
      stw g0,[l3+&top]
      ....
      call foo
      nop               # delay slot
      stw g0,[l3+&top]
  ● Profiling tags (optional)

• 24. A Sample Native Call
      save 64             # register window push
      # -- Arg-shuffle (JIT vs native convention); handles for GC-unsafe natives --
      add g7,&jnienv_offset,o0   # JNIEnv* in o0
      set 0,o1            # assume null arg
      beq i0,skip         # handle of null is null
      stw i0,[sp+72]      # handlize 'this'
      add sp,72,o1        # ptr to 'this' passed in o1
    skip:
      std d0,[sp+64]      # float args passed in int regs
      ldd [sp+64],o2      # double now in o2/o3
      # -- Stack-crawl GC lock --
      setlo #ret_pc,l0
      sethi #ret_pc,l0    # set 32-bit PC before allowing GC
      stw l0,[g7+&pc]     # Rely on Sparc TSO here (IA64!)
      stw sp,[g7+&sp]     # Enable GC from here on
      call foo            # The Native Call
    ret_pc:
      # -- GC unlock --
      add g7+&sp,l0       # CAS does not take an offset
      cas 0,sp,[l0]       # if SP still there, store 0 to disable GC
      bne gc_in_progress
      # -- De-handlize result --
      set 0,i0            # assume null result
      beq o0,is_null
      ldw [o0],i0         # de-handlize return value
    is_null:
      return              # restore; pop sparc window
• 25. Agenda
  ● Some Choices To Make
  ● Native Calls – a Deep Dive
  ● Things that Worked Well
  ● Hard Things but Worth Doing
  ● Things I Won't Do Again
  ● Q&A (if time, A Deep Dive into Code Unloading)

• 26. Safepoints, Preemption & Polling
  ● Safepoint notion
    – Easy for the Server compiler to track, optimize
    – Good optimization
    – Threads stop at 'convenient' spots
    – Threads do many self-service tasks
      ● Self-stack is hot in the local CPU cache
  ● Polling for Safepoints
    – Software polling not so expensive
  ● Cooperative Preemption
    – Most stopped threads have already Safepointed

• 27. Heavyweight JIT Compiler
  • Heavyweight compiler
    ─ Needed for peak performance
    ─ Can be really heavyweight and still OK
    ─ Loop optimizations (unrolling, peeling, invariant motion)
    ─ Actually plenty cheap, and they pay off well
  • C2's Graph IR
    ─ Very non-traditional
    ─ But very fast & light
    ─ And very easy to extend
  • C2's graph-coloring allocator
    ─ Robust in the face of over-inlining

• 28. Portable Stack Manipulation
  ● Portable stack-crawling code
    – Need SP & PC
    – Need a notion of 'next' frame, 'no more frames'
  ● Frame iterator
    – Works for a wide range of CPUs and OSes
  ● Less-portable bits:
    – Flush register windows
      ● Must lazy-flush & track flushing
    – Two kinds of stack on IA64, Azul
  ● Frame adapters for mixing JIT, interpreter
    – Custom asm bits for reorganizing call args
    – Really cheap, once you figure it out

• 29. The Code Cache, Debugging
  ● CodeCache notion
    – All code in the same 4-gig space
    – Only need a 32-bit PC everywhere
    – All calls use a 'cheap' local call
    – Big savings on X86 (& Sparc, RISC) vs a 'far' call
  ● BlahBlahBlah-ALot debugging flags
    – SafepointALot, CompileALot, GCAlot, etc...
    – Stress all sorts of interesting things
    – Easy for QA to run big apps a long time w/flags
    – Catches zillions of bugs, usually quite easily

• 30. Thin Locks
  ● Thin-lock notion
    – HotSpot's, Bacon-bits, whatever
    – Key: a single CAS on an object word to own the lock
    – CAS on unlock hardly matters; it's all cache-hot now
  ● Actually, want a thinner lock:
    – Thread speculatively 'owns' the lock until contention
    – No atomic ops, ever (until contention)
    – Pain-in-the-neck to 'steal' to another thread
    – BUT toooo many Java locks never ever contend
  ● JMM worked out nicely as well
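A minimal sketch of the thin-lock fast path, assuming a hypothetical one-word header where 0 means unlocked and a nonzero thread ID means locked. Recursion counts, lock inflation, and contention handling are exactly the hard parts the slide alludes to, and are omitted:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Thin-lock sketch: one CAS on the object's header word flips it
 * from unlocked (0) to "locked by thread tid".  On any CAS failure
 * a real VM falls to a slow path (recursion, inflation, blocking). */
typedef struct { _Atomic uintptr_t header; } ObjHeader;

static bool thin_lock(ObjHeader *o, uintptr_t tid) {
  uintptr_t expect = 0;               /* expect unlocked */
  return atomic_compare_exchange_strong(&o->header, &expect, tid);
}

static bool thin_unlock(ObjHeader *o, uintptr_t tid) {
  uintptr_t expect = tid;             /* only the owner may unlock */
  return atomic_compare_exchange_strong(&o->header, &expect, 0);
}
```

The uncontended case is one CAS on lock and one on unlock; the "thinner lock" the slide wants would skip even those until another thread shows up.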
• 31. Agenda
  ● Some Choices To Make
  ● Native Calls – a Deep Dive
  ● Things that Worked Well
  ● Hard Things but Worth Doing
  ● Things I Won't Do Again
  ● Q&A (if time, A Deep Dive into Code Unloading)

• 32. Porting to Many CPUs+OSes
  ● Portable
    – Sparc (register windows, RISC)
    – X86 (CISC, tiny register set)
    – Both endiannesses
  ● System more robust for handling all the flavors
    – Requires better code discipline
    – Separates the idea from the implementation better
  ● No middle compilation tier
    – HS has needed a middle tier for years

• 33. Deoptimization
  ● No runtime cost to inline non-finals
  ● No runtime cost if you DO override code
    – Must recompile, of course
  ● Must flip a compiled frame into an interpreted frame
  ● Not rocket science
    – But darned tricky to get right
    – Only HS does it
  ● Others pay a 'no-hoisting' cost at runtime

• 34. Self-Modifying Code
  ● Code patching
    – Inline caches
    – "Not-entrant" code
  ● Patch in the face of racing Java threads, of course
    – Must be legit for Java threads to see partial patches
  ● Almost easy on RISC
    – Still must do I-cache shoot-down
  ● Pain on X86
    – Variable-size (does it fit?)
    – Instructions span cache lines
    – No atomic update

• 35. HLL Assembler
  ● Hand ASM in an HLL
    – Turns out you need lots of hand ASM
  ● Want tight integration with runtime & VM invariants
    – External ASM doesn't provide this
  ● Fairly easy to make ASM 'look' like ASM
    – But actually valid HLL code (C, Java)
    – Which, when run, emits ASM to a buffer
    – And proves invariants, and provides other support

• 36. 64b Object Header
  ● Single word on a 64-bit VM
  ● Large memory savings plus speed in caches
  ● Needs a dense KlassID vs a 64b KlassPtr
  ● Thread ID for locking, speculative locking
  ● HashCode – want all 32 bits when hashing more than a few billion objects

• 37. Dense Thread ID
  ● Align stacks on a 2MB boundary
    – Stack overflow/underflow: TLB 4K page protection
    – Protect the whole stack for various GC asserts
  ● Mask SP to get the Thread ptr; shift for the Thread ID
    – Plus Thread Local Storage for the VM
  ● TID in object headers for locking
  ● Thread ptr very common in core VM code
    – Must be fast
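The mask-and-shift trick can be sketched in C, assuming hypothetical 2MB-aligned stacks; the addresses in the usage below are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Stack-alignment sketch: if every thread stack is aligned to a 2MB
 * boundary, masking any in-stack address yields the stack base, where
 * the Thread struct lives; shifting the base yields a small, dense
 * thread ID that fits in an object header word. */
#define STACK_SIZE (2u << 20)                 /* 2MB, power of two */

static uintptr_t thread_base(uintptr_t sp) {
  return sp & ~(uintptr_t)(STACK_SIZE - 1);   /* mask SP -> Thread* */
}
static uintptr_t thread_id(uintptr_t sp) {
  return thread_base(sp) >> 21;               /* shift -> dense TID */
}
```

Because the mask works on *any* address within the stack, no register needs to be reserved for the thread pointer: a couple of ALU ops recover it from SP.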
• 38. Safepointing Single Threads
  ● Software polling of a single TLS word
    – 1 clock on X86: predicted branch, L1-cache-hit load
  ● Set a bit to stop a thread at a safepoint
  ● Thread does many self-service tasks
    – Crawl own stack for GC roots, GC phase flip
    – Install remote exception, or stack overflow
    – Revoke biased lock
    – Debugger hooks; stop/start/conditional breakpoints
    – Clean inline caches
    – Cooperative preemption at safepoint
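A sketch of the polling idea in C, with hypothetical names; real JIT'd code emits this as a single thread-local load plus a predicted branch at each poll site, not a function call:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Safepoint polling sketch: the running thread checks one per-thread
 * word at each poll site and branches to self-service work only when
 * the VM has set the stop bit.  The common case is load + not-taken
 * branch, which is why software polling is "not so expensive". */
typedef struct { _Atomic int please_stop; int stops_taken; } Thread;

static void do_safepoint_work(Thread *t) {
  t->stops_taken++;   /* stand-in for: crawl own stack, flip GC phase... */
}

static bool poll(Thread *t) {
  if (atomic_load_explicit(&t->please_stop, memory_order_acquire)) {
    do_safepoint_work(t);
    atomic_store(&t->please_stop, 0);   /* safepoint satisfied */
    return true;
  }
  return false;                         /* fast path: keep running */
}
```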
• 39. Agenda
  ● Some Choices To Make
  ● Native Calls – a Deep Dive
  ● Things that Worked Well
  ● Hard Things but Worth Doing
  ● Things I Won't Do Again
  ● Q&A (if time, A Deep Dive into Code Unloading)

• 40. Things I Won't Do Again...
  ● Write a VM in C/C++
    – Java is plenty fast now
    – Mixing OOPs into a non-GC'd language is a total pain
    – Forgetting 'this' is an OOP
      ● Across a GC-allowable call
    – Roll-your-own malloc is pointless now
  ● C2's BURS pattern-matching
    – Harkens back to VAX days
    – Never needed on RISC
    – Not needed on X86 for a long time now
    – Adds an extra indirection in the JIT engineering

• 41. Things I Won't Do Again...
  ● Patch & roll-forward Safepoints
    – Hideously complex
    – Very heavyweight to 'safepoint' a single thread
    – Multiple OS suspend/resumes
    – Patching non-trivial
    – Required duplicate code
      ● And so required duplicate PC handling throughout the VM
  ● Generic callee-save registers
    – Real mess to crawl stacks and track
    – X86 doesn't need them
    – Register windows work fine, both variable and fixed
    – Only PPC & ARM are common + many regs + no windows

• 42. Things I Won't Do Again...
  ● Adapter frames
    – For intercalling JIT'd code and the interpreter
    – Contrast with 'frame adapters'
    – These leave a frame on the stack
    – The extra frame screws up all kinds of stack crawls
    – Occasionally end up with unbounded extra frames
  ● No adapter frames means: interpreter & JIT'd code must agree on the return-value register
  ● Only an issue for SparcV8 long values

• 43. Things I Won't Do Again...
  ● Constant OOPs in code
    – Looks good on X86 as a 32-bit immediate
    – Split instructions on Sparc, PPC, other RISCs
    – A moving collector requires patching the constant
      ● Multi-instruction patch?
    – Must patch with all threads stopped
    – Requires a Stop-The-World pause in GC
  ● Better answer: pay to load from a table every time
    – Easy to schedule around
    – Concurrent GC
    – Fewer instructions, especially for 64-bit VMs
    – No patching, no tracking of where the OOP is

• 44. Things I Won't Do Again...
  ● Locked object header in the stack
    – Means no tracking of the recursion count
    – Means total pain when inserting a hashcode,
    – or inflating a lock due to contention,
    – or moving during concurrent GC
• 45. Agenda
  ● Some Choices To Make
  ● Native Calls – a Deep Dive
  ● Things that Worked Well
  ● Hard Things but Worth Doing
  ● Things I Won't Do Again
  ● Q&A (if time, A Deep Dive into Code Unloading)

• 46. Summary
  ● The need to run any code, including native
  ● Run it fast, well, defensively
  ● Handle extremes of threading, GC, code volume
  ● ---- Be General Purpose ----
  ● Forces difficult tradeoffs
  ● Things that are cheap & easy ....
    – when you own the whole stack
  ● Are hard when you don't!

• 47. Open Questions for a Big JVM
  ● Graph-coloring vs linear-scan allocator?
    – I can argue strongly both ways
  ● 'Right size' for an object header?
    – Azul HotSpot: 1 (64-bit) word, not two
  ● Interpreter vs stage-0 JIT?
    – Tracing stage-1 JIT vs heavyweight vs multi-tier
  ● OS threads vs roll-your-own (e.g. Green)?
  ● High-Throughput vs Low-Pause GC?

• 48. Bits of Advice – Deep Secrets
  Fast, Scalable In-Memory Machine and Deep Learning For Smarter Applications
  Cliff Click
• 49. H2O.ai Machine Intelligence Inline Caches ● Wonderful – for fixing a horrible problem – Virtual calls: both very common and SLOW ● Horrible – for causing a wonderful? problem ● All major JVMs (Sun, IBM, BEA) do it ● “Makes Java as fast as C” ● Or at least, makes virtual calls as cheap as static calls – A key performance trick
• 50. H2O.ai Machine Intelligence vs Virtual Calls ● What's wrong with v-calls? ● OFF by default in C++ ● ON by default in Java ● So Java sees a zillion more of 'em than C++ ● And so they MUST be fast or “Java is slow” ● Typical (good) implementation: – LD / LD / JR ● Dependent loads (no OOO overlap possible), so cost is: – 3 + 3 + 25 = 31 clks – Actually the 1st load probably misses in the L1 cache ~50% of the time
• 51. H2O.ai Machine Intelligence A Tiny Fast Cache ● But – 95% of v-call sites see the Same Class 'this' every time ● Means they jump to the same target address ● Predict the 'this' class with a 1-entry cache ● On a hit, use a static call – NOT a jump-register ● Cost is: – LD / (build class immediate) / CMP / TRAP / CALL ● The cmp predicts nicely, OOO covers the rest: – Max(3, 1+1) = 3 clks ● Now 95% of v-calls cost nearly the same as C calls!
• 52. H2O.ai Machine Intelligence But .... ● On prediction failure, patch the code to Do It Right – (that LD / LD / JR sequence) ● Actually, start with the cache empty & patch to fill – Target chosen lazily – important! – First call fails prediction – Patches to the 1st caller's target ● Patching must be safe in the presence of racing CPUs – Always a multi-instruction patch – CPUs can be pre-empted between any 2 instructions – Sleep during patching, then awaken and... – See any 'cut' of the patch sequence
• 53. H2O.ai Machine Intelligence Code patching ● Fairly easy to patch: – Empty to static (no prediction needed) – Empty to predicted (install key/value into cache) – Predicted to full-lookup ● Impossible to patch: – (predicted or full-lookup) back to empty! – Must guarantee no thread is mid-call! ● Without a full stop-all-threads safepoint – Or some other roll-forwards / roll-backwards scheme – Very expensive – So don't clean the caches
• 54. H2O.ai Machine Intelligence Inline Caches ● Means inline caches hold on to busted predictions – Until we bother to stop and clean them – Or a thread stumbles across one and fails the prediction ● Means one piece of JIT'd code holds a pointer to another – indefinitely! – At any time a running CPU might load the address – Keep it in its PC register – Get context-switched out – Wake up sometime later.... ● Means I cannot trivially remove dead JIT'd code!
  • 55. H2O.ai Machine Intelligence NMethods ● HotSpot jargon for “JIT'd code” ● Complex data structures ● Very heavy with multi-threaded issues – Hence full of subtle bugs ● More than one active per top-level Java method ● Very complex lifetimes – Constantly produced (as new classes loaded) so... – Must be constantly culled as well
  • 56. H2O.ai Machine Intelligence NMethods ● Generated code can be in active use – CPUs are actively fetching instructions – Might be OS hard-preempted at ANY point – Live code in I$ ● Return PCs active in frames for long times – GC must find PCs to find NMethods to find OOP-Maps ● Code is patched – Inline-caches are VERY common, VERY hot – Direct calls from one NMethod to another ● Code contains OOPs – GC is actively moving objects
  • 57. H2O.ai Machine Intelligence NMethods - Data ● The Code – And relocation info ● Class dependencies – e.g. invalidate this code when a subclass of X is loaded ● Inline caches – direct calls from one NMethod to another ● OOP-Maps at Safepoints – OOPs in code – roots for GC, maybe changed ● Other info – Deopt maps, exception handling tables, inlining info
  • 58. H2O.ai Machine Intelligence NMethods - Lifetime ● Construction & Installation ● Publish ● Actively Used ● NotEntrant ● Deoptimized ● Zombie ● Flush
• 59. H2O.ai Machine Intelligence NMethods – Make & Installation ● During construction – embedded OOPs must be kept alive – The NMethod cannot do it yet – Need a “hand-off” with the compiler ● Race during install with an invalidating class load – The compiler has made assumptions about Classes – If such a class gets loaded, must invalidate the NMethod ● Maybe need frame adapters – To map the call signature from the interpreter's layout to the JIT'd layout
  • 60. H2O.ai Machine Intelligence NMethods – Publish ● Make a 'strong ref' from owning Method ● Other running threads can instantly find – And start executing – And instantly need GC info ● Must fence appropriately
  • 61. H2O.ai Machine Intelligence NMethods – Active ● Running threads find via calls to owning Method ● Inline-caches directly reference ● PC register in CPUs directly reference ● Stack frame return PCs (hardware stacks!) ref ● GC info needed ● Deoptimization info needed ● Exception handling tables needed
  • 62. H2O.ai Machine Intelligence NMethods – NotEntrant ● No new calls can be made – But old calls are legit, can complete ● Happens when compiler makes poor (not illegal) decisions – And wants to try again after profiling ● Patch entry point with a trap – Callers hit trap & fixup – Generally fixup calling code to not call again and – Find another way to execute ● All data remains alive
  • 63. H2O.ai Machine Intelligence NMethods – Deoptimized ● No OLD calls either – Class-loading makes existing calls illegal to continue ● Wipe Code out with NOPs ● Existing nested calls run full speed till return – And fall thru NOPs into a handler ● No more OOPs in code, no exception handlers – No OOP-Maps, inline caches, dependency info ● But need deopt info (which may include OOPs) ● Inline caches, I$, CPU PCs still point here!
  • 64. H2O.ai Machine Intelligence NMethods – Zombie ● No more pending Deoptimizations remain – Generally found by GC at next full Safepoint ● But Inline-caches still point here – So CPU PCs still point to 1st instruction ● Must sweep all existing NMethods – Could be slow, megabytes of JIT'd code – But done concurrently, parallel ● After sweep, must flush all I$'s – Could be slow, impacts all running CPUs
  • 65. H2O.ai Machine Intelligence NMethods – Flush ● Remove Code from CodeCache ● Reclaim all other datastructures ● It's Dead Jim, Finally!