VMIL keynote: Lessons from a production JVM runtime developer

Lessons about production language runtime design and information about the Eclipse OMR project, used to build all kinds of language runtimes.

Published in: Technology

  1. The good, the good enough, and the things we wish we had done better: Lessons from a production JVM runtime developer
     Mark Stoodley, “Production JVM Runtime Developer” at IBM, project co-lead for Eclipse OMR
  2. Lessons from a Production JVM Runtime Developer: The good, the good enough, and the things we wish we had done better
     Mark Stoodley, “Production JVM (J9) Runtime Developer” at IBM, project co-lead for Eclipse OMR
  3. Important disclaimers
     • THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
     • WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
     • ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES.
     • ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.
     • IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.
     • IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
     • NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
       – CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS
  4. Some Production Runtime Characteristics
     • Complete implementation
     • Robust quality even under high stress
     • Scalable performance
     • Reliable and stable service
     • Wide variety of deployed workloads
  5. Sounds pretty constrained
     It is, but it’s still possible to innovate
  6. J9: Production JVM for ~18 years, still vibrant
     • Java (SE) releases: Java 1.4.2, 5.0, 6, 6.1, 7, 7.1, 8, Java.next
     • Some technology highlights:
       – Cooperative suspend (1999)
       – Diagnostic abilities: e.g. limit files, per method options (1999)
       – Full optimization while supporting type accurate GC (1999)
       – AOT (rom-able) compilation for Java (1999)
       – Adaptive compilation (cold, warm, hot, very hot, scorching) (1999)
       – Aggressive runtime native code patching (2000)
       – Invocation and time-based compilation triggers (2000)
       – JIT profiling infrastructure and optimizations (2001)
       – Speculative class hierarchy based inlining and optimization (2001)
       – Fairly complete set of classical compiler optimizations and dataflow analyses (2001)
       – Java-specific optimizations like “check” removal (2001)
       – Java debug support (2001)
       – Escape analysis and stack allocation (2001)
       – Automatic lock coarsening (2002)
       – Multiple code caches (2005)
       – Real-time Specification for Java (AOT and JIT) (2005)
       – Shared classes (2005)
       – Asynchronous compilation (2006)
       – Interpreter profiling (2006)
       – Dynamic AOT compilation for Java (2006)
       – Hot Code Replacement support (2007)
       – Compressed references (2007)
       – Multiple compilation threads (2010)
       – On stack replacement (2013)
       – Transactional Memory (2013)
       – Packed objects (2013)
       – Multitenancy (2013)
       – Auto SIMD (2014)
       – Auto GPU (2014)
       – Heuristic tuning and retuning (1999–ongoing)
     • Performance metrics that have been or are actively tracked:
       – Latency (elapsed time)
       – Throughput (operations / sec)
       – Start-up time
       – Ramp-up time
       – CPU consumption
       – Resource consumption at idle
       – Compilation time
       – Memory footprint
       – JIT library size
       – Incremental pauses
     • Diagnostic facilities:
       – Direct Dump Reader, Snap files, verbose trace files, -Xtrace, -Xdump, …
       – JIT logs, JIT limit files, per-method JIT option sets
       – Core analyzer tool
       – Health Center, GC Memory Visualizer, Memory Analysis tool, …
     • Hardware platforms that are or have been supported:
       – ME: ARM32, X86 (IA32), MIPS, POWER, SH4
       – 32-bit SE: ARM, POWER, X86, Z
       – 64-bit SE: POWER, X86, Z
       – Hard real-time (RTSJ compliant): IA32
     • Hardware exploitation highlights:
       – Efficient CPU instruction sequences
       – Managing different kinds of hardware registers
       – Exploiting hardware data type support
       – Cryptographic, compression acceleration
       – Character conversion loop recognition and acceleration
       – Atomic locking and other synchronization optimization
       – Simultaneous Multi Threading
       – Transactional Memory
       – SIMD (Single instruction multiple data)
       – GPU (Graphics processing unit)
  7. Production Runtime: J9 JVM
     [Architecture diagram; labels recovered from the figure:]
     • Java platform configurations (uses one of many): SE 8, SE 7, SE 6, SE 5, CDC, MIDP, CLDC
     • Java VM classes; JCL natives; JNI
     • Java calls: JNI, INL, Fastcall
     • Virtual machine: garbage collector, interpreter, exception handler, class loader, thread model, VM interface, Zip, fdlibm
     • Pluggable components that dynamically load into the virtual machine
     • TR JIT; JVM profiler; debugger
     • Port library (file IO, sockets, memory allocation)
     • Native applications; calls to C libraries; OS-specific calls
     • Operating system
  8. Two open source projects from J9 JVM!
     [Diagram of runtime stacks: OpenJDK with HotSpot; OpenJDK with Open J9 on Eclipse OMR; IBM SDK for Java as OpenJDK, Open J9, and OMR plus IBM-isms. OMR components shown: GC, JIT, Diag, Port]
     • Eclipse OMR: proven adaptable technology in the open for rapid innovation and collaboration across multiple language communities
     • Open J9: Java community open innovation and collaboration, deep platform exploitation for X86 & IBM hardware platforms (OpenPOWER, LinuxONE)
     • IBM SDK for Java: long term support, quick response for problems, and other forms of IBM customer specific engagement
     • Communities beyond Java (Eclipse OMR): Ruby?, Python?, SOM?, COBOL, PL/I, Emulator, Invent Your Own Language!
  9. Eclipse OMR Mission
     Build an open reusable language runtime foundation for the cloud
     • To accelerate cloud platform advancement and innovation
     • In full cooperation with existing language communities
     • Via a diverse community of people interested in language runtimes
       – Professional developers
       – Researchers
       – Students
       – Hobbyists
  10. Eclipse OMR technology components
      • port: platform abstraction (porting) library
      • thread: cross platform pthread-like threading library
      • vm: APIs to manage per-interpreter and per-thread contexts
      • gc: garbage collection framework for managed heaps
      • compiler: extensible compiler framework
      • jitbuilder: WIP project to simplify bring up for a new JIT compiler
      • omrtrace: library for publishing trace events for monitoring/diagnostics
      • omrsigcompat: signal handling compatibility library
      • example: demonstration code to show how a language runtime might consume OMR components, also used for testing
      • fvtest: language independent test framework built on the example glue so that components can be tested outside of a language runtime, uses Google Test 1.7 framework
      • + a few others
      ~800KLOC at this point, more components coming!
  11. Lessons (finally!)
  12. Lesson #1: Build a platform port and thread libraries
      (keep the “ifdef soup” in one place)
      Easiest if you start with more than one platform
  13. OMR platform porting library: omr/port
      • Cross platform (Linux, OSX, Windows, AIX, zOS, etc.) “thin” wrap for:
        – Time, Process, Memory: allocate, free, reserve, pages, numa
        – CPU, Environment, Files and Permissions, Shared libraries
        – Terminal (tty), Strings, Locales, Signals
        – Etc.
      • Port library is actually a struct containing many function pointers, e.g.
          uintptr_t (*time_hires_clock)(struct OMRPortLibrary *portLibrary);
          uintptr_t (*sysinfo_get_pid)(struct OMRPortLibrary *portLibrary);
          intptr_t (*sysinfo_get_CPU_utilization)(struct OMRPortLibrary *portLibrary, struct J9SysinfoCPUTime *cpuTime);
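The struct-of-function-pointers idea on slide 13 can be sketched in a few lines of C. This is a minimal illustration only: `DemoPortLibrary`, `demo_port_init`, and the two `demo_*` functions are hypothetical names made up for this example and are not the OMR API. The point it tries to show is that the platform `#if` lives inside the port layer, so callers only ever go through the struct.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All platform-specific includes are confined to the port layer. */
#if defined(_WIN32)
#include <windows.h>
#else
#include <time.h>
#include <unistd.h>
#endif

/* Hypothetical miniature port library; the names are illustrative, not OMR's. */
struct DemoPortLibrary {
    uint64_t (*time_hires_clock)(struct DemoPortLibrary *portLibrary);
    uintptr_t (*sysinfo_get_pid)(struct DemoPortLibrary *portLibrary);
};

/* Monotonic tick count; the unit differs per platform in this sketch. */
static uint64_t demo_time_hires_clock(struct DemoPortLibrary *portLibrary) {
    (void)portLibrary;
#if defined(_WIN32)
    LARGE_INTEGER ticks;
    QueryPerformanceCounter(&ticks);
    return (uint64_t)ticks.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
#endif
}

/* Identifier of the current process. */
static uintptr_t demo_sysinfo_get_pid(struct DemoPortLibrary *portLibrary) {
    (void)portLibrary;
#if defined(_WIN32)
    return (uintptr_t)GetCurrentProcessId();
#else
    return (uintptr_t)getpid();
#endif
}

/* Fill in the function table once; callers never see an #if again. */
static void demo_port_init(struct DemoPortLibrary *portLibrary) {
    assert(portLibrary != NULL);
    portLibrary->time_hires_clock = demo_time_hires_clock;
    portLibrary->sysinfo_get_pid = demo_sysinfo_get_pid;
}
```

A caller would initialise one `struct DemoPortLibrary` with `demo_port_init` and then write `port.time_hires_clock(&port)`. Passing the library pointer to every function, as the slide's signatures do, leaves room for per-instance state or for swapping an implementation in tests.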
  14. OMR thread library: omr/thread
      • Cross platform (Linux, OSX, Windows, AIX, zOS, etc.) library for threads and synchronization
      • E.g. semaphores, mutexes, threads, policies, priorities, interrupts, thread local storage, CPU consumption, affinity, sleeping, waiting, fork where supported, etc.
      • Very similar to pthreads, e.g.
          intptr_t omrthread_create(omrthread_t *handle, uintptr_t stacksize, uintptr_t priority, uintptr_t suspend, omrthread_entrypoint_t entrypoint, void *entryarg);
      • Also has 3-tier spin lock from Java: fast for short hold times ☺
        – Inner: atomic cmpxchg, middle: spin without yielding, outer: yield, fail: block
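The tiers named on slide 14 (inner: atomic compare-and-exchange, middle: spin without yielding, outer: yield, fail: block) can be sketched with C11 atomics and pthreads. This is an assumption-laden illustration, not the omr/thread implementation: the `demo_lock_*` names, the spin and yield counts, and the use of a mutex plus condition variable for the blocking path are all choices made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tuning constants; real values would be tuned per platform. */
enum { DEMO_SPIN_COUNT = 64, DEMO_YIELD_COUNT = 16 };

typedef struct {
    atomic_int held;          /* 0 = free, 1 = owned */
    atomic_int waiters;       /* threads parked in the blocking path */
    pthread_mutex_t mutex;    /* protects the condition variable */
    pthread_cond_t released;  /* signalled when the lock is released */
} demo_lock_t;

static void demo_lock_init(demo_lock_t *lock) {
    atomic_init(&lock->held, 0);
    atomic_init(&lock->waiters, 0);
    pthread_mutex_init(&lock->mutex, NULL);
    pthread_cond_init(&lock->released, NULL);
}

/* Inner tier: a single atomic compare-and-exchange. */
static bool demo_lock_try(demo_lock_t *lock) {
    int expected = 0;
    return atomic_compare_exchange_strong(&lock->held, &expected, 1);
}

static void demo_lock_acquire(demo_lock_t *lock) {
    for (int y = 0; y < DEMO_YIELD_COUNT; y++) {
        /* Middle tier: retry without giving up the CPU. */
        for (int s = 0; s < DEMO_SPIN_COUNT; s++) {
            if (demo_lock_try(lock)) {
                return;
            }
        }
        /* Outer tier: let another thread run, then spin again. */
        sched_yield();
    }
    /* Fail tier: park until a release signals the condition variable. */
    pthread_mutex_lock(&lock->mutex);
    atomic_fetch_add(&lock->waiters, 1);
    while (!demo_lock_try(lock)) {
        pthread_cond_wait(&lock->released, &lock->mutex);
    }
    atomic_fetch_sub(&lock->waiters, 1);
    pthread_mutex_unlock(&lock->mutex);
}

static void demo_lock_release(demo_lock_t *lock) {
    assert(atomic_load(&lock->held) == 1);  /* releasing a free lock is a bug */
    atomic_store(&lock->held, 0);
    /* Only pay for the mutex when some thread is actually parked. */
    if (atomic_load(&lock->waiters) > 0) {
        pthread_mutex_lock(&lock->mutex);
        pthread_cond_signal(&lock->released);
        pthread_mutex_unlock(&lock->mutex);
    }
}
```

With short hold times most acquisitions finish in the inner or middle tier and never touch the mutex, which is the property the slide highlights. The sketch relies on the default sequentially consistent atomics so that a releasing thread cannot miss a waiter that has already registered itself.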
  15. Lesson #2: Create a fast event pub/sub framework
      • e.g. JIT can listen for class (un)load and class loader events
      • e.g. build verbose GC functionality on top of it
  16. OMR event hooks: omr/util/hookable
      • Create event descriptions (including parameters) in XML
      • Hookgen tool generates headers automatically
      • Easy to trigger an event:
          TRIGGER_J9HOOK_MM_OMR_GC_CYCLE_START(
              _extensions->omrHookInterface,
              env->getOmrVMThread(),
              omrtime_hires_clock(),
              J9HOOK_MM_OMR_GC_CYCLE_START,
              /* other args */);
      • Easy to register an event hook (e.g. from GC verbose output logger):
          (*_mmPrivateHooks)->J9HookRegister(_mmPrivateHooks,
              J9HOOK_MM_PRIVATE_SYSTEM_GC_START,
              verboseHandlerSystemGCStart,
              /* user data args */);
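The register and trigger pattern on slide 16 can be reduced to a small table of listeners per event. The sketch below is not the omr/util/hookable API: every `demo_*` and `DEMO_*` name is hypothetical, the fixed-size listener table is a simplification, and the event struct merely stands in for what a generator like hookgen would emit from an event description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event numbers; a generator would emit these from descriptions. */
enum { DEMO_EVENT_GC_CYCLE_START, DEMO_EVENT_GC_CYCLE_END, DEMO_EVENT_COUNT };
enum { DEMO_MAX_LISTENERS = 4 };

typedef void (*demo_hook_fn)(int eventNum, void *eventData, void *userData);

typedef struct {
    demo_hook_fn function;
    void *userData;
} demo_listener_t;

typedef struct {
    demo_listener_t listeners[DEMO_EVENT_COUNT][DEMO_MAX_LISTENERS];
    size_t counts[DEMO_EVENT_COUNT];
} demo_hook_interface_t;

/* Payload for the cycle-start event (illustrative). */
typedef struct {
    uint64_t timestamp;
} demo_gc_cycle_start_event_t;

/* Subscribe: returns 0 on success, -1 when the listener table is full. */
static int demo_hook_register(demo_hook_interface_t *hooks, int eventNum,
                              demo_hook_fn function, void *userData) {
    assert(eventNum >= 0 && eventNum < DEMO_EVENT_COUNT);
    if (hooks->counts[eventNum] == DEMO_MAX_LISTENERS) {
        return -1;
    }
    demo_listener_t *slot = &hooks->listeners[eventNum][hooks->counts[eventNum]++];
    slot->function = function;
    slot->userData = userData;
    return 0;
}

/* Publish: with no listeners the loop body never runs, so triggering is cheap. */
static void demo_hook_trigger(demo_hook_interface_t *hooks, int eventNum,
                              void *eventData) {
    assert(eventNum >= 0 && eventNum < DEMO_EVENT_COUNT);
    for (size_t i = 0; i < hooks->counts[eventNum]; i++) {
        demo_listener_t *listener = &hooks->listeners[eventNum][i];
        listener->function(eventNum, eventData, listener->userData);
    }
}

/* Example listener: counts GC cycles, much as a verbose logger would record them. */
static void demo_count_cycles(int eventNum, void *eventData, void *userData) {
    (void)eventNum;
    (void)eventData;
    *(unsigned *)userData += 1;
}
```

The publisher (here, a collector) only calls `demo_hook_trigger` and never needs to know whether a verbose logger, a JIT, or nothing at all is listening, which is what lets features such as verbose GC be layered on top without touching the collector.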
  17. Lesson #3: Invest in diagnostic tools
      Don’t just debug with printfs and grub around in core files
  18. OMR diagnostics
      • DDRGen: builds a representation of internal VM data structures
        – Reads DWARF/debug output generated by the compiler
        – Also scrapes header files to learn about macros for, say, bit flags
        – Still under development / refactoring from Java (jdmpview)
      • Trace engine
        – Similar to events, can define tracepoints with XML
        – Can set verbosity so not all tracepoints are on by default (saves startup!)
      • GC verbose logs
        – Leverage event hooks to generate files readable by IBM Health Center and GC Memory Visualizer tool
      • JIT compiler logs, limit files, and option sets
        – Detailed logging output with numbered opts and transformations
        – Can use command-line options to enable/disable numbered opts / transformations
        – Limit files / options can restrict compilation to, or exclude it for, certain methods
        – Option sets can specify JIT diagnostic options for only certain methods
  19. Lesson #4: Do not only use language level tests
      • For all components, but especially for the JIT
      • Too many variables not under language control
      • Need to improve here
      • See omr/fvtest/<component>
  20. Lesson #5: Please don’t build yet another GC
      There are way too many of them already!
      And they’re all different ☹
  21. OMR Garbage Collector: omr/gc
      • Highly parallel, scalable garbage collector
        – Exploits multiple cores
        – Balances work for multiple threads
      • Rock solid automatic memory management for language runtimes
        – Used for over a decade in the IBM J9 enterprise caliber Java Virtual Machine
      • Mark/Sweep GC pause times (depends on live data set size):
        – ~0.5 millisecond for small (2-4MB) heaps
        – ~5 milliseconds for heaps at 10s of MBs
      • Integrating Mark/Sweep GC into an existing runtime should be <100 lines of code
      • Can then add even more advanced capabilities incrementally
        – Compaction
        – Generational
        – Concurrent
  22. “Lesson” #6: There’s a new open source JIT in town in OMR
      • 60+ optimizations and analyses
      • Create your own optimization sequences
      • New JitBuilder library to simplify IL generation
  23. Final bit of advice: JIT and interpreter have each other’s backs
      • JIT: make simple stuff fast, let interpreter do hard stuff
      • Interpreter: be simple and get everything 100% right, let JIT make the right things fast
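The division of labour in slide 23 can be illustrated with a toy dispatcher: every method always has an interpreted body, and a compiled body is used only if the JIT chose to produce one. Everything here is a hypothetical stand-in. There is no real code generation, `jit_output` simply holds the function a compiler would have generated (or NULL when it declines), and the invocation threshold of 3 is an arbitrary value chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { DEMO_COMPILE_THRESHOLD = 3 };  /* hypothetical invocation trigger */

typedef long (*demo_body_fn)(long arg);

typedef struct {
    demo_body_fn interpret;   /* always present and always correct */
    demo_body_fn jit_output;  /* stand-in for generated code; NULL if the JIT declines */
    demo_body_fn compiled;    /* NULL until a compile succeeds */
    bool compile_attempted;
    unsigned invocations;
} demo_method_t;

/* Run the compiled body when there is one, otherwise fall back to the interpreter. */
static long demo_invoke(demo_method_t *method, long arg) {
    assert(method->interpret != NULL);
    if (method->compiled != NULL) {
        return method->compiled(arg);
    }
    if (++method->invocations >= DEMO_COMPILE_THRESHOLD && !method->compile_attempted) {
        method->compile_attempted = true;
        method->compiled = method->jit_output;  /* stays NULL for "hard" methods */
    }
    return method->interpret(arg);
}

/* Interpreter stand-in: squares a non-negative value by repeated addition. */
static long demo_square_interpreted(long arg) {
    long result = 0;
    for (long i = 0; i < arg; i++) {
        result += arg;
    }
    return result;
}

/* "Compiled" stand-in: the same answer in a single multiply. */
static long demo_square_compiled(long arg) {
    return arg * arg;
}
```

A method whose `jit_output` is NULL keeps running in the interpreter forever and still returns correct results, so the JIT is free to decline anything it finds hard, while the interpreter stays the single source of truth for semantics.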
  24. Pilot projects using OMR in existing runtimes
      • Use port, thread, hook, GC, JIT, and method profiling from OMR
      • Capabilities from IBM J9 migrated to CRuby, CPython, SOM++:
        – Method profiling via IBM Health Center
        – Fast, Scalable Garbage Collection
          · Verbose GC output for free
          · Enables exact same GC visualization and insights for all languages
          · E.g. enhancement for CRuby: all object memory moved onto managed heap
        – Just In Time compilers
          · Compiled code with focus on compatibility
      * = not yet open source
  25. Method Profiling for Ruby
  26. Scalable high performance Garbage Collection
      <cycle-start id="2" type="global" contextid="0" timestamp="2015-08-05T17:21:58.105" intervalms="5066.731" />
      <gc-start id="3" type="global" contextid="2" timestamp="2015-08-05T17:21:58.105">
        <mem-info id="4" free="596848" total="4194304" percent="14">
          <mem type="tenure" free="596848" total="4194304" percent="14" />
        </mem-info>
      </gc-start>
      <allocation-stats totalBytes="3596216" >
        <allocated-bytes non-tlh="720016" tlh="2876200" />
      </allocation-stats>
      <gc-op id="5" type="mark" timems="4.881" contextid="2" timestamp="2015-08-05T17:21:58.110">
        <trace-info objectcount="8914" scancount="7208" scanbytes="288320" />
      </gc-op>
      <gc-op id="8" type="sweep" timems="0.688" contextid="2" timestamp="2015-08-05T17:21:58.111" />
      <gc-end id="9" type="global" contextid="2" durationms="5.896" usertimems="7.999" systemtimems="1.999" timestamp="2015-08-05T17:21:58.111" activeThreads="2">
        <mem-info id="10" free="2508160" total="4194304" percent="59">
          <mem type="tenure" free="2508160" total="4194304" percent="59" micro-fragmented="297048" macro-fragmented="723458" />
        </mem-info>
      </gc-end>
      <cycle-end id="11" type="global" contextid="2" timestamp="2015-08-05T17:21:58.111" />
      Q: Does this verbose GC output come from Java, Ruby, Python, or SOM++?
      A: Could be any one of them!
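The numbers in the verbose GC record on slide 26 can be cross-checked with a little arithmetic. The mark and sweep operations take 4.881 ms + 0.688 ms = 5.569 ms of the reported 5.896 ms cycle, and the two allocation figures add up to the reported total (720016 + 2876200 = 3596216 bytes). The short C sketch below recomputes the memory figures; the function names are made up for this example, and the use of truncating integer division is an assumption that happens to reproduce the `percent` attributes in this particular record.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the verbose GC record on the slide. */
static const uint64_t heap_total = 4194304;   /* total="4194304" */
static const uint64_t free_before = 596848;   /* free at gc-start */
static const uint64_t free_after = 2508160;   /* free at gc-end */

/* Bytes made available again by this global collection. */
static uint64_t demo_bytes_reclaimed(void) {
    assert(free_after >= free_before);
    return free_after - free_before;
}

/* Whole-number percentage of the heap that is free (integer division truncates). */
static uint64_t demo_percent_free(uint64_t free_bytes) {
    return free_bytes * 100 / heap_total;
}
```

For this record the collection reclaims 1911312 bytes of the 4 MB heap, and the free percentage moves from 14 to 59, matching the `percent` values in the two `mem-info` elements.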
  27. Same GC visualization & insight for all languages
  28. Proof point: Just in Time Compilers
      • CRuby, CPython, SOM++ do not have JIT compilers
      • Our efforts to date have high focus on compatibility:
        – Compile native instructions for methods and blocks
        – Avoid big changes to how existing runtimes work (ease adoption)
        – Consistent behaviour for compiled code vs interpreted code
        – No restrictions on native code used by extension modules: we can run Rails!
        – No benchmark tuning or specials, no profile exploitation (yet)
      • Recent focus has been 100% on open sourcing JIT
        – Not on language ports
  29. [Chart: Speedup Relative to Interpreter, Ruby Bench9000 Micro Benchmarks; axis marked 1x, 2x, 3x]
  30. Open J9 is also coming!
      We’re working on it around our next IBM SDK for Java release
  31. Connecting production runtimes and research
      • OMR and Open J9: production runtime technology in open source projects
        – IBM developers working directly in open source projects
        – Not research: this is how IBM builds its runtimes going forward
      • Opportunity for researchers and runtime developers to work side by side
        – Flexible licensing (EPL 1.0 or AL 2.0)
        – Level playing field
      • OMR and Open J9 technology could become testbed for runtimes research
        – Early days: APIs are still evolving and improving around solid technology base
        – Realistic path for research work to become active production code
        – Ideally, path for research to influence more than one language runtime
  32. Interesting links and contact points
      • Mark Stoodley: mstoodle@ca.ibm.com, @mstoodle
      • Mailing list: omr-dev@eclipse.org
        – Sign up at https://dev.eclipse.org/mailman/listinfo/omr-dev
      • Eclipse OMR web site: https://www.eclipse.org/omr
      • Eclipse OMR DeveloperWorks Open site: https://developer.ibm.com/open/omr/
      • Eclipse OMR GitHub project: https://github.com/eclipse/omr
      • IBM SDK for Java Docker images: https://hub.docker.com/r/ibmcom/ibmjava/
      • Ruby+OMR Technology Preview GitHub project with Docker images for Linux on LinuxONE, OpenPOWER, and X86: https://github.com/rubyomr-preview/rubyomr-preview
      • SOM++ with OMR GC and JitBuilder: https://github.com/charliegracie/SOMpp/tree/OSCON2016
