We explain various kinds of bad memory utilization patterns in Java applications, present a tool to efficiently detect them, and give a number of common solutions to these problems.
Assume that, at a high level, your data is represented efficiently
• Data doesn’t sit in memory for longer than needed
• No unnecessary duplicate data structures
- E.g. don't keep the same objects in both a List and a Set
• Data structures are appropriate
- E.g. don’t use ConcurrentHashMap when no concurrency
• Data format is appropriate
- E.g. don’t use Strings for int/double numbers
Main sources of memory waste
(from bottom to top level)
• JVM internal object implementation
• Inefficient common data structures
- Collections
- Boxed numbers
• Data duplication - often biggest overhead
• Memory leaks
Internal Object Format: Alignment
• To enable 4-byte pointers (compressedOops) with
>4G heap, objects are 8-byte aligned
• Thus, for example:
- java.lang.Integer effective size is 16 bytes
(12b header + 4b int)
- java.lang.Long effective size is 24 bytes - not 20!
(12b header + 8b long + 4b padding)
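These sizes can be checked empirically with the OpenJDK JOL library (org.openjdk.jol); a minimal sketch, assuming the jol-core jar is on the classpath:

import org.openjdk.jol.info.ClassLayout;

public class LayoutCheck {
    // Prints the field layout, padding and total instance size of a class.
    public static void main(String[] args) {
        System.out.println(ClassLayout.parseClass(Integer.class).toPrintable());
        System.out.println(ClassLayout.parseClass(Long.class).toPrintable());
    }
}

On a 64-bit JVM with compressed oops, the report for java.lang.Long should show the 12-byte header, the 8-byte long field and 4 bytes of alignment padding, i.e. 24 bytes total.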
Summary: small objects are bad
• A small object's overhead can be up to 400% of its
payload (the actual data)
• There are apps with up to 40% of the heap wasted
due to this
• See if you can change your code to "consolidate"
objects or put their contents into flat arrays (see the sketch after this list)
• Avoid heap size > 32G! (really ~30G)
- Unless your data is mostly int[], byte[] etc.
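A minimal sketch of the flat-array idea (class names are ours):

// Before: one object per point. Each instance pays a 12-16 byte header plus
// alignment, and the array holds a 4-byte pointer per element - most of the
// memory is overhead, not data.
class Point {
    int x, y;
}

// After: the same data "consolidated" into parallel primitive arrays;
// point i is (xs[i], ys[i]). No per-point headers or pointers.
class PointCloud {
    final int[] xs, ys;
    PointCloud(int capacity) { xs = new int[capacity]; ys = new int[capacity]; }
    int x(int i) { return xs[i]; }
    int y(int i) { return ys[i]; }
}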
Common Collections
• JDK: java.util.ArrayList, java.util.HashMap,
java.util.concurrent.ConcurrentHashMap etc.
• Third-party - mainly Google:
com.google.common.collect.*
• Scala has its own equivalent of JDK collections
• JDK collections are nothing magical
- Written in Java, easy to load and read in IDE
Memory Footprint of JDK Collections
• The JDK pays little attention to memory footprint
- Just some optimizations for empty ArrayLists and HashMaps
- ConcurrentHashMap and some Google collections are the
worst “memory hogs”
• Memory is wasted due to:
- Default size of the internal array (10 for ArrayList, 16 for HashMap) is too
high for small collections, and it never shrinks after initialization.
- $Entry objects used by all Maps take at least 32b each!
- Sets just reuse Map structure, no footprint optimization
Boxed numbers - related to collections
• java.lang.Integer, java.lang.Double etc.
• Were introduced mainly to avoid creating
specialized classes like IntToObjectHashMap
• However, they have proven to be extremely wasteful:
- Single int takes 4b. java.lang.Integer effective size is 16b
(12b header + 4b int), plus 4b pointer to it
- Single long takes 8b. java.lang.Long effective size is 24b
(12b header + 8b long + 4b padding), plus 4b pointer to it
JDK Collections: Summary
• Initialized but empty collections waste memory
• Things like HashMap<Object, Integer> are bad
• HashMap$Entry etc. may take up to 30% of memory
• Some third-party libraries provide alternatives
- In particular, fastutil.di.unimi.it (University of Milan, Italy)
- Has Object2IntHashMap, Long2ObjectHashMap,
Int2DoubleHashMap, etc. - no boxed numbers
- Has Object2ObjectOpenHashMap : no $Entry objects
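For example, counting occurrences per key with a plain HashMap boxes every count and creates an entry object per mapping; a hedged sketch of the fastutil alternative, assuming it.unimi.dsi.fastutil is on the classpath:

import java.util.HashMap;
import java.util.Map;
import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;

class WordCounts {
    // JDK version: each mapping costs a HashMap$Node plus a boxed Integer.
    static Map<String, Integer> jdkCount(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    // fastutil version: open-addressing table with plain int values -
    // no per-entry objects and no boxing.
    static Object2IntOpenHashMap<String> fastutilCount(String[] words) {
        Object2IntOpenHashMap<String> counts = new Object2IntOpenHashMap<>();
        for (String w : words) counts.addTo(w, 1);
        return counts;
    }
}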
Data Duplication
• Can happen for many reasons:
- Operations like s = s1 + s2 or s = s.toUpperCase()
always generate a new String object
- intObj = new Integer(intScalar) always generates
a new Integer object
- Duplicate byte[] buffers in I/O, serialization, etc.
• Very hard to detect without tooling
- Small amount of duplication is inevitable
- 20-40% waste is not uncommon in unoptimized apps
• Duplicate Strings are most common and easy to fix
Dealing with String duplication
• Use tooling to determine where dup strings are either
- generated, e.g. s = s.toUpperCase();
- permanently attached, e.g. this.name = name;
• Use String.intern() to de-duplicate
- Uses a JVM-internal fast, scalable canonicalization hashtable
- Table is fixed and preallocated - no extra memory overhead
- Small CPU overhead is normally offset by reduced GC time
and improved cache locality
• s = s.toUpperCase().intern();
this.name = name.intern(); …
Other duplicate data
• Can be almost anything. Examples:
- Timestamp objects
- Partitions (with HashMaps and ArrayLists) in Apache Hive
- Various byte[], char[] etc. data buffers everywhere
• So far there is no convenient tooling for automatic
detection of arbitrary duplicate objects
• But one can often guess correctly
- Just look at classes that take most memory…
Dealing with non-string duplicates
• Use a WeakHashMap to store canonicalized objects
- com.google.common.collect.Interner wraps a
(Weak)HashMap - see the sketch after this list
• For big data structures, interning may cause some
CPU performance impact
- Interning calls hashCode() and equals()
- GC time reduction would likely offset this
• If duplicate objects are mutable, like HashMap…
- May need CopyOnFirstChangeHashMap, etc.
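A minimal sketch of interning an immutable value class with Guava's Interner (DateRange is a made-up example; correct equals()/hashCode() are required):

import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

// Hypothetical immutable value class with value-based equals()/hashCode().
final class DateRange {
    final long start, end;
    DateRange(long start, long end) { this.start = start; this.end = end; }
    @Override public boolean equals(Object o) {
        return o instanceof DateRange
            && ((DateRange) o).start == start && ((DateRange) o).end == end;
    }
    @Override public int hashCode() { return Long.hashCode(start * 31 + end); }
}

class Canonicalizer {
    // Weak interner: canonical instances can be GCed once nothing else references them.
    private static final Interner<DateRange> INTERNER = Interners.newWeakInterner();

    // Returns an existing equal instance if one is already interned, otherwise r itself.
    static DateRange canonicalize(DateRange r) { return INTERNER.intern(r); }
}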
Duplicate Data: Summary
• Duplicate data may cause huge memory waste
- Observed up to 40% overhead in unoptimized apps
• Duplicate Strings are easy to
- Detect (but need tooling to analyze a heap dump)
- Get rid of - just use String.intern()
• Other kinds of duplicate data more difficult to find
- But it’s worth the effort!
- Mutable duplicate data is more difficult to deal with
Memory Leaks
• Unlike C++, Java doesn't have real leaks, but has close analogs:
- Data that's not used anymore, yet still referenced and never released
- Too much persistent data cached in memory
• No reliable way to distinguish leaked data…
- But any data structure that just keeps growing is bad
• So, just pay attention to the biggest (and growing)
data structures
- Heap dump: see which GC root(s) hold most memory
- Runtime profiling can be more accurate, but more expensive
JXRay: what is it
• Offline heap analysis tool
- Runs once on a given heap dump, produces a text report
• Simple command-line interface:
- Just one jar + .sh script
- No complex installation
- Can run anywhere (laptop or remote headless machine)
- Needs JDK 8
• See http://www.jxray.com for more info
JXRay: main features
• Shows you what occupies the heap
- Object histogram: which objects take most memory
- Reference chains: which GC roots/data structures keep
biggest object “lumps” in memory
• Shows you where memory is wasted
- Object headers
- Duplicate Strings
- Bad collections (empty; 1-element; small (2-4 element))
- Bad object arrays (empty (all nulls); length 0 or 1; 1-element)
- Boxed numbers
- Duplicate primitive arrays (e.g. byte[] buffers)
Keeping results succinct
• No GUI - generates a plain text report
- Easy to save and exchange
- Small: ~50K regardless of the dump size
- Details a given problem once its overhead is above a
threshold (by default 0.1% of used heap)
• Knows about internals of most standard collections
- More compact/informative representation
• Aggregates reference chains from GC roots to
problematic objects
Reference chain aggregation:
assumptions
• A problem is important if many objects have it
- E.g. 1000s/1,000,000s of duplicate strings
• Usually there are not too many places in the code
responsible for such a problem
- Foo(String s) {
this.s = s.toUpperCase(); …
}
- Bar(String s1, String s2) {
this.s = s1 + s2; …
}
Reference chain aggregation: what is it
• In the heap, we may have e.g.
Baz.stat1 -> HashMap@243 -> ArrayList@650 -> Foo.s = “xyz”
Baz.stat2 -> LinkedList@798 -> HashSet@134 -> Bar.s = “0”
Baz.stat1 -> HashMap@529 -> ArrayList@351 -> Foo.s = “abc”
Baz.stat2 -> LinkedList@284 -> HashSet@960 -> Bar.s = “1”
… 1000s more chains like this
• JXRay aggregates them all into just two lines:
Baz.stat1 -> {HashMap} -> {ArrayList} -> Foo.s (“abc”, “xyz” and
3567 more dup strings)
Baz.stat2 -> {LinkedList} -> {HashSet} -> Bar.s (“0”, “1” and …)
Bad collections
• Empty: no elements at all
- Is it used at all? If yes, allocate lazily.
• 1-element
- Always has only 1 element - replace with object
- Almost always has 1 element - solution more complex.
Switch between Object and collection/array lazily.
• Small: 2..4 elements
- Consider smaller initial capacity
- Consider replacing with a plain array
Bad object arrays
• Empty: only nulls
- Same as empty collections - delete or allocate lazily
• Length 0
- Replace with a singleton zero-length array (see the sketch after this list)
• Length 1
- Replace with an object?
• Single non-null element
- Replace with an object? Reduce length?
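A minimal sketch of the shared zero-length array (the surrounding class and method are hypothetical):

import java.util.List;

class NameHolder {
    // One shared zero-length array instead of "new String[0]" on every call;
    // zero-length arrays are immutable, so sharing them is safe.
    private static final String[] NO_STRINGS = new String[0];

    static String[] namesOrEmpty(List<String> names) {
        return names.isEmpty() ? NO_STRINGS : names.toArray(NO_STRINGS);
    }
}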
A Monitoring app
• Scalability wasn’t great
- Some users had to increase -Xmx again and again.
- Unclear how to choose the correct size
• Big heap -> long full GC pauses -> frozen UI
• Some OOMs in small clusters
- Not a scale problem - a bug?
Investigation, part 1
• Started with the smaller dumps with OOMs
- Immediately found duplicate strings
- One string repeated 1000s of times used 90% of the heap
- Long SQL query saved in DB many times, then retrieved
- Adding two String.intern() calls solved the problem… almost
• Duplicate byte[] buffers in 3rd-party library code
- That still caused noticeable overhead
- Ended up limiting the saved query size at a higher level
- Library/auto-gen code may be difficult to change…
Investigation, part 2
• Next, looked into heap dumps with scalability
problems
- Both real and artificial benchmark setup
• Found all the usual issues
- String duplication
- Empty or small (1-4 elements) collections
- Tons of small objects (object headers used 31% of heap!)
- Boxed numbers
Standard solutions applied
• Duplicate strings: add more String.intern() calls
- Easy: check jxray report, find what data structures reference
bad strings, edit code
- Non-trivial when a String object is mostly managed by auto-
generated code
• Bad collections: less trivial
- Sometimes it's enough to replace new HashMap() with new
HashMap(expectedSize) - see the sizing sketch after this list
- Found ArrayLists that almost always have size 0 or 1
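Note that new HashMap<>(n) sets the initial table capacity, not the expected number of entries; a hedged sizing sketch (Guava's Maps.newHashMapWithExpectedSize() does the equivalent):

import java.util.HashMap;

class Maps2 {   // hypothetical helper class
    // With the default 0.75 load factor, the table is resized once
    // size > capacity * 0.75, so pick a capacity large enough to hold
    // expectedSize entries without resizing (and without the 16-slot default).
    static <K, V> HashMap<K, V> withExpectedSize(int expectedSize) {
        return new HashMap<>((int) (expectedSize / 0.75f) + 1);
    }
}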
Dealing with mostly 0/1-size ArrayLists
• Replaced ArrayList list; -> Object valueOrArray; (minimal sketch after this list)
• Depending on the situation, valueOrArray may
- be null
- point to a single object (element)
- point to an array of objects (elements)
• ~70 LOC hand-written for this
- But memory savings were worth the effort
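A minimal sketch of the idea, much simplified compared to the ~70-line production version (class and method names are ours; it assumes elements are never themselves Object[]):

import java.util.Arrays;

class CompactList {
    private Object valueOrArray;   // null | single element | Object[] of elements

    void add(Object e) {
        if (valueOrArray == null) {
            valueOrArray = e;                                  // 0 -> 1: no array allocated
        } else if (valueOrArray instanceof Object[]) {
            Object[] old = (Object[]) valueOrArray;
            Object[] grown = Arrays.copyOf(old, old.length + 1);
            grown[old.length] = e;
            valueOrArray = grown;
        } else {
            valueOrArray = new Object[] { valueOrArray, e };   // 1 -> 2: switch to array
        }
    }

    int size() {
        if (valueOrArray == null) return 0;
        return valueOrArray instanceof Object[] ? ((Object[]) valueOrArray).length : 1;
    }

    Object get(int i) {
        if (valueOrArray instanceof Object[]) return ((Object[]) valueOrArray)[i];
        if (i == 0 && valueOrArray != null) return valueOrArray;
        throw new IndexOutOfBoundsException(String.valueOf(i));
    }
}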
Dealing with non-string duplicate data
• Heap contained a lot of small objects
class TimestampAndData {
long timestamp;
long value;
… }
• Guessed that there may be many duplicates
- E.g. many values are just 0/1
• Added a simple canonicalization cache (sketch below). Result:
- 8x fewer TimestampAndData objects
- 16% memory savings
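A minimal sketch of such a cache, assuming TimestampAndData is made effectively immutable and gets equals()/hashCode() over both fields:

import java.util.concurrent.ConcurrentHashMap;

class TimestampAndDataCache {
    // Maps each distinct value to its canonical instance. Unbounded here for
    // simplicity; a real cache may need a size limit or weak references.
    private static final ConcurrentHashMap<TimestampAndData, TimestampAndData> CACHE =
            new ConcurrentHashMap<>();

    static TimestampAndData canonicalize(TimestampAndData v) {
        TimestampAndData existing = CACHE.putIfAbsent(v, v);
        return existing != null ? existing : v;
    }
}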
A Monitoring app: conclusions
• Fixing string/other data duplication, boxed nums,
small/empty collections: together saved ~50%
- Depends on the workload
- Scalability improved: more data - higher savings
• Can still save more - replace standard HashMaps
with more memory-friendly maps
- HashMap$Entry objects may take a lot of memory!
Apache Hive: Hive Server 2 (HS2)
• HS2 may run out of memory
• Most scenarios involve 1000s of partitions and 10s
of concurrent queries
• Not many heap dumps from real users
• Create a benchmark which reproduces the
problem, measure where memory goes, optimize
Experimental setup
• Created a Hive table with 2000 small partitions
• Running 50 concurrent queries like “select
count(myfield_1) from mytable;” crashes an HS2
server with -Xmx500m
• More partitions or concurrent queries - more
memory needed
HS2: Investigation
• Looked into the heap dump generated after OOM
• Not too many different problems:
- Duplicate strings: 23%
- java.util.Properties objects take 20% of memory
- Various bad collections: 18%
• Apparently, many Properties objects are duplicates
- A separate copy per partition per query
- For a read-only partition, all per-query copies are identical
HS2: Fixing duplicate strings
• Some String.intern() calls added
• Some strings come from HDFS code
- Need separate changes in Hadoop code
• Most interesting: String fields of java.net.URI
- private fields initialized internally - no access
- But still can read/write using Java Reflection
- Wrote StringInternUtils.internStringsInURI(URI) method
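A hedged sketch of the reflection approach (the exact field set of java.net.URI is a JDK implementation detail, so the real utility targets specific known fields; this sketch simply interns every non-null String field):

import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.net.URI;

class StringInternSketch {
    // Interns every non-null String field of the given URI in place.
    // Relies on reflective access to JDK internals; newer JDKs may require
    // the java.base module to be opened for this to work.
    static URI internStringsInURI(URI uri) {
        for (Field f : URI.class.getDeclaredFields()) {
            if (f.getType() != String.class || Modifier.isStatic(f.getModifiers())) continue;
            try {
                f.setAccessible(true);
                String value = (String) f.get(uri);
                if (value != null) {
                    f.set(uri, value.intern());
                }
            } catch (ReflectiveOperationException e) {
                // Give up quietly - interning is only an optimization.
            }
        }
        return uri;
    }
}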
HS2: Fixing duplicate
java.util.Properties objects
• Main problem: Properties object is mutable
- All PartitionDesc objects representing the same partition
cannot simply use one “canonicalized” Properties object
- If one is changed, others should not!
• Had to implement a new class
class CopyOnFirstWriteProperties extends Properties {
Properties interned; // Used until/unless a mutator is called
// Inherited table is filled and used after first mutation
…
}
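A hedged sketch of the copy-on-first-write idea (heavily simplified; the real Hive class overrides many more read and write methods):

import java.util.Properties;

class CopyOnFirstWriteProperties extends Properties {
    private Properties interned;   // shared canonical copy; null after first mutation

    CopyOnFirstWriteProperties(Properties interned) {
        this.interned = interned;
    }

    // Reads delegate to the shared copy until this instance is mutated.
    @Override public String getProperty(String key) {
        return interned != null ? interned.getProperty(key) : super.getProperty(key);
    }

    // Every mutator must detach first; only put() is shown here.
    @Override public synchronized Object put(Object key, Object value) {
        copyOnWrite();
        return super.put(key, value);
    }

    private synchronized void copyOnWrite() {
        if (interned != null) {
            super.putAll(interned);   // fill our own (inherited) table once
            interned = null;          // stop delegating reads
        }
    }
}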
HS2: Improvements based on simple
read-only benchmark
• Fixing duplicate strings and properties together
saved ~37% of memory
• Another ~5% can be saved by deduplicating
strings coming from HDFS code
• Another ~10% can be saved by dealing with bad
collections
Investigating/fixing concrete apps:
conclusions
• Any app can develop memory problems over time
- Check and optimize periodically
• Many such problems are easy enough to fix
- Intern strings, initialize collections lazily, etc.
• Duplication other than strings is frequent
- More difficult to fix, but may be well worth the effort
- Need to improve tooling to detect it automatically