#GeodeSummit - Off-Heap Storage Current and Future Design

Off-heap Storage
Edin Zulich

Agenda
• Motivation and goals for off-heap storage
• Off-heap features and usage
• Implementation overview
• Preliminary benchmarks: off-heap vs. heap
• Tips and best practices
• Future Directions

Motivation and goals for off-heap storage

Why Off-heap
• Increase data density and reduce memory overhead
• 128+ GB user data in one JVM
• 10+ TB user data in one cluster
• Usable out-of-box without extensive GC tuning of JVM
• Maintain existing throughput performance

Off-heap: How Do I Use It?
• Set the off-heap memory size for the process
– Using the new property: off-heap-memory-size
• Mark regions whose entry values should be stored off-heap
– Using the new region attribute: off-heap (false | true)
• Adjust the JVM heap memory size down accordingly
– The smaller the better; at least try to keep it below 32G
• Optionally
– Configure Resource Manager
– Lock the off-heap memory

Off-heap Features
• Startup options
• Interaction with other features
• Resource Manager
• Monitoring & Management
• Limitations

Startup Options
• --off-heap-memory-size
specifies amount of off-heap memory to allocate
• --lock-memory
specifies whether to lock the off-heap memory
• Example:
gfsh start server --initial-heap=10G --max-heap=10G --off-heap-memory-size=200G --lock-memory=true

Off-heap Interaction with Other Features
• Deltas: values have to be copied (the clone option is ON,
and cannot be turned off)
• EntryEvents
– Limited availability of oldValue, newValue
• Indexes
– Functional range indexes not supported (too expensive)
• Region operations that compare values do not call the object's
equals(); they compare either the PDXInstance or the serialized bytes
– This allows an equality check without deserializing the data
More Expensive with Off-heap
• PDX
– Values currently copied from off-heap to create a PDXInstance
• Deltas: to apply a delta you have to serialize/deserialize
every time
• Compression: there is an extra copy on decompression
• Querying: deserialize on every query

Off-heap and Resource Manager
• Out of Memory Semantics
• Eviction and Critical Thresholds
• Resource Manager API

“Out of Memory” Occurs When...
• Java heap runs out of memory
– Threads start throwing OutOfMemoryError
• Off-heap runs out of memory
– Threads start throwing OutOfOffHeapMemoryException
• => causing the Geode member to close and disconnect
– Closes the Cache to prevent reading inconsistent data
– Disconnects from the Geode cluster to prevent distribution problems
or hangs

Eviction and Critical Thresholds for Off-heap
• CriticalOffHeapPercentage
– triggers LowMemoryException for puts into off-heap regions
– critical member informs other members that it is critical
• EvictionOffHeapPercentage
– triggers eviction of entries in off-heap regions configured with
LRU_HEAP
• Semantics the same as with the equivalent heap
thresholds
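
The relationship between the two thresholds can be sketched as a small classification function. This is a hypothetical illustration, not the ResourceManager implementation: the class name, the `State` enum, and the choice of `>=` at the boundaries are assumptions.

```java
// Hypothetical sketch for illustration only; not Geode's ResourceManager.
final class OffHeapThresholds {

    enum State { NORMAL, EVICTION, CRITICAL }

    // Classifies current off-heap usage against the two percentages.
    // Treating a value exactly at a threshold as "over" is an assumption.
    static State classify(long usedBytes, long maxBytes,
                          float evictionPercentage, float criticalPercentage) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        double usedPercent = 100.0 * usedBytes / maxBytes;
        if (usedPercent >= criticalPercentage) {
            return State.CRITICAL;   // puts into off-heap regions would be rejected
        }
        if (usedPercent >= evictionPercentage) {
            return State.EVICTION;   // entries in LRU_HEAP regions would be evicted
        }
        return State.NORMAL;
    }
}
```

For example, with an eviction threshold of 80 and a critical threshold of 99, a member using 85 of 100 bytes would be in the eviction range but not yet critical.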

Startup Options
• Existing:
• --critical-heap-percentage
• --eviction-heap-percentage
• New:
• --critical-off-heap-percentage
• --eviction-off-heap-percentage
• Example:
start server --initial-heap=10G --max-heap=10G --off-heap-memory-size=200G --lock-memory=true --critical-off-heap-percentage=99

ResourceManager API
• GemFireCache#getResourceManager()
• com.gemstone.gemfire.cache.control.ResourceManager
– exposes getters/setters for all of the heap and off-heap threshold
percentages
– Examples:
▪ public void setCriticalOffHeapPercentage(float offHeapPercentage);
▪ public float getCriticalOffHeapPercentage();

Monitoring & Management
• Statistics
• MBeans
• gfsh

Statistics
Name                 Description
defragmentations     The total number of times off-heap memory has been defragmented.
defragmentationTime  The total time spent defragmenting off-heap memory.
fragmentation        The percentage of off-heap memory fragmentation. Updated every time a defragmentation is performed.
fragments            The number of fragments of free off-heap memory. Updated every time a defragmentation is done.
freeMemory           The amount of off-heap memory, in bytes, that is not being used.
largestFragment      The largest fragment of memory found by the last compaction of off-heap memory. Updated every time a defragmentation is done.
maxMemory            The maximum amount of off-heap memory, in bytes. This is the amount of memory allocated at startup and does not change.
objects              The number of objects stored in off-heap memory.
reads                The total number of reads of off-heap memory.
usedMemory           The amount of off-heap memory, in bytes, that is being used to store data.

MBeans
• MemberMXBean
– getOffHeapDefragmentationTime: provides the value of the defragmentationTime statistic
– getOffHeapFragmentation: provides the value of the fragmentation statistic
– getOffHeapFreeMemory: provides the value of the freeMemory statistic
– getOffHeapObjects: provides the value of the objects statistic
– getOffHeapUsedMemory: provides the value of the usedMemory statistic
– getOffHeapMaxMemory: provides the value of freeMemory + usedMemory
• RegionMXBean
– listRegionAttributes (operation)
▪ enableOffHeapMemory (true | false)

Gfsh Support for Off-heap Memory
• alter disk-store: new option "--off-heap" for setting off-heap for each region in the disk-store
• create region: new option "--off-heap" for setting off-heap
• describe member: now displays the off-heap size
• describe offline-disk-store: now shows if a region is off-heap
• describe region: now displays the off-heap region attribute
• show metrics: now has an offheap category. The offheap metrics are: maxMemory, freeMemory, usedMemory, objects, fragmentation, and defragmentationTime
• start server: added --lock-memory, --off-heap-memory-size, --critical-off-heap-percentage, and --eviction-off-heap-percentage

Off-heap Limitations
• Maximum object size limited to slightly less than 2 GB
• All data nodes must consistently configure a region to be off-heap
• Functional Range Indexes not supported
• Keys, subscription queue entries not stored off-heap
• Fragmentation statistic is only updated during off-heap compactions

Off-heap: How are We Doing It?
• Using memory that is separate from the Java heap
– Built our own Memory Manager
– Memory Manager is very finely tuned and specific to our usage
– Avoid GC overhead
▪ Avoid copying of objects for promotion between generations
▪ Garbage Collector is a major performance killer
– Use sun.misc.Unsafe API for performance
• Optimizing code to minimize usage of heap memory
• Using off-heap as primary store instead of overflowing to it

Off-heap Implementation
• Memory allocated in 2GB slabs
– Max data value size: ~2GB
– Object values stored serialized; blobs stored as byte arrays
– Allocation faster for values < 128KB
▪ Controlled by a system property: gemfire.OFF_HEAP_FREE_LIST_COUNT
▪ First try to allocate from the free list; if that fails, allocate from unused memory
▪ Small values (< 8B) inlined (not using any off-heap space)
• Defragmentation consolidates free memory to minimize
fragmentation
– Blocks writes; best to avoid by minimizing fragmentation
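
The allocation order described above (free list first, then unused memory) can be sketched with a toy allocator. This is not Geode's memory manager; everything here is an assumption made for illustration: the class name, the 8-byte size increment, one free list per rounded size, a single small direct buffer instead of 2GB slabs, and returning -1 where a real allocator would defragment or throw.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch for illustration only; not Geode's memory manager.
final class SlabAllocatorSketch {
    private static final int ALIGNMENT = 8;  // assumed size increment

    private final ByteBuffer slab;           // memory outside the Java heap
    // One free list per rounded chunk size; each holds offsets of freed chunks.
    private final Map<Integer, Deque<Integer>> freeLists = new HashMap<>();
    private int nextUnused = 0;              // start of never-allocated memory

    SlabAllocatorSketch(int slabSize) {
        this.slab = ByteBuffer.allocateDirect(slabSize);
    }

    // Rounds a requested size up to the next multiple of ALIGNMENT.
    static int roundUp(int size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Returns the offset of a chunk of at least `size` bytes, or -1 if neither
    // the free list nor the unused part of the slab can satisfy the request.
    int allocate(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        int chunkSize = roundUp(size);
        Deque<Integer> free = freeLists.get(chunkSize);
        if (free != null && !free.isEmpty()) {
            return free.pop();               // 1. reuse a freed chunk of the same size
        }
        if (chunkSize > slab.capacity() - nextUnused) {
            return -1;                       // free memory may exist, but only in other sizes
        }
        int offset = nextUnused;             // 2. fall back to unused memory
        nextUnused += chunkSize;
        return offset;
    }

    // Returns a chunk to the free list for its rounded size.
    void free(int offset, int size) {
        freeLists.computeIfAbsent(roundUp(size), k -> new ArrayDeque<>()).push(offset);
    }
}
```

In this toy model a freed chunk is only reused by a request of the same rounded size, which hints at why values of widely varying size tend to leave free memory that cannot be reused until it is consolidated.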

Off-heap Implementation (cont’d)
• Allocated chunks
– Header
▪ isSerialized
▪ isCompressed
▪ Size
▪ Padding size
• Free chunks
– Header
▪ Size
▪ Address of next chunk in the list
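
The allocated-chunk header fields listed above could be packed into a single word. The slides do not give the actual layout, so the bit positions below are purely an assumption for illustration, as is the class name `ChunkHeaderSketch`: two flag bits, a 3-bit padding size, and a 31-bit size (enough for values just under 2GB).

```java
// Hypothetical header layout for illustration only; not Geode's actual format.
final class ChunkHeaderSketch {
    private static final long SERIALIZED_BIT = 1L << 63;    // assumed flag position
    private static final long COMPRESSED_BIT = 1L << 62;    // assumed flag position
    private static final int PADDING_SHIFT = 32;            // padding stored in bits 32..34
    private static final long PADDING_MASK = 0x7L;          // padding size 0..7
    private static final long SIZE_MASK = 0x7FFF_FFFFL;     // size stored in bits 0..30

    // Packs the four header fields into one 64-bit word.
    static long encode(boolean serialized, boolean compressed, int size, int padding) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        if (padding < 0 || padding > PADDING_MASK) {
            throw new IllegalArgumentException("padding must be in 0..7");
        }
        long header = size;
        header |= ((long) padding) << PADDING_SHIFT;
        if (serialized) header |= SERIALIZED_BIT;
        if (compressed) header |= COMPRESSED_BIT;
        return header;
    }

    static boolean isSerialized(long header) { return (header & SERIALIZED_BIT) != 0; }

    static boolean isCompressed(long header) { return (header & COMPRESSED_BIT) != 0; }

    static int size(long header) { return (int) (header & SIZE_MASK); }

    static int padding(long header) { return (int) ((header >>> PADDING_SHIFT) & PADDING_MASK); }
}
```

Keeping the flags in the header means a reader can tell whether a value is serialized or compressed without inspecting the payload itself.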

What is Stored On-heap vs. Off-heap
Always Stored On-heap          Stored Off-heap
Region Meta-Data               Values
Entry Meta-Data                Reference Counts
  - Off-Heap Addresses         Lists of Free Memory Blocks
Keys                           WAN Queue Elements
Indexes
Subscription Queue Elements

Preliminary Benchmarks: Off-heap vs. Heap

Off-heap: Initial Testing Results
• 256 GB user data per node across 8 nodes for total of 2 TB
of user data
• Heap-only test used about twice the CPU yet produced one third
of the updates of the test using off-heap
– Details on the next slide
• Succeeded in scaling up to much larger in-memory data
• Increased throughput of operations for large data sets

Heap vs. Off-Heap Comparison
Metric                Java Heap                  Off-Heap
creates/sec           30,000                     45,000
updates/sec           17,000 (std dev: 2130)     51,000 (std dev: 737)
Java RSS size         50 GB                      32 GB
CPU load              70% (load avg 10 cpus)     32% (load avg 5 cpus)
JVM GC                ConcurrentMarkSweep        ConcurrentMarkSweep
GC ms/sec             777 ms                     24 ms
GC marks (GC pauses)  1 per 30 sec               never

Recommendations and Best Practices

Off-heap Rules of Thumb
• Try to minimize fragmentation
– In order to avoid defragmentation
– Beware of usage patterns that lead to fragmentation
▪ Many updates of varying value size
▪ Use cases that require a lot of serialization/deserialization
• Beware of more expensive features
– Deltas
– Querying
– Compression
– PDX

Off-heap Recommendations
• Recommended when
– The values are relatively uniform in size
– The values are mostly less than 128K in size (configurable)
– The usage patterns involve cycles of many creates followed by
destroys or clear
– The values do not need to be frequently deserialized
• Configure all data nodes with the same off-heap-memory-size

How about...
• Storing keys, indexes, subscription queues off-heap?
• An API to invoke a defragmentation?
• A way to configure the slab size?
• A way to configure the max value size for the most efficient
off-heap allocation, or maybe the size increment?
• A different defragmentation algorithm/policy?
• Anything else?

Feedback Welcome!
Full spec at:
https://cwiki.apache.org/confluence/display/GEODE/Off-Heap+Memory+Spec

Join the Apache Geode Community!
• Check out: http://geode.incubator.apache.org
• Subscribe: user-subscribe@geode.incubator.apache.org
• Download: http://geode.incubator.apache.org/releases/
