Off-heap Storage
Edin Zulich
Agenda
• Motivation and goals for off-heap storage
• Off-heap features and usage
• Implementation overview
• Preliminary benchmarks: off-heap vs. heap
• Tips and best practices
• Future Directions
Motivation and goals for off-heap storage
Why Off-heap
• Increase data density and reduce memory overhead
• 128+ GB user data in one JVM
• 10+ TB user data in one cluster
• Usable out-of-box without extensive GC tuning of JVM
• Maintain existing throughput performance
Off-heap Usage and Features
Off-heap: How Do I Use It?
• Set the off-heap memory size for the process
– Using the new property: off-heap-memory-size
• Mark regions whose entry values should be stored off-heap
– Using the new region attribute: off-heap (false | true)
• Adjust the JVM heap memory size down accordingly
– The smaller the better; at least try to keep it below 32G
• Optionally
– Configure Resource Manager
– Lock the off-heap memory
Off-heap Features
• Startup options
• Interaction with other features
• Resource Manager
• Monitoring & Management
• Limitations
Startup Options
• --off-heap-memory-size
specifies amount of off-heap memory to allocate
• --lock-memory
specifies whether to lock the off-heap memory
• Example:
gfsh start server --initial-heap=10G --max-heap=10G --off-heap-memory-size=200G --lock-memory=true
Off-heap Interaction with Other Features
• Deltas: values have to be copied (the clone option is ON,
and cannot be turned off)
• EntryEvents
– Limited availability of oldValue, newValue
• Indexes
– Functional range indexes not supported (too expensive)
• Region operations that compare values do not call the object's
equals(); they compare PdxInstances or serialized bytes instead
– This allows an equality check without deserializing the data
More Expensive with Off-heap
• PDX
– Values currently copied from off-heap to create a PDXInstance
• Deltas: to apply a delta you have to serialize/deserialize
every time
• Compression: there is an extra copy on decompression
• Querying: deserialize on every query
Off-heap and Resource Manager
• Out of Memory Semantics
• Eviction and Critical Thresholds
• Resource Manager API
“Out of Memory” Occurs When...
• Java heap runs out of memory
– Threads start throwing OutOfMemoryError
• Off-heap runs out of memory
– Threads start throwing OutOfOffHeapMemoryException
• Either condition causes the Geode member to close and disconnect
– Closes the Cache to prevent reading inconsistent data
– Disconnects from the Geode cluster to prevent distribution problems or hangs
Eviction and Critical Thresholds for Off-heap
• CriticalOffHeapPercentage
– triggers LowMemoryException for puts into off-heap regions
– critical member informs other members that it is critical
• EvictionOffHeapPercentage
– triggers eviction of entries in off-heap regions configured with
LRU_HEAP
• Semantics the same as with the equivalent heap
thresholds
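The threshold semantics above can be illustrated with a minimal sketch. Everything in it is hypothetical: the class name, the `State` enum, and the choice to treat a percentage of 0 as "disabled" and to trigger at "greater than or equal" are assumptions of this example, not Geode's ResourceManager code.

```java
/** Hypothetical helper for illustration only; not part of the Geode API. */
final class OffHeapThresholdSketch {

    enum State { NORMAL, EVICTION, CRITICAL }

    /**
     * Classifies off-heap usage against the eviction and critical thresholds.
     * A threshold of 0 is treated as disabled (an assumption of this sketch).
     */
    static State classify(long usedBytes, long maxBytes,
                          float evictionPercent, float criticalPercent) {
        if (maxBytes <= 0 || usedBytes < 0 || usedBytes > maxBytes) {
            throw new IllegalArgumentException("usage out of range");
        }
        double usedPercent = 100.0 * usedBytes / maxBytes;
        if (criticalPercent > 0 && usedPercent >= criticalPercent) {
            return State.CRITICAL;   // puts into off-heap regions would be rejected
        }
        if (evictionPercent > 0 && usedPercent >= evictionPercent) {
            return State.EVICTION;   // eviction of off-heap entries would begin
        }
        return State.NORMAL;
    }
}
```

For example, with an eviction threshold of 80 and a critical threshold of 99, a member using 85 of 100 bytes would be classified as `EVICTION`, and one using 99 of 100 as `CRITICAL`.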
Startup Options
• Existing:
• --critical-heap-percentage
• --eviction-heap-percentage
• New:
• --critical-off-heap-percentage
• --eviction-off-heap-percentage
• Example:
start server --initial-heap=10G --max-heap=10G --off-heap-memory-size=200G --lock-memory=true --critical-off-heap-percentage=99
ResourceManager API
• GemFireCache#getResourceManager()
• com.gemstone.gemfire.cache.control.ResourceManager
– exposes getters/setters for all of the heap and off-heap threshold
percentages
– Examples:
▪ public void setCriticalOffHeapPercentage(float offHeapPercentage);
▪ public float getCriticalOffHeapPercentage();
Monitoring & Management
• Statistics
• MBeans
• gfsh
Statistics
name                 description
defragmentations     The total number of times off-heap memory has been defragmented.
defragmentationTime  The total time spent defragmenting off-heap memory.
fragmentation        The percentage of off-heap memory fragmentation. Updated every time a defragmentation is performed.
fragments            The number of fragments of free off-heap memory. Updated every time a defragmentation is done.
freeMemory           The amount of off-heap memory, in bytes, that is not being used.
largestFragment      The largest fragment of memory found by the last compaction of off-heap memory. Updated every time a defragmentation is done.
maxMemory            The maximum amount of off-heap memory, in bytes. This is the amount of memory allocated at startup and does not change.
objects              The number of objects stored in off-heap memory.
reads                The total number of reads of off-heap memory.
usedMemory           The amount of off-heap memory, in bytes, that is being used to store data.
MBeans
MemberMXBean
getOffHeapDefragmentationTime -- provides the value of the defragmentationTime statistic
getOffHeapFragmentation -- provides the value of the fragmentation statistic
getOffHeapFreeMemory -- provides the value of the freeMemory statistic
getOffHeapObjects -- provides the value of the objects statistic
getOffHeapUsedMemory -- provides the value of the usedMemory statistic
getOffHeapMaxMemory -- provides the value of freeMemory + usedMemory
RegionMXBean
listRegionAttributes (operation)
enableOffHeapMemory (true | false)
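As a rough illustration of how these values relate, the sketch below derives two figures from the freeMemory, usedMemory, and objects statistics. The class and method names are hypothetical, and the average it computes is only a coarse estimate, since the source does not say whether usedMemory includes per-chunk overhead.

```java
/** Hypothetical helper for illustration only; not part of the Geode API. */
final class OffHeapStatsSketch {

    /**
     * Percentage of off-heap memory in use, relying on the relationship
     * maxMemory = freeMemory + usedMemory stated for getOffHeapMaxMemory.
     */
    static double usedPercent(long freeMemory, long usedMemory) {
        long maxMemory = freeMemory + usedMemory;
        if (freeMemory < 0 || usedMemory < 0 || maxMemory <= 0) {
            throw new IllegalArgumentException("invalid memory statistics");
        }
        return 100.0 * usedMemory / maxMemory;
    }

    /** Rough mean number of off-heap bytes per stored object, or 0 when nothing is stored. */
    static long averageObjectBytes(long usedMemory, long objects) {
        return objects == 0 ? 0 : usedMemory / objects;
    }
}
```

A figure like the average object size can help judge whether values stay in the size range where allocation is fastest.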
Gfsh Support for Off-heap Memory
• alter disk-store: new option "--off-heap" for setting off-heap for each
region in the disk-store
• create region: new option "--off-heap" for setting off-heap
• describe member: now displays the off-heap size
• describe offline-disk-store: now shows if a region is off-heap
• describe region: now displays the off-heap region attribute
• show metrics: now has an offheap category. The offheap metrics
are: maxMemory, freeMemory, usedMemory, objects, fragmentation,
and defragmentationTime
• start server: added --lock-memory, --off-heap-memory-size,
--critical-off-heap-percentage, and --eviction-off-heap-percentage
Off-heap Limitations
• Maximum object size limited to slightly less than 2 GB
• All data nodes must consistently configure a region to be off-heap
• Functional Range Indexes not supported
• Keys, subscription queue entries not stored off-heap
• Fragmentation statistic is only updated during off-heap
compactions
Implementation Overview
Off-heap: How are We Doing It?
• Using memory that is separate from the Java heap
– Built our own Memory Manager
– Memory Manager is very finely tuned and specific to our usage
– Avoid GC overhead
▪ Avoid copying of objects for promotion between generations
▪ Garbage Collector is a major performance killer
– Use sun.misc.Unsafe API for performance
• Optimizing code to minimize usage of heap memory
• Using off-heap as primary store instead of overflowing to it
Off-heap Memory Management
Off-heap Implementation
• Memory allocated in 2GB slabs
– Max data value size: ~2GB
– Object values stored serialized; blobs stored as byte arrays
– Allocation faster for values < 128KB
▪ Controlled by a system property: gemfire.OFF_HEAP_FREE_LIST_COUNT
▪ First try to allocate from the free list; if that fails, allocate from unused memory
▪ Small values (< 8B) inlined (not using any off-heap space)
• Defragmentation consolidates free memory to minimize
fragmentation
– Blocks writes; best to avoid by minimizing fragmentation
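The allocation order described above (free list first, then unused memory) can be sketched as follows. This is a simplified model over a simulated address range, not Geode's memory manager: the class name, the 8-byte size granularity, and the one-free-list-per-rounded-size layout are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical free-list allocator over simulated addresses; not Geode's implementation. */
final class FreeListAllocatorSketch {
    private static final int ALIGNMENT = 8;   // assumed chunk size granularity
    private final long capacity;              // total simulated bytes available
    private long nextUnused = 0;              // start of memory that was never allocated
    private final Map<Integer, ArrayDeque<Long>> freeLists = new HashMap<>();

    FreeListAllocatorSketch(long capacity) {
        this.capacity = capacity;
    }

    /** Rounds a requested size up to the next multiple of ALIGNMENT. */
    static int roundUp(int size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Returns the address of a chunk, or -1 if neither source can satisfy the request. */
    long allocate(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        int chunkSize = roundUp(size);
        ArrayDeque<Long> list = freeLists.get(chunkSize);
        if (list != null && !list.isEmpty()) {
            return list.pop();                     // 1. reuse a freed chunk of this size
        }
        if (nextUnused + chunkSize <= capacity) {  // 2. carve a chunk from unused memory
            long address = nextUnused;
            nextUnused += chunkSize;
            return address;
        }
        // Free memory may still exist in lists for other sizes; consolidating
        // it is the job of defragmentation, which this sketch does not model.
        return -1;
    }

    /** Returns a chunk to the free list that matches its rounded size. */
    void free(long address, int size) {
        freeLists.computeIfAbsent(roundUp(size), k -> new ArrayDeque<>()).push(address);
    }
}
```

The sketch also hints at why updates of varying value size fragment memory: a freed chunk is only reused by a later request that rounds to the same size.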
Off-heap Implementation (cont’d)
• Allocated chunks
– Header
▪ isSerialized
▪ isCompressed
▪ Size
▪ Padding size
• Free chunks
– Header
▪ Size
▪ Address of next chunk in the list
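One way to picture an allocated-chunk header is as a few bit fields packed into a single word. The layout below is purely an assumption for illustration (a 64-bit header with flag bits at the top, a 3-bit padding field, and a 32-bit size); the slides list the fields but not their widths or positions, and the interpretation of padding as the unused tail of the chunk is also assumed.

```java
/** Hypothetical header layout for illustration only; not Geode's actual chunk format. */
final class ChunkHeaderSketch {
    private static final long SERIALIZED_BIT = 1L << 63;   // assumed position
    private static final long COMPRESSED_BIT = 1L << 62;   // assumed position
    private static final int PADDING_SHIFT = 32;
    private static final long PADDING_MASK = 0x7L;          // padding of 0..7 bytes
    private static final long SIZE_MASK = 0xFFFF_FFFFL;     // low 32 bits hold the size

    /** Packs the four header fields into one long. */
    static long pack(boolean serialized, boolean compressed, int chunkSize, int paddingSize) {
        if (chunkSize < 0 || paddingSize < 0 || paddingSize > PADDING_MASK) {
            throw new IllegalArgumentException("field out of range");
        }
        long header = chunkSize & SIZE_MASK;
        header |= ((long) paddingSize) << PADDING_SHIFT;
        if (serialized) {
            header |= SERIALIZED_BIT;
        }
        if (compressed) {
            header |= COMPRESSED_BIT;
        }
        return header;
    }

    static boolean isSerialized(long header) { return (header & SERIALIZED_BIT) != 0; }

    static boolean isCompressed(long header) { return (header & COMPRESSED_BIT) != 0; }

    static int chunkSize(long header) { return (int) (header & SIZE_MASK); }

    static int paddingSize(long header) {
        return (int) ((header >>> PADDING_SHIFT) & PADDING_MASK);
    }

    /** Bytes of value data, assuming padding is the unused tail of the chunk. */
    static int dataSize(long header) { return chunkSize(header) - paddingSize(header); }
}
```

Keeping the flags in the header lets a reader decide how to interpret the bytes (serialized object or raw byte array, compressed or not) without consulting any on-heap metadata.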
What is Stored On-heap vs. Off-heap
Always Stored On-heap          Stored Off-heap
Region Meta-Data               Values
Entry Meta-Data                Reference Counts
  - Off-Heap Addresses         Lists of Free Memory Blocks
Keys                           WAN Queue Elements
Indexes
Subscription Queue Elements
Preliminary Benchmarks: Off-heap vs. Heap
Off-heap: Initial Testing Results
• 256 GB user data per node across 8 nodes for total of 2 TB
of user data
• The heap-only test used roughly twice the CPU to produce
one third of the updates of the off-heap test
– Details on the next slide
• Succeeded in scaling up to much larger in-memory data
• Increased throughput of operations for large data sets
Heap vs. Off-Heap Comparison
                       Java Heap                 Off-Heap
creates/sec            30,000                    45,000
updates/sec            17,000 (std dev: 2130)    51,000 (std dev: 737)
Java RSS size          50 GB                     32 GB
CPU load               70% (load avg 10 cpus)    32% (load avg 5 cpus)
JVM GC                 ConcurrentMarkSweep       ConcurrentMarkSweep
GC ms/sec              777 ms                    24 ms
GC marks (GC pauses)   1 per 30 sec              never
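The figures in the table can be combined into two derived numbers. The sketch below is only arithmetic on the reported values; the class and method names are hypothetical, and dividing throughput by CPU load is a simplification chosen for this example rather than a metric from the benchmark itself.

```java
/** Hypothetical arithmetic helpers for reading the comparison table. */
final class BenchmarkRatiosSketch {

    /** Updates per second delivered per percentage point of CPU load. */
    static double updatesPerCpuPoint(double updatesPerSec, double cpuLoadPercent) {
        return updatesPerSec / cpuLoadPercent;
    }

    /** Share of wall-clock time spent in GC, given milliseconds of GC per second. */
    static double gcTimeShare(double gcMsPerSec) {
        return gcMsPerSec / 1000.0;
    }
}
```

With the table's numbers, the heap run delivers about 243 updates/sec per CPU point (17,000 / 70) and the off-heap run about 1,594 (51,000 / 32), a ratio of roughly 6.6. The GC share of wall-clock time falls from 0.777 (777 ms/sec) to 0.024 (24 ms/sec).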
Recommendations and Best Practices
Off-heap Rules of Thumb
• Try to minimize fragmentation
– In order to avoid defragmentation
– Beware of usage patterns that lead to fragmentation
▪ Many updates of varying value size
▪ Use cases that require a lot of serialization/deserialization
• Beware of more expensive features
– Deltas
– Querying
– Compression
– PDX
Off-heap Recommendations
• Recommended when
– The values are relatively uniform in size
– The values are mostly less than 128K in size (configurable)
– The usage patterns involve cycles of many creates followed by
destroys or clear
– The values do not need to be frequently deserialized
• Configure all data nodes with the same off-heap-memory-size
Future
How about...
• Storing keys, indexes, subscription queues off-heap?
• An API to invoke a defragmentation?
• A way to configure the slab size?
• A way to configure the max value size for the most efficient
off-heap allocation, or maybe the size increment?
• A different defragmentation algorithm/policy?
• Anything else?
Questions?
Feedback Welcome!
Full spec at:
https://cwiki.apache.org/confluence/display/GEODE/Off-Heap+Memory+Spec
Join the Apache Geode Community!
• Check out: http://geode.incubator.apache.org
• Subscribe: user-subscribe@geode.incubator.apache.org
• Download: http://geode.incubator.apache.org/releases/
Thank You!

#GeodeSummit - Off-Heap Storage Current and Future Design
