Apache Geode Off-heap Storage
Agenda
• Motivation and goals for off-heap storage
• Off-heap features and usage
• Implementation overview
• Preliminary benchmarks: off-heap vs. heap
• Tips and best practices
Motivation and goals for off-heap storage
Why Off-heap
• Increase data density and reduce memory overhead
• 50+ GB user data in one JVM
• 10+ TB user data in one cluster
• Usable out of the box, without extensive JVM GC tuning
• Maintain existing throughput performance
Off-heap Usage and Features
Off-heap: How Do I Use It?
• Set the off-heap memory size for the process
– Using the new property: off-heap-memory-size
• Mark regions whose entry values should be stored off-heap
– Using the new region attribute: off-heap (false | true)
• Adjust the JVM heap memory size down accordingly
– The smaller the better; try to keep it below 32 GB, where the JVM can still use compressed object pointers
• Optionally
– Configure Resource Manager
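A minimal sketch of the first and third steps, assuming the member's settings are collected in a `gemfire.properties`-style `java.util.Properties` object. The `off-heap-memory-size` name comes from this slide; `lock-memory` is assumed to mirror the startup option of the same name, and `OffHeapConfigSketch` with its 32 GB check is hypothetical. Marking a region off-heap is a separate per-region attribute and is not shown.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

class OffHeapConfigSketch {
    // Illustrative bound taken from the advice to keep the heap below 32 GB.
    static final long MAX_RECOMMENDED_HEAP_BYTES = 32L * 1024 * 1024 * 1024;

    // Builds member properties that reserve off-heap memory for region values.
    static Properties memberProperties(String offHeapSize, boolean lockMemory) {
        Properties props = new Properties();
        props.setProperty("off-heap-memory-size", offHeapSize); // same format as the gfsh example, e.g. "200G"
        props.setProperty("lock-memory", Boolean.toString(lockMemory)); // assumed property name
        return props;
    }

    // Hypothetical check for the "adjust the JVM heap down" step.
    static boolean heapLooksReasonable(long maxHeapBytes) {
        return maxHeapBytes < MAX_RECOMMENDED_HEAP_BYTES;
    }

    public static void main(String[] args) throws IOException {
        Properties props = memberProperties("200G", true);
        StringWriter out = new StringWriter();
        props.store(out, "off-heap member settings (sketch)");
        System.out.print(out);
        System.out.println("10 GB heap below 32 GB? " + heapLooksReasonable(10L * 1024 * 1024 * 1024));
    }
}
```

In a real deployment the same name/value pairs would normally live in the member's properties file or be passed as startup options; the point of the sketch is only that off-heap sizing is a per-process setting, while the heap is shrunk separately.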
Off-heap Features
• Startup options
• Interaction with other features
• Resource Manager
• Monitoring & Management
• Limitations
Startup Options
• --off-heap-memory-size: specifies the amount of off-heap
memory to allocate
• --lock-memory: locks the process's memory so the OS cannot page it out
• Example:
gfsh start server --initial-heap=10G --max-heap=10G --off-heap-memory-size=200G --lock-memory=true
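The value given to `--off-heap-memory-size` is a number with a unit suffix. The sketch below assumes only `M` and `G` suffixes (binary units) are accepted, converts the value to bytes, and estimates how many slabs a 200G setting would be carved into using the 2 GB slab size from the implementation slides. It treats 2 GB as exactly 2^31 bytes, which is an assumption; `OffHeapSizeSketch`, `parseSize`, and `slabCount` are hypothetical names, not Geode APIs.

```java
import java.util.Locale;

class OffHeapSizeSketch {
    // Slab size from the implementation slides, assumed here to be exactly 2^31 bytes.
    static final long SLAB_BYTES = 2L * 1024 * 1024 * 1024;

    // Parses values such as "200G" or "512m" into bytes (binary units assumed).
    static long parseSize(String value) {
        String v = value.trim().toUpperCase(Locale.ROOT);
        char unit = v.charAt(v.length() - 1);
        long number = Long.parseLong(v.substring(0, v.length() - 1));
        switch (unit) {
            case 'M': return number * 1024 * 1024;
            case 'G': return number * 1024 * 1024 * 1024;
            default: throw new IllegalArgumentException("expected an M or G suffix: " + value);
        }
    }

    // Number of slabs needed for the requested size; the last slab is assumed to be partial.
    static long slabCount(long totalBytes) {
        return (totalBytes + SLAB_BYTES - 1) / SLAB_BYTES;
    }

    public static void main(String[] args) {
        long bytes = parseSize("200G");
        System.out.println(bytes + " bytes in " + slabCount(bytes) + " slabs");
    }
}
```

Under these assumptions the example command reserves 214,748,364,800 bytes, split into 100 slabs.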
Off-heap Interaction with Other Features
• PDX
– Values are currently copied from off-heap memory to create a PdxInstance
• Deltas: expensive
• Compression: compatible with off-heap
• Querying: more expensive with off-heap
• EntryEvents
– Limited availability of oldValue, newValue
• Indexes
– Functional range indexes not supported (too expensive)
Off-heap and Resource Manager
• Out of Memory Semantics
• Eviction and Critical Thresholds
• Resource Manager API
Out of Memory occurs when...
• Java heap runs out of memory
– Threads start throwing OutOfMemoryError
• Off-heap runs out of memory
– Threads start throwing OutOfOffHeapMemoryException
• In either case, the Geode member closes and disconnects
– Closes the Cache to prevent reading inconsistent data
– Disconnects from the Geode cluster to prevent distribution problems or hangs
Eviction and Critical Thresholds for Java Heap
• CriticalHeapPercentage
– triggers LowMemoryException for puts into heap regions
– default is 90%
– critical member informs other members that it is critical
• EvictionHeapPercentage
– triggers eviction of entries in heap regions configured with
LRU_HEAP
– default is 90% of CriticalHeapPercentage
Eviction and Critical Thresholds for Off-heap
• CriticalOffHeapPercentage
– triggers LowMemoryException for puts into off-heap regions
– default is 90% if --off-heap-memory-size is specified
– critical member informs other members that it is critical
• EvictionOffHeapPercentage
– triggers eviction of entries in off-heap regions configured with
LRU_HEAP
– default is 90% of CriticalOffHeapPercentage if --off-heap-memory-size is specified
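As a worked example of these defaults, take `--off-heap-memory-size=200G` with no explicit thresholds: the critical threshold is 90% and the eviction threshold is 90% of that, i.e. 81%. The sketch below turns those percentages into byte limits; `OffHeapThresholdsSketch` is a hypothetical helper, since Geode computes these limits internally.

```java
class OffHeapThresholdsSketch {
    static final long GIB = 1024L * 1024 * 1024;

    // Default critical threshold when --off-heap-memory-size is set (from the slide).
    static final double DEFAULT_CRITICAL_PERCENT = 90.0;

    // Default eviction threshold: 90% of the critical percentage (from the slide).
    static double defaultEvictionPercent(double criticalPercent) {
        return criticalPercent * 0.9;
    }

    // Converts a percentage of the off-heap size into a byte limit.
    static long limitBytes(long offHeapBytes, double percent) {
        return Math.round(offHeapBytes * percent / 100.0);
    }

    public static void main(String[] args) {
        long offHeap = 200 * GIB;
        double critical = DEFAULT_CRITICAL_PERCENT;
        double eviction = defaultEvictionPercent(critical);
        System.out.printf("eviction at %.1f%% = %d GiB%n", eviction, limitBytes(offHeap, eviction) / GIB);
        System.out.printf("critical at %.1f%% = %d GiB%n", critical, limitBytes(offHeap, critical) / GIB);
    }
}
```

With 200 GiB off-heap, eviction starts at 162 GiB and puts are refused with LowMemoryException at 180 GiB. If the critical percentage is raised to 99, as in the startup example that follows, and the eviction default still tracks it, the same helper gives 89.1%, about 178.2 GiB.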
Startup Options
• Existing:
– --critical-heap-percentage
– --eviction-heap-percentage
• New:
– --critical-off-heap-percentage
– --eviction-off-heap-percentage
• Example:
gfsh start server --initial-heap=10G --max-heap=10G --off-heap-memory-size=200G --lock-memory=true --critical-off-heap-percentage=99
ResourceManager API
• GemFireCache#getResourceManager()
• com.gemstone.gemfire.cache.control.ResourceManager
– exposes getters/setters for all of the heap and off-heap threshold
percentages
– Examples:
▪ public void setCriticalOffHeapPercentage(float offHeapPercentage);
▪ public float getCriticalOffHeapPercentage();
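The threshold methods follow the usual JavaBean getter/setter pattern. Because the Geode classes are not assumed to be on the classpath here, the sketch below uses a hypothetical `OffHeapThresholdSettings` class that mirrors the signatures on the slide and adds a range check; the 0 to 100 validation is an assumption, not documented ResourceManager behavior. Against a real cache the call would go through `getResourceManager().setCriticalOffHeapPercentage(...)`.

```java
class OffHeapThresholdSettings {
    // Assumed starting values matching the documented defaults (90%, and 90% of 90%).
    private float criticalOffHeapPercentage = 90f;
    private float evictionOffHeapPercentage = 81f;

    // Hypothetical validation: reject values outside 0..100.
    private static float checkPercentage(float value) {
        if (Float.isNaN(value) || value < 0f || value > 100f) {
            throw new IllegalArgumentException("percentage must be between 0 and 100: " + value);
        }
        return value;
    }

    public void setCriticalOffHeapPercentage(float offHeapPercentage) {
        criticalOffHeapPercentage = checkPercentage(offHeapPercentage);
    }

    public float getCriticalOffHeapPercentage() {
        return criticalOffHeapPercentage;
    }

    public void setEvictionOffHeapPercentage(float offHeapPercentage) {
        evictionOffHeapPercentage = checkPercentage(offHeapPercentage);
    }

    public float getEvictionOffHeapPercentage() {
        return evictionOffHeapPercentage;
    }

    public static void main(String[] args) {
        OffHeapThresholdSettings settings = new OffHeapThresholdSettings();
        settings.setCriticalOffHeapPercentage(99f); // same value as the gfsh example
        System.out.println("critical = " + settings.getCriticalOffHeapPercentage());
        try {
            settings.setCriticalOffHeapPercentage(120f);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```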
Monitoring & Management
• Statistics
• MBeans
• gfsh
Statistics
• compactions: The total number of times off-heap memory has been compacted.
• compactionTime: The total time spent compacting off-heap memory.
• fragmentation: The percentage of off-heap memory fragmentation. Updated every time a compaction is performed.
• fragments: The number of fragments of free off-heap memory. Updated every time a compaction is done.
• freeMemory: The amount of off-heap memory, in bytes, that is not being used.
• largestFragment: The largest fragment of memory found by the last compaction of off-heap memory. Updated every time a compaction is done.
• maxMemory: The maximum amount of off-heap memory, in bytes. This is the amount of memory allocated at startup and does not change.
• objects: The number of objects stored in off-heap memory.
• reads: The total number of reads of off-heap memory.
• usedMemory: The amount of off-heap memory, in bytes, that is being used to store data.
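The statistics relate to each other in simple ways. The sketch below assumes `maxMemory = freeMemory + usedMemory` (the relationship stated for getOffHeapMaxMemory on the MemberMXBean slide) and assumes fragmentation is the share of free memory lying outside the largest free fragment; Geode's exact formula may differ. `OffHeapStatsSnapshot` is a hypothetical record, and the numbers in `main` are invented for illustration.

```java
record OffHeapStatsSnapshot(long freeMemory, long usedMemory, long largestFragment) {
    // Assumed to match getOffHeapMaxMemory: free + used.
    long maxMemory() {
        return freeMemory + usedMemory;
    }

    // Percentage of off-heap memory currently holding data.
    double usedPercent() {
        return 100.0 * usedMemory / maxMemory();
    }

    // Assumed definition: free bytes outside the largest fragment, as a percentage of free bytes.
    int fragmentationPercent() {
        if (freeMemory == 0) {
            return 0;
        }
        return (int) ((freeMemory - largestFragment) * 100 / freeMemory);
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        // Invented snapshot: 150 GiB used, 50 GiB free, largest contiguous free run 40 GiB.
        OffHeapStatsSnapshot s = new OffHeapStatsSnapshot(50 * gib, 150 * gib, 40 * gib);
        System.out.printf("max=%d GiB used=%.1f%% fragmentation=%d%%%n",
                s.maxMemory() / gib, s.usedPercent(), s.fragmentationPercent());
    }
}
```

Because fragments and largestFragment are only refreshed by a compaction, a derived value like this is only as current as the last compaction.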
MBeans
• MemberMXBean
– getOffHeapCompactionTime: provides the value of the compactionTime statistic
– getOffHeapFragmentation: provides the value of the fragmentation statistic
– getOffHeapFreeMemory: provides the value of the freeMemory statistic
– getOffHeapObjects: provides the value of the objects statistic
– getOffHeapUsedMemory: provides the value of the usedMemory statistic
– getOffHeapMaxMemory: provides the value of freeMemory + usedMemory
• RegionMXBean
– listRegionAttributes (operation): reports enableOffHeapMemory (true | false)
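MXBean getters such as getOffHeapFreeMemory surface to JMX clients as attributes named without the `get` prefix (here `OffHeapFreeMemory`). The self-contained sketch below registers a hypothetical `OffHeapStatsMXBean` on the platform MBean server and reads an attribute back the way a JMX client would. The interface, its fixed values, and the ObjectName `example:type=OffHeapSketch` are assumptions and do not reproduce Geode's MemberMXBean or its object names.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxReadSketch {
    // Hypothetical subset of the MemberMXBean off-heap getters; the "MXBean" suffix makes it an MXBean.
    public interface OffHeapStatsMXBean {
        long getOffHeapFreeMemory();
        long getOffHeapUsedMemory();
        long getOffHeapMaxMemory();
    }

    // Fixed values standing in for live statistics.
    public static class OffHeapStats implements OffHeapStatsMXBean {
        private final long free;
        private final long used;

        public OffHeapStats(long free, long used) {
            this.free = free;
            this.used = used;
        }

        @Override public long getOffHeapFreeMemory() { return free; }
        @Override public long getOffHeapUsedMemory() { return used; }
        // Mirrors the slide: max memory is reported as free + used.
        @Override public long getOffHeapMaxMemory() { return free + used; }
    }

    // Registers the bean, reads one attribute as a JMX client would, then unregisters it.
    static long readAttribute(String attribute, long free, long used) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName name = new ObjectName("example:type=OffHeapSketch");
            server.registerMBean(new OffHeapStats(free, used), name);
            try {
                return (Long) server.getAttribute(name, attribute);
            } finally {
                server.unregisterMBean(name);
            }
        } catch (JMException e) {
            throw new IllegalStateException("JMX call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("OffHeapMaxMemory = " + readAttribute("OffHeapMaxMemory", 50L, 150L));
    }
}
```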
Gfsh Support for Off-heap Memory
• alter disk-store: new option "--off-heap" for setting off-heap for each
region in the disk-store
• create region: new option "--off-heap" for setting off-heap
• describe member: now displays the off-heap size
• describe offline-disk-store: now shows if a region is off-heap
• describe region: now displays the off-heap region attribute
• show metrics: now has an offheap category. The offheap metrics
are: maxMemory, freeMemory, usedMemory, objects, fragmentation,
and compactionTime
• start server: added --lock-memory, --off-heap-memory-size,
--critical-off-heap-percentage, and --eviction-off-heap-percentage
Off-heap Limitations
• Maximum object size limited to slightly less than 2 GB
• All data nodes must consistently configure a region to be off-heap
• Functional Range Indexes not supported
• Keys, subscription queue entries not stored off-heap
• Fragmentation statistic is only updated during off-heap
compactions
Implementation Overview
Off-heap: How are We Doing It?
• Using memory that is separate from the Java heap
– Build our own Memory Manager
– Memory Manager is very finely tuned and specific to our usage
– Avoid GC overhead
▪ Avoid copying of objects for promotion between generations
▪ Garbage Collector is a major performance killer
– Use sun.misc.Unsafe API for performance
• Optimizing code to minimize usage of heap memory
• Using off-heap as primary store instead of overflowing to it
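The core idea, keeping value bytes in memory the garbage collector never scans or copies, can be illustrated without `sun.misc.Unsafe`. The sketch below uses a direct `ByteBuffer`, which the JDK also allocates outside the Java heap, as a portable stand-in. Geode's memory manager uses Unsafe and its own layout, so `DirectStoreSketch` and its length-prefixed format are illustrative assumptions only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class DirectStoreSketch {
    // Memory outside the Java heap; the GC tracks only the small ByteBuffer object itself.
    private final ByteBuffer memory;

    DirectStoreSketch(int capacityBytes) {
        memory = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Appends a length-prefixed value and returns its offset ("address").
    // Throws BufferOverflowException when full, loosely analogous to running out of off-heap memory.
    int store(byte[] value) {
        int address = memory.position();
        memory.putInt(value.length);
        memory.put(value);
        return address;
    }

    // Copies a value back onto the heap, as a read must do before deserializing it.
    byte[] load(int address) {
        int length = memory.getInt(address);
        byte[] copy = new byte[length];
        ByteBuffer view = memory.duplicate(); // independent position, same off-heap bytes
        view.position(address + Integer.BYTES);
        view.get(copy);
        return copy;
    }

    public static void main(String[] args) {
        DirectStoreSketch store = new DirectStoreSketch(1024);
        int first = store.store("hello".getBytes(StandardCharsets.UTF_8));
        int second = store.store("off-heap".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.load(first), StandardCharsets.UTF_8) + " at " + first);
        System.out.println(new String(store.load(second), StandardCharsets.UTF_8) + " at " + second);
    }
}
```

The copy in `load` is also why off-heap reads that need a deserialized object cost more than heap reads.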
Off-heap Memory Management
Off-heap Implementation
• Memory allocated in 2GB slabs
– Max data value size: ~2GB
– Object values stored serialized; blobs stored as byte arrays
– Allocation faster for values < 128KB
▪ Controlled by a system property: gemfire.OFF_HEAP_FREE_LIST_COUNT
▪ First try to allocate from the free list; if that fails, allocate from unused memory
▪ Small values (< 8B) inlined (not using any off-heap space)
• Compaction consolidates free memory to minimize
fragmentation
– Blocks writes; best to avoid by minimizing fragmentation
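The allocation path above can be sketched as a toy allocator handing out offsets within one slab. It assumes sizes are rounded up to an 8-byte multiple with one exact-size free list per multiple; with 16,384 lists that gives 16,384 × 8 bytes = 128 KB, which is one plausible way a free-list count could set the cutoff. Larger requests, inlining of values under 8 bytes, chunk headers, and compaction are left out, and `FreeListAllocatorSketch` is hypothetical, not Geode's implementation.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

class FreeListAllocatorSketch {
    static final int MULTIPLE = 8;                           // assumed size granularity
    static final int FREE_LIST_COUNT = 16_384;               // assumed number of size classes
    static final int MAX_SMALL = MULTIPLE * FREE_LIST_COUNT; // 131,072 bytes = 128 KB

    private final long slabSize;
    private long unusedStart = 0; // first byte never handed out
    private final Map<Integer, ArrayDeque<Long>> freeLists = new HashMap<>();

    FreeListAllocatorSketch(long slabSize) {
        this.slabSize = slabSize;
    }

    static int roundUp(int size) {
        return (size + MULTIPLE - 1) / MULTIPLE * MULTIPLE;
    }

    // Returns the offset of a chunk of at least `size` bytes.
    long allocate(int size) {
        if (size <= 0 || size >= MAX_SMALL) {
            throw new IllegalArgumentException("sketch covers only values smaller than 128 KB");
        }
        int chunk = roundUp(size);
        ArrayDeque<Long> list = freeLists.get(chunk);
        if (list != null && !list.isEmpty()) {
            return list.pop(); // 1. reuse a freed chunk of the same size class
        }
        if (unusedStart + chunk > slabSize) {
            throw new IllegalStateException("out of off-heap memory (sketch)");
        }
        long address = unusedStart; // 2. otherwise carve from never-used memory
        unusedStart += chunk;
        return address;
    }

    // Returns a chunk to the free list for its size class.
    void free(long address, int size) {
        freeLists.computeIfAbsent(roundUp(size), k -> new ArrayDeque<>()).push(address);
    }

    public static void main(String[] args) {
        FreeListAllocatorSketch slab = new FreeListAllocatorSketch(1 << 20);
        long a = slab.allocate(100); // rounds to 104 bytes at offset 0
        long b = slab.allocate(100); // next 104 bytes at offset 104
        slab.free(a, 100);
        long c = slab.allocate(97);  // same 104-byte class, so offset 0 is reused
        System.out.println(a + " " + b + " " + c);
    }
}
```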
Off-heap Implementation (cont’d)
• Allocated chunks
– Header
▪ isSerialized
▪ isCompressed
▪ Size
▪ Padding size
• Free chunks
– Header
▪ Size
▪ Address of next chunk in the list
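The header fields above can be packed into a few bytes at the start of each chunk. The sketch below uses a hypothetical 8-byte layout written into a direct `ByteBuffer`: for an allocated chunk, a 4-byte size, a flags byte holding isSerialized and isCompressed, and a padding-size byte; for a free chunk, the same 4-byte size followed by a 4-byte offset of the next free chunk. Geode's real field widths are not given on the slide, so `ChunkHeaderSketch` illustrates the idea only.

```java
import java.nio.ByteBuffer;

class ChunkHeaderSketch {
    static final int HEADER_BYTES = 8;        // assumed header width
    static final int SERIALIZED_BIT = 1;      // bit 0 of the flags byte
    static final int COMPRESSED_BIT = 1 << 1; // bit 1 of the flags byte

    // Allocated chunk: [size:int][flags:byte][padding:byte][unused:2 bytes]
    static void writeHeader(ByteBuffer memory, int address, int size,
                            boolean serialized, boolean compressed, int padding) {
        int flags = (serialized ? SERIALIZED_BIT : 0) | (compressed ? COMPRESSED_BIT : 0);
        memory.putInt(address, size);
        memory.put(address + 4, (byte) flags);
        memory.put(address + 5, (byte) padding);
    }

    // Free chunk: [size:int][next free chunk offset:int]
    static void writeFreeHeader(ByteBuffer memory, int address, int size, int nextFree) {
        memory.putInt(address, size);
        memory.putInt(address + 4, nextFree);
    }

    static int size(ByteBuffer memory, int address) { return memory.getInt(address); }
    static boolean isSerialized(ByteBuffer memory, int address) { return (memory.get(address + 4) & SERIALIZED_BIT) != 0; }
    static boolean isCompressed(ByteBuffer memory, int address) { return (memory.get(address + 4) & COMPRESSED_BIT) != 0; }
    static int paddingSize(ByteBuffer memory, int address) { return memory.get(address + 5) & 0xFF; }
    static int nextFree(ByteBuffer memory, int address) { return memory.getInt(address + 4); }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocateDirect(64);
        // A 13-byte serialized value padded by 3 bytes to a 16-byte payload.
        writeHeader(memory, 0, 13, true, false, 3);
        System.out.println(size(memory, 0) + " " + isSerialized(memory, 0) + " "
                + isCompressed(memory, 0) + " " + paddingSize(memory, 0));
        // A free 32-byte chunk whose successor in the free list starts at offset 48.
        writeFreeHeader(memory, 24, 32, 48);
        System.out.println(size(memory, 24) + " -> " + nextFree(memory, 24));
    }
}
```

Reusing the same bytes for "flags and padding" or "next pointer" works because a chunk is either allocated or free, never both.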
What is Stored On-heap vs. Off-heap
• Stored on-heap
– Region Meta-Data
– Entry Meta-Data
– Off-Heap Addresses
– Keys
– Indexes
– Subscription Queue Elements
• Stored off-heap
– Values
– Reference Counts
– Lists of Free Memory Blocks
– WAN Queue Elements
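The split above can be illustrated with a toy region: keys and off-heap addresses live in an ordinary on-heap map, while each value and its reference count live in direct memory. The reference count lets a value survive while a reader still holds it after the entry has been replaced. The layout (4-byte count, 4-byte length, then the bytes) and the names `ToyOffHeapRegion`, `retain`, and `release` are hypothetical; the sketch is single-threaded, never reuses memory, and reduces freeing to a counter.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

class ToyOffHeapRegion {
    private final ByteBuffer offHeap = ByteBuffer.allocateDirect(4096); // values + reference counts
    private final Map<String, Integer> entries = new HashMap<>();      // on-heap: key -> off-heap address
    private int nextAddress = 0;
    int freedChunks = 0; // stands in for returning chunks to a free list

    // Stores a value with refcount 1 (owned by the entry) and drops the entry's reference to the old value.
    void put(String key, byte[] value) {
        int address = nextAddress;
        offHeap.putInt(address, 1);
        offHeap.putInt(address + 4, value.length);
        for (int i = 0; i < value.length; i++) {
            offHeap.put(address + 8 + i, value[i]);
        }
        nextAddress += 8 + value.length;
        Integer old = entries.put(key, address);
        if (old != null) {
            release(old);
        }
    }

    // A reader pins the current value so a later replace does not free it while in use.
    int retain(String key) {
        int address = entries.get(key);
        offHeap.putInt(address, offHeap.getInt(address) + 1);
        return address;
    }

    void release(int address) {
        int count = offHeap.getInt(address) - 1;
        offHeap.putInt(address, count);
        if (count == 0) {
            freedChunks++; // a real memory manager would put the chunk on a free list here
        }
    }

    int refCount(int address) {
        return offHeap.getInt(address);
    }

    public static void main(String[] args) {
        ToyOffHeapRegion region = new ToyOffHeapRegion();
        region.put("k", new byte[] {1, 2, 3});
        int pinned = region.retain("k");   // a reader holds the old value
        region.put("k", new byte[] {4});   // the entry now points at a new value
        System.out.println("old value refcount: " + region.refCount(pinned));
        region.release(pinned);            // reader done, so the old chunk can be freed
        System.out.println("freed chunks: " + region.freedChunks);
    }
}
```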
Preliminary Benchmarks: Off-heap vs. Heap
Off-heap: Initial Testing Results
• 256 GB user data per node across 8 nodes for total of 2 TB
of user data
• The heap-only test used roughly twice the CPU (70% vs. 32% load) to produce one third as many updates (17,000 vs. 51,000 per second) as the off-heap test
– Details on the next slide
• Succeeded in scaling up to much larger in-memory data
• Increased throughput of operations for large data sets
Heap vs. Off-Heap Comparison
Metric                 Java Heap                  Off-Heap
creates/sec            30,000                     45,000
updates/sec            17,000 (std dev: 2,130)    51,000 (std dev: 737)
Java RSS size          50 GB                      32 GB
CPU load               70% (load avg 10 CPUs)     32% (load avg 5 CPUs)
JVM GC                 ConcurrentMarkSweep        ConcurrentMarkSweep
GC ms/sec              777 ms                     24 ms
GC marks (GC pauses)   1 per 30 sec               never
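The summary on the previous slide can be checked against this table with simple ratios. The sketch below uses only the table's figures; nothing is re-measured. Off-heap delivered 3.0× the updates/sec and 1.5× the creates/sec, while the heap run used about 2.19× the CPU load and about 32.4× the GC time per second.

```java
class BenchmarkRatios {
    // Figures copied from the comparison table (Java heap vs. off-heap).
    static final double HEAP_UPDATES = 17_000, OFF_HEAP_UPDATES = 51_000;
    static final double HEAP_CREATES = 30_000, OFF_HEAP_CREATES = 45_000;
    static final double HEAP_CPU = 70, OFF_HEAP_CPU = 32;
    static final double HEAP_GC_MS = 777, OFF_HEAP_GC_MS = 24;

    static double ratio(double numerator, double denominator) {
        return numerator / denominator;
    }

    public static void main(String[] args) {
        System.out.printf("updates/sec, off-heap vs. heap: %.1fx%n", ratio(OFF_HEAP_UPDATES, HEAP_UPDATES));
        System.out.printf("creates/sec, off-heap vs. heap: %.1fx%n", ratio(OFF_HEAP_CREATES, HEAP_CREATES));
        System.out.printf("CPU load, heap vs. off-heap:    %.2fx%n", ratio(HEAP_CPU, OFF_HEAP_CPU));
        System.out.printf("GC ms/sec, heap vs. off-heap:   %.1fx%n", ratio(HEAP_GC_MS, OFF_HEAP_GC_MS));
    }
}
```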
Recommendations and Best Practices
Off-heap Rules of Thumb
• Avoid fragmentation, so that compaction (which blocks writes) is rarely needed
– Avoid usage patterns that lead to fragmentation, such as many updates with varying value sizes
• Avoid “unfriendly” features
– Deltas
– Functional Range Indexes
– Querying
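To see why many updates of varying size fragment memory, the toy simulation below assumes freed chunks are reused only by a request of exactly the same size and that compaction never runs. It replays a series of updates to one entry (each update frees the old value and allocates the new one) and reports how many freed bytes are left sitting unused. `FragmentationDemo` is illustrative; Geode's allocator and compaction are more involved.

```java
import java.util.HashMap;
import java.util.Map;

class FragmentationDemo {
    // Returns the bytes left in free lists after replaying the given sequence of value sizes.
    static long strandedFreeBytes(int[] updateSizes) {
        Map<Integer, Integer> freeChunksBySize = new HashMap<>();
        int current = updateSizes[0];
        for (int i = 1; i < updateSizes.length; i++) {
            int next = updateSizes[i];
            freeChunksBySize.merge(current, 1, Integer::sum); // old value's chunk is freed
            // The new value reuses a freed chunk only on an exact size match (assumed policy).
            freeChunksBySize.computeIfPresent(next, (size, n) -> n > 1 ? n - 1 : null);
            current = next;
        }
        long total = 0;
        for (Map.Entry<Integer, Integer> e : freeChunksBySize.entrySet()) {
            total += (long) e.getKey() * e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("uniform sizes: " + strandedFreeBytes(new int[] {100, 100, 100, 100}));
        System.out.println("varying sizes: " + strandedFreeBytes(new int[] {100, 200, 300, 400}));
    }
}
```

Uniform sizes leave nothing behind, while the varying sequence strands 600 freed bytes that only a compaction could consolidate, which matches the advice to prefer values of relatively uniform size.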
Off-heap Recommendations
• Do use when
– The values are relatively uniform in size
– The values are mostly smaller than 128 KB
– The usage patterns involve cycles of many creates followed by
destroys or clear
– The values do not need to be frequently deserialized
• Configure all data nodes with the same off-heap-memory-size
Questions for You...
We’d appreciate your thoughts...
• Would you like an API to invoke a compaction?
• Would you like to be able to configure the slab size?
• Would you like to configure the max value size for the most
efficient off-heap allocation, or maybe the size increment?
• Anything else?
• Full spec at:
https://cwiki.apache.org/confluence/display/GEODE/Off-Heap+Memory+Spec
Thank You!
