Ever tried to get clarity on what kinds of memory there are and how to tune each of them? If not, your jobs are very likely configured incorrectly. As we found out, it is not straightforward, and it is not well documented either. This session covers the types of memory to be aware of, the calculations that determine how much is allocated to each type, and how to tune them depending on the use case.
2. Why Tune?
• Important to know how much data can be stored in the chosen state backend
• This also dictates the parallelism of stateful operators
• Under-allocating leads to the job crashing with OOM
• Over-allocating (via more parallelism or a larger container size) wastes $$$
• Tuning discussion here is centered around
• Streaming jobs
• Yarn containers
3. TaskMgr Container Memory Layout
[Diagram: container memory layout. Regions: “Cut Off” Space, Java Metaspace, Flink Network Buffers, TaskMgr Managed Memory, JVM Heap. Spans: Yarn Container Size, Available to Flink.]
Cut Off + Available ≈ Container Size
For now, ignore the JVM metaspace size
4. “Cut Off” Space
“Cut Off” Space:
• Safety Zone: If the JVM tries to exceed the container limit, the container will be killed. By “cutting off” some memory, Flink can operate in a slightly smaller space without fear of being externally terminated.
• Parent and Peer processes: Used by the scripts that launch the Flink JVM and by any other peer processes in the container.
• Native allocations: Allocations from native (C/C++) libraries invoked by Flink (e.g. RocksDB).
5. On or Outside JVM Heap
Cut Off Space: Outside the JVM Heap. Used for native memory allocations.
Network Buffers: Outside the JVM Heap. Allocated as Java direct memory.
TM Managed Memory: Configurable to be on the JVM Heap or outside it (via direct memory allocation). However, this memory is not used in streaming mode. (It also cannot be sized to 0 bytes.)
6. Configs & Formulas
containerized.heap-cutoff-ratio: % of container memory to set aside as Cut Off space.
taskmanager.network.memory.fraction: Network buffer size, as a % of JVM Heap. The buffer is divided into 32KB segments by default.
taskmanager.memory.fraction: % of (Available – Netw Buff). This gives the TM managed memory size.
taskmanager.memory.off-heap: true/false. Chooses whether TM managed memory goes on the JVM Heap or outside it.
taskmanager.memory.preallocate: true/false. Chooses whether TM managed memory is allocated at startup or lazily.
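The formulas above can be chained into a rough calculator. The following is a minimal sketch, not Flink's actual implementation: the function name `tm_memory_layout` and its parameters are made up for illustration, the network buffer fraction is assumed to apply to the memory left after the cut-off (the slide describes it as a percentage of JVM Heap), and any minimum or maximum bounds that Flink may place on the cut-off or the network buffers are ignored.

```python
def tm_memory_layout(container_mb, cutoff_ratio, network_fraction,
                     managed_fraction, managed_off_heap=False):
    """Approximate the TaskManager memory regions in MB (hypothetical helper)."""
    # containerized.heap-cutoff-ratio: share of the container set aside as Cut Off.
    cutoff = container_mb * cutoff_ratio
    # Whatever is not cut off is "Available to Flink".
    available = container_mb - cutoff
    # taskmanager.network.memory.fraction (ASSUMPTION: applied to `available`).
    network = available * network_fraction
    # taskmanager.memory.fraction: share of (Available - Netw Buff).
    managed = (available - network) * managed_fraction
    # taskmanager.memory.off-heap: on-heap managed memory is counted inside the
    # heap figure; off-heap managed memory is subtracted from it.
    heap = available - network - (managed if managed_off_heap else 0.0)
    return {
        "cutoff": cutoff,
        "available": available,
        "network": network,
        "managed": managed,
        "heap": heap,
    }
```

With illustrative values, `tm_memory_layout(4096, 0.25, 0.1, 0.1)` gives roughly 1024 MB of cut-off, 307 MB of network buffers, 276 MB of managed memory, and a 2765 MB heap (which includes the managed memory, since it is on-heap here).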
7. Hints to Simplify Calculations
[Diagram: container memory layout. Regions: “Cut Off” Space, Java Metaspace/PermGen, Flink Network Buffers, JVM Heap, TaskMgr Managed Memory. Spans: Container Size, Available to Flink.]
TM Managed Memory
- Place it on the JVM heap
- Keep it very small (but larger than 0)
- Disable pre-allocation for it
- You may be able to get away with ignoring Java Metaspace, but it is a good idea to check its size.
- Prior to Java 8 it was called PermGen space and defaulted to < 100MB.
8. Hints to Simplify Calculations
• taskmanager.memory.off-heap = false
• taskmanager.memory.preallocate = false
• taskmanager.memory.fraction = a small non-zero value
• Therefore, intuitively, available main memory:
• For RocksDB backend ≈ Cut Off
• For Mem/FS state backend ≈ JVM Heap = (ContainerSz – Cut Off – NetwBuff)
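As a rough illustration of the approximation above, the sketch below estimates the main memory left for state under each backend. It assumes the hints have been applied (managed memory on heap and negligible), ignores Metaspace, and assumes the network buffer fraction applies to the memory left after the cut-off. The helper name `state_memory_mb` and its parameters are hypothetical; treat the results as ballpark figures only.

```python
def state_memory_mb(container_mb, cutoff_ratio, network_fraction, backend):
    """Ballpark memory (MB) available for state, per the simplified hints."""
    cutoff = container_mb * cutoff_ratio
    # ASSUMPTION: network buffers are a fraction of (container - cut-off).
    network = (container_mb - cutoff) * network_fraction
    if backend == "rocksdb":
        # RocksDB allocates natively, so its budget is roughly the Cut Off space.
        return cutoff
    if backend in ("memory", "fs"):
        # Heap-based backends keep state on the JVM heap.
        return container_mb - cutoff - network
    raise ValueError(f"unknown backend: {backend!r}")
```

For an 8192 MB container with a cut-off ratio of 0.5 and a network fraction of 0.1 (illustrative values), this gives about 4096 MB for RocksDB and about 3686 MB of heap for the Mem/FS backends.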
9. Use Cases
• Typical
• Large JVM Heap: Memory/FS State Backend
• Large Cut Off: RocksDB Backend
• Rarer
• Balancing JVM Heap and Cut Off: Some operators rely on the RocksDB backend to store state while other operators cache data temporarily in memory using Java Maps/Trees (i.e. not in the state backend).
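For the balancing case, the same relations can be run backwards: start from the heap and native memory you want and derive a container size and cut-off ratio. This is only a sketch under the simplifying hints (negligible on-heap managed memory, Metaspace ignored) and under the assumption that the network buffer fraction applies to the memory left after the cut-off. The helper name `size_container` is made up for illustration.

```python
def size_container(heap_mb, native_mb, network_fraction):
    """Derive (container size in MB, cut-off ratio) from target budgets."""
    if not 0.0 <= network_fraction < 1.0:
        raise ValueError("network_fraction must be in [0, 1)")
    # Heap = Available - Netw Buff = Available * (1 - network_fraction).
    available = heap_mb / (1.0 - network_fraction)
    # Container = Available + Cut Off, with the native budget taken as Cut Off.
    container = available + native_mb
    return container, native_mb / container
```

For example, asking for 2700 MB of heap and 1000 MB of native memory with a network fraction of 0.1 yields a container of about 4000 MB and a cut-off ratio of about 0.25.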
13. Need to Tweak It Yourself?
• Try this calculator (clone it for yourself)
• https://docs.google.com/spreadsheets/d/1DMUnHXNdoK1BR9TpTTpqeZvbNqvXGO7PlNmTojtaStU/edit?usp=sharing_eil&ts=5d9d40ae
• The calculator may be useful for batch jobs as well
• If this was useful, let me know by liking this tweet: https://twitter.com/naikrosh/status/1180034347191005184