Spark memory model
Major categorization
● Java heap memory
○ Characterised by Garbage collection
● JVM off-heap memory / direct memory
● Python memory
Java heap memory
1. Storage memory: JVM heap space reserved for cached data
2. Execution (or shuffle) memory: JVM heap space used by data structures during shuffle operations (joins, group-bys and aggregations). Before Spark 1.6, this region was also called shuffle memory.
3. User memory: stores the data structures created and managed by the user's code
4. Reserved memory: reserved by Spark for internal purposes
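The four regions can be estimated from the executor heap size. The sketch below is a rough approximation, not Spark's own code: it assumes the unified memory manager's fixed 300 MiB reservation and the documented defaults of `spark.memory.fraction` (0.6) and `spark.memory.storageFraction` (0.5). The function name `heap_regions` is hypothetical, and the heap the JVM actually reports is usually slightly smaller than the configured value.

```python
RESERVED_BYTES = 300 * 1024 * 1024  # fixed reservation (assumed: 300 MiB)


def heap_regions(heap_bytes, memory_fraction=0.6, storage_fraction=0.5):
    """Approximate the four on-heap regions of one executor, in bytes.

    memory_fraction  ~ spark.memory.fraction (documented default 0.6)
    storage_fraction ~ spark.memory.storageFraction (documented default 0.5)
    """
    usable = heap_bytes - RESERVED_BYTES
    if usable <= 0:
        raise ValueError("heap must be larger than the reserved 300 MiB")
    unified = usable * memory_fraction    # storage + execution together
    storage = unified * storage_fraction  # part of the unified region set aside for storage
    return {
        "reserved": RESERVED_BYTES,
        "storage": storage,
        "execution": unified - storage,
        "user": usable - unified,
    }


if __name__ == "__main__":
    mib = 1024 ** 2
    for name, size in heap_regions(4 * 1024 ** 3).items():
        print(f"{name:>9}: {size / mib:8.1f} MiB")
```

In the unified model the storage/execution split is a soft boundary: either side can borrow free space from the other, so these figures are starting points rather than hard limits.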
[Figure: Java heap memory layout, annotated with spark.memory.fraction and spark.memory.storageFraction]
Java heap memory (legacy)
Spark2.x vs Spark3.x
Java off-heap memory
1. Off-heap DataFrames: DataFrame data stored outside the JVM heap
2. VM overheads: interned strings, etc.
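Off-heap memory matters mostly when sizing the container that hosts an executor. The sketch below adds up the pieces the way the Spark 3.x configuration docs describe the maximum container size: `spark.executor.memory` + `spark.executor.memoryOverhead` + `spark.memory.offHeap.size` + `spark.executor.pyspark.memory`. It assumes the documented default overhead of 10% of executor memory with a 384 MiB minimum. The function names are hypothetical, and cluster managers may round the request up, so treat the result as an estimate.

```python
MIN_OVERHEAD_MIB = 384   # documented minimum for spark.executor.memoryOverhead
OVERHEAD_FACTOR = 0.10   # documented default overhead factor


def default_overhead_mib(executor_memory_mib):
    """Default overhead: 10% of executor memory, but at least 384 MiB."""
    return max(MIN_OVERHEAD_MIB, int(executor_memory_mib * OVERHEAD_FACTOR))


def container_request_mib(executor_memory_mib, overhead_mib=None,
                          off_heap_mib=0, pyspark_mib=0):
    """Estimate the memory requested for one executor container, in MiB."""
    if overhead_mib is None:
        overhead_mib = default_overhead_mib(executor_memory_mib)
    return executor_memory_mib + overhead_mib + off_heap_mib + pyspark_mib


if __name__ == "__main__":
    # Illustrative values: 8 GiB heap plus 2 GiB of off-heap memory.
    print(container_request_mib(8192, off_heap_mib=2048), "MiB")
```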
Java off-heap memory (legacy)
Python worker memory
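1. Python worker memory (spark.python.worker.memory): the amount of memory each Python worker process may use during aggregation before it spills data to disk
2. PySpark executor memory (spark.executor.pyspark.memory): limits the memory of the actual Python process in each executor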
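Both settings are ordinary Spark configuration properties, so they can be passed like any other `--conf` value. The sketch below uses a hypothetical helper, `to_submit_args`, to render them as `spark-submit` arguments. The property names `spark.python.worker.memory` and `spark.executor.pyspark.memory` come from the Spark configuration docs; the values are illustrative only.

```python
# Illustrative values only; tune them for the actual workload.
python_memory_conf = {
    "spark.python.worker.memory": "512m",   # aggregation spill threshold per Python worker
    "spark.executor.pyspark.memory": "2g",  # cap on the Python process in each executor
}


def to_submit_args(conf):
    """Render a config dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(python_memory_conf)))
```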
References
Monitoring and Instrumentation - Spark 3.3.2 Documentation
Decoding Memory in Spark — Parameters that are often confused | by Sohom Majumdar | Walmart Global Tech Blog | Medium
Apache Spark Memory Management. This blog describes the concepts behind… | by Suhas N M | Analytics Vidhya | Medium

Spark3's new memory model/management