Spark3's new memory model/management

Major categorization
● Java heap memory
○ Characterised by garbage collection
● JVM off-heap memory / direct memory
● Python memory

Java heap memory
1. Storage Memory: JVM heap space reserved for cached data.
2. Execution (or shuffle) Memory: JVM heap space used by data structures during shuffle operations (joins, group-bys and aggregations). Before Spark 1.6, this section of memory was also called shuffle memory.
3. User Memory: stores the data structures created and managed by the user's code.
4. Reserved Memory: reserved by Spark for internal purposes.

Java heap memory
[Figure: heap regions, sized by spark.memory.fraction and spark.memory.storageFraction]
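The effect of the two settings named on this slide can be worked through with a small calculation. This is a minimal sketch, not Spark's own code: `heap_regions` is a hypothetical helper, the 300 MiB reserve and the defaults of 0.6 and 0.5 are assumed from Spark's documented defaults, and the heap size is treated as exactly the configured executor memory (the JVM usually reports slightly less).

```python
RESERVED_MIB = 300  # fixed amount Spark sets aside for internal objects (assumed)


def heap_regions(heap_mib, memory_fraction=0.6, storage_fraction=0.5):
    """Split an executor heap (in MiB) into the four regions.

    memory_fraction  ~ spark.memory.fraction
    storage_fraction ~ spark.memory.storageFraction
    """
    usable = heap_mib - RESERVED_MIB
    unified = usable * memory_fraction    # storage + execution together
    storage = unified * storage_fraction  # share protected from eviction
    return {
        "reserved": RESERVED_MIB,
        "user": usable - unified,
        "storage": storage,
        "execution": unified - storage,
    }


regions = heap_regions(4096)  # a 4 GiB executor heap
for name, size in regions.items():
    print(f"{name:>9}: {size:8.1f} MiB")
```

The storage/execution split is a soft boundary: either side can borrow free space from the other, and spark.memory.storageFraction only marks the portion of cached data that execution cannot evict.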

Java heap memory (legacy)
[Figure: Spark 2.x vs Spark 3.x]

Java off-heap memory
1. Off-heap DataFrames
2. VM overheads: interned strings, etc.
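The two off-heap items map onto separate settings. The sketch below assumes the documented defaults for spark.executor.memoryOverhead (10% of executor memory, with a 384 MiB floor) and shows the pair of properties that enable explicit off-heap storage. The helper name and the 2g value are illustrative only.

```python
def default_memory_overhead_mib(executor_memory_mib, factor=0.10, minimum_mib=384):
    """Approximate the default VM overhead allowance for one executor.

    Assumes the documented default: max(384 MiB, 10% of executor memory).
    """
    return max(minimum_mib, int(executor_memory_mib * factor))


# Properties that switch on Spark-managed off-heap memory.
# The size is an example value, not a recommendation.
offheap_conf = {
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "2g",
}

print(default_memory_overhead_mib(2048))  # small executor: the floor applies
print(default_memory_overhead_mib(8192))  # larger executor: 10% applies
```

Setting spark.executor.memoryOverhead explicitly overrides this default, which can help when native libraries need more room.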

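Python memory
1. Python worker memory (spark.python.worker.memory): the memory a Python worker may use during aggregation before it spills to disk
2. PySpark executor memory (spark.executor.pyspark.memory): limits the memory of the actual Python process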
Python worker memory
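To show how the Python settings fit with the JVM ones, here is a rough sketch of the total memory requested per executor container. It assumes Spark 3 behaviour, where the request is the sum of executor memory, memory overhead, off-heap size and PySpark memory. The helper name and all sizes are hypothetical.

```python
def container_request_mib(executor_memory_mib, overhead_mib,
                          offheap_mib=0, pyspark_mib=0):
    """Sum the four components of one executor's container request (MiB)."""
    return executor_memory_mib + overhead_mib + offheap_mib + pyspark_mib


# Example Python-side settings; the values are illustrative only.
python_conf = {
    "spark.python.worker.memory": "512m",   # spill threshold during aggregation
    "spark.executor.pyspark.memory": "1g",  # cap on the Python process itself
}

total = container_request_mib(4096, 410, offheap_mib=2048, pyspark_mib=1024)
print(f"container request: {total} MiB")
```

If spark.executor.pyspark.memory is left unset, Python's usage is not capped and has to fit inside the overhead allowance shared with other non-JVM memory.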
