Apache Spark is widely used for big-data analytics and processing. Memory management changed considerably in Spark 3. These slides explain those changes and how data engineers can take advantage of them.
Java heap memory
1. Storage Memory — JVM heap space reserved for cached data
2. Execution Memory — JVM heap space used by data structures during shuffle operations (joins, group-bys, and aggregations). Before Spark 1.6, this region was also called shuffle memory.
3. User Memory — stores the data structures created and managed by the user's code
4. Reserved Memory — reserved by Spark for internal purposes
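As a rough illustration (not from the slides), under the unified memory model introduced in Spark 1.6 these regions are carved out of the executor heap by two settings, `spark.memory.fraction` (default 0.6) and `spark.memory.storageFraction` (default 0.5). A minimal sketch of that arithmetic, using the documented defaults:

```python
# Hedged sketch: how Spark 1.6+ sizes the four heap regions above.
# Values are illustrative; the JVM's exact internal accounting differs slightly.

RESERVED_MB = 300  # Reserved Memory: fixed 300 MB for Spark internals


def heap_regions(executor_memory_mb,
                 memory_fraction=0.6,     # spark.memory.fraction default
                 storage_fraction=0.5):   # spark.memory.storageFraction default
    usable = executor_memory_mb - RESERVED_MB
    unified = usable * memory_fraction    # shared by storage + execution
    storage = unified * storage_fraction  # evictable in favor of execution
    execution = unified - storage
    user = usable - unified               # User Memory for user code's structures
    return {"reserved": RESERVED_MB, "storage": storage,
            "execution": execution, "user": user}


# e.g. a 4 GiB executor heap:
regions = heap_regions(4096)
# storage and execution get ~1139 MB each; user memory gets ~1518 MB
```

Note that storage and execution share the unified region: execution can evict cached blocks down to the storage fraction, so these numbers are soft boundaries, not hard limits.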
Python worker memory
1. Python worker memory (`spark.python.worker.memory`) — limits the memory each Python worker process may use during aggregation before spilling to disk
2. PySpark executor memory (`spark.executor.pyspark.memory`) — limits the memory of the actual Python process in each executor
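To make the distinction concrete, here is a hedged sketch (values and the overhead formula are assumptions based on the documented defaults, not from the slides) of how `spark.executor.pyspark.memory`, when set, is added to the total memory a YARN or Kubernetes container requests, on top of the JVM heap and the memory overhead:

```python
# Hedged sketch: approximate container memory request for an executor.
# Assumes the documented defaults: a 10% overhead factor
# (spark.executor.memoryOverheadFactor) with a 384 MB floor.


def container_request_mb(executor_memory_mb,
                         pyspark_memory_mb=0,   # spark.executor.pyspark.memory
                         overhead_factor=0.10,  # default overhead factor
                         min_overhead_mb=384):  # documented minimum overhead
    overhead = max(executor_memory_mb * overhead_factor, min_overhead_mb)
    # PySpark memory, when configured, is added to the resource request
    return executor_memory_mb + overhead + pyspark_memory_mb


# e.g. a 4 GiB heap plus a 1 GiB cap for the Python workers:
# container_request_mb(4096, pyspark_memory_mb=1024)
```

By contrast, `spark.python.worker.memory` does not change the container request at all; it only sets the spill threshold inside each Python worker.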