The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
Dongjoon Hyun, Pang Wu
DATA+AI Summit 2021

THIS IS NOT A CONTRIBUTION
Apple logo is a trademark of Apple Inc.
Who am I
Dongjoon Hyun
- Apache Spark PMC member and Committer
- Apache ORC PMC member and Committer
- Apache REEF PMC member and Committer
- https://github.com/dongjoon-hyun
- https://www.linkedin.com/in/dongjoon
- @dongjoonhyun
Who am I
Pang Wu
- Software Engineer @Apple
- Maps-related data pipelines & dev-tools
- Works closely with Apple’s Spark PMC members to deliver new features
- https://www.linkedin.com/in/pangwu/
Agenda
- ZStandard
- Issues
- History
- When / Why / How to Use
- Limitations
- Summary
ZStandard (v1.4.9)
A fast compression algorithm providing high compression ratios, tunable with compression levels.
https://facebook.github.io/zstd/
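In Spark, the compression level is exposed through configuration. A minimal spark-defaults.conf fragment (property names as in Spark 3.x; the level value is illustrative):

```properties
# Use zstd for Spark's internal IO compression and tune its level.
spark.io.compression.codec            zstd
spark.io.compression.zstd.level       3
spark.io.compression.zstd.bufferSize  32k
```

Higher levels trade compression speed for a better ratio; level 1-3 is the common range for hot-path IO.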
Issue 1: Apache Hadoop ZStandardCodec
Requires Hadoop 2.9+ built with the native zStandard library. The Apache Spark 3.1.1 distribution with Hadoop 3.2 fails in a K8s environment:

scala> spark.range(10).write.option("compression", "zstd").parquet("/tmp/p")
java.lang.RuntimeException: native zStandard library not available

Fix: use Spark's own codec classes, backed by the zstd-jni or aircompressor library.
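To see up front whether the Hadoop ZStandardCodec path can work at all, Hadoop's native-library check is useful; a sketch (output depends on how the distribution was built):

```shell
# List which native codecs this Hadoop build was compiled against.
# A distribution not built against libzstd reports "zstd : false",
# which is exactly the case that produces the RuntimeException above.
hadoop checknative -a
```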
Issue 2: Buffer management
Slow compression and decompression speed.
Fix: use RecyclingBufferPool (SPARK-34340 / PARQUET-1973 / AVRO-3060).
[Chart: compression speedup (0x-4x axis) and decompression speedup (0x-2x axis) of RecyclingBufferPool vs NoPool at zstd levels 1-3]
https://issues.apache.org/jira/browse/SPARK-34387
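The pooled path can also be exercised directly through zstd-jni. A minimal sketch, assuming zstd-jni 1.4.9 on the classpath (the file path is illustrative):

```scala
import java.io.FileOutputStream
import com.github.luben.zstd.{RecyclingBufferPool, ZstdOutputStream}

// Reuse internal buffers across streams instead of allocating per stream.
val out = new ZstdOutputStream(
  new FileOutputStream("/tmp/data.zst"), RecyclingBufferPool.INSTANCE)
out.setLevel(3) // compression level, as benchmarked above
out.write("hello".getBytes("UTF-8"))
out.close()
```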
Issue 2: Buffer management (Cont.)
Spark's own codecs require more memory than other compression algorithms, so `OOMKilled` may happen in a K8s environment when switching to zstd:

NAME        READY  STATUS     RESTARTS  AGE
job         1/1    Running    0         16m
job-exec-1  0/1    OOMKilled  0         16m
job-exec-2  0/1    OOMKilled  0         16m

Fix: use the zstd-jni NoFinalizer classes to improve GC behavior (zstd-jni 1.4.8+).
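The finalizer-free variants are used the same way as the regular streams. A sketch, assuming zstd-jni 1.4.8+ class names; note the caller must close explicitly, since there is no finalizer safety net:

```scala
import java.io.FileOutputStream
import com.github.luben.zstd.{RecyclingBufferPool, ZstdOutputStreamNoFinalizer}

// No finalize() method means no GC finalizer-queue pressure,
// but native memory leaks if the stream is not closed explicitly.
val out = new ZstdOutputStreamNoFinalizer(
  new FileOutputStream("/tmp/data.zst"), RecyclingBufferPool.INSTANCE)
try out.write("hello".getBytes("UTF-8")) finally out.close()
```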
Issue 3: zstd-jni inconsistency
Different zstd-jni versions in Spark/Parquet/Avro/Kafka are incompatible.

API incompatibility
- https://github.com/luben/zstd-jni/issues/161

Performance inconsistency
- v1.4.5-7: BufferPool was added as the default
- v1.4.5-8: RecyclingBufferPool was added and BufferPool became an interface
- v1.4.7+: NoPool is used by default

Fix: upgrade Spark and the dependent Apache projects to use zstd-jni 1.4.9-1 (SPARK-34670, PARQUET-1994, AVRO-3072, KAFKA-12442).
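On the application side, one way to avoid mixed zstd-jni versions pulled in transitively by Spark, Parquet, Avro, and Kafka clients is to pin a single version in the build. An sbt sketch:

```scala
// Force every transitive dependency to resolve to one zstd-jni version.
dependencyOverrides += "com.github.luben" % "zstd-jni" % "1.4.9-1"
```

Maven users can achieve the same with a `<dependencyManagement>` entry for the same coordinates.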
History
Apache Spark with ZStandard
- v2.3: Add ZStdCompressionCodec (SPARK-19112)
- v2.4: Add Apache Hadoop 3.1 profile (SPARK-23807); use Apache Parquet 1.10 with Hadoop ZStandardCodec (SPARK-23972)
- v3.0: Broadcast MapStatus with ZStdCompressionCodec (SPARK-29434); split event log compression from IO compression (SPARK-28118)
- v3.1: Upgrade to zstd-jni 1.4.8 (SPARK-33843)
History (Cont.)
Apache Parquet/ORC/Avro with ZStandard (SPARK-34651: Improve ZSTD support)

Apache Parquet 1.12.0+
- PARQUET-1866: Replace Hadoop ZSTD with JNI-ZSTD
- PARQUET-1973: Support ZSTD JNI BufferPool
- PARQUET-1994: Upgrade ZSTD JNI to 1.4.9-1

Apache ORC 1.6.0+
- ORC-363: Enable zStandard codec
- ORC-395: Support ZSTD in C++ writer/reader

Apache Avro 1.10.2+
- AVRO-2195: Add Zstandard Codec
- AVRO-3072: Use ZSTD NoFinalizer classes and bump to 1.4.9-1
- AVRO-3060: Support ZSTD level and BufferPool options
Spark Event Log
Use spark.eventLog.compression.codec=zstd

[Chart: event log size (TPCDS 3TB): ZSTD is 17x smaller than TEXT and 3x smaller than LZ4]

spark.eventLog.enabled=true
spark.eventLog.compress=true
spark.eventLog.compression.codec=zstd
Shuffle IO
`Evicted` may happen in a K8s environment when shuffle data fills the local emptyDir volumes:

NAME                  READY  STATUS   RESTARTS  AGE
disk-emptydir         1/1    Running  0         16m
disk-emptydir-exec-1  0/1    Evicted  0         16m
disk-emptydir-exec-2  0/1    Evicted  0         16m
Shuffle IO (Cont.)
Use spark.io.compression.codec=zstd

[Chart: shuffle write size (TPCDS 3TB): ZSTD is 44% less than LZ4; shuffle read size: 43% less]
Shuffle IO (Cont.)
[Chart: Q67 query execution time: ZSTD is 20% faster than LZ4]
Storage
Apache Parquet ZStandard is smaller than GZIP.

[Chart: Apache Parquet file size (TPCDS 1TB) for SNAPPY, LZ4, GZIP, and ZSTD]
Storage (Cont.)
Apache ORC ZStandard is generally smaller than Parquet ZStandard.

[Chart: total file size (TPCDS 3TB) for PARQUET vs ORC]
Built-in file format configurations

FORMAT   CONFIGURATION
PARQUET  spark.sql.parquet.compression.codec
         parquet.compression.codec.zstd.level
         parquet.compression.codec.zstd.bufferPool.enabled
AVRO     spark.sql.avro.compression.codec
         avro.mapred.zstd.level
         avro.mapred.zstd.bufferpool
ORC      spark.sql.orc.compression.codec
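As a sketch of putting these together (option names from the table above; assumes a running SparkSession named `spark`):

```scala
// Parquet's zstd options are Hadoop configs, so set them on hadoopConfiguration.
spark.sparkContext.hadoopConfiguration
  .set("parquet.compression.codec.zstd.level", "3")
spark.sparkContext.hadoopConfiguration
  .set("parquet.compression.codec.zstd.bufferPool.enabled", "true")

// Write zstd-compressed Parquet; the per-write option overrides the
// session-level spark.sql.parquet.compression.codec setting.
spark.range(10).write.option("compression", "zstd").parquet("/tmp/p_zstd")
```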
Limitations
- ZStandard is not supported by CPU/GPU acceleration
- Apache ORC is still using ZSTD 1.3.5
  - Need to replace aircompressor with zstd-jni
- Apache Parquet has more room to optimize memory consumption
  - PARQUET-2022: ZstdDecompressorStream should close `zstdInputStream`
Summary
Use ZSTD to maximize your cluster utilization.
- Use zstd for event log compression by default
- Use zstd for shuffle IO compression with K8s volumes
- Use zstd with Parquet/ORC/Avro files
TM and © 2021 Apple Inc. All rights reserved.
Feedback
Your feedback is important to us. Don’t forget to rate and review the sessions.