The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
Dongjoon Hyun, Pang Wu
DATA+AI Summit 2021

THIS IS NOT A CONTRIBUTION
Apple logo is a trademark of Apple Inc.
Who am I
Dongjoon Hyun
- Apache Spark PMC member and Committer
- Apache ORC PMC member and Committer
- Apache REEF PMC member and Committer
- https://github.com/dongjoon-hyun
- https://www.linkedin.com/in/dongjoon
- @dongjoonhyun
Who am I
Pang Wu
- Software Engineer @Apple
- Maps-related data pipelines & dev-tools
- Works closely with Apple’s Spark PMC members to deliver new features
- https://www.linkedin.com/in/pangwu/
Agenda
- ZStandard
- Issues
- History
- When / Why / How to Use
- Limitations
- Summary
ZStandard (v1.4.9)
A fast compression algorithm providing high compression ratios, tunable with compression levels.
https://facebook.github.io/zstd/
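In Spark, the compression level is exposed through configuration. A minimal spark-defaults.conf fragment (property names as in Spark 3.x; the level value is illustrative):

```properties
# Use zstd for Spark's internal IO compression and tune its level.
spark.io.compression.codec            zstd
spark.io.compression.zstd.level       3
spark.io.compression.zstd.bufferSize  32k
```

Higher levels trade compression speed for a better ratio; level 1-3 is the common range for hot-path IO.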
Issue 1: Apache Hadoop ZStandardCodec
Requires Hadoop 2.9+ built with the native zStandard library. The Apache Spark 3.1.1 distribution with Hadoop 3.2 fails in a K8s environment:

scala> spark.range(10).write.option("compression", "zstd").parquet("/tmp/p")
java.lang.RuntimeException: native zStandard library not available

Fix: use Spark's own codec classes, backed by the zstd-jni or aircompressor library.
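To see up front whether the Hadoop ZStandardCodec path can work at all, Hadoop's native-library check is useful; a sketch (output depends on how the distribution was built):

```shell
# List which native codecs this Hadoop build was compiled against.
# A distribution not built against libzstd reports "zstd : false",
# which is exactly the case that produces the RuntimeException above.
hadoop checknative -a
```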
Issue 2: Buffer management
Slow compression and decompression speed.
Fix: use RecyclingBufferPool (SPARK-34340 / PARQUET-1973 / AVRO-3060).
[Chart: compression speedup (0x-4x axis) and decompression speedup (0x-2x axis) of RecyclingBufferPool vs NoPool at zstd levels 1-3]
https://issues.apache.org/jira/browse/SPARK-34387
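The pooled path can also be exercised directly through zstd-jni. A minimal sketch, assuming zstd-jni 1.4.9 on the classpath (the file path is illustrative):

```scala
import java.io.FileOutputStream
import com.github.luben.zstd.{RecyclingBufferPool, ZstdOutputStream}

// Reuse internal buffers across streams instead of allocating per stream.
val out = new ZstdOutputStream(
  new FileOutputStream("/tmp/data.zst"), RecyclingBufferPool.INSTANCE)
out.setLevel(3) // compression level, as benchmarked above
out.write("hello".getBytes("UTF-8"))
out.close()
```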
Issue 2: Buffer management (Cont.)
Spark's own codecs require more memory than other compression algorithms, so `OOMKilled` may happen in a K8s environment when switching to zstd:

NAME        READY  STATUS     RESTARTS  AGE
job         1/1    Running    0         16m
job-exec-1  0/1    OOMKilled  0         16m
job-exec-2  0/1    OOMKilled  0         16m

Fix: use the zstd-jni NoFinalizer classes to improve GC behavior (zstd-jni 1.4.8+).
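The finalizer-free variants are used the same way as the regular streams. A sketch, assuming zstd-jni 1.4.8+ class names; note the caller must close explicitly, since there is no finalizer safety net:

```scala
import java.io.FileOutputStream
import com.github.luben.zstd.{RecyclingBufferPool, ZstdOutputStreamNoFinalizer}

// No finalize() method means no GC finalizer-queue pressure,
// but native memory leaks if the stream is not closed explicitly.
val out = new ZstdOutputStreamNoFinalizer(
  new FileOutputStream("/tmp/data.zst"), RecyclingBufferPool.INSTANCE)
try out.write("hello".getBytes("UTF-8")) finally out.close()
```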
Issue 3: zstd-jni inconsistency
Different zstd-jni versions in Spark/Parquet/Avro/Kafka are incompatible.

API incompatibility
- https://github.com/luben/zstd-jni/issues/161

Performance inconsistency
- v1.4.5-7: BufferPool was added as the default
- v1.4.5-8: RecyclingBufferPool was added and BufferPool became an interface
- v1.4.7+: NoPool is used by default

Fix: upgrade Spark and the dependent Apache projects to use zstd-jni 1.4.9-1 (SPARK-34670, PARQUET-1994, AVRO-3072, KAFKA-12442).
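On the application side, one way to avoid mixed zstd-jni versions pulled in transitively by Spark, Parquet, Avro, and Kafka clients is to pin a single version in the build. An sbt sketch:

```scala
// Force every transitive dependency to resolve to one zstd-jni version.
dependencyOverrides += "com.github.luben" % "zstd-jni" % "1.4.9-1"
```

Maven users can achieve the same with a `<dependencyManagement>` entry for the same coordinates.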
History
Apache Spark with ZStandard
- v2.3: Add ZStdCompressionCodec (SPARK-19112)
- v2.4: Add Apache Hadoop 3.1 profile (SPARK-23807); use Apache Parquet 1.10 with Hadoop ZStandardCodec (SPARK-23972)
- v3.0: Broadcast MapStatus with ZStdCompressionCodec (SPARK-29434); split event log compression from IO compression (SPARK-28118)
- v3.1: Upgrade to zstd-jni 1.4.8 (SPARK-33843)
History (Cont.)
Apache Parquet/ORC/Avro with ZStandard (SPARK-34651: Improve ZSTD support)

Apache Parquet 1.12.0+
- PARQUET-1866: Replace Hadoop ZSTD with JNI-ZSTD
- PARQUET-1973: Support ZSTD JNI BufferPool
- PARQUET-1994: Upgrade ZSTD JNI to 1.4.9-1

Apache ORC 1.6.0+
- ORC-363: Enable zStandard codec
- ORC-395: Support ZSTD in C++ writer/reader

Apache Avro 1.10.2+
- AVRO-2195: Add Zstandard Codec
- AVRO-3072: Use ZSTD NoFinalizer classes and bump to 1.4.9-1
- AVRO-3060: Support ZSTD level and BufferPool options
Spark Event Log
Use spark.eventLog.compression.codec=zstd

[Chart: event log size (TPCDS 3TB): ZSTD is 17x smaller than TEXT and 3x smaller than LZ4]

spark.eventLog.enabled=true
spark.eventLog.compress=true
spark.eventLog.compression.codec=zstd
Shuffle IO
`Evicted` may happen in a K8s environment when shuffle data fills the local emptyDir volumes:

NAME                  READY  STATUS   RESTARTS  AGE
disk-emptydir         1/1    Running  0         16m
disk-emptydir-exec-1  0/1    Evicted  0         16m
disk-emptydir-exec-2  0/1    Evicted  0         16m
Shuffle IO (Cont.)
Use spark.io.compression.codec=zstd

[Chart: shuffle write size (TPCDS 3TB): ZSTD is 44% less than LZ4; shuffle read size: 43% less]
Shuffle IO (Cont.)
[Chart: Q67 query execution time: ZSTD is 20% faster than LZ4]
Storage
Apache Parquet ZStandard is smaller than GZIP.

[Chart: Apache Parquet file size (TPCDS 1TB) for SNAPPY, LZ4, GZIP, and ZSTD]
Storage (Cont.)
Apache ORC ZStandard is generally smaller than Parquet ZStandard.

[Chart: total file size (TPCDS 3TB) for PARQUET vs ORC]
Built-in file format configurations

FORMAT   CONFIGURATION
PARQUET  spark.sql.parquet.compression.codec
         parquet.compression.codec.zstd.level
         parquet.compression.codec.zstd.bufferPool.enabled
AVRO     spark.sql.avro.compression.codec
         avro.mapred.zstd.level
         avro.mapred.zstd.bufferpool
ORC      spark.sql.orc.compression.codec
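As a sketch of putting these together (option names from the table above; assumes a running SparkSession named `spark`):

```scala
// Parquet's zstd options are Hadoop configs, so set them on hadoopConfiguration.
spark.sparkContext.hadoopConfiguration
  .set("parquet.compression.codec.zstd.level", "3")
spark.sparkContext.hadoopConfiguration
  .set("parquet.compression.codec.zstd.bufferPool.enabled", "true")

// Write zstd-compressed Parquet; the per-write option overrides the
// session-level spark.sql.parquet.compression.codec setting.
spark.range(10).write.option("compression", "zstd").parquet("/tmp/p_zstd")
```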
Limitations
- ZStandard is not supported by CPU/GPU acceleration
- Apache ORC is still using ZSTD 1.3.5
  - Need to replace aircompressor with zstd-jni
- Apache Parquet has more room to optimize memory consumption
  - PARQUET-2022: ZstdDecompressorStream should close `zstdInputStream`
Summary
Use ZSTD to maximize your cluster utilization.
- Use zstd for event log compression by default
- Use zstd for shuffle IO compression with K8s volumes
- Use zstd with Parquet/ORC/Avro files
TM and © 2021 Apple Inc. All rights reserved.
Feedback
Your feedback is important to us. Don’t forget to rate and review the sessions.