1. A Multi-phase Approach to Floating-Point Compression
Kevin Townsend and Joseph Zambreno
Reconfigurable Computing Laboratory
Iowa State University
EIT’15
Townsend and Zambreno (RCL@ISU) float zip EIT’15 1 / 16
2. Outline
1 Introduction
2 Approach
8-byte patterns
Less Than 8-byte patterns
More Than 8-byte patterns
Combining into fzip
3 Results
4 Future Work
3. Introduction
What are floating point datasets?
They are arrays of floating-point values.
A 64-bit floating-point value has a sign bit, 11 exponent bits, and 52
fraction bits.
Because each value is 64 bits wide, the task can also be viewed as compressing an array of 64-bit integers.
Why compress them?
Compressed floating point datasets take up less space.
Compression can accelerate data transfer.
Knowledge of the structure of floating-point datasets can lead to better compression
than general-purpose compression schemes.
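The bit layout described on this slide can be illustrated with a minimal sketch. It assumes IEEE 754 double precision (which matches the 1/11/52 split above); the helper names `float_to_bits` and `split_fields` are hypothetical and are not taken from fzip.

```python
import struct

def float_to_bits(x):
    """Reinterpret a 64-bit float as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def split_fields(x):
    """Return (sign, biased exponent, fraction) of a 64-bit float."""
    bits = float_to_bits(x)
    sign = bits >> 63                   # 1 sign bit
    exponent = (bits >> 52) & 0x7FF     # 11 exponent bits
    fraction = bits & ((1 << 52) - 1)   # 52 fraction bits
    return sign, exponent, fraction
```

Viewing each value through `float_to_bits` is what allows a compressor to treat the dataset as an array of 64-bit integers.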
4. Approach
Analysis of 3 different patterns:
Repeating values
Common prefixes
Patterns in the value sequence
We created 3 different compression schemes:
List all values and use indices in this list.
Create a tree of all prefixes and create prefix codes.
Use the Burrows-Wheeler Transform and a simple compression scheme.
We combined all 3 algorithms into one algorithm.
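For readers unfamiliar with the Burrows-Wheeler Transform named in the third scheme, the following is a minimal textbook sketch. It assumes each 64-bit value is treated as one symbol, which the slides do not state, and it uses a naive rotation sort that is far too slow for real datasets. The function names are hypothetical and this is not the authors' implementation.

```python
def bwt(seq):
    """Return (last_column, primary_index) of the naive BWT of seq."""
    seq = list(seq)
    n = len(seq)
    if n == 0:
        return [], 0
    # Sort rotation start positions by the rotation they produce.
    rotations = sorted(range(n), key=lambda i: seq[i:] + seq[:i])
    # The last column holds the symbol preceding each rotation start.
    last = [seq[(i - 1) % n] for i in rotations]
    # The primary index is the row holding the original sequence.
    return last, rotations.index(0)

def inverse_bwt(last, primary):
    """Rebuild the original sequence from the last column and primary index."""
    n = len(last)
    # A stable sort of positions by symbol maps first-column rows to last-column rows.
    order = sorted(range(n), key=lambda i: last[i])
    out, row = [], primary
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return out
```

The transform tends to place equal symbols next to each other, which is why a simple compression scheme can follow it.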
5. Approach 8-byte patterns
Analysis
[Figure: percent of total values (0% to 100%) classified as Many Repeats or No Repeats, per dataset: msg bt, msg lu, msg sp, msg sppm, msg sweep3d, num brain, num comet, num control, num plasma, obs error, obs info, obs spitzer, obs temp.]
In all the datasets, at least 50% of the values have a repeat.
Values are 8 bytes, so indices are much smaller than values.
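A rough way to reproduce this kind of analysis is sketched below. It assumes a value "has a repeat" when its bit pattern occurs more than once in the dataset, which is one plausible reading of the figure; `repeat_stats` is a hypothetical helper name.

```python
from collections import Counter

def repeat_stats(values):
    """Return (fraction of values that repeat, bits needed per repeat index)."""
    counts = Counter(values)
    repeated = [v for v, c in counts.items() if c > 1]
    n_repeating = sum(counts[v] for v in repeated)
    fraction = n_repeating / len(values) if values else 0.0
    # Fixed-width index into the list of distinct repeated values.
    index_bits = max(1, (len(repeated) - 1).bit_length()) if repeated else 0
    return fraction, index_bits
```

When `index_bits` is far below 64, replacing a repeated value by its index saves most of its storage.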
6. Approach 8-byte patterns
Algorithm
All repeated values are stored in a separate array.
One bit indicates whether the encoded value repeats or not.
If the value does not repeat, the 64-bit value follows.
If the value does repeat, its index in the repeat array follows.
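The steps above can be sketched as follows. Several details are assumptions not given on the slide: the flag polarity (`1` for a repeat), fixed-width indices, a sorted repeat array, and a Python string of `0`/`1` characters standing in for a real bit stream. The function names are hypothetical.

```python
from collections import Counter

def encode_repeats(values):
    """Encode unsigned 64-bit ints; return (repeat_array, bit string)."""
    counts = Counter(values)
    repeat_array = sorted(v for v, c in counts.items() if c > 1)
    index_of = {v: i for i, v in enumerate(repeat_array)}
    width = max(1, (len(repeat_array) - 1).bit_length())
    out = []
    for v in values:
        if v in index_of:
            # Flag 1, then the index into the repeat array.
            out.append("1" + format(index_of[v], "0{}b".format(width)))
        else:
            # Flag 0, then the raw 64-bit value.
            out.append("0" + format(v, "064b"))
    return repeat_array, "".join(out)

def decode_repeats(repeat_array, bits):
    """Invert encode_repeats."""
    width = max(1, (len(repeat_array) - 1).bit_length())
    values, pos = [], 0
    while pos < len(bits):
        flag = bits[pos]
        pos += 1
        if flag == "1":
            values.append(repeat_array[int(bits[pos:pos + width], 2)])
            pos += width
        else:
            values.append(int(bits[pos:pos + 64], 2))
            pos += 64
    return values
```

In this sketch a repeated value costs `1 + width` bits instead of 64, while a non-repeating value costs 65 bits, so the scheme pays off only when repeats are common.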
7. Approach 8-byte patterns
Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset and on average, for Repeat Compression, Prefix Compression, BWT Compression, and fzip.]
8. Approach Less Than 8-byte patterns
Analysis
[Figure: number of bits matching the previous value (0 to 64) per dataset, shaded from 100% to 0%, with the SIGN, EXPONENT, and FRACTION fields marked along the bit axis.]
This figure shows how much of the prefix adjacent values share in a given dataset.
As seen, the bits quickly start differing after the 12th bit.
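The quantity on the figure's axis, the number of leading bits a value shares with its predecessor, can be computed with a short sketch like the one below. It assumes IEEE 754 doubles reinterpreted as 64-bit integers; the function names are hypothetical.

```python
import struct

def to_bits(x):
    """Reinterpret a 64-bit float as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def matching_prefix_bits(a, b):
    """Number of leading bits (out of 64) shared by two 64-bit integers."""
    # The highest set bit of a XOR b marks the first position where they differ.
    return 64 - (a ^ b).bit_length()

def prefix_match_lengths(values):
    """Matching prefix length of each float with the value before it."""
    bits = [to_bits(v) for v in values]
    return [matching_prefix_bits(p, c) for p, c in zip(bits, bits[1:])]
```

For example, 1.0 and 1.5 share exactly the sign and exponent, that is, the first 12 bits, and then differ in the first fraction bit.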
10. Approach Less Than 8-byte patterns
Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset and on average, for Repeat Compression, Prefix Compression, BWT Compression, and fzip.]
12. Approach More Than 8-byte patterns
Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset and on average, for Repeat Compression, Prefix Compression, BWT Compression, and fzip.]
13. Approach Combining into fzip
Algorithm
fzip starts with the BWT compression, which creates a new dataset.
Repeats are then added to the prefix codes, combining repeat and prefix
compression.
14. Approach Combining into fzip
Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset and on average, for Repeat Compression, Prefix Compression, BWT Compression, and fzip.]
15. Results
Floating Point Compression Performance
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset and on average, for fzip, bzip -9, FPC 25, and gzip -9.]
16. Future Work
Two directions for future work: toward traditional dictionary approaches
and toward BWT.
The BWT road:
Replace prefix and repeat compression with a "Move-to-Front"
algorithm.
Has the potential for high compression ratios.
BWT makes this slower and less hardware amenable.
The traditional road:
Replace BWT with a dictionary approach (LZW).
This is more hardware amenable.
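The "Move-to-Front" step named in the BWT road can be sketched as below. This is the textbook algorithm over an explicit alphabet, shown only to illustrate the idea; how it would be adapted to 64-bit values is not specified in the slides, and the function names are hypothetical.

```python
def move_to_front_encode(seq, alphabet):
    """Replace each symbol by its current position in a self-organizing list."""
    table = list(alphabet)
    out = []
    for s in seq:
        i = table.index(s)
        out.append(i)
        # Move the symbol just seen to the front of the list.
        table.insert(0, table.pop(i))
    return out

def move_to_front_decode(indices, alphabet):
    """Invert move_to_front_encode using the same starting alphabet."""
    table = list(alphabet)
    out = []
    for i in indices:
        s = table.pop(i)
        out.append(s)
        table.insert(0, s)
    return out
```

Runs of equal symbols, which a BWT tends to produce, become runs of zeros after this step, and those are cheap to encode.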