Floating Point Compression EIT'15

A Multi-phase Approach to Floating-Point Compression
Kevin Townsend and Joseph Zambreno
Reconfigurable Computing Laboratory
Iowa State University
EIT’15
Townsend and Zambreno (RCL@ISU) float zip EIT’15

Outline
1 Introduction
2 Approach
8-byte patterns
Less Than 8-byte patterns
More Than 8-byte patterns
Combining into fzip
3 Results
4 Future Work

Introduction
What are floating point datasets?
They are arrays of floating point values.
A 64-bit floating point value has a sign bit, 11 exponent bits, and 52
fraction bits.
However, the problem can also be viewed as compressing an array of 64-bit integers.
Why compress them?
Compressed floating point datasets take up less space.
Compression can accelerate data transfer.
Knowledge of floating point datasets can lead to better compression
over general compression schemes.
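
The 64-bit layout described above (sign bit, 11 exponent bits, 52 fraction bits) can be inspected with a few lines of Python. This is only an illustrative sketch: the helper names split_double and as_uint64 are hypothetical and do not come from the slides.

```python
import struct

def split_double(x):
    """Return the (sign, exponent, fraction) bit fields of a 64-bit float."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits
    fraction = bits & ((1 << 52) - 1)    # 52 bits
    return sign, exponent, fraction

def as_uint64(values):
    """View an array of doubles as an array of 64-bit integers."""
    return [struct.unpack(">Q", struct.pack(">d", v))[0] for v in values]
```

Viewing the data as integers means the later steps only need bit operations and equality tests, never floating point arithmetic.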

Approach
Analysis of 3 different patterns:
Repeating values
Common prefixes
Patterns in the value sequence
We created 3 different compression schemes:
List all values and use indices in this list.
Create a tree of all prefixes and create prefix codes.
Use the Burrows-Wheeler Transform and a simple compression scheme.
We combined all 3 algorithms into one algorithm.

Approach 8-byte patterns
Analysis
[Figure: percent of total values (0% to 100%) classified as "Many Repeats" or "No Repeats" for each dataset: msg bt, msg lu, msg sp, msg sppm, msg sweep3d, num brain, num comet, num control, num plasma, obs error, obs info, obs spitzer, obs temp.]
In all the datasets, at least 50% of the values have a repeat.
Values are 8 bytes, so indices are much smaller than values.

Algorithm
All repeats stored in a separate array.
One bit indicates whether the encoded value repeats or not.
If the value does not repeat, the 64-bit value follows.
If the value does repeat, its index in the repeat array follows.
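
The steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, tokens are kept as Python tuples instead of being packed into a bit stream, and the order of the repeat array (first appearance) is an assumption.

```python
from collections import Counter

def repeat_encode(values):
    """Encode 64-bit integers as (flag, payload) tokens plus a repeat array."""
    counts = Counter(values)
    # Repeat array: every value that occurs more than once, in first-seen order.
    repeats = [v for v in dict.fromkeys(values) if counts[v] > 1]
    index = {v: i for i, v in enumerate(repeats)}
    tokens = []
    for v in values:
        if v in index:
            tokens.append((1, index[v]))   # flag 1: index into the repeat array
        else:
            tokens.append((0, v))          # flag 0: the raw 64-bit value follows
    return repeats, tokens

def repeat_decode(repeats, tokens):
    """Invert repeat_encode."""
    return [repeats[payload] if flag else payload for flag, payload in tokens]
```

In a packed format an index would need only about log2(len(repeats)) bits, which is where the saving over a raw 64-bit value comes from.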

Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) for each dataset (msg bt, msg lu, msg sp, msg sppm, msg sweep3d, num brain, num comet, num control, num plasma, obs error, obs info, obs spitzer, obs temp) and the average, comparing fzip, BWT Compression, Prefix Compression, and Repeat Compression.]

Approach Less Than 8-byte patterns
Analysis
[Figure: number of bits matching the previous value (0 to 64, with the SIGN, EXPONENT, and FRACTION fields marked) for each dataset: msg bt, msg lu, msg sp, msg sppm, msg sweep3d, num brain, num comet, num control, num plasma, obs error, obs info, obs spitzer, obs temp. Scale: 100% to 0%.]
This figure shows how much adjacent prefixes repeat in a given dataset.
As seen, the bits quickly start differing after the 12th bit.
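
The quantity measured in this analysis, the number of leading bits a value shares with the previous value, can be computed with an XOR. The sketch below assumes the values are already viewed as 64-bit unsigned integers; the function names are hypothetical.

```python
def matching_prefix_bits(a, b, width=64):
    """Count the leading bits shared by two width-bit unsigned integers."""
    diff = a ^ b                       # bits that differ are set to 1
    return width - diff.bit_length()   # everything above the top set bit matches

def adjacent_matches(values, width=64):
    """Shared-prefix length of each value with the value before it."""
    return [matching_prefix_bits(p, c, width) for p, c in zip(values, values[1:])]
```

For example, the bit patterns of 2.0 and 3.0 (0x4000000000000000 and 0x4008000000000000) share exactly their sign and exponent fields, which is 12 bits.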

Algorithm
[Figure: prefix tree built over the leading bits of sorted example values; each value is split into an "Encoded" prefix and a "Not Encoded" remainder.]
Value    Leading bits
0.1      0 0 1 0 1 1 1 0 ...
1.0      0 0 1 1 1 1 0 0 ...
2.0      0 1 0 0 0 0 0 0 ...
3.0      0 1 0 0 0 0 1 0 ...
3.0      0 1 0 0 0 0 1 0 ...
4.0      0 1 0 0 0 1 0 0 ...
5.0      0 1 0 0 0 1 0 1 ...
100.0    0 1 0 1 0 1 1 0 ...
Prefix codes (prefix, code): (00,00), (010000,01), (010001,10), (0101,11)
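
One way to read the (prefix, code) pairs above is that each value's matching prefix is replaced by its short code and the remaining bits are stored unchanged. The sketch below works on bit strings for readability; the function names are hypothetical, and the fixed 2-bit code width is an assumption that happens to hold for this example table.

```python
# Prefix table copied from the slide: leading-bit prefix -> short code.
PREFIX_CODES = {"00": "00", "010000": "01", "010001": "10", "0101": "11"}
CODE_BITS = 2  # assumption: every code in this example is 2 bits wide

def prefix_encode(bits, table=PREFIX_CODES):
    """Replace the matching prefix of a bit string with its code."""
    for prefix, code in table.items():
        if bits.startswith(prefix):
            return code + bits[len(prefix):]   # code, then the unencoded bits
    raise ValueError(f"no prefix matches {bits!r}")

def prefix_decode(encoded, table=PREFIX_CODES):
    """Invert prefix_encode for a single encoded value."""
    inverse = {code: prefix for prefix, code in table.items()}
    return inverse[encoded[:CODE_BITS]] + encoded[CODE_BITS:]
```

Applied to the eight 8-bit rows of the example, the encoded values total 42 bits instead of 64. A stream decoder would also need the original value width to know where each value ends.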

Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset, comparing fzip, BWT Compression, Prefix Compression, and Repeat Compression.]

Approach More Than 8-byte patterns
Burrows Wheeler Transform
Input and its rotations:
ABCDEABCDEABC$
$ABCDEABCDEABC
C$ABCDEABCDEAB
BC$ABCDEABCDEA
ABC$ABCDEABCDE
EABC$ABCDEABCD
DEABC$ABCDEABC
CDEABC$ABCDEAB
BCDEABC$ABCDEA
ABCDEABC$ABCDE
EABCDEABC$ABCD
DEABCDEABC$ABC
CDEABCDEABC$AB
BCDEABCDEABC$A
Sorted rotations ($ sorts after the letters):
ABCDEABCDEABC$
ABCDEABC$ABCDE
ABC$ABCDEABCDE
BCDEABCDEABC$A
BCDEABC$ABCDEA
BC$ABCDEABCDEA
CDEABCDEABC$AB
CDEABC$ABCDEAB
C$ABCDEABCDEAB
DEABCDEABC$ABC
DEABC$ABCDEABC
EABCDEABC$ABCD
EABC$ABCDEABCD
$ABCDEABCDEABC
Last column of the sorted rotations:
$EEAAABBBCCDDC
New arrays:
11010010010101 (a 1 marks the start of a new run)
$EABCDC (the symbol of each run)
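
The worked example above can be reproduced with a short sketch. It uses a naive sort of all rotations, which is fine for illustration but too slow for real datasets, and it treats the "simple compression scheme" as a run-start bit array plus a run-symbol array, as the two "New arrays" suggest. The function names are hypothetical.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform via sorted rotations (sentinel sorts last)."""
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    # The key makes the sentinel compare greater than every other symbol.
    rotations.sort(key=lambda r: [(ch == sentinel, ch) for ch in r])
    return "".join(r[-1] for r in rotations)

def run_arrays(seq):
    """Split a sequence into run-start flags and one symbol per run."""
    flags, symbols = [], []
    prev = None
    for i, ch in enumerate(seq):
        if i == 0 or ch != prev:
            flags.append("1")      # a new run starts here
            symbols.append(ch)
        else:
            flags.append("0")      # same symbol as the previous position
        prev = ch
    return "".join(flags), "".join(symbols)
```

On the slide's input, bwt("ABCDEABCDEABC$") gives $EEAAABBBCCDDC, and run_arrays of that string gives the two arrays 11010010010101 and $EABCDC.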

Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset, comparing fzip, BWT Compression, Prefix Compression, and Repeat Compression.]

Approach Combining into fzip
Algorithm
fzip starts with the BWT compression, which creates a new dataset.
Repeats are then added to the prefix codes to combine repeat and prefix
compression.

Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset, comparing fzip, BWT Compression, Prefix Compression, and Repeat Compression.]

Results
Floating Point Compression Performance
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset, comparing fzip, bzip -9, FPC 25, and gzip -9.]

Future Work
2 directions for future work: towards traditional dictionary approaches
and towards BWT.
The BWT road:
Replace prefix and repeat compression with a "Move-to-Front"
algorithm.
Has the potential for high compression ratios.
BWT makes this slower and less hardware amenable.
The traditional road:
Replace BWT with a dictionary approach (LZW).
This is more hardware amenable.
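
For reference, the Move-to-Front step named under the BWT road can be sketched as below. This is the textbook transform, not something the slides specify: the function names and the explicit alphabet argument are assumptions, and how it would be adapted to 8-byte values is left open.

```python
def mtf_encode(symbols, alphabet):
    """Move-to-Front: emit each symbol's current position, then move it to the front."""
    table = list(alphabet)
    out = []
    for s in symbols:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

def mtf_decode(indices, alphabet):
    """Invert mtf_encode given the same starting alphabet."""
    table = list(alphabet)
    out = []
    for i in indices:
        s = table.pop(i)
        out.append(s)
        table.insert(0, s)
    return out
```

After a BWT, runs of equal symbols become runs of zeros, which a following entropy coder can compress well.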
