A Multi-phase Approach to Floating-Point Compression
Kevin Townsend and Joseph Zambreno
Reconfigurable Computing Laboratory
Iowa State University
EIT’15
Townsend and Zambreno (RCL@ISU) float zip EIT’15 1 / 16
Outline
1 Introduction
2 Approach
8-byte patterns
Less Than 8-byte patterns
More Than 8-byte patterns
Combining into fzip
3 Results
4 Future Work
Introduction
What are floating point datasets?
They are arrays of floating point values.
A 64-bit floating point value has a sign bit, 11 exponent bits, and 52
fractional bits.
However, compressing them can be viewed as compressing an array of 64-bit integers.
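The bit layout described above can be illustrated with a minimal Python sketch. The helper names `split_double` and `as_uint64` are hypothetical and not part of the talk; the sketch assumes IEEE 754 binary64 values and uses only the standard `struct` module.

```python
import struct

def as_uint64(x: float) -> int:
    """Reinterpret the 8 bytes of a double as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def split_double(x: float) -> tuple[int, int, int]:
    """Return the (sign, exponent, fraction) bit fields of a double."""
    bits = as_uint64(x)
    sign = bits >> 63                  # 1 bit
    exponent = (bits >> 52) & 0x7FF    # 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)  # 52 bits
    return sign, exponent, fraction

print(split_double(1.0))   # (0, 1023, 0)
print(hex(as_uint64(1.0)))  # 0x3ff0000000000000
```

Treating each value as the integer returned by `as_uint64` is what lets the later schemes work on plain 64-bit patterns.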
Why compress them?
Compressed floating point datasets take up less space.
Compression can accelerate data transfer.
Knowledge of floating point datasets can lead to better compression
over general compression schemes.
Approach
Analysis of 3 different patterns:
Repeating values
Common prefixes
Patterns in the value sequence
We created 3 different compression schemes:
List all values and use indices in this list.
Create a tree of all prefixes and create prefix codes.
Use the Burrows-Wheeler Transform and a simple compression scheme.
We combined all 3 algorithms into one algorithm.
Approach 8-byte patterns
Analysis
[Figure: percent of total values (0% to 100%) per dataset, shaded from "Many Repeats" to "No Repeats". Datasets: msg bt, msg lu, msg sp, msg sppm, msg sweep3d, num brain, num comet, num control, num plasma, obs error, obs info, obs spitzer, obs temp.]
In every dataset, at least 50% of the values have a repeat.
Values are 8 bytes, so indices are much smaller than values.
Approach 8-byte patterns
Algorithm
All repeats are stored in a separate array.
One bit indicates whether the encoded value repeats or not.
If the value does not repeat, the 64-bit value follows.
If the value does repeat, its index in the repeat array follows.
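The four rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a "repeat" is any value that occurs more than once in the dataset, that indices are fixed-width, and that the output is a string of '0'/'1' characters for readability. The names `repeat_encode` and `repeat_decode` are hypothetical.

```python
from collections import Counter

def _index_width(table_len: int) -> int:
    # Assumed fixed-width index: enough bits to address every table entry.
    return max(1, (table_len - 1).bit_length())

def repeat_encode(values: list[int]) -> tuple[list[int], str]:
    """Encode 64-bit integers as flag bit + (table index | raw 64-bit value)."""
    counts = Counter(values)
    table = sorted(v for v, c in counts.items() if c > 1)  # the repeat array
    index_of = {v: i for i, v in enumerate(table)}
    width = _index_width(len(table))
    out = []
    for v in values:
        if v in index_of:
            out.append("1" + format(index_of[v], f"0{width}b"))  # repeat: index follows
        else:
            out.append("0" + format(v, "064b"))                  # no repeat: raw value follows
    return table, "".join(out)

def repeat_decode(table: list[int], bits: str, count: int) -> list[int]:
    """Invert repeat_encode for a stream holding `count` values."""
    width = _index_width(len(table))
    pos, values = 0, []
    for _ in range(count):
        is_repeat = bits[pos] == "1"
        pos += 1
        field_width = width if is_repeat else 64
        field = int(bits[pos:pos + field_width], 2)
        pos += field_width
        values.append(table[field] if is_repeat else field)
    return values

data = [5, 7, 5, 9, 7, 5]
table, bits = repeat_encode(data)
assert repeat_decode(table, bits, len(data)) == data
```

In this toy input, five of the six values cost 2 bits each and the single non-repeating value costs 65 bits, which shows why the scheme pays off only when repeats are common.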
Approach 8-byte patterns
Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset and on average. Series: fzip, BWT Compression, Prefix Compression, Repeat Compression.]
Approach Less Than 8-byte patterns
Analysis
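[Figure: for each dataset, the share of values (100% to 0%) whose leading bits match the previous value, plotted against bit position (0 to 64) with the SIGN, EXPONENT, and FRACTION fields marked. Axis label: number of bits matching previous value. Datasets: msg bt, msg lu, msg sp, msg sppm, msg sweep3d, num brain, num comet, num control, num plasma, obs error, obs info, obs spitzer, obs temp.]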
This figure shows how often adjacent values share a prefix in a given dataset.
As seen, the bits quickly start differing after the 12th bit, which is the end of the sign and exponent fields.
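One way to compute the quantity on the figure's axis, the number of leading bits a value shares with its predecessor, is to XOR the two 64-bit patterns and count leading zeros. The sketch below is an assumption about how such an analysis could be done, not the authors' tooling; the function names are hypothetical.

```python
import struct

def to_bits(x: float) -> int:
    """Reinterpret a double as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def matching_prefix_bits(a: int, b: int) -> int:
    """Number of leading bits (out of 64) shared by two 64-bit patterns."""
    # The highest set bit of a XOR b is the first position where they differ.
    return 64 - (a ^ b).bit_length()

def prefix_match_lengths(values: list[float]) -> list[int]:
    """Shared-prefix length of each value with the value before it."""
    bits = [to_bits(v) for v in values]
    return [matching_prefix_bits(p, c) for p, c in zip(bits, bits[1:])]

print(prefix_match_lengths([1.0, 1.5, 1.5, 2.0]))  # [12, 64, 1]
```

Here 1.0 and 1.5 share exactly the sign and exponent (12 bits), identical values share all 64 bits, and 1.5 and 2.0 share only the sign bit.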
Approach Less Than 8-byte patterns
Algorithm
[Figure: prefix tree built over the sorted values below. Each value is split into an Encoded prefix and a Not Encoded remainder.]
0.1:   0 0 1 0 1 1 1 0 ...
1.0:   0 0 1 1 1 1 0 0 ...
2.0:   0 1 0 0 0 0 0 0 ...
3.0:   0 1 0 0 0 0 1 0 ...
3.0:   0 1 0 0 0 0 1 0 ...
4.0:   0 1 0 0 0 1 0 0 ...
5.0:   0 1 0 0 0 1 0 1 ...
100.0: 0 1 0 1 0 1 1 0 ...
Prefix codes (prefix, code):
(00,00), (010000,01), (010001,10), (0101,11)
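The prefix table from the slide can be applied as in the following sketch: the matching prefix is replaced by its 2-bit code and the remaining bits are copied unchanged. The table is taken from the slide as given; how the tree is built from the data is not shown here. The function names and the 8-bit width of the toy values are assumptions for illustration.

```python
# (prefix, code) pairs copied from the slide; the prefixes are prefix-free.
PREFIX_CODES = [("00", "00"), ("010000", "01"), ("010001", "10"), ("0101", "11")]
CODE_LEN = 2  # every code in this table is 2 bits wide

def prefix_encode(value_bits: str) -> str:
    """Replace the matching prefix with its code; keep the rest verbatim."""
    for prefix, code in PREFIX_CODES:
        if value_bits.startswith(prefix):
            return code + value_bits[len(prefix):]
    raise ValueError(f"no prefix in the table matches {value_bits!r}")

def prefix_decode(encoded: str) -> str:
    """Invert prefix_encode for a single encoded value."""
    code, rest = encoded[:CODE_LEN], encoded[CODE_LEN:]
    for prefix, c in PREFIX_CODES:
        if c == code:
            return prefix + rest
    raise ValueError(f"unknown code {code!r}")

# The eight 8-bit rows shown on the slide.
rows = ["00101110", "00111100", "01000000", "01000010",
        "01000010", "01000100", "01000101", "01010110"]
encoded = [prefix_encode(r) for r in rows]
assert [prefix_decode(e) for e in encoded] == rows
print(sum(len(e) for e in encoded), "bits instead of", 8 * len(rows))  # 42 bits instead of 64
```

Longer prefixes save more bits per value, so a row under the 6-bit prefix `010000` shrinks from 8 bits to 4, while a row under the 2-bit prefix `00` stays at 8.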
Approach Less Than 8-byte patterns
Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset and on average. Series: fzip, BWT Compression, Prefix Compression, Repeat Compression.]
Approach More Than 8-byte patterns
Burrows-Wheeler Transform
Input:
ABCDEABCDEABC$
All rotations:
ABCDEABCDEABC$
$ABCDEABCDEABC
C$ABCDEABCDEAB
BC$ABCDEABCDEA
ABC$ABCDEABCDE
EABC$ABCDEABCD
DEABC$ABCDEABC
CDEABC$ABCDEAB
BCDEABC$ABCDEA
ABCDEABC$ABCDE
EABCDEABC$ABCD
DEABCDEABC$ABC
CDEABCDEABC$AB
BCDEABCDEABC$A
Sorted rotations ($ sorts last):
ABCDEABCDEABC$
ABCDEABC$ABCDE
ABC$ABCDEABCDE
BCDEABCDEABC$A
BCDEABC$ABCDEA
BC$ABCDEABCDEA
CDEABCDEABC$AB
CDEABC$ABCDEAB
C$ABCDEABCDEAB
DEABCDEABC$ABC
DEABC$ABCDEABC
EABCDEABC$ABCD
EABC$ABCDEABCD
$ABCDEABCDEABC
Last column:
$EEAAABBBCCDDC
New arrays:
11010010010101 (1 marks the start of a new run)
$EABCDC (the value of each run)
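The example on this slide can be reproduced with a short Python sketch. It uses characters in place of 64-bit values and a naive sort of all rotations, which is fine for illustration but too slow for real datasets. The function names are hypothetical, and placing the sentinel `$` after every other symbol is inferred from the sorted rotations shown on the slide.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform by sorting all rotations (sentinel sorts last)."""
    def key(rotation: str):
        # Map the sentinel above every other symbol, as in the slide's ordering.
        return [(1, "") if ch == sentinel else (0, ch) for ch in rotation]
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    return "".join(rot[-1] for rot in sorted(rotations, key=key))

def run_flags(seq: str) -> tuple[str, str]:
    """Split a sequence into a run-start bit array and the value of each run."""
    flags, values = [], []
    prev = None
    for ch in seq:
        if ch != prev:
            flags.append("1")   # a new run starts here
            values.append(ch)
        else:
            flags.append("0")   # same value as the previous position
        prev = ch
    return "".join(flags), "".join(values)

last_column = bwt("ABCDEABCDEABC$")
print(last_column)             # $EEAAABBBCCDDC
print(run_flags(last_column))  # ('11010010010101', '$EABCDC')
```

The transform groups equal symbols into runs, so the 14-symbol input shrinks to 7 run values plus 14 flag bits.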
Approach More Than 8-byte patterns
Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset and on average. Series: fzip, BWT Compression, Prefix Compression, Repeat Compression.]
Approach Combining into fzip
Algorithm
fzip starts with the BWT compression, which creates a new dataset.
Repeats are added to the prefix codes to combine repeat and prefix
compression.
Approach Combining into fzip
Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset and on average. Series: fzip, BWT Compression, Prefix Compression, Repeat Compression.]
Results
Floating Point Compression Performance
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) per dataset and on average. Series: fzip, bzip -9, FPC 25, gzip -9.]
Future Work
2 directions for future work: towards traditional dictionary approaches
and towards BWT.
The BWT road:
Replace prefix and repeat compression with a "Move-to-Front"
algorithm.
Has the potential for high compression ratios.
BWT makes this slower and less amenable to hardware.
The traditional road:
Replace BWT with a dictionary approach (LZW).
This is more amenable to hardware.
