Data Compression Project Presentation

Stockpile Resource Center –
Aircraft Compatibility
Summer Work Presentation:
Graflab Data Compression
Study
Myuran Kanga
August 12, 2010
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company,
for the United States Department of Energy’s National Nuclear Security Administration
under contract DE-AC04-94AL85000.

Presentation Outline
Introduction
Project Overview – Sam Stearns
Data Compression
Uses for Data Compression
Types of Data Compression
Three Algorithms
Testing Procedure
Compression/Decompression Example
Findings
Conclusion
Page ii

Introduction
Data Compression
Three Algorithms
Findings
Conclusion
Page 1

Introduction
• Myuran Kanga
– Bachelor’s Degree:
Oklahoma State University – Electrical Engineering
– Master’s Fellowship Program:
Rice University – Electrical Engineering (Communications
Specialization)
– Sandia: Meaningful Work/Projects:
- Team Assimilation
- Shaker Testing
- Cadence OrCAD – Electronic Design Software familiarization
- OrCAD Installation/licensing procedure documentation
- Courses – Quality for Project Management, Engineering Excellence,
LabVIEW Core I, and LabVIEW Core II
- Graflab Data Compression Study/Evaluation
Page 2

Project Overview – Graflab Data
Compression Study
Page 4
Summary: Evaluation of three Data Compression Algorithms
created by Dr. Samuel D. Stearns.
Primary Investigator/Technical Project Lead: Myuran Kanga
Key Personnel: Jerry Cap and Troy Skousen
Biography: Author – Compression Algorithms: Dr. Sam Stearns [1]
- Electrical Engineer specializing in digital signal processing
and adaptive signal processing
- Distinguished Member of the Technical Staff at Sandia
National Laboratories for 27 years. Retired in 1996.
- Author/Co-author of 7 signal processing textbooks
- Professor Emeritus at the University of New Mexico,
involved with teaching/research at the university since
1960.

Project Overview – Graflab Data
Compression Study cont.
Page 5
Project: Evaluation and interpretation of three data compression
algorithms.
- Algorithms labeled “2”, “3”, and “4”
- Code written in MATLAB
- The three algorithms are similar in nature
- Each successive algorithm adds a more sophisticated
method of compression
- The more complex algorithms are said to require longer
computation time but to give greater accuracy
- Hope to utilize compression with GRAFLAB
- GRAFLAB is a database, analysis, and plotting
package used for data reduction, analysis, and
archival purposes at Sandia.

What is Data
Compression?
Page 7
[2]

Data Compression
Definition: The process of encoding information using
fewer units of storage than an un-encoded
representation of data, through the use of
specific encoding schemes. [3]
Data compression, sometimes called source coding, is
the process of converting input data into another data
stream that has a smaller size, but retains the essential
information contained within the original data stream.
Page 8

Data Compression Implementations
Page 10
- Compression is useful because it helps reduce the
consumption of resources, such as hard disk space or
transmission bandwidth.
- With the surge of interest in environmental test data for
the Surveillance Program, computer storage resources will
come under significant strain.
- Archiving of environmental test data from legacy systems,
including data for the Environment Test lab.
- Familiar examples of compressed data files include the .zip,
.rar, and .tar.gz file extensions.
[4]

Lossless vs. Lossy Compression
Two forms of compression: Lossless and Lossy
Lossless compression:
- These types of algorithms usually exploit statistical
redundancy to represent the user’s data more concisely
without error.
- Most real-world data has statistical redundancy
- Example – In English text, the letter ‘e’ is much more
common than the letter ‘z’. Similarly, the probability that
the letter ‘q’ will be followed by the letter ‘z’ is very small.
Page 12
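As a rough illustration of statistical redundancy, the Python sketch below counts letter frequencies in a short made-up sentence and then compresses it losslessly with the standard-library zlib module. zlib (DEFLATE) is only a stand-in chosen for illustration; it is not one of the three algorithms evaluated in this study, and the sample text is hypothetical.

```python
import zlib
from collections import Counter

# Hypothetical sample text; English prose repeats letters and words.
text = ("the quick brown fox jumps over the lazy dog and then "
        "the lazy dog sleeps while the quick brown fox runs ") * 20
raw = text.encode("ascii")

# Letter statistics: 'e' should appear far more often than 'z'.
counts = Counter(text)
print("count of 'e':", counts["e"], " count of 'z':", counts["z"])

packed = zlib.compress(raw, 9)       # lossless compression
restored = zlib.decompress(packed)   # exact inverse of compress

ratio = len(raw) / len(packed)
print(f"original {len(raw)} bytes, compressed {len(packed)} bytes, ratio {ratio:.1f}")
print("round trip exact:", restored == raw)
```

Because the method is lossless, the restored bytes match the original exactly; the size reduction comes entirely from the repeated patterns in the text.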

Lossless vs. Lossy Compression
Lossy Compression:
- Guided by research on how people perceive the data in
question.
- Used when some loss of fidelity is acceptable.
- As an example, the human eye is more sensitive to subtle
variations in luminance than to variations in color.
Therefore, color detail can be reduced while preserving
the perceived integrity of images.
- JPEG image compression works in part by “rounding off”
some of this less important information.
- Lossy data compression provides a method of obtaining
the best fidelity for a given amount of compression
desired.
Page 13

Compression Algorithms
Page 15
Compression “2”
- Quantizes the data signal and packs the result into a sequence
of bytes.
Compression “3”
- Predicts the quantized data and packs the prediction error into
a sequence of bytes.
Compression “4”
- Said to provide the maximum compression
- Encodes the prediction error into a sequence of bytes using
adaptive arithmetic coding.
[5]

Compression Algorithms cont.
Page 16
Quantization
- The process of mapping a continuous range of values to a
relatively small set of discrete symbols or integer values.
- Sampling occurs on a periodic basis to convert the continuous
signal to discrete values.
- Can be viewed as accumulating data in bins
[6]
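The quantize-and-pack idea can be sketched in a few lines of Python. This is a minimal uniform quantizer under assumed settings (8 bits per sample, a known signal range of -1 to 1, a made-up 5 Hz sine as input); the function names are hypothetical and this is not the code of Compression “2”.

```python
import math

def quantize(samples, lo, hi, bits=8):
    """Map each sample in [lo, hi] to the nearest of 2**bits integer levels."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    codes = [min(levels, max(0, round((s - lo) / step))) for s in samples]
    return codes, step

def dequantize(codes, lo, step):
    """Return the bin value that each integer code stands for."""
    return [lo + c * step for c in codes]

# Hypothetical test signal: one second of a 5 Hz sine sampled at 1 kHz.
signal = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]

codes, step = quantize(signal, -1.0, 1.0, bits=8)
packed = bytes(codes)                          # one byte per sample
restored = dequantize(list(packed), -1.0, step)

max_err = max(abs(a - b) for a, b in zip(signal, restored))
print(len(packed), "bytes packed; max error", max_err, "; half a step is", step / 2)
```

Each sample lands in the nearest bin, so the reconstruction error stays within half a quantization step. Stored as 8-byte floating-point values the same signal would take 8000 bytes.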

Page 17
Linear Prediction [7]
- Signal processing tool in which future values of a digital signal
are estimated as a linear function of previous samples in the data.
- Model: a time-varying digital filter driven by an excitation function x(n)
produces the desired output y(n).
- Goal: find the excitation function and filter coefficients that
minimize the error between the predicted y(n) and the original y(n).
- Also called Linear Predictive Coding. Common application:
- Speech compression
- Transmit only filter coefficients (Hk) and excitation sequence
x(n)
- For extreme compression, transmit only the filter coefficients and
use a fixed-frequency excitation (voice coder)
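General filter model:
\[ y(n) = \sum_{j=1}^{N} a_j \, y(n-j) + \sum_{j=0}^{M} b_j \, x(n-j) \]
Linear prediction model with error term:
\[ y(n) = \sum_{j=1}^{N} a_j \, y(n-j) + e(n) \]
Predicted value:
\[ \hat{y}(n) = \sum_{j=1}^{N} a_j \, y(n-j) \]
Prediction error:
\[ e(n) = y(n) - \hat{y}(n) \]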
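A small Python sketch can show why coding the prediction error e(n) = y(n) - ŷ(n) helps. It assumes a fixed second-order predictor with coefficients a_1 = 2 and a_2 = -1 and a made-up integer sine wave; the coefficients, function names, and test signal are illustrative assumptions, not the adaptive predictor used in Compression “3” or “4”.

```python
import math

A = (2, -1)  # assumed fixed predictor coefficients a_1, a_2

def predict(history):
    """y_hat(n) = a_1*y(n-1) + a_2*y(n-2); history = [y(n-1), y(n-2)]."""
    return sum(a * y for a, y in zip(A, history))

def residual(y):
    """Prediction error e(n) = y(n) - y_hat(n), with zeros before the start."""
    e, hist = [], [0, 0]
    for sample in y:
        e.append(sample - predict(hist))
        hist = [sample, hist[0]]
    return e

def reconstruct(e):
    """Invert residual(): y(n) = y_hat(n) + e(n)."""
    y, hist = [], [0, 0]
    for err in e:
        sample = predict(hist) + err
        y.append(sample)
        hist = [sample, hist[0]]
    return y

# Hypothetical quantized signal: a slow sine scaled to integers.
y = [round(1000 * math.sin(2 * math.pi * n / 200)) for n in range(400)]
e = residual(y)

peak_y = max(abs(v) for v in y)
peak_e = max(abs(v) for v in e[2:])   # skip the two start-up samples
print("peak |y| =", peak_y, " peak |e| =", peak_e)
print("lossless:", reconstruct(e) == y)
```

For a smooth signal the error values span a much smaller range than the samples themselves, so they need fewer bits, and the decoder can still rebuild the quantized signal exactly by running the same predictor.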

Page 18
Arithmetic Coding [8]
- A long data string is represented by a single number, which is
obtained by repeatedly partitioning the range of possible values in
proportion to the probabilities of the symbols in the string.
- Example string: DABDDB
Symbol   Part 1     Part 2 – Freq. product   Product
D        6^5 x 3    (none)                   23328
A        6^4 x 0    3                        0
B        6^3 x 1    3 x 1                    648
D        6^2 x 3    3 x 1 x 2                648
D        6^1 x 3    3 x 1 x 2 x 3            324
B        6^0 x 1    3 x 1 x 2 x 3 x 3        54
Total                                        25002
Upper bound = Total + product of all frequencies:
25002 + (3 x 1 x 2 x 3 x 3 x 2) = 25110
Coded data: any value in the range [25002, 25110), for example 25100
Part 1:
- 6-symbol string = radix of 6
- Each power of 6 is multiplied by the index of the letter, A = 0 to
D = 3
Part 2:
- Multiply by the accumulated product of the frequencies of the
preceding symbols (A occurs 1 time, B 2 times, D 3 times)
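The DABDDB arithmetic can be checked with a short Python sketch. It assumes the radix-style formulation described in reference [8], with each symbol weighted by its cumulative frequency (A = 0, B = 1, D = 3 for this string) and by the product of the frequencies of the symbols before it. The function name is hypothetical, and this is not the adaptive arithmetic coder of Compression “4”.

```python
from collections import Counter

def radix_arithmetic_bounds(message):
    """Return (lower, upper) bounds of the code range for message."""
    freq = Counter(message)
    # Cumulative frequency: total count of the symbols that sort before this one.
    cum, running = {}, 0
    for sym in sorted(freq):
        cum[sym] = running
        running += freq[sym]
    n = len(message)                  # radix equals the message length
    lower, prod = 0, 1
    for i, sym in enumerate(message):
        lower += n ** (n - 1 - i) * cum[sym] * prod
        prod *= freq[sym]             # accumulate the frequency product
    return lower, lower + prod

lower, upper = radix_arithmetic_bounds("DABDDB")
print(lower, upper)   # prints: 25002 25110
```

Any integer from the lower bound up to, but not including, the upper bound identifies the string, which is why a short value such as 25100 can be transmitted.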

Evaluation Procedure/Analysis
Page 20
Classical Waveform Compression Study:
- Triangle Wave
- Trapezoid Wave
- Sine Wave
- Sawtooth Wave
- Hanning Window
- Harmonic Sine Waves
- Combined Sine Waves
- Gap Analysis
- White Noise
- Sine Wave with Noise
- Power Spectral Density
- Square Wave
- .wav File
Waveforms created manually in individual m-files for predictability of
vector arrangement in Matlab. Frequencies and signal durations are
easily modifiable.

Waveform Examples
Page 21
[Figure: four test cases (Trapezoid Wave, White Noise, Gap Analysis, Sawtooth Wave), each plotted as three panels: Original, Decompressed Waveform, and Difference. Axes: Time (Seconds) vs. Amplitude.]

Testing and Measurements
Page 22
Implemented Analysis and Measurements:
- Input and output data array sizes
- Percentage accuracy of compression
- Compression ratio
- Relative computational time
- Percent difference: Max. and min. values of original and decompressed waveforms
- Percent difference: Standard deviation value of original and decompressed waveforms
- Percent error: Max. and min. values of original and decompressed waveforms
- Percent error: Standard deviation value of original and decompressed waveforms
- Root Mean Square values of original and decompressed waveforms
- Normal values of original and decompressed waveforms
- Difference in RMS values
- Difference in Normal values
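A few of these measurements can be sketched in Python. The slides do not give the exact formulas, so the definitions below (ratio as original size over compressed size, symmetric percent difference, percent error relative to the original) are assumptions, and the function names, byte counts, and waveforms are made up for illustration.

```python
import math

def compression_ratio(original_bytes, compressed_bytes):
    """Assumed definition: original size divided by compressed size."""
    return original_bytes / compressed_bytes

def rms(x):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def percent_difference(a, b):
    """Assumed definition: difference relative to the mean magnitude."""
    mean = (abs(a) + abs(b)) / 2
    return 0.0 if mean == 0 else 100 * abs(a - b) / mean

def percent_error(reference, measured):
    """Assumed definition: error relative to the original (reference) value."""
    return 100 * abs(measured - reference) / abs(reference)

# Hypothetical original and decompressed waveforms.
original = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0]
decompressed = [0.0, 0.99, 2.0, 1.01, 0.0, -1.0, -1.98, -1.0]

print("compression ratio:", compression_ratio(6400, 4200))
print("difference in RMS:", rms(original) - rms(decompressed))
print("percent error of max:", percent_error(max(original), max(decompressed)))
print("percent difference of min:", percent_difference(min(original), min(decompressed)))
```

A compression ratio of 1 under this definition means the output is as large as the input, which matches how the ratio is interpreted in the findings.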

Compression/Decompression Example
Page 24
Using Compression “4”, the
compression ratio of the file was
1.52 with an accuracy of 99.6078
percent.
M-file written to create this
.wav file for real-world
compression/decompression
testing.
Compressed output using
Compression “2” and “4”:
turn up your volume, because the
amplitude of the compressed
file is much lower.
Compressed data should
not resemble the original
data string. This example
demonstrates the
inefficiency of
Compression “2”.
Original Song
Compressed Song – Compression 2
Decompressed Song
Compressed Song – Compression 4

Findings
Page 26
Compression “2”:
- This algorithm produced a compression ratio of about 1 in most
cases. For simple waveforms like the square wave, compression did occur.
- Fastest compression algorithm of the three
- Inefficient compression – Compression ratio of 1 = No compression
Compression “3” and “4”:
- Compression ratio increases with increased data length/duration
- Increased data length/duration causes longer calculation times, within limits
- Compression “4” produced a much higher compression ratio than the
other algorithms
- Compression “4” is the slowest algorithm because it applies three compression methods
Special Cases:
- The square wave produces 100% accuracy and very high compression with
all three algorithms
- White Noise does not seem to compress much past a ratio of 1
- Code has been modified to handle gaps in the input data
- The accuracy of compression/decompression for all three algorithms has
proven to be above 99% in all cases
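The contrast between the square wave and white noise can be reproduced in spirit with a generic lossless compressor. The Python sketch below uses the standard-library zlib module as a stand-in, not Compression “2”, “3”, or “4”, on made-up 8-bit test data, so the numbers will differ from those measured in this study.

```python
import random
import zlib

N = 10_000
# 8-bit square wave: 50 samples low, 50 samples high, repeated.
square = bytes(255 if (n // 50) % 2 else 0 for n in range(N))
# 8-bit white noise from a seeded generator so the run is repeatable.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(N))

def ratio(data):
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, 9))

square_ratio = ratio(square)
noise_ratio = ratio(noise)
print(f"square wave ratio: {square_ratio:.1f}")
print(f"white noise ratio: {noise_ratio:.2f}")
```

The square wave is almost entirely redundant, so it shrinks dramatically, while uniformly random noise has essentially no redundancy left to remove and stays near a ratio of 1.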

Future Work
Page 28
- Similar waveform analysis with the raw data files provided by Dr.
Sam Stearns
- Additional error or warning messages
- Noise
- Gaps
- Invalid array data
- Implementation of compression algorithms into Graflab database
- Investigate possibilities of real-time compression/decompression
Recommendations:
- Filter noise from data prior to compression
- Compress all data, regardless of size
- Continue implementation of replacing gaps
with zeros

Summer Work Applicability / Benefit
Page 29
- Applicability to our organization - Meaningful work
- Storing new and legacy environmental test data from the
surveillance program
- Environmental Test lab data storage
- Opportunity to continue education
- Improved MATLAB skills
- Introduction to LabVIEW
- OrCAD familiarity
- Organizational and leadership skills – Management course
- Assimilation to Albuquerque, work environment at Sandia
National Laboratories, and Aircraft Compatibility
[9] [10]

Citations and Questions
[1] University of New Mexico – ECE, “Dr. Samuel D. Stearns,” 2010. [Online]. Available:
http://www.ece.unm.edu/faculty/stearns/. [Accessed: July 2010].
[2] Plus Magazine, “Text, Bytes and Videotape,” January 1, 2003. [Online]. Available:
http://plus.maths.org/issue23/features/data/data.jpg. [Accessed: August 2010].
[3] Wikipedia, “Data compression,” July 20, 2010. [Online]. Available:
http://en.wikipedia.org/wiki/Data_compression. [Accessed: August 2010].
[4] Hoax-Slayer.com, “Burning-hard-drive,” 2010. [Online]. Available:
http://www.hoax-slayer.com/images/burning-hard-drive.jpg. [Accessed: August 2010].
[5] S. Stearns, Encoding and Decoding of Instrumentation and Telemetry Waveforms. Samuel D. Stearns:
Sandia National Laboratories. January 25, 2008.
[6] Wikipedia, “Quantization (signal processing),” July 2, 2010. [Online]. Available:
http://en.wikipedia.org/wiki/Quantization_(signal_processing). [Accessed: June 2010].
[7] Connexions, “Linear Prediction and Cross Synthesis,” March 18, 2008. [Online]. Available:
http://cnx.org/content/m15478/latest/ . [Accessed: June 2010].
[8] Wikipedia, “Arithmetic coding,” August 7, 2010. [Online]. Available:
http://en.wikipedia.org/wiki/Arithmetic_coding. [Accessed: June 2010].
[9] Rice University, Home page, 2010. [Online]. Available: http://www.rice.edu. [Accessed: August 2010].
Appendix I

Citations and Questions
[10] Sandia National Laboratories, Home page, 2010. [Online]. Available: http://www.sandia.gov. [Accessed:
August 2010].
[11] T. Skousen. (private communication). 2010.
[12] J. Cap. (private communication). 2010.
Appendix II
