This document provides an overview of audio compression. It begins with a brief history of audio compression and discusses its increasing usage today. The document then covers digital audio basics like sampling, quantization, and the conversion between analog and digital signals. It describes the differences between lossless and lossy compression techniques. Common audio compression techniques are also explained, including how they analyze and store audio signals in a more efficient way. The document concludes by discussing how compression is used in various file formats and recording devices to reduce the large amount of storage space required for high quality digital audio.
Digital Audio Tape (DAT) is a recording and playback medium developed by Sony in the 1980s that uses magnetic tape similar to audio cassettes. DAT records digital audio at sampling rates up to 48 kHz with 16-bit resolution; in its computer data-storage role, DAT drives also offer lossless data compression. DAT tapes range in length from 15 to 180 minutes depending on the amount of data stored. DAT was used professionally for master recordings and in the computer industry for data backups but was never widely adopted for home use.
The document discusses the history and technology of sound and audio, including how sound is digitized, common sound file formats like MP3 and WAV, how MIDI works for synthesized sound, software for editing and sequencing sound, and examples of sound hardware like Creative sound cards. It provides an overview of key concepts in digitized sound and music production for multimedia.
Digital audio technologies allow for the reproduction and manipulation of sound in digital form. Sound is converted from analog to digital via sampling, where the amplitude of sound waves is measured at regular intervals. This results in digital audio files that can be edited, stored and transmitted more easily than analog audio. Popular digital audio file formats include WAV, MP3, MIDI and more. Devices like the iPod and services like iTunes revolutionized portable music and digital music distribution. Technologies like text-to-speech and DAISY have also improved audio accessibility.
This document provides an overview of MPEG Audio Compression Layer 3 (MP3). It discusses how MP3 was developed under EUREKA project EU147 for Digital Audio Broadcasting. MP3 achieves compression ratios of over 12:1 for CD-quality audio by using psychoacoustic models to remove inaudible components. The encoder uses filter banks and quantization with Huffman coding, while controlling distortion and rate through nested feedback loops.
Audio compression reduces the size of audio files through lossy or lossless techniques. Lossy compression uses psychoacoustic algorithms to filter out sounds imperceptible to humans, reducing file size but introducing data loss. Lossless compression compresses files without any loss, allowing perfect restoration. Common lossy codecs include MP3, while lossless options are FLAC, ALAC, and WMA Lossless. International standards bodies like MPEG and ITU-T develop and standardize audio compression formats.
The document discusses the theory of audio, including:
1. What is audio and how it involves the production, recording, manipulation and reproduction of sound waves.
2. The basics of analog and digital audio, including how analog audio represents sound waves and how digital audio converts sound to binary numbers through sampling.
3. Key concepts in audio like bandwidth, which refers to the range of frequencies a signal occupies, and how analog audio is converted to digital audio through sampling and quantization.
Audio compression can be either lossless, which reduces file size while retaining all audio information, or lossy, which greatly reduces file size but decreases sound quality by discarding some audio information. Common uncompressed or losslessly compressed formats are AIFF, WAV, and FLAC, while common lossy formats are MP3, AAC, and Vorbis. The quality and size of compressed audio files depend on factors like sample rate, bit depth, bit rate, and number of channels; higher values generally mean higher quality audio but larger files, as the sketch below illustrates.
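To make the size arithmetic concrete, here is a minimal Python sketch (assuming uncompressed PCM and the CD parameters named above; the 128 kbps figure is a typical lossy bitrate, not one taken from this document):

def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    # bytes = samples/second * bytes/sample * channels * seconds
    return int(sample_rate_hz * (bit_depth // 8) * channels * seconds)

uncompressed = pcm_size_bytes(44_100, 16, 2, 60)        # one minute of CD-quality stereo
print(round(uncompressed / 1e6, 1), "MB uncompressed")  # about 10.6 MB
compressed = 128_000 / 8 * 60                           # the same minute at 128 kbps
print(round(compressed / 1e6, 1), "MB at 128 kbps")     # about 1.0 MB, roughly 11:1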
The document provides an overview of analog and digital audio, including:
- Analog audio uses continuous waves while digital audio represents sound as discrete numeric samples.
- Key aspects of digital audio include sampling rate, bit depth, and channels. Higher sampling rates and bit depths provide more accurate representations.
- Converting analog to digital audio involves sampling the amplitude over time using an analog-to-digital converter.
This document introduces digital audio by explaining the difference between analog and digital signals. It describes key variables that affect audio sampling including sampling rate, bit depth, and number of channels. Higher sampling rates, bit depths, and more channels captured result in higher quality audio files but also larger file sizes. The optimal balance of these variables must be determined based on the intended use and quality needed for the audio.
This document provides an overview of audio compression technologies. It discusses what audio is, why compression is needed, and the main types of audio compression: lossy and lossless. It describes some standard codecs for each type including MP3, AAC, FLAC. It explains the MPEG audio encoding and decoding process, and notes that AAC is the successor to MP3. In summary, the document covers audio fundamentals and provides details on common audio compression standards and techniques.
This document discusses key concepts in digital audio including sampling, quantization, digital recording, and disk-based audio systems. It notes that in music production, "sampling" can also mean taking parts of an existing piece to create a new production, while in digital audio quantization refers to capturing discrete amplitude values during sampling. The document also discusses sample rates, bit depths, and lossy and lossless audio formats like MP3, WAV, and FLAC. Common industry sample rates such as 44.1 kHz and 48 kHz are also outlined.
Digital audio was created in the late 1960s when Dr. Thomas Stockham began experimenting with digital tape recording using analog to digital converters. The key aspects of digital audio are:
1) Analog audio is converted to digital form through analog to digital conversion which samples the analog signal at regular intervals determined by the sample rate.
2) Higher sample rates and bit depths produce more accurate digital representations of the original analog signal but result in larger file sizes.
3) Quantization error, in the form of quantization noise, occurs when sample values are rounded to binary numbers during digitization and can be reduced by dithering and increasing bit depth.
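A small numerical illustration of point 3 (a sketch, not any particular product's algorithm; note that dither does not lower the error power — it decorrelates the error from the signal, so a tone quieter than one quantization step survives on average instead of rounding to silence):

import numpy as np

rng = np.random.default_rng(0)
fs = 8_000
t = np.arange(fs) / fs
step = 2.0 / 2 ** 8                              # quantizer step for 8 bits over [-1, 1]
x = 0.4 * step * np.sin(2 * np.pi * 100 * t)     # a tone quieter than one step

def quantize(sig, dither):
    d = rng.uniform(-step / 2, step / 2, sig.shape) if dither else 0.0
    return np.round((sig + d) / step) * step

for dither in (False, True):
    y = quantize(x, dither)
    recovered = float(np.dot(y, x) / np.dot(x, x))   # how much of the tone survives
    print("dither =", dither, "-> recovered tone fraction ~", round(recovered, 2))
# Without dither the tone rounds to silence; with dither it survives as a noisy average.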
The document discusses digital audio and its uses in multimedia. It describes sound as pressure waves and how digital audio is captured through sampling. Sampling converts analog sound waves into discrete digital values through an analog-to-digital converter. The document also discusses common audio file formats like WAV and MIDI, which transmits instructions for musical notes rather than raw audio.
This document discusses audio compression techniques. It begins by defining audio and compression. There are two main types of audio compression: lossy and lossless. Lossy compression reduces file sizes but results in some quality loss, while lossless compression decompresses the file back to its original quality. Common lossy audio compression methods are discussed, including those based on psychoacoustics, i.e., how humans perceive sound. MPEG layers are then introduced as a standard for audio compression, with Layer I offering high quality at the highest bitrates, and Layer III providing greater compression while retaining high quality at bitrates as low as 64 kbps. Coding efficiency increases with each successive layer.
Audio compression reduces the bandwidth and file size of digital audio streams and files. Lossy compression provides greater compression rates than lossless compression and is used in consumer devices, but introduces irreversible changes. Lossless compression produces an exact digital duplicate upon decompression. Common lossless formats are FLAC and Apple Lossless. Lossy compression is used widely and achieves much greater compression ratios, discarding less important data, but recompressing causes quality loss making it unsuitable for professional audio editing.
This presentation discusses the production of digital audio. It also gives a brief introduction to digital audio broadcasting, recording techniques, and stereophony.
The sample rate is the number of samples of a sound taken per second to represent it digitally. The higher the sample rate, the more accurate the digital representation. Sample rates are measured in hertz (Hz) or kilohertz (kHz). Common sample rates for digital music and audio in videos range from 44.1 kHz to 192 kHz, with 44.1 kHz used for compact discs and 48 kHz for most digital video formats. Higher sample rates like 96 kHz and 192 kHz provide higher quality representation for professional music recording and mastering.
This document provides an overview of MPEG-1 audio compression. It describes the key components of the MPEG-1 audio encoder including the polyphase filter bank that transforms audio into frequency subbands, the psychoacoustic model that determines inaudible parts of the signal, and the coding and bit allocation process that assigns bits to subbands. The overview concludes by noting that MPEG-1 audio provides high compression while retaining quality and paved the way for future audio compression standards.
Sampling rate refers to the number of digital samples taken per second of an analog audio signal. A higher sampling rate allows for more accurate reproduction of the original sound by capturing more data. The standard CD sampling rate is 44.1 kHz.
Bit depth determines the number of possible amplitude levels that can be represented in each digital sample. A higher bit depth provides more precision in capturing the amplitude but requires more storage space. Standard CD audio has a bit depth of 16 bits, providing 65,536 possible amplitude levels per sample.
When an analog audio signal is converted to digital, the continuous waveform is converted into discrete samples. The difference between the original analog signal and the quantized digital representation is called quantization error.
Audio Compression Techniques
Audio data compression is a form of lossy or lossless compression in which the amount of data in a recorded waveform is reduced for storage or transmission, with (lossy) or without (lossless) some loss of quality; it is used in CD and MP3 encoding and Internet radio.
Dynamic range compression, also called audio level compression, reduces the dynamic range of an audio waveform: the difference between its loudest and quietest passages.
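A minimal Python sketch of the dynamic-range-compression idea (a static gain curve with hypothetical threshold and ratio parameters; real compressors add attack and release smoothing):

import numpy as np

def compress_drc(x, threshold_db=-20.0, ratio=4.0):
    # Static gain curve: above the threshold, output level rises by
    # only 1/ratio dB for every 1 dB of input level.
    level_db = 20 * np.log10(np.abs(x) + 1e-12)
    over = np.maximum(level_db - threshold_db, 0.0)
    gain_db = -over * (1.0 - 1.0 / ratio)
    return x * 10 ** (gain_db / 20)

x = np.concatenate([0.05 * np.ones(4), 0.9 * np.ones(4)])   # quiet then loud
print(np.round(compress_drc(x), 3))   # quiet part untouched; loud part pulled down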
The document discusses digital audio and the process of digitizing sound. It explains that sound is converted to a stream of numbers through sampling and quantization. Sampling measures the amplitude of sound waves at regular time intervals, while quantization represents the measured amplitude with a finite number of digital values. For high quality audio, sampling rates of 44.1 kHz or higher and bit depths of 16 bits are commonly used. The document also covers topics like the Nyquist theorem, audio formats, editing digital audio, and more.
This document summarizes audio and video compression techniques. It defines compression as reducing the number of bits needed to represent data. For audio, it describes lossless compression which removes redundant data without quality loss, and lossy compression which removes irrelevant data and degrades quality. It also describes audio level compression. For video, it defines lossy compression which greatly reduces file sizes but decreases quality, and lossless compression which preserves quality. The advantages of compression are also stated such as faster transmission and reduced storage needs, while disadvantages include possible quality loss and extra processing requirements.
This document summarizes a seminar presentation on audio compression techniques. It introduces common audio compression methods like PCM, DPCM, adaptive DPCM, linear predictive coding, perceptual coding, and MPEG audio coders. Specific techniques covered include third order predictive DPCM, backward and forward adaptive bit allocation used in Dolby AC-1. Applications of audio compression include conferencing, broadcasting radio programs by satellite, and saving memory space in sound cards.
Sound is created by vibrations that travel as waves through air or another medium. These sound waves can be captured and converted into digital audio files through the process of sampling and quantization. The quality of a digital audio file depends on factors like sampling rate, sample size, and avoiding clipping. Preparing high quality digital audio involves balancing file size needs with sound quality through proper recording levels and format settings.
This document provides an overview of digital audio compression techniques. It discusses how audio compression removes redundant or irrelevant information to reduce required storage space and transmission bandwidth. It describes how psychoacoustic modeling is used to eliminate inaudible components based on principles of masking. Spectral analysis is performed using transforms or filter banks to determine masking thresholds. Noise allocation quantizes frequency components to minimize noise while meeting thresholds. Additional techniques like predictive coding, coupling/delta encoding, and Huffman coding provide further compression. The encoding process involves analyzing, quantizing, and packing audio data into frames for storage or transmission.
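Of the techniques named above, Huffman coding is the most compact to illustrate. A minimal Python sketch (illustrative only; real audio coders apply it to quantized frequency-domain values, not raw lists):

import heapq
from collections import Counter

def huffman_code(symbols):
    # Build (count, tiebreak, {symbol: bitstring}) entries; merge the two
    # rarest until one tree remains. Frequent symbols get short codes.
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

data = [0, 0, 0, 0, 1, 1, -1, 2]      # quantized values: zero dominates
code = huffman_code(data)
bits = sum(len(code[s]) for s in data)
print(code)                            # e.g. {0: '0', 1: '10', -1: '110', 2: '111'}
print(bits, "bits vs", len(data) * 2, "bits fixed-length")   # 14 vs 16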
Jonathan introduces digital media, including digital audio players that store and play audio files. Audio editing developed in the 20th century. The growing popularity of high-quality audio compression will change how consumers use recorded music. Digital video uses digital representations recorded on tape or discs, while digital photography uses electronic devices to capture and store images digitally rather than on film. Digital media hardware includes cameras, recorders, and computers used for digital audio, video, and photography.
The document discusses digitized sound data used in multimedia presentations. It describes key concepts like sampling frequency, sampling depth, file formats for storing sound like WAV and MP3. Higher sampling frequencies and depths improve sound quality but increase file sizes. Compression techniques like MP3 reduce file sizes significantly compared to uncompressed formats like WAV, with some loss of quality.
Sampling rate refers to the number of times per second that the amplitude of a sound wave is recorded in a digital audio file. A higher sampling rate allows for higher frequencies to be captured, more closely representing the original sound. The standard CD sampling rate is 44.1 kHz. Bit depth refers to the number of possible values used to record the amplitude of each sample, with 16 bits being standard for CD quality audio. Higher bit depths can capture a wider dynamic range of sounds but take up more storage space. Lossy compression formats like MP3 and Vorbis reduce file sizes by removing some audio information considered imperceptible to human hearing.
1. The document discusses different compression techniques for text, audio, images, and video.
2. It provides examples of compression ratios achieved using lossy and lossless compression methods. For example, text compression can achieve 3:1 ratios using Lempel-Ziv coding, while audio compression can achieve ratios from 3:1 to 24:1 using MP3.
3. The techniques discussed include entropy encoding, run-length encoding, Huffman coding, discrete cosine transforms, and differential encoding which takes advantage of redundancies in the data. The best approach depends on the type of data and acceptable quality.
This document discusses various audio compression techniques including:
1. Differential Pulse Code Modulation (DPCM) which encodes differences between samples to reduce bitrate.
2. Third-order predictive DPCM, which predicts each sample from the previous three samples to improve accuracy over plain DPCM.
3. Adaptive Differential PCM (ADPCM) which varies the number of bits used based on signal amplitude.
It then covers more advanced techniques like Linear Predictive Coding (LPC), which analyzes perceptual features of audio to further reduce bitrates. A first-order DPCM sketch follows below.
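A first-order DPCM sketch in Python (illustrative parameters; real codecs such as ADPCM adapt the step size rather than fixing it):

import numpy as np

def dpcm_encode(x, nbits=4, step=0.05):
    # Quantize the difference between each input sample and the previous
    # *reconstructed* sample, so encoder and decoder stay in sync.
    qmax = 2 ** (nbits - 1) - 1
    codes, pred = [], 0.0
    for s in x:
        q = int(np.clip(np.round((s - pred) / step), -qmax - 1, qmax))
        codes.append(q)
        pred += q * step
    return codes

def dpcm_decode(codes, step=0.05):
    out, pred = [], 0.0
    for q in codes:
        pred += q * step
        out.append(pred)
    return np.array(out)

t = np.arange(64) / 8_000
x = 0.5 * np.sin(2 * np.pi * 200 * t)   # slowly varying, so differences are small
y = dpcm_decode(dpcm_encode(x))
print("max error ~", round(float(np.max(np.abs(x - y))), 3))  # about half a step, at 4 bits/sample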
This document discusses digital audio media, including the goals of understanding various audio file formats and audio compression. It explains key terms like audio and compression, and asks the reader to research and explain the characteristics of three audio file formats, including whether they are compressed. Keywords include audio, sound, compression and platforms that allow listening to digital media.
This document provides information on using feedback to improve work. It discusses choosing appropriate file formats for different scenarios and understanding how compression can impact file size and quality. The document emphasizes using feedback to strengthen areas of research, planning, creation and review in revision guides with the goal of improvement. Key words around feedback and improvement are also defined.
The document discusses audio compression techniques. It begins with an introduction to pulse code modulation (PCM) and then describes μ-law and A-law compression standards which compress audio using companding algorithms. It also covers differential PCM and adaptive differential PCM (ADPCM) techniques. The document then discusses the MPEG audio compression standard, including its encoder architecture, three layer standards (Layers I, II, III), and applications. It concludes with a comparison of various MPEG audio compression standards and references.
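A minimal sketch of μ-law companding as described above, using the continuous μ-law formula (the G.711 standard actually uses a segmented 8-bit approximation of this curve):

import numpy as np

MU = 255.0   # the standard North American / Japanese value

def mulaw_compress(x):
    # Map linear samples in [-1, 1] onto a logarithmic scale so that
    # quiet samples get a larger share of the quantizer's levels.
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mulaw_expand(y):
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

x = np.array([0.001, 0.01, 0.1, 1.0])
print(np.round(mulaw_compress(x), 3))                   # [0.041 0.228 0.591 1.   ]
print(np.allclose(mulaw_expand(mulaw_compress(x)), x))  # True: companding is invertible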
Multimedia Technologies Introduction Subject
Multimedia Technology introduction - I created these slides for my students to teach CMP 383 Multimedia Technology at Jazan Community College, Jazan University.
Audio disc: processing of the audio signal, read-out from the disc, and reconstruction of the audio signal. Video disc: video disc formats, recording systems, and playback systems, including CD players, DVD players, and Blu-ray discs.
Abstract: Many sound recorders on the market today are designed for recording sound in digital format, built on various platforms. The recorder described here is designed on an ARM9 base and records sound in .wav format. Recordings can be stored on multiple memory devices: here, on an SD card as well as on a connected USB device. With this device, multiple users can obtain the recorded audio file without needing individual sound recorders; by connecting their USB devices to the system, they receive the recording in .wav format. The recorded file can also be transferred to a remote place. Keywords: ARM9, SD card, USB device, .wav format, remote place.
This chapter describes the process of converting analog signals to digital form. It discusses sampling, where the analog signal is captured at discrete time intervals. It also discusses quantization, where the amplitude of the signal at each sample is assigned a digital value. The sampling rate and number of quantization levels affect the quality and size of the digital data. Digital signals have advantages over analog like improved quality, ability to edit and combine content, and efficient compression and storage. The chapter also covers filtering of digital signals.
This document discusses different multimedia elements including sound, animation, and video. It covers:
- Understanding how sound works through analog waves, digital sampling, and sound file formats. Sound can be added to multimedia through a sound card and editing software.
- 2D and 3D animation techniques and how they are used on the web. Animation can enhance multimedia titles.
- Video compression methods, editing, and embedding video on the web. The file size of video content needs to be decreased for web use.
- The document considers balancing multimedia elements with objectives, costs, and file sizes for different intended audiences and applications.
Digital Audio Watermarking Using Psychoacoustic Model and CDMA Modulation
Digital watermarking is used to insert information (a signature) into a computer document. The added signature must be imperceptible and undetectable by any system that does not know its mode of insertion; in particular, it must be completely invisible to the human eye. This differs from cryptography, which hides a message by making it unreadable. Digital watermarking now extends to embedding other data within music signals: audio watermarking consists of embedding inaudible information in an audio signal. The watermarking system must guarantee transmission of this information in a way that is inaudible, reliable, and robust against a set of disruptions. To this end, we propose a new insertion strategy adapted to a watermarking system, which produces a watermark that is inaudible and maximally robust to added noise.
This document provides an introduction to communication systems. It defines a communication system as a system that transfers information from one place to another. Communication systems have various components including a source that generates a message, a transmitter that converts the message to a signal, a channel that conveys the signal, a receiver that converts the signal back to a message, and a destination. Communication systems can transfer both analog and digital signals and messages. Key aspects of communication systems discussed include modulation, encoding, bandwidth, and the tradeoff between communication resources and system performance.
Digital audio can be in the form of sampled audio or MIDI data. Sampled audio involves capturing an analog sound wave by taking regular samples of its amplitude at a certain sampling rate. MIDI data instead conveys musical performance instructions rather than the actual sound. While sampled audio requires more storage space, MIDI files are smaller and can be embedded in web pages more easily. When choosing an audio format, considerations include file size, compatibility, and the sound playback capabilities of the end user's system.
Digital audio can be in the form of sampled audio or MIDI data. Sampled audio involves capturing an analog sound wave through sampling at regular intervals (sample rate) and storing the amplitude measurements digitally. MIDI data instead stores instructions on how to recreate a musical performance without the actual sound recording. While sampled audio provides higher quality sound, MIDI files are smaller in size and can be changed without affecting pitch or quality, making them suitable for early web embedding. Factors like file format, playback capabilities and sound type must be considered when adding audio to multimedia projects.
This document discusses a redundancy removal technique for real-time voice compression. It begins by introducing voice compression and its increasing popularity. It then describes implementing a redundancy removal technique using MATLAB to encode and compress speech in real-time. The technique accurately estimates speech parameters and is computationally efficient. Testing showed it provided high compression and high quality audio. The technique reduces bandwidth needs for voice traffic, providing better performance than other methods for real-time applications.
Multimedia and System Design: Sound and Images, by Zubair Yaseen & Yameen Shakir, University of Education
The document discusses various topics related to multimedia, including images, sounds, and digital audio. It provides details on bitmap and vector images, describes the components of sound waves, and explains how to record and edit digital audio files. Tips are provided on balancing sound quality and file size when preparing digital audio, and setting proper recording levels to avoid distortion. Different image and audio file formats used for multimedia are also cited.
This document provides an introduction to digital audio and sound representation on computers. It discusses how sound is a pressure wave that humans can hear within the range of 20Hz to 20kHz. For digital storage and processing, sound must be sampled and quantized by converting the analog sound signal into a digital format. Key concepts covered include the Nyquist theorem for sampling rate, quantization levels for quality/signal-to-noise ratio, common audio file formats, and an overview of MIDI for digital musical instrument communication.
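A small numerical sketch of the Nyquist theorem mentioned above: a tone above half the sampling rate aliases down to a lower frequency (here, 6 kHz sampled at 8 kHz shows up at 2 kHz).

import numpy as np

fs = 8_000                       # sampling rate; the Nyquist limit is fs/2 = 4 kHz
t = np.arange(fs) / fs           # one second of samples

for f in (1_000, 6_000):         # 6 kHz violates the limit
    x = np.sin(2 * np.pi * f * t)
    peak_hz = np.argmax(np.abs(np.fft.rfft(x))) * fs / len(x)
    print(f, "Hz tone -> spectral peak at", int(peak_hz), "Hz")
# 1000 Hz appears at 1000 Hz, but 6000 Hz aliases to 8000 - 6000 = 2000 Hz.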
The document describes a student project that used a Texas Instruments DSP starter kit to implement digital signal processing techniques for real-time audio effects and a haptic beat detector device. Key aspects included designing digital filters in MATLAB to create effects like echo, reverb and chorus. A haptic motor controller connected to the DSP board detected beats in music and vibrated in time. The project provided hands-on experience with DSP concepts and their applications in areas like assistive technology. Evaluation showed the audio effects and beat detector worked as intended.
This document discusses a proposed low bit rate audio codec algorithm using the discrete wavelet transform. The key aspects of the algorithm are listed below, followed by a small thresholding sketch:
1. Choosing an optimal wavelet basis for audio signals and determining the optimal decomposition level in the discrete wavelet transform.
2. Applying thresholding to wavelet coefficients to truncate insignificant coefficients, allowing data compression while maintaining suitable peak signal to noise ratio.
3. Comparing performance of the audio codec using discrete wavelet transform to one using discrete wavelet packet transform.
4. Applying a postfiltering technique to improve the quality of the reconstructed audio signal by estimating and subtracting the error in the coded signal.
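A minimal sketch of the coefficient-thresholding step (point 2 above), assuming the PyWavelets package is available; the wavelet choice and threshold rule here are illustrative, not those of the proposed codec:

import numpy as np
import pywt   # PyWavelets

fs = 8_000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)

coeffs = pywt.wavedec(x, "db4", level=5)                       # multi-level DWT
thr = 0.05 * max(float(np.max(np.abs(c))) for c in coeffs)
kept = [pywt.threshold(c, thr, mode="hard") for c in coeffs]   # zero small coefficients

zeroed = sum(int(np.sum(c == 0)) for c in kept)
total = sum(c.size for c in kept)
y = pywt.waverec(kept, "db4")[: len(x)]
snr = 10 * np.log10(np.mean(x**2) / np.mean((x - y) ** 2))
print(zeroed, "of", total, "coefficients zeroed; reconstruction SNR ~", round(float(snr), 1), "dB")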
This document discusses multimedia information representation and digitization principles. It covers the different media types used in multimedia like text, images, audio, and video. It explains how each media type is represented digitally and the encoding and decoding processes used to convert analog signals to digital and vice versa. It also discusses topics like digital sampling, quantization, signal bandwidth, encoding design, and image and text representation formats.
The document discusses various approaches for streaming stored audio and video over the internet. It describes:
1. Using a web server, which allows simple downloading of compressed files but requires fully downloading before playback.
2. Using a web server with a metafile, which provides information to the media player to access the audio/video file, reducing download time.
3. Using a separate media server, as web servers are designed for TCP, while streaming requires UDP for improved performance without retransmissions. The media player accesses the audio/video file from the media server.
Recordings can be analog or digital. Analog recording captures sound waves on a medium like a phonograph. Digital recording converts sound to a series of numbers and stores it digitally. Recordings are used in education for speech practice, drama, and music. They allow replaying lessons and come in formats like tapes, records, and CDs. The recording process involves capturing sound, converting it to numeric format, storing it, and reconverting playback. Recordings provide a safe, easy way to store information but overuse can bore students.
This document discusses audio compression using multiple transformation techniques for audio applications. It compares the Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) for compressing audio signals. The DCT and DWT are applied to audio signals to generate new data sets with smaller values, achieving compression. Performance is evaluated using metrics like compression ratio, peak signal-to-noise ratio, signal-to-noise ratio, and normalized root mean square error. The results show that DWT provides a lower compression ratio but higher performance metrics compared to DCT. Overall, the document examines using DCT and DWT transforms to compress audio signals and compares their performance.
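The metrics named above are straightforward to compute; a minimal sketch using their usual definitions (variable names are illustrative):

import numpy as np

def metrics(original, reconstructed, kept_coeffs, total_coeffs):
    err = original - reconstructed
    mse = np.mean(err ** 2)
    return {
        "compression_ratio": total_coeffs / kept_coeffs,        # e.g. 4.0 means 4:1
        "snr_db": 10 * np.log10(np.mean(original ** 2) / mse),
        "psnr_db": 10 * np.log10(np.max(np.abs(original)) ** 2 / mse),
        "nrmse": float(np.sqrt(mse) / (np.max(original) - np.min(original))),
    }

x = np.sin(np.linspace(0, 20 * np.pi, 1000))
y = x + 0.01 * np.random.default_rng(0).standard_normal(1000)   # stand-in reconstruction
print(metrics(x, y, kept_coeffs=250, total_coeffs=1000))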
Data Compression using Multiple Transformation Techniques for Audio Applications
Novel Approach of Implementing Psychoacoustic Model for MPEG-1 Audio
1. Analog-to-digital conversion (ADC) allows computers to interact with analog signals by sampling and quantizing analog signals from devices like CD players.
2. During recording, an ADC converts an analog audio signal into a digital format by repeatedly measuring and assigning a binary number to the signal's amplitude at set intervals defined by the sample rate.
3. During playback, a digital-to-analog converter (DAC) reconverts the digital numbers back into an analog signal by combining the amplitude information from each sample to rebuild the original wave. A round-trip sketch follows below.
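A round-trip sketch of points 2 and 3 in Python (a uniform quantizer on a sine test signal; the well-known rule of thumb is roughly 6 dB of SNR per bit):

import numpy as np

fs = 48_000
t = np.arange(fs) / fs
analog = 0.9 * np.sin(2 * np.pi * 1_000 * t)   # stand-in for the analog input

for bits in (8, 12, 16):
    levels = 2 ** bits
    # ADC: map [-1, 1] onto integer codes at each sample instant.
    codes = np.clip(np.round((analog + 1) / 2 * (levels - 1)), 0, levels - 1)
    # DAC: map the codes back to amplitudes to rebuild the wave.
    rebuilt = codes / (levels - 1) * 2 - 1
    noise = analog - rebuilt
    snr = 10 * np.log10(np.mean(analog ** 2) / np.mean(noise ** 2))
    print(bits, "bits -> SNR ~", round(float(snr), 1), "dB")   # ~6 dB gained per extra bit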
1. Audio Compression
by: Philipp Herget
Sufficiency Course Sequence:
Course Number Course Title Term
HI1341 Introduction to Global History A92
HI2328 History of Revolution in the 20th Century B92
MU1611 Fundamentals of Music I A93
MU2611 Fundamentals of Music II B93
MU3611 Computer Techniques in Music C94
Presented to: Professor Bianchi
Department of Humanities & Arts
Term B, 1996
FWB5102
Submitted in Partial Fulfillment
of the Requirements of
the Humanities & Arts Sufficiency Program
Worcester Polytechnic Institute
Worcester, Massachusetts
2. Abstract
This report examines the area of audio compression and its rapidly expanding use
in the world today. Covered topics include a primer on digital audio, discussion of
different compression techniques, a description of a variety of compressed formats, and
compression in computers and Hi-Fi stereo equipment. Information was gathered on a
multitude of di erent compression uses.
4. 1 Introduction
The first form of audio compression came out in 1939 when Dudley first introduced the
VOCODER (VOice CODER) to reduce the amount of bandwidth needed to transmit speech
over a telephone line (Lynch, 222). The VOCODER broke speech down into certain frequency
bands, transmitted information about the amount of energy in each band, and then
synthesized speech using the transmitted information on the receiving end of the device. Since
then, there has been a great deal of research conducted in the area of audio compression. In
the 1960s, compression was used in telephony, and extensive research was done to minimize
bandwidth needed to transmit audio data (Nelson, 313). Today, audio compression is a large
subarea of Audio Engineering.
The need for audio compression is brought about by the tremendous amount of space
required to store high quality digital audio data. One minute of CD quality audio data
takes up 4 Mbytes of storage space (Ratcliff, 32). The use of compression allows a significant
reduction in the amount of data needed to create audio sounds with usually only a minimal loss
in the quality of the audio signal. Compression comes at the expense of the extra hardware or
software needed to compress the signal. However, in today's technologically advanced times,
this cost is usually small compared to the cost of the space that is saved.
Compression is used in almost all new digital audio devices on the market, and in many of
the older ones. Some examples are the telephone system, digital message recorders, like those
in answering machines, and Sony's new MiniDisc player. With the use of compression, these
devices are able to store more information in less space. Compression is accompanied by a
loss in quality, but usually so minimal it cannot be heard by most people. A good example
of this is the anti-shock mechanism found in the newer CD players. This mechanism uses a
small portion of digital memory to buffer digital data from the CD. When a physical shock
disrupts the player and it can no longer read data from the CD, the data from the memory
buffer is used to generate the audio signal until the player re-tracks on the CD. To store a
maximum amount of data, the player uses compression to store the data in the memory. The
Panasonic SL-S600C has such an anti-shock mechanism with 10 seconds of storage buffer.
The Panasonic SL-S600C Operating Instructions state:
The extra anti-shock function incorporates digital signal compression technology.
When listening to sound with the unit connected to a system at home, it is
recommended that the extra anti-shock switch be set to the OFF position.
The recommendation is given because the compression algorithm used in the storage has a
slightly detrimental impact on the sound quality.
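To make the tradeoff concrete, the following back-of-the-envelope calculation in C shows how much memory a 10-second anti-shock buffer would need with and without compression; the 4:1 compression ratio is an assumed illustrative figure, since the ratio actually used by the player is not documented here:

    #include <stdio.h>

    /* Rough sizing of an anti-shock buffer. CD audio is 44100
     * samples/s, 16 bits (2 bytes) per sample, 2 channels. The
     * 4:1 compression ratio is an assumption for illustration. */
    int main(void)
    {
        const long bytes_per_second  = 44100L * 2 * 2; /* 176400 B/s of PCM */
        const int  buffer_seconds    = 10;             /* as on the SL-S600C */
        const int  compression_ratio = 4;              /* assumed */

        long raw        = bytes_per_second * buffer_seconds;
        long compressed = raw / compression_ratio;

        printf("uncompressed buffer: %ld bytes\n", raw);        /* ~1.7 MB */
        printf("compressed buffer:   %ld bytes\n", compressed); /* ~441 KB */
        return 0;
    }

With compression, the same 10 seconds of protection fits in roughly a quarter of the memory, which is what makes the feature practical in a portable player.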
The use of audio compression is a tradeoff among different factors. Knowledge of audio
compression is useful not only to the designer, but also to the consumer. The key questions that
arise in the evaluation of an audio compression system are how much the data is compressed,
what are the losses associated with the compression, and what is the cost of the compression.
This paper will answer some of these questions by providing a basic awareness of compression,
giving background on compression, explaining various popular compression techniques, and
discussing the compression formats used in various audio devices and audio computer files.
2 Digital Audio Basics
Compression can be accomplished using two different methods. The first method is to take
the data from a standard digital audio system and compress it using software. The second is
to encode the signal in a different yet similar manner to that done in a normal digital audio
system. Both of these methods are based on digital audio theory; therefore, the understanding
of their functionality and performance requires an understanding of digital audio basics.
The sounds we hear are caused by variations in air pressure which are picked up by our
ear. In an analog electronic audio system, these pressure signals are converted to an electric
voltage by a microphone. The changing voltage, which represents the sound pressure, is
stored on a medium (like tape), and later used to control a speaker to reproduce the original
sound. The largest source of error in such an audio system occurs in the storage and retrieval
process where noise is added to the sound.

[Figure 1: An Example of an Analog Waveform (voltage / air pressure vs. time)]
The idea behind a digital system is to represent an analog (continuous) waveform as a
finite number of discrete values. These values can be stored in any digital media, such as a
computer. Later, the values can be converted back to an analog audio signal. This method
is advantageous over the older analog techniques because no information (quality) is lost in
the storage and retrieval process. Also unlike analog, when a copy of a digital recording is
made, the values can be exactly duplicated, creating an exact replica of the original digital
work. However, the process does suffer other losses. These losses occur in the conversion
process from the analog to the digital format.
To explain the analog to digital conversion process, we will look at an analog audio
waveform and show each of the steps taken in digitizing it. The waveform in Figure 1
represents a brief moment of an audible sound. The amplitude of the waveform represents
the relative air pressure due to the sound.
In a digital system, the waveform is represented by a series of discrete values. To get
these values, two steps must be taken. First the signal is sampled. This means that discrete
values of the signal are selected in time. The second step is to quantize each of the values
attained in the sampling step. Quantization reduces the amount of storage space required for
each value in a digital system.
[Figure 2: An Example of a Sampled Analog Waveform (voltage vs. time; samples marked at interval T)]

In the first step, the samples are taken at constant intervals. The number of samples
taken every second is called the sampling rate. Figure 2 shows the result of sampling the
signal. The X's on the waveform represent the samples which were taken. Since the samples
were taken every T seconds, there are 1/T samples per second. The sampling rate shown
in Figure 2 is therefore 1/T samples/s. Typical sampling rates range from 8000 samples/s
(telephone quality) to 44100 samples/s (CD quality). The term samples/s is often replaced
by the term Hz, kHz, or MHz to represent units of samples/s, kilosamples/s, or megasamples/s
respectively (Audio FAQ).
The sample values, the values with the X's, now represent the original waveform. These
values could now be stored and used at a later time to recreate the original signal. How
well the original signal can be recreated is related to the number of samples taken in a given
time period. Therefore, the sampling rate is a critical factor in the quality of the digitized
signal. If too few samples are taken, then the original signal cannot be re-generated correctly.
In 1933, a publication by Harry Nyquist proved that if the sampling rate is greater
than twice the highest frequency of the original signal, the original signal can be exactly
reconstructed (Nelson, 321). This means that if we sample our original signal at a rate that
is twice as high as the highest frequency contained in the signal, there will be no theoretical
loss of quality. This sampling rate, necessary for perfect reconstruction, is commonly
referred to as the Nyquist rate.
[Figure 3: An Example of Quantization of a Sampled Analog Waveform (voltage vs. time; samples at interval T)]

Now that we have a set of consecutive samples of the original signal, the samples need
to be quantized in order to reduce the storage space required by each sample. The process
involves converting the sampled values into a certain number of discrete levels, which are
stored as binary numbers. A sample value is typically converted to one of 2^n levels, where n
is the number of bits used to represent each sample digitally. This process is carried out in
hardware by a device called an analog to digital converter (ADC).
The result of quantizing the values from Figure 2 is shown in Figure 3. The samples still
have approximately the same value as before, but have been "rounded off" to the nearest of
16 different levels. In a digital system, the amount of storage space required by a number
is governed by the number of possible values that number could have. By quantizing the
sample, the number of possible values is limited, significantly reducing the required storage
space. After quantizing the value of each sample in the figure to one of 2^4 = 16 levels, only 4 bits
of storage are needed for each sample. In most digital audio systems, either 8 or 16 bits are
used for storage, yielding 2^8 = 256 or 2^16 = 65536 different levels in the quantization process.
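The following C sketch illustrates uniform quantization as described above; the [-1, 1] signal range is an assumption made for illustration:

    #include <math.h>

    /* Quantize a sample in [-1.0, 1.0] to one of 2^n uniform levels,
     * returning the level index that would be stored. A real system
     * does this in hardware in the ADC. */
    int quantize(double sample, int n_bits)
    {
        int levels  = 1 << n_bits;   /* 2^n levels */
        double step = 2.0 / levels;  /* width of one quantization step */
        int index   = (int)floor((sample + 1.0) / step);
        if (index >= levels) index = levels - 1; /* clamp sample = +1.0 */
        if (index < 0)       index = 0;          /* clamp sample = -1.0 */
        return index;                /* needs only n_bits bits to store */
    }

    /* Convert a stored level index back to an approximate sample value
     * (the midpoint of its quantization step). */
    double dequantize(int index, int n_bits)
    {
        int levels  = 1 << n_bits;
        double step = 2.0 / levels;
        return -1.0 + (index + 0.5) * step;
    }

With n_bits = 4 this gives the 16 levels of Figure 3; with n_bits = 8 or 16 it gives the 256 or 65536 levels of common audio systems.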
The quantization process is the most significant source of error in a digital audio signal.
Each time a value is quantized, the original value is lost, and the value is replaced by an
approximation of the original. The peak value of the error is 1/2 the value of the quantization
step. Thus the smaller the quantization steps, the smaller the error is. This means the more
bits used to quantize the signal, the better the quality of the reconstructed sound signal, and the
more space required to store the signal values.

[Figure 4: An Example of a Signal Reconstructed from the Digital Data (voltage vs. time)]
To regain the original signal, the values stored as the digital audio signal are
converted back to an analog audio signal using a Digital to Analog Converter (DAC). An
example of the output of the DAC is shown in Figure 4. The DAC takes the sample points and
makes an analog waveform out of them. Due to the process used to convert the waveform,
the resulting signal consists of a series of steps. To remedy this, the signal is then put
through a low-pass filter which smooths out the waveform, removing all of the sharp edges
caused by the DAC. The resulting signal is very close to the original.
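The smoothing filter in a real player is an analog circuit after the DAC, but its effect can be sketched digitally; the smoothing constant below is an illustrative choice:

    /* A first-order low-pass filter that smooths the staircase output
     * of a DAC by moving each output a fraction of the way toward the
     * input. ALPHA is an illustrative smoothing constant. */
    #define ALPHA 0.2

    void lowpass_smooth(const double *in, double *out, int n)
    {
        double y = in[0];
        int i;
        for (i = 0; i < n; i++) {
            y += ALPHA * (in[i] - y);  /* follow the input gradually */
            out[i] = y;
        }
    }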
All the losses in the digital system occur in the conversion process to and from a digital
signal. Once the signal is digital, it can be duplicated, or replayed any number of times and
never lose any quality. This is the advantage of a digital system. The losses generated by
the conversion process can be measured as a Signal to Noise Ratio (SNR), the same measure
used for analog signals. The noise in the signal is considered to be the signal that would have
to be subtracted from the reconstructed signal to obtain the original. SNR is used to compare
the quality of different types of quantization, and is also used in the quality measurement of
compression techniques.
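A minimal C sketch of this measurement, treating the noise as the sample-wise difference between the original and reconstructed signals:

    #include <math.h>

    /* Signal-to-Noise Ratio in dB between an original signal and its
     * reconstruction. The noise is the signal that would have to be
     * subtracted from the reconstruction to obtain the original. */
    double snr_db(const double *original, const double *reconstructed, int n)
    {
        double signal_power = 0.0, noise_power = 0.0;
        int i;
        for (i = 0; i < n; i++) {
            double noise  = original[i] - reconstructed[i];
            signal_power += original[i] * original[i];
            noise_power  += noise * noise;
        }
        if (noise_power == 0.0)
            return HUGE_VAL;  /* perfect reconstruction */
        return 10.0 * log10(signal_power / noise_power);
    }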
3 Compression Basics
The underlying idea behind data compression is that a data file can be re-written in a different
format that takes up less space. A data format is called compressed when it saves either
more information in the same space, or saves information in less space than a standard
uncompressed format. A compression algorithm for an audio signal will analyze the signal
and store it in a different way, hopefully saving space. An analogy could be made between
compression and shorthand. In shorthand, words are represented by symbols, effectively
shortening the amount of space occupied. Data compression uses the same concept.
3.1 Lossless vs. Lossy Compression
The field of compression is divided into two categories, lossless and lossy compression. In
lossless compression, no data is lost in the compression process. An example of a lossless
compression program is pkzip for the IBM PC. This is a shareware utility which is widely
available. It can be used to compress and uncompress any type of computer file. When a file
is uncompressed, the exact original is retrieved. The amount of compression that is achieved
is highly dependent on the type of file, and varies greatly from file to file.
In lossy compression schemes, the goal is to encode an approximation of the original.
By using a close approximation of the signal, the coding can usually be accomplished using
much less space. Since an approximation is saved, instead of the original, lossy compression
schemes can only be used to compress information when the exact original is not needed.
This is the case for audio and video data. With these types of data, any digital format used
is an approximation of the original signal. Computer data and program
files must be compressed using lossless compression because all of the data is usually critical.
In general, lossy compression schemes yield much higher compression ratios than lossless
compression schemes. In many cases, the difference in quality between the compressed
version and the original is so minimal that it is not noticeable. Yet, in other compression
schemes there is a significant difference in quality. Deciding how much information is
to be lost is up to the discretion of the designer of the algorithm or technique. It is a tradeoff
between size and quality.
If the shorthand writer from the previous analogy were to write down only the main ideas of the
text, it would be analogous to lossy compression. Using only the main ideas would be
an extreme form of compression. If he or she were to leave out some adjectives and adverbs,
it would again be a form of lossy compression, this one being less lossy than the first. From
the analogy, it can be seen how the writer (programmer) can decide how important the details
are and how many details to include.
Almost all compression techniques used in digital systems are lossy. This is because
lossless compression algorithms are generally very unpredictable in the amount of compres-
sion they can achieve. In a typical application, there is a limited amount of "space" for the
digital audio data that is generated. If the audio data cannot be compressed to a guaranteed
size, it simply will not fit in the required space, which is unacceptable.
The reason for the unpredictability of a lossless technique lies in the technique itself. Data
which happens to be in a format which does not lend itself to the way the lossless technique
"re-writes" the data will not be compressed. In The Data Compression Book, Mark Nelson
compares raw speech files which were compressed with a shareware lossless data compression
program, ARJ, to demonstrate how well a typical lossless compression scheme will compress
an audio signal. He states:
ARJ results showed that voice files did in fact compress relatively well. The six
sample raw sound files gave the following results:
Filename Original Compressed Ratio
SAMPLE-1.RAW 50777 33036 35%
SAMPLE-2.RAW 12033 8796 27%
SAMPLE-3.RAW 73019 59527 19%
SAMPLE-4.RAW 23702 9418 60%
SAMPLE-5.RAW 27411 19037 30%
SAMPLE-6.RAW 15913 12771 20%
His data shows that the compression ratios fluctuate greatly depending on the particular
sample of speech that is used.
3.2 Audio Compression Techniques
For any type of compression, the compression ratio and the algorithm used are highly dependent
on the type of data that is being compressed. The data source used in this paper is audio
data, and we have already determined that lossy compression will be used in most cases.
Now we can further subdivide the source into music and voice data.
The more information that is known about the source, the better the compression
technique can be tailored toward that type of data. The differences between music and speech
allow audio compression techniques to be subdivided into two categories: waveform coding
and voice coding. Waveform coding can be used on all types of audio data, including voice.
The goal of waveform coding is to recreate the original waveform after decompression. The
closer the decompressed waveform is to the original, the better the quality of the coding
algorithm is. The second technique, voice coding, yields a much higher compression ratio,
but can only be used if the audio source is a voice. In voice coding, the goal is to recreate the
words that were spoken and not the actual voice. The algorithms "utilize a priori information
about the human voice, in particular the mechanism that produces it" (Lynch, 255).
Since the two techniques are fundamentally different, the performance of each technique
is measured differently. The performance of waveform coding techniques is measured by
determining how well the uncompressed signal matches the original speech waveform. This
is usually done by measuring the SNR. With the voice coding technique this is not possible
since the technique doesn't try to mimic the waveform. Therefore, in voice coding algorithms,
the quality of the algorithm is measured by listener preference.
These coding techniques can be further subdivided into two categories, time domain
coding and frequency domain coding. In a time domain coding technique, information on each
of the samples of the original signal is encoded. In a frequency domain coding technique,
the signal is transformed into its frequency representation. This frequency representation is
then encoded into a compressed format. Later the information is decoded, and transformed
back into the time representation of the signal to get back the original samples. Most simple
compression algorithms use a time domain coding technique.
The more recent waveform coding techniques provide a much higher compression ratio by
using psychoacoustics to aid in the compression. Psychoacoustics is "the study of how sounds
are heard subjectively and of the individual's response to sound stimuli" (Webster's New
World Dictionary, 1147). By basing the compression scheme on psychoacoustic phenomena,
data that can't be heard by humans can be discarded. For example, in psychoacoustics it has
been determined that certain levels of sounds cannot be heard while other louder sounds are
present (Beerends, 965). This effect is called masking. By eliminating the unheard sounds
from the audio signal, the signal is simplified, and can be more easily compressed. Techniques
like these are used in modern systems where high compression ratios are necessary, like Sony's
new MiniDisc player.
3.3 Common Audio Compression Techniques
The techniques that have been discussed thus far are general subcategories of the approaches
that can be taken when designing an audio compression algorithm. In this section, the details
of some popular compression techniques will be discussed. Since compression is such a large
area, a comprehensive guide to all the different compression methods is far beyond the scope
of this paper. However, this section covers some fundamental and some advanced techniques
to provide a general idea of how different compression techniques are implemented.
To give a general background, both waveform and voice coding techniques are discussed.
Since the waveform coding techniques are simpler, they will be discussed first. In these
techniques, the compressed digital data is often obtained from the original signal itself, rather
than creating standard digital audio data and compressing it with software.
3.3.1 Waveform Coding Techniques
PCM
Pulse Code Modulation (PCM) refers to the technique used to code the raw digital audio
data as described in Section 2. It is the fundamental digital audio technique that is used
most frequently in digital audio systems. Although PCM is not a compression technique,
when it is used along with non-uniform quantization such as µ-Law or A-Law, it can be
considered compression. PCM combined with non-uniform quantization is used as a reference
for comparing the performance of other compression schemes (Lynch, 225).
µ-Law and A-Law Companding
Since the dynamic range of an audio signal is very wide, an audio waveform having a maximum
possible amplitude of 1 volt may never reach over 0.1 volts if the audio signal is not very
loud. If the signal is quantized with a linear scale, the values attained by the signal will
cover only 1/10 of the quantization range. As a result, the softer audio signals have a very
granular waveform after being quantized, and the quality of the sound deteriorates rapidly
as the sound gets softer. To compensate for the wide dynamic range of audio signals, a non-
linear scale can be used to quantize the signal. Using this method, the digitized signal will
have an increased number of steps in the lower range, alleviating the problem (Couch, 152).
Using non-uniform quantization can raise the SNR for a softer sound, making the SNR for
a wide range of sound levels approximately uniform (Couch, 155). Typically, non-uniform
quantization is done on a logarithmic scale.
The two standard formats for the logarithmic quantization of a signal are µ-Law and
A-Law. A-Law is the standard format used in Europe (Couch, 153), and µ-Law is used in
the telephone systems of the United States, Canada, and Japan. The µ-Law quantization,
used in phone systems, uses eight bits of data to provide the dynamic range that normally
requires twelve bits of PCM data (Audio FAQ).
The process of converting a computer file to µ-Law is a form of compression, since the
amount of data that is needed per sample is reduced and the dynamic range of the sample
is increased. The result is much less data with more information. To create µ-Law or A-Law
data, the signal must originally be compressed and later expanded. This process is
commonly referred to as companding.
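The continuous µ-Law curve can be sketched in a few lines of C; real telephone codecs use a segmented approximation of this curve, but the idea is the same:

    #include <math.h>

    #define MU 255.0  /* the mu used in North American telephony */

    /* Compress a linear sample in [-1.0, 1.0] onto the logarithmic
     * mu-Law scale. The result, also in [-1.0, 1.0], is then quantized
     * uniformly with 8 bits; soft sounds get more of the levels. */
    double mulaw_compress(double x)
    {
        double sign = (x < 0.0) ? -1.0 : 1.0;
        return sign * log(1.0 + MU * fabs(x)) / log(1.0 + MU);
    }

    /* Expand a mu-Law value back to the linear scale. */
    double mulaw_expand(double y)
    {
        double sign = (y < 0.0) ? -1.0 : 1.0;
        return sign * (pow(1.0 + MU, fabs(y)) - 1.0) / MU;
    }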
Silence Compression
Silence compression is a form of lossy compression that is extremely easy to implement.
In silence compression, periods of relative silence in an audio signal are replaced by actual
silence. The samples of data that were used to represent the silent part are replaced by a
code and a number telling the device which reconstructs the analog signal how much silence
to insert. This reduces all of the data needed to represent the silent part of the signal down
to a few bytes.
To implement this, the compression algorithm first determines if the audio data is silent
by comparing the level of the digital audio data to a threshold. If the level is lower than the
threshold, that part of the audio signal is considered silent, and the samples are replaced by
zeros. The performance of the algorithm therefore hinges on the threshold level. The higher
the level, the more compression there is but the more lossy the technique is. The amount of
compression achieved also depends on the total length of all the silent periods in an audio
signal. The amount can be very signi cant in some types of audio data like voice data.
Silence encoding is extremely important for human speech. If you examine a
waveform of human speech, you will see long, relatively flat pauses between the
spoken words. (Ratcliff, 32)
In The Data Compression Book, Mark Nelson wrote silence compression code in C, and
used it to compress some PCM audio data files. The results he obtained were as follows:
Filename Original Compressed Ratio
SAMPLE-1.RAW 50777 37769 26%
SAMPLE-2.RAW 12033 11657 3%
SAMPLE-3.RAW 73019 73072 0%
SAMPLE-4.RAW 13852 10962 21%
SAMPLE-5.RAW 27411 22865 17%
[Figure 5: An Example of Signals in a DM Waveform: a) the original and reconstructed waveforms; b) the DM waveform]
The table indicates that silence compression can be very effective in some instances, but in
others it may have no effect at all, or even increase the file size slightly. Silence compression
is used mainly in file formats found in computers.
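A minimal C sketch of the scheme described above, for 8-bit unsigned PCM with midpoint 0x80; the threshold, escape code, and minimum run length are illustrative choices, and a real coder would also need to escape literal occurrences of the code byte:

    #include <stdlib.h>

    #define THRESHOLD     4    /* deviation from 0x80 treated as silence */
    #define SILENCE_CODE  0xFF /* hypothetical escape byte marking a run */
    #define MIN_RUN       8    /* shortest run worth encoding */

    /* Replace runs of near-silent samples by an escape byte and a run
     * length; copy everything else through. Returns compressed size. */
    long compress_silence(const unsigned char *in, long n, unsigned char *out)
    {
        long i = 0, o = 0;
        while (i < n) {
            long run = 0;
            while (i + run < n && run < 255 &&
                   abs((int)in[i + run] - 0x80) <= THRESHOLD)
                run++;
            if (run >= MIN_RUN) {          /* encode the silent run */
                out[o++] = SILENCE_CODE;
                out[o++] = (unsigned char)run;
                i += run;
            } else {                       /* copy one sample as-is */
                out[o++] = in[i++];
            }
        }
        return o;
    }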
DM
Delta Modulation (DM) is one of the most primitive forms of audio encoding. In DM, a
stream of 1 bit values is used to represent the analog signal. Each bit contains information
on whether the DM signal is greater or less than the actual audio signal. From this information,
the original signal can be reconstructed.
Figure 5 shows an example DM signal, the original signal it was generated from, and the
reconstructed signal before filtering. The actual DM signal, Figure 5b, contains information
on whether the output should rise or fall. The size of the step and the rate of the steps are
fixed. The reconstruction algorithm simply raises or lowers the output value according to the
DM waveform.
DM suffers from two major losses, granular noise and slope overload. Granular noise
occurs when the input signal is flat. The DM signal simulates flat regions by rising and
falling, leading to granular noise. Slope overload is caused when the input signal rises faster
than the DM signal can follow it. Granular noise can be eliminated by making the step size
small enough, and slope overload can be prevented by increasing the data rate. However,
decreasing the step size and increasing the data rate also increase the amount of data
needed to store the signal. DM is rarely used, but was explained here to provide a basis for
understanding ADM, which offers a significant advantage over PCM.
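A minimal C sketch of linear DM with a fixed step size (the step value is an illustrative choice); one bit is stored per sample, here kept in a byte for clarity:

    #define DM_STEP 0.01  /* fixed step size, illustrative */

    /* Encode: each bit says whether the staircase approximation should
     * step up (1) or down (0) to chase the input signal. */
    void dm_encode(const double *in, int n, unsigned char *bits)
    {
        double track = 0.0;
        int i;
        for (i = 0; i < n; i++) {
            if (in[i] >= track) { bits[i] = 1; track += DM_STEP; }
            else                { bits[i] = 0; track -= DM_STEP; }
        }
    }

    /* Decode: rebuild the staircase, which would then be low-pass
     * filtered to smooth out the steps. */
    void dm_decode(const unsigned char *bits, int n, double *out)
    {
        double track = 0.0;
        int i;
        for (i = 0; i < n; i++) {
            track += bits[i] ? DM_STEP : -DM_STEP;
            out[i] = track;
        }
    }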
ADM
Adaptive Delta Modulation (ADM) is the solution to the problems with DM. In ADM, the
step size is continuously adjusted, making the step size larger in the fast changing parts of
the signal and smaller in the slower changing parts of the signal. Using this technique, both
the granular noise and the slope overload problems are solved.
In order to adjust the step size, an estimation must be made to determine if the signal is
changing rapidly. The estimation in ADM is usually based on the last sample. If the signal
increased for two consecutive samples, the step size is increased. If the two previous steps
were opposite in direction, then the step size is decreased. This estimation method is simple
yet effective.
The performance of ADM using the above technique turns out to be better than Log PCM
when little data is used to represent a signal (performance here being measured by SNR).
When more data is used, however, Log PCM performs better (Lynch, 229).
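The adaptation rule can be sketched as follows; the growth and decay factors and the step limits are illustrative choices, not values from any cited system. The decoder applies the same rule to its received bits, so no step-size information has to be transmitted:

    #define STEP_MIN 0.001
    #define STEP_MAX 0.5

    /* ADM encoder: two steps in the same direction grow the step size
     * (the signal is changing fast), a reversal shrinks it. */
    void adm_encode(const double *in, int n, unsigned char *bits)
    {
        double track = 0.0, step = STEP_MIN;
        int i, prev = 1;
        for (i = 0; i < n; i++) {
            int bit = (in[i] >= track) ? 1 : 0;
            if (i > 0)
                step *= (bit == prev) ? 1.5 : 0.66;
            if (step < STEP_MIN) step = STEP_MIN;
            if (step > STEP_MAX) step = STEP_MAX;
            track += bit ? step : -step;
            bits[i] = (unsigned char)bit;
            prev = bit;
        }
    }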
DPCM
A Differential Pulse Code Modulation (DPCM) system consists of a predictor, a difference
calculator, and a quantizer. The predictor predicts the value of the next sample. The
difference calculator then determines the difference between the predicted value and the actual
value. Finally, this difference value is quantized by the quantizer. The quantized differences
are used to represent the original signal.
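A minimal C sketch of such a system, using the previously reconstructed sample as the predictor (the simplest choice) and an assumed [-1, 1] difference range:

    #include <math.h>

    /* DPCM encoder: quantize the difference between each sample and
     * the prediction. The predictor tracks the value the decoder will
     * reconstruct, so encoder and decoder stay in step. */
    void dpcm_encode(const double *in, int n, int n_bits, int *codes)
    {
        int levels       = 1 << n_bits;
        double step      = 2.0 / levels;
        double predicted = 0.0;  /* previously reconstructed sample */
        int i;
        for (i = 0; i < n; i++) {
            double diff = in[i] - predicted;
            int q = (int)floor((diff + 1.0) / step);
            if (q < 0)       q = 0;
            if (q >= levels) q = levels - 1;
            codes[i] = q;
            /* add back the quantized difference, as the decoder will */
            predicted += -1.0 + (q + 0.5) * step;
        }
    }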
Essentially, a DM signal is a DPCM signal with one bit being used in the quantization
process and a predictor based on the previous bit. In a DM system, the predicted value
is always the same as the previous value, and the difference between the predicted value
(previous value) and the actual signal is quantized using one bit (two levels).
The performance of a DPCM signal depends on the predictor. The better it can predict
where the signal is headed, the better it will perform. A DPCM system using one previous
value in the predictor can achieve the same SNR as a µ-Law PCM system using one less bit
to quantize each sample value. If three previous values are used for the predictor, the same
SNR can be achieved using two bits less to represent each sample (Lynch, 227). This is a
significant performance increase over PCM because it obtains the same SNR using less data.
This technique can be extended even further by making the prediction method adaptive to the
input data. The technique is called Adaptive Differential Pulse Code Modulation (ADPCM).
ADPCM
ADPCM is a modification of the DPCM technique making the algorithm adapt to the char-
acteristics of the signal. The relationship between DM and ADM is the same as that between
DPCM and ADPCM. In both of these, the algorithm is made adaptive to the changes in the
audio signal. The adaptive part of the system can be built into the predictor, the quantizer,
or both, but has been shown to be most effective in the quantizer (Lynch, 227).
Using this adaptive algorithm, the compression performance can be increased beyond that
of DPCM. "Cohen (1973) shows that by using the two most significant bits in the previous
three samples, a gain in SNR of 7 dB over non-adaptive DPCM can be obtained" (Lynch,
227). Different forms of ADPCM are used in many applications including inexpensive digital
recorders. Also, ADPCM is used in public compression standards which are slowly gaining
popularity, like CCITT G.721 and G.723, which use ADPCM at 32 kbits/s and at 24 or 40
kbits/s respectively (Audio FAQ).
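The adaptive-quantizer idea can be sketched in C as follows, with 2 bits per sample (one sign bit, one magnitude bit); the multipliers and limits are illustrative and are not the tables defined by G.721 or G.723:

    /* ADPCM sketch: a DPCM coder whose quantizer step size adapts,
     * growing after large coded differences and shrinking after small
     * ones. The decoder repeats the same adaptation. */
    void adpcm_encode_2bit(const double *in, int n, int *codes)
    {
        double predicted = 0.0, step = 0.02;
        int i;
        for (i = 0; i < n; i++) {
            double diff = in[i] - predicted;
            int sign    = (diff < 0.0) ? 1 : 0;
            double mag  = sign ? -diff : diff;
            int level   = (mag > step) ? 1 : 0;
            codes[i]    = (sign << 1) | level;  /* 2 bits per sample */

            /* reconstruct as the decoder would, then adapt the step */
            double delta = (level ? 1.5 : 0.5) * step;
            predicted   += sign ? -delta : delta;
            step        *= level ? 1.6 : 0.8;
            if (step < 0.001) step = 0.001;
            if (step > 0.5)   step = 0.5;
        }
    }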
PASC and ATRAC
All of the previously mentioned compression techniques are a relatively simple re-writing
of the audio data. Precision Adaptive Subband Coding (PASC) and Adaptive TRansform
Acoustic Coding (ATRAC) differ from these, because they are much more complex proprietary
schemes which were developed for a specific purpose. PASC and ATRAC were both
developed for use in the Hi-Fi audio market. PASC was developed by Philips for use with
the Digital Compact Cassette (DCC), and ATRAC was developed by Sony for use with their
MiniDisc player. Both of these techniques use psychoacoustic phenomena as a basis for the
compression algorithm in order to achieve the extreme compression ratios required for their
applications.
The details of the algorithms are complicated, and will not be discussed here. More
information is given in the discussion of compression used in Hi-Fi audio equipment in
Section 4.2. In addition to this, details on PASC can be found in Advanced Digital Audio
by Ken Pohlmann, and details on ATRAC can be found in the Proceedings of the IEEE in an
article titled "The Rewritable MiniDisc System" by Tadao Yoshida.
3.3.2 Voice Coding Techniques
LPC
Linear Predictive Coding (LPC) is one of the most popular voice coding techniques. In
an LPC system, the voice signal is represented by storing characteristics about the system
creating the voice. When the data is played back, the voice is synthesized from the stored data
by the playing device. The model used in an LPC system includes the source of the sound,
a variable filter resembling the human vocal tract, and a variable amplifier controlling the
amplitude of the sound.
The source of the sound is modeled in two di erent ways depending on how the voice is
being produced. This is done because humans can produce two types of sound, voiced and
unvoiced. Voiced sounds are those which are created by using the vocal cords and unvoiced
sounds are created by pushing air through the vocal tract. An LPC algorithm models these
sounds by using either periodic pulses (voiced) or a random noise generator (unvoiced)
as the source.
The human vocal tract is modeled in the system as a time-varying filter (Lynch, 240).
Parameters are calculated for the filter to mimic the changing characteristics of the vocal
tract when the sound was being produced. The data used to represent the voice in an LPC
algorithm consists of the information on the filter parameters, the source used (voiced or
unvoiced), the pitch of the voice, and the volume of the voice. The amount of data generated
by storing these parameters is significantly less than the amount of data used to represent
the waveform of the speech signal.
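The synthesis half of the model can be sketched in C; the filter order and the simple pulse-train excitation are illustrative simplifications, and the analysis step that computes the coefficients is omitted:

    #include <stdlib.h>

    #define ORDER 10  /* number of vocal-tract filter coefficients */

    /* LPC synthesis: an excitation source (periodic pulses for voiced
     * sound, random noise for unvoiced) drives an all-pole filter whose
     * coefficients model the vocal tract, scaled by a gain. */
    void lpc_synthesize(const double *coef, double gain, int voiced,
                        int pitch_period, double *out, int n)
    {
        double history[ORDER] = { 0.0 };  /* past output samples */
        int i, j;
        for (i = 0; i < n; i++) {
            double excitation = voiced
                ? ((i % pitch_period == 0) ? 1.0 : 0.0)
                : ((double)rand() / RAND_MAX - 0.5);

            /* all-pole filter: output depends on past outputs */
            double sample = gain * excitation;
            for (j = 0; j < ORDER; j++)
                sample += coef[j] * history[j];

            for (j = ORDER - 1; j > 0; j--)  /* shift history */
                history[j] = history[j - 1];
            history[0] = sample;

            out[i] = sample;
        }
    }

Only the coefficients, the voiced/unvoiced flag, the pitch, and the gain need to be stored for each short frame of speech, which is why the data rate is so low.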
GSM
The Global System for Mobile telecommunications (GSM) is a standard used for compression
of speech in the European digital cellular telephone system. GSM is an advanced compression
technique that can achieve a compression ratio of 8:1. To obtain this high compression ratio
and still produce high quality sound, GSM is based on the LPC voice coding technique and
also incorporates a form of waveform coding (Degener, 30).
4 Uses of Compression
Compression is used in almost all modern digital audio applications. These include
computer files, audio playback devices, telephony applications, and digital recording devices.
Many of the devices, like the telephone system, have been using compression for many years
now. Others have just recently started using it. The type of compression that is used depends
on cost, size, space, and many other factors.
After reviewing a basic background on compression, one question remains unanswered:
what type of compression is used for a particular application? In the following sections, the
uses of compression in two major areas will be discussed: computer files and digital hi-fi
stereo equipment. Knowledge about these areas is particularly useful, because it can help in
deciding which device to use.
4.1 Compression in File Formats
When digital audio technology was first appearing on the market, each computer manufac-
turer had their own file format, or formats, associated with their computer (Audio FAQ). As
software became more advanced, computers attained the ability to read more than one file
format. Today, most software can read and write a wide range of file formats, leaving the
choice to the user.
In general, there are two types of file formats, "raw" and self-describing. In a raw file
format, the data can be in any format; the encoding and parameters are fixed and known in
advance to be able to read the file. The self-describing format has a header in which different
information about the data type is stored, like sampling rate and compression. The main
concern here will be with self-describing file formats, since these are the most often used and
most versatile.
A disadvantage of using compression in computer files is that the file usually needs to be
converted to linear PCM data for playback on digital audio devices. This requires extra code
and processing time. It also may be one of the reasons why approximately half of the file
formats available for computers don't support compression. The following is a chart taken
from the "Audio Tutorial FAQ" of the Center for Innovative Computer Applications. It
describes most of the popular file formats on the market, and the compression that is used, if
any:
Extension, Name   Origin        Variable Parameters
.au or .snd       NeXT, Sun     rate, #channels, encoding, info string
.aif(f), AIFF     Apple, SGI    rate, #channels, sample width, lots of info
.aif(f), AIFC     Apple, SGI    same (extension of AIFF with compression)
.iff, IFF/8SVX    Amiga         rate, #channels, instrument info (8 bits)
.voc              Soundblaster  rate (8 bits/1 ch; can use silence deletion)
.wav, WAVE        Microsoft     rate, #channels, sample width, lots of info
                                [including compression scheme]
.sf               IRCAM         rate, #channels, encoding, info
none, HCOM        Mac           rate (8 bits/1 ch; uses Huffman compression)
none, MIME        Internet      usually 8-bit µ-Law compression [8000 samp/s]
.mod or .nst      Amiga         bank of digitized instrument samples
                                [with sequencing information]
Many of these file formats are just uncompressed PCM data with the sampling rate and
the number of channels used during recording specified in the header. For the formats that do
support compression, it is usually optional. For example, in the Soundblaster ".voc" format,
silence compression can be used, and in the Microsoft ".wav" format, a number of different
encoding schemes can be used including PCM, DM, DPCM, and ADPCM.
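As an example of a self-describing header, the canonical PCM layout of a ".wav" file can be declared in C as below. Real files may carry additional chunks before the data, so a robust reader walks the chunks rather than assuming this fixed layout:

    #include <stdint.h>

    /* The canonical header of a Microsoft .wav file (RIFF header,
     * "fmt " chunk, "data" chunk). Multi-byte fields are little-endian. */
    struct wav_header {
        char     riff[4];         /* "RIFF" */
        uint32_t file_size;       /* total file size minus 8 bytes */
        char     wave[4];         /* "WAVE" */
        char     fmt[4];          /* "fmt " */
        uint32_t fmt_size;        /* 16 for plain PCM */
        uint16_t audio_format;    /* 1 = PCM; other values = compressed */
        uint16_t num_channels;    /* 1 = mono, 2 = stereo */
        uint32_t sample_rate;     /* e.g. 8000, 22050, 44100 */
        uint32_t byte_rate;       /* sample_rate * block_align */
        uint16_t block_align;     /* num_channels * bits_per_sample / 8 */
        uint16_t bits_per_sample; /* 8 or 16 */
        char     data[4];         /* "data" */
        uint32_t data_size;       /* number of audio bytes that follow */
    };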
Conversion from one format to another can be accomplished via software. The "Audio
FAQ" also provides information on a number of different programs that will do the conversion.
When converting from uncompressed to compressed formats, the file is generally smaller
afterwards, but some quality is lost. If the file is later converted back, the size will increase,
but the quality can never be regained.
4.2 Compression in Recording Devices
There are currently four major digital stereo devices on the market. These are the Compact
Disc (CD), the Digital Audio Tape (DAT), the Digital Compact Cassette (DCC), and the
MiniDisc (MD). They are all very different from each other. The CD and MD use an optical
storage mechanism, and the DAT and DCC use a magnetic tape to store the data. There are
also a number of other apparent differences between the mediums. For example, a CD is not
re-writable while the others are.
A major difference that may not be apparent, however, is that the MD and DCC utilize
digital data compression while the DAT and CD do not. This allows the MD and DCC to be
physically smaller than their uncompressed counterparts. In both devices, the smaller data
size is necessary and advantageous.
In the MD, the design goal was to make the optical disc small so that it would be portable.
The MD contains the same density of data as the CD. Only by using compression can the disc
be made physically smaller than the CD. In addition to reducing the size, the compression
used gave the MD other advantages. It allowed the MD to be the first optical player with
the digital anti-shock mechanism described in the introduction. Since less data is required
to generate sound and the MD reads at the same speed as the CD, the MD can read more
data than it needs to generate sound. The extra data is stored in a buffer, which does not
need to be very big. CDs eventually came out with the same technology, but in order to
implement it, the reading speed of the CD needed to be increased, and the data needed to
be compressed after reading to fit it into a memory buffer.
The design goal of the DCC was to make the storage medium inexpensive and the same
size as an audio tape. By doing this, a DCC player could accept standard audio tapes as
well as the new DCC tapes, making it more marketable. To be able to fit the data onto a
relatively inexpensive tape medium which can be housed in an audio cassette case, digital
compression was required.
In both the MD and DCC, the space available for digital audio data was approximately 1/4
of the size required for PCM data. The compression ratio needed was therefore approximately
4:1. To obtain such high compression rates, the compression schemes utilize psychoacoustic
phenomena.
Precision Adaptive Subband Coding (PASC) is the compression algorithm that is used
for the DCC to provide a 4:1 compression of the digital PCM data. PASC is described in
the book Advanced Digital Audio, edited by Ken Pohlmann:
The PASC system is based on three principles. First, the ear only hears sounds
above the threshold of hearing. Second, louder sounds mask softer sounds of
similar frequency, thus dynamically changing the threshold of hearing. Similarly,
other masking properties such as high- and low-frequency masking may be util-
ized. Third, sufficient data must be allocated for precise encoding of sounds above
the dynamic threshold of hearing.
Using PASC, enough digital data can fit onto a medium the size of a cassette to make the
DCC player feasible.
The MD uses the ATRAC compression algorithm, which is based on the same psychoacoustic
phenomena. Compression in a MiniDisc is more advanced, however. The
MiniDisc achieves a compression ratio of 5:1 "in order to offer 74 min of playback time"
(Yoshida, 1498).
Although these algorithms offer such high compression, there are some losses that are
involved. Experts claim that they can hear a difference between a CD and a MD, but the
actual losses are so minimal that the average person will not hear them. The largest errors
occur with certain types of audio sounds that the compression algorithm has problems with.
In an article in Audio Magazine, Edward Foster writes:
Although the test was not double-blind, and thus is suspect, I convinced my-
self I could reliably tell the original from the copy, just barely, but different
nonetheless.

The differences occurred in three areas: a slight suppression of low-level high-
frequency content when the algorithm needed most of the available bitstream
to handle strong bass and midrange content, a slight dulling of the attack of
percussion instruments (piano, harpsichord, glockenspiel, etc.) probably caused
by imperfect masking of "pre-echo," and a slight "post-echo" (noise puff) at the
cessation of a sharp sound (such as claves struck in an acoustically dead envir-
onment). The second and third of these anomalies were most readily discernible
on single instruments played one note at a time in a quiet environment and were
taken from a recording specifically made to evaluate perceptual encoders.
Similar effects exist when listening to a DCC recording. Although the losses are minimal,
they are still present, being the tradeoff of having the small compact portable format.
5 Conclusion
In the last decade, the field of digital audio compression has grown tremendously. With the
expansion of the electronics industry and the decreasing prices of digital audio, many devices
which once used analog audio technology now use digital technology. Many of these digital
devices use compression to reduce storage space, and bring down cost.
Digital audio compression has become a sub-area of Audio Engineering, supporting many
professionals who specialize in this field. Millions of dollars are invested by companies,
such as Sony and Philips, to develop proprietary compression schemes for their digital audio
applications (Audio FAQ).
Because of the widespread use of compression, knowledge in this area can be useful.
As a musician working with modern digital recording and editing equipment, the study of
compression can provide an advantage. Knowledge in the eld of compression can help in
the evaluation and understanding of recording and playback equipment. It can also aid when
manipulating digital files with computers. As we move into the next century, and digital
audio technology continues to grow, the knowledge of audio compression will become an
increasingly valuable asset.
Bibliography
"Audio tutorial FAQ." [FTP://pub/usenet/news.answers/audio-fmts/part 12], Center for
Innovative Computer Applications, August 1994.
J. G. Beerends and J. A. Stermerdink, "A perceptual audio quality measure based on
a psychoacoustic sound representation," AES: Journal of the Audio Engineering Society,
vol. 40, p. 963, December 1992.
L. W. Couch, Digital and Analog Communication Systems. New York, NY: Macmillan
Publishing Company, fourth ed., 1993.
J. Degener, "Digital speech compression," Dr. Dobb's Journal, vol. 19, p. 30, December
1994.
M. Fleischmann, "Digital recording arrives," Popular Science, vol. 242, p. 84, April 1993.
E. J. Foster, "Sony MSD-501 minidisc deck," Audio, vol. 78, p. 56, November 1994.
D. B. Guralnik, ed., Webster's New World Dictionary. New York, NY: Prentice Hall Press,
second college ed., 1986.
P. Lutter, M. Muller-Wernhart, J. Ramharter, F. Rattay, and P. Slowik, "Speech research
with WAVE-GL," Dr. Dobb's Journal, vol. 21, p. 50, November 1996.
T. J. Lynch, Data Compression: Techniques and Applications. New York, NY: Van Nostrand
Reinhold, 1985.
M. Nelson, The Data Compression Book. San Mateo, CA: M&T Books, 1992.
Panasonic Portable CD Player SL-S600C Operating Instructions.
K. C. Pohlmann, ed., Advanced Digital Audio. Carmel, IN: SAMS, first ed., 1993.
J. W. Ratcliff, "Audio compression," Dr. Dobb's Journal, vol. 17, p. 32, July 1992.
J. W. Ratcliff, "Examining PC audio," Dr. Dobb's Journal, vol. 18, p. 78, March 1993.
J. Rothstein, MIDI: A Comprehensive Introduction. Madison, WI: A-R Editions, Inc., 1992.
A. Vollmer, "Minidisc, digital compact cassette vie for digital recording market," Electronics,
vol. 66, p. 11, September 13, 1993.
J. Watkinson, An Introduction to Digital Audio. Jordan Hill, Oxford (GB): Focal Press,
1994.
T. Yoshida, "The rewritable minidisc system," Proceedings of the IEEE, vol. 82, p. 1492,
October 1994.