What is Sound
All sounds are vibrations traveling through the air as sound waves. Sound
waves are caused by the vibrations of objects and radiate outward from
their source in all directions.
A vibrating object compresses the surrounding air molecules (squeezing
them closer together) and then rarefies them (pulling them farther
apart). Although the fluctuations in air pressure travel outward from the
object, the air molecules themselves stay in the same average position.
As sound travels, it reflects off objects in its path, creating further
disturbances in the surrounding air. When these changes in air pressure
vibrate your eardrum, nerve signals are sent to your brain and are
interpreted as sound.
Fundamentals of a Sound Wave
• The simplest kind of sound wave is a sine wave. Pure
sine waves rarely exist in the natural world, but they are
a useful place to start because all other sounds can be
broken down into combinations of sine waves. A sine
wave clearly demonstrates the three fundamental
characteristics of a sound wave: frequency, amplitude,
and phase.
• Frequency is the rate, or number of times per second, that
a sound wave cycles from positive to negative to positive
again. Frequency is measured in cycles per second,
or hertz (Hz). Humans have a range of hearing from 20 Hz
(low) to 20,000 Hz (high). Frequencies beyond this range
exist, but they are inaudible to humans.
• Amplitude (or intensity) refers to the strength of a sound
wave, which the human ear interprets as volume or
loudness. Audio meters use a logarithmic scale, the
decibel (dB), as the unit of measurement for audio loudness.
• Bit rate refers to the amount of data, specifically bits, transmitted or
received per second.
• One of the most common bit rates given is that for compressed audio
files. For example, an MP3 file might be described as having a bit rate
of 160 kbit/s or 160000 bits/second. This indicates the amount of
compressed data needed to store one second of music.
• The standard audio CD is said to have a data rate of 44.1 kHz/16,
meaning that the audio data was sampled 44,100 times per second,
with a bit depth of 16. CD tracks are usually stereo, using a left and
right track, so the amount of audio data per second is double that of
mono, where only a single track is used. The bit rate is then 44100
samples/second x 16 bits/sample x 2 tracks = 1,411,200 bit/s, or 1,411.2 kbit/s.
• To fully define a sound file's digital audio bit rate, four things must
be known: the format of the data, the sampling rate, the word size
(bit depth), and the number of channels (e.g. mono, stereo, quad).
• Calculating values
• An audio file's bit rate can be calculated given sufficient information.
Given any three of the following four values, the fourth can be derived:
• Bit rate = (sampling rate) × (bit depth) × (number of channels). E.g., for
a recording with a 44.1 kHz sampling rate, a 16-bit depth, and 2 channels:
• 44100 × 16 × 2 = 1411200 bits per second = 1411.2 kbit/s. The
eventual file size of an audio recording can also be calculated, by
multiplying the bit rate by the duration of the recording in seconds.
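The arithmetic above can be sketched in Python (the function names are illustrative, not from any particular library):

```python
# Bit rate and file size for uncompressed (PCM) audio.
# The values below assume CD quality: 44.1 kHz, 16-bit, stereo.

def bit_rate(sampling_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits of audio data per second."""
    return sampling_rate_hz * bit_depth * channels

def file_size_bytes(rate_bits_per_s: int, duration_s: float) -> float:
    """Approximate raw recording size (8 bits per byte)."""
    return rate_bits_per_s * duration_s / 8

cd_rate = bit_rate(44_100, 16, 2)
print(cd_rate)                        # 1411200 bits per second
print(file_size_bytes(cd_rate, 60))   # one minute of CD audio, in bytes
```

A minute of CD-quality stereo therefore needs roughly 10.6 MB before any compression is applied.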
Sample rate indicates the number of digital snapshots taken of an audio signal each
second. This rate determines the frequency range of an audio file. The higher the
sample rate, the closer the shape of the digital waveform is to that of the original
analogue waveform. Low sample rates limit the range of frequencies that can be
recorded, which can result in a recording that poorly represents the original sound.
[Figure: A. Low sample rate that distorts the original sound wave. B. High sample rate that perfectly reproduces the original sound wave.]
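Taking those digital snapshots can be sketched in plain Python (the 100 Hz tone and the two sample rates are illustrative choices):

```python
import math

def sample_sine(freq_hz: float, sample_rate_hz: float, n_samples: int):
    """Take n digital snapshots of a sine wave's amplitude."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

# A 100 Hz tone: 44.1 kHz traces the waveform finely,
# while 250 samples/second barely captures its shape.
fine   = sample_sine(100, 44_100, 10)
coarse = sample_sine(100, 250, 10)
print([f"{s:+.3f}" for s in coarse])
```

Plotting `fine` against `coarse` reproduces the contrast in the figure above: the low-rate version is a crude, distorted outline of the original wave.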
Human hearing includes frequencies up to around 20,000 Hz
Since every cycle of a waveform has both a positive and a negative pressure phase, top to bottom,
we must dedicate a minimum of two samples to each cycle of a wave. Therefore, the
highest frequency a digital system can represent is half of the sampling rate. This is the
so-called "Nyquist theorem".
In the case of 44.1 kHz, the highest frequency we can accurately represent is 22,050
Hertz. According to our understanding of human hearing, this frequency is
enough: we can capture frequencies up to 20 kHz and even a little beyond.
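The Nyquist limit is a one-line calculation; a minimal sketch:

```python
def nyquist_limit(sampling_rate_hz: float) -> float:
    """Highest frequency a digital system can represent:
    half the sampling rate (two samples per cycle minimum)."""
    return sampling_rate_hz / 2

print(nyquist_limit(44_100))  # 22050.0 Hz, just above the 20 kHz hearing limit
print(nyquist_limit(48_000))  # 24000.0 Hz, a common professional rate
```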
For CD-quality (Red Book) sound, the rate is 44,100 samples per second.
For every digital sample, our analogue to digital converter asks "what is the
amplitude?". The question that remains is, how is this amplitude represented? The
answer is "bit depth" which determines both how many different amplitude
levels/steps are possible and what the overall capacity of the system is...how loud of
a signal it can tolerate.
CD-quality has a bit depth of 16. This means we will have 2^16 ("two to the 16th
power") different amplitude values available to us
Since the number of steps is divided between positive and negative values (crests and
troughs), this means it is divided into 32,767 positive and 32,768 negative values, with
one step left over for zero.
2x2x2x2x2x2x2x2x2x2x2x2x2x2x2x2 = 65,536 steps
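The step count and the signed range it yields can be checked directly:

```python
bit_depth = 16
steps = 2 ** bit_depth            # 65536 available amplitude levels
# A signed 16-bit sample spans -32768 .. +32767 (one level is zero).
lowest, highest = -(steps // 2), steps // 2 - 1
print(steps, lowest, highest)     # 65536 -32768 32767
```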
For each sample taken, the actual amplitude must be "rounded" to the nearest available
level...producing another "error" relative to the original audio signal. The signal is
"quantized". This "quantization error" produces a small amount of "quantization
noise", noise inherent to digital recording. A digital system is totally noise-less on its
own, but as soon as it is recording a signal, it makes these errors and ends up with this
small amount of noise.
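Rounding to the nearest available level can be sketched as follows (the helper name and the [-1.0, 1.0] normalisation are assumptions for illustration):

```python
import math

def quantize(value: float, bit_depth: int) -> int:
    """Round a normalised amplitude in [-1.0, 1.0] to the nearest level."""
    levels = 2 ** (bit_depth - 1)          # levels per polarity
    return round(value * (levels - 1))

# Sample a 1 kHz sine at 44.1 kHz and inspect the rounding error.
for n in range(4):
    actual = math.sin(2 * math.pi * 1000 * n / 44_100)
    q = quantize(actual, 16)
    error = actual - q / (2 ** 15 - 1)     # difference vs. the true amplitude
    print(f"sample {n}: level {q}, quantization error {error:+.2e}")
```

The per-sample errors are tiny, but accumulated across every sample they form the quantization noise described above.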
The overall amplitude capacity of a digital system can be theoretically approximated
as 6 decibels per bit. For our 16-bit CD-quality signal, this means our system can
tolerate 96 dB. (16 bits x 6 dB)
So, is 16 bits enough? The dynamic range of human hearing, from the quietest
audible sound to the threshold of pain, varies among individuals, but is often
cited as 120 or 130 dB. So it may be that--unlike the CD-quality sampling rate and its
accommodation for the range of human hearing--our 16-bit system is not enough.
What bit depth would a system need to tolerate 130 dB? A 24-bit system gives 24 bits x 6 dB = 144 dB.
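The 6-dB-per-bit rule of thumb comes from the logarithmic decibel scale: each extra bit doubles the number of levels, and doubling an amplitude ratio adds 20·log10(2) ≈ 6.02 dB. A quick check:

```python
import math

def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range: amplitude ratio between the
    largest representable value and one quantization step."""
    return 20 * math.log10(2 ** bit_depth)

print(round(dynamic_range_db(16), 1))  # about 96.3 dB, i.e. ~6 dB per bit
print(round(dynamic_range_db(24), 1))  # about 144.5 dB
```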
If care is not taken when recording, a signal can easily exceed the maximum
amplitude, producing "digital clipping". In clipping, the waveform hits its amplitude
ceiling, resulting in a cropped waveform.
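Clipping is simply a hard limit at the system's ceiling; a minimal sketch (the signal values are made up for illustration):

```python
def clip(sample: float, ceiling: float = 1.0) -> float:
    """Hard-limit a sample to the system's amplitude ceiling."""
    return max(-ceiling, min(ceiling, sample))

# A signal recorded too hot: everything past +/-1.0 is cropped flat.
hot_signal = [0.5, 0.9, 1.4, 1.8, 1.1, 0.6]
print([clip(s) for s in hot_signal])  # [0.5, 0.9, 1.0, 1.0, 1.0, 0.6]
```

The flat run of 1.0 values is the "cropped" top of the waveform, heard as harsh distortion.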
The conversion of a raw audio signal into a digital representation is known as
analogue-to-digital conversion. The continuous, real-world audio signal, represented here as a smooth
waveform with positive and negative pressure levels, is recorded in a series of
snapshots known as "samples". Each sample is, like a frame of video, a picture of the
signal at that moment. Specifically, it is a picture of its amplitude.
The difference between the actual
incoming audio signal (grey line) and
the quantized digital signal (red line)
is called the quantization error.
For every digital sample, the converter asks "what is the amplitude?". The succession of these amplitude measurements
("samples", shown below as dotted lines) results in a digital approximation of the
original audio signal. The frequencies and notes we hear are the result of these
changing amplitudes over time.