1 2.671 Go Forth and Measure
2.671 Measurement and Instrumentation
Monday 2-5 PM Lab
Dr. Daniel Braunstein
December 12th, 2013
PLAYBACK AND TRANSMISSION FREQUENCIES OF CELL PHONES
Margaret Coad
Massachusetts Institute of Technology
Cambridge, MA, USA
ABSTRACT
Cell phone sound quality was measured and
compared for six different phones in two cases: recording
and playback, and transmission. For recording and
playback, a white noise signal was recorded and played
back on each phone, and, for transmission, the white
noise signal was transmitted in a phone call between two
phones of the same model. Frequency response analysis was
applied to determine the cutoff frequency of each phone
for both cases. The cutoff frequencies for recording and
playback were 3,470 ± 343 Hz for the LG enV 1, 3,609 ±
166 Hz for the LG Cosmos 2, 6,775 ± 634 Hz for the
HTC One, 11,110 ± 1,202 Hz for the RAZR M, 15,335 ±
94 Hz for the iPhone 5, and 15,564 ± 708 Hz for the
iPhone 4. The cutoff frequencies for transmission were
3,569 ± 32 Hz for the HTC One, 3,766 ± 146 Hz for the
RAZR M, 3,685 ± 278 Hz for the iPhone 5, and 3,674 ±
65 Hz for the iPhone 4. Thus, for recording and
playback, the two Apple smartphones had the highest
cutoff frequencies, then the two Android smartphones,
and finally the two conventional phones, most likely due
to the fact that smartphones are multimedia devices that
require higher quality sound and can store more data than
conventional phones. However, for transmission, all
phones had cutoff frequencies within 6% of each other,
near the 3,400 Hz typical of current phone networks.
1. INTRODUCTION
Cell phones in recent years have been gaining
more and more functions in daily life, but their basic
function has remained the same: transmitting the sound of
the human voice to allow communication across long
distances. The fundamental measure of a cell phone,
then, is how intelligibly the information spoken into one
phone comes through at the other end of a phone call, or,
equivalently, how similar the sounds coming out of the
phone are to the sounds going into it.
Sound quality of six different cell phones released
between 2006 and 2013 was characterized, both by
recording and playing back sound through each phone,
known hereafter as recording and playback, and by
transmitting sound in a phone call between two phones of
the same model, known hereafter as transmission. The
resulting frequency response functions were compared
across phones to determine similarities and differences in
how the six phones change the input sound signal.
Both measures of cell phone sound quality used
here, recording and playback and transmission, are
valuable in their own way. Understanding cell phone
transmission of sound is useful for quantifying call
quality and comparing the performance of different
phones during phone calls. On the other hand, knowing
the frequency response function of a phone for recording
and playback applies directly to sound quality when using
the phone as a voice recorder. The recording and
playback frequency response also provides insight into
how the microphone and the speaker of the phone
function without the cell phone network coming into
consideration. All of these factors can be important to
someone buying a new cell phone or interested in the
performance of a phone that they already own.
Section 2 discusses background information on
human hearing and speech, the telephone as a system, and
frequency response analysis. Section 3 explains the
measurement of the frequency response functions for
each phone. Section 4 presents and discusses the
measured frequency response functions and cutoff
frequencies, and Section 5 draws conclusions and
suggests future work.
2. BACKGROUND
An understanding of human hearing and speech, as
presented in Section 2.1, provides motivation for
transmitting or recording and playing back a wide range
of frequencies to improve intelligibility of the input sound.
In addition, understanding the telephone call as a system
of sound filters, as discussed in Section 2.2, puts the
measured frequency response functions in context.
Finally, the explanation of frequency response functions
and cutoff frequencies in Sections 2.3 and 2.4 clarifies
how the raw data of sound files were analyzed to
determine meaningful results.
2.1 HUMAN HEARING AND SPEECH
The human ear can hear frequencies ranging from
approximately 20 Hz to 20,000 Hz. As people age, their
ability to hear the highest frequencies diminishes, but
most adolescents can hear up to 20,000 Hz, with the most
sensitivity in the range from 500 Hz to 5,000 Hz. Fig. 1 [1]
shows the thresholds of audibility and pain for a typical
human ear.
Figure 1: Diagram of the audible frequency range of
the human ear. Sound needs to be louder, having a
higher sound pressure level, to be heard in the lower
and higher frequency ranges and is most easily heard
in the range from 500 Hz to 5,000 Hz [1].
Any sound can be broken down into a combination
of sinusoidal sound waves of different frequencies. The
“voiced” sounds of human speech, that is, those spoken
on a specific pitch, such as the vowels [a], [e], [i], [o], [u]
and the consonants [b], [d], [g], [l], [m], [n], and [r],
typically contain a lowest frequency, known as the
fundamental frequency, as well as several higher
frequencies at integer multiples of the fundamental
frequency, known as harmonics [2]. The presence and the
relative amplitudes of the higher harmonics determine the
timbre of the sound, which allows distinction between
specific vowel or consonant sounds as well as among the
voices of different people. The lowest two or three
harmonics of vowel sounds are the most important for
understanding speech and typically range in frequency
from 200 Hz to 3,000 Hz for adults [1]. Fig. 2 shows the
frequency makeup of the long sounds of the letters [e]
and [o].
Figure 2: Frequency content of the long sounds of
the letters [e] and [o]. For both sounds, the
fundamental frequency is just above 200 Hz and the
third harmonic is just above 600 Hz. The letters can
be distinguished by the relative magnitudes of each
harmonic.
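The idea behind Fig. 2 can be illustrated with a minimal Python sketch. It synthesizes two voiced sounds that share a fundamental frequency but weight their harmonics differently, and then recovers the harmonic amplitudes by correlating each sound with sinusoids. The 200 Hz fundamental, the amplitude lists, and the 8,000 Hz sample rate are illustrative assumptions, not values measured in this work.

```python
import math

FS = 8000   # sample rate in Hz (assumed for illustration)
F0 = 200    # fundamental frequency in Hz (assumed)
N = 800     # 0.1 s of samples, a whole number of periods of every harmonic used


def synthesize(harmonic_amplitudes):
    """Sum sinusoids at integer multiples of F0 with the given amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * F0 * n / FS)
            for k, a in enumerate(harmonic_amplitudes))
        for n in range(N)
    ]


def amplitude_at(signal, freq):
    """Estimate the amplitude of the component at freq by correlation."""
    re = sum(s * math.cos(2 * math.pi * freq * n / FS) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * n / FS) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


# Two hypothetical "vowels": same harmonic frequencies, different weights.
sound_a = synthesize([1.0, 0.2, 0.6])
sound_b = synthesize([1.0, 0.7, 0.1])

for k in (1, 2, 3):
    print(k * F0, "Hz:",
          round(amplitude_at(sound_a, k * F0), 3),
          round(amplitude_at(sound_b, k * F0), 3))
```

Both sounds contain energy at the same three frequencies, so only the relative harmonic amplitudes distinguish them, which is the property the caption of Fig. 2 describes.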
“Unvoiced” sounds, such as the consonants [f], [s],
[p], [t], and [k], are different from voiced sounds in that
they do not contain specific frequencies in integer
multiples but, rather, tend to contain a wider range of
frequencies with less distinct values. Consonants are
more important than vowels for speech recognition, but
their distinguishing features range up to higher
frequencies than vowels [1]. Fig. 3, for example, compares
the frequency content of the [f] sound and the [s] sound,
showing that their low-frequency components are similar
to each other, but they also have high-frequency
components that are important for distinction between the
two sounds.
Figure 3: Frequency content of the sounds of the
letters [f] and [s]. These spectra look distinctly
different from those of the vowels in Fig. 2, because
they are continuous and span a much wider range
of frequencies. Over the full range, the [f] and the [s]
sounds are easily distinguishable. However, looking at
only the lowest 3 kHz, the two sounds could be mistaken
for each other.
2.2 THE TELEPHONE AS A SYSTEM
The telephone is designed to transmit close to the
minimum amount of data necessary to convey the
information required to understand speech. Shown in
Fig. 4 is a schematic diagram of the different processes
that an input sound signal $s_{in}$ goes through before it
becomes the output sound signal $s_{out}$ heard by the
receiving ear. In the traditional landline telephone
system, the cutoff frequencies $f_c^{in}$ and $f_c^{out}$ are
equal to 4,000 Hz, and the sampling frequencies $f_s^{in}$,
$f_s^{tel}$, and $f_s^{out}$ are equal to 8,000 Hz, which results
in the output sound signal $s_{out}$ being filtered to
disregard any of the frequency content of the input signal
$s_{in}$ above 4,000 Hz [3].
Figure 4: Schematic diagram of a telephone
conversation. The signal $s_{in}$ enters the microphone.
The microphone output is filtered by an anti-aliasing
low pass filter with cutoff frequency $f_c^{in}$, sampled
by an analog to digital converter at frequency $f_s^{in}$,
and then sampled again by the telephone system at
frequency $f_s^{tel}$. The signal is transmitted to the
receiving phone and similarly filtered before
reaching the receiving ear as the output signal $s_{out}$ [3].
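The reason an 8,000 Hz sampling frequency goes together with a 4,000 Hz anti-aliasing cutoff can be sketched in a few lines of Python. A tone above half the sampling frequency is indistinguishable, after sampling, from a lower-frequency tone. The helper name alias_frequency and the test tones are hypothetical and chosen only for illustration.

```python
def alias_frequency(f_hz, fs_hz):
    """Return the apparent frequency of a tone at f_hz after sampling at fs_hz."""
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)


FS_TELEPHONE = 8000            # sampling frequency from the text, in Hz
NYQUIST = FS_TELEPHONE / 2     # 4,000 Hz, the highest frequency that survives sampling

for tone in (1000, 3400, 5000, 7000):
    print(tone, "Hz appears at", alias_frequency(tone, FS_TELEPHONE), "Hz")
```

Tones below 4,000 Hz keep their frequency, while the 5,000 Hz and 7,000 Hz tones fold back into the band below 4,000 Hz, which is why such content is filtered out before sampling.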
Until 2013 in the United States, all cell phone calls
were similar to those of land line phones, transmitting
only frequencies between 300 Hz and 3,400 Hz [4]. This
frequency range is close to the range of highest sensitivity
for human hearing, 500 Hz to 5,000 Hz, as well as the
range of the lowest three harmonics for vowel sounds,
200 Hz to 3,000 Hz. Thus, most important sound
characteristics are retained in typical cell phone
conversations. However, consonants, especially the
sounds [f], [v], [d], and [t], which rely on high
frequencies to differentiate them from other sounds, have
been shown to be the most easily confused sounds when
cut off at 4,000 Hz [1]. In addition, sound cut off at this
frequency tends to appear “muffled” rather than “bright”
and “thin” rather than “full” as compared to unfiltered
sound [3].
To improve the quality and intelligibility of cell
phone conversations, and because modern computing
methods have progressed enough to be able to easily store
and transmit more data, cell phone manufacturers and
network providers have been working on creating phones
and networks capable of transmitting sound with a higher
range of frequencies. New to the U.S. in 2013 is
“wideband” or “HD voice” calling, which allows
transmission of sound with double the typical bandwidth
of frequencies, so that frequencies between 50 Hz and
about 7,000 Hz are retained for transmission [5]. To achieve
this increased performance, both the transmitting and
receiving phone, as well as the network, must be capable
of supporting the wider span of frequencies.
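As a quick arithmetic check of the "double the typical bandwidth" statement, the two passbands quoted above can be compared directly. The band edges come from the text; the variable and function names are only illustrative.

```python
NARROWBAND_HZ = (300, 3400)   # traditional call passband quoted in the text
WIDEBAND_HZ = (50, 7000)      # HD Voice passband quoted in the text


def bandwidth(band):
    """Width of a passband given as (low edge, high edge) in Hz."""
    low, high = band
    return high - low


ratio = bandwidth(WIDEBAND_HZ) / bandwidth(NARROWBAND_HZ)
print(bandwidth(NARROWBAND_HZ), bandwidth(WIDEBAND_HZ), round(ratio, 2))
```

With these band edges the wideband passband is slightly more than twice as wide as the traditional one.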
2.3 FREQUENCY RESPONSE FUNCTIONS
Fig. 5 shows the system block diagram of a linear
time invariant (LTI) system in both the time domain and
the frequency domain. LTI systems are defined by two
characteristics: linearity and time invariance. Linearity
means that the system obeys superposition: the response
to a sum of inputs is the sum of the responses to each
input, and if the amplitude of the input signal is
multiplied by some constant factor, the amplitude of the
output signal is multiplied by the same constant factor.
Time invariance means that the system treats the same
input in the same way no matter when it is excited by
that input. Together, these properties imply that if an LTI
system is excited by an input at a certain frequency, the
corresponding output will be at the same frequency, but
may be changed in magnitude and shifted in phase. LTI
systems can be described in the time domain by ordinary
differential equations (ODEs) with constant coefficients.
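The two defining properties can be checked numerically on a simple discrete-time example. The sketch below uses a causal three-point moving-average filter as a stand-in LTI system; the filter, the test signals, and the scale factors are assumptions for illustration and are unrelated to the phones tested.

```python
def moving_average(x, width=3):
    """Causal moving-average filter with zero initial state, a simple LTI system."""
    return [sum(x[max(0, n - width + 1): n + 1]) / width for n in range(len(x))]


def close(u, v, tol=1e-12):
    """True when two equal-length sequences agree element by element."""
    return len(u) == len(v) and all(abs(p - q) <= tol for p, q in zip(u, v))


x1 = [1.0, 4.0, -2.0, 0.5, 3.0, 0.0, 0.0, 0.0]
x2 = [0.0, -1.0, 2.0, 2.0, -3.0, 1.0, 0.0, 0.0]
a, b = 2.0, -0.5

# Linearity: filtering a weighted sum equals the weighted sum of filtered signals.
combined = moving_average([a * p + b * q for p, q in zip(x1, x2)])
separate = [a * p + b * q for p, q in zip(moving_average(x1), moving_average(x2))]
is_linear = close(combined, separate)

# Time invariance: delaying the input by two samples delays the output by two samples.
delay = 2
shifted_in = [0.0] * delay + x1[:-delay]
shifted_out = [0.0] * delay + moving_average(x1)[:-delay]
is_time_invariant = close(moving_average(shifted_in), shifted_out)

print(is_linear, is_time_invariant)
```

A real phone only approximates these properties, which is one source of the scatter seen in measured frequency response functions.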
Figure 5: System block diagram of a linear time
invariant (LTI) system. In the time domain, the
input signal $x(t)$ is passed through a filter $h(t)$
characterized by a system of ordinary differential
equations (ODEs) to produce an output $y(t)$, which
is the convolution of the input signal and the system
impulse response. In the frequency domain, which is
reached by taking the Laplace transform of the time
domain signal, the output signal is found simply by
multiplying the input signal and the system transfer
function.
As shown in Fig. 5, in the frequency domain, the
output signal $Y(s)$ is given simply by multiplying the
input signal $X(s)$ by the system transfer function $H(s)$,
$$Y(s) = H(s)\,X(s). \tag{1}$$
Rearranging Eq. 1 and evaluating at a specific frequency
$f$ gives an expression for the gain $H(f)$, or the ratio of
output to input amplitudes, at each frequency,
$$H(f) = \frac{Y(f)}{X(f)}. \tag{2}$$
A plot of gain versus frequency for a system is
known as the frequency response function. Estimation of
the frequency response function for a physical system
generally involves exciting the system with a certain
input and then comparing the input and output signals
using a technique that extracts frequency information
from time-domain signals, such as the Fast Fourier
Transform (FFT). The system excitation can be done
either by sweeping through a range of frequencies one at
a time and measuring the output for each frequency, or by
using a stochastic, or random, input that contains a wide
range of frequencies all at once. The second method, the
stochastic input, is more time-efficient for the data-taking
process.
In the case of a randomly generated (stochastic)
input signal, there are several methods for calculating the
frequency response function, all of which treat noise in
the system differently. The technique described next,
known as $H_1$, is optimized to minimize noise in the
output signal, which can occur due to measurement error
and the slight non-linearity inherent in any real-world LTI
system [6].
First, the auto-correlation function of the input signal
is calculated. This function gives a measure of the
signal’s memory for itself and is computed by correlating
the signal with itself for various lags or shifts. The
auto-correlation function $R_{xx}(\tau)$ of an input signal
$x(t)$ is given by
$$R_{xx}(\tau) = \int_{-\infty}^{\infty} x(t)\, x(t+\tau)\, dt, \tag{3}$$
where $\tau$ is the time lag at which the function is
evaluated [7]. From the input auto-correlation function, the
power spectral density [8] $S_{xx}(f)$ of $x(t)$ is calculated
by taking the FFT of $R_{xx}(\tau)$,
$$S_{xx}(f) = \int_{-\infty}^{\infty} R_{xx}(\tau)\, e^{-j 2\pi f \tau}\, d\tau. \tag{4}$$
The power spectral density of a sound signal gives the
expected power contained near each frequency in that
sound signal.
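A discrete version of the auto-correlation in Eq. 3 can be sketched for a white noise input. For such a signal the auto-correlation is concentrated at zero lag, which corresponds to a flat power spectral density. The sample count, the unit variance, and the random seed below are illustrative assumptions.

```python
import random


def autocorrelation(x, lag):
    """Biased sample estimate of the auto-correlation at an integer lag."""
    n = len(x)
    return sum(x[i] * x[i + lag] for i in range(n - lag)) / n


random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(20000)]   # Gaussian white noise

r_zero = autocorrelation(noise, 0)                       # close to the variance
r_nonzero = [autocorrelation(noise, lag) for lag in range(1, 6)]

print(round(r_zero, 3), [round(r, 3) for r in r_nonzero])
```

The value at zero lag is near the signal variance, while the values at the other lags are near zero, as expected for an input with no memory of itself.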
Additionally, the output-input cross-correlation
function $R_{xy}(\tau)$ is calculated for the output signal
$y(t)$ and input signal $x(t)$ as
$$R_{xy}(\tau) = \int_{-\infty}^{\infty} x(t)\, y(t+\tau)\, dt. \tag{5}$$
The cross-correlation function between two sound signals
is a measure of the similarity between the two sounds and
whether they are delayed from each other. From the
cross-correlation function, the cross spectral density
$S_{xy}(f)$ of $x(t)$ and $y(t)$ is calculated by taking the FFT
of $R_{xy}(\tau)$,
$$S_{xy}(f) = \int_{-\infty}^{\infty} R_{xy}(\tau)\, e^{-j 2\pi f \tau}\, d\tau. \tag{6}$$
Finally, the gain $H(f)$ for each frequency can be
calculated by dividing the input-output cross-spectral
density by the input power spectral density,
$$H(f) = \frac{S_{xy}(f)}{S_{xx}(f)}. \tag{7}$$
Division here in the frequency domain is equivalent to
deconvolving the input auto-correlation function from the
input-output cross-correlation in the time domain [9]. The
resulting frequency response function $H(f)$ can be graphed
by plotting gain versus frequency in the desired range.
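The procedure of Eqs. 3 to 7 can be sketched in Python on a system whose answer is known in advance. This is not the analysis code used in this work. It assumes a simple two-tap averaging filter as the test system, an 8,000 Hz sample rate, and 32-sample segments, and it replaces the FFT with a direct discrete Fourier transform to stay within the standard library. The spectra are formed segment by segment as conj(X)Y and |X|^2, which is the frequency-domain equivalent of transforming the correlation functions.

```python
import cmath
import math
import random


def dft_half(segment):
    """Direct DFT (O(N^2)) of a real segment, bins 0..N/2; fine for short segments."""
    n = len(segment)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(segment))
        for k in range(n // 2 + 1)
    ]


def h1_estimate(x, y, seg_len):
    """Average S_xy and S_xx over segments and return S_xy / S_xx for each bin."""
    n_bins = seg_len // 2 + 1
    s_xy = [0j] * n_bins
    s_xx = [0.0] * n_bins
    for start in range(0, len(x) - seg_len + 1, seg_len):
        big_x = dft_half(x[start:start + seg_len])
        big_y = dft_half(y[start:start + seg_len])
        for k in range(n_bins):
            s_xy[k] += big_x[k].conjugate() * big_y[k]   # cross spectral density
            s_xx[k] += abs(big_x[k]) ** 2                # input power spectral density
    return [s_xy[k] / s_xx[k] for k in range(n_bins)]


random.seed(0)
FS = 8000.0      # assumed sample rate in Hz
SEG_LEN = 32     # assumed segment length in samples
x = [random.gauss(0.0, 1.0) for _ in range(SEG_LEN * 200)]   # white noise input

# Known test system: y[n] = 0.5*x[n] + 0.5*x[n-1], so the true gain is |cos(pi*f/FS)|.
y = [0.5 * x[n] + 0.5 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

gain = [abs(h) for h in h1_estimate(x, y, SEG_LEN)]
freqs = [k * FS / SEG_LEN for k in range(len(gain))]
true_gain = [abs(math.cos(math.pi * f / FS)) for f in freqs]

for f, g, t in zip(freqs[::4], gain[::4], true_gain[::4]):
    print(int(f), "Hz  estimated", round(g, 3), " true", round(t, 3))
```

Averaging over many segments is what suppresses the scatter in the estimate; with real recordings the residual scatter is larger because of measurement noise and non-linearity.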
2.4 CUTOFF FREQUENCIES
Low pass filters and band pass filters have cutoff
frequencies above which the gain of the system begins to
fall off quickly with increasing frequency. These cutoff
frequencies can be calculated from the frequency
response function graph and are typically defined as the
frequency at which the gain drops below some chosen
value.
For a simple RC circuit low-pass filter [9], the cutoff
frequency is defined as the frequency at which the gain
falls below 0.707, as shown in Fig. 6. The cutoff
frequency provides a quantitative measurement with
which the frequency responses of different systems of the
same type can be compared.
Figure 6: Frequency response plot for a simple RC
circuit low-pass filter, shown on logarithmic axes. The
gain is the ratio of the circuit output voltage to its
input voltage. Frequencies at which the output amplitude
is less than 0.707 times the input amplitude lie above
the cutoff frequency, which here is approximately 200
Hz.
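The 0.707 criterion follows from the gain of a first-order RC low-pass filter, which equals 1/sqrt(2) exactly at the cutoff frequency. The sketch below evaluates that gain; the 200 Hz cutoff is read from Fig. 6, while the resistor and capacitor values are hypothetical choices that give a cutoff near 200 Hz.

```python
import math


def rc_gain(f_hz, fc_hz):
    """Gain magnitude of a first-order RC low-pass filter at frequency f_hz."""
    return 1.0 / math.sqrt(1.0 + (f_hz / fc_hz) ** 2)


def rc_cutoff(r_ohm, c_farad):
    """Cutoff frequency in Hz of an RC low-pass filter, fc = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


FC = 200.0                          # approximate cutoff from Fig. 6, in Hz
for f in (20.0, 200.0, 2000.0):
    print(f, "Hz  gain", round(rc_gain(f, FC), 4))

# Hypothetical component values: 8 kilohm and 0.1 microfarad.
print(round(rc_cutoff(8000.0, 0.1e-6), 2), "Hz")
```

Well below the cutoff the gain stays close to 1, and well above it the gain falls roughly in inverse proportion to frequency.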
3. FREQUENCY RESPONSE MEASUREMENT
3.1 RECORDING AND PLAYBACK
A white noise sound file was used as the stochastic
input for frequency response testing of the phones. The
sound file contained frequencies between 0 Hz and
24,000 Hz, with a Gaussian distribution of amplitudes for
each frequency. A power spectrum [10] of the white noise
file is shown in Fig. 7.
Figure 7: Power spectrum of input white noise
sound file. Power has a Gaussian distribution for
each frequency in the range from 0 to 24,000 Hz, but
is only shown here up to 18,000 Hz.
This white noise sound file was played through a set
of speakers and the resulting sound was recorded with a
reference microphone both before and after being
recorded and played back by each of the six cell phones.
Fig. 8 shows a block diagram of the entire experimental
setup.
Figure 8: Block diagram of experimental setup for
recording and playback. A white noise sound file is
played by the speakers and recorded using the
microphone both before and after being passed
through the phone’s microphone and speakers. The
data are collected and then analyzed.
The microphone used was a Vernier Microphone
MCA-BTA interfaced to the computer through the
Vernier LabQuest Mini. The sound was recorded using
the respective phone’s recording function and played
back on speakerphone at full volume. Fig. 9 shows a
photograph of the setup.
Figure 9: Experimental apparatus. The Vernier
Microphone, the LabQuest Mini, and the computer
are used to record white noise directly from the
speakers as well as from the phone being analyzed.
Six different phones were analyzed, shown in Fig. 10
in order of release date, which ranged from November
2006 to March 2013. There are two conventional cell
phones, the LG enV 1 and the LG Cosmos 2, two
Android smartphones, the RAZR M and the HTC One,
and two Apple smartphones, the iPhone 4 and the iPhone
5. Also, the two newest phones, the iPhone 5 and the
HTC One, are HD Voice capable, so if on the right
network in the right conditions, they are able to transmit a
higher range of frequencies than typical phone calls.
Figure 10: Photographs of the six different phones
tested and their release dates. The release dates
range from November 2006 to March 2013, and the
phones include two conventional cell phones, two
Android smartphones, and two Apple smartphones.
Four trials were completed for each phone, and the
resulting input and output data were analyzed by
determining the frequency response function and a
corresponding cutoff frequency for each phone.
3.2 TRANSMISSION
The measurement of input and output sound files for
transmission in a phone call between two phones of the
same model was essentially the same as that for recording
and playback, except that two phones were involved instead
of one. Fig. 11 shows the system block diagram for this
new system.
Figure 11: Block diagram of experimental setup for
transmission. A white noise sound file is played by
the speakers and recorded using the microphone both
before and after being transmitted in a phone call
between two of the same phone. The data are
collected and then analyzed.
This second experiment required two rooms that
were out of earshot of each other. The speakers played
the input sound file into the first phone in one room,
while the second phone received the sound and played it
on speakerphone to the microphone in a second room.
Also, since the transmission experiment required two
of the same phone, only four of the phones were tested.
The two conventional phones, the LG enV 1 and the LG
Cosmos 2, were not tested, because it was too difficult to
acquire two of each of them.
Once again, four trials were completed for each
phone, and then the data were analyzed to determine the
frequency response function and a corresponding cutoff
frequency for each phone.
4. RESULTS AND DISCUSSION
4.1 RECORDING AND PLAYBACK
Six different frequency response functions were
determined, one for each phone, as shown in Figs. 12-17.
As expected from the stochastic input and the fact that
real-world systems are not perfectly linear and time
invariant, there is noise in the frequency response
functions. Nevertheless, a general pattern of attenuation
can be observed at higher frequencies. The cutoff
frequency was defined here as the lowest frequency at
which the moving average of the previous twenty gain
values drops below 0.1 and stays below 0.1 for at least
the next thirty gain values. An exception was made for
the iPhone 5, for which the cutoff gain was decreased to
0.04, because, with a value of 0.1, the calculated cutoff
frequency was far lower than the visually obvious cutoff
frequency around 15,000 Hz.
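One possible reading of this cutoff rule is sketched below in Python. The function names, the handling of the first few points, and the synthetic gain curve are assumptions made for illustration; the analysis in this work may have differed in such details.

```python
def trailing_average(values, window=20):
    """Moving average over the most recent `window` values (fewer at the start)."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


def find_cutoff(freqs, gains, threshold=0.1, window=20, hold=30):
    """Lowest frequency where the trailing average drops below threshold
    and stays below it for at least the next `hold` points."""
    avg = trailing_average(gains, window)
    for i in range(len(avg) - hold):
        if all(a < threshold for a in avg[i:i + hold + 1]):
            return freqs[i]
    return None


# Synthetic example: flat gain up to 5,000 Hz, then strong attenuation,
# plus a short notch near 2,000 Hz that the hold condition should ignore.
freqs = [50 * i for i in range(361)]                 # 0 to 18,000 Hz
gains = [1.0 if f < 5000 else 0.01 for f in freqs]
for i in range(40, 63):
    gains[i] = 0.01

cutoff = find_cutoff(freqs, gains)
print(cutoff)
```

Because the average trails the data, the detected cutoff in this synthetic case lands somewhat above the frequency where the gain actually steps down, which illustrates how the choice of window and threshold shifts the reported value.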
The plots shown in Figs. 12-17 are the average
frequency response functions of the four trials for each
phone. The black curve is the moving average of the past
20 data points. Only the frequencies up to 18,000 Hz are
shown, because, according to the spec sheet, the Vernier
Microphone begins to lose fidelity around that frequency.
Figure 12: Average frequency response function for
the LG enV 1 for recording and playback. The
colored data points denote the determined values of
the frequency response function. The black curve
denotes the moving average of the last twenty data
points. With the cutoff gain chosen to be 0.1, the LG
enV 1 attenuates frequencies higher than 3,470 ±
343 Hz.
Figure 13: Average frequency response function for
the LG Cosmos 2 for recording and playback. With
the cutoff gain chosen to be 0.1, the LG Cosmos 2
attenuates frequencies higher than 3,609 ± 166 Hz.
Figure 14: Average frequency response function for
the HTC One for recording and playback. With the
cutoff gain chosen to be 0.1, the HTC One attenuates
frequencies higher than 6,775 ± 634 Hz.
Figure 15: Average frequency response function for
the RAZR M for recording and playback. With the
cutoff gain chosen to be 0.1, the RAZR M attenuates
frequencies higher than 11,110 ± 1,202 Hz.
Figure 16: Average frequency response function for
the iPhone 5 for recording and playback. With the
cutoff gain chosen to be 0.04, the iPhone 5
attenuates frequencies higher than 15,335 ± 94 Hz.
Figure 17: Average frequency response function for
the iPhone 4 for recording and playback. With the
cutoff gain chosen to be 0.1, the iPhone 4 attenuates
frequencies higher than 15,564 ± 708 Hz.
As evident in the frequency response functions
above, there is a large variation in cutoff frequencies
among phones, ranging from 3,470 Hz to 15,564 Hz. Fig.
18 shows a summary of the cutoff frequency data, with
error bars showing the 95% confidence interval calculated
from the four trials of each phone.
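A common way to obtain a 95% confidence interval from four trials uses the Student's t distribution with three degrees of freedom. The sketch below assumes that approach, which is not stated explicitly in this work, and the four trial values are hypothetical rather than measured data.

```python
import math
import statistics

T_95_DF3 = 3.182   # two-sided 95% Student's t critical value, 3 degrees of freedom


def mean_and_ci95(trials):
    """Return (mean, half-width) of the 95% confidence interval for four trials."""
    if len(trials) != 4:
        raise ValueError("the critical value above applies to exactly four trials")
    mean = statistics.mean(trials)
    half_width = T_95_DF3 * statistics.stdev(trials) / math.sqrt(len(trials))
    return mean, half_width


# Hypothetical cutoff frequencies from four trials, in Hz (not measured data).
hypothetical_trials_hz = [3550.0, 3580.0, 3560.0, 3590.0]
mean_hz, half_width_hz = mean_and_ci95(hypothetical_trials_hz)
print(round(mean_hz), "±", round(half_width_hz), "Hz")
```

With only four trials the t multiplier is large, so a single outlying trial widens the interval considerably.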
Figure 18: Cutoff frequencies for recording and
playback for the six phones tested. The two Apple
smartphones have the highest cutoff frequencies, and
then the two Android smartphones, and then the two
conventional phones.
The two Apple smartphones have the highest cutoff
frequencies, 15,564 ± 708 Hz for the iPhone 4 and 15,335
± 94 Hz for the iPhone 5, which means that these phones
can be expected to have high quality sound when used as
voice recorders. Next highest are the two Android
smartphones, which vary in cutoff frequency from 11,110
± 1,202 Hz for the RAZR M to 6,775 ± 634 Hz for
the HTC One. These phones used as recorders will still
have higher quality sound than a typical phone call, but
not as high quality as the iPhones. The lowest cutoff
frequencies are those of the two conventional phones, at
3,609 ± 166 Hz for the LG Cosmos 2 and 3,470 ± 343 Hz
for the LG enV 1. When recording and playing back
sound with these phones, one can expect to experience
the distortions explained in Sections 2.1 and 2.2, where
the sound appears muffled and thin and the consonants
are confused with one another.
The finding that all four smartphones have
significantly higher cutoff frequencies than the two
conventional phones is not surprising, given that
smartphones tend to have more space to store data and are
used as multimedia devices, which requires better sound
quality than telephone calling.
These results agree with the theory that cell phones
typically transmit frequencies between 300 Hz and 3,400
Hz during phone calls, as all of the phones in this
experiment kept the frequencies in that range more or less
intact. All of the cutoff frequencies found here were
larger than 3,400 Hz, meaning that the sound signal can
be further filtered down to the typical transmission range
before it is transmitted to another phone through the
wireless network.
In addition, the two HD Voice capable phones, the
HTC One and the iPhone 5, have the possibility of
transmitting the frequencies quoted for HD Voice, which
cuts off at around 7,000 Hz, since their cutoff frequencies
for recording and playback are close to or above 7,000
Hz.
4.2 TRANSMISSION
The resulting frequency response functions for
transmission for the four phones measured are shown in
Figs. 19-22, and the corresponding summary plot of
cutoff frequencies is shown in Fig. 23.
Figure 19: Average frequency response function for
the HTC One for transmission. With the cutoff gain
chosen to be 0.1, the HTC One attenuates
frequencies higher than 3,569 ± 32 Hz.
Figure 20: Average frequency response function for
the RAZR M for transmission. With the cutoff gain
chosen to be 0.1, the RAZR M attenuates
frequencies higher than 3,766 ± 146 Hz.
Figure 21: Average frequency response function for
the iPhone 5 for transmission. With the cutoff gain
chosen to be 0.1, the iPhone 5 attenuates frequencies
higher than 3,685 ± 278 Hz.
Figure 22: Average frequency response function for
the iPhone 4 for transmission. With the cutoff gain
chosen to be 0.1, the iPhone 4 attenuates frequencies
higher than 3,674 ± 65 Hz.
Figure 23: Cutoff frequencies for transmission for
the four phones tested. All four phones have cutoff
frequencies within 6% of each other, averaging
3,674 Hz, which is 8% higher than the traditional
network cutoff of 3,400 Hz. None of the phones
reached the HD Voice cutoff of 7,000 Hz.
The cutoff frequencies for transmission range from
3,569 ± 32 Hz for the HTC One to 3,766 ± 146 Hz for the
RAZR M, with those of the two iPhones in between at
3,674 ± 65 Hz for the iPhone 4 and 3,685 ± 278 Hz for
the iPhone 5. All of the cutoff frequencies are within 6%
of each other, and the error bars for all the phones overlap
those for at least one other phone. This agrees with the
theory that the wireless network cuts off sound for all
phone calls at the same frequency. However, the average
of the four measured values for the cutoff frequency is
3,674 Hz, which is 8% higher than the expected cutoff
frequency of 3,400 Hz. This discrepancy is most likely
due to the arbitrary choice of 0.1 as the cutoff gain. If the
cutoff gain had been chosen to be higher, the calculated
cutoff frequencies would all have been lower and would
have agreed more closely with the theory.
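The percentages quoted in this discussion can be reproduced from the four measured transmission cutoff frequencies. The values are those reported above; the variable names are only illustrative.

```python
cutoffs_hz = {
    "HTC One": 3569,
    "RAZR M": 3766,
    "iPhone 5": 3685,
    "iPhone 4": 3674,
}
EXPECTED_HZ = 3400   # typical network cutoff quoted in the text

mean_hz = sum(cutoffs_hz.values()) / len(cutoffs_hz)
lowest, highest = min(cutoffs_hz.values()), max(cutoffs_hz.values())

spread_pct = 100 * (highest - lowest) / lowest           # largest gap between phones
excess_pct = 100 * (mean_hz - EXPECTED_HZ) / EXPECTED_HZ # mean relative to 3,400 Hz

print(mean_hz, round(spread_pct, 1), round(excess_pct, 1))
```

The largest gap between any two phones is about 5.5%, consistent with the statement that all four lie within 6% of each other, and the mean is about 8% above 3,400 Hz.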
None of the four phones tested, even those listed as
being capable of HD Voice, had cutoff frequencies that
reached the edge of the HD Voice range of 7,000 Hz.
The HTC One and iPhone 5 are capable of transmitting
sound in the HD Voice range of frequencies, but they
must not have been on the correct network or had the
correct settings during testing for this to occur. Lack of
correct network is likely, since, at the time this
experiment was conducted, only one or two networks had
transitioned to HD Voice already.
The two phones that were not tested for
transmission, the LG Cosmos 2 and the LG enV 1, most
likely would have had cutoff frequencies equal to or
slightly lower than those calculated for recording and
playback for those phones, 3,609 ± 166 Hz for the LG
Cosmos 2 and 3,470 ± 343 Hz for the LG enV 1, as these
are already close to the expected value of 3,400 Hz for
cell phone transmission.
5. CONCLUSIONS
Frequency response analysis of six different cell
phones revealed that all phones have nearly the same
cutoff frequency for transmission of sound in a phone
call, but phones differ immensely in cutoff frequencies
for recording and playing back sound. The determined
cutoff frequencies for recording and playback were 3,470
± 343 Hz for the LG enV 1, 3,609 ± 166 Hz for the LG
Cosmos 2, 6,775 ± 634 Hz for the HTC One, 11,110 ±
1,202 Hz for the RAZR M, 15,335 ± 94 Hz for the iPhone
5, and 15,564 ± 708 Hz for the iPhone 4. Thus, Apple
smartphones have higher cutoff frequencies than Android
smartphones, which have higher cutoff frequencies than
the conventional phones. The higher bandwidth of
smartphones is most likely due to their increased capacity
for data storage and their use as multimedia devices. The
cutoff frequencies for transmission were 3,569 ± 32 Hz
for the HTC One, 3,766 ± 146 Hz for the RAZR M, 3,685
± 278 Hz for the iPhone 5, and 3,674 ± 65 Hz for the
iPhone 4, all within 6% of each other with an average
value of 3,674 Hz. This average value is 8% higher than
the documented value for the cell phone cutoff frequency
of 3,400 Hz. This slight discrepancy is most likely due to
the arbitrary nature of the definition of the cutoff
frequency.
Characterization of cell phone sound quality in
these two ways revealed that, for smartphones, the
limiting factor in increasing the cutoff frequency and thus
improving the sound quality is not the phone itself but the
wireless network. For all but a few phones that are
calling other HD Voice phones on the correct network,
sound transmission is currently limited to the frequency
range below 3,400 Hz. However, in the next few years,
as more networks come onboard with HD Voice, it
should become commonplace to experience phone calls
that transmit frequencies up to 7,000 Hz. This added
bandwidth will greatly enhance sound quality, especially
understanding of consonants, and make conversations
sound fuller and less muffled.
Future work in this subject could explore other
features of the frequency response function besides the
cutoff frequency, e.g. dips and spikes at certain
frequencies, to gain a fuller picture of what determines
sound quality. For example, intelligibility studies could
be done on transmission by the four phones tested to
understand how the shape of the frequency response
function affects sound quality when all phones tested
have the same cutoff frequency. In addition, phones
could be tested that are set up correctly for HD Voice to
see what the new frequency response function looks like.
ACKNOWLEDGMENTS
The author would like to thank Dr. Braunstein for
help in designing, carrying out, and analyzing the
experiment, Prof. Leonard for useful discussions and
providing the white noise file, and Dr. Hughey for
answering questions about the experiment and data
analysis. In addition, this experiment could not have been
performed without the help of those who volunteered
their phones to be tested.
REFERENCES
[1] Malcolm J. Crocker, Handbook of Noise and Vibration
Control, John Wiley & Sons (2007), available for
download (with MIT certificate) at
http://app.knovel.com/web/toc.v/cid:kpHNVC0001/viewerType:toc/root_slug:handbook-noise-vibration
[2] Youngmoo Edmund Kim (2003), “Singing Voice
Analysis/Synthesis,” available for download at
http://dspace.mit.edu/handle/1721.1/62044
[3] Eberhard Hänsler, Gerhard Schmidt (2008), Speech and
Audio Processing in Adverse Environments, available
for download at
http://link.springer.com/book/10.1007%2F978-3-540-70602-1
[4] Alexandra Chang, “How HD Voice Works to Make
Your Calls Sound Drastically Better,” (2013), retrieved
October 29, 2013 from
http://www.wired.com/gadgetlab/2013/04/how-hd-voice-works-to-make-your-calls-clearer/
[5] Chris Forrester, “Trends in cell phone voice
processing,” Canadian Acoustics Vol 37, No 3 (2009),
available for download at
http://jcaa.caa-aca.ca/index.php/jcaa/article/view/2131
[6] Randall J. Allemang (2001), Vibrations: Experimental
Modal Analysis, available for download at
http://www.sdrl.uc.edu/academic-course-info/vibrations-iii-20-263-663
[7] Ian Hunter, “Auto- and Cross-Correlation Functions,”
2.671 Lecture Notes, MIT (unpublished)
[8] Alan V. Oppenheim and George C. Verghese (2010),
Signals, Systems, and Inference, Class Notes for 6.011,
available for download at
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-011-introduction-to-communication-control-and-signal-processing-spring-2010/readings/
[9] Ian Hunter, “System Identification Via FFT,” 2.671
Lecture Notes, MIT, Revised 2013 (unpublished)
[10] Ian Hunter, Barbara Hughey, “Fourier Analysis of Wav
Files,” 2.671 Lecture Notes, MIT, Revised 2013
(unpublished)
Coad_M_GoForth2Column

  • 1. 1 2.671 Go Forth and Measure 2.671 Measurement and Instrumentation Monday 2-5 PM Lab Dr. Daniel Braunstein December 12 th , 2013 PLAYBACK AND TRANSMISSION FREQUENCIES OF CELL PHONES Margaret Coad Massachusetts Institute of Technology Cambridge, MA, USA ABSTRACT Cell phone sound quality was measured and compared for six different phones in two cases: recording and playback, and transmission. For recording and playback, a white noise signal was recorded and played back on each phone, and, for transmission, the white noise signal was transmitted in a phone call between two of the same phone. Frequency response analysis was applied to determine the cutoff frequency of each phone for both cases. The cutoff frequencies for recording and playback were 3,470 ± 343 Hz for the LG enV 1, 3,609 ± 166 Hz for the LG Cosmos 2, 6,775 ± 634 Hz for the HTC One, 11,110 ± 1202 Hz for the RAZR M, 15,335 ± 94 Hz for the iPhone 5, and 15,564 ± 708 Hz for the iPhone 4. The cutoff frequencies for transmission were 3,569 ± 32 Hz for the HTC One, 3,766 ± 146 Hz for the RAZR M, 3,685 ± 278 Hz for the iPhone 5, and 3,674 ± 65 Hz for the iPhone 4. Thus, for recording and playback, the two Apple smartphones had the highest cutoff frequencies, then the two Android smartphones, and finally the two conventional phones, most likely due to the fact that smartphones are multimedia devices that require higher quality sound and can store more data than conventional phones. However, for transmission, all phones had cutoff frequencies within 6% of each other, near the 3,400 Hz typical of current phone networks. 1. INTRODUCTION Cell phones in recent years have been gaining more and more functions in daily life, but their basic function has remained the same: transmitting the sound of the human voice to allow communication across long distances. 
The fundamental measure of the cell phone, then, is how intelligibly the information spoken into a phone comes through to the other end of a phone call, or, how similar are the sounds going into and coming out of the phone. Sound quality of six different cell phones released between 2006 and 2013 was characterized, both by recording and playing back sound through each phone, known hereafter as recording and playback, and by transmitting sound in a phone call between two of the same phone, known henceforth as transmission. The resulting frequency response functions were compared across phones to determine similarities and differences in how the six phones change the input sound signal. Both measures of cell phone sound quality used here, recording and playback and transmission, are valuable in their own way. Understanding cell phone transmission of sound is useful for quantifying call quality and comparing the performance of different phones during phone calls. On the other hand, knowing the frequency response function of a phone for recording and playback applies directly to sound quality when using the phone as a voice recorder. The recording and playback frequency response also provides insight into how the microphone and the speaker of the phone function without the cell phone network coming into consideration. All of these factors can be important to someone buying a new cell phone or interested in the performance of a phone that they already own. Section 2 discusses background information on human hearing and speech, the telephone as a system, and frequency response analysis. Section 3 explains the measurement of the frequency response functions for each phone. Section 4 presents and discusses the measured frequency response functions and cutoff frequencies, and Section 5 draws conclusions and suggests future work. 2. 
BACKGROUND An understanding of human hearing and speech, as presented in Section 2.1, provides motivation for transmission or recording and playback of a high range of frequencies to improve intelligibility of the input sound. In addition, understanding the telephone call as a system of sound filters, as discussed in Section 2.2, puts the measured frequency response functions in context.
  • 2. 2 2.671 Go Forth and Measure [e] [o] Finally, the explanation of frequency response functions and cutoff frequencies in Sections 2.3 and 2.4 clarifies how the raw data of sound files were analyzed to determine meaningful results. 2.1 HUMAN HEARING AND SPEECH The human ear can hear frequencies ranging from approximately 20 Hz to 20,000 Hz. As people age, their ability to hear the highest frequencies diminishes, but most adolescents can hear up to 20,000 Hz, with the most sensitivity in the range from 500 Hz to 5,000 Hz. Fig. 11 shows the thresholds of audibility and pain for a typical human ear. Figure 1: Diagram of the audible frequency range of the human ear. Sound needs to be louder, having a higher sound pressure level, to be heard in the lower and higher frequency ranges and is most easily heard in the range from 500 Hz to 5,000 Hz.1 Any sound can be broken down into a combination of sinusoidal sound waves of different frequencies. The “voiced” sounds of human speech, that is, those spoken on a specific pitch, such as the vowels [a], [e], [i], [o], [u] and the consonants [b], [d], [g], [l], [m], [n], and [r], typically contain a lowest frequency, known as the fundamental frequency, as well as several higher frequencies at integer multiples of the fundamental frequency, known as harmonics.2 The presence and the relative amplitudes of the higher harmonics determine the timbre of the sound, which allows distinction between specific vowel or consonant sounds as well as among the voices of different people. The lowest two or three harmonics of vowel sounds are the most important for understanding speech and typically range in frequency from 200 Hz to 3,000 Hz for adults.1 Fig. 2 shows the frequency makeup of the long sounds of the letters [e] and [o]. Figure 2: Frequency content of the long sounds of the letters [e] and [o]. For both sounds, the fundamental frequency is just above 200 Hz and the third harmonic is just above 600 Hz. 
The letters can be distinguished by the relative magnitudes of each harmonic. “Unvoiced” sounds, such as the consonants [f], [s], [p], [t], and [k], are different from voiced sounds in that they do not contain specific frequencies in integer multiples but, rather, tend to contain a wider range of frequencies with less distinct values. Consonants are more important than vowels for speech recognition, but their distinguishing features range up to higher frequencies than vowels.1 Fig. 3, for example, compares the frequency content of the [f] sound and the [s] sound, showing that their low-frequency components are similar to each other, but they also have high-frequency components that are important for distinction between the two sounds.
  • 3. 3 2.671 Go Forth and Measure [f] [s] [s] Figure 3: Frequency content of the sounds of the letters [f] and [s]. These letters look distinctly different than the vowels in Fig. 2, because their spectrum is continuous and has a much wider range of frequencies. The [f] and the [s] sounds are easily distinguishable. However, looking at only the lowest 3 kHz, the two sounds could be mistaken for each other. 2.2 THE TELEPHONE AS A SYSTEM The telephone is designed to transmit close to the minimum amount of data necessary to convey the information required to understand speech. Shown in Fig. 4 is a schematic diagram of the different processes that an input sound signal !!" goes through before it becomes the output sound signal !!"# heard by the receiving ear. In the traditional landline telephone system, the cutoff frequencies !! !" and !! !"#  are equal to 4,000 Hz, and the sampling frequencies !! !" ,  !! !"# , and !! !"# are equal to 8,000 Hz, which results in output sound signal !!"# being filtered to disregard any of the frequency content of the input signal !!" above 4,000 Hz.3 Figure 4: Schematic diagram of a telephone conversation. The signal !!" enters the microphone. The microphone output is filtered by an anti-aliasing low pass filter with cutoff frequency !! !" , and sampled by an analog to digital converter at filter and then sampled again by the telephone system at frequency !! !"# . The signal is transmitted to the receiving phone and similarly filtered before reaching the receiving ear as the output signal !!"#.3 Until 2013 in the United States, all cell phone calls were similar to those of land line phones, transmitting only frequencies between 300 Hz and 3,400 Hz.4 This frequency range is close to the range of highest sensitivity for human hearing, 500 Hz to 5,000 Hz, as well as the range of the lowest three harmonics for vowel sounds, 200 Hz to 3,000 Hz. Thus, most important sound characteristics are retained in typical cell phone conversations. 
However, consonants, especially the sounds [f], [v], [d], and [t], which rely on high frequencies to differentiate them from other sounds, have been shown to be the most easily confused sounds when cut off at 4,000 Hz.1 In addition, sound cut off at this frequency tends to appear “muffled” rather than “bright” and “thin” rather than “full” as compared to unfiltered sound.3 To improve the quality and intelligibility of cell phone conversations, and because modern computing methods have progressed enough to be able to easily store and transmit more data, cell phone manufacturers and network providers have been working on creating phones and networks capable of transmitting sound with a higher range of frequencies. New to the U.S. in 2013 is “wideband” or “HD voice” calling, which allows transmission of sound with double the typical bandwidth of frequencies, so that frequencies between 50 Hz and about 7,000 Hz are retained for transmission.5 To achieve this increased performance, both the transmitting and receiving phone, as well as the network, must be capable of supporting the wider span of frequencies.
  • 4. 4 2.671 Go Forth and Measure 2.3 FREQUENCY RESPONSE FUNCTIONS Fig. 5 shows the system block diagram of a linear time invariant (LTI) system in both the time domain and the frequency domain. LTI systems are defined by two characteristics: linearity and time invariance. Linearity means that, for each input, there is a one-to-one correspondence to an output. In other words, if a linear system is excited by an input at a certain frequency, the corresponding output will be at the same frequency, but may be shifted in magnitude and phase. In addition, if the amplitude of the input signal is multiplied by some constant factor, the amplitude of the output signal will be multiplied by the same constant factor. Time invariance means that the system will treat the same input in the same way no matter when it is excited by that input. LTI systems can be described in the time domain by ordinary differential equations (ODEs) with constant coefficients. Figure 5: System block diagram of a linear time invariant (LTI) system. In the time domain, the input signal !(!) is passed through a filter ℎ(!) characterized by a system of ordinary differential equations (ODEs) to produce an output !(!), which is the convolution of the input signal and the system transfer function in the time domain. In the frequency domain, which is calculated by taking the Laplace transform of the time domain signal, the output signal is found simply by multiplying the input signal and the system transfer function. As shown in Fig. 5, in the frequency domain, the transfer function of the output signal !(!) is given simply by multiplying the input transfer function !(!) by the system transfer function !(!), ! ! = ! ! !(!) . (1) Rearranging Eq. 1 and evaluating at a specific frequency ! gives an expression for the gain !(!), or the ratio of output to input amplitudes, at each frequency, ! ! = !(!) !(!) . (2) A plot of gain versus frequency for a system is known as the frequency response function. 
Estimation of the frequency response function for a physical system generally involves exciting the system with a certain input and then comparing the input and output signals using a technique that extracts frequency information from time-domain signals, such as the Fast Fourier Transform (FFT). The system excitation can be done either by sweeping through a range of frequencies one at a time and measuring the output for each frequency, or by using a stochastic, or random, input that contains a wide range of frequencies all at once. The second method, the stochastic input, is more time-efficient for the data-taking process. In the case of a randomly generated (stochastic) input signal, there are several methods for calculating the frequency response function, all of which treat noise in the system differently. The technique described next, known as !!, is optimized to minimize noise in the output signal, which can occur due to measurement error and the slight non-linearity inherent in any real-world LTI system.6 First, the auto-correlation function of the input signal is calculated. This function gives a measure of the signal’s memory for itself and is computed by correlating the signal with itself for various lags or shifts. The auto- correlation function !!!(!) of an input signal !(!) is given by !!! ! = ! !  ! ! + !  !" ! !! , (3) where ! is the time lag at which the function is evaluated.7 From the input auto-correlation function, the power spectral density8 !!!(!) of !(!) is calculated by taking the FFT of !!!(!), !!!(!) = !!!(!)!!!!"#$ !" ! !! . (4) The power spectral density of a sound signal gives the expected power contained near each frequency in that sound signal. Additionally, the output-input cross-correlation function !!" is calculated for the output signal !(!) and input signal !(!) as !!" ! = ! !  ! ! + !  !" ! !! . 
(5) The cross-correlation function between two sound signals is a measure of the similarity between the two sounds and whether they are delayed from each other. From the cross-correlation function, the cross spectral density !!"(!) of !(!) and !(!) is calculated by taking the FFT of !!" ! ,
  • 5. 5 2.671 Go Forth and Measure !!"(!) = !!"(!)!!!!"#$ !" ! !! . (6) Finally, the gain ! ! for each frequency can be calculated by dividing the input-output cross-spectral density by the input power spectral density, ! ! = !!"(!) !!!(!) . (7) Division here in the frequency domain is equivalent to deconvolving the input auto-correlation function from the input output cross-correlation in the time domain9 . The resulting frequency response function ! ! can be graphed by plotting gain versus frequency in the desired range. 2.4 CUTOFF FREQUENCIES Low pass filters and band pass filters have cutoff frequencies above which the gain of the system begins to fall off quickly with increasing frequency. These cutoff frequencies can be calculated from the frequency response function graph and are typically defined as the frequency at which the gain drops below some chosen value. For a simple RC circuit low-pass filter9 , the cutoff frequency is defined as the frequency at which the gain falls below 0.707, as shown in Fig. 6. The cutoff frequency provides a quantitative measurement with which the frequency responses of different systems of the same type can be compared. Figure 6: Frequency response plot for a simple RC circuit low-pass filter. The gain is the ratio of the circuit output voltage to its input voltage. Output signals that have amplitude of less than 0.707 times the amplitude of the input signals are above the cutoff frequency, which here is approximately 200 Hz. 3. FREQUENCY RESPONSE MEASUREMENT 3.1 RECORDING AND PLAYBACK A white noise sound file was used as the stochastic input for frequency response testing of the phones. The sound file contained frequencies between 0 Hz and 24,000 Hz, with a Gaussian distribution of amplitudes for each frequency. A power spectrum10 of the white noise file is shown in Fig. 7. Figure 7: Power spectrum of input white noise sound file. 
Power has a Gaussian distribution for each frequency in the range from 0 to 24,000 Hz, but is only shown here up to 18,000 Hz. This white noise sound file was played through a set of speakers and the resulting sound was recorded with a reference microphone both before and after being recorded and played back by each of the six cell phones. Fig. 8 shows a block diagram of the entire experimental setup. Figure 8: Block diagram of experimental setup for recording and playback. A white noise sound file is played by the speakers and recorded using the microphone both before and after being passed through the phone’s microphone and speakers. The data are collected and then analyzed. The microphone used was a Vernier Microphone MCA-BTA interfaced to the computer through the Vernier LabQuest Mini. The sound was recorded using the respective phone’s recording function and played back on speakerphone at full volume. Fig. 9 shows a photograph of the setup. 10 100 1 10 3 × 1 10 4 × 0.01 0.1 1 Data Gain = 0.707 Log Frequency (Hz) LogGain
  • 6. 6 2.671 Go Forth and Measure Figure 9: Experimental apparatus. The Vernier Microphone, the LabQuest Mini, and the computer are used to record white noise directly from the speakers as well as from the phone being analyzed. Six different phones were analyzed, shown in Fig. 10 in order of release date, which ranged from November 2006 to March 2013. There are two conventional cell phones, the LG enV 1 and the LG Cosmos 2, two Android smartphones, the RAZR M and the HTC One, and two Apple smartphones, the iPhone 4 and the iPhone 5. Also, the two newest phones, the iPhone 5 and the HTC One, are HD Voice capable, so if on the right network in the right conditions, they are able to transmit a higher range of frequencies than typical phone calls. Figure 10: Photographs of the six different phones tested and their release dates. The release dates range from November 2006 to March 2013, and the phones include two conventional cell phones, two Android smartphones, and two Apple smartphones. Four trials were completed for each phone, and the resulting input and output data were analyzed by determining the frequency response function and a corresponding cutoff frequency for each phone. 3.2 TRANSMISSION The measurement of input and output sound files for transmission in a phone call between two of the same phone was essentially the same as that for recording and playback, except that two phones were involved instead of one. Fig. 11 shows the system block diagram for this new system. Figure 11: Block diagram of experimental setup for transmission. A white noise sound file is played by the speakers and recorded using the microphone both before and after being transmitted in a phone call between two of the same phone. The data are collected and then analyzed. This second experiment required two rooms that were out of earshot of each other. 
The speakers played the input sound file into the first phone in one room, while the second phone received the sound and played it on speakerphone to the microphone in a second room. Also, since the transmission experiment required two of the same phone, only four of the phones were tested. The two conventional phones, the LG enV 1 and the LG Cosmos 2, were not tested, because it was too difficult to acquire two of each of them. Once again, four trials were completed for each phone, and then the data were analyzed to determine the frequency response function and a corresponding cutoff frequency for each phone.
  • 7. 7 2.671 Go Forth and Measure 4. RESULTS AND DISCUSSION 4.1 RECORDING AND PLAYBACK Six different frequency response functions were determined, one for each phone, as shown in Figs. 12-17. As expected from the stochastic input and the fact that real-world systems are not perfectly linear and time invariant, there is noise in the frequency response functions. Nevertheless, a general pattern of attenuation can be observed at higher frequencies. The cutoff frequency was defined here as the frequency above which the average of the last twenty gain values begins to drop below 0.1 and stays below 0.1 for at least the next thirty gain values. An exception was made for the iPhone 5 to decrease the cutoff gain to 0.04, because, with a value of 0.1, the calculated cutoff frequency was far lower than the visually obvious cutoff frequency around 15,000 Hz. The plots shown in Figs. 12-17 are the average frequency response functions of the four trials for each phone. The black curve is the moving average of the past 20 data points. Only the frequencies up to 18,000 Hz are shown, because, according to the spec sheet, the Vernier Microphone begins to lose fidelity around that frequency. Figure 12: Average frequency response function for the LG enV 1 for recording and playback. The colored data points denote the determined values of the frequency response function. The black curve denotes the moving average of the last twenty data points. With the cutoff gain chosen to be 0.1, the LG enV 1 attenuates frequencies higher than 3,470 ± 343 Hz. Figure 13: Average frequency response function for the LG Cosmos 2 for recording and playback. With the cutoff gain chosen to be 0.1, the LG Cosmos 2 attenuates frequencies higher than 3,609 ± 166 Hz. Figure 14: Average frequency response function for the HTC One for recording and playback. With the cutoff gain chosen to be 0.1, the HTC One attenuates frequencies higher than 6,775 ± 634 Hz. 
Figure 15: Average frequency response function for the RAZR M for recording and playback. With the cutoff gain chosen to be 0.1, the RAZR M attenuates frequencies higher than 11,110 ± 1202 Hz. Figure 16: Average frequency response function for the iPhone 5 for recording and playback. With the cutoff gain chosen to be 0.04, the iPhone 5 attenuates frequencies higher than 15,335 ± 94 Hz.
  • 8. 8 2.671 Go Forth and Measure Figure 17: Average frequency response function for the iPhone 4 for recording and playback. With the cutoff gain chosen to be 0.1, the iPhone 4 attenuates frequencies higher than 15,564 ± 708 Hz. As evident in the frequency response functions above, there is a large variation in cutoff frequencies among phones, ranging from 3,470 Hz to 15,564 Hz. Fig. 18 shows a summary of the cutoff frequency data, with error bars showing the 95% confidence level calculated from the four trials of each phone. Figure 18: Cutoff frequencies for recording and playback for the six phones tested. The two Apple smartphones have the highest cutoff frequencies, and then the two Android smartphones, and then the two conventional phones. The two Apple smartphones have the highest cutoff frequencies, 15,564 ± 708 Hz for the iPhone 4 and 15,335 ± 94 Hz for the iPhone 5, which means that these phones can be expected to have high quality sound when used as voice recorders. Next highest are the two Android smartphones, which vary in cutoff frequency from 11,110 ± 1202 Hz for the RAZR M to about 6,775 ± 634 Hz for the HTC One. These phones used as recorders will still have higher quality sound than a typical phone call, but not as high quality as the iPhones. The lowest cutoff frequencies are those of the two conventional phones, at 3609 ± 166 Hz for the LG Cosmos 2 and 3,470 ± 343 Hz for the LG enV 1. When recording and playing back sound with these phones, one can expect to experience the distortions explained in Sections 2.1 and 2.2, where the sound appears muffled and thin and the consonants are confused with one another. The finding that all four smartphones have significantly higher cutoff frequencies than the two conventional phones is not surprising, given that smartphones tend to have more space to store data and are used as multimedia devices, which requires better sound quality than telephone calling. 
These results agree with the theory that cell phones typically transmit frequencies between 300 Hz and 3,400 Hz during phone calls, as all of the phones in this experiment kept the frequencies in that range more or less intact. All of the cutoff frequencies found here were larger than 3,400 Hz, meaning that the sound signal can be further filtered down to the typical transmission range before it is transmitted to another phone through the wireless network. In addition, the two HD Voice capable phones, the HTC One and the iPhone 5, have the possibility of transmitting the frequencies quoted for HD Voice, which cuts off at around 7,000 Hz, since their cutoff frequencies for recording and playback are close to or above 7,000 Hz. 4.2 TRANSMISSION The resulting frequency response functions for transmission for the four phones measured are shown in Figs. 19-22, and the corresponding summary plot of cutoff frequencies is shown in Fig. 23. Figure 19: Average frequency response function for the HTC One for transmission. With the cutoff gain chosen to be 0.1, the HTC One attenuates frequencies higher than 3,569 ± 32 Hz. Figure 20: Average frequency response function for the RAZR M for transmission. With the cutoff gain
  • 9. chosen to be 0.1, the RAZR M attenuates frequencies higher than 3,766 ± 146 Hz.
Figure 21: Average frequency response function for the iPhone 5 for transmission. With the cutoff gain chosen to be 0.1, the iPhone 5 attenuates frequencies higher than 3,685 ± 278 Hz.
Figure 22: Average frequency response function for the iPhone 4 for transmission. With the cutoff gain chosen to be 0.1, the iPhone 4 attenuates frequencies higher than 3,674 ± 65 Hz.
Figure 23: Cutoff frequencies for transmission for the four phones tested. All four phones have cutoff frequencies within 6% of each other, averaging 3,674 Hz, which is 8% higher than the traditional network cutoff of 3,400 Hz. None of the phones reached the HD Voice cutoff of 7,000 Hz.
The cutoff frequencies for transmission range from 3,569 ± 32 Hz for the HTC One to 3,766 ± 146 Hz for the RAZR M, with those of the two iPhones in between at 3,674 ± 65 Hz for the iPhone 4 and 3,685 ± 278 Hz for the iPhone 5. All of the cutoff frequencies are within 6% of each other, and the error bars for each phone overlap those of at least one other phone. This agrees with the theory that the wireless network cuts off sound for all phone calls at the same frequency. However, the average of the four measured cutoff frequencies is 3,674 Hz, which is 8% higher than the expected cutoff frequency of 3,400 Hz. This discrepancy is most likely due to the arbitrary choice of 0.1 as the cutoff gain. If a higher cutoff gain had been chosen, the calculated cutoff frequencies would all have been lower and would have agreed more closely with the expected value. None of the four phones tested, even those listed as being capable of HD Voice, had cutoff frequencies that reached the edge of the HD Voice range of 7,000 Hz.
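The summary figures quoted above can be checked directly from the four reported cutoff frequencies, and the dependence on the cutoff gain can be illustrated with a model filter. The sketch below makes two assumptions that are not in the source: the 6% spread is measured relative to the smallest value, and the network roll-off is modeled as a Butterworth low-pass response with a hypothetical corner frequency and order. The real network filter shape is not known from this experiment, so the second part only shows the direction of the effect, not its size.

```python
from statistics import mean

# Transmission cutoff frequencies reported above, in Hz.
transmission_cutoffs_hz = {
    "HTC One": 3569,
    "RAZR M": 3766,
    "iPhone 5": 3685,
    "iPhone 4": 3674,
}
EXPECTED_NETWORK_CUTOFF_HZ = 3400

values = list(transmission_cutoffs_hz.values())
average_hz = mean(values)
# Spread between the extremes, relative to the smallest value (assumption).
spread_pct = 100 * (max(values) - min(values)) / min(values)
# How far the average sits above the expected network cutoff.
excess_pct = 100 * (average_hz - EXPECTED_NETWORK_CUTOFF_HZ) / EXPECTED_NETWORK_CUTOFF_HZ

print(f"average cutoff: {average_hz:.1f} Hz")
print(f"spread between phones: {spread_pct:.1f}%")
print(f"excess over expected cutoff: {excess_pct:.1f}%")


def frequency_at_gain(gain, corner_hz, order):
    """Frequency at which a Butterworth low-pass magnitude equals `gain`.

    Uses |H(f)| = 1 / sqrt(1 + (f / corner_hz) ** (2 * order)), solved for f.
    Valid for 0 < gain < 1.
    """
    if not 0 < gain < 1:
        raise ValueError("gain must lie strictly between 0 and 1")
    return corner_hz * (1 / gain**2 - 1) ** (1 / (2 * order))


# Hypothetical filter parameters, chosen only for illustration.
for cutoff_gain in (0.1, 0.3, 0.5):
    f_hz = frequency_at_gain(cutoff_gain, corner_hz=3400, order=8)
    print(f"cutoff gain {cutoff_gain}: cutoff frequency {f_hz:.0f} Hz")
```

For any monotonically decreasing roll-off, raising the cutoff gain moves the calculated cutoff frequency lower, which is the behavior the discussion above relies on.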
The HTC One and iPhone 5 are capable of transmitting sound in the HD Voice range of frequencies, but during testing they were evidently not on a network that supported HD Voice or did not have the correct settings. The network is the more likely cause, since at the time this experiment was conducted only one or two networks had transitioned to HD Voice. The two phones that were not tested for transmission, the LG Cosmos 2 and the LG enV 1, most likely would have had cutoff frequencies equal to or slightly lower than those calculated for recording and playback, 3,609 ± 166 Hz for the LG Cosmos 2 and 3,470 ± 343 Hz for the LG enV 1, as these are already close to the expected value of 3,400 Hz for cell phone transmission.
5. CONCLUSIONS
Frequency response analysis of six different cell phones revealed that all phones have nearly the same cutoff frequency for transmission of sound in a phone call, but that phones differ widely in cutoff frequency for recording and playing back sound. The cutoff frequencies determined for recording and playback were 3,470 ± 343 Hz for the LG enV 1, 3,609 ± 166 Hz for the LG Cosmos 2, 6,775 ± 634 Hz for the HTC One, 11,110 ± 1,202 Hz for the RAZR M, 15,335 ± 94 Hz for the iPhone 5, and 15,564 ± 708 Hz for the iPhone 4. Thus, the Apple smartphones have higher cutoff frequencies than the Android smartphones, which in turn have higher cutoff frequencies than the conventional phones. The higher bandwidth of smartphones is most likely due to their increased capacity for data storage and their use as multimedia devices. The cutoff frequencies for transmission were 3,569 ± 32 Hz for the HTC One, 3,766 ± 146 Hz for the RAZR M, 3,685 ± 278 Hz for the iPhone 5, and 3,674 ± 65 Hz for the iPhone 4, all within 6% of each other with an average value of 3,674 Hz. This average value is 8% higher than the documented value for the cell phone cutoff frequency of 3,400 Hz. This slight discrepancy is most likely due to
  • 10. the arbitrary nature of the definition of the cutoff frequency.
Characterization of cell phone sound quality in these two ways revealed that, for smartphones, the limiting factor in increasing the cutoff frequency and thus improving the sound quality is not the phone itself but the wireless network. For all but a few phones that are calling other HD Voice phones on the correct network, sound transmission is currently limited to the frequency range below 3,400 Hz. However, in the next few years, as more networks adopt HD Voice, it should become commonplace to experience phone calls that transmit frequencies up to 7,000 Hz. This added bandwidth will greatly enhance sound quality, especially the intelligibility of consonants, and make conversations sound fuller and less muffled.
Future work on this subject could explore features of the frequency response function other than the cutoff frequency, e.g. dips and spikes at certain frequencies, to gain a fuller picture of what determines sound quality. For example, intelligibility studies could be done on transmission by the four phones tested to understand how the shape of the frequency response function affects sound quality when all phones tested have the same cutoff frequency. In addition, phones that are set up correctly for HD Voice could be tested to see what the new frequency response function looks like.
ACKNOWLEDGMENTS
The author would like to thank Dr. Braunstein for help in designing, carrying out, and analyzing the experiment, Prof. Leonard for useful discussions and providing the white noise file, and Dr. Hughey for answering questions about the experiment and data analysis. In addition, this experiment could not have been performed without the help of those who volunteered their phones to be tested.
REFERENCES
1 Malcolm J. Crocker, Handbook of Noise and Vibration Control, John Wiley & Sons (2007), available for download (with MIT certificate) at http://app.knovel.com/web/toc.v/cid:kpHNVC0001/viewerType:toc/root_slug:handbook-noise-vibration
2 Youngmoo Edmund Kim, “Singing Voice Analysis/Synthesis,” (2003), available for download at http://dspace.mit.edu/handle/1721.1/62044
3 Eberhard Hänsler, Gerhard Schmidt, Speech and Audio Processing in Adverse Environments (2008), available for download at http://link.springer.com/book/10.1007%2F978-3-540-70602-1
4 Alexandra Chang, “How HD Voice Works to Make Your Calls Sound Drastically Better,” (2013), retrieved October 29, 2013 from http://www.wired.com/gadgetlab/2013/04/how-hd-voice-works-to-make-your-calls-clearer/
5 Chris Forrester, “Trends in cell phone voice processing,” Canadian Acoustics Vol 37, No 3 (2009), available for download at http://jcaa.caa-aca.ca/index.php/jcaa/article/view/2131
6 Randall J. Allemang, Vibrations: Experimental Modal Analysis (2001), available for download at http://www.sdrl.uc.edu/academic-course-info/vibrations-iii-20-263-663
7 Ian Hunter, “Auto- and Cross-Correlation Functions,” 2.671 Lecture Notes, MIT (unpublished)
8 Alan V. Oppenheim and George C. Verghese, Signals, Systems, and Inference, Class Notes for 6.011 (2010), available for download at http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-011-introduction-to-communication-control-and-signal-processing-spring-2010/readings/
9 Ian Hunter, “System Identification Via FFT,” 2.671 Lecture Notes, MIT, Revised 2013 (unpublished)
10 Ian Hunter, Barbara Hughey, “Fourier Analysis of Wav Files,” 2.671 Lecture Notes, MIT, Revised 2013 (unpublished)