Page 1 of 8
Imaging the Human Voice
As a Three Dimensional Surface
Ebe Helm
ABSTRACT
Nature often hides its most interesting qualities and patterns until curiosity or imagination conceives new
ways to discover and visualize them. The Nautilus shell and the Barnsley fern are just two examples of this.
The possibility that human speech, viewed in new ways and in the expanded perspective of a three-dimensional
surface, might also exhibit such patterns was the basis for this investigation. The more practical goal was to
determine whether a new approach might yield observable patterns that would expand the science of computer
speech recognition, and do so in a way that makes the topic more approachable to a general audience.
Introduction
This paper is a follow-on effort to an earlier work that focused primarily on examining acoustic waveforms in the
more traditional linear time domain, and attempted to expand on some of the challenges encountered in computer
speech recognition [1]. That work demonstrated a function for separating the waveforms without applying Discrete Fourier
Transforms or Mel Frequency Cepstral Coefficients, and discussed the considerations of Dynamic Time Warping.
This presentation continues from that point by providing additional perspective beyond two
dimensions. The goal of the application was founded on two questions. What does a word actually look like? Is it
possible to associate the meaning of a sound with its appearance [2]? The questions become more complex because
the perceptions are governed by the methods used to render the images. The renderings most commonly
encountered are the linear timeline and the two-dimensional sonograph, both shown below. The initial
hypothesis required a means of transforming coordinate data from ordered triples into ordered pairs
such that they could be displayed and rotated on a virtual plane. While this was accomplished, it was a remarkable
experience to observe that when the acoustic waveforms were processed and displayed, they exhibited an
inherently three-dimensional quality all their own. It was not necessary to force the perspective of three dimensions.
It appears that the quality was already there.
The following five sections demonstrate techniques for processing complex audio waveforms: 1)
separating the waveform into individual lines of constituent frequencies, including the fundamental (sometimes
known as the glottal) frequency; 2) generating waveform profiles that may be viewed and rotated as a three-dimensional
surface using coordinate transformation; 3) examining waveforms across long and short intervals
of the time domain; 4) describing and illustrating the frequency profile across a variable time domain
and how this profile might yield patterns that are unique to, and recognizable as, individual phonetic structures; 5) finally,
reducing the background noise floor and separating individual words and consonants.
[Figure: linear waveform (left), two-dimensional spectrogram (right)]
The more traditional renderings: a linear waveform (left) and a two-dimensional spectrogram (right), also
known as a sonogram, sonograph, voice print, or spectrograph.
I. Waveform Filtering
Fast Fourier Transforms (FFT), Discrete Fourier Transforms (DFT), and Mel Frequency Cepstral Coefficients (MFCC)
have traditionally been the chosen means for separating complex waveforms into their constituent frequencies [3]
[4] [5] [6] [7]. One of the first objectives in this effort was to find an alternative that would accomplish essentially
the same thing, perhaps by smoothing the higher frequencies away from the lower frequencies and revealing detail
that is otherwise occluded. The desired result was observed by taking the average of f(x) and the two points f(x-n)
and f(x+n) on either side. On the first iteration of this function the waveform immediately displayed the higher
frequencies and lower energies of the consonants |t| and |th| as shown in figure 1a below. These energies are
normally all but indiscernible when viewed as a composite two dimensional waveform. As the number of iterations
of this function increased, the waveforms smoothed out to reveal the lower frequency and higher energies until the
fundamental frequency itself came into relief (figure 1b). With each following iteration, the value of n is increased
outward on either side of f(x) in what might be referred to as an expanding average. The results, however, are not
immediately applied to the line f(x). They are kept in a buffer so as not to affect the following iterations of f(x). In
this way the algorithm might be described as semi-recursive. Only after all values in the line f(x) have been calculated
are they moved to become g(x). This is not necessarily implicit in the equation below, but it does have a significant effect
on the resultant data. Initially, a significant amount of noise was observed with each line iteration. A variety of
techniques and functions were explored to filter and remove this noise from the signal with varying degrees of
success. Ultimately it was found that by simply subtracting the previous line from the current line, the noise was
effectively removed. This was increasingly evident with regard to the lower frequency noise. As a side effect, the
overall waveform demonstrated a more accurately defined shape and envelope.
Waveform Filtering → $g_i(x) = \dfrac{f(x-n) + f(x) + f(x+n)}{3}$ → $h \circ g_i(x) = g_i(x) - g_{i-1}(x)$
The depth and distribution of the waveform on a two-dimensional plane, from highest to lowest frequencies, is also
affected by the rate at which the two offsets -n and +n from f(x) increase. Treated as a non-linear
function, this rate makes it possible to scale the distribution of the overall envelope of the waveform across the 100
wave-lines of frequencies resolved, as illustrated in section II below. In this case there are 100 lines, where i = 1
to 100 and n = i + i/2. The results of frequency filtering are shown in figures 1a and 1b. With the first iteration, the highest
frequencies come into relief, showing the otherwise less apparent lower energy of |t| in "Testing" and "Two" and
|th| in the word "Three" (figure 1a). The fundamental frequency of 147 Hz is shown at 50 iterations (figure 1b). The
significance of the range in the time domain is illustrated here: the higher frequencies relating to soft palate sounds are
easily seen with no magnification at all, while the glottal sounds and fundamental frequency are more apparent in
a 75 millisecond window.
Figure 1a: n ≈ 1 iteration, 400:1 compression, 2000 milliseconds.
Figure 1b: n ≈ 50 iterations, 15:1 compression, 75 milliseconds.
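The expanding average and line subtraction described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used to produce the figures: the function name `filter_lines` is hypothetical, each pass is assumed to smooth the output of the previous pass (one reading of "semi-recursive"), line 0 is assumed to be the raw signal, the offset rule n = i + i/2 is taken with integer division, and samples beyond either end of the signal are clamped to the nearest edge.

```python
def filter_lines(signal, num_lines=100):
    """Return num_lines filtered lines, h_i = g_i - g_(i-1), as lists of floats."""
    current = [float(v) for v in signal]  # g_0: assumed to be the raw signal
    last = len(current) - 1
    lines = []
    for i in range(1, num_lines + 1):
        n = i + i // 2  # offset rule from the text (integer division is an assumption)
        # Compute the whole pass into a separate buffer so that no averaged
        # value feeds back into the remaining points of the same pass.
        buffer = [
            (current[max(x - n, 0)] + current[x] + current[min(x + n, last)]) / 3.0
            for x in range(len(current))
        ]
        # Subtract the previous line from the current line.
        lines.append([b - c for b, c in zip(buffer, current)])
        # Only now does the buffered pass become the input of the next pass.
        current = buffer
    return lines
```

Under these assumptions, `filter_lines(samples, 100)` returns 100 lists of the same length as `samples`, with the earliest lines dominated by the fastest-changing detail and later lines by progressively slower variation.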
II. Waveform Profile
These complex waveforms must be examined across a broad range of magnification in the time domain, from
seconds to milliseconds. In the longer time period views the signal is better perceived by looking at it as a profile.
This profile can be obtained with an arithmetic mean function as shown below.
Waveform Profile → $\dfrac{1}{n} \sum_{x=m}^{n+m} |f(x)|$, where $n = (\text{samples/sec})/100$ and $m = -n/2$
Averaging all of this data was found to be processor intensive and slowed rendering of the waveforms
considerably. To reduce the rendering time, the function value calculated at a given point is copied into a range of
following elements. The next calculation point is then advanced by that number of elements. This effectively reduces
the number of function iterations. This is not implicit in the 'Waveform Profile' equation above, but is rather managed
in the program code. The effect is a significantly increased rendering speed with acceptable results in imaging the
waveform, as shown in figure 2b below.
figure 2a figure 2b
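The profile calculation and the rendering shortcut described above might look something like the following sketch. The name `waveform_profile` and the `hold` parameter (the number of following elements that receive a copy of each calculated value) are hypothetical. Two further assumptions are made: the window holds n samples starting at offset m = -n/2 from the current point, and near the ends of the signal the mean is taken over the samples actually available rather than over a fixed n.

```python
def waveform_profile(signal, sample_rate, hold=1):
    """Mean absolute amplitude around each point, computed every `hold` samples."""
    n = max(1, sample_rate // 100)  # window length: (samples/sec) / 100
    m = -(n // 2)                   # window starts half a window before the point
    size = len(signal)
    profile = [0.0] * size
    x = 0
    while x < size:
        lo = max(x + m, 0)
        hi = min(x + m + n, size)
        mean = sum(abs(v) for v in signal[lo:hi]) / (hi - lo)
        # Copy the value into the following elements instead of recomputing it.
        for k in range(x, min(x + hold, size)):
            profile[k] = mean
        x += hold  # advance the calculation point by the number of copies made
    return profile
```

With `hold=1` every point is calculated; a larger `hold` divides the number of window sums by roughly that factor, at the cost of a stepped profile.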
One of the most interesting observations was made when all 100 data lines were stacked one atop another and
displayed simultaneously. The resemblance to a two-dimensional sonograph, like the one shown on page 1, was
immediately evident, as in figure 2c. Note: the amplitude of each individual line is scaled to a min/max of 0-100 for
the purposes of calculation and display. This is what makes it possible to bring the higher frequency (lower energy)
components into relief. The soft palate sounds of |t| and |th| are not normally visible in the composite
waveform, but clearly stand out with a topographical quality when processed in this way.
figure 2c figure 2d
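The per-line scaling mentioned in the note above is what lets a quiet line occupy the same display range as a loud one. A minimal sketch follows; the name `scale_line` is hypothetical, and a line with no variation is assumed to map to the bottom of the range.

```python
def scale_line(line, lo=0.0, hi=100.0):
    """Rescale one line so its minimum maps to lo and its maximum to hi."""
    low, high = min(line), max(line)
    if high == low:
        # A constant line has no range to stretch (assumed to sit at lo).
        return [lo for _ in line]
    span = high - low
    return [lo + (v - low) * (hi - lo) / span for v in line]
```

Applied as `[scale_line(line) for line in lines]`, each of the 100 lines is stretched independently, so a low-energy line is displayed with the same vertical extent as the fundamental.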
In extending the Waveform Profile technique to view the entire array of lines, the image began to
display an interesting degree of detail. This is in contrast to the fact that the resolution of the data [in the profile view]
had actually been smoothed away. It was at this point that the [naturally occurring] three-dimensional quality of
these waveforms first resolved. Only on magnification to the millisecond level did the same quality become
dramatically apparent with the raw data.
III. Coordinate Transformations
Initially the thought was that recognizing human speech patterns might be possible by bringing their phonetic
patterns into relief as a three-dimensional surface. These structures, impossible to see in linear plots and barely
discernible in two-dimensional spectrographs, might become apparent if viewed in this way. The first goal
was to find or develop a coordinate transformation of the ordered triples (x, y, z) into ordered pairs (x', y')
such that they could be rotated on a virtual plane. On this surface x would remain the time domain, while z would
extend along the range of the individual frequencies derived, and y would continue to represent the amplitude of
values of the signal f(x). These could then be displayed on a computer screen. A modification of the two-dimensional
trigonometric identity for the addition of angles, and subsequently the addition of a second angle of rotation
provided the desired result for creating a rotational plane. To visualize the concept, hold a cylindrical object, perhaps
a drinking glass. Imagine the rim of this glass existing only on a two-dimensional plane. Viewing the glass in this way,
its rim should first appear as a straight line. Now tilt this object forward about its x-axis and observe the rim
transforming into an ellipse. Finally, as the glass continues to tilt about x, the rim becomes a circle. Notice also that
as the glass is tilted, the rim also translates down along the y-axis. The final modification to this identity allowed
an ordered triple to represent a point in orbit about the origin of a plane laid tangent to the surface
of a sphere.
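The coordinate transformation of this section, x' = x cos θ - z sin θ and y' = sin φ (x sin θ + z cos θ) + y cos φ, can be tried out with a short sketch. The function name `project` and the use of radians are assumptions made for illustration; θ is taken to be the rotation within the plane and φ the tilt of the plane toward the viewer.

```python
import math


def project(x, y, z, theta, phi):
    """Map a point (x, y, z) on the virtual surface to screen coordinates.

    x is time, z is the frequency-line index, y is amplitude;
    theta rotates the plane, phi tilts it (both in radians).
    """
    x_screen = x * math.cos(theta) - z * math.sin(theta)
    y_screen = (math.sin(phi) * (x * math.sin(theta) + z * math.cos(theta))
                + y * math.cos(phi))
    return x_screen, y_screen


# The rim of the drinking glass: a unit circle lying in the x-z plane.
rim = [(math.cos(a), 0.0, math.sin(a)) for a in (0.0, 1.0, 2.0, 3.0)]
edge_on = [project(x, y, z, 0.0, 0.0) for x, y, z in rim]          # phi = 0
tilted = [project(x, y, z, 0.0, math.pi / 2) for x, y, z in rim]   # phi = 90 degrees
```

With φ = 0 every rim point lands on y' = 0, the straight line of the analogy; with φ = π/2 the same points trace a circle, and intermediate tilts give an ellipse.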
Figure 3a
Coordinate transformation → $x' = x\cos\theta - z\sin\theta$ and $y' = \sin\phi\,(x\sin\theta + z\cos\theta) + y\cos\phi$
It may be beneficial to consider an analogy. The antiquated NTSC television transmission system might be described
as one of the most complex analog encoding systems to evolve before the advent of digital transmission. In this
system, specialized oscilloscopes are used for observing the various aspects of the composite signal: the frame
rate at fractions of a second, the line rate in microseconds, and finally the color subcarrier, which is observed and measured
in nanoseconds. It is of course impossible to visualize all aspects of this signal at the same time. They must be
observed in incremental steps, as one would first use the unaided eye, then a magnifying glass, and finally a
microscope. The analogy is relevant here because, like the old TV broadcasts, recognizable components of a spoken
word also appear to exist over a wide range in the time domain. Hence the ability to zoom in and out
from whole seconds to milliseconds becomes even more important.
Of particular importance is that the data is not only shown over a broadly varying range of time, but is also
represented both as a profile, as in figure 3a, and as the raw data, as in figure 3b. The overall structure
of the words is more easily seen in the long time domain if shown as a smoothed profile; however, the raw data itself
requires no smoothing when zoomed in to short time intervals.
Figure 3b
In figure 3a the high frequency energies of |t| and |th| are easily seen at the top of the plot, while the more subtle
phonetic structures and harmonics of consonants and vowels such as |a| and |ah| are examined at a magnification
of twenty-five milliseconds, shown in figure 3b. It is necessary to view the patterns of the spoken word both near and
far to comprehend the relevant features that make a word unique. It may be interesting to note that the two seconds
of recorded speech shown in figure 3a require a compression of 400:1. Were the sample expanded to reveal its full
detail at 1:1 compression, the data would require a computer screen seventy-five feet wide. The contrast is important
here because it further demonstrates the scope over which the unique patterns of a word extend. While the higher
frequency soft palate sounds are easily seen in real time, the subtle differences in vowels require closer examination.
Having successfully rendered waveforms from a virtual 3D plane onto the 2D display, the need for accurately tracking
a cursor across the two planes for measurements and selections becomes evident. In effect, it is necessary to reverse
the coordinate transformation between the two planes. The ordered pair (x, y) represents the mouse pointer
coordinates on the computer display and is transformed to the virtual plane as (x', y'), as shown in the 'Surface
cursor lines' equation below. This allows the mouse pointer to relate position more accurately and track across the
virtual 3D plane. Note: n in y' compensates for a 10:1 ratio between 1000 data points across 100 data lines. In this
case the value of n = 10.
Surface cursor lines → $x' = \left(\dfrac{y}{\sin\phi}\right)\sin\theta + x\cos\theta$ and $y' = \left(\dfrac{y}{\sin\phi / n}\right)\cos\theta - x\sin\theta$
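One way to convince oneself that the cursor equations reverse the projection is a round trip: project a point on the flat surface to the screen, then map the screen position back. The sketch below does this with hypothetical names (`surface_to_screen`, `screen_to_surface`) and assumes angles in radians and a surface point with zero amplitude. The scale factor `n` defaults to 1 here so that the round trip is exact; the text uses n = 10 for its layout of 1000 data points across 100 data lines.

```python
import math


def surface_to_screen(x, z, theta, phi):
    """Forward projection of a point (x, z) on the flat surface (amplitude 0)."""
    x_screen = x * math.cos(theta) - z * math.sin(theta)
    y_screen = math.sin(phi) * (x * math.sin(theta) + z * math.cos(theta))
    return x_screen, y_screen


def screen_to_surface(x, y, theta, phi, n=1.0):
    """Map a mouse position (x, y) on the display back onto the virtual plane."""
    if abs(math.sin(phi)) < 1e-12:
        # Viewed edge-on, every depth collapses onto one screen line.
        raise ValueError("sin(phi) is zero; the cursor position is ambiguous")
    x_plane = (y / math.sin(phi)) * math.sin(theta) + x * math.cos(theta)
    y_plane = (y / (math.sin(phi) / n)) * math.cos(theta) - x * math.sin(theta)
    return x_plane, y_plane


screen = surface_to_screen(2.0, 3.0, 0.3, 0.7)
recovered = screen_to_surface(screen[0], screen[1], 0.3, 0.7)
```

Here `recovered` comes back as approximately (2.0, 3.0). The explicit check on sin φ reflects a limitation of any such inverse: when the plane is viewed edge-on there is no unique point under the cursor.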
While it was discovered that the data had a naturally occurring three dimensional quality all its own, there was still
a distinct advantage to being able to examine the images from a continuously variable perspective. Nuances of form
and shape that would otherwise be unnoticed became more evident.
IV. Spectrum sampling
The symmetry of harmonics in the waveform may be brought into greater contrast by removing the negative-going
values in a way analogous to Nyquist filtering. One of the first observations in doing this was that the positive and
negative sides of the waveforms are not symmetrical. It appears that both positive- and negative-going aspects of the
waveform could be relevant for discriminating patterns for recognition. Likewise, the details and subtleties of the
spectrum of these waveforms across the 100 data lines are brought into greater relief when observed from varying
three-dimensional perspectives, as in figures 4a and 4b below.
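Removing the negative-going values amounts to keeping one polarity of each line and zeroing the other. The following sketch separates the two polarities and gives one possible measure of how unequal they are; the names `split_polarity` and `asymmetry`, and the particular measure, are illustrative assumptions rather than anything specified in the text.

```python
def split_polarity(line):
    """Return (positive part, magnitude of negative part) of one line."""
    positive = [v if v > 0 else 0.0 for v in line]
    negative = [-v if v < 0 else 0.0 for v in line]
    return positive, negative


def asymmetry(line):
    """Signed imbalance between the two polarities, in the range -1 to 1."""
    positive, negative = split_polarity(line)
    total = sum(positive) + sum(negative)
    if total == 0:
        return 0.0
    return (sum(positive) - sum(negative)) / total
```

A value of 0 means the positive and negative sides carry equal area; values toward 1 or -1 indicate a line dominated by one polarity.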
Figure 4a Figure 4b
As can be seen in these illustrations, the fundamental frequency is clearly evident, but more importantly a spectral
profile at any given instant across the time domain is now also shown. Techniques such as 'Dynamic
Time Warping' [5] [8] for fitting and matching waveform patterns are not necessary here, as it does not matter where these
patterns occur in the time domain, only that they occur relative to the fundamental (figures 4c and 4d below).
figure 4c figure 4d
The ability to observe these systems in three dimensions affords a unique visual perspective for the two components
of amplitude and frequency. In addition to this, another perspective is made possible by observing a sampling of
both amplitude and frequency for a chosen time period. As a starting point, the fundamental frequency is a key
reference. The frequency/amplitude profiles are then summed to form a two-dimensional pattern. The visualizations
obtained may suggest the potential for recognizable patterns. The overall shape of these patterns may in fact be
unique and recognizable while at the same time independent of amplitude, frequency and time. An example of this
concept is shown below, where a period is taken about a zero crossing of the fundamental, from -π/4 to π/4 (figures 4e
and 4f).
Figure 4e Figure 4f
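The sampling step described above can be approximated as follows. This is a sketch under several assumptions: the function names are hypothetical, a "zero crossing" is taken to be a rising crossing of the line that carries the fundamental, the period is measured between two successive rising crossings, and the interval from -π/4 to π/4 is taken as one eighth of that period on either side of the crossing. The profile is the sum of absolute amplitudes of each line inside the window.

```python
def rising_zero_crossings(line):
    """Indices where the line passes from negative to non-negative."""
    return [k for k in range(1, len(line)) if line[k - 1] < 0 <= line[k]]


def spectral_profile(lines, start, stop):
    """One summed amplitude per line over the sample window [start, stop)."""
    return [sum(abs(v) for v in line[start:stop]) for line in lines]


def profile_at_crossing(lines, fundamental_index, which=1):
    """Profile across all lines, centred on a crossing of the fundamental line."""
    crossings = rising_zero_crossings(lines[fundamental_index])
    if which < 1 or which >= len(crossings):
        raise ValueError("not enough zero crossings to measure a period")
    period = crossings[which] - crossings[which - 1]
    half = max(1, period // 8)  # pi/4 is one eighth of a full 2*pi period
    centre = crossings[which]
    start = max(centre - half, 0)
    stop = min(centre + half, len(lines[fundamental_index]))
    return spectral_profile(lines, start, stop)
```

Because the window is positioned by the fundamental itself rather than by absolute time, two utterances of the same sound at different moments would be sampled at comparable phases, which is the property the text appeals to.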
V. Noise Floor Suppression
The general purpose of the Wavescope platform was to provide as limitless an environment as possible for exploring
techniques in manipulating waveform data. As an example, noise floor suppression continues to be an important
subject for improving the audio quality of both speech and music. It also relates to the quandary of separating what is
not wanted from what is wanted. Applied to computer speech recognition, removing the noise floor from a signal
also provides a separation of elements that allows for pattern recognition. Separating 'connected speech' has long been
one of the greatest challenges in computer speech recognition. This concept also extends to the goal of separating
and isolating individual consonants and vowels.
Noise Floor Suppression → $h(x) = g \circ f(x)\left(1 - \dfrac{g \circ f(x)_{max} - g \circ f(x)}{g \circ f(x)_{max}}\right)$
The equation above provides signal attenuation inverse to the amplitude of g∘f(x). As the signal level increases, the
amount of attenuation decreases. This eliminates lower level noise while affecting the desired signal
to an increasingly lesser degree as the amplitude increases. The important distinction in this case is that this
attenuation can now be applied to each of the 100 wave-data lines individually. The ability to discretely filter each
line provides for a more exact and discriminating result. The example above was inspired by the more commonly
known μ-law and A-law compression and expansion algorithms used in telecommunications to limit
bandwidth.
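The bracketed term in the suppression equation reduces to the ratio of the sample to the line maximum, so the gain applied to each sample is simply its own relative level. A minimal per-line sketch follows; the name `suppress_noise_floor` is hypothetical, and using the absolute value of each sample (so that negative samples keep their sign) is an assumption, since the equation does not say how signed values are handled.

```python
def suppress_noise_floor(line):
    """Attenuate each sample in proportion to how far it sits below the line peak."""
    if not line:
        return []
    peak = max(abs(v) for v in line)
    if peak == 0:
        return list(line)  # a silent line has nothing to suppress
    # gain = 1 - (peak - |v|) / peak, which equals |v| / peak
    return [v * (1.0 - (peak - abs(v)) / peak) for v in line]
```

A sample at the peak passes unchanged, a sample at half the peak is reduced to a quarter of the peak, and samples near zero are driven almost to nothing. Applying it as `[suppress_noise_floor(line) for line in lines]` treats each of the wave-data lines against its own maximum.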
figure 5a figure 5b
As shown in figures 5a and 5b, the filter is applied to the frequency lines individually, as opposed to applying the function
to the unprocessed composite waveform as a whole. As a result, the effectiveness of the filter appears to increase
as its application becomes more selective. Separation of consonants and vowels begins to come into relief, and the |t|,
|th| and |s| sounds are shown more clearly separated from the lower frequencies.
Conclusion
Computer speech recognition has been attained and mostly perfected. What has not been perfected is general
accessibility to, and understanding of, the subject. To those wishing to explore the science or expand it with new concepts,
it remains a subject of both esoteric obscurity and significantly advanced mathematics. Perhaps a next logical effort
might be to explore pattern matching using the spectrum profiles found with these or similar
techniques. As described earlier, this could prove an effective means of removing the challenges related to Dynamic
Time Warping, as matching these profiles to sample patterns would not be subject to alignment in the time domain.
The zero crossings of the fundamental frequency might be used as a reference for selecting and extracting the
spectral profile of a specific period in time. Finally, the term 'Wavescope' was given to this program as a description
akin to a telescope or microscope. What might be seen is usually entirely unknown until the thing is built and one
looks through it. That was the purpose of the program: to see what has never been seen, perhaps in a way that it
has never been seen before. It is intended and hoped that the techniques and equations presented here will
prove sufficient for anyone wishing to reproduce the results shown.
References
[1] Y. Chow, M. Dunham, O. Kimball, M. Krasner, Kubala, J. G. Makhoul, S. Price, S. Roucos and R. M.
Schwarz, "BYBLOS: The BBN Continuous Speech Recognition System," vol. 12, pp. 89-92, 1987.
[2] E. P. Lewenstein and D. Musello, "His Master's (Digital) Voice," Time, vol. 125, no. 13, pp. 83-84, 1
April 1985.
[3] M. Gales and S. Young, "The Application of Hidden Markov Models in Speech Recognition,"
Foundations and Trends in Signal Processing, vol. 1, no. 3, pp. 195-304, 2007.
[4] S. Levinson, "Continuous speech recognition by means of acoustic/phonetic classification obtained
from a hidden Markov model," in IEEE International Conference on Acoustics, Speech, and
Signal Processing (ICASSP), 1987.
[5] L. Muda, M. Begam and I. Elamvazuthi, "Voice Recognition Algorithms using Mel Frequency Cepstral
Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques," Journal of Computing, vol.
2, no. 3, March 2010.
[6] D. B. Paul, "Speech Recognition Using Hidden Markov Models," The Lincoln Laboratory Journal, vol.
3, no. 1, 1990.
[7] W. Ward, "Hidden Markov Models In Speech Recognition," Carnegie Mellon University, Pittsburgh.
[8] E. J. Keogh and M. J. Pazzani, "Derivative Dynamic Time Warping," in Proceedings of the
2001 SIAM International Conference on Data Mining, 2001.

Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSรฉrgio Sacani
ย 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSรฉrgio Sacani
ย 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
ย 

Recently uploaded (20)

Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
ย 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
ย 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: โ€œEg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: โ€œEg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: โ€œEg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: โ€œEg...
ย 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
ย 
Hire ๐Ÿ’• 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire ๐Ÿ’• 9907093804 Hooghly Call Girls Service Call Girls AgencyHire ๐Ÿ’• 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire ๐Ÿ’• 9907093804 Hooghly Call Girls Service Call Girls Agency
ย 
Stunning โžฅ8448380779โ–ป Call Girls In Panchshil Enclave Delhi NCR
Stunning โžฅ8448380779โ–ป Call Girls In Panchshil Enclave Delhi NCRStunning โžฅ8448380779โ–ป Call Girls In Panchshil Enclave Delhi NCR
Stunning โžฅ8448380779โ–ป Call Girls In Panchshil Enclave Delhi NCR
ย 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
ย 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
ย 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
ย 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
ย 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
ย 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
ย 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
ย 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
ย 
โคJammu Kashmir Call Girls 8617697112 Personal Whatsapp Number ๐Ÿ’ฆโœ….
โคJammu Kashmir Call Girls 8617697112 Personal Whatsapp Number ๐Ÿ’ฆโœ….โคJammu Kashmir Call Girls 8617697112 Personal Whatsapp Number ๐Ÿ’ฆโœ….
โคJammu Kashmir Call Girls 8617697112 Personal Whatsapp Number ๐Ÿ’ฆโœ….
ย 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
ย 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
ย 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
ย 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
ย 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
ย 

Imaging the human voice

Imaging the Human Voice
As a Three Dimensional Surface
Ebe Helm

ABSTRACT
Nature often hides its most interesting qualities and patterns until curiosity or imagination conceives new ways to discover and visualize them. The Nautilus shell and the Barnsley fern are just two examples of this. The possibility that human speech, if viewed in new ways and in the expanded perspective of a three-dimensional surface, might also exhibit such patterns was the basis for this investigation. As a more practical application, the goal was to determine whether a new approach might yield observable patterns that would expand the science of computer speech recognition, and do so in a way that makes the topic more approachable to a general audience.

Introduction
This paper is a follow-on effort to an earlier work that focused primarily on examining acoustic waveforms in the more traditional linear time domain, and attempted to expand on some of the challenges encountered in computer speech recognition [1]. A function for separating the waveforms without the application of Discrete Fourier Transforms or the use of Mel Frequency Cepstral Coefficients, as well as the considerations of Dynamic Time Warping, were demonstrated. This presentation continues from that point by providing additional perspective beyond two dimensions. The goal of the application was founded in two questions. What does a word actually look like? Is it possible to associate the meaning of a sound with its appearance [2]? The questions become more complex because the perceptions are governed by the methods used to render the images. The traditional and more commonly encountered renderings are the linear time-line and the two-dimensional sonograph, as shown below. This initial hypothesis required a means for transforming coordinate data from ordered triples into ordered pairs such that these could be displayed and rotated on a virtual plane.
While this was accomplished, it was a remarkable experience to observe that when the acoustic waveforms were processed and displayed, they exhibited an inherently three-dimensional quality all their own. It was not necessary to force the perspective of three dimensions. It appears that the quality was already there.

The following five sections demonstrate techniques for processing complex audio waveforms, providing:
1) Separation of the waveform into individual lines of constituent frequencies, including the fundamental (sometimes known as the glottal) frequency.
2) Generation of waveform profiles that may be viewed and rotated as a three-dimensional surface using coordinate transformation.
3) Examination of waveforms across long and short intervals of the time domain.
4) A description and illustration comparing the frequency profile across a variable time domain, and how this profile might yield patterns that are unique and recognizable for individual phonetic structures.
5) Finally, a means for reducing the background noise floor and separating individual words and consonants.

[Figure: linear waveform (left) and two-dimensional spectrogram (right)]
The more traditional examples of waveform rendering: a linear waveform (left) and a two-dimensional spectrogram, sometimes known as a sonogram, sonograph, voice print, or spectrograph (right).
I. Waveform Filtering

Fast Fourier Transforms (FFT), Discrete Fourier Transforms (DFT), and Mel Frequency Cepstral Coefficients (MFCC) have traditionally been the chosen means for separating complex waveforms into their constituent frequencies [3] [4] [5] [6] [7]. One of the first objectives in this effort was to find an alternative that would accomplish essentially the same thing, perhaps by smoothing away the higher frequencies from the lower frequencies and revealing detail that is otherwise occluded.

The desired result was observed by taking the average of f(x) and the two points f(x-n) and f(x+n) on either side. On the first iteration of this function the waveform immediately displayed the higher frequencies and lower energies of the consonants |t| and |th|, as shown in figure 1a below. These energies are normally all but indiscernible when viewed as a composite two-dimensional waveform. As the number of iterations of this function increased, the waveforms smoothed out to reveal the lower frequencies and higher energies until the fundamental frequency itself came into relief (figure 1b).

With each following iteration, the value of n is increased outward on either side of f(x) in what might be referred to as an expanding average. The results, however, are not immediately applied to the line f(x). They are kept in a buffer so as not to affect the following iterations of f(x). In this way the algorithm might be described as semi-recursive. Only after all values in the line f(x) have been calculated are they moved to become g(x). This is not necessarily implicit in the equation below, but it does have a significant effect on the resultant data.

Initially, a significant amount of noise was observed with each line iteration. A variety of techniques and functions were explored to filter and remove this noise from the signal, with varying degrees of success.
Ultimately it was found that by simply subtracting the previous line from the current line, the noise was effectively removed. This was increasingly evident with regard to the lower frequency noise. As a side effect, the overall waveform demonstrated a more accurately defined shape and envelope.

Waveform Filtering:
$$g(x)_{\circ i} = \frac{f(x-n) + f(x) + f(x+n)}{3} \;\rightarrow\; h \circ g(x)_{\circ i} = g(x)_{\circ i} - g(x)_{\circ (i-1)}$$

The depth and distribution of the waveform on a two-dimensional plane, from highest to lowest frequencies, is also affected by the rate at which the two offsets -n and +n from f(x) increase. Treated as a non-linear function, this rate makes it possible to adjust and scale the distribution of the overall envelope of the waveform across the 100 wave-lines of frequencies resolved, as illustrated in section II below. In this case, with 100 lines:
$$i = 1 \text{ to } 100 \text{ lines}, \qquad n = i + i/2$$

The results of frequency filtering are shown in figures 1a and 1b. With the first iteration, the highest frequencies come into relief, showing the otherwise less apparent lower energy of |t| in "Testing" and "Two" and |th| in the word "Three" (figure 1a). The fundamental frequency of 147 Hz is shown at 50 iterations (figure 1b). The significance of the range in the time domain is illustrated here. The higher frequencies relating to soft palate sounds are easily seen with no magnification at all, while the glottal sounds and fundamental frequency are more apparent in a 75 millisecond window.

[Figure 1a: n ≈ 1 iterations, 400:1 compression, 2000 milliseconds]
[Figure 1b: n ≈ 50 iterations, 15:1 compression, 75 milliseconds]
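The expanding average and line subtraction described in this section might be sketched as follows. This is a minimal illustration, not the Wavescope code: the function name, the integer form of n = i + i/2, the clamping of indices at the ends of the signal, and the choice to feed each smoothed line into the next iteration are all assumptions made for the example.

```python
def expanding_average_lines(samples, num_lines=100):
    """Return the difference lines h_i = g_i - g_(i-1) for i = 1..num_lines.

    Assumptions (not stated in the paper): each new line is computed from the
    previous smoothed line, n uses integer division, and indices that fall
    outside the signal are clamped to the nearest valid sample.
    """
    current = [float(v) for v in samples]      # plays the role of f(x)
    size = len(current)
    lines = []
    for i in range(1, num_lines + 1):
        n = i + i // 2                         # assumed integer form of n = i + i/2
        # Write into a separate buffer so that values averaged earlier in this
        # pass do not influence the rest of the pass ("semi-recursive").
        buffer = [0.0] * size
        for x in range(size):
            left = current[max(x - n, 0)]
            right = current[min(x + n, size - 1)]
            buffer[x] = (left + current[x] + right) / 3.0
        # Subtract the previous line from the current line.
        lines.append([g - f for g, f in zip(buffer, current)])
        current = buffer                       # the buffer becomes g(x)
    return lines
```

Under these assumptions a constant signal produces lines of zeros, and a single impulse of height 3 produces the first line 0, 0, 1, -2, 1, 0, 0 around the impulse, which is the kind of high-frequency detail the first iteration is described as revealing.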
II. Waveform Profile

These complex waveforms must be examined across a broad range of magnification in the time domain, from seconds to milliseconds. In the longer time period views the signal is better perceived by looking at it as a profile. This profile can be obtained with an arithmetic mean function as shown below.

Waveform Profile:
$$\frac{1}{n}\sum_{x=m}^{n+m} |f(x)|, \qquad n = \frac{\text{samples/sec}}{100}, \qquad m = -\frac{n}{2}$$

The process of averaging all this data was found to be processor intensive and slowed rendering of the waveforms considerably. To reduce the rendering time, the function value calculated at a given point is copied into a range of following elements. The next calculation point is then advanced by that number of elements. This effectively reduces the number of function iterations. This is not implicit in the 'Waveform Profile' equation above, but is instead managed in the program code. The effect is a significantly increased rendering speed with acceptable results in imaging the waveform, as shown in figure 2b.

[Figures 2a and 2b]

One of the most interesting observations was made when all 100 data lines were stacked one atop another and displayed simultaneously. The resemblance to a two-dimensional sonograph, like the one shown on page 1, was immediately evident, as in figure 2c. Note: the amplitude of each individual line is scaled to a min/max of 0-100 for the purposes of calculation and display. This is what makes it possible to bring the higher frequency (lower energy) components into relief. The soft palate sounds of |t| and |th| are not normally visible in the composite waveform, but clearly stand out with a topographical quality when processed in this way.

[Figures 2c and 2d]

In extending the Waveform Profile technique to the entire array of lines, the image began to display an interesting degree of detail.
This was in contrast to the fact that the resolution of the data (in the profile view) had actually been smoothed away. It was at this point that the naturally occurring three-dimensional quality of these waveforms first became apparent. Only on magnification to the millisecond level did the same quality become dramatically apparent in the raw data.
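The Waveform Profile equation in this section, together with the speed-up of copying each computed value into the following elements, could be sketched as below. The function name and the `hold` parameter are hypothetical. Two further assumptions are made: the window runs from offset m to offset n+m around the current sample, and the sum is divided by the number of samples actually inside the window (rather than strictly by n) so that the ends of the signal are handled gracefully.

```python
def waveform_profile(samples, sample_rate, hold=1):
    """Mean of |f(x)| over a window of about 1/100 second around each point.

    hold is a hypothetical name for the number of following elements that
    receive a copy of each computed value; hold=1 computes every point.
    """
    n = max(1, sample_rate // 100)     # n = (samples/sec) / 100
    m = -(n // 2)                      # m = -n/2, assumed integer division
    size = len(samples)
    profile = [0.0] * size
    x = 0
    while x < size:
        lo = max(x + m, 0)
        hi = min(x + m + n, size - 1)
        window = samples[lo:hi + 1]
        # Assumption: divide by the samples actually in the window.
        mean = sum(abs(v) for v in window) / len(window)
        # Copy the value into the following elements, then skip ahead.
        for k in range(x, min(x + hold, size)):
            profile[k] = mean
        x += hold
    return profile
```

With `hold` greater than 1 the profile becomes a staircase, trading resolution for fewer window sums, which mirrors the trade-off the section describes between rendering speed and acceptable image quality.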
III. Coordinate Transformations

Initially the thought was that recognizing human speech patterns might be possible by bringing their phonetic patterns into relief as a three-dimensional surface. These structures, impossible to see in linear plots and barely discernible in two-dimensional spectrographs, might become apparent if viewed in this way.

The first goal was to find or develop a means of coordinate transformation of the ordered triples (x, y, z) into ordered pairs (x', y') such that they could be rotated on a virtual plane. On this surface x would remain the time domain, z would extend along the range of the individual frequencies derived, and y would continue to represent the amplitude of the signal f(x). These could then be displayed on a computer screen. A modification of the two-dimensional trigonometric identity for the addition of angles, followed by the addition of a second angle of rotation, provided the desired result for creating a rotational plane.

To visualize the concept, hold a cylindrical object, perhaps a drinking glass. Imagine the rim of this glass existing only on a two-dimensional plane. Viewed edge-on, its rim first appears as a straight line. Now tilt the glass forward about its x-axis and observe the rim transforming into an ellipse. Finally, as the glass continues to tilt about x, the rim becomes a circle. Notice also that as the glass is tilted, the rim translates down along the y-axis. The final modification to this identity allows an ordered triple to represent a point in orbit about the origin of a plane laid tangent to the surface of a sphere.
[Figure 3a]

Coordinate transformation:
$$x' = x\cos\theta - z\sin\theta \quad\text{and}\quad y' = \sin\phi\,(x\sin\theta + z\cos\theta) + y\cos\phi$$

It may be beneficial to consider an analogy. The antiquated NTSC television transmission system might be described as one of the most complex analog encoding systems to evolve before the advent of digital transmission. In this system, specialized oscilloscopes are used for observing the various aspects of the composite signal: the frame rate at fractions of a second, the line rate in microseconds, and finally the color subcarrier, which is observed and measured in nanoseconds. It is of course impossible to visualize all aspects of this signal at the same time. They must be observed in incremental steps, as one would first use the unaided eye, then a magnifying glass, and finally a microscope. The analogy is relevant here because, like the old TV broadcasts, recognizable components of a spoken word also appear to exist over a wide range in the time domain. Hence the ability to zoom in and out from whole seconds to milliseconds becomes even more important.
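The coordinate transformation equation given in this section translates almost directly into code. The sketch below is illustrative only; the function name and the use of radians are assumptions.

```python
import math


def project_point(x, y, z, theta, phi):
    """Map an ordered triple (x, y, z) to an ordered pair (x', y').

    x is time, y is amplitude, z is the frequency-line axis; theta rotates
    the plane and phi tilts it toward the viewer (both in radians).
    """
    x_screen = x * math.cos(theta) - z * math.sin(theta)
    y_screen = (math.sin(phi) * (x * math.sin(theta) + z * math.cos(theta))
                + y * math.cos(phi))
    return x_screen, y_screen


# The drinking-glass illustration: points on a unit circle in the x-z plane.
rim = [(math.cos(t), 0.0, math.sin(t))
       for t in (k * math.pi / 8 for k in range(16))]
edge_on = [project_point(x, y, z, 0.0, 0.0) for x, y, z in rim]
tilted = [project_point(x, y, z, 0.0, math.pi / 2) for x, y, z in rim]
```

With theta = 0, the rim seen at phi = 0 collapses to a straight line (every y' is zero), and at phi = π/2 it is a full circle, which matches the tilting-glass description above.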
Of particular importance is that the data is not only shown over a broadly varying range of time, but is also represented both as a profile, as in figure 3a, and as the raw data, as in figure 3b. The overall structure of the words is more easily seen in the long time domain if shown as a smoothed profile. The raw data itself, however, requires no smoothing when zoomed in to short time intervals.

[Figure 3b]

In figure 3a the high frequency energies of |t| and |th| are easily seen at the top of the plot, while the more subtle phonetic structures and harmonics of sounds such as |a| and |ah| are examined at magnifications of twenty-five milliseconds, shown in figure 3b. It is necessary to view the patterns of the spoken word both near and far to comprehend the relevant features that make a word unique. It may be interesting to note that the two seconds of recorded speech shown in figure 3a require a compression of 400:1. Were the sample expanded to reveal its full detail at 1:1 compression, the data would require a computer screen seventy-five feet wide. The contrast is important here to further demonstrate the diversity and scope over which the unique patterns of a word extend. While the higher frequency soft palate sounds are easily seen in real time, the subtle differences in vowels require closer examination.

Having successfully rendered waveforms from a virtual 3D plane onto the 2D display, the need for accurately tracking a cursor across the two planes for measurements and selections becomes evident. In effect, it is necessary to reverse the coordinate transformation between the two planes. The ordered pair (x, y) represents the mouse pointer coordinates on the computer display and is transformed to the virtual plane as (x', y'), as shown in the 'Surface cursor lines' equation below. This allows the mouse pointer to relate position more accurately and track across the virtual 3D plane.
Note: n in y' compensates for a 10:1 ratio between 1000 data points across 100 data lines. In this case the value of n = 10.

Surface cursor lines:
$$x' = \left(\frac{y}{\sin\phi}\right)\sin\theta + x\cos\theta \quad\text{and}\quad y' = \left(\frac{y}{\sin\phi / n}\right)\cos\theta - x\sin\theta$$

While it was discovered that the data had a naturally occurring three-dimensional quality all its own, there was still a distinct advantage to being able to examine the images from a continuously variable perspective. Nuances of form and shape that would otherwise go unnoticed became more evident.
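As a check on the 'Surface cursor lines' equation, the sketch below pairs it with the forward transformation from this section and maps a screen position back onto the virtual plane. The function names are hypothetical, and the second returned value is interpreted here as the position along the frequency-line axis, which is an assumption about the author's intent.

```python
import math


def project_point(x, y, z, theta, phi):
    """Forward transformation from this section: (x, y, z) -> (x', y')."""
    x_screen = x * math.cos(theta) - z * math.sin(theta)
    y_screen = (math.sin(phi) * (x * math.sin(theta) + z * math.cos(theta))
                + y * math.cos(phi))
    return x_screen, y_screen


def screen_to_surface(x, y, theta, phi, n=10):
    """Map a screen position (x, y) back onto the virtual plane.

    n compensates for the ratio between data points and data lines
    (10 in the paper). Undefined when the plane is viewed edge-on.
    """
    if abs(math.sin(phi)) < 1e-12:
        raise ValueError("plane is edge-on; the cursor position is undefined")
    x_plane = (y / math.sin(phi)) * math.sin(theta) + x * math.cos(theta)
    line_pos = (y / (math.sin(phi) / n)) * math.cos(theta) - x * math.sin(theta)
    return x_plane, line_pos
```

With n = 1 and a point of zero amplitude on the plane, applying `project_point` and then `screen_to_surface` returns the original x and z, so under those conditions the cursor equation is an exact inverse of the forward transformation. Points with non-zero amplitude cannot be recovered uniquely from a single screen position.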
IV. Spectrum Sampling

The symmetry of harmonics in the waveform may be brought into greater contrast by removing the negative-going values (in effect, half-wave rectification) in a way analogous to Nyquist filtering. One of the first observations in doing this was that the positive and negative sides of the waveforms are not symmetrical. It appears that both the positive- and negative-going aspects of the waveform could be relevant for discriminating patterns for recognition. Likewise, the details and subtleties of the spectrum of these waveforms across the 100 data lines are also brought into greater relief when observed from varying three-dimensional perspectives (figures 4a and 4b).

[Figures 4a and 4b]

As can be seen in these illustrations, the fundamental frequency is clearly evident, but more importantly a spectral profile at any given instant across the time domain is now also shown. Techniques such as 'Dynamic Time Warping' [5] [8] to fit and match waveform patterns are not necessary, as it does not matter where these patterns occur in the time domain, only that they occur relative to the fundamental (figures 4c and 4d).

[Figures 4c and 4d]

The ability to observe these systems in three dimensions affords a unique visual perspective on the two components of amplitude and frequency. In addition, another perspective is made possible by observing a sampling of both amplitude and frequency for a chosen time period. As a starting point, the fundamental frequency is a key reference. The frequency/amplitude profiles are then summed to form a two-dimensional pattern. The visualizations obtained suggest the potential for recognizable patterns. The overall shape of these patterns may in fact be unique and recognizable while at the same time independent of amplitude, frequency and time. An example of this concept is shown in figures 4e and 4f, where a period is taken around a zero crossing of the fundamental, from -π/4 to π/4.

[Figures 4e and 4f]
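One possible reading of the sampling step described in this section is sketched below. The paper does not give an equation for it, so everything here is an assumption: the function names, the use of upward zero crossings of the fundamental line as the reference, the interpretation of -π/4 to π/4 as one eighth of a fundamental period on either side of the crossing, and the normalization by the largest value to remove the dependence on amplitude.

```python
def upward_zero_crossings(line):
    """Indices where the line passes from negative to non-negative."""
    return [k for k in range(1, len(line)) if line[k - 1] < 0.0 <= line[k]]


def spectral_profile(lines, fundamental_index, crossing_number=0):
    """Sum |amplitude| per line in a window around a fundamental zero crossing.

    lines is a list of equal-length frequency lines, and fundamental_index
    selects the line assumed to carry the fundamental frequency.
    """
    crossings = upward_zero_crossings(lines[fundamental_index])
    if len(crossings) < crossing_number + 2:
        raise ValueError("not enough zero crossings in the fundamental line")
    centre = crossings[crossing_number]
    period = crossings[crossing_number + 1] - centre
    half_width = max(1, period // 8)   # pi/4 of a 2*pi cycle is one eighth
    lo = max(centre - half_width, 0)
    hi = min(centre + half_width, len(lines[0]) - 1)
    profile = [sum(abs(v) for v in line[lo:hi + 1]) for line in lines]
    peak = max(profile)
    # Normalizing makes the shape independent of overall amplitude.
    return [p / peak for p in profile] if peak > 0 else profile
```

Because the window is located relative to the fundamental rather than to absolute time, the same profile would be extracted wherever the pattern occurs in the recording, which is the property the section argues makes time alignment unnecessary.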
V. Noise Floor Suppression

The general purpose of the Wavescope platform was to provide as limitless an environment as possible for exploring techniques in manipulating waveform data. As an example, noise floor suppression continues to be an important subject for improving the audio quality of both speech and music. It also relates to the quandary of separating what is not wanted from what is wanted. Applied to computer speech recognition, removing the noise floor from a signal also provides a separation of elements that allows for pattern recognition. Separating 'connected speech' has long been one of the greatest challenges in computer speech recognition. This concept also extends to the goal of separating and isolating individual consonants and vowels.

Noise Floor Suppression:
$$h(x) = g \circ f(x)\left(1 - \left(\frac{g \circ f(x)_{max} - g \circ f(x)}{g \circ f(x)_{max}}\right)\right)$$

The equation above provides signal attenuation inverse to the amplitude of g∘f(x). As the signal level increases, the amount of attenuation decreases. This eliminates lower level noise while affecting the desired signal to a progressively lesser degree as the amplitude increases. The important distinction in this case is that this attenuation can now be applied to each of the 100 wave-data lines individually. The ability to filter each line discretely provides a more exact and discriminating result. The example above was inspired by the more commonly known μ-law and A-law compression and expansion algorithms used in telecommunications to limit bandwidth.

[Figures 5a and 5b]

As shown in figures 5a and 5b, the filter is applied to the frequency lines individually, as opposed to applying the function to the unprocessed composite waveform as a whole.
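The Noise Floor Suppression equation can be sketched per line as follows. The function names are hypothetical, and the sketch assumes each line holds non-negative values (for example a profile, or a line scaled to 0-100 as described in section II), since the paper does not say how negative samples are treated.

```python
def suppress_noise_floor(line):
    """Apply h = g * (1 - (g_max - g) / g_max) to one non-negative line."""
    peak = max(line)
    if peak <= 0:
        return [float(v) for v in line]    # nothing to scale against
    return [v * (1.0 - (peak - v) / peak) for v in line]


def suppress_all_lines(lines):
    """Filter each frequency line against its own maximum."""
    return [suppress_noise_floor(line) for line in lines]
```

For a line containing 0, 25, 50 and 100 the result is 0, 6.25, 25 and 100: the peak passes unchanged while low-level values are attenuated most strongly. Algebraically the expression reduces to the square of the value divided by the line maximum, so each line is attenuated relative to its own peak rather than to the peak of the composite waveform.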
Because the filter is applied to each line individually, its effectiveness appears to increase as its application becomes more selective. Separation of consonants and vowels begins to come into relief, and the |t|, |th| and |s| sounds are shown more clearly separated from the lower frequencies.

Conclusion

Computer speech recognition has been attained and mostly perfected. What has not been perfected is a general accessibility and understanding of the subject. It remains one of both esoteric obscurity and significantly advanced mathematics to those wishing to explore or expand the science with new concepts.

Perhaps a next logical effort might be to explore pattern matching techniques using the spectrum profiles found with these or similar techniques. As described earlier, this could prove an effective means of removing the challenges related to Dynamic Time Warping, as matching these profiles to sample patterns would not be subject to alignment in the time domain. The zero crossings of the fundamental frequency might be used as a reference for selecting and extracting the spectral profile of a specific period in time.

Finally, the term 'Wavescope' was given to this program as a descriptive akin to a telescope or microscope. What might be seen is usually entirely unknown until the thing is built and one looks through it. That was the purpose of the program: to see what has never been seen, perhaps in a way that it has never been seen before. It is intended and hoped that the techniques and equations presented here will prove sufficient for anyone wishing to reproduce the results shown.
References

[1] Y. Chow, M. Dunham, O. Kimball, M. Krasner, Kubala, J. G. Makhoul, S. Price, S. Roucos and R. M. Schwarz, "BYBLOS: The BBN Continuous Speech Recognition System," vol. 12, pp. 89-92, 1987.
[2] E. P. Lewenstein and D. Musello, "His Master's (Digital) Voice," Time, vol. 125, no. 13, pp. 83-84, 1 April 1985.
[3] M. Gales and S. Young, "The Application of Hidden Markov Models in Speech Recognition," Foundations and Trends in Signal Processing, vol. 1, no. 3, pp. 195-304, 2007.
[4] S. Levinson, "Continuous speech recognition by means of acoustic/phonetic classification obtained from a hidden Markov model," in IEEE International Conference on ICASSP, Acoustics, Speech, and Signal Processing, 1987.
[5] L. Muda, M. Begam and I. Elamvazuthi, "Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques," Journal of Computing, vol. 2, no. 3, March 2010.
[6] D. B. Paul, "Speech Recognition Using Hidden Markov Models," The Lincoln Laboratory Journal, vol. 3, no. 1, 1990.
[7] W. Ward, "Hidden Markov Models In Speech Recognition," Carnegie Mellon University, Pittsburgh.
[8] E. J. Keogh and M. J. Pazzani, "Derivative Dynamic Time Warping," in Proceedings of the 2001 SIAM International Conference on Data Mining, 2001.