Queen Mary, University of London 
Master's Project 
Real-Time Vowel Synthesis: 
A Magnetic Resonator Piano Based Project 
Author: 
Vasileios Valavanis 
Supervisor: 
Andrew McPherson 
A thesis submitted in fulfilment of the requirements 
for the degree of Master of Science 
in the 
School of Electronic Engineering and Computer Science 
Queen Mary, University of London 
August 2014
"If a picture paints a thousand words, then a 
naked picture paints a thousand words without 
any vowels" 
Josh Stern
Abstract 
Speech synthesis has been an important field of research since the beginning of the 
digital signal processing era. Vowels make words intelligible and, as Josh Stern so 
eloquently put it, without them words are naked. This project explores the 
development of a real-time vowel synthesis system based on a medium that 
no conventional system uses. Dr McPherson's magnetic resonator piano was used 
to vibrate its strings in such a way that they generate vowels. This 
paper walks the reader through a thorough investigation of the properties of the 
human voice, the spectral analysis of the magnetic resonator piano's structure, and the 
implementation of this vowel synthesis system, which includes a software synthesiser 
developed by the author. Results, potential improvements and expansions are 
discussed. 
Acknowledgements 
I would like to thank my project supervisor, Dr Andrew McPherson, for giving 
me the opportunity to work on one of the most fascinating subjects in the field 
of audio synthesis and computer science, for allowing me to use the magnetic 
resonator piano and also for his consistent support and guidance throughout the 
project and for putting up with my constant emailing and pestering. 
Contents 
List of Figures v 
1 Introduction 1 
1.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 
1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 
1.3 Paper Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 
2 Literature Review 5 
2.1 The Magnetic Resonator Piano (MRP) . . . . . . . . . . . . . . . . 5 
2.1.1 MRP Signal Flow . . . . . . . . . . . . . . . . . . . . . . . . 6 
2.2 The Human Voice . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 
2.2.1 Anatomy of the Human Voice . . . . . . . . . . . . . . . . . 8 
2.2.2 Mechanics of the Human Voice . . . . . . . . . . . . . . . . 11 
2.3 Speech Synthesis Models . . . . . . . . . . . . . . . . . . . . . . . . 17 
3 Implementation 23 
3.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 
3.2 Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 
3.3 Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 
3.4 Spectral Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 
3.5 Plugin Parameters Description . . . . . . . . . . . . . . . . . . . . . 36 
3.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 
4 Conclusion 43 
4.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 
4.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 
4.3 Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 
4.4 Summary and Final Thoughts . . . . . . . . . . . . . . . . . . . . . 48 
Bibliography 49 
List of Figures 
2.1 MRP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 
2.2 Key Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 
2.3 U.R.S. Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 
2.4 Vocal Cords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 
2.5 Glottal Pulses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 
2.6 Glottal Source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 
2.7 Formant filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 
2.8 Source-filter model . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 
2.9 Articulators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 
2.10 Concatenative synthesis . . . . . . . . . . . . . . . . . . . . . . . . 22 
3.1 PRAAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 
3.2 Impulse Train . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 
3.3 Magnitude Response . . . . . . . . . . . . . . . . . . . . . . . . . . 30 
3.4 Vowel A Spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 
3.5 C4 Frequency Response . . . . . . . . . . . . . . . . . . . . . . . . 34 
3.6 G3 Frequency Response . . . . . . . . . . . . . . . . . . . . . . . . 34 
3.7 G3 Average Frequency Response . . . . . . . . . . . . . . . . . . . . 35 
3.8 Plugin Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 
3.9 1st result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 
3.10 2nd Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 
3.11 3rd Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 
3.12 4th Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 
3.13 5th Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 
3.14 6th Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 
Chapter 1 
Introduction 
1.1 History 
The artificial recreation of the human voice has been a subject of study since long before 
the digital age. Papers from the late 1700s exploring the generation of vowels 
and the synthesis of consonants indicate just how early 
scientists took an interest in the subject. [1] Over a century later, in the 1930s, 
Bell Labs created the "vocoder", which stripped speech down into its fundamental 
frequency and harmonics, and within the next decade Homer Dudley developed 
a keyboard-based voice synthesiser. [2] 
With the evolution of electronic and digital signal processing came more advanced 
systems. In 1961 Bell Labs once again created an electronic speech synthesis system 
using an IBM 704 computer, and in the early 1970s the TSI Speech+ was developed 
by Handheld electronics, a breakthrough in portable speech calculators 
for the blind. [2] One of the most brilliant minds of our era, Stephen Hawking, 
also uses a Speech+ series system to communicate, his severe medical 
condition having rendered him unable to speak. 
Nowadays, the majority of these technologies are computer based, but there is still 
significant need for mechanical implementations in the market. [2] 
1.2 Background 
Digital imitation of speech, both as a concept and as an area of study, has the 
potential to drive the advancement of a variety of industries. Some of 
the industries currently experimenting with this technology have seen significant 
growth since its successful development and have pushed its methodology towards 
new boundaries. Medical science, education, music, gaming platforms and many 
other fields have produced a substantial number of speech synthesis techniques. All 
of these techniques rely on an understanding of how the human voice works and on the correct 
use of the tools available. 
The functionality of an electronic or digital audio synthesiser is quite simple. It 
involves the generation of electric or digital signals which represent waveforms, 
and their conversion to audible signals through speakers. A few of the most popular 
sound synthesis methods are additive synthesis, subtractive synthesis and 
wavetable synthesis. [3] All of these methods describe the generation of non-audible 
signals before the stage of transmission; however, what most waveform synthesis 
systems fail to elaborate on is the actual medium of sound reproduction. Modern 
synthesisers are in a way restricted to transmitting their output through speakers. 
This project takes a different angle and explores vowel synthesis transmitted via 
piano strings. 
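To make the waveform-generation stage concrete, the following sketch builds a tone by additive synthesis, summing a handful of weighted sine harmonics. The sample rate, fundamental and harmonic weights are illustrative choices for this sketch, not values from the project itself:

```python
import math

SAMPLE_RATE = 44100  # samples per second (illustrative choice)

def additive_tone(f0, harmonic_amps, duration=1.0):
    """Additive synthesis: sum sine harmonics of f0, each weighted by
    the corresponding entry of harmonic_amps."""
    n_samples = int(SAMPLE_RATE * duration)
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        s = sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t)
                for k, a in enumerate(harmonic_amps))
        samples.append(s)
    return samples

# A 220 Hz tone with three harmonics of decaying amplitude.
tone = additive_tone(220.0, [1.0, 0.5, 0.25])
```

Subtractive synthesis would instead start from a harmonically rich waveform and filter it down, which is closer to how the vocal tract shapes the glottal source discussed later in this thesis.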
The quality of any artificial system depends on how closely it approximates the system it 
is modelling. In the search for a convincing man-made speech processor, the research 
on vowel generation through piano strings described in this paper has resulted in 
a new speech synthesis method. The question raised by the author is whether it 
is possible to develop a vowel synthesis system and transmit its output through 
piano strings so that intelligible vowels are generated. 
1.3 Paper Structure 
The remainder of this paper is structured as follows: 
• Chapter 2 contains a literature review covering the mechanics of the magnetic 
resonator piano developed by Dr Andrew McPherson, the physics of 
the human voice and existing speech synthesis methods. 
• Chapter 3 presents the proposed method and its implementation (including 
the design process) for the real-time vowel synthesis system, together 
with a description of its parameters. 
• Chapter 4 concludes the project with a discussion of the results in the 
context of what is examined in the literature review, an evaluation and 
discussion of future work, and finally a summary of the project along 
with the author's final thoughts.
Chapter 2 
Literature Review 
2.1 The Magnetic Resonator Piano (MRP) 
The magnetic resonator piano can be considered an acoustic instrument 
with electronic prosthetics. As a project in its own right, the MRP takes the traditional 
grand piano and extends its capabilities. Its success rests on the electromagnetic 
actuators installed above each piano string, creating an electromagnetic field strong 
enough to force the strings to vibrate. This vibration allows the indefinite sustain 
of any note whilst giving the performer control over amplitude, frequency and timbre 
in real time. What is remarkable about the MRP is that it is perfectly audible through 
its existing acoustic structure, without any amplification or use of loudspeakers. 
All 88 keys of the piano are usable, and normal performance of the piano 
remains intact despite the installed hardware. [4] 
Figure 2.1: Magnetic Resonator Piano Block Diagram [4] 
2.1.1 MRP Signal Flow 
When in operation, the MRP can be divided into three main processes that constitute 
its workflow. 
1. The first task of the system is to receive an audio signal from a computer, as 
seen in figure 2.1. The most recent version of the MRP, and the one used 
for this project, omits the sensing of string vibration by the pickup as 
well as the feedback mechanism formed by the band-pass filters and PLL. 
The input feeds the audio amplifiers that drive each actuator directly. 
[5] 
2. Triggering the actuators is the next stage of operation. It is essential to 
mention here that the computer uses a Max/MSP patch to guide outgoing 
signals towards the string of the user's choice. The detection of which key 
is active is made by a continuous key-sensing mechanism. A modified Moog 
Piano Bar is used for this specific task. The Piano Bar uses optical and 
interrupt sensors above each piano key to keep track of their motion. 
The actuators are triggered by a slow pressing movement of any key and 
remain active as long as the hammer does not touch the string and the key 
has not returned to its default position. [5] 
3. The last process of this machine is the actuation. The electromagnets above 
each string generate a field in which the piano strings vibrate. Driven by 
the audio amplifiers, the actuators force a vibration that is in phase with the 
actual audio input, a process that results in a better spectral presence of the 
output. [5] 
Figure 2.2: Key sensing [5] 
To sum up and simplify, the magnetic resonator piano receives any audio signal 
from a computer and transmits it as accurately as possible, keeping the string 
vibrations in harmony. Limitations and non-linearities are of course inevitable 
and will be discussed in later chapters. 
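The routing logic described above can be caricatured in a few lines of Python. All names and the threshold value below are hypothetical; the real system runs as a Max/MSP patch driving hardware amplifiers via the Piano Bar's continuous key sensing:

```python
def route_to_actuators(key_positions, audio_frame, threshold=0.1):
    """Send the current audio frame only to actuators whose key is
    depressed past `threshold` (continuous key sensing), scaling the
    signal by how far the key has travelled (hypothetical mapping)."""
    outputs = {}
    for key, position in key_positions.items():
        if position > threshold:  # key active: this string's actuator is energised
            outputs[key] = [s * position for s in audio_frame]
    return outputs

# MIDI-style keys: 60 (C4) pressed to half travel, 55 (G3) at rest.
frame = [0.5, -0.5, 0.25]
out = route_to_actuators({60: 0.5, 55: 0.0}, frame)
```

The point of the sketch is only the gating behaviour: audio reaches a string's actuator while its key is held past the activation threshold, and other strings receive nothing.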
2.2 The Human Voice 
2.2.1 Anatomy of the Human Voice 
Most of us are oblivious to how many parts of our body are required to complete 
tasks like generating voice, as is the case for most of our bodily 
functions. [6] An examination of the anatomy of the upper respiratory system reveals 
the complexity of such a process whilst clarifying notions essential to its 
correct physical modelling. 
Our body is such an efficient machine that it uses the same organs, muscles and 
bones for a vast variety of different functions. This paper will focus on how these 
body parts are related to the production of sound rather than on their general use. 
Figures 2.3 and 2.4 show where these key body parts are located. 
• The pharynx is part of the throat area, located just behind the oral 
and nasal cavities. It is used for producing sound and has the ability to 
split into two muscular tubes. [8] 
• The larynx is an organ that aids breathing and is essential to sound creation 
because of its ability to manipulate pitch and volume. [9]
Figure 2.3: Upper Respiratory System Diagram [7] 
• The trachea sends air into the lungs. It is open almost all the time and is 
located right below the larynx. [9] 
• The epiglottis is an elastic cartilage flap attached to the entrance of the 
larynx. It covers the larynx and works as a valve when we eat or drink. [10] 
• The oesophagus allows food to pass from the pharynx into the stomach. It 
is only open when we swallow or vomit. [8] 
• The vocal cords or folds make phonation possible. They are twin membranes 
of muscle and ligament covered in mucus, located where the 
pharynx splits between the trachea and the oesophagus, stretching horizontally 
across the larynx. They are no bigger than 22 mm and are open during
Figure 2.4: Vocal Cords [11] 
inhalation and closed when we hold our breath; they vibrate when we speak 
or sing. [10] 
• The glottis is the combination of the vocal folds and the space between 
the folds. [12] 
The aforementioned body parts together form the most sophisticated musical instrument 
in existence. To the disappointment of some readers, it is well established 
that the vocal cords are not a set of strings that vibrate like a guitar's or a piano's 
to produce sound. Their function is to allow, block or partially block air travelling 
from the lungs through the trachea. The anatomy of the vocal system unveils key 
activities of certain organs that make voice-processing research more accurate.
2.2.2 Mechanics of the Human Voice 
Viewing the human vocal system as a musical instrument has helped the author 
examine the mechanics of voice production in a technical way. This section 
will look into how the generation of vowels depends on the interplay between key 
body parts. 
The human respiratory system works like a string instrument and a wind instrument simultaneously. [8] 
This complex apparatus can be broken down into three major subsystems that explain 
its function thoroughly. The major active processes responsible for the generation 
of vowels are respiration, phonation and resonation. 
1. Respiration 
The first component of the voice instrument is the lungs. We can consider 
the lungs the source of the kinetic energy responsible for sound 
generation, since air is the medium in which sound propagates. When we 
inhale, air is stored temporarily in the lungs in order to oxygenate 
the blood. To initiate speech, air from the lungs is forced through the trachea 
and the other vocal mechanisms before it exits the body. While speaking, 
breathing becomes faster and inhalation occurs mostly through the mouth. [8] 
One control parameter that this component provides is the volume of the 
produced sound. The force with which we contract our lungs while we speak 
controls the pressure within the lungs. When the pressure in the lungs is 
either higher or lower than atmospheric pressure, air starts flowing. To 
exhale we simply use certain muscles to decrease the lungs' capacity, thus 
increasing air pressure, which results in expiration. The higher the velocity of 
the airflow through the vocal tract, the greater the amplitude of the sound 
we produce. [9] 
2. Phonation 
The next segment of the vocal system concerns the actual generation of 
sound. Phonation is the process of converting air into audible sound 
waves. As air travels through the trachea it inevitably meets the larynx 
and the vocal cords located at its base. The vocal cords work as a gate 
which regulates the airflow from and towards the oral and nasal cavities. 
The ability of this gate to remain partially open while interrupting the air 
coming from the lungs is what makes it function as a vibrator. [8] 
Certain aerodynamic and myoelastic phenomena drive the vibration process 
of the vocal cords. Under the pressure of the pulmonary airflow the 
vocal cords separate, whereas due to a combination of factors, including elasticity, 
laryngeal muscle tension and the Bernoulli effect, the vocal folds close 
rapidly. [13] As the top of the folds is opening, the bottom is in the process 
of closing, and as soon as the top is closed, the pressure build-up below the 
glottis begins to open the bottom. If the process is maintained by a steady 
supply of pressurised air, the vocal cords will continue to open and close 
in a quasi-periodic fashion. Each vibration allows a small amount of air to 
escape, producing an audible sound at the frequency of this movement; this 
process generates voice. [13]
Figure 2.5: Pulses created by the vocal cord vibrations [14] 
Figure 2.6: Glottal source spectrum [15]
The frequency of this vibration sets the fundamental frequency of the glottal 
source and determines the perceived pitch of the voice. [16] The resulting 
waveform of this process is a periodic pulsating signal with high energy 
at the fundamental frequency and its harmonics, and gradually decreasing 
amplitude across the spectrum, as shown in figure 2.6. [6] 
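A crude approximation of such a glottal source can be sketched as a sum of harmonics whose amplitudes decay with frequency. The -12 dB/octave roll-off below is an assumption typical of simple glottal-flow models, not a figure taken from this thesis, and the sample rate and pitch are illustrative:

```python
import math

def glottal_source(f0, sample_rate=44100, duration=0.02,
                   rolloff_db_per_octave=12.0):
    """Quasi-periodic source: harmonics of f0 up to the Nyquist limit,
    each attenuated relative to the fundamental by a fixed dB/octave
    slope (assumed value, for illustration only)."""
    n_samples = int(sample_rate * duration)
    nyquist = sample_rate / 2
    out = [0.0] * n_samples
    k = 1
    while k * f0 < nyquist:
        octaves_up = math.log2(k)  # distance of harmonic k above the fundamental
        amp = 10 ** (-rolloff_db_per_octave * octaves_up / 20)
        for n in range(n_samples):
            out[n] += amp * math.sin(2 * math.pi * k * f0 * n / sample_rate)
        k += 1
    return out

source = glottal_source(110.0)  # roughly a low male speaking pitch
```

The resulting waveform is pulse-like in the time domain while its spectrum shows strong energy at the fundamental and gradually weaker harmonics, matching the qualitative description of figure 2.6.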
Like all string instruments, the pitch of the sound generated by the folds 
depends on their mass, length and tension. 
f_n = nv / (2L),  where v = √(T/μ)    (2.1) 
(T is tension, L is string length and μ is mass per unit length.) 
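Equation 2.1 can be checked numerically. The tension, length and linear density below are illustrative values chosen for the sketch, not measurements from the MRP or from human vocal folds:

```python
import math

def string_mode_frequency(n, tension, length, mu):
    """Frequency of the n-th mode of an ideal string:
    f_n = (n / 2L) * sqrt(T / mu)  (equation 2.1)."""
    wave_speed = math.sqrt(tension / mu)  # v = sqrt(T / mu)
    return n * wave_speed / (2 * length)

# Illustrative values: 0.62 m string, 700 N tension, 5 g/m linear density.
f1 = string_mode_frequency(1, tension=700.0, length=0.62, mu=0.005)
```

As the formula predicts, shortening the string or raising the tension raises the pitch, and the n-th mode sits at n times the fundamental.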
However, being organic and airflow-dependent, the vocal folds distinguish 
themselves in terms of functionality. The length of the vocal cords varies 
between 17-22 mm for an adult male and 12-17 mm for an adult female. These 
numbers mathematically explain why women most commonly have a higher-pitched 
voice than men. The fundamental frequency range of the glottal 
source is commonly between 50-500 Hz for all human beings. [17] 
Controlling the pitch of our voice is one of the most important faculties we 
have for communicating with one another. While the pressure of the 
pulmonary airflow is one method of controlling voice pitch, the primary 
mechanism resides in the larynx. The muscles within the 
larynx give us control over the elasticity and tension of the vocal folds. 
By manipulating these characteristics we effectively adjust the fundamental 
frequency of the glottal source being generated. [16] 
3. Resonation 
In the final stage of the voice generation process, the glottal source is 
shaped into intelligible vowels and consonants making up words. The 
first step of this transformation occurs in the cavity of the pharynx, which 
communicates with the nasal cavity, the oral cavity and the larynx. The pulses of 
air that escape the vocal cords are diffused into all of these cavities, 
which play the role of the resonator of the whole vocal system. The moving 
parts of this resonator give us the ability to shape the waveforms transmitted. 
The lips, the tongue, the velum and essentially all of our facial muscles 
give us dynamic control over the real-time filtering of the glottal source. The 
resonator's task is to attenuate some frequencies of the band-limited pulse 
produced by the vocal folds while amplifying others. Despite the fact that 
without the vocal folds we could have no voice whatsoever, it is this section 
of the whole mechanism that makes our voice versatile, interesting and 
identifiable. Human DNA ensures that each person has 
different dimensions of the aforementioned cavities and muscles. Therefore 
the filtering of the glottal source that occurs in the resonation stage is as 
unique as one's fingerprints. [18] 
At this point it is important to mention that speech is made up of more 
than one type of sound. The previous paragraphs examined the generation 
of voiced sound, but speech contains unvoiced and plosive sounds as well.
Unvoiced sounds result when air passes through certain constrictions in the oral 
cavity, whereas plosive sounds are sudden bursts of air coming from either 
the abrupt movement of the vocal tract or the mouth. [17] 
To describe the resonation process in terms of digital signal processing, 
the notion of formants must be introduced. Formants have more 
than one definition across research areas. The most common definition, 
and the one used by the author, describes formants as the spectral peaks 
of a given sound spectrum. As the resonator amplifies and attenuates 
frequencies of the source, formants occur in its spectral envelope that are 
unique to each individual. [19] 
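In signal-processing terms, each formant can be approximated by a two-pole resonator applied to the source signal. The sketch below cascades two such resonators over an impulse train; the centre frequencies and bandwidths are illustrative values loosely in the range of an /a/-like vowel, not figures taken from this thesis:

```python
import math

def resonator_coeffs(freq, bandwidth, sample_rate=44100):
    """Two-pole resonator: y[n] = x[n] + b1*y[n-1] + b2*y[n-2],
    with pole radius set by the desired bandwidth."""
    r = math.exp(-math.pi * bandwidth / sample_rate)
    b1 = 2 * r * math.cos(2 * math.pi * freq / sample_rate)
    b2 = -r * r
    return b1, b2

def apply_resonator(signal, freq, bandwidth, sample_rate=44100):
    """Filter `signal` through one resonator, emphasising `freq`."""
    b1, b2 = resonator_coeffs(freq, bandwidth, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + b1 * y1 + b2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

# 100 Hz impulse train (a crude glottal stand-in) shaped by two
# formant-like resonances at illustrative frequencies.
pulses = [1.0 if n % 441 == 0 else 0.0 for n in range(4410)]
shaped = pulses
for formant_freq, bw in [(700.0, 110.0), (1200.0, 120.0)]:
    shaped = apply_resonator(shaped, formant_freq, bw)
```

Each resonator boosts energy around its centre frequency and attenuates energy away from it, so the flat harmonic spectrum of the pulse train acquires peaks at the chosen formant positions, which is exactly the shaping that figure 2.7 illustrates.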
Figure 2.7: Graphic representation of formant creation [20]
Two to three formants are enough to represent a person's voice, despite the fact that our 
resonator produces more. [8] As shown at the bottom of figure 2.7, formants look 
like the result of band-pass filtering.

More Related Content

What's hot

What's hot (10)

A proposed taxonomy of software weapons
A proposed taxonomy of software weaponsA proposed taxonomy of software weapons
A proposed taxonomy of software weapons
 
Dissertation
DissertationDissertation
Dissertation
 
Lartc
LartcLartc
Lartc
 
Nw E040 Jp
Nw E040 JpNw E040 Jp
Nw E040 Jp
 
thesis
thesisthesis
thesis
 
071513 How to Prepare for Schedule Quantitative Risk Analysis
071513 How to Prepare for Schedule Quantitative Risk Analysis071513 How to Prepare for Schedule Quantitative Risk Analysis
071513 How to Prepare for Schedule Quantitative Risk Analysis
 
Deep Blue: Examining Cerenkov Radiation Through Non-traditional Media
Deep Blue: Examining Cerenkov Radiation Through Non-traditional MediaDeep Blue: Examining Cerenkov Radiation Through Non-traditional Media
Deep Blue: Examining Cerenkov Radiation Through Non-traditional Media
 
Jazz and blues theory piano 1
Jazz and blues theory piano 1Jazz and blues theory piano 1
Jazz and blues theory piano 1
 
My thesis
My thesisMy thesis
My thesis
 
thesis
thesisthesis
thesis
 

Similar to Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasileios_Valavanis

M1 - Photoconductive Emitters
M1 - Photoconductive EmittersM1 - Photoconductive Emitters
M1 - Photoconductive EmittersThanh-Quy Nguyen
 
MIMO-OFDM communication systems_ channel estimation and wireless.pdf
MIMO-OFDM communication systems_  channel estimation and wireless.pdfMIMO-OFDM communication systems_  channel estimation and wireless.pdf
MIMO-OFDM communication systems_ channel estimation and wireless.pdfSamerSamerM
 
Parallel Interference Cancellation in beyond 3G multi-user and multi-antenna ...
Parallel Interference Cancellation in beyond 3G multi-user and multi-antenna ...Parallel Interference Cancellation in beyond 3G multi-user and multi-antenna ...
Parallel Interference Cancellation in beyond 3G multi-user and multi-antenna ...David Sabater Dinter
 
Thesis yossie
Thesis yossieThesis yossie
Thesis yossiedmolina87
 
Tesi ph d_andrea_barucci_small
Tesi ph d_andrea_barucci_smallTesi ph d_andrea_barucci_small
Tesi ph d_andrea_barucci_smallAndrea Barucci
 
Analysis and Classification of ECG Signal using Neural Network
Analysis and Classification of ECG Signal using Neural NetworkAnalysis and Classification of ECG Signal using Neural Network
Analysis and Classification of ECG Signal using Neural NetworkZHENG YAN LAM
 
Dissertation wonchae kim
Dissertation wonchae kimDissertation wonchae kim
Dissertation wonchae kimSudheer Babu
 
(2013)_Rigaud_-_PhD_Thesis_Models_of_Music_Signal_Informed_by_Physics
(2013)_Rigaud_-_PhD_Thesis_Models_of_Music_Signal_Informed_by_Physics(2013)_Rigaud_-_PhD_Thesis_Models_of_Music_Signal_Informed_by_Physics
(2013)_Rigaud_-_PhD_Thesis_Models_of_Music_Signal_Informed_by_PhysicsFrançois Rigaud
 
Design Project Report - Tyler Ryan
Design Project Report - Tyler RyanDesign Project Report - Tyler Ryan
Design Project Report - Tyler RyanTyler Ryan
 
Thesis Fabian Brull
Thesis Fabian BrullThesis Fabian Brull
Thesis Fabian BrullFabian Brull
 
Dafx (digital audio-effects)
Dafx (digital audio-effects)Dafx (digital audio-effects)
Dafx (digital audio-effects)shervin shokri
 
Automated antlr tree walker
Automated antlr tree walkerAutomated antlr tree walker
Automated antlr tree walkergeeksec80
 

Similar to Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasileios_Valavanis (20)

repport christian el hajj
repport christian el hajjrepport christian el hajj
repport christian el hajj
 
M1 - Photoconductive Emitters
M1 - Photoconductive EmittersM1 - Photoconductive Emitters
M1 - Photoconductive Emitters
 
shifas_thesis
shifas_thesisshifas_thesis
shifas_thesis
 
MIMO-OFDM communication systems_ channel estimation and wireless.pdf
MIMO-OFDM communication systems_  channel estimation and wireless.pdfMIMO-OFDM communication systems_  channel estimation and wireless.pdf
MIMO-OFDM communication systems_ channel estimation and wireless.pdf
 
Parallel Interference Cancellation in beyond 3G multi-user and multi-antenna ...
Parallel Interference Cancellation in beyond 3G multi-user and multi-antenna ...Parallel Interference Cancellation in beyond 3G multi-user and multi-antenna ...
Parallel Interference Cancellation in beyond 3G multi-user and multi-antenna ...
 
Thesis yossie
Thesis yossieThesis yossie
Thesis yossie
 
Tesi ph d_andrea_barucci_small
Tesi ph d_andrea_barucci_smallTesi ph d_andrea_barucci_small
Tesi ph d_andrea_barucci_small
 
Analysis and Classification of ECG Signal using Neural Network
Analysis and Classification of ECG Signal using Neural NetworkAnalysis and Classification of ECG Signal using Neural Network
Analysis and Classification of ECG Signal using Neural Network
 
Dissertation wonchae kim
Dissertation wonchae kimDissertation wonchae kim
Dissertation wonchae kim
 
(2013)_Rigaud_-_PhD_Thesis_Models_of_Music_Signal_Informed_by_Physics
(2013)_Rigaud_-_PhD_Thesis_Models_of_Music_Signal_Informed_by_Physics(2013)_Rigaud_-_PhD_Thesis_Models_of_Music_Signal_Informed_by_Physics
(2013)_Rigaud_-_PhD_Thesis_Models_of_Music_Signal_Informed_by_Physics
 
Design Project Report - Tyler Ryan
Design Project Report - Tyler RyanDesign Project Report - Tyler Ryan
Design Project Report - Tyler Ryan
 
Master Thesis
Master ThesisMaster Thesis
Master Thesis
 
Diplomarbeit
DiplomarbeitDiplomarbeit
Diplomarbeit
 
Thesis Fabian Brull
Thesis Fabian BrullThesis Fabian Brull
Thesis Fabian Brull
 
Dafx (digital audio-effects)
Dafx (digital audio-effects)Dafx (digital audio-effects)
Dafx (digital audio-effects)
 
Automated antlr tree walker
Automated antlr tree walkerAutomated antlr tree walker
Automated antlr tree walker
 
Thesis-DelgerLhamsuren
Thesis-DelgerLhamsurenThesis-DelgerLhamsuren
Thesis-DelgerLhamsuren
 
Dissertation A. Sklavos
Dissertation A. SklavosDissertation A. Sklavos
Dissertation A. Sklavos
 
論文
論文論文
論文
 
spurgeon_thesis_final
spurgeon_thesis_finalspurgeon_thesis_final
spurgeon_thesis_final
 

Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasileios_Valavanis

  • 1. Queen Mary, University of London Master's Project Real-Time Vowel Synthesis: A Magnetic Resonator Piano Based Project Author: Vasileios Valavanis Supervisor: Andrew McPherson A thesis submitted in ful
  • 2. lment of the requirements for the degree of Master of Science in the School of Electronic Engineering and Computer Science Queen Mary, University of London August 2014
  • 3. If a picture paints a thousand words, then a naked picture paints a thousand words without any vowels" Josh Stern
  • 4. Abstract Speech synthesis has been an important
  • 5. eld of research since the beginning of the digital signal processing era. Vowels make words intelligible and as Josh Stern so eloquently quoted, without them words are naked. This project aims to explore the development of a real time vowel synthesis system based on a medium that no conventional systems use. Dr McPherson's magnetic resonator piano was used in order to vibrate its strings in such way so that they generate vowels. This paper walks the reader through the thorough investigation on the properties of the human voice, the spectral analysis the magnetic resonator piano's structure and the implementation of this vowel synthesis system that includes a software synthesiser developed by the author. Results, potential improvements and expansions are discussed. ii
  • 6. Acknowledgements I would like to thank my project supervisor, Dr Andrew McPherson, for giving me the opportunity to work on one of the most fascinating subjects in the
  • 7. eld of audio synthesis and computer science, for allowing me to use the magnetic resonator piano and also for his consistent support and guidance throughout the project and for putting up with my constant emailing and pestering. iii
1.3 Paper Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Literature Review 5
2.1 The Magnetic Resonator Piano (MRP) . . . . . . . . . . . . . . . . 5
2.1.1 MRP Signal Flow . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 The Human Voice . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Anatomy of the Human Voice . . . . . . . . . . . . . . . . . 8
2.2.2 Mechanics of the Human Voice . . . . . . . . . . . . . . . . 11
2.3 Speech Synthesis Models . . . . . . . . . . . . . . . . . . . . . . . 17

3 Implementation 23
3.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2 Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3 Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4 Spectral Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.5 Plugin Parameters Description . . . . . . . . . . . . . . . . . . . . 36
3.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4 Conclusion 43
4.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.3 Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.4 Summary and Final Thoughts . . . . . . . . . . . . . . . . . . . . 48

Bibliography 49

iv

List of Figures

2.1 MRP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Key Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 U.R.S. Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Vocal Cords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5 Glottal Pulses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6 Glottal Source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.7 Formant filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.8 Source-filter model . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.9 Articulators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.10 Concatenative synthesis . . . . . . . . . . . . . . . . . . . . . . . 22
3.1 PRAAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Impulse Train . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Magnitude Response . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4 Vowel A Spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5 C4 Frequency Response . . . . . . . . . . . . . . . . . . . . . . . 34
3.6 G3 Frequency Response . . . . . . . . . . . . . . . . . . . . . . . 34
3.7 G3 Average Frequency Response . . . . . . . . . . . . . . . . . . . 35
3.8 Plugin Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.9 1st Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.10 2nd Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.11 3rd Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.12 4th Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.13 5th Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.14 6th Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

v
Chapter 1

Introduction

1.1 History

The artificial recreation of the human voice has been a subject of study since long before the digital age. Papers from the late 1700s exploring the generation of vowels and the synthesis of consonants indicate just how early scientists took an interest in the subject. [1]

More than a century later, in the 1930s, Bell Labs created the "vocoder", which stripped speech down into its fundamental frequency and harmonics, and within the next decade Homer Dudley developed a keyboard-based voice synthesiser. [2] With the evolution of electronic and digital signal processing came more advanced systems. In 1961 Bell Labs once more created an electronic speech synthesis system, this time using an IBM 704 computer, and in the early 1970s the TSI Speech+ handheld device was developed, a breakthrough in portable speech calculators for the blind. [2] One of the most brilliant minds of our time, Stephen Hawking, also uses a Speech+ series system to communicate, his severe medical condition having rendered him unable to speak. Nowadays the majority of these technologies are computer based, but there is still significant demand for mechanical implementations in the market. [2]

1.2 Background

Digital imitation of speech, both as a concept and as an area of study, has the potential to drive the advancement of a variety of industries. Some of the industries experimenting with this technology have seen significant growth since its successful development and have pushed its methodology towards new boundaries: medical science, education, music, gaming platforms and many other fields have produced a substantial number of speech synthesis techniques. All of these techniques rely on an understanding of how the human voice works and on the correct use of the tools available.

The functionality of an electronic or digital audio synthesiser is quite simple: it generates electric or digital signals representing waveforms and converts them to audible sound through speakers. A few of the most popular sound synthesis methods are additive synthesis, subtractive synthesis and wavetable synthesis. [3] All of these methods, including the ones not mentioned, describe the generation of non-audible signals prior to transmission; what most waveform synthesis systems fail to elaborate on, however, is the actual medium of sound reproduction. Modern synthesisers are in a way restricted to transmitting their output through speakers. This project takes a different angle and explores vowel synthesis transmitted via piano strings.

The quality of any artificial system depends on its approximation to the system it is modelling. In the search for a perfect man-made speech processor, the research on vowel generation through piano strings described in this paper has resulted in a new speech synthesis method. The question raised by the author is whether it is possible to develop a vowel synthesis system and transmit its output through piano strings so that intelligible vowels are generated.

1.3 Paper Structure

The remainder of this paper is structured as follows:

• Chapter 2 contains a literature review covering the mechanics of the magnetic resonator piano developed by Dr Andrew McPherson, the physics of the human voice and existing speech synthesis methods.

• Chapter 3 presents the proposed method and its implementation (including the design process) of the real-time vowel synthesis system, together with a description of its parameters.

• Chapter 4 concludes the project with a discussion of the results in the context of the literature review, an evaluation, a discussion of future work, and finally a summary of the project along with the author's final thoughts.
Chapter 2

Literature Review

2.1 The Magnetic Resonator Piano (MRP)

The magnetic resonator piano can be thought of as an acoustic instrument with electronic prosthetics. As a project in its own right, the MRP takes the traditional grand piano and extends its capabilities. Its success rests on the electromagnetic actuators attached above each piano string, which create an electromagnetic field strong enough to force the strings to vibrate. This vibration allows the indefinite sustain of any note while giving the performer real-time control over amplitude, frequency and timbre. What is remarkable about the MRP is that it is perfectly audible through its existing acoustic structure, without any amplification or use of loudspeakers. All 88 keys of the piano are usable, and normal piano performance remains intact despite the installed hardware. [4]
Figure 2.1: Magnetic Resonator Piano Block Diagram [4]

2.1.1 MRP Signal Flow

In operation, the MRP can be divided into three main processes that make up its workflow.

1. The first task of the system is to receive an audio signal from a computer, as seen in figure 2.1. The most recent version of the MRP, and the one used for this project, omits the sensing of string vibration by the pickup, as well as the feedback mechanism formed by the band-pass filters and PLL; the input feeds the audio amplifiers that drive each actuator directly. [5]

2. Triggering the actuators is the next stage of operation. It is essential to mention here that the computer uses a Max/MSP patch to route outgoing signals towards the string of the user's choice. Detecting which key is active is the job of a continuous key-sensing mechanism, performed by a modified Moog Piano Bar. The Piano Bar uses optical interruption sensors above each piano key to track its motion. The actuators are triggered by a slow pressing movement of any key and remain active as long as the hammer does not touch the string and the key has not returned to its resting position. [5]

3. The last process of the machine is the actuation itself. The electromagnets above each string generate a field in which the piano strings vibrate. Driven by the audio amplifiers, the actuators force a vibration that is in phase with the audio input, which results in a better spectral presence of the output. [5]

Figure 2.2: Key sensing [5]

To sum up and simplify: the magnetic resonator piano receives an audio signal from a computer and transmits it as accurately as possible, keeping the string vibrations in harmony. Limitations and non-linearities are of course inevitable and will be discussed in later chapters.

2.2 The Human Voice

2.2.1 Anatomy of the Human Voice

Most of us are oblivious to how many parts of our body are involved in tasks such as generating voice, as is the case for most of our bodily functions. [6] An examination of the anatomy of the upper respiratory system reveals the complexity of the process while clarifying notions essential to its correct physical modelling. Our body is such an efficient machine that it uses the same organs, muscles and bones for a vast variety of different functions; this paper will focus on how these body parts relate to the production of sound rather than on their general use. Figures 2.3 and 2.4 show where these key body parts are located.

• The pharynx is part of the throat located just behind the oral and nasal cavities. It is used for producing sound and has the ability to split into two muscular tubes. [8]

• The larynx is an organ that assists breathing and is essential to sound creation because of its ability to manipulate pitch and volume. [9]
Figure 2.3: Upper Respiratory System Diagram [7]

• The trachea carries air into the lungs. It is open almost all the time and is located directly below the larynx. [9]

• The epiglottis is an elastic cartilage flap attached to the entrance of the larynx. It covers the larynx and works as a valve when we eat or drink. [10]

• The oesophagus allows food to pass from the pharynx into the stomach. It is only open when we swallow or vomit. [8]

• The vocal cords, or folds, make phonation possible. They are twin membranes of muscle and ligament covered in mucus, located where the pharynx splits between the trachea and the oesophagus, stretching horizontally across the larynx. They are no bigger than 22 mm and are open during inhalation, closed when we hold our breath, and vibrate when we speak or sing. [10]

Figure 2.4: Vocal Cords [11]

• The glottis is the combination of the vocal folds and the space between them. [12]

Together, these body parts form the most sophisticated musical instrument in existence. To the disappointment of some readers, the vocal cords are not a set of strings vibrating like those of a guitar or a piano; their function is to allow, block or partially block air travelling from the lungs through the trachea. The anatomy of the vocal system thus reveals key activities of certain organs that make voice-processing research more accurate.
2.2.2 Mechanics of the Human Voice

Viewing the human vocal system as a musical instrument has helped the author examine the mechanics behind voice production in a technical way. This section looks at how the generation of vowels depends on the interplay between key body parts. The human respiratory system works like a string instrument and a wind instrument simultaneously. [8] This complex apparatus breaks down into three major sub-systems that explain its function thoroughly. The major active processes responsible for the generation of vowels are respiration, phonation and resonation.

1. Respiration

The first component of the voice instrument is the lungs. We can consider the lungs the source of the kinetic energy responsible for sound generation, since air is the medium in which sound propagates. When we inhale, air is stored temporarily in the lungs in order to oxygenate the blood. To initiate speech, air from the lungs is forced through the trachea and the other vocal mechanisms before it exits the body. While speaking, breathing becomes faster and inhalation occurs mostly through the mouth. [8]

One control parameter this component provides is the volume of the produced sound. The force with which we contract our lungs while speaking controls the pressure within them; when lung pressure is either higher or lower than atmospheric pressure, air starts flowing. To exhale, we simply use certain muscles to decrease the lungs' capacity and thus increase air pressure, which results in expiration. The higher the velocity of the airflow through the vocal tract, the greater the amplitude of the sound we produce. [9]

2. Phonation

The next segment of the vocal system concerns the actual generation of sound. Phonation is the conversion of airflow into audible sound waves. As air travels through the trachea it inevitably meets the larynx and the vocal cords located at its base. The vocal cords work as a gate regulating the airflow to and from the oral and nasal cavities; the ability of this gate to remain partially open while interrupting the air coming from the lungs is what makes it function as a vibrator. [8]

Certain aerodynamic and myoelastic phenomena drive the vibration of the vocal cords. Under the pressure of the pulmonary airflow the vocal cords separate, whereas a combination of factors, including elasticity, laryngeal muscle tension and the Bernoulli effect, closes them again rapidly. [13] As the top of the folds is opening, the bottom is in the process of closing, and as soon as the top is closed, the pressure build-up below the glottis begins to open the bottom. If the process is maintained by a steady supply of pressurised air, the vocal cords continue to open and close in a quasi-periodic fashion. Each vibration allows a small amount of air to escape, producing an audible sound at the frequency of this movement; this process generates voice. [13]
Figure 2.5: Pulses created by the vocal cord vibrations [14]

Figure 2.6: Glottal source spectrum [15]
The frequency of this vibration sets the fundamental frequency of the glottal source and determines the perceived pitch of the voice. [16] The resulting waveform is a periodic pulsating signal with high energy at the fundamental frequency and its harmonics, and gradually decreasing amplitude across the spectrum, as shown in figure 2.6. [6]

Like all string instruments, the pitch of the sound generated by the folds depends on their mass, length and tension:

$$ f_n = \frac{n v}{2L}, \qquad v = \sqrt{\frac{T}{\mu}} \qquad (2.1) $$

($T$ is tension, $L$ is string length and $\mu$ is mass per unit length.)

However, being organic and airflow-dependent, the vocal folds distinguish themselves in terms of functionality. The length of the vocal cords varies between 17 and 22 mm for an adult male and between 12 and 17 mm for an adult female; these numbers mathematically explain why women most commonly have higher-pitched voices than men. The fundamental frequency range of the glottal source is commonly between 50 and 500 Hz for all human beings. [17]

Controlling the pitch of our voice is one of the most important abilities we have for communicating with one another. While the pressure of the pulmonary airflow is one method of controlling voice pitch, the primary mechanism resides in the larynx: the muscles within the larynx give us control over the elasticity and the tension of the vocal folds.
By manipulating these characteristics we effectively adjust the fundamental frequency of the glottal source being generated. [16]

3. Resonation

In the final stage of the voice generation process, the glottal source is shaped into the intelligible vowels and consonants that make up words. The first step of this transformation occurs in the cavity of the pharynx, which communicates with the nasal cavity, the oral cavity and the larynx. The pulses of air that escape the vocal cords are diffused into all of these cavities, which together play the role of the resonator of the vocal system. The moving parts of this resonator give us the ability to shape the transmitted waveforms: the lips, the tongue, the velum and essentially all of our facial muscles give us dynamic control over the real-time filtering of the glottal source. The resonator's task is to attenuate some frequencies of the band-limited pulse produced by the vocal folds while amplifying others. Although without the vocal folds we could have no voice whatsoever, it is this part of the mechanism that makes our voice versatile, interesting and identifiable. Human DNA ensures that each person has different dimensions of the aforementioned cavities and muscles; the filtering of the glottal source that occurs in the resonation stage is therefore as unique as one's fingerprints. [18]

At this point it is important to mention that speech is made up of more than one type of sound. The previous section examined the generation of voiced sound, but speech contains unvoiced and plosive sounds as well.
Unvoiced sounds result when air passes through certain blockages in the oral cavity, whereas plosive sounds are sudden bursts of air coming from either an abrupt movement of the vocal tract or of the mouth. [17]

To describe the resonation process in terms of digital signal processing, the notion of formants must be introduced. Formants have more than one definition across research areas; the most common one, and the one used by the author, describes formants as the spectral peaks of a given sound spectrum. As the resonator amplifies and attenuates frequencies of the source, formants occur in its spectral envelope that are unique to each individual. [19]

Figure 2.7: Graphic representation of formant creation [20]

Two to three formants are enough to represent a person's voice, despite the fact that our resonator produces more. [8] As shown at the bottom of figure 2.7, formants look like the result of band-pass filters applied to the source. This parallel is not far from the truth, as formants are characterised by the same features as band-pass filters: centre frequency, gain and bandwidth.

2.3 Speech Synthesis Models

Traditionally, when we talk about speech synthesisers we refer to TTS (text-to-speech) systems, thanks to their intuitive input method and vast popularity. Such systems can be broken down into multiple components, one of the most important being the waveform generator. Based on the model used for sound generation, speech synthesis systems can be classified into three types: formant synthesis, articulatory synthesis and concatenative synthesis. They can also be divided into two sub-categories according to the extent of human intervention in the creation and execution process: synthesis by rule uses a collection of supervised rules to perform synthesis, while data-driven synthesis derives its parameters from actual speech data. [21]

1. Formant Synthesis

Formant synthesis uses a source-
filter model to generate intelligible sounds. The source-filter model can be characterised as a simplified version of the real-life voice generation process: the generation of a quasi-periodic pulsating signal (the glottal source) and its filtering by multiple variable band-pass filters with the appropriate formant parameters, so that intelligible vowels are produced. [21]

Figure 2.8: Source-filter model [20]

Modelling multiple formant resonances in the digital domain requires the implementation of 2nd-order IIR (infinite impulse response) filters. Equation 2.2 shows the transfer function of such a filter; [21] the derivation of a filter's frequency response from its transfer function will be examined in chapter 3.

$$ H_i(z) = \frac{1}{1 - 2 e^{-\pi b_i} \cos(2\pi f_i)\, z^{-1} + e^{-2\pi b_i}\, z^{-2}} \qquad (2.2) $$

(2nd-order IIR all-pole filter transfer function with $f_i = F_i/F_s$ and $b_i = B_i/F_s$, where $F_i$, $B_i$ and $F_s$ are the formant's centre frequency, bandwidth and sampling frequency, respectively.)

The choice of IIR
filters is not random. IIR filters, compared with FIR (finite impulse response) filters, are a lot more computationally efficient: the small number of filter coefficients ($a_i$, $b_i$) they require makes them faster and less memory-consuming. On the other hand, FIR filters are always stable, whereas IIR filters can have poles outside the unit circle, which renders them unstable. [21]

There are two ways to combine a number of IIR
filters together: in a cascaded array, or in parallel. The parallel method is considerably more complex and is mainly used for the production of fricative sounds; the cascaded method is ideal for vowel sounds and much easier to implement. One other very important difference between the two is that the cascaded technique results in an all-pole filter, whereas the parallel method results in a filter that has zeros in addition to poles. Poles and zeros disclose the frequency response characteristics of a filter and are often used as the basis for digital filter design. [21]

$$ H_1(z) = \sum_{k=0}^{M} b_k z^{-k} \qquad (2.3) $$

$$ H_2(z) = \frac{1}{1 - \sum_{k=1}^{N} a_k z^{-k}} \qquad (2.4) $$

(2.3 is the transfer function of an all-zero filter and 2.4 of an all-pole filter.)

In reality, voice signals are not stationary. Formant synthesis by rule takes into account the physical limitations of the vocal tract so that the change between formants does not occur abruptly, while giving the user the ability to manipulate pitch and formant sweeps in real time; however, it leaves out important reflections and nuances that would make the output sound realistic. [21]

2. Articulatory Synthesis

Articulatory synthesis is a lot closer to formant synthesis in terms of synthesising voice by rule. It models the motion of our articulators and the resulting distribution of waveforms in the lungs, the oral and nasal cavities and the larynx. This model drives a formant synthesiser and uses 5 articulatory parameters: area of lip opening, constriction formed by the tongue blade, opening of the nasal cavities, average glottal area, and rate of active expansion/contraction of the vocal tract tube behind a blockage. [21]

Figure 2.9: A list of the human articulators [22]
The nature of the human speech articulators does not allow them to perform large movements; precisely because they are so restricted, however, they are easier to model. The 5 articulatory parameters are interlinked with the fundamental frequency and the first 4 formant frequencies. Though this model may be the most promising in terms of speech quality, the methods for collecting the aforementioned area parameters are not very advanced, making articulatory synthesis the least accurate of the three speech synthesis models. [21]

3. Concatenative Synthesis

Concatenative synthesis attempts to imitate speech in a way that captures all of the small details and secondary reflections, in order to sound as realistic as possible. The principle behind this model is the concatenation of several speech excerpts from recordings, so that a natural sequence of speech is formed. Unlike synthesis by rule, this data-driven model does not require any manual adjustment, and since the selected segments are real speech, the output is expected to be of high quality. [21]

In reality, the concatenated segments often differ in spectral and prosodic continuity. If the formants of one segment are not exactly the same as those of its neighbour, or the perceived pitch differs from one clip to another, discontinuities occur at the point of concatenation. The speech excerpts may be perfectly normal; it is their sequence that sounds unnatural. Under ideal conditions the concatenative model produces the most natural output, but its design has to address many issues to avoid discontinuities: the more issues solved during the design process, the better the outcome and, naturally, the more complicated the system. In addition, high-quality data-driven synthesis models require large amounts of stored data, are computationally expensive and consume far more memory than rule-based models. [21]

Figure 2.10: A simple concatenative synthesis diagram [23]
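Before moving on to the implementation, the source-filter idea reviewed above can be made concrete. The sketch below implements the all-pole formant resonator of equation 2.2 and cascades several of them, as formant synthesis by rule does; it is a minimal illustration written for this summary, not code from the thesis plugin.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One all-pole formant resonator (equation 2.2):
//   H(z) = 1 / (1 - 2 e^{-pi b} cos(2 pi f) z^-1 + e^{-2 pi b} z^-2)
// with f = Fc/Fs and b = B/Fs.
class FormantResonator
{
public:
    FormantResonator(double centreHz, double bandwidthHz, double sampleRateHz)
    {
        const double kPi = 3.14159265358979323846;
        const double f = centreHz / sampleRateHz;
        const double b = bandwidthHz / sampleRateHz;
        a1 = 2.0 * std::exp(-kPi * b) * std::cos(2.0 * kPi * f);
        a2 = std::exp(-2.0 * kPi * b);
    }

    // Direct-form difference equation: y[n] = x[n] + a1*y[n-1] - a2*y[n-2]
    double process(double x)
    {
        const double y = x + a1 * y1 - a2 * y2;
        y2 = y1;
        y1 = y;
        return y;
    }

private:
    double a1 = 0.0, a2 = 0.0;  // feedback coefficients
    double y1 = 0.0, y2 = 0.0;  // previous two outputs
};

// Cascading one resonator per formant yields the all-pole vowel filter.
double processCascade(std::vector<FormantResonator>& formants, double x)
{
    for (auto& f : formants)
        x = f.process(x);
    return x;
}
```

Because the bandwidth term keeps both poles strictly inside the unit circle, each resonator rings at its centre frequency and decays, which is exactly the behaviour the cascaded vowel filter of chapter 3 relies on.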
Chapter 3

Implementation

The task of the project described in this paper is to build a real-time, computer-based system that generates vowels in such a way that they can be transmitted intelligibly by Dr McPherson's magnetic resonator piano. The author has created a user-friendly digital audio synthesiser, in the form of an AU plugin, that plays the role of the input. Moreover, the spectral behaviour of the piano has been analysed and incorporated into the system so that clearer results are achieved. The scientific process behind this attempt is examined in the remainder of this chapter.

3.1 Methodology

The method proposed for the successful production of this system is the creation of a vowel synthesis AU plugin based on the formant-synthesis-by-rule model described in chapter 2. Two criteria were considered before making this decision: the computational requirements of the plugin, and the ability to modify certain parameters in real time. The formant synthesis model offers low computational cost, full real-time adjustment capabilities and a decent-quality output.

The implementation of the real-time vowel synthesis system consists of two primary phases: programming, and spectral analysis of the piano. The JUCE framework was used for the programming stage, which was carried out in C++ using Xcode 5. For the spectral analysis of the piano, two DPA 2006A microphones and a TASCAM US-122MKII audio interface were used to record the samples, and finally MATLAB was used for the analysis of the audio samples.

3.2 Preparation

The correct programming of the plugin required some pre-calculated data. Specifically, the formant centre frequency, bandwidth and gain values needed for the
filtering of the model were captured using PRAAT. PRAAT is an open-source speech feature extraction program that offers the option to detect voice formants from a single recording. [24] The first 4 formants of each vowel were analysed from recordings of the author's voice. As shown in figure 3.1, the red dots represent the formants of the vowels; the Y axis of the spectrogram is frequency in Hz and the X axis is time in seconds. PRAAT provides comprehensive formant data across time in segments of 5 ms.
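One way to lay out the per-vowel lookup tables that these PRAAT measurements feed is shown below. The formant numbers here are generic textbook-style averages for an adult male voice, inserted purely as placeholders, and the gain and Q figures are likewise invented for illustration; the plugin itself uses the values extracted from the author's recordings.

```cpp
#include <cassert>

// One formant entry: centre frequency (Hz), gain (dB) and Q (bandwidth),
// matching the three values stored per filter in the plugin's tables.
struct Formant { double centreHz; double gainDb; double q; };

// Four formants per vowel, five vowels: [a], [e], [i], [o], [ou].
struct VowelTable { const char* name; Formant f[4]; };

static const VowelTable kVowels[5] = {
    { "a",  { {730, 0, 10}, {1090, -7, 10}, {2440, -9, 10}, {3400, -12, 10} } },
    { "e",  { {530, 0, 10}, {1840, -7, 10}, {2480, -9, 10}, {3500, -12, 10} } },
    { "i",  { {270, 0, 10}, {2290, -7, 10}, {3010, -9, 10}, {3600, -12, 10} } },
    { "o",  { {570, 0, 10}, { 840, -7, 10}, {2410, -9, 10}, {3400, -12, 10} } },
    { "ou", { {300, 0, 10}, { 870, -7, 10}, {2240, -9, 10}, {3400, -12, 10} } },
};
```

Switching vowels in real time is then just a matter of choosing which of the five rows feeds the coefficient calculation described in section 3.3.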
Figure 3.1: PRAAT spectrogram showing the first 4 formants of 3 vowels [24]

3.3 Programming

The most comprehensible way to describe the creation of a source/filter audio plugin is to divide it into two main processes: the generation of the source, and the filtering of it.

1. Source

In chapter 2 we thoroughly investigated the nature of the glottal source and concluded that it is a periodic pulsating signal rich in harmonics, with amplitude gradually decreasing from the fundamental frequency upwards across the spectrum. Its digital representation is a band-limited pulse, which is essentially a series of harmonics, or sinusoids:

$$ x(t) = \sum_{k=1}^{N} \sin(2\pi k f_0 t) \qquad (3.1) $$

with the number of harmonics given by

$$ N = \frac{f_s}{2 f_0} \qquad (3.2) $$

so that aliasing is avoided.

A C++ oscillator class taken from Will Pirkle's book Designing Audio Plug-ins in C++ was used to generate the multiple sine waves. This clever approach uses a 1024-sample buffer to store sine wave values, which are constantly updated. To handle fractional read positions within the buffer, linear interpolation using weighted sums is applied: [25]

$$ y = \frac{x - x_1}{x_2 - x_1}\, y_2 + \left(1 - \frac{x - x_1}{x_2 - x_1}\right) y_1 \qquad (3.3) $$

for any $y$ between data points $(x_1, y_1)$ and $(x_2, y_2)$. This procedure ensures phase coherence across a large number of oscillator instances created simultaneously.

In practice, 34 instances of the oscillator class were created, the first representing the fundamental frequency and the remaining 33 its harmonics, creating a band-limited pulse of steady magnitude across its spectrum, as shown in figure 3.2.

2. Filtering
Figure 3.2: Impulse train in time and frequency domain

The filtering stage was designed so as to extend the traditional methods. The formant filter values obtained for 5 vowels (phonetically [a], [e], [i], [o], [ou]) were inserted into 5 lookup tables; each lookup table contains centre frequency (in Hz), gain (in dB) and Q (bandwidth) values for 4 filters. To adequately describe the operation of the plugin, this section is divided into the two main functions of the code.

• Calculation of coefficients: The aforementioned formant values are used to derive the necessary coefficients for the 20 filters in total. Given:
$$ G = 10^{g/20}, \qquad F = \frac{2\pi f_c}{f_s} $$

for $g \ge 1$:

$$ H = \frac{F}{2Q} $$

and for $g < 1$:

$$ H = \frac{F}{2 g Q} $$

then the filter coefficients $a_i$, $b_i$ are given by:

$$ a_2 = \frac{0.5\,(1 - H)}{1 + H} $$
$$ a_1 = -(0.5 + a_2)\cos(F) $$
$$ b_0 = (G - 1)(0.25 + 0.5\,a_2) + 0.5 $$
$$ b_1 = a_1 $$
$$ b_2 = -(G - 1)(0.25 + 0.5\,a_2) - a_2 $$

($f_c$ is centre frequency, $g$ is gain and $Q$ is the bandwidth of each
filter.)

• Calculation of frequency response: Normally, a source/filter model would use the coefficients to design filters that shape the generated band-limited pulse; this plugin, however, incorporates something beyond just filtering. In a separate function, each filter's frequency response is derived from its transfer function. Starting from the 2nd-order filter transfer function:

$$ H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{a_0 + a_1 z^{-1} + a_2 z^{-2}} $$
we calculate the frequency response by evaluating the transfer function on the unit circle, substituting $z = e^{j\omega}$ (so that $z^{-1} = \cos\omega - j\sin\omega$), and then rationalising with the complex conjugate of the denominator:

$$ H(\omega) = \frac{b_0 + b_1\cos(\omega) - j b_1\sin(\omega) + b_2\cos(2\omega) - j b_2\sin(2\omega)}{a_0 + a_1\cos(\omega) - j a_1\sin(\omega) + a_2\cos(2\omega) - j a_2\sin(2\omega)} $$

$$ H(\omega) = \frac{[b_0 + b_1\cos(\omega) + b_2\cos(2\omega)] - j\,[b_1\sin(\omega) + b_2\sin(2\omega)]}{[a_0 + a_1\cos(\omega) + a_2\cos(2\omega)] - j\,[a_1\sin(\omega) + a_2\sin(2\omega)]} $$

For:

$$ A = b_0 + b_1\cos(\omega) + b_2\cos(2\omega) $$
$$ B = -\,[b_1\sin(\omega) + b_2\sin(2\omega)] $$
$$ C = a_0 + a_1\cos(\omega) + a_2\cos(2\omega) $$
$$ D = -\,[a_1\sin(\omega) + a_2\sin(2\omega)] $$

we get:

$$ H(\omega) = \frac{A + jB}{C + jD} \cdot \frac{C - jD}{C - jD} = \frac{(AC + BD) + j\,(BC - AD)}{C^2 + D^2} $$

and the magnitude response of the filter:

$$ |H(\omega)| = \frac{1}{C^2 + D^2}\sqrt{(AC + BD)^2 + (AD - BC)^2} \qquad (3.4) $$
Note here that $\omega$ is frequency in radians ($0$ to $2\pi$) and can be written as $\omega = 2\pi f_i/f_s$. The magnitude response of a filter encodes gain information for every frequency within its bandwidth.

Figure 3.3: Magnitude response of the cascaded filter for vowel A

For any generated sine wave with frequency $f_i$ that is part of the source waveform, the function above calculates its amplitude according to the 4 formant filters of each vowel. Essentially, the plugin generates 34 pre-filtered signals, which gives a lot more flexibility in terms of transmission. The resulting method is additive synthesis, because the source is constructed rather than carved into shape. [3] Figure 3.4 demonstrates how the frequency response function has shaped the band-limited pulse according to the 4 resonances of the vowel A; it clearly shows how different frequencies are generated with different magnitudes.

Figure 3.4: The vowel A transmitted from the plugin

The 4
  • 129. lters are combined in a cascaded array at the end of the C++ function. As mentioned in chapter 2, the cascaded method is ideal for vowel synthesis and very easy to implement. Expressing the
  • 130. lter transfer function in a factorized form as seen below: H (z) = PM k=0 bkzk 1 PN k=1 akzk = G (1 z1z1) (1 z 1z1) (1 z2z1) (1 zz1) : : : (1 p1z1) (1 p1 z1) (1 p2z1) (1 p2 z1) : : : (3.5) where G is the cascaded
  • 131. lter gain and the poles pi and zeros zi are either complex conjugate pairs or real-valued. By grouping the factorized equation in terms of the complex conjugate and real valued pairs we
  • 132. Chapter 3. Implementation 32 get: H (z) = G (1 z1z1) (1 z 1z1) (1 p1z1) (1 p1 z1) (1 z2z1) (1 z 2z1) (1 p2z1) (1 p2 z1) : : : (3.6) Equation 3.6 represents the cascaded expression of 2nd order IIR
  • 133. lters enclosed in the big brackets which can also be expressed as: H (z) = G YK k=1 Hk (z) (3.7) Finally, a low pass
  • 134. lter with centre frequency at 2.5kHz has been added to the cascaded array so that unwanted harmonics are neglected. 3.4 Spectral Analysis The magnetic resonator piano, like most structures, has some speci
  • 135. c acoustic properties. When sound waves travel through its body their spectrum is inevitably shaped in a similar way the vocal tract shapes the glottal source. In the attempt to transmit the output of the plugin through the MRP, the piano's formants have to be taken under consideration. The accurate reproduction of the vowels generated from the AU plugin requires a medium with a at frequency response. Given that the piano certainly does not have a at frequency response a compensative method was necessary.
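Before turning to the analysis of the piano itself, the synthesis stage described above can be summarised in code: evaluate the cascaded response of equation (3.7) at each partial of the fundamental, and use those values as the partials' amplitudes in an additive render. This is a sketch under stated assumptions, not the plugin's actual code; the names and the 34-partial count mirror the text, everything else is illustrative:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

const double kPi = 3.14159265358979323846;

// One 2nd-order IIR (biquad) section.
struct Biquad { double b0, b1, b2, a0, a1, a2; };

// Magnitude of one biquad at angular frequency w (equation 3.4).
double magnitude(const Biquad& f, double w) {
    double A = f.b0 + f.b1 * std::cos(w) + f.b2 * std::cos(2.0 * w);
    double B = -(f.b1 * std::sin(w) + f.b2 * std::sin(2.0 * w));
    double C = f.a0 + f.a1 * std::cos(w) + f.a2 * std::cos(2.0 * w);
    double D = -(f.a1 * std::sin(w) + f.a2 * std::sin(2.0 * w));
    return std::sqrt((A * C + B * D) * (A * C + B * D)
                   + (B * C - A * D) * (B * C - A * D)) / (C * C + D * D);
}

// Per-partial gains: the cascaded response G * H1 * H2 * ...
// (equation 3.7) evaluated at f0, 2*f0, ..., numPartials*f0.
std::vector<double> partialGains(const std::vector<Biquad>& cascade,
                                 double G, double f0, double fs,
                                 std::size_t numPartials = 34) {
    std::vector<double> gains(numPartials, G);
    for (std::size_t k = 0; k < numPartials; ++k) {
        double w = 2.0 * kPi * f0 * static_cast<double>(k + 1) / fs;
        for (const Biquad& f : cascade)
            gains[k] *= magnitude(f, w);
    }
    return gains;
}

// One output sample of the additively built, pre-filtered source.
double renderSample(const std::vector<double>& gains, double f0, double t) {
    double s = 0.0;
    for (std::size_t k = 0; k < gains.size(); ++k)
        s += gains[k] * std::sin(2.0 * kPi * f0 * static_cast<double>(k + 1) * t);
    return s;
}
```

Because the filtering is baked into the per-partial gains, the render loop itself is a plain sum of sines, which is what makes the method additive rather than subtractive.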
The strings used for this project were specifically chosen so that their resonant frequencies cover the vocal range (50–500 Hz). C2, E2, G2, D3, E3, G3, C4, E4 and G4 were tested and their output was recorded in order to be analysed. Their resonant frequencies are:
• C2 - 65 Hz
• E2 - 82 Hz
• G2 - 97 Hz
• D3 - 146 Hz
• E3 - 164 Hz
• G3 - 197 Hz
• C4 - 260 Hz
• E4 - 327 Hz
• G4 - 389 Hz
The idea is to feed the strings with all the vowels, a version of all the vowels filtered at 1 kHz, and white noise, all at different dB levels; record all the outputs; and derive an average frequency response for each string and, consequently, the whole body of the piano. Knowing how each string responds is very important towards the implementation of an inverse filter which will flatten that response and make vowel transmission more intelligible.
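For reference, the nominal equal-temperament fundamentals of these notes can be computed from their MIDI numbers; the measured resonances listed above deviate slightly from the nominal values, as expected for real piano strings. This helper is an assumption for illustration, not part of the thesis software:

```cpp
#include <cmath>

// Nominal 12-tone equal-temperament frequency for a MIDI note number,
// with A4 (MIDI 69) = 440 Hz. For the nine strings used here:
// C2=36, E2=40, G2=43, D3=50, E3=52, G3=55, C4=60, E4=64, G4=67.
double midiToHz(int midiNote) {
    return 440.0 * std::pow(2.0, (midiNote - 69) / 12.0);
}
// e.g. midiToHz(36) gives about 65.4 Hz (C2) and midiToHz(64) about
// 329.6 Hz (E4): close to, but not exactly, the measured resonances.
```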
Figure 3.5: The frequency response of C4 under all test conditions for vowel A

Figure 3.6: The frequency response of G3 for all vowels
The response of a system $h(n)$, knowing its input $x(n)$ and output $y(n)$, is a simple comparison of $y(n)$ with regard to $x(n)$, and in theory it should be the same no matter what the input is. Figures 3.5 and 3.6 show the response of the string C4 under a variety of test conditions. The responses of each string for all vowels under every test condition were gathered in a significant number of arrays. By averaging these arrays, a clear view of how each string behaves to a number of harmonic inputs is revealed, and by taking the inverse of the average response a quantifiable pattern emerges towards the creation of the inverse filter.

Figure 3.7: Average frequency response and its inverse for string G3

The amplitude differences between the red line and the blue line shown in figure 3.7, at the frequencies corresponding to the peaks of the red line, are the gain and frequency values to be imported to the inverse filter.
3.5 Plugin Parameters Description

The design of the vowel generator AU plugin is such that it offers a variety of adjustable parameters to the user. This section includes a description of its architecture and an examination of its features.

Figure 3.8: A picture of the plugin in operation

Figure 3.8 shows the layout of the plugin. Its parameters, from top to bottom, are:
• A master ON/OFF button enables and disables the output. The button turns red to indicate when it is in OFF mode.
• A volume knob ranging from 0 to 10 dB with a step of 0.1.
• The wave type drop-down list offers 4 choices of waveform types to be generated. The user has the option to create vowels with sine waves, triangle waves, sawtooth waves or rectangular waves.
• The vowel type drop-down list is the parameter with which the user can switch between the 5 different vowels.
• A frequency knob that sets the fundamental frequency of the vowel. This parameter ranges from 50 to 500 Hz with a step of 1.
• 9 buttons, one for each string. These buttons configure the settings of the inverse filter according to the analysis conducted on each string separately. The user may switch between 9 filters depending on which piano key is being used. Each button turns white when active and red when not.
• 34 volume sliders, one for each harmonic and the fundamental frequency f0. These sliders function as a graphic equaliser and range between -50 and +50 dB with a step of 1. When an inverse filter button is pressed, these sliders automatically take the values of the inverse filter of the corresponding string.

3.6 Results

This project set out to create a vowel synthesis system that receives intelligible vowels from an AU plugin and plays them back through the MRP strings while keeping their intelligibility. The results of this attempt are difficult to show on paper. Evaluating whether a vowel is intelligible means one has to listen to it to comprehend it. Being able to identify whether a sound is a vowel, natural or not, is the goal here. The spectral content of the output of the piano in comparison to the output of the plugin is the optimal way to view this project's results. Due to the sheer volume of results, this section will show the three most intelligible vowels transmitted from the piano and the three least intelligible.

Figure 3.9: Spectrum of piano output vs plugin output of G3 playing the vowel O
Figure 3.10: Spectrum of piano output vs plugin output of E4 playing the vowel A

Figure 3.11: Spectrum of piano output vs plugin output of C4 playing the vowel O

Figure 3.12: Spectrum of piano output vs plugin output of G2 playing the vowel A

Figure 3.13: Spectrum of piano output vs plugin output of C2 playing the vowel I

Figure 3.14: Spectrum of piano output vs plugin output of D3 playing the vowel E

The figures of this section show the spectral behaviour of 6 piano strings, represented by the blue lines, superimposed on the spectrum of the plugin output (the piano input), represented by the dashed red lines. Intelligibility of the piano output is shown when the peaks of the blue line follow the pattern created by the peaks of the red line. In other words, when the spectral shape of the blue line is close to the spectral shape of the red line, or on some occasions the same, regardless of the general difference in dB level, it means that the plugin output has retained the magnitude relationship of its spectral peaks through the piano strings. On the occasions where the formants of the two lines do not follow the same pattern, it is clear that the acoustical structure of the piano has distorted the input's frequency content, and thus the inverse filter has not fully compensated for it.

It is clear that figures 3.9, 3.10 and 3.11 are graphs of intelligible vowels, whereas figures 3.12, 3.13 and 3.14 are plots of vowels that are quite difficult to identify. Further discussion on the results will take place in the conclusion.
Chapter 4

Conclusion

This final chapter concludes the examination of the project. A discussion on the chosen method, the implementation and the results is included, as well as a final evaluation. The last two sections of the chapter concern possible improvements in terms of programming and analysis, and a summary of the report.

4.1 Discussion

A thorough investigation was carried out on the science behind the human voice, the most popular speech synthesis systems on the market, and the magnetic resonator piano. In the literature review, this paper examined the mechanics of how voice is produced, deduced measurable parameters of voicing, and explained the major differences between three speech synthesis models.

The formant synthesis model was chosen, which according to the investigation was the appropriate method in terms of best quality for vowel generation and low computational cost. The JUCE framework was used within Xcode 5 on a Macintosh operating system for the creation of an audio plugin. The implementation of this model required the generation of a source waveform and its filtering by a cascaded array of 2nd order IIR filters. Chapter 3 includes a detailed analysis of the source/filter model, providing a description of the filtering stage which takes this method a step further.

After intelligible vowels were produced by the AU plugin, an analysis of how the structure of the MRP affects the frequency content of the input was conducted. The analysis revealed a pattern in the frequency response of the 9 strings of the piano that were used in this project. Nine inverse filters were incorporated into the audio plugin in order to obtain a flat frequency response. Finally, 6 plots were shown representing the most intelligible and the least intelligible vowels generated by this complex system.

4.2 Evaluation

The project's aims were to create a vowel generation system using a digital audio synthesiser and to transmit its output accurately via the strings of the MRP. Considering that the AU plugin produced intelligible vowels that are in no way realistic, an understanding emerges of how the evaluation process will be carried out.

In terms of the spectral shape relationship between the input and the output (a close relationship meaning success, a distant one meaning failure), this project has been a success. Most of the 45 vowels transmitted via the piano strings at 9 different fundamental frequencies had a very close spectral shape relationship with the input.

The vowels generated by the digital audio synthesiser neglect a significant number of characteristics that would make them sound real. Early reflections in the nasal and oral cavities, unvoiced sounds, small pitch variations and many other parameters are not included in the source/filter model. The resulting sound lacks the transients of the natural voice, and its closest approximation to reality is as if we took the steady-state part of a real vowel and put it in an indefinite loop. Although real, it would not sound natural or intelligible. Within a context of successively changing vowels, though, the perception of intelligibility changes dramatically. For example, when the piano plays the vowel A at 65 Hz through C2, it sounds nothing like an A. But when it plays all the vowels in successive order, we start to recognise the different types. Even the vowels whose spectrum plots did not meet the success criteria become intelligible in a context of successive transmission of different vowels. And vice versa: the most intelligible vowels in terms of spectral shape relationship are not intelligible when played out of any context. To conclude, the author's evaluation is that the project is a successful first step towards a usable vowel generation system, but as it stands it is incomplete.

4.3 Improvements

Possible improvements towards the better functioning of this vowel synthesis system concern programming and spectral analysis.

– Algorithm improvements: The AU plugin, although very successful, could incorporate some more parameters to generate a more natural output. Small pitch variations of the fundamental frequency of the source and of the centre frequency of each formant filter could be implemented in order to model the human voice more accurately. White noise could model the unvoiced air escaping the vocal folds, resulting in a harmonically richer output. Finally, a parallel array of 2nd order IIR filters could make consonant generation possible and would model vowel reflections in the mouth and nasal cavities.

– Spectral analysis improvements: The inverse filtering applied in this project is based on a linear relationship between input and output. The MRP is a physical structure, and like all physical structures it is non-linear. One of the main reasons that this project had only partial success in the frequency content relationship between input and output was intermodulation distortion. This non-linear distortion, produced by the acoustic architecture of the piano, resulted in the appearance of some very unpredictable harmonics in the output spectrum. A major improvement for this project would be to analyse the non-linearity of the piano and derive a mathematical model that predicts the distortion and designs its inverse filter more accurately. This might also allow the creation of a function performing vowel transitions across their corresponding spectral targets. Such transitions were attempted during this project, but distortion coming from the piano excited a vast variety of harmonics during the vowel transition; the result was very noisy and contradicted the goals of the project.
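The pitch-variation and noise ideas proposed above could be prototyped roughly as follows. This is a hypothetical sketch, not part of the plugin; the class name, jitter depth and noise mix are all illustrative assumptions:

```cpp
#include <algorithm>
#include <cmath>
#include <random>

const double kPi = 3.14159265358979323846;

// A sine oscillator whose fundamental wanders slightly around f0
// (pitch jitter), with a little white noise mixed in to stand in for
// the unvoiced air escaping the vocal folds.
class JitteredVoiceSource {
public:
    JitteredVoiceSource(double f0, double fs)
        : f0_(f0), fs_(fs), rng_(42),
          jitter_(-0.0001, 0.0001), noise_(-1.0, 1.0) {}

    double nextSample() {
        // Slow random walk of the detune ratio, clamped to +/-1%.
        detune_ += jitter_(rng_);
        detune_ = std::max(-0.01, std::min(0.01, detune_));
        phase_ += 2.0 * kPi * f0_ * (1.0 + detune_) / fs_;
        if (phase_ > 2.0 * kPi) phase_ -= 2.0 * kPi;
        // 95% voiced tone, 5% noise: output stays within [-1, 1].
        return 0.95 * std::sin(phase_) + 0.05 * noise_(rng_);
    }

private:
    double f0_, fs_;
    double phase_ = 0.0, detune_ = 0.0;
    std::mt19937 rng_;
    std::uniform_real_distribution<double> jitter_, noise_;
};
```

The same jittered source could replace each partial's fixed-frequency sine in the additive render, and an analogous wander could be applied to the formant filters' centre frequencies.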
4.4 Summary and Final Thoughts

The goals of the bold attempt described in this paper have been achieved only in part. The research and implementation by the author resulted in a robust vowel synthesis system which successfully transmits the majority of the vowels from a digital audio synthesiser via the MRP strings; however, it is not yet a usable instrument. Improvements have been proposed, and room for further research by other scientists has been left by the author. Overall, it is hoped that this project has been a positive step within the fascinating field of speech processing.
Bibliography

[1] History and development of speech synthesis. 2006. URL http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/chap2.html.
[2] Richard W. Sproat. Multilingual Text-to-Speech Synthesis: The Bell Labs Approach, volume 4. Springer, 1997.
[3] Sam O'Sullivan. Understanding the basics of sound synthesis. The Pro Audio Files, February 2012. URL http://theproaudiofiles.com/sound-synthesis-basics/.
[4] Andrew McPherson. The magnetic resonator piano: Electronic augmentation of an acoustic grand piano. Journal of New Music Research, 39(3):189–202, 2010.
[5] Andrew McPherson and Youngmoo Kim. Augmenting the acoustic piano with electromagnetic string actuation and continuous key position sensing. NIME, 1:217–222, 2010. URL http://www.educ.dab.uts.edu.au/nime/PROCEEDINGS/papers/Paper%20K1-K5/P217_McPherson.pdf.
[6] Glen Lee. Voice synthesis. The Encyclopaedia of Virtual Environments, 1, 1993. URL http://www.hitl.washington.edu/projects/knowledge_base/virtual-worlds/EVE/I.B.2.VoiceSynthesis.html.
[7] Upper respiratory system diagram. URL http://quizlet.com/12905648/module-1-the-respiratory-system-anatomy-and-physiology-flash-cards/.
[8] Johan Sundberg. The acoustics of the singing voice. 1997. URL http://www.zainea.com/voices.htm.
[9] Jackie R. Haynes and Ronald Netsell. The mechanics of speech breathing: a tutorial. Department of Communication Sciences and Disorders, Southwest Missouri State University, 2001.
[10] Deirdre D. Michael. About the voice. Lions Voice Clinic, 2014. URL http://www.lionsvoiceclinic.umn.edu/page2.htm.
[11] Larynx. URL http://learnhumananatomy.com/larynx/.
[12] The Voice Foundation. Voice anatomy physiology. 2014. URL http://voicefoundation.org/health-science/voice-disorders/anatomy-physiology-of-voice-production/.
[13] Janwillem van den Berg. Myoelastic aerodynamic theory of voice production. Journal of Speech, Language, and Hearing Research, September 1958. URL http://jslhr.pubs.asha.org/article.aspx?articleid=1749406.
[14] C. Julian Chen. Physics of human voice: A new theory with application. Research Conference, Columbia University, 1(1):1–19, November 2012. URL http://www.google.co.uk/url?sa=trct=jq=esrc=ssource=webcd=1ved=0CCMQFjAAurl=http.
[15] Glottal source spectrum. URL http://www.ncvs.org/ncvs/tutorials/voiceprod/images/5.5.jpg.
[16] Dinesh K. Chhetri. Neuromuscular control of fundamental frequency and glottal posture at phonation onset. Acoustical Society of America, November 2011. URL http://headandnecksurgery.ucla.edu/workfiles/Academics/Articles/neuromusc_control_chhetri_et_al.pdf.
[17] Joe Wolfe, Maeva Garnier and John Smith. Voice acoustics: an introduction. UNSW, 2, May 2009. URL http://www.phys.unsw.edu.au/jw/voice.html.
[18] Eric Armstrong. Journey of the voice: Anatomy, physiology and the care of the voice. Voice and Speech Source, 2008. URL http://www.yorku.ca/earmstro/journey/resonation.html.
[19] Gunnar Fant. Acoustic Theory of Speech Production. Mouton & Co, 1960.
[20] Source/filter model. URL http://www.phys.unsw.edu.au/jw/speechmodel.html.
[21] Xuedong Huang, Alex Acero and Hsiao-Wuen Hon. Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, first edition, 2001.
[22] Articulators. URL http://educationcing.blogspot.co.uk/2012/08/articulatory-phonetics-vocal-organs.html.
[23] Concatenative synthesis. URL http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/chap9.html.
[24] Praat. URL http://www.fon.hum.uva.nl/praat/.
[25] Will Pirkle. Designing Audio Effect Plug-Ins in C++. Focal Press,