A Micro-Electro-Mechanical System (MEMS) can, in its most general form, be defined as a set of
miniaturized mechanical and electro-mechanical elements made using the techniques of
microfabrication. The critical physical dimensions of MEMS devices can vary from well below
one micron at the lower end to several millimetres. Likewise, the types of MEMS devices can
vary from relatively simple structures with no moving elements to extremely complex
electromechanical systems with multiple moving elements under the control of integrated
microelectronics. The one main criterion of MEMS is that at least some elements have some
sort of mechanical functionality, whether or not these elements can move.
The term used for MEMS varies in different parts of the world. In the United States it is
predominantly called MEMS, while in some other parts of the world it is called Microsystems
Technology. MEMS technology has made it possible to realize advanced micro devices using
processes similar to those of VLSI technology. When MEMS devices are combined with other
technologies, a new generation of innovative devices with outstanding functionality can be
created. MEMS has been identified as one of the most promising technologies for the 21st
century and has the potential to revolutionize both industrial and consumer products by
combining silicon-based microelectronics with micromachining technology. Its techniques and
microsystem-based devices have the potential to dramatically affect all of our lives and the
way we live. If semiconductor micro manufacturing was the first manufacturing revolution,
MEMS is the second.
The functional elements of MEMS are miniaturized structures, sensors, actuators, and
microelectronics. The most notable elements are the microsensors and microactuators.
Microsensors and microactuators are appropriately categorized as “transducers”, which are
defined as devices that convert energy from one form to another. In the case of microsensors,
the device typically converts a measured mechanical signal into an electrical signal.
Microsensors detect changes in the system’s environment by measuring mechanical, thermal,
magnetic, chemical or electromagnetic information or phenomena. The microelectronics process
this information and signal the microactuators to react and create some form of change to the
environment.
Figure 2.1: Different combinations of opto-electro-mechanical systems.
The silicon integrated circuit industry is able to produce devices in volume with very high
yield at low cost. Silicon has driven the semiconductor industry and allowed a steady
reduction in device size for more than three decades. In MEMS, silicon technology is well
established and offers the possibility of integration with microelectronics on a single chip.
While the device electronics are fabricated with IC technology, the micromechanical
components are fabricated by sophisticated manipulation of silicon and other substrates using
micromachining processes.
Manufacturing Process of MEMS Technology
Today, MEMS technology can be used to produce a very wide range of miniature devices. To
fully understand what MEMS are, it is important to know the basics of the MEMS manufacturing
process, the fabrication steps, and the materials involved.
MEMS are generally made from polycrystalline silicon, a common material that is also used to
make integrated circuits. Frequently, polycrystalline silicon is doped with other elements
such as germanium or phosphorus to enhance its properties. Sometimes, copper or aluminium is
plated onto the polycrystalline silicon to allow electrical conduction between different
parts of the MEMS device.
Figure 3.1: Positive and Negative photo resist. 
Photolithography is the basic technique used to define the shape of micromachined structures.
The technique is essentially the same as that used in the microelectronics industry. There
are two types of photoresist, termed positive and negative. Where ultraviolet light strikes a
positive resist it weakens the polymer, so that when the image is developed the resist is
washed away where the light struck it, transferring a positive image of the mask to the
resist layer. The opposite occurs with a negative resist: where ultraviolet light strikes it,
the polymer is strengthened, so when the image is developed the resist that was not exposed
is washed away and a negative image of the mask is transferred to the resist. A chemical
etchant is then used to remove the oxide where it is exposed through the openings in the
resist. Finally, the resist is removed, leaving the patterned oxide.
Figure 3.1 shows a thin film of some material (e.g. silicon dioxide) on a substrate of some
other material (e.g. a silicon wafer). The aim is to remove some of the silicon dioxide
selectively so that it remains only in particular areas of the silicon wafer. First, a mask
is produced, typically a chromium pattern on a glass plate. The wafer is then coated with a
photoresist, a polymer that is sensitive to ultraviolet light. Ultraviolet light is shone
through the mask onto the photoresist, which is then developed, transferring the pattern on
the mask to the photoresist layer.
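The difference between the two resist types can be sketched in a few lines of Python. This is only a one-dimensional toy model: the mask string, the function name `pattern_oxide` and the symbols used are illustrative assumptions, not part of the process described above.

```python
def pattern_oxide(mask, resist_type):
    """Return the oxide layer after exposure, development and etching.

    mask: string where '#' is opaque chromium and '.' is clear glass.
    resist_type: 'positive' or 'negative'.
    Output: 'O' where oxide remains, '_' where it was etched away.
    """
    if resist_type not in ("positive", "negative"):
        raise ValueError("resist_type must be 'positive' or 'negative'")
    oxide = []
    for cell in mask:
        exposed = (cell == ".")            # UV passes only through clear glass
        if resist_type == "positive":
            resist_remains = not exposed   # exposed positive resist washes away
        else:
            resist_remains = exposed       # unexposed negative resist washes away
        # The etchant removes oxide wherever no resist protects it.
        oxide.append("O" if resist_remains else "_")
    return "".join(oxide)


mask = "##..##..##"
print("mask    :", mask)
print("positive:", pattern_oxide(mask, "positive"))
print("negative:", pattern_oxide(mask, "negative"))
```

With a positive resist the remaining oxide copies the opaque areas of the mask; with a negative resist it copies the clear areas.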
2.3. Silicon Micromachining
A number of basic techniques can be used to pattern thin films that have been deposited on a
silicon wafer, and to shape the wafer itself, to form a set of basic microstructures (bulk
silicon micromachining). The techniques for depositing and patterning thin films can be used
to produce quite complex microstructures on the surface of a silicon wafer (surface silicon
micromachining). Electrochemical etching techniques are being investigated to extend the set
of basic silicon micromachining techniques. Silicon bonding techniques can also be used to
extend the structures produced by silicon micromachining into multilayer structures.
There are 3 basic techniques associated with silicon micromachining. They are:
1. Deposition of thin films of materials.
2. Removal of material by wet chemical etching.
3. Removal of material by dry chemical etching.
A number of different techniques facilitate the deposition or formation of very thin films of
different materials on a silicon wafer. These films can then be patterned using
photolithographic techniques and suitable etching techniques. Common materials include
silicon dioxide, polycrystalline silicon and aluminium. A number of other materials can be
deposited as thin films, including noble metals such as gold. Noble metals will contaminate
microelectronic circuitry, causing it to fail, so any silicon wafers with noble metals on
them have to be processed using equipment specially set aside for the purpose. Noble metal
films are often patterned by a method known as “lift off” rather than by wet or dry etching.
Wet etching is a blanket name that covers the removal of material by immersing the wafer in a
liquid bath of chemical etchant. Wet etchants fall into two broad categories: isotropic
etchants and anisotropic etchants.
Isotropic etchants attack the material being etched at the same rate in all directions.
Anisotropic etchants attack the silicon wafer at different rates in different directions, and
so there is more control over the shapes produced. Some etchants attack silicon at different
rates depending on the concentration of impurities in the silicon.
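A rough way to compare the two categories is to look at how far the etch spreads sideways under the mask while it proceeds downwards. The sketch below assumes constant etch rates and uses the common anisotropy measure A = 1 - (lateral rate / vertical rate); the function name and the rate values are hypothetical and chosen only for illustration.

```python
def etch_profile(vertical_rate, lateral_rate, minutes):
    """Return (depth, undercut, anisotropy) for constant etch rates.

    Rates are in micrometres per minute. Anisotropy is 0 for a perfectly
    isotropic etch and approaches 1 for a purely vertical etch.
    """
    depth = vertical_rate * minutes
    undercut = lateral_rate * minutes      # sideways attack under the mask edge
    anisotropy = 1.0 - lateral_rate / vertical_rate
    return depth, undercut, anisotropy


# Hypothetical rates: an isotropic etchant and a strongly anisotropic one.
for name, v_rate, l_rate in (("isotropic", 1.0, 1.0), ("anisotropic", 1.0, 0.05)):
    depth, undercut, a = etch_profile(v_rate, l_rate, minutes=10)
    print(f"{name:12s} depth={depth:.1f} um  undercut={undercut:.2f} um  A={a:.2f}")
```

For the isotropic case the undercut equals the etch depth, which is why fine features are hard to preserve with such etchants.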
Figure 3.2: Isotropic and Anisotropic Etching. 
The most common form of dry etching for micromachining applications is reactive ion etching.
Ions are accelerated towards the material to be etched, and the etching reaction is enhanced
in the direction of travel of the ions. Reactive ion etching is an anisotropic etching
technique. Deep trenches and pits of arbitrary shape and with vertical walls can be etched in
a variety of materials including silicon, oxide, and nitride.
Lift off is a stencilling technique often used to pattern noble metal films. There are a
number of variants; one proceeds as follows. A thin film of an assisting material (e.g.
oxide) is deposited. A layer of resist is put over this and patterned as for
photolithography, to expose the oxide in the pattern desired for the metal. The oxide is then
wet etched so as to undercut the resist. The metal is then deposited on the wafer, typically
by a process known as evaporation. The metal pattern is effectively stencilled through the
gaps in the resist, which is then removed, lifting off the unwanted metal with it. The
assisting layer is then stripped off too, leaving the metal pattern alone.
Excimer LASER Micromachining
Excimer lasers produce relatively wide beams of ultraviolet laser light. One interesting
application of these lasers is their use in micromachining organic materials (plastics, polymers,
etc). This is because the excimer laser doesn't remove material by burning or vaporizing it,
unlike other types of laser, so the material adjacent to the area machined is not melted or
distorted by heating effects.
When machining organic materials the laser is pulsed on and off, removing material with each
pulse. The amount of material removed depends on the material itself, the length of the
pulse, and the intensity (fluence) of the laser light. Below a certain threshold fluence,
which depends on the material, the laser light has no effect. As the fluence is increased
above the threshold, the depth of material removed per pulse also increases. The depth of the
cut can be controlled accurately by counting the number of pulses. Quite deep cuts (hundreds
of microns) can be made using the excimer laser.
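The threshold behaviour and the pulse counting described above can be illustrated with a small sketch. The text does not give a formula, so the code assumes the widely used logarithmic (Beer-Lambert type) ablation model, depth per pulse = (1/alpha) ln(F/F_th) for fluence F above the threshold F_th. The absorption coefficient, threshold and fluence values are hypothetical.

```python
import math


def depth_per_pulse(fluence, threshold, alpha_per_um):
    """Etch depth per pulse in micrometres (assumed logarithmic model).

    fluence, threshold: pulse energy per unit area, same units (e.g. J/cm^2).
    alpha_per_um: effective absorption coefficient in 1/um (assumed value).
    """
    if fluence <= threshold:
        return 0.0                       # below threshold: no material removed
    return math.log(fluence / threshold) / alpha_per_um


def pulses_for_depth(target_um, fluence, threshold, alpha_per_um):
    """Number of whole pulses needed to reach at least target_um."""
    step = depth_per_pulse(fluence, threshold, alpha_per_um)
    if step == 0.0:
        raise ValueError("fluence is below the ablation threshold")
    return math.ceil(target_um / step)


# Hypothetical polymer: threshold 0.05 J/cm^2, alpha = 5 per um.
print(depth_per_pulse(0.04, 0.05, 5.0))          # below threshold
print(round(depth_per_pulse(0.5, 0.05, 5.0), 3)) # um removed per pulse
print(pulses_for_depth(100.0, 0.5, 0.05, 5.0))   # pulses for a 100 um cut
```

Because each pulse removes a fixed, sub-micron layer in this model, the cut depth is set simply by the pulse count.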
Figure 3.3: Excimer laser micromachining.
The shape of the structures produced is controlled by using a chrome-on-quartz mask, like the
masks produced for photolithography. In the simplest system the mask is placed in contact
with the material being machined, and the laser light is shone through it. A more
sophisticated and versatile method involves projecting the image of the mask onto the
material. Material is selectively removed where the laser light strikes it.
Micro-electro-mechanical systems (MEMS), particularly those for radio frequency (RF)
applications, have demonstrated significantly better performance than current
electromechanical and solid-state technologies. Surface roughness and asperity microcontacts
are critical factors that can affect contact behaviour at scales ranging from the nano to the
micro in MEMS devices. One of the major objectives in the design of RF MEMS with metal
contacts is to have repeatable and reliable electrical contacts. However, the complexity of
the physical and mechanical interactions at microcontacts has made it extremely difficult to
obtain predictions of RF MEMS behaviour accurate enough to design reliable devices with
significantly improved life cycles. Validated modelling methods can provide MEMS switch
designers with insights into the evolution of contact pressures, inelastic deformations,
potential failure modes, and the microstructural behaviour of asperity microcontacts. Hence,
guidelines can be incorporated in the design and fabrication process to size critical
components and forces effectively, providing stable contact resistance for significantly
improved device durability and performance.
Compound solid-state switches such as GaAs FETs and PIN diodes are widely used in microwave
integrated circuits (ICs) for telecommunications applications including signal routing,
impedance matching networks, and adjustable gain amplifiers. However, these solid-state
switches have a large insertion loss (typically 1 dB) in the on state and poor electrical
isolation in the off state. Recent developments in micro-electromechanical systems (MEMS)
have continued to provide new and improved paradigms in the field of microwave applications.
Micromachined miniature switches in various configurations have been reported. Among these
switches, capacitive membrane microwave switching devices offer lower insertion loss, higher
isolation, better linearity and zero static power consumption.
RF MEMS Switches
RF MEMS switches come in two basic configurations:
• RF series contact switch
• RF shunt capacitive switch
Currently, both series and shunt RF MEMS switch configurations are under development, the
most common being series contact switches and capacitive shunt switches.
4.1. RF Series Contact Switch
An RF series switch operates by creating an open or a short in the transmission line, as
shown in Figure 4.1. The basic structure of a MEMS contact series switch consists of a
conductive beam suspended over a break in the transmission line. Application of a dc bias
induces an electrostatic force on the beam, which lowers the beam across the gap, shorting
together the open ends of the transmission line [1]. Upon removal of the dc bias, the
mechanical spring restoring force in the beam returns it to its suspended (up) position.
Closed-circuit losses are low (dielectric and I²R losses in the transmission line and dc
contacts) and the open-circuit isolation from the ~100 μm gap is very high through 40 GHz.
Because it is a direct contact switch, it can be used in low-frequency applications without
compromising performance.
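The two states of a series switch can be put into numbers with standard two-port theory: a series impedance Z inserted in a line of characteristic impedance Z0 has S21 = 2·Z0 / (2·Z0 + Z). The sketch below applies this to a closed switch (a small contact resistance) and an open switch (a small gap capacitance). The 1 Ω contact resistance, 2 fF open-state capacitance and 10 GHz frequency are hypothetical values, not figures from the text.

```python
import math


def series_s21_db(z_series, z0=50.0):
    """|S21| in dB for a series impedance in a z0 transmission line."""
    s21 = 2.0 * z0 / (2.0 * z0 + z_series)
    return 20.0 * math.log10(abs(s21))


freq_hz = 10e9            # assumed signal frequency
r_contact = 1.0           # assumed closed-state contact resistance, ohms
c_open = 2e-15            # assumed open-state gap capacitance, farads

# Closed state: the loss comes from the contact resistance.
closed_db = series_s21_db(r_contact)

# Open state: the signal can only leak through the gap capacitance.
z_open = 1.0 / (1j * 2.0 * math.pi * freq_hz * c_open)
open_db = series_s21_db(z_open)

print(f"closed: {closed_db:.3f} dB (insertion loss {-closed_db:.3f} dB)")
print(f"open  : {open_db:.1f} dB (isolation {-open_db:.1f} dB)")
```

Because the open-state leakage is capacitive, the isolation in this model degrades as frequency rises, while the closed-state loss is independent of frequency, consistent with the switch being usable down to low frequencies.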
Figure 4.1: Circuit equivalent of RF MEMS series contact switch.
4.2. RF Shunt Capacitive Switch
Figure 4.2: Circuit equivalent of RF MEMS shunt capacitive switch.
A circuit representation of a capacitive shunt switch is shown in Figure 4.2. In this case,
the RF signal is shorted to ground by a variable capacitor. Specifically, for RF MEMS
capacitive shunt switches, a grounded beam is suspended over a dielectric pad on the
transmission line. When the beam is in the up position, the capacitance of the
line-dielectric-air-beam configuration is on the order of 50 fF, which translates to a
high-impedance path to ground through the beam [Z_C = 1/(ωC)]. However, when a dc voltage is
applied between the transmission line and the electrode, the induced electrostatic force
pulls the beam down to be coplanar with the dielectric pad, raising the capacitance to pF
levels, reducing the impedance of the path through the beam for high-frequency (RF) signals
and shorting the RF to ground. Therefore, opposite to the operation of the series contact
switch, the beam in the up position corresponds to a low-loss RF path to the output load,
while the beam in the down position results in the RF being shunted to ground and no RF
signal at the output load. While the shunt configuration allows hot-switching and gives
better linearity and lower insertion loss than the MEMS series contact switch, the frequency
dependence of the capacitive reactance restricts high-quality performance to high RF signal
frequencies (5-100 GHz), whereas the contact switch can be used from dc levels.
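The effect of the two capacitance states can be checked numerically with Z_C = 1/(ωC) and the standard result for a shunt admittance Y on a line of impedance Z0, S21 = 2 / (2 + Y·Z0). The 50 fF up-state value is taken from the text; the 3 pF down-state value and the 10 GHz frequency are assumptions chosen only to represent "pF levels" and a frequency inside the quoted band.

```python
import math


def capacitive_reactance(c_farad, freq_hz):
    """Magnitude of the capacitor impedance, 1/(omega*C), in ohms."""
    return 1.0 / (2.0 * math.pi * freq_hz * c_farad)


def shunt_s21_db(c_farad, freq_hz, z0=50.0):
    """|S21| in dB for a capacitor connected from line to ground."""
    y = 1j * 2.0 * math.pi * freq_hz * c_farad
    s21 = 2.0 / (2.0 + y * z0)
    return 20.0 * math.log10(abs(s21))


freq_hz = 10e9
for state, cap in (("up (50 fF)", 50e-15), ("down (3 pF, assumed)", 3e-12)):
    x_c = capacitive_reactance(cap, freq_hz)
    s21 = shunt_s21_db(cap, freq_hz)
    print(f"{state:22s} |Z_C| = {x_c:7.2f} ohm   S21 = {s21:7.2f} dB")
```

At lower frequencies the down-state reactance grows, so the same capacitor shorts the line less effectively; this is the frequency restriction noted above.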
Switch Design and Operation
The geometry of a capacitive MEMS switch is shown in Figure 5.1. The switch consists of a
lower electrode fabricated on the surface of the glass wafer and a thin aluminium membrane
suspended over the electrode. The membrane is connected directly to the grounds on either
side of the electrode, while a thin dielectric layer covers the lower electrode. The air gap
between the two conductors determines the switch off-capacitance. With no applied actuation
potential, the residual tensile stress of the membrane keeps it suspended above the RF path.
Application of a DC electrostatic field to the lower electrode causes the formation of
positive and negative charges on the electrode and membrane conductor surfaces. These charges
exert an attractive force which, when strong enough, causes the suspended metal membrane to
snap down onto the lower electrode and dielectric surface, forming a low-impedance RF path to
ground.
The switch is built on coplanar waveguide transmission lines with an impedance of 50 Ω,
matching the impedance of the system. The width of the transmission line is 160 µm and the
gap between the ground line and signal line is 30 µm. The insertion loss is dominated by the
resistive loss of the signal line and the coupling between the signal line and the membrane
when the membrane is in the up position. To minimize the resistive loss, a thick layer of
metal needs to be used to build the transmission line. The thicker metal layer results in a
bigger gap that reduces the coupling between signal and ground, yet also requires a higher
voltage to actuate the switch. To achieve a reasonable actuation voltage, a 4 µm thick copper
layer is used as the transmission line. A glass wafer is chosen for the RF switch over a
semiconducting silicon substrate, since a typical silicon wafer is too lossy for RF signals.
When the membrane is in the down position, the electrical isolation of the switch mainly
depends on the capacitive coupling between the signal line and ground lines. The dielectric
layer plays a key role in the electrical isolation: the thinner the dielectric layer and the
smoother its surface, the better the isolation of the switch. But there is another trade-off
here. When the membrane is pulled down, the bias voltage is applied directly across the
dielectric layer. Since this layer is very thin, the electric field within it is very high.
The thickness of the dielectric layer should be chosen such that the electric field never
exceeds the breakdown field of the dielectric material. Silicon nitride film has a breakdown
field as high as several megavolts per centimetre and can be used as the dc block dielectric
layer. The thickness of the silicon nitride layer is chosen as 0.2 µm to serve both the dc
block and RF coupling purposes.
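The breakdown constraint can be checked with a one-line estimate, E = V / t. The sketch uses the 0.2 µm nitride thickness from this section and the actuation voltage of about 50 V quoted elsewhere in the text; the 5 MV/cm breakdown figure is an assumed stand-in for "several megavolts per centimetre".

```python
def field_mv_per_cm(voltage_v, thickness_um):
    """Electric field across a dielectric layer, in MV/cm."""
    thickness_cm = thickness_um * 1e-4      # 1 um = 1e-4 cm
    return voltage_v / thickness_cm / 1e6


def min_thickness_um(voltage_v, breakdown_mv_per_cm):
    """Thinnest layer (in um) that keeps the field below breakdown."""
    thickness_cm = voltage_v / (breakdown_mv_per_cm * 1e6)
    return thickness_cm * 1e4


actuation_v = 50.0        # approximate actuation voltage from the text
nitride_um = 0.2          # silicon nitride thickness from the text
breakdown = 5.0           # assumed breakdown field, MV/cm

field = field_mv_per_cm(actuation_v, nitride_um)
print(f"field in nitride: {field:.2f} MV/cm")
print(f"below assumed breakdown: {field < breakdown}")
print(f"minimum thickness: {min_thickness_um(actuation_v, breakdown):.2f} um")
```

Under these assumptions the field is 2.5 MV/cm, so the 0.2 µm layer leaves roughly a factor of two of margin against the assumed breakdown field.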
Figure 5.1: Capacitive RF MEMS switch. (Top and cross-sectional view).
The switches were fabricated by surface micromachining techniques with a total of four
masking levels. No critical overlay alignment was required. Figure 5.2 shows the essential
process steps:
1. Ti/Cu seed layer deposition: The starting substrate was a 2-inch glass wafer. A layer of
titanium (0.05 µm) and copper (0.15 µm) was sputtered on the substrate as the seed layer for
electroplating.
2. Silicon nitride deposition: A layer of silicon nitride (0.2 µm) was deposited and patterned
by reactive ion etching to serve as the DC block.
3. Copper electroplating: A photoresist layer was spin coated and patterned to define the
electroplating area. Then, a 4 µm thick copper layer was electroplated to define the coplanar
waveguide and the posts for the membranes.
4. Aluminium deposition: A layer of aluminium (0.4 µm) was deposited by electron beam
evaporation and patterned to form the top electrode in the actuation capacitor structure.
5. Release: The photoresist sacrificial layer was removed to finalize the switch structure.
The major characteristics of the switch are the insertion loss when signals pass through and
the isolation when signals are rejected. In the off-state the RF signal passes underneath the
membrane without much loss. In the on-state, a low-impedance path exists between the central
signal line and the coplanar waveguide grounds through the bent membrane, and the RF signal
is reflected by the switch. A resonant frequency of 23.4 GHz was observed when the membrane
was in the down position. This means that the switch can be modelled as a capacitor, inductor
and resistor connected in series between the signal and ground lines. Since the switch has
better isolation around the resonant frequency, the geometry of the switch can be adjusted so
that the resonant frequency coincides with the desired operating frequency. The actuation
voltage of the MEMS switch is about 50 V.
The spring constant of the membrane and the distance between the membrane and the bottom
electrode determine the actuation voltage of the switch. The spring constant of the membrane
is mainly determined by the membrane material properties, the membrane geometry, and the
residual stress in the membrane.
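Two of the relationships in the preceding paragraphs can be explored numerically. For the series RLC model, the resonance is f0 = 1/(2π√(LC)), so the 23.4 GHz figure fixes L once a down-state capacitance is assumed. For the actuation voltage, the sketch uses the textbook parallel-plate pull-in expression V = √(8·k·g³ / (27·ε0·A)), which the text itself does not state. The 3 pF capacitance, 40 N/m spring constant, 3 µm gap and 160 µm × 100 µm overlap area are all hypothetical values.

```python
import math

EPS0 = 8.854e-12          # permittivity of free space, F/m


def series_inductance(f0_hz, c_farad):
    """Inductance that resonates with c_farad at f0_hz in a series RLC."""
    return 1.0 / ((2.0 * math.pi * f0_hz) ** 2 * c_farad)


def pull_in_voltage(k_n_per_m, gap_m, area_m2):
    """Parallel-plate pull-in voltage (assumed textbook model)."""
    return math.sqrt(8.0 * k_n_per_m * gap_m ** 3 / (27.0 * EPS0 * area_m2))


# Series resonance: measured 23.4 GHz, assumed 3 pF down-state capacitance.
l_series = series_inductance(23.4e9, 3e-12)
print(f"equivalent series inductance: {l_series * 1e12:.1f} pH")

# Pull-in: hypothetical spring constant, gap and electrode overlap area.
area = 160e-6 * 100e-6
v_pi = pull_in_voltage(k_n_per_m=40.0, gap_m=3e-6, area_m2=area)
print(f"pull-in voltage: {v_pi:.1f} V")

# Doubling the spring constant raises the pull-in voltage by sqrt(2).
print(f"with 2k: {pull_in_voltage(80.0, 3e-6, area):.1f} V")
```

The pull-in expression makes the dependence stated above explicit: the voltage scales with the square root of the spring constant and with the gap raised to the power 3/2.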
A CMOS-based monolithic MEMS technology has been proposed to solve many of these problems. It
consists of additional mask processing steps carried out after completion of the standard
CMOS process flow. The goal is to minimize the issues caused by mechanical stresses in
micromachined layers by supporting them with a patterned polyamide substrate and, at the same
time, to form thick conductors that lower the conductor losses. The enabling processing
techniques are thick-film processing, stress compensation and electroplating. The process
starts with a standard CMOS process flow. The complementary masks are fabricated by an
independent mask maker.
Figure 5.2: Process flow. (a) Seed layer deposition (b) Dielectric layer deposition and
patterning (c) Spacer coating and patterning (d) Transmission line electroplating
(e) Membrane deposition and
General Reliability Concerns
6.1. Metal Contact Resistance (Series Contact Switches)
Series contact switches tend to fail in the open-circuit state with wear. Even though the
bridge is collapsing and making contact with the transmission line, the conductivity of the
contact metallization area decreases until the power loss reaches unacceptable levels. This
increase in the resistance of the metal contact layer over cycling time may be attributed to
frictional wear, pitting, hardening, non-conductive skin formation, and/or contamination of
the metal. Pitting and hardening can be reduced by decreasing the contact force during
actuation. But tailoring the design to minimize the effect involves balancing operational
conditions (contact force, current, and temperature), plastic deformation properties, metal
deposition method, and switch mechanical design. In other cases, the resistance of the
contact increases with use due to the formation of a thin dielectric layer on the surface of
the metal.
While this has been documented, the underlying physical mechanisms are not currently well
understood. As the RF power level is raised above 100 mW, the aforementioned failures are
exacerbated by the increased temperature at the contact area and, under hot-switching
conditions, by arcing and microwelding between the metal layers.
6.2. Dielectric Breakdown (Shunt Capacitive Switches)
Shunt capacitive switches often fail due to charge trapping, both at the surface and in the
bulk states of the dielectric. Surface charge transfer from the beam to the dielectric
surface results in the bridge getting stuck in the up position (increased actuation voltage).
Bulk charge trapping, on the other hand, creates image charges in the bridge metallization
and increases the holding force of the bridge to a value above its spring restoring force.
Several actions can be taken in the design phase to mitigate dielectric charging, including
choosing a better dielectric material and designing peripheral pull-down electrodes to
decouple the actuation from the dielectric behaviour at the contact. Unlike series contact
switches, capacitive shunt switches do not experience hard failures at RF power levels above
100 mW, as long as the bridge contact metallization is thick enough to handle the high
current densities. However, RF power may be limited in some cases by a recoverable failure,
self-actuation. While not yet fully understood, it has been observed that a capacitive shunt
switch will self-actuate at 4 W of RF power and experience latch-up (stuck in the down
position) in hot-switching mode at 500 mW. Even though these “failures” are recoverable (the
switch operates normally once the RF power is decreased below the latch-up value of 500 mW),
they are still a lifetime consideration for high-power applications.
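A common first-order way to relate these power levels to the electrostatic actuation is to convert RF power into the voltage it produces on a matched line, V_rms = √(P·Z0), since the time-averaged electrostatic force depends on the mean-square voltage. The text notes that the mechanism is not fully understood, so the sketch below is only an order-of-magnitude illustration; the matched 50 Ω travelling-wave assumption and the function name are not taken from the source.

```python
import math


def rf_voltages(power_w, z0_ohm=50.0):
    """Return (rms, peak) line voltage for RF power on a matched line."""
    v_rms = math.sqrt(power_w * z0_ohm)
    return v_rms, math.sqrt(2.0) * v_rms


# Power levels quoted in the text: latch-up at 500 mW, self-actuation at 4 W.
for label, power in (("latch-up (0.5 W)", 0.5), ("self-actuation (4 W)", 4.0)):
    v_rms, v_peak = rf_voltages(power)
    print(f"{label:22s} V_rms = {v_rms:5.2f} V   V_peak = {v_peak:5.2f} V")
```

In this simplified picture the RF signal itself acts like a bias of several volts to tens of volts on the bridge, which is comparable to typical electrostatic actuation and hold-down voltages.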
6.3. Radiation and Other Effects
There are some areas of RF MEMS reliability research that have not been investigated in
detail and are in need of immediate attention. For example, RF MEMS series contact switches
were thought to be immune to radiation effects, but design-dependent charge separation
effects have been seen in the pull-down electrode dielectric material, which noticeably
decrease the actuation voltage of the device. This immediately raises the question of how
radiation effects will accelerate the dielectric material failure mechanisms of capacitive
switches, which have known dielectric failure mechanisms, or of other series switches that
utilize dielectric material in their electrodes.
Comparison of MEMS Switches with Solid State Switches
RF switches are used in a wide array of commercial, aerospace, and defence application areas,
including satellite communications systems, wireless communications systems,
instrumentation, and radar systems. In order to choose an appropriate RF switch for each of
the above scenarios, one must first consider the required performance specifications, such as
frequency bandwidth, linearity, power handling, power consumption, switching speed, signal
level, and allowable losses. Traditional electromechanical switches, such as waveguide and
coaxial switches, show low insertion loss, high isolation, and good power handling
capabilities but are power-hungry, slow, and unreliable for long-life applications. Current
solid-state RF technologies (PIN diode- and FET-based) are utilized for their high switching
speeds, commercial availability, low cost, and ruggedness. Their inherited technology
maturity ensures a broad base of expertise across the industry, spanning device design,
fabrication, packaging, and system insertion and, consequently, high reliability and
well-characterized performance assurance. Some parameters, such as isolation, insertion loss,
and power handling, can be adjusted via device design to suit many application needs, but at
a performance cost elsewhere. For example, some commercially available RF switches can
support high power handling, but require large, massive packages and high power consumption.
Table 7.1 shows a comparison of MEMS, PIN-diode and FET switches.
Table 7.1: Comparison of MEMS Switches with Solid State Switches.
Parameter                 RF MEMS     PIN diode    FET
Voltage (V)               20-80       3-5          3-5
Current (mA)              0           0-20         0
Power consumption (mW)    0.5-1       5-100        0.05-0.1
Switching time            1-300 µs    1-100 ns     1-100 ns
Power handling (W)        <1          <10          <10
In spite of this design flexibility, two major areas of concern with solid-state switches
persist: breakdown of linearity and frequency bandwidth upper limits. When operating at high
RF power, nonlinear switch behaviour leads to spectral regrowth, which smears the energy
outside of its allocated frequency band and causes adjacent channel power violations as well
as signal-to-noise problems. The other strong driving mechanism for pursuing new RF
technologies is the fundamental degradation of insertion loss and isolation at signal
frequencies above 1-2 GHz. By utilizing an electromechanical architecture on a miniature (or
micro) scale, MEMS RF switches combine the advantages of traditional electromechanical
switches (low insertion loss, high isolation, and extremely high linearity) with those of
solid-state switches (low power consumption, low mass, long lifetime).
In exchange, RF MEMS switches are slower and have lower power handling capabilities.
Nevertheless, these advantages, together with the potential for highly reliable,
long-lifetime operation, make RF MEMS switches a promising solution to existing low-power RF
technology limitations.
Advantages of MEMS
There are many advantages of using MEMS rather than ordinary large-scale machinery.
• Ease of production.
• MEMS can be mass-produced and are inexpensive to make.
• Ease of parts alteration.
• Higher reliability than their macro-scale counterparts.
• IC technology used: multiple and more complex functions are integrated on a chip to form
monolithic systems, giving miniaturization with no loss of functionality and improved
performance.
• Basic fabrication: reduced manufacturing cost and time.
• Micro components make the system faster, more reliable, more portable and lower in power
consumption, and allow it to be easily and massively deployed, maintained and replaced.
• Easy to integrate into systems and modify.
• Little harm to the environment.
Disadvantages of MEMS
• Due to their size, it is physically impossible for MEMS to transfer any significant power.
• MEMS are made up of Poly-Si (a brittle material), so they cannot be loaded with large
• Standard IC packaging cannot be used because of the moving parts in the MEMS structure.
• Many standard production steps that improve the mechanical structure degrade the
electronics, and vice versa.
• The unavailability of the standard design software.
Application of MEMS
• Inertial navigation units on a chip for munitions guidance and personal navigation.
• Electromechanical signal processing for ultra-small and ultra low-power wireless
• Distributed unattended sensors for asset tracking, environmental monitoring, and
• Integrated fluidic systems for miniature analytical instruments, propellant, and
• Weapons safing, arming, and fusing. Embedded sensors and actuators for condition-
• Mass data storage devices for high density and low power.
Low power consumption, low insertion loss, high isolation, excellent linearity and the
ability to be integrated with other electronics all make MEMS switches an attractive
alternative to mechanical and solid-state switches. These switches will have applications in
phased antenna arrays, in MEMS impedance matching networks and in communications systems.
MEMS are set to play a major role in the growth of microsensor-based applications in areas
such as the automotive industry, wireless communication, security systems, biomedical
instrumentation and the armed forces.
RF micro-electro-mechanical systems (MEMS) technology has proven to be one of the most
valuable technologies for low-loss, low-power microwave components and systems for
telecommunications. Developments in this technology have made possible the design and
fabrication of control devices suitable for switching microwave signals. Furthermore, RF MEMS
switches offer superior performance, such as high isolation, low insertion loss, and low
power consumption, compared to conventional FET or PIN diode switches. MEMS is an emerging
technology which uses the tools and techniques that were developed for the IC industry to
build microscopic machines on standard silicon wafers.
In summary, a low-cost, high-performance RF MEMS technology compatible with CMOS and
high-voltage devices has been described. High-performance RF MEMS switches, high-voltage
MOSFETs, and CMOS devices were all integrated on the same chip.
[1] Sazzadur Choudhury, M. Ahmadi, and W. C. Miller, “Micromechanical system for
System-on-Chip Connectivity”, IEEE Circuits and Systems, Page(s) 112-132, September 2002.
[2] J. B. Muldavin, G. M. Rebeiz, “High Isolation RF MEMS Shunt Switches-Part 2: Design”,
IEEE Trans. on Microwave Theory and Techniques, Vol. 6, Page(s) 253-276.
[3] P. Osterberg, H. Yie, X. Cai, J. White, and S. Senturia, “Self-consistent simulation and
modeling of electrostatically deformed diaphragms”, in Proc. IEEE MEMS Conf., January 1994,
[4] Gopinath. A and Ranklin. JB, “GaAs FET RF switches”, IEEE Transaction on Electronic
development, vol. 12, Page(s) 18-37, August 2003.
Formant Extraction and Speech Recognition
Formant features can be interpreted as adaptive non-uniform samples of the signal spectrum that are located at the
resonance frequencies of the vocal tract and normally have higher signal-to-noise ratios than the other parts of
the spectrum. The number and the position of these frequencies along the frequency axis may differ depending on the
phoneme and the position of the window along the phoneme (i.e. the beginning or ending part of a phoneme). Along
with the formants (the resonance frequencies), we may use the bandwidth and/or magnitude of the spectrum at each
particular frequency to encode the properties of the speech and use them in different applications such as speech
recognition, enhancement, noise reduction, hearing aid adaptive filters, etc.
There are several methods of formant extraction, such as peak picking, HMM2 and LP model pole extraction. The
main method used in this work is LP model pole extraction combined with a rule-based method for pole
refinement. Unexpectedly, this method also yields high recognition rates for unvoiced phonemes, which do not
have any formants at all.
Figure 1. The LP model of a signal and the segment of the signal in the time domain. A) The LP model frequency
response, where the + and * mark the positions of the formants along the frequency axis. B) The time-domain
segment, which shows a degree of periodicity.
Figure 1 illustrates the frequency spectrum of the LP model of a segment of a speech signal. LP stands for
Linear Prediction: it is assumed that the signal is predictable from a limited number P of its past values:
$$x(m) = \sum_{k=1}^{P} a_k\, x(m-k) + e(m)$$
where the a_k are the Linear Prediction Coefficients (LPC), P is the LP order, e(m) is the prediction error and
x(m) is the signal. In the z domain:
$$H_{LP}(z) = \frac{X(z)}{E(z)} = \frac{1}{1 - \sum_{k=1}^{P} a_k z^{-k}}$$
This filter is an all pole filter, since the numerator of its transfer function is a constant. The
input to the system is the error function, which can be interpreted as the unpredictable part
of the signal, or as the excitation which drives the all-pole system.
The characteristics of a speech signal vary with time, since it is a sequence of different phonemes with different
frequency characteristics, combined with pauses and periods of silence. To extract these characteristics we need to
chop the signal into segments which are more stationary and behave predictably across time and frequency. These
segments should, however, overlap to avoid the effect of discontinuity. To extract the formant trajectory of the
signal we first chop the signal into these overlapping segments and pre-process them. The pre-processing is
essentially windowing: to window a segment is to multiply it by another sequence of the same length, which usually
has its maximum in the middle and tapers smoothly at the ends. This minimizes the effect that chopping the signal
has at the edges. Linear prediction is then applied to each windowed segment, giving a set of linear prediction
coefficients per segment in the form of the LP model described above. The frequencies of the complex poles of
HLP(z) are the candidate frequencies for formants, since the poles of a system model the resonances of that system.
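The text does not say how the LP coefficients of a windowed segment are computed. As a minimal sketch, assuming the autocorrelation method solved with the Levinson-Durbin recursion (a common choice, not necessarily the one used in this work), the coefficients could be obtained as follows. The function name `lpc_autocorr` is hypothetical, and the sign convention is that the signal is predicted as the sum of a_k x(m-k).

```python
def lpc_autocorr(frame, order):
    """Return (a, err): predictor coefficients a[0..order-1] (a_1..a_P)
    and the final prediction error energy, using Levinson-Durbin.
    Assumption: autocorrelation method on an already windowed frame."""
    n = len(frame)
    # Autocorrelation values r(0)..r(order) of the frame
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = [0.0] * (order + 1)  # a[0] is unused; a[k] holds a_k
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        # Reflection coefficient for stage i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err
```

The poles of the all-pole model are then the roots of the polynomial z^P - a_1 z^(P-1) - ... - a_P, which any polynomial root finder can supply.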
The bandwidth of the formants and the magnitude of the LP model are two other features usually extracted and used
in speech processing. If z1 is a complex pole of HLP(z), then the features of that pole are calculated using:
$$F = \frac{f_s}{2\pi}\,\arg(z_1), \qquad BW = -\frac{f_s}{\pi}\,\ln|z_1|, \qquad M = \left|H_{LP}\!\left(e^{j 2\pi F / f_s}\right)\right|$$
where f_s is the sampling frequency, F is the formant frequency, BW is the 3 dB bandwidth of the spectrum at that
frequency and M is the magnitude of the spectrum at that frequency. The effect of noise can be measured at
different SNRs for each
phoneme. This could be done using labelled speech signals where the boundaries of the phonemes are given or
Figure 2. The effect of noise on the distribution of the pole frequencies
Figure 2 illustrates the effect of train noise with SNR = 0 dB on the distribution of the pole frequencies. The red
(dashed) curve is the histogram of the pole frequencies of different phonemes in 0 dB noise and the blue (solid)
curve is that of the clean signal. The data were extracted using 130 sentences uttered by an American male speaker.
The train noise was recorded in a real situation on a train in London with a sampling frequency of 8 kHz. The
spectrum of the noise is illustrated in Figure 3.
Figure 3. The spectrum of the train noise (fs = 8000 Hz)
Formant Tracking in Noise
The next task is to actually track the formants, whether in noisy or clean conditions. So far we have extracted
the poles of the LP model of the overlapping segments of the speech signal, and calculated their bandwidths and
frequencies. These are the candidates from which the formant tracks are formed. However, further criteria are
used to refine these candidates and find the desired tracks. These criteria include limits on frequency and
bandwidth as well as continuity. This makes the method a rule-based algorithm for formant extraction; the rules
are based on our knowledge of speech signals and formants. The method used in this work is a variable LP order
rule-based method, which is discussed below.
Variable LP Order Rule Based Formant Tracking
Figure 4 illustrates the block diagram of the program's modules and their interrelation.
First the speech signal enters the pre-processing module, where pre-emphasis may optionally be
applied. The signal is chopped into overlapping segments of length 25 ms. These segments
usually have a 15 ms overlap with each other. The window used in this method is a Hamming
window of the same length as the segments (400 samples at 16 kHz).
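The segmentation described above can be sketched as follows, using the stated values (25 ms segments, 15 ms overlap, Hamming window, 16 kHz). The function name `frame_signal` and the choice to drop a trailing partial segment are assumptions made for illustration; pre-emphasis is left out.

```python
import math

def frame_signal(samples, fs=16000, frame_ms=25, overlap_ms=15):
    """Split samples into overlapping Hamming-windowed segments.
    Assumption: a trailing partial segment is discarded."""
    frame_len = int(round(fs * frame_ms / 1000))           # 400 at 16 kHz
    hop = int(round(fs * (frame_ms - overlap_ms) / 1000))  # 160 at 16 kHz
    # Hamming window: maximum in the middle, smooth taper at both ends
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        segment = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames
```

With these values consecutive segments start 10 ms (160 samples) apart, so a 100 ms signal yields 8 full segments.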
Figure 4. Variable LP Order Formant Tracker
After a segment has been pre-processed, its LP coefficients are calculated. The
primary LP order is set to 11 or 13, so that we usually have 5 pairs of complex poles which
represent the resonances of the system. The poles are then sorted by frequency; the
real-valued poles are eliminated and only one pole of each complex-conjugate pair is kept, since
both give the same frequency and bandwidth. The set of frequencies and bandwidths then goes
through the rule-based refinement, where some of them may be eliminated according to the
criteria used. The first criterion is the maximum frequency, i.e. the highest frequency allowed
for the last formant. Any pole with a higher frequency is eliminated. One simple value that
might be used is the number of formants × 1000 Hz, e.g. the fourth formant has a maximum of
4000 Hz.
The other criterion is a bandwidth limit determined by the behaviour of speech formants. It is
set to avoid poles with large bandwidths, which normally do not represent formants. The
threshold is set to 600 Hz in the program, so that poles with bandwidths larger than 600 Hz are
eliminated. Such poles may be due to noise in the segment: although noise does not normally
represent a resonance, it may be modelled by one or more poles centred in a relatively wide
frequency range where most of the noise energy is concentrated. This criterion also helps to
detect two poles that are so close to each other that they have merged into a single pole with a
relatively large bandwidth. One might then ask: if the pole that results from the combination of
two poles is eliminated, how can the two merged poles be brought back into the calculation? The
answer to this question is the reason we use a variable LP order.
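The two elimination rules can be sketched as below. This is an illustration under stated assumptions, not the program's actual code: the pole frequency and bandwidth are taken from the usual relations F = fs·arg(z)/(2π) and BW = -fs·ln|z|/π, the poles are assumed to lie inside the unit circle, and the names `pole_features`, `refine_poles` and `imag_tol` are hypothetical.

```python
import cmath
import math

def pole_features(pole, fs):
    """Frequency and 3 dB bandwidth (both in Hz) of one complex pole."""
    freq = abs(cmath.phase(pole)) * fs / (2 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return freq, bandwidth

def refine_poles(poles, fs, n_formants=4, max_bw=600.0, imag_tol=1e-9):
    """Keep (freq, bandwidth) candidates that pass the two rules."""
    max_freq = n_formants * 1000.0  # e.g. 4000 Hz for four formants
    kept = []
    for p in poles:
        # Drop real poles and the lower-half member of each conjugate pair
        if p.imag <= imag_tol:
            continue
        freq, bw = pole_features(p, fs)
        if freq <= max_freq and bw <= max_bw:
            kept.append((freq, bw))
    return sorted(kept)  # sorted by frequency
```

For instance, a pole of radius 0.98 at 500 Hz (fs = 16 kHz) has a bandwidth of roughly 103 Hz and is kept, while a pole of radius 0.80 has a bandwidth above 1 kHz and is discarded.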
After the rule-based refinement/elimination of the poles we might end up with only a few poles,
which might not be sufficient for the rest of the process if we are to have a fixed,
pre-determined number of formants. For example, with a primary order of 13 we have a maximum
of 6 complex pole pairs. Now suppose we eliminate 3 of these in the refinement process while
we need at least 4 poles (to extract 4 formants). This might cause a discontinuity in the
tracks, i.e. we might have F2, F3 and F4 but not F1, which is undesirable for some applications
such as recognition tasks. To avoid such discontinuities we need enough candidates for each
segment. Hence, after the refinement process the number of candidates is checked, and if it is
less than the number of formants needed, the LP order is increased and the LP pole extraction
is repeated. This loop continues until a sufficient number of poles is obtained. We could
increase the LP order by 2 each time, so that one more complex pole pair is expected each time.
However, this would force the system to have at least one real pole for every segment (assuming
an odd primary order), which imposes a low-pass property on the speech segment. This might be
desirable for voiced phonemes, but since for recognition tasks the system needs to model
unvoiced phonemes as well as voiced ones, the order is increased by one each time. As described
in the next section, this model is capable of modelling the consonants (for recognition) even
better than the voiced phonemes, which are expected to have formants. The number of poles after
refinement might be larger than the number of formants to be extracted. In this case a
continuity criterion is used to choose between the different combinations or sets of
candidates. The continuity rule is based on the Euclidean distance (2-norm distance) between
each set and the previously chosen formant set.
After the candidates are chosen, every possible combination of them is considered. Then the
distance of each candidate set from the previously chosen set is calculated, and the one with
the minimum distance is chosen as the next formant set. If we are to extract n formants for
each segment, then the following equation can be used as the continuity criterion to choose the
set closest to the previous one:
$$F_i = \arg\min_{C_k} \left\| C_k - F_{i-1} \right\|_2 = \arg\min_{C_k} \sqrt{\sum_{j=1}^{n} \left( C_k(j) - F_{i-1}(j) \right)^2}$$
where C_k is the kth candidate set and F_i is the ith formant set chosen. The initial condition
for the above recursive equation can be set to the mean values of the formants. After the next
set of formants is chosen, the corresponding bandwidths and magnitudes are also found and
appended to the formant features, using the pole feature equations given earlier. This applies,
however, only when the features are meant to be used for recognition.
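The order-growing loop and the continuity rule can be sketched together as follows. This is a minimal illustration, not the program's code: `extract` stands for a hypothetical caller-supplied function that returns the refined candidate frequencies for a given LP order, and `max_order` is an assumed safety cap that the text does not mention.

```python
import itertools
import math

def candidates_with_growing_order(extract, n_formants,
                                  start_order=13, max_order=30):
    """Raise the LP order by one until enough candidates survive."""
    order = start_order
    while order <= max_order:
        candidates = extract(order)
        if len(candidates) >= n_formants:
            return order, candidates
        order += 1
    raise ValueError("not enough candidates up to max_order")

def closest_set(candidates, previous):
    """Pick the combination of len(previous) candidate frequencies
    with the smallest Euclidean distance to the previous formant set."""
    best, best_dist = None, math.inf
    for combo in itertools.combinations(sorted(candidates), len(previous)):
        dist = math.sqrt(sum((c - p) ** 2 for c, p in zip(combo, previous)))
        if dist < best_dist:
            best, best_dist = list(combo), dist
    if best is None:
        raise ValueError("fewer candidates than formants")
    return best
```

For the first segment, `previous` would be the set of mean formant values used as the initial condition. Because the candidates are sorted before combinations are formed, each chosen set keeps its formants in ascending frequency order.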
Figure 5 illustrates a sample formant tracking task performed on a sentence uttered by a male
speaker, tracking five formants. There is a period of silence at the beginning and at the end of
the signal. The initial silence is about 60 frames long, and the tracks fluctuate heavily within
it. Focussing on F1, the track is quite stable during most of the signal, but there are some
instants at which it jumps suddenly. These jumps correspond to the unvoiced phonemes, where the
signal is high-pass.
Figure 5. Sample Formant tracks of a clean signal superimposed over its LP spectrogram
Finally, Kalman filters are used to correct (smooth) the tracks. The experiments show that
using Kalman filters improves the tracking in noisy conditions. The feature vectors for
recognition contain formant frequencies, bandwidths and spectrum magnitudes (and also the
delta and delta-delta values). A sample implementation of this method in Matlab can be found
here. Some results of the recognition and tracking are summarized below:
Figure 6. Recognition rate using MFCC and formant features with and without energy
Figure 7. The overall error percentage of formant tracks using different tracking methods
Figure 8. The recognition rates using dynamic and actual values of the formants
Figure 9. The recognition rate of the consonants using the same method of extracting the
features as formants
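The text states that Kalman filters are used to smooth the formant tracks but does not give the model. As a minimal sketch, assuming a scalar random-walk state model for each track and illustrative noise variances (both are assumptions, not values taken from this work), one track could be filtered as follows. The function name `kalman_track` is hypothetical.

```python
def kalman_track(track, process_var=400.0, meas_var=2500.0):
    """Causal scalar Kalman filter over one formant track (Hz).
    Assumed model: the formant follows a random walk and each
    extracted frequency is a noisy measurement of it."""
    if not track:
        return []
    estimate = track[0]
    variance = meas_var  # initial uncertainty of the estimate
    smoothed = [estimate]
    for measurement in track[1:]:
        # Predict: a random walk keeps the estimate, uncertainty grows
        variance += process_var
        # Update: blend prediction and measurement by the Kalman gain
        gain = variance / (variance + meas_var)
        estimate += gain * (measurement - estimate)
        variance *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed
```

A larger `meas_var` relative to `process_var` gives smoother tracks that react more slowly to genuine formant movement, so the two values trade jump suppression against tracking lag.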