MEMS final report


Published in: Engineering, Business, Technology

  1. 1. Chapter 1 Introduction

Micro-Electro-Mechanical Systems (MEMS) is a technology that, in its most general form, can be defined as miniaturized mechanical and electro-mechanical elements made using the techniques of microfabrication. The critical physical dimensions of MEMS devices can vary from well below one micron at the lower end to several millimetres. Likewise, the types of MEMS devices can vary from relatively simple structures with no moving elements to extremely complex electromechanical systems with multiple moving elements under the control of integrated microelectronics. The one main criterion of MEMS is that at least some elements have some sort of mechanical functionality, whether or not these elements can move. The term used for MEMS varies in different parts of the world: in the United States it is predominantly called MEMS, while in some other parts of the world it is called Microsystems Technology.

MEMS technology has enabled advanced micro devices to be realized using processes similar to those of VLSI technology. When MEMS devices are combined with other technologies, a new generation of innovative devices with outstanding functionality can be created. MEMS has been identified as one of the most promising technologies for the 21st century and has the potential to revolutionize both industrial and consumer products by combining silicon-based microelectronics with micromachining technology. Its techniques and microsystem-based devices have the potential to dramatically affect all of our lives and the way we live. If semiconductor microfabrication was the first micro manufacturing revolution, MEMS is the second.

The functional elements of MEMS are miniaturized structures, sensors, actuators, and microelectronics. The most notable elements are the microsensors and microactuators.
Microsensors and microactuators are appropriately categorized as “transducers”, which are defined as devices that convert energy from one form to another. In the case of microsensors, the device typically converts a measured mechanical signal into an electrical signal. Microsensors detect changes in the system’s environment by measuring mechanical, thermal, magnetic, chemical or electromagnetic information or phenomena. Microelectronics processes this
  2. 2. information and signals the microactuators to react and create some form of change to the environment.

Figure 2.1: Different combinations of opto-electro-mechanical systems. [1]

The silicon integrated circuit industry is able to produce devices in volume with very high yield at low cost. Silicon has driven the semiconductor industry and allowed a steady reduction in size for more than three decades. In MEMS, silicon technology is well established and offers the possibility of integration with microelectronics on a single chip. While the device electronics are fabricated with IC chip technology, the micromechanical components are fabricated by sophisticated manipulation of silicon and other substrates using micromachining processes.
  3. 3. Chapter 2 Manufacturing Process of MEMS Technology

Today, MEMS technology is capable of producing almost any type of electronic device. To fully understand what MEMS are, it is important to know the basics of the MEMS manufacturing and fabrication processes and the materials involved.

2.1. Materials

MEMS are generally made from polycrystalline silicon, a common material that is also used to make integrated circuits. Frequently, the polycrystalline silicon is doped with other materials such as germanium or phosphorus to enhance its properties. Sometimes copper or aluminium is plated onto the polycrystalline silicon to allow electrical conduction between different parts of the MEMS device.

2.2. Photolithography
  4. 4. Figure 3.1: Positive and negative photoresist. [1]

Photolithography is the basic technique used to define the shape of micromachined structures. The technique is essentially the same as that used in the microelectronics industry. There are two types of photoresist, termed positive and negative. Where ultraviolet light strikes positive resist it weakens the polymer, so that when the image is developed the resist is washed away where the light struck it, transferring a positive image of the mask to the resist layer. The opposite occurs with negative resist: where ultraviolet light strikes it, the polymer is strengthened, so when developed the resist that was not exposed is washed away and a negative image of the mask is transferred to the resist. A chemical is then used to remove the oxide where it is exposed through the openings in the resist. Finally, the resist is removed, leaving the patterned oxide.

Figure 3.1 shows a thin film of some material (e.g. silicon dioxide) on a substrate of some other material (e.g. a silicon wafer). It is desired that some of the silicon dioxide be selectively removed so that it remains only in particular areas of the wafer. Firstly, a mask is produced. This will typically be a chromium pattern on a glass plate. The wafer is then coated
  5. 5. with a polymer that is sensitive to ultraviolet light, called a photoresist. Ultraviolet light is then shone through the mask onto the photoresist. The photoresist is then developed, which transfers the pattern on the mask to the photoresist layer.

2.3. Silicon Micromachining

There are a number of basic techniques that can be used to pattern thin films that have been deposited on a silicon wafer, and to shape the wafer itself, to form a set of basic microstructures (bulk silicon micromachining). The techniques for depositing and patterning thin films can be used to produce quite complex microstructures on the surface of a silicon wafer (surface silicon micromachining). Electrochemical etching techniques are being investigated to extend the set of basic silicon micromachining techniques. Silicon bonding techniques can also be used to extend the structures produced by silicon micromachining into multilayer structures.

Basic Techniques

There are three basic techniques associated with silicon micromachining:

  1. Deposition of thin films of materials.
  2. Removal of material by wet chemical etching.
  3. Removal of material by dry chemical etching.

Thin Films

There are a number of different techniques that facilitate the deposition or formation of very thin films of different materials on a silicon wafer. These films can then be patterned using photolithographic techniques and suitable etching techniques. Common materials include silicon dioxide, polycrystalline silicon and aluminium. A number of other materials can be deposited as thin films, including noble metals such as gold. Noble metals will contaminate microelectronic circuitry and cause it to fail, so any silicon wafers with noble metals on them have to be processed using equipment specially set aside for the purpose. Noble metal films are often patterned by a method known as "lift off" rather than by wet or dry etching.
  6. 6. Wet Etching

Wet etching is a blanket name that covers the removal of material by immersing the wafer in a liquid bath of chemical etchant. Wet etchants fall into two broad categories: isotropic etchants and anisotropic etchants. Isotropic etchants attack the material being etched at the same rate in all directions. Anisotropic etchants attack the silicon wafer at different rates in different directions, and so give more control over the shapes produced. Some etchants attack silicon at different rates depending on the concentration of impurities in the silicon.

Figure 3.2: Isotropic and anisotropic etching. [1]

Dry Etching

The most common form of dry etching for micromachining applications is reactive ion etching. Ions are accelerated towards the material to be etched, and the etching reaction is enhanced in the direction of travel of the ions. Reactive ion etching is an anisotropic etching technique. Deep trenches
  7. 7. and pits of arbitrary shape and with vertical walls can be etched in a variety of materials including silicon, oxide, and nitride.

Lift off

Lift off is a stencilling technique often used to pattern noble metal films. There are a number of different techniques. In one of them, a thin film of an assisting material (e.g. oxide) is deposited. A layer of resist is put over this and patterned as for photolithography, to expose the oxide in the pattern desired for the metal. The oxide is then wet etched so as to undercut the resist. The metal is then deposited on the wafer, typically by a process known as evaporation. The metal pattern is effectively stencilled through the gaps in the resist, which is then removed, lifting off the unwanted metal with it. The assisting layer is then stripped off too, leaving the metal pattern alone.

Excimer Laser Micromachining

Excimer lasers produce relatively wide beams of ultraviolet laser light. One interesting application of these lasers is the micromachining of organic materials (plastics, polymers, etc.). Unlike other types of laser, the excimer laser does not remove material by burning or vaporizing it, so the material adjacent to the machined area is not melted or distorted by heating effects. When machining organic materials the laser is pulsed on and off, removing material with each pulse. The amount of material removed depends on the material itself, the length of the pulse, and the intensity (fluence) of the laser light. Below a certain threshold fluence, which depends on the material, the laser light has no effect. As the fluence is increased above the threshold, the depth of material removed per pulse also increases. The depth of the cut can be accurately controlled by counting the number of pulses. Quite deep cuts (hundreds of microns) can be made using the excimer laser.
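The threshold behaviour and pulse counting described for excimer laser machining can be sketched numerically. The report gives no formula, so the snippet below assumes a commonly used logarithmic (Beer-Lambert style) ablation model, d = (1/α) ln(F/F_th); the function names and the material values (threshold, absorption coefficient) are hypothetical and chosen only for illustration.

```python
import math


def depth_per_pulse_um(fluence, threshold, absorption_per_um):
    """Etch depth per pulse in microns.

    Assumed model (not taken from the report):
    d = (1 / alpha) * ln(F / F_th) above threshold, zero otherwise.
    """
    if fluence <= threshold:
        return 0.0  # below threshold the light has no effect
    return math.log(fluence / threshold) / absorption_per_um


def pulses_for_depth(target_um, fluence, threshold, absorption_per_um):
    """Number of pulses needed to cut at least target_um deep."""
    d = depth_per_pulse_um(fluence, threshold, absorption_per_um)
    if d == 0.0:
        raise ValueError("fluence is below the ablation threshold")
    return math.ceil(target_um / d)


# Hypothetical polymer: threshold 50 mJ/cm^2, absorption 10 per micron.
THRESHOLD = 50.0
ALPHA = 10.0
for fluence in (40.0, 100.0, 400.0):
    print(fluence, depth_per_pulse_um(fluence, THRESHOLD, ALPHA))
print(pulses_for_depth(200.0, 400.0, THRESHOLD, ALPHA))
```

Under this assumed model a cut of a couple of hundred microns takes on the order of a thousand pulses, which is why counting pulses gives fine control over depth.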
  8. 8. Figure 3.3: Excimer laser micromachining. [1]

The shape of the structures produced is controlled by using a chrome-on-quartz mask, like the masks produced for photolithography. In the simplest system the mask is placed in contact with the material being machined, and the laser light is shone through it. A more sophisticated and versatile method involves projecting the image of the mask onto the material. Material is selectively removed where the laser light strikes it.
  9. 9. Chapter 3 RF MEMS

Micro-Electro-Mechanical Systems (MEMS), particularly those with radio frequency (RF) applications, have demonstrated significantly better performance than current electromechanical and solid-state technologies. Surface roughness and asperity micro contacts are critical factors that can affect contact behaviour at scales ranging from the nano to the micro in MEMS devices. One of the major objectives in the design of RF MEMS with metal contacts is to have repeatable and reliable electrical contacts. However, the complexity of the physical and mechanical interactions at micro contacts has made it extremely difficult to obtain accurate predictions of RF MEMS behaviour from which reliable devices with significantly improved life cycles could be designed. Validated modelling methods can provide MEMS switch designers with insights into the evolution of contact pressures, inelastic deformations, potential failure modes, and the microstructural behaviour of asperity micro contacts. Guidelines can then be incorporated into the design and fabrication process to size critical components and forces so as to provide stable contact resistance, and hence significantly improved device durability and performance.

Compound solid-state switches such as GaAs FETs and PIN diodes are widely used in microwave and integrated circuits (ICs) for telecommunications applications including signal routing, impedance matching networks, and adjustable gain amplifiers. However, these solid-state switches have a large insertion loss (typically 1 dB) in the on state and poor electrical isolation in the off state. Recent developments in micro-electromechanical systems (MEMS) have been continuously providing new and improved paradigms in the field of microwave applications. Micromachined miniature switches of various configurations have been reported.
Among these switches, capacitive membrane microwave switching devices offer lower insertion loss, higher isolation, better linearity and zero static power consumption.
  10. 10. Chapter 4 RF MEMS Switches

RF MEMS switches come in two basic configurations:

  • RF series contact switch
  • RF shunt capacitive switch

Currently, both series and shunt RF MEMS switch configurations are under development, the most common being series contact switches and capacitive shunt switches.

4.1. RF Series Contact Switch

An RF series switch operates by creating an open or a short in the transmission line, as shown in Figure 4.1. The basic structure of a MEMS series contact switch consists of a conductive beam suspended over a break in the transmission line. Application of a dc bias induces an electrostatic force on the beam, which lowers the beam across the gap, shorting together the open ends of the transmission line. Upon removal of the dc bias, the mechanical spring restoring force in the beam returns it to its suspended (up) position. Closed-circuit losses are low (dielectric and I²R losses in the transmission line and dc contacts), and the open-circuit isolation from the ~100 μm gap is very high through 40 GHz. Because it is a direct contact switch, it can be used in low frequency applications without compromising performance.
  11. 11. Figure 4.1: Circuit equivalent of RF MEMS series contact switch. [2]

4.2. RF Shunt Capacitive Switch

Figure 4.2: Circuit equivalent of RF MEMS shunt capacitive switch. [2]

A circuit representation of a capacitive shunt switch is shown in Figure 4.2. In this case, the RF signal is shorted to ground by a variable capacitor. Specifically, for RF MEMS capacitive shunt switches, a grounded beam is suspended over a dielectric pad on the transmission line. When the beam is in the up position, the capacitance of the line-dielectric-air-beam configuration is on the
  12. 12. order of 50 fF, which translates to a high impedance path to ground through the beam [Z_C = 1/(ωC)]. However, when a dc voltage is applied between the transmission line and the electrode, the induced electrostatic force pulls the beam down to be coplanar with the dielectric pad, raising the capacitance to pF levels. This reduces the impedance of the path through the beam for high frequency (RF) signals and shorts the RF to ground. Therefore, opposite to the operation of the series contact switch, the beam in the up position corresponds to a low-loss RF path to the output load, while the beam in the down position results in the RF being shunted to ground and no RF signal at the output load. While the shunt configuration allows hot-switching and gives better linearity and lower insertion loss than the MEMS series contact switch, the frequency dependence of the capacitive reactance restricts high quality performance to high RF signal frequencies (5-100 GHz), whereas the contact switch can be used from dc upwards.

Chapter 5 Switch Design and Operation

The geometry of a capacitive MEMS switch is shown in Fig. 5.1. The switch consists of a lower electrode fabricated on the surface of a glass wafer and a thin aluminium membrane suspended over the electrode. The membrane is connected directly to the grounds on either side of the electrode, while a thin dielectric layer covers the lower electrode. The air gap between the two conductors determines the switch off-capacitance. With no applied actuation potential, the residual tensile stress of the membrane keeps it suspended above the RF path. Application of a DC electrostatic field to the lower electrode causes positive and negative charges to form on the electrode and membrane conductor surfaces. These charges exert an attractive force which, when strong enough, causes the suspended metal membrane to snap down onto the lower electrode and dielectric surface, forming a low impedance RF path to ground.
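To put rough numbers on the reactance relation 1/(ωC) quoted in Section 4.2, the sketch below evaluates the impedance of the shunt path in both beam positions. The 50 fF up-state value is the one given in the text; the 3 pF down-state capacitance and the test frequencies are assumptions picked only to represent "pF levels" and a typical RF band.

```python
import math


def capacitive_reactance_ohm(freq_hz, cap_farad):
    """Magnitude of the impedance of an ideal capacitor, 1 / (omega * C)."""
    return 1.0 / (2.0 * math.pi * freq_hz * cap_farad)


C_UP = 50e-15    # up-state capacitance quoted in the text
C_DOWN = 3e-12   # hypothetical down-state capacitance ("pF levels")

for freq_hz in (100e6, 10e9):  # hypothetical low and high signal frequencies
    z_up = capacitive_reactance_ohm(freq_hz, C_UP)
    z_down = capacitive_reactance_ohm(freq_hz, C_DOWN)
    print(f"{freq_hz:.0e} Hz: up {z_up:.1f} ohm, down {z_down:.1f} ohm")
```

With these assumed values the down-state path is only a few ohms at 10 GHz but several hundred ohms at 100 MHz, which illustrates why the capacitive shunt switch is restricted to high signal frequencies while the contact switch works from dc.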
  13. 13. The switch is built on coplanar waveguide transmission lines, which have an impedance of 50 Ω that matches the impedance of the system. The width of the transmission line is 160 µm and the gap between the ground line and the signal line is 30 µm. The insertion loss is dominated by the resistive loss of the signal line and by the coupling between the signal line and the membrane when the membrane is in the up position. To minimize the resistive loss, a thick layer of metal needs to be used to build the transmission line. A thicker metal layer results in a bigger gap, which reduces the coupling between signal and ground yet also requires a higher voltage to actuate the switch. To achieve a reasonable actuation voltage, 4 µm thick copper is used for the transmission line. A glass wafer is chosen for the RF switch over a semi-conductive silicon substrate, since a typical silicon wafer is too lossy for RF signals.

When the membrane is in the down position, the electrical isolation of the switch depends mainly on the capacitive coupling between the signal line and the ground lines. The dielectric layer plays a key role in the electrical isolation: the thinner the dielectric layer and the smoother its surface, the better the isolation of the switch. But there is another trade-off here. When the membrane is pulled down, the bias voltage is applied directly across the dielectric layer. Since this layer is very thin, the electric field within it is very high. The thickness of the dielectric layer should be chosen such that the electric field never exceeds the breakdown field of the dielectric material. Silicon nitride film has a breakdown field as high as several megavolts per centimetre and can be used as the dc block dielectric layer. The thickness of the silicon nitride layer is chosen as 0.2 µm to serve both the dc block and RF coupling purposes.
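The dielectric thickness trade-off can be checked with simple arithmetic, treating the field in the nitride as the bias voltage divided by the layer thickness. The sketch below uses the 0.2 µm thickness from the text and the roughly 50 V actuation voltage the report quotes for this switch; the 5 MV/cm breakdown field is an assumed stand-in for "several megavolts per centimetre", and the function names are illustrative.

```python
def field_mv_per_cm(voltage_v, thickness_um):
    """Uniform field across the dielectric, in MV/cm (E = V / t)."""
    thickness_cm = thickness_um * 1e-4
    return voltage_v / thickness_cm / 1e6


def min_thickness_um(voltage_v, breakdown_mv_per_cm):
    """Thinnest layer (in microns) that keeps the field at or below breakdown."""
    return voltage_v / (breakdown_mv_per_cm * 1e6) * 1e4


ACTUATION_V = 50.0   # actuation voltage quoted later in the report
NITRIDE_UM = 0.2     # silicon nitride thickness from the text
BREAKDOWN = 5.0      # assumed breakdown field, MV/cm

print(field_mv_per_cm(ACTUATION_V, NITRIDE_UM))
print(min_thickness_um(ACTUATION_V, BREAKDOWN))
```

Under these assumptions the field in the 0.2 µm layer is a few MV/cm, below the assumed breakdown value, and halving the thickness would bring it right up to that limit.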
  14. 14. Figure 5.1: Capacitive RF MEMS switch (top and cross-sectional views). [4]

Fabrication

The switches were fabricated by surface micromachining techniques with a total of four masking levels. No critical overlay alignment was required. Fig. 5.2 shows the essential process steps:

1. Ti/Cu seed layer deposition: The starting substrate was a 2-inch glass wafer. A layer of titanium (0.05 µm) and copper (0.15 µm) was sputtered on the substrate as a seed layer for electroplating.
  15. 15. 2. Silicon nitride deposition: A layer of silicon nitride (0.2 µm) was deposited and patterned by reactive ion etching to form the DC block.

3. Copper electroplating: A photoresist layer was spin coated and patterned to define the electroplating area. Then a 4 µm thick copper layer was electroplated to define the coplanar waveguide and the posts for the membranes.

4. Aluminium deposition: A layer of aluminium (0.4 µm) was deposited by electron beam evaporation and patterned to form the top electrode of the actuation capacitor structure.

5. Release: The photoresist sacrificial layer was removed to finalize the switch structure.

The major characteristics of the switch are the insertion loss when signals pass through and the isolation when signals are rejected. In the off-state the RF signal passes underneath the membrane without much loss. In the on-state, a low impedance path exists between the central signal line and the coplanar waveguide grounds through the bent membrane, and the RF signal is reflected by the switch. A resonant frequency of 23.4 GHz was observed when the membrane was in the down position. This means that the switch can be equivalently modelled as a capacitor, an inductor and a resistor connected in series between the signal and ground lines. Since the switch has better isolation around the resonant frequency, its geometry can be adjusted so that the resonant frequency coincides with the desired operating frequency.

The actuation voltage of the MEMS switch is about 50 V. The spring constant of the membrane and the distance between the membrane and the bottom electrode determine the actuation voltage of the switch. The spring constant of the membrane is mainly determined by the membrane material properties, the membrane geometry, and the residual stress in the membrane.

CMOS-based monolithic MEMS technology has been proposed to solve many of these problems.
It consists of additional mask processing steps carried out after completion of the standard CMOS process flow. The goal is to minimize the issues caused by mechanical stresses in micromachined layers by supporting them with a patterned polyamide substrate, and at the same time to form thick conductors that lower the conductor losses. The enabling processing techniques are thick-film processing, stress compensation, and electroplating. The process starts with a standard CMOS process flow. The complementary masks are fabricated by an independent mask maker.
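Returning to the electrical and mechanical figures quoted for the switch above (a 23.4 GHz down-state resonance and an actuation voltage set by the spring constant and the gap), the sketch below shows how such numbers relate under two textbook idealizations that the report does not state explicitly: a lossless series LC resonance, f0 = 1/(2π√(LC)), and the parallel-plate pull-in voltage, V = √(8 k g0³ / (27 ε0 A)). The spring constant, gap, electrode area and down-state capacitance are hypothetical values, not the dimensions of the fabricated switch.

```python
import math

EPS0 = 8.854e-12  # permittivity of free space, F/m


def pull_in_voltage(spring_k, gap_m, area_m2):
    """Parallel-plate pull-in voltage, sqrt(8 k g0^3 / (27 eps0 A))."""
    return math.sqrt(8.0 * spring_k * gap_m ** 3 / (27.0 * EPS0 * area_m2))


def series_resonance_hz(inductance_h, cap_farad):
    """Resonant frequency of a series LC branch."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * cap_farad))


def inductance_for_resonance(freq_hz, cap_farad):
    """Inductance that resonates with cap_farad at freq_hz."""
    return 1.0 / ((2.0 * math.pi * freq_hz) ** 2 * cap_farad)


# Hypothetical membrane: k = 10 N/m, 3 um gap, 100 um x 100 um electrode.
print(pull_in_voltage(10.0, 3e-6, 100e-6 * 100e-6))

# Inductance implied by the observed 23.4 GHz resonance if the down-state
# capacitance were 3 pF (an assumed value).
print(inductance_for_resonance(23.4e9, 3e-12))
```

Because the pull-in voltage scales with the gap to the power 3/2 and with the square root of the spring constant, a larger gap or a stiffer (more highly stressed) membrane both raise the actuation voltage, which is the dependence the text describes.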
  16. 16. Figure 5.2: Process flow. (a) Seed layer deposition; (b) dielectric layer deposition and patterning; (c) spacer coating and patterning; (d) transmission line electroplating; (e) membrane deposition and patterning; (f) membrane release. [1]
  17. 17. Chapter 6 General Reliability Concerns

6.1. Metal Contact Resistance (Series Contact Switches)

Series contact switches tend to fail in the open-circuit state with wear. Even though the bridge still collapses and makes contact with the transmission line, the conductivity of the contact metallization area decreases until unacceptable levels of power loss are reached. This increase in the resistivity of the metal contact layer with cycling may be attributed to frictional wear, pitting, hardening, non-conductive skin formation, and/or contamination of the metal. Pitting and hardening can be reduced by decreasing the contact force during actuation, but tailoring the design to minimize the effect involves balancing operational conditions (contact force, current, and temperature), plastic deformation properties, the metal deposition method, and the mechanical design of the switch. In other cases, the resistivity of the contact increases with use due to the formation of a thin dielectric layer on the surface of the metal. While this has been documented, the underlying physical mechanisms are not currently well understood. As the RF power level is raised above 100 mW, these failures are exacerbated by the increased temperature at the contact area and, under hot-switching conditions, by arcing and microwelding between the metal layers.

6.2. Dielectric Breakdown (Shunt Capacitive Switches)

Shunt capacitive switches often fail due to charge trapping, both at the surface and in the bulk states of the dielectric. Surface charge transfer from the beam to the dielectric surface results in the bridge getting stuck in the up position (increased actuation voltage). Bulk charge trapping, on
  18. 18. the other hand, creates image charges in the bridge metallization and increases the holding force of the bridge to a value above its spring restoring force. Several actions can be taken in the design phase to mitigate dielectric charging, including choosing a better dielectric material and designing peripheral pull-down electrodes to decouple the actuation from the dielectric behaviour at the contact. Unlike series contact switches, capacitive shunt switches do not experience hard failures at RF power levels above 100 mW, as long as the bridge contact metallization is thick enough to handle the high current densities. However, RF power may be limited in some cases by a recoverable failure, self-actuation. While not yet fully understood, it has been observed that a capacitive shunt switch will self-actuate at 4 W of RF power and experience latch-up (stuck in the down position) in hot-switching mode at 500 mW. Even though these "failures" are recoverable (the switch operates normally once the RF power is decreased below the latch-up value of 500 mW), they are still a lifetime consideration for high power applications.

6.3. Radiation and Other Effects

Some areas of RF MEMS reliability research have not been investigated in detail and are in need of immediate attention. For example, RF MEMS series contact switches were thought to be immune to radiation effects, but design-dependent charge separation effects have been seen in the pull-down electrode dielectric material, which noticeably decrease the actuation voltage of the device. This immediately raises the question of how radiation effects will accelerate the dielectric material failure mechanisms of capacitive switches, which have known dielectric failure mechanisms, or of other series switches that use dielectric material in their electrode structures.
  19. 19. Chapter 7 Comparison of MEMS Switches with Solid State Switches

RF switches are used in a wide array of commercial, aerospace, and defence application areas, including satellite communications systems, wireless communications systems, instrumentation, and radar systems. In order to choose an appropriate RF switch for each of these scenarios, one must first consider the required performance specifications, such as frequency bandwidth, linearity, power handling, power consumption, switching speed, signal level, and allowable losses. Traditional electromechanical switches, such as waveguide and coaxial switches, show low insertion loss, high isolation, and good power handling capabilities but are power-hungry, slow, and unreliable for long-life applications. Current solid-state RF technologies (PIN diode and FET based) are used for their high switching speeds, commercial availability, low cost, and ruggedness. Their inherited technology maturity ensures a broad base of expertise across the industry, spanning device design, fabrication, packaging, and applications system insertion, and consequently high reliability and well-characterized performance assurance. Some parameters, such as isolation, insertion loss, and power handling, can be adjusted via device design to suit many application needs, but at a performance cost elsewhere. For example, some commercially available RF switches can support high power handling, but require large, massive packages and
  20. 20. high power consumption. Table 7.1 shows a comparison of MEMS, PIN diode and FET switch parameters.

Table 7.1: Comparison of MEMS switches with solid-state switches.

Parameter                 RF MEMS     PIN diode   FET
Voltage (V)               20-80       3-5         3-5
Current (mA)              0           0-20        0
Power consumption (mW)    0.5-1       5-100       -0.5-0.1
Switching time            1-300 µs    1-100 ns    1-100 ns
Power handling (W)        <1          <10         <10

In spite of this design flexibility, two major areas of concern with solid-state switches persist: breakdown of linearity and upper limits on frequency bandwidth. When operating at high RF power, nonlinear switch behaviour leads to spectral regrowth, which smears the energy outside its allocated frequency band and causes adjacent channel power violations as well as signal-to-noise problems. The other strong driver for pursuing new RF technologies is the fundamental degradation of insertion loss and isolation at signal frequencies above 1-2 GHz.

By using an electromechanical architecture on a miniature (or micro) scale, RF MEMS switches combine the advantages of traditional electromechanical switches (low insertion loss, high isolation, and extremely high linearity) with those of solid-state switches (low power consumption, low mass, long lifetime). On the other hand, RF MEMS switches are slower and have lower power handling capabilities. Their advantages, together with the potential for highly reliable long-lifetime operation, make RF MEMS switches a promising solution to existing low-power RF technology limitations.
  21. 21. Chapter 8 Advantages of MEMS

There are many advantages to using MEMS rather than ordinary large scale machinery.

  • Ease of production.
  • MEMS can be mass-produced and are inexpensive to make.
  • Ease of parts alteration.
  • Higher reliability than their macro scale counterparts.
  • IC technology used: multiple and more complex functions are integrated on a chip to form monolithic systems. Miniaturization with no loss of functionality and improved performance.
  • Basic fabrication: reduced manufacturing cost and time.
  • Micro components make the system faster, more reliable, more portable, lower in power consumption, easily and massively employed, and easily maintained and replaced.
  • Easy to integrate into systems and modify.
  22. 22.
  • Little harm to the environment.

Disadvantages of MEMS

  • Due to their size, it is physically impossible for MEMS to transfer any significant power.
  • MEMS are made of poly-Si (a brittle material), so they cannot be loaded with large forces.
  • Standard IC packaging cannot be used because of the moving parts in the MEMS structure.
  • Many standard production steps that improve the mechanical structure degrade the electronics, and vice versa.
  • Standard design software is unavailable.

Chapter 9 Application of MEMS

  • Inertial navigation units on a chip for munitions guidance and personal navigation.
  • Electromechanical signal processing for ultra-small and ultra low-power wireless communications.
  • Distributed unattended sensors for asset tracking, environmental monitoring, and security surveillance.
  • Integrated fluidic systems for miniature analytical instruments, propellant, and combustion control.
  • Weapons safing, arming, and fusing.
  • Embedded sensors and actuators for condition-based maintenance.
  • Mass data storage devices for high density and low power.
  23. 23. Chapter 10 Conclusion

Low power consumption, low insertion loss, high isolation, excellent linearity and the ability to be integrated with other electronics all make MEMS switches an attractive alternative to mechanical and solid-state switches. These switches will have applications in phased antenna arrays, in MEMS impedance matching networks and in communications. MEMS are going to be the future of the modern technical field, driving the growth of microsensor-based applications in the automotive industry, wireless communication, security systems, biomedical instrumentation and the armed forces.
  24. 24. RF Micro-Electro-Mechanical Systems (MEMS) technology has proven to be one of the most valuable technologies for low-loss, low-power microwave components and systems in telecommunications. Developments in this technology have made possible the design and fabrication of control devices suitable for switching microwave signals. Furthermore, RF MEMS switches offer superior performance, such as high isolation, low insertion loss, and low power consumption, compared to conventional FETs or PIN diodes. MEMS is an emerging technology which uses the tools and techniques developed for the IC industry to build microscopic machines on standard silicon wafers. In summary, RF MEMS offers a low-cost, high-performance technology compatible with CMOS and high-voltage devices. High-performance RF MEMS switches, high-voltage MOSFETs, and CMOS devices have all been integrated on the same chip.

References

[1] Sazzadur Choudhury, M. Ahmadi, and W. C. Miller, "Micromechanical system for System-on-Chip Connectivity", IEEE Circuits and Systems, pp. 112-132, September 2002.

[2] J. B. Muldavin and G. M. Rebeiz, "High Isolation RF MEMS Shunt Switches - Part 2: Design", IEEE Trans. on Microwave Theory and Techniques, Vol. 6, pp. 253-276, June 2000.
  25. 25. [3] P. Osterberg, H. Yie, X. Cai, J. White, and S. Senturia, "Self-consistent simulation and modeling of electrostatically deformed diaphragms", in Proc. IEEE MEMS Conf., pp. 28-32, January 1994.

[4] A. Gopinath and J. B. Ranklin, "GaAs FET RF switches", IEEE Transactions on Electron Devices, vol. 12, pp. 18-37, August 2003.

/-----------------------------------------------------------------------------------------------

Formant Extraction and Speech Recognition

Formant features can be interpreted as adaptive non-uniform samples of the signal spectrum that are located at the resonance frequencies of the vocal tract and normally have higher signal-to-noise ratios than the other parts of the spectrum. The number and position of these frequencies along the frequency axis may differ depending on the phoneme and on the position of the window within the phoneme (i.e. the beginning or ending part of a phoneme). Along with the formants (the resonance frequencies), we might use the bandwidth and/or the magnitude of the spectrum at each of these frequencies to encode the properties of the speech and use them in different applications such as speech recognition, enhancement, noise reduction, hearing aid adaptive filters, etc.

There are several methods of formant extraction, such as peak picking, HMM2 and LP model pole extraction. The main method used in this work is LP model pole extraction combined with a rule-based method for pole refinement. This method unexpectedly results in high recognition rates for unvoiced phonemes, which do not have any formants at all.

Figure 1. The LP model of a signal and the segment of the signal in the time domain. A) The LP model frequency response, where the + and * marks correspond to the positions of the formants along the frequency axis. B) In the time domain we can see that there is a kind of periodicity in the signal.

Figure 1 illustrates the frequency spectrum of the LP model of a segment of a speech signal.
The LP model is the Linear Prediction model, in which it is assumed that the signal is predictable from a limited number P of its past values:

$$x(m) = \sum_{k=1}^{P} a_k \, x(m-k) + e(m)$$
  26. 26. where the a_k are the Linear Prediction Coefficients (LPC), e(m) is the prediction error and x(m) is the signal. In the z domain, with P the LP order,

$$X(z) = \frac{E(z)}{1 - \sum_{k=1}^{P} a_k z^{-k}}$$

or

$$H_{LP}(z) = \frac{X(z)}{E(z)} = \frac{1}{1 - \sum_{k=1}^{P} a_k z^{-k}}$$

This filter is an all-pole filter, since the numerator of its transfer function is a constant. The input to the system is the error function, which can be interpreted as the unpredictable part of the signal, or the excitation that drives the all-pole system.

The characteristics of a speech signal vary with time, since it is a sequence of different phonemes with different frequency characteristics combined with pauses and periods of silence. To extract these characteristics we need to chop the signal into segments that are more stationary and have some predictable behaviour across time and frequency. These segments should, however, overlap to avoid the effect of discontinuity. To extract the formant trajectory of the signal we first need to chop the signal into these overlapping segments and pre-process them. The pre-processing is basically windowing. To window a segment is to multiply it by another segment of the same length that usually has its maximum in the middle and tapers smoothly towards both ends. This minimizes the effect that chopping the signal has at the edges. Linear prediction is then applied to each windowed segment, which gives a set of linear prediction coefficients per segment in the formulation of the equations above.

The frequencies of the complex poles of H_LP(z) are the candidate frequencies for formants, since the poles of a system model its resonances. The bandwidth of the formants and the magnitude of the LP model are two other features usually extracted and used in speech processing. If z_1 is a complex pole of H_LP(z), then the features of that pole are calculated using:

$$F = \frac{f_s}{2\pi} \arg(z_1), \qquad BW = -\frac{f_s}{\pi} \ln |z_1|, \qquad M = \left| H_{LP}\!\left(e^{j 2\pi F / f_s}\right) \right|$$

where f_s is the sampling frequency, F is the formant frequency, BW is the 3 dB bandwidth of the spectrum at that frequency and M is the magnitude of the spectrum at that frequency.
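The segmentation, windowing and pole extraction described above can be sketched as follows. This is a minimal illustration, not the report's own program: the function names `split_frames`, `lpc` and `pole_features` are hypothetical, the autocorrelation method is assumed for estimating the LP coefficients (the text does not say which estimator was used), and the magnitude feature M is left out.

```python
import numpy as np


def split_frames(signal, frame_len, hop):
    """Cut a signal into overlapping frames and Hamming-window each one."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    return np.stack([signal[i * hop:i * hop + frame_len] * window
                     for i in range(n_frames)])


def lpc(frame, order):
    """LP coefficients a_1..a_P with x(m) ~ sum_k a_k x(m-k).

    Assumption: autocorrelation method (solve the Toeplitz normal equations).
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz matrix of lags, R[i, j] = r(|i - j|)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def pole_features(a, fs):
    """Frequency and 3 dB bandwidth (Hz) of the complex poles of 1 / (1 - sum a_k z^-k)."""
    # The poles are the roots of z^P - a_1 z^(P-1) - ... - a_P
    poles = np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))
    # Drop real poles and keep one pole of each conjugate pair
    poles = poles[np.imag(poles) > 0]
    freqs = np.angle(poles) * fs / (2 * np.pi)
    bws = -np.log(np.abs(poles)) * fs / np.pi
    idx = np.argsort(freqs)
    return freqs[idx], bws[idx]


if __name__ == "__main__":
    fs = 16000
    noise = np.random.default_rng(0).standard_normal(fs)  # stand-in for speech
    frames = split_frames(noise, frame_len=400, hop=160)   # 25 ms frames, 10 ms hop
    freqs, bws = pole_features(lpc(frames[0], order=11), fs)
    print(freqs, bws)
```

With an odd order such as 11, at least one pole is real and is discarded, so at most five candidate frequencies come out of each frame.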
The effect of noise can be measured at different SNRs for each phoneme. This can be done using labelled speech signals in which the boundaries of the phonemes are given or calculated.
  27. 27. Figure 2. The effect of noise on the distribution of the pole frequencies.

Figure 2 illustrates the effect of train noise at SNR = 0 dB on the distribution of the pole frequencies. The red (dashed) curve is the histogram of the pole frequencies of different phonemes in 0 dB noise and the blue (solid) curve is that of the clean signal. The data were extracted from 130 sentences uttered by an American male speaker. The train noise was recorded in a real situation on a train in London with a sampling frequency of 8 kHz. The spectrum of the noise is illustrated in Figure 3.

Figure 3. The spectrum of the train noise (fs = 8000 Hz).

Formant Tracking in Noise

The next task is to actually track the formants, whether in noisy or clean conditions. So far we have extracted the poles of the LP model of the overlapping segments of the speech signal and calculated their bandwidths and frequencies. These are the candidates from which the formant tracks are formed. However, other criteria are used to refine these candidates and find the desired tracks. These criteria include limits on frequency and bandwidth as well as continuity. This makes the method a rule-based algorithm for formant extraction; the rules are based on our knowledge of speech signals and formants. The method used in this work is a variable LP order rule-based method, which is discussed below.

Variable LP Order Rule Based Formant Tracking

Figure 4 illustrates the block diagram of the program's modules and their interrelation. First the speech signal enters the pre-processing module, where pre-emphasis may optionally be applied. The signal is chopped into overlapping segments of length 25 ms. These segments
  28. 28. usually overlap each other by 15 ms. The window used in this method is a Hamming window of the same length as the segments (400 samples at 16 kHz).

Figure 4. Variable LP Order Formant Tracker

After a segment has been pre-processed, its LP coefficients are calculated. The primary LP order is set to 11 or 13 so that we usually have 5 pairs of complex poles, which represent the resonances of the system. The poles are then sorted by frequency; the real-valued poles are eliminated and only one pole of each complex-conjugate pair is kept, since both represent the same frequency and bandwidth. The set of frequencies and bandwidths then goes through the rule-based refinement, where some of them might be eliminated according to the criteria used.

The first criterion is the maximum frequency, which sets the highest frequency allowed for the last formant. Any pole with a higher frequency is eliminated. One simple value that might be used is the formant number × 1000 Hz, e.g. the fourth formant has a maximum of 4000 Hz.

The other criterion is the bandwidth limitation, which limits the bandwidth of the poles to a value determined by the behaviour of speech formants. This criterion is set to avoid poles with large bandwidths, which normally do not represent formants. The threshold is set to 600 Hz in the program, so poles with bandwidths larger than 600 Hz are eliminated. Such poles might be due to the noise in that segment: although noise does not normally represent a resonance, it might be modelled with one or more poles centred in a relatively wide frequency range where most of the noise energy is concentrated. This criterion might also help to detect two poles that are so close to each other that they have merged into one single pole with a relatively large bandwidth.
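The two elimination rules above can be sketched in a few lines. The function name `refine_candidates` and its argument names are hypothetical; the limits (formant number × 1000 Hz and 600 Hz) are the values quoted in the text.

```python
import numpy as np


def refine_candidates(freqs, bws, n_formants=4, max_bw=600.0):
    """Drop poles above the maximum frequency or wider than the bandwidth limit."""
    freqs = np.asarray(freqs, dtype=float)
    bws = np.asarray(bws, dtype=float)
    # Maximum frequency allowed for the last formant: formant number x 1000 Hz
    max_freq = 1000.0 * n_formants
    # Poles with a higher frequency or a larger bandwidth are eliminated
    keep = (freqs <= max_freq) & (bws <= max_bw)
    return freqs[keep], bws[keep]


if __name__ == "__main__":
    f, b = refine_candidates([500, 1500, 2500, 3900, 4500],
                             [80, 700, 120, 600, 100])
    print(f, b)
```

The function expects the frequencies and bandwidths of the complex poles of one segment, already sorted by frequency, and preserves that order.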
The first question one might ask is: if a pole that results from the merging of two poles is eliminated, how can the two merged poles be brought back into the calculation? The answer to this question is the reason we use variable LP order formant tracking. After the rule-based refinement/elimination of the poles we might end up with only a few poles, which might not be sufficient for the rest of the process if we are to have a fixed, pre-determined number of formants. For example, with a primary order of 13 we might have a maximum of 6 candidate poles. Now suppose we eliminate 3 of these poles in the refinement process while we need at least 4 poles (to extract 4 formants). This might cause a discontinuity in the tracks, i.e. we might have F2, F3 and F4 but not F1. This, in turn, might not be desirable for some applications such as recognition tasks. To avoid such discontinuities we need enough candidates for each segment. Hence, after the refinement process the number of candidates is checked and, if it is less than the number of formants needed, the LP order is increased and the LP pole extraction is repeated. This loop continues until a sufficient number of poles is obtained. We could increase the LP order by 2 each time, so that one more pole pair is expected each time. However, this might force the system to have at least one real
  29. 29. pole for every segment (assuming an odd primary order), which itself imposes a low-pass property on the speech segment. This might be desirable for voiced phonemes, but since for recognition tasks the system needs to model unvoiced phonemes as well as voiced ones, the order is increased by one each time. As described in the next section, this model is capable of modelling the consonants (for recognition) even better than the voiced phonemes, which are expected to have formants.

The number of poles after refinement might be larger than the number of formants to be extracted. In this case a continuity criterion is used to choose between the different combinations or sets of candidates. The continuity rule is based on the Euclidean distance (2-norm distance) between each set and the previously chosen formant set. After the candidates are chosen, every possible combination of them is considered. Then the distance of each candidate set from the previously chosen set is calculated, and the one with the minimum distance is chosen as the next formant set. If we are to extract n formants for each segment, the equation below might be used as the continuity criterion to choose the set closest to the previous one:

$$F_i = \arg\min_{C_k} \left\lVert C_k - F_{i-1} \right\rVert_2 = \arg\min_{C_k} \sqrt{\sum_{j=1}^{n} \left( C_k(j) - F_{i-1}(j) \right)^2}$$

where C_k is the kth candidate set and F_i is the ith formant set chosen. The initial condition for the above recursive equation can be set to the mean values of the formants.

After the next set of formants is chosen, the corresponding bandwidths and magnitudes are also found and appended to the formant features. The bandwidth and magnitude of each formant can be calculated using the pole feature equations above. This is done, however, only when the features are meant to be used for recognition.

Figure 5 illustrates a sample formant tracking task done on a sentence uttered by a male speaker to track five formants. There is a period of silence at the beginning and the end of the signal.
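The order-increase loop and the continuity rule described above can be sketched as follows. The names are hypothetical: `extract_candidates` stands for whatever routine returns the refined candidate frequencies of the current segment for a given LP order, and `max_order` is an assumed safety cap that the text does not mention.

```python
from itertools import combinations

import numpy as np


def grow_order_until_enough(extract_candidates, n_formants,
                            start_order=13, max_order=30):
    """Raise the LP order by one until enough refined candidates survive.

    extract_candidates(order) is a hypothetical callable; max_order is an
    assumed cap so that the loop always terminates.
    """
    order = start_order
    candidates = extract_candidates(order)
    while len(candidates) < n_formants and order < max_order:
        order += 1
        candidates = extract_candidates(order)
    return order, candidates


def pick_closest_set(candidates, previous, n_formants):
    """Pick the n-formant combination closest (2-norm) to the previous formant set."""
    previous = np.asarray(previous, dtype=float)
    best, best_dist = None, np.inf
    # Combinations of the sorted candidates keep the formants in ascending order
    for combo in combinations(sorted(candidates), n_formants):
        combo = np.asarray(combo, dtype=float)
        dist = np.linalg.norm(combo - previous)
        if dist < best_dist:
            best, best_dist = combo, dist
    return best


if __name__ == "__main__":
    previous = [500, 1500, 2500, 3500]  # e.g. mean formant values as the initial set
    print(pick_closest_set([480, 900, 1500, 2600, 3400], previous, 4))
```

Enumerating every combination is affordable here because each segment yields only a handful of candidates.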
The initial silence lasts about 60 frames, during which the tracks fluctuate strongly. Focusing on F1, we observe that the track is quite stable during most of the signal, but at some instants it jumps suddenly. These jumps correspond to the unvoiced phonemes, where the signal is high-pass.

Figure 5. Sample formant tracks of a clean signal superimposed over its LP spectrogram

Finally, Kalman filters are used to correct (smooth) the tracks. The experiments show that using Kalman filters improves the tracking in noisy conditions. The feature vectors for
  30. 30. recognition contain formant frequencies, bandwidths and spectrum magnitudes (and also the delta and delta-delta values). A sample implementation of this method in Matlab can be found here. Some results of the recognition and tracking are summarized below:

Figure 6. Recognition rate using MFCC and formant features with and without energy components

Figure 7. The overall error percentage of formant tracks using different tracking methods

Figure 8. The recognition rates using dynamic and actual values of the formants
  31. 31. Figure 9. The recognition rate of the consonants using the same method of extracting the features as formants
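The Kalman smoothing of the formant tracks mentioned earlier is not specified in detail in the text. The sketch below is therefore only one possible reading: it assumes a scalar random-walk state model per formant track, and the function name `kalman_smooth_track` and both variance values are hand-picked assumptions rather than the report's settings.

```python
import numpy as np


def kalman_smooth_track(track, process_var=2500.0, meas_var=40000.0):
    """Causal scalar Kalman filter for one formant track (values in Hz).

    Assumed model: the true formant follows a random walk with variance
    process_var per frame and is observed with noise variance meas_var.
    """
    track = np.asarray(track, dtype=float)
    est = np.empty_like(track)
    x, p = track[0], meas_var          # initial state and its uncertainty
    est[0] = x
    for i in range(1, len(track)):
        p = p + process_var            # predict: uncertainty grows
        k = p / (p + meas_var)         # Kalman gain, between 0 and 1
        x = x + k * (track[i] - x)     # update with the new measurement
        p = (1.0 - k) * p
        est[i] = x
    return est


if __name__ == "__main__":
    f1 = np.full(10, 500.0)
    f1[5] = 2500.0                     # a sudden jump in the raw track
    print(kalman_smooth_track(f1))
```

A larger `meas_var` relative to `process_var` suppresses sudden jumps more strongly, at the cost of following genuine formant movements more slowly.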