DAFX - Digital Audio Effects
Udo Zölzer, Editor
University of the Federal Armed Forces, Hamburg, Germany

Xavier Amatriain, Pompeu Fabra University, Barcelona, Spain
Daniel Arfib, CNRS - Laboratoire de Mécanique et d'Acoustique, Marseille, France
Jordi Bonada, Pompeu Fabra University, Barcelona, Spain
Giovanni De Poli, University of Padova, Italy
Pierre Dutilleux, Lübeck, Germany
Gianpaolo Evangelista, Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland
Florian Keiler, University of the Federal Armed Forces, Hamburg, Germany
Alex Loscos, Pompeu Fabra University, Barcelona, Spain
Davide Rocchesso, University of Verona, Italy
Mark Sandler, Queen Mary, University of London, United Kingdom
Xavier Serra, Pompeu Fabra University, Barcelona, Spain
Todor Todoroff, Brussels, Belgium

John Wiley & Sons, Ltd
Copyright © 2002 by John Wiley & Sons, Ltd
Baffins Lane, Chichester, West Sussex, PO19 1UD, England
National 01243 779777
International (+44) 1243 779777
e-mail (for orders and customer service enquiries): cs-books@wiley.co.uk
Visit our Home Page on http://www.wiley.co.uk or http://www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency, 90 Tottenham Court Road, London, W1P 0LP, UK, without the permission in writing of the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the publication.

Neither the authors nor John Wiley & Sons, Ltd accept any responsibility or liability for loss or damage occasioned to any person or property through using the material, instructions, methods or ideas contained herein, or acting or refraining from acting as a result of such use. The authors and Publisher expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on the authors or Publisher to correct any errors or defects in the software.

Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Ltd is aware of a claim, the product names appear in initial capital or capital letters.
Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

Other Wiley Editorial Offices
John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, USA
WILEY-VCH Verlag GmbH, Pappelallee 3, D-69469 Weinheim, Germany
John Wiley & Sons Australia, Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Canada) Ltd, 22 Worcester Road, Rexdale, Ontario, M9W 1L1, Canada
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0 471 49078 4

Produced from PostScript files supplied by the author. Printed and bound in Great Britain by Biddles Ltd, Guildford and King's Lynn. This book is printed on acid-free paper responsibly manufactured from sustainable forestry, in which at least two trees are planted for each one used for paper production.
Preface

DAFX is a synonym for digital audio effects. It is also the name for a European research project for co-operation and scientific transfer, namely EU-COST-G6 "Digital Audio Effects" (1997-2001). It was initiated by Daniel Arfib (CNRS, Marseille). In the past couple of years we have had four EU-sponsored international workshops/conferences on DAFX, namely, in Barcelona (DAFX-98), Trondheim (DAFX-99), Verona (DAFX-00), and Limerick (DAFX-01). A variety of DAFX topics have been presented by international participants at these conferences. The papers can be found on the corresponding web sites.

This book not only reflects these conferences and workshops, it is intended as a profound collection and presentation of the main fields of digital audio effects. The contents and structure of the book were prepared by a special book work group and discussed in several workshops over the past years sponsored by the EU-COST-G6 project. However, the single chapters are the individual work of the respective authors.

Chapter 1 gives an introduction to digital signal processing and shows software implementations with the MATLAB programming tool. Chapter 2 discusses digital filters for shaping the audio spectrum and focuses on the main building blocks for this application. Chapter 3 introduces basic structures for delays and delay-based audio effects. In Chapter 4 modulators and demodulators are introduced and their applications to digital audio effects are demonstrated. The topic of nonlinear processing is the focus of Chapter 5. First, we discuss fundamentals of dynamics processing such as limiters, compressors/expanders and noise gates and then we introduce the basics of nonlinear processors for valve simulation, distortion, harmonic generators and exciters. Chapter 6 covers the wide field of spatial effects starting with basic effects, 3D for headphones and loudspeakers, reverberation and spatial enhancements.
Chapter 7 deals with time-segment processing and introduces techniques for variable speed replay, time stretching, pitch shifting, shuffling and granulation. In Chapter 8 we extend the time-domain processing of Chapters 2-7. We introduce the fundamental techniques for time-frequency processing, demonstrate several implementation schemes and illustrate the variety of effects possible in the two-dimensional time-frequency domain. Chapter 9 covers the field of source-filter processing where the audio signal is modeled as a source signal and a filter. We introduce three techniques for source-filter separation and show source-filter transformations leading to audio effects such as cross-synthesis, formant changing, spectral interpolation and pitch shifting with formant preservation. The end of this chapter covers feature extraction techniques. Chapter 10 deals with spectral processing where the audio signal is represented by spectral models such as sinusoids plus a residual signal. Techniques for analysis, higher-level feature analysis and synthesis are introduced and a variety of new audio effects based on these spectral models

[Conference web sites: http://www.iua.upf.es/dafx98, http://www.notam.uio.no/dafx99, http://profs.sci.univr.it/~dafx, http://www.csis.ul.ie/dafx01]
are discussed. Effect applications range from pitch transposition, vibrato, spectral shape shift, gender change to harmonizer and morphing effects. Chapter 11 deals with fundamental principles of time and frequency warping techniques for deforming the time and/or the frequency axis. Applications of these techniques are presented for pitch shifting of inharmonic sounds, the inharmonizer, extraction of excitation signals, morphing and classical effects. Chapter 12 deals with the control of effect processors ranging from general control techniques to control based on sound features and gestural interfaces. Finally, Chapter 13 illustrates new challenges of bitstream signal representations, shows the fundamental basics and introduces filtering concepts for bitstream signal processing. MATLAB implementations in several chapters of the book illustrate software implementations of DAFX algorithms. The MATLAB files can be found on the web site http://www.dafx.de.

I hope the reader will enjoy the presentation of the basic principles of DAFX in this book and will be motivated to explore DAFX with the help of our software implementations. The creativity of a DAFX designer can only grow or emerge if intuition and experimentation are combined with profound knowledge of physical and musical fundamentals. The implementation of DAFX in software needs some knowledge of digital signal processing and this is where this book may serve as a source of ideas and implementation details.

Acknowledgements

I would like to thank the authors for their contributions to the chapters and also the EU-COST-G6 delegates from all over Europe for their contributions during several meetings and especially Nicola Bernadini, Javier Casajus, Markus Erne, Mikael Fernstrom, Eric Feremans, Emmanuel Favreau, Alois Melka, Jøran Rudi, and Jan Tro. The book cover is based on a mapping of a time-frequency representation of a musical piece onto the globe by Jøran Rudi.
Jøran has also published a CD-ROM for making computer music, "DSP - Digital Sound Processing", which may serve as a good introduction to sound processing and DAFX. Thanks to Catja Schumann for her assistance in preparing drawings and formatting, Christopher Duxbury for proof-reading and Vincent Verfaille for comments and cleaning up the code lines of Chapters 8 to 10. I also express my gratitude to my staff members Udo Ahlvers, Manfred Chrobak, Florian Keiler, Harald Schorr, and Jörg Zeller at the UniBw Hamburg for providing assistance during the course of writing this book. Finally, I would like to thank Birgit Gruber, Ann-Marie Halligan, Laura Kempster, Susan Dunsmore, and Zoe Pinnock from John Wiley & Sons, Ltd for their patience and assistance.

My special thanks are directed to my wife Elke and our daughter Franziska.

Hamburg, March 2002
Udo Zölzer
List of Contributors

Xavier Amatriain was born in Barcelona in 1973. He studied Telecommunications Engineering at the UPC (Barcelona) where he graduated in 1999. In the same year he joined the Music Technology Group in the Audiovisual Institute (Pompeu Fabra University). He is currently a lecturer at the same university where he teaches Software Engineering and Audio Signal Processing and is also a PhD candidate. His past research activities include participation in the MPEG-7 development task force as well as projects dealing with synthesis control and audio analysis. He is currently involved in research in the fields of spectral analysis and the development of new schemes for content-based synthesis and transformations.

Daniel Arfib (born 1949) received his diploma as "ingénieur ECP" from the Ecole Centrale of Paris in 1971 and is a "docteur-ingénieur" (1977) and "docteur ès sciences" (1983) from the Université of Marseille II. After a few years in education or industry jobs, he has devoted his work to research, joining the CNRS (National Center for Scientific Research) in 1978 at the Laboratory of Mechanics and Acoustics (LMA) of Marseille (France). His main concern is to provide a combination of scientific and musical points of view on synthesis, transformation and interpretation of sounds using the computer as a tool, both as a researcher and a composer. As the chairman of the COST-G6 action named "Digital Audio Effects" he has been in the middle of a galaxy of researchers working on this subject. He also has a strong interest in the gesture and sound relationship, especially concerning creativity in musical systems.

Jordi Bonada studied Telecommunication Engineering at the Catalunya Polytechnic University of Barcelona (Spain) and graduated in 1997. In 1996 he joined the Music Technology Group of the Audiovisual Institute of the UPF as a researcher and developer in digital audio analysis and synthesis.
Since 1999 he has been a lecturer at the same university where he teaches Audio Signal Processing and is also a PhD candidate in Informatics and Digital Communication. He is currently involved in research in the fields of spectral signal processing, especially in audio time-scaling and voice synthesis and modeling.

Giovanni De Poli is an Associate Professor of Computer Science at the Department of Electronics and Informatics of the University of Padua, where he teaches "Data Structures and Algorithms" and "Processing Systems for Music". He is the Director of the Centro di Sonologia Computazionale (CSC) of the University of Padua. He is a member of the Executive Committee (ExCom) of the IEEE Computer Society Technical Committee on Computer Generated Music, member of the Board of Directors of AIMI (Associazione Italiana di Informatica Musicale), member of the Board of Directors of CIARM (Centro Interuniversitario di Acustica e Ricerca Musicale), member of the Scientific Committee of ACROE (Institut National Polytechnique Grenoble), and Associate Editor of the International Journal of New Music Research. His main research interests are in algorithms for sound synthesis and analysis, models for expressiveness in music, multimedia systems and human-computer interaction, and the preservation and restoration of audio documents. He is the author of several scientific international publications, and has served in
the Scientific Committees of international conferences. He is coeditor of the books Representations of Music Signals, MIT Press, 1991, and Musical Signal Processing, Swets & Zeitlinger, 1996. Systems and research developed in his lab have been exploited in collaboration with the digital musical instruments industry (GeneralMusic). He is the owner of patents on digital music instruments.

Pierre Dutilleux graduated in thermal engineering from the Ecole Nationale Supérieure des Techniques Industrielles et des Mines de Douai (ENSTIMD) in 1983 and in information processing from the Ecole Nationale Supérieure d'Electronique et de Radioélectricité de Grenoble (ENSERG) in 1985. He developed audio and musical applications for the Syter real-time audio processing system designed at INA-GRM by J.-F. Allouis. After developing a set of audio processing algorithms as well as implementing the first wavelet analyzer on a digital signal processor, he got a PhD in acoustics and computer music from the University of Aix-Marseille II in 1991 under the direction of J.-C. Risset. From 1991 through 2000 he worked as a research and development engineer at the ZKM (Center for Art and Media Technology) in Karlsruhe. There he planned computer and digital audio networks for a large digital audio studio complex, and he introduced live electronics and physical modeling as tools for musical production. He contributed to multimedia works with composers such as K. Furukawa and M. Maiguashca.
He designed and realized the AML (Architecture and Music Laboratory) as an interactive museum installation. He is a German delegate on the Digital Audio Effects (DAFX) project. He describes himself as a "digital musical instrument builder".

Gianpaolo Evangelista received the laurea in physics (summa cum laude) from the University of Naples, Italy, in 1984 and the MSc and PhD degrees in electrical engineering from the University of California, Irvine, in 1987 and 1990, respectively. Since 1998 he has been a Scientific Adjunct with the Laboratory for Audiovisual Communications, Swiss Federal Institute of Technology, Lausanne, Switzerland, on leave from the Department of Physical Sciences, University of Naples Federico II, which he joined in 1995 as a Research Associate. From 1985 to 1986, he worked at the Centre d'Etudes de Mathématique et Acoustique Musicale (CEMAMu/CNET), Paris, France, where he contributed to the development of a DSP-based sound synthesis system, and from 1991 to 1994, he was a Research Engineer at the Microgravity Advanced Research and Support (MARS) Center, Naples, where he was engaged in research in image processing applied to fluid motion analysis and material science. His interests include speech, music, and image processing; coding; wavelets; and multirate signal processing. Dr Evangelista was a recipient of the Fulbright fellowship.

Florian Keiler was born in Hamburg, Germany, in 1972. He studied electrical engineering at the Technical University Hamburg-Harburg. As part of the study he spent 5 months at the King's College London in 1998. There he carried out research on speech coding based on linear predictive coding (LPC). He obtained his Diplom-Ingenieur degree in 1999. He is currently working on a PhD degree at the University of the Federal Armed Forces in Hamburg. His main research field is near lossless and low-delay audio coding for a real-time implementation on a digital signal processor (DSP). He works also on musical aspects and audio effects related
to LPC and high-resolution spectral analysis.

Alex Loscos received the BSc and MSc degrees in Telecommunication Engineering from Catalunya Polytechnic University, Barcelona, Spain, in 1997 and 1999 respectively. He is currently a PhD candidate in Informatics and Digital Communication at the Pompeu Fabra University (UPF) of Barcelona. In 1997 he joined the Music Technology Group of the Audiovisual Institute of the UPF as a researcher and developer. In 1999 he became a member of the Technology Department of the UPF as lecturer and he is currently teaching and doing research in voice processing/recognition, digital audio analysis/synthesis and transformations, and statistical digital signal processing and modeling.

Davide Rocchesso received the Laurea in Ingegneria Elettronica and PhD degrees from the University of Padua, Italy, in 1992 and 1996, respectively. His PhD research involved the design of structures and algorithms based on feedback delay networks for sound processing applications. In 1994 and 1995, he was a Visiting Scholar at the Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, Stanford, CA. Since 1991, he has been collaborating with the Centro di Sonologia Computazionale (CSC), University of Padua as a Researcher and Live-Electronic Designer. Since 1998, he has been with the University of Verona, Italy, as an Assistant Professor. At the Dipartimento di Informatica of the University of Verona he coordinates the project "Sounding Object", funded by the European Commission within the framework of the Disappearing Computer initiative. His main interests are in audio signal processing, physical modeling, sound reverberation and spatialization, multimedia systems, and human-computer interaction.

Mark Sandler (born 1955) is Professor of Signal Processing at Queen Mary, University of London, where he moved in 2001 after 19 years at King's College London. He was founder and CEO of Insonify Ltd, an Internet Streaming Audio start-up, for 18 months.
Mark received the BSc and PhD degrees from the University of Essex, UK, in 1978 and 1984, respectively. He has published over 200 papers in journals and conferences. He is a Senior Member of IEEE, a Fellow of IEE and a Fellow of the Audio Engineering Society. He has worked in Digital Audio for over 20 years on a wide variety of topics including: Digital Power amplification; Drum Synthesis; Chaos and Fractals for Analysis and Synthesis; Digital EQ; Wavelet Audio Codecs; Sigma-Delta Modulation and Direct Stream Digital technologies; Broadcast Quality Speech Coding; Internet Audio Streaming; automatic music transcription; 3D sound reproduction; processing in the compression domain; high quality audio compression; non-linear dynamics; and time stretching. Living in London, he has a wife, Valerie, and 3 children, Rachel, Julian and Abigail, aged 9, 7 and 5 respectively. A great deal of his spare time is happily taken up playing with the children or playing cricket.

Xavier Serra (born in 1959) is the Director of the Audiovisual Institute (IUA) and the head of the Music Technology Group at the Pompeu Fabra University (UPF) in Barcelona, where he has been Associate Professor since 1994. He holds a Master degree in Music from Florida State University (1983), a PhD in Computer Music from Stanford University (1989) and has worked as Chief Engineer in Yamaha Music Technologies USA, Inc. His research interests are in sound analysis and synthesis for
music and other multimedia applications. Specifically, he is working with spectral models and their application to synthesis, processing, high quality coding, plus other music-related problems such as: sound source separation, performance analysis and content-based retrieval of audio.

Todor Todoroff (born in 1963) is an electrical engineer with a specialization in telecommunications. He received a First Prize in Electroacoustic Composition at the Royal Conservatory of Music in Brussels as well as a higher diploma in Electroacoustic Composition at the Royal Conservatory of Music in Mons in the class of Annette Vande Gorne. After having been a researcher in the field of speech processing at the Free University of Brussels, for 5 years he was head of the Computer Music Research at the Polytechnic Faculty in Mons (Belgium) where he developed real-time software tools for processing and spatialization of sounds aimed at electroacoustic music composers in collaboration with the Royal Conservatory of Music in Mons. He collaborated on many occasions with IRCAM where his computer tools were used by composers Emmanuel Nunes, Luca Francesconi and Joshua Fineberg. His programs were used in Mons by composers like Leo Kupper, Robert Normandeau and Annette Vande Gorne. He continues his research within ARTeM where he developed a sound spatialization audio matrix and interactive systems for sound installations and dance performances. He is co-founder and president of ARTeM (Art, Research, Technology & Music) and FeBeME (Belgian Federation of Electroacoustic Music), administrator of NICE and member of the Bureau of ICEM. He is a Belgian representative of the European COST-G6 Action "Digital Audio Effects". His electroacoustic music shows a special interest in multiphony and sound spatialization as well as in research into new forms of sound transformation.
He composes music for concert, film, video, dance, theater and sound installation.

Udo Zölzer was born in Arolsen, Germany, in 1958. He received the Diplom-Ingenieur degree in electrical engineering from the University of Paderborn in 1985, the Dr.-Ingenieur degree from the Technical University Hamburg-Harburg (TUHH) in 1989 and completed a habilitation in Communications Engineering at the TUHH in 1997. Since 1999 he has been a Professor and head of the Department of Signal Processing and Communications at the University of the Federal Armed Forces in Hamburg, Germany. His research interests are audio and video signal processing and communication. He has worked as a consultant for several companies in related fields. He is a member of the AES and the IEEE. In his free time he enjoys listening to music and playing the guitar and piano.
Chapter 1
Introduction
U. Zölzer

1.1 Digital Audio Effects DAFX with MATLAB

Audio effects are used by all individuals involved in the generation of musical signals and start with special playing techniques by musicians, merge to the use of special microphone techniques and migrate to effect processors for synthesizing, recording, production and broadcasting of musical signals. This book will cover several categories of sound or audio effects and their impact on sound modifications. Digital audio effects - as an acronym we use DAFX - are boxes or software tools with input audio signals or sounds which are modified according to some sound control parameters and deliver output signals or sounds (see Fig. 1.1). The input and output signals are monitored by loudspeakers or headphones and some kind of visual representation of the signal such as the time signal, the signal level and its spectrum. According to acoustical criteria the sound engineer or musician sets his control parameters for the sound effect he would like to achieve.

Figure 1.1 Digital audio effect and its control [Arf99].

Both input and output
signals are in digital format and represent analog audio signals. Modification of the sound characteristic of the input signal is the main goal of digital audio effects. The settings of the control parameters are often done by sound engineers, musicians or simply the music listener, but can also be part of the digital audio effect.

The aim of this book is the description of digital audio effects with regard to:

- physical and acoustical effect: we take a short look at the physical background and explanation. We describe analog means or devices which generate the sound effect.
- digital signal processing: we give a formal description of the underlying algorithm and show some implementation examples.
- musical applications: we point out some applications and give references to sound examples available on CD or on the WEB.

The physical and acoustical phenomena of digital audio effects will be presented at the beginning of each effect description, followed by an explanation of the signal processing techniques to achieve the effect and some musical applications and the control of effect parameters.

In this introductory chapter we next explain some simple basics of digital signal processing and then show how to write simulation software for audio effects processing with the MATLAB simulation tool[1] or freeware simulation tools[2]. MATLAB implementations of digital audio effects are a long way from running in real-time on a personal computer or allowing real-time control of its parameters.
Nevertheless the programming of signal processing algorithms, and in particular sound effect algorithms, with MATLAB is very easy and can be learned very quickly.

1.2 Fundamentals of Digital Signal Processing

The fundamentals of digital signal processing consist of the description of digital signals - in the context of this book we use digital audio signals - as a sequence of numbers with appropriate number representation and the description of digital systems, which are described by software algorithms to calculate an output sequence of numbers from an input sequence of numbers. The visual representation of digital systems is achieved by functional block diagram representation or signal flow graphs. We will focus on some simple basics as an introduction to the notation and refer the reader to the literature for an introduction to digital signal processing [ME93, Orf96, Zol97, MSY98, Mit01].

[1] http://www.mathworks.com
[2] http://www.octave.org
Figure 1.2 Sampling and quantizing by ADC, digital audio effects and reconstruction by DAC.

1.2.1 Digital Signals

The digital signal representation of an analog audio signal as a sequence of numbers is achieved by an analog-to-digital converter ADC. The ADC performs sampling of the amplitudes of the analog signal x(t) on an equidistant grid along the horizontal time axis and quantization of the amplitudes to fixed samples represented by numbers x(n) along the vertical amplitude axis (see Fig. 1.2). The samples are shown as vertical lines with dots on the top. The analog signal x(t) denotes the signal amplitude over continuous time t in μsec. Following the ADC, the digital (discrete-time and quantized amplitude) signal is represented by a sequence (stream) of samples x(n) represented by numbers over the discrete time index n. The time distance between two consecutive samples is termed sampling interval T (sampling period) and the reciprocal is the sampling frequency fs = 1/T (sampling rate). The sampling frequency reflects the number of samples per second in Hertz (Hz). According to the sampling theorem, it has to be chosen as twice the highest frequency fmax (signal bandwidth) contained in the analog signal, namely fs > 2·fmax. If we are forced to use a fixed sampling frequency fs, we have to make sure that the input signal we want to sample has a bandwidth according to fmax = fs/2. If not, we have to reject higher frequencies by filtering with a lowpass filter which passes all frequencies up to fmax. The digital signal is then passed to a DAFX box (digital system), which performs a simple multiplication of each sample by 0.5 to deliver the output signal y(n) = 0.5·x(n). This signal y(n) is then forwarded to a digital-to-analog converter DAC, which reconstructs the analog signal y(t).
The output signal y(t) has half the amplitude of the input signal x(t). The following M-file 1.1 may serve as an example for plotting digital signals as shown in Fig. 1.2.
M-file 1.1 (figure1_02.m)

subplot(2,4,1);
plot((0:96)*5,y(1:97));
title('x(t)');
axis([0 500 -0.05 0.05]);
xlabel('t in \mu sec \rightarrow');
subplot(2,4,2);
stem((0:24),ul(1:25),'.');
axis([0 25 -0.05 0.05]);
xlabel('n \rightarrow');
title('x(n)');
subplot(2,4,3);
stem((0:24),0.5*ul(1:25),'.');
axis([0 25 -0.05 0.05]);
xlabel('n \rightarrow');
title('y(n)');
subplot(2,4,4);
plot((0:96)*5,0.5*y(1:97));
axis([0 500 -0.05 0.05]);
xlabel('t in \mu sec \rightarrow');
title('y(t)');

Figure 1.3 Different time representations for digital audio signals.
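For readers without MATLAB, the sampling, quantization and weighting chain of Fig. 1.2 can also be sketched in a few lines of Python/NumPy. This is an added illustration, not part of the book's code; the 1 kHz test tone, the sampling frequency of 40 kHz and the amplitude 0.04 are assumptions chosen only to resemble the plots:

```python
import numpy as np

# Sample a 1 kHz tone at fs = 40 kHz (assumed values), quantize the
# amplitudes to a 16-bit grid and apply the DAFX box y(n) = 0.5*x(n).
fs = 40000.0                                  # sampling frequency in Hz
n = np.arange(25)                             # discrete time index n
x = 0.04 * np.cos(2 * np.pi * 1000 * n / fs)  # sampled amplitudes x(n)

w = 16                                        # word length in bits
Q = 2.0 ** -(w - 1)                           # quantization step size Q = 2^-(w-1)
xq = Q * np.round(x / Q)                      # x(n) rounded to the 16-bit grid

y = 0.5 * xq                                  # output samples y(n)

print(Q)                                      # 3.0517578125e-05
```

Rounding to the grid changes each sample by at most Q/2, which is why the quantized sequence is visually indistinguishable from the exact one at this scale.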
Figure 1.3 shows some digital signals to demonstrate different graphical representations (see M-file 1.2). The upper part shows 8000 samples, the middle part the first 1000 samples and the lower part shows the first 100 samples out of a digital audio signal. Only if the number of samples inside a figure is sufficiently low will the line with dot graphical representation be used for a digital signal.

M-file 1.2 (figure1_03.m)

[ul,FS,NBITS]=wavread('ton2.wav');
figure(1)
subplot(3,1,1);
plot(0:7999,ul(1:8000));
ylabel('x(n)');
subplot(3,1,2);
plot(0:999,ul(1:1000));
ylabel('x(n)');
subplot(3,1,3);
stem(0:99,ul(1:100),'.');
ylabel('x(n)');
xlabel('n \rightarrow');

Figure 1.4 Vertical and horizontal scale formats for digital audio signals.

Two different vertical scale formats for digital audio signals are shown in Fig. 1.4. The quantization of the amplitudes to fixed numbers in the range between -32768 ... 32767 is based on a 16-bit representation of the sample amplitudes which allows 2^16 quantized values in the range -2^15 ... 2^15 - 1. For a general w-bit representation the number range is -2^(w-1) ... 2^(w-1) - 1. This representation is called the integer number representation. If we divide all integer numbers by the maximum absolute value, for example 32768, we come to the normalized vertical scale in Fig. 1.4 which is in the range between -1 ... 1-Q. Q is the quantization step size and can be calculated by Q = 2^-(w-1), which leads to Q = 3.0518e-05 for w = 16. Figure
1.4 also displays the horizontal scale formats, namely the continuous-time axis, the discrete-time axis and the normalized discrete-time axis, which will be used normally. After this short description we can define a digital signal as a discrete-time and discrete-amplitude signal, which is formed by sampling an analog signal and by quantization of the amplitude onto a fixed number of amplitude values. The digital signal is represented by a sequence of numbers x(n). Reconstruction of analog signals can be performed by DACs. Further details of ADCs and DACs and the related theory can be found in the literature. For our discussion of digital audio effects this short introduction to digital signals is sufficient.

Signal processing algorithms usually process signals by either block processing or sample-by-sample processing. Examples for digital audio effects are presented in [Arf98]. For block processing, data are transferred to a memory buffer and then processed each time the buffer is filled with new data. Examples of such algorithms are fast Fourier transforms (FFTs) for spectra computations and fast convolution. In sample processing algorithms, each input sample is processed on a sample-by-sample basis.

A basic algorithm for weighting of a sound x(n) (see Fig. 1.2) by a constant factor a demonstrates a sample-by-sample processing (see M-file 1.3). The input signal is represented by a vector of numbers x(0), x(1), ..., x(length(x)-1).

M-file 1.3 (sbs_alg.m)

% Read input sound file into vector x(n) and sampling frequency FS
[x,FS]=wavread('input filename');
% Sample-by-sample algorithm y(n)=a*x(n)
for n=1:length(x),
  y(n)=a * x(n);
end;
% Write y(n) into output sound file with number of
% bits Nbits and sampling frequency FS
wavwrite(y,FS,Nbits,'output filename');

1.2.2 Spectrum Analysis of Digital Signals

The spectrum of a signal shows the distribution of energy over the frequency range. The upper part of Fig. 1.5 shows the spectrum of a short time slot of an analog audio signal.
The frequencies range up to 20 kHz. The sampling and quantization of the analog signal with sampling frequency of fs = 40 kHz lead to a corresponding digital signal. The spectrum of the digital signal of the same time slot is shown in the lower part of Fig. 1.5. The sampling operation leads to a replication of the baseband spectrum of the analog signal [Orf96]. The frequency contents from 0 Hz up to 20 kHz of the analog signal now also appear from 40 kHz up to 60 kHz and the folded version of it from 40 kHz down to 20 kHz. The replication of this first image of the baseband spectrum at 40 kHz will now also appear at integer multiples of the sampling frequency of fs = 40 kHz. But notice that the spectrum of the digital signal from 0 up to 20 kHz shows exactly the same shape as the spectrum of the
analog signal. The reconstruction of the analog signal out of the digital signal is achieved by simply lowpass filtering the digital signal, rejecting frequencies higher than fs/2 = 20 kHz. If we consider the spectrum of the digital signal in the lower part of Fig. 1.5 and if we reject all frequencies higher than 20 kHz, we come back to the spectrum of the analog signal in the upper part of the figure.

Figure 1.5 Spectra of analog and digital signals.

Discrete Fourier Transform

The spectrum of a digital signal can be computed by the discrete Fourier transform (DFT), which is given by

X(k) = DFT[x(n)] = Σ_{n=0}^{N-1} x(n) e^{-j2πnk/N},   k = 0, 1, ..., N-1.   (1.1)

The fast version of the above formula is called the fast Fourier transform (FFT). The FFT takes N consecutive samples out of the signal x(n) and performs a mathematical operation to yield N samples X(k) of the spectrum of the signal. Figure 1.6 demonstrates the results of a 16-point FFT applied to 16 samples of a cosine signal. The result is normalized by N according to X=abs(fft(x,N))/N;.

The N samples X(k) = X_R(k) + jX_I(k) are complex-valued with a real part X_R(k) and an imaginary part X_I(k), from which one can compute the absolute value

|X(k)| = sqrt( X_R^2(k) + X_I^2(k) ),   k = 0, 1, ..., N-1,   (1.2)

which is the magnitude spectrum, and the phase

φ(k) = arctan( X_I(k) / X_R(k) ),   k = 0, 1, ..., N-1,   (1.3)
Figure 1.6 Spectrum analysis with FFT algorithm: (a) digital cosine with N = 16 samples, (b) magnitude spectrum |X(k)| with N = 16 frequency samples and (c) magnitude spectrum |X(f)| from 0 Hz up to the sampling frequency fs = 40000 Hz.

which is the phase spectrum. Figure 1.6 also shows that the FFT algorithm leads to N equidistant frequency points which give N samples of the spectrum of the signal starting from 0 Hz in steps of fs/N up to (N-1)·fs/N. These frequency points are given by k·fs/N, where k is running from 0, 1, 2, ..., N-1. The magnitude spectrum |X(f)| is often plotted over a logarithmic amplitude scale according to 20·log10( |X(k)| / (N/2) ), which gives 0 dB for a sinusoid of maximum amplitude ±1. This normalization is equivalent to 20·log10( 2·|X(k)| / N ). Figure 1.7 shows this representation of the example from Fig. 1.6. Images of the baseband spectrum occur at the sampling frequency fs and multiples of fs. Therefore we see the original frequency at 5 kHz and in the first image spectrum the folded frequency fs - fcosine = (40000-5000) Hz = 35000 Hz. The following M-file 1.4 is used for the computation of Figures 1.6 and 1.7.

M-file 1.4 (figure1_06_07.m)
N=16;
x=cos(2*pi*2*(0:1:N-1)/N)';
Figure 1.7 Magnitude spectrum |X(f)| in dB from 0 Hz up to the sampling frequency fs = 40000 Hz.

figure(1)
subplot(3,1,1);stem(0:N-1,x,'.');
axis([-0.2 N -1.2 1.2]);
legend('Cosine signal x(n)');
ylabel('a)');
xlabel('n \rightarrow');
X=abs(fft(x,N))/N;
subplot(3,1,2);stem(0:N-1,X,'.');
axis([-0.2 N -0.1 1.1]);
legend('Magnitude spectrum |X(k)|');
ylabel('b)');
xlabel('k \rightarrow');
N=1024;
x=cos(2*pi*(2*1024/16)*(0:1:N-1)/N)';
FS=40000;
X=abs(fft(x,N))/N;
f=((0:N-1)/N)*FS;
subplot(3,1,3);plot(f,X);
axis([-0.2*44100/16 max(f) -0.1 1.1]);
legend('Magnitude spectrum |X(f)|');
ylabel('c)');
xlabel('f in Hz \rightarrow');
figure(2)
subplot(3,1,1);plot(f,20*log10(X./(0.5)));
axis([-0.2*44100/16 max(f) -45 20]);
legend('Magnitude spectrum |X(f)| in dB');
ylabel('|X(f)| in dB \rightarrow');
xlabel('f in Hz \rightarrow');
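The DFT sum (1.1) and the normalization used in M-file 1.4 can be cross-checked outside MATLAB. The following Python/NumPy sketch (an illustrative translation, not part of the book's code) evaluates the DFT sum directly for the 16-sample cosine, compares it against an FFT routine and shows the normalized peaks of 0.5:

```python
import numpy as np

N = 16
n = np.arange(N)
x = np.cos(2 * np.pi * 2 * n / N)   # cosine with 2 cycles in N samples

# Direct evaluation of the DFT sum X(k) = sum_n x(n) e^{-j*2*pi*n*k/N}
k = n.reshape(-1, 1)
X_direct = (x * np.exp(-2j * np.pi * n * k / N)).sum(axis=1)

# FFT, normalized by N as in M-file 1.4 (X=abs(fft(x,N))/N)
X_fft = np.fft.fft(x) / N
assert np.allclose(X_direct / N, X_fft)

# The cosine of maximum amplitude 1 produces two spectral lines of
# magnitude 0.5, at bins k = 2 and k = N-2.
print(np.abs(X_fft)[2], np.abs(X_fft)[N - 2])
```

With the dB normalization 20·log10(|X(k)|/(N/2)) the two 0.5 peaks of the N-normalized spectrum map to 0 dB, as stated above.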
Inverse Discrete Fourier Transform (IDFT)

Whilst the DFT is used as the transform from the discrete-time domain to the discrete-frequency domain for spectrum analysis, the inverse discrete Fourier transform (IDFT) allows the transform from the discrete-frequency domain to the discrete-time domain. The IDFT algorithm is given by

x(n) = IDFT[X(k)] = (1/N) Σ_{k=0}^{N-1} X(k) e^{j2πnk/N},   n = 0, 1, ..., N-1.   (1.4)

The fast version of the IDFT is called the inverse fast Fourier transform (IFFT). Taking N complex-valued numbers with the property X(k) = X*(N-k) in the frequency domain and then performing the IFFT gives N discrete-time samples x(n), which are real-valued.

Frequency Resolution: Zero-padding and Window Functions

To increase the frequency resolution for spectrum analysis we simply take more samples for the FFT algorithm. Typical numbers for the FFT resolution are N = 256, 512, 1024, 2048, 4096 and 8192. If we are only interested in computing the spectrum of 64 samples and would like to increase the frequency resolution from fs/64 to fs/1024, we have to extend the sequence of 64 audio samples by adding zero samples up to the length 1024 and then performing a 1024-point FFT. This technique is called zero-padding and is illustrated in Fig. 1.8 and by M-file 1.5. The upper left part shows the original sequence of 8 samples and the upper right part shows the corresponding 8-point FFT result. The lower left part illustrates the adding of 8 zero samples to the original 8-sample sequence up to the length N = 16. The lower right part illustrates the magnitude spectrum |X(k)| resulting from the 16-point FFT of the zero-padded sequence of length N = 16. Notice the increase in frequency resolution between the 8-point and 16-point FFT. Between each frequency bin of the upper spectrum a new frequency bin in the lower spectrum is calculated. Bins k = 0, 2, 4, 6, 8, 10, 12, 14 of the 16-point FFT correspond to bins k = 0, 1, 2, 3, 4, 5, 6, 7 of the 8-point FFT.
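This bin correspondence can be verified numerically. A short Python/NumPy sketch (illustrative, not the book's MATLAB code) zero-pads the 8-sample sequence of M-file 1.5 to length 16 and checks that every even bin of the 16-point FFT equals the matching bin of the 8-point FFT:

```python
import numpy as np

x1 = np.array([-1, -0.5, 1, 2, 2, 1, 0.5, -1.0])  # the 8 samples from M-file 1.5
x2 = np.concatenate([x1, np.zeros(8)])            # zero-padded up to N = 16

X8 = np.fft.fft(x1)     # 8-point FFT
X16 = np.fft.fft(x2)    # 16-point FFT of the zero-padded sequence

# Bins k = 0,2,4,...,14 of the 16-point FFT coincide with bins
# k = 0,1,...,7 of the 8-point FFT; the odd bins are interpolated values.
assert np.allclose(X16[::2], X8)
```

The zeros contribute nothing to the DFT sum, so evaluating the 16-point sum at every second bin reproduces the 8-point result exactly; only the bins in between are new.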
These N frequency bins cover the frequency range from 0 Hz up to (N-1)/N · fs Hz.

M-file 1.5 (figure1_08.m)
x1=[-1 -0.5 1 2 2 1 0.5 -1];
x2(16)=0;
x2(1:8)=x1;
subplot(221);stem(0:1:7,x1);
axis([-0.5 7.5 -1.5 2.5]);
ylabel('x(n) \rightarrow');
title('8 samples');
subplot(222);stem(0:1:7,abs(fft(x1)));
axis([-0.5 7.5 -0.5 10]);
ylabel('|X(k)| \rightarrow');
title('8-point FFT');
Figure 1.8 Zero-padding to increase frequency resolution.

subplot(223);stem(0:1:15,x2);
axis([-0.5 15.5 -1.5 2.5]);
xlabel('n \rightarrow');
ylabel('x(n) \rightarrow');
title('8 samples + zero-padding');
subplot(224);stem(0:1:15,abs(fft(x2)));
axis([-1 16 -0.5 10]);
xlabel('k \rightarrow');
ylabel('|X(k)| \rightarrow');
title('16-point FFT');

The leakage effect occurs due to cutting out N samples from the signal. This effect is shown in the upper part of Fig. 1.9 and demonstrated by the corresponding M-file 1.6. The cosine spectrum is smeared around the frequency. We can reduce the leakage effect by selecting a window function like the Blackman window and
Figure 1.9 Spectrum analysis of digital signals: take N audio samples and perform an N-point discrete Fourier transform to yield N samples of the spectrum of the signal starting from 0 Hz in steps of k·fs/N, where k is running from 0, 1, 2, ..., N-1: (a) cosine signal x(n) of frequency 1 kHz, (b) spectrum of cosine signal, (c) cosine signal x_w(n) = x(n)·w(n) with window, (d) spectrum with Blackman window, (e) audio signal x_w(n) with window, (f) spectrum with Blackman window.

the Hamming window, given by

w_B(n) = 0.42 - 0.5·cos(2πn/N) + 0.08·cos(4πn/N),   (1.5)
w_H(n) = 0.54 - 0.46·cos(2πn/N),   (1.6)
n = 0, 1, ..., N-1,

and weighting the N audio samples by the window function. This weighting is performed according to x_w(n) = w(n)·x(n) with 0 ≤ n ≤ N-1, and then an FFT of the weighted signal is performed. The cosine weighted by a window and the corresponding spectrum are shown in the middle part of Fig. 1.9. The lower part of Fig. 1.9 shows a segment of an audio signal weighted by the Blackman window and the corresponding spectrum via an FFT. Figure 1.10 shows further simple examples for the reduction of the leakage effect and can be generated by M-file 1.7.

M-file 1.6 (figure1_09.m)
x=cos(2*pi*1000*(0:1:N-1)/44100)';
figure(2)
W=blackman(N);
W=N*W/sum(W); % scaling of window
f=((0:N/2-1)/N)*FS;
xw=x.*W;
subplot(3,2,1);plot(0:N-1,x);
axis([0 1000 -1.1 1.1]);
title('a) Cosine signal x(n)')
subplot(3,2,3);plot(0:N-1,xw);
axis([0 8000 -4 4]);
title('c) Cosine signal x_w(n)=x(n) \cdot w(n) with window')
X=20*log10(abs(fft(x,N))/(N/2));
subplot(3,2,2);plot(f,X(1:N/2));
axis([0 10000 -80 10]);
ylabel('X(f)');
title('b) Spectrum of cosine signal')
Xw=20*log10(abs(fft(xw,N))/(N/2));
subplot(3,2,4);plot(f,Xw(1:N/2));
axis([0 10000 -80 10]);
ylabel('X(f)');
title('d) Spectrum with Blackman window')
s=ul(1:N).*W;
subplot(3,2,5);plot(0:N-1,s);
axis([0 8000 -1.1 1.1]);
xlabel('n \rightarrow');
title('e) Audio signal x_w(n) with window')
Sw=20*log10(abs(fft(s,N))/(N/2));
subplot(3,2,6);plot(f,Sw(1:N/2));
axis([0 10000 -80 10]);
ylabel('X(f)');
xlabel('f in Hz \rightarrow');
title('f) Spectrum with Blackman window')

M-file 1.7 (figure1_10.m)
x=[-1 -0.5 1 2 2 1 0.5 -1];
w=blackman(8);
w=w*8/sum(w);
x1=x.*w';
x2(16)=0;
x2(1:8)=x1;
Figure 1.10 Reduction of the leakage effect by window functions: (a) the original signal, (b) the Blackman window function of length N = 8, (c) product x(n)·w(n) with 0 ≤ n ≤ N-1, (d) zero-padding applied to x(n)·w(n) up to length N = 16; the corresponding spectra are shown on the right side.

ylabel('x(n) \rightarrow');
title('a) 8 samples');
subplot(423);stem(0:1:7,w);
axis([-0.5 7.5 -1.5 3]);
ylabel('w(n) \rightarrow');
title('b) 8 samples Blackman window');
subplot(425);stem(0:1:7,x1);
axis([-0.5 7.5 -1.5 3]);
ylabel('x_w(n) \rightarrow');
title('c) x(n) \cdot w(n)');
subplot(222);stem(0:1:7,abs(fft(x1)));
axis([-0.5 7.5 -0.5 15]);
ylabel('|X(k)| \rightarrow');
title('8-point FFT of c)');
subplot(224);stem(0:1:15,abs(fft(x2)));
axis([-1 16 -0.5 15]);
xlabel('k \rightarrow');
ylabel('|X(k)| \rightarrow');
title('16-point FFT of d)');

Figure 1.11 Short-time spectrum analysis by FFT.

Spectrogram: Time-frequency Representation

A special time-frequency representation is the spectrogram which gives an estimate of the short-time, time-localized frequency content of the signal. Therefore the signal is split into segments of length N which are multiplied by a window, and an FFT is performed (see Fig. 1.11). To increase the time-localization of the short-time spectra an overlap of the weighted segments can be used. A special visual representation of the short-time spectra is the spectrogram in Fig. 1.12. Time increases linearly across the horizontal axis and frequency increases across the vertical axis. So each vertical line represents the absolute value |X(f)| over frequency by a grey scale value (see Fig. 1.12). Only frequencies up to half the sampling frequency are shown. The calculation of the spectrogram from a signal can be performed by the MATLAB function B = SPECGRAM(x,NFFT,Fs,WINDOW,NOVERLAP).

Another time-frequency representation of the short-time Fourier transforms of a signal x(n) is the waterfall representation in Fig. 1.13, which can be produced by M-file 1.8, which calls the waterfall computation algorithm given by M-file 1.9.

M-file 1.8 (figure1_13.m)
[signal,FS,NBITS]=wavread('ton2');
subplot(211);plot(signal);
subplot(212);
waterfspec(signal,256,256,512,FS,20,-100);
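The segmentation behind both the spectrogram and the waterfall plot is the same: hop along the signal, window each segment, and take an FFT. A compact Python/NumPy sketch of this short-time analysis (illustrative only, not the book's MATLAB code; segment length N and hop size R are free parameters):

```python
import numpy as np

def stft_frames(x, N=256, R=128):
    """Short-time spectra: split x into segments of length N with hop
    size R, weight each segment by a Blackman window and FFT it."""
    w = np.blackman(N)
    nos = (len(x) - N) // R + 1       # number of complete segments
    return np.array([np.fft.fft(w * x[i * R:i * R + N]) for i in range(nos)])

# Each row is one short-time spectrum; |row| up to bin N/2 gives one
# vertical line of the spectrogram.
x = np.sin(2 * np.pi * 0.05 * np.arange(2048))   # test tone
S = stft_frames(x)
print(S.shape)    # (15, 256) for these lengths
```

With R < N the segments overlap, which improves the time-localization as described above; R = N corresponds to the non-overlapping scheme of Fig. 1.11.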
Figure 1.12 Spectrogram via FFT of weighted segments.

Figure 1.13 Waterfall representation via FFT of weighted segments.

M-file 1.9 (waterfspec.m)
function yy=waterfspec(signal,start,steps,N,fS,clippingpoint,baseplane)
% waterfspec(signal, start, steps, N, fS, clippingpoint, baseplane)
%
% shows short-time spectra of signal, starting
% at k=start, with increments of STEPS with N-point FFT
% dynamic range from -baseplane in dB up to 20*log(clippingpoint)
% in dB versus time axis
% 18/9/98 J. Schattschneider
% 14/10/2000 U. Zoelzer

echo off;
if nargin<7, baseplane=-100; end
if nargin<6, clippingpoint=0; end
if nargin<5, fS=48000; end
if nargin<4, N=1024; end                % default FFT
if nargin<3, steps=round(length(signal)/25); end
if nargin<2, start=0; end
%
windoo=blackman(N);                     % window - default
windoo=windoo*N/sum(windoo);            % scaling
% Calculation of number of spectra nos
n=length(signal);
rest=n-start-N;
nos=round(rest/steps);
if nos>rest/steps, nos=nos-1; end
% vectors for 3D representation
x=linspace(0,fS/1000,N+1);
z=x-x;
cup=z+clippingpoint;
cdown=z+baseplane;
signal=signal+0.0000001;
% Computation of spectra and visual representation
for i=1:1:nos,
  spek1=20.*log10(abs(fft(windoo.*signal(1+start+ ...
        i*steps:start+N+i*steps)))./(N)/0.5);
  spek=[-200; spek1(1:N)];
  spek=(spek>cup').*cup'+(spek<=cup').*spek;
  spek=(spek<cdown').*cdown'+(spek>=cdown').*spek;
  spek(1)=baseplane-1.0;
  spek(N/2)=baseplane-10;
  y=x-x+(i-1);
  if i==1
    p=plot3(x(1:N/2),y(1:N/2),spek(1:N/2),'k');
    set(p,'Linewidth',0.1);
  end
  pp=patch(x(1:N/2),y(1:N/2),spek(1:N/2),'w','Visible','on');
  set(pp,'Linewidth',0.1);
end;
set(gca,'DrawMode','fast');
axis([-0.3 fS/2000+0.3 0 nos baseplane-10 0]);
set(gca,'Ydir','reverse');
view(12,40);

1.2.3 Digital Systems

A digital system is represented by an algorithm which uses the input signal x(n) as a sequence (stream) of numbers and performs mathematical operations upon the input signal such as additions, multiplications and delay operations. The result of the algorithm is a sequence of numbers or the output signal y(n). Systems which do not change their behavior over time and fulfill the superposition property [Orf96] are called linear time-invariant (LTI) systems. Nonlinear time-invariant systems will be discussed in Chapter 5. The input/output relations for an LTI digital system describe time domain relations which are based on the following terms and definitions:

- unit impulse, impulse response and discrete convolution;
- algorithms and signal flow graphs.

For each of these definitions an equivalent description in the frequency domain exists, which will be introduced later.

Unit Impulse, Impulse Response and Discrete Convolution

- Test signal: a very useful test signal for digital systems is the unit impulse

  δ(n) = 1 for n = 0,  δ(n) = 0 for n ≠ 0,   (1.7)

  which is equal to one for n = 0 and zero elsewhere (see Fig. 1.14).

- Impulse response: if we apply a unit-sample function to a digital system, the digital system will lead to an output signal y(n) = h(n), which is the so-called impulse response h(n) of the digital system. The digital system is completely described by the impulse response, which is pointed out by the label h(n) inside the box, as shown in Fig. 1.14.

Figure 1.14 Impulse response h(n) as a time domain description of a digital system.
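Measuring an impulse response numerically is exactly this recipe: feed δ(n) into the system and record what comes out. A small Python sketch (the two-coefficient averaging system is a hypothetical example chosen for illustration):

```python
import numpy as np

def system(x):
    """An example digital system (hypothetical): y(n) = 0.5*x(n) + 0.5*x(n-1)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = 0.5 * x[n] + (0.5 * x[n - 1] if n > 0 else 0.0)
    return y

# Unit impulse delta(n): one at n = 0, zero elsewhere
delta = np.zeros(8)
delta[0] = 1.0

h = system(delta)   # the impulse response h(n)
print(h)            # first two samples are 0.5, the rest are zero
```

The output h(n) = [0.5, 0.5, 0, 0, ...] completely characterizes this system: any other input can now be processed by convolving it with h(n), as the next definition states.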
- Discrete convolution: if we know the impulse response h(n) of a digital system, we can calculate the output signal y(n) from a freely chosen input signal x(n) by the discrete convolution formula given by

  y(n) = Σ_{k=-∞}^{∞} x(k) · h(n-k) = x(n) * h(n),   (1.8)

  which is often abbreviated by the second term y(n) = x(n)*h(n). This discrete sum formula (1.8) represents an input-output relation for a digital system in the time domain. The computation of the convolution sum formula (1.8) can be achieved by the MATLAB function y=conv(x,h).

Algorithms and Signal Flow Graphs

The discrete convolution formula given above shows the mathematical operations which have to be performed to obtain the output signal y(n) for a given input signal x(n). In the following we will introduce a visual representation called a signal flow graph which represents the mathematical input/output relations in a graphical block diagram. We discuss some example algorithms to show that we only need three graphical representations, namely for the multiplication of signals by coefficients, delay and summation of signals.

- A delay of the input signal by two sampling intervals is given by the algorithm

  y(n) = x(n - 2)   (1.9)

  and is represented by the block diagram in Fig. 1.15.

Figure 1.15 Delay of the input signal.

- A weighting of the input signal by a coefficient a is given by the algorithm

  y(n) = a · x(n)   (1.10)

  and represented by a block diagram in Fig. 1.16.

Figure 1.16 Weighting of the input signal.
- The addition of two input signals is given by the algorithm

  y(n) = a1 · x1(n) + a2 · x2(n)   (1.11)

  and represented by a block diagram in Fig. 1.17.

Figure 1.17 Addition of two signals x1(n) and x2(n).

- The combination of the above algorithms leads to the weighted sum over several input samples, which is given by the algorithm

  y(n) = (1/3)·x(n) + (1/3)·x(n-1) + (1/3)·x(n-2)   (1.12)

  and represented by a block diagram in Fig. 1.18.

Figure 1.18 Simple digital system.

Transfer Function and Frequency Response

So far our description of digital systems has been based on the time domain relationship between the input and output signals. We noticed that the input and output signals and the impulse response of the digital system are given in the discrete time domain. In a similar way to the frequency domain description of digital signals by their spectra given in the previous subsection, we can have a frequency domain description of the digital system which is represented by the impulse response h(n). The frequency domain behavior of a digital system reflects its ability to pass, reject and enhance certain frequencies included in the input signal spectrum. The common terms for the frequency domain behavior are the transfer function H(z) and
the frequency response H(f) of the digital system. Both can be obtained by two mathematical transforms applied to the impulse response h(n).

The first transform is the Z-transform

X(z) = Σ_{n=-∞}^{∞} x(n) z^{-n}   (1.13)

applied to the signal x(n), and the second transform is the discrete-time Fourier transform

X(e^{jΩ}) = Σ_{n=-∞}^{∞} x(n) e^{-jΩn}   (1.14)

with  Ω = ωT = 2πf/fs   (1.15)

applied to the signal x(n). Both are related by the substitution z ↔ e^{jΩ}. If we apply the Z-transform to the impulse response h(n) of a digital system according to

H(z) = Σ_{n=-∞}^{∞} h(n) z^{-n},   (1.16)

we denote H(z) as the transfer function. If we apply the discrete-time Fourier transform to the impulse response h(n) we get

H(e^{jΩ}) = Σ_{n=-∞}^{∞} h(n) e^{-jΩn}.   (1.17)

Substituting (1.15), we define the frequency response of the digital system by

H(f) = Σ_{n=-∞}^{∞} h(n) e^{-j2πfn/fs}.   (1.18)

Causal and Stable Systems

A realizable digital system has to fulfill the following two conditions:

- Causality: a discrete-time system is causal if the output signal y(n) = 0 for n < 0 for a given input signal x(n) = 0 for n < 0. This means that the system cannot react to an input before the input is applied to the system.

- Stability: a digital system is stable if

  Σ_{n=-∞}^{∞} |h(n)| ≤ M2 < ∞   (1.19)

  holds. The sum over the absolute values of h(n) has to be less than a fixed number M2 < ∞.
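Both the stability condition (1.19) and the frequency response (1.18) can be evaluated numerically for a given impulse response. The Python/NumPy sketch below (illustrative only; the one-pole impulse response h(n) = 0.5^n is an example chosen here, not taken from the text) checks the absolute sum and samples H(f) on a frequency grid:

```python
import numpy as np

# Impulse response of a stable recursive system (example): h(n) = 0.5**n
n = np.arange(200)          # truncation of the infinite response
h = 0.5 ** n

# Stability check (1.19): the absolute sum converges (geometric series -> 2)
abs_sum = np.abs(h).sum()
assert abs_sum < 2.0001

# Frequency response (1.18): H(f) = sum_n h(n) * exp(-j*2*pi*f*n/fs)
fs = 40000.0
f = np.linspace(0.0, fs / 2, 5)
H = np.array([(h * np.exp(-2j * np.pi * fk * n / fs)).sum() for fk in f])
print(np.abs(H))   # magnitude falls from 2 at f = 0: lowpass behavior
```

At f = 0 the response equals the plain sum of h(n), and at f = fs/2 the exponential alternates sign sample by sample, so |H| drops to 2/3 for this example, a simple lowpass characteristic.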
The stability implies that the transfer function (Z-transform of the impulse response) and the frequency response (discrete-time Fourier transform of the impulse response) of a digital system are related by the substitution z ↔ e^{jΩ}. Realizable digital systems have to be causal and stable systems. Some Z-transforms and their discrete-time Fourier transforms of a signal x(n) are given in Table 1.1.

Table 1.1 Z-transforms and discrete-time Fourier transforms of x(n).

IIR and FIR Systems

IIR systems: A system with an infinite impulse response h(n) is called an IIR system. From the block diagram in Fig. 1.19 we can read the difference equation

y(n) = x(n) - a1·y(n-1) - a2·y(n-2).   (1.20)

The output signal y(n) is fed back through delay elements and a weighted sum of these delayed outputs is added to the input signal x(n). Such a feedback system is also called a recursive system. The Z-transform of (1.20) yields

Y(z) = X(z) - a1·z^{-1}·Y(z) - a2·z^{-2}·Y(z)   (1.21)
X(z) = Y(z)·(1 + a1·z^{-1} + a2·z^{-2})   (1.22)

and solving for Y(z)/X(z) gives the transfer function

H(z) = Y(z)/X(z) = 1 / (1 + a1·z^{-1} + a2·z^{-2}).   (1.23)

Figure 1.20 shows a special signal flow graph representation, where adders, multipliers and delay operators are replaced by weighted graphs. If the input delay line is extended up to N-1 delay elements and the output delay line up to M delay elements according to Fig. 1.21, we can write for the difference equation

y(n) = -Σ_{k=1}^{M} a_k·y(n-k) + Σ_{k=0}^{N-1} b_k·x(n-k),   (1.24)
Figure 1.19 Simple IIR system with input signal x(n) and output signal y(n).

Figure 1.20 Signal flow graph of the digital system in Fig. 1.19 with time domain description in the left block diagram and corresponding frequency domain description with Z-transform.

Figure 1.21 IIR system.

the Z-transform of the difference equation

Y(z) = -Σ_{k=1}^{M} a_k·z^{-k}·Y(z) + Σ_{k=0}^{N-1} b_k·z^{-k}·X(z),   (1.25)
and the resulting transfer function

H(z) = ( Σ_{k=0}^{N-1} b_k·z^{-k} ) / ( 1 + Σ_{k=1}^{M} a_k·z^{-k} ).   (1.26)

The following M-file 1.10 shows a block processing approach for the IIR filter algorithm.

M-file 1.10 (filter.m)
Y = FILTER(B,A,X) filters the data in vector X with the
filter described by vectors A and B to create the filtered
data Y. The filter is a "Direct Form II Transposed"
implementation of the standard difference equation:

a(1)*y(n) = b(1)*x(n) + b(2)*x(n-1) + ... + b(nb+1)*x(n-nb)
            - a(2)*y(n-1) - ... - a(na+1)*y(n-na)

If a(1) is not equal to 1, FILTER normalizes the filter
coefficients by a(1).

A sample-by-sample processing approach for a second-order IIR filter algorithm is demonstrated by the following M-file 1.11.

M-file 1.11 (DirectForm01.m)
% M-File DirectForm01.M
% Impulse response of 2nd order IIR filter
% Sample-by-sample algorithm
clear
echo on
%
% Impulse response of 2nd order IIR filter
%
echo off
% Coefficient computation
fg=4000;
fa=48000;
k=tan(pi*fg/fa);
b(1)=1/(1+sqrt(2)*k+k^2);
b(2)=-2/(1+sqrt(2)*k+k^2);
b(3)=1/(1+sqrt(2)*k+k^2);
a(1)=1;
a(2)=2*(k^2-1)/(1+sqrt(2)*k+k^2);
a(3)=(1-sqrt(2)*k+k^2)/(1+sqrt(2)*k+k^2);
% Initialization of state variables
xh1=0;xh2=0;
yh1=0;yh2=0;
% Input signal: unit impulse
N=20; % length of input signal
x(N)=0;x(1)=1;
% Sample-by-sample algorithm
for n=1:N
  y(n)=b(1)*x(n) + b(2)*xh1 + b(3)*xh2 - a(2)*yh1 - a(3)*yh2;
  xh2=xh1; xh1=x(n);
  yh2=yh1; yh1=y(n);
end;
% Plot results
subplot(2,1,1)
stem(0:1:length(x)-1,x,'.');axis([-0.6 length(x)-1 -1.2 1.2]);
xlabel('n \rightarrow');ylabel('x(n) \rightarrow');
subplot(2,1,2)
stem(0:1:length(x)-1,y,'.');axis([-0.6 length(x)-1 -1.2 1.2]);
xlabel('n \rightarrow');ylabel('y(n) \rightarrow');

The computation of the frequency response based on the coefficients of the transfer function H(z) = B(z)/A(z) can be achieved by the M-file 1.12.

M-file 1.12 (freqz.m)
FREQZ Digital filter frequency response.
[H,W] = FREQZ(B,A,N) returns the N-point complex frequency
response vector H and the N-point frequency vector W in
radians/sample of the filter:

           B(e^jw)   b(1) + b(2)e^{-jw} + ... + b(m+1)e^{-jmw}
H(e^jw) = ------- = -------------------------------------------
           A(e^jw)   a(1) + a(2)e^{-jw} + ... + a(n+1)e^{-jnw}

given numerator and denominator coefficients in vectors B and A.
The frequency response is evaluated at N points equally spaced
around the upper half of the unit circle.

The computation of zeros and poles of H(z) = B(z)/A(z) is implemented by M-file 1.13.

M-file 1.13 (zplane.m)
ZPLANE Z-plane zero-pole plot.
ZPLANE(B,A) where B and A are row vectors containing transfer
function polynomial coefficients plots the poles and zeros of
B(z)/A(z).

FIR system: A system with a finite impulse response h(n) is called an FIR system. From the block diagram in Fig. 1.22 we can read the difference equation

y(n) = b0·x(n) + b1·x(n-1) + b2·x(n-2).   (1.27)

The input signal x(n) is fed forward through delay elements and a weighted sum of these delayed inputs forms the output signal y(n). Such a feed forward system is also called a nonrecursive system. The Z-transform of (1.27) yields

Y(z) = b0·X(z) + b1·z^{-1}·X(z) + b2·z^{-2}·X(z)   (1.28)
     = X(z)·(b0 + b1·z^{-1} + b2·z^{-2})   (1.29)

and solving for Y(z)/X(z) gives the transfer function

H(z) = Y(z)/X(z) = b0 + b1·z^{-1} + b2·z^{-2}.   (1.30)

A general FIR system in Fig. 1.23 consists of a feed forward delay line with N-1 delay elements and has the difference equation

y(n) = Σ_{k=0}^{N-1} b_k·x(n-k).   (1.31)

The finite impulse response is given by

h(n) = Σ_{k=0}^{N-1} b_k·δ(n-k),   (1.32)

which shows that each impulse of h(n) is represented by a weighted and shifted unit impulse. The Z-transform of the impulse response leads to the transfer function

H(z) = Σ_{k=0}^{N-1} b_k·z^{-k}.   (1.33)

Figure 1.22 Simple FIR system with input signal x(n) and output signal y(n).
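Equations (1.31) and (1.32) say that an FIR system's output is exactly the convolution of the input with the coefficient vector b. The following Python/NumPy sketch (illustrative, not the book's MATLAB code) evaluates the difference equation sample by sample for the three-coefficient averager of (1.12) and checks it against a library convolution:

```python
import numpy as np

b = np.array([1 / 3, 1 / 3, 1 / 3])   # coefficients of the averager from (1.12)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# FIR difference equation y(n) = sum_k b_k * x(n-k), sample by sample
y = np.zeros(len(x))
for n in range(len(x)):
    for k, bk in enumerate(b):
        if n - k >= 0:
            y[n] += bk * x[n - k]

# Identical to convolving x with the impulse response h = b (eq. (1.32))
assert np.allclose(y, np.convolve(x, b)[:len(x)])
print(y)   # running three-sample average of the input
```

This mirrors the MATLAB pair y=conv(x,b) (block processing) versus the sample-by-sample loop used in the M-files above.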
Figure 1.23 FIR system.

The time-domain algorithms for FIR systems are the same as those for IIR systems with the exception that the recursive part is missing. The previously introduced M-files for IIR systems can be used with the appropriate coefficients for FIR block processing or sample-by-sample processing.

The computation of the frequency response H(f) = |H(f)|·e^{j∠H(f)} (|H(f)| magnitude response, φ = ∠H(f) phase response) from the Z-transform of an FIR impulse response according to (1.33) is shown in Fig. 1.24 and is calculated by the following M-file 1.14.

Figure 1.24 FIR system: (a) impulse response, (b) magnitude response, (c) pole/zero plot and (d) phase response (sampling frequency fs = 40 kHz).
1.3 Conclusion

In this first chapter some basic concepts of digital signals, their spectra and digital systems have been introduced. The description is intended for persons with little or no knowledge of digital signal processing. The inclusion of MATLAB M-files for all stages of processing may serve as a basis for further programming in the following chapters. As well as showing simple tools for graphical representations of digital audio signals, we have calculated the spectrum of a signal x(n) by the use of the FFT M-file

  Xmagnitude=abs(fft(x))
  Xphase=angle(fft(x))

Time-domain processing for DAFX can be performed by block-based input-output computations which are based on the convolution formula (if the impulse response of a system is known) or difference equations (if the coefficients a and b are known). The computations can be done by the following M-files:

  y=conv(h,x)       % length of output signal l_y = l_h + l_x - 1
  y=filter(b,a,x)   % l_y = l_x

These M-files deliver an output vector containing the output signal y(n) in a vector of corresponding length. Of course, these block processing algorithms perform their inner computations on a sample-by-sample basis. Therefore, we have also shown an example of the sample-by-sample programming technique, which can be modified according to different applications:

  y=dafxalgorithm(parameters,x)
  % Sample-by-sample algorithm y(n)=function(parameters,x(n))
  for n=1:length(x),
    y(n)=...do something algorithm with x(n) and parameters;
  end;

That is all we need for DAFX exploration and programming, good luck!

Bibliography

[Arf98] D. Arfib. Different ways to write digital audio effects programs. In Proc. DAFX-98 Digital Audio Effects Workshop, pp. 188-191, Barcelona, November 1998.

[Arf99] D. Arfib. Visual representations for digital audio effects and their control. In Proc. DAFX-99 Digital Audio Effects Workshop, pp. 63-68, Trondheim, December 1999.
[ME93] C. Marven and G. Ewers. A Simple Approach to Digital Signal Processing. Texas Instruments, 1993.

[Mit01] S.K. Mitra. Digital Signal Processing - A Computer-Based Approach. McGraw-Hill, 2nd edition, 2001.

[MSY98] J. McClellan, R. Schafer, and M. Yoder. DSP First: A Multimedia Approach. Prentice-Hall, 1998.

[Orf96] S.J. Orfanidis. Introduction to Signal Processing. Prentice-Hall, 1996.

[Zol97] U. Zolzer. Digital Audio Signal Processing. John Wiley & Sons, Ltd, 1997.
Chapter 2

Filters

P. Dutilleux, U. Zolzer

2.1 Introduction

The term filter can have a large number of different meanings. In general it can be seen as a way to select certain elements with desired properties from a larger set. Let us focus on the particular field of digital audio effects and consider a signal in the frequency domain. The signal can be seen as a set of partials having different frequencies and amplitudes. The filter will perform a selection of the partials according to the frequencies that we want to reject, retain or emphasize. In other words: the filter will modify the amplitude of the partials according to their frequency. Once implemented, it will turn out that this filter is a linear transformation. As an extension, linear transformations can be said to be filters. According to this new definition of a filter, any linear operation could be said to be a filter, but this would go far beyond the scope of digital audio effects. It is possible to demonstrate what a filter is by using one's voice and vocal tract. Utter a vowel, a for example, at a fixed pitch and then utter other vowels at the same pitch. By doing that we do not modify our vocal cords but we modify the volume and the interconnection pattern of our vocal tract. The vocal cords produce a signal with a fixed harmonic spectrum whereas the cavities act as acoustic filters to enhance some portions of the spectrum. We have described filters in the frequency domain here because it is the usual way to consider them, but they also have an effect in the time domain. After introducing a filter classification in the frequency domain, we will review typical implementation methods and the associated effects in the time domain.

The various types of filters can be defined according to the following classification:

- Lowpass (LP) filters select low frequencies up to the cut-off frequency fc and attenuate frequencies higher than fc.
Figure 2.1 Filter classification.

- Highpass (HP) filters select frequencies higher than fc and attenuate frequencies below fc.

- Bandpass (BP) filters select frequencies between a lower cut-off frequency fcl and a higher cut-off frequency fch. Frequencies below fcl and frequencies higher than fch are attenuated.

- Bandreject (BR) filters attenuate frequencies between a lower cut-off frequency fcl and a higher cut-off frequency fch. Frequencies below fcl and frequencies higher than fch are passed.

- Notch filters attenuate frequencies in a narrow bandwidth around the cut-off frequency fc.

- Resonator filters amplify frequencies in a narrow bandwidth around the cut-off frequency fc.

- Allpass filters pass all frequencies but modify the phase of the input signal.

Other types of filters (LP with resonance, comb, multiple notch ...) can be described as a combination of these basic elements. Here are listed some of the possible applications of these filter types: The lowpass with resonance is very often used in computer music to simulate an acoustical resonating structure; the highpass filter can remove undesired very low frequencies; the bandpass can produce effects such as the imitation of a telephone line or of a mute on an acoustical instrument; the bandreject can divide the audible spectrum into two bands that seem to be uncorrelated. The resonator can be used to add artificial resonances to a sound; the notch is most useful in eliminating annoying frequency components; a set of notch filters, used in combination with the input signal, can produce a phasing effect.
2.2 Basic Filters

2.2.1 Lowpass Filter Topologies

A filter can be implemented in various ways. It can be an acoustic filter, as in the case of the voice. For our applications we would rather use electronic or digital means. Although we are interested in digital audio effects, it is worth having a look at well-established analog techniques, because a large body of methods has been developed in the past to design and build analog filters. There are intrinsic design methods for digital filters, but many structures can be adapted from existing analog designs. Furthermore, some of them have been tailored for ease of operation within musical applications. It is therefore of interest to gain ideas from these analog designs in order to build digital filters having similar advantages. We will focus on the second-order lowpass filter because it is the most common type and other types can be derived from it. The frequency response of a lowpass filter is shown in Fig. 2.2. The tuning parameters of this lowpass filter are the cut-off frequency fc and the damping factor ζ. The lower the damping factor, the higher the resonance at the cut-off frequency.

Analog Design, Sallen & Key

Let us remind ourselves of an analog circuit that implements a second-order lowpass filter with the least number of components: the Sallen & Key filter (Figure 2.2).

Figure 2.2 Sallen & Key second-order analog filter and frequency response.

The components (R1, R2, C) are related to the tuning parameters fc and ζ (2.1). These relations are straightforward, but both tuning coefficients are coupled. It is therefore difficult to vary one while the other remains constant. This structure is therefore not recommended when the parameters are to be tuned dynamically and when low damping factors are desired.
Digital Design, Canonical

The canonical second-order structure, as shown in Fig. 2.3, can be implemented by the difference equation

y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2) - a1 y(n-1) - a2 y(n-2).   (2.2)

Figure 2.3 Canonical second-order digital filter.

It can be used for any second-order transfer function according to

H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2).   (2.3)

In order to modify the cut-off frequency or the damping factor, all 5 coefficients have to be modified. They can be computed from the specification in the frequency plane or from a prototype analog filter. One of the methods that can be used is based on the bilinear transform [DJ85]. The following set of formulas computes the coefficients for a lowpass filter, where fc is the analog cut-off frequency, ζ the damping factor and fs the sampling frequency:

c  = 1 / tan(π fc/fs)
b0 = 1 / (1 + 2ζc + c²)
b1 = 2 b0
b2 = b0
a1 = 2 b0 (1 - c²)
a2 = b0 (1 - 2ζc + c²)   (2.4)

This structure has the advantage that it requires very few elementary operations to process the signal itself. It has unfortunately some severe drawbacks. Modifying the filter tuning (fc, ζ) involves rather complex computations. If the parameters are varied continuously, the complexity of the filter is more dependent on the coefficient computation than on the filtering process itself. Another drawback is the poor signal-to-noise ratio for low-frequency signals. Other filter structures are available that cope with these problems. We will again review a solution in the analog domain and its counterpart in the digital domain.

State Variable Filter, Analog

For musical applications of filters one wishes to have an independent control over the cut-off frequency and the damping factor. A technique originating from the analog computing technology can solve our problem. It is called the state variable filter (Figure 2.4). This structure is more expensive than the Sallen & Key but has independent tuning components (Rf, RC) for the cut-off frequency and the damping factor (2.5). Furthermore, it provides simultaneously three types of outputs: lowpass, highpass and bandpass.

Figure 2.4 Analog state variable filter.

State Variable Filter, Digital

The state variable filter has a digital implementation, as shown in Fig. 2.5 [Cha80], where x(n) is the input signal, yl(n) the lowpass output, yb(n) the bandpass output and yh(n) the highpass output, and the difference equations for the output signals are given by
yh(n) = x(n) - yl(n-1) - Q1 yb(n-1)   (2.6)
yb(n) = F1 yh(n) + yb(n-1)
yl(n) = F1 yb(n) + yl(n-1).   (2.7)

Figure 2.5 Digital state variable filter (highpass, bandpass and lowpass outputs).

With tuning coefficients F1 and Q1, related to the tuning parameters fc and ζ as

F1 = 2 sin(π fc/fs),   Q1 = 2ζ,   (2.8)

it can be shown that the lowpass transfer function is

Hl(z) = r² / (1 + (r² - q - 1) z^-1 + q z^-2)   with r = F1, q = 1 - F1 Q1.   (2.9)

This structure is particularly effective not only as far as the filtering process is concerned but above all because of the simple relations between control parameters and tuning coefficients. One should consider the stability of this filter, because at higher cut-off frequencies and larger damping factors it becomes unstable. A "usability limit" given by F1 < 2 - Q1 assures the stable operation of the state variable implementation [Dut91, Die00]. In most musical applications however it is not a problem, because the tuning frequencies are usually small compared to the sampling frequency and the damping factor is usually set to small values [Dut89a, Dat97]. This filter has proven its suitability for a large number of applications. The nice properties of this filter have been exploited to produce endless glissandi out of natural sounds and to allow smooth transitions between extreme settings [Dut89b, m-Vas93]. It is also used for synthesizer applications [Die00]. We have considered here two different digital filter structures. More are available and each has its advantages and drawbacks. An optimum choice can only be made in agreement with the application [Zol97].

Normalization

Filters are usually designed in the frequency domain and we have seen that they have an action also in the time domain. Another correlated impact lies in the loudness of the filtered sounds. The filter might produce the right effect but the result
might be useless because the sound has become too weak or too strong. The method of compensating for these amplitude variations is called normalization. Usual normalization methods are called L1, L2 and L∞ [Zol97]. L1 is used when the filter should never be overloaded under any circumstances. This is overkill most of the time. L2 is used to normalize the loudness of the signal. It is accurate for broadband signals and fits many practical musical applications. L∞ actually normalizes the frequency response. It is best when the signal to filter is sinusoidal or periodical. With a suitable normalization scheme the filter can prove to be very easy to handle, whereas with the wrong normalization the filter might be rejected by musicians because they cannot operate it. The normalization of the state variable filter has been studied in [Dut91], where several implementation schemes are proposed that lead to an effective implementation. In practice, a first-order lowpass filter that processes the input signal will perform the normalization in fc, and an amplitude correction will normalize in ζ (Figure 2.6). This normalization scheme allows us to operate the filter with damping factors down to values where the filter gain reaches about 74 dB at fc.

Figure 2.6 Normalization in fc and ζ for the state variable filter.

Sharp Filters

Apart from FIR filters (see section 2.2.3), we have so far only given examples of second-order filters. These filters are not suitable for all applications. On the one hand, smooth spectral modifications are better realized by using first-order filters. On the other hand, processing two signal components differently that are close in frequency, or imitating the selectivity of our hearing system, calls for higher-order filters. FIR filters can offer the right selectivity but again, they will not be easily tuned. Butterworth filters have attractive features in this case.
Such filters are optimized for a flat frequency response until fc and yield a 6n dB/octave attenuation for frequencies higher than fc, where n is the order of the filter. Filters of order 2n can be built out of n second-order sections. All sections are tuned to the same cut-off frequency fc, but each section has a different damping factor ζ (Table 2.1) [LKG72].

Table 2.1 Damping factors ζ of the second-order sections for Butterworth filters of order n.

n = 2:  0.707
n = 4:  0.924, 0.383
n = 6:  0.966, 0.707, 0.259
n = 8:  0.981, 0.831, 0.556, 0.195
n = 10: 0.988, 0.891, 0.707, 0.454, 0.156

These filters can be implemented accurately in the canonical second-order digital filter structure, but modifying the tuning frequency in real time can lead to temporary instabilities. The state variable structure is less accurate for high tuning frequencies (i.e. fc > fs/10) but allows faster tuning modifications. A bandpass filter comprising a 4th-order highpass and a 4th-order lowpass was implemented and used to imitate a fast-varying mute on a trombone [Dut91]. Higher-order filters (up to about 16) are useful to segregate spectral bands or even individual partials within complex sounds.

Behavior in the Time Domain

We have so far considered the action of the filters in the frequency domain. We cannot forget the time domain because it is closely related to it. Narrow bandpass filters, and resonant filters even more, will induce long ringing time responses. Filters can be optimized for their frequency response or for their time response. It is easier to grasp the time behavior of FIRs than IIRs. FIRs have the drawback of a time delay that can impair the responsiveness of digital audio effects.

2.2.2 Parametric AP, LP, HP, BP and BR Filters

Introduction

In this subsection we introduce a special class of parametric filter structures for allpass, lowpass, highpass, bandpass and bandreject filter functions. Parametric filter structures denote special signal flow graphs where a coefficient inside the signal flow graph directly controls the cut-off frequency and bandwidth of the corresponding filter. These filter structures are easily tunable by changing only one or two coefficients. They play an important role for real-time control with minimum computational complexity.

Signal Processing

The basis for parametric first- and second-order IIR filters is the first- and second-order allpass filter. We will first discuss the first-order allpass and show simple low- and highpass filters, which consist of a tunable allpass filter together with a direct path.
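Before turning to the parametric structures, the Butterworth cascade described above can be sketched in a few lines of Python. The damping factors reproduce Table 2.1, and each section is a canonical biquad whose coefficients follow the bilinear-transform lowpass design of (2.4). The function names are ours, not the book's; this is a minimal sketch, not a production design.

```python
import math

def butterworth_zetas(order):
    """Damping factors of the order/2 second-order sections of an
    even-order Butterworth lowpass (reproduces Table 2.1)."""
    n = order // 2
    return [math.sin((2 * k - 1) * math.pi / (2 * order)) for k in range(n, 0, -1)]

def lowpass_biquad(fc, fs, zeta):
    """Bilinear-transform lowpass coefficients for one canonical
    second-order section, following (2.4): c = 1/tan(pi*fc/fs)."""
    c = 1.0 / math.tan(math.pi * fc / fs)
    b0 = 1.0 / (1.0 + 2.0 * zeta * c + c * c)
    b = (b0, 2.0 * b0, b0)
    a = (2.0 * b0 * (1.0 - c * c), b0 * (1.0 - 2.0 * zeta * c + c * c))
    return b, a

def cascade_response(sections, f, fs):
    """Magnitude of the cascaded sections at frequency f."""
    z = complex(math.cos(2 * math.pi * f / fs), math.sin(2 * math.pi * f / fs))
    h = 1.0 + 0j
    for (b0, b1, b2), (a1, a2) in sections:
        h *= (b0 + b1 / z + b2 / z ** 2) / (1 + a1 / z + a2 / z ** 2)
    return abs(h)

fc, fs = 1000.0, 44100.0
sections = [lowpass_biquad(fc, fs, z) for z in butterworth_zetas(4)]
```

Because the bilinear transform prewarps the cut-off frequency, the cascade is exactly 3 dB down at fc; each section stays individually stable, which is why the cascade form is preferred over one high-order difference equation.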
First-order allpass. A first-order allpass filter is given by the transfer function

A(z) = (z^-1 + c) / (1 + c z^-1)   (2.10)

c = (tan(π fc/fs) - 1) / (tan(π fc/fs) + 1).   (2.11)

The magnitude/phase response and the group delay of a first-order allpass are shown in Fig. 2.7. The magnitude response is equal to one and the phase response approaches -180 degrees for high frequencies. The group delay shows the delay of the input signal in samples versus frequency. The coefficient c in (2.10) controls the cut-off frequency of the allpass, where the phase response passes -90 degrees (see Fig. 2.7).

Figure 2.7 First-order allpass filter with fc = 0.1 fs (magnitude response, phase response, group delay).

From (2.10) we can derive the corresponding difference equation

y(n) = c x(n) + x(n-1) - c y(n-1),   (2.12)

which leads to the block diagram in Fig. 2.8. The coefficient c occurs twice in this signal flow graph and can be adjusted according to (2.11) to change the cut-off frequency. A variant allpass structure with only one delay element is shown in the right part of Fig. 2.8. It is implemented by the difference equations

xh(n) = x(n) - c xh(n-1)   (2.13)
y(n) = c xh(n) + xh(n-1).   (2.14)
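A minimal Python sketch of this first-order allpass, together with the low/highpass combination that the following paragraphs derive from it, might look as follows (function names are ours, not the book's):

```python
import cmath
import math

def allpass_coeff(fc, fs):
    """Tuning coefficient of the first-order allpass, after (2.11)."""
    t = math.tan(math.pi * fc / fs)
    return (t - 1.0) / (t + 1.0)

def allpass(x, c):
    """First-order allpass, difference equation (2.12):
    y(n) = c*x(n) + x(n-1) - c*y(n-1)."""
    y = []
    x1 = y1 = 0.0
    for xn in x:
        y1 = c * xn + x1 - c * y1
        x1 = xn
        y.append(y1)
    return y

def first_order_lp_hp(x, fc, fs, highpass=False):
    """Low/highpass by adding/subtracting the allpass path to the
    direct path, as in H(z) = 0.5*(1 +/- A(z))."""
    c = allpass_coeff(fc, fs)
    sgn = -1.0 if highpass else 1.0
    return [0.5 * (xn + sgn * an) for xn, an in zip(x, allpass(x, c))]

def lp_hp_response(f, fc, fs, highpass=False):
    """Frequency response of the same filter, evaluated analytically."""
    c = allpass_coeff(fc, fs)
    z1 = cmath.exp(-2j * math.pi * f / fs)   # z^-1 on the unit circle
    a = (z1 + c) / (1 + c * z1)              # allpass response
    sgn = -1.0 if highpass else 1.0
    return 0.5 * (1 + sgn * a)
```

At f = fc the allpass output is shifted by exactly -90 degrees, so both the lowpass and the highpass combination sit precisely 3 dB below the passband there.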
Figure 2.8 Block diagram for a first-order allpass filter (direct-form structure and allpass structure with one delay element).

The resulting transfer function is equal to (2.10). For simple implementations a table with a number of coefficients for different cut-off frequencies is sufficient, but even for real-time applications this structure requires very few computations. In the following we use this first-order allpass filter to perform low/highpass filtering.

First-order low/highpass. A first-order lowpass filter can be achieved by adding or subtracting (+/-) the input signal from the output signal of a first-order allpass filter. As the output signal of the first-order allpass filter has a phase shift of -180 degrees for high frequencies, this operation leads to low/highpass filtering. The transfer function of a low/highpass filter is then given by

H(z) = 1/2 [1 ± A(z)]   (LP/HP +/-)   (2.15)

H_LP(z) = (1+c)/2 · (1 + z^-1) / (1 + c z^-1)   (2.16)
H_HP(z) = (1-c)/2 · (1 - z^-1) / (1 + c z^-1)   (2.17)

where a tunable first-order allpass A(z) with tuning parameter c is used. The plus sign (+) denotes the lowpass operation and the minus sign (-) the highpass operation. The block diagram in Fig. 2.9 represents the operations involved in performing the low/highpass filtering. The allpass filter can be implemented by the difference equation (2.12), as shown in Fig. 2.8.

Figure 2.9 Block diagram of a first-order low/highpass filter.

The magnitude/phase response and group delay are illustrated for low- and highpass filtering in Fig. 2.10. The -3 dB point of the magnitude response for lowpass and
highpass is passed at the cut-off frequency. With the help of the allpass subsystem in Fig. 2.9, tunable low- and highpass systems are achieved.

Figure 2.10 First-order low/highpass filter with fc = 0.1 fs (magnitude response, phase response, group delay).

Second-order allpass. The implementation of tunable bandpass and bandreject filters can be achieved with a second-order allpass filter. The transfer function of a second-order allpass filter is given by

A(z) = (-c + d(1-c) z^-1 + z^-2) / (1 + d(1-c) z^-1 - c z^-2)   (2.18)

d = -cos(2π fc/fs)   (2.19)

c = (tan(π fb/fs) - 1) / (tan(π fb/fs) + 1).   (2.20)

The parameter d adjusts the cut-off frequency and the parameter c the bandwidth. The magnitude/phase response and the group delay of a second-order allpass are shown in Fig. 2.11. The magnitude response is again equal to one and the phase response approaches -360 degrees for high frequencies. The cut-off frequency fc determines the point on the phase curve where the phase response passes -180 degrees. The width or slope of the phase transition around the cut-off frequency is controlled by the bandwidth parameter fb. From (2.18) the corresponding difference equation

y(n) = -c x(n) + d(1-c) x(n-1) + x(n-2) - d(1-c) y(n-1) + c y(n-2)   (2.21)
can be derived, which leads to the block diagram in Fig. 2.12. The cut-off frequency is controlled by the coefficient d and the bandwidth by the coefficient c.

Figure 2.11 Second-order allpass filter with fc = 0.1 fs and fb = 0.022 fs (magnitude response, phase response, group delay).

Figure 2.12 Block diagram for a second-order allpass filter.
Second-order bandpass/bandreject. Second-order bandpass and bandreject filters can be described by the transfer function

H(z) = 1/2 [1 ∓ A(z)]   (BP/BR -/+)   (2.22)

H_BP(z) = (1+c)/2 · (1 - z^-2) / (1 + d(1-c) z^-1 - c z^-2)   (2.23)

H_BR(z) = (1-c)/2 · (1 + 2d z^-1 + z^-2) / (1 + d(1-c) z^-1 - c z^-2)   (2.24)

d = -cos(2π fc/fs),   c = (tan(π fb/fs) - 1) / (tan(π fb/fs) + 1),   (2.25)

where a tunable second-order allpass A(z) with tuning parameters c and d is used. The minus sign (-) denotes the bandpass operation and the plus sign (+) the bandreject operation. The block diagram in Fig. 2.13 shows the bandpass and bandreject filter implementation based on a second-order allpass subsystem, which can be implemented by the signal flow graph of Fig. 2.12. The magnitude/phase response and group delay are illustrated in Fig. 2.14 for both filter types.

Figure 2.13 Second-order bandpass and bandreject filter.

Second-order low/highpass filters. The coefficients for second-order low- and highpass filters given by the transfer function of (2.3) are shown in Table 2.2. A control of single coefficients for adjusting the cut-off frequency is not possible. A complete set of coefficients is necessary if the cut-off frequency is changed. The implementation of these second-order low- and highpass filters can be achieved by the difference equation (2.2) and the filter structure in Fig. 2.3.

Table 2.2 Filter coefficients for second-order lowpass/highpass filters [Zol97], with K = tan(π fc/fs).

Lowpass (second-order):
b0 = K² / (1 + √2 K + K²)
b1 = 2K² / (1 + √2 K + K²)
b2 = K² / (1 + √2 K + K²)
a1 = 2(K² - 1) / (1 + √2 K + K²)
a2 = (1 - √2 K + K²) / (1 + √2 K + K²)

Highpass (second-order):
b0 = 1 / (1 + √2 K + K²)
b1 = -2 / (1 + √2 K + K²)
b2 = 1 / (1 + √2 K + K²)
a1 = 2(K² - 1) / (1 + √2 K + K²)
a2 = (1 - √2 K + K²) / (1 + √2 K + K²)
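The bandpass/bandreject construction around the second-order allpass can be sketched as follows in Python; the function name is ours, and the response is evaluated analytically on the unit circle rather than sample by sample:

```python
import cmath
import math

def bp_br_response(f, fc, fb, fs, reject=False):
    """Evaluate H(z) = 0.5*(1 -/+ A(z)) (2.22) at frequency f, with the
    second-order allpass A(z) of (2.18), using d = -cos(2*pi*fc/fs) for
    the center frequency and c = (tan(pi*fb/fs)-1)/(tan(pi*fb/fs)+1)
    for the bandwidth."""
    t = math.tan(math.pi * fb / fs)
    c = (t - 1.0) / (t + 1.0)
    d = -math.cos(2.0 * math.pi * fc / fs)
    z1 = cmath.exp(-2j * math.pi * f / fs)          # z^-1
    num = -c + d * (1 - c) * z1 + z1 * z1
    den = 1 + d * (1 - c) * z1 - c * z1 * z1
    a = num / den                                    # allpass response
    return 0.5 * (1 + a) if reject else 0.5 * (1 - a)
```

At f = fc the allpass response is exactly -1, so the bandpass passes the center frequency with unit gain while the bandreject produces a perfect notch; both tend to the opposite extreme at DC.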
Figure 2.14 Second-order bandpass/bandreject filter with fc = 0.1 fs and fb = 0.022 fs (magnitude response, phase response, group delay).

Series connection of first- and second-order filters. If several filters are necessary for spectrum shaping, a series connection of first- and second-order filters is performed, which is given by the product of the single transfer functions

H(z) = H1(z) · H2(z) · H3(z) · ...   (2.26)

with first-order sections

Hi(z) = (b0^(i) + b1^(i) z^-1) / (1 + a1^(i) z^-1)   (2.27)

and second-order sections

Hi(z) = (b0^(i) + b1^(i) z^-1 + b2^(i) z^-2) / (1 + a1^(i) z^-1 + a2^(i) z^-2).   (2.28)

A series connection of three stages is shown in Fig. 2.15. The resulting difference
equation can be split into three difference equations, as given by

stage 1:
y1(n) = b0^(1) x(n) + b1^(1) x(n-1) - a1^(1) y1(n-1)   (2.29)
x2(n) = y1(n)   (2.30)

stage 2:
y2(n) = b0^(2) x2(n) + b1^(2) x2(n-1) + b2^(2) x2(n-2) - a1^(2) y2(n-1) - a2^(2) y2(n-2)   (2.31)
x3(n) = y2(n)   (2.32)

stage 3:
y(n) = b0^(3) x3(n) + b1^(3) x3(n-1) + b2^(3) x3(n-2) - a1^(3) y(n-1) - a2^(3) y(n-2).   (2.33)

Figure 2.15 Series connection of first/second-order stages.

Musical Applications

The simple control of the cut-off frequency and the bandwidth of these parametric filters leads to very efficient implementations for real-time audio applications. Only second-order low- and highpass filters need the computation of a complete set of coefficients. The series connection of these filters can be done very easily, as shown in the previous paragraph.

2.2.3 FIR Filters

Introduction

The digital filters that we have seen before are said to have an infinite impulse response. Because of the feedback loops within the structure, an input sample will excite an output signal whose duration is dependent on the tuning parameters and can extend over a fairly long period of time. There are other filter structures without
feedback loops (Figure 2.16). These are called finite impulse response (FIR) filters, because the response of the filter to a unit impulse lasts only for a fixed period of time. These filters allow the building of sophisticated filter types where strong attenuation of unwanted frequencies or decomposition of the signal into several frequency bands is necessary. They typically require more computing power than IIR structures to achieve similar results, but when they are implemented in the form known as fast convolution they become competitive, thanks to the FFT algorithm. It is rather unwieldy to tune these filters interactively. As an example, let us briefly consider the vocoder application. If the frequency bands are fixed, then the FIR implementation can be most effective, but if the frequency bands have to be subtly tuned by a performer, then the IIR structures will certainly prove superior [Mai97]. However, the filter structure in Fig. 2.16 finds widespread applications for head-related transfer functions and the approximation of first room reflections, as will be shown in Chapter 6. For applications where the impulse response of a real system has been measured, the FIR filter structure can be used directly to simulate the measured impulse response.

Figure 2.16 Finite impulse response filter.

Signal Processing

The output/input relation of the filter structure in Fig. 2.16 is described by the difference equation

y(n) = Σ_{i=0}^{N-1} bi x(n-i)   (2.34)
     = b0 x(n) + b1 x(n-1) + ... + b_{N-1} x(n-N+1),   (2.35)

which is a weighted sum of delayed input samples. If the input signal is a unit impulse δ(n), which is one for n = 0 and zero for n ≠ 0, we get the impulse response of the system according to

h(n) = Σ_{i=0}^{N-1} bi δ(n-i).   (2.36)

A graphical illustration of the impulse response of a 5-tap FIR filter is shown in Fig. 2.17. The Z-transform of the impulse response gives the transfer function

H(z) = Σ_{i=0}^{N-1} bi z^-i   (2.37)
Figure 2.17 Impulse response of an FIR filter.

and with z = e^{jΩ} the frequency response

H(e^{jΩ}) = b0 + b1 e^{-jΩ} + b2 e^{-j2Ω} + ... + b_{N-1} e^{-j(N-1)Ω}   (2.38)

with Ω = 2π f/fs = ωT.

Filter design. The filters already described, such as LP, HP, BP and BR, are also possible with FIR filter structures (see Fig. 2.18). The N coefficients b0, ..., b_{N-1} of a nonrecursive filter have to be computed by special design programs, which are discussed in all DSP text books. The N coefficients of the impulse response can be designed to yield a linear phase response, when the coefficients fulfill certain symmetry conditions. The simplest design is based on the inverse discrete-time Fourier transform of the ideal lowpass filter, which leads to the impulse response

h(n) = (2fc/fs) · sin[2π(fc/fs)(n - (N-1)/2)] / [2π(fc/fs)(n - (N-1)/2)],   n = 0, ..., N-1.   (2.39)

To improve the frequency response, this impulse response can be weighted by an appropriate window function w(n), like Hamming or Blackman, according to

h_W(n) = h(n) · w(n)   (2.40)
w(n) = 0.54 - 0.46 cos[2πn/(N-1)]   (Hamming window).   (2.41)

If a lowpass filter is designed and an impulse response h_LP(n) is derived, a frequency transformation of this lowpass filter leads to highpass, bandpass and bandreject filters (see Fig. 2.18).

Figure 2.18 Frequency transformations: LP and frequency transformations to BP and HP.
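The windowed-sinc design of (2.39)-(2.41) can be sketched in a few lines of Python (function names are ours); the response helper simply evaluates (2.38):

```python
import math

def fir_lowpass(N, fc, fs):
    """Windowed-sinc lowpass design after (2.39), weighted with a
    Hamming window; an odd N gives an integer group delay of (N-1)/2."""
    m = (N - 1) / 2.0
    wc = 2.0 * math.pi * fc / fs
    h = []
    for n in range(N):
        t = n - m
        # sinc value, with the limit 2fc/fs at the center tap
        ideal = 2.0 * fc / fs if t == 0 else math.sin(wc * t) / (math.pi * t)
        w = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1))  # Hamming
        h.append(ideal * w)
    return h

def fir_response(h, f, fs):
    """Magnitude of H(e^{jW}) = sum b_i e^{-jWi}, as in (2.38)."""
    om = 2.0 * math.pi * f / fs
    re = sum(b * math.cos(om * i) for i, b in enumerate(h))
    im = -sum(b * math.sin(om * i) for i, b in enumerate(h))
    return math.hypot(re, im)
```

The symmetric coefficients guarantee the linear phase mentioned above; the Hamming window trades a wider transition band for roughly 53 dB of stopband attenuation.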
Frequency transformations are performed in the time domain by taking the lowpass impulse response h_LP(n) and computing the following equations:

LP-HP:
h_HP(n) = h_LP(n) · cos[π(n - (N-1)/2)],   n = 0, ..., N-1   (2.42)

LP-BP:
h_BP(n) = 2 h_LP(n) · cos[2π(fc/fs)(n - (N-1)/2)],   n = 0, ..., N-1   (2.43)

LP-BR:
h_BR(n) = δ(n - (N-1)/2) - h_BP(n),   n = 0, ..., N-1.   (2.44)

Another simple FIR filter design is based on the FFT algorithm and is called frequency sampling. Design examples for audio processing with this design technique can be found in [Zol97].

Musical Applications

If linear phase processing is required, FIR filtering offers magnitude equalization without phase distortions. FIR filters allow real-time equalization by making use of the frequency sampling design procedure [Zol97] and are attractive equalizer counterparts to IIR filters, as shown in [McG93]. A discussion of more advanced FIR filters for audio processing can be found in [Zol97].

2.2.4 Convolution

Introduction

Convolution is a generic signal processing operation like addition or multiplication. In the realm of computer music it has nevertheless the particular meaning of imposing a spectral or temporal structure onto a sound. These structures are usually not defined by a set of few parameters, such as the shape or the time response of a filter, but given by a signal which lasts typically a few seconds or more. Although convolution has been known and used for a very long time in the signal processing community, its significance for computer music and audio processing has grown with the availability of fast computers that allow long convolutions to be performed in a reasonable period of time.
Signal Processing

We could say in general that the convolution of two signals means filtering the one with the other. There are several ways of performing this operation. The straightforward method is a direct implementation in an FIR filter structure, but it is computationally very ineffective when the impulse response is several thousand samples long. Another method, called fast convolution, makes use of the FFT algorithm to dramatically speed up the computation. The drawback of the fast convolution is that it has a processing delay equal to the length of two FFT blocks, which is objectionable for real-time applications, whereas the FIR method has the advantage of providing a result immediately after the first sample has been computed. In order to take advantage of the FFT algorithm while keeping the processing delay to a minimum, low-latency convolution schemes have been developed which are suitable for real-time applications [Gar95, MT99].

The result of convolution can be interpreted in both the frequency and time domains. If a(n) and b(n) are the two convolved signals, the output spectrum will be given by the product of the two spectra S(f) = A(f) · B(f). The time interpretation derives from the fact that if b(n) is a pulse at time k0, we will obtain a copy of a(n) shifted to time k0, i.e. s(n) = a(n - k0). If b(n) is a sequence of pulses, we will obtain a copy of a(n) in correspondence to every pulse, i.e. a rhythmic, pitched, or reverberated structure, depending on the pulse distance. If b(n) is pulse-like, we obtain the same pattern with a filtering effect. In this case b(n) should be interpreted as an impulse response. Thus convolution will result in subtractive synthesis, where the frequency shape of the filter is determined by a real sound. For example the convolution with a bell sound will be heard as filtered by the resonances of the bell. In fact the bell sound is generated by a strike on the bell and can be considered as the impulse response of the bell.
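The fast convolution mentioned above can be sketched without any DSP library: transform both zero-padded signals, multiply the spectra, and transform back. The recursive FFT below is only a teaching sketch (a real implementation would use an optimized, block-partitioned FFT):

```python
import cmath

def _fft(x, inverse=False):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    The inverse transform is left unscaled here."""
    n = len(x)
    if n == 1:
        return list(x)
    even = _fft(x[0::2], inverse)
    odd = _fft(x[1::2], inverse)
    sign = 1j if inverse else -1j
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2 * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def fast_convolve(a, b):
    """Unpartitioned fast convolution: zero-pad both sequences to a
    common power-of-two block, multiply the spectra, transform back."""
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2
    fa = _fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fb = _fft([complex(v) for v in b] + [0j] * (size - len(b)))
    prod = [x * y for x, y in zip(fa, fb)]
    res = _fft(prod, inverse=True)
    # scale the unnormalized inverse FFT and trim the zero padding
    return [v.real / size for v in res[: len(a) + len(b) - 1]]
```

The zero padding makes the circular convolution of the FFT block equal to the desired linear convolution; this whole-block scheme is exactly why fast convolution incurs the block-length processing delay discussed in the text.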
In this way we can simulate the effect of a sound hitting a bell, without measuring the resonances and designing the filter. If both sounds a(n) and b(n) are complex in time and frequency, the resulting sound will be blurred and will tend to lack the original sounds' character. If both sounds are of long duration and each has a strong pitch and smooth attack, the result will contain both pitches and the intersection of their spectra.

Musical Applications

The sound example "quasthal" [m-quasthal] illustrates the use of the impulse response as a way of characterizing a linear system. In this example, a spoken word is convolved with a series of impulses which are derived from measurements of 2 loudspeakers and of 3 rooms. The first loudspeaker, a small studio monitor, alters the original sound the least. The second loudspeaker, a spherical one, colors the sound strongly. When the sound is convolved with the impulse responses of a room, it is projected in the corresponding virtual auditory space [DMT99]. A diffuse reverberation can be produced by convolving with broadband noise having a sharp attack and exponentially decreasing amplitude. Another example features a tuba glissando convolved with a series of snare-drum strokes. The tuba is transformed into something like a Tibetan trumpet playing in the mountains. Each stroke
of the snare drum produces a copy of the tuba sound. Since each stroke is noisy and broadband, it acts like a reverberator. The series of strokes acts like several diffusing boundaries and produces the type of echo that can be found in natural landscapes [DMT99, m-tub5sna].

The convolution can be used to map a rhythm pattern onto a sampled sound. The rhythm pattern can be defined by positioning a unit impulse at each desired time within a signal block. The convolution of the input sound with the time pattern will produce copies of the input signal at each of the unit impulses. If the unit impulse is replaced by a more complex sound, each copy will be modified in its timbre and in its time structure. If a snare-drum stroke is used, the attacks will be smeared and some diffusion will be added [m-gendsna]. The convolution has an effect both in the frequency and in the time domain. Take a speech sound with sharp frequency resonances and a rhythm pattern defined by a series of snare-drum strokes. Each word will appear with the rhythm pattern, and the rhythm pattern will be heard in each word with the frequency resonances of the initial speech sound [m-chu5sna]. The convolution as a tool for musical composition has been investigated by composers such as Horacio Vaggione [m-Vag96, Vag98] and Curtis Roads [Roa97]. Because the convolution has a combined effect in the time and frequency domains, some expertise is necessary to foresee the result of the combination of two sounds.

2.3 Equalizers

Introduction and Musical Applications

In contrast to low/highpass and bandpass/reject filters, which attenuate the audio spectrum above or below a cut-off frequency, equalizers shape the audio spectrum by enhancing certain frequency bands while others remain unaffected. They are built by a series connection of first- and second-order shelving and peak filters, which are controlled independently (see Fig. 2.19). Shelving filters boost or cut the low- or high-frequency bands with the parameters cut-off frequency fc and gain G. Peak filters boost or cut mid-frequency bands with the parameters cut-off frequency fc, bandwidth fb and gain G. One often used filter type is the constant Q peak filter. The Q factor is defined by the ratio of the cut-off frequency to the bandwidth, Q = fc/fb. The cut-off frequencies of peak filters are then tuned while keeping the Q factor constant. This means that the bandwidth is increased when the cut-off frequency is increased, and vice versa. Several proposed digital filter structures for shelving and peak filters can be found in the literature [Whi86, RM87, Dut89a, HB93, Bri94, Orf96, Orf97, Zol97]. Applications of these parametric filters can be found in parametric equalizers, octave equalizers (fc = 31.25, 62.5, 125, 250, 500, 1000, 2000, 4000, 8000, 16000 Hz) and all kinds of equalization devices in mixing consoles, outboard equipment and foot-pedal-controlled stomp boxes.
Figure 2.19 Series connection of shelving and peak filters (parameters: cut-off frequency fc, bandwidth fb and gain G in dB).

2.3.1 Shelving Filters

First-order Design

First-order low/high-frequency shelving filters [Zol97] can be described by the transfer function

H(z) = 1 + (H0/2) [1 ± A(z)]   (LF/HF +/-)   (2.45)

with the first-order allpass

A(z) = (z^-1 + aB/C) / (1 + aB/C z^-1).   (2.46)

The block diagram in Fig. 2.20 shows a first-order low/high-frequency shelving
Figure 2.20 First-order low/high-frequency shelving filter.
filter, which leads to the following difference equations:

y1(n) = aB/C x(n) + x(n-1) - aB/C y1(n-1)   (2.47)
y(n) = (H0/2) [x(n) ± y1(n)] + x(n).   (LF/HF +/-)   (2.48)

The gain G in dB for low/high frequencies can be adjusted by the parameter

H0 = V0 - 1,   with V0 = 10^(G/20).   (2.49)

The cut-off frequency parameter aB for boost and aC for cut can be calculated for a first-order low-frequency shelving filter as

aB = (tan(π fc/fs) - 1) / (tan(π fc/fs) + 1)   (2.50)
aC = (tan(π fc/fs) - V0) / (tan(π fc/fs) + V0).   (2.51)

The cut-off frequency parameters for boost and cut for a first-order high-frequency shelving filter [Zol97] are calculated by

aB = (tan(π fc/fs) - 1) / (tan(π fc/fs) + 1)   (2.52)
aC = (V0 tan(π fc/fs) - 1) / (V0 tan(π fc/fs) + 1).   (2.53)

Magnitude responses for a low-frequency shelving filter are illustrated in the left part of Fig. 2.21 for several cut-off frequencies and gain factors. The slope of the frequency curves for these first-order filters is 6 dB per octave.

Second-order Design

For several applications, especially in advanced equalizer designs, the slope of the shelving filter is further increased by second-order transfer functions. Design formulas for second-order shelving filters are given in Table 2.3 from [Zol97]. Magnitude responses for second-order low/high-frequency shelving filters are illustrated in the right part of Fig. 2.21 for two cut-off frequencies and several gain factors.

2.3.2 Peak Filters

A second-order peak filter [Zol97] is given by the transfer function

H(z) = 1 + (H0/2) [1 - A(z)]   (2.54)

where
Table 2.3 Second-order shelving filter design with K = tan(π fc/fs) [Zol97].

Low-frequency shelving (boost, V0 = 10^(G/20)):
b0 = (1 + √(2V0) K + V0 K²) / (1 + √2 K + K²)
b1 = 2(V0 K² - 1) / (1 + √2 K + K²)
b2 = (1 - √(2V0) K + V0 K²) / (1 + √2 K + K²)
a1 = 2(K² - 1) / (1 + √2 K + K²)
a2 = (1 - √2 K + K²) / (1 + √2 K + K²)

High-frequency shelving (boost, V0 = 10^(G/20)):
b0 = (V0 + √(2V0) K + K²) / (1 + √2 K + K²)
b1 = 2(K² - V0) / (1 + √2 K + K²)
b2 = (V0 - √(2V0) K + K²) / (1 + √2 K + K²)
a1 = 2(K² - 1) / (1 + √2 K + K²)
a2 = (1 - √2 K + K²) / (1 + √2 K + K²)

The corresponding cut formulas (with V0 = 10^(-G/20)) follow the same pattern and are given in [Zol97].

Figure 2.21 Frequency responses for first-order and second-order shelving filters.

A(z) = (-aB/C + d(1 - aB/C) z^-1 + z^-2) / (1 + d(1 - aB/C) z^-1 - aB/C z^-2)   (2.55)

is a second-order allpass filter. The block diagram in Fig. 2.22 shows the second-order peak filter, which leads to the following difference equations:

y1(n) = -aB/C x(n) + d(1 - aB/C) x(n-1) + x(n-2) - d(1 - aB/C) y1(n-1) + aB/C y1(n-2)   (2.56)
y(n) = (H0/2) [x(n) - y1(n)] + x(n).   (2.57)
Figure 2.22 Second-order peak filter.

The center/cut-off frequency parameter d and the coefficient H0 are given by

d = -cos(2π fc/fs)   (2.58)
V0 = 10^(G/20)   (2.59)
H0 = V0 - 1.   (2.60)

The bandwidth fb is adjusted through the parameters aB and aC for boost and cut, which are given by

aB = (tan(π fb/fs) - 1) / (tan(π fb/fs) + 1)   (2.61)
aC = (tan(π fb/fs) - V0) / (tan(π fb/fs) + V0).   (2.62)

This peak filter offers almost independent control of all three musical parameters: center/cut-off frequency, bandwidth and gain. Another design approach from [Zol97], shown in Table 2.4, allows direct computation of the five coefficients for a second-order transfer function as given in the difference equation (2.2). Frequency responses for several settings of a peak filter are shown in Fig. 2.23. The left part shows a variation of the gain with a fixed center frequency and bandwidth. The right part shows, for fixed gain and center frequency, a variation of the bandwidth or Q factor.
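The allpass-based peak filter of (2.54)-(2.62) can be sketched as follows in Python for the boost case (function name ours; the response is evaluated analytically rather than sample by sample):

```python
import cmath
import math

def peak_response(f, fc, fb, gain_db, fs):
    """Evaluate the boost peak filter H(z) = 1 + H0/2 * (1 - A(z)) (2.54)
    at frequency f, with d = -cos(2*pi*fc/fs) (2.58) and the boost
    bandwidth coefficient aB of (2.61)."""
    v0 = 10.0 ** (gain_db / 20.0)
    h0 = v0 - 1.0
    t = math.tan(math.pi * fb / fs)
    ab = (t - 1.0) / (t + 1.0)
    d = -math.cos(2.0 * math.pi * fc / fs)
    z1 = cmath.exp(-2j * math.pi * f / fs)   # z^-1
    # second-order allpass A(z) of (2.55)
    allp = (-ab + d * (1 - ab) * z1 + z1 * z1) / \
           (1 + d * (1 - ab) * z1 - ab * z1 * z1)
    return 1.0 + 0.5 * h0 * (1.0 - allp)
```

At the center frequency the allpass contributes -1, so the gain is exactly V0 there, while at DC and at Nyquist the allpass contributes +1 and the filter is transparent; this is the "almost independent" parameter control mentioned above.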
2.4 Time-varying Filters

Table 2.4 Peak filter design with K = tan(π fc/fs) [Zol97], giving the coefficients b0, b1, b2, a1, a2 for the boost (V0 = 10^(G/20)) and cut (V0 = 10^(-G/20)) cases. (The coefficient entries were lost in extraction.)

Figure 2.23 Frequency responses of second-order peak filters (left: parameter gain factor; right: parameter bandwidth; f in Hz).

The parametric filters discussed in the previous sections allow the time-varying control of the filter parameters gain, cut-off frequency and bandwidth or Q factor. Special applications of time-varying audio filters will be shown in the following.

2.4.1 Wah-wah Filter

The wah-wah effect is produced mostly by foot-controlled signal processors containing a bandpass filter with variable center/resonant frequency and a small bandwidth. Moving the pedal back and forth changes the bandpass cut-off/center frequency. The "wah-wah" effect is then mixed with the direct signal as shown in Fig. 2.24. This effect leads to a spectrum shaping similar to speech and produces a speech-like "wah-wah" sound. If the variation of the center frequency is controlled by the
input signal, a low-frequency oscillator is used to change the center frequency. Such an effect is called an auto-wah filter. If the effect is combined with a low-frequency amplitude variation, which produces a tremolo, the effect is denoted a tremolo-wah filter. Replacing the unit delay in the bandpass filter by an M-tap delay leads to the M-fold wah-wah filter [Dis99], which is shown in Fig. 2.25. M bandpass filters are spread over the entire spectrum and simultaneously change their center frequency. When a white noise input signal is applied to an M-fold wah-wah filter, a spectrogram of the output signal, shown in Fig. 2.26, illustrates the periodic enhancement of the output spectrum. Table 2.5 contains several parameter settings for different effects.

Figure 2.24 Wah-wah: time-varying bandpass filter.

Figure 2.25 M-fold wah-wah filter.

Table 2.5 Effects with M-fold wah-wah filter [Dis99]. Recoverable entries: Wah-Wah (M = 1, -/3 kHz, 200 Hz); M-fold Wah-Wah (M = 5-20, 0.5/-, 200-500 Hz); Bell effect (M = 100, 0.5/-, 100 Hz). (The column headings were lost in extraction.)

2.4.2 Phaser

The previous effect relies on varying the center frequency of a bandpass filter. Another effect uses notch filters: phasing. A set of notch filters, which can be realized as a cascade of second-order IIR sections, is used to process the input signal. The output of the notch filters is then combined with the direct sound. The frequencies of the notches are slowly varied using a low-frequency oscillator (Figure 2.27) [Smi84]. "The strong phase shifts that exist around the notch frequencies combine
with the phases of the direct signal and cause phase cancellations or enhancements that sweep up and down the frequency axis" [Orf96]. Although this effect does not rely on a delay line, it is often considered to go along with delay-line based effects because the sound effect is similar to that of flanging. An extensive discussion on this topic is found in [Str83]. A different phasing approach is shown in Figure 2.28. The notch filters have been replaced by second-order allpass filters with time-varying center frequencies. The cascade of allpass filters produces time-varying phase shifts which lead to cancellations and amplifications of different frequency bands when used in the feedforward and feedback configuration.

Figure 2.26 Spectrogram of the output signal of a time-varying M-fold wah-wah filter [Dis99] (frequency over time).

Figure 2.27 Phasing.
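A minimal sketch of the swept-notch phasing idea in Python (illustrative only: the number of stages, the sweep range, the notch radius and the wet/dry mix are our own choices, not values from the text):

```python
import math

def phaser(x, fs, rate=0.5, depth=0.5, f_lo=300.0, f_hi=3000.0, stages=4):
    """Phasing sketch: a cascade of second-order IIR notch filters whose
    center frequencies are swept by an LFO, mixed with the direct signal."""
    y = []
    # per-stage biquad state: [x(n-1), x(n-2), y(n-1), y(n-2)]
    st = [[0.0, 0.0, 0.0, 0.0] for _ in range(stages)]
    for n, xn in enumerate(x):
        lfo = 0.5 * (1.0 + math.sin(2.0 * math.pi * rate * n / fs))
        fc = f_lo + (f_hi - f_lo) * lfo        # swept notch frequency
        r = 0.9                                # pole radius sets notch bandwidth
        cosw = math.cos(2.0 * math.pi * fc / fs)
        s = xn
        for k in range(stages):
            x1, x2, y1, y2 = st[k]
            # notch biquad: zeros on the unit circle, poles just inside
            out = s - 2.0 * cosw * x1 + x2 + 2.0 * r * cosw * y1 - r * r * y2
            st[k] = [s, x1, out, y1]
            s = out
        y.append((1.0 - depth) * xn + depth * s)
    return y
```

The mix of direct and filtered paths is what produces the sweeping cancellations and enhancements described in the quote above.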
Figure 2.28 Phasing with time-varying allpass filters.

2.4.3 Time-varying Equalizers

- Time-varying octave bandpass filters, as shown in Fig. 2.29, offer the possibility of achieving wah-wah-like effects. The spectrogram of the output signal in Fig. 2.30 demonstrates the octave-spaced enhancement of this approach.

Figure 2.29 Time-varying octave filters.

- Time-varying shelving and peak filters: the special allpass realization of shelving and peak filters has shown that a combination of lowpass, bandpass and allpass filters gives access to several frequency bands inside such a filter structure. Integrating level measurement or envelope followers (see Chapter 5) into these frequency bands can be used for adaptively changing the filter parameters gain, cut-off/center frequency and bandwidth or Q factor. The combination of dynamics processing, which will be discussed in Chapter 5, and parametric filter structures allows the creation of signal-dependent filtering effects with a variety of applications.
- Feedback cancellers, which are based on time-varying notch filters, play an important role in sound reinforcement systems. The spectrum is continuously monitored for spectral peaks and a very narrow-band notch filter is applied to the signal path.

Figure 2.30 Spectrogram of the output signal for time-varying octave filters (frequency over time).

2.5 Conclusion

Filtering is still one of the most commonly used effect tools for sound recording and production. Nevertheless, its successful application is heavily dependent on the specialized skills of the operator. In this chapter we have described basic filter algorithms for time-domain audio processing. These algorithms perform the filtering operations by the computation of difference equations. The coefficients for the difference equations are given for several filter functions such as lowpass, highpass, bandpass, shelving and peak filters. Simple design formulas for various equalizers lead to efficient implementations for time-varying filter applications. The combination of these basic filters together with the signal processing algorithms of the following chapters allows the building of more sophisticated effects.
Sound and Music

[m-chu5sna] chu5sna: vocoder speech convolved with snare-drum strokes. Demo Sound. DAFX Sound Library.
[m-gendsna] gendsna: snare-drum rhythm pattern is mapped onto a gender sound. Demo Sound. DAFX Sound Library.
[m-Mai97] M. Maiguashca: Reading Castañeda. CD. Wergo 2053-2, zkm 3 edition, 1997.
[m-Pie99] F. Pieper: Das Effekte Praxisbuch. GC Carstensen, 1999. CD. Tr. 1, 35, 36.
[m-quasthal] quasthal: convolution of speech with impulse responses. Demo Sound. DAFX Sound Library.
[m-Vag96] H. Vaggione: MYR-S, Composition for cello, electroacoustic set-up and tape. Festival Synthèse, Bourges 1996.
[m-Vas93] P. Vasseur: PURPLE FRAME, in Le sens caché. 100 mystères, CD FREE WAY MUSIQUE, No. 869193, 1993.
[m-tubg5sna] tubg5sna: tuba glissando convolved by a series of snare-drum strokes. Demo Sound. DAFX Sound Library.

Bibliography

[Bri94] R. Bristow. The equivalence of various methods of computing biquad coefficients for audio parametric equalizers. In Proc. 97th AES Convention, Preprint 3906, San Francisco, 1994.
[Cha80] H. Chamberlin. Musical Applications of Microprocessors. Hayden Book Company, 1980.
[Dat97] J. Dattorro. Effect design, part 1: Reverberator and other filters. J. Audio Eng. Soc., 45(9):660-684, September 1997.
[Die00] S. Diedrichsen. Personal communication. emagic GmbH, 2000.
[Dis99] S. Disch. Digital audio effects - modulation and delay lines. Master's thesis, Technical University Hamburg-Harburg, 1999.
[DJ85] C. Dodge and T. A. Jerse. Computer Music: Synthesis, Composition and Performance. Schirmer Books, 1985.
[DMT99] P. Dutilleux and C. Müller-Tomfelde. AML: Architecture and Music Laboratory, a museum installation. In Proc. of the 16th AES Int. Conf. on Spatial Sound Reproduction, Rovaniemi, Finland, pp. 191-206. Audio Engineering Society, April 10-12 1999.
[Dut89a] P. Dutilleux. Simple to operate digital time-varying filters. In Proc. 86th AES Convention, Preprint 2757, Hamburg, March 1989.
[Dut89b] P. Dutilleux. Spinning the sounds in real-time. In Proc. International Computer Music Conference, pp. 94-97, Columbus, November 1989.
[Dut91] P. Dutilleux. Vers la machine à sculpter le son, modification en temps réel des caractéristiques fréquentielles et temporelles des sons. PhD thesis, University of Aix-Marseille II, 1991.
[Gar95] W.G. Gardner. Efficient convolution without input-output delay. J. Audio Eng. Soc., 43(3):127-136, March 1995.
[HB93] F. Harris and E. Brooking. A versatile parametric filter using an imbedded all-pass sub-filter to independently adjust bandwidth, center frequency, and boost or cut. In 95th AES Convention, Preprint 3757, New York, 1993.
[LKG72] M. Labarrère, J. Krief, and B. Gimonet. Le filtrage et ses applications. CEPADUES, 1972.
[Mai97] M. Maiguashca. Reading Castañeda. Booklet. Wergo 2053-2, zkm 3 edition, 1997.
[McG93] D.S. McGrath. An efficient 30-band graphic equalizer implementation for a low cost DSP processor. In Proc. 95th AES Convention, Preprint 3756, New York, 1993.
[MT99] C. Müller-Tomfelde. Low-latency convolution for real-time applications. In Proc. of the 16th AES Int. Conf. on Spatial Sound Reproduction, Rovaniemi, Finland, pp. 454-460. Audio Engineering Society, April 10-12 1999.
[Orf96] S.J. Orfanidis. Introduction to Signal Processing. Prentice-Hall, 1996.
[Orf97] S.J. Orfanidis. Digital parametric equalizer design with prescribed Nyquist-frequency gain. J. Audio Eng. Soc., 45(6):444-455, June 1997.
[RM87] P.A. Regalia and S.K. Mitra. Tunable digital frequency response equalization filters. IEEE Trans. Acoustics, Speech and Signal Processing, 35(1):118-120, January 1987.
[Roa97] C. Roads. Sound transformation by convolution. In C. Roads, St. Pope, A. Piccialli, and G. De Poli (eds), Musical Signal Processing, pp. 411-438. Swets & Zeitlinger Publishers, 1997.
[Smi84] J.O. Smith. An all-pass approach to digital phasing and flanging. In Proc. International Computer Music Conference, pp. 103-109, 1984.
[Str83] A. Strange. Electronic Music: Systems, Techniques and Controls. W. C. Brown, 1983.
[Vag98] H. Vaggione. Transformations morphologiques: quelques exemples. In Proceedings of JIM98, CNRS-LIMA, pp. G1.1-G1.10, Marseille, 1998.
[Whi86] S.A. White. Design of a digital biquadratic peaking or notch filter for digital audio equalization. J. Audio Eng. Soc., 34(6):479-482, June 1986.
[Zol97] U. Zolzer. Digital Audio Signal Processing. John Wiley & Sons, Ltd, 1997.
Chapter 3

Delays

P. Dutilleux, U. Zolzer

3.1 Introduction

Delays can be experienced in acoustical spaces. A sound wave reflected by a wall will be superimposed on the sound wave at the source. If the wall is far away, such as a cliff, we will hear an echo. If the wall is close to us, we will notice the reflections through a modification of the sound color. Repeated reflections can appear between parallel boundaries. In a room, such reflections will be called a flutter echo. The distance between the boundaries determines the delay that is imposed on each reflected sound wave. In a cylinder, successive reflections will develop at both ends. If the cylinder is long, we will hear an iterative pattern whereas, if the cylinder is short, we will hear a pitched tone. Equivalents of these acoustical phenomena have been implemented as signal processing units.

3.2 Basic Delay Structures

3.2.1 FIR Comb Filter

The network that simulates a single delay is called the FIR comb filter. The input signal is delayed by a given time duration. The effect will be audible only when the processed signal is combined (added) with the input signal, which acts here as a reference. This effect has two tuning parameters: the amount of time delay τ and the relative amplitude of the delayed signal to that of the reference signal. The difference equation and the transfer function are given by

  y(n) = x(n) + g x(n - M)                                         (3.1)
  H(z) = 1 + g z^-M,   with M = τ fs.                              (3.2)
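The comb behavior implied by this transfer function can be checked numerically. A quick Python sketch (illustrative values; independent of the M-files used in this chapter): |H| reaches 1+g at multiples of 1/τ = fs/M and drops to 1-g halfway in between.

```python
import cmath
import math

def fir_comb_gain(g, M, f, fs):
    """Magnitude of the FIR comb H(z) = 1 + g z^-M at frequency f."""
    w = 2.0 * math.pi * f / fs
    return abs(1.0 + g * cmath.exp(-1j * w * M))

fs, M, g = 44100.0, 10, 0.5
peak = fir_comb_gain(g, M, fs / M, fs)          # at a multiple of fs/M: |1 + g| = 1.5
notch = fir_comb_gain(g, M, fs / (2 * M), fs)   # halfway in between:   |1 - g| = 0.5
```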
Figure 3.1 FIR comb filter and magnitude response.

The time response of this filter is made up of the direct signal and the delayed version. This simple time-domain behavior comes along with interesting frequency-domain patterns. For positive values of g, the filter amplifies all frequencies that are multiples of 1/τ and attenuates all frequencies that lie in between. The transfer function of such a filter shows a series of spikes and it looks like a comb (Fig. 3.1). That is why this type of filter is called a comb filter. For negative values of g, the filter attenuates frequencies that are multiples of 1/τ and amplifies those that lie in between. The gain varies between 1 + g and 1 - g [Orf96]. The following M-file 3.1 demonstrates a sample-by-sample based FIR comb filter. For plotting the output signal use the command stem(0:length(y)-1,y) and for the evaluation of the magnitude and phase response use the command freqz(y,1).

M-file 3.1 (fircomb.m)
x=zeros(100,1);x(1)=1; % unit impulse signal of length 100
g=0.5;
Delayline=zeros(10,1); % memory allocation for length 10
for n=1:length(x);
  y(n)=x(n)+g*Delayline(10);
  Delayline=[x(n);Delayline(1:10-1)];
end;

As well as acoustical delays, the FIR comb filter has an effect both in the time and frequency domains. Our ear is more sensitive to the one aspect or to the other according to the range where the time delay is set. For larger values of τ, we can hear an echo that is distinct from the direct signal. The frequencies that are amplified by the comb are so close to each other that we barely identify the spectral effect. For smaller values of τ, our ear can no longer segregate the time events but can notice the spectral effect of the comb.

3.2.2 IIR Comb Filter

Similar to the endless reflections at both ends of a cylinder, the IIR comb filter produces an endless series of responses y(n) to an input x(n). The input signal
circulates in a delay line that is fed back to the input. Each time the signal goes through the delay line it is attenuated by g. It is sometimes necessary to scale the input signal by c in order to compensate for the high amplification produced by the structure. It is implemented by the structure shown in Fig. 3.2 with the difference equation and transfer function given by

  y(n) = c x(n) + g y(n - M)                                       (3.3)
  H(z) = c / (1 - g z^-M).                                         (3.4)

Figure 3.2 IIR comb filter and magnitude response (positive and negative coefficient).

Due to the feedback loop, the time response of the filter is infinite. After each time delay τ a copy of the input signal will come out with an amplitude g^p, where p is the number of cycles that the signal has gone through the delay line. It can readily be seen that |g| < 1 is a stability condition. Otherwise the signal would grow endlessly. The frequencies that are affected by the IIR comb filter are similar to those affected by the FIR comb filter. The gain varies between 1/(1 - g) and 1/(1 + g). The main differences between the IIR comb and the FIR comb are that the gain grows very high and that the frequency peaks get narrower as |g| comes closer to 1 (see Fig. 3.2). The following M-file 3.2 shows the implementation of a sample-by-sample based IIR comb filter.

M-file 3.2 (iircomb.m)
x=zeros(100,1);x(1)=1; % unit impulse signal of length 100
g=0.5;
Delayline=zeros(10,1); % memory allocation for length 10
for n=1:length(x);
  y(n)=x(n)+g*Delayline(10);
  Delayline=[y(n);Delayline(1:10-1)];
end;

3.2.3 Universal Comb Filter

The combination of FIR and IIR comb filters leads to the universal comb filter. This is simply an allpass filter (see Fig. 2.8) where the one-sample delay operator
z^-1 is replaced by the M-sample delay operator z^-M and an additional multiplier FF, shown in Figure 3.3. The special cases for different settings of the feedback parameter FB, feedforward parameter FF and blend parameter BL are given in Table 3.1. M-file 3.3 shows the implementation of a sample-by-sample universal comb filter.

Figure 3.3 Universal comb filter.

Table 3.1 Parameters for universal comb filter (x denotes a freely chosen value).

                      BL    FB    FF
  FIR comb filter     x     0     x
  IIR comb filter     1     x     0
  allpass             a     -a    1
  delay               0     0     1

M-file 3.3 (unicomb.m)
x=zeros(100,1);x(1)=1; % unit impulse signal of length 100
BL=0.5;
FB=-0.5;
FF=1;
M=10;
Delayline=zeros(M,1); % memory allocation for length 10
for n=1:length(x);
  xh=x(n)+FB*Delayline(M);
  y(n)=FF*Delayline(M)+BL*xh;
  Delayline=[xh;Delayline(1:M-1)];
end;

The extension of the above universal comb filter to a parallel connection of N comb filters is shown in Figure 3.4. The feedback, feedforward and blend coefficients are now N×N matrices to mix the input and output signals of the delay network. The use of different parameter sets leads to the applications shown in Table 3.2.

3.2.4 Fractional Delay Lines

Variable-length delays of the input signal are used to simulate several acoustical effects. Therefore, delays of the input signal with noninteger values of the sampling
interval are necessary. A delay of the input signal by M samples plus a fraction of the normalized sampling interval, with 0 ≤ frac ≤ 1, is given by

  y(n) = x(n - [M + frac])                                         (3.6)

and can be implemented by the fractional delay line shown in Fig. 3.5.

Figure 3.4 Generalized structure of parallel allpass comb filters.

Table 3.2 Effects with generalized comb filter: a delay setting with scalar coefficients and a reverb setting in which BL, FB and FF become matrices. (The individual entries were lost in extraction.)

Figure 3.5 Fractional delay line with interpolation between the samples at M-1, M and M+1.

Design tools for fractional delay filters can be found in [LVKL96]. An interpolation algorithm has to compute the output sample y(n), which lies in between the two samples at time instants M and M+1. Several interpolation algorithms have been proposed for audio applications:
- linear interpolation [Dat97]

  y(n) = x(n - [M+1]) frac + x(n - M)(1 - frac)                    (3.7)

- allpass interpolation [Dat97]

  y(n) = x(n - [M+1]) frac + x(n - M)(1 - frac) - y(n - 1)(1 - frac)   (3.8)

- sinc interpolation [FC98]

- fractionally addressed delay lines [Roc98, Roc00]

- spline interpolation [Dis99]

  y(n) = x(n - [M+1]) frac^3 / 6
       + x(n - M) [(1 + frac)^3 - 4 frac^3] / 6
       + x(n - [M-1]) [(2 - frac)^3 - 4 (1 - frac)^3] / 6
       + x(n - [M-2]) (1 - frac)^3 / 6                             (3.9)

They all perform interpolation of a fractionally delayed output signal with different computational complexity and different performance properties, which are discussed in [Roc00]. The choice of the algorithm depends on the specific application.

3.3 Delay-based Audio Effects

3.3.1 Vibrato

When a car is passing by, we hear a pitch deviation due to the Doppler effect [Dut91]. This effect will be explained in another chapter, but we can keep in mind that the pitch variation is due to the fact that the distance between the source and our ears is being varied. Varying the distance is, for our application, equivalent to varying the time delay. If we keep on varying the time delay periodically, we will produce a periodical pitch variation. This is precisely a vibrato effect. For that purpose we need a delay line and a low-frequency oscillator to drive the delay time parameter. We should only listen to the delayed signal. Typical values of the parameters are 5 to 10 ms as average delay time and a 5 to 14 Hz rate for the low-frequency oscillator (Figure 3.6) [And95, Whi93]. M-file 3.4 shows the implementation for vibrato [Dis99].

Figure 3.6 Vibrato.

M-file 3.4 (vibrato.m)
% Vibrato
function y=vibrato(x,SAMPLERATE,Modfreq,Width)
ya_alt=0;
Delay=Width;                   % basic delay of input sample in sec
DELAY=round(Delay*SAMPLERATE); % basic delay in # samples
WIDTH=round(Width*SAMPLERATE); % modulation width in # samples
if WIDTH>DELAY
  error('delay greater than basic delay !!!');
  return;
end
MODFREQ=Modfreq/SAMPLERATE;    % modulation frequency in # samples
LEN=length(x);                 % # of samples in WAV-file
L=2+DELAY+WIDTH*2;             % length of the entire delay
Delayline=zeros(L,1);          % memory allocation for delay
y=zeros(size(x));              % memory allocation for output vector
for n=1:(LEN-1)
  M=MODFREQ;
  MOD=sin(M*2*pi*n);
  ZEIGER=1+DELAY+WIDTH*MOD;
  i=floor(ZEIGER);
  frac=ZEIGER-i;
  Delayline=[x(n);Delayline(1:L-1)];
  %---Linear Interpolation-------------------------------
  y(n,1)=Delayline(i+1)*frac+Delayline(i)*(1-frac);
  %---Allpass Interpolation------------------------------
  %y(n,1)=Delayline(i+1)+(1-frac)*Delayline(i)-(1-frac)*ya_alt;
  %ya_alt=y(n,1);
  %---Spline Interpolation-------------------------------
  %y(n,1)=Delayline(i+1)*frac^3/6 ...
  %  +Delayline(i)*((1+frac)^3-4*frac^3)/6 ...
  %  +Delayline(i-1)*((2-frac)^3-4*(1-frac)^3)/6 ...
  %  +Delayline(i-2)*(1-frac)^3/6; % 3rd-order spline interpolation
end

3.3.2 Flanger, Chorus, Slapback, Echo

A few popular effects can be realized using the comb filter. They have special names because of the peculiar sound effects that they produce. Consider the FIR comb filter. If the delay is in the range 10 to 25 ms, we will hear a quick repetition named slapback or doubling. If the delay is greater than 50 ms we will hear an echo. If
the time delay is short (less than 15 ms) and if this delay time is continuously varied with a low frequency such as 1 Hz, we will hear the flanging effect. If several copies of the input signal are delayed in the range 10 to 25 ms with small and random variations in the delay times, we will hear the chorus effect, which is a combination of the vibrato effect with the direct signal (see Table 3.3 and Fig. 3.7) [Orf96, And95, Dat97]. These effects can also be implemented as IIR comb filters. The feedback will then enhance the effect and produce repeated slapbacks or echoes.

Table 3.3 Typical delay-based effects.

  Delay range (ms)   Modulation    Effect name
  0 ... 20           -             Resonator
  0 ... 15           Sinusoidal    Flanging
  10 ... 25          Random        Chorus
  25 ... 50          -             Slapback
  > 50               -             Echo

Figure 3.7 Chorus.

Normalization. We saw in 2.2.1 that it is important to compensate for the intrinsic gain of the filter structure. Whereas in practice the FIR comb filter does not amplify the signal by more than 6 dB, the IIR comb filter can yield a very large amplification when |g| comes close to 1. The L2 and L∞ norms are given by

  L2 = 1/sqrt(1 - g^2),   L∞ = 1/(1 - |g|).                        (3.10)

The normalization coefficient c = 1/L∞, when applied, ensures that no overload will occur with, for example, periodical input signals. c = 1/L2 ensures that the loudness will remain approximately constant for broadband signals.

A standard effect structure was proposed by Dattorro [Dat97] and is shown in Fig. 3.8. It is based on the allpass filter modification towards a general allpass comb, where the fixed delay line is replaced by a variable-length delay line. Dattorro [Dat97] proposed keeping the feedback tap of the delay line fixed; that means the input signal to the delay line xh(n) is delayed by a fixed integer delay K, and with this xh(n - K) is weighted and fed back. The delay K is the center tap delay of
the variable-length delay line for the feedforward path. The control signal MOD(n) for changing the length of the delay line can either be a sinusoid or lowpass noise. Typical settings of the parameters are given in Table 3.4.

Figure 3.8 Standard effects with variable-length delay line.

Table 3.4 Industry standard audio effects [Dis99]. Columns: BL, FF, FB, DELAY, DEPTH, MOD. Recoverable rows: a sinusoidally modulated effect with DEPTH 0-3 ms and MOD 0.1-5 Hz; a second with DEPTH 0-2 ms and MOD 0.1-1 Hz; Chorus (white) with FB (-0.7), delays of 1-30 ms and lowpass-noise modulation; Doubling with BL = FF = 0.7, DELAY 10-100 ms, DEPTH 1-100 ms and lowpass-noise modulation. (The remaining entries were lost in extraction.)

3.3.3 Multiband Effects

New interesting sounds can be achieved after splitting the signal into several frequency bands, for example lowpass, bandpass and highpass signals, as shown in Fig. 3.9.

Figure 3.9 Multiband effects.

Variable-length delays are applied to these signals with individual parameter settings, and the output signals are weighted and summed to form the broadband signal [FC98, Dis99]. Efficient frequency-splitting schemes are available from loudspeaker crossover designs and can be applied for this purpose directly. One of these techniques uses complementary filtering [Fli94, Orf96], which consists of lowpass filtering
and subtracting the lowpass signal from the broadband signal to derive the highpass signal, as shown in Fig. 3.10. The lowpass signal is then further processed by a following stage of the same complementary technique to deliver the bandpass and lowpass signals.

Figure 3.10 Filter bank for multiband effects.

3.3.4 Natural Sounding Comb Filter

We have made the comparison between acoustical cylinders and IIR comb filters. This comparison might seem inappropriate because the comb filters sound metallic. They tend to greatly amplify the high-frequency components and they appear to resonate much too long compared to the acoustical cylinder. To find an explanation, let us consider the boundaries of the cylinder. They reflect the acoustic waves with an amplitude that decreases with frequency. If the comb filter should sound like an acoustical cylinder, then it should also have a frequency-dependent feedback coefficient g(f). This frequency dependence can be realized by using a first-order lowpass filter in the feedback loop (see Fig. 3.11) [Moo85].

Figure 3.11 Lowpass IIR comb filter.

These filters sound more natural than the plain IIR comb filters. They find application in room simulators. Further refinements such as fractional delays and compensation of the frequency-dependent group delay within the lowpass filter make them suitable for the imitation of acoustical resonators. They have been used, for example, to impose a definite pitch onto broadband signals such as sea waves or to detune fixed-pitched instruments such as a Celtic harp [m-Ris92]. M-file 3.5 shows the implementation of a sample-by-sample based lowpass IIR comb filter.

M-file 3.5 (lpiircomb.m)
x=zeros(100,1);x(1)=1; % unit impulse signal of length 100
g=0.5;
b_0=0.5;
b_1=0.5;
a_1=0.7;
xhold=0;
yhold=0;
Delayline=zeros(10,1); % memory allocation for length 10
for n=1:length(x);
  yh(n)=b_0*Delayline(10)+b_1*xhold-a_1*yhold; % 1st-order difference equation
  yhold=yh(n);
  xhold=Delayline(10);
  y(n)=x(n)+g*yh(n);
  Delayline=[y(n);Delayline(1:10-1)];
end;

3.4 Conclusion

Delays are used in audio processing to solve several practical problems, for example delay compensation for sound reinforcement systems, and as basic building blocks for delay-based audio effects, artificial reverberation and physical models for instrument simulation. The variety of applications of delays to spatial effects will be presented in Chapter 6.

This brief introduction has described some of the basic delay structures, which should enable the reader to implement and experiment with delay algorithms on their own. We have focused on a small set of important delay applications such as echo, vibrato, flanger and chorus. We have pointed out the important combination of delays and filters and their time-varying application. These basic building blocks may serve as a source of ideas for designing new digital audio effects.

Sound and Music

[m-Ris92] J.-C. Risset: Lurai, pour harpe celtique et bande. Radio France, 21.3.1992.
[m-Pie99] F. Pieper: Das Effekte Praxisbuch. Delay, Kammfilter, Phaser, Vibrato, Chorus. CD, Tr. 2-7, 18-24, 28-32, Ch. 5, 7, 8, 10, 11. GC Carstensen, 1999.

Bibliography

[And95] C. Anderton. Multieffects for Musicians. Amsco Publications, 1995.
[Dat97] J. Dattorro. Effect design, part 2: Delay-line modulation and chorus. J. Audio Eng. Soc., 45(10):764-788, October 1997.
[Dis99] S. Disch. Digital audio effects - modulation and delay lines. Master's thesis, Technical University Hamburg-Harburg, 1999.
[Dut91] P. Dutilleux. Vers la machine à sculpter le son, modification en temps réel des caractéristiques fréquentielles et temporelles des sons. PhD thesis, University of Aix-Marseille II, 1991.
[FC98] P. Fernández-Cid and F.J. Casajús-Quirós. Enhanced quality and variety of chorus/flange units. In Proc. DAFX-98 Digital Audio Effects Workshop, pp. 35-39, Barcelona, November 1998.
[Fli94] N.J. Fliege. Multirate Digital Signal Processing. John Wiley & Sons, Ltd, 1994.
[LVKL96] T.I. Laakso, V. Välimäki, M. Karjalainen, and U.K. Laine. Splitting the unit delay. IEEE Signal Processing Magazine, 13:30-60, 1996.
[Moo85] J.A. Moorer. About this reverberation business. In Foundations of Computer Music, pp. 605-639. MIT Press, 1985.
[Orf96] S.J. Orfanidis. Introduction to Signal Processing. Prentice-Hall, 1996.
[Roc98] D. Rocchesso. Fractionally-addressed delay lines. In Proc. DAFX-98 Digital Audio Effects Workshop, pp. 40-43, Barcelona, November 1998.
[Roc00] D. Rocchesso. Fractionally addressed delay lines. IEEE Trans. on Speech and Audio Processing, 8(6):717-727, November 2000.
[Whi93] P. White. L'enregistrement créatif, Effets et processeurs, Tomes 1 et 2. Les cahiers de l'ACME, 1993.
Chapter 4

Modulators and Demodulators

P. Dutilleux, U. Zolzer

4.1 Introduction

Modulation is the process by which parameters of a sinusoidal signal (amplitude, frequency and phase) are modified or varied by an audio signal. In the realm of telecommunications the word modulate means: "to shift the frequency spectrum of a signal to another frequency band". Numerous techniques have been designed to achieve this goal and some have found applications in digital audio effects. In the field of audio processing these modulation techniques are mainly used with very low frequency shifts of the audio signal. In particular, the variation of control parameters for filters or delay lines can be regarded as an amplitude modulation or phase modulation of the audio signal. Wah-wah, phaser and tremolo are typical examples of amplitude modulation, and vibrato, flanger and chorus are examples of phase modulation of the audio signal. To gain a deeper understanding of the possibilities of modulation techniques we will first introduce simple schemes for amplitude modulation, single-side band modulation and phase modulation and point out their use for audio effects. The combination of these modulators will lead to more advanced digital audio effects, which will be demonstrated by several examples. In a further section we will describe several demodulators, which extract the incoming signal or parameters of the incoming signal for further effects processing.
4.2 Modulators

4.2.1 Ring Modulator

In ring modulation (RM) the audio signal x(n) is multiplied by a sinusoid m(n) with carrier frequency fc. In the analog domain it was pretty difficult to do it properly, but within a computer it is straightforward [Ste87], since it is a mere multiplication. The input signal is called the modulator x(n) and the second operand is called the carrier m(n):

  y(n) = x(n) · m(n).                                              (4.1)

If m(n) is a sine wave of frequency fc, the spectrum of the output y(n) is made up of two copies of the input spectrum: the lower side band (LSB) and the upper side band (USB). The LSB is reversed in frequency and both side bands are centered around fc (Figure 4.2). Depending on the width of the spectrum of x(n) and on the carrier frequency, the side bands can be partly mirrored around the origin of the frequency axis. If the carrier signal comprises several spectral components, the same effect happens with each component. Although the audible result of a ring modulation is fairly easy to comprehend for elementary signals, it gets very complicated with signals having numerous partials. The carrier itself is not audible in this kind of modulation. When carrier and modulator are sine waves of frequencies fc and fx, one hears the sum and the difference frequencies fc + fx and fc - fx [Hal95].

Figure 4.1 Ring modulation of a signal x(n) by a sinusoidal carrier signal m(n).

Figure 4.2 Ring modulation of a signal x(n) by a sinusoidal carrier signal m(n). The spectrum of the modulator x(n) (a) is shifted around the carrier frequency (b).

When the input signal is periodic with fundamental frequency f0, a sinusoidal carrier of frequency fc produces a spectrum with amplitude lines at the frequencies |k f0 ± fc| [De 00]. A musical application of this effect is applied in the piece "Ofanim" by Luciano Berio. The first section is dominated by a duet between a child voice and a clarinet. The transformation of the child voice into a clarinet was desired.
For this purpose a pitch detector computes the instantaneous frequency f0(n) of the
4.2 Modulators 77voice. Then thechild voice passes through a ring modulator,where the frequency ofthe carrier fc is set to fo(n)/2.In this case odd harmonics prevail which is similarto thesound of a clarinet in the low register [VidSl]. Noticethat in order to betterrepresent the characteristic articulations of theclarinetandtoeliminatetypicalvocal portamento, a sample-and-hold unit is inserted after the pitch detector forholding the current pitch until thesuccessive one has been established.4.2.2 Amplitude ModulatorThe amplitude modulation(.AM) was easier to realize with analog electronic meansand has therefore been in use for a much longer time. It can be implemented byy(n) = [l+am(n)].x(..) (4.2)where it is assumed that the peak amplitude of m(n)is 1.The Q: coefficient deter-mines the depthof modulation. The modulationeffect is maximum with a = 1andthe effect is disengaged when a = 0. A typical application is with an audio signal ascarrier x(.) and a low-frequency oscillator (LFO) as modulator m(.) (Figure 4.3).The amplitude of the audio signal varies according to the amplitude of the LFO,and it is heard as such.(audio)Carriera 1 x(n)Modulaior(LFO) m(n)Figure 4.3 Typical application of AM.When the modulatoris an audiblesignal and the carrier asine wave of frequencyfc, the spectrum of the output y(t) is similar to that of the ring modulator exceptthat the carrier frequency can be also heard. When carrier and modulator are sinewaves of frequencies fc and fx, one hears three components: carrier,difference andsum frequencies ( f c - f z , fc, f c +fx). One should notice that due to the integrationtime of our hearing system the effect is perceived in a different manner dependingonthe frequency range of the signals. A modulationwithfrequencies below 20Hertz will beheardinthetimedomain(variation of theamplitude,tremolo inFig. 
4.4) whereas modulations by high frequencies will be heard as distinct spectral components (LSB, carrier, USB).

4.2.3 Single-Side Band Modulator

The upper and lower side bands carry the same information, although organized differently. In order to save bandwidth and transmitter power, radio-communication engineers have designed the single-side band (SSB) modulation scheme (Fig. 4.5). Either the LSB or the USB is transmitted. Phase-shifted versions by 90° of the
modulating audio signal x(n) are denoted by x̂(n), and of the carrier signal m(n) by m̂(n); both are produced by Hilbert transform filters [Orf96]. The upper and lower side band signals can be computed as follows:

USB(n) = x(n)·m(n) − x̂(n)·m̂(n)
LSB(n) = x(n)·m(n) + x̂(n)·m̂(n).

Figure 4.4 Tremolo effect by AM.

Figure 4.5 Single-side band modulator with compensation filter CF and Hilbert filter (90° block).
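Before looking at the FIR realization, the side-band arithmetic can be checked with an FFT-based analytic signal x(n) + j·x̂(n); multiplying it by a complex carrier and taking the real part yields the USB. This is an illustrative sketch, not the book's implementation:

```python
import numpy as np

def analytic(x):
    """FFT-based analytic signal x(n) + j*xhat(n): zero out negative frequencies."""
    N = len(x)                      # N assumed even here
    X = np.fft.fft(x)
    w = np.zeros(N)
    w[0] = 1.0
    w[1:N // 2] = 2.0
    w[N // 2] = 1.0
    return np.fft.ifft(X * w)

fs = 8000
n = np.arange(fs)
x = np.sin(2 * np.pi * 200 * n / fs)              # input partial at 200 Hz
fc = 1000.0                                        # shift amount
osc = np.exp(2j * np.pi * fc * n / fs)             # complex carrier selects the USB

usb = np.real(analytic(x) * osc)                   # 200 Hz shifted up to 1200 Hz
spec = np.abs(np.fft.rfft(usb)) / len(usb)
dominant = int(np.argmax(spec))
```

A negative-frequency oscillator would select the LSB instead, shifting the partial down.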
Figure 4.6 Delay compensation and Hilbert filter.

A discrete-time Hilbert transform can be approximated by an FIR filter with the zero-phase impulse response

h(n) = (1 − cos(πn)) / (πn) = { 2/(πn) for n odd, 0 for n even.

These coefficients are multiplied with a suitable window function of length N, for example a Hamming window, and shifted right by (N − 1)/2 samples to make the filter causal. Acceptable quality can be achieved with N ≈ 60. Note that the use of the FIR Hilbert filter requires a delay in the direct path for the audio and the carrier signal. Figure 4.6 shows an example with a compensation delay of 30 samples and an FIR Hilbert filter of length N = 61. This effect is typically used with a sine wave as carrier of frequency fc. The use of a complex oscillator for m(n) simplifies the implementation. By using positive or negative frequencies it is then possible to select the USB or the LSB. The spectrum of x(n) is frequency shifted up or down according to fc. The results are non-harmonic sound effects: a plucked-string sound is heard, after processing, like a drum sound. The modification in pitch is much less than expected, probably because our ear recovers pitch information from the frequency difference between partials and not only from the lowest partial of the tone [Dut91].
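A direct realization of this design (length N = 61, Hamming window; written from the description above, so treat it as a sketch) and a check that the magnitude response is close to unity near the middle of the band:

```python
import numpy as np

N = 61                                     # filter length; causal delay (N-1)/2 = 30
k = np.arange(N) - (N - 1) // 2            # zero-phase index -30..30
h = np.zeros(N)
odd = (k % 2) != 0
h[odd] = 2.0 / (np.pi * k[odd])            # h(n) = 2/(pi*n) for odd n, 0 for even n
h *= np.hamming(N)                         # taper with a Hamming window

# evaluate the magnitude response at fs/4 (normalized frequency pi/2)
w0 = np.pi / 2
H = np.sum(h * np.exp(-1j * w0 * np.arange(N)))
mid_gain = abs(H)
```

Away from 0 and fs/2 the gain stays near 1; the ripple shrinks as N grows.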
4.2.4 Frequency and Phase Modulator

Frequency modulation (FM) is widely used for broadcasting and has found interesting applications for sound synthesis [Roa96]. The continuous-time description of an angle-modulated carrier signal is given by

x_PM/FM(t) = A_c · cos[2π fc t + φ(t)]    (4.6)

where A_c is the amplitude of the signal and the argument of the cosine is given by the carrier frequency fc and the modulating signal m(t). For phase modulation (PM) the phase φ(t) is directly proportional to the modulating signal m(t), while for frequency modulation the phase φ(t) is the integral of the modulating signal m(t). Some examples of frequency and phase modulation are shown in Fig. 4.7. In the first example the modulating signal is a sinusoid, which shows that the resulting FM and PM signals are the same. The second example (c) and (d) shows the difference between FM and PM, where the modulating signal is now a bipolar pulse signal. The last example in (e) and (f) shows the result of a ramp-type signal. The main idea behind using these techniques is the control of the carrier frequency by a modulating signal m(n). We especially notice from Fig.
4.7 the possibility to change the carrier frequency by a sinusoid and a ramp signal. Using angle modulation for audio effects is different from the previous discussion, where a modulating signal m(n) is used to modify the phase φ(t) of a cosine of fixed carrier frequency fc. By phase modulation we mean the direct modification of the phase of the audio signal by a control parameter or modulating signal m(n). A phase modulator for an audio signal can be regarded as a system which performs a phase modulation of the audio input signal x(n). The phase modulator system can be described by a time-variant impulse response h(n) and leads to a phase modulated output signal x_PM(n). The result for phase modulation (PM) of the signal x(n) can then be written as

y(n) = x_PM(n) = x(n − m(n))    (4.11)

where m(n) is a continuous variable, which changes every discrete time instant n. Therefore m(n) is decomposed into an integer and a fractional part [Dat97]. The integer part is implemented by a series of M unit delays; the fractional part is approximated by interpolation filters [Dat97, LVKL96, Zöl97], e.g. linear interpolation, spline interpolation or allpass interpolation (see Fig. 4.8). The discrete-time
Figure 4.8 Phase modulation by delay-line modulation.

Fourier transform of (4.11) yields

Y(e^jΩ) = X_PM(e^jΩ) = X(e^jΩ) · e^(−jΩ·m(n))    (4.12)

which shows the phase modulation of the input signal by a time-variant delay line m(n). The phase modulation is performed by the modulating signal m(n).

For sine-type modulation, useful for vibrato effects, the modulation signal can be written as

m(n) = M + DEPTH · sin(ω_M n T).    (4.13)

For a sinusoidal input signal the so-called resampling factor for sine-type modulation can be derived as

α(n) = ω_I / ω = 1 − DEPTH · ω_M T · cos(ω_M n T).    (4.14)

The instantaneous radian frequency is denoted by ω_I and the radian frequency of the input sinusoid is denoted by ω. The resampling factor is regarded as the pitch-change ratio in [Dat97]. For sine-type modulation the mean value of the resampling factor α(n) is one. The consequence is an output signal which has the same length as the input signal, but has a vibrato centered around the original pitch.

For ramp-type modulation according to

m(n) = M ± SLOPE · n,    (4.15)

the resampling factor α(n) for the sinusoidal input signal is given by

α(n) = ω_I / ω = 1 ∓ SLOPE.    (4.16)

The output signal is pitch transposed by a factor α and the length of the output data is altered by the factor 1/α. This behavior is useful for pitch-transposing applications.

4.3 Demodulators

Each modulation has a suitable demodulation scheme. The demodulator for the ring modulator uses exactly the same scheme, so no new effect is to be expected there. The demodulator for the amplitude modulator is called an amplitude follower in the realm of digital audio effects. Several schemes are available; some are inspired from analog designs, some are much easier to realize using digital techniques. These demodulators are comprised of three parts: a detector, an averager and a scaler.
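The delay-line modulation (4.11) with the sine-type control (4.13), which also underlies the vibrato and rotary effects of section 4.4, can be sketched with linear interpolation for the fractional delay (parameter values are illustrative):

```python
import numpy as np

def vibrato(x, M=64, depth=32.0, f_mod=5.0, fs=44100):
    """Delay-line modulation y(n) = x(n - m(n)) with linear interpolation.
    m(n) = M + depth*sin(2*pi*f_mod*n/fs); illustrative parameter names."""
    n = np.arange(len(x))
    m = M + depth * np.sin(2 * np.pi * f_mod * n / fs)   # eq. (4.13)
    pos = n - m                                           # read position
    i = np.floor(pos).astype(int)
    frac = pos - i
    y = np.zeros_like(x)
    valid = i >= 0                       # before that, the delay line is empty
    y[valid] = (1 - frac[valid]) * x[i[valid]] + frac[valid] * x[i[valid] + 1]
    return y

fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
y = vibrato(x, fs=fs)
```

As the text states for sine-type modulation, the output has the same length as the input, with the pitch oscillating around the original.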
4.3.1 Detectors

The detector can be a half-wave rectifier d_h(n), a full-wave rectifier d_f(n), a squarer d_s(n) or an instantaneous envelope detector d_e(n). The first two detectors are directly inspired by analog designs. They are still useful to achieve effects having typical analog behavior. The third and fourth types are much easier to realize in the digital domain (Figure 4.9). The four detectors are computed from the input signal x(n) and its Hilbert transform x̂(n) as follows:

d_h(n) = max[0, x(n)]
d_f(n) = |x(n)|
d_s(n) = x²(n)
d_e(n) = x²(n) + x̂²(n)    (4.17)

Figure 4.9 Detectors: (a) half-wave, (b) full-wave, (c) squarer, (d) instantaneous envelope.

4.3.2 Averagers

In the analog domain the averager is realized with a resistor-capacitor (RC) network, and in the digital domain using a first-order lowpass filter. Both structures are characterized by a time-constant τ. The filter is implemented as:

g = exp[−1/(fs·τ)]
d(n) = detector output
y(n) = (1 − g)·d(n) + g·y(n − 1)    (4.18)

The time-constant must be chosen in accordance with the application. A short time-constant is suitable when fast variations of the input signal must be followed. A larger time-constant is better to measure the long-term amplitude of the input signal. This averager is nevertheless not suitable for many applications. It is often
Figure 4.10 RMS (Root Mean Square) detectors: (a) single time-constant; (b) attack and release time-constants.

necessary to follow short attacks of the input signal. This calls for a very small time-constant, typically 5 ms. The output of the averager will then react very fast to any amplitude variation, even to the intrinsic variations within a period of a low-frequency signal. We understand that we need an averager with two time-constants: an attack time-constant τ_a and a release time-constant τ_r. To distinguish it from the basic averager, we will call this one the AR-averager. McNally has proposed an implementation having two fixed coefficients [McN84, Zöl97] and Jean-Marc Jot has an alternative where a single coefficient is varied according to the relationship between the input and the output of the averager (Figure 4.10):

g_a = exp[−1/(fs·τ_a)]    (4.19)
g_r = exp[−1/(fs·τ_r)]    (4.20)
d(n) = detector output

4.3.3 Amplitude Scalers

The outputs of the systems described above are all different. In order to get measures that are comparable with each other, it would be necessary to scale the outputs. Although scaling schemes are typically defined for sine waves, each type of signal will require a different scaling scheme. To build an RMS detector or an instantaneous envelope detector, a root extractor would still be necessary, but building an accurate device can be difficult in analog and computationally intensive in digital. Fortunately, it is often possible to avoid the root extraction by modifying the circuit that makes use of the averager output, so that it works fine with squared measures. For these practical reasons the scaling is taken into account most of the time within the device that follows the averager output. If this device is a display, then the scaling can be done by changing the display marks.

4.3.4 Typical Applications

Well-known devices or typical applications relate to the previous schemes as follows:
Figure 4.11 Instantaneous envelope detector as applied to detect a difference tone that is produced by two sine waves.

• The AM-detector comprises the half-wave rectifier and the basic averager.
• The Volume-Meter (VU-meter) is an AM-detector. It measures the average amplitude of the audio signal.
• The Peak Program Meter (PPM) is, according to DIN 45406, a full-wave rectifier followed by an AR-averager with 10 ms integration time and 1500 ms release time.
• The RMS detector, as found in electronic voltmeters, uses the squarer and the basic averager.
• A sound level meter uses an RMS detector along with an AR-averager to measure impulsive signals.
• The RMS detector associated with an AR-averager is the best choice for amplitude-follower applications in vocoders, computer music and live electronics [Dut98b, m-Fur93].
• Dynamics processors use various types of the above-mentioned schemes in relation to the effect and to the quality that has to be achieved.
• The instantaneous envelope detector, without averager, is useful to follow the amplitude of a signal with the finest resolution. The output typically contains audio-band signals. A particular application of the instantaneous envelope detector is the amplification of difference tones (Figure 4.11) [Dut96, m-MBa95].

4.4 Applications

Several applications of modulation techniques for audio effects are presented in the literature [Dut98a, War98, Dis99]. We will now summarize some of these effects.
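Several of the effects below build on the detector–averager chain of section 4.3. As a reference point, the squarer detector of (4.17) followed by the first-order averager of (4.18) can be sketched (time-constant and signal values are illustrative):

```python
import numpy as np

def averager(d, tau, fs):
    """First-order lowpass averager, eq. (4.18): y(n) = (1-g)*d(n) + g*y(n-1)."""
    g = np.exp(-1.0 / (fs * tau))
    y = np.zeros(len(d))
    state = 0.0
    for n in range(len(d)):
        state = (1 - g) * d[n] + g * state
        y[n] = state
    return y

fs = 8000
t = np.arange(fs) / fs
x = 0.8 * np.sin(2 * np.pi * 440 * t)

d = x ** 2                          # squarer detector, eq. (4.17)
ms = averager(d, tau=0.05, fs=fs)   # mean-square estimate, 50 ms time-constant
rms = np.sqrt(ms[-1])               # steady-state RMS, about 0.8/sqrt(2)
```

The final square root is the root extraction that, as noted in section 4.3.3, practical devices often avoid by working with the squared measure directly.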
4.4.1 Vibrato

The cyclic variation of the pitch of the input signal is the basic application of the phase modulator described in the previous section (see Fig. 4.12). The variation is controlled by a low-frequency oscillator with m(n) = M + DEPTH·sin(ω n T).

Figure 4.12 Vibrato based on a phase modulator.

4.4.2 Stereo Phaser

The application of an SSB modulator for a stereo phaser is described in [War98]. Figure 4.13 shows an SSB modulator performed by a recursive allpass implementation of a Hilbert filter. The phase difference of 90° is achieved through specially designed allpass filters. A further effect with this approach is a rotating speaker effect, if the two output signals are connected via DACs to loudspeakers.

Figure 4.13 Stereo phaser based on SSB modulation [War98].

4.4.3 Rotary Loudspeaker Effect

Introduction

The rotary loudspeaker effect was first used for the electronic reproduction of organ instruments. Figure 4.14 shows the configuration of a rotating bi-directional loudspeaker horn in front of a listener. The sound in the listener's ears is altered by the Doppler effect, the directional characteristic of the speakers, and phase effects due to air turbulence. The Doppler effect raises and lowers the pitch according to the rotation speed. The directional characteristic of the opposite horn arrangement performs an intensity variation in the listener's ears. Both the pitch modification and the intensity variation are performed by speaker A and in the opposite direction by speaker B.
Figure 4.14 Rotary loudspeaker [DZ99].

Signal Processing

A combination of modulation and delay-line modulation can be used for a rotary loudspeaker effect simulation [DZ99], as shown in Fig. 4.15. The simulation makes use of a modulated delay line for pitch modifications and amplitude modulation for intensity modifications. The simulation of the Doppler effect of two opposite horns is done by the use of two delay lines modulated with 180° phase-shifted signals in vibrato configuration (see Fig. 4.15). A directional sound characteristic similar to rotating speakers can be achieved by amplitude modulating the output signal of the delay lines. The modulation is synchronous to the delay modulation in such a manner that the back-moving horn has a lower pitch and decreasing amplitude. At the return point the pitch is unaltered and the amplitude is minimum. The movement in the direction of the listener causes a raised pitch and increasing amplitude. A stereo rotary speaker effect is perceived due to unequal mixing of the two delay lines to the left and right channel outputs.

Figure 4.15 Rotary loudspeaker simulation [DZ99].
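A minimal mono sketch of this structure (rotation rate, delay depths and the equal mix are assumptions; the stereo version would mix the two paths unequally): two delay lines are modulated 180° out of phase, and each path is amplitude modulated synchronously so that the receding horn drops in pitch and level:

```python
import numpy as np

def frac_delay(x, m):
    """Read x at position n - m(n) with linear interpolation (delay-line modulation)."""
    n = np.arange(len(x))
    pos = n - m
    i = np.floor(pos).astype(int)
    frac = pos - i
    y = np.zeros_like(x)
    ok = (i >= 0) & (i + 1 < len(x))
    y[ok] = (1 - frac[ok]) * x[i[ok]] + frac[ok] * x[i[ok] + 1]
    return y

fs = 8000
n = np.arange(2 * fs)
x = np.sin(2 * np.pi * 220 * n / fs)

lfo = np.sin(2 * np.pi * 2.0 * n / fs)        # 2 Hz rotation
m_a = 16 + 8 * lfo                            # horn A delay modulation
m_b = 16 - 8 * lfo                            # horn B: 180 degrees out of phase

# amplitude is minimal where the delay is maximal (horn at its far point)
y_a = 0.5 * (1 - lfo) * frac_delay(x, m_a)
y_b = 0.5 * (1 + lfo) * frac_delay(x, m_b)
y = 0.5 * (y_a + y_b)                         # mono mix; pan unequally for stereo
```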
Musical Applications

By imprinting amplitude and pitch modulations as well as some spatialization, this effect makes sounds more lively. At lower rotation speeds it is reminiscent of the echoes in a cathedral, whereas at higher rotation speeds it gets a ring modulation flavor. This effect is known as “Leslie” from the name of Donald E. Leslie, who invented it in the early forties. It was licensed to electronic organ manufacturers such as Baldwin, Hammond or Wurlitzer, but it has also found applications for other musical instruments such as the guitar or even the voice (“Blue Jay Way” on the Beatles LP “Magical Mystery Tour” [Sch94]). A demonstration of a Leslie simulator can be heard on [m-Pie99]. This effect can also be interpreted as a rotating microphone between two loudspeakers. You may also imagine that you are sitting on a merry-go-round and you pass by two loudspeakers.

4.4.4 SSB Effects

Single-side band modulation can be used as a special effect for detuning percussion instruments or voices. The harmonic frequency relations are modified by using this technique. Another application is time-variant filtering: first use SSB modulation to shift the input spectrum, then apply filtering or phase modulation, and then perform the demodulation of the signal, as shown in Fig. 4.16 [Dis99, DZ99]. The frequency shift of the input signal is achieved by a low-frequency sinusoid. Arbitrary filters can be used in between modulation and demodulation. The simulation of the mechanical vibrato bar of an electric guitar can be achieved by applying a vibrato instead of a filter [DZ99]. Such a vibrato bar alters the pitch of the lower strings of the guitar by larger amounts than the higher strings, and thus a non-harmonic vibrato results.
The SSB approach can also be used for the construction of modified flangers.

Figure 4.16 SSB modulation-filtering-demodulation: if a vibrato is performed instead of the filter, a mechanical vibrato bar simulation is achieved.

Further applications of SSB modulation techniques for audio effects are presented in [Dut98a, War98].

4.4.5 Simple Morphing: Amplitude Following

Among the many different meanings that the word “morphing” can have, let us now consider its first meaning: imposing a feature of one sound onto another. The amplitude envelope, the spectrum as well as the time structure are features that can
be morphed. Morphing the amplitude envelope can be achieved by the amplitude follower, whereas morphing a spectrum or a time structure can be achieved by the use of convolution.

Introduction

Envelope following is one of the various methods developed to breathe more life into synthetic sounds. The amplitude envelope of a control signal, usually coming from a real acoustical source, is measured and used to control the amplitude of the synthetic sounds. For example, the amplitude envelope of speech can be used to control the amplitude of broadband noise. Through this process the noise seems to have been articulated like voice. A refinement of this method has led to the development of the vocoder, where the same process is applied in each of the frequency bands into which the voice as well as the noise are divided.

An audio effect is achieved when the amplitude of an input signal is modified by a predefined amplitude envelope or according to the amplitude of another signal. In the latter case the process is called amplitude following.

Signal Processing

If an amplitude envelope is used, the input signal is multiplied by the output of the envelope generator. If a control signal is used, its envelope has to be measured before it can be multiplied with the input signal. When an accurate measurement is desired, an RMS detector should be used. However, signals from acoustic instruments usually have fairly limited amplitude variations, and their loudness variations depend more on spectrum modifications than on amplitude modifications. If the loudness of the output signal has to be similar to that of the controlling signal, then an expansion of the dynamics of the controlling signal should be performed. An effective way to expand the dynamics by a factor 2 is to eliminate the root extraction from the scaler and use a much simpler MS (Mean Square) detector.

Figure 4.17 The amplitude of an input signal x(n) is controlled by that of another signal xc(n).
The amplitude of the input signal is first leveled before the modulation by the amplitude measured on the controlling signal.
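A sketch of this scheme with an MS detector and an AR-averager (attack 5 ms, release 50 ms; the function names and test signals are illustrative choices, not the book's code):

```python
import numpy as np

def envelope_follower(c, tau_a, tau_r, fs):
    """MS detector with separate attack/release time-constants (AR-averager)."""
    g_a = np.exp(-1.0 / (fs * tau_a))
    g_r = np.exp(-1.0 / (fs * tau_r))
    env = np.zeros(len(c))
    state = 0.0
    for n in range(len(c)):
        d = c[n] ** 2                       # squarer (MS) detector
        g = g_a if d > state else g_r       # fast attack, slow release
        state = (1 - g) * d + g * state
        env[n] = state
    return env

fs = 8000
n = np.arange(fs)
x = np.sin(2 * np.pi * 440 * n / fs)        # input (e.g. a synthetic sound)
c = np.where(n < fs // 2, 1.0, 0.1) * np.sin(2 * np.pi * 110 * n / fs)  # control

env = envelope_follower(c, tau_a=0.005, tau_r=0.05, fs=fs)  # 5 ms / 50 ms
y = env * x                                  # amplitude following (MS, no root)
```

The short attack lets onsets of the control signal pass through, while the longer release keeps the envelope in the sub-audio range, as recommended below.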
Musical Applications and Control

In “Swim, swan”, Kiyoshi Furukawa has extended the sound of a clarinet by additional synthetic sounds. In order to link these sounds intimately to the clarinet, their amplitude is controlled by that of the clarinet. In this case, the input sound is the synthetic sound and the controlling sound is the clarinet. The mixing of the synthetic sounds with the clarinet is done in the acoustic domain of the performance space [m-Fur93].

The amplitude variations of the controlling signal applied to the input signal produce an effect that is perceived in the time domain or in the frequency domain, according to the frequency content of the modulating signal. For sub-audio rates (below 20 Hz) the effect will appear in the time domain and we will call it “amplitude following”, whereas for audio modulation rates (above 20 Hz) the effect will be perceived in the frequency domain and will be recognized as an amplitude modulation.

If the control signal has a large bandwidth, the spectrum of the amplitude will have to be reduced by the averager. Typical settings for the decay time-constant of the averager are in the range of 30 to 100 ms. Such values will smooth out the amplitude signal so that it remains in the sub-audio range. However, it is often desired that the attacks that are present in the control signal are morphed onto the input signal as attacks and are not smoothed out by the averager. This is why it is recommended to use a shorter attack time-constant than the decay time-constant. Typical values are in the range of 1 to 30 ms.

The amplitude variations of the input signal could be opposite to those of the controlling signal, hence reducing the impact of the effect, or be similar and provoke an expansion of the dynamics. In order to get amplitude variations at the output that are similar to those of the controlling signal, it is recommended to process the input signal through a compressor-limiter beforehand [Hal95, p. 40].

In his work “Diario polacco”, Luigi Nono specifies how the singer should move away from her microphone in order to produce amplitude modifications that are used to control the amplitude of other sounds [Hal95, pp. 67-68].

Applying an amplitude envelope can produce interesting modifications of the input signal. Take, for example, the sustained sound of a flute and apply iteratively a triangular amplitude envelope. By varying the slopes of the envelope and the iteration rate, the original sound can be affected by a tremolo or a Flatterzunge and evoke percussive instruments [Dut88]. Such sound transformations are reminiscent of those (anamorphoses) that the early electroacoustic composers were fond of [m-Sch98].

4.5 Conclusion

In this chapter we introduced the concepts of modulators and demodulators in relation to digital audio effects. Although well known in communication engineering
and already successfully used for music synthesizers, the special emphasis on modulation and demodulation can also help to clarify the importance of these techniques in the field of audio effects. The interaction of modulators/demodulators with filters and delays is one of the fundamental processes for many audio effects occurring in the real world. Several applications of modulators and demodulators may serve as examples for experiments and further research.

Sound and Music

[m-MBa95] M. Bach: 55 Sounds for Cello. Composition for cello and live-electronics. 1995.

[m-Fur93] K. Furukawa: “Swim, swan”, composition for clarinet and live-electronics. ZKM, 1993.

[m-Hoe82] K. Hörmann and M. Kaiser: Effekte in der Rock- und Popmusik: Funktion, Klang, Einsatz. Bosse-Musik-Paperback 21, 1982. Sound examples, Cassette BE 2240 MC.

[m-Pie99] F. Pieper: Leslie-Simulatoren. CD, Tr. 33, in Das Effekte Praxisbuch, Ch. 12. GC Carstensen, 1999.

[m-Sch98] P. Schaeffer and G. Reibel: Solfège de l'objet sonore. Booklet + 3 CDs. First published 1967. INA-GRM, 1998.

Bibliography

[Dat97] J. Dattorro. Effect design, part 2: Delay-line modulation and chorus. J. Audio Eng. Soc., 45(10):764-788, October 1997.

[De 00] G. De Poli. Personal communication, 2000.

[Dis99] S. Disch. Digital audio effects - modulation and delay lines. Master's thesis, Technical University Hamburg-Harburg, 1999.

[Dut88] P. Dutilleux. Mise en œuvre de transformations sonores sur un système temps-réel. Technical report, Rapport de stage de DEA, CNRS-LMA, June 1988.

[Dut91] P. Dutilleux. Vers la machine à sculpter le son, modification en temps réel des caractéristiques fréquentielles et temporelles des sons. PhD thesis, University of Aix-Marseille II, 1991.

[Dut96] P. Dutilleux. Verstärkung der Differenztöne (f2-f1). In Bericht der 19. Tonmeistertagung Karlsruhe, Verlag K.G. Saur, pp. 798-806, 1996.
[Dut98a] P. Dutilleux. Filters, delays, modulations and demodulations: A tutorial. In Proc. DAFX-98 Digital Audio Effects Workshop, pp. 4-11, Barcelona, November 1998.

[Dut98b] P. Dutilleux. Opéras multimédias, le rôle des ordinateurs dans trois créations du ZKM. In Musique et arts plastiques, GRAME et Musée d'Art Contemporain, Lyon, pp. 73-79, 1998.

[DZ99] S. Disch and U. Zölzer. Modulation and delay line based digital audio effects. In Proc. DAFX-99 Digital Audio Effects Workshop, pp. 5-8, Trondheim, December 1999.

[Hal95] H.P. Haller. Das Experimentalstudio der Heinrich-Strobel-Stiftung des Südwestfunks Freiburg 1971-1989, Die Erforschung der Elektronischen Klangumformung und ihre Geschichte. Nomos, 1995.

[LVKL96] T.I. Laakso, V. Välimäki, M. Karjalainen, and U.K. Laine. Splitting the unit delay. IEEE Signal Processing Magazine, 13:30-60, 1996.

[McN84] G.W. McNally. Dynamic range control of digital audio signals. J. Audio Eng. Soc., 32(5):316-327, May 1984.

[Orf96] S.J. Orfanidis. Introduction to Signal Processing. Prentice-Hall, 1996.

[Roa96] C. Roads. The Computer Music Tutorial. MIT Press, 1996.

[Sch94] W. Schiffner. Rock und Pop und ihre Sounds. Elektor-Verlag, 1994.

[Ste87] M. Stein. Les modems pour transmission de données. Masson CNET-ENST, 1987.

[Vid91] A. Vidolin. Musical interpretation and signal processing. In G. De Poli, A. Piccialli, and C. Roads (eds), Representations of Musical Signals, pp. 439-459. MIT Press, 1991.

[War98] S. Wardle. A Hilbert-transformer frequency shifter for audio. In Proc. DAFX-98 Digital Audio Effects Workshop, pp. 25-29, Barcelona, November 1998.

[Zöl97] U. Zölzer. Digital Audio Signal Processing. John Wiley & Sons, Ltd, 1997.
Chapter 5

Nonlinear Processing

P. Dutilleux, U. Zölzer

5.1 Introduction

Audio effect algorithms for dynamics processing, valve simulation, overdrive and distortion for guitar and recording applications, psychoacoustic enhancers and exciters fall into the category of nonlinear processing. They create intentional or unintentional harmonic or inharmonic frequency components which are not present in the input signal. Harmonic distortion is caused by nonlinearities within the effect device. Most of these signal processing devices are controlled by varying parameters while simultaneously listening to the output signal and monitoring it by a level meter. A lot of listening and recording experience is necessary to obtain sound results which are preferred by most listeners. The application of these signal processing devices is an art of its own and of course one of the main tools for recording engineers and musicians.

Nonlinear processing/processors is the term for signal processing algorithms or devices in the analog or digital domain which, if the input signal is a sinusoid of known amplitude and frequency according to x(n) = A·sin(2π f1 T n), deliver an output signal that is a sum of sinusoids y(n) = A0 + A1·sin(2π f1 T n) + A2·sin(2·2π f1 T n) + ... + AN·sin(N·2π f1 T n). A linear system will deliver the output signal y(n) = A_out·sin(2π f1 T n + φ_out), which again is a sinusoid where the amplitude is modified by the magnitude response |H(f1)| of the transfer function according to A_out = |H(f1)|·A, and the phase φ_out = φ_in + ∠H(f1) is modified by the phase ∠H(f1) of the transfer function. Block diagrams in Fig. 5.1, showing input and output signals of a linear and a nonlinear system, illustrate the difference between the two systems. A measurement of the total harmonic distortion gives an
indication of the nonlinearity of the system. Total harmonic distortion is defined by

THD = sqrt( (A2² + A3² + ... + AN²) / (A1² + A2² + ... + AN²) ),

which is the square root of the ratio of the sum of powers of all harmonic frequencies above the fundamental frequency to the power of all harmonic frequencies including the fundamental frequency.

Figure 5.1 Input and output signals of a linear and a nonlinear system. The output signal of the linear system is changed in amplitude and phase. The output signal of the nonlinear system is strongly shaped by the nonlinearity and consists of a sum of harmonics, as shown by the spectrum.

We will discuss nonlinear processing in three main musical categories. The first category consists of dynamic range controllers, where the main purpose is the control of the signal envelope according to some control parameters. The amount of harmonic distortion introduced by these control algorithms should be kept as low as possible. Dynamics processing algorithms will be introduced in section 5.2. The second class of nonlinear processing is designed for the creation of strong harmonic distortion, such as guitar amplifiers, guitar effect processors, etc., and will be introduced in section 5.3. These nonlinear processors can be described by a linear part and a nonlinear part. The linear part consists of the impulse response, and the nonlinear part is a combination of a nonlinear function f[x(n)] of the input signal x(n) and linear systems in front of and behind this nonlinearity. The theory and simulation of nonlinear systems will be discussed in section 5.3.1. The third category can be described by the same theoretical approach and is represented by signal processing
devices called exciters and enhancers. Their main field of application is the creation of additional harmonics for a subtle improvement of the main sound characteristic. The amount of harmonic distortion is usually kept small to avoid a pronounced effect. Exciters and enhancers are based on psychoacoustic fundamentals and will be discussed in section 5.4.

5.2 Dynamics Processing

Dynamics processing is performed by amplifying devices where the gain is automatically controlled by the level of the input signal. We will discuss limiters, compressors, expanders and noise gates. A good introduction to the parameters of dynamics processing can be found in [Ear76] (pp. 242-248).

Dynamics processing is based on an amplitude/level detection scheme sometimes called an envelope follower, an algorithm to derive a gain factor from the result of the envelope follower, and a multiplier to weight the input signal (see Fig. 5.2). The envelope follower calculates the mean of the absolute values |x(n)| over a predefined time interval. The output-to-input relation is usually described by the characteristic curve y(n) = f[x(n)], as shown in Fig. 5.2. Certain thresholds are defined for a change of the output-to-input behavior. For the given example the output is limited to y(n) = ±0.5 for |x(n)| > 0.5 and y(n) = x(n) for |x(n)| ≤ 0.5. The lower path, consisting of the envelope detector and the following processing to derive the gain factor g(n), is usually called the side chain path. Normally, the gain factor is derived from the input signal, but the side chain path can also be connected to another signal for controlling the gain factor of the input signal.

Signal Processing

A detailed description of a dynamic range controller is shown in Fig. 5.3. It consists of a direct path for delaying the input signal and a side chain path. The side chain path performs a level measurement and a subsequent gain factor calculation, which is then used as a gain factor for the delayed input signal.
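A stripped-down sketch of this side-chain structure (envelope follower, then gain factor, then multiplier; no look-ahead delay, and all parameter values are illustrative simplifications):

```python
import numpy as np

def simple_limiter(x, threshold=0.5, tau=0.01, fs=44100):
    """Side-chain limiter sketch: envelope follower -> gain factor -> multiply."""
    g_c = np.exp(-1.0 / (fs * tau))
    env = 0.0
    y = np.zeros(len(x))
    for n in range(len(x)):
        env = (1 - g_c) * abs(x[n]) + g_c * env    # envelope follower
        gain = min(1.0, threshold / env) if env > 0 else 1.0
        y[n] = gain * x[n]                          # weight the input signal
    return y

fs = 44100
t = np.arange(fs // 2) / fs
x = 0.9 * np.sin(2 * np.pi * 220 * t)
y = simple_limiter(x, threshold=0.5, tau=0.01, fs=fs)
```

Because the smoothed envelope, not the instantaneous sample, drives the gain, the output is only held near the threshold on average; the delay D in Fig. 5.3 exists precisely to compensate for this side-chain lag.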
The level measurement is followed by a static function and a part for the attack and release time adjustment. Besides the time signals x(n), f(n), g(n) and y(n), the corresponding signal levels X, F, G and Y are denoted. These level values are the logarithm of the root mean square value x_RMS(n) or peak value x_peak(n) of the time signals according to X = 20·log10(·). The multiplication y(n) = g(n)·x(n − D) at the output of the dynamic range controller can be regarded as an addition in the logarithmic domain. This means Y = X + G in dB. The calculation of the time-variant gain factor g(n) is usually performed with a logarithmic level representation, because the human sensitivity to loudness follows a logarithmic relation. The delay of D samples in the direct path allows for the time delay of the side chain processing, which is mainly made up of the level measurement and the attack and release time adjustments. The static function for the output level versus the input level (in dB) and the calculations are shown in the left part of Fig. 5.4. Inside this representation the
Figure 5.2 Block diagram of a nonlinear signal processing device with envelope detector. The lower plots show the input signal x(n), the envelope of x(n), the derived gain factor g(n) and the output signal y(n).

Figure 5.3 Block diagram of a dynamic range controller [Zöl97].
thresholds for limiting (LT limiter threshold), compression (CT compressor threshold), expansion (ET expander threshold) and noise gate (NT noise gate threshold) are denoted. In the right part of Fig. 5.4 the input level versus the gain level (in dB) is shown, which clearly shows the four regions of operation. For the description of the static function two further parameters, namely the slope factor S = 1 - 1/R and the compression factor R = ΔL_I/ΔL_O, are used. The compression factor R represents the ratio of input level change ΔL_I to output level change ΔL_O. With the help of Fig. 5.5 for compression the equation Y = CT + (X - CT)/R and the equation R = ΔL_I/ΔL_O = tan(β_C) can be derived.

Figure 5.4 Static characteristic of a dynamic range controller [Zöl97].

Figure 5.5 Static characteristic: definition of slope S and ratio R (example: R = 3/1, S = 1 - 1/3 = 2/3) [Zöl97].

Typical values of the compression factor and slope factor for the four regions of operation are:

Limiter       R = ∞        S = 1
Compressor    1 < R < ∞    0 < S < 1
Linear part   R = 1        S = 0
Expander      0 < R < 1    -∞ < S < 0
Noise gate    R = 0        S = -∞.
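The two relations above can be checked numerically. A minimal Python sketch (the default threshold and ratio values are illustrative, matching the R = 3/1 example of Fig. 5.5):

```python
def comp_output_level(X, CT=-20.0, R=3.0):
    """Output level in dB above the compressor threshold CT:
    Y = CT + (X - CT)/R, i.e. R dB of input change give 1 dB of output change."""
    return CT + (X - CT) / R if X > CT else X

def slope_from_ratio(R):
    """Slope factor of the gain curve: S = 1 - 1/R."""
    return 1.0 - 1.0 / R
```

For R = 3/1 and CT = -20 dB, an input of -14 dB (6 dB above threshold) is mapped to -18 dB (2 dB above threshold), and the slope factor is S = 2/3, as annotated in Fig. 5.5.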
The calculation of the control parameter f(n) in the logarithmic domain F in dB can be performed by simple line equations [Zöl97, RZ95] given by

Limiter       F_L   = -LS(X - LT) + CS(CT - LT)
Compressor    F_C   = -CS(X - CT)
Linear part   F_lin = 0
Expander      F_E   = -ES(X - ET)
Noise gate    F_NG  = -NS(X - NT) + ES(ET - NT).

The dynamic behavior of a dynamic range controller is influenced by the level measurement approach (with attack time AT and release time RT for peak measurement and averaging time TAV for RMS measurement) and is further adjusted with special attack/release times, which can be achieved by the systems shown in Figures 5.6 and 5.7.

Figure 5.6 RMS and peak measurement (envelope detector/follower) for a dynamic range controller [McN84, Zöl97].

Figure 5.7 Dynamic filter: attack and release time adjustment for a dynamic range controller [McN84, Zöl97].

These envelope detectors/followers can also be used for other musical applications. The calculation of the attack time parameter is carried out by
AT = 1 - e^(-2.2·T/t_AT),

where t_AT is the time parameter in seconds and T is the sampling period. The release time parameter RT and the averaging parameter TAV can be computed by the same formula by replacing t_AT by t_RT or t_TAV, respectively. Further details and derivations can be found in [McN84, Zöl97]. The output factor f(n) of the static function is used as the input signal to the dynamic filter with attack and release times in Fig. 5.7. The output signal g(n) is the gain factor for weighting the delayed input signal x(n - D), as shown in Fig. 5.3. In the following sections some special dynamic range controllers are discussed in detail.

5.2.1 Limiter

The functional units of a limiter are shown in Fig. 5.8. The purpose of the limiter is to provide control over the high peaks in the signal and to change the dynamics of the signal as little as possible. A limiter makes use of peak level measurement and should react very quickly when the signal exceeds the limiter threshold. Typical parameters for a limiter are t_AT = 0.02…0.04…10.24 ms and t_RT = 1…130…5000 ms for the peak measurement and t_AT = 0.02…0.04…10.24 ms and t_RT = 1…130…5000 ms for the attack/release time adjustment. The fast attack and release of a limiter allow the volume reduction as soon as the signal crosses the threshold. By lowering the peaks, the overall signal can be boosted. Besides limiting single instrument signals, limiting is also often performed on the final mix of a multichannel application.

Figure 5.8 Block diagram of a limiter [Zöl97].

The following M-file 5.1 may serve as an example of a sample-based limiter implementation.

M-file 5.1 (limiter.m)

% limiter.m
anzahl=220;
for n=1:anzahl
  x(n)=0.2*sin(n/5);
end;
for n=anzahl+1:2*anzahl
  x(n)=sin(n/5);
end;
slope=1;
tresh=0.5;
rt=0.01;
at=0.4;
xd(1)=0;
for n=2:2*anzahl
  a=abs(x(n))-xd(n-1);
  if a<0, a=0; end;
  xd(n)=xd(n-1)*(1-rt)+at*a;
  if xd(n)>tresh,
    f(n)=10^(-slope*(log10(xd(n))-log10(tresh)));
    % linear calculation of f=10^(-LS*(X-LT))
  else
    f(n)=1;
  end;
  y(n)=x(n)*f(n);
end;

Figure 5.9 demonstrates the behavior of the time signals inside a limiter configuration of Fig. 5.8. A variant for a hard limiter is given by the following M-file 5.2, which makes use of lowpass filtering the absolute value |x(n)| with a second-order Butterworth filter to obtain the gain factor g(n) [Ben97].

M-file 5.2 (hard_limiter.m)

function y=hard_limiter(x, limit)
% Hard sound limiter, limit - normalized limit
g=filter(1e-5*[0.45535887 0.91071773849 0.45535887],...
  [1 -1.99395528 0.993973494],abs(x));
% detects the envelope of the signal with a second-order
% Butterworth filter, cut-off frequency 30 Hz
g=g/max(g);
for n=1:length(x)
  if g(n)>limit  % if the signal envelope is above the limit
    x(n)=x(n)*limit/g(n);
  end;
end;
y=x;

5.2.2 Compressor and Expander

A dynamic range controller for compression and expansion is shown in Fig. 5.10. The gain factor calculation is based on an RMS measurement and some computations in the logarithmic domain. Typical parameters for compressors and expanders are t_AT = 5 ms and t_RT = 130 ms for the RMS measurement and t_AT =
Figure 5.9 Signals for limiter: input signal x(n), output signal y(n), filter output signal x_peak(n) and gain signal f(n).

0.16…5…2600 ms and t_RT = 1…130…5000 ms for the attack/release time adjustment. The programming is similar to the implementation for the limiter from the previous subsection.

Figure 5.10 Block diagram of a compressor/expander [Zöl97].

A different method of implementation is given by [Ben97], which follows the
Figure 5.11 Block diagram of a compressor/expander [Ben97] (comp > 0: expander, comp = 0: linear, -1 < comp < 0: compressor).

block diagram in Fig. 5.11. The corresponding M-file 5.3 illustrates the implementation.

M-file 5.3 (compexp.m)

function y=compexp(x,comp,release,attack,a,Fs)
% Compressor/expander
% comp - compression: 0>comp>-1, expansion: 0<comp<1
% a    - filter parameter <1
h=filter([(1-a)^2],[1.0000 -2*a a^2],abs(x));
h=h/max(h);
h=h.^comp;
y=x.*h;
y=y*max(abs(x))/max(abs(y));

Compressors are used for reducing the dynamics of the input signal. Quiet parts or low levels of a signal are not modified, but high levels or loud parts are reduced according to the static curve. The difference between the loud and quiet parts is thus lessened, so the overall signal level can be boosted and the signal made louder. Expanders operate on low-level signals and increase the dynamics of these signals. This leads to a lively sound characteristic.

5.2.3 Noise Gate

The functional units of a noise gate are shown in Fig. 5.12. The decision to activate the gate is based on a peak measurement, which leads to a fade in/fade out of the gain factor g(n) with appropriate attack and release times. The input to the time constant system is set to zero if the input level falls below the noise gate threshold, and is set to one if the input level exceeds the noise gate level. The M-file 5.4 demonstrates an implementation with a hold time [Ben97].

M-file 5.4 (noisegt.m)

function y=noisegt(x,holdtime,ltrhold,utrhold,release,attack,a,Fs)
% y=noisegt(x,holdtime,ltrhold,utrhold,release,attack,a,Fs);
% noise gate with hysteresis
% holdtime - time in seconds the sound level has to be below the
%            threshold value before the gate is activated
Figure 5.12 Block diagram of a noise gate [Zöl97].

% ltrhold - threshold value for activating the gate
% utrhold - threshold value for deactivating the gate > ltrhold
% release - time in seconds before the sound level reaches zero
% attack  - time in seconds before the output sound level is the
%           same as the input level after deactivating the gate
% a       - pole placement of the envelope detecting filter <1
% Fs      - sampling frequency
rel=round(release*Fs);  % number of samples for fade
att=round(attack*Fs);   % number of samples for fade
g=zeros(size(x));
lthcnt=0;
uthcnt=0;
ht=round(holdtime*Fs);
h=filter([(1-a)^2],[1.0000 -2*a a^2],abs(x));  % envelope detection
h=h/max(h);
for i=1:length(h)
  % Value below the lower threshold?
  if (h(i)<=ltrhold) | ((h(i)<utrhold) & (lthcnt>0))
    lthcnt=lthcnt+1;
    uthcnt=0;
    if lthcnt>ht
      % Time below the lower threshold longer than the hold time?
      if lthcnt>(rel+ht)
        g(i)=0;
      else
        g(i)=1-(lthcnt-ht)/rel;  % fades the signal to zero
      end;
    elseif ((i<ht) & (lthcnt==i))
      g(i)=0;
    else
      g(i)=1;
    end;
  % Value above the upper threshold or is the signal being faded in?
  elseif (h(i)>=utrhold) | ((h(i)>ltrhold) & (uthcnt>0))
    uthcnt=uthcnt+1;
    % Has the gate been activated or isn't the signal faded in yet?
    if (g(i-1)<1)
      g(i)=max(uthcnt/att,g(i-1));
    else
      g(i)=1;
    end;
    lthcnt=0;
  else
    g(i)=g(i-1);
    lthcnt=0;
    uthcnt=0;
  end;
end;
y=x.*g;
y=y*max(abs(x))/max(abs(y));

The main use of a noise gate is to eliminate noise when the desired signal is not present; the noise gate attenuates only the soft signals. A particular application is found when recording a drum set. Each element of the drum set has a different decay time. When they are not manually damped, their sounds mix together and the result is no longer distinguishable. When each element is processed by a noise gate, every sound can automatically be faded out after the attack part of the sound. This results in an overall cleaner sound.

Further implementations of limiters, compressors, expanders and noise gates can be found in [Orf96] and in [Zöl97], where special combined dynamic range controllers are also discussed.

5.2.4 De-esser

A de-esser is a signal processing device for processing speech and vocals. It consists of a bandpass filter tuned to the frequency range between 2 and 6 kHz to detect the level of the signal in this frequency band. If a certain threshold is exceeded, the amount of gain is used to control the gain factor of a peak or notch filter tuned to the same frequency band. The peak or notch filter is in the direct signal path (see Fig. 5.13). As an alternative to the bandpass/notch filters, highpass and shelving filters are used with good results. In order to make the de-esser more robust against input level changes, the threshold should depend on the overall level of the signal - that is, a relative threshold [Nie00].

Applications of de-essers are mainly in the field of speech and vocal processing to avoid high frequency sibilance. Here, quite fast time constants are used. Another application is the feedback reduction in sound reinforcement systems, where adaptively changing notch frequencies and gain factors are required.
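The de-esser idea of detecting the 2-6 kHz band and ducking only that band can be sketched as follows. This is an illustrative Python sketch, not the book's implementation: the band is isolated here by a crude difference of two one-pole lowpasses (a stand-in for the bandpass/notch filters in the text), and the threshold and filter constants are arbitrary choices.

```python
import numpy as np

def onepole_lp(x, a):
    """First-order lowpass y(n) = (1 - a) x(n) + a y(n-1)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = (1 - a) * x[n] + (a * y[n - 1] if n else 0.0)
    return y

def deesser(x, fs, thresh=0.1):
    """Sketch of the Fig. 5.13 idea: detect the level of the sibilance
    band and attenuate only that band in the direct path."""
    a_lo = np.exp(-2 * np.pi * 2000.0 / fs)
    a_hi = np.exp(-2 * np.pi * 6000.0 / fs)
    band = onepole_lp(x, a_hi) - onepole_lp(x, a_lo)   # crude 2-6 kHz band
    env = onepole_lp(np.abs(band), 0.999)              # band level
    g = np.where(env > thresh, thresh / np.maximum(env, 1e-12), 1.0)
    return (x - band) + g * band                       # duck the band only

fs = 44100
sib = 0.9 * np.sin(2 * np.pi * 4000 * np.arange(8000) / fs)  # "s"-like tone
out = deesser(sib, fs)
```

A tone inside the detection band is attenuated once its band envelope exceeds the threshold, while low-frequency material, whose band component stays tiny, passes essentially unchanged.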
Figure 5.13 Block diagram of a de-esser.

5.2.5 Infinite Limiters

In order to catch overshoots from a compressor and limiter, a clipper - or infinite limiter - may be used [Nie00]. Another reason for using an infinite limiter is that the average level rises. The infinite limiter is a nonlinear operation working directly on the waveform by flattening the signal above a threshold. The simplest one is hard clipping, which generates lots of high order harmonics. A gentler infinite limiter is the soft clipper, which rounds the signal shape before the absolute clipping threshold. The rounding typically consists of a low order polynomial, and therefore the harmonic spectrum rolls off faster [Nie00].

Due to the fact that the signal processing takes place in the digital domain, not only harmonic distortion but also aliasing distortion is generated. Aliasing distortion sounds metallic and is generally unwanted. Although infinite limiting should be avoided during mix down of multi-channel recordings and recording sessions, several CDs make use of infinite limiting or saturation (see wavefile in Fig. 5.14 and listen to Carlos Santana/Smooth [m-San99]).

Figure 5.14 Infinite limiting (Santana - "Smooth").
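The two clipper variants can be contrasted in a few lines of Python. The hard clipper is exactly the flattening described above; the soft clipper uses a cubic (a low-order polynomial, as the text suggests, but the particular polynomial is our choice, not taken from the book).

```python
import numpy as np

def hard_clip(x, t=0.8):
    """Infinite limiter: flatten the waveform above the threshold t."""
    return np.clip(x, -t, t)

def soft_clip(x, t=0.8):
    """Soft clipper sketch: a cubic rounds the shape, reaching the
    clipping value with zero slope at |x| = t, so the harmonic
    spectrum rolls off faster than for hard clipping."""
    u = np.clip(x / t, -1.0, 1.0)
    return t * (1.5 * u - 0.5 * u ** 3)
```

Note that this cubic has a small-signal gain of 1.5, so level matching against the hard clipper would require rescaling; both stay within ±t for any input.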
5.3 Nonlinear Processors

There are two approaches towards nonlinear processing for audio applications. The first approach is driven by musical applications of nonlinear effects, and the second is driven by the reduction of nonlinear behavior, especially in the field of loudspeakers. We cover the first approach of nonlinear effects for musical processing, where topics from nonlinear modeling of physical systems play an important role. Therefore, we investigate approximation and simulation techniques for nonlinear processors and will show simple implementation schemes for nonlinear processors. Special nonlinear waveshaping techniques for sound synthesis can be found in [Bru79, Arf79, De 84].

5.3.1 Basics of Nonlinear Modeling

Digital signal processing is mainly based on linear time-invariant systems. The assumption of linearity and time invariance is certainly valid for a variety of technical systems, especially for systems where input and output signals are bounded to a specified amplitude range. In fact, several analog audio processing devices have nonlinearities, such as valve amplifiers, analog effect devices, analog tape recorders, loudspeakers and, at the end of the chain, the human hearing mechanism. A compensation and the simulation of these nonlinearities need nonlinear signal processing and of course a theory of nonlinear systems. From the several models for different nonlinear systems discussed in the literature, the Volterra series expansion is a suitable approach, because it is an extension of the linear systems theory. Not all technical and physical systems can be described by the Volterra series expansion, especially systems with extreme nonlinearities. If the inner structure of a nonlinear system is unknown, a typical measurement set-up, as shown in Fig. 5.15, with a pseudo-random signal as the input signal and recording of the output signal is used.
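The measurement idea - excite the unknown system with pseudo-random noise, record the output, and estimate the linear kernel by cross-correlation - can be sketched as follows. This is an illustrative Python sketch; the 4-tap kernel is made up purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical unknown system: its linear kernel h1(n) (4 taps,
# invented here for illustration).
h1 = np.array([0.5, 0.3, -0.2, 0.1])

x = rng.standard_normal(1 << 15)      # pseudo-random excitation
y = np.convolve(x, h1)[: len(x)]      # "recorded" output signal

# For white input, the cross-correlation r_xy(k) normalized by r_xx(0)
# estimates h1(k).
N = len(x)
est = np.array([np.dot(x[: N - k], y[k:]) for k in range(len(h1))]) / np.dot(x, x)
```

With ~32k excitation samples the estimate recovers the taps to within a few percent; higher-order kernels require higher-order cross-correlations in the same spirit.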
Input and output signals allow the calculation of the linear impulse response h1(n) by cross-correlation and of the kernels (impulse responses) of higher order h2(n1,n2), h3(n1,n2,n3), …, hN(n1,…,nN) by higher order cross-correlations. The linear impulse response h1(n) is a one-dimensional kernel, h2(n1,n2) is a two-dimensional and hN(n1,…,nN) an N-dimensional kernel. An exhaustive treatment of these techniques can be found in [Sch80].

Figure 5.15 Measurement of nonlinear systems.

These N kernels can be used
for an N-order Volterra system model given by

y(n) = Σ_{i=1}^{N} Σ_{ν1=0}^{M} … Σ_{νi=0}^{M} h_i(ν1,…,νi) x(n-ν1)…x(n-νi).   (5.3)

Figure 5.16 shows the block diagram representing (5.3).

Figure 5.16 Simulation of nonlinear systems by an N-order Volterra system model.

A further simplification [Fra97] is possible if the kernels can be factored according to

h_i(n1,n2,…,ni) = h_i^f(n1) h_i^f(n2) … h_i^f(ni).   (5.4)

Then (5.3) can be written as

y(n) = Σ_{i=1}^{N} [ Σ_{ν=0}^{M} h_i^f(ν) x(n-ν) ]^i,   (5.5)

which is shown in block diagram representation in Fig. 5.17. This representation shows several advantages, especially from the implementation point of view, because every subsystem can be realized by a one-dimensional impulse response or
the equivalent representations we have discussed in the previous chapters. At the output of each subsystem we have to perform the (·)^i operation on the corresponding output signal. The discussion so far can be applied to nonlinear systems with memory, which means that besides nonlinearities, linear filtering operations are also included. Further material on nonlinear audio systems can be found in [Kai87, RH96, Kli98, FUB+98].

Figure 5.17 Simulation of nonlinear systems by an N-order Volterra system model with factored kernels.

A simulation of a nonlinear system without memory, namely a static nonlinear curve, can be done by a Taylor series expansion given by

y(n) = f[x(n)] = Σ_{i=0}^{N} b_i x^i(n).   (5.6)

Static nonlinear curves can be applied directly to the input signal, where each input amplitude is mapped to an output amplitude according to the nonlinear function y = f[x(n)] (see Fig. 5.18). If one applies a squarer operation to an input signal of a given bandwidth, the output signal y(n) = x²(n) will double its bandwidth. As soon as the highest frequency after passing a nonlinear operation exceeds half the sampling frequency fs/2, aliasing will fold this frequency back to the base band. In some effect applications additional aliasing distortions might be helpful, especially for extreme distortions in metal music. This means that for digital signals we first have to perform over-sampling of the input signal before applying any nonlinear operation in order to avoid aliasing distortions. This over-sampling is shown in Fig. 5.18, where first up-sampling is performed and then an interpolation filter HI is used to suppress images up to the new sampling frequency Lfs. Then the nonlinear operation can be applied, followed by a band-limiting filter to fs/2 and down-sampling.

As can be noticed, the output spectrum only contains frequencies up to fs/2. Based on this fact, the entire nonlinear processing can be performed without over-sampling and down-sampling by the system shown in Fig.
5.19 [SZ99]. The input signal is split into several lowpass versions which are forwarded to individual nonlinearities. The output signal is a weighted combination of the individual output signals after passing a nonlinearity. With this approach the problem of aliasing is
avoided and an equivalent approximation to the over-sampling technique is achieved. A comparison with our previous discussion on factored Volterra kernels also shows a close connection. As a conclusion, for a static nonlinearity of order N (the order of the Taylor series) applied to an input signal, the input signal has to be filtered by a lowpass with cut-off frequency fs/(2N), otherwise aliasing distortions will occur.

Figure 5.18 Nonlinear processing by over-sampling techniques.

Figure 5.19 Nonlinear processing by band-limiting the input range [SZ99].

5.3.2 Valve Simulation

Valve or tube devices dominated signal processing during the first part of the last century and have experienced a revival in audio processing every decade since their introduction [Bar98, Ham73]. One of the most commonly used effects for electric guitars is the amplifier, especially the valve amplifier. The typical behavior
of the amplifier and the connected loudspeaker cabinet have demonstrated their influence on the sound of rock music over the past decades. Besides the two most important guitars, namely the Fender Stratocaster and the Gibson Les Paul, several valve amplifiers have helped in creating exciting sounds from these classic guitars.

Introduction

Vintage valve amplifiers. An introduction to valve amplifiers and their history can be found in [Fli93, Bar98], where several comments on sound characteristics are published. We will concentrate on the most important amplifier manufacturers and point out some characteristic features.

- Fender: The Fender series of guitar amplifiers goes back to the year 1946, when the first amplifiers were introduced. These were based on standard tube schematics supplied by the manufacturers of tubes. Over the years, modifications of the standard design approach were integrated in response to musicians' needs and proposals. The range of Fender amplifiers is still expanding, but reissues of the originals are also very popular with musicians. The sound of Fender amplifiers is the "classic tube sound". For more information¹ we refer to [Fli93].

- Vox: The sound of the VOX AC30/4 is best characterized by guitar player Brian May in [PD93], where he states "the quality at low levels is broad and crisp and unmistakably valve like, and as the volume is turned up it slides into a pleasant, creamy compression and distortion". There is always a ringing treble quality through all level settings of the amp. The real "soul of the amp" comes out if you play it at full volume and control your sound with the volume knob of your guitar. The heart of the sound characteristic of the VOX AC30/4 is claimed to be the use of EL84s, NFB, cathode biasing and a Class A configuration. The four small EL84s should sound more lively than the bigger EL34s.
The sound of the VOX AC30 can be found on recordings by Brian May, Status Quo, Tom Petty and Bryan Adams.

- Marshall: The Fender Bassman 5F6 was the basis for the Marshall JTM 45. The differences between both are discussed in [Doy93] and are claimed to be the output transformers, speakers, input valve and feedback circuit, although the main circuit diagrams are nearly identical. The sound of Marshall is characterized by an aggressive and "crunchy" tone with brilliant harmonics, as Eric Clapton says, "I was probably playing full volume to get that sound" [Doy93]. Typical representatives of the early Marshall sound are Jimi Hendrix, Eric Clapton, Jimmy Page and Ritchie Blackmore.

- Mesa-Boogie: Two cascaded pre-amp stages and a master volume for the power amp are the basis for Mesa-Boogie amps. An ambassador for the Mesa-Boogie sound is Carlos Santana.

¹ http://www.fender.com, http://www.ampwares.com
Signal Processing

The sound of a valve amplifier is based on a combination of several important factors. First of all, the main processing features of valves or tubes are important [Bar98, Ham73]. Then the amplifier circuit has its influence on the sound and, last but not least, the chassis and loudspeaker combination play an important role in sound shaping. We will discuss all three factors now.

Valve basics. Triode valves [Rat95, RCA59], consisting of three electrodes, namely the anode, cathode and gate, are considered as having a warm and soft sound characteristic. The main reason for this is the nonlinear transfer function for anode current versus input gate voltage of the triode, which is shown in Fig. 5.20. This nonlinear curve has a quadratic shape. An input signal represented by the gate voltage VG delivers an anode output current IA = f(VG) representing the output signal. The corresponding output spectrum shows a second harmonic as well as the input frequency. This second harmonic can be lowered in amplitude when the operating point of the nonlinear curve is shifted right and the input voltage is applied to the more linear region of the quadratic curve.

Figure 5.20 Triode: nonlinear characteristic curve IA = f(VG) and nonlinear effect on the input signal. The output spectrum consists of the fundamental input frequency and a second harmonic generated by the quadratic curve of the triode.

The dc component in
the output signal can be suppressed by a subsequent highpass filter. Note also the asymmetrical soft clipping of the negative halves of the output sinusoid, which is the result of the quadratic curve of the triode. Input stages of valve amplifiers make use of these triode valves. A design parameter is the operating point, which controls the amplitude of the second harmonic.

Pentode valves, which consist of five electrodes, are mainly used in the power amp stage [Rat95, RCA59]. They have a static characteristic like an S-curve, shown in Fig. 5.21, which shows the anode output current IA versus the input gate voltage VG. If the entire static characteristic curve is used, the output signal is compressed for higher input amplitudes, leading to a symmetrical soft clipping. The corresponding output spectrum shows the creation of odd order harmonics. For lower input amplitudes the static characteristic curve operates in a nearly linear region, which again shows the control of the nonlinear behavior by properly selecting the operating point. Several static characteristic curves can be chosen to achieve the S-like curve of pentodes.

Figure 5.21 Pentode: nonlinear characteristic curve IA = f(VG) and nonlinear effect on the input signal.

The technical parameters of valves have wide variation, which leads to a wide variation of sound features, although selected valves with small changes of parameters are of course available. All surrounding environmental parameters like humidity and temperature have their influence as well.

Figure 5.22 Main stages of a valve amplifier. Upper left plot shows the signal after the pre-amplifier, lower left plot the signal after the phase splitter, upper right plot the signal after the power amplifier and lower right plot the signal after the output transformer.

Valve amplifier circuits. Valve amplifier circuits are based on the block diagram in Fig. 5.22. Several measured signals from a VOX AC30 at different stages of the signal flow path are also displayed. This will give an indication of typical signal distortions in valve amplifiers. The main stages of a valve amplifier are given below:

- The input stage consists of a triode circuit providing the input matching, followed by a volume control for the next stages.

- The tone control circuitry is based on low/high frequency shelving filters.

- The phase inversion/splitting stage provides symmetrical power amp feeding. This phase splitter delivers the original input for the upper power amp and a phase inverted replica of the input for the lower power amp.

- The power amp stage performs individual amplification of the original and the phase inverted replica in a class A, class B or class C configuration (see Fig. 5.23). Class A is shown in the upper left plot, where the output signal is valid all the time. Class B performs amplification only for one half wave and class C only for a portion of one half wave. Class A and class AB (see lower part of Fig. 5.23) are the main configurations for guitar power amplifiers. For class AB operation the working point lies in between class A and class B. The signals after amplification in both parts of the power amplifier are shown.

- The output transformer, shown in Fig. 5.24, performs the subtraction of both waveforms delivered by the power amplifiers, which leads to the output signal IA1 - IA2 as shown in Fig. 5.23.
The nonlinear behavior of transformers is beyond the scope of this discussion.
Figure 5.23 Power amplifier operation (upper plots: left class A, middle class B, right class C; lower plot: class AB operation).

Figure 5.24 Power amplifier stage and output transformer.

- The chassis and the loudspeakers are arranged in several combinations ranging from 2x12 to 4x10 in closed or open cabinets. Simulations of these components can be done by impulse response measurements of the loudspeaker and cabinet combination.

As well as the discussed topics, the influence of the power supply with valve rectifier (pp. 51-54 in [vdL97]) is claimed to be of importance. A soft reduction of the power supply voltage occurs when, in high power operation, short transients need a high current. This power supply effect leads to a soft clipping of the audio signal.
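The phase splitter / push-pull / output transformer signal flow of Figs. 5.22-5.24 can be sketched numerically. This is an illustrative Python sketch; the quadratic "valve" curve and its bias value are our assumptions, not a measured tube model.

```python
import numpy as np

def valve(v, bias=0.1):
    """Toy valve curve (an assumption for illustration): quadratic around
    the operating point, conducting only for positive drive."""
    u = np.maximum(v + bias, 0.0)
    return u + 0.5 * u ** 2

def push_pull(x):
    """Signal flow of Figs. 5.22-5.24: the phase splitter feeds x and -x
    to the two power amp halves, and the output transformer forms
    the difference IA1 - IA2."""
    return valve(x) - valve(-x)
```

Because the transformer subtracts the two half-wave branches, the overall transfer curve is odd-symmetric, so the even harmonics contributed by each quadratic branch cancel - a standard property of push-pull output stages.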
Figure 5.25 VOX AC30/4 spectra at different stages: (a) input stage, (b) output phase splitter, (c) output power amp and (d) output of transformer.

The corresponding spectra of the signals for the VOX AC30/4 measurements are shown in Fig. 5.25. The distortion characteristic can be visualized by the waterfall representation of short-time FFTs for a chirp input signal in Fig. 5.26. The individual distortion components for the second up to the fifth harmonic are shown in Fig. 5.27 for each signal in the VOX AC30/4 amplifier.

Musical Applications

Musical applications of valve amplifiers can be found on nearly every recording featuring guitar tracks. Ambassadors of innovative guitar players from the blues to early rock period are B. B. King, Albert King and Chuck Berry, who mainly used valve amplifiers for their warm and soft sound. Representatives of the classic rock period are Jimi Hendrix [m-Hen67a, m-Hen67b, m-Hen68], Eric Clapton [m-Cla67], Jimmy Page [m-Pag69], Ritchie Blackmore [m-Bla70], Jeff Beck [m-Bec89] and Carlos Santana [m-San99]. All make extensive use of valve amplification and special guitar effect units. There are also players from the new classic period like Eddie van Halen, Stevie Ray Vaughan and Steve Morse, up to the new guitar heroes such as Steve Lukather, Joe Satriani, Gary Moore, Steve Vai and Paul Gilbert, who are using effect devices together with valve amplifiers. New guitar amplifier designs with digital preamplifiers are based on digital modeling technology², where a modeling of classic valve amplifiers is performed together with new valve-based algorithms for guitar and bass guitar processing.
Figure 5.26 Short-time FFTs (waterfall representation) of VOX AC30 with a chirp input signal. The amplifier is operating with full volume setting.
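The waterfall plot of Fig. 5.26 is built from magnitudes of successive short-time FFTs. A minimal Python sketch of that analysis (frame length, hop size and the Hann window are arbitrary choices for illustration):

```python
import numpy as np

def waterfall_stft(x, n_fft=256, hop=128):
    """Rows of short-time FFT magnitudes - the kind of waterfall
    representation used in Fig. 5.26 to watch distortion products
    evolve over time for a chirp input."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    return np.array([np.abs(np.fft.rfft(f)) for f in frames])
```

Each row is one spectral slice; stacking the rows over time shows how the harmonics created by the nonlinearity track the rising chirp frequency.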
135.
5.3 Nonlinear Processors 117Input Stage VOXAC30142-. - - - - _ _ - - -- -, - - - _ , _ - - :I- -19.8 2020.2 20.420.620.8Level indBm +Power Amp StageVOX AC30/4a”l25- - - d I -ll .ll .T8 20.S 15-e._ti 10-I5 -E.-1 .0 /25 30 35 40 4550Level indBm 425Phase Inversion Stage VOX AC30/4/0-1 0 0 10 20 30Levelin dBm +Output TransformerVOX AC30/4- j - ’ 1- - ,“5 10 15 20 2530Level indBm +Figure 5.27 VOX AC30 - Distortion components versus signal level at different stages.ers for the creation of their typical guitar sound. Guitar heroes like Jimi Hendrixhave made use of several small analog effect devices to achieve their unmistakablesound. Most of these effect devices have been used to create higher harmonics forthe guitar sound in a faster way and at a much lower sound level compared tovalve amplifiers. In this context terms like overdrive, distortion, fuzz and buzz areused. Several definitions of overdrive, distortion and fuzz for musical applicationsespecially in the guitar player world are a~ailable.~For our discussion we will de-fine overdrive as a first state where a nearly linear audio effect device at low inputlevels is driven by higher input levels into the nonlinear region of its characteristiccurve. The operating region is in the linear region as well as in the nonlinear regionwith a smooth transition. The main sound characteristic is of course from the non-linear part. Overdrive has a warm and smooth sound. The second state is termeddistortion, where the effects device mainly operates in the nonlinear region of thecharacteristiccurveand reaches the upper input level, where the output level isfixed to a maximum level. Distortion covers a wide tonal area startingbeyond tubewarmth to buzz saw effects. All metal and grungesounds fall into this category.The operating status of fuzz is represented by a completely nonlinear behavior ofthe effect device with a soundcharacterized by the guitar player terms “harder”and “harsher” than distortion. 
Fuzz also means not clear, distinct, or precise, and

³A Musical Distortion Primer by R.G. Keen on http://www.geofex.com
unpredictable. The fuzz effect is generally used on single-note lead lines.

Signal Processing

Symmetrical soft clipping. For overdrive simulations a symmetrical soft clipping of the input values has to be performed. One possible approach for a soft saturation nonlinearity [Sch80] is given by

        2x                    for 0 ≤ x ≤ 1/3,
  y =   (3 − (2 − 3x)²)/3     for 1/3 ≤ x ≤ 2/3,        (5.7)
        1                     for 2/3 ≤ x ≤ 1.

The static input-to-output relation is shown in Fig. 5.28. Up to the threshold of 1/3 the input is multiplied by two and the characteristic curve is in its linear region. Between input values of 1/3 and 2/3, the characteristic curve produces a soft compression described by the middle term of equation (5.7). Above input values of 2/3 the output value is set to one. The corresponding M-file 5.5 for overdrive with symmetrical soft clipping is shown next.

M-file 5.5 (symclip.m)
function y=symclip(x)
% y=symclip(x)
% "Overdrive" simulation with symmetrical clipping
% x - input
N=length(x);
th=1/3; % threshold for symmetrical soft clipping
for i=1:1:N,               % by Schetzen formula
  if abs(x(i))< th, y(i)=2*x(i); end;
  if abs(x(i))>=th,
    if x(i)> 0, y(i)=(3-(2-x(i)*3).^2)/3; end;
    if x(i)< 0, y(i)=-(3-(2-abs(x(i))*3).^2)/3; end;
  end;
  if abs(x(i))>2*th,
    if x(i)> 0, y(i)=1; end;
    if x(i)< 0, y(i)=-1; end;
  end;
end;

Figure 5.29 shows the waveforms of a simulation with the above described characteristic curve and a sinusoid of 1 kHz. In the upper left plot the output signal from sample 0 up to 250 is shown, which corresponds to the saturated part of the characteristic curve. The tops and bottoms of the sinusoid run with a soft curve towards the saturated maximum values. The upper right plot shows the output signal from sample 2000 up to 2250, where the maximum values are in the soft clipping region of the characteristic curve. Both the negative and the positive top of the sinusoid are
Figure 5.28 Static characteristic curve of symmetrical soft clipping (right part shows logarithmic output value versus input value).

Figure 5.29 Short-time FFTs (waterfall representation) of symmetrical soft clipping for a decaying sinusoid of 1 kHz.

rounded in their shape. The lower waterfall representation shows the entire decay of the sinusoid down to -12 dB. Notice the odd order harmonics produced by this nonlinear symmetrical characteristic curve, which appear in the nonlinear region of the
characteristic curve and disappear as soon as the lower threshold of the soft compression is reached. The prominent harmonics are the third and the fifth harmonic. The slow increase or decrease of higher harmonics is the major property of symmetrical soft clipping. As soon as simple hard clipping without a soft compression is performed, higher harmonics appear with significantly higher levels (see Fig. 5.30). The discussion of overdrive and distortion has so far only considered the creation of harmonics for a single sinusoid as the input signal. Since a single guitar tone itself consists of the fundamental frequency plus all odd and even order harmonics, a sum of sinusoids is always processed by the nonlinearity. The nonlinearity also produces sum and difference frequencies, which sound very disturbing. The control of these sum and difference frequencies goes beyond the current treatment of the subject.

Figure 5.30 Short-time FFTs (waterfall representation) of symmetrical hard clipping for a decaying sinusoid of 1 kHz.

Asymmetrical clipping. We have already discussed the behavior of triode valves in the previous section, which produce an asymmetrical overdrive. One famous representative of asymmetrical clipping is the Fuzz Face, which was used by Jimi Hendrix. The basic analog circuit is shown in Fig. 5.31⁴ and consists only of a few components with two transistors in a feedback arrangement. The output signals for various input levels are presented in Fig. 5.32 in conjunction with the corresponding spectra for

⁴The Technology of the Fuzz Face by R.G. Keen on http://www.geofex.com
Figure 5.31 Analog circuit of the Fuzz Face.

a 1 kHz sinusoid. The upper plots down to the lower plots show an increasing input level.

Figure 5.32 Signals and corresponding spectra of the Fuzz Face.

For low-level input signals the typical second harmonic of a triode valve can be noticed, although the time signal shows no distortion components. With increasing input level the second harmonic and all even order harmonics, as well as odd order harmonics, appear. The asymmetrical clipping produces enhanced even order harmonics, as shown in the third row of Fig. 5.32. Notice that only the tops of the positive maximum values are clipped. As soon as the input level increases further, the negative part of the waveform is clipped as well. The negative clipping level is lower than the positive clipping value, and so asymmetrical clipping is performed. When
both positive and negative clipping levels are exceeded, the odd order harmonics come up, but the even order harmonics are still present.

Short-time Fourier transforms (in waterfall representation) for an increasing 1 kHz sinusoid together with two waveforms are shown in Fig. 5.33.

Figure 5.33 Short-time FFTs (waterfall representation) of the Fuzz Face for an increasing 1 kHz sinusoid. The upper plots show segments of samples from the complete analysed signal.

A proposal for asymmetrical clipping [Ben97] used for tube simulation is given by

  f(x) = (x − Q)/(1 − e^(−dist·(x−Q))) + Q/(1 − e^(dist·Q)),   Q ≠ 0, x ≠ Q.   (5.8)

The underlying design parameters for the simulation of tube distortion are based on the mathematical model [Ben97], where no distortion should occur when the input level is low (the derivative of f(x) has to be f'(0) ≈ 1 and f(0) = 0). The static characteristic curve should perform clipping and limiting of large negative input values and be approximately linear for positive values. The result of equation (5.8) is shown in Fig. 5.34. The following M-file 5.6 performs equation (5.8) from [Ben97]. To remove the DC component and to shape higher harmonics, additional lowpass and highpass filtering of the output signal is performed.
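The design constraints just stated can be checked numerically. The sketch below is not from the book (whose examples are Matlab M-files); it is a plain-Python evaluation of the static curve of equation (5.8) with the values Q = -0.2 and dist = 8 used in Fig. 5.34:

```python
import math

def tube_curve(x, Q=-0.2, dist=8.0):
    """Static nonlinearity of equation (5.8); assumes Q != 0."""
    if x == Q:  # the formula is 0/0 at x = Q; use its limit value
        return 1.0/dist + Q/(1.0 - math.exp(dist*Q))
    return (x - Q)/(1.0 - math.exp(-dist*(x - Q))) \
           + Q/(1.0 - math.exp(dist*Q))

# f(0) = 0: the two terms cancel exactly at x = 0
zero = tube_curve(0.0)
# approximately linear (slope ~ 1) for positive inputs
slope = tube_curve(2.0) - tube_curve(1.0)
# large negative inputs are limited to Q/(1 - exp(dist*Q))
floor_value = tube_curve(-20.0)
```

With these parameters the negative limit is Q/(1 − e^(dist·Q)) ≈ −0.25, i.e., negative half-waves are clipped much earlier than positive ones, which is what produces the strong even-order harmonics seen in Fig. 5.35.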
Figure 5.34 Static characteristic curve of asymmetrical soft clipping for tube simulation with Q = -0.2 and dist = 8.

M-file 5.6 (tube.m)
function y=tube(x, gain, Q, dist, rh, rl, mix)
% y=tube(x, gain, Q, dist, rh, rl, mix)
% "Tube distortion" simulation, asymmetrical function
% x    - input
% gain - the amount of distortion, >0
% Q    - work point. Controls the linearity of the transfer
%        function for low input levels, more negative = more linear
% dist - controls the distortion's character, a higher number gives
%        a harder distortion, >0
% rh   - abs(rh)<1, but close to 1. Placement of poles in the HP
%        filter which removes the DC component
% rl   - 0<rl<1. The pole placement in the LP filter used to
%        simulate capacitances in a tube amplifier
% mix  - mix of original and distorted sound, 1=only distorted
q=x*gain/max(abs(x));                % normalization
if Q==0
  z=q./(1-exp(-dist*q));
  for i=1:length(q)                  % test because of the
    if q(i)==Q                       % transfer function's
      z(i)=1/dist;                   % 0/0 value in Q
    end;
  end;
else
  z=(q-Q)./(1-exp(-dist*(q-Q)))+Q/(1-exp(dist*Q));
  for i=1:length(q)                  % test because of the
    if q(i)==Q                       % transfer function's
      z(i)=1/dist+Q/(1-exp(dist*Q)); % 0/0 value in Q
    end;
  end;
end;
y=mix*z*max(abs(x))/max(abs(z))+(1-mix)*x;
y=y*max(abs(x))/max(abs(y));
y=filter([1 -2 1],[1 -2*rh rh^2],y); % HP filter
y=filter([1-rl],[1 -rl],y);          % LP filter

Short-time FFTs (waterfall representation) of this algorithm applied to a 1 kHz sinusoid are shown in Fig. 5.35. The waterfall representation shows strong even order harmonics and also odd order harmonics.

Figure 5.35 Short-time FFTs (waterfall representation) of asymmetrical soft clipping.

Distortion. A nonlinearity suitable for the simulation of distortion [Ben97] is given by

  f(x) = sgn(x) · (1 − e^(−|x|)).   (5.9)

The M-file 5.7 for performing equation (5.9) is shown next.

M-file 5.7 (fuzzexp.m)
function y=fuzzexp(x, gain, mix)
% y=fuzzexp(x, gain, mix)
% Distortion based on an exponential function
% x    - input
% gain - amount of distortion, >0
% mix  - mix of original and distorted sound, 1=only distorted
q=x*gain/max(abs(x));
z=sign(q).*(1-exp(-abs(q)));
y=mix*z*max(abs(x))/max(abs(z))+(1-mix)*x;
y=y*max(abs(x))/max(abs(y));

The static characteristic curve is illustrated in Fig. 5.36 and short-time FFTs of a decaying 1 kHz sinusoid are shown in Fig. 5.37.

Figure 5.36 Static characteristic curve of exponential distortion (left) and logarithmic output over input level (right).

Musical Applications

There are a lot of commercial stomp effects for guitarists on the marketplace. Some of the most interesting distortion devices for guitars are the Fuzz Face, which performs asymmetrical clipping towards symmetrical soft clipping, and the Tube Screamer,⁵ which performs symmetrical soft clipping.⁶ The Fuzz Face was used by Jimi Hendrix and the Tube Screamer by Stevie Ray Vaughan. They both offer classical distortion and are well known because of their famous users. It is impossible to explain the sound of a distortion unit without listening to it personally. The technical specifications for the sound of distortion are missing, so the only way to choose a distortion effect is by a comparative listening test.

⁵The Technology of the Tube Screamer by R.G. Keen on http://www.geofex.com
⁶GM Arts Homepage http://www.chariot.net.au/~gmarts/ampovdrv.htm
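A quick numerical experiment confirms the claim made for Fig. 5.29, namely that the symmetrical curve of equation (5.7) generates only odd-order harmonics. The sketch below is not from the book (its examples are Matlab M-files); it evaluates the clipper on a 500 Hz sinusoid and measures the first harmonics by plain correlation:

```python
import math

def symclip(v):
    """Symmetrical soft clipping of equation (5.7), extended as an
    odd function to negative inputs."""
    a = abs(v)
    if a < 1.0/3.0:
        return 2.0*v
    if a < 2.0/3.0:
        return math.copysign((3.0 - (2.0 - 3.0*a)**2)/3.0, v)
    return math.copysign(1.0, v)

def amplitude(y, fs, f):
    """Amplitude of the component at frequency f (plain correlation)."""
    n = len(y)
    c = sum(y[i]*math.cos(2.0*math.pi*f*i/fs) for i in range(n))
    s = sum(y[i]*math.sin(2.0*math.pi*f*i/fs) for i in range(n))
    return 2.0*math.hypot(c, s)/n

fs = 8000                                            # sampling rate
x = [0.9*math.sin(2.0*math.pi*500.0*i/fs) for i in range(fs)]
y = [symclip(v) for v in x]
harmonics = [amplitude(y, fs, 500.0*k) for k in range(1, 6)]
```

The even-numbered entries come out at numerical zero, while the third and fifth harmonics carry the distortion; an asymmetrical curve such as that of the Fuzz Face breaks this symmetry, and even-order harmonics appear.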
Figure 5.37 Short-time FFTs (waterfall representation) of exponential distortion.

5.3.4 Harmonic and Subharmonic Generation

Introduction

Harmonic and subharmonic generation are performed by simple analog or digital effect devices which should produce an octave above and/or an octave below a single note. Advanced techniques to achieve pitch shifting of instrument sounds will be introduced in Chapter 7. Here, we will focus on simple techniques which lead to the generation of harmonics and subharmonics.

Signal Processing

The signal processing algorithms for harmonic and subharmonic generation are based on simple mathematical operations like absolute value computation and counting of zero crossings, as shown in Fig. 5.38 for a sinusoid input signal (the first row shows the time signal and the corresponding spectrum). The second row of Fig. 5.38 demonstrates half-wave rectification, where positive values are kept and negative values are set to zero. This operation leads to the generation of even order harmonics. Full-wave rectification, where the absolute value is taken from the input sequence, also leads to even order harmonics, as shown in the third row of Fig. 5.38. Notice the absence of the fundamental frequency. If a zero crossing
Figure 5.38 Signals and corresponding spectra of half-wave rectification, full-wave rectification and octave division.

counter is applied to the half-wave or the full-wave rectified signal, a predefined number of positive wave parts can be set to zero to achieve the signal in the last row of Fig. 5.38. This signal has a fundamental frequency which is one octave lower than the input frequency in the first row of the figure, but it also shows harmonics of this new fundamental frequency. If appropriate lowpass filtering is applied to such a signal, only the fundamental frequency is retained, which is then added to the original input signal.

Musical Applications

Harmonic and subharmonic generation is mostly used on single-note lead lines, where an additional harmonic or subharmonic frequency helps to enhance the octave effect. Harmonic generators can be found in stomp boxes for guitar or bass guitar and appear under the name octaver. Subharmonic generation is often used for solo and bass instruments to give them an extra bass boost or simply a fuzz bass character.
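The zero-crossing scheme described in the Signal Processing part above can be sketched in a few lines. This is a hypothetical plain-Python illustration, not the book's implementation (the book gives none here): it half-wave rectifies a sinusoid and mutes every second positive lobe, so the output repeats at half the input frequency.

```python
import math

def octave_down(x):
    """Half-wave rectify and mute every second positive lobe
    (zero-crossing counting), halving the fundamental frequency."""
    y, gate, prev = [], False, 0.0
    for v in x:
        if prev <= 0.0 < v:   # positive-going zero crossing: new lobe
            gate = not gate
        y.append(max(v, 0.0) if gate else 0.0)
        prev = v
    return y

def amplitude(y, fs, f):
    """Amplitude of the component at frequency f (plain correlation)."""
    n = len(y)
    c = sum(y[i]*math.cos(2.0*math.pi*f*i/fs) for i in range(n))
    s = sum(y[i]*math.sin(2.0*math.pi*f*i/fs) for i in range(n))
    return 2.0*math.hypot(c, s)/n

fs = 8000
x = [math.sin(2.0*math.pi*100.0*i/fs) for i in range(fs)]  # 100 Hz
sub = octave_down(x)   # fundamental now at 50 Hz, one octave lower
```

As the text notes, the divided signal still contains harmonics of the new fundamental, so a lowpass filter would normally follow to isolate the subharmonic before mixing it with the input signal.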
5.3.5 Tape Saturation

Introduction and Musical Application

The special sound characteristic of analog tape recordings has been acknowledged by a variety of producers and musicians in the field of rock music. They prefer doing multi-track recordings with analog tape-based machines and use the special physics of magnetic tape recording as an analog effects processor for sound design. One reason for their preference for analog recording is the fact that magnetic tape goes into distortion gradually [Ear76] (pp. 216-218) and produces those kinds of harmonics which help special sound effects on drums, guitars and vocals.

Signal Processing

Tape saturation can be simulated by the techniques already introduced for valve simulation. A weighting curve derived from the input level is used for generating a gain factor which compresses the input signal. A variety of measurements of tape recordings can help in the design of such processing devices. An example of the input/output behavior is shown in Fig. 5.39, and a short-time FFT of a sinusoid input signal in Fig. 5.40 illustrates a tape saturation algorithm. For low-level inputs the transfer characteristic is linear without any distortion. A smooth soft compression simulates the gradually increasing distortion of magnetic tape.

Figure 5.39 Tape saturation: input and output signal (left) and static characteristic curve (right).

5.4 Exciters and Enhancers

5.4.1 Exciters

Introduction

An exciter is a signal processor that emphasizes or de-emphasizes certain frequencies in order to change a signal's timbre. An exciter increases brightness without
Figure 5.40 Tape saturation: short-time FFTs (waterfall representation) for a decaying sinusoid of 1 kHz.

necessarily adding equalization. The result is a brighter, "airier" sound without the stridency that can sometimes occur by simply boosting the treble. This is often accomplished with subtle amounts of high-frequency distortion, and sometimes by playing around with phase shifting. Usually there will only be one or two parameters, such as exciter mix and exciter frequency. The former determines how much "excited" sound gets added to the straight sound, and the latter determines the frequency at which the exciter effect starts [Whi93, And95, Dic87, WG94].

This effect was discovered by the Aphex company, and "Aural Exciter" is a trademark of this company. The medium and treble parts of the original signal are processed by a nonlinear circuit that generates higher overtones. These components are then mixed to some extent with the original signal. A compressor at the output of the nonlinear element makes the effect dependent on the input signal. The initial part of percussive sounds will be more enriched than the following part, when the compressor limits the effect depth. The enhanced imaging or spaciousness is probably the result of the phase rotation within the filter [Alt90].

Signal Processing

Measurement results of the APHEX Aural Exciter are shown in Fig. 5.41 and in Fig. 5.42, where the generation of a second harmonic is clearly visible. The input
signal is a chirp signal with a frequency increasing up to 5 kHz. Signal processing techniques to achieve the effect have already been discussed in the previous sections. The effect is created in the side chain path and is mixed with the input signal.

Figure 5.41 Block diagram of the psycho-acoustic equalizer APHEX Aural Exciter and frequency response.

Figure 5.42 Short-time FFTs (waterfall representation) of a psycho-acoustic equalizer.

Musical Applications

The applications of this effect are widespread and range from single instrument enhancement to enhancement of mix buses and stereo signals. The effect increases
5.4 Exciters and Enhancers 131the presence and clarity of a single instrument inside a mix and helps to add nat-ural brightness to stereo signals. Applied to vocals and speech the effect increasesintelligibility. Compared to equalizers the sound level is only increased slightly. Theapplication of this effect only makes sense if the input signal lacks high frequencycontents.5.4.2 EnhancersIntroductionEnhancers aresignal processorswhich combine equalization together with nonlinearprocessing. Theyperformequalizationaccording to the fundamentals of psycho-acoustics [ZF90] andintroduce asmallamount of distortion in a just noticeablemanner. An introduction to the ear’s own nonlinear distortions, sharpness, sensorypleasantness and roughnesscan be alsobefoundin [ZF90]. A lot of recordingengineers and musicians listen under slightly distorted conditions during recordingand mixing sessions.Signal ProcessingAs an example of this class of devices the block diagram and thefrequency responseof the SPL vitalizer are shown in Fig. 5.43. This effect processor has also a sidechain path which performsequalizationwitha strong bassenhancement,a mid-frequency cut anda high-frequency boost. The short-timeFFT of the outputsignalwhen a chirp input signal is applied is shown in Fig. 5.44. The resulting waterfallrepresentation clearly shows higher harmonics generated by this effect processor.TProcess X+Harmonics(Mix)enter Freq.)Figure 5.43 Block diagram of the psycho-acoustic equalizerSPL Vitalizer and frequencyresponse.Further refinements of enhancers can be achieved through multiband enhancerswhich split the inputsignal into several frequency bands. Inside eachfrequency bandnonlinear processing plus filteringis performed. The outputsignals of each frequencyband are weighted and summed up to form the output signal (see Fig. 
5.45).

Musical Applications

The main applications of such effects are single-track processing, as a substitute for the equalizers inside the input channels of mixing consoles, and processing of final
Figure 5.44 Short-time FFTs (waterfall representation) of the psycho-acoustic equalizer SPL Vitalizer.

Figure 5.45 Multiband enhancer with nonlinear processing in frequency bands.

mixes. The side chain processing allows the subtle mix of the effect signal together with the input signal. Further applications are stereo enhancement for broadcast stations and sound reinforcement.

5.5 Conclusion

The most challenging tools for musicians and sound engineers are nonlinear processors such as dynamics processors, valve simulators and exciters. The successful application of these audio processors depends on the appropriate control of these devices. A variety of interactive control parameters influences the resulting sound
quality. The primary purpose of this chapter is to enable the reader to attain a fundamental understanding of different types of nonlinear processors and the special properties of nonlinear operations applied to the audio signal. Dynamics processors need a careful consideration of the interaction of thresholds and time-constants to achieve sonic purity and avoid aliasing distortion. On the other hand, nonlinear processors are used for the simulation of valve amplifiers or nonlinear audio systems, where a special kind of nonlinearity provides a sound distortion with an accepted sound characteristic. We have presented the basics of nonlinear modeling and focused on the combination of filters and nonlinearities. Several applications demonstrate the importance of nonlinear processors. The basic building blocks of the previous chapters, such as filters, delays and modulators/demodulators, are particularly useful for exploring new improved nonlinear processors.

Sound and Music

[m-Bec89] Jeff Beck: Guitar Shop. 1989.
[m-Bla70] Deep Purple: Deep Purple in Rock. 1970.
[m-Cla67] Cream: Disraeli Gears. 1967.
[m-Hen67a] Jimi Hendrix: Are You Experienced? 1967.
[m-Hen67b] Jimi Hendrix: Bold As Love. 1967.
[m-Hen68] Jimi Hendrix: Electric Ladyland. 1968.
[m-Pag69] Led Zeppelin: I/II/III/IV. 1969-1971.
[m-San99] Santana: Supernatural. 1999.

Bibliography

[Alt90] S.R. Alten. Audio in Media. Wadsworth, 1990.
[And95] C. Anderton. Multieffects for Musicians. Amsco Publications, 1995.
[Arf79] D. Arfib. Digital synthesis of complex spectra by means of multiplication of nonlinear distorted sine waves. J. Audio Eng. Soc., 27:757-768, October 1979.
[Bar98] E. Barbour. The cool sound of tubes. IEEE Spectrum, pp. 24-32, August 1998.
[Ben97] R. Bendiksen. Digitale Lydeffekter. Master's thesis, Norwegian University of Science and Technology, 1997.
[Bru79] M. Le Brun. Digital waveshaping synthesis. J. Audio Eng. Soc., 27:250-265, April 1979.
[De 84] G. De Poli. Sound synthesis by fractional waveshaping. J. Audio Eng. Soc., 32(11):849-861, November 1984.
[Dic87] M. Dickreiter. Handbuch der Tonstudiotechnik, Band I und II. K.G. Saur, 1987.
[Doy93] M. Doyle. The History of Marshall. Hal Leonard Publishing Corporation, 1993.
[Ear76] J. Eargle. Sound Recording. Van Nostrand, 1976.
[Fli93] R. Fliegler. AMPS! The Other Half of Rock'n'Roll. Hal Leonard Publishing Corporation, 1993.
[Fra97] W. Frank. Aufwandsarme Modellierung und Kompensation nichtlinearer Systeme auf Basis von Volterra-Reihen. PhD thesis, University of the Federal Armed Forces Munich, Germany, 1997.
[FUB+98] A. Farina, E. Ugolotti, A. Bellini, G. Cibelli, and C. Morandi. Inverse numerical filters for linearisation of loudspeaker responses. In Proc. DAFX-98 Digital Audio Effects Workshop, pp. 12-16, Barcelona, November 1998.
[Ham73] R.O. Hamm. Tubes versus transistors - is there an audible difference? J. Audio Eng. Soc., 21(4):267-273, May 1973.
[Kai87] A.J.M. Kaizer. Modelling of the nonlinear response of an electrodynamic loudspeaker by a Volterra series expansion. J. Audio Eng. Soc., 35(6):421-433, 1987.
[Kli98] W. Klippel. Direct feedback linearization of nonlinear loudspeaker systems. J. Audio Eng. Soc., 46(6):499-507, 1998.
[McN84] G.W. McNally. Dynamic range control of digital audio signals. J. Audio Eng. Soc., 32(5):316-327, May 1984.
[Nie00] S. Nielsen. Personal communication. TC Electronic A/S, 2000.
[Orf96] S.J. Orfanidis. Introduction to Signal Processing. Prentice-Hall, 1996.
[PD93] D. Peterson and D. Denney. The VOX Story. The Bold Strummer Ltd., 1993.
[Rat95] L. Ratheiser. Das große Röhrenhandbuch. Reprint. Franzis-Verlag, 1995.
[RCA59] RCA. Receiving Tube Manual. Technical Series RC-19. Radio Corporation of America, 1959.
[RH96] M.J. Reed and M.O. Hawksford. Practical modeling of nonlinear audio systems using the Volterra series. In Proc. 100th AES Convention, Preprint 4264, 1996.
[RZ95] B. Redmer and U. Zölzer. Parameters for a Dynamic Range Controller. Technical report, Hamburg University of Technology, 1995.
[Sch80] M. Schetzen. The Volterra and Wiener Theories of Nonlinear Systems. Robert Krieger Publishing, 1980.
[SZ99] J. Schattschneider and U. Zölzer. Discrete-time models for nonlinear audio systems. In Proc. DAFX-99 Digital Audio Effects Workshop, pp. 45-48, Trondheim, December 1999.
[vdL97] R. von der Linde. Röhrenverstärker. Elektor-Verlag, 1997.
[WG94] M. Warstat and T. Görne. Studiotechnik - Hintergrund und Praxiswissen. Elektor-Verlag, 1994.
[Whi93] P. White. L'enregistrement créatif, Effets et processeurs, Tomes 1 et 2. Les cahiers de l'ACME, 1993.
[ZF90] E. Zwicker and H. Fastl. Psychoacoustics. Springer-Verlag, 1990.
[Zol97] U. Zölzer. Digital Audio Signal Processing. John Wiley & Sons, Ltd, 1997.
Chapter 6

Spatial Effects

D. Rocchesso

6.1 Introduction

The human peripheral hearing system modifies the sound material that is transmitted to the higher levels in the brain, and these modifications are dependent on the incoming direction of the acoustic waves. From the modified sound signals several features are collected in a set of spatial cues, used by the brain to infer the most likely position of the sound source. Understanding the cues used by the hearing system helps the audio engineer to introduce some artificial features in the sound material in order to project the sound events in space. In the first half of this chapter, the most important techniques for sound projection are described, both for individual listeners using headphones and for an audience listening through a set of loudspeakers.

In natural listening conditions, sounds propagate from a source to a listener, and during this trip they are widely modified by the environment. Therefore, there are some spatial effects imposed by the physical and geometric characteristics of the environment on the sound signals arriving at the listener's ears. Generally speaking, we refer to the kind of processing operated by the environment as reverberation. The second half of this chapter illustrates these kinds of effects and describes audio processing techniques that have been devised to imitate and extend the reverberation that occurs in nature.

The last section of the chapter describes a third category of spatial effects that is inspired by the natural spatial processing of sounds. In this miscellaneous category we list all the filtering effects that are designed to change the apparent source width, the spaciousness of a sound field, and the directivity of a loudspeaker set. Moreover, the geometric and physical characteristics of reverberating enclosures are used as design features for generalized resonators to be used as part of sound synthesis algorithms.
The importance of space has been largely emphasized in electro-acoustic compositions, but sophisticated spatial orchestrations often result in poor musical messages to the listener. Indeed, space cannot be treated as a composition parameter in the same way as pitch or timbre are orchestrated, just because space for sounds is not an "indispensable attribute" [KV01] as it is for images. This relative weakness is well explained if we think of two loudspeakers playing the same identical sound track: the listener will perceive one apparent source. The phenomenon is analogous to two colored spotlights that fuse to give one new, apparent, colored spot. In fact, color is considered a non-indispensable attribute for visual perception. However, just as color is a very important component in visual arts, the correct use of space can play a fundamental role in music composition, especially in improving the effectiveness of other musical parameters of sound, such as pitch, timbre, and intensity.

6.2 Basic Effects

6.2.1 Panorama

Introduction

Using a multichannel sound reproduction system we can change the apparent position of a virtual sound source just by feeding the channels with the same signal and adjusting the relative amplitudes of the channels. This task is usually accomplished, as part of the mixdown process, by the sound engineer for each sound source, thus composing a panorama of acoustic events in the space spanned by the loudspeakers.

Acoustic and Perceptual Foundations

For reproduction via multiple loudspeakers, it is important to take some specific aspects of localization into account. In fact, different and sometimes contradictory cues come into play when multiple sources radiate coherent or partially coherent signals [Bla83]. Particularly important is the case of a listener hearing signals arriving from the sources at only slightly different levels and times. In this case the sources will cooperate to provide a single sound event located at a place different from the source locations.
For larger differences in the incoming signals the virtual sound images tend to collapse onto one of the real sources. The precedence effect (see section 6.2.2) is largely responsible for this phenomenon.

Figure 6.2 shows the kind of curves that it is possible to draw from experiments with a standard stereo layout (i.e. central listener and an angle of 60° with the loudspeakers, as in Fig. 6.1) and with broadband impulses as test signals [Bla83]. Figure 6.2.a shows the perceived displacement for a given level difference. Figure 6.2.b shows the perceived displacement for a given time difference. The curve for level difference is well approximated by the Blumlein law [Bla83]

  sin θ / sin θl = (gL − gR) / (gL + gR),   (6.1)
Figure 6.1 Stereo panning. θ is the angle of the apparent source position.

or by the tangent law

  tan θ / tan θl = (gL − gR) / (gL + gR),   (6.2)

where gL and gR are the gains to be applied to the left and right stereo channels, θ is the angle of the virtual source position, and θl is the angle formed by each loudspeaker with the frontal direction. In [Pul97] it is shown that the tangent law results from a vector formulation of amplitude panning (see section 6.4.3). This formulation, as well as the curves of Fig. 6.2, is mainly valid for broadband signals or at low frequencies (below 500-600 Hz). For narrowband signals at higher frequencies the curves are quite different, and the curve for time differences can even be nonmonotonic [Bla83].

Figure 6.2 Perceived azimuth of a virtual sound source when a standard stereo layout is driven with signals that only differ in level (a) or in time delay (b).
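The tangent law can be turned directly into a pair of channel gains. The following Python sketch is not from the book (the chapter's own examples are Matlab M-files): it solves equation (6.2) for gL and gR under the additional constraint gL² + gR² = 1, which keeps the loudness of the virtual source constant as it moves.

```python
import math

def pan_gains(theta_deg, theta_l_deg=45.0):
    """Left/right gains satisfying the tangent law (6.2), normalized
    so that gL^2 + gR^2 = 1 (constant perceived loudness)."""
    t = math.tan(math.radians(theta_deg)) / math.tan(math.radians(theta_l_deg))
    n = math.sqrt(2.0*(1.0 + t*t))
    return (1.0 + t)/n, (1.0 - t)/n
```

At θ = 0 both gains equal √2/2; at θ = ±θl the signal is steered entirely to one loudspeaker. A linear cross-fade (gL = 1 − a, gR = a) would instead dip to gL² + gR² = 0.5 at the center, producing the "hole in the middle" of the stereo front.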
140 6 Spatial EgectsSignal ProcessingIn a standard stereo loudspeaker set-up it is assumed that the listener stands incentral position and forms an angle 201 with the twoloudspeakers (see Fig. 6.1).Two gains gL and gR should be applied to the left and right channel, respectively,in order to set the apparent azimuthat the desired value 8. A unit-magnitude two-channel signal, corresponding to the central apparent source position (0 = 0), canbe represented by the column vectorJzU = [ +]’so that thegains to be appliedto thetwo channelsin order to steer thesound sourceto the desired azimuth are obtained by the matrix-vector multiplication:[ i: ] = Aeu.The matrix A0 is a rotation matrix. If 01 = 45” the rotation matrix takes the formAe= [ cos0 sin0- sin0 cos0 1 ’so that when 0 = onlyone of the twochannels is non zero. It is easily verifiedthattherotation by matrix (6.5) corresponds to applyingthetangent law (6.2)tothe configurationwith 91 = 45”. Amplitudepanning by means of a rotationmatrix preserves the loudness of the virtual sound sourcewhile moving its apparentazimuth. In contrast, linearcross-fading betweenthe two channels does not preservethe loudness and determines a “hole in the middle” of the stereo front.The matrix-based panning of equation (6.4) is suited to direct implementationin Matlab. TheM-file 6.1 implements the amplitude panningbetween an initial anda final angle.M-file 6.1 (matpan.m)initial-angle = -40; %in degreesfinal-angle = 40; %indegreessegments = 32;angle-increment = (initial-angle - final-angle)/segments * pi / 180;lenseg = floor(length(monosoud)/segments) - 1;pointer = l;angle = initial-angle * pi / 180; %in radians% in radiansfor i=l:segmentsA =[cos(angle) , sin(ang1e) ; -sin(angle), cos(ang1e)l;x = [monosound(pointer:pointer+lenseg);monosound(pointer:pointer+lenseg)];
  y = [y, A * x];
  angle = angle + angle_increment;
  pointer = pointer + lenseg;
end

The monophonic input sound (stored in array monosound) is segmented in a number segments of blocks, and each block is rotated by means of a matrix-by-vector multiplication.

In practice, the steering angle θ does not necessarily correspond to the perceived localization azimuth. The perceived location is influenced by the frequency content of the sound. Some theories of "directional psychoacoustics" have been developed in the past in order to drive the rotation matrices with the appropriate coefficients [Ger92a]. Accurate implementations use frequency-dependent rotation matrices, at least discriminating between low (less than about 500 Hz) and high (between 500 Hz and 3500 Hz) frequency components [PBJ98]. We will discuss directional psychoacoustics in further detail in section 6.4.4, in the context of Ambisonics surround sound.

Music Applications and Control

Stereo panning has been applied to recordings of almost every music genre. In some cases, the panoramic knob of domestic hi-fi systems has been exploited as an artistic resource. Namely, the composition HPSCHD by John Cage and Lejaren Hiller appears in the record [m-Cag69] accompanied by a sheet (unique for each record copy) containing a computer-generated score for the listener. In the score, panoramic settings are listed at intervals of 5 seconds together with values for the treble and bass knobs.

6.2.2 Precedence Effect

Introduction

In a stereo loudspeaker set-up, if we step to one side of the central position and listen to a monophonic music program, we locate the apparent sound source in the same position as our closest loudspeaker, and the apparent position does not move even if the other channel is significantly louder.
The fact that the apparent source collapses into one of the loudspeakers is due to the precedence effect, a well-known perceptual phenomenon that can be exploited in certain musical situations.

Acoustic and Perceptual Foundations

Our hearing system is very sensitive to the direction of the first incoming wavefront, so that conflicting cues coming after the first acoustic impact with the source signal are likely to be ignored. It seems that spatial processing is inhibited for a few tens of milliseconds after the occurrence of a well-localized acoustic event [Gri97]. This is the precedence effect [Bla83]. Clearly, the first wavefront is related to a transient
142 6 SpatialEflectsin sound production, and transients are wideband signals excitinga wide frequencyrange where the directional properties of the outer ear (see section 6.3.4) can beused to give a hint of incoming direction.Signal ProcessingIn a standard stereo loudspeaker set-up, with the listener in central position as inFig. 6.1, a monophonic source materialthat feeds both loudspeakers with the sameloudness can be moved towards oneof the loudspeakers justby inserting some delayin the other channel. Figure 6.2.bshows the qualitative dependencyof the apparentazimuth on the relativedelay between channels. The actual curve strongly dependson the kind of sounds that areplayed [Bla83].A simple circuit allowing a joint control of panorama and precedence effectisshown in Fig. 6.3. The variable-length delay lines should allow delays up to lms.Figure 6.3 Control of panorama and precedence effect.Music Applications and ControlIn a live electro-acousticmusicperformance, it is oftendesirable to keepall theapparent sound sources in the front, so that they merge nicely, both acousticallyand visually, with the soundscoming from acoustic instruments on stage.However,if only the front loudspeakers are playing it mightbe difficult to provideall theaudience with a well-balanced sound, even at the seats in the rear.A good solution,called transparent amplification by Alvise Vidolin [RV96], is to feed the rear loud-speaker with a delayed copy of the front signals. The precedence effect will ensurethat the apparent location is on stage as long as the delay of the rear signals is atleast as long as the time taken by sounds to go from the stage to the rear loud-speakers. Given the length of the room L and the speed of sound c M 340m/s, thedelay in seconds should be aboutLT o = - . (6.6)c
6.2.3 Distance and Space Rendering

Introduction

In digital audio effects, the control of apparent distance can be effectively introduced even in monophonic audio systems. In fact, the impression of distance of a sound source is largely controllable by insertion of artificial wall reflections or reverberant room responses.

Acoustic and Perceptual Foundations

There are no reliable cues for distance in anechoic or open space. Familiarity with the sound source can provide distance cues related to the air absorption of high frequencies. For instance, familiarity with a musical instrument tells us what the average intensity of its sounds is when coming from a certain distance. The fact that the timbral qualities of the instrument change when playing loudly or softly is also a cue that helps with the identification of distance. These cues seem to vanish when using unfamiliar sources or synthetic stimuli that do not resemble any physical sounding object. Conversely, in an enclosure the ratio of reverberant to direct acoustic energy has proven to be a robust distance cue [Bla83]. It is often assumed that in a small space the amplitude of the reverberant signal changes little with distance, and that in a large space it is roughly proportional to 1/√Distance [Cho71]. The direct sound attenuates as 1/Distance if spherical waves are propagated. Special, frequency-dependent cues are used when the source is very close to the head. These cues are briefly described in section 6.3.2.

Signal Processing

A single reflection from a wall can be enough to provide some distance cues in many cases. The physical situation is illustrated in Fig. 6.4a, together with a signal processing circuit that reproduces it. A single delay line with two taps is enough to reproduce this basic effect.
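Such a two-tap delay line can be sketched as follows (a Python illustration, separate from the book's Matlab M-files; the geometry assumes the listener at the origin with the wall behind the source, the 1/(1 + d) gains avoid divergence at zero distance, and wall_gain is a flat, hypothetical stand-in for the wall reflection filter):

```python
def wall_reflection(x, dist_src, dist_wall, fs=44100, c=340.0, wall_gain=0.9):
    """Two-tap delay line: direct sound plus one wall reflection.
    Listener at the origin, source at dist_src metres, wall behind the
    source at dist_wall metres, so the reflected path is
    2*dist_wall - dist_src metres long.  The 1/(1 + d) gains avoid the
    1/d divergence of a point source at d = 0; wall_gain is a flat,
    hypothetical stand-in for the wall filter."""
    d_refl = 2 * dist_wall - dist_src              # reflected-path length
    n1 = round(dist_src / c * fs)                  # direct-path delay (samples)
    n2 = round(d_refl / c * fs)                    # reflected-path delay
    y = [0.0] * (len(x) + n2)
    for i, s in enumerate(x):
        y[n1 + i] += s / (1 + dist_src)            # direct tap
        y[n2 + i] += wall_gain * s / (1 + d_refl)  # reflected tap
    return y

out = wall_reflection([1.0, 0.5, 0.25], 1.0, 5.0)
```

Moving dist_src toward dist_wall makes the two taps converge in time and in level, reproducing the faster attenuation of the direct sound relative to the reflection.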
Moreover, if the virtual sound source is close enough to the listening point, the first tap can be taken directly from the source, thus

Figure 6.4 Distance rendering via single wall reflection: (a) physical situation, (b) signal processing scheme.
reducing the signal processing circuitry to a simple non-recursive comb filter. To be physically consistent, the direct sound and its reflection should be attenuated as much as the distance they travel, and the wall reflection should also introduce some additional attenuation and filtering in the reflected sound, represented by the filter Hw in Fig. 6.4b. The distance attenuation coefficients of Fig. 6.4b have been set in such a way that they become one when the distance goes to zero, just to avoid the divergence to infinity that would come from the physical laws of a point source. From this simple situation it is easy to see how the direct sound attenuates faster than the reflected sound as long as the source approaches the wall. This idea can be generalized to closed environments by adding a full reverberant tail to the direct sound. An artificial yet realistic reverberant tail can be obtained just by taking an exponentially decayed Gaussian noise and convolving it with the direct sound. The reverberant tail should be added to the direct sound after some delay (proportional to the size of the room) and should be attenuated with distance to a lesser extent than the direct sound. Figure 6.5 shows the signal processing scheme for distance rendering via room reverberation.

Figure 6.5 Distance rendering via room reverberation: (a) physical situation, (b) signal processing scheme.

The following M-file 6.2 allows experimentation with the situations depicted in Fig.
6.4 and 6.5, with different listener positions, provided that x is initialized with the input sound, that y, z, and w are long-enough vectors initialized to zero, and that Fs, c, distwall, and lenh hold the sampling rate, the speed of sound, the wall distance, and the reverb length. (In this single-reflection situation, the intensity of the reflected sound indeed increases as the source approaches the wall.)

M-file 6.2 (distspace.m)

h = filter([0.5, 0.5], 1, ...
    random('norm',0,1,1,lenh).*exp(-[1:lenh]*0.01/distwall)/100);
    % reverb impulse response
offset = 100;
st = Fs/2;
for i = 1:1:distwall-1   % several distances listener-source
  del1 = floor(i/c*Fs);
  del2 = floor((distwall*2 - i)/c*Fs);
  y(i*st+1:i*st+del1) = zeros(1,del1);
  y(i*st+del1+1:i*st+del1+length(x)) = x./(1+i);        % direct signal
  w(i*st+del2+1:i*st+del2+length(x)) = ...
      y(i*st+del2+1:i*st+del2+length(x)) + ...
      x./(1+(2*distwall-i));                            % direct signal + echo
  z(i*st+del2+1:i*st+del2+length(x)+lenh-1+offset) = ...
      y(i*st+del2+1:i*st+del2+length(x)+lenh-1+offset) + ...
      [zeros(1,offset), conv(x,h)]./sqrt(1+i);
      % direct signal + delayed reverb
end

Music Applications and Control

An outstanding example of musical control of distance is found in the piece "Turenas" by John Chowning [m-Cho72], where the reverberant sound amplitude is decreased with the reciprocal of the square root of distance, much slower than the decrease of direct sound. In the composition, distance cues are orchestrated with direction (as in section 6.2.1) and movement (as in section 6.2.4) cues.

Panorama, precedence effect, and distance rendering taken as a whole contribute to the definition of a sonic perspective [Cho99]. The control of such a sonic perspective helps the task of segregation of individual sound sources and the integration of music and architecture. A noteworthy example of fusion between architecture and music is found in the piece "Prometeo" by Luigi Nono [m-Non82]. In the original performance the players were located in different positions of an acoustic shell designed by architect Renzo Piano, and sounds from live-electronic processing were subject to spatial dislocation in a congruent sonic architecture.

Another important piece that makes extensive use of space rendering is "Répons" by Pierre Boulez [m-Bou84], which is performed with an unusual arrangement of soloists, choir, and audience. The first spatial element exploited in the composition is the physical distance between each soloist and the choir. Moreover, the sounds of the soloists are analyzed, transformed, and spatialized by means of a computer and a set of six loudspeakers.
For the CD, sounds have been processed using the software "Spatialisateur" [Jot99] in order to place the orchestra in front of the listener and to enlarge the audio image of the soloists. The Spatialisateur, implemented at IRCAM in Paris, makes use of panorama and distance rendering via room reverberation, using sophisticated techniques such as those described in section 6.5.

6.2.4 Doppler Effect

Introduction

Movements of the sound sources are detected as changes in direction and distance cues. The Doppler effect is a further (strong) cue that intervenes whenever there is a radial component of motion between the sound source and the listener. In a closed
environment, radial components of motion are likely to show up via reflections from the walls. Namely, even if a sound source is moving at constant distance from the listener, the paths taken by the sound waves via wall reflections are likely to change in length. If the source motion is sufficiently fast, in all of these cases we have transpositions in frequency of the source sound.

Acoustic and Perceptual Foundations

The principle of the Doppler effect is illustrated in Fig. 6.6, where the listener is moving toward the sound source with speed cl. If the listener meets fs wave crests per second at rest, he ends up meeting crests at the higher rate

fd = fs (1 + cl/c)                                               (6.7)

when he is moving. Here c is the speed of sound in air. We usually appreciate the pitch shift due to the Doppler effect in non-musical situations, such as when an ambulance or a train is passing by. The perceived cue is so strong that it can evoke the relative motion between source and listener even when other cues indicate a constant relative distance between the two. In fact, ambulance or insect sounds having a strong Doppler effect are often used to demonstrate how good a spatialization system is, thus deceiving the listener, who does not think that much of the spatial effect is already present in the monophonic recording. Recent research in psychoacoustics has also shown how the perception of pitch can be strongly affected by dynamic changes in intensity, as are found in situations where the Doppler effect occurs [Neu98]. Namely, a sound source approaching the listener at constant speed determines a rapid increase in intensity when it traverses the neighborhood of the listener. On the other hand, while the frequency shift is constant and positive before passing the listener, and constant and negative after it has passed, most listeners perceive an increase in pitch shift as the source is approaching.
Such apparent pitch increase is due to the simultaneous increase in loudness.

Figure 6.6 Illustration of the Doppler effect.
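The crest-counting argument, fd = fs (1 + v/c) for a radial speed v toward the source, is easy to evaluate numerically (a minimal Python sketch; the function name is ours):

```python
def doppler_frequency(f_source, v, c=340.0):
    """Frequency perceived by a listener whose radial speed toward the
    source is v (m/s): f_d = f_s * (1 + v/c).  A negative v (receding)
    lowers the perceived pitch below f_source."""
    return f_source * (1.0 + v / c)

f_toward = doppler_frequency(440.0, 17.0)   # approaching: pitch raised
f_away = doppler_frequency(440.0, -17.0)    # receding: pitch lowered
```

At 17 m/s (about 61 km/h) the shift is 5% of the source frequency, i.e. almost a semitone: large enough to destroy harmonic relations, as noted in the music applications below.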
Signal Processing

The Doppler effect can be faithfully reproduced by a pitch shifter (see Chapter 7) controlled by the relative velocity between source and listener. In particular, the circuit of Fig. 7.16 can be used with sawtooth control signals whose slope increases with the relative speed. Figure 6.7 shows the signal to be used to control one of the delays of Fig. 7.16 for a sound source that approaches the listening point and passes it. Before the source reaches the listener, the sound is raised in pitch, and it is lowered right after.

Figure 6.7 Control signal for simulating the Doppler effect with a delay-based pitch shifter.

Any sound processing model based on the simulation of wave propagation, such as the model described in section 6.4.6, implements an implicit simulation of the Doppler effect. In fact, these models are based on delay lines that change their length according to the relative position of source and listener, thus providing positive or negative pitch transpositions. In general, the accuracy and naturalness of a Doppler shift reproduced by digital means depend on the accuracy of interpolation in variable-length delays. If this is not good enough, modulation products affect the transposed signal, producing remarkable artifacts. The enhanced techniques described in sections 3.2.4 and 7.4 can be applied for various degrees of fidelity.

Music Applications and Control

The motion of source or listener in musical contexts has to be performed with care, since uncontrolled frequency shifts completely destroy any harmonic relations between the notes of a melody. However, there are notable examples of electro-acoustic music works incorporating Doppler effects [m-Cho72, m-Pis95].
In most cases, the sounds that are subjected to Doppler shifting are various kinds of noises or percussive patterns, so that they do not impose any special harmonic gliding character on the piece.

6.2.5 Sound Trajectories

The first attempt to design spatial movements as an independent dimension of music composition is probably Stockhausen's "Gesang der Jünglinge" [m-Sto56], where directions and movements among five groups of loudspeakers, arranged in a circle around the audience, were precisely programmed by the author. This piece is also probably the origin of the "spatial utopia" of a large part of contemporary music,
since Stockhausen gave abundant and clear explanations of his aesthetics, where velocities and spatial displacements are considered to be as important as pitches and note durations. This suggestive idea seems to ignore some perceptual evidence and the difficulties related to conveying consistent spatial information to an audience. Nevertheless, sound trajectories in [m-Sto56] are certainly essential to the dramatic development of the piece.

Since the work of Chowning [Cho71, m-Cho72], many computer systems have been built in order to help the musician to compose with space. Most of these systems are based on a software interface that allows the musician to display and/or define trajectories of virtual sound sources in the listening space, where actual loudspeakers are positioned. In fact, the manipulation of sound trajectories belongs to a control layer built on top of the signal processing level that implements panoramic or distance effects. Parameters such as angular position, distance from loudspeakers, etc., are taken directly from the curves at a rate that is high enough to ensure smooth movements without noticeable artifacts. In this respect, if only panoramic and distance effects are affected by trajectories, the control rate can be very low (e.g., 20 Hz), thus allowing separation of the control layer from the signal processing layer via a low-bandwidth communication channel. Vice versa, if the Doppler effect is somehow taken into account, the control rate must be much higher, because the ear is sensitive to minuscule variations of sound pitch. Alternatively, control signals can still be transmitted at a low rate if at the signal processing end there is an interpolation mechanism that reconstructs the intermediate values of control parameters.

Sometimes spatial trajectories are taken as a metaphor for guiding transformations of sonic materials. For instance, Truax [m-Tru85] uses multiple epicycles to control transformations of continuous sounds and to project them into space.
In this example, the consistency between spatial display and sound transformation helps the listener to discriminate and follow several music lines without the aid of pitch or rhythmic patterns.

Software systems for the definition and display of sound trajectories are commonly used in live electronics. For instance, Belladonna and Vidolin developed an interface for two-dimensional panoramic control for the visual language MAX [BV95]. Trajectories can be memorized, recalled, and translated at runtime into MIDI messages sent to a MIDI-controlled audio mixer. External reverberating devices can also be controlled and synchronized with panoramic messages in order to recreate an impression of depth and distance. This system has been used in several music productions and in various listening spaces. Figure 6.8 illustrates two erratic movements of virtual sound sources (the actual sources are the two percussion sets) in the big San Marco church in Milan, as it was programmed for a composition by Guarnieri [m-Gua99]. This piece and listening space are noteworthy for the use of intensity-controlled sound panning over a wide area, i.e., some sources move from front to back as they get louder. In general, sound trajectories are rarely meaningful per se, but they can become a crucial dramatic component if their variations are controlled dynamically as a function of other musical parameters such as intensity or rhythm.
Figure 6.8 Spatial layout and sonic trajectories for the performance of a composition by G. Guarnieri.

A complex system to control sound trajectories in real time was implemented on the IRIS-MARS workstation [RBM95]. It implemented the room-within-the-room model described in section 6.4.6 by reserving one of the two processors for spatialization, and the other for reverberation. The coordinates of source motion and other parameters of spatialization could be precomputed or controlled in real time via MIDI devices. This kind of spatial processing has been used in the piece "Games" by F. Cifariello Ciardi [m-Cif95].

A sophisticated system providing real-time multichannel spatial audio processing on a computer workstation is the "Spatialisateur" developed at IRCAM in Paris [Jot92, Jot99]. In this system, an intermediate level is interposed between the signal processing level and the control level. Namely, the physical parameters of virtual rooms are accessed through a set of perceptual attributes such as warmth, brilliance, etc., in such a way that compositional control of space becomes easier. As the interface is provided in the visual language MAX, it is easy to associate sets of parameters to trajectories that are drawn in a 2-D space.

6.3 3D with Headphones

6.3.1 Localization

Introduction

Humans can localize sound sources in a 3D space with good accuracy using several cues. If we can rely on the assumption that the listener receives the sound material via stereo headphones, we can reproduce most of the cues that are due to the filtering effect of the pinna-head-torso system, and inject the signal, artificially affected by this filtering process, directly into the ears.

Acoustic and Perceptual Foundations

Classic psychoacoustic experiments have shown that, when excited with simple sine waves, the hearing system uses two strong cues to estimate the apparent direction of a sound source. Namely, interaural intensity and time differences (IID and ITD) are jointly used for that purpose. The IID is mainly useful above 1500 Hz, where the acoustic shadow produced by the head becomes effective, thus reducing the intensity of the waves reaching the contralateral ear. For this high-frequency range and for stationary waves, the ITD is also far less reliable, since it produces phase differences in sine waves which often exceed 360°. Below 1500 Hz the IID becomes smaller due to head diffraction, which overcomes the shadowing effect. In this low-frequency range it is possible to rely on phase differences produced by the ITD. IID and ITD can only partially explain the ability to discriminate among different spatial directions. In fact, if the sound source moved laterally along a circle (see Fig. 6.9), the IID and ITD would not change. The cone formed by the circle with the center of the head has been called the cone of confusion. Front-back and vertical discrimination within a cone of confusion are better understood in terms of broadband signals and Head-Related Transfer Functions (HRTF). The system pinna-head-torso acts like a linear filter for a plane wave coming from a given direction. The magnitude and phase responses of this filter are very complex and direction dependent, so that it is possible for the listener to disambiguate between directions having the same, stationary, ITD and IID.
In some cases, it is advantageous to think about these filtering effects in the time domain, thus considering the Head-Related Impulse Responses (HRIR).

Figure 6.9 Interaural polar coordinate system and cone of confusion.

In section 6.3.4, we describe a structural model for HRTFs [BD98] where it is possible to outline a few relevant direction-dependent cues obtained from simplification of HRTFs. Even if the cues are simplified, when they are varied dynamically they can give a strong impression of localization. In real life, the listener is never
static when listening to a sound source. Even small movements of the head greatly help in discriminating possible confusions, such as the uncertainty between a central source in front of the listener or the same source exactly behind the listener. Therefore, a small set of cues such as ITD, IID, and the major notches of HRTFs can be sufficient to give a strong impression of direction, as long as the cues are related to movements of the listener's head.

6.3.2 Interaural Differences

Sound spatialization for headphones can be based on interaural intensity and time differences. The amplitude and time delay of each channel should just be governed by curves similar to those of Fig. 6.2. It is possible to use only one of the two cues, but using both cues will provide a stronger spatial impression. Of course, interaural time and intensity differences are just capable of moving the apparent azimuth of a sound source, without any sense of elevation. Moreover, the apparent source position is likely to be located inside the head of the listener, without any sense of externalization. Special measures have to be taken in order to push the virtual sound sources out of the head.

A finer localization can be achieved by introducing frequency-dependent interaural differences. In fact, due to diffraction the low-frequency components are barely affected by IID, and the ITD is larger in the low-frequency range. Calculations done with a spherical head model and a binaural model [Kuh77, PKH99] allow the drawing of approximated frequency-dependent ITD curves, one being displayed in Fig. 6.10a for 30° of azimuth. The curve can be further approximated by constant segments, one corresponding to a delay of about 0.38 ms at low frequencies, and the other corresponding to a delay of about 0.26 ms at high frequencies. The low-frequency limit can in general be obtained for a generic incident angle θ by the formula

ITD = (1.5 δ / c) sin θ ,                                        (6.8)

where δ is the inter-ear distance in meters and c is the speed of sound.
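The two constant-segment approximations are easy to evaluate (a Python sketch; the inter-ear distance of 0.175 m is an assumed typical value, not a figure from the text, and the high-frequency segment is taken as two thirds of the low-frequency limit so as to reproduce the ~0.26 ms vs ~0.38 ms values quoted above):

```python
import math

def itd_low(theta_deg, delta=0.175, c=340.0):
    """Low-frequency ITD limit, ITD = (1.5 * delta / c) * sin(theta);
    delta is the inter-ear distance in metres (0.175 m is an assumed
    typical value)."""
    return 1.5 * delta / c * math.sin(math.radians(theta_deg))

def itd_high(theta_deg, delta=0.175, c=340.0):
    """High-frequency plateau, assumed to be 2/3 of the low-frequency
    limit, matching the quoted 0.26 ms vs 0.38 ms at 30 degrees."""
    return delta / c * math.sin(math.radians(theta_deg))

ms_low = itd_low(30.0) * 1000.0    # about 0.39 ms
ms_high = itd_high(30.0) * 1000.0  # about 0.26 ms
```

These values would drive the per-channel delay lines of a headphone spatializer below and above the crossover frequency.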
The crossover point between high and low frequencies is located around 1 kHz. Similarly, the IID should be made frequency dependent. Namely, the difference is larger for high-frequency components, so that we have IID curves such as that reported in Fig. 6.10b for 30° of azimuth. The IID and ITD are shown to change when the source is very close to the head [DM98]. In particular, sources closer than five times the head radius increase the intensity difference in low frequency. The ITD also increases for very close sources, but its changes do not provide significant information about source range.

6.3.3 Externalization

Listening to binaural sound material through headphones often causes internalization of the sound sources in the head of the listener. It seems that human subjects
tend to internalize the perceived objects when the total stimulation, as coming from all sensorial modes, cannot be produced by natural situations involving distant sources [Dur92].

Figure 6.10 Frequency-dependent interaural time (a) and intensity (b) differences for azimuth 30°.

One technique that is recognized to be effective in externalizing the sound sources when headphones are used is decorrelation [Ken95b]; it is justified by the fact that, in natural situations, the signals reaching the ears are significantly decorrelated, especially because of room reverberation. The correlation of the left and right channels is represented by the function

c(τ) = (1/U) Σn yL(n) yR(n + τ) .                                (6.9)

The argument τ is the time lag between the two channels and U is a normalizing factor defined as

U = ( Σn yL(n)² · Σn yR(n)² )^(1/2) .                            (6.10)

If one channel is just a shifted copy of the other, the correlation function will have a strong peak for a value of the argument equal to the time shift. Usually the degree of correlation is measured by a single number taken as the absolute value of the maximum of the correlation function. The normalization factor is chosen in order to produce a degree of correlation equal to 1 for a pure shift and -1 for signals that are 180° out of phase. Two signals having a degree of correlation equal to 1 in magnitude are called coherent. The coherence between the left and right channels is maintained under time and intensity differences, and even under phase inversion. When a subject listens to two channels of broadband noise via headphones, the relative degree of correlation produces a spatial image that ranges from central inside the head to lateral right out of the ears (see Fig. 6.11). Partially coherent signals produce images that are larger and less sharply located [Bla83] than those of perfectly coherent signals. The degree of correlation is generally reduced in the presence of reverberation.
Thus, the techniques that are commonly used for artificial reverberation can also be used to decorrelate two channels. Of course, if the only purpose
is to externalize a sound image, we want to avoid other artifacts such as excessive coloration or a reverberant tail. We will see in section 6.6.1 how a good decorrelator can be built. It can be applied both to externalize a sound source in binaural listening and to adjust the apparent source width in listening via loudspeakers.

Figure 6.11 Perceived sound image when listening to broadband noise with a degree of correlation equal to 1 (a), and 0 (b).

The degree of correlation, besides helping to push the apparent sources out of the head, is also useful in arranging an ensemble of sources in space. In fact, it has been shown by experiments that the listener assigns a unique spatial image to sound sources having a similar degree of correlation. For instance, a coherent couple of voice signals added to the broadband noise of Fig. 6.11b produces a distinct image in the center of the head. This property of the hearing system can be exploited to augment the clarity of display of a mixture of sound sources.

A dramatic increase in externalization is achieved if the binaural stimulus is dynamically varied to follow the head movements in a consistent, natural way [Dur92]. However, this control of spatialization based on head tracking is too expensive and too cumbersome in most cases. Therefore, it is usually assumed that the listener keeps her head fixed while listening, for instance because she is watching a screen in a standard computer desktop environment [Kyr98]. In this situation, the simultaneous presentation of visual stimuli correlated with sounds is also known to help externalize the auditory events.

Recently, excellent results in localization accuracy and externalization have been reported in anechoic rendering via headphones under fixed head conditions, even when the details of the filtering characteristics of the head are smoothed significantly [KC98].
It seems that if the playback apparatus preserves the acoustics of a natural listening condition, the sound sources do externalize in most cases. Unfortunately, such a playback apparatus requires the correct coupling with the ear-canal impedance, and this is not what happens with conventional headphones. Therefore, the control over the degree of correlation, even if its effectiveness and accuracy are limited, seems to be the only way we have to externalize a sound source for a generic listener. If we can afford signal processing specialized to the listener's head and headphone characteristics, better accuracy can be achieved by modeling the HRTF. The effects of binaural spectral details on externalization were extensively investigated in [HW96], where it was shown that sources are externalized well even with frequency-independent ITD while, on the other hand, the effects of IID are accumulated over the entire spectrum.
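The normalized cross-correlation and the single-number degree of correlation described above can be sketched directly (a brute-force Python illustration over all lags; function and variable names are ours):

```python
import math

def degree_of_correlation(left, right):
    """Extremal value of the normalized cross-correlation over all lags
    for two equal-length channels.  Returns exactly +1 for identical
    channels and -1 for phase-inverted ones; a time-shifted copy gives a
    value close to +1 (exactly +1 only in the ideal, untruncated case)."""
    n = len(left)
    u = math.sqrt(sum(a * a for a in left) * sum(b * b for b in right))
    best = 0.0
    for lag in range(-(n - 1), n):
        # sum over the overlapping region of the two channels
        acc = sum(left[i] * right[i + lag]
                  for i in range(max(0, -lag), min(n, n - lag)))
        c = acc / u
        if abs(c) > abs(best):
            best = c
    return best

sig = [0.3, -1.0, 0.7, 0.2, -0.5, 0.9]
inverted = [-s for s in sig]   # 180 degrees out of phase
```

By Cauchy-Schwarz the value never exceeds 1 in magnitude, so it can be used directly as the coherence measure when tuning a decorrelator.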
6.3.4 Head-Related Transfer Functions

Several authors have measured the filtering properties of the system pinna-head-torso by means of manikins or human subjects. A popular collection of measurements was taken by Gardner and Martin using a KEMAR dummy head, and made freely available [GM94, Gar98a]. Measurements of this kind are usually taken in an anechoic chamber, where a loudspeaker plays a test signal which approaches the head from the desired direction. The directions should be taken in such a way that two neighboring directions never exceed the localization blur, which ranges from about ±3° in azimuth for frontal sources, to about ±20° in elevation for sources above and slightly behind the listener [Bla83]. The test signal is usually a pseudo-random noise such as a Maximum-Length Sequence (MLS) [RV89] or a Golay code [ZGM92], which can easily be deconvolved from the measured response. The result of the measurements is a set of HRIRs that can be directly used as coefficients of a pair of FIR filters. Since the decay time of the HRIR is always less than a few milliseconds, 256 to 512 taps are sufficient at a sampling rate of 44.1 kHz.

A cookbook of HRIRs and direct convolution seems to be a viable solution to provide directionality to sound sources using today's technology. A fundamental limitation comes from the fact that HRIRs vary widely between different subjects, to such an extent that front-back reversals are fairly common when listening through someone else's HRIRs. Using individualized HRIRs dramatically improves the quality of localization. Moreover, since we unconsciously use small head movements to resolve possible directional ambiguities, head-motion tracking is also desirable.

There are several reasons that make a model of the external hearing system more desirable than a raw catalog of HRIRs. First of all, a model might be implemented more efficiently, thus allowing more sources to be spatialized in real time.
Second, if the model is well understood, it might be described with a few parameters having a direct relationship with physical or geometric quantities. This latter possibility can save memory and allow easy calibration.
As happens with models for sound synthesis, we can try to model the effects or the causes of the modifications occurring on sound signals. The first approach consists in applying data reduction and filter design techniques, especially in the frequency domain, to the HRTFs. Much research has been devoted to discovering the amount of approximation that is tolerated by human listeners and how to design efficient IIR filters that implement the approximated HRTFs [KW92]. A recent experiment has shown that we are quite insensitive to the fine details of the HRTF magnitude spectrum, and that the lack of externalization often reported in prior studies might be due to incorrect coupling between the headphones and the ear canal [KC98]. Filter design techniques have been applied to the problem of approximating the desired HRTFs by means of low-order yet accurate linear systems. IIR filters of order as low as ten can be designed so that they keep enough spectral detail to allow good localization accuracy [MHV97]. Frequency warping has been proposed as a technique to increase the accuracy of approximations in the low frequency range [HZ99] by stretching the frequency axis according to the distribution of critical bands [ZF90]. One of the problems of the models based on signal processing is that they do not increase our understanding of the underlying physical phenomena. As a consequence, it becomes difficult to control the parameters and we have to rely on collections of static configurations.
Modeling the structural properties of the system pinna-head-torso gives us the possibility of applying continuous variation to the positions of sound sources and to the morphology of the listener. Much of the physical/geometric properties can be understood by careful analysis of the HRIRs, plotted as surfaces, functions of the variables time and azimuth, or time and elevation. This is the approach taken by Brown and Duda [BD98], who came up with a model which can be structurally divided into three parts:

- Head Shadow and ITD
- Shoulder Echo
- Pinna Reflections

Starting from the approximation of the head as a rigid sphere that diffracts a plane wave, the shadowing effect can be effectively approximated by a first-order continuous-time system, i.e., a pole-zero couple in the Laplace complex plane:

  s_z = -2ω0/α(θ) ,   (6.11)
  s_p = -2ω0 ,        (6.12)

where ω0 is related to the effective radius a of the head and the speed of sound c by

  ω0 = c/a .          (6.13)

The position of the zero varies with the azimuth θ (see Fig. 6.9) according to the function

  α(θ) = 1.05 + 0.95 cos(θ·180°/150°) .   (6.14)

The pole-zero couple can be directly translated into a stable IIR digital filter by bilinear transformation [Mit98], and the resulting filter (with proper scaling) is

  H(z) = ((ω0 + α Fs) + (ω0 - α Fs) z^{-1}) / ((ω0 + Fs) + (ω0 - Fs) z^{-1}) .   (6.15)

The ITD can be obtained by means of a first-order allpass filter [OS89, SF85] whose group delay in seconds is the following function of the azimuth angle θ:

  τ_h(θ) = -(a/c)(cos θ - 1)                 for 0° ≤ |θ| < 90° ,
  τ_h(θ) = (a/c)((|θ| - 90°)·π/180° + 1)     for 90° ≤ |θ| ≤ 180° .   (6.16)

Actually, the group delay provided by the allpass filter varies with frequency, but for these purposes such variability can be neglected. Instead, the filter (6.15) gives
an excess delay at DC that is about 50 percent of that given by (6.16). This increase in the group delay at DC is exactly what one observes for the real head [Kuh77], and it has already been outlined in Fig. 6.10. The overall magnitude and group delay responses of the block responsible for head shadowing and ITD are reported in Fig. 6.12. The M-file 6.3 implements the head-shadowing filter.

[Figure 6.12 Magnitude and group delay responses of the block responsible for head shadowing and ITD (fs = 44100 Hz). Azimuth ranging from 0 (dashed line) to π at steps of π/6.]

M-file 6.3 (hsfilter.m)

function [output] = hsfilter(theta, Fs, input)
% hsfilter(theta, Fs, input)
% filters the input signal according to head shadowing
% theta is the angle with the frontal plane
% Fs is the sample rate
theta = theta + 90;
theta0 = 150;
alfa_min = 0.05;
c = 334;                    % speed of sound
a = 0.08;                   % radius of head
w0 = c/a;
alfa = 1 + alfa_min/2 + (1 - alfa_min/2)*cos(theta/theta0*pi);
B = [(alfa + w0/Fs)/(1 + w0/Fs), (-alfa + w0/Fs)/(1 + w0/Fs)];
                            % numerator of Transfer Function
A = [1, -(1 - w0/Fs)/(1 + w0/Fs)];
                            % denominator of Transfer Function
if (abs(theta) < 90)
  gdelay = - Fs/w0*(cos(theta*pi/180) - 1);
else
  gdelay = Fs/w0*((abs(theta) - 90)*pi/180 + 1);
end;
a = (1 - gdelay)/(1 + gdelay);   % allpass filter coefficient
out_magn = filter(B, A, input);
output = filter([a, 1], [1, a], out_magn);

The function hsfilter has to be applied twice, for the left and right ears with opposite values of argument theta, to a given input sound.
In a rough approximation, the shoulder and torso effects are synthesized in a single echo. An approximate expression of the time delay can be deduced from the measurements reported in [BD98, Fig. 8]:

  τ_sh = 1.2 (180° - θ)/180° · (1 - 0.00004((φ - 80°)·180°/(180° + θ))²)  [ms]   (6.17)

and it is depicted in Fig. 6.13. The echo should also be attenuated as the source goes from frontal to lateral position.

[Figure 6.13 Delay time of the echo generated by the shoulders as a function of azimuth and elevation. Azimuth ranging from 0 (dashed line) to π/3 at steps of π/12.]

Finally, the pinna provides multiple reflections that can be obtained by means of a tapped delay line. In the frequency domain, these short echoes translate into notches whose position is elevation dependent and that are frequently considered as the main cue for the perception of elevation [Ken95a]. In [BD98], a formula for the time delay of these echoes is given:

  τ_pn = A_n cos(θ/2) sin(D_n(90° - φ)) + B_n .   (6.18)

The parameters are given in Table 6.1 together with the amplitude values ρ_pn of the reflections. The parameter D_n allows the adjustment of the model to the individual characteristics of the pinna, thus providing an effective knob for optimizing the localization properties of the model. Figure 6.14 depicts the delay time of the first
[Table 6.1 Parameters for calculating amplitude and time delay of the reflections produced by the pinna model: for each reflection n, the amplitude ρ_pn and the parameters A_n (samples), B_n (samples), and D_n of (6.18).]

[Figure 6.14 Delay time (in samples at fs = 44100 Hz) of the first echo generated by the pinna as a function of azimuth and elevation. Azimuth ranging from 0 (dashed line) to π/2 at steps of π/12.]

pinna echo (in samples at fs = 44100 Hz) as a function of azimuth and elevation. The corresponding frequency notch lies somewhere between 7 and 10 kHz.
The structural model of the pinna-head-torso system is depicted in Fig. 6.15 with all its three functional blocks, repeated twice for the two ears. Even though the directional properties are retained by the model, anechoic sounds filtered by the model do not externalize well. As we have explained in section 6.3.3, there are several ways to improve the externalization in binaural listening.

Music Applications and Control

Several widely used software systems allow the possibility of spatializing sound sources for binaural listening. For instance, Tom Erbe's SoundHack, an award-winning software for the Macintosh, has a cookbook of HRTF coefficients and allows the musician to locate apparent sound sources in a 3D space. Similar operations can be carried out using the Csound language [BC00] and native opcodes.
We have already mentioned the IRCAM Spatialisateur as a powerful software
[Figure 6.15 Structural model of the pinna-head-torso system: the monaural input feeds the head shadow and ITD block, the shoulder echo, and the pinna reflections for each of the left and right output channels.]

spatialization system. It allows specification of a virtual room and virtual source trajectories in 3D space, and the rendering can be done either for loudspeakers or for headphones. In this latter case, HRTFs are used. The ability to provide two different output formats with a single system has been used in the production of the CD of Pierre Boulez's "Répons" [m-Bou84]. In fact, a few early and lucky customers could get another CD for free from the publishing company, with the binaural version of the same performance.

6.4 3D with Loudspeakers

6.4.1 Introduction

We outline three main approaches to sound spatialization by multi-loudspeaker layouts: holophonic reconstruction, transaural techniques, and methods relying on the precedence effect.
Holophonic reconstruction is the reproduction of a 2D or 3D sound field in a confined area as a result of the interference of the wavefronts generated by different loudspeakers. Diffraction effects around the listener's head and torso, described in section 6.4.2, can be taken into account in order to optimize the rendering. We discuss two specific holophonic techniques, namely, 3D panning and Ambisonics, in sections 6.4.3 and 6.4.4.
Transaural spatialization is described in section 6.4.5 as a recasting of binaural techniques for presentations based on loudspeaker layouts.
The relative arrival time of the wavefronts from different loudspeakers can have a dramatic impact on the apparent source directions. A technique that introduces explicit delays among the loudspeakers is the room-within-the-room model, presented in section 6.4.6. This technique, even though less accurate, is less sensitive to changes of the listener position because it exploits the precedence effect.

6.4.2 Localization with Multiple Speakers

A listener facing a couple of loudspeakers receives the two signals x_LL and x_RL at the left ear, the first coming from the left loudspeaker, and the second coming from the right loudspeaker. Symmetrically, the two signals x_RR and x_LR are received at the right ear. If the right loudspeaker has a pure amplitude factor A_R and a pure time delay τ_R, a sinusoidal, unit-amplitude signal at the loudspeakers generates two sinusoidal signals at the ears. These signals are attenuated and shifted by the following complex weights:

  x_LL = 1 ,              x_RL = A_R A_H e^{-jω(τ_R + τ_H)} ,
  x_RR = A_R e^{-jωτ_R} , x_LR = A_H e^{-jωτ_H} ,   (6.19)

where A_H and τ_H are the amplitude factor and time delay given by the head transfer function to the contralateral ear from the direction of the loudspeaker. The situation is illustrated in Fig. 6.16. Since, in low frequency, A_H is almost unity, it can be seen from (6.19) with the help of Fig. 6.17 that a pure level difference (i.e. τ_R = 0) at the loudspeakers generates a pure time delay at the ears. Conversely, a pure time delay between the loudspeakers generates a pure level difference at the ears. As is shown in [Bla83], the ear signal can even be stronger on the side of the lagging loudspeaker. The analysis of stationary signals becomes more complicated at higher frequencies, where the shadowing of the head cannot be ignored.
In general, the time and level differences at the ears can give cues that are in contradiction to the time and level differences at the loudspeakers.

[Figure 6.16 Transfer functions involved in a stereophonic layout.]
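To make the low-frequency argument concrete, the following Python sketch evaluates ear weights of the form the text attributes to (6.19), using the simplification A_H = 1 stated above; the numerical values for A_R and τ_H are hypothetical. It shows that a pure level difference at the loudspeakers (τ_R = 0) produces ear signals of equal magnitude but different phase, i.e., a pure time difference:

```python
import cmath
import math

def ear_weights(A_R, tau_R, tau_H, omega, A_H=1.0):
    """Complex weights of the ear signals for a stereo loudspeaker pair.

    Left ear:  direct path from the left speaker (weight 1) plus the
    shadowed, delayed path from the right speaker.
    Right ear: delayed path from the right speaker plus the shadowed
    path from the left speaker.
    """
    e_left = 1.0 + A_R * A_H * cmath.exp(-1j * omega * (tau_R + tau_H))
    e_right = (A_R * cmath.exp(-1j * omega * tau_R)
               + A_H * cmath.exp(-1j * omega * tau_H))
    return e_left, e_right

# Pure level difference: tau_R = 0, A_R = 0.5, evaluated at 200 Hz
# with a hypothetical cross-head delay tau_H = 0.6 ms.
omega = 2 * math.pi * 200.0
eL, eR = ear_weights(A_R=0.5, tau_R=0.0, tau_H=0.0006, omega=omega)
print(abs(eL), abs(eR))                   # equal magnitudes ...
print(cmath.phase(eL) - cmath.phase(eR))  # ... but a nonzero phase (time) shift
```

Swapping the roles (τ_R ≠ 0, A_R = 1) shows the converse effect: a pure time delay at the loudspeakers yields a level difference at the ears.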
[Figure 6.17 Vector illustration of the pure time difference (rotation of the bold vector) at the ears generated by a pure level difference at the loudspeakers; left and right channel panels.]

Michael Gerzon developed some theories of localization in a sound field [Ger92a] that were somewhat validated by experiments and crafting experience. Gerzon called the meta-theory collecting all these theories "directional psychoacoustics". The basic assumptions of directional psychoacoustics are:

- At low frequencies (up to about 500 Hz) the signals arriving at the two ears have a stable phase difference that is less than half a wavelength. The hearing system produces a sense of direction from this phase difference. The phase locking means that we can do a vector summation of the contributions from all loudspeakers in order to obtain a "velocity vector gain"

    v = Σ_i g_i l_i   (6.20)

  that is representative of the perceived direction in low frequency. In (6.20), g_i is the gain of the i-th loudspeaker and l_i is the unit-length vector pointing from the loudspeaker to the listener (see Fig. 6.1).

- At high frequencies (from about 500 Hz up to about 3500 Hz) the signals arriving at the two ears are treated as incoherent, and the hearing system is mainly sensitive to the "energy vector gain"

    e = Σ_i g_i² l_i .   (6.21)

The facts or assumptions provided by directional psychoacoustics are used to improve the quality of multichannel reproduction in several ways (see section 6.4.4).

6.4.3 3D Panning

The matrix-based approach used for stereo panning in section 6.2.1 can be generalized to an arbitrary number of loudspeakers located at any azimuth though nearly
equidistant from the listener. Such a generalization is called Vector Base Amplitude Panning (VBAP) [Pul97] and is based on a vector representation of positions in a Cartesian plane having its center in the position of the listener. The unit-magnitude vector u pointing toward the virtual sound source can be expressed as a linear combination of the unit-magnitude column vectors l_L and l_R pointing toward the left and right loudspeakers, respectively. In matrix form, this combination can be expressed as

  u = [l_L l_R] g = L g .   (6.22)

Except for degenerate loudspeaker positions, the linear system of equations (6.22) can be solved in the vector of gains g. This vector has not, in general, unit magnitude, but can be normalized by appropriate amplitude scaling. The solution of system (6.22) implies the inversion of matrix L, but this can be done beforehand for a given loudspeaker configuration.
The generalization to more than two loudspeakers in a plane is obtained by considering, at any virtual source position, only one couple of loudspeakers, thus choosing the best vector base for that position.
The generalization to three dimensions is obtained by considering vector bases formed by three independent vectors in space. The vector of gains for such a 3D vector base is obtained by solving the system

  u = [l_1 l_2 l_3] g = L g .   (6.23)

Of course, having more than three loudspeakers in a 3D space implies, for any virtual source position, the selection of a local 3D vector base.
As indicated in [Pul97], VBAP ensures the maximum sharpness in sound source location.
In fact:

- If the virtual sound source is located at a loudspeaker position, only that loudspeaker has nonzero gain;
- If the virtual sound source is located on a line connecting two loudspeakers, only those two loudspeakers have nonzero gain;
- If the virtual sound source is located on the triangle delimited by three adjacent loudspeakers, only those three loudspeakers have nonzero gain.

The formulation of VBAP given here is consistent with the low frequency formulation of directional psychoacoustics. The extension to high frequencies has also been proposed with the name Vector Base Panning (VBP) [PBJ98].
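A minimal two-loudspeaker VBAP sketch in Python (the loudspeaker azimuths are hypothetical) solves (6.22) for the gains by inverting the 2×2 base and then normalizes them; it also exhibits the first property above, since a source in a loudspeaker direction receives all the gain:

```python
import math

def unit(az_deg):
    """Unit vector for an azimuth measured from the frontal direction."""
    a = math.radians(az_deg)
    return (math.sin(a), math.cos(a))

def vbap2(az_src, az_left, az_right):
    """Solve u = L g for a two-loudspeaker base, then normalize the gains."""
    ux, uy = unit(az_src)
    (lx, ly), (rx, ry) = unit(az_left), unit(az_right)
    det = lx * ry - rx * ly      # det of L = [l_L l_R]; zero for degenerate bases
    gL = (ux * ry - rx * uy) / det
    gR = (lx * uy - ux * ly) / det
    norm = math.hypot(gL, gR)    # amplitude normalization
    return gL / norm, gR / norm

print(vbap2(-30.0, -30.0, 30.0))  # source at the left loudspeaker -> gains (1, 0)
print(vbap2(0.0, -30.0, 30.0))    # centered source -> equal gains
```

For a full-circle layout, the same routine would be applied to the pair of loudspeakers whose base contains the source direction, as described in the text.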
6.4.4 Ambisonics and Holophony

Ambisonics is a technique for spatial audio reproduction introduced in the early seventies by Michael Gerzon [Ger85, MM95]. While Vector Base Panning aims at projecting sound material into a 3D listening space, the focus of Ambisonics is really spatial recording, efficient encoding in a few audio channels, and reproduction by an appropriate loudspeaker set-up.
In Ambisonics the sound field is preferably encoded using the so-called B-format. 3D B-format is composed of four signals: W, X, Y, and Z, where W is a signal as taken from an omni-directional microphone, and X, Y, and Z are signals as taken from figure-of-eight microphones aligned with the orthogonal axes. If the four signals have to be produced starting from a monophonic sound signal S, the following encoding equations will apply:

  W = S/√2 ,
  X = S cos θ cos φ ,
  Y = S sin θ cos φ ,
  Z = S sin φ ,   (6.24)

where θ is the azimuth and φ is the elevation, as indicated in Fig. 6.9. The signal W is called the zero-order spherical harmonic of the sound field, and X, Y, and Z are called the first-order spherical harmonic components. The gain factors of the X, Y, and Z signals are the Cartesian coordinates of the unit length vector pointing to the virtual sound source.
The decoding stage will depend upon the actual loudspeaker layout. In the 3D case, the simplest layout is given by loudspeakers positioned at the corners of a cube. If the i-th loudspeaker is found along the direction of the unit vector l_i, the corresponding gain is

  g_i = (1/2)(G_1 W + G_2 [X Y Z]·l_i) .   (6.25)

As mentioned in section 6.4, some theories of directional psychoacoustics have been developed, mainly by Michael Gerzon [Ger92a], which tell how to design the gains to be applied to the loudspeakers for setting the apparent direction of the sound source in a way consistent with our perception in a multichannel loudspeaker layout.
These theories translate into different values of the gains G_1 and G_2 in (6.25). Ideally, these gains should be made frequency dependent, and replaced by filters whose shape can be carefully controlled [FU98].
If the i-th loudspeaker is aligned with one of the axes, the gain factors are the same as found in VBP, except for the zero order harmonic and for the scaling factor G_2. In Ambisonics, if G_1 = G_2 = 1, when a sound source is located in the direction of a loudspeaker, the opposite loudspeaker has a null gain. This property is used to avoid antiphase signals from couples of loudspeakers, thus adding stability to the sound image. This is not necessary in VBP, since it uses only a local base of
loudspeakers and it does not treat the whole loudspeaker layout as a vector base. In this sense, Ambisonics is a global technique. The fact that VBP is a local panning method allows the use of arbitrary loudspeaker layouts. Extensions of Ambisonics to general layouts were also proposed by Gerzon [Ger92a], especially for applications in surround sound for home theater.
Ambisonics and VBP have been compared in terms of the direction and strength of the velocity and energy vectors [PBJ98], which are obtained by normalizing (6.20) and (6.21) to the total pressure gain and the total energy gain, respectively [Ger92a]. According to directional psychoacoustics these vectors should point in the same direction and should be as close as possible to one in order to provide a sharp sound image. The comparison showed that VBP outperforms Ambisonics for steady sound sources, as expected for the sharpness of a local panning. However, Ambisonics gives smoother transitions between loudspeakers, in such a way that it is more difficult to tell where the loudspeakers really are. The conclusions drawn by comparison of the vector gains have been confirmed by informal listening tests on realtime implementations.
It should be pointed out that these kinds of assessments make some sense only if the following assumptions are met:

- the listener remains steady in the sweet spot
- the loudspeakers generate plane waves
- anechoic or free field listening.

In practice, these assumptions are rarely met and one should rely on assessment tools based on interference patterns emerging in a realistic model of the actual room and audience area [LS99]. Ambisonics can induce dramatic shifts in the apparent source position as the listener moves out of the sweet spot. For instance, the sound image can switch from front to back if we just move one step backward.
For home theater applications,the need for sharper frontal images in Ambisonics was addressed by introductionof vector transformations on the encodedsignals[GB98], thus giving aknob forcontrolling the forward dominance of sound distribution.In the literature, techniques such as Holophony and Wave-Field Synthesis canbe found. These arebased 011the mathematical property of analytic fields in an en-closure that make them describable as an integral over the boundary. Ifwe can putinfinitely many loudspeakers on aclosed contour line,we can reproduce any pressurefield in the plane area within the contour. In practice, witha finite number of loud-speakers, spatial aliasing puts a severe limit on accuracy. It has been shown [NE981that Ambisonics is a specialcase of Holophony obtained for loudspeakers placedat infinity (i.e., generating plane waves), and that using a numerous circular arrayof microphones and loudspeakers is a feasible way of extending the sweet spot ofaccurate acoustic field reconstruction to something more than a square meter.
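The B-format encoding (6.24) and cube decoding (6.25) can be sketched in Python as below. The decoding gain G_1 = √2 used in the example compensates the 1/√2 applied to W at the encoder; this scaling choice is an assumption about conventions, not taken from the text. With that choice, a source aligned with one cube corner produces a null gain on the opposite corner, the stability property discussed above:

```python
import math

def encode_bformat(S, theta, phi):
    """B-format encoding of a mono signal S at azimuth theta, elevation phi."""
    W = S / math.sqrt(2.0)
    X = S * math.cos(theta) * math.cos(phi)
    Y = S * math.sin(theta) * math.cos(phi)
    Z = S * math.sin(phi)
    return W, X, Y, Z

def decode_cube(W, X, Y, Z, G1, G2):
    """Decode to eight loudspeakers at the cube corners, gains per (6.25)."""
    s = 1.0 / math.sqrt(3.0)
    corners = [(sx * s, sy * s, sz * s)
               for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)]
    return [0.5 * (G1 * W + G2 * (X * lx + Y * ly + Z * lz))
            for (lx, ly, lz) in corners]

# Source aligned with the corner (1, 1, 1)/sqrt(3).
theta = math.atan2(1.0, 1.0)
phi = math.asin(1.0 / math.sqrt(3.0))
g = decode_cube(*encode_bformat(1.0, theta, phi),
                G1=math.sqrt(2.0), G2=1.0)
print(min(g))   # gain of the opposite corner (-1,-1,-1): ~0
print(max(g))   # gain of the aligned corner: ~1
```

Different choices of G_1 and G_2 (or frequency-dependent shelving filters, as noted in the text) trade this null against the velocity- and energy-vector criteria of directional psychoacoustics.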
6.4.5 Transaural Stereo

If binaural audio material is played through a stereo loudspeaker layout, then almost any spatialization effect is likely to disappear. However, a relatively inexpensive signal processing apparatus, proposed in the early sixties by Schroeder [Sch61], can recreate the 3D listening experience at selected positions by preconditioning the signals feeding the loudspeakers. The audio display based on loudspeakers is often preferable because it is immune to fatigue and internalization problems that often arise with headphones.

[Figure 6.18 Binaural and transaural listening: geometry.]

Assume that we want to recreate, by means of a couple of loudspeakers, the signals as they arrive to the ears from a couple of headphones. Referring to Fig. 6.18, δ is the distance between the ears in spatial samples, i.e., the distance in meters multiplied by fs/c, where c is the speed of sound and fs is the sample rate; D is the distance in samples between a loudspeaker and the nearest ear; θ is the angle subtended by a loudspeaker with the median plane. Under the assumption of point-like loudspeakers, the excess distance from a loudspeaker to the contralateral ear produces an attenuation that can be approximated by

  g = D/(δ sin θ + D) .   (6.26)

As we saw in section 6.3, the head of the listener introduces a shadowing effect which can be expressed by a lowpass transfer function H(z). These considerations lead to the following matrix relationship between the signals at the ears (e_1(z), e_2(z)) and the signals at the loudspeakers (L(z), R(z)):

  [e_1(z); e_2(z)] = [1, g z^{-d} H(z); g z^{-d} H(z), 1] [L(z); R(z)] = A [L(z); R(z)] ,   (6.27)

where d is the arrival time difference in samples of the signals at the ears. We should consider d a function of frequency, as in Fig. 6.10a, but it is usually enough to set it to the low-frequency limit d ≈ 1.5 δ sin θ, easily derived from (6.8). The symmetric matrix A can be easily inverted, thus giving

  A^{-1} = 1/(1 - g² z^{-2d} H²(z)) · [1, -g z^{-d} H(z); -g z^{-d} H(z), 1] .   (6.28)
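The inversion (6.28) can be checked numerically at any single frequency by replacing H(z) and z^{-d} with their values on the unit circle. The sketch below does this in Python; the values chosen for g, H, and d are hypothetical:

```python
import cmath

def crosstalk_matrices(g, H, zd):
    """A of (6.27) and its inverse per (6.28), evaluated at one frequency.

    g  : interaural attenuation (6.26)
    H  : head-shadow lowpass value at this frequency
    zd : z^{-d} on the unit circle (a pure phase factor)
    """
    c = g * H * zd                 # cross-coupling term g z^{-d} H(z)
    A = [[1.0, c], [c, 1.0]]
    k = 1.0 / (1.0 - c * c)        # 1 / (1 - g^2 z^{-2d} H^2(z))
    Ainv = [[k, -k * c], [-k * c, k]]
    return A, Ainv

def matmul2(P, Q):
    """Product of two 2x2 matrices."""
    return [[sum(P[i][m] * Q[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

# Hypothetical values: g = 0.8, H = 0.7 with some phase, d = 3 samples,
# evaluated at a normalized frequency of 0.1 rad/sample.
zd = cmath.exp(-1j * 0.1 * 3)
A, Ainv = crosstalk_matrices(0.8, 0.7 * cmath.exp(-1j * 0.2), zd)
I = matmul2(A, Ainv)
print(I)   # ~ [[1, 0], [0, 1]]
```

In a real-time implementation the inverse is of course realized as the lattice-plus-comb structure described next, rather than frequency by frequency.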
[Figure 6.19 Binaural to transaural conversion.]

The conversion from binaural to transaural is realized by the 2-input 2-output system represented by matrix (6.28) and depicted in Fig. 6.19. Here we have outlined two functional blocks: T(z) is a lattice section, and C(z) is a comb filter with a lowpass in the feedback loop. This decomposition is useful when cascading the transaural processor to a mono-to-binaural-stereo converter (see, e.g., section 6.6.1). In such a case, the linearity and time invariance of the functional blocks allow the use of a single comb filter before the mono-to-stereo converter. This equivalence is illustrated in Fig. 6.20. From a comparison of Figures 6.19 and 6.20 it is clear how the switch from transaural to binaural processing can be done simply by zeroing the coefficient g in both the lattice section and the comb filter.

[Figure 6.20 Mono-to-transaural-stereo converter.]
Apparently, transaural stereo imposes strict requirements on the position of the listener and the absence of reflections in the room. However, a certain amount of head rotation and shift can be tolerated and, if the walls are not too close to the loudspeakers, the early room reflections arrive after the impulse response of the crosstalk canceller has vanished [CB89], so that its effectiveness is preserved. In his book [Gar98a], William Gardner presents different topologies and practical implementation details of transaural stereophonic processors, including a thorough physical and psychophysical validation.

6.4.6 Room-Within-the-Room Model

Panning and Ambisonics are methods for controlling the gains applied to the loudspeakers in order to approximate a target sound field at a privileged listener position. A completely different approach can be taken by controlling the relative time delay between the loudspeaker feeds. A model supporting this approach was introduced by Moore [Moo82], and can be described as a physical and geometric model. The metaphor underlying the Moore model is that of the room within a room, where the inner room has holes in the walls, corresponding to the positions of loudspeakers, and the outer room is the virtual room where sound events have to take place (Fig. 6.21).

[Figure 6.21 Moore's room in a room model.]

The simplest form of spatialization is obtained by drawing direct sound rays from the virtual sound source to the holes of the inner room. If the outer room is anechoic, these are the only paths taken by sound waves to reach the inner room. The loudspeakers will be fed by signals delayed by an amount proportional to the length of these paths, and attenuated according to the relationship of inverse proportionality valid for propagation of spherical waves. In formulas, if l_i is the path length from the source to the i-th loudspeaker, and c is the speed of sound in air, the delay in seconds is set to

  d_i = l_i/c ,   (6.29)
and the gain is set to

  g_i = 1/max(1, l_i) .   (6.30)

The formula for the amplitude gain is such that sources within the distance of 1 m from the loudspeaker² will be connected to unity gain, thus avoiding the asymptotic divergence in amplitude implied by a point source of spherical waves.
The model is as accurate as the physical system being modeled would permit. A listener within a room would have a spatial perception of the outside soundscape whose accuracy will increase with the number of windows in the walls. Therefore, the perception becomes sharper by increasing the number of holes/loudspeakers. In reality, some of the holes will be masked by some walls, so that not all the rays will be effective³ (e.g. the rays to loudspeaker 3 in Fig. 6.21). In practice, the directional clarity of spatialization is increased if some form of directional panning is added to the base model, so that loudspeakers opposite the direction of the sound source are severely attenuated. In this case, it is not necessary to burden the model with an algorithm of ray-wall collision detection. The Pioneer Sound Field Controller is an example of a 15-loudspeaker hemispherical array governed by these principles [Mar01].
A special case of the room in a room model is obtained when the loudspeakers are all located along a straight line, for instance above the listeners' heads. In this case we can think of holes in the ceiling of the inner room, and this layout is especially effective in reproducing movements of the virtual sound source along one dimension. Figure 6.22 illustrates this one-dimensional reduction of the model in the case of four loudspeakers and two listeners. From Fig.
6.22, it is easy to visualize the robustness of the method to different listener positions.

[Figure 6.22 One-dimensional spatialization model.]

In fact, the spatialization methods based on amplitude panning are generally designed for a tight listener position, and are overwhelmed by the precedence effect

² This distance is merely conventional.
³ We are ignoring diffraction from this reasoning.
as soon as the listener is in a different position. The fact that most of the sound images collapse into one of the loudspeakers can be easily experienced by attending a pop music concert, or listening to a conventional car audio system. On the other hand, the Moore model relies explicitly on the precedence effect, in the sense that the sound image is biased toward the first loudspeaker being reached by the acoustic rays. As illustrated in Fig. 6.22 for one-dimensional spatialization, in general there is a wide area that is biased toward the same loudspeaker, just because the time delay of the virtual acoustic rays sums with the time delay of the actual paths connecting the loudspeakers with the listener. For instance, a virtual sound source located in the neighborhood of the loudspeaker S2 will be perceived as radiating from S2 by both listeners L1 and L2, even though the loudspeaker S1 is closer than S2 to L1. This is an intuitive explanation of the claim that the Moore model is able to provide consistent and robust spatialization to extended audiences [Moo82]. Another reason for robustness might be found in the fact that simultaneous level and time differences are applied to the loudspeakers. This has the effect of increasing the lateral displacement [Bla83] even for virtual sound sources such that the rays to different loudspeakers have similar lengths. Indeed, the localization of the sound source becomes even sharper if the level control is driven by laws that roll off more rapidly than the physical 1/d law of spherical waves.
An added benefit of the room within a room model is that the Doppler effect is intrinsically implemented. As the virtual sound source is moved in the outer room, the delay lines representing the virtual rays change their lengths, thus producing the correct pitch shifts.
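The per-loudspeaker delays (6.29) and gains (6.30) follow directly from the source-to-hole geometry, and a moving source changes the delays, which is exactly where the Doppler shift comes from. A small Python sketch with a hypothetical four-hole layout:

```python
import math

C = 340.0  # speed of sound in air, m/s

def feeds(src, holes):
    """Delay (6.29) and gain (6.30) of each loudspeaker/hole feed."""
    out = []
    for hx, hy in holes:
        l = math.hypot(src[0] - hx, src[1] - hy)   # path length in meters
        out.append((l / C, 1.0 / max(1.0, l)))     # (delay [s], gain)
    return out

# Hypothetical layout: four holes on a line, source approaching hole 0.
holes = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0)]
before = feeds((0.0, 10.0), holes)
after = feeds((0.0, 9.0), holes)       # source 1 m closer to hole 0
# The shrinking delay toward hole 0 corresponds to an upward Doppler
# shift on that feed; gains never exceed unity, per (6.30).
print(before[0][0] - after[0][0])      # delay decrease, ~1/340 s
print([gain for _, gain in before])
```

A time-varying fractional delay line per loudspeaker implements the same computation sample by sample, producing the pitch shifts described above.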
It is true that different transpositions might affect different loudspeakers, as the variations are different for different rays, but this is consistent with the physical robustness of the technique. For instance, if in Fig. 6.22 the source is moving downwards starting from the middle of the room, loudspeaker 2 will produce a downward transposition, and loudspeaker 4 will produce an upward transposition. Accordingly, the listeners located close to one of the loudspeakers will perceive the transposition that is most consistent with their position with respect to the virtual sound source.
The model of the room within a room works fine if the movements of the sound source are confined to a virtual space external to the inner room. This corresponds to an enlargement of the actual listening space and it is often a highly desirable situation. Moreover, it is natural to try to model the physical properties of the outer room, adding reflections at the walls and increasing the number of rays going from a sound source to the loudspeakers. This configuration, illustrated in Fig. 6.21 with first-order reflections, is a step from spatialization to reverberation and will be further discussed in section 6.5.

Music Applications and Control

Sound spatialization by means of loudspeakers has accompanied the history of electro-acoustic music in the second half of the twentieth century. An early spatialization system, the potentiomètre d'espace, was used by J. Poullin to project Schaeffer's sounds of musique concrète into space [Pou57]. It is interesting that,
even in the fifties, the limitations of reproduction by loudspeakers, such as the inevitability of the sweet spot, and the difficult interplay between the actual listening room and the simulated space, were very clear [DiS98].
In section 6.2.5 we talked about spatialization in the music of Stockhausen since his influential work "Gesang der Jünglinge" [m-Sto56]. In the early realizations, the loudspeakers were treated as actual sources, and trajectories were programmed by detailed control of the relative gains of multiple audio channels. The idea of having virtual sound sources that are detached from the actual loudspeaker positions became popular after the works of John Chowning in the USA, and Hans Peter Haller in Europe. The former proposed a fully-computerized system for designing spatial trajectories. The latter designed an analog device, called a Halaphon, that allowed control of rotational trajectories in real time, by means of sliding potentiometers. The Halaphon was used in many important compositions, for instance, early versions of "Répons" by Boulez [m-Bou84], or "Prometeo" [m-Non82] by Nono. Indeed, that device is especially important because it helped establish a practice of real-time performance by means of electronic devices (also called Live Electronics) [Hal95].
The layout of Fig. 6.22 was used in live electro-acoustic music performance to simulate sound wavefronts going back and forth above the audience [m-Pis95]. In this case, the Doppler shift affecting noisy sounds is a desirable side-effect of the spatialization system that magnifies the direction and velocity of movements. A two-dimensional implementation of the Moore spatialization model was used in a piece by Cifariello Ciardi [m-Cif95], where the main electro-acoustic gesture is that of the rolling metallic ball of a pinball being launched.

6.5 Reverberation

6.5.1 Acoustic and Perceptual Foundations

In the previous sections, our main concern has been the apparent positions of sound sources.
The effects of the surrounding environment have been confined to changes in the perceived distance and position of the apparent source. In this section, we focus on sound reverberation as a natural phenomenon occurring when sound waves propagate in an enclosed space. Reverberation brings information about the nature and texture of materials, and about the size and shape of the room and of the objects inside it.

In order to analyze the various acoustical aspects of reverberation, we will consider a rectangular room containing an omni-directional point source. Other, more complicated and realistic situations can be considered to a large extent as a generalization of this simple case.

Point Source in an Enclosed Space

Consider the pressure wave generated by a small sphere pulsing at the radian frequency ω. The corresponding wave number is defined as k = ω/c. The particle
velocity of air is radial, in phase with the pressure for large distances r (r >> 1/k) and in phase quadrature with the pressure for small distances (r << 1/k) [MI86]. At a long distance from the source (far field), we approximate the condition of propagation of plane waves. On the other hand, near the source (near field), there is a large velocity component that, being in quadrature with pressure, is responsible for some reactive energy that does not radiate. A simplified analysis may conclude that, as soon as the sound waves have traveled a sufficiently long path from the source, we can consider them as plane waves, thus leaving the more complicated formalism of spherical propagation. We will see that the plane wave description can be easily interpreted in the frequency domain, and it allows a straightforward analysis of steady-state responses. In the proximity of the excitation event, we need descriptions based on spherical waves. Such descriptions are better carried out in the time domain.

Later on, we will introduce a third level of description, useful to account for "fine grain" phenomena that are not easily described by the two other approaches. Namely, we will introduce a particle description of sound, which we will take advantage of when describing the diffusion due to rough walls.

Frequency Domain: Normal Modes

The acoustics of an enclosed space can be described as a superposition of normal modes, which are standing waves that can be set up in the gas filling the box. Such a description can be derived analytically for simple shapes, such as the rectangular one, and it represents a powerful tool for understanding the behavior of the room in the frequency domain.

It is easier to calculate the normal modes under the assumption that the sound waves are propagating in the far field and after the end of the initial transient. This assumption allows consideration of all the wavefronts as planes.
The normal modes of a rectangular room having sizes [l_x, l_y, l_z] can be derived by forcing a plane-wave solution into the 3D wave equation and setting the particle velocity to zero at the boundary [MI86]. As a result, we get the frequencies of the normal modes

f_n = \frac{c}{2} \sqrt{ (n_x/l_x)^2 + (n_y/l_y)^2 + (n_z/l_z)^2 } , (6.31)

where

n = (n_x, n_y, n_z) (6.32)

is a triplet of non-negative integer numbers characterizing the normal mode. Each normal mode is supported by a plane wave propagating in the direction represented by the vector

u_n = (n_x/l_x, n_y/l_y, n_z/l_z) . (6.33)
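As a quick illustration of (6.31), the lowest modal frequencies of a hypothetical rectangular room can be tabulated as follows; the room sizes and the value of the speed of sound are example choices, not taken from the text:

```python
import math

C = 343.0  # speed of sound in air, m/s (example value)

def mode_frequency(n, room):
    """Frequency of the normal mode with triplet n = (nx, ny, nz)
    in a rectangular room of sizes room = (lx, ly, lz), after (6.31)."""
    return (C / 2.0) * math.sqrt(sum((ni / li) ** 2 for ni, li in zip(n, room)))

room = (4.0, 5.0, 3.0)  # lx, ly, lz in meters (hypothetical room)
# Enumerate the lowest triplets and sort the resulting modes by frequency
triplets = [(nx, ny, nz) for nx in range(3) for ny in range(3) for nz in range(3)]
modes = sorted((mode_frequency(n, room), n) for n in triplets if any(n))
for f, n in modes[:5]:
    print(n, round(f, 1), "Hz")
```

The lowest modes come out as axial modes along the longest dimension, as expected from (6.31), since the largest l_i gives the smallest n_i/l_i.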
It is clear from (6.31) and (6.33) that a direction is associated with the modes having frequencies that are multiples of the same fundamental. In other words, all the triplets that are multiples of an irreducible triplet are associated with a harmonic series of resonant frequencies with the same direction in space. This property suggests that a harmonic series of normal frequencies can be reproduced by means of a comb filter, i.e. by means of a feedback delay line whose length is

d = \frac{1}{f_0} , (6.34)

where f_0 is the fundamental of the harmonic series.

The collection of a normal mode and all its multiples can be thought of as a plane wave bouncing back and forth in the enclosure. In order for an enclosure having finite extent to support an infinite plane wavefront, it is necessary to bend the wavefront at the walls in such a way that it forms a closed constant-area surface. This interpretation can be visualized in two dimensions by means of plane wave loops having constant length (see Fig. 6.23). In this representation it is easy to verify that the time interval d is the time lag between two consecutive collisions of plane wavefronts along the diagonal.

Figure 6.23 Plane wave loops for the normal mode n_x = 1, n_y = 1.

Not all the modes are excited with the same intensity in all positions. For instance, in the central point of a rectangular box, only one-eighth of the modes get excited, i.e. only those modes having all even components in the triplet n, as the other modes have a nodal point in the middle. The most intense excitation is obtained at the corners of the room.

Time Domain: Acoustic Rays

Another kind of description of room acoustics considers the propagation of acoustic rays, i.e. portions of spherical waves having small aperture.
This is the approach taken in geometrical acoustics, which is analogous to geometrical optics for light waves.

Some basic assumptions have to be made in order to ensure the validity of the description by geometric rays. The first assumption is that the wavelengths involved in the propagation are much smaller than the finest geometric detail of the room.
The second assumption is that we can ignore all the phenomena of diffraction and interference. The first assumption is not very restrictive, since acoustic rays are used to represent events on a short time-scale, i.e. in a very large frequency range, and diffraction is also negligible at high frequencies. The absence of interference is also verified in most practical cases, where rays are mutually incoherent.

Along an acoustic ray, the pressure decreases as the reciprocal of distance, since the energy conveyed by a spherical wavefront has to be conserved during propagation, and the area occupied by the front increases as the square of the distance from the source.

When a ray hits a surface, it is reflected in a way that depends on the nature of the surface. Smooth walls produce mirror reflections, i.e. the angle θ formed by the incoming ray with the normal to the surface is equal to the angle formed by the outgoing ray with the same normal. Moreover, the incoming ray, the outgoing ray, and the normal all lie on the same plane.

Usually there is some filtering associated with a reflection, due to the absorbing properties of the wall material. In general, there is a frequency-dependent attenuation and a non-constant time delay. Such filtering can be represented by a complex reflection function R relating the outgoing and incoming pressure wavefronts. R depends on the angle θ according to the formula [Kut91]

R = \frac{Z \cos\theta - 1}{Z \cos\theta + 1} , (6.35)

where Z is the characteristic impedance of the wall. If the wall is smooth and rigid, Z approaches infinity. In this case we say that the reflection is perfect, and the reflection function R is unitary at all frequencies and for any angle.

There are two popular approaches to computing acoustic rays for room acoustics: ray tracing and the image method.
Ray tracing has been very popular in computer graphics since the eighties as a technique for producing realistic images. In graphics, an image is constructed by tracing light rays back from the eye (or the focal plane) to the light sources. In audio, an acoustic image is obtained by tracing acoustic rays back from the ear (or the microphone points) to the sound sources. In both cases, the scene is traversed by the rays and all the reflections are correctly reproduced. An important difference between sounds and images is that, while in graphics the light can be considered as propagated to all the points of the scene within one frame, in audio the sample rate is much higher and the speed of propagation is much smaller. As a consequence, each ray has to be associated with a delay line, thus adding memory occupation and computational complexity to a method that is already quite time-consuming.

The image method [AB79] is trivial when explained by means of a 2-D medium (e.g. a rectangular membrane). In Fig. 6.24 a source S is reflected symmetrically at the boundaries, and the source images are reflected about the images of the boundaries. The resulting tessellation of the plane is such that every segment connecting an image of the source with a receiving point R corresponds to one unique path connecting the source to the receiver after a certain number of specular reflections.
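The 2-D image construction can be sketched as follows; the function names, positions, and room sizes are our own example choices, reflection losses are ignored, and the image coordinates ±x_s + 2p·l_x follow the mirroring rule of [AB79]:

```python
import math

def image_sources(src, room, order):
    """2-D image-source positions for a rectangular room, after [AB79].
    src = (xs, ys), room = (lx, ly); mirror the source about each wall
    up to `order` room periods along each axis."""
    xs, ys = src
    lx, ly = room
    images = []
    for p in range(-order, order + 1):
        for q in range(-order, order + 1):
            for sx in (+1, -1):
                for sy in (+1, -1):
                    images.append((sx * xs + 2 * p * lx, sy * ys + 2 * q * ly))
    return images

def echo_delays(src, rcv, room, order, c=343.0):
    """Delays (seconds) of the echoes reaching the receiver rcv: one
    straight segment from each image source."""
    return sorted(math.hypot(ix - rcv[0], iy - rcv[1]) / c
                  for ix, iy in image_sources(src, room, order))

delays = echo_delays(src=(1.0, 1.0), rcv=(3.0, 2.0), room=(4.0, 5.0), order=1)
print(len(delays), "echoes, first arrival at", round(delays[0] * 1000, 2), "ms")
```

The first (shortest) delay is the direct path, since the true source itself appears in the image set at p = q = 0.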
This method is easily extended to 3-D and, with many complications and an increase in complexity, to rooms described as arbitrary polyhedra [Bor84].

Both the ray tracing technique and the image method are mainly used by architects in acoustic Computer Aided Design, where there is no special concern for real-time performance. On the other hand, in these applications the accuracy of the results is particularly important in predicting the acoustic quality of a hall before it is constructed.

Figure 6.24 The image method in 2-D.

Sounds as Particles: Diffusion

Up to this point we have discussed room acoustics at the level of macroscopic phenomena, i.e. rays and normal modes. If the target is complexity (or realism) of the final result, we need to consider also the microscopic phenomena occurring in a physical environment. The most important of these second-order phenomena is the diffusion of sound waves due to the fine-grain geometry of the enclosure. Diffusion is better described using a particle representation of sound. In computer graphics a similar representation is used for modeling surface light diffusion and shadows. In order to adapt the same tools to acoustics, we need to keep the assumption of broadband signals. Moreover, we are interested in local interactions with surfaces having geometric irregularities (roughness) that are comparable with the wavelengths we are dealing with. Still, we need to assume that the acoustic particles are mutually incoherent, in order to avoid cancellations between particles. Once these hypotheses are assumed to be valid, it is natural to consider a sound particle as any brief acoustic signal having a broad frequency extension.

The diffusion model that is commonly used in computer graphics [CW93] can be readily adapted to the acoustic domain [Roc96], so that we can use the same symbols and terminology.
In particular, diffusion at a certain point of the enclosure is described by a Bidirectional Reflection Distribution Function (BRDF) f_r(ψ_i, ψ_o), which is the ratio between the radiance reflected along direction ψ_o and the irradiance incident along direction ψ_i.

The diffusion is said to be Lambertian when the BRDF is constant. In this case the sound particles are diffused in any direction with the same probability,
6.5 Reverberation 175regardless of the incident direction. Vice versa, if all the sound particles arrivingfrom direction $Ji are redirected to thespecular direction $, = -$Ji we have a mirrorreflection, i.e. no diffusion at all. Actual walls are always somewhere in between amirror reflector and a Lambertian diffusor.Quantities such as those defined in this section have been useful in improvingthe accuracy of estimates of the reverberation timeof real rooms [Kut95].Objective and Subjective Attributes of Room AcousticsA very short soundpulse, such as a gun shot, generatesa reverberating tail that hasin most cases an exponentially decaying shape.The fundamental objective attributeof roomacoustics is the reverberation time, defined by W.C. Sabine as the timeinterval in which the reverberation level drops down by 60 dB.Extensive experimentation by computer music researchers showed that a verygood, if not the best, impulseresponse for reverberation is obtained by generat-ingGaussianwhite noise and enveloping itwithanexponential decay [Moo79].The actual reverberated sound can be obtained by convolution, even though thisoperation is computationally expensive. Any frequency-dependent attenuation canbe introduced by pre-filtering the impulse response. A finer control of the qualityof reverberation can be exerted by modification of “perceptual factors”, which aresubjective attributes that have been extracted from psychophysical experimenta-tion.Research in this field proceeds by extracting scales of subjectiveacousticalexperience and by correlating them with physical parameters. Most of the studiesconducted up to 1992 were summarized in [Ber92].Research at IRCAM in Paris [Ju195]tried to provide a minimal set of independ-ent parameters that give an exhaustive characterization of room acoustic quality.These parameters are divided into three categories [Jot99]:1. 
Source perception (related to the spectrum and relative energy of direct sound and early reflections):
   - presence (ratio between direct sound and early reverberated energy)
   - brilliance (high-frequency variation of early reverberated energy)
   - warmth (low-frequency variation of early reverberated energy)

2. Source/Room interaction (related to the relative energies of direct sound, early and late reflections, and to the early decay time):
   - envelopment (energy of early reflections relative to direct sound)
   - room presence (energy of late reverberated sound)
   - running reverberance (early decay time of the room impulse response)

3. Room perception (related to the late decay time and to its frequency variations):
   - late reverberance (late decay time of the room impulse response)
   - heaviness (low-frequency variation of decay time)
   - liveness (high-frequency variation of decay time).

In a model that provides these parameters as knobs in a "perceptual interface", the "presence" parameter can be controlled to give an impression of distance of the sound source, in a way similar to what we showed in section 6.2.3, and the "envelopment" parameter can be adjusted to control the impression of being surrounded by sound. Some of the parameters are perceived as timbral colorations while the source is playing (running reverberance), while some others are perceived from the sound tails left in pauses between sound events (late reverberance). Griesinger [Gri99] gives a definition of these two quantities that is based on slicing the room impulse response into segments of 160 ms each. Running reverberance, "the amount of self support one hears while playing", is defined as the ratio between the energies in the second and first segments of the impulse response, and it is a measure of reverberant loudness that is independent of reverberation time. A room impulse response with a reverberation time of 0.5 s can sound just as loud (in terms of running reverberance) as a room with a reverberation time of 2 s [Gri94]. The running reverberance is an important parameter of room acoustics and its preferred value depends on the instrument and on the music genre [Gri94], being low for speech and relatively high for orchestral romantic music. Therefore, in artificial reverberation it is useful to have a knob that allows adjustment of the running reverberance to fit the music that is being played.
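Griesinger's two-slice definition can be sketched on a synthetic impulse response (exponentially decaying Gaussian noise, in the spirit of [Moo79]); the sampling rate, decay time, and seed are arbitrary example values:

```python
import math
import random

def running_reverberance_db(h, fs, slice_ms=160.0):
    """Ratio (in dB) between the energies of the second and first
    160 ms segments of the impulse response h, after [Gri99]."""
    n = int(fs * slice_ms / 1000.0)
    e1 = sum(x * x for x in h[:n])
    e2 = sum(x * x for x in h[n:2 * n])
    return 10.0 * math.log10(e2 / e1)

fs = 8000       # example sampling rate
t60 = 1.0       # decay time: 60 dB drop after 1 s
rng = random.Random(0)
# Gaussian white noise under an exponential envelope 10**(-3 n / (fs*t60))
h = [rng.gauss(0.0, 1.0) * 10.0 ** (-3.0 * i / (fs * t60)) for i in range(fs)]
print(round(running_reverberance_db(h, fs), 1), "dB")
```

For a pure exponential envelope the measure depends only on the decay rate: with t60 = 1 s the expected value is 10 log10(10^(-6·0.16)) = -9.6 dB, so the noisy estimate should land close to that.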
Most of the time, the goal is to make reverberation audible but not overwhelming.

Envelopment is a controversial attribute that is difficult to formalize but is known to be a key component of a pleasant source/room combination. The most recent investigations tend to relate envelopment to fluctuations in ITD and IID due to the spatial perception of early reflections. Therefore, to appreciate envelopment the early reflections should be appropriately spatialized. Blauert and Lindemann [BL86] showed that lateral reflections below 3 kHz contribute to a sense of depth, while higher frequency components contribute to a sense of surround. However, since lateral reflections produce ITD fluctuations and ITD is direction and frequency dependent, the most effective angle for a lateral reflection depends on frequency. Low frequency reflections produce the largest fluctuations if they come from lateral directions that are perpendicular to the median plane, while higher frequencies are more effective if they are closer to the median plane [Gri97]. Griesinger [Gri97, Gri99] developed a theory of spatial impression (another term meaning envelopment) that puts this multifaceted attribute of room acoustics into a cognitive framework where it is related to other phenomena such as the precedence effect. Namely, there is a "continuous spatial impression" that is perceived when lateral reflections are added to the perception of continuous sounds. Abrupt changes, which are always accompanied by the precedence effect, give rise to an "early spatial impression" by means of lateral reflections coming within 50 ms of the sound event, and to a "background spatial impression" by means of spatialized late reverberation. This latter form of spatial impression is considered to be the most important for the overall perception
6.5 Reverberation 177of envelopment, as it is perceptually segregated from the streams of sound eventsthat form a sonic foreground. Envelopment is considered to be a desirable feature oflarge rooms, andis absent in small rooms.However, sometimes envelopment is con-fused with the apparentsource width, which is also an attributeof the source/roomcombination, but this can be high even in small rooms. The apparent source widthwill be further discussed in section 6.6.1.6.5.2 Classic Reverberation ToolsIn the second half of the twentieth century,several engineers and acousticians triedto inventelectronic devices capable of simulating the long-term effects of soundpropagation in enclosures. The most important pioneering work in the field of ar-tificial reverberation has been that of Manfred Schroeder at the Bell Laboratoriesin the earlysixties [SchGl, Sch62, Sch70, Sch73, SL61]. Schroederintroduced therecursive comb filtersand thedelay-based allpass filtersas computational structuressuitable for the inexpensive simulation of complex patterns of echoes. In particular,the allpass filter based on the recursive delay line has the formy(n) = - g . x(.) +z(n- m) +g . y(n - m) , (6.36)where m is the length of the delay in samples. The filter structure is depicted inFig. 6.25, where A ( z ) is usually replaced by a delay line. This filter allows a denseimpulseresponse and a flatfrequencyresponse to beobtained. Such a structurerapidly became a standardcomponent used in almost allthe artificial reverberatorsdesigned until nowadays [Moo79].It is usually assumedthat theallpass filtersdo notintroduce coloration in the input sound. However, this assumption is valid from aperceptual viewpoint only if the delay line is much shorter than theintegration timeof the ear, i.e. about 50 ms [ZF90]. 
If this is not the case, the time-domain effects become much more relevant and the timbre of the incoming signal is significantly affected.

Figure 6.25 The allpass filter structure.

In the seventies, Michael Gerzon generalized the single-input single-output allpass filter to a multi-input multi-output structure, where the delay line of m samples
has been replaced by an order-N unitary network [Ger76]. Examples of trivial unitary networks are orthogonal matrices, and parallel connections of delay lines or allpass filters. The idea behind this generalization is that of increasing the complexity of the impulse response without introducing appreciable coloration in frequency. According to Gerzon's generalization, allpass filters can be nested within allpass structures, in a telescopic fashion. Such embedding is shown to be equivalent to lattice allpass structures [Gar98b], and it is realizable as long as there is at least one delay element in the block A(z) of Fig. 6.25.

An extensive experimentation on structures for artificial reverberation was conducted by Andy Moorer in the late seventies [Moo79]. He extended the work done by Schroeder [Sch70] in relating some basic computational structures (e.g., tapped delay lines, comb and allpass filters) with the physical behavior of actual rooms. In particular, it was noticed that the early reflections have great importance in the perception of the acoustic space, and that a direct-form FIR filter can reproduce these early reflections explicitly and accurately. Usually this FIR filter is implemented as a tapped delay line, i.e. a delay line with multiple reading points that are weighted and summed together to provide a single output. This output signal feeds, in Moorer's architecture, a series of allpass filters and a parallel group of comb filters.
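The recursion (6.36), the building block of these structures, can be sketched sample by sample; the delay length and gain below are example values:

```python
def schroeder_allpass(x, m, g):
    """Delay-based allpass y(n) = -g*x(n) + x(n-m) + g*y(n-m), after (6.36)."""
    y = []
    for n, xn in enumerate(x):
        x_del = x[n - m] if n >= m else 0.0   # x(n-m)
        y_del = y[n - m] if n >= m else 0.0   # y(n-m)
        y.append(-g * xn + x_del + g * y_del)
    return y

# Impulse response: -g at lag 0, then (1 - g*g) * g**k at lags (k+1)*m,
# i.e. a dense geometric tail despite the flat magnitude response.
h = schroeder_allpass([1.0] + [0.0] * 15, m=4, g=0.7)
print([round(v, 3) for v in h])
```

Note how the direct-path term -g cancels the coloration that the recursive echoes alone would introduce; this is what distinguishes (6.36) from the plain comb recursion.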
Another improvement introduced by Moorer was the replacement of the simple gain of feedback delay lines in comb filters with lowpass filters resembling the effects of air absorption and lossy reflections.

An original approach to reverberation was taken by Julius Smith in 1985, when he proposed the Digital Waveguide Networks (DWN) as a viable starting point for the design of numerical reverberators [Smi85]. The idea of waveguide reverberators is that of building a network of waveguide branches (i.e., bidirectional delay lines simulating wave propagation in a duct or a string) capable of producing the desired early reflections and a diffuse, sufficiently dense reverb. If the network is augmented with lowpass filters, it is possible to shape the decay time with frequency. In other words, waveguide reverberators are built in two steps: the first step is the construction of a prototype lossless network, the second step is the introduction of the desired amount of losses. This procedure ensures good numerical properties and a good control over stability [Smi86, Vai93]. In ideal terms, the quality of a prototype lossless reverberator is evaluated with respect to the whiteness and smoothness of the noise that is generated as the response to an impulse. The fine control of decay time at different frequencies is decoupled from the structural aspects of the reverberator.

Among the classic reverberation tools we should also mention the structures proposed by Stautner and Puckette [SP82], and by Jot [Jot92]. These structures form the basis of the Feedback Delay Networks, which will be discussed in greater detail in section 6.5.3.

Clusters of Comb/Allpass Filters

The construction of high-quality reverberators is half an art and half a science. Several structures and many parameterizations were proposed in the past, especially in non-disclosed form within commercial reverb units [Dat97]. In most cases,
Figure 6.26 Moorer's reverberator.

the various structures are combinations of comb and allpass elementary blocks, as suggested by Schroeder in his early works. As an example, we briefly describe Moorer's preferred structure [Moo79], depicted in Fig. 6.26. Block (a) of Moorer's reverb takes care of the early reflections by means of a tapped delay line. The resulting signal is forwarded to block (b), which is the parallel of a direct path on one branch, and a delayed, attenuated diffuse reverberator on the other branch. The output of the reverberator is delayed in such a way that the last of the early echoes coming out of block (a) reaches the output before the first of the non-null samples coming out of the diffuse reverberator. In Moorer's preferred implementation, the reverberator of block (b) is best implemented as a parallel group of six comb filters, each with a first-order lowpass filter in the loop, and a single allpass filter. In [Moo79], it is suggested setting the allpass delay length to 6 ms and the allpass coefficient to 0.7. Despite the fact that an allpass filter does not add coloration in the magnitude frequency response, its time response can give a metallic character to the sound, or add some unwanted roughness and granularity. The feedback attenuation coefficients and the lowpass filters of the comb filters can be tuned to resemble a realistic and smooth decay. In particular, the attenuation
coefficients g_i determine the overall decay time of the series of echoes generated by each comb filter. If the desired decay time (usually defined for an attenuation level of 60 dB) is T_d, the gain of each comb filter has to be set to

g_i = 10^{-3 m_i / (f_s T_d)} , (6.37)

where f_s is the sampling rate and m_i is the delay length in samples. Further attenuation at high frequencies is provided by the feedback lowpass filters, whose coefficient can also be related to the decay time at a specific frequency, or fine-tuned by direct experimentation. In [Moo79], an example set of feedback attenuation and allpass coefficients is provided, together with some suggested values for the delay lengths of the comb filters. As a general rule, they should be distributed over a ratio 1:1.5 between 50 and 80 ms. Schroeder suggested a number-theoretic criterion for a more precise choice of the delay lengths [Sch73]: the lengths in samples should be mutually coprime (or incommensurate) to reduce the superimposition of echoes in the impulse response, thus reducing the so-called flutter echoes. This same criterion might be applied to the distances between each echo and the direct sound in early reflections. However, as was noticed by Moorer [Moo79], the results are usually better if the taps are positioned according to the reflections computed by means of some geometric modeling technique, such as the image method. As will be explained in section 6.5.3, even the lengths of the recirculating delays can be computed from the geometric analysis of the normal modes of actual room shapes.

6.5.3 Feedback Delay Networks

In 1982, J. Stautner and M. Puckette [SP82] introduced a structure for artificial reverberation based on delay lines interconnected in a feedback loop by means of a matrix (see Fig. 6.27). More recently, structures such as this have been called Feedback Delay Networks (FDN). The Stautner and Puckette FDN was obtained as a vector generalization of the recursive comb filter

y(n) = x(n - m) + g \,
y(n - m) , (6.38)

where the m-sample delay line was replaced by a set of delay lines of different lengths, and the feedback gain g was replaced by a feedback matrix G. Stautner and Puckette proposed the following feedback matrix:

G = \frac{g}{\sqrt{2}} \begin{bmatrix} 0 & 1 & 1 & 0 \\ -1 & 0 & 0 & -1 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \end{bmatrix} . (6.39)

Due to its sparse special structure, G requires only one multiply per output channel. More recently, Jean-Marc Jot has investigated the possibilities of FDNs very thoroughly. He proposed using some classes of unitary matrices allowing efficient implementation. Moreover, he showed how to control the positions of the poles of
Figure 6.27 Fourth-order feedback delay network.

the structure in order to impose a desired decay time at various frequencies [Jot92]. His considerations were driven by perceptual criteria and the general goal is to obtain an ideal diffuse reverb. In this context, Jot introduced the important design criterion that all the modes of a frequency neighborhood should decay at the same rate, in order to avoid the persistence of isolated, ringing resonances in the tail of the reverb [JC91]. This is not what happens in real rooms though, where different modes of close resonance frequencies can be differently affected by wall absorption [Mor91]. However, it is generally believed that the slow variation of decay rates with frequency produces smooth and pleasant impulse responses.

General Structure

Referring to Fig. 6.27, an FDN is built starting from N delay lines, each being τ_i = m_i T_s seconds long, where T_s = 1/f_s is the sampling interval. The FDN is
completely described by the following equations:

s_i(n + m_i) = \sum_{j=1}^{N} a_{i,j} s_j(n) + b_i x(n) ,
y(n) = \sum_{i=1}^{N} c_i s_i(n) + d \, x(n) , (6.40)

where s_i(n), 1 ≤ i ≤ N, are the delay outputs at the n-th time sample. If m_i = 1 for every i, we obtain the well-known state space description of a discrete-time linear system [Kai80]. In the case of FDNs, the m_i are typically numbers of the order of hundreds or thousands, and the variables s_i(n) are only a small subset of the system state at time n, the whole state being represented by the content of all the delay lines.

From the state-variable description of the FDN, it is possible to find the system transfer function [Roc96, RS97] as

H(z) = c^T [D(z^{-1}) - A]^{-1} b + d . (6.41)

The diagonal matrix D(z) = diag(z^{-m_1}, z^{-m_2}, ..., z^{-m_N}) is called the delay matrix, and A = [a_{i,j}]_{N×N} is called the feedback matrix.

The stability properties of an FDN are all ascribed to the feedback matrix. The fact that ||A^n|| decays exponentially with n ensures that the whole structure is stable [Roc96, RS97].

The poles of the FDN are found as the solutions of

\det[A - D(z^{-1})] = 0 . (6.42)

In order to have all the poles on the unit circle, it is sufficient to choose a unitary matrix. This choice leads to the construction of a lossless prototype, but it is not the only choice allowed.

The zeros of the transfer function can also be found [Roc96, RS97] as the solutions of

\det[A - \frac{1}{d} b c^T - D(z^{-1})] = 0 . (6.43)

In practice, once we have constructed a lossless FDN prototype, we must insert attenuation coefficients and filters in the feedback loop. For instance, following the indications of Jot [JC91], we can cascade every delay line with a gain

g_i = \alpha^{m_i} . (6.44)

This corresponds to replacing D(z) with D(z/α) in (6.41). With this choice of the attenuation coefficients, all the poles are contracted by the same factor α. As a
consequence, all the modes decay with the same rate, and the reverberation time (defined for a level attenuation of 60 dB) is given by

T_d = \frac{-3 T_s}{\log_{10} \alpha} . (6.45)

In order to have a faster decay at higher frequencies, as happens in real enclosures, we must cascade the delay lines with lowpass filters. If the attenuation coefficients g_i are replaced by lowpass filters, we can still get a local smoothness of decay times at various frequencies by satisfying the condition (6.44), where g_i and α have been made frequency dependent:

G_i(z) = A^{m_i}(z) , (6.46)

where A(z) can be interpreted as per-sample filtering [JS83, JC91, Smi92].

It is important to note that a uniform decay of neighboring modes, even though commonly desired in artificial reverberation, is not found in real enclosures. The normal modes of a room are associated with stationary waves, whose absorption depends on the spatial directions taken by these waves. For instance, in a rectangular enclosure, axial waves are absorbed less than oblique waves [Mor91]. Therefore, neighboring modes associated with different directions can have different reverberation times. Actually, for commonly found rooms having irregularities in the geometry and in the materials, the response is close to that of a room having diffusive walls, where the energy rapidly spreads among the different modes. In these cases, we can find that the decay time is quite uniform among the modes [Kut95].

Parameterization

The main questions arising once we have established a computational structure called FDN are: What are the numbers that can be put in place of the many coefficients of the structure? How should these numbers be chosen?

The most delicate part of the structure is the feedback matrix. In fact, it governs the stability of the whole structure. In particular, it is desirable to start with a lossless prototype, i.e. a reference structure providing an endless, flat decay.
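A minimal FDN sketch along the lines of (6.40), (6.44), and (6.45), using the Stautner-Puckette matrix of (6.39) as the lossless prototype; the delay lengths, decay time, and input/output gains are our own example choices, and the attenuation is applied at the delay-line outputs, which is equivalent to the D(z/α) substitution:

```python
import math

def fdn_impulse(fs=44100, t60=2.0, nsamp=8000):
    """Impulse response of a 4th-order FDN: the feedback matrix of (6.39)
    scaled to be orthogonal (lossless), with each delay output attenuated
    by gi = alpha**mi as in (6.44)."""
    A = [[0, 1, 1, 0], [-1, 0, 0, -1], [1, 0, 0, -1], [0, 1, -1, 0]]
    scale = 1.0 / math.sqrt(2.0)                # makes A orthogonal
    m = [887, 1031, 1193, 1327]                 # mutually coprime delay lengths
    alpha = 10.0 ** (-3.0 / (fs * t60))         # from Td = -3*Ts/log10(alpha), (6.45)
    g = [alpha ** mi for mi in m]               # per-line attenuation, (6.44)
    lines = [[0.0] * mi for mi in m]            # circular delay buffers
    b, c = [1.0] * 4, [0.25] * 4                # input and output gains
    y = []
    for n in range(nsamp):
        x = 1.0 if n == 0 else 0.0
        s = [g[i] * lines[i][n % m[i]] for i in range(4)]   # attenuated outputs
        y.append(sum(ci * si for ci, si in zip(c, s)))
        for i in range(4):                      # feedback through A, plus input
            fb = scale * sum(A[i][j] * s[j] for j in range(4))
            lines[i][n % m[i]] = fb + b[i] * x
    return y

h = fdn_impulse(fs=8000, t60=0.5, nsamp=8000)
```

With g_i = 1 the structure would ring forever (the lossless prototype); with the gains of (6.44), every mode decays at the same rate, as in Jot's design criterion.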
The reader interested in general matrix classes that might work as prototypes is referred to the literature [Jot92, RS97, Roc97, Gar98b]. Here we only mention the class of circulant matrices, having the general form

A = \begin{bmatrix} a_0 & a_1 & \cdots & a_{N-1} \\ a_{N-1} & a_0 & \cdots & a_{N-2} \\ \vdots & & \ddots & \vdots \\ a_1 & a_2 & \cdots & a_0 \end{bmatrix} .

A matrix such as this is used in the Csound babo opcode. The stability of an FDN is related to the magnitude of its eigenvalues, which, in the case of a circulant matrix, can be computed as the Discrete Fourier Transform of the first row. By keeping
these eigenvalues on the unit circle (i.e., magnitude one) we ensure that the whole structure is stable and lossless. The control over the angle of the eigenvalues can be translated into a direct control over the degree of diffusion of the enclosure that is being simulated by the FDN. The limiting cases are the diagonal matrix, corresponding to perfectly reflecting walls, and the matrix whose rows are sequences of equal-magnitude numbers and (pseudo-)randomly distributed signs [Roc97].

Another critical set of parameters is given by the lengths of the delay lines. Several authors suggested the use of sample lengths that are mutually coprime numbers, in order to minimize the collision of echoes in the impulse response. However, if the FDN is linked to a physical and geometrical interpretation, as is done in the Ball-within-the-Box (BaBo) model [Roc95], the delay lengths are derived from the geometry of the room being simulated, and the resulting digital reverb quality is related to the quality of the actual room. How such a derivation of delay lengths is actually performed is understandable from Fig. 6.23. A delay line will be associated with a harmonic series of normal modes, all obtainable from a plane wave loop that bounces back and forth within the enclosure. The delay length for the particular series of normal modes represented in Fig. 6.23 is given by the time interval between two consecutive collisions of the plane wavefront along the main diagonal, i.e. twice the time taken to travel the distance

D = \frac{c}{2 f_0} , (6.47)

where f_0 is the fundamental frequency of the harmonic modal series. The extension of the BaBo model to spherical enclosures was presented in [RD01].

6.5.4 Convolution with Room Impulse Responses

If the impulse response of a target room is readily available, the most faithful reverberation method would be to convolve the input signal with such a response. Direct convolution can be done by storing each sample of the impulse response as a coefficient of an FIR filter whose input is the dry signal.
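Such a direct FIR convolution can be sketched as follows; the dry signal and the toy three-tap "room response" are our own examples:

```python
def convolve_direct(x, h):
    """FIR convolution: each sample of the impulse response h is a tap
    weighting a delayed copy of the dry input x."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

dry = [1.0, 0.5, 0.0, -0.25]
ir = [1.0, 0.0, 0.3]   # a toy 3-tap "room response"
print([round(v, 4) for v in convolve_direct(dry, ir)])
# → [1.0, 0.5, 0.3, -0.1, 0.0, -0.075]
```

The cost per output sample is proportional to the length of the impulse response, which is exactly why the block-based FFT methods discussed next become necessary for responses longer than a small fraction of a second.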
Direct convolution easily becomes impractical if the length of the target response exceeds small fractions of a second, as it would translate into several hundreds of taps in the filter structure. One solution is to perform the convolution block by block in the frequency domain: given the Fourier transform of the impulse response, and the Fourier transform of a block of input signal, the two can be multiplied point by point and the result transformed back to the time domain. As this kind of processing is performed on successive blocks of the input signal, the output signal is obtained by overlapping and adding the partial results [OS89]. Thanks to the FFT computation of the discrete Fourier transform, such techniques can be significantly faster. A drawback is that, in order to operate in real time, a block of N samples must be read and then processed while a second block is being read. Therefore, the input-output latency in samples is twice the size of a block, and this is not tolerable in practical real-time environments. The complexity-latency trade-off is illustrated in Fig. 6.28, where the direct-form and the block-processing solutions can be located, together with a third efficient yet
low-latency solution [Gar95, Mül99]. This third realization of convolution is based on a decomposition of the impulse response into increasingly large chunks. The size of each chunk is twice the size of its predecessor, so that the latency of prior computation can be occupied by the computations related to the following impulse-response chunk. Details and discussion on convolution were presented in section 2.2.4.

Figure 6.28 Complexity vs. latency trade-off in convolution.

Even if we have enough computer power to compute convolutions by long impulse responses in real time, there are still serious reasons to prefer reverberation algorithms based on feedback delay networks in many practical contexts. The reasons are similar to those that make a CAD description of a scene preferable to a still picture whenever several views have to be extracted or the environment has to be modified interactively. In fact, it is not easy to modify a room impulse response to reflect some of the room attributes, e.g. its high-frequency absorption, and it is even less obvious how to spatialize the echoes of the impulse response in order to get a proper sense of envelopment. If the impulse response comes from a spatial rendering algorithm, such as ray tracing, these manipulations can be operated at the level of room description, and the coefficients of the room impulse response transmitted to the real-time convolver. In the low-latency block-based implementations, we can even have faster update rates for the smaller early chunks of the impulse response, and slower update rates for the reverberant tail. However, continuous variations of the room impulse response are rendered more easily using a model of reverberation operating on a sample-by-sample basis, such as those of section 6.5.3.

Music Applications and Control

Reverberation has been used as a compositional dimension by some authors.
Again, Chowning gave one of the most important examples in the piece “Turenas”, especially for the use of reverberation as a means to achieve a sense of distance, as explained in section 6.2.3.
Luigi Nono made extensive use of reverberators to extend the possibilities of actual performance halls. For instance, in [m-Non88] sudden changes in reverberation time give the impression of doors that momentarily open the view to larger spaces. Moreover, the timbral features of running reverberance, with an ultra-natural decay time of 30 s, are used to give an intense expressive character to a sustained low note of the tuba.

There are several examples of reverberation by direct convolution with the peculiar impulse response of existing spaces. In the work of some authors, such as Barry Truax or Agostino Di Scipio, reverberation is often a byproduct of granular synthesis that can be somehow controlled by governing the density and distribution of concrete or synthetic sound grains.

Feedback delay networks can be interpreted as models of 3-D resonators rather than strict reverberators. If these resonators are varied in time in their size and geometry, several kinds of interesting filtering effects can be achieved, ranging from irregular comb-filtering of the voice [m-Bat93] to colored drones carrying an intrinsic sense of depth [m-Doa98].

A new interesting perspective on spatial control of audio effects has been opened up by some recent works by Di Scipio, where the amplification chain and the listening room become parts of the composition itself as they concur to determine the overall character of a piece. For instance, in [m-DiS00] the live performance makes use of a few microphones, whose input is compared with the synthetic sound material, and the difference signals are used to control several aspects of the sound transformation process.
In this way, the final result is a combination of the original material, the sound transformation algorithms, the peculiarities and variability of the room/audience combination, and the real-time actions of the performer [DiS98].

6.6 Spatial Enhancements

6.6.1 Stereo Enhancement

In this chapter, we have looked at the problem of reproducing the spatial image of a sound source using different approaches. At one extreme, one can record a couple of sound signals by means of a dummy head and reproduce them by means of headphones or transaural loudspeaker arrangements. This approach has both practical advantages and drawbacks that have previously been explained in sections 6.3.4 and 6.4.5. Moreover, when it is used to collect samples to be played by sound samplers, another severe problem occurs: the waveform loops, commonly used to lengthen a steady sound beyond its stored length, tend to magnify any spatial shift of the sound image, which is perceived as floating cyclically in space. Instead of doing such an “integral sampling”, the spatial information can be introduced in a post-processing stage by implementation of HRTF filters. However, this approach tends to consider sources as being point-like and not too close to the ears. A typical example of an instrumental sound that is difficult to spatialize properly is
the piano. For the pianist, the sound source is perceived as changing its spatial centroid according to the note that is being played (the hammers are displaced along a relatively wide area) and during a single note decay (as different contributions from the soundboard, the frame, and the resonating strings become evident). It is clear that any form of binaural or transaural processing that reproduces this kind of effect would be quite complicated. Fortunately, there are simplified approaches to the rendering of spatial attributes of extended sound sources. The general idea is that of constructing a simple parametric model that captures the spatial sound features of interest. The most interesting feature is the mutual correlation between two audio channels, already defined in (6.9). If the two channels are presented binaurally, a low degree of correlation is usually perceived as an externalization of the sound image, and the fluctuations of the correlation function in time are associated with a spatial impression of the enclosed space. Some authors [Ken95b, KO83] have investigated how the cross-correlation between two audio channels is perceived through a pair of loudspeakers. The peak value R of the function (6.9), computed on windowed portions of the discretized signal, can take values in the range between -1 and +1, where

    R = +1 for identical signals (modulo a time shift),
    R =  0 for uncorrelated signals,                                (6.48)
    R = -1 for signals 180 degrees out of phase (modulo a time shift).

Experiments with noise and natural sources have shown that, in a standard stereo loudspeaker layout, changes in the degree of correlation R give rise to a combination of changes in apparent source width and distance, as depicted in Fig. 6.29. This picture illustrates a qualitative behavior; the actual width and distance of sound sources will depend on their own sonic nature, as well as on the kind of processing that leads to a specified value of R.
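The peak R can be measured directly on two discretized channels; a sketch in NumPy, where the normalization by the channel energies is our assumption about the definition in (6.9), and `correlation_peak` is a name of our choosing:

```python
import numpy as np

def correlation_peak(left, right):
    """Peak R of the normalized interchannel cross-correlation,
    searched over all time shifts; R lies between -1 and +1."""
    left = left - np.mean(left)
    right = right - np.mean(right)
    norm = np.sqrt(np.sum(left**2) * np.sum(right**2))
    if norm == 0:
        return 0.0
    xc = np.correlate(left, right, mode="full") / norm
    # return the signed value of largest magnitude, so that
    # out-of-phase signals yield R close to -1
    return xc[np.argmax(np.abs(xc))]
```

Identical channels yield R = 1, a polarity-inverted copy yields R = -1, and independent noise yields values near zero.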
In fact, aspects of the correlation function other than the peak value will also contribute to the perceived imagery, for instance by changing the degree of coupling between distance and width.

Figure 6.29 Apparent source width and distance for varying degrees of correlation R.

Kendall [Ken95b] proposes adjusting the degree of correlation between two audio channels by means of a couple of filters whose impulse responses are set to the desired value of R. Even though the perception of spatial width seems to be mainly affected
by frequencies below 2-3 kHz, a proposed filter design criterion consists of taking the full-range Inverse Fourier Transform (IFT) of two sequences having flat magnitude and random phases φ1 and φ2. Values of R between 0 and ±1 are obtained by composition of the two phase sequences, taking the inverse transform with the phase sequences (φ1, kφ1 + φ2). In this way, a catalog of FIR filters having various degrees of mutual correlation can be pre-constructed. The underlying assumption is that a flat magnitude response would not induce coloration in the decorrelated signals. However, this assumption collides with the following facts:

As explained in section 6.4.2, a pure phase difference between two sinusoidal components at the loudspeakers induces a level difference at the ears. Therefore, the presumed magnitude transparency of the filters is not preserved at the ears.

Between the specified frequency points, the magnitude can vary widely. One can increase the number of points (and the size of the IFT), but this comes at the expense of longer FIR filters. Besides being more expensive, if these filters extend their impulse response beyond 20 ms, a diffusion effect is introduced.

Still under the assumption that a flat magnitude response at the loudspeaker is an important requirement, an alternative decorrelation is proposed in [Ken95b], which makes use of allpass filters whose poles are randomly distributed within the unit circle. This technique ensures that the magnitude response at the loudspeakers is flat in the whole frequency range. Moreover, it is easier to implement dynamic variations of the filter coefficients without reading precomputed values in a table. Dynamic filter variations produce interesting effects that resemble variations in the geometry of a source or a room. In both the FIR and the allpass decorrelators, the filter order has to be quite high (several hundred) to achieve good degrees of decorrelation.
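The flat-magnitude random-phase FIR construction can be sketched as follows. Here we draw two independent phase sequences, which aims at R ≈ 0; the composed sequences (φ1, kφ1 + φ2) for intermediate values of R are omitted, and the function name is our own:

```python
import numpy as np

def decorrelation_fir_pair(n_taps, seed=0):
    """Two FIR filters with flat magnitude and independent random
    phases, obtained by inverse FFT of unit-magnitude spectra."""
    rng = np.random.default_rng(seed)

    def one_filter():
        # random phase for every bin except DC and Nyquist,
        # which must stay real for a real impulse response
        phase = rng.uniform(-np.pi, np.pi, n_taps // 2 - 1)
        spec = np.concatenate(([1.0], np.exp(1j * phase), [1.0]))
        return np.fft.irfft(spec, n_taps)

    return one_filter(), one_filter()
```

Each impulse response has an exactly flat magnitude spectrum (so its autocorrelation is an impulse), while the cross-correlation between the pair stays small.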
Also, working only with random phase variations can introduce an unpleasant character, called “phasiness” [Ger92b, Ger92a], to the perceived spatial image. If we consider the Fourier transforms Y1 and Y2 of the two channels of a binaural recording, we can define the two quantities

    Apparent position: Re(Y2/Y1),
    Phasiness: Im(Y2/Y1).

Phasiness is considered a negative attribute, especially because it induces listening fatigue. In order to minimize phasiness, Gerzon [Ger92a] proposes linear-phase FIR filters with irregular magnitude. Between the filters applied to the two channels, the magnitude responses should be complementary to avoid coloration. Since realizing these requirements in practice can be difficult and expensive, one possibility is to use allpass filters networked to form a multi-input multi-output allpass block.

Considering the prescriptions of Kendall, who recommends using flat-magnitude decorrelation filters, and those of Gerzon, who recommends using linear-phase filters, one might argue that in practice neither of the two ways is necessarily the
best. A third way is found by using a feedback delay network (see section 6.5.3), where two relatively uncorrelated outputs can be taken by using two distinct sets of output coefficients. The scheme for such a decorrelator is depicted in Fig. 6.30, where the d coefficients weight the contribution of the stereo input, m_a and m_b are the delays of the direct signals, and m_f is the delay of the signal circulated in the FDN. The delay lengths can be set long enough to avoid coloration and coprime to each other to minimize temporal artifacts. Alternatively, if one is aware of the physical reasons for decorrelation at the ears, the delay lengths might be tuned based on some physical object involved in sound diffusion and propagation. For instance, in the case of the piano it is likely that the soundboard plays a key role in providing the appropriate decorrelation to the ears of the pianist. Therefore, the delay lengths can be tuned to match the lowest resonances of a piano soundboard.

Figure 6.30 Decorrelation of a stereo input pair by means of a feedback delay network.

The model in Fig. 6.30 allows a fine tuning of the central position of the sound image by adjustment of interaural time and intensity differences, without changing the overall width of the sound image, which is essentially due to the FDN. It can also be augmented and made more accurate by means of the structural binaural model presented in section 6.3.4. Finally, the properly decorrelated signals can be played through headphones or, using the structures of section 6.4.5, through a loudspeaker pair.

Figure 6.31 shows the measured interchannel ratio of magnitudes, apparent position, and phasiness of a lowpass-filtered and subsampled binaural piano recording, taken in a time window that extends for 100 ms right after the attack. The same quantities can be plotted after replacing the two binaural channels with the outputs of the network of Fig. 6.30.
This kind of visualization, together with plots of the short-time interchannel correlation, is beneficial as a complement to the aural feedback in the fine-tuning stage of the decorrelator coefficients.
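Such plots are straightforward to compute; a sketch taking apparent position and phasiness as the real and imaginary parts of the interchannel spectral ratio Y2/Y1 (our reading of the definitions above; the function name is hypothetical):

```python
import numpy as np

def position_and_phasiness(y1, y2, eps=1e-12):
    """Per-frequency apparent position and phasiness of a stereo
    pair, as Re(Y2/Y1) and Im(Y2/Y1)."""
    Y1 = np.fft.rfft(y1)
    Y2 = np.fft.rfft(y2)
    # eps guards against division by near-zero bins
    ratio = Y2 / (Y1 + eps)
    return ratio.real, ratio.imag
```

For a second channel that is just an attenuated copy of the first, the apparent position is the gain at every frequency and the phasiness is zero, as expected.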
Figure 6.31 Frequency-dependent ratio of magnitudes, apparent position, and phasiness of a binaural piano sample.

Figure 6.32 Reported apparent sound image position for perfectly correlated (thin line) and decorrelated (thick line) signals (after Kendall [Ken95b]).

As noted in [Ken95b], a decorrelator is not only useful to increase the apparent width of a sound source, but can also be effective in defeating the precedence effect to some extent. Figure 6.32 shows how the listener perceives musical sounds coming from a loudspeaker pair as a function of the relative time difference between signals (for instance, due to off-axis listening positions) and for two extreme degrees of correlation.
6.6.2 Sound Radiation Simulation

In common musical thinking, loudspeakers are often considered transparent sources of either spherical (point source) or plane (planar, infinitely extended source) waves. These assumptions are also made, for the sake of simplicity, by the designers of most spatialization systems. However, no loudspeaker system is transparent, because it introduces linear and nonlinear distortion, and its radiation pattern can never be described as a sphere or a plane. As a consequence, there are spatial and signal distortions that can degrade the quality of music played by means of loudspeakers. These distortions become easily audible when acoustic instruments are amplified by loudspeaker systems. As any sound engineer can confirm, obtaining a good balance between the natural source and the loudspeakers can be quite difficult, so that delays are often introduced to deceive the audience using the precedence effect (see section 6.2.2), so that they believe that sounds are coming from the natural source only.

Sometimes, it is interesting to reproduce the radiation pattern of an acoustic source, such as a musical instrument, by means of a system of loudspeakers. This is seldom the case in regular concert layouts, but it can be effective in sound installations and sound sculptures, where the listeners are supposed to move around and through the loudspeaker system.
Another special application of sound radiation simulation is found in digital pianos: even if sound samples are perfectly recorded from a high-quality instrument, the spatial impression given to the pianist by sounds played through a few small loudspeakers can be quite disappointing.

Figure 6.33 Sound source radiation measured along a boundary (1), and reconstruction of the sound field within the boundary by means of a loudspeaker set (2).

Sound radiation simulation can be formulated as a problem of reconstruction of the acoustic field in a space by means of a finite number of sources. The situation is depicted in Fig. 6.33, where the target sound source is depicted within a volume enclosed by a boundary, and the same volume encloses a set of loudspeakers. The goal is to apply some filtering to the loudspeakers so that the system on the right
has the same radiation properties as the system on the left. In mathematical terms, this is expressed by

    Σ_{i=1}^{N} a_i(ω) P_i(r, ω) = R(r, ω), for any point r on the boundary ∂V,    (6.49)

where P_i(r, ω) is the frequency response of loudspeaker i measured at point r, and R(r, ω) is the frequency response of the target sound source measured at the same point r. If (6.49) holds at the boundary, the equivalence of the two systems of Fig. 6.33 is also true in the whole enclosed space [EH69]. The frequency responses P_i can be thought of as vectors identifying a vector space V, so that the set of coefficients a_i that gives the vector closest to R is obtained by orthogonal projection of R onto V (see Fig. 6.34). The projection is such that the scalar product between the distance vector R - Σ_j a_j P_j and each basis vector P_j is identically zero. Since the scalar product is computed by integration over the boundary [DCW95], we have

    ∫_{∂V} ( R(r, ω) - Σ_j a_j(ω) P_j(r, ω) ) P_i*(r, ω) dr = 0, for i = 1, ..., N.    (6.50)

Figure 6.34 Finding the minimum distance between a vector R and a vector space V by orthogonal projection.

The solution to (6.50) can be expressed as the solution to a linear system of equations in the unknowns a_j, to be solved for each frequency ω of interest. The resulting frequency responses a_j(ω), to be imposed on each audio channel, can then be approximated by filter design techniques, or inverted and implemented by direct or fast convolution. With a small number of loudspeakers, the radiation pattern can be fairly approximated only up to a few hundred Hz [DCW95]. However, accurate tuning of low-frequency directivity can be enough to give the impression of a directional tone color similar to an acoustic source. Some researchers have started collecting directional impulse responses of musical instruments [CT98], so that a database of filter coefficients can be designed, and these can be switched and interpolated interactively in electroacoustic music performances.
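At each frequency, the orthogonal projection amounts to a least-squares solve. A sketch with hypothetical names, where the boundary integral is replaced by a sum over sampled boundary points:

```python
import numpy as np

def radiation_weights(P, R):
    """Least-squares loudspeaker weights a for one frequency.
    P[m, i] is the (complex) response of loudspeaker i at boundary
    point m; R[m] is the target source response at the same points.
    Minimizing ||P a - R||^2 is the orthogonal projection of R onto
    the span of the loudspeaker responses."""
    a, *_ = np.linalg.lstsq(P, R, rcond=None)
    return a
```

When the target response actually lies in the span of the loudspeaker responses, the exact weights are recovered; otherwise the residual is the unavoidable approximation error of the loudspeaker set.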
6.7 Conclusion

Playing with the spatial attributes of sound has been an intriguing and challenging task for many musicians and sound designers. The multiplicity of techniques developed so far has been roughly overviewed in the previous pages. Despite the thickness of this chapter, we have certainly missed many important contributions to the field. However, we endeavored to communicate the main structural, perceptual, and technological limitations and possibilities of spatial audio. We hope that the sound designer, after reading this chapter, will be able to model some spatial features of sound or, at least, be conscious of those features that will be part of the aesthetics of the design process rather than part of the sonic outcome.

Technological progress will stimulate more research in spatial audio in the future. A particularly promising area is that of audio embedded in everyday objects as a feasible form of display for ubiquitous computing. As the number and flexibility of sound sources are likely to increase in this new context, it is likely that new paradigms for spatial sound design will emerge.

Sound and Music

[m-Bat93] G. Battistelli. Frau Frankenstein. 1993. In: CD BMG RICORDI 74321465302, 1997.
[m-Bou84] P. Boulez. Répons. 1981-84. In: CD Deutsche Grammophon 457 605-2, 1998.
[m-Cag69] J. Cage and L. Hiller. HPSCHD. 1967-69. In: LP Nonesuch Records H-71224, 1969. CD IDEAMA 069. Karlsruhe: ZKM, 1996.
[m-Cho72] J. Chowning. Turenas. 1972. In: CD Wergo WER 2012-50, 1988.
[m-Cif95] F. Cifariello Ciardi. Games. 1995. Edipan Ed., Rome. In: CD PAN 3064, 1999.
[m-DiS00] A. Di Scipio. 5 Difference-Sensitive Circular Interactions. 2000. In: CD of the International Computer Music Conference, Berlin, Germany, 2000. ICMA.
[m-Doa98] R. Doati. Inventario delle Eclissi. 1996. In: CD annex to the book Poetronics: Al confine tra suono, parola, tecnologia, A. Di Vincenzo and A.G. Immertat, eds., Edizioni Tracce, Pescara, Italy, 1999.
[m-Gua94] A. Guarnieri. Orfeo cantando... tolse.... 1994.
In: CD BMG RICORDI 74321526802, 1997.
[m-Gua99] A. Guarnieri. Passione secondo Matteo. 1999. Ricordi Ed., Milan.
[m-Non82] L. Nono. Prometeo. 1982. Ricordi Ed., Milan.
[m-Non88] L. Nono. Post-Prae-Ludium per Donau. 1987. In: CD ARTIS ARC0032, 1993.
[m-Pis95] M. Pisati. ZONE I: zone hackadirezionevirtuale. 1995. Ricordi Ed., Milan.
[m-Sto56] K. Stockhausen. Gesang der Jünglinge. 1955-56. In: CD-3 of the Stockhausen Complete Edition, Stockhausen Verlag.
[m-Tru85] B. Truax. Solar Ellipse. 1984-85. In: CD Wergo WER 2017-50, 1988.

Bibliography

[AB79] J. Allen and D. Berkley. Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am., 65(4):912-915, April 1979.
[BC00] R. Bianchini and A. Cipriani. Virtual Sound. ConTempo, Rome, Italy, 2000.
[Ber92] L.L. Beranek. Concert hall acoustics - 1992. J. Acoust. Soc. Am., 92(1):1-39, July 1992.
[BD98] C.P. Brown and R.O. Duda. A structural model for binaural sound synthesis. IEEE Trans. Speech and Audio Processing, 6(5):476-488, Sept. 1998.
[Bla83] J. Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization. MIT Press, 1983.
[BL86] J. Blauert and W. Lindemann. Auditory spaciousness: Some further psychoacoustic analyses. J. Acoust. Soc. Am., 80(2):533-542, August 1986.
[Bor84] J. Borish. Extension of the image model to arbitrary polyhedra. J. Acoust. Soc. Am., 75(6):1827-1836, June 1984.
[BV95] A. Belladonna and A. Vidolin. spAAce: un programma di spazializzazione per il Live Electronics. In Proc. Second Int. Conf. on Acoustics and Musical Research, pages 113-118, Ferrara, Italy, 1995.
[CB89] D.H. Cooper and J.L. Bauck. Prospects for transaural recording. J. Audio Eng. Soc., 37(1/2):3-19, Jan/Feb 1989.
[Cho71] J.M. Chowning. The simulation of moving sound sources. J. Audio Eng. Soc., 19(1):2-6, 1971. Reprinted in the Computer Music Journal, June 1977.
[Cho99] J.M. Chowning. Perceptual fusion and auditory perspective. In P.R. Cook (ed), Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics. MIT Press, Cambridge, MA, 1999. Pages 261-275.
[CT98] P.R. Cook and D. Trueman. NBody: interactive multidirectional musical instrument body radiation simulators, and a database of measured impulse responses. In Proc. International Computer Music Conference, Ann Arbor, MI, pages 353-356, 1998.
[CW93] M.F. Cohen and J.R. Wallace. Radiosity and Realistic Image Synthesis. Academic Press, 1993.
[Dat97] J. Dattorro. Effect design - Part 1: Reverberator and other filters. J. Audio Eng. Soc., 45(9):660-684, September 1997.
[DCW95] P. Derogis, R. Causse, and O. Warusfel. On the reproduction of directivity patterns using multi-loudspeaker sources. In Proc. Intern. Symp. on Musical Acoustics, Dourdan, France, pages 387-392, 1995.
[DiS98] A. Di Scipio. El sonido en el espacio, el espacio en el sonido. Doce Notas Preliminares, 2:133-157, December 1998.
[DM98] R.O. Duda and W.L. Martens. Range dependence of the response of a spherical head model. J. Acoust. Soc. Am., 104(5):3048-3058, November 1998.
[Dur92] N. Durlach. On the externalization of auditory images. Presence, 1(2):251-257, Spring 1992.
[EH69] W.C. Elmore and M.A. Heald. Physics of Waves. McGraw-Hill, 1969. Reprinted by Dover Publications, Inc., 1985.
[FU98] A. Farina and E. Ugolotti. Software implementation of B-format encoding and decoding. Preprint of the Audio Eng. Soc. Convention, Amsterdam, Netherlands, May 1998.
[Gar95] W.G. Gardner. Efficient convolution without input-output delay. J. Audio Eng. Soc., 43(3):127-136, March 1995.
[Gar98a] W.G. Gardner. 3-D Audio using Loudspeakers. Kluwer Academic Publishers, 1998.
[Gar98b] W.G. Gardner. Reverberation algorithms. In M. Kahrs and K.
Brandenburg (eds), Applications of Digital Signal Processing to Audio and Acoustics, Kluwer Academic Publishers, pages 85-131, 1998.
[GB98] M.A. Gerzon and G.J. Barton. Surround sound apparatus, 1998. U.S. Patent no. 5,757,927.
[Ger76] M.A. Gerzon. Unitary (energy preserving) multichannel networks with feedback. Electronics Letters, 12(11):278-279, 1976.
[Ger85] M.A. Gerzon. Ambisonics in multichannel broadcasting and video. J. Audio Eng. Soc., 33:859-871, November 1985.
[Ger92a] M.A. Gerzon. Optimum reproduction matrices for multispeaker stereo. J. Audio Eng. Soc., 40(7/8):571-589, 1992.
[Ger92b] M.A. Gerzon. Signal processing for simulating realistic stereo images. Preprint of the Audio Eng. Soc. Convention, San Francisco, October 1992.
[GM94] W.G. Gardner and K. Martin. HRTF measurements of a KEMAR dummy-head microphone. Technical report # 280, MIT Media Lab, 1994.
[Gri94] D. Griesinger. Subjective loudness of running reverberation in halls and stages. In Proc. W.C. Sabine Centennial Symposium, Cambridge, MA, pages 89-92, June 1994.
[Gri97] D. Griesinger. The psychoacoustics of apparent source width, spaciousness and envelopment in performance spaces. Acustica, 83:721-731, 1997.
[Gri99] D. Griesinger. Objective measures of spaciousness and envelopment. In Proc. Audio Eng. Soc. Int. Conference, Rovaniemi, Finland, pages 27-41, April 1999.
[Hal95] H.P. Haller. Das Experimental Studio der Heinrich-Strobel-Stiftung des Südwestfunks Freiburg 1971-1989. Die Erforschung der Elektronischen Klangumformung und ihre Geschichte. Südwestfunk, Schriftenreihe, Rundfunkgeschichte, Band 6/1 und 6/2, Nomos Verlagsgesellschaft, Baden-Baden, Germany, 1995.
[HW96] W.M. Hartmann and A. Wittenberg. On the externalization of sound images. J. Acoust. Soc. Am., 99(6):3678-3688, 1996.
[HZ99] J. Huopaniemi and N. Zacharov. Objective and subjective evaluation of head-related transfer function filter design. J. Audio Eng. Soc., 47(4):218-239, April 1999.
[IAS98] 3D Working Group of the Interactive Audio Special Interest Group. Interactive 3D Audio Rendering Guidelines, Level 1.0. MIDI Manufacturers Association, June 9, 1998.
[IAS99] 3D Working Group of the Interactive Audio Special Interest Group. Interactive 3D Audio Rendering Guidelines, Level 2.0.
MIDI Manufacturers Association, September 20, 1999.
[JC91] J.-M. Jot and A. Chaigne. Digital delay networks for designing artificial reverberators. Preprint of the Audio Eng. Soc. Convention, Paris, February 1991.
[Jot92] J.-M. Jot. Etude et Réalisation d'un Spatialisateur de Sons par Modèles Physiques et Perceptifs. PhD thesis, TELECOM, Paris 92 E 019, 1992.
[Jot99] J.-M. Jot. Real-time spatial processing of sounds for music, multimedia, and interactive human-computer interfaces. Multimedia Systems, 7(1):55-69, 1999.
[JS83] D.A. Jaffe and J.O. Smith. Extensions of the Karplus-Strong plucked string algorithm. Computer Music J., 7(2):56-69, 1983.
[Jul95] J.-P. Jullien. Structured model for the representation and the control of room acoustical quality. In Proc. 15th Int. Conf. on Acoustics, Trondheim, Norway, pages 517-520, 1995.
[Kai80] T. Kailath. Linear Systems. Prentice-Hall, 1980.
[KC98] A. Kulkarni and H.S. Colburn. Role of spectral detail in sound-source localization. Nature, 396:747-749, December 1998.
[Ken95a] G.S. Kendall. A 3-D sound primer: directional hearing and stereo reproduction. Computer Music J., 19(4):23-46, Winter 1995.
[Ken95b] G.S. Kendall. The decorrelation of audio signals and its impact on spatial imagery. Computer Music J., 19(4):71-87, Winter 1995.
[KO83] K. Kurozumi and K. Ohgushi. The relationship between the cross-correlation coefficient of two-channel acoustic signals and sound image quality. J. Acoust. Soc. Am., 74:1728-1733, 1983.
[Kuh77] G. Kuhn. Model for the interaural time differences in the azimuthal plane. J. Acoust. Soc. Am., 62:157-167, July 1977.
[Kut91] H. Kuttruff. Room Acoustics. Elsevier Science, Essex, 1991. 3rd ed; 1st ed 1973.
[Kut95] H. Kuttruff. A simple iteration scheme for the computation of decay constants in enclosures with diffusely reflecting boundaries. J. Acoust. Soc. Am., 98(1):288-293, July 1995.
[KV01] M. Kubovy and D. Van Valkenburg. Auditory and visual objects. Cognition, 90:97-126, 2001.
[KW92] D.J. Kistler and F.L. Wightman. A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction. J. Acoust. Soc. Am., 91(3):1637-1647, March 1992.
[Kyr98] C. Kyriakakis.
Fundamental and technological limitations of immersive audio systems. Proc. IEEE, 86(5):941-951, May 1998.
[LS99] C. Landone and M. Sandler. Issues in performance prediction of surround systems in sound reinforcement applications. In Proc. DAFX-99 Digital Audio Effects Workshop, pages 77-82, Trondheim, December 1999.
[Mar01] W.L. Martens. Psychophysical calibration for controlling the range of a virtual sound source: multidimensional complexity in spatial auditory display. In Int. Conf. on Auditory Display, pages 197-207, Espoo, Finland, 2001.
[MHV97] J. Mackenzie, J. Huopaniemi, and V. Välimäki. Low-order modeling of head-related transfer functions using balanced model truncation. IEEE Signal Processing Letters, 4(2):39-41, February 1997.
[MI86] P.M. Morse and K.U. Ingard. Theoretical Acoustics. McGraw-Hill, 1968. Reprinted in 1986, Princeton University Press.
[Mit98] S.K. Mitra. Digital Signal Processing: A Computer-Based Approach. McGraw-Hill, 2nd edition, 2001.
[MM95] D.G. Malham and A. Myatt. 3-D sound spatialization using ambisonic techniques. Computer Music J., 19(4):58-70, Winter 1995.
[Moo79] J.A. Moorer. About this reverberation business. Computer Music J., 3(2):13-18, 1979.
[Moo82] F.R. Moore. A general model for spatial processing of sounds. Computer Music J., 7(3):6-15, 1982.
[Mor91] P.M. Morse. Vibration and Sound. American Institute of Physics for the Acoustical Society of America, 1991. 1st ed 1936, 2nd ed 1948.
[Mül99] C. Müller-Tomfelde. Low-latency convolution for real-time applications. In Proc. Audio Eng. Soc. Int. Conference, pages 454-459, Rovaniemi, Finland, April 1999.
[NE98] R. Nicol and M. Emerit. Reproducing 3D-sound for videoconferencing: a comparison between holophony and ambisonic. In Proc. DAFX-98 Digital Audio Effects Workshop, pages 17-20, Barcelona, November 1998.
[Neu98] J.G. Neuhoff. A perceptual bias for rising tones. Nature, 395(6698):123-124, 1998.
[OS89] A.V. Oppenheim and R.W. Schafer. Discrete-Time Signal Processing. Prentice-Hall, 1989.
[PBJ98] J.-M. Pernaux, P. Boussard, and J.-M. Jot. Virtual sound source positioning and mixing in 5.1 implementation on the real-time system Genesis. In Proc. DAFX-98 Digital Audio Effects Workshop, pages 76-80, Barcelona, November 1998.
[PKH99] V. Pulkki, M. Karjalainen, and J.
Huopaniemi. Analyzing virtual sound source attributes using binaural auditory models. J. Audio Eng. Soc., 47(4):203-217, April 1999.
[Pou57] J. Poullin. Son et espace. La Revue Musicale, 1957.
[Pul97] V. Pulkki. Virtual sound source positioning using vector base amplitude panning. J. Audio Eng. Soc., 45(6):456-466, 1997.
[RBM95] D. Rocchesso, O. Ballan, and L. Mozzoni. Sound spatialization for live music performance. Proc. Second Int. Conf. on Acoustics and Musical Research, pages 183-188, Ferrara, Italy, 1995.
[RD01] D. Rocchesso and P. Dutilleux. Generalization of a 3-D resonator model for the simulation of spherical enclosures. Applied Signal Processing, 2001:15-26, 2001.
[Roc95] D. Rocchesso. The Ball within the Box: a sound-processing metaphor. Computer Music J., 19(4):47-57, Winter 1995.
[Roc96] D. Rocchesso. Strutture ed Algoritmi per l'Elaborazione del Suono basati su Reti di Linee di Ritardo Interconnesse. PhD thesis, University of Padua, February 1996.
[Roc97] D. Rocchesso. Maximally-diffusive yet efficient feedback delay networks for artificial reverberation. IEEE Signal Processing Letters, 4(9):252-255, September 1997.
[RS97] D. Rocchesso and J.O. Smith. Circulant and elliptic feedback delay networks for artificial reverberation. IEEE Transactions on Speech and Audio Processing, 5(1):51-63, January 1997.
[RV89] D. Rife and J. Vanderkooy. Transfer-function measurements using maximum-length sequences. J. Audio Eng. Soc., 37(6):419-444, June 1989.
[RV96] D. Rocchesso and A. Vidolin. Sintesi del movimento e dello spazio nella musica elettroacustica. In Atti del Convegno La Terra Fertile, L'Aquila, Italy, October 1996.
[Sch61] M.R. Schroeder. Improved quasi-stereophony and “colorless” artificial reverberation. J. Acoust. Soc. Am., 33(8):1061-1064, August 1961.
[Sch62] M.R. Schroeder. Natural-sounding artificial reverberation. J. Audio Eng. Soc., 10(3):219-233, July 1962.
[Sch70] M.R. Schroeder. Digital simulation of sound transmission in reverberant spaces. J. Acoust. Soc. Am., 47(2):424-431, 1970.
[Sch73] M.R. Schroeder. Computer models for concert hall acoustics. American Journal of Physics, 41:461-471, 1973.
[SF85] J.O. Smith and B. Friedlander. Adaptive interpolated time-delay estimation.
IEEE Trans. Aerospace and Electronic Systems, 21(2):180-199,March 1985.
217.
200 6 Spatial Effects[SL61] M.R.Schroederand B. Logan. “Colorless” artificialreverberation. J.Audio Eng. Soc., 9:192-197, July 1961. Reprinted in the IRE Trans. onAudio.[Smi85] J.O. Smith. A new approach to digital reverberation using closedwaveg-uide networks. In Proc. International Computer Music Conference,pages47-53, Vancouver, Canada, 1985. Also available in [Smi87].[Smi86] J.O. Smith. Elimination of limit cycles and overflow oscillations in time-varying lattice and ladder digitalfilters. In Proc. IEEE Conf. Circuits andSystems, SanJose,California,May1986.Longerversion also availablein [Smi87].[Smi87] J.O. Smith. Musicapplications of digitalwaveguides. Reportstan-m-39,CCRMA - Stanford University, Stanford, California, 1987.[Smi92] J.O. Smith. Physical modelingusingdigitalwaveguides. ComputerMusicJ., 16(4):74-91, Winter 1992.[SP82] J.Stautnerand M.Puckette. Designingmultichannelreverberators.Computer Music J., 6(1):52-65, Spring 1982.[Vai93] P.P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice-Hall,1993.[ZGM92] B. Zhou, D. Green, and J. Middlebrooks. Characterization of external earimpulse responses using golay codes. J. Acoust. Soc. Am., 92:1169-1171,1992.[ZF90] E. Zwicker and H. Fastl. Psychoacoustics: Facts and Models. Springer-Verlag, Berlin, Germany, 1990.
218.
Chapter 7

Time-segment Processing

P. Dutilleux, G. De Poli, U. Zölzer

7.1 Introduction

In this chapter we discuss several time-domain algorithms which are a combination of smaller processing blocks like amplitude/phase modulators, filters and delay lines. These effects mainly influence the pitch and the time duration of the audio signal. We will first introduce some basic effects like variable speed replay and pitch-controlled resampling. They are all based on delay-line modulation and amplitude modulation. Then we will discuss two approaches for time stretching (time scaling) of audio signals. They are based on an analysis stage, where the input signal is divided into segments (blocks) of fixed or variable length, and a synthesis stage, where the blocks of the analysis stage are recombined by an overlap-and-add procedure. These time stretching techniques perform time scaling without modifying the pitch of the signal. The fourth section focuses on pitch shifting and introduces three techniques: block processing based on time stretching and resampling, delay-line modulation, and pitch-synchronous block processing. Block processing based on delay-line modulation performs pitch shifting by scaling the spectral envelope of each block. Pitch-synchronous block processing performs pitch shifting while preserving the spectral envelope of each block. The last section on time shuffling and granulation presents a more creative use of time-segment processing. Short segments of the input signal are freely assembled and placed in time in the output signal. In this case the input sound can be much less recognizable. The wide choice of strategies for segment organization implies a sound composition attitude from the user.
7.2 Variable Speed Replay

Introduction

Analog audio tape recorders allow replay mode over a wide range of tape speeds. Particularly in fast forward/backward transfer mode, a monitoring of the audio signal is possible which is used to locate a sound. During faster playback the pitch of the sound is raised and during slower playback the pitch is lowered. With this technique the duration of the sound is lengthened if the tape is slowed down, and shortened if the tape speed is increased. Figure 7.1 illustrates a sound segment which is lengthened and shortened, together with the corresponding spectra.

Signal Processing

Figure 7.1 Pitch shifting: Variable speed replay leads to time compression/expansion and compression and expansion of the spectral envelope (time-domain signals and spectra for v = 1, 0.5 and 2).

The phrase "variable speed replay" is used to mean that what has happened initially during the time period n Ts,in is now happening during

    n Ts,replay = n Ts,in / v                                  (7.1)
at the relative speed v, where Ts,in and Ts,replay are the initial and replay sampling periods. Time expansion corresponds to v < 1. A straightforward method of implementing the variable speed replay is hence to modify the sampling frequency while playing back the sound according to

    fs,replay = fs,in · v                                      (7.2)

where fs,in and fs,replay are the initial and replay sampling frequencies. One should distinguish whether the output should be digital or may be analog. If the output is analog, then a very effective method is to modify the sampling frequency of the output DAC. The spectrum of the signal is scaled by v and the analog reconstruction filter should be tuned in order to remove the spectral images after the conversion [Gas87, Mas98].

If a digital output is required, then a sampling frequency conversion has to be performed between the desired replay frequency fs,replay and the output sampling frequency fs,out, which is usually equal to fs,in. If v < 1 (time expansion), then fs,replay < fs,in = fs,out and more output samples are needed than are available from the input signal. The output signal is an interpolated (over-sampled) version of the input signal by the factor 1/v. If v > 1 (time compression), then fs,replay > fs,in = fs,out and fewer output samples than available in the input signal are necessary. The input signal is decimated by the factor v. Before decimation, the bandwidth of the input signal has to be reduced to fs,replay/2 by a digital lowpass filter [McN84]. The quality of the sampling rate conversion depends very much on the interpolation filter used.
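A minimal sketch of such a sampling-rate conversion, using linear interpolation for simplicity, may help fix the idea. This is a hypothetical Python illustration, not one of the book's MATLAB examples; the function name and the test signal are invented here:

```python
def varispeed(x, v):
    """Replay x at relative speed v by linear interpolation.
    v > 1 raises the pitch and shortens the sound; v < 1 lowers
    the pitch and lengthens it (time expansion)."""
    n_out = int(len(x) / v)            # more samples for v < 1, fewer for v > 1
    y = []
    for m in range(n_out):
        t = m * v                      # fractional read position in the input
        i = int(t)
        frac = t - i
        if i + 1 < len(x):
            y.append((1 - frac) * x[i] + frac * x[i + 1])
        else:
            y.append(x[i] if i < len(x) else 0.0)
    return y

# A short ramp replayed at half speed becomes twice as long:
x = [n / 8.0 for n in range(8)]
y = varispeed(x, 0.5)
```

Linear interpolation is the crudest useful choice; the text below notes that better interpolation filters give better conversion quality.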
A very popular method is the linear interpolation between two adjacent samples. A review of interpolation methods can be found in [Mas98, CR83].

A discrete-time implementation can be achieved by increasing/decreasing the transfer rate of a recorded digital audio signal to the DA converter, thus changing the output sampling frequency compared to the recording sampling frequency. If the output signal has to be in digital format again, we have to resample the varispeed analog signal with the corresponding sampling frequency. A discrete-time implementation without a DA conversion and new AD conversion was proposed in [McN84] and is shown in Fig. 7.2. It makes use of multirate signal processing techniques and performs an approximation of the DA/AD conversion approach. A further signal processing algorithm to achieve the acoustical result of a variable speed replay is the delay-line modulation with a constant pitch change, which will be discussed in section 7.4.3.

Musical Applications and Control

As well as for analog tape-based audio editing, the variable speed replay is very popular in digital audio editing systems. See [m-Wis94c, ID 2.9 and 2.10] for a straightforward demonstration of the effect on a voice signal.

The effect of tape speed transposition has been used by Les Paul in the piece called "Whispering" in 1951 [Lee72]. This method is very often used in electroacoustic music when the pitch of concrete sounds cannot be controlled at the time
of recording.

Figure 7.2 Variable speed replay scheme.

P. Schaeffer designed the Phonogène chromatique to transpose a sound to any one of the 12 degrees of the equal tempered scale. The device was based on a tape recorder with 12 capstans and pinch rollers. The operation of the pinch rollers could be controlled by a piano-like keyboard. An additional gear extended the range of operation to two octaves [Mol60, p. 71]; [Roa96, p. 119]; [Ges00]. Jacques Poullin developed another version, the Phonogène à coulisse, which allowed continuous speed modifications. A pair of cones, with a friction wheel in between, constitutes a variable-ratio mechanical link between the motor and the capstan of the tape player. The position of the friction wheel, and hence the replay speed, is controlled by a mechanical lever. Stockhausen, in "Hymnen", transposed orchestral sounds to give them an overwhelming and apocalyptic character [Chi82, p. 53].

In computer music too, the variable speed replay provides an effective transposition scheme. J.-C. Risset says: by "mixing a sound with transpositions of itself with a minute frequency difference (say, a twentieth of a Hertz), one can turn steady periodic tones into a pattern where the harmonics of the tone wax and wane at different rates, proportional to the rank of the harmonic" [Ris98, p. 255]; [m-INA3, Sud]. In "The Gates of H.", Ludger Brümmer exploits the fact that variable speed replay modifies both the pitch and the duration of a sample [m-Bru93, 14'40"-17'25"]. Seven copies of the same phrase, played simultaneously at speeds 7.56, 4.49, 2.24, 1.41, 0.94, 0.67, 0.42, 0.31, are overlapped. The resulting sound begins with a complex structure and an extended spectrum.
As the piece continues, the faster copies vanish and the slower versions emerge one after the other. The sound structure simplifies and it evolves towards the very low registers.

The character of the transposed sounds is modified because all the features of the spectrum are simultaneously scaled. The formants are scaled up, leading to a "Mickey Mouse effect", or down, as if the sounds were produced by oversized objects. The time structure is modified as well. The transients are spread or contracted. A vibrato in the initial sound will lose its character and will appear as a slower or faster modulation. The sounds can also be played at negative speeds. A speed -1 yields a sound with the same average spectrum although sounding very different. Think about speech or percussive sounds played backwards. Other transposition schemes that are free from these drawbacks are achieved by more sophisticated methods described in further sections of this book.

A particular application was desired by the composer Kiyoshi Furukawa. He wanted a sampler for which the speed would be controlled by the amplitude of an
acoustical instrument. A sound is stored in a sampler and is played as a loop. In the meantime, the RMS amplitude of an incoming controlling signal is computed and time averaged with independent attack and decay time constants. This amplitude is converted to decibels and scaled before driving the speed of the sampler. The parameters have to be tuned in such a way that the speed remains within a valid range and the speed variations are intimately related to the loudness and the dynamics of the instrument (see Fig. 7.3).

Figure 7.3 A loop-sampler controlled by an acoustical signal.

This effect is controlled by a clarinet in "Swim, swan" and by a viola in "den ungeborenen Göttern" [m-Fur93, m-Fur97]. The pitch of the acoustical instrument selects words out of a predefined set whereas the loudness controls the replay speed of these words.

7.3 Time Stretching

Introduction

In order to understand the issue of time stretching, let us take the example of a signal whose duration does not fit the time slot that is allocated to its application. Think about a speaker who has already recorded 33 seconds of speech but whose contribution to a commercial may not be longer than 30 seconds. If he does not want to record his text again, the sound engineer may artificially contract his speech by 10%. With the term "time stretching" we mean the contraction or expansion of the duration of an audio signal (time compression, time expansion, time scaling → signal processing terms). We have studied in 7.2 a method that alters the duration of a sound, the variable speed replay, but it has the drawback of simultaneously transposing the sound. The harmonizer could be used to transpose the sound in the opposite direction, and the combination of both methods leads to a time stretching algorithm.

The main task of time stretching algorithms is to shorten or lengthen a sound file of M samples to a new particular length M' = α · M, where α is the scaling factor.
For performing time stretching algorithms the sound file has to be available in a stored form on a storage medium like a sampler, DAT or a hard disc. Time stretching of a sequence of audio samples is demonstrated in Fig. 7.4. The original signal is shown in the upper plot. The middle plot shows the sequence which is
shortened by a scaling factor α = 0.5 and the lower plot shows the stretching by a scaling factor α = 2.

Figure 7.4 Time stretching with scaling factors α = 0.5 and α = 2 (time-domain signals and spectra).

Signal Processing

The intended time scaling does not correspond to the mathematical time scaling as realized by varispeed. We rather require a scaling of the perceived timing attributes, such as speaking rate, without affecting the perceived frequency attributes, such as pitch. We could say that we want the time scaled version of an acoustic signal to be perceived as the same sequence of acoustic events as the original signal being reproduced according to a scaled time pattern. The time stretching algorithms should not affect the pitch or the frequency contents of the processed signals. This is demonstrated by the corresponding spectra (first 2000 samples) of the discrete-time signals in Fig. 7.4. For comparison, the traditional technique for time stretching based on the variable speed replay introduces a pitch shift (see section 7.2 and Fig. 7.1). The basic idea of time stretching by time-segment processing is to divide the input sound into segments. Then if the sound is to be lengthened, some segments are repeated, while if the sound is to be shortened, some segments are discarded. A possible problem is amplitude and phase discontinuity at the boundaries
of the segments. Amplitude discontinuities are avoided by partially overlapping the blocks, while phase discontinuities are avoided by a proper time alignment of the blocks. Two different strategies will be presented in subsections 7.3.2 and 7.3.3.

Applications

Special machines such as the Phonogène universel of Pierre Schaeffer or the Tempophon used by Herbert Eimert allowed alteration of the time duration as well as the pitch of sounds. The Phonogène found many applications in musique concrète as a "time regulator". In his composition "Epitaph für Aikichi Kuboyama", Herbert Eimert uses the Tempophon in order to iterate spoken word fragments. The device allowed the scanning of syllables, vowels and plosives and could make them shorter, longer or iterate them at will [Hal95, p. 13]; [m-Eim62].

As mentioned in the Introduction, the stretching of signals can be used to match their duration to an assigned time slot. In Techno music, different pieces of music are played one after the other as a continuous stream. This stream is supposed to have only very smooth tempo or bpm (beats per minute) transitions although the musical excerpts usually do not have the same tempo. In order to adjust the tempi to each other, the disc jockey modifies the replay speeds at the transition from one excerpt to the other. This method leads to temporary pitch modifications which could be objectionable. The use of time stretching methods could eliminate this problem.

After a brief presentation of the technology of the Phonogène, the following sections discuss two signal processing techniques which perform time stretching without pitch modifications.

7.3.1 Historical Methods - Phonogène

Fairbanks, Everitt and Jaeger report in 1954 on a modified tape recorder for time or frequency compression-expansion of speech [Lee72, Lar98].
Springer develops a similar machine [Spr55, Spr59] and Pierre Schaeffer praises a machine called Phonogène universel that was designed as a combination of the aforementioned Phonogène chromatique and Phonogène à coulisse with the rotating head drum of Springer [Mol60, p. 71-76]; [Sch73, p. 47-48]; [m-Sch98, CD2, ID. 50-52]; [Pou54, PS57, Ges00].

The modified tape recorder has several playback heads mounted on a rotating head drum. The absolute speed of the tape at the capstan determines the duration of the sound whereas the relative speed of the heads to that of the tape determines the amount of transposition. By electrical summation of the outputs of the different heads, a continuous sound is delivered (Fig. 7.5). Moles reports a typical operating range of +10% to -40% [Mol60, p. 76]. The Springer machine was also known as Laufzeitregler or Zeitdehner [Car92, p. 479-480]; [End97].
Figure 7.5 Tape-based time compression-expansion system (after [Mol60]).

7.3.2 Synchronous Overlap and Add (SOLA)

A simple algorithm for time stretching based on correlation techniques is proposed in [RW85, MEJ86]. The input signal is divided into overlapping blocks of a fixed length, as shown in Fig. 7.6. In a second step these overlapping blocks are shifted according to the time scaling factor α. Then the similarities in the area of the overlap intervals are searched for a discrete-time lag of maximum similarity. At this point of maximum similarity the overlapping blocks are weighted by a fade-in and fade-out function and summed sample by sample.

Figure 7.6 SOLA time stretching.
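The segment/correlate/crossfade idea of Fig. 7.6 can be sketched compactly in code. The following is a simplified, hypothetical Python re-implementation (not the book's MATLAB code, which appears below as M-file 7.1); Sa is the analysis hop size, Ss = α·Sa the synthesis hop size, N the block length and L the overlap search interval, here scaled down for clarity:

```python
import math

def sola(x, Sa=64, N=256, L=32, alpha=1.5):
    """Synchronized overlap-add time stretching (simplified sketch)."""
    Ss = round(Sa * alpha)               # synthesis hop size
    out = list(x[:N])                    # first block copied as-is
    M = (len(x) - N) // Sa               # number of remaining full blocks
    for ni in range(1, M + 1):
        grain = x[ni * Sa : ni * Sa + N]
        # search the lag of maximum similarity in the overlap region
        start = ni * Ss
        best_k, best_r = 0, float("-inf")
        for k in range(L):
            r = sum(a * b for a, b in zip(out[start + k : start + k + L], grain))
            if r > best_r:
                best_r, best_k = r, k
        join = start + best_k
        # linear fade-out of the old tail, fade-in of the new grain
        tail = out[join:]
        n_ov = len(tail)
        for i in range(n_ov):
            f = i / max(n_ov - 1, 1)
            out[join + i] = (1 - f) * tail[i] + f * grain[i]
        out.extend(grain[n_ov:])         # append the rest of the grain
    return out

x = [math.sin(2 * math.pi * 8 * n / 1000.0) for n in range(1000)]
y = sola(x, alpha=1.5)                   # roughly 1.5 times longer, same pitch
```

The output length only approximates α times the input length, since the lag search moves each join point by up to L samples; the formal description and the MATLAB listing follow.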
226.
Algorithm description:

1. Segmentation of the input signal into blocks of length N with time shift of Sa samples.

2. Repositioning of blocks with time shift Ss = α · Sa with scaling factor α.

3. Computation of the cross-correlation

       rxL1xL2(m) = sum_{n=0}^{L-m-1} xL1(n) · xL2(n+m)

   between xL1(n) and xL2(n), which are the segments of x1(n) and x2(n) in the overlap interval of length L.

4. Extracting the discrete-time lag km, where the cross-correlation rxL1xL2(km) = rmax has its maximum value (see Fig. 7.7a).

5. Using this discrete-time lag km, fade-out x1(n) and fade-in x2(n).

6. Overlap-add of x1(n) and x2(n) for the new output signal.

Figure 7.7 illustrates the difference between simple overlap-add with fade-out and fade-in of the blocks and the refined synchronous method with the point of maximum similarity km. The SOLA implementation leads to time scaling with small complexity, where the parameters Sa, N, L are independent of the pitch period of the input signal.

The following M-file 7.1 demonstrates the implementation of the SOLA time scaling algorithm:

M-file 7.1 (TimeScaleSOLA.m)

% TimeScaleSOLA.m
% Time Scaling with Synchronized Overlap and Add
%
% Parameters:
% analysis hop size    Sa = 256 (default parameter)
% block length         N  = 2048 (default parameter)
% time scaling factor  0.25 <= alpha <= 2
% overlap interval     L  = 256*alpha/2

clear all, close all
[signal,Fs] = wavread('x1.wav');
DAFx_in = signal';
Sa = input('Analysis hop size Sa in samples = ');
N  = input('Analysis block size N in samples = ');
Figure 7.7 SOLA: cross-correlation and time stretching.

if Sa > N
  disp('Sa must be less than N !!!')
end
M = ceil(length(DAFx_in)/Sa);
% Segmentation into blocks of length N every Sa samples
% leads to M segments
alpha = input('Time stretching factor alpha = ');
Ss = round(Sa*alpha);
L  = input('Overlap in samples (even) = ');
if Ss >= N
  disp('alpha is not correct, Ss is >= N')
elseif Ss > N-L
  disp('alpha is not correct, Ss is > N-L')
end
DAFx_in(M*Sa+N) = 0;
Overlap = DAFx_in(1:N);
% **** Main TimeScaleSOLA loop ****
for ni=1:M-1
  grain = DAFx_in(ni*Sa+1:N+ni*Sa);
  XCORRsegment = xcorr(grain(1:L), Overlap(1,ni*Ss:ni*Ss+(L-1)));
  [xmax(1,ni), index(1,ni)] = max(XCORRsegment);
  fadeout = 1:(-1/(length(Overlap)-(ni*Ss-(L-1)+index(1,ni)-1))):0;
  fadein  = 0:(1/(length(Overlap)-(ni*Ss-(L-1)+index(1,ni)-1))):1;
  Tail  = Overlap(1,(ni*Ss-(L-1))+index(1,ni)-1:length(Overlap)).*fadeout;
  Begin = grain(1:length(fadein)).*fadein;
  Add   = Tail + Begin;
  Overlap = [Overlap(1,1:ni*Ss-L+index(1,ni)-1) Add grain(length(fadein)+1:N)];
end;
% **** end TimeScaleSOLA loop ****
% Output in WAV file
sound(Overlap,44100);
wavwrite(Overlap,Fs,'x1_time_stretch');

7.3.3 Pitch-synchronous Overlap and Add (PSOLA)

A variation of the SOLA algorithm for time stretching is the Pitch-Synchronous Overlap and Add (PSOLA) algorithm proposed by Moulines et al. [HMC89, MC90], especially for voice processing. It is based on the hypothesis that the input sound is characterized by a pitch, as for example the human voice and monophonic musical instruments. In this case PSOLA can exploit the knowledge of the pitch to correctly synchronize the time segments, avoiding pitch discontinuities.

When we perform time stretching of an input sound, the time variation of the pitch P(t) should be stretched accordingly. If t' = αt describes the time scaling function or time warping function that maps the time t of the input signal into the time t' of the output signal, the local pitch P'(t') of the output signal will be defined by P'(t') = P'(αt) = P(t). More generally, when the scaling factor α is not constant, a nonlinear time scaling function can be defined as

    t' = τ(t) = ∫ from 0 to t of α(τ) dτ

and used instead of t' = αt.

The algorithm is composed of two phases: the first phase analyses and segments the input sound (see Fig. 7.8), and the second phase synthesizes a time stretched version by overlapping and adding time segments extracted
by the analysis algorithm.
Figure 7.8 PSOLA: Pitch analysis and block windows.

Analysis algorithm (see Fig. 7.9):

1. Determination of the pitch period P(t) of the input signal and of time instants (pitch marks) ti. These pitch marks are in correspondence with the maximum amplitude or glottal pulses at a pitch-synchronous rate during the periodic part of the sound and at a constant rate during the unvoiced portions. In practice P(t) is considered constant, P(t) = P(ti) = ti+1 - ti, on the time interval (ti, ti+1).

2. Extraction of a segment centered at every pitch mark ti by using a Hanning window with the length Li = 2P(ti) (two pitch periods) to ensure fade-in and fade-out.

Figure 7.9 PSOLA pitch analysis.

Synthesis algorithm (see Fig. 7.10): for every synthesis pitch mark t'k:

1. Choice of the corresponding analysis segment i (identified by the time mark ti) minimizing the time distance |αti - t'k|.
7.3 Time 2132. Overlap and add the selected segment. Notice that some input segments willberepeated for a > 1 (timeexpansion) or discardedwhen a < 1 (timecompression).3. Determination of the time instant ik+,+l where the next synthesis segment willbe centered, in order to preserve the local pitch, by the relationU- segments--LA time stretchingSynthesispitch marksFigure 7.10 PSOLA synthesis for time stretching.The basic PSOLA synthesis algorithm can be implemented in MATLAB by thefollowing M-file 7.2:M-file 7.2 (pso1a.m)function out=psola(in,m,alpha,beta)% in input signal% mpitchmarks% alphatime stretchingfactor% betapitchshiftingfactorP = diff (m); %compute pitch periodsif m( l)<=P(l ) , %remove first pitch markm=m( 2 :length(m)) ;P=P(2:length(P));end
231.
if m(length(m))+P(length(P))>length(in)  % remove last pitch mark
  m = m(1:length(m)-1);
else
  P = [P P(length(P))];
end
Lout = ceil(length(in)*alpha);
out = zeros(1,Lout);      % output signal
tk = P(1)+1;              % output pitch mark
while round(tk)<Lout
  [minimum i] = min(abs(alpha*m - tk));  % find analysis segment
  pit = P(i);
  gr = in(m(i)-pit:m(i)+pit).*hanning(2*pit+1);
  iniGr = round(tk)-pit;
  endGr = round(tk)+pit;
  if endGr>Lout, break; end
  out(iniGr:endGr) = out(iniGr:endGr)+gr;  % overlap new segment
  tk = tk+pit/beta;
end %while

It should be noticed that the determination of the pitch and of the position of pitch marks is not a trivial problem and could be difficult to implement robustly in real-time. Stretching factors typically range from α = 0.25 to 2 for speech. Audible buzziness appears in unvoiced sound when larger values are applied, due to the regular repetition of identical input segments. In order to prevent the algorithm from introducing such an artificial short-term correlation in the synthesis signal, it is advisable to reverse the time axis of every repeated version of an unvoiced segment. With such an artifice, speech can be slowed down by a factor of four, even though some tonal effect is encountered in voiced fricatives, which combine voiced and unvoiced frequency regions and thus cannot be reversed in time.

It is possible to further exploit the analysis phase. In fact, uniformly applied time stretching can produce some artifacts on the non-periodic parts of the sound. For example a plosive consonant can be repeated if the synthesis algorithm chooses the time segment containing the consonant twice. The analysis can then be extended in order to detect the presence of fast transitions. During synthesis, the time scale will not be modified at these points, thus the segments will not be repeated. This approach can be generalized for non-speech sounds where a large time scale change during transitions (e.g. attacks) would dramatically change the timbre identity.
Also in this case it is possible to limit time stretching during transitions and apply it mainly to the steady-state portion of the input sound. This technique is usually applied to digital musical instruments based on wavetable synthesis. On the other hand, the deformation of transient parts can be considered an interesting timbre transformation and can be appreciated as a musically creative audio effect.
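As a cross-check of the synthesis loop, here is a hypothetical Python transcription, loosely following M-file 7.2, applied to a synthetic signal with known, equally spaced pitch marks. The helper and test signal are invented for this sketch; as noted above, real use requires a non-trivial pitch-mark analysis stage:

```python
import math

def psola(x, marks, alpha=1.0, beta=1.0):
    """Sketch of PSOLA synthesis: overlap-add of Hanning-windowed,
    two-period segments at output pitch marks t_k, with
    t_{k+1} = t_k + P/beta (beta = 1 gives pure time stretching)."""
    periods = [marks[i+1] - marks[i] for i in range(len(marks)-1)]
    L_out = int(len(x) * alpha)
    out = [0.0] * L_out
    tk = periods[0]                     # first output pitch mark
    while round(tk) < L_out:
        # pick the analysis segment whose scaled mark is closest to tk
        i = min(range(len(periods)), key=lambda j: abs(alpha*marks[j] - tk))
        P = periods[i]
        if marks[i]-P < 0 or marks[i]+P >= len(x):
            tk += P / beta              # skip segments at the signal borders
            continue
        center = round(tk)
        if center + P >= L_out:
            break
        for n in range(-P, P+1):        # Hanning-windowed overlap-add
            w = 0.5 - 0.5*math.cos(2*math.pi*(n+P)/(2*P))
            out[center+n] += w * x[marks[i]+n]
        tk += P / beta
    return out

# 2000-sample sine of period 50, pitch marks every 50 samples,
# stretched to twice its length without changing the period:
x = [math.sin(2*math.pi*n/50) for n in range(2000)]
marks = list(range(50, 2000, 50))
y = psola(x, marks, alpha=2.0, beta=1.0)
```

Because consecutive Hanning windows spaced one period apart sum to unity, the stretched output keeps the input amplitude and period; segments are simply reused as often as the expansion requires.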
7.4 Pitch Shifting

Introduction

Transposition is one of the basic tools of musicians. When we think about providing this effect by signal processing means, we need to think about the various aspects of it. For a musician, transposing means repeating a melody after pitch shifting it by a fixed interval. Each time the performer transposes the melody, he makes use of a different register of his instrument. By doing so, not only the pitch of the sound is modified but also the timbre is affected.

In the realm of DAFx, it is a matter of choice whether to transpose without taking into account the timbre modification or whether the characteristic timbre of the instrument has to be maintained in each of its registers. The first method could be called "variable timbre transposition" whereas the second approach would be called "constant timbre transposition". To get an insight into the problem we have to consider the physical origins of the audio signal.

The timbre of a sound heavily depends on the organization of its spectrum. A model can be derived from the study of the singing voice. The pitch of a singing voice is determined by the vocal cords and it can be correlated with the set of frequencies available in the spectrum. The timbre of the voice is mainly determined by the vocal cavities. Their effect is to emphasize some parts of the spectrum which are called formants. A signal model can be derived where an excitation part is modified by a resonance part. In the case of the voice, the excitation is provided by the vocal cords, hence related to the frequencies of the spectrum, whereas the resonances correspond to the formants. When a singer transposes a tune, he has, to some extent, the possibility of modifying the pitch and the formants independently. In a careful signal processing implementation of this effect, each of these two aspects should be considered.

If only the spectrum of the excitation is stretched or contracted, a pitch transposition up or down, with a constant timbre, is achieved.
If only the resonances are stretched or contracted, then the pitch remains the same but the timbre is varied. Harmonic singing relies on this effect. If both excitation and resonance are deliberately and independently altered, then we enter the domain of effects that can be perceived as unnatural, but that might have a vast musical potential.

The separation of a sound into its excitation and resonance part is a complex process that will be addressed in Chapter 9. We will present here methods which simultaneously alter both aspects, such as the harmonizer or pitch shifting by delay-line modulation in section 7.4.3. A more refined method based on PSOLA, which allows pitch shifting with formant preservation, will be discussed in section 7.4.4. For more advanced pitch shifting methods we refer to Chapters 8-11.

Musical Applications

Typical applications of pitch shifting in pop music are the correction of the intonation of instruments or singers as well as the production of an effect similar to a
chorus. When the voice of a singer is mixed with copies of itself that are slightly transposed, a subtle effect appears that gives the impression that one is listening to a choir instead of a single singer.

The harmonizer can also produce surprising effects such as a man speaking with a tiny high-pitched voice or a female with a gritty low-pitched one. Extreme sounds can be produced such as the deep snare drum sound on David Bowie's "Let's Dance" record [Whi99]. It has also been used for scrambling and unscrambling speech [GRH73]. In combination with a delay line and with feedback of the transposed sound to the input, a kind of spiral can be produced where the sound is always transposed higher or lower at each iteration.

A subtle effect, similar to a phasing, can be achieved with a set of harmonizers [Dut88] coupled in parallel and mixed to the input sound, as shown in Fig. 7.11. The transposition ratio of the nth harmonizer should be set to 1 + nr, where r is of the order of 1/3000. If f0 is the pitch of the sound, the output of the nth harmonizer will provide a pitch of f0 + nΔf, where Δf = r·f0. If Δf is small enough (a few 1/100 Hz) the interferences between the various outputs of the harmonizers will be clearly audible. When applied, for example, to a low-pitched tuba sound, one harmonic after the other will be emphasized. Flanging and chorus effects can also be achieved by setting the pitch control for a very slight amount of transposition (say, 1/10 to 1/5 of a semitone) and adding regeneration [And95, p. 53]. It appears here that tuning an audio effect is very dependent on the sound being processed. It frequently happens that the tuning has to be adjusted for each new sound or each new pitch.

Figure 7.11 A set of harmonizers (transposition ratios 1.0003, 1.0009, 1.0012, ...) that produce a phasing-like effect. It is particularly effective for low-pitched (typ. 100 Hz) signals of long duration.

Hans Peter Haller describes in [Hal95, pp.
51-55] some applications of the harmonizer for the production of musical works from Luigi Nono and André Richard.

7.4.1 Historical Methods - Harmonizer

The tape-based machines described in 7.3.1 were also able to modify the pitch of sounds while keeping their initial duration. The Phonogène universel was bulky and could not find a broad diffusion, but in the middle of the 1970s a digital device appeared that was called a Harmonizer. It implemented in the digital domain a
process similar to that of the Phonogène universel. From there on the effect became very popular. Since Harmonizer is a trademark of the Eventide company, other companies offer similar devices under names such as pitch transposer or pitch shifter.

The main limitation of the use of the harmonizer is the characteristic quality that it gives to the processed sounds. Moles states that the operating range of the Phonogène universel, used as a pitch regulator, was at least -4 to +3 semitones [Mol60, p. 74]. Geslin estimates that the machines available in the late sixties found application in musique concrète also at much larger transposition ratios [Ges00].

The digital implementations in the form of the harmonizer might allow for a better quality but there are still severe limitations. For transpositions in the order of a semitone, almost no objectionable alteration of the sounds can be heard. As the transposition ratio grows larger, in the practical range of plus or minus 2 octaves, the timbre of the output sound obtains a character that is specific to the harmonizer. This modification can be heard both in the frequency domain and in the time domain and is due to the modulation of the signal by the chopping window. The spectrum of the input signal is indeed convolved with that of the window. The time-domain modulation can be characterized by its rate and by the spectrum of the window, which is dependent on its shape and its size. The longer the window, the lower the rate and hence the narrower the spectrum of the window and the less disturbing the modulation. The effect of a trapezoidal window will be stronger than that of a smoother one, such as the raised cosine window.

On the other hand, a larger window tends to deliver, through the overlap-add process, audible iterated copies of the input signals. For the transposition of percussive sounds, it is necessary to reduce the size of the window. Furthermore, to accurately replay transients and not smooth them out, the window should have sharp transitions.
We see that a trade-off between audible spectral modulation and iterated transients has to be found for each type of sound. Musicians using the computer as a musical instrument might exploit these peculiarities of the algorithm to give their sound a unique flavor.

7.4.2 Pitch Shifting by Time Stretching and Resampling

The variable speed replay discussed in section 7.2 leads to a compression or expansion of the duration of a sound and to a pitch shift. This is accomplished by resampling in the time domain. Figure 7.1 illustrates the discrete-time signals and the corresponding spectra. The spectrum of the sound is compressed or expanded over the frequency axis. The harmonic relations of the sound are not altered, but the harmonic frequencies are scaled according to f_i^new = α · f_i^old.
218 7 Time-segment Processing

The amplitudes of the harmonics remain the same, a_i^new = a_i^old. In order to rescale the pitch-shifted sound towards the original length, a further time stretching algorithm can be applied to the sound. The result of pitch shifting followed by a time stretching algorithm is illustrated in Fig. 7.12.

Figure 7.12 Pitch shifting followed by time correction (time-domain signals and spectra for α = 0.5, 1, 2).

The order of pitch shifting and time scaling can be changed, as shown in Fig. 7.13. First, a time scaling algorithm expands the input signal from length N1 to length N2. Then a resampling operation with the inverse ratio N1/N2 performs pitch shifting and a reduction of length N2 back to length N1.

Figure 7.13 Pitch shifting by time scaling and resampling: x(n) → Time Scaling (ratio N2/N1) → Resampling (ratio N1/N2) → y(n).

The following M-file 7.3 demonstrates the implementation of the SOLA time scaling and pitch scaling algorithm:

M-file 7.3 (PitchScaleSOLA.m)
% PitchScaleSOLA.m
% Parameters:
% analysis hop size    Sa = 256 (default parameter)
% block length         N  = 2048 (default parameter)
% pitch scaling factor 0.25 <= alpha <= 2
% overlap interval     L  = 256*alpha/2

clear all, close all
[signal,Fs] = wavread('x1.wav');
DAFx_in = signal';
Sa=256; N=2048;                  % time scaling parameters
M=ceil(length(DAFx_in)/Sa);
n1=512; n2=256;                  % pitch scaling ratio n1/n2
Ss=round(Sa*n1/n2);
L=256*(n1/n2)/2;
DAFx_in(M*Sa+N)=0;
Overlap=DAFx_in(1:N);
% ****** Time Stretching with alpha=n2/n1 ******
% ..... include main loop TimeScaleSOLA.m
% ****** End Time Stretching ******
% ****** Pitch shifting with alpha=n1/n2 ******
lfen=2048; lfen2=lfen/2;
w1=hanningz(lfen);
w2=w1;
% for linear interpolation of a grain of length lx to length lfen
lx=floor(lfen*n1/n2);
x=1+(0:lfen-1)'*lx/lfen;
ix=floor(x);
ix1=ix+1;
dx=x-ix;
dx1=1-dx;
lmax=max(lfen,lx);
Overlap=Overlap';
DAFx_out=zeros(length(DAFx_in),1);
pin=0; pout=0;
pend=length(Overlap)-lmax;
while pin<pend
  % pitch shifting by resampling a grain of length lx to length lfen
  grain2=(Overlap(pin+ix).*dx1+Overlap(pin+ix1).*dx).*w1;
  DAFx_out(pout+1:pout+lfen)=DAFx_out(pout+1:pout+lfen)+grain2;
  pin=pin+n1;
  pout=pout+n2;
end;
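The resampling step at the core of this scheme can be isolated and studied on its own. The following sketch (in Python rather than the book's MATLAB; the function name and test signal are our own) reads the input at fractional positions with linear interpolation, exactly as the grain resampling above does:

```python
import numpy as np

def resample_linear(x, ratio):
    # Read x at fractional positions n*ratio (linear interpolation);
    # ratio > 1 shortens the signal and transposes its pitch up.
    n_out = int(np.floor(len(x) / ratio))
    t = np.arange(n_out) * ratio
    i = np.minimum(np.floor(t).astype(int), len(x) - 2)
    frac = t - i
    return x[i] * (1 - frac) + x[i + 1] * frac

# A 100 Hz sine at fs = 8000 Hz resampled by ratio 2 is played back
# in half the time, i.e. transposed up one octave.
fs = 8000
n = np.arange(fs)
x = np.sin(2 * np.pi * 100 * n / fs)
y = resample_linear(x, 2.0)
```

With an integer ratio such as 2 the fractional part is zero and the routine reduces to simple decimation; for non-integer ratios the linear interpolation is what keeps the grain boundaries smooth.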
7.4.3 Pitch Shifting by Delay Line Modulation

Pitch shifting or pitch transposing based on block processing is described in several publications. In [BB89] a pitch shifter based on an overlap-add scheme with two time-varying delay lines is proposed (see Fig. 7.14). A cross-fade block combines the outputs of the two delay lines according to a cross-fade function. The signal is divided in small chunks. The chunks are read faster to produce higher pitches or slower to produce lower pitches. In order to produce a continuous output signal, two chunks are read simultaneously with a time delay equal to one half of the block length. A cross-fade is made from one chunk to the other at each end of a chunk [WG94, pp. 257-259].

Figure 7.14 Pitch shifting.

The length of the delay lines is modulated by a sawtooth-type function. A similar approach is proposed in [Dat87], where the same configuration is used for time compression and expansion. A periodicity detection algorithm is used for calculating the cross-fade function in order to avoid cancellations during the cross-fades.

An enhanced method for transposing audio signals is presented in [DZ99]. The method is based on an overlap-add scheme and does not need any fundamental frequency estimation. The difference from other applications is the way the blocks are modulated and combined into the output signal. The enhanced transposing system is based on an overlap-add scheme with three parallel time-varying delay lines. Figure 7.15 illustrates how the input signal is divided into blocks, which are resampled (phase modulation with a ramp-type signal), amplitude modulated and summed, yielding an output signal of the same length as the input signal. Adjacent blocks overlap with 2/3 of the block length.

The modulation signals form a system of three 120°-phase-shifted raised cosine functions. The sum of these functions is constant for all arguments. Figure 7.16 also shows the topology of the pitch transposer.
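The claim that three 120°-shifted raised cosine envelopes sum to a constant is easy to check numerically. A short sketch (block length and sampling grid are arbitrary choices of ours):

```python
import numpy as np

# Three raised-cosine amplitude envelopes, phase-shifted by 120 degrees.
# Their sum is exactly constant, so the threefold overlap-add of the
# enhanced pitch transposer introduces no overall amplitude modulation.
N = 1200                                  # arbitrary block length
theta = 2 * np.pi * np.arange(N) / N
w = [0.5 * (1 - np.cos(theta + k * 2 * np.pi / 3)) for k in range(3)]
total = w[0] + w[1] + w[2]                # equals 3/2 everywhere
```

The constant sum follows from the identity that three cosines spaced 120° apart cancel, leaving only the DC terms 3 · 1/2.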
Since a complete cosine is used for modulation, the perceived sound quality of the processed signal is much better than in simple twofold overlap-add applications using several windows. The amplitude modulation only produces sum and difference frequencies with the base frequency of the modulation signal, which can be very low (6-10 Hz). Harmonics are not present
in the modulation signal and hence cannot form sum or difference frequencies of higher order. The perceived artifacts are phasing-like effects and are less annoying than the local discontinuities of other applications based on twofold overlap-add methods.

Figure 7.15 Pitch transposer: block processing, time shifting and overlap-add.

Figure 7.16 Pitch transposer: block diagram.

If we want to change the pitch of a signal controlled by another signal or signal envelope, we can also make use of delay line modulation. The effect can be achieved by performing a phase modulation of the recorded signal according to y(n) = x(n - D(n)). The modulating factor D(n) = M + DEPTH · x_mod(n) is now dependent on a modulating signal x_mod(n). With this approach the pitch of the input signal x(n) is changed according to the envelope of the modulating signal (see Fig. 7.17).
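The relation y(n) = x(n - D(n)) can be sketched directly. The following Python fragment (function name and parameter defaults are our own; the book's examples are in MATLAB) reads from a time-varying, linearly interpolated delay:

```python
import numpy as np

def delay_mod(x, x_mod, M=64, depth=32):
    # y(n) = x(n - D(n)) with D(n) = M + depth * x_mod(n);
    # the fractional read position is linearly interpolated.
    y = np.zeros_like(x)
    for n in range(len(x)):
        pos = n - (M + depth * x_mod[n])
        if pos < 0 or pos >= len(x) - 1:
            continue                      # outside the recorded signal
        i = int(np.floor(pos))
        frac = pos - i
        y[n] = (1 - frac) * x[i] + frac * x[i + 1]
    return y
```

With x_mod ≡ 0 this is a plain delay of M samples; when the envelope x_mod(n) rises, D(n) grows, the read pointer advances more slowly than the write pointer and the pitch is lowered, and vice versa on a falling envelope.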
Figure 7.17 Pitch controlled by the envelope of signal x_mod(n).

7.4.4 Pitch Shifting by PSOLA and Formant Preservation

This technique is the dual operation to resampling in the time domain, but in this case a resampling of the short-time spectral envelope is performed. The short-time spectral envelope describes a frequency curve going through all amplitudes of the harmonics. This is demonstrated in Fig. 7.18, where the spectral envelope is shown. The harmonics are again scaled according to f_i^new = β · f_i^old, but the amplitudes of the harmonics a_i^new = env(f_i^new) ≠ a_i^old are now determined by sampling the spectral envelope. Some deviations of the amplitudes from the precise envelope can be noticed. This depends on the chosen pitch shifting algorithm.

The PSOLA algorithm can be conveniently used for pitch shifting a voice sound while maintaining the formant positions, and thus the vowel identity [ML95, BJ95]. The basic idea consists of time stretching the positions of the pitch marks, while the segment waveform is not changed. The underlying signal model of speech production is a pulse train filtered by a time-varying filter corresponding to the vocal tract. The input segment corresponds to the filter impulse response and determines the formant positions. Thus, it should not be modified. Conversely, the pitch mark distance determines the speech period, and thus should be modified accordingly. The aim of PSOLA analysis is to extract the local filter impulse response. As can be seen in Fig. 7.19, the spectrum of a segment extracted using a Hanning window with a length of two periods approximates the local spectral envelope. Longer windows tend to resolve the fine line structure of the spectrum, while shorter windows tend to blur the formant structure of the spectrum. Thus, if we do not stretch the segment, the formant positions are maintained. The operation of overlapping the segments at the new pitch mark positions will resample the spectral envelope at the desired pitch frequency.
When we desire a pitch shift by a factor β, defined as the ratio of the local synthesis pitch frequency to the original one, β = f̃0(t̃)/f0(t), the new pitch period will be given by P̃(t̃) = P(t)/β, where in this case t̃ = t because time is not stretched.

The analysis algorithm is the same as that previously seen for PSOLA time stretching in section 7.3.3 (see Fig. 7.9). The synthesis algorithm is modified (see Fig. 7.20) according to the following steps: for every synthesis pitch mark t̃_k

1. Choice of the corresponding analysis segment i (identified by the time mark t_i) minimizing the time distance |t_i − t̃_k|.
Figure 7.18 Pitch shifting by the PSOLA method: frequency resampling of the spectral envelope (time-domain signals and spectra for β = 0.5, 1, 2).

2. Overlap and add the selected segment. Notice that some input segments will be repeated for β > 1 (higher pitch) or discarded for β < 1 (lower pitch).

3. Determination of the time instant t̃_{k+1} where the next synthesis segment will be centered, in order to preserve the local pitch, by the relation t̃_{k+1} = t̃_k + P̃(t̃_k) = t̃_k + P(t_i)/β.

- For large pitch shifts, it is advisable to compensate the amplitude variation, introduced by the greater or lesser overlapping of segments, by multiplying the output signal by 1/β.

It is possible to combine time stretching by a factor α with pitch shifting. In this case, for every synthesis pitch mark t̃_k, the first step of the synthesis algorithm presented above is modified as: choice of the corresponding analysis segment i (identified by the time mark t_i) minimizing the time distance |α·t_i − t̃_k|.

The PSOLA algorithm is very effective for speech processing and is computationally very efficient, once the sound has been analyzed, so it is widely used for speech
Figure 7.19 Spectrum of segments extracted from a vowel /a/ by using a Hanning window of length 4 (dotted line), 2 (solid line), and 1 (dashed line) pitch periods. It can be noticed that the solid line approximates the local spectral envelope.

synthesis from a database of diphones, for prosody modification, for automatic answering machines, etc. For wide variations of the pitch it presents some artifacts. On the other hand, the necessity of a preliminary analysis stage for obtaining a pitch contour makes the real-time implementation of an input signal modification difficult. Also the estimation of glottal pulses can be difficult. A solution is to place the pitch marks at a pitch-synchronous rate, regardless of the true position of the glottal pulses. The resulting synthesis quality will be only slightly decreased (see for example Fig. 7.21).

A further effect that can be obtained by a variation of PSOLA is linear scaling of formant frequencies (see Fig. 7.22). In fact, we saw that a time scaling of a signal corresponds to an inverse frequency scaling. Thus, when we perform time scaling of the impulse response of a filter, we inversely scale the frequency of the formants. In PSOLA terms, this corresponds to time scaling the selected input segments before overlap and add in the synthesis step, without any change in the pitch mark calculation. To increase the frequencies of the formants by a factor γ, every segment should be shortened by a factor 1/γ by resampling. For example, the average formant frequencies of female adults are about 16 percent higher than those of male adults, and children's
formants are about 20 percent higher than female formants. Notice that care should be taken when the frequencies increase, in order to avoid foldover. Ideally, band-limited resampling should be used.

Figure 7.20 PSOLA: synthesis algorithm for pitch shifting.

The following M-file 7.4 shows the implementation of the basic PSOLA synthesis algorithm. It is based on the PSOLA time stretching algorithm shown in section 7.3.3.

M-file 7.4 (psolaf.m)

function out=psolaF(in,m,alpha,beta,gamma)
% ...
% gamma   newFormantFreq/oldFormantFreq
%
% the internal loop as
tk = P(1)+1;                           % output pitch mark
while round(tk)<Lout
  ...
  [minimum i]=min(abs(alpha*m-tk));    % find analysis segment
  pit=P(i);
  pitStr=floor(pit/gamma);
  gr=in(m(i)-pit:m(i)+pit).*hanning(2*pit+1);
  gr=interp1(-pit:1:pit,gr,-pitStr*gamma:gamma:pit);  % stretch segment
  iniGr=round(tk)-pitStr;
  endGr=round(tk)+pitStr;
  if endGr>Lout, break; end
  out(iniGr:endGr)=out(iniGr:endGr)+gr;  % overlap new segment
  tk=tk+pit/beta;
end % end of while
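For readers without MATLAB, the PSOLA synthesis loop can be sketched in Python as well. This is a simplified transcription (no time stretching, no formant scaling; the function name, the boundary checks and the synthetic pulse-train test are our own). Real use requires pitch marks m and local periods P from an analysis stage:

```python
import numpy as np

def psola_pitch_shift(x, m, P, beta):
    # Re-place Hann-windowed two-period segments, taken around the
    # analysis pitch marks m with local periods P, at synthesis marks
    # spaced P/beta apart (beta > 1 raises the pitch).
    out = np.zeros(len(x))
    tk = float(P[0] + 1)                       # first synthesis mark
    while round(tk) < len(out):
        i = int(np.argmin(np.abs(m - tk)))     # nearest analysis segment
        pit = int(P[i])
        if m[i] - pit < 0 or m[i] + pit + 1 > len(x):
            break
        gr = x[m[i] - pit:m[i] + pit + 1] * np.hanning(2 * pit + 1)
        ini = int(round(tk)) - pit
        end = int(round(tk)) + pit
        if end >= len(out):
            break
        out[ini:end + 1] += gr                 # overlap-add at new mark
        tk += pit / beta                       # next synthesis pitch mark
    return out
```

Feeding a pulse train with period 100 samples and beta = 2 yields output pulses spaced 50 samples apart; as the text notes, for beta > 1 the same analysis segment is sometimes selected twice in a row.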
Figure 7.21 Comparison of a segment extracted in correspondence with a glottal pulse with one extracted between pitch pulses.

7.5 Time Shuffling and Granulation

7.5.1 Time Shuffling

Introduction

Musique concrète has made intensive use of splicing of tiny elements of magnetic tape. When mastered well, this assembly of hundreds of fragments of several tens of milliseconds allows an amalgamation of heterogeneous sound materials, at the limit of the time discrimination threshold. This manual operation, called micro-splicing, was very time-consuming. Bernard Parmegiani suggested in 1980 at the Groupe de Recherches Musicales (GRM) that this could be done by computers. An initial version of the software was produced in the early eighties. After being rewritten, improved and ported several times, it was eventually made available on personal computers in the form of the program called brassage in French, which will be translated here as time shuffling [Ges98, Ges00].
Figure 7.22 PSOLA: variation of PSOLA as linear formant scaling (original and time-stretched grain with spectra).

Signal Processing

Let us describe here an elementary algorithm for time shuffling that is based on the superposition of two time segments that are picked randomly from the input signal (see Fig. 7.23):

1. Let x(n) and y(n) be the input and output signals.
2. Specify the duration d of the fragments and the duration D ≥ d of the time period [n − D, n] from which the time segments will be selected.
3. Store the incoming signal x(n) in a delay line of length D.
4. Choose at random the delay time τ1 with d ≤ τ1 ≤ D.
5. Select the signal segment x1d of duration d beginning at x(n − τ1).
6. Follow the same procedure (steps 4 and 5) for a second time segment x2d.
7. Read x1d and x2d and apply an amplitude envelope w to each of them in order to smooth out the discontinuities at the borders.
Figure 7.23 Time shuffling: two input segments, selected at random from the past input signal, are overlap-added to produce an output time segment. When one of the input segments is finished, a new one is selected.

8. When the reading of x1d or x2d is finished, iterate the procedure for each of them.
9. Compute the output as the overlap-add of the sequence of x1d and x2d with a time shift of d/2.

Musical Applications and Control

The version described above introduces local disturbances into the signal's actual timing, while preserving the overall continuity of its time sequence. Many further refinements of this algorithm are possible. A random amplitude coefficient could be applied to each of the input segments in order to modify the density of the sound material. The shape of the envelope could be modified in order to retain more of the input time structure or, on the contrary, to smooth it out and blend different events with each other. The replay speed of the segments could be varied in order to produce transposition or glissandi.

At a time when computer tools were not yet available, Bernard Parmegiani magnificently illustrated the technique of tape-based micro-splicing in works such as "Violostries" (1964) or "Dedans-Dehors" (1977) [m-Par64, m-Par77]. The elementary algorithm presented above can be operated in real time, but other off-line versions have also been implemented which offer many more features. They have the ability to merge fragments of any size, sampled from a random field, also of any dimension: from a few samples to several minutes. Thus, apart from generating fusion phenomena, for which the algorithm was conceived, the software was able to produce cross-fading of textured sound and other sustained chords, infinitely small
variations in signal stability, interpolation of fragments with silence or sounds of other types [Ges98]. Jean-Claude Risset used this effect to perform sonic developments from short sounds, such as stones and metal chimes [m-INA3, Sud-I, 3'44" to 4'38"]; [Ris98, Ges00] and to produce a "stuttering" piano, further processed by ring modulation [m-INA3, Sud-I, 4'30", 5'45"]. Starting from "found objects" such as bird songs, he rearranged them in a compositional manner to obtain first a pointillistic rendering, then a stretto-like episode [m-INA3, Sud-I, 1'42" to 2'49"].

7.5.2 Granulation

Introduction

In the previous sections about pitch shifting and time stretching we have proposed algorithms that have limitations as far as their initial purpose is concerned. Beyond a limited range of modification of pitch or of time duration, severe artifacts appear. The time shuffling method considers these artifacts from an artistic point of view and takes them for granted. Out of the possibilities offered by the methods and by their limitations, it aims to create new sound structures. Whereas the time shuffling effect exploits the possibilities of a given software arrangement, which could be considered here as a "musical instrument", the idea of building a complex sound out of a large set of elementary sounds could find a larger framework.

The physicist Dennis Gabor proposed in 1947 the idea of the quantum of sound, an indivisible unit of information from the psychoacoustical point of view. According to his theory, a granular representation could describe any sound. Granular synthesis was first suggested as a computer music technique for producing complex sounds by Iannis Xenakis (1971) and Curtis Roads (1978).
This technique builds up acoustic events from thousands of sound grains. A sound grain lasts a brief moment (typically 1 to 100 ms), which approaches the minimum perceivable event time for duration, frequency, and amplitude discrimination [Roa96, Roa98, Tru00a].

The granulation effect is an application of granular synthesis where the material out of which the grains are formed is an input signal. Barry Truax has developed this technique [Tru88, Tru94] through a first real-time implementation and has used it extensively in his compositional pieces.

Signal Processing

Let x(n) and y(n) be the input and output signals. The grains g_k(i) are extracted from the input signal with the help of a window function w_k(i) of length L_k by

g_k(i) = x(i + i_k) · w_k(i)   (7.6)

with i = 0, ..., L_k − 1. The time instant i_k indicates the point where the segment is extracted; the length L_k determines the amount of signal extracted; the window waveform w_k(i) should ensure fade-in and fade-out at the borders of the grain and affects the frequency content of the grain. Long grains tend to maintain the timbre
230 7 Time-segment Processingidentity of theportion of theinputsignal, while short onesacquire a pulse-likequality.When thegrain is long,the window has a flat topandit usedonly tofade-in and fade-out the bordersof the segment.The following M-files 7.5 and 7.6 show the extraction of short and long grains:M-file7.5 (grainSh.m)function y = grainSh(x,init,L)% extract a short grain% x inputsignal% init first sample% L grain length (in samples)y=x(init:init+L-l).*hanning(L)’;M-file7.6 (grainLn.m)function y = grainLn(x,iniz,L,Lw)% extract a long grain% x inputsignal% init first sample% L grain length (in samples)% LW length fade-in and fade-out (in samples)if length(x) <= iniz+L , error(’length(x) too short.’), endy = x(iniz:iniz+L-l) ; % extractsegmentW = hanning(2*Lw+l)’;y(L-Lw+l:L) = y(L-Lw+l:L) .*w(Lw+2:2*Lw+l); fade-outy(1:Lw) = y(1:Lw).*w(i:Lw); % fade-inThe synthesis formula is given bywhere ak is an eventual amplitude coefficient and nk is the time instant where thegrain is placed in the output signal. Notice that the grains can overlap. To overlapa grain gk (grain) at instant n k = (iniOLA) withamplitude ak, the followingMATLAB instructions can be usedendOLA = iniOLA+length(grain)-1;y(ini0LA:endOLA) = y(ini0LA:endOLA) + ak * grain;An example of granulation with random values of the parameters grain initialpoint and length, output point and amplitude is shown in Fig. 7.24. The M-file 7.7shows the implementationof the granulation algorithm.M-file 7.7 (granu1ation.m)% granulation.mf=fopen(’a-male.mI1’);x=f read(f, int 16) ;
% Initializations
Ly=length(x);
y=zeros(1,Ly);                                 % output signal
% Constants
nEv=4; maxL=200; minL=50; Lw=20;
L      = round((maxL-minL)*rand(1,nEv))+minL;  % grain length
initIn = ceil((Ly-maxL)*rand(1,nEv));          % init grain
initOut= ceil((Ly-maxL)*rand(1,nEv));          % init out grain
a      = rand(1,nEv);                          % ampl. grain
endOut = initOut+L-1;
% Synthesis
for k=1:nEv,
  grain=grainLn(x,initIn(k),L(k),Lw);
  y(initOut(k):endOut(k))=y(initOut(k):endOut(k))+a(k)*grain;
end

Figure 7.24 Example of granulation.

This technique is quite general and can be employed to obtain very different sound effects. The result is greatly influenced by the criterion used to choose the instants n_k. If these points are regularly spaced in time and the grain waveform does not change too much, the technique can be interpreted as a filtered pulse train, i.e. it produces a periodic sound whose spectral envelope is determined by the grain waveform interpreted as an impulse response. An example is the PSOLA algorithm shown in the previous sections 7.3.3 and 7.4.4. When the distance between two subsequent grains is much greater than L_k, the sound will result in grains separated by interruptions or silences, with a specific character. When many short grains overlap (i.e. the distance is less than L_k), a sound texture effect is obtained.
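The same random granulation can be sketched compactly in Python (function names, the seeded random generator and the test signal are our own; cf. grainLn.m and granulation.m above):

```python
import numpy as np

rng = np.random.default_rng(0)

def grain_ln(x, init, L, Lw):
    # Extract a long grain of length L starting at sample 'init',
    # with Hann fade-in and fade-out of Lw samples (cf. grainLn.m).
    g = x[init:init + L].copy()
    w = np.hanning(2 * Lw + 1)
    g[:Lw] *= w[:Lw]              # fade-in
    g[-Lw:] *= w[Lw + 1:]         # fade-out
    return g

def granulate(x, n_ev=4, min_len=50, max_len=200, Lw=20):
    # Overlap-add n_ev grains with random source position, length,
    # output position and amplitude (cf. granulation.m).
    y = np.zeros(len(x))
    for _ in range(n_ev):
        L = int(rng.integers(min_len, max_len))
        src = int(rng.integers(0, len(x) - max_len))
        dst = int(rng.integers(0, len(x) - max_len))
        y[dst:dst + L] += rng.random() * grain_ln(x, src, L, Lw)
    return y
```

Raising n_ev until the grains overlap densely moves the result from isolated events toward a continuous texture, which is exactly the density control discussed below.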
The strategies for choosing the synthesis instants can be grouped into two rather simplified categories: synchronous, mostly based on deterministic functions; and asynchronous, based on stochastic functions. Grains can be organized in streams. There are two main control variables: the delay between grains for a single stream, and the degree of synchronicity among grains in different streams. Given that the local spectrum affects the global sound structure, it is possible to use input sounds that can be parsed into grains without altering the complex characteristics of the original sound, such as water drops for stream-like sounds.

It is further possible to modify the grain waveform with a time transformation, such as modulation for frequency shifting or time stretching for frequency scaling [DP91]. The main parameters of granulation are: grain duration, selection order from the input sound, amplitude of grains, temporal pattern in synthesis, and grain density (i.e. grains per second). Density is a primary parameter, as it determines the overall texture, whether sparse or continuous. Notice that it is possible to extract grains from different sound files to create hybrid textures, e.g. evolving from one texture to another.

Musical Applications

A demonstration of the effect is provided in [m-Tod99]. Further examples can be found in [m-Wis94c]. Barry Truax has used the technique of granulation to process sampled sound as compositional material.
In "The Wings of Nike" (1987) he has processed only short "phonemic" fragments, but longer sequences of environmental sound have been used in pieces such as "Pacific" (1990). In each of these works, the granulated material is time stretched by various amounts and thereby produces a number of perceptual changes that seem to originate from within the sound [Tru00b, m-Tru95].

In "Le Tombeau de Maurice", Ludger Brummer uses the granulation technique in order to perform timbral, rhythmic as well as harmonic modifications [m-Bru97]. A transition from the original sound color of an orchestral sample towards noise pulses is achieved by progressively reducing the size of the grains. At an intermediate grain size, the pitch of the original sound is still recognizable although the time structure has already disappeared [m-Bru97, 3'39"-4'12"]. A melody can be played by selecting grains of different pitches and by varying the tempo at which the grains are replayed [m-Bru97, 8'38"-9'10"]. New melodies can even appear out of a two-stage granulation scheme. A first series of grains is defined out of the original sample, whereas the second is a granulation of the first one. Because of the stream segregation performed by the hearing system, the rhythmic as well as the harmonic grouping of the grains is constantly evolving [m-Bru97, 9'30"-10'33"].

7.6 Conclusion

The effects described in this chapter are based on the division of the input sound into short segments. These segments are processed by simple methods such as time
scaling by resampling or amplitude multiplication by an envelope. The segment waveform is not changed, thus maintaining the characteristics of the source signal.

Two categories of effects can be obtained, depending on the strategy used to place the segments in time during the synthesis. If the order and organization of extracted segments are carefully maintained, time stretching or pitch shifting can be performed. Basic methods, SOLA and PSOLA, are presented and their characteristics are discussed. These effects aim to produce sounds that are perceived as similar to the original, but are modified in duration or pitch. As often happens with digital audio effects, the artifacts produced by these methods can be used as a method for deformation of the input sound, whilst maintaining its main characteristics. The low computational complexity of time-segment processing allows efficient real-time applications. Nevertheless, these algorithms produce artifacts that limit their scope of application. More advanced methods for time stretching and pitch shifting will be introduced in Chapters 8-11.

The second category changes the organization and the order of the segments to a great extent, and thus leads to time shuffling and granulation. In this case, the input sound can be much less recognizable. The central element becomes the grain with its amplitude envelope and time organization. These techniques can produce results from sparse grains to dense textures, with a very loose relationship with the original sound. It should be noticed that the wide choice of strategies for grain organization implies a sound composition attitude from the user. Thus granulation became a sort of metaphor for music composition starting from the micro level.

Sound and Music

[m-Bru93] L. Brummer: The Gates of H. Computer music, CCRMA 1993. In: CRI, "The listening room", CD edel 0014522TLR, Hamburg, 1995.

[m-Bru97] L.
Brummer: Le Tombeau de Maurice, for computer-generated tape. In: Computer music @ CCRMA. CD CCRMA V02, 1997.

[m-Eim62] H. Eimert: Epitaph für Aikichi Kuboyama. Electronic music composition, 1962. Studio-Reihe neuer Musik, Wergo, LP WER 60014. Reedition as a CD, Koch/Schwann, 1996.

[m-Fur93] K. Furukawa: Swim, Swan, composition for clarinet and live-electronics. ZKM, 1993.

[m-Fur97] K. Furukawa: Den ungeborenen Göttern. Multimedia opera, ZKM, 1997.

[m-INA3] J.-C. Risset: Sud, Dialogues, Inharmonique, Mutations. CD INA C1003.

[m-Par64] B. Parmegiani: Violostries, 1964. IDEAMA CD 051 Target Collection, ZKM and CCRMA, 1996.
[m-Par77] B. Parmegiani: Dedans-Dehors, 1977. INA-GRM.

[m-Sch98] P. Schaeffer and G. Reibel: Solfège de l'objet sonore. Booklet + 3 CDs. First published 1967. INA-GRM, 1998.

[m-Tod99] T. Todoroff: Real-time granulation. Revenant.aif, Reven2G.aif, RevenCmb.aif. DAFX Sound Library.

[m-Tru95] B. Truax: Granular Time-shifting and Transposition Composition Examples. In Computer Music Journal, Volume 19 Compact Disc, Index 6, 1995.

[m-Wis94c] T. Wishart: Audible Design, Sound examples. CD. Orpheus the Pantomime, York, 1994.

Bibliography

[And95] C. Anderton. Multieffects for Musicians. Amsco Publications, 1995.

[BB89] K. Bogdanowicz and R. Blecher. Using multiple processors for real-time audio effects. In AES 7th International Conference, pp. 337-342, 1989.

[BJ95] R. Bristow-Johnson. A detailed analysis of a time-domain formant-corrected pitch shifting algorithm. J. Audio Eng. Soc., 43(5):340-352, 1995.

[Car92] T. Cary. Illustrated Compendium of Musical Technology. Faber and Faber, 1992.

[Chi82] M. Chion. La musique électroacoustique. QSJ No 1990, PUF, 1982.

[CR83] R.E. Crochiere and L.R. Rabiner. Multirate Digital Signal Processing. Prentice-Hall, 1983.

[Dat87] J. Dattoro. Using digital signal processor chips in a stereo audio time compressor/expander. In Proc. 83rd AES Convention, Preprint 2500, 1987.

[DP91] G. De Poli and A. Piccialli. Pitch-synchronous granular synthesis. In G. De Poli, A. Piccialli, and C. Roads (eds), Representations of Musical Signals, pp. 187-219. MIT Press, 1991.

[Dut88] P. Dutilleux. Mise en œuvre de transformations sonores sur un système temps-réel. Technical report, Rapport de stage de DEA, CNRS-LMA, June 1988.

[DZ99] S. Disch and U. Zolzer. Modulation and delay line based digital audio effects. In Proc. DAFX-99 Digital Audio Effects Workshop, pp. 5-8, Trondheim, December 1999.
[End97] B. Enders. Lexikon Musikelektronik. Atlantis Schott, 1997.

[Gas87] P.S. Gaskell. A hybrid approach to the variable speed replay of digital audio. J. Audio Eng. Soc., 35:230-238, April 1987.

[Ges98] Y. Geslin. Sound and music transformation environments: A twenty-year experiment at the "Groupe de Recherches Musicales". In Proc. DAFX-98 Digital Audio Effects Workshop, pp. 241-248, Barcelona, November 1998.

[Ges00] Y. Geslin. About the various types of Phonogènes. GRM, Personal communication, 2000.

[GRH73] T.A. Giordano, H.B. Rothman, and H. Hollien. Helium speech unscramblers - a critical review of the state of the art. IEEE Trans. Audio and Electroacoustics, AU-21(5), October 1973.

[Hal95] H.P. Haller. Das Experimentalstudio der Heinrich-Strobel-Stiftung des Südwestfunks Freiburg 1971-1989, Die Erforschung der Elektronischen Klangumformung und ihre Geschichte. Nomos, 1995.

[HMC89] C. Hamon, E. Moulines, and F. Charpentier. A diphone synthesis system based on time-domain prosodic modifications of speech. In Proc. ICASSP, pp. 238-241, 1989.

[Lar98] J. Laroche. Time and pitch scale modifications of audio signals. In M. Kahrs and K.-H. Brandenburg (eds), Applications of Digital Signal Processing to Audio and Acoustics, pp. 279-309. Kluwer, 1998.

[Lee72] F.F. Lee. Time compression and expansion of speech by the sampling method. J. Audio Eng. Soc., 20(9):738-742, November 1972.

[Mas98] D.C. Massie. Wavetable sampling synthesis. In M. Kahrs and K.-H. Brandenburg (eds), Applications of Digital Signal Processing to Audio and Acoustics, pp. 311-341. Kluwer, 1998.

[MC90] E. Moulines and F. Charpentier. Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication, 9(5/6):453-467, 1990.

[McN84] G.W. McNally. Variable speed replay of digital audio with constant output sampling rate. In Proc. 76th AES Convention, Preprint 2137, 1984.

[MEJ86] J. Makhoul and A. El-Jaroudi. Time-scale modification in medium to low rate speech coding. In Proc. ICASSP, pp.
1705-1708, 1986.E. Moulines and .J. Laroche.Non-parametrictechnique for pitch-scaleand time-scale modification of speech. Speech Communication, 16:175-205, 1995.A. Moles. Lesmusiques expe‘rimentales. Trad. D. Charles. Cercle d’ArtContemporain, 1960.
253.
236 7 Time-segment Processing[Pou54] J. Poullin.L’apportdestechniquesd’enregistrementdanslafabractiondematikresetformesmusicales nouvelles. Applications a la musiqueconcrbte. L’OndeElectrique, 34(324):282-291, 1954.[PS571 J. Poullinand D.A. Sinclair. TheApplication of Recording Techniquesto theProduction of New Musical Materials and Forms. Applicationto“Musique Concrkte”. National Research Council of Canada, 1957. Tech-nical Translation TT-646, pp. 1-29.[Ris98] J.-C.Risset.Example of the musicaluse of digitalaudio effects. InProc. DAFX-99 Digital Audio Effects Workshop, pp. 254-259, Trondheim,December 1998.[Roa96]C. Roads. TheComputerMusic Tutorial. MITPress, 1996.[Roa98] C.Roads.Micro-sound,historyand illusion. In Proc. DAFX-98 DigitalAudio Effects Workshop, pp. 260-269, Barcelona, November 1998.[RW85] S. Roucos and A.M.Wilgus. High quality time-scalemodification forspeech.In Proc. ICASSP, pp. 493-496, 1985.[%h731 P. Schaeffer. La musiqueconcrkte. QSJ No 1287, PUF 1973.[Spr55] A.M.Springer.Einakustischer Zeitregler. GravesanerBlutter, (1):32-27,July 1955.[Spr59] J.M. Springer.AkustischerTempo- undTonlagenregler. GravesanerBlutter, (13):80, 1959.[Tru88] B.Truax.Real-timegranularsynthesiswith a digitalsignalprocessor.Computer Music Journal, 12(2):14-26, 1988.[Tru94]B.Truax. Discoveringinnercomplexity:time-shifting andtransposi-tionwith a real-timegranulationtechnique. ComputerMusicJournal,18(2):28-38, 1994.[TruOOa] B. Truax. h t t p : //www .sfu. ca/”truax/gran.html. Granular synthesis,2000.[TruOOb] B. Truax. h t t p ://www .sfu. ca/-truax/gsample.html. Granulation ofsampled sounds, 2000.[WG94] M. Warstatand T. Gorne. Studiotechnik - HintergrundundPraxiswis-sen. Elektor-Verlag,1994.[Whig91 P. White. Creative Recording, Eflects and Processors. SanctuaryPub-lishing,1999.
254.
Chapter 8

Time-frequency Processing

D. Arfib, F. Keiler, U. Zölzer

8.1 Introduction

This chapter describes the use of time-frequency representations of signals in order to produce transformations of sounds. A very interesting (and intuitive) way of modifying a sound is to make a two-dimensional representation of it, modify this representation in some way and reconstruct a new signal from this representation (see Fig. 8.1). Consequently, a digital audio effect based on time-frequency representations requires three steps: an analysis (sound to representation), a transformation (of the representation) and a resynthesis (getting back to a sound).

Figure 8.1 Digital audio effects based on analysis, transformation and synthesis (resynthesis).

The direct scheme of spectral analysis, transformation and resynthesis will be discussed in section 8.2. We will explore the modification of the magnitude |X(k)| and phase φ(k) of these representations before resynthesis. The analysis/synthesis
scheme is termed the phase vocoder (see Fig. 8.2). The input signal x(n) is multiplied by a sliding window of finite length N, which yields successive windowed signal segments. These are transformed to the spectral domain by FFTs. In this way a time-varying spectrum X(n,k) = |X(n,k)| e^{jφ(n,k)} with k = 0, 1, ..., N−1 is computed for each windowed segment. The short-time spectra can be modified or transformed for a digital audio effect. Then each modified spectrum is applied to an IFFT and windowed in the time domain. The windowed output segments are then overlapped and added, yielding the output signal. It is also possible to complete this time-frequency processing by spectral processing, which is dealt with in the next two chapters.

Figure 8.2 Time-frequency processing based on the phase vocoder: analysis, transformation and synthesis.

8.2 Phase Vocoder Basics

The concepts of short-time Fourier analysis and synthesis have been widely described in the literature [Por76, Cro80, CR83]. We will briefly summarize the basics and define our notation of terms for the application to digital audio effects.
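The analysis-transformation-resynthesis loop of Fig. 8.2 can be sketched in a few lines of Python. This is an illustrative sketch, not code from the book: the naive O(N²) DFT stands in for the FFT, and the window (periodic Hann), hop size (N/4) and normalization constant are my own choices, made so that the squared window overlap-adds to a constant.

```python
import cmath
import math

def dft(x):
    """Naive DFT: X(k) = sum_m x(m) W_N^{mk}, with W_N = exp(-j*2*pi/N)."""
    N = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * m * k / N) for m in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT; returns the real part, since the input frames are real."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * m * k / N) for k in range(N)).real / N
            for m in range(N)]

def phase_vocoder_identity(x, N=64, hop=16):
    """Window -> DFT -> (no-op transform) -> IDFT -> window -> overlap-add."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]  # periodic Hann
    y = [0.0] * (len(x) + N)
    for start in range(0, len(x) - N + 1, hop):
        frame = [x[start + i] * w[i] for i in range(N)]   # analysis window
        X = dft(frame)                                    # short-time spectrum X(n,k)
        # ... an audio effect would modify |X(k)| and/or the phase here ...
        frame = idft(X)                                   # back to the time domain
        for i in range(N):
            y[start + i] += frame[i] * w[i]               # synthesis window + overlap-add
    cola = sum(wi * wi for wi in w) / hop                 # w^2 overlap-adds to this constant
    return [v / cola for v in y[:len(x)]]
```

With the transformation step left empty the scheme is an identity system (away from the signal edges, where fewer windows overlap), which is a useful first test of any phase vocoder implementation.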
Figure 8.3 Sliding analysis window and short-time Fourier transform.

The short-time Fourier transform (STFT) of the signal x(n) is given by

X(n,k) = Σ_{m=−∞}^{∞} x(m) h(n−m) W_N^{mk},   k = 0, 1, ..., N−1,   (8.1)

with W_N = e^{−j2π/N}.
X(n,k) is a complex number and represents the magnitude |X(n,k)| and phase φ(n,k) of a time-varying spectrum with the frequency bin (index) 0 ≤ k ≤ N−1 and time index n. Note that the summation index is m in (8.1). At each time index n the signal x(m) is weighted by a finite length window h(n−m). Thus the computation of (8.1) can be performed by a finite sum over m with an FFT of length N. Figure 8.3 shows the input signal x(m) and the sliding window h(n−m) for three time indices of n. The middle plot shows the finite length windowed segments x(m)·h(n−m). These segments are transformed by the FFT, yielding the short-time spectra X(n,k) given by (8.1). The lower two rows in Fig. 8.3 show the magnitude and phase spectra of the corresponding time segments.

8.2.1 Filter Bank Summation Model

The computation of the time-varying spectrum of an input signal can also be interpreted as a parallel bank of N bandpass filters, as shown in Fig. 8.4, with impulse responses and Fourier transforms given by

h_k(n) = h(n) W_N^{−nk},   (8.4)
H_k(e^{jΩ}) = H(e^{j(Ω−Ω_k)}),   Ω_k = 2πk/N.   (8.5)

Each bandpass signal y_k(n) is obtained by filtering the input signal x(n) with the corresponding bandpass filter h_k(n). Since the bandpass filters are complex-valued, we get complex-valued output signals y_k(n), which will be denoted by

y_k(n) = X̃(n,k) = |X̃(n,k)| e^{jφ̃(n,k)}.   (8.6)

These filtering operations are performed by the convolutions

y_k(n) = Σ_{m=−∞}^{∞} x(m) h_k(n−m) = Σ_{m=−∞}^{∞} x(m) h(n−m) W_N^{−(n−m)k}   (8.7)
       = W_N^{−nk} Σ_{m=−∞}^{∞} x(m) h(n−m) W_N^{mk} = W_N^{−nk} X(n,k).   (8.8)

From (8.6) and (8.8) it is important to notice that

X̃(n,k) = W_N^{−nk} X(n,k) = W_N^{−nk} |X(n,k)| e^{jφ(n,k)},   (8.9)
φ̃(n,k) = (2πk/N) n + φ(n,k).   (8.10)

Based on equations (8.7) and (8.8) two different implementations are possible, as shown in Fig. 8.4. The first implementation is the so-called complex baseband implementation according to (8.8). The baseband signals X(n,k) (short-time Fourier transform) are computed by modulation of x(n) with W_N^{nk} and lowpass filtering for each channel k. The modulation of X(n,k) by W_N^{−nk} yields the bandpass signal X̃(n,k).
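Equation (8.8) can be verified numerically: filtering x(n) with the complex bandpass filter h_k(n) of (8.4) produces, sample by sample, the modulated baseband STFT value W_N^{−nk} X(n,k). The sketch below uses direct summation; the function names and the test signal are mine, not from the book.

```python
import cmath
import math

def stft_bin(x, h, n, k, N):
    """Baseband STFT sample X(n,k) = sum_m x(m) h(n-m) W_N^{mk}  (eq. 8.1)."""
    W = cmath.exp(-2j * math.pi / N)
    return sum(x[m] * h[n - m] * W ** (m * k)
               for m in range(len(x)) if 0 <= n - m < len(h))

def bandpass_out(x, h, n, k, N):
    """Bandpass-filter output y_k(n) = sum_m x(m) h(n-m) W_N^{-(n-m)k}  (eq. 8.7)."""
    W = cmath.exp(-2j * math.pi / N)
    return sum(x[m] * h[n - m] * W ** (-(n - m) * k)
               for m in range(len(x)) if 0 <= n - m < len(h))
```

Both routes give the same bandpass sample: `bandpass_out(x, h, n, k, N)` equals `W_N**(-n*k) * stft_bin(x, h, n, k, N)`, which is exactly the factorization in (8.8).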
The second implementation is the so-called complex bandpass implementation, which filters the input signal with h_k(n) given by (8.4), as shown in the lower
left part of Fig. 8.4. This implementation leads directly to the complex-valued bandpass signals X̃(n,k). If the equivalent baseband signals X(n,k) are necessary, they can be computed by multiplication with W_N^{nk}. The operations for the modulation by W_N^{nk} yielding X(n,k) and back modulation by W_N^{−nk} (lower left part of Fig. 8.4) are only shown to point out the equivalence of both implementations.

Figure 8.4 Filter bank description of the short-time Fourier transform. Two implementations of the kth channel are shown in the lower left part. The discrete-time and discrete-frequency plane is shown in the right part. The marked bandpass signals y_k(n) are the horizontal samples X̃(n,k). The different frequency bands Y_k corresponding to each bandpass signal are shown on top of the filter bank. The frequency bands for the baseband signal X(n,k) and the bandpass signal X̃(n,k) are shown in the lower right part.

The output sequence y(n) is the sum of the bandpass signals according to

y(n) = Σ_{k=0}^{N−1} y_k(n) = Σ_{k=0}^{N−1} X̃(n,k) = Σ_{k=0}^{N−1} X(n,k) W_N^{−nk}.   (8.11)
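A consequence of (8.11) worth checking numerically (my own derivation, not a claim from the book): for a window no longer than N samples, summing all bandpass signals returns a scaled copy of the input, y(n) = N·h(0)·x(n), because Σ_{k=0}^{N−1} W_N^{−(n−m)k} equals N when n−m is a multiple of N and zero otherwise. The sketch uses a rectangular window for illustration, so h(0) = 1.

```python
import cmath
import math

def filter_bank_sum(x, h, n, N):
    """y(n) = sum_k X~(n,k), with X~(n,k) = sum_m x(m) h(n-m) W_N^{-(n-m)k}  (eq. 8.11)."""
    W = cmath.exp(-2j * math.pi / N)
    y = 0j
    for k in range(N):
        y += sum(x[m] * h[n - m] * W ** (-(n - m) * k)
                 for m in range(len(x)) if 0 <= n - m < len(h))
    return y

# With a rectangular window of length N, only the m = n term survives the
# sum over k, so the filter bank output is N * h(0) * x(n) = N * x(n).
```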
bands shown in the upper part of Fig. 8.4. The property X̃(n,k) = X̃*(n,N−k) together with the channel stacking can be used for the formulation of real-valued bandpass signals (real-valued kth channel):

ỹ_k(n) = X̃(n,k) + X̃(n,N−k) = X̃(n,k) + X̃*(n,k)   (8.12)
       = |X̃(n,k)| [ e^{jφ̃(n,k)} + e^{−jφ̃(n,k)} ]   (8.13)
       = 2 |X̃(n,k)| cos(φ̃(n,k)),   k = 1, ..., N/2 − 1.   (8.14)

Besides a dc and a highpass channel
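The identity (8.12)-(8.14) is the statement that a complex number plus its conjugate is twice its real part, i.e. 2|z|cos(arg z). A minimal numerical check (the sample value below is an arbitrary, hypothetical X̃(n,k), not data from the book):

```python
import cmath
import math

def real_channel(Xt):
    """y~_k(n) = X~(n,k) + X~*(n,k) = 2 |X~(n,k)| cos(phi~(n,k))  (eqs. 8.12-8.14)."""
    return (Xt + Xt.conjugate()).real

# A hypothetical bandpass sample X~(n,k); any complex value illustrates the identity.
Xt = 0.8 * cmath.exp(1j * 0.37)
assert abs(real_channel(Xt) - 2 * abs(Xt) * math.cos(cmath.phase(Xt))) < 1e-12
```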