Chapter 1: Microphone Technology
The microphone is the front-end of almost all sound engineering
activities and, as the interface between real acoustic sound travelling in
air and the sound engineering medium of electronics, receives an
immense amount of attention. Sometimes one could think that the status
of the microphone has been raised to almost mythological proportions. It
is useful therefore to put things in their proper perspective: there are a
great many microphones available that are of professional quality.
Almost any of them can be used in a wide variety of situations to record
or broadcast sound to a professional standard. Of course different makes
and types of microphones sound different to each other, but the
differences don't make or break the end product, at least as far as the
listener is concerned.
Now, if you want to talk about something that really will make or break
the end product, that is how microphones are used. Two sound engineers
using the same microphones will instinctively position and direct them
differently and there can be a massive difference in sound quality. Give
these two engineers other mics, whose characteristics they are familiar
with, and the two sounds achieved will be identifiable according to
engineer, and not so much according to microphone type.
There are two ways we can consider microphones, by construction and
by directional properties. Let's look at the different ways a microphone
can be made, to start off with.
There are basically three types of microphone in common use:
piezoelectric, dynamic and capacitor. The piezoelectric mic, it has to be
said, has evolved into a very specialized animal, but it is still commonly
found under the bridge of an electro-acoustic guitar, so it is worth
covering.
The piezoelectric effect is where certain crystalline and ceramic materials
have the property of generating an electric voltage when pressure or a
bending force is applied. This makes them sensitive to acoustic vibrations
and they can produce a voltage in response to sound. Piezo mics (or
transducers as they may be called - a transducer is any device that
converts one form of energy to another) are high impedance. This means
that they can produce voltage but very little current. To compensate for
this, a preamplifier has to be placed very close to the transducer. This
will usually be inside the body of the electro-acoustic guitar. The preamp
will run for ages on a 9 volt alkaline battery, but it is worth remembering
that if an electro-acoustic guitar, or other instrument with a piezo
transducer, sounds distorted, it is almost certainly the battery that needs
replacing, perhaps after a year or more of service.
The dynamic mic is ‘dynamic’ as in ‘dynamo’. The dynamo is a device for converting
rotational motion into an electric current and consists of a coil of wire
that rotates inside the field of a magnet. Re-configure these components
and you have a coil of wire attached to a thin, lightweight diaphragm that
vibrates in response to sound. The coil in turn vibrates within the field of
the magnet and a signal is generated in proportion to the acoustic
vibration the mic receives. The dynamic mic is also sometimes known as
the moving coil mic, since it is always the coil that moves, not the
magnet - even though that would be possible.
The dynamic mic produces a signal that is healthy in both voltage and
current. Remember that it is possible to exchange voltage for current, and
vice versa, using a transformer. All professional dynamic mics
incorporate a transformer that gives them an output impedance of
somewhere around 200 ohms. This is a fairly low output impedance that
can drive a cable of 100 meters or perhaps even more with little loss of
high frequency signal (the resistance of a cable attenuates all frequencies
equally, the capacitance of a cable provides a path between signal
conductor and earth conductor through which high frequencies can
‘leak’). It is not necessary therefore to have a preamplifier close to the
microphone, neither does the mic need any power to operate. Examples
of dynamic mics are the famous Shure SM58 and the Electrovoice RE20.
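The remark about cable capacitance can be made concrete with a rough calculation. The sketch below treats the mic's output impedance and the total cable capacitance as a simple first-order RC low-pass filter. The figure of 100 pF per meter and the 50,000 ohm comparison source are assumptions chosen for illustration, not values given in the text.

```python
import math

def cable_cutoff_hz(source_impedance_ohms, cable_length_m, capacitance_pf_per_m=100.0):
    """Return the -3 dB frequency of the RC low-pass filter formed by the
    source impedance and the total cable capacitance (assumed 100 pF/m)."""
    total_capacitance_f = cable_length_m * capacitance_pf_per_m * 1e-12
    return 1.0 / (2.0 * math.pi * source_impedance_ohms * total_capacitance_f)

# 200 ohm dynamic mic driving 100 m of cable
low_z = cable_cutoff_hz(200.0, 100.0)
# Hypothetical high-impedance source driving the same cable, for comparison
high_z = cable_cutoff_hz(50_000.0, 100.0)

print(f"200 ohm source:    cutoff about {low_z:.0f} Hz")
print(f"50,000 ohm source: cutoff about {high_z:.0f} Hz")
```

Under these assumptions the 200 ohm source keeps the cutoff well above the audio band, while the high-impedance source would lose most of the high frequencies over the same cable. This is why a high-impedance transducer needs its preamplifier close by.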
The characteristics of the dynamic mic are primarily determined by the
weight of the coil slowing down the response of the diaphragm. The
sound can be good, particularly on drums, but it is not as crisp and clear
as it would have to be to capture delicate sounds with complete accuracy.
Dynamic microphones have always been noted for providing good value
for money, but other types are now starting to challenge them on these
grounds.
There is a variation of the dynamic mic known as the ribbon microphone.
In place of the diaphragm and coil there is a thin corrugated metal ribbon.
The ribbon is located in the field of a magnet. When the ribbon vibrates in
response to sound it acts as a coil, albeit a coil with only one turn. Since
the ribbon is very light, it has a much clearer sound than the conventional
dynamic, and it is reasonable to say that many engineers could identify
the sound of a ribbon mic without hesitation. If the ribbon has a problem,
it is that the output of the single-turn ‘coil’ is very low. The ribbon does
however also have a low impedance and provides a current which the
integral transformer can step up so that the voltage output of a modern
ribbon mic can be comparable with a conventional dynamic. Examples of
ribbon mics are the Coles 4038 and Beyerdynamic M130.
The capacitor mic, formerly known as the ‘condenser mic’, works in a
completely different way to the dynamic. Here, the diaphragm is
paralleled by a ‘backplate’. Together they form the plates of a capacitor.
A capacitor, of any type, works by storing electrical charge. Electrical
charge can be thought of as quantity of electrons (or the quantity of
electrons that normally would be present, but aren't). The greater the
disparity in number of electrons present – i.e. the amount of charge – the
higher will be the voltage across the terminals of the capacitor. There is
a simple relationship between them:
Q = C x V
charge = capacitance x voltage
Note that charge is abbreviated as ‘Q’, because ‘C’ is already taken by
capacitance.
Putting this another way round:
V = Q/C
voltage = charge / capacitance
Now the tricky part: capacitance varies according to the distance between
the plates of the capacitor. The charge, as long as it is either continuously
topped up or not allowed to leak away, stays constant. Therefore as the
distance between the plates is changed by the action of acoustic
vibration, the capacitance will change and so must the voltage between
the plates. Tap off this voltage and you have a signal that represents the
sound hitting the diaphragm of the mic.
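As a rough numerical illustration of V = Q/C, the sketch below models the capsule as an ideal parallel-plate capacitor, whose capacitance is proportional to plate area divided by plate spacing. The parallel-plate formula is a standard idealization rather than something stated in the text, and the plate area, spacing and 48 V charging voltage are hypothetical values chosen only to show the trend.

```python
EPSILON_0 = 8.854e-12  # permittivity of free space, in farads per meter

def plate_capacitance(area_m2, gap_m):
    """Capacitance of an ideal parallel-plate capacitor with an air gap."""
    return EPSILON_0 * area_m2 / gap_m

def voltage_at_constant_charge(charge_c, area_m2, gap_m):
    """V = Q / C, with C recomputed for the current plate spacing."""
    return charge_c / plate_capacitance(area_m2, gap_m)

# Hypothetical capsule: 2 square centimeters of plate area, 25 micrometer gap
area = 2.0e-4
rest_gap = 25e-6

# Charge the capsule to an assumed 48 V at rest, then hold the charge constant
charge = plate_capacitance(area, rest_gap) * 48.0

for displacement in (-1e-6, 0.0, 1e-6):
    volts = voltage_at_constant_charge(charge, area, rest_gap + displacement)
    print(f"gap change {displacement * 1e6:+.0f} um -> {volts:.2f} V")
```

With the charge fixed, the voltage rises and falls in direct proportion to the plate spacing, which is the signal the internal amplifier picks off.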
Sennheiser MKH 40
The great advantage of the capacitor mic is that the diaphragm is
unburdened by a coil of any sort. It is light and very responsive to the
most delicate sound. The capacitor mic is therefore much more accurate
and faithful to the original sound than the dynamic. Of course there is a
downside too. This is that the impedance of the capsule (the part of any
mic that collects the sound) is very high. Not just high - very high. It also
requires continually topping up with charge to replace that which
naturally leaks away to the atmosphere. A capacitor mic therefore needs
power for these two reasons: firstly to power an integral amplifier, and
secondly to charge the diaphragm and backplate.
Old capacitor mics used to have bulky and inconvenient power supplies.
These mics are still in widespread use so you would expect to come
across them from time to time. Modern capacitor mics use phantom
power. Phantom power, supplied from within the mixing console or
remote preamplifier, places +48 V on both of the signal-carrying
conductors of the microphone cable and 0 V on the earth conductor. So, simply by
connecting a normal mic cable, phantom power is connected
automatically. That's why it is called ‘phantom’ – because you don't see
it! In practice this is no inconvenience at all. You have to remember to
switch it on at the mixing console but that's pretty much all there is to it.
Dynamic mics of professional quality are not bothered by the presence of
phantom power in any way. One operational point that is important
however is that the fader must be all the way down when a mic is
connected to an input providing phantom power, or when phantom power
is switched on. Otherwise a sharp crack of speaker-blowing proportions
can result.
A capacitor microphone often incorporates a switched -10 dB or -20 dB
pad, which is an attenuator placed between the capsule and the amplifier
to prevent clipping on loud signals.
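To see what a pad does to the signal voltage, decibels can be converted to a voltage ratio with the standard relation ratio = 10^(dB/20). The short sketch below applies it to the two pad values mentioned; the function name is purely illustrative.

```python
def db_to_voltage_ratio(db):
    """Convert a level change in decibels to a voltage ratio."""
    return 10.0 ** (db / 20.0)

for pad_db in (-10.0, -20.0):
    ratio = db_to_voltage_ratio(pad_db)
    print(f"{pad_db:+.0f} dB pad -> capsule voltage multiplied by {ratio:.3f}")
```

A -20 dB pad therefore passes one tenth of the capsule voltage to the amplifier, and a -10 dB pad a little under one third.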
The electret mic is a form of capacitor microphone. However the charge
is permanently locked into the diaphragm and backplate, just as magnetic
energy is locked into a magnet. Not all materials are suited to forming
electrets, so it is usually considered that the constraints involved in
manufacture compromise sound quality. However, it has to be said that
there are some very good electret mics available, most of which are
back-electrets, meaning that only the backplate of the capacitor is an
electret, so the diaphragm can be made of any suitable material. Electret
mics do still need power for the internal amplifier. However, this can
take the form of a small internal battery, which is sometimes convenient.
Electret mics that have the facility for battery power can also usually be
phantom powered, in case the battery runs down or isn’t fitted.
The directional characteristics of microphones can be described in terms
of a family of polar patterns. The polar pattern is a graph showing the
sensitivity in a full 360 degree circle around the mic. I say a family of
polar patterns but it really is a spectrum with omnidirectional at one
extreme and figure-of-eight at the other. Cardioid and hypercardioid are
simply convenient way points.
To explain these patterns further, fairly obviously an omnidirectional mic
is equally sensitive all round. A cardioid is slightly less obvious. The
cardioid is most sensitive at the front, but is only 6 dB down in response
at an angle of 90 degrees. In fact it is only insensitive right at the back. It
is not at all correct, as commonly happens, to call this a unidirectional
microphone. The hypercardioid is a more tightly focussed pattern than
the cardioid, at the expense of a slight rear sensitivity, known as a lobe in
the response. The figure-of-eight is equally sensitive at front and back,
the only difference being that the rear produces an inverted signal, 180
degrees out of phase with the signal from the front.
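The idea of a spectrum of patterns can be sketched with the usual idealized first-order model, in which sensitivity at angle θ is a + (1 - a) cos θ. This formula and the mixing values used for each pattern are the standard textbook idealization rather than anything stated above, and real microphones only approximate them. A negative result means the output is inverted.

```python
import math

# Mixing value a: 1.0 is pure pressure (omni), 0.0 is pure pressure gradient
PATTERNS = {
    "omnidirectional": 1.0,
    "cardioid": 0.5,
    "hypercardioid": 0.25,
    "figure-of-eight": 0.0,
}

def sensitivity(pattern, angle_degrees):
    """Relative sensitivity of an ideal first-order pattern (1.0 on axis)."""
    a = PATTERNS[pattern]
    return a + (1.0 - a) * math.cos(math.radians(angle_degrees))

def level_db(pattern, angle_degrees):
    """Level relative to the on-axis response, in dB (minus infinity at a null)."""
    magnitude = abs(sensitivity(pattern, angle_degrees))
    return -math.inf if magnitude < 1e-12 else 20.0 * math.log10(magnitude)

for name in PATTERNS:
    levels = ", ".join(f"{angle} deg: {level_db(name, angle):.1f} dB"
                       for angle in (0, 90, 180))
    print(f"{name:16s} {levels}")
```

In this model the cardioid is about 6 dB down at 90 degrees and has its null directly behind, the hypercardioid shows an inverted rear lobe, and the figure-of-eight has nulls at the sides with an inverted rear response of equal strength to the front.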
All of this is nice in theory, but is almost never borne out in practice.
Take a nominally cardioid mic for example. It may be an almost perfect
cardioid at mid frequencies, but at low frequencies the pattern will spread
out into omni. At high frequencies the pattern will tighten into
hypercardioid. The significant knock-on effect of this is that the
frequency response off-axis – in other words any direction but head on –
is never flat. In fact the off-axis response of most microphones is nothing
short of terrible and the best you can hope for is a smooth roll-off of
response from LF to HF. Often though it is very lumpy indeed. We will
see how this affects the use of microphones at another time.
Looking at directional characteristics from a more academic standpoint,
the omnidirectional microphone is sensitive to the pressure of the sound
wave. The diaphragm is completely enclosed, apart from a tiny slow-
acting air-pressure equalizing vent, and the mic effectively compares the
changing pressure of the outside air under the influence of the sound
signal with the constant pressure within. Pressure acts equally in all
directions, therefore the mic is equally sensitive in all directions, in
theory as we said. In practice, at higher frequencies where the size of the
mic starts to become significant in comparison with the wavelength, the
diaphragm will be shielded from sound approaching from the rear and
rearward HF response will drop.
At the other end of the spectrum of polar patterns the figure-of-eight
microphone is sensitive to the pressure gradient of the sound wave. The
diaphragm is completely open to the air at both sides. Even though it is
very light and thin, there is a difference in pressure at the front and rear
of the diaphragm, and the microphone is sensitive to this difference. The
pressure gradient is greatest for sound arriving directly from the front or
rear, and lessens as the sound source moves round to the side. When the
sound source is exactly at the side of the diaphragm it produces equal
pressure at front and back, therefore there is no pressure gradient and the
microphone produces no output. Therefore the figure-of-eight
microphone is not sensitive at the sides. (You could also imagine that a
sound wave would find it hard to push the diaphragm sideways –
sometimes the intuitive explanation is as meaningful as the scientific
one.)
All directional microphones exhibit a phenomenon known as the
proximity effect or bass tip-up. The explanation for this is sufficiently
complicated to fall outside of the required knowledge of the working
sound engineer. The practical consequences are that close miking results
in enhanced low frequency. This produces a signal that is not accurate,
but it is often thought of as being ‘warmer’ than the more objectively
accurate sound of an omnidirectional microphone.
Cardioid and Hypercardioid
To produce the in-between polar patterns one could consider the
omnidirectional microphone where the diaphragm is open on one side
only, and the figure-of-eight microphone where the diaphragm is
completely open on both sides. Allowing partial access only to one side
of the diaphragm would therefore seem to be a viable means of
producing the in-between patterns, and indeed it is. A cardioid or
hypercardioid mic therefore provides access to the rear of the diaphragm
through a carefully designed acoustic labyrinth. Unfortunately the effect
of the acoustic labyrinth is difficult to equalize for all frequencies,
therefore one would expect the polar response of cardioid and
hypercardioid microphones to be inferior to that of omnidirectional and
figure-of-eight microphones.
There are many microphones available that can produce a selection of
polar patterns. This is achieved by mounting two diaphragms back-to-
back with a single central backplate. By varying the relative polarization
of the diaphragms and backplate, any of the four main polar patterns can
be created. It is often thought that the best and most accurate
microphones are the true omnidirectional and the true figure-of-eight,
and that mimicking these patterns with a multipattern mic is less than
optimal. Nevertheless, in practice multipattern mics are so versatile that
they are commonly the mic of first choice for many engineers.
Special Microphone Types
Two capsules may be combined into a single housing so that one mic can
capture both left and right sides of the sound field. This is much more
convenient than setting two mics on a stereo bar, but obviously less
flexible. Some stereo mics use the MS principle where one cardioid
capsule (M) captures the full width of the sound stage while the other
figure-of-eight capsule (S) captures the side-to-side differences. The MS
output can be processed to give conventional left and right signals.
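The processing of MS signals into left and right is usually a sum and difference: left is M plus S, and right is M minus S. The sketch below shows this on plain lists of sample values. The function names are illustrative, and the scaling is an assumption, since practical decoders often apply a gain factor and let the engineer vary the amount of S to adjust the stereo width.

```python
def ms_to_lr(mid, side):
    """Decode M and S sample lists into left and right by sum and difference."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def lr_to_ms(left, right):
    """Inverse operation: recover M and S from left and right."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

mid_samples = [1.0, 0.5, -0.25]
side_samples = [0.25, -0.5, 0.0]

left, right = ms_to_lr(mid_samples, side_samples)
print("left: ", left)
print("right:", right)
```

Where S is zero the left and right outputs are identical, which corresponds to a source in the center of the sound stage.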
Neumann stereo microphones
Interference Tube Microphone
This is usually known as a shotgun or rifle mic because of its similarity
in appearance to a gun barrel. The slots in the barrel allow off-axis sound
to cancel giving a highly directional response. The longer the mic, the
more directional it is. The sound quality of these microphones is inferior
to normal mics so they are only used out of necessity.
Sennheiser interference tube microphone
A close relation of the interference tube microphone is the parabolic
reflector mic. This looks like a satellite dish antenna and is used for
recording wildlife noises, and at sports events to capture comments from
Boundary Effect Microphone
The original boundary effect microphone was the Crown PZM (Pressure
Zone Microphone) so the boundary effect microphone is often referred to
generically as the PZM. In this mic, the capsule is mounted close to a flat
metal plate, or inset into a wooden or metal plate. Instead of mounting it
on a stand, it is taped to a flat surface. One of the main problems in the
use of microphones is reflections from nearby flat surfaces entering the
mic. By mounting the capsule within around 7 mm from the surface,
these reflections add to the signal in phase rather than interfering with it.
The characteristic sound of the boundary effect microphone is therefore
very clear (as long as there are no other nearby reflecting surfaces). It can
be used for many types of recording, and can also be seen in police
interview rooms where obviously a clear sound has to be captured for the
interview recording. The polar response is hemispherical.
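The benefit of mounting the capsule very close to the surface can be estimated with a simplified calculation. The sketch below assumes a perfectly reflecting surface, sound arriving perpendicular to it (the worst case), and a speed of sound of 343 m/s; none of these figures come from the text. The reflection travels an extra distance of twice the capsule height, and the first cancellation occurs where that extra distance equals half a wavelength.

```python
SPEED_OF_SOUND = 343.0  # meters per second, assumed room-temperature value

def first_notch_hz(capsule_height_m):
    """Lowest frequency at which the surface reflection cancels the direct
    sound, for sound arriving perpendicular to the surface."""
    path_difference_m = 2.0 * capsule_height_m
    # Cancellation occurs when the extra path equals half a wavelength
    return SPEED_OF_SOUND / (2.0 * path_difference_m)

for height_m in (1.0, 0.1, 0.007):
    print(f"capsule {height_m * 1000:.0f} mm from surface: "
          f"first cancellation near {first_notch_hz(height_m):.0f} Hz")
```

Under these assumptions a mic a meter from the surface suffers its first cancellation in the bass, whereas at 7 mm the first cancellation is pushed up to the top of the audio band, so the direct and reflected sound reinforce each other over most of the range.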
Crown PZM microphone
The miniature microphone is sometimes known as a ‘tie-clip’ mic, although it is rarely ever
clipped to the tie these days. This type of mic is usually of the electret
design, which lends itself to very compact dimensions, and is almost
always omnidirectional. Miniature microphones are used in television
and in theater, where there is a requirement for microphones to be
unobtrusive. Since the diaphragm is small and not in contact with many
air molecules, the random vibration of the molecules does not cancel out
as effectively as it does in a microphone with a larger diaphragm.
Miniature microphones therefore have to be used close to the sound
source; otherwise noise will be evident.
For popular music vocals it is common to use a large-diaphragm mic,
often an old tube model. A large diaphragm mic generally has a less
accurate sound than a mic with a diaphragm 10-12 mm or so in diameter.
The off-axis response will tend to be poor. Despite this, models such as
the Neumann U87 are virtually standard in this application due to their
enhanced subjective ‘warmth’ and ‘presence’.
First in the catalogue of microphone accessories is the mic support.
These can range from table stands, short floor stands, normal boom
stands, tall stands up to 4 meters for orchestral recording, fishpoles as
used by video and film sound recordists, and long booms with cable
operated mic positioning used in television studios. Attaching the mic to
the stand is a mount that can range from a basic plastic clip, to an elastic
suspension or cradle that will isolate the microphone from floor noise.
The other major accessory is the windshield or pop-shield. A windshield
may be made out of foam and slipped over the mic capsule, or it may
look like a miniature airship covered with wind-energy dissipating
material. For blizzard conditions windshield covers are available that
look as though they are made out of yeti fur. The pop-shield, on the other
hand, is a fine mesh material stretched over a metal or plastic hoop, used
to filter out the blast of air caused by a voice artist's or singer's ‘P’ and ‘B’
sounds.
• What is the piezoelectric effect?
• Where would you find a piezo-electric transducer?
• What is attached to the diaphragm of a dynamic microphone?
• What passive circuit component is incorporated in the output stage
of all professional microphones? (Note that some microphones use
an active circuit to imitate the action of this component).
• Describe the sound of a dynamic microphone.
• How does a ribbon microphone differ from an ordinary dynamic
microphone?
• What is the old term for 'capacitor microphone'?
• Why does the capacitor microphone have a more accurate sound
than a dynamic microphone?
• Why does a capacitor microphone need to be powered (two
reasons)?
• What precaution should you take when switching on phantom
power?
• Can dynamic microphones of professional quality be used with
phantom power switched on?
• What is a pad?
• Why does an electret microphone need to be powered?
• Describe the actual polar response of a typical nominally
cardioid microphone.
• Describe the proximity effect.
• What is an 'acoustic labyrinth', as applied to microphones?
• Why does a boundary effect microphone give a clear sound?
• Why are large-diaphragm microphones used for popular music
vocals?
• Describe the differences between wind shields and pop shields.
Chapter 2: The Use of Microphones
Use of Microphones for Speech
In sound engineering, as opposed to communications which will not be
considered here, there are commonly considered to be three classes of
sound: speech or dialogue, music and effects. Each has its own
considerations and requirements regarding the use of microphones.
There are a number of scenarios where speech may be recorded,
broadcast or amplified:
• Audio book
• Radio presentation, interview or discussion
• Television presentation, interview or discussion
• News reporting
• Sports commentary
• Film and television drama
In some of these, the requirement is for speech that is as natural as
possible. In an ideal world perhaps it should even sound as though a real
person were in the same room. The audio book is in this category, as are
many radio programs. There is a qualification however on the term
‘natural’. Sometimes what we regard as a natural sound is the sound that
we expect to hear via a loudspeaker, not the real acoustic sound of the
human voice. We have all been conditioned to expect a certain quality of
sound from our stereos, hifis, radio and television receivers, and when we
get it, it sounds natural, even if it isn’t in objective terms. In the recording
and most types of broadcasting of speech there are some definite
requirements:
• No pops on ‘P’ or ‘B’ sounds.
• No breath noise or ‘blasting’
• Little room ambience or reverberation
• A pleasing tone of voice
Popping and blasting can be prevented in two ways. One is to position
the microphone so that it points at the mouth, but is out of the direct line
of fire of the breath. So often we see microphones used actually in the
line of fire of the breath that it seems as though it is simply the ‘correct’
way to use a microphone. It can be for public address, but it isn’t for
broadcasting or recording. The other way is to use a pop shield. Ideally
this is an open mesh stocking-type material stretched over a metal or
plastic hoop. This can be positioned between the mouth and the
microphone and is surprisingly effective in absorbing potential pops and
blasts. Sometimes a foam windshield of the type that slips over the end of
the microphone is used for this purpose. A windshield is really what it
says, and is not 100% effective for pops, although it is visually
unobtrusive, which has value, for example, in a radio discussion where hoop-type
pop shields would mar face-to-face visual communication among the
participants.
The requirement for little room ambience or reverberation is handled by
placing the microphone quite close to the mouth – around 30 to 40 cm. If
the studio is acoustically treated, this will work fine. Special acoustic
tables are also available which absorb rather than reflect sound from their
surfaces.
‘A pleasing tone of voice’? Well, first choose your voice talent. Second,
it is a fact that some microphones flatter the voice. Some work
particularly well for speech, and there are some classic models such as
the Electrovoice RE20 that are commonly seen in this application.
Generally, one would be looking for a large-diaphragm capacitor
microphone, or a quality dynamic microphone for natural or pleasing
speech for audio books or radio broadcasting.
In television broadcasting, one essential requirement is that the microphone
should be out of shot or unobtrusive. The usual combination for a news
anchor, for example, is to have a miniature microphone attached to the
clothing in the chest area, backed up by a conventional mic on a desk
stand. Often the conventional mic is held on stand-by to be brought on
quickly if the miniature mic fails, as they are prone to do through constant
handling. Oddly enough, the use of microphones on television varies
according to geography. In France for example, it is quite common for a
television presenter to hand hold a microphone very close to the mouth.
Even a discussion can take place with three or four people each holding a
microphone. The resultant sound quality is in accordance with French
subjective requirements. Radio microphones are commonly used in
television to give freedom of movement and also freedom from cables on
the floor, leaving plenty of free space for the cameras to roll around.
For news reporting, a robust microphone – perhaps a short shotgun – can
be used with a general-purpose foam windshield for both the reporter and
interviewee, should there be one. Such a microphone is easily pointable
(the reporter isn’t a sound engineer) and brings home good results
without any trouble. The sound quality of a news report may not be all
that could be imagined, but a little bit of harshness or degradation
sometimes, oddly, makes the report more ‘authentic’.
Sports commentary is a very particular requirement. This often takes
place in a noisy environment so the microphone must be adapted to cope
with this. The result is a mic that has a heavily compromised sound
quality, but this has come to be accepted as the sound of sports
commentary so it is now a requirement. The Coles 4104 is an example of
a 1950s design that is still widely used. It is a noise-cancelling
microphone that almost completely suppresses background noise, and the
positioning bar on the top of the mic ensures that the commentator
always holds it in the correct position (as, indeed, it is always held - sports
commentators often like to move around in their commentary box as they
commentate).
Film and Television Drama
For film and television drama, a fishpole (or boom as it is sometimes
known) topped by a shotgun or rifle mic with a cylindrical windshield is
the norm. The operator can position and angle the mic to get the best
quality dialogue (while monitoring on headphones), while keeping the
mic – and the shadow of the mic – out of shot. Miniature microphones
are also used in this context, often with radio transmitters. Obviously
they must not be visible at all. However, concealing the mic in the
costume can affect sound quality so care must be taken.
Sometimes in the studio a microphone might be mounted on a large floor
mounted boom that can extend over several meters (we’re not in fishing
country anymore). In this case the boom operator has winches to point
and angle the microphone.
In theatre the choice is between personal miniature microphones with
radio transmitters, or area miking from the front and sides of the stage.
Personal microphones allow a higher sound level before feedback since
they are close to the actor’s mouth. For straight drama, it isn’t necessary
to have a high sound level in the auditorium. In fact in most theatres it is
perfectly acceptable for the sound of the actors’ voices to be completely
unamplified. However if amplification, or reinforcement, is to be used
then area miking is usually sufficient. Shotgun or rifle mics are
positioned at the front of the stage (an area sometimes known for
traditional reasons as ‘the floats’, therefore the mics are sometimes called
‘float mics’) to create sensitive spots on stage from which the actors can
easily be heard. The drawback is that there will be positions on the stage
from which the actors cannot be heard. The movements of the actors
have to be planned to take account of this.
I use this term loosely to cover everything from company boardrooms to
political party conferences. You will see that there can be a vast
difference in scale. In the boardroom it has become common to use
gooseneck microphones or boundary effect microphones that are
specifically designed for that purpose. This lies beyond what we
normally consider to be sound engineering and is categorized in the
specialist field of sound installation. The party conference is another
matter. To achieve reasonably high sound levels the microphone has to
be close to the mouth, yet the candidate – for obvious reasons – does not
want to look like a microphone-swallowing rock star. Therefore the
microphone has to be unobtrusive so that it can be placed fairly close to
the mouth without drawing undue attention to itself (the cluster of
broadcasters’ microphones in front of the lectern is another matter, but
they don’t have to be so close). The AKG C747 is very suitable for this
purpose.
You will have noticed that in this context microphones are often used in
pairs. There are two schools of thought on this issue. One is that the
microphones should point inwards from the front corners of the lectern.
This allows the speaker to turn his or her head and still receive adequate
pickup. Unfortunately, as the head moves, both microphones can pick up
the sound while the sound source – the mouth – is moving towards one
mic and away from the other. The Doppler effect comes into play and
two slightly pitch shifted signals are momentarily mixed together. It
sounds neither pleasant nor natural. The alternative approach is to mount
both microphones centrally and use one as a backup. The speaker will
learn, through not hearing their voice coming back through the PA
system, that they can only turn so far before useful pickup is lost.
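The size of the pitch shift can be estimated with the standard Doppler formula for a moving source and a stationary receiver. The sketch below is only indicative: the speed of sound of 343 m/s and the mouth speed of 0.5 m/s during a head turn are assumed values, and it treats the mouth as moving directly towards one mic and directly away from the other.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second, assumed room-temperature value

def doppler_frequency(source_hz, velocity_towards_mic):
    """Frequency received by a stationary mic from a moving source.
    Positive velocity means the source is approaching the mic."""
    return source_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND - velocity_towards_mic)

def interval_cents(higher_hz, lower_hz):
    """Musical interval between two frequencies, in cents (100 per semitone)."""
    return 1200.0 * math.log2(higher_hz / lower_hz)

mouth_speed = 0.5  # meters per second, hypothetical speed during a head turn
tone = 1000.0      # test frequency in hertz

near = doppler_frequency(tone, +mouth_speed)  # mic the mouth moves towards
far = doppler_frequency(tone, -mouth_speed)   # mic the mouth moves away from

print(f"approached mic: {near:.2f} Hz, receding mic: {far:.2f} Hz")
print(f"difference: {interval_cents(near, far):.1f} cents")
```

The shift in each mic is only a few hertz under these assumptions, but mixing two slightly detuned copies of the same voice produces audible beating and phasing while the head is moving, which is consistent with the unnatural sound described above.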
It is worth saying that in this situation, the person speaking must be able
to hear their amplified voice at the right level. If their voice seems too
loud, to them, they will instinctively back away from the mic. If they
can’t hear their amplified voice they will assume the system isn’t
working. I once saw the chairman of a large and prestigious organisation
stand away from his microphone because he thought it wasn’t working. It
had been, and at the right level for the audience. But unfortunately, apart
from the front few rows, they were unable to hear a single unamplified
word he said.
Use of Microphones for Music
The way in which microphones are used for music varies much more
according to the instrument than it possibly could for speech where the
source of sound is of course always the human mouth. First, some
scenarios:
• Public address
• Recording studio
• Location recording
• Concert hall
• Amplified music venue
The requirements of recording and broadcasting are very similar, except
that broadcasting often works to a more stringent timescale, and in
television broadcasting microphones must be invisible or at least
unobtrusive. There are two golden rules:
Point the microphone at the sound source from the direction of the best
natural listening position.
The microphone will always be closer than a natural comfortable
listening position.
So, wherever you would normally choose to listen from is the right
position for the microphone, except that the microphone has to be closer
because it can’t discriminate direct sound from reflected sound in the
way the human ear/brain can. It is always a good starting point to follow
these two rules, but of course it may not always be possible, practical, or
a natural sound may not be wanted for whatever reason. Broadcasters, by
the way, tend to place the microphone closer than recording engineers.
They need to get a quick, reliable result, and a close mic position is
simply safer for this purpose. Ultimate sound quality is not of such
importance.
The recording studio is a very comfortable environment for microphones.
The engineer is able to use any microphone he or she desires and has
available. The mic may be old, large and ugly, cumbersome to use
perhaps with an external power supply (not phantom) and pattern
selector, prone to faults etc., but if it gets the right sound, then it will be
used. Location recording is not quite so comfortable and you need to be
sure that the microphones are reliable and easy to use, preferably without
external power supplies and with a simple stand mount rather than a
complicated elastic suspension.
As far as comfort goes, the concert hall is a reasonably good place to
record in as at least they are used to the requirements of music (the
owners of many good recording venues often have higher priorities –
religious worship being a prominent example). There are however
restrictions on the placement of microphones during a concert. Usually it
is against fire regulations to have microphones among the audience,
unless the mics are positioned in such a way that they don’t impede
egress and cables are very securely fixed. Generally therefore there will
be a stereo pair of mics slung from the ceiling, supplemented by a
number of mics on stage, which are closer than the engineer would
probably prefer them to be under ideal circumstances.
For amplified music, the problem is always in getting sufficient level
without feedback. This necessitates that microphones are very much
closer than the natural listening position, to the point that natural
direction has very little meaning. The ultimate example would be a
microphone clipped to the bridge or sound hole of a violin. It wouldn’t
even be possible to listen from this position. In rock music PA,
microphones are used as close to the singer’s lips as possible, right
against the grille cloth of a guitarist’s speaker cabinet and within
millimetres of the heads of the drums. Primarily this is to achieve level
without risk of feedback. However this has also come to be understood as
the ‘rock music sound’ because it is what the audience expects. In this
context, the most distant mics would be the drum overhead mics, which
don’t need much gain anyway. For string and wind instruments there are
a variety of clip-on mics available. There are also contact mics that pick
up vibrations directly from the body of the instrument, although even
these are not entirely immune to feedback.
In theatre musicals, the best option for the lead performers is to use
miniature microphones with radio transmitters. The placement of the mic
is significant. The original ‘lavalier’ placement, named for Mme Lavalier
who reportedly wore a large ruby from her neck, has long gone. The
chest position is great for newsreaders but it suffers from the shadow of
the chin and boominess caused by chest resonance. The best place for a
miniature microphone is on a short boom extending from behind the ear.
Mics and booms are available in a variety of flesh colours so they are not
visible to the audience beyond the second or third row. If a boom is not
considered acceptable, then the mic may protrude a short distance from
above the ear, or descending from the hairline. This actually captures a
very good vocal sound. It has to be tried to be believed. One of the
biggest problems with miniature microphones in the theatre is that they
become ‘sweated out’ after a number of performances and have to be
replaced. Still, no-one said that it was easy going on stage. For the
orchestra in a theatre musical, clip-on mics are good for string
instruments. Wind instruments are generally loud enough for
conventional stand mics, closely placed. So-called ‘booth singers’ can
use conventional mics.
Stereo Microphone Techniques
Firstly, what is stereo? The word ‘stereophonic’ in its original meaning
suggests a ‘solid’ sound image and does not specify how many
microphones, channels or loudspeakers are to be used. However, it has
come to mean two channels and two loudspeakers using as few or as
many microphones as are necessary to get a good result. When it
works, you should be able to sit in an equilateral triangle with the
speakers, listen to a recording of an orchestra and pinpoint where every
instrument is in the sound image. (By the way, some people complain
that ‘stereophonic’, as a word, combines both Greek and Latin roots. Just
as well perhaps, because if it had been exclusively Latin it would have
When recording a group of instruments or singers, it is possible to use
just two or three microphones to pick up the entire ensemble in stereo,
and the results can be very satisfying. There are a number of techniques:
• Coincident crossed pair
• Near-coincident crossed pair
• Mercury Living Presence
• Decca Tree
• Spaced omni
The coincident crossed pair technique traditionally uses two figure-of-
eight microphones angled at 90 degrees pointing to the left and right of
the sound stage (and, due to the rear pickup of the figure-of-eight mic, to
the left and right of the area where the audience would be also). More
practically, two cardioid microphones can be used. They would be angled
at 120 degrees were it not for the drop off in high frequency response at
this angle in most mics. A 110-degree angle of separation is a reasonable
compromise. This system was originally proposed in the 1930s and
mathematically inclined audio engineers will claim that this gives perfect
reproduction of the original sound field from a standard pair of stereo
loudspeakers. However perfect the mathematics look on paper, the results
do not bear out the theory. The sound can be good, and you can with
effort tell where the instruments are supposed to be in the sound image.
The problem is that you just don’t feel like you are in the concert hall, or
wherever the recording was made. The fact that human beings do not
have coincident ears might have something to do with it.
Coincident crossed pair
Separating the mics by around 10 cm tears the theory into shreds, but it
sounds a whole lot better.
Near-coincident crossed pair
The ORTF system, named for the Office de Radiodiffusion-Télévision
Française, uses two cardioid microphones spaced at 17 cm angled
outwards at 110 degrees, and is simply an extended near-coincident
The redeeming feature of the coincident crossed pair is that you can mix
the left and right signals into mono and it still sounds fine. Mono, but
fine. We call this mono compatibility and it is important in many
situations – the majority of radio and television listeners still only have
one speaker. The further apart the microphones are spaced, the worse the
mono compatibility, although near-coincident and ORTF systems are still
Mercury Living Presence was one of the early stereo techniques of the
1950s, used for classical music recordings on the Mercury label. If you
imagine trying to figure out how to make a stereo recording when there
was no-one around to tell you how to do it, you might work out that one
microphone pointing left, another pointing center and a third pointing
right might be the way to do it. Record each to its own track on 35mm
magnetic film, as used in cinema audio, and there you have it! Nominally
omnidirectional microphones were used, but of course the early omni
mics did become directional at higher frequencies. Later recordings were
made to two-track stereo. These recordings stand up remarkably well
today. They may have a little noise and distortion, but the sound is
wonderfully clear and alive.
The same can be said of the Decca tree, used by the Decca record
company. This is not dissimilar from the Mercury Living Presence
system but baffles were used between the microphones in some instances
to create separation, and additional microphones might be used where
necessary, positioned towards the sides of the orchestra.
Another obvious means of deploying microphones in the early days of
stereo was to place three microphones spaced apart at the front of the
orchestra, much more distant from each other than in the above systems.
If only two microphones are used spaced apart by perhaps as much as
two meters or more, what happens on playback is that the sound seems to
cluster around the loudspeakers and there is a hole in the middle of the
sound image. To prevent this, a centre microphone can be mixed in at a
lower level so that the ‘hole’ is filled. There is no theory on earth to
explain why this works - being so dissimilar to the human hearing system
- but it can work very well. The main drawback is that a recording made
in such a way sounds terrible when played in mono.
The MS system, as explained previously, uses a cardioid microphone to
pick up an all-round mono signal, and a figure-of-eight mic to pick up the
difference between left and right in the sound field. The M and S signals
can be combined without too much difficulty to provide conventional left
and right signals. This is of practical benefit when it is necessary to
record a single performer in stereo. With a coincident crossed pair, one
microphone would be pointing to the left of the performer, the other
would be pointing to the right. It just seems wrong not to point a
microphone directly at the performer, and with the MS system you do,
getting the best possible sound quality from the mic. It is sometimes
proposed as an advantage of MS that it is possible to control the width of
the stereo image by adjusting the level of the S signal. This is exactly the
same as adjusting the width by turning the mixing console’s panpots for
the left and right signals closer to the centre. Therefore it is in reality no
advantage at all.
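The sum-and-difference combination described above can be sketched in a few lines. This is only an illustration, assuming the usual convention that left is M plus S and right is M minus S. The function names, the `width` factor and the `amount` control are hypothetical and are not taken from any particular mixing console.

```python
def ms_to_lr(mid, side, width=1.0):
    """Decode one M/S sample pair: L = M + width*S, R = M - width*S."""
    return mid + width * side, mid - width * side


def lr_to_ms(left, right):
    """Encode left/right back to M/S (the 0.5 keeps the levels consistent)."""
    return 0.5 * (left + right), 0.5 * (left - right)


def pan_inwards(left, right, amount):
    """Blend each channel towards the centre, as panpots turned inwards would.

    amount = 0 leaves the signals alone, amount = 1 collapses them to mono.
    """
    own = 1.0 - amount / 2.0   # share of a channel's own signal
    other = amount / 2.0       # share taken from the opposite channel
    return own * left + other * right, other * left + own * right


if __name__ == "__main__":
    m, s = 0.8, 0.25
    print(ms_to_lr(m, s))                             # full width
    print(ms_to_lr(m, s, width=0.5))                  # S signal halved
    print(pan_inwards(*ms_to_lr(m, s), amount=0.5))   # same result via panning
```

In this sketch, reducing `width` and panning inwards by `1 - width` give the same left and right signals, which is the point made in the text.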
Binaural stereo attempts to mimic the human hearing system with a
dummy head (sometimes face, shoulders and chest too) with two
omnidirectional microphones placed in artificial ears just like a real
human head. It works well, but only on headphones. A binaural recording
played on speakers doesn’t work because the two channels mix on their
way to the listener, spoiling the effect. There have been a number of
systems attempting to make binaural recordings work on loudspeakers
but none has become popular.
In addition to the stereo miking system, it is common to mic up every
section of an orchestra, whether it is a classical orchestra, film music, or
the backing for a popular music track. Normally the stereo mic system,
crossed pair or whatever, is considered the main source of signal, with
the other microphones used to compensate for the distance to the rear of
the orchestra, and to add just a little presence to instruments where
appropriate. Sectional mics shouldn’t be used to compensate for poor
balance due to the conductor or arranger. Sometimes however classical
composers don’t get the balance quite right and it is not acceptable to
change the orchestration. A little technical help is therefore called for.
We come back to the two golden rules of microphone placement, as
above. It is worth looking at some specific examples:
There are two fairly obvious ways a saxophone can be close miked. One
is close to the mouthpiece, another is close to the bell. The difference in
sound quality is tremendous. The same applies to all close miking. Small
changes in microphone position can affect the sound quality enormously.
There are many books and texts that claim to tell you how and where to
position microphones for all manner of instruments, but the key is to
experiment and find the best position for the instrument – and player –
you have in front of you. Experience, not book learning, leads to success.
Of the two saxophone close miking positions, neither will capture the
natural sound of the instrument, if that’s what you want. Close mic
positions almost never do. If you move the mic further away, up to
around a meter, you will be able to capture the sound of the whole of the
instrument, mouthpiece, bell, the metal of the instrument, and the holes
that are covered and uncovered during the normal course of playing. Also
as you move away you will capture more room ambience, and that is a
compromise that has to be struck. Natural sound against room ambience.
Specifically the grand piano – it is common to place the microphone (or
microphones) pointing directly at the strings. Oddly enough no-one ever
listens from this position and it doesn’t really capture a natural sound, but
it might be the sound you want. The closer the microphones are to the
higher strings, the brighter the sound will be. You can position the
microphones all the way at the bass end of the instrument, spaced apart
by maybe 30 cm, and a rich full sound will be captured. Move the
microphones below the edge of the case and angle them so that they pick
up reflected sound from the lid and a more natural sound will be
discovered. You can even place a microphone under a grand piano to
capture the vibration of the soundboard. It can even sound quite good,
but listen out for noise from the foot pedals.
The conventional setup is one mic per drum, a mic for the hihat perhaps,
and two overhead mics for the cymbals. Recording drums is an art form
and experience is by far the best guide. There are some points to bear in
You can’t get a good recording of a poor kit, particularly cymbals, or a
kit that isn’t well set up. It is often necessary to damp the drums by
taping material to the edge of the drum head to get a shorter, more
The mics have to be placed where the drummer won’t hit them, or the
Dynamic mics generally sound better for drums, capacitor mics for
The kick drum should have its front head removed, or there should be a
large hole cut out so that a damping blanket can be placed inside.
Otherwise it will sound more like a military bass drum than the dull thud
that we are used to. The choice of beater – hard or soft - is important, as
is the position of the kick drum mic either just outside, or some distance
inside the drum.
The snares on the underside of the snare drum may rattle when other
drums are being played. Careful adjustment of the tension of the snares is
necessary, and perhaps even a little damping.
Microphones should be spaced as far apart from each other as possible
and directed away from other drums. Every little bit helps as the
combination of two mics picking up the same drum from different
distances leads to cancellation of groups of frequencies. The brute force
technique is to use a noise gate on every microphone channel, and this is
commonly done. Noise gates will be covered later.
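The cancellation of groups of frequencies can be estimated with a little arithmetic. The sketch below assumes two mics picking up the same drum at equal level, a speed of sound of 340 m/s, and nothing else in the way; real drum kits are messier. The function name is made up for illustration.

```python
SPEED_OF_SOUND = 340.0  # m/s, an assumed round figure


def notch_frequencies(path_difference_m, max_hz=20000.0):
    """Frequencies (Hz) where two equal-level pickups of one source cancel.

    Cancellation occurs where the extra distance to the second mic is an
    odd number of half wavelengths.
    """
    if path_difference_m <= 0:
        return []
    first = SPEED_OF_SOUND / (2.0 * path_difference_m)
    notches = []
    k = 0
    while (2 * k + 1) * first <= max_hz:
        notches.append((2 * k + 1) * first)
        k += 1
    return notches


if __name__ == "__main__":
    # Second mic half a metre further from the drum than the first.
    print(notch_frequencies(0.5)[:4])
```

The smaller the difference in distance, the higher the first notch, which is one reason small changes in mic position are so audible.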
Perhaps this is a brief introduction to the use of microphones, but it’s a
start. And to round off I’ll give away the secret of getting good sound
from your microphones:
• What problem is commonly found in live sports commentary?
• What does a fishpole operator concentrate on while working?
• In theater, what is 'area miking'?
• How is feedback avoided in live sound (the simplest technique)?
• Why must the speaker at a conference hear his or her own
amplified voice at the right level?
• Write down, copy if you wish, the two golden rules for
• Why do microphones have to be placed closer than a natural
• Where are personal mics worn in the theater?
• What is stereo?
• Describe the coincident crossed pair.
• What is the benefit of separating the microphones (relate this to the
human hearing system)?
• What is the value of mono compatibility?
• Why is it desirable to mic up every section of an orchestra
• Pick an instrument other than those mentioned in the text. Describe
the effect of two alternative close miking positions.
• When you look at a grand piano, performed solo, on stage, does
the pianist sit on the left or the right? Why?
• Why do drums often need to be damped?
Chapter 3: Loudspeaker Drive Units
Loudspeakers are without doubt the most inadequate component of the
audio signal chain. Everything else, even the microphone, is as close to
the capabilities of human hearing as makes hardly any difference at all.
However, amplify the signal and convert it back into sound and you will
know without any hesitation whatsoever that you are listening to a
loudspeaker, not a natural sound source.
Loudspeakers can be categorized by method of operation and by
• Method of operation:
• Moving coil
• Direct radiator
In this context we will use ‘PA’ to mean concert public address rather
than announcement systems that are beyond the scope of this text.
The moving coil loudspeaker, or I should say ‘drive unit’ as this is only
one component of the complete system, is the original and still most
widely used method of converting an electric signal to sound. The
components consist of a magnet, a coil of wire (sometimes called the
‘voice coil’) positioned within the field of the magnet and a diaphragm
that pushes against the air. When a signal is passed through the coil, it
creates a magnetic field that interacts with the field of the permanent
magnet causing motion in the coil and in turn the diaphragm. It is
probably fair to say that 99.999% of the loudspeakers you will ever come
across use moving coil drive units.
The electrostatic loudspeaker (and this time it is a loudspeaker rather than
just a drive unit) uses electrostatic attraction rather than magnetism. The
electrostatic loudspeaker has the most natural sound quality, but is not
capable of high sound levels. Hence it is rarely used in professional audio
outside of, occasionally, classical music recording.
A moving coil drive unit can be constructed as either a direct radiator or
a horn. In a direct radiator drive unit, the diaphragm pushes directly
against the air. This is not very efficient as the diaphragm and the air
have differing acoustic impedance, which creates a barrier for the sound
to cross. A horn makes the transition from vibration in the diaphragm to
vibration in the open air more gradual, therefore it is more efficient, and
for a given input power the horn will be louder.
Let's look at these in more detail:
Moving Coil Drive Unit
Perhaps the best place to start is a 200 mm drive unit intended for low
and mid frequency reproduction. This isn't the biggest drive unit
available, so why are larger drive units ever necessary? The answer is to
achieve a higher sound level. A 200 mm drive unit only pushes against so
much air. Increase the diameter to 300 mm or 375 mm and many more
air molecules feel the impact. The next question would be, why are 300
mm or 375 mm drive units not used more often, when space is available?
The answer to that is in the behavior of the diaphragm:
The diaphragm must not bend in operation otherwise it will produce
distortion. It is sometimes said that the diaphragm should operate as a rigid piston.
The diaphragm could be flat and still produce sound. However, since the
motor is at the center and vibrations are transmitted to the edges, the
diaphragm needs to be stiff. The cone shape is the best compromise
between stiffness and large diameter.
High frequencies will tend to bend the diaphragm more than low
frequencies. It takes a certain time for movement of the coil to propagate
to the edge of the diaphragm. Fairly obviously, at high frequencies there
isn't so much time and at some frequency the diaphragm will start to
deviate from the ideal rigid piston.
200 mm is a good compromise. It will produce enough level at low
frequency for the average living room, and it will produce reasonably
distortion-free sound up to around 4 kHz or so. When the diaphragm
bends, it is called break up, due to the vibration ‘breaking up’ into a
number of different modes. ‘Break up’, in this context, doesn't mean
severe distortion or anything like that. In fact most low frequency drive
units are operated well into the break up region. It is up to the designer to
ensure that the distortion created doesn't sound too unpleasant. By the
way, it is often thought that a larger drive unit will operate down to lower
frequencies. This isn't quite the right way to look at it. Any size of drive
unit will operate down to as low a frequency as you like, but you need a
big drive unit to shift large quantities of air at low frequency. At high
frequency, the drive unit vibrates backwards and forwards rapidly,
moving air on each vibration. At low frequencies there are fewer
opportunities to move air, therefore the area of the drive unit needs to be
greater to achieve the desired level.
The material of the diaphragm has a significant effect on its stiffness.
Early moving coil drive units used paper pulp diaphragms, which were
not particularly stiff. Modern drive units use plastic diaphragms, or pulp
diaphragms that have been doped to stiffen them adequately. Of course,
the ultimate in stiffness would be a metal diaphragm. Unfortunately, it
would be heavy and the drive unit would be less efficient. Carbon fiber
diaphragms have also been used with some success. (It is worth noting
that in drive units used for electric guitars, the diaphragm is designed to
bend and distort. It is part of the sound of the instrument and a distortion-
free sound would not meet a guitarist's requirements).
Moving up the frequency range: as we have said, the diaphragm will
bend and produce distortion. Even if it didn't, there would still be the
problem that a large sound source will tend to focus sound over a narrow
area, becoming narrower as the frequency increases. In fact, this is the
characteristic of direct radiator loudspeakers: that their angle of coverage
decreases as the frequency gets higher. This is significant in PA, where a
single loudspeaker has to cover a large number of people. (It is perhaps
counter-intuitive that a large sound source will focus the sound, but it is
certainly so. A good acoustics text will supply the explanation).
Because of these two factors, higher frequencies are handled by a smaller
drive unit. A smaller diaphragm is more rigid at higher frequencies, and
because it is smaller it spreads sound more widely. Often the diaphragm
is dome shaped rather than conical. This is part of the designer's art and
isn't of direct relevance to the sound engineer, as long as it sounds good.
It might be stating the obvious at this stage, but a low frequency drive
unit is commonly known as a woofer, and a high frequency drive unit as a tweeter.
In loudspeakers where a low frequency drive unit greater than 200 mm is
used, it will not be possible to use the woofer up to a sufficiently high
frequency to hand over directly to the tweeter. Therefore a mid frequency
drive unit has to be used (sometimes known as a squawker!). The
function of dividing the frequency band among the various drive units is
handled by a crossover, more on which later.
There are two ways in which a moving coil drive unit may be damaged.
One is to drive it at too high a level for too long. The coil will get hotter
and hotter and eventually will melt at one point, breaking the circuit
(‘thermal damage’). The drive unit will entirely cease to function. The
other is to ‘shock’ the drive unit with a loud impulse. This can happen if
a microphone is dropped, or placed too close to a theatrical pyrotechnic
effect. The impulse won't contain enough energy to melt the coil, but it
may break apart the turns of the coil, or shift it from its central position
with respect to the magnet (‘mechanical damage’). The drive unit will
still function, but the coil will scrape against the magnet producing a very
harsh distorted sound. Many drive units can be repaired, but of course
damage is best avoided in the first place. The trick is to listen to the
loudspeaker. It will tell you when it is under stress if you listen carefully
One common question regarding damage to loudspeakers is this: What
should the power of the amplifier be in relation to the rated power of the
loudspeaker? In fact, although the power of an amplifier can be measured
very accurately, the capacity of a loudspeaker to soak up this power is
only an intelligent guess, at best. During the design process, the
manufacturer will test drive units to destruction and arrive at a balance
between a high rating (in watts) that will impress potential buyers, and a
low number of complaints from people who have pushed their purchases
too hard. The rating on the cabinet is therefore only a guide. To get the
best performance from a loudspeaker, the amplifier should be rated
higher in terms of watts. It wouldn't be unreasonable to connect a 200 W
amplifier to a 100 W speaker, and it won't blow the drive units unless you
push the level too high. It is up to the sound engineer to control the level.
Suppose, on the other hand, that a 100 W amplifier was connected to a
200 W loudspeaker (two-way, with woofer and tweeter). The sound
engineer might push the level so high that the amplifier started to clip.
Clipping produces high levels of high frequency distortion. In a 200 W
loudspeaker, the tweeter could be rated at as little as 20-30 W, as under
normal circumstances that is all it would be expected to handle. But
under clipping conditions the level supplied to the tweeter could be
massively higher, and it will blow.
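To see why clipping sends extra energy to the tweeter, a sine wave can be clipped numerically and its harmonics measured. This is a rough sketch, assuming ideal hard clipping and a single sine wave input; the function names and the drive figures are invented for the example, and a real amplifier will behave less tidily.

```python
import cmath
import math


def clipped_sine(n_samples, drive, limit=1.0):
    """One cycle of a sine wave of peak 'drive', hard clipped at +/- limit."""
    return [
        max(-limit, min(limit, drive * math.sin(2 * math.pi * i / n_samples)))
        for i in range(n_samples)
    ]


def harmonic_magnitudes(samples, max_harmonic):
    """Peak amplitude of harmonics 1..max_harmonic, using a plain DFT."""
    n = len(samples)
    mags = []
    for k in range(1, max_harmonic + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        mags.append(2 * abs(acc) / n)
    return mags


if __name__ == "__main__":
    clean = harmonic_magnitudes(clipped_sine(1024, 1.0), 9)
    hard = harmonic_magnitudes(clipped_sine(1024, 2.0), 9)
    for k, (c, h) in enumerate(zip(clean, hard), start=1):
        print(f"harmonic {k}: clean {c:.3f}  clipped {h:.3f}")
```

The unclipped wave has only its fundamental. The clipped wave gains odd harmonics at three, five, seven times the original frequency and so on, and these are what end up in the tweeter.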
Drive units and complete loudspeaker systems are also rated in terms of
their impedance. This is the load presented to the amplifier, where a low
impedance means the amplifier will have to deliver more current, and
hence ‘work harder’. A common nominal impedance is 8 ohms.
‘Nominal’ means that this is averaged over the frequency range of the
drive unit or loudspeaker, and you will find that the actual impedance
departs significantly from nominal according to frequency. Normally this
isn't particularly significant, except in two situations:
At some frequency the impedance drops well below the nominal
impedance. The power amplifier will be called upon to deliver perhaps
more power than it is capable of, causing clipping, or perhaps the
amplifier might even go into protection mode to avoid damage to itself.
The output impedance of a power amplifier is very low – just a small
fraction of an ohm. You could think of the output impedance of the
amplifier in series with the impedance of the loudspeaker as a potential
divider. Work out the potential divider equation with R1 equal to zero
and you will see that the output voltage is equal to the input voltage.
However, give R1 some significant impedance, as would happen with a
long run of loudspeaker cable, and you will see a voltage loss. Make R2 -
the loudspeaker impedance - variable with frequency and you will now
see a rather less than flat frequency response.
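The potential divider described above can be worked through in a few lines. This is a sketch only: it treats the loudspeaker impedance as a plain resistance at each frequency (in reality it is complex), and the impedance figures and the 1 ohm cable resistance are invented for illustration.

```python
import math


def voltage_at_speaker(v_amp, r_series, z_speaker):
    """Potential divider: R1 = r_series, R2 = z_speaker (both in ohms)."""
    return v_amp * z_speaker / (r_series + z_speaker)


def loss_db(r_series, z_speaker):
    """Level lost across the series resistance in dB (0 dB when it is zero)."""
    return 20.0 * math.log10(z_speaker / (r_series + z_speaker))


# Invented impedance curve for a nominal 8 ohm loudspeaker (illustration only).
EXAMPLE_IMPEDANCE_OHMS = {50: 20.0, 200: 6.0, 1000: 8.0, 10000: 12.0}

if __name__ == "__main__":
    cable_ohms = 1.0  # assumed resistance of a long cable run
    for freq_hz, z in EXAMPLE_IMPEDANCE_OHMS.items():
        print(f"{freq_hz:>6} Hz: {loss_db(cable_ohms, z):6.2f} dB")
```

With R1 at zero the loss is 0 dB at every frequency. With some series resistance the loss is greatest where the loudspeaker impedance is lowest, so the response is no longer flat.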
To be honest, the above points are not always at the forefront of the
working sound engineer's mind, but they are significant and worth
• What is the difference between the terms 'loudspeaker' and 'drive unit'?
• How does a moving coil drive unit work?
• Comment on the two qualities of an electrostatic loudspeaker.
• What is a direct radiator drive unit?
• What is the function of a horn?
• Why are drive units larger than 200 mm sometimes used?
• What is meant by the phrase 'rigid piston'?
• Why is the diaphragm of a moving coil loudspeaker normally cone shaped?
• Why does the diaphragm bend more at higher frequencies?
• What is 'break up'?
• Does breakup occur in a woofer in normal operation?
• Why should a guitar drive unit distort intentionally?
• Comment on the 'beaming' effect of a large drive unit.
• When is a separate midrange drive unit necessary?
• Comment on the two damage modes of moving coil drive units.
• If a loudspeaker is rated at 100 W, what should be the power of the
amplifier, according to the text?
Chapter 4: Loudspeaker Systems
The moving coil drive unit is as open to the air at the rear as it is to the
front, hence it emits sound forwards and backwards. The backward-
radiated sound causes a problem. Sound diffracts readily, particularly at
low frequencies, and much of the energy will 'bend' around to the front.
Since the movement of the diaphragm to the rear is in the opposite
direction to the movement to the front, this leaked sound is inverted (or
we can say 180 degrees out of phase) and the combination of the two will
tend to cancel each other out. This occurs at frequencies where the
wavelength is larger than the diameter of the drive unit. For a 200 mm
drive unit the frequency at which cancellation would start to become
significant is 1700 Hz, the cancellation getting worse at lower frequencies.
The simple solution to this is to mount the drive unit on a baffle. A baffle
is simply a flat sheet of wood with a hole cut out for the drive unit.
Amazingly, it works. But to work well down to sufficiently low
frequencies it has to be extremely large. The wavelength at 50 Hz, for
example, is almost 7 meters. The baffle can be folded around the drive
unit to create an open back cabinet, which you will still find in use for
electric guitar loudspeakers. The drawback is that the partially enclosed
space creates a resonance that colors the sound.
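The 1700 Hz and 'almost 7 meters' figures quoted above both come from the relationship between wavelength and frequency. The sketch below assumes a speed of sound of 340 m/s, which is the value that reproduces the 1700 Hz figure; the function names are illustrative.

```python
SPEED_OF_SOUND = 340.0  # m/s, assumed; consistent with the 1700 Hz figure


def wavelength_m(frequency_hz):
    """Wavelength in metres of a sound at the given frequency."""
    return SPEED_OF_SOUND / frequency_hz


def cancellation_onset_hz(diameter_m):
    """Frequency at which the wavelength equals the drive unit diameter."""
    return SPEED_OF_SOUND / diameter_m


if __name__ == "__main__":
    print(cancellation_onset_hz(0.2))  # 200 mm drive unit
    print(wavelength_m(50.0))          # wavelength a baffle must cope with at 50 Hz
```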
The logical extension of the baffle and open back cabinet is to enclose
the rear of the drive unit completely, creating an infinite baffle. It would
now seem that the rear radiation is completely controlled. However, there
The diaphragm now has to push against the air 'spring' that is trapped
inside the cabinet. This presents significant opposition to the motion of the diaphragm.
Sound will leak through the cabinet walls anyway.
The cabinet will itself vibrate and is highly unlikely to operate anything
like a rigid piston or have a flat frequency response. (Of course, this
happens with the open back cabinet too).
At this point it is worth saying that the bare drive unit is often used in
theater sound systems where there is a need for extreme clarity in the
human vocal range. Low frequencies can be bolstered with conventional
Despite these problems, careful design of the drive unit to balance the
springiness of the trapped air inside the cabinet against the springiness of
the suspension can work wonders. The infinite baffle, properly designed,
is widely regarded as the most natural sounding type of loudspeaker
(electrostatics excepted). The only real problem is that the compromises
that have to be made to make this design work result in poor low
Points of order:
'Springiness' is more properly known as compliance.
Another term for 'infinite baffle' is acoustic suspension.
You would need a very deep understanding of loudspeakers (starting
with the Thiele-Small parameters of drive units) to be able to design a
loudspeaker that would work well for studio or PA use. Electric guitar
loudspeakers are not so critical.
The next step in cabinet design is the bass reflex enclosure. You will
occasionally hear of this as a ported or vented cabinet.
The bass reflex cabinet borrows the theory of the Helmholtz resonator. A
Helmholtz resonator is nothing more than an enclosed volume of air
connected to the outside world by a narrow tube, called the port. The port
can stick out of the enclosure as in a beer bottle - a perfect example of the
principle - or inwards. The small plug of air in the port bounces against
the compliance of the larger volume of air inside and resonates readily.
Try blowing across the top of the beer bottle (when empty) and you will
The Helmholtz resonator can be designed via a relatively simple formula
to have any resonant frequency you choose. In the case of the bass reflex
enclosure, the resonant frequency is set just at the point where an
equivalently sized infinite baffle would be losing low end response.
Thus, the resonance of the enclosure can assist the drive unit just at the
point where its output is weakening, thus extending the low frequency response.
There is of course a cost to this. Whereas an infinite baffle loudspeaker
can be designed with a low-Q resonance, meaning essentially that when
the input ceases the diaphragm returns straight away to its rest position,
in a bass reflex loudspeaker the drive unit will overshoot the rest position
and then return. Depending on the quality of the design, it may do this
more than once creating an audible resonance. This can result in so-
called 'boomy' bass, which is generally undesirable. Additionally, a
loudspeaker with boomy bass will tend to translate any low frequency
energy into output at the resonant frequency. Thus a carefully tuned and
recorded kick drum will come out as a boom at the loudspeaker's
resonant frequency. The competent loudspeaker designer is in control of
this and a degree of boominess will be balanced against a subjectively
'good' - if not accurate - bass response.
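The 'relatively simple formula' for a Helmholtz resonator mentioned earlier is commonly written as f = (c / 2π) × √(A / (V × L)), where A is the cross-sectional area of the port, V the enclosed volume and L the effective length of the port. The sketch below assumes that textbook form and a speed of sound of 340 m/s. The cabinet dimensions are invented, and the end correction that a real port needs is not modelled, so the caller is expected to supply an effective length.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed


def helmholtz_frequency_hz(port_area_m2, volume_m3, effective_length_m):
    """Resonant frequency of an idealised Helmholtz resonator.

    effective_length_m should already include any end correction; this
    sketch does not try to estimate one.
    """
    return (SPEED_OF_SOUND / (2.0 * math.pi)) * math.sqrt(
        port_area_m2 / (volume_m3 * effective_length_m)
    )


if __name__ == "__main__":
    # Hypothetical cabinet: 50 litres, port 7 cm in diameter, 10 cm effective length.
    area = math.pi * 0.035 ** 2
    print(round(helmholtz_frequency_hz(area, 0.050, 0.10), 1))
```

A larger cabinet or a longer port lowers the resonance, and a wider port raises it; four times the volume gives half the frequency.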
There are other cabinet designs, notably the transmission line, but these
are not generally within the scope of the professional sound engineer so they
will be excluded from this text.
We have covered horns to some degree already. There is a whole theory
to horns that deserves consideration, but here we will simply list some of
Whereas a direct radiator drive unit may be only 1% efficient (i.e. 100 W
of electrical power converts to just 1 W of sound power), a horn drive
unit may be up to 5% efficient.
The air in the throat of the horn becomes so compressed at high levels
that significant distortion is produced. However, some people - including
the writer of this text! - can on occasion find the distortion quite pleasant.
To make any significant difference to the efficiency of a loudspeaker at
low frequencies, the length and area of the horn have to be very large.
However, folded horn cabinets can be constructed that make enough of a
difference to be worthwhile. These are sometimes known as 'bass bins'.
The most important application of the horn is in high quality PA systems
such as those used for theater musicals. The problem in theater musicals
is that the sound has to be intelligible otherwise the story won't be
understood by the audience (many of whom in a London West End
theater would be European tourists who wouldn't have English as their
first language). Also, the whole of the auditorium has to be covered with
high quality sound.
If direct radiator loudspeakers were used in the theater, then people
who were on-axis would receive good quality sound. Those members of
the audience who were further from the 'straight ahead' position would
receive lower levels at high frequency and therefore a duller sound. The
solution is the constant directivity horn. (More information on
directivity...). The shape of the curvature of the horn can be one of any
number of mathematical functions, or even just an arbitrary shape. With
careful calculation and design it is possible to produce a constant
directivity horn which has an even frequency response over an angle of
up to 60 degrees. This means that one loudspeaker can cover a sizable
section of the audience, all with pretty much the same quality of sound.
This leads to the concept of the center cluster loudspeaker system that is
widely used wherever intelligibility is a prime requirement in a PA
system. A number of constant directivity horn loudspeakers are arrayed
so that where the coverage of one is just starting to fall off, the adjacent
loudspeaker takes over. Next time you are in a theater, or large place of
worship, with a quality sound system, take a look at the loudspeakers.
Apart from any loudspeakers that are dedicated to bass, where
directionality isn't significant, there should be one cabinet pointing
almost directly at you, plus or minus 30 degrees or so, and there should
be no other loudspeaker pointing at you from any other location in the
building, other than for special theatrical effects. There will be more on
this when we cover PA systems specifically.
The function of the crossover is to separate low, mid and high
frequencies according to the number of drive units in the loudspeaker. A
crossover can be passive or active. A passive crossover is generally
internal to the cabinet and consists of a network of capacitors, inductors and
resistors. Having no active components, it doesn't need to be powered.
An active crossover on the other hand does contain transistors or ICs and
requires mains power. It sits between the output of the mixing console
and a number of power amplifiers - one for each division of the
frequency band. A system with a three-band active crossover would
require three power amplifiers.
Crossovers have two principal parameter sets: the cut off frequencies of
the bands, and the slopes of the filters. It is impractical, and actually
undesirable, to have a filter that allows frequencies up to, say, 4 kHz to
pass and then cuts off everything above that completely. So frequencies
beyond the cutoff frequency (where the response has dropped by 3 dB
from normal) are rolled off at a rate of 6, 12, 18 or 24 dB per octave. In
other words, in the band of frequencies where the slope has kicked in, as
the frequency doubles the response drops by that number of decibels. The
slopes mentioned are actually the easy ones to design. A filter with a
slope of, say, 9 dB per octave would be much more complex.
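To put numbers on the slopes, the sketch below evaluates the magnitude response of an idealized low-pass filter. It assumes a Butterworth alignment, which this text does not specify. That alignment is chosen only because its response is exactly 3 dB down at the cutoff frequency and tends towards 6 dB per octave for each filter order. The function name is illustrative.

```python
import math

def lowpass_response_db(freq_hz, cutoff_hz, order):
    """Magnitude in dB of an idealized Butterworth low-pass filter.

    Assumption: |H(f)|^2 = 1 / (1 + (f / fc)^(2 * order)). Each order
    contributes 6 dB per octave, so order 4 gives 24 dB per octave.
    """
    return -10 * math.log10(1 + (freq_hz / cutoff_hz) ** (2 * order))

cutoff = 4000.0  # Hz, the example cutoff used in the text
for order in (1, 2, 3, 4):
    at_cutoff = lowpass_response_db(cutoff, cutoff, order)
    one_octave = lowpass_response_db(2 * cutoff, cutoff, order)
    two_octaves = lowpass_response_db(4 * cutoff, cutoff, order)
    print(f"{6 * order:2d} dB/oct: {at_cutoff:6.2f} dB at 4 kHz, "
          f"{one_octave:6.2f} dB at 8 kHz, {two_octaves:6.2f} dB at 16 kHz")
```

Under this assumption a 6 dB per octave filter is only about 7 dB down one octave above the cutoff, whereas a 24 dB per octave filter is about 24 dB down at the same point.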
As it happens, a slope of 6 dB per octave is useless. High frequencies
would be sent to the woofer at sufficient level that there would be audible
distortion due to break up. Low frequencies would be sent to the tweeter
at a level that could damage it. 12 dB/octave is workable, but most systems these
days use 18 dB/octave or 24 dB/octave. There are issues with the phase
response of crossover filters that vary according to slope, but this is an
advanced topic that few working sound engineers would contemplate to
any great extent.
Passive crossovers have a number of advantages:
• Usually matched by the loudspeaker manufacturer to the
requirements of the drive units
And the disadvantages:
• Not practical to produce a 24 dB/octave slope
• Can waste power
• Not always accurate & component values can change over time
Likewise, active crossovers have advantages:
• Cutoff frequency and slope can be varied
• Power amplifier connects directly to drive unit - no wastage of
power & better control over diaphragm motion
• Limiters can be built into each band to help avoid blowing drive
And the disadvantages:
• It is possible to connect the crossover incorrectly and send LF to
the HF driver and vice versa.
• A third-party unit would not compensate for any deficiencies in the
Some loudspeaker systems come as a package with a dedicated
loudspeaker control unit. The control unit consists of three components:
• Equalizer to correct the response of each drive unit
• Sensing of voltage (and sometimes current) to ensure that each
drive unit is maximally protected
Use of Loudspeakers
As mentioned earlier, there are four main usage areas of loudspeakers:
domestic, hi-fi, studio and PA. We will skip non-critical domestic usage
and move directly on to hi-fi. The hi-fi market is significant in that this is
where we will find the very best sounding loudspeakers. The living room
environment is generally fairly small, and listening levels are generally
well below what we call 'rock and roll'. This means that the loudspeaker
can be optimized for sound quality, and the best examples can be very
satisfying to listen to with few objectionable features, although it still has
to be said that moving coil loudspeakers always sound like loudspeakers
and never exactly like the original sound source.
Recording studio main monitors have to be capable of higher sound
levels. For one thing, the producer, engineer and musicians might just
like to monitor at high level, although for the sake of their hearing they
should not do this too often. Another consideration is that the
acoustically treated control room will absorb a lot of the loudspeaker's
energy, so that any given loudspeaker would seem quieter than it would
in a typical living room. It is generally true that a loudspeaker that is
optimized for high levels won't be as accurate as one that has been
optimized for sound quality. PA speakers are the ultimate example of
this. There has been a trend over the last couple of decades for PA
speakers to be smaller and hence more cost effective to set up. This has
resulted in an intense design effort to make smaller loudspeakers louder.
Obviously the quality suffers. If you put an expensive PA loudspeaker
next to a decent hi-fi loudspeaker in a head-to-head comparison at a
moderate listening level, the hi-fi loudspeaker will win easily.
The most fascinating use of loudspeakers is the near field monitor. Near
field monitors are now almost universally used in the recording studio for
general monitoring purposes and for mixing. This would seem odd
because twenty-five years ago anyone in the recording industry would
have said that studio monitors have to be as good as possible so that the
engineer can hear the mix better than anyone else ever will. That way, all
the detail in the sound can be assessed properly and any faults or
deficiencies picked up. Mixes were also assessed on tiny Auratone
loudspeakers just to make sure they would sound good on cheap
domestic systems, radios or portables.
That was until the arrival of the Yamaha NS10 - a small domestic
loudspeaker with a dreadful sound. It must have found its way into the
studio as a cheap domestic reference. A slightly upmarket Auratone if you
like. However, someone must have used it as a primary reference for a
mix, and found that by some magical and indefinable means, the NS10
made it easier to get a great mix - and not only that but a mix that would
'travel well' and sound good on any system. The NS10 and later NS10M
are now no longer in production, but every manufacturer has a nearfield
monitor in their range. Some actually now sound very good, although
their bass response is lacking due to their small size. The success of
nearfield monitoring is something of a mystery. It shouldn't work, but the
fact is that it does. And since so little is quantifiable, the best
recommendation for a nearfield monitor is that it has been used by many
engineers to mix lots of big-selling records. That would be the Yamaha NS10.
• What problem is caused by sound coming from the rear of the
• What is a baffle?
• How large does a baffle have to be to work well at low
• What is an 'open back' cabinet?
• What is an 'infinite baffle' cabinet?
• What problem in an infinite baffle cabinet is caused by the trapped
• What is 'compliance'?
• What is a 'bass reflex' enclosure?
• What is the advantage of a bass reflex loudspeaker compared to an
• What is the disadvantage of a bass reflex loudspeaker compared to
an infinite baffle?
• Briefly describe a horn drive unit in comparison with a direct
radiator drive unit.
• What is the advantage of the horn regarding efficiency?
• What is the (greater) advantage of the constant directivity horn?
• What is a 'center cluster'?
• What is meant by the 'slope' of a crossover?
• Contrast some of the principal features of active and passive
• Comment on the use of nearfield monitors
Chapter 5: Analog Recording
Contrary to what you might read in home recording magazines, analog
recording is not dead. Top professional studios still have analog recorders
because they have a sound quality that digital just can't match. This isn't
really to say that they sound better; in fact their faults are easily
quantifiable, but their sound is often said to be 'warm', and it is often true
to say that it is easier to mix a recording made on analog than it is to mix
a digital multitrack recording. The other useful feature of analog
recorders is that they are universal. You can take a tape anywhere and
find a machine to play it on. As digital formats become increasingly
diverse, individual studios become more and more isolated with audio
being subject to an often complex export process to transfer it from one
studio's system to another. With tape, you just mount the reel on the
recorder and press play.
Magnetic tape recording was invented in the early years of the Twentieth
Century and became useful as a device for recording speech, but simply
for the information content, as in a dictation machine - the sound quality
was too poor. In essence, a tape recorder converts an electrical signal to a
magnetic record of that signal. Electricity is an easy medium to work in,
compared to magnetism. It is straightforward to build an electrical device
that responds linearly to an input. As we saw earlier, 'linear' means
without distortion - like a flat mirror (linear) compared to a funfair mirror
(non-linear). Magnetic material does not respond linearly to a
magnetizing force. When a small magnetizing force is applied, the
material hardly responds at all. When a greater magnetizing force is
applied and the initial lack of enthusiasm to become magnetized has been
overcome, then it does respond fairly linearly, right up to the point where
it is magnetized as much as it can be, when we say that it is 'saturated'.
Unfortunately, no-one has devised a way of applying negative feedback
to analog recording, which in an electrical amplifier reduces distortion
Early tape recorders (and wire recorders) had no means of compensating
for the inherent non-linearity of magnetic material, and it was left up to
scientists in Germany during World War II to come up with a solution.
The tape recorder was apparently used to broadcast orchestral concerts at
all hours of day and night, to the consternation of opposing countries who
wondered how Germany could spare the resources to have orchestras
playing in the middle of the night. (Obviously, recording onto disc was
possible, but the characteristic crackle always gave the game away).
After hostilities had ceased, US forces brought some captured machines
back home and development continued from that point. There is a lot of
history to the analog recorder, which we don't need here, but it is
certainly interesting as the development of the tape recorder coincides
with the development of recording as we know it now.
The Sound of Analog
There are three characteristic ingredients of the analog sound:
• Distortion
• Noise
• Modulation noise
The invention that transformed the analog tape recorder from a dictation
machine to a music recording device, during the 1940s, was AC bias.
Since the response of tape to a small magnetizing force is very small, and
the linear region of the response only starts at higher magnetic force
levels, a constant supporting magnetic force, or bias, is used to overcome
this initial resistance. Prior to AC bias, DC bias was used courtesy of a
simple permanent magnet. However, considerable distortion remained.
AC bias uses a high frequency (~100 kHz) sine wave signal mixed in
with the audio signal to 'help' the audio signal get into the linear region
which is relatively distortion-free. This happens inside the recorder and
no intervention is required on the part of the user. However the level of
the bias signal has to be set correctly for optimum results. In traditional
recording, this is the job of the recording engineer before the session
starts. It has to be said that line-up is an exacting procedure and many
modern recording engineers have so much else to think about (their
digital transfers!) that line-up is better left to specialists.
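The way AC bias 'helps' a small audio signal into the linear region can be illustrated with a toy model. The sketch below is not the real physics of magnetic recording: it assumes a simple dead-zone curve for the tape, assumes that the bias frequency is far above the audio band, and takes the recorded level as the average magnetization over one bias cycle. All function names and numerical values are illustrative.

```python
import numpy as np

def tape_transfer(h, dead_zone=0.3):
    """Toy magnetization curve: no response to small forces, then linear.

    Assumption: a simple dead zone stands in for the tape's 'initial lack
    of enthusiasm'; saturation and hysteresis are ignored.
    """
    return np.sign(h) * np.maximum(np.abs(h) - dead_zone, 0.0)

def recorded_level(audio, bias_amplitude, dead_zone=0.3, steps=10000):
    """Average magnetization over one bias cycle for a steady audio value.

    The bias frequency is assumed to be so far above the audio band that
    the audio signal is effectively constant during one bias cycle.
    """
    phase = 2 * np.pi * np.arange(steps) / steps
    total_force = audio + bias_amplitude * np.sin(phase)
    return float(np.mean(tape_transfer(total_force, dead_zone)))

for audio in (0.05, 0.1, 0.2):
    without = recorded_level(audio, bias_amplitude=0.0)
    with_bias = recorded_level(audio, bias_amplitude=0.6)
    print(f"audio {audio:.2f}: no bias -> {without:.4f}, with bias -> {with_bias:.4f}")
```

In this model, small audio values record nothing at all without bias, while with bias the recorded level is close to proportional to the audio value.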
Despite AC bias, analog recording produces a significant amount of
distortion. The higher the level you attempt to record on the tape, the
more the distortion. It isn't like an amplifier or digital recorder where the
signal is clean right up to 0 dBFS, then harsh clipping takes place. The
distortion increases gradually from barely perceptible to downright
unpleasant. Most analog recordings peak at a level that will produce
around 1% distortion, which is very high compared to any other type of
equipment. At 3%, most engineers will be thinking about backing off.
More is unacceptable. It may not sound promising to use a medium that
produces so much distortion, but the fact is that it actually sounds quite
pleasant! It is also different in character from vacuum tube (valve)
distortion so it is an additional tool in the recording engineer's toolkit.
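The difference between gradual tape-style distortion and abrupt clipping can be sketched numerically. The example below uses a tanh curve as an assumed stand-in for tape saturation (it is not a calibrated tape model) and compares its total harmonic distortion (THD) with that of a hard clipper at several drive levels. The function names and levels are illustrative.

```python
import numpy as np

def thd(signal, cycles):
    """Total harmonic distortion (ratio) of a signal holding `cycles` whole periods."""
    spectrum = np.abs(np.fft.rfft(signal))
    fundamental = spectrum[cycles]
    harmonics = spectrum[2 * cycles::cycles]  # 2nd, 3rd, 4th ... harmonics
    return float(np.sqrt(np.sum(harmonics ** 2)) / fundamental)

def sine(level, cycles=8, samples=4096):
    """Sine wave of the given peak level with a whole number of cycles."""
    n = np.arange(samples)
    return level * np.sin(2 * np.pi * cycles * n / samples)

def tape_like(x):
    """Assumed soft saturation curve (tanh); not a calibrated tape model."""
    return np.tanh(x)

def hard_clip(x, limit=1.0):
    """Clean up to the limit, then abrupt clipping, as in a digital recorder."""
    return np.clip(x, -limit, limit)

for level in (0.1, 0.35, 0.6, 1.0, 1.5):
    soft = 100 * thd(tape_like(sine(level)), 8)
    hard = 100 * thd(hard_clip(sine(level)), 8)
    print(f"level {level:4.2f}: soft {soft:6.3f} % THD, hard clip {hard:6.3f} % THD")
```

In this sketch the soft curve reaches roughly 1% distortion at a drive level of about 0.35 and keeps rising smoothly, while the hard clipper stays clean until the limit and then distorts abruptly.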
As well as producing more distortion than any other type of audio
equipment, the analog tape recorder produces more noise too - a signal to
noise ratio of around 65 dB is about the best you can hope for and
represents the state of the art since tape recorders matured around the
early 1970s. It is debatable whether noise is a desirable component of
analog recording, but it is certainly a feature. Noise isn't really the ogre it
is made out to be. If levels are set correctly to maximize the use of the
available dynamic range up to the 1% or 3% distortion point, then there
is no reason why it should be troublesome in the final mix, although
some 'noise management' will be necessary on the part of the mix engineer.
There have been digital 'analog simulators', but to my ears, unless this
aspect of the character of analog recorders is simulated, they just don't
sound the same. Modulation noise is noise that changes as the signal
changes, and has two causes. One is Barkhausen noise which is produced
by quantization of the magnetic domains (a gross over-simplification of a
phenomenon that would take too much understanding for the working
sound engineer to bother with). The other - more significant - cause of
modulation noise is irregularities in the speed of tape travel. These
irregularities are themselves caused by eccentricity and roughness in the
bearings and other rotating parts, and by the tape scraping against the
static parts. We sometimes hear the term 'scrape flutter', which creates
modulation noise, and the 'flutter damper roller', which is a component
used to minimize the problem.
If a 1 kHz sine wave tone is recorded onto analog tape, the output will
consist of 1 kHz plus two ranges of other frequencies, some strong and
consistent, others weaker and ever-changing due to random variations.
These are known in radio as 'sidebands' and the concept has exactly the
same meaning here.
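The appearance of sidebands can be demonstrated with a short simulation. The sketch below assumes a single, perfectly periodic speed irregularity at 50 Hz, modeled as phase modulation of the 1 kHz tone. Real tape transports produce a mixture of periodic and random variations, so this shows only the 'strong and consistent' kind of sideband. The flutter rate and depth are illustrative assumptions.

```python
import numpy as np

sample_rate = 8000          # Hz
n = np.arange(sample_rate)  # exactly one second of samples
t = n / sample_rate

tone_hz = 1000.0            # the recorded test tone
flutter_hz = 50.0           # assumed rate of a periodic speed irregularity
depth = 0.2                 # assumed peak phase deviation in radians

# A speed wobble advances and retards the played-back tone, i.e. it
# modulates the phase of the tone.
played_back = np.sin(2 * np.pi * tone_hz * t
                     + depth * np.sin(2 * np.pi * flutter_hz * t))

# With a one-second window each FFT bin is exactly 1 Hz wide.
amplitude = 2 * np.abs(np.fft.rfft(played_back)) / len(played_back)

for freq in (900, 950, 1000, 1050, 1100):
    print(f"{freq:5d} Hz: amplitude {amplitude[freq]:.4f}")
```

The output shows the 1 kHz tone flanked by components at 950 Hz and 1050 Hz, with much weaker ones at 900 Hz and 1100 Hz.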
Modulation noise, subjectively, causes a 'thickening' of the signal which
accounts for the fat sound of analog, compared to the more accurate, but
thin sound of digital. It has even been known for engineers to artificially
increase the amount of modulation noise by unbalancing one of the
rollers, thus creating stronger sidebands containing a greater range
of frequencies. Don't try it with your hard disk!
The Studer A807 pictured here is typical of a workhorse stereo analog
recorder, sold mainly into the broadcast market. Let's run through the
major components starting from the ones you can't see:
• Three motors, one each for the supply reel, take-up reel and
capstan. The take-up reel motor provides sufficient tension to
collect the tape as it comes through. It does not itself pull the tape
through. The supply reel motor is energized in the reverse direction
to maintain the tension of the tape against the heads.
• The capstan provides the motive force that drives the tape at the
• The pinch wheel holds the tape against the capstan.
• The tach (short for tachometer) roller contains a device to measure
the speed of the tape in play and fast wind.
• The tension arm smooths out any irregularities in tape flow.
• The flutter damper roller reduces vibrations in the tape, lessening
• The erase head wipes the tape clean of any previous recording.
• The record head writes the magnetic signal to the tape. It can also
function as a playback head, usually with reduced high frequency
• The playback head plays back the recording.
Magnetic tape comprises a base film, upon which is coated a layer of iron
oxide. Oxide of iron is sometimes, in other contexts, known as 'rust'. The
oxide is bonded to the base film by a 'binder', which also lubricates the
tape as it passes through the recorder. Other magnetic materials have
been tried, but none suits analog audio recording better than iron, or more
properly 'ferric' oxide. There are two major manufacturers of analog tape
(there used to be several): Quantegy (formerly known as Ampex) and
Emtec (formerly known as BASF).
Tape is manufactured in a variety of widths. (It is also manufactured in
two thicknesses - so-called 'long play' tape can fit a longer duration of
recording on the same spool, at the expense of certain compromises.)
The widths in common use today are two-inch and half-inch. Oddly
enough, metrication doesn't seem to have reached analog tape and we
tend to avoid talking about 50 mm and 12.5 mm. Other widths are still
available, but they are only used in conjunction with 'legacy' equipment
which is being used until it wears out and is scrapped, and for replay or
remix of archive material. Quarter-inch tape was in the past very widely
used as the standard stereo medium, but there is now little point in using
it as it has no advantages over other options that are available.
Two-inch tape is used on twenty-four track recorders. A twenty-four
track recorder can record - obviously - twenty-four separate tracks across
the width of the tape, thus keeping instruments separate until final
mixdown to stereo. Half-inch tape is used on stereo recorders for the final
The speed at which the tape travels is significant. Higher speeds are
better for capturing high frequencies as the recorded wavelength is
physically longer on the tape. However, there are also irregularities
(sometimes known as 'head bumps', or as 'woodles') in the bass end. The
most common tape speed in professional use used to be 15 inches per
second (38 cm/s), but these days it is more common to use 30 ips (76
cm/s), and not care about the massive cost in tape consumption! At 30
ips, a standard reel of tape costing up to $150 lasts about sixteen minutes.
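The effect of tape speed on recorded wavelength and on running time is simple arithmetic, sketched below. The 2500 ft reel length is an assumption, since the text does not state the length of a 'standard reel', and the function names are illustrative.

```python
INCH_IN_METERS = 0.0254

def recorded_wavelength_um(speed_ips, freq_hz):
    """Length on tape, in micrometers, of one cycle of the given frequency."""
    return speed_ips * INCH_IN_METERS / freq_hz * 1e6

def reel_duration_minutes(tape_length_ft, speed_ips):
    """Playing time in minutes of a reel of the given length."""
    return tape_length_ft * 12 / speed_ips / 60

# Assumption: a 'standard reel' holds 2500 ft of tape; the text does not
# state the length.
for speed in (15, 30):
    wavelength = recorded_wavelength_um(speed, 20_000)
    minutes = reel_duration_minutes(2500, speed)
    print(f"{speed} ips: 20 kHz wavelength {wavelength:.1f} um, "
          f"2500 ft reel lasts {minutes:.1f} min")
```

With the assumed 2500 ft reel the result at 30 ips is just under 17 minutes, broadly consistent with the 'about sixteen minutes' quoted above.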
Analog Recorders in Common Use
Otari MTR90 Mk III
There have been many manufacturers of analog tape recorders, but the
top three historically have been Ampex, Otari and Studer. In the US, you
will commonly find the Ampex MM1200 and occasionally the Ampex
ATR124, which is often regarded as the best analog multitrack ever
made, but Ampex only made fifty of them. All over the world you will
find the Otari MTR90 (illustrated with autolocator) which is considered
to be a good quality workhorse machine, and is still available to buy. The
Studer range is also well respected. The Studer A80 represents the
coming of age of analog multitrack recording in the 1970s. It has a sound
quality which is as good as the best within a very fine margin, but
operational facilities are not totally up to modern standards. For example,
it will not drop out of record mode without stopping the tape. The Studer
A800 is still a prized machine and is fully capable, sonically and
operationally, of work to the highest professional standard. The more
recent A827 and A820 are also very good, but sadly no longer
Multitrack Recording Techniques
How to set about a multitrack recording session is a topic in itself and
will be explained later. However, there are certain points of relevance to
the equipment itself. The first is the necessity to be able to listen to or
monitor previously recorded tracks while performing an overdub. The
problem here is that there is a gap between the record head and the
playback head. If the singer, for example, sings in time with the output
from the playback head, the signal will be recorded on the tape a couple
of centimeters away, therefore causing a delay. To get around this
problem, while overdubbing, the record head is used as a playback head.
In this situation we talk about taking a 'sync output' from the record head.
The sync output isn't of such good sound quality since the record head is
optimized for recording, nevertheless it is certainly good enough for
monitoring. The playback head is used for final mixdown.
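The size of the delay that sync output avoids can be estimated from the head spacing and the tape speed. The sketch below assumes a 2 cm spacing to stand in for the 'couple of centimeters' mentioned above. Actual spacing varies from machine to machine, and the function name is illustrative.

```python
def head_gap_delay_ms(gap_cm, speed_ips):
    """Time in milliseconds for the tape to travel from record to playback head."""
    speed_cm_per_s = speed_ips * 2.54
    return gap_cm / speed_cm_per_s * 1000

# Assumption: a 2 cm spacing stands in for the 'couple of centimeters'
# mentioned in the text; real head spacing depends on the machine.
for speed in (15, 30):
    print(f"{speed} ips: overdub would land {head_gap_delay_ms(2.0, speed):.1f} ms late")
```

Delays of this order, tens of milliseconds, are easily audible as a timing error between tracks.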
Also, it is commonplace to 'bounce' several tracks, perhaps vocal
harmonies, to one or two tracks (two tracks for stereo), thus freeing up
tracks for further use. This has to be done using the sync output of the
record head, otherwise the bounce won't be in time with the other tracks.
The slight loss of quality has to be tolerated.
Another technique worth mentioning at this stage is editing. As soon as
tape was invented, people were cutting it apart and sticking it back
together again. In fact, with the old wire recorders, people used to weld
the wire together, although the heat killed the magnetism at the join. The
most basic form of tape editing is 'top and tailing'. This means cutting the
tape to within 10 mm or so of the start of the audio, and splicing in a
section of leader tape, usually white (about two meters). Likewise the
tape is cut ten seconds or so after the end of each track and more leader
inserted between tracks. At the end of the tape, red leader is joined on.
No blank tape is left on the spool once top and tailing is complete.
Editing can also be used to improve a performance by cutting out the bad
and splicing in the good. Even two-inch tape can be edited; in fact it is
normal to record three or four takes of the backing tracks of a song, and
splice together the best sections. The tape is placed in a special precision-
machined aluminum editing block, and cut with a single-sided razor
blade, guided by an angled slot. Splicing tape is available with exactly
the right degree of stickiness to join the tape back together. When the edit
is done in the right place (usually just before a loud sound), it will be
inaudible. It takes courage to cut through a twenty-four track two-inch
Compared to modern disk recorders, the main limitation of tape-based
multitrack - analog and digital - is that once they are recorded, all the
tracks have a fixed relationship in time. In a disk recorder, it is easy to
move one track backwards or forwards in time, or copy it to a new
location in the song. The equivalent technique in tape-based multitrack
recording is the 'spin in'. In the original sense of the term, a good version
of the chorus, or whatever audio was required to be repeated, would be
copied onto another tape recorder. The multitrack would be wound to
where the audio was to be copied. The two machines would be backed up
a little way, then both set into play. At the right moment, the multitrack
would be punched into record. Of course, the two machines had to be in
sync, and this was the difficult part. If the two machines were identical
mechanically, then a wax pencil mark could be made on corresponding
rotating tape guides and the tapes backed up by the same number of
revolutions. It sounds hit and miss, but it could be made to work
amazingly quickly. When the digital sampler became available, it was
used in place of the second recorder.
There are two differences between the maintenance of an analog recorder
and a digital recorder. Firstly, you can do a lot of first-line maintenance on
an analog machine. You can't do more than run a cleaning tape on a digital
recorder. Secondly, you have to do the maintenance, otherwise
performance will suffer. These are the elements of maintenance:
Cleaning: the heads and all metallic parts that the tape contacts are
cleaned gently with a cotton bud dipped in isopropyl alcohol. Isopropyl
alcohol is only one of a number of alcohol variants, and it has good
cleaning properties. It is not the same as drinking alcohol, so don't be
tempted. Also, drinking alcohol - ethanol - attracts additional taxes in
some countries, therefore it would not be cost-effective to use it.
The pinch wheel is made of a rubbery plastic. In theory it shouldn't be
cleaned with isopropyl alcohol, but it often is. You can buy special
rubber cleaner from pro audio dealers but in fact you can use a mild
abrasive household liquid cleaner. Just one tiny drop is enough.
Demagnetizing the heads: After a while, the metal parts will collect a
residual magnetism that will partially erase any tape that is played on the
machine. A special demagnetizer is used for which proper training is
necessary, otherwise the condition can be made even worse.
Line-up: Line-up, or alignment, has two functions - one is to get the best
out of the machine and the tape; the other is to make sure that a tape
played on one recorder will play properly on any other recorder. The
following parameters are aligned to specified or optimum values:
Azimuth - the heads need to be absolutely vertical with respect to the
tape otherwise there will be cancellation at HF. The other adjustments of
the head - zenith, wrap and height - are not so critical and therefore do not
need to be checked so often.
Bias level - optimizes distortion, maximum output level and noise.
Playback level - the 1 kHz tone on a special calibration tape is played and
the output aligned to the studio's electrical standard level.
High frequency playback EQ - the 10 kHz tone on the calibration tape is
played and the HF EQ adjusted.
Record level - a 1 kHz tone at the studio's standard electrical level is
recorded onto a blank tape and the record level adjusted for unity gain.
HF record EQ - adjusted for flat HF response.
LF record EQ - adjusted for flat LF response.
The line-up procedure used to be considered part of the engineer's day-to-
day routine, but is now often left to a specialist technician.
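To see why azimuth is the most critical of the head adjustments, the HF cancellation can be estimated with the usual idealized model, in which the loss follows sin(x) / x with x = π × track width × tan(angle) / recorded wavelength. The sketch below evaluates that model. The 2 mm track width, the 15 ips speed and the function name are illustrative assumptions rather than figures from this text.

```python
import math

def azimuth_loss_db(angle_deg, track_width_mm, speed_ips, freq_hz):
    """HF loss in dB caused by a head tilted away from vertical.

    Assumes the standard idealized model loss = sin(x) / x with
    x = pi * track_width * tan(angle) / wavelength.
    """
    wavelength_mm = speed_ips * 25.4 / freq_hz
    x = math.pi * track_width_mm * math.tan(math.radians(angle_deg)) / wavelength_mm
    if x == 0:
        return 0.0
    return 20 * math.log10(abs(math.sin(x) / x))

# Hypothetical case: 2 mm wide track, 15 ips, 10 kHz calibration tone.
for angle in (0.0, 0.1, 0.25, 0.5):
    loss = azimuth_loss_db(angle, 2.0, 15, 10_000)
    print(f"azimuth error {angle:4.2f} deg: {loss:6.2f} dB at 10 kHz")
```

Under these assumptions an error of only half a degree costs about 3 dB at 10 kHz, and the loss grows with wider tracks and shorter wavelengths.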
To conclude, this is certainly far from a complete treatise on analog tape
recording, but it is enough for a starting point considering that analog
recorders are now quite rare. Even so, analog recording has a long history
and will almost certainly have a long future ahead. In fact the machines
are simple and infinitely maintainable - a fifteen-year-old Studer
A800 will still be working for its living in fifteen years' time. You can't
say that for digital recorders. Also, the sound of analog is very much the
sound of recording, as we understand it. Does it make sense therefore to
use digital emulation to achieve a pale shadow of the analog sound, or
would it be better to use the real thing?
• Give two reasons why analog recorders are still in use in top
• Comment on distortion in analog recording.
• Comment on noise in analog recording.
• Comment on modulation noise in analog recording.
• What is the function of AC bias?
• What is the distortion level of peaks in an analog recording?
• Why is the concept of clipping not relevant in analog recording?
• Why is the supply reel motor driven in the opposite direction to the
actual rotation of the reel?
• What is the capstan?
• What is the pinch wheel?
• What is the tach roller?
• What two tape widths are in common top-level professional use?
• Name three twenty-four track analog tape recorders, make and
• What is 'bouncing'?
• Comment on cut and splice tape editing.
• What are the two functions of line-up?
Chapter 6: Digital Audio
Why digital? Why wasn't analog good enough? The answer starts with
the analog tape recorder which plainly isn't good enough in respect of
signal to noise ratio and distortion performance. Many recording
engineers and producers like the sound of analog now, because it is a
choice. In the days before digital, analog recording wasn't a choice - it
was a necessity. You couldn't get away from the problems. Actually you
could. With Dolby A and subsequently SR noise reduction, noise
performance was vastly improved, to the point where it wasn't a problem
at all. And if you don't have a problem with noise, you can lower the
recording level to improve the distortion performance of analog tape. A
recording well made with Dolby SR noise reduction can sound very good
indeed. Some would say better than 16-bit digital audio, although this is
from a subjective, not a scientific, point of view. Analog recording also had
the problem that when a tape was copied, the quality would deteriorate
significantly. And often there were several generations of copies between
original master and final product. Digital audio can be copied identically
as many times as necessary (although this doesn't always work as well as
you might expect. More on this in another module).
In the domestic domain, before CD there was only the vinyl record. Well
there was the compact cassette too, but that never sounded good,
even with Dolby B noise reduction. (Some people say that they don't like
Dolby B noise reduction. The problem is that they are usually comparing
an encoded recording with decoding switched on and off. The extra
brightness of the Dolby B encoded - but not decoded - sound
compensates for dirty and worn heads and the decoded version sounds
dull in comparison!). People with long memories will know that they
used to yearn for a format that wasn't plagued with the clicks, pops and
crackles of vinyl. The release of the CD format was eagerly anticipated,
and of course the CD has become a great success.
Done properly, digital audio recorders can greatly outperform analog in
both signal to noise ratio and distortion performance. That is why they
are used in both the professional and domestic domains. When the
question arises of why the other parts of the signal chain have mostly
been changed over to digital, any possible improvement in sound quality
is hardly relevant. Everything else performs as well as anyone could
possibly want. Well almost anyone, the only exceptions being the
microphone and the loudspeaker, but we are still some way off truly
digital transducers becoming available. By the time digital recording and
reproduction had become properly established, digital audio in general
was showing that it could offer advantages over analog in terms of price
and facilities offered. Digital effects were first, as it became possible to
achieve, for instance, digital reverberation for a tiny fraction of the cost
of an electromechanical system. Digital mixing consoles came rather
later because they require an incredible amount of processing power. Digital mixing
consoles don't sound better than analog. They do however offer more
facilities for the price, and have the advantage that settings can easily be
stored and recalled. This is an important feature that we shall discuss
more when we discuss mixing consoles.
Having established the reasons we have digital audio, let's see how it
works.
Firstly, what do we mean by analog? Analog comes from the word
analogy. If I say that electrical voltage is a similar concept to the pressure
of water behind a tap (excuse me, faucet), then I am making an analogy.
If I convert an acoustic sound to an electrical signal where the rise and
fall in sound pressure is imitated by a similar rise and fall in voltage, then
the electrical signal is an analog of the original. An analog signal is
continuous. It follows the changes of the original without any kind of
subdivision. It might not be able to track the changes fast enough for
complete accuracy, in which case the high frequency response will be
worse than it could be. Its useful dynamic range lies between a maximum
value which the analog signal cannot exceed (generally the positive and
negative voltage limits of the power supply - the signal can never exceed
these and will be clipped if it tries) and random variations at a very low
level that we hear as noise.
Digital systems analyze the original in two ways: firstly by 'sampling' the
signal a number of times every second. Any changes that happen
completely between sampling periods are ignored, but if the sampling
periods are close enough together, the ear won't notice. The other is by
'quantizing' the signal into a number of discrete - separately identifiable -
levels. The smoothly changing analog signal is therefore turned into a
stair-step approximation, since digital audio knows no 'in-between' states.
As you can see, the digital signal here is only a crude approximation of
the original, but it can be made better by increasing the sampling
frequency (sampling rate), and by increasing the number of quantization
levels. Let's go deep...
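The two steps described above can be tried out in a few lines of Python. This is only a minimal sketch under stated assumptions: the function name `sample_and_quantize` is invented for illustration, the 'analog' input is an idealized sine wave of amplitude 1, and real convertors work quite differently inside.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, bits, num_samples):
    """Sample a unit-amplitude sine wave and round each sample to the nearest level."""
    levels = 2 ** bits          # number of quantization levels
    max_code = levels - 1       # codes run from 0 to levels - 1
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz                       # sampling: one measurement per period
        value = math.sin(2 * math.pi * freq_hz * t)  # the 'analog' value, between -1 and +1
        code = round((value + 1) / 2 * max_code)     # quantizing: pick the nearest level
        codes.append(code)
    return codes

# A 1 kHz tone sampled at 8 kHz: first with only 8 levels, then with 65,536.
coarse = sample_and_quantize(1000, 8000, 3, 8)
fine = sample_and_quantize(1000, 8000, 16, 8)
print(coarse)
print(fine)
```

With 3 bits the codes can only take values from 0 to 7, which is the crude stair-step; with 16 bits the same eight samples are spread over 0 to 65,535.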
To reproduce any given frequency, the sampling frequency, or sampling
rate, has to be at least twice that frequency. So to convert the full range of
human hearing to digital, a sampling frequency of at least 40 kHz (twice
20 kHz) is necessary. In practice, a 'safety margin' has to be added, so we
get the standard compact disc sampling frequency of 44.1 kHz (exactly
this to coincide with the requirements of early digital equipment), and
48 kHz which is used in broadcasting (since in the early days of digital it
was easier to convert to the standard satellite sampling frequency of 32
kHz).
To reduce the quantization error between the digital signal and the
original analog, more quantization levels must be used. Compact disc and
DAT both use 65,536 levels. This, in digital terms, is a nice round
number corresponding to 16 bits. Without going into binary arithmetic,
each bit provides roughly 6 dB of signal to noise ratio. Therefore a digital
audio system with 16-bit resolution has a signal to noise ratio (at least in
theory) of 96 dB.
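For anyone who does want to see the arithmetic, the 'roughly 6 dB per bit' rule can be checked as follows. This is a sketch of the simple rule only, taking the signal to noise ratio as the ratio of full scale to one quantization step; the function names are made up for the example.

```python
import math

def quantization_levels(bits):
    """Number of distinct levels available with the given word length."""
    return 2 ** bits

def theoretical_snr_db(bits):
    """Ratio of full scale to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(bits, quantization_levels(bits), round(theoretical_snr_db(bits), 1))
```

Sixteen bits gives 65,536 levels and a little over 96 dB, in line with the figures quoted above.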
The question will arise, what happens if a digital system is presented with
a frequency higher than half the sampling frequency? The answer is that
a phenomenon known as aliasing will occur. What happens is that these
higher frequencies are not properly encoded and are translated into
spurious frequencies in the audio band. These are only distantly related to
the input frequencies and absolutely unmusical (unlike harmonic
distortion, which can be quite pleasant in moderation). The solution is not
to allow frequencies higher than half the sampling rate (in fact less, to
give a margin of safety) into the system. Therefore an 'anti-aliasing' filter
is used just after the input. Filter design is complex, particularly for filters
with the steep slopes needed to maximize frequency response without
wasting storage or bandwidth on a sampling rate that is
unnecessarily high. The design of the filters is one of the distinguishing
points that make different digital systems actually sound different.
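To see where the spurious frequencies land, here is a small sketch that folds an input frequency back into the range below half the sampling rate. It assumes an ideal sampler with no anti-aliasing filter at all, and the function name `alias_frequency` is simply a label chosen for this example.

```python
def alias_frequency(input_hz, sample_rate_hz):
    """Frequency at which an unfiltered input tone appears after sampling."""
    folded = input_hz % sample_rate_hz           # sampling cannot tell f from f plus a multiple of fs
    return min(folded, sample_rate_hz - folded)  # fold back into the range 0 to fs/2

print(alias_frequency(10_000, 44_100))  # below half the sampling rate: unchanged
print(alias_frequency(30_000, 44_100))  # above half the sampling rate: folds down
```

A 30 kHz input sampled at 44.1 kHz comes out at 14.1 kHz, which has no musical relationship to the original tone.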
Once the signal has been filtered, sampled and quantized, it must be
coded. It might be possible to record the binary digits directly but that
wouldn't offer the best advantage, and indeed might not work. In the
compact disc system, the tiny pits in the aluminized audio layer
themselves form the spiral that the laser follows from the start of the
recording to the end. A binary '1' is coded by a transition from 'land' - the
level surface - to a pit or vice-versa. A binary '0' is coded by no
transition. But what if the signal was stuck on '0' for a period of time - the
spiral would disappear! Hence a system of coding is used that rearranges
the binary digits in such a way that they are forced to change every so
often, simply to make a workable system. There are other such
constraints that we need not go into here.
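The 'stuck on zero' problem is easy to demonstrate. The sketch below renders a stream of bits as land (L) and pit (P) using only the rule given above, a '1' being a transition and a '0' being no transition. It is an illustration of the problem, not of the actual coding scheme used on CD, and the function names are invented for the example.

```python
def to_surface(bits, start="L"):
    """Render a bit stream as land (L) / pit (P): a 1 toggles the state, a 0 holds it."""
    state = start
    surface = []
    for bit in bits:
        if bit == 1:
            state = "P" if state == "L" else "L"
        surface.append(state)
    return "".join(surface)

def longest_run_without_transition(bits):
    """Length of the longest stretch of consecutive 0s (no land/pit change)."""
    longest = current = 0
    for bit in bits:
        current = current + 1 if bit == 0 else 0
        longest = max(longest, current)
    return longest

print(to_surface([1, 0, 0, 1, 0, 1]))  # PPPLLP
print(to_surface([0, 0, 0, 0, 0, 0]))  # LLLLLL
```

The second stream never leaves land, so there would be nothing for the laser to follow. A practical code has to keep the longest run without a transition below some limit.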
Additionally there is the need for error correction. In any storage medium
there are physical defects that would damage the data if nothing were
done to prevent such damage. So additional data is added to the raw
digital signal, firstly to check on replay whether the data is valid or
erroneous, secondly to add a backup data stream so that if a section of
data is corrupted, it can be reconstituted from other data nearby. Adding
error correction involves a compromise between preserving the integrity
of the digital signal, and not adding any more extra data than necessary.
It is fair to say that the error correction system on CD, and on DAT, is
very good. But as in all things, more modern digital systems are cleverer,
All of the above is known as analog to digital encoding, or A to D. The
reverse process is known, fairly obviously, as decoding. To spare the
details that only electronics experts need to know, the digital signal goes
through a D to A convertor and out comes an analog signal. The only
problem is that it now contains a strong component at the sampling
frequency. Obviously this is above audibility, but it could cause severely
audible distortion if allowed into any other equipment that couldn't
properly handle it. To obviate this therefore, the output is filtered with
what is known as a 'brickwall' filter, because of its steep slope. Once
again the design of the filter does affect the sound quality, but digital
tricks have now been developed to make the filter's job easier, therefore
design is more straightforward.
Analog to Digital Conversion
Filtering: removing frequencies, in the analog domain, that are
higher than half the sampling rate.
Sampling: measuring the signal level once per sampling period.
Quantization: deciding which of the 65,536 levels (in a 16-bit
system) is closest to the input signal level, for each sampling period.
Coding: converting the result to a binary number according to a
scheme that a) incorporates error detection, b) makes provision for error
correction, and c) is recordable or transmissible in the chosen medium.
The D to A decoder incorporates three levels of protection against
errors:
Error correction; an error is detected in the data and completely corrected
by using the additional error-correction data specifically put there for the
purpose.
Error concealment; an error is detected but it is too severe to be
corrected. Missing data is therefore 'interpolated' - just one of the many
scientific words for 'guess' - from surrounding data and the result
hopefully will be inaudible. However, if you ever get the chance to see a CD
player that has correction and concealment indicator lights, you will
notice that an awful lot of concealment goes on just to play an average
disc. How well concealment is done is one of the factors that make
different digital systems sound different.
Muting; in this case the error is so bad that the system shuts down
momentarily rather than output what could be an exceedingly loud glitch.
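Concealment by interpolation can be sketched very simply. The example below assumes the decoder already knows which samples are bad, and it uses straight-line interpolation between the nearest good samples on either side. Real players may well use something more refined, and the function name `conceal` is hypothetical.

```python
def conceal(samples, bad_indices):
    """Replace flagged samples by linear interpolation between good neighbours."""
    bad = set(bad_indices)
    out = list(samples)
    for i in sorted(bad):
        left = i - 1
        while left >= 0 and left in bad:          # nearest good sample before i
            left -= 1
        right = i + 1
        while right < len(out) and right in bad:  # nearest good sample after i
            right += 1
        if left < 0 or right >= len(out):
            out[i] = 0  # no good sample on one side, so mute instead
            continue
        fraction = (i - left) / (right - left)
        out[i] = samples[left] + fraction * (samples[right] - samples[left])
    return out

# The sample with the value 999 is known to be corrupted.
print(conceal([0, 10, 999, 30, 40], [2]))
```

Here the corrupted sample is replaced by 20, halfway between its neighbours, which happens to be exactly right for this smooth ramp. On real audio the guess will only be approximately right.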
Bandwidth, in this context, is the rate of flow of data measured in
kilobits per second. 1 kilobit is 1024 bits. Often, the term byte is
used where 1 byte = 8 bits. The abbreviation for bit is 'b' and for
byte is 'B', but these are often confused, as are the multiplier
prefixes 'k' meaning x1000, and 'K' meaning x1024.
The bandwidth of a single channel of 16-bit 44.1 or 48 kHz digital
audio is roughly 750 Kbps. Compare this with the bandwidth of a
modem (56 Kbps), ISDN2 (128 Kbps) and common ADSL Internet
connections (512 Kbps). None of these systems is capable of
transmitting even a single channel of digital audio, hence the need
for MP3 and similar data-reduction systems.
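The figure quoted above is simply the sampling rate multiplied by the number of bits per sample. As a rough check, assuming raw audio data only with no coding or error-correction overhead (the function name is made up for the example):

```python
def channel_bit_rate(sample_rate_hz, bits_per_sample):
    """Raw data rate of one audio channel, in bits per second."""
    return sample_rate_hz * bits_per_sample

for rate in (44_100, 48_000):
    bps = channel_bit_rate(rate, 16)
    # Show the result with both meanings of 'kilo': x1000 and x1024.
    print(rate, bps, round(bps / 1000, 1), round(bps / 1024, 1))
```

This gives 705,600 bits per second at 44.1 kHz and 768,000 bits per second at 48 kHz, the latter being exactly 750 Kbps when 'K' means x1024.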
The quest for ever better sound quality leads us to want to increase both
the sampling rate and the resolution. 24-bit resolution will in theory give
a signal to noise ratio of 144 dB. This will never happen in practice, but
the real achievable signal to noise ratio is probably as good as anyone
could reasonably ask for. Of course, some of the available dynamic range
may be used as additional headroom, to play safe while recording, but
even so the resulting recording will be remarkably quiet. Also, even
though most of us cannot even hear up to 20 kHz, a frequency which is
perfectly well catered for these days by a 44.1 or 48 kHz sampling rate,
there is always a nagging doubt that this is only just good enough, and it
would be worthwhile to have a really high sampling rate to put all doubt
at an end.
This of course, affects storage requirements. It is a reasonable rule of
thumb that CD-quality stereo audio requires about 10 Megabytes per
minute of storage. 24-bit, 96 kHz digital audio will therefore, by simple
multiplication, require a little over 30 Megabytes per stereo minute. Of course,
Megabytes are getting cheaper all the time. There is another problem
however - data bandwidth. When recording onto a hard disk system,
there is a certain data throughput rate beyond which the system will
struggle and possibly fail to record or playback properly. A standard
modern hard drive should be easily capable of achieving 24 tracks of
playback under normal circumstances (the track count is affected, for one
thing, by the 'edit density' - the more short segments you cut the audio
into, and the more widely the data is physically separated on the disk, the
harder it will be to play back). Try this at three times the data rate and the
track count, or the reliability, is bound to suffer. However, disks are
getting ever faster and most of the problems of this nature are in the past.
Before long it will be possible to get virtually any number of tracks quite
easily. It's worth a quick look at Digidesign's comments on hard disk
specifications to maximize track count.
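The rule of thumb for storage can be checked with a short calculation. This sketch assumes raw, uncompressed audio stored in whole bytes per sample, with one Megabyte taken as 1024 x 1024 bytes; the function name is invented for the example.

```python
def megabytes_per_minute(sample_rate_hz, bits_per_sample, channels=2):
    """Storage needed for one minute of uncompressed audio, in Megabytes."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * 60 / (1024 * 1024)

cd_quality = megabytes_per_minute(44_100, 16)
high_res = megabytes_per_minute(96_000, 24)
print(round(cd_quality, 2), round(high_res, 2), round(high_res / cd_quality, 2))
```

CD-quality stereo works out at about 10 Megabytes per minute and 24-bit/96 kHz stereo at about 33, so 'three times' is a convenient round figure that is slightly on the low side.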
Digital interconnection comes in a number of standards, which are
described below.
AES/EBU
• Also known as AES3 1985 (the year it was implemented)
• Standard for professional digital audio
• Supports up to 24-bit at any sampling rate
• Transmits 2 channels on a single cable
• Uses 110 ohm balanced twisted wire pair cables usually terminated
with XLR connectors
• Can use cables of length up to 100 meters
• Electrical signal level 5 volts
• Standard audio cables can be used for short distances but are not
recommended as their impedance may not be the standard 110
ohm and reflections may occur at the ends of the cable
• Data transmission at 48 kHz sampling rate is 3.072 Megabit/s (64x
the sampling rate)
• Self clocking but master clocking is possible
S/PDIF (Sony/Philips Digital Interface)
• Two types:
• Uses 75 Ohm unbalanced coaxial cable with RCA phono
connectors
• Cable lengths limited to 6 meters.
• TOSLINK - Uses plastic fiber optic cable and same connectors as
Lightpipe (below). TOSLINK is an optical data transmission
technology developed by Toshiba. TOSLINK does not specify the
protocol to be used
• ST-type - Glass fiber can be used for longer lengths (1 kilometer).
• Meant for consumer products but may be seen on professional
equipment
• Supports up to 24-bit/48 kHz sampling rate
• It ought to be necessary to use a format converter when connecting
with AES/EBU since the electrical level is different (0.5 V) and
the format of the data is different also. However, some AES/EBU
inputs can recognise an S/PDIF signal
• Some of the bits within the Channel Status blocks are used for
SCMS (Serial Copy Management System), to prevent consumer
machines from making digital copies of digital copies.
MADI (Multichannel Audio Digital Interface)
• An extension of the AES3 format (AES/EBU)
• supports up to 24-bit/48 kHz sampling rate (higher rates are
• transmits 56 channels on a 75 Ohm video coaxial cable with BNC
connectors
• Length limited to 50 meters. Fiber-optic cable can be used for
longer distances
• Data transmission rate is 100 Megabit/s
• Requires a master clock - a dedicated master synchronization
signal must be applied to all transmitters and receivers.
ADAT Optical Interface
• Sometimes known as 'Lightpipe'
• Implemented on the Alesis ADAT MDM and digital devices such
as mixing consoles, synthesizers and effects units
• Supports up to 24-bit/48 kHz sampling rate
• Transmits 8 channels serially on fiber-optic cable
• Distance limited to 10 meters, or up to 30 meters with glass fiber
• Data transmission at 48 kHz is 12 Megabit/s
• Self clocking
• Channels can be reassigned (digital patchbay function)
TDIF (Tascam Digital Interface Format)
• Implemented on Tascam's family of DA-88 recorders and other
digital devices such as mixing consoles
• Supports up to 24-bit/multiple sampling rates
• Transmits 8 channels on multicore, unbalanced cables with 25-pin
connectors
• Bidirectional interface: a single cable carries data in both
directions
• Cable length limited to 5 meters
• Data transmission at 48 kHz sampling rate is 3 Megabit/s (like
• Intended for a master clock system, although self-clocking is
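The AES/EBU data rate quoted in the list above, 64x the sampling rate, can be worked out for other sampling rates in the same way. The sketch below takes the figure of 64 bits per sampling period from that list (it corresponds to two subframes of 32 bits, one per channel); the function name is made up for the example.

```python
def aes3_bit_rate(sample_rate_hz, bits_per_frame=64):
    """Data rate of a two-channel AES/EBU stream, in bits per second."""
    return sample_rate_hz * bits_per_frame

for rate in (44_100, 48_000, 96_000):
    print(rate, aes3_bit_rate(rate) / 1_000_000, "Megabit/s")
```

At 48 kHz this gives the 3.072 Megabit/s stated above.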
• To which type of sound engineering equipment was digital audio
• In relation to the question above, why was this the most pressing
• What types of equipment are currently not available in digital
• Describe 'sampling rate'.
• What is the minimum sampling rate for a digital system capable of
reproduction up to 20 kHz (ignoring any 'safety margin').
• What is 'aliasing'?
• What two sampling rates are most commonly used in digital
• Describe quantization.
• What is the signal to noise ratio, in theory, of a digital system with
• Why is coding necessary? Give two reasons.
• Why does a digital to analog convertor need a filter?
• What is error correction?
• What is error concealment?
• What happens (or at least should happen) if an error is neither
corrected nor concealed?
• How many Megabytes of data, approximately, are occupied by one
minute of CD-quality stereo digital audio?
• Why, in a hard disk recording system, is it likely that fewer tracks can be
replayed simultaneously at the 24-bit/96 kHz standard, than at the
CD-quality 16-bit/44.1 kHz standard?
Chapter 7: Digital Audio Tape Recording
The original purpose of DAT (Digital Audio Tape) was to be a
replacement for the Compact Cassette (or simply 'cassette', as we now
know it). Since DAT was intended to be a consumer product right from
the start, the cassette housing is very small, 73 x 54 mm and just 10.5
mm thick. For professional users, this is rather too small, not just because
it makes the cassette easier to lose, but because there will always be a
feeling that DAT could have been a better system if there had been a bit
more space for the data. This would allow for error concealment to be
minimized, and tracking tolerances could be such that a tape recorded on
one recorder could be absolutely guaranteed to play properly on any
other. This is generally the case for professional machines, but not
necessarily so for semi-pro 'domestic' recorders.
Sony professional DAT
Having said that DAT’s size is a disadvantage for professional users, it
really is amazing how it achieves what it does working at microscopic
dimensions. DAT’s full title, R-DAT, indicates that the system uses a
rotary head like a video recorder. Unlike analog tape which records the
signal along a track parallel to the edge of the tape, a rotary head recorder
lays tracks diagonally across the width of the tape. So even though the
tape speed is just 8.15 millimeters/second, the actual writing speed is a
massive 3.133 meters/second. The width of each track is 13.591
millionths of a meter. Unlike an analog tape, the tracks are recorded
without any guard band between them. In fact, the tracks are recorded by
heads which are around 50% wider than the final track width and each
new track partially overlaps the one before, erasing that section. Since the
same heads are used for recording and playback, this may seem to
present a problem because if the head is centred on the track it is meant
to be reading, then it will also see part of the preceding track and part of
the next track. Won't this result in utter confusion? Of course it doesn't,
because a system originally developed for video recording is used,
known as azimuth recording. The ‘azimuth’ of a tape head refers to the
angle between the head gap, where recording takes place, and the tape
track itself. In an analog recorder the azimuth is always adjusted to 90
degrees, so that the head gap is at right angles to the track. In DAT,
which uses two heads, one head is set at -20 degrees and the other to +20
degrees, and they lay down tracks alternately. So on playback, each head
receives a strong signal from the tracks that it recorded, and the adjacent
tracks, which are misaligned by 40 degrees, give such a weak signal that
it can be rejected totally.
Mechanically, there is a strong similarity between a DAT recorder and a
video cassette recorder. Both use a rotary head drum on which are
mounted the record/playback heads. But there are differences. A video
recorder uses a large head drum with the tape wrapped nearly all the way
around. This is necessary so that there can always be a head in contact
with the tape during the time that each video frame is built up on the
screen. With digital audio, data can be read off the tape at any rate that is
convenient and stored up in a buffer before being read out at a constant
speed and converted to a conventional audio signal. The head drum in a
DAT machine is a mere 30mm in diameter (and spins at 2000 revolutions
per minute). The tape is wrapped only a quarter of the way around, which
means that at times neither of the two heads is in contact with the tape,
but as I said, this can be compensated for. This 90 degree wrap has its
advantages:
• There is only a short length of tape in contact with the drum so
high speed search can be performed with the tape still wrapped.
• Tape tension is low, giving long head and tape life
• If an extra pair of heads is mounted on the drum, simultaneous off-
tape monitoring can be performed during recording just like a
three-head analog tape recorder.
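The writing speed quoted earlier can be roughly reconstructed from the drum figures. The sketch below takes the 30 mm drum diameter, 2000 revolutions per minute and 8.15 mm/s tape speed from the text. Subtracting the tape speed from the drum surface speed is an assumption of this example: it supposes that the tape moves in the same direction as the heads, and it ignores the small angle of the tracks.

```python
import math

def drum_surface_speed(diameter_m, rpm):
    """Speed of a point on the rim of the head drum, in meters per second."""
    return math.pi * diameter_m * rpm / 60

surface = drum_surface_speed(0.030, 2000)  # about 3.142 m/s
tape_speed = 0.00815                       # 8.15 mm/s, expressed in m/s
writing = surface - tape_speed             # head speed relative to the tape (assumed same direction)
print(round(surface, 3), round(writing, 3))
```

Under these assumptions the result agrees with the 3.133 meters/second given above, nearly four hundred times the linear tape speed.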
The signal that is recorded on the tape is of course digital, and very
dissimilar to either analog audio or video signals. As you know, the
standard DAT format uses 16-bit resolution at a sampling frequency of 48
kHz. This converts the original analog audio signal to a stream of binary
numbers representing the changing level of the signal. But since the
dimensions of the actual recording on the tape are so small, there is a lot
of scope for errors to be made during the record/replay process, and if the
wrong digit comes back from the tape it is likely to be very much more
audible than a drop-out would be on analog tape. Fortunately DAT, like
the Compact Disc, uses a technique called Double Reed-Solomon
Encoding which duplicates much of the audio data, in fact 37.5%, in such
a way that errors can be detected, then either corrected completely or
concealed so that they are not obvious to the ear. If there is a really huge
drop-out on the tape, then the DAT machine will simply mute the output
rather than replay digital gibberish. As an extra precaution against
dropouts, another technique called interleaving is employed which
scatters the data so that if one section of data is lost, then there will be
enough data beyond the site of the damage which can be used to
reconstruct the signal.
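The principle of interleaving can be shown with a toy example. The sketch below uses a simple block interleaver (write the data in rows, read it out in columns), which is an assumption made for illustration and is not the actual scheme used in DAT. The function names are likewise invented.

```python
def interleave(data, rows, cols):
    """Write the data row by row into a grid, then read it out column by column."""
    assert len(data) == rows * cols
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(data, rows, cols):
    """Undo interleave(): put every item back in its original position."""
    assert len(data) == rows * cols
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = data[i]
            i += 1
    return out

original = list(range(12))
on_tape = interleave(original, 3, 4)
on_tape[3:6] = [None, None, None]      # a drop-out wipes three neighbouring items
recovered = deinterleave(on_tape, 3, 4)
missing = [i for i, v in enumerate(recovered) if v is None]
print(missing)                         # [1, 5, 9]
```

Three adjacent items were lost on the 'tape', but after de-interleaving the gaps are at positions 1, 5 and 9, each surrounded by good data from which it can be corrected or concealed.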
The pulse code modulated audio data is recorded in the centre section of
each diagonal track across the tape. There is other data too:
• 'ATF' signals allow for Automatic Track Finding which makes
sure that the heads are always precisely positioned over the centre
of the track, even if the tape is slightly distorted and the track
• Sub Code areas allow extra data to be recorded alongside the audio
information. Not all of the capacity of the Sub Code areas is in use
as yet, allowing for extra expansion of the DAT system. Those at
present in use include:
• A-time, which logs the time taken since the beginning of the tape
• P-time, which logs the time taken since the last Start ID.
• Start ID marks the beginning of each item;
• Skip ID tells the machine to go directly to the next Start ID, thus
performing an ‘instant edit’.
• End ID marks the end of the recording on the tape.
• There is also provision for SMPTE/EBU timecode
DASH stands for Digital Audio Stationary Head. The DASH
specifications include matters such as the size of the tape, the tape speed
and the layout of the tracks on the tape; also the modulation method and
error correction strategy, among other things. The format is based on two
tape widths: 1/4” (6.3 mm) and 1/2” (12.55 mm). For each tape width
there are two track geometries, Normal Density and Double Density and
there are also three tape speeds, nominally Slow, Medium and Fast (a
further variation is caused by each of the three speeds being slightly
different according to whether 44.1 kHz or 48 kHz sampling is used).
According to the above, there must be twelve combinations all of which
conform to the DASH format. This could make life confusing, but just
because a particular combination of parameters is possible, it doesn't
necessarily mean that a machine will be built to accommodate it.
Sony PCM 3348
The original Sony 3324, and recent 24-track machines, use the normal
density geometry on 1/2” tape which allows twenty-four digital audio
tracks, two analog cue tracks, a control track and a timecode track. (The
cue tracks are there so that audio can be made available in other than
normal play speed +/- normal varispeed). The tape speed at 44.1 kHz is
70.01cm/s. The 3324 is totally two-way compatible with the larger 3348
which can record forty-eight digital tracks on the same tape. To give an
example, you may start a project on a 3324, of any vintage, and then the
producer decides as the tracks fill up that he or she really needs more
elbow room for overdubs. So you hire a 3348, put the twenty-four track
tape on this and record another twenty-four tracks in the guard bands left
by the other machine. Continuing my (hypothetical) example, when it is
decided that the project is costing too much and going nowhere, the
producer is sacked and another one brought in who decides that the extra
twenty-four tracks are unnecessary embellishments and the original
tracks, with a little touching up, are all that are required. Off goes the
3348 back to the hire company, the tape - now recorded with forty-eight
tracks - is placed back on the 3324 and the original twenty-four tracks are
successfully sweetened and mixed with not a murmur from the tracks that
are now not wanted. We are now accustomed to new products and
systems which offer new features yet are compatible with material
produced on earlier versions. This must be audio history's only example
of forward as well as reverse compatibility. It shows what thinking ahead
The first thing you are likely to want to do with your new DASH
machine is of course to make a recording with it, but it would be
advisable to read the manual before pressing record and play. Some of
the differences between digital and analog recording stem from the fact
that the heads are not in the same order. On an analog recorder we are
used to having three heads: erase, record and play. DASH doesn't need an
erase head because the tape is always recorded to a set level of
magnetism which overwrites any previous recordings without further
intervention. So the first head that the tape should come across should be
the record head. Right?
Wrong. The first head is a playback head, which on a basic DASH
machine is followed by a record head only. If this seems incorrect, you have
to remember that while analog processes take place virtually
instantaneously, digital operations take a little time. So if you imagine
analog overdubbing where the sync playback signal comes from the
record head itself, you can see why this won't work in the digital domain.
There will be a slight delay while the playback signal is processed, and
another delay while the record signal is processed and put onto tape. 105
milliseconds in fact, which corresponds to about 75 mm of tape. To
perform synchronous overdubs there has to be a playback head upstream
of the record head otherwise the multitrack recording process as we know
it just won’t work. For most purposes two heads are enough, and a third
head is available as an option if you need it, and you'll need it if you want
to have confidence monitoring. (There are no combined record/playback
heads, by the way, all are fixed function).
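The relationship between the processing delay and the length of tape is just speed multiplied by time. As a quick check, using the 44.1 kHz tape speed of 70.01 cm/s quoted earlier (the function name is made up for the example):

```python
def tape_travel_mm(tape_speed_cm_per_s, delay_ms):
    """Length of tape, in millimeters, that passes a fixed point during the delay."""
    return tape_speed_cm_per_s * 10 * delay_ms / 1000

print(round(tape_travel_mm(70.01, 105), 1))  # 73.5
```

That is about 73.5 mm, in line with the 'about 75 mm' mentioned above.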
On any digital recording medium the tape has to be formatted to be used.
On DAT the formatting is carried out during recording, but on DASH it
is often better to do it in advance. The machine can format while
recording - in Advance Record mode - but this is best done in situations
where you will be recording the whole of the tape without stopping. If
you wish, you can ‘pre format’ a tape but this obviously takes time. You
can take comfort from the fact that it can be done in one quarter of real
time, and the machine will lay down timecode simultaneously.
Since there are different ways to format a tape and make recordings, the
3324S has three different recording modes: Advance, Insert and
Assemble. Advance mode is as explained above. Insert is for when you
have recorded or formatted the full duration of the material and you want
to go back and re-record some sections. Assemble is when you want to
put the tape on, record a bit, play it back, record a bit more etc, as would
typically happen in classical sessions.
The main text deals with some of the implications of delays caused
by the process of recording digital signals onto tape and playing
them back again. There is another problem caused by delays in the
A/D conversion itself. The convertors used in the Sony 3324S, for
example, while being very high quality, have an inherent delay of
about 1.7 milliseconds.
Imagine the situation where you are punching into a track on an
analog recording to correct a mistake. You will probably set up the
monitoring so that you and the performer can hear both the output
from the recorder and the signal to be recorded. The performer will
play along with his part until the drop in, when the recorder will
switch over to monitor the input signal. This will be returned to the
console and you will hear the level go up by approximately 3dB
because you are now monitoring the same signal via two paths.
On the 3324S you can make a cross fade punch in of up to about
370 milliseconds. This is a good feature, but when you have made
the punch in - using the monitoring arrangement described above -
you will hear the input signal added to the same signal returned
from the recorder but delayed by about 1.7ms. This will cause
phase cancellation and an odd sound. Fortunately, Sony have
included an analog cross fade circuit which will imitate what is
happening in the digital domain, but without the delay.
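The 'odd sound' is a comb filter, and the frequencies that cancel can be calculated from the delay. The sketch below assumes the direct and delayed signals are mixed at exactly equal level, which is an idealization; the function names are invented for the example.

```python
import math

def comb_gain(freq_hz, delay_s):
    """Level of a signal summed with an equal-level delayed copy (1.0 = one path alone)."""
    return abs(2 * math.cos(math.pi * freq_hz * delay_s))

def null_frequencies(delay_s, count):
    """The first few frequencies at which the two paths cancel completely."""
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]

nulls = null_frequencies(0.0017, 3)
print([round(f, 1) for f in nulls])
```

With a 1.7 ms delay the first cancellation falls at about 294 Hz, with further ones near 882 Hz and 1471 Hz, right in the middle of the range of most instruments and voices.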
DASH was designed to be a cut-and-splice editing format. Briefly,
this is possible but it was found in practice that edits were often
unreliable. Editing of DASH tapes is now done by copying between
two machines synchronized together with an offset. Two
synchronized 24-track machines are obviously more versatile in this
respect than one 48-track.
Although an analog recorder can be, and should be, cleaned by the
recording engineer in the normal course of studio activities, a DASH
machine should only be cleaned by an expert, or thousands of dollars
worth of damage can be caused. The heads can be cleaned with a special
chamois-leather cleaning tool, wiping in a horizontal motion only. Cotton
buds, as used for analog recorders, will clog a DASH head with their fibers.
Likewise, an analog recorder can be aligned by a knowledgeable engineer,
but alignment of a DASH machine is something that is done every six
months or so by a suitably qualified engineer carrying a portable PC and
a special test jig in his tool box. The PC runs special service software
which can interrogate just about every aspect of the DASH machine
checking head hours, error rates, remote ports, sampler card etc etc. With
the aid of its human assistant it can even align the heads and tape tension.
The current significance of DASH is as a machine that can record onto a
relatively cheap archivable medium, with confidence that tapes will be
replayable after many years. Also, when an analog project is recorded on
twin 24-track recorders, it is often considered more convenient for
editing to copy the tapes to a Sony 3348. The single 3348 is far faster and
more responsive than synchronized analog machines, making the mixing
process faster and smoother.
The original modular digital multitrack was the Alesis ADAT (below
left). On its introduction it was considered a triumph of engineering to an
affordable price point. The ADAT (Alesis Digital Audio Tape) was
closely followed by the Tascam DTRS (Digital Tape Recording System)
format (below right).
There are certain similarities:
• Both formats capable of 8 tracks.
• Multiple machines can be easily synchronized to give more tracks.
• Recordings are made on commonly available video tapes: ADAT
takes S-VHS tapes, DTRS takes Hi-8
• Tapes need to be formatted before use. Formatting can take place
during recording, but this is only appropriate when a continuous
recording is to be made for the entire duration of the tape.
• Very maintenance-intensive. For a 24-track system, four machines
(4 x 8 = 32) are necessary to account for the one that will always
be on the repair bench.
• High resolution versions available (ADAT 20-bit, DTRS 24-bit, 96
kHz, 192 kHz, with reduced track count)
The differences are these:
• Maximum record time: ADAT - 60 minutes, DTRS - 108 minutes
• ADAT popular in budget music recording studios
• DTRS popular in broadcast and film post-production
One further difference is that it is probably fair to say that the ADAT has
reached the end of its product life-cycle, although there are undoubtedly
still plenty of them around and in use. DTRS however is still useful as a
tape-based system offering a standard format and cheap storage.
• Was DAT originally intended as a professional or a domestic
• What is the sampling rate of standard DAT?
• What is the resolution of standard DAT?
• What is 'azimuth recording'?
• Describe the head wheel in a DAT recorder.
• What is SCMS?
• What is the distinguishing feature of a DAT machine capable of
near-simultaneous off-tape monitoring?
• What is the sub-code area of the DAT tape used for?
• What is 'interleaving'?
• What is the width of the tape used for 24-track DASH?
• What is the width of the tape used for 48-track DASH?
• Describe how 24-track and 48-track DASH machines are
• How are DASH tapes edited?
• In DASH, why does a playback head come before the record head
in the tape path?
• Comment on the cleaning requirements of DASH
• How many tracks does a modular digital multitrack (MDM) have?
• How can more tracks be obtained?
• Comment on the types of usage of ADAT and DTRS machines.
Appendix 1: Sound System Parameters
A large part of sound engineering involves adjusting signal level: finding
the right level or finding the right blend of levels. The level of a real
sound traveling in air can be measured in µN/m² (micronewtons per square
meter, or equivalently µPa, micropascals), or more practically in dB SPL,
where 0 dB SPL corresponds to 20 µN/m². The level of a signal in
electrical form can be measured in volts, naturally, or it can be measured
in dB. The problem is that decibels are always a comparison between two
levels. For acoustic sounds, the dB SPL works by comparing a sound
level with the reference level 20 µN/m² (the threshold of hearing).
Therefore we need a reference level that works for voltage.
Going back in history, early telecommunication engineers were
interested in the power that they could transmit over a telephone line.
They decided upon a standard reference level for power, which was 1
mW (1 milliwatt, or one thousandth of a watt). This was subsequently
called 0 dBm. The ‘m’ doesn't stand for anything, it just means that any
measurement in dBm is referenced to 1 mW. Today in audio circuitry,
we are not too concerned about power except at the final end product –
the output of the power amplifier into the loudspeaker. For the rest of the
time we can happily measure signal level in voltage. Going back into
history, standard telephone lines had a characteristic impedance of 600
ohms. (‘Characteristic impedance’ is a term hardly ever used in audio so
explanation here will be omitted). The relationship between power,
voltage and impedance is: P = V²/R. Working out the math we find that a
power of 1 mW delivered via a 600 ohm line develops a voltage of 0.775
V. This became the standard reference level of electrical voltage, and it is
still in use today.
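For readers who would like to see the math worked out, a short derivation follows. It simply rearranges P = V²/R for V, using the 1 mW and 600 ohm figures given above:

```latex
V = \sqrt{P R}
  = \sqrt{0.001\ \mathrm{W} \times 600\ \Omega}
  = \sqrt{0.6}\ \mathrm{V}
  \approx 0.775\ \mathrm{V}
```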
There is a slight problem here. Over the years it became customary to
refer to a voltage of 0.775 V as 0 dBm. This is not wholly correct. It is
only true when the impedance is 600 ohms, which is not necessarily the
case in audio circuitry. Despite this, any reference you find to 0 dBm, in
practice, means 0.775 V regardless of what the impedance is.
Technical sound engineers abhor inconsistencies like this, so a new unit
was invented: dBu, where 0 dBu is 0.775 V, without any reference to
impedance. Once again, the ‘u’ doesn't stand for anything. ‘dBu’ is
sometimes written ‘dBv’ (note lower case ‘v’). Confusingly there is also
another reference: dBV (note upper case ‘V’), where 0 dBV is 1 volt. In summary:
0 dBm = 1 mW
0 dBu = 0.775 V
0 dBv = 0.775 V
0 dBV = 1 V
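These reference levels can be turned into a small calculator. The following is only a sketch: the function and constant names are my own, and it assumes the standard decibel definitions (20 log10 for voltage ratios, 10 log10 for power ratios), which are not spelled out in the text above.

```python
import math

DBU_REF_VOLTS = 0.775   # 0 dBu (and 0 dBv), from the text
DBV_REF_VOLTS = 1.0     # 0 dBV, from the text
DBM_REF_WATTS = 0.001   # 0 dBm = 1 mW, from the text


def volts_to_db(volts, ref_volts):
    """Voltage level in dB relative to ref_volts (20 log10 for voltage)."""
    return 20 * math.log10(volts / ref_volts)


def watts_to_dbm(watts):
    """Power level in dBm (10 log10 for power)."""
    return 10 * math.log10(watts / DBM_REF_WATTS)


def db_to_volts(level_db, ref_volts):
    """Inverse of volts_to_db: the voltage for a given level in dB."""
    return ref_volts * 10 ** (level_db / 20)


print(round(volts_to_db(1.0, DBU_REF_VOLTS), 2))  # 1 V expressed in dBu
print(round(db_to_volts(4, DBU_REF_VOLTS), 3))    # +4 dBu expressed in volts
```

With these definitions 1 V works out at roughly +2.2 dBu, and +4 dBu at roughly 1.23 V.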
There are more:
dBr is a measurement in decibels with an arbitrary quoted reference level.
dBFS is a measurement in decibels where the reference level is the full
level possible in a specific item of digital audio equipment. 0 dBFS is the
maximum level and any measurement must necessarily be negative, for
example –20 dBFS.
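As an illustration of dBFS, here is a minimal sketch that converts a sample value to a level relative to full scale. It assumes signed integer samples (as in 16-bit PCM) and takes full scale to be 2 to the power of (bits - 1); the function name is hypothetical.

```python
import math


def sample_to_dbfs(sample, bits=16):
    """Level of one sample value in dBFS, assuming signed integer audio."""
    full_scale = 2 ** (bits - 1)   # 32768 for 16-bit audio
    if sample == 0:
        return float("-inf")       # silence has no finite dB value
    return 20 * math.log10(abs(sample) / full_scale)


print(sample_to_dbfs(-32768))          # the largest possible value
print(round(sample_to_dbfs(3277), 1))  # about one tenth of full scale
```

A sample at one tenth of full scale sits at about -20 dBFS, the example level quoted above.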
All of the above (with the exception of dBFS) refer to electrical levels.
We also need levels for magnetic tape and other media. Analog recording
on magnetic media is still commonplace in top level music recording,
and outside of the developed countries of the world. Magnetic level is
measured in nWb/m (nanowebers per meter). ‘Nano’ is the prefix
meaning ‘one thousandth of a millionth’. The weber (Wb) is the unit of
magnetic flux. Wb/m expresses magnetic flux per meter of track width,
often loosely called ‘flux density’ in tape recording (strictly, flux
density is measured in Wb/m², or tesla). Wilhelm Weber the person
(pronounced with a ‘v’ sound in
Europe, with a ‘w’ sound in North America), by the way, is to magnetism
what Alessandro Volta is to electricity.
There are a number of magnetic reference levels in common use. Ampex
level, named for the company that developed the tape recorder from
German prototypes after World War II, is 185 nWb/m. NAB (National
Association of Broadcasters, in the USA) level is 200 nWb/m. DIN
(Deutsche Industrie Normen, in Europe) level is 320 nWb/m. In summary:
Ampex level: 185 nWb/m
NAB level: 200 nWb/m
DIN level: 320 nWb/m
It’s worth noting that none of these reference levels is better than any
other, but NAB and DIN are the most used in North America and Europe
respectively.
An extension of the concept of level is operating level. This is the level
around which you would expect your material to peak. Much of the time
the actual level of your signal will be lower, sometimes higher. It’s just a
figure to keep in mind as the roughly correct level for your signal. In
electrical terms, the standard operating level of professional equipment is
0 dBu. There is also a semi-professional operating level of –10 dBV.
This does cause some difficulty when fully professional and
semi-professional equipment is combined within the same system. Either you
have to keep a close eye on level and resign yourself to making
corrections often, according to what combination of equipment you
happen to be using, or you buy a converter unit that will bring semi-pro
level up to pro level.
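To get a feel for how far apart the two operating levels are, the sketch below converts both to volts and compares them. The helper name is my own, and the calculation is run for both 0 dBu and +4 dBu because both figures are used as professional operating levels in this appendix.

```python
import math

DBU_REF_VOLTS = 0.775  # 0 dBu
DBV_REF_VOLTS = 1.0    # 0 dBV


def gap_db(pro_dbu, semi_dbv):
    """Difference in dB between a level in dBu and a level in dBV."""
    pro_volts = DBU_REF_VOLTS * 10 ** (pro_dbu / 20)
    semi_volts = DBV_REF_VOLTS * 10 ** (semi_dbv / 20)
    return 20 * math.log10(pro_volts / semi_volts)


print(round(gap_db(0, -10), 1))  # 0 dBu against -10 dBV
print(round(gap_db(4, -10), 1))  # +4 dBu against -10 dBV
```

The gap comes out at a little under 8 dB for 0 dBu and a little under 12 dB for +4 dBu, which is the amount of gain a converter unit would have to supply.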
Magnetic tape also has a standard ‘operating level’ - several of them in
fact. To simplify a little, since analog magnetic tape is now a minority
medium, albeit an important one: in a studio where VU meters are
used, it is common to align the VU meters so that 0 VU equals +4
dBu. Tape recorders would be aligned so that a tone at 200 nWb/m gives
a reading of 0 VU. In short:
200 nWb/m on tape normally equates to +4 dBu and 0 VU
Most brands of tape can give good clean sound up to 8 dB above 200
nWb/m and even beyond, although distortion increases considerably.
Digital equipment also has an ‘operating level’, of sorts. In some studios
- mainly broadcast - digital recorders such as DAT are aligned so that
–18 dBFS (18 dB below maximum level) is equivalent to +4 dBu and 0
VU. This certainly allows plenty of headroom (see later), but it doesn’t
fully exploit the dynamic range of DAT. Most people who record
digitally record right up to the highest level they think they can get away
with without risk of red lights or ‘overs’.
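The broadcast-style alignment described above amounts to a fixed offset between the digital and analog scales. A minimal sketch, with a hypothetical function name and the -18 dBFS = +4 dBu alignment as the default:

```python
def dbfs_to_dbu(level_dbfs, align_dbfs=-18.0, align_dbu=4.0):
    """Analog level in dBu that corresponds to a digital level in dBFS."""
    return level_dbfs - align_dbfs + align_dbu


print(dbfs_to_dbu(-18.0))  # the alignment point itself
print(dbfs_to_dbu(0.0))    # digital full scale
```

Under this alignment, digital full scale corresponds to +22 dBu, which leaves 18 dB of room above 0 VU.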
Gain refers to an increase or decrease in level and is measured in dB.
Since gain relates the signal level before gain is applied to the
signal level after it is applied, the function of the decibel as a
comparison between two levels holds good. The signal level from a
microphone could be around 1 mV, for instance. Apply a gain of 60 dB
and it is multiplied by a thousand giving around 1 V – enough for the
mixing console to munch on. Suppose the signal then needed to be made
smaller, or attenuated, then a gain of –20 dB would bring it down to
around 100 mV. Some engineers find it fun to play around with these
numbers. Your degree of fluency in the numbers part of decibels depends
on whether you want to be a technical expert, or just concentrate on the
audio. There is work available for both types of engineer.
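For those who do enjoy the numbers, the microphone example can be checked in a few lines. This is a sketch using the standard relationship between decibels and voltage ratio (a ratio of 10 for every 20 dB); the function names are my own.

```python
def db_to_voltage_ratio(gain_db):
    """Voltage multiplier for a gain in dB (negative values attenuate)."""
    return 10 ** (gain_db / 20)


def apply_gain(volts, gain_db):
    """Signal voltage after a gain of gain_db has been applied."""
    return volts * db_to_voltage_ratio(gain_db)


mic_level = 0.001                            # around 1 mV from the microphone
after_preamp = apply_gain(mic_level, 60)     # 60 dB of gain
after_pad = apply_gain(after_preamp, -20)    # then 20 dB of attenuation

print(db_to_voltage_ratio(60))
print(round(after_preamp, 3), round(after_pad, 3))
```

A gain of 60 dB is a multiplication by 1000, and -20 dB is a division by 10, which gives the 1 V and 100 mV figures quoted above.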
The need to make a signal bigger or smaller is fairly easy to understand,
but what about making it stay the same level? What kind of gain is this?
The answer is ‘unity gain’ and it is a surprisingly useful concept. Unity
gain implies a change in level of 0 dB. In the analog era it was important
to align a recorder so that whatever level you put in on record, you got
that same level out on replay. Then, apart from being spared changes in
level between record and playback, you could do things like copy tapes,
edit bits and pieces together and the level wouldn’t jump. If you hadn't
aligned your machines to unity gain then the levels would be all over the
place. With digital equipment, it is actually the norm for digital input and
output to be of the same level, so unity gain – in the digital domain at
least – tends to happen automatically.
RMS and Peak Levels
How do you measure the level of an AC (alternating current) waveform?
Or to put it another way, how do you measure the level of an AC
waveform meaningfully? A simple peak-to-peak measurement, or peak
measurement, shows the height (or amplitude) of the waveform, but it
doesn't necessarily tell you how much subjective loudness potential the
waveform contains. A very ‘peaky’ waveform (or a waveform with a
high crest factor, as we say) might have strong peaks, but it will not tend
to sound very loud. A waveform with lower peaks, but greater area
between the line and the x-axis of the graph will tend to sound louder on
delivery to the listener. The most meaningful measurement of level is the
root-mean-square technique. Cutting out all the math, the RMS
measurement tells you the equivalent ‘heating’ capability of a signal. A
waveform of level 100 Vrms would bring an electric fire element to the
same temperature as a direct (DC) voltage of 100 V. A waveform of level
100 Vpeak-to-peak would be significantly less warm.
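To put a number on "significantly less warm", the sketch below computes the RMS value of a waveform directly from its definition (square the samples, take the mean, take the square root). It assumes a pure sine wave, which the text does not specify, and the function names are my own.

```python
import math


def rms(samples):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine_cycle(peak, points=1000):
    """One full cycle of a sine wave with the given peak amplitude."""
    return [peak * math.sin(2 * math.pi * k / points) for k in range(points)]


wave = sine_cycle(50.0)   # 50 V peak, so 100 V peak-to-peak
level = rms(wave)

print(round(level, 2))                 # RMS voltage of the 100 Vpp sine
print(round((level / 100.0) ** 2, 3))  # heating power relative to 100 Vrms
```

For a sine wave, 100 V peak-to-peak is only about 35 Vrms, and since heating power goes as the square of the voltage, it delivers about one eighth of the heat of a 100 Vrms signal.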
It is generally accepted that the range of human hearing, taking into
account a selection of real live humans of various ages, is 20 Hz to 20
kHz, and sound equipment must be able to accommodate this. It is not
however sufficient to quote a frequency range. It is necessary to quote a
frequency response, which is rather different. In addition, we are not
looking for any old frequency response, we are looking for a ‘flat
frequency response’ which means that the equipment in question
responds to all frequencies, within its limits, equally and any deviations
from an equal response are defined. The correct way to describe the
frequency response of a piece of equipment is this:
20 Hz to 20 kHz +0 dB/-2 dB
20 Hz to 20 kHz ±1 dB
Of course the actual numbers are just examples, but the concept of
defining the allowable bounds of deviation from ruler-flatness is the key.
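A specification of this kind is easy to test against a set of measurements. The sketch below is illustrative only: the measured figures are invented, and the function name is hypothetical. Each measurement is the deviation in dB from the level at a reference frequency.

```python
def within_tolerance(response_db, upper_db, lower_db):
    """True if every measured deviation lies between lower_db and upper_db."""
    return all(lower_db <= dev <= upper_db for dev in response_db.values())


# Hypothetical measurements: frequency in Hz -> deviation in dB.
measured = {20: -1.8, 100: -0.3, 1000: 0.0, 10000: -0.2, 20000: -1.5}

print(within_tolerance(measured, 0.0, -2.0))   # against +0 dB/-2 dB
print(within_tolerance(measured, 1.0, -1.0))   # against +/-1 dB
```

The same equipment meets the first specification but fails the second, which is why a frequency range quoted without its tolerance says very little.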
Q is used in a variety of ways in electronics and audio but probably the
most significant is as a measure of the ‘sharpness’ of a filter or equalizer.
For example, an equalizer could be set to boost a range of frequencies
around 1 kHz. A high Q would mean that only a narrow band of
frequencies around the center frequency is affected. A low Q would
mean that a wide range of frequencies is affected. Q is calculated thus:
Q = f0/(f2 - f1), where f0 is the center frequency of the band, and f2 and
f1 are the upper and lower frequencies at which the response has dropped
by 3 dB with respect to f0.
It may be evident from this that Q is a ratio and has no units. Q doesn't
stand for anything either, it’s just a letter. Whether you need to use a low
Q setting or a high Q setting depends on the nature of the problem you
want to solve. If there is a troublesome frequency, for example acoustic
guitars sometimes have an irritating resonance somewhere around 150
Hz to 200 Hz, then a high Q setting of 4 or 5 will allow you to home in
on the exact frequency and deal with it without affecting surrounding
frequencies too much. If it is more a matter of shaping the spectrum of a
sound to improve it or allow it to blend better with other signals, then a
low Q of perhaps 0.3 would be more appropriate. The range of Q in
common use in audio is from 0.1 up to around 10, although specialist
devices such as feedback suppressors can vastly exceed this.
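The formula for Q can be tried out with a few figures. In this sketch the function names and the example frequencies are my own; the bandwidth function is simply the same formula rearranged.

```python
def q_factor(f0, f1, f2):
    """Q from the center frequency f0 and the lower/upper -3 dB points."""
    return f0 / (f2 - f1)


def bandwidth_hz(f0, q):
    """Width of the affected band (f2 - f1) for a given center and Q."""
    return f0 / q


print(q_factor(1000, 900, 1100))        # a fairly sharp boost around 1 kHz
print(bandwidth_hz(175, 5))             # high Q on a guitar resonance
print(round(bandwidth_hz(1000, 0.3)))   # low Q for broad tonal shaping
```

A Q of 5 at 175 Hz affects a band only 35 Hz wide, whereas a Q of 0.3 at 1 kHz spreads over more than 3 kHz.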
Noise can be described as unwanted sound, or alternatively as a non-
meaningful component of a sound. Noise occurs naturally in acoustics,
even in the quietest settings. Air molecules are in constant motion at any
temperature above absolute zero, and since sound is nothing more than
the motion of air molecules, this random intrinsic motion must
produce sound: sound of a very low level, but sound nonetheless. We
are not generally aware of this source of noise, but some microphones
are. A microphone with a large diaphragm will have many molecules
impinging on its surface, and the random motion of the molecules will
tend to average out and be insignificant in comparison with the wanted
signal. A microphone with a small diaphragm however (such as a clip-on
mic) will only be in contact with comparatively few air molecules so the
averaging effect will be less and the noise higher in level in comparison
with the wanted signal.
When sound is converted to an electrical signal, the signal is carried by
electrons. Once again, electrons are in constant random motion causing
what is called Johnson noise. If the signal is carried by a large current (in
a low impedance circuit), then Johnson noise can be insignificant. If the
signal is carried by only a small current with relatively few electrons (in a
high impedance circuit), then the noise level can be much higher. We can
extend this concept to any medium that can carry or store a sound signal.
Noise is caused by variations in the consistency of the medium. One more
example would be a vinyl record groove. The signal is stored as
undulations in the groove, but any irregularities such as dust or scratches
translate into noise on playback.
Digital audio systems are not immune to noise. When a signal is
converted to digital form, it is analyzed into a certain number of levels,
65,536 in the compact disc format for example. Of course, most of the
time the original signal will fall between levels, therefore the analysis is
only an approximation. The inaccuracies necessarily produced are termed
quantization noise.
Signal to Noise Ratio
Signal to noise ratio is one measure of how noisy a piece of equipment is.
We said earlier that a common operating level is +4 dBu. If all signal
were removed and the noise level at the output of the console measured,
we might obtain a reading somewhere around –80 dBu. This would mean
that the signal to noise ratio is 84 dB. In analog equipment, a signal to
noise ratio of 80 dB or more is considered good. The worst piece of
equipment as far as noise is concerned is the analog tape recorder, which
can only turn in a signal to noise ratio of around 65 dB. The noise is quite
audible behind low-level signals. Outside of the professional domain, a
compact cassette recorder without noise reduction can only manage
around 45 dB. This is adequate only for information content, for
instance in a dictation machine, or for music which is loud all
the time and therefore masks the noise.
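Because both figures are already in decibels, signal to noise ratio is a simple subtraction. The sketch below, with function names of my own choosing, repeats the console example and also shows the ratio as a plain voltage multiple.

```python
def snr_db(signal_dbu, noise_dbu):
    """Signal to noise ratio in dB from two levels in dBu."""
    return signal_dbu - noise_dbu


def db_to_voltage_ratio(value_db):
    """The same ratio expressed as a voltage multiple."""
    return 10 ** (value_db / 20)


console = snr_db(4, -80)   # +4 dBu operating level, -80 dBu noise
print(console)
print(round(db_to_voltage_ratio(console)))
```

An 84 dB ratio means the signal voltage is nearly 16,000 times the noise voltage.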
As we said, digital equipment suffers from noise too. Quantization noise
is more grainy in comparison to analog noise and therefore subjectively
more annoying. Digital equipment requires a better signal to noise ratio.
In basic terms, the signal to noise ratio of any digital system can be
calculated by multiplying the number of bits by six. So the compact disc
format with a resolution of 16 bits has a signal to noise ratio of 16 x 6 =
96 dB, if all other parts of the system are optimized. Currently the
professional standard is moving to 24-bit resolution, therefore the
theoretical signal to noise ratio would be 24 x 6 = 144 dB. This is
actually greater than the useful dynamic range of the human ear, but in
practice this idealized figure is never attained.
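The "six times the number of bits" rule can be checked against the number of levels available. The sketch below compares the rule of thumb with 20 log10 of the number of levels; it deliberately ignores finer points such as dither and the small extra term that appears in more detailed treatments. The function names are my own.

```python
import math


def number_of_levels(bits):
    """How many distinct levels a given resolution provides."""
    return 2 ** bits


def rule_of_thumb_snr_db(bits):
    """The rule used in the text: 6 dB per bit."""
    return 6 * bits


def level_ratio_db(bits):
    """Ratio of the full range to one level, expressed in dB."""
    return 20 * math.log10(number_of_levels(bits))


for bits in (16, 24):
    print(bits, number_of_levels(bits), rule_of_thumb_snr_db(bits),
          round(level_ratio_db(bits), 2))
```

Each bit doubles the number of levels, and a doubling is worth about 6.02 dB, so the rule of thumb is accurate to within a fraction of a decibel.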
Another way of measuring the noise performance of equipment is EIN or
Equivalent Input Noise, and this is mainly of relevance to microphone
preamplifiers. An example spec might be 'EIN at 70 dB gain: -125 dBu
(200 ohm source)'. This means that the gain control was set to 70 dB and
the noise measured at the output of the mic preamp - in this case the
measurement would be –55 dBu. When the set amount of gain is
subtracted from this we get the amount of noise that would have to be
present at the input of a noiseless mic amp to give the same result. The
'200 ohm source' bit is necessary to make the measurement meaningful.
If the EIN figure does not give the source impedance, then I am afraid the
measurement is useless. Perhaps it is giving the game away to say that
the reason a gain of 70 dB is quoted is because mic preamps normally
give their optimum EIN figures at a fairly high gain. The lower the gain
at which a manufacturer dares to quote the EIN, the better the mic input
is likely to be.
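The EIN arithmetic, and the reason the source impedance matters, can be sketched as follows. The first function is the subtraction described above. The second estimates the Johnson noise of the source resistance itself using the standard thermal noise formula; the 20 kHz bandwidth and the temperature of 290 K are my assumptions, not figures from the text, and the function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # joules per kelvin
DBU_REF_VOLTS = 0.775     # 0 dBu


def ein_dbu(output_noise_dbu, gain_db):
    """Equivalent input noise: output noise with the gain subtracted."""
    return output_noise_dbu - gain_db


def thermal_noise_dbu(source_ohms, bandwidth_hz=20000.0, temperature_k=290.0):
    """Johnson noise of a resistance, in dBu (assumed bandwidth and temperature)."""
    volts = math.sqrt(4 * BOLTZMANN * temperature_k * source_ohms * bandwidth_hz)
    return 20 * math.log10(volts / DBU_REF_VOLTS)


print(ein_dbu(-55, 70))                    # the example spec
print(round(thermal_noise_dbu(200), 1))    # noise of a 200 ohm source alone
print(round(thermal_noise_dbu(2000), 1))   # a 2000 ohm source is noisier
```

Under these assumptions a 200 ohm source already contributes a little under -129 dBu of noise by itself, so an EIN of -125 dBu is within a few decibels of the best that is possible. A different source impedance moves that floor, which is why an EIN figure without it cannot be interpreted.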
Noise as discussed above is a steady-state phenomenon. It is annoying,
but the ear has a way of tuning out sounds that don’t change. However,
there is another type of noise that constantly changes in level, and that is
modulation noise. One source of modulation noise is that which occurs in
analog tape recorders. The effect is that as the signal level changes, the
noise level changes. This can be irritating when the signal is such that it
doesn't adequately mask the noise. A low frequency signal with few
higher harmonics is probably the worst case and will demonstrate
modulation noise quite clearly. Noise reduction systems, as mainly used
in analog recording, also have the effect of creating modulation noise.
Noise reduction systems work by bringing up the level of low-level
signals before they are recorded, and reducing the level again on
playback – at the same time reducing the level of tape noise.
Unfortunately, the noise level is now in a state of constant change and
thereby draws attention to itself. Some noise reduction systems have
means of minimizing this effect. All of the various Dolby systems, for
example, work well when properly aligned.
Quantization noise in digital systems is also a form of modulation noise.
At very low signal levels it is sometimes possible to hear the noise level
going up and down with the signal.
Where you are most likely to hear modulation noise is on a so-called Hi-Fi
VHS video recorder. The discontinuous nature of the audio track causes a
low frequency fluttering noise which requires noise reduction to
minimize. On some machines, this noise reduction is not wholly effective
and the modulation noise created can be very irritating.
It is worth saying that signal to noise ratio should be measured with any
noise reduction switched out, otherwise the comparison between peak or
operating level and the artificially lowered noise floor when signal is
absent gives an unfairly advantageous figure unrepresentative of the
subjective sound quality of the equipment in question.
Unfortunately, any item of sound equipment 'bends' or distorts the sound
waveform to a greater or lesser extent. This produces, from any given
input frequency, additional unwanted frequencies. Usually, distortion is
measured as a percentage. For a mixing console or an amplifier, anything
less than 0.1% is normally considered quite adequate, although once
again it's the analog tape recorder that lets us down with distortion
figures of anything up to 1% and above.
Distortion normally comes in two varieties: harmonic distortion and
intermodulation distortion. Looking at the harmonic kind first, suppose
you input a 1 kHz tone into a system. From the output you will get not
only that 1 kHz tone but also a measure of 2 kHz, 3 kHz, 4 kHz etc.
Harmonic distortion always comes in integer multiples of the
incoming frequency, rather like musical harmonics in fact. This is why
distortion is sometimes desirable as an effect: it enhances musical
qualities, when used with taste and control of course.
Sine wave - the simplest possible sound with no harmonics
The effect of even-order harmonic distortion on a sine wave
The effect of odd-order harmonic distortion on a sine wave
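Harmonic distortion and its percentage figure can be sketched in a few lines. The first function lists the harmonic frequencies for a given input tone. The second uses one common definition of total harmonic distortion, the combined (root-sum-square) level of the harmonics relative to the fundamental; the text above does not say which definition is meant, and the voltages used are invented for illustration.

```python
import math


def harmonic_frequencies(fundamental_hz, count):
    """The first `count` harmonics above the fundamental (2x, 3x, ...)."""
    return [fundamental_hz * n for n in range(2, count + 2)]


def thd_percent(fundamental_volts, harmonic_volts):
    """Total harmonic distortion as a percentage of the fundamental."""
    combined = math.sqrt(sum(v * v for v in harmonic_volts))
    return 100 * combined / fundamental_volts


print(harmonic_frequencies(1000, 3))
# Hypothetical measurement: 1 V fundamental, 0.8 mV and 0.6 mV harmonics.
print(round(thd_percent(1.0, [0.0008, 0.0006]), 3))
```

With these made-up figures the distortion comes out at 0.1%, right on the boundary of what is described above as quite adequate.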
Intermodulation distortion is not so musical in its effect. This is where
two frequencies combine together in such a way as to create extra
frequencies that are not musically related. For instance, if you input two
frequencies, 1000 Hz and 1100 Hz, then intermodulation will produce
sum and difference frequencies – 2100 Hz and 100 Hz.
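The sum and difference products are simple to compute. This minimal sketch covers only the two products mentioned here, not the higher-order products that real equipment also generates; the function name is my own.

```python
def intermod_products(freq_a_hz, freq_b_hz):
    """Sum and difference frequencies produced by two input tones."""
    return freq_a_hz + freq_b_hz, abs(freq_a_hz - freq_b_hz)


print(intermod_products(1000, 1100))
```

Neither 2100 Hz nor 100 Hz is a whole-number multiple of either input tone, which is why the result does not sound musically related.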
A third form of distortion is clipping. This is where a signal ‘attempts’ to
exceed the level boundaries imposed by the voltage limits of a piece of
equipment. In modern circuit designs the peaks of the waveform are
flattened off causing a rather unpleasant sound. In vintage equipment the
peaks can be rounded off, or strange things can happen such as the signal
completely disappearing for a second or two.
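The flat-topped clipping of modern circuits can be imitated in a few lines. This is a sketch of idealized hard clipping only, with a hypothetical function name; it does not attempt to model the rounded or erratic behavior of vintage equipment.

```python
import math


def hard_clip(samples, limit):
    """Flatten any part of the waveform that exceeds +/- limit."""
    return [max(-limit, min(limit, s)) for s in samples]


# One cycle of a sine wave that tries to reach 1.5 in a system limited to 1.0.
sine = [1.5 * math.sin(2 * math.pi * k / 16) for k in range(16)]
clipped = hard_clip(sine, 1.0)

print([round(s, 2) for s in sine])
print([round(s, 2) for s in clipped])
```

Samples that stay inside the limits pass through unchanged; only the peaks are flattened.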
Crosstalk is defined as a leakage of signal from one signal path to
another. For instance, if you have cymbals or hihat on one channel of
your mixing console and you find they are leaking through to the
adjacent channel, then you have a crosstalk problem. Crosstalk can
consist of the full range of audio frequencies, in which case there is a
resistive path causing the leakage. More often crosstalk is predominantly
higher frequencies, which jump from one circuit track to another through
capacitance. In analog tape recorders, an effect known as fringing allows
low frequencies to leak into adjacent tracks on replay. The worst problem
caused by crosstalk is when timecode leaks from its allocated track or
channel into another signal path. Timecode – used to synchronize audio
and video machines – is an audio signal which to the ear sounds like a
very unpleasant screech. It only takes a little crosstalk to allow timecode
to become audible.
I have already mentioned the concept of operating level which is the
'round about' preferred level in a studio. This would typically be +4 dBu in
a professional studio. But above operating level there needs to be a
certain amount of headroom before the onset of clipping. This is most
important in a mixing console where the level of each individual signal
can vary considerably due to: 1) less than optimal setting of the gain
control, 2) gain due to EQ, or perhaps 3) unexpected enthusiasm on the
part of a musician. Also, when signals are mixed together, the resulting
level isn't always predictable. Professional equipment can handle levels
up to +20 dBu or +26 dBu, therefore there is always plenty of headroom
to play with. Of course, the more headroom you allow, the worse the
signal to noise ratio, so it is always something of a compromise.
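Headroom is again a matter of subtracting one level from another. The sketch below uses the figures given above; the function name is my own.

```python
def headroom_db(clip_level_dbu, operating_level_dbu):
    """Room between the operating level and the onset of clipping."""
    return clip_level_dbu - operating_level_dbu


print(headroom_db(20, 4))   # equipment that clips at +20 dBu
print(headroom_db(26, 4))   # equipment that clips at +26 dBu
```

At an operating level of +4 dBu this gives 16 dB or 22 dB of headroom. Lowering the operating level would buy more headroom, but at the cost of moving the signal the same number of decibels closer to the noise floor.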
In recording systems, it is common to reduce headroom to little or none.
The recording system is at the end of the signal chain and there are fewer
variables. Nevertheless, it does depend on the nature of the signal source.
If it is a stereo mix from a multitrack recording, then the levels are
known and easily controllable, so hardly any headroom is required.
If it is a recording of live musicians in a concert setting, then much more
headroom must be allowed because of the more unpredictable level of the
signal, and also because there isn't likely to be a second chance if
Wow and Flutter
The era of wow and flutter is probably coming to an end, but it hasn't
quite got there yet so we need some explanation. Wow and flutter are
both caused by irregularities in mechanical components of analog
equipment such as tape recorders and record players. Wow causes a long-
term cyclic variation in pitch that is audible as such. Flutter is a faster
cyclic variation in pitch that is too fast to be perceived as a rise and fall in
pitch. Wow is just plain unpleasant. You will hear it most often, and at its
worst, on old-style juke boxes that still use vinyl records. Flutter causes a
‘dirtying’ of the sound, which used to be thought of as wholly
unwelcome. Now, when we can have flutter-free digital equipment any
time we want it, old-style analog tape recorders that inevitably suffer
from flutter to some extent have a characteristic sound quality that is
often thought to be desirable. Wow and flutter are measured as a
percentage, where less than 0.1% is considered good.
• What is meant by '0 dBm'?
• What is meant by '0 dBu'?
• What operating level is commonly used by semi-professional equipment?
• What does the term 'dBFS' mean?
• What level is commonly used as the reference level for analog
magnetic tape in North America?
• Which has the greater heating effect: 100 V RMS or 100 V DC?
• What is meant by 'unity gain'?
• Why is it not acceptable to quote the frequency response of a piece
of equipment as '20 Hz to 20 kHz'?
• What is meant by 'signal to noise ratio'?
• What is meant by 'EIN'?
• What is modulation noise?
• What is harmonic distortion?
• What is intermodulation distortion?
• What is clipping?
• What is headroom?