Reading Notes 43
The musical mind
The cognitive psychology of music
John A. Sloboda
p11
2. Music, language, and meaning
2.1. Introduction
This chapter begins by looking at two influential theorists, the linguist Chomsky and the musicologist
Schenker. Their theories have some striking similarities. They both argue, for their own subject matter,
that human behavior must be supported by the ability to form abstract underlying representations. Two of
man's highest and most complex products seem to display something central about his intellect.
(passage omitted)
The major portion of the chapter is organized around the subdivision of language and music into three
components: phonology, syntax, and semantics. Phonology concerns the way in which a potentially
infinite variety of sounds are 'parcelled up' into a finite number of discrete sound categories which
constitute the basic communicative units. Syntax concerns the way in which these units are combined into
sequences. A major concern of those studying syntax has been the discovery of rules which reliably produce
legal sequences and eliminate illegal ones. Semantics concerns the way in which meaning is carried by the
sequences so constructed.
2.2. Chomsky and Schenker
p16
The demonstration that events quite far apart in both linguistic and musical sequences can have a close
structural relationship has two major consequences. The first is that whatever grammars are eventually shown
to be fully adequate for language and music, they must be more powerful than 'finite state' grammars.
Chomsky (1957) provided the classic proof that this type of grammar is inadequate for a natural
language. In such a grammar, words are generated one at a time, each word determining the set from which
the next word may be chosen. The rules are thus 'context independent'. It does not matter in which
context a word is found; exactly the same set of consecutive words is permissible in each case. In a
finite-state grammar, for instance, the word 'washes' must allow 'himself' and 'herself' to follow, since
'the boy washes himself' and 'the girl washes herself' are both correct English sentences. Yet such a
grammar would allow 'the boy washes herself', which is unacceptable. To improve on a finite-state
grammar we must introduce 'context sensitive' rules which take into account more than the immediately
preceding word. The same arguments apply by analogy to music.
2.3. Other comparisons between language and music
p18
(d) The natural medium for both language and music is auditory-vocal. That is, both language and
music are primarily received as sequences of sounds and produced as sequences of vocal movements which
create sounds.
p20
The first question we must ask is whether there is any entity which bears the same relationship to a
musical sequence as a thought bears to a linguistic sequence. A thought is not, in itself, a linguistic sequence
on the argument we have outlined. It exists independently of language and could be entertained by a non-
linguistic or pre-linguistic human. Is there any form of mental activity which could take place in a mind
without musical knowledge that could be somehow expressed by a musical sequence? Such activity would
be, precisely, one which could find musical expression in such natural but diverse forms as a Tibetan chant
or a nursery rhyme. One suggestion is that the mental substrate of music is something like that which
underlies certain types of story. In these stories a starting position of equilibrium or rest is specified. Then
some disturbance is introduced into the situation, producing various problems and tensions which must be
resolved. The story ends with a return to equilibrium. The underlying representation for music could be
seen as a highly abstracted blueprint for such stories, retaining only the features they all have in common.
The learning of a musical language could then be seen as the acquisition of a way of representing these
features in sound. Maybe, therefore, we should look more closely at Schenker's Ursatz for insight into the
possible nature of universals: for, as a deep structure, it is likely to have a close resemblance to the
underlying thought representation of music.
p21
If we examine an Ursatz such as that given in Example 2.3, we find that all its notes are contained in
the tonic triad (of G major in this case) except the middle note of the upper line (the Urlinie). At the
midpoint of the Urlinie we thus find a departure from the resting position which is established at either end
of the Ursatz. Tension and discord have been introduced; but it is motivated tension. One may argue that
in good stories neither the tensions nor the resolutions are arbitrary. We find it unsatisfactory when the
author introduces some deus ex machina to extricate the hero from a seemingly impossible situation. We
prefer it when the kernel of the solution is somehow implicit in what has gone before. For instance, the
villain's evil designs have within them the seeds of their own destruction; the internal dynamics of a
relationship lead the partners to the brink of breakdown and also provide the final resources to save it; and
so on. Similarly, the Ursatz satisfies because it is not just any note (say F) which introduces tension. It is,
in this case, an A which has two highly important pivotal functions. Firstly, it creates a linear progression
in the Urlinie, B-A-G. The line has its own logic or pattern (two consecutive linear descents of one scale
step) so that, in one sense, the A becomes an inevitable consequence of travelling from the B to the G.
Secondly, it creates, together with its accompanying bass note, the elements of a new triad based on D (A
is the third harmonic of D). The tension-inducing element thus operates by attempting to set up a 'rival'
triad. In the final chord of the Ursatz we witness the 'defeat' of this rival system. Let us, then,
hypothesize that one appropriate 'deep' universal for musical thought is to be summarized in the phrase
'creation and resolution of motivated tension'. This notion has a family resemblance to the
'implicative' theory of L. B. Meyer (see this chapter, section 2.5).
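The harmonic claim in parentheses can be checked with a line of arithmetic. Assuming equal temperament anchored on A4 = 440 Hz (a modern tuning convention, not part of Schenker's argument), the third harmonic of a D lands within half a hertz of an A:

```python
# D3 lies 19 semitones below A4 in equal temperament.
d3 = 440 * 2 ** (-19 / 12)       # ~146.83 Hz
third_harmonic = 3 * d3          # ~440.50 Hz — almost exactly A4
print(round(d3, 2), round(third_harmonic, 2))  # 146.83 440.5
```

The 0.5 Hz discrepancy is the usual gap between the pure 3:2 fifth and its equal-tempered approximation.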
2.4. Musical phonology
p24
It appears that we acquire the categories of our native language very early in life. A set of studies by
Eimas and his colleagues has demonstrated that three-day-old infants already categorize sounds in the same
way as adults. This precocious ability strongly suggests the existence of special learning mechanisms for
speech patterns, since, of course, infants can hardly have learned to categorize according to the language
they are exposed to in the first few days of life.
These features of language seem to have some rather close musical parallels. The basic 'phoneme' of
music is a 'note'. Like a phoneme a note is characterized by frequency and duration parameters. Within
a particular musical culture, all music is composed from a small set of these notes, chosen from an
indefinitely large set of possibilities. Different cultures, however, choose different subsets of possible notes for
their music. The selection takes place along two dimensions of sound: frequency and duration; these merit
separate discussion.
2.4.1. Categorical perception of frequency
p27
This study raises several issues. Firstly, there is the question of the accessibility to listeners of the
uncategorized frequency information. In the speech studies it would appear that this information is not
normally available to conscious perception. In music, however, this information certainly can be made
available to consciousness. If it were not so, then no chord could ever sound badly tuned; the assimilation
to categories would 'complete' the perceptual experience. The music listener, we must conclude, has some
ability to operate both within and outside the categorical mode.
A second question is, therefore, what perceptual contexts encourage categorical perception? For most
listeners the likely answer is that the context must be one which provides a framework which will supply, at the
very least, two simultaneous (or closely consecutive) notes. Thus, what music listeners carry in their
memories are not the absolute pitches of any particular scale, but procedures for generating a scale from
any given tonic. In Locke and Kellar's study this framework was supplied by the invariant outer notes of
the chord, which maintained a 3:2 frequency ratio throughout. This identified them, within diatonic
tonality, as the tonic (first step) and dominant (fifth step) of a diatonic scale. This framework imposed a
categorization on the middle note as mediant (third step) of either a major or a minor diatonic scale.
Had the experiment been carried out using only a single variable note without chordal context, then it is
unlikely that any categorization would have taken place. Data from normal psychophysical studies support
this assertion. There is no evidence of discontinuity in the discrimination functions for single frequencies.
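The categorization the framework imposes can be sketched as nearest-prototype assignment. The just-intonation ratios 5/4 (major third) and 6/5 (minor third) are my stand-ins for the two category prototypes; the experiment's exact stimuli are not specified in this excerpt:

```python
# Prototype ratios of the variable middle note to the tonic (assumption:
# just intonation; the study's actual tuning may have differed).
PROTOTYPES = {"major third": 5 / 4, "minor third": 6 / 5}

def categorize_middle(tonic_hz, middle_hz):
    """Assign the middle note to the nearest third-category."""
    ratio = middle_hz / tonic_hz
    return min(PROTOTYPES, key=lambda name: abs(PROTOTYPES[name] - ratio))

# With a tonic of 220 Hz the prototypes sit at 264 Hz and 275 Hz; a note
# at 272 Hz falls between them but is assimilated to the major third:
print(categorize_middle(220, 272))  # major third
```

Without the outer notes there is no tonic to compute the ratio against, which is the sketch's way of expressing why no categorization occurs for an isolated tone.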
A third question concerns the genesis of categorization within the individual. The music data suggest
that, unlike language, mere exposure to tonal music is insufficient to bring about categorical perception.
Some aspect of musical training heightens the tendency to categorize. For instance, most musical training
involves learning note names and scale terminology. It is possible that possessing verbal labels increases the
likelihood of categorical information being extracted and stored. Another possibility is that, for some
musicians, each category becomes associated with a prototypical 'absolute' frequency band. This is
possible, at least in Western cultures, because there are generally agreed conventions about the precise
frequencies to which instruments should be tuned. Concert A is defined as 440 Hz. Such frequencies could
come to represent the 'central' positions of scale categories for listeners, with deviant pitches being
assigned to the nearest prototype. There is considerable evidence for an ability that could support such
behaviour. It is called 'absolute pitch' or 'perfect pitch' and is possessed by a significant minority of
trained musicians. This is an accurate long-term memory for prototypical pitches and their associated scale
names; and I shall discuss it more fully in Chapter 5. (passage omitted)
p28
A different type of evidence for assimilation of music to scale categories is provided by Dowling
(1978). In this study subjects were required to judge pairs of brief melodies as the same or different. In
some cases the second melody (which was always at a different pitch from the first) was an exact
transposition of the first melody. In other cases, the melodic contour was a 'tonal answer' to the first;
that is, the melodic contour was transposed up or down within the same key. This has the consequence that
the exact intervals of the original melody are not preserved. (passage omitted) Dowling found that his subjects could
not consistently discriminate exact transpositions from tonal answers. One plausible explanation for this
finding is that, at least for unfamiliar melodies, subjects code melodies as contours in which the number of
scale steps between adjacent notes is represented, but not the precise pitch distance.
2.4.2. Categorical perception of duration
p28
In a carefully designed series of experiments, Cutting and his colleagues have demonstrated that variations
in rise time (the time from sound onset until the time when waveform amplitude reaches its peak) are
responsible for the perception of this quality, rise times of 30 ms or less giving rise to 'plucked' sounds,
and those of 60 ms or more producing 'bowed' sounds. The discrimination function shows a peak at
the category boundary (around 40 ms) and troughs within each category. The interest of this
phenomenon is threefold. First, it exactly matches the functions for the categorical perception of a phonetic
distinction in speech: the one displayed between the words 'chop' and 'shop'. This also depends on rise
times and shows a category boundary at about 40 ms. Secondly, adaptation to a sound well within
one category shifts the category boundary towards the unadapted category. This exactly matches speech
perception data. Thirdly, infants as young as two months demonstrate categorical perception for these
sounds, just as they do for many speech sounds.
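The reported boundary can be stated as a one-line classifier. The 30/60 ms anchors and the ~40 ms boundary come from the text; treating the boundary as a sharp cut-off is an idealization of a perceptual category edge:

```python
def timbre_category(rise_time_ms, boundary_ms=40):
    """Classify a rise time into the two percepts reported by Cutting et al."""
    return "plucked" if rise_time_ms < boundary_ms else "bowed"

print(timbre_category(30))  # plucked
print(timbre_category(60))  # bowed
```

The adaptation result then corresponds to `boundary_ms` drifting towards the unadapted category after repeated exposure to one extreme.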
These findings have been used to dispute the view that speech perception involves unique psychological
processes and mechanisms. Cutting et al. state that 'it is evident that the arsenal of empirical findings which
once distinguished speech perception as a unique type of auditory perception is steadily depleted.' It seems,
in fact, that we possess some perceptual mechanisms which are present from an early age and are deployed in
both speech and music perception to produce categorical perception. The reason why the same physical
attribute (rise time) gives rise to different perceptual experiences in speech and music is probably that the
acoustic contexts are different. There is evidence that the nature of the immediately following vowel/note
affects the way in which the rise portion is heard. For instance, the sound patterns for stop consonants (like
/t/ and /p/) are heard as chirps when presented in isolation. They require a subsequent vowel in order
to be heard as speech sounds. Similarly, Cutting et al. showed that the synthetic musical sounds must persist
for at least 250 ms after the initial rise in order to be heard as plucked or bowed.
The issue of context also assumes central importance in the second major aspect of musical duration that
I wish to discuss. This is the perception of the duration of a note. The results of standard psychological
tests tell us that when two successive isolated tones are presented for discrimination of duration (i.e. subjects
must say which is longer), then the longer the sounds are, the greater must be the difference between them
if it is to be reliably detected. In music, however, absolute durations are less important than the rhythmic
implications which notes acquire through their immediate context. Fundamental to most Western music is
the concept of a beat, a musical pulse which underlies any melody. In general, notes will either begin on
the beat or at some simple subdivision of the beat (half, third, and quarter being common). This fact is
reflected in musical terminology and notation, where there exists a limited set of categories for describing
durations of notes. These categories are, for the most part, divisions of a longer category into two equal
halves. Thus, there are two crotchets (quarter notes) to every minim (half note); two quavers (eighth
notes) to every crotchet, and so on. In a particular piece one of these symbols may be defined as having
a particular duration (for instance, 'crotchet = 120' is a standard way of indicating that there should be
120 crotchet beats per minute).
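The metronome convention turns the category system into a simple halving series. A sketch of the arithmetic (note names follow the British usage of the text):

```python
def note_durations_ms(crotchets_per_minute):
    """Durations of the halving series of note categories, given a tempo."""
    crotchet = 60_000 / crotchets_per_minute  # ms per quarter note
    return {
        "minim": 2 * crotchet,        # half note
        "crotchet": crotchet,         # quarter note
        "quaver": crotchet / 2,       # eighth note
        "semiquaver": crotchet / 4,   # sixteenth note
    }

print(note_durations_ms(120))
# {'minim': 1000.0, 'crotchet': 500.0, 'quaver': 250.0, 'semiquaver': 125.0}
```

Only the ratios are notated; the tempo marking supplies the absolute durations, which is why duration perception in music is categorical rather than absolute.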
p30
In the light of these considerations it becomes a little easier to understand some puzzling data provided
by Sternberg, Knoll, and Zukofsky (1982), who showed that three highly trained professional musicians,
including Pierre Boulez, were unable to reproduce non-standard subdivisions of a beat accurately. In the
experiments subjects heard a series of regular beats at one-second intervals, one of which was followed by a
click at a delay ranging from one eighth to seven eighths of a beat. Reproductions of the shorter
subdivisions (less than one third of a beat) were systematically in error, all being overestimates. In contrast,
reproduction was very accurate when the subdivision was half, three quarters, five sixths, or seven eighths of
a beat. A similar pattern of results was obtained when the subjects were asked to estimate the delay of a
click by giving a verbal categorization (e.g. 'between one eighth and one seventh of a beat'), except that
in this situation the errors were underestimates. Why were these subjects so poor? Maybe accurately
reproduced and estimated delays correspond to frequently occurring rhythmic patterns in music, which are
readily categorized.
p32
My discussion of musical phonology has been designed to illustrate one fundamental feature of music
behaviour. That is, we tend to categorize our musical experience along the available dimensions of sound,
giving importance to differences between categories at the expense of differences within categories. The
notions of scale and metre are the fundamental concepts underlying musical phonology (although timbre
and intensity are arguably additional dimensions). The similarities between language and music in this
respect are striking, although categorical perception in music is neither as complete nor as universal as it
appears to be in language. To understand the musical significance of categorization we must now turn to
the way notes are combined with one another. This is the subject matter of musical syntax.