The document discusses analyzing function words in literary texts using text mining techniques. It notes that, although function words are usually discarded as "stop words" in text mining, they are extremely frequent (the 60 most frequent items account for about 50% of the tokens in a text) and play an important grammatical and syntactic role. It asks whether function words should therefore also be considered meaningless from a semantic and literary perspective, and presents an analysis of the function-word frequency profiles of the characters in Charles Dickens' Christmas Books to argue that they may provide literary insights.
4. Marley was dead: to begin with. There is no doubt whatever
about that. The register of his burial was signed by the
clergyman, the clerk, the undertaker, and the chief mourner.
Scrooge signed it. And Scrooge's name was good upon 'Change,
for anything he chose to put his hand to.
Old Marley was as dead as a door-nail.
Mind! I don't mean to say that I know, of my own knowledge,
what there is particularly dead about a door-nail. I might have
been inclined, myself, to regard a coffin-nail as the deadest piece
of ironmongery in the trade. But the wisdom of our ancestors is
in the simile; and my unhallowed hands shall not disturb it, or the
Country's done for. You will therefore permit me to repeat,
emphatically, that Marley was as dead as a door-nail.
Charles Dickens, A Christmas Carol (1843)
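9. Function words
the, of, and, in, to, I, you, . . .
• Words that occur with high frequency
• The 60 most frequent items account for about 50% of the total
number of tokens (the total word count) in a text
• Even in text mining they are normally excluded (ignored) as stop
words ("rubbish"), and still less are they ever taken up in literary
criticism. These words are present in a text syntactically and
grammatically, but semantically and literarily they are no more
than unmarked elements, as unnoticed as air.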
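The coverage claim is a simple ratio: the summed frequencies of the n most frequent word types divided by the total number of tokens. A minimal Python sketch, assuming a crude lower-case tokenizer that keeps word-internal apostrophes (the tokenizer and the name `top_n_coverage` are illustrative, not the procedure behind the slide):

```python
import re
from collections import Counter

def top_n_coverage(text: str, n: int) -> float:
    """Share of all tokens accounted for by the n most frequent word types."""
    # Assumed tokenizer: lower-case letter runs, keeping internal apostrophes
    # so that "don't" or "scrooge's" count as single tokens.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(freq for _, freq in counts.most_common(n))
    return covered / len(tokens)

# Opening sentences of A Christmas Carol, used only as a tiny demonstration.
sample = ("Marley was dead: to begin with. There is no doubt whatever "
          "about that. The register of his burial was signed by the "
          "clergyman, the clerk, the undertaker, and the chief mourner.")
print(f"{top_n_coverage(sample, 5):.1%}")
```

On this 31-token excerpt the five most frequent types (led by the, with 5 tokens) cover 10 tokens; on a complete book one would call `top_n_coverage(text, 60)`.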
12. Table 1. The five works that make up the Christmas Books
Abbrev.  Title                      Date  Total tokens  Tokens in dialogue
Carol    A Christmas Carol          1843        28,420               7,917
Chimes   The Chimes                 1844        30,805              13,240
Cricket  The Cricket on the Hearth  1845        31,832              11,627
Battle   The Battle of Life         1846        29,598              13,325
Haunted  The Haunted Man            1848        33,949              15,559
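Table 1 is enough to see how dialogue-heavy each book is, which matters because the character profiles that follow are built from dialogue only. A short Python sketch of that arithmetic (the figures are copied from the table; the variable names are illustrative):

```python
# (abbreviation, total tokens, tokens in dialogue), copied from Table 1.
TABLE_1 = [
    ("Carol", 28_420, 7_917),
    ("Chimes", 30_805, 13_240),
    ("Cricket", 31_832, 11_627),
    ("Battle", 29_598, 13_325),
    ("Haunted", 33_949, 15_559),
]

# Proportion of each work's tokens that occur in dialogue.
shares = {abbrev: dia / tot for abbrev, tot, dia in TABLE_1}
for abbrev, share in shares.items():
    print(f"{abbrev:<8}{share:.1%}")

# Pooled figures for all five books.
total = sum(t for _, t, _ in TABLE_1)
dialogue = sum(d for _, _, d in TABLE_1)
print(f"{'All':<8}{dialogue / total:.1%} ({dialogue:,} of {total:,} tokens)")
```

The shares run from about 27.9% (Carol) to about 45.8% (Haunted), and dialogue makes up roughly 39.9% of the 154,604 tokens overall.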
17. [Scatter plot of principal component scores: PC 1 (15.04%) horizontal, PC 2 (11.84%) vertical. Characters plotted: Mrs TETTERBY, Tackleton, Tetterby, CLEMENCY, Toby, Scrooge, DOT, Caleb, Alderman, MEG, John, William, MARION, Alfred, Redlaw, MILLY, Snitchey, Philip, BERTHA, Warden, Will, Dr Jeddler, Ghost, Sir Joseph]
Fig. 1 Interrelationships among the 24 major characters of the Christmas Books, with the 60 most frequent word types as variables
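Fig. 1 treats each character as a row and each frequent word type as a column. The slides do not spell out how counts were normalised, so this minimal sketch assumes rates per 1,000 tokens of each character's dialogue (the unit used in the legend of Fig. 4) and assumes the variables are the most frequent types in the pooled dialogue. The function names are hypothetical, and the toy input reuses two quotations cited later in this deck (Scrooge, Carol: 48; the Alderman, Chimes: 172), hand-tokenised; it is illustrative only and far too small for a real analysis.

```python
from collections import Counter

def most_frequent_types(speeches, k):
    """The k most frequent word types in all speakers' dialogue pooled."""
    pooled = Counter()
    for tokens in speeches.values():
        pooled.update(tokens)
    return [word for word, _ in pooled.most_common(k)]

def relative_frequency_matrix(speeches, variables, per=1000):
    """Rows = speakers, columns = variables, cells = occurrences per `per` tokens."""
    matrix = {}
    for speaker, tokens in speeches.items():
        counts = Counter(tokens)
        n = len(tokens)
        matrix[speaker] = [counts[w] * per / n if n else 0.0 for w in variables]
    return matrix

# Toy input (illustrative only): lower-cased, punctuation removed by hand.
speeches = {
    "Scrooge": ("what right have you to be merry what reason have you "
                "to be merry you're poor enough").split(),
    "Alderman": ("after you are married you'll quarrel with your husband and "
                 "come to be a distressed wife you may think not but you will "
                 "because i tell you so").split(),
}
variables = most_frequent_types(speeches, 3)  # the study itself used 60
matrix = relative_frequency_matrix(speeches, variables)
for speaker, row in matrix.items():
    print(speaker, [round(x, 1) for x in row])
```

Normalising by each speaker's token count keeps talkative and taciturn characters comparable; without it, the first principal component would largely reflect how much a character speaks.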
18. [Word plot: PC 1 (15.03%) horizontal, PC 2 (11.84%) vertical; symbol size keyed to frequency (legend: 0, 20, 40). Words plotted: not, do, are, you, it, what, know, all, I, there, how, dear, come, if, was, she, that(d), am, had, good, very, her, me, they, never, to(i), at, will(md), been, so(a.d.), here, we, have, when, is, one, for, can, that(c), or, would, your, but, a, to(p), him, on(p), be, has, he, as, and, my, of, with, no(det), the, this, his, in]
Fig. 2 Interrelationships among the 60 most frequent word types in the Christmas Books
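Fig. 1 (character scores) and Fig. 2 (word coefficients) read as two views of one principal component analysis. The slides do not say whether the covariance or the correlation matrix was used; this sketch assumes the correlation matrix (every word column standardised) and, to stay dependency-free, finds the leading components by power iteration with deflation rather than with a linear-algebra library. The function names and the toy matrix are hypothetical, and the sign of each component is arbitrary, so a plot made this way may be mirrored relative to the figures.

```python
import math

def standardize(rows):
    """Z-score every column; a constant column becomes all zeros."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(p)]
            for r in rows]

def pca(rows, k=2, iters=500):
    """Return (explained-variance ratios, unit eigenvectors, row scores).

    Assumes a correlation-matrix PCA; components are found one at a time by
    power iteration, deflating the matrix after each one.
    """
    z = standardize(rows)
    n, p = len(z), len(z[0])
    # Covariance of standardized columns = correlation matrix.
    c = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
         for i in range(p)]
    trace = sum(c[i][i] for i in range(p))  # total variance
    if trace == 0:
        raise ValueError("every column is constant")
    ratios, vectors = [], []
    for _ in range(k):
        v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary non-degenerate start
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * c[i][j] * v[j] for i in range(p) for j in range(p))
        ratios.append(lam / trace)
        vectors.append(v)
        # Deflation: subtract the component just found.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    # Scores: each row's projection onto each eigenvector (a Fig. 1-style plot);
    # the eigenvector entries themselves give a Fig. 2-style word plot.
    scores = [[sum(r[j] * v[j] for j in range(p)) for v in vectors] for r in z]
    return ratios, vectors, scores

# Toy 4 x 3 matrix (hypothetical): column 2 is exactly twice column 1.
rows = [[1, 2, 0], [2, 4, 1], [3, 6, 0], [4, 8, 1]]
ratios, vectors, scores = pca(rows, k=2)
print([f"{r:.2%}" for r in ratios])
```

In the toy matrix the first two columns are perfectly correlated, so the first component captures about 77% of the variance. Ratios computed this way correspond to the percentages printed on the axis labels of Figs. 1 and 2.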
20. [Fig. 1 repeated, with a legend: Male characters; FEMALE CHARACTERS (female names printed in capitals)]
Fig. 1 Interrelationships among the 24 major characters of the Christmas Books, with the 60 most frequent word types as variables
23. [Fig. 1 repeated, with PC 1 annotated: left = Dominant (dominant, overbearing, authoritarian); right = Submissive (docile, devoted, self-sacrificing)]
Fig. 1 Interrelationships among the 24 major characters of the Christmas Books, with the 60 most frequent word types as variables
25. [Fig. 1 repeated, with PC 2 annotated: top = dialogic, colloquial, plebeian; bottom = monologic, pedantic, ‘monster’]
Fig. 1 Interrelationships among the 24 major characters of the Christmas Books, with the 60 most frequent word types as variables
36. [Scatter plot: PC 1 (15.04%) horizontal, PC 2 (11.84%) vertical; legend: Male characters, FEMALE CHARACTERS]
Fig. 1
Principal component scores for the 24 major characters:
based on the 60 most common word-types in the corpus
37. [Fig. 1 repeated, annotated: PC 1 left = Dominant, right = Submissive; PC 2 top = Interpersonal, vernacular, bottom = Monologic, Latinate]
Fig. 1
Principal component scores for the 24 major characters:
based on the 60 most common word-types in the corpus
38. [Word plot: PC 1 (15.03%) horizontal, PC 2 (11.84%) vertical; the same 60 word types as in slide 18, symbol size keyed to frequency]
Fig. 2 Principal components analysis of the 60 most common word-types in the
language of dialogue: based on the 24 major characters
39. [Fig. 2 repeated, annotated: PC 1 left = Dominant, right = Submissive; PC 2 top = Interpersonal, vernacular, bottom = Monologic, Latinate]
Fig. 2 Principal components analysis of the 60 most common word-types in the
language of dialogue: based on the 24 major characters
42. [Scatter plot: PC 1 (33.44%) horizontal, PC 2 (9.78%) vertical; the same 24 characters; legend: Male characters, FEMALE CHARACTERS]
Fig. 3
Language and gender: based on the frequencies of occurrence of 21 gender-marker items
43. [Word plot: PC 1 (33.4%) horizontal, PC 2 (9.8%) vertical; symbol size keyed to frequency per 1,000 words (legend: 0, 20, 40). Words plotted: you, will(md), me, what, dear, good, had, how, never, so(a.d.), was, when, is, him, that(c), of, would, a, the, he, and]
Fig. 4
Gender-oriented differences in idiolects: word plot for
the 21 most common ‘marker’ words in the corpus
44. [Scatter plot: PC 2 (16.22%) horizontal, PC 3 (12.97%) vertical; legend: Male Characters, FEMALE CHARACTERS, Scrooge in 5 Parts (Scrooge#1 to Scrooge#5, plotted alongside the other characters)]
Fig. 5
The trajectory of Scrooge's reformation in A Christmas Carol:
results of a principal component analysis of the occurrence-frequency matrix for the 20 most frequent types
45. [Fig. 5 repeated, with PC 2 annotated: left = Dominant (dominant, overbearing, authoritarian); right = Submissive (docile, devoted, self-sacrificing)]
Fig. 5
The trajectory of Scrooge's reformation in A Christmas Carol:
results of a principal component analysis of the occurrence-frequency matrix for the 20 most frequent types
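Fig. 5 plots Scrooge's dialogue as five separate points (Scrooge#1 to Scrooge#5) among the other characters, so that movement across the plot traces his change. The slides do not say how the five parts were formed; A Christmas Carol is divided into five staves, so one plausible reading is one part per stave. This hypothetical sketch instead cuts the token sequence into five contiguous, roughly equal chunks and turns each into a row of per-1,000-token rates that could be appended to the character-by-word matrix before the PCA. The function names and toy tokens are illustrative only.

```python
from collections import Counter

def split_into_parts(tokens, k=5):
    """Split tokens into k contiguous parts of (nearly) equal length, in order."""
    n = len(tokens)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [tokens[bounds[i]:bounds[i + 1]] for i in range(k)]

def part_profiles(name, tokens, variables, k=5, per=1000):
    """One relative-frequency row per part, labelled e.g. 'Scrooge#1'."""
    rows = {}
    for i, part in enumerate(split_into_parts(tokens, k), start=1):
        counts = Counter(part)
        rows[f"{name}#{i}"] = [counts[w] * per / len(part) if part else 0.0
                               for w in variables]
    return rows

# Toy token stream (hypothetical): 'you' early, 'the' later.
profiles = part_profiles("Scrooge", ["you"] * 2 + ["the"] * 8, ["you", "the"])
for label, row in profiles.items():
    print(label, row)
```

Splitting by stave rather than by length would keep each part aligned with a stage of the story, at the cost of parts of very unequal size.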
46. Important thematic contrasts like those between
social superiors and social inferiors, callousness versus
sympathy, cynicism versus family love and fellow
feeling, and so on, are underscored by the
differentiation of idiolects or the variation within a
single idiolect.
47. Among the words that lie to the left in Fig. 2 and characterise
the idiolects of characters displaying dominance and habits of
command, the second-person pronoun you seems to play the most
significant role. Other words, such as are and will, function in
relation to you: to put it another way, they co-occur frequently
with you.
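The claim that are and will co-occur frequently with you can be checked with a window-based collocate count. A minimal sketch, assuming pre-tokenised, lower-cased text and a window of two tokens on either side (both choices are illustrative, not taken from the study); under this tokenisation contractions such as you'll remain distinct from you. The function name `collocates` is hypothetical.

```python
from collections import Counter

def collocates(tokens, node="you", window=2):
    """Count the tokens within `window` positions on either side of each `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts

# The Alderman's speech to Meg (Chimes: 172), lower-cased, punctuation removed.
alderman = ("after you are married you'll quarrel with your husband and come "
            "to be a distressed wife you may think not but you will because "
            "i tell you so").split()
hits = collocates(alderman)
print(hits["are"], hits["will"])
```

Over a whole idiolect, these raw counts would normally be compared with what the words' overall frequencies predict before calling a co-occurrence "frequent".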
48. (1) A marked recourse to you know
‘It’s my place to give advice, you know, because I’m a
Justice. You know I’m a Justice, don’t you?’ (Chimes: 172)
‘You will come to the wedding? We are in the same
boat, you know’ (Cricket: 43).
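Counting you know relative to the length of a speaker's dialogue gives a simple measure of this "marked recourse". A hypothetical sketch, assuming a regular-expression tokenizer (apostrophes inside words kept, curly apostrophes normalised) and a rate per 1,000 tokens; the function names are illustrative:

```python
import re

def tokenize(text):
    """Lower-case word tokens; curly apostrophes normalised to straight ones."""
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text)

def bigram_rate(tokens, first="you", second="know", per=1000):
    """Return (count, rate per `per` tokens) of the bigram `first second`."""
    hits = sum(1 for a, b in zip(tokens, tokens[1:]) if a == first and b == second)
    rate = hits * per / len(tokens) if tokens else 0.0
    return hits, rate

quote = ("It\u2019s my place to give advice, you know, because I\u2019m a "
         "Justice. You know I\u2019m a Justice, don\u2019t you?")
tokens = tokenize(quote)
count, rate = bigram_rate(tokens)
print(count, len(tokens), round(rate, 1))
```

Both hits in this quotation are counted, although the second (You know I’m a Justice) is a main clause rather than a parenthetic discourse marker; separating the two uses would need punctuation cues or manual checking.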
49. (2) Assertive statements using you
‘What right have you to be merry? what reason have you to
be merry? You’re poor enough.’ (Scrooge, Carol: 48); ‘After you
are married, you’ll quarrel with your husband, and come to be
a distressed wife. You may think not: but you will, because I tell
you so’ (Alderman, Chimes: 172).
50. (3) Impolite vocatives
Tackleton addresses John as ‘you dog!’ (Cricket: 45) and Edward as
‘you vagabond’ (113); Alderman Cute calls Richard ‘you dull dog’
(Chimes: 173) and ‘you silly fellow’ (173).
These vocatives stand in marked contrast with the more intimate
vocatives, such as ‘my dear’ or ‘my darling’, to which gentle and
tender-hearted characters such as Toby, Caleb, John, Philip and
others frequently resort. The differences in the use of vocatives
indicate, with concise expressiveness, the relationships between
the characters as well as their attitudes to each other.
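The contrast between impolite you + noun vocatives and intimate my + noun vocatives lends itself to simple pattern matching as a first pass before manual checking. A minimal sketch with hypothetical head-noun lists seeded only from the examples just quoted; real use would need a larger, hand-checked list, and the patterns will also catch non-vocative strings such as my dear child in narrative. The sample string is constructed for the demonstration, not quoted from Dickens.

```python
import re

# Hypothetical head-noun lists, seeded only from the examples quoted above.
IMPOLITE_HEADS = ("dog", "vagabond", "fellow")
INTIMATE_HEADS = ("dear", "darling")

# 'you' + optional single modifier + impolite head noun, e.g. "you dull dog".
IMPOLITE = re.compile(
    r"\byou\s+(?:[a-z]+\s+)?(?:%s)\b" % "|".join(IMPOLITE_HEADS), re.IGNORECASE)
# 'my' + intimate head, e.g. "my dear".
INTIMATE = re.compile(
    r"\bmy\s+(?:%s)\b" % "|".join(INTIMATE_HEADS), re.IGNORECASE)

def find_vocatives(text):
    """Return the impolite and intimate vocative candidates found in text."""
    return {"impolite": IMPOLITE.findall(text),
            "intimate": INTIMATE.findall(text)}

sample = "You dull dog. You silly fellow. My darling. You dog!"
print(find_vocatives(sample))
```

Counting the two kinds per character, normalised by dialogue length as elsewhere in the analysis, would turn the contrast described above into comparable figures.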
51. (4) The sarcastic or taunting use of ‘you’
This usage appears frequently in Scrooge’s idiolect before he is
haunted by the ghosts, as in the scene where his nephew invites
him to Christmas dinner:
‘What right have you to be merry? what reason have you to
be merry? You’re poor enough.’ (Carol: 48)
In The Chimes it is often seen in the Alderman’s relentless
bantering of Meg:
‘After you are married, you’ll quarrel with your husband, and
come to be a distressed wife. You may think not: but you will,
because I tell you so’ (Chimes: 172).