1. This document presents an analysis of term weighting methods for information retrieval and text mining.
2. It examines inverse document frequency (idf), collection term frequency (ctf), and co-occurrence weight (cw) as term weighting schemes.
3. The results show that cw, which combines ctf, idf, and co-occurrence information, outperforms other term weighting methods by better representing term importance and relevance to documents.
1. The document discusses methods for analyzing the relationships between terms in a corpus using measures like co-occurrence weight (cw) and inverse document frequency (idf).
2. It presents formulas for calculating cw, cidf, ctf, and ictf to capture term associations based on frequency of co-occurrence.
3. Tables of term pairs are provided with their calculated measure values to demonstrate the methods. The highest scoring pairs may indicate stronger semantic relations.
1. The document discusses methods for calculating weights for terms in documents, including term frequency (tf), inverse document frequency (idf), and weighted schemes that combine tf and idf like tfidf.
2. It provides examples of calculating idf values for specific terms and illustrates how idf values increase as terms appear in fewer documents.
3. Tables show ranked lists of term pairs based on their calculated co-occurrence weight (cw) values, which factor in co-occurrence frequency, idf, and co-information density.
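The tf and idf computations described above can be sketched in a few lines. The cw formula itself is not reproduced in this summary, so only the standard tf-idf pieces are shown, over an illustrative toy corpus:

```python
import math

def idf(term, docs):
    """Inverse document frequency: log of total docs over docs containing the term.
    Rarer terms get larger idf values, as the summary notes."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tfidf(term, doc, docs):
    """Term frequency within one document, weighted by idf over the collection."""
    return doc.count(term) * idf(term, docs)

docs = [["apple", "banana"], ["apple", "apple", "cherry"], ["banana", "cherry"]]
# "apple" appears in 2 of 3 documents, so idf = log(3/2)
print(round(idf("apple", docs), 4))          # 0.4055
print(round(tfidf("apple", docs[1], docs), 4))
```

This uses the plain log(N/df) form of idf; real systems often add smoothing (e.g. log(N/(1+df))), which the summary does not specify.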
1. The document summarizes research on analyzing the co-occurrence patterns of words in a large corpus of documents.
2. It finds that the number of high co-occurrence weight patterns between words is much smaller than the number of low co-occurrence weight patterns.
3. The document also presents examples of words that have high and low co-occurrence weights based on an analysis of a corpus of documents.
The document contains data from a k-means clustering algorithm with 5 clusters. It shows the cluster assignments of 50 data points to the 5 clusters over 10 iterations. The data points are numbered 0-49 on the x-axis and assigned to clusters 0-4 on the y-axis. The cluster assignments change over the 10 iterations as the k-means algorithm converges.
The document contains information about k-means clustering:
(1) It describes the basic k-means clustering algorithm which assigns data points to k clusters by minimizing the within-cluster sum of squares.
(2) It provides details on how k-means clustering is implemented, including randomly initializing cluster centers, assigning points to the closest center, and recalculating centers as the mean of each cluster.
(3) It notes some of the challenges with k-means clustering, including that it does not work well for non-convex clusters and can get stuck in local optima depending on random initialization.
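The implementation steps listed in (2) — random initialization, nearest-center assignment, and mean recomputation — can be sketched as a minimal plain-Python k-means on 2-D points (data here is illustrative):

```python
import random

def kmeans(points, k, iters=10, seed=0):
    """Plain k-means: randomly pick k initial centers, assign each point to
    its nearest center, recompute centers as cluster means, repeat.
    As noted in (3), the result can depend on the random initialization."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(pts, 2)
```

Empty clusters keep their previous center here; production implementations typically re-seed them instead.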
The document discusses performing incremental loads in SQL Server and SSIS. It describes:
1) Using T-SQL to identify new rows using a LEFT JOIN and updated rows by comparing all columns in an INNER JOIN. The rows are then inserted or updated respectively.
2) Implementing incremental loads in SSIS using a Lookup transformation to identify new and changed rows similarly to the T-SQL, and a Conditional Split to separate the rows into outputs which are loaded or updated using an OLE DB Destination and Command, respectively.
3) The approach maintains data integrity by only loading truly new or changed data in each load, making the process faster and using fewer resources than a full reload.
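The row-matching logic in steps 1) and 2) can be illustrated outside SQL. This Python sketch is only an analogy for the LEFT JOIN (keys absent from the target are new) and the full-column comparison (matching keys with differing values are changed); the rows and the `id` key are hypothetical:

```python
def diff_loads(source, target, key="id"):
    """Split source rows into new rows (key absent from target) and changed
    rows (key present but some column value differs), mirroring the
    LEFT JOIN / INNER JOIN comparison in the incremental-load pattern."""
    existing = {row[key]: row for row in target}
    new_rows = [r for r in source if r[key] not in existing]
    changed = [r for r in source if r[key] in existing and r != existing[r[key]]]
    return new_rows, changed

source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b2"}, {"id": 3, "name": "c"}]
target = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
new_rows, changed = diff_loads(source, target)
# id 3 is new; id 2 changed; id 1 is untouched
```

In the SSIS version, `new_rows` would flow to the OLE DB Destination and `changed` to the OLE DB Command.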
The document discusses:
1. The development of a thesaurus of classical Japanese poetic vocabulary to better understand the connotations of words over time and how their usage changed.
2. The thesaurus is being developed using materials from the Hachidaishu, eight anthologies of Japanese poetry compiled between 905 and 1205 CE.
3. The thesaurus development involves processing the poetry data through a tokenizer, code converter, and other tools to extract and categorize the vocabulary terms according to their attributes.
The document provides an outline of Hilofumi Yamamoto's research and teaching. It summarizes his educational background, research interests, and contributions to students at the University of Wollongong. His research focuses on Japanese vocabulary and language teaching methods. Specific areas of research include the study of connotation and computer modeling of vocabulary using corpus linguistics techniques.
The document discusses the development of a thesaurus of classical Japanese poetic vocabulary. It outlines how the thesaurus was created by analyzing poems from the Hachidaishu anthologies using techniques like tokenization, meta-code conversion, and matching original poems to scholarly translations to extract vocabulary terms and their meanings over time. The goal is to better understand the connotation and historical transition of classical poetic words in a longitudinal study.
This document appears to be notes from a lecture or presentation on natural language processing and text mining techniques. It discusses topics like inverse document frequency, co-occurrence analysis, and graph-based representations of word relationships. Tables and graphs are included to illustrate co-occurrence patterns between words and how they are represented visually. The document also references various authors and their work related to semantics, meaning, and textual analysis.
MPEG is a digital video format that compresses image and sound sequences in a synchronized way using encoders and decoders. It was developed by the Moving Picture Experts Group, part of the International Organization for Standardization.
The document discusses challenges facing a community including a lack of economic opportunities that has led many young people to leave. It notes issues such as underfunded schools and increased drug use. Solutions proposed include developing local businesses and industries to generate jobs as well as improving education resources to equip youth with skills and discourage drug abuse.
A linguistic survey on _Itako Bushi_ (1806), by Kazuhiro Okada.
1. The document announces a linguistics meeting on August 26, 2011 at Hokkaido University to discuss various topics.
2. It then lists three main presentations: the first on language contact between Ainu and Japanese from 1787-1899; the second on a study of sound changes between 1906-2010; the third on the development of the Hokkaido dialect between 1871-2010.
3. The document concludes by noting additional discussions and references cited in the presentations.
1. This document provides flight and transportation schedule information between three locations: CA, KA, and UO. It lists multiple flight numbers and departure/arrival times.
2. Transportation options between hotels are also listed, along with pricing for different vehicle types from the airport or high-speed rail station to various hotels. Fees vary based on the number of passengers.
3. Contact information is provided for two hotel booking websites at the bottom.
The document describes the counting sort algorithm. It counts the number of objects having each distinct key value (stored in array C), converts the counts into cumulative sums (C'), and uses C' to place the objects into the output array B in sorted order by their key values in the range 0 to k. Counting sort runs in O(n+k) time, where k is the range of key values, so it sorts in linear time when k is close to n.
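The steps just described can be sketched directly, keeping the array names C (counts, turned into cumulative sums in place, i.e. C') and B (output) from the summary:

```python
def counting_sort(a, k):
    """Stable counting sort of integers in the range 0..k, in O(n + k) time."""
    c = [0] * (k + 1)
    for v in a:                    # count occurrences of each key value (C)
        c[v] += 1
    for i in range(1, k + 1):      # cumulative counts (C'): final positions
        c[i] += c[i - 1]
    b = [0] * len(a)
    for v in reversed(a):          # backward pass keeps equal keys stable
        c[v] -= 1
        b[c[v]] = v
    return b

print(counting_sort([4, 1, 3, 4, 0, 2], 4))  # [0, 1, 2, 3, 4, 4]
```

The backward placement pass is what makes the sort stable, which matters when the keys carry satellite data.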
This math test contains 4 problems with multiple parts each. Problem 1 involves exponential growth modeling with unknown variables C and k. Problem 2 models road clearing using a logarithmic function. Problem 3 involves compound interest calculations for an account that doubles every 7.75 years. Problem 4 analyzes chemical reaction data and fits linear and logarithmic regressions to determine yields at given times.
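Problem 3's full wording is not given here, but the stated setup (an account that doubles every 7.75 years) pins down the continuous growth rate, since 2 = e^(kT) implies k = ln 2 / T. A small check:

```python
import math

T_DOUBLE = 7.75                       # doubling time in years (from Problem 3)
k = math.log(2) / T_DOUBLE            # continuous growth rate from 2 = e^(k*T)

def balance(principal, t):
    """Continuously compounded balance after t years."""
    return principal * math.exp(k * t)

print(round(k, 4))                    # ~0.0894 per year
print(round(balance(1000, T_DOUBLE))) # 2000: the balance has doubled
```

The same k = ln 2 / T relationship is what Problem 1's unknown k would reduce to if a doubling time were given.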
This document discusses using Rmpi and snow packages in R to perform parallel computing on a Mac. It provides examples of using makeCluster with MPI, clusterExport, and parLapply to distribute calculations across multiple CPU cores. Metrics are given showing reduced computation time when using 4 cores versus a single core for sampling and sorting large datasets.
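The snow pattern the document describes — make a cluster of workers, scatter a list of tasks, apply a function with `parLapply`, and gather results — has a direct analog in Python's standard library. This sketch uses threads for portability; for CPU-bound work like the document's sorting benchmark, `ProcessPoolExecutor` with the same `map` call is the closer analog to a 4-core snow cluster:

```python
from concurrent.futures import ThreadPoolExecutor

def sort_chunk(chunk):
    """Work applied to each chunk, analogous to the function passed to parLapply."""
    return sorted(chunk)

data = [list(range(1000, 0, -1)) for _ in range(8)]   # 8 independent tasks
with ThreadPoolExecutor(max_workers=4) as ex:          # 4 workers, like 4 cores
    results = list(ex.map(sort_chunk, data))

print(results[0][:3])  # [1, 2, 3]
```

As with Rmpi/snow, the speedup only materializes when each task is large enough to amortize the cost of distributing it.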
(1) The document discusses a cycle with dimensions and stages. (2) There are four main stages described. (3) Each stage has activities that take place within a set time period. (4) Completing all the stages constitutes one full cycle.
This document discusses binary number representation of letters in ASCII code. It provides the 8-bit binary values for several letters: B is 01000010, A is 01000001, N is 01001110, G is 01000111, and K is 01001011.
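The listed codes follow directly from each letter's ASCII code point, zero-padded to 8 bits, and can be checked in one loop:

```python
# Each ASCII letter's 8-bit binary representation, as listed in the document.
for letter in "BANGK":
    print(letter, format(ord(letter), "08b"))
# B 01000010, A 01000001, N 01001110, G 01000111, K 01001011
```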
The document provides examples of irrational expressions being added, subtracted, multiplied, divided, and general and particular equations being written. It also contains examples of word problems being solved where variables represent the number of cavities and brushing time. The document demonstrates various arithmetic operations and equations involving irrational expressions as well as solving word problems using proportional reasoning.
The document provides an analysis of a power distribution company's performance in the second quarter of 2008, noting a decrease in net revenue and adjusted EBITDA compared to the same period in 2007, while debt levels increased. It highlights factors such as lower energy sales, rising costs, and currency fluctuations that impacted financial results. The summary also examines the company's capital expenditure, debt profile, and initiatives to improve operational efficiency and expand services to clients.
1. The document discusses several issues facing the news media industry in the digital age, including how news is distributed through various platforms and the challenges of generating revenue as distribution methods change.
2. It examines topics like bundling and unbundling of content, how consumers access news through single or multi-homing, and the relationship between willingness to pay and different business models.
3. The document also explores using intermedia currencies like a digital version of Microsoft Office as a way for news publishers to earn income in the changing media landscape.
This document provides a review of graphing points on the coordinate plane. It includes an explanation that graphing points is similar to the game Battleship by using an x and y coordinate to locate a point. The four quadrants of the coordinate plane are defined, with the x-axis and y-axis dividing it. Several examples of coordinate pairs like (2,7) are given to represent points. Finally, 12 specific points are graphed on the coordinate plane for practice.
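The quadrant definitions the review covers can be captured in a small function (a sketch; the review itself presents this graphically rather than in code):

```python
def quadrant(x, y):
    """Quadrant of point (x, y): I (+,+), II (-,+), III (-,-), IV (+,-).
    Points on the x- or y-axis belong to no quadrant."""
    if x > 0 and y > 0:
        return "I"
    if x < 0 and y > 0:
        return "II"
    if x < 0 and y < 0:
        return "III"
    if x > 0 and y < 0:
        return "IV"
    return "axis"

print(quadrant(2, 7))  # the example point (2, 7) lies in quadrant I
```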
1) The document discusses a project to build a new community center that would provide services and activities for local residents.
2) Issues that were considered included costs, parking availability, and ensuring the center was accessible to people of all backgrounds.
3) After reviewing options, community leaders decided to renovate an existing building rather than constructing new in order to save costs and be up and running more quickly.
1) A series of dialogues between multiple people discussing various events that took place in 2011, including a 100km race in June and an event in August.
2) Details are provided about participants' performances in the events, with one person noting they finished the 100km race in around 11 hours.
3) The conversations also mention other meetings and events from 2009-2011, with abbreviations and acronyms used in the dialogues.
This document provides contact information for the Conveying Islamic Message Society, including their website, email, mailing address, and phone number. It discusses several scholars and their academic credentials and areas of study, including Dr. T.V.V. Persaud, Dr. Joe Leigh Simpson, Dr. E. Marshall Johnson, Dr. William W. Hay, Dr. Gerald C. Goeringer, and Tejatat Tejasen. The document advocates that science and Islam are compatible and provides the website www.islam-guide.com/science as a resource.
The document discusses the configuration of FICO modules and charts of accounts. It describes setting up company codes, business areas, and functional areas in FICO. It also covers configuring different charts of accounts for group-level reporting, day-to-day operations, and country-specific requirements.
The document discusses a 6-step process involving numbers 1 through 6 arranged in a pyramid structure. It mentions calculating percentages for years 2011 and earlier based on figures from 2011 back to 2008. The process includes converting between different representations using parentheses and arrows.
This document discusses the relationship between inflation and unemployment. It states that there is typically an inverse relationship between the two, known as the Phillips curve: when unemployment is high, inflation tends to be low because there is an excess supply of labor. The document then provides data on US inflation and unemployment rates from 1998 to 2003, over which inflation remained low and stable while unemployment gradually declined.
The document describes analyzing nucleotide sequences of the rhodopsin gene from human, chimpanzee and macaque. Key steps include:
1) Obtaining rhodopsin coding sequences from NCBI and writing them to a FASTA file
2) Performing a multiple sequence alignment using ClustalW
3) Calculating the transition/transversion ratio and genetic distance between species based on the alignment
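Step 3's transition/transversion counting can be sketched in pure Python. Transitions swap within the purines (A, G) or within the pyrimidines (C, T); everything else is a transversion. The aligned fragments below are toy data, not real rhodopsin sequences:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def ts_tv(seq1, seq2):
    """Count transitions (within purines or within pyrimidines) and
    transversions (purine <-> pyrimidine) between two aligned sequences,
    skipping identical sites and alignment gaps."""
    ts = tv = 0
    for a, b in zip(seq1, seq2):
        if a == b or "-" in (a, b):
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv

# Two transitions (T<->C twice... no: T<->C and C<->T) and one transversion (A<->T)
print(ts_tv("ATGGCA", "ACGGTT"))  # (2, 1)
```

In practice this would run over the ClustalW alignment from step 2; libraries such as Biopython provide parsers for that output.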
The document contains 10 sections with numerical listings and parentheses. Various symbols including plus signs, dashes and dots are present throughout with no clear meaning. Text is written in a disjointed format across multiple lines with no clear narrative.