Goiken2008 slide01


  1. Hilofumi Yamamoto, November 8, 2008
  2. • ( , 2005, 2006, 2007) • ( , 1983; , 1989)
  3. [Figure: chart with a horizontal axis running from 900 to 1250 in steps of 50; labels not recoverable]
  4. • (1976) • (1991) • (1998)
  5. • 9484 ( ) • kh (β ) • ( ) t2c • (48732) (1408) (49)
  6. … / の / 内 / に / 春 / は / 来 / に / けり / 鴬 / の / 凍れ / る / 涙 / 今 / や / 解く / らむ
  7. • ( , 1983) • ( , 1996) • idf (Spärck Jones, 1972): $\mathrm{idf}(t, N) = \log \frac{N}{\mathrm{df}(t)}$
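  8. idf: inverse document frequency
     \begin{align}
     \mathrm{idf}(\mathit{ari}, N) &= \log \frac{N}{\mathrm{df}(\mathit{ari})} \tag{1} \\
       &= \log \frac{9484}{1201} \tag{2} \\
       &= \log 7.89\ldots = 2.07\ldots \tag{3} \\
     \mathrm{idf}(\mathit{uguisu}, N) &= \log \frac{N}{\mathrm{df}(\mathit{uguisu})} \tag{4} \\
       &= \log \frac{9484}{101} \tag{5} \\
       &= \log 93.90\ldots = 4.54\ldots \tag{6}
     \end{align}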
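As a quick check of the arithmetic on slide 8, the Python sketch below recomputes both idf values. The function name `idf` and its argument names are illustrative and do not come from the slides. The natural logarithm is assumed because it reproduces the figures 2.07 and 4.54 given on the slide.

```python
import math


def idf(n_docs: int, df: int) -> float:
    """Inverse document frequency: log(N / df(t)), natural logarithm (assumed)."""
    if df <= 0 or df > n_docs:
        raise ValueError("df must satisfy 0 < df <= n_docs")
    return math.log(n_docs / df)


N_DOCS = 9484  # collection size used on the slide

idf_ari = idf(N_DOCS, 1201)     # df(ari) = 1201
idf_uguisu = idf(N_DOCS, 101)   # df(uguisu) = 101

print(f"idf(ari)    = {idf_ari:.2f}")
print(f"idf(uguisu) = {idf_uguisu:.2f}")
```

The rarer term (uguisu, found in 101 of 9484 items) receives the larger idf, which is the property the later weights rely on.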
  9. [Figure: L-Shape Freq-Type. Horizontal axis: frequency; vertical axis: number of type]
  10. [Figure: J-Shape IDF-Type. Horizontal axis: inverse document frequency (idf); vertical axis: number of type]
  11. • ( ) • tfidf: $w(t, K, N) = (1 + \log \mathrm{tf}(t, K))\,\mathrm{idf}(t, N)$
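The tfidf weight on slide 11 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tfidf_weight`, the example term frequency of 3, and the choice to return 0 when the term does not occur (where log tf is undefined) are not taken from the slides.

```python
import math


def tfidf_weight(tf: int, df: int, n_docs: int) -> float:
    """w(t, K, N) = (1 + log tf(t, K)) * idf(t, N), natural logarithm (assumed)."""
    if tf <= 0:
        return 0.0  # assumption: an absent term gets weight 0
    return (1 + math.log(tf)) * math.log(n_docs / df)


# Hypothetical example: a term seen 3 times in K, with df = 101 in a
# collection of 9484 items (the df and N values appear on slide 8).
print(tfidf_weight(tf=3, df=101, n_docs=9484))
```

With tf = 1 the weight reduces to the plain idf, so the logarithm only dampens the effect of repeated occurrences.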
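  12. Co-occurrence weight (cw)
      \begin{align}
      w(t, K, N) &= (1 + \log \mathrm{tf}(t, K))\,\mathrm{idf}(t, N) \tag{7} \\
      \mathrm{cidf}(t_1, t_2, N) &= \sqrt{\mathrm{idf}(t_1, N)\,\mathrm{idf}(t_2, N)} \tag{8} \\
      \mathrm{ctf}(t_1, t_2, K) &= 1 + \log |\{k : t_1, t_2 \in k\}| \tag{9}
      \end{align}
      • K • (8) • (9) K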
  13. [Figure: distribution of cidf. Horizontal axis: cidf; vertical axis: frequency of patterns]
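  14. Co-occurrence weight (cw)
      \begin{align}
      \mathrm{ictf}(t_1, t_2, N) &= 1 + \log \frac{|N|}{|\{n : t_1, t_2 \in n\}|} \tag{10} \\
      \mathrm{cw}(t_1, t_2) &= \mathrm{ctf}(t_1, t_2, K)\,\mathrm{ictf}(t_1, t_2, N)\,\mathrm{cidf}(t_1, t_2, N) \tag{11}
      \end{align}
      • K N • K • N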
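Equations (8) to (11) can be put together as in the Python sketch below. Several things here are assumptions rather than content from the slides: K is read as a subset of the whole collection N, each item is represented as a set of tokens, the toy corpus is invented for illustration, all helper names are hypothetical, and a pair that never co-occurs in K is given weight 0.

```python
import math
from typing import List, Set

Doc = Set[str]  # one item (for example one poem) as a set of tokens


def df(term: str, docs: List[Doc]) -> int:
    """Number of items that contain the term."""
    return sum(1 for d in docs if term in d)


def idf(term: str, docs: List[Doc]) -> float:
    return math.log(len(docs) / df(term, docs))


def cooc(t1: str, t2: str, docs: List[Doc]) -> int:
    """Number of items that contain both terms."""
    return sum(1 for d in docs if t1 in d and t2 in d)


def cidf(t1: str, t2: str, docs: List[Doc]) -> float:
    """Eq. (8): geometric mean of the two idf values."""
    return math.sqrt(idf(t1, docs) * idf(t2, docs))


def ctf(t1: str, t2: str, subset: List[Doc]) -> float:
    """Eq. (9): 1 + log of the co-occurrence count within K."""
    return 1 + math.log(cooc(t1, t2, subset))


def ictf(t1: str, t2: str, docs: List[Doc]) -> float:
    """Eq. (10): 1 + log(|N| / co-occurrence count within N)."""
    return 1 + math.log(len(docs) / cooc(t1, t2, docs))


def cw(t1: str, t2: str, subset: List[Doc], docs: List[Doc]) -> float:
    """Eq. (11): ctf * ictf * cidf. Assumes subset is drawn from docs."""
    if cooc(t1, t2, subset) == 0:
        return 0.0  # assumption: no co-occurrence in K means no weight
    return ctf(t1, t2, subset) * ictf(t1, t2, docs) * cidf(t1, t2, docs)


# Hypothetical toy collection N; K is taken to be its first three items.
corpus: List[Doc] = [
    {"ume", "uguisu", "haru"},
    {"ume", "uguisu"},
    {"ume", "kaze"},
    {"sakura", "haru"},
    {"tsuki", "aki"},
    {"sakura", "kaze"},
]
subset = corpus[:3]

print(cw("ume", "uguisu", subset, corpus))
print(cw("ume", "tsuki", subset, corpus))
```

In this reading, ctf rewards pairs that co-occur often inside K, ictf rewards pairs that co-occur rarely across N, and cidf rewards pairs built from individually rare terms.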
  15. [Figure: distribution of cw, with labels 1 to 8 and z marked. Horizontal axis: co-occurrence weight (cw); vertical axis: cumulative frequency of patterns]
  16. 1σ 16 ( )
  17.
  18.
  19.
  20.
  21. (1)
      t1–t2     cw      z     ctf   idf(t1)   idf(t2)
      (24)
      –         86.06   3.33  10    3.18      4.63
      –         65.15   1.76   5    3.18      3.26
      –         64.32   1.70   2    3.43      4.69
      –         63.36   1.62   2    3.18      4.92
      –         61.87   1.51   2    3.18      4.69
      –         60.36   1.40   4    3.18      3.18
      –         55.34   1.02   2    3.18      4.37
      (11)
      –         54.69   1.33   3    3.18      4.63
      –         52.40   1.12   3    3.18      3.26
      –         51.40   1.03   1    3.18      8.06
      –         51.28   1.02   2    3.43      4.63
      (15)
      –         80.25   3.74   8    3.18      4.63
      –         55.90   1.54   2    3.18      3.83
      –         54.92   1.46   8    3.18      2.08
      –         54.35   1.40   2    3.18      3.95
      –         52.42   1.23   2    3.18      3.37
      –         50.48   1.05   1    3.18      7.77
      (3)       N/A
  22. (2)
      t1–t2     cw      z     ctf   idf(t1)   idf(t2)
      (5)
      –         72.27   3.34   4    3.43      4.63
      –         52.17   1.44   2    3.43      3.95
      –         51.68   1.40   2    3.43      3.71
      –         51.00   1.33   2    3.43      3.43
      –         49.48   1.19   4    3.43      2.08
      –         48.33   1.08   1    3.43      6.59
      –         47.56   1.01   1    3.43      6.38
      (6)       N/A
      (9)       N/A
      (24)
      –         63.56   1.64   3    3.43      4.63
      –         62.38   1.55   3    3.43      3.14
      –         62.18   1.53   4    3.18      4.63
      –         56.96   1.14   1    3.43      9.16
  23. • Co-occurrence weight (cw) • z • 1σ • 1σ (16 )
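The slides report a z value next to each cw and mention a 1σ cutoff. One plausible reading, sketched below in Python, is that cw values are standardized and only pairs at least one standard deviation above the mean are kept. This is an assumption, not a method confirmed by the slides: the function names, the use of the population standard deviation, the `>=` comparison, and the sample values are all hypothetical. The "16" on the slide may refer to the roughly 16% of a normal distribution that lies above +1σ.

```python
import statistics
from typing import Dict, List


def z_scores(values: List[float]) -> List[float]:
    """Standardize values: z = (x - mean) / population standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population, not sample, sd
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def above_one_sigma(cw_by_pair: Dict[str, float], threshold: float = 1.0) -> Dict[str, float]:
    """Return the z score of every pair whose z is at or above the threshold."""
    pairs = list(cw_by_pair)
    zs = z_scores([cw_by_pair[p] for p in pairs])
    return {p: z for p, z in zip(pairs, zs) if z >= threshold}


# Hypothetical cw values for five term pairs (not taken from the slides).
cw_values = {"a-b": 10.0, "a-c": 20.0, "b-c": 30.0, "b-d": 40.0, "c-d": 100.0}
selected = above_one_sigma(cw_values)
print(selected)
```

Because the cutoff is relative to the mean and spread of the values passed in, applying it separately to each group of pairs gives each group its own threshold in cw units.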
  24. • http://etymology.jp/waka/poem.cgi • XML (SVG)
