Keio slide


  1. Hilofumi Yamamoto, June 4, 2008
  2. • 1000 • (Goodenough, 1981)
  5. • (2005) • (2006)
  6. • ( , 1983; , 1989) → p.2
  7. [Figure: timeline of anthologies with compilation years; horizontal axis: year, 900 to 1250]
  8. 1. 2. (1976) 3. (1991) 4. (1998)
  10. • 9484 • kh (β) • t2c • (48732) (1408) (49)
  11. 雪 / の / 内 / に / 春 / は / 来 / に / けり / 鴬 / の / 凍れ / る / 涙 / 今 / や / 解く / らむ (yuki / no / uchi / ni / haru / wa / ki / ni / keri / uguisu / no / kōre / ru / namida / ima / ya / toku / ramu)
  13. • ( , 1983) • ( , 1996) idf (inverse document frequency)
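  14. idf (Spärck Jones, 1972)
      \begin{align}
      \mathrm{idf}(t, N) &= \log \frac{N}{\mathrm{df}(t)} \notag \\
      \mathrm{idf}(\mathit{ari}, N) &= \log \frac{N}{\mathrm{df}(\mathit{ari})} \tag{1} \\
      &= \log \frac{9484}{1201} \tag{2} \\
      &= \log 7.89\ldots \tag{3} \\
      &= 2.07\ldots \tag{4}
      \end{align}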
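  15. idf (Spärck Jones, 1972)
      \begin{align}
      \mathrm{idf}(t, N) &= \log \frac{N}{\mathrm{df}(t)} \notag \\
      \mathrm{idf}(\mathit{uguisu}, N) &= \log \frac{N}{\mathrm{df}(\mathit{uguisu})} \tag{5} \\
      &= \log \frac{9484}{101} \tag{6} \\
      &= \log 93.90\ldots \tag{7} \\
      &= 4.54\ldots \tag{8}
      \end{align}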
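The two worked idf values on slides 14 and 15 can be checked with a few lines of Python. This is only a sketch: the function name `idf` and its argument names are illustrative, and the use of the natural logarithm is inferred from the slide values (log 7.89.. = 2.07.. holds for base e).

```python
import math


def idf(df_t: int, n_docs: int) -> float:
    """Inverse document frequency, log(N / df(t)), with the natural log (assumed)."""
    if df_t <= 0 or df_t > n_docs:
        raise ValueError("df(t) must satisfy 0 < df(t) <= N")
    return math.log(n_docs / df_t)


N = 9484  # number of documents, as given on the slides

print(round(idf(1201, N), 2))  # ari: 2.07
print(round(idf(101, N), 2))   # uguisu: 4.54
```

The rarer word (uguisu, in 101 documents) receives a higher idf than the common one (ari, in 1201 documents), which is the behaviour the slides rely on.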
  16. [Figure: L-shaped distribution (Freq-Type); x-axis: frequency, 0 to 2000; y-axis: number of types, 0 to 3500]
  17. [Figure: J-shaped distribution (IDF-Type); x-axis: inverse document frequency (idf), 1 to 9; y-axis: number of types, 0 to 1200]
  18. • tfidf: $w(t, K, N) = (1 + \log \mathrm{tf}(t, K)) \, \mathrm{idf}(t, N)$
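The tf-idf weight on slide 18 could be computed roughly as follows. The slide text does not say what K and N are, so this sketch assumes K is a sub-collection (for example, the poems of one poet) and N is the whole corpus, each document being a list of tokens. The function name `weight` and the toy romanized tokens are hypothetical.

```python
import math


def weight(term, K, N):
    """w(t, K, N) = (1 + log tf(t, K)) * idf(t, N), natural log assumed.

    K and N are lists of documents; each document is a list of tokens.
    tf is taken here as the total number of occurrences of the term in K.
    """
    tf = sum(doc.count(term) for doc in K)
    if tf == 0:
        return 0.0  # the term does not occur in K
    df = sum(1 for doc in N if term in doc)
    return (1 + math.log(tf)) * math.log(len(N) / df)


# Toy corpus, invented for illustration only.
N = [
    ["ume", "hana"],
    ["uguisu", "ume"],
    ["hana", "sakura"],
    ["uguisu", "naku"],
]
K = N[:2]  # assumed sub-collection

print(weight("ume", K, N))     # (1 + log 2) * log(4 / 2)
print(weight("sakura", K, N))  # 0.0, since "sakura" is absent from K
```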
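  19. (cw)
      \begin{align}
      w(t, K, N) &= (1 + \log \mathrm{tf}(t, K)) \, \mathrm{idf}(t, N) \tag{9} \\
      \mathrm{cidf}(t_1, t_2, N) &= \sqrt{\mathrm{idf}(t_1, N) \, \mathrm{idf}(t_2, N)} \tag{10} \\
      \mathrm{ctf}(t_1, t_2, K) &= 1 + \log |\{k : t_1, t_2 \in k\}| \tag{11}
      \end{align}
      • K • (10) → cidf • (11) K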
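Equations (10) and (11) on slide 19 can be sketched in Python as below. This reads equation (10) as the geometric mean of the two idf values (the square root covering the whole product), which is an interpretation of the slide. The function names and the set-of-tokens document representation are assumptions made for illustration.

```python
import math


def cidf(idf_t1: float, idf_t2: float) -> float:
    """Equation (10), read as the geometric mean of the two idf values."""
    return math.sqrt(idf_t1 * idf_t2)


def ctf(t1, t2, K) -> float:
    """Equation (11): 1 + log of the number of documents in K containing both terms.

    K is a list of documents, each a set of tokens. Returns 0.0 when the
    pair never co-occurs (a convention chosen here, not stated on the slide).
    """
    n = sum(1 for k in K if t1 in k and t2 in k)
    return 1 + math.log(n) if n > 0 else 0.0


# idf values taken from the first row of the table on slide 28.
print(round(cidf(3.18, 4.63), 2))  # 3.84

# Toy sub-collection, invented for illustration only.
K = [{"ume", "uguisu"}, {"ume", "hana"}, {"ume", "uguisu", "naku"}]
print(ctf("ume", "uguisu", K))  # 1 + log 2
```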
  20. [Figure: distribution of cidf; x-axis: cidf, 0 to 9; y-axis: frequency of patterns, 0 to 1000]
  21. (cw)
      \begin{align}
      \mathrm{ictf}(t_1, t_2, N) &= 1 + \log \frac{|N|}{|\{n : t_1, t_2 \in n\}|} \tag{12} \\
      \mathrm{cw}(t_1, t_2) &= \mathrm{ctf}(t_1, t_2, K) \, \mathrm{ictf}(t_1, t_2, N) \, \mathrm{cidf}(t_1, t_2, N) \tag{13}
      \end{align}
      • K N • K • N
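Putting equations (10) to (13) together, the co-occurrence weight cw might be computed along these lines. This is a minimal sketch, not the author's implementation: it assumes K is a subset of the corpus N, that documents are sets of tokens, that all logarithms are natural, and that equation (10) is a geometric mean. All function names and the toy data are hypothetical.

```python
import math


def cooc(t1, t2, docs) -> int:
    """Number of documents that contain both terms."""
    return sum(1 for d in docs if t1 in d and t2 in d)


def idf(t, N) -> float:
    """log(|N| / df(t)); assumes t occurs in at least one document of N."""
    return math.log(len(N) / sum(1 for d in N if t in d))


def cw(t1, t2, K, N) -> float:
    """cw = ctf * ictf * cidf, following equations (10) to (13)."""
    n_k, n_n = cooc(t1, t2, K), cooc(t1, t2, N)
    if n_k == 0 or n_n == 0:
        return 0.0  # the pair never co-occurs; convention chosen for this sketch
    ctf = 1 + math.log(n_k)                      # equation (11)
    ictf = 1 + math.log(len(N) / n_n)            # equation (12)
    cidf = math.sqrt(idf(t1, N) * idf(t2, N))    # equation (10)
    return ctf * ictf * cidf                     # equation (13)


# Toy corpus, invented for illustration only.
N = [
    {"ume", "uguisu"},
    {"ume", "hana"},
    {"uguisu", "naku"},
    {"hana", "sakura"},
]
K = N[:2]  # assumed sub-collection

print(cw("ume", "uguisu", K, N))  # (1 + log 4) * log 2
print(cw("ume", "sakura", K, N))  # 0.0
```

A pair scores highly when it co-occurs often inside K (ctf), rarely across N (ictf), and consists of words that are themselves rare (cidf).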
  22. [Figure: cumulative frequency of patterns by cw, with z markers; x-axis: co-occurrence weight (cw), 0 to 100; y-axis: cumulative frequency of patterns, 0 to 900]
  23. 1σ 16 ( )
  28. (1)
            t1–t2     cw     z  ctf  idf(t1)  idf(t2)
      (24)  –      86.06  3.33   10     3.18     4.63
            –      65.15  1.76    5     3.18     3.26
            –      64.32  1.70    2     3.43     4.69
            –      63.36  1.62    2     3.18     4.92
            –      61.87  1.51    2     3.18     4.69
            –      60.36  1.40    4     3.18     3.18
            –      55.34  1.02    2     3.18     4.37
      (11)  –      54.69  1.33    3     3.18     4.63
            –      52.40  1.12    3     3.18     3.26
            –      51.40  1.03    1     3.18     8.06
            –      51.28  1.02    2     3.43     4.63
      (15)  –      80.25  3.74    8     3.18     4.63
            –      55.90  1.54    2     3.18     3.83
            –      54.92  1.46    8     3.18     2.08
            –      54.35  1.40    2     3.18     3.95
            –      52.42  1.23    2     3.18     3.37
            –      50.48  1.05    1     3.18     7.77
      (3)   N/A
  29. (2)
            t1–t2     cw     z  ctf  idf(t1)  idf(t2)
      (5)   –      72.27  3.34    4     3.43     4.63
            –      52.17  1.44    2     3.43     3.95
            –      51.68  1.40    2     3.43     3.71
            –      51.00  1.33    2     3.43     3.43
            –      49.48  1.19    4     3.43     2.08
            –      48.33  1.08    1     3.43     6.59
            –      47.56  1.01    1     3.43     6.38
      (6)   N/A
      (9)   N/A
      (24)  –      63.56  1.64    3     3.43     4.63
            –      62.38  1.55    3     3.43     3.14
            –      62.18  1.53    4     3.18     4.63
            –      56.96  1.14    1     3.43     9.16
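Every row in the two tables has z of at least 1.0, and within each group z rises linearly with cw, so z appears to be the cw score standardized within a group, with patterns kept when they lie at least 1σ above the group mean. The sketch below shows that kind of filter. It assumes the population standard deviation (the slides do not say which estimator was used), and the function names and cw values are invented for illustration.

```python
import statistics


def z_scores(values):
    """Standardize values: (v - mean) / sd, using the population sd (assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def salient(patterns, threshold=1.0):
    """Return (pair, z) for patterns whose cw lies at least `threshold` sd above the mean."""
    pairs = list(patterns)
    zs = z_scores([patterns[p] for p in pairs])
    return [(p, round(z, 2)) for p, z in zip(pairs, zs) if z >= threshold]


# Hypothetical cw scores for one group of patterns.
toy = {
    ("a", "b"): 10,
    ("a", "c"): 20,
    ("b", "c"): 30,
    ("c", "d"): 40,
    ("d", "e"): 100,
}

print(salient(toy))  # [(('d', 'e'), 1.9)]
```

For roughly normal scores, a cutoff of 1σ above the mean keeps about the top 16% of patterns, which matches the "1σ" and "16" that appear together on slides 23 and 30.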
  30. • (cw) z 1σ • 1σ (16 )
  31. • http://etymology.jp/waka/poem.cgi • XML (SVG)
