Hideitsu Hino

Published in: Science

  1. 1. 2016/06/06 @ 1 / 74
  2. 2. 1 2 3 2 / 74
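  3. 3. Quantities used in this talk. Self-information: $I_f(x) = -\ln f(x)$. Entropy: $H(f) = -\int f(x) \ln f(x)\,dx$, also written $H(X)$ for a random variable $X$ with density $f$. Footnote 1: this is the Shannon entropy; other definitions include the Rényi entropy $(1 - \alpha)^{-1} \log \int f(x)^{\alpha}\,dx$ and the Tsallis entropy $(q - 1)^{-1}\bigl(1 - \int f^{q}(x)\,dx\bigr)$. 3 / 74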
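  4. 4. Further quantities used in this talk. Cross entropy: $H(f, g) = E_f[I_g(X)] = -\int f(x) \ln g(x)\,dx$; entropy: $H(f) = E_f[I_f(X)] = -\int f(x) \ln f(x)\,dx$. Kullback-Leibler divergence: $D_{KL}(f, g) = E_f[I_g(X)] - E_f[I_f(X)] = \int f(x) \ln \frac{f(x)}{g(x)}\,dx$. Mutual information: $MI(X, Y) = H(X) + H(Y) - H(X, Y)$, where $H(X, Y)$ is the joint entropy of $X$ and $Y$. 4 / 74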
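The definitions on this slide can be checked numerically. The sketch below is not from the talk: it picks two Gaussians, $f = N(0, 1)$ and $g = N(1, 1)$, as an illustrative assumption and approximates the integrals by a Riemann sum on a wide grid. The helper name `gaussian_pdf` is made up for this example.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Integration grid, wide enough that both densities are negligible at the ends.
x = np.linspace(-12.0, 12.0, 240001)
dx = x[1] - x[0]

f = gaussian_pdf(x, 0.0, 1.0)  # f = N(0, 1), an assumed example density
g = gaussian_pdf(x, 1.0, 1.0)  # g = N(1, 1), an assumed example density

entropy = -np.sum(f * np.log(f)) * dx        # H(f)
cross_entropy = -np.sum(f * np.log(g)) * dx  # H(f, g)
kl = np.sum(f * np.log(f / g)) * dx          # D_KL(f, g)

print(entropy, cross_entropy, kl)
```

For this pair the closed forms are $H(f) = \frac{1}{2}\ln(2\pi e)$ and $D_{KL}(f, g) = \frac{1}{2}$, and the identity $D_{KL}(f, g) = H(f, g) - H(f)$ holds up to rounding.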
  5. 5. KL 5 / 74
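  7. 7. Independent component analysis: an $m$-dimensional $Y \in \mathbb{R}^m$, an $n$-dimensional $X \in \mathbb{R}^n$, and a matrix $W \in \mathbb{R}^{m \times n}$ with $Y = WX$ (1). $W$ is determined from the density $f(WX)$ of $WX$ and its $m$ marginals $f(w_j X)$, $j = 1, \dots, m$ [Hyvärinen&Oja, 2000]. 6 / 74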
  8. 8. 7 / 74
  9. 9. $k$-means objective: $L(c_1, \dots, c_K) = \sum_{i=1}^{n} \min_{l = 1, \dots, K} \|x_i - c_l\|^2$. 8 / 74
  10. 10. $k$-means objective: $L(c_1, \dots, c_K) = \sum_{i=1}^{n} \min_{l = 1, \dots, K} \|x_i - c_l\|^2$. [Figure 2 of "A Nonparametric Information Theoretic Clustering Algorithm": comparison of the proposed clustering method NIC and the $k$-means clustering algorithm on three cases; panels (a)-(c) NIC, (d)-(f) $k$-means.] Fig. from [Faivishevsky&Goldberger, 2010]. 8 / 74
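As a small illustration of the $k$-means objective above, the following sketch evaluates $L(c_1, \dots, c_K)$ for given points and centers. The function name `kmeans_loss` and the toy data are assumptions made for this example; the slide only states the formula.

```python
import numpy as np

def kmeans_loss(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    points = np.asarray(points, dtype=float)    # shape (n, p)
    centers = np.asarray(centers, dtype=float)  # shape (K, p)
    # Squared distance between every point and every center, shape (n, K).
    sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.min(axis=1).sum()

# Toy data: two well-separated pairs of points on the real line.
points = np.array([[0.0], [1.0], [10.0], [11.0]])
centers = np.array([[0.5], [10.5]])
loss = kmeans_loss(points, centers)
print(loss)
```

Each of the four points lies at distance 0.5 from its nearest center, so the loss is $4 \times 0.25 = 1$.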
  11. 11. H(X|Y ) 9 / 74
  12. 12. $H(X \mid Y)$. [Figure from "A Nonparametric Information Theoretic Clustering Algorithm": panels (a)-(f).] 9 / 74
  13. 13. H(X|Y ) Fisher [Hino&Murata, 2010] 10 / 74
  14. 14. [Figure: four scatter plots, 1st axis against 2nd axis (both from −3 to 3), with panels labeled LDA and minH.] 11 / 74
  15. 15. 12 / 74
  17. 17. [Figure: TOPIX and ChangePointScore from 1988-02-01 to 1996-04-01, annotated with the events "decay of bubble economy", "the Gulf war", "European single market completed" and "The Great Hanshin-Awaji Earthquake".] Change-point score: $\mathrm{score}(t) = \log \frac{f_{\mathrm{after}}(t)}{f_{\mathrm{before}}(t)}$ [Murata+, 2013, Koshijima+, 2015]. 13 / 74
  18. 18. $f(x_{t+1} \mid x_{t:1})$; 50%, 95%; $f(x_{t+1} \mid x_{t:1})$ 14 / 74
  19. 19. ( ) 15 / 74
  21. 21. ( ) Vapnik 15 / 74
  22. 22. 16 / 74
  23. 23. 17 / 74
  24. 24. 18 / 74
  25. 25. 1 2 3 19 / 74
  26. 26. $D = \{x_i\}_{i=1}^{n} \subset \mathbb{R}$. Note 1: the elements of $D$ are i.i.d. 20 / 74
  27. 27. $f(x) = \frac{5}{8}\,\phi(x; \mu = 0, \sigma = 1) + \frac{3}{8}\,\phi(x; \mu = 3, \sigma = 1)$ 21 / 74
  29. 29. 22 / 74
  30. 30. 23 / 74
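  31. 31. Kernel density estimator: $\hat{f}(x; h) = \frac{1}{nh} \sum_{i=1}^{n} \kappa\bigl((x - x_i)/h\bigr)$ (2), where the kernel $\kappa$ satisfies $\int \kappa(x)\,dx = 1$ and $h > 0$ is the bandwidth. With $\kappa_h(x) = h^{-1} \kappa(x/h)$, this reads $\hat{f}(x; h) = \frac{1}{n} \sum_{i=1}^{n} \kappa_h(x - x_i)$. 23 / 74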
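A minimal sketch of the kernel density estimator (2), assuming a standard normal kernel and a sample drawn from the two-component mixture $\frac{5}{8} N(0, 1) + \frac{3}{8} N(3, 1)$ used earlier in the deck. The bandwidth `h=0.4`, the sample size and the function names are illustrative choices, not values from the slides.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used as the kernel kappa."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde(x, data, h):
    """Kernel density estimate f_hat(x; h) = (1 / (n h)) * sum_i kappa((x - x_i) / h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    u = (x[:, None] - data[None, :]) / h  # shape (len(x), n)
    return gaussian_kernel(u).sum(axis=1) / (data.size * h)

# Sample from the mixture 5/8 N(0, 1) + 3/8 N(3, 1).
rng = np.random.default_rng(0)
n = 500
from_second = rng.random(n) < 3.0 / 8.0            # component indicator
sample = rng.standard_normal(n) + 3.0 * from_second

grid = np.linspace(-4.0, 7.0, 221)
density = kde(grid, sample, h=0.4)  # h = 0.4 is an arbitrary illustrative bandwidth
```

Because each $\kappa_h$ integrates to one, the estimate itself integrates to one for any $h > 0$; the bandwidth only controls how much the sample is smoothed.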
  32. 32. κ N(0, 1) 24 / 74
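  35. 35. Error at a point $x$: MSE (mean squared error). For an estimator $\hat{\theta}$ of $\theta$: $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \mathrm{Var}[\hat{\theta}] + (E[\hat{\theta}] - \theta)^2$. Mean: $E[\hat{f}(x; h)] = E[\kappa_h(x - X)] = \int \kappa_h(x - y) f(y)\,dy$. With the convolution $(f * g)(x) = \int f(x - y) g(y)\,dy$, the bias of $\hat{f}(x; h)$ is $E[\hat{f}(x; h)] - f(x) = (\kappa_h * f)(x) - f(x)$, and the variance is $\mathrm{Var}[\hat{f}(x; h)] = \frac{1}{n} \bigl\{ (\kappa_h^2 * f)(x) - (\kappa_h * f)^2(x) \bigr\}$. 25 / 74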
  36. 36. MSE at a point $x$: $\mathrm{MSE}[\hat{f}(x; h)] = \frac{1}{n} \bigl\{ (\kappa_h^2 * f)(x) - (\kappa_h * f)^2(x) \bigr\} + \bigl\{ (\kappa_h * f)(x) - f(x) \bigr\}^2$. 26 / 74
  37. 37. $L^2$ error: ISE (integrated squared error), $\mathrm{ISE}[\hat{f}(\cdot; h)] = \int \bigl( \hat{f}(x; h) - f(x) \bigr)^2\,dx$. 27 / 74
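  38. 38. $\hat{f}(x; h)$ depends on the data $D = \{x_i\}_{i=1}^{n}$, so the ISE of $\hat{f}$ is averaged over $D$: MISE (mean integrated squared error), $\mathrm{MISE}[\hat{f}(\cdot; h)] = E_D\bigl[\mathrm{ISE}[\hat{f}(\cdot; h, D)]\bigr] = \int E_D\bigl[ (\hat{f}(x; h, D) - f(x))^2 \bigr]\,dx = \int \mathrm{MSE}[\hat{f}(x; h, D)]\,dx$. 28 / 74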
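  39. 39. $\mathrm{MISE}[\hat{f}(\cdot; h)] = n^{-1} \int \bigl\{ (\kappa_h^2 * f)(x) - (\kappa_h * f)^2(x) \bigr\}\,dx + \int \bigl\{ (\kappa_h * f)(x) - f(x) \bigr\}^2\,dx = (nh)^{-1} \int \kappa^2(x)\,dx + (1 - n^{-1}) \int (\kappa_h * f)^2(x)\,dx - 2 \int (\kappa_h * f)(x) f(x)\,dx + \int f(x)^2\,dx$. 29 / 74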
  40. 40. MISE h MISE h 30 / 74
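  41. 41. Assumptions. 1. $f$ is of class $C^2$ and $L^2$. 2. The bandwidth $h = h_n$ is a sequence $\{h_n\}$ depending on $n$ with $\lim_{n \to \infty} h = 0$ and $\lim_{n \to \infty} nh = \infty$. 3. The kernel $\kappa$ has a finite 4th moment and satisfies $\int \kappa(x)\,dx = 1$, $\int x \kappa(x)\,dx = 0$, $\mu_2(\kappa) = \int x^2 \kappa(x)\,dx < \infty$. 31 / 74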
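  42. 42. Mean: $E[\hat{f}(x; h)] = \int \kappa(z) f(x - hz)\,dz$. Taylor expansion of $f(x - hz)$: $f(x - hz) = f(x) - hz f'(x) + \frac{1}{2} h^2 z^2 f''(x) + o(h^2)$, hence $E[\hat{f}(x; h)] = f(x) + \frac{1}{2} h^2 f''(x) \int z^2 \kappa(z)\,dz + o(h^2)$, and the bias is $E[\hat{f}(x; h)] - f(x) = \frac{1}{2} h^2 \mu_2(\kappa) f''(x) + o(h^2)$ (3). $\hat{f}$ is an asymptotically unbiased estimator of $f$. 32 / 74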
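  43. 43. For a function $g$, let $R(g) = \int g^2(x)\,dx$. Variance: $\mathrm{Var}[\hat{f}(x; h)] = (nh)^{-1} R(\kappa) f(x) + o((nh)^{-1})$ (4). (2), (3) → 0; MSE: $\mathrm{MSE}[\hat{f}(x; h)] = (nh)^{-1} R(\kappa) f(x) + \frac{1}{4} h^4 \mu_2^2(\kappa) (f''(x))^2 + o((nh)^{-1} + h^4)$. 33 / 74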
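  44. 44. Integrating the MSE: $\mathrm{MISE}[\hat{f}(\cdot; h)] = \mathrm{AMISE}[\hat{f}(\cdot; h)] + o((nh)^{-1} + h^4)$, with $\mathrm{AMISE}[\hat{f}(\cdot; h)] = (nh)^{-1} R(\kappa) + \frac{1}{4} h^4 \mu_2^2(\kappa) R(f'')$. The AMISE approximates the MISE, and the $h$ minimizing it is $h_{\mathrm{AMISE}} = \left[ \frac{R(\kappa)}{\mu_2^2(\kappa) R(f'') n} \right]^{1/5}$. 34 / 74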
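To make the formula for $h_{\mathrm{AMISE}}$ concrete, the sketch below evaluates it for a case where every ingredient has a closed form: a standard normal kernel, for which $R(\kappa) = 1/(2\sqrt{\pi})$ and $\mu_2(\kappa) = 1$, and a true density $N(0, \sigma^2)$, for which $R(f'') = 3/(8\sqrt{\pi}\sigma^5)$. This normal reference case is an assumption of the example, not something stated on the slide, and the function names are hypothetical.

```python
import math

def amise_bandwidth(r_kernel, mu2_kernel, r_f2, n):
    """h_AMISE = [R(kappa) / (mu_2(kappa)^2 * R(f'') * n)]^(1/5)."""
    return (r_kernel / (mu2_kernel ** 2 * r_f2 * n)) ** 0.2

# Constants of the standard normal kernel.
R_GAUSS = 1.0 / (2.0 * math.sqrt(math.pi))  # R(kappa)
MU2_GAUSS = 1.0                             # mu_2(kappa)

def r_f2_normal(sigma):
    """R(f'') when f is the N(0, sigma^2) density."""
    return 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)

h = amise_bandwidth(R_GAUSS, MU2_GAUSS, r_f2_normal(1.0), n=100)
print(h)
```

In this special case the expression collapses to $h_{\mathrm{AMISE}} = (4/3)^{1/5} \sigma n^{-1/5} \approx 1.06\,\sigma n^{-1/5}$. For an unknown $f$, $R(f'')$ is not available and has to be estimated or replaced by such a reference value.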
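  46. 46. $k$-nearest-neighbor density estimation: estimate $f(z)$ at $z \in \mathbb{R}^p$ from $D = \{x_i\}_{i=1}^{n}$. $\varepsilon_k$: distance from $z$ to its $k$-th nearest neighbor. The $p$-dimensional ball of radius $\varepsilon$ centered at $z$ is $b(z; \varepsilon) = \{x \in \mathbb{R}^p \mid \|z - x\| < \varepsilon\}$, with volume $|b(z; \varepsilon)| = c_p \varepsilon^p$, $c_p = \pi^{p/2} / \Gamma(p/2 + 1)$, where $\Gamma(\cdot)$ is the gamma function. 35 / 74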
  47. 47. $k$-nearest neighbors. [Figure: scatter plot of the data points $x_i \in D$ (◦) and the point $z \in \mathbb{R}^p$ (×).] 36 / 74
  49. 49. $k$-nearest neighbors. [Figure: the same scatter plot with a ball of radius $\varepsilon$ centered at $z$; the ball of radius $\varepsilon$ contains $k$ points.] 37 / 74
  53. 53. $k$-nearest neighbors. [Figure: scatter plot with the ball of radius $\varepsilon$ centered at $z$.] Probability mass of the ball: $q_z(\varepsilon) = \int_{b(z; \varepsilon)} f(x)\,dx$, approximated by $k/n$. Fixing $k$ determines $\varepsilon = \varepsilon_k$; fixing $\varepsilon$ determines $k_\varepsilon$. 38 / 74
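  54. 54. $k$-nearest neighbors, Taylor expansion: $q_z(\varepsilon_k) = \int_{b(z; \varepsilon_k)} \bigl\{ f(z) + (x - z)^{\top} \nabla f(z) + O(\varepsilon_k^2) \bigr\}\,dx = |b(z; \varepsilon_k)| \bigl( f(z) + O(\varepsilon_k^2) \bigr) \simeq \varepsilon_k^p c_p f(z)$, where $c_p$ is the volume of the unit ball in $\mathbb{R}^p$. 39 / 74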
  55. 55. $k$-nearest neighbors: equating $\frac{k}{n}$ with $\varepsilon_k^p c_p f(z)$ gives $\hat{f}_k(z) = \frac{k}{c_p n} \varepsilon_k^{-p}$ (5). 40 / 74
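  56. 56. $k$-nearest-neighbor density estimator: $\hat{f}_k(z) = \frac{k}{c_p n} \varepsilon_k^{-p}$ (6), where $\varepsilon_k$ is the distance from $z$ to its $k$-th nearest neighbor in $D$. 41 / 74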
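The estimator (6) takes only a few lines of code. The sketch below assumes Euclidean distance and a brute-force neighbor search; the names `unit_ball_volume` and `knn_density` and the four-point toy data set are illustrative and not part of the talk.

```python
import math
import numpy as np

def unit_ball_volume(p):
    """c_p = pi^(p/2) / Gamma(p/2 + 1), the volume of the unit ball in R^p."""
    return math.pi ** (p / 2) / math.gamma(p / 2 + 1)

def knn_density(z, data, k):
    """k-NN density estimate f_hat_k(z) = k / (c_p * n * eps_k^p)."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]  # treat a 1-D array as n points in R^1
    z = np.asarray(z, dtype=float).reshape(1, -1)
    n, p = data.shape
    dist = np.sort(np.linalg.norm(data - z, axis=1))
    eps_k = dist[k - 1]  # distance from z to its k-th nearest neighbor
    return k / (unit_ball_volume(p) * n * eps_k ** p)

# Toy example in one dimension: distances from z = 0.4 are 0.4, 0.6, 1.6, 2.6.
estimate = knn_density(0.4, [0.0, 1.0, 2.0, 3.0], k=2)
print(estimate)
```

With $k = 2$ the radius is $\varepsilon_2 = 0.6$ and $c_1 = 2$, so the estimate is $2 / (2 \cdot 4 \cdot 0.6) \approx 0.417$.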
  57. 57. 42 / 74
  58. 58. 1 2 3 43 / 74
  59. 59. Goal: estimate $H(f)$ from $D = \{x_i\}_{i=1}^{n}$, $x_i \in \mathbb{R}^p$, $i = 1, \dots, n$, where $f(x)$ is the density of $X$. 44 / 74
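  60. 60. Probability mass of the ball of radius $\varepsilon$ centered at $z$: $q_z(\varepsilon) = \int_{x \in b(z; \varepsilon)} f(x)\,dx$ (7). 45 / 74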
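  61. 61. Probability mass of the ball of radius $\varepsilon$ centered at $z$: $q_z(\varepsilon) = \int_{x \in b(z; \varepsilon)} f(x)\,dx$ (7). Expanding $f$ around $z$: $q_z(\varepsilon) = \int_{x \in b(z; \varepsilon)} \bigl\{ f(z) + (x - z)^{\top} \nabla f(z) + O(\varepsilon^2) \bigr\}\,dx = |b(z; \varepsilon)| \bigl( f(z) + O(\varepsilon^2) \bigr) = c_p \varepsilon^p f(z) + O(\varepsilon^{p+2})$. Approximating $q_z(\varepsilon)$ by $k/n$ and ignoring the $O(\varepsilon^{p+2})$ term gives the $k$-nearest-neighbor estimator. 45 / 74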
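  62. 62. Expanding the probability mass $q_z(\varepsilon)$ of the ball of radius $\varepsilon$ centered at $z$ to higher order in $\varepsilon$: $q_z(\varepsilon) = c_p f(z) \varepsilon^p + \frac{p}{4(p/2 + 1)} c_p \varepsilon^{p+2}\,\mathrm{tr} \nabla^2 f(z) + O(\varepsilon^{p+4})$ (8). 46 / 74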
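  63. 63. Expanding the probability mass $q_z(\varepsilon)$ of the ball of radius $\varepsilon$ centered at $z$ to higher order in $\varepsilon$: $q_z(\varepsilon) = c_p f(z) \varepsilon^p + \frac{p}{4(p/2 + 1)} c_p \varepsilon^{p+2}\,\mathrm{tr} \nabla^2 f(z) + O(\varepsilon^{p+4})$ (8). Approximating $q_z(\varepsilon)$ by $k_\varepsilon / n$ and dividing by $c_p \varepsilon^p$: $\frac{k_\varepsilon}{n c_p \varepsilon^p} = f(z) + C \varepsilon^2 + O(\varepsilon^4)$ (9), with $C = \frac{p\,\mathrm{tr} \nabla^2 f(z)}{4(p/2 + 1)}$. 46 / 74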
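Expansion (8) can be sanity-checked in one dimension, where the ball is the interval $(z - \varepsilon, z + \varepsilon)$, $c_1 = 2$ and $\mathrm{tr} \nabla^2 f = f''$. The sketch below assumes a standard normal $f$ (an illustrative choice, not from the slides), for which $q_z(\varepsilon)$ is available exactly through the error function and $f''(z) = (z^2 - 1) f(z)$.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ball_mass_exact(z, eps):
    """q_z(eps) for the standard normal density and p = 1."""
    return normal_cdf(z + eps) - normal_cdf(z - eps)

def ball_mass_expansion(z, eps, order):
    """Truncation of expansion (8) for p = 1: order 1 keeps the eps^p term,
    order 2 also keeps the eps^(p+2) term."""
    p, c_p = 1, 2.0
    f = normal_pdf(z)
    second = (z * z - 1.0) * f  # f''(z), i.e. tr of the Hessian when p = 1
    value = c_p * f * eps ** p
    if order >= 2:
        value += p / (4.0 * (p / 2 + 1.0)) * c_p * eps ** (p + 2) * second
    return value

z, eps = 0.5, 0.2
exact = ball_mass_exact(z, eps)
err_first = abs(exact - ball_mass_expansion(z, eps, order=1))
err_second = abs(exact - ball_mass_expansion(z, eps, order=2))
print(exact, err_first, err_second)
```

At $z = 0.5$, $\varepsilon = 0.2$ the first-order error is about $7 \times 10^{-4}$, and adding the $\varepsilon^{p+2}$ term reduces it to about $3 \times 10^{-6}$, in line with the $O(\varepsilon^{p+4})$ remainder.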
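  64. 64. Let $Y_\varepsilon = \frac{k_\varepsilon}{n c_p \varepsilon^p}$ and $X_\varepsilon = \varepsilon^2$. Ignoring the terms of order $\varepsilon^4$, $Y_\varepsilon$ is linear in $X_\varepsilon$: $Y_\varepsilon \simeq f(z) + C X_\varepsilon$ (10). 47 / 74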
  65. 65. $Y_\varepsilon \simeq f(z) + C X_\varepsilon$: $X_\varepsilon$ and $Y_\varepsilon$ are obtained for varying $\varepsilon$. 48 / 74
  66. 66. $k$-nearest neighbors [recap]. [Figure: scatter plot with a ball of radius $\varepsilon$ centered at $z$; the ball of radius $\varepsilon$ contains $k$ points.] 49 / 74
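  70. 70. Take $E = \{\varepsilon_1, \dots, \varepsilon_m\}$, $m < n$. For each $\varepsilon \in E$, compute the pairs $\{(X_\varepsilon, Y_\varepsilon)\}_{\varepsilon \in E}$ and minimize $R = \frac{1}{m} \sum_{\varepsilon \in E} (Y_\varepsilon - f(z) - C X_\varepsilon)^2$ (11) with respect to $f(z)$ and $C$. The minimizing $f(z)$ is the density estimate $\hat{f}_s(z)$. 50 / 74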
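  71. 71. The estimate $\hat{f}_s(z)$ at each point $z$ is plugged into a leave-one-out entropy estimate: $\hat{H}_s(D) = -\frac{1}{n} \sum_{i=1}^{n} \ln \hat{f}_{s,i}(x_i)$ (12), where $\hat{f}_{s,i}(x_i)$ is the estimate at $x_i$ computed without $x_i$. $\hat{H}_s(D)$: Simple Regression Entropy Estimator (SRE) [Hino+, 2015]. 51 / 74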
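The regression step (11) and the leave-one-out average (12) can be sketched as follows. Several choices here are assumptions of this example rather than the authors' settings: the set $E$ is taken to be the distances from $x_i$ to its $k$-th nearest neighbors for $k$ in a user-supplied range `ks` (so that $k_\varepsilon = k$), the sample size inside the density estimate is $n - 1$ because $x_i$ is left out, and a non-positive fitted intercept is replaced by the $k$-NN value at the smallest radius. All function names are hypothetical.

```python
import math
import numpy as np

def unit_ball_volume(p):
    return math.pi ** (p / 2) / math.gamma(p / 2 + 1)

def regression_intercepts(X, Y):
    """Row-wise least-squares fit Y ~ a + C * X; returns the intercepts a."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    slope = (Xc * Yc).sum(axis=1) / (Xc ** 2).sum(axis=1)
    return Y.mean(axis=1) - slope * X.mean(axis=1)

def sre_entropy(data, ks):
    """Leave-one-out entropy estimate built from regression intercepts."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]
    n, p = data.shape
    ks = np.asarray(ks)
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    dist.sort(axis=1)
    # Column 0 is the zero self-distance, so column k holds the distance from
    # x_i to its k-th nearest neighbor among the other n - 1 points.
    eps = dist[:, ks]                                      # shape (n, m)
    X = eps ** 2                                           # X_eps = eps^2
    Y = ks / ((n - 1) * unit_ball_volume(p) * eps ** p)    # Y_eps
    f_hat = regression_intercepts(X, Y)
    # Assumed safeguard: fall back to the smallest-radius k-NN value
    # whenever the fitted intercept is not positive.
    f_hat = np.where(f_hat > 0, f_hat, Y[:, 0])
    return -np.mean(np.log(f_hat))

rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)
print(sre_entropy(sample, np.arange(20, 101)))  # true value: 0.5 * ln(2 * pi * e)
```

The closed-form row-wise regression avoids a Python loop over the points; the pairwise distance matrix costs $O(n^2)$ memory, which is acceptable for a sketch but not for large $n$.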
  72. 72. SRE: how it works. [Figure: left panel "Normal", density against x (x from −3 to 3); right panel "Normal", $f(z)$ against epsilon^2 with the fitted line (legend: Fitted density function) and the fitted intercept $\hat{f}_s(z = 0.5)$.] 52 / 74
  73. 73. SRE: how it works. [Figure: left panel "Bimodal", density against x (x from −3 to 3); right panel "Bimodal", $f(z)$ against epsilon^2 with the fitted line (legend: Fitted density function) and the fitted intercept $\hat{f}_s(z = 0.5)$.] 53 / 74
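  74. 74. For a fixed $\varepsilon$ and each $x_i \in D$: $Y_\varepsilon \simeq f(x_i) + C X_\varepsilon$, with $Y_\varepsilon = \frac{k_\varepsilon}{n c_p \varepsilon^p}$ and $C = \frac{p\,\mathrm{tr} \nabla^2 f(x_i)}{4(p/2 + 1)}$. Making the dependence on $x_i$ explicit with $Y^i_\varepsilon$ and $C^i$: $Y^i_\varepsilon \simeq f(x_i) + C^i X_\varepsilon$. 54 / 74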
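  75. 75. Averaging $Y^i_\varepsilon = f(x_i) + C^i X_\varepsilon$ over $x_i \in D$: $-\frac{1}{n} \sum_{i=1}^{n} \ln Y^i_\varepsilon = -\frac{1}{n} \sum_{i=1}^{n} \ln \bigl\{ f(x_i) + C^i X_\varepsilon \bigr\} = -\frac{1}{n} \sum_{i=1}^{n} \ln \Bigl[ f(x_i) \Bigl\{ 1 + \frac{C^i X_\varepsilon}{f(x_i)} \Bigr\} \Bigr] = -\frac{1}{n} \sum_{i=1}^{n} \ln f(x_i) - \frac{1}{n} \sum_{i=1}^{n} \ln \Bigl\{ 1 + \frac{C^i X_\varepsilon}{f(x_i)} \Bigr\} \simeq -\frac{1}{n} \sum_{i=1}^{n} \ln f(x_i) - \frac{1}{n} \sum_{i=1}^{n} \frac{C^i}{f(x_i)} X_\varepsilon$. 55 / 74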
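  76. 76. From $-\frac{1}{n} \sum_{i=1}^{n} \ln Y^i_\varepsilon \simeq -\frac{1}{n} \sum_{i=1}^{n} \ln f(x_i) - \frac{1}{n} \sum_{i=1}^{n} \frac{C^i}{f(x_i)} X_\varepsilon$, define $\bar{Y}_\varepsilon = -\frac{1}{n} \sum_{i=1}^{n} \ln Y^i_\varepsilon$, $H(D) = -\frac{1}{n} \sum_{i=1}^{n} \ln f(x_i)$ and $\bar{C} = -\frac{1}{n} \sum_{i=1}^{n} \frac{C^i}{f(x_i)}$. Then for each $\varepsilon > 0$: $\bar{Y}_\varepsilon = H(D) + \bar{C} X_\varepsilon$ (13). 56 / 74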
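  77. 77. Regressing (13) over $\varepsilon \in E$: minimize $R_d = \frac{1}{m} \sum_{\varepsilon \in E} (\bar{Y}_\varepsilon - H(D) - \bar{C} X_\varepsilon)^2$. Direct Regression Entropy Estimator (DRE) [Hino+, 2015]. 57 / 74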
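A sketch of the direct regression (13), under assumptions that are not taken from the slides: the radii in $E$ are shared by all points and supplied by the caller as `eps_grid`, each count $k^i_\varepsilon$ excludes the point $x_i$ itself (so $n - 1$ replaces $n$ in $Y^i_\varepsilon$), and the function refuses to run when some ball is empty because $\ln Y^i_\varepsilon$ would be undefined. The name `dre_entropy` is hypothetical.

```python
import math
import numpy as np

def dre_entropy(data, eps_grid):
    """Intercept of the regression of Y_bar_eps on X_eps = eps^2 (eq. 13)."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]
    n, p = data.shape
    eps_grid = np.asarray(eps_grid, dtype=float)
    c_p = math.pi ** (p / 2) / math.gamma(p / 2 + 1)

    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # leave x_i out of its own ball
    # counts[i, j] = number of other points within eps_grid[j] of x_i
    counts = (dist[:, :, None] <= eps_grid).sum(axis=1)
    if np.any(counts == 0):
        raise ValueError("every radius must capture at least one neighbor of every point")

    Y = counts / ((n - 1) * c_p * eps_grid ** p)  # Y^i_eps, shape (n, m)
    Y_bar = -np.log(Y).mean(axis=0)               # Y_bar_eps, shape (m,)
    X = eps_grid ** 2                             # X_eps
    slope, intercept = np.polyfit(X, Y_bar, 1)    # Y_bar ~ H(D) + C_bar * X
    return intercept

rng = np.random.default_rng(1)
sample = rng.standard_normal(500)
# Assumed rule for E: start just above the largest nearest-neighbor distance.
gaps = np.abs(sample[:, None] - sample[None, :])
np.fill_diagonal(gaps, np.inf)
eps0 = gaps.min(axis=1).max()
print(dre_entropy(sample, np.linspace(1.01 * eps0, 2.0 * eps0, 10)))
```

Unlike the point-wise regression, this variant fits a single line for the whole data set, so the entropy estimate is read off directly as the intercept.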
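  78. 78. Recap: $q_z(\varepsilon) = c_p f(z) \varepsilon^p + \frac{p}{4(p/2 + 1)} c_p \varepsilon^{p+2}\,\mathrm{tr} \nabla^2 f(z) + O(\varepsilon^{p+4})$. Approximating $q_z(\varepsilon)$ by $k_\varepsilon / n$ and dividing by $c_p \varepsilon^p$: $\frac{k_\varepsilon}{n c_p \varepsilon^p} = f(z) + C \varepsilon^2 + O(\varepsilon^4)$, hence $Y_\varepsilon = f(z) + C X_\varepsilon$. 58 / 74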
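  79. 79. SRE: $\min \frac{1}{m} \sum_{\varepsilon \in E} (Y_\varepsilon - f(z) - C X_\varepsilon)^2$, and $\hat{H}_s(D) = -\frac{1}{n} \sum_{i=1}^{n} \ln \hat{f}_{s,i}(x_i)$. DRE: $\min \frac{1}{m} \sum_{\varepsilon \in E} (\bar{Y}_\varepsilon - H(D) - \bar{C} X_\varepsilon)^2$. 59 / 74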
  80. 80. k 60 / 74
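  81. 81. $q_z(\varepsilon) = c_p f(z) \varepsilon^p + \frac{p}{4(p/2 + 1)} c_p \varepsilon^{p+2}\,\mathrm{tr} \nabla^2 f(z) + O(\varepsilon^{p+4})$. Approximating $q_z(\varepsilon)$ by $k_\varepsilon / n$ and multiplying by $n$: $k_\varepsilon \simeq c_p n f(z) \varepsilon^p + c_p n \frac{p}{4(p/2 + 1)} \mathrm{tr} \nabla^2 f(z)\,\varepsilon^{p+2}$. 61 / 74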
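  82. 82. $k_\varepsilon \simeq c_p n f(z) \varepsilon^p + c_p n \frac{p}{4(p/2 + 1)} \mathrm{tr} \nabla^2 f(z)\,\varepsilon^{p+2}$. With $X = (\varepsilon^p, \varepsilon^{p+2})$ and $Y = k_\varepsilon$, this is the linear model $Y = \beta^{\top} X$, where the count $k_\varepsilon$ follows a Poisson distribution. 62 / 74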
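  83. 83. Maximum likelihood: $\max_\beta L(\beta) = \prod_{i=1}^{m} \frac{e^{-X_i^{\top} \beta} (X_i^{\top} \beta)^{Y_i}}{Y_i!}$. The coefficient of $\varepsilon^p$ is $\beta_1$; with its estimate $\hat{\beta}_1$, the density estimate at $z$ is $\hat{\beta}_1 / (c_p n)$. As with SRE, the entropy is then estimated by LOO: Entropy Estimator with Poisson-noise structure and Identity-link regression (EPI) [Hino+, under review]. 63 / 74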
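The Poisson likelihood with identity link has no closed-form maximizer, so it must be fitted iteratively. The sketch below uses Fisher scoring, which for the identity link reduces to repeated weighted least squares with weights $1 / \mu_i$, $\mu_i = X_i^{\top} \beta$. This solver, the clamping of $\mu_i$ to a small positive floor, and the names `poisson_identity_fit` and `epi_density` are assumptions of this example; the slide does not say how the likelihood is maximized.

```python
import math
import numpy as np

def poisson_identity_fit(X, y, n_iter=50, floor=1e-8):
    """Fisher scoring for y_i ~ Poisson(X_i^T beta) with identity link."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        mu = np.maximum(X @ beta, floor)  # assumed safeguard: keep the means positive
        sw = np.sqrt(1.0 / mu)            # square roots of the weights 1 / mu_i
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.allclose(beta_new, beta, rtol=1e-10, atol=1e-12)
        beta = beta_new
        if converged:
            break
    return beta

def epi_density(eps, counts, n, p):
    """Density estimate beta_1_hat / (c_p * n) from the counts k_eps."""
    eps = np.asarray(eps, dtype=float)
    X = np.column_stack([eps ** p, eps ** (p + 2)])
    beta = poisson_identity_fit(X, counts)
    c_p = math.pi ** (p / 2) / math.gamma(p / 2 + 1)
    return beta[0] / (c_p * n)

# Noise-free illustration for p = 1, n = 1000, f(z) = 0.4, f''(z) = -0.4:
# k_eps = 2 * 1000 * 0.4 * eps + 2 * 1000 * (1 / 6) * (-0.4) * eps^3.
eps = np.linspace(0.05, 0.5, 10)
counts = 800.0 * eps - (400.0 / 3.0) * eps ** 3
estimate = epi_density(eps, counts, n=1000, p=1)
print(estimate)
```

On these noise-free counts the fit is exact and the density $0.4$ is recovered; with real counts the weights $1 / \mu_i$ downweight the large radii, where the Poisson variance is larger.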
  84. 84. 1 2 3 64 / 74
  85. 85. Evaluation: true entropy $H(f)$, estimate $\hat{H}(D)$, absolute error $\mathrm{AE} = |H(f) - \hat{H}(D)|$; 100 trials. 65 / 74
  86. 86. Univariate Case: 15 distributions. [Figure: density plots (density against x, x from −3 to 3) of Normal, Skewed, Strongly Skewed, Kurtotic, Bimodal and Skewed Bimodal.] 66 / 74
  87. 87. Univariate Case: 15 distributions. [Figure: density plots (density against x, x from −3 to 3) of Trimodal, 10 Claw, Standard Power Exponential, Standard Logistic, Standard Classical Laplace and t(df=5).] 67 / 74
  88. 88. Univariate Case: 15 distributions. [Figure: density plots (density against x, x from −3 to 3) of Mixed t, Standard Exponential and Cauchy.] 68 / 74
  89. 89. [Figure: result plots shown with the density plots of Normal, Skewed, Strongly Skewed, Kurtotic and Bimodal.] 69 / 74
  90. 90. [Figure: result plots shown with the density plots of Skewed Bimodal, Trimodal, 10 Claw, Standard Power Exponential and Standard Logistic.] 69 / 74
  91. 91. [Figure: result plots shown with the density plots of Standard Classical Laplace, t(df=5), Mixed t, Standard Exponential and Cauchy.] 69 / 74
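  92. 92. Univariate Case Results: Curvature and Improvement. Curvature $\mathrm{tr} \nabla^2 f$ and the improvement over $k$-NN. Cauchy distribution with scale $\gamma > 0$: $f(x; \gamma) = \frac{1}{\pi \gamma (1 + (x/\gamma)^2)}$, $\nabla^2 f(x; \gamma) = \frac{2}{\pi \gamma^3} \frac{3 (x/\gamma)^2 - 1}{(1 + (x/\gamma)^2)^3}$. $\gamma$ from 0.01 to 0.9, $n = 300$, 100 trials; $k$, EPI; improvement: $|\hat{H}_k(D) - H(f)| - |\hat{H}_s(D) - H(f)|$. 70 / 74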
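The second derivative of the Cauchy density given above, and the quantity $\max_x \log |\nabla^2 f(x; \gamma)|$ used as the curvature measure, can be evaluated as in the sketch below. The grid search, the use of the natural logarithm and the function names are assumptions of this example; the slides do not state the base of the logarithm.

```python
import numpy as np

def cauchy_pdf(x, gamma):
    """Cauchy density with scale gamma."""
    return 1.0 / (np.pi * gamma * (1.0 + (x / gamma) ** 2))

def cauchy_second_derivative(x, gamma):
    """Second derivative of the Cauchy density with respect to x."""
    u2 = (x / gamma) ** 2
    return 2.0 / (np.pi * gamma ** 3) * (3.0 * u2 - 1.0) / (1.0 + u2) ** 3

def log_max_curvature(gamma, half_width=50.0, num=200001):
    """max over a grid of log |f''(x; gamma)| (natural log, an assumption)."""
    x = np.linspace(-half_width * gamma, half_width * gamma, num)
    return np.log(np.max(np.abs(cauchy_second_derivative(x, gamma))))

for gamma in (0.01, 0.1, 0.9):
    print(gamma, log_max_curvature(gamma))
```

The maximum of $|\nabla^2 f|$ is attained at $x = 0$, where it equals $2 / (\pi \gamma^3)$, so the curvature measure grows as the scale $\gamma$ shrinks.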
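  93. 93. Univariate Case Results: Curvature and Improvement. Curvature measure: $\max_{x \in \mathbb{R}} \log |\nabla^2 f(x; \gamma)|$. [Figure: Improvement (vertical axis, ticks −0.2, 0.0, 0.2) against LogMaxCurvature (horizontal axis, ticks 0.0, 2.5, 5.0, 7.5).] 71 / 74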
  94. 94. That's all, folks. Pros. KDE, $k$-NN. Cons. 72 / 74
  95. 95. References I
      [Faivishevsky&Goldberger, 2010] Faivishevsky, L. and Goldberger, J. (2010). A Nonparametric Information Theoretic Clustering Algorithm. ICML 2010.
      [Hino+, 2015] Hino, H., Koshijima, K., and Murata, N. (2015). Non-parametric entropy estimators based on simple linear regression. Computational Statistics & Data Analysis, 89:72–84.
      [Hino&Murata, 2010] Hino, H. and Murata, N. (2010). A conditional entropy minimization criterion for dimensionality reduction and multiple kernel learning. Neural Computation, 22(11):2887–2923.
      [Hyvärinen&Oja, 2000] Hyvärinen, A. and Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Networks, 13(4-5):411–430.
      [Koshijima+, 2015] Koshijima, K., Hino, H., and Murata, N. (2015). Change-point detection in a sequence of bags-of-data. IEEE Transactions on Knowledge and Data Engineering, 27(10):2632–2644.
      73 / 74
  96. 96. References II
      [Murata+, 2013] Murata, N., Koshijima, K., and Hino, H. (2013). Distance-based change-point detection with entropy estimation. In Proceedings of the Sixth Workshop on Information Theoretic Methods in Science and Engineering, pages 22–25.
      74 / 74
