The Emergence of Hubs in High-Dimensional Spaces (11th STAIR Lab AI Seminar)


  1. 1. 2017/07/21@STAIR Lab AI seminar Improving Nearest Neighbor Methods from the Perspective of Hubness Phenomenon Yutaro Shigeto STAIR Lab, Chiba Institute of Technology
  2. 2. A complete reference list is available at https://yutaro-s.github.io/download/ref-20170721.html
  3. 3. !3 Nearest neighbor methods are •a fundamental technique •used in various fields: NLP, CV, ML, DM
  4. 4. !4 Nearest neighbor methods are •a fundamental technique •used in various fields: NLP, CV, ML, DM
  5. 5. !5 Nearest neighbor methods are •a fundamental technique •used in various fields: NLP, CV, ML, DM
  6. 6. •a fundamental technique •used in various fields: NLP, CV, ML, DM Nearest neighbor methods are !6 cat dog gorilla
  7. 7. •a fundamental technique •used in various fields: NLP, CV, ML, DM Nearest neighbor methods are !7 cat dog gorilla
  8. 8. •a fundamental technique •used in various fields: NLP, CV, ML, DM Nearest neighbor methods are !8 cat dog gorilla
  9. 9. •a fundamental technique •used in various fields: NLP, CV, ML, DM Nearest neighbor methods are !9 cat cat dog gorilla
  10. 10. Hubness Phenomenon
  11. 11. !11 The nearest neighbors of many queries are the same objects (“hubs”) Hubness Phenomenon cat [Radovanović+, 2010]
  12. 12. !12 The nearest neighbors of many queries are the same objects (“hubs”) Hubness Phenomenon [Radovanović+, 2010] cat
  13. 13. !13 The nearest neighbors of many queries are the same objects (“hubs”) Hubness Phenomenon [Radovanović+, 2010] cat
  14. 14. !14 The nearest neighbors of many queries are the same objects (“hubs”) Hubness Phenomenon hub [Radovanović+, 2010] cat
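The degree of hubness is typically measured by how skewed the k-occurrence counts N_k are (later slides report "N1 skewness" and "skewness (degree of hubness)"). Below is a minimal NumPy/SciPy sketch of that measurement; the dataset, the choice of k, and the function name are illustrative, not from the slides:

```python
import numpy as np
from scipy.stats import skew

def k_occurrence_skewness(X, k=10):
    """Skewness of the k-occurrence counts N_k over a dataset X (n x d):
    N_k(i) = number of points that have point i among their k nearest
    neighbors.  Large positive skewness means a few points ("hubs")
    appear in many k-NN lists."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean
    np.fill_diagonal(dist, np.inf)                     # exclude self
    knn = np.argsort(dist, axis=1)[:, :k]              # k NN indices per row
    counts = np.bincount(knn.ravel(), minlength=n)     # N_k for each point
    return skew(counts)

# Toy illustration: hubness tends to grow with dimensionality
rng = np.random.default_rng(0)
for d in (3, 50, 500):
    X = rng.standard_normal((2000, d))
    print(d, round(k_occurrence_skewness(X, k=10), 2))
```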
  15. 15. Why do hubs emerge? [Radovanović+, 2010] !15 X: normal distribution (zero mean). Fixed objects y1, y2 with ||y1|| < ||y2||. Then it can be shown that E_X[||x − y2||] − E_X[||x − y1||] > 0, i.e., y1 is more likely to be closer to a random query and hence more likely to be a hub.
  16. 16. Why do hubs emerge? [Radovanović+, 2010] !16 X: normal distribution (zero mean). Fixed objects y1, y2 with ||y1|| < ||y2||. Then it can be shown that E_X[||x − y2||] − E_X[||x − y1||] > 0, i.e., y1 is more likely to be closer to a random query and hence more likely to be a hub. Because this holds for any pair y1 and y2, objects closest to the origin tend to be hubs. This bias is called "spatial centrality."
  17. 17. Variants !17 •Squared Euclidean distance [Shigeto+, 2015]: E_X[||x − y2||^2] − E_X[||x − y1||^2] > 0 •Inner product [Suzuki+, 2013]: (1/|D|) Σ_{x∈D} <x, y2> − (1/|D|) Σ_{x∈D} <x, y1> < 0
  18. 18. !18 Problem: The emergence of hubs degrades the performance of nearest neighbor methods. Research objective: Improve the performance of nearest neighbor methods by reducing the emergence of hubs.
  19. 19. Normalization of Distances
  20. 20. !20 [Suzuki+, 2013] Centering: Reducing spatial centrality. Spatial centrality implies that the object most similar to the centroid tends to become a hub. After centering, all similarities to the centroid are identical, namely zero.
  21. 21. !21 Centering: Reducing spatial centrality. Spatial centrality implies that the object most similar to the centroid tends to become a hub. After centering, all similarities to the centroid are identical, namely zero: the centroid coincides with the origin. [Suzuki+, 2013]
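A minimal sketch of the centering idea for inner-product similarity, assuming row-vector data in NumPy; the toy data are illustrative, not from the slides. It checks the claim above that, after subtracting the centroid, every object's similarity to the centroid is identically zero:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10)) + 3.0          # off-center toy data
c = X.mean(axis=0)                                # centroid

print("before centering:", np.round(X[:3] @ c, 2))             # similarities vary
Xc = X - c                                        # centering [Suzuki+, 2013]
print("after  centering:", np.round(Xc[:3] @ Xc.mean(axis=0), 2))  # all ~0
```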
  22. 22. Mutual proximity: Breaking asymmetric relations !22 [Schnitzer+, 2012] Although a hub becomes the nearest neighbor of many objects, those objects cannot become the nearest neighbor of the hub. Mutual proximity makes neighbor relations symmetric.
  23. 23. Mutual proximity: Breaking asymmetric relations !23 Although a hub becomes the nearest neighbor of many objects, those objects cannot become the nearest neighbor of the hub. Mutual proximity makes neighbor relations symmetric. [Schnitzer+, 2012]
  24. 24. Mutual proximity: Breaking asymmetric relations !24 Although a hub becomes the nearest neighbor of many objects, those objects cannot become the nearest neighbor of the hub. Mutual proximity makes neighbor relations symmetric. [Schnitzer+, 2012]
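A sketch of mutual proximity in its simple empirical form (the fraction of other objects that are farther from both i and j than i and j are from each other); Schnitzer+ (2012) also describe model-based variants, so treat this as an illustration rather than the exact method:

```python
import numpy as np

def mutual_proximity_empirical(D):
    """Empirical mutual proximity for a symmetric distance matrix D (n x n):
    MP(i, j) = fraction of other objects k with D[i, k] > D[i, j] and
    D[j, k] > D[j, i].  Returned as the dissimilarity 1 - MP, so mutually
    close pairs get small values and neighbor relations become symmetric."""
    n = D.shape[0]
    out = np.zeros_like(D, dtype=float)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            farther = (D[i] > D[i, j]) & (D[j] > D[j, i])
            out[i, j] = 1.0 - farther.sum() / (n - 2)   # exclude i and j
    return out
```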
  25. 25. Zero-Shot Learning [Shigeto+, 2015]
  26. 26. Zero-shot learning Active research topic in NLP, CV, ML Many applications: •Image labeling •Bilingual lexicon extraction + Many other cross-domain matching tasks !26 [Larochelle+, 2008]
  27. 27. !27 ZSL is a type of multi-class classification, but the classifier has to predict labels that do not appear in the training set (standard classification task vs. ZSL task).
  28. 28. !28 Pre-processing: Label embedding. Labels are embedded in a metric space; objects and labels are both vectors. (example space: chimpanzee / label space: lion, tiger)
  29. 29. Training: find a projection function. Find a matrix M that projects examples into the label space !29 (example space: chimpanzee → M → label space: lion, tiger)
  30. 30. Prediction: Nearest neighbor search. Given a test object and test labels, to predict the label of the test object, !30 (example space: chimpanzee / label space: lion, tiger, leopard, gorilla)
  31. 31. Prediction: Nearest neighbor search. Given a test object and test labels, to predict the label of the test object, 1. project the example into the label space using matrix M, 2. find the nearest label !31 (example space: chimpanzee → M → label space: lion, tiger, leopard, gorilla)
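A minimal sketch of the "current" regression-based ZSL pipeline described above, assuming ridge regression as the projection learner; the function names, matrix shapes, and the regularization constant lam are assumptions for illustration:

```python
import numpy as np

def train_projection(X, Y, lam=1.0):
    """Ridge regression for the 'current' direction: find M that maps
    examples X (n x d) to label vectors Y (n x m), i.e.
    min_M ||X M - Y||_F^2 + lam ||M||_F^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)   # d x m

def predict_label(M, x_test, label_vecs):
    """Project the test example into the label space and return the index
    of the nearest candidate label embedding (Euclidean distance)."""
    z = x_test @ M
    return int(np.argmin(np.linalg.norm(label_vecs - z, axis=1)))
```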
  32. 32. Hubness: Problem in ZSL !32 Classifier frequently predicts the same labels ("hubs") (example space / label space: sheep, zebra, hippo, rat) [Dinu and Baroni, 2015; see also Radovanović+, 2010]
  33. 33. Hubness: Problem in ZSL !33 Classifier frequently predicts the same labels ("hubs") (example space / label space: sheep, zebra, hippo, rat) [Dinu and Baroni, 2015; see also Radovanović+, 2010]
  34. 34. Hubness: Problem in ZSL !34 Classifier frequently predicts the same labels ("hubs") (example space / label space: sheep, zebra, hippo, rat) [Dinu and Baroni, 2015; see also Radovanović+, 2010]
  35. 35. Hubness: Problem in ZSL !35 Classifier frequently predicts the same labels ("hubs") (example space / label space: sheep, zebra, hippo, rat) [Dinu and Baroni, 2015; see also Radovanović+, 2010]
  36. 36. Hubness: Problem in ZSL !36 Classifier frequently predicts the same labels ("hubs") (example space / label space: sheep, zebra, hippo, rat) [Dinu and Baroni, 2015; see also Radovanović+, 2010]
  37. 37. !37 Problem with current regression approach: the learned classifier frequently predicts the same labels (emergence of "hub" labels). Research objective: investigate why hubs emerge in regression-based ZSL, and how to reduce the emergence of hubs.
  38. 38. Proposed approach
  39. 39. !39 Current approach: example space (chimpanzee) → M → label space (lion, tiger). Proposed approach: label space (lion, tiger) → M → example space (chimpanzee).
  40. 40. !40 Current approach: example space (chimpanzee) → M → label space (lion, tiger, leopard, gorilla). Proposed approach: label space (lion, tiger, leopard, gorilla) → M → example space (chimpanzee).
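For contrast, a sketch of the proposed reverse direction: the same ridge regression, but mapping label vectors into the example space and doing the nearest-neighbor search there (function names and shapes are again illustrative assumptions):

```python
import numpy as np

def train_reverse_projection(X, Y, lam=1.0):
    """Ridge regression for the proposed (reverse) direction: find W that
    maps label vectors Y (n x m) into the example space X (n x d), i.e.
    min_W ||Y W - X||_F^2 + lam ||W||_F^2."""
    m = Y.shape[1]
    return np.linalg.solve(Y.T @ Y + lam * np.eye(m), Y.T @ X)   # m x d

def predict_label_reverse(W, x_test, label_vecs):
    """Map candidate label embeddings into the example space and return the
    index of the one nearest to the (unprojected) test example."""
    projected = label_vecs @ W
    return int(np.argmin(np.linalg.norm(projected - x_test, axis=1)))
```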
  41. 41. Synthetic data result !41 Proposed approach reduces hubness and improves accuracy (bar charts of Hubness (N1 skewness) and Accuracy for Current vs. Proposed; values shown: 24.2, 13.8, 0.5, 87.6)
  42. 42. Why the proposed approach reduces hubness !42 The argument for our proposal relies on two concepts: shrinkage in regression, and spatial centrality of data distributions
  43. 43. !43 "Shrinkage" in ridge/least squares regression [See also Lazaridou+, 2015]: if we optimize the ridge (or least squares) regression objective, then the projected objects are shrunk toward the origin, i.e., their variance is smaller than that of the regression targets.
  44. 44. !44 "Shrinkage" in ridge/least squares regression [See also Lazaridou+, 2015]: if we optimize the ridge (or least squares) regression objective, then the projected objects are shrunk toward the origin. For simplicity, the projected objects are assumed to also follow a normal distribution.
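A quick numerical illustration of the shrinkage claim on synthetic data (the data and constants are not from the slides): the ridge-fitted projections have smaller variance than the regression targets:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 1000, 50, 20
X = rng.standard_normal((n, d))                          # examples
A = rng.standard_normal((d, m)) / np.sqrt(d)
Y = X @ A + 0.5 * rng.standard_normal((n, m))            # noisy targets

lam = 1.0
M = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)  # ridge map X -> Y
XM = X @ M                                               # projected objects

print("variance of targets    :", round(float(Y.var()), 3))
print("variance of projections:", round(float(XM.var()), 3))   # smaller: shrinkage
```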
  45. 45. Why the proposed approach reduces hubness !45 The argument for our proposal relies on two concepts: shrinkage in regression ✔, and spatial centrality of data distributions
  46. 46. "Spatial centrality" [See also Radovanović+, 2010] !46 X: query distribution (zero mean). Fixed objects y1, y2 with ||y1|| < ||y2||.
  47. 47. "Spatial centrality" [See also Radovanović+, 2010] !47 X: query distribution (zero mean). Fixed objects y1, y2 with ||y1|| < ||y2||. Then it can be shown that E_X[||x − y2||^2] − E_X[||x − y1||^2] > 0, i.e., y1 is more likely to be closer to a random query and hence more likely to be a hub.
  48. 48. "Spatial centrality" [See also Radovanović+, 2010] !48 X: query distribution (zero mean). Fixed objects y1, y2 with ||y1|| < ||y2||. Then it can be shown that E_X[||x − y2||^2] − E_X[||x − y1||^2] > 0, i.e., y1 is more likely to be closer to a random query and hence more likely to be a hub. Because this holds for any pair y1 and y2, objects closest to the origin tend to be hubs. This bias is called "spatial centrality."
  49. 49. Degree of spatial centrality !49 Further assume distributions for the queries x and the fixed objects y.
  50. 50. Degree of spatial centrality !50 Further assume distributions for the queries x and the fixed objects y. We then obtain an expression for the expected distance difference; this formula quantifies the degree of spatial centrality.
  51. 51. !51 Spatial centrality depends on the variance of the label distribution: the smaller the variance of the label distribution, the smaller the spatial centrality (= the bias causing hubness)
  52. 52. Why the proposed approach reduces hubness !52 The argument for our proposal relies on two concepts: shrinkage in regression ✔, and spatial centrality of data distributions ✔
  53. 53. !53 Current approach: map X into Y Proposed approach: map Y into X
  54. 54. !54 Current approach: map X into Y Proposed approach: map Y into X Shrinkage
  55. 55. !55 Current approach: map X into Y Proposed approach: map Y into X
  56. 56. !56 Current approach: map X into Y Proposed approach: map Y into X
  57. 57. Q. Which configuration is better for reducing hubs? !57 Proposed Current
  58. 58. Q. Which configuration is better for reducing hubs? !58 Proposed (reverse direction) vs. Current (forward direction). Spatial centrality: for a fixed query distribution, a data distribution with smaller variance is preferable to reduce hubs
  59. 59. Q. Which configuration is better for reducing hubs? !59 Proposed Current
  60. 60. Q. Which configuration is better for reducing hubs? !60 Proposed vs. Current. Since the query distribution is not fixed across the two configurations, directly comparing the label distributions is not meaningful
  61. 61. !61 Proposed (scaled) Current Q. Which configuration is better for reducing hubs? Scaling does not change the nearest neighbor relation
  62. 62. !62 Proposed (scaled) vs. Current. Q. Which configuration is better for reducing hubs? A. The reverse direction is preferable: for a fixed query distribution, the variance of the data distribution in the proposed configuration is smaller
  63. 63. Summary of our proposal !63 Spatial centrality: a label distribution with smaller variance is desirable to reduce hubness. Shrinkage: regression shrinks the variance of projected objects. Proposal: project labels (e.g., chimpanzee, gorilla) into the example space ➥ reduces the variance of labels, hence suppresses hubness
  64. 64. Experiments 64
  65. 65. Experimental objective !65 We evaluate the proposed approach on real tasks: •Does it suppress hubs? •Does it improve the prediction accuracy?
  66. 66. Zero-shot tasks !66 •Image labeling (image → label: gorilla, leopard) •Bilingual lexicon extraction (source language → target language: gorilla → gorille, leopard)
  67. 67. Compared methods !67 Current Proposed CCA We used Euclidean distance as a distance measure for finding the nearest label
  68. 68. Image labeling !68 Hubness (skewness, lower is better): current 2.00, reverse 0.08, CCA 2.61. Accuracy [%] (higher is better): current 9.2, reverse 41.3, CCA 22.6
  69. 69. Bilingual lexicon extraction: Ja–En !69 Hubness (skewness, lower is better), Ja→En / En→Ja: current 10.0 / 37.7, reverse 3.8 / 5.2, CCA 65.1 / 62.1. Accuracy [%] (higher is better), Ja→En / En→Ja: current 21.6 / 20.2, reverse 34.4 / 31.9, CCA 0.4 / 0.2
  70. 70. Summary !70 • Analyzed why hubs emerge in the current ZSL approach - the variance of the labels is greater than that of the examples • Proposed a simple method for reducing hubness - reverse the mapping direction • The proposed method reduced hubness and outperformed the current approach and CCA in image labeling and bilingual lexicon extraction tasks
  71. 71. k-Nearest Neighbor Classification [Shigeto+, 2016]
  72. 72. k-nearest neighbor classification !72 Given a dataset D = {(x_i, y_i)}_{i=1}^n, the label of x is decided by its k-nearest neighbors: ŷ = argmin_{y_i : (x_i, y_i) ∈ D} f(x, x_i)
  73. 73. k-nearest neighbor classification !73 Given a dataset D = {(x_i, y_i)}_{i=1}^n, the label of x is decided by its k-nearest neighbors: ŷ = argmin_{y_i : (x_i, y_i) ∈ D} f(x, x_i)
  74. 74. k-nearest neighbor classification !74 Given a dataset D = {(x_i, y_i)}_{i=1}^n, the label of x is decided by its k-nearest neighbors: ŷ = argmin_{y_i : (x_i, y_i) ∈ D} f(x, x_i)
  75. 75. !75 Given a dataset D = {(x_i, y_i)}_{i=1}^n, the label of x is decided by its k-nearest neighbors: ŷ = argmin_{y_i : (x_i, y_i) ∈ D} f(x, x_i). Distance metric learning learns a matrix L: f(x, x_i) = ||Lx − Lx_i||. Training is computationally expensive
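A minimal sketch of the decision rule above in its 1-NN form, with plain Euclidean distance or, when a matrix L has been learned by metric learning, the distance ||Lx − Lx_i||; the function name and row-vector convention are assumptions:

```python
import numpy as np

def nn_predict(x, X_train, y_train, L=None):
    """1-NN form of the rule above: y_hat = argmin over (x_i, y_i) in D of
    f(x, x_i), where f is the Euclidean distance ||x - x_i|| or, with a
    learned matrix L (metric learning), ||L x - L x_i||."""
    if L is not None:
        x, X_train = x @ L.T, X_train @ L.T   # apply the learned transform
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[int(np.argmin(dists))]
```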
  76. 76. !76 Proposal: Dissimilarity. Given a dataset D = {(x_i, y_i)}_{i=1}^n, the label of x is decided by its k-nearest neighbors: ŷ = argmin_{y_i : (x_i, y_i) ∈ D} f(x, x_i)
  77. 77. !77 Proposal: Dissimilarity. Given a dataset D = {(x_i, y_i)}_{i=1}^n, the label of x is decided by its k-nearest neighbors: ŷ = argmin_{y_i : (x_i, y_i) ∈ D} f(x, x_i). Spatial centrality: for a fixed query distribution, a data distribution with smaller variance is preferable to reduce hubs
  78. 78. !78 Proposal: Dissimilarity f(x, x_i) = ||x − W x_i||^2. Given a dataset D = {(x_i, y_i)}_{i=1}^n, the label of x is decided by its k-nearest neighbors: ŷ = argmin_{y_i : (x_i, y_i) ∈ D} f(x, x_i). The function f needs to be computed only between labeled objects and the unlabeled object ➡ labeled objects are always the targets of retrieval, and the unlabeled object is always the query
  79. 79. This method is not metric learning !79 •The goal of classification is to classify the query correctly - finding a suitable decision boundary (not a metric)
  80. 80. !80 Proposal: Training. Find a matrix W which minimizes the distance: min_W Σ_{i=1}^n Σ_{z ∈ T_i} ||x_i − W z||^2 + λ ||W||_F^2
  81. 81. !81 Proposal: Training. Find a matrix W which minimizes the distance: min_W Σ_{i=1}^n Σ_{z ∈ T_i} ||x_i − W z||^2 + λ ||W||_F^2
  82. 82. !82 Proposal: Training. Find a matrix W which minimizes the distance: min_W Σ_{i=1}^n Σ_{z ∈ T_i} ||x_i − W z||^2 + λ ||W||_F^2
  83. 83. !83 Proposal: Training. Find a matrix W which minimizes the distance: min_W Σ_{i=1}^n Σ_{z ∈ T_i} ||x_i − W z||^2 + λ ||W||_F^2. This objective has the closed-form solution: W = X J X^T (X X^T + λ I)^{-1}
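A sketch that solves the training objective above via the normal equations rather than the slide's closed form; since the target sets T_i are not defined in this transcript, the code assumes T_i contains the other training objects with the same label, which is an illustrative assumption, not the authors' stated construction:

```python
import numpy as np

def train_W(X, y, lam=1.0):
    """Solve  min_W  sum_i sum_{z in T_i} ||x_i - W z||^2 + lam ||W||_F^2
    via the normal equations:
        W = (sum_i sum_z x_i z^T) (sum_i sum_z z z^T + lam I)^{-1}.
    X: n x d training objects (rows), y: length-n label array.
    Assumption: T_i = other training objects with the same label as x_i."""
    y = np.asarray(y)
    n, d = X.shape
    A = np.zeros((d, d))          # accumulates sum_i sum_z  x_i z^T
    B = lam * np.eye(d)           # accumulates sum_i sum_z  z z^T  + lam I
    for i in range(n):
        T = X[(y == y[i]) & (np.arange(n) != i)]   # same-label targets z
        A += np.outer(X[i], T.sum(axis=0))
        B += T.T @ T
    return A @ np.linalg.inv(B)                    # d x d matrix W
```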
  84. 84. !84 Proposal: Test. Given a query object x, ŷ = argmin_{y_i : (x_i, y_i) ∈ D} ||x − W x_i||^2
  85. 85. !85 Proposal: Test. Given a query object x, ŷ = argmin_{y_i : (x_i, y_i) ∈ D} ||x − W x_i||^2
  86. 86. Move labeled objects vs. move query !86 •Move labeled objects (proposal): f(x, x_i) = ||x − W x_i||^2 — this reduces the variance = reduces the emergence of hubs •Move query: f(x, x_i) = ||M x − x_i||^2 — this increases the variance = promotes the emergence of hubs
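A sketch of the two prediction rules being contrasted above, assuming row vectors and pre-trained matrices W and M (names are illustrative); only the dissimilarities differ:

```python
import numpy as np

def predict_move_labeled(x, X_train, y_train, W):
    """Proposal ('move labeled objects'): f(x, x_i) = ||x - W x_i||^2.
    The stored objects are transformed; the query x is left untouched."""
    dists = np.linalg.norm(X_train @ W.T - x, axis=1)
    return y_train[int(np.argmin(dists))]

def predict_move_query(x, X_train, y_train, M):
    """Alternative ('move query'): f(x, x_i) = ||M x - x_i||^2.
    The slides note that this configuration promotes hubs."""
    dists = np.linalg.norm(X_train - x @ M.T, axis=1)
    return y_train[int(np.argmin(dists))]
```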
  87. 87. Experiments 87
  88. 88. Experimental objective !88 We evaluate the proposed method on various datasets Our main focuses are -Does it suppress hubs? -Does it improve the classification accuracy? -Is it faster than distance metric learning?
  89. 89. Results: Skewness (degree of hubness, smaller is better) !89 The proposed method - reduces the emergence of hubs
  Document datasets:
  method                    RCV     News    Reuters  TDT
  original metric           13.35   21.93    7.61    4.89
  LMNN                       3.86   14.74    7.63    4.01
  ITML                       4.27   19.65    7.30    2.39
  DML-eig                    1.71    1.45    3.05    1.34
  Move-labeled (proposed)    1.14    2.88    4.53    1.44
  Move-query                21.57   33.36   17.49    6.71
  Image datasets:
  method                    AwA     CUB     SUN      aPY
  original metric            2.49    2.38    2.52    2.80
  LMNN                       3.10    2.96    2.80    3.94
  ITML                       2.42    2.27    2.37    2.69
  DML-eig                    1.90    1.77    2.39    2.17
  Move-labeled (proposed)    1.24    0.97    1.02    1.23
  Move-query                 7.81    7.83    7.48   11.65
  90. 90. Results: Classification accuracy [%] !90 The proposed method - reduces the emergence of hubs - is better than metric learning methods on most datasets
  Document datasets:
  method                    RCV     News    Reuters  TDT
  original metric           92.1    76.9    89.5     96.1
  LMNN                      94.7    79.9    91.5     96.6
  ITML                      93.2    77.0    90.8     96.5
  DML-eig                   94.5    73.3    85.9     95.7
  Move-labeled (proposed)   94.4    81.6    91.6     96.7
  Move-query                89.1    70.0    85.9     95.4
  Image datasets:
  method                    AwA     CUB     SUN      aPY
  original metric           83.2    51.6    26.2     82.2
  LMNN                      83.0    54.7    24.4     81.8
  ITML                      83.1    51.3    26.0     82.4
  DML-eig                   82.0    53.5    22.4     81.6
  Move-labeled (proposed)   84.1    52.4    28.3     83.4
  Move-query                79.2    43.3    14.6     78.7
  91. 91. Results: Training time [s] !91 The proposed method - reduces the emergence of hubs - is better than metric learning methods on most datasets - is faster than … on all datasets
  Image datasets:
  method      AwA      CUB      SUN       aPY
  LMNN        1525.5   1098.2   15704.3   317.3
  ITML        1536.3   577.6    1126.4    9211.2
  DML-eig     2048.0   2084.7   2006.1    1787.1
  proposed    9.5      1.5      4.1       6.4
  92. 92. Results: UCI datasets !92 The proposed method - reduces the emergence of hubs - is better than metric learning methods on most datasets - is faster than … on all datasets - does not work well on UCI datasets
  Table 3: Classification accuracy [%] (bold figures indicate the best performers for each dataset). UCI datasets:
  method                    ionosphere  balance-scale  iris   wine   glass
  original metric           86.8        89.5           97.2   98.1   68.1
  LMNN                      90.3        90.0           96.7   98.1   67.7
  ITML                      87.7        89.5           97.8   99.1   65.0
  DML-eig                   87.7        91.2           96.7   98.6   66.5
  Move-labeled (proposed)   89.6        89.5           97.2   98.6   70.8
  Move-query                79.7        89.4           97.2   96.3   62.3
  93. 93. Summary !93 Prediction: ŷ = argmin_{y_i : (x_i, y_i) ∈ D} ||x − W x_i||^2. The proposed method - reduces the emergence of hubs - is better than metric learning methods on most datasets - is faster than … on all datasets - does not work well on UCI datasets
  94. 94. Misc.
  95. 95. Other topics • Normalization of distances - Local scaling [Schnitzer+, 2012], Laplacian-based kernel [Suzuki+, 2012], Localized centering [Hara+, 2015] • Classifiers - hw-kNN [Radovanović+, 2009], h-FNN [Tomašev+, 2013], NHBNN [Tomašev+, 2011] !95 See the comprehensive surveys [Tomašev+, 2015; Suzuki, 2014; Radovanović, 2017]
  96. 96. Tools • Hub miner: Hubness-aware machine learning • Hub toolbox • PyHubs • Our code !96
  97. 97. Conclusions • Introduced why hubs emerge -Spatial centrality • Showed hub reduction methods which improved the performance of nearest neighbor methods !97
