Motivation
Methods
Evaluation
Conclusion
Self-Tuning Spectral Clustering
(NIPS2004)
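Pattern Recognition and Machine Learning Study Group #9
Shunya Ueta (University of Tsukuba)
Mathematical Informatics Laboratory, first-year master's student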
December 24, 2015
1 Motivation
2 Methods
3 Evaluation
4 Conclusion
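Overview and Authors
Overview
• Spectral Clustering can separate clusters that are not linearly separable
• It clusters the data in the eigenvector space of a graph
representation matrix (the Graph Laplacian)
Problems
• Skilled parameter tuning is required
  • The σ used in the Gaussian kernel when the graph is constructed
  • $\exp\left(-\frac{\|x_i - x_j\|_2^2}{\sigma^2}\right)$
• The number of clusters is not determined automatically
Contributions
• A way to choose a better value of σ
• A way to adjust the number of clusters automatically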
Local Scaling
Estimate Cluster Number
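Semi-Supervised Learning
• Distance from data point s_i to s_j, scaled by s_i: $d(s_i, s_j) / \sigma_i$
• Distance from data point s_j to s_i, scaled by s_j: $d(s_j, s_i) / \sigma_j$
$$\frac{d(s_i, s_j)\, d(s_j, s_i)}{\sigma_i \sigma_j} = \frac{d(s_i, s_j)^2}{\sigma_i \sigma_j}$$
$$\hat{A}_{ij} = \exp\left(-\frac{d(s_i, s_j)^2}{\sigma_i \sigma_j}\right)$$
$\sigma_i = d(s_i, s_K)$, where s_K is the K-th nearest neighbor of s_i.
For example, K = 3 gives $\sigma_i = d(s_i, s_3)$, the distance from s_i to its
third-nearest data point.
In the authors' experiments, K = 7 worked best.
Good results were obtained on high-dimensional synthetic data and on image data.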
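The local scaling formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `local_scaling_affinity`, the toy data, and the choice to zero the diagonal are assumptions made for the example.

```python
import numpy as np

def local_scaling_affinity(points, k=7):
    """Return (A_hat, sigma) for an (n, d) array of points."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances d(s_i, s_j), shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Each sorted row starts with d(s_i, s_i) = 0, so column k holds the
    # distance to the k-th nearest neighbor. Clamp k for tiny data sets.
    k = min(k, len(points) - 1)
    sigma = np.sort(dist, axis=1)[:, k]
    # A_hat_ij = exp(-d(s_i, s_j)^2 / (sigma_i * sigma_j)).
    # Note: duplicate points can give sigma_i = 0; not handled in this sketch.
    affinity = np.exp(-dist ** 2 / np.outer(sigma, sigma))
    np.fill_diagonal(affinity, 0.0)  # assumption: no self-affinity
    return affinity, sigma

# Toy data (hypothetical): one tight cluster and one sparse cluster.
tight = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
loose = np.array([[5.0, 5.0], [7.0, 5.0], [5.0, 7.0], [7.0, 7.0]])
A_hat, sigma = local_scaling_affinity(np.vstack([tight, loose]), k=2)

# Neighbors inside each cluster get the same affinity, exp(-1),
# even though the two clusters differ in scale by a factor of 20.
print(A_hat[0, 1], A_hat[4, 5])
```

With a single global σ, no one value would give comparable within-cluster affinities for both clusters; scaling by σ_i σ_j removes that dependence on the local density.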
Illustration of Local Scaling
Can the number of clusters be estimated from the eigenvalue distribution?
Heuristic methods exist, but
a theoretical formulation is not possible.
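One widely used heuristic of this kind is the eigengap: sort the eigenvalues of the normalized affinity matrix and pick the position of the largest drop. The sketch below shows that idea only; it is not necessarily the heuristic the slide has in mind, and the function name `eigengap_cluster_count`, the `max_clusters` parameter, and the toy affinity matrix are assumptions for illustration.

```python
import numpy as np

def eigengap_cluster_count(affinity, max_clusters=10):
    """Guess the number of clusters from the largest eigenvalue gap."""
    affinity = np.asarray(affinity, dtype=float)
    # Symmetric normalization D^{-1/2} A D^{-1/2}.
    # Assumes every row has a positive degree.
    inv_sqrt_degree = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = affinity * np.outer(inv_sqrt_degree, inv_sqrt_degree)
    # eigvalsh returns ascending eigenvalues for a symmetric matrix.
    eigvals = np.sort(np.linalg.eigvalsh(normalized))[::-1]
    top = eigvals[:max_clusters + 1]
    gaps = top[:-1] - top[1:]
    # A largest gap after the c-th eigenvalue suggests c clusters.
    return int(np.argmax(gaps)) + 1

# Toy affinity (hypothetical): three strongly connected groups of four
# points each, with weak links (0.01) between groups.
affinity = np.full((12, 12), 0.01)
for start in (0, 4, 8):
    affinity[start:start + 4, start:start + 4] = 1.0
estimated = eigengap_cluster_count(affinity)
print(estimated)
```

On clean data like this the gap is unambiguous; on noisy data the eigenvalues decay gradually and the largest gap becomes unreliable, which is the weakness the slide points at.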
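Analyzing the Eigenvectors
Estimate the number of clusters from the distribution of the eigenvectors.
The graph Laplacian is a symmetric matrix, so its eigenvectors are
orthogonal.
First, consider how the eigenvectors behave for an ideal L.
Once the data points are sorted by cluster, L is a clean block-diagonal matrix.
$$L = \begin{bmatrix} L^{(1)} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & L^{(C)} \end{bmatrix}$$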
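$\hat{X} \in \mathbb{R}^{n \times C}$: the C eigenvectors placed side by side as columns.
$$\hat{X} = \begin{bmatrix} x^{(1)} & \vec{0} & \vec{0} \\ \vec{0} & \ddots & \vec{0} \\ \vec{0} & \vec{0} & x^{(C)} \end{bmatrix}$$
The eigenvectors actually computed, X, match $\hat{X}$ only up to a rotation $R \in \mathbb{R}^{C \times C}$:
$$X = \hat{X} R$$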
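The ideal case can be checked numerically. The sketch below builds a small block-diagonal L whose blocks each have top eigenvalue 1, forms the zero-padded eigenvector matrix X̂, and shows that any rotated version X = X̂R is an equally valid set of orthonormal eigenvectors. The block sizes, the all-ones blocks, and the angle `theta` are assumptions chosen for the illustration.

```python
import numpy as np

sizes = [2, 3]                      # hypothetical cluster sizes
n, C = sum(sizes), len(sizes)

L = np.zeros((n, n))
X_hat = np.zeros((n, C))
start = 0
for c, m in enumerate(sizes):
    # Normalized all-ones block: its top eigenvalue is 1, with the
    # constant unit vector as eigenvector.
    L[start:start + m, start:start + m] = 1.0 / m
    X_hat[start:start + m, c] = 1.0 / np.sqrt(m)
    start += m

# An arbitrary planar rotation R (C = 2 here).
theta = 0.6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = X_hat @ R

# Both X_hat and X satisfy L x = 1 * x column by column, and both have
# orthonormal columns, so an eigensolver may return either one.
print(np.allclose(L @ X_hat, X_hat), np.allclose(L @ X, X))
# X_hat has one nonzero entry per row; X has none that are zero.
print(np.count_nonzero(X_hat), np.count_nonzero(np.abs(X) > 1e-12))
```

Because the top eigenvalue is repeated C times, the solver is free to return any basis of that eigenspace, which is why the block structure of X̂ has to be recovered by searching over rotations.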
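$Z \in \mathbb{R}^{n \times C}$, $Z = XR$
$M_i = \max_j Z_{ij}$
$$J = \sum_{i=1}^{n} \sum_{j=1}^{C} \frac{Z_{ij}^2}{M_i^2}$$
The C at which this objective function is minimized is the optimal number of clusters.
Intuition: search for the Z with the fewest nonzero entries, ideally one per row as in $\hat{X}$.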
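A small sketch of the objective J, under stated assumptions: it takes $M_i^2$ as $\max_j Z_{ij}^2$ so that the sign of the entries does not matter, and it replaces the rotation optimization with a plain grid search over one angle, which only works for C = 2. The names `alignment_cost` and `best_rotation_2d` and the toy eigenvectors are hypothetical.

```python
import numpy as np

def alignment_cost(Z):
    """J = sum_ij Z_ij^2 / M_i^2, with M_i^2 = max_j Z_ij^2."""
    Z2 = np.asarray(Z, dtype=float) ** 2
    # Assumes no row of Z is entirely zero.
    return float((Z2 / Z2.max(axis=1, keepdims=True)).sum())

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def best_rotation_2d(X, num_angles=3600):
    """Grid search over planar rotations; valid only for C = 2."""
    best_cost, best_theta = np.inf, 0.0
    # J repeats every pi/2 because a quarter turn only swaps the columns.
    for theta in np.linspace(0.0, np.pi / 2, num_angles, endpoint=False):
        cost = alignment_cost(X @ rotation(theta))
        if cost < best_cost:
            best_cost, best_theta = cost, theta
    return best_cost, best_theta

# Toy eigenvectors (hypothetical): ideal X_hat for clusters of size 2
# and 3, hidden behind an unknown rotation.
X_hat = np.zeros((5, 2))
X_hat[:2, 0] = 1.0 / np.sqrt(2)
X_hat[2:, 1] = 1.0 / np.sqrt(3)
X = X_hat @ rotation(0.6)

print(alignment_cost(X_hat))        # lower bound: one nonzero per row
print(alignment_cost(X))            # larger, rows are spread out
best_cost, best_theta = best_rotation_2d(X)
print(best_cost, best_theta)
```

J can never fall below n, and it reaches n exactly when every row of Z has a single nonzero entry, so a cost close to n signals that the assumed C matches the block structure of the data.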
Cluster Number Estimation Results
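Conclusion and Contributions
Contributions
1 Parameter tuning for Spectral Clustering takes patience
and considerable skill
2 Proposed a new way to tune the graph parameter
(Local Scaling)
3 Proposed a way to estimate the number of clusters (by analyzing the eigenvectors)
• Search for the rotation of the eigenvector set that has the fewest nonzero entries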