2. References
Cluster Forests
Donghui Yan, Department of Statistics, University of California, Berkeley
Aiyou Chen (Google)
Michael I. Jordan (U.C. Berkeley)
3. Overview
• Clustering aims to partition a set of data such that points are “similar” within the same cluster and “dissimilar” across clusters.
  ◦ One of the fundamental tasks in machine learning and pattern classification
  ◦ Applicable in a wide range of scientific and business domains
4. Challenges
• Modern data poses additional challenges
  ◦ High dimensionality
  ◦ Huge number of observations
  ◦ Increasing complexity
5. Motivation
• Use ensembles to achieve the best performance
• Can we develop a clustering analog of Random Forests (RF)?
• Unifying view of clustering and classification
6. General Approach
Cluster ensemble methods generally consist of two stages
  ◦ Generation of clustering instances
  ◦ Aggregation of multiple clustering instances
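The two stages can be illustrated with a minimal sketch. The slide does not say how instances are generated or aggregated, so everything here is an assumption made for illustration: instances come from plain k-means run on random feature subsets, and aggregation is a co-association matrix (the fraction of instances in which two points share a cluster). The function names `kmeans_labels`, `generate_instances`, and `co_association` are hypothetical.

```python
import random


def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random initial centers (assumption)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def generate_instances(data, k, n_instances, n_features, seed=0):
    """Stage 1 (assumed scheme): cluster on random feature subsets."""
    rng = random.Random(seed)
    dim = len(data[0])
    instances = []
    for _ in range(n_instances):
        feats = rng.sample(range(dim), n_features)
        projected = [[row[j] for j in feats] for row in data]
        instances.append(kmeans_labels(projected, k, rng))
    return instances


def co_association(instances):
    """Stage 2 (assumed scheme): fraction of instances that co-cluster i and j."""
    n = len(instances[0])
    m = len(instances)
    return [
        [sum(lab[i] == lab[j] for lab in instances) / m for j in range(n)]
        for i in range(n)
    ]
```

A final partition would then be obtained by clustering the aggregated matrix; that step is left out here because the slide does not specify it.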
9. Performance Metrics
• Proportion of pairs of points with “correct” co-cluster membership
  Pr = (# correctly clustered pairs / total # pairs) × 100%
• Clustering accuracy
  Pc = (# points with “correct” cluster membership / total # points) × 100%
• Assume availability of “true” labels for the datasets
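Both metrics can be computed as in the sketch below, under two assumptions that the slide does not state. For Pr, a pair counts as "correct" when the clustering and the true labels agree on whether the two points belong together. For Pc, cluster IDs are matched to true labels by the one-to-one mapping that maximizes the number of hits, found here by brute force over permutations (practical only for a small number of clusters). The function names are hypothetical.

```python
from itertools import combinations, permutations


def pair_agreement(true_labels, pred_labels):
    """Pr: percentage of point pairs whose co-membership matches the truth."""
    pairs = list(combinations(range(len(true_labels)), 2))
    correct = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return 100.0 * correct / len(pairs)


def clustering_accuracy(true_labels, pred_labels):
    """Pc: percentage of points correct under the best cluster-to-label mapping."""
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    # Pad with None so surplus predicted clusters can stay unmatched.
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best = max(best, hits)
    return 100.0 * best / len(true_labels)


true = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 0, 0, 0, 0]
print(round(pair_agreement(true, pred), 1))
print(round(clustering_accuracy(true, pred), 1))
```

For this toy input, 10 of the 15 pairs agree and 5 of the 6 points are matched, so the script prints 66.7 and 83.3.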