The document describes research on defining set-valued prototypes through consensus analysis. It begins with a definition of prototypes and an overview of consensus analysis, including how consensus between partitions is measured. It then discusses partitioning methods such as k-means and fuzzy c-means, and how they can be used to generate multiple partitions. Finally, it outlines how simulated and real data examples are used to test and apply the consensus analysis approach to derive set-valued prototypes.
Set-valued prototypes through Consensus Analysis
Set-valued prototypes through Consensus Analysis

M. Fordellone (1), F. Palumbo (2)
(1) Department of Statistical Sciences, University of Padua (Italy), email: fordellone@stat.unipd.it
(2) Department of Political Sciences, University of Naples (Italy), email: fpalumbo@unina.it

IFCS Conference, July 6th 2015, Bologna (Italy)
Outline
1 Prototypes definition
  What is a prototype?
2 Consensus Analysis
  Consensus clustering
  Consensus measurement
3 Partitioning methods
  k-Means
  Fuzzy criterion
  Fuzzy c-Means (FCM) and Archetypal Analysis (AA)
4 Simulated data examples
  Eight experimental contexts
5 Application on real data
  I.P.I.P. test
What is a prototype?
According to Rosch (1975, 1999), prototypes are the elements that represent a category better than others do.

Smith and Medin (1981) refer to the concept of category as the highest order of genera that cannot be defined by a mere listing of properties shared by all elements.

A prototype is not necessarily a real element of the category: it can be an observed or an unobserved (abstract) entity (Medin and Schaffer, 1978).
Consensus concept
Finding and measuring the agreement between two or more partitions of the same data set is of substantial interest in cluster analysis. This particular case of consensus analysis is also known as consensus clustering.
Comparing partitions
Let X be an N×J data matrix, and let T and V be two partitions of X. Then nrc (r = 1, ..., R; c = 1, ..., C) denotes the number of objects assigned to classes tr and vc under the two partitioning criteria. Consensus between the partitions T and V is evaluated starting from the entries of the cross-classifying contingency table.
Table: Contingency table

                      Partition V
              v1     v2     ···    vC   | Total
Partition T
    t1        n11    n12    ···    n1C  | n1·
    t2        n21    n22    ···    n2C  | n2·
    ⋮         ⋮      ⋮             ⋮    | ⋮
    tR        nR1    nR2    ···    nRC  | nR·
    Total     n·1    n·2    ···    n·C  | n
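As an illustration (our own sketch, not part of the original slides), the cross-classifying table can be built from two label vectors in a few lines of Python; the function name and the numpy-based approach are our choices.

```python
import numpy as np

def contingency_table(t_labels, v_labels):
    """Cross-classify two partitions T and V of the same n objects.

    t_labels, v_labels: integer class labels in 0..R-1 and 0..C-1.
    Entry (r, c) of the result is n_rc, the number of objects assigned
    to class t_r of T and class v_c of V.
    """
    t = np.asarray(t_labels)
    v = np.asarray(v_labels)
    table = np.zeros((t.max() + 1, v.max() + 1), dtype=int)
    np.add.at(table, (t, v), 1)  # add 1 to cell (t_i, v_i) for each object i
    return table

# Example: six objects, T has three classes, V has two
print(contingency_table([0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1]))
```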
Measure of Consensus
Number of ways that n units can pair:

$$S = \binom{n}{2} = \frac{n(n-1)}{2}$$

Total number of agreements:

$$A = \binom{n}{2} + \sum_{r=1}^{R}\sum_{c=1}^{C} n_{rc}^{2} - \frac{1}{2}\left(\sum_{r=1}^{R} n_{r\cdot}^{2} + \sum_{c=1}^{C} n_{\cdot c}^{2}\right)$$

Total number of disagreements:

$$D = \frac{1}{2}\left(\sum_{r=1}^{R} n_{r\cdot}^{2} + \sum_{c=1}^{C} n_{\cdot c}^{2}\right) - \sum_{r=1}^{R}\sum_{c=1}^{C} n_{rc}^{2}$$
Table: Measures of Consensus

Authors                 Measure        Range
Rand (1971)             A/S            [0, 1]
Arabie et al. (1973)    D/S            [0, 1]
Hubert (1977)           (A − D)/S      [−1, 1]
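The quantities above translate directly into code. The following sketch (ours, under the same notation, not from the slides) computes S, A, D, and the three indices from a contingency table such as the one returned by the contingency_table helper above.

```python
from math import comb

import numpy as np

def consensus_measures(table):
    """S, A, D and the three consensus indices for a contingency table."""
    table = np.asarray(table)
    n = int(table.sum())
    S = comb(n, 2)                                  # number of pairs of units
    pair_sq = int((table ** 2).sum())               # sum over r, c of n_rc^2
    marg_sq = int((table.sum(axis=1) ** 2).sum()    # sum of n_r.^2 ...
                  + (table.sum(axis=0) ** 2).sum()) # ... plus sum of n_.c^2
    A = S + pair_sq - marg_sq / 2                   # total agreements
    D = marg_sq / 2 - pair_sq                       # total disagreements
    return {"Rand (A/S)": A / S,
            "Arabie (D/S)": D / S,
            "Hubert ((A-D)/S)": (A - D) / S}

# Identical partitions give Rand = 1 and Hubert = 1
print(consensus_measures([[2, 0], [0, 2]]))
```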
k-Means method
The k-Means method was developed by MacQueen (1967), who suggested the name k-Means to describe an algorithm that assigns each unit to the group having the nearest centroid (mean). The iterative procedure consists of four principal steps:
1 Randomly select K group centers;
2 Calculate the distance between each data point and the group centers;
3 Assign each data point to the group whose center is nearest among all the group centers;
4 Recalculate the new group centers.
The procedure repeats from step 2 until no more assignments take place.
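A minimal sketch of these four steps, assuming Euclidean distance (the initialization from K distinct units is one common choice, and empty groups are not handled here):

```python
import numpy as np

def k_means(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: randomly select K group centers (here: K distinct units).
    centers = X[rng.choice(len(X), size=K, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 2: distances between each data point and the group centers.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Step 3: assign each point to the group with the nearest center.
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no more assignments take place
        labels = new_labels
        # Step 4: recalculate the group centers as the group means.
        centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    return labels, centers
```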
Fuzzy clustering
In fuzzy clustering, data elements can belong to more than one group, according to a measure of association given by a set of membership levels.
The membership levels, taking values in [0, 1], indicate the strength of the association between each data element and each group.
In our case, the units with the maximum membership degree can be univocally assigned to the corresponding group.
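A minimal sketch of this univocal assignment, using an illustrative membership matrix:

```python
import numpy as np

# Illustrative membership matrix: 4 units, 3 groups; each entry lies
# in [0, 1] and each row sums to 1.
gamma = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.5, 0.4, 0.1]])

# Univocal assignment: each unit goes to its maximum-membership group.
hard_labels = gamma.argmax(axis=1)
print(hard_labels)  # [0 1 2 0]
```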
FCM and AA
Fuzzy c-Means (Bezdek et al., 1984) and Archetypal Analysis (Cutler and Breiman, 1994) can both be seen as fuzzy variants of k-Means, under different constraints.
Fuzzy c-Means minimizes the sum of distances between each point and a set of K centers; Archetypal Analysis minimizes the sum of distances between each point and a set of K archetypes, defined as convex combinations of extreme points.
Fuzzy c-Means
$$W = \sum_{i=1}^{n} \sum_{k=1}^{K} \gamma_{ik}^{2}\,\lVert \mathbf{x}_i - \mathbf{c}_k \rVert^{2}$$
where $\gamma_{ik}$ is the membership level of the $i$-th unit in the $k$-th group, and $\mathbf{c}_k$ is the center of the $k$-th group.
Constraints: $\sum_{k=1}^{K} \gamma_{ik} = 1$ and $\gamma_{ik} \geq 0$, for all $i = 1, \dots, n$ and $k = 1, \dots, K$.
Archetypal Analysis
$$J = \sum_{i=1}^{n} \Big\lVert \mathbf{x}_i - \sum_{k=1}^{K} \delta_{ik}\,\mathbf{a}_k \Big\rVert^{2}$$
where $\delta_{ik}$ is the membership level of the $i$-th unit in the $k$-th group, and $\mathbf{a}_k = \sum_{i=1}^{n} \beta_{ik}\,\mathbf{x}_i$ is the archetype of the $k$-th group.
Constraints: $\sum_{k=1}^{K} \delta_{ik} = 1$, $\delta_{ik} \geq 0$ and $\sum_{i=1}^{n} \beta_{ik} = 1$, $\beta_{ik} \geq 0$ (the convexity constraint on $\beta$ runs over the units, so each archetype is a convex combination of data points).
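A minimal sketch of how $W$ can be minimized by alternating updates; the closed-form updates below correspond to the squared memberships $\gamma_{ik}^2$ in the objective (fuzzifier fixed at 2), while the random initialization and fixed iteration count are illustrative assumptions:

```python
import numpy as np

def fuzzy_c_means(X, K, n_iter=100, eps=1e-9, seed=0):
    """Alternating minimization of W with squared memberships (m = 2)."""
    rng = np.random.default_rng(seed)
    # Random memberships satisfying the constraints:
    # gamma >= 0 and each row summing to 1.
    gamma = rng.random((len(X), K))
    gamma /= gamma.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers c_k: means weighted by gamma_ik^2.
        w = gamma ** 2
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        # Squared distances ||x_i - c_k||^2 (eps avoids division by zero).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
        # Membership update for m = 2: gamma_ik proportional to 1 / d2_ik.
        gamma = 1.0 / d2
        gamma /= gamma.sum(axis=1, keepdims=True)
    W = (gamma ** 2 * d2).sum()  # current value of the objective
    return gamma, centers, W
```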
Simulated data
Three groups of units have been generated in different experimental contexts from an eight-dimensional multivariate Gaussian distribution (four of the variables are white noise).
Table: Experimental contexts

        Size   Correlation   Kurtosis
Case 1  900    0.2–0.4       β = 3
Case 2  300    0.2–0.4       β = 3
Case 3  900    0.2–0.4       β < 3
Case 4  300    0.2–0.4       β < 3
Case 5  900    0.6–0.8       β = 3
Case 6  300    0.6–0.8       β = 3
Case 7  900    0.6–0.8       β < 3
Case 8  300    0.6–0.8       β < 3
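A minimal sketch of how one such context could be generated (a Case 1-style setting: 900 units in three balanced groups, eight dimensions of which four are white noise). The group means and the exact correlation value are illustrative assumptions, and the reduced-kurtosis manipulation of the β < 3 cases is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group, p_signal, p_noise = 300, 4, 4  # 900 units in total

# Within-group correlation of 0.3 (inside the 0.2-0.4 range) on the
# four informative variables; the group means are illustrative.
rho = 0.3
cov = np.full((p_signal, p_signal), rho) + (1 - rho) * np.eye(p_signal)
means = np.array([[0.0] * p_signal, [4.0] * p_signal, [8.0] * p_signal])

groups = []
for mu in means:
    signal = rng.multivariate_normal(mu, cov, size=n_per_group)
    noise = rng.standard_normal((n_per_group, p_noise))  # white noise
    groups.append(np.hstack([signal, noise]))

X = np.vstack(groups)                  # (900, 8) data matrix
y = np.repeat([0, 1, 2], n_per_group)  # true group labels
```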
[Figures: for each of Cases 1–8, the k-Means groups, the FCM and AA memberships, the Consensus Analysis between FCM and AA, and the consensus groups of FCM and AA.]
I.P.I.P. test
About data
Web Site: http://personality-testing.info/rawdata/
Four different scales were used as part of an experimental DISC personality test. The scales are from the International Personality Item Pool (http://ipip.ori.org/newCPIKey.htm).
The scales used are:
Assertiveness: the quality of being self-assured and confident without being aggressive;
Social confidence: generally described as a state of being certain;
Adventurousness: represented by activities with some potential for physical danger;
Dominance: conceptualized as a measure of individual differences in levels of group-based discrimination.
The dataset consists of 40 items (10 for each scale) and 898 individuals. The items were rated on a 5-point scale where:
1 = Strongly disagree,
2 = Disagree,
3 = Neither agree nor disagree,
4 = Agree,
5 = Strongly agree.
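For illustration only, a sketch of how the 40 items could be aggregated into the four scale scores; the file name and item-column names are assumptions, not the actual layout of the published dataset:

```python
import pandas as pd

# Hypothetical layout: 898 rows, columns A1..A10, S1..S10, V1..V10,
# D1..D10 holding the 1-5 ratings for the four ten-item scales.
data = pd.read_csv("ipip_disc.csv")  # hypothetical file name

scales = {
    "Assertiveness":     [f"A{i}" for i in range(1, 11)],
    "Social confidence": [f"S{i}" for i in range(1, 11)],
    "Adventurousness":   [f"V{i}" for i in range(1, 11)],
    "Dominance":         [f"D{i}" for i in range(1, 11)],
}
# One score per scale: the mean of its ten item ratings.
scores = pd.DataFrame({name: data[items].mean(axis=1)
                       for name, items in scales.items()})
```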
[Figures: overview of the data, Principal Component Analysis, scree-plots for FCM and AA, the k-Means groups, the FCM and AA memberships, the Consensus Analysis between FCM and AA, the consensus groups of FCM and AA, and a description of the resulting prototypes.]
Conclusions
The results of the applications confirm the following hypotheses:
When the groups are well defined, without any overlap, the consensus analysis between the two different partitioning methods highlights the presence of the groups;
The simulation has been useful for studying the factors that most deeply affect the consensus between the two approaches: first, the correlation between the variables; second, the presence of multivariate outliers (different kurtosis levels).
We believe that defining prototypes through the consensus approach is more reliable than the classical approaches: identifying the groups according to the consensus criterion guarantees more homogeneous prototypes.