7. Tableau uses the k-means algorithm for clustering. For a given
number of clusters k, the algorithm partitions the data into k
clusters. Each cluster has a center (centroid) that is the mean
value of all the points in that cluster.
Clustering
13. Variables: Sum of CO2 emissions per capita (metric tons); Sum of CO2 emissions, total (KtCO2)
Level of Detail: Country name
Scaling: Normalized
Inputs for Clustering
14. Number of Clusters: 5
Number of Points: 218
Between-group Sum of Squares: 5.0161
Within-group Sum of Squares: 0.64986
Total Sum of Squares: 5.6659
Summary Diagnostics
Editor's Notes
K-means locates centers through an iterative procedure that minimizes distances between individual points in a cluster and the cluster center. In Tableau, you can specify a desired number of clusters, or have Tableau test different values of k and suggest an optimal number of clusters (see Determining the optimal number of clusters).
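The iterative procedure described here can be illustrated with a minimal sketch of a Lloyd-style k-means loop. This is not Tableau's implementation: the function name `kmeans`, the random choice of starting centers, and the stopping rule are assumptions made for illustration only.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd-style k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: start from k input points sampled at random without replacement.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance; ties go to the lower index).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

# Two well-separated groups of three points each.
demo_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
demo_centers, demo_labels = kmeans(demo_points, k=2)
```

Each pass alternates between assigning points to the nearest center and recomputing every center as the mean of its members, which matches the definition of a centroid given earlier.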
Tableau uses the Calinski-Harabasz criterion to assess cluster quality. The Calinski-Harabasz criterion is defined as
$$\mathrm{CH}(k) = \frac{SS_B / (k - 1)}{SS_W / (N - k)}$$
where $SS_B$ is the overall between-cluster variance, $SS_W$ the overall within-cluster variance, $k$ the number of clusters, and $N$ the number of observations.
The greater the value of this ratio, the more cohesive the clusters (low within-cluster variance) and the more distinct/separate the individual clusters (high between-cluster variance).
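As a rough worked example, the ratio can be evaluated for the five-cluster result in the Summary Diagnostics. The sketch assumes the standard form of the criterion, (SSB / (k - 1)) / (SSW / (N - k)), and assumes that the reported between-group and within-group sums of squares play the roles of SSB and SSW. The function name `calinski_harabasz` is hypothetical.

```python
def calinski_harabasz(ssb, ssw, k, n):
    """Between-cluster over within-cluster dispersion, each divided by its degrees of freedom."""
    if not 1 < k < n:
        raise ValueError("need 1 < k < n")
    return (ssb / (k - 1)) / (ssw / (n - k))

# Values copied from the Summary Diagnostics (assumed to map to SSB, SSW, and the total).
ssb, ssw, sst = 5.0161, 0.64986, 5.6659
k, n = 5, 218

# Sanity check: between-group plus within-group should equal the total, up to rounding.
assert abs((ssb + ssw) - sst) < 1e-3

ch = calinski_harabasz(ssb, ssw, k, n)
print(f"Calinski-Harabasz index: {ch:.1f}")
```

Under these assumptions the index comes out at roughly 411. Comparing this value across candidate values of k is one way to see which clustering the criterion favors.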