Historical prices were downloaded from finance.yahoo.com for the 500 most influential stocks in the United States, namely the constituents of the S&P 500 index.
Each company is associated with a label corresponding to the sector in which it operates. Yahoo identifies eight sectors plus a ninth, Conglomerates, used for companies that own divisions in different and separate businesses.
Historical prices for each stock can be downloaded through the URL:
If we exclude the 7 stocks in the Conglomerates sector, that is, those stocks that do not belong to any specific sector, we can try to cluster the remaining stocks using k-means with the number of groups set to 8.
To make the process supervised, we set the "seed" centroids for k-means to the centroids of the 8 groups of stocks indicated by Yahoo.
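The seeding step can be illustrated with a minimal sketch. It is not the original MATLAB code: the function names (`centroid`, `nearest`, `seeded_kmeans`), the Euclidean distance, and the toy two-sector data are assumptions made for illustration only. Each stock is assumed to be represented by a numeric vector.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance, an assumption)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def seeded_kmeans(vectors, labels, n_iter=100):
    """k-means whose initial centroids are the means of the labelled groups."""
    groups = sorted(set(labels))
    # Seed: one centroid per sector, computed from the sector labels.
    centroids = [
        centroid([v for v, lab in zip(vectors, labels) if lab == g]) for g in groups
    ]
    assign = None
    for _ in range(n_iter):
        new_assign = [nearest(v, centroids) for v in vectors]
        if new_assign == assign:  # converged: no vector changed cluster
            break
        assign = new_assign
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assign) if a == k]
            if members:  # keep the previous centroid if a cluster empties
                centroids[k] = centroid(members)
    return [groups[a] for a in assign], centroids

# Hypothetical toy data: six "stocks" in two sectors, one of them labelled
# with a sector that does not match its position in the vector space.
vectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [0.1, 0.3]]
labels = ["Technology", "Technology", "Utilities", "Utilities", "Technology", "Technology"]

clusters, centroids = seeded_kmeans(vectors, labels)
print(clusters)
```

Because the initial centroids come from the sector labels, each resulting cluster keeps the name of the sector that seeded it, which is what allows the clustering to be scored against those labels.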
At first sight, a classification accuracy of 67.75% might seem low. However, we have to remember that there were 8 clusters, so a uniformly random classification would have yielded an accuracy of only 12.5%.
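The comparison with chance can be checked with a short sketch. The helper names `accuracy` and `chance_accuracy` are hypothetical, and the baseline assumes each stock is assigned to one of the 8 groups uniformly at random.

```python
def accuracy(predicted, actual):
    """Fraction of items whose predicted label equals the reference label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def chance_accuracy(n_classes):
    """Expected accuracy when each item is assigned to a class uniformly at random."""
    return 1.0 / n_classes

baseline = chance_accuracy(8)   # 8 sectors
reported = 0.6775               # accuracy reported in the text
lift = reported / baseline      # how many times better than chance

print(f"baseline = {baseline:.1%}, reported = {reported:.2%}, lift = {lift:.2f}x")
```

Under this assumption the reported accuracy is more than five times the chance level.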
Mahalanobis distance cannot be applied because the space in which the stock vectors are embedded has a dimension greater than the size of the clusters, so the sample covariance matrix of each cluster is singular and cannot be inverted.
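The obstacle is that Mahalanobis distance requires the inverse of a cluster's covariance matrix, and n points in d dimensions give a sample covariance matrix of rank at most n - 1. The following sketch illustrates this on a hypothetical toy cluster of 3 vectors in 5 dimensions, using exact rational arithmetic to avoid rounding issues. The helper names `sample_covariance` and `rank` are assumptions for illustration.

```python
from fractions import Fraction

def sample_covariance(points):
    """Exact d x d sample covariance matrix of n points in d dimensions."""
    n, d = len(points), len(points[0])
    mean = [sum(Fraction(p[j]) for p in points) / n for j in range(d)]
    centered = [[Fraction(p[j]) - mean[j] for j in range(d)] for p in points]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def rank(matrix):
    """Matrix rank via Gaussian elimination (exact when entries are Fractions)."""
    m = [row[:] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r

# Hypothetical cluster: n = 3 vectors, each of dimension d = 5 (integer
# entries keep the arithmetic exact).
cluster = [[1, 2, 0, 4, 3], [2, 1, 5, 0, 1], [0, 3, 2, 2, 4]]
cov = sample_covariance(cluster)

print("dimension:", len(cov), "rank:", rank(cov))
```

Since the rank is smaller than the dimension, the matrix has no inverse, which is why the distance is undefined in this setting.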
Figures were generated using MATLAB. Source code and data are freely available at the following address: