  • 1. Market analysis for the S&P500 (Giulio Genovese, Tuesday, December 11, 2007)
  • 2. Data collected and used
    • Historical prices for the 500 most influential stocks in the United States, those in the S&P500 index, have been downloaded from Yahoo Finance.
    • Each company is associated with a label corresponding to the sector in which it operates. Yahoo identifies eight sectors plus a ninth, Conglomerates, used for companies that own divisions in different and separate businesses.
    • Historical prices for each stock can be downloaded through the URL:
    • where SYMBOL is the ticker associated with the company on the stock market.
    • The data contain daily closing prices starting from January 1st, 1962, or from the date they went public for companies listed after that date.
    • Yahoo Finance lists the following sectors:
    • Basic Materials (53 stocks)
    • Conglomerates (7 stocks)
    • Consumer Goods (62 stocks)
    • Financial (93 stocks)
    • Healthcare (40 stocks)
    • Industrial Goods (40 stocks)
    • Services (93 stocks)
    • Technology (82 stocks)
    • Utilities (30 stocks)
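As a quick consistency check on the list above, a small Python tally (the dictionary is simply the list transcribed; the variable names are illustrative) confirms that the sector sizes add up to 500 and that setting Conglomerates aside leaves 493 stocks in 8 sectors.

```python
# Sector sizes as listed by Yahoo Finance (transcribed from the list above).
SECTOR_COUNTS = {
    "Basic Materials": 53,
    "Conglomerates": 7,
    "Consumer Goods": 62,
    "Financial": 93,
    "Healthcare": 40,
    "Industrial Goods": 40,
    "Services": 93,
    "Technology": 82,
    "Utilities": 30,
}

total = sum(SECTOR_COUNTS.values())
# Stocks remaining once the Conglomerates group is excluded.
clustered = total - SECTOR_COUNTS["Conglomerates"]
# Number of proper sectors (everything except Conglomerates).
n_sectors = len(SECTOR_COUNTS) - 1

print(total, clustered, n_sectors)  # 500 493 8
```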
  • 3. Goal of the project
    • We want to measure how well stock market prices reflect the sector division given by the Yahoo website.
    • To this end, we apply the k-means clustering algorithm to the 500 stocks in the S&P500 index and investigate how price variations follow the market's division into sectors.
  • 4. Data preprocessing
    • As a first step, we compute the logarithm of the closing price for every day.
    • Then, for every day, we compute the return, defined as the difference between the current day's log price and the previous day's log price.
    • For every day, we subtract from each stock's return the average return across all stocks, which we take as the general market return for that day.
    • From the series of adjusted returns we compute the 500x500 matrix of correlations among stocks, using the daily returns from January 1st, 2000 up to December 7th, 2007.
    • To visualize the result, we apply principal component analysis to the correlation matrix and plot the stocks projected onto the first two eigenvectors of the matrix.
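The original figures were produced in MATLAB; the following is only a NumPy sketch of the preprocessing steps above, not the author's code. It assumes the prices are already restricted to the chosen date range and arranged as an array with one row per trading day and one column per stock, aligned by date with no missing values. The function name is hypothetical, and synthetic random-walk prices stand in for the Yahoo data.

```python
import numpy as np

def market_adjusted_correlation(prices: np.ndarray) -> np.ndarray:
    """Correlation matrix of market-adjusted daily log returns.

    prices: (n_days, n_stocks) array of daily closing prices, assumed
    aligned by date with no missing values.
    """
    log_prices = np.log(prices)
    # Daily return: today's log price minus yesterday's log price.
    returns = np.diff(log_prices, axis=0)
    # Subtract the day's average return across all stocks (the market return).
    adjusted = returns - returns.mean(axis=1, keepdims=True)
    # Stocks are columns, so rowvar=False yields an (n_stocks, n_stocks) matrix.
    return np.corrcoef(adjusted, rowvar=False)

# Usage with synthetic random-walk prices standing in for the real data.
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal((250, 5)), axis=0))
C = market_adjusted_correlation(prices)
print(C.shape)  # (5, 5)
```

With the real data the same call would return the 500x500 matrix described above.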
  • 5. Geometric interpretation
    • Think of the correlation matrix C as the matrix of scalar products (a kernel, or Gram matrix) of the stocks viewed as vectors.
    • The eigendecomposition gives C = VDV', where D is the diagonal matrix of eigenvalues and V is the orthogonal matrix whose columns are the corresponding eigenvectors.
    • Now consider W = V*sqrt(D), which is real because C is positive semidefinite and therefore has nonnegative eigenvalues. Then C = WW', so C can be seen as the matrix of scalar products among the rows of W.
    • Since the diagonal elements of C are all 1, every row of W is a unit vector.
    • From now on, each of these unit vectors represents one of our stocks.
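The construction above can be checked numerically. This NumPy sketch (the function name and the 3x3 example matrix are illustrative assumptions) builds W from the eigendecomposition, verifies that WW' reproduces C and that the rows have unit length, and sorts the columns by decreasing eigenvalue so that the first two columns give one natural choice of 2-D coordinates for the PCA plot (projections onto the first two eigenvectors, scaled by the square roots of their eigenvalues).

```python
import numpy as np

def stock_vectors(C: np.ndarray) -> np.ndarray:
    """Return W with C = W W'; each row of W is one stock vector.

    Columns are ordered by decreasing eigenvalue, so W[:, :2] holds the
    coordinates along the first two eigenvectors.
    """
    d, V = np.linalg.eigh(C)        # C = V diag(d) V', eigenvalues ascending
    order = np.argsort(d)[::-1]     # reorder eigenpairs, largest eigenvalue first
    d, V = d[order], V[:, order]
    d = np.clip(d, 0.0, None)       # C is PSD; clip tiny negative rounding errors
    return V * np.sqrt(d)           # scales column j of V by sqrt(d[j])

# Usage on a small example correlation matrix (positive definite).
C = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
W = stock_vectors(C)
print(np.allclose(W @ W.T, C))                     # True
print(np.allclose(np.linalg.norm(W, axis=1), 1))   # True: rows are unit vectors
xy = W[:, :2]  # 2-D coordinates of each stock for a scatter plot
```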
  • 6. PCA of the correlation matrix
    • As the plot shows, the first two eigenvectors already suffice to tell some of the sectors apart.
    • A clearer separation requires more eigenvectors, and the result can then no longer be visualized on screen.
  • 7. Eigenvalues of the matrix
    • The eigenvalues beyond the second are not much smaller than the first two.
    • This is a sign that the projection onto the first two eigenvectors loses many of the metric properties carried by the correlation matrix.
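One way to quantify this, sketched here as an assumption rather than something the slides compute, is the fraction of the total variance captured by the top eigenvalues. For a correlation matrix the eigenvalues sum to its trace, i.e. to the number of stocks, so a flat spectrum after the second eigenvalue leaves most of that total outside the 2-D plane. The function name and the equicorrelated example are illustrative.

```python
import numpy as np

def explained_fraction(C: np.ndarray, k: int = 2) -> float:
    """Fraction of total variance carried by the k largest eigenvalues of C.

    For a correlation matrix the eigenvalues sum to trace(C) = number of stocks.
    """
    d = np.linalg.eigvalsh(C)[::-1]   # eigenvalues, largest first
    return float(d[:k].sum() / d.sum())

# Example: 4 equicorrelated variables with rho = 0.5.
# Eigenvalues are 1 + 3*0.5 = 2.5 (once) and 1 - 0.5 = 0.5 (three times).
n, rho = 4, 0.5
C = rho * np.ones((n, n)) + (1 - rho) * np.eye(n)
print(round(explained_fraction(C, k=2), 3))  # 0.75
```

Applied to the 500x500 matrix, a value far below 1 would express the loss of metric information described above.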
  • 8. Using k-means to cluster
    • If we exclude the 7 stocks in the Conglomerates sector, that is, those that do not belong to any specific sector, we can cluster the remaining 493 stocks using k-means with the number of clusters set to 8.
    • To make the process supervised, we set the "seed" centroids for k-means to the centroids of the 8 groups of stocks indicated by Yahoo.
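A minimal sketch of seeded k-means in NumPy, assuming standard Lloyd iterations with squared Euclidean distance (the slides do not state which k-means settings were used in MATLAB). The function name `seeded_kmeans` is hypothetical; it expects every label 0..k-1 to occur at least once. Because cluster j starts from the centroid of sector j, its index can later be compared directly with the sector label.

```python
import numpy as np

def seeded_kmeans(X, labels, n_iter=100):
    """k-means on the rows of X, initialized at the per-label centroids.

    X: (n_points, dim) array, e.g. the stock vectors (rows of W).
    labels: integer sector labels 0..k-1, used only to place the seeds.
    Returns the cluster index of each row.
    """
    k = labels.max() + 1
    # Seed centroids: mean vector of each labelled group.
    centroids = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
    assign = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_assign = dist.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # converged: assignments no longer change
        assign = new_assign
        # Recompute centroids; keep the previous one if a cluster becomes empty.
        for j in range(k):
            members = X[assign == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return assign

# Usage on two well-separated synthetic groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 3)), rng.normal(1, 0.1, (20, 3))])
labels = np.repeat([0, 1], 20)
pred = seeded_kmeans(X, labels)
print((pred == labels).mean())  # 1.0
```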
  • 9. k-means results
    • Setting the "seed" centroids was an essential step, since k-means can produce many different outputs depending on its initialization.
    • The cluster classification looks very similar to the sector classification. Can we quantify that?
  • 10. Accuracy of classification
    • The picture compares, stock by stock, the classification produced by k-means with the classification by sector.
    • Overall, 334 out of 493 stocks were classified correctly, that is, the classification accuracy is 67.75%.
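The accuracy figure can be reproduced from the reported counts, and a simple simulation shows the chance level it should be compared against. This is a sketch under the assumption that cluster j is read as sector j because of the seeding; the function names are hypothetical. For a uniformly random assignment to k clusters, the expected accuracy is exactly 1/k whatever the sector sizes, since each stock is guessed correctly with probability 1/k.

```python
import random

def classification_accuracy(sectors, clusters):
    """Fraction of stocks whose cluster index equals their sector index.

    Assumes cluster j was seeded from sector j's centroid, so cluster j
    is read as "sector j" and no label matching is needed.
    """
    if len(sectors) != len(clusters):
        raise ValueError("sectors and clusters must have the same length")
    correct = sum(s == c for s, c in zip(sectors, clusters))
    return correct / len(sectors)

def random_baseline(sectors, k=8, trials=500, seed=0):
    """Average accuracy when each stock gets a uniformly random cluster 0..k-1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        guesses = [rng.randrange(k) for _ in sectors]
        total += classification_accuracy(sectors, guesses)
    return total / trials

# Reported figures: 334 correct out of 493.
print(f"{334 / 493:.2%}")  # 67.75%

# Sector sizes without Conglomerates, in the order listed earlier.
sizes = [53, 62, 93, 40, 40, 93, 82, 30]
sectors = [j for j, n in enumerate(sizes) for _ in range(n)]
print(len(sectors))  # 493
# The expected value is exactly 1/8; the simulated estimate should be near 0.125.
print(random_baseline(sectors))
```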
  • 11. Conclusions
    • At first sight, a classification accuracy of 67.75% might seem low. However, we have to remember that there were 8 clusters, so a random classification would have yielded an expected accuracy of only 12.5%.
    • The Mahalanobis distance cannot be applied, since the space in which the stock vectors are embedded has a dimension greater than the size of the clusters: the sample covariance matrix of each cluster is then singular and cannot be inverted.
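The rank argument behind this point can be illustrated with a small NumPy sketch. The sizes are illustrative assumptions chosen only so that the cluster has fewer points than dimensions, as with clusters of at most 93 stocks embedded in a space of dimension on the order of the number of stocks. A sample covariance built from m points has rank at most m - 1, so when m is not larger than the dimension d it has no inverse, and the Mahalanobis distance needs that inverse.

```python
import numpy as np

# Illustrative sizes (assumptions): m points in a d-dimensional space, m < d.
m, d = 30, 100
rng = np.random.default_rng(2)
cluster = rng.standard_normal((m, d))

# Sample covariance of the cluster: a (d, d) matrix estimated from only m points.
S = np.cov(cluster, rowvar=False)
rank = np.linalg.matrix_rank(S)
print(S.shape, rank)  # (100, 100) and a rank of at most m - 1 = 29

# The Mahalanobis distance requires the inverse of S; a rank-deficient S has none.
is_invertible = rank == d
print(is_invertible)  # False
```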
    • Figures were generated using MATLAB. Source code and data are freely available at the following address: