Transcript

  • 1. Market analysis for the S&P500. Giulio Genovese, Tuesday, December 11, 2007
  • 2. Data collected and used
    • Historical prices were downloaded from finance.yahoo.com for the 500 stocks in the S&P500 index, the most influential stocks in the United States.
    • Every company is associated with a label corresponding to the sector in which it operates. Yahoo identifies eight sectors plus a ninth, Conglomerates, used for companies that own divisions in different and separate businesses.
    • Historical prices for each stock can be downloaded through the URL:
    • http://ichart.finance.yahoo.com/table.csv?s=SYMBOL
    • where SYMBOL is the ticker symbol associated with the company on the stock market.
    • The data contain daily closing prices starting from January 1st, 1962, or later for companies that went public after that date.
    • Yahoo Finance lists the following sectors:
    • Basic Materials (53 stocks)
    • Conglomerates (7 stocks)
    • Consumer Goods (62 stocks)
    • Financial (93 stocks)
    • Healthcare (40 stocks)
    • Industrial Goods (40 stocks)
    • Services (93 stocks)
    • Technology (82 stocks)
    • Utilities (30 stocks)
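The download step described on this slide can be sketched as follows. This is a minimal illustration, not the author's code: the helper names `price_url` and `parse_closes` are hypothetical, the `Date` and `Close` column names and the sample rows are assumptions about the CSV layout, and the endpoint is quoted from the slide as it stood in 2007, so it may no longer respond.

```python
import csv
import io
from urllib.parse import urlencode

# Endpoint quoted on the slide (2007); it may no longer be served.
BASE_URL = "http://ichart.finance.yahoo.com/table.csv"


def price_url(symbol: str) -> str:
    """Build the download URL for one ticker symbol."""
    return f"{BASE_URL}?{urlencode({'s': symbol})}"


def parse_closes(csv_text: str) -> list[tuple[str, float]]:
    """Return (date, close) pairs, oldest first.

    Assumes a header row containing 'Date' and 'Close' columns
    and ISO-formatted dates (YYYY-MM-DD).
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    pairs = [(row["Date"], float(row["Close"])) for row in rows]
    return sorted(pairs)  # ISO dates sort chronologically as strings


# Made-up sample rows, only to show the assumed layout.
SAMPLE = """Date,Open,High,Low,Close,Volume,Adj Close
2007-12-07,10.0,10.5,9.9,10.4,1000,10.4
2007-12-06,9.8,10.1,9.7,10.0,1200,10.0
"""

print(price_url("IBM"))
print(parse_closes(SAMPLE))
```

Sorting oldest first is a convenience for the return computation that follows in the deck, where each day is compared with the previous one.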
  • 3. Goal of the project
    • We want to measure how well stock prices reflect the sector division given by the Yahoo website.
    • To this end we apply the k-means clustering algorithm to the 500 stocks in the S&P500 index and investigate how closely price variations follow the market sectorization.
  • 4. Data preprocessing
    • As a first step we compute the log of the closing price of every stock for every day.
    • Then we compute the daily return, defined as the difference between the current day's log price and the previous day's log price.
    • For every day we subtract the average return across all the stocks, taken as the general market return for that day.
    • With the series of modified returns we compute the 500x500 matrix of correlations among stocks, using the daily returns from January 1st, 2000 up to December 7th, 2007.
    • To visualize the result we apply principal component analysis to the correlation matrix and plot the stocks projected onto the first two eigenvectors of the matrix.
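The preprocessing steps above can be sketched with NumPy. This is an illustration under stated assumptions rather than the original MATLAB code: the function name `market_adjusted_correlation` is hypothetical, the prices are synthetic, and every stock is assumed to have a price on every day (the real data would need handling for stocks listed after the start date).

```python
import numpy as np


def market_adjusted_correlation(prices: np.ndarray) -> np.ndarray:
    """prices: (days, stocks) array of closing prices, oldest row first."""
    log_prices = np.log(prices)
    returns = np.diff(log_prices, axis=0)          # log price today minus yesterday
    market = returns.mean(axis=1, keepdims=True)   # average return across stocks, per day
    adjusted = returns - market                    # return relative to the market
    return np.corrcoef(adjusted, rowvar=False)     # (stocks, stocks) correlation matrix


# Synthetic random-walk prices for 6 hypothetical stocks over 250 days.
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(250, 6)), axis=0))
C = market_adjusted_correlation(prices)
print(C.shape)
```

With the deck's data the same call would return the 500x500 matrix mentioned on the slide.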
  • 5. Geometric interpretation
    • Think of the correlation matrix C as the matrix of scalar products (a kernel) of the stocks regarded as vectors.
    • Using the eigenvector decomposition we get C=VDV', where D is the diagonal matrix of eigenvalues and V is the matrix whose columns are the corresponding eigenvectors.
    • Consider now W=V*sqrt(D). Then C=WW', and therefore C can be seen as the matrix of scalar products among the rows of W.
    • Since the elements on the diagonal of C are all 1's, all the rows of W are unit vectors.
    • Each of these unit vectors will from now on represent one of our stocks.
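The construction of W can be checked numerically. The sketch below assumes NumPy's `eigh` for the symmetric eigendecomposition; the function name `embed` and the random test matrix are illustrative only. Clipping tiny negative eigenvalues is a numerical safeguard that the slide does not mention.

```python
import numpy as np


def embed(C: np.ndarray) -> np.ndarray:
    """Return W = V * sqrt(D), so that W @ W.T reproduces C."""
    eigvals, V = np.linalg.eigh(C)            # C = V @ diag(eigvals) @ V.T
    eigvals = np.clip(eigvals, 0.0, None)     # guard against round-off below zero
    return V * np.sqrt(eigvals)               # scales column j of V by sqrt(eigvals[j])


# A small correlation matrix built from random data, for illustration.
rng = np.random.default_rng(1)
C = np.corrcoef(rng.normal(size=(200, 5)), rowvar=False)
W = embed(C)

print(np.allclose(W @ W.T, C))                # scalar products of rows give C
print(np.linalg.norm(W, axis=1))              # every row has unit length
```

The unit length follows directly from the slide's argument: the squared norm of row i of W is the i-th diagonal entry of WW', which is C's diagonal entry, 1.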
  • 6. PCA of the correlation matrix
    • As the plot shows, the first two eigenvectors are already enough to tell some of the sectors apart.
    • A better separation needs more eigenvectors, and therefore cannot be visualized on screen.
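A two-dimensional view like the one on this slide could be computed as below. This is a sketch, assuming the coordinates plotted are the two columns of W that belong to the largest eigenvalues; the deck does not say whether the eigenvectors were scaled, and the name `project_top2` is hypothetical. The sign of each eigenvector is arbitrary, so the picture may come out mirrored.

```python
import numpy as np


def project_top2(C: np.ndarray) -> np.ndarray:
    """Coordinates of each stock along the two leading eigenvectors of C."""
    eigvals, V = np.linalg.eigh(C)                 # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:2]          # indices of the two largest
    return V[:, order] * np.sqrt(eigvals[order])   # (stocks, 2) array


rng = np.random.default_rng(3)
C = np.corrcoef(rng.normal(size=(200, 5)), rowvar=False)
P = project_top2(C)
print(P.shape)
```

Each row of `P` can be passed to a scatter plot and colored by sector. Because the full vectors have unit length, every projected point lies inside the unit disk.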
  • 7. Eigenvalues of the matrix
    • The eigenvalues after the second are not significantly smaller than the first two.
    • This is a sign that the two-dimensional projection loses many of the metric properties carried by the correlation matrix.
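One way to put a number on this observation is the share of the total eigenvalue mass captured by the leading eigenvalues. The helper `explained_share` below is a hypothetical illustration, not something taken from the deck.

```python
import numpy as np


def explained_share(C: np.ndarray, k: int = 2) -> float:
    """Fraction of the eigenvalue sum carried by the k largest eigenvalues."""
    eigvals = np.linalg.eigvalsh(C)[::-1]      # descending order
    return float(eigvals[:k].sum() / eigvals.sum())


# Uncorrelated stocks: all eigenvalues are equal, so two of four carry half.
print(explained_share(np.eye(4), k=2))

# Perfectly correlated stocks: a single eigenvalue carries everything.
print(explained_share(np.ones((3, 3)), k=1))
```

For a correlation matrix the eigenvalues sum to the number of stocks, so a share far below 1 for k=2 means the two-dimensional plot discards most of the structure.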
  • 8. Using k-means to cluster
    • If we exclude the 7 stocks in the Conglomerates sector, that is, those stocks that don't belong to any specific sector, we can try to cluster the remaining 493 stocks using k-means with the number of groups set to 8.
    • To make the process supervised, we set the "seed" centroids for k-means to the centroids of the 8 groups of stocks indicated by Yahoo.
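The seeding idea can be sketched as a plain Lloyd-style k-means whose initial centroids are the means of the labeled groups. This is a minimal sketch under assumptions: the deck does not state the distance used (squared Euclidean distance is assumed here), and the name `seeded_kmeans` and the toy data are illustrative.

```python
import numpy as np


def seeded_kmeans(X, seed_labels, n_iter=100):
    """Cluster rows of X, starting from the centroids of the seed groups.

    X: (n, d) array of points; seed_labels: (n,) ints in 0..k-1.
    Returns the final labels and centroids.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(seed_labels).copy()
    k = int(labels.max()) + 1
    centroids = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):                       # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# Two well-separated groups; the third point is deliberately mis-seeded.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = seeded_kmeans(X, [0, 0, 1, 1, 1, 1])
print(labels.tolist())
```

For unit vectors the squared Euclidean distance equals 2 minus twice the scalar product, so with the stock vectors from the geometric interpretation, nearest centroid by distance and highest correlation rank neighbors the same way.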
  • 9. k-means results
    • Setting the "seed" centroids has been an essential step, since k-means can converge to a plethora of different outputs depending on its starting point.
    • The cluster classification looks very similar to the sector classification. Can we quantify that?
  • 10. Accuracy of classification
    • The picture shows how the 493 stocks (the 500 minus the 7 Conglomerates) were classified by the k-means method compared with how they are classified per sector.
    • Overall, 334 out of 493 stocks have been classified correctly, that is, the classification accuracy is 67.75%.
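The accuracy figure and a sector-by-cluster tally can be computed as sketched below. The helpers `accuracy` and `confusion` are hypothetical names, and the four-item label lists are made up. The sketch assumes cluster j is matched to sector j, which holds when each cluster is seeded from one sector's centroid.

```python
from collections import Counter


def accuracy(sector, cluster):
    """Fraction of stocks whose cluster label equals their sector label."""
    correct = sum(s == c for s, c in zip(sector, cluster))
    return correct / len(sector)


def confusion(sector, cluster):
    """Count how many stocks of each sector landed in each cluster."""
    return Counter(zip(sector, cluster))


# The figure quoted on the slide: 334 correct out of 493.
print(f"{334 / 493:.2%}")

# Tiny made-up example: one of four stocks is misclassified.
sector = ["Tech", "Tech", "Util", "Util"]
cluster = ["Tech", "Util", "Util", "Util"]
print(accuracy(sector, cluster))
print(confusion(sector, cluster))
```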
  • 11. Conclusions
    • At first sight a classification accuracy of 67.75% might seem low. However, we have to remember that there were 8 clusters, and therefore a random classification would have yielded an accuracy of only 12.5%.
    • Mahalanobis distance cannot be applied, since the space in which the stock vectors are embedded has dimension greater than the size of the clusters, so each cluster's covariance matrix is singular and cannot be inverted.
    • Figures were generated using MATLAB. Source code and data are freely available at the following address:
    • http://mlab01.dartmouth.edu/finance/
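The remark about the Mahalanobis distance in the conclusions can be checked numerically: a sample covariance matrix estimated from n points has rank at most n - 1, so it cannot be inverted when the dimension is larger. The sketch below uses random data; the cluster size of 30 echoes the smallest sector (Utilities), while the dimension of 100 is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)
n_points, dim = 30, 100                      # fewer points than dimensions
cluster = rng.normal(size=(n_points, dim))   # stand-in for one cluster's vectors

cov = np.cov(cluster, rowvar=False)          # (dim, dim) sample covariance
rank = np.linalg.matrix_rank(cov)

# The rank falls short of dim, so cov has no inverse and the
# Mahalanobis distance (which needs that inverse) is undefined.
print(rank, dim)
```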
