Published in: Technology, Business

  1. Machine Learning for Stock Selection
       Robert J. Yan, Charles X. Ling
       University of Western Ontario, Canada
       {jyan, cling}
  2. Outline
       • Introduction
       • The stock selection task
       • The Prototype Ranking method
       • Experimental results
       • Conclusions
  3. Introduction
       • Objective:
           • Use machine learning to select a small number of “good” stocks to form a portfolio
       • Research questions:
           • Learning from a noisy dataset
           • Learning from an imbalanced dataset
       • Our solution: Prototype Ranking
           • A machine learning method designed specifically for this task
  5. Stock Selection Task
       • Given information available before week t, predict each stock's performance in week t
           • Training set columns:
               Stock ID | Predictor 1: return of week t-1 | Predictor 2: return of week t-2 | Predictor 3: volume ratio of t-2/t-1 | Goal: return of week t
       • Learn a ranking function to rank the test data
           • Select the n highest-ranked stocks to buy and the n lowest-ranked to short-sell
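The task on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`make_row`, `select_portfolio`), the dictionary layout, and the use of plain lists of weekly returns and volumes are hypothetical and do not come from the slides.

```python
def make_row(stock_id, returns, volumes, t):
    """Build one training row for week t from weekly series (hypothetical layout)."""
    return {
        "stock_id": stock_id,
        "ret_t1": returns[t - 1],                     # Predictor 1: return of week t-1
        "ret_t2": returns[t - 2],                     # Predictor 2: return of week t-2
        "vol_ratio": volumes[t - 2] / volumes[t - 1],  # Predictor 3: volume ratio t-2/t-1
        "goal": returns[t],                           # Goal: return of week t
    }


def select_portfolio(scores, n):
    """Rank stocks by predicted score; return (n to buy, n to short-sell)."""
    if 2 * n > len(scores):
        raise ValueError("n is too large: long and short sides would overlap")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], ranked[-n:]


row = make_row("A", returns=[0.01, 0.02, -0.01], volumes=[100.0, 200.0, 150.0], t=2)
longs, shorts = select_portfolio({"A": 0.03, "B": -0.01, "C": 0.01, "D": -0.04}, n=1)
```

Here `scores` stands for whatever the ranking function outputs for each stock; any model that produces a sortable score could be plugged in.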
  7. Prototype Ranking
       • Prototype Ranking (PR): a machine learning method designed for noisy and imbalanced stock data
       • The PR system
           • Step 1. Find good “prototypes” in the training data
           • Step 2. Use k-NN on the prototypes to rank the test data
  8. Step 1: Finding Prototypes
       • Prototypes: representative points
           • Goal: discover the underlying density/clusters of the training samples by distributing prototypes in sample space
           • Reduce data size
       [Figure: prototypes, prototype neighborhoods, and samples]
  9. Analysis
       • Competitive learning for the stock selection task
           • Pros:
               • Noise-tolerant
               • On-line update: practical for huge datasets
               • Smoothly models the training samples
           • Cons:
               • Searching for the nearest prototype is tedious
               • Poor performance on the prediction task
                   • Designed for tasks such as clustering, feature mapping…
                   • Stock selection is a prediction task
               • Poor performance when modeling imbalanced datasets
  10. Finding prototypes using competitive learning
       • General competitive learning
           • Step 1: Randomly initialize a set of prototypes
           • Step 2: Search for the nearest prototype
           • Step 3: Adjust the prototypes
           • Step 4: Output the prototypes
       • The hidden density in the training data is reflected in the prototypes
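The four steps of general competitive learning can be sketched as below. This is a minimal winner-take-all version, not the authors' implementation: the learning rate, the number of epochs, squared Euclidean distance, and initializing prototypes from randomly chosen samples are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def competitive_learning(samples, num_prototypes, learning_rate=0.1, epochs=10, seed=0):
    rng = random.Random(seed)
    # Step 1: randomly initialize the prototypes (here: copies of random samples).
    prototypes = [list(s) for s in rng.sample(samples, num_prototypes)]
    for _ in range(epochs):
        for sample in samples:
            # Step 2: search for the nearest prototype (the "winner").
            winner = min(prototypes, key=lambda p: squared_distance(p, sample))
            # Step 3: adjust the winner by moving it a small step toward the sample.
            for i, value in enumerate(sample):
                winner[i] += learning_rate * (value - winner[i])
    # Step 4: output the prototypes.
    return prototypes


samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
prototypes = competitive_learning(samples, num_prototypes=2, epochs=20)
```

With two well-separated groups of samples, one prototype settles in each group, which is the sense in which the prototypes reflect the density of the training data.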
  11. Modifications for stock data
       • In step 1: initial prototypes are organized in a tree structure
           • Fast nearest-prototype search
       • In step 2: search for prototypes in the predictor space
           • Better learning effect for prediction tasks
       • In step 3: adjust prototypes in the goal attribute space
           • Better learning effect on the imbalanced stock data
       • In step 4: prune the prototype tree
           • Prune child prototypes if they are similar to the parent
           • Combine leaf prototypes to form the final prototypes
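One possible reading of the step 2 and step 3 modifications is sketched below: the winner is found using the predictors only, and only its goal value is adjusted. This is an interpretation, not the authors' algorithm. The flat list of prototypes (no tree, no pruning), the dictionary layout, and the update rule are assumptions made for illustration.

```python
def nearest_by_predictors(prototypes, predictors):
    """Step 2 (assumed form): search in the predictor space only."""
    return min(
        prototypes,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p["predictors"], predictors)),
    )


def update_goal(prototypes, predictors, goal, learning_rate=0.1):
    """Step 3 (assumed form): adjust the winner in the goal attribute space only."""
    winner = nearest_by_predictors(prototypes, predictors)
    winner["goal"] += learning_rate * (goal - winner["goal"])
    return winner


prototypes = [
    {"predictors": [0.0, 0.0], "goal": 0.0},
    {"predictors": [1.0, 1.0], "goal": 0.0},
]
winner = update_goal(prototypes, predictors=[0.9, 1.1], goal=0.05, learning_rate=0.5)
```

In this sketch the predictor coordinates of a prototype never move, so each prototype ends up storing a smoothed goal value for its region of the predictor space.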
  12. Step 2: Predicting Test Data
       • Prediction: the weighted average of the k nearest prototypes
       • Update the model online as new data arrive
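The prediction step can be sketched as a weighted k-NN over the prototypes. The slide does not say how the weights are chosen; inverse-distance weighting, Euclidean distance, and the `(predictors, goal)` tuple layout used here are assumptions.

```python
import math


def predict(prototypes, predictors, k=3, eps=1e-9):
    """Weighted average of the goal values of the k nearest prototypes.

    prototypes: list of (predictor_vector, goal_value) pairs (hypothetical layout).
    """
    nearest = sorted(prototypes, key=lambda p: math.dist(p[0], predictors))[:k]
    # Assumed weighting: closer prototypes count more (inverse distance).
    weights = [1.0 / (math.dist(p[0], predictors) + eps) for p in nearest]
    return sum(w * p[1] for w, p in zip(weights, nearest)) / sum(weights)


prototypes = [((0.0,), 1.0), ((2.0,), 3.0), ((10.0,), 100.0)]
score = predict(prototypes, (1.0,), k=2)
```

In the usage above the query point is equally far from the two nearest prototypes, so the score is the plain average of their goal values; the distant third prototype is ignored because k=2.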
  14. Data
       • CRSP daily stock database
           • 300 NYSE and AMEX stocks with the largest market cap
           • From 1962 to 2004
  15. Testing PR
       • Experiment 1: larger portfolio, lower average return, lower risk (diversification)
       • Experiment 2: is PR better than Cooper’s method?
  16. Results of Experiment 1
       [Figure panels: Average Return (1978-2004); Risk (std) (1978-2004)]
  17. Experiment 2: Comparison to Cooper’s method
       • Cooper’s method (CP): a traditional non-ML method for stock selection…
       • Compare PR and CP in 10-stock portfolios
  18. Results of Experiment 2
       • Measures:
           • Average Return (Ret.)
           • Sharpe Ratio (SR): a risk-adjusted return, SR = Ret. / Std.
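The Sharpe ratio as defined on this slide (average return divided by standard deviation, with no risk-free rate subtracted) is simple to compute. In the sketch below the function names are hypothetical, and the use of the sample standard deviation for a return series is an assumption, since the slide does not say which estimator was used.

```python
import statistics


def sharpe_ratio(avg_return, std):
    """SR = Ret. / Std., from summary statistics."""
    return avg_return / std


def sharpe_ratio_from_series(returns):
    """Same ratio computed from a series of periodic returns (sample std assumed)."""
    return statistics.mean(returns) / statistics.stdev(returns)


# 10-stock PR portfolio, 1978-1993, from the results slide: Ret. 1.69%, STD 3.30%.
sr = round(sharpe_ratio(1.69, 3.30), 2)
```

The value agrees with the 0.51 reported for that portfolio in the results table.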
  19. Results

      Portfolio  Performance        1978-1993        1994-2004
                                    PR      CP       PR      CP
      10-stock   Ave. Return (%)    1.69    0.89     1.37    0.81
                 STD (%)            3.30    2.80     6.20    5.10
                 Sharpe Ratio       0.51    0.32     0.22    0.16
      20-stock   Ave. Return (%)    1.35    0.80     1.32    0.81
                 STD (%)            2.60    2.10     5.10    4.30
                 Sharpe Ratio       0.52    0.38     0.26    0.19
      30-stock   Ave. Return (%)    1.14    0.67     1.16    0.77
                 STD (%)            2.20    1.80     4.60    3.50
                 Sharpe Ratio       0.52    0.37     0.27    0.22
  21. Conclusions
       • PR: modified competitive learning plus k-NN for noisy and imbalanced stock data
       • PR does well in stock selection
           • Larger portfolio, lower return, lower risk
           • PR outperforms the non-ML method CP
       • Future work: use it to invest and make money!