Machine Learning for Stock Selection
Robert J. Yan, Charles X. Ling
University of Western Ontario, Canada
{jyan, cling}@csd.uwo.ca
Outline
  • Introduction
  • The stock selection task
  • The Prototype Ranking method
  • Experimental results
  • Conclusions
Introduction
  • Objective: use machine learning to select a small number of “good” stocks to form a portfolio
  • Research questions:
      • Learning from a noisy dataset
      • Learning from an imbalanced dataset
  • Our solution: Prototype Ranking, a specially designed machine learning method
Stock Selection Task
  • Given information prior to week t, predict the performance of stocks in week t
  • Learn a ranking function from the training set and use it to rank the testing data
  • Select the n highest-ranked stocks to buy and the n lowest-ranked stocks to short-sell
  • Training set columns (one row per Stock ID):
      • Predictor 1: return of week t-1
      • Predictor 2: return of week t-2
      • Predictor 3: volume ratio of t-2 / t-1
      • Goal: return of week t
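
The task setup above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `make_row` and `select_long_short`, and the input layout (one list of weekly returns and one list of weekly volumes per stock), are hypothetical and do not come from the slides.

```python
from typing import Dict, List, Tuple


def make_row(returns: List[float], volumes: List[float], t: int) -> Tuple[List[float], float]:
    """Build (predictors, goal) for week t from one stock's weekly series.

    Hypothetical helper: assumes t >= 2 and that index i holds week i.
    """
    predictors = [
        returns[t - 1],                   # Predictor 1: return of week t-1
        returns[t - 2],                   # Predictor 2: return of week t-2
        volumes[t - 2] / volumes[t - 1],  # Predictor 3: volume ratio of t-2 / t-1
    ]
    goal = returns[t]                     # Goal: return of week t
    return predictors, goal


def select_long_short(scores: Dict[str, float], n: int) -> Tuple[List[str], List[str]]:
    """Rank stocks by predicted score; buy the top n and short-sell the bottom n."""
    if 2 * n > len(scores):
        raise ValueError("n is too large: long and short legs would overlap")
    ranked = sorted(scores, key=lambda stock: scores[stock], reverse=True)
    return ranked[:n], ranked[-n:]
```

Any model that produces a score per stock can feed `select_long_short`; only the ordering of the scores matters, which is why the slides frame the problem as learning a ranking function.
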
Prototype Ranking
  • Prototype Ranking (PR): a machine learning method designed for noisy and imbalanced stock data
  • The PR system:
      • Step 1: find good “prototypes” in the training data
      • Step 2: use k-NN on the prototypes to rank the test data
Step 1: Finding Prototypes
  • Prototypes: representative points
  • Goal: discover the underlying density/clusters of the training samples by distributing prototypes in the sample space
  • Reduce the data size
  [Figure: samples, prototypes, and prototype neighborhoods]
Analysis: competitive learning for the stock selection task
  • Pros:
      • Noise-tolerant
      • On-line update: practical for huge datasets
      • Smoothly simulates the training samples
  • Cons:
      • Searching for the nearest prototype is tedious
      • Poor performance on the prediction task: it is designed for tasks such as clustering and feature mapping, whereas stock selection is a prediction task
      • Poor performance when modeling imbalanced datasets
Finding prototypes using competitive learning
  • General competitive learning:
      • Step 1: randomly initialize a set of prototypes
      • Step 2: search for the nearest prototype
      • Step 3: adjust the prototypes
      • Step 4: output the prototypes
  • The hidden density in the training data is reflected in the prototypes
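
The four general steps can be sketched as plain winner-take-all competitive learning. This is a minimal sketch, not the authors' implementation: the learning rate, the number of epochs, the Euclidean distance, and initialization by copying random training samples are all assumptions, since the slides do not specify them.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def competitive_learning(
    samples: List[List[float]],
    n_prototypes: int,
    learning_rate: float = 0.1,  # assumed value
    epochs: int = 10,            # assumed value
    seed: int = 0,
) -> List[List[float]]:
    rng = random.Random(seed)
    # Step 1: randomly initialize prototypes (here: copies of random samples).
    prototypes = [list(s) for s in rng.sample(samples, n_prototypes)]
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            # Step 2: search for the nearest prototype (the "winner").
            winner = min(prototypes, key=lambda p: squared_distance(p, x))
            # Step 3: adjust the winner by moving it toward the sample.
            for i, xi in enumerate(x):
                winner[i] += learning_rate * (xi - winner[i])
    # Step 4: output the prototypes.
    return prototypes
```

Because each update moves a prototype a fraction of the way toward a sample, regions with many samples attract prototypes more often, which is how the prototypes come to reflect the density of the training data.
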
Modifications for stock data
  • In step 1: initial prototypes are organized in a tree structure
      • Fast nearest-prototype search
  • In step 2: search for prototypes in the predictor space
      • Better learning effect for prediction tasks
  • In step 3: adjust prototypes in the goal attribute space
      • Better learning effect on the imbalanced stock data
  • In step 4: prune the prototype tree
      • Prune child prototypes if they are similar to the parent
      • Combine leaf prototypes to form the final prototypes
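
One possible reading of the step 2 and step 3 modifications is sketched below: the winner is found using the predictor attributes only, and the update is applied to the winner's goal value. The slides give no formulas, so the `Prototype` class, the running-mean update, and the fixed predictor coordinates are assumptions made for illustration; the tree organization and the pruning step are omitted.

```python
from typing import List, Sequence


class Prototype:
    """Hypothetical container: a point in predictor space plus a goal estimate."""

    def __init__(self, predictors: Sequence[float], goal: float = 0.0) -> None:
        self.predictors = list(predictors)
        self.goal = goal
        self.count = 0  # number of samples assigned so far


def nearest_by_predictors(prototypes: List[Prototype], x: Sequence[float]) -> Prototype:
    # Modified step 2: the search uses the predictor attributes only.
    return min(
        prototypes,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p.predictors, x)),
    )


def update_goal(prototypes: List[Prototype], x: Sequence[float], y: float) -> Prototype:
    # Modified step 3 (assumed form): adjust the winner in the goal attribute,
    # here as a running mean of the goal values of its assigned samples.
    winner = nearest_by_predictors(prototypes, x)
    winner.count += 1
    winner.goal += (y - winner.goal) / winner.count
    return winner
```

Searching without the goal attribute matters at prediction time, because the goal (the return of week t) is unknown for test data, so a prototype must be reachable from the predictors alone.
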
Step 2: Predicting Test Data
  • Prediction: the weighted average of the k nearest prototypes
  • Update the model online with new data
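
A weighted k-nearest-prototype prediction might look like the following sketch. The slides do not say how the weights are chosen, so inverse-distance weighting, the Euclidean distance, and the `(predictors, goal)` tuple layout are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def predict(
    prototypes: List[Tuple[Sequence[float], float]],  # (predictors, goal) pairs
    x: Sequence[float],
    k: int = 3,
    eps: float = 1e-9,  # avoids division by zero when x coincides with a prototype
) -> float:
    # Distance from x to every prototype, measured on the predictors only.
    by_distance = sorted(
        ((math.dist(p, x), goal) for p, goal in prototypes),
        key=lambda pair: pair[0],
    )[:k]
    # Assumed weighting scheme: closer prototypes get larger weights.
    weights = [1.0 / (d + eps) for d, _ in by_distance]
    weighted_sum = sum(w * goal for w, (_, goal) in zip(weights, by_distance))
    return weighted_sum / sum(weights)
```

The predicted values are then used only to rank the test stocks, so any monotone rescaling of the weights that preserves the ordering of predictions would select the same portfolio.
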
Data
  • CRSP daily stock database
  • 300 NYSE and AMEX stocks with the largest market capitalization
  • From 1962 to 2004
Testing PR
  • Experiment 1: larger portfolio, lower average return, lower risk (diversification)
  • Experiment 2: is PR better than Cooper’s method?
Results of Experiment 1
  [Figures: average return (1978-2004); risk (std) (1978-2004)]
Experiment 2: Comparison to Cooper’s method
  • Cooper’s method (CP): a traditional non-ML method for stock selection…
  • Compare PR and CP in 10-stock portfolios
Results of Experiment 2
  • Measures:
      • Average Return (Ret.)
      • Sharpe Ratio (SR), a risk-adjusted return: SR = Ret. / Std.
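Results

| Portfolio | Performance     | 1978-1993 PR | 1978-1993 CP | 1994-2004 PR | 1994-2004 CP |
|-----------|-----------------|--------------|--------------|--------------|--------------|
| 10-stock  | Ave. Return (%) | 1.69         | 0.89         | 1.37         | 0.81         |
| 10-stock  | STD (%)         | 3.30         | 2.80         | 6.20         | 5.10         |
| 10-stock  | Sharpe Ratio    | 0.51         | 0.32         | 0.22         | 0.16         |
| 20-stock  | Ave. Return (%) | 1.35         | 0.80         | 1.32         | 0.81         |
| 20-stock  | STD (%)         | 2.60         | 2.10         | 5.10         | 4.30         |
| 20-stock  | Sharpe Ratio    | 0.52         | 0.38         | 0.26         | 0.19         |
| 30-stock  | Ave. Return (%) | 1.14         | 0.67         | 1.16         | 0.77         |
| 30-stock  | STD (%)         | 2.20         | 1.80         | 4.60         | 3.50         |
| 30-stock  | Sharpe Ratio    | 0.52         | 0.37         | 0.27         | 0.22         |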
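
The Sharpe ratio as defined in these slides (SR = Ret. / Std., with no risk-free rate subtracted) can be recomputed from the reported averages. The sketch below uses the 10-stock figures; the function names are hypothetical, and the use of the sample standard deviation in `sharpe_from_returns` is an assumption, since the slides do not say which estimator was used.

```python
import statistics
from typing import List


def sharpe_ratio(avg_return: float, std: float) -> float:
    """Risk-adjusted return as defined on the slide: SR = Ret. / Std."""
    return avg_return / std


def sharpe_from_returns(returns: List[float]) -> float:
    """Same ratio from a series of periodic returns (sample std is an assumption)."""
    return statistics.mean(returns) / statistics.stdev(returns)


# 10-stock portfolios, values in percent, taken from the results table.
pr_1978_1993 = sharpe_ratio(1.69, 3.30)
cp_1978_1993 = sharpe_ratio(0.89, 2.80)
print(round(pr_1978_1993, 2), round(cp_1978_1993, 2))  # prints 0.51 0.32
```

Small differences between a recomputed ratio and a tabulated one are possible, because the reported averages and standard deviations are themselves rounded.
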
Conclusions
  • PR: modified competitive learning and k-NN for noisy and imbalanced stock data
  • PR does well in stock selection
  • Larger portfolio, lower return, lower risk
  • PR outperforms the non-ML method CP
  • Future work: use it to invest and make money!

KDD
