STOCK ANALYSIS
- Financial Indicators of a stock price -
GOAL
Can a novice investor buy a stock using machine learning algorithms?
AGENDA
01 Basics
• What is a stock?
• What does it mean to own stock in a company?
02 Dataset
• What sample?
• What features?
• What size?
03 Preprocessing
• Zero dominance
• Null dominance
• Outlier detection
04 Modelling
• Feature selection
• Modeling
• Comparison
05 Conclusion
• Important variables
• Future work
BASICS
• What is a stock?
• What does it mean to own stock in a company?
DATASET
US-BASED STOCKS (from Kaggle)
Sample size
• 4392 companies
• 225 financial indicators
• Year 2018
Target variable
• Binary class
• Class 1
• Class 0
Each company is part of a sector that classifies it in a macro-area.
These indicators are commonly found in the 10-K filings that each publicly traded company releases yearly.
DATASET – Sectors
DATASET – TARGET
Target variable
• Binary class
• 1 indicates should Buy
• 0 indicates should NOT Buy
• Year 2019
• Class 1: 70%
• Class 0: 30%
Data Preprocessing
NULL DOMINANCE
• Removed feature-wise
• 13 features have more than 40% missing values
ZERO DOMINANCE
• Removed feature-wise
• 10 features have more than 65% zero values
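The null-dominance and zero-dominance filters could be sketched as below. This is a minimal illustration, assuming the indicators sit in a pandas DataFrame; the function name `drop_dominated_features`, the toy column names and the toy data are hypothetical and not taken from the deck.

```python
import numpy as np
import pandas as pd

def drop_dominated_features(df, null_thresh=0.40, zero_thresh=0.65):
    """Drop numeric columns whose share of missing or zero values exceeds a threshold."""
    numeric = df.select_dtypes(include="number")
    null_frac = numeric.isna().mean()    # fraction of missing values per column
    zero_frac = (numeric == 0).mean()    # fraction of exact zeros per column
    to_drop = numeric.columns[(null_frac > null_thresh) | (zero_frac > zero_thresh)]
    return df.drop(columns=to_drop), list(to_drop)

# Toy data (hypothetical): one healthy column, one null-dominated, one zero-dominated.
toy = pd.DataFrame({
    "Revenue": [10.0, 12.0, 9.0, 11.0, 14.0],
    "MostlyNull": [np.nan, np.nan, np.nan, 1.0, 2.0],  # 60% missing
    "MostlyZero": [0.0, 0.0, 0.0, 0.0, 3.0],           # 80% zeros
})
cleaned, dropped = drop_dominated_features(toy)
print(dropped)
```

The thresholds default to the 40% and 65% cut-offs stated on the slide; both are plain parameters so other cut-offs can be tried.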
Data Preprocessing Cont’d
IMPUTING NULLS
• Replacing remaining missing values with their means according to each sector
INORGANIC GROWTH
• Price variation above 300%
• 19 companies
OUTLIER DETECTION
• Z-scores
• IQR
• Coerce to NaN
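One way to read the outlier and imputation steps is: values outside the IQR fences are coerced to NaN, then the remaining NaNs are filled with the mean of the company's sector. The sketch below assumes pandas; the helper names, the 1.5 × IQR fence and the toy data are assumptions for illustration, not the deck's actual settings.

```python
import numpy as np
import pandas as pd

def coerce_iqr_outliers(s, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with NaN."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.where(s.between(q1 - k * iqr, q3 + k * iqr))

def impute_sector_mean(df, sector_col, value_cols):
    """Fill missing values with the mean of the same column within each sector."""
    out = df.copy()
    out[value_cols] = out.groupby(sector_col)[value_cols].transform(
        lambda col: col.fillna(col.mean())
    )
    return out

# Toy data (hypothetical sectors and values).
coerced = coerce_iqr_outliers(pd.Series([1.0, 2.0, 3.0, 4.0, 100.0]))

toy = pd.DataFrame({
    "Sector": ["Tech", "Tech", "Tech", "Energy", "Energy", "Energy"],
    "EPS": [1.0, 3.0, np.nan, 10.0, np.nan, 20.0],
})
filled = impute_sector_mean(toy, "Sector", ["EPS"])
print(filled)
```

Coercing outliers before imputing keeps extreme values from distorting the sector means that are used as fill values.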
200+ financial indicators at RANDOM!
Feature Selection methods
• Univariate Feature Selection - Chi-squared
• Wrapper Select via model
• Mutual info Classification
• Stepwise Recursive Backwards Feature removal
• L2 Regularization
• Exhaustive search
• Genetic search
Modeling methods
• Decision tree Classifier
• Random Forest Classifier
• Gradient Descent Classifier
• Ada Boost Classifier
• Neural Network Classifier
• Logistic Regression
• Support Vector Machine
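Three of the selection methods listed above can be sketched with scikit-learn. The deck does not say which library was used, so the mapping is an assumption: `SelectKBest` with `mutual_info_classif` for mutual information, `RFE` for stepwise backward removal, and `SelectFromModel` for model-based selection. The synthetic data stands in for the financial indicators.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the indicator matrix (not the Kaggle data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

# Mutual information: keep the 10 highest-scoring features.
mi = SelectKBest(score_func=mutual_info_classif, k=10).fit(X, y)

# Recursive backward elimination down to 5 features.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)

# Model-based selection: keep features whose importance is at least the mean importance.
wrapper = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0)).fit(X, y)

X_mi, X_rfe, X_wrap = mi.transform(X), rfe.transform(X), wrapper.transform(X)
print(X_mi.shape[1], X_rfe.shape[1], X_wrap.shape[1])
```

The first two selectors return a fixed number of features, while the model-based one lets the importance threshold decide, which may explain why the "# Features selected" column varies so much between methods.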
Modeling – Decision Tree, Random Forest

Decision Tree
| Selection method | Accuracy | AUC | # Features selected |
|---|---|---|---|
| Low Variance filter < 20% | 0.64 (+/- 0.06) | 0.59 (+/- 0.03) | 10 |
| Stepwise Backward Removal | 0.62 (+/- 0.07) | 0.57 (+/- 0.03) | 5 |
| Mutual Info Classification | 0.63 (+/- 0.05) | 0.57 (+/- 0.04) | 10 |
| Wrapper Select | 0.65 (+/- 0.04) | 0.60 (+/- 0.04) | 68 |

Random Forest
| Selection method | Accuracy | AUC | # Features selected |
|---|---|---|---|
| Low Variance filter < 20% | 0.71 (+/- 0.06) | 0.77 (+/- 0.02) | 151 |
| Stepwise Backward Removal | 0.69 (+/- 0.06) | 0.70 (+/- 0.03) | 5 |
| Mutual Info Classification | 0.72 (+/- 0.06) | 0.77 (+/- 0.01) | 10 |
| Wrapper Select (Random Forest) | 0.72 (+/- 0.06) | 0.77 (+/- 0.01) | 62 |
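Scores of the form "mean (+/- spread)" are typically obtained from cross-validation. A minimal sketch, assuming scikit-learn and 5-fold cross-validation; the fold count, the use of one standard deviation as the spread, and the synthetic 30/70 class split are assumptions, since the deck does not state them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with roughly 30% class 0 and 70% class 1, like the target split.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           weights=[0.3, 0.7], random_state=0)

models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5, scoring=["accuracy", "roc_auc"])
    acc, auc = cv["test_accuracy"], cv["test_roc_auc"]
    results[name] = {"accuracy": (acc.mean(), acc.std()), "auc": (auc.mean(), auc.std())}
    print(f"{name}: accuracy {acc.mean():.2f} (+/- {acc.std():.2f}), "
          f"AUC {auc.mean():.2f} (+/- {auc.std():.2f})")
```

With a 70/30 class split, accuracy near 0.70 can be reached by always predicting the majority class, so AUC is the more informative of the two columns.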
Decision Boundaries – Random Forest vs Decision Tree
• The decision boundary technique helps to develop an intuition of how a model works.
• These boundaries separate the data points (companies) into regions signifying the different classes (1, 0).
[Figure panels: Actual 0 Class, Actual 1 Class]
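A decision boundary over two features can be computed by predicting the class on a dense grid of points. The sketch below assumes scikit-learn and NumPy; the two synthetic features stand in for the two indicators used in the plots, and `decision_regions` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Two synthetic features standing in for two financial indicators.
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def decision_regions(model, X, steps=100):
    """Predict the class at every point of a grid covering the data."""
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, steps),
                         np.linspace(y_min, y_max, steps))
    zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, zz

xx, yy, zz = decision_regions(model, X)
print(zz.shape)
```

The grids `xx`, `yy`, `zz` can be passed to a filled-contour plot (for example matplotlib's `contourf`) and overlaid with a scatter plot of the actual class 0 and class 1 companies.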
Modeling – Boosting Methods

Max Depth vs CV Performance (in percentages)
| Depth of a tree | 3 | 5 | 10 | 15 | 20 |
|---|---|---|---|---|---|
| Accuracy | 71 | 71 | 70 | 68 | 66 |
| AUC | 76 | 76 | 72 | 70 | 69 |

Gradient Boost
| Selection method | Accuracy | AUC | # Features selected |
|---|---|---|---|
| Low Variance filter < 20% | 0.72 (+/- 0.05) | 0.77 (+/- 0.03) | 151 |
| Stepwise Backwards Removal | 0.71 (+/- 0.05) | 0.75 (+/- 0.03) | 5 |
| Mutual Info Classification | 0.72 (+/- 0.02) | 0.73 (+/- 0.02) | 10 |
| Wrapper Select | 0.72 (+/- 0.04) | 0.78 (+/- 0.03) | 51 |

AdaBoost
| Selection method | Accuracy | AUC | # Features selected |
|---|---|---|---|
| Low Variance filter < 20% | 0.72 (+/- 0.05) | 0.76 (+/- 0.02) | 151 |
| Stepwise Backwards Removal | 0.71 (+/- 0.06) | 0.73 (+/- 0.04) | 5 |
| Mutual Info Classification | 0.72 (+/- 0.03) | 0.73 (+/- 0.02) | 10 |
| Wrapper Select | 0.72 (+/- 0.05) | 0.77 (+/- 0.02) | 51 |
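A max-depth sweep like the one in the chart could be run roughly as follows. This is a sketch assuming scikit-learn's `GradientBoostingClassifier`; the number of estimators, the 3-fold split and the synthetic data are assumptions chosen to keep the example fast, not the deck's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

depth_scores = {}
for depth in [3, 5, 10, 15, 20]:  # the depths shown on the chart's x-axis
    model = GradientBoostingClassifier(n_estimators=50, max_depth=depth, random_state=0)
    cv = cross_validate(model, X, y, cv=3, scoring=["accuracy", "roc_auc"])
    depth_scores[depth] = (cv["test_accuracy"].mean(), cv["test_roc_auc"].mean())

for depth, (acc, auc) in depth_scores.items():
    print(f"max_depth={depth}: accuracy {acc:.2f}, AUC {auc:.2f}")
```

Deeper trees fit the training folds more closely, so a drop in cross-validated scores as depth grows is a common sign of overfitting.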
Decision Boundaries – Gradient Boost vs AdaBoost
• Blue region classifies to “NOT Buy” class while orange classifies to “Buy” class.
[Figure panels: Actual 0 Class, Actual 1 Class]
Modeling – Neural Network

| Selection method | Accuracy | AUC | # Features selected |
|---|---|---|---|
| Low Variance filter < 20% | 0.68 (+/- 0.05) | 0.64 (+/- 0.05) | 151 |
| Stepwise Backwards Removal | 0.69 (+/- 0.01) | 0.52 (+/- 0.07) | 5 |
| Mutual Info Classification | 0.71 (+/- 0.02) | 0.69 (+/- 0.03) | 10 |
| Wrapper Select | 0.70 (+/- 0.03) | 0.59 (+/- 0.07) | 51 |
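The editor's notes describe the network as 2 hidden layers with 10 nodes each. A comparable model could be sketched with scikit-learn's `MLPClassifier`; the library choice, the scaling step, `max_iter` and the synthetic data are assumptions, since the deck does not name the framework or training settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

# Two hidden layers of 10 nodes each; inputs are standardized first,
# which neural networks are usually sensitive to.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=1000, random_state=0),
)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"AUC {scores.mean():.2f} (+/- {scores.std():.2f})")
```

Placing the scaler inside the pipeline means it is refit on each training fold, so no information from the validation fold leaks into the scaling.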
Modeling – SVM

| Selection method | Accuracy | AUC | # Features selected |
|---|---|---|---|
| Low Variance filter < 20% | 0.72 (+/- 0.02) | 0.72 (+/- 0.03) | 88 |
| Stepwise Backwards Removal | 0.69 (+/- 0.00) | 0.67 (+/- 0.03) | 5 |
| Mutual Info Classification | 0.70 (+/- 0.00) | 0.60 (+/- 0.03) | 5 |
| Wrapper Select | 0.71 (+/- 0.02) | 0.73 (+/- 0.03) | 62 |
Model comparison
Important Features
• EPS Diluted: A calculation used to gauge the quality of a company's earnings per share (EPS).
• Weighted Average Shares: If a company has been buying back shares, this number will be negative.
• Shareholders Equity: The amount of money that would be returned to shareholders if all of the assets were liquidated and all of the company's debts were paid off.
• Net Income per share: The portion of a company's profit that is allocated to each outstanding share of its common stock.
• Earnings Yield: Earnings per share divided by the share price.
• Total non-current assets: Long-term assets that have a useful life of more than one year.
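Two of these definitions are simple ratios and can be written out directly. The sketch below follows the wording on the slide; the function names and the example figures are made up for illustration.

```python
def net_income_per_share(net_income, shares_outstanding):
    """Profit allocated to each outstanding share of common stock."""
    return net_income / shares_outstanding

def earnings_yield(eps, share_price):
    """Earnings per share divided by the share price."""
    return eps / share_price

# Hypothetical company: 5,000,000 net income, 1,000,000 shares, share price 100.
eps = net_income_per_share(5_000_000, 1_000_000)
ey = earnings_yield(eps, 100.0)
print(eps, ey)
```

Earnings yield is the inverse of the price-to-earnings ratio, so a higher value means more earnings per unit of price paid.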
Future work
• Compare model results to 2020 financial year.
• Compute gains, losses and ROI for each stock.
• Plotting decision boundaries using 2 components.

Future stock performance presentation


Editor's Notes

  1. What financial indicators should one look into before buying a stock?
  2. The objective of this project is to find out whether a person without any prior financial experience will be able to invest in stocks that are trustworthy by applying machine learning algorithms. Moreover, is it possible to accomplish this solely by analyzing the financial indicators of a company?
  3. The word “stock” refers to a share of ownership in a particular company. If you own a stock, you are an owner of a very small fraction of that company.
  4. The dataset was obtained from Kaggle; the sample contains 4392 companies with 225 features. This information is commonly found in the 10-K filings, which are released yearly.
  5. In the sample, each company is part of a sector that places it in a macro-area. The sectors are diverse, such as Healthcare, Real Estate and Technology, and the companies include Amazon, Roku, Infosys and nexa.
  6. From a trading perspective, class 1 identifies those stocks that a trader should BUY at the start of the year and sell at the end of the year for a profit. Class 0 identifies those stocks that a trader should NOT BUY, since their value will decrease, meaning a loss.
  7. One of the biggest challenges was dealing with missing values and zero values. Features with more than 40% missing values or more than 65% zero values were omitted from the analysis, 25 features in total.
  8. There are 20 companies with a price variation above 300% due to mistypings; these have also been omitted. Some financial indicators show a huge discrepancy between the maximum value and the 75% quantile, so I have dropped the bottom 3% and the top 3% of the data. The remaining nulls have been replaced with the mean of their sector.
  9. 25 features were removed. I believe feature selection is going to play a crucial role, since this many indicators can be overwhelming to a novice investor. So part of the goal is to keep as few features as possible.
  10. I experimented with various algorithms and feature selection methods. I also tried different parameter combinations and compared the model performances.
  11. Random forest exhibited better performance. AUC is higher than accuracy for random forest, whereas the opposite holds for the decision tree.
  12. These scatter plots show the classification of the companies in feature space using two of the most important features, net income per share and earnings yield. The blue region classifies to the “NOT Buy” class while the orange region classifies to the “Buy” class. Random forest classified better than the decision tree.
  13. As you can see in the graph, I’ve limited the trees to a max depth of 3, since the performance seems to decrease as the max depth increases. The performance of the two boosting methods was comparable.
  14. Specific to the net income per share and earnings yield features.
  15. The neural network model consists of 2 hidden layers with 10 nodes each. Compared to the other models, the neural network had the lowest performance. It misclassified many of the companies.
  16. The same is the case with SVM; the performance of the two was comparable. However, the variation in scores is smaller for SVM than for the other models.
  17. These are the common features selected by the various feature selection methods. I believe these financial indicators will contribute towards the decision to buy a stock or not. From my analysis, financial indicators like net income per share, shareholders equity and total non-current assets should be kept in mind while buying a stock.
  18. I would like to compare my classification results to this year's data. I would also try to plot decision boundaries with reduced dimensionality using PCA or LDA.
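The dimensionality reduction mentioned in the last note could be sketched as follows, assuming scikit-learn's `PCA` with 2 components. The synthetic data, the scaling step and the choice of random forest as the classifier are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

# Standardize, then project all indicators onto the first 2 principal components.
reducer = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = reducer.fit_transform(X)

# Fit a classifier on the 2 components; its predictions over a grid of
# (component 1, component 2) values give a plottable decision boundary.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_2d, y)

explained = reducer[-1].explained_variance_ratio_.sum()
print(X_2d.shape, round(float(explained), 2))
```

The explained variance ratio indicates how much of the original spread the two components retain, which is worth checking before reading much into a 2-D boundary plot.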