13. Correlation Analysis
• Variables that were correlated with each other, and therefore redundant, were
removed.
• E.g.
• Payout ratio was correlated with revenue per share
• Gross profit was correlated with profit margin
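A minimal sketch of how such a correlation filter might look with pandas. The 0.9 threshold, the `drop_correlated` helper, and the toy column values are assumptions for illustration; the slides do not state the cutoff or the tooling that was used.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column from every pair whose absolute Pearson r exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is dropped if it is highly correlated with any earlier column.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Toy data (made up): profit_margin is roughly 2 * gross_profit.
toy = pd.DataFrame({
    "gross_profit": [1.0, 2.0, 3.0, 4.0, 5.0],
    "profit_margin": [2.1, 4.0, 6.2, 8.1, 9.9],
    "debt_ratio": [0.5, 0.1, 0.4, 0.2, 0.3],
})
reduced = drop_correlated(toy, threshold=0.9)
print(list(reduced.columns))
```

Of each correlated pair, this sketch keeps the column that appears first; which member of a pair to keep is a modelling choice the slides do not specify.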
14. Missing Values Imputation
• Step 1. A missing value for a given stock was replaced by the average of that
variable for that stock.
• Step 2. Remaining missing values were replaced by the average of that
variable for the stock's industry.
• Step 3. Values still missing were replaced by the average of that
variable for the stock's sector.
• Fallback order: stock-level average, then industry-level average, then sector-level average.
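The three-step fallback above can be sketched with pandas `groupby(...).transform("mean")`. The column names (`ticker`, `industry`, `sector`, `eps`) and the `impute_hierarchical` helper are hypothetical, and computing every average from the originally observed values only (rather than from already-imputed ones) is an assumption, since the slides do not say which was done.

```python
import numpy as np
import pandas as pd


def impute_hierarchical(df, value_cols, levels=("ticker", "industry", "sector")):
    """Fill NaNs with a group mean, trying each grouping level in order."""
    out = df.copy()
    for level in levels:
        for col in value_cols:
            # Means come from the original data, so imputed values never feed
            # into a later average. Groups with no observed value yield NaN.
            group_mean = df.groupby(level)[col].transform("mean")
            out[col] = out[col].fillna(group_mean)
    return out


# Toy data (made up) exercising all three fallback levels.
toy = pd.DataFrame({
    "ticker":   ["A", "A", "A", "B", "D", "C", "E"],
    "industry": ["Banks", "Banks", "Banks", "Banks", "Banks", "Insurance", "Brokers"],
    "sector":   ["Financials"] * 7,
    "eps":      [1.0, 3.0, np.nan, np.nan, 5.0, np.nan, 7.0],
})
filled = impute_hierarchical(toy, ["eps"])
print(filled["eps"].tolist())
```

In the toy data, stock A's gap is filled from A's own average, stock B (no observed values) falls back to the Banks average, and stock C (the only Insurance stock) falls back to the Financials average.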
19. Data Transformation
• Encoded categorical features to make them suitable for ML algorithms
• Converted the stock price into a three-level class label:
increase
decrease
remains same
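A small sketch of both transformations. The slides do not say which encoding was used or how "remains same" was defined, so one-hot encoding via `pd.get_dummies`, the `tolerance` parameter, and all column names here are assumptions.

```python
import pandas as pd


def price_direction(old_price, new_price, tolerance=0.0):
    """Map a relative price change to one of three class labels."""
    change = (new_price - old_price) / old_price
    if change > tolerance:
        return "increase"
    if change < -tolerance:
        return "decrease"
    return "remains same"


# Toy data (made up).
toy = pd.DataFrame({
    "sector": ["Tech", "Energy", "Tech"],
    "price_before": [100.0, 50.0, 20.0],
    "price_after": [110.0, 45.0, 20.0],
})

# Target: three-level label derived from the price movement.
toy["direction"] = [
    price_direction(before, after)
    for before, after in zip(toy["price_before"], toy["price_after"])
]

# Features: one-hot encode the categorical column.
encoded = pd.get_dummies(toy, columns=["sector"])
print(toy["direction"].tolist())
print(list(encoded.columns))
```

A non-zero `tolerance` (for example 0.01) would treat small moves in either direction as "remains same"; with the default of 0.0 only exactly unchanged prices get that label.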
22. Building Decision tree
• Simple decision tree
Accuracy on test data = 44.5%
10-fold CV accuracy = 44.8%
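A hedged sketch of how a held-out test accuracy and a 10-fold cross-validated accuracy could be obtained with scikit-learn. The synthetic three-class data, the 70/30 split, and the default tree settings are assumptions; the numbers it prints are unrelated to the results on the slide.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the stock features (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Accuracy on the held-out test split.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)
test_accuracy = tree.score(X_test, y_test)

# Mean accuracy over 10 cross-validation folds.
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
cv_accuracy = cv_scores.mean()

print(f"test accuracy: {test_accuracy:.3f}")
print(f"10-fold CV accuracy: {cv_accuracy:.3f}")
```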
23. Building Decision tree
• Boosting approach (AdaBoost)
Base estimator is a decision tree.
Number of estimators is 101.
Accuracy on test data = 50.9%
10-fold CV accuracy = 49.9%
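A sketch of an AdaBoost setup with a decision-tree base estimator and 101 estimators, using scikit-learn. The tree depth (`max_depth=1`), the synthetic data, and the split are assumptions; the slide only gives the base estimator type and the estimator count.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the stock features (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The base estimator is passed positionally because its keyword name
# differs between scikit-learn versions.
booster = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1, random_state=0),
    n_estimators=101,
    random_state=0,
)
booster.fit(X_train, y_train)
boost_test_accuracy = booster.score(X_test, y_test)

# 10-fold cross-validation on the full data set.
boost_cv_scores = cross_val_score(booster, X, y, cv=10)

print(f"test accuracy: {boost_test_accuracy:.3f}")
print(f"10-fold CV accuracy: {boost_cv_scores.mean():.3f}")
```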
24. Is accuracy good in the short or long term?
• Data was divided into four parts:
• Checked after a quarter: 4011 data points
• Checked after six months: 1349 data points
• Checked after a year: 1156 data points
• Checked after more than a year: 1155 data points
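One way such a four-way split could be produced is with `pd.cut` on the time elapsed before the price was checked. The `days_ahead` column and the bin edges (91, 182, 365 days) are assumptions; the slides do not give the exact boundaries.

```python
import numpy as np
import pandas as pd

# Hypothetical bin edges in days; intervals are closed on the right.
HORIZON_BINS = [0, 91, 182, 365, np.inf]
HORIZON_LABELS = ["quarter", "six months", "one year", "more than one year"]

# Toy data (made up): days between the fundamentals and the price check.
toy = pd.DataFrame({"days_ahead": [30, 90, 120, 200, 365, 400, 800]})
toy["horizon"] = pd.cut(toy["days_ahead"], bins=HORIZON_BINS, labels=HORIZON_LABELS)

# Number of data points falling into each part.
horizon_counts = {
    label: int((toy["horizon"] == label).sum()) for label in HORIZON_LABELS
}
print(horizon_counts)
```

Each part can then be selected with `toy[toy["horizon"] == label]` and modelled separately.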
25. Analysis – within a quarter
• Simple decision tree
• Accuracy on test data = 37.2%
• Bagging decision tree
• Accuracy on test data = 41.2%
10-fold CV accuracy = 44.6%
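A sketch of comparing a single decision tree with a bagged ensemble of trees in scikit-learn. The number of bagged trees (50), the synthetic data, and the split are assumptions, as the slides give no bagging settings; the printed numbers are unrelated to the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for one time-horizon subset (assumption).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, n_classes=3, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Baseline: one decision tree.
single_tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
single_accuracy = single_tree.score(X_test, y_test)

# Bagging: each tree is trained on a bootstrap sample and the votes are combined.
bagger = BaggingClassifier(
    DecisionTreeClassifier(random_state=1), n_estimators=50, random_state=1
)
bagger.fit(X_train, y_train)
bagged_accuracy = bagger.score(X_test, y_test)
bagged_cv_scores = cross_val_score(bagger, X, y, cv=10)

print(f"single tree test accuracy: {single_accuracy:.3f}")
print(f"bagged trees test accuracy: {bagged_accuracy:.3f}")
print(f"bagged trees 10-fold CV accuracy: {bagged_cv_scores.mean():.3f}")
```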
26. Analysis – after six months
• Simple decision tree
• Accuracy on test data = 52.9%
• Bagging decision tree
• Accuracy on test data = 62.9%
10-fold CV accuracy = 60.1%
27. Analysis – within one year
• Simple decision tree
• Accuracy on test data = 51.3%
• Bagging decision tree
• Accuracy on test data = 61.2%
10-fold CV accuracy = 59.9%
28. Analysis – after one year
• Simple decision tree
• Accuracy on test data = 64.9%
• Bagging decision tree
• Accuracy on test data = 72.7%
10-fold CV accuracy = 70.8%