1. Stock Market Prediction using
Deep Learning Models
Kondal Kolipaka
Liverpool John Moores University
Student number: 931219
2. Outline
• Introduction
• Problem Description
• Aim and objectives
• Literature Study
• Research Methodology
• Analysis
• Results and Discussions
• Conclusion and future work
3. Introduction
• Stock market prediction is the act of determining
the future value of a company stock or other
financial instrument traded on an exchange.
• Predicting stock market performance is a large
and potentially profitable area of study
• The successful prediction of a stock's future price
could yield significant profit
• BSE is the 7th largest stock exchange in the world,
with a market cap of US$2.8 trillion; its Sensex index
represents the 30 largest companies listed on the
exchange
4. Problem Description
• Stock market prediction is an interesting problem for researchers and academics,
and it divides them into two groups
• It is not possible to predict the stock market – Efficient Market
Hypothesis (EMH)
• There is scope to beat the stock market
• Deep Learning Models for prediction
• LSTM for stock market prediction
• Not many researchers have combined numerical and textual analysis for
prediction
• Hybrid LSTM model
5. Aim and objectives of the study
• Assist investors in making better decisions
• Identify gaps in past research
• Build a prediction model based on historical stock data and news data
• Model identification
• Model building
• Model performance analysis
6. Research Questions
• Can we combine numerical analysis of stock data with textual analysis to
predict the stock market?
• What is the best machine learning model for stock market prediction?
• How can business news be classified for public sentiment analysis?
• How do historical data and text data techniques help to generate better
stock market predictions?
7. Literature Review
• Numerical data – India and international markets
• Textual Data – News, Twitter feeds, blogs
• Linear Models – AR, MA, ARIMA, ARMA
• Deep Learning Models – RNN, MLP, CNN, LSTM
• Hybrid Models – ARIMA-BPNN, ARIMA-GRU, LSTM and ensemble EMD
Limitations
• Past studies focused on either historical stock data or news sentiment data
• Little research merges numerical and text analysis data to
predict the stock market
9. Analysis
• Numerical Data
o BSE Sensex historical data
downloaded from Yahoo
Finance
o 15 years of data (30-06-2005
to 29-06-2020)
o Daily prices for 3672 days
o Variables – Date, Open, High,
Low, Close, Adj Close and
Volume
o Main variable: Close
• Text Data
• News headlines published by
Times of India, Harvard
Dataverse
• 20 years of data (through
mid-2020)
• 3.3 million records
• Variables – publish_date,
headline_category,
headline_text
• Main variable: headline_text
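A minimal sketch of how the two datasets described above could be read, using only the Python standard library. The column names come from the slide; the sample rows are made up, and the date formats (`YYYY-MM-DD` for prices, `YYYYMMDD` for `publish_date`) and the `null` marker for missing prices are assumptions about the downloaded files, not something the slides state.

```python
import csv
import io
from datetime import datetime

# Made-up rows standing in for the real files; only the headers follow the slide.
PRICES_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2020-06-25,99.0,101.0,98.0,100.0,100.0,1000
2020-06-26,100.0,103.0,99.5,102.0,102.0,1200
2020-06-29,null,null,null,null,null,null
"""

HEADLINES_CSV = """publish_date,headline_category,headline_text
20200625,business,Markets rally on strong earnings
20200625,business,Banks lead gains
20200626,business,Oil prices slip
"""

def load_closes(text):
    """Return {date: close}, skipping rows whose Close is missing."""
    closes = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["Close"] in ("", "null"):  # assumed missing-value markers
            continue
        day = datetime.strptime(row["Date"], "%Y-%m-%d").date()
        closes[day] = float(row["Close"])
    return closes

def load_headlines(text):
    """Return {date: [headline_text, ...]} grouped by publication day."""
    by_day = {}
    for row in csv.DictReader(io.StringIO(text)):
        day = datetime.strptime(row["publish_date"], "%Y%m%d").date()
        by_day.setdefault(day, []).append(row["headline_text"])
    return by_day

closes = load_closes(PRICES_CSV)
headlines = load_headlines(HEADLINES_CSV)
```

Keying both structures by calendar date makes it straightforward to line up each trading day's Close with the headlines published that day.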
11. Analysis
• Data cleansing and pre-processing
• Numerical data: Dropping null values, missing data, Outlier
detection, feature selection
• Text Data: Dropping null values, feature selection, date range selection
• Exploratory data analysis
• Numerical Data Modeling – ARIMA & LSTM
[Figures: ARIMA model prediction; LSTM model prediction]
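The cleansing steps for the numerical data can be sketched as below. The slides do not say which outlier-detection method was used, so the interquartile-range (IQR) rule applied to daily returns is an assumption chosen for illustration, as are the helper names and the toy prices.

```python
import statistics

def drop_missing(rows):
    """Keep only (day, close) pairs whose close is present."""
    return [(day, close) for day, close in rows if close is not None]

def daily_returns(closes):
    """Fractional change from each close to the next."""
    return [closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))]

def iqr_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v < low or v > high]

# Toy series: one missing close (day 3) and one suspicious spike (day 8).
rows = [(1, 100.0), (2, 101.0), (3, None), (4, 102.0), (5, 101.0),
        (6, 103.0), (7, 104.0), (8, 150.0), (9, 105.0), (10, 106.0)]

clean = drop_missing(rows)
returns = daily_returns([close for _, close in clean])
suspect = iqr_outliers(returns)  # positions in `returns` that look abnormal
```

Flagging returns rather than raw price levels avoids marking a long-term upward trend as a run of outliers; flagged days would still need manual review before being dropped.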
12. Analysis (cont.)
LSTM Hybrid Model
Add the sentiment of the texts to the
original LSTM and see if there is an
improvement in the performance
• Date
• Close
• Headline_text => Sentiment Score
Model Parameters:
• 80:20 training and validation set
• Tanh activation function
• Adam optimizer
• Batch size 16
• Epochs 100
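A sketch of how the hybrid model's inputs could be prepared: each trading day's Close is paired with that day's sentiment score, the series is cut into fixed-length windows, and the samples are split 80:20 in time order. The window length, the default sentiment of 0.0 for days without news, and all function names are assumptions; the slides give only the split ratio and the training parameters.

```python
def join_features(closes, sentiment):
    """Pair each day's close with its sentiment score (0.0 if no news that day)."""
    return [(closes[day], sentiment.get(day, 0.0)) for day in sorted(closes)]

def make_windows(features, window):
    """X[i] is `window` consecutive (close, sentiment) rows; y[i] is the next close."""
    X, y = [], []
    for i in range(len(features) - window):
        X.append(features[i:i + window])
        y.append(features[i + window][0])
    return X, y

def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling so validation data is strictly later in time."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

# Toy data keyed by ISO date strings, which sort chronologically.
closes = {f"2020-06-{day:02d}": 100.0 + day for day in range(1, 13)}
sentiment = {"2020-06-03": 0.5, "2020-06-10": -0.25}

features = join_features(closes, sentiment)
X, y = make_windows(features, window=2)
(train_X, train_y), (val_X, val_y) = chronological_split(X, y)
```

The resulting windows have shape (samples, window, 2) and could be fed to an LSTM layer with tanh activation, trained with the Adam optimizer, batch size 16 and 100 epochs as listed above. In practice the Close values would also be scaled before training.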
Text Analysis
• Naïve Bayes Classifier
• SVM Classifier
• Random Forest
Classifier
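To illustrate the text-analysis step, here is a small multinomial Naive Bayes classifier with Laplace smoothing written from scratch, followed by one possible way to turn predicted labels into a daily sentiment score. The training headlines, the two labels, and the +1/-1 averaging are all assumptions made for this sketch; the slides do not describe how the sentiment score was derived.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """samples: list of (headline, label). Returns (log priors, word counts, vocabulary)."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    total = sum(class_docs.values())
    priors = {label: math.log(n / total) for label, n in class_docs.items()}
    return priors, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log posterior; unseen words are ignored."""
    priors, word_counts, vocab = model
    scores = {}
    for label, prior in priors.items():
        denom = sum(word_counts[label].values()) + len(vocab)  # Laplace smoothing
        score = prior
        for word in text.lower().split():
            if word in vocab:
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

def daily_sentiment(model, headlines):
    """Average of +1 (positive) and -1 (negative) over a day's headlines."""
    signs = [1 if predict(model, h) == "positive" else -1 for h in headlines]
    return sum(signs) / len(signs)

# Made-up labelled headlines for illustration only.
TRAIN = [
    ("markets rally on strong earnings", "positive"),
    ("sensex hits record high", "positive"),
    ("shares slump as profits fall", "negative"),
    ("rupee weakens amid sell-off", "negative"),
]
model = train_naive_bayes(TRAIN)
score = daily_sentiment(model, ["Sensex rally continues", "Profits slump"])
```

A library implementation of Naive Bayes, SVM or Random Forest over TF-IDF features would normally replace the hand-written classifier; the from-scratch version is used here only to keep the example self-contained.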
13. Results and Discussions
ARIMA model performance
Parameter   Result
MSE         14469805.031856986
MAE         2620.2431482654974
RMSE        3803.9196931398255
MAPE        0.07676215004310963
A MAPE of about 7.7% means the model is about 92.3% accurate
in predicting the stock price over the test set.

LSTM model performance
Parameter   Result
MSE         637816.3887958465
MAE         650.9328685484523
RMSE        798.6340769062177
MAPE        0.01779417716769563
A MAPE of about 1.8% means the model is about 98.2% accurate
in predicting the stock price over the test set.

Text analysis
Classification Model        Accuracy
Naive Bayes Classifier      0.751
SVM Classifier              0.888
Random Forest Classifier    0.842

Hybrid model performance
Parameter   Result
MSE         243371.66329966017
MAE         317.4715822069669
RMSE        493.32713618820947
MAPE        0.009039365197613879
A MAPE of about 0.9% means the model is about 99.1% accurate in predicting
the stock price over the test set, a clear improvement over the
individual LSTM model and the ARIMA model.

Comparison of model performance
Model         MSE            MAE        RMSE       MAPE
ARIMA         14469805.032   2620.243   3803.920   0.0768
LSTM          637816.389     650.933    798.634    0.0178
Hybrid LSTM   243371.663     317.472    493.327    0.0090
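For reference, the four error measures reported above can be computed as in the sketch below. The function names and the toy numbers are illustrative; MAPE is returned as a fraction, matching the slides (0.009 corresponds to 0.9%).

```python
import math

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: the square root of MSE."""
    return math.sqrt(mse(actual, predicted))

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be non-zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Toy values, not taken from the study.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

errors = {
    "MSE": mse(actual, predicted),
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```

Because RMSE is the square root of MSE, the reported figures can be cross-checked: the square root of the hybrid model's MSE of 243371.663 is about 493.327, which agrees with its reported RMSE.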
14. Contributions and future work
• Novel approach to stock market prediction that combines numerical and text analysis data
• The LSTM Hybrid model augments the numerical analysis with text analysis by combining the
sentiment score with the closing price, and achieves a MAPE of 0.0090 and an RMSE of 493.33.
This shows that combining sentiment analysis data with historical data gives better results
than the individual LSTM model.
• Results for the numerical analysis could be improved with more sophisticated approaches such as
the deep learning framework of Bao, Yue and Rao (2017), which combines wavelet transforms (WT),
stacked autoencoders (SAEs) and long short-term memory (LSTM) to forecast stock prices