Guided By:
Prof. Arun Kumar Agrawal
IIT (BHU) Varanasi
Submitted by:
Surbhi Agrawal (11100EN004)
Harshit Agarwal (11100EN006)
Saket Tomer (11100EN007)
Saurabh Ojha (11100EN011)
Gaurav Kumar (11100EN030)
 Definitions
 Objectives
 Back Propagation Neural Network (BPNN)
 Genetically Optimized Neural Network (GONN)
 Hadoop
 Sentiment Analysis
 Stock Market:
Stock markets are a great source of income for millions
of people around the globe. Many lives get affected,
directly or indirectly, by the daily ups and downs of the
stock market.
 Stock Market Variations:
Stock prices change every day as a result of market
forces. By this we mean that share prices change because
of supply and demand.
 Artificial Intelligence and Soft Computing
techniques:
These techniques can predict an overall trend based on
past data sets and support the trader in decision
making.
 Prediction of stock prices using artificial intelligence.
 Deployment of a Hadoop cluster, as the data used is
very large (Big Data).
 Use of sentiment analysis, as the market mood toward
a company changes with the news.
 A neural network can be defined as a model of
reasoning based on the human brain.
The learning algorithm has two phases:
 First: The network propagates the input pattern from
layer to layer until the output pattern is generated by
the output layer.
 Second: If this pattern is different from the desired
output, an error is calculated and then propagated
backwards through the network from the output layer
to the input layer. The weights are modified as the
error is propagated.
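The two phases above can be sketched for a single linear neuron. This is a minimal illustration, not the project's MATLAB code; the toy data, target, and learning rate are assumptions:

```python
# Minimal sketch of the two-phase learning rule for one linear neuron.
# Phase 1: the forward pass produces the output; Phase 2: the error is
# propagated back and the weights are modified. Toy data and the
# learning rate are illustrative assumptions.

def forward(weights, inputs):
    # Weighted sum of inputs (the neuron's activation).
    return sum(w * x for w, x in zip(weights, inputs))

def backward(weights, inputs, target, lr=0.1):
    # Error between desired and actual output drives the weight update.
    error = target - forward(weights, inputs)
    return [w + lr * error * x for w, x in zip(weights, inputs)]

weights = [0.0, 0.0]
for _ in range(100):                      # repeat both phases
    weights = backward(weights, [1.0, 2.0], target=3.0)

print(round(forward(weights, [1.0, 2.0]), 3))  # converges toward 3.0
```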
[Figure: three-layer feedforward network. The input layer (nodes 1..n) receives signals x1..xn; the hidden layer (nodes 1..m) connects to it via weights wij; the output layer (nodes 1..l) produces y1..yl via weights wjk. Input signals flow forward; error signals propagate backward.]
Approach and Formulation:
Going toward the mathematical formulation
1. Input matrix: X
2. Weight Matrices: Theta1 and Theta2
3. Predicted Output Matrix: p or q
4. Actual Result Matrix: y
5. Error matrix: e
Notations:
 aij = Output of a node j in layer i
 xij = Value of feature j in example i
 uij = Value of weight for link between node j of input
layer to node i of hidden layer
 vi = Value of weight for link between node i of hidden
layer to the only node of the output layer
 The neuron operates on its inputs and
produces an output a as follows:
a = Σ_{i=1..n} wi * xi
 To express the activation aj of neuron j, the
formula is modified as follows:
aj = Σ_{i=1..n} wij * xi
 Activations of a node in input layer:
a11=x11 (closing price of previous day)
a12=x12 (opening price of current day)
a13=x13 (intraday high)
a14=x14 (intraday low)
 Activations of a node in hidden layer:
a21 = a11*u11 + a12*u12 + a13*u13 + a14*u14
a22 = a11*u21 + a12*u22 + a13*u23 + a14*u24
a23 = a11*u31 + a12*u32 + a13*u33 + a14*u34
a24 = a11*u41 + a12*u42 + a13*u43 + a14*u44
 Activation of output node:
a3= a21*v1 + a22*v2 + a23*v3 + a24*v4
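The forward pass through this 4-4-1 network can be computed directly. The input values and weights below are illustrative assumptions, not trained values:

```python
# Forward pass through the 4-4-1 network described above, in pure
# Python. Input values and weights are illustrative assumptions.

x = [100.0, 101.0, 103.0, 99.0]   # a11..a14: prev close, open, high, low

# u[i][j]: weight from input node j to hidden node i (4x4)
u = [[0.25, 0.25, 0.25, 0.25] for _ in range(4)]
v = [0.25, 0.25, 0.25, 0.25]      # hidden-to-output weights v1..v4

# Hidden activations: a2i = sum over j of a1j * u[i][j]
hidden = [sum(x[j] * u[i][j] for j in range(4)) for i in range(4)]

# Output activation: a3 = sum over i of a2i * v[i]
a3 = sum(hidden[i] * v[i] for i in range(4))
print(a3)
```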
 Cost Function:
J = Σ_{i=1..m} (pi − yi)^2
 We have used the in-built function in MATLAB
‘fminunc’ to minimize J.
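The minimization of J can be sketched with plain gradient descent as a stand-in for MATLAB's fminunc (the data and step size below are illustrative assumptions):

```python
# Sketch of minimizing the cost J = sum((p_i - y_i)^2) by gradient
# descent, as a stand-in for MATLAB's fminunc. The toy data (one
# feature per example) and learning rate are assumptions.

xs = [1.0, 2.0, 3.0]      # feature values
ys = [2.0, 4.0, 6.0]      # actual results y

def cost(w):
    # J for a single-weight linear predictor p_i = w * x_i
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys))

w = 0.0
for _ in range(200):
    # dJ/dw = sum(2 * (p_i - y_i) * x_i)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
    w -= 0.01 * grad

print(round(w, 3), round(cost(w), 6))  # w converges to 2.0, J to 0
```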
 To eliminate some of the disadvantages of neural
network, and take up the advantages of an
evolutionary approach, we can come up with a model,
which combines both of them, to give the best of both
the approaches.
 This model is inspired by Koza's model for
optimizing the structure of a neural network.
Koza used the following function set F and terminal set T for a GP
tree:
F = {P,W, +, -, *, %}
T = {feature variables of dataset, R}
where,
P is linear threshold processing function
W is the weighting function for a signal going into P
+, -, * and % are arithmetic functions (% is Koza's protected division)
D0, D1, .. Dn are the feature variables (input data signal) of dataset.
R contains randomly generated floating point constants.
 The rules of construction for a neural network with one
output signal, which are laid down by Koza are as
follows:
 The root of the tree must be a processing function P.
 The only thing allowed at the level immediately below
any processing function P is a weighting function W.
 The only thing allowed below a weighting function W
on the left child is either a floating-point random
constant or an arithmetic function.
 The only thing allowed below a weighting function W
on the right child is either an input data signal (such as
D0 and D1) or the output of a P function.
 The only thing allowed below an arithmetic function is
either a floating-point random constant or an
arithmetic function (+,-, *,%).
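Koza's construction rules can be checked mechanically. The sketch below uses a (label, children) tuple representation for GP-tree nodes, which is an assumption for illustration, not Koza's original implementation:

```python
# Validity check for Koza's GP-tree construction rules. Nodes are
# (label, children) tuples; this representation is an assumption.

ARITH = {"+", "-", "*", "%"}

def is_const(n):  # floating-point random constant
    return isinstance(n[0], float)

def is_data(n):   # input data signal such as D0, D1
    return isinstance(n[0], str) and n[0].startswith("D")

def valid(node, root=True):
    label, children = node
    if root and label != "P":                 # root must be P
        return False
    if label == "P":                          # below P: only W
        return all(c[0] == "W" and valid(c, False) for c in children)
    if label == "W":                          # W has a left and right child
        left, right = children
        ok_left = is_const(left) or left[0] in ARITH
        ok_right = is_data(right) or right[0] == "P"
        return ok_left and ok_right and valid(left, False) and valid(right, False)
    if label in ARITH:                        # below arith: const or arith
        return all((is_const(c) or c[0] in ARITH) and valid(c, False)
                   for c in children)
    return True                               # constants and data signals

tree = ("P", [("W", [(0.5, []), ("D0", [])]),
              ("W", [(1.2, []), ("D1", [])])])
print(valid(tree))  # True: obeys all the rules above
```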
 We have used the Hadoop framework in our code to
make sure that our algorithm is able to handle ‘Big
Data’.
 MapReduce is the programming model for processing
data within HDFS, which we have implemented using
Java.
A MapReduce program consists of the following three parts:
 Driver - Responsible for building the configuration of
the job and submitting it to the Hadoop Cluster.
 Mapper - Reads the input files as <Key,Value> pairs and
emits key value pairs.
 Reducer - Reads the outputs generated by the different
mappers as <Key,Value> pairs and emits key value
pairs.
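The Mapper/Reducer &lt;Key,Value&gt; flow above can be illustrated with a simple word count. The project itself implements MapReduce in Java; this Python sketch only shows the data flow, not the actual stock-prediction job:

```python
# Illustration of the Mapper/Reducer <Key,Value> flow with a word
# count. This stands in for the project's Java implementation and
# shows only the data flow between the stages.

from collections import defaultdict

def mapper(line):
    # Emit a <word, 1> pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def reducer(pairs):
    # Sum the values emitted for each key by the mappers.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

pairs = []
for line in ["up down up", "down up"]:
    pairs.extend(mapper(line))
print(reducer(pairs))  # {'up': 3, 'down': 2}
```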
1. Hadoop Installation - Download Hadoop and extract
the package to any location:
sudo tar xzf hadoop-1.0.3.tar.gz
2. Update $HOME/.bashrc by adding the following line to
it :
export JAVA_HOME=/usr/lib/jvm/java-6-sun
3. Configure core-site.xml, mapred-site.xml, hdfs-site.xml.
4. Format the HDFS file system via the NameNode with the
following command:
/usr/local/hadoop/bin/hadoop namenode -format
5. Start the single-node Hadoop cluster with the
following command:
/usr/local/hadoop/bin/start-all.sh
6. Check all the running Hadoop processes:
/usr/local/hadoop$ jps
7. To stop the single-node Hadoop cluster, run:
/usr/local/hadoop/bin/stop-all.sh
 Uploading the jar file of our application to the
master instance.
 Uploading the input files (past stock market data) to the
Google Cloud Storage bucket associated with the project.
 Running the jar file through the driver class of
MapReduce, passing the required arguments
[input and output paths].
 Saving the output files and noting down algorithm-
execution information, such as time taken.
 Sentiment analysis refers to the use of natural
language processing, text analysis and computational
linguistics to identify and extract subjective
information in source materials.
 If the news is positive, then there are good chances
that the share prices of that company will go up.
 Similarly, if the news article has an overall negative
sentiment, the share prices of the company or industry
can go down.
Begin
  Create hashmap of positive words as P
  Create hashmap of negative words as N
  Declare and initialize an integer variable Ans = 0 to store the output value
  For each word in the text do
    If word belongs to P then
      Ans = Ans + 1
    Else if word belongs to N then
      Ans = Ans - 1
    End if
  End for
  If Ans > 0 then
    Print "Positive news, share prices likely to go up"
  Else if Ans < 0 then
    Print "Negative news, share prices likely to go down"
  Else
    Print "Neutral news, no predictable effect on share price"
  End if
End
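The pseudocode above translates directly into Python. The word lists here are tiny illustrative stand-ins; a real system would load full positive/negative lexicons:

```python
# Direct Python rendering of the sentiment pseudocode above. The word
# sets are tiny illustrative stand-ins for real sentiment lexicons.

P = {"profit", "growth", "gain"}      # positive words
N = {"loss", "decline", "fraud"}      # negative words

def sentiment(text):
    ans = 0
    for word in text.lower().split():
        if word in P:
            ans += 1                  # positive word found
        elif word in N:
            ans -= 1                  # negative word found
    if ans > 0:
        return "Positive news, share prices likely to go up"
    if ans < 0:
        return "Negative news, share prices likely to go down"
    return "Neutral news, no predictable effect on share price"

print(sentiment("Record profit and strong growth despite one loss"))
```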
An intelligent scalable stock market prediction system
