Guided By:
Prof. Arun Kumar Agrawal
IIT (BHU) Varanasi
Submitted by:
Surbhi Agrawal (11100EN004)
Harshit Agarwal (11100EN006)
Saket Tomer (11100EN007)
Saurabh Ojha (11100EN011)
Gaurav Kumar (11100EN030)
 Definitions
 Objectives
 Back Propagation Neural Network (BPNN)
 Genetically Optimized Neural Network (GONN)
 Hadoop
 Sentiment Analysis
 Stock Market:
Stock markets are a great source of income for millions
of people around the globe. Many lives get affected,
directly or indirectly, by the daily ups and downs of the
stock market.
 Stock Market Variations:
Stock prices change every day as a result of market
forces. By this we mean that share prices change because
of supply and demand.
 Artificial Intelligence and Soft Computing
techniques:
These techniques can predict an overall trend based on
past data sets and support the trader in decision
making.
 Prediction of stock prices using artificial intelligence.
 Deployment of a Hadoop cluster, as the data used is
very large (Big Data).
 Use of sentiment analysis, as the market mood toward
a company changes with the news.
 A neural network can be defined as a model of
reasoning based on the human brain.
The learning algorithm has two phases:
 First: The network propagates the input pattern from
layer to layer until the output pattern is generated by
the output layer.
 Second: If this pattern is different from the desired
output, an error is calculated and then propagated
backwards through the network from the output layer
to the input layer. The weights are modified as the
error is propagated.
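The two phases above can be sketched for a single linear neuron. This is a minimal illustration, not the project's MATLAB code; the toy data, target, and learning rate are assumptions:

```python
# Minimal sketch of the two-phase learning rule for one linear neuron.
# Phase 1: the forward pass produces the output; Phase 2: the error is
# propagated back and the weights are modified. Toy data and the
# learning rate are illustrative assumptions.

def forward(weights, inputs):
    # Weighted sum of inputs (the neuron's activation).
    return sum(w * x for w, x in zip(weights, inputs))

def backward(weights, inputs, target, lr=0.1):
    # Error between desired and actual output drives the weight update.
    error = target - forward(weights, inputs)
    return [w + lr * error * x for w, x in zip(weights, inputs)]

weights = [0.0, 0.0]
for _ in range(100):                      # repeat both phases
    weights = backward(weights, [1.0, 2.0], target=3.0)

print(round(forward(weights, [1.0, 2.0]), 3))  # converges toward 3.0
```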
[Figure: three-layer feedforward network. The input layer (nodes 1..n) receives signals x1..xn; the hidden layer (nodes 1..m) connects to it via weights wij; the output layer (nodes 1..l) produces y1..yl via weights wjk. Input signals flow forward; error signals propagate backward.]
Approach and Formulation:
Going toward the mathematical formulation
1. Input matrix: X
2. Weight Matrices: Theta1 and Theta2
3. Predicted Output Matrix: p or q
4. Actual Result Matrix: y
5. Error matrix: e
Notations:
 aij = Output of a node j in layer i
 xij = Value of feature j in example i
 uij = Value of weight for link between node j of input
layer to node i of hidden layer
 vi = Value of weight for link between node i of hidden
layer to the only node of the output layer
 The neuron operates on its inputs and
produces an output a as follows:
a = Σ_{i=1..n} wi * xi
 To express the activation aj of neuron j, the
formula is modified as follows:
aj = Σ_{i=1..n} wij * xi
 Activations of a node in input layer:
a11=x11 (closing price of previous day)
a12=x12 (opening price of current day)
a13=x13 (intraday high)
a14=x14 (intraday low)
 Activations of a node in hidden layer:
a21 = a11*u11 + a12*u12 + a13*u13 + a14*u14
a22 = a11*u21 + a12*u22 + a13*u23 + a14*u24
a23 = a11*u31 + a12*u32 + a13*u33 + a14*u34
a24 = a11*u41 + a12*u42 + a13*u43 + a14*u44
 Activation of output node:
a3= a21*v1 + a22*v2 + a23*v3 + a24*v4
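The forward pass through this 4-4-1 network can be computed directly. The input values and weights below are illustrative assumptions, not trained values:

```python
# Forward pass through the 4-4-1 network described above, in pure
# Python. Input values and weights are illustrative assumptions.

x = [100.0, 101.0, 103.0, 99.0]   # a11..a14: prev close, open, high, low

# u[i][j]: weight from input node j to hidden node i (4x4)
u = [[0.25, 0.25, 0.25, 0.25] for _ in range(4)]
v = [0.25, 0.25, 0.25, 0.25]      # hidden-to-output weights v1..v4

# Hidden activations: a2i = sum over j of a1j * u[i][j]
hidden = [sum(x[j] * u[i][j] for j in range(4)) for i in range(4)]

# Output activation: a3 = sum over i of a2i * v[i]
a3 = sum(hidden[i] * v[i] for i in range(4))
print(a3)
```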
 Cost Function:
J = Σ_{i=1..m} (pi − yi)^2
 We have used the in-built function in MATLAB
‘fminunc’ to minimize J.
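The minimization of J can be sketched with plain gradient descent as a stand-in for MATLAB's fminunc (the data and step size below are illustrative assumptions):

```python
# Sketch of minimizing the cost J = sum((p_i - y_i)^2) by gradient
# descent, as a stand-in for MATLAB's fminunc. The toy data (one
# feature per example) and learning rate are assumptions.

xs = [1.0, 2.0, 3.0]      # feature values
ys = [2.0, 4.0, 6.0]      # actual results y

def cost(w):
    # J for a single-weight linear predictor p_i = w * x_i
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys))

w = 0.0
for _ in range(200):
    # dJ/dw = sum(2 * (p_i - y_i) * x_i)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
    w -= 0.01 * grad

print(round(w, 3), round(cost(w), 6))  # w converges to 2.0, J to 0
```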
 To eliminate some of the disadvantages of neural
network, and take up the advantages of an
evolutionary approach, we can come up with a model,
which combines both of them, to give the best of both
the approaches.
 This model is inspired by Koza's model for
optimizing the structure of a neural network.
Koza used the following function set F and terminal set T for a GP
tree:
F = {P,W, +, -, *, %}
T = {feature variables of dataset, R}
where,
P is linear threshold processing function
W is the weighting function for a signal going into P
+, -, * and % are arithmetic functions (% is Koza's protected division)
D0, D1, .. Dn are the feature variables (input data signal) of dataset.
R contains randomly generated floating point constants.
 The rules of construction for a neural network with one
output signal, which are laid down by Koza are as
follows:
 The root of the tree must be a processing function P.
 The only thing allowed at the level immediately below
any processing function P is a weighting function W.
 The only thing allowed below a weighting function W
on the left child is either a floating-point random
constant or an arithmetic function.
 The only thing allowed below a weighting function W
on the right child is either an input data signal (such as
D0 and D1) or the output of a P function.
 The only thing allowed below an arithmetic function is
either a floating-point random constant or an
arithmetic function (+,-, *,%).
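Koza's construction rules can be checked mechanically. The sketch below uses a (label, children) tuple representation for GP-tree nodes, which is an assumption for illustration, not Koza's original implementation:

```python
# Validity check for Koza's GP-tree construction rules. Nodes are
# (label, children) tuples; this representation is an assumption.

ARITH = {"+", "-", "*", "%"}

def is_const(n):  # floating-point random constant
    return isinstance(n[0], float)

def is_data(n):   # input data signal such as D0, D1
    return isinstance(n[0], str) and n[0].startswith("D")

def valid(node, root=True):
    label, children = node
    if root and label != "P":                 # root must be P
        return False
    if label == "P":                          # below P: only W
        return all(c[0] == "W" and valid(c, False) for c in children)
    if label == "W":                          # W has a left and right child
        left, right = children
        ok_left = is_const(left) or left[0] in ARITH
        ok_right = is_data(right) or right[0] == "P"
        return ok_left and ok_right and valid(left, False) and valid(right, False)
    if label in ARITH:                        # below arith: const or arith
        return all((is_const(c) or c[0] in ARITH) and valid(c, False)
                   for c in children)
    return True                               # constants and data signals

tree = ("P", [("W", [(0.5, []), ("D0", [])]),
              ("W", [(1.2, []), ("D1", [])])])
print(valid(tree))  # True: obeys all the rules above
```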
 We have used the Hadoop framework in our code to
make sure that our algorithm is able to handle ‘Big
Data’.
 MapReduce is the programming model for processing
data within HDFS, which we have implemented using
Java.
A MapReduce program consists of the following three parts:
 Driver - Responsible for building the configuration of
the job and submitting it to the Hadoop Cluster.
 Mapper - Reads the input files as <Key,Value> pairs and
emits key value pairs.
 Reducer - Reads the outputs generated by the different
mappers as <Key,Value> pairs and emits key value
pairs.
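The Mapper/Reducer &lt;Key,Value&gt; flow above can be illustrated with a simple word count. The project itself implements MapReduce in Java; this Python sketch only shows the data flow, not the actual stock-prediction job:

```python
# Illustration of the Mapper/Reducer <Key,Value> flow with a word
# count. This stands in for the project's Java implementation and
# shows only the data flow between the stages.

from collections import defaultdict

def mapper(line):
    # Emit a <word, 1> pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def reducer(pairs):
    # Sum the values emitted for each key by the mappers.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

pairs = []
for line in ["up down up", "down up"]:
    pairs.extend(mapper(line))
print(reducer(pairs))  # {'up': 3, 'down': 2}
```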
1. Hadoop Installation - Download Hadoop and extract
the package to any location:
sudo tar xzf hadoop-1.0.3.tar.gz
2. Update $HOME/.bashrc by adding the following line to
it :
export JAVA_HOME=/usr/lib/jvm/java-6-sun
3. Configure core-site.xml, mapred-site.xml, hdfs-site.xml.
4. Format the HDFS file system via the NameNode with the
following command:
/usr/local/hadoop/bin/hadoop namenode -format
5. Start the single-node Hadoop cluster with the
following command:
/usr/local/hadoop/bin/start-all.sh
6. Check all the running Hadoop processes:
/usr/local/hadoop$ jps
7. To stop the single-node Hadoop cluster, run:
/usr/local/hadoop/bin/stop-all.sh
 Uploading the jar file of our application to the
master instance.
 Uploading the input files (past stock market data) to the
Google Cloud Storage bucket associated with the project.
 Running the jar file through the driver class of
MapReduce, passing the required arguments
[input and output paths].
 Saving the output files and noting down algorithm-
execution information, such as time taken.
 Sentiment analysis refers to the use of natural
language processing, text analysis and computational
linguistics to identify and extract subjective
information in source materials.
 If the news is positive, then there are good chances
that the share prices of that company will go up.
 Similarly, if the news article has an overall negative
sentiment, the share prices of the company or industry
can go down.
Begin
  Create hashmap of positive words as P
  Create hashmap of negative words as N
  Declare and initialize an integer variable Ans = 0 to store the output value
  For each word in the text do
    If word belongs to P then
      Ans = Ans + 1
    Else if word belongs to N then
      Ans = Ans - 1
    End if
  End for
  If Ans > 0 then
    Print "Positive news, share prices likely to go up"
  Else if Ans < 0 then
    Print "Negative news, share prices likely to go down"
  Else
    Print "Neutral news, no predictable effect on share price"
  End if
End
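The pseudocode above translates directly into Python. The word lists here are tiny illustrative stand-ins; a real system would load full positive/negative lexicons:

```python
# Direct Python rendering of the sentiment pseudocode above. The word
# sets are tiny illustrative stand-ins for real sentiment lexicons.

P = {"profit", "growth", "gain"}      # positive words
N = {"loss", "decline", "fraud"}      # negative words

def sentiment(text):
    ans = 0
    for word in text.lower().split():
        if word in P:
            ans += 1                  # positive word found
        elif word in N:
            ans -= 1                  # negative word found
    if ans > 0:
        return "Positive news, share prices likely to go up"
    if ans < 0:
        return "Negative news, share prices likely to go down"
    return "Neutral news, no predictable effect on share price"

print(sentiment("Record profit and strong growth despite one loss"))
```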
An intelligent scalable stock market prediction system
