
STOCK MARKET PREDICTION
Enroll No. 9910103561
Name of Student Aditya Datta
Name of Supervisor Mr. Bansidhar Joshi
MAY 2014
Submitted in partial fulfillment of the Degree of
Bachelor of Technology
In
Computer Science Engineering
DEPARTMENT OF COMPUTER SCIENCE ENGINEERING &
INFORMATION TECHNOLOGY
JAYPEE INSTITUTE OF INFORMATION TECHNOLOGY, NOIDA

(I)
TABLE OF CONTENTS
Chapter No. Topics
Student Declaration II
Certificate from the Supervisor III
Acknowledgement IV
Summary V
List of Figures VI
List of Tables VII
List of Symbols VIII
List of Acronyms IX
Chapter-1 Introduction
1.1 General Introduction
1.2 Problem Statement
1.3 Empirical Study
1.4 Approach to problem in terms of technology
1.5 Support for novelty/significance of problem
Chapter-2 Literature Survey
2.1 Summary of papers
2.2 Integrated summary of the literature studied
Chapter-3 Analysis, Design and Modeling
3.1 Overall Description of the project
3.2 Functional Requirements
3.3 Non-Functional Requirements
3.4 Design Diagrams
3.4.1 Use Case diagrams
3.4.2 Data Flow Diagram
Chapter-4 Implementation details and issues
4.1 Implementation Details and Issues
4.1.1 Implementation Issues
4.1.2 Algorithms
4.2 Risk Analysis and Mitigation plan
Chapter-5 Testing
5.1 Testing Plan
5.2 Component decomposition & type of testing required
5.3 Test cases
5.4 Error and Exception Handling
5.5 Limitation of the solution
Chapter-6 Findings & Conclusion
6.1 Findings
6.2 Conclusion
6.3 Future Work
References

(II)
DECLARATION
I hereby declare that this submission is my own work and that, to the best of my knowledge
and belief, it contains no material previously published or written by another person, nor material
which has been accepted for the award of any other degree or diploma of the university or other
institute of higher learning, except where due acknowledgment has been made in the text.
Place: Signature:
Date: Name: Aditya Datta
Enrollment No: 9910103561

(III)
CERTIFICATE
This is to certify that the work titled “Stock Market Prediction” submitted by Aditya Datta in
partial fulfillment for the award of the degree of Bachelor of Technology of Jaypee Institute of
Information Technology University, Noida has been carried out under my supervision. This work
has not been submitted partially or wholly to any other University or Institute for the award of
this or any other degree or diploma.
Signature of Supervisor ……………………..
Name of Supervisor ……………………..
Designation ……………………..
Date ……………………..

(IV)
ACKNOWLEDGEMENT
I take this opportunity to acknowledge all the people who have helped me wholeheartedly at
every stage of the project.
I would like to express my special gratitude to my respected supervisor, Mr. Bansidhar
Joshi, who gave me the opportunity to do this project on the topic “Stock Market
Prediction”. The guidance and support received from him were vital for the success of the project.
I also extend my sincere thanks to all other faculty members of the Computer Science Engineering
Department who helped me in the project.
Signature of the Student ……………………..
Name of Student ……………………..
Enrollment Number ……………………..
Date ……………………..

(V)
SUMMARY
Stock market prediction is a classic problem which has been analyzed extensively using tools
and techniques of Machine Learning. The properties which make this modeling non-trivial
are the time dependence, volatility and other similar complex dependencies of the problem. To
incorporate these, Hidden Markov Models (HMMs) have recently been applied to forecast and
predict the stock market. We present the Maximum a Posteriori HMM approach for forecasting
stock values for the next day given historical data. In our approach, we consider the fractional
change in stock value and the intra-day high and low values of the stock to train the continuous
HMM.
This HMM is then used to make a Maximum a Posteriori decision over all the possible stock
values for the next day. We test our approach on several stocks, and compare its performance with
some of the existing methods using HMMs and Artificial Neural Networks, using the Mean Absolute
Percentage Error (MAPE).
__________________ __________________
Signature of Student Signature of Supervisor
Name Name
Date Date

(VI)
LIST OF FIGURES
FIGURE NUMBER NAME
Figure 1 USE CASE DIAGRAM
Figure 2 DFD DIAGRAM
Figure 3 DFD LEVEL 1 DIAGRAM
Figure 4 ARCHITECTURE
Figure 6 PROCESS OVERVIEW
Figure 7 RISK ANALYSIS AND MITIGATION DIAGRAM
Figure 8 CORRELATION BETWEEN ACTUAL AND PREDICTED VALUE FOR DELL
Figure 9 CORRELATION BETWEEN ACTUAL AND PREDICTED VALUE FOR GOOGLE

(VII)
LIST OF TABLES
TABLE NUMBER NAME
Table 1 Risk Analysis & Mitigation plan
Table 2 Test Plan
Table 3 Test Activity
Table 4 Software & Hardware Items
Table 5 Component Testing
Table 6 Test Cases

(IX)
LIST OF ACRONYMS
S.NO. ABBREVIATION EXPANSION
1 POSM Prediction of Stock Market
2 HMM Hidden Markov Model
3 CSV Comma Separated Values
4 LRA Left-Right Algorithm
5 BWA Baum-Welch Algorithm
6 ARIMA Autoregressive Integrated Moving Average

CHAPTER 1
INTRODUCTION
1.1. GENERAL
The stock market is a network which provides a platform for almost all major economic
transactions in the world at a dynamic rate, called the stock value, which is based on market
equilibrium. Predicting this stock value offers enormous arbitrage profit opportunities, which are
a huge motivation for research in this area. Knowing a stock value beforehand, even by a
fraction of a second, can result in high profits. Similarly, a probabilistically correct prediction can
be extremely profitable in the amortized case. This attractiveness has prompted researchers in
both industry and academia to find a way past problems such as volatility, seasonality and
dependence on time, on economies and on the rest of the market.
Previously, techniques of Artificial Intelligence and Machine Learning, such as Artificial Neural
Networks, Fuzzy Logic and Support Vector Machines, have been used to address these problems.
Recently, the Hidden Markov Model (HMM) approach was applied to predicting the pattern. The
reason for using this approach is fairly intuitive. HMMs have been successful in analyzing and
predicting time-dependent phenomena, or time series. They have been used extensively in speech
recognition, ECG analysis and similar areas. The stock market prediction problem is similar in
its inherent relation with time.
Hidden Markov Models are based on a set of unobserved underlying states amongst which
transitions can occur, and each state is associated with a set of possible observations. The stock
market can be seen in a similar manner. The underlying states, which determine the behavior of
the stock value, are usually invisible to the investor. The transitions between these underlying
states are driven by company policy, decisions, economic conditions and so on. The visible effect
which reflects these is the value of the stock. The HMM therefore conforms well to this
real-life scenario.
The choice of attributes, or feature selection, is significant in this approach. In the past,
various attempts have been made using the volume of trade, the momentum of the stock, its
correlation with the market, its volatility and so on. In our model we use the daily fractional
change in the stock value and the fractional deviation of the intra-day high and low. The
fractional change is necessary in order to make the required prediction. Measuring the fractional
deviation of both the intra-day high and low is useful because it also gives the direction of the
volatility.
We use three different stocks for evaluating the approach: Google, Apple Inc. and Dell Inc. A
separate HMM is trained for each stock. The one constraint on the training set is that it must
have suitable variability in the data. This is taken care of by choosing appropriately long periods
of time in which the stock value changes steadily yet significantly.
The remainder of this report is organized as follows. In Section II we review some of the existing
techniques for stock market prediction, especially the ones using HMMs. In Section III we give
details of our approach with mathematical justifications. In Section IV we describe the data sets
and provide the experimental results. Finally, in Section V we discuss the results and conclude.
1.2. PROBLEM STATEMENT
Stock market prediction has been one of the more active research areas in the past, given the
obvious interest of many major companies. In this research several machine learning
techniques have been applied with varying degrees of success. However, stock forecasting is still
severely limited by the non-stationary, seasonal and generally unpredictable nature of the market.
Forecasting from previous stock data alone is an even more challenging task, since it
ignores several external factors (such as the state of the company, economic conditions and
ownership). Machine learning techniques which have been widely applied to forecasting
stock market data include Artificial Neural Networks (ANNs), Fuzzy Logic (FL) and Support
Vector Machines (SVMs). Of these, ANNs have been the most successful; however, even
their performance is quite limited and not reliable enough. Wang and Leu trained a system based
on a recurrent neural network which used features extracted using Autoregressive Integrated
Moving Average (ARIMA) analysis, and which showed reasonable accuracy.
HMMs have only rarely been applied to the given problem in the past, which is surprising given
the time-dependent nature of the market. HMMs have been successfully applied in the areas of
speech recognition, DNA sequencing and ECG analysis. Shi and Weigend used HMMs to predict
changes in the trajectories of financial time series data. Recently, Hassan combined HMMs and
fuzzy logic rules to improve the prediction accuracy on non-stationary stock data sets. His
performance was significantly better than that of past approaches. The basic idea there was to
combine the HMM's data pattern identification method, used to partition the data space, with the
generation of fuzzy logic rules for the prediction of multivariate financial time series data.
Our approach is similar to the one taken by Nobakht et al. They model the daily opening, closing,
high and low indices as continuous observations from underlying hidden states. The main
difference between our approach and theirs lies in the features used (we use fractional changes in
the above quantities) and in the manner of forecasting. While they look for similar data patterns
in the past data, we maximize the likelihood of a sequence of observations over all possible
forecasted future values.
1.3. APPROACH TO PROBLEM IN TERMS OF TECHNOLOGY/PLATFORM TO BE USED
We use a continuous Hidden Markov Model (CHMM) to model the stock data as a time series.
An HMM (denoted by λ) can be written as
$$\lambda = (\pi, A, B)$$
where A is the transition matrix whose elements give the probability of a transition from one
state to another, B is the emission matrix giving the probability $b_j(O_t)$ of observing $O_t$
when in state j, and π gives the initial probabilities of the states at t = 1. Further, for a
continuous HMM the emission probabilities are modelled as Gaussian Mixture Models (GMMs):
$$b_j(O_t) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(O_t;\, \mu_{jm}, \Sigma_{jm})$$
where:
• M is the number of Gaussian mixture components.
• $c_{jm}$ is the weight of the mth mixture component in state j.
• $\mu_{jm}$ is the mean vector for the mth component in the jth state.
• $\mathcal{N}(O_t;\, \mu_{jm}, \Sigma_{jm})$ is the probability of observing $O_t$ in the
multi-dimensional Gaussian distribution with mean $\mu_{jm}$ and covariance matrix $\Sigma_{jm}$.
Training of the above HMM from given sequences of observations is done using the
Baum-Welch algorithm, which uses Expectation-Maximization (EM) to arrive at the optimal
parameters for the HMM. In our model the observations are the daily stock data in the form of
the vector, built from the four daily values,
$$O_t = \left(\frac{close - open}{open},\ \frac{high - open}{open},\ \frac{open - low}{open}\right) := (fracChange,\ fracHigh,\ fracLow)$$
Here open is the day opening value, close is the day closing value, high is the day high, and low
is the day low. We use fractional changes throughout to model the variation in stock data, which
remains constant over the years.
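As an illustration of the feature extraction described above, the following minimal Java sketch converts one day's quote into fractional features. It assumes that all three features are measured relative to the opening value, that is, (close − open)/open, (high − open)/open and (open − low)/open; the class and method names are hypothetical and are not taken from the POSM source code.

```java
/** Sketch: converts one day's quote into fractional-change features (hypothetical helper). */
final class FractionalFeatures {

    /** Returns {fracChange, fracHigh, fracLow}, all relative to the opening value. */
    static double[] fromQuote(double open, double high, double low, double close) {
        if (open <= 0.0) {
            throw new IllegalArgumentException("open must be positive");
        }
        return new double[] {
            (close - open) / open,  // fractional change over the day
            (high - open) / open,   // fractional deviation of the intra-day high
            (open - low) / open     // fractional deviation of the intra-day low
        };
    }

    public static void main(String[] args) {
        // Made-up quote, used only to show the call.
        double[] f = fromQuote(100.0, 104.0, 98.0, 102.0);
        System.out.println(f[0] + " " + f[1] + " " + f[2]);
    }
}
```

Dividing by the opening value keeps the features comparable across years in which the absolute price level differs.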
Once the model is trained, testing is done using an approximate Maximum a Posteriori (MAP)
approach. We assume a latency of d days while forecasting future stock values. Hence, the
problem becomes the following: given the HMM model λ and the stock values for d days
$(O_1, O_2, \ldots, O_d)$ along with the stock open value for day d+1, we need to compute the
close value for day d+1. This is equivalent to estimating the fractional change
$\frac{close - open}{open}$ for day d+1. For this, we compute the MAP estimate of the
observation vector $O_{d+1}$. Let $\hat{O}_{d+1}$ be the MAP estimate of the observation on
day d+1, given the values of the first d days. Then,
$$\hat{O}_{d+1} = \arg\max_{O_{d+1}} P(O_{d+1} \mid O_1, \ldots, O_d, \lambda) = \arg\max_{O_{d+1}} \frac{P(O_1, \ldots, O_d, O_{d+1} \mid \lambda)}{P(O_1, \ldots, O_d \mid \lambda)}$$
The observation vector $O_{d+1}$ is varied over all possible values. Since the denominator is
constant with respect to $O_{d+1}$, the MAP estimate becomes
$$\hat{O}_{d+1} = \arg\max_{O_{d+1}} P(O_1, \ldots, O_d, O_{d+1} \mid \lambda)$$
The joint probability value $P(O_1, \ldots, O_d, O_{d+1} \mid \lambda)$ can be computed using the
forward-backward algorithm for HMMs. In practice, we compute the probability over a discrete
set of possible values of $O_{d+1}$ (see Table II) and find the maximum, hence the name MAP HMM
model. The computational complexity of the forward-backward algorithm for finding the
likelihood of a given observation is $O(n^2 d)$, where n is the number of states in the HMM and d
is the latency. This procedure is repeated over the discrete set of possible values of $O_{d+1}$.
In our case n = 4, d = 10 and there are 50×10×10 possible values of $O_{d+1}$. The closing value
of a particular day can be computed by using the day opening value and the predicted fractional
change for that day.
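The search over a discrete set of candidate observations can be sketched as below. This is only an illustration under stated assumptions: the `Scorer` interface is a hypothetical stand-in for the trained HMM's joint likelihood of the d past days plus the candidate, and the grid ranges used in `main` (fractional change in [-0.1, 0.1], fractional high and low in [0, 0.1]) are assumed values chosen to give the 50×10×10 grid mentioned above. The toy scorer is not an HMM.

```java
/** Sketch: exhaustive MAP search over a discrete grid of candidate observations. */
final class MapForecast {

    /** Hypothetical stand-in for log P(O_1..O_d, candidate | lambda) from a trained HMM. */
    interface Scorer {
        double score(double[] candidate);
    }

    /** Evenly spaced grid of count points from lo to hi, both inclusive. */
    static double[] linspace(double lo, double hi, int count) {
        double[] grid = new double[count];
        for (int i = 0; i < count; i++) {
            grid[i] = (count == 1) ? lo : lo + (hi - lo) * i / (count - 1);
        }
        return grid;
    }

    /** Returns the candidate {fracChange, fracHigh, fracLow} with the highest score. */
    static double[] argmax(double[] changes, double[] highs, double[] lows, Scorer scorer) {
        double[] best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (double c : changes) {
            for (double h : highs) {
                for (double l : lows) {
                    double[] candidate = {c, h, l};
                    double s = scorer.score(candidate);
                    if (s > bestScore) {
                        bestScore = s;
                        best = candidate;
                    }
                }
            }
        }
        return best;
    }

    /** Closing value implied by the opening value and a predicted fractional change. */
    static double predictedClose(double open, double fracChange) {
        return open * (1.0 + fracChange);
    }

    public static void main(String[] args) {
        double[] changes = linspace(-0.1, 0.1, 50); // assumed range
        double[] highs = linspace(0.0, 0.1, 10);    // assumed range
        double[] lows = linspace(0.0, 0.1, 10);     // assumed range
        // Toy scorer used only to exercise the search; a real run would query the HMM.
        Scorer toy = c -> -(Math.pow(c[0] - 0.01, 2) + Math.pow(c[1] - 0.02, 2)
                + Math.pow(c[2] - 0.01, 2));
        double[] best = argmax(changes, highs, lows, toy);
        System.out.println("predicted close = " + predictedClose(250.0, best[0]));
    }
}
```

Each candidate requires one likelihood evaluation, so the cost of a single forecast grows with the product of the three grid sizes.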
CHAPTER 2
ADDITIONAL LITERATURE SURVEY
2.1. SUMMARY OF RELEVANT PAPERS
Paper 1
Title of paper: Stock Market Forecasting Using Hidden Markov Model: A New Approach
Authors: Md. Rafiul Hassan and Baikunth Nath
Year of Publication: 2005
Publishing details: Proceedings of the 2005 5th International Conference on Intelligent
Systems Design and Applications
Summary: This paper presents a Hidden Markov Model (HMM) approach for forecasting stock prices
for interrelated markets. We apply HMM to forecast some of the airlines stock. HMMs have been
extensively used for pattern recognition and classification problems because of their proven
suitability for modelling dynamic systems. However, using HMM for predicting future events is
not straightforward. Here we use only one HMM that is trained on the past dataset of the chosen
airlines. The trained HMM is used to search for the variable of interest behavioural data pattern
from the past dataset. By interpolating the neighbouring values of these datasets forecasts are
prepared. The results obtained using HMM are encouraging and HMM offers a new paradigm for
stock market forecasting, an area that has been of much research interest lately.
Key Words: HMM, stock market forecasting, financial time series, feature selection
Web link: hassan-nath-2005 stock_market_forecasting_using_hidden_markov_model_a_new_approach.pdf
Paper 2
Title of paper: Analysis of Hidden Markov Models and Support Vector Machines in Financial
Applications
Authors: Satish Rao, Jerry Hong
Year of Publication: 12 May 2010
Publishing details: Electrical Engineering and Computer Sciences, University of California at
Berkeley
Summary: This paper presents two approaches in helping investors make better decisions. First,
we discuss conventional methods, such as using the Efficient Market Hypothesis and technical
indicators, for forecasting stock prices and movements. We will show that these methods are
inadequate, and thus, we need to rethink the issue. Afterwards, we will discuss using artificial
intelligence, such as Hidden Markov Models and Support Vector Machines, to help investors gather
and compute enormous amounts of data that will enable them to make informed decisions. We will
leverage the Simlio engine to train both the HMM and SVM on past datasets and use it to predict
future stock movements. The results are encouraging and they warrant future research on using AI
for market forecasts.
Web link: EECS-2010-63.pdf

Paper 3
Title of paper: Stock Market Prediction using Hidden Markov Model
Authors: Aditya Gupta, Bhuwan Dhingra
Publishing details: International Journal of Multimedia and Ubiquitous Engineering
Summary: Stock market prediction is a classic problem which has been analyzed extensively using
tools and techniques of Machine Learning. Interesting properties which make this modelling
non-trivial are the time dependence, volatility and other similar complex dependencies of this
problem. To incorporate these, Hidden Markov Models (HMMs) have recently been applied to
forecast and predict the stock market. We present the Maximum a Posteriori HMM approach for
forecasting the stock value for the next day given historical data. In our approach, we consider
the fractional change in stock value, and the intra-day high and low values of the stock, to train
the continuous HMM. This HMM is then used to make a Maximum a Posteriori decision over all the
possible stock values for the next day. We test our approach on several stocks, and compare the
performance to some of the existing methods using HMMs and Artificial Neural Networks using Mean
Absolute Percentage Error (MAPE).
Web link: Y8036_Y8167.pdf

Paper 4
Title of paper: Stock market trend analysis using Hidden Markov Model
Authors: Kavitha G, Udhaya Kumar A, Nagarajan D
Publishing details: IEEE Transactions on Network and Service Management
Summary: Price movements of the stock market are not totally random. In fact, what drives the
financial market and what pattern financial time series follow have long been of interest to
economists, mathematicians and, most recently, computer scientists. This paper gives an idea
about the trend analysis of stock market behaviour using the Hidden Markov Model (HMM). The
trend once followed over a particular period will surely repeat in future. The one-day difference
in close value of stocks for a certain period is found and its corresponding steady state
probability distribution values are determined. The pattern of the stock market behaviour is then
decided based on these probability values for a particular time. The goal is to figure out the
hidden state sequence given the observation sequence so that the trend can be analyzed using the
steady state probability distribution (π) values. Six optimal hidden state sequences are generated
and compared. The one-day difference in close value, when considered, is found to give the best
optimum state sequence.
Web link: http://arxiv.org/ftp/arxiv/papers/1311/1311.4771.pdf

2.2. INTEGRATED SUMMARY OF LITERATURE STUDIED
The papers studied mainly focus on the Hidden Markov Model (HMM) approach for forecasting
stock prices for interrelated markets. Hassan and Nath apply an HMM to forecast the stock of some
airlines. HMMs have been extensively used for pattern recognition and classification problems
because of their proven suitability for modelling dynamic systems. However, using an HMM for
predicting future events is not straightforward. In their work a single HMM is trained on the past
dataset of the chosen airlines. The trained HMM is used to search the past dataset for data
patterns that resemble the current behaviour of the variable of interest, and forecasts are
prepared by interpolating the neighbouring values of these datasets. The results obtained using
the HMM are encouraging, and the HMM offers a new paradigm for stock market forecasting, an area
that has been of much research interest lately.
Conventional tools, such as the Efficient Market Hypothesis and technical indicators, offer much
insight into the workings of the financial market. However, they provide only a
macro-simplification that does not always reflect how the real market works. These tools have
limitations that prevent them from modeling the market in a more focused, “micro” manner. One of
the major issues is that many conventional finance theories take in only a limited number of
factors. This limited scope prevents us from accurately modeling the real market, which has a
countably infinite number of patterns. We need a model that can constantly adapt to the dynamic
nature of the market. Technical indicators can only help an investor so much before the different
combinations and patterns cause the investor to question whether any formula actually works
consistently. This is where AI models such as HMMs and SVMs come into play. Using these tools, we
can achieve a more realistic micro-representation of the market while overcoming the limitations
of the earlier techniques.
In recent years, a variety of forecasting methods have been proposed and implemented for stock
market analysis. A Markov process is a stochastic process in which the probability at one time is
conditioned only on a finite history, namely being in a certain state at a certain time. The
Markov chain property can be stated as “Given the present, the future is independent of the
past”. An HMM is a form of probabilistic finite state system in which the actual states are not
directly observable. They can only be estimated using the observable symbols associated with the
hidden states. At each time point, the HMM emits a symbol and changes state with a certain
probability. HMMs analyze and predict time series or time-dependent phenomena. There is no
one-to-one correspondence between the states and the observation symbols.
CHAPTER 3
ANALYSIS, DESIGN AND MODELLING
3.1. OVERALL DESCRIPTION OF THE PROJECT
3.1.1. PRODUCT PERSPECTIVE
The product to be developed, “Prediction of Stock Market (POSM)”, is a stand-alone,
independent application developed in the Java programming language using the Swing
framework.
3.1.2. PRODUCT FUNCTIONS
Hardware Requirements:
The hardware requirements may serve as the basis for a contract for the implementation of the
system and should therefore be a complete and consistent specification of the whole system.
They are used by software engineers as the starting point for the system design. They should
state what the system does, not how it should be implemented.
PROCESSOR: PENTIUM 4 2.1 GHZ & ABOVE
RAM: 512 MB DDR2 RAM
MONITOR: COLOR
HARD DISK SPACE: 10 MB
Software Requirements:
The software requirements document is the specification of the system. It should include both a
definition and a specification of requirements. It is a statement of what the system should do
rather than how it should do it. The software requirements provide a basis for creating the
software requirements specification. They are useful in estimating cost, planning team activities,
performing tasks and tracking the team's progress throughout the development activity.
OPERATING SYSTEM: WINDOWS 7/8/8.1
LANGUAGE: JAVA
FRAMEWORK: JAVA SWING
TOOL USED: NETBEANS IDE 7.1 & ABOVE, JDK 1.7 & ABOVE
3.1.3. USER CHARACTERISTICS
• The user needs to have the Java Virtual Machine and Java Runtime Environment installed
on the computer.
• The user must have some knowledge of the share market.
3.1.4. CONSTRAINTS
Reliability requirements:
The system must be reliable and must not crash under a large number of operations or
long processing times. A high-speed processor should be used.
3.1.5. ASSUMPTIONS AND DEPENDENCIES
Assumption 1: The user knows how to operate a Windows system and is aware of share
market basics.
Assumption 2: The JDK version is not lower than 1.7.
Dependency 1: The system depends on the Internet speed available to it.
3.1.6. APPORTIONING OF REQUIREMENTS
The system may be optimized in terms of accuracy and speed in future versions of the
product.
3.2. FUNCTIONAL REQUIREMENTS
1. Fetch chart data from Yahoo Finance.
2. Convert the CSV file to an XLS spreadsheet.
3. Apply the left-right algorithm to the incoming data and pass the result on to build the
transition matrix.
4. Apply the Baum-Welch or prediction algorithm to the received observations and calculate the
fractional change.
5. Calculate the opening, closing, high and low values of the given stock using the hidden
Markov model.
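The first two requirements start from CSV quote data. The following sketch shows how one data row might be parsed into the four daily values. The column order Date, Open, High, Low, Close, Volume, Adj Close is an assumption about the downloaded file, the class name is hypothetical, and the sample row in `main` contains made-up values.

```java
/** Sketch: parses one CSV data row into {open, high, low, close} (hypothetical helper). */
final class QuoteCsv {

    /** Assumes the column order Date,Open,High,Low,Close,... and no quoted fields. */
    static double[] parseOhlc(String line) {
        String[] fields = line.split(",");
        if (fields.length < 5) {
            throw new IllegalArgumentException("expected at least 5 columns: " + line);
        }
        return new double[] {
            Double.parseDouble(fields[1].trim()), // open
            Double.parseDouble(fields[2].trim()), // high
            Double.parseDouble(fields[3].trim()), // low
            Double.parseDouble(fields[4].trim())  // close
        };
    }

    public static void main(String[] args) {
        // Made-up sample row, used only to show the call.
        double[] q = parseOhlc("2014-03-25,100.50,104.25,98.75,102.00,4838400,102.00");
        System.out.println("open=" + q[0] + " close=" + q[3]);
    }
}
```

The header row of the file, if present, has to be skipped by the caller before this method is used.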

3.3. NON FUNCTIONAL REQUIREMENTS
1. Data fetched from Yahoo Finance needs to be delivered reliably and quickly.
2. The specifications and configuration settings of the Java APIs must be studied while
integrating them with the user application.
3.4. DESIGN DIAGRAMS
3.4.1. USE CASE DIAGRAM

DATA FLOW DIAGRAM
LEVEL 0-DFD

3.4.2. ARCHITECTURAL DIAGRAM
Fig. 1. Architecture of stock market prediction
There are a number of sources of stock and financial data on the web, including Google Finance
and Yahoo! Finance. In the specific case of the SENSEX index, Google Finance does not appear to
provide the data in the required format. As an alternative, Yahoo! Finance was used to load
and store the stock data from 1984 onwards. The application developed for this project includes
an interface for loading the data before the training and prediction algorithms are run.
[Figure: architecture diagram. Recoverable labels: SMP, Microsoft Excel Reader, Java
Architecture, Stock Data Extracted, Data returned as data source]

In this problem, there is a sequence of data over time with which we need to train an HMM.
To train an HMM matching a sequence of stock index data O = (open, low, high, close), the
following settings and considerations have been taken into account:
1. States: in the experiments, we have N = 4; intuitively, this denotes the stages in time
between which transitions are allowed in the HMM training.
2. Dimensions: as a mixture of multivariate Gaussians is used in this problem, we have D = 4,
since the observation vector for each stock date is (open, low, high, close).
3. Mixtures: [10] proposes to have M = 3.
4. Left-Right Delta: experimentally, we have tried delta = 1 and delta = 3.
5. Prior Probability: adhering to the left-right HMM, we have π = (1, 0, 0, 0).
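To make the left-right settings above concrete, the sketch below builds an initial transition matrix in which state i may only stay where it is or move forward by at most delta states, together with the prior (1, 0, ..., 0). Spreading the probability uniformly over the allowed transitions is an assumption made for illustration; the report does not state the initial values used, and the class name is hypothetical.

```java
/** Sketch: initial parameters of a left-right HMM (hypothetical helper). */
final class LeftRightInit {

    /** Transition matrix with a[i][j] > 0 only for i <= j <= i + delta (uniform, assumed). */
    static double[][] transitions(int states, int delta) {
        double[][] a = new double[states][states];
        for (int i = 0; i < states; i++) {
            int last = Math.min(states - 1, i + delta);
            double p = 1.0 / (last - i + 1);
            for (int j = i; j <= last; j++) {
                a[i][j] = p;
            }
        }
        return a;
    }

    /** Prior that forces the chain to start in the first state. */
    static double[] prior(int states) {
        double[] pi = new double[states];
        pi[0] = 1.0;
        return pi;
    }

    public static void main(String[] args) {
        double[][] a = transitions(4, 1);
        for (double[] row : a) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

With N = 4 and delta = 1 each state can stay or advance by one state, and the last state is absorbing; with delta = 3 the first state can reach every other state directly.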

CHAPTER 4
IMPLEMENTATION DETAILS AND ISSUES
4.1. IMPLEMENTATION DETAILS AND ISSUES
The software is implemented in the Java programming language and the user interface is designed
using the Java Swing framework.
DETAILS OF THE PROJECT
As the prediction problem can be complex and lengthy, it was broken down into a series of
actions and activities organized in several phases. Figure 1 depicts the overall process that is
followed when solving the problem, and the following section discusses each of the phases in
more detail. As mentioned, in this experiment we try to take advantage of Hidden Markov Models
(HMMs) to address some interesting problems in stock market analysis.
Specifically, stock market index prediction is done in this project. First, a set of past data is
loaded and analyzed; then, an HMM is modelled and trained for the problem. Afterwards,
similar past data are identified and used to predict future stock market values. The stock market
data used in this project are from SENSEX, an index over the stock data in India. Each stock
market data point is a quadruple (open, low, high, close): each day the market starts its activity
at some opening value, during the day it reaches its highest and drops to its lowest value of the
day, and it stops at a close value. Such data are of great interest to stock traders and business
shareholders who want to predict future stock trends. In this project, we try to estimate the
next day's close value as precisely as possible.

4.1.1. IMPLEMENTATION ISSUES
A number of important implementation issues must be addressed before POSM can be
feasibly deployed. Some of these issues are discussed below:
1. Interesting HMM questions:
1. How to compute the probability of the occurrence of a specific sequence of observations,
P(O | λ), in which O = (O_1, O_2, ..., O_T).
2. How to choose the state sequence (q(1), q(2), ..., q(T)) that best explains the observation
sequence O in the model.
3. How to tune the parameters λ = (π, A, B) to find a model that best matches the observations O.
In stock market analysis and prediction, we face the first and the third problems. Training the
HMM corresponds to problem (3), and prediction is achieved with problem (1).
2. Initializing the HMM: Another important issue in modeling with an HMM is how to
initialize the parameters of the defined HMM, i.e. the transition probabilities, observation
probabilities (distributions) and prior probabilities. Although the HMM adjusts its parameters as
well as possible during learning, badly initialized parameters can lead to an imprecise model
even after the HMM has been trained. A basic approach is therefore required to obtain good
initial values of the HMM parameters. In this project we initialize the parameters of the HMM
based on the physical behavior of the data. As noted earlier, one of the HMM types best suited to
time series prediction is the Left-Right HMM. Consequently the prior probabilities are
(1, 0, 0, 0), since the Left-Right HMM imposes this. The transition probabilities again depend
mainly on the defined model; they are initialized based on the value of the left-right delta.
The observation distributions can be initialized from the training data. Since our proposed HMM
has 4 states, we divide the training sequence into 4 equal sections, and each section is used to
estimate the best possible mixture of multivariate Gaussian distributions for one state. The
parameters of these distributions are estimated with the maximum likelihood method.
3. Continuous HMM: In some applications, such as stock market analysis or speech recognition,
the observations do not come from a discrete space; the discrete observation O_t is then replaced
by a vector observation O_t, meaning that in each state a range of observation values can be
received. Specifically, the observation vector in this project is O_t = (open, low, high, close),
which are the values of the stock index. Usually, for such HMMs, the probability density function
(pdf) used is a mixture of higher-dimensional Gaussian distributions.
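The mixture-of-Gaussians density mentioned for the continuous HMM can be evaluated as in the following sketch. It assumes diagonal covariance matrices so that no matrix inversion is needed; the report does not say whether full or diagonal covariances were used, and the class and parameter names are hypothetical.

```java
/** Sketch: emission density of one state as a Gaussian mixture with diagonal covariance. */
final class GaussianMixture {

    /** Density of x under a Gaussian with the given mean and per-dimension variances. */
    static double gaussian(double[] x, double[] mean, double[] variance) {
        double logDensity = 0.0;
        for (int k = 0; k < x.length; k++) {
            double diff = x[k] - mean[k];
            logDensity += -0.5 * (Math.log(2.0 * Math.PI * variance[k])
                    + diff * diff / variance[k]);
        }
        return Math.exp(logDensity);
    }

    /** Weighted sum of the component densities; weights are expected to sum to 1. */
    static double density(double[] x, double[] weights, double[][] means, double[][] variances) {
        double total = 0.0;
        for (int m = 0; m < weights.length; m++) {
            total += weights[m] * gaussian(x, means[m], variances[m]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up two-component mixture over a 2-dimensional observation.
        double[] x = {0.01, 0.02};
        double[] weights = {0.7, 0.3};
        double[][] means = {{0.0, 0.02}, {0.03, 0.05}};
        double[][] variances = {{0.001, 0.001}, {0.002, 0.002}};
        System.out.println(density(x, weights, means, variances));
    }
}
```

Working in the log domain inside `gaussian` avoids multiplying many small per-dimension factors directly.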
4.1.2. ALGORITHMS
Likelihood update algorithm
Likelihood update Algorithm in POSM
On the path to prediction, we first need to find the day in the stock market data that is most
similar to a specific day, so that it can be used to predict the following day's close value. To
do so, we need to compute the likelihood of the previous days in the desired range. Given one
day's stock data, it is straightforward to compute the likelihood of that specific day from the
HMM. This is Problem (1), which is computed using the Forward-Backward Algorithm proposed in
[12, 2, 13]. Algorithm 1 depicts the overall method used to compute the likelihoods.
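The likelihood computation referred to above can be illustrated with the forward pass of the forward-backward algorithm. The sketch below is not the project's Algorithm 1; it is a generic scaled forward recursion that takes the emission densities as a precomputed array `b[t][j]` (a hypothetical layout), so it applies to discrete and continuous HMMs alike.

```java
/** Sketch: scaled forward recursion returning log P(O_1..O_T | lambda). */
final class ForwardLikelihood {

    /**
     * @param pi initial state probabilities, length n
     * @param a  transition matrix, n x n, a[i][j] = P(state j at t+1 | state i at t)
     * @param b  b[t][j] = emission probability (or density) of observation t in state j
     */
    static double logLikelihood(double[] pi, double[][] a, double[][] b) {
        int n = pi.length;
        double[] alpha = new double[n];
        double logProb = 0.0;
        for (int t = 0; t < b.length; t++) {
            double[] next = new double[n];
            double scale = 0.0;
            for (int j = 0; j < n; j++) {
                double incoming;
                if (t == 0) {
                    incoming = pi[j];
                } else {
                    incoming = 0.0;
                    for (int i = 0; i < n; i++) {
                        incoming += alpha[i] * a[i][j];
                    }
                }
                next[j] = incoming * b[t][j];
                scale += next[j];
            }
            if (scale == 0.0) {
                return Double.NEGATIVE_INFINITY; // sequence impossible under the model
            }
            for (int j = 0; j < n; j++) {
                next[j] /= scale; // rescale to avoid numerical underflow
            }
            logProb += Math.log(scale);
            alpha = next;
        }
        return logProb;
    }

    public static void main(String[] args) {
        double[] pi = {0.6, 0.4};
        double[][] a = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] b = {{0.5, 0.1}, {0.4, 0.3}}; // made-up emission values for two time steps
        System.out.println(logLikelihood(pi, a, b));
    }
}
```

The two nested loops over states inside the loop over time give the quadratic-in-states, linear-in-length cost that is usual for this recursion.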

Prediction Algorithm
Once the likelihoods of the different days have been computed, the last phase is to
predict a given day's close value, which is the target of this experiment. To do so, we introduce
a parameter, the likelihood tolerance, which denotes the neighborhood within which a past day is
accepted as similar to the previous day. Using the likelihood tolerance, we fetch a list of days
similar to yesterday's stock data and then take as the best guess the one that has the highest
likelihood of all. In this experiment, we used likelihood tolerance values in the range
[0.001, 0.01]. From this point, prediction is straightforward: we calculate the difference between
the similar day and the day that followed it, and apply that difference to yesterday's values to
obtain tomorrow's close value. Algorithm 2 shows the overall pseudo-code used for prediction.
Along with the prediction, we also calculate the MAPE (Mean Absolute Percentage Error) measure.
Prediction algorithm for POSM
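The prediction step and the error measure described above might be sketched as follows. This is one reading of the description, not the project's Algorithm 2: a past day counts as similar when its likelihood lies within the tolerance of yesterday's, the similar day with the highest likelihood is chosen, and the close-to-close move that followed it is added to yesterday's close. The class name, the array layout and the fallback of predicting no change when no similar day exists are all assumptions.

```java
/** Sketch: similar-day prediction with a likelihood tolerance, plus the MAPE measure. */
final class SimilarDayPredictor {

    /**
     * @param close      close values, the last entry being yesterday's
     * @param likelihood likelihood value of each day under the trained HMM
     * @param tolerance  maximum accepted difference from yesterday's likelihood
     */
    static double predictNextClose(double[] close, double[] likelihood, double tolerance) {
        int yesterday = close.length - 1;
        int best = -1;
        for (int t = 0; t < yesterday; t++) {
            boolean similar = Math.abs(likelihood[t] - likelihood[yesterday]) <= tolerance;
            if (similar && (best < 0 || likelihood[t] > likelihood[best])) {
                best = t;
            }
        }
        if (best < 0) {
            return close[yesterday]; // assumed fallback: no similar day, predict no change
        }
        // Apply the move that followed the similar day to yesterday's close.
        return close[yesterday] + (close[best + 1] - close[best]);
    }

    /** Mean Absolute Percentage Error, in percent. */
    static double mape(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]) / Math.abs(actual[i]);
        }
        return 100.0 * sum / actual.length;
    }

    public static void main(String[] args) {
        // Made-up values, used only to show the calls.
        double[] close = {100.0, 102.0, 101.0, 105.0, 104.0};
        double[] likelihood = {-5.0, -3.0, -3.005, -9.0, -3.002};
        System.out.println(predictNextClose(close, likelihood, 0.01));
    }
}
```

A smaller tolerance yields fewer candidate days, so the fallback branch matters more at the lower end of the tolerance range.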

CHAPTER 5
TESTING
5.1. TESTING PLAN
Type of test: Requirements testing
Will test be performed: Yes
Comments/explanation: Needs to be done to cope with the changing environment.
Software component: Fluctuation in the share market.

Type of test: Unit
Will test be performed: Yes
Comments/explanation: The maximum number of defects is found at this stage. Each component of
code was tested or analyzed, not only to ensure the best quality of the developed software but
also to make sure that the code behaves as intended. Unit testing was performed as and when
each component was developed.
Software component:
• User interface code
• Baum Welch code
• Left right code
• Sgenerator code
• Distribution code
• HMM code
• Prediction of day code

Type of test: Integration
Will test be performed: Yes
Comments/explanation: All the well-developed sub-systems are integrated together and tested;
this is called integration testing.
Software component:
• Left right (Bakis) algorithm
• Baum Welch algorithm

Type of test: Performance
Will test be performed: Yes
Comments/explanation: Performance is the major criterion for evaluating any type of system. It
holds importance and is tested likewise.
Software component: Performance of the different algorithms is measured in combination. The
algorithms are:
• Left right algorithm
• Baum Welch algorithm
Performance is also measured on the precision of the output.

Type of test: Stress
Will test be performed: No
Comments/explanation: -
Software component: -

Type of test: Compliance
Will test be performed: No
Comments/explanation: -
Software component: -

Type of test: Security
Will test be performed: No
Comments/explanation: -
Software component: -
Table 1: Testing Plan
TEST TEAM DETAILS
Role | Name | Responsibility
Chief Testing Incharge | Aditya datta | To perform requirements, unit, integration, performance and load testing.
Table 2: Test Team details
Activity | Start Date | Completion Date | Hours | Comments
Develop Input | 25-03-2014 | 28-03-2014 | 6 | Nominal and trivial issues tested using the standard test cases designed for the system.
Test Region Setup | 5-04-2014 | 10-04-2014 | 10 | Test region is defined so as to check all the features individually as well as in combination.
Table 3: Testing Schedule
Test Environment
Software Items
• Windows 7/8/8.1 Stability
• Mac Stability
• Internet connection
• Java Runtime Environment & Development Kit 1.7 & above
• NetBeans 7.1 & above
Hardware Items
• Personal Computer/Laptop
• Network Interface card
• Wireless connection or connecting cable
Table 4: Test Environment
5.2. COMPONENT TESTING

S.No | Components that require testing | Type of testing required | Technique for writing test case
1 | Left right code | Unit Testing | White Box Testing
2 | Csv to xls converter code | Unit Testing | White Box Testing
3 | Model code | Unit Testing | White Box Testing
4 | Graph code | Unit Testing | White Box Testing
5 | Sgenerator code | Unit Testing | White Box Testing
6 | Destination | System Testing | Black Box Testing
7 | Source | System Testing | Black Box Testing
8 | User interface code | Performance Testing | Black Box Testing
9 | Utils code | Performance Testing | Black Box Testing
Table 5: Component decomposition and identification of tests required
5.3. TEST CASES
Test Id T1
Input Enter the starting and the last date to update the data.
Expected Output Data is fetched from Yahoo Finance.
Status Pass
Test Id T2
Input Predict the stock rate for the very next day.
Expected Output We get the low, high, opening and closing values for the next day.
Status Pass
Test Id T3
Input Predict the stock rate for any day more than a week after the given set of data.
Expected Output The system asks the user to enter a date within a week.
Status Pass
Test Id T4
Input Check the precision of the output by entering a date whose values are already known.
Expected Output Outputs are almost precise.
Status Pass
5.4. ERROR AND EXCEPTION HANDLING
Test Case Id | Test Case | Debugging Technique
T1 | Fetching data from Yahoo Finance. | Check the posmdownloader code and check for errors.
Table 6: Error and Exception Handling
5.5. LIMITATION OF THE SOLUTION
• The precision of the output is sometimes not even close to the actual value.
• The system sometimes hangs due to loss of the Internet connection.

5.6. Risk analysis and Mitigation plan
Risk Id | Classification | Description | Risk Area | Probability | Impact | RE = (P*I)
R1 | Performance | Low Performance | Product Engineering | L | H | 81
R2 | Budget | Medium Budget | Program Constraints | M | L | 3
R3 | Project Specification | Infeasible Specifications | Product Engineering | L | M | 3
R4 | Hardware | Hardware Constraints | Product Engineering | L | L | 1
R5 | Accuracy | Low Accuracy | Product Engineering | M | H | 27
R6 | External Inputs | Inaccurate Inputs | Program Constraints | H | H | 81
Table 7: Risk Identification
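The RE column can be reproduced with a short calculation. The numeric scale below (L = 1, M = 3, H = 9) is an assumption inferred from the RE values listed for R2 to R6; the report does not state it, and the names `LEVEL` and `risk_exposure` are hypothetical.

```python
# Assumed numeric scale for the L/M/H ratings (inferred, not stated in the report)
LEVEL = {"L": 1, "M": 3, "H": 9}


def risk_exposure(probability, impact):
    """Risk exposure RE = P * I for ratings given as 'L', 'M' or 'H'."""
    return LEVEL[probability] * LEVEL[impact]
```

Under this scale, a medium-probability, high-impact risk such as R5 gives `risk_exposure("M", "H")`, which is 3 * 9 = 27.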

S. No. | Risk Area | # of Risk Statements | Weight (in+out) | Total Weight | Priority
1 | Performance | 5 | 1+1+9+9+9 | 29 | 1
2 | Accuracy | 3 | 9+9+9 | 27 | 2
3 | External Inputs | 3 | 3+9+9 | 21 | 3
4 | Hardware | 2 | 9+9 | 18 | 4
5 | Project Specification | 3 | 3+3+1 | 7 | 5
6 | Budget | 2 | 3+1 | 4 | 6
Table 8: Risk Area Wise Total Weighting Factor
Risk Id | Risk Statement | Risk Area | Priority
R1 | Risk of Performance | Performance | 1
R5 | Risk of Low Accuracy | Accuracy | 2
R6 | Risk of Inaccurate Inputs | External Inputs | 3
R4 | Risk of Inaccurate Hardware equipment | Hardware | 4
Table 9: Risk with Maximum Weight

MITIGATION APPROACHES
Approach 1: To ensure high performance, optimize the system by reducing response time
10 April 2014 25 April 2014 Aditya datta
Additional Resources: High Speed Processor
Approach 2: To ensure high accuracy, optimize the code
Additional Resources: Internet
Approach 3: Ensure the system is secure
Additional Resources: None
Approach 4: Ensure that all the specifications and inputs are correct
Additional Resources: None

CHAPTER 6
FINDINGS AND CONCLUSION
As the results show, we have been successful in producing a rough estimate of the future data
required in the project. However, precision becomes more important as the sensitivity of the
data rises. One idea for improving quality in future work is to apply a weighted ranking to the
most similar past days found within the likelihood tolerance. Intuitively, this would help to
control the deviation from the actual values observed over time. Additionally, further boundary
checks could be applied to the predicted data to prevent undesired deviations in the predictions.

Another idea is "continuous training". In the current approach, a period of time is chosen, the
data for that period is collected and used to train an HMM, and the trained HMM is then used for
prediction. A better approach would be to persist the trained HMM and, over time, tune it
according to the latest data as it emerges. This way we would keep improving the HMM over time
without losing what it has learned from the past.

ANN is a well-researched and established method that has been successfully used to predict time
series behaviour from past datasets. In this work, we proposed the use of HMM, a new approach,
to predict unknown values in a time series (stock market). It is clear from the results that the
mean absolute percentage error (MAPE) values of the two methods are quite similar. However, the
primary weakness of ANNs is the inability to properly explain the models. According to Ripley,
"the design and learning for feed-forward networks are hard". The proposed method using HMM to
forecast stock prices is explainable and has a solid statistical foundation. The results show the
potential of using HMM for time series prediction. In our future work we plan to develop hybrid
systems using AI paradigms with HMM to further improve the accuracy and efficiency of our
forecasts.

Correlation between predicted and actual closing stock price for Google.
Correlation between predicted and actual closing stock price for Dell.

REFERENCES
[1] Kuo R J, Lee L C and Lee C F (1996), Integration of Artificial NN and Fuzzy Delphi for
Stock market forecasting, IEEE International Conference on Systems, Man, and Cybernetics,
Vol. 2, pp. 1073-1078.
[2] Kimoto T, Asakawa K, Yoda M and Takeoka M (1990), Stock market prediction system with
modular neural networks, Proc. International Joint Conference on Neural Networks, San Diego,
Vol. 1, pp. 1-6.
[3] White H (1988), Economic Prediction Using Neural Networks: The Case of IBM Daily Stock
Returns, Proceedings of the Second Annual IEEE Conference on Neural Networks, Vol. 2, pp.
451-458.
[4] Chiang W C, Urban T L and Baldridge G W (1996), A Neural Network Approach to Mutual
Fund Net Asset Value Forecasting. Omega, Vol. 24 (2), pp. 205-215.
[5] Kim S H and Chun S H (1998), Graded forecasting using an array of bipolar predictions:
application of probabilistic neural networks to a stock market index. International Journal of
Forecasting, Vol. 14, pp. 323-337.
[6] Romahi Y and Shen Q (2000), Dynamic Financial Forecasting with Automatically Induced
Fuzzy Associations, Proceedings of the 9th international conference on Fuzzy systems, pp. 493-
498.
[7] Thammano A (1999), Neuro-fuzzy Model for Stock Market Prediction, Proceedings of the
Artificial Neural Networks in Engineering Conference, ASME Press, New York, pp. 587-591.
[8] Abraham A, Nath B and Mahanti P K (2001), Hybrid Intelligent Systems for Stock Market
Analysis, Proceedings of the International Conference on Computational Science. Springer, pp.
337-345.
[9] Raposo R De C T and Cruz A J De O (2004), Stock Market prediction based on
fundamentalist analysis with Fuzzy-Neural Networks.
http://www.labic.nce.ufrj.br/downloads/3wses_fsfs_2002.pdf
[10] Cao L and Tay F E H (2001), Financial Forecasting Using Support Vector Machines, Neural
Computation and Application, Vol. 10, pp. 184-192.
[11] Huang X, Ariki Y, Jack M (1990), Hidden Markov Models for speech recognition.
Edinburgh University Press.