This document describes a project that aims to analyze stock market data using Hadoop and MapReduce. The project will take a stock's daily price data as input and use MapReduce algorithms to find the frequency of different percentage changes in the stock's price over time. Technical indicators like simple moving average and exponential moving average will also be analyzed to understand stock trends. The overall goal is to help investors analyze stock behavior and identify good investment opportunities.
Stock Market Trend Analysis Using Hadoop MapReduce
1. Name of the students: 1) Rohit Jain (10103473)
2) Neeraj Chaudhary (10103525)
Name of the supervisor: Mr. Vivek Mishra.
2. Importance of project
Today the stock market is one of the major markets in which people
invest to earn money. The stock market reflects variations in the market
economy, and it has drawn the attention of ten million investors since
it opened. The stock market is characterized by high risk and high yield,
so investors are concerned with analyzing the stock market and
try to forecast its trend. However, the stock market is
affected by politics, the economy and many other factors, coupled with
the complexity of its internal laws, such as non-linear price changes
and share data with high noise. Therefore, traditional
mathematical and statistical techniques for forecasting the stock market
have not yielded suitable results.
Hence, we are going to analyze a stock using different algorithms
designed with tools and techniques that include Hadoop and
MapReduce.
3. INTRODUCTION
Analysis of data is a process of inspecting, cleaning, transforming, and
modeling data with the goal of discovering useful information, suggesting
conclusions, and supporting decision making. Data analysis has multiple
facets and approaches, encompassing diverse techniques under a variety
of names, in different business, science, and social science domains.
We are going to analyze the data of a stock to find different types of
trends in this stock using Hadoop and MapReduce.
Hadoop is an open source framework for writing and running distributed
applications that process large amounts of data. Distributed computing is
a wide and varied field, but the key distinctions of Hadoop are that it is
4. MapReduce is a data processing model. Its greatest advantage is the
easy scaling of data processing over multiple computing nodes. Under
the MapReduce model, the data processing primitives are called
mappers and reducers. Decomposing a data processing application
into mappers and reducers is sometimes nontrivial. But, once you
write an application in the MapReduce form, scaling the application
to run over hundreds, thousands, or even tens of thousands of
machines in a cluster is merely a configuration change. This simple
scalability is what has attracted many programmers to the MapReduce
model.
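To make the mapper/reducer split concrete, here is a minimal in-memory sketch in plain Java. It does not use Hadoop at all: the class name MiniMapReduce and its run method are hypothetical names chosen for this illustration, and the grouping step stands in for the shuffle that a real cluster would perform between the two phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// In-memory illustration of the MapReduce model (hypothetical class, no Hadoop involved).
class MiniMapReduce {

    // Applies the mapper to every record, groups the emitted values by key,
    // then reduces each group by summing its values.
    static Map<String, Integer> run(List<String> records,
                                    Function<String, Map.Entry<String, Integer>> mapper) {
        // Map + shuffle: collect every value emitted for the same key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records) {
            Map.Entry<String, Integer> pair = mapper.apply(record);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        // Reduce: one sum per key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int v : group.getValue()) {
                sum += v;
            }
            result.put(group.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("up", "down", "up", "flat", "up");
        // The mapper emits (record, 1); the reducer counts occurrences per key.
        Map<String, Integer> counts = run(records, r -> new SimpleEntry<>(r, 1));
        System.out.println(counts); // prints {down=1, flat=1, up=3}
    }
}
```

Because the mapper looks at one record at a time and the reducer looks at one key at a time, both phases could be spread over many machines without changing this logic.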
5. Technical and graphical indicators used
We are analyzing the data of a particular stock for the past several years
using different types of algorithms. We need to find the number of days on
which the same percentage change has occurred across the whole data set.
For example, given a particular stock, we’d like to know how often in the
past several years it has changed by 1%, 2%, 3% etc. (somewhat like a
Fourier transform, which transforms temporal domain data into the
frequency domain).
Further, we will be using different technical indicators, for analysis
purposes only, which will include:
•Simple Moving Average (SMA)
•Exponential Moving Average (EMA)
•On Balance Volume (OBV)
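On Balance Volume is only named in this list, so the following is a sketch of the commonly used textbook definition rather than this project's own code: the running total adds the day's volume when the close is higher than the previous close, subtracts it when the close is lower, and is unchanged otherwise. The class name OnBalanceVolume, the starting value of 0 and the sample numbers are assumptions made for illustration.

```java
import java.util.Arrays;

// Sketch of the standard On Balance Volume calculation (hypothetical class name).
class OnBalanceVolume {

    // closes[i] and volumes[i] describe trading day i; the result has one OBV value per day.
    static long[] obv(double[] closes, long[] volumes) {
        if (closes.length != volumes.length) {
            throw new IllegalArgumentException("closes and volumes must have the same length");
        }
        long[] out = new long[closes.length];
        // Assumption: the series starts at 0 on the first day.
        for (int i = 1; i < closes.length; i++) {
            if (closes[i] > closes[i - 1]) {
                out[i] = out[i - 1] + volumes[i];   // up day: add volume
            } else if (closes[i] < closes[i - 1]) {
                out[i] = out[i - 1] - volumes[i];   // down day: subtract volume
            } else {
                out[i] = out[i - 1];                // unchanged close: carry forward
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] closes = {10, 11, 10.5, 10.5, 12};
        long[] volumes = {100, 200, 150, 300, 250};
        System.out.println(Arrays.toString(obv(closes, volumes))); // prints [0, 200, 50, 50, 300]
    }
}
```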
6. TECHNICAL INDICATORS USED
These indicators are used for analysis: by choosing one of the
following indicators and entering a period as input, users can see a
graph for that company.
Simple Moving Average (SMA):
1) SMA is the most basic moving average used for trading.
2) It is based on the closing price.
Exponential Moving Average (EMA):
1) EMA tries to reduce lag by applying more weight to recent prices.
2) EMA(current) = ((Price(current) - EMA(previous)) * Multiplier) +
EMA(previous)
Multiplier = 2 / (Time period + 1)
7. Overall description of the project
Our project aims at analyzing the data of a particular stock using Hadoop
MapReduce.
We propose some algorithms to analyze the stock data. Initially we
find the frequency of stock changes, using an Excel sheet of the stock as
the input. We use MapReduce functions to perform this operation
so that the data can be analyzed.
Then we use another algorithm to forecast the trend of the
stock using a technical indicator, the exponential moving average (EMA).
After this we use a graphical stock trend indicator to understand
the trend of the stock.
The whole project runs on Hadoop MapReduce.
8. Functional and non-functional requirements
Having worked out the procedure for implementing our algorithm, we
identified some requirements that are
needed for the proper functioning of the project.
A functional requirement describes what a software system should do,
while non-functional requirements place constraints on how the system
will do so.
•Functional Requirements:
•Hadoop should handle the input stock data.
•The mapper must have a key for mapping the data.
•The reducer must combine the data into an output.
•Non-Functional Requirements:
•Scalability: The application must work for large data sets. It should not
fail under this condition.
•Reliability: The application must be reliable in every respect for the
user who is using it to analyze the data.
•Efficiency: Specifies how well the software utilizes scarce resources:
9. Component description and dependency details
An Excel file for a particular stock is used as the input for the project. We
have used the Excel sheet for BP stock from the Yahoo server.
•Software Requirements
•Oracle (Sun) Java 6: Oracle (Sun) Java 6 is the reference
implementation for Java 6.
•Hadoop: Hadoop Map/Reduce is a software framework for easily
writing applications which process vast amounts of data (multi-terabyte
data-sets) in-parallel on large clusters (thousands of nodes) of
commodity hardware in a reliable, fault-tolerant manner.
•Hardware Requirements
•PC 1.6 GHz or higher
•3 GB RAM or higher
•Operating System: Ubuntu
10. Overall Architecture
We take an Excel file as input, let the map function
perform its task on it, and then reduce the result to get the output.
11. Proposed Algorithm
Algorithm based on the percentage change of a stock:
This is an algorithm to compute the frequency of stock market changes.
For example, given a particular stock, we’d like to know how often in the
past several years it has changed by 1%, 2%, 3% etc. (somewhat like a
Fourier transform, which transforms temporal domain data into the
frequency domain).
Yahoo Finance provides the BP stock data as an Excel sheet for the
analysis.
12. Map Function:
Primarily we are writing a stream processor here that atomically
performs what needs to happen on one line of data. That is perfect for us:
we simply take the opening price and the closing price, calculate
the percent change and emit it.
// Body of map(); needs import java.text.DecimalFormat.
// Input line format: Date,Open,High,Low,Close,Volume,Adj Close
String[] tokens = value.toString().split(",");
float open = Float.parseFloat(tokens[1]);   // opening price
float close = Float.parseFloat(tokens[4]);  // closing price
float change = ((close - open) / open) * 100;  // percent change for the day
// The key is the change rounded to at most two decimals, e.g. "1.25%"
word.set(new DecimalFormat("0.##").format((double) change) + "%");
context.write(word, one);
We will get a stream of (name, value) pairs with the name being the
percentage change for the day and the value being the integer ‘1’. This
function can be distributed over X number of machines, each one
performing its streaming function in parallel and independent of the
others.
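The key computation can be tried outside Hadoop. The sketch below isolates it in a plain Java helper; the class name PercentChangeKey, the method keyFor and the sample rows are hypothetical, and the helper assumes the header row has already been skipped (a header such as "Date,Open,..." would make the number parsing fail).

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Stand-alone version of the map step's key computation (hypothetical helper, no Hadoop).
class PercentChangeKey {

    // Expects one data row in the form Date,Open,High,Low,Close,Volume,Adj Close.
    static String keyFor(String csvLine) {
        String[] tokens = csvLine.split(",");
        double open = Double.parseDouble(tokens[1]);
        double close = Double.parseDouble(tokens[4]);
        double change = (close - open) / open * 100.0;
        // Locale.ROOT keeps '.' as the decimal separator regardless of the machine's locale.
        DecimalFormat format = new DecimalFormat("0.##", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(change) + "%";
    }

    public static void main(String[] args) {
        // Open 40.00, close 40.50: a rise of 1.25 percent.
        System.out.println(keyFor("2013-01-02,40.00,41.00,39.50,40.50,1000,40.50")); // prints 1.25%
        // Open 50.00, close 49.00: a fall of 2 percent.
        System.out.println(keyFor("2013-01-03,50.00,50.50,48.00,49.00,1000,49.00")); // prints -2%
    }
}
```

Rounding to two decimals is what makes the keys repeat: without it, nearly every day would produce its own key and every frequency would be 1.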
13. Reduce Function:
This function is going to take the (name, value) outputs from all the
mappers and process that data accordingly (often ‘reducing’ it). In our
case we are simply going to count the number of times a particular
percentage change happens. In essence we are going to change this:
1.2% 1
1.3% 1
1.2% 1
Into
1.2% 2
1.3% 1
// Body of reduce(): add up the 1s emitted for this percentage change.
int sum = 0;
for (IntWritable val : values)
{
sum = sum + val.get();
}
context.write(key, new IntWritable(sum));
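The counting step can be checked without a cluster. This is a minimal plain Java sketch that reproduces the 1.2% / 1.3% example from this slide; the class name ChangeFrequency and the method count are hypothetical, and a sorted map is used only so that the printed order is predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local stand-in for the reduce step: counts how often each key occurs (hypothetical class).
class ChangeFrequency {

    // Each element of keys plays the role of one (key, 1) pair emitted by a mapper.
    static Map<String, Integer> count(List<String> keys) {
        Map<String, Integer> frequency = new TreeMap<>();
        for (String key : keys) {
            // Adds 1 to the running total for this key, starting from 1 if the key is new.
            frequency.merge(key, 1, Integer::sum);
        }
        return frequency;
    }

    public static void main(String[] args) {
        List<String> mapperOutput = Arrays.asList("1.2%", "1.3%", "1.2%");
        System.out.println(count(mapperOutput)); // prints {1.2%=2, 1.3%=1}
    }
}
```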
14. Technical indicators algorithm:
This method is used for analysis: by choosing one of the following
indicators and entering a period as input, users can see a graph for
that company.
A. Simple Moving Average (SMA):
1) SMA is the most basic moving average used for trading.
2) It is based on the closing price.
Ex. Daily closing prices: 11, 12, 13, 14, 15, 16, 17
To find the 5-day moving average:
1st value: (11+12+13+14+15)/5 = 13
2nd value: (12+13+14+15+16)/5 = 14
3rd value: (13+14+15+16+17)/5 = 15, and so on.
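The worked example above can be reproduced with a short Java sketch. The class name SimpleMovingAverage and the method sma are hypothetical; the code keeps a running window sum rather than re-adding five prices for every value, which is one reasonable way to do it and not necessarily how the project implements it.

```java
import java.util.Arrays;

// Sketch of a simple moving average over closing prices (hypothetical class name).
class SimpleMovingAverage {

    // Returns one average per full window, so the result has closes.length - period + 1 values.
    static double[] sma(double[] closes, int period) {
        if (period <= 0 || period > closes.length) {
            throw new IllegalArgumentException("period must be between 1 and the number of prices");
        }
        double[] out = new double[closes.length - period + 1];
        double windowSum = 0;
        for (int i = 0; i < closes.length; i++) {
            windowSum += closes[i];                 // newest price enters the window
            if (i >= period) {
                windowSum -= closes[i - period];    // oldest price leaves the window
            }
            if (i >= period - 1) {
                out[i - period + 1] = windowSum / period;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] closes = {11, 12, 13, 14, 15, 16, 17};
        System.out.println(Arrays.toString(sma(closes, 5))); // prints [13.0, 14.0, 15.0]
    }
}
```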
15. B. Exponential Moving Average (EMA):
1) EMA tries to reduce lag by applying more weight to recent
prices.
2) EMA(current) = ((Price(current) - EMA(previous)) * Multiplier)
+ EMA(previous)
Multiplier = 2 / (Time period + 1)
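A short Java sketch of the EMA recurrence follows. The formula leaves the first EMA value undefined, so this sketch assumes it is seeded with the first price (seeding with an SMA of the first period is another common choice); the class name ExponentialMovingAverage, the method ema and the sample prices are hypothetical.

```java
import java.util.Arrays;

// Sketch of the EMA recurrence given above (hypothetical class name).
class ExponentialMovingAverage {

    // Returns one EMA value per price, in the same order as the input.
    static double[] ema(double[] prices, int period) {
        if (prices.length == 0 || period <= 0) {
            throw new IllegalArgumentException("need at least one price and a positive period");
        }
        double multiplier = 2.0 / (period + 1);
        double[] out = new double[prices.length];
        out[0] = prices[0]; // Assumption: seed the series with the first price.
        for (int i = 1; i < prices.length; i++) {
            // EMA(current) = ((Price(current) - EMA(previous)) * Multiplier) + EMA(previous)
            out[i] = (prices[i] - out[i - 1]) * multiplier + out[i - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        // Period 3 gives a multiplier of 2 / (3 + 1) = 0.5.
        double[] prices = {10, 12, 14};
        System.out.println(Arrays.toString(ema(prices, 3))); // prints [10.0, 11.0, 12.5]
    }
}
```

A shorter period gives a larger multiplier, so the EMA follows recent prices more closely; a longer period smooths more and reacts more slowly.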
16. Conclusion
Investing in stocks is a common side business for companies and
individuals, offering compound interest, the time value of money, tax
benefits and diversification. Investing in a good, rising stock is
therefore necessary to get the desired profit. A stock change indicator is
very helpful to the user in selecting a good stock. Hadoop is open source
software that can handle huge amounts of data quite easily. Hadoop has
modules such as MapReduce, HDFS, Hadoop Common and Hadoop YARN. The map
function calculates the percentage change of the stock,
assigns that change as the key and gives each key a value of 1.
The reduce function reads each key and the set of values associated with it.
It then calculates the sum of the values associated with the key and
gives the key and that sum (the frequency) as the final output. The EMA
algorithm mainly focuses on recent price values. By analyzing these values
the user can choose a less risky stock. By drawing a graph of the EMA of
closing prices the user can understand the trend of a stock, and so can
invest in a less risky or more risky stock (according to his choice) with
an upward trend.
17. Future work
We plan to do the following things in future.
•We want to add some more technical indicators (like back propagation
neural networks) to this program so that a person can compare the result of
each indicator. The user will have the freedom to give importance to a
particular condition (indicator).
•We want to add some graphical indicators (like OBVP) to this
project as well, so that the user gets graphical knowledge along with
statistical knowledge and can better understand the trend of a stock.
•We want to link this project to a website so that more people can
benefit from this project.