Hadoop Project

Stock Analyzer
(MapReduce and Hive Implementation)
Presented by
Punit Kishore (A13011)
Debayan Datta (A13006)
Sunil Kumar P (A13020)
Maruthi Nataraj K (A13009)
Ashish Ranjan (A13004)
Praxis Business School
AGENDA
• Understanding of the problem
• Technical Architecture
• Basic Structure
• Pseudo Code
• Final Result
• Business Implications

Electronics Template
UNDERSTANDING OF THE PROBLEM
• Objective: To find the adjusted closing price for each day on which a stock did not report a dividend.

• Data Sources:
  • NYSE daily prices dataset with the schema below:
    exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
  • NYSE dividends dataset with the schema below:
    exchange, stock_symbol, date, dividends

• Isolating the dividend data from the rest of the data gives a better picture of the company, because firms sometimes avoid cutting dividends even when earnings drop.

Framework: MapReduce/Hive
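
The objective amounts to an anti-join: keep every (stock_symbol, date) row of the daily prices dataset that has no row in the dividends dataset. The following is a minimal in-memory sketch in plain Java, independent of Hadoop and Hive, assuming comma-separated rows in the schema order listed above. The class name, method name, and sample rows are hypothetical and made up for illustration; only the 20.69 adjusted close for AIT on 12-11-2009 comes from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NoDividendDays {
    // Returns "symbol,date,adj_close" for every price row that has no
    // dividend row with the same stock symbol and date.
    static List<String> adjustedCloseWithoutDividend(List<String> priceRows,
                                                     List<String> dividendRows) {
        Set<String> dividendKeys = new HashSet<>();
        for (String row : dividendRows) {
            String[] f = row.split(",");   // exchange,stock_symbol,date,dividends
            dividendKeys.add(f[1] + "|" + f[2]);
        }
        List<String> result = new ArrayList<>();
        for (String row : priceRows) {
            String[] f = row.split(",");   // 9 fields, adjusted close is the last one
            if (!dividendKeys.contains(f[1] + "|" + f[2])) {
                result.add(f[1] + "," + f[2] + "," + f[8]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows (assumption), not taken from the project data.
        List<String> prices = Arrays.asList(
            "NYSE,AIT,12-11-2009,20.5,21.0,20.1,20.7,1000,20.69",
            "NYSE,AIT,13-11-2009,20.7,21.2,20.4,20.9,1200,20.85");
        List<String> dividends = Arrays.asList("NYSE,AIT,13-11-2009,0.15");
        System.out.println(adjustedCloseWithoutDividend(prices, dividends));
    }
}
```

The MapReduce design in the following slides reaches the same result without holding either dataset in memory, by grouping both record types under a shared key.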
TECHNICAL ARCHITECTURE

Eclipse Indigo 3.7.2
Hadoop 1.2.1 plugin

WinSCP

Putty

Unix Environment / Amazon AWS EC2 Praxis Hadoop Cluster

Sample data - NYSE_daily_prices_AT.csv (testing was done on sample data only, due to load and time constraints).

Sample data - NSE_daily_prices_BT.csv

Sample data - dividendstest.csv

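BASIC STRUCTURE
Input Key Value Pair
<Memory Pointer, NYSE,AIT,12-11-2009,X,X,X,X,X,20.69>
(the key is the byte offset of the line within the input file)

Intermediary Key Value Pairs (value format: count~adjusted closing price~check flag)
<AIT12-11-2009, 1~20.69~0>
<AIT12-11-2009, 1~Null~1>

Output/Result Key Value Pair
AIT  12-11-2009  20.69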
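
The intermediary value packs three fields into one string separated by `~`. A minimal sketch of encoding and decoding that string in plain Java follows. The class and field names are hypothetical, and the reading of the three fields as count, adjusted closing price, and check flag is inferred from the reducer pseudo code rather than confirmed by the authors.

```java
class IntermediateValue {
    final int count;        // 1 for each emitted record in the slide's examples
    final String adjClose;  // adjusted closing price, or "Null" when absent
    final int flag;         // check flag, 0 or 1 in the slide's examples

    IntermediateValue(int count, String adjClose, int flag) {
        this.count = count;
        this.adjClose = adjClose;
        this.flag = flag;
    }

    // Splits "count~adjClose~flag" into its three fields.
    static IntermediateValue parse(String encoded) {
        String[] parts = encoded.split("~");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected count~adjClose~flag: " + encoded);
        }
        return new IntermediateValue(
            Integer.parseInt(parts[0]), parts[1], Integer.parseInt(parts[2]));
    }

    // Joins the three fields back into the "~"-separated form.
    String encode() {
        return count + "~" + adjClose + "~" + flag;
    }

    public static void main(String[] args) {
        IntermediateValue price = parse("1~20.69~0");
        System.out.println(price.count + " " + price.adjClose + " " + price.flag);
        System.out.println(parse("1~Null~1").encode());
    }
}
```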

PSEUDO CODE
Mapper

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public static class StockAnalysisMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text>
{
    // declaration of Mapkey and Mapvalue
    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException
    {
        // declaration of private variables
        // switch case to parse the input lines and store the data
        // check for null values in the key
        // skip the header; otherwise send the key value pair to the output collector
    }
}
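
The map step described by the comments above can be sketched in plain Java, without the Hadoop classes, as a function from one input line to one key value pair. This is a sketch under stated assumptions, not the authors' implementation: switching on the number of fields (9 for daily prices, 4 for dividends) and detecting the header by the `stock_symbol` column name are both guesses, and the class and method names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class MapStepSketch {
    // Returns the (key, value) pair for one CSV line,
    // or null if the line is a header, malformed, or has an empty key part.
    static Map.Entry<String, String> map(String line) {
        String[] f = line.split(",");
        String value;
        switch (f.length) {                        // assumption: record type by field count
            case 9:                                // daily prices record
                value = "1~" + f[8].trim() + "~0";
                break;
            case 4:                                // dividends record
                value = "1~Null~1";
                break;
            default:                               // malformed line
                return null;
        }
        String symbol = f[1].trim();
        String date = f[2].trim();
        if (symbol.isEmpty() || date.isEmpty()) {  // null check on the key parts
            return null;
        }
        if (symbol.equals("stock_symbol")) {       // assumption: header row detection
            return null;
        }
        return new SimpleEntry<>(symbol + date, value);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> pair =
            map("NYSE,AIT,12-11-2009,20.5,21.0,20.1,20.7,1000,20.69");
        System.out.println(pair.getKey() + " -> " + pair.getValue());
    }
}
```

Because both record types produce the same symbol-plus-date key, the shuffle phase delivers a day's price record and its dividend record, if any, to the same reduce call.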

PSEUDO CODE
Reducer

public static class StockAnalysisReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text>
{
    // declaration of required private variables
    @Override
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException
    {
        // declaration of sum and flag variables
        while (values.hasNext())
        {
            // parse the inputs, which are the count, the stock adjusted closing price and the check flag
            // store them as required after parsing
            // check for null values of the stock adjusted closing price
            // increment the sum
        }
        // write to output if sum is 1
    }
}
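
The reduce step can be sketched the same way, as a plain Java function over the values grouped under one key. The sketch assumes the `count~adjClose~flag` value format from the Basic Structure slide and assumes that a key is written only when its single record is a price record; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ReduceStepSketch {
    // Returns the adjusted close to write for this key, or null if nothing
    // should be written (a dividend exists for the day, or no price record).
    static String reduce(List<String> values) {
        int sum = 0;
        String adjClose = null;
        for (String value : values) {
            String[] parts = value.split("~");     // count~adjClose~flag
            sum += Integer.parseInt(parts[0]);     // increment the sum
            boolean isPrice = parts[2].equals("0") && !parts[1].equals("Null");
            if (isPrice) {
                adjClose = parts[1];
            }
        }
        // Write only when the key was seen exactly once and that record was a price.
        return (sum == 1) ? adjClose : null;
    }

    public static void main(String[] args) {
        System.out.println(reduce(Arrays.asList("1~20.69~0")));
        System.out.println(reduce(Arrays.asList("1~20.69~0", "1~Null~1")));
    }
}
```

A sum of 2 means the key carried both a price record and a dividend record, so that day is dropped from the output.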

PSEUDO CODE
Driver

public static void main(String[] arguments) throws Exception
{
    JobConf conf = new JobConf(StockAnalyzer.class);
    conf.setJobName("Stock Analysis");
    // the job emits Text keys and Text values
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setMapperClass(StockAnalysisMapper.class);
    conf.setReducerClass(StockAnalysisReducer.class);
    // arguments[0] is the input path, arguments[1] is the output path
    Path mapperInputPath = new Path(arguments[0]);
    Path outputPath = new Path(arguments[1]);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, mapperInputPath);
    FileOutputFormat.setOutputPath(conf, outputPath);
    // submit the job and wait for it to complete
    JobClient.runJob(conf);
}

FINAL RESULT
• NYSE Daily A – 14 records inclusive of 1 header
• NYSE Daily B – 39 records inclusive of 1 header
• Dividends file – 22 records inclusive of 1 header
• Total – 75 records

FINAL RESULT
• Total – 75
• Matching records – 7
• Headers – 3
• Dividend records – 21
• Final Output – 44 records (75 - 3 - 21 - 7 = 44)

FINAL RESULT (HIVE)
[Screenshot of the Hive query result]

BUSINESS IMPLICATIONS
• Daily closing prices are adjusted for dividend distributions and stock splits because these are part of total return and affect historical volatility estimates.
• The primary use of the adjusted closing price is to develop an accurate track record of a stock's performance. Comparing a stock's historical adjusted closing price with its current price shows the true rate of return.
• Graphing the volatility history of the target firm alongside that of its competitors and the market index can provide insights into risk and comparative advantages (a frequency distribution of returns can also be used).
• Historical stock price volatility may have implications for business valuators.
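
The two quantities named above, rate of return and historical volatility, can be computed from a series of adjusted closing prices. The sketch below is one common formulation, assumed here for illustration: a simple rate of return, and volatility as the sample standard deviation of daily log returns, not annualized. The class name, method names, and sample prices are hypothetical.

```java
class ReturnStats {
    // Simple rate of return from a historical adjusted close to the current price.
    static double rateOfReturn(double historicalAdjClose, double currentPrice) {
        return (currentPrice - historicalAdjClose) / historicalAdjClose;
    }

    // Sample standard deviation of daily log returns (not annualized).
    static double dailyVolatility(double[] adjCloses) {
        int n = adjCloses.length - 1;              // number of daily returns
        if (n < 2) {
            throw new IllegalArgumentException("need at least 3 prices");
        }
        double[] returns = new double[n];
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            returns[i] = Math.log(adjCloses[i + 1] / adjCloses[i]);
            mean += returns[i] / n;
        }
        double sumSquares = 0.0;
        for (double r : returns) {
            sumSquares += (r - mean) * (r - mean);
        }
        return Math.sqrt(sumSquares / (n - 1));
    }

    public static void main(String[] args) {
        // Made-up prices (assumption), for illustration only.
        double[] adjCloses = {20.10, 20.69, 20.45, 20.85};
        System.out.println(rateOfReturn(adjCloses[0], adjCloses[3]));
        System.out.println(dailyVolatility(adjCloses));
    }
}
```

Running the same calculation for a competitor or a market index gives series that can be graphed side by side, as the third point suggests.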
