
Hack reduce mr-intro


  1. HackReduce: MapReduce Intro. Hopper.com (Greg Lu)
  2. Project: github.com/hackreduce/Hackathon. Wiki: github.com/hackreduce/Hackathon/wiki. Download the GitHub project for some sample datasets.
  3. datasets/nasdaq/daily_prices/NASDAQ_daily_prices_subset.csv

     InputSplit 1:

         NASDAQ,DELL,1997-08-26,83.87,84.75,82.50,82.81,48736000,10.35
         NASDAQ,DITC,2002-10-24,1.56,1.69,1.53,1.60,133600,1.60
         NASDAQ,DLIA,2008-01-28,1.91,2.31,1.91,2.23,760800,2.23
         NASDAQ,DWCH,2002-07-10,3.09,3.14,3.09,3.14,2400,1.57

     InputSplit 2:

         NASDAQ,DYNT,2008-12-29,0.31,0.31,0.29,0.30,26900,0.30
         NASDAQ,DMLP,2003-10-21,17.65,17.94,17.58,17.59,4800,9.73
         NASDAQ,DORM,1997-02-07,7.88,7.88,7.63,7.75,7400,3.87
         NASDAQ,DXPE,2004-10-25,5.19,5.24,5.00,5.00,7600,2.50

     InputSplit 3:

         NASDAQ,DEST,2009-03-17,4.55,5.03,4.55,5.03,6800,5.03
         NASDAQ,DBRN,1992-01-02,8.88,9.25,8.75,8.88,84800,2.22
         NASDAQ,DXYN,1998-11-25,6.38,6.44,6.19,6.25,211100,6.25
         NASDAQ,DEAR,1998-12-08,10.50,11.50,10.50,10.50,5800,6.45
         ...

     org.hackreduce.examples.stockexchange.MarketCapitalization (expanded version)

         public int run(String[] args) throws Exception {
             Configuration conf = getConf();

             if (args.length != 2) {
                 System.err.println("Usage: " + getClass().getName() + " <input> <output>");
                 System.exit(2);
             }

             // Creating the MapReduce job (configuration) object
             Job job = new Job(conf);
             job.setJarByClass(getClass());
             job.setJobName(getClass().getName());

             // The Nasdaq/NYSE data dumps comes in as a CSV file (text input), so we configure
             // the job to use this format.
             job.setInputFormatClass(TextInputFormat.class);

             [...]

     Slide note on job.setInputFormatClass(TextInputFormat.class): Defines how the data is split and assigned to which mappers.
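
Slide 3 says the input format decides how the data is split and handed to mappers. The following is a minimal sketch of that idea in plain Java, without Hadoop on the classpath. The class InputSplitSketch, its Record type, and the fixed "lines per split" rule are all hypothetical names and simplifications introduced here for illustration: Hadoop's file-based input formats split by byte ranges rather than by line count, and TextInputFormat presents each line to the mapper keyed by its byte offset in the file, which is what the offset field imitates.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class InputSplitSketch {

    /** One input record: the byte offset where the line starts, plus the line itself. */
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Pairs each line with its starting byte offset, assuming "\n" line endings. */
    static List<Record> toRecords(List<String> lines) {
        List<Record> records = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            records.add(new Record(offset, line));
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for the newline
        }
        return records;
    }

    /** Groups records into consecutive splits of at most linesPerSplit records (simplified rule). */
    static List<List<Record>> toSplits(List<Record> records, int linesPerSplit) {
        List<List<Record>> splits = new ArrayList<>();
        for (int i = 0; i < records.size(); i += linesPerSplit) {
            int end = Math.min(i + linesPerSplit, records.size());
            splits.add(new ArrayList<>(records.subList(i, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "NASDAQ,DELL,1997-08-26,83.87,84.75,82.50,82.81,48736000,10.35",
                "NASDAQ,DITC,2002-10-24,1.56,1.69,1.53,1.60,133600,1.60",
                "NASDAQ,DLIA,2008-01-28,1.91,2.31,1.91,2.23,760800,2.23",
                "NASDAQ,DWCH,2002-07-10,3.09,3.14,3.09,3.14,2400,1.57",
                "NASDAQ,DYNT,2008-12-29,0.31,0.31,0.29,0.30,26900,0.30");
        List<List<Record>> splits = toSplits(toRecords(lines), 4);
        for (int i = 0; i < splits.size(); i++) {
            for (Record r : splits.get(i)) {
                System.out.println("InputSplit " + (i + 1) + " @" + r.offset + ": " + r.line);
            }
        }
    }
}
```

Each split would then be processed by its own mapper, one record at a time.
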
  4. datasets/nasdaq/daily_prices/NASDAQ_daily_prices_subset.csv
     datasets/nasdaq/daily_prices

     InputSplit 1:

         NASDAQ,DELL,1997-08-26,83.87,84.75,82.50,82.81,48736000,10.35
         NASDAQ,DITC,2002-10-24,1.56,1.69,1.53,1.60,133600,1.60
         NASDAQ,DLIA,2008-01-28,1.91,2.31,1.91,2.23,760800,2.23
         NASDAQ,DWCH,2002-07-10,3.09,3.14,3.09,3.14,2400,1.57

     org.hackreduce.examples.stockexchange.MarketCapitalization (expanded version)

         public int run(String[] args) throws Exception {
             [...]

             // Tell the job which Mapper and Reducer to use (classes defined above)
             job.setMapperClass(MarketCapitalizationMapper.class);
             job.setReducerClass(MarketCapitalizationReducer.class);

             // This is what the Mapper will be outputting to the Reducer
             job.setMapOutputKeyClass(Text.class);
             job.setMapOutputValueClass(DoubleWritable.class);

             // This is what the Reducer will be outputting
             job.setOutputKeyClass(Text.class);
             job.setOutputValueClass(Text.class);

             // Setting the input folder of the job
             FileInputFormat.addInputPath(job, new Path(args[0]));

             // Preparing the output folder by first deleting it if it exists
             Path output = new Path(args[1]);
             FileSystem.get(conf).delete(output, true);
             FileOutputFormat.setOutputPath(job, output);

             [...]

     Slide note on setMapperClass and setReducerClass: Point the job to the custom classes that we created in order to process the data.

     Slide note on the output key and value classes: Define the types of the (key, value) pairs that we’ll be outputting from the mappers and the result of the job itself.

     Now we’ll show the MarketCapitalizationMapper class.
  5. datasets/nasdaq/daily_prices/NASDAQ_daily_prices_subset.csv
     datasets/nasdaq/daily_prices

     InputSplit 1:

         NASDAQ,DELL,1997-08-26,83.87,84.75,82.50,82.81,48736000,10.35
         NASDAQ,DITC,2002-10-24,1.56,1.69,1.53,1.60,133600,1.60
         NASDAQ,DLIA,2008-01-28,1.91,2.31,1.91,2.23,760800,2.23
         NASDAQ,DWCH,2002-07-10,3.09,3.14,3.09,3.14,2400,1.57

     org.hackreduce.examples.stockexchange.MarketCapitalization (expanded version)

         public static class MarketCapitalizationMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {

             // Assumed declaration, not shown on the slide: parses dates such as 1997-08-26
             private final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");

             protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                 String inputString = value.toString();
                 String[] attributes = inputString.split(",");

                 if (attributes.length != 9)
                     throw new IllegalArgumentException("Input string given did not have 9 values in CSV format");

                 try {
                     String exchange = attributes[0];
                     String stockSymbol = attributes[1];
                     Date date = sdf.parse(attributes[2]);
                     double stockPriceOpen = Double.parseDouble(attributes[3]);
                     double stockPriceHigh = Double.parseDouble(attributes[4]);
                     double stockPriceLow = Double.parseDouble(attributes[5]);
                     double stockPriceClose = Double.parseDouble(attributes[6]);
                     int stockVolume = Integer.parseInt(attributes[7]);
                     double stockPriceAdjClose = Double.parseDouble(attributes[8]);

                     // Kept inside the try block so the parsed variables are still in scope
                     double marketCap = stockPriceClose * stockVolume;
                     context.write(new Text(stockSymbol), new DoubleWritable(marketCap));
                 } catch (ParseException e) {
                     throw new IllegalArgumentException("Input string contained an unknown value that couldn't be parsed");
                 } catch (NumberFormatException e) {
                     throw new IllegalArgumentException("Input string contained an unknown number value that couldn't be parsed");
                 }
             }
         }

     Slide note: This job doesn’t do a whole lot, but this is where the processing is occurring.
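
The map step on slide 5 can be tried out without a cluster. The sketch below reproduces only its core logic in plain Java: split one CSV line, check for 9 fields, and emit (symbol, close price * volume). The class name MapStepSketch and the use of Map.Entry in place of Hadoop's Context.write are assumptions made for illustration, and the volume is parsed as a long here as a precaution, whereas the slide uses int.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

final class MapStepSketch {

    /** Turns one CSV line into a (stock symbol, close * volume) pair. */
    static Map.Entry<String, Double> map(String csvLine) {
        String[] attributes = csvLine.split(",");
        if (attributes.length != 9) {
            throw new IllegalArgumentException("Input string given did not have 9 values in CSV format");
        }
        String stockSymbol = attributes[1];
        double stockPriceClose = Double.parseDouble(attributes[6]);
        long stockVolume = Long.parseLong(attributes[7]); // long instead of the slide's int
        return new SimpleEntry<>(stockSymbol, stockPriceClose * stockVolume);
    }

    public static void main(String[] args) {
        Map.Entry<String, Double> pair =
                map("NASDAQ,DELL,1997-08-26,83.87,84.75,82.50,82.81,48736000,10.35");
        // Emits the DELL pair shown on the slides: (DELL, 82.81*48736000)
        System.out.println("(" + pair.getKey() + ", " + pair.getValue() + ")");
    }
}
```

Keeping the parsing in a plain static method like this makes it easy to unit test before wiring it into a Mapper subclass.
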
  6. datasets/nasdaq/daily_prices/NASDAQ_daily_prices_subset.csv
     datasets/nasdaq/daily_prices

     InputSplit 1:

         NASDAQ,DELL,1997-08-26,83.87,84.75,82.50,82.81,48736000,10.35
         NASDAQ,DITC,2002-10-24,1.56,1.69,1.53,1.60,133600,1.60
         NASDAQ,DLIA,2008-01-28,1.91,2.31,1.91,2.23,760800,2.23
         NASDAQ,DWCH,2002-07-10,3.09,3.14,3.09,3.14,2400,1.57

     (line-by-line)

     MarketCapitalizationMapper

     (emits)

         (DELL, 82.81*48736000)
         (DITC, 1.60*133600)
         (DLIA, 2.23*760800)
         (DWCH, 3.14*2400)

     (sorted and partitioned to specific reducers)

     MarketCapitalizationReducer
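
Slide 6 mentions that mapper output is "sorted and partitioned to specific reducers". A rough sketch of that step in plain Java follows. The class ShuffleSketch and its method names are hypothetical. The partition rule is the hash-modulo formula that Hadoop's default HashPartitioner is understood to use, applied here to String.hashCode() rather than to Hadoop's Text type, so the actual partition numbers on a cluster would differ. A TreeMap stands in for the sort by key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ShuffleSketch {

    /** Picks a reducer for a key: non-negative hash modulo the number of reducers. */
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Groups values by key inside each partition; TreeMap keeps the keys sorted. */
    static List<TreeMap<String, List<Double>>> shuffle(
            List<Map.Entry<String, Double>> pairs, int numReducers) {
        List<TreeMap<String, List<Double>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Double> pair : pairs) {
            int p = partitionFor(pair.getKey(), numReducers);
            partitions.get(p)
                    .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                    .add(pair.getValue());
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("DELL", 82.81 * 48736000));
        pairs.add(new SimpleEntry<>("DITC", 1.60 * 133600));
        pairs.add(new SimpleEntry<>("DELL", 31.92 * 18678500)); // from another mapper
        List<TreeMap<String, List<Double>>> partitions = shuffle(pairs, 2);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("reducer " + i + " receives keys " + partitions.get(i).keySet());
        }
    }
}
```

The property that matters is that every pair with the same key lands in the same partition, so a single reducer sees all values for a given stock symbol.
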
  7. (coming from different mappers)

         (DELL, 82.81*48736000)
         (DELL, 31.92*18678500)
         (DELL, 23.85*16038700)
         (DELL, 30.38*68759800)
         (...)

     (but arriving at the same reducer)

     org.hackreduce.examples.stockexchange.MarketCapitalization (expanded version)

         public static class MarketCapitalizationReducer extends Reducer<Text, DoubleWritable, Text, Text> {

             NumberFormat currencyFormat = NumberFormat.getCurrencyInstance(Locale.getDefault());

             @Override
             protected void reduce(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException {
                 double highestCap = 0.0;
                 for (DoubleWritable value : values) {
                     highestCap = Math.max(highestCap, value.get());
                 }
                 context.write(key, new Text(currencyFormat.format(highestCap)));
             }
         }

     (output of this reducer)

         (DELL, $4,035,828,160.00)
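
The reduce step on slide 7 keeps the largest value seen for a key and formats it as currency. Here is a small plain-Java sketch of the same logic, using the four DELL values listed on the slide. The class ReduceStepSketch is a hypothetical name, and the locale is passed in explicitly as an assumption made for reproducibility: the slide's code uses Locale.getDefault(), so its output format depends on the machine the job runs on.

```java
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class ReduceStepSketch {

    /** Returns the largest value, formatted as currency for the given locale. */
    static String reduce(List<Double> values, Locale locale) {
        double highestCap = 0.0;
        for (double value : values) {
            highestCap = Math.max(highestCap, value);
        }
        return NumberFormat.getCurrencyInstance(locale).format(highestCap);
    }

    public static void main(String[] args) {
        // Values for DELL arriving from different mappers, as listed on the slide
        List<Double> dell = Arrays.asList(
                82.81 * 48736000,
                31.92 * 18678500,
                23.85 * 16038700,
                30.38 * 68759800);
        System.out.println("(DELL, " + reduce(dell, Locale.US) + ")");
    }
}
```

With Locale.US the largest value, 82.81*48736000, formats to the figure shown on the slide.
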
  8. /tmp/nasdaq_marketcaps/part-r-00000

         DAIO    $1,515,345.00
         DAKT    $63,656,600.00
         DANKY   $89,668,857.00
         DARA    $1,464,720.00
         DASTY   $14,141,055.00
         DATA    $2,888,325.00
         DAVE    $5,144,800.00
         DBLE    $1,040,996.00
         DBLEP   $79,584.00
         DBRN    $131,023,326.00
         DBTK    $7,405,366.00
         DCAI    $20,058,990.00
         DCGN    $10,372,992.00
         DCOM    $12,298,208.00
         DCTH    $3,285,652.00
         DDDC    $79,176.00
         DDIC    $3,684,100.00
         DDMX    $7,811,204.00
         DDRX    $12,480,500.00
         DDSS    $4,545,438.00
         DEAR    $4,375,800.00
         DECK    $271,081,580.00
         DEER    $5,363,740.00
         DEIX    $5,285,892.00
  9. We can dynamically increase your clusters if you need the processing power, but it’s typically bottlenecked by the code. If your job takes longer than 10 minutes to run, come see us.
