Hack reduce mr-intro
Transcript

  • 1. HackReduce: MapReduce Intro. Hopper.com (Greg Lu)
  • 2. Project: github.com/hackreduce/Hackathon
       Wiki: github.com/hackreduce/Hackathon/wiki
       Download the GitHub project for some sample datasets.
  • 3. datasets/nasdaq/daily_prices/NASDAQ_daily_prices_subset.csv

       InputSplit 1:
           NASDAQ,DELL,1997-08-26,83.87,84.75,82.50,82.81,48736000,10.35
           NASDAQ,DITC,2002-10-24,1.56,1.69,1.53,1.60,133600,1.60
           NASDAQ,DLIA,2008-01-28,1.91,2.31,1.91,2.23,760800,2.23
           NASDAQ,DWCH,2002-07-10,3.09,3.14,3.09,3.14,2400,1.57
       InputSplit 2:
           NASDAQ,DYNT,2008-12-29,0.31,0.31,0.29,0.30,26900,0.30
           NASDAQ,DMLP,2003-10-21,17.65,17.94,17.58,17.59,4800,9.73
           NASDAQ,DORM,1997-02-07,7.88,7.88,7.63,7.75,7400,3.87
           NASDAQ,DXPE,2004-10-25,5.19,5.24,5.00,5.00,7600,2.50
       InputSplit 3:
           NASDAQ,DEST,2009-03-17,4.55,5.03,4.55,5.03,6800,5.03
           NASDAQ,DBRN,1992-01-02,8.88,9.25,8.75,8.88,84800,2.22
           NASDAQ,DXYN,1998-11-25,6.38,6.44,6.19,6.25,211100,6.25
           NASDAQ,DEAR,1998-12-08,10.50,11.50,10.50,10.50,5800,6.45
           ...

       org.hackreduce.examples.stockexchange.MarketCapitalization (expanded version)

           public int run(String[] args) throws Exception {
               Configuration conf = getConf();

               if (args.length != 2) {
                   System.err.println("Usage: " + getClass().getName() + " <input> <output>");
                   System.exit(2);
               }

               // Creating the MapReduce job (configuration) object
               Job job = new Job(conf);
               job.setJarByClass(getClass());
               job.setJobName(getClass().getName());

               // The Nasdaq/NYSE data dumps come in as a CSV file (text input), so we configure
               // the job to use this format.
               job.setInputFormatClass(TextInputFormat.class);

               [...]

       The input format defines how the data is split and which mappers the splits are assigned to.
  • 4. datasets/nasdaq/daily_prices/NASDAQ_daily_prices_subset.csv (in datasets/nasdaq/daily_prices)

       InputSplit 1:
           NASDAQ,DELL,1997-08-26,83.87,84.75,82.50,82.81,48736000,10.35
           NASDAQ,DITC,2002-10-24,1.56,1.69,1.53,1.60,133600,1.60
           NASDAQ,DLIA,2008-01-28,1.91,2.31,1.91,2.23,760800,2.23
           NASDAQ,DWCH,2002-07-10,3.09,3.14,3.09,3.14,2400,1.57

       org.hackreduce.examples.stockexchange.MarketCapitalization (expanded version)

           public int run(String[] args) throws Exception {
               [...]

       Point the job to the custom classes that we created in order to process the data:

               // Tell the job which Mapper and Reducer to use (classes defined above)
               job.setMapperClass(MarketCapitalizationMapper.class);
               job.setReducerClass(MarketCapitalizationReducer.class);

       Define the types of the (key, value) pairs that we'll be outputting from the mappers and the result of the job itself:

               // This is what the Mapper will be outputting to the Reducer
               job.setMapOutputKeyClass(Text.class);
               job.setMapOutputValueClass(DoubleWritable.class);

               // This is what the Reducer will be outputting
               job.setOutputKeyClass(Text.class);
               job.setOutputValueClass(Text.class);

               // Setting the input folder of the job
               FileInputFormat.addInputPath(job, new Path(args[0]));

               // Preparing the output folder by first deleting it if it exists
               Path output = new Path(args[1]);
               FileSystem.get(conf).delete(output, true);
               FileOutputFormat.setOutputPath(job, output);

       Now we'll show the MarketCapitalizationMapper class.
  • 5. datasets/nasdaq/daily_prices/NASDAQ_daily_prices_subset.csv (in datasets/nasdaq/daily_prices)

       InputSplit 1:
           NASDAQ,DELL,1997-08-26,83.87,84.75,82.50,82.81,48736000,10.35
           NASDAQ,DITC,2002-10-24,1.56,1.69,1.53,1.60,133600,1.60
           NASDAQ,DLIA,2008-01-28,1.91,2.31,1.91,2.23,760800,2.23
           NASDAQ,DWCH,2002-07-10,3.09,3.14,3.09,3.14,2400,1.57

       org.hackreduce.examples.stockexchange.MarketCapitalization (expanded version)

           public static class MarketCapitalizationMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {

               // Note: sdf is a date-format field of the class that is not shown on this slide.

               protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                   String inputString = value.toString();
                   String[] attributes = inputString.split(",");

                   if (attributes.length != 9)
                       throw new IllegalArgumentException("Input string given did not have 9 values in CSV format");

                   // Declared before the try block so that they are still in scope
                   // for the market cap calculation below.
                   String stockSymbol;
                   double stockPriceClose;
                   int stockVolume;

                   try {
                       String exchange = attributes[0];
                       stockSymbol = attributes[1];
                       Date date = sdf.parse(attributes[2]);
                       double stockPriceOpen = Double.parseDouble(attributes[3]);
                       double stockPriceHigh = Double.parseDouble(attributes[4]);
                       double stockPriceLow = Double.parseDouble(attributes[5]);
                       stockPriceClose = Double.parseDouble(attributes[6]);
                       stockVolume = Integer.parseInt(attributes[7]);
                       double stockPriceAdjClose = Double.parseDouble(attributes[8]);
                   } catch (ParseException e) {
                       throw new IllegalArgumentException("Input string contained an unknown value that couldn't be parsed");
                   } catch (NumberFormatException e) {
                       throw new IllegalArgumentException("Input string contained an unknown number value that couldn't be parsed");
                   }

                   double marketCap = stockPriceClose * stockVolume;

                   context.write(new Text(stockSymbol), new DoubleWritable(marketCap));
               }
           }

       This job doesn't do a whole lot, but this is where the processing is occurring.
  • 6. datasets/nasdaq/daily_prices/NASDAQ_daily_prices_subset.csv (in datasets/nasdaq/daily_prices)

       InputSplit 1:
           NASDAQ,DELL,1997-08-26,83.87,84.75,82.50,82.81,48736000,10.35
           NASDAQ,DITC,2002-10-24,1.56,1.69,1.53,1.60,133600,1.60
           NASDAQ,DLIA,2008-01-28,1.91,2.31,1.91,2.23,760800,2.23
           NASDAQ,DWCH,2002-07-10,3.09,3.14,3.09,3.14,2400,1.57

       (line-by-line)
       MarketCapitalizationMapper
       (emits)
           (DELL, 82.81*48736000)
           (DITC, 1.60*133600)
           (DLIA, 2.23*760800)
           (DWCH, 3.14*2400)
       (sorted and partitioned to specific reducers)
       MarketCapitalizationReducer
  • 7. (coming from different mappers)
           (DELL, 82.81*48736000)
           (DELL, 31.92*18678500)
           (DELL, 23.85*16038700)
           (DELL, 30.38*68759800)
           (...)
       (but arriving at the same reducer)

       org.hackreduce.examples.stockexchange.MarketCapitalization (expanded version)

           public static class MarketCapitalizationReducer extends Reducer<Text, DoubleWritable, Text, Text> {

               NumberFormat currencyFormat = NumberFormat.getCurrencyInstance(Locale.getDefault());

               @Override
               protected void reduce(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException {
                   double highestCap = 0.0;
                   for (DoubleWritable value : values) {
                       highestCap = Math.max(highestCap, value.get());
                   }
                   context.write(key, new Text(currencyFormat.format(highestCap)));
               }
           }

       (output of this reducer)
           (DELL, $4,035,828,160.00)
  • 8. /tmp/nasdaq_marketcaps/part-r-00000

           DAIO   $1,515,345.00
           DAKT   $63,656,600.00
           DANKY  $89,668,857.00
           DARA   $1,464,720.00
           DASTY  $14,141,055.00
           DATA   $2,888,325.00
           DAVE   $5,144,800.00
           DBLE   $1,040,996.00
           DBLEP  $79,584.00
           DBRN   $131,023,326.00
           DBTK   $7,405,366.00
           DCAI   $20,058,990.00
           DCGN   $10,372,992.00
           DCOM   $12,298,208.00
           DCTH   $3,285,652.00
           DDDC   $79,176.00
           DDIC   $3,684,100.00
           DDMX   $7,811,204.00
           DDRX   $12,480,500.00
           DDSS   $4,545,438.00
           DEAR   $4,375,800.00
           DECK   $271,081,580.00
           DEER   $5,363,740.00
           DEIX   $5,285,892.00
  • 9. We can dynamically increase your clusters if you need the processing power, but it's typically bottlenecked by the code. If your job takes longer than 10 minutes to run, come see us.
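
The flow on slides 5 to 7 (map each CSV line to a (symbol, close * volume) pair, group the pairs by symbol, keep the highest value per symbol) can be tried out locally without a cluster. The following is only a rough sketch in plain Java with no Hadoop dependency; the class name MarketCapFlowSketch and its map, shuffle, reduce and run helpers are hypothetical names made up for this illustration and are not part of the Hackathon project. The input rows and the DELL pairs are the ones shown on the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the map -> shuffle -> reduce flow.
// None of these names exist in the Hackathon project.
class MarketCapFlowSketch {

    // "Map" step: one CSV line in, one (symbol, close * volume) pair out.
    static Map.Entry<String, Double> map(String line) {
        String[] attributes = line.split(",");
        if (attributes.length != 9) {
            throw new IllegalArgumentException("Input string given did not have 9 values in CSV format");
        }
        String stockSymbol = attributes[1];
        double stockPriceClose = Double.parseDouble(attributes[6]);
        long stockVolume = Long.parseLong(attributes[7]);
        return Map.entry(stockSymbol, stockPriceClose * stockVolume);
    }

    // "Shuffle" step: collect all values that share a key, with keys kept in sorted order.
    // On a real cluster the framework does this between the mappers and the reducers.
    static Map<String, List<Double>> shuffle(List<Map.Entry<String, Double>> pairs) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" step: keep the highest value seen for one key.
    static double reduce(Iterable<Double> values) {
        double highestCap = 0.0;
        for (double value : values) {
            highestCap = Math.max(highestCap, value);
        }
        return highestCap;
    }

    // Runs the three steps in sequence, in a single process.
    static Map<String, Double> run(List<String> lines) {
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(map(line));
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> entry : shuffle(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // The four rows of InputSplit 1 from the slides.
        List<String> lines = List.of(
            "NASDAQ,DELL,1997-08-26,83.87,84.75,82.50,82.81,48736000,10.35",
            "NASDAQ,DITC,2002-10-24,1.56,1.69,1.53,1.60,133600,1.60",
            "NASDAQ,DLIA,2008-01-28,1.91,2.31,1.91,2.23,760800,2.23",
            "NASDAQ,DWCH,2002-07-10,3.09,3.14,3.09,3.14,2400,1.57");
        run(lines).forEach((symbol, cap) -> System.out.printf("%s\t%.2f%n", symbol, cap));

        // The DELL pairs from slide 7, reduced to their maximum.
        List<Double> dellValues = List.of(
            82.81 * 48736000, 31.92 * 18678500, 23.85 * 16038700, 30.38 * 68759800);
        System.out.printf("DELL max\t%.2f%n", reduce(dellValues));
    }
}
```

Unlike the Hadoop job, this sketch runs everything in one process and in memory, so it says nothing about how splits are distributed or how long a job takes on a cluster. It is only meant as a quick way to check the per-record and per-key logic before submitting a job.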
