Transcript

  • 1. MapReduce 
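  • 2. MapReduce
      • A programming model introduced by Google, named after its two operations, Map and Reduce
      • Map: apply the same operation to every element
          – [1,2,3,4] – (*2) → [2,4,6,8]
      • Reduce: combine all elements into a single result
          – [1,2,3,4] – (sum) → 10
          – A divide-and-conquer approach (Divide and Conquer)
      Copyright 2009 - Trend Micro Inc.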
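The two list examples on this slide can be tried directly in plain Java. This is only a sketch of the idea on a single machine, using Java streams rather than Hadoop; the class and method names (`ListMapReduce`, `doubleAll`, `sum`) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ListMapReduce {
    // Map step: apply the same function (*2) to every element independently.
    static List<Integer> doubleAll(List<Integer> input) {
        return input.stream().map(x -> x * 2).collect(Collectors.toList());
    }

    // Reduce step: fold all elements into one value with an associative operation (sum).
    static int sum(List<Integer> input) {
        return input.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4);
        System.out.println(doubleAll(data)); // [2, 4, 6, 8]
        System.out.println(sum(data));       // 10
    }
}
```

Because each element is mapped independently and the sum is associative, both steps could in principle be split across machines, which is the property MapReduce relies on.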
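  • 3. MapReduce
      • MapReduce is Google's programming model built on two user-supplied functions, Map and Reduce
      • The user writes
          – Map: processes an input key/value pair and produces a set of intermediate key/value pairs
          – Reduce: merges all intermediate values that share the same intermediate key into output key/value pairs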
  • 4. Known applications of MapReduce
      • http://www.dbms2.com/2008/08/26/known-applications-of-mapreduce/
  • 5. MapReduce
      • Function signatures
          – map (K1, V1) → list(K2, V2)
          – reduce (K2, list(V2)) → list(K3, V3)
      • Example: grep
          – Map: (offset, line) → [(match, 1)]
          – Reduce: (match, [1, 1, ...]) → [(match, n)]
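The grep signatures above can be written out in plain Java to see how the pieces fit together. This is a single-process sketch, not Hadoop code: `GrepSketch` and its `run` method are hypothetical names, and `run` stands in for the framework's grouping step between map and reduce.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GrepSketch {
    // Map: (offset, line) -> [(match, 1)]
    static List<Map.Entry<String, Integer>> map(long offset, String line, Pattern pattern) {
        List<Map.Entry<String, Integer>> out = new ArrayList<Map.Entry<String, Integer>>();
        Matcher m = pattern.matcher(line);
        while (m.find()) {
            out.add(new SimpleEntry<String, Integer>(m.group(), 1));
        }
        return out;
    }

    // Reduce: (match, [1, 1, ...]) -> (match, n)
    static Map.Entry<String, Integer> reduce(String match, List<Integer> ones) {
        int n = 0;
        for (int one : ones) {
            n += one;
        }
        return new SimpleEntry<String, Integer>(match, n);
    }

    // Local stand-in for the framework: map every line, group by key, reduce every group.
    static Map<String, Integer> run(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        long offset = 0;
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(offset, line, pattern)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<Integer>()).add(kv.getValue());
            }
            offset += line.length() + 1; // +1 for the line terminator
        }
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            Map.Entry<String, Integer> r = reduce(e.getKey(), e.getValue());
            result.put(r.getKey(), r.getValue());
        }
        return result;
    }
}
```

For example, `GrepSketch.run(Arrays.asList("error at 1", "ok", "error error warn"), "error|warn")` groups three `("error", 1)` pairs and one `("warn", 1)` pair before reducing them.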
  • 8. Word Count
  • 9. MapReduce applications
      • Distributed Grep
          – Find the lines that match a given pattern
      • Distributed Sort
      • Count of URL Access Frequency
          – Count how often each URL appears in web access logs
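As a rough illustration of the URL access frequency case, the sketch below counts URLs in a handful of log lines on one machine. The log layout (`<client> <url>`, separated by a single space) and the names `UrlCountSketch`, `extractUrl`, and `countUrls` are assumptions made for this example; real web logs have more fields.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class UrlCountSketch {
    // Assumed (hypothetical) log layout: "<client> <url>", separated by one space.
    static String extractUrl(String logLine) {
        return logLine.split(" ")[1];
    }

    static Map<String, Long> countUrls(List<String> logLines) {
        return logLines.stream()
                .map(UrlCountSketch::extractUrl)            // map: emit the URL as the key
                .collect(Collectors.groupingBy(url -> url,  // group equal keys together
                        TreeMap::new,
                        Collectors.counting()));            // reduce: count the entries per key
    }
}
```

The map step only picks out the key and the reduce step only counts, so the same structure carries over to a distributed job where the grouping is done by the framework.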
  • 10. MapReduce Copyright 2007 - Trend Micro Inc.
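  • 11. Hadoop MapReduce
      • Apache Hadoop is an open-source implementation of Google's MapReduce
          – Provides a MapReduce framework
          – Written in Java
          – Includes the Hadoop Distributed File System (HDFS)
      • Yahoo! is a major contributor
      • Google, Yahoo!, IBM, and Amazon are among the companies using or supporting Hadoop
      • Trend Micro also uses Hadoop MapReduce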
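  • 12. Hadoop MapReduce
      • The Map/Reduce framework consists of
          – a single master JobTracker
          – one worker TaskTracker per cluster node
      • JobTracker
          – Schedules and monitors the tasks that make up a Job
          – Clients submit a Job to the JobTracker, which hands its tasks out to the TaskTrackers
      • TaskTrackers run the tasks as directed by the JobTracker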
  • 13. Hadoop MapReduce
        class MyJob {
            static class Map { /* Map logic goes here */ }
            static class Reduce { /* Reduce logic goes here */ }

            public static void main(String[] args) throws IOException {
                // Configure the job
                JobConf conf = new JobConf(MyJob.class);
                conf.setInputPath(…);
                conf.setOutputPath(…);
                conf.setMapperClass(Map.class);
                conf.setReducerClass(Reduce.class);
                // Submit the job and wait for it to finish
                JobClient.runJob(conf);
            }
        }
  • 14. Example: processing logs stored in HDFS with MapReduce
      • Each record is one comma-separated line of five fields; the second field is a GUID
      • Sample records
          1, 123, 131231231, VSAPI, open file
          2, 456, 123123123, VSAPI, connect internet
  • 15. Map
      • Implement the Mapper interface and its map() method
      • Map: (K1, V1) → list(K2, V2)
          map(WritableComparable, Writable, OutputCollector, Reporter)
      • map() is called once for each input record
      • Emit results with the OutputCollector's collect() method
          OutputCollector.collect(WritableComparable, Writable)
  • 16. Map
        import java.io.IOException;
        import java.util.Calendar;
        import org.apache.hadoop.io.*;
        import org.apache.hadoop.mapred.*;

        class MapClass extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text hour = new Text();

            public void map(LongWritable key, Text value,
                            OutputCollector<Text, IntWritable> output,
                            Reporter reporter) throws IOException {
                // Each input value is one comma-separated log record
                String[] token = value.toString().split(",");
                // Timestamp field; adjust the index to match the log layout
                String timestamp = token[1].trim();
                Calendar c = Calendar.getInstance();
                c.setTimeInMillis(Long.parseLong(timestamp));
                // HOUR_OF_DAY is 0-23; Calendar.HOUR is the 12-hour field and would merge AM and PM
                Integer h = c.get(Calendar.HOUR_OF_DAY);
                hour.set(h.toString());
                // Emit (hour, 1)
                output.collect(hour, one);
            }
        }
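The key-extraction step of the mapper can be checked without a cluster. The sketch below is plain Java with hypothetical names (`HourKeySketch`, `hourOfDay`, `hour12`); the timestamp column index and the time zone are passed in as parameters because the slides do not pin them down, and the sample timestamp is invented. It also shows how `Calendar.HOUR_OF_DAY` and `Calendar.HOUR` differ for the same record.

```java
import java.util.Calendar;
import java.util.TimeZone;

final class HourKeySketch {
    // Parse one comma-separated record and position a Calendar at its timestamp (milliseconds).
    private static Calendar calendarFor(String record, int timestampIndex, TimeZone zone) {
        String[] tokens = record.split(",");
        // trim() matters: fields separated by ", " keep a leading space after split(",")
        long millis = Long.parseLong(tokens[timestampIndex].trim());
        Calendar c = Calendar.getInstance(zone);
        c.setTimeInMillis(millis);
        return c;
    }

    // 24-hour value, 0-23.
    static int hourOfDay(String record, int timestampIndex, TimeZone zone) {
        return calendarFor(record, timestampIndex, zone).get(Calendar.HOUR_OF_DAY);
    }

    // 12-hour value, 0-11: morning and afternoon hours share the same number.
    static int hour12(String record, int timestampIndex, TimeZone zone) {
        return calendarFor(record, timestampIndex, zone).get(Calendar.HOUR);
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        // Invented record: 50400000 ms after the epoch is 14:00 UTC on 1970-01-01.
        String record = "1, 123, 50400000, VSAPI, open file";
        System.out.println(hourOfDay(record, 2, utc)); // 14
        System.out.println(hour12(record, 2, utc));    // 2
    }
}
```

Passing the time zone explicitly keeps the result the same on every machine; `Calendar.getInstance()` without arguments uses the default zone of whichever node runs the task.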
  • 17. Reduce
      • Implement the Reducer interface and its reduce() method
      • Reduce: (K2, list(V2)) → list(K3, V3)
          reduce(WritableComparable, Iterator, OutputCollector, Reporter)
      • Emit results with the OutputCollector's collect() method
          OutputCollector.collect(WritableComparable, Writable)
  • 18. Reduce
        import java.io.IOException;
        import java.util.Iterator;
        import org.apache.hadoop.io.*;
        import org.apache.hadoop.mapred.*;

        class ReduceClass extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable sumValue = new IntWritable();

            public void reduce(Text key, Iterator<IntWritable> values,
                               OutputCollector<Text, IntWritable> output,
                               Reporter reporter) throws IOException {
                // Add up all the counts emitted for this key
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                sumValue.set(sum);
                // Emit (key, total)
                output.collect(key, sumValue);
            }
        }
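To see what the reducer receives, it can help to imitate the grouping step locally. The sketch below is plain Java, not Hadoop: `SumReduceSketch` and `shuffleAndReduce` are hypothetical names, and a `TreeMap` stands in for the framework's sort-and-group phase. The `reduce` method keeps the same loop as the slide, on `String` and `Integer` instead of `Text` and `IntWritable`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SumReduceSketch {
    // Same summing loop as the reducer on the slide, on plain Java types.
    static int reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // Stand-in for the shuffle: pair every map output key with a 1, group by key, reduce each group.
    static Map<String, Integer> shuffleAndReduce(List<String> mapOutputKeys) {
        Map<String, List<Integer>> groups = new TreeMap<String, List<Integer>>();
        for (String key : mapOutputKeys) {
            groups.computeIfAbsent(key, k -> new ArrayList<Integer>()).add(1);
        }
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue().iterator()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hours emitted by the map phase, as text keys.
        System.out.println(shuffleAndReduce(Arrays.asList("9", "10", "9", "9"))); // {10=1, 9=3}
    }
}
```

The keys are text, so they sort as strings ("10" comes before "9"); zero-padding the hour or using a numeric key type would be needed for numeric order in the output.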
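  • 19. Configuring and submitting a job
      • JobConf holds the job configuration
          – Mapper, Reducer, InputFormat, OutputFormat, Combiner, Partitioner
      • Settings for the map and reduce tasks
      • JobClient submits the job described by a JobConf
          JobClient.runJob(conf);                 // submit and wait until the job completes
          JobClient.submitJob(conf);              // submit only; poll the returned RunningJob for status
          JobConf.setJobEndNotificationURI(URI);  // be notified on completion instead of polling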
  • 20. Main Function
        import java.io.IOException;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.*;
        import org.apache.hadoop.mapred.*;

        class MyJob {
            public static void main(String[] args) throws IOException {
                JobConf conf = new JobConf(MyJob.class);
                conf.setJobName("Calculate feedback log time distribution");
                // set path
                conf.setInputPath(new Path(args[0]));
                conf.setOutputPath(new Path(args[1]));
                // set map reduce
                conf.setOutputKeyClass(Text.class);          // the hour is the key
                conf.setOutputValueClass(IntWritable.class); // the count is the value
                conf.setMapperClass(MapClass.class);
                conf.setCombinerClass(ReduceClass.class);
                conf.setReducerClass(ReduceClass.class);
                conf.setInputFormat(TextInputFormat.class);
                conf.setOutputFormat(TextOutputFormat.class);
                // run
                JobClient.runJob(conf);
            }
        }
  • 21. Compile, package, and run
      1. Compile
          – javac -classpath hadoop-*-core.jar -d MyJava MyJob.java
      2. Package
          – jar -cvf MyJob.jar -C MyJava .
      3. Run
          – bin/hadoop jar MyJob.jar MyJob input/ output/
  • 22. bin/hadoop jar MyJob.jar MyJob input/ output/
  • 23. Web Console http://172.16.203.132:50030/
  • 24. Hadoop MapReduce
      • How many Mappers?
          – The number of Mappers depends on the input: Hadoop creates Mappers according to how the input is split
          – JobConf's setNumMapTasks(int) only gives Hadoop a hint; Hadoop decides the actual number of Mappers
      • How many Reducers?
          – Set the number of Reducers with JobConf.setNumReduceTasks(int)
          – With the number of Reducers set to zero, the MapReduce job runs only Map and skips Reduce
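The number of Reducers also decides how the intermediate keys are spread out, since every key must go to exactly one Reducer. The sketch below mirrors the formula used by Hadoop's default HashPartitioner, but on `String` keys in plain Java; `PartitionSketch`, `partitionFor`, and `keysPerReducer` are hypothetical names. Hadoop's `Text` computes its hash differently from `String`, so the actual partition numbers in a real job will not match these.

```java
import java.util.List;

final class PartitionSketch {
    // Clear the sign bit so the result is never negative, then take the remainder.
    // Assumes numReduceTasks >= 1.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many of the given keys each reducer would receive.
    static int[] keysPerReducer(List<String> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }
}
```

The same key always lands on the same Reducer, which is what lets one reduce() call see every value for its key; changing the Reducer count changes the assignment but not that guarantee.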
  • 25. Non-Java Interface
      • Hadoop Pipes
          – A C++ API for MapReduce
          – The C++ code communicates with the Java framework
      • Hadoop Streaming
          – Write MapReduce programs in any language that can read standard input and write standard output
  • 26. References
      • Google MapReduce paper
          – http://labs.google.com/papers/mapreduce.html
      • Google MapReduce lecture materials
          – http://code.google.com/edu/submissions/mapreduce/listing.html
      • Google MapReduce mini-lecture
          – http://code.google.com/edu/submissions/mapreduce-minilecture/listing.html
      • Hadoop
          – http://hadoop.apache.org/core/
  • 27. Tools
      • Eclipse MapReduce plugin (from IBM)
          – Develop Hadoop programs inside Eclipse
          – http://code.google.com/edu/parallel/tools/hadoopvm/hadoop-eclipse-plugin.jar
      • Hadoop virtual machine image (from Google)
          – A VMware image with Hadoop already set up, provided by Google
          – http://code.google.com/edu/parallel/tools/hadoopvm/index.html