Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
✓Google Map/Reduce             GFS



✓Java
 - Apache
 - http://hadoop.apache.org/
✓

✓
✓

- HDD
-
✓
✓


✓
Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
✓

✓Map   Reduce
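As a rough illustration of the Map and Reduce phases named on this slide, the sketch below runs both phases in memory, with a grouping step in between standing in for the shuffle. It is not Hadoop code. The helper `run_mapreduce`, the `"<user> <path>"` log format, and the sample data are all hypothetical and chosen only to show the data flow.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Apply mapper to each record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: one record may emit zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle stand-in: gather every value that shares a key.
            groups[key].append(value)
    # Reduce phase: one call per key, visited in sorted key order.
    return [(key, reducer(key, groups[key])) for key in sorted(groups)]


def page_mapper(line):
    # Hypothetical log format: "<user> <path>"
    _user, path = line.split()
    yield path, 1


def sum_reducer(_key, values):
    return sum(values)


if __name__ == "__main__":
    log = ["alice /index", "bob /index", "alice /about"]
    for path, views in run_mapreduce(log, page_mapper, sum_reducer):
        print(path, views)
```

On a real cluster the map calls run in parallel on many nodes and the framework performs the grouping; the single-process loop here only mirrors the order of the phases.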
✓
- grep
-
-
✓
-

-
-
PV UU
30+2
5+1
Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
•JobTracker TaskTracker Map/Reduce
•NameNode DataNode HDFS
•JobTracker NameNode
•SecondaryNameNode NameNode

•TaskTracker DataNode

•JobTracker/NameNode
•TaskTracker/DataNode
Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
✓
-   MapTask
-   ReduceTask
-   JobClient    JobTracker Job
-   HDFS
-   Map/Reduce
public class WordCount {
    public static class Map extends MapReduceBase implements Mapper<LongWritable,
Text, Text, Int...
✓Hadoop Streaming
✓LibHDFS
✓Hadoop Pipes
✓Amazon Elastic MapReduce
✓Hadoop      hadoop-streaming.jar
✓                 Map/Reduce


 - C Perl Ruby Python
     Map/Reduce
 -      Map/Reduce
✓Map:cat / Reduce:wc
✓HDFS
$ hadoop dfs -copyFromLocal /usr/local/hadoop/*.txt /dfs/test/input/
$ hadoop jar /usr/local/ha...
✓python
   http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python


$ hadoop jar /usr/local/hadoop...
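The slide does not show the contents of `map.py` and `reduce.py`. The following is a minimal sketch of what a word-count pair for Hadoop Streaming could look like, assuming the usual Streaming convention of tab-separated `key<TAB>value` lines and reducer input sorted by key. The function names `map_lines` and `reduce_lines` are hypothetical, and the sample text is made up.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper logic: emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer logic: sum the counts per word. Input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Local dry run: sorted() stands in for the sort Hadoop does between phases.
    text = ["the cat", "the hat"]
    for out in reduce_lines(sorted(map_lines(text))):
        print(out)
```

To use this under Streaming, each function would be wrapped in its own script that iterates over `sys.stdin` and prints the yielded lines. The dry run above does the same as piping a file through the mapper, `sort`, and the reducer on one machine.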
✓C HDFS
http://wiki.apache.org/hadoop/LibHDFS
✓C C++ HDFS                                Map/Reduce
                      API
http://hadoop.apache.org/common/docs/r0.20...
✓Amazon EC2                                MapReduce
 http://aws.amazon.com/elasticmapreduce/
Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
✓
- NameNode/TaskTracker
-
✓HDFS
- HDFS    DataNode
✓JMX              metrics
 - Hadoop

 - Hadoop Java                    jmxremote
 - http://www.cloudera.com/blog/2009/03/1...
✓JobTracker
 -              (http://jobtracker:50030/jobtracker.jsp)

 - Map/Reduce
✓      wiki
 - http://wiki.apache.org/hadoop/FAQ
✓Yahoo
 - http://www.docstoc.com/docs/3766688/Hadoop-Map-Reduce-Tuning...
✓TaskTracker                         Map
 Reduce
 -               hadoop-site.xml)


     mapred.tasktracker.reduce.tasks....
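Settings such as the per-TaskTracker task limits are written to `hadoop-site.xml` as `<property>` entries. The sketch below renders entries in that layout so the shape of the file is visible. `render_site_xml` is a hypothetical helper, the values 4 and 2 are placeholders rather than recommendations, and `mapred.tasktracker.map.tasks.maximum` is assumed here as the map-side counterpart of the reduce setting named on the slide.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Build a <configuration> document with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values: tune to the cores and memory of each TaskTracker node.
    print(render_site_xml({
        "mapred.tasktracker.map.tasks.maximum": 4,
        "mapred.tasktracker.reduce.tasks.maximum": 2,
    }))
```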
✓Map→Reduce


 -   io.sort.mb
 -   io.sort.factor
 -   io.sort.record.percent
 -   io.sort.spill.percent
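To make the relationship between these sort settings concrete, here is a small arithmetic sketch under a simplified model: `io.sort.mb` is the total map-side sort buffer, `io.sort.record.percent` is the fraction of it reserved for per-record accounting, and `io.sort.spill.percent` is the fill level at which a spill to disk starts. The default values used (100, 0.05, 0.80) are assumed to match Hadoop 0.20, and `sort_buffer_layout` is a hypothetical helper, not a Hadoop API.

```python
def sort_buffer_layout(io_sort_mb=100, record_percent=0.05, spill_percent=0.80):
    """Split the sort buffer into accounting and data regions (simplified model)."""
    total = io_sort_mb * 1024 * 1024
    # Share of the buffer set aside for record bookkeeping.
    accounting = int(total * record_percent)
    # The remainder holds the serialized map output itself.
    data = total - accounting
    return {
        "total_bytes": total,
        "accounting_bytes": accounting,
        "data_bytes": data,
        # Fill level of the data region at which a background spill begins.
        "data_spill_threshold_bytes": int(data * spill_percent),
    }


if __name__ == "__main__":
    for name, size in sort_buffer_layout().items():
        print(f"{name}: {size}")
```

Raising `io.sort.mb` enlarges both regions, while `io.sort.record.percent` only shifts the boundary between them. Jobs with many small records fill the accounting region first, which is the case where adjusting that fraction matters.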
✓Reduce


 - mapred.reduce.parallel.copies
✓Map

 - mapred.compress.map.output (true)
✓Map→Reduce
                         HDFS


 - fs.inmemory.size.mb
✓Reduce              HDFS


✓        HDFS
 - org.apache.hadoop.mapred.lib.NullOutputFormat
✓Reduce
✓     Reduce


 -

 -
✓MRUnit
 - MapTask/ReduceTask
 - cloudera               Hadoop
 - http://www.cloudera.com/hadoop-mrunit

✓JMock
 -        ...
✓Hudson Hadoop
 - zero conf Hudson               Hadoop


 - Hudson                Hadoop


 - http://d.hatena.ne.jp/kkawa...
✓

✓Hadoop Streaming          Java


✓Let's Try Hadoop Programming!
Let Me Talk a Bit About Hadoop (working title)
Published in: Technology
  • Transcript of "Let Me Talk a Bit About Hadoop (working title)"

    1. 1. Hadoop Map/Reduce Hadoop Hadoop(Map/Reduce)
    2. 2. ✓Google Map/Reduce GFS ✓Java - Apache - http://hadoop.apache.org/
    3. 3. ✓ ✓
    4. 4. ✓ - HDD - ✓ ✓ ✓
    5. 5. Hadoop Map/Reduce Hadoop Hadoop(Map/Reduce)
    6. 6. ✓ ✓Map Reduce
    7. 7. ✓ - grep - - ✓ - - -
    8. 8. PV UU 30+2
    9. 9. 5+1
    10. 10. Hadoop Map/Reduce Hadoop Hadoop(Map/Reduce)
    11. 11. •JobTracker TaskTracker Map/Reduce •NameNode DataNode HDFS
    12. 12. •JobTracker NameNode •SecondaryNameNode NameNode •TaskTracker DataNode •JobTracker/NameNode •TaskTracker/DataNode
    13. 13. Hadoop Map/Reduce Hadoop Hadoop(Map/Reduce)
    14. 14. ✓ - MapTask - ReduceTask - JobClient JobTracker Job - HDFS - Map/Reduce
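    15. 15.
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.*;
        import org.apache.hadoop.mapred.*;

        public class WordCount {

            // Mapper: input (LongWritable offset, Text line), output (Text, IntWritable).
            public static class Map extends MapReduceBase
                    implements Mapper<LongWritable, Text, Text, IntWritable> {
                // Map (the map() body is omitted on the slide)
            }

            // Reducer: input (Text, IntWritable values), output (Text, IntWritable).
            public static class Reduce extends MapReduceBase
                    implements Reducer<Text, IntWritable, Text, IntWritable> {
                // Reduce (the reduce() body is omitted on the slide)
            }

            public static void main(String[] args) throws Exception {
                JobConf conf = new JobConf(WordCount.class);
                conf.setJobName("wordcount");

                // Types of the job's output key/value pairs.
                conf.setOutputKeyClass(Text.class);
                conf.setOutputValueClass(IntWritable.class);

                conf.setMapperClass(Map.class);
                conf.setReducerClass(Reduce.class);

                conf.setInputFormat(TextInputFormat.class);
                conf.setOutputFormat(TextOutputFormat.class);

                // args[0]: input path, args[1]: output path.
                FileInputFormat.setInputPaths(conf, new Path(args[0]));
                FileOutputFormat.setOutputPath(conf, new Path(args[1]));

                // Submit the job and block until it finishes.
                JobClient.runJob(conf);
            }
        }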
    17. 17. ✓Hadoop Streaming ✓LibHDFS ✓Hadoop Pipes ✓Amazon Elastic MapReduce
    18. 18. ✓Hadoop hadoop-streaming.jar ✓ Map/Reduce - C Perl Ruby Python Map/Reduce - Map/Reduce
    19. 19. ✓Map:cat / Reduce:wc ✓HDFS
        $ hadoop dfs -copyFromLocal /usr/local/hadoop/*.txt /dfs/test/input/
        $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper cat -reducer "wc -l"
        09/09/26 17:00:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
        09/09/26 17:00:30 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
        09/09/26 17:00:30 INFO mapred.FileInputFormat: Total input paths to process : 4
        09/09/26 17:00:31 INFO mapred.FileInputFormat: Total input paths to process : 4
        09/09/26 17:00:33 INFO streaming.StreamJob: map 0% reduce 0%
        09/09/26 17:00:42 INFO streaming.StreamJob: map 100% reduce 0%
        09/09/26 17:00:44 INFO streaming.StreamJob: map 100% reduce 100%
        09/09/26 17:00:44 INFO streaming.StreamJob: Output: /dfs/test/output
        $ hadoop dfs -cat /dfs/test/output/*
        8842
    20. 20. ✓python http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python
        $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper "python map.py" -reducer "python reduce.py"
        09/09/26 17:29:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
        09/09/26 17:29:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
        09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
        09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
        09/09/26 17:29:30 INFO streaming.StreamJob: map 0% reduce 0%
        09/09/26 17:29:33 INFO streaming.StreamJob: map 100% reduce 0%
        09/09/26 17:29:35 INFO streaming.StreamJob: map 100% reduce 100%
        09/09/26 17:29:35 INFO streaming.StreamJob: Output: /dfs/test/output
        $ hadoop dfs -cat /dfs/test/output/*
        via 1942
        the 1476
        to 1394
        in 819
        a 816
        cutting) 740
    21. 21. ✓C HDFS http://wiki.apache.org/hadoop/LibHDFS
    22. 22. ✓C C++ HDFS Map/Reduce API http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/pipes/package-summary.html
    23. 23. ✓Amazon EC2 MapReduce http://aws.amazon.com/elasticmapreduce/
    24. 24. Hadoop Map/Reduce Hadoop Hadoop(Map/Reduce)
    25. 25. ✓ - NameNode/TaskTracker - ✓HDFS - HDFS DataNode
    26. 26. ✓JMX metrics - Hadoop - Hadoop Java jmxremote - http://www.cloudera.com/blog/2009/03/12/hadoop-metrics/ ✓metrics - DFS / MapReduce / JVM / RPC - Map/Reduce Task (Keyword Tracker
    27. 27. ✓JobTracker - (http://jobtracker:50030/jobtracker.jsp) - Map/Reduce
    28. 28. ✓ wiki - http://wiki.apache.org/hadoop/FAQ ✓Yahoo - http://www.docstoc.com/docs/3766688/Hadoop-Map-Reduce-Tuning-and-Debugging-Arun-C-Murthy-acmurthy
    29. 29. ✓TaskTracker Map Reduce - hadoop-site.xml) mapred.tasktracker.reduce.tasks.maximum - TaskTracker - TaskTracker 4 8GB
    30. 30. ✓Map→Reduce - io.sort.mb - io.sort.factor - io.sort.record.percent - io.sort.spill.percent
    31. 31. ✓Reduce - mapred.reduce.parallel.copies
    32. 32. ✓Map - mapred.compress.map.output (true)
    33. 33. ✓Map→Reduce HDFS - fs.inmemory.size.mb
    34. 34. ✓Reduce HDFS ✓ HDFS - org.apache.hadoop.mapred.lib.NullOutputFormat
    35. 35. ✓Reduce ✓ Reduce - -
    36. 36. ✓MRUnit - MapTask/ReduceTask - cloudera Hadoop - http://www.cloudera.com/hadoop-mrunit ✓JMock - Mock - http://www.jmock.org/ ✓Hadoop - (´ ω `)
    37. 37. ✓Hudson Hadoop - zero conf Hudson Hadoop - Hudson Hadoop - http://d.hatena.ne.jp/kkawa/20090315/p1 - http://weblogs.java.net/blog/kohsuke/archive/2009/03/instantly_turni.html
    38. 38. ✓ ✓Hadoop Streaming Java ✓Let's Try Hadoop Programming!