Let's Talk a Little About Hadoop (Working Title)
Let's Talk a Little About Hadoop (Working Title): Presentation Transcript

  • Hadoop Map/Reduce Hadoop Hadoop(Map/Reduce)
  • ✓ Open-source implementation modeled on Google's Map/Reduce and GFS ✓ Written in Java - Apache project - http://hadoop.apache.org/
  • ✓ - HDD - ✓ ✓ ✓
  • ✓ ✓Map Reduce
  • ✓ - grep - - ✓ - - -
  • PV UU 30+2
  • 5+1
  • • JobTracker and TaskTracker: Map/Reduce • NameNode and DataNode: HDFS
  • •JobTracker NameNode •SecondaryNameNode NameNode •TaskTracker DataNode •JobTracker/NameNode •TaskTracker/DataNode
  • ✓ - MapTask - ReduceTask - JobClient JobTracker Job - HDFS - Map/Reduce
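  • // WordCount job skeleton, written against the org.apache.hadoop.mapred (JobConf-based) API.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class WordCount {
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            // Map (body elided on the slide; map(...) has to be implemented here)
        }

        public static class Reduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            // Reduce (body elided on the slide; reduce(...) has to be implemented here)
        }

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("wordcount");

            // Types of the (key, value) pairs the job writes out
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            conf.setMapperClass(Map.class);
            conf.setReducerClass(Reduce.class);

            // Plain-text input and output
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);

            // Input and output paths come from the command line
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // Submit the job and wait for it to finish
            JobClient.runJob(conf);
        }
    }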
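The WordCount listing leaves the Map and Reduce bodies out. As a rough illustration of what those two steps usually compute for word counting, here is a plain-Java sketch with no Hadoop dependency. The class name `WordCountSketch` and the methods `map`, `reduce`, and `run` are hypothetical names chosen for this example, and `run` is only a single-process stand-in for the grouping that the framework does between the two phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {
    // Map step: emit a (word, 1) pair for every whitespace-separated token of one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return out;
    }

    // Reduce step: sum all counts that were emitted for a single word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Stand-in for the framework (an assumption of this sketch, not Hadoop code):
    // group the map output by key, then call reduce once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, map=3, reduce=2}
        System.out.println(run(Arrays.asList("hadoop map reduce", "map reduce map")));
    }
}
```

In a real job the same logic would sit inside `Mapper.map` and `Reducer.reduce`, with `Text` and `IntWritable` in place of `String` and `Integer`.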
  • ✓Hadoop Streaming ✓LibHDFS ✓Hadoop Pipes ✓Amazon Elastic MapReduce
  • ✓Hadoop hadoop-streaming.jar ✓ Map/Reduce - C Perl Ruby Python Map/Reduce - Map/Reduce
  • ✓Map:cat / Reduce:wc ✓HDFS
    $ hadoop dfs -copyFromLocal /usr/local/hadoop/*.txt /dfs/test/input/
    $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper cat -reducer "wc -l"
    09/09/26 17:00:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    09/09/26 17:00:30 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    09/09/26 17:00:30 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/09/26 17:00:31 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/09/26 17:00:33 INFO streaming.StreamJob: map 0% reduce 0%
    09/09/26 17:00:42 INFO streaming.StreamJob: map 100% reduce 0%
    09/09/26 17:00:44 INFO streaming.StreamJob: map 100% reduce 100%
    09/09/26 17:00:44 INFO streaming.StreamJob: Output: /dfs/test/output
    $ hadoop dfs -cat /dfs/test/output/*
    8842
  • ✓python http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python
    $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper "python map.py" -reducer "python reduce.py"
    09/09/26 17:29:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    09/09/26 17:29:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/09/26 17:29:30 INFO streaming.StreamJob: map 0% reduce 0%
    09/09/26 17:29:33 INFO streaming.StreamJob: map 100% reduce 0%
    09/09/26 17:29:35 INFO streaming.StreamJob: map 100% reduce 100%
    09/09/26 17:29:35 INFO streaming.StreamJob: Output: /dfs/test/output
    $ hadoop dfs -cat /dfs/test/output/*
    via       1942
    the       1476
    to        1394
    in        819
    a         816
    cutting)  740
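The transcript does not include the `map.py` and `reduce.py` scripts themselves. As a hedged sketch of the contract a streaming reducer works under (records arrive on standard input as `key<TAB>value` lines, already sorted by key, and results go to standard output in the same shape), here is the reducer-side summing logic in Java, to stay consistent with the deck's WordCount listing. The class name `StreamingReduceSketch` is hypothetical, and the sketch assumes the mapper emitted `word<TAB>1` lines.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class StreamingReduceSketch {
    // Takes "word<TAB>count" lines that are already sorted by word and returns
    // one "word<TAB>total" line per distinct word.
    static List<String> reduce(Iterable<String> sortedLines) {
        List<String> out = new ArrayList<>();
        String currentWord = null;
        long total = 0;
        for (String line : sortedLines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // not a key<TAB>value record; ignore it
            }
            String word = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            if (!word.equals(currentWord)) {
                // Key changed: the previous word is complete, so emit its total.
                if (currentWord != null) {
                    out.add(currentWord + "\t" + total);
                }
                currentWord = word;
                total = 0;
            }
            total += count;
        }
        if (currentWord != null) {
            out.add(currentWord + "\t" + total);
        }
        return out;
    }

    public static void main(String[] args) {
        // Read every line from standard input, reduce, and print the totals.
        List<String> input = new BufferedReader(new InputStreamReader(System.in))
                .lines().collect(Collectors.toList());
        for (String line : reduce(input)) {
            System.out.println(line);
        }
    }
}
```

The sketch relies on the input being sorted by key, which is why it only has to remember one word at a time.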
  • ✓ C API for HDFS - http://wiki.apache.org/hadoop/LibHDFS
  • ✓C C++ HDFS Map/Reduce API http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/pipes/package-summary.html
  • ✓Amazon EC2 MapReduce http://aws.amazon.com/elasticmapreduce/
  • ✓ - NameNode/TaskTracker - ✓HDFS - HDFS DataNode
  • ✓JMX metrics - Hadoop - Hadoop Java jmxremote - http://www.cloudera.com/blog/2009/03/12/hadoop-metrics/ ✓metrics - DFS / MapReduce / JVM / RPC - Map/Reduce Task (Keyword Tracker
  • ✓JobTracker - (http://jobtracker:50030/jobtracker.jsp) - Map/Reduce
  • ✓ wiki - http://wiki.apache.org/hadoop/FAQ ✓Yahoo - http://www.docstoc.com/docs/3766688/Hadoop-Map-Reduce-Tuning-and-Debugging-Arun-C-Murthy-acmurthy
  • ✓TaskTracker Map Reduce - (hadoop-site.xml) mapred.tasktracker.reduce.tasks.maximum - TaskTracker - TaskTracker 4 8GB
  • ✓Map→Reduce - io.sort.mb - io.sort.factor - io.sort.record.percent - io.sort.spill.percent
  • ✓Reduce - mapred.reduce.parallel.copies
  • ✓Map - mapred.compress.map.output (true)
  • ✓Map→Reduce HDFS - fs.inmemory.size.mb
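The tuning slides name properties but the transcript does not show how they are written down. As a minimal sketch, assuming they go into a site file in Hadoop's usual `<configuration>`/`<property>` layout, the Java snippet below renders a few of the property names from these slides as XML. The class name `TuningConfigSketch` is hypothetical, and every value except `mapred.compress.map.output=true` is a placeholder chosen for illustration, not a recommendation from the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TuningConfigSketch {
    // Renders name/value pairs in the <configuration>/<property> layout.
    // Assumes names and values contain no XML special characters (no escaping is done).
    static String toSiteXml(Map<String, String> props) {
        StringBuilder xml = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            xml.append("  <property>\n")
               .append("    <name>").append(e.getKey()).append("</name>\n")
               .append("    <value>").append(e.getValue()).append("</value>\n")
               .append("  </property>\n");
        }
        return xml.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the properties in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapred.tasktracker.reduce.tasks.maximum", "4"); // placeholder value
        props.put("io.sort.mb", "200");                            // placeholder value
        props.put("io.sort.factor", "20");                         // placeholder value
        props.put("mapred.reduce.parallel.copies", "10");          // placeholder value
        props.put("mapred.compress.map.output", "true");           // value given on the slide
        System.out.print(toSiteXml(props));
    }
}
```

Suitable values depend on the cluster's cores, memory, and workload, so the numbers above should be treated as stand-ins to be measured and adjusted.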
  • ✓Reduce HDFS ✓ HDFS - org.apache.hadoop.mapred.lib.NullOutputFormat
  • ✓Reduce ✓ Reduce - -
  • ✓MRUnit - MapTask/ReduceTask - cloudera Hadoop - http://www.cloudera.com/hadoop-mrunit ✓JMock - Mock - http://www.jmock.org/ ✓Hadoop - (´ ω `)
  • ✓Hudson Hadoop - zero conf Hudson Hadoop - Hudson Hadoop - http://d.hatena.ne.jp/kkawa/20090315/p1 - http://weblogs.java.net/blog/kohsuke/archive/2009/03/instantly_turni.html
  • ✓ ✓Hadoop Streaming Java ✓Let's Try Hadoop Programming!