Hadoop

An introduction to "Apache Hadoop", the open-source implementation of MapReduce also used at Yahoo!, and how to run it

  1. $ env | grep JAVA
     JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home
     $ java -version
     java version "1.5.0_07"
     Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-154)
     Java HotSpot(TM) Client VM (build 1.5.0_07-87, mixed mode, sharing)
     $ curl -O http://www.apache.org/dist/lucene/hadoop/stable/hadoop-0.13.0.tar.gz
     $ tar zxvf hadoop-0.13.0.tar.gz
     $ cd hadoop-0.13.0
     $ bin/hadoop version
     Hadoop 0.13.0
     Subversion https://svn.apache.org/repos/asf/lucene/hadoop/branches/branch-0.13 -r 544207
     Compiled by cutting on Mon Jun 4 12:01:18 PDT 2007
     $
  2. $ bin/hadoop jar hadoop-0.13.0-examples.jar wordcount <input dir> <output dir>
     $ bin/hadoop jar hadoop-0.13.0-examples.jar pi <num maps> <samples per map>
     $ bin/hadoop jar hadoop-0.13.0-examples.jar grep <input dir> <output dir> <regex>
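The examples jar's pi job estimates pi Monte Carlo style: each map task samples random points in the unit square and counts how many land inside the quarter circle, and the reduce step combines the counts into an estimate. A single-process plain-Java sketch of that idea, with no Hadoop involved and a hypothetical class name:

```java
import java.util.Random;

public class PiEstimate {
    // Sample `samples` random points in the unit square and count how many
    // fall inside the quarter circle x^2 + y^2 <= 1. The fraction inside
    // approximates pi/4, so 4 * inside / samples approximates pi.
    static double estimate(long samples, long seed) {
        Random rnd = new Random(seed);
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) inside++;
        }
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        // Roughly 3.14 for a million samples.
        System.out.println(estimate(1_000_000, 42L));
    }
}
```

In the real job each map task would run its own sampling loop in parallel and emit its `inside` count, which is exactly the shape of work MapReduce parallelizes well.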
  3. /Users/kaku/hadoop-deployed
     /Users/kaku/hadoop-deployed/hadoop-0.13.0
     /Users/kaku/hadoop-deployed/filesystem
  4. $ ssh-keygen -t rsa
     $ vi ~/.ssh/authorized_keys
  5. $ cd ~/hadoop-deployed/hadoop-0.13.0
     $ cat conf/slaves
     localhost
     $ vi conf/slaves
     $ cat conf/slaves
     mac01
     mac02
     mac03
     $
  6. export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home
     export HADOOP_HOME=/Users/kaku/hadoop-deployed/hadoop-0.13.0
  7. <configuration>
       <property>
         <name>fs.default.name</name>
         <value>mac01:9000</value>
       </property>
       <property>
         <name>mapred.job.tracker</name>
         <value>mac01:9001</value>
       </property>
       <property>
         <name>mapred.map.tasks</name>
         <value>17</value>
       </property>
       <property>
         <name>mapred.reduce.tasks</name>
         <value>7</value>
       </property>
  8.   <property>
         <name>dfs.name.dir</name>
         <value>/Users/kaku/hadoop-deployed/filesystem/name</value>
       </property>
       <property>
         <name>dfs.data.dir</name>
         <value>/Users/kaku/hadoop-deployed/filesystem/data</value>
       </property>
       <property>
         <name>mapred.system.dir</name>
         <value>/Users/kaku/hadoop-deployed/filesystem/mapred/system</value>
       </property>
       <property>
         <name>mapred.local.dir</name>
         <value>/Users/kaku/hadoop-deployed/filesystem/mapred/local</value>
       </property>
       <property>
         <name>dfs.replication</name>
         <value>2</value>
       </property>
     </configuration>
  9. $ rsync -avrz -e ssh ~/hadoop-deployed/hadoop-0.13.0/ mac02:~/hadoop-deployed/hadoop-0.13.0/
     $ rsync -avrz -e ssh ~/hadoop-deployed/hadoop-0.13.0/ mac03:~/hadoop-deployed/hadoop-0.13.0/
 10. $ cd ~/hadoop-deployed/hadoop-0.13.0
     $ bin/hadoop namenode -format
     07/07/21 05:26:52 INFO dfs.Storage: Storage directory /Users/kaku/hadoop-deployed/filesystem/name has been successfully formatted.
     $ find ~/hadoop-deployed/filesystem
     /Users/kaku/hadoop-deployed/filesystem
     /Users/kaku/hadoop-deployed/filesystem/name
     /Users/kaku/hadoop-deployed/filesystem/name/current
     /Users/kaku/hadoop-deployed/filesystem/name/current/edits
     /Users/kaku/hadoop-deployed/filesystem/name/current/fsimage
     /Users/kaku/hadoop-deployed/filesystem/name/current/fstime
     /Users/kaku/hadoop-deployed/filesystem/name/current/VERSION
     /Users/kaku/hadoop-deployed/filesystem/name/image
     /Users/kaku/hadoop-deployed/filesystem/name/image/fsimage
     $
 11. $ cd ~/hadoop-deployed/hadoop-0.13.0
     $ bin/start-all.sh
     starting namenode, logging to /Users/kaku/hadoop-deployed/hadoop-0.13.0/logs/hadoop-kaku-namenode-mac01.out
     macbook.local: starting datanode, logging to /Users/kaku/hadoop-deployed/hadoop-0.13.0/logs/hadoop-kaku-datanode-mac01.out
     localhost: starting secondarynamenode, logging to /Users/kaku/hadoop-deployed/hadoop-0.13.0/logs/hadoop-kaku-secondarynamenode-mac01.out
     starting jobtracker, logging to /Users/kaku/hadoop-deployed/hadoop-0.13.0/logs/hadoop-kaku-jobtracker-mac01.out
     macbook.local: starting tasktracker, logging to /Users/kaku/hadoop-deployed/hadoop-0.13.0/logs/hadoop-kaku-tasktracker-mac01.out
     $
 12. $ lsof -i:9000
     COMMAND  PID USER FD TYPE DEVICE    SIZE/OFF NODE NAME
     java    1274 kaku 8u IPv6 0x3967a24 0t0      TCP  [::127.0.0.1]:cslistener (LISTEN)
     $ lsof -i:9001
     COMMAND  PID USER FD TYPE DEVICE    SIZE/OFF NODE NAME
     java    1432 kaku 9u IPv6 0x5bdd7f8 0t0      TCP  [::127.0.0.1]:etlservicemgr (LISTEN)
     $
 13. $ cd ~/hadoop-deployed/hadoop-0.13.0
     $ bin/stop-all.sh
     stopping jobtracker
     mac01: stopping tasktracker
     stopping namenode
     mac01: stopping datanode
     localhost: stopping secondarynamenode
     $
 14. Input:         Java Standard Edition and Java Enterprise Edition
     Map output:    <Java,1> <Standard,1> <Edition,1> <and,1> <Java,1> <Enterprise,1> <Edition,1>
     Reduce output: <Java,2> <Standard,1> <Edition,2> <and,1> <Enterprise,1>
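The data flow above can be mimicked in plain Java with no Hadoop involved: tokenize like the map step, then let a `TreeMap` play the role of shuffle-plus-reduce by grouping identical words and summing their counts. Class and method names here are illustrative only:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountFlow {
    // Simulate map -> shuffle -> reduce for word count in one JVM:
    // the map step emits <word, 1> per token; merge() groups by key
    // and sums the 1s, which is exactly what the reduce step does.
    static Map<String, Integer> wordCount(String input) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(input);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("Java Standard Edition and Java Enterprise Edition"));
        // -> {Edition=2, Enterprise=1, Java=2, Standard=1, and=1}
    }
}
```

Hadoop distributes the same three phases across machines: map tasks tokenize different input splits, the framework routes all pairs with the same key to one reduce task, and each reduce task sums its group.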
 15. import java.io.IOException;
     import java.util.StringTokenizer;
     import org.apache.hadoop.io.*;
     import org.apache.hadoop.mapred.*;

     public class WordCountMapper extends MapReduceBase implements Mapper {
         private static final IntWritable ONE = new IntWritable(1);

         public void map(WritableComparable key, Writable value,
                         OutputCollector output, Reporter reporter) throws IOException {
             // Emit <word, 1> for every whitespace-separated token in the line.
             StringTokenizer itr = new StringTokenizer(value.toString());
             while (itr.hasMoreTokens()) {
                 output.collect(new Text(itr.nextToken()), ONE);
             }
         }
     }
 16. import java.io.IOException;
     import java.util.Iterator;
     import org.apache.hadoop.io.*;
     import org.apache.hadoop.mapred.*;

     public class WordCountReducer extends MapReduceBase implements Reducer {
         public void reduce(WritableComparable key, Iterator values,
                            OutputCollector output, Reporter reporter) throws IOException {
             // Sum the 1s emitted by the mappers for this word.
             int sum = 0;
             while (values.hasNext()) {
                 sum += ((IntWritable) values.next()).get();
             }
             output.collect(key, new IntWritable(sum));
         }
     }
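The contract of the reducer above is worth seeing in isolation: each `reduce` call receives one key plus an iterator over every value emitted for that key, possibly partial counts from several map tasks, and folds them into one result. A dependency-free mirror of that summing loop, with a hypothetical class name:

```java
import java.util.Arrays;
import java.util.Iterator;

public class ReduceSum {
    // Plain-int mirror of WordCountReducer.reduce(): consume the iterator
    // of per-mapper counts for a single key and return their total.
    static int reduce(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        // "Java" was emitted twice on slide 14: <Java,1> <Java,1> -> <Java,2>
        System.out.println(reduce(Arrays.asList(1, 1).iterator()));
        // -> 2
    }
}
```

Because addition is associative, the same function can also run as a combiner on each map node to pre-sum local counts before they cross the network.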
