Hadoop on EC2

How to run your Hadoop clusters and HBase on EC2, without losing the data :)


  1. Hadoop on EC2: configuring and running Hadoop clusters using the Cloudera distribution
  2. My farm (screenshot)
  3. Start (screenshot)
  4. Confirm (screenshot)
  5. OK, it's running (screenshot)
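     Slides 2-5 were AWS console screenshots. The same launch can be scripted; a
     sketch with the modern AWS CLI (which postdates this deck), where the AMI ID,
     instance type, and count are illustrative:

         # launch 6 Ubuntu instances with the shmsoft_hadoop key pair
         aws ec2 run-instances \
             --image-id ami-12345678 \
             --count 6 \
             --instance-type m1.large \
             --key-name shmsoft_hadoop

         # confirm they are running (slides 4-5)
         aws ec2 describe-instances \
             --query 'Reservations[].Instances[].[InstanceId,State.Name,PublicIpAddress]'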
  6. Set /etc/hosts (screenshot; a sketch follows)
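     The later scripts address the nodes as sh1..sh6, so the workstation's
     /etc/hosts presumably mapped those short names to the instances; a sketch
     with illustrative public addresses:

         # /etc/hosts on the workstation
         50.17.0.11   sh1
         50.17.0.12   sh2
         50.17.0.13   sh3
         50.17.0.14   sh4
         50.17.0.15   sh5
         50.17.0.16   sh6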
  7. Logging into an EC2 machine

     ec2_login.sh:

         #!/bin/sh
         ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@$1

     For example:

         ec2_login.sh sh1
  8. Run a command on the cluster

     run_on_cluster.sh:

         #!/bin/bash
         for i in {1..6}
         do
             ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@sh$i "$1"
         done

     For example:

         run_on_cluster.sh 'ifconfig | grep cast'
  9. Result of running ifconfig

         inet addr:10.220.141.227  Bcast:10.220.141.255
         inet addr:10.95.31.140    Bcast:10.95.31.255
         inet addr:10.220.214.15   Bcast:10.220.215.255
         inet addr:10.94.245.56    Bcast:10.94.245.255
         inet addr:10.127.17.143   Bcast:10.127.17.255
         inet addr:10.125.79.225   Bcast:10.125.79.255
 10. Edit the local conf

         gedit masters core-site.xml mapred-site.xml slaves

     Assign the addresses to the files:

         10.220.141.227  -  masters, core-site.xml
         10.95.31.140    -  mapred-site.xml
         10.220.214.15   -  slaves
         10.94.245.56    -  slaves
         10.127.17.143   -  slaves
         10.125.79.225   -  slaves
 11. masters

         10.220.141.227
 12. core-site.xml

         <?xml version="1.0"?>
         <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
         <!-- Put site-specific property overrides in this file. -->
         <configuration>
           <property>
             <name>fs.default.name</name>
             <value>hdfs://10.220.141.227</value>
           </property>
         </configuration>
 13. mapred-site.xml

         <?xml version="1.0"?>
         <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
         <!-- Put site-specific property overrides in this file. -->
         <configuration>
           <property>
             <name>mapred.job.tracker</name>
             <value>10.95.31.140:54311</value>
           </property>
           <property>
             <name>mapred.local.dir</name>
             <value>/tmp/mapred</value>
           </property>
         </configuration>
 14. slaves

         10.220.214.15
         10.94.245.56
         10.127.17.143
         10.125.79.225
 15. update-hadoop-cluster.sh

         #!/bin/bash
         # push the local conf directory to every node...
         for i in {1..6}
         do
             scp -i ~/.ssh/shmsoft_hadoop.pem -r ~/projects/hadoop/conf ubuntu@sh$i:/home/ubuntu/
         done
         # ...then install it into /etc/hadoop/conf on all of them
         run_on_cluster.sh 'sudo cp /home/ubuntu/conf/* /etc/hadoop/conf/'
 16. Important gotchas

     Disable automatic start at boot for the Hadoop services:

         sudo chkconfig hadoop-0.20-namenode off

     Repeat for each installed service (see the loop below).
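     A sketch of the "repeat for each service" step, assuming all the services
     follow the /etc/init.d/hadoop-0.20-* naming used on slide 18:

         # disable boot-time autostart for every installed Hadoop service
         for service in /etc/init.d/hadoop-0.20-*; do
             sudo chkconfig $(basename $service) off
         done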
 17. Important gotchas - 2

     On EC2, /etc/hosts contains:

         # Added by cloud-init
         127.0.1.1       domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53

     Instead, map the hostname to 127.0.0.1:

         # Added by me - the developer
         127.0.0.1       domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53

     !!! For remote access from the other nodes, use the internal IP:

         10.220.169.157  domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53
 18. Now start the Hadoop services

     On each node:

         for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done

     Or do it with the script from slide 8:

         run_on_cluster.sh 'for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done'
 19. Verify HDFS and MapReduce

     Verify that the cluster really works:

         hadoop fs -ls /

     Copy a file into HDFS, copy it back out, and run a MapReduce job
     (a sketch follows); better yet, check the web UIs on the next slides.
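     A sketch of those checks; the file names are illustrative, and the
     examples jar path assumes the usual CDH3 layout:

         # round-trip a file through HDFS
         hadoop fs -put /etc/hosts /tmp/hosts-test
         hadoop fs -get /tmp/hosts-test /tmp/hosts-back
         diff /etc/hosts /tmp/hosts-back

         # run a small MapReduce job (compute pi with 2 maps, 100 samples each)
         hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar pi 2 100

         # cluster-wide HDFS health report
         hadoop dfsadmin -report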
 20. w3m http://localhost:50070 (the NameNode web UI)
 21. w3m, next screen (screenshot)
 22. Start HBase

     start-hbase.sh:

         #!/bin/sh
         # ZooKeeper has to be up before the master and region server
         sudo /etc/init.d/hadoop-zookeeper-server start
         sudo /etc/init.d/hadoop-hbase-master start
         sudo /etc/init.d/hadoop-hbase-regionserver start
 23. w3m http://localhost:60010 (the HBase Master web UI)
 24. Stop HBase

     1. Run a major compaction first (a sketch follows).
     2. stop-hbase.sh:

         #!/bin/sh
         sudo /etc/init.d/hadoop-hbase-master stop
         sleep 5
         sudo /etc/init.d/hadoop-hbase-regionserver stop
         # ZooKeeper goes down last
         sudo /etc/init.d/hadoop-zookeeper-server stop
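     A sketch of step 1, with a hypothetical table name; major_compact is a
     standard hbase shell command:

         echo "major_compact 'mytable'" | hbase shell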
 25. Amazon EMR (screenshot)
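     As an alternative to a hand-rolled cluster, Amazon EMR provisions Hadoop
     for you. A sketch with the current AWS CLI (which postdates this deck);
     the name, release, instance type, and count are illustrative:

         aws emr create-cluster \
             --name hadoop-on-emr \
             --release-label emr-5.36.0 \
             --applications Name=Hadoop Name=HBase \
             --instance-type m5.xlarge \
             --instance-count 6 \
             --use-default-roles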
 26. Whirr (screenshot)
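     Apache Whirr (since retired) automated this whole launch-and-configure
     cycle. A sketch, with an illustrative cluster name and the credentials
     taken from environment variables:

         # hadoop.properties
         whirr.cluster-name=myhadoopcluster
         whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,5 hadoop-datanode+hadoop-tasktracker
         whirr.provider=aws-ec2
         whirr.identity=${env:AWS_ACCESS_KEY_ID}
         whirr.credential=${env:AWS_SECRET_ACCESS_KEY}

         # launch the cluster
         whirr launch-cluster --config hadoop.properties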
