Hadoop on EC2

How to run your Hadoop clusters and HBase on EC2, without losing the data :)

Transcript

  • 1. Hadoop on EC2: configuring and running Hadoop clusters using the Cloudera distribution
  • 2. My farm
    • (screenshot)
  • 3. Start
    • (screenshot)
  • 4. Confirm
    • (screenshot)
  • 5. OK, it's running
    • (screenshot)
  • 6. Set /etc/hosts
    • (screenshot)
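    • The scripts on the following slides address the nodes as sh1 through sh6, so the local /etc/hosts maps those aliases to each instance's public address. A sketch, with placeholder public IPs:
    • # local /etc/hosts (placeholder public IPs)
      50.16.0.101  sh1
      50.16.0.102  sh2
      50.16.0.103  sh3
      50.16.0.104  sh4
      50.16.0.105  sh5
      50.16.0.106  sh6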
  • 7. Logging into an EC2 machine
    • ec2_login.sh:
    • #!/bin/sh
      ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@$1
    • For example,
    • ec2_login.sh sh1
  • 8. Run command on the cluster
    • run_on_cluster.sh:
    • #!/bin/bash
      for i in {1..6}
      do
        ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@sh$i "$1"
      done
    • For example:
    • run_on_cluster.sh 'ifconfig | grep cast'
  • 9. Result of running ifconfig
    • inet addr:10.220.141.227  Bcast:10.220.141.255  
    • inet addr:10.95.31.140  Bcast:10.95.31.255  
    • inet addr:10.220.214.15  Bcast:10.220.215.255  
    • inet addr:10.94.245.56  Bcast:10.94.245.255  
    • inet addr:10.127.17.143  Bcast:10.127.17.255  
    • inet addr:10.125.79.225  Bcast:10.125.79.255 
  • 10. Edit local conf
    • gedit masters core-site.xml mapred-site.xml slaves
    • 10.220.141.227    -    masters, core-site.xml 10.95.31.140    -    mapred-site.xml 10.220.214.15    -    slaves... 10.94.245.56 10.127.17.143 10.125.79.225
  • 11. masters
    • 10.220.141.227
  • 12. core-site.xml
    • <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!-- Put site-specific property overrides in this file. -->
      <configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://10.220.141.227</value>
        </property>
      </configuration>
  • 13. mapred-site.xml
    • <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!-- Put site-specific property overrides in this file. -->
      <configuration>
        <property>
          <name>mapred.job.tracker</name>
          <value>10.95.31.140:54311</value>
        </property>
        <property>
          <name>mapred.local.dir</name>
          <value>/tmp/mapred</value>
        </property>
      </configuration>
  • 14. slaves
    • 10.220.214.15 10.94.245.56 10.127.17.143 10.125.79.225
  • 15. update-hadoop-cluster.sh
    • #!/bin/bash
      for i in {1..6}
      do
        scp -i ~/.ssh/shmsoft_hadoop.pem -r ~/projects/hadoop/conf ubuntu@sh$i:/home/ubuntu/
      done
    • run_on_cluster.sh 'sudo cp /home/ubuntu/conf/* /etc/hadoop/conf/'
  • 16. Important gotchas
    • sudo chkconfig hadoop-0.20-namenode off
    • repeat for each installed service (see the loop below)
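    • A loop saves the repetition; a sketch, assuming the usual CDH3 daemon names (trim the list to what is actually installed on each node):
    • #!/bin/sh
      # keep Hadoop daemons from auto-starting at boot with a stale config
      for svc in namenode secondarynamenode jobtracker datanode tasktracker
      do
        sudo chkconfig hadoop-0.20-$svc off
      done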
  • 17. Important gotchas - 2
    • On EC2, /etc/hosts comes with
    • # Added by cloud-init
      127.0.1.1       domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53
    • Instead, map the hostname to 127.0.0.1:
    • # Added by me - the developer
      127.0.0.1       domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53
    • # !!! for remote access (from the other nodes), use the internal IP:
      10.220.169.157  domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53
  • 18. Now start Hadoop services
    • On each node 
    • for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
    • Or do it on all nodes at once with the script:
    • run_on_cluster.sh 'for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done'
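    • To confirm that the daemons actually came up, jps (shipped with the JDK) is a quick check; run it everywhere with the cluster script:
    • run_on_cluster.sh 'sudo jps'
    • The namenode machine should list NameNode, the jobtracker machine JobTracker, and every slave DataNode and TaskTracker.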
  • 19. Verify HDFS and MR
    • Verify!
    • hadoop fs -ls /
    • copy a file to HDFS
    • copy it back out
    • run a MapReduce job (sketch after this slide)
    • better yet, check the web UI (next slide)...
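    • A minimal smoke test along those lines; the HDFS paths are arbitrary and the examples jar location is an assumption for a CDH3-style install:
    • hadoop fs -ls /                            # HDFS answers
      hadoop fs -put /etc/hosts /tmp/hosts-test  # copy a local file in
      hadoop fs -cat /tmp/hosts-test             # read it back out
      hadoop jar /usr/lib/hadoop-0.20/hadoop-examples*.jar pi 2 1000  # tiny MR job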
  • 20. w3m http://localhost:50070 (the NameNode web UI)
  • 21. w3m next screen
    • (screenshot)
  • 22. Start HBase
    • start-hbase.sh
    • #!/bin/sh
      sudo /etc/init.d/hadoop-hbase-master start
      sudo /etc/init.d/hadoop-zookeeper-server start
      sudo /etc/init.d/hadoop-hbase-regionserver start
  • 23. w3m http://localhost:60010 (the HBase Master web UI)
  • 24. Stop HBase
    • 1. Run a major compaction (see the sketch after this slide)
    • 2. stop-hbase.sh
    • #!/bin/sh
      sudo /etc/init.d/hadoop-hbase-master stop
      sleep 5
      sudo /etc/init.d/hadoop-zookeeper-server stop
      sudo /etc/init.d/hadoop-hbase-regionserver stop
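    • The compaction in step 1 can be triggered from the HBase shell; a sketch, with 'mytable' standing in for each real table:
    • echo "major_compact 'mytable'" | hbase shell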
  • 25. Amazon EMR
    • (screenshot)
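    • For comparison, Amazon's classic elastic-mapreduce CLI launches a managed cluster in one command; a sketch, with all names and sizes illustrative:
    • elastic-mapreduce --create --alive --name "hadoop-on-ec2" --num-instances 6 --instance-type m1.large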
  • 26. Whirr
    • (screenshot)
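    • Whirr drives the same kind of launch from a properties file; a sketch with hypothetical file and cluster names, mirroring the one-namenode, one-jobtracker, four-slave layout above:
    • # hadoop.properties (hypothetical)
      whirr.cluster-name=hadoopcluster
      whirr.instance-templates=1 hadoop-namenode,1 hadoop-jobtracker,4 hadoop-datanode+hadoop-tasktracker
      whirr.provider=aws-ec2
      whirr.identity=${env:AWS_ACCESS_KEY_ID}
      whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
    • whirr launch-cluster --config hadoop.properties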
