Hadoop on EC2
How to run your Hadoop clusters and HBase on EC2, without losing the data :)
Hadoop on EC2: Presentation Transcript

  • Hadoop on EC2: Configuring and running Hadoop clusters, using the Cloudera distribution
  • My farm (screenshot)
  • Start (screenshot)
  • Confirm (screenshot)
  • OK, it's running (screenshot)
  • Set /etc/hosts (screenshot)
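    • The local /etc/hosts maps short names to the cluster nodes, so the scripts below can use sh1..sh6. A sketch, with hypothetical addresses (substitute the public IPs from the EC2 console):
      # hypothetical public IPs - use your instances' real addresses
      203.0.113.11  sh1
      203.0.113.12  sh2
      203.0.113.13  sh3
      203.0.113.14  sh4
      203.0.113.15  sh5
      203.0.113.16  sh6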
  • Logging into an EC2 machine
    • ec2_login.sh:
      #!/bin/sh
      ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@$1
    • For example:
      ec2_login.sh sh1
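    • If ssh rejects the key file as too open, restrict its permissions first:
      chmod 400 ~/.ssh/shmsoft_hadoop.pem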
  • Run command on the cluster
    • run_on_cluster.sh:
      #!/bin/bash
      for i in {1..6}
      do
        ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@sh$i "$1"
      done
    • For example:
      run_on_cluster.sh 'ifconfig | grep cast'
  • Result of running ifconfig
    • inet addr:10.220.141.227  Bcast:10.220.141.255  
    • inet addr:10.95.31.140  Bcast:10.95.31.255  
    • inet addr:10.220.214.15  Bcast:10.220.215.255  
    • inet addr:10.94.245.56  Bcast:10.94.245.255  
    • inet addr:10.127.17.143  Bcast:10.127.17.255  
    • inet addr:10.125.79.225  Bcast:10.125.79.255 
  • Edit local conf
    • gedit masters core-site.xml mapred-site.xml slaves
    • 10.220.141.227  -  masters, core-site.xml (the NameNode)
      10.95.31.140    -  mapred-site.xml (the JobTracker)
      10.220.214.15   -  slaves
      10.94.245.56    -  slaves
      10.127.17.143   -  slaves
      10.125.79.225   -  slaves
  • masters
    • 10.220.141.227
  • core-site.xml
    • <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!-- Put site-specific property overrides in this file. -->
      <configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://10.220.141.227</value>
        </property>
      </configuration>
  • mapred-site.xml
    • <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!-- Put site-specific property overrides in this file. -->
      <configuration>
        <property>
          <name>mapred.job.tracker</name>
          <value>10.95.31.140:54311</value>
        </property>
        <property>
          <name>mapred.local.dir</name>
          <value>/tmp/mapred</value>
        </property>
      </configuration>
  • slaves
    • 10.220.214.15 10.94.245.56 10.127.17.143 10.125.79.225
  • update-hadoop-cluster.sh
    • #!/bin/bash
      for i in {1..6}
      do
        scp -i ~/.ssh/shmsoft_hadoop.pem -r ~/projects/hadoop/conf ubuntu@sh$i:/home/ubuntu/
      done
      run_on_cluster.sh 'sudo cp /home/ubuntu/conf/* /etc/hadoop/conf/'
  • Important gotchas
    • Keep the Hadoop services from auto-starting at boot, before the configuration is in place:
      sudo chkconfig hadoop-0.20-namenode off
    • Repeat for each installed service.
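    • A loop can cover them all at once (a sketch, assuming every service's init script lives under /etc/init.d/hadoop-*):
      for service in /etc/init.d/hadoop-*
      do
        sudo chkconfig $(basename $service) off
      done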
  • Important gotchas - 2
    • On EC2, /etc/hosts contains:
      # Added by cloud-init
      127.0.1.1       domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53
    • Instead, map the hostname to 127.0.0.1:
      # Added by me - the developer
      127.0.0.1       domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53
    • !!! For remote access, use the internal IP instead:
      10.220.169.157  domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53
  • Now start Hadoop services
    • On each node:
      for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
    • Or do it with the script:
      run_on_cluster.sh 'for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done'
  • Verify HDFS and MR
    • Verify!
    • hadoop fs -ls /
    • copy a file to HDFS
    • copy a file back from HDFS
    • run an MR job (see the sketch below)
    • better yet...
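    • A minimal smoke test might look like this (a sketch; the file names are hypothetical, and the examples jar path varies by CDH version):
      hadoop fs -ls /                                  # list the HDFS root
      hadoop fs -copyFromLocal /etc/hosts /test.txt    # copy a file into HDFS
      hadoop fs -copyToLocal /test.txt /tmp/test.txt   # copy it back out
      hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 2 10   # run a sample MR job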
  • w3m http://localhost:50070 (the NameNode web UI)
  • w3m next screen (screenshot)
  • Start HBase
    • start-hbase.sh:
      #!/bin/sh
      # ZooKeeper must be up before the master can start
      sudo /etc/init.d/hadoop-zookeeper-server start
      sudo /etc/init.d/hadoop-hbase-master start
      sudo /etc/init.d/hadoop-hbase-regionserver start
  • w3m http://localhost:60010 (the HBase Master web UI)
  • Stop HBase
    • 1. Run a major compaction (see the sketch below)
    • 2. stop-hbase.sh:
      #!/bin/sh
      # Stop region servers first, then the master, then ZooKeeper
      sudo /etc/init.d/hadoop-hbase-regionserver stop
      sudo /etc/init.d/hadoop-hbase-master stop
      sleep 5
      sudo /etc/init.d/hadoop-zookeeper-server stop
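    • The major compaction can be triggered from the HBase shell (a sketch, with a hypothetical table named 'mytable'):
      echo "major_compact 'mytable'" | hbase shell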
  • Amazon EMR
  • Whirr