Set up Hadoop Cluster on Amazon EC2
By Nattinan Yontchai (Earng), KMIT, and Dr. Thanachart Numnonda, IMC Institute
January 2015
VPC Creation

In order to install Hadoop on EC2 instances, we need to make use of an Amazon VPC (Virtual Private Cloud) network and Elastic IP addresses, so that we can stop and start the instances whenever we need: with these two AWS services, the EC2 instances we create keep static private and public IP addresses. In this step, we will create the VPC and assign the security group as follows:
1. Select the VPC service; the VPC dashboard will open.
2. Click on Start VPC Wizard and select VPC with a Single Public Subnet.
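For reference, this wizard step corresponds roughly to the AWS CLI calls below. This is a minimal sketch, not part of the original lab: the CIDR blocks are assumed from the 10.0.0.x private addresses used later in this guide, the resource IDs are placeholders, and the Internet gateway and route table that the wizard also creates are omitted.

    # Create the VPC (CIDR block assumed)
    aws ec2 create-vpc --cidr-block 10.0.0.0/16
    # Create the single public subnet inside it (vpc-xxxxxxxx is a placeholder)
    aws ec2 create-subnet --vpc-id vpc-xxxxxxxx --cidr-block 10.0.0.0/24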
3. Enter Hadoop VPC as the VPC name, leave the rest as default, then click on Create VPC.
4. On the VPC dashboard, select Security Groups and then select the default group.
5. Select Inbound Rules, then click on Edit and enter the inbound rules for the lab.
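The original figure with the exact rules is not preserved. A typical rule set for this lab, assuming only the ports this guide actually uses, would be:

    Type         Protocol  Port range  Source
    ALL Traffic  ALL       ALL         the security group itself (node-to-node Hadoop traffic)
    SSH          TCP       22          0.0.0.0/0 (remote administration)
    Custom TCP   TCP       50070       0.0.0.0/0 (HDFS NameNode web UI, used in the testing step)
    Custom TCP   TCP       50030       0.0.0.0/0 (JobTracker web UI; an assumption, not shown in this guide)

Opening ports to 0.0.0.0/0 is tolerable for a short-lived lab but should be narrowed to your own address range for anything longer-lived.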
6. Save the Inbound Rules and rename the group as Hadoop Security Group.

Launch EC2 for a Hadoop Master

In this step, we will launch an EC2 instance for a Hadoop master node as follows:
1. Select the EC2 service and click on Launch Instance.
2. Select an Amazon Machine Image (AMI): Ubuntu Server 14.04 LTS (PV).
3. Select an Instance Type: m3.large, and click Next: Configure Instance Details.
4. Select 1 instance for the Namenode, Hadoop-VPC as Network (the VPC created above), and leave the remaining properties as default > Click Next: Add Storage.
5. Add storage of at least 40 GB > Next: Tag Instance.
6. Tag Instance > Enter Value: Hadoop Master 01 > Click Next.
7. Configure Security Group > Select an existing security group > Select Security Group Name: default > Click Review and Launch.
8. Review Instance Launch > Click Launch.
9. Choose an existing key pair > LabCloudera (or create a new key pair) > tick the acknowledgement > Launch Instances.
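For reference, steps 1–9 correspond roughly to a single AWS CLI call. This is a sketch under stated assumptions: the AMI, subnet, and security group IDs are placeholders, and the tag and 40 GB volume from steps 5–6 are omitted.

    # Launch one m3.large master from the Ubuntu 14.04 PV AMI (all IDs are placeholders)
    aws ec2 run-instances --image-id ami-xxxxxxxx --count 1 \
        --instance-type m3.large --key-name LabCloudera \
        --subnet-id subnet-xxxxxxxx --security-group-ids sg-xxxxxxxx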
10. Select the EC2 service and choose Elastic IPs, then click on Allocate New Address.
11. After the address is allocated, click on Associate Address.
12. In the Associate Address dialog box, select the instance just created.
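The CLI equivalent, again as a sketch with placeholder IDs:

    # Allocate a new Elastic IP for use in the VPC
    aws ec2 allocate-address --domain vpc
    # Associate it with the master instance just launched
    aws ec2 associate-address --instance-id i-xxxxxxxx --allocation-id eipalloc-xxxxxxxx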
Install Hadoop Master

In this step, we will install a Hadoop master node as follows:
1. View the command for connecting to the EC2 instance: on the EC2 dashboard, choose Hadoop Master 01 and click on Connect; the ssh command will be shown. (Note: in this case the public IP is 54.69.195.87.)
2. Open the client terminal console and type the following command: ssh -i clouderalab.pem ubuntu@54.69.195.xx
3. The EC2 instance terminal will now be open.
4. Type command > sudo apt-get update
5. Type command > ssh-keygen (press Enter at each prompt)
6. Type command > cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
7. Type command > ssh 54.69.195.xx (enter yes when prompted)
8. Type command > exit
9. Type command > sudo apt-get install openjdk-7-jdk (enter Y when prompted)
10. Type command > java -version and press Enter. (It should report the installed OpenJDK 7 runtime.)
11. Type command > wget http://mirror.issp.co.th/apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
12. Type command > tar -xvzf hadoop-1.2.1.tar.gz
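Steps 5–8 matter because Hadoop's start and stop scripts log in over SSH, so the master must be able to reach itself (and later the slaves) without a password. The same sequence, as a commented sketch using the masked public IP from step 7:

    # Generate an RSA key pair with no passphrase (press Enter at each prompt)
    ssh-keygen
    # Authorize our own public key so logins to this account need no password
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    # Connect once so the host key is accepted into ~/.ssh/known_hosts, then leave
    ssh 54.69.195.xx
    exit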
13. Type command > sudo mv hadoop-1.2.1 /usr/local/hadoop
14. Type command > sudo vi $HOME/.bashrc
15. Add the following lines:
    export HADOOP_PREFIX=/usr/local/hadoop
    export PATH=$PATH:$HADOOP_PREFIX/bin
16. Type command > exec bash
17. Type command > sudo vi /usr/local/hadoop/conf/hadoop-env.sh
18. Edit the file to include the following lines:
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
    export HADOOP_OPTS=-Djava.net.preferIPv4Stack=TRUE
19. Type command > sudo vi /usr/local/hadoop/conf/core-site.xml
20. Add the private IP of the master server to core-site.xml (in this case the private IP is 10.0.0.212).
21. Type command > sudo vi /usr/local/hadoop/conf/mapred-site.xml
22. Add the private IP of the JobTracker server to mapred-site.xml.
23. Type command > sudo vi /usr/local/hadoop/conf/hdfs-site.xml
24. Add the HDFS configuration; all three files are sketched below.
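The original configuration figures are not preserved. Below is a minimal sketch of the three files for a Hadoop 1.2.1 cluster: the private IP 10.0.0.212 is the master's address from step 20, while the port numbers (54310/54311) and the replication factor of 3 are common choices assumed here, not taken from the missing figures.

core-site.xml:
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://10.0.0.212:54310</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
      </property>
    </configuration>

mapred-site.xml:
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>10.0.0.212:54311</value>
      </property>
    </configuration>

hdfs-site.xml:
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    </configuration>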
25. Type command > sudo mkdir /usr/local/hadoop/tmp
26. Type command > sudo chown ubuntu /usr/local/hadoop
27. Type command > sudo chown ubuntu /usr/local/hadoop/tmp
28. Type command > hadoop namenode -format (this initializes a fresh HDFS filesystem under the tmp directory created above)
29. Finish.

Cloning Instance on EC2 for Hadoop Slaves

In this step, we will clone the Hadoop instance just created into three other instances that will act as Hadoop slaves.
1. Select the EC2 service and choose Hadoop Master 01.
2. Click on Actions > Create Image.
3. Name the image Hadoop-Image.
4. Select the AMIs tab (in the left pane) and choose Hadoop-Image, then click on Launch.
5. Select an Instance Type: m3.medium and click Next: Configure Instance Details.
6. Select 3 instances for the Datanodes, Hadoop-VPC as Network (the VPC created above), and leave the remaining properties as default > Click Next: Add Storage.
7. Add storage of at least 80 GB > Next: Tag Instance.
8. Tag Instance > Enter Value: Hadoop Slave > Click Next.
9. Configure Security Group > Select an existing security group > Select Security Group Name: default > Click Review and Launch.
10. Review Instance Launch > Click Launch.
11. Choose an existing key pair > LabCloudera > tick the acknowledgement > Launch Instances.
12. View the EC2 dashboard; it will show three new instances named Hadoop Slave.
13. Allocate three new Elastic IP addresses and associate them with the Hadoop Slave instances.

Setup Hadoop Cluster

1. Ssh to the master node (ssh -i clouderalab.pem ubuntu@54.69.195.xx).
2. Type command > sudo vi /usr/local/hadoop/conf/masters
3. Enter the private IP of the master server (see the file sketch after this list). Save and exit.
4. Type command > sudo vi /usr/local/hadoop/conf/slaves
5. Enter the private IPs of the Datanode servers, one per line (see the file sketch after this list). Save and exit.
6. Type command > ssh-copy-id -i $HOME/.ssh/id_rsa.pub ubuntu@10.0.0.193 (enter yes when prompted)
7. Type command > ssh 10.0.0.193 and press Enter. (This tests the password-less login.)
8. Type command > exit
9. Repeat steps 6–8 for all slaves.
10. Start the Hadoop services by typing command > start-all.sh
11. Type command > jps on all four systems to ensure that the Hadoop services are running; the expected processes on master and slaves are sketched below.
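The original file listings and jps screenshots are not preserved; the sketches below are what a standard Hadoop 1.2.1 deployment of this shape would show. The master's private IP (10.0.0.212) comes from step 20 and the first slave's (10.0.0.193) from step 6; the other two slave addresses were never given in this guide and remain placeholders.

/usr/local/hadoop/conf/masters (in Hadoop 1.x this file names the SecondaryNameNode host; here, the master itself):
    10.0.0.212

/usr/local/hadoop/conf/slaves (one Datanode/TaskTracker host per line):
    10.0.0.193
    10.0.0.x
    10.0.0.x

After start-all.sh, jps on the master should list roughly:
    NameNode
    SecondaryNameNode
    JobTracker
    Jps

…and on each slave:
    DataNode
    TaskTracker
    Jps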
Testing the Hadoop Cluster

1. View the Hadoop HDFS web UI by typing the following URL in the web browser: http://54.69.195.xx:50070/
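To exercise MapReduce as well as HDFS, a quick smoke test is to run one of the example jobs that ships inside the Hadoop 1.2.1 tarball. This is a suggested extra step, not part of the original guide, and assumes the default layout under /usr/local/hadoop:

    # Estimate pi with 10 map tasks of 100 samples each; the result prints to the terminal
    hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar pi 10 100

While the job runs, the JobTracker web UI (conventionally port 50030, e.g. http://54.69.195.xx:50030/) should show it in progress.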