Transcript of "Hadoop on aws amazon"

Hadoop Cluster Configuration on AWS EC2
------------------------------------------------------------
• Buy some instances on AWS Amazon: one master and 10 slaves.

  ec2-50-17-21-209.compute-1.amazonaws.com      master
  ec2-54-242-251-124.compute-1.amazonaws.com    slave1
  ec2-23-23-17-15.compute-1.amazonaws.com       slave2
  ec2-50-19-79-241.compute-1.amazonaws.com      slave3
  ec2-50-16-49-229.compute-1.amazonaws.com      slave4
  ec2-174-129-99-84.compute-1.amazonaws.com     slave5
  ec2-50-16-105-188.compute-1.amazonaws.com     slave6
  ec2-174-129-92-105.compute-1.amazonaws.com    slave7
  ec2-54-242-20-144.compute-1.amazonaws.com     slave8
  ec2-54-243-24-10.compute-1.amazonaws.com      slave9
  ec2-204-236-205-227.compute-1.amazonaws.com   slave10
------------------------------------------------------------
• Make the separation as one master and 10 slaves.
------------------------------------------------------------
• Make sure SSH works from the master to all slaves (a key-setup sketch follows the /etc/hosts listings below).
------------------------------------------------------------
• Add the IP address, public DNS name and alias of every node to /etc/hosts on the master. The master's /etc/hosts file looks like this:

  127.0.0.1       localhost localhost.localdomain
  10.155.245.153  ec2-50-17-21-209.compute-1.amazonaws.com      master
  10.155.244.83   ec2-54-242-251-124.compute-1.amazonaws.com    slave1
  10.155.245.185  ec2-23-23-17-15.compute-1.amazonaws.com       slave2
  10.155.244.208  ec2-50-19-79-241.compute-1.amazonaws.com      slave3
  10.155.244.246  ec2-50-16-49-229.compute-1.amazonaws.com      slave4
  10.155.245.217  ec2-174-129-99-84.compute-1.amazonaws.com     slave5
  10.155.244.177  ec2-50-16-105-188.compute-1.amazonaws.com     slave6
  10.155.245.152  ec2-174-129-92-105.compute-1.amazonaws.com    slave7
  10.155.244.145  ec2-54-242-20-144.compute-1.amazonaws.com     slave8
  10.155.244.71   ec2-54-243-24-10.compute-1.amazonaws.com      slave9
  10.155.244.46   ec2-204-236-205-227.compute-1.amazonaws.com   slave10
------------------------------------------------------------
• The slaves' /etc/hosts files look like this (remove the 127.0.0.1 line on all slaves):

  10.155.245.153  ec2-50-17-21-209.compute-1.amazonaws.com      master
  10.155.244.83   ec2-54-242-251-124.compute-1.amazonaws.com    slave1
  10.155.245.185  ec2-23-23-17-15.compute-1.amazonaws.com       slave2
  10.155.244.208  ec2-50-19-79-241.compute-1.amazonaws.com      slave3
  10.155.244.246  ec2-50-16-49-229.compute-1.amazonaws.com      slave4
  10.155.245.217  ec2-174-129-99-84.compute-1.amazonaws.com     slave5
  10.155.244.177  ec2-50-16-105-188.compute-1.amazonaws.com     slave6
  10.155.245.152  ec2-174-129-92-105.compute-1.amazonaws.com    slave7
  10.155.244.145  ec2-54-242-20-144.compute-1.amazonaws.com     slave8
  10.155.244.71   ec2-54-243-24-10.compute-1.amazonaws.com      slave9
  10.155.244.46   ec2-204-236-205-227.compute-1.amazonaws.com   slave10
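With the hosts files in place, set up passwordless SSH from the master to every slave. A minimal sketch, assuming the ec2-user account and that the AWS key pair used to launch the instances (called mykey.pem here, a placeholder) still works for the initial copy:

  # On the master, generate a key pair once (no passphrase):
  ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ""

  # Append the master's public key to each slave's authorized_keys
  # (slave1..slave10 resolve through /etc/hosts):
  for i in $(seq 1 10); do
    cat ~/.ssh/id_rsa.pub | ssh -i ~/mykey.pem ec2-user@slave$i 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
  done

  # The master also needs to SSH to itself (start-all.sh does this):
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

  # Verify passwordless login:
  ssh slave1 hostname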
------------------------------------------------------------
• Download the Hadoop release from the Apache Hadoop releases page and keep it on the master (e.g. /usr/local/hadoop-1.0.4).
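A minimal way to fetch and unpack the release on the master (the Apache archive URL is an assumption; any Hadoop 1.0.4 mirror works):

  cd /usr/local
  sudo wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz
  sudo tar -xzf hadoop-1.0.4.tar.gz                      # creates /usr/local/hadoop-1.0.4
  sudo chown -R ec2-user:ec2-user /usr/local/hadoop-1.0.4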
  3. 3. <property> <name>fs.default.name</name> <value>hdfs://master:9000</value> </property> <property> <name>dfs.replication</name> <value>3</value> <description>Default block replication</description> </property> <property> <name>dfs.block.size</name> <value>536870912</value> <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time. </description> </property></configuration>---------------------------------------------------------------------------------------------------------------------------- • Open the Mapred-site.xml. • and set the following params<?xml version="1.0"?><?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!-- Put site-specific property overrides in this file. --><configuration> <property> <name>hadoop.log.dir</name> <value>/media/ephemeral0/logs</value> </property> <property> <name>mapred.child.java.opts</name> <value>60000</value> </property> <property> <name>dfs.datanode.max.xcievers</name> <value>-Xmx400m</value> </property> <property> <name>mapred.tasktracker.map.tasks.maximum</name> <value>14</value> </property> <property> <name>mapred.tasktracker.reduce.tasks.maximum</name>
  4. 4. <value>14</value> </property><property> <name>mapred.system.dir</name> <value>/media/ephemeral0/system-${user.name}</value> <description> system directory to run map and reduce tasks </description> </property><property><name>hadoop.log.dir</name><value>/media/ephemeral0/log-${user.name}</value></property> <property> <name>mapred.job.tracker</name> <value>master:9001</value> <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task. </description> </property> <property> <name>mapred.tasktracker.map.tasks.maximum</name> <value>10</value> <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task. </description> </property><property> <name> mapreduce.map.output.compress</name> <value>true</value></property><property> <name>mapreduce.map.output.compress.codec</name> <value>org.apache.hadoop.io.compress.GzipCodec</value></property><property> <name>mapred.create.symlink</name> <value>true</value></property><property><name>mapred.child.ulimit</name><value>unlimited</value></property></configuration>
------------------------------------------------------------
• Open the core-site.xml file and set the following params:

  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <!-- Put site-specific property overrides in this file. -->
  <configuration>
    <property>
      <name>dfs.data.dir</name>
      <value>/media/ephemeral0/data-${user.name}</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/media/ephemeral0/tmp-${user.name}</value>
    </property>
    <property>
      <name>dfs.name.dir</name>
      <value>/media/ephemeral0/name-${user.name}</value>
    </property>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:9000</value>
    </property>
  </configuration>
------------------------------------------------------------
• Open the masters file and set the following:

  master
------------------------------------------------------------
• Open the slaves file and set the following:

  slave1
  slave2
  slave3
  slave4
  slave5
  slave6
  slave7
  slave8
  slave9
  slave10
------------------------------------------------------------
• Give owner permission to the ec2-user on all slaves for the /media folder (all the folders we are using for Hadoop); a sketch follows below.
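A minimal sketch of that permission step, run from the master, assuming the instance-store volume is mounted at /media/ephemeral0 and passwordless SSH is already set up (-t is used because sudo may require a TTY):

  # On the master itself:
  sudo mkdir -p /media/ephemeral0 && sudo chown -R ec2-user:ec2-user /media/ephemeral0

  # On every slave:
  for i in $(seq 1 10); do
    ssh -t slave$i 'sudo mkdir -p /media/ephemeral0 && sudo chown -R ec2-user:ec2-user /media/ephemeral0'
  done

The per-user subdirectories (data-${user.name}, name-${user.name}, and so on) should be created by Hadoop itself once the parent directory is writable.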
------------------------------------------------------------
• From the master, copy the full hadoop-1.0.4 folder to every slave, e.g.:

  scp -r /usr/local/hadoop-1.0.4 ec2-50-17-21-209.compute-1.amazonaws.com:/usr/local/hadoop-1.0.4
------------------------------------------------------------
• Copy to all slaves from the master (see the end-to-end sketch below).
------------------------------------------------------------
• Add ports 50000-50100 to the security group in the AWS console, then run hadoop namenode -format on the master and start-all.sh.
------------------------------------------------------------
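Putting the final steps together, a minimal end-to-end sketch run as ec2-user on the master (host names and paths as set up above; it assumes /usr/local on the slaves is writable by ec2-user, e.g. after a chown like the one in the previous sketch):

  # 1. Distribute the configured Hadoop tree to every slave.
  for i in $(seq 1 10); do
    scp -r /usr/local/hadoop-1.0.4 slave$i:/usr/local/
  done

  # 2. Format HDFS (only once) and start all daemons from the master.
  /usr/local/hadoop-1.0.4/bin/hadoop namenode -format
  /usr/local/hadoop-1.0.4/bin/start-all.sh

  # 3. Verify: the master should report NameNode, SecondaryNameNode and JobTracker,
  #    each slave DataNode and TaskTracker. (jps ships with the JDK; if only a JRE
  #    is installed, check the logs under /media/ephemeral0/logs instead.)
  jps
  ssh slave1 jps

The NameNode and JobTracker web UIs on ports 50070 and 50030 (covered by the 50000-50100 security-group rule) are another quick health check.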
