1. Hadoop Cluster Configuration on AWS EC2
-----------------------------------------------------------------------------------------------------------
Launch eleven instances on Amazon AWS EC2: one master and ten slaves.
ec2-50-17-21-209.compute-1.amazonaws.com master
ec2-54-242-251-124.compute-1.amazonaws.com slave1
ec2-23-23-17-15.compute-1.amazonaws.com slave2
ec2-50-19-79-241.compute-1.amazonaws.com slave3
ec2-50-16-49-229.compute-1.amazonaws.com slave4
ec2-174-129-99-84.compute-1.amazonaws.com slave5
ec2-50-16-105-188.compute-1.amazonaws.com slave6
ec2-174-129-92-105.compute-1.amazonaws.com slave7
ec2-54-242-20-144.compute-1.amazonaws.com slave8
ec2-54-243-24-10.compute-1.amazonaws.com slave9
ec2-204-236-205-227.compute-1.amazonaws.com slave10
----------------------------------------------------------------------------------------------------------------------------
• Designate one instance as the master and the other ten as slaves.
----------------------------------------------------------------------------------------------------------------------------
• Make sure SSH works from the master to all slaves; a test loop follows.
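A quick way to test, assuming the EC2 key pair used to launch the instances has been copied to the master as ~/cluster.pem (a hypothetical name) and the slaveN aliases from the /etc/hosts step below are in place:
for i in $(seq 1 10); do
    ssh -i ~/cluster.pem ec2-user@slave$i hostname   # should print each slave's hostname
done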
----------------------------------------------------------------------------------------------------------------------------
• On the master, add each node's private IP, public DNS name, and short alias to /etc/hosts.
----------------------------------------------------------------------------------------------------------------------------
• The master's /etc/hosts file looks like this:
127.0.0.1 localhost localhost.localdomain
10.155.245.153 ec2-50-17-21-209.compute-1.amazonaws.com master
10.155.244.83 ec2-54-242-251-124.compute-1.amazonaws.com slave1
10.155.245.185 ec2-23-23-17-15.compute-1.amazonaws.com slave2
10.155.244.208 ec2-50-19-79-241.compute-1.amazonaws.com slave3
10.155.244.246 ec2-50-16-49-229.compute-1.amazonaws.com slave4
10.155.245.217 ec2-174-129-99-84.compute-1.amazonaws.com slave5
10.155.244.177 ec2-50-16-105-188.compute-1.amazonaws.com slave6
10.155.245.152 ec2-174-129-92-105.compute-1.amazonaws.com slave7
10.155.244.145 ec2-54-242-20-144.compute-1.amazonaws.com slave8
10.155.244.71 ec2-54-243-24-10.compute-1.amazonaws.com slave9
10.155.244.46 ec2-204-236-205-227.compute-1.amazonaws.com slave10
----------------------------------------------------------------------------------------------------------------------------
• The slaves' /etc/hosts files look like the listing below.
• Remove the 127.0.0.1 line on all slaves (a one-liner for this follows the listing).
10.155.245.153 ec2-50-17-21-209.compute-1.amazonaws.com master
10.155.244.83 ec2-54-242-251-124.compute-1.amazonaws.com slave1
10.155.245.185 ec2-23-23-17-15.compute-1.amazonaws.com slave2
10.155.244.208 ec2-50-19-79-241.compute-1.amazonaws.com slave3
10.155.244.246 ec2-50-16-49-229.compute-1.amazonaws.com slave4
10.155.245.217 ec2-174-129-99-84.compute-1.amazonaws.com slave5
10.155.244.177 ec2-50-16-105-188.compute-1.amazonaws.com slave6
10.155.245.152 ec2-174-129-92-105.compute-1.amazonaws.com slave7
10.155.244.145 ec2-54-242-20-144.compute-1.amazonaws.com slave8
10.155.244.71 ec2-54-243-24-10.compute-1.amazonaws.com slave9
10.155.244.46 ec2-204-236-205-227.compute-1.amazonaws.com slave10
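One way to drop the loopback entry on every slave from the master, assuming ec2-user has passwordless sudo (the Amazon Linux default):
for i in $(seq 1 10); do
    ssh slave$i "sudo sed -i '/^127\.0\.0\.1/d' /etc/hosts"   # delete the 127.0.0.1 line
done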
---------------------------------------------------------------------------------------------------------------------------
• Download a Hadoop release from the Apache Hadoop releases page and unpack it on the master
(e.g. /usr/local/hadoop-1.0.4); a download sketch follows.
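A minimal sketch; the archive.apache.org path below is an assumption and should be checked against the current release listing:
cd /usr/local
wget https://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz
tar -xzf hadoop-1.0.4.tar.gz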
----------------------------------------------------------------------------------------------------------------------------
• Open the hadoop-env.sh file in the hadoop-1.0.4/conf/ folder.
----------------------------------------------------------------------------------------------------------------------------
• Set the environment variables JAVA_HOME, HADOOP_HOME, LD_LIBRARY_PATH,
HADOOP_OPTS, and HADOOP_HEAPSIZE:
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
export HADOOP_HOME=/usr/local/hadoop-1.0.4/
export LD_LIBRARY_PATH=/usr/local/hadoop-1.0.4/lib/native/Linux-amd64-64
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
export HADOOP_HEAPSIZE=400000   # value is in MB
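To check that the variables are picked up, one can source the file and print the Hadoop version (a quick sanity check, not part of the original steps):
source /usr/local/hadoop-1.0.4/conf/hadoop-env.sh
$HADOOP_HOME/bin/hadoop version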
----------------------------------------------------------------------------------------------------------------------------
• Open the hdfs-site.xml file and set the following parameters:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.log.dir</name>
    <value>/media/ephemeral0/logs</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/media/ephemeral0/tmp-${user.name}</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/media/ephemeral0/data-${user.name}</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/media/ephemeral0/name-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified at create time.
    </description>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>536870912</value>
    <description>Default block size for new files: 536870912 bytes = 512 MB.</description>
  </property>
</configuration>
----------------------------------------------------------------------------------------------------------------------------
• Open the mapred-site.xml file and set the following parameters:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.log.dir</name>
    <value>/media/ephemeral0/logs</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx400m</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>60000</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>14</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>14</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/media/ephemeral0/system-${user.name}</value>
    <description>System directory in which map and reduce tasks run.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>
  <property>
    <name>mapred.create.symlink</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.child.ulimit</name>
    <value>unlimited</value>
  </property>
</configuration>
----------------------------------------------------------------------------------------------------------------------------
• Open the core-site.xml file and set the following parameters:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/media/ephemeral0/data-${user.name}</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/media/ephemeral0/tmp-${user.name}</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/media/ephemeral0/name-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
----------------------------------------------------------------------------------------------------------------------------
• Open the masters file and add the master's alias:
master
----------------------------------------------------------------------------------------------------------------------------
• Open the slaves file and list all slave aliases:
slave1
slave2
slave3
slave4
slave5
slave6
slave7
slave8
slave9
slave10
----------------------------------------------------------------------------------------------------------------------------
• Give ownership to the ec2-user on all slaves for the /media folder (all folders used by
Hadoop); a loop for this follows.
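A minimal sketch run from the master, assuming ec2-user has passwordless sudo on the slaves:
for i in $(seq 1 10); do
    ssh slave$i "sudo chown -R ec2-user:ec2-user /media/ephemeral0"   # hand the Hadoop dirs to ec2-user
done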
----------------------------------------------------------------------------------------------------------------------------
• From the master, copy the full hadoop-1.0.4 folder to each slave, e.g.:
scp -r /usr/local/hadoop-1.0.4 slave1:/usr/local/hadoop-1.0.4
----------------------------------------------------------------------------------------------------------------------------
• Repeat the copy for every slave; a loop version follows.
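One way to do the copy in a single loop, using the slaveN aliases from /etc/hosts:
for i in $(seq 1 10); do
    scp -r /usr/local/hadoop-1.0.4 slave$i:/usr/local/hadoop-1.0.4
done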
----------------------------------------------------------------------------------------------------------------------------
• Open ports 50000-50100 in the security groups in the AWS console.
• Format the NameNode from the master and start the cluster:
hadoop namenode -format
start-all.sh
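To confirm the daemons came up (a sanity check, not part of the original steps): if the JDK's jps tool is available, it should list NameNode, SecondaryNameNode, and JobTracker on the master and DataNode and TaskTracker on each slave; the datanode count can be checked with:
hadoop dfsadmin -report   # should report 10 live datanodes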
----------------------------------------------------------------------------------------------------------------------------