Published in: Technology

  2. 1. Decide On Cluster Layout
     • There are four Hadoop components that we would like to spread out across the cluster:
       ◦ Data nodes: store and manage the actual data;
       ◦ Name node: acts as a catalogue service, recording which data is stored where;
       ◦ Job tracker: tracks and manages submitted MapReduce jobs;
       ◦ Task trackers: low-level workers that run the tasks issued to them by the job tracker.
     • Let's go with the following setup. It is fairly typical: a data node and a task tracker on each slave, and a single instance of the name node and job tracker on the master:

       Node     Hostname            Components
       Master   ec2-23-22-133-70    Name Node, Job Tracker
       Slave 1  ec2-23-20-53-36     Data Node, Task Tracker
       Slave 2  ec2-184-73-42-163   Data Node, Task Tracker
  3. 2a. Configure Server Names
     • Log out of all of the machines and log back into the master server.
     • The Hadoop configuration is located here on the server:
         cd /home/ubuntu/hadoop-1.0.3/conf
     • Open the file ‘masters’ and replace the word ‘localhost’ with the hostname of the server that you have allocated as master:
         cd /home/ubuntu/hadoop-1.0.3/conf
         vi masters
     • Open the file ‘slaves’ and replace the word ‘localhost’ with the hostnames of the two servers that you have allocated as slaves, on two separate lines:
         cd /home/ubuntu/hadoop-1.0.3/conf
         vi slaves
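
The two `vi` edits can also be done non-interactively. The following is a minimal sketch rather than part of the original walkthrough. It uses the example hostnames from the layout slide. The `CONF_DIR`, `MASTER` and `SLAVES` variables and the scratch directory name are assumptions introduced for illustration. By default the files are written to a scratch directory so the sketch is safe to try. On the real master you would set `CONF_DIR=/home/ubuntu/hadoop-1.0.3/conf` before running it.

```shell
#!/bin/sh
set -eu

# Assumption: write to a scratch directory unless CONF_DIR is set.
# On the master itself: CONF_DIR=/home/ubuntu/hadoop-1.0.3/conf
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

# Example hostnames from the layout slide; substitute your own.
MASTER="ec2-23-22-133-70"
SLAVES="ec2-23-20-53-36 ec2-184-73-42-163"

# 'masters' holds a single line: the master hostname (replaces 'localhost').
printf '%s\n' "$MASTER" > "$CONF_DIR/masters"

# 'slaves' holds one slave hostname per line.
# $SLAVES is deliberately unquoted so each hostname becomes its own argument.
printf '%s\n' $SLAVES > "$CONF_DIR/slaves"

echo "Wrote $CONF_DIR/masters and $CONF_DIR/slaves"
```

Running `cat` on both files afterwards is a quick way to confirm that `masters` contains one hostname and `slaves` contains two, each on its own line.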