Run a MapReduce Job
Validating our setup and data
First, validate that the cluster is set up and that you can access your data. Navigate to the
command line to execute the following commands.
1. Type ./bdutil shell to SSH into the master node of the Hadoop cluster.
2. Type hadoop fs -ls . to check the cluster status. If the command completes without an
error (listing any data already present), the cluster is set up correctly.
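For reference, a successful check might look like the following. The bucket name and the
listed entry are hypothetical; the important part is that the command returns without an error:

    $ ./bdutil shell
    $ hadoop fs -ls .
    Found 1 items
    drwx------   - hadoop hadoop          0 2015-06-01 10:00 gs://my-config-bucket/tmp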
Running the job
Next, run the job from the command line while you are still connected to the cluster via SSH.
Always run jobs as the hadoop user to avoid having to type full Hadoop paths in your commands.
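If your SSH session signs you in as a different user, one way to switch (assuming sudo is
available on the node; the myuser name below is hypothetical) is:

    $ whoami
    myuser
    $ sudo su - hadoop
    $ whoami
    hadoop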
The following example runs a sample job called WordCount. Hadoop installations include
this sample in the share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar file under the
Hadoop install directory (/home/hadoop/hadoop-install).
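If you want to confirm that the sample is available, you can run the examples jar with no
arguments; Hadoop then prints the list of bundled example programs (output abridged here):

    $ hadoop jar /home/hadoop/hadoop-install/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar
    An example program must be given as the first argument.
    Valid program names are:
      ...
      wordcount: A map/reduce program that counts the words in the input files.
      ...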
To run the WordCount job:
1. Navigate to the command line.
2. Type ./bdutil shell to SSH into the master node of the Hadoop cluster.
3. Type hadoop fs -mkdir input to create the input directory.
Note that when using Google Cloud Storage as your default file system, input
automatically resolves to gs://<CONFIGBUCKET>/input; the verification listing after this
procedure shows an example of such a resolved path. For more information about these
file paths, see accessing data from a job.
4. Download a text file from the web, such as the following example page from Apache, by
typing the following command: curl
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html
> setup.html.
5. Copy one or more text files into the input directory. Using the setup.html file
downloaded in the previous step, type the following command: hadoop fs -copyFromLocal
setup.html input.
6. Type cd /home/hadoop/hadoop-install to navigate to the Hadoop install directory.
7. Type hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar
wordcount input output to run the job on data in the input directory, and place the
results in the output directory.
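When the job finishes, you can verify the results from the same session. A minimal check
might look like the following; the bucket name, file sizes, and word counts are
illustrative, and with Cloud Storage as the default file system the output directory
resolves to gs://<CONFIGBUCKET>/output:

    $ hadoop fs -ls output
    Found 2 items
    -rwx------   3 hadoop hadoop        0 2015-06-01 10:30 gs://my-config-bucket/output/_SUCCESS
    -rwx------   3 hadoop hadoop    24851 2015-06-01 10:30 gs://my-config-bucket/output/part-r-00000
    $ hadoop fs -cat output/part-r-00000 | head -5
    "AS        3
    Apache     14
    Hadoop     42
    The        27
    cluster    19

Note that MapReduce refuses to start if the output directory already exists; if you re-run
the job, first remove the old results with hadoop fs -rm -r output.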