Hadoop Interview Questions
As Hadoop becomes one of the hottest career paths today, many candidates find it
difficult to get through the interview.
In this post we have put together a comprehensive list
of frequently asked interview questions and answers to
help you get through your Hadoop interview.
1) What is MapReduce?
It is a framework or a programming model that is used
for processing large data sets over clusters of
computers using distributed programming.
2) What are ‘maps’ and ‘reduces’?
‘Map’ and ‘Reduce’ are the two phases of processing a query
in MapReduce. The ‘Map’ phase is responsible for reading data from the
input location and, based on the input type, generating
key-value pairs, that is, intermediate output on the local
machine. The ‘Reduce’ phase is responsible for processing the
intermediate output received from the mappers and
generating the final output.
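The flow of the two phases can be sketched with a word-count example. This is a minimal Python simulation of the concept, not Hadoop code (the real Hadoop API is Java, and the function names here are hypothetical):

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: read each input record and emit intermediate (key, value) pairs."""
    for _offset, line in enumerate(lines):   # input key: byte offset / line number
        for word in line.split():
            yield (word, 1)                  # intermediate output: (word, 1)

def reduce_phase(pairs):
    """Reducer: group intermediate pairs by key and produce the final output."""
    grouped = defaultdict(list)
    for key, value in pairs:                 # shuffle/sort step, simplified
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

lines = ["big data big", "data"]
print(reduce_phase(map_phase(lines)))        # {'big': 2, 'data': 2}
```

In real Hadoop, the map and reduce tasks run in parallel on different cluster nodes, and the framework handles the shuffle-and-sort step between the two phases.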
3) What are the four basic parameters of a mapper?
The four basic parameters of a mapper are LongWritable,
Text, Text and IntWritable. The first two
represent input parameters and the second two
represent intermediate output parameters.
4) What are the four basic parameters of a reducer?
The four basic parameters of a reducer are Text,
IntWritable, Text and IntWritable. The first two represent
intermediate output parameters and the second two
represent final output parameters.
5) Which are the three modes in which Hadoop can be
run?
The three modes in which Hadoop can be run are:
1. Standalone (local) mode
2. Pseudo-distributed mode
3. Fully distributed mode
6) What are the features of standalone (local) mode?
In standalone mode there are no daemons; everything
runs in a single JVM. It has no DFS and uses the
local file system. Standalone mode is suitable only for
running MapReduce programs during development, and it
is one of the least used environments.
7) What are the features of pseudo-distributed mode?
Pseudo-distributed mode is used both for development and
for QA environments. In pseudo-distributed mode, all the
daemons run on the same machine.
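As a sketch, a pseudo-distributed setup typically points Hadoop at an HDFS instance running on the local machine and sets the replication factor to 1 (since there is only one node). The property names below are the standard Hadoop 2.x ones; the port is an assumption, as it varies between setups:

```xml
<!-- core-site.xml: point the file system at HDFS on the local machine -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: single node, so keep only one copy of each block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```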
8) What is BloomMapFile used for?
BloomMapFile is a class that extends MapFile, so
its functionality is similar to MapFile. BloomMapFile
uses dynamic Bloom filters to provide a quick
membership test for the keys. It is used in the HBase table
format.
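The quick membership test works because a Bloom filter can say "definitely absent" without touching disk, at the cost of occasional false positives. Below is a minimal, hypothetical Python sketch of the idea; it is not the Hadoop implementation (which uses dynamic Bloom filters over MapFile keys), and the class and parameter names are made up for illustration:

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive several bit positions from independent-ish hashes of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present"
        # (false positives are possible, false negatives are not).
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("row-key-1")
print(bf.might_contain("row-key-1"))   # True
```

A reader checking whether a key exists can skip reading the MapFile entirely whenever `might_contain` returns False, which is what makes lookups fast.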
9) What is PIG?
Pig is a platform for analyzing large data sets. It
consists of a high-level language for expressing data
analysis programs, coupled with infrastructure for
evaluating these programs. Pig’s infrastructure layer
consists of a compiler that produces sequences of
MapReduce programs.
10) What is the difference between logical and physical
plans?
The logical plan describes the logical operators that Pig
has to execute, without specifying how they will run. From
the logical plan, Pig then produces a physical plan, which
describes the physical operators needed to actually
execute the script.
We’ve covered some of the frequently asked interview
questions.
If you are looking out for more Hadoop Interview
Questions that are frequently asked by employers, visit
http://www.edureka.in/blog/category/big-data-and-hadoop/
