1. Example Word Count using MR  
a. http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html 
 
2. Custom Partition example  
https://myhadoopexamples.wordpress.com/2014/03/01/hadoop-mapreduce-example-with-partitioner/ 
 
3. How to set the number of Mappers and Reducers programmatically (see the sketch below)  
 job.setNumReduceTasks(3); 
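 In context, inside the driver (a sketch; the mapper count cannot be fixed exactly — the 
 property below only provides a hint to the framework, as noted later in these notes): 

     Job job = Job.getInstance(new Configuration(), "example");
     job.setNumReduceTasks(3);                                // exact number of reducers
     job.getConfiguration().setInt("mapreduce.job.maps", 10); // mappers: a hint only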
 
4. How to pass external dependency via Distributed Cache  
  
     JobConf job = new JobConf(); 
     // Files and archives are copied to each task node before the job's tasks run. 
     DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"), job); 
     DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job); 
     DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job); 
     DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job); 
     DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job); 
     DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job); 
  
5. Combiner example  
job.setCombinerClass(IntSumReducer.class); 
 
6. Counters example  
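A minimal sketch (the group and counter names are illustrative). Increment a custom counter 
from within a map() or reduce() method via the Context; the driver can read it back once the 
job has finished: 

     // inside the Mapper or Reducer:
     context.getCounter("WordCount", "SKIPPED_RECORDS").increment(1);

     // in the driver, after job.waitForCompletion(true):
     long skipped = job.getCounters()
         .findCounter("WordCount", "SKIPPED_RECORDS").getValue();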
 
 
7. Rebalancing  
$ hdfs balancer 
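A utilization threshold (the allowed deviation from the cluster's average disk usage, in 
percent) can also be supplied, for example: 

$ hdfs balancer -threshold 10 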
 
8. Hadoop shell commands  
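A few common examples (paths are illustrative): 

$ hadoop fs -ls /user/hadoop               # list a directory 
$ hadoop fs -put local.txt /user/hadoop/   # copy a local file into HDFS 
$ hadoop fs -get /user/hadoop/file.txt .   # copy a file out of HDFS 
$ hadoop fs -cat /user/hadoop/file.txt     # print a file's contents 
$ hadoop fs -rm /user/hadoop/file.txt      # delete a file 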
 
9. Initially the replication factor is set to 3. If it is later changed to 4, what will 
happen? 
Changing dfs.replication affects only files created afterwards; existing files keep their 
per-file replication factor until it is changed explicitly, at which point the NameNode 
schedules creation of the additional replicas. 
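To raise the factor on existing files explicitly (the path is illustrative): 

$ hdfs dfs -setrep -w 4 /path/to/data 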
 
10. I have a new node I want to add to a running Hadoop cluster; how do I start services on just 
one node? 
$ cd path/to/hadoop 
$ bin/hadoop-daemon.sh start datanode 
$ bin/hadoop-daemon.sh start tasktracker 
 
hadoop dfsadmin -refreshNodes 
hadoop mradmin -refreshNodes  
so that the NameNode and JobTracker recognize the newly added node 
 
 
Hadoop Data Types 
Hadoop provides several primitive data types in writable versions. 
 
int      --> IntWritable 
long     --> LongWritable 
boolean  --> BooleanWritable 
float    --> FloatWritable 
byte     --> ByteWritable 
 
We can use the following built-in data types as keys and values: 

Text: stores UTF-8 text 
BytesWritable: stores a sequence of bytes 
VIntWritable and VLongWritable: store variable-length integer and long values 
NullWritable: a zero-length Writable type that can be used when you don't want to use a 
key or value type 
 
The following Hadoop built-in collection data types can only be used as value types: 
ArrayWritable: stores an array of values belonging to a Writable type. 
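A small sketch showing these wrappers in use (the values are illustrative): 

     import org.apache.hadoop.io.IntWritable;
     import org.apache.hadoop.io.NullWritable;
     import org.apache.hadoop.io.Text;

     public class WritableDemo {
         public static void main(String[] args) {
             IntWritable count = new IntWritable(42);   // wraps a Java int
             Text word = new Text("hadoop");            // stores UTF-8 text
             count.set(count.get() + 1);                // Writables are mutable and reusable
             NullWritable nothing = NullWritable.get(); // zero-length singleton
             System.out.println(word + " -> " + count + " " + nothing);
         }
     }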
 
11. Is there an easy way to see the status and health of a cluster? 
$ bin/hadoop dfsadmin -report 
 
12. How much network bandwidth might I need between racks in a medium-size (40-80 node) 
Hadoop cluster? 
 
The true answer depends on the types of jobs you're running. As a back-of-the-envelope 
calculation, one might figure something like this: 

60 nodes total on 2 racks = 30 nodes per rack. Each node might process about 100 MB/sec of 
data. In the case of a sort job where the intermediate data is the same size as the input 
data, each node needs to shuffle 100 MB/sec of data. In aggregate, each rack then produces 
about 3 GB/sec of data. However, given an even reducer spread across the racks, each rack 
will need to send 1.5 GB/sec to reducers running on the other rack. Since the connection is 
full duplex, that means you need 1.5 GB/sec of bisection bandwidth for this theoretical job. 
So that's 12 Gbps. 
 
However, the above calculations are probably somewhat of an upper bound. A large number of jobs 
have significant data reduction during the map phase, either by some kind of filtering/selection going 
on in the Mapper itself, or by good usage of Combiners. Additionally, intermediate data compression 
can cut the intermediate data transfer by a significant factor. Lastly, although your disks 
can probably provide 100 MB/sec of sustained throughput, it's rare to see a MR job which can 
sustain disk-speed I/O through the entire pipeline. So, I'd say my estimate is at least a 
factor of 2 too high. 
 
So, the simple answer is that 4-6Gbps is most likely just fine for most practical jobs. If you want to be 
extra safe, many inexpensive switches can operate in a "stacked" configuration where the bandwidth 
between them is essentially backplane speed. That should scale you to 96 nodes with plenty of 
headroom. Many inexpensive gigabit switches also have one or two 10GigE ports which can be used 
effectively to connect to each other or to a 10GE core. 
 
I see a maximum of 2 maps/reduces spawned concurrently on each TaskTracker; how do I increase 
that? Increase the following properties (see the snippet below): 
mapred.tasktracker.map.tasks.maximum  
mapred.tasktracker.reduce.tasks.maximum  
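These are per-TaskTracker settings in mapred-site.xml (the values below are illustrative) and 
take effect after a TaskTracker restart: 

     <property>
       <name>mapred.tasktracker.map.tasks.maximum</name>
       <value>4</value>
     </property>
     <property>
       <name>mapred.tasktracker.reduce.tasks.maximum</name>
       <value>4</value>
     </property>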
 
How do you gracefully stop a running job? 
hadoop job -kill JOBID 
 
What is the Hadoop framework? 
Hadoop is an open-source framework written in Java by the Apache Software Foundation. It is 
used to write software applications that need to process vast amounts of data (it can handle 
multiple terabytes). It works in parallel on large clusters, which can have thousands of 
computers (nodes), and it processes data in a reliable and fault-tolerant manner. 
 
What is MapReduce? 
MapReduce is a programming model for processing huge amounts of data quickly. As the name 
suggests, it is divided into Map and Reduce. 
A MapReduce job usually splits the input data-set into independent chunks 
(big data sets into multiple small data sets). 

MapTask: processes these chunks in a completely parallel manner (one node can process one or 
more chunks). 
 
The framework sorts the outputs of the maps. 
 
ReduceTask: the sorted map output becomes the input for the reduce tasks, which produce the 
final result. 
Your business logic is written in the MapTask and ReduceTask. Typically both the input and 
the output of the job are stored in a file system (not a database). The framework takes care 
of scheduling tasks, monitoring them and re-executing failed tasks. 
 
Which class needs to be extended to create a Mapper and Reducer for Hadoop (in the old mapred 
API, which interface needs to be implemented)? 
org.apache.hadoop.mapreduce.Mapper  
org.apache.hadoop.mapreduce.Reducer 
 
What does the Mapper do? 
Maps are the individual tasks that transform input records into intermediate records. The 
transformed intermediate records do not need to be of the same type as the input records. A 
given input pair may map to zero or many output pairs. 
 
What are the methods of the Mapper class? 
The Mapper contains a run() method, which calls its setup() method once, then calls the 
map() method for each input record, and finally calls its cleanup() method. All of these 
methods can be overridden in your code. 
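A minimal Mapper sketch overriding all three lifecycle methods (the class and field names are 
illustrative): 

     import java.io.IOException;
     import org.apache.hadoop.io.IntWritable;
     import org.apache.hadoop.io.LongWritable;
     import org.apache.hadoop.io.Text;
     import org.apache.hadoop.mapreduce.Mapper;

     public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
         private static final IntWritable ONE = new IntWritable(1);
         private final Text word = new Text();

         @Override
         protected void setup(Context context) {
             // Called once per task before any map() calls; read configuration here,
             // e.g. context.getConfiguration().get("myKey").
         }

         @Override
         protected void map(LongWritable key, Text value, Context context)
                 throws IOException, InterruptedException {
             for (String token : value.toString().split("\\s+")) {
                 if (token.isEmpty()) continue;
                 word.set(token);
                 context.write(word, ONE); // emit one (token, 1) pair per word
             }
         }

         @Override
         protected void cleanup(Context context) {
             // Called once per task after the last map() call.
         }
     }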
 
What happens if you don't override the Mapper methods and keep them as they are? 
If you do not override any methods (leaving even map() as-is), the Mapper acts as the 
identity function, emitting each input record as a separate output. 
 
What is the use of the Context object? 
The Context object allows the Mapper to interact with the rest of the Hadoop system. It 
includes configuration data for the job, as well as interfaces which allow it to emit output. 
 
How can you add arbitrary key-value pairs in your Mapper? 
You can set arbitrary (key, value) pairs of configuration data in your Job, e.g. with 
Job.getConfiguration().set("myKey", "myVal"), and then retrieve this data in your Mapper with 
Context.getConfiguration().get("myKey"). This is typically done in the Mapper's 
setup() method. 
 
 
How does the Mapper's run() method work? 
The Mapper.run() method calls map(KeyInType, ValInType, Context) for each key/value 
pair in the InputSplit for that task. 
 
Which object can be used to get the progress of a particular job? 
Context 
 
What is the next step after the Mapper or MapTask? 
The outputs of the Mapper are sorted, and partitions are created for the output. The number 
of partitions depends on the number of reducers. 
 
What is the use of the Combiner? 
It is an optional component or class, and can be specified via Job.setCombinerClass(ClassName), 
to perform local aggregation of the intermediate outputs, which helps cut down the amount of 
data transferred from the Mapper to the Reducer. 
 
How many maps are there in a particular job? 
The number of maps is usually driven by the total size of the inputs, that is, the total 
number of blocks of the input files. Generally it is around 10-100 maps per node. Task setup 
takes a while, so it is best if the maps take at least a minute to execute. For example, if 
you expect 10 TB of input data and have a block size of 128 MB, you'll end up with about 
82,000 maps. To influence the number of maps you can use the mapreduce.job.maps parameter 
(which only provides a hint to the framework). Ultimately, the number of tasks is controlled 
by the number of splits returned by the InputFormat.getSplits() method (which you can 
override). 
 
26. What is the Reducer used for? 
Reducer reduces a set of intermediate values which share a key to a (usually smaller) set of values. 
The number of reduces for the job is set by the user via Job.setNumReduceTasks(int). 
 
27. Explain the core methods of the Reducer? 
Ans: The API of Reducer is very similar to that of Mapper: there's a run() method that 
receives a Context containing the job's configuration, as well as interfacing methods that 
return data from the reducer itself back to the framework. The run() method calls setup() 
once, reduce() once for each key associated with the reduce task, and cleanup() once at the 
end. Each of these methods can access the job's configuration data by using 
Context.getConfiguration(). As in Mapper, any or all of these methods can be overridden with 
custom implementations. If none of these methods are overridden, the default reducer 
operation is the identity function; values are passed through without further processing. 
The heart of Reducer is its reduce() method. This is called once per key; the second argument 
is an Iterable which returns all the values associated with that key. 
 
What are the primary phases of the Reducer? 
Shuffle, Sort and Reduce 
 
Explain the Reducer’s Sort phase? 
Ans: The framework groups Reducer inputs by keys (since different mappers may have output 
the same key) in this stage. The shuffle and sort phases occur simultaneously; while map-outputs are 
being fetched they are merged (It is similar to merge-sort). 
 
31. Explain the Reducer's reduce phase? 
Ans: In this phase the reduce(MapOutKeyType, Iterable, Context) method is called for each 
pair in the grouped inputs. The output of the reduce task is typically written to the 
FileSystem via Context.write(ReduceOutKeyType, ReduceOutValType). Applications can use the 
Context to report progress, set application-level status messages and update Counters, or 
just indicate that they are alive. The output of the Reducer is not sorted. 
 
32. How many Reducers should be configured? 
Ans: The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * 
mapreduce.tasktracker.reduce.tasks.maximum). With 0.95, all of the reduces can launch 
immediately and start transferring map outputs as the maps finish. With 1.75, the faster 
nodes will finish their first round of reduces and launch a second wave of reduces, doing a 
much better job of load balancing. For example, a 10-node cluster with 2 reduce slots per 
node suggests roughly 19 (0.95 x 20) or 35 (1.75 x 20) reduces. Increasing the number of 
reduces increases the framework overhead, but improves load balancing and lowers the cost of 
failures. 
 
What is the use of Combiners in the Hadoop framework? 
Combiners are used to increase the efficiency of a MapReduce program. 
They aggregate intermediate map output locally, on individual mapper outputs, which can 
reduce the amount of data that needs to be transferred across to the reducers. 
You can use your reducer code as a combiner if the operation performed is commutative and 
associative. 
The execution of the combiner is not guaranteed; Hadoop may or may not execute it, and if 
required it may execute it more than once. Therefore your MapReduce jobs should not depend 
on the combiner's execution. 
 
What are IdentityMapper and IdentityReducer in MapReduce? 
- org.apache.hadoop.mapred.lib.IdentityMapper: implements the identity function, mapping 
inputs directly to outputs. If the MapReduce programmer does not set the Mapper class using 
JobConf.setMapperClass, then IdentityMapper.class is used as the default. 
- org.apache.hadoop.mapred.lib.IdentityReducer: performs no reduction, writing all input 
values directly to the output. If the MapReduce programmer does not set the Reducer class 
using JobConf.setReducerClass, then IdentityReducer.class is used as the default. 
 
 
What are the main configuration parameters that a user needs to specify to run a MapReduce 
job? 

The user of the MapReduce framework needs to specify the following (a driver sketch wiring 
these together appears after the list): 
 
- Job’s input locations in the distributed file system 
- Job’s output location in the distributed file system 
- Input format 
- Output format 
- Class containing the map function 
- Class containing the reduce function 
- JAR file containing the mapper, reducer and driver classes 
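A driver sketch wiring these parameters together (WordCountDriver and TokenMapper are 
illustrative names; IntSumReducer ships with Hadoop and appears in item 5 above): 

     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.Path;
     import org.apache.hadoop.io.IntWritable;
     import org.apache.hadoop.io.Text;
     import org.apache.hadoop.mapreduce.Job;
     import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
     import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
     import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
     import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
     import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

     public class WordCountDriver {
         public static void main(String[] args) throws Exception {
             Job job = Job.getInstance(new Configuration(), "word count");
             job.setJarByClass(WordCountDriver.class);         // JAR with the classes
             job.setMapperClass(TokenMapper.class);            // class with the map function
             job.setReducerClass(IntSumReducer.class);         // class with the reduce function
             job.setInputFormatClass(TextInputFormat.class);   // input format
             job.setOutputFormatClass(TextOutputFormat.class); // output format
             job.setOutputKeyClass(Text.class);
             job.setOutputValueClass(IntWritable.class);
             FileInputFormat.addInputPath(job, new Path(args[0]));   // input location
             FileOutputFormat.setOutputPath(job, new Path(args[1])); // output location
             System.exit(job.waitForCompletion(true) ? 0 : 1);
         }
     }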
 
Explain what SequenceFileInputFormat is? 
SequenceFileInputFormat is the input format for reading SequenceFiles: a Hadoop-specific 
compressed binary file format optimized for passing data from the output of one MapReduce 
job to the input of another MapReduce job. 

Common input formats defined in Hadoop: 
a. TextInputFormat 
b. KeyValueTextInputFormat 
c. SequenceFileInputFormat 
 
What is an InputSplit in Hadoop? 
When a Hadoop job runs, it splits the input files into chunks and assigns each split to a 
mapper to process. Such a chunk is called an InputSplit. 
 
What is the purpose of RecordReader in Hadoop? 
The InputSplit has defined a slice of work, but does not describe how to access it. The 
RecordReader class actually loads the data from its source and converts it into (key, value) pairs 
suitable for reading by the Mapper. The RecordReader instance is defined by the InputFormat. 
 
After the Map phase finishes, the Hadoop framework does “Partitioning, Shuffle and sort”. 
Explain what happens in this phase? 
Partitioning: It is the process of determining which reducer instance will receive which 
intermediate keys and values. Each mapper must determine for all of its output (key, value) 
pairs which reducer will receive them. It is necessary that for any key, regardless of which 
mapper instance generated it, the destination partition is the same. 
Shuffle: After the first map tasks have completed, the nodes may still be performing several 
more map tasks each. But they also begin exchanging the intermediate outputs from the map 
tasks to where they are required by the reducers. This process of moving map outputs to the 
reducers is known as shuffling. 
Sort: Each reduce task is responsible for reducing the values associated with several 
intermediate keys. The set of intermediate keys on a single node is automatically sorted by 
Hadoop before they are presented to the Reducer. 
 
What is a Combiner? 
The Combiner is a ‘mini-reduce’ process which operates only on data generated by a mapper. 
The Combiner will receive as input all data emitted by the Mapper instances on a given node. The 
output from the Combiner is then sent to the Reducers, instead of the output from the Mappers. 
 
What are some typical functions of the JobTracker? 
The following are some typical tasks of the JobTracker: 
- It accepts jobs from clients. 
- It talks to the NameNode to determine the location of the data. 
- It locates TaskTracker nodes with available slots at or near the data. 
- It submits the work to the chosen TaskTracker nodes and monitors the progress of each task 
by receiving heartbeat signals from the TaskTrackers. 
 
How does speculative execution work in Hadoop?   
The JobTracker makes different TaskTrackers process the same input. When tasks complete, they 
announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the 
definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers 
to abandon those tasks and discard their outputs. The Reducers then receive their inputs from 
whichever Mapper completed successfully first. 
 
What is Hadoop Streaming?   
Streaming is a generic API that allows programs written in virtually any language to be used as 
Hadoop Mapper and Reducer implementations. 
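A classic sketch using standard Unix tools as the mapper and reducer (the streaming JAR path 
varies by installation; input/output paths are illustrative): 

$ hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \ 
    -input /user/me/input \ 
    -output /user/me/output \ 
    -mapper /bin/cat \ 
    -reducer /usr/bin/wc 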
 
What is Distributed Cache in Hadoop? 
Distributed Cache is a facility provided by the MapReduce framework to cache files (text, 
archives, jars and so on) needed by applications during execution of the job. The framework will copy 
the necessary files to the slave node before any tasks for the job are executed on that node. 
 
How can you set an arbitrary number of mappers to be created for a job in Hadoop? 
You cannot set it directly; the number of mappers is determined by the number of input 
splits, and the map-count parameter only provides a hint to the framework. 
 
How can you set an arbitrary number of Reducers to be created for a job in Hadoop? 
You can either do it programmatically by calling the setNumReduceTasks method on the 
JobConf/Job class, or set it as a configuration setting. 
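For example, via the configuration on the command line, assuming the driver uses ToolRunner 
so that generic options are parsed (the JAR and class names are illustrative): 

$ hadoop jar myapp.jar MyDriver -D mapreduce.job.reduces=5 /input /output 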
 
How will you write a custom partitioner for a Hadoop job? 
To have Hadoop use a custom partitioner you have to do, at minimum, the following three 
things (see the sketch after this list): 
- Create a new class that extends the Partitioner class 
- Override the getPartition method 
- In the wrapper that runs the MapReduce job, either add the custom partitioner to the job 
programmatically using the setPartitionerClass method, or add it to the job as a config file 
(if your wrapper reads from a config file or Oozie) 
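A minimal sketch of such a partitioner (the class name and routing rule are illustrative): 

     import org.apache.hadoop.io.IntWritable;
     import org.apache.hadoop.io.Text;
     import org.apache.hadoop.mapreduce.Partitioner;

     // Routes each key to a partition based on its first character.
     public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
         @Override
         public int getPartition(Text key, IntWritable value, int numPartitions) {
             if (key.getLength() == 0) {
                 return 0;
             }
             char first = Character.toLowerCase(key.toString().charAt(0));
             return first % numPartitions; // char is unsigned, so this is non-negative
         }
     }

     // In the driver: job.setPartitionerClass(FirstLetterPartitioner.class);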