Course Name: Big Data Analytics & Business Intelligence
1. When is the earliest point at which the reduce method of a given Reducer can be called?
a) As soon as at least one mapper has finished processing its input split.
b) As soon as a mapper has emitted at least one record.
c) Not until all mappers have finished processing all records.
d) It depends on the InputFormat used for the job.
2. Point out the wrong statement:
a) Replication factor can be configured at the cluster level (default is 3) and also at the file level
b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode
c) User data is stored on the local file system of DataNodes
d) A DataNode is aware of the files to which the blocks stored on it belong
3. Which phase of the data analytics lifecycle usually takes the longest?
a) Phase 2: Data Preparation
b) Phase 3: Model Planning
c) Phase 4: Model Building
d) Phase 5: Communicate Results
4. Hadoop achieves reliability by replicating the data across multiple hosts, and hence does not require __________ storage on hosts.
a) RAID b) ZFS c) DFS d) HFS
5. __________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets.
a) Pig Latin b) Oozie c) Pig d) Hive
6. Facebook tackles big data with __________ based on Hadoop.
a) Prism b) Project Prism c) Project Big d) Project Data
7. The number of maps is usually driven by the total size of the __________.
a) Task b) Output c) Input d) None
8. Input to the _______________ is the sorted output of the mappers.
a) Reducer b) Mapper c) Shuffle d) All of the above
9. __________ is the world’s most complete, tested, and popular distribution of Apache Hadoop and related projects.
a) MDH b) CDH c) ADH d) BDH
10. Which of the following is a base package for the R language?
a) util b) lang c) tools d) All of the above