Question1. Which of the following is a monitoring solution for Hadoop?
1. Sirona
2. Sentry
3. Slider
4. Streams
Question2. __________ is a distributed machine learning
framework on top of spark
1. MLlib
2. Spark Streaming
3. GraphX
4. RDDs
Question3. Point out the correct statement.
1. Knox is a stateless reverse proxy framework
2. Knox also intercepts REST/HTTP calls and provides
authentication
3. Knox scales linearly by adding more Knox nodes as the load increases
4. All of the mentioned
Question4. PCollection, PTable, and PGroupedTable all support a __________ operation.
1. Intersection
2. Union
3. OR
4. None of the mentioned
Question 5. How many modes are present in Hama?
1. 2
2. 3
3. 4
4. 5
Question6. The IBM ____________ Platform provides all the foundational building blocks of trusted information, including data integration, data warehousing, master data management, big data and information governance.
1. Infostream
2. Infosphere
3. Infosurface
4. Infodata
Question7. ________ is the name of the archive you would
like to create.
1. Archive
2. Archive name
3. Name
4. None of the mentioned
Question 8. Ambari provides a _______ API that enables integration with existing tools, such as Microsoft System Center.
1. Restless
2. Web services
3. Restful
4. None of the mentioned
Question9. _______ is forge software for the development of software projects.
1. Oozie
2. Allura
3. Ambari
4. All of the mentioned
Question10. The postings format now uses a __________ API when writing postings, just like doc values.
1. Push
2. Pull
3. Read
4. All of the mentioned
Question11. Point out the correct statement.
1. Building PyLucene requires GNU Make, a recent version of Ant capable of building Java Lucene, and a C++ compiler
2. PyLucene is supported on Mac OS X, Linux, Solaris and Windows
3. Use of the setuptools is recommended for Lucene
4. All of the mentioned
Question12. ________ builds virtual machines of branches trunk and 0.3 for KVM, VMware and VirtualBox.
1. Bigtop-trunk-packagetest
2. Bigtop-trunk-repository
3. Bigtop-VM-matrix
4. None of the mentioned
Question13. ZooKeeper is used for configuration and leader election in the cloud edition of
1. Solr
2. Solur
3. Solar101
4. Solr
Question14. How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?
1. Keys are presented to reducer in sorted order; values for
a given key are not sorted
2. Keys are presented to reducer in sorted order; values for
a given key are sorted in ascending order
3. Keys are presented to reducer in random order; values
for a given key are not sorted
4. Keys are presented to reducer in random order; values
for a given key are sorted in ascending order
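The sort-and-shuffle behaviour asked about in Question14 can be illustrated with a minimal Hadoop reducer sketch (the class and type choices below are illustrative, not part of this question bank): the framework delivers each key in sorted order, while the values grouped under a key arrive in no guaranteed order unless a secondary sort is configured.

    // Sketch: keys reach reduce() in sorted order; values for a key are not sorted.
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {   // iteration order of values is unspecified
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }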
Question15. DataStage RTI is a real-time integration pack for:
1. STD
2. ISD
3. EXD
4. None of the above
Question16. Which MapReduce stage serves as a barrier, where all the previous stages must be completed before it may proceed?
1. Combine
2. Group (a.k.a. ‘shuffle’)
3. Reduce
4. Write
Question17. Which of the following formats is more compression-aggressive?
1. Partition compressed
2. Record compressed
3. Block compressed
4. Uncompressed
Question18. _________ is the way of encoding structured data in an efficient yet extensible format.
1. Thrift
2. Protocol buffers
3. Avro
4. None of the above
Question19. Which of the following arguments is not supported by the import-all-tables tool?
1. Class name
2. Package name
3. Database name
4. Table name
Question20. Which of the following operating systems is not supported by Bigtop?
1. Fedora
2. Solaris
3. Ubuntu
4. SUSE
Question21. Distributed modes are mapped in the _____ file.
1. Groomservers
2. Grervers
3. Grsvers
4. Groom
Question22. ________ is the architectural center of Hadoop that allows multiple data processing engines.
1. YARN
2. Hive
3. Incubator
4. Chukwa
Question23. Users can easily run Spark on top of Amazon's ____________
1. Infosphere
2. EC2
3. EMR
4. None of the above
Question24. Which of the following projects is an interface definition language for Hadoop?
1. Oozie
2. Mahout
3. Thrift
4. Impala
Question25. Output of the mapper is first written on the local
disk for sorting and _____ process.
1. Shuffling
2. Secondary sorting
3. Forking
4. Reducing
Question26. HDT projects work with Eclipse version _____ and above.
1. 3.4
2. 3.5
3. 3.6
4. 3.7
Question27. Which of the following languages is not supported by Spark?
1. Java
2. Pascal
3. Scala
4. Python
Question28. Data analytics scripts are written in __________
1. Hive
2. CQL
3. Pig Latin
4. Java
Question29. Ripple is a browser-based mobile phone emulator designed to aid in the development of ______-based mobile applications.
1. JavaScript
2. Java
3. C++
4. HTML5
Question30. If you set the inlineLOB limit to ____, all large
objects will be placed in external storage.
1. 0
2. 1
3. 2
4. 3
Question31. Hadoop achieves reliability by replicating the data across multiple hosts, and hence does not require _____ storage on hosts.
1. RAID
2. Standard RAID levels
3. ZFS
4. Operating system
Question32. The configuration file must be owned by the user running ________.
1. Data manager
2. Node manager
3. Validation manager
4. None of the above
Question33. ________ is a non-blocking, asynchronous, event-driven, high-performance web framework.
1. AWS
2. AWF
3. AWT
4. ASW
Question34. Falcon provides seamless integration with
1. HCatalog
2. Metastore
3. HBase
4. Kafka
Question35. One supported data type that deserves special mention is:
1. Money
2. Counters
3. Smallint
4. Tinyint
Question36. _______ are Chukwa processes that actually produce data
1. Collectors
2. Agents
3. Hbase table
4. HCatalog
Question37. Which of the following Hadoop file formats is supported by Impala?
1. SequenceFile
2. Avro
3. Rcfile
4. All of the above
Question38. Avro is said to be the future ___________ layer of Hadoop
1. RMC
2. RPC
3. RDC
4. All of the above
Question39. ______ nodes are the mechanism by which a
workflow triggers the execution of a computation/processing
task
1. Server
2. Client
3. Mechanism
4. Action
Question40. The _______ attribute in the join node is the
name of the workflow join node
1. Name
2. To
3. Down
4. All of the above
Question41. YARN commands are invoked by the _____ script
1. Hive
2. Bin
3. Hadoop
4. Home
Question42. Which of the following functions is used to read data in Pig?
1. Write
2. Read
3. Load
4. None of the above
Question43. Which of the following Hive commands is not supported by HCatalog?
1. Alter index rebuild
2. Create new
3. Show functions
4. Drop table
Question44. Apache Hadoop Development Tools is an effort undergoing incubation at
1. ADF
2. ASF
3. HCC
4. AFS
Question45. Kafka uses key-value pairs in the _________ file format for configuration
1. RFC
2. Avro
3. Property
4. None of the above
Question46. Facebook tackles big data with __________ based on Hadoop
1. Project prism
2. Prism
3. Project big
4. Project data
Question47. The size of a block in HDFS is
1. 512 bytes
2. 64 MB
3. 1024 KB
4. None of the above
Question48. Which is the most popular NoSQL database for a scalable big data store with Hadoop?
1. HBase
2. mongoDB
3. Cassandra
4. None of the above
Question 49. A ________ can route requests to multiple Knox instances
1. Collector
2. Load balancer
3. Comparator
4. All of the above
Question50. HCatalog is installed with Hive, starting with Hive release
1. 0.10.0
2. 0.9.0
3. 0.11.0
4. 0.12.0
Question51. Table metadata in Hive is:
1. Stored as metadata on the name node
2. Stored along with the data in HDFS
3. Stored in the metastore
4. Stored in zookeeper
Question52. Avro schemas are defined with ________
1. JSON
2. XML
3. JAVA
4. All of the above
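Question52's point can be seen directly in code: an Avro schema is a JSON document. A small hedged Java sketch (the record and field names are made up) that parses one with the standard Avro library:

    // Sketch: Avro schemas are plain JSON, parsed here with org.apache.avro.Schema.Parser.
    import org.apache.avro.Schema;

    public class AvroSchemaDemo {
        public static void main(String[] args) {
            String json = "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
                    + "{\"name\":\"name\",\"type\":\"string\"},"
                    + "{\"name\":\"age\",\"type\":\"int\"}]}";
            Schema schema = new Schema.Parser().parse(json);
            System.out.println(schema.getName());   // prints "User"
        }
    }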
Question53. Spark was initially started by ___________ at UC Berkeley AMPLab in 2009
1. Matei Zaharia
2. Mahek Zaharia
3. Doug Cutting
4. Stonebreaker
Question54. __________ rewrites data and packs rows into columns for certain time periods
1. Open TS
2. Open TSDB
3. Open TSD
4. Open DB
Question55. Which of the following phases occur simultaneously?
1. Shuffle and sort
2. Reduce and sort
3. Shuffle and map
4. All of the above
Question56. The ________ command fetches the contents of a row or a cell
1. Select
2. Get
3. Put
4. None of the above
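Question56 refers to HBase's get operation. A minimal sketch using the HBase Java client (the table, row, family and qualifier names are hypothetical) fetches a row and then narrows to a single cell:

    // Sketch: fetching the contents of a row/cell with the HBase client API.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseGetDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("users"))) {
                Get get = new Get(Bytes.toBytes("row1"));                      // whole row
                get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));   // narrow to one cell
                Result result = table.get(get);
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
            }
        }
    }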
Question57. _______ are encoded as a series of blocks
1. Arrays
2. Enum
3. Unions
4. Maps
Question58. Hive also supports custom extensions written in
1. C#
2. Java
3. C
4. C++
Question59. How many types of nodes are present in a Storm cluster?
1. 1
2. 2
3. 3
4. 4
Question60. All decision nodes must have a ____________ element to avoid bringing the workflow into an error state if none of the predicates evaluates to true.
1. Name
2. Default
3. Server
4. Client
Question61. ________ is a REST API for HCatalog
1. WebHCat
2. Wbhcat
3. Inphcat
4. None of the above
Question62. Streaming supports streaming command options as well as ____________ command options
1. Generic
2. Tool
3. Library
4. Task
Question63. By default, collectors listen on port
1. 8008
2. 8070
3. 8080
4. None of the above
Question64. _______ communicate with the client and handle data-related operations.
1. Master server
2. Region server
3. Htable
4. All of the above
Question65. We can declare the schema of our data either in a ________ file
1. JSON
2. XML
3. SQL
4. VB
Question66. ________ provides a Couchbase Server-Hadoop connector by means of Sqoop
1. Memcache
2. Couchbase
3. Hbase
4. All of the above
Question67. Storm integrates with _________ via Apache Slider
1. Scheduler
2. Yarn
3. Compaction
4. All of the above
Question68. An Avro-backed table can simply be created by using ___________ in a DDL statement
1. Stored as avro
2. Stored as hive
3. Stored as avrohive
4. Stored as serd
Question69. Drill analyzes semi-structured/nested data coming from ______ applications
1. RDBMS
2. NoSQL
3. newSQL
4. none of the above
Question70. The Hadoop list includes the HBase database, the Apache Mahout __________ system and matrix operations.
1. Machine learning
2. Pattern recognition
3. Statistical classification
4. Artificial classification
Question71. Oozie workflow jobs are directed _______
graphs of actions
1. Acyclical
2. Cyclical
3. Elliptical
4. All of the above
Question72. ___ is an open source SQL query engine for Apache HBase
1. Pig
2. Phoenix
3. Pivot’
4. None of the above
Question73. $ pig -x tez_local will enable _____ mode in Pig
1. Mapreduce
2. Tez
3. Local
4. None of the above
Question74. In comparison to SQL, Pig uses
1. Lazy evaluation
2. ETL
3. Supports pipeline splits
4. All of the above
Question75. For Apache _________ users, Storm utilizes the same ODBC interfaces
1. cTAKES
2. Hive
3. Pig
4. Oozie
Question76. If one or more actions started by the workflow job are executing when the _________ node is reached, the actions will be killed.
1. Kill
2. Start
3. End
4. Finish
Question77. Which of the following data types is supported by Hive?
1. Map
2. Record
3. String
4. Enum
Question78. HCatalog supports reading and writing files in any format for which a _____ can be written
1. SerDE
2. SaerDear
3. Doc Sear
4. All
Question79. _______ is a Python port of the Core project
1. Solr
2. Lucene core
3. Lucy
4. Pylucene
Question80. Apache Storm added open source stream data processing to the _________ Data Platform
1. Cloudera
2. Hortonworks
3. Local cloudera
4. Map R
Question81. Which of the following is a spatial information system?
1. Sling
2. Solr
3. SIS
4. All of the above
Question82. _______ properties can be overridden by specifying them in a job-xml file or configuration element.
1. Pipe
2. Decision
3. Flag
4. None of the above
Question83. CDH processes and controls sensitive data and facilitates:
1. Multi-tenancy
2. Flexibility
3. Scalability
4. All of the above
Question84. Avro supports _________ kinds of complex types
1. 3
2. 4
3. 6
4. 7
Question85. With _________ we can store data and read it easily with various programming languages.
1. Thrift
2. Protocol buffers
3. Avro
4. None of the above
Question86. A float parameter defaults to 0.0001f, which means we can deal with 1 error every ________ rows
1. 1000
2. 10000
3. 1 million rows
4. None of the above
Question87. The ________ data mapper framework makes it easier to use a database with Java or .NET applications
1. iBix
2. Helix
3. iBATIS
4. iBAT
Question88. ___________ is the most popular high-level Java API in the Hadoop ecosystem
1. Scalding
2. HCatalog
3. Cascalog
4. Cascading
Question89. Spark includes a collection of over _________ operations for transforming data and familiar data frame APIs for manipulating semi-structured data
1. 50
2. 60
3. 70
4. 80
Question90. ZooKeeper's architecture supports high ________ through redundant services
1. Flexibility
2. Scalability
3. Availability
4. Interactivity
Question91. The Lucene ____________ is pleased to announce the availability of Apache Lucene 5.0.0 and Apache Solr 5.0.0
1. PMC
2. RPC
3. CPM
4. All of the above
Question92. EC2 capacity can be increased or decreased in real time from as few as one to more than ________ virtual machines simultaneously
1. 1000
2. 2000
3. 3000
4. None of the above
Question93. HDT has been tested on _________ and Juno, and can work on Kepler as well
1. Raibow
2. Indigo
3. Idiavo
4. Hadovo
Question94. Each Kafka partition has one server which acts as the __________
1. Leaders
2. Followers
3. Staters
4. All of the above
Question95. The right number of reduces seems to be
1. 0.9
2. 0.8
3. 0.36
4. 0.95
Question96. Which of the following is a configuration
management system?
1. Alex
2. Puppet
3. Acem
4. None of the above
Question97. Which of the following is used only for storage with limited compute?
1. Hot
2. Cold
3. Warm
4. All_SSD
Question98. Groom servers start up with a _______ instance and an RPC proxy to contact the BSP master
1. RPC
2. BSP Peer
3. LPC
4. None of the above
Question99. A ________ represents a distributed, immutable collection of elements of type T.
1. Pcollect
2. Pcollection
3. Pcol
4. All of the above
Question100. ________ is used to read data from byte buffers
1. Write{}
2. Read{}
3. Readwrite{}
4. All of the above
Q101 - Which is the default InputFormat defined in Hadoop?
1. SequenceFileInputFormat
2. ByteInputFormat
3. KeyValueInputFormat
4. TextInputFormat
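TextInputFormat, the answer to Question101, is what a job gets implicitly; setting it explicitly in the driver is equivalent. A hedged sketch of the relevant lines (the job name is arbitrary):

    // Sketch: explicitly selecting the default TextInputFormat in a job driver.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class DriverSketch {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "wordcount");
            job.setInputFormatClass(TextInputFormat.class); // same as the default behaviour
            // ... mapper, reducer and input/output paths would be configured here
        }
    }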
Q102. Which of the following is not an InputFormat in Hadoop?
1.TextInputFormat
2. ByteInputFormat
3. SequenceFileInputFormat
4. KeyValueInputFormat
Q103. Which of the following is a valid flow in Hadoop?
1. Input -> Reducer -> Mapper -> Combiner -> Output
2. Input -> Mapper -> Reducer -> Combiner -> Output
3. Input -> Mapper -> Combiner -> Reducer -> Output
4. Input -> Reducer -> Combiner -> Mapper -> Output
Q104. MapReduce was devised by ...
1.Apple
2. Google
3. Microsoft
4. Samsung
Q105. Which of the following is not a phase of Reducer ?
1. Map
2. Reduce
3. Shuffle
4. Sort
Q106. How many instances of JobTracker can run on a Hadoop cluster?
1. 1
2. 2
3. 3
4. 4
Q107. Which of the following is not a daemon process that runs on a Hadoop cluster?
1. JobTracker
2. DataNode
3. TaskTracker
4. TaskNode
Q108 - As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including:
1. Improved data storage and information retrieval
2. Improved extract, transform and load features for data integration
3.Improved data warehousing functionality
4.Improved security, workload management and SQL
support
Q109 - Point out the correct statement:
1. Hadoop does need specialized hardware to process the data
2. Hadoop 2.0 allows live stream processing of real-time data
3. In the Hadoop programming framework, output files are divided into lines or records
4. None of the mentioned
Q110 - According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop?
1. Big data management and data mining
2. Data warehousing and business intelligence
3. Management of Hadoop clusters
4. Collecting and storing unstructured data
Q111 - Point out the wrong statement:
1. Hadoop's processing capabilities are huge and its real advantage lies in the ability to process terabytes and petabytes of data
2. Hadoop uses a programming model called "MapReduce"; all the programs should conform to this model in order to work on the Hadoop platform
3. The programming model, MapReduce, used by Hadoop is difficult to write and test
4. All of the mentioned
Q112 - What was Hadoop named after?
1. Creator Doug Cutting's favorite circus act
2. Cutting's high school rock band
3. The toy elephant of Cutting's son
4. A sound Cutting's laptop made during Hadoop's development
Q113- All of the following accurately describe Hadoop,
EXCEPT:
1.Open source
2. Real-time
3.Java-based
4. Distributed computing approach
Q114 - __________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.
1.MapReduce
2.Mahout
3.Oozie
4.All of the mentioned
Q115- __________ has the world’s largest Hadoop cluster.
1.Apple
2. Datamatics
3.Facebook
4.None of the mentioned
Q116- Facebook Tackles Big Data With _______ based on
Hadoop.
1.‘Project Prism’
2.‘Prism’
3.‘Project Big’
4. ‘Project Data’
Q 117 - What is the main problem faced while reading and writing data in parallel from multiple disks?
1.Processing high volume of data faster.
2. Combining data from multiple disks.
3. The software required to do this task is extremely costly.
4. The hardware required to do this task is extremely costly.
Q118 - Under Hadoop High Availability, fencing means
1. Preventing a previously active namenode from starting to run again.
2. Preventing the start of a failover in the event of network
failure with the active namenode.
3. Preventing the power down to the previously active
namenode.
4. Preventing a previously active namenode from writing to the edit log.
Q119 - The default replication factor for the HDFS file system in Hadoop is
1. 1
2. 2
3. 3
4. 4
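The replication factor from Question Q119 is governed by the dfs.replication property; a small hedged Java sketch that reads it from whatever *-site.xml files are on the classpath (the 3 below is only this sketch's fallback value):

    // Sketch: reading the configured HDFS replication factor.
    import org.apache.hadoop.conf.Configuration;

    public class ReplicationCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();             // loads core/hdfs-site.xml if present
            int replication = conf.getInt("dfs.replication", 3);  // Hadoop ships 3 as the default
            System.out.println("dfs.replication = " + replication);
        }
    }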
Q120 - The hadoop fs command put is used to
1.Copy files from local file system to HDFS.
2. Copy files or directories from local file system to HDFS.
3. Copy files from HDFS to the local filesystem.
4. Copy files or directories from HDFS to the local filesystem.
Q121 - The namenode knows that the datanode is active using a mechanism known as
1.heartbeats
2. datapulse
3. h-signal
4. Active-pulse
Q122 - When a machine is declared as a datanode, the disk space in it
1.Can be used only for HDFS storage
2. Can be used for both HDFS and non-HDFS storage
3. Cannot be accessed by non-Hadoop commands
4. Cannot store text files.
Q123 - The data from a remote Hadoop cluster can
1. not be read by another Hadoop cluster
2. be read using http
3. be read using hhtp
4. be read using hftp
Q124 - Which one is not one of the big data features?
1. Velocity
2. Veracity
3. Volume
4. Variety
Q125 - What is HBASE?
1. HBase is a separate set of Java APIs for the Hadoop cluster.
2. HBase is a part of the Apache Hadoop project that provides an interface for scanning large amounts of data using Hadoop infrastructure.
3. HBase is a "database-like" interface to Hadoop cluster data.
4. HBase is a part of the Apache Hadoop project that
provides a SQL like interface for data processing.
Q125 - Which of the following is false about RawComparator?
1.Compare the keys by byte.
2. Performance can be improved in the sort and shuffle phase by using RawComparator.
3. Intermediary keys are deserialized to perform a
comparison.
Q 126 - ZooKeeper ensures that
1. All the namenodes are actively serving the client
requests
2. Only one namenode is actively serving the client requests
3. A failover is triggered when any of the datanodes fails.
4. A failover cannot be started by the Hadoop administrator.
Q 127 - Which scenario demands the highest bandwidth for data transfer between nodes in Hadoop?
1. Different nodes on the same rack
2. Nodes on different racks in the same data center.
3. Nodes in different data centers
4. Data on the same node.
Q128 - The Hadoop framework is written in
1. C++
2. Python
3. Java
4. GO
Q129 - When a client contacts the namenode for accessing a file, the namenode responds with
1. Size of the file requested.
2. Block ID of the file requested.
3. Block ID and hostname of any one of the data nodes containing that block.
4. Block ID and hostname of all the data nodes containing
that block.
Q130 - Which of the following is not a goal of HDFS?
1. Fault detection and recovery
2. Handle huge dataset
3. Prevent deletion of data
4. Provide high network bandwidth for data movement
Q 131 - In HDFS the files cannot be
1. read
2. deleted
3. executed
4. Archived
Q132 - The number of tasks a task tracker can accept depends on
1. Maximum memory available in the node
2. Not limited
3. Number of slots configured in it
4. As decided by the JobTracker
Q133 - When using HDFS, what occurs when a file is deleted
from the command line?
1. It is permanently deleted if trash is enabled.
2. It is placed into a trash directory common to all users for
that cluster.
3. It is permanently deleted and the file attributes are recorded in a log file.
4. It is moved into the trash directory of the user who
deleted it if trash is enabled.
Question135. MapReduce has undergone a complete overhaul in Hadoop ________
1. 0.21
2. 0.23
3. 0.24
4. 0.26
Question136. __________ is the slave/worker node
and holds the user data in the form of data blocks
1.Data node
2.Name node
3.Data block
4.Replication
Question137. Spark is engineered from the bottom up for performance, running __________ faster
1.100x
2.150x
3.200x
4.None of the above
Question138. _________ nodes are the mechanism by
which a workflow triggers the execution of a
computation/processing task
1.Server
2. Client
3.Mechanism
4.Action
Question139. __________ maps input key/value pairs
to a set of intermediate key/value pairs
1.Mapper
2.Reducer
3.Mapper and reducer
4.None of the above
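The mapper contract in Question139 can be sketched in Java; the word-count-style tokenisation below is illustrative only, the point being that map() turns each input key/value pair into zero or more intermediate key/value pairs.

    // Sketch: a mapper emitting intermediate (word, 1) pairs for each input line.
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    context.write(new Text(token), ONE);   // intermediate key/value pair
                }
            }
        }
    }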
Question140. ZooKeeper keeps track of the cluster state, such as the ____________ table location
1.Domain
2.Node
3.Root
4.All of the above
Question141. When __________ contents exceed a
configurable threshold, the memtable data, which
includes indexes, is put in a queue to be flushed to disk
1.Subtable
2.Memtable
3.Intable
4.Memorytable
Question142. Apache Knox accesses the Hadoop cluster over
1.HTTP
2. TCP
3.ICMP
4.None of the above
Question143. ___________ supports a new command shell, Beeline, that works with HiveServer2.
1.Hiveserver2
2.Hiveserver3
3.Hiveserver4
4.Hiveserver5
Question144. ________ sink can be a text file, the console display, a simple HDFS path or a null bucket where the data is simply deleted
1.Collector tier event
2.Agent tier event
3.Basic
4.None of the above
Question145. __________ name node is used when
the primary name node goes down
1.Rack
2. Data
3.Secondary
4.None of the above
Question146. Data transfer between the web console and clients is protected by using
1.SSL
2.Kerberos
3.SSH
4.None of the above
Question147. Which of the following is one of the possible states for a workflow job?
1.PREP
2.START
3.RESUME
4.END
Question148. Stratos will be a polyglot _______ framework
1.Daas
2.Paas
3.Saas
4.Raas
Question149. All file access uses Java's _______ APIs, which give Lucene stronger index safety
1.NIO.2
2.NIO.3
3.NIO.4
4.NIO.5
Question150. Which of the following is a standards-compliant XML Query processor?
1.Whirr
2.VXQuery
3.Knife
4.Lens
Question151. ______ is a query processing and optimization system for large-scale, distributed data analysis
1.MRQL
2.Nifi
3.Openaz
4.ODF toolkit
Question152. reduceProgress() gets the progress of the job's reduce tasks as a float between
1. 0.0-1.0
2. 1.0-2.0
3. 2.0-3.0
4. 3.0-4.0
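Question152 refers to Job.reduceProgress(), which reports reduce-task progress as a float in the 0.0-1.0 range. A hedged polling sketch (job configuration and submission are assumed to happen elsewhere):

    // Sketch: polling map and reduce progress of a submitted job.
    import org.apache.hadoop.mapreduce.Job;

    public class ProgressSketch {
        static void report(Job job) throws Exception {
            while (!job.isComplete()) {
                System.out.printf("map %.2f  reduce %.2f%n",
                        job.mapProgress(), job.reduceProgress());   // both lie in [0.0, 1.0]
                Thread.sleep(5000);
            }
        }
    }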
Question153. _____ is a framework for building Java server application GUIs
1. MyFaces
2.Muse
3.Flume
4.Big top
Question154. Apache Flume 1.3.0 is the fourth release under the auspices of Apache of the so-called _____ codeline
1.NG
2.ND
3.NF
4.NR
Question155. Starting in Hive ______, the Avro schema can be inferred from the Hive table schema
1.0.14
2.0.12
3.0.13
4.0.11
Question156. A workflow definition is a ___ with
control flow nodes or action nodes
1. CAG
2.DAG
3.BAG
4.None of the above
Question157. Lucene provides scalable, high-performance indexing of over ______ per hour on modern hardware
1.1TB
2.150GB
3.10GB
4.200 GB
Question158. The right level of parallelism for maps seems to be around _____ maps per node
1. 1 to 10
2.10 to 100
3.100 to 150
4.150 to 200
Question159. The LZO compression format is
composed of approximately ______ blocks of
compressed data
1.128k
2.256k
3. 24k
4.36k
Question160. ___________ is a software development collaboration tool.
1.Buildr
2.Cassandra
3.Bloodhound
4.All of the above
Question161. A ________ is an operation on the
stream that can transform the stream
1.Decorator
2.Source
3.Sinks
4.All of the above
Question162. _________ has the world's largest Hadoop cluster
1.Apple
2.Datamatics
3.Facebook
4.None of the above
Question163. When a ___________ is triggered, the client receives a packet saying that the znode has changed
1. Event
2.Watch
3.Row
4.Value
Question164. Ambari leverages __________ for system alerting and will send emails when your attention is needed
1.Nagios
2.Nagaond
3.Ganglia
4.None of the above
Question165. ____ is a software distribution
framework based on OSGi
1.ACE
2.Abdera
3.Zeppelin
4.Accumulo
Question166. Which of the following is a content management and publishing system based on Cocoon?
1. Libcloud
2.Kafka
3.Lenya
4.All of the above
Question167. If the failure is of a ________ nature, Oozie will suspend the workflow job
1.Transient
2.Non transient
3.Permanent
4.Non permanent
Question168. _________ node distributes code across the cluster
1.Zookeeper
2.Nimbus
3.Supervisor
4. None of the above
Question169. A workflow definition must have one _____ node
1.Start
2.Resume
3.Finish
4. None of the above
Question170. _________ is a REST API for HCatalog
1. WebHCat
2.WbhCAT
3.InpJcat
4.None of the above
Question171. Which of the following files contains user-defined functions (UDFs)?
1.Script2-local.pig
2.Pig.jar
3.Tutorial.jar
4.Excite.log.bz2
Question172. Helprace is using ZooKeeper on a ______ cluster in conjunction with Hadoop and HBase
1.3 node
2.4 node
3.5 node
4.6 node
Question173. ____________ represents the logical computations of your Crunch pipelines
1.Do Fns
2.Three Fns
3.Do fn
4. None of the above
Question174. _____________ has stronger ordering
guarantees than a traditional messaging system
1.Kafka
2.Slider
3.Suz
4.None of the above
Question175. HBase is _________, defining only column families
1.Row oriented
2.Scheme less
3.Fixed scheme
4.All of the above
Question176. An input ___________ is a chunk of the
input that is processed by a single map
1.Textformat
2.Split
3.Datanode
4.All of the above
Question177. ___________ permits data written by one system to be efficiently sorted by another system
1.Complex data type
2. Order
3.Sort order
4.All of the above
Question178. __________ text is appropriate for most non-binary data types
1.Character
2.Binary
3.Delimited
4.None of the above
Question179. __________ is an open source set of libraries, tools, examples, and documentation engineered for the Hadoop ecosystem
1.Kite
2.Kize
3.Ookie
4.All of the above
Question180. Map output larger than _______ percent of the memory allocated to copying map outputs
1.10
2.15
3.25
4.35
Question181. Cassandra creates a __________ for each table, which allows you to symlink a table to a chosen physical drive or data volume
1.Directory
2.Subdirectory
3.Domain
4.Path
Question182. Use ________ and embed the schema in the CREATE statement
1. Schema.literal
2.Scheme.lit
3.Row.literal
4.All of the above
Question183. Which of the following can be used to launch Spark jobs inside MapReduce?
1. SIM
2. SIMR
3. SIR
4. RIS
Question184. HDFS works in a __________ fashion
1.Master worker
2. Master slave
3.Worker/slave
4.All of the above
Question185. HDFS by default replicates each data
block ____ times on different nodes and on at least
____ racks
1. 3, 2
2. 1, 2
3. 2, 3
4. 1, 3
Question186. You can run Pig in batch mode using __________
1.Pig shell command
2.Pig scripts
3.Pig options
4.All of the above
Question187. Which of the following is a primitive data type in Avro?
1.Null
2.Boolean
3.Float
4.All of the above
Question188. ___________ name node is used when
the primary name node goes down
1.Rack
2.Data
3.Secondary
4. None of the above
Question189. Which command is used to disable all the
tables matching the given regex?
1.Remove all
2.Drop all
3.Disable all
4.All of the above
Question190. Ambari ___________ deliver a template
approach to cluster deployment
1.View
2.Stack advisor
3.Blueprints
4.All of the above
Question191. Cassandra uses a protocol called
__________ to discover location and state information
1.Gossip
2.Intergos
3. Goss
4.All of the above
Question192. Gzip (short for GNU zip) generates compressed files that have a ________ extension
1. .gzip
2. .gz
3. .gzp
4. .g
Question193. Falcon provides _________ workflow for
copying data from source to target.
1.Recurring
2.Investment
3.Data
4.None of the above
Question194. ___________ is the node responsible for all reads and writes for the given partition
1.Replicas
2.Leader
3.Follower
4.Isr
Question195. The compression offset map grows to _______ GB per terabyte compressed
1. 1-3
2. 42659
3. 20-22
4. 0-1
Question196. Drill also provides intuitive extensions to
SQL to work with ____________ data types
1.Simple
2.Nested
3.Int
4.All of the above
Question197. Hive uses _________ for logging
1.Logj4
2.Log4l
3.Log4i
4.Log4j
Question198. Spark SQL provides a domain-specific language to manipulate _______________ in Scala, Java or Python
1.Spark streaming
2.Spark SQL
3.RDDs
4.All of the above
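Question198's domain-specific language is easiest to see in a short Spark SQL sketch; the Java DataFrame/Dataset API shown here is one form of it, and the file path and column names are made up for illustration.

    // Sketch: Spark SQL's DSL for manipulating structured data from Java.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.col;

    public class SparkSqlSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("dsl-sketch").master("local[*]").getOrCreate();
            Dataset<Row> people = spark.read().json("people.json");   // hypothetical input
            people.filter(col("age").gt(21))
                  .groupBy("city").count()
                  .show();
            spark.stop();
        }
    }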
Question199. HBase is a distributed _________ database built on top of the Hadoop file system
1.Column oriented
2.Row oriented
3.Tuple oriented
4.None of the above
Question200. Which of the following has a method to deal with metadata?
1. LoadPushDown
2. LoadMetadata
3. LoadCaster
4.All of the above
Question201. Which of the following is a collaborative
data analytics and visualization tool?
1.ACE
2.Abdera
3.Zeppelin
4.Accumulo
Question202. Ignite is a unified _______ data fabric providing high-performance, distributed in-memory data management
1.Column
2. In-memory
3.Row oriented
4.Column oriented
Question203. Avro messages are framed as a list of
__________
1.Buffers
2.Frames
3.Rows
4.Column
Question204. ___________ is a distributed and scalable OLAP engine built on Hadoop to support extremely large data sets
1.Kylin
2.Lens
3.Log4cxx2
4.MRQL
Question205. Sqoop is an open source tool written at ___________
1.Cloudera
2.IBM
3.Microsoft
4. All of the above
Question206. ZooKeeper essentially mirrors the _______ functionality exposed in the Linux kernel
1.Iread
2.Inotify
3.Iwrite
4.Icount
Question207. Apache Bigtop uses _________ for continuous integration testing
1.Jenkinstop
2.Jerry
3.Jenkins
4.None of the above
Question208. Which of the following commands is used to show the values of keys used in Pig?
1.Set
2.Declare
3.Display
4.All of the above
Question209. For Apache ________ users, Storm utilizes the same ODBC interfaces
1. cTAKES
2.Hive
3. Pig
4.Oozie
Question210. The tokens are passed through a Lucene ___________ to produce n-grams of the desired length
1.Shnglefil
2.Shingle filter
3.Single filter
4.Collfilter