Big Data Solutions MCQ With Answer
1. A ________ serves as the master, and there is only one NameNode per cluster.
a. Data Node
b. NameNode
c. Data block
d. Replication
Answer: b
2. Point out the correct statement.
a. DataNode is the slave/worker node and
holds the user data in the form of Data
Blocks
b. Each incoming file is broken into 32 MB by
default
c. Data blocks are replicated across different
nodes in the cluster to ensure a low degree
of fault tolerance
d. None of the mentioned
Answer: a
3. HDFS works in a __________ fashion.
a. master-worker
b. master-slave
c. worker/slave
d. all of the mentioned
Answer: a
4. ________ NameNode is used when the
Primary NameNode goes down.
a. Rack
b. Data
c. Secondary
d. None of the mentioned
Answer: c
5. Point out the wrong statement.
a. Replication Factor can be configured at a
cluster level (Default is set to 3) and also at
a file level
b. Block Report from each DataNode contains
a list of all the blocks that are stored on that
DataNode
c. User data is stored on the local file system
of DataNodes
d. DataNode is aware of the files to which
the blocks stored on it belong to
Answer: d
6. Which of the following scenarios may not be a good fit for HDFS?
a. HDFS is not suitable for scenarios
requiring multiple/simultaneous writes to
the same file
b. HDFS is suitable for storing data related to
applications requiring low latency data
access
c. HDFS is suitable for storing a large number of small files
d. None of the mentioned
Answer: a
7. The need for data replication can arise in
various scenarios like ____________
a. Replication Factor is changed
b. DataNode goes down
c. Data Blocks get corrupted
d. All of the mentioned
Answer: d
8. ________ is the slave/worker node and holds
the user data in the form of Data Blocks.
a. DataNode
b. NameNode
c. Data block
d. Replication
Answer: a
9. HDFS provides a command line interface called __________, which is used to interact with HDFS.
a. “HDFS Shell”
b. “FS Shell”
c. “DFS Shell”
d. None of the mentioned
Answer: b
10. HDFS is implemented in ___________
programming language.
a. C++
b. Java
c. Scala
d. None of the mentioned
Answer: b
11. For YARN, the ___________ Manager UI
provides host and port information.
a. Data Node
b. NameNode
c. Resource
d. Replication
Answer: c
12. Point out the correct statement.
a. The Hadoop framework publishes the
job flow status to an internally
running web server on the master
nodes of the Hadoop cluster
b. Each incoming file is broken into 32 MB
by default
c. Data blocks are replicated across
different nodes in the cluster to ensure a
low degree of fault tolerance
d. None of the mentioned
Answer: a
13. For ________ the HBase Master UI provides
information about the HBase Master uptime.
a. HBase
b. Oozie
c. Kafka
d. All of the mentioned
Answer: a
14. During start up, the ___________ loads the file
system state from the fsimage and the edits log
file.
a. DataNode
b. NameNode
c. ActionNode
d. None of the mentioned
Answer: b
15. What is the utility of HBase?
a. It is the tool for Random and Fast
Read/Write operations in Hadoop
b. Acts as Faster Read only query engine in
Hadoop
c. It is MapReduce alternative in Hadoop
d. It is Fast MapReduce layer in Hadoop
Answer: a
16. What is Hive used as?
a. Hadoop query engine
b. MapReduce wrapper
c. Hadoop SQL interface
d. All of the above
Answer: d
17. What is the default size of the HDFS block?
a. 32 MB
b. 64 KB
c. 128 KB
d. 64 MB
Answer: d
18. In HDFS, what is the default replication factor of a Data Node?
a. 4
b. 1
c. 3
d. 2
Answer: c
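The two defaults above combine directly: with a 64 MB block size and replication factor 3, a file's physical footprint can be estimated with simple arithmetic. A minimal Python sketch (standalone illustration, not a Hadoop API call; the function name is made up for this example):

```python
import math

BLOCK_SIZE_MB = 64      # default HDFS block size (Hadoop 1.x)
REPLICATION = 3         # default replication factor

def hdfs_footprint(file_size_mb):
    """Return (block_count, total_replicated_mb) for a file.

    The last block may be smaller than BLOCK_SIZE_MB; HDFS only
    stores the actual bytes, so the footprint uses the file size.
    """
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return blocks, file_size_mb * REPLICATION

print(hdfs_footprint(200))   # a 200 MB file -> 4 blocks, 600 MB stored
```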
19. What is the name of the protocol used to create replicas in HDFS?
a. Forward protocol
b. Sliding Window Protocol
c. HDFS protocol
d. Store and Forward protocol
Answer: c
20. HDFS data blocks can be read in parallel.
a. True
b. False
Answer: a
21. Which of the following is a fact about combiners
in MapReduce?
a. Combiners can be used for mapper
only job
b. Combiners can be used for any Map
Reduce operation
c. Mappers can be used as a combiner
class
d. Combiners are primarily aimed to
improve Map Reduce performance
e. Combiners can’t be applied for
associative operations
Answer: d
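Why answer (d) holds: a combiner pre-aggregates each mapper's output before the shuffle, so less data crosses the network, while the final result is unchanged for associative, commutative operations such as summation. A small self-contained Python sketch (illustrative word count, not actual Hadoop code):

```python
from collections import Counter

def map_phase(line):
    return [(word, 1) for word in line.split()]

def combine(pairs):                     # runs locally per mapper
    c = Counter()
    for word, n in pairs:
        c[word] += n
    return list(c.items())

def reduce_phase(all_pairs):            # runs after the shuffle
    c = Counter()
    for word, n in all_pairs:
        c[word] += n
    return dict(c)

lines = ["big data big", "data lake"]
without = reduce_phase(p for line in lines for p in map_phase(line))
with_comb = reduce_phase(p for line in lines for p in combine(map_phase(line)))
# Same answer either way -- the combiner only reduces intermediate volume.
assert without == with_comb == {"big": 2, "data": 2, "lake": 1}
```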
22. In Hadoop, the Distributed Cache can be used in
which of the following phases?
a. Mapper phase only
b. Reducer phase only
c. In either phase, but not on both sides
simultaneously
d. In either phase
Answer: d
23. Which of the following type of joins can be
performed in Reduce side join operation?
a. Equi Join
b. Left Outer Join
c. Right Outer Join
d. Full Outer Join
e. All of the above
Answer: e
24. A MapReduce function can be written in:
a. Java
b. Ruby
c. Python
d. Any Language which can read from input stream
Answer: d
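Answer (d) refers to Hadoop Streaming: any executable that reads the input stream and writes key/value lines to the output stream can serve as a mapper or reducer. A hypothetical word-count mapper sketched in Python (the sample input lines are made up; a real job would iterate over sys.stdin):

```python
def streaming_mapper(stream):
    """Yield word\t1 pairs, one per line -- the tab-separated
    key/value convention Hadoop Streaming expects on stdout."""
    for line in stream:
        for word in line.split():
            yield f"{word}\t1"

# In a real Streaming job this script would be passed via
#   hadoop jar hadoop-streaming.jar -mapper mapper.py ...
# and would read sys.stdin; here we feed sample lines instead.
for record in streaming_mapper(["big data", "big"]):
    print(record)
```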
25. Is there an input format for map files?
a. Yes, but only in Hadoop 0.22+.
b. Yes, there is a special format for map files.
c. No, but sequence file input format can
read map files.
d. Both b and c are correct answers
Answer: c
26. Which MapReduce phase is theoretically able to utilize features of the underlying file system in order to optimize parallel execution?
a. Split
b. Map
c. Combine
d. Reduce
Answer: a
27. Which method of the FileSystem object is used
for reading a file in HDFS
a. open()
b. access()
c. select()
d. None of the above
Answer: a
28. Which company runs the world’s largest Hadoop cluster?
a. Apple
b. Facebook
c. Datamatics
d. None of the mentioned
Answer: b
29. Facebook tackles Big Data with ________, based
on Hadoop.
a. ‘Project Data’
b. ‘Prism’
c. ‘Project Big’
d. ‘Project Prism’
Answer: d
30. How many SequenceFile formats are present in
Hadoop I/O?
a. 2
b. 8
c. 9
d. 3
Answer: d
31. The slowest compression technique among the following is ______
a. Bzip2
b. LZO
c. Gzip
d. All of the mentioned
Answer: a
32. __________ typically compresses files to within
10% to 15% of the best available techniques.
a. Bzip2
b. LZO
c. Gzip
d. Both b and c
Answer: a
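The trade-off behind questions 31 and 32 can be tried with Python's standard gzip and bz2 bindings: bzip2 generally achieves the better compression ratio while running slower (illustrative only; exact sizes depend on the input data):

```python
import bz2
import gzip

# Highly repetitive sample text compresses well under both codecs.
data = b"hadoop hdfs block replication " * 2000

gz = gzip.compress(data)
bz = bz2.compress(data)

# Both outputs are far smaller than the input; bzip2 typically
# edges out gzip on ratio at the cost of CPU time.
print(len(data), len(gz), len(bz))
```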
33. Which of the following provides search
technology and Java-based indexing?
a. Solr
b. Lucy
c. Lucene Core
d. None of these
Answer: c
34. Avro schemas are defined with _____
a. JAVA
b. XML
c. All of the mentioned
d. JSON
Answer: d
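As the answer states, an Avro schema is a plain JSON document. A hypothetical record schema, checked here with nothing more than the standard json module (the record and field names are made up for this example; the avro library itself is not needed for the illustration):

```python
import json

# Illustrative Avro-style schema for a hypothetical "User" record.
schema_text = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id",   "type": "long"},
    {"name": "name", "type": "string"}
  ]
}
"""

schema = json.loads(schema_text)   # valid JSON, hence a valid document
print(schema["name"], [f["name"] for f in schema["fields"]])
```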
35. Thrift resolves possible conflicts through the
_________ of the field.
a. Name
b. UID
c. Static number
d. All of the mentioned
Answer: c
36. Avro is said to be the future _______ layer of
Hadoop.
a. RMC
b. RPC
c. RDC
d. All of the mentioned
Answer: b
37. Which of the following has high storage density?
a. RAM_DISK
b. ARCHIVE
c. ROM_DISK
d. All of the mentioned
Answer: b
38. HDFS provides a command line interface called
__________, which is used to interact with HDFS.
a. “HDFS Shell”
b. “FS Shell”
c. “DFS Shell”
d. None of the mentioned
Answer: b
39. Which format from the given format is more
compression-aggressive?
a. Partition Compressed
b. Record Compressed
c. Block-Compressed
d. Uncompressed
Answer: c
40. Avro schemas describe the format of the
message and are defined using ____
a. JSON
b. XML
c. JS
d. All of the mentioned
Answer: a
41. Which editor is used for editing files in HDFS?
a. Vi Editor
b. Python editor
c. DOS editor
d. DEV C++ Editor
Answer: a
42. Command to view the directories and files in a
specific directory:
a. ls
b. fs -ls
c. hadoop fs -ls
d. hadoop fs
Answer: c
43. Which among the following is correct?
S1: MapReduce is a programming model for
data processing
S2: Hadoop can run MapReduce programs
written in various languages
S3: MapReduce programs are inherently
parallel
a. S1 and S2
b. S2 and S3
c. S1 and S3
d. S1, S2 and S3
Answer: d
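The three statements in question 43 can be seen in miniature: the map step applies independently to each input split (which is what makes the model inherently parallel), and only the reduce step combines results. A Python sketch of the model, not actual Hadoop code:

```python
from functools import reduce

# Two input "splits" processed independently -- each map call
# touches only its own partition, so the maps could run on
# different nodes in parallel.
splits = [[1, 2, 3], [4, 5]]

mapped = [[x * x for x in split] for split in splits]       # map phase
total = reduce(lambda a, b: a + b,                          # reduce phase
               (v for part in mapped for v in part))

print(total)   # 1 + 4 + 9 + 16 + 25 = 55
```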
44. Mapper class is
a. generic type
b. abstract type
c. static type
d. final
Answer: a
45. Which package provides the basic types of
Hadoop?
a. org.apache.hadoop.io
b. org.apache.hadoop.util
c. org.apache.hadoop.type
d. org.apache.hadoop.lang
Answer: a
46. Which among the following handles job control
in Hadoop?
a. Mapper class
b. Reducer class
c. Task class
d. Job class
Answer: d
47. Hadoop runs the jobs by dividing them into
a. maps
b. tasks
c. individual files
d. None of these
Answer: b
48. Which are the two nodes that control the job
execution process of Hadoop?
a. Job Tracker and Task Tracker
b. Map Tracker and Reduce Tracker
c. Map Tracker and Job Tracker
d. Map Tracker and Task Tracker
Answer: a
49. Which among the following schedules tasks to
be run?
a. Job Tracker
b. Task Tracker
c. Job Scheduler
d. Task Controller
Answer: a
50. What are the fixed-size pieces of a MapReduce
job called?
a. records
b. splits
c. tasks
d. maps
Answer: b
51. Where is the output of map tasks written?
a. local disk
b. HDFS
c. File System
d. secondary storage
Answer: a
52. Which among the following is responsible for
processing one or more chunks of data and
producing the output results?
a. Maptask
b. jobtask
c. Mapper class
d. Reducetask
Answer: a
53. Which acts as an interface between Hadoop and
the program written?
a. Hadoop Cluster
b. Hadoop Streams
c. Hadoop Sequencing
d. Hadoop Streaming
Answer: d
54. What are Hadoop Pipes?
a. Java interface to Hadoop MapReduce
b. C++ interface to Hadoop MapReduce
c. Ruby interface to Hadoop MapReduce
d. Python interface to Hadoop MapReduce
Answer: b
55. What does Hadoop Common Package contain?
a. war files
b. msi files
c. jar files
d. exe files
Answer: c
56. Which among the following is the master node?
a. Name Node
b. Data Node
c. Job Node
d. Task Node
Answer: a
57. Which among the following is the slave node?
a. Name Node
b. Data Node
c. Job Node
d. Task Node
Answer: b
58. Which acts as a checkpoint node in HDFS?
a. Name Node
b. Data Node
c. Secondary Name Node
d. Secondary Data Node
Answer: c
59. Which among the following holds the location
of data?
a. Name Node
b. Data Node
c. Job Tracker
d. Task Tracker
Answer: a
60. What is the process of applying the code
received by the JobTracker on the file called?
a. Naming
b. Tracker
c. Mapper
d. Reducer
Answer: c
61. In which mode should Hadoop run in order to
run pipes job?
a. distributed mode
b. centralized mode
c. pseudo distributed mode
d. parallel mode
Answer: c
62. Which of the following are correct?
S1: Namespace volumes are independent of each
other
S2: Namespace volumes are managed by a namenode
a. S1 only
b. S2 only
c. Both S1 and S2
d. Neither S1 nor S2
Answer: c
63. Which of the following architectural changes
are needed to attain high availability in
HDFS?
a. Clients must be configured to handle
namenode failover
b. Datanodes must send block reports to both
namenodes since the block mappings are
stored in a namenode’s memory, and not
on disk
c. namenodes must use highly-available
shared storage to share the edit log
d. All of the above
Answer: d
64. Which controller in HDFS manages the
transition from the active namenode to the
standby?
a. failover controller
b. recovery controller
c. failsafe controller
d. fencing controller
Answer: a
65. Which among the following is not a fencing
mechanism employed by the system in HDFS?
a. killing the namenode’s process
b. disabling namenode’s network port via a
remote management command
c. revoking namenode’s access to the shared
storage directory
d. None of the above
Answer: d
66. What is the value of the property dfs.replication
set in the case of pseudo-distributed mode?
a. 0
b. 1
c. null
d. yes
Answer: b
67. What is the minimum amount of data that a disk
can read or write in HDFS?
a. block size
b. byte size
c. heap
d. None
Answer: a
68. Which HDFS command checks file system and
lists the blocks?
a. hfsck
b. fcsk
c. fblock
d. fsck
Answer: d
69. What is an administered group used to manage
cache permissions and resource usage?
a. Cache pools
b. block pool
c. Namenodes
d. HDFS Cluster
Answer: a
70. Which object encapsulates a client or server’s
configuration?
a. File Object
b. Configuration object
c. Path Object
d. Stream Object
Answer: b
71. Which interface permits seeking to a position in
the file and provides a query method for the
current offset from the start of the file?
a. Seekable
b. PositionedReadable
c. Progressable
d. DataStream
Answer: a
72. Which method is used to list the contents of a
directory?
a. listFiles
b. listContents
c. listStatus
d. listPaths
Answer: c
73. What is the operation that uses wildcard
characters to match multiple files with a single
expression called?
a. globbing
b. pattern matching
c. regex
d. regexfilter
Answer: a
74. What does the globStatus() method return?
a. an array of FileStatus objects
b. an array of ListStatus objects
c. an array of PathStatus objects
d. an array of FilterStatus objects
Answer: a
75. What does the glob question mark (?) match?
a. zero or more characters
b. one or more characters
c. a single character
d. metacharacter
Answer: c
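The glob semantics in questions 73 to 75 can be tried out with Python's standard fnmatch module, which implements the same `?` (exactly one character) and `*` (zero or more characters) wildcards used in HDFS path globbing (the file names below are made up for the illustration):

```python
from fnmatch import fnmatch

# '?' matches exactly one character; '*' matches zero or more.
assert fnmatch("part-1", "part-?")
assert not fnmatch("part-10", "part-?")   # two trailing chars, '?' fails
assert fnmatch("part-10", "part-*")
assert fnmatch("part-", "part-*")         # '*' also matches nothing
print("glob semantics verified")
```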
76. Which method on FileSystem is used to
permanently remove files or directories?
a. remove()
b. rm()
c. del()
d. delete()
Answer: d
77. Which component streams the packets to the first
datanode in the pipeline?
a. DataStreamer
b. FileStreamer
c. InputStreamer
d. PathStreamer
Answer: a
78. Which queue is responsible for asking the
namenode to allocate new blocks by picking a
list of suitable datanodes to store the replicas?
a. ack queue
b. data queue
c. path queue
d. stream queue
Answer: b
79. Which command is used to copy
files/directories?
a. distcp
b. hcp
c. copy
d. cp
Answer: a
80. Which flag is used with distcp to delete any
files or directories from the destination?
a. -remove
b. -rm
c. -del
d. -delete
Answer: d