Big Data Analytics Quiz For AKTU Part 4

Big Data Solutions MCQ With Answer

1. Which among the following is Hadoop’s cluster
resource management system?
a. GLOB
b. YARN
c. ARM
d. SPARK
Answer: b

2. Which of the following processing frameworks
interacts with YARN directly?
a. Pig
b. Hive
c. Crunch
d. None of these
Answer: d

3. Which of the following processing frameworks
run on MapReduce?
a. Pig
b. Hive
c. Crunch
d. All of the above
Answer: d

4. Which among the following are the core
services of YARN?
a. resource manager and node manager
b. namenode and datanode
c. data manager and resource manager
d. data manager and application manager
Answer: a

5. Which constraints can be used to request a
container on a specific node or rack, or
anywhere on the cluster in YARN?
a. Container constraints
b. Space constraints
c. Locality constraints
d. Resource constraints
Answer: c

6. Which among the following can be used to
model YARN applications?
a. one application per user job
b. run one application per workflow
c. long-running application that is shared by
different users
d. All of the above
Answer: d

7. Which follows the one-application-per-user-job
model?
a. MapReduce
b. Spark
c. Apache Slider
d. Samza
Answer: a

8. Which application runs per user session?
a. MapReduce
b. Spark
c. Apache Slider
d. None of the above
Answer: b

9. Which among the following has a long-running
application master for launching other
applications on the cluster?
a. MapReduce
b. Spark
c. Apache Slider
d. None of the above
Answer: c

10. Which among the following can be used for
stream processing?
a. Spark
b. Samza
c. Storm
d. All of the above
Answer: d

11. Which provides a simple programming model
for developing distributed applications on
YARN?
a. Apache Slider
b. Apache Twill
c. Spark
d. Tez
Answer: b

12. Which among the following statements are true
with respect to Apache Twill?
S1: Twill supports real-time logging
S2: Allows the usage of a Java Runnable interface
a. S1 only
b. S2 only
c. Both S1 and S2
d. Neither S1 nor S2
Answer: c

13. Which daemons control the job execution
process in MapReduce 1?
a. jobtracker
b. tasktrackers
c. Both jobtracker and tasktrackers
d. Name node and data node
Answer: c

14. Which among the following coordinates all the
jobs run on the system by scheduling tasks in
MapReduce 1?
a. jobtracker
b. tasktrackers
c. data node
d. Name node
Answer: a

15. Which of the following keeps a record of
the overall progress of each job in MapReduce
1?
a. jobtracker
b. tasktrackers
c. data node
d. Name node
Answer: a

16. Which among the following run tasks and send
progress reports in MapReduce 1?
a. jobtracker
b. tasktrackers
c. data node
d. Name node
Answer: b

17. Which are the tasks of the jobtracker in MapReduce
1?
a. job scheduling
b. task progress monitoring
c. task bookkeeping
d. All of the above
Answer: d

18. Which is responsible for storing job history in
MapReduce 1?
a. jobtracker
b. tasktrackers
c. data node
d. Name node
Answer: a

19. In YARN, the responsibility of jobtracker is
handled by
a. Resource manager
b. application master
c. timeline server
d. All of the above
Answer: d

20. In YARN, the responsibility of tasktracker is
handled by
a. Resource manager
b. application master
c. timeline server
d. Node manager
Answer: d

21. Which stores the application history in YARN?
a. Resource manager
b. application master
c. timeline server
d. Node manager
Answer: c

22. Which among the following are the features of
YARN?
a. Scalability
b. Multitenancy
c. Availability
d. All of the above
Answer: d

23. Which among the following schedulers is
available in YARN?
a. FIFO
b. Shortest Job First
c. Round Robin
d. Shortest Remaining Time
Answer: a

24. Which schedulers are available in
YARN?
a. FIFO
b. Capacity
c. Fair Schedulers
d. All of the above
Answer: d

25. Which among the following schedulers attempts
to allocate resources so that all running
applications get the same share of resources in
YARN?
a. FIFO
b. Capacity
c. Fair Schedulers
d. Round Robin
Answer: c
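
As a rough illustration of the idea behind question 25, the sketch below splits a cluster's capacity evenly among the running applications. The function name `fair_shares` and the single-number capacity are assumptions made for illustration; the real Fair Scheduler also deals with queues, weights, and more than one resource type.

```python
def fair_shares(total_capacity, running_apps):
    """Split capacity evenly among running applications (toy model only)."""
    if not running_apps:
        return {}
    share = total_capacity / len(running_apps)
    return {app: share for app in running_apps}


# A single application gets everything; a second arrival halves each share.
print(fair_shares(100, ["app-1"]))
print(fair_shares(100, ["app-1", "app-2"]))
```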

26. Which among the following schedulers
provides queue elasticity in YARN?
a. FIFO
b. Capacity
c. Fair Schedulers
d. Round Robin
Answer: b

27. Which among the following schedulers in
YARN is used by default?
a. FIFO
b. Capacity
c. Fair Schedulers
d. Round Robin
Answer: b

28. In which XML file is the default scheduler
configuration changed?
a. yarn-site.xml
b. config.xml
c. scheduler.xml
d. yarn-scheduler.xml
Answer: a

29. Which among the following queue scheduling
policies are/is supported by Fair Schedulers in
YARN?
a. FIFO
b. Dominant Resource Fairness
c. preemption
d. All of the above
Answer: d

30. Which holds the list of rules for queue
placement in Fair Scheduling?
a. queuePlacementPolicy
b. rulePlacementPolicy
c. scheduleQueuePolicy
d. schedulingPolicy
Answer: a

31. Which setting is used to enable preemption
globally?
a. yarn.scheduler.fair.preemption = true
b. yarn.scheduler.preemption = true
c. yarn.scheduler.global.preemption = true
d. yarn.scheduler.enable.preemption = true
Answer: a

32. Which among the following supports delay
scheduling?
a. FIFO
b. Capacity Scheduler
c. Fair Scheduler
d. Both Capacity and Fair Scheduler
Answer: d

33. What is the default period of heartbeat request
sent by node manager?
a. one per millisecond
b. one per second
c. one per minute
d. one per nanosecond
Answer: b

34. Which error detection code is used in HDFS?
a. CRC-32
b. CRC-32C
c. SHA
d. SHA-1
Answer: b

35. CRC-32C has a storage overhead of
a. less than 1%
b. less than 5%
c. less than 10%
d. less than 2.5%
Answer: a
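
The "less than 1%" figure in question 35 can be checked with a little arithmetic. The sketch below assumes one 4-byte CRC-32C checksum is stored for every 512 bytes of data (the usual HDFS default chunk size); the constant and function names are illustrative, not Hadoop identifiers.

```python
CHECKSUM_BYTES = 4        # a CRC-32C checksum is 32 bits long
BYTES_PER_CHECKSUM = 512  # assumed size of the data chunk each checksum covers


def checksum_overhead_percent(checksum_bytes=CHECKSUM_BYTES,
                              chunk_bytes=BYTES_PER_CHECKSUM):
    """Return checksum storage overhead as a percentage of the data size."""
    return 100.0 * checksum_bytes / chunk_bytes


print(f"{checksum_overhead_percent():.2f}%")  # 0.78%
```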

36. Heartbeat signals are sent from
a. Jobtracker to Tasktracker
b. Tasktracker to Jobtracker
c. Jobtracker to namenode
d. Tasktracker to namenode
Answer: b

37. Spark was initially started by ________ at UC
Berkeley AMPLab in 2009.
a. Mahek Zaharia
b. Matei Zaharia
c. Doug Cutting
d. Stonebraker
Answer: (b)

38. ________ is a component on top of Spark Core.
a. Spark Streaming
b. Spark SQL
c. RDDs
d. All of the mentioned
Answer: (b)

39. Spark SQL provides a domain-specific
language to manipulate ___________ in Scala,
Java, or Python.
a. Spark Streaming
b. Spark SQL
c. RDDs
d. All of the mentioned
Answer: (c)

40. ______________ leverages Spark Core fast
scheduling capability to perform streaming
analytics.
a. MLlib
b. Spark Streaming
c. GraphX
d. RDDs
Answer: (b)

41. ________ is a distributed machine learning
framework on top of Spark.
a. MLlib
b. Spark Streaming
c. GraphX
d. RDDs
Answer: (a)

42. Users can easily run Spark on top of Amazon’s
__________
a. Infosphere
b. EC2
c. EMR
d. None of the mentioned
Answer: (b)

43. Which of the following can be used to launch
Spark jobs inside MapReduce?
a. SIM
b. SIMR
c. SIR
d. RIS
Answer: (b)

44. Which of the following languages is not
supported by Spark?
a. Java
b. Pascal
c. Scala
d. Python
Answer: (b)

45. Spark is packaged with higher level libraries,
including support for _________ queries.
a. SQL
b. C
c. C++
d. None of the mentioned
Answer: (a)

46. Spark includes a collection of over ________
operators for transforming data and familiar
data frame APIs for manipulating semi-structured data.
a. 50
b. 60
c. 70
d. 80
Answer: (d)

47. Spark is engineered from the bottom-up for
performance, running ___________ faster than
Hadoop by exploiting in-memory computing
and other optimizations.
a. 100x
b. 150x
c. 200x
d. None of the mentioned
Answer: (a)

48. Spark powers a stack of high-level tools
including Spark SQL, MLlib for _________
a. regression models
b. statistics
c. machine learning
d. reproductive research
Answer: (c)

49. For a multiclass classification problem, which
algorithm is not a solution?
a. Naive Bayes
b. Random Forests
c. Logistic Regression
d. Decision Trees
Answer: (d)

50. Which of the following is a tool of Machine
Learning Library?
a. Persistence
b. Utilities like linear algebra, statistics
c. Pipelines
d. All of the above
Answer: (d)

51. Which of the following is true for Spark core?
a. It is the kernel of Spark
b. It enables users to run SQL / HQL queries
on the top of Spark.
c. It is the scalable machine learning library
which delivers efficiencies
d. Improves the performance of iterative
algorithm drastically.
Answer: (a)

52. Which of the following is true for Spark MLlib?
a. Provides an execution platform for all the
Spark applications
b. It is the scalable machine learning library
which delivers efficiencies
c. enables powerful interactive and data
analytics application across live streaming
data
d. All of the above
Answer: (b)

53. Which of the following is true for RDD?
a. We can operate Spark RDDs in parallel
with a low-level API
b. RDDs are similar to the table in a
relational database
c. It allows processing of a large amount of
structured data
d. It has built-in optimization engine
Answer: (a)

54. RDD is fault-tolerant and immutable
a. True
b. False
Answer: (a)

55. The read operation on RDD is
a. Fine-grained
b. Coarse-grained
c. Either fine-grained or coarse-grained
d. Neither fine-grained nor coarse-grained
Answer: (c)

56. The write operation on RDD is
a. Fine-grained
b. Coarse-grained
c. Either fine-grained or coarse-grained
d. Neither fine-grained nor coarse-grained
Answer: (b)

57. Is it possible to mitigate stragglers in RDD?
a. Yes
b. No
Answer: (a)

58. Fault Tolerance in RDD is achieved using
a. Immutable nature of RDD
b. DAG (Directed Acyclic Graph)
c. Lazy-evaluation
d. None of the above
Answer: (b)

59. What is action in Spark RDD?
a. The ways to send result from
executors to the driver
b. Takes RDD as input and produces one
or more RDD as output.
c. Creates one or many new RDDs
d. All of the above
Answer: (a)

60. The shortcomings of Hadoop MapReduce were
overcome by Spark RDD by
a. Lazy-evaluation
b. DAG
c. In-memory processing
d. All of the above
Answer: (d)
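
Questions 54 to 60 touch on immutability, lazy evaluation, lineage, and actions. The toy class below is plain Python, not the Spark API; `ToyRDD` and its methods are hypothetical names used only to show how transformations can be recorded lazily and replayed when an action is called.

```python
class ToyRDD:
    """Toy stand-in for an RDD: immutable, lazily evaluated, with lineage."""

    def __init__(self, source, lineage=()):
        self._source = tuple(source)    # base data, never modified
        self._lineage = tuple(lineage)  # recorded transformations

    def map(self, fn):
        # Transformation: returns a NEW ToyRDD and computes nothing yet.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        # Transformation: also lazy, only extends the lineage.
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def collect(self):
        # Action: replays the lineage and hands the result to the caller.
        data = list(self._source)
        for kind, fn in self._lineage:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data


numbers = ToyRDD(range(1, 6))
evens_squared = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(evens_squared.collect())  # [4, 16]
print(numbers.collect())        # [1, 2, 3, 4, 5]
```

Because the result is rebuilt from the recorded lineage each time `collect` runs, a lost result can simply be recomputed, which is the intuition behind lineage-based fault tolerance.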

61. Spark is developed in which language?
a. Java
b. Scala
c. Python
d. R
Answer: (b)

62. Which of the following is not a component of
the Spark Ecosystem?
a. Sqoop
b. GraphX
c. MLlib
d. BlinkDB
Answer: (a)

63. Which of the following algorithms is not present
in MLlib?
a. Streaming Linear Regression
b. Streaming KMeans
c. Tanimoto distance
d. None of the above
Answer: (c)

64. Which of the following is not a feature of
Spark?
a. Supports in-memory computation
b. Fault-tolerance
c. It is cost-efficient
d. Compatible with other file storage system
Answer: (c)

65. Which of the following is the reason for Spark
being faster than MapReduce?
a. DAG execution engine and in-memory
computation
b. Support for different language APIs like
Scala, Java, Python and R
c. RDDs are immutable and fault-tolerant
d. None of the above
Answer: (a)

66. Which of the following is true for RDD?
a. RDD is a programming paradigm
b. RDD in Apache Spark is an
immutable collection of objects
c. It is a database
d. None of the above
Answer: (b)

67. Which of the following is a tool of the Machine
Learning Library?
a. Persistence
b. Utilities like linear algebra, statistics
c. Pipelines
d. All of the above
Answer: (d)

68. __________ is an online NoSQL database developed by
Cloudera.
a. HCatalog
b. HBase
c. Impala
d. Oozie
Answer: (b)

69. Which of the following is not a NoSQL
database?
a. SQL Server
b. MongoDB
c. Cassandra
d. None of the mentioned
Answer: (a)

70. Which of the following is a NoSQL Database
Type?
a. SQL
b. Document databases
c. JSON
d. All of the mentioned
Answer: (b)

71. Which of the following is a wide-column store?
a. Cassandra
b. Riak
c. MongoDB
d. Redis
Answer: (a)

72. “Sharding” a database across many server
instances can be achieved with _
a. LAN
b. SAN
c. MAN
d. All of the mentioned
Answer: (b)

73. Most NoSQL databases support automatic
__________ meaning that you get high
availability and disaster recovery.
a. processing
b. scalability
c. replication
d. all of the mentioned
Answer: (c)

74. Which of the following are the simplest NoSQL
databases?
a. Key-value
b. Wide-column
c. Document
d. All of the mentioned
Answer: (a)

75. ________ stores are used to store information
about networks, such as social connections.
a. Key-value
b. Wide-column
c. Document
d. Graph
Answer: (d)

76. NoSQL databases are used mainly for handling
large volumes of _____ data.
a. unstructured
b. structured
c. semi-structured
d. all of the mentioned
Answer: (a)

77. Which of the following languages is MongoDB
written in?
a. JavaScript
b. C
c. C++
d. All of the mentioned
Answer: (d)

78. Point out the correct statement.
a. MongoDB is classified as a NoSQL
database
b. MongoDB favors XML format more than
JSON
c. MongoDB is a column-oriented database
store
d. All of the mentioned
Answer: (a)

79. Which of the following formats is supported by
MongoDB?
a. SQL
b. XML
c. BSON
d. All of the mentioned
Answer: (c)

80. NoSQL was designed with security in mind, so
developers or security teams don’t need to
worry about implementing a security layer. Is it
true or false?
a. True
b. False
Answer: (b)

81. Which of the following is not a reason NoSQL
has become a popular solution for some
organizations?
a. Better scalability
b. Improved ability to keep data consistent
c. Faster access to data than relational
database management systems (RDBMS)
d. More easily allows for data to be held
across multiple servers
Answer: (b)

82. NoSQL prohibits structured query language
(SQL). Is it True or False?
a. True
b. False
Answer: (b)

83. When is it best to use a NoSQL database?
a. When providing confidentiality, integrity,
and availability is crucial
b. When the data is predictable
c. When the retrieval of large quantities of
data is needed
d. When the retrieval speed of data is not
critical
Answer: (c)

84. Which of the following companies developed
NoSQL database Apache Cassandra?
a. LinkedIn
b. Twitter
c. MySpace
d. Facebook
Answer: (d)

85. NoSQL databases are most often referred to as:
a. Relational
b. Distributed
c. Object-oriented
d. Network
Answer: (b)

86. SQL databases are:
a. Horizontally scalable
b. Vertically scalable
c. Either horizontally or vertically scalable
d. They don’t scale
Answer: (b)

87. Which of the following is not an example of a
NoSQL database?
a. CouchDB
b. MongoDB
c. HBase
d. PostgreSQL
Answer: (d)

88. SQL command types include data manipulation
language (DML) and data definition language
(DDL).
a. True
b. False
Answer: (a)

89. ________ systems are scale-out file-based
(HDD) systems moving to more uses of
memory in the nodes.
a. NoSQL
b. NewSQL
c. SQL
d. All of the mentioned
Answer: (a)

90. Point out the correct statement.
a. Hadoop is ideal for the analytical, post-operational, data-warehouse-ish type of
workload
b. HDFS runs on a small cluster of commodity-class nodes
c. NEWSQL is frequently the collection point
for big data
d. None of the mentioned
Answer: (a)

91. Which is an advantage of NewSQL?
a. Less complex applications, greater
consistency.
b. Convenient standard tooling.
c. SQL influenced extensions.
d. All of the mentioned
Answer: (d)

92. Which of the following represents a column in
NoSQL?
a. Database
b. Field
c. Document
d. Collection
Answer: (b)

93. What is the aim of NoSQL?
a. NoSQL provides an alternative to SQL
databases to store textual data.
b. NoSQL databases allow storing non-structured data.
c. NoSQL is not suitable for storing structured
data.
d. NoSQL is a new data format to store large
datasets.
Answer: (d)

94. Which of the following is not a feature for
NoSQL databases?
a. Data can be easily held across multiple
servers
b. Relational Data
c. Scalability
d. Faster data access than SQL databases
Ans : b

95. Which of the following statements is correct
with respect to MongoDB?
a. MongoDB is a NoSQL Database
b. MongoDB uses XML over JSON for data
exchange
c. MongoDB is not scalable
d. All of the above
Ans : a

96. Which of the following represents a column in MongoDB?
a. document
b. database
c. collection
d. field
Ans : d

97. The system-generated _id field is:
a. A 12-byte hexadecimal value
b. A 16-byte octal value
c. A 12-byte decimal value
d. A 10-byte binary value
Ans : a
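
To make the "12-byte hexadecimal value" in question 97 concrete, the sketch below builds a value with the same general layout as a MongoDB ObjectId: a 4-byte timestamp, a 5-byte random part, and a 3-byte counter. It is a standalone illustration, not the driver's implementation, and `toy_object_id` is a hypothetical helper name.

```python
import os
import struct
import time


def toy_object_id(counter=0):
    """Build a 12-byte value laid out like an ObjectId (illustration only)."""
    timestamp = struct.pack(">I", int(time.time()))          # 4 bytes
    random_part = os.urandom(5)                              # 5 bytes
    counter_part = (counter % 0x1000000).to_bytes(3, "big")  # 3 bytes
    return timestamp + random_part + counter_part


oid = toy_object_id()
print(len(oid), len(oid.hex()))  # 12 24
```

The 12 bytes are normally displayed as a 24-character hexadecimal string, which is why the value is described as hexadecimal.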

98. Which of the following is true about MongoDB?
a. MongoDB is cross-platform
b. MongoDB is a document-oriented database
c. MongoDB provides high performance
d. All of the above
Ans : d

99. A collection is a group of MongoDB __?
a. Database
b. Document
c. Field
d. None of the above
Ans : b

100. A developer wants to develop a database for an
LFC system where the data stored is mostly in a
similar manner. Which database should be used?
a. Relational
b. NoSQL
c. Both A and B can be used
d. None of the above
Ans : a

101. Documents in the same collection do not need
to have the same set of fields or structure, and
common fields in a collection’s documents may
hold different types of data. What is this known as?
a. dynamic schema
b. mongod
c. mongo
d. Embedded Documents
Ans : a
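
The idea of a dynamic schema from question 101 can be pictured with plain Python dictionaries standing in for documents. The sample data and variable names below are made up for illustration and no MongoDB driver is involved.

```python
# Three "documents" in one "collection": fields and value types may differ.
users = [
    {"_id": 1, "name": "Asha", "age": 29},
    {"_id": 2, "name": "Ravi", "email": "ravi@example.com"},
    {"_id": 3, "name": "Mina", "age": "thirty"},
]

# Distinct sets of field names found across the documents.
field_sets = {frozenset(doc) for doc in users}

# Types used for the shared "age" field, where it is present.
age_types = {type(doc["age"]).__name__ for doc in users if "age" in doc}

print(len(field_sets))    # 2
print(sorted(age_types))  # ['int', 'str']
```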

102. Instead of a primary key, MongoDB uses?
a. Embedded Documents
b. Default key _id
c. mongod
d. mongo
Ans : b
