Big Data Solutions MCQs with Answers
1. A ________ serves as the master and there is only one NameNode per cluster.
a. Data Node
b. NameNode
c. Data block
d. Replication
Answer: (b)
2. Point out the correct statement.
a. DataNode is the slave/worker node and holds the user data in the form of Data Blocks
b. Each incoming file is broken into 32 MB by default
c. Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d. None of the mentioned
Answer: (a)
3. HDFS works in a __________ fashion.
a. master-worker
b. master-slave
c. worker/slave
d. all of the mentioned
Answer: (a)
4. ________ NameNode is used when the Primary NameNode goes down.
a. Rack
b. Data
c. Secondary
d. None of the mentioned
Answer: (c)
5. Which of the following scenarios may not be a good fit for HDFS?
a. HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b. HDFS is suitable for storing data related to applications requiring low latency data access
c. HDFS is suitable for storing data related to applications requiring low latency data access
d. None of the mentioned
Answer: (a)
6. ________ is the slave/worker node and holds the user data in the form of Data Blocks.
a. DataNode
b. NameNode
c. Data block
d. Replication
Answer: (a)
7. HDFS provides a command line interface called __________ used to interact with HDFS.
a. “HDFS Shell”
b. “FS Shell”
c. “DFS Shell”
d. None of the mentioned
Answer: (b)
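For reference, an FS Shell command such as "hadoop fs -ls /" has a direct counterpart in the Hadoop Java client. A minimal sketch, assuming a cluster whose core-site.xml/hdfs-site.xml are on the classpath; the path is only an example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRoot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) { // like "hadoop fs -ls /"
            System.out.println(status.getPath());
        }
        fs.close();
    }
}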
8. For YARN, the ___________ Manager UI provides host and port information.
a. Data Node
b. NameNode
c. Resource
d. Replication
Answer: (c)
9. During start up, the ___________ loads the file system state from the fsimage and the edits log file.
a. DataNode
b. NameNode
c. ActionNode
d. None of the mentioned
Answer: (b)
10. In HDFS, files cannot be
a. read
b. deleted
c. executed
d. archived
Answer: (c)
12. Which of the following operators executes a shell command from the Hive shell?
a. |
b. !
c. ^
d. +
Answer: (b)
13. Hive-specific commands can be run from Beeline when the Hive _______ driver is used.
a. ODBC
b. JDBC
c. ODBC-JDBC
d. All of the mentioned
Answer: Option (b)
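As background for this question: Beeline talks to HiveServer2 through the Hive JDBC driver, and the same driver can be used from plain Java. A hedged sketch; the host, port, database, and empty credentials are placeholders for a real deployment:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // the driver Beeline uses
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}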
14. Which of the following data types is supported by Hive?
a. map
b. record
c. string
d. enum
Answer: (a)
15. Avro-backed tables can simply be created by using _________ in a DDL statement.
a. “STORED AS AVRO”
b. “STORED AS HIVE”
c. “STORED AS AVROHIVE”
d. “STORED AS SERDE”
Answer: (a)
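Because "STORED AS AVRO" is ordinary DDL (available since Hive 0.14), it can be issued from any Hive client, not just the shell. A sketch over JDBC; the table name and columns are invented for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateAvroTable {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {
            // Hive derives the Avro schema from the column list
            stmt.execute("CREATE TABLE IF NOT EXISTS events "
                + "(id BIGINT, payload STRING) STORED AS AVRO");
        }
    }
}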
16. Types that may be null must be defined as a ______ of that type and Null within Avro.
a. Union
b. Intersection
c. Set
d. All of the mentioned
Answer: (a)
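The union rule is easy to see with Avro's Java SchemaBuilder, where an "optional" field expands to a union of null and the value type. The record and field names below are made up for the example:

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class NullableField {
    public static void main(String[] args) {
        Schema record = SchemaBuilder.record("User").fields()
            .requiredString("name")
            .optionalString("email") // expands to the union ["null", "string"]
            .endRecord();
        System.out.println(record.toString(true)); // the JSON schema shows the union
    }
}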
17. _______ is interpolated into the quotes to correctly handle spaces within the schema.
a. $SCHEMA
b. $ROW
c. $SCHEMASPACES
d. $NAMESPACES
Answer: (a)
18. ________ was designed to overcome the limitations of the other Hive file formats.
a. ORC
b. OPC
c. ODC
d. None of the mentioned
Answer: (a)
19. An ORC file contains groups of row data called __________
a. postscript
b. stripes
c. script
d. none of the mentioned
Answer: (b)
20. HBase is a distributed ________ database built on top of the Hadoop file system.
a. Column-oriented
b. Row-oriented
c. Tuple-oriented
d. None of the mentioned
Answer: (a)
21. HBase is ________ and defines only column families.
a. Row Oriented
b. Schema-less
c. Fixed Schema
d. All of the mentioned
Answer: (b)
22. The _________ Server assigns regions to the region servers and takes the help of Apache ZooKeeper for this task.
a. Region
b. Master
c. Zookeeper
d. All of the mentioned
Answer: (b)
23. Which of the following commands provides information about the user?
a. status
b. version
c. whoami
d. user
Answer: (c)
24. _________ command fetches the contents of a row or a cell.
a. select
b. get
c. put
d. none of the mentioned
Answer: (b)
25. HBaseAdmin and ____________ are the two important classes in this package that provide DDL functionalities.
a. HTableDescriptor
b. HDescriptor
c. HTable
d. HTabDescriptor
Answer: (a)
26. The minimum number of row versions to keep is configured per column family via _______
a. HBaseDecriptor
b. HTabDescriptor
c. HColumnDescriptor
d. All of the mentioned
Answer: (c)
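A sketch using the two classes named in Q25 together with HColumnDescriptor from Q26. This is the HBase 1.x client API (HBaseAdmin and these descriptor classes are deprecated in HBase 2.x); the table and family names are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTableDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);           // DDL entry point (Q25)
        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("metrics"));
        HColumnDescriptor family = new HColumnDescriptor("d");
        family.setMinVersions(1);                          // per-family minimum versions (Q26)
        family.setMaxVersions(5);
        table.addFamily(family);
        admin.createTable(table);
        admin.close();
    }
}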
27. HBase supports a ____________ interface via Put and Result.
a. “bytes-in/bytes-out”
b. “bytes-in”
c. “bytes-out”
d. none of the mentioned
Answer: (a)
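“Bytes-in/bytes-out” in practice: everything handed to Put and read back from Result is a byte array, converted with the Bytes utility. A hedged sketch against the HBase 1.x+ client; connection details, table, and column names are placeholders:

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutGetDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection();
             Table table = conn.getTable(TableName.valueOf("metrics"))) {
            Put put = new Put(Bytes.toBytes("row1")); // bytes in
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("temp"), Bytes.toBytes(21L));
            table.put(put);

            Result result = table.get(new Get(Bytes.toBytes("row1")));
            long temp = Bytes.toLong( // bytes out
                result.getValue(Bytes.toBytes("d"), Bytes.toBytes("temp")));
            System.out.println(temp);
        }
    }
}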
28. One supported data type that deserves special mention is ____________
a. money
b. counters
c. smallint
d. tinyint
Answer: (b)
29. __________ re-writes data and packs rows into columns for certain time periods.
a. OpenTS
b. OpenTSDB
c. OpenTSD
d. OpenDB
Answer: (b)
30. __________ command disables, drops, and recreates a table.
a. drop
b. truncate
c. delete
d. none of the mentioned
Answer: (b)
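What the shell's truncate does can be sketched with the Java Admin API: the table is first disabled, then dropped and recreated in one call (truncateTable exists since HBase 1.0). The table name is a placeholder:

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TruncateDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection();
             Admin admin = conn.getAdmin()) {
            TableName name = TableName.valueOf("metrics");
            admin.disableTable(name);         // truncate requires a disabled table
            admin.truncateTable(name, false); // false = do not preserve region splits
        }
    }
}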
34. When a _______ is triggered, the client receives a packet saying that the znode has changed.
a. event
b. watch
c. row
d. value
Answer: (b)
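A minimal Java sketch of the watch mechanism: reading a znode with a watch set, then receiving a WatchedEvent when it changes. The connect string and znode path are placeholders:

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class WatchDemo {
    public static void main(String[] args) throws Exception {
        // The default watcher passed here receives the change notifications
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000,
            event -> System.out.println("znode changed: " + event.getPath()
                + " (" + event.getType() + ")"));
        Stat stat = new Stat();
        byte[] data = zk.getData("/config", true, stat); // true = register the watch
        System.out.println(new String(data) + ", version=" + stat.getVersion());
        Thread.sleep(60000); // keep the session alive long enough to see an event
        zk.close();
    }
}

Note that watches are one-shot: after the event fires, the client must re-register to be notified of the next change.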
35. The underlying client-server protocol has changed in version _______ of ZooKeeper.
a. 2.0.0
b. 3.0.0
c. 4.0.0
d. 6.0.0
Answer: (b)
36. A number of constants used in the client ZooKeeper API were renamed in order to reduce ________ collision.
a. value
b. namespace
c. counter
d. none of the mentioned
Answer: (b)
37. ZooKeeper allows distributed processes to coordinate with each other through registers, known as ___________
a. znodes
b. hnodes
c. vnodes
d. rnodes
Answer: (a)
38. ZooKeeper essentially mirrors the _______ functionality exposed in the Linux kernel.
a. iread
b. inotify
c. iwrite
d. icount
Answer: (b)
39. ZooKeeper’s architecture supports high ____________ through redundant services.
a. flexibility
b. scalability
c. availability
d. interactivity
Answer: (c)
40. You need to have _________ installed before running ZooKeeper.
a. Java
b. C
c. C++
d. SQLGUI
Answer: (a)
41. To register a “watch” on a znode’s data, you need to use the _______ commands to access the current content or metadata.
a. stat
b. put
c. receive
d. gets
Answer: (a)
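The shell's stat command corresponds to exists() in the Java API: it returns the znode's metadata (a Stat) and can register a watch at the same time. A sketch; the connect string and path are examples:

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class StatDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000,
            event -> System.out.println("event: " + event));
        Stat stat = zk.exists("/config", true); // null if the znode does not exist
        if (stat != null) {
            System.out.println("version=" + stat.getVersion()
                + ", dataLength=" + stat.getDataLength());
        }
        zk.close();
    }
}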
42. _______ has a design policy of using ZooKeeper only for transient data.
a. Hive
b. Impala
c. HBase
d. Oozie
Answer: (c)
43. The ________ master will register its own address in this znode at startup, making this znode the source of truth for identifying which server is the Master.
a. active
b. passive
c. region
d. all of the mentioned
Answer: (a)
44. Pig mainly operates in how many modes?
a. Two
b. Three
c. Four
d. Five
Answer: (a)
45. You can run Pig in batch mode using __________
a. Pig shell command
b. Pig scripts
c. Pig options
d. All of the mentioned
Answer: (b)
46. Which of the following functions is used to read data in Pig?
a. WRITE
b. READ
c. LOAD
d. None of the mentioned
Answer: (c)
47. You can run Pig in interactive mode using the ______ shell.
a. Grunt
b. FS
c. HDFS
d. None of the mentioned
Answer: (a)
48. Which of the following is Pig's default execution mode?
a. MapReduce
b. Tez
c. Local
d. All of the mentioned
Answer: (a)
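The two modes from Q44/Q48 also show up in the Java embedding: PigServer takes an execution-type string, "local" or "mapreduce" (the default). The input path and schema below are placeholders:

import org.apache.pig.PigServer;

public class PigModesDemo {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer("local"); // or "mapreduce", the default mode
        pig.registerQuery("A = LOAD 'input.txt' USING PigStorage(',') "
            + "AS (name:chararray, n:int);"); // LOAD reads data, as in Q46
        pig.registerQuery("B = FILTER A BY n > 10;");
        pig.store("B", "output"); // triggers execution and writes the result
        pig.shutdown();
    }
}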
49. ________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets.
a. Pig Latin
b. Oozie
c. Pig
d. Hive
Answer: (c)
50. Hive also supports custom extensions written in:
a. C
b. C++
c. C#
d. Java
Answer: (d)
51. Which of the following is not true about Pig?
a. Apache Pig is an abstraction over MapReduce
b. Pig cannot perform all the data manipulation operations in Hadoop.
c. Pig is a tool/platform used to analyze large sets of data, representing them as data flows.
d. None of the above
Ans : b
52. Which of the following is/are a feature of Pig?
a. Rich set of operators
b. Ease of programming
c. Extensibility
d. All of the above
Ans : d
53. In which year was Apache Pig released?
a. 2005
b. 2006
c. 2007
d. 2008
Ans : b
54. Pig mainly operates in how many modes?
a. 2
b. 3
c. 4
d. 5
Ans : a
55. Which of the following companies developed Pig?
a. Google
b. Yahoo
c. Microsoft
d. Apple
Ans : b
56. Which of the following functions is used to read data in Pig?
a. Write
b. Read
c. Perform
d. Load
Ans : d
57. __________ is a framework for collecting and storing script-level statistics for Pig Latin.
a. Pig Stats
b. PStatistics
c. Pig Statistics
d. All of the above
Ans : c
58. Which of the following is a true statement?
a. Pig is a high-level language.
b. Performing a Join operation in Apache Pig is pretty simple.
c. Apache Pig is a data flow language.
d. All of the above
Ans : d
59. Which of the following will compile PigUnit?
a. $pig_trunk ant pigunit-jar
b. $pig_tr ant pigunit-jar
c. $pig_ ant pigunit-jar
d. $pigtr_ ant pigunit-jar
Ans : a
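Once the jar is built, PigUnit can run a script in-process against in-memory input. A hedged sketch; the script name, aliases, input, and expected tuples are all invented for illustration:

import org.apache.pig.pigunit.PigTest;

public class WordCountTest {
    public static void main(String[] args) throws Exception {
        String[] input = { "hello world", "hello pig" };
        String[] expected = { "(hello,2)", "(pig,1)", "(world,1)" };
        PigTest test = new PigTest("wordcount.pig");
        // Overrides the alias "lines" with the input and checks the alias "counts"
        test.assertOutput("lines", input, "counts", expected);
    }
}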
60. Point out the wrong statement.
a. Pig can invoke code in languages like Java only
b. Pig enables data workers to write complex data transformations without knowing Java
c. Pig’s simple SQL-like scripting language is called Pig Latin, and appeals to developers already familiar with scripting languages and SQL
d. Pig is complete, so you can do all required data manipulations in Apache Hadoop with Pig
Ans : a
61. You can run Pig in interactive mode using the ______ shell
a. Grunt
b. FS
c. HDFS
d. None of the mentioned
Ans : a
62. Which of the following is Pig's default execution mode?
a. MapReduce
b. Tez
c. Local
d. All of the mentioned
Ans : a
63. Use the __________ command to run a Pig script that can interact with the Grunt shell (interactive mode).
a. fetch
b. declare
c. run
d. all of the mentioned
Ans : c
64. What are the different complex data types in Pig?
a. Maps
b. Tuples
c. Bags
d. All of these
Answer: (d)
65. What are the various diagnostic operators available in Apache Pig?
a. Dump Operator
b. Describe Operator
c. Explain Operator
d. All of these
Answer: (d)
66. What happens if data has fewer elements than the specified schema elements in Pig?
a. Pig will not do anything
b. It will pad the end of the record columns with nulls
c. Pig will throw an error
d. Pig will warn you before it throws an error
Answer: (b)
67. Which of the following commands sets the value of a particular configuration variable (key)?
a. set -v
b. set <key>=<value>
c. set
d. reset
Answer: (b)
69. Which of the following will remove the resource(s) from the distributed cache?
a. delete FILE[S] <filepath>*
b. delete JAR[S] <filepath>*
c. delete ARCHIVE[S] <filepath>*
d. all of the mentioned
Answer: (d)
70. _________ is a shell utility which can be used to run Hive queries in either interactive or batch mode.
a. $HIVE/bin/hive
b. $HIVE_HOME/hive
c. $HIVE_HOME/bin/hive
d. All of the mentioned
Answer: (c)
71. HiveServer2, introduced in Hive 0.11, has a new CLI called __________
a. BeeLine
b. SqlLine
c. HiveLine
d. CLilLine
Answer: (a)
72. Variable substitution is disabled by using ___________
a. set hive.variable.substitute=false;
b. set hive.variable.substitutevalues=false;
c. set hive.variable.substitute=true;
d. all of the mentioned
Answer: (a)
73. _______ supports a new command shell, Beeline, that works with HiveServer2.
a. HiveServer2
b. HiveServer3
c. HiveServer4
d. None of the mentioned
Answer: (a)
74. In ______ mode HiveServer2 only accepts valid Thrift calls.
a. Remote
b. HTTP
c. Embedded
d. Interactive
Answer: (a)
75. The HBase tables are
a. Made read-only by setting the read-only option
b. Always writeable
c. Always read-only
d. Are made read-only using the query to the
Answer: (a)
76. Every row in an HBase table has
a. Same number of columns
b. Same number of column families
c. Different number of columns
d. Different number of column families
Answer: (d)
77. HBase creates a new version of a record during
a. Creation of a record
b. Modification of a record
c. Deletion of a record
d. All the above
Answer: (d)
78. HBaseAdmin and ____________ are the two important classes in this package that provide DDL functionalities.
a. HTableDescriptor
b. HDescriptor
c. HTable
d. HTabDescriptor
Answer: (a)
79. Which of the following are operational commands in HBase?
a. Get
b. Put
c. Delete
d. All of the mentioned
Answer: (d)