
Questionnaire

HIVE
1. Real-time scenario: long-running queries. How can their performance be improved?
2. SET parameters: how are they helpful?
3. Hive Query optimization
a. What is your approach to query optimization in Hive when handling very large datasets?
b. What are the benefits of using the ORC file format?
c. What is cost-based query optimization?
d. What is Hive bucketing? When to use it?
e. What is Hive partition? When to use it?
f. What is Hive indexing?
4. Structured/semi-structured data handling in Hive:
a. How do you load data from an XML file into a Hive table?
b. How do you load data from a JSON file into a Hive table?
c. How can we normalize a multidimensional array in Hive?
5. What is the Hive metastore? What does it store, and where is it stored?
6. What is a Thrift client?
7. What is a Beeline query?
8. What is HCatalog? What is it used for?
9. Which exceptions have you faced while querying in Hive?
10. What is the difference between an external table and a default (managed) table?
11. How do you rename or drop columns and change their data types?
12. Have you ever faced a Hive failure (vertex failure)? What is it?
13. How do you delete data from external tables?
14. What are the differences between TRUNCATE, DELETE, and DROP when handling tables?
15. Vectorization
16. SMB (sort-merge bucket) join
17. Merge Bucket JOIN
18. SKEW JOIN
19. What is MSCK REPAIR? When should it be used?
20. What is an orphan partition?
21. What is the use of: set hive.exec.dynamic.partition.mode=strict?
22. What is LLAP in Hive?
23.
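For HIVE items 3d and 16, the idea behind bucketing and the SMB join can be sketched in Python. Hive places each row in a bucket by hashing the bucketing column and taking the result modulo the bucket count; the exact hash depends on the Hive version and column type, so this sketch assumes integer keys and uses a plain modulo as a stand-in. `assign_buckets` and `bucket_join` are hypothetical helpers, not Hive APIs.

```python
from collections import defaultdict


def assign_buckets(rows, key, num_buckets):
    """Group rows into buckets by key % num_buckets (a stand-in for Hive's hash)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key] % num_buckets].append(row)
    # Sort each bucket by the key, as a SORTED BY clause on the table would.
    return {b: sorted(rs, key=lambda r: r[key]) for b, rs in buckets.items()}


def bucket_join(left, right, key, num_buckets):
    """Inner-join two bucketed tables bucket by bucket with a sort-merge pass."""
    left_buckets = assign_buckets(left, key, num_buckets)
    right_buckets = assign_buckets(right, key, num_buckets)
    out = []
    for b in range(num_buckets):
        ls, rs = left_buckets.get(b, []), right_buckets.get(b, [])
        i = j = 0
        while i < len(ls) and j < len(rs):
            lk, rk = ls[i][key], rs[j][key]
            if lk < rk:
                i += 1
            elif lk > rk:
                j += 1
            else:
                # Emit every right-side row with this key; j is not advanced,
                # so a duplicate key on the left side matches them again.
                j2 = j
                while j2 < len(rs) and rs[j2][key] == lk:
                    out.append((ls[i], rs[j2]))
                    j2 += 1
                i += 1
    return out
```

Because both tables use the same key and the same bucket count, matching keys always land in the same bucket number, so each pair of buckets can be joined independently; sorted buckets then allow a single merge pass per bucket instead of building a hash table.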

DATABASE
1. Join concepts, with some examples in SQL:

Table A    Table B
col1       col1
1          3
2          4
3          5
4          6
5          7

What is the output of:

I.   SELECT A.col1, B.col1 FROM A
     LEFT OUTER JOIN B
     ON A.col1 = B.col1;

II.  SELECT A.col1, B.col1 FROM A
     LEFT OUTER JOIN B
     ON A.col1 = B.col1
     AND A.col1 = 3;

III. SELECT A.col1, B.col1 FROM A
     LEFT OUTER JOIN B
     ON A.col1 = B.col1
     WHERE A.col1 = 3;
2. What is a candidate key? A surrogate key? A composite key? A super key?
3. Explain the ACID properties.
4. Explain the CAP theorem.
5. Explain normalization and denormalization.
6. What are the different types of normalization?
7. What is a deadlock? What is a phantom deadlock?
8. Define a cursor and its types.
9. What is BCNF?
10. Explain dimension tables and fact tables.
11. Have you created or used database triggers? Explain where you used them in a real project.
12. What is a stored procedure? Where have you used one in real projects?
13. What is a transaction? How do you control it?
14. How do you implement privileges in SQL?
15. (Frame this as a real-world question.) What are the star schema and the snowflake schema?
16.
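For DATABASE question 1, the three queries can be checked by running them against the sample tables A and B in an in-memory SQLite database. SQLite is only a stand-in here; the results follow standard SQL LEFT OUTER JOIN semantics. `run_join_examples` is a hypothetical helper, and an ORDER BY is appended so the row order is deterministic.

```python
import sqlite3


def run_join_examples():
    """Create tables A and B from the questionnaire and run queries I to III."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (col1 INTEGER)")
    conn.execute("CREATE TABLE B (col1 INTEGER)")
    conn.executemany("INSERT INTO A VALUES (?)", [(v,) for v in [1, 2, 3, 4, 5]])
    conn.executemany("INSERT INTO B VALUES (?)", [(v,) for v in [3, 4, 5, 6, 7]])

    base = "SELECT A.col1, B.col1 FROM A LEFT OUTER JOIN B ON A.col1 = B.col1"
    queries = {
        "I": base,
        "II": base + " AND A.col1 = 3",    # extra condition inside ON
        "III": base + " WHERE A.col1 = 3",  # filter applied after the join
    }
    # SQL NULLs come back as Python None.
    results = {
        name: conn.execute(sql + " ORDER BY A.col1").fetchall()
        for name, sql in queries.items()
    }
    conn.close()
    return results


for name, rows in run_join_examples().items():
    print(name, rows)
# I [(1, None), (2, None), (3, 3), (4, 4), (5, 5)]
# II [(1, None), (2, None), (3, 3), (4, None), (5, None)]
# III [(3, 3)]
```

Query II keeps every row of A because a condition in the ON clause only restricts which B rows can match; query III's WHERE clause filters the joined result, leaving a single row.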

SQOOP
1. While importing data daily from an RDBMS to HDFS through Sqoop, how can we handle inserted, modified, and deleted rows?
2. How do you optimize a Sqoop command?
3.
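For SQOOP question 1, one common pattern: pull inserted and modified rows with an incremental import (`--incremental lastmodified` with `--check-column` and `--last-value`), then merge them onto the previous snapshot by primary key, which is what `sqoop merge` does with `--merge-key`. Incremental imports do not see deleted rows, so deletes need a separate signal, such as a soft-delete flag in the source or a periodic extract of all current keys. The Python sketch below assumes the latter; `reconcile` and its arguments are hypothetical and only illustrate the merge logic.

```python
def reconcile(base, delta, source_keys):
    """Apply one day's incremental import to the previous snapshot.

    base        -- dict of primary key -> row from the previous snapshot
    delta       -- dict of primary key -> row from the incremental import
                   (inserted and modified rows, assumed newer than base)
    source_keys -- set of all primary keys currently in the source table
                   (a hypothetical key-only extract used to detect deletes)
    """
    merged = dict(base)   # copy so the previous snapshot is not mutated
    merged.update(delta)  # inserts and updates: the delta row wins per key
    # Deletes: drop keys that no longer exist in the source table.
    return {k: v for k, v in merged.items() if k in source_keys}


snapshot = {1: "alice", 2: "bob", 3: "carol"}
print(reconcile(snapshot, {2: "bobby", 4: "dave"}, {1, 2, 4}))
# {1: 'alice', 2: 'bobby', 4: 'dave'}
```

Key 2 is updated, key 4 is inserted, and key 3 is dropped because it no longer appears in the source key set.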

HBASE
1. What is the purpose of HBase?
2. What is the benefit of schema-less tables?
3. What is your approach to optimizing HBase queries?
4. What is a Bloom filter? Explain with an example.
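For HBASE question 4, a minimal Bloom filter sketch. HBase can keep a Bloom filter per store file (HFile) so that a Get can skip files that definitely do not contain the requested row key. A Bloom filter never gives a false negative but can give a false positive. The bit-array size, the number of hash functions, and the SHA-256-based hashing below are illustrative choices, not HBase's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting a SHA-256 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for row_key in ["row1", "row2", "row3"]:
    bf.add(row_key)
print(bf.might_contain("row2"))  # True
```

A key that was never added usually returns False but may occasionally return True; that false positive only costs an extra file read, which is why the trade-off suits skipping store files on reads.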

ZOOKEEPER
1. What is the role of ZooKeeper?
2.

KAFKA
1. Explain your work/use-case experience with Kafka.
2. How does Kafka work?
3. What are the key components/methods of the Kafka producer and consumer?
4. What is the poll interface in the Kafka consumer?
5. What is the Kafka consumer's read_committed mode?

SPARK
1. How do you sort data based on partitions in Spark?
2. What is an RDD? How can we create one? What are the types of RDD operations?
3. What are the RDD persistence methods?
4. What are the options for RDD persistence through caching?
5. What are the Spark shared variable types? When should broadcast/accumulator variables be used?
6. What is a stage boundary?

AWS – Cloud
1. What is a cluster? How do you scale an EMR cluster horizontally and vertically?

JAVA
1. What is an immutable object?
2. What is polymorphism? Explain with a real-world example of where you used it.
3. What are the uses of encapsulation?
4.
