
Top 10 Hadoop Interview Questions & Answers


Q1. What exactly is Hadoop?

A1. Hadoop is an open-source Big Data framework that stores and processes very large volumes of varied data in parallel across a cluster of machines, which improves performance.

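Q2. What are the 5 Vs of Big Data?

A2. Volume: size of the data

Velocity: speed at which the data changes

Variety: different types of data (structured, semi-structured, unstructured)

Veracity: trustworthiness and quality of the data

Value: usefulness of the data to the business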

Q3. Give me examples of unstructured data.

A3. Images, videos, audio files, etc.

Q4. Tell me about the Hadoop file system and processing framework.

A4. The Hadoop file system is called HDFS (Hadoop Distributed File System). It consists of the NameNode, the DataNodes and the Secondary NameNode.

The Hadoop processing framework is known as MapReduce. It runs Map and Reduce tasks that are scheduled in parallel across the cluster for efficiency.
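To make the Map and Reduce roles concrete, here is a minimal sketch in plain Python of a word count, the usual introductory MapReduce job. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and everything runs in one process, whereas Hadoop would run many map and reduce tasks in parallel on different nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map task: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce task: collapse all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count(["big data", "Big cluster"])` counts "big" twice and the other words once. Each call to `map_phase` depends on a single line only, which is why map tasks can be scheduled in parallel.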

Q5. What is the High Availability feature in Hadoop 2?

A5. Hadoop 2 introduces a passive (standby) NameNode so that the NameNode is no longer a single point of failure. This results in High Availability of the Hadoop cluster.
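The idea of an active and a passive NameNode can be sketched as a toy state machine. This is only an illustration under simplifying assumptions: the class `NameNodePair` and the node names `nn1` and `nn2` are hypothetical, and the sketch leaves out how a real cluster detects the failure and keeps the standby's state in sync.

```python
class NameNodePair:
    """Toy model of an active/standby NameNode pair (not the Hadoop API)."""

    def __init__(self):
        # nn1 and nn2 are hypothetical node names
        self.state = {"nn1": "active", "nn2": "standby"}

    def active(self):
        """Return the name of the node currently serving clients."""
        return next(name for name, s in self.state.items() if s == "active")

    def fail(self, name):
        """Mark a node as failed; promote the standby if the active one went down."""
        was_active = self.state[name] == "active"
        self.state[name] = "failed"
        if was_active:
            for other, s in list(self.state.items()):
                if s == "standby":
                    self.state[other] = "active"
                    break
```

After `pair = NameNodePair()` and `pair.fail("nn1")`, `pair.active()` returns `"nn2"`, so clients still have a NameNode to talk to.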

Q6. What is Federation?

A6. Federation was introduced in Hadoop 2 to support multiple NameNodes in a Hadoop cluster. This makes the NameNode horizontally scalable and allows the cluster to handle a huge amount of metadata.
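One way to picture Federation is a table that assigns each part of the file namespace to its own NameNode. The sketch below assumes such a table; the prefixes, the NameNode names and the function `resolve_namenode` are all hypothetical and only show the routing idea, not how Hadoop is configured.

```python
# Hypothetical mount table: namespace prefix -> NameNode responsible for it.
MOUNT_TABLE = {
    "/user": "namenode-1",
    "/logs": "namenode-2",
    "/warehouse": "namenode-3",
}

def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the NameNode owning the longest mount prefix that matches path."""
    matches = [p for p in mount_table if path == p or path.startswith(p + "/")]
    if not matches:
        raise KeyError(f"no NameNode mounted for {path}")
    return mount_table[max(matches, key=len)]
```

With this table, `resolve_namenode("/logs/2024/app.log")` returns `"namenode-2"`. Because each NameNode only holds the metadata for its own prefix, adding a NameNode adds metadata capacity.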

Q7. What is metadata?

A7. Metadata is data about data. In a Hadoop cluster, the NameNode holds the metadata, that is, information about the files in HDFS.
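As a rough illustration of what "information about files" can look like, the sketch below builds a toy metadata record for one file. The details are assumptions not taken from the answer above: a 128 MB block size, three replicas per block, round-robin placement, and the record layout itself. The function name `build_metadata` is made up.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (128 MB)

def build_metadata(path, size_bytes, datanodes, replication=3):
    """Return a toy record: which blocks make up the file and where replicas live."""
    num_blocks = max(1, math.ceil(size_bytes / BLOCK_SIZE))
    blocks = []
    for i in range(num_blocks):
        # Round-robin placement, purely illustrative
        replicas = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        blocks.append({"block_id": f"{path}#blk{i}", "replicas": replicas})
    return {"path": path, "size": size_bytes, "blocks": blocks}
```

For a 300 MB file and four DataNodes, `build_metadata("/data/a.csv", 300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])` yields three block entries. The record describes the file without containing any of its contents, which is the sense of "data about data".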

Q8. What are the main components of the Hadoop ecosystem and what are their functions?

A8. Here is a list of Hadoop ecosystem components:

1. HDFS: distributed file system

2. MapReduce: programming paradigm based on Java

3. Pig: processes and analyses structured and semi-structured data

4. Hive: processes and analyses structured data

5. HBase: NoSQL database

6. Sqoop: imports/exports structured data

7. Oozie: scheduler

Q9. Tell me some major benefits of Hadoop.

A9. Some major benefits of Hadoop are:

a. Cost-effective

b. Ability to handle multiple data types

c. Ability to handle big data

d. Common platform for machine learning, business intelligence, data warehousing, etc.

Q10. How is Hadoop cost-effective?

A10. Hadoop runs on commodity hardware and is open-source, so it provides a cost-effective solution on both the hardware and the software fronts.
