
HBase Architecture: HBase Data Model & HBase Read/Write Mechanism

HBase Architecture

In my previous blog on HBase Tutorial, I explained what HBase is and described its features. I also covered the Facebook Messenger case study to help you connect the concepts better. Moving ahead in our Hadoop Tutorial Series, I will now explain the data model of HBase and the HBase architecture, so that you can understand what makes HBase so popular.

The important topics that I will be taking you through in this HBase architecture blog are:

HBase Data Model
HBase Architecture and its Components
HBase Write Mechanism
HBase Read Mechanism
HBase Performance Optimization Mechanisms

Let us first understand the data model of HBase. It helps HBase achieve faster reads, writes, and searches.

HBase Architecture: HBase Data Model

As we know, HBase is a column-oriented NoSQL database (more precisely, it groups columns into column families). Although it looks similar to a relational database, with rows and columns, it is not a relational database. Relational databases are row-oriented, while HBase is column-oriented. So, let us first understand the difference between column-oriented and row-oriented databases:

Row-oriented vs column-oriented Databases:

Row-oriented databases store table records as a sequence of rows, whereas column-oriented databases store table records as a sequence of columns, i.e. the entries in a column are stored in contiguous locations on disk.

To better understand it, let us take an example and consider the table below.


[Image: example table]

If this table is stored in a row-oriented database, the records are stored as shown below:

1, Paul Walker, US, 231, Gallardo

2, Vin Diesel, Brazil, 520, Mustang

In row-oriented databases, data is stored on the basis of rows or tuples, as you can see above.

Column-oriented databases, by contrast, store this data as:

1, 2, Paul Walker, Vin Diesel, US, Brazil, 231, 520, Gallardo, Mustang

In a column-oriented database, all the values of a column are stored together: the first column's values are stored together, then the second column's values, and the data in the remaining columns is stored in the same manner.
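The two layouts above can be reproduced with a small Python sketch. It is only an illustration of the ordering, not of how any real database lays out bytes on disk; the helper names `row_layout` and `column_layout` are made up for this example, and the records are the two from the example above.

```python
# The two example records from the text. The original table's column
# headers are not shown, so the fields are left unnamed here.
records = [
    (1, "Paul Walker", "US", 231, "Gallardo"),
    (2, "Vin Diesel", "Brazil", 520, "Mustang"),
]


def row_layout(rows):
    """Flatten the records one row after another (row-oriented order)."""
    return [value for row in rows for value in row]


def column_layout(rows):
    """Flatten the records one column after another (column-oriented order)."""
    # zip(*rows) regroups the tuples by position, i.e. by column.
    return [value for column in zip(*rows) for value in column]


row_major = row_layout(records)
column_major = column_layout(records)

print(row_major)
print(column_major)
```

Reading a single column from `column_major` touches one contiguous slice of the list, whereas in `row_major` the same values are scattered across the records.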

When the amount of data is very large, on the order of petabytes or exabytes, we use the column-oriented approach, because the data of a single column is stored together and can be accessed faster. The row-oriented approach, by comparison, efficiently handles a smaller number of rows and columns, as a row-oriented database stores data in a structured format. When we need to process and analyze a large set of semi-structured or unstructured data, we use the column-oriented approach; examples are Online Analytical Processing (OLAP) applications such as data mining, data warehousing, and analytics. Online Transaction Processing (OLTP) applications, such as those in the banking and finance domains, handle structured data and require transactional (ACID) properties, so they use the row-oriented approach.

HBase tables have the following components, shown in the image below:


[Image: components of an HBase table]

Tables: Data is stored in a table format in HBase, but here the tables are in a column-oriented format.
Row Key: Row keys are used to look up records, which makes searches fast. How? I will explain that in the architecture part later in this blog.
Column Families: Various columns are combined in a column family. The columns of a column family are stored together, which makes searching faster because data belonging to the same column family can be accessed together in a single seek.
Column Qualifiers: Each column's name is known as its column qualifier.
Cell: Data is stored in cells. Each cell is uniquely identified by its row key, column family, and column qualifier.
Timestamp: A timestamp is a combination of date and time. Whenever data is stored, it is stored with its timestamp, which makes it easy to search for a particular version of the data.

Put more simply, we can say that HBase consists of:

A set of tables.
Each table has column families and rows.
The row key acts as the primary key in HBase; any access to an HBase table uses this primary key.
Each column qualifier in HBase denotes an attribute of the object that resides in the cell.
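One way to picture this data model is as a nested map: row key, then column family, then column qualifier, then timestamp, then value. The sketch below is a toy in-memory model written only to illustrate that nesting. It is not the HBase client API, and the class name `ToyTable`, the family names `personal` and `vehicle`, and the qualifier names are all hypothetical.

```python
class ToyTable:
    """Toy in-memory model of an HBase-style table (illustration only)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = set(column_families)
        # row key -> column family -> column qualifier -> {timestamp: value}
        self.rows = {}

    def put(self, row_key, family, qualifier, value, timestamp):
        """Store one versioned cell value."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        versions = (
            self.rows.setdefault(row_key, {})
            .setdefault(family, {})
            .setdefault(qualifier, {})
        )
        versions[timestamp] = value

    def get(self, row_key, family, qualifier, timestamp=None):
        """Return the value at a timestamp, or the newest version if omitted."""
        versions = self.rows[row_key][family][qualifier]
        if timestamp is None:
            timestamp = max(versions)
        return versions[timestamp]


# Hypothetical family and qualifier names, used only for this example.
table = ToyTable(["personal", "vehicle"])
table.put("1", "personal", "name", "Paul Walker", timestamp=100)
table.put("1", "vehicle", "car", "Gallardo", timestamp=100)
table.put("1", "vehicle", "car", "Mustang", timestamp=200)

latest_car = table.get("1", "vehicle", "car")
older_car = table.get("1", "vehicle", "car", timestamp=100)
print(latest_car, older_car)
```

Every lookup starts from the row key, which mirrors its role as the primary key, and the timestamp level is what allows an older version of a cell to be read back.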

Now that you know about the HBase data model, let us see how this data model fits into the HBase architecture and makes it suitable for large-scale storage and faster processing.


HBase Architecture: Components of HBase Architecture

HBase has three major components: the HMaster server, the HBase Region Servers (which host the Regions), and ZooKeeper.

The figure below explains the hierarchy of the HBase architecture. We will talk about each of these components individually.


[Image: hierarchy of the HBase architecture]

Before going to the HMaster, we will first understand Regions, because all these servers (HMaster, Region Server, ZooKeeper) exist to coordinate and manage Regions and to perform various operations inside them. So what are Regions, and why are they so important?

HBase Architecture: Region

A region contains all the rows between the start key and the end key assigned to that region. HBase tables can be divided into a number of regions in such a way that, for the rows in a region's key range, all the columns of a column family are stored together in that region. Each region contains its rows in sorted order.

Many regions are assigned to a Region Server, which is responsible for handling, managing, and executing read and write operations on that set of regions.
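Because regions are sorted, non-overlapping key ranges, finding the region that holds a given row key amounts to a binary search over the region start keys. The sketch below illustrates that idea with Python's `bisect` module. It is not how HBase itself implements the lookup, and the `Region` class, the split points, and the server names `rs1` to `rs3` are hypothetical. An empty start key marks the first region of the table and an empty end key marks the last one.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Region:
    start_key: str  # inclusive; "" means the start of the table
    end_key: str    # exclusive; "" means the end of the table
    server: str     # hypothetical Region Server name


def find_region(regions, row_key):
    """Return the region whose [start_key, end_key) range contains row_key.

    Assumes `regions` is sorted by start_key and covers the whole key space.
    """
    start_keys = [region.start_key for region in regions]
    # Index of the last region whose start key is <= row_key.
    index = bisect.bisect_right(start_keys, row_key) - 1
    return regions[index]


# A table split into three regions at the hypothetical keys "g" and "p".
regions = [
    Region("", "g", "rs1"),
    Region("g", "p", "rs2"),
    Region("p", "", "rs3"),
]

for key in ["apple", "g", "hbase", "zookeeper"]:
    region = find_region(regions, key)
    print(key, "->", region.server)
```

The start key is inclusive and the end key is exclusive, so a row key equal to a split point belongs to the region that starts there.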

So, concluding in a simpler way:

A table can be divided into a number of regions.
A Region is a sorted range of rows storing data between a start key and an end key.
A Region has a default size of 256 MB, which can be configured accor
