How does HBase scale?

Shripati Bhat · Sep 14 · 2-3 mins

Learn HBase

HBase is a column-oriented database. It belongs to the NoSQL family of databases, which means there is no fixed set of rows and columns, and no rigid structure that all data in the database must adhere to. To understand how HBase scales with such dynamic data, we first need to understand its components.

Zookeeper:

HBase is a distributed database. As in any distributed database, there has to be a component that centrally manages the metadata of all other components. HBase uses Zookeeper for this task. Zookeeper manages meta information such as the number of region servers and the location of all components (master, region servers). Zookeeper constantly exchanges heartbeats with all components, reports to the Master Server when a region server fails, and runs a master re-election when the Master Server node fails.
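The heartbeat idea above can be illustrated with a small, self-contained sketch. This is a simplified model, not Zookeeper's actual mechanism or API: the `HeartbeatMonitor` class, the node names, and the 3-second timeout are all assumptions made for illustration.

```python
import time


class HeartbeatMonitor:
    """Toy failure detector: a node counts as failed if it has not
    sent a heartbeat within `timeout` seconds. (Hypothetical class,
    not part of Zookeeper or HBase.)"""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of last heartbeat

    def heartbeat(self, node, now=None):
        # Record the time at which `node` last reported in.
        self.last_seen[node] = time.monotonic() if now is None else now

    def expired(self, now=None):
        # Return the nodes whose last heartbeat is older than the timeout.
        now = time.monotonic() if now is None else now
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("regionserver-1", now=0.0)
monitor.heartbeat("regionserver-2", now=0.0)
monitor.heartbeat("regionserver-1", now=2.5)  # regionserver-2 goes silent
dead = monitor.expired(now=4.0)
print(dead)  # ['regionserver-2']
```

In this sketch, the list returned by `expired` plays the role of the failure report that would be passed on to the Master Server.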

Listing Zookeeper as a component here might be a little confusing, because it can be considered a dependency rather than a component. But HBase depends on Zookeeper so closely that it can almost be treated as a component of HBase.

[Figure: HBase structure]

HMaster:

As the name suggests, HMaster is the Master Server that monitors the region servers. In HBase 1.6, the metadata existed in Zookeeper, so clients could reach the required region server even when the Master Server was down. HMaster only comes into the picture when a table is created or altered, when a new region server is added, or when one of the existing region servers fails. WALs (write-ahead logs) are used to transfer the current state to another Master when the current Master Server crashes.

Region Servers:

Region Servers are primarily responsible for handling the read (GET) and write (PUT) requests that come from the client, which uses Zookeeper to find the right Region Server and then talks to it directly. In addition, Region Servers manage Regions (the smallest units of a table that hold actual data).
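To make the routing of a GET or PUT more concrete, here is a minimal sketch of how a row key could be mapped to the Region, and hence the Region Server, that owns it. The region boundaries, server names, and the `locate` function are hypothetical; a real client resolves this from HBase's metadata rather than from hard-coded lists.

```python
import bisect

# Hypothetical layout: region i covers row keys from region_start_keys[i]
# up to (but not including) region_start_keys[i + 1]. "" marks the
# start of the table.
region_start_keys = ["", "g", "n", "t"]
region_servers = ["rs-1", "rs-2", "rs-1", "rs-3"]


def locate(row_key):
    """Return (region start key, region server) for the given row key."""
    # Find the last region whose start key is <= row_key.
    idx = bisect.bisect_right(region_start_keys, row_key) - 1
    return region_start_keys[idx], region_servers[idx]


print(locate("apple"))  # ('', 'rs-1')
print(locate("g"))      # ('g', 'rs-2')
print(locate("zebra"))  # ('t', 'rs-3')
```

Because row keys are kept sorted, a binary search over the region start keys is enough to find the owner of any key.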

Compactions:

Region Servers go through different types of Compactions.

A minor compaction merges several smaller files on the disk into a larger one to make reads faster and more efficient. These smaller files are created whenever the data held in memory exceeds a configurable amount (hbase.hregion.memstore.flush.size) and is flushed to disk; a minor compaction runs once enough of them have accumulated.
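The in-memory threshold mentioned above can be sketched as follows. This is a toy model under stated assumptions: the `Store` class, its byte accounting, and the 16-byte threshold are hypothetical stand-ins chosen only to show how writes accumulate in memory and are flushed as small sorted files once the limit is reached.

```python
class Store:
    """Toy store: buffers writes in memory and flushes them to a new
    sorted 'file' once a size threshold is reached. (Hypothetical
    class; the threshold stands in for
    hbase.hregion.memstore.flush.size.)"""

    def __init__(self, flush_size):
        self.flush_size = flush_size  # threshold in bytes
        self.memstore = {}            # in-memory buffer: row -> value
        self.memstore_bytes = 0       # rough size of the buffer
        self.files = []               # flushed, immutable, sorted files

    def put(self, row, value):
        self.memstore[row] = value
        self.memstore_bytes += len(row) + len(value)
        if self.memstore_bytes >= self.flush_size:
            self.flush()

    def flush(self):
        # Persist the buffer as one sorted file and start a new buffer.
        self.files.append(dict(sorted(self.memstore.items())))
        self.memstore, self.memstore_bytes = {}, 0


store = Store(flush_size=16)
for i in range(5):
    store.put(f"row{i}", "vvvv")  # each put adds 8 bytes

print(len(store.files))  # 2
print(store.memstore)    # {'row4': 'vvvv'}
```

Every flush adds another small file, which is why reads get slower over time and the files need to be merged periodically.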

A major compaction, on the other hand (which occurs once a week by default), combines all the files of each store in a Region into a single file and also deletes any data that has to be cleaned up.

Hence, even though a major compaction is intended to benefit reads, it rewrites a lot of data, takes up a lot of resources, and can cause problems in a Production environment.
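The difference between the two compaction types can be sketched with plain dictionaries standing in for files on disk. This is an illustration only, not HBase's implementation: the function names, the `TOMBSTONE` marker, and the sample data are assumptions. The sketch assumes that a minor compaction keeps delete markers (older files outside the merge may still contain the deleted value), while a major compaction, which sees every file, can drop them.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted cell


def minor_compact(files):
    """Merge some store files into one. `files` is ordered oldest to
    newest, so newer values overwrite older ones. Delete markers are
    kept."""
    merged = {}
    for f in files:
        merged.update(f)
    return dict(sorted(merged.items()))


def major_compact(files):
    """Merge all store files into one and drop deleted cells."""
    merged = minor_compact(files)
    return {k: v for k, v in merged.items() if v is not TOMBSTONE}


store_files = [
    {"row1": "a", "row2": "b"},        # oldest
    {"row2": TOMBSTONE, "row3": "c"},  # row2 was deleted later
    {"row1": "a2"},                    # newest: row1 was updated
]

partial = minor_compact(store_files[:2])
full = major_compact(store_files)
print(sorted(partial))  # ['row1', 'row2', 'row3']
print(full)             # {'row1': 'a2', 'row3': 'c'}
```

Note that the major compaction has to read and rewrite every file in the store, which is where its resource cost comes from.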

Region:

A Region is the smallest unit of a table and comprises one store per column family; each store holds data in memory and persists it to disk once a threshold is reached. Regions are split automatically by the Region Server. The Region Server first informs Zookeeper that it is splitting a region. It then creates the daughter regions and splits the HDFS files as needed. After successful completion, the metadata is updated in the other region servers, Zookeeper, and the master, in that order. Thus, HBase automatically handles both the splitting and the compaction of data in regions to optimize read operations.
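A region split can be pictured with the following toy sketch. It is a simplification under stated assumptions: a region is modelled as a `(start_key, end_key, rows)` tuple, the split point is simply the middle row key, and the `split_region` function is hypothetical. The Zookeeper notification and metadata updates described above are left out.

```python
def split_region(region):
    """Split a region at its middle row key into two daughter regions.
    A region is modelled as (start_key, end_key, rows), where `rows`
    maps row keys to values. (Hypothetical helper for illustration.)"""
    start, end, rows = region
    keys = sorted(rows)
    split_key = keys[len(keys) // 2]
    lower = {k: rows[k] for k in keys if k < split_key}
    upper = {k: rows[k] for k in keys if k >= split_key}
    # Daughter A covers [start, split_key); daughter B covers [split_key, end).
    return (start, split_key, lower), (split_key, end, upper)


parent = ("a", "z", {"apple": 1, "kiwi": 2, "mango": 3, "pear": 4})
daughter_a, daughter_b = split_region(parent)
print(daughter_a)  # ('a', 'mango', {'apple': 1, 'kiwi': 2})
print(daughter_b)  # ('mango', 'z', {'mango': 3, 'pear': 4})
```

Each daughter region can then be served independently, possibly by a different Region Server, which is what lets a growing table spread across the cluster.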

CAP Theorem:

HBase is strongly consistent, unlike a few other NoSQL databases that can return stale data, such as MongoDB (when secondary reads are enabled or a partition replica failure occurs; see https://docs.mongodb.com/manual/reference/read-concern/) or Cassandra (depending on the consistency level configuration; see https://cassandra.apache.org/doc/latest/configuration/cass_yaml_file.html#idealconsistency-level).

Because Regions are unavailable while they are being split, HBase cannot guarantee Availability.

Consistency is achieved because all reads and writes for a Region go through the Region Server that serves it. Partition Tolerance is achieved by replicating the data files using the underlying HDFS.

Wrapping up

Hope you guys now know a little more about the various components in HBase and how they let it handle millions of data points and still return results in the order of a few milliseconds. That’s how easy it is to scale HBase. Here's our article on how to install HBase on Mac!

Shripati Bhat

Shripati is a Senior Director at Engati. He's a technical enthusiast, passionate about designing and building scalable software.

Shripati has a deeply ingrained customer-first ideology and is highly skilled in designing and developing Java/J2EE applications and BigData applications.
