
How does HBase scale?

Shripati Bhat

HBase is a column-oriented (wide-column) database from the NoSQL family. A table declares its column families up front, but there is no fixed set of columns that every row must adhere to: each row can carry its own columns. To understand how HBase scales with such dynamic data, we need to understand its components.
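To make the flexible structure concrete, here is a minimal sketch of a wide-column table as nested dictionaries (row key, then column family, then column qualifier). It is plain Python, not the HBase client API, and the helper names `make_table`, `put` and `get` are made up for this illustration.

```python
# Toy model of a wide-column table: row key -> column family -> qualifier -> value.
# Assumption: this only illustrates the data layout; it is not how HBase stores data.
from collections import defaultdict


def make_table(column_families):
    """Create an empty table whose column families are declared up front."""
    families = frozenset(column_families)
    rows = defaultdict(lambda: {cf: {} for cf in families})
    return families, rows


def put(table, row_key, family, qualifier, value):
    """Store one cell; any qualifier is allowed, but the family must exist."""
    families, rows = table
    if family not in families:
        raise KeyError(f"unknown column family: {family}")
    rows[row_key][family][qualifier] = value


def get(table, row_key, family, qualifier):
    """Return the cell value, or None when the row or the cell is absent."""
    families, rows = table
    if row_key not in rows:
        return None
    return rows[row_key][family].get(qualifier)


table = make_table(["profile", "activity"])
put(table, "user#1", "profile", "name", "Asha")
put(table, "user#2", "profile", "email", "ravi@example.com")  # other columns than user#1
put(table, "user#2", "activity", "last_login", "2020-01-01")
```

Each row ends up with a different set of columns, and a missing cell simply reads back as `None` instead of requiring a schema change.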

Zookeeper:

HBase is a distributed database. As with any distributed database, some component has to centrally manage the metadata of all the other components. HBase uses ZooKeeper for this task. ZooKeeper keeps meta information such as the number of region servers and the location of every component (master, region servers). It constantly exchanges heartbeats with all components, lets the Master Server know when a region server fails, and coordinates the election of a new master when the Master Server node fails.

Listing ZooKeeper as a component may be a little confusing, because it can be considered a dependency rather than a component. But HBase relies on ZooKeeper so closely that it can almost be treated as a component of HBase.
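The heartbeat-based failure detection described in this section can be sketched with a small liveness tracker. This is a simplified illustration, not ZooKeeper's actual implementation: the class `LivenessTracker`, its methods and the server names are hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
class LivenessTracker:
    """Remembers the last heartbeat of each server (toy stand-in for failure detection)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # server name -> time of last heartbeat

    def heartbeat(self, server, now):
        """Record that `server` was alive at time `now`."""
        self.last_seen[server] = now

    def expired(self, now):
        """Return servers silent for longer than the timeout, and forget them."""
        dead = sorted(
            server
            for server, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
        for server in dead:
            del self.last_seen[server]
        return dead


tracker = LivenessTracker(timeout_seconds=10)
tracker.heartbeat("rs1", now=0)
tracker.heartbeat("rs2", now=0)
tracker.heartbeat("rs1", now=8)         # rs1 keeps reporting, rs2 goes silent
failed_servers = tracker.expired(now=12)  # these would be reported to the master
```

In this sketch the list returned by `expired` plays the role of the failure report that the master acts on.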

HMaster:

As the name suggests, HMaster is the Master Server, which monitors the region servers. In HBase 1.6, the metadata was kept in ZooKeeper, so clients could reach the required region server even when the Master Server was down. HMaster only comes into the picture when a table is created or altered, when a new region server is added, or when one of the existing region servers fails. WALs (write-ahead logs) are used to transfer the current state to another Master when the current Master Server crashes.

Region Servers:

Region Servers are primarily responsible for handling the read (GET) and write (PUT) requests that come from clients, which locate the right server via ZooKeeper (ZK). In addition, Region Servers manage Regions (the smallest units of a table, holding the actual data).
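As a rough sketch of how a request finds its Region Server, assume each region covers a contiguous, sorted range of row keys and is hosted by one server. The region boundaries and server names below (`rs1`, `rs2`, `rs3`) are invented for the example, and the lookup is a plain binary search, not the HBase client's own code.

```python
import bisect

# Hypothetical region map: regions are sorted by start key, one hosting server each.
REGION_START_KEYS = ["", "g", "n", "t"]        # "" means "from the very first key"
REGION_SERVERS = ["rs1", "rs2", "rs1", "rs3"]  # server hosting each region


def locate_region_server(row_key):
    """Return the server whose region's key range contains row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position is the one covering the key.
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_SERVERS[index]


server_for_get = locate_region_server("monkey")  # falls in the region starting at "g"
```

Because the lookup only needs the sorted start keys, adding more regions and servers grows the map without changing how a single GET or PUT is routed.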

Compactions:

Region Servers go through different types of compactions.

A minor compaction merges several smaller files on disk into one to make reads faster. The small files come from memstore flushes, which happen when more than a configurable amount of data is held in memory (hbase.hregion.memstore.flush.size); once enough of these files accumulate, a minor compaction is triggered.

A major compaction, on the other hand (run once a week by default), combines all the files of a store into one, per Region, and also deletes any data that is due to be cleaned up. So although a major compaction is intended to benefit reads, it takes a lot of resources and can cause problems in a production environment.
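The difference between the two compaction types can be sketched on toy data. The example assumes each file is a list of `(row_key, sequence, value)` entries sorted by key, keeps only the newest entry per key, and uses a `TOMBSTONE` marker for deletes. All of these names and the one-version-per-key rule are simplifications made for this sketch, not HBase's file format.

```python
import heapq

TOMBSTONE = object()  # marker used in this sketch for a deleted cell


def merge_files(files):
    """Merge sorted files of (key, sequence, value), keeping the newest entry per key."""
    newest = {}
    ordered = heapq.merge(*files, key=lambda entry: (entry[0], entry[1]))
    for key, sequence, value in ordered:
        # Entries arrive sorted by key and then sequence, so later ones win.
        newest[key] = (sequence, value)
    return [(key, seq, value) for key, (seq, value) in sorted(newest.items())]


def minor_compaction(files, max_files=3):
    """Merge only the first few files into one; delete markers are kept."""
    picked, rest = files[:max_files], files[max_files:]
    return [merge_files(picked)] + rest


def major_compaction(files):
    """Merge every file into one and drop keys whose newest entry is a delete."""
    merged = merge_files(files)
    return [[entry for entry in merged if entry[2] is not TOMBSTONE]]


store_files = [
    [("a", 1, "v1"), ("b", 2, "v1")],
    [("a", 3, "v2"), ("c", 4, "v1")],
    [("b", 5, TOMBSTONE)],
    [("d", 6, "v1")],
]
after_minor = minor_compaction(store_files, max_files=2)  # 4 files become 3
after_major = major_compaction(store_files)               # 4 files become 1, "b" is gone
```

The minor pass touches only part of the data, while the major pass rewrites everything in the store, which is where its resource cost comes from.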

Region:

Regions are the smallest units of a table. Each comprises one store per column family, which holds data in memory and persists it to disk once a threshold is crossed. Regions are split automatically by the Region Server. The Region Server first informs ZooKeeper that it is splitting a region. It then creates the daughter regions and splits the HDFS files as needed. After successful completion, the metadata is updated in the other region servers, ZooKeeper and the master (in that order). Thus, HBase automatically handles both the splitting and the compaction of data in regions to optimize read operations.
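A region split can be sketched as cutting a sorted key range in two at a middle key. This is a minimal sketch under stated assumptions: the `Region` class and both functions are hypothetical, rows live in an in-memory dictionary, and a row count stands in for whatever size threshold actually triggers a split.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    start_key: str  # inclusive lower bound of the key range
    end_key: str    # exclusive upper bound; "" means "no upper bound" in this sketch
    rows: dict = field(default_factory=dict)


def split_region(parent):
    """Split a region into two daughter regions at its middle row key."""
    keys = sorted(parent.rows)
    if len(keys) < 2:
        raise ValueError("need at least two rows to split a region")
    split_key = keys[len(keys) // 2]
    lower = Region(parent.start_key, split_key,
                   {k: v for k, v in parent.rows.items() if k < split_key})
    upper = Region(split_key, parent.end_key,
                   {k: v for k, v in parent.rows.items() if k >= split_key})
    return lower, upper


def split_if_too_large(region, max_rows):
    """Return the region unchanged, or its two daughters once it outgrows max_rows."""
    if len(region.rows) <= max_rows:
        return [region]
    return list(split_region(region))


parent = Region("", "", {"a": 1, "b": 2, "c": 3, "d": 4})
daughters = split_if_too_large(parent, max_rows=3)
```

The two daughters share a boundary key, so together they cover exactly the parent's key range and every row lands in exactly one of them.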

CAP Theorem:

HBase is strongly consistent, unlike a few other NoSQL databases such as MongoDB (when secondary reads are enabled - https://docs.mongodb.com/manual/reference/read-concern/ - or a partition replica failure occurs) or Cassandra (through consistency level configuration - https://cassandra.apache.org/doc/latest/configuration/cass_yaml_file.html#idealconsistency-level), which can serve reads that are not fully consistent.

Because Regions are unavailable during a split, HBase cannot guarantee Availability. Consistency is achieved by routing every read and write through the Region Server concerned. Partition Tolerance is achieved by replicating data files using the underlying HDFS.

Wrapping up

Hopefully you now know a little more about the various components of HBase and how it can handle millions of data points and still return results in the order of a few milliseconds.
