• HBase stores rows of data in tables. Tables are split into chunks
of rows called “regions”. Those regions are distributed across
the cluster, hosted and made available to client processes by the
region servers.
• A region is a contiguous range within the key space, meaning
all rows in the table that sort between the region’s start key and
end key are stored in the same region. A single row key belongs
to exactly one region at any point in time.
• A region, in turn, consists of many “Stores”, which correspond
to column families. A store contains one memstore and zero or
more store files. The data for each column family is stored and
accessed separately.
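The "one row key, one region" rule can be illustrated with a small lookup sketch. This is not HBase client code: `find_region` and the sample start keys are hypothetical, and the sketch assumes that each region includes its start key, excludes its end key, and that the first region starts at the empty key.

```python
from bisect import bisect_right

def find_region(start_keys, row_key):
    """Return the index of the region whose key range contains row_key.

    start_keys is a sorted list of region start keys (bytes). The first
    entry is b"" so every possible row key falls into some region.
    """
    # The owning region is the one with the largest start key <= row_key.
    return bisect_right(start_keys, row_key) - 1

# Hypothetical table with three regions: [b"", b"g"), [b"g", b"p"), [b"p", end)
start_keys = [b"", b"g", b"p"]
print(find_region(start_keys, b"apple"))
print(find_region(start_keys, b"g"))
print(find_region(start_keys, b"zebra"))
```

Because the ranges do not overlap and together cover the whole key space, every row key maps to exactly one index.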
There are two types of compactions: minor and major.
Minor compactions usually pick up a few of the smaller adjacent
StoreFiles and rewrite them as one. Minor compactions do not drop deletes
or expired cells; only major compactions do.
Sometimes a minor compaction picks up all the StoreFiles in the Store,
in which case it promotes itself to a major compaction.
See: org.apache.hadoop.hbase.regionserver.CompactionChecker
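The difference between the two compaction types can be sketched in a few lines. This is a toy model, not HBase's implementation: `Cell`, `compact`, `now`, and `ttl` are hypothetical names, a cell with `value=None` stands in for a delete marker, and a delete is assumed to mask every cell of the same row with an equal or older timestamp.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Cell:
    row: str
    timestamp: int
    value: Optional[str]  # None marks a delete (assumption of this sketch)

def compact(store_files, selected, now, ttl):
    """Merge the selected store files into one.

    Returns (new_store_files, was_major). The compaction counts as major
    when every file in the store was selected.
    """
    major = len(selected) == len(store_files)
    merged = sorted(
        (cell for i in selected for cell in store_files[i]),
        key=lambda c: (c.row, -c.timestamp),
    )
    if major:
        # Only a major compaction sees all files, so only it can safely
        # drop delete markers, the cells they mask, and expired cells.
        newest_delete = {}
        for c in merged:
            if c.value is None:
                newest_delete[c.row] = max(
                    newest_delete.get(c.row, c.timestamp), c.timestamp
                )
        merged = [
            c for c in merged
            if c.value is not None
            and c.timestamp > newest_delete.get(c.row, float("-inf"))
            and now - c.timestamp <= ttl
        ]
    remaining = [f for i, f in enumerate(store_files) if i not in selected]
    return remaining + [merged], major

files = [
    [Cell("a", 1, "v1")],
    [Cell("a", 2, None)],   # delete marker for row "a"
    [Cell("b", 3, "v2")],
]
minor_result, was_major = compact(files, [0, 1], now=10, ttl=100)
major_result, promoted = compact(files, [0, 1, 2], now=10, ttl=100)
```

In the minor case the delete marker is kept, since an unselected file could still hold cells it masks. Selecting all three files triggers the promotion described above, and only the live cell for row "b" survives.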
A table typically consists of many regions, which are in turn
hosted by many region servers. Thus, regions are the physical
mechanism used to distribute the write and query load across
region servers. When a table is first created, HBase, by default,
allocates only one region for the table. This means that initially, all
requests go to a single region server, regardless of the number
of region servers. This is the primary reason why the initial phases of
loading data into an empty table cannot utilize the whole capacity
of the cluster.
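A rough simulation makes the single-region bottleneck concrete. The function `load_per_server`, the server names, and the round-robin assignment of regions to servers are all assumptions of this sketch; HBase's actual region assignment is handled by its own balancer.

```python
from bisect import bisect_right
from collections import Counter

def load_per_server(row_keys, split_keys, servers):
    """Count how many requests each region server receives.

    split_keys is a sorted list of keys that divides the key space into
    len(split_keys) + 1 regions. Regions are assigned to servers
    round-robin (an assumption made only for this illustration).
    """
    counts = Counter()
    for key in row_keys:
        region = bisect_right(split_keys, key)
        counts[servers[region % len(servers)]] += 1
    return dict(counts)

servers = ["rs1", "rs2", "rs3"]
keys = ["a1", "g5", "m3", "t9"]

# A new table has one region, so one server takes every request.
print(load_per_server(keys, [], servers))

# With three regions the same requests spread across the servers.
print(load_per_server(keys, ["h", "p"], servers))
```

With no split keys, the extra servers receive nothing, no matter how many there are. Once the table has several regions, the same requests are shared among them.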