19. HBase Compact
• HBase stores rows of data in tables. Tables are split into chunks
of rows called “regions”. Those regions are distributed across
the cluster, hosted and made available to client processes by the
RegionServer process.
• A region is a contiguous range within the key space, meaning
all rows in the table that sort between the region’s start key and
end key are stored in the same region. A single row key belongs
to exactly one region at any point in time.
• A region, in turn, consists of many “Stores”, which correspond
to column families. A store contains one memstore and zero or
more store files. The data for each column family is stored and
accessed separately.
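The "one row key, one region" rule can be illustrated with a small lookup over sorted region start keys. This is only a sketch: the region names and start keys are made up, and plain strings stand in for the byte arrays HBase actually compares.

```python
import bisect

# Hypothetical region table: each region covers [start_key, next start_key).
# The empty start key marks the first region; the last region has no upper bound.
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_NAMES = ["region-1", "region-2", "region-3", "region-4"]


def locate_region(row_key: str) -> str:
    """Return the name of the single region whose key range contains row_key."""
    # bisect_right gives the position of the first start key strictly greater
    # than row_key; the region just before that position contains the key.
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_NAMES[index]
```

Because the start keys are sorted and the ranges do not overlap, every key maps to exactly one region, and a start key itself belongs to the region it opens.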
23. HBase Compact
There are two types of compactions: minor and major.
Minor compactions will usually pick up a couple of the smaller adjacent
StoreFiles and rewrite them as one. Minors do not drop deletes or expired cells,
only major compactions do this.
Sometimes a minor compaction will pick up all the StoreFiles in the Store
and in this case it actually promotes itself to being a major compaction.
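The difference between the two compaction types can be sketched with a toy model. Everything here is an assumption made for illustration: a store file is a list of (row, timestamp, value) cells, a value of None stands for a delete marker, a delete masks every older cell of the same row, and expired cells are not modelled.

```python
DELETE = None  # hypothetical tombstone value used by this sketch


def compact(store_files, start, end):
    """Rewrite the adjacent files store_files[start:end] as one file.

    Returns (new_file_list, is_major). The compaction counts as major
    when the selection covers every file in the store.
    """
    selected = store_files[start:end]
    is_major = len(selected) == len(store_files)

    # Merge cells: rows ascending, newest timestamp first within a row.
    merged = sorted(
        (cell for f in selected for cell in f),
        key=lambda cell: (cell[0], -cell[1]),
    )

    if is_major:
        # Only a major compaction drops delete markers and the cells they mask.
        deleted_at = {}
        for row, ts, value in merged:
            if value is DELETE:
                deleted_at[row] = max(ts, deleted_at.get(row, ts))
        merged = [
            (row, ts, value)
            for row, ts, value in merged
            if value is not DELETE and ts > deleted_at.get(row, -1)
        ]

    return store_files[:start] + [merged] + store_files[end:], is_major
```

A minor compaction must keep delete markers because files outside the selection may still hold cells that those markers mask.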
24. HBase Compact
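Main entry point:
When HRegionServer starts, it launches a task that periodically scans
every store of every HRegion hosted by the RegionServer and checks
whether a compaction is needed.
With the default configuration the check runs every 10000 s, roughly once every 3 hours.
See: org.apache.hadoop.hbase.regionserver.CompactionChecker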
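The periodic check can be sketched as follows. The constants are assumptions of this sketch rather than values taken from the slide: the 10000 s interval is modelled as a 10 s wake-up period times a multiplier of 1000, and a store is flagged once it holds at least three store files.

```python
WAKE_FREQUENCY_MS = 10_000   # assumed base wake-up period (10 s)
CHECK_MULTIPLIER = 1_000     # assumed multiplier applied to the wake-up period
COMPACTION_THRESHOLD = 3     # assumed minimum store-file count that triggers a request


def check_interval_seconds() -> float:
    """Interval between two compaction checks, in seconds."""
    return WAKE_FREQUENCY_MS * CHECK_MULTIPLIER / 1000


def stores_needing_compaction(regions):
    """Scan every store of every hosted region.

    regions maps a region name to {column_family: number_of_store_files}.
    Returns the (region, column_family) pairs that reach the threshold.
    """
    return [
        (region, family)
        for region, stores in regions.items()
        for family, file_count in stores.items()
        if file_count >= COMPACTION_THRESHOLD
    ]
```

Under these assumptions the interval is 10000 s, which is about 2.78 hours and matches the "roughly 3 hours" figure.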
31. HBase Split
Why does HBase need to split?
A table typically consists of many regions, which are in turn
hosted by many region servers. Thus, regions are the physical
mechanism used to distribute the write and query load across
region servers. When a table is first created, HBase, by default, will
allocate only one region for the table. This means that initially, all
requests will go to a single region server, regardless of the number
of region servers. This is the primary reason why initial phases of
loading data into an empty table cannot utilize the whole capacity
of the cluster.
The split implementation is fairly complex: it involves taking the old
region offline and bringing the new regions online.
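A heavily simplified model of a split, to show the offline/online hand-over. The Region class and the choice of the middle row key as split point are assumptions of this sketch; the real procedure involves many more steps.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start_key: str        # inclusive; "" means no lower bound
    end_key: str          # exclusive; "" means no upper bound
    rows: list
    online: bool = True


def split_region(parent: Region):
    """Take the parent offline and return two online daughter regions."""
    keys = sorted(parent.rows)
    split_key = keys[len(keys) // 2]  # assumed split point: the middle row key

    # The old region stops serving requests before the daughters take over.
    parent.online = False

    left = Region(parent.start_key, split_key, [k for k in keys if k < split_key])
    right = Region(split_key, parent.end_key, [k for k in keys if k >= split_key])
    return left, right
```

The two daughters together cover exactly the parent's key range, so each can be assigned to a different region server and the load is spread out.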