Trends in Supporting Production Apache HBase Clusters
by Hadoop_Summit on Jul 09, 2013
Apache HBase is a distributed data store that is in production today at many enterprises and sites, serving large volumes of near-real-time random accesses. By supporting a wide range of production Apache HBase clusters with diverse use cases and sizes over the past year, we've noticed several new trends, learned lessons, and taken action to improve the HBase experience. We'll present aggregated root-cause statistics on resolved support tickets from the past year. Comparing these with the previous year's statistics shows an interesting shift away from problems internal to HBase (splitting, repairs, recovery time) and toward user-inflicted problems, such as poor application architecture, and problems that can be mitigated by tuning (bulk loads, read/write latencies, and compaction policies). The talk will discuss several tuning tips used for a variety of production workloads running on HBase 0.92.x/0.94.x clusters with tens to hundreds of nodes. This will include settings, and their justification, for sizing clusters and for tuning bulk loads, region counts, and memory settings. We'll also discuss recently added HBase features that alleviate these problems, including improved mean time to recovery, improved predictability, and improved performance.