Home
Explore
Submit Search
Upload
Login
Signup
Advertisement
Check these out next
Introduction to Spark Internals
Pietro Michiardi
Hadoopのシステム設計・運用のポイント
Cloudera Japan
Terraform on Azure
Mithun Shanbhag
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
DataWorks Summit
KVM tools and enterprise usage
vincentvdk
Apache Ambari: Past, Present, Future
Hortonworks
HBase Low Latency
DataWorks Summit
Impala Architecture presentation
hadooparchbook
1
of
43
Top clipped slide
Operating and Supporting Apache HBase Best Practices and Improvements
Jul. 10, 2016
•
0 likes
4 likes
×
Be the first to like this
Show More
•
2,810 views
views
×
Total views
0
On Slideshare
0
From embeds
0
Number of embeds
0
Download Now
Download to read offline
Report
Technology
Operating and Supporting Apache HBase Best Practices and Improvements
DataWorks Summit/Hadoop Summit
Follow
DataWorks Summit/Hadoop Summit
Advertisement
Advertisement
Advertisement
Recommended
Scaling HBase for Big Data
Salesforce Engineering
2.1K views
•
50 slides
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
Cloudera, Inc.
41.7K views
•
189 slides
Optimizing Hive Queries
Owen O'Malley
35.9K views
•
36 slides
Cloudera Impala Internals
David Groozman
4.4K views
•
52 slides
Seamless replication and disaster recovery for Apache Hive Warehouse
DataWorks Summit
1.8K views
•
40 slides
Ozone and HDFS's Evolution
DataWorks Summit
972 views
•
28 slides
More Related Content
Slideshows for you
(20)
Introduction to Spark Internals
Pietro Michiardi
•
19.1K views
Hadoopのシステム設計・運用のポイント
Cloudera Japan
•
34.5K views
Terraform on Azure
Mithun Shanbhag
•
2.4K views
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
DataWorks Summit
•
7.8K views
KVM tools and enterprise usage
vincentvdk
•
4.9K views
Apache Ambari: Past, Present, Future
Hortonworks
•
3.2K views
HBase Low Latency
DataWorks Summit
•
5.1K views
Impala Architecture presentation
hadooparchbook
•
10.6K views
Cassandra Introduction & Features
DataStax Academy
•
31.1K views
Apache airflow
Pavel Alexeev
•
2.5K views
What is new in Apache Hive 3.0?
DataWorks Summit
•
6.4K views
ORC File - Optimizing Your Big Data
DataWorks Summit
•
11.4K views
Apache Spark on K8S Best Practice and Performance in the Cloud
Databricks
•
7.9K views
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
•
1.8K views
HBaseCon 2013: Apache HBase Table Snapshots
Cloudera, Inc.
•
11.8K views
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Flink Forward
•
653 views
Introducing the Apache Flink Kubernetes Operator
Flink Forward
•
559 views
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
DataWorks Summit
•
10.8K views
Application Timeline Server - Past, Present and Future
VARUN SAXENA
•
3.4K views
The basics of fluentd
Treasure Data, Inc.
•
40.1K views
Viewers also liked
(20)
Accelerating Data Warehouse Modernization
DataWorks Summit/Hadoop Summit
•
1.7K views
Apache HBase Workshop
Valerii Moisieienko
•
1K views
Modern data warehouse
Stephen Alex
•
499 views
A First Look at San Francisco’s New ETL Job Platform
Safe Software
•
746 views
Scheduling Policies in YARN
DataWorks Summit/Hadoop Summit
•
2.3K views
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Andreas Buckenhofer
•
974 views
Data warehouse design
ines beltaief
•
2.4K views
Apache HBase: State of the Union
DataWorks Summit/Hadoop Summit
•
2K views
Supporting Data Services Marketplace using Data Virtualization
Denodo
•
536 views
Introduction to ETL process
Omid Vahdaty
•
3.1K views
Quark Virtualization Engine for Analytics
DataWorks Summit/Hadoop Summit
•
939 views
Automate data warehouse etl testing and migration testing the agile way
Sandesh Gawande
•
11.4K views
Data Warehouse Back to Basics: Dimensional Modeling
Dunn Solutions Group
•
4.2K views
What's new in SQL on Hadoop and Beyond
DataWorks Summit/Hadoop Summit
•
705 views
Kafka Security
DataWorks Summit/Hadoop Summit
•
2.6K views
Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Andreas Buckenhofer
•
1.2K views
Designing and implementing_an_etl_framework
Bharat Vadlamudi
•
220 views
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Denodo
•
743 views
YARN Federation
DataWorks Summit/Hadoop Summit
•
2.5K views
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
DataStax
•
13.5K views
Advertisement
Similar to Operating and Supporting Apache HBase Best Practices and Improvements
(20)
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
•
440 views
Hdfs 2016-hadoop-summit-san-jose-v4
Chris Nauroth
•
1.8K views
SQLintersection keynote a tale of two teams
Sumeet Bansal
•
476 views
Taming the Elephant: Efficient and Effective Apache Hadoop Management
DataWorks Summit/Hadoop Summit
•
1.2K views
MNPHP Scalable Architecture 101 - Feb 3 2011
Mike Willbanks
•
1K views
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
•
65.8K views
HBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon
•
5.3K views
Share point 2013’s distributed cache service 6.0 (1)
Hexaware Technologies
•
745 views
Inside MapR's M7
MapR Technologies
•
1.9K views
Inside MapR's M7
Ted Dunning
•
7.5K views
Hadoop 3 in a Nutshell
DataWorks Summit/Hadoop Summit
•
7.3K views
How to choose the right server
ASB INTERNATIONAL PVT LTD
•
285 views
Hadoop operations-2014-strata-new-york-v5
Chris Nauroth
•
1.4K views
Tales from the Cloudera Field
HBaseCon
•
4K views
Nagios Conference 2012 - Dan Wittenberg - Case Study: Scaling Nagios Core at ...
Nagios
•
2.1K views
Hdfs 2016-hadoop-summit-dublin-v1
Chris Nauroth
•
1.8K views
HDFS: Optimization, Stabilization and Supportability
DataWorks Summit/Hadoop Summit
•
3.5K views
Apache Hadoop 3.0 Community Update
DataWorks Summit
•
1.4K views
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
•
6.2K views
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
•
588 views
More from DataWorks Summit/Hadoop Summit
(20)
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
•
9.6K views
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
•
3.2K views
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
•
6.8K views
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
•
1.4K views
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
•
2.1K views
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
•
1K views
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
•
4.2K views
Data Science Crash Course
DataWorks Summit/Hadoop Summit
•
2.5K views
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
•
3.1K views
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
•
8.7K views
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
•
5.4K views
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
•
2.1K views
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
•
9.9K views
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
•
1.5K views
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
•
760 views
HBase in Practice
DataWorks Summit/Hadoop Summit
•
5.4K views
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
•
567 views
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
•
962 views
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
•
1.4K views
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
•
1.2K views
Advertisement
Recently uploaded
(20)
Lesson Plan.docx
ThokalaNandiniReddy
•
0 views
finalppt-150606051347-lva1-app6892.pptx
AJAYVISHALRP
•
0 views
International Journal of Computer Science, Engineering and Applications (IJCS...
IJCSEA Journal
•
0 views
The era of spatial computing is here.docx
ssuserb36abd1
•
0 views
DEMO20_Cash Pool - Predemo Steps.pptx
VatsalaC1
•
0 views
Declarative observability management for Microservice architectures
Sven Bernhardt
•
0 views
Ir-Ru-Contract-Scanned.pdf
Ярина Клос
•
0 views
Les10.ppt
AlhassanFederated
•
0 views
The CAFE community: a local, inclusive programming community for researchers ...
SURFevents
•
0 views
Rive
Artmiker Studios
•
0 views
How Does Grocery Delivery App Generate Revenue.pdf
Shopia Wilson
•
0 views
Underexplored Opportunities in the Arabian Plate.pdf
ssuseref75f1
•
0 views
THE-INFORMATION-DISORDER.pdf
JackMaristela
•
0 views
DS Fusion CE - External Transactions.pptx
VatsalaC1
•
0 views
Process 84% more MySQL database activity with the latest-gen Dell PowerEdge R...
Principled Technologies
•
0 views
normal vs. cute.pptx
ShaliniSreedharan1
•
0 views
QAM Microsoft PowerPoint جديد.pptx
ssuserd6ee01
•
0 views
P4+ONOS SRv6 tutorial.pptx
tampham61268
•
0 views
What's new in web in 2023
RajeshKumar825078
•
0 views
StructuralUncertainty_example.pptx
ssuseref75f1
•
0 views
Operating and Supporting Apache HBase Best Practices and Improvements
Page1 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Operating and Supporting Apache HBase - Best Practices and Improvements Tanvir Kherada (tkherada@hortonworks.com) Enis Soztutar (enis@hortonworks.com)
Page2 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved About Us Tanvir Kherada Primary SME for HBase / Phoenix Technical team lead @Hortonworks support Enis Soztutar Committer and PMC member in Apache HBase, Phoenix, and Hadoop HBase/Phoenix dev @Hortonworks
Page3 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Outline Tools to debug: HBase UI and HBCK Top 3 categories of issues SmartSense Improvements for better operability Metrics and Alerts
Page4 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Tools
Page5 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved HBase UI Load Distribution Debug Dump Runtime Configuration RPC Tasks
Page6 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved HBase UI – Load Distribution Request Per Second Read Request Count per RegionServer Write Request Count per RegionServer
Page7 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved HBase UI – Debug Dump contains Thread Dumps
Page8 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved HBase UI – Runtime Configurations Runtime configurations can be reviewed from UI Consolidated view of every relevant configuration.
Page9 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved HBase UI – Tasks Tasks can be reviewed and monitored Like major compactions. RPC calls
Page10 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved HBCK Covered extensively later while we discuss inconsistencies
Page11 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Regionserver Stability Issues
Page12 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Region Server Crashes – JVM Pauses Hbase’s high availability comes from excellent orchestration conducted by ZooKeeper on monitoring every RS and Hbase Master Zookeeper issues a shutdown of RS if a heartbeat check to RS is not responded within timeout Extended JVM pauses at a RS can manifest as unresponsive RS causing ZK to issue a shutdown ZK RSHeartBeat Check I am ok ZK RS In GC ShutDown Issued HeartBeat Check No Response
Page13 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Region Server Crashes - Garbage Collection Pause What do we see in RS Logs? 2016-06-13 18:13:20,533 WARN regionserver/b-bdata-r07f4- prod.phx2.symcpe.net/100.80.148.53:60020 util.Sleeper: We slept 82136ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad 2016-06-13 18:13:20,533 WARN JvmPauseMonitor util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 79669ms GC pool 'ParNew' had collection(s): count=2 time=65742ms GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=14253ms
Page14 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Region Server Crashes - Garbage Collection Pause GC Tuning Recommendation for CMS and YoungGen. – hbase-env.sh -Xmx32g -Xms32g -Xmn2500m -XX:PermSize=128m (eliminated in Java 8) -XX:MaxPermSize=128m (eliminated in Java 8) -XX:SurvivorRatio=4 -XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly Also test G1 for your use case.
Page15 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved RS Crashes - Non GC JVM Pause Disk IO GC logs show unusual behavior What we’ve seen is a delta between user time and real time taken in GC logs. 2015-07-06T23:55:10.642-0700: 7271.224: [GC2015-07-06T23:55:41.688- 0700: 7302.270: [ParNew: 420401K->1077K(471872K), 0.0347330 secs] 1066189K->646865K(32453440K), 31.0811340 secs] [Times: user=0.77 sys=0.01, real=31.08 secs] This is that classic head scracthing moment.
Page16 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved RS Crashes - Non GC JVM Pause Disk IO With no further leads in RS logs and GC logs we focus on system level information. /var/log/message provides significant leads Right when the we see that unusual delta between user and real clocks in GC logs we see the following in system logs kernel: sd 0:0:0:0: attempting task abort! scmd(ffff8809f5b7ddc0) kernel: sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 17 0b 1c c8 00 00 08 00 kernel: scsi target0:0:0: handle(0x0007), sas_address(0x4433221102000000), phy(2) kernel: scsi target0:0:0: enclosure_logical_id(0x500605b009941140), slot(0) kernel: sd 0:0:0:0: task abort: SUCCESS scmd(ffff8809f5b7ddc0) Enabling DEBUG logging at disk driver level clearly showed 30 seconds pauses during write operations.
Page17 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved RS Crashes - Non GC JVM Pause CPU Halts RS Logs show long JVM pause However; it explicitly clarifies that it’s a non GC Pause 2016-02-11 04:59:33,859 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 140009ms No GCs detected 2016-02-11 04:59:33,861 WARN [regionserver60020.compactionChecker] util.Sleeper: We slept 140482ms instead of We look at other component logs on the same machine. DataNode logs show break in activity around the same time frame. We don’t see exceptions in DN logs. But certainly break in log continuation.
Page18 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved RS Crashes - Non GC JVM Pause CPU Halts Start looking at system level information dmesg buffer logs by running dmesg command provides leads on CPU pauses INFO: task java:100759 blocked for more than 120 seconds. Not tainted 2.6.32-431.el6.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. java D 000000000000001b 0 100759 100731 0x00000080 This was identified as a kernel level Red Hat bug Root Cause: hpsa driver can block CPU's workqueue for up to 10 minutes timeout as it waits for controller's acknowledgment. When this happens it results in stalled workqueue. And since the tty work ended up in the same CPU workqueue, we have the hung task
Page19 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Mitigate JVM Pauses Mitigate Crashes from JVM Pauses? – Extend ZK Tick Time in zoo.cfg – Extend zookeeper.session.timeout to match tick time in hbase-site.xml How Much? $ cat hbase-hbase*.log | grep –i pause 97903ms 102732ms 106956ms 112824ms 125318ms 165652ms – Biggest Pause so Far Consider – 180000ms Not my favorite workaround. Cons? • Now ZK will wait for extended time to issue a shutdown. • Makes Hbase fall short on its High Availability promises. • Make every effort to debug and resolve pauses.
Page20 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Read Write Performance
Page21 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Write Performance • Write to WAL caps your write performance. • Relies on throughput of DataNode Pipeline • Writes to Memstore is instantaneous • Writes build up in RS heap • Flushes eventually on the disk
Page22 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Write Performance How to go about debugging Write Performance issues in really huge clusters? – Thanks to Hbase community, starting Hbase 0.99 onwards we have DN pipeline printed for slow Hlog Sync. – For Hlog writes slower than what is configured as hbase.regionserver.hlog.slowsync.ms we now print DN pipeline in RS logs. 2016-06-23 05:01:06,972 INFO [sync.2] wal.FSHLog: Slow sync cost: 131006 ms, current pipeline: [DatanodeInfoWithStorage[10.189.115.117:50010,DS-c9d2a4b4-710b-4b3a-bd9d-93e8ba443f60,DISK], DatanodeInfoWithStorage[10.189.115.121:50010,DS-7b7ba04c-f654-4a50-ad3b-16116a593d37,DISK], DatanodeInfoWithStorage[10.189.111.128:50010,DS-8abb86da-84ac-413f-80a3-56ea7db1cb59,DISK]] Tracking slow DN prior to Hbase 0.99 was a very convoluted process. – It starts with tracking which RS has RPC call queue length backing up – Identify the most recent WAL file associated with that RS – Run hadoop fsck –files –blocks –locations <WAL file> – Identify DN involved with hosting blocks for the most recent WAL file
Page23 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Read Performance Hbase provides block caching which can improve subsequent scans However first read has to follow the read path of hitting HDFS first and the disk eventually. Read performance ideally depends on how fast the disks are responding. Best Practices to Improve Read Performance Major Compactions - Once a day during low traffic hour. Balanced Cluster – Even distribution of regions across all region servers
Page24 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Read Performance – Best Practices Major Compaction – Consolidates multiple store files into one – Drastically improves block locality to avoid remote calls to read data. – Review Block Locality Metrics in RegionServer UI
Page25 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Read Performance – Best Practices Balanced Cluster – Even distribution of regions across all regionserver – Balancer if turned on runs ever 5 minutes and keeps balancing the cluster – It prevents a regionserver from being the most sought after regionserver. Preventing Hot Spotting Other Configs – Enable HDFS Short Circuit – Turned on by Default in HDP distribution. – Client Scanner Cache hbase.client.scanner.caching. Set to 100 in HDP by default
Page26 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Inconsistencies
Page27 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Inconsistencies Hbase stores information in multiple places which includes Unhandled situation within Meta, ZK, HDFS or Master just throws the entire system out of sync causing inconsistencies Region Splits is an extremely complex and orchestrated work flow. It includes interaction with all of the above mentioned components and has very little room for error. We’ve seen the most inconsistencies coming out of region splits. – Lingering reference files – Catalog Janitor prematurely deleting parent store file. HBASE-13331 HDFS Zookeeper META Master Memory
Page28 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Inconsistencies Symptoms Client Hbase Region Not Serving Retries/Time Out
Page29 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Inconsistencies Tools to identify and resolve inconsistencies HBCK – Great Tool to identify inconsistencies • Can be executed from any hbase client machine • Confirms if Hbase is healthy or has inconsistencies • Provides fix options to resolve inconsistencies HBCK not a silver bullet • Deep dive into RS logs • Review Znodes • Hbase Master UI • Won’t run if Master has not initialized
Page30 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Inconsistencies Some of the inconsistencies we see – ERROR: Region { meta => xxx,x1A,1440904364342.ffdece0f3fc5323055b56b4d79e99e16., hdfs => null, deployed => } found in META, but not in HDFS or deployed on any region server – This is broken meta even though it says file missing on HDFS. – hbase hbck -fixMeta Zookeeper Master Memory HDFS META
Page31 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Inconsistencies Some of the inconsistencies we see – ERROR: There is a hole in the region chain between X and Y. You need to create a new .regioninfo and region dir in hdfs to plug the hole. – This is broken HDFS. Expected region directory is missing – hbase hbck –fixHdfsOrphans -fixHdfsHoles ZookeeperHDFS Master MemoryMETA
Page32 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Inconsistencies Some of the inconsistencies we see – ERROR: Found lingering reference file hdfs://namenode.example.com:8020/apps/hbase/data/XXX/f1d15a5a44f966f3f6ef1db4bd2b1874/a/ d730de20dcf148939c683bb20ed1acad.5dedd121a18d32879460713467db8736 – Region Splits did not complete successfully leaving lingering reference files – hbase hbck -fixReferenceFiles ZookeeperHDFS Master MemoryMETA
Page33 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Inconsistencies Some of the inconsistencies we see – HBCK reporting 0 inconsistencies after running the fixes. – However hbase master UI is still reporting RIT – Restart Hbase Master to resolve this. ZookeeperHDFS Master MemoryMETA
Page34 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Inconsistencies Not Always Straight Forward – ERROR: Region { meta => null, hdfs => hdfs://xxx/hbase/yyy/00e2eed3bd0c3e8993fb2e130dbaa9b8, deployed => } on HDFS, but not listed in META or deployed on any region server – Inconsistency of this nature needs deeper dive into other inconsistencies – It also need assessment of logs. HDFS Master Memory Zookeeper META
Page35 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Inconsistencies Hbase Hbck Best Practices • Redirect output to a file hbase hbck >>/tmp/hbck.txt • Larger clusters run table specific hbck fixes • hbase hbck –fixMeta mytable • Avoid running hbck with –repair flag.
Page36 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved SmartSense
Page37 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved SmartSense Great at detecting setup/config issues proactively – Ulimits – Dedicated ZK drives – Transparent Huge Pages – Swapiness This is common knowledge. However; if you don’t have it setup SmartSense will prompt for resolution
Page38 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved SmartSense
Page39 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Improvements in Ops and Stability
Page40 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Metrics You MUST have a metric solution to successfully operate HBase cluster(s) – GC Times, pause times – Gets / Puts, Scans per second – Memstore and Block cache (use memory!) – Queues (RPC, flush, compaction) – Replication (lag, queue, etc) – Load Distribution, per-server view – Look at HDFS and system(cpu, disk) metrics as well Use OpenTSDB if nothing else is available New versions keep adding more and more metrics – Pause times, more master metrics, per-table metrics, FS latencies, etc How to chose important metrics out of hundreds available? Region Server and Master UI is your friend
Page41 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Grafana + AMS <insert grafana>
Page42 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Other Improvements Canary Tool – Monitor per-regionserver / per-region, do actual reads and writes, create alerts Procedure V2 based assignments – Robust cluster ops (HBase-2.0) – Eliminate states in multiple places – Less manual intervention will be needed Bigger Heaps – Reduce garbage being generated – More offheap stuff (eliminate buffer copy, ipc buffers, memstore, cells, etc) Graceful handling of peak loads – RPC scheduling – client backoff Rolling Upgradable, no downtime
Page43 © Hortonworks
Inc. 2011 – 2014. All Rights Reserved Thanks. Q & A
Editor's Notes
- What is hbase? - What is it good at? - How do you use it in my applications? Context, first principals
Understand the world it lives in and it’s building blocks
Understand the world it lives in and it’s building blocks
Understand the world it lives in and it’s building blocks
Understand the world it lives in and it’s building blocks
Understand the world it lives in and it’s building blocks
Understand the world it lives in and it’s building blocks
Understand the world it lives in and it’s building blocks
Advertisement