What about Hardware?
- Having more, smaller nodes is better than having fewer, bigger, faster nodes.
- Lots of RAM is good, but only to a point; just avoid swap.
- We use sub-$1k desktop-grade servers, and they work great!
- Check your network hardware for packet drops: we had ifOutDiscards interrupting ZooKeeper messages, and RegionServers would commit suicide during the packet loss. Just use ping -f to test for packet loss between core nodes (see the sketch after this list).
- JVM GC takes lots of CPU when misconfigured, e.g. with a small NewSize.
- Single NameNode? No problem: build two clusters and have your app tier do query-log replication and replays when needed.
- Inexpensive 2TB Hitachi disks (~$100) work great; you get more units for your money.
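A minimal sketch of that flood-ping check, assuming a hypothetical list of core-node hostnames (ping -f generally needs root):

    #!/bin/bash
    # Flood-ping each core node and report its packet-loss percentage.
    # HOSTS is an assumption -- substitute your own NameNode/ZooKeeper/RegionServer boxes.
    HOSTS="nn1 zk1 zk2 zk3 rs1 rs2"
    COUNT=10000   # packets per host

    for h in $HOSTS; do
        loss=$(ping -f -c "$COUNT" -q "$h" | awk -F', ' '/packet loss/ {print $3}')
        echo "$h: $loss"
    done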
Critical configuration elements
Starting with HBase is easy, but we need to pay attention to:
1. Do not start without graphs: trends over time are critical.
2. Set up HDFS to work flawlessly (pay attention to ulimits, thread limits, hardware stats, graphs, iowait, etc.).
3. Adjust the JVM GC NewSize to at least 100 MB; if young-generation GC is too slow at 100 MB, you need faster CPUs (a limits.conf / hbase-env.sh sketch follows this list).
4. For metadata rows (small rows), adjust your HBase block size to 4 or 8 KB: you will see less IO, and more blocks will fit into RAM.
5. Set up the write cache (memstore) and read cache (block cache) depending on your load (an hbase-site.xml / HBase shell sketch also follows this list).
  - The write cache must have its lower and upper limits close to each other, otherwise you will have very large cache flushes, which is not good for IO or GC.
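A rough sketch of items 2 and 3, assuming HDFS runs as the 'hadoop' user (as in our Cloudera setup) and that HBase picks up JVM flags from hbase-env.sh via HBASE_OPTS; the nproc and NewSize numbers are the ones that worked for us, the nofile values are an assumption:

    # /etc/security/limits.conf -- raise process/thread limits for the HDFS user;
    # the default of 1024 crashed our DataNodes during compactions (see "Issues we had").
    hadoop  soft  nproc   32000
    hadoop  hard  nproc   32000
    # nofile values are an assumption, not a number from this talk:
    hadoop  soft  nofile  32768
    hadoop  hard  nofile  32768

    # conf/hbase-env.sh -- pin the young generation instead of letting the JVM autotune it
    # down to ~20 MB; we use 128 MB (at least 100 MB is the rule of thumb above).
    export HBASE_OPTS="$HBASE_OPTS -XX:NewSize=128m -XX:MaxNewSize=128m"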
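And a sketch of items 4 and 5. The block size is a per-column-family setting, shown here via the HBase shell with hypothetical table/family names; the memstore and block-cache fractions are illustrative assumptions for hbase-site.xml, not numbers from this talk:

    # Item 4: 8 KB block size for a column family holding small metadata rows
    # ('metadata' and 'meta' are hypothetical names).
    hbase shell <<'EOF'
    disable 'metadata'
    alter 'metadata', {NAME => 'meta', BLOCKSIZE => '8192'}
    enable 'metadata'
    EOF

    # Item 5: in hbase-site.xml, keep the two memstore limits close together so flushes
    # stay small, and size the block cache for your read load (fractions are assumptions):
    #   hbase.regionserver.global.memstore.lowerLimit = 0.35
    #   hbase.regionserver.global.memstore.upperLimit = 0.40
    #   hfile.block.cache.size                        = 0.40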
What to monitor? (via Ganglia)
- The runnable-threads graph should be flat; if it's not, you likely have contention somewhere (IO, HDFS, etc.).
- The memstore size graph should be fairly flat, with even flushes over time.
- iowait graphs should not go over 70-80% during major compactions, or 20% during minor compactions; otherwise, just add more disks and/or nodes.
- Monitor and graph Thrift threads (via ps -eLf | grep PID): if your thread count ends up over 25,000, you may run out of RAM. We have dedicated Thrift boxes so that we don't accidentally kill RS nodes (see the first sketch after this list).
- We use Nagios to monitor and alert on DN, RS, ZK, NN, etc. via their web TCP ports; very helpful (see the second sketch after this list).
- Run hbck to check the consistency of the meta structures.
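A small sketch of that Thrift thread-count check; the class name used to find the PID is an assumption about how the Thrift server was started:

    #!/bin/bash
    # Count the threads of the HBase Thrift server (ps -eLf prints one row per thread)
    # and warn when it nears the ~25,000 threads that can exhaust RAM.
    PID=$(pgrep -f 'org.apache.hadoop.hbase.thrift.ThriftServer' | head -1)
    [ -z "$PID" ] && { echo "thrift server not running"; exit 1; }
    THREADS=$(ps -eLf | awk -v pid="$PID" '$2 == pid' | wc -l)
    echo "thrift pid $PID: $THREADS threads"
    [ "$THREADS" -gt 25000 ] && echo "WARNING: over 25,000 threads, box may run out of RAM"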
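And a bare-bones version of the Nagios port checks plus the hbck run; hostnames and the plugin path are assumptions, and the ports are the stock web/client ports of that Hadoop/HBase generation:

    #!/bin/bash
    # Probe the web/TCP ports Nagios watches for us (NN, DN, RS, ZK).
    CHECK=/usr/lib/nagios/plugins/check_tcp   # or "nc -z host port" if the plugin is absent
    $CHECK -H nn1 -p 50070    # NameNode web UI
    $CHECK -H dn1 -p 50075    # DataNode web UI
    $CHECK -H rs1 -p 60030    # RegionServer web UI
    $CHECK -H zk1 -p 2181     # ZooKeeper client port

    # Consistency check of the meta structures:
    hbase hbck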
Issues we had.
- App-tier bugs would abuse HBase and generate millions of queries; logging all RPC calls to HBase on the app tier is critical. It took us a long time to figure out that HBase was not at fault, because we did not know what to expect.
- Various RAM brands: boxes crash for no reason.
- glibc in FC13 had a race-condition bug that would lock up nodes and crash JVM processes under high load. Solution: yum -y update glibc (invalid binfree).
- When running in a mixed hardware environment, some boxes were slow enough to affect HDFS for the whole cluster; looking at "runnable threads" and "fsReadLatency" in Ganglia always pointed to which boxes were slow.
- Running Cloudera HDFS under the 'hadoop' user, which was restricted to 1024 threads by default, would crash DataNodes, but only during compactions. Setting soft (and hard) nproc to 32,000 for hadoop in limits.conf resolved it.
- GC sometimes autotuned NewSize down to 20 MB, which caused GC to run 20 or 30 times per second, flatlining the CPU at 100% and killing the RS. Manually setting NewSize to 128 MB resolved this issue (a quick verification sketch follows this list).
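Two quick checks to confirm that the last two fixes actually took effect, a sketch assuming the 'hadoop' user from limits.conf and a HotSpot JDK of that era for jmap:

    # The process/thread limit should now report 32000 for the hadoop user:
    su - hadoop -c 'ulimit -u'

    # The RegionServer young generation should be pinned at 128 MB, not the ~20 MB autotune:
    jmap -heap "$(pgrep -f HRegionServer | head -1)" | grep -i newsize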
Finally everything is tuned.
- And HBase runs great!