006 Performance Tuning and Cluster Admin

  • 2. AGENDA
    • Course Credit
    • Performance Tuning
    • Cluster Administration
  • 3. COURSE CREDIT
    • Showing up: 30 points
    • Asking questions: 5 points per question
    • Hands-on: 40 points
    • 70 points are needed to pass this course
    • Course credit is calculated once for each course finished
    • The course credit will be sent to you and your supervisor by mail
  • 4. PERFORMANCE TUNING
    • Garbage Collection Tuning
    • MSLAB
    • Compression
    • Optimizing Splits and Compactions
    • Load Balancing
    • Merging Regions
    • Client API: Best Practices
    • Configuration
    • Load Tests
  • 5. GARBAGE COLLECTION TUNING
    • The process of rewriting the heap generation in question is called a garbage collection (GC)
    • GC parameters only need to be added to the region servers
    • The JRE comes with basic assumptions
      • Regarding what your programs are doing, how they create objects, how they allocate the heap to handle data, and so on
      • These assumptions work well in a lot of cases
    • But they do NOT work well for HBase
      • Especially for write-heavy workloads
      • HBase cannot safely rely on the JRE assumptions alone
  • 8. GARBAGE COLLECTION TUNING – WRITE-HEAVY USE CASES (1/2)
    • The memstore flushes its data once it reaches the configured minimum flush size, hbase.hregion.memstore.flush.size
      • This leaves holes of different sizes in the heap
    • Data resides in different locations in the generational architecture of the Java heap
      • Depending on how long the data was in memory
    • Young generation (new generation)
      • The space can be reclaimed quickly and no harm is done
    • Old generation (tenured generation)
      • Data is promoted to this location if it stays in memory for a longer period of time
  • 9. GARBAGE COLLECTION TUNING – WRITE-HEAVY USE CASES (2/2)
    • The JRE tries to reuse the holes created by data that has been written to disk
      • When a request asks for a size of heap that does not fit into one of those holes
      • The JRE needs to compact the fragmented heap
    • Young to old
      • The promotion of longer-living objects from the young to the old generation
    • Old to stop-the-world
      • Because of the fragmentation, there is no longer enough space for a young allocation
      • The JRE falls back to the stop-the-world garbage collector
      • It rewrites the entire heap space and compacts it to the remaining active objects
    • If this fails, you will see a promotion failure in your garbage collection logs
  • 10. What does the heap look like?
  • 11. GARBAGE COLLECTION TUNING – SPECIFY THE YOUNG GENERATION SIZE
    • The young generation is between 128 MB and 512 MB
      • The old generation holds the remaining available heap, which is usually many gigabytes of memory
    • Using 128 MB is a good starting point
      • Further observation of the JVM metrics should be conducted
    • Specify the young generation size like so
        -XX:MaxNewSize=128m -XX:NewSize=128m
    • One convenient shorthand
        -Xmn128m
  • 12. GARBAGE COLLECTION TUNING – GC OPTIONS SETTING
    • GC options for HBase are added in the configuration file
      • HBASE_OPTS variable for all HBase processes
      • HBASE_REGIONSERVER_OPTS variable for the region servers only
    • Enable the JRE's log output for garbage collection details
        -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log
      • Monitor it for occurrences of "concurrent mode failure" or "promotion failed" messages
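Monitoring the GC log for the two messages named on this slide can be automated. The following is a minimal sketch, not part of the original deck: the function name and the sample log lines are made up for illustration, and real JVM output will look different in detail.

```python
# Messages the slide says to watch for in the GC log.
FAILURE_PATTERNS = ("concurrent mode failure", "promotion failed")


def count_gc_failures(lines):
    """Count how many lines contain each failure message."""
    counts = {pattern: 0 for pattern in FAILURE_PATTERNS}
    for line in lines:
        for pattern in FAILURE_PATTERNS:
            if pattern in line:
                counts[pattern] += 1
    return counts


# Hypothetical excerpt, invented for this example (not real JVM output).
sample_log = [
    "12.345: [GC 12.345: [ParNew: 118016K->13056K(118016K), 0.0500 secs]",
    "98.765: [GC 98.765: [ParNew (promotion failed): 118016K->118016K(118016K)]",
    "99.001: [Full GC 99.001: [CMS (concurrent mode failure): 7000000K->2000000K]",
]

failures = count_gc_failures(sample_log)
```

To scan a real log, pass an open file object instead of the list, since the function only iterates over lines.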
  • 13. GARBAGE COLLECTION TUNING – GC STRATEGY FOR YOUNG GENERATION
    • Recommended option for the young generation
        -XX:+UseParNewGC
    • Uses the Parallel New Collector
      • It stops the entire Java process to clean up the young generation heap
      • Since the young generation is small in comparison, the pause is short
      • Usually less than a few hundred milliseconds
  • 14. GARBAGE COLLECTION TUNING – GC STRATEGY FOR OLD GENERATION
    • Recommended option for the old generation
        -XX:+UseConcMarkSweepGC
    • Uses the Concurrent Mark-Sweep Collector (CMS)
      • It tries to do as much work concurrently as possible, without stopping the Java process
      • This takes extra effort and increases the CPU load
      • It avoids the stops otherwise required to rewrite a fragmented old generation heap
    • If you hit the promotion error
      • It falls back to stop-the-world again
  • 15. GARBAGE COLLECTION TUNING – GC STRATEGY FOR OLD GENERATION
    • A switch for CMS
        -XX:CMSInitiatingOccupancyFraction=70
      • A percentage that specifies when the background process starts
    • It helps avoid the concurrent mode failure
      • This failure occurs when the background process that marks and sweeps the heap for collection is still running when the heap runs out of usable space
      • The JVM then falls back to stop-the-world again
    • Why set the initiating occupancy fraction to 70%?
      • 20% block cache + 40% memstore limit = 60%, by default
      • This starts the background process at an appropriate time
      • Early enough, but not too early
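The reasoning behind the 70% value is simple arithmetic: the steady-state heap usage (block cache plus memstore limit) should sit below the point where CMS starts. A small sketch of that check, using whole percentages to avoid floating-point noise; the function name is hypothetical.

```python
def cms_headroom(block_cache_pct, memstore_pct, initiating_pct):
    """Return the heap percentage left between steady-state usage and the CMS trigger."""
    steady_state = block_cache_pct + memstore_pct
    return initiating_pct - steady_state


# Defaults quoted on the slide: 20% block cache, 40% memstore limit, 70% trigger.
headroom = cms_headroom(20, 40, 70)
```

A positive result means CMS is not triggered by the steady-state load alone; a zero or negative result suggests the collector would run almost continuously.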
  • 16. GARBAGE COLLECTION TUNING – SUMMARY
    • Recommended GC options
        export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
    • Alex Su's GC options
        -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<%= hbase_log_path %>/hbase-regionserver-gc-`date +%F-%H-%M-%S`.log -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:PrintFLSStatistics=1 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<%= hbase_log_path %>/hbase-regionserver.hprof
    • GC Options Reference
  • 17. MSLAB – QUESTION
    • How can the stop-the-world issue be solved?
    • The key to reducing these compacting collections is to reduce fragmentation
    • Only objects of exactly the same size should be allocated from the heap
      • Subsequent allocations of new objects of the exact same size will always reuse these holes
    • No promotion error, and therefore no stop-the-world compacting collection is required
  • 18. MSLAB – MEMSTORE-LOCAL ALLOCATION BUFFER (1/3)
    • MSLABs are buffers of fixed size containing KeyValue instances of varying sizes
      1. When a buffer cannot completely fit a newly added KeyValue, it is considered full
      2. A new buffer is then created, once again of the given fixed size
    • Enabled by default in version 0.92
      • Disabled in version 0.90 of HBase
      • hbase.hregion.memstore.mslab.enabled property
    • It is recommended that you test your setup with this feature
  • 19. MSLAB – MEMSTORE-LOCAL ALLOCATION BUFFER (2/3)
    • The size of each allocated, fixed-size buffer
      • hbase.hregion.memstore.mslab.chunksize property
      • Default is 2 MB
      • Based on the size of your KeyValue instances, you may have to adjust this value
      • E.g., if cells are 100 KB in size, you need to increase the MSLAB size to fit more than just a few cells
    • An upper boundary on what is stored in the buffers
      • hbase.hregion.memstore.mslab.max.allocation property
      • Default is 256 KB
      • Any cell (KeyValue) that is larger will be allocated directly in the Java heap
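The interaction of the chunk size and the maximum allocation can be illustrated with a toy simulation. This is a simplified model written for this transcript, not HBase's actual allocator: it only applies the two rules stated on the MSLAB slides (a buffer that cannot fit the next cell is considered full, and cells above the maximum allocation bypass the buffers).

```python
CHUNK_SIZE = 2 * 1024 * 1024   # default of hbase.hregion.memstore.mslab.chunksize (2 MB)
MAX_ALLOCATION = 256 * 1024    # default of hbase.hregion.memstore.mslab.max.allocation (256 KB)


def simulate_mslab(cell_sizes, chunk_size=CHUNK_SIZE, max_allocation=MAX_ALLOCATION):
    """Place cells into fixed-size chunks and report chunk count and wasted space."""
    chunks = 0          # number of fixed-size buffers allocated so far
    used_in_chunk = 0   # bytes used in the current buffer
    wasted = 0          # bytes left unused in buffers that were declared full
    direct = 0          # cells too large for MSLAB, allocated directly on the heap
    for size in cell_sizes:
        if size > max_allocation:
            direct += 1
            continue
        if chunks == 0 or used_in_chunk + size > chunk_size:
            if chunks > 0:
                wasted += chunk_size - used_in_chunk
            chunks += 1
            used_in_chunk = 0
        used_in_chunk += size
    return {"chunks": chunks, "wasted_bytes": wasted, "direct_allocations": direct}


# 25 cells of 100 KB each, plus one 300 KB cell that exceeds the 256 KB limit.
result = simulate_mslab([100 * 1024] * 25 + [300 * 1024])
```

With the defaults, only 20 cells of 100 KB fit into one 2 MB chunk, so 48 KB of the first chunk stay unused. That is the waste the slides describe as the cost of MSLAB.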
  • 20. MSLAB – MEMSTORE-LOCAL ALLOCATION BUFFER (3/3)
    • MSLABs do not come without a cost
      • They are more wasteful with regard to heap usage
      • You will most likely not fill every buffer to the last byte
    • A tradeoff
      • Use MSLABs and benefit from better garbage collection, but incur the extra space that is required
      • Do NOT use MSLABs and benefit from better memory efficiency, but deal with the problems caused by garbage collection pauses
        • You could plan to restart the servers every few days, or weeks, before the pause happens
    • The buffers require an additional byte array copy operation, and are therefore slightly slower
      • Measure the impact on your workload
  • 21. COMPRESSION
    • A number of compression algorithms can be enabled at the column family level
    • Recommendation
      • Enable compression unless you have a reason not to do so
      • For example, when storing already compressed content, such as JPEG images
    • Compression usually yields better overall performance
      • The CPU overhead of compressing and decompressing is less than the cost of reading more data from disk
  • 22. COMPRESSION – AVAILABLE CODECS
    • Recommended codecs
    • Snappy (known as Zippy in Bigtable)
      • Released by Google under the BSD License
      • HBase 0.92 ships with the required JNI libraries to be able to use it
      • You must install the native binary library on all region servers
    • LZO (Lempel-Ziv-Oberhumer)
      • A lossless data compression algorithm that is focused on decompression speed, and written in ANSI C
      • HBase cannot ship with LZO because of licensing issues: LZO is under the incompatible GNU General Public License (GPL)
      • LZO installation needs to be performed separately, after HBase has been installed
  • 23. COMPRESSION – COMPRESSION TEST TOOL
    • Use the command
        hbase org.apache.hadoop.hbase.util.CompressionTest <path> <none|gz|lzo|snappy>
    • Example
        ./bin/hbase org.apache.hadoop.hbase.util.CompressionTest /user/larsgeorge/test.gz gz
    • It reports the result of the test
      • On success
          …
          SUCCESS
      • On failure
          Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
          …
  • 24. COMPRESSION – STARTUP CHECK
    • A fast-failing setup notices missing libraries at startup
      • Instead of running into issues later
    • For example, check the Snappy and LZO compression libraries
        <property>
          <name>hbase.regionserver.codecs</name>
          <value>snappy,lzo</value>
        </property>
    • If a codec is missing, the server will abort at startup with an IOException stating "Compression codec <codec-name> not supported, aborting RS construction"
    • Copy the changed configuration file to all region servers and restart them afterward
  • 25. COMPRESSION – ENABLING COMPRESSION
    • Install the JNI libraries
    • Install the native compression libraries
    • Specify the chosen algorithm in the column family schema
      • In the HBase shell
          create 'testtable', { NAME => 'colfam1', COMPRESSION => 'GZ' }
      • In the API
          HColumnDescriptor.setCompressionType(…)
          Refer to ppt#003, p#11
  • 26. OPTIMIZING SPLITS AND COMPACTIONS – SPLIT/COMPACTION STORMS
    • When your regions grow at roughly the same rate
      • Eventually they all need to be split at about the same time
      • This causes a large spike in disk I/O because of the compactions required to rewrite the split regions
      • Refer to ppt#004, p#13
  • 27. OPTIMIZING SPLITS AND COMPACTIONS – MANAGED SPLITTING (1/2)
    • You can turn automatic splitting off and manually invoke the split and major_compact commands
    • Set the region maximum file size to a very high number
      • hbase.hregion.max.filesize property for the entire cluster
      • At the table level through the API, HTableDescriptor.setMaxFileSize(…)
        • Refer to ppt#003, p#7
      • It is better to set this value to a reasonable upper boundary
        • Such as 100 GB
        • Long.MAX_VALUE is not recommended, in case the manual splits fail to run
    • Then you can control when splits and compactions run
      • Run them staggered across all regions
      • This spreads the I/O load as much as possible, avoiding any split/compaction storm
      • Use the HBase shell + cron
      • Or write your own code against the HBase Admin API
        • Refer to ppt#003, p#21
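Running splits or major compactions "staggered across all regions" amounts to giving each region its own start offset inside a maintenance window. The sketch below only computes such offsets; the region names are hypothetical, and actually triggering major_compact (through the shell, cron, or the Admin API) is left out.

```python
def staggered_schedule(regions, window_seconds):
    """Assign each region a start offset so the work is spread evenly over the window."""
    if not regions:
        return []
    step = window_seconds / len(regions)
    return [(region, int(i * step)) for i, region in enumerate(regions)]


# Four hypothetical regions spread over a one-hour window.
schedule = staggered_schedule(["region-a", "region-b", "region-c", "region-d"], 3600)
```

Each tuple holds a region and the number of seconds after the window opens at which its compaction would start, so no two regions begin at the same moment.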
  • 28. OPTIMIZING SPLITS AND COMPACTIONS – MANAGED SPLITTING (2/2)
    • RegionSplitter class (added in version 0.90.2)
      • Another way to split existing regions
      • Rolling split feature
        • Splits the existing regions while waiting long enough for the involved compactions to complete
      • API docs
    • An additional advantage of managed splitting
      • You have better control over which regions are available at any time
      • In rare cases, you need to do very low-level debugging
      • With automated splits this is hard to debug, because the region in question may have been split into two daughter regions
  • 29. OPTIMIZING SPLITS AND COMPACTIONS – REGION HOTSPOTTING
    • You may be dealing with a write pattern that causes a specific region to run hot
      • Use the region server metrics to observe this
      • Refer to ppt#005, p#12
    • Key design approaches
      • Salted keys, random keys, etc.
      • Refer to ppt#004, p#52
    • The only other way to alleviate this situation
      • Manually split a hot region into one or more new regions, at exact boundaries
      • You can specify any row key within the specific region
      • This lets you generate halves that are completely different in size
      • Refer to ppt#003, p#21
    • This cannot deal with completely sequential key ranges
      • Those are always going to hit one region for a considerable amount of time
  • 30. OPTIMIZING SPLITS AND COMPACTIONS – PRESPLITTING REGIONS (1/3)
    • Managing splits manually is useful
      • Therefore, start with a larger number of regions right from table creation
      • This means creating a table with the required number of regions
    • Three ways
      • HBase shell create, refer to ppt#003, p#37
      • API HBaseAdmin.createTable(…), refer to ppt#003, p#16
      • RegionSplitter class
          /bin/hbase org.apache.hadoop.hbase.util.RegionSplitter
          usage: RegionSplitter <TABLE>
        • By default, it uses the MD5StringSplit class to partition the row keys into ranges
        • Use -D split.algorithm=<your-algorithm-class> for another implementation
  • 31. OPTIMIZING SPLITS AND COMPACTIONS – PRESPLITTING REGIONS (2/3)
    • RegionSplitter with MD5StringSplit sample
        testtable,,1309766006467.c0937d09f1da31f2a6c2950537a61093.
        testtable,0ccccccc,1309766006467.83a0a6a949a6150c5680f39695450d8a.
        testtable,19999998,1309766006467.1eba79c27eb9d5c2f89c3571f0d87a92.
        testtable,26666664,1309766006467.7882cd50eb22652849491c08a6180258.
        testtable,33333330,1309766006467.cef2853e36bd250c1b9324bac03e4bc9.
        testtable,3ffffffc,1309766006467.00365940761359fee14d41db6a73ffc5.
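The start keys in this sample are evenly spaced hexadecimal values. The sketch below reproduces them under two assumptions inferred from the sample itself rather than from the RegionSplitter source: the key space ends at 0x7FFFFFFF, and the table was presplit into 10 regions. The function name is hypothetical.

```python
def even_hex_split_keys(num_regions, max_key=0x7FFFFFFF):
    """Return the start keys (8 hex digits) that divide [0, max_key] into equal ranges."""
    step = max_key // num_regions
    # The first region starts at the empty key, so only num_regions - 1 keys are needed.
    return [format(i * step, "08x") for i in range(1, num_regions)]


split_keys = even_hex_split_keys(10)
```

The first five generated keys match the region names listed on the slide, which suggests (but does not prove) that MD5StringSplit divides the range uniformly.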
  • 32. OPTIMIZING SPLITS AND COMPACTIONS – PRESPLITTING REGIONS (3/3)
    • How many presplit regions?
      • Start low, with 10 presplit regions per server, and watch as data grows over time
      • It is better to err on the side of too few regions and use a rolling split later
    • If the presplit regions are too thin
      • Increase the hbase.hregion.majorcompaction property
      • Refer to ppt#004, p#19
    • If the data size grows too large
      • Use the RegionSplitter utility to perform a rolling split of all regions
    • The main objective is to avoid a split/compaction storm
  • 33. LOAD BALANCING – BALANCER (1/3)
    • The master has a built-in feature called the balancer
      • By default, it runs every five minutes
      • hbase.balancer.period property
    • It attempts to equal out the number of assigned regions per region server
      • So that each server is within one region of the average number per server
    • It determines a new assignment plan
      • The plan describes which regions should be moved where
      • It then starts the process of moving the regions by calling the unassign() method
      • Refer to ppt#003, p#22
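The balancer's goal, every server "within one region of the average", can be expressed as a small check. This is an interpretation made for illustration, not the balancer's real code: it treats a cluster as balanced when every server holds between the floor and the ceiling of the average. The server names are hypothetical.

```python
import math


def is_balanced(regions_per_server):
    """Return True if every server's region count is within one region of the average."""
    if not regions_per_server:
        return True
    total = sum(regions_per_server.values())
    average = total / len(regions_per_server)
    low, high = math.floor(average), math.ceil(average)
    return all(low <= count <= high for count in regions_per_server.values())


even_cluster = is_balanced({"rs1": 10, "rs2": 11, "rs3": 10})
skewed_cluster = is_balanced({"rs1": 5, "rs2": 16, "rs3": 10})
```

When the check fails, the balancer would have to compute an assignment plan that moves regions from the overloaded servers to the underloaded ones.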
  • 34. LOAD BALANCING – BALANCER (2/3)
    • The balancer has an upper limit on how long it is allowed to run
      • hbase.balancer.max.balancing property
      • Defaults to half of the balancer period value, that is, 2.5 minutes
    • The balancer switch
      • Toggles the balancer status between enabled and disabled
      • HBase shell balance_switch command, refer to ppt#003, p#39
      • balanceSwitch() API method, refer to ppt#003, p#22
  • 35. LOAD BALANCING – BALANCER (3/3)
    • The balancer can be explicitly started
      • HBase shell balancer command, refer to ppt#003, p#39
      • balancer() API method, refer to ppt#003, p#22
    • Returns true
      • Some work has been done
    • Returns false
      • The balancer was switched off
      • There was no work to be done
      • The balancer was not able to run
        • If a region is currently in transition, the balancer run is skipped
  • 36. LOAD BALANCING – MOVE
    • You can also use move to assign regions to other servers
      • HBase shell move command, refer to ppt#003, p#39
      • move() API method, refer to ppt#003, p#22
  • 37. MERGING REGIONS
    • Sometimes you may need to merge regions
      • For example, after you have removed a large amount of data and you want to reduce the number of regions hosted by each server
    • HBase allows you to merge two adjacent regions
      • The HBase cluster must be offline, but HDFS must still be running
        /bin/hbase org.apache.hadoop.hbase.util.Merge
        Usage: bin/hbase merge <table-name> <region-1> <region-2>
  • 38. CLIENT API: BEST PRACTICES (1/3)
    • Disable auto-flush
      • When performing a lot of put operations
      • Refer to ppt#002, p#9
    • Use scanner caching
      • Use the Scan.setCaching() method to set it to something greater than the default of 1 if needed
      • Refer to ppt#002, p#26
    • Limit the scan scope
      • If only a small number of the available columns are to be processed, only those should be specified in the input scan
      • For example, use the Scan.addFamily() method
      • Refer to ppt#002, p#24
  • 39. CLIENT API: BEST PRACTICES (2/3)
    • Close ResultScanners
      • This avoids performance problems
      • Leaving them open may cause problems on the region servers
      • Refer to ppt#002, p#25
    • Block cache usage
      • Scan instances can be set to use the block cache in the region server via the setCacheBlocks() method
      • It is true by default, in which case the default settings of the table and family are used
      • API docs
      • Server-side block cache settings
      • Refer to ppt#003, p#12
  • 40. CLIENT API: BEST PRACTICES (3/3)
    • Optimal loading of row keys
      • When performing a table scan where only the row keys are needed
      • Use a FilterList with a MUST_PASS_ALL operator, combining FirstKeyOnlyFilter and KeyOnlyFilter
      • Refer to ppt#002, p#43 & 46
    • Turn off the WAL on Puts
      • One way to increase throughput on Puts is to call writeToWAL(false), but there might be data loss
      • Consider using the bulk loading techniques instead
  • 41. CONFIGURATION (1/6)
    • Advanced options you can consider adjusting based on your use case
      • Most properties are configured in hbase-site.xml
      • Others are configured in another file
    • Decrease the ZooKeeper timeout
      • The default timeout between a region server and the ZooKeeper quorum is three minutes
      • Tune the timeout down to a minute, or even less, so the master notices failures sooner
      • zookeeper.session.timeout property
      • Be careful of the "Juliet Pause"
  • 42. CONFIGURATION (2/6)
    • Increase handlers
      • The number of threads that are kept open to answer incoming requests to user tables
      • The default is 10
      • hbase.regionserver.handler.count property
      • Keep this number low when the payload per request approaches megabytes
      • And high when the payload is small
    • Increase heap settings
      • HBASE_HEAPSIZE setting in the configuration file
      • Consider using HBASE_REGIONSERVER_OPTS instead of changing the global HBASE_HEAPSIZE
      • Region servers may need more memory than the master
  • 43. CONFIGURATION (3/6)
    • Enable data compression
      • You should enable compression for the storage files
      • In most cases, this boosts performance
    • Increase region size
      • Consider going to larger regions to cut down on the total number of regions on your cluster
      • Fewer regions to manage makes for a smoother-running cluster
  • 44. CONFIGURATION (4/6)
    • Adjust block cache size
      • The amount of heap used for the block cache is specified as a percentage
      • Defaults to 20%
      • hfile.block.cache.size property
      • Increasing it is good if you have mainly read workloads
    • Adjust memstore limits
      • Memstore heap usage
        • One property defaults to 40%
        • Another property defaults to 35%
      • They control the amount of flushing that will take place once the server is required to free heap space
      • Mainly read-oriented workloads
        • Consider reducing both limits to make more room for the block cache
      • Handling many writes
        • Increase the memstore limits to reduce the excessive amount of I/O this causes
  • 45. CONFIGURATION (5/6)
    • Increase blocking store files
      • The region servers block further updates from clients to give compactions time to reduce the number of files
      • Default is seven files
      • hbase.hstore.blockingStoreFiles property
    • Increase block multiplier
      • A safety latch that blocks any further updates from clients when the memstores exceed the multiplier * flush size limit
      • hbase.hregion.memstore.block.multiplier property
      • Defaults to 2
      • If you have enough memory, you can increase this value to handle spikes more gracefully
      • Refer to ppt#003, p#8
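The block multiplier rule is a single comparison: updates are blocked once the memstore exceeds multiplier * flush size. A minimal sketch of that rule follows; the 64 MB flush size used in the example is an assumed value chosen for illustration, not one stated in the deck.

```python
MB = 1024 * 1024


def updates_blocked(memstore_bytes, flush_size_bytes, block_multiplier=2):
    """Return True when the memstore exceeds the multiplier * flush size limit."""
    return memstore_bytes > block_multiplier * flush_size_bytes


flush_size = 64 * MB  # assumed flush size for this example

below_limit = updates_blocked(100 * MB, flush_size)                   # 100 MB vs. 128 MB limit
above_limit = updates_blocked(130 * MB, flush_size)                   # 130 MB vs. 128 MB limit
raised_limit = updates_blocked(130 * MB, flush_size, block_multiplier=4)  # 130 MB vs. 256 MB limit
```

Raising the multiplier moves the blocking point further out, which is why the slide ties it to having enough memory to absorb write spikes.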
  • 46. CONFIGURATION (6/6)
    • Decrease maximum logfiles
      • Controls how often flushes occur, based on the number of WAL files on disk
      • Default is 32
      • hbase.regionserver.maxlogs property
      • The default can be high in a write-heavy use case
      • Lower it to force the servers to flush data to disk more often
  • 47. LOAD TESTS
    • It is advisable to run performance tests to verify the functionality of your cluster
    • These tests give you a baseline that you can refer to
      • After making changes to the configuration of the cluster
      • Or to the schemas of your tables
    • Doing a burn-in of your cluster
      • Shows you how much you can gain from it
      • But this does not replace a test with the load expected from your use case
  • 48. LOAD TESTS – PERFORMANCE EVALUATION (1/2)
    • HBase ships with its own tool to execute a performance evaluation
      • Performance Evaluation (PE)
      • Wiki
        org.apache.hadoop.hbase.PerformanceEvaluation
        Usage: java org.apache.hadoop.hbase.PerformanceEvaluation [--miniCluster] [--nomapred] [--rows=ROWS] <command> <nclients>
  • 49. LOAD TESTS – PERFORMANCE EVALUATION (2/2)
    • Example
        /bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
        11/07/03 13:18:34 INFO hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at offset 0 for 1048576 rows
        ...
        11/07/03 13:18:41 INFO hbase.PerformanceEvaluation: 0/104857/1048576
        ...
        11/07/03 13:18:45 INFO hbase.PerformanceEvaluation: 0/209714/1048576
        ...
        11/07/03 13:20:03 INFO hbase.PerformanceEvaluation: 0/1048570/1048576
        11/07/03 13:20:03 INFO hbase.PerformanceEvaluation: Finished class org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest in 89062ms at offset 0 for 1048576 rows
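The final line of the example output carries everything needed for a baseline number: the elapsed time and the row count. The sketch below extracts both and derives the throughput. The regular expression is written against the line shown on this slide only; other PE versions may format the line differently.

```python
import re

# Matches the tail of the "Finished class ..." line shown in the example output.
FINISHED = re.compile(r"in (\d+)ms at offset \d+ for (\d+) rows")


def rows_per_second(log_line):
    """Return the write throughput from a PE 'Finished' line, or None if it does not match."""
    match = FINISHED.search(log_line)
    if match is None:
        return None
    elapsed_ms, rows = int(match.group(1)), int(match.group(2))
    return rows / (elapsed_ms / 1000.0)


line = (
    "11/07/03 13:20:03 INFO hbase.PerformanceEvaluation: Finished class "
    "org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest "
    "in 89062ms at offset 0 for 1048576 rows"
)
rate = rows_per_second(line)
```

For the run above this works out to a little under 11,800 rows per second, a figure that can be compared against later runs after configuration or schema changes.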
  • 50. LOAD TESTS – YCSB (1/2)
    • Yahoo! Cloud Serving Benchmark (YCSB)
      • A suite of tools that can be used to run comparable workloads against different storage systems
      • Also a reasonable tool for performing an HBase cluster burn-in or performance test
    • Using YCSB is preferred over the HBase-supplied Performance Evaluation
      • Offers more options
      • Can combine read and write workloads
    • Home page
  • 51. LOAD TESTS – YCSB (2/2)
    • Use the HBase shell
        create 'usertable', 'family'
    • git pull
    • cd ${GIT_HOME}/hbase-training/006/ycsb
    • Run the command
        java -cp "${HBASE_CONF_DIR}:core-0.1.4.jar:hbase-binding-0.1.4.jar" -load -db -Pworkloads/workloada -p columnfamily=family -p recordcount=1000 -s > ycsb-load.log
    • Then you can see the performance metrics in the ycsb-load.log file
  • 52. CLUSTER ADMINISTRATION
    • Operational Tasks
      • Node Decommission
      • Rolling Restarts
      • Adding Backup Master
      • Adding a RegionServer
    • Data Task
      • Export
      • Import
      • CopyTable Tool
      • Bulk Import
    • Troubleshooting
      • HBase Fsck
      • Analyzing the Logs
  • 53. OPERATIONAL TASKS – NODE DECOMMISSION (1/2)
    • Use the following script
      • In the normal HBase distribution
          ${HBASE_HOME}/bin/ stop regionserver
      • In the tm distribution
          ${TM_PUPPET_HOME}/bin/services/ [<host> ...]
    • Disable the load balancer before decommissioning a node
      • In the HBase shell
          balance_switch false
    • Regions could be offline for a good period of time
      • When there are many regions on the server
      • All regions are closed first
      • The master then notices the region server's ZooKeeper znode being removed
  • 54. OPERATIONAL TASKS – NODE DECOMMISSION (2/2)
    • Stop a region server gradually
      • Lets a node gradually shed its load and then shut itself down
      • Available from HBase 0.90.2
      • ${HBASE_HOME}/bin/
    • Example
        ${HBASE_HOME}/bin/ HOSTNAME
      • Check the HOSTNAME on your HBase master UI
      • Refer to ppt#003, p#41
      • An IP address is NOT supported at present
  • 55. OPERATIONAL TASKS – ROLLING RESTARTS
    • Also use
    • Steps as follows
      1. Ensure the cluster is consistent
           hbase hbck
         Fix it if inconsistent
           hbase hbck -fix
      2. Restart the master
           ${HBASE_HOME}/bin/ stop master; ${HBASE_HOME}/bin/ start master
      3. Disable the region balancer
           echo "balance_switch false" | ${HBASE_HOME}/bin/hbase shell
      4. Run the script per region server
           for i in `cat conf/regionservers|sort`; do ./bin/ --restart --reload --debug $i; done &> /tmp/log.txt &
      5. Restart the master again
         This clears out the dead servers list and reenables the balancer
      6. Run hbck to ensure the cluster is consistent
  • 56. OPERATIONAL TASKS – ADDING BACKUP MASTER (1/2)
    • To prevent a single point of failure
      • If the machine currently hosting the active master fails, the system can fall back to a backup master
    • Underlying operations
      1. There is a dedicated ZooKeeper znode, /hbase/master
      2. All master processes race to create it, and the first one to create it wins (it becomes the current master)
         This happens at startup
      3. All other master processes simply loop around the znode check and wait for it to disappear
         Its disappearance triggers the race again
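The race for the master znode can be modelled in a few lines. The sketch below is a toy, single-threaded simulation: FakeZooKeeper is an in-memory stand-in invented for this example and not a ZooKeeper client, and the master names are hypothetical. It only shows the "first to create wins, the rest wait for the znode to disappear" rule.

```python
MASTER_ZNODE = "/hbase/master"


class FakeZooKeeper:
    """In-memory stand-in offering only the two operations the election needs."""

    def __init__(self):
        self.znodes = {}

    def create(self, path, owner):
        """Create the znode for owner; fail if it already exists."""
        if path in self.znodes:
            return False
        self.znodes[path] = owner
        return True

    def delete(self, path):
        """Remove the znode, as happens when the active master goes away."""
        self.znodes.pop(path, None)


def elect(zk, candidates):
    """Let every candidate try to create the master znode; return the one that succeeded."""
    winner = None
    for name in candidates:
        if zk.create(MASTER_ZNODE, name):
            winner = name
    return winner


zk = FakeZooKeeper()
first_master = elect(zk, ["master-a", "master-b"])   # both race at startup
zk.delete(MASTER_ZNODE)                              # the active master fails
second_master = elect(zk, ["master-b"])              # the backup wins the next race
```

In a real cluster the candidates run concurrently and ZooKeeper guarantees that exactly one create succeeds; the loop here merely imitates that outcome in a fixed order.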
  • 57. OPERATIONAL TASKS – ADDING BACKUP MASTER (2/2)
    • How to start multiple backup master processes
      • Use the original way to start a master process
          ${HBASE_HOME}/bin/ start master
      • In the tm distribution
          ${TM_PUPPET_HOME}/bin/services/ [<host> ...]
      • Specifically start a backup master process
          ${HBASE_HOME}/bin/ start master --backup
  • 58. OPERATIONAL TASKS – ADDING A REGION SERVER
    • In the normal HBase distribution
      • Edit ${HBASE_HOME}/conf/regionservers
        • Add the new region server's host name
      • Two scripts can be used
        • ${HBASE_HOME}/bin/
          • It skips the region servers that already exist and starts the newly added region server listed in the regionservers file
        • ${HBASE_HOME}/bin/ start regionserver
          • Must be executed on the newly added region server
    • In the tm distribution
      • A new feature, not covered here
  • 59. DATA TASK
    • You may be required to move the data as a whole or in parts
      • To archive data for backup purposes
      • To bootstrap another cluster
        hadoop jar ${HBASE_HOME}/hbase-0.91.0-SNAPSHOT.jar
        An example program must be given as the first argument.
        Valid program names are:
        …
        completebulkload: Complete a bulk data load.
        copytable: Export a table from local cluster to peer cluster
        export: Write table data to HDFS.
        import: Import data written by Export.
        importtsv: Import data in TSV format.
        …
  • 60. DATA TASK – EXPORT (1/3)
        hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export
        Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]
  • 61. DATA TASK – EXPORT (2/3)
        hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export testtable /user/larsgeorge/backup-testtable
        11/06/25 15:58:29 INFO mapred.JobClient: Running job: job_201106251558_0001
        11/06/25 15:58:30 INFO mapred.JobClient: map 0% reduce 0%
        …
        11/06/25 15:59:40 INFO mapred.JobClient: map 100% reduce 0%
        11/06/25 15:59:42 INFO mapred.JobClient: Job complete: job_201106251558_0001
        11/06/25 15:59:42 INFO mapred.JobClient: Counters: 6
        11/06/25 15:59:42 INFO mapred.JobClient: Job Counters
        11/06/25 15:59:42 INFO mapred.JobClient: Rack-local map tasks=32
        11/06/25 15:59:42 INFO mapred.JobClient: Launched map tasks=32
        11/06/25 15:59:42 INFO mapred.JobClient: FileSystemCounters
        11/06/25 15:59:42 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=3648
        11/06/25 15:59:42 INFO mapred.JobClient: Map-Reduce Framework
        11/06/25 15:59:42 INFO mapred.JobClient: Map input records=0
        11/06/25 15:59:42 INFO mapred.JobClient: Spilled Records=0
        11/06/25 15:59:42 INFO mapred.JobClient: Map output records=0
  • 62. DATA TASK – EXPORT (3/3)
        hadoop dfs -lsr /user/larsgeorge/backup-testtable
        drwxr-xr-x - ... 0 2011-06-25 15:58 _logs
        -rw-r--r-- 1 ... 114 2011-06-25 15:58 part-m-00000
        -rw-r--r-- 1 ... 114 2011-06-25 15:58 part-m-00001
        …
        -rw-r--r-- 1 ... 114 2011-06-25 15:59 part-m-00030
        -rw-r--r-- 1 ... 114 2011-06-25 15:59 part-m-00031
    • Each part-m-nnnnn file contains a piece of the exported data
      • Together they form the full backup of the table
    • Use the hadoop distcp command to move the directory from one cluster to another, and perform the import there
  • 63. DATA TASK – IMPORT (1/2)
        hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar import
        Usage: Import <tablename> <inputdir>

        hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar import testtable /user/larsgeorge/backup-testtable
        11/06/25 17:09:48 INFO mapreduce.TableOutputFormat: Created table instance for testtable
        11/06/25 17:09:48 INFO input.FileInputFormat: Total input paths to process : 32
        11/06/25 17:09:49 INFO mapred.JobClient: Running job: job_201106251558_0003
        11/06/25 17:09:50 INFO mapred.JobClient: map 0% reduce 0%
        11/06/25 17:10:04 INFO mapred.JobClient: map 6% reduce 0%
        …
        11/06/25 17:10:51 INFO mapred.JobClient: Job Counters
        11/06/25 17:10:51 INFO mapred.JobClient: Launched map tasks=32
        11/06/25 17:10:51 INFO mapred.JobClient: Data-local map tasks=32
        11/06/25 17:10:51 INFO mapred.JobClient: FileSystemCounters
        11/06/25 17:10:51 INFO mapred.JobClient: HDFS_BYTES_READ=3648
        11/06/25 17:10:51 INFO mapred.JobClient: Map-Reduce Framework
        11/06/25 17:10:51 INFO mapred.JobClient: Map input records=0
        11/06/25 17:10:51 INFO mapred.JobClient: Spilled Records=0
        11/06/25 17:10:51 INFO mapred.JobClient: Map output records=0
  • 64. DATA TASK – IMPORT (2/2)
    • Use the Import job to store the data in a different table
      • With the same schema
    • Both the export and import commands are per-table only
    • Use the hadoop distcp command to copy the entire /hbase directory in HDFS
      • Not recommended
      • It may copy store files that are halfway through a memstore flush operation
  • 65. DATA TASK – COPYTABLE TOOL (1/2)
    • Designed to bootstrap cluster replication
      • Makes a copy of an existing table from the master cluster to the slave cluster
        hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar copytable
        Usage: CopyTable [--rs.class=CLASS] [--rs.impl=IMPL] [--starttime=X] [--endtime=Y] [] [--peer.adr=ADR] <tablename>
  • 66. DATA TASK – COPYTABLE TOOL (2/2)
    • The copy of the table is stored on the same cluster
        hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar copytable testtable
        11/06/26 15:20:07 INFO mapreduce.TableOutputFormat: Created table instance for testtable3
        11/06/26 15:20:07 INFO mapred.JobClient: Running job: job_201106261454_0003
        11/06/26 15:20:08 INFO mapred.JobClient: map 0% reduce 0%
        11/06/26 15:20:19 INFO mapred.JobClient: map 6% reduce 0%
        …
        11/06/26 15:21:04 INFO mapred.JobClient: map 100% reduce 0%
        11/06/26 15:21:06 INFO mapred.JobClient: Job complete: job_201106261454_0003
        11/06/26 15:21:06 INFO mapred.JobClient: Counters: 5
        11/06/26 15:21:06 INFO mapred.JobClient: Job Counters
        11/06/26 15:21:06 INFO mapred.JobClient: Launched map tasks=32
        11/06/26 15:21:06 INFO mapred.JobClient: Data-local map tasks=32
        11/06/26 15:21:06 INFO mapred.JobClient: Map-Reduce Framework
        11/06/26 15:21:06 INFO mapred.JobClient: Map input records=0
        11/06/26 15:21:06 INFO mapred.JobClient: Spilled Records=0
        11/06/26 15:21:06 INFO mapred.JobClient: Map output records=0
  • 67. DATA TASK – BULK IMPORT (1/2)
    • importtsv tool
      • Takes files containing data in tab-separated value (TSV) format
      • By default, it uses the HBase put() API to insert data into HBase one row at a time
      • With the importtsv.bulk.output option set, it generates files using HFileOutputFormat
      • These can subsequently be bulk-loaded into HBase by the completebulkload tool
        hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar importtsv
        Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>
  • 68. DATA TASK – BULK IMPORT (2/2)
    • completebulkload tool
      • Used to import the data into the running cluster
      • After a data import has been prepared
        • By using the importtsv tool with the importtsv.bulk.output option
        • Or by some other MapReduce job using the HFileOutputFormat
        hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar completebulkload -conf ~/my-hbase-site.xml /user/larsgeorge/myoutput mytable
  • 69. TROUBLESHOOTING – HBASE FSCK (1/4)
    • Shell command
        ${HBASE_HOME}/bin/hbase hbck
    • Once started
      • Scans the .META. table to gather all the pertinent information it holds
      • Scans the HDFS root directory HBase is configured to use
      • Compares the collected details to report on inconsistencies and integrity issues
    • Consistency check
      • Whether the region is listed in .META. and exists in HDFS
      • And is also assigned to exactly one region server
    • Integrity check
      • Compares the regions with the table details to find missing regions
      • Or those that have holes or overlaps in their row key ranges
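The integrity check's notion of "holes or overlaps in the row key ranges" can be made concrete with a short sketch. It is an illustration of the idea only, not hbck's implementation. Regions are modelled as (start_key, end_key) tuples, and the sketch assumes that only the first region has an empty start key and only the last region has an empty end key.

```python
def find_holes_and_overlaps(regions):
    """Compare each region's end key with the next region's start key."""
    ordered = sorted(regions, key=lambda region: region[0])
    holes, overlaps = [], []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if prev_end < next_start:
            holes.append((prev_end, next_start))      # no region covers this key range
        elif prev_end > next_start:
            overlaps.append((next_start, prev_end))   # two regions cover this key range
    return holes, overlaps


# Hypothetical table: keys between "d" and "e" are uncovered, "f" to "g" is covered twice.
regions = [("", "b"), ("b", "d"), ("e", "g"), ("f", "")]
holes, overlaps = find_holes_and_overlaps(regions)
```

A healthy table returns two empty lists, because every region starts exactly where the previous one ends.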
  • 70. TROUBLESHOOTING – HBASE FSCK (2/4)
        ${HBASE_HOME}/bin/hbase hbck -h
        Usage: fsck [opts]
        where [opts] are:
          -details Display full report of all regions.
          -timelag {timeInSeconds} Process only regions that have not experienced any metadata updates in the last {{timeInSeconds} seconds.
          -fix Try to fix some of the errors.
          -sleepBeforeRerun {timeInSeconds} Sleep this many seconds before checking if the fix worked if run with -fix
          -summary Print only summary of the tables and status.
  • 71. TROUBLESHOOTING – HBASE FSCK (3/4)
    • Running with no option at all produces the normal level of output detail
        ${HBASE_HOME}/bin/hbase hbck
        Number of Tables: 40
        Number of live region servers: 19
        Number of dead region servers: 0
        Number of empty REGIONINFO_QUALIFIER rows in .META.: 0
        Summary:
        ...
        testtable2 is okay.
        Number of regions: 1
        Deployed on: inconsistencies detected.
        Status: OK
  • 72. TROUBLESHOOTING – HBASE FSCK (4/4)
    • ${HBASE_HOME}/bin/hbase hbck -fix
    • Repairs the following issues
      • Assigns .META. to a single new server if it is unassigned
      • Reassigns .META. to a single new server if it is assigned to multiple servers
      • Assigns a user table region to a new server if it is unassigned
      • Reassigns a user table region to a single new server if it is assigned to multiple servers
      • Reassigns a user table region to a new server if the current server does not match what the .META. table refers to
    • hbck may report inconsistencies that are temporary, or transitional only
      • Rerun the tool a few times to confirm a permanent problem
  • 73. TROUBLESHOOTING – ANALYZING THE LOGS (1/2)
    • Logfile locations per server type (default logfile, then tm settings)
    • HBase Master
        Default: $HBASE_HOME/logs/hbase-<user>-master-<hostname>.log
        tm: /var/log/hbase/hbase-<user>-master-<hostname>.log
    • HBase RegionServer
        Default: $HBASE_HOME/logs/hbase-<user>-regionserver-<hostname>.log
        tm: /var/log/hbase/hbase-<user>-regionserver-<hostname>.log
    • ZooKeeper
        Default: console log output only
        tm: /var/log/hbase/hbase-<user>-zookeeper-<hostname>.log
    • NameNode
        Default: $HADOOP_HOME/logs/hadoop-<user>-namenode-<hostname>.log
        tm: /var/log/hadoop/hadoop-<user>-namenode-<hostname>.log
    • DataNode
        Default: $HADOOP_HOME/logs/hadoop-<user>-datanode-<hostname>.log
        tm: /var/log/hadoop/hadoop-<user>-datanode-<hostname>.log
    • JobTracker
        Default: $HADOOP_HOME/logs/hadoop-<user>-jobtracker-<hostname>.log
        tm: /var/log/hadoop/hadoop-<user>-jobtracker-<hostname>.log
    • TaskTracker
        Default: $HADOOP_HOME/logs/hadoop-<user>-tasktracker-<hostname>.log
        tm: /var/log/hadoop/hadoop-<user>-tasktracker-<hostname>.log
  • 74. TROUBLESHOOTING – ANALYZING THE LOGS (2/2)
    • It is useful to begin with the master logfile
      • The master acts as the coordinator service of the entire cluster
    • Find where the processes began logging ERROR level messages
      • This lets you identify the root cause
      • A lot of subsequent messages are often side effects of the original problem
    • It is recommended to use the error log event metric under the System Event Metrics group
      • It gives you a graph showing where the server(s) started logging an increasing number of error messages in the logfiles
    • If you find an error message
      • Google it!
      • Use the online resources to search for the message in the public mailing lists
      • Search Hadoop
  • 75. HANDS-ON – USE YCSB
    • New VM list
      • Because VMs are not affordable at present :p
    • ${YOUR_HOME}=${GIT_HOME}/hbase-training/006/hands-on/${YOUR_NAME}
      • mkdir ${YOUR_HOME}
      • cd ${YOUR_HOME}; cp -rf ../../ycsb/* .
    • Use the HBase shell
        create '<YOUR_NAMED_TABLE>', 'family'
    • Run YCSB with a record count of 5000
      • And output the ycsb-load.log file
    • Hands-on result
      • Put the ycsb-load.log file under ${YOUR_HOME}