A First Look at the HBase Source Code
丹臣 / 赵林
2011-6-28
A simple test
Required jar files:
    commons-logging-1.1.1.jar
    hadoop-core-0.20.203.0.jar
    hbase-0.90.3.jar
    log4j-1.2.16.jar
    zookeeper-3.3.2.jar
Compile command:
javac -classpath D:\hbase-0.90.3\danchen\commons-logging-1.1.1.jar;D:\hbase-0.90.3\danchen\hadoop-core-0.20.203.0.jar;D:\hbase-0.90.3\danchen\hbase-0.90.3.jar;D:\hbase-0.90.3\log4j-1.2.16.jar;D:\hbase-0.90.3\zookeeper-3.3.2.jar HBaseTestCase.java
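Test results
10,000 rows inserted in 37 s.
50,000 rows inserted in 172 s.
100,000 rows inserted in 351 s.
Average write rate: about 270 rows per second in the 10,000-row run (about 285 rows per second in the 100,000-row run). Average time per row: 3.51 ms in the 100,000-row run.
Test environment: two physical machines and two virtual machines.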
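The averages can be rechecked from the three measured runs. The following is a minimal Java sketch of that arithmetic only; the class and method names are illustrative and are not taken from HBaseTestCase.java.

```java
// Recomputes throughput and per-row latency from the measured runs.
// Class and method names are illustrative; they are not from HBaseTestCase.java.
class InsertStats {
    // Rows written per second for one run.
    static double rowsPerSecond(long rows, long seconds) {
        return (double) rows / seconds;
    }

    // Average time per row, in milliseconds, for one run.
    static double millisPerRow(long rows, long seconds) {
        return seconds * 1000.0 / rows;
    }

    public static void main(String[] args) {
        // {rows, elapsed seconds} as reported in the test results.
        long[][] runs = { {10000, 37}, {50000, 172}, {100000, 351} };
        for (long[] run : runs) {
            System.out.printf("%d rows in %d s: %.1f rows/s, %.2f ms/row%n",
                    run[0], run[1],
                    rowsPerSecond(run[0], run[1]),
                    millisPerRow(run[0], run[1]));
        }
    }
}
```

The rate stays in the same range across the three runs, which suggests the per-row cost did not grow with table size at this scale.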
HMaster source entry point
Source: \org\apache\hadoop\hbase\master\HMaster.java
HMaster handles DDL operations
HBase table creation: Number of Versions
The number of row versions to store is configured per column family via HColumnDescriptor. The default is 3. This is an important parameter because, as described in Chapter 9, Data Model section, HBase does not overwrite row values, but rather stores different values per row by time (and qualifier). Excess versions are removed during major compactions. The number of versions may need to be increased or decreased depending on application needs.
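The behaviour described here (writes add timestamped values, and excess versions disappear only at major compaction) can be pictured with a toy model. The sketch below is not HBase code; the class VersionedCell and its methods are hypothetical and only mimic the description for a single cell.

```java
import java.util.Collections;
import java.util.TreeMap;

// Toy model of one cell (row + qualifier) that keeps values by timestamp.
// This is NOT HBase code; it only mimics the behaviour described in the text.
class VersionedCell {
    private final int maxVersions;
    // Sorted with the newest timestamp first.
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Collections.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // A write with a new timestamp adds a version instead of overwriting.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Reads see the newest value.
    String getLatest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    int storedVersions() {
        return versions.size();
    }

    // Stand-in for a major compaction: drop everything beyond maxVersions.
    void majorCompact() {
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = oldest timestamp
        }
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3); // 3 is the default quoted in the text
        for (long ts = 1; ts <= 5; ts++) {
            cell.put(ts, "v" + ts);
        }
        System.out.println("before compaction: " + cell.storedVersions());
        cell.majorCompact();
        System.out.println("after compaction: " + cell.storedVersions()
                + ", latest = " + cell.getLatest());
    }
}
```

In this model the extra versions remain stored until the compaction step runs, which is why a low version count does not immediately reduce storage.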
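HBase table creation: TTL and IN_MEMORY
TTL (time to live): especially suitable for applications whose data must be cleaned up after a certain period.
IN_MEMORY => true or false: changes how the region server block cache is allocated and used. The exact differences will be added after the relevant code has been studied.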
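The idea behind TTL can be sketched as a simple age check on each stored value. This is a toy illustration, not HBase code: the class TtlFilter is hypothetical, the TTL is assumed to be given in seconds, and treating a value as expired only when its age strictly exceeds the TTL is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy TTL check, NOT HBase code.
class TtlFilter {
    // Assumption of this sketch: a value expires when its age is strictly
    // greater than the TTL.
    static boolean isExpired(long cellTimestampMs, long nowMs, int ttlSeconds) {
        return nowMs - cellTimestampMs > ttlSeconds * 1000L;
    }

    // Keeps only the timestamps that are still within the TTL.
    static List<Long> liveTimestamps(List<Long> timestampsMs, long nowMs, int ttlSeconds) {
        List<Long> live = new ArrayList<>();
        for (long ts : timestampsMs) {
            if (!isExpired(ts, nowMs, ttlSeconds)) {
                live.add(ts);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        List<Long> written = new ArrayList<>();
        written.add(10_000L); // 90 s old
        written.add(95_000L); // 5 s old
        // With a 60-second TTL only the second value is still live.
        System.out.println(liveTimestamps(written, now, 60));
    }
}
```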
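A DDL anomaly
Dropping a table never actually completes, even though success is returned to the client. The HBase master stays stuck in a loop and keeps writing this log message:
Waiting on region to clear regions in transition.
Restarting the master resolves the problem.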
HRegionServer source entry point
Source: \org\apache\hadoop\hbase\regionserver\HRegionServer.java
HRegionServer startup and shutdown
The three major operations: Flush, Compact, Split
[Diagram: write path and background chores. Node labels: Write begin, Write WAL Log, Update MemStore, Need Flush, flush queue, Flush MemStore, Flush WAL log, Need Compact, compact queue, Compact, Need Split, Split.]
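flush
What does the flush process look like?
A common misunderstanding of the hbase.hregion.memstore.flush.size parameter
The description of hbase.hregion.memstore.flush.size is somewhat misleading. It reads like a size limit on a single memstore, but the check is actually made against the sum of the sizes of all memstores in a region (see the code highlighted in blue on the slide).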
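The difference between the two readings of the parameter is easiest to see with numbers. The sketch below is a toy model, not HBase code: the class RegionMemstores, its methods, and the 64 MB and 40 MB figures are assumptions chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model, NOT HBase code: one region with one memstore per column family.
class RegionMemstores {
    // Stands in for hbase.hregion.memstore.flush.size.
    private final long flushSizeBytes;
    private final Map<String, Long> memstoreBytes = new LinkedHashMap<>();

    RegionMemstores(long flushSizeBytes) {
        this.flushSizeBytes = flushSizeBytes;
    }

    // Records a write of `bytes` into the memstore of `family`.
    void write(String family, long bytes) {
        memstoreBytes.merge(family, bytes, Long::sum);
    }

    // Sum of all memstore sizes in the region.
    long totalBytes() {
        long total = 0;
        for (long size : memstoreBytes.values()) {
            total += size;
        }
        return total;
    }

    // The reading described in the text: compare the region-wide sum.
    boolean needsFlushRegionWide() {
        return totalBytes() > flushSizeBytes;
    }

    // The misreading: compare each memstore on its own.
    boolean needsFlushPerMemstore() {
        for (long size : memstoreBytes.values()) {
            if (size > flushSizeBytes) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        RegionMemstores region = new RegionMemstores(64 * mb); // example threshold
        region.write("cf1", 40 * mb);
        region.write("cf2", 40 * mb);
        System.out.println("region-wide check: " + region.needsFlushRegionWide());
        System.out.println("per-memstore check: " + region.needsFlushPerMemstore());
    }
}
```

With two column families at 40 MB each, no single memstore exceeds the threshold, yet the region-wide sum does. A table with many column families therefore reaches the flush condition with much smaller individual memstores than the parameter name suggests.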
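Flush code path: the region is added to the flush queue
Flush code path: check whether the region is a meta region and whether the number of store files exceeds the limit, so that a compaction can be started afterwards
Flush code path
flush
Source file: \org\apache\hadoop\hbase\regionserver\HRegion.java
flush
The function that actually flushes the memstore is internalFlushcache, in \org\apache\hadoop\hbase\regionserver\HRegion.java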
compact
Source: \org\apache\hadoop\hbase\regionserver\HRegion.java
compact
Source: \org\apache\hadoop\hbase\regionserver\CompactSplitThread.java
split
Source: \org\apache\hadoop\hbase\regionserver\CompactSplitThread.java
The split process is partly transactional and can be rolled back to a certain extent.
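One common way to make a multi-step operation partly transactional is to journal each completed step and undo the journal in reverse order when a later step fails. The sketch below shows that general pattern only; it is not HBase's split code, and the class JournaledOperation and the step names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Generic journal-and-rollback pattern, NOT HBase's split code.
class JournaledOperation {
    interface Step {
        void apply() throws Exception;
        void undo();
    }

    // Completed steps, most recent on top.
    private final Deque<Step> journal = new ArrayDeque<>();

    // Runs steps in order; on failure, undoes the completed steps newest-first.
    boolean run(List<Step> steps) {
        for (Step step : steps) {
            try {
                step.apply();
                journal.push(step);
            } catch (Exception e) {
                rollback();
                return false;
            }
        }
        return true;
    }

    private void rollback() {
        while (!journal.isEmpty()) {
            journal.pop().undo();
        }
    }

    // Helper for the demo: a step that records what happened in `log`.
    static Step step(String name, boolean fails, List<String> log) {
        return new Step() {
            public void apply() throws Exception {
                if (fails) {
                    throw new Exception(name + " failed");
                }
                log.add("do " + name);
            }

            public void undo() {
                log.add("undo " + name);
            }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Step> steps = new ArrayList<>();
        steps.add(step("stepA", false, log)); // hypothetical step names
        steps.add(step("stepB", false, log));
        steps.add(step("stepC", true, log));  // this one fails
        boolean ok = new JournaledOperation().run(steps);
        System.out.println(ok + " " + log);
    }
}
```

A pattern like this can only undo steps that have an inverse, which matches the observation that rollback is possible only to a certain extent.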
Ideas for improving HLog
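Logs in HBase
Logs are a powerful tool for studying HBase. Learn to read and analyze each of them:
Master log
Region server log
ZooKeeper log
For each operation (DDL, flush, compact, split, etc.), read the corresponding logs to understand how the roles (region server, ZooKeeper, master) interact.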
HBase tools
RegionSplitter
How to export and import a table
RegionSplitter
The RegionSplitter class provides several utilities to help in the administration lifecycle for developers who choose to manually split regions instead of having HBase handle that automatically. The most useful utilities are:
Create a table with a specified number of pre-split regions
Execute a rolling split of all regions on an existing table
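Pre-splitting a table comes down to choosing the region boundary keys in advance. The sketch below shows one simple way to compute evenly spaced boundaries. It is not the RegionSplitter implementation: the class PreSplit is hypothetical, and it assumes row keys are 8-digit lowercase hex strings spread uniformly over 00000000 to ffffffff.

```java
import java.util.ArrayList;
import java.util.List;

// Toy split-key calculation, NOT the RegionSplitter implementation.
// Assumes row keys are 8-digit lowercase hex strings, uniformly distributed.
class PreSplit {
    // Returns numRegions - 1 boundary keys that divide the key space
    // 00000000..ffffffff into numRegions nearly equal ranges.
    static List<String> splitKeys(int numRegions) {
        if (numRegions < 1) {
            throw new IllegalArgumentException("numRegions must be >= 1");
        }
        long keySpace = 1L << 32;
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            long boundary = keySpace * i / numRegions;
            keys.add(String.format("%08x", boundary));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Four regions need three boundary keys.
        System.out.println(splitKeys(4));
    }
}
```

Evenly spaced boundaries only balance load when the row keys themselves are evenly distributed, for example when they are hashes.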
How to export and import a table
MapReduce job
Q&A
Thanks