A First Look at the HBase Source Code
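A study of the HBase source code: a code-level analysis of the flush, compact, and split processes, plus an introduction to some tools. This deck is only a first draft and will be improved over time.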

Published in: Education, Technology
Transcript of "A First Look at the HBase Source Code"

  1. A First Look at the HBase Source Code
     丹臣/赵林
     2011-6-28
  2. A simple test
     Required jar files:
       commons-logging-1.1.1.jar
       hadoop-core-0.20.203.0.jar
       hbase-0.90.3.jar
       log4j-1.2.16.jar
       zookeeper-3.3.2.jar
     Compile command:
       javac -classpath D:\hbase-0.90.3\danchen\commons-logging-1.1.1.jar;D:\hbase-0.90.3\danchen\hadoop-core-0.20.203.0.jar;D:\hbase-0.90.3\danchen\hbase-0.90.3.jar;D:\hbase-0.90.3\log4j-1.2.16.jar;D:\hbase-0.90.3\zookeeper-3.3.2.jar HBaseTestCase.java
  3. Test data
     10000 rows data are insert. take 37 s.
     50000 rows data are insert. take 172 s.
     100000 rows data are insert. take 351 s.
     On average, about 270 records were written per second, and each record took about 3.51 ms to write.
     Test environment: two physical machines and two virtual machines.
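
The averages on the test-data slide can be rechecked from the three reported timings. The sketch below is not part of the original test program; the class and method names are hypothetical. Working through the numbers gives roughly 270.3, 290.7, and 284.9 rows per second for the three runs, so the quoted 270 rows per second matches the 10,000-row run, while 3.51 ms per row matches the 100,000-row run.

```java
// Recomputes throughput and per-row latency from the timings reported on the slide.
// Hypothetical helper, not taken from HBaseTestCase.java.
final class InsertThroughput {
    /** Rows written per second for one run. */
    static double rowsPerSecond(long rows, long seconds) {
        return (double) rows / seconds;
    }

    /** Average time per row, in milliseconds, for one run. */
    static double millisPerRow(long rows, long seconds) {
        return seconds * 1000.0 / rows;
    }

    public static void main(String[] args) {
        // {rows inserted, elapsed seconds} as reported on the slide
        long[][] runs = { {10_000, 37}, {50_000, 172}, {100_000, 351} };
        for (long[] run : runs) {
            System.out.printf("%d rows in %d s: %.1f rows/s, %.2f ms/row%n",
                    run[0], run[1],
                    rowsPerSecond(run[0], run[1]),
                    millisPerRow(run[0], run[1]));
        }
    }
}
```
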
  4. HMaster source code entry point
     Source: org/apache/hadoop/hbase/master/HMaster.java
  5. How HMaster handles DDL operations
  6. Creating a table in HBase
     Number of Versions
     The number of row versions to store is configured per column family via HColumnDescriptor. The default is 3. This is an important parameter because, as described in the Data Model section (Chapter 9), HBase does not overwrite row values, but rather stores different values per row by time (and qualifier). Excess versions are removed during major compactions. The number of versions may need to be increased or decreased depending on application needs.
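
The version behaviour described on this slide can be illustrated with a toy model. This is plain Java, not HBase code: `VersionedCell` and its methods are hypothetical names, and the model only mimics the two facts stated above, namely that a write with a new timestamp adds a version instead of overwriting, and that excess versions disappear at major compaction.

```java
import java.util.Collections;
import java.util.TreeMap;

// Toy model of one cell's version history. NOT HBase code; names are illustrative.
final class VersionedCell {
    // Keyed by timestamp, newest first.
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    /** A write with a new timestamp adds a version; older versions stay in place. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Keeps only the newest maxVersions entries, as a major compaction would. */
    void majorCompact(int maxVersions) {
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry in reverse order = oldest timestamp
        }
    }

    int versionCount() {
        return versions.size();
    }

    /** Value with the newest timestamp, or null if the cell is empty. */
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }
}
```

With five writes and `majorCompact(3)`, the model keeps the three newest values. Until that call, all five versions are still held, which matches the statement that excess versions are removed during major compactions rather than at write time.
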
  7. Creating a table in HBase
     TTL (time to live): particularly suitable for applications that need to purge data older than a certain age.
     IN_MEMORY => true or false: changes how the region server's block cache is allocated and used. The exact difference will be added once the relevant code has been studied.
  8. A DDL anomaly
     Dropping a table did not actually complete, even though success was returned to the client. The HBase master stayed stuck in the following loop, repeatedly logging "Waiting on region to clear regions in transition". Restarting the master resolved the problem.
  9. HRegionServer source code entry point
     Source: org/apache/hadoop/hbase/regionserver/HRegionServer.java
  10. Starting and stopping HRegionServer
  11. The three major operations
     Flush
     Compact
     Split
     [Diagram: write path and background chores. Labels: Write begin, Write WAL Log, Compact, Flush MemStore, Flush WAL log, Need Compact, Need Split, Update MemStore, flush queue, compact queue, Need Flush, Split]
  12. flush
     What does the flush process look like?
     A common misunderstanding of the hbase.hregion.memstore.flush.size parameter
  13. hbase.hregion.memstore.flush.size
     The description of this parameter is somewhat misleading. It reads like a size limit on a single memstore, but the check is actually made against the combined size of all memstores in a region. See the code highlighted in blue.
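
The point about hbase.hregion.memstore.flush.size can be made concrete with a simplified model. This is a sketch under stated assumptions, not the HBase source: `RegionModel`, `write`, and `totalMemstoreBytes` are hypothetical names, and the only behaviour modelled is that the threshold is compared against the sum of all memstores in the region.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the flush check. Names are illustrative, not HBase source.
final class RegionModel {
    // Stands in for hbase.hregion.memstore.flush.size.
    private final long flushSizeBytes;
    // One memstore per column family, tracked here only by its size in bytes.
    private final Map<String, Long> memstoreBytes = new HashMap<>();

    RegionModel(long flushSizeBytes) {
        this.flushSizeBytes = flushSizeBytes;
    }

    /** Records a write and reports whether the region should be queued for flush. */
    boolean write(String family, long bytes) {
        memstoreBytes.merge(family, bytes, Long::sum);
        // The comparison uses the region-wide total, not the size of one memstore.
        return totalMemstoreBytes() > flushSizeBytes;
    }

    long totalMemstoreBytes() {
        long total = 0;
        for (long size : memstoreBytes.values()) {
            total += size;
        }
        return total;
    }
}
```

With a threshold of 100 bytes, writing 60 bytes to one family and then 60 bytes to another requests a flush on the second write, although neither memstore has reached 100 bytes on its own. A table with many column families can therefore flush well before any single memstore approaches the configured size.
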
  14. Flush code path
     Adding the region to the flush queue
  15. Flush code path
     Checks whether the region is a meta region and whether the number of files exceeds the limit, so that a compaction can be started later.
  16. Flush code path
  17. flush
     Source file:
     org/apache/hadoop/hbase/regionserver/HRegion.java
  18. flush
     The function that actually flushes the memstore is internalFlushcache.
     org/apache/hadoop/hbase/regionserver/HRegion.java
  19. compact
     org/apache/hadoop/hbase/regionserver/HRegion.java
     org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
  20. split
     Source: org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
     The split process is partly transactional and can be rolled back to some extent.
  21. Ideas for improving HLog
  22. Logs in HBase
     A powerful tool for studying HBase: learn to read and analyze the various logs.
     Master log
     Region server log
     ZooKeeper log
     For each operation (DDL, flush, compact, split, etc.), read the corresponding logs to understand how the roles (region server, ZooKeeper, master) interact.
  23. HBase tools
     RegionSplitter
     How to export and import a table
  24. RegionSplitter
     The RegionSplitter class provides several utilities to help in the administration lifecycle for developers who choose to manually split regions instead of having HBase handle that automatically. The most useful utilities are:
     Create a table with a specified number of pre-split regions
     Execute a rolling split of all regions on an existing table
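
To show what "a specified number of pre-split regions" means in practice, the sketch below computes evenly spaced region boundaries. It is an illustration only and does not reproduce RegionSplitter's own algorithm: the class `HexSplitKeys`, the method `splitKeys`, and the 8-digit hexadecimal key space are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: evenly spaced split points over an assumed 8-hex-digit key space.
// This is not RegionSplitter's implementation.
final class HexSplitKeys {
    /** Returns regionCount - 1 boundary keys that divide the key space into equal parts. */
    static List<String> splitKeys(int regionCount) {
        if (regionCount < 1) {
            throw new IllegalArgumentException("regionCount must be >= 1");
        }
        long keySpace = 1L << 32; // assumed key range: 00000000 .. ffffffff
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < regionCount; i++) {
            // Boundary i sits at i/regionCount of the way through the key space.
            keys.add(String.format("%08x", keySpace * i / regionCount));
        }
        return keys;
    }
}
```

Four regions need three boundaries, so `splitKeys(4)` yields `40000000`, `80000000`, and `c0000000`. Pre-splitting like this only balances load if row keys are spread roughly uniformly over the key space.
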
  25. RegionSplitter
  26. RegionSplitter
  27. How to export and import a table
     Map-reduce job
  28. Q/A
     Thanks