HBase introduction

Published in: Technology, Education

  1. HBase Introduction @yangwm
  2. What is HBase
     • HBase: an open-source, distributed, versioned, column-oriented store, implemented in Java and modeled on Bigtable.
     • Hadoop: a distributed system for large-scale storage and parallel computing.
     • HDFS: a distributed file system that provides high-throughput access to application data.
     • ZooKeeper: a high-performance coordination service for distributed applications.
  3. Why HBase is needed
     • Big data: billions of rows × millions of columns.
     • Scalability: linear scalability across hundreds or thousands of machines.
     • Read/write performance:
       put: writes go to the MemStore (merged into data files later) and to the WAL (appends instead of random writes);
       get and scan: served with help from the block cache and Bloom filters.
     • Failure handling: http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
     • Schema: loosely structured {key, value} data.
  4. How HBase works
     • (Table, RowKey, Family, Column, Timestamp) → Value
     • An HBase table is a three-dimensional sorted map.
     • Each family consists of any number of columns.
     • Each column consists of any number of versions.
     • Sort order: row (ascending), column (ascending), timestamp (descending).
  5. HMaster
     • Handles assignment, load balancing, and splitting.
     • Dispatches Regions to RegionServers and manages RegionServer assignment.
     • Not part of the read/write path.
     • Highly available through ZooKeeper and standby masters.
  6. HRegionServer
     • A StoreFile is stored in HDFS as an HFile.
     • Storage hierarchy:
       Table (HBase table)
         Region (Regions for the table)
           Store (one Store per ColumnFamily for each Region of the table)
             MemStore (one MemStore for each Store)
             StoreFile (StoreFiles for each Store)
               Block (Blocks within a StoreFile)
  7. MemStore & HLog
     • Data is written to the MemStore (in-memory cache) and the HLog (log) first.
     • Data is later flushed from the cache to files, and those files are merged afterwards.
     • The HLog is used for recovery.
  8. ZooKeeper
     • Tree-structured index:
       ZooKeeper file: keeps the pointer to the -ROOT- region.
       -ROOT- (stores the index): positions of the .META. regions.
       .META. (stores table info): positions of each region on each region server.
     • Stores the HBase schema: table info and column family info.
     • Fully cached in RAM.
     • Monitors whether each RegionServer is alive.
  9. HClient (gateway to HBase)
     • Caches region positions.
     • Read: batch loading, scan caching, scan attribute selection (column family or column).
     • Write: AutoFlush, turning off the WAL on Puts.
     • HBase client pool.
  10. Thank you
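
The data model from slide 4 can be illustrated with a small in-memory sketch. This is not HBase code: the class `SketchTable` and its methods are hypothetical names chosen for illustration, and Python string ordering stands in for HBase's byte-wise key ordering. It only shows the mapping (RowKey, Family, Column, Timestamp) → Value and the sort order row ascending, column ascending, timestamp descending.

```python
class SketchTable:
    """Toy model of one table: (row, family, qualifier, timestamp) -> value.

    Hypothetical illustration only; not the HBase client API.
    """

    def __init__(self):
        # (row, family, qualifier) -> {timestamp: value}
        self._cells = {}

    def put(self, row, family, qualifier, value, timestamp):
        # Each column can hold any number of versions, keyed by timestamp.
        self._cells.setdefault((row, family, qualifier), {})[timestamp] = value

    def get(self, row, family, qualifier, max_versions=1):
        # Return up to max_versions (timestamp, value) pairs, newest first.
        versions = self._cells.get((row, family, qualifier), {})
        newest_first = sorted(versions.items(), reverse=True)
        return newest_first[:max_versions]

    def scan(self):
        # Yield cells in the order: row asc, column asc, timestamp desc.
        for key in sorted(self._cells):
            row, family, qualifier = key
            for ts in sorted(self._cells[key], reverse=True):
                yield row, family, qualifier, ts, self._cells[key][ts]
```

For example, after writing two versions of the same cell, `get(..., max_versions=1)` returns only the newer one, while `scan()` lists both versions with the newer timestamp first.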

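The write path from slide 7 (log and cache first, flush to files, merge later, log used for recovery) can be sketched as follows. Everything here is a simplified assumption for illustration: `SketchStore`, `flush_threshold`, and `crash_and_recover` are hypothetical names, store files are plain sorted Python lists instead of HFiles in HDFS, and the log is simply cleared on flush, which is a simplification of how a real log would be managed.

```python
class SketchStore:
    """Toy write path: append to a log, buffer in memory, flush, merge."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # append-only log of (row, value) edits (HLog stand-in)
        self.memstore = {}     # in-memory write cache (MemStore stand-in)
        self.store_files = []  # immutable sorted "files", oldest first
        self.flush_threshold = flush_threshold

    def put(self, row, value):
        # Append to the log, then update the in-memory cache.
        self.wal.append((row, value))
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the cache out as one sorted file; flushed edits no longer
        # need the log for recovery (simplification: just clear it).
        if self.memstore:
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}
            self.wal = []

    def compact(self):
        # Merge all files into one; newer files overwrite older values.
        merged = {}
        for store_file in self.store_files:
            merged.update(store_file)
        self.store_files = [sorted(merged.items())] if merged else []

    def get(self, row):
        # Check the cache first, then files from newest to oldest.
        if row in self.memstore:
            return self.memstore[row]
        for store_file in reversed(self.store_files):
            for key, value in store_file:
                if key == row:
                    return value
        return None

    def crash_and_recover(self):
        # Simulate losing the in-memory cache, then rebuild it from the log.
        self.memstore = {}
        for row, value in self.wal:
            self.memstore[row] = value
```

The point of the design is that a put only appends to a log and updates memory, so no random disk writes are needed at write time; the cost is paid later by flushes and merges.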
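
Slides 8 and 9 describe how a client finds the region for a row key and caches region positions. The sketch below is a rough illustration under stated assumptions: the ZooKeeper → -ROOT- → .META. chain is collapsed into a single hypothetical `SketchCatalog` object, regions are identified only by their start key, and `SketchClient`, `locate`, and the server names are invented for the example.

```python
import bisect


class SketchCatalog:
    """Stand-in for the catalog lookup: region start key -> server name."""

    def __init__(self, regions):
        # regions: list of (start_key, server); the first start key is "".
        self.regions = sorted(regions)
        self.start_keys = [start for start, _ in self.regions]
        self.lookups = 0  # counts how often the catalog is consulted

    def locate(self, row_key):
        self.lookups += 1
        # The owning region is the last one whose start key <= row_key.
        idx = bisect.bisect_right(self.start_keys, row_key) - 1
        start = self.start_keys[idx]
        has_next = idx + 1 < len(self.start_keys)
        end = self.start_keys[idx + 1] if has_next else None
        return start, end, self.regions[idx][1]


class SketchClient:
    """Caches region positions so repeated lookups skip the catalog."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.cache = []  # list of (start_key, end_key, server)

    def locate(self, row_key):
        for start, end, server in self.cache:
            if start <= row_key and (end is None or row_key < end):
                return server  # cache hit
        region = self.catalog.locate(row_key)
        self.cache.append(region)
        return region[2]
```

With two regions starting at `""` and `"m"`, looking up `"apple"` and then `"banana"` consults the catalog only once, because both keys fall in the first cached region.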