1. HBase Intro.
Anty.Rao
July 13, 2012
Big Data Engineering Team
Hanborq Inc.
2. Outline
• What is HBase
• Data Model
• Physical Structures
• HBase Architecture
• Q/A
3. Apache HBase
HBase is an open-source, distributed, sorted map modeled after Google's BigTable.
4. Why HBase
• HDFS
– Files in HDFS are immutable; in-place updates are not supported
• HBase = HDFS + random read/write
• HBase uses HDFS for storage
• Log-Structured Merge tree (LSM tree)
– Similar to log-structured file systems
– Same storage pattern as Cassandra
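To make the log-structured idea concrete, here is a toy Python sketch, not HBase code. Writes land in an in-memory buffer (HBase calls it the MemStore), which is flushed to immutable sorted runs standing in for files in HDFS. Reads check the newest data first, and compaction merges the runs. The class name TinyLSM and the flush_threshold parameter are hypothetical.

```python
import bisect


class TinyLSM:
    """Toy log-structured merge tree (illustrative only, not HBase code)."""

    def __init__(self, flush_threshold=3):
        self.memstore = {}        # mutable in-memory write buffer
        self.runs = []            # immutable sorted runs, oldest first
        self.flush_threshold = flush_threshold  # hypothetical knob

    @staticmethod
    def _to_run(mapping):
        # A run is a pair of parallel tuples (sorted keys, values).
        items = sorted(mapping.items())
        return tuple(k for k, _ in items), tuple(v for _, v in items)

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffer out as a new sorted run that is never modified,
        # standing in for an immutable file in HDFS.
        if not self.memstore:
            return
        self.runs.append(self._to_run(self.memstore))
        self.memstore = {}

    def get(self, key):
        if key in self.memstore:                   # newest data first
            return self.memstore[key]
        for keys, values in reversed(self.runs):   # then newest run to oldest
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def compact(self):
        # Merge all runs into one; entries from later runs override earlier ones.
        merged = {}
        for keys, values in self.runs:
            merged.update(zip(keys, values))
        self.runs = [self._to_run(merged)] if merged else []
```

An update never rewrites an existing run: a newer value for the same key simply shadows the older one until compaction discards it. This is how random writes are layered on top of write-once files.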
5. Data Model
• Rows in a table are sorted by row key
• A table schema defines only its column families
– Each family consists of any number of columns
– Each column can hold any number of versions
– Columns exist only once a value is inserted, so NULLs cost nothing
– Columns within a family are sorted and stored together
• Everything except table names is a byte[]
• (Row, Family:Column, Timestamp) → Value
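The data model above can be pictured as a nested sorted map. The following toy Python sketch, not HBase's API, stores (row, family:column, timestamp) → value with bytes for every key and value. The schema lists only column families, absent columns take no space, and cells iterate in row order, then column order, then newest timestamp first. The class ToyTable and its methods are hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """(row, b"family:column", timestamp) -> value; illustrative only."""

    def __init__(self, families):
        self.families = set(families)   # the schema declares only column families
        # row -> column -> {timestamp: value}; columns appear only when written
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, ts, value):
        family = column.split(b":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        self.rows[row][column][ts] = value

    def get(self, row, column, ts=None):
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None            # never inserted: nothing is stored for a NULL
        if ts is None:
            ts = max(versions)     # default to the newest version
        return versions.get(ts)

    def cells(self):
        """Yield cells sorted by row, then column, then timestamp (newest first)."""
        for row in sorted(self.rows):
            for column in sorted(self.rows[row]):
                for ts in sorted(self.rows[row][column], reverse=True):
                    yield (row, column, ts), self.rows[row][column][ts]
```

For example, after two puts to b"r1", b"info:name" at timestamps 1 and 2, get without a timestamp returns the value written at timestamp 2.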
6. Operations
• Operations are based on row keys
• Operations:
– Put
– Get
– Scan
– Delete: just writes a tombstone marker
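A toy Python sketch of the four operations over a store kept sorted by row key may help show why Delete is cheap. It writes a tombstone instead of removing anything, reads skip tombstoned rows, and the rows are physically dropped only at major compaction. The class ToyStore and the TOMBSTONE sentinel are hypothetical; HBase records deletes as special marker cells rather than a Python object.

```python
import bisect

TOMBSTONE = object()   # hypothetical delete marker


class ToyStore:
    def __init__(self):
        self.keys = []     # row keys, kept sorted
        self.values = {}   # row key -> value or TOMBSTONE

    def put(self, row, value):
        if row not in self.values:
            bisect.insort(self.keys, row)
        self.values[row] = value

    def delete(self, row):
        # The row key is not removed; it is just marked with a tombstone.
        self.put(row, TOMBSTONE)

    def get(self, row):
        value = self.values.get(row)
        return None if value is TOMBSTONE else value

    def scan(self, start, stop):
        """Yield (row, value) for start <= row < stop in key order, hiding tombstones."""
        i = bisect.bisect_left(self.keys, start)
        while i < len(self.keys) and self.keys[i] < stop:
            row = self.keys[i]
            if self.values[row] is not TOMBSTONE:
                yield row, self.values[row]
            i += 1

    def major_compact(self):
        # Tombstoned rows are physically dropped only at compaction time.
        self.keys = [k for k in self.keys if self.values[k] is not TOMBSTONE]
        self.values = {k: self.values[k] for k in self.keys}
```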
8. How data is physically stored
HFile (see http://www.slideshare.net/schubertzhang/hfile-a-blockindexed-file-format-to-store-sorted-keyvalue-pairs)
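As the linked title says, an HFile is a block-indexed file of sorted key/value pairs. The toy Python sketch below, not the HFile format itself, shows the lookup idea: split the sorted pairs into blocks, index each block by its first key, binary-search the index, then search only one block. Real HFile blocks are sized in bytes; grouping by entry count and the function names build_blocks and lookup are simplifying assumptions.

```python
import bisect


def build_blocks(sorted_items, block_size):
    """Split sorted (key, value) pairs into blocks and index each block by its first key."""
    blocks = [sorted_items[i:i + block_size]
              for i in range(0, len(sorted_items), block_size)]
    index = [block[0][0] for block in blocks]
    return blocks, index


def lookup(blocks, index, key):
    # Find the last block whose first key <= key, then search inside that block only.
    i = bisect.bisect_right(index, key) - 1
    if i < 0:
        return None
    keys = [k for k, _ in blocks[i]]
    j = bisect.bisect_left(keys, key)
    if j < len(keys) and keys[j] == key:
        return blocks[i][j][1]
    return None
```

Only the small index needs to stay in memory; a point read then touches a single block.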
9. Data Organization : Region
• Region: unit of distribution and availability
• Regions are split when they grow too large
• Max region size is a tuning parameter
– Too low: prevents parallel scalability
– Too high: makes things slow
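A region covers a contiguous, sorted range of row keys. The toy Python sketch below splits a region into two daughters at its middle row key once its size passes a maximum. In HBase the corresponding setting is hbase.hregion.max.filesize. The Region class, the byte-size estimate, and the middle-key choice here are simplifying assumptions.

```python
class Region:
    """A sorted key range [start, end); None means unbounded. Illustrative only."""

    def __init__(self, start, end, rows=None):
        self.start, self.end = start, end
        self.rows = dict(rows or {})

    def size(self):
        # Rough size estimate: bytes of all row keys and values.
        return sum(len(k) + len(v) for k, v in self.rows.items())


def maybe_split(region, max_size):
    """Return [region] if small enough, else two daughters split at the middle key."""
    if region.size() <= max_size or len(region.rows) < 2:
        return [region]
    keys = sorted(region.rows)
    mid = keys[len(keys) // 2]
    left = Region(region.start, mid, {k: region.rows[k] for k in keys if k < mid})
    right = Region(mid, region.end, {k: region.rows[k] for k in keys if k >= mid})
    return [left, right]
```

The daughters together cover exactly the parent's range, so each can be served by a different region server.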
14. Master
• Master duties
– Bootstrapping: bulk initial assignment of regions
– Load balancing
– Splitting the WAL (write-ahead log) and assigning regions
– Bringing the regions of a crashed region server back online
• What the Master does not do
– Does not handle any write requests (it is not a DB master)
– Does not handle region-location lookups
– Is not involved in the read/write path
– Even if the master(s) are down, the cluster can still serve read/write requests
– Generally does very little most of the time
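The load-balancing duty can be illustrated with a toy count-based balancer in Python. It repeatedly moves one region from the server holding the most regions to the one holding the fewest until the counts differ by at most one. This is an assumption for illustration, not HBase's balancer; the function name balance is hypothetical. Note that it only changes assignments and never touches row data, which fits the Master staying out of the read/write path.

```python
def balance(assignment):
    """Even out region counts across servers.

    assignment: dict mapping server name -> list of region names (modified in place).
    Returns a list of (region, from_server, to_server) moves.
    """
    moves = []
    if not assignment:
        return moves
    while True:
        busiest = max(assignment, key=lambda s: len(assignment[s]))
        idlest = min(assignment, key=lambda s: len(assignment[s]))
        if len(assignment[busiest]) - len(assignment[idlest]) <= 1:
            return moves
        region = assignment[busiest].pop()
        assignment[idlest].append(region)
        moves.append((region, busiest, idlest))
```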
15. Master is stateless
• All data and state information is stored in HDFS and ZooKeeper
• The Master is not a single point of failure (SPOF)!
17. HBase Client
• Caches write requests (client-side write buffer)
• Looks up the region server location when writing and reading
– First locate the -ROOT- region
– Then the .META. region
– Then the user region
• Makes an RPC call to the region server
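The client's lookup chain can be sketched in Python as two catalog levels plus a location cache. -ROOT- says which server holds the relevant .META. region, .META. says which server holds the user region, and later requests for a key in an already-located region skip both lookups. This is a toy model: real catalog rows are keyed by table name and region start key, and the Catalog and ToyClient classes are hypothetical. Invalidating stale cache entries after a region moves is omitted.

```python
import bisect


class Catalog:
    """One catalog level: sorted region start keys -> hosting server."""

    def __init__(self, entries):
        entries = sorted(entries)
        self.starts = [start for start, _ in entries]
        self.servers = [server for _, server in entries]

    def locate(self, key):
        # The region holding key is the one with the greatest start key <= key.
        i = bisect.bisect_right(self.starts, key) - 1
        end = self.starts[i + 1] if i + 1 < len(self.starts) else None
        return self.starts[i], end, self.servers[i]


class ToyClient:
    def __init__(self, root, metas):
        self.root = root      # -ROOT-: row key -> server hosting the .META. region
        self.metas = metas    # server -> .META. region: row key -> user-region server
        self.cache = []       # cached (start, end, server) user-region locations
        self.catalog_lookups = 0

    def region_server_for(self, row):
        for start, end, server in self.cache:
            if start <= row and (end is None or row < end):
                return server                                  # cache hit
        _, _, meta_server = self.root.locate(row)              # step 1: -ROOT-
        start, end, server = self.metas[meta_server].locate(row)  # step 2: .META.
        self.catalog_lookups += 1
        self.cache.append((start, end, server))
        return server                                          # step 3: RPC target
```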