Contents
• Objective
• What is HBase?
• Why HBase?
• Features of HBase
• HBase architecture (overview)
• HBase architecture (Write-Ahead Log)
• HBase architecture (HLog)
• HBase architecture (HFile)
• HBase Client
• ZooKeeper
• Master
• HBase Region Server
• HBase tables and regions
• HBase tables
• HBase examples
• HBase users
• Conclusion
Objective
• To study and understand HBase, a growing cloud computing technology and a clone of Google's Bigtable.
What is HBase?
• An open-source project.
• HBase is a Hadoop database.
• It is a distributed, large-scale data store.
• Efficient at random reads and writes.
• Initially modeled after Google's Bigtable.
Why HBase?
• Datasets are reaching petabytes.
• Need for both random access and batch processing.
• Traditional databases are expensive to scale and difficult to manage.
• Commodity hardware is cheap and powerful.
Features of HBase
• It supports unstructured and semi-structured data.
• It has built-in version management.
• Fast key-based lookups.
• It stores null values for free: a column that was never written takes no space.
HBase Client
• The HBase client is responsible for finding the RegionServers that serve the row range of interest.
• It does this by querying the -ROOT- and .META. catalog tables, starting from the -ROOT- location that ZooKeeper tracks.
• After locating the required region(s), the client contacts the RegionServer serving that region directly.
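The lookup chain described on this slide can be sketched as a toy, in-memory model. This is not the HBase client API: the function `locate_region`, the `"root-location"` key, and the dictionary layout of `cluster` are all hypothetical names chosen for illustration, and the client-side cache is an assumption about how repeated lookups are avoided.

```python
import bisect

def locate_region(row, zookeeper, cluster, cache):
    """Return the server for `row` (toy model, not the HBase API).

    On a cache miss the lookup walks ZooKeeper -> -ROOT- -> .META.
    """
    # 1. Check regions this client has already located.
    for (start, end), server in cache.items():
        if start <= row and (end == b"" or row < end):
            return server
    # 2. ZooKeeper says which server hosts -ROOT- (hypothetical key name).
    root_server = zookeeper["root-location"]
    # 3. -ROOT- says which server hosts .META.
    meta_server = cluster[root_server]["-ROOT-"][".META."]
    # 4. .META. lists every region as (start_key, end_key, server), sorted.
    meta = cluster[meta_server][".META."]
    starts = [entry[0] for entry in meta]
    start, end, server = meta[bisect.bisect_right(starts, row) - 1]
    cache[(start, end)] = server
    return server
```

After the first call for a region, later rows in the same key range are answered from `cache` without touching the catalog again, which is why the client can talk to the RegionServer directly.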
ZooKeeper
• ZooKeeper serves as a distributed coordination service.
• It bootstraps and coordinates the cluster.
• It manages Master election and tracks server availability.
• ZooKeeper tracks the location of the -ROOT- catalog table; -ROOT- and .META. themselves are stored as HBase tables.
• -ROOT- keeps track of where the .META. table is.
• The .META. table keeps a list of all regions in the system with their corresponding RegionServer assignments.
Master
• The Master server is responsible for monitoring all RegionServer instances in the cluster, and is the interface for all metadata changes.
• If the active Master shuts down, the remaining Masters compete in ZooKeeper to take over the Master role.
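The takeover described above can be illustrated with a small simulation. It assumes the active Master is marked by a single ZooKeeper entry that vanishes when its owner's session ends, and that the first standby to claim the free entry wins. `ToyZooKeeper` and `elect` are hypothetical names; this is a sketch of the idea, not ZooKeeper's or HBase's actual API.

```python
class ToyZooKeeper:
    """Holds one 'active master' slot (illustrative stand-in only)."""

    def __init__(self):
        self.master_znode = None  # name of the current active Master, if any

    def try_become_master(self, name):
        # Only one candidate can claim the slot while it is free.
        if self.master_znode is None:
            self.master_znode = name
            return True
        return False

    def session_closed(self, name):
        # Assumption: the slot is released when its owner's session ends.
        if self.master_znode == name:
            self.master_znode = None


def elect(zk, candidates):
    """Let each candidate try in turn; return whoever is active afterwards."""
    for name in candidates:
        if zk.try_become_master(name):
            return name
    return zk.master_znode
```

In this model the order of `candidates` stands in for whichever standby happens to reach ZooKeeper first after the active Master goes away.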
HBase Region Server
• It is responsible for serving and managing regions.
• It exposes both data-oriented and region-maintenance interfaces:
• Data: get, put, delete, next, etc.
• Region: splitRegion, compactRegion, etc.
HBase Tables and Regions
• An HBase table is made up of roughly equal-sized regions.
• Each region may live on a different node and is made up of several HDFS files and blocks, each of which is replicated by Hadoop.
• A region is specified by its startKey and endKey.
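A region's key range can be modeled in a few lines. The sketch below assumes the usual convention that startKey is inclusive, endKey is exclusive, and an empty key means "unbounded" at that end of the table; the `Region` class and its methods are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start_key: bytes  # inclusive; b"" means "from the first row"
    end_key: bytes    # exclusive; b"" means "to the last row"

    def contains(self, row: bytes) -> bool:
        # Byte strings compare lexicographically, like HBase row keys.
        return self.start_key <= row and (self.end_key == b"" or row < self.end_key)

    def split(self, split_key: bytes):
        """Cut this region in two at split_key (goes to the right half)."""
        if not self.contains(split_key) or split_key == self.start_key:
            raise ValueError("split key must fall strictly inside the region")
        return Region(self.start_key, split_key), Region(split_key, self.end_key)
```

For example, splitting `Region(b"", b"")` at `b"m"` gives one region for rows before `b"m"` and one for rows from `b"m"` onward, and each half could then be served from a different node.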
HBase Tables
• Tables are sorted by row key in lexicographical order.
• The table schema only defines its column families:
i) Each family consists of any number of columns.
ii) Each column consists of any number of versions.
iii) Columns only exist when inserted; NULLs are free.
iv) Columns within a family are sorted and stored together.
v) Everything except table names is stored as byte arrays.
(Table, Row, Family:Column, Timestamp) -> Value
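The mapping (Table, Row, Family:Column, Timestamp) -> Value can be pictured as a nested map. The following is a toy, in-memory sketch of that data model, not the HBase API: `ToyTable`, its methods, and the `max_versions` default of 3 are assumptions made for illustration.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for one HBase table (illustrative only)."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)     # the schema: column families only
        self.max_versions = max_versions  # assumed per-cell version limit
        # row key -> b"family:qualifier" -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        family = column.split(b":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        versions = self.rows[row][column]
        versions[ts] = value
        # Keep only the newest max_versions timestamps.
        for old in sorted(versions)[:-self.max_versions]:
            del versions[old]

    def get(self, row, column, ts=None):
        """Newest value at or before ts; None if the cell was never written."""
        versions = self.rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def scan(self):
        # Row keys come back in lexicographic byte order.
        return sorted(self.rows)
```

Two properties from the slide show up directly: a column that was never written simply has no entry (so a "null" costs nothing), and row `b"row10"` sorts before `b"row2"` because keys are compared byte by byte rather than numerically.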
Example
Let us take an example of a user and their friendship details.
In RDBMS:
HBase users
• Facebook
• Twitter
• Yfrog
• Adobe
• Groups at Yahoo!
• Mozilla (Socorro)
• Trend Micro
• StumbleUpon
Conclusion
• HBase is one of the most successful and fastest-growing technologies in cloud computing.
• It has opened a window for further research in many fields.
• Whenever we need scalability, the properties and flexibility of HBase can relieve us from the headaches associated with scaling an RDBMS.