HBase Intro.

         Anty.Rao
       July 13, 2012
Big Data Engineering Team
       Hanborq Inc.
Outline
•   What is HBase
•   Data Model
•   Physical Structures
•   HBase Architecture
•   Q/A

Apache HBase

HBase is an open-source, distributed, sorted map modeled after Google’s Bigtable.

Why HBase
• HDFS
  – Files in HDFS are immutable; in-place updates are not supported
• HBase = HDFS + random read/write
• HBase uses HDFS for storage
• “Log-structured merge tree” (LSM tree)
  – Similar to “log-structured file systems”
  – Same storage pattern as Cassandra
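The LSM idea above can be illustrated with a toy store. This is a minimal sketch, not HBase code: it assumes a Python list stands in for the WAL, a dict for the MemStore, and immutable sorted lists for the HFiles kept on HDFS; `ToyLSMStore` and `flush_threshold` are hypothetical names. It shows that random writes never modify an existing file: they are logged, buffered in memory, and periodically written out as a new sorted file, while reads check memory first and then files from newest to oldest.

```python
import bisect


class ToyLSMStore:
    """Hypothetical LSM-style store: WAL + MemStore + immutable sorted files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.wal = []        # append-only log of (key, value) edits
        self.memstore = {}   # recent writes, held in memory
        self.files = []      # immutable sorted [(key, value)] lists, newest last

    def put(self, key, value):
        self.wal.append((key, value))   # 1. log the edit for durability
        self.memstore[key] = value      # 2. apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the MemStore out as a new sorted file; existing files are never updated.
        self.files.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.wal = []  # edits now live in a file, so this toy simply drops the log

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        # The newest file wins, so search files from newest to oldest.
        for f in reversed(self.files):
            keys = [k for k, _ in f]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return f[i][1]
        return None
```

Real HBase flushes based on MemStore size in bytes rather than entry count, and archives WAL files instead of clearing a list; those details are simplified away here.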

Data Model
• Tables are sorted by row key
• A table schema defines only its column families
  –   Each family consists of any number of columns
  –   Each column consists of any number of versions
  –   Columns exist only once inserted; NULLs cost nothing
  –   Columns within a family are sorted and stored together
• Everything except table names is a byte[]
• (Row, Family:Column, Timestamp) → Value
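To make the (row, family:column, timestamp) → value model concrete, here is a minimal sketch assuming an in-memory dict of dicts that sorts keys at read time; `ToyTable` and its methods are hypothetical and do not mirror the HBase client API. It shows that only families are declared up front, that columns appear on first write, and that each cell can hold several timestamped versions, returned newest first.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical sparse table: row -> b"family:qualifier" -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)  # the schema declares only column families
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row: bytes, column: bytes, ts: int, value: bytes):
        family = column.split(b":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        # Columns are created on first write; absent columns take no space.
        self.rows[row][column][ts] = value

    def get(self, row: bytes, column: bytes, max_versions: int = 1):
        versions = self.rows.get(row, {}).get(column, {})
        # Newest version first.
        return [versions[ts] for ts in sorted(versions, reverse=True)][:max_versions]

    def scan(self):
        # Rows in byte-sorted row-key order, with column names sorted within each row.
        for row in sorted(self.rows):
            yield row, sorted(self.rows[row])
```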

Operations
• Operations are based on row keys
• Operations:
  – Put
  – Get
  – Scan
  – Delete
    • Just writes a tombstone marker; the data is removed later, during major compaction
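The four operations, and in particular the tombstone-style Delete, can be sketched as follows. This assumes a single in-memory, append-only list of cells with a logical clock for timestamps; `ToyRegion` and its methods are hypothetical stand-ins, not the real HBase `Put`/`Get`/`Scan`/`Delete` classes. A Delete appends a marker instead of removing anything, and reads treat the newest cell for a row as authoritative.

```python
PUT, DELETE = "Put", "Delete"


class ToyRegion:
    """Hypothetical append-only cell log supporting Put/Get/Scan/Delete by row key."""

    def __init__(self):
        self.cells = []  # (row, ts, kind, value); nothing is ever removed in place
        self.clock = 0

    def _next_ts(self):
        self.clock += 1
        return self.clock

    def put(self, row, value):
        self.cells.append((row, self._next_ts(), PUT, value))

    def delete(self, row):
        # Delete only appends a tombstone marker.
        self.cells.append((row, self._next_ts(), DELETE, None))

    def get(self, row):
        # The newest cell for the row decides: a tombstone hides older Puts.
        latest = max((c for c in self.cells if c[0] == row), key=lambda c: c[1], default=None)
        if latest is None or latest[2] == DELETE:
            return None
        return latest[3]

    def scan(self, start_row, stop_row):
        # Rows in [start_row, stop_row), in sorted order, skipping deleted rows.
        rows = sorted({c[0] for c in self.cells if start_row <= c[0] < stop_row})
        return [(r, self.get(r)) for r in rows if self.get(r) is not None]
```

The tombstone and the Puts it hides still occupy storage until a major compaction rewrites the files without them.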

How a row is physically stored
                      KeyValue

Row Key           Column Key          Timestamp   Cell
com.cnn.www       anchor:cnnsi.com    t9          CNN
com.cnn.www       anchor:my.look.ca   t8          CNN.com
com.cnn.www       contents:           t6          <html>…
com.cnn.www       contents:           t5          <html>…
com.cnn.www       contents:           t3          <html>…
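The table is listed in KeyValue order: row key ascending, then column ascending, then timestamp descending, so the newest version of a cell is encountered first. A minimal sketch of that ordering as a Python sort key, assuming timestamps are plain integers (t9 → 9) and `kv_sort_key` is a hypothetical helper; real HBase compares serialized byte[] keys with its own comparator and also breaks ties by key type (e.g. Put vs. Delete), which is omitted here.

```python
def kv_sort_key(kv):
    """Order KeyValues as row asc, column asc, timestamp desc (sketch)."""
    row, column, ts, _value = kv
    return (row, column, -ts)


cells = [
    (b"com.cnn.www", b"contents:", 3, b"<html>..."),
    (b"com.cnn.www", b"anchor:my.look.ca", 8, b"CNN.com"),
    (b"com.cnn.www", b"contents:", 6, b"<html>..."),
    (b"com.cnn.www", b"anchor:cnnsi.com", 9, b"CNN"),
    (b"com.cnn.www", b"contents:", 5, b"<html>..."),
]

# Printing the sorted cells reproduces the row order of the table above.
for row, column, ts, value in sorted(cells, key=kv_sort_key):
    print(row.decode(), column.decode(), f"t{ts}", value.decode())
```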

How data is physically stored
               HFile

http://www.slideshare.net/schubertzhang/hfile-a-blockindexed-file-format-to-store-sorted-keyvalue-pairs
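The linked deck describes HFile as a block-indexed file format for sorted key/value pairs. A rough sketch of why that makes point lookups cheap, assuming in-memory "blocks" of a fixed number of pairs and a single-level index holding each block's first key; `build_blocks` and `hfile_get` are hypothetical helpers, and real HFiles add on-disk blocks, compression, multi-level indexes and Bloom filters that are not modeled. A lookup binary-searches the index and then reads just one block.

```python
import bisect


def build_blocks(sorted_pairs, block_size=2):
    """Split sorted (key, value) pairs into blocks; the index is each block's first key."""
    blocks = [sorted_pairs[i:i + block_size] for i in range(0, len(sorted_pairs), block_size)]
    index = [block[0][0] for block in blocks]
    return index, blocks


def hfile_get(index, blocks, key):
    # Binary-search the index for the last block whose first key <= key.
    i = bisect.bisect_right(index, key) - 1
    if i < 0:
        return None  # key sorts before the first block
    # Only this one block needs to be read and searched.
    for k, v in blocks[i]:
        if k == key:
            return v
    return None
```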
Data Organization: Region
• Region: the unit of distribution and availability
• Regions are split when they grow too large
• Max region size is a tuning parameter
  – Too low: prevents parallel scalability
  – Too high: makes things slow
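A sketch of the split rule, assuming region size is counted in rows and the split point is the median row key; `Region`, `maybe_split` and `max_rows` are hypothetical. In HBase the threshold is a byte size (the `hbase.hregion.max.filesize` setting) and the split point comes from the store files, so treat this only as an illustration of how one key range becomes two daughter ranges that can then be served by different RegionServers.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical region covering row keys in [start_key, end_key); b"" end = unbounded."""
    start_key: bytes
    end_key: bytes
    rows: dict = field(default_factory=dict)


def maybe_split(region, max_rows):
    """Return [region] if small enough, else two daughters split at the median row key."""
    if len(region.rows) <= max_rows:
        return [region]
    keys = sorted(region.rows)
    mid = keys[len(keys) // 2]  # split point: the median row key
    left = Region(region.start_key, mid, {k: v for k, v in region.rows.items() if k < mid})
    right = Region(mid, region.end_key, {k: v for k, v in region.rows.items() if k >= mid})
    return [left, right]
```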

Read/Write Path

Architecture Overview

Write-Ahead-Log Flow

Three Major Components
• Master
• HRegionServer
• Client

Master
• Master duties
   –   Bootstrapping: performs the initial bulk assignment of regions
   –   Load balancing
   –   Splitting WALs and assigning regions
   –   Bringing the regions of crashed RegionServers back online
• What the Master does not do
   –   Does not handle any write requests (it is not a DB master)
   –   Does not handle region-location lookups
   –   Not involved in the read/write path
   –   Even if the Master(s) are down, the cluster can still serve read/write requests
   –   Generally does very little most of the time
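The "Splitting WAL" duty exists because each RegionServer (by default) writes one WAL shared by all the regions it hosts, so after a crash that log must be divided per region before the regions are reopened elsewhere. A minimal sketch, assuming WAL entries are `(region, sequence_id, (row, value))` tuples and that each region knows the highest sequence id it had already flushed; `split_wal` and `replay` are hypothetical names, not HBase APIs.

```python
from collections import defaultdict


def split_wal(wal_entries):
    """Group a crashed RegionServer's WAL entries by region, in sequence-id order."""
    per_region = defaultdict(list)
    for region, seq_id, edit in sorted(wal_entries, key=lambda e: e[1]):
        per_region[region].append((seq_id, edit))
    return dict(per_region)


def replay(region_edits, flushed_seq_id):
    """Re-apply only edits newer than what the region had already flushed to HFiles."""
    store = {}
    for seq_id, (row, value) in region_edits:
        if seq_id > flushed_seq_id:
            store[row] = value
    return store
```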

Master is stateless
• All data and state information is stored in HDFS and ZooKeeper
• The Master is not a SPOF (single point of failure)!

HRegionServer
•   Sends heartbeats (with load info) to the Master
•   Serves write requests
•   Serves read requests
•   Flushes MemStores to HFiles
•   Compacts store files
•   Splits regions
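Flushes keep producing new small files, so RegionServers compact them. Here is a sketch of a major compaction as a k-way merge with `heapq.merge`, assuming each file is a list of `(row, ts, value)` cells sorted by row ascending and timestamp descending, with `value=None` as a hypothetical encoding of a row tombstone; versions are tracked per row rather than per column to keep it short. Tombstones and everything they cover are dropped, and at most `max_versions` versions per row are kept.

```python
import heapq


def major_compact(files, max_versions=1):
    """Merge sorted cell files into one, dropping tombstoned and surplus versions.

    Each file is a list of (row, ts, value) sorted by row asc, ts desc;
    value None marks a row tombstone (hypothetical encoding for this sketch).
    """
    merged = heapq.merge(*files, key=lambda c: (c[0], -c[1]))
    out = []
    current_row, kept, deleted = None, 0, False
    for row, ts, value in merged:
        if row != current_row:
            current_row, kept, deleted = row, 0, False
        if deleted:
            continue  # hidden by a newer tombstone
        if value is None:
            deleted = True  # drop the tombstone and every older cell of this row
            continue
        if kept < max_versions:
            out.append((row, ts, value))
            kept += 1
    return out
```

A minor compaction merges only some of the files and must keep tombstones, because older files outside the merge may still hold cells they cover.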

HBase Client
• Caches write requests (client-side write buffer)
• Looks up the RegionServer location when writing and reading
  – First locates .ROOT.
  – Then the .META. region
  – Then the user region
• Makes RPC calls to the RegionServer
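The three-step lookup, plus the caching that makes it cheap after the first request, can be sketched as below. Assumptions: each catalog level is a sorted list of `(start_key, end_key, server)` entries keyed directly by user row key (real .META. rows are keyed by table name, start key and region id), lookups are plain function calls standing in for RPCs, and `RegionLocator` and `find_entry` are hypothetical names. In the HBase versions current when these slides were written, the client first reads the .ROOT. location from ZooKeeper.

```python
import bisect


def find_entry(entries, row):
    """Return the (start, end, server) entry whose [start, end) covers row; b"" end = unbounded."""
    starts = [e[0] for e in entries]
    i = bisect.bisect_right(starts, row) - 1
    if i >= 0:
        _start, end, _server = entries[i]
        if end == b"" or row < end:
            return entries[i]
    return None


class RegionLocator:
    """Hypothetical client-side locator: .ROOT. -> .META. -> user region, with a cache."""

    def __init__(self, root_entries, meta_tables):
        self.root_entries = root_entries  # sorted entries pointing at .META. regions
        self.meta_tables = meta_tables    # .META. server -> sorted user-region entries
        self.cache = []                   # user-region entries learned so far, kept sorted
        self.catalog_lookups = 0

    def locate(self, row):
        hit = find_entry(self.cache, row)
        if hit:
            return hit[2]  # cached: no catalog round trips needed
        meta = find_entry(self.root_entries, row)          # step 1: ask .ROOT.
        user = find_entry(self.meta_tables[meta[2]], row)  # step 2: ask .META.
        self.catalog_lookups += 2
        bisect.insort(self.cache, user)
        return user[2]  # step 3: the data RPC goes to this RegionServer
```

A real client must also evict cached entries when a region moves or splits (it learns this from a failed RPC); the sketch leaves that out.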
Q/A

THANK YOU!

       Anty.Rao
(ant.rao@gmail.com)


