Introduction to HBase
     Concept, Data Model and Architecture




                     
                     




Public 2009/5/13




Outline

       •     Introduction to HBase
       •     HBase Data Model
       •     HBase Architecture
       •     Reference




                                     Copyright 2009 - Trend Micro Inc.
Classification

Introduction to HBase

    • HBase is …
             –   Distributed storage
             –   Table-like in data structure (multi-dimensional map)
             –   High scalability
             –   High availability
             –   High performance
    • HBase provides Google BigTable-like capabilities
    • A distributed data store that can scale horizontally to 1,000s
      of commodity servers and petabytes of indexed storage.
    • Designed to operate on top of the Hadoop distributed file
      system (HDFS) or Kosmos File System (KFS, aka
      Cloudstore) for scalability, fault tolerance, and high
      availability.
    • Integrated into the Hadoop MapReduce platform and
      paradigm.

History of HBase

       • 2006.11
                 – Google releases paper on BigTable
       • 2007.2
                 – Initial HBase prototype created as Hadoop contrib.
       • 2007.10
                 – First usable HBase
       • 2008.1
                 – Hadoop becomes an Apache top-level project and HBase becomes
                   a subproject
       • 2008.10~
                 – HBase 0.18, 0.19 released




HBase Is (Not) …
  •     Not a relational data storage system
          – Tables have one primary index, the row key.
          – No join operators.
          – Scans and queries can select a subset of available columns, perhaps
            by using a wildcard.
          – There are three types of lookups:
                 • Fast lookup using row key and optional timestamp.
                 • Full table scan
                 • Range scan from region start to end.
          – Limited atomicity and transaction support.
                 • HBase supports multiple batched mutations of single rows only.
                 • Data is unstructured and untyped.
          – Not accessed or manipulated via SQL.
                 • Programmatic access via Java, REST, or Thrift APIs.
                 • Scripting via JRuby.
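
The three lookup types listed above can be illustrated with a toy in-memory table whose only index is the sorted row key. This is a minimal sketch, not the HBase API: the class name SortedTable, its methods, and the sample rows are all hypothetical, timestamps are left out, and the range scan takes arbitrary start and stop keys rather than region boundaries.

```python
from bisect import bisect_left


class SortedTable:
    """Toy stand-in for a table whose only index is the sorted row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column name: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            # Insert the new key at its sorted position.
            self._keys.insert(bisect_left(self._keys, row_key), row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Fast lookup by row key; returns None for an unknown key."""
        return self._rows.get(row_key)

    def scan(self, start=None, stop=None):
        """Full table scan when no bounds are given, else a range scan over [start, stop)."""
        lo = 0 if start is None else bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


# Hypothetical sample data, inserted out of order on purpose.
table = SortedTable()
for key in ["row3", "row1", "row2", "row4"]:
    table.put(key, "info:name", key.upper())

point = table.get("row2")                            # lookup by row key
full = [k for k, _ in table.scan()]                  # full table scan
ranged = [k for k, _ in table.scan("row2", "row4")]  # range scan
```

Because there is no secondary index, any query that cannot be phrased as a row-key lookup or a key range falls back to the full scan.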



Why Bigtable?
  • RDBMS performance is good for transaction processing, but for
    very large scale analytic processing the available solutions
    are commercial, expensive, and specialized.
  • Very large scale analytic processing
          – Big queries – typically range or table scans.
          – Big databases (100s of TB)
  • MapReduce on Bigtable, optionally with Cascading on top to
    support some relational algebra, may be a cost-effective
    solution.
  • Sharding is not a solution to scale open source RDBMS
    platforms
          – Application specific
          – Labor-intensive (re)partitioning



Why HBase?

       • HBase is a Bigtable clone.
       • It is open source
       • It has a good community and promise for the future
       • It is developed on top of the Hadoop platform and integrates
         well with it, if you are using Hadoop already.
       • It has a Cascading connector.




Data Model – Conceptual View
      • Tables are sorted by row key
      • A table schema defines only its column families.
                 –   Each family consists of any number of columns
                 –   Each column consists of any number of versions
                 –   Columns only exist when inserted; NULLs are free.
                 –   Columns within a family are sorted and stored together
                 –   Everything except table names is byte[]
      • (Row, Family:Column, Timestamp) → Value

      Row Key        Time Stamp   Column Family “content:”   Column Family “anchor:”
      com.cnn.www    t9           “<html>…”                  “anchor:cnnsi.com”    “CNN”
                     t8                                      “anchor:cnnsi.com”    “CNN”
                                                             “anchor:my.look.ca”   “MyLook”
                     t6           “<html>…”
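
The (row, family:column, timestamp) → value mapping can be sketched as a nested dictionary. This is an illustrative sketch only, not the HBase client API: the class VersionedTable is hypothetical, timestamps t6 and t9 are modelled as the integers 6 and 9, the cell values are placeholders, and returning the newest version at or before the requested timestamp is an assumed reading of the optional timestamp.

```python
from collections import defaultdict


class VersionedTable:
    """Toy (row, "family:column", timestamp) -> value map; not the HBase API."""

    def __init__(self):
        # row key -> column name -> {timestamp: value}
        self._cells = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._cells[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None


t = VersionedTable()
t.put("com.cnn.www", "content:", 6, "<html>v6")   # placeholder page body at t6
t.put("com.cnn.www", "content:", 9, "<html>v9")   # placeholder page body at t9
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")

latest = t.get("com.cnn.www", "content:")               # newest version
older = t.get("com.cnn.www", "content:", 8)             # newest version at or before t8
missing = t.get("com.cnn.www", "anchor:example.org")    # column never written for this row
```

A column that was never written takes no space at all, which is why the slide calls NULLs free.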

Data Model – Conceptual View
    • Conceptually, a table may be thought of as a collection of rows
      that are located by a row key (and an optional timestamp),
      where any column may have no value for a particular row
      key (sparse)


       Row Key        Time Stamp   Column Family “content:”   Column Family “anchor:”
       com.cnn.www    t9           “<html>…”                  “anchor:cnnsi.com”    “CNN”
                      t8                                      “anchor:cnnsi.com”    “CNN”
                                                              “anchor:my.look.ca”   “MyLook”
                      t6           “<html>…”




Data Model – Physical Storage
   • Although, at a conceptual level, tables may be viewed as a
     sparse set of rows, physically they are stored as column
     oriented data sets.

                 Row Key       Time Stamp   Column Family “content:”
                 com.cnn.www   t9           “<html>…”
                               t6           “<html>…”

                 Row Key       Time Stamp   Column Family “anchor:”
                 com.cnn.www   t9           “anchor:cnnsi.com”    “CNN”
                               t8           “anchor:cnnsi.com”    “CNN”
                                            “anchor:my.look.ca”   “MyLook”
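
The split from one sparse conceptual row into per-family stores can be sketched as a simple grouping step. This is a hedged illustration, not how HBase writes its files: the function split_by_family is hypothetical, timestamps are modelled as integers, cell values are placeholders, and ordering each store by row, then column, then newest timestamp first is an assumption made for the example.

```python
from collections import defaultdict


def split_by_family(cells):
    """Group (row, column, timestamp, value) cells by column family.

    The family is the part of the column name before the first ':'.
    Each group is sorted by row, then column, then newest timestamp first.
    """
    stores = defaultdict(list)
    for row, column, ts, value in cells:
        family = column.split(":", 1)[0]
        stores[family].append((row, column, ts, value))
    for family_cells in stores.values():
        family_cells.sort(key=lambda c: (c[0], c[1], -c[2]))
    return dict(stores)


# Cells of the conceptual row for com.cnn.www, in arbitrary order.
cells = [
    ("com.cnn.www", "content:", 9, "<html>..."),
    ("com.cnn.www", "anchor:cnnsi.com", 9, "CNN"),
    ("com.cnn.www", "anchor:cnnsi.com", 8, "CNN"),
    ("com.cnn.www", "anchor:my.look.ca", 8, "MyLook"),
    ("com.cnn.www", "content:", 6, "<html>..."),
]
stores = split_by_family(cells)
content_timestamps = [c[2] for c in stores["content"]]
```

Empty cells from the conceptual view never appear in either store, so a read that needs only one family touches only that family's data.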

Data Model – Real Life Example
  •     Consider
          – Blog entries, which consist of a title, a subtitle, a date, an author, a
            type (or tag), a text, and comments, and which can be created and updated by
            logged-in users
          – Users, which consist of a username, a password, and a name, and who can log in
            and log out
          – Comments, which consist of a title, an author, and a text
  •     Source ERD




Data Model – Real Life Example




  •     Row key
          – A concatenation of type (shortened to 2 letters) and timestamp
          – Rows are grouped first by type and then by date throughout the cluster
          – Better chance of hitting a single region to fetch the needed data
  •     One-to-many relationship between BLOGENTRY and COMMENT is
        handled by storing comments as a family in BLOGENTRY and by using the
        timestamp as column key
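
The row key and comment column key described above could be built as follows. This is a sketch under stated assumptions: the helper names make_row_key and comment_column, the "comment" family name, the 10-digit zero padding, the use of Unix timestamps, and the sample entries are all hypothetical and not taken from the slides.

```python
def make_row_key(entry_type, timestamp):
    """Build a row key from a 2-letter type prefix and a zero-padded timestamp."""
    prefix = entry_type[:2].lower()
    # Zero padding keeps lexicographic order equal to chronological order.
    return f"{prefix}{timestamp:010d}"


def comment_column(timestamp, family="comment"):
    """Column name for one comment, keyed by its timestamp inside the family."""
    return f"{family}:{timestamp:010d}"


# Hypothetical blog entries as (type, Unix timestamp).
entries = [("tech", 1242172800), ("news", 1242000000), ("tech", 1241000000)]
keys = sorted(make_row_key(t, ts) for t, ts in entries)
first_comment = comment_column(1242180000)
```

Sorting the keys places both "te" entries next to each other in date order, so a scan over one type and date range reads one contiguous run of rows.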


Data Model – Real Life Example




  •     Target schema advantages
          – Building the front page of the blog requires only a fetch of the “info:”
            family from BLOGTABLE
          – By using timestamps in the row key, scanners will fetch sequential rows
            for in order rendering of comments




HBase Architecture

      •   A table is made up of any number of regions
                – A region is specified by its startKey and endKey
      •   Each region may live on a different node and is made up of
          several HDFS files and blocks, each of which is replicated
          by Hadoop.

      Region     Row Keys   Column Family “Content:”
      Region 1   00000      …
                 00001      …
                 …          …
                 09999      …
      Region 2   10000      …
                 …          …
                 …          …
                 29999      …
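
Finding the region that holds a given row key amounts to a search over the sorted start keys. The sketch below is a simplified illustration, not HBase's actual lookup through its catalog regions: the names REGIONS and find_region are hypothetical, and end keys are treated as inclusive only to mirror the key ranges shown in the table.

```python
from bisect import bisect_right

# (start_key, end_key) per region, sorted by start key. End keys are inclusive
# here to match the table above (00000..09999 and 10000..29999).
REGIONS = [("00000", "09999"), ("10000", "29999")]
START_KEYS = [start for start, _ in REGIONS]


def find_region(row_key):
    """Return the 1-based number of the region holding row_key, or None if uncovered."""
    # Index of the last region whose start key is <= row_key.
    i = bisect_right(START_KEYS, row_key) - 1
    if i >= 0 and row_key <= REGIONS[i][1]:
        return i + 1
    return None


first = find_region("00001")
boundary = find_region("10000")
outside = find_region("30000")
```

Because row keys are sorted and regions cover contiguous key ranges, the lookup is a binary search rather than a scan over all regions.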




HBase Architecture

   • Region Servers
            – Serve client requests (write/read/scan)
            – Send heartbeats to the Master
            – Throughput and the number of regions scale with the number of region servers
   • HBase Master
            –    Responsible for monitoring region servers
            –    Load balancing for regions
            –    Redirect client to correct region servers
            –    The current SPOF




HBase Architecture


   [Diagram: the Master, Clients, and four Region Servers, connected via HRPC.
    The Region Servers host the ROOT and META regions and Regions 1 to 6.]




Connecting to HBase

       • Java client
                 – get(byte [] row, byte [] column, long timestamp, int versions);
       • Non-Java clients
                 – Thrift server hosting HBase client instance
                     • Sample Ruby, C++, and Java (via Thrift) clients
                 – REST server hosts HBase client
       • TableInput/OutputFormat for MapReduce
                 – HBase as MR source or sink
       • HBase Shell
                 – JRuby IRB with “DSL” to add get, scan, and admin
                 – ./bin/hbase shell YOUR_SCRIPT




Reference

       • Google BigTable paper
                 – http://labs.google.com/papers/bigtable.html
       • HBase official site
                 – http://hadoop.apache.org/hbase/
       • HBase official wiki
                 – http://wiki.apache.org/hadoop/Hbase




 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 

Introduction to HBase

  • 1. Introduction to HBase: Concept, Data Model and Architecture (Public, 2009/5/13)
  • 2. Outline
      • Introduction to HBase
      • HBase Data Model
      • HBase Architecture
      • Reference
    Copyright 2009 - Trend Micro Inc.
  • 3. Introduction to HBase
      • HBase is …
          – Distributed storage
          – Table-like in data structure (multi-dimensional map)
          – High scalability
          – High availability
          – High performance
      • HBase provides Google BigTable-like capabilities.
      • A distributed data store that can scale horizontally to 1,000s of commodity servers and petabytes of indexed storage.
      • Designed to operate on top of the Hadoop Distributed File System (HDFS) or Kosmos File System (KFS, aka Cloudstore) for scalability, fault tolerance, and high availability.
      • Integrated into the Hadoop MapReduce platform and paradigm.
  • 4. History of HBase
      • 2006.11
          – Google releases paper on BigTable
      • 2007.2
          – Initial HBase prototype created as Hadoop contrib.
      • 2007.10
          – First usable HBase
      • 2008.1
          – Hadoop becomes an Apache top-level project and HBase becomes a subproject
      • 2008.10~
          – HBase 0.18, 0.19 released
  • 5. HBase Is (Not) …
      • Not a relational data storage system
          – Tables have one primary index, the row key.
          – No join operators.
          – Scans and queries can select a subset of available columns, perhaps by using a wildcard.
          – There are three types of lookups:
              • Fast lookup using row key and optional timestamp.
              • Full table scan.
              • Range scan from region start to end.
          – Limited atomicity and transaction support.
              • HBase supports multiple batched mutations of single rows only.
      • Data is unstructured and untyped.
          – Not accessed or manipulated via SQL.
              • Programmatic access via Java, REST, or Thrift APIs.
              • Scripting via JRuby.
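The three lookup types listed on this slide can be illustrated with a small in-memory model. This is only a sketch in plain Python, not the HBase client API: `SortedTable`, `put`, `get`, and `scan` are hypothetical names, and the half-open `[start, stop)` scan range is an assumption of the sketch.

```python
import bisect


class SortedTable:
    """Toy model of a table whose only index is the sorted row key."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> row data

    def put(self, row_key, data):
        # Insert the key in sorted position the first time it is seen.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = data

    def get(self, row_key):
        """Fast lookup by row key; returns None when the row is absent."""
        return self._rows.get(row_key)

    def scan(self, start=None, stop=None):
        """Range scan over [start, stop); a full table scan when both are None."""
        lo = 0 if start is None else bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]
```

Because the row key is the only index, any query that cannot be phrased as a key lookup or a key range falls back to the full scan.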
  • 6. Why Bigtable?
      • RDBMS performance is good for transaction processing, but for very large scale analytic processing the available solutions are commercial, expensive, and specialized.
      • Very large scale analytic processing
          – Big queries: typically range or table scans.
          – Big databases (100s of TB)
      • MapReduce on Bigtable, optionally with Cascading on top to support some relational algebra, may be a cost-effective solution.
      • Sharding is not a solution for scaling open source RDBMS platforms
          – Application specific
          – Labor-intensive (re)partitioning
  • 7. Why HBase?
      • HBase is a Bigtable clone.
      • It is open source.
      • It has a good community and promise for the future.
      • It is developed on top of the Hadoop platform and integrates well with it, which helps if you are using Hadoop already.
      • It has a Cascading connector.
  • 8. Data Model – Conceptual View
      • Tables are sorted by row.
      • The table schema defines only its column families.
          – Each family consists of any number of columns.
          – Each column consists of any number of versions.
          – Columns only exist when inserted; NULLs are free.
          – Columns within a family are sorted and stored together.
          – Everything except table names is byte[].
      • (Row, Family:Column, Timestamp) → Value

        Row Key     | Time Stamp | Column (Family) “content:” | Column (Family) “anchor:”
        com.cnn.www | t9         | “<html>…”                  | “anchor:cnnsi.com” = “CNN”
                    | t8         |                            | “anchor:cnnsi.com” = “CNN”
                    |            |                            | “anchor:my.look.ca” = “MyLook”
                    | t6         | “<html>…”                  |
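The (Row, Family:Column, Timestamp) → Value mapping on this slide can be read as a nested, sparse map. The following is a minimal sketch under that reading, not HBase code: `SparseTable`, `put`, `latest`, and `row_keys` are hypothetical names, and integer timestamps stand in for t6, t8, and t9.

```python
from collections import defaultdict


class SparseTable:
    """Toy model: row key -> b"family:qualifier" -> timestamp -> value."""

    def __init__(self):
        self._cells = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        # A column comes into existence only when a value is written to it.
        self._cells[row][column][timestamp] = value

    def latest(self, row, column):
        """Return the newest version of a cell, or None if it was never written."""
        versions = self._cells.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def row_keys(self):
        """Row keys in sorted (byte) order, as the table presents them."""
        return sorted(self._cells)


table = SparseTable()
table.put(b"com.cnn.www", b"content:", 6, b"<html>...")
table.put(b"com.cnn.www", b"content:", 9, b"<html>...")
table.put(b"com.cnn.www", b"anchor:cnnsi.com", 9, b"CNN")
table.put(b"com.cnn.www", b"anchor:my.look.ca", 8, b"MyLook")
```

Keys and values are `bytes` here to mirror the slide's point that everything except the table name is `byte[]`; a cell that was never written simply has no entry, which is why NULLs cost nothing.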
  • 9. Data Model – Conceptual View
      • Conceptually, a table may be thought of as a collection of rows that are located by a row key (and optional timestamp), where any column may have no value for a particular row key (sparse).

        Row Key     | Time Stamp | Column (Family) “content:” | Column (Family) “anchor:”
        com.cnn.www | t9         | “<html>…”                  | “anchor:cnnsi.com” = “CNN”
                    | t8         |                            | “anchor:cnnsi.com” = “CNN”
                    |            |                            | “anchor:my.look.ca” = “MyLook”
                    | t6         | “<html>…”                  |
  • 10. Data Model – Physical Storage
      • Although, at a conceptual level, tables may be viewed as a sparse set of rows, physically they are stored as column-oriented data sets.

        Row Key     | Time Stamp | Column (Family) “content:”
        com.cnn.www | t9         | “<html>…”
                    | t6         | “<html>…”

        Row Key     | Time Stamp | Column (Family) “anchor:”
        com.cnn.www | t9         | “anchor:cnnsi.com” = “CNN”
                    | t8         | “anchor:cnnsi.com” = “CNN”
                    |            | “anchor:my.look.ca” = “MyLook”
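The split from one conceptual table into one data set per column family can be sketched as a grouping step. This is an illustration only: `split_by_family` is a hypothetical helper, and the sort order inside each store (row, then column, then newest timestamp first) is an assumption of the sketch rather than something the slide states.

```python
from collections import defaultdict


def split_by_family(cells):
    """Group (row, column, timestamp, value) cells into one store per column family."""
    stores = defaultdict(list)
    for row, column, timestamp, value in cells:
        # The family is the part of the column name up to and including ":".
        family = column.split(":", 1)[0] + ":"
        stores[family].append((row, column, timestamp, value))
    for store in stores.values():
        # Assumed order: row ascending, column ascending, newest version first.
        store.sort(key=lambda cell: (cell[0], cell[1], -cell[2]))
    return dict(stores)


cells = [
    ("com.cnn.www", "content:", 6, "<html>..."),
    ("com.cnn.www", "anchor:cnnsi.com", 9, "CNN"),
    ("com.cnn.www", "content:", 9, "<html>..."),
    ("com.cnn.www", "anchor:my.look.ca", 8, "MyLook"),
]
stores = split_by_family(cells)
```

Empty cells never appear in any store, so the sparse conceptual view costs no space in the per-family layout.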
  • 11. Data Model – Real Life Example
      • Consider
          – Blog entries, which consist of a title, a subtitle, a date, an author, a type (or tag), a text, and comments, and which can be created and updated by logged-in users
          – Users, which consist of a username, a password, and a name, and who can log in and log out
          – Comments, which consist of a title, an author, and text
      • Source ERD [diagram]
  • 12. Data Model – Real Life Example
      • Row key
          – A concatenation of type (shortened to 2 letters) and timestamp
          – Rows will be gathered first by type and then by date throughout the cluster
          – More chances of hitting a single region to fetch the needed data
      • The one-to-many relationship between BLOGENTRY and COMMENT is handled by storing comments as a family in BLOGENTRY and by using the timestamp as the column key
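A row key of this shape might be built as follows. This is a sketch under stated assumptions: the slide does not give the timestamp format, so the fixed-width `YYYYMMDDHHMMSS` encoding, the helper names `make_row_key` and `comment_column`, and the family name `comment:` are all hypothetical.

```python
from datetime import datetime

STAMP_FORMAT = "%Y%m%d%H%M%S"  # assumed fixed-width format, so byte order matches time order


def make_row_key(entry_type, created):
    """Row key: the type shortened to 2 letters, followed by the timestamp."""
    prefix = entry_type[:2].lower()
    return (prefix + created.strftime(STAMP_FORMAT)).encode("ascii")


def comment_column(created):
    """Column key for one comment: the (assumed) family name plus the comment's timestamp."""
    return ("comment:" + created.strftime(STAMP_FORMAT)).encode("ascii")
```

With keys like these, all entries of one type sort next to each other and, within a type, in date order, which is what makes a single-region range scan likely.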
  • 13. Data Model – Real Life Example
      • Target schema advantages
          – Building the front page of the blog requires only a fetch of the “info:” family from BLOGTABLE
          – By using timestamps in the row key, scanners will fetch sequential rows for in-order rendering of comments
  • 14. HBase Architecture
      • A table is made up of any number of regions.
          – A region is specified by its startKey and endKey.
      • Each region may live on a different node and is made up of several HDFS files and blocks, each of which is replicated by Hadoop.

        Region   | Row Keys | Column Family “Content:”
        Region 1 | 00000    | …
                 | 00001    | …
                 | …        | …
                 | 09999    | …
        Region 2 | 10000    | …
                 | …        | …
                 | 29999    | …
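Finding the region that holds a given row key reduces to a search over sorted start keys. The sketch below models that idea only; `RegionMap`, `locate`, and the server names are hypothetical, and treating `endKey` as exclusive is an assumption of the sketch.

```python
import bisect


class RegionMap:
    """Toy lookup from row key to the server hosting the covering region."""

    def __init__(self, regions):
        # regions: iterable of (start_key, end_key, server); end_key is exclusive here.
        self._regions = sorted(regions)
        self._starts = [start for start, _, _ in self._regions]

    def locate(self, row_key):
        # Rightmost region whose start key is <= row_key.
        i = bisect.bisect_right(self._starts, row_key) - 1
        if i < 0:
            return None
        _, end, server = self._regions[i]
        return server if row_key < end else None


region_map = RegionMap([
    ("00000", "10000", "server-a"),  # Region 1 on the slide: 00000 .. 09999
    ("10000", "30000", "server-b"),  # Region 2 on the slide: 10000 .. 29999
])
```

Because rows are sorted by key, a contiguous key range maps to one region or to a few adjacent regions, each of which can live on a different node.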
  • 15. HBase Architecture
      • Region servers
          – Serve client requests (write/read/scan)
          – Send heartbeats to the master
          – Throughput and the number of regions scale with the number of region servers
      • HBase master
          – Responsible for monitoring region servers
          – Load balancing for regions
          – Redirects clients to the correct region servers
          – Currently the single point of failure (SPOF)
  • 16. HBase Architecture
      [Diagram labels: Master; Clients; HRPC; Region Servers hosting ROOT, META, and Regions 1–6]
  • 17. Connecting to HBase
      • Java client
          – get(byte [] row, byte [] column, long timestamp, int versions);
      • Non-Java clients
          – Thrift server hosting an HBase client instance
              • Sample Ruby, C++, and Java (via Thrift) clients
          – REST server hosting an HBase client
      • TableInput/OutputFormat for MapReduce
          – HBase as MR source or sink
      • HBase Shell
          – JRuby IRB with a “DSL” to add get, scan, and admin
          – ./bin/hbase shell YOUR_SCRIPT
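The `timestamp` and `versions` parameters of the Java `get` signature above can be illustrated with a small model of a single cell. This is not the HBase client: `get_versions` is a hypothetical function, and the rule that it returns the newest versions at or before `timestamp` is an assumption of the sketch.

```python
def get_versions(cell_versions, timestamp=None, versions=1):
    """Return up to `versions` (timestamp, value) pairs for one cell, newest first.

    cell_versions: dict mapping timestamp -> value for one (row, column).
    timestamp: if given, only versions written at or before it are eligible (assumed rule).
    """
    eligible = sorted(
        (ts for ts in cell_versions if timestamp is None or ts <= timestamp),
        reverse=True,
    )
    return [(ts, cell_versions[ts]) for ts in eligible[:versions]]


content_versions = {9: "<html>v9", 8: "<html>v8", 6: "<html>v6"}
```

A call without `timestamp` reads the latest value, while passing both parameters walks back through a cell's history.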
  • 18. Reference
      • Google BigTable paper
          – http://labs.google.com/papers/bigtable.html
      • HBase official site
          – http://hadoop.apache.org/hbase/
      • HBase official wiki
          – http://wiki.apache.org/hadoop/Hbase