Big Data and Hadoop Training
HBASE
Page 2Classification: Restricted
Agenda
• HBase Introduction
• Row & Column storage
• Characteristics of a huge DB
• What is HBase?
• HBase Data-Model
• HBase vs RDBMS
• HBase architecture
• HBase in operation
• Loading Data into HBase
• HBase shell commands
• HBase operations through Java
• HBase operations through MR
What is HBase?
• Open source project built on top of Apache Hadoop
• NoSQL database
• Distributed, scalable store
• Column-family datastore
How do you pick SQL or NoSQL?
• What does your data look like?
• Is your data model likely to change?
• Is your data growing exponentially?
• Will you be doing real-time analytics on operational data?
Inspiration for HBase
• Google’s Bigtable is the inspiration for HBase
• It is designed to run on a cluster of computers
Characteristics of Bigtable:
• Data is ‘Sparse’
• Data is stored as a ‘Sorted Map’
• ‘Distributed’
• ‘Multi-dimensional’
• ‘Consistent’
HBase vs RDBMS
HBase | RDBMS
Data that is accessed together is stored together | Data is normalized
Column-oriented | Row-oriented (mostly)
Flexible schema, can add columns on the fly | Fixed schema
Good with sparse tables | Not optimized for sparse tables
No joins | Optimized for joins
Horizontal scalability | Hard to shard and scale
Good for structured, semi-structured data | Good for structured data
Row-based transactions | Distributed transactions
Row & Column Storage
• Column-oriented store: many queries, especially in analytical databases, need only a few columns of a table rather than every value in each row
• Advantages of column-oriented storage:
  • Reduced I/O, because only the requested columns are read
  • Values within a column tend to be similar across logical rows, so they are better suited for compression
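The two advantages of column-oriented storage can be illustrated with a small sketch in plain Python. It is not how HBase lays out data on disk; the sample table, the helper names `to_columns` and `run_length_encode`, and the choice of run-length encoding are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical sample table: three logical rows with the same three columns.
rows = [
    {"id": 1, "country": "IN", "status": "active"},
    {"id": 2, "country": "IN", "status": "active"},
    {"id": 3, "country": "IN", "status": "inactive"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# A query that needs only "country" reads one list instead of every row.
country_values = columns["country"]

# Similar values sit next to each other, so even a simple scheme shrinks them.
encoded_country = run_length_encode(country_values)
encoded_status = run_length_encode(columns["status"])
```

In the row layout, reading `country` means visiting every row dict; in the column layout it is a single list, and that list collapses to one `(value, count)` pair.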
HBase Data Model
Component | Description
Table | Data is organized into tables; a table is made up of rows
Row key | Data is stored in rows; each row is identified by its row key, which acts as the primary key; rows are sorted by this value
Column family | Columns are grouped into families
Column qualifier | Identifies the column within a family
Cell | Identified by the combination of row key, column family, column qualifier, and timestamp; contains the value
Version | Values within a cell are versioned; the version number is a timestamp
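One way to picture the data model is as a nested map: row key, then column family, then column qualifier, then timestamp, ending at the value. The sketch below is a toy in-memory model under that assumption, not the HBase client API; the class `TinyTable` and its methods are hypothetical names.

```python
class TinyTable:
    """Toy model: row key -> family -> qualifier -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed up front
        self.rows = {}

    def put(self, row_key, family, qualifier, value, timestamp):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are created on the fly; only the family must exist.
        cell = (self.rows.setdefault(row_key, {})
                         .setdefault(family, {})
                         .setdefault(qualifier, {}))
        cell[timestamp] = value

    def get(self, row_key, family, qualifier, max_versions=1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        cell = self.rows.get(row_key, {}).get(family, {}).get(qualifier, {})
        return sorted(cell.items(), reverse=True)[:max_versions]

    def scan(self):
        """Return row keys in sorted order, since rows are sorted by row key."""
        return sorted(self.rows)


table = TinyTable(families=["info"])
table.put("user2", "info", "email", "b@example.com", timestamp=100)
table.put("user1", "info", "email", "old@example.com", timestamp=100)
table.put("user1", "info", "email", "new@example.com", timestamp=200)
```

Here `table.get("user1", "info", "email")` returns only the newest version, and passing `max_versions=2` returns both timestamps for the same cell.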
HBase Data Model: Regions
• Regions: horizontal partitions of an HBase table.
• A region is denoted by the table it belongs to, its first row (inclusive), and its last row (exclusive).
• Regions are the units that get distributed over an entire cluster.
• Initially, a table comprises a single region. As the region grows, it eventually crosses a configurable size threshold, at which point it splits at a row boundary into two new regions of approximately equal size.
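The split behaviour of regions can be sketched as follows. This is a simplified model: the threshold here counts rows, whereas HBase uses a configurable size in bytes, and the names `Region`, `put`, and `MAX_ROWS_PER_REGION` are hypothetical.

```python
class Region:
    """Toy region covering row keys in [start_key, end_key); None = unbounded."""

    def __init__(self, start_key=None, end_key=None):
        self.start_key = start_key
        self.end_key = end_key
        self.rows = {}  # row key -> value

    def contains(self, row_key):
        after_start = self.start_key is None or row_key >= self.start_key
        before_end = self.end_key is None or row_key < self.end_key
        return after_start and before_end

    def split(self):
        """Split at the middle row key into two daughters of similar size."""
        keys = sorted(self.rows)
        split_key = keys[len(keys) // 2]
        left = Region(self.start_key, split_key)
        right = Region(split_key, self.end_key)
        for key, value in self.rows.items():
            (left if key < split_key else right).rows[key] = value
        return left, right


MAX_ROWS_PER_REGION = 4  # hypothetical threshold; HBase uses a size in bytes


def put(regions, row_key, value):
    """Write to the owning region and split it once it exceeds the threshold."""
    index = next(i for i, r in enumerate(regions) if r.contains(row_key))
    region = regions[index]
    region.rows[row_key] = value
    if len(region.rows) > MAX_ROWS_PER_REGION:
        regions[index:index + 1] = region.split()


regions = [Region()]  # a new table starts as a single region
for n in range(6):
    put(regions, f"row{n}", n)
```

After the six writes the table holds two regions that meet at `row2`: the first ends there (exclusive) and the second starts there (inclusive).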
HBase Architecture
• HBase Master: master node
• Regionservers: worker nodes
• HBase Master
  • bootstraps a fresh install
  • assigns regions to registered regionservers
  • recovers from regionserver failures
• Regionservers
  • carry zero or more regions
  • take client read/write requests
  • manage region splits, informing the master about the new daughter regions
• ZooKeeper is the authority on the cluster state.
• HBase keeps the location of the catalog table and the address of the cluster master in ZooKeeper.
• Assignment of regions is mediated via ZooKeeper in case servers crash mid-assignment.
• An HBase client must know the location of the ZooKeeper ensemble.
• Thereafter, the client navigates the ZooKeeper hierarchy to learn cluster attributes such as server locations.
HBase in Operation
• hbase:meta holds the list, state, and locations of all regions on the cluster.
• Entries in hbase:meta are keyed by region name.
• A region name is made up of the table name, the region’s start row, its time of creation, and an MD5 hash of all of these.
• E.g.: TestTable,xyz,1279729913622.1b6e176fb8d8aa88fd4ab6bc80247ece.
• Because row keys are sorted, finding the region that hosts a particular key is easy.
• Whenever a region is split, enabled, disabled, deleted, and so on, the catalog table is updated.
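Why sorted keys make the region lookup in hbase:meta easy can be shown with a binary search. The sketch below simplifies the catalog to a sorted list of region start keys for a single table; the entries in `META`, the server names, and the function `locate_region` are hypothetical, and real hbase:meta rows are keyed by the full region name.

```python
import bisect

# Hypothetical catalog for one table: (region start key, hosting server),
# kept sorted by start key. "" marks the first region of the table.
META = [
    ("", "server-a"),
    ("g", "server-b"),
    ("p", "server-c"),
]

START_KEYS = [start for start, _ in META]


def locate_region(row_key):
    """Return (start_key, server) of the region whose range holds row_key."""
    # The owning region is the last one whose start key is <= row_key.
    index = bisect.bisect_right(START_KEYS, row_key) - 1
    return META[index]
```

For example, `locate_region("apple")` lands in the first region and `locate_region("g")` in the second, because a region's start row is inclusive.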
• Fresh clients connect to the ZooKeeper cluster to get the location of hbase:meta, then consult hbase:meta to find the user-space regions they need and where those regions are hosted.
• Then, clients interact directly with regionservers.
• Clients cache the region locations learned from previous operations, which works fine until there is a fault.
• If a fault happens, clients contact hbase:meta again. If hbase:meta has also moved, clients contact ZooKeeper.
• Writes arriving at a regionserver are first appended to a commit log and then added to an in-memory memstore. When a memstore fills, its content is flushed to the filesystem.
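The write path on a regionserver (commit log first, then memstore, then a flush once the memstore fills) can be sketched in a few lines. This is a toy model under stated assumptions: the class `RegionStore` is hypothetical, the commit log is just a list, and the memstore limit counts entries rather than bytes.

```python
class RegionStore:
    """Toy write path: append to a commit log, then update the memstore."""

    def __init__(self, memstore_limit=3):
        self.memstore_limit = memstore_limit  # hypothetical limit, in entries
        self.commit_log = []   # stands in for the on-disk commit log
        self.memstore = {}     # in-memory buffer: row key -> value
        self.flush_files = []  # immutable sorted snapshots, oldest first

    def put(self, row_key, value):
        self.commit_log.append((row_key, value))  # 1. log first, for recovery
        self.memstore[row_key] = value            # 2. then the in-memory store
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        """Write the memstore out as one sorted file and start a fresh one."""
        self.flush_files.append(sorted(self.memstore.items()))
        self.memstore = {}


store = RegionStore()
for key in ["r3", "r1", "r2", "r4"]:
    store.put(key, f"value-{key}")
```

After these four writes the commit log holds all four entries, the first three rows sit in one sorted flush file, and only `r4` remains in the memstore.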
• When reading, the region’s memstore is consulted first. If enough versions are found in the memstore alone, the query completes there. Otherwise, flush files are consulted in order, from newest to oldest, until either enough versions are found to satisfy the query or there are no flush files left.
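The read path (memstore first, then flush files from newest to oldest) can be sketched as below. It is a minimal model, assuming each source is a dict from row key to a list of `(timestamp, value)` versions; the function `read_versions` and the sample data are hypothetical.

```python
def read_versions(row_key, memstore, flush_files, wanted=1):
    """Collect up to `wanted` versions of row_key, newest sources first.

    memstore and each flush file map row key -> list of (timestamp, value).
    flush_files is ordered oldest first, in the order the files were written.
    """
    found = []
    # Memstore first, then flush files from newest to oldest.
    for source in [memstore] + flush_files[::-1]:
        for version in sorted(source.get(row_key, []), reverse=True):
            found.append(version)
            if len(found) == wanted:
                return found  # enough versions: older files are not read
    return found  # ran out of flush files


memstore = {"r1": [(300, "c")]}
flush_files = [
    {"r1": [(100, "a")]},  # oldest flush file
    {"r1": [(200, "b")]},  # newest flush file
]
```

With `wanted=1` the query is answered from the memstore alone; asking for more versions pulls in the newer flush file before the older one.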
Loading Data Into HBase
• Using HBase shell
• Using Client APIs
• Using Pig
• Using Sqoop
HBase Shell Commands
Connect to HBase from Clients
HBase Use Cases
• Capturing incremental data, such as time series data: high-volume, high-velocity writes
  • e.g. sensor data, system metrics, events, stock prices, server logs, rainfall data
• Information exchange: high-volume, high-velocity writes and reads
  • e.g. email, chat
• Content serving and web application backends: high-volume, high-velocity reads
  • e.g. eBay, Groupon
Thank You
