Big Data and Hadoop Training
HBASE
Page 2 | Classification: Restricted
Agenda
•HBase Introduction
•Row & Column storage
•Characteristics of a huge DB
•What is HBase?
•HBase Data-Model
•HBase vs RDBMS
•HBase architecture
•HBase in operation
•Loading Data into HBase
•HBase shell commands
•HBase operations through Java
•HBase operations through MR
What is HBase?
• Open source project built on top of Apache Hadoop
• NoSQL database
• Distributed, scalable store
• Column-family datastore
How do you pick SQL or NoSQL?
• What does your data look like?
• Is your data model likely to change?
• Is your data growing exponentially?
• Will you be doing real-time analytics on operational data?
Inspiration for HBase
•Google’s Bigtable is the inspiration for HBase.
•Like Bigtable, HBase is designed to run on a cluster of computers.
Characteristics of Bigtable:
•Data is ‘Sparse’
•Data is stored as a ‘Sorted Map’
•‘Distributed’
•‘Multi-dimensional’
•‘Consistent’
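The 'sparse' and 'sorted map' properties can be illustrated with a minimal sketch. This is a toy in-memory model, not how Bigtable or HBase is implemented; the class name `SortedSparseMap` and the `family:qualifier` column strings are assumptions made for illustration.

```python
import bisect


class SortedSparseMap:
    """Toy model: rows kept in sorted key order, only present columns stored."""

    def __init__(self):
        self._keys = []  # row keys, always kept sorted
        self._rows = {}  # row key -> {column name: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)  # keep keys sorted on insert
            self._rows[row_key] = {}
        # Sparse: a row stores only the columns that were actually written.
        self._rows[row_key][column] = value

    def scan(self, start, stop):
        """Return rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]


table = SortedSparseMap()
table.put("row3", "info:name", "Cara")
table.put("row1", "info:name", "Ann")
table.put("row2", "info:email", "bob@example.com")  # no name column for row2
result = table.scan("row1", "row3")
```

Because the keys are sorted, a range scan only needs to locate the start key and read forward, which is why the choice of row key matters so much in this model.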
HBase vs RDBMS
HBase                                              | RDBMS
Data that is accessed together is stored together  | Data is normalized
Column-oriented                                    | Row-oriented (mostly)
Flexible schema; columns can be added on the fly   | Fixed schema
Good with sparse tables                            | Not optimized for sparse tables
No joins                                           | Optimized for joins
Horizontal scalability                             | Hard to shard and scale
Good for structured and semi-structured data       | Good for structured data
Row-based transactions                             | Distributed transactions
Row & Column Storage
•Column-oriented store: many queries (typically in analytical databases) need only some columns of a table, not every value in every row.
•Advantages of column-oriented storage:
•Reduced I/O, because only the columns a query touches are read
•Values within one column tend to be similar across rows, so they compress better
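A small sketch of both advantages, assuming a hypothetical four-row table with `user`, `country`, and `age` columns. Run-length encoding stands in for compression here; it is one simple scheme chosen for illustration, not a statement about what any particular store uses.

```python
from itertools import groupby

# Hypothetical sample data: (user, country, age)
rows = [
    ("u1", "IN", 30),
    ("u2", "IN", 41),
    ("u3", "IN", 25),
    ("u4", "US", 37),
]

# Column-oriented layout: all values of one column sit together.
col_store = {
    "user": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "age": [r[2] for r in rows],
}

# Reduced I/O: an average over "age" touches one column (4 values),
# whereas a row-oriented scan would read every field of every row (12 values).
values_read_columnar = len(col_store["age"])
values_read_row_oriented = sum(len(r) for r in rows)
average_age = sum(col_store["age"]) / len(col_store["age"])


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


encoded_country = run_length_encode(col_store["country"])
```

The `country` column collapses to two pairs because similar values sit next to each other; the same values interleaved with names and ages in a row layout would not form runs.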
HBase Data Model
Component        | Description
Table            | Data is organized into tables; a table comprises rows
Row key          | Data is stored in rows; each row is identified by its row key (the primary key); rows are sorted by this value
Column family    | Columns are grouped into families
Column qualifier | Identifies the column within a family
Cell             | Addressed by the combination of row key, column family, column qualifier, and timestamp; contains the value
Version          | Values within a cell are versioned; the version number is the timestamp
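The cell and version rows of the table can be sketched as a nested mapping. This is a toy model under stated assumptions: the class name `TinyTable`, the `max_versions` parameter, and the caller-supplied integer timestamps are all hypothetical, chosen only to show how a cell address resolves to one or more versioned values.

```python
from collections import defaultdict


class TinyTable:
    """Toy model: a cell is addressed by (row key, family, qualifier); each cell keeps versions."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row key, column family, column qualifier) -> {timestamp: value}
        self.cells = defaultdict(dict)

    def put(self, row, family, qualifier, value, timestamp):
        versions = self.cells[(row, family, qualifier)]
        versions[timestamp] = value
        # Drop the oldest versions beyond the retention limit.
        for old_timestamp in sorted(versions)[:-self.max_versions]:
            del versions[old_timestamp]

    def get(self, row, family, qualifier, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        stored = self.cells.get((row, family, qualifier), {})
        return sorted(stored.items(), reverse=True)[:versions]


t = TinyTable(max_versions=2)
t.put("row1", "info", "name", "a", timestamp=1)
t.put("row1", "info", "name", "b", timestamp=2)
t.put("row1", "info", "name", "c", timestamp=3)
latest = t.get("row1", "info", "name")
history = t.get("row1", "info", "name", versions=5)
```

A plain read returns only the newest version; older versions remain reachable until they fall outside the retention limit.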
HBase Data Model
• Regions are horizontal partitions of an HBase table.
• A region is denoted by the table it belongs to, its first row (inclusive), and its last row (exclusive).
• Regions are the units that get distributed over an entire cluster.
• Initially, a table comprises a single region. As the region grows, it eventually crosses a configurable size threshold, at which point it splits at a row boundary into two new regions of approximately equal size.
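The split behaviour described above can be sketched as follows. This is a simplified model, not HBase's split logic: it assumes a row count (`max_rows`) as a stand-in for the configurable size threshold, and the names `Region` and `put` are hypothetical.

```python
import bisect
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Region:
    start: str                 # first row key, inclusive ("" = start of table)
    end: Optional[str]         # last row key, exclusive (None = end of table)
    rows: list = field(default_factory=list)  # sorted row keys held by this region

    def contains(self, key):
        return self.start <= key and (self.end is None or key < self.end)


def put(regions, key, max_rows):
    """Insert a row key; split the hosting region once it exceeds max_rows."""
    for i, region in enumerate(regions):
        if not region.contains(key):
            continue
        if key not in region.rows:
            bisect.insort(region.rows, key)
        if len(region.rows) > max_rows:
            # Split at a row boundary near the middle: two roughly equal daughters.
            mid = region.rows[len(region.rows) // 2]
            left = Region(region.start, mid, [r for r in region.rows if r < mid])
            right = Region(mid, region.end, [r for r in region.rows if r >= mid])
            regions[i:i + 1] = [left, right]
        return


regions = [Region("", None)]  # a new table starts as a single region
for row_key in ["a", "b", "c", "d", "e"]:
    put(regions, row_key, max_rows=4)
```

After the fifth insert the single region splits at `"c"`: the left daughter covers `["", "c")` and the right daughter covers `["c", end)`, so every key still belongs to exactly one region.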
HBase Architecture
• HBase Master: the master node
• Regionservers: the worker nodes
• HBase Master
  • bootstraps a fresh install
  • assigns regions to registered regionservers
  • recovers from regionserver failures
• Regionservers
  • carry zero or more regions
  • take client read/write requests
  • manage region splits and inform the master about the new daughter regions
HBase Architecture
• ZooKeeper is the authority on cluster state.
• For HBase, it holds the location of the catalog table and the address of the cluster master.
• Assignment of regions is mediated via ZooKeeper in case servers crash mid-assignment.
• An HBase client must know the location of the ZooKeeper ensemble.
• Thereafter, the client navigates the ZooKeeper hierarchy to learn cluster attributes such as server locations.
HBase in Operation
• hbase:meta holds the list, state, and locations of all regions on the cluster.
• Entries in hbase:meta are keyed by region name.
• A region name consists of the table name of the region, the region’s start row, its time of creation, and an MD5 hash of all of these.
• E.g.: TestTable,xyz,1279729913622.1b6e176fb8d8aa88fd4ab6bc80247ece.
• Because row keys are sorted, finding the region that hosts a particular key is easy.
• Whenever a region is split, enabled, disabled, deleted, etc., the catalog table is updated.
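Why sorted keys make the lookup easy can be shown with a binary search over region start rows. The catalog contents and host names below are hypothetical, and the real hbase:meta table stores more than this; the sketch only illustrates the lookup idea.

```python
import bisect

# Hypothetical catalog entries: (region start row, hosting regionserver),
# sorted by start row. Each region ends where the next one starts.
catalog = [
    ("", "rs1.example.com"),
    ("g", "rs2.example.com"),
    ("p", "rs3.example.com"),
]
start_keys = [start for start, _ in catalog]


def locate(row_key):
    """Find the region whose start row is the greatest one <= row_key."""
    index = bisect.bisect_right(start_keys, row_key) - 1
    return catalog[index]


first = locate("apple")
boundary = locate("g")
last = locate("zebra")
```

A key equal to a region's start row lands in that region, which matches the inclusive first row and exclusive last row of a region.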
HBase in Operation
• Fresh clients connect to the ZooKeeper cluster first to learn the location of hbase:meta.
  • They then consult hbase:meta to find the user-space regions they need and where those regions are hosted.
• After that, clients interact directly with the regionservers.
• Clients cache what they learn from these lookups, which works fine until there is a fault.
• If a fault happens, clients contact hbase:meta again. If hbase:meta has also moved, clients go back to ZooKeeper.
• Writes arriving at a regionserver are first appended to a commit log and then added to an in-memory memstore. When a memstore fills, its content is flushed to the filesystem.
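The write path in the last bullet can be sketched as below. This is a toy model: the class name `WritePathRegion` is hypothetical, Python lists and dicts stand in for the commit log and flush files, and an entry count (`flush_threshold`) stands in for the memstore size limit.

```python
class WritePathRegion:
    """Toy write path: commit log first, then memstore, flush when the memstore fills."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold  # assumption: entry count, not bytes
        self.commit_log = []    # append-only record of every write
        self.memstore = {}      # in-memory buffer of recent writes
        self.flush_files = []   # immutable sorted snapshots, oldest first

    def put(self, row_key, value):
        self.commit_log.append((row_key, value))  # 1. append to the commit log
        self.memstore[row_key] = value            # 2. add to the memstore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                          # 3. flush when full

    def flush(self):
        # Write the memstore out sorted by row key, then start a new empty one.
        self.flush_files.append(dict(sorted(self.memstore.items())))
        self.memstore = {}


region = WritePathRegion(flush_threshold=2)
region.put("b", 1)
region.put("a", 2)  # memstore reaches the threshold and is flushed
region.put("c", 3)  # lands in the new, empty memstore
```

Logging before buffering means a write that has been acknowledged can be replayed from the log if the in-memory copy is lost.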
HBase in Operation
• When reading, the region’s memstore is consulted first. If sufficient versions are found in the memstore alone, the query completes there. Otherwise, flush files are consulted in order, from newest to oldest, until either enough versions to satisfy the query are found or there are no more flush files.
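The read path can be sketched as a short loop. The data shapes are assumptions for illustration: the memstore and each flush file are modelled as dicts mapping a row key to a list of values (newest first), and `flush_files` is ordered oldest first.

```python
def read_versions(row_key, memstore, flush_files, wanted=1):
    """Collect up to `wanted` versions: memstore first, then flush files newest to oldest."""
    found = list(memstore.get(row_key, []))
    for flush_file in reversed(flush_files):  # newest flush file first
        if len(found) >= wanted:
            break  # enough versions found; older files are not consulted
        found.extend(flush_file.get(row_key, []))
    return found[:wanted]


memstore = {"r1": ["v3"]}
flush_files = [{"r1": ["v1"]}, {"r1": ["v2"]}]  # oldest first

newest = read_versions("r1", memstore, flush_files, wanted=1)
two = read_versions("r1", memstore, flush_files, wanted=2)
everything = read_versions("r1", memstore, flush_files, wanted=5)
```

With `wanted=1` the memstore alone satisfies the read, so no flush file is touched; asking for more versions pulls in progressively older files.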
Loading Data into HBase
• Using the HBase shell
• Using client APIs
• Using Pig
• Using Sqoop
HBase Shell Commands
Connect to HBase from Clients
HBase Use Cases
•Capturing incremental data (time series data): high-volume, high-velocity writes
•e.g. sensor readings, system metrics, events, stock prices, server logs, rainfall data
•Information exchange: high-volume, high-velocity writes and reads
•e.g. email, chat
•Content serving and web application backends: high-volume, high-velocity reads
•e.g. eBay, Groupon
Thank You
