Chapter 09: Deploying HBase
HBase IN ACTION
by Nick Dimiduk et al.
Overview: Deploying HBase
Planning your cluster
Deploying software
Distributions
Configuration
Managing the daemons
Summary
09/24/15
9.1 Planning your cluster
 Planning an HBase cluster includes planning the underlying Hadoop
cluster.
 This section will highlight the considerations to keep in mind when
choosing hardware and how the roles (HBase Master,
RegionServers, ZooKeeper, and so on) should be deployed on the
cluster.
 Prototype cluster
 A prototype cluster is one that doesn’t have strict SLAs, and it’s okay for it to go down.
 Collocate the HBase Master with the Hadoop NameNode and JobTracker on
the same node.
 It typically has fewer than 10 nodes.
 It’s okay to collocate multiple services on a single node in a prototype cluster.
 4–6 cores, 24–32 GB RAM, and 4 disks per node should be a good place to
start.
9.1 Planning your cluster (cont'd)
 Small production cluster (10–20 servers) : Generally, you shouldn’t have
fewer than 10 nodes in a production HBase cluster.
 A cluster with fewer than 10 slave nodes is hard to operationalize.
 Consider relatively better hardware for the Master nodes if you’re
deploying a production cluster. Dual power supplies and perhaps RAID
are the order of the day.
 Small production clusters with not much traffic/workload can have
services collocated.
 A single HBase Master is okay for small clusters.
 A single ZooKeeper is okay for small clusters and can be collocated with
the HBase Master. If the host running the NameNode and JobTracker is
beefy enough, put ZooKeeper and HBase Master on it too. This will save
you having to buy an extra machine.
 A single HBase Master and ZooKeeper limits serviceability.
9.1 Planning your cluster (cont'd)
Medium production cluster (up to ~50 servers)
 Up to 50 nodes, possibly in production, would fall in this category.
 We recommend that you not collocate HBase and MapReduce for
performance reasons. If you do collocate, deploy NameNode and
JobTracker on separate hardware.
 Three ZooKeeper and three HBase Master nodes should be deployed,
especially if this is a production system. You don’t need three HBase
Masters and can do with two; but given that you already have three
ZooKeeper nodes and are sharing ZooKeeper and HBase Master, it
doesn’t hurt to have a third Master.
 Don’t cheap out on the hardware for the NameNode and Secondary
NameNodes.
9.1 Planning your cluster (cont'd)
Large production cluster (>~50 servers)
 Everything for the medium-sized cluster holds true, except that you may
need five ZooKeeper instances that can also collocate with HBase
Masters.
 Make sure NameNode and Secondary NameNode have enough memory,
depending on the storage capacity of the cluster.
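The cluster tiers described so far can be summarized in a small helper. This is only a sketch: the function name and output format are hypothetical, the thresholds follow the slides, and the ZooKeeper and Master counts for the prototype tier are an assumption because the slides only say that services may be collocated there.

```shell
# Hypothetical helper: map a planned node count to the tiers from the slides.
# Prints the tier plus suggested ZooKeeper and HBase Master counts.
recommend_layout() {
  nodes=$1
  if [ "$nodes" -lt 10 ]; then
    # Counts for a prototype are an assumption; the slides give none.
    echo "tier=prototype zookeepers=1 masters=1"
  elif [ "$nodes" -le 20 ]; then
    # A single Master and a single ZooKeeper are okay for small clusters.
    echo "tier=small zookeepers=1 masters=1"
  elif [ "$nodes" -le 50 ]; then
    # Three of each; two Masters would also do.
    echo "tier=medium zookeepers=3 masters=3"
  else
    # Same as medium, but possibly five ZooKeeper instances.
    echo "tier=large zookeepers=5 masters=3"
  fi
}

recommend_layout 8
recommend_layout 15
recommend_layout 40
recommend_layout 120
```

The boundaries are approximate in the slides ("up to ~50"), so treat the cutoffs as starting points rather than hard rules.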
Hadoop Master nodes
 Have redundancy at the hardware level for the various components:
NICs, RAID disks
 The NameNode needs enough RAM to address the entire namespace.
 The Secondary NameNode should have the same hardware as the
NameNode.
9.1 Planning your cluster (cont'd)
HBase Master
 HBase Master is a lightweight process and doesn’t need a lot of resources,
but it’s wise to keep it on independent hardware if possible.
 Have multiple HBase Masters for redundancy.
 cores, 8–16 GB RAM, and 2 disks are more than enough for the HBase
Master nodes.
Hadoop DataNodes and HBase RegionServers
 DataNodes and RegionServers are always collocated. They serve the
traffic. Avoid running MapReduce on the same nodes.
 8–12 cores, 24–32 GB RAM, 12x 1 TB disks is a good place to start.
 You can increase the number of disks for higher storage density, but
don’t go too high or replication will take time in the face of node or disk
failure.
 Get a larger number of reasonably sized boxes instead of fewer beefy
ones.
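To see why very dense nodes slow down recovery, a rough back-of-the-envelope calculation helps. The sketch below assumes the failed node's disks were full and that the cluster can re-replicate at a fixed aggregate rate; the function name and the 1000 MB/s figure are illustrative assumptions, not numbers from the book.

```shell
# Worst-case minutes to re-replicate the data held by one failed node.
# $1 = disks per node, $2 = TB per disk, $3 = assumed aggregate copy rate in MB/s
rereplication_minutes() {
  data_mb=$(( $1 * $2 * 1000000 ))   # decimal TB -> MB, assuming full disks
  echo $(( data_mb / $3 / 60 ))      # integer minutes
}

echo "12 x 1 TB at 1000 MB/s: $(rereplication_minutes 12 1 1000) min"   # 200 min
echo "24 x 1 TB at 1000 MB/s: $(rereplication_minutes 24 1 1000) min"   # 400 min
```

Doubling the disks per node doubles the data that has to be copied after a node failure, which is the reasoning behind preferring more, reasonably sized boxes.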
9.1 Planning your cluster (cont'd)
ZooKeeper(s)
 ZooKeepers are lightweight but latency sensitive.
 Hardware similar to that of the HBase Master works fine if you’re looking to
deploy them separately.
 HBase Master and ZooKeeper can be collocated safely as long as you make
sure ZooKeeper gets a dedicated spindle for its data persistence.
 Add a disk (for the ZooKeeper data to be persisted on) to the configuration
mentioned in the HBase Master section if you’re collocating.
9.1 Planning your cluster (cont'd)
What about the cloud?
 At least 16 GB RAM. HBase RegionServers are RAM hungry. But don’t give
them too much, or you’ll run into Java GC issues. We’ll talk about tuning GC
later in this chapter.
 Have as many disks as possible. Most EC2 instances at the time of writing
don’t provide a high number of disks.
 A fatter network is always better.
 Get ample compute based on your individual use case. MapReduce jobs need
more compute power than a simple website-serving database.
 It’s important that you’re aware of the arguments in favor of and against
deploying HBase in the cloud.
 Cost
 Ease of use
 Operations
 Reliability
 Lack of customization
 Performance
 Security
9.2 Deploying software
Managing and deploying on a cluster of machines, especially
in production, is nontrivial and needs careful work.
When deploying to a large number of machines, we
recommend that you automate the process as much as
possible.
Our intent is to introduce you to all the ways you can think
about deployments.
 Whirr: deploying in the cloud : If you’re looking to deploy HBase in the
cloud, you should get Apache Whirr to make your life easier.
9.3 Distributions
 This section will cover installing HBase on your cluster. Numerous
distributions (or packages) of HBase are available, and each has
multiple releases. The most notable distributions currently are the
stock Apache distribution and Cloudera’s CDH:
 Apache : The Apache HBase project is the parent project where all the
development for HBase happens.
 Cloudera’s CDH : Cloudera is a company that has its own distribution
containing Hadoop and other components in the ecosystem, including
HBase.
 We recommend using Cloudera’s CDH distribution. It typically includes
more patches than the stock releases to add stability, performance
improvements, and sometimes features.
 CDH is also better tested than the Apache releases and is running in
production in more clusters than stock Apache. These are points we
recommend thinking about before you choose the distribution for your
cluster.
9.3.1 Using the stock Apache distribution
To install the stock Apache distribution, you need to download
the tarballs and install those into a directory of your choice.
9.3.2 Using Cloudera’s CDH distribution
 The current release for CDH is CDH4u0, which is based on the 0.92.1 Apache
release. The installation instructions are environment specific; the
fundamental steps are as follows:
9.4 Configuration
Deploying HBase requires configuring Linux, Hadoop, and, of
course, HBase.
In order to configure the system in the most optimal manner,
it’s important that you understand the parameters and the
implications of tuning them one way or another.
9.4.1 HBase configurations
 Environment configurations: hbase-env.sh is where things like the Java heap
size, garbage-collection parameters, and other environment variables are set.
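As a rough illustration, an hbase-env.sh fragment might look like the following. The variable names are the ones HBase reads from this file; the values are placeholders rather than recommendations, and the GC flag assumes a JVM that still ships the CMS collector.

```shell
# Fragment for conf/hbase-env.sh. Values are illustrative placeholders.
export HBASE_HEAPSIZE=8000                    # daemon heap; MB in HBase releases of this era
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"   # GC flag, assumes a JVM that still includes CMS
export HBASE_MANAGES_ZK=false                 # ZooKeeper runs as its own ensemble, not started by HBase
```

Setting HBASE_MANAGES_ZK to false matches the deployments described earlier, where ZooKeeper is run as a separate service.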
9.4.1 HBase configurations (cont'd)
 The configuration parameters for
HBase daemons are put in an XML
file called hbase-site.xml.
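A minimal hbase-site.xml for a fully distributed cluster could be generated as sketched here. The three property names are standard HBase settings; the hostnames, the NameNode port, and the output directory are placeholders chosen for this example.

```shell
# Sketch: write a minimal hbase-site.xml for a fully distributed cluster.
# Hostnames, port, and CONF_DIR are placeholders, not values from the book.
CONF_DIR=${CONF_DIR:-./hbase-conf-example}
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/hbase-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.com:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/hbase-site.xml"
```

The same file has to be present on every node that runs an HBase daemon, which is one reason to automate its distribution.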
9.4.2 Hadoop configuration parameters relevant to HBase
9.4.3 Operating system configurations
 HBase is a database and needs to keep files open so you can read from and
write to them without incurring the overhead of opening and closing them
on each operation.
 To increase the open-file and process limits for the user, put the following
statements in your /etc/security/limits.conf file for the user that will run the
Hadoop and HBase daemons (the - type sets both the soft and the hard limit).
CDH does this for you as a part of the package installation:
hadoopuser - nofile 32768
hbaseuser - nofile 32768
hadoopuser - nproc 32000
hbaseuser - nproc 32000
Another important configuration parameter to tune is the swap behavior.
$ sysctl -w vm.swappiness=0
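Note that sysctl -w does not survive a reboot; adding vm.swappiness = 0 to /etc/sysctl.conf is the usual way to persist it. A quick way to confirm that the limits and the swap setting took effect is sketched below. The function name and the REQUIRED_NOFILE variable are hypothetical; the script should be run as the user that owns the daemons.

```shell
# Sketch: verify the open-file limit and swap setting for the current user.
# REQUIRED_NOFILE mirrors the 32768 value used above; adjust as needed.
REQUIRED_NOFILE=${REQUIRED_NOFILE:-32768}

check_nofile() {
  required=$1
  current=$(ulimit -n)
  if [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]; then
    echo "nofile ok: $current"
  else
    echo "nofile too low: $current (want >= $required)"
    return 1
  fi
}

# Report only; do not abort if the limit is below the target.
check_nofile "$REQUIRED_NOFILE" || true

# vm.swappiness is exposed under /proc on Linux; skip quietly elsewhere.
if [ -r /proc/sys/vm/swappiness ]; then
  echo "vm.swappiness = $(cat /proc/sys/vm/swappiness)"
fi
```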
9.5 Managing the daemons
The relevant services need to be started on each node of the
cluster :
 Use the bundled start and stop scripts.
 Cluster SSH (http://sourceforge.net/projects/clusterssh) is a useful tool if
you’re dealing with a cluster of machines. It allows you to simultaneously run
the same shell commands on a cluster of machines that you’re logged in to in
separate windows.
 Homegrown scripts are always an option.
 Use management software like Cloudera Manager that allows you to manage
all the services on the cluster from a single web-based UI.
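As an example of the homegrown-script option, the sketch below loops over a list of hosts and starts or stops the RegionServer on each through the bundled hbase-daemon.sh script. The function name, the install path, the hostnames, and the dry-run switch are all assumptions made for illustration; by default it only prints the commands it would run.

```shell
# Hypothetical homegrown helper: start or stop RegionServers on a list of hosts.
HBASE_HOME=${HBASE_HOME:-/opt/hbase}   # assumed install path
DRY_RUN=${DRY_RUN:-1}                  # 1 = print commands only, 0 = run them over ssh

regionserver_ctl() {
  action=$1   # start or stop
  shift
  for host in "$@"; do
    cmd="ssh $host $HBASE_HOME/bin/hbase-daemon.sh $action regionserver"
    if [ "$DRY_RUN" = "1" ]; then
      echo "$cmd"
    else
      $cmd
    fi
  done
}

# Placeholder hostnames; with DRY_RUN=1 this only prints the ssh commands.
regionserver_ctl start rs1.example.com rs2.example.com
```

For whole-cluster operations, the bundled bin/start-hbase.sh and bin/stop-hbase.sh scripts cover the common case; a loop like this is mainly useful for rolling changes on a subset of nodes.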
9.6 Summary
In this chapter, we covered the various aspects of deploying
HBase in a fully distributed environment for your production
application.
 We talked about the considerations to take into account when choosing
hardware for your cluster, including whether to deploy on your own
hardware or in the cloud.
 This chapter gets you ready to think about putting HBase in production.

Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Akanksha trivedi rama nursing college kanpur.
 
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdfMASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
goswamiyash170123
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
chanes7
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 

Recently uploaded (20)

Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
Multithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race conditionMultithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race condition
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdfMASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 

Hbase in action - Chapter 09: Deploying HBase

  • 1. Chapter 09: Deploying HBase. HBase in Action, by Nick Dimiduk et al.
  • 2. Overview: Deploying HBase
      - Planning your cluster
      - Deploying software
      - Distributions
      - Configuration
      - Managing the daemons
      - Summary
  • 3. 09/24/15 9.1 Planning your cluster
      - Planning an HBase cluster includes planning the underlying Hadoop cluster.
      - This section highlights the considerations to keep in mind when choosing hardware and how the roles (HBase Master, RegionServers, ZooKeeper, and so on) should be deployed on the cluster.
      - Prototype cluster:
      - A prototype cluster is one that doesn’t have strict SLAs, and it’s okay for it to go down.
      - Collocate the HBase Master with the Hadoop NameNode and JobTracker on the same node.
      - It typically has fewer than 10 nodes.
      - It’s okay to collocate multiple services on a single node in a prototype cluster.
      - 4–6 cores, 24–32 GB RAM, and 4 disks per node should be a good place to start.
  • 4. 9.1 Planning your cluster (cont'd)
      - Small production cluster (10–20 servers): Generally, you shouldn’t have fewer than 10 nodes in a production HBase cluster.
      - A cluster with fewer than 10 slave nodes is hard to operationalize.
      - Consider relatively better hardware for the Master nodes if you’re deploying a production cluster. Dual power supplies and perhaps RAID are the order of the day.
      - Small production clusters with not much traffic/workload can have services collocated.
      - A single HBase Master is okay for small clusters.
      - A single ZooKeeper is okay for small clusters and can be collocated with the HBase Master. If the host running the NameNode and JobTracker is beefy enough, put ZooKeeper and HBase Master on it too. This will save you having to buy an extra machine.
      - A single HBase Master and a single ZooKeeper limit serviceability.
  • 5. Hbase Course Data Manipulation at Scale: Systems and Algorit Using HBase for Real-time Access to Your Big Da
  • 6. 9.1 Planning your cluster (cont'd): Medium production cluster (up to ~50 servers)
      - Up to 50 nodes, possibly in production, would fall in this category.
      - We recommend that you not collocate HBase and MapReduce for performance reasons. If you do collocate, deploy NameNode and JobTracker on separate hardware.
      - Three ZooKeeper and three HBase Master nodes should be deployed, especially if this is a production system. You don’t need three HBase Masters and can do with two; but given that you already have three ZooKeeper nodes and are sharing ZooKeeper and HBase Master, it doesn’t hurt to have a third Master.
      - Don’t cheap out on the hardware for the NameNode and Secondary NameNodes.
  • 7. 9.1 Planning your cluster (cont'd): Large production cluster (>~50 servers)
      - Everything for the medium-sized cluster holds true, except that you may need five ZooKeeper instances that can also collocate with HBase Masters.
      - Make sure NameNode and Secondary NameNode have enough memory, depending on the storage capacity of the cluster.
    Hadoop Master nodes
      - Have redundancy at the hardware level for the various components: NICs, RAID disks.
      - NameNode: make sure there is enough RAM to address the entire namespace.
      - The Secondary NameNode should have the same hardware as the NameNode.
  • 8. 9.1 Planning your cluster (cont'd): HBase Master
      - HBase Master is a lightweight process and doesn’t need a lot of resources, but it’s wise to keep it on independent hardware if possible.
      - Have multiple HBase Masters for redundancy.
      - cores, 8–16 GB RAM, and 2 disks are more than enough for the HBase Master nodes.
    Hadoop DataNodes and HBase RegionServers
      - DataNodes and RegionServers are always collocated. They serve the traffic. Avoid running MapReduce on the same nodes.
      - 8–12 cores, 24–32 GB RAM, and 12x 1 TB disks are a good place to start.
      - You can increase the number of disks for higher storage density, but don’t go too high or replication will take time in the face of node or disk failure.
      - Get a larger number of reasonably sized boxes instead of fewer beefy ones.
  • 9. 9.1 Planning your cluster (cont'd): ZooKeeper(s)
      - ZooKeepers are lightweight but latency sensitive.
      - Hardware similar to that of the HBase Master works fine if you’re looking to deploy them separately.
      - HBase Master and ZooKeeper can be collocated safely as long as you make sure ZooKeeper gets a dedicated spindle for its data persistence.
      - Add a disk (for the ZooKeeper data to be persisted on) to the configuration mentioned in the HBase Master section if you’re collocating.
  • 10. 9.1 Planning your cluster (cont'd): What about the cloud?
      - At least 16 GB RAM. HBase RegionServers are RAM hungry. But don’t give them too much, or you’ll run into Java GC issues. We’ll talk about tuning GC later in this chapter.
      - Have as many disks as possible. Most EC2 instances at the time of writing don’t provide a high number of disks.
      - A fatter network is always better.
      - Get ample compute based on your individual use case. MapReduce jobs need more compute power than a simple website-serving database.
      - It’s important that you’re aware of the arguments in favor of and against deploying HBase in the cloud: cost, ease of use, operations, reliability, lack of customization, performance, and security.
  • 11. 9.2 Deploying software
      - Managing and deploying on a cluster of machines, especially in production, is nontrivial and needs careful work. When deploying to a large number of machines, we recommend that you automate the process as much as possible. Our intent is to introduce you to all the ways you can think about deployments.
      - Whirr: deploying in the cloud. If you’re looking to deploy HBase in the cloud, you should get Apache Whirr to make your life easier.
  • 12. 9.3 Distributions
      - This section covers installing HBase on your cluster. Numerous distributions (or packages) of HBase are available, and each has multiple releases. The most notable distributions currently are the stock Apache distribution and Cloudera’s CDH:
      - Apache: The Apache HBase project is the parent project where all the development for HBase happens.
      - Cloudera’s CDH: Cloudera is a company that has its own distribution containing Hadoop and other components in the ecosystem, including HBase.
      - We recommend using Cloudera’s CDH distribution. It typically includes more patches than the stock releases to add stability, performance improvements, and sometimes features.
      - CDH is also better tested than the Apache releases and is running in production in more clusters than stock Apache. These are points we recommend thinking about before you choose the distribution for your cluster.
  • 13. 9.3.1 Using the stock Apache distribution
      - To install the stock Apache distribution, you need to download the tarballs and install those into a directory of your choice.
  • 14. 9.3.2 Using Cloudera’s CDH distribution
      - The current release for CDH is CDH4u0, which is based on the 0.92.1 Apache release. The installation instructions are environment specific; the fundamental steps are as follows:
  • 15. 9.4 Configuration
      - Deploying HBase requires configuring Linux, Hadoop, and, of course, HBase. To configure the system optimally, it’s important that you understand the parameters and the implications of tuning them one way or another.
  • 16. 9.4.1 HBase configurations
      - ENVIRONMENT CONFIGURATIONS: hbase-env.sh. Things like the Java heap size, garbage-collection parameters, and other environment variables are set here.
  • 17. 9.4.1 HBase configurations (cont'd)
      - The configuration parameters for HBase daemons are put in an XML file called hbase-site.xml.
  • 19. 9.4.2 Hadoop configuration parameters relevant to HBase
  • 20. 9.4.3 Operating system configurations
      - HBase is a database and needs to keep files open so you can read from and write to them without incurring the overhead of opening and closing them on each operation.
      - To increase the open-file limit, put the following statements in your /etc/security/limits.conf file for the users that will run the Hadoop and HBase daemons (the - type sets both the soft and the hard limit). CDH does this for you as a part of the package installation:
            hadoopuser  -  nofile  32768
            hbaseuser   -  nofile  32768
            hadoopuser  -  nproc   32000
            hbaseuser   -  nproc   32000
      - Another important configuration parameter to tune is the swap behavior:
            $ sysctl -w vm.swappiness=0
  • 21. 9.5 Managing the daemons
    The relevant services need to be started on each node of the cluster:
      - Use the bundled start and stop scripts.
      - Cluster SSH (http://sourceforge.net/projects/clusterssh) is a useful tool if you’re dealing with a cluster of machines. It allows you to simultaneously run the same shell commands on a cluster of machines that you’re logged in to in separate windows.
      - Homegrown scripts are always an option.
      - Use management software like Cloudera Manager that allows you to manage all the services on the cluster from a single web-based UI.
  • 22. Hbase Course Data Manipulation at Scale: Systems and Algorit Using HBase for Real-time Access to Your Big Da
  • 23. 9.6 Summary
    In this chapter, we covered the various aspects of deploying HBase in a fully distributed environment for your production application.
      - We talked about the considerations to take into account when choosing hardware for your cluster, including whether to deploy on your own hardware or in the cloud.
      - This chapter gets you ready to think about putting HBase in production.
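
The cluster tiers from section 9.1 can be condensed into a small lookup. The following is only a sketch of that guidance, not something taken from the book: the names `Plan` and `plan_cluster` are hypothetical, the single ZooKeeper and Master for a prototype cluster is an assumption (the slides do not give a count), and the large tier takes the "may need five ZooKeeper instances" remark as a fixed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical container for the per-tier recommendations."""
    tier: str
    zookeepers: int
    hbase_masters: int


def plan_cluster(nodes: int) -> Plan:
    """Map a node count to the tier guidance summarized in section 9.1."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if nodes < 10:
        # Prototype: no strict SLAs. The counts here are an assumption.
        return Plan("prototype", 1, 1)
    if nodes <= 20:
        # Small production: a single Master and a single ZooKeeper are okay.
        return Plan("small production", 1, 1)
    if nodes <= 50:
        # Medium production: three ZooKeepers; two Masters suffice, three don't hurt.
        return Plan("medium production", 3, 3)
    # Large production: as medium, but possibly five ZooKeepers.
    return Plan("large production", 5, 3)


if __name__ == "__main__":
    for n in (4, 15, 40, 120):
        print(n, plan_cluster(n))
```

The boundaries (10, 20, 50) follow the slide headings; the source marks the upper ones as approximate ("~50"), so treat them as rough cut-offs rather than hard rules.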

Editor's Notes

  1. This assumes you aren’t collocating MapReduce with HBase, which is the recommended way of running HBase if you’re using it for low-latency access. Collocating the two would require more cores, RAM, and spindles.
  3. Two of the important things configured here are the memory allocation and GC. It’s critical to pay attention to these if you want to extract decent performance from your HBase deployment. HBase is a database and needs lots of memory to provide low-latency reads and writes. We don’t recommend that you give the RegionServers more than 15 GB of heap in a production HBase deployment, because beyond that size GC starts to become too expensive.
     -Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
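
As a rough illustration of the heap and GC note above, the flag string can be generated from a heap size while enforcing the 15 GB ceiling. This is a minimal sketch under stated assumptions: `regionserver_opts` and `MAX_RECOMMENDED_HEAP_GB` are hypothetical names, and the flags are copied from the note as-is, so they target the older JVMs that still ship the CMS collector.

```python
MAX_RECOMMENDED_HEAP_GB = 15  # ceiling suggested in the note above


def regionserver_opts(heap_gb: int, young_gen_mb: int = 128,
                      cms_occupancy: int = 70) -> str:
    """Build a JVM option string in the shape shown in the note."""
    if heap_gb < 1:
        raise ValueError("heap must be at least 1 GB")
    if heap_gb > MAX_RECOMMENDED_HEAP_GB:
        raise ValueError(
            f"heap of {heap_gb} GB exceeds the {MAX_RECOMMENDED_HEAP_GB} GB ceiling"
        )
    if not 0 < cms_occupancy <= 100:
        raise ValueError("CMS occupancy fraction must be a percentage in (0, 100]")
    flags = [
        f"-Xmx{heap_gb}g",  # maximum heap
        f"-Xms{heap_gb}g",  # initial heap, pinned to the maximum
        f"-Xmn{young_gen_mb}m",  # young generation size
        "-XX:+UseParNewGC",
        "-XX:+UseConcMarkSweepGC",
        f"-XX:CMSInitiatingOccupancyFraction={cms_occupancy}",
    ]
    return " ".join(flags)


if __name__ == "__main__":
    print(regionserver_opts(8))
```

A string like this is the kind of value that would be assigned to HBASE_REGIONSERVER_OPTS in hbase-env.sh; the defaults simply reproduce the example values from the note and are not tuned for any particular workload.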