Configure Hadoop, HBase, and the HBase Client

Configure Hadoop:

Prerequisites:
Check your IP setup
If you are on Ubuntu, check your /etc/hosts file. If it contains something like:
127.0.0.1 localhost
127.0.1.1 <server FQDN> <server name, as in /etc/hostname>

remove the second line and replace it with an entry for the machine's real IP address:

127.0.0.1 localhost
<server ip> <server FQDN> <server name, as in /etc/hostname>

e.g.
127.0.0.1 localhost
23.201.99.100 shashwat

If you don’t do this, the region servers will resolve their addresses to
127.0.1.1. This information will be stored inside the ZooKeeper
instance that HBase runs (the directory and lock manager used by
the system to configure and synchronize a running HBase cluster).
When manipulating remote HBase data, client code libraries actually
connect to ZooKeeper to find the address of the region server
maintaining the data. In this case, they will be given 127.0.1.1, which
resolves to the client machine itself, so the connection fails.
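
To confirm that the hosts file change took effect, one option is to ask the JVM how it resolves the local host name. The following is a minimal sketch using only the Java standard library; the class name HostCheck and the helper isLoopback are illustrative names, not part of Hadoop or HBase.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for illustration; not part of Hadoop or HBase.
class HostCheck {

    // True if the IP literal or host name maps to a loopback address
    // such as 127.0.0.1 or 127.0.1.1.
    static boolean isLoopback(String hostOrIp) {
        try {
            return InetAddress.getByName(hostOrIp).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve " + hostOrIp, e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Resolve this machine's own host name through the JVM.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println(local.getHostName() + " -> " + local.getHostAddress());
        if (local.isLoopbackAddress()) {
            System.out.println("Host name resolves to loopback; check /etc/hosts.");
        }
    }
}
```

If the program reports a loopback address on the server, remote clients are likely to be handed an address they cannot use.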

Sun Java 6
1. Add the Canonical Partner Repository to your apt repositories:
$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"

2. Update the source list
$ sudo apt-get update

3. Install sun-java6-jdk
$ sudo apt-get install sun-java6-jdk

4. Select Sun’s Java as the default on your machine.
$ sudo update-java-alternatives -s java-6-sun

If you are installing on Ubuntu 12.04, use the following instead:

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer

Adding a dedicated Hadoop system user

We will use a dedicated Hadoop user account for running Hadoop. While
that’s not required it is recommended because it helps to separate the
Hadoop installation from other software applications and user accounts
running on the same machine (think: security, permissions, backups, etc).
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

This will add the user hduser and the group hadoop to your local machine.

Configuring SSH
1. su - hduser
2. ssh-keygen -t rsa -P ""
3. cat $HOME/.ssh/id_rsa.pub >>$HOME/.ssh/authorized_keys

4. ssh shashwat

If you can connect to shashwat without being asked for a password, SSH is
configured correctly. Otherwise, delete the .ssh folder in the user's home
directory and configure SSH again.

If the SSH connection fails, these general tips might help:
Enable debugging with ssh -vvv shashwat and investigate the error in detail.
Check the SSH server configuration in /etc/ssh/sshd_config, in particular the
options PubkeyAuthentication (which should be set to yes) and AllowUsers (if
this option is active, add the hduser user to it). If you made any changes to the
SSH server configuration file, you can force a configuration reload with sudo
/etc/init.d/ssh reload.

Disabling IPv6
One problem with IPv6 on Ubuntu is that using 0.0.0.0 for the various
networking-related Hadoop configuration options will result in Hadoop
binding to the IPv6 addresses of my Ubuntu box.
In my case, I realized that there’s no practical point in enabling IPv6 on a
box when you are not connected to any IPv6 network. Hence, I simply
disabled IPv6 on my Ubuntu machine. Your mileage may vary.
To disable IPv6 on Ubuntu 10.04 LTS, open /etc/sysctl.conf in the editor of
your choice and add the following lines to the end of the file:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

You have to reboot your machine in order to make the changes take effect.
You can check whether IPv6 is enabled on your machine with the following
command:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6

A return value of 0 means IPv6 is enabled, a value of 1 means disabled
(that’s what we want).

Alternative
You can also disable IPv6 only for Hadoop, as documented in
HADOOP-3437, by adding the following line to conf/hadoop-env.sh:
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

Hadoop

Installation
You have to download Hadoop from the Apache Download Mirrors and
extract the contents of the Hadoop package to a location of your choice. I
picked /usr/local/hadoop. Make sure to change the owner of all the files to
the hduser user and hadoop group, for example:
$ cd /usr/local
$ sudo tar xzf hadoop-0.20.2.tar.gz
$ sudo mv hadoop-0.20.2 hadoop
$ sudo chown -R hduser:hadoop hadoop

Hadoop-Configuration

hadoop-env.sh
The only required environment variable we have to configure for Hadoop
in this tutorial is JAVA_HOME. Open conf/hadoop-env.sh in the editor of
your choice (if you used the installation path in this tutorial, the full path
is /usr/local/hadoop/conf/hadoop-env.sh) and set the JAVA_HOME
environment variable to the Sun JDK/JRE 6 directory.

Change
# The java implementation to use. Required.

# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

to
# The java implementation to use. Required.

export JAVA_HOME=/usr/lib/jvm/java-6-sun

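Add the following snippets between the <configuration> </configuration>
tags in the respective configuration XML file. Make sure hduser can create
and write to the directory named in hadoop.tmp.dir (/app/hadoop/tmp below)
before you format the NameNode.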

In file conf/core-site.xml:

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://shashwat:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>

In file conf/mapred-site.xml:

<property>
<name>mapred.job.tracker</name>
<value>shashwat:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>

In file conf/hdfs-site.xml:

<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is
created.
The default is used if replication is not specified in create time.
</description>
</property>
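
As a quick sanity check on the edited files, the property names and values can be read back with the JDK's built-in XML parser. This is a sketch only: SiteConfigReader and readProperties are hypothetical names, and the inline XML in main stands in for the contents of conf/core-site.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of Hadoop.
class SiteConfigReader {

    // Collects every <property> that has both a <name> and a <value>.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                Node name = property.getElementsByTagName("name").item(0);
                Node value = property.getElementsByTagName("value").item(0);
                if (name != null && value != null) {
                    props.put(name.getTextContent().trim(), value.getTextContent().trim());
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable configuration file", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration><property>"
                + "<name>fs.default.name</name>"
                + "<value>hdfs://shashwat:54310</value>"
                + "</property></configuration>";
        readProperties(xml).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A malformed file (for example an unclosed tag) makes the parser fail, which is a cheap way to catch typos before starting the daemons.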

Formatting the HDFS filesystem via the NameNode
hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format

Starting your single-node cluster
hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh

A nifty tool for checking whether the expected Hadoop processes are running is
jps. If its output lists the following processes:
TaskTracker
JobTracker
DataNode
SecondaryNameNode
Jps
NameNode
then you can be sure that Hadoop is configured correctly and running.

Making sure Hadoop is working
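You can find the Hadoop logs in the logs directory of your Hadoop installation
(/usr/local/hadoop/logs with the installation path used above; the commands in the
rest of this document assume an installation under ~/work/hadoop, so adjust the
paths accordingly).
You should be able to see the Hadoop NameNode web interface at
http://shashwat:50070/ and the JobTracker web interface at
http://shashwat:50030/. If not, check that you have 5 Java processes running, each
with one of the following class names at the end of its command line (as
seen from a ps ax | grep hadoop command):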
org.apache.hadoop.mapred.JobTracker
org.apache.hadoop.hdfs.server.namenode.NameNode
org.apache.hadoop.mapred.TaskTracker
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
org.apache.hadoop.hdfs.server.datanode.DataNode

If you do not see these 5 processes, check the logs in ~/work/hadoop/logs/*.{out,log}
for messages that might give you a hint as to what went wrong.
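
When a browser is not at hand, a plain TCP connection attempt is enough to tell whether the web interfaces are listening. The sketch below uses only the standard library; PortCheck and isOpen are hypothetical names, and the host name and ports are the ones used in this document.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of Hadoop.
class PortCheck {

    // True if a TCP connection to host:port succeeds within the timeout.
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or host name could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "shashwat";
        int[] ports = {50070, 50030}; // NameNode and JobTracker web interfaces
        for (int port : ports) {
            System.out.println(host + ":" + port + " open? " + isOpen(host, port, 2000));
        }
    }
}
```

A closed port only says that nothing is listening there; the log files remain the place to find out why.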

Run some example map/reduce jobs
The Hadoop distribution comes with some example MapReduce jobs. Here we’ll run
one of them to make sure things are working end to end.
cd ~/work/hadoop
# Copy the input files into the distributed filesystem
# (there will be no output visible from the command):
bin/hadoop fs -put conf input
# Run some of the examples provided:
# (there will be a large amount of INFO statements as output)
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
# Examine the output files:
bin/hadoop fs -cat output/part-00000

The resulting output should be something like:
3 dfs.class
2 dfs.period
1 dfs.file
1 dfs.replication
1 dfs.servers
1 dfsadmin
1 dfsmetrics.log

Configure HBase:

The following config files all reside in ~/work/hbase/conf. Use an FQDN or a
Bonjour name instead of shashwat if you need remote clients to access HBase.
If you don’t use shashwat here, make sure you make the same change
in the Hadoop config.

hbase-env.sh
Add the following line below the commented-out JAVA_HOME line in hbase-env.sh.
The path shown is the Mac OS X location; on Ubuntu, use the same value as in
hadoop-env.sh (e.g. /usr/lib/jvm/java-6-sun):
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home

Add the following line below the commented-out HBASE_CLASSPATH= line:
export HBASE_CLASSPATH=${HOME}/work/hadoop/conf

hbase-site.xml
The host and port in hbase.rootdir must match the NameNode address set in
fs.default.name in conf/core-site.xml (hdfs://shashwat:54310 in this document).
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://shashwat:54310/hbase</value>
<description>The directory shared by region servers.
</description>
</property>
</configuration>
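
Because HBase stores its data in HDFS, hbase.rootdir has to name the same NameNode host and port as fs.default.name in the Hadoop configuration. The following sketch compares the two values with java.net.URI; RootDirCheck and sameNameNode are hypothetical names used only for illustration.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of Hadoop or HBase.
class RootDirCheck {

    // True if hbase.rootdir is an hdfs:// URI on the same host and port
    // as the file system named in fs.default.name.
    static boolean sameNameNode(String fsDefaultName, String hbaseRootDir) {
        URI fs = URI.create(fsDefaultName);
        URI root = URI.create(hbaseRootDir);
        return "hdfs".equals(root.getScheme())
                && fs.getHost() != null
                && fs.getHost().equalsIgnoreCase(root.getHost())
                && fs.getPort() == root.getPort();
    }

    public static void main(String[] args) {
        String fsDefaultName = "hdfs://shashwat:54310";
        // Matches the NameNode address, so this prints true.
        System.out.println(sameNameNode(fsDefaultName, "hdfs://shashwat:54310/hbase"));
        // Different port, so this prints false.
        System.out.println(sameNameNode(fsDefaultName, "hdfs://shashwat:9000/hbase"));
    }
}
```

A mismatch here usually shows up as the HMaster failing to start, with connection errors in the HBase logs.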

Making Sure HBase is Working
If you do a ps ax | grep hbase you should see two java processes. One should
end with:
org.apache.hadoop.hbase.zookeeper.HQuorumPeer start
And the other should end with:
org.apache.hadoop.hbase.master.HMaster start

Since we are running in the Pseudo-Distributed mode, there will not be any
explicit regionservers running. If you have problems, check the logs in
~/work/hbase/logs/*.{out,log}

Testing HBase using the HBase Shell
From the unix prompt give the following command:
~/work/hbase/bin/hbase shell

Here are some example commands from the Apache HBase Installation
Instructions:
hbase> # Type "help" to see shell help screen
hbase> help
hbase> # To create a table named "mylittletable" with a column family of "mylittlecolumnfamily", type
hbase> create "mylittletable", "mylittlecolumnfamily"
hbase> # To see the schema for the "mylittletable" table you just created and its single "mylittlecolumnfamily", type
hbase> describe "mylittletable"
hbase> # To add a row whose id is "myrow", to the column "mylittlecolumnfamily:x" with a value of "v", do
hbase> put "mylittletable", "myrow", "mylittlecolumnfamily:x", "v"
hbase> # To get the cell just added, do
hbase> get "mylittletable", "myrow"
hbase> # To scan your new table, do
hbase> scan "mylittletable"

You can stop hbase with the command:
~/work/hbase/bin/stop-hbase.sh

Once that has stopped you can stop hadoop:
~/work/hadoop/bin/stop-all.sh

Setting up the HBase Client (Accessing HBase Remotely):

Add the following to hbase-site.xml:

<property>
<name>hbase.rootdir</name>
<value>hdfs://<domain address>:54310/hbase</value>
<description>Must use the same host and port as fs.default.name in
conf/core-site.xml.</description>
</property>
<property>
<name>hbase.master</name>
<value>shashwat:60000</value>
<description>The host and port that the HBase master runs at.</description>
</property>
<property>
<name>hbase.regionserver.port</name>
<value>60020</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/home/shashwat/Hadoop/hbase-0.90.4/temp</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>shashwat</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description>Property from ZooKeeper's config zoo.cfg.
The port at which the clients will connect.
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/<user>/zookeeper</value>
<description>Property from ZooKeeper's config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>

After adding the above properties to hbase-site.xml, start HBase and check that
the HMaster is running using the jps command in the shell.

Next, add the IP address and host name of the HBase master to the hosts file of
every client machine that needs to connect to HBase remotely.

Suppose hbase.master is running on 192.168.2.125 and the host name is
shashwat.

Windows: C:\Windows\System32\drivers\etc\hosts
Linux: /etc/hosts

Open the file with administrator permissions and add the line
192.168.2.125 shashwat
(the server where hbase.master is running), then save the file and exit.

Building the HBase client:
The following client code connects to the HMaster and reads a value from a table.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseClient {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws MasterNotRunningException,
            ZooKeeperConnectionException, IOException {

        System.out.println("Hbase Demo Application ");

        // Configuration: point the client at the remote ZooKeeper quorum and HMaster.
        Configuration conf = HBaseConfiguration.create();
        // Discard anything loaded from the classpath so only the settings below apply.
        conf.clear();
        conf.set("hbase.zookeeper.quorum", "shashwat");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("hbase.master", "shashwat:60000");

        // Throws an exception if the HBase master cannot be reached.
        HBaseAdmin.checkHBaseAvailable(conf);
        System.out.println("HBase is running!");

        // The table "date" must already exist on the cluster.
        HTable table = new HTable(conf, "date");
        System.out.println("Table obtained");

        System.out.println("Fetching data now.....");

        // Read column cf:cf1 of the row with key "-101".
        Get g = new Get(Bytes.toBytes("-101"));
        Result r = table.get(g);
        byte[] value = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("cf1"));

        // If we convert the value bytes, we should get back 'Some Value', the
        // value we inserted at this location.
        String valueStr = Bytes.toString(value);
        System.out.println("GET: " + valueStr);

        // Release the table's resources.
        table.close();
    }
}

Some references:

http://wiki.apache.org/hadoop/
http://wiki.apache.org/hadoop/Hbase/FAQ
http://blog.ibd.com/howto/hbase-hadoop-on-mac-ox-x/
http://ria101.wordpress.com/2010/01/28/setup-hbase-in-pseudo-distributed-mode-and-connect-java-client/
http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html#PseudoDistributed
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://hbase.apache.org/book.html

Note :
If you find any mistake in this document, feel free to drop me a mail @
Gmail : dwivedishashwat@gmail.com
Facebook : https://www.facebook.com/shriparv
Twitter : https://twitter.com/#!/shashwat_2010
Skype : shriparv
Visit My blogs at :
http://helpmetocode.blogspot.in/
http://writingishabit.blogspot.in/
http://realiq.blogspot.in/
