Perform a Hive installation with internal and external table import/export, and much more.
Let me know if anything is required. Happy to help.
Ping me on Google: #bobrupakroy.
2. Installing Hive
$ wget http://www-us.apache.org/dist/hive/hive-1.2.2/apache-hive-1.2.2-bin.tar.gz
# unzip
$ tar -zxvf apache-hive-1.2.2-bin.tar.gz
# rename the extracted directory
$ mv apache-hive-1.2.2-bin hive
(# move hive to the folder where we keep the other Apache products, for ease of
maintenance (optional): $ mv hive /usr/local)
#now specify hive in .bashrc
$cd ..
$cd hduser
hduser@localhost$ vi .bashrc
export HIVE_HOME=/home/hduser/hive (or /usr/local/hive) # the location
where Hive is installed.
#update
PATH=$PATH:$HADOOP_PREFIX/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin
Rupak Roy
3. # update the shell environment with the recent changes:
hduser@localhost$ source ~/.bashrc
To see the list of command-line options and available services, type:
hduser@localhost$ hive --help
Now to Start Hive simply type:
hduser@localhost$ hive
Rupak Roy
4. Schema
Hive does its best to read the data in the format described (schema on read):
- If the HDFS file has more columns than the columns described when creating the Hive table, the extra columns are ignored.
- If the HDFS file has fewer columns than described, the missing columns are returned as NULL.
- If a column's declared data type differs from the actual data type, NULL values are returned. For example, describing an integer data type for string data will show NULL values.
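A minimal HiveQL sketch of this behaviour (the table name, delimiters, and values below are made up for illustration):

```sql
-- Table declares three columns, '#'-delimited.
CREATE TABLE IF NOT EXISTS demo (id INT, name STRING, location STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '#'
STORED AS TEXTFILE;

-- If a loaded line has only two fields (e.g. 1#Alice),
-- SELECT * FROM demo returns NULL for location.
-- If a line has four fields, the fourth is ignored.
-- If a line holds text where id expects an integer (e.g. abc#Alice#Kolkata),
-- id is returned as NULL.
```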
5. Hive Databases
A database is a namespace or a collection of tables.
To create a database in Hive use
CREATE DATABASE|SCHEMA [IF NOT EXISTS] <database name>
[IF NOT EXISTS] skips the creation (and the error) if a database with the same name already exists.
Example:
hive> CREATE DATABASE IF NOT EXISTS db_1;
Or hive> CREATE SCHEMA db_1;
To view the list of databases:
hive> SHOW DATABASES;
We can also check and set the database warehouse directory (for the current session) using:
CHECK
hive> SET hive.metastore.warehouse.dir;
output: hive.metastore.warehouse.dir=/user/hive/warehouse
SET
hive> SET hive.metastore.warehouse.dir=/user/hive1/warehouse1;
6. To delete a hive database use:
DROP DATABASE <database_name>
Example:
hive> DROP DATABASE db_1;
Remember: we can't DROP a database if it contains tables; this will throw an error.
So use: hive> DROP DATABASE db_1 CASCADE;
Now if we want to store it in a different location/directory use:
hive> CREATE DATABASE IF NOT EXISTS db_2
LOCATION '/user/cloudera/hive/';
To use a newly created database:
hive> use db_2;
Let's create a table to demonstrate DROP and the CASCADE option.
hive> create table tb_1 ( Name STRING, ID INT);
hive> drop database db_2;
-------error -----database is not empty.
hive> drop database db_2 CASCADE;
------no error---database is deleted.
7. In real life it is very difficult to remember or keep track of the current
database we are working with every time. So it is better to enable
hive.cli.print.current.db=true to display the current working database in
the prompt.
hive> create database db_2;
hive> use db_2;
hive> SET hive.cli.print.current.db;
hive> SET hive.cli.print.current.db=true;
hive (db_2)> quit;
Now, we can set this feature permanently so that each time we run hive
we don’t have to go through the above commands again and again.
# browse to the user's home directory
$ cd /home/hduser
hduser@localhost~$ vi .hiverc
set hive.cli.print.current.db=true;
Next time when we start Hive, it will look for a file labeled .hiverc in
our home directory and execute its commands automatically.
8. To create a Hive table
hive> CREATE TABLE IF NOT EXISTS employee (ID int, name string, location string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '#'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
hive> describe employee;
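To make the delimiter concrete (the file path and values below are hypothetical), a '#'-delimited input file matching this employee schema, and its load, would look like:

```sql
-- Contents of /home/hduser/datasets/emp.txt, one record per line:
--   1#Alice#Kolkata
--   2#Bob#Mumbai
LOAD DATA LOCAL INPATH '/home/hduser/datasets/emp.txt'
OVERWRITE INTO TABLE employee;
-- select * from employee; now returns the two rows above.
```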
We can also prefix the database name
where we want to create the table even if
we are already working in another database
Example:
hive> use db_3;
hive> create table IF NOT EXISTS db_1.employee (ID int, name string, location string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
9. Alternative Way
Copy the Structure that is the Schema of the table:
hive> create database db_2;
hive> create table db_2.emp LIKE db_1.employee;
To get more details about a table use:
hive> describe db_2.emp ;
hive> describe extended db_2.emp;
hive> describe FORMATTED db_2.emp;
We can create & insert data at the same using a single query:
hive(db_2)> create table tb_9 AS
Select * from emp ;
Also we can create new table by sub setting:
hive(db_2)> create table tb_9 AS
Select ID, location from emp;
10. Hive supports tables of two different types:
1) Internal table: whenever we drop the table, Hive also deletes the physical
file (the table data) from the directory created by Hive. This is why it is not
convenient when it comes to sharing the same data with other applications.
hive(db_2)> CREATE TABLE IF NOT EXISTS emp (ID int, name string, location string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
hive(db_2)> LOAD DATA LOCAL INPATH '/home/hduser/datasets/htable'
OVERWRITE INTO TABLE emp;
$ hadoop fs -cat /user/cloudera/hive/emp/htable;
Note: if we want to perform the same steps again, just use
hive> truncate table emp;
which deletes all the rows in the table but not the table itself.
11. 2) External table: the opposite of an internal table. When the table is
dropped, only the metadata is removed, not the physical file (the data).
hive(db_2)> CREATE EXTERNAL TABLE empExternal
(ID int, name string, location string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '#'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/cloudera/hiveExternal';
hive(db_2)> select * from empExternal;
-------------------empty----- data is not copied -------------
$ hadoop fs -put datasets/file1 /user/cloudera/hiveExternal
hive(db_2)> select * from empExternal;
# now drop the tables
hive(db_2)> DROP table emp;
hive(db_2)> DROP table empExternal;
12. Load data
Load data locally:
hive(db_1)> LOAD DATA LOCAL INPATH 'datasets/file1'
OVERWRITE INTO TABLE employees;
Load data from HDFS:
hive(db_1)> LOAD DATA INPATH 'hivedata/file1' OVERWRITE
INTO TABLE employees;
Load all the data from a folder instead of a single file:
hive(db_1)> LOAD DATA LOCAL INPATH 'datasets/' OVERWRITE
INTO TABLE employees;
Note: whenever we load data from HDFS, the data is moved from its source
directory to the destination directory, preventing duplicate records
inside HDFS; in local mode, however, Hive creates another copy of the
file in HDFS.
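The move-vs-copy behaviour described in the note can be sketched as follows (the paths are hypothetical):

```sql
-- Loading from HDFS MOVES the file:
-- before the load, hadoop fs -ls hivedata/ shows file1
LOAD DATA INPATH 'hivedata/file1' OVERWRITE INTO TABLE employees;
-- after the load, file1 is gone from hivedata/ (moved under the warehouse directory)

-- Loading from the local file system COPIES the file:
LOAD DATA LOCAL INPATH 'datasets/file1' OVERWRITE INTO TABLE employees;
-- datasets/file1 still exists locally; a new copy now lives in HDFS
```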
13. Export data from Hive
hive> INSERT OVERWRITE LOCAL DIRECTORY
'/home/hduser/datasets/' select * from emp;
Or inside HDFS:
hive> INSERT OVERWRITE DIRECTORY '<directory in the hdfs>' select * from emp;
Or manually copy the file from HDFS, like:
$ hadoop fs -get /user/cloudera/emp /user/hduser/datasets