Hadoop Training
HIVE
• HIVE Overview
• Working of Hive
• Hive Tables
• Hive - Data Types
• Complex Types
• Hive Database
• HiveQL - Select-Joins
• Different Types of Join
• Partitions
• Buckets
• Strict Mode in Hive
• Like and Rlike in Hive
• Hive UDF
Agenda
Hive is a data warehouse infrastructure tool for processing structured data in
Hadoop. It makes querying and analyzing easy.
Proficiency in HiveQL is essential to becoming a successful Hadoop developer
using Hive.
Hive was initially developed by Facebook; later, the Apache Software
Foundation took it up and developed it further as an open-source project under
the name Apache Hive. It is used by many companies. For example,
Amazon uses it in Amazon Elastic MapReduce.
Features of Hive
It stores schemas in a database and processed data in HDFS.
It is designed for analytical (OLAP) processing.
It provides an SQL-like query language called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.
HIVE Overview
HIVE
Unit Name: Operation
User Interface: Hive is data warehouse infrastructure software that enables interaction
between the user and HDFS. The user interfaces that Hive supports are Hive Web
UI, the Hive command line, and Hive HDInsight (on Windows Server).
Meta Store: Hive uses a relational database server to store the schema or metadata of
tables, databases, columns in a table, their data types, and the HDFS mapping.
HiveQL Process Engine: HiveQL is an SQL-like language for querying the schema information
in the Metastore. It is a replacement for the traditional approach of writing a
MapReduce program: instead of writing the MapReduce program in Java, we can write
a query for the MapReduce job and have Hive process it.
Execution Engine: The conjunction of the HiveQL process engine and MapReduce is the Hive
execution engine. The execution engine processes the query and generates the same
results as MapReduce. It uses the flavor of MapReduce.
HDFS or HBASE: The Hadoop Distributed File System or HBase is the data storage layer
used to store the data.
HIVE
The following diagram depicts the workflow between
Hive and Hadoop
Working of Hive
The following table defines how Hive interacts with the Hadoop framework:
Step No.: Operation
1. Execute Query: The Hive interface, such as the command line or Web UI, sends the
query to the driver (any database driver such as JDBC, ODBC, etc.) to execute.
2. Get Plan: The driver takes the help of the query compiler, which parses the query
to check the syntax and build the query plan.
3. Get Metadata: The compiler sends a metadata request to the Metastore (any
database).
4. Send Metadata: The Metastore sends the metadata as a response to the compiler.
5. Send Plan: The compiler checks the requirement and resends the plan to the
driver. Up to here, the parsing and compiling of the query is complete.
6. Execute Plan: The driver sends the execution plan to the execution engine.
7. Execute Job: Internally, the execution of the job is a MapReduce job. The
execution engine sends the job to the JobTracker, which runs on the name node, and it
assigns this job to TaskTrackers, which run on the data nodes. Here, the query
executes as a MapReduce job.
8. Fetch Result: The execution engine receives the results from the data nodes.
9. Send Results: The execution engine sends those resultant values to the driver.
10. Send Results: The driver sends the results to the Hive interfaces.
The Hive metastore service stores the metadata for Hive tables and
partitions in a relational database, and provides clients (including Hive)
access to this information via the metastore service API.
HIVE
For an internal table:
CREATE TABLE internal1 (col1 string);
Hive multiple-table insert: insert data into multiple Hive tables from a single source table.
FROM sethu
INSERT OVERWRITE TABLE tab1 SELECT
sethu.column_one, sethu.column_two
INSERT OVERWRITE TABLE table_two SELECT
sethu.column_two;
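For contrast, an external table keeps its data outside the Hive warehouse, so dropping
the table leaves the underlying files in place. A minimal sketch (the /data/external1
path is an illustrative assumption):
CREATE EXTERNAL TABLE external1 (col1 string)
LOCATION '/data/external1';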
Hive Tables
This chapter takes you through the different data types in Hive, which
are involved in table creation. All the data types in Hive are
classified as follows:
PRIMITIVE TYPES:
Integral Types
Integer data can be specified using the integral data types, typically INT. When
the data range exceeds the range of INT, you need to use BIGINT, and if
the data range is smaller than that of INT, you use SMALLINT. TINYINT is
smaller than SMALLINT.
Floating Point Types
Floating point types are numbers with decimal points.
Generally, this type of data uses the DOUBLE data type.
Dates
DATE values are described in year/month/day format in the form
YYYY-MM-DD.
Boolean Type
BOOLEAN: TRUE/FALSE
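A minimal sketch that exercises these primitive types in one table definition (the
table and column names are illustrative):
CREATE TABLE emp_sample (
  id INT,
  dept_code SMALLINT,
  phone BIGINT,
  salary DOUBLE,
  joined DATE,
  active BOOLEAN
);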
Hive - Data Types
String Types
String data can be specified using single quotes (' ') or double quotes
(" "). Besides STRING, this category contains two data types: VARCHAR and CHAR. Hive
follows C-style escape characters.
Decimals
The DECIMAL type in Hive is the same as Java's BigDecimal format. It is used
for representing immutable arbitrary-precision numbers. The syntax and an example are
as follows:
DECIMAL(precision, scale)
decimal(10,0)
Precision is the total number of digits in a number. Scale is the number of digits to
the right of the decimal point. For example, the number 123.45
has a precision of 5 and a scale of 2.
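A column meant to hold values like 123.45 could therefore be declared as follows (a
sketch; price is an illustrative column name):
price DECIMAL(5,2)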
Hive - Data Types
Map<K,V>
Type Parameters:
K - the type of keys maintained by this map
V - the type of mapped values
Arrays
Arrays in Hive are used the same way they are used in Java.
Syntax: ARRAY<data_type>
CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
friends ARRAY<BIGINT>, properties MAP<STRING, STRING>, …………)
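Once such a table exists, elements of the complex columns are addressed with indexes
and keys. A brief sketch against the page_view table above (the 'ip' map key is an
illustrative assumption):
SELECT friends[0], properties['ip']
FROM page_view;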
Complex Types
Hive is a database technology that can define databases and tables to
analyze structured data. The theme for structured data analysis is to store
the data in a tabular manner, and pass queries to analyze it.
Create Database is a statement used to create a database in Hive. A
database in Hive is a namespace or a collection of tables. The syntax for
this statement is as follows:
CREATE DATABASE <database name>
The following query is executed to create a database named userdb:
hive> CREATE DATABASE userdb;
The following query is used to list the existing databases:
hive> SHOW DATABASES;
The following queries are used to drop a database. Let us assume that the
database name is userdb.
hive> DROP DATABASE userdb;
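If the database still contains tables, a plain DROP DATABASE fails; Hive accepts a
CASCADE clause to drop the tables along with it:
hive> DROP DATABASE userdb CASCADE;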
Hive Database
Create Table Statement
Create Table is a statement used to create a table in Hive. An example is
as follows.
The statement below includes a comment and row-format clauses such as the field
terminator, line terminator, and stored file type:
hive> CREATE TABLE employee ( eid int, name String,
salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
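You can confirm the table was created and inspect its schema with the standard
commands:
hive> SHOW TABLES;
hive> DESCRIBE employee;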
Hive Database
create table road (id int, name VARCHAR(20), des string, year int) row format
delimited fields terminated by ',';
Then move the data into Hadoop using hadoop fs -put /home/mishra/Desktop/hive
/destination. After creating the table and defining the schema, the next job is to
load the data into Hive, which is done by:
load data inpath '/pnt' into table road;
Inserting individual rows is a complex operation in Hive and generally not done, but
to insert an ad-hoc row like (12, "xyz", "hr"), do this (the fourth value is a
placeholder so the select matches the table's four columns):
insert into table road select * from (select 12, "xyz", "hr", 2016) a;
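As an aside, the two steps above (hadoop fs -put plus load data inpath) can be
collapsed into one by loading straight from the local file system; a sketch against
the same table:
load data local inpath '/home/mishra/Desktop/hive' into table road;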
Alter Table Statement
It is used to alter a table in Hive.
Hive Database
The following query renames the table from employee to emp.
hive> ALTER TABLE employee RENAME TO emp;
Change Statement
The following query renames the name column to ename, keeping its data type, in the
table above:
hive> ALTER TABLE employee CHANGE name ename String;
Add Columns Statement
The following query adds a column named dept to the employee table.
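The statement itself is the standard ADD COLUMNS form (the comment text is
illustrative):
hive> ALTER TABLE employee ADD COLUMNS (dept STRING COMMENT 'Department name');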
Hive Database
Drop Table Statement
The syntax is as follows:
DROP TABLE table_name;
The following query drops a table named employee:
hive> DROP TABLE employee;
Hive Database
You can save any result set as a view. The usage of views in Hive is the same as
that of views in SQL.
A view is nothing more than a statement that is stored in the database with an
associated name.
Views can summarize data from various tables and be used to generate reports.
Creating Views:
Database views are created using the CREATE VIEW statement.
The basic CREATE VIEW syntax is as follows:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE [condition];
Example:
Hive Database
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Now, the following is an example that creates a view from the CUSTOMERS table. This
view exposes the customer name and age from the CUSTOMERS table:
hive > CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS;
Now you can query CUSTOMERS_VIEW in a similar way as you query an actual table.
The following is an example:
Hive Database
hive > SELECT * FROM CUSTOMERS_VIEW;
This would produce the following result:
+----------+-----+
| name | age |
+----------+-----+
| Ramesh | 32 |
| Khilan | 25 |
| kaushik | 23 |
| Chaitali | 25 |
| Hardik | 27 |
| Komal | 22 |
| Muffy | 24 |
+----------+-----+
Dropping a View
Use the following syntax to drop a view:
DROP VIEW view_name;
The following is an example that drops the view created above (note that Hive views
are read-only, so DELETE statements cannot be run against them):
hive > DROP VIEW CUSTOMERS_VIEW;
Hive Database
JOIN is a clause that is used for combining specific fields from two tables by using
values common to each one. It is used to combine records from two or more tables in
the database. It is more or less similar to SQL JOIN.
Syntax
join_table:
table_reference JOIN table_factor [join_condition]
| table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference
join_condition
| table_reference LEFT SEMI JOIN table_reference join_condition
| table_reference CROSS JOIN table_reference [join_condition]
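The syntax above also lists LEFT SEMI JOIN, which returns the rows from the left
table that have at least one match on the right, without duplicating left-side rows
(only left-table columns may appear in the SELECT). A brief sketch using the two
tables introduced below:
hive> SELECT c.ID, c.NAME
FROM CUSTOMERS c
LEFT SEMI JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);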
Example
We will use the following two tables in this chapter. Consider the following table
named CUSTOMERS:
HiveQL - Select-Joins
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Consider another table ORDERS as follows:
+-----+---------------------+-------------+--------+
|OID | DATE | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3 | 3000 |
| 100 | 2009-10-08 00:00:00 | 3 | 1500 |
| 101 | 2009-11-20 00:00:00 | 2 | 1560 |
| 103 | 2008-05-20 00:00:00 | 4 | 2060 |
+-----+---------------------+-------------+--------+
HiveQL - Select-Joins
Create a file for the ORDERS data on the desktop and paste:
102,2009-10-08 00:00:00,3,3000
100,2009-10-08 00:00:00,3,1500
101,2009-11-20 00:00:00,2,1560
103,2008-05-20 00:00:00,4,2060
Create a file for the CUSTOMERS data on the desktop and paste:
1,Ramesh,32,Ahmedabad,2000.00
2,Khilan,25,Delhi,1500.00
3,kaushik,23,Kota,2000.00
4,Chaitali,25,Mumbai,6500.00
5,Hardik,27,Bhopal,8500.00
6,Komal,22,MP,4500.00
7,Muffy,24,Indore,10000.00
Put the data into HDFS:
hadoop fs -put /home/mishra/Desktop/untti /doc
hadoop fs -put /home/mishra/Desktop/delhi /docum
Now create the tables for them in Hive:
create table CUSTOMERS (ID int,NAME string,AGE int,ADDRESS string,SALARY string) row
format delimited fields terminated by ',';
HiveQL - Select-Joins
create table ORDERS (OID int,date string,CUSTOMER_ID int,AMOUNT string) row format
delimited fields terminated by ',';
Now load the data into the Hive tables:
load data inpath '/doc' into table ORDERS;
load data inpath '/docum' into table CUSTOMERS;
There are different types of joins given as follows:
1. JOIN
2. LEFT OUTER JOIN
3. RIGHT OUTER JOIN
4. FULL OUTER JOIN
JOIN
The JOIN creates a new result table by combining column values of two tables (table1
and table2) based upon the join predicate. The query compares each row of table1 with
each row of table2 to find all pairs of rows that satisfy the join predicate. When the
join predicate is satisfied, the column values of each matched pair of rows from the
two tables are combined into a result row.
Different Types of Join
The following query executes a JOIN on the CUSTOMERS and ORDERS tables and retrieves
the records:
hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
FROM CUSTOMERS c JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
or
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
INNER JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
On successful execution of the query, you get to see the following response:
+----+----------+-----+--------+
| ID | NAME | AGE | AMOUNT |
+----+----------+-----+--------+
| 3 | kaushik | 23 | 3000 |
| 3 | kaushik | 23 | 1500 |
| 2 | Khilan | 25 | 1560 |
| 4 | Chaitali | 25 | 2060 |
+----+----------+-----+--------+
Different Types of Join
The HiveQL LEFT OUTER JOIN returns all the rows from the left table, even if there are no
matches in the right table. This means, if the ON clause matches 0 (zero) records in the right
table, the JOIN still returns a row in the result, but with NULL in each column from the right
table.
A LEFT JOIN returns all the values from the left table, plus the matched values from the right
table, or NULL in case of no matching JOIN predicate.
The following query demonstrates LEFT OUTER JOIN between the CUSTOMERS and ORDERS tables:
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
LEFT OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
or (just like regular SQL practice):
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
LEFT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
Different Types of Join
On successful execution of the query, you get to see the following response:
+----+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+----+----------+--------+---------------------+
| 1 | Ramesh | NULL | NULL |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
| 5 | Hardik | NULL | NULL |
| 6 | Komal | NULL | NULL |
| 7 | Muffy | NULL | NULL |
+----+----------+--------+---------------------+
RIGHT OUTER JOIN
The HiveQL RIGHT OUTER JOIN returns all the rows from the right table, even if there are
no matches in the left table. If the ON clause matches 0 (zero) records in the left table,
the JOIN still returns a row in the result, but with NULL in each column from the left table.
Different Types of Join
A RIGHT JOIN returns all the values from the right table, plus the matched values from
the left table, or NULL in case of no matching join predicate.
The following query demonstrates RIGHT OUTER JOIN between the CUSTOMERS and
ORDERS tables:
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
RIGHT OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
or
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
RIGHT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
On successful execution of the query, you get to see the following response:
+------+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+
Different Types of Join
The HiveQL FULL OUTER JOIN combines the records of both the left and the right
tables. The joined table contains all the records from both tables, filling in NULL
values for missing matches on either side.
The following query demonstrates FULL OUTER JOIN between the CUSTOMERS and ORDERS
tables:
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
FULL OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
or
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
FULL JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
Different Types of Join
On successful execution of the query, you get to see the following response:
+------+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 1 | Ramesh | NULL | NULL |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
| 5 | Hardik | NULL | NULL |
| 6 | Komal | NULL | NULL |
| 7 | Muffy | NULL | NULL |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+
Different Types of Join
GROUP BY
Generate a query to retrieve the number of employees in each department.
The following query does so using the employee table above:
hive> SELECT DEPT, count(*) FROM employee GROUP BY DEPT;
ORDER BY
SELECT * FROM CUSTOMERS ORDER BY NAME;
The following example sorts the result in descending order by NAME:
SELECT * FROM CUSTOMERS ORDER BY NAME DESC;
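Hive also offers SORT BY, which orders rows within each reducer rather than globally,
and therefore scales better on large data; a sketch:
SELECT * FROM CUSTOMERS SORT BY NAME;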
Different Types of Join
Hive is a good tool for performing queries on large datasets, especially datasets that
require full table scans. But quite often there are instances where users need to filter
the data on specific column values. Generally, Hive users know about the domain of the
data that they deal with. With this knowledge they can identify common columns that
are frequently queried in order to identify columns with low cardinality which can be
used to organize data using the partitioning feature of Hive.
In non-partitioned tables, Hive would have to read all the files in a table's data
directory and subsequently apply filters on them. This is slow and expensive,
especially for large tables.
Partitions are essentially slices of data which allow larger sets of data to be separated
into more manageable chunks.
When a partitioned table is queried with one or both partition columns in criteria or in
the WHERE clause, what Hive effectively does is partition elimination by scanning only
those data directories that are needed. If no partitioned columns are used, then all the
directories are scanned (full table scan) and partitioning will not have any effect.
Partitions
• Hive organizes tables into partitions. Partitioning is a way of dividing a table
into related parts based on the values of partition columns such as date, city, and
department. Using partitions, it is easy to query a portion of the data.
How to create partitions:
create table anand(url string, page string) partitioned by (day string);
How to load data into a partition:
load data local inpath '/home/andy1/Desktop/1234.txt' into table anand
partition(day='tue');
The partitioning can be viewed under the table's warehouse directory, e.g.
/user/hive/warehouse/anand
hive> select * from anand where day='tue';
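Each partition is stored as its own subdirectory under the table directory (here
/user/hive/warehouse/anand/day=tue/, assuming the default warehouse location), and
the partitions of a table can be listed with:
hive> SHOW PARTITIONS anand;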
Partitions in Hive
Tables or partitions can be sub-divided into buckets, to provide extra structure to
the data that may be used for more efficient querying. Bucketing works based
on the hash value of a column of the table.
How to create buckets:
CREATE TABLE bucketed_users (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;
How to create a sorted bucketed table:
CREATE TABLE bucketed_users (id INT, name STRING)
CLUSTERED BY (id) SORTED BY (id ASC) INTO 4 BUCKETS;
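Populating a bucketed table goes through INSERT ... SELECT rather than LOAD DATA; a
sketch, assuming a plain source table named users with matching columns (older Hive
versions also need bucketed inserts switched on):
set hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE bucketed_users
SELECT id, name FROM users;
Bucketing then enables efficient sampling:
SELECT * FROM bucketed_users
TABLESAMPLE(BUCKET 1 OUT OF 4 ON id);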
Buckets
Setting strict mode in Hive means that you can only query a partitioned table with a
filter on the partition column; a query over the non-partitioned part will not execute.
For example, you can set strict mode in Hive with:
set hive.mapred.mode=strict;
Now query the data which we partitioned earlier on a non-partition column:
select * from logs where line='wes';
It shows a semantic error. But if you query on the column which you have partitioned,
it shows the desired output.
You can undo strict mode with:
set hive.mapred.mode=nonstrict;
The same query now gives output such as:
2344 Wes 25feb india
If your partitioned table is very large, you can block any full-table-scan queries by
putting Hive into strict mode using the set hive.mapred.mode=strict command. In this
mode, when users submit a query that would result in a full table scan (i.e. a query
without any partition column filter), an error is issued.
Strict Mode in Hive
LIKE in Hive:
a LIKE b compares the string in column a against the SQL simple pattern in column b.
create database andy;
use andy;
create table rat(id int, dep string, des string) row format delimited fields terminated by ',';
load data local inpath '/home/mishra/Desktop/naya' into table rat;
select * from rat;
SELECT * FROM rat WHERE des LIKE dep;
This command returns the rows where the string in des matches the pattern in dep.
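LIKE is more commonly used with a literal pattern and the % wildcard; a brief sketch
against the same table:
SELECT * FROM rat WHERE des LIKE 'h%'; -- rows whose des starts with 'h'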
RLIKE in Hive:
A RLIKE B is true if any substring of A matches the Java regular expression B,
otherwise false.
Suppose my data is:
id dep des
1 hr hr
2 hr man
3 peon staff
NULL NULL NULL
1 hr shr
2 hman man
3 peon staff
Like and Rlike in Hive
SELECT * FROM rat WHERE des RLIKE dep; -- rows where dep occurs within des
Output:
1 hr hr
1 hr shr
SELECT * FROM rat WHERE dep RLIKE des; -- rows where des occurs within dep
Output:
1 hr hr
2 hman man
Like and Rlike in Hive
Hive UDF
Hive has built-in functions, such as LIKE and RLIKE, that we can use in our Hive
programs without adding any extra code, but sometimes a user requirement is not
covered by the built-in functions. In that case, the user can write a custom
user-defined function, called a UDF.
The process is:
Open Eclipse and create a package named xyz.
Save the class with the name ToUpper.java.
Paste the following user-defined code in it:
package xyz;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class ToUpper extends UDF {
  // Hive calls evaluate() once per input row.
  public Text evaluate(Text s) {
    if (s == null) {
      // Propagate NULL input as NULL output.
      return null;
    }
    return new Text(s.toString().toUpperCase());
  }
}
Add external jars to your Eclipse project. The two most important jars are the
hadoop-common jar, found in the common folder under
/usr/local/work/hadoop/share/hadoop/common,
and the hive-exec jar, present in the lib folder under /usr/local/work/hive/lib.
Now add your jar in Hive using the ADD JAR command:
hive> add jar /home/ands/Desktop/hiveudf.jar;
Create a temporary function, using CREATE TEMPORARY FUNCTION, with the name by which
you want to run your UDF:
hive> create temporary function toupper as 'xyz.ToUpper';
hive> create table anda (name string, age int) row format delimited fields terminated
by ',';
load data local inpath '/home/ands/Desktop/expudf' into table anda;
select toupper(name) from anda;
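When the function is no longer needed, it can be removed from the session:
hive> drop temporary function toupper;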
Hive UDF
THANK YOU!

More Related Content

What's hot

Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQLkristinferrier
 
Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Takrim Ul Islam Laskar
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...Simplilearn
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component rebeccatho
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introductionsudhakara st
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation Shivanee garg
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 

What's hot (20)

Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQL
 
Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)
 
Hive Hadoop
Hive HadoopHive Hadoop
Hive Hadoop
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Sqoop
SqoopSqoop
Sqoop
 
Apache hive
Apache hiveApache hive
Apache hive
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 
Hadoop
HadoopHadoop
Hadoop
 
Big data unit i
Big data unit iBig data unit i
Big data unit i
 
Apache Hive - Introduction
Apache Hive - IntroductionApache Hive - Introduction
Apache Hive - Introduction
 
SQOOP PPT
SQOOP PPTSQOOP PPT
SQOOP PPT
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 

Similar to Session 14 - Hive

Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive AnalyticsManish Chopra
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight ServiceNeil Mackenzie
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Jonathan Seidman
 
ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON Padma shree. T
 
Apache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketingApache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketingearnwithme2522
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Casesnzhang
 
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICSHive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICSRUHULAMINHAZARIKA
 
Analysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRAAnalysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRABhadra Gowdra
 

Similar to Session 14 - Hive (20)

Hive presentation
Hive presentationHive presentation
Hive presentation
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive Analytics
 
Hive
HiveHive
Hive
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
Apache hive
Apache hiveApache hive
Apache hive
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
 
ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON
 
מיכאל
מיכאלמיכאל
מיכאל
 
Hive.pptx
Hive.pptxHive.pptx
Hive.pptx
 
Apache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketingApache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketing
 
1. Apache HIVE
1. Apache HIVE1. Apache HIVE
1. Apache HIVE
 
Unit 5-lecture4
Unit 5-lecture4Unit 5-lecture4
Unit 5-lecture4
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Cases
 
20080529dublinpt3
20080529dublinpt320080529dublinpt3
20080529dublinpt3
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICSHive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICS
 
Apache hive
Apache hiveApache hive
Apache hive
 
Analysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRAAnalysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRA
 
Hive with HDInsight
Hive with HDInsightHive with HDInsight
Hive with HDInsight
 

More from AnandMHadoop

Session 09 - Flume
Session 09 - FlumeSession 09 - Flume
Session 09 - FlumeAnandMHadoop
 
Session 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperSession 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperAnandMHadoop
 
Session 19 - MapReduce
Session 19  - MapReduce Session 19  - MapReduce
Session 19 - MapReduce AnandMHadoop
 
Session 04 -Pig Continued
Session 04 -Pig ContinuedSession 04 -Pig Continued
Session 04 -Pig ContinuedAnandMHadoop
 
Session 04 pig - slides
Session 04   pig - slidesSession 04   pig - slides
Session 04 pig - slidesAnandMHadoop
 
Session 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsSession 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsAnandMHadoop
 
Session 02 - Yarn Concepts
Session 02 - Yarn ConceptsSession 02 - Yarn Concepts
Session 02 - Yarn ConceptsAnandMHadoop
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to HadoopAnandMHadoop
 

More from AnandMHadoop (9)

Overview of Java
Overview of Java Overview of Java
Overview of Java
 
Session 09 - Flume
Session 09 - FlumeSession 09 - Flume
Session 09 - Flume
 
Session 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperSession 23 - Kafka and Zookeeper
Session 23 - Kafka and Zookeeper
 
Session 19 - MapReduce
Session 19  - MapReduce Session 19  - MapReduce
Session 19 - MapReduce
 
Session 04 -Pig Continued
Session 04 -Pig ContinuedSession 04 -Pig Continued
Session 04 -Pig Continued
 
Session 04 pig - slides
Session 04   pig - slidesSession 04   pig - slides
Session 04 pig - slides
 
Session 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsSession 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic Commands
 
Session 02 - Yarn Concepts
Session 02 - Yarn ConceptsSession 02 - Yarn Concepts
Session 02 - Yarn Concepts
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to Hadoop
 

Recently uploaded

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Session 14 - Hive

  • 2. Page 2Classification: Restricted • HIVE Overview • Working of Hive • Hive Tables • Hive - Data Types • Complex Types • Hive Database • HiveQL - Select-Joins • Different Types of Join • Partitions • Buckets • Strict Mode in Hive • Like and Rlike in Hive • Hive UDF Agenda
  • 3. Page 3Classification: Restricted Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It makes querying and analyzing easy. You should work on HiveQL to become a successful hadoop developer using hive Initially Hive was developed by Facebook, later the Apache Software Foundation took it up and developed it further as an open source under the name Apache Hive. It is used by different companies. For example, Amazon uses it in Amazon Elastic MapReduce. Features of Hive It stores schema in a database and processed data into HDFS. It is designed for Analytical processing. It provides SQL type language for querying called HiveQL or HQL. It is familiar, fast, scalable, and extensible. HIVE Overview
  • 5. Page 5Classification: Restricted Unit Name Operation User Interface Hive is a data warehouse infrastructure software that can create interaction between user and HDFS. The user interfaces that Hive supports are Hive Web UI, Hive command line, and Hive HD Insight (In Windows server). Meta Store Hive chooses respective database servers to store the schema or Metadata of tables, databases, columns in a table, their data types, and HDFS mapping. HiveQL Process Engine HiveQL is similar to SQL for querying on schema info on the Metastore. It is one of the replacements of traditional approach for MapReduce program. Instead of writing MapReduce program in Java, we can write a query for MapReduce job and process it. Execution Engine The conjunction part of HiveQL process Engine and MapReduce is Hive Execution Engine. Execution engine processes the query and generates results as same as MapReduce results. It uses the flavor of MapReduce. HDFS or HBASE Hadoop distributed file system or HBASE are the data storage techniques to store data into file system. HIVE
  • 6. Page 6Classification: Restricted The following diagram depicts the workflow between Hive and Hadoop Working of Hive
  • 7. Page 7Classification: Restricted Step No. Operation 1 Execute QueryThe Hive interface such as Command Line or Web UI sends query to Driver (any database driver such as JDBC, ODBC, etc.) to execute. 2 Get PlanThe driver takes the help of query compiler that parses the query to check the syntax and query plan or the requirement of query. 3 Get MetadataThe compiler sends metadata request to Metastore (any database). 4 Send MetadataMetastore sends metadata as a response to the compiler. 5 Send PlanThe compiler checks the requirement and resends the plan to the driver. Up to here, the parsing and compiling of a query is complete. 6 Execute PlanThe driver sends the execute plan to the execution engine. 7 Execute JobInternally, the process of execution job is a MapReduce job. The execution engine sends the job to JobTracker, which is in Name node and it assigns this job to TaskTracker, which is in Data node. Here, the query executes MapReduce job. 8 Fetch ResultThe execution engine receives the results from Data nodes. 9 Send ResultsThe execution engine sends those resultant values to the driver. 10 Send ResultsThe driver sends the results to Hive Interfaces. The following table defines how Hive interacts with Hadoop framework:
  • 8. Page 8Classification: Restricted The Hive metastore service stores the metadata for Hive tables and partitions in a relational database, and provides clients (including Hive) access to this information via the metastore service API HIVE
  • 9. Page 9Classification: Restricted For internal table CREATE TABLE internal1 (col1 string); Hive multiple table insert - Insert data into multiple hive tables FROM sethu INSERT OVERWRITE TABLE tab1 SELECT sethu.column_one,sethu.column_two INSERT OVERWRITE TABLE table_two SELECT table_name.column_two Hive Tables
  • 10. Page 10Classification: Restricted This chapter takes you through the different data types in Hive, which are involved in the table creation. All the data types in Hive are classified into : PRIMITIVE TYPES: Integral Types Integer type data can be specified using integral data types, INT. When the data range exceeds the range of INT, you need to use BIGINT and if the data range is smaller than the INT, you use SMALLINT. TINYINT is smaller than SMALLINT. Floating Point Types Floating point types are nothing but numbers with decimal points. Generally, this type of data is composed of DOUBLE data type. Dates DATE values are described in year/month/day format in the form {{YYYY-MM-DD}}. Boolean type BOOLEAN—TRUE/FALSE Hive - Data Types
  • 11. Page 11Classification: Restricted String Types String type data types can be specified using single quotes (' ') or double quotes (" "). It contains two data types: VARCHAR and CHAR. Hive follows C-types escape characters. Decimals The DECIMAL type in Hive is as same as Big Decimal format of Java. It is used for representing immutable arbitrary precision. The syntax and example is as follows: DECIMAL(precision, scale) decimal(10,0) Precision is the number of digits in a number. Scale is the number of digits to the right of the decimal point in a number. For example, the number 123.45 has aprecision of 5 and a scale of 2. Hive - Data Types
  • 12. Page 12Classification: Restricted Map<K,V> Type Parameters: K - the type of keys maintained by this map V - the type of mapped values Arrays Arrays in Hive are used the same way they are used in Java. Syntax: ARRAY<data_type> CREATE TABLE page_view(viewTime INT, userid BIGINT, page_url STRING, referrer_url STRING, friends ARRAY<BIGINT>, properties MAP<STRING, STRING>, …………) Complex Types
  • 13. Page 13Classification: Restricted Hive is a database technology that can define databases and tables to analyze structured data. The theme for structured data analysis is to store the data in a tabular manner, and pass queries to analyze it. Create Database is a statement used to create a database in Hive. A database in Hive is a namespace or a collection of tables. The syntax for this statement is as follows: CREATE DATABASE <database name> The following query is executed to create a database named userdb: hive> CREATE DATABASE userdb; The following query is used to verify a databases list: hive> SHOW DATABASES; The following queries are used to drop a database. Let us assume that the database name is userdb. hive> DROP DATABASE userdb; Hive Database
  • 14. Page 14Classification: Restricted Create Table Statement Create Table is a statement used to create a table in Hive. The example are as follows: The following data is a Comment, Row formatted fields such as Field terminator, Lines terminator, and Stored File type. COMMENT ‘Employee details’ FIELDS TERMINATED BY ‘t’ LINES TERMINATED BY ‘n'; hive> CREATE TABLE employee ( eid int, name String, salary String, destination String) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘t’; Hive Database
  • 15. Page 15Classification: Restricted Create table road (id int,name VARCHAR(20),des string,year int) row format delimited fields terminated by ','; Then move data in hadoop using hadoop fs -put /home/mishra/Desktop/hive /destination after creating the table and defining the schema,the next job is to load data into hive which is done by: load data inpath '/pnt' into table road; inserting data is a complex operation in hive,generally not done but to insert ad-hoc value like (12,"xyz), do this: insert into table road select * from (select 12,"xyz","hr")a; Alter Table Statement It is used to alter a table in Hive. Hive Database
  • 16. Page 16Classification: Restricted The following query renames the table from employee to emp. hive> ALTER TABLE employee RENAME TO emp; CHANGE STATEMENT The following queries rename the column name and column data type using the above data: hive> ALTER TABLE employee CHANGE name ename String; Add Columns Statement The following query adds a column named dept to the employee table. Hive Database
  • 17. Page 17Classification: Restricted Drop Table Statement The syntax is as follows: DROP TABLE table_name; The following query drops a table named employee: hive> DROP TABLE employee; Hive Database
  • 18. Page 18Classification: Restricted You can save any result set data as a view. The usage of view in Hive is same as that of the view in SQL. A view is nothing more than a statement that is stored in the database with an associated name. Summarize data from various tables which can be used to generate reports. Creating Views: Database views are created using the CREATE VIEW statement. The basic CREATE VIEW syntax is as follows: CREATE VIEW view_name AS SELECT column1, column2..... FROM table_name WHERE [condition]; Example: Hive Database
  • 19. Page 19Classification: Restricted Consider the CUSTOMERS table having the following records: +----+----------+-----+-----------+----------+ | ID | NAME | AGE | ADDRESS | SALARY | +----+----------+-----+-----------+----------+ | 1 | Ramesh | 32 | Ahmedabad | 2000.00 | | 2 | Khilan | 25 | Delhi | 1500.00 | | 3 | kaushik | 23 | Kota | 2000.00 | | 4 | Chaitali | 25 | Mumbai | 6500.00 | | 5 | Hardik | 27 | Bhopal | 8500.00 | | 6 | Komal | 22 | MP | 4500.00 | | 7 | Muffy | 24 | Indore | 10000.00 | +----+----------+-----+-----------+----------+ Now, following is the example to create a view from CUSTOMERS table. This view would be used to have customer name and age from CUSTOMERS table: hive > CREATE VIEW CUSTOMERS_VIEW AS SELECT name, age FROM CUSTOMERS; Now, you can query CUSTOMERS_VIEW in similar way as you query an actual table. Following is the example: Hive Database
  • 20. Page 20Classification: Restricted hive > SELECT * FROM CUSTOMERS_VIEW; This would produce the following result: +----------+-----+ | name | age | +----------+-----+ | Ramesh | 32 | | Khilan | 25 | | kaushik | 23 | | Chaitali | 25 | | Hardik | 27 | | Komal | 22 | | Muffy | 24 | +----------+-----+ Dropping a View Use the following syntax to drop a view: DROP VIEW view_name Following is an example to delete a record having AGE= 22. hive > DELETE FROM CUSTOMERS_VIEW WHERE age = 22; Hive Database
  • 21. Page 21Classification: Restricted JOIN is a clause that is used for combining specific fields from two tables by using values common to each one. It is used to combine records from two or more tables in the database. It is more or less similar to SQL JOIN. Syntax join_table: table_reference JOIN table_factor [join_condition] | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition | table_reference LEFT SEMI JOIN table_reference join_condition | table_reference CROSS JOIN table_reference [join_condition] Example We will use the following two tables in this chapter. Consider the following table named CUSTOMERS.. HiveQL - Select-Joins
  • 22. Page 22Classification: Restricted +----+----------+-----+-----------+----------+ | ID | NAME | AGE | ADDRESS | SALARY | +----+----------+-----+-----------+----------+ | 1 | Ramesh | 32 | Ahmedabad | 2000.00 | | 2 | Khilan | 25 | Delhi | 1500.00 | | 3 | kaushik | 23 | Kota | 2000.00 | | 4 | Chaitali | 25 | Mumbai | 6500.00 | | 5 | Hardik | 27 | Bhopal | 8500.00 | | 6 | Komal | 22 | MP | 4500.00 | | 7 | Muffy | 24 | Indore | 10000.00 | +----+----------+-----+-----------+----------+ Consider another table ORDERS as follows: +-----+---------------------+-------------+--------+ |OID | DATE | CUSTOMER_ID | AMOUNT | +-----+---------------------+-------------+--------+ | 102 | 2009-10-08 00:00:00 | 3 | 3000 | | 100 | 2009-10-08 00:00:00 | 3 | 1500 | | 101 | 2009-11-20 00:00:00 | 2 | 1560 | | 103 | 2008-05-20 00:00:00 | 4 | 2060 | HiveQL - Select-Joins
  • 23. Page 23Classification: Restricted create a doc for orders on desktop and paste:::: 102,2009-10-08 00:00:00,3,3000 100,2009-10-08 00:00:00,3,1500 101,2009-11-20 00:00:00,2,1560 103,2008-05-20 00:00:00,4,2060 create a doc for customers on desktop and paste:::: 1,Ramesh,32,Ahmedabad,2000.00 2,Khilan,25,Delhi,1500.00 3,kaushik,23,Kota,2000.00 4,Chaitali,25,Mumbai,6500.00 5,Hardik,27,Bhopal,8500.00 6,Komal,22,MP,4500.00 7,Muffy,24,Indore,10000.00 put the data in hdfs:::: hadoop fs -put /home/mishra/Desktop/untti /doc hadoop fs -put /home/mishra/Desktop/delhi /docum now create the table for them in hive as:::: create table CUSTOMERS (ID int,NAME string,AGE int,ADDRESS string,SALARY string) row format delimited fields terminated by ','; HiveQL - Select-Joins
  • 24. Page 24Classification: Restricted create table ORDERS (OID int,date string,CUSTOMER_ID int,AMOUNT string) row format delimited fields terminated by ','; now load data into hive table as::: load data inpath '/doc' into table ORDERS; load data inpath '/docum' into table CUSTOMERS; There are different types of joins given as follows: 1. JOIN 2. LEFT OUTER JOIN 3. RIGHT OUTER JOIN 4. FULL OUTER JOIN JOIN The JOIN creates a new result table by combining column values of two tables (table1 and table2) based upon the join-predicate. The query compares each row of table1 with each row of table2 to find all pairs of rows which satisfy the join-predicate. When the join-predicate is satisfied, column values for each matched pair of rows of A and B are combined into a result row. Different Types of Join
Page 25Classification: Restricted
The following query executes JOIN on the CUSTOMERS and ORDERS tables and retrieves the matching records:
hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
      FROM CUSTOMERS c JOIN ORDERS o
      ON (c.ID = o.CUSTOMER_ID);
or, using the explicit INNER JOIN keyword (this variant selects DATE in place of AGE):
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS INNER JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
On successful execution of the first query, you get to see the following response:
+----+----------+-----+--------+
| ID | NAME     | AGE | AMOUNT |
+----+----------+-----+--------+
| 3  | kaushik  | 23  | 3000   |
| 3  | kaushik  | 23  | 1500   |
| 2  | Khilan   | 25  | 1560   |
| 4  | Chaitali | 25  | 2060   |
+----+----------+-----+--------+
Different Types of Join
Page 26Classification: Restricted
The HiveQL LEFT OUTER JOIN returns all the rows from the left table, even if there are no matches in the right table. That is, if the ON clause matches 0 (zero) records in the right table, the JOIN still returns a row in the result, but with NULL in each column from the right table.
A LEFT JOIN returns all the values from the left table, plus the matched values from the right table, or NULL in case of no matching join-predicate.
The following query demonstrates LEFT OUTER JOIN between the CUSTOMERS and ORDERS tables:
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
      FROM CUSTOMERS c LEFT OUTER JOIN ORDERS o
      ON (c.ID = o.CUSTOMER_ID);
or, in the regular SQL style:
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS LEFT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
Different Types of Join
Page 27Classification: Restricted
On successful execution of the query, you get to see the following response:
+----+----------+--------+---------------------+
| ID | NAME     | AMOUNT | DATE                |
+----+----------+--------+---------------------+
| 1  | Ramesh   | NULL   | NULL                |
| 2  | Khilan   | 1560   | 2009-11-20 00:00:00 |
| 3  | kaushik  | 3000   | 2009-10-08 00:00:00 |
| 3  | kaushik  | 1500   | 2009-10-08 00:00:00 |
| 4  | Chaitali | 2060   | 2008-05-20 00:00:00 |
| 5  | Hardik   | NULL   | NULL                |
| 6  | Komal    | NULL   | NULL                |
| 7  | Muffy    | NULL   | NULL                |
+----+----------+--------+---------------------+
RIGHT OUTER JOIN
The HiveQL RIGHT OUTER JOIN returns all the rows from the right table, even if there are no matches in the left table. If the ON clause matches 0 (zero) records in the left table, the JOIN still returns a row in the result, but with NULL in each column from the left table.
Different Types of Join
Page 28Classification: Restricted
A RIGHT JOIN returns all the values from the right table, plus the matched values from the left table, or NULL in case of no matching join-predicate.
The following query demonstrates RIGHT OUTER JOIN between the CUSTOMERS and ORDERS tables:
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
      FROM CUSTOMERS c RIGHT OUTER JOIN ORDERS o
      ON (c.ID = o.CUSTOMER_ID);
or:
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS RIGHT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
On successful execution of the query, you get to see the following response:
+------+----------+--------+---------------------+
| ID   | NAME     | AMOUNT | DATE                |
+------+----------+--------+---------------------+
| 3    | kaushik  | 3000   | 2009-10-08 00:00:00 |
| 3    | kaushik  | 1500   | 2009-10-08 00:00:00 |
| 2    | Khilan   | 1560   | 2009-11-20 00:00:00 |
| 4    | Chaitali | 2060   | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+
Different Types of Join
Page 29Classification: Restricted
The HiveQL FULL OUTER JOIN combines the records of both the left and the right outer tables that fulfil the JOIN condition. The joined table contains either all the records from both tables, or fills in NULL values for missing matches on either side.
The following query demonstrates FULL OUTER JOIN between the CUSTOMERS and ORDERS tables:
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
      FROM CUSTOMERS c FULL OUTER JOIN ORDERS o
      ON (c.ID = o.CUSTOMER_ID);
or:
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS FULL JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
Different Types of Join
Page 30Classification: Restricted
On successful execution of the query, you get to see the following response. Because every order in ORDERS matches a customer, each matched pair appears once, and the customers without orders are padded with NULLs:
+------+----------+--------+---------------------+
| ID   | NAME     | AMOUNT | DATE                |
+------+----------+--------+---------------------+
| 1    | Ramesh   | NULL   | NULL                |
| 2    | Khilan   | 1560   | 2009-11-20 00:00:00 |
| 3    | kaushik  | 3000   | 2009-10-08 00:00:00 |
| 3    | kaushik  | 1500   | 2009-10-08 00:00:00 |
| 4    | Chaitali | 2060   | 2008-05-20 00:00:00 |
| 5    | Hardik   | NULL   | NULL                |
| 6    | Komal    | NULL   | NULL                |
| 7    | Muffy    | NULL   | NULL                |
+------+----------+--------+---------------------+
Different Types of Join
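The join syntax on page 21 also lists LEFT SEMI JOIN, which the slides do not demonstrate. As a minimal sketch on the same tables: it returns each left-table row at most once when a match exists, and only left-table columns may appear in the SELECT list:
hive> SELECT c.ID, c.NAME
      FROM CUSTOMERS c LEFT SEMI JOIN ORDERS o
      ON (c.ID = o.CUSTOMER_ID);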
Page 31Classification: Restricted
GROUP BY
Generate a query to retrieve the number of employees in each department. The following query retrieves the employee count per department for this scenario (assuming an employee table with a Dept column):
hive> SELECT Dept, count(*) FROM employee GROUP BY Dept;
ORDER BY
SELECT * FROM CUSTOMERS ORDER BY NAME;
Following is an example which sorts the result in descending order by NAME:
SELECT * FROM CUSTOMERS ORDER BY NAME DESC;
Different Types of Join
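To tie GROUP BY back to the tables used in this chapter, a sketch counting orders per customer might look like this; the expected counts follow directly from the ORDERS sample data shown on page 22:
hive> SELECT CUSTOMER_ID, count(*) AS num_orders
      FROM ORDERS GROUP BY CUSTOMER_ID;
-- expected from the sample data: customer 2 has 1 order, 3 has 2, 4 has 1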
Page 32Classification: Restricted
Hive is a good tool for performing queries on large datasets, especially datasets that require full table scans. But quite often users need to filter the data on specific column values. Generally, Hive users know the domain of the data they deal with. With this knowledge they can identify the frequently queried, low-cardinality columns that are good candidates for organizing data with the partitioning feature of Hive.
In non-partitioned tables, Hive has to read all the files in a table's data directory and subsequently apply filters to them. This is slow and expensive, especially in the case of large tables.
Partitions are essentially slices of data which allow larger sets of data to be separated into more manageable chunks. When a partitioned table is queried with one or more partition columns in the WHERE clause, Hive effectively performs partition elimination, scanning only those data directories that are needed. If no partition columns are used, then all the directories are scanned (a full table scan) and partitioning has no effect.
Partitions
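To make partition elimination concrete: each partition value gets its own subdirectory under the table's warehouse directory, so a WHERE clause on the partition column lets Hive skip whole directories. A hypothetical layout for a table logs partitioned by day:
/user/hive/warehouse/logs/day=mon/   <- scanned for WHERE day='mon'
/user/hive/warehouse/logs/day=tue/   <- skipped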
Page 33Classification: Restricted
 Hive organizes tables into partitions. Partitioning is a way of dividing a table into related parts based on the values of partition columns such as date, city, and department. Using partitions, it is easy to query a portion of the data.
How to create a partitioned table:
create table anand(url string, page string) partitioned by(day string);
 How to load data into a partition:
load data local inpath '/home/andy1/Desktop/1234.txt' into table anand partition(day='tue');
The partition directories can be viewed under the table's warehouse location, e.g. /user/hive/warehouse/anand
Querying on the partition column scans only that partition's directory:
hive> select * from anand where day='tue';
Partitions in Hive
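To confirm which partitions exist, Hive's SHOW PARTITIONS command lists them; a sketch for the table above:
hive> SHOW PARTITIONS anand;
-- each partition maps to a directory such as day=tue under the table's location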
Page 34Classification: Restricted
Tables or partitions are sub-divided into buckets to provide extra structure to the data, which may be used for more efficient querying. Bucketing works on the value of a hash function applied to some column of the table.
 How to create buckets:
CREATE TABLE bucketed_users (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;
 How to create sorted buckets (using a different table name, since bucketed_users already exists):
CREATE TABLE sorted_bucketed_users (id INT, name STRING)
CLUSTERED BY (id) SORTED BY (id ASC) INTO 4 BUCKETS;
Buckets
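The slides do not show how to populate a bucketed table. As a minimal sketch, assuming an unbucketed source table users(id, name) already holds the data, older Hive versions need bucketing enforcement enabled before the insert:
hive> set hive.enforce.bucketing = true;   -- not needed on Hive 2.x+, where bucketing is always enforced
hive> INSERT OVERWRITE TABLE bucketed_users
      SELECT id, name FROM users;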
Page 35Classification: Restricted
If your partitioned table is very large, you can block full-table-scan queries by putting Hive into strict mode. Strict mode means you can query only on the partitions defined; a query that would result in a full table scan (i.e. a query without any partition columns in the WHERE clause) raises an error instead of executing.
You can set strict mode in Hive with the command:
set hive.mapred.mode=strict;
Now query the data we partitioned earlier. A query on a non-partition column, such as:
select * from logs where line='wes';
shows a semantic error, BUT if you query on the column which you have partitioned by, it shows the desired output.
You can undo strict mode with:
set hive.mapred.mode=nonstrict;
The same query now gives output such as:
2344 wes 25feb india
Strict Mode in Hive
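As a concrete sketch of the same behaviour, using the anand table partitioned on day from page 33:
hive> set hive.mapred.mode=strict;
hive> select * from anand;                 -- rejected: no partition predicate, would scan all partitions
hive> select * from anand where day='tue'; -- allowed: partition column in the WHERE clause
hive> set hive.mapred.mode=nonstrict;
hive> select * from anand;                 -- allowed again: full scans permitted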
Page 36Classification: Restricted
LIKE in Hive:
a LIKE b is true if string a matches the SQL simple pattern b (with '%' and '_' as wildcards). Here both operands are columns, so with no wildcard characters in the data it behaves as a string-equality check:
create database andy;
use andy;
create table rat(id int, dep string, des string) row format delimited fields terminated by ',';
load data local inpath '/home/mishra/Desktop/naya' into table rat;
select * from rat;
SELECT * FROM rat WHERE des LIKE dep;
This command returns the rows where the string in des matches dep.
RLIKE in Hive:
A RLIKE B is true if any substring of A matches the regular expression B, otherwise false.
Suppose my data is:
id    dep    des
1     hr     hr
2     hr     man
3     peon   staff
NULL  NULL   NULL
1     hr     shr
2     hman   man
3     peon   staff
Like and Rlike in Hive
Page 37Classification: Restricted
SELECT * FROM rat WHERE des RLIKE dep;   -- rows where dep is a substring of des
Output:
1    hr      hr
1    hr      shr
SELECT * FROM rat WHERE dep RLIKE des;   -- rows where des is a substring of dep
Output:
1    hr      hr
2    hman    man
Like and Rlike in Hive
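More commonly, LIKE and RLIKE are used with literal patterns rather than column-to-column comparison. A sketch against the same rat table:
SELECT * FROM rat WHERE dep LIKE 'h%';      -- SQL wildcard: dep values starting with 'h'
SELECT * FROM rat WHERE des RLIKE '^s.*r$'; -- Java regex: des values starting with 's' and ending with 'r';
                                            -- anchors are needed because RLIKE matches any substring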
Page 38Classification: Restricted
Hive UDF
Hive has built-in functions such as LIKE and RLIKE, which we can use in our Hive programs without adding any extra code. But sometimes a user requirement is not covered by the built-in functions; in that case the user can write a custom user defined function, called a UDF.
The process is:
Open Eclipse, create a package named xyz, and save the class as ToUpper.java. Paste the following user-defined code into it:
package xyz;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class ToUpper extends UDF {
  public Text evaluate(Text s) {
    Text to_value = new Text("");
    if (s != null) {
      try {
        // convert the input text to upper case
        to_value.set(s.toString().toUpperCase());
      } catch (Exception e) {
        // on failure, fall back to the original value
        to_value = new Text(s);
      }
    }
    return to_value;
  }
}
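Before the jar can be added in Hive, the class has to be compiled and packaged. A minimal sketch from a shell, assuming ToUpper.java sits in a directory xyz/ matching its package and the two jars mentioned on the next slide are used for the classpath:
javac -classpath "/usr/local/work/hadoop/share/hadoop/common/*:/usr/local/work/hive/lib/*" xyz/ToUpper.java
jar cf hiveudf.jar xyz/ToUpper.class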
Page 39Classification: Restricted
Add the external jars to your Eclipse project. The two most important jars are the hadoop-common jar, visible in the common folder at /usr/local/work/hadoop/share/hadoop/common, and the hive-exec jar, present in the lib folder at /usr/local/work/hive/lib.
Now add the jar in your Hive session using the add jar command:
hive> add jar /home/ands/Desktop/hiveudf.jar;
Create a temporary function, using create temporary function, with the name by which you want to run your UDF:
hive> create temporary function toupper as 'xyz.ToUpper';
hive> create table anda (name string, age int) row format delimited fields terminated by ',';
hive> load data local inpath '/home/ands/Desktop/expudf' into table anda;
hive> select toupper(name) from anda;
Hive UDF
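A quick smoke test of the function, assuming a reasonably recent Hive that allows SELECT without a FROM clause:
hive> select toupper('hive');
OK
HIVE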