4. Why HIVE
• Structured Data
• Large Data Set
• MapReduce: Parallel Distribution
• Query Data
4Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016
6. HIVE Architecture
• User Interface: Web UI, Hive Command Line, HD Insight
• Meta Store
• Hive QL Process Engine
• Execution Engine
• HDFS or HBASE Storage System
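8. Hive File Formats
• Text Files - Plain text, with fields separated by a delimiter
• Sequence Files - Binary key-value files, more compact than text
• RC Files - Record Columnar files, suited to analytic processing
• ORC Files - Optimized Row Columnar files, a binary columnar format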
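The difference between row-oriented formats (text, sequence files) and columnar formats (RC, ORC) can be illustrated with a small Python sketch. This is a toy in-memory model and not how Hive encodes files on disk; the sample rows and the helper names `to_columns` and `column_sum` are hypothetical.

```python
# Toy model: the same three records laid out by row and by column.
# Fields are (product, price, state); the values are made up.
rows = [
    ("Laptop", 1200, "Texas"),
    ("Phone", 800, "Ohio"),
    ("Tablet", 400, "Texas"),
]


def to_columns(records):
    """Pivot row tuples into one list per column (columnar layout)."""
    return [list(col) for col in zip(*records)]


def column_sum(columns, index):
    """Aggregate a single column; the other columns are never touched."""
    return sum(columns[index])


columns = to_columns(rows)
total_price = column_sum(columns, 1)
```

An analytic query such as a sum over one column only needs that column's list, which is why columnar layouts suit analytic processing.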
9. HIVE Query Language (HQL)
Hive query language offers:
• Creating databases
• Creating, managing and partitioning tables
• Relational, arithmetic and logical operators for evaluating expressions
• DDL and DML statements
10. DDL (Data Definition Language) Statements
The DDL commands are listed below:
• Create, Alter, Drop database
• Create, Alter, Drop, Truncate table
• Create, Alter with Partitioning and Bucketing
• Create Views
• Show
• Describe
11. DML (Data Manipulation Language) Statements
• Loading files
• Inserting data into Hive tables from queries
12. Database Operations
Syntax
CREATE DATABASE IF NOT EXISTS db_name
COMMENT 'db_name Details'
WITH DBPROPERTIES ('creator' = 'name');
Example
CREATE DATABASE IF NOT EXISTS LIBDETS
COMMENT 'LIBRARY DETAILS'
WITH DBPROPERTIES ('creator' = 'KIRUTHI');
13. Database Operations
Syntax
SHOW DATABASES; -- displays the available databases
Example
SHOW DATABASES;
Syntax
DESCRIBE DATABASE db_name; -- displays the schema of the database
DESCRIBE DATABASE EXTENDED db_name; -- also displays the database properties
Example
DESCRIBE DATABASE LIBDETS;
DESCRIBE DATABASE EXTENDED LIBDETS;
14. ALTER Database
Syntax
ALTER DATABASE db_name -- alters database properties
SET DBPROPERTIES ('edited-by' = 'name');
Example
ALTER DATABASE LIBDETS
SET DBPROPERTIES ('edited-by' = 'KANI');
15. USE, DROP Database
Syntax
USE db_name; -- sets the current working database
Example
USE LIBDETS;
Syntax
DROP DATABASE db_name; -- deletes the database
Example
DROP DATABASE LIBDETS;
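16. TABLES
Hive supports two types of tables:
• Managed Table - Data is stored in the Hive warehouse folder; dropping the table deletes both the metadata and the data
• External Table - Data stays in the specified location even when the table is dropped; only the metadata is removed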
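The practical difference between the two table types shows up when a table is dropped. The following Python sketch models it with two dictionaries standing in for the metastore and the file system. It is a toy under stated assumptions: the class `ToyHive`, its methods and the paths are hypothetical and are not part of any Hive API.

```python
class ToyHive:
    """Toy model of the metastore (table definitions) and storage (data)."""

    def __init__(self):
        self.metastore = {}   # table name -> (location, is_external)
        self.storage = {}     # location -> list of rows

    def create_table(self, name, location, external=False):
        self.metastore[name] = (location, external)
        self.storage.setdefault(location, [])

    def drop_table(self, name):
        location, external = self.metastore.pop(name)
        if not external:
            # Managed table: Hive owns the data, so it is deleted too.
            del self.storage[location]
        # External table: only the metadata entry is removed.


hive = ToyHive()
hive.create_table("managed_tbl", "/warehouse/managed_tbl")
hive.create_table("external_tbl", "/data/external_tbl", external=True)
hive.storage["/data/external_tbl"].append(("row",))
hive.drop_table("managed_tbl")
hive.drop_table("external_tbl")
```

After both drops the metastore is empty, the managed table's data is gone, and the external table's data is still in place.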
17. Creating Managed Table
Syntax
CREATE TABLE IF NOT EXISTS tb_name (column_name data_type,
column_name data_type, column_name data_type)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Example
CREATE TABLE IF NOT EXISTS LIBTBL (Member_Code INT,
Member_Name STRING, Designation STRING, Dept_code INT,
dept_name STRING, group_name STRING, course_name STRING,
title STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
18. Creating External Table
Syntax
CREATE EXTERNAL TABLE IF NOT EXISTS tb_name
(column_name datatype, column_name datatype,
column_name datatype)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/home/usr/filename.format';
Example
CREATE EXTERNAL TABLE IF NOT EXISTS LIBTBL
(Member_Code INT, Member_Name STRING, Designation
STRING, Dept_code INT, course_code INT, dept_name STRING,
group_name STRING, course_name STRING, title STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/home/livrith/Desktop/Book2.csv';
19. Loading Data into Table
Syntax
LOAD DATA LOCAL INPATH
'local_file_or_directory_path' -- LOCAL reads from the local file system, not HDFS
OVERWRITE INTO TABLE tb_name;
Example
LOAD DATA LOCAL INPATH
'/home/kiruthika/Documents/Book2.csv'
OVERWRITE INTO TABLE LIBTBL;
20. Select clause
Syntax
SELECT [ALL | DISTINCT] select_expr, select_expr, . . .
FROM tb_name
[WHERE where_condition]
[GROUP BY column_name]
[HAVING having_condition]
[ORDER BY column_name]
[DISTRIBUTE BY column_name]
[LIMIT number];
Example:1
SELECT * FROM LIBTBL;
Example:2
SELECT Member_Name, Designation FROM LIBTBL;
21. Select - where
Example
SELECT * FROM LIBUDET
WHERE group_name = 'TEACHING'
OR group_name = 'student'
AND Dept_name >= '18';
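In the WHERE example, AND binds more tightly than OR, so the department filter applies only to the 'student' branch. Python's `and` and `or` follow the same precedence, so the effect can be checked with a short sketch. The function names and the sample values are hypothetical.

```python
def where_as_written(group_name, dept_name):
    # `and` binds tighter than `or`, mirroring SQL operator precedence:
    # TEACHING OR (student AND dept_name >= '18')
    return group_name == "TEACHING" or group_name == "student" and dept_name >= "18"


def where_with_parentheses(group_name, dept_name):
    # Alternative reading: either group, but always with the department filter.
    return (group_name == "TEACHING" or group_name == "student") and dept_name >= "18"


# A TEACHING row whose dept_name sorts below '18' (string comparison).
row = ("TEACHING", "05")
kept_as_written = where_as_written(*row)
kept_with_parentheses = where_with_parentheses(*row)
```

If the filter is meant to apply to both groups, the OR condition needs explicit parentheses.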
Select - pattern matching (LIKE)
Syntax
SELECT column1, column2, column3 FROM tb_name
WHERE column_name LIKE '%alp%';
Example
SELECT PRODUCT, STATE, CITY FROM SALESDETS
WHERE City LIKE '%O%';
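LIKE uses SQL wildcards rather than full regular expressions: `%` matches any run of characters and `_` matches exactly one. A minimal Python sketch of that matching rule follows, assuming case-sensitive comparison; the helper names `like_to_regex` and `like` and the city list are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern: % = any run of characters, _ = exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    # fullmatch: LIKE must match the whole value, not just a substring.
    return like_to_regex(pattern).fullmatch(value) is not None


cities = ["Ohio", "Boston", "London", "Miami"]
matches = [c for c in cities if like(c, "%O%")]
```

Under this case-sensitive rule only values containing an uppercase "O" match the pattern '%O%'.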
22. Group by
Example
SELECT PRODUCT, COUNT(PRODUCT) AS C1, STATE, COUNTRY
FROM SALESDETS
GROUP BY PRODUCT, STATE, COUNTRY;
Order by -- sorts the complete result; uses only one reducer
Example
SELECT PRODUCT, STATE, PRICE, COUNTRY FROM SALESDETS
ORDER BY COUNTRY;
23. Sort by -- sorts the rows within each reducer
Example
SELECT PRODUCT, STATE, COUNTRY FROM SALESDETS
SORT BY COUNTRY
LIMIT 10;
Having -- filters the groups produced by Group By
Example
SELECT PRODUCT, COUNT(PRODUCT) AS C1, STATE, COUNTRY
FROM SALESDETS
GROUP BY PRODUCT, STATE, COUNTRY
HAVING COUNT(PRODUCT) > 5;
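GROUP BY followed by HAVING can be sketched in Python as counting identical (product, state, country) tuples and then filtering the groups by their count. The sample rows and the function name `group_count_having` are hypothetical, and the threshold is lowered to fit the tiny dataset.

```python
from collections import Counter

# Made-up (product, state, country) rows.
sales = [
    ("Laptop", "Texas", "USA"),
    ("Laptop", "Texas", "USA"),
    ("Laptop", "Texas", "USA"),
    ("Phone", "Ohio", "USA"),
]


def group_count_having(rows, min_count):
    """GROUP BY all three columns, COUNT, then HAVING count > min_count."""
    counts = Counter(rows)  # one group per distinct tuple
    return {group: n for group, n in counts.items() if n > min_count}


result = group_count_having(sales, 2)
```

WHERE filters individual rows before grouping, whereas HAVING filters whole groups after the counts are known.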
24. Limit
Example
SELECT PRODUCT, STATE, PRICE, COUNTRY FROM SALESDETS
ORDER BY COUNTRY LIMIT 10;
Distribute by -- distributes rows among reducers
Syntax
SELECT column_name1, column_name2, column_name3 FROM tb_name
DISTRIBUTE BY column_name
SORT BY column_name ASC, column_name ASC
LIMIT count;
Example
SELECT PRODUCT, PRICE, STATE FROM SALESDETS
DISTRIBUTE BY STATE
SORT BY STATE ASC, PRODUCT ASC
LIMIT 50;
25. Cluster by -- does the job of both Distribute By and Sort By
Example
SELECT PRODUCT, PRICE, STATE FROM SALESDETS
CLUSTER BY STATE LIMIT 50;
Difference in Execution of Order By, Sort By, Distribute By, Cluster By
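The four clauses differ in how rows reach the reducers and where sorting happens. The Python sketch below simulates reducers as lists. It is an illustration only: the sum of character codes stands in for Hive's own hash function, and all function names and sample rows are hypothetical.

```python
from collections import defaultdict


def distribute_by(rows, key, num_reducers):
    """Send every row with the same key value to the same reducer."""
    reducers = defaultdict(list)
    for row in rows:
        # Sum of character codes stands in for Hive's hash function.
        reducers[sum(map(ord, row[key])) % num_reducers].append(row)
    return [reducers[i] for i in range(num_reducers)]


def order_by(rows, key):
    """A single reducer sees all rows, so the output is totally ordered."""
    return sorted(rows, key=lambda r: r[key])


def sort_by(reducer_inputs, key):
    """Each reducer sorts only its own rows; no global order is implied."""
    return [sorted(part, key=lambda r: r[key]) for part in reducer_inputs]


def cluster_by(rows, key, num_reducers):
    """DISTRIBUTE BY key followed by SORT BY the same key."""
    return sort_by(distribute_by(rows, key, num_reducers), key)


rows = [{"state": s} for s in ["Texas", "Ohio", "Utah", "Ohio", "Texas"]]
ordered = order_by(rows, "state")
clustered = cluster_by(rows, "state", 2)
```

ORDER BY gives one globally sorted list at the cost of a single reducer, while CLUSTER BY gives several lists that are each sorted and that never share a key value.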
26. Data Aggregation
• COUNT
• AVG, AVG(DISTINCT column_name)
• MIN, MIN(DISTINCT column_name)
• MAX, MAX(DISTINCT column_name)
27. Partitions
Without partitions, Hive reads the entire dataset from the warehouse even
when a filter condition selects only the rows with a particular column value.
This becomes a bottleneck in MapReduce jobs and involves a large amount of I/O.
Partitioning breaks a large dataset into smaller chunks based on the values
of one or more columns.
Hive supports two types of partition:
• Static partition
• Dynamic partition
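The I/O saving from partitioning can be sketched in Python by modelling the warehouse as one directory per partition value. This is a toy model under stated assumptions: the directory names follow the column=value convention, and the member rows and the function `scan` are hypothetical.

```python
# Toy warehouse: one directory per partition value, named column=value.
warehouse = {
    "designation=Professor": [("Asha", "CS"), ("Ravi", "Maths")],
    "designation=Student": [("Meena", "CS"), ("Kumar", "Physics"), ("Devi", "CS")],
}


def scan(partitions, designation=None):
    """Return (matching rows, number of rows read from storage)."""
    rows_read = 0
    result = []
    for directory, rows in partitions.items():
        value = directory.split("=", 1)[1]
        if designation is not None and value != designation:
            continue  # pruned: this directory is never opened
        rows_read += len(rows)
        result.extend(rows)
    return result, rows_read


professors, rows_read = scan(warehouse, designation="Professor")
all_rows, all_read = scan(warehouse)
```

A filter on the partition column reads only the matching directory, while a query without such a filter still reads every row.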
28. Creating partition table
Syntax
CREATE TABLE tb_name (column1 datatype, column2 datatype,
column3 datatype)
COMMENT 'Details of the dataset'
PARTITIONED BY (column_name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Example
CREATE TABLE MY_TABLE1 (Member_Name STRING, dept_name STRING,
group_name STRING, course_name STRING, title STRING)
COMMENT 'User information'
PARTITIONED BY (Designation STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
29. Load data into static partition table
Syntax
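LOAD DATA LOCAL INPATH 'file_path' OVERWRITE
INTO TABLE tb_name [PARTITION (column_name = 'value')];
-- PARTITION names the target partition when tb_name is a partitioned table
Example
LOAD DATA LOCAL INPATH
'/home/livrith/Desktop/mytab.csv' OVERWRITE INTO
TABLE MY_TABLE2;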
30. Set dynamic partition
The following settings have to be modified to execute
dynamic partitions.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
31. Insert data - Dynamic partition table
Syntax
INSERT OVERWRITE TABLE first_tb_name
PARTITION (column_name)
SELECT column_name1, column_name2, column_name3
FROM second_tb_name;
-- the partition column must be the last column in the SELECT list
Example
INSERT OVERWRITE TABLE MY_TABLE1
PARTITION (Designation)
SELECT Member_Name, dept_name, group_name,
course_name, title, Designation FROM MY_TABLE2;
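With dynamic partitioning, the target partition is decided per row from the last selected column. A minimal Python sketch of that routing follows. The function name `insert_dynamic` and the staged rows are hypothetical, and the rows carry fewer columns than MY_TABLE2 for brevity.

```python
from collections import defaultdict


def insert_dynamic(rows):
    """Route each row to the partition named by its LAST field.

    The partition value becomes the partition name and is not repeated
    inside the stored row.
    """
    partitions = defaultdict(list)
    for *data, partition_value in rows:
        partitions[partition_value].append(tuple(data))
    return dict(partitions)


# Made-up staged rows: (member_name, dept_name, designation).
staged = [
    ("Asha", "CS", "Professor"),
    ("Meena", "CS", "Student"),
    ("Ravi", "Maths", "Professor"),
]
partitions = insert_dynamic(staged)
```

This is why the partition field has to come last in the SELECT list: its position, not its name, determines which value is used for routing.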
33. Bucketing
Bucketing is similar to partitioning.
Each bucket is a file.
Bucketing distributes rows into a fixed number of files by hashing the
values of a specified column, whereas partitioning divides the data into
separate directories, one per distinct column value.
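Bucket assignment can be sketched in Python as a hash of the column value taken modulo the number of buckets. This is a simplified illustration that assumes an integer column whose hash is the value itself; the function names and the price values are hypothetical.

```python
def bucket_for(value, num_buckets):
    """Bucket index for an integer value (simplified: hash of an int is the int)."""
    return (value & 0x7FFFFFFF) % num_buckets


def write_buckets(rows, column, num_buckets):
    """Place every row into the bucket chosen by its column value."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[column], num_buckets)].append(row)
    return buckets


prices = [{"price": p} for p in (1200, 3600, 1201, 7500, 800)]
buckets = write_buckets(prices, "price", 3)
bucket_prices = [[r["price"] for r in b] for b in buckets]
```

The number of buckets is fixed when the table is created, so equal values always land in the same file regardless of how many distinct values exist.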
34. Table creation
Syntax
CREATE TABLE IF NOT EXISTS tb_name (column1 datatype,
column2 datatype, column3 datatype)
CLUSTERED BY (column_name) INTO 3 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Example
CREATE TABLE SALES_BUC1 (Transaction_date TIMESTAMP,
Product STRING, Price INT, Payment_Type STRING, Name STRING,
City STRING, State STRING, Country STRING,
Account_Created TIMESTAMP)
CLUSTERED BY (Price) INTO 3 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
35. Load data into table
Syntax
FROM first_tb_name INSERT OVERWRITE TABLE second_tb_name
SELECT column_name1, column_name2, column_name3;
Example
FROM SALESDETS INSERT OVERWRITE TABLE SALES_BUC1
SELECT Transaction_date, Product, Price, Payment_Type, Name,
City, State, Country, Account_Created;
36. Select from bucket table
Syntax:1
SELECT DISTINCT column_name FROM tb_name
TABLESAMPLE (BUCKET 1 OUT OF 3 ON column_name);
Example
SELECT DISTINCT Price FROM SALES_BUC1
TABLESAMPLE (BUCKET 1 OUT OF 3 ON Price);
Syntax:2
SELECT DISTINCT column_name FROM tb_name
TABLESAMPLE (BUCKET 1 OUT OF 2 ON column_name);
Example
SELECT DISTINCT Price FROM SALES_BUC1
TABLESAMPLE (BUCKET 1 OUT OF 2 ON Price);
37. Sampling
• Sampling is used in Hive to draw a small dataset from an
existing large dataset by selecting a subset of its records.
Syntax
SELECT COUNT(*) FROM tb_name TABLESAMPLE
(BUCKET 1 OUT OF 3 ON column_name);
Example
In the example given below, a sample is drawn from the table
SALES_BUC by taking 1 of its 3 buckets.
SELECT COUNT(*) FROM SALES_BUC TABLESAMPLE
(BUCKET 1 OUT OF 3 ON Price);
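The clause TABLESAMPLE (BUCKET x OUT OF y ON column) can be read as: hash each row's column value into y buckets and keep the rows that fall into bucket x. The Python sketch below illustrates that reading under the simplifying assumption that an integer hashes to itself; the function name `tablesample_bucket` and the price values are hypothetical.

```python
def tablesample_bucket(rows, column, x, y):
    """Rows for TABLESAMPLE (BUCKET x OUT OF y ON column); x is 1-based."""
    return [r for r in rows if (r[column] & 0x7FFFFFFF) % y == x - 1]


rows = [{"price": p} for p in range(1, 10)]
sample = tablesample_bucket(rows, "price", 1, 3)
sample_prices = [r["price"] for r in sample]

# The three buckets together cover every row exactly once.
covered = sum(len(tablesample_bucket(rows, "price", x, 3)) for x in (1, 2, 3))
```

Because the sample is chosen by hashing the column, the same query returns the same rows each time, and rows with equal values are always sampled together.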
38. • Apache HBase is an open-source, distributed, versioned,
non-relational database modeled after Google's Bigtable
• Apache HBase provides Bigtable-like capabilities on top
of Hadoop and HDFS.
39. NoSQL Databases
• NoSQL - Not only SQL; non-relational / non-SQL databases
• Schema-less
• Ideology
• BASE - Basically Available, Soft state, Eventual consistency;
only two of consistency, availability and partition tolerance
(replication across nodes) can be fully supported at once
40. NoSQL Types
• Key-value stores - Amazon S3, Riak
• Document-based stores - CouchDB, MongoDB
• Column-based stores - HBase, Cassandra
• Graph-based stores - Neo4j, OrientDB
41. HBASE is Not a Relational Database
• Tables have a single primary key (row key)
• No join operations
• Limited atomicity and transaction support
• Not manipulated by SQL
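The points above follow from how an HBase table is organised: every cell is reached through the row key, and each cell can hold several timestamped versions. The Python sketch below is a toy in-memory model of that layout, not the HBase client API; the class `ToyHBaseTable`, its methods, the row key and the column name are all hypothetical.

```python
import time


class ToyHBaseTable:
    """Toy model: row key -> 'family:qualifier' -> list of (timestamp, value)."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, column, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        cells = self.rows.setdefault(row_key, {}).setdefault(column, [])
        cells.append((ts, value))
        cells.sort(reverse=True)  # newest version first

    def get(self, row_key, column, versions=1):
        """Look up by row key only; there is no join or secondary key."""
        cells = self.rows.get(row_key, {}).get(column, [])
        return [value for _, value in cells[:versions]]


table = ToyHBaseTable()
table.put("member#1001", "info:designation", "Student", timestamp=1)
table.put("member#1001", "info:designation", "Professor", timestamp=2)
latest = table.get("member#1001", "info:designation")
history = table.get("member#1001", "info:designation", versions=2)
```

A read returns the newest version by default, and older versions stay available until they are cleaned up, which is what "versioned" refers to.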
42. HBase components
• Master - Manages load balancing and assigns regions to region servers
• Region server - Serves the range of table rows (regions) assigned by the master
• Region server uses a MemStore, similar to cache memory
• ZooKeeper - Provides services for synchronization and maintenance,
and stores node details
• Clients communicate via ZooKeeper to reach the region servers for
read and write operations