SESSION 2017-2018
B.TECH (CSE) YEAR: III SEMESTER: VI
INTRODUCTION TO HIVE
(CSE6005)
MODULE 2 (L6)
Presented By
Vivek Kumar
Dept of Computer Engineering & Applications
GLA University India
Agenda
Learning Objectives Learning Outcomes
Introduction to Hive
1. To study the Hive Architecture
2. To study the Hive File format
3. To study the Hive Query
Language
a) To understand the hive
architecture.
b) To create databases, tables and
execute data manipulation
language statements on it.
c) To differentiate between static
and dynamic partitions.
d) To differentiate between
managed and external tables.
Agenda
 What is Hive?
 Hive Architecture
 Hive Data Types
 Primitive Data Types
 Collection Data Types
 Hive File Format
 Text File
 Sequential File
 RCFile (Record Columnar File)
Agenda …
 Hive Query Language
 DDL (Data Definition Language) Statements
 DML (Data Manipulation Language) Statements
 Database
 Tables
 Partitions
 Buckets
 Aggregation
 Group BY and Having
 SerDe
Case Study: Retail
 Major Indian retailers such as Future Group, Reliance Industries, Tata
Group and Aditya Birla Group use Hive.
 One of these retail groups, let’s call it BigX,
wanted its last 5 years of semi-structured
data analyzed for trends and patterns.
 Let us see how we can solve their problem
using Hadoop.
Case Study: Retail cont..
About BigX
 BigX is a hypermarket chain in India. It
currently has 220+ stores across 85
cities and towns in India and employs 35,000+
people. Its annual revenue for the year 2011
was USD 1 billion. It offers a wide range of
products including fashion and apparel, food
products, books, furniture, electronics, health
care, general merchandise and entertainment
sections.
Case Study: Retail cont..
Problem Scenario
1. One of the BigX log datasets that needed to be
analyzed was approximately 12 TB in overall
size and held 5 years of vital information in
semi-structured form.
Case Study: Retail cont..
2. Traditional business intelligence (BI) tools are
good up to a certain degree, usually several
hundreds of gigabytes. But when the scale is
of the order of terabytes and petabytes, these
frameworks become inefficient. Also, BI tools
work best when data is present in a known
pre-defined schema. The particular dataset
from BigX was mostly logs which didn’t
conform to any specific schema.
Case Study: Retail cont..
3. It took around 12+ hours to move the data
into their Business Intelligence systems bi-
weekly. BigX wanted to reduce this time
drastically.
4. Querying such a large dataset was taking too
long.
Case Study: Retail cont..
Solution
 This is where Hadoop shines in all its glory as
a solution. Since the size of the logs dataset is
12 TB, at such a large scale the problem is
two-fold:
 Problem 1: Moving the logs dataset to HDFS
periodically
 Problem 2: Performing the analysis on this
HDFS dataset
Case Study: Retail cont..
Solution of Problem 1
 Since the logs were unstructured in this case,
Sqoop was of little or no use. So Flume was
used to move the log data periodically into
HDFS.
Case Study: Retail cont..
Solution of Problem2
 Hive is a data warehouse infrastructure built on top of
Hadoop for providing data summarization, query and
analysis. It provides an SQL-like language called
HiveQL and converts the query into MapReduce tasks.
Hive in this Case Study
 Hive uses “Schema on Read” unlike a
traditional database which uses “Schema on
Write”.
 While reading log files, the simplest
recommended approach during Hive table
creation is to use a RegexSerDe.
 By default, Hive metadata is stored in
an embedded Derby database, which allows
only one user to issue queries at a time. This is
not ideal for production purposes. Hence, Hive
was configured to use an external metastore
database instead.
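As an aside, what a RegexSerDe does at read time can be sketched in Python. The log format, pattern, and function name below are illustrative assumptions — BigX's actual log layout is not given in the case study:

```python
import re

# Hypothetical log line format: <ip> [<timestamp>] "<request>".
# A RegexSerDe maps each capture group to one column of the Hive table.
LOG_PATTERN = re.compile(r'^(\S+) \[([^\]]+)\] "([^"]*)"$')

def parse_log_line(line):
    """Split one semi-structured log line into columns, as a RegexSerDe would."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None  # a real RegexSerDe yields NULL columns for non-matching lines
    return {"ip": m.group(1), "ts": m.group(2), "request": m.group(3)}

print(parse_log_line('10.0.0.1 [01/Jan/2018:10:00:00] "GET /index.html"'))
```

The table creation then only needs the regular expression; no upfront schema conversion of the raw logs is required.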
Conclusion- Case Study: Retail
 Using the Hadoop system, log transfer time
was reduced to ~3 hours bi-weekly and
querying time also was significantly improved.
 Thanks to Vijay, Big Data Lead
at 8KMiles (M.Tech in Information
Retrieval from IIIT-B), for the case study.
 https://yourstory.com/2012/04/hive-for-retail-
analysis/
What is Hive?
 Hive is a data warehousing tool built on top of
Hadoop, used to query structured data.
 Facebook created Hive to manage its
ever-growing volumes of data. Hive
makes use of the following:
1. HDFS for Storage
2. MapReduce for execution
3. Stores metadata in an RDBMS.
What is Hive?
 Apache Hive is a popular SQL interface for
batch processing on Hadoop.
 Hadoop was built to organize and store
massive amounts of data.
 Hive gives another way to access data inside
the cluster in an easy, quick way.
 Hive provides a query language
called HiveQL that closely resembles the
common Structured Query Language (SQL)
standard.
 Hive was one of the earliest projects to bring
higher-level languages to Apache Hadoop.
 Hive gives analysts and data scientists the
ability to access data without being experts
in Java.
 Hive gives structure to data on HDFS, making
it a data warehousing platform.
 This interface to Hadoop
 not only accelerates the time required to produce
results from data analysis,
 it significantly broadens who can use Hadoop and
MapReduce.
 Let us take a moment to thank Facebook team
because
 Hive was developed by the Facebook Data team
and, after being used internally,
 it was contributed to the Apache Software
Foundation .
 Currently Hive is freely available as an open-source Apache project.
What Hive is not?
 Hive is not a relational database; it uses a
database to store metadata, but the data that
Hive processes is stored in HDFS.
 Hive is not designed for online transaction
processing (OLTP).
 Hive is not suited for real-time queries and
row-level updates; it is best used for batch jobs
over large sets of immutable data such as web
logs.
Typical Use-Case of Hive
 Hive takes a large amount of unstructured data
and places it into a structured view.
 Hive supports use cases such as ad-hoc
queries, summarization and data analysis.
 HiveQL can also be extended with custom
scalar functions (UDFs), aggregations
(UDAFs) and table functions (UDTFs).
 It converts SQL queries into MapReduce jobs.
Features of Hive
1. It is similar to SQL.
2. HQL is easy to code.
3. Hive supports rich data types such as structs,
lists, and maps.
4. Hive supports SQL filters, group-by and order-
by clauses.
Prerequisites of Hive in Hadoop
 The prerequisites for setting up Hive and
running queries are
1. User should have a stable build of Hadoop
2. Machine should have Java 1.6 installed
3. Basic Java Programming skills
4. Basic SQL Knowledge
 Start all the services of Hadoop using the
command $ start-all.sh.
 Check all services are running, then use $ hive to
start HIVE
Hive Integration and Workflow
 Hourly log data can be stored directly into HDFS.
 Then data cleaning is performed on the log file.
 Finally, a Hive table can be created to query the log file.
[Diagram: Hourly Log → Hadoop HDFS → Log Compression → Hive Table 1 / Hive Table 2]
Hive Architecture
[Diagram: Hive components — Command-Line Interface, Hive Web Interface and Hive Server (Thrift), over the Driver (Query Compiler, Executor) and Metastore, running on Hadoop (JobTracker, TaskTracker, HDFS)]
Hive Architecture
The various parts are as follows:
 Hive Command-Line Interface (Hive CLI): The most
commonly used interface to interact with Hive.
 Hive Web Interface: A simple graphical user
interface to interact with Hive and execute queries.
 Hive Server: This is an optional server. This can be
used to submit Hive Jobs from a remote client.
 JDBC / ODBC: Jobs can be submitted from a JDBC
Client. One can write a Java code to connect to Hive
and submit jobs on it.
Hive Architecture
 Driver: Hive queries are sent to the driver for
compilation, optimization and execution.
 Metastore: Hive table definitions and mappings to the
data are stored in a Metastore. A Metastore
consists of the following:
- Metastore service: Offers an interface to Hive.
- Database: Stores data definitions, mappings to the
data and others.
 The metadata stored in the metastore includes IDs
of databases, tables and indexes, the time of
creation of a table, the input format used for a table, the
output format used for a table, etc. The metastore is updated
whenever a table is created or deleted from Hive. There are
three kinds of metastore.
Hive Architecture
 1. Embedded Metastore: This metastore is mainly used
for unit tests. Here, only one process is allowed to
connect to the metastore at a time. This is the default
metastore for Hive and uses the Apache Derby database. In this
mode, both the database and the metastore service
run embedded in the main Hive Server process.
Figure 9.8 shows an Embedded Metastore.
 2. Local Metastore: Metadata can be stored in any
RDBMS component such as MySQL. A local metastore allows
multiple connections at a time. In this mode, the Hive
metastore service runs in the main Hive Server process,
but the metastore database runs in a separate process,
and can be on a separate host. Figure 9.9 shows a local
metastore.
Hive Architecture
 3. Remote Metastore: In this, the Hive driver and the
metastore interface run on different JVMs (which can
run on different machines as well) as in Figure 9.10.
This way the database can be fire-walled from the Hive
user and also database credentials are completely
isolated from the users of Hive.
Hive Data Units
Hive Data Model Contd.
 Tables
- Analogous to relational tables
- Each table has a corresponding directory in
HDFS
- Data serialized and stored as files within that
directory
- Hive has default serialization built in which
supports compression and lazy deserialization
- Users can specify custom serialization –
deserialization schemes (SerDe’s)
Hive Data Model Contd.
 Partitions
- Each table can be broken into partitions
- Partitions determine distribution of data within
subdirectories
Example -
CREATE TABLE Sales (sale_id INT, amount FLOAT)
PARTITIONED BY (country STRING, year INT, month INT);
So each partition will be split out into different folders
like
Sales/country=US/year=2012/month=12
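The partition-to-directory mapping above can be sketched as a small helper. The function name is illustrative, not a Hive API:

```python
def partition_path(table, **partition_values):
    """Build the HDFS subdirectory Hive uses for a set of partition values."""
    parts = "/".join(f"{key}={value}" for key, value in partition_values.items())
    return f"{table}/{parts}"

print(partition_path("Sales", country="US", year=2012, month=12))
# Sales/country=US/year=2012/month=12
```

Queries that filter on the partition columns only need to read the matching subdirectories, which is why partitioning speeds up such queries.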
Hierarchy of Hive Partitions
[Diagram: table → partition directories → files]
 The general definition of a partition
is horizontally dividing the data into a number
of slices in an equal and manageable manner.
 Every partition is stored as a directory within the
data warehouse table.
 The partition concept is common in data
warehousing, but there are two types of
partitions available:
i) SQL Partition
ii) Hive Partition
Hive Partition
 The main work of a Hive partition is the same
as a SQL partition, but
 the main difference is that a SQL partition is
supported only for a single column in a table,
while a Hive partition supports multiple
columns in a table.
Hive Data Model Contd.
 Buckets
- Data in each partition divided into buckets
- Based on a hash function of the column
- H(column) mod NumBuckets = bucket
number
- Each bucket is stored as a file in partition
directory
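The bucket assignment rule above can be sketched in Python. Note that Hive's actual hash function is type-dependent (an INT hashes to itself, strings use a Java-style hash), so Python's built-in `hash` merely stands in for it here:

```python
def bucket_number(value, num_buckets):
    """Assign a row to a bucket: H(column) mod NumBuckets.

    Python's built-in hash() stands in for Hive's type-dependent hash.
    """
    return hash(value) % num_buckets

# For an integer column value, CPython's hash(n) == n, so value 10
# with 3 buckets lands in bucket 10 % 3 == 1.
print(bucket_number(10, 3))
```

Because the function is deterministic, every row with the same column value lands in the same bucket file, which is what makes bucketed sampling and bucketed joins possible.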
Hive Data Types
Numeric Data Type
TINYINT 1-byte signed integer
SMALLINT 2-byte signed integer
INT 4-byte signed integer
BIGINT 8-byte signed integer
FLOAT 4-byte single-precision floating-point number
DOUBLE 8-byte double-precision floating-point number
String Types
STRING
VARCHAR Only available starting with Hive 0.12.0
CHAR Only available starting with Hive 0.13.0
Strings can be expressed in either single quotes (‘) or double quotes (“)
Miscellaneous Types
BOOLEAN
BINARY Only available starting with Hive 0.8.0
Hive Data Types cont..
Collection Data Types
STRUCT Similar to ‘C’ struct. Fields are accessed using dot notation.
E.g.: struct('John', 'Doe')
MAP A collection of key - value pairs. Fields are accessed using [] notation.
E.g.: map('first', 'John', 'last', 'Doe')
ARRAY Ordered sequence of same types. Fields are accessed using array index.
E.g.: array('John', 'Doe')
Hive File Format
 Text File: The default file format is text file.
 Sequential File: Sequential files are flat files
that store binary key-value pairs.
 RCFile (Record Columnar File):
RCFile stores the data in a column-oriented
manner, which ensures that aggregation
is not an expensive operation.
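A toy sketch of why column-oriented storage makes aggregation cheap — the data below is invented for illustration:

```python
# The same (sale_id, amount) records in two layouts.
rows = [(1, 100.0), (2, 250.0), (3, 75.0)]                        # row-oriented
columns = {"sale_id": [1, 2, 3], "amount": [100.0, 250.0, 75.0]}  # column-oriented

# Row layout: summing 'amount' still walks every full row.
row_total = sum(r[1] for r in rows)

# Columnar layout: the aggregate reads only the one column it needs,
# skipping all other column data (and compressing better on disk).
col_total = sum(columns["amount"])

assert row_total == col_total == 425.0
```

On disk the saving is I/O, not arithmetic: an RCFile reader can skip whole column chunks that the query never references.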
Hive Query Language (HQL)
 Works on Databases, Tables, Partitions, Buckets
(Clusters)
 Create and manage tables and partitions.
 Support various Relational, Arithmetic, and Logical
Operators.
 Evaluate functions.
 Download the contents of a table to a local
directory, or the results of queries to an HDFS directory.
Database
 To create a database named “STUDENTS”
with comments and database properties.
CREATE DATABASE IF NOT EXISTS
STUDENTS COMMENT 'STUDENT Details'
WITH DBPROPERTIES ('creator' = 'JOHN');
Database
 To describe a database
DESCRIBE DATABASE STUDENTS;
 To show Databases
SHOW DATABASES;
 To drop database.
DROP DATABASE STUDENTS;
Tables
 There are two types of tables in Hive:
Managed table
External table
 The difference between the two appears when you
drop a table:
 if it is a managed table, Hive deletes both data and
metadata;
 if it is an external table, Hive deletes only the metadata.
 Use the EXTERNAL keyword to create an external
table.
Tables
To create managed table named ‘STUDENT’.
CREATE TABLE IF NOT EXISTS
STUDENT (rollno INT, name STRING, gpa
FLOAT) ROW FORMAT DELIMITED FIELDS
TERMINATED BY '\t';
Tables
To create external table named
‘EXT_STUDENT’.
CREATE EXTERNAL TABLE IF NOT EXISTS
EXT_STUDENT (rollno INT, name STRING, gpa
FLOAT) ROW FORMAT DELIMITED FIELDS
TERMINATED BY '\t' LOCATION
'/STUDENT_INFO';
Tables
To load data into the table from file named
student.tsv.
LOAD DATA LOCAL INPATH
'/root/hivedemos/student.tsv' OVERWRITE
INTO TABLE EXT_STUDENT;
To retrieve the student details from
“EXT_STUDENT” table.
SELECT * from EXT_STUDENT;
Table ALTER Operations
 ALTER TABLE mytablename RENAME TO mt;
 ALTER TABLE mytable ADD COLUMNS (mycol
STRING);
 ALTER TABLE name RENAME TO new_name;
 ALTER TABLE name DROP [COLUMN]
column_name;
 ALTER TABLE name CHANGE column_name
new_name new_type;
 ALTER TABLE name REPLACE COLUMNS
(col_spec[, col_spec ...]);
Partitions
 Partitions split the larger dataset into more meaningful chunks.
 Hive provides two kinds of partitions: Static Partition and Dynamic
Partition.
• To create a static partition based on the “gpa” column.
CREATE TABLE IF NOT EXISTS STATIC_PART_STUDENT
(rollno INT, name STRING) PARTITIONED BY (gpa FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
To load data into the partitioned table from another table.
INSERT OVERWRITE TABLE STATIC_PART_STUDENT
PARTITION (gpa = 4.0) SELECT rollno, name FROM
EXT_STUDENT WHERE gpa = 4.0;
Partitions
• To create a dynamic partition on the “gpa” column.
CREATE TABLE IF NOT EXISTS
DYNAMIC_PART_STUDENT (rollno INT, name STRING)
PARTITIONED BY (gpa FLOAT) ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
To load data into the dynamic partition table from another table.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
Note: Dynamic partition strict mode requires at least one static
partition column. To turn this off,
set hive.exec.dynamic.partition.mode = nonstrict.
INSERT OVERWRITE TABLE DYNAMIC_PART_STUDENT
PARTITION (gpa) SELECT rollno, name, gpa FROM
EXT_STUDENT;
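Dynamic partitioning routes each inserted row to a partition directory based on the value of the partition column at insert time. A rough Python sketch, with made-up student rows:

```python
from collections import defaultdict

# Hypothetical (rollno, name, gpa) rows from the source table.
students = [(1, "John", 4.0), (2, "Jane", 3.5), (3, "Joe", 4.0)]

# Each distinct gpa value becomes its own partition directory,
# created on demand — no PARTITION (gpa = ...) clause per value.
partitions = defaultdict(list)
for rollno, name, gpa in students:
    partitions[f"gpa={gpa}"].append((rollno, name))

print(sorted(partitions))
# ['gpa=3.5', 'gpa=4.0']
```

This is the key contrast with a static partition, where the writer must name the single target partition explicitly in the INSERT statement.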
Buckets
 Tables or partitions are sub-divided
into buckets, to provide extra structure to the
data that may be used for more efficient
querying. Bucketing works based on the value
of hash function of some column of a table.
 We can add partitions to a table by altering the
table. Let us assume we have a table
called employee with fields such as Id, Name,
Salary, Designation, Dept, and yoj.
Buckets
• To create a bucketed table having 3 buckets.
CREATE TABLE IF NOT EXISTS STUDENT_BUCKET (rollno
INT, name STRING, grade FLOAT)
CLUSTERED BY (grade) INTO 3 BUCKETS;
Load data to bucketed table.
FROM STUDENT
INSERT OVERWRITE TABLE STUDENT_BUCKET
SELECT rollno,name,grade;
To display the content of the first bucket.
SELECT DISTINCT GRADE FROM STUDENT_BUCKET
TABLESAMPLE(BUCKET 1 OUT OF 3 ON GRADE);
Aggregations
 Hive supports aggregation functions like avg,
count, etc.
 To write the average and count aggregation
function.
SELECT avg(gpa) FROM STUDENT;
SELECT count(*) FROM STUDENT;
Group by and Having
To write a group by and having query.
SELECT rollno, name,gpa
FROM STUDENT
GROUP BY rollno,name,gpa
HAVING gpa > 4.0;
SerDe
 SerDe stands for Serializer/Deserializer.
 Contains the logic to convert unstructured data
into records.
 Implemented using Java.
 Serializers are used at the time of writing.
 Deserializers are used at query time (SELECT
Statement).
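A minimal SerDe-style pair for the tab-delimited STUDENT records used earlier can be sketched in Python. This is illustrative only, not Hive's actual SerDe interface:

```python
def serialize(record):
    """Write time: turn a structured (rollno, name, gpa) record into a storage line."""
    return "\t".join(str(field) for field in record)

def deserialize(line):
    """Query time (SELECT): turn a storage line back into typed columns."""
    rollno, name, gpa = line.split("\t")
    return (int(rollno), name, float(gpa))

line = serialize((1, "John", 4.0))
assert deserialize(line) == (1, "John", 4.0)  # round trip preserves the record
```

A custom SerDe plugs exactly this pair of conversions into Hive, which is how unstructured files (such as the logs in the retail case study) are exposed as rows and columns.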
Fill in the blanks
 The metastore consists of ______________
and a ______________.
 The most commonly used interface to interact
with Hive is ______________.
 The default metastore for Hive is
______________.
 Metastore contains ______________ of Hive
tables.
 ______________ is responsible for
compilation, optimization, and execution of
Hive queries.

More Related Content

Similar to Introduction to Hive Architecture and Query Language

01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptxVIJAYAPRABAP
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 
Big Data Analytics With Hadoop
Big Data Analytics With HadoopBig Data Analytics With Hadoop
Big Data Analytics With HadoopUmair Shafique
 
OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...
OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...
OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...Puneet Kansal
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introductionsaisreealekhya
 
hive architecture and hive components in detail
hive architecture and hive components in detailhive architecture and hive components in detail
hive architecture and hive components in detailHariKumar544765
 
Hadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An OverviewHadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An Overviewrahulmonikasharma
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with HadoopNalini Mehta
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive AnalyticsManish Chopra
 

Similar to Introduction to Hive Architecture and Query Language (20)

01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 
Big Data Analytics With Hadoop
Big Data Analytics With HadoopBig Data Analytics With Hadoop
Big Data Analytics With Hadoop
 
1. Apache HIVE
1. Apache HIVE1. Apache HIVE
1. Apache HIVE
 
hadoop
hadoophadoop
hadoop
 
hadoop
hadoophadoop
hadoop
 
OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...
OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...
OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...
 
Hive with HDInsight
Hive with HDInsightHive with HDInsight
Hive with HDInsight
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introduction
 
Hive_Pig.pptx
Hive_Pig.pptxHive_Pig.pptx
Hive_Pig.pptx
 
Big data
Big dataBig data
Big data
 
hive architecture and hive components in detail
hive architecture and hive components in detailhive architecture and hive components in detail
hive architecture and hive components in detail
 
Hadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An OverviewHadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An Overview
 
Hive.pptx
Hive.pptxHive.pptx
Hive.pptx
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Apache hive1
Apache hive1Apache hive1
Apache hive1
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive Analytics
 

More from Anonymous9etQKwW

More from Anonymous9etQKwW (11)

os distributed system theoretical foundation
os distributed system theoretical foundationos distributed system theoretical foundation
os distributed system theoretical foundation
 
osi model computer networks complete detail
osi model computer networks complete detailosi model computer networks complete detail
osi model computer networks complete detail
 
CODch3Slides.ppt
CODch3Slides.pptCODch3Slides.ppt
CODch3Slides.ppt
 
IntroductoryPPT_CSE242.pptx
IntroductoryPPT_CSE242.pptxIntroductoryPPT_CSE242.pptx
IntroductoryPPT_CSE242.pptx
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
Lecture 2 Hadoop.pptx
Lecture 2 Hadoop.pptxLecture 2 Hadoop.pptx
Lecture 2 Hadoop.pptx
 
mapreduceApril24.ppt
mapreduceApril24.pptmapreduceApril24.ppt
mapreduceApril24.ppt
 
ch7.ppt
ch7.pptch7.ppt
ch7.ppt
 
lecture 2.pptx
lecture 2.pptxlecture 2.pptx
lecture 2.pptx
 
Chap 4.ppt
Chap 4.pptChap 4.ppt
Chap 4.ppt
 
Artificial Neural Networks_Bioinsspired_Algorithms_Nov 20.ppt
Artificial Neural Networks_Bioinsspired_Algorithms_Nov 20.pptArtificial Neural Networks_Bioinsspired_Algorithms_Nov 20.ppt
Artificial Neural Networks_Bioinsspired_Algorithms_Nov 20.ppt
 

Recently uploaded

OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2RajaP95
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 

Recently uploaded (20)

OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 

Introduction to Hive Architecture and Query Language

  • 1. SESSION 2017-2018 B.TECH (CSE) YEAR: III SEMESTER: VI INTRODUCTION TO HIVE (CSE6005) MODULE 2 (L6) Presented By Vivek Kumar Dept of Computer Engineering & Applications GLA University India
  • 2. Agenda Learning Objectives Learning Outcomes Introduction to Hive 1. To study the Hive Architecture 2. To study the Hive File format 3. To study the Hive Query Language a) To understand the hive architecture. b) To create databases, tables and execute data manipulation language statements on it. c) To differentiate between static and dynamic partitions. d) To differentiate between managed and external tables.
  • 3. Agenda  What is Hive?  Hive Architecture  Hive Data Types  Primitive Data Types  Collection Data Types  Hive File Format  Text File  Sequential File  RCFile (Record Columnar File)
  • 4. Agenda …  Hive Query Language  DDL (Data Definition Language) Statements  DML (Data Manipulation Language) Statements  Database  Tables  Partitions  Buckets  Aggregation  Group BY and Having  SERDER
  • 5. Case Study: Retail  Major Indian retailers include FutureGroup, Reliance Industries, Tata Group and Aditya Birla Group are using Hive.  One of the retail groups, let’s call it BigX, wanted their last 5 years semi- structured dataset to be analyzed for trends and patterns.  Let us see how we can solve their problem using Hadoop.
  • 6. Case Study: Retail cont.. About BigX  BigX is a chain of hypermarkets in India. Currently there are 220+ stores across 85 cities and towns in India, and it employs 35,000+ people. Its annual revenue for the year 2011 was USD 1 billion. It offers a wide range of products including fashion and apparel, food products, books, furniture, electronics, health care, general merchandise and entertainment sections.
  • 7. Case Study: Retail cont.. Problem Scenario 1. One of BigX’s log datasets that needed to be analyzed was approximately 12TB in overall size and held 5 years of vital information in semi-structured form.
  • 8. Case Study: Retail cont.. 2. Traditional business intelligence (BI) tools are good up to a certain degree, usually several hundreds of gigabytes. But when the scale is of the order of terabytes and petabytes, these frameworks become inefficient. Also, BI tools work best when data is present in a known pre-defined schema. The particular dataset from BigX was mostly logs which didn’t conform to any specific schema.
  • 9. Case Study: Retail cont.. 3. It took around 12+ hours to move the data into their Business Intelligence systems bi-weekly. BigX wanted to reduce this time drastically. 4. Querying such a large dataset was taking too long.
  • 10. Case Study: Retail cont.. Solution  This is where Hadoop shines in all its glory as a solution. Since the size of the logs dataset is 12TB, at such a large scale, the problem is 2- fold:  Problem 1: Moving the logs dataset to HDFS periodically  Problem 2: Performing the analysis on this HDFS dataset
  • 11. Case Study: Retail cont.. Solution of Problem1  Since logs are unstructured in this case, Sqoop was of little or no use. So Flume was used to move the log data periodically into HDFS.
  • 12. Case Study: Retail cont.. Solution of Problem2  Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query and analysis. It provides an SQL-like language called HiveQL and converts the query into MapReduce tasks.
  • 13.
  • 14. Hive in this Case Study  Hive uses “Schema on Read”, unlike a traditional database, which uses “Schema on Write”.  While reading log files, the simplest recommended approach during Hive table creation is to use a RegexSerDe.  By default, Hive metadata is stored in an embedded Derby database which allows only one user to issue queries. This is not ideal for production purposes. Hence, Hive was configured to use an external RDBMS as its metastore.
  • 15. Conclusion- Case Study: Retail  Using the Hadoop system, log transfer time was reduced to ~3 hours bi-weekly and querying time also was significantly improved.  Thanks to Vijay, for case study, Big Data Lead at 8KMiles, holds M. Tech in Information Retrieval from IIIT-B.  https://yourstory.com/2012/04/hive-for-retail- analysis/
  • 16. What is Hive?  Hive is a data warehousing tool built on top of Hadoop, used to query structured data.  Facebook created Hive to manage their ever-growing volumes of data. Hive makes use of the following: 1. HDFS for storage 2. MapReduce for execution 3. An RDBMS to store metadata.
  • 17. What is Hive?  Apache Hive is a popular SQL interface for batch processing on Hadoop.  Hadoop was built to organize and store massive amounts of data.  Hive gives another way to access data inside the cluster in an easy, quick way.
  • 18.  Hive provides a query language called HiveQL that closely resembles the common Structured Query Language (SQL) standard.  Hive was one of the earliest projects to bring higher-level languages to Apache Hadoop.  Hive gives analysts and data scientists the ability to access data without being experts in Java.  Hive gives structure to data on HDFS, making it a data warehousing platform.
  • 19.  This interface to Hadoop not only accelerates the time required to produce results from data analysis, it significantly broadens who can use Hadoop and MapReduce.  Let us take a moment to thank the Facebook team:  Hive was developed by the Facebook Data team and, after being used internally,  it was contributed to the Apache Software Foundation.  Currently, Hive is freely available as an open-source project.
  • 20. What Hive is not?  Hive is not a relational database; it uses a database to store metadata, but the data that Hive processes is stored in HDFS.  Hive is not designed for on-line transaction processing (OLTP).  Hive is not suited for real-time queries and row-level updates; it is best used for batch jobs over large sets of immutable data such as web logs.
  • 21. Typical Use-Case of Hive  Hive takes a large amount of unstructured data and places it into a structured view.  Hive supports use cases such as ad-hoc queries, summarization and data analysis.  HiveQL can also be extended with custom scalar functions (UDFs), aggregations (UDAFs) and table functions (UDTFs).  It converts SQL queries into MapReduce jobs.
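As a sketch of how a custom UDF might be registered and used, assuming a hypothetical JAR path and class name (ADD JAR and CREATE TEMPORARY FUNCTION are standard HiveQL statements):

```sql
-- Register a custom UDF packaged in a JAR (path and class name are hypothetical)
ADD JAR /tmp/my_udfs.jar;
CREATE TEMPORARY FUNCTION to_upper AS 'com.example.hive.ToUpperUDF';

-- Once registered, the UDF is called like any built-in function
SELECT to_upper(name) FROM STUDENT;
```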
  • 22. Features of Hive 1. It is similar to SQL. 2. HQL is easy to code. 3. Hive supports rich data types such as structs, lists, and maps. 4. Hive supports SQL filters, group-by and order- by clauses.
  • 23. Prerequisites of Hive in Hadoop  The prerequisites for setting up Hive and running queries are: 1. User should have a stable build of Hadoop 2. Machine should have Java 1.6 installed 3. Basic Java programming skills 4. Basic SQL knowledge  Start all the services of Hadoop using the command $ start-all.sh.  Once all services are running, use $ hive to start Hive.
  • 24. Hive Integration and Workflow  Hourly log data can be stored directly into HDFS,  then data cleaning is performed on the log file,  and finally a Hive table can be created to query the log file. (Diagram: Hourly Log → Log Compression → Hive Table 1 and Hive Table 2 on Hadoop HDFS)
  • 25. Hive Architecture (Diagram: the Command-Line Interface, Hive Web Interface and Hive Server (Thrift) connect to the Driver (Query Compiler, Executor); the Driver uses the Metastore and submits jobs to Hadoop’s JobTracker/TaskTracker over HDFS)
  • 26. Hive Architecture The various parts are as follows:  Hive Command-Line Interface (Hive CLI): The most commonly used interface to interact with Hive.  Hive Web Interface: A simple graphical user interface to interact with Hive and to execute queries.  Hive Server: This is an optional server. It can be used to submit Hive jobs from a remote client.  JDBC / ODBC: Jobs can be submitted from a JDBC client. One can write Java code to connect to Hive and submit jobs on it.
  • 27. Hive Architecture  Driver: Hive queries are sent to the driver for compilation, optimization and execution.  Metastore: Hive table definitions and mappings to the data are stored in a Metastore. A Metastore consists of the following: Metastore service: offers an interface to Hive. Database: stores data definitions, mappings to the data and others.  The metadata which is stored in the metastore includes IDs of databases, tables and indexes, the time of creation of a table, the input format used for a table, the output format used for a table, etc. The metastore is updated whenever a table is created or deleted from Hive. There are three kinds of metastore.
  • 28. Hive Architecture  1. Embedded Metastore: This metastore is mainly used for unit tests. Here, only one process is allowed to connect to the metastore at a time. This is the default metastore for Hive. It is the Apache Derby database. In this metastore, both the database and the metastore service run embedded in the main Hive Server process. Figure 9.8 shows an Embedded Metastore.  2. Local Metastore: Metadata can be stored in any RDBMS component like MySQL. A local metastore allows multiple connections at a time. In this mode, the Hive metastore service runs in the main Hive Server process, but the metastore database runs in a separate process, and can be on a separate host. Figure 9.9 shows a Local Metastore.
  • 29. Hive Architecture  3. Remote Metastore: In this, the Hive driver and the metastore interface run on different JVMs (which can run on different machines as well) as in Figure 9.10. This way the database can be fire-walled from the Hive user and also database credentials are completely isolated from the users of Hive.
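A local or remote metastore is configured in hive-site.xml. Below is a minimal sketch assuming a MySQL metastore database; the host name, database name and credentials are placeholders (the javax.jdo.option.* property names are Hive's standard metastore connection settings):

```xml
<!-- hive-site.xml: point the metastore at an external MySQL database -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://db.example.com:3306/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
</property>
```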
  • 30.
  • 32. Hive Data Model Contd.  Tables - Analogous to relational tables - Each table has a corresponding directory in HDFS - Data serialized and stored as files within that directory - Hive has default serialization built in which supports compression and lazy deserialization - Users can specify custom serialization – deserialization schemes (SerDe’s)
  • 33. Hive Data Model Contd.  Partitions - Each table can be broken into partitions - Partitions determine distribution of data within subdirectories Example - CREATE TABLE Sales (sale_id INT, amount FLOAT) PARTITIONED BY (country STRING, year INT, month INT) So each partition will be split out into different folders like Sales/country=US/year=2012/month=12
  • 34. Hierarchy of Hive Partitions (Diagram: nested partition subdirectories, each containing data files)
  • 35. Partition  The general definition of partitioning is horizontally dividing the data into a number of slices in an equal and manageable manner.  Every partition is stored as a directory within the data warehouse table.  In data warehousing this partition concept is common, but there are two types of partitions available in data warehouse concepts.  These are i) SQL Partition ii) Hive Partition
  • 36. Hive Partition  The main work of a Hive partition is the same as a SQL partition, but  the main difference between a SQL partition and a Hive partition is that a SQL partition is supported for only a single column in a table, while a Hive partition is supported for multiple columns in a table.
  • 37. Hive Data Model Contd.  Buckets - Data in each partition divided into buckets - Based on a hash function of the column - H(column) mod NumBuckets = bucket number - Each bucket is stored as a file in partition directory
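The hash-mod rule above can be sketched in HiveQL itself, using the built-in hash() and pmod() functions (the query is illustrative; actual bucket assignment is done by Hive at load time):

```sql
-- Illustrative: which of 3 buckets each grade value would land in
-- (pmod gives a non-negative remainder, matching H(column) mod NumBuckets)
SELECT grade, pmod(hash(grade), 3) AS bucket_no
FROM STUDENT;
```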
  • 38. Hive Data Types Numeric types: TINYINT (1-byte signed integer), SMALLINT (2-byte signed integer), INT (4-byte signed integer), BIGINT (8-byte signed integer), FLOAT (4-byte single-precision floating point), DOUBLE (8-byte double-precision floating point). String types: STRING; VARCHAR (only available starting with Hive 0.12.0); CHAR (only available starting with Hive 0.13.0). Strings can be expressed in either single quotes (‘) or double quotes (“). Miscellaneous types: BOOLEAN; BINARY (only available starting with Hive 0.8.0).
  • 39. Hive Data Types cont.. Collection Data Types STRUCT Similar to ‘C’ struct. Fields are accessed using dot notation. E.g.: struct('John', 'Doe') MAP A collection of key-value pairs. Fields are accessed using [] notation. E.g.: map('first', 'John', 'last', 'Doe') ARRAY Ordered sequence of same types. Fields are accessed using array index. E.g.: array('John', 'Doe')
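A sketch of how the three collection types might appear together in one table definition (the EMPLOYEE table, its columns and the delimiters below are hypothetical):

```sql
-- Hypothetical table using all three collection types
CREATE TABLE EMPLOYEE (
  name   STRUCT<first:STRING, last:STRING>,
  phones MAP<STRING, STRING>,
  skills ARRAY<STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';

-- Fields are accessed with dot, [], and index notation respectively
SELECT name.first, phones['home'], skills[0] FROM EMPLOYEE;
```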
  • 40. Hive File Format  Text File: The default file format is text file.  Sequential File: Sequential files are flat files that store binary key-value pairs.  RCFile (Record Columnar File): RCFile stores the data in Column Oriented Manner which ensures that Aggregation operation is not an expensive operation.
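The file format is chosen with a STORED AS clause at table-creation time; a sketch (table names are illustrative):

```sql
-- Default: plain text file
CREATE TABLE logs_text (line STRING) STORED AS TEXTFILE;

-- Flat file of binary key-value pairs
CREATE TABLE logs_seq (line STRING) STORED AS SEQUENCEFILE;

-- Column-oriented storage; keeps aggregation queries cheap
CREATE TABLE logs_rc (line STRING) STORED AS RCFILE;
```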
  • 41. Hive Query Language (HQL)  Works on databases, tables, partitions and buckets (clusters)  Create and manage tables and partitions.  Supports various relational, arithmetic and logical operators.  Evaluate functions.  Download the contents of a table to a local directory, or the result of queries to an HDFS directory.
  • 42. Database  To create a database named “STUDENTS” with comments and database properties. CREATE DATABASE IF NOT EXISTS STUDENTS COMMENT 'STUDENT Details' WITH DBPROPERTIES ('creator' = 'JOHN');
  • 43. Database  To describe a database DESCRIBE DATABASE STUDENTS;  To show Databases SHOW DATABASES;  To drop database. DROP DATABASE STUDENTS;
  • 44. Tables  There are two types of tables in Hive: Managed table External table  The difference between the two shows when you drop a table:  if it is a managed table, Hive deletes both data and metadata; if it is an external table, Hive only deletes the metadata.  Use the EXTERNAL keyword to create an external table.
  • 45. Tables To create a managed table named ‘STUDENT’. CREATE TABLE IF NOT EXISTS STUDENT(rollno INT,name STRING,gpa FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
  • 46. Tables To create an external table named ‘EXT_STUDENT’. CREATE EXTERNAL TABLE IF NOT EXISTS EXT_STUDENT(rollno INT,name STRING,gpa FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/STUDENT_INFO';
  • 47. Tables To load data into the table from file named student.tsv. LOAD DATA LOCAL INPATH ‘/root/hivedemos/student.tsv' OVERWRITE INTO TABLE EXT_STUDENT; To retrieve the student details from “EXT_STUDENT” table. SELECT * from EXT_STUDENT;
  • 48. Table ALTER Operations  ALTER TABLE mytablename RENAME TO mt;  ALTER TABLE mytable ADD COLUMNS (mycol STRING);  ALTER TABLE name RENAME TO new_name;  ALTER TABLE name DROP [COLUMN] column_name;  ALTER TABLE name CHANGE column_name new_name new_type;  ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...]);
  • 49. Partitions  Partitions split the larger dataset into more meaningful chunks.  Hive provides two kinds of partitions: Static Partition and Dynamic Partition. • To create a static partition based on the “gpa” column. CREATE TABLE IF NOT EXISTS STATIC_PART_STUDENT (rollno INT, name STRING) PARTITIONED BY (gpa FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; Load data into the partitioned table from a table. INSERT OVERWRITE TABLE STATIC_PART_STUDENT PARTITION (gpa =4.0) SELECT rollno, name from EXT_STUDENT where gpa=4.0;
  • 50. Partitions • To create a dynamic partition table on column gpa. CREATE TABLE IF NOT EXISTS DYNAMIC_PART_STUDENT(rollno INT, name STRING) PARTITIONED BY (gpa FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; To load data into a dynamic partition table from a table. SET hive.exec.dynamic.partition = true; SET hive.exec.dynamic.partition.mode = nonstrict; Note: The dynamic partition strict mode requires at least one static partition column. To turn this off, set hive.exec.dynamic.partition.mode=nonstrict INSERT OVERWRITE TABLE DYNAMIC_PART_STUDENT PARTITION (gpa) SELECT rollno,name,gpa from EXT_STUDENT;
  • 51. Buckets  Tables or partitions are sub-divided into buckets, to provide extra structure to the data that may be used for more efficient querying. Bucketing works based on the value of hash function of some column of a table.  We can add partitions to a table by altering the table. Let us assume we have a table called employee with fields such as Id, Name, Salary, Designation, Dept, and yoj.
  • 52. Buckets • To create a bucketed table having 3 buckets. CREATE TABLE IF NOT EXISTS STUDENT_BUCKET (rollno INT,name STRING,grade FLOAT) CLUSTERED BY (grade) INTO 3 BUCKETS; Load data into the bucketed table. FROM STUDENT INSERT OVERWRITE TABLE STUDENT_BUCKET SELECT rollno,name,grade; To display the content of the first bucket. SELECT DISTINCT GRADE FROM STUDENT_BUCKET TABLESAMPLE(BUCKET 1 OUT OF 3 ON GRADE);
  • 53. Aggregations  Hive supports aggregation functions like avg, count, etc.  To write the average and count aggregation function. SELECT avg(gpa) FROM STUDENT; SELECT count(*) FROM STUDENT;
  • 54. Group by and Having To write group by and having function. SELECT rollno, name,gpa FROM STUDENT GROUP BY rollno,name,gpa HAVING gpa > 4.0;
  • 55. SerDe  SerDe stands for Serializer/Deserializer.  It contains the logic to convert unstructured data into records.  Implemented using Java.  Serializers are used at the time of writing.  Deserializers are used at query time (SELECT statement).
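A sketch of the RegexSerDe approach mentioned in the case study, assuming a simple log layout (the table name, columns, regex and location below are illustrative, not BigX's actual schema):

```sql
-- Each raw log line is split into columns by the regular expression;
-- one capture group per column.
CREATE EXTERNAL TABLE weblogs (host STRING, ts STRING, request STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(\\S+) \\[([^\\]]+)\\] \"([^\"]+)\""
)
LOCATION '/data/weblogs';
```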
  • 56. Fill in the blanks  The metastore consists of ______________ and a ______________.  The most commonly used interface to interact with Hive is ______________.  The default metastore for Hive is ______________.  Metastore contains ______________ of Hive tables.  ______________ is responsible for compilation, optimization, and execution of Hive queries.