2. Content
Introduction
Why Hive?
Configuring Hive
The Hive Shell
Hive Architecture
HiveQL
Data Types and Table types
Managed Table
External Table
Storage Formats
Queries
View
Hive Data Model
The Metastore
User Defined Functions
What Hive is not?
3. Introduction
Hive is a data warehouse infrastructure built on top of Apache Hadoop.
Hive is designed to enable:
Easy data summarization
Ad-hoc querying
Analysis of large volumes of data
Hive provides a simple query language called HiveQL.
HiveQL also allows traditional map/reduce programmers to
plug in their custom mappers and reducers.
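As a rough illustration of plugging in a custom mapper, HiveQL's TRANSFORM ... USING clause streams rows to an external script as tab-separated lines on stdin and reads tab-separated rows back from stdout. The sketch below writes such a script and dry-runs it locally without Hive. The script name upper_name.sh, the table users, and its columns id and name are hypothetical and not taken from these slides.

```shell
# Create a scratch directory for the hypothetical mapper script.
workdir="$(mktemp -d)"

# The mapper: read tab-separated rows on stdin, upper-case column 2.
cat > "$workdir/upper_name.sh" <<'EOF'
#!/bin/sh
awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, toupper($2) }'
EOF
chmod +x "$workdir/upper_name.sh"

# Local dry run: feed two sample rows through the mapper.
printf '1\talice\n2\tbob\n' | "$workdir/upper_name.sh"

# Inside the Hive shell, the same script could then be used like this
# (assumes a table named users with columns id and name):
#   ADD FILE /path/to/upper_name.sh;
#   SELECT TRANSFORM (id, name) USING 'upper_name.sh' AS (id, name)
#   FROM users;
```

Testing the script with a plain pipe first keeps the feedback loop short, since a streaming script only ever sees text on stdin.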
4. Why Hive?
Need for a multi-petabyte warehouse
Files are insufficient data abstractions
Need tables, schemas, partitions, and indices
Need for an open data format
RDBMSs have a closed data format
Flexible schema
5. Configuring Hive
Download a release at ftp://ftp.nextgen.com
Unpack the tarball in a suitable place on your workstation
% tar xzf hive-x.y.z-dev.tar.gz
Put Hive on your path
% export HIVE_HOME=/home/EmpID/hive-x.y.z-dev
% export PATH=$PATH:$HIVE_HOME/bin
Type hive to launch the shell
% hive
hive>
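The export commands above only last for the current session. A minimal sketch of a snippet that could go in a shell startup file is shown here. It assumes the tarball was unpacked under your home directory, and x.y.z still stands for whichever release you downloaded.

```shell
# Assumed install location; x.y.z stands for the release you unpacked.
HIVE_HOME="${HIVE_HOME:-$HOME/hive-x.y.z-dev}"

# Append $HIVE_HOME/bin to PATH only if it is not already there.
case ":$PATH:" in
    *":$HIVE_HOME/bin:"*) ;;
    *) PATH="$PATH:$HIVE_HOME/bin" ;;
esac
export HIVE_HOME PATH

# Report whether the hive launcher can now be resolved.
if command -v hive >/dev/null 2>&1; then
    echo "hive found at: $(command -v hive)"
else
    echo "hive not found; check that $HIVE_HOME/bin/hive exists"
fi
```

The case guard keeps PATH from growing each time the startup file is sourced.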
6. The Hive Shell
The hive shell is the primary way that we will interact with Hive.
HiveQL is Hive's query language, a dialect of SQL.
HiveQL is generally case insensitive (except for string comparisons).
The hive shell can be run in non-interactive mode also.
The -f option runs the commands in the specified script file.
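A small sketch of non-interactive use with the -f option follows. The script name show_tables.q and its contents are made up for illustration, and the call is guarded so the snippet still runs on a machine where hive is not on the path.

```shell
# Write a one-statement HiveQL script (hypothetical name and contents).
scriptdir="$(mktemp -d)"
script="$scriptdir/show_tables.q"
printf 'SHOW TABLES;\n' > "$script"

# Run the script non-interactively if hive is available.
if command -v hive >/dev/null 2>&1; then
    hive -f "$script"
else
    echo "hive not found on PATH; would run: hive -f $script"
fi
```

For a short one-off statement, the -e option takes the command inline instead, for example hive -e 'SHOW TABLES;'.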