Hive training

Content
Introduction
Why Hive?
Configuring Hive
The Hive Shell
Hive Architecture
HiveQL
Data Types and Table types
Managed Table
External Table
Storage Formats
Queries
View
Hive Data Model
The Metastore
User Defined Functions
What Hive is not?

Introduction
Hive is a data warehouse infrastructure built on top of Apache Hadoop.
Hive is designed to enable
Easy data summarization
Ad-hoc querying
Analysis of large volumes of data
Hive provides a simple query language called HiveQL.
HiveQL also lets traditional map/reduce programmers
plug in their custom mappers and reducers.
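
As a rough sketch of what plugging in a custom mapper can look like, HiveQL's TRANSFORM ... USING clause streams rows to an external script as tab-separated lines on standard input and reads result rows back from standard output. The script name upper_mapper.sh, the table name docs, and its columns id and body are hypothetical, chosen only for illustration; they do not come from the slides.

```shell
# Hypothetical mapper: Hive streams each row to the script as a
# tab-separated line on stdin and reads result rows from stdout.
cat > upper_mapper.sh <<'EOF'
#!/bin/sh
# Upper-case every field of every incoming row.
tr '[:lower:]' '[:upper:]'
EOF

# HiveQL that would call the mapper; the table `docs` and its columns
# `id` and `body` are assumptions made up for this example.
cat > upper_mapper.q <<'EOF'
ADD FILE upper_mapper.sh;
SELECT TRANSFORM (id, body)
       USING 'sh upper_mapper.sh'
       AS (id, body)
FROM docs;
EOF

# The mapper can be tried locally, without Hive, on a sample row.
printf 'a1\thello hive\n' | sh upper_mapper.sh
```

Testing the script on a plain pipe first is a cheap way to catch mistakes before the query is submitted as a map/reduce job.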

Why Hive
Need for a multi-petabyte warehouse
Files are insufficient data abstractions
Need tables, schemas, partitions, indices
Need for an open data format
RDBMSs have a closed data format
Flexible schema

Configuring Hive
Download a release at ftp://ftp.nextgen.com
Unpack the tarball in a suitable place on your workstation
% tar xzf hive-x.y.z-dev.tar.gz
Put Hive on your path
% export HIVE_HOME=/home/EmpID/hive-x.y.z-dev
% export PATH=$PATH:$HIVE_HOME/bin
Type hive to launch the shell
% hive
hive>

The Hive Shell
The hive shell is the primary way that we will interact with Hive.
HiveQL is Hive's query language, a dialect of SQL.
HiveQL is generally case insensitive (except for string comparisons).
The hive shell can be run in non-interactive mode also.
The -f option runs the commands in the specified script file.
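
A minimal sketch of non-interactive use, assuming hive is on the PATH as set up in the configuration steps. The file name show_tables.q is a hypothetical choice. The -f option runs a script file as described above; the -e option, which the slides do not mention, runs a statement passed inline.

```shell
# Write a one-statement HiveQL script; the file name is made up for this example.
cat > show_tables.q <<'EOF'
SHOW TABLES;
EOF

# Run it only when the hive launcher is actually installed.
if command -v hive >/dev/null 2>&1; then
    hive -f show_tables.q      # run the commands in the script file
    hive -e 'SHOW TABLES;'     # run an inline statement instead
else
    echo "hive not found on PATH; skipping" >&2
fi
```

Keeping statements in a script file makes them easy to rerun and to put under version control, while -e suits one-off checks.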

Hive Architecture (Contd..)

Configuring Hive to have MySQL as Metastore DB


User Defined Functions (Contd..)
