Apache Hive

Content
What you need to know about Hive at a very high level:
✦ Architecture
✦ Workflow
✦ Schema on Read Approach
✦ Functions
✦ Join Strategies

HIVE: Architecture
Refer: https://www.tutorialspoint.com/hive/hive_introduction.htm

HIVE: Workflow
Refer: https://www.tutorialspoint.com/hive/hive_introduction.htm

HIVE: Schema on Read Approach
Lets the user redefine tables to match the data without touching the data, unlike MySQL's Schema on Write approach. (Reference: https://www.marklogic.com/blog/schema-on-read-vs-schema-on-write/)
There is no predetermined structure, so the data can be presented in whatever schema is most relevant to the task at hand.
The upfront modeling exercise disappears.
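
To make the schema-on-read idea concrete, here is a minimal Python sketch (not Hive code). It assumes comma-delimited text as the raw data; the names `RAW_LINES`, `read_with_schema`, `schema_typed` and `schema_loose` are hypothetical and exist only for illustration. The same untouched lines are read twice, each time through a different schema.

```python
# Raw data is written once and never modified afterwards.
RAW_LINES = [
    "1,alice,2021-03-04,42.5",
    "2,bob,2021-03-05,17.0",
]


def read_with_schema(lines, schema):
    """Apply a schema, given as (column name, converter) pairs, at read time."""
    rows = []
    for line in lines:
        fields = line.split(",")
        # zip stops at the shorter sequence, so trailing fields that the
        # schema does not mention are simply ignored.
        row = {name: convert(value) for (name, convert), value in zip(schema, fields)}
        rows.append(row)
    return rows


# Hypothetical schema A: all four fields, typed.
schema_typed = [("id", int), ("name", str), ("day", str), ("amount", float)]
# Hypothetical schema B: only the first two fields, both kept as strings.
schema_loose = [("id", str), ("name", str)]

typed_rows = read_with_schema(RAW_LINES, schema_typed)
loose_rows = read_with_schema(RAW_LINES, schema_loose)
```

Switching from one schema to the other required no rewrite of `RAW_LINES`, which is the point the slide makes about redefining tables without touching the data.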

HIVE: Serialization/Deserialization
Hive uses serialization and deserialization adapters (SerDes) to let the user do this. Because the data is interpreted each time it is read, Hive isn't intended for online tasks requiring heavy read/write traffic.
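
The adapter idea can be sketched in a few lines of Python. This is an illustration only: Hive's real SerDe interface is a Java API, and the classes `DelimitedSerDe` and `JsonSerDe` below are hypothetical stand-ins. Each adapter turns a stored record into a row and back, so the same table columns can sit on top of different storage formats.

```python
import json


class DelimitedSerDe:
    """Hypothetical adapter for delimiter-separated text records."""

    def __init__(self, columns, delimiter=","):
        self.columns = columns
        self.delimiter = delimiter

    def deserialize(self, record):
        # Stored text -> row (a dict keyed by column name).
        return dict(zip(self.columns, record.split(self.delimiter)))

    def serialize(self, row):
        # Row -> stored text, columns written in schema order.
        return self.delimiter.join(str(row[c]) for c in self.columns)


class JsonSerDe:
    """Hypothetical adapter for one-JSON-object-per-line records."""

    def __init__(self, columns):
        self.columns = columns

    def deserialize(self, record):
        obj = json.loads(record)
        return {c: obj.get(c) for c in self.columns}

    def serialize(self, row):
        return json.dumps({c: row[c] for c in self.columns})


columns = ["id", "name"]
csv_row = DelimitedSerDe(columns).deserialize("7,carol")
json_row = JsonSerDe(columns).deserialize('{"id": "7", "name": "carol"}')
round_trip = DelimitedSerDe(columns).serialize(json_row)
```

Both adapters yield the same row, and a row read from JSON can be written back out as delimited text, because the query layer in this sketch only ever sees rows.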

HIVE: Functions
Hive functions come in the following kinds; UDF, UDTF and UDAF are the three types of function APIs:
Built-In Functions
UDF (User Defined Function, a normal function): takes one or more columns from a row as arguments and returns a single value or object. E.g. concat(arg1, arg2)
UDTF (User Defined Table Function): takes zero or more inputs and produces multiple columns or rows of output. E.g. explode()
UDAF (User Defined Aggregate Function): takes values from multiple rows and returns a single aggregated value. E.g. sum()
Macros: functions defined in terms of other Hive functions.
Reference:
https://www.qubole.com/resources/hive-function-cheat-sheet/
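
The difference between the three function APIs is mostly one of shape: one row to one value, one row to many rows, many rows to one value. The Python sketch below imitates those shapes; it is not the Hive Java API, and the names `udf_concat`, `udtf_explode` and `UdafSum` (with its `iterate`/`terminate` methods) are hypothetical.

```python
def udf_concat(a, b):
    """UDF shape: columns of one row in, a single value out."""
    return a + b


def udtf_explode(items):
    """UDTF shape: one row in, zero or more rows out."""
    for item in items:
        yield item


class UdafSum:
    """UDAF shape: many rows in, one aggregated value out."""

    def __init__(self):
        self.total = 0

    def iterate(self, value):
        # Called once per input row.
        self.total += value

    def terminate(self):
        # Called once, after the last row, to produce the result.
        return self.total


rows = [
    {"first": "a", "second": "b", "tags": ["x", "y"], "n": 2},
    {"first": "c", "second": "d", "tags": [], "n": 3},
]

concatenated = [udf_concat(r["first"], r["second"]) for r in rows]
exploded = [tag for r in rows for tag in udtf_explode(r["tags"])]

aggregate = UdafSum()
for r in rows:
    aggregate.iterate(r["n"])
total = aggregate.terminate()
```

Two input rows give two concatenated values, two exploded rows (the second input row has no tags and contributes nothing), and a single total.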

HIVE: Join
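Hive versions before 2.2.0 allow only equi-joins, so the ON clause can contain only equality conditions (=), combined only with the AND operator. Later versions also accept more complex join expressions in the ON clause.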

HIVE: Join Strategies
Map-Reduce Join
Map Side Join (join during map phase)
Reduce Side Join (join during reduce phase)
Hive Shuffle Join
Hive Map-Side Join (Broadcast Join)
Hive Bucket Join
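
The two most common strategies in this list can be contrasted with a small Python simulation. It is a sketch of the general technique only, not Hive's implementation; `shuffle_join`, `broadcast_join` and the sample tables are hypothetical. A shuffle (reduce-side) join groups both inputs by join key before matching, while a broadcast (map-side) join loads the small table into memory and streams the big one past it.

```python
from collections import defaultdict

orders = [(1, "book"), (2, "pen"), (1, "lamp"), (3, "desk")]  # (customer_id, item)
customers = [(1, "alice"), (2, "bob")]                        # (customer_id, name)


def shuffle_join(left, right):
    """Reduce-side join: group both inputs by key, then match within each group."""
    # "Map + shuffle": every record is routed to the group for its join key.
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    # "Reduce": within one key, pair every left value with every right value.
    joined = []
    for key in sorted(groups):
        left_values, right_values = groups[key]
        for lv in left_values:
            for rv in right_values:
                joined.append((key, lv, rv))
    return joined


def broadcast_join(big, small):
    """Map-side join: hold the small table in memory, stream the big table."""
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    # No shuffle: each big-table record is matched where it is read.
    return [(key, value, sv) for key, value in big for sv in lookup.get(key, [])]


shuffled = shuffle_join(orders, customers)
broadcast = broadcast_join(orders, customers)
```

Both functions return the same matches for an inner join; they differ in output order and in cost, since only the shuffle variant has to move every record of both tables to its key group.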

HIVE: Map-Side Join
Joins the records by key while the input files are being read.
Highly constrained:
Both tables should be sorted on the same join key.
Both tables should have the same number of partitions.
Usually achieved when both input tables were created by (different) MapReduce jobs that used the same number of reducers and the same (join) key.
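
Why the constraints matter can be seen in a short Python sketch of a merge join over pre-partitioned, pre-sorted inputs. It is an illustration under stated assumptions, not MapReduce or Hive code: keys are integers, `key % n` stands in for the partitioner, and `partition` and `merge_join` are hypothetical helpers. Because partition i of one table can only match partition i of the other, each pair is joined in a single pass with no shuffle.

```python
def partition(records, n):
    """Split (key, value) records into n partitions, each sorted by key."""
    parts = [[] for _ in range(n)]
    for key, value in records:
        parts[key % n].append((key, value))  # assumed partitioner: key modulo n
    return [sorted(p) for p in parts]


def merge_join(left, right):
    """Join two lists of (key, value) that are already sorted by key."""
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of right-hand records sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # Pair every left record with that key against the whole run.
            while i < len(left) and left[i][0] == lkey:
                for k in range(j, j_end):
                    joined.append((lkey, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return joined


left_table = [(4, "d"), (1, "a"), (2, "b"), (3, "c")]
right_table = [(2, "x"), (4, "y"), (4, "z"), (5, "w")]

num_partitions = 2  # must be the same for both tables
left_parts = partition(left_table, num_partitions)
right_parts = partition(right_table, num_partitions)

joined = []
for left_part, right_part in zip(left_parts, right_parts):
    joined.extend(merge_join(left_part, right_part))
```

If the two tables used different partition counts or different sort keys, matching records could land in partitions that are never compared, which is why the slide calls this strategy highly constrained.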
