2. Agenda
Origin: The Story Behind Hive
What is Hive?
Why use Hive?
Hive Architecture
Hive Metastore
Configuring Hive
Important metastore configuration properties
Comparison with Traditional Databases
Hive Data Types
Hive Table Types
Storing Hive Tables as HDFS Files
15. What is Hive?
Hive is a data warehouse infrastructure built on top of Hadoop. It compiles SQL-like queries (HiveQL) into MapReduce jobs and runs them on the cluster (newer versions can also run them on Tez or Spark)
Associates structure (a schema) with data stored in a variety of formats:
Logical table → physical location
Logical table → physical data format handler (SerDe, short for serializer/deserializer)
Integrates with HDFS, HBase, MongoDB, etc.
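The two mappings above (table → location, table → SerDe) can be illustrated with a toy deserializer. This is a minimal sketch, not Hive's SerDe API: the class `ToySerDe`, the `TABLES` dictionary, `read_table`, and the type subset are hypothetical. It mirrors three real Hive conventions: Ctrl-A (`\x01`) as the default field delimiter for text tables, `\N` as the default text form of NULL, and `/user/hive/warehouse` as the default warehouse directory.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical subset of Hive-style type names mapped to Python converters.
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "STRING": str,
    "INT": int,
    "DOUBLE": float,
}

HIVE_NULL = "\\N"  # Hive's default text representation of NULL


class ToySerDe:
    """Toy serializer/deserializer for delimited text (not Hive's real API)."""

    def __init__(self, schema: List[Tuple[str, str]], field_delim: str = "\x01") -> None:
        # \x01 (Ctrl-A) mirrors Hive's default field delimiter for text tables.
        self.schema = schema
        self.field_delim = field_delim

    def deserialize(self, line: str) -> Dict[str, Any]:
        """Raw line -> typed row; the schema is applied only at read time."""
        fields = line.rstrip("\n").split(self.field_delim)
        row: Dict[str, Any] = {}
        for i, (name, type_name) in enumerate(self.schema):
            raw: Optional[str] = fields[i] if i < len(fields) else None
            if raw is None or raw == HIVE_NULL:
                row[name] = None  # missing column or explicit NULL
                continue
            try:
                row[name] = CONVERTERS[type_name](raw)
            except ValueError:
                # Unparsable values become NULL (None) instead of failing the read.
                row[name] = None
        return row

    def serialize(self, row: Dict[str, Any]) -> str:
        """Typed row -> raw delimited line."""
        return self.field_delim.join(
            HIVE_NULL if row.get(name) is None else str(row[name])
            for name, _ in self.schema
        )


# Hypothetical metastore entry: logical table -> (physical location, SerDe).
TABLES: Dict[str, Tuple[str, ToySerDe]] = {
    "records": (
        "/user/hive/warehouse/records",
        ToySerDe([("year", "INT"), ("temperature", "INT"), ("station", "STRING")]),
    ),
}


def read_table(name: str, lines: List[str]) -> List[Dict[str, Any]]:
    """Deserialize lines as if they had been read from the table's location."""
    _location, serde = TABLES[name]
    return [serde.deserialize(line) for line in lines]
```

The point of the design is that the files themselves stay untyped; the table definition (location plus SerDe) is what turns bytes into typed rows when a query reads them.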
16. Why use Hive?
MapReduce is geared toward developers: jobs must be written as programs (typically in Java)
Hive lets users run SQL-like queries that are compiled and run as MapReduce jobs
Although data in Hadoop is generally unstructured, it usually has some implicit structure that a table schema can describe
Hive inherits the benefits of MapReduce + HDFS (Hadoop):
Fault tolerant
Robust
Scalable
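To make "SQL-like queries compiled into MapReduce jobs" concrete, here is a single-process sketch of the map, shuffle, and reduce phases that a query such as `SELECT year, MAX(temperature) FROM records GROUP BY year;` roughly corresponds to. It is an illustration under simplifying assumptions, not Hive's actual query plan: the function names and the `(year, temperature)` row shape are hypothetical, and real Hive plans may also pre-aggregate on the map side and run each phase in parallel across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, int]  # (year, temperature)


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[int, int]]:
    """Map: emit (GROUP BY key, value to aggregate) for each input row."""
    for year, temperature in rows:
        yield year, temperature


def shuffle(pairs: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Shuffle/sort: bring all values for the same key together, ordered by key."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Reduce: apply the aggregate (MAX) to each key's values."""
    return [(year, max(temps)) for year, temps in groups.items()]


def run_max_temp_query(rows: Iterable[Row]) -> List[Tuple[int, int]]:
    """Plays the role of: SELECT year, MAX(temperature) FROM records GROUP BY year;"""
    return reduce_phase(shuffle(map_phase(rows)))
```

For example, `run_max_temp_query([(1950, 0), (1950, 22), (1950, -11), (1949, 111), (1949, 78)])` returns `[(1949, 111), (1950, 22)]`. Writing these three functions by hand for every query is exactly the developer effort that Hive's compiler removes.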
19. Configuring Hive
To point Hive at a directory containing a hive-site.xml file:
% hive --config /Users/tom/dev/hive-conf
To set individual properties for a session from the command line:
% hive -hiveconf fs.defaultFS=hdfs://localhost \
  -hiveconf mapreduce.framework.name=yarn \
  -hiveconf yarn.resourcemanager.address=localhost:8032
To set a property from within the Hive shell:
SET hive.execution.engine=tez;
Logging (write logs to a per-user directory):
% hive -hiveconf hive.log.dir='/tmp/${user.name}'
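When the same property is set in several places, Hive resolves it in this order, from highest to lowest precedence: the `SET` command in the shell, the `-hiveconf` command-line option, `hive-site.xml` together with the Hadoop site files, and finally the Hive and Hadoop defaults. The sketch below models that lookup with Python's `ChainMap` and also expands `${...}` variables like the one in the logging example. The helper names (`parse_hiveconf`, `effective_config`, `expand`) and the default values in the usage example are hypothetical illustrations, not Hive internals.

```python
import re
from collections import ChainMap
from typing import Dict, List, Mapping


def parse_hiveconf(argv: List[str]) -> Dict[str, str]:
    """Collect the key=value pair that follows each -hiveconf flag."""
    props: Dict[str, str] = {}
    for flag, value in zip(argv, argv[1:]):
        if flag == "-hiveconf":
            key, _, val = value.partition("=")
            props[key] = val
    return props


def effective_config(
    defaults: Mapping[str, str],
    site: Mapping[str, str],
    hiveconf: Mapping[str, str],
    session_set: Mapping[str, str],
) -> Mapping[str, str]:
    """ChainMap returns the value from the first layer that has the key,
    so the highest-precedence layer (SET) is listed first."""
    return ChainMap(dict(session_set), dict(hiveconf), dict(site), dict(defaults))


_VAR = re.compile(r"\$\{([^}]+)\}")


def expand(value: str, variables: Mapping[str, str]) -> str:
    """Replace ${name} with variables[name]; leave unknown names untouched."""
    return _VAR.sub(lambda m: variables.get(m.group(1), m.group(0)), value)
```

Usage, with hypothetical values for each layer:

```python
argv = ["-hiveconf", "fs.defaultFS=hdfs://localhost", "-hiveconf", "mapreduce.framework.name=yarn"]
conf = effective_config(
    defaults={"hive.execution.engine": "mr", "hive.log.dir": "/tmp/${user.name}"},
    site={"fs.defaultFS": "file:///"},
    hiveconf=parse_hiveconf(argv),
    session_set={"hive.execution.engine": "tez"},  # as if SET hive.execution.engine=tez; was run
)
print(conf["hive.execution.engine"])                     # tez
print(conf["fs.defaultFS"])                              # hdfs://localhost
print(expand(conf["hive.log.dir"], {"user.name": "tom"}))  # /tmp/tom
```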