Hive – What is it?
• A data warehouse system layer built on top of Hadoop
• Defines structure for your unstructured big data
• Lets you query that data using HiveQL, an SQL-like language
Hive – Is not… a relational database
• Uses a relational database only to store metadata (the metastore)
• The data that Hive processes is stored in HDFS
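To make the split concrete, here is a minimal sketch of a metastore: table definitions live in a relational database (SQLite here, standing in for the Derby or MySQL database a real metastore uses), while the table only records *where* its data lives. The table layout (`tbls`, `cols`), the `describe` helper, and the `hdfs:///data/employees` path are all hypothetical simplifications, not Hive's actual metastore schema.

```python
import sqlite3

# Hypothetical, heavily simplified metastore layout (NOT Hive's real schema):
# one row per table and one row per column. Only metadata is stored here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbls (tbl_name TEXT PRIMARY KEY, location TEXT);
CREATE TABLE cols (tbl_name TEXT, position INTEGER, col_name TEXT, col_type TEXT);
""")

# Register a table. The location points at files in HDFS (hypothetical path);
# the data rows themselves are never copied into the relational database.
conn.execute("INSERT INTO tbls VALUES (?, ?)",
             ("employees", "hdfs:///data/employees"))
conn.executemany(
    "INSERT INTO cols VALUES (?, ?, ?, ?)",
    [("employees", 0, "name", "STRING"),
     ("employees", 1, "dept", "STRING"),
     ("employees", 2, "salary", "INT")],
)

def describe(table):
    """Return (location, [(col_name, col_type), ...]) for a registered table."""
    (location,) = conn.execute(
        "SELECT location FROM tbls WHERE tbl_name = ?", (table,)).fetchone()
    columns = conn.execute(
        "SELECT col_name, col_type FROM cols WHERE tbl_name = ? ORDER BY position",
        (table,)).fetchall()
    return location, columns

print(describe("employees"))
```

Looking up a table returns only its definition and location; reading the actual rows is a separate step against the files in HDFS.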
Hive – Is not… designed for online transaction processing (OLTP)
• Runs on Hadoop, a batch processing system
• Jobs have high latency, with significant overhead from job startup and scheduling
Hive – Is not… for real-time queries and row-level updates
• Best suited for batch jobs over large sets of immutable data
Hive – What it does
• Hadoop was built to organize and store massive amounts of data.
• A Hadoop cluster is a reservoir of heterogeneous data, from multiple
sources and in different formats.
• Hive allows the user to explore and structure that data, analyze it,
and then turn it into business insight.
Hive – Architecture
Hive – Tables
• Hive Tables
• Data: in files in HDFS
• Schema: in metadata stored in relational tables
• Schema and data are kept separate
• Hive needs a schema defined over existing HDFS data, which it applies when the data is read (schema on read)
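Because schema and data are separate, the schema is only applied when the files are read. The sketch below models that idea in plain Python: a schema kept apart from raw delimited text is used to type each field at read time, and fields that are missing or cannot be converted become `None`, roughly mirroring how Hive returns NULL for such values in text-format tables. The `SCHEMA`, the sample `RAW` data, and `read_with_schema` are hypothetical; a comma delimiter is assumed, as a table declared with `FIELDS TERMINATED BY ','` would use.

```python
# Schema kept separately from the data (as it would be in the metastore).
SCHEMA = [("name", str), ("dept", str), ("salary", int)]

# Raw text as it might sit in a delimited file in HDFS (made-up sample data).
RAW = "alice,eng,100\nbob,ops,abc\ncarol,eng\n"

def read_with_schema(raw, schema, delimiter=","):
    """Apply the schema to raw lines at read time (schema on read)."""
    rows = []
    for line in raw.splitlines():
        fields = line.split(delimiter)
        row = {}
        for i, (col, cast) in enumerate(schema):
            try:
                row[col] = cast(fields[i])
            except (IndexError, ValueError):
                # Missing or unconvertible field: treated as NULL.
                row[col] = None
        rows.append(row)
    return rows

for row in read_with_schema(RAW, SCHEMA):
    print(row)
```

Nothing about the raw file changes when the schema does; swapping `SCHEMA` for a different definition simply reinterprets the same bytes.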
Hive – Pig vs. Hive
Pig is good for
• Preparing data for easier analysis
• Long series of processing steps
Hive is good for
• Querying data
• Answering specific questions
• Users who are already familiar with SQL
Hive – HiveQL
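A HiveQL query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept;` looks like ordinary SQL, but on classic Hadoop, Hive compiles it into MapReduce jobs, which is why it behaves as a batch system. The Python below is only a conceptual, in-process model of the map, shuffle, and reduce phases for that query; the `employees` rows and function names are hypothetical, and Hive's real plans add optimizations such as partial aggregation on the map side.

```python
from collections import defaultdict

# Conceptual model of:  SELECT dept, COUNT(*) FROM employees GROUP BY dept;
ROWS = [("alice", "eng"), ("bob", "ops"), ("carol", "eng")]  # (name, dept)

def map_phase(rows):
    # Emit (group key, 1) for every input row.
    for _name, dept in rows:
        yield dept, 1

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # COUNT(*) per group is the sum of the emitted ones.
    return {key: sum(values) for key, values in groups.items()}

result = reduce_phase(shuffle(map_phase(ROWS)))
print(sorted(result.items()))  # [('eng', 2), ('ops', 1)]
```

On a cluster each phase runs as distributed tasks with data written and shuffled between them, which is where much of the latency described above comes from.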
HCatalog – What it does
• A metadata and table management system for Hadoop
• Provides a shared schema and data type mechanism for Hadoop tools
such as Pig, Hive, and MapReduce
• Enables interoperability across data processing tools
• Provides a table abstraction, so you don’t need to worry about where or how
the data is stored
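The table abstraction can be pictured as a lookup: tools ask for a table by name, and the catalog supplies the location, storage format, and schema. The sketch below is a hypothetical model of that idea, not HCatalog's API: `FILES` stands in for HDFS, `CATALOG` for the shared metadata, and `read_table`, `total_amount`, and `regions` are invented names. The two "tools" work unchanged even though one table is comma-separated and the other tab-separated.

```python
import csv
import io

# Hypothetical in-memory stand-in for files in HDFS.
FILES = {
    "/warehouse/sales_csv": "east,10\nwest,5\n",
    "/warehouse/sales_tsv": "north\t7\nsouth\t3\n",
}

# Hypothetical shared catalog: table name -> location, format, schema.
CATALOG = {
    "sales_2023": {"location": "/warehouse/sales_csv", "delimiter": ",",
                   "schema": [("region", str), ("amount", int)]},
    "sales_2024": {"location": "/warehouse/sales_tsv", "delimiter": "\t",
                   "schema": [("region", str), ("amount", int)]},
}

def read_table(name):
    """Resolve a table through the catalog and yield typed rows as dicts."""
    entry = CATALOG[name]
    reader = csv.reader(io.StringIO(FILES[entry["location"]]),
                        delimiter=entry["delimiter"])
    for fields in reader:
        yield {col: cast(value)
               for (col, cast), value in zip(entry["schema"], fields)}

# Two independent "tools" that know only table names, not paths or formats.
def total_amount(table):
    return sum(row["amount"] for row in read_table(table))

def regions(table):
    return [row["region"] for row in read_table(table)]

print(total_amount("sales_2023"), total_amount("sales_2024"))  # 15 10
print(regions("sales_2024"))  # ['north', 'south']
```

Moving a table or changing its format would only require updating its catalog entry; the consuming code stays the same.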
HCatalog – Summary
• “Takes Hive metadata and opens it to everybody else”
HCatalog – Overview
• Tools access data through HCatalog