3. Hive – What is it?
• Data warehouse system layer built on top of Hadoop
• Defines structure for your unstructured big data
• Queries this data using a SQL-like language, HiveQL
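A minimal sketch of the two points above, assuming a hypothetical tab-delimited log already stored under /data/page_views. The helper below is not part of Hive; it only builds the HiveQL text that would lay a structure over existing files, followed by a SQL-like query on that table. The table, column, and path names are all assumptions made for illustration.

```python
def build_external_table_ddl(table, columns, location, delimiter="\\t"):
    """Return a HiveQL statement that defines a table over files already in HDFS.

    Hypothetical helper for illustration: it only builds text and never talks to Hive.
    """
    column_lines = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Assumed example table: one row per page view.
ddl = build_external_table_ddl(
    "page_views",
    [("user_id", "STRING"), ("url", "STRING"), ("view_time", "BIGINT")],
    "/data/page_views",
)

# A HiveQL query reads like ordinary SQL.
top_urls_query = (
    "SELECT url, COUNT(*) AS views\n"
    "FROM page_views\n"
    "GROUP BY url\n"
    "ORDER BY views DESC\n"
    "LIMIT 10"
)

print(ddl + ";")
print(top_urls_query + ";")
```

The printed statements could be pasted into a Hive session; nothing here moves or rewrites the underlying files.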
@alepoletto
4. Hive – is not… a relational database
• Uses a relational database only to store metadata
• The data that Hive processes is stored in HDFS
5. Hive – is not… designed for online transactions
• Runs on Hadoop (a batch processing system)
• Jobs can have high latency because of job start-up overhead
6. Hive – is not… for real-time queries and row updates
• Best suited for batch jobs over large sets of immutable data
7. Hive – What it does
• Hadoop was built to organize and store massive amounts of data.
• A Hadoop cluster is a reservoir of heterogeneous data, from multiple
sources and in different formats.
• Hive allows the user to explore and structure that data, analyze it,
and then turn it into business insight.
9. Hive – Tables
• A Hive table has two parts: data and schema
• Data: files in HDFS
• Schema: metadata stored in relational tables
• Schema and data are kept separate
• Hive needs a schema to read existing HDFS data
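The separation of schema and data can be sketched in a few lines of Python. This is only an analogy, not how Hive is implemented: a string stands in for a file that already sits in HDFS, and a small list stands in for the schema kept in the metastore. All names and values are assumptions.

```python
import csv
import io

# Raw data: plain delimited text, standing in for a file that already sits in HDFS.
RAW_FILE = "u1\t/home\t1700000000\nu2\t/cart\t1700000050\nu1\t/cart\t1700000100\n"

# Schema: kept separately from the data (Hive keeps it in relational tables).
SCHEMA = [("user_id", str), ("url", str), ("view_time", int)]


def read_with_schema(raw_text, schema, delimiter="\t"):
    """Apply the schema while reading; the raw text itself is never modified."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text), delimiter=delimiter):
        rows.append({name: cast(value) for (name, cast), value in zip(schema, fields)})
    return rows


rows = read_with_schema(RAW_FILE, SCHEMA)

# Once a schema is applied, the raw lines can be queried by column name.
cart_views = [row["user_id"] for row in rows if row["url"] == "/cart"]
print(cart_views)
```

Changing SCHEMA changes how the same bytes are interpreted, which is why a schema has to be supplied before existing data can be queried.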
11. Hive – Pig vs. Hive
Pig is good for
• ETL
• Preparing data for easier analysis
• Long series of steps to perform
Hive is for
• Querying data
• Getting answers to specific questions
• Users who are familiar with SQL
14. HCatalog – What it does
• Metadata and table management system for Hadoop
• Shared schema and data type mechanism for different Hadoop tools
such as Pig, Hive, and MapReduce
• Interoperability across data processing tools
• Table abstraction, so you don’t need to worry about where and how
the data is stored
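The idea of a shared table abstraction can be sketched as a toy catalog. This is a conceptual illustration only and does not use or imitate the real HCatalog API; the class, function, and table names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    """What a tool needs to read a table: its columns, location, and format."""
    columns: tuple
    location: str
    storage_format: str


class Catalog:
    """Toy stand-in for a shared metadata service (not the HCatalog API)."""

    def __init__(self):
        self._tables = {}

    def register(self, name, info):
        self._tables[name] = info

    def lookup(self, name):
        return self._tables[name]


catalog = Catalog()
catalog.register(
    "page_views",
    TableInfo(
        columns=(("user_id", "string"), ("url", "string"), ("view_time", "bigint")),
        location="/data/page_views",
        storage_format="textfile",
    ),
)


def describe_for_tool(tool_name, table_name):
    """Each tool asks the catalog by table name instead of hard-coding paths."""
    info = catalog.lookup(table_name)
    column_names = [name for name, _ in info.columns]
    return f"{tool_name} reads {table_name} with columns {column_names}"


for tool in ("pig", "hive", "mapreduce"):
    print(describe_for_tool(tool, "page_views"))
```

Because every tool resolves the same table name through one catalog, the location or format can change in one place without touching the tools.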
15. HCatalog – Summary
• “Takes Hive metadata and opens it up to everybody else”