Hive HCatalog

  1. @alepoletto
  2. Hive
  3. Hive – What is it?
     • A data warehouse system layer built on top of Hadoop
     • Defines structure for your unstructured big data
     • Queries this data using HiveQL, an SQL-like language
  4. Hive – is not… a relational database
     • It uses a relational database to store metadata
     • The data that Hive processes is stored in HDFS
  5. Hive – is not… designed for online transactions
     • Runs on Hadoop (a batch-processing system)
     • Jobs can have high latency because of job-launch overhead
  6. Hive – is not… for real-time queries and row-level updates
     • Best suited for batch jobs over large sets of immutable data
  7. Hive – What it does
     • Hadoop was built to organize and store massive amounts of data.
     • A Hadoop cluster is a reservoir of heterogeneous data, from multiple sources and in different formats.
     • Hive lets users explore and structure that data, analyze it, and then turn it into business insight.
  8. Hive – Architecture
  9. Hive – Tables
     • Data: stored in files in HDFS
     • Schema: stored as metadata in relational tables (the metastore)
     • Schema and data are kept separate
     • Hive needs a schema defined over existing HDFS data before it can query it
  11. Hive – Pig vs. Hive
      Pig is good for:
      • ETL
      • Preparing data for easier analysis
      • Long series of steps to perform
      Hive is for:
      • Querying data
      • Answering specific questions
      • Users who are familiar with SQL
  12. Hive – HiveQL
  14. HCatalog – What it does
      • A metadata and table management system for Hadoop
      • A shared schema and data type mechanism for Hadoop tools such as Pig, Hive, and MapReduce
      • Interoperability across data processing tools
      • Table abstraction, so you don’t need to worry about where or how the data is stored
  15. HCatalog – Summary
      • “Takes Hive metadata and opens it to everybody else”
  16. HCatalog – Overview
      • Access data through HCatalog
  17. HCatalog – Architecture
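
Slide 3 says Hive lets you define structure over data already sitting in Hadoop and then query it with HiveQL. As a rough illustration, the Python helper below only assembles HiveQL text and does not connect to a Hive server; the table name `page_views`, its columns, and the HDFS directory `/data/page_views` are hypothetical.

```python
from typing import List, Tuple


def create_external_table(name: str, columns: List[Tuple[str, str]],
                          location: str, delimiter: str = ",") -> str:
    """Assemble HiveQL DDL that lays a schema over files already in HDFS."""
    cols = ",\n".join(f"  {col} {typ}" for col, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {name} (\n{cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


# Hypothetical table over a hypothetical HDFS directory of comma-separated files.
ddl = create_external_table(
    "page_views",
    [("view_time", "STRING"), ("user_id", "BIGINT"), ("url", "STRING")],
    "/data/page_views",
)

# A HiveQL query against that table; Hive compiles it into batch jobs.
query = "SELECT url, COUNT(*) AS hits FROM page_views GROUP BY url;"

print(ddl)
print(query)
```

Because the table is declared `EXTERNAL`, dropping it in Hive removes only the metadata and leaves the files in HDFS untouched, which fits the schema/data split described on slides 4 and 9.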
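
Slides 4 and 9 describe the split between the schema (kept in a relational metastore) and the data (plain files in HDFS). The toy Python model below mimics that split under stated assumptions: `METASTORE` and `FAKE_HDFS` are hypothetical in-memory stand-ins, and the schema is applied only when rows are read ("schema on read"). It is a sketch of the idea, not how Hive's SerDe layer is actually implemented.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for HDFS: raw text files with no schema attached.
FAKE_HDFS: Dict[str, str] = {
    "/data/users/part-00000": "1,alice,34\n2,bob,not_a_number\n3,carol\n",
}

# Hypothetical stand-in for the metastore: the schema lives apart from the data.
METASTORE: Dict[str, dict] = {
    "users": {
        "location": "/data/users/",
        "delimiter": ",",
        "columns": [("id", int), ("name", str), ("age", int)],
    }
}


def _cast(value: str, typ: Callable[[str], object]) -> Optional[object]:
    """Convert one raw field; return None (like SQL NULL) if it does not parse."""
    try:
        return typ(value)
    except ValueError:
        return None


def read_table(name: str) -> List[dict]:
    """Apply the stored schema to the raw files only at read time."""
    meta = METASTORE[name]
    rows = []
    for path, content in sorted(FAKE_HDFS.items()):
        if not path.startswith(meta["location"]):
            continue  # file belongs to some other table
        for line in content.splitlines():
            fields = line.split(meta["delimiter"])
            row = {}
            for i, (col, typ) in enumerate(meta["columns"]):
                # Missing trailing fields also become None.
                row[col] = _cast(fields[i], typ) if i < len(fields) else None
            rows.append(row)
    return rows


print(read_table("users"))
```

The files are never rewritten: changing the schema in `METASTORE` changes how the same bytes are interpreted on the next read. Unparseable or missing fields come back as `None`, loosely mirroring the `NULL`s Hive returns for such values in delimited text tables.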
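
To illustrate the Pig vs. Hive contrast on slide 11, the sketch below computes the same answer twice: once as an explicit sequence of steps in the spirit of a Pig Latin script (LOAD, FILTER, GROUP, FOREACH), and once as a single declarative SQL query. Python's built-in `sqlite3` stands in for Hive here, and the `LOGS` sample data is hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical log records: (username, url, bytes)
LOGS = [
    ("alice", "/home", 120),
    ("bob", "/home", 80),
    ("alice", "/cart", 300),
    ("carol", "/home", 50),
]


def pig_style(logs):
    """Dataflow style: spell out each step, as a Pig script would."""
    filtered = [r for r in logs if r[2] >= 60]           # FILTER by bytes
    grouped = defaultdict(list)                           # GROUP BY url
    for _user, url, size in filtered:
        grouped[url].append(size)
    # FOREACH group GENERATE url, SUM(bytes), then ORDER by url
    return sorted((url, sum(sizes)) for url, sizes in grouped.items())


def hive_style(logs):
    """Declarative style: ask the question as one SQL query (SQLite stands in for Hive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (username TEXT, url TEXT, bytes INTEGER)")
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", logs)
    rows = conn.execute(
        "SELECT url, SUM(bytes) FROM logs WHERE bytes >= 60 "
        "GROUP BY url ORDER BY url"
    ).fetchall()
    conn.close()
    return rows


print(pig_style(LOGS))
print(hive_style(LOGS))
```

Both calls return the same rows; the difference is that the first version dictates how to compute the result, while the second only states what result is wanted, which is why SQL users tend to find Hive the quicker route to a specific question.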
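
Slide 14 describes HCatalog as a shared schema and table abstraction, so tools refer to tables by name instead of hard-coding where and how the data is stored. The toy Python model below shows the payoff; `SharedCatalog`, `TableInfo`, `pig_job`, and `mapreduce_job` are hypothetical names for illustration, not HCatalog's API. When the data moves and changes format, only the catalog entry is updated and the jobs stay the same.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TableInfo:
    location: str                   # where the files live
    storage_format: str             # how they are encoded, e.g. "textfile" or "orc"
    columns: List[Tuple[str, str]]  # shared schema: (name, type)


@dataclass
class SharedCatalog:
    """Toy stand-in for a shared metadata service such as HCatalog."""
    tables: Dict[str, TableInfo] = field(default_factory=dict)

    def register(self, name: str, info: TableInfo) -> None:
        self.tables[name] = info

    def lookup(self, name: str) -> TableInfo:
        return self.tables[name]


# Hypothetical "tools": each asks the catalog by table name, never by path.
def pig_job(catalog: SharedCatalog, table: str) -> str:
    info = catalog.lookup(table)
    return f"pig reads {info.storage_format} at {info.location}"


def mapreduce_job(catalog: SharedCatalog, table: str) -> str:
    info = catalog.lookup(table)
    cols = ", ".join(name for name, _ in info.columns)
    return f"mapreduce reads columns [{cols}] at {info.location}"


schema = [("user_id", "bigint"), ("url", "string")]
catalog = SharedCatalog()
catalog.register("page_views", TableInfo("/data/page_views", "textfile", schema))
print(pig_job(catalog, "page_views"))

# The data is moved and converted: only the catalog entry changes.
catalog.register("page_views", TableInfo("/warehouse/page_views_orc", "orc", schema))
print(pig_job(catalog, "page_views"))
print(mapreduce_job(catalog, "page_views"))
```

In real Hadoop deployments, Pig reaches HCatalog-managed tables through `HCatLoader`/`HCatStorer`, and MapReduce jobs through `HCatInputFormat`/`HCatOutputFormat`.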