Hive hcatalog

Published in: Technology

Transcript of "Hive hcatalog"

  1. @alepoletto
  2. Hive
  3. Hive – What is it?
     • A data warehouse system layer built on top of Hadoop
     • Defines structure for your unstructured big data
     • Lets you query this data using HiveQL, an SQL-like language
  4. Hive – is not… a relational database
     • It uses a relational database to store metadata.
     • The data that Hive processes is stored in HDFS.
  5. Hive – is not… designed for online transactions
     • Runs on Hadoop (a batch processing system)
     • Jobs can have high latency and overhead
  6. Hive – is not… for real-time queries and row updates
     • Suited for batch jobs over large sets of immutable data
  7. Hive – What it does
     • Hadoop was built to organize and store massive amounts of data.
     • A Hadoop cluster is a reservoir of heterogeneous data, from multiple sources and in different formats.
     • Hive allows the user to explore and structure that data, analyze it, and then turn it into business insight.
  8. Hive – Architecture
  9. Hive – Tables
     • Hive tables:
     • Data: in files in HDFS
     • Schema: in metadata stored in relational tables
     • Schema and data are kept separate
     • Hive needs a schema for existing HDFS data
  10. @alepoletto
  11. Hive – Pig x Hive
      Pig is good for
      • ETL
      • Preparing data for easier analysis
      • Long series of steps to perform
      Hive is for
      • Querying data
      • Needing answers to specific questions
      • If you are familiar with SQL
  12. Hive – HiveQL
  13. @alepoletto
  14. HCatalog – What it does
      • Metadata and table management system for Hadoop
      • Shared schema and data type mechanism for different Hadoop tools such as Pig, Hive and MapReduce
      • Interoperability across data processing tools
      • Table abstraction, so you don’t need to worry about where and how the data is stored
  15. HCatalog – Summary
      • “Takes Hive metadata and opens it up to everybody else”
  16. HCatalog – Overview
      • Access data through HCatalog
  17. HCatalog – Architecture
  18. @alepoletto
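
Slide 9 notes that Hive needs a schema for data that already sits in HDFS. One common way to supply it is a HiveQL `CREATE EXTERNAL TABLE` statement that points at the existing files. The sketch below only builds such a statement as a string; the helper name `external_table_ddl`, the table `page_views`, its columns, and the path `/data/raw/page_views` are all hypothetical and not taken from the slides.

```python
def external_table_ddl(table, columns, location, delimiter="\\t"):
    """Build a HiveQL CREATE EXTERNAL TABLE statement for files already in HDFS.

    columns is a list of (name, hive_type) pairs; location is the HDFS directory.
    The statement only records a schema in the metastore; the files stay where they are.
    """
    cols = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{cols}\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        f"FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical example: tab-separated page-view logs already stored in HDFS.
ddl = external_table_ddl(
    "page_views",
    [("user_id", "BIGINT"), ("url", "STRING"), ("view_time", "TIMESTAMP")],
    "/data/raw/page_views",
)
print(ddl)
```

The generated statement could then be submitted through whichever Hive client is available. Because the table is external, dropping it would remove only the metadata, which matches the slide's point that schema and data are kept separate.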
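
Slide 14 describes HCatalog as a table abstraction shared across tools, so that callers do not need to know where or how the data is stored. The toy sketch below illustrates that idea only. It is not the HCatalog API: `CATALOG`, `register_table`, and `read_table` are hypothetical names, and a local temporary file stands in for HDFS.

```python
import csv
import os
import tempfile

# Toy stand-in for a shared metastore: table name -> where and how the data is stored.
CATALOG = {}


def register_table(name, location, delimiter, columns):
    """Record the storage details and schema of a table once, in one place."""
    CATALOG[name] = {"location": location, "delimiter": delimiter, "columns": columns}


def read_table(name):
    """Return rows as dicts; callers never see the path or the file format."""
    meta = CATALOG[name]
    with open(meta["location"], newline="") as f:
        reader = csv.reader(f, delimiter=meta["delimiter"])
        return [dict(zip(meta["columns"], row)) for row in reader]


# Hypothetical data file standing in for files in HDFS.
path = os.path.join(tempfile.mkdtemp(), "page_views.tsv")
with open(path, "w", newline="") as f:
    f.write("1\t/home\n2\t/docs\n1\t/docs\n")

register_table("page_views", path, "\t", ["user_id", "url"])

# "Tool A": an ETL-style filtering step that reads the table by name.
docs_views = [row for row in read_table("page_views") if row["url"] == "/docs"]

# "Tool B": a query-style aggregate that reads the same table by name.
views_per_user = {}
for row in read_table("page_views"):
    views_per_user[row["user_id"]] = views_per_user.get(row["user_id"], 0) + 1
```

Both "tools" refer to the table by name only, so moving the file or changing its delimiter would require updating the catalog entry and nothing else.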