Your SlideShare is downloading. ×
Hive hcatalog
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Hive hcatalog

1,094
views

Published on

Published in: Technology

0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,094
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
46
Comments
0
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. @alepoletto
  • 2. Hive @alepoletto
  • 3. Hive – What is? • Data warehouse System Layer build on top of Hadoop • Define Structure for your Unstructured Big Data • Query this Data Using SQL like Language HiveQL @alepoletto
  • 4. Hive - is not …Relational Database • Use Relational database to store metadata. • Data that HIVE process is stored in HDFS @alepoletto
  • 5. Hive - is not… designed for online transactions • Runs on Hadoop ( batch Processing system) • Jobs can have High latency with overhead @alepoletto
  • 6. Hive - is not… real time queries and row updates • Suited for batch jobs and over large sets of immutable data @alepoletto
  • 7. Hive – What it does • Hadoop was built to organize and store massive amounts of data. • A Hadoop cluster is a reservoir of heterogeneous data, from multiple sources and in different formats. • Hive allows the user to explore and structure that data, analyze it, and then turn it into business insight. @alepoletto
  • 8. Hive – Architecture @alepoletto
  • 9. Hive – Tables • Hive Tables • Data: in files in HDFS • Schema: in metadata stored into relational tables • Schema and Data are separated • Hive needs schema for existing HDFS data @alepoletto
  • 10. @alepoletto
  • 11. Hive – Pig x Hive Pig is good for Hive is for • ETL. • Query Data • Preparing data for easier analyses. • Need answer to specific questions • for long series of steps to perform • If you are familiar with sql @alepoletto
  • 12. Hive – HiveQL @alepoletto
  • 13. @alepoletto
  • 14. HCatalog – What it does • Metadata and Table management System for Hadoop. • shared schema and data type mechanism for different Hadoop tools like pig, hive and MapReduce • Interoperability across data processing tools • Table abstraction, so you don’t need to worry with where and how the data is stored. @alepoletto
  • 15. HCatalog – Summary • “Takes Hive Meatafdata and opens to everybody else” @alepoletto
  • 16. HCatalog – Overview • Access data Through Hcatalog @alepoletto
  • 17. HCatalog – Archtecture @alepoletto
  • 18. @alepoletto