Hive hcatalog
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
1,161
On Slideshare
1,161
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
35
Comments
0
Likes
1

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. @alepoletto
  • 2. Hive @alepoletto
  • 3. Hive – What is? • Data warehouse System Layer build on top of Hadoop • Define Structure for your Unstructured Big Data • Query this Data Using SQL like Language HiveQL @alepoletto
  • 4. Hive - is not …Relational Database • Use Relational database to store metadata. • Data that HIVE process is stored in HDFS @alepoletto
  • 5. Hive - is not… designed for online transactions • Runs on Hadoop ( batch Processing system) • Jobs can have High latency with overhead @alepoletto
  • 6. Hive - is not… real time queries and row updates • Suited for batch jobs and over large sets of immutable data @alepoletto
  • 7. Hive – What it does • Hadoop was built to organize and store massive amounts of data. • A Hadoop cluster is a reservoir of heterogeneous data, from multiple sources and in different formats. • Hive allows the user to explore and structure that data, analyze it, and then turn it into business insight. @alepoletto
  • 8. Hive – Architecture @alepoletto
  • 9. Hive – Tables • Hive Tables • Data: in files in HDFS • Schema: in metadata stored into relational tables • Schema and Data are separated • Hive needs schema for existing HDFS data @alepoletto
  • 10. @alepoletto
  • 11. Hive – Pig x Hive Pig is good for Hive is for • ETL. • Query Data • Preparing data for easier analyses. • Need answer to specific questions • for long series of steps to perform • If you are familiar with sql @alepoletto
  • 12. Hive – HiveQL @alepoletto
  • 13. @alepoletto
  • 14. HCatalog – What it does • Metadata and Table management System for Hadoop. • shared schema and data type mechanism for different Hadoop tools like pig, hive and MapReduce • Interoperability across data processing tools • Table abstraction, so you don’t need to worry with where and how the data is stored. @alepoletto
  • 15. HCatalog – Summary • “Takes Hive Meatafdata and opens to everybody else” @alepoletto
  • 16. HCatalog – Overview • Access data Through Hcatalog @alepoletto
  • 17. HCatalog – Archtecture @alepoletto
  • 18. @alepoletto