Hive hcatalog
 


    Hive hcatalog Presentation Transcript

    • @alepoletto
    • Hive
    • Hive – What is it?
      • Data warehouse system layer built on top of Hadoop
      • Defines structure for your unstructured big data
      • Lets you query this data using HiveQL, a SQL-like language
    • Hive – is not… a relational database
      • It uses a relational database only to store metadata
      • The data that Hive processes is stored in HDFS
    • Hive – is not… designed for online transactions
      • Runs on Hadoop (a batch processing system)
      • Jobs can have high latency and overhead
    • Hive – is not… for real-time queries and row-level updates
      • Suited to batch jobs over large sets of immutable data
    • Hive – What it does
      • Hadoop was built to organize and store massive amounts of data.
      • A Hadoop cluster is a reservoir of heterogeneous data, from multiple sources and in different formats.
      • Hive allows the user to explore and structure that data, analyze it, and then turn it into business insight.
    • Hive – Architecture
    • Hive – Tables
      • Data: in files in HDFS
      • Schema: in metadata stored in relational tables
      • Schema and data are kept separate
      • Hive needs a schema for existing HDFS data
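The separation of schema and data on this slide can be illustrated with a small, self-contained sketch. It is not Hive code: `RAW_LINES` stands in for a delimited file in HDFS, `SCHEMA` stands in for table metadata kept in the relational store, and `read_table` is a hypothetical helper that applies the schema only when the data is read.

```python
# Schema-on-read sketch. All names here are hypothetical stand-ins,
# not part of Hive or Hadoop.

# Stand-in for the raw contents of a delimited text file in HDFS.
RAW_LINES = [
    "1,alice,34.5",
    "2,bob,12.0",
]

# Stand-in for table metadata kept separately from the data:
# ordered (column name, converter) pairs.
SCHEMA = [("id", int), ("name", str), ("score", float)]


def read_table(lines, schema, delimiter=","):
    """Apply the schema to each raw line at read time."""
    rows = []
    for line in lines:
        fields = line.split(delimiter)
        rows.append(
            {name: cast(value) for (name, cast), value in zip(schema, fields)}
        )
    return rows


rows = read_table(RAW_LINES, SCHEMA)
print(rows[0])
```

The raw lines are never rewritten. Changing the schema changes how the same bytes are interpreted, which is why a schema can be attached to data that already exists.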
    • Hive – Pig x Hive
      • Pig is good for:
        • ETL
        • Preparing data for easier analysis
        • Long series of steps to perform
      • Hive is for:
        • Querying data
        • When you need answers to specific questions
        • If you are familiar with SQL
    • Hive – HiveQL
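To give a feel for HiveQL, here is a minimal sketch that renders two statements as plain strings: a `CREATE EXTERNAL TABLE` that attaches a schema to files already in HDFS, and an aggregate query over that table. The table name `page_views`, its columns, and the path `/data/page_views` are assumptions made up for illustration, and `create_external_table` is a hypothetical helper, not part of any Hive client library.

```python
# Renders HiveQL text only; nothing here connects to a Hive server.
# Table name, columns, and HDFS path are hypothetical examples.

def create_external_table(name, columns, location, delimiter="\\t"):
    """Build a CREATE EXTERNAL TABLE statement for delimited text files."""
    cols = ",\n  ".join(f"{col} {hive_type}" for col, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {name} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


ddl = create_external_table(
    "page_views",
    [("user_id", "STRING"), ("url", "STRING"), ("view_time", "BIGINT")],
    "/data/page_views",
)

# A SQL-like query over the table defined above.
top_urls_query = (
    "SELECT url, COUNT(*) AS views\n"
    "FROM page_views\n"
    "GROUP BY url\n"
    "ORDER BY views DESC\n"
    "LIMIT 10;"
)

print(ddl)
print(top_urls_query)
```

The generated statements could be pasted into a Hive shell. The query reads like ordinary SQL, but Hive runs it as batch jobs on the cluster.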
    • HCatalog – What it does
      • Metadata and table management system for Hadoop
      • Shared schema and data type mechanism for different Hadoop tools such as Pig, Hive and MapReduce
      • Interoperability across data processing tools
      • Table abstraction, so you don’t need to worry about where and how the data is stored
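The table abstraction can be pictured as a shared registry that tools query by table name. The sketch below is a toy in-memory model under that assumption, not the HCatalog API: `TableInfo`, `Catalog`, and `column_names` are hypothetical names, and the table and path are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    columns: tuple        # ordered (name, type) pairs
    location: str         # where the data lives
    storage_format: str   # how the data is stored


class Catalog:
    """Toy shared metadata registry (hypothetical, in-memory only)."""

    def __init__(self):
        self._tables = {}

    def register(self, name, info):
        self._tables[name] = info

    def lookup(self, name):
        return self._tables[name]


catalog = Catalog()
catalog.register(
    "page_views",
    TableInfo(
        columns=(("user_id", "string"), ("url", "string")),
        location="/data/page_views",   # hypothetical path
        storage_format="textfile",
    ),
)


def column_names(cat, table):
    """What any tool needs from the caller: just the table name."""
    return [name for name, _ in cat.lookup(table).columns]


# Two different "tools" resolve the same table without knowing its
# location or storage format.
pig_like_view = column_names(catalog, "page_views")
hive_like_view = column_names(catalog, "page_views")
print(pig_like_view, hive_like_view)
```

Because both callers go through the same registry, moving the data or changing its format only requires updating one `TableInfo` entry.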
    • HCatalog – Summary
      • “Takes Hive metadata and opens it to everybody else”
    • HCatalog – Overview
      • Access data through HCatalog
    • HCatalog – Architecture