SlideShare a Scribd company logo
HAWQ HCatalog Integration
Shivram Mani ( HAWQ-UD)
What is HCatalog ?
Table and storage management layer for Hadoop
Relational view of data in the Hadoop distributed file system
Abstracts data format/location from user
Built on top of the Hive metastore and incorporates Hive's DDL
HAWQ->(PXF)->HIVE (before)
HAWQ HIVE
CREATE EXTERNAL TABLE zoo_ext (
id double, animal string, age int
)
LOCATION ('pxf://pivotal:50070//zoo?PROFILE=hive')
FORMAT 'CUSTOM' (formatter='pxfwritable_import');
PXF
PXF
PXF
SELECT * FROM zoo_ext;
disk heap:
pg_exttable
pg_class...
3
2
1
Motivation
Problem:
● User has to know Hive table schema
● Entering schema/location/format information is error prone
● External table (static) would not be aware of Hive table metadata change
(dynamic)
Solution:
● Native integration of HCatalog into HAWQ
○ HAWQ catalog manage HAWQ tables
○ HCatalog manage external(Hive, Pig, HBase) tables
HAWQ->(PXF)->HIVE (before)
HAWQ HIVE
CREATE EXTERNAL TABLE zoo_ext (
id double, animal string, age int
)
LOCATION ('pxf://pivotal:50070//zoo?PROFILE=hive')
FORMAT 'CUSTOM' (formatter='pxfwritable_import');
PXF
PXF
PXF
SELECT * FROM zoo_ext;
disk heap:
pg_exttable
pg_class...
3
1
2
zoo:
id double
animal string
age int
HAWQ->(HCat)->(PXF)->HIVE (NOW)
HAWQ HIVE
CREATE EXTERNAL TABLE zoo_ext (
id double, animal string, age int
) LOCATION ('pxf://pivotal:50070//zoo?PROFILE=hive')
FORMAT 'CUSTOM' (formatter='pxfwritable_import');
PXF
PXF
PXF
SELECT * FROM zoo_ext;
SELECT * FROM hcatalog.default.zoo;
*** hcatalog.[HIVE_DB_NAME].[HIVE_TABLE_NAME]
HCAT
disk heap:
pg_exttable
pg_class...
in-memory:
pg_exttable
pg_class...
1
2 3
zoo:
id double
animal string
age int
HAWQ->(HCat)->(PXF)->HIVE (NOW)
HAWQ HIVE
1. Retrieve metadata from HCatalog
a. HAWQ->(REST)->proxy->(Thrift)->HCatalog
2. Parse metadata into in-memory only catalog tables
a. Extended caql and syscache
3. Query with PXF (unchanged)
PXF
PXF
PXF
SELECT * FROM hcatalog.default.zoo;
*** hcatalog.[HIVE_DB_NAME].[HIVE_TABLE_NAME]
*** Also modified grammar to recognize hcatalog queries
HCAT
disk heap:
pg_exttable
pg_class...
in-memory:
pg_exttable
pg_class...
1
2 3
Under the hood
Per session:
● Check if enough oid left for HCatalog (only when accessing HCatalog)
Per transaction:
● Get table metadata of interest on initial access
● Create in-memory catalog tables from metadata
○ pg_namespace, pg_class, pg_exttable, pg_type, pg_attribute, gp_distribution_policy
● Same table referenced multiple times will use in-memory metadata, minimizing external
calls to HCatalog
● Table metadata dropped at the end of txn
● In-memory catalog table constraints enforced (e.g. unique oid, namespace-relation name)
DEMO
In Hive:
describe demo.hive_types;
In HAWQ:
SELECT * FROM hcatalog.demo.hive_types;
SELECT s1, f FROM hcatalog.demo.hive_types
where vc1 = 'abcde' and dc1 > 1.0;

More Related Content

What's hot

Hypertable - massively scalable nosql database
Hypertable - massively scalable nosql databaseHypertable - massively scalable nosql database
Hypertable - massively scalable nosql database
bigdatagurus_meetup
 
Hive data migration (export/import)
Hive data migration (export/import)Hive data migration (export/import)
Hive data migration (export/import)
Bopyo Hong
 
presentation_Hadoop_File_System
presentation_Hadoop_File_Systempresentation_Hadoop_File_System
presentation_Hadoop_File_System
Brett Keim
 
Database Architectures and Hypertable
Database Architectures and HypertableDatabase Architectures and Hypertable
Database Architectures and Hypertable
hypertable
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117
exsuns
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to Hadoop
AnandMHadoop
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2
Rohit Agrawal
 
TriHUG November HCatalog Talk by Alan Gates
TriHUG November HCatalog Talk by Alan GatesTriHUG November HCatalog Talk by Alan Gates
TriHUG November HCatalog Talk by Alan Gates
trihug
 
Commands documentaion
Commands documentaionCommands documentaion
Commands documentaion
TejalNijai
 
Andriy Podanenko.Drupal database api.DrupalCamp Kyiv 2011
Andriy Podanenko.Drupal database api.DrupalCamp Kyiv 2011Andriy Podanenko.Drupal database api.DrupalCamp Kyiv 2011
Andriy Podanenko.Drupal database api.DrupalCamp Kyiv 2011
camp_drupal_ua
 
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
Apache Apex
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache Spark
Amir Sedighi
 
Hadoop 20111215
Hadoop 20111215Hadoop 20111215
Hadoop 20111215
exsuns
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
ryancox
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
DataWorks Summit
 
Hdfs lab hands-on
Hdfs lab hands-on Hdfs lab hands-on
Hdfs lab hands-on
GrowwithParmesh
 
HBase Incremental Backup
HBase Incremental BackupHBase Incremental Backup
HBase Incremental Backup
Lee neal
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
Command Prompt., Inc
 
Hadoop 2.x HDFS Cluster Installation (VirtualBox)
Hadoop 2.x  HDFS Cluster Installation (VirtualBox)Hadoop 2.x  HDFS Cluster Installation (VirtualBox)
Hadoop 2.x HDFS Cluster Installation (VirtualBox)
Amir Sedighi
 
Ganesh naik linux_kernel_internals
Ganesh naik linux_kernel_internalsGanesh naik linux_kernel_internals
Ganesh naik linux_kernel_internals
Ganesh Naik
 

What's hot (20)

Hypertable - massively scalable nosql database
Hypertable - massively scalable nosql databaseHypertable - massively scalable nosql database
Hypertable - massively scalable nosql database
 
Hive data migration (export/import)
Hive data migration (export/import)Hive data migration (export/import)
Hive data migration (export/import)
 
presentation_Hadoop_File_System
presentation_Hadoop_File_Systempresentation_Hadoop_File_System
presentation_Hadoop_File_System
 
Database Architectures and Hypertable
Database Architectures and HypertableDatabase Architectures and Hypertable
Database Architectures and Hypertable
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to Hadoop
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2
 
TriHUG November HCatalog Talk by Alan Gates
TriHUG November HCatalog Talk by Alan GatesTriHUG November HCatalog Talk by Alan Gates
TriHUG November HCatalog Talk by Alan Gates
 
Commands documentaion
Commands documentaionCommands documentaion
Commands documentaion
 
Andriy Podanenko.Drupal database api.DrupalCamp Kyiv 2011
Andriy Podanenko.Drupal database api.DrupalCamp Kyiv 2011Andriy Podanenko.Drupal database api.DrupalCamp Kyiv 2011
Andriy Podanenko.Drupal database api.DrupalCamp Kyiv 2011
 
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache Spark
 
Hadoop 20111215
Hadoop 20111215Hadoop 20111215
Hadoop 20111215
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
 
Hdfs lab hands-on
Hdfs lab hands-on Hdfs lab hands-on
Hdfs lab hands-on
 
HBase Incremental Backup
HBase Incremental BackupHBase Incremental Backup
HBase Incremental Backup
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
 
Hadoop 2.x HDFS Cluster Installation (VirtualBox)
Hadoop 2.x  HDFS Cluster Installation (VirtualBox)Hadoop 2.x  HDFS Cluster Installation (VirtualBox)
Hadoop 2.x HDFS Cluster Installation (VirtualBox)
 
Ganesh naik linux_kernel_internals
Ganesh naik linux_kernel_internalsGanesh naik linux_kernel_internals
Ganesh naik linux_kernel_internals
 

Viewers also liked

Apache HAWQ Architecture
Apache HAWQ ArchitectureApache HAWQ Architecture
Apache HAWQ Architecture
Alexey Grishchenko
 
MPP vs Hadoop
MPP vs HadoopMPP vs Hadoop
MPP vs Hadoop
Alexey Grishchenko
 
gsoc_mentor for Shivram Mani
gsoc_mentor for Shivram Manigsoc_mentor for Shivram Mani
gsoc_mentor for Shivram Mani
Shivram Mani
 
データ活用を推進する「Pivotal HDB(Apache HAWQ(ホーク))」
データ活用を推進する「Pivotal HDB(Apache HAWQ(ホーク))」データ活用を推進する「Pivotal HDB(Apache HAWQ(ホーク))」
データ活用を推進する「Pivotal HDB(Apache HAWQ(ホーク))」
Masayuki Matsushita
 
Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16
PivotalOpenSourceHub
 
Managing Apache HAWQ with Apache AMBARI
Managing Apache HAWQ with Apache AMBARIManaging Apache HAWQ with Apache AMBARI
Managing Apache HAWQ with Apache AMBARI
Mithun (Matt) Mathew
 
Apache HAWQ : An Introduction
Apache HAWQ : An IntroductionApache HAWQ : An Introduction
Apache HAWQ : An Introduction
Sandeep Kunkunuru
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoop
BigData Research
 
Pivotal HAWQ - High Availability (2014)
Pivotal HAWQ - High Availability (2014)Pivotal HAWQ - High Availability (2014)
Pivotal HAWQ - High Availability (2014)
saravana krishnamurthy
 
Pivotal Strata NYC 2015 Apache HAWQ Launch
Pivotal Strata NYC 2015 Apache HAWQ LaunchPivotal Strata NYC 2015 Apache HAWQ Launch
Pivotal Strata NYC 2015 Apache HAWQ Launch
VMware Tanzu
 
Build & test Apache Hawq
Build & test Apache Hawq Build & test Apache Hawq
Build & test Apache Hawq
PivotalOpenSourceHub
 
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQMassively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
InMobi Technology
 
Apache HAWQ and Apache MADlib: Journey to Apache
Apache HAWQ and Apache MADlib: Journey to ApacheApache HAWQ and Apache MADlib: Journey to Apache
Apache HAWQ and Apache MADlib: Journey to Apache
PivotalOpenSourceHub
 
Phd tutorial hawq_v0.1
Phd tutorial hawq_v0.1Phd tutorial hawq_v0.1
Phd tutorial hawq_v0.1
seungdon Choi
 
SQL On Hadoop
SQL On HadoopSQL On Hadoop
SQL On Hadoop
Muhammad Ali
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개
Seungdon Choi
 
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
VMware Tanzu
 
Pivotal hawq internals
Pivotal hawq internalsPivotal hawq internals
Pivotal hawq internals
Alexey Grishchenko
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_final
Hortonworks
 
How to manage Hortonworks HDB Resources with YARN
How to manage Hortonworks HDB Resources with YARNHow to manage Hortonworks HDB Resources with YARN
How to manage Hortonworks HDB Resources with YARN
Hortonworks
 

Viewers also liked (20)

Apache HAWQ Architecture
Apache HAWQ ArchitectureApache HAWQ Architecture
Apache HAWQ Architecture
 
MPP vs Hadoop
MPP vs HadoopMPP vs Hadoop
MPP vs Hadoop
 
gsoc_mentor for Shivram Mani
gsoc_mentor for Shivram Manigsoc_mentor for Shivram Mani
gsoc_mentor for Shivram Mani
 
データ活用を推進する「Pivotal HDB(Apache HAWQ(ホーク))」
データ活用を推進する「Pivotal HDB(Apache HAWQ(ホーク))」データ活用を推進する「Pivotal HDB(Apache HAWQ(ホーク))」
データ活用を推進する「Pivotal HDB(Apache HAWQ(ホーク))」
 
Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16
 
Managing Apache HAWQ with Apache AMBARI
Managing Apache HAWQ with Apache AMBARIManaging Apache HAWQ with Apache AMBARI
Managing Apache HAWQ with Apache AMBARI
 
Apache HAWQ : An Introduction
Apache HAWQ : An IntroductionApache HAWQ : An Introduction
Apache HAWQ : An Introduction
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoop
 
Pivotal HAWQ - High Availability (2014)
Pivotal HAWQ - High Availability (2014)Pivotal HAWQ - High Availability (2014)
Pivotal HAWQ - High Availability (2014)
 
Pivotal Strata NYC 2015 Apache HAWQ Launch
Pivotal Strata NYC 2015 Apache HAWQ LaunchPivotal Strata NYC 2015 Apache HAWQ Launch
Pivotal Strata NYC 2015 Apache HAWQ Launch
 
Build & test Apache Hawq
Build & test Apache Hawq Build & test Apache Hawq
Build & test Apache Hawq
 
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQMassively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
 
Apache HAWQ and Apache MADlib: Journey to Apache
Apache HAWQ and Apache MADlib: Journey to ApacheApache HAWQ and Apache MADlib: Journey to Apache
Apache HAWQ and Apache MADlib: Journey to Apache
 
Phd tutorial hawq_v0.1
Phd tutorial hawq_v0.1Phd tutorial hawq_v0.1
Phd tutorial hawq_v0.1
 
SQL On Hadoop
SQL On HadoopSQL On Hadoop
SQL On Hadoop
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개
 
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
 
Pivotal hawq internals
Pivotal hawq internalsPivotal hawq internals
Pivotal hawq internals
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_final
 
How to manage Hortonworks HDB Resources with YARN
How to manage Hortonworks HDB Resources with YARNHow to manage Hortonworks HDB Resources with YARN
How to manage Hortonworks HDB Resources with YARN
 

Similar to Hawq Hcatalog Integration

第2回 Hadoop 輪読会
第2回 Hadoop 輪読会第2回 Hadoop 輪読会
第2回 Hadoop 輪読会
Toshihiro Suzuki
 
Hadoop Summit San Jose 2014: Data Discovery on Hadoop
Hadoop Summit San Jose 2014: Data Discovery on Hadoop Hadoop Summit San Jose 2014: Data Discovery on Hadoop
Hadoop Summit San Jose 2014: Data Discovery on Hadoop
Sumeet Singh
 
Data discoveryonhadoop@yahoo! hadoopsummit2014
Data discoveryonhadoop@yahoo! hadoopsummit2014Data discoveryonhadoop@yahoo! hadoopsummit2014
Data discoveryonhadoop@yahoo! hadoopsummit2014
thiruvel
 
Strata Conference + Hadoop World San Jose 2015: Data Discovery on Hadoop
Strata Conference + Hadoop World San Jose 2015: Data Discovery on Hadoop Strata Conference + Hadoop World San Jose 2015: Data Discovery on Hadoop
Strata Conference + Hadoop World San Jose 2015: Data Discovery on Hadoop
Sumeet Singh
 
Data Discovery on Hadoop - Realizing the Full Potential of your Data
Data Discovery on Hadoop - Realizing the Full Potential of your DataData Discovery on Hadoop - Realizing the Full Potential of your Data
Data Discovery on Hadoop - Realizing the Full Potential of your Data
DataWorks Summit
 
Accessing external hadoop data sources using pivotal e xtension framework (px...
Accessing external hadoop data sources using pivotal e xtension framework (px...Accessing external hadoop data sources using pivotal e xtension framework (px...
Accessing external hadoop data sources using pivotal e xtension framework (px...
Sameer Tiwari
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat Sheet
Hortonworks
 
Apache Hive
Apache HiveApache Hive
Apache Hive
Ajit Koti
 
Profiling with Xhprof
Profiling with XhprofProfiling with Xhprof
Profiling with Xhprof
Tim Massey
 
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, DatatypesHDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
The HDF-EOS Tools and Information Center
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
Jim Mlodgenski
 
Hadoop Meetup Jan 2019 - Mounting Remote Stores in HDFS
Hadoop Meetup Jan 2019 - Mounting Remote Stores in HDFSHadoop Meetup Jan 2019 - Mounting Remote Stores in HDFS
Hadoop Meetup Jan 2019 - Mounting Remote Stores in HDFS
Erik Krogen
 
Exadata - BULK DATA LOAD Testing on Database Machine
Exadata - BULK DATA LOAD Testing on Database Machine Exadata - BULK DATA LOAD Testing on Database Machine
Exadata - BULK DATA LOAD Testing on Database Machine
Monowar Mukul
 
HUG Meetup 2013: HCatalog / Hive Data Out
HUG Meetup 2013: HCatalog / Hive Data Out HUG Meetup 2013: HCatalog / Hive Data Out
HUG Meetup 2013: HCatalog / Hive Data Out
Sumeet Singh
 
May 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data OutMay 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data Out
Yahoo Developer Network
 
Hypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comHypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.com
Edward D. Kim
 
Hadoop HDFS
Hadoop HDFS Hadoop HDFS
Hadoop HDFS
Madhur Nawandar
 
Hadoop Hive
Hadoop HiveHadoop Hive
Hadoop Hive
Madhur Nawandar
 
Writing A Foreign Data Wrapper
Writing A Foreign Data WrapperWriting A Foreign Data Wrapper
Writing A Foreign Data Wrapper
psoo1978
 
Hw09 Hadoop Development At Facebook Hive And Hdfs
Hw09   Hadoop Development At Facebook  Hive And HdfsHw09   Hadoop Development At Facebook  Hive And Hdfs
Hw09 Hadoop Development At Facebook Hive And Hdfs
Cloudera, Inc.
 

Similar to Hawq Hcatalog Integration (20)

第2回 Hadoop 輪読会
第2回 Hadoop 輪読会第2回 Hadoop 輪読会
第2回 Hadoop 輪読会
 
Hadoop Summit San Jose 2014: Data Discovery on Hadoop
Hadoop Summit San Jose 2014: Data Discovery on Hadoop Hadoop Summit San Jose 2014: Data Discovery on Hadoop
Hadoop Summit San Jose 2014: Data Discovery on Hadoop
 
Data discoveryonhadoop@yahoo! hadoopsummit2014
Data discoveryonhadoop@yahoo! hadoopsummit2014Data discoveryonhadoop@yahoo! hadoopsummit2014
Data discoveryonhadoop@yahoo! hadoopsummit2014
 
Strata Conference + Hadoop World San Jose 2015: Data Discovery on Hadoop
Strata Conference + Hadoop World San Jose 2015: Data Discovery on Hadoop Strata Conference + Hadoop World San Jose 2015: Data Discovery on Hadoop
Strata Conference + Hadoop World San Jose 2015: Data Discovery on Hadoop
 
Data Discovery on Hadoop - Realizing the Full Potential of your Data
Data Discovery on Hadoop - Realizing the Full Potential of your DataData Discovery on Hadoop - Realizing the Full Potential of your Data
Data Discovery on Hadoop - Realizing the Full Potential of your Data
 
Accessing external hadoop data sources using pivotal e xtension framework (px...
Accessing external hadoop data sources using pivotal e xtension framework (px...Accessing external hadoop data sources using pivotal e xtension framework (px...
Accessing external hadoop data sources using pivotal e xtension framework (px...
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat Sheet
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
Profiling with Xhprof
Profiling with XhprofProfiling with Xhprof
Profiling with Xhprof
 
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, DatatypesHDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
 
Hadoop Meetup Jan 2019 - Mounting Remote Stores in HDFS
Hadoop Meetup Jan 2019 - Mounting Remote Stores in HDFSHadoop Meetup Jan 2019 - Mounting Remote Stores in HDFS
Hadoop Meetup Jan 2019 - Mounting Remote Stores in HDFS
 
Exadata - BULK DATA LOAD Testing on Database Machine
Exadata - BULK DATA LOAD Testing on Database Machine Exadata - BULK DATA LOAD Testing on Database Machine
Exadata - BULK DATA LOAD Testing on Database Machine
 
HUG Meetup 2013: HCatalog / Hive Data Out
HUG Meetup 2013: HCatalog / Hive Data Out HUG Meetup 2013: HCatalog / Hive Data Out
HUG Meetup 2013: HCatalog / Hive Data Out
 
May 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data OutMay 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data Out
 
Hypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comHypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.com
 
Hadoop HDFS
Hadoop HDFS Hadoop HDFS
Hadoop HDFS
 
Hadoop Hive
Hadoop HiveHadoop Hive
Hadoop Hive
 
Writing A Foreign Data Wrapper
Writing A Foreign Data WrapperWriting A Foreign Data Wrapper
Writing A Foreign Data Wrapper
 
Hw09 Hadoop Development At Facebook Hive And Hdfs
Hw09   Hadoop Development At Facebook  Hive And HdfsHw09   Hadoop Development At Facebook  Hive And Hdfs
Hw09 Hadoop Development At Facebook Hive And Hdfs
 

Recently uploaded

在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 

Recently uploaded (20)

在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 

Hawq Hcatalog Integration

  • 2. What is HCatalog ? Table and storage management layer for Hadoop Relational view of data in the Hadoop distributed file system Abstracts data format/location from user Built on top of the Hive metastore and incorporates Hive's DDL
  • 3.
  • 4. HAWQ->(PXF)->HIVE (before) HAWQ HIVE CREATE EXTERNAL TABLE zoo_ext ( id double, animal string, age int ) LOCATION ('pxf://pivotal:50070//zoo?PROFILE=hive') FORMAT 'CUSTOM' (formatter='pxfwritable_import'); PXF PXF PXF SELECT * FROM zoo_ext; disk heap: pg_exttable pg_class... 3 2 1
  • 5. Motivation Problem: ● User has to know Hive table schema ● Entering schema/location/format information is error prone ● External table (static) would not be aware of Hive table metadata change (dynamic) Solution: ● Native integration of HCatalog into HAWQ ○ HAWQ catalog manage HAWQ tables ○ HCatalog manage external(Hive, Pig, HBase) tables
  • 6. HAWQ->(PXF)->HIVE (before) HAWQ HIVE CREATE EXTERNAL TABLE zoo_ext ( id double, animal string, age int ) LOCATION ('pxf://pivotal:50070//zoo?PROFILE=hive') FORMAT 'CUSTOM' (formatter='pxfwritable_import'); PXF PXF PXF SELECT * FROM zoo_ext; disk heap: pg_exttable pg_class... 3 1 2
  • 7. zoo: id double animal string age int HAWQ->(HCat)->(PXF)->HIVE (NOW) HAWQ HIVE CREATE EXTERNAL TABLE zoo_ext ( id double, animal string, age int ) LOCATION ('pxf://pivotal:50070//zoo?PROFILE=hive') FORMAT 'CUSTOM' (formatter='pxfwritable_import'); PXF PXF PXF SELECT * FROM zoo_ext; SELECT * FROM hcatalog.default.zoo; *** hcatalog.[HIVE_DB_NAME].[HIVE_TABLE_NAME] HCAT disk heap: pg_exttable pg_class... in-memory: pg_exttable pg_class... 1 2 3
  • 8. zoo: id double animal string age int HAWQ->(HCat)->(PXF)->HIVE (NOW) HAWQ HIVE 1. Retrieve metadata from HCatalog a. HAWQ->(REST)->proxy->(Thrift)->HCatalog 2. Parse metadata into in-memory only catalog tables a. Extended caql and syscache 3. Query with PXF (unchanged) PXF PXF PXF SELECT * FROM hcatalog.default.zoo; *** hcatalog.[HIVE_DB_NAME].[HIVE_TABLE_NAME] *** Also modified grammar to recognize hcatalog queries HCAT disk heap: pg_exttable pg_class... in-memory: pg_exttable pg_class... 1 2 3
  • 9. Under the hood Per session: ● Check if enough oid left for HCatalog (only when accessing HCatalog) Per transaction: ● Get table metadata of interest on initial access ● Create in-memory catalog tables from metadata ○ pg_namespace, pg_class, pg_exttable, pg_type, pg_attribute, gp_distribution_policy ● Same table referenced multiple times will use in-memory metadata, minimizing external calls to HCatalog ● Table metadata dropped at the end of txn ● In-memory catalog table constraints enforced (e.g. unique oid, namespace-relation name)
  • 10. DEMO In Hive: describe demo.hive_types; In HAWQ: SELECT * FROM hcatalog.demo.hive_types; SELECT s1, f FROM hcatalog.demo.hive_types where vc1 = 'abcde' and dc1 > 1.0;