HCatalog Hadoop Summit 2011

HCatalog
Table Management for Hadoop
Alan F. Gates

Who Am I?
Committer and mentor for Apache HCatalog
Committer and PMC member for Apache Pig
Co-founder, Hortonworks
Twitter: @alanfgates
Photo credit: Charles Dawley

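Motivation
[Diagram; labels as extracted]
Tools: Pig, Hive, MapReduce
Format-specific adapters: CustomLoader, HiveColumnarLoader, RCFileInputFormat, CustomInputFormat, ColumnarSerDe, CustomSerDe
HCatalog adapters: HCatLoader, HCatInputFormat, HCatSerDe
HCatalog storage drivers: RCFileStorageDriver, CustomStorageDriver
Storage formats: RCFile, Custom Format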

More Motivation
Data Warehouse
  Hive
  BI Tools
  Analysis
Data Factory
  Pig/MapReduce
  Pipelines
  Iterative Processing
  Research

End User Example

Without HCatalog:

    raw = load '/rawevents/20100819/data' using MyLoader()
          as (ts:long, user:chararray, url:chararray);
    botless = filter raw by NotABot(user);
    …
    store output into '/processedevents/20100819/data';

Processedevents consumers must be manually informed by the producer that data is available, or poll on HDFS (= bad for the NameNode).

With HCatalog:

    raw = load 'rawevents' using HCatLoader();
    botless = filter raw by date == '20100819' and NotABot(user);
    …
    store output into 'processedevents' using HCatStorage('date=20100819');

Processedevents consumers will be notified by HCatalog that data is available and can then start their jobs.
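
The difference between the two scripts can be sketched with a toy model. This is only an illustration in Python, not the HCatalog API: the class `ToyCatalog` and its methods `subscribe`, `add_partition`, and `locate` are hypothetical names invented here. It assumes a catalog that maps a table name plus a partition spec to a storage path and calls back registered consumers when a partition is published, so consumers neither hard-code paths nor poll HDFS.

```python
from collections import defaultdict


class ToyCatalog:
    """In-memory stand-in for a table catalog (hypothetical, not HCatalog)."""

    def __init__(self):
        # table name -> {partition spec -> storage path}
        self._partitions = defaultdict(dict)
        # table name -> list of consumer callbacks
        self._listeners = defaultdict(list)

    def subscribe(self, table, callback):
        """Register a consumer to be called when a partition is published."""
        self._listeners[table].append(callback)

    def add_partition(self, table, spec, path):
        """Record where a partition lives, then notify every subscriber."""
        self._partitions[table][spec] = path
        for callback in self._listeners[table]:
            callback(table, spec)

    def locate(self, table, spec):
        """Resolve a table name and partition spec to its storage path."""
        return self._partitions[table][spec]


catalog = ToyCatalog()
started = []  # jobs the consumer kicked off in response to notifications

# Consumer: start a job as soon as new data is announced.
catalog.subscribe("processedevents",
                  lambda table, spec: started.append((table, spec)))

# Producer: publish the partition once the data has been written.
catalog.add_partition("processedevents", "date=20100819",
                      "/processedevents/20100819/data")

print(started)
print(catalog.locate("processedevents", "date=20100819"))
```

In this sketch the producer and consumer share only the table name and the partition spec; the storage path stays inside the catalog.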

Metadata Architecture
[Diagram; labels as extracted]
Components: HCatLoader, HCatStorage, HCatInputFormat, HCatOutputFormat, CLI, Notification, Hive metadata interface, Thrift server, RDBMS
Legend: Current HCatalog, Hive, Future HCatalog

Storage Architecture
[Diagram; labels as extracted]
Components: HCatLoader, HCatStorage, HCatInputFormat, HCatOutputFormat, InputStorageDriver, OutputStorageDriver
Storage: HDFS, HBase

Project Status
HCatalog was accepted to the Apache Incubator last March
0.1 released this month, includes
  Read/write from Pig
  Read/write from MapReduce
  Read/write from Hive
  Works only with secure Hadoop
  StorageDrivers for RCFile and Text

Future Plans
0.2, plan to release in July
  Notification via JMS when data is available
  Store to multiple partitions simultaneously
  Import/Export tools
Later this year
  Storing in HBase
  Integration with Hadoop streaming
  Bytearray/blob type
  RCFile compression improvements
  High Availability for Thrift server
Eventually
  Data management interfaces for archivers, cleaners, etc.
  Statistics storage

Get Involved
incubator.apache.org/hcatalog
Join the mailing lists
  User list: hcatalog-user@incubator.apache.org
  Dev list: hcatalog-dev@incubator.apache.org
