Jan 2012 HUG: HCatalog
by Yahoo! Developer Network on Jan 31, 2012
- 3,562 views
HCatalog is a table abstraction and a storage abstraction system that makes it easy for multiple tools to interact with the same underlying data. A common buzzword in the NoSQL world today is that of ...
HCatalog is a table abstraction and a storage abstraction system that makes it easy for multiple tools to interact with the same underlying data. A common buzzword in the NoSQL world today is that of polyglot persistence. Basically, what that comes down to is that you pick the right tool for the job. In the hadoop ecosystem, you have many tools that might be used for data processing - you might use pig or hive, or your own custom mapreduce program, or that shiny new GUI-based tool that's just come out. And which one to use might depend on the user, or on the type of query you're interested in, or the type of job we want to run. From another perspective, you might want to store your data in columnar storage for efficient storage and retrieval for particular query types, or in text so that users can write data producers in scripting languages like perl or python, or you may want to hook up that hbase table as a data source. As a end-user, I want to use whatever data processing tool is available to me. As a data designer, I want to optimize how data is stored. As a cluster manager / data architect, I want the ability to share pieces of information across the board, and move data back and forth fluidly. HCatalog's hopes and promises are the realization of all of the above.
- Total Views
- Views on SlideShare
- Embed Views