Large Scale ETL with Hadoop
by esammer on Oct 29, 2012
- 1,062 views
Hadoop is commonly used for processing large swaths of data in batch. While many of the necessary building blocks for data processing exist within the Hadoop ecosystem – HDFS, MapReduce, HBase, ...
Hadoop is commonly used for processing large swaths of data in batch. While many of the necessary building blocks for data processing exist within the Hadoop ecosystem – HDFS, MapReduce, HBase, Hive, Pig, Oozie, and so on – it can be a challenge to assemble and operationalize them as a production ETL platform. This presentation covers one approach to data ingest, organization, format selection, process orchestration, and external system integration, based on collective experience acquired across many production Hadoop deployments.
Accessibility
Categories
Upload Details
Uploaded via SlideShare as Apple Keynote
Usage Rights
© All Rights Reserved
Statistics
- Likes
- 3
- Downloads
- 63
- Comments
- 0
- Embed Views
- Views on SlideShare
- 1,062
- Total Views
- 1,062