Large scale ETL with Hadoop

Uploaded on Oct 29, 2012

  • 5,075 views

Hadoop is commonly used for processing large swaths of data in batch. While many of the necessary building blocks for data processing exist within the Hadoop ecosystem – HDFS, MapReduce, HBase, Hive, Pig, Oozie, and so on – it can be a challenge to assemble and operationalize them as a production ETL platform. This presentation covers one approach to data ingest, organization, format selection, process orchestration, and external system integration, based on collective experience acquired across many production Hadoop deployments.

Upload Details

Uploaded via SlideShare as Apple Keynote

Usage Rights

© All Rights Reserved

Presentation Transcript