Why run Hadoop in AWS? Getting Started on Hadoop


Why run Hadoop in AWS?

• elastic: batch jobs on clusters can consume many nodes, but
demand scales up and down and is not 24/7, a great case for using EC2
• commodity hardware: MapReduce (MR) is built for fault tolerance,
a great case for leveraging AMIs
• right-sizing: it is difficult to know a priori how large a cluster
is needed without first running significant jobs (to test key/value
skew, data quality, etc.)
• when your input data is already in S3, SimpleDB (SDB), EBS, RDS…
• when your output needs to be consumed in AWS…
You really don't want to buy rack space in a datacenter before
assessing these issues. Besides, a private datacenter probably
won't even be cost-effective afterward.
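
The right-sizing bullet mentions testing key/value skew before settling on a cluster size. As a rough illustration, the sketch below estimates how unevenly a sample of map-output keys would spread across reducers. The function name `partition_skew` and the use of CRC32 as the partitioning hash are assumptions made for this example; CRC32 only stands in for whatever partitioner a real job uses, so the numbers are indicative rather than exact.

```python
import zlib


def partition_skew(keys, num_reducers):
    """Estimate reducer load imbalance for a sample of string keys.

    Hypothetical helper: keys are assigned to reducers with CRC32 modulo
    num_reducers, which is a stand-in for the job's real partitioner.
    Returns (skew, loads), where skew is the heaviest reducer's load
    divided by the mean load (1.0 means perfectly even).
    """
    if num_reducers <= 0:
        raise ValueError("num_reducers must be positive")

    loads = [0] * num_reducers
    for key in keys:
        # CRC32 is deterministic across runs, unlike Python's built-in
        # hash() for strings, so results are reproducible.
        part = zlib.crc32(key.encode("utf-8")) % num_reducers
        loads[part] += 1

    total = sum(loads)
    if total == 0:
        return 0.0, loads

    mean_load = total / num_reducers
    return max(loads) / mean_load, loads


if __name__ == "__main__":
    # A made-up sample where one key dominates the output.
    sample = ["user-1"] * 900 + ["user-%d" % i for i in range(2, 102)]
    skew, loads = partition_skew(sample, num_reducers=8)
    print("skew ratio: %.2f" % skew)
    print("records per reducer:", loads)
```

A skew ratio well above 1.0 on a representative sample suggests that adding nodes will not shorten the job much, because one reducer still does most of the work. That is the kind of finding that is cheap to obtain on a short-lived EC2 cluster.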

Published in: Technology