Automated Hadoop Cluster Construction on EC2

Presented at Houston Hadoop Meetup

Transcript of "Automated Hadoop Cluster Construction on EC2"

  1. Automated Hadoop Clusters on EC2
     Mark Kerzner, SHMsoft
  2. What is Hadoop? :) :) :)
     Everybody knows that... What is your definition?
  3. What is a cloud?
     Everybody knows that, but:
       1. Elastic resources
       2. Internet delivery
       3. SaaS
       4. Virtualization
       5. Device-enabled
       6. Only (1) or all of the above
  4. You are the Hadoop programmer... and you need tools
     What are your alternatives?
     ● IDE
     ● Local "cluster"
     ● Pseudo-distributed cluster
     ● EC2
  5. You are the Hadoop programmer... and you need tools
     What are your alternatives?
     ● IDE - compile and run the code
     ● Local "cluster" - local file system
     ● Pseudo-distributed cluster - test outside
     ● EC2 - test on the cluster, test for scale
  6. What are your resources?
     ● Tom White, "Hadoop: The Definitive Guide"
     ● www.hadoopilluminated.com
  7. For real play, you need a cluster
  8. Hadoop+ (oh, by the way...)
     HBase, Cassandra, MongoDB, NoSQL, Dynamo, BigTable, Dryad (MS), Azure (MS),
     MapReduce, MapR (EMC), Cloudera distribution, EMC distribution, IBM distribution...
  9. Whirr
     Setup
       export AWS_ACCESS_KEY_ID=...
       export AWS_SECRET_ACCESS_KEY=...
     Install
       curl -O http://www.apache.org/dist/whirr/whirr-0.7.1/whirr-0.7.1.tar.gz
       tar zxf whirr-0.7.1.tar.gz; cd whirr-0.7.1
     Generate keys
       ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_whirr
     Run
       bin/whirr launch-cluster --config recipes/zookeeper-ec2.properties --private-key-file ~/.ssh/id_rsa_whirr
  10. Whirr limitations
      ● No EBS
      ● All or nothing
      ● Generates configuration artifacts
      ● Takes over your computer: it uses a proxy, so no more local development
      ● Hard to customize
  11. Amazon EMR
  12. EMR limitations
      ● No choice of image
      ● Fixed architecture
      ● Hard to debug
      ● Hard to customize
  13. You do it
      Repeat the manual procedure, only automate it
      Prepare: AMI, Java, Hadoop
      On-the-fly: start AMI, log in, configure, start services, verify, run test jobs
  14. You do it - advanced
      On startup: under-provision, over-provision, progress
      On-the-fly: monitor, run test jobs, watch for cluster deterioration
  15. Cloudera Manager
  16. MapR Manager
  17. On the large scale
      Hadoop 0.20 - up to 4,000 nodes
      Hadoop 0.23 - up to 20,000 nodes
      GridGain - hundreds of thousands of nodes
  18. Thank you
      Questions?
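Slide 5 separates a local "cluster", which runs against the local file system, from a pseudo-distributed cluster, where every Hadoop daemon runs on one machine. A minimal sketch of the difference, assuming a Hadoop 0.20/1.x-style install: local mode is simply the default configuration, and pseudo-distributed mode overrides three properties. The function names and the ports 9000/9001 are illustrative choices, not taken from the talk.

```bash
#!/usr/bin/env bash
# Sketch: pseudo-distributed configuration for a Hadoop 0.20/1.x install.
# Function names are hypothetical; ports 9000/9001 are illustrative.
set -euo pipefail

# Write a one-property Hadoop *-site.xml.
# $1 = output file, $2 = property name, $3 = property value
write_site_xml() {
  cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Write core-, hdfs- and mapred-site.xml into the conf directory $1.
write_pseudo_conf() {
  local conf_dir="$1"
  mkdir -p "$conf_dir"
  # HDFS on localhost instead of the local file system (default file:///)
  write_site_xml "$conf_dir/core-site.xml" fs.default.name hdfs://localhost:9000
  # A single DataNode can hold only one replica of each block
  write_site_xml "$conf_dir/hdfs-site.xml" dfs.replication 1
  # A JobTracker daemon instead of the in-process runner (default "local")
  write_site_xml "$conf_dir/mapred-site.xml" mapred.job.tracker localhost:9001
}
```

For example, run `write_pseudo_conf "$HADOOP_HOME/conf"`, format HDFS once with `bin/hadoop namenode -format`, and start the daemons with `bin/start-all.sh`. Leaving these three files unset keeps the defaults (`fs.default.name=file:///`, `mapred.job.tracker=local`), which is the local mode from the same slide.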
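The Whirr slide (slide 9) launches Whirr's ZooKeeper recipe. For a Hadoop cluster, the same `launch-cluster` command takes a recipe that lists Hadoop roles instead. A hedged sketch, assuming Whirr 0.7.x property names. The cluster name `hadoop-demo`, the node counts, the `m1.large` hardware id and the helper `write_hadoop_recipe` are placeholders chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch: write a Whirr recipe for a small Hadoop cluster on EC2.
# Property names follow Whirr 0.7.x; all values are placeholders.
set -euo pipefail

# Write the recipe to the file named by $1 (hypothetical helper).
write_hadoop_recipe() {
  # The quoted 'EOF' keeps ${env:...} and ${sys:...} literal:
  # Whirr, not bash, resolves them at launch time.
  cat > "$1" <<'EOF'
whirr.cluster-name=hadoop-demo
# one master running NameNode + JobTracker, three workers
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,3 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa_whirr
whirr.public-key-file=${whirr.private-key-file}.pub
whirr.hardware-id=m1.large
EOF
}
```

After `write_hadoop_recipe hadoop-ec2.properties`, the cluster starts with `bin/whirr launch-cluster --config hadoop-ec2.properties` and goes away with `bin/whirr destroy-cluster --config hadoop-ec2.properties`. The quoted heredoc is the one subtle design choice: without it, bash would try to expand `${env:...}` itself and either fail or substitute the wrong text.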
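Slide 13 ("You do it") suggests repeating the manual procedure, only automated: start the AMI, log in, configure, start services, verify, run test jobs. A minimal sketch of that sequence for Hadoop 0.20/1.x, under loudly hypothetical assumptions: an AMI already prepared with Java and Hadoop under `/opt/hadoop`, ssh access as `ubuntu`, passwordless ssh from the master to the workers, and the AWS CLI for starting instances. All function names are illustrative. The default `DRY_RUN=1` only prints the commands, so the sequence can be reviewed before anything touches EC2.

```bash
#!/usr/bin/env bash
# Sketch: automate the manual Hadoop 0.20/1.x cluster bring-up.
# Hypothetical assumptions: the AMI has Java and Hadoop in $HADOOP_DIR,
# the master can ssh to the workers, we can ssh in as $SSH_USER, and the
# AWS CLI is configured. DRY_RUN=1 (default) only prints the commands.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
SSH_USER="${SSH_USER:-ubuntu}"          # hypothetical login user
HADOOP_DIR="${HADOOP_DIR:-/opt/hadoop}" # hypothetical install path on the AMI

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Emit a one-property Hadoop *-site.xml on stdout. $1 = name, $2 = value
site_xml() {
  printf '<?xml version="1.0"?>\n<configuration>\n  <property><name>%s</name><value>%s</value></property>\n</configuration>\n' "$1" "$2"
}

# Start AMI: launch N copies of the prepared image.
# $1 = AMI id, $2 = key pair name, $3 = instance count
start_instances() {
  run aws ec2 run-instances --image-id "$1" --key-name "$2" \
    --count "$3" --instance-type m1.large
}

# Configure: point every node at the master and list the workers.
# $1 = master host, remaining args = worker hosts
configure_nodes() {
  local master="$1"; shift
  local conf host
  conf="$(mktemp -d)"
  site_xml fs.default.name "hdfs://$master:9000" > "$conf/core-site.xml"
  site_xml mapred.job.tracker "$master:9001" > "$conf/mapred-site.xml"
  printf '%s\n' "$@" > "$conf/slaves"   # read by the start scripts on the master
  for host in "$master" "$@"; do
    run scp "$conf/core-site.xml" "$conf/mapred-site.xml" "$conf/slaves" \
      "$SSH_USER@$host:$HADOOP_DIR/conf/"
  done
}

# Start services, verify, run a test job; everything runs on the master.
# $1 = master host
start_and_verify() {
  local m="$SSH_USER@$1" h="$HADOOP_DIR"
  run ssh "$m" "$h/bin/hadoop namenode -format"         # first start only
  run ssh "$m" "$h/bin/start-dfs.sh"
  run ssh "$m" "$h/bin/start-mapred.sh"
  run ssh "$m" "$h/bin/hadoop dfsadmin -safemode wait"  # HDFS ready for writes
  run ssh "$m" "$h/bin/hadoop dfsadmin -report"         # list live DataNodes
  run ssh "$m" "$h/bin/hadoop jar $h/hadoop-*examples*.jar pi 4 1000"  # test job
}
```

A run might call `start_instances` with the prepared AMI id, then `configure_nodes` with the master's and workers' host names (for example from `aws ec2 describe-instances` once `aws ec2 wait instance-running` returns), then `start_and_verify` on the master; set `DRY_RUN=0` to execute for real.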
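Slide 14 ("You do it - advanced") comes down to one repeated question: are enough nodes alive? A hedged sketch, assuming the Hadoop 0.20/1.x `hadoop dfsadmin -report` summary line `Datanodes available: N (T total, D dead)`; other Hadoop versions may format the report differently, and the parser would need adjusting. Under one reading of the slide, the same check serves both phases: at startup it accepts an under-provisioned cluster once a minimum number of DataNodes has joined (or, when over-provisioning, proceeds before the extra instances arrive), and on the fly it flags deterioration. The function names and the polling interval are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: cluster health checks from `hadoop dfsadmin -report` (0.20/1.x).
# Assumes the report contains "Datanodes available: N (T total, D dead)".
set -euo pipefail

# Print the live DataNode count from a report on stdin (empty if absent).
live_datanodes() {
  awk '/^Datanodes available:/ && !found { n = $3; found = 1 } END { print n }'
}

# Succeed if the report on stdin shows at least $1 live DataNodes.
# At startup: proceed with an under-provisioned cluster once the minimum
# has joined. On the fly: a failure signals deterioration.
enough_datanodes() {
  local min="$1" live
  live="$(live_datanodes)"
  [ -n "$live" ] && [ "$live" -ge "$min" ]
}

# Poll every $2 seconds and warn on stderr when fewer than $1 DataNodes
# are live (or the report cannot be fetched). Runs until interrupted.
watch_cluster() {
  local min="$1" interval="$2"
  while true; do
    if ! hadoop dfsadmin -report 2>/dev/null | enough_datanodes "$min"; then
      echo "$(date): fewer than $min live DataNodes" >&2
    fi
    sleep "$interval"
  done
}
```

For example, `watch_cluster 3 60` could run in a background shell on the master, while a startup script loops on `hadoop dfsadmin -report | enough_datanodes 3` until it succeeds before submitting test jobs.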