Big Data in the Cloud: An example using Ansible, R, RHadoop, and AppScale to deploy a big data environment on AWS/Eucalyptus

1. Big Data in the Clouds: An example using Ansible, R, RHadoop, and AppScale to deploy a big data environment on AWS/Eucalyptus
2. Big Data Environment
   ● Why? Why R? Why AppScale? Why AWS/Eucalyptus?
   ● Environments that can process “big data” are in high demand
   ● Flexibility in deploying big data environments: AWS has Elastic MapReduce; what does Eucalyptus have?
3. Goals
   ● Deploy an open source big data environment on IaaS
   ● Use the same deployment method on both public and private IaaS (hybrid?)
4. The Architecture
5. Ansible
   ● http://www.ansibleworks.com/
   ● Open source configuration management using SSH
   ● Flexible, powerful, efficient, secure
   ● http://ansible.cc/docs/
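Because Ansible reaches its targets over plain SSH, the only thing it needs to know about the AppScale nodes is where they are. A minimal sketch, assuming an INI-style inventory: the group name `appscale_head`, the example IP address, and the playbook filename `site.yml` are all hypothetical, not taken from the talk's playbook.

```python
"""Sketch: write a minimal Ansible INI inventory and build the playbook command.

Hypothetical values: group "appscale_head", IP 203.0.113.10, playbook "site.yml".
"""
import tempfile
from pathlib import Path


def build_inventory(group: str, hosts: list) -> str:
    """Return INI inventory text: a [group] header followed by one host per line."""
    lines = [f"[{group}]"] + list(hosts)
    return "\n".join(lines) + "\n"


def playbook_command(inventory_path: str, playbook: str) -> list:
    """Build the ansible-playbook argv; Ansible then connects to each host over SSH."""
    return ["ansible-playbook", "-i", inventory_path, playbook]


if __name__ == "__main__":
    text = build_inventory("appscale_head", ["203.0.113.10"])  # hypothetical head node IP
    inventory = Path(tempfile.gettempdir()) / "hosts"
    inventory.write_text(text)
    print(text, end="")
    print(" ".join(playbook_command(str(inventory), "site.yml")))  # hypothetical playbook
```

Nothing is installed on the targets beforehand; the inventory plus SSH access is the whole contract, which is why the same playbook can point at AWS or Eucalyptus instances.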
6. R and RHadoop
   ● http://www.r-project.org/
      ● Open source statistics software; very flexible and powerful
   ● http://www.revolutionanalytics.com/
      ● Provides enterprise analytics software using R
   ● https://github.com/RevolutionAnalytics/RHadoop/wiki
7. AppScale
   ● http://www.appscale.com
   ● A PaaS that implements the Google App Engine APIs on various public/private IaaS and virtualized environments
   ● http://www.slideshare.net/shatteredNirvana/intro-to-app-engine-and-appscale
   ● Ships with Cloudera's Hadoop distribution as the back end for its implementation of the Google App Engine MapReduce API
8. AWS EC2/Eucalyptus
   ● http://aws.amazon.com
   ● A cloud API that has become a de facto standard
   ● http://www.eucalyptus.com
   ● Closely follows the AWS APIs for EC2, S3, and IAM (ELB, CloudWatch, and Auto Scaling coming soon)
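Since Eucalyptus follows the EC2 API, a deployment script can stay identical across both clouds and switch targets only through the `EC2_URL` endpoint. A minimal sketch of that idea, assuming a simple heuristic (hosts under `amazonaws.com` are AWS; anything else is treated as a private Eucalyptus front end, such as one listening on port 8773); the example addresses are hypothetical.

```python
"""Sketch: one code path for both clouds, selected only by the EC2 endpoint URL."""
import os
from urllib.parse import urlparse

DEFAULT_PORTS = {"http": 80, "https": 443}


def classify_endpoint(ec2_url: str) -> str:
    """Label an EC2 endpoint as 'aws' or 'eucalyptus' (assumed heuristic)."""
    host = (urlparse(ec2_url).hostname or "").lower()
    if host == "amazonaws.com" or host.endswith(".amazonaws.com"):
        return "aws"
    # Assumption: any non-AWS endpoint is a private Eucalyptus cloud.
    return "eucalyptus"


def endpoint_settings(ec2_url: str) -> dict:
    """Split the URL into the pieces an EC2 client library typically needs."""
    parsed = urlparse(ec2_url)
    return {
        "cloud": classify_endpoint(ec2_url),
        "host": parsed.hostname,
        "port": parsed.port or DEFAULT_PORTS.get(parsed.scheme),
        "path": parsed.path or "/",
        "secure": parsed.scheme == "https",
    }


if __name__ == "__main__":
    # Falls back to a hypothetical Eucalyptus front end if EC2_URL is unset.
    url = os.environ.get("EC2_URL", "http://192.0.2.5:8773/services/Eucalyptus")
    print(endpoint_settings(url))
```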
9. Deployment
10. AWS/Eucalyptus
   ● Account/user credentials
      ● EC2_ACCESS_KEY
      ● EC2_SECRET_KEY
      ● EC2_URL
   ● An IAM policy granting the EC2 permissions needed to launch instances, create security groups, authorize ports, and manage images (bundle, upload, and register)
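Before running anything, it helps to confirm the three credential variables are set and to have an IAM policy covering the listed operations. A minimal sketch, assuming the action list below (it is an illustration, not the policy used in the talk): image upload is assumed to write bundles to S3 (Walrus on Eucalyptus), hence the two S3 actions, and the key-pair actions are included on the assumption that the tools create SSH key pairs. The `"2012-10-17"` version string is AWS's policy-language version; Eucalyptus may expect a different value.

```python
"""Sketch: check EC2 credentials in the environment and emit an IAM policy.

The action list is an assumption for illustration; trim or extend it as needed.
"""
import json
import os
import sys

REQUIRED_VARS = ("EC2_ACCESS_KEY", "EC2_SECRET_KEY", "EC2_URL")

# Assumed actions: instances, key pairs, security groups and port rules,
# image registration, plus S3 writes for uploaded image bundles.
ASSUMED_ACTIONS = (
    "ec2:RunInstances",
    "ec2:DescribeInstances",
    "ec2:TerminateInstances",
    "ec2:CreateKeyPair",
    "ec2:DescribeKeyPairs",
    "ec2:CreateSecurityGroup",
    "ec2:DescribeSecurityGroups",
    "ec2:AuthorizeSecurityGroupIngress",
    "ec2:RegisterImage",
    "ec2:DescribeImages",
    "s3:CreateBucket",
    "s3:PutObject",
)


def missing_credentials(env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def build_policy(actions=ASSUMED_ACTIONS) -> dict:
    """Build a single-statement Allow policy in the IAM JSON policy grammar."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": "*"}
        ],
    }


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("missing: " + ", ".join(missing), file=sys.stderr)
    print(json.dumps(build_policy(), indent=2))
```

`"Resource": "*"` keeps the sketch short; a production policy would normally scope resources more tightly.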
11. AppScale
   ● Pre-built AppScale images
      ● AWS: ami-4e472227
      ● Eucalyptus: AppScale image found at http://emis-catalog.s3.amazonaws.com/index.html
   ● appscale-tools: https://github.com/AppScale/appscale-tools
      ● appscale init cloud
      ● Edit the AppScalefile
      ● appscale up
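The three appscale-tools steps must run in order, with a manual edit of the generated AppScalefile in between. A minimal sketch that wraps them, assuming the `appscale` CLI is on `PATH` and writes the AppScalefile into the working directory; the `edit_appscalefile` hook and the dry-run default are hypothetical conveniences, not part of appscale-tools.

```python
"""Sketch: run `appscale init cloud`, pause for AppScalefile edits, then `appscale up`."""
import subprocess
from pathlib import Path
from typing import Callable, List, Optional

INIT_CMD = ["appscale", "init", "cloud"]
UP_CMD = ["appscale", "up"]


def _run(cmd: List[str], workdir: str, dry_run: bool, log: List[str]) -> None:
    """Record the command and, unless dry_run is set, execute it in workdir."""
    log.append(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True, cwd=workdir)


def deploy(
    workdir: str = ".",
    dry_run: bool = True,
    edit_appscalefile: Optional[Callable[[Path], None]] = None,
) -> List[str]:
    """Return the ordered list of steps taken (or that would be taken)."""
    log: List[str] = []
    _run(INIT_CMD, workdir, dry_run, log)
    appscalefile = Path(workdir) / "AppScalefile"
    if edit_appscalefile is not None:
        # Hypothetical hook: fill in image ID, key name, etc. programmatically.
        edit_appscalefile(appscalefile)
        log.append(f"edited {appscalefile.name}")
    elif not dry_run:
        input(f"Edit {appscalefile} for your cloud, then press Enter... ")
    _run(UP_CMD, workdir, dry_run, log)
    return log


if __name__ == "__main__":
    for step in deploy():  # dry run: prints the plan without calling appscale
        print(step)
```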
12. Ansible, R, RHadoop
   ● Use git to grab the Ansible playbook: https://github.com/hspencer77/ansible-r-appscale-playbook
   ● The playbook installs R and grabs rhdfs and rmr2 from RHadoop
      ● https://github.com/downloads/RevolutionAnalytics/RHadoop/rhdfs_1.0.5.tar.gz
      ● https://github.com/downloads/RevolutionAnalytics/RHadoop/rmr2_2.0.2.tar.gz
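For readers who want to see what the playbook's package step amounts to, here is a sketch that builds, but does not execute, a roughly equivalent manual command plan from the package names and versions on the slide. The `curl -LO` download and `R CMD INSTALL` install are assumptions about how one might do this by hand, not a transcription of the playbook; the packages' own R dependencies must already be present, and `github.com/downloads` links of this form may no longer resolve.

```python
"""Sketch: derive the RHadoop tarball URLs from the slide and plan a manual install.

Assumed commands: curl -LO for downloads, R CMD INSTALL for installation.
"""
PLAYBOOK_REPO = "https://github.com/hspencer77/ansible-r-appscale-playbook"
RHADOOP_BASE = "https://github.com/downloads/RevolutionAnalytics/RHadoop"
PACKAGES = {"rhdfs": "1.0.5", "rmr2": "2.0.2"}  # versions listed on the slide


def tarball_url(package: str, version: str) -> str:
    """Build a URL following the <package>_<version>.tar.gz pattern on the slide."""
    return f"{RHADOOP_BASE}/{package}_{version}.tar.gz"


def install_plan(packages=PACKAGES) -> list:
    """Return argv lists: clone the playbook, then fetch and install each package."""
    plan = [["git", "clone", PLAYBOOK_REPO]]
    for package, version in packages.items():
        url = tarball_url(package, version)
        tarball = url.rsplit("/", 1)[1]
        plan.append(["curl", "-LO", url])
        plan.append(["R", "CMD", "INSTALL", tarball])
    return plan


if __name__ == "__main__":
    for argv in install_plan():
        print(" ".join(argv))
```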
13. Test: wordcount.R
   ● Test the deployment using wordcount.R, a word-count program written in R
   ● SSH into the head node and extract wordcount.R: tar zxf rmr2_2.0.2.tar.gz rmr2/tests/wordcount.R
   ● Run it: Rscript rmr2/tests/wordcount.R
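wordcount.R runs through rmr2 on Hadoop. To know what result to expect, here is a pure-Python sketch of the classic map/shuffle/reduce word count that a test like this is assumed to perform: split each line on whitespace, emit (word, 1), group by word, and sum. It does not use rmr2 or Hadoop, and its tokenization may differ from the R script's.

```python
"""Sketch: local map/shuffle/reduce word count, for comparing against wordcount.R output."""
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list:
    """Emit a (word, 1) pair for every whitespace-separated word in the line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs) -> dict:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> tuple:
    """Sum the counts collected for one word."""
    return key, sum(values)


def wordcount(lines) -> dict:
    """Run map, shuffle, and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(wordcount(["big data in the cloud", "the cloud"]))
```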
14. Results
15. Contact Info
   ● AppScale: hannah@appscale.com
   ● Eucalyptus: harold.spencer.jr@eucalyptus.com
16. Questions? Demo