Big Data in the Cloud: An example using Ansible, R, RHadoop, and AppScale to deploy a big data environment on AWS/Eucalyptus



Published in Technology

  • 1. Big Data in the Clouds: An example using Ansible, R, RHadoop, and AppScale to deploy a big data environment on AWS/Eucalyptus
  • 2. Big Data Environment
      ● Why? Why R? Why AppScale? Why AWS/Eucalyptus?
      ● Environments that need to process “big data” are in high demand
      ● Flexibility in deploying big data environments: AWS has Elastic MapReduce; Eucalyptus has ?
  • 3. Goals
      ● Deploy an open source big data environment on IaaS
      ● Use the same deployment method on both public and private IaaS (hybrid?)
  • 4. The Architecture
  • 5. Ansible
      ● Open source configuration management using SSH
      ● Flexible, powerful, efficient, secure
  • 6. R and RHadoop
      ● R: open source statistics software; very flexible and powerful
      ● RHadoop: from Revolution Analytics, which provides enterprise analytics software using R
  • 7. AppScale
      ● PaaS that implements Google App Engine APIs on different public/private IaaS and virtual environments
      ● Ships with Cloudera for back-end support of the Google App Engine MapReduce API implementation
  • 8. AWS EC2/Eucalyptus
      ● AWS EC2: cloud API that has pretty much become a standard
      ● Eucalyptus: closely follows AWS APIs for EC2, S3, IAM (soon ELB, CloudWatch, and AutoScaling)
  • 9. Deployment
  • 10. AWS/Eucalyptus
      ● Account/User Credentials
          ● EC2_ACCESS_KEY
          ● EC2_SECRET_KEY
          ● EC2_URL
      ● IAM policy granting the EC2 permissions to launch instances, create security groups, authorize ports, and manage images (bundle, upload, and register)
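A minimal sketch of a pre-flight check for the three credential variables named on this slide. The function name `check_ec2_env` is hypothetical and is not part of any AWS, Eucalyptus, or AppScale tool; it only reports which variables are unset and says nothing about whether the values are valid.

```shell
# check_ec2_env is a hypothetical helper (an assumption, not from the slides).
# It returns 0 when all three variables are set and non-empty, 1 otherwise.
check_ec2_env() {
    missing=0
    for name in EC2_ACCESS_KEY EC2_SECRET_KEY EC2_URL; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Typical use before launching anything:
#   check_ec2_env && echo "credentials look complete"
```

Running the check first gives a clear message about an unset variable instead of a less obvious failure later in the deployment.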
  • 11. AppScale
      ● Pre-built AppScale Images
          ● AWS - ami-4e472227
          ● Eucalyptus - AppScale image found @ http://emis-catalog.s3.amazonaws.com/index.html
      ● appscale-tools -
          ● appscale init cloud
          ● edit AppScalefile
          ● appscale up
  • 12. Ansible, R, RHadoop
      ● Use git to grab Ansible playbook -
      ● Playbook installs R, and grabs rhdfs and rmr2 from RHadoop
          ● https://github.com/downloads/RevolutionAnalytics/RHadoop/rhdfs_1.0.5.tar.gz
          ● https://github.com/downloads/RevolutionAnalytics/RHadoop/rmr2_2.0.2.tar.gz
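The playbook itself is not reproduced in the slides, so the following is only a rough shell sketch of the manual equivalent of its download-and-install step. The helper names (`rhadoop_tarball`, `rhadoop_url`, `install_rhadoop_pkg`) are hypothetical. It assumes the download URLs from the slide still resolve, and that R, the packages' R dependencies, and Hadoop are already present on the node.

```shell
# Base of the two download URLs listed on the slide.
RHADOOP_BASE="https://github.com/downloads/RevolutionAnalytics/RHadoop"

# Hypothetical helper: build the tarball file name.
# $1 = package name (rhdfs or rmr2), $2 = version
rhadoop_tarball() {
    echo "${1}_${2}.tar.gz"
}

# Hypothetical helper: build the full download URL for a package.
rhadoop_url() {
    echo "${RHADOOP_BASE}/$(rhadoop_tarball "$1" "$2")"
}

# Hypothetical helper: download one tarball and install it as an R package.
# Not called here, because it needs network access and a working R install.
install_rhadoop_pkg() {
    curl -L -O "$(rhadoop_url "$1" "$2")" &&
        R CMD INSTALL "$(rhadoop_tarball "$1" "$2")"
}

# Typical use on each node:
#   install_rhadoop_pkg rhdfs 1.0.5
#   install_rhadoop_pkg rmr2 2.0.2
```

Keeping the version as a parameter means a newer release only changes the two calls at the bottom, not the helpers.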
  • 13. Test - Wordcount.R
      ● Test deployment using wordcount program written in R - wordcount.R
      ● SSH into head node, pull out wordcount.R file - tar zxf rmr2_2.0.2.tar.gz rmr2/tests/wordcount.R
      ● Execute it - Rscript rmr2/tests/wordcount.R
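The two commands on this slide can be wrapped as a small smoke test, sketched below. The function names `extract_wordcount` and `run_wordcount` are hypothetical, and the sketch assumes the tarball sits in the current directory on the head node.

```shell
# extract_wordcount and run_wordcount are hypothetical helper names.

# Extract only the wordcount test script from the rmr2 source tarball;
# naming a member after the archive tells tar to skip everything else.
# $1 = path to the tarball (rmr2_2.0.2.tar.gz on the slide)
extract_wordcount() {
    tar zxf "$1" rmr2/tests/wordcount.R
}

# Run the extracted script; a non-zero exit status means the job failed.
# Not called here, because it needs R, rmr2, and a running Hadoop cluster.
run_wordcount() {
    Rscript rmr2/tests/wordcount.R
}

# Typical use on the head node:
#   extract_wordcount rmr2_2.0.2.tar.gz && run_wordcount
```

Chaining the two steps with `&&` stops the run early if the script is missing from the archive.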
  • 14. Results
  • 15. Contact Info
      ● AppScale -
      ● Eucalyptus - harold.spencer.
  • 16. Questions? Demo