Big Data in the Cloud: An example using Ansible, R, RHadoop, and AppScale to deploy a big data environment on AWS/Eucalyptus

Published in: Technology
Transcript

  • 1. Big Data in the Clouds: An example using Ansible, R, RHadoop, and AppScale to deploy a big data environment on AWS/Eucalyptus
  • 2. Big Data Environment
      ● Why? Why R? Why AppScale? Why AWS/Eucalyptus?
      ● Environments needing to process “big data” are in high demand
      ● Flexibility in deploying big data environments - AWS has Elastic MapReduce; Eucalyptus has ?
  • 3. Goals
      ● Deploy open source big data environment on IaaS
      ● Same deployment method can be used on both public and private IaaS (hybrid?)
  • 4. The Architecture
  • 5. Ansible
      ● http://www.ansibleworks.com/
      ● Open source configuration management using SSH
      ● Flexible, powerful, efficient, secure
      ● http://ansible.cc/docs/
  • 6. R and RHadoop
      ● http://www.r-project.org/
          ● Open source statistics software; very flexible and powerful
      ● http://www.revolutionanalytics.com/
          ● Provides enterprise analytics software using R
      ● https://github.com/RevolutionAnalytics/RHadoop/wiki
  • 7. AppScale
      ● http://www.appscale.com
      ● PaaS that implements Google App Engine APIs on different public/private IaaS and virtual environments
      ● http://www.slideshare.net/shatteredNirvana/intro-to-app-engine-and-appscale
      ● Ships with Cloudera for back-end support of Google App Engine MapReduce API implementation
  • 8. AWS EC2/Eucalyptus
      ● http://aws.amazon.com
      ● Cloud API that has pretty much become a standard
      ● http://www.eucalyptus.com
      ● Closely follows AWS APIs for EC2, S3, IAM (soon ELB, CloudWatch, and AutoScaling)
  • 9. Deployment
  • 10. AWS/Eucalyptus
      ● Account/User Credentials
          ● EC2_ACCESS_KEY
          ● EC2_SECRET_KEY
          ● EC2_URL
      ● IAM policy granting the EC2 permissions to launch instances, create security groups, authorize ports, and manage images (bundle, upload, and register)
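
The three credential variables on slide 10 have to be present in the environment before any of the deployment tooling runs. A minimal pre-flight check might look like the sketch below. The function name `check_ec2_env` is hypothetical and is not part of any tool mentioned in the slides.

```shell
#!/bin/sh
# Illustrative pre-flight check (not from the slides): confirm that the
# three credential variables named on slide 10 are set and non-empty.
check_ec2_env() {
    missing=0
    for name in EC2_ACCESS_KEY EC2_SECRET_KEY EC2_URL; do
        # Indirect lookup: read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use (placeholder values, not real credentials):
#   export EC2_ACCESS_KEY='<access key>'
#   export EC2_SECRET_KEY='<secret key>'
#   export EC2_URL='<EC2 or Eucalyptus endpoint URL>'
#   check_ec2_env && echo "credentials present"
```

The check only verifies that the variables exist. It does not validate them against AWS or Eucalyptus.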
  • 11. AppScale
      ● Pre-built AppScale images
          ● AWS - ami-4e472227
          ● Eucalyptus - AppScale image found @ http://emis-catalog.s3.amazonaws.com/index.html
      ● appscale-tools - https://github.com/AppScale/appscale-tools
          ● appscale init cloud
          ● edit AppScaleFile
          ● appscale up
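
The three appscale-tools steps on slide 11 can be strung together roughly as follows. This is a sketch under assumptions: the wrapper `deploy_appscale` and its working-directory argument are hypothetical, and the file name `AppScalefile` is assumed to be what the tools generate (the slide spells it AppScaleFile), so adjust it if your version differs.

```shell
#!/bin/sh
# Sketch of the slide's three steps, run in a subshell so the caller's
# working directory is left unchanged.
deploy_appscale() (
    # Hypothetical argument: directory that will hold the AppScalefile.
    cd "${1:-.}" || exit 1

    # Step 1: generate a template configuration for a cloud deployment.
    appscale init cloud || exit 1

    # Step 2: edit the generated file by hand (image ID, instance type,
    # and so on). Assumes $EDITOR is a single command name.
    "${EDITOR:-vi}" AppScalefile || exit 1

    # Step 3: start the deployment described by the edited file.
    appscale up
)

# Example use:
#   deploy_appscale ~/appscale-deployment
```

The wrapper stops at the first failing step, so a broken `init` never reaches `appscale up`.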
  • 12. Ansible, R, RHadoop
      ● Use git to grab Ansible playbook - https://github.com/hspencer77/ansible-r-appscale-playbook
      ● Playbook installs R, and grabs rhdfs and rmr2 from RHadoop
          ● https://github.com/downloads/RevolutionAnalytics/RHadoop/rhdfs_1.0.5.tar.gz
          ● https://github.com/downloads/RevolutionAnalytics/RHadoop/rmr2_2.0.2.tar.gz
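
Fetching and running the playbook from slide 12 could be scripted along these lines. The slide gives only the repository URL, so the inventory path and the playbook file name are left as arguments here. Both are assumptions to fill in from the repository itself, and `fetch_and_run_playbook` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: clone the playbook repository named on slide 12 and run it.
# Runs in a subshell so the caller's working directory is unchanged.
fetch_and_run_playbook() (
    # Inventory file listing the AppScale nodes. Use an absolute path,
    # because this function changes directory before running Ansible.
    inventory=$1
    # Playbook file inside the cloned repository (not named on the slide).
    playbook=$2

    repo=https://github.com/hspencer77/ansible-r-appscale-playbook
    git clone "$repo" ansible-r-appscale-playbook || exit 1
    cd ansible-r-appscale-playbook || exit 1

    # -i selects the inventory; the playbook installs R, rhdfs, and rmr2.
    ansible-playbook -i "$inventory" "$playbook"
)

# Example use (both arguments are placeholders):
#   fetch_and_run_playbook /path/to/inventory <playbook file>
```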
  • 13. Test - Wordcount.R
      ● Test deployment using wordcount program written in R - wordcount.R
      ● SSH into head node, pull out wordcount.R file - tar zxf rmr2_2.0.2.tar.gz rmr2/tests/wordcount.R
      ● Execute it - Rscript rmr2/tests/wordcount.R
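
The two commands on slide 13 can be wrapped as small functions to run on the head node after logging in over SSH. This sketch assumes the rmr2 tarball sits in the current directory, which the slide implies but does not state. The function names are hypothetical.

```shell
#!/bin/sh
# Extract only the wordcount test from the rmr2 source tarball.
# Naming the member after the archive makes tar skip everything else.
extract_wordcount() {
    tar zxf "${1:-rmr2_2.0.2.tar.gz}" rmr2/tests/wordcount.R
}

# Run the extracted script with Rscript. A zero exit status suggests that
# R, rmr2, and the Hadoop back end are working together.
run_wordcount() {
    Rscript rmr2/tests/wordcount.R
}

# Example use on the head node:
#   extract_wordcount && run_wordcount
```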
  • 14. Results
  • 15. Contact Info
      ● AppScale - hannah@appscale.com
      ● Eucalyptus - harold.spencer.jr@eucalyptus.com
  • 16. Questions? Demo
