Big Data in the Clouds
An example using Ansible, R, RHadoop, and AppScale to deploy a big data environment on AWS/Eucalyptus
Big Data Environment
● Why? Why R? Why AppScale? Why AWS/Eucalyptus?
● Environments that need to process “big data” are in high demand
● Flexibility in deploying big data environments - AWS has Elastic MapReduce; Eucalyptus has ?
Goals
● Deploy an open source big data environment on IaaS
● The same deployment method can be used on both public and private IaaS (hybrid?)
R and RHadoop
● http://www.r-project.org/
  ● Open source statistics software; very flexible and powerful
● http://www.revolutionanalytics.com/
  ● Provides enterprise analytics software using R
● https://github.com/RevolutionAnalytics/RHadoop/wiki
AppScale
● http://www.appscale.com
● PaaS that implements Google App Engine APIs on different public/private IaaS and virtual environments
● http://www.slideshare.net/shatteredNirvana/intro-to-app-engine-and-appscale
● Ships with Cloudera for back-end support of the Google App Engine MapReduce API implementation
AWS EC2/Eucalyptus
● http://aws.amazon.com
● Cloud API that has pretty much become a standard
● http://www.eucalyptus.com
● Closely follows AWS APIs for EC2, S3, IAM (soon ELB, CloudWatch, and Auto Scaling)
Ansible, R, RHadoop
● Use git to grab the Ansible playbook - https://github.com/hspencer77/ansible-r-appscale-playbook
● Playbook installs R and grabs rhdfs and rmr2 from RHadoop
  ● https://github.com/downloads/RevolutionAnalytics/RHadoop/rhdfs_1.0.5.tar.gz
  ● https://github.com/downloads/RevolutionAnalytics/RHadoop/rmr2_2.0.2.tar.gz
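A minimal shell sketch of the fetch step on this slide. The repository and tarball URLs are the ones given above (with extraction spaces removed); whether they still resolve is not verified here. The helper names `rhadoop_url`, `fetch_playbook`, and `fetch_rhadoop` are hypothetical, and the use of `curl` is an assumption. The `ansible-playbook` invocation is left out because the slide does not name the playbook file or the inventory.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, introduced for illustration.

# Build the download URL for an RHadoop package tarball.
# $1 = package name (rhdfs or rmr2), $2 = version
rhadoop_url() {
    printf 'https://github.com/downloads/RevolutionAnalytics/RHadoop/%s_%s.tar.gz\n' "$1" "$2"
}

# Clone the playbook repository into the current directory.
fetch_playbook() {
    git clone https://github.com/hspencer77/ansible-r-appscale-playbook
}

# Download both tarballs by hand. The playbook is described as doing this
# itself, so this is only useful for inspecting the packages (assumes curl).
fetch_rhadoop() {
    curl -fLO "$(rhadoop_url rhdfs 1.0.5)"
    curl -fLO "$(rhadoop_url rmr2 2.0.2)"
}
```

Calling `fetch_playbook` and then `fetch_rhadoop` leaves the cloned playbook directory and both tarballs in the working directory.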
Test - Wordcount.R
● Test the deployment using a wordcount program written in R - wordcount.R
● SSH into the head node and pull out the wordcount.R file - tar zxf rmr2_2.0.2.tar.gz rmr2/tests/wordcount.R
● Execute it - Rscript rmr2/tests/wordcount.R
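The two commands on this slide can be wrapped into a small script to run on the head node. This is a sketch under assumptions: the helper names `extract_wordcount` and `run_wordcount` are hypothetical, and the tarball is assumed to already be on the node.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, introduced for illustration.

# Extract just the wordcount test from the rmr2 tarball.
# $1 = path to rmr2_2.0.2.tar.gz, $2 = destination directory (default: .)
extract_wordcount() {
    tarball=$1
    dest=${2:-.}
    tar zxf "$tarball" -C "$dest" rmr2/tests/wordcount.R
}

# Run the extracted test with Rscript, failing early if R is not installed.
# $1 = directory the tarball was extracted into (default: .)
run_wordcount() {
    dest=${1:-.}
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found; check that the playbook installed R" >&2
        return 1
    fi
    Rscript "$dest/rmr2/tests/wordcount.R"
}
```

For example, `extract_wordcount rmr2_2.0.2.tar.gz && run_wordcount` reproduces the two steps above from the directory holding the tarball.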