Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2

Part 2 of a 3-part series on the infrastructure side of getting started with Big Data. This presentation is aimed at anyone with an occasional need for more computing power.

We walk through the mechanics of launching an instance on Amazon's EC2, installing some software (R and RStudio), and making sure it all works.

Presented at the Boston Predictive Analytics Big Data Workshop, March 10, 2012.

Transcript

  • 1. Big Data Step-by-Step
      Boston Predictive Analytics Big Data Workshop
      Microsoft New England Research & Development Center, Cambridge, MA
      Saturday, March 10, 2012
      by Jeffrey Breen, President and Co-Founder, Atmosphere Research Group
      email: jeffrey@atmosgrp.com   Twitter: @JeffreyBreen
      http://atms.gr/bigdata0310
  • 2. Big Data Infrastructure Part 2: Running R + RStudio on Amazon EC2
      Just need a little more RAM
      Code & more on GitHub: https://github.com/jeffreybreen/tutorial-201203-big-data
  • 3. Overview
      • Sometimes you just need a little more RAM, CPU, or disk space than you have
      • Let’s try launching an instance on Amazon EC2 and configuring it to do some work
      • We’ll install R and RStudio and call it a day
  • 4. Some details we’ll skip
      • Signing up (it’s not that hard): http://aws.amazon.com/ec2/
      • Pricing (it keeps dropping): http://aws.amazon.com/ec2/pricing/
      • The alphabet soup of services (we care about EC2 for computing and S3 for storage)
  • 5. Just look for the biggest button on the page...
  • 6. Select an Amazon Machine Image
      ami-7385461a is a good, recent 64-bit CentOS image published by RightScale
  • 7. Only use EBS images
      • Instance-storage machines lose their data upon shutdown (termination)
      • EBS instances can be stopped and restarted, or terminated when you’re done forever
  • 8. Pick a size
      See http://aws.amazon.com/ec2/instance-types/
      Already out of date! Amazon introduced a new “m1.medium” instance type this week.
  • 9. Avoid Premature Termination
      Set Termination Protection + Shutdown Behavior
  • 10. Name your instance
  • 11. Create a key pair
      Don’t forget to download it (and keep it safe!)
  • 12. Create a Security Group
      Allow all TCP, UDP, and ICMP traffic from your IP address
  • 13. Don’t know your IP address?
      Don’t ask me. Ask Google! (Simply append “/32” when entering it into firewall rules.)
  • 14. 3... 2... 1...
  • 15. State = running
      Up and running at the listed domain name
  • 16. Time to get all command line
      • You’ll need an ssh client and the key pair we generated in order to connect to your instance
      • We’ll use the Cloudera VM to control versions, options, etc.
      • ssh won’t use your key pair if its file permissions are too lax:
          $ chmod og-rwx rstudio-ec2.pem
      • Log in as root to your domain name (from the previous slide):
          $ ssh -i rstudio-ec2.pem root@YOURDOMAINHERE.amazonaws.com
  • 17. Install R and RStudio
      • Create a user login for yourself (RStudio needs this):
          # useradd jbreen
          # passwd jbreen
      • EPEL is already installed, so R is easy:
          # yum -y install R
      • Follow RStudio’s download instructions at http://www.rstudio.org/download/server
          # wget http://download2.rstudio.org/rstudio-server-0.95.262-x86_64.rpm
          # rpm -Uvh rstudio-server-0.95.262-x86_64.rpm
      • Browse to port 8787 and log in with the user name and password you just created, e.g., http://ec2-107-22-109-130.compute-1.amazonaws.com:8787/
  • 18. Success!
  • 19. The meter’s running
      • Amazon charges by the hour (or fraction thereof), so when you’re done, you should probably shut down
      • via the command line:
          $ sudo shutdown -h now
      • or with the “Stop” Instance Action in the AWS Management Console
      • (use “Terminate” if you never want to use it again)
  • 20. Next up: How to launch Hadoop clusters in the cloud without really trying
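
Slide 7's EBS-only rule can also be applied in code. EC2 reports each image's `RootDeviceType` as either `"ebs"` or `"instance-store"` in its DescribeImages response, so filtering is a one-liner. This is a minimal sketch, assuming the image list has already been fetched (for example from boto3's `describe_images`, which the talk does not use); the function name and image IDs are hypothetical.

```python
from typing import Any, Dict, List


def ebs_backed_image_ids(images: List[Dict[str, Any]]) -> List[str]:
    """Return the ImageIds of EBS-backed AMIs, skipping instance-store ones.

    Hypothetical helper. `images` is assumed to be shaped like the "Images"
    list of an EC2 DescribeImages response, where RootDeviceType is either
    "ebs" or "instance-store".
    """
    return [img["ImageId"] for img in images if img.get("RootDeviceType") == "ebs"]


# Illustrative input with hypothetical image IDs.
sample_images = [
    {"ImageId": "ami-11111111", "RootDeviceType": "ebs"},
    {"ImageId": "ami-22222222", "RootDeviceType": "instance-store"},
]
print(ebs_backed_image_ids(sample_images))  # ['ami-11111111']
```

DescribeImages also accepts a `root-device-type` filter, which does the same job on Amazon's side before the list ever reaches you.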
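
Slide 11 asks you to download the key pair and keep it safe. If you script the key's creation instead, you can write the file with owner-only permissions from the start, which also satisfies slide 16's permission check. This is a sketch assuming POSIX file permissions; `save_private_key` is a hypothetical helper, and the demo writes placeholder text, not a real key.

```python
import os
import stat
import tempfile


def save_private_key(path: str, key_material: str) -> None:
    """Write a private key to a new file that only its owner can read or write.

    Hypothetical helper. The file is created with mode 0o600; the umask can
    only remove bits, so group and others never get access. O_EXCL makes the
    call raise FileExistsError instead of overwriting an existing key.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key_material)


# Demo in a throwaway directory; the key text is a placeholder, not a real key.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "rstudio-ec2.pem")
    save_private_key(demo_path, "placeholder key text\n")
    mode = stat.S_IMODE(os.stat(demo_path).st_mode)
    print(f"{demo_path}: {oct(mode)}")
```

With boto3, the private key arrives in the `KeyMaterial` field of the `create_key_pair` response, so it could be passed straight to this helper.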
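
Slide 13's “/32” is CIDR notation for a block that contains exactly one address, so the firewall rule admits only your machine. This sketch uses Python's standard `ipaddress` module to validate an address and build that rule; the function name is hypothetical, and the sample address comes from the 203.0.113.0/24 documentation range.

```python
import ipaddress


def single_host_cidr(ip: str) -> str:
    """Return the one-address CIDR block ("/32") for an IPv4 address.

    Hypothetical helper. Raises ValueError if `ip` is not a valid IPv4 address.
    """
    addr = ipaddress.IPv4Address(ip.strip())
    return f"{addr}/32"


rule = single_host_cidr("203.0.113.7")
print(rule)                                      # 203.0.113.7/32
print(ipaddress.ip_network(rule).num_addresses)  # 1
```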
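
Slides 6 through 12 make their choices by clicking through the AWS Management Console. For comparison, here is a sketch of the same choices as keyword arguments for `run_instances` in boto3, Amazon's Python SDK (which the talk does not use). The instance type and security group ID are placeholders, the key name assumes the `rstudio-ec2` key pair from slide 16, and the 2012-era AMI ID from slide 6 may no longer be available. EBS backing (slide 7) is a property of the AMI rather than a launch parameter, so it does not appear here.

```python
from typing import Any, Dict


def rstudio_launch_params(
    image_id: str,
    instance_type: str,
    key_name: str,
    security_group_id: str,
    name: str,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EC2 run_instances (hypothetical helper)."""
    return {
        "ImageId": image_id,                          # slide 6: the AMI to boot
        "InstanceType": instance_type,                # slide 8: instance size
        "MinCount": 1,
        "MaxCount": 1,
        "KeyName": key_name,                          # slide 11: key pair for ssh
        "SecurityGroupIds": [security_group_id],      # slide 12: firewall rules
        "DisableApiTermination": True,                # slide 9: termination protection
        "InstanceInitiatedShutdownBehavior": "stop",  # slide 9: shutdown stops, not terminates
        "TagSpecifications": [{                       # slide 10: the console's "Name" tag
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": name}],
        }],
    }


params = rstudio_launch_params(
    image_id="ami-7385461a",                  # from slide 6; may no longer exist
    instance_type="m1.large",                 # placeholder choice
    key_name="rstudio-ec2",                   # assumed name, matching rstudio-ec2.pem
    security_group_id="sg-0123456789abcdef0", # hypothetical ID
    name="rstudio",                           # hypothetical tag value
)
print(params["InstanceInitiatedShutdownBehavior"])  # stop
```

Passing the result to `boto3.client("ec2").run_instances(**params)` would launch the instance. With termination protection on, you have to switch it off before “Terminate” will work.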
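
Slide 15's “State = running” and public domain name are also what a script would poll for. This sketch reads both from a response shaped like EC2's DescribeInstances output (for example the dictionary boto3's `describe_instances` returns); the function name and instance ID are hypothetical, and the domain name is the example from slide 17.

```python
from typing import Any, Dict, Optional


def public_dns_if_running(response: Dict[str, Any], instance_id: str) -> Optional[str]:
    """Return the instance's public DNS name once its state is "running", else None.

    Hypothetical helper; `response` is assumed to follow the DescribeInstances
    layout: Reservations -> Instances -> State/PublicDnsName.
    """
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            if inst.get("InstanceId") != instance_id:
                continue
            if inst.get("State", {}).get("Name") == "running":
                # An empty string means no public name has been assigned yet.
                return inst.get("PublicDnsName") or None
            return None
    return None


sample_response = {"Reservations": [{"Instances": [{
    "InstanceId": "i-0123456789abcdef0",
    "State": {"Name": "running"},
    "PublicDnsName": "ec2-107-22-109-130.compute-1.amazonaws.com",
}]}]}
print(public_dns_if_running(sample_response, "i-0123456789abcdef0"))
# ec2-107-22-109-130.compute-1.amazonaws.com
```

boto3 also ships a ready-made waiter, `get_waiter("instance_running")`, that polls until an instance reaches this state.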
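
Slide 16's `chmod og-rwx` removes read, write, and execute permission for group and others while leaving the owner's bits alone. For scripts that manage key files, the same check-and-fix can be written with Python's standard library. This is a sketch assuming POSIX permissions, with a hypothetical function name.

```python
import os
import stat
import tempfile

GROUP_OTHER_BITS = stat.S_IRWXG | stat.S_IRWXO  # 0o077


def chmod_og_rwx(path: str) -> int:
    """Clear group/other permission bits, like `chmod og-rwx path`; return the new mode.

    Hypothetical helper; owner bits are left untouched.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & GROUP_OTHER_BITS:
        os.chmod(path, mode & ~GROUP_OTHER_BITS)
    return stat.S_IMODE(os.stat(path).st_mode)


# Demo on a throwaway file that starts out world-readable (0o644).
with tempfile.TemporaryDirectory() as tmp:
    demo_key = os.path.join(tmp, "rstudio-ec2.pem")
    open(demo_key, "w").close()
    os.chmod(demo_key, 0o644)
    print(oct(chmod_og_rwx(demo_key)))  # 0o600
```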
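
Slide 17 ends by browsing to port 8787, RStudio Server's default. If the page does not load, it helps to know whether anything is listening there at all (and whether the slide 12 security group lets you reach it). This is a sketch using only the standard `socket` module; both helper names are hypothetical, and the host name is the example from slide 17.

```python
import socket


def rstudio_url(host: str, port: int = 8787) -> str:
    """Browser URL for RStudio Server on `host` (hypothetical helper)."""
    return f"http://{host}:{port}/"


def port_is_open(host: str, port: int = 8787, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper. Refused connections, timeouts, and DNS failures are
    all OSError subclasses, so any of them yields False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(rstudio_url("ec2-107-22-109-130.compute-1.amazonaws.com"))
# http://ec2-107-22-109-130.compute-1.amazonaws.com:8787/
```

A `False` from `port_is_open` points at either RStudio Server not running or the firewall blocking you; a `True` followed by a failed login points back at the user account created on the same slide.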
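
Slide 19's point about the running meter is easy to quantify. Under the by-the-hour model the slide describes, any partial hour is billed as a full one. This is a minimal sketch of that arithmetic; the function names are hypothetical, the 32-cent hourly rate is a made-up placeholder rather than an EC2 price, and Amazon has since moved many instance types to finer-grained billing, so check the pricing page before relying on this model.

```python
import math


def billed_hours(running_hours: float) -> int:
    """Hours billed when any partial hour counts as a full one (hypothetical helper)."""
    return max(0, math.ceil(running_hours))


def bill_cents(running_hours: float, cents_per_hour: int) -> int:
    """Total charge in whole cents, avoiding floating-point money arithmetic."""
    return billed_hours(running_hours) * cents_per_hour


# Hypothetical rate of 32 cents/hour, for illustration only.
print(billed_hours(0.25))    # 1
print(billed_hours(72.5))    # 73  (left running over a long weekend)
print(bill_cents(72.5, 32))  # 2336
```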
