SlideShare a Scribd company logo
1 of 20
Download to read offline
Big Data Step-by-Step
                              Boston Predictive Analytics
                                 Big Data Workshop
                                Microsoft New England Research &
                               Development Center, Cambridge, MA
                                    Saturday, March 10, 2012



                                                            by Jeffrey Breen

                                                        President and Co-Founder
         http://atms.gr/bigdata0310                   Atmosphere Research Group
                                                      email: jeffrey@atmosgrp.com
                                                             Twitter: @JeffreyBreen

Saturday, March 10, 2012
n ee d a
             just         AM
                  mo re R
           little
                   Big Data Infrastructure
                           Part 2: Running R + RStudio on Amazon EC2




    Code & more on github:
    https://github.com/jeffreybreen/tutorial-201203-big-data
Saturday, March 10, 2012
Overview

                    • Sometimes you just need a little more
                           RAM, CPU, or disk space than you have
                    • Let’s try launching an instance on Amazon
                           EC2 and configuring it to do some work
                    • We’ll install R and RStudio and call it a day


Saturday, March 10, 2012
Some details we’ll skip
                    • Signing up (it’s not that hard)
                           http://aws.amazon.com/ec2/

                    • Pricing (it keeps dropping)
                           http://aws.amazon.com/ec2/pricing/

                    • The alphabet soup of services (we care
                           about EC2 computing and S3 storage)



Saturday, March 10, 2012
Just look for biggest button on the page...




Saturday, March 10, 2012
Select an Amazon Machine Image
                ami-7385461a is a good, recent 64-bit CentOS
                image published by RightScale




Saturday, March 10, 2012
Only use EBS images
                • Instance-storage machines lose their data
                      upon shutdown (termination)
                • EBS instances can be stopped and restarted,
                      or terminated when you’re done forever




Saturday, March 10, 2012
Pick a size
                See http://aws.amazon.com/ec2/instance-types/




                                          Already out of date! Amazon introduced
                                          new “m1.medium” instance type this week.




Saturday, March 10, 2012
Avoid Premature Termination
                Set Termination Protection + Shutdown Behavior




Saturday, March 10, 2012
Name your instance




Saturday, March 10, 2012
Create a key pair
                Don’t forget to download it (and keep it safe!)




Saturday, March 10, 2012
Create a Security Group
                All TCP, UDP and ICMP from your IP address




Saturday, March 10, 2012
Don’t know your IP address?
                Don’t ask me. Ask Google!




                (simply append “/32” when entering into firewall rules)



Saturday, March 10, 2012
3... 2... 1...




Saturday, March 10, 2012
State = running
                Up and running at specified domain name




Saturday, March 10, 2012
Time to get all command line
                    • You’ll need an ssh client and the key pair we
                           generated in order to connect with your
                           instance
                    • We’ll use the Cloudera VM to control versions,
                           options, etc.
                    • ssh won’t use your key pair if its file permissions
                           are too lax
                           $ chmod og-rwx rstudio-ec2.pem


                    • Log in as root to your domain name
                           $ ssh -i rstudio-ec2.pem root@YOURDOMAINHERE.amazonaws.com
                                                        (from previous slide)



Saturday, March 10, 2012
Install R and RStudio
                • Create a user login for yourself (RStudio needs this)
                      # useradd jbreen
                      # passwd jbreen


                • EPEL is already installed, so R is easy
                      # yum -y install R


                • Follow RStudio’s download instructions
                      http://www.rstudio.org/download/server
                      # wget http://download2.rstudio.org/rstudio-server-0.95.262-x86_64.rpm

                      # rpm -Uvh rstudio-server-0.95.262-x86_64.rpm


                • Browse to port 8787 and use the login and password
                      e.g., http://ec2-107-22-109-130.compute-1.amazonaws.com:8787/



Saturday, March 10, 2012
Success!




Saturday, March 10, 2012
The meter’s running
                    • Amazon charges by the hour (or fraction
                           thereof). So when you’re done, you should
                           probably shutdown
                    • via command line
                           $ sudo shutdown -h now


                    • or with the “Stop” Instance Action in the
                           AWS Management Console
                    • (use “Terminate” if you never want to use it
                           again)


Saturday, March 10, 2012
Next up:
                           How to launch Hadoop
                            clusters in the cloud
                            without really trying



Saturday, March 10, 2012

More Related Content

What's hot

12 core technologies you should learn, love, and hate to be a 'real' technocrat
12 core technologies you should learn, love, and hate to be a 'real' technocrat12 core technologies you should learn, love, and hate to be a 'real' technocrat
12 core technologies you should learn, love, and hate to be a 'real' technocratJonathan Linowes
 
How to implement a gdpr solution in a cloudera architecture
How to implement a gdpr solution in a cloudera architectureHow to implement a gdpr solution in a cloudera architecture
How to implement a gdpr solution in a cloudera architectureTiago Simões
 
Automating CloudStack with Puppet - David Nalley
Automating CloudStack with Puppet - David NalleyAutomating CloudStack with Puppet - David Nalley
Automating CloudStack with Puppet - David NalleyPuppet
 
Globus toolkit4installationguide
Globus toolkit4installationguideGlobus toolkit4installationguide
Globus toolkit4installationguideAdarsh Patil
 
Getting Started with Ansible
Getting Started with AnsibleGetting Started with Ansible
Getting Started with Ansibleahamilton55
 
Drupal camp South Florida 2011 - Introduction to the Aegir hosting platform
Drupal camp South Florida 2011 - Introduction to the Aegir hosting platformDrupal camp South Florida 2011 - Introduction to the Aegir hosting platform
Drupal camp South Florida 2011 - Introduction to the Aegir hosting platformHector Iribarne
 
Azure File Share and File Sync guide (Beginners Edition)
Azure File Share and File Sync guide (Beginners Edition)Azure File Share and File Sync guide (Beginners Edition)
Azure File Share and File Sync guide (Beginners Edition)Naseem Khoodoruth
 
Sherlock Homepage - A detective story about running large web services - WebN...
Sherlock Homepage - A detective story about running large web services - WebN...Sherlock Homepage - A detective story about running large web services - WebN...
Sherlock Homepage - A detective story about running large web services - WebN...Maarten Balliauw
 
Tapping the Data Deluge with R
Tapping the Data Deluge with RTapping the Data Deluge with R
Tapping the Data Deluge with RJeffrey Breen
 
[Open infra] how to calculate the cloud system operating rate
[Open infra] how to calculate the cloud system operating rate[Open infra] how to calculate the cloud system operating rate
[Open infra] how to calculate the cloud system operating rateNalee Jang
 
[JSDC 2016] Codex: Conditional Modules Strike Back
[JSDC 2016] Codex: Conditional Modules Strike Back[JSDC 2016] Codex: Conditional Modules Strike Back
[JSDC 2016] Codex: Conditional Modules Strike BackAlex Liu
 
Infrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackInfrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackke4qqq
 
Using OpenStack With Fog
Using OpenStack With FogUsing OpenStack With Fog
Using OpenStack With FogMike Hagedorn
 
Architecting cloud
Architecting cloudArchitecting cloud
Architecting cloudTahsin Hasan
 
An introduction to cgroups and cgroupspy
An introduction to cgroups and cgroupspyAn introduction to cgroups and cgroupspy
An introduction to cgroups and cgroupspyvpetersson
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersSematext Group, Inc.
 
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.Graham Dumpleton
 
Cloudera User Group Chicago - Cloudera Manager: APIs & Extensibility
Cloudera User Group Chicago - Cloudera Manager: APIs & ExtensibilityCloudera User Group Chicago - Cloudera Manager: APIs & Extensibility
Cloudera User Group Chicago - Cloudera Manager: APIs & ExtensibilityClouderaUserGroups
 
Running your Java EE 6 applications in the cloud
Running your Java EE 6 applications in the cloudRunning your Java EE 6 applications in the cloud
Running your Java EE 6 applications in the cloudArun Gupta
 

What's hot (20)

12 core technologies you should learn, love, and hate to be a 'real' technocrat
12 core technologies you should learn, love, and hate to be a 'real' technocrat12 core technologies you should learn, love, and hate to be a 'real' technocrat
12 core technologies you should learn, love, and hate to be a 'real' technocrat
 
How to implement a gdpr solution in a cloudera architecture
How to implement a gdpr solution in a cloudera architectureHow to implement a gdpr solution in a cloudera architecture
How to implement a gdpr solution in a cloudera architecture
 
Automating CloudStack with Puppet - David Nalley
Automating CloudStack with Puppet - David NalleyAutomating CloudStack with Puppet - David Nalley
Automating CloudStack with Puppet - David Nalley
 
Globus toolkit4installationguide
Globus toolkit4installationguideGlobus toolkit4installationguide
Globus toolkit4installationguide
 
Getting Started with Ansible
Getting Started with AnsibleGetting Started with Ansible
Getting Started with Ansible
 
Drupal camp South Florida 2011 - Introduction to the Aegir hosting platform
Drupal camp South Florida 2011 - Introduction to the Aegir hosting platformDrupal camp South Florida 2011 - Introduction to the Aegir hosting platform
Drupal camp South Florida 2011 - Introduction to the Aegir hosting platform
 
Azure File Share and File Sync guide (Beginners Edition)
Azure File Share and File Sync guide (Beginners Edition)Azure File Share and File Sync guide (Beginners Edition)
Azure File Share and File Sync guide (Beginners Edition)
 
Sherlock Homepage - A detective story about running large web services - WebN...
Sherlock Homepage - A detective story about running large web services - WebN...Sherlock Homepage - A detective story about running large web services - WebN...
Sherlock Homepage - A detective story about running large web services - WebN...
 
Tapping the Data Deluge with R
Tapping the Data Deluge with RTapping the Data Deluge with R
Tapping the Data Deluge with R
 
[Open infra] how to calculate the cloud system operating rate
[Open infra] how to calculate the cloud system operating rate[Open infra] how to calculate the cloud system operating rate
[Open infra] how to calculate the cloud system operating rate
 
[JSDC 2016] Codex: Conditional Modules Strike Back
[JSDC 2016] Codex: Conditional Modules Strike Back[JSDC 2016] Codex: Conditional Modules Strike Back
[JSDC 2016] Codex: Conditional Modules Strike Back
 
Infrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackInfrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStack
 
Using OpenStack With Fog
Using OpenStack With FogUsing OpenStack With Fog
Using OpenStack With Fog
 
Architecting cloud
Architecting cloudArchitecting cloud
Architecting cloud
 
An introduction to cgroups and cgroupspy
An introduction to cgroups and cgroupspyAn introduction to cgroups and cgroupspy
An introduction to cgroups and cgroupspy
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Apache Cassandra and Go
Apache Cassandra and GoApache Cassandra and Go
Apache Cassandra and Go
 
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
 
Cloudera User Group Chicago - Cloudera Manager: APIs & Extensibility
Cloudera User Group Chicago - Cloudera Manager: APIs & ExtensibilityCloudera User Group Chicago - Cloudera Manager: APIs & Extensibility
Cloudera User Group Chicago - Cloudera Manager: APIs & Extensibility
 
Running your Java EE 6 applications in the cloud
Running your Java EE 6 applications in the cloudRunning your Java EE 6 applications in the cloud
Running your Java EE 6 applications in the cloud
 

Similar to Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2

Hosting Drupal on Amazon EC2
Hosting Drupal on Amazon EC2Hosting Drupal on Amazon EC2
Hosting Drupal on Amazon EC2Kornel Lugosi
 
Day of Cloud: Amazon EC2
Day of Cloud: Amazon EC2Day of Cloud: Amazon EC2
Day of Cloud: Amazon EC2cmcavoy
 
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud
 
Oracle on AWS partner webinar series
Oracle on AWS partner webinar series Oracle on AWS partner webinar series
Oracle on AWS partner webinar series Tom Laszewski
 
Austin Web Architecture
Austin Web ArchitectureAustin Web Architecture
Austin Web Architecturejoaquincasares
 
The Future is Now: Leveraging the Cloud with Ruby
The Future is Now: Leveraging the Cloud with RubyThe Future is Now: Leveraging the Cloud with Ruby
The Future is Now: Leveraging the Cloud with RubyRobert Dempsey
 
How to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesHow to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesVladimir Simek
 
Sqlsat154 maintain your dbs with help from ola hallengren
Sqlsat154 maintain your dbs with help from ola hallengrenSqlsat154 maintain your dbs with help from ola hallengren
Sqlsat154 maintain your dbs with help from ola hallengrenAndy Galbraith
 
3rd meetup - Intro to Amazon EMR
3rd meetup - Intro to Amazon EMR3rd meetup - Intro to Amazon EMR
3rd meetup - Intro to Amazon EMRFaizan Javed
 
Data in the Azure Cloud, by Julie Lerman
Data in the Azure Cloud, by Julie LermanData in the Azure Cloud, by Julie Lerman
Data in the Azure Cloud, by Julie LermanJulie Lerman
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Sujee Maniyam
 
MongoDB - Who, What & Where!
MongoDB - Who, What & Where!MongoDB - Who, What & Where!
MongoDB - Who, What & Where!Mark Hillick
 
Cloud Computing Bootcamp On The Google App Engine [v1.1]
Cloud Computing Bootcamp On The Google App Engine [v1.1]Cloud Computing Bootcamp On The Google App Engine [v1.1]
Cloud Computing Bootcamp On The Google App Engine [v1.1]Matthew McCullough
 
Creating Your Own Static Website Generator
Creating Your Own Static Website GeneratorCreating Your Own Static Website Generator
Creating Your Own Static Website GeneratorSean O'Mahoney
 
Getting started with AWS
Getting started with AWSGetting started with AWS
Getting started with AWSJungwon Seo
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for DummiesRodney Joyce
 
Continuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:InventContinuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:InventJohn Schneider
 
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...Amazon Web Services
 

Similar to Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2 (20)

Debian on EC2
Debian on EC2Debian on EC2
Debian on EC2
 
Hosting Drupal on Amazon EC2
Hosting Drupal on Amazon EC2Hosting Drupal on Amazon EC2
Hosting Drupal on Amazon EC2
 
Day of Cloud: Amazon EC2
Day of Cloud: Amazon EC2Day of Cloud: Amazon EC2
Day of Cloud: Amazon EC2
 
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
 
Oracle on AWS partner webinar series
Oracle on AWS partner webinar series Oracle on AWS partner webinar series
Oracle on AWS partner webinar series
 
Austin Web Architecture
Austin Web ArchitectureAustin Web Architecture
Austin Web Architecture
 
The Future is Now: Leveraging the Cloud with Ruby
The Future is Now: Leveraging the Cloud with RubyThe Future is Now: Leveraging the Cloud with Ruby
The Future is Now: Leveraging the Cloud with Ruby
 
How to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesHow to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutes
 
Sqlsat154 maintain your dbs with help from ola hallengren
Sqlsat154 maintain your dbs with help from ola hallengrenSqlsat154 maintain your dbs with help from ola hallengren
Sqlsat154 maintain your dbs with help from ola hallengren
 
3rd meetup - Intro to Amazon EMR
3rd meetup - Intro to Amazon EMR3rd meetup - Intro to Amazon EMR
3rd meetup - Intro to Amazon EMR
 
Data in the Azure Cloud, by Julie Lerman
Data in the Azure Cloud, by Julie LermanData in the Azure Cloud, by Julie Lerman
Data in the Azure Cloud, by Julie Lerman
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2
 
MongoDB - Who, What & Where!
MongoDB - Who, What & Where!MongoDB - Who, What & Where!
MongoDB - Who, What & Where!
 
Cloud Computing Bootcamp On The Google App Engine [v1.1]
Cloud Computing Bootcamp On The Google App Engine [v1.1]Cloud Computing Bootcamp On The Google App Engine [v1.1]
Cloud Computing Bootcamp On The Google App Engine [v1.1]
 
Creating Your Own Static Website Generator
Creating Your Own Static Website GeneratorCreating Your Own Static Website Generator
Creating Your Own Static Website Generator
 
Getting started with AWS
Getting started with AWSGetting started with AWS
Getting started with AWS
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for Dummies
 
Continuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:InventContinuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:Invent
 
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
 
Harnessing The Cloud
Harnessing The CloudHarnessing The Cloud
Harnessing The Cloud
 

More from Jeffrey Breen

Getting started with R & Hadoop
Getting started with R & HadoopGetting started with R & Hadoop
Getting started with R & HadoopJeffrey Breen
 
Move your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeMove your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeJeffrey Breen
 
R by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesR by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesJeffrey Breen
 
Accessing Databases from R
Accessing Databases from RAccessing Databases from R
Accessing Databases from RJeffrey Breen
 
Grouping & Summarizing Data in R
Grouping & Summarizing Data in RGrouping & Summarizing Data in R
Grouping & Summarizing Data in RJeffrey Breen
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterJeffrey Breen
 
FAA Aviation Forecasts 2011-2031 overview
FAA Aviation Forecasts 2011-2031 overviewFAA Aviation Forecasts 2011-2031 overview
FAA Aviation Forecasts 2011-2031 overviewJeffrey Breen
 

More from Jeffrey Breen (8)

Getting started with R & Hadoop
Getting started with R & HadoopGetting started with R & Hadoop
Getting started with R & Hadoop
 
Move your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeMove your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R code
 
R by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesR by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlines
 
Accessing Databases from R
Accessing Databases from RAccessing Databases from R
Accessing Databases from R
 
Reshaping Data in R
Reshaping Data in RReshaping Data in R
Reshaping Data in R
 
Grouping & Summarizing Data in R
Grouping & Summarizing Data in RGrouping & Summarizing Data in R
Grouping & Summarizing Data in R
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop cluster
 
FAA Aviation Forecasts 2011-2031 overview
FAA Aviation Forecasts 2011-2031 overviewFAA Aviation Forecasts 2011-2031 overview
FAA Aviation Forecasts 2011-2031 overview
 

Recently uploaded

Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Recently uploaded (20)

Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2

  • 1. Big Data Step-by-Step Boston Predictive Analytics Big Data Workshop Microsoft New England Research & Development Center, Cambridge, MA Saturday, March 10, 2012 by Jeffrey Breen President and Co-Founder http://atms.gr/bigdata0310 Atmosphere Research Group email: jeffrey@atmosgrp.com Twitter: @JeffreyBreen Saturday, March 10, 2012
  • 2. n ee d a just AM mo re R little Big Data Infrastructure Part 2: Running R + RStudio on Amazon EC2 Code & more on github: https://github.com/jeffreybreen/tutorial-201203-big-data Saturday, March 10, 2012
  • 3. Overview • Sometimes you just need a little more RAM, CPU, or disk space than you have • Let’s try launching an instance on Amazon EC2 and configuring it to do some work • We’ll install R and RStudio and call it a day Saturday, March 10, 2012
  • 4. Some details we’ll skip • Signing up (it’s not that hard) http://aws.amazon.com/ec2/ • Pricing (it keeps dropping) http://aws.amazon.com/ec2/pricing/ • The alphabet soup of services (we care about EC2 computing and S3 storage) Saturday, March 10, 2012
  • 5. Just look for biggest button on the page... Saturday, March 10, 2012
  • 6. Select an Amazon Machine Image ami-7385461a is a good, recent 64-bit CentOS image published by RightScale Saturday, March 10, 2012
  • 7. Only use EBS images • Instance-storage machines lose their data upon shutdown (termination) • EBS instances can be stopped and restarted, or terminated when you’re done forever Saturday, March 10, 2012
  • 8. Pick a size See http://aws.amazon.com/ec2/instance-types/ Already out of date! Amazon introduced new “m1.medium” instance type this week. Saturday, March 10, 2012
  • 9. Avoid Premature Termination Set Termination Protection + Shutdown Behavior Saturday, March 10, 2012
  • 11. Create a key pair Don’t forget to download it (and keep it safe!) Saturday, March 10, 2012
  • 12. Create a Security Group All TCP, UDP and ICMP from your IP address Saturday, March 10, 2012
  • 13. Don’t know your IP address? Don’t ask me. Ask Google! (simply append “/32” when entering into firewall rules) Saturday, March 10, 2012
  • 14. 3... 2... 1... Saturday, March 10, 2012
  • 15. State = running Up and running at specified domain name Saturday, March 10, 2012
  • 16. Time to get all command line • You’ll need an ssh client and the key pair we generated in order to connect with your instance • We’ll use the Cloudera VM to control versions, options, etc. • ssh won’t use your key pair if its file permissions are too lax $ chmod og-rwx rstudio-ec2.pem • Log in as root to your domain name $ ssh -i rstudio-ec2.pem root@YOURDOMAINHERE.amazonaws.com (from previous slide) Saturday, March 10, 2012
  • 17. Install R and RStudio • Create a user login for yourself (RStudio needs this) # useradd jbreen # passwd jbreen • EPEL is already installed, so R is easy # yum -y install R • Follow RStudio’s download instructions http://www.rstudio.org/download/server # wget http://download2.rstudio.org/rstudio-server-0.95.262-x86_64.rpm # rpm -Uvh rstudio-server-0.95.262-x86_64.rpm • Browse to port 8787 and use the login and password e.g., http://ec2-107-22-109-130.compute-1.amazonaws.com:8787/ Saturday, March 10, 2012
  • 19. The meter’s running • Amazon charges by the hour (or fraction thereof). So when you’re done, you should probably shutdown • via command line $ sudo shutdown -h now • or with the “Stop” Instance Action in the AWS Management Console • (use “Terminate” if you never want to use it again) Saturday, March 10, 2012
  • 20. Next up: How to launch Hadoop clusters in the cloud without really trying Saturday, March 10, 2012