Getting started on Amazon cloud
                  Some concrete applications using Hadoop
                                          About RBelgium




                                       R on Amazon cloud

           Jean-Baptiste Poullet (RBelgium Founder and Co-Organizer)



                                                        2012




Jean-Baptiste Poullet (RBelgium Founder and Co-Organizer)   R on Amazon cloud
Outline



      1   Getting started on Amazon cloud


      2   Some concrete applications using Hadoop


      3   About RBelgium




Basics on AWS

  - Register for an AWS account with EC2 and S3 (http://aws.amazon.com/)
  - Credentials: Account Number, Access Key ID, Secret Access Key, X.509 certificate
  - Services: S3, EC2, EMR, ...
  - Lost, or need more information?
      http://aws.amazon.com/documentation/gettingstarted/
      http://www.bucketexplorer.com/documentation/amazon-s3--what-is-my-aws-access-and-secret-key.html
      http://www.yusufhm.info/content/adding-x509-certificate-aws-iam-user-api-command-line-tools-0
      ...
Why AWS?

  - Simple to use: just start up an instance from an AMI (Amazon Machine Image)
  - Elastic: auto-scaling groups (RAM, CPU) + load balancing (I/O) + Elastic IPs
  - On demand: anytime, whatever you need (by default limited to 20 EC2 instances unless you request more); on-demand, spot, reserved and EBS-optimized instances (see http://aws.amazon.com/ec2/)
Which AMI(s)? (1/2)

  - Bioconductor on Amazon cloud: http://bioconductor.org/help/bioconductor-cloud-ami/
  - MPI cluster on Amazon:

       Example
    library(Rmpi)
    # spawn R slaves (by default one per slot of the MPI universe)
    mpi.spawn.Rslaves()
    # compute x + 1 for each x, distributed over the slaves
    mpi.parLapply(1:mpi.universe.size(), function(x) x + 1)
    # shut the slaves down, then finalize MPI and quit R
    mpi.close.Rslaves()
    mpi.quit()

                                       Listing 1: 'Rmpi' on EC2
Which AMI(s)? (2/2)

  - Parallel cluster on Amazon:

       Example
    library(parallel)
    # one R worker per EC2 node, addressed by private IP; needs passwordless
    # SSH from the master to each node and R installed on every node
    cl <- makePSOCKcluster(c('10.68.155.30', '10.68.155.45', '10.68.155.65'))
    # run myfunc(arg1, arg2, ...) once on every worker (placeholders);
    # returns a list with one result per worker
    clusterCall(cl, myfunc, arg1, arg2, ...)
    stopCluster(cl)

                                      Listing 2: 'parallel' on EC2

  - Hadoop cluster on Amazon with RHadoop: https://github.com/RevolutionAnalytics/RHadoop/tree/master/rmr2/pkg/tools
  - Storm cluster on Amazon: https://github.com/nathanmarz/storm-deploy
  - SAP HANA (http://aws.amazon.com/sap/), Oracle R Enterprise (Hadoop for batch + NoSQL for real-time), etc.
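To try the same pattern without renting EC2 instances, the 'parallel' package can also start workers on the local machine. The sketch below is a local stand-in, not the EC2 setup itself: the two local workers are an assumption replacing the IP addresses of Listing 2, and the x + 1 computation of Listing 1 is redone with parLapply() instead of Rmpi.

```r
library(parallel)

# Two local worker processes: a hypothetical stand-in for the EC2
# private IP addresses passed to makePSOCKcluster() in Listing 2
cl <- makePSOCKcluster(2)

# Same computation as Listing 1 (add 1 to each element), distributed
# over the workers with parLapply() instead of mpi.parLapply()
res <- parLapply(cl, 1:4, function(x) x + 1)

# clusterCall() runs a function once on every worker; here each worker
# reports its own process id
pids <- unlist(clusterCall(cl, Sys.getpid))

stopCluster(cl)
print(unlist(res))
print(pids)
```

Once this works locally, swapping the worker count for a vector of host names is the only change needed to reach the remote nodes.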
Using rmr2 in Hadoop framework (1/4)
       Toy case: least squares, Xβ = y
       Normal-equation solution: solve(t(X) %*% X, t(X) %*% y)

       Example
    library(rmr2)
    X <- matrix(rnorm(2000), ncol = 10)    # 200 x 10 design matrix
    y <- as.matrix(rnorm(200))             # 200 x 1 response, kept in memory
    # write X to the DFS with the row index in column 1, so that each
    # map task can pick the matching entries of y
    X.index <- to.dfs(cbind(1:nrow(X), X))

                                    Listing 6: initializing variables

Using rmr2 in Hadoop framework (2/4)

       Example
    # reducer, also used as combiner: add up the partial matrices
    Sum <- function(k, YY) keyval(1, list(Reduce("+", YY)))

    tXX <-
      values(
        from.dfs(
          mapreduce(
            input = X.index,
            map = function(k, Xi) {
              Xi <- Xi[, -1, drop = FALSE]      # drop the row-index column
              keyval(1, list(t(Xi) %*% Xi))     # partial t(X) %*% X of this block
            },
            reduce = Sum,
            combine = TRUE)))[[1]]

                             Listing 7: 'rmr2' matrix multiplication

Using rmr2 in Hadoop framework (3/4)

       Example
    tXy <-
      values(
        from.dfs(
          mapreduce(
            input = X.index,
            map = function(k, Xi) {
              yi <- y[Xi[, 1], ]                # entries of y matching this block
              Xi <- Xi[, -1, drop = FALSE]      # drop the row-index column
              keyval(1, list(t(Xi) %*% yi))     # partial t(X) %*% y of this block
            },
            reduce = Sum,
            combine = TRUE)))[[1]]
    solve(tXX, tXy)

                                        Listing 8: 'rmr2' solving
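Why the normal equations split so well into map and reduce steps: if the rows of X, and the matching entries of y, are cut into blocks X_1, ..., X_m (one per map task, m being whatever number of input splits Hadoop uses), both cross-products are sums of per-block terms. Each mapper only needs its own block, and the reducer only has to add the partial matrices.

```latex
X = \begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix},\quad
y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}
\quad\Longrightarrow\quad
X^\top X = \sum_{k=1}^{m} X_k^\top X_k,\qquad
X^\top y = \sum_{k=1}^{m} X_k^\top y_k,\qquad
\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y
```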
How to debug (4/4)




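      Debugging
      rmr.str(varName): call it inside a map or reduce function to print the structure of a variable to standard error, which ends up in the Hadoop task logs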
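Another way to debug is to check the map/reduce logic for t(X) %*% X and t(X) %*% y in plain R before submitting anything to Hadoop. The sketch below uses base R only (no rmr2, no cluster): it splits the rows into four hypothetical chunks to imitate four map tasks, computes one partial cross-product per chunk, adds them up as a reducer would, and compares the result with the in-memory solution of the toy case. The chunking and the object names are assumptions for illustration.

```r
# Base-R dry run of the least-squares map/reduce (no Hadoop, no rmr2)
set.seed(1)
X <- matrix(rnorm(2000), ncol = 10)   # 200 x 10 design matrix
y <- as.matrix(rnorm(200))            # 200 x 1 response
X.index <- cbind(1:nrow(X), X)        # row index kept in column 1

# Hypothetical input split: 4 chunks of 50 rows, standing in for 4 map tasks
chunks <- split(seq_len(nrow(X.index)), rep(1:4, each = 50))

# "map" step: one partial cross-product per chunk
partial.XX <- lapply(chunks, function(rows) {
  Xi <- X.index[rows, -1, drop = FALSE]
  t(Xi) %*% Xi
})
partial.Xy <- lapply(chunks, function(rows) {
  yi <- y[X.index[rows, 1], ]
  Xi <- X.index[rows, -1, drop = FALSE]
  t(Xi) %*% yi
})

# "reduce" step: add up the partial matrices
tXX <- Reduce("+", partial.XX)
tXy <- Reduce("+", partial.Xy)
beta.mr <- solve(tXX, tXy)

# Reference: the in-memory solution of the toy case
beta.direct <- solve(t(X) %*% X, t(X) %*% y)
print(all.equal(beta.mr, beta.direct))
```

If this check passes but the Hadoop job does not, the problem is more likely in how the data reach the mappers (splits, row indices) than in the algebra.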
R on EMR with segue package


       Example
    library(segue)
    # your AWS Access Key ID and Secret Access Key
    setCredentials("accessKey", "secretAccessKey")
    # start an EMR cluster, here a single m1.small instance
    myCluster <- createCluster(numInstances = 1,
                               masterInstanceType = "m1.small",
                               slaveInstanceType = "m1.small",
                               location = "us-east-1a")
    # parallel lapply: apply myfunc to every element of dataList on the cluster
    ResultList <- emrlapply(myCluster, dataList, myfunc)
    # shut the cluster down so that it stops being billed
    stopCluster(myCluster)

                                Listing 9: R on EMR with 'segue'
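emrlapply() is segue's parallel counterpart of lapply(): it takes a list and a function. A cheap way to debug myfunc before paying for a cluster is therefore a plain lapply() on a small local list. The sketch assumes nothing else about segue; dataList and myfunc below are hypothetical stand-ins for the objects of Listing 9.

```r
# Hypothetical stand-ins for the dataList and myfunc objects of Listing 9
dataList <- list(1:10, c(4, 8), c(2.5, 3.5))
myfunc <- function(v) c(n = length(v), mean = mean(v))

# Local dry run with the same (list, function) pair that emrlapply() receives
ResultList <- lapply(dataList, myfunc)
str(ResultList)
```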
R on EMR using the API command (1/3)

  - Upload the numberList file (integers from 1 to 100, one integer per line) and the R scripts "mapper.r" and "reducer.r" below to your AWS S3 bucket
  - Run the following command line in bash:

       Example
    ./elastic-mapreduce --create --stream \
        --input s3://yourbucket/numberList.txt \
        --mapper s3://yourbucket/mapper.r \
        --reducer s3://yourbucket/reducer.r \
        --output s3://emroutr1vv/myresults \
        --name EMRexampleR1 --num-instances 1

                                  Listing 10: Running R on EMR
R on EMR using the API command (2/3)


       Example
    #!/usr/bin/env Rscript
    # mapper.r: read one number per line from stdin, emit it as a key plus a tab
    trimWhiteSpace <- function(line) gsub("^[[:space:]]+|[[:space:]]+$", "", line)
    con <- file("stdin", open = "r")
    while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
      line <- trimWhiteSpace(line)
      cat(as.numeric(line), "\t", "\n", sep = "")
    }
    close(con)

          Listing 11: Running simple R scripts on EMR - mapper script
R on EMR using the API command (3/3)


       Example
    #!/usr/bin/env Rscript
    # reducer.r: read the mapper output from stdin and print the mean of all values
    trimWhiteSpace <- function(line) gsub("^[[:space:]]+|[[:space:]]+$", "", line)
    con <- file("stdin", open = "r")
    x <- c()
    while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
      x <- c(x, as.numeric(trimWhiteSpace(line)))
    }
    close(con)
    cat(mean(x), "\n", sep = "")

          Listing 12: Running simple R scripts on EMR - reducer script
How to debug (4/4)




       Debugging
       Debug your R code locally first, from the command line:
    cat input.txt | R CMD BATCH --slave --no-timing mapper.r out.txt
    cat out.txt | R CMD BATCH --slave --no-timing reducer.r 2>&1
    # alternative that also mimics Hadoop's sort step between map and reduce
    # (the scripts must be executable: chmod +x mapper.r reducer.r)
    cat input.txt | ./mapper.r | sort | ./reducer.r

                    Listing 13: Debugging R code before using EMR
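The mapper/reducer pair can also be exercised inside one R session, without a shell or Hadoop. The sketch below reuses the loops of mapper.r and reducer.r but reads from text connections instead of stdin; the simulated input (1 to 100) matches the numberList file, and the sort() call is only a rough stand-in for Hadoop's shuffle-and-sort phase. The trimming function is a slightly more tolerant variant that also strips tabs.

```r
# Variant of trimWhiteSpace() that strips any leading/trailing whitespace
trimWhiteSpace <- function(line) gsub("^[[:space:]]+|[[:space:]]+$", "", line)

# Simulated numberList.txt: integers 1 to 100, one per line
input <- as.character(1:100)

# Map step: same loop as mapper.r, reading from a text connection
mapped <- character(0)
con <- textConnection(input)
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  mapped <- c(mapped, paste0(as.numeric(trimWhiteSpace(line)), "\t"))
}
close(con)

# Hadoop sorts the map output by key; sort() on the lines imitates that
con <- textConnection(sort(mapped))
x <- c()
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  x <- c(x, as.numeric(trimWhiteSpace(line)))
}
close(con)

# Reduce step, as in reducer.r
result <- mean(x)
cat(result, "\n", sep = "")
```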

Jean-Baptiste Poullet (RBelgium Founder and Co-Organizer)   R on Amazon cloud
Getting started on Amazon cloud
                  Some concrete applications using Hadoop
                                          About RBelgium


Tips with EMR
  - Be careful with s3 and s3n: use one or the other, but not both. For more information about the differences between s3 and s3n, see http://stackoverflow.com/questions/10569455/difference-between-amazon-s3-and-s3n-in-hadoop (accessed on Nov 6 2012).
  - The first line of a script must call the right interpreter (such as #!/usr/bin/env Rscript for R or #!/usr/bin/python for Python). This is not necessary for a file that is only called from another one (e.g., when an R script calls an R function from another file, the function file does not need to start with #!/usr/bin/env Rscript).
  - The output directory must NOT exist before you launch your EMR job, otherwise the job will always FAIL. Use a subdirectory such as s3://yourProjects/project1 rather than a bucket such as s3://project1.
Projects in RBelgium

  - http://www.heritagehealthprize.com/c/hhp
  - Text mining using real "text" data extracted from the database systems of a project partner
RBelgium members (1/3)
RBelgium members (2/3)
       Example
    mygroup <- "RBelgium"
    # libraries for communicating with the meetup API
    library(RJSONIO)
    library(RCurl)
    # library for plotting
    library(ggplot2)
    # get member data from meetup.com (mykey: your meetup.com API key)
    domain.url <- paste0("https://api.meetup.com/2/members?key=", mykey,
                         "&sign=true&group_urlname=", mygroup)
    domain.get <- getURL(domain.url)
    domain.data <- fromJSON(domain.get)
    # displaying names
    print(unlist(lapply(domain.data$results, function(x) x$name)))
RBelgium members (3/3)
       Example
    # plotting graph: cumulative number of members over time
    joins <- unlist(lapply(domain.data$results, function(x) x$joined))
    orderedJoins <- sort(joins)
    # 'joined' is in milliseconds since 1970-01-01
    lab <- as.POSIXct(orderedJoins / 1000, origin = "1970-01-01")
    df <- data.frame(
      x = lab,
      y = seq_along(orderedJoins)
    )
    png("memberJoined.png")
    # print() is needed for the plot to be drawn when the code is run as a script
    print(ggplot(df) +
            geom_point(aes(x = x, y = y)) +
            xlab("Date") +
            ylab("#members"))
    dev.off()
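The only non-obvious step in the plot is turning the millisecond timestamps into dates and counting members cumulatively. The sketch below isolates that step with three hypothetical join times (not real RBelgium data) so it can be checked offline; the explicit tz = "UTC" is an assumption added to make the printed dates independent of the local time zone.

```r
# Hypothetical join times in milliseconds since 1970-01-01 (UTC),
# standing in for the 'joined' fields returned by the meetup API
joins <- c(1330560000000, 1325376000000, 1341100800000)

orderedJoins <- sort(joins)
lab <- as.POSIXct(orderedJoins / 1000, origin = "1970-01-01", tz = "UTC")

# y is the running member count: the k-th member to join gives y = k
df <- data.frame(x = lab, y = seq_along(orderedJoins))
print(df)
```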
RBelgium on the internet

  - Meetup: http://www.meetup.com/RBelgium/ (68 members)
  - Website: http://www.rbelgium.be
  - Twitter: twitter.com/rbelgium (5 followers)
  - LinkedIn: http://www.linkedin.com/groups/RBelgium-4223869?gid=4223869&trk=hb_side_g (7 members)
  - Google group: http://groups.google.com/group/rbelgium, rbelgium@googlegroups.com
Questions?
Asli Kala jadu, Black magic specialist in Pakistan Or Kala jadu expert in Egy...Asli Kala jadu, Black magic specialist in Pakistan Or Kala jadu expert in Egy...
Asli Kala jadu, Black magic specialist in Pakistan Or Kala jadu expert in Egy...
 
JORNADA 5 LIGA MURO 2024INSUGURACION.pdf
JORNADA 5 LIGA MURO 2024INSUGURACION.pdfJORNADA 5 LIGA MURO 2024INSUGURACION.pdf
JORNADA 5 LIGA MURO 2024INSUGURACION.pdf
 
Austria vs France Austria Euro 2024 squad Ralf Rangnick's full team ahead of ...
Austria vs France Austria Euro 2024 squad Ralf Rangnick's full team ahead of ...Austria vs France Austria Euro 2024 squad Ralf Rangnick's full team ahead of ...
Austria vs France Austria Euro 2024 squad Ralf Rangnick's full team ahead of ...
 
Technical Data | Sig Sauer Easy6 BDX 1-6x24 | Optics Trade
Technical Data | Sig Sauer Easy6 BDX 1-6x24 | Optics TradeTechnical Data | Sig Sauer Easy6 BDX 1-6x24 | Optics Trade
Technical Data | Sig Sauer Easy6 BDX 1-6x24 | Optics Trade
 
Who Is Emmanuel Katto Uganda? His Career, personal life etc.
Who Is Emmanuel Katto Uganda? His Career, personal life etc.Who Is Emmanuel Katto Uganda? His Career, personal life etc.
Who Is Emmanuel Katto Uganda? His Career, personal life etc.
 
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 

R belgium 20121116-awson-cloud-beamer

  • 1. Getting started on Amazon cloud Some concrete applications using Hadoop About RBelgium R on Amazon cloud Jean-Baptiste Poullet (RBelgium Founder and Co-Organizer) 2012 Jean-Baptiste Poullet (RBelgium Founder and Co-Organizer) R on Amazon cloud
  • 2. Outline: (1) Getting started on Amazon cloud, (2) Some concrete applications using Hadoop, (3) About RBelgium
  • 6. Basics on AWS
      Register for an AWS account with EC2 and S3 access (http://aws.amazon.com/).
      Keep your Account Number, Access Key ID, Secret Access Key and X.509 certificate at hand.
      Services: S3, EC2, EMR, ...
      Need more information?
        http://aws.amazon.com/documentation/gettingstarted/
        http://www.bucketexplorer.com/documentation/amazon-s3--what-is-my-aws-access-and-secret-key.html
        http://www.yusufhm.info/content/adding-x509-certificate-aws-iam-user-api-command-line-tools-0
        ...
  • 7. Why AWS?
      Simple to use: just start an instance from an AMI.
      Elastic: Auto Scaling groups (RAM, CPU), load balancing (I/O) and Elastic IPs.
      On demand: start what you need, when you need it (limited to 20 EC2 instances unless you request more), with on-demand, spot, reserved and EBS-optimized instances (see http://aws.amazon.com/ec2/).
  • 8. Which AMI(s)? (1/2)
      Bioconductor on Amazon cloud: http://bioconductor.org/help/bioconductor-cloud-ami/
      MPI cluster on Amazon. Example (Listing 1: 'Rmpi' on EC2):
        library(Rmpi)
        mpi.spawn.Rslaves()                                       # start the R workers
        mpi.parLapply(1:mpi.universe.size(), function(x) x + 1)  # run the function on the workers
        mpi.close.Rslaves()                                       # shut the workers down
        mpi.quit()                                                # finalize MPI and quit R
  • 9. Which AMI(s)? (2/2)
      Parallel cluster on Amazon. Example (Listing 2: 'parallel' on EC2):
        library(parallel)
        # one R worker per EC2 instance, addressed by its private IP
        cl <- makePSOCKcluster(c('10.68.155.30', '10.68.155.45', '10.68.155.65'))
        clusterCall(cl, myfunc, arg1, arg2, ...)   # runs myfunc(arg1, arg2, ...) on every worker
        stopCluster(cl)
      Hadoop cluster on Amazon with RHadoop: https://github.com/RevolutionAnalytics/RHadoop/tree/master/rmr2/pkg/tools
      Storm cluster on Amazon: https://github.com/nathanmarz/storm-deploy
      SAP HANA (http://aws.amazon.com/sap/), Oracle R Enterprise (Hadoop for batch + NoSQL for real-time), etc.
  • 13. Using rmr2 in Hadoop framework (1/4)
      Toy case: find β in Xβ = y (least squares), i.e. solve(t(X) %*% X, t(X) %*% y). Both products are sums over row blocks of X, so each mapper computes its block's share and a reducer adds the shares up.
      Example (Listing 6: initializing variables):
        library(rmr2)
        X <- matrix(rnorm(2000), ncol = 10)      # 200 x 10
        y <- as.matrix(rnorm(nrow(X)))           # 200 x 1
        X.index <- to.dfs(cbind(1:nrow(X), X))   # store X in HDFS, row id in column 1
        # reducer (also used as combiner): sum the partial matrices
        Sum <- function(k, YY) keyval(1, list(Reduce("+", YY)))
  • 14. Using rmr2 in Hadoop framework (2/4)
      Example (Listing 7: 'rmr2' matrix multiplication):
        tXX <- values(
          from.dfs(
            mapreduce(
              input = X.index,
              map = function(k, Xi) {
                Xi <- Xi[, -1, drop = FALSE]          # drop the row-id column
                keyval(1, list(t(Xi) %*% Xi))         # this block's share of t(X) %*% X
              },
              reduce = Sum,
              combine = TRUE)))[[1]]
  • 15. Using rmr2 in Hadoop framework (3/4)
      Example (Listing 8: 'rmr2' solving):
        tXy <- values(
          from.dfs(
            mapreduce(
              input = X.index,
              map = function(k, Xi) {
                yi <- y[Xi[, 1], , drop = FALSE]      # rows of y matching this block
                Xi <- Xi[, -1, drop = FALSE]
                keyval(1, list(t(Xi) %*% yi))         # this block's share of t(X) %*% y
              },
              reduce = Sum,
              combine = TRUE)))[[1]]
        solve(tXX, tXy)
  • 16. How to debug (4/4)
      Debugging: call rmr.str(varName) inside a map or reduce function to inspect a variable while the job runs.
  • 17. R on EMR with segue package
      Example (Listing 9: R on EMR with 'segue'):
        library(segue)
        setCredentials("accessKey", "secretAccessKey")          # your AWS keys
        myCluster <- createCluster(numInstances = 1,
                                   masterInstanceType = "m1.small",
                                   slaveInstanceType = "m1.small",
                                   location = "us-east-1a")
        ResultList <- emrlapply(myCluster, dataList, myfunc)    # like lapply(dataList, myfunc), run on EMR
        stopCluster(myCluster)                                   # shut the EMR cluster down
  • 18. R on EMR using the API command (1/3)
      Upload the numberList file (integers from 1 to 100, one per line) and the R scripts "mapper.r" and "reducer.r" to your AWS S3 bucket, then run this command in your bash shell.
      Example (Listing 10: Running R on EMR):
        ./elastic-mapreduce --create --stream \
          --input s3://yourbucket/numberList.txt \
          --mapper s3://yourbucket/mapper.r \
          --reducer s3://yourbucket/reducer.r \
          --output s3://emroutr1vv/myresults \
          --name EMRexampleR1 \
          --num-instances 1
  • 19. R on EMR using the API command (2/3)
      Example (Listing 11: Running simple R scripts on EMR - mapper script):
        #! /usr/bin/env Rscript
        trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)
        con <- file("stdin", open = "r")
        while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
          line <- trimWhiteSpace(line)
          # emit "<number><TAB>": the number is the key, the value is empty
          cat(as.numeric(line), "\t", "\n", sep = "")
        }
  • 20. R on EMR using the API command (3/3)
      Example (Listing 12: Running simple R scripts on EMR - reducer script):
        #! /usr/bin/env Rscript
        trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)
        con <- file("stdin", open = "r")
        x <- c()
        while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
          x <- c(x, as.numeric(trimWhiteSpace(line)))   # collect every number emitted by the mappers
        }
        cat(mean(x))
  • 21. How to debug (4/4)
      Debugging: debug your R code locally first, from the command line (Listing 13: Debugging R code before using EMR):
        cat input.txt | R CMD BATCH --slave --no-timing mapper.r out.txt;
        cat out.txt | R CMD BATCH --slave --no-timing reducer.r 2>&1
  • 22. Tips with EMR
      Be careful with s3 versus s3n: use one or the other, but not both. For the differences between s3 and s3n, see http://stackoverflow.com/questions/10569455/difference-between-amazon-s3-and-s3n-in-hadoop (accessed on Nov 6 2012).
      The first line of each script must call the right interpreter (such as #! /usr/bin/env Rscript for R or #!/usr/bin/python for Python). A file that is only called by another one does not need it (e.g. if an R script calls an R function defined in another file, that function file does not need to start with #! /usr/bin/env Rscript).
      The output directory must NOT exist before you launch your EMR job, otherwise the job will always fail.
      Use s3://yourProjects/project1 instead of s3://project1.
  • 24. Projects in RBelgium
      http://www.heritagehealthprize.com/c/hhp
      Text mining using real "text" data extracted from the database systems of a project partner
  • 25. RBelgium members (1/3)
  • 26. RBelgium members (2/3)
      Example:
        mygroup <- "RBelgium"
        # libraries for communicating with the meetup API
        library(RJSONIO)
        library(RCurl)
        # library for plotting
        library(ggplot2)
        # get member data from meetup.com (mykey holds your Meetup API key)
        domain.url <- paste("https://api.meetup.com/2/members?key=", mykey,
                            "&sign=true&group_urlname=", mygroup, collapse = "", sep = "")
        domain.get <- getURL(domain.url)
        domain.data <- fromJSON(domain.get)
        # displaying names
        print(unlist(lapply(domain.data$results, function(x) x$name)))
  • 27. RBelgium members (3/3)
      Example:
        # plotting graph
        joins <- unlist(lapply(domain.data$results, function(x) x$joined))
        orderedJoins <- joins[order(joins)]
        lab <- as.POSIXct(orderedJoins / 1000, origin = "1970-01-01")   # 'joined' is in milliseconds
        df <- data.frame(x = lab,
                         y = 1:length(domain.data$results))
        png("memberJoined.png")
        print(ggplot(df) +
          geom_point(aes(x = x, y = y)) +
          xlab("Date") +
          ylab("#members"))
        dev.off()
  • 28. RBelgium on internet
      Website: http://www.meetup.com/RBelgium/ (68 members)
      Website: http://www.rbelgium.be
      Twitter: twitter.com/rbelgium (5 followers)
      LinkedIn: http://www.linkedin.com/groups/RBelgium-4223869?gid=4223869&trk=hb_side_g (7 members)
      Google group: http://groups.google.com/group/rbelgium, rbelgium@googlegroups.com
  • 29. Questions?
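The rmr2 job on slides 13 to 15 relies on t(X) %*% X and t(X) %*% y being sums of per-block products. That idea can be checked on one machine without Hadoop or rmr2. The following is a minimal base-R sketch, not rmr2 itself: the row blocks stand in for the input splits Hadoop hands to each mapper, and `blockwise_normal_eq` and `n_blocks` are hypothetical names. If the decomposition is right, the result matches a direct least-squares fit.

```r
# Minimal local sketch (base R only, no Hadoop or rmr2): emulates the
# map/combine/reduce job of slides 13-15 on one machine.
# `blockwise_normal_eq` and `n_blocks` are hypothetical names.
blockwise_normal_eq <- function(X, y, n_blocks = 4) {
  # assign rows to blocks, standing in for the input splits Hadoop creates
  block_id <- rep_len(seq_len(n_blocks), nrow(X))
  blocks <- split(seq_len(nrow(X)), block_id)

  # "map": each block emits its share of t(X) %*% X and t(X) %*% y
  partial_XX <- lapply(blocks, function(rows)
    crossprod(X[rows, , drop = FALSE]))
  partial_Xy <- lapply(blocks, function(rows)
    crossprod(X[rows, , drop = FALSE], y[rows, , drop = FALSE]))

  # "reduce": add the partial matrices, then solve the small p x p system
  tXX <- Reduce(`+`, partial_XX)
  tXy <- Reduce(`+`, partial_Xy)
  solve(tXX, tXy)
}

set.seed(1)
X <- matrix(rnorm(2000), ncol = 10)  # 200 x 10, same shape as Listing 6
y <- as.matrix(rnorm(nrow(X)))       # 200 x 1
beta_blocks <- blockwise_normal_eq(X, y)
beta_direct <- solve(crossprod(X), crossprod(X, y))
print(all.equal(beta_blocks, beta_direct))
```

Each mapper only emits a p x p and a p x 1 matrix, so the amount of data shuffled to the reducer does not grow with the number of rows; this is why summing early with a combiner pays off.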
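The 'parallel' workflow on slide 9 can be tried on a single machine before paying for EC2 instances. In this minimal sketch, two "localhost" workers replace the EC2 private IPs, and `myfunc` is a hypothetical stand-in for real work. With real host names, makePSOCKcluster() by default starts each worker over ssh, so the master needs passwordless ssh access to each instance and R installed there.

```r
# Minimal local sketch of the 'parallel' workflow from slide 9.
# Two "localhost" workers replace the EC2 private IPs, so no ssh is needed;
# `myfunc` is a hypothetical stand-in for real work.
library(parallel)

myfunc <- function(a, b) a + b

cl <- makePSOCKcluster(c("localhost", "localhost"))

# clusterCall(): run myfunc(1, 2) once on every worker (one result per worker)
res_call <- clusterCall(cl, myfunc, 1, 2)

# parLapply(): split 1:4 across the workers, like lapply() but in parallel
res_par <- parLapply(cl, 1:4, function(x) x + 1)

stopCluster(cl)  # release the worker processes

print(unlist(res_call))
print(unlist(res_par))
```

clusterCall() is for running the same call everywhere (e.g. loading a package on each node), whereas parLapply() is the one that actually divides work between nodes.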
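The Hadoop Streaming job on slides 18 to 20 can be replayed inside one R session to check the mapper and reducer logic before uploading anything to S3. This minimal sketch reads from a character vector instead of stdin; `map_lines` and `reduce_lines` are hypothetical helpers that mirror the two scripts, and the `sort()` call stands in for Hadoop's shuffle, which by default orders streaming keys as text.

```r
# Minimal local sketch: replays the Hadoop Streaming job of slides 18-20 in a
# single R session instead of reading stdin. `map_lines` and `reduce_lines`
# are hypothetical helpers mirroring the mapper and reducer scripts.
trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)

map_lines <- function(lines) {
  # mapper: emit "<number>\t" per line, i.e. the number as key, empty value
  paste0(as.numeric(trimWhiteSpace(lines)), "\t")
}

reduce_lines <- function(lines) {
  # reducer: keep the key (text before the tab) and average it
  keys <- sub("\t.*$", "", lines)
  mean(as.numeric(trimWhiteSpace(keys)))
}

numberList <- as.character(1:100)   # same content as numberList.txt
mapped   <- map_lines(numberList)
shuffled <- sort(mapped)            # stand-in for Hadoop's sort by key (as text)
result   <- reduce_lines(shuffled)
cat(result, "\n")                   # mean of 1..100, i.e. 50.5
```

The real scripts can be exercised the same way from a shell with `cat numberList.txt | Rscript mapper.r | sort | Rscript reducer.r`, which includes the sort step that Hadoop performs between map and reduce.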
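Slide 27 builds a cumulative member count from the `joined` timestamps returned by the Meetup API. The data preparation can be separated from the API call and checked offline. This is a minimal sketch: `member_growth` is a hypothetical helper, and the three timestamps are illustrative values, not real RBelgium data. It assumes, as the slide's division by 1000 implies, that `joined` is in milliseconds since 1970-01-01, and it fixes the time zone to UTC so the dates do not depend on the machine's settings.

```r
# Minimal sketch of the data preparation on slide 27, separated from the
# Meetup API call so it can be tested offline. `member_growth` is a
# hypothetical helper; input is join times in milliseconds since 1970-01-01.
member_growth <- function(joined_ms) {
  ordered <- sort(joined_ms)
  data.frame(
    date    = as.POSIXct(ordered / 1000, origin = "1970-01-01", tz = "UTC"),
    members = seq_along(ordered)   # cumulative count after each join
  )
}

# illustrative timestamps only (not real RBelgium data)
joins <- c(1352246400000, 1325376000000, 1341100800000)
df <- member_growth(joins)
print(df)
```

The resulting data frame can be plotted with ggplot(df) + geom_point(aes(x = date, y = members)), in place of the slide's x and y columns.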