R belgium 20121116-awson-cloud-beamer

    • R on Amazon cloud. Jean-Baptiste Poullet (RBelgium Founder and Co-Organizer), 2012. Sections: Getting started on Amazon cloud | Some concrete applications using Hadoop | About RBelgium
    • Outline: 1. Getting started on Amazon cloud; 2. Some concrete applications using Hadoop; 3. About RBelgium
    • Basics on AWS: Register for an AWS account with EC2 and S3 access (http://aws.amazon.com/). You will need your Account Number, Access Key ID, Secret Access Key and X.509 certificate to use S3, EC2, EMR, ... Did not follow, or need more information? See http://aws.amazon.com/documentation/gettingstarted/, http://www.bucketexplorer.com/documentation/amazon-s3--what-is-my-aws-access-and-secret-key.html and http://www.yusufhm.info/content/adding-x509-certificate-aws-iam-user-api-command-line-tools-0 ...
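Rather than pasting the Access Key ID and Secret Access Key into every script, one option is to read them from environment variables. A minimal sketch, assuming the keys are exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (the variable names read by today's AWS CLI and SDKs); the helper get_aws_credentials is hypothetical and not part of any AWS or R package:

```r
# Hypothetical helper (not part of segue or any AWS tool): read the AWS
# keys from environment variables so they never appear in the script.
get_aws_credentials <- function(id_var = "AWS_ACCESS_KEY_ID",
                                secret_var = "AWS_SECRET_ACCESS_KEY") {
  id <- Sys.getenv(id_var)
  secret <- Sys.getenv(secret_var)
  # Sys.getenv() returns "" for unset variables: fail early in that case
  if (!nzchar(id) || !nzchar(secret)) {
    stop("please set ", id_var, " and ", secret_var, " first")
  }
  list(accessKey = id, secretAccessKey = secret)
}
```

With segue, the result could then be passed on as setCredentials(creds$accessKey, creds$secretAccessKey) after creds <- get_aws_credentials().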
    • Why AWS? Simple to use: just start up an instance from an AMI. Elastic: auto-scaling groups (RAM, CPU), load balancing (I/O) and Elastic IPs. On demand: use what you want, whenever you want (by default limited to 20 EC2 instances unless you request more), with normal (on-demand), spot, reserved and EBS-optimized instances (see http://aws.amazon.com/ec2/).
    • Which AMI(s)? (1/2)
      Bioconductor on Amazon cloud: http://bioconductor.org/help/bioconductor-cloud-ami/
      MPI cluster on Amazon:
      Listing 1: 'Rmpi' on EC2
        library(Rmpi)
        mpi.spawn.Rslaves()
        mpi.parLapply(1:mpi.universe.size(), function(x) x + 1)
        mpi.close.Rslaves()
        mpi.quit()
    • Which AMI(s)? (2/2)
      Parallel cluster on Amazon:
      Listing 2: 'parallel' on EC2
        library(parallel)
        cl <- makePSOCKcluster(c('10.68.155.30', '10.68.155.45', '10.68.155.65'))
        # run myfunc(arg1, arg2, ...) once on every worker
        clusterCall(cl, myfunc, arg1, arg2, ...)
        stopCluster(cl)
      Hadoop cluster on Amazon with RHadoop: https://github.com/RevolutionAnalytics/RHadoop/tree/master/rmr2/pkg/tools
      Storm cluster on Amazon: https://github.com/nathanmarz/storm-deploy
      Also: SAP HANA (http://aws.amazon.com/sap/), Oracle R Enterprise (Hadoop for batch + NoSQL for real time), etc.
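Before paying for EC2 nodes, the same 'parallel' workflow can be tried on one machine by giving makePSOCKcluster() a number of local workers instead of a vector of host addresses. A minimal sketch; myfunc here is a hypothetical stand-in for your own function:

```r
library(parallel)

# Hypothetical function standing in for the real workload
myfunc <- function(a, b) a * b

# Two local worker processes instead of three EC2 hosts
cl <- makePSOCKcluster(2)

# clusterCall() runs myfunc(3, 4) once on every worker and returns a list
perWorker <- clusterCall(cl, myfunc, 3, 4)

# Functions referenced by name inside parLapply() must exist on the workers
clusterExport(cl, "myfunc", envir = environment())
res <- parLapply(cl, 1:10, function(x) myfunc(x, 2))

stopCluster(cl)
```

Once this works locally, replacing 2 with the vector of instance IPs is the only change to the cluster setup, provided R is installed on each node and the master can reach them.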
    • Using rmr2 in the Hadoop framework (1/4)
      Toy case: Xβ = y. Its least-squares solution is solve(t(X) %*% X, t(X) %*% y); with rmr2, the two products t(X) %*% X and t(X) %*% y are computed as MapReduce jobs.
      Listing 6: initializing variables
        library(rmr2)
        # write a 200 x 10 random matrix to the distributed file system
        X = to.dfs(matrix(rnorm(2000), ncol = 10))
        # y (200 x 1) stays in the R session
        y = as.matrix(rnorm(200))
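This toy case suits MapReduce because both products in the normal equations are sums over the rows of X. Splitting X and y into matching row blocks X_1, …, X_m and y_1, …, y_m (for example one block per map task), and assuming X has full column rank:

```latex
\hat\beta = \arg\min_{\beta} \lVert X\beta - y \rVert_2^2 = (X^\top X)^{-1} X^\top y,
\qquad
X^\top X = \sum_{i=1}^{m} X_i^\top X_i,
\qquad
X^\top y = \sum_{i=1}^{m} X_i^\top y_i
```

Each mapper emits one partial product, the reducer adds them up, and only the small 10 × 10 matrix and 10 × 1 vector have to come back to R for solve().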
    • Using rmr2 in the Hadoop framework (2/4)
      Listing 7: 'rmr2' matrix multiplication
        # sum the partial products emitted by the mappers
        reducerFunction = function(k, vv) keyval(1, list(Reduce(`+`, vv)))
        tXX = values(
          from.dfs(
            mapreduce(
              input = X,
              # each mapper gets a chunk of rows Xi and emits t(Xi) %*% Xi
              map = function(k, Xi) keyval(1, list(t(Xi) %*% Xi)),
              reduce = reducerFunction,
              combine = TRUE)))[[1]]
    • Using rmr2 in the Hadoop framework (3/4)
      Listing 8: 'rmr2' solving
        tXy = values(
          from.dfs(
            mapreduce(
              input = X,
              # t(Xi) %*% y needs the rows of y matching chunk Xi: with the full y this only
              # works when all of X reaches one map call; otherwise store row indices with X
              map = function(k, Xi) keyval(1, list(t(Xi) %*% y)),
              reduce = reducerFunction,
              combine = TRUE)))[[1]]
        solve(tXX, tXy)
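The chunked computation can be checked without Hadoop. A minimal base-R sketch (no rmr2, and X is an ordinary in-memory matrix here rather than a to.dfs() object) that cuts X into row chunks, computes the partial products a mapper would emit, sums them as a reducer would, and compares the result with the direct solve(); the choice of 4 chunks is arbitrary:

```r
set.seed(1)
X <- matrix(rnorm(2000), ncol = 10)   # 200 x 10, as in Listing 6
y <- as.matrix(rnorm(200))

# Split the row indices into 4 chunks (stand-ins for Hadoop input splits)
rows <- split(seq_len(nrow(X)), rep(1:4, length.out = nrow(X)))

# "map": each chunk emits its partial products; y uses the same rows as X
tXX_parts <- lapply(rows, function(r) crossprod(X[r, , drop = FALSE]))
tXy_parts <- lapply(rows, function(r)
  crossprod(X[r, , drop = FALSE], y[r, , drop = FALSE]))

# "reduce": add the partial products together
tXX <- Reduce(`+`, tXX_parts)
tXy <- Reduce(`+`, tXy_parts)

beta_chunked <- solve(tXX, tXy)
beta_direct <- solve(t(X) %*% X, t(X) %*% y)
all.equal(beta_chunked, beta_direct)   # TRUE (equal up to rounding)
```

crossprod(A) is R's shortcut for t(A) %*% A, and crossprod(A, B) for t(A) %*% B.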
    • How to debug (4/4)
      Debugging: call rmr.str(varName) inside your map or reduce function; it prints the structure of varName to standard error, so the output ends up in the Hadoop task logs instead of mixing with the job results.
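In Hadoop streaming, a task's standard output carries the data, so debugging output has to go to standard error. Plain streaming scripts (such as the mapper and reducer on the next slides) can use the same trick. A minimal base-R sketch; debug_str is a hypothetical helper, not part of rmr2:

```r
# Hypothetical helper in the spirit of rmr.str(): print the structure of a
# variable to stderr, leaving stdout free for the records Hadoop reads.
debug_str <- function(x, label = paste(deparse(substitute(x)), collapse = " ")) {
  lines <- capture.output(str(x))
  cat(paste0("[debug] ", label, ": ", lines), sep = "\n", file = stderr())
  invisible(x)  # return x unchanged, so a call can wrap any expression
}
```

Inside a mapper loop, for example, line <- debug_str(trimWhiteSpace(line)) would log each cleaned line without changing the records written to stdout.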
    • R on EMR with the segue package
      Listing 9: R on EMR with 'segue'
        library(segue)
        setCredentials("accessKey", "secretAccessKey")
        myCluster <- createCluster(numInstances = 1, masterInstanceType = "m1.small",
                                   slaveInstanceType = "m1.small", location = "us-east-1a")
        ResultList <- emrlapply(myCluster, dataList, myfunc)
        stopCluster(myCluster)
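Since emrlapply() is segue's Elastic MapReduce counterpart of lapply(), myfunc and dataList can be checked locally with plain lapply() before any cluster is created (and billed). A minimal sketch; the contents of dataList and myfunc below are hypothetical placeholders:

```r
# Hypothetical placeholders for the real workload: each element of dataList
# is processed independently, which is what emrlapply() distributes
dataList <- lapply(1:5, function(i) rnorm(100, mean = i))
myfunc <- function(v) c(mean = mean(v), sd = sd(v))

# Local dry run with the same shape as emrlapply(myCluster, dataList, myfunc)
ResultList <- lapply(dataList, myfunc)
str(ResultList[[1]])
```

If the local run succeeds on a small subset, switching to emrlapply() changes only the function name and the extra cluster argument.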
    • R on EMR using the API command (1/3)
      Upload the numberList.txt file (the integers from 1 to 100, one per line) and the two R scripts "mapper.r" and "reducer.r" to your AWS S3 bucket, then run the following command in your bash shell:
      Listing 10: Running R on EMR
        ./elastic-mapreduce --create --stream --input s3://yourbucket/numberList.txt --mapper s3://yourbucket/mapper.r --reducer s3://yourbucket/reducer.r --output s3://emroutr1vv/myresults --name EMRexampleR1 --num-instances 1
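The input file for this job can be produced from R. A minimal sketch that writes the integers 1 to 100, one per line, to numberList.txt; the tempdir() location is only for the demo, so point path at a real folder before uploading. Because the reducer computes a mean, a job with a single reducer should report 50.5:

```r
# Write the integers 1..100, one per line, as the job input
path <- file.path(tempdir(), "numberList.txt")
writeLines(as.character(1:100), path)

# Read the file back and compute what a single reducer should report
nums <- as.numeric(readLines(path))
expected <- mean(nums)   # 50.5
```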
    • R on EMR using the API command (2/3)
      Listing 11: Running simple R scripts on EMR - mapper script
        #! /usr/bin/env Rscript
        trimWhiteSpace <- function(line) gsub("(^[[:space:]]+)|([[:space:]]+$)", "", line)
        con <- file("stdin", open = "r")
        while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
          line <- trimWhiteSpace(line)
          # emit one "key<TAB>" record per input number
          cat(as.numeric(line), "\t", "\n", sep = "")
        }
        close(con)
    • R on EMR using the API command (3/3)
      Listing 12: Running simple R scripts on EMR - reducer script
        #! /usr/bin/env Rscript
        # strip leading/trailing whitespace, including the tab after each key
        trimWhiteSpace <- function(line) gsub("(^[[:space:]]+)|([[:space:]]+$)", "", line)
        con <- file("stdin", open = "r")
        x <- c()
        while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
          x <- c(x, as.numeric(trimWhiteSpace(line)))
        }
        close(con)
        cat(mean(x), "\n", sep = "")
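Hadoop streaming runs the mapper over the input lines, sorts the mapper output by key, and pipes it into the reducer. That map, sort, reduce flow can be simulated inside one R session to check the logic of the two scripts. A minimal sketch that follows the scripts' line handling but works on character vectors instead of stdin; map_lines and reduce_lines are hypothetical names:

```r
# Whitespace trimming as in the scripts (the [[:space:]] class also strips tabs)
trimWhiteSpace <- function(line) gsub("(^[[:space:]]+)|([[:space:]]+$)", "", line)

# Mapper logic: one "key<TAB>" record per input number
map_lines <- function(lines) paste0(as.numeric(trimWhiteSpace(lines)), "\t")

# Reducer logic: mean of all the keys it receives
reduce_lines <- function(lines) mean(as.numeric(trimWhiteSpace(lines)))

input <- as.character(1:100)
mapped <- map_lines(input)
# Hadoop streaming sorts the mapper output by key before the reduce phase
shuffled <- sort(mapped)
result <- reduce_lines(shuffled)   # 50.5
```

The sort step does not change a mean, but keeping it in the simulation matters for reducers that rely on equal keys arriving next to each other.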
    • How to debug (4/4)
      Debugging: first debug your R code locally from the command line:
      Listing 13: Debugging R code before using EMR
        cat input.txt | R CMD BATCH --slave --no-timing mapper.r out.txt;
        cat out.txt | R CMD BATCH --slave --no-timing reducer.r 2>&1
      Note that Hadoop sorts the mapper output by key before the reduce step; piping out.txt through sort before the reducer reproduces this.
    • Tips with EMR
      Be careful with s3 vs. s3n: use one or the other, but not both. For more information about the differences between s3 and s3n, see http://stackoverflow.com/questions/10569455/difference-between-amazon-s3-and-s3n-in-hadoop (accessed on Nov 6, 2012).
      The first line of each script must be a correct shebang line that calls the right interpreter (such as #! /usr/bin/env Rscript for R or #!/usr/bin/python for Python). This is not necessary for a file that is only called by another script (e.g. when an R script sources R functions from another file, that file does not need to start with #! /usr/bin/env Rscript).
      The output directory must NOT exist before you launch your EMR job; otherwise the job will always FAIL.
      Use s3://yourProjects/project1 instead of s3://project1, i.e. keep each project in a folder inside a bucket rather than at the bucket root.
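Because an EMR job fails when its output directory already exists, one simple habit is to give every run its own output prefix inside the project folder. A minimal sketch, assuming the s3://yourProjects/project1 layout from the tip above; the helper new_output_path and its timestamp format are hypothetical choices:

```r
# Hypothetical helper: build a fresh S3 output path for every EMR run,
# so the --output directory never exists before the job starts
new_output_path <- function(base = "s3://yourProjects/project1",
                            time = Sys.time()) {
  stamp <- format(time, "%Y%m%d-%H%M%S")
  paste0(sub("/+$", "", base), "/run-", stamp)
}
```

For a run started at 09:30:00 on 16 Nov 2012, this gives s3://yourProjects/project1/run-20121116-093000, which can be passed to --output. Two runs started in the same second would still collide, so add a random suffix if jobs are launched in quick succession.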
    • Projects in RBelgium: http://www.heritagehealthprize.com/c/hhp; text mining using real "text" data extracted from the database systems of a project partner
    • RBelgium members (1/3)
    • RBelgium members (2/3)
        mygroup <- "RBelgium"
        # libraries for communicating with the meetup API
        library(RJSONIO)
        library(RCurl)
        # library for plotting
        library(ggplot2)
        # get member data from meetup.com (mykey must hold your meetup.com API key)
        domain.url <- paste("https://api.meetup.com/2/members?key=", mykey, "&sign=true&group_urlname=", mygroup, sep = "")
        domain.get <- getURL(domain.url)
        domain.data <- fromJSON(domain.get)
        # displaying names
        print(unlist(lapply(domain.data$results, function(x) x$name)))
    • RBelgium members (3/3)
        # plotting graph
        joins <- unlist(lapply(domain.data$results, function(x) x$joined))
        orderedJoins <- joins[order(joins)]
        lab = as.POSIXct(orderedJoins / 1000, origin = "1970-01-01")
        df <- data.frame(x = lab, y = seq_along(orderedJoins))
        png("memberJoined.png")
        p <- ggplot(df) +
          geom_point(aes(x = x, y = y)) +
          xlab("Date") +
          ylab("#members")
        print(p)  # ggplot objects must be printed explicitly inside scripts
        dev.off()
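The data preparation in this listing can be isolated in a small function and tested without calling the meetup API. A minimal sketch, assuming (as the division by 1000 implies) that joined holds milliseconds since 1970-01-01; the function name members_over_time is hypothetical, and it fixes the time zone to UTC where the listing uses the local one:

```r
# Hypothetical helper: turn the members' "joined" timestamps (milliseconds
# since 1970-01-01, hence the division by 1000) into a cumulative count
members_over_time <- function(joinedMs) {
  ordered <- sort(joinedMs)
  data.frame(
    x = as.POSIXct(ordered / 1000, origin = "1970-01-01", tz = "UTC"),
    y = seq_along(ordered)   # the k-th member to join brings the count to k
  )
}
```

The plot can then be built from ggplot(members_over_time(joins)) + geom_point(aes(x = x, y = y)).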
    • RBelgium on the internet: Meetup: http://www.meetup.com/RBelgium/ (68 members); Website: http://www.rbelgium.be; Twitter: twitter.com/rbelgium (5 followers); LinkedIn: http://www.linkedin.com/groups/RBelgium-4223869?gid=4223869&trk=hb_side_g (7 members); Google group: http://groups.google.com/group/rbelgium, rbelgium@googlegroups.com
    • Questions?