R belgium 20121116-awson-cloud-beamer


  1. R on Amazon cloud. Jean-Baptiste Poullet (RBelgium Founder and Co-Organizer), 2012. Sections: Getting started on Amazon cloud; Some concrete applications using Hadoop; About RBelgium.
  2. Outline: 1. Getting started on Amazon cloud; 2. Some concrete applications using Hadoop; 3. About RBelgium.
  3-6. Basics on AWS
     - Register for an AWS account with EC2 and S3 (http://aws.amazon.com/).
     - Account Number, Access Key ID, Secret Access Key, X.509 Certificate.
     - S3, EC2, EMR, ...
     - Lost, or want more information?
       http://aws.amazon.com/documentation/gettingstarted/
       http://www.bucketexplorer.com/documentation/amazon-s3--what-is-my-aws-access-and-secret-key.html
       http://www.yusufhm.info/content/adding-x509-certificate-aws-iam-user-api-command-line-tools-0
       ...
  7. Why AWS?
     - Simple to use: just start up an instance from an AMI.
     - Elastic: auto-scaling groups (RAM, CPU) + load balancing (I/O) + Elastic IPs.
     - On demand: anytime, whatever you want (limited to 20 EC2 instances unless you request more); normal, spot, reserved and EBS-optimized instances (see http://aws.amazon.com/ec2/).
  8. Which AMI(s)? (1/2)
     - Bioconductor on Amazon cloud: http://bioconductor.org/help/bioconductor-cloud-ami/
     - MPI cluster on Amazon. Example:

         library(Rmpi)
         mpi.spawn.Rslaves()
         mpi.parLapply(1:mpi.universe.size(), function(x) x + 1)
         mpi.close.Rslaves()
         mpi.quit()

       Listing 1: 'Rmpi' on EC2
  9. Which AMI(s)? (2/2)
     - Parallel cluster on Amazon. Example:

         library(parallel)
         # private IP addresses of the EC2 instances acting as workers
         cl <- makePSOCKcluster(c('10.68.155.30', '10.68.155.45', '10.68.155.65'))
         # myfunc, arg1, arg2, ... are placeholders; pass the function and its
         # arguments separately so that the call is evaluated on each worker
         clusterCall(cl, myfunc, arg1, arg2, ...)

       Listing 2: 'parallel' on EC2
     - Hadoop cluster on Amazon with RHadoop: https://github.com/RevolutionAnalytics/RHadoop/tree/master/rmr2/pkg/tools
     - Storm cluster on Amazon: https://github.com/nathanmarz/storm-deploy
     - SAP Hana (http://aws.amazon.com/sap/), Oracle R Enterprise (Hadoop for batch + NoSQL for real-time), etc.
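To try the 'parallel' pattern before any EC2 instance is running, the same kind of socket cluster can be started on the local machine. The sketch below is only an illustration under that assumption: it uses two local worker processes instead of the EC2 addresses from the slide, and `add_one` is a hypothetical function, not one from the talk.

```r
library(parallel)

# Two worker processes on the local machine (assumption: these stand in
# for the EC2 private IP addresses used on the slide).
cl <- makePSOCKcluster(2)

# Hypothetical worker function, not taken from the slides.
add_one <- function(x) x + 1

# parLapply splits the input across the workers and returns a list.
res <- parLapply(cl, 1:3, add_one)

# Release the worker processes when done.
stopCluster(cl)

print(unlist(res))
```

Once this works locally, replacing `makePSOCKcluster(2)` with a vector of host addresses should be the only change needed on the cluster side, provided the workers are reachable over SSH and have R installed.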
  10-13. Using rmr2 in Hadoop framework (1/4)
     Toy case: solve Xβ = y in the least-squares sense, which in plain R is solve(t(X) %*% X, t(X) %*% y). Example:

         library(rmr2)
         # 200 x 10 matrix, written to the distributed file system
         X = to.dfs(matrix(rnorm(2000), ncol = 10))
         y = as.matrix(rnorm(200))

     Listing 6: initializing variables
  14. Using rmr2 in Hadoop framework (2/4). Example:

         tXX =
           values(
             from.dfs(
               mapreduce(
                 input = X,
                 map = function(k, Xi) keyval(1, list(t(Xi) %*% Xi)),
                 # reduce = reducerFunction,   (left commented out, as on the slide;
                 # with several map chunks the partial products must be summed)
                 combine = TRUE)))[[1]]

     Listing 7: 'rmr2' matrix multiplication
  15. Using rmr2 in Hadoop framework (3/4). Example:

         tXy =
           values(
             from.dfs(
               mapreduce(
                 input = X,
                 map = function(k, Xi) keyval(1, list(t(Xi) %*% y)),
                 combine = TRUE)))[[1]]
         solve(tXX, tXy)

     Listing 8: 'rmr2' solving
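The idea behind the two map steps can be checked without Hadoop: t(X) %*% X and t(X) %*% y are sums of per-block products, so each block of rows can be processed independently and the partial results added up. The following base R sketch assumes four blocks of 50 rows as a stand-in for the chunks Hadoop would pass to the map function; the block layout and variable names are illustrative, not from the slides.

```r
set.seed(1)                                # reproducible toy data
X <- matrix(rnorm(2000), ncol = 10)        # 200 x 10, as on the slides
y <- as.matrix(rnorm(200))                 # 200 x 1

# Assumption: 4 row blocks of 50 rows stand in for the chunks Hadoop
# would hand to the map function.
blocks <- split(seq_len(nrow(X)), rep(1:4, each = 50))

# "Map": each block contributes its own partial products.
partial_XtX <- lapply(blocks, function(i) {
  Xi <- X[i, , drop = FALSE]
  t(Xi) %*% Xi
})
partial_Xty <- lapply(blocks, function(i) {
  Xi <- X[i, , drop = FALSE]
  t(Xi) %*% y[i, , drop = FALSE]
})

# "Reduce": sum the partial products.
tXX <- Reduce(`+`, partial_XtX)
tXy <- Reduce(`+`, partial_Xty)

beta_chunked <- solve(tXX, tXy)
beta_direct  <- solve(t(X) %*% X, t(X) %*% y)
print(all.equal(beta_chunked, beta_direct))
```

Note that each block pairs its rows of X with the matching rows of y; this matters as soon as the input is split into more than one chunk.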
  16. How to debug (4/4). Debugging: rmr.str(varName)
  17. R on EMR with segue package. Example:

         library(segue)
         setCredentials("accessKey", "secretAccessKey")
         myCluster <- createCluster(numInstances = 1,
                                    masterInstanceType = "m1.small",
                                    slaveInstanceType = "m1.small",
                                    location = "us-east-1a")
         # dataList and myfunc are placeholders for your own data and function
         ResultList <- emrlapply(myCluster, dataList, myfunc)
         stopCluster(myCluster)

     Listing 9: R on EMR with 'segue'
  18. R on EMR using the API command (1/3)
     - Upload the numberList file (integers from 1 to 100, one integer per line) and the R scripts "mapper.r" and "reducer.r" to your AWS S3 bucket.
     - Run the following command line in bash. Example:

         ./elastic-mapreduce --create --stream \
           --input s3://yourbucket/numberList.txt \
           --mapper s3://yourbucket/mapper.r \
           --reducer s3://yourbucket/reducer.r \
           --output s3://emroutr1vv/myresults \
           --name EMRexampleR1 --num-instances 1

       Listing 10: Running R on EMR
  19. R on EMR using the API command (2/3). Example:

         #! /usr/bin/env Rscript
         # strip leading and trailing spaces
         trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)
         con <- file("stdin", open = "r")
         # read stdin one line at a time and emit "number <tab>" records
         while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
           line <- trimWhiteSpace(line)
           cat(as.numeric(line), "\t", "\n", sep = " ")
         }

     Listing 11: Running simple R scripts on EMR - mapper script
  20. R on EMR using the API command (3/3). Example:

         #! /usr/bin/env Rscript
         # strip leading and trailing spaces
         trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)
         con <- file("stdin", open = "r")
         x <- c()
         # collect every number received from the mapper
         while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
           x <- c(x, as.numeric(trimWhiteSpace(line)))
         }
         cat(mean(x))

     Listing 12: Running simple R scripts on EMR - reducer script
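The mapper and reducer logic can also be exercised inside a single R session, without stdin or EMR. In the sketch below, `run_mapper` and `run_reducer` are hypothetical helpers that mirror the two scripts on vectors of lines; the explicit removal of everything after the tab is an extra step added here for robustness, not something the slide scripts do.

```r
# Same whitespace trimming as in the slide scripts.
trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)

# Hypothetical helper: mapper logic applied to a vector of input lines,
# returning the records the mapper would write to stdout.
run_mapper <- function(lines) {
  paste0(as.numeric(trimWhiteSpace(lines)), "\t")
}

# Hypothetical helper: reducer logic applied to the mapper output.
# Assumption: the key is whatever precedes the tab.
run_reducer <- function(lines) {
  keys <- sub("\t.*$", "", lines)
  mean(as.numeric(trimWhiteSpace(keys)))
}

input  <- as.character(1:100)       # stands in for numberList.txt
mapped <- run_mapper(input)
result <- run_reducer(mapped)
print(result)                       # mean of 1..100, i.e. 50.5
```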
  21. How to debug (4/4). Debugging: first debug your R code locally from the command line:

         cat input.txt | R CMD BATCH --slave --no-timing mapper.r out.txt;
         cat out.txt | R CMD BATCH --slave --no-timing reducer.r 2>&1

     Listing 13: Debugging R code before using EMR
  22. Tips with EMR
     - Be careful with s3 versus s3n: use one or the other, not both. For the differences between s3 and s3n, see http://stackoverflow.com/questions/10569455/difference-between-amazon-s3-and-s3n-in-hadoop (accessed on Nov 6 2012).
     - The first line of each script must name the right interpreter (such as #! /usr/bin/env Rscript for R or #!/usr/bin/python for Python). This is not necessary for a file that is only called by another one (e.g. when an R script calls an R function from another file, the function file does not need to start with #! /usr/bin/env Rscript).
     - The output directory must NOT exist before you launch your EMR job; otherwise the job will always fail.
     - Use s3://yourProjects/project1 instead of s3://project1.
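Two of these tips lend themselves to a quick check before a job is submitted. The sketch below is one possible way to do it in base R; `same_s3_scheme` and `has_rscript_shebang` are hypothetical helper names, and the bucket paths simply reuse the placeholders from the command-line example.

```r
# Hypothetical helper: TRUE when every URI uses the same scheme
# (all "s3://" or all "s3n://"), as the first tip recommends.
same_s3_scheme <- function(uris) {
  schemes <- sub("://.*$", "", uris)
  all(schemes %in% c("s3", "s3n")) && length(unique(schemes)) == 1
}

# Hypothetical helper: TRUE when the first line of the script is a
# shebang that names Rscript.
has_rscript_shebang <- function(path) {
  first <- readLines(path, n = 1, warn = FALSE)
  length(first) == 1 && grepl("^#! ?/usr/bin/env Rscript", first)
}

# Write a throwaway script to a temporary file to exercise the check.
tmp <- tempfile(fileext = ".r")
writeLines(c("#! /usr/bin/env Rscript", "cat('ok\\n')"), tmp)

ok_uris    <- same_s3_scheme(c("s3://yourbucket/mapper.r",
                               "s3://yourbucket/reducer.r"))
mixed_uris <- same_s3_scheme(c("s3://yourbucket/mapper.r",
                               "s3n://yourbucket/reducer.r"))
ok_shebang <- has_rscript_shebang(tmp)
unlink(tmp)

print(c(ok_uris = ok_uris, mixed_uris = mixed_uris, ok_shebang = ok_shebang))
```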
  23-24. Projects in RBelgium
     - http://www.heritagehealthprize.com/c/hhp
     - Text mining using real "text" data extracted from the database systems of a project partner
  25. RBelgium members (1/3)
  26. RBelgium members (2/3). Example:

         mygroup <- "RBelgium"
         # libraries for communicating with the meetup API
         library(RJSONIO)
         library(RCurl)
         # library for plotting
         library(ggplot2)
         # get member data from meetup.com (mykey must hold your own API key)
         domain.url <- paste("https://api.meetup.com/2/members?key=", mykey,
                             "&sign=true&group_urlname=RBelgium",
                             collapse = "", sep = "")
         domain.get <- getURL(domain.url)
         domain.data <- fromJSON(domain.get)
         # displaying names
         print(unlist(lapply(domain.data$results, function(x) x$name)))
  27. RBelgium members (3/3). Example:

         # plotting graph
         joins <- unlist(lapply(domain.data$results, function(x) x$joined))
         orderedJoins <- joins[order(joins)]
         # join times are in milliseconds since 1970-01-01
         lab = as.POSIXct(orderedJoins / 1000, origin = "1970-01-01")
         df <- data.frame(x = lab,
                          y = 1:length(domain.data$results))
         png("memberJoined.png")
         ggplot(df) +
           geom_point(aes(x = x, y = y)) +
           xlab("Date") +
           ylab("#members")
         dev.off()
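The data preparation behind this plot can be tried without an API key. The sketch below uses three made-up join timestamps, purely for illustration, in place of the "joined" field of the API response; it shows how sorting the timestamps turns them into a cumulative member count.

```r
# Made-up join times in milliseconds since 1970-01-01 UTC, for illustration
# only; real values would come from the "joined" field of the API response.
joins <- c(1330000000000, 1320000000000, 1340000000000)

orderedJoins <- sort(joins)
joinDates <- as.POSIXct(orderedJoins / 1000, origin = "1970-01-01", tz = "UTC")

# After the i-th join (in time order) the group has i members.
df <- data.frame(date = joinDates, members = seq_along(orderedJoins))
print(df)
```

Plotting `members` against `date` then gives the membership growth curve.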
  28. RBelgium on internet
     - Website: http://www.meetup.com/RBelgium/ (68 members)
     - Website: http://www.rbelgium.be
     - Twitter: twitter.com/rbelgium (5 followers)
     - LinkedIn: http://www.linkedin.com/groups/RBelgium-4223869?gid=4223869&trk=hb_side_g (7 members)
     - Google group: http://groups.google.com/group/rbelgium, rbelgium@googlegroups.com
  29. Questions?