R language tutorial

  1. R Language Tutorial
     David Chiu, 4/23/2013
     Confidential | Copyright 2013 Trend Micro Inc.
  2. Background of R
  3. What is R?
     • A GNU project: R is an implementation of the S language, which John Chambers developed at Bell Labs; R itself was created by Ross Ihaka and Robert Gentleman at the University of Auckland
     • Free software environment for statistical computing and graphics
     • Functional programming language, implemented primarily in C, Fortran, and R itself
  4. R Language
     • R is a functional programming language
     • R is an interpreted language
     • R is an object-oriented language
  5. Why Use R?
     • Interactive statistical analysis on the fly
     • Built-in mathematical functions and graphics modules
     • Free and open source!
       – http://cran.r-project.org/src/base/
  6. Kaggle
     • http://www.kaggle.com/
     • R is the language most widely used by Kaggle participants
  7. Data Scientists at These Companies Use R
     • Google: "What is your programming language of choice, R, Python or something else?"
       "I use R, and occasionally matlab, for data analysis. There is a large, active and extremely knowledgeable R community at Google."
       – http://simplystatistics.org/2013/02/15/interview-with-nick-chamandy-statistician-at-google/
     • Apple: "Expert knowledge of SAS (With Enterprise Guide/Miner) required and candidates with strong knowledge of R will be preferred"
       – http://www.kdnuggets.com/jobs/13/03-29-apple-sr-data-scientist.html?utm_source=twitterfeed&utm_medium=facebook&utm_campaign=tfb&utm_content=FaceBook&utm_term=analytics#.UVXibgXOpfc.facebook
  8. Commercial Support for R
     • Revolution Analytics (founded in 2007) provides commercial support for Revolution R
       – http://www.revolutionanalytics.com/products/revolution-r.php
       – http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php
     • Oracle's Big Data Appliance integrates R, Apache Hadoop, Oracle Enterprise Linux, and a NoSQL database with Exadata hardware
       – http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html
  9. Revolution R
     • Free community version
       – http://www.revolutionanalytics.com/downloads/
       – http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
     • Benchmarks:

       Task                 Base R 2.14.2 (64-bit)   Revolution R (1 core)   Revolution R (4 cores)   Speedup (4 cores)
       Matrix calculation   17.4 sec                 2.9 sec                 2.0 sec                  7.9x
       Matrix functions     10.3 sec                 2.0 sec                 1.2 sec                  7.8x
       Program control      2.7 sec                  2.7 sec                 2.7 sec                  Not appreciable
  10. IDE
      • RStudio
        – http://www.rstudio.com/
      • RGui
        – http://www.r-project.org/
  11. Web App Development
      • "Shiny makes it super simple for R users like you to turn analyses into interactive web applications that anyone can use"
        – http://www.rstudio.com/shiny/
  12. Package Management
      • CRAN (Comprehensive R Archive Network)
      • Repositories:

        Repository     URL
        CRAN           http://cran.r-project.org/web/packages/
        Bioconductor   http://www.bioconductor.org/packages/release/Software.html
        R-Forge        http://r-forge.r-project.org/
  13. R Basics
  14. Basic Commands
      • help() (show documentation)
        – help(demo)
      • demo() (run a built-in demonstration)
        – demo(is.things)
      • q() (quit R)
      • ls() (list the objects in the workspace)
      • rm() (remove objects)
        – rm(x)
  15. Basic Objects
      • Vector
      • List
      • Factor
      • Array
      • Matrix
      • Data Frame
  16. Objects & Arithmetic
      • Scalar (in R, a vector of length 1)
        – x = 3; y <- 5; x + y
      • Vectors
        – x = c(1, 2, 3, 7); y = c(2, 3, 5, 1); x + y; x * y; x - y; x / y
        – x = seq(1, 10); y = 2:11; x + y
        – x = seq(1, 10, by = 2); y = seq(1, 10, length.out = 2)
        – rep(c(5, 8), 3)
        – x = c(1, 2, 3); length(x)
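A minimal sketch of the vector operations above, runnable in a plain R session. It makes explicit two behaviours the slide relies on but does not spell out: arithmetic is element-wise, and a shorter vector is recycled to match a longer one. The variable names are illustrative only.

```r
# Arithmetic on vectors is element-wise: no explicit loop is needed.
x <- c(1, 2, 3, 7)
y <- c(2, 3, 5, 1)
sum_xy  <- x + y    # 3 5 8 8
prod_xy <- x * y    # 2 6 15 7

# When lengths differ, the shorter vector is recycled:
# c(10, 20) is reused as c(10, 20, 10, 20).
recycled <- x + c(10, 20)   # 11 22 13 27

# seq() and rep() build vectors without typing every element.
odds    <- seq(1, 10, by = 2)          # 1 3 5 7 9
ends    <- seq(1, 10, length.out = 2)  # 1 10
repeats <- rep(c(5, 8), 3)             # 5 8 5 8 5 8

print(sum_xy); print(recycled); print(odds); print(ends); print(repeats)
```

If the longer length is not a multiple of the shorter one, R still recycles but emits a warning.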
  17. Summaries and Subscripting
      • Summary
        – x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        – mean(x); min(x); median(x); max(x); var(x)
        – summary(x)
      • Subscripting
        – x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        – x[1:3]; x[c(1, 3, 5)]
        – x[c(1, 3, 5)] * 2 + x[c(2, 2, 2)]
        – x[-(1:6)]
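A short sketch of the three kinds of subscript used in R: positive indices select, negative indices drop, and a logical vector keeps the positions where it is `TRUE`. The logical-index and assignment lines go slightly beyond the slide and are included only as illustration.

```r
x <- 1:10

m <- mean(x)    # 5.5
v <- var(x)     # sample variance (divides by n - 1): 82.5 / 9

first_three <- x[1:3]                          # 1 2 3
mixed       <- x[c(1, 3, 5)] * 2 + x[c(2, 2, 2)]  # 4 8 12
last_four   <- x[-(1:6)]                       # 7 8 9 10: drops elements 1 to 6

# A logical vector selects the elements where it is TRUE.
evens <- x[x %% 2 == 0]                        # 2 4 6 8 10

# Subscripts on the left-hand side modify the selected elements.
y <- x
y[y > 8] <- 0                                  # 1 2 3 4 5 6 7 8 0 0
```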
  18. Lists
      • Contain a heterogeneous collection of objects
        – e <- list(thing = "hat", size = "8.25"); e
        – l <- list(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 8, i = 9, j = 10)
        – l$j
        – man = list(name = "Qoo", height = 183); man$name
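A small sketch of the three ways to index a list, reusing the `man` list from the slide. The `weight` element is a hypothetical addition, there only to show that lists can grow after creation.

```r
# A list can mix types: a character string and a number here.
man <- list(name = "Qoo", height = 183)

who <- man$name          # "Qoo": $ extracts one element by name
h   <- man[["height"]]   # 183: [[ ]] also extracts the element itself
sub <- man["height"]     # [ ] returns a smaller list, not the element

# Elements can be added after creation (hypothetical value).
man$weight <- 70
keys <- names(man)       # "name" "height" "weight"

str(man)
```

The `[` versus `[[` distinction matters when passing list elements to functions that expect a plain vector.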
  19. Factor
      • A vector that represents categorical values (unordered by default; ordered factors are also available)
      • The distinct values a factor can take are called levels
      • Factors
        – phone = factor(c("iphone", "htc", "iphone", "samsung", "iphone", "samsung"))
        – levels(phone)
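A sketch of what a factor stores, using the `phone` example with its strings quoted. The `size` factor is a hypothetical example showing how `ordered = TRUE` makes level order meaningful for comparisons.

```r
phone <- factor(c("iphone", "htc", "iphone", "samsung", "iphone", "samsung"))

lv     <- levels(phone)       # "htc" "iphone" "samsung" (sorted by default)
counts <- table(phone)        # htc 1, iphone 3, samsung 2
codes  <- as.integer(phone)   # 2 1 2 3 2 3: integer codes that index into the levels

# Factors are unordered unless ordered = TRUE; then comparisons follow the level order.
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"), ordered = TRUE)
size_check <- size[1] < size[2]   # TRUE: "small" comes before "large"

print(counts)
```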
  20. Matrices & Arrays
      • Array
        – An extension of a vector to two or more dimensions
        – a <- array(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), dim = c(3, 4))
      • Matrices
        – A vector with two dimensions (a 2-d array)
        – x = c(1, 2, 3); y = c(4, 5, 6); rbind(x, y); cbind(x, y)
        – x = rbind(c(1, 2, 3), c(4, 5, 6)); dim(x)
        – x <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
        – x <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, byrow = TRUE)
        – x <- matrix(c(1, 2, 3, 4), nrow = 2); y <- matrix(c(5, 6), nrow = 2); x %*% y
        – t(matrix(c(1, 2, 3, 4), nrow = 2)) (transpose)
        – solve(matrix(c(1, 2, 3, 4), nrow = 2)) (inverse)
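A sketch of the matrix calls above with the results worked out by hand, to show column-major filling, the difference between `%*%` and `*`, and that `solve()` with one argument returns the inverse.

```r
# matrix() fills column by column unless byrow = TRUE.
a <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)                # columns: (1,2,3), (4,5,6)
b <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, byrow = TRUE)  # rows: (1,2), (3,4), (5,6)

# %*% is matrix multiplication; * would be element-wise.
m  <- matrix(c(1, 2, 3, 4), nrow = 2)  # rows: (1,3), (2,4)
v  <- matrix(c(5, 6), nrow = 2)
mv <- m %*% v                          # (1*5 + 3*6, 2*5 + 4*6) = (23, 34)

# solve(m) returns the inverse; multiplying back gives the identity (up to rounding).
m_inv <- solve(m)
identity_check <- m %*% m_inv

print(mv)
```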
  21. Data Frame
      • A convenient way to represent tabular data
      • Like a matrix with named columns, except that each column can have its own type, so non-numeric variables are allowed
      • Example
        – df = data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 3, 4, 5, 6)); df
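A sketch of a data frame whose columns have different types, which a matrix cannot hold. The column names and values are hypothetical. `stringsAsFactors = FALSE` is set explicitly because R versions before 4.0 converted character columns to factors by default.

```r
df <- data.frame(
  name  = c("a", "b", "c"),      # character column
  score = c(80, 90, 70),         # numeric column
  pass  = c(TRUE, TRUE, FALSE),  # logical column
  stringsAsFactors = FALSE       # keep strings as character in every R version
)

# Each column keeps its own type.
col_types <- sapply(df, class)   # "character" "numeric" "logical"

# Columns are accessed by name; rows can be filtered with a logical condition.
avg   <- mean(df$score)          # 80
top   <- df[df$score >= 80, ]    # rows "a" and "b"
n_top <- nrow(top)               # 2

str(df)
```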
  22. Function
      • Function
        – `%myop%` <- function(a, b) {2*a + 2*b}; 1 %myop% 1
        – f <- function(x) {return(x^2 + 3)}
        – create.vector.of.ones <- function(n) {return.vector <- NA; for (i in 1:n) {return.vector[i] <- 1}; return.vector}
        – create.vector.of.ones(3)
      • Control structures
        – if ... else ...
        – repeat, for, while
      • Catching errors
        – tryCatch()
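The slide only names the control structures and `tryCatch`; this is a minimal sketch of each. `sign_label` and `safe_log` are hypothetical helper names, and returning `NA` on failure is just one possible handling policy.

```r
# if / else is an expression, so its value can be returned directly.
sign_label <- function(x) {
  if (x > 0) "positive" else if (x < 0) "negative" else "zero"
}

# for loop: sum 1..5
total <- 0
for (i in 1:5) total <- total + i   # 15

# while loop: smallest n with 2^n >= 100
n <- 0
while (2^n < 100) n <- n + 1        # 7, since 2^7 = 128

# repeat runs until an explicit break
k <- 0
repeat {
  k <- k + 1
  if (k >= 3) break
}

# tryCatch intercepts warnings and errors instead of stopping the script.
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) NA_real_,  # log(-1) warns "NaNs produced"
    error   = function(e) NA_real_   # log("a") is an error
  )
}

print(sign_label(-2)); print(total); print(n); print(k)
print(safe_log(-1)); print(safe_log("a"))
```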
  23. Anonymous Functions
      • A characteristic of functional languages: functions are values that can be passed to other functions without being named
        – apply.to.three <- function(f) {f(3)}
        – apply.to.three(function(x) {x * 7})
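A sketch extending the `apply.to.three` example to show functions as first-class values: passed in, returned from other functions, and handed to base R's higher-order helpers. `make_multiplier` is a hypothetical name.

```r
apply.to.three <- function(f) { f(3) }
r1 <- apply.to.three(function(x) { x * 7 })   # 21

# A function can return another function (a closure that remembers k).
make_multiplier <- function(k) {
  function(x) x * k
}
triple <- make_multiplier(3)
r2 <- triple(5)                                # 15

# Base R higher-order helpers also take anonymous functions.
squares <- sapply(1:4, function(i) i^2)        # 1 4 9 16
evens   <- Filter(function(i) i %% 2 == 0, 1:6)  # 2 4 6
total   <- Reduce(`+`, 1:4)                    # 10
```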
  24. Objects and Classes
      • All R code manipulates objects
      • Every object in R has a type
      • In assignment statements, R copies the object rather than just the reference to it (in practice the copy is deferred until one of the two is modified: copy-on-modify)
      • Attributes
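A small sketch of the copy semantics and attributes described above. The `units` attribute and the `set_first` helper are hypothetical, chosen only to make the behaviour visible.

```r
x <- c(1, 2, 3)
t_x <- typeof(x)   # "double": every object has a type

y <- x             # y behaves like an independent copy of x
y[1] <- 100        # modifying y triggers the actual copy (copy-on-modify)
print(x)           # 1 2 3: x is unchanged
print(y)           # 100 2 3

# Attributes attach metadata to an object.
attr(x, "units") <- "cm"
print(attributes(x))

# Function arguments follow the same rule: changes inside do not leak out.
set_first <- function(v) { v[1] <- 0; v }
z <- set_first(x)  # z starts with 0, x still starts with 1
```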
  25. S3 & S4 Objects
      • Many R functions are implemented using S3 methods
      • S version 4 (hence "S4") introduced formal classes and methods, which allow
        – Dispatch on multiple arguments
        – Abstract types
        – Inheritance
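For contrast with S4, a minimal S3 sketch: an S3 object is an ordinary value with a `class` attribute, and a generic dispatches on that attribute through `UseMethod()`. The generic `describe` and the class `student` are hypothetical names.

```r
describe <- function(obj, ...) UseMethod("describe")

describe.default <- function(obj, ...) "some object"
describe.student <- function(obj, ...) paste(obj$name, "scored", obj$score)

s <- structure(list(name = "david", score = 80), class = "student")

d1 <- describe(s)    # "david scored 80"
d2 <- describe(42)   # "some object": falls back to the default method

# Existing generics such as print() can be extended the same way.
print.student <- function(x, ...) {
  cat("<student>", x$name, "\n")
  invisible(x)
}
print(s)
```

S3 has no formal class definition or validity checking; that informality is what S4 adds structure to.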
  26. OOP with S4
      • S4 OOP example
        – setClass("Student", representation(name = "character", score = "numeric"))
        – studenta = new("Student", name = "david", score = 80)
        – studentb = new("Student", name = "andy", score = 90)
        – setMethod("show", signature("Student"), function(object) {cat(object@score + 100, "\n")})
        – setGeneric("getscore", function(object) standardGeneric("getscore"))
        – setMethod("getscore", signature("Student"), function(object) object@score)
        – studenta; getscore(studentb)
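The previous slide lists inheritance as an S4 feature; this is a minimal sketch of it, built on the `Student` class above. `GradStudent`, its `advisor` slot, and the generic `report` are hypothetical names.

```r
library(methods)  # attached by default, explicit here for Rscript

setClass("Student", representation(name = "character", score = "numeric"))
setClass("GradStudent", contains = "Student",
         representation(advisor = "character"))

setGeneric("report", function(object) standardGeneric("report"))
setMethod("report", "Student",
          function(object) paste(object@name, "scored", object@score))
setMethod("report", "GradStudent", function(object) {
  # callNextMethod() runs the inherited Student method first
  paste(callNextMethod(), "advised by", object@advisor)
})

g <- new("GradStudent", name = "andy", score = 90, advisor = "lee")
desc <- report(g)        # "andy scored 90 advised by lee"
is_student <- is(g, "Student")   # TRUE: a GradStudent is also a Student
```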
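  27. Packages
      • A package is a related set of functions, help files, and data files that have been bundled together
      • Basic commands
        – library(rpart) (load an installed package)
        – CRAN
        – install.packages() (install a package from CRAN)
        – (.packages()) (list the attached packages)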
  28. Packages Used in Machine Learning for Hackers
  29. Apply
      • apply
        – Returns a vector, array, or list of values obtained by applying a function to margins of an array or matrix
        – data <- cbind(c(1, 2), c(3, 4))
        – data.rowsum <- apply(data, 1, sum)
        – data.colsum <- apply(data, 2, sum)
        – data
  30. Apply
      • lapply
        – Returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X
      • sapply
        – A user-friendly wrapper of lapply that by default returns a vector or matrix (or, with simplify = "array", an array) when possible
      • vapply
        – Similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use
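A side-by-side sketch of the apply family on small inputs, so the different return shapes are visible. The list `nums` is a hypothetical input; the last line shows `vapply` rejecting a result of the wrong type.

```r
data <- cbind(c(1, 2), c(3, 4))    # 2 x 2 matrix: rows (1,3) and (2,4)
row_sums <- apply(data, 1, sum)    # MARGIN = 1, rows:    4 6
col_sums <- apply(data, 2, sum)    # MARGIN = 2, columns: 3 7

nums <- list(a = 1:3, b = 4:6)

l <- lapply(nums, sum)             # a list: a = 6, b = 15
s <- sapply(nums, sum)             # simplified to a named vector: a = 6, b = 15

# vapply declares the result type up front (one integer per element)
# and raises an error if any result does not match.
v <- vapply(nums, sum, FUN.VALUE = integer(1))

# mean() returns a double, so this vapply call fails.
bad <- tryCatch(vapply(nums, mean, FUN.VALUE = integer(1)),
                error = function(e) "type mismatch")
```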
  31. File IO
      • Save and load
        – x = USPersonalExpenditure
        – save(x, file = "~/test.RData")
        – rm(x)
        – load("~/test.RData")
        – x
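The same save/load round trip as a self-contained sketch that writes to a temporary file instead of `~/test.RData`. It also shows `saveRDS()`/`readRDS()`, a related base R pair that stores one object without its name.

```r
x <- USPersonalExpenditure            # built-in 5 x 5 matrix (datasets package)
path <- tempfile(fileext = ".RData")  # temporary file, so nothing in ~ is touched

save(x, file = path)   # writes the object and its name to disk
rm(x)
loaded <- load(path)   # restores x into the workspace and returns the loaded names

# saveRDS()/readRDS() store a single object without its name,
# so the result can be assigned to any variable.
rds_path <- tempfile(fileext = ".rds")
saveRDS(x, rds_path)
y <- readRDS(rds_path)
```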
  32. Charts and Graphics
  33. Plotting Example
        – xrange = range(as.numeric(colnames(USPersonalExpenditure)))
        – yrange = range(USPersonalExpenditure)
        – plot(xrange, yrange, type = "n", xlab = "Year", ylab = "Expenditure (billions of dollars)")
        – for (i in 1:5) {lines(as.numeric(colnames(USPersonalExpenditure)), USPersonalExpenditure[i, ], type = "b", lwd = 1.5)}
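An alternative sketch of the same chart using `matplot()`, which draws one line per column and so replaces the explicit loop; a legend is added so the five expenditure categories can be told apart. It draws to a temporary PDF file so it also runs without a screen device.

```r
yrs <- as.numeric(colnames(USPersonalExpenditure))   # 1940, 1945, ..., 1960

out <- tempfile(fileext = ".pdf")
pdf(out)   # draw to a PDF file instead of the screen

# matplot plots each column as a line, so transpose: rows become years.
matplot(yrs, t(USPersonalExpenditure), type = "b", pch = 1:5, lty = 1, col = 1:5,
        xlab = "Year", ylab = "Expenditure (billions of dollars)")
legend("topleft", legend = rownames(USPersonalExpenditure),
       col = 1:5, pch = 1:5, lty = 1, cex = 0.8)

dev.off()
```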
  34. IRIS Dataset
      • data() (list the data sets available in the loaded packages)
  35. IRIS Dataset
      • The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by Sir Ronald Fisher (1936) as an example of discriminant analysis. It is sometimes called Anderson's Iris data set.
        – http://en.wikipedia.org/wiki/Iris_flower_data_set
      • Pictured: Iris setosa, Iris versicolor, Iris virginica
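A quick base R sketch for inspecting `iris` before classifying it: its size, the class balance, and per-species means of the four measurements.

```r
data(iris)   # built-in data set, already available by default

dims    <- dim(iris)             # 150 rows, 5 columns
species <- table(iris$Species)   # 50 each of setosa, versicolor, virginica

# Mean of every measurement within each species
means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(means)
```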
  36. Classification of IRIS
      • Classification example
        – install.packages("e1071"); library(e1071)
        – pairs(iris[1:4], main = "Iris Data (red=setosa, green=versicolor, blue=virginica)", pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])
        – classifier <- naiveBayes(iris[, 1:4], iris[, 5])
        – table(predict(classifier, iris[, -5]), iris[, 5])
        – classifier <- svm(iris[, 1:4], iris[, 5]); table(predict(classifier, iris[, -5]), iris[, 5])
        – prediction = predict(classifier, iris[, 1:4])
      • These tables compare predictions with the true labels on the same data used for training, so they overstate accuracy on new data
      • http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/Na%C3%AFve_Bayes
  37. Performance Tips
      • Use Built-in Math Functions
      • Use Environments for Lookup Tables
      • Use a Database to Query Large Data Sets
      • Preallocate Memory
      • Monitor How Much Memory You Are Using
      • Clean Up Objects
      • Functions for Big Data Sets
      • Parallel Computation with R
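A sketch of three of the tips above: preallocation, built-in vectorized math, and an environment used as a lookup table. The function names and keys are hypothetical. Growing a vector element by element (as `create.vector.of.ones` on slide 22 does) may cause repeated reallocation; recent R versions reduce but do not remove that cost. Timings vary by machine, so they are printed rather than asserted.

```r
n <- 10000

grow <- function(n) {           # grows the vector one element at a time
  v <- c()
  for (i in seq_len(n)) v[i] <- i^2
  v
}
prealloc <- function(n) {       # allocates the full length once
  v <- numeric(n)
  for (i in seq_len(n)) v[i] <- i^2
  v
}
vectorized <- function(n) seq_len(n)^2   # built-in vectorized arithmetic

print(system.time(a <- grow(n)))
print(system.time(b <- prealloc(n)))
print(system.time(v_vec <- vectorized(n)))

# An environment works as a hashed key -> value lookup table.
lookup <- new.env(hash = TRUE)
assign("apple", 3, envir = lookup)
lookup[["banana"]] <- 5
price      <- get("banana", envir = lookup)                     # 5
has_cherry <- exists("cherry", envir = lookup, inherits = FALSE)  # FALSE

# Monitor memory use of an object, then clean it up.
print(object.size(b), units = "Kb")
rm(a)
```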
  38. R for Machine Learning
  39. Getting Help on a Topic
      • ?read.delim
        – # Access a function's help file
      • ??base::delim
        – # Search for "delim" in all help files for functions in base
      • help.search("delimited")
        – # Search for "delimited" in all help files
      • RSiteSearch("parsing text")
        – # Search for the term "parsing text" on the R site
  40. Sample Code of Chapter 1
      • https://github.com/johnmyleswhite/ML_for_Hackers.git
  41. References & Resources
  42. Study Material
      • R in a Nutshell
  43. Online Reference
  44. Community Resources for R Help
  45. Resources
      • Websites and mailing lists
        – Stack Overflow
        – Cross Validated
        – R-help
        – R-devel
        – R-sig-*
        – Package-specific mailing lists
      • Blog
        – R-bloggers
      • Twitter
        – https://twitter.com/#rstats
      • Quora
        – http://www.quora.com/R-software
  46. Resources (cont'd)
      • Conferences
        – useR!
        – R in Finance
        – R in Insurance
        – Others
        – Joint Statistical Meetings
        – Royal Statistical Society Conference
      • Local user groups
        – http://blog.revolutionanalytics.com/local-r-groups.html
      • Taiwan R User Group
        – http://www.facebook.com/Tw.R.User
        – http://www.meetup.com/Taiwan-R/
  47. Thank You!
