R for Pirates     Mandi Walls      @lnxchk EscConf, Boston, MA  October 27, 2011
Short deck on using R for some adhoc modeling, focused on data from things like system performance or application logs.

    1. R for Pirates
       Mandi Walls, @lnxchk
       EscConf, Boston, MA, October 27, 2011
    2. whoami
       • stats misfit
       • R tinkerer
       • large-farm runner
       • not a professional statistician :D
    3. What is R
       • Scripting language for stats work
       • Inspired by the earlier S (for statistics), developed at AT&T
       • FOSS
       • Syntax descends from the Algol family, so it looks somewhat like C/C++
    4. What Does R Do?
       • Manipulate data
       • Complex Modeling and Computation
       • Graphics and Visualization
    5. Why R?
       • WHY NOT!?
    6. But Other Math Stuff!
       • Mathematica
       • MATLAB
       • Minitab
       • Maple
       • Excel (yes. shutup h8rs. ask your CFOs what they use)
       • R provides sophisticated statistical and modeling capabilities, and is extensible through your own code
    7. Get R
       • Available for Linux, Mac, Windows
       • http://www.r-project.org/
    8. Fire!
       • R console on Mac
       • Interactive interpreter for your R needs
       • Can also run from the command line: R
    9. R Basics
       • R considers all elements to be vectors
       • A single number is a one-element vector
       • Use <- for assignment
       • Use c() to concatenate values into a vector
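
The basics above fit in a few lines at the console. This is a minimal sketch; the variable names `x` and `v` are just placeholders chosen for illustration.

```r
# A single number is already a vector of length one
x <- 8
length(x)        # 1
is.vector(x)     # TRUE

# c() concatenates values into a longer vector
v <- c(1, 2, 3)
v <- c(v, 4)     # "appending" is just concatenating again
length(v)        # 4
```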
    10. Let’s see that again
    11. Practice Datasets
       • data()
       • shows the sample sets included with your R
    12. Functions
       • Looks familiar!
       • Let’s see one!
       • “evencount” counts the number of even ints in a vector
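
The slide's own code for `evencount` was shown as an image and is not captured in this transcript. What follows is a hypothetical reconstruction, assuming the function takes a numeric vector and returns a single count; the vectorized body is one of several ways to write it and may differ from what was on the slide.

```r
# Hypothetical reconstruction: count the even integers in a vector
evencount <- function(x) {
  # x %% 2 == 0 is TRUE for each even element; sum() counts the TRUEs
  sum(x %% 2 == 0)
}

evencount(c(1, 2, 3, 4, 6))   # 3
```

A loop with a counter works too, but operating on the whole vector at once is usually the more natural R style.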
    13. Datatypes
       • Vectors, the important ones
       • Scalars are really single-element vectors
       • Character strings
       • Matrices, rectangular arrays of numbers
       • Lists
       • Tables, useful for data transitions and temp work
    14. Vectors
       • R’s most-used data structure
       • All elements in a vector must have the same mode or data type
       • To add values to a vector, you concatenate into it with the c() function
       • Many mathematical functions operate on a whole vector at once
       • Vectors can also be traversed element by element, like arrays
       • Index starts at 1, not 0!
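
A short sketch of those vector rules in action. The values are made up for illustration.

```r
v <- c(10, 20, 30)
v[1]                  # indexing starts at 1, so this is 10
v <- c(v, 40)         # add a value by concatenating
doubled <- v * 2      # arithmetic applies to every element
mean(v)               # 25

# Mixing modes forces everything to a common one (here, character)
mixed <- c(1, "a")
mode(mixed)           # "character"
```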
    15. Scalars
       • One-element vectors
         > x <- 8
         > x[1]
         [1] 8
       • also climb your rigging (image ©Disney)
    16. Character Strings
       • Single-element vectors with mode character
         > y <- "abc"
         > length(y)
         [1] 1
         > mode(y)
         [1] "character"
       • Can do normal string things, like
         > t <- paste("yo","dawg")
         > t
         [1] "yo dawg"
         > u <- strsplit(t,"")
         > u
         [[1]]
         [1] "y" "o" " " "d" "a" "w" "g"
    17. Matrices
       • Two-dimensional array
         > m <- rbind(c(1,4),c(2,2))
         > m
              [,1] [,2]
         [1,]    1    4
         [2,]    2    2
         > m[1,2]
         [1] 4
         > m[1,]
         [1] 1 4
    18. Lists
       • Contain elements of different types
       • Have a particular syntax
         > x <- list(u=2, v="abc")
         > x
         $u
         [1] 2

         $v
         [1] "abc"

         > x$u
         [1] 2
    19. Data Frames
       • Matrices are limited to only a single type for all elements
       • A data frame can contain different types of data, can be read in from a file or created in realtime
         > df <- data.frame(list(kids=c("Olivia","Madison"),ages=c(10,8)))
         > df
              kids ages
         1  Olivia   10
         2 Madison    8
         > df$ages
         [1] 10 8
    20. Putting R to Work
       • Read in a log file:
         > access <- read.table("access.log", header=FALSE)
         > head(access)
                     V1 V2 V3                    V4     V5                          V6  V7   V8
         1 192.168.1.10  -  - [23/Oct/2011:07:03:33 -0500]  GET /menu/menu.js HTTP/1.1 401  401
         2 192.168.1.10  -  - [23/Oct/2011:07:03:33 -0500]  GET /menu/menu.js HTTP/1.1 200 1970
         3 192.168.1.10  -  - [23/Oct/2011:07:03:33 -0500] GET /menu/menu.css HTTP/1.1 200 2258
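
To try the same thing without a real `access.log`, the three rows shown above can be fed to `read.table()` through its `text` argument. This is a sketch under one assumption: that the request field is wrapped in double quotes in the raw log, as in Apache's common log format. That assumption is what makes the request land in a single column (V6) and the return code in column 7.

```r
# Sample lines rebuilt from the head(access) output; the double quotes
# around the request are an assumption about the raw log format
log_lines <- c(
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.js HTTP/1.1" 401 401',
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.js HTTP/1.1" 200 1970',
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.css HTTP/1.1" 200 2258'
)

# read.table splits on whitespace but keeps quoted strings together
access <- read.table(text = log_lines, header = FALSE,
                     stringsAsFactors = FALSE)

ncol(access)                # 8 columns, V1 through V8
codes <- table(access[, 7]) # tally of return codes
codes
```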
    21. Fun with Plots
       • This plot series is going to make use of the “return codes” from the access log
       • We’ll do a series of plots that gradually get more sophisticated
       • This is a basic histogram of the data; it’s not much fun
    22. Barplot
         barplot(table(access[,7]))
    23. Barplot v2
         barplot(table(access[,7]), ylab="Number of Pages",
           xlab="Return Code", main="Plot of Return Codes")
    24. Barplot v3
         x <- table(access[,7])
         barplot(x, ylab="Number of Pages", xlab="Return Code",
           main="Plot of Return Codes", col=heat.colors(length(x)))
    25. Barplot v4
       Source: Wikipedia, http://en.wikipedia.org/wiki/Bar_%28establishment%29
    26. Writing Graphical Output to Files
       • Set up the output target by calling a graphics function:
       • pdf(), png(), jpeg(), etc.
       • jpeg("/var/www/images/returncodes-date.jpg")
       • Call the plot function you have chosen, then call dev.off()
       • Can be used in batch mode to create graphics from your data
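
The open-device, plot, close-device sequence looks like this in practice. This is a minimal sketch: the return codes are made up, the output goes to a temporary directory rather than the web path on the slide, and `pdf()` is used only because it works on machines without bitmap graphics support. Swapping in `png()` or `jpeg()` follows the same pattern.

```r
# Made-up return codes standing in for access[,7]
codes <- c(200, 200, 401, 200, 404)

# Hypothetical output location; any writable path works
out <- file.path(tempdir(), "returncodes.pdf")

pdf(out)                                   # 1. open the file device
barplot(table(codes),                      # 2. draw the plot into it
        xlab = "Return Code", ylab = "Number of Pages",
        main = "Plot of Return Codes")
invisible(dev.off())                       # 3. close the device to finish the file

file.exists(out)
```

Nothing is written completely until `dev.off()` is called, which is the step most often forgotten in batch scripts.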
    27. Shopping is Hard, Let’s Do Math
       • Read in some load averages (one-min)
         > loadavg<-read.table("load_avg.txt")
         > head(loadavg)
             V1
         1 3.79
         2 3.11
         3 2.94
         4 4.81
    28. Summary Stats
       • Summarize the data with one function call
       • Gives the min, max, mean, median, and quartiles
         > summary(loadavg)
                V1
          Min.   :0.760
          1st Qu.:1.390
          Median :1.970
          Mean   :2.302
          3rd Qu.:3.080
          Max.   :5.070
    29. Summary Stats as Boxplot
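
The individual numbers in that summary can also be pulled out by name. The sketch below uses only the four load averages visible in the `head(loadavg)` output, so its results will not match the full-file summary on the slide; it also calls `summary()` on a plain vector rather than a data frame, which returns a named vector that is easy to index.

```r
# Only the four values shown by head(loadavg); the full file is not available
loadavg <- c(3.79, 3.11, 2.94, 4.81)

s <- summary(loadavg)     # named: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
lo  <- s[["Min."]]
med <- s[["Median"]]
hi  <- s[["Max."]]

# quantile() gives the quartiles (or any other cut points) directly
q <- quantile(loadavg, c(0.25, 0.5, 0.75))
```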
    30. Same Thing, 3 Datacenters
         > cpu<-read.table("cpu")
         > head(cpu)
             V1  V2
         1 3.78 smq
         2 2.57 smq
         3 3.69 smq
         4 0.86 smq
         boxplot(cpu[,1] ~ cpu[,2], xlab="Load Average at Time t, by Datacenter",
           ylab="One-Minute Load Average",
           main="Box Plot of One-Minute Load Average, FEs", col=topo.colors(3))
       • Looks like there are outliers. That could spell trouble! You found them with R awesomeness. Hooray!
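
The outliers a boxplot draws as separate points can also be listed as numbers, which is handy in a script. This sketch uses `boxplot.stats()`, which applies the same 1.5 x IQR rule the plot uses by default. The `smq` values come from the `head(cpu)` output; the datacenter names `dc2` and `dc3` and all of their values are invented for illustration.

```r
# smq rows are from the slide; dc2 and dc3 are hypothetical
cpu <- data.frame(
  V1 = c(3.78, 2.57, 3.69, 0.86,
         1.2, 1.3, 1.4, 1.5, 9.5,
         2.0, 2.1, 2.2, 2.3),
  V2 = c(rep("smq", 4), rep("dc2", 5), rep("dc3", 4))
)

# Split the load averages by datacenter, then list each group's outliers
outliers <- lapply(split(cpu$V1, cpu$V2),
                   function(x) boxplot.stats(x)$out)

outliers$dc2    # the 9.5 spike
outliers$smq    # empty: nothing beyond the whiskers
```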
    31. Running R in Your Workflow
       • The little bit of boxplotting we did earlier, in a script:
         [mandi@mandi ~]$ cat sample.R
         #!/usr/bin/env Rscript
         cpu<-read.table("cpu")
         jpeg("./sample.jpg")
         boxplot(cpu[,1] ~ cpu[,2], xlab="Load Average at Time t, by Datacenter",
           ylab="One-Minute Load Average",
           main="Box Plot of One-Minute Load Average, FEs", col=heat.colors(3))
         dev.off()
         [mandi@mandi ~]$ Rscript sample.R > /dev/null
         [mandi@mandi ~]$ ls -l sample.jpg
         -rw-rw-r-- 1 mandi staff 20137 Oct 24 20:44 sample.jpg
    32. Hey!
       • I made a graph with a script!
    33. What Else?
       • R can read data input from a variety of files with regular formats
       • R can also fetch data from the internet using the url() function
       • R has a number of functions available for dealing with reading data, creating data frames or other structures, and converting string text into numerical data modes
       • Extended packages provide support for structured data formats like JSON.
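
Two of those points can be sketched without touching the network. `read.table()` accepts any connection, so the `textConnection()` below is only a stand-in for something like `url("http://...")`; the data values and the "n/a" placeholder are made up for illustration.

```r
# Converting string text into numbers; anything unparseable becomes NA
raw  <- c("3.79", "3.11", "n/a", "4.81")
nums <- suppressWarnings(as.numeric(raw))
avg  <- mean(nums, na.rm = TRUE)     # skip the NA when averaging

# Reading from a connection; a url() connection is read the same way
con <- textConnection("3.79\n3.11\n2.94")
loadavg <- read.table(con)
close(con)

nrow(loadavg)    # 3 rows in a one-column data frame
```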
    34. References
       • http://www.slideshare.net/dataspora/an-interactive-introduction-to-r-programming-language-for-statistics
       • http://www.harding.edu/fmccown/R/
       • Art of R Programming, Norman Matloff, Copyright 2011 No Starch Press
       • Statistical Analysis with R, John M. Quick, Copyright 2011 Packt Publishing
