But Other Math Stuff!
• Mathematica
• MATLAB
• Minitab
• Maple
• Excel (yes. shutup h8rs. ask your CFOs what they use)
• R provides sophisticated statistical and modeling capabilities, and is extensible through your own code

Get R
• Available for Linux, Mac, Windows
• http://www.r-project.org/

Fire!
• R console on Mac
• Interactive interpreter for your R needs
• Can also run from the command line: R

R Basics
• R considers all elements to be vectors
• A single number is a one-element vector
• Use <- for assignment
• Use c() to concatenate values into a vector

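A minimal sketch of the basics above, using a few illustrative load-average numbers (the variable names are made up for this example):

```r
# Assignment uses <- ; a single number is a vector of length 1.
x <- 42
length(x)      # 1
is.vector(x)   # TRUE

# c() concatenates values (or whole vectors) into one vector.
loads <- c(3.79, 3.11, 2.94)
loads <- c(loads, 4.81)   # "append" by concatenating onto the old vector
length(loads)  # 4
```
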
Practice Datasets
• data()
• shows the sample sets included with your R

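A small sketch of poking at the bundled datasets from a script rather than the interactive pager. It assumes the standard "datasets" package, which ships with base R; mtcars is one of its sets.

```r
# data() returns an object whose $results matrix lists the available sets.
sets <- data(package = "datasets")$results
head(sets[, c("Item", "Title")])

# Bundled datasets can be used directly by name.
head(mtcars, 3)
nrow(mtcars)   # 32
```
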
Functions
• Looks familiar!
• Let’s see one!
• “evencount” counts the number of even ints in a vector

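The code from the original slide did not survive, so here is one way "evencount" might look. Treat it as a sketch under that assumption, not the presenter's exact code.

```r
# Count the even integers in a vector x (assumed to hold whole numbers).
evencount <- function(x) {
  k <- 0                            # running count of even values
  for (n in x) {
    if (n %% 2 == 0) k <- k + 1     # %% is the modulo operator
  }
  return(k)
}

evencount(c(1, 2, 3, 4, 6))   # 3
```

Since R works on whole vectors, `sum(x %% 2 == 0)` gives the same answer without the loop.
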
Datatypes
• Vectors, the important ones
• Scalars are really single-element vectors
• Character strings
• Matrices, rectangular arrays of numbers
• Lists
• Tables, useful for data transitions and temp work

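Matrices and tables get no console example elsewhere in the deck, so here is a quick sketch. The status codes are made-up values for illustration.

```r
# A matrix is a vector with dimensions; it is filled column by column by default.
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)     # 2 3
m[2, 3]    # row 2, column 3 -> 6

# table() tallies how often each value occurs.
codes <- c(200, 200, 401, 200, 404)   # hypothetical HTTP status codes
counts <- table(codes)
counts[["200"]]   # 3
```
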
Vectors
• R’s most-used data structure
• All elements in a vector must have the same mode or data type
• To add values to a vector, you concatenate into it with the c() function
• Many mathematical functions can be applied to a whole vector; vectors can also be traversed like arrays
• Index starts at 1, not 0!

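The vector rules above, sketched with throwaway values:

```r
v <- c(10, 20, 30)
v[1]       # indexing starts at 1 -> 10
v * 2      # arithmetic is applied element-wise -> 20 40 60
mean(v)    # 20

# Mixing modes coerces every element to one common mode (here: character).
mixed <- c(1, "a", TRUE)
mode(mixed)   # "character"

# Traversing a vector like an array.
total <- 0
for (i in seq_along(v)) total <- total + v[i]
total      # 60
```
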
Character Strings
• Single-element vectors with mode character
• Can do the normal string things, like
    > t <- paste("yo","dawg")
    > t
    [1] "yo dawg"
    > u <- strsplit(t,"")
    > u
    [[1]]
    [1] "y" "o" " " "d" "a" "w" "g"

    > y <- "abc"
    > length(y)
    [1] 1
    > mode(y)
    [1] "character"

Lists
• Contain elements of different types
• Have a particular syntax
    > x <- list(u=2, v="abc")
    > x
    $u
    [1] 2

    $v
    [1] "abc"

    > x$u
    [1] 2

Data Frames
• Matrices are limited to a single type for all elements
• A data frame can contain different types of data, and can be read in from a file or created on the fly
    > df <- data.frame(list(kids=c("Olivia","Madison"),ages=c(10,8)))
    > df
         kids ages
    1  Olivia   10
    2 Madison    8
    > df$ages
    [1] 10  8

Putting R to Work
• Read in a log file:
    access <- read.table("access.log", header=FALSE)
    > head(access)
                V1 V2 V3                    V4     V5                          V6  V7   V8
    1 192.168.1.10  -  - [23/Oct/2011:07:03:33 -0500]  GET /menu/menu.js HTTP/1.1 401  401
    2 192.168.1.10  -  - [23/Oct/2011:07:03:33 -0500]  GET /menu/menu.js HTTP/1.1 200 1970
    3 192.168.1.10  -  - [23/Oct/2011:07:03:33 -0500] GET /menu/menu.css HTTP/1.1 200 2258

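A self-contained sketch of the same read. It assumes the request field is quoted in the raw log, which is what makes read.table() produce eight columns. The three lines are written to a temporary file so nothing depends on a real access.log.

```r
# Sample lines modeled on the head(access) output; the quoting is an assumption.
log_lines <- c(
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.js HTTP/1.1" 401 401',
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.js HTTP/1.1" 200 1970',
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.css HTTP/1.1" 200 2258'
)
tmp <- tempfile(fileext = ".log")
writeLines(log_lines, tmp)

# Whitespace-separated fields; quoted strings stay together as one field.
access <- read.table(tmp, header = FALSE, stringsAsFactors = FALSE)
ncol(access)       # 8

# V7 holds the return code; table() counts each distinct code.
table(access$V7)
```
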
Fun with Plots
• This plot series is going to make use of the “return codes” from the access log
• We’ll do a series of plots that gradually get more sophisticated
• This is a basic histogram of the data. It’s not much fun

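The plots themselves did not survive the export, so this is only a sketch of how such a series could start. The codes vector is hypothetical, and the barplot is one possible "more sophisticated" step, not necessarily the one shown in the talk.

```r
codes <- c(200, 200, 200, 304, 401, 404, 200, 500)   # hypothetical return codes

# Basic histogram: treats the codes as plain numbers.
h <- hist(codes, main = "HTTP Return Codes", xlab = "Return code")

# Treating each code as a category usually reads better.
counts <- table(codes)
barplot(counts,
        main = "Requests by Return Code",
        xlab = "Return code", ylab = "Requests",
        col = heat.colors(length(counts)))
```
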
Writing Graphical Output to Files
• Set up the output target by calling a graphics function: pdf(), png(), jpeg(), etc
    jpeg("/var/www/images/returncodes-date.jpg")
• Call the plot function you have chosen, then call dev.off()
• Can be used in batch mode to create graphics from your data

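The open, plot, close sequence as a runnable sketch. It uses pdf() and a temporary path (both choices made for this example) because pdf() works on any R build; jpeg() and png() follow the same pattern where the build supports them.

```r
codes <- c(200, 200, 304, 401, 404)               # hypothetical return codes
out <- file.path(tempdir(), "returncodes.pdf")    # hypothetical output path

pdf(out)                  # 1. open the file as the graphics device
barplot(table(codes))     # 2. draw; output goes to the file, not the screen
dev.off()                 # 3. close the device so the file is written out

file.exists(out)          # TRUE
```
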
Shopping is Hard, Let’s Do Math
• Read in some load averages (one-min)
    loadavg<-read.table("load_avg.txt")
    head(loadavg)
        V1
    1 3.79
    2 3.11
    3 2.94
    4 4.81

Summary Stats
• Summarize the data with one function call
• Gives the min, max, mean, median, and quartiles
    summary(loadavg)
           V1
     Min.   :0.760
     1st Qu.:1.390
     Median :1.970
     Mean   :2.302
     3rd Qu.:3.080
     Max.   :5.070

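A small sketch of pulling individual numbers back out of summary(). The six values are a hypothetical stand-in for load_avg.txt, so the results will not match the output above.

```r
loadavg <- data.frame(V1 = c(3.79, 3.11, 2.94, 4.81, 0.76, 1.39))  # hypothetical sample

s <- summary(loadavg$V1)   # named vector: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
s[["Median"]]              # 3.025
s[["Max."]]                # 4.81

# The individual pieces are also available as their own functions.
quantile(loadavg$V1, c(0.25, 0.75))
```
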
Same Thing, 3 Datacenters
    > cpu<-read.table("cpu")
    > head(cpu)
        V1  V2
    1 3.78 smq
    2 2.57 smq
    3 3.69 smq
    4 0.86 smq

    boxplot(cpu[,1] ~ cpu[,2],
            xlab="Load Average at Time t, by Datacenter",
            ylab="One-Minute Load Average",
            main="Box Plot of One-Minute Load Average, FEs",
            col=topo.colors(3))
• Looks like there are outliers. That could spell trouble! You found them with R awesomeness. Hooray!

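boxplot() also returns the numbers it drew, so the outliers can be read off without squinting at the picture. A sketch with made-up data: "smq" comes from the slide, while "dc2", "dc3", and most of the values are hypothetical.

```r
cpu <- data.frame(
  load = c(3.78, 2.57, 3.69, 0.86,          # smq (values from the slide)
           1.0, 1.1, 1.2, 1.3, 1.4, 9.0,    # dc2 (hypothetical, one spike)
           2.0, 2.2, 2.4, 2.6, 2.8),        # dc3 (hypothetical)
  dc = factor(c(rep("smq", 4), rep("dc2", 6), rep("dc3", 5)))
)

b <- boxplot(load ~ dc, data = cpu,
             xlab = "Datacenter", ylab = "One-Minute Load Average",
             col = topo.colors(3))

b$out               # values drawn as outlier points -> 9
b$names[b$group]    # which group each outlier belongs to -> "dc2"
```
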
Running R in Your Workflow
• The little bit of boxplotting we did earlier, in a script:
    [mandi@mandi ~]$ cat sample.R
    #!/usr/bin/env Rscript
    cpu<-read.table("cpu")
    jpeg("./sample.jpg")
    boxplot(cpu[,1] ~ cpu[,2],
            xlab="Load Average at Time t, by Datacenter",
            ylab="One-Minute Load Average",
            main="Box Plot of One-Minute Load Average, FEs",
            col=heat.colors(3))
    dev.off()
    [mandi@mandi ~]$ Rscript sample.R > /dev/null
    [mandi@mandi ~]$ ls -l sample.jpg
    -rw-rw-r-- 1 mandi staff 20137 Oct 24 20:44 sample.jpg

What Else?
• R can read data input from a variety of files with regular formats
• R can also fetch data from the internet using the url() function
• R has a number of functions available for dealing with reading data, creating data frames or other structures, and converting string text into numerical data modes
• Extended packages provide support for structured data formats like JSON.

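A short sketch of the string-to-number conversion and of reading from something other than a file on disk. The inline text stands in for real input, and the commented-out URL is a placeholder, not a real data source.

```r
# Text that looks like numbers has mode character until converted.
raw <- c("3.79", "3.11", "2.94")
mode(raw)            # "character"
nums <- as.numeric(raw)
mean(nums)           # 3.28

# read.table() can parse inline text and guesses column types itself.
cpu <- read.table(text = "3.78 smq\n2.57 smq", header = FALSE)
is.numeric(cpu$V1)   # TRUE

# It also accepts a connection such as url(); not run here (hypothetical address):
# remote <- read.table(url("http://example.com/load_avg.txt"))
```
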
References
• http://www.slideshare.net/dataspora/an-interactive-introduction-to-r-programming-language-for-statistics
• http://www.harding.edu/fmccown/R/
• Art of R Programming, Norman Matloff, Copyright 2011 No Starch Press
• Statistical Analysis with R, John M. Quick, Copyright 2011 Packt Publishing