R Language

dwivedishashwat@gmail.com
Background

• 1991: Created by Ross Ihaka and Robert Gentleman
• 2000: R version 1.0.0 is released
• Latest version is 2.15.2, released in Oct '12
• R version 2.15.3 is scheduled to be released in Mar '13 and 3.0.0 in Apr '13
• http://www.r-project.org (basic information about R)
• http://www.cran.r-project.org (base system and
additional packages)
• help() or ?help, help.search() or ??help
Background

• R is a free software environment for
statistical computing and graphics
• Very active and vibrant user community
• Graphical capabilities
• Works in physical memory: the data being analysed is held in RAM
• Base R plus around 4000 add-on packages

1/21/2014
Introduction
• memory.limit(): Finds out the maximum amount of available physical memory (Windows only; no longer functional from R 4.2.0)
• memory.size(): Finds out how much memory is in use (Windows only; no longer functional from R 4.2.0)
• getwd(): Shows the path of your current working directory
• setwd(path): Sets a new path for your current working directory
• dir(): Lists all the files in your working directory
• Program Editor (open, load, run, save)
• ls(): Lists all objects in your workspace
• rm(): Removes objects from your workspace
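The workspace commands above can be tried in a short session. This is a minimal sketch: it assumes the session's temporary directory (tempdir()) as a safe place to switch into, and the file name example.txt and the objects a and b are illustrative only. memory.limit() and memory.size() are left out because they are Windows-specific.

```r
# Remember the current working directory so it can be restored at the end
old_wd <- getwd()

# Move into the session's temporary directory (it always exists and is writable)
setwd(tempdir())
print(getwd())

# Create a small file so that dir() has something to list
writeLines("hello", "example.txt")
print("example.txt" %in% dir())   # TRUE

# Create two objects, list them, then remove one of them
a <- 1
b <- "two"
print(ls())                        # includes "a" and "b"
rm(a)
print(exists("a"))                 # FALSE

# Tidy up: delete the file and return to the original directory
invisible(file.remove("example.txt"))
setwd(old_wd)
```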
Introduction

• Commands to R are expressions (4/3) or assignments (x <- 4/3)
• R is case sensitive
• Everything in R is an object
• R objects are normally accessed by their names, which are made up of letters, digits (0 to 9), periods (".") and underscores ("_"); a name must start with a letter or a period, and a leading period cannot be followed by a digit
• Every object has a class
• R has 5 basic classes of objects
  - character
  - numeric (real numbers)
  - integer
  - complex
  - logical (TRUE / FALSE)
• The most basic object is a vector
  - A vector can only contain objects of the same class
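A quick way to check the five basic classes and case sensitivity is to ask class() about one value of each kind. This is a small sketch; the object names are illustrative only.

```r
# One value of each of R's five basic classes
values <- list("a", 3.14, 5L, 1 + 2i, TRUE)
classes <- sapply(values, class)
print(classes)   # "character" "numeric" "integer" "complex" "logical"

# R is case sensitive: 'value' and 'Value' are two different objects
value <- 1
Value <- 2
print(value == Value)   # FALSE

# A single number is already a vector of length 1
print(length(3.14))     # 1
print(is.vector(3.14))  # TRUE
```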
Background
• Ex.
  x <- 1      # assignment
  print(x)    # explicit printing
  x           # auto printing

• Ex.
  x <- c(0.5, 0.6)            # numeric
  x <- c(TRUE, FALSE)         # logical
  x <- c(T, F)                # logical
  x <- c("a", "b", "c", "d")  # character
  x <- 1:20                   # integer
  x <- c(1+0i, 2+4i)          # complex
  seq(from=1, to=10, by=1)
  rep(c(1,2,3,4,5), times=2, each=2)

• Q. What is the class of each of these vectors?
  x <- c(1.7, "a")
  x <- c(TRUE, 2)
  x <- c("a", TRUE)

When objects of different classes are mixed in a vector, coercion occurs so that every element of the vector is of the same class
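The answers to the coercion questions, and the result of the seq() and rep() calls, can be checked directly. A small sketch; the names q1, q2, q3, s and r are introduced only for this illustration.

```r
# The coercion questions: c() converts all elements to a common class
q1 <- c(1.7, "a")    # the number becomes text: "1.7" "a"
q2 <- c(TRUE, 2)     # TRUE becomes 1
q3 <- c("a", TRUE)   # TRUE becomes "TRUE"
print(sapply(list(q1, q2, q3), class))  # "character" "numeric" "character"

# seq() and rep() from the same slide
s <- seq(from = 1, to = 10, by = 1)              # 1, 2, ..., 10
r <- rep(c(1, 2, 3, 4, 5), times = 2, each = 2)  # 'each' is applied first, then 'times'
print(r)  # 1 1 2 2 3 3 4 4 5 5 1 1 2 2 3 3 4 4 5 5

# Explicit conversion is also possible with the as.* functions
print(as.numeric(c("1", "2.5")))  # 1.0 2.5
```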
Session 1 Remaining

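• rep(x, times, length.out, each): repeats the elements of x
• rm(): removes the named objects
• rm(list=ls(pattern="^test")): removes all objects whose names start with "test"
• rm(list=ls(pattern="test")): removes all objects whose names contain "test"
• rm(list=setdiff(ls(), "test")): removes every object except test
• rm(list=ls()): removes all objects in the workspace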
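The rep() arguments and the four rm() patterns can be compared side by side. To avoid wiping the real workspace, this sketch runs each rm() call inside a throw-away environment created by a hypothetical helper make_ws(); the object names test, test_a, my_test and other are illustrative.

```r
# rep(): the times, each and length.out arguments
print(rep(1:3, times = 2))        # 1 2 3 1 2 3
print(rep(1:3, each = 2))         # 1 1 2 2 3 3
print(rep(1:3, length.out = 7))   # 1 2 3 1 2 3 1

# Hypothetical helper: a fresh environment holding four sample objects
make_ws <- function() {
  ws <- new.env()
  for (nm in c("test", "test_a", "my_test", "other")) assign(nm, 1, envir = ws)
  ws
}

ws1 <- make_ws()
rm(list = ls(ws1, pattern = "^test"), envir = ws1)   # names starting with "test"
print(ls(ws1))   # "my_test" "other"

ws2 <- make_ws()
rm(list = ls(ws2, pattern = "test"), envir = ws2)    # names containing "test"
print(ls(ws2))   # "other"

ws3 <- make_ws()
rm(list = setdiff(ls(ws3), "test"), envir = ws3)     # everything except "test"
print(ls(ws3))   # "test"

ws4 <- make_ws()
rm(list = ls(ws4), envir = ws4)                      # everything
print(ls(ws4))   # character(0)
```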
Introduction

• Ex.
  x <- 0:6
  x <- c("a", "b", "c")
  x <- c(1, 2, 3)
• Numbers in R are generally treated as numeric (i.e. double precision real numbers)
• If you explicitly want an integer, you need to specify the L suffix (e.g. 1L)
• The special number Inf represents infinity (1/0 gives Inf); it is a real number, so 1/Inf gives 0
• The undefined value NaN ("Not a Number", e.g. 0/0) can be thought of as a missing value
• # indicates a comment: everything after it on the line is ignored
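The rules for numbers, the L suffix, Inf and NaN can be verified in a few lines. A minimal sketch with illustrative object names.

```r
x <- c(1, 2, 3)
print(class(x))        # "numeric" (double precision)
y <- c(1L, 2L, 3L)
print(class(y))        # "integer" because of the L suffix
print(class(0:6))      # "integer": the colon operator returns integers here

print(1 / 0)           # Inf
print(1 / Inf)         # 0
print(class(Inf))      # "numeric": Inf is a real number
print(0 / 0)           # NaN
print(is.nan(0 / 0))   # TRUE
print(is.na(0 / 0))    # TRUE: NaN is also treated as missing
print(is.nan(NA))      # FALSE: but NA is not NaN
```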
Data Types

• R objects can have attributes (attributes())
  - class (class())
  - length (length())
  - names (column names for a data frame), dimnames (row and column names for a matrix)
  - dimensions (dim())
  - other user-defined attributes
• Various data types in R
  - Vectors: vector(mode, length)
  - Lists: special type of vector which can contain objects of different classes
    x <- list(1, 2, 3, "a", "b", "c")
    x <- list(a=c(1,2,3), b=1:4, c=c("a","b","c"))
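A short sketch of vector(), a mixed-class list and its attributes. The "units" attribute is a made-up example of a user-defined attribute, set with attr().

```r
# vector(mode, length) creates a vector filled with the mode's default value
v <- vector("numeric", 3)     # 0 0 0
w <- vector("character", 2)   # "" ""

# A list can hold elements of different classes
x <- list(a = c(1, 2, 3), b = 1:4, c = c("a", "b", "c"))
print(class(x))             # "list"
print(length(x))            # 3
print(names(x))             # "a" "b" "c"
print(attributes(x))        # only $names here
print(sapply(x, class))     # a = "numeric", b = "integer", c = "character"

# A user-defined attribute, attached with attr()
attr(v, "units") <- "cm"
print(attributes(v))        # $units
```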
Data Types
• Matrix: a vector with a dimension attribute. The dimension itself is an integer vector of length 2 (nrow, ncol). Matrices are filled column-wise.
  m <- matrix(nrow=2, ncol=3)
  m <- matrix(1:6, nrow=2, ncol=3)
  x <- 1:3
  y <- 10:12
  cbind(x, y)
  rbind(x, y)
• Data frames (data.frame())
  https://stat.ethz.ch/pipermail/rhelp/attachments/20101027/05a229bb/attachment.pl
• Factors: used for categorical data, e.g. Male & Female or analyst, senior analyst, manager etc.
  x <- factor(c("a", "b", "b", "c", "c", "c", "d"))
  levels(x)
  unclass(x)
  levels(x[4:6])
  levels(x[4:6, drop=TRUE])
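Running the matrix, data frame and factor examples shows how the dimensions, column-wise filling and factor levels behave. In this sketch the factor is called f (rather than x) so it does not clash with the x used for cbind(); the data frame columns are illustrative.

```r
m <- matrix(nrow = 2, ncol = 3)    # 2 x 3 matrix of NA
print(dim(m))                      # 2 3

m <- matrix(1:6, nrow = 2, ncol = 3)
print(m[, 1])                      # 1 2: the matrix is filled column-wise
print(m[1, ])                      # 1 3 5

x <- 1:3
y <- 10:12
cb <- cbind(x, y)                  # 3 x 2, columns named x and y
rb <- rbind(x, y)                  # 2 x 3, rows named x and y
print(dim(cb)); print(dim(rb))

# A data frame is a list of equal-length columns that may differ in class
df <- data.frame(x = x, label = c("a", "b", "c"))
str(df)

f <- factor(c("a", "b", "b", "c", "c", "c", "d"))
print(levels(f))                     # "a" "b" "c" "d"
print(table(f))                      # count per level
print(unclass(f))                    # underlying integer codes 1 2 2 3 3 3 4
print(levels(f[4:6]))                # "a" "b" "c" "d": unused levels are kept
print(levels(f[4:6, drop = TRUE]))   # "c": unused levels are dropped
```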
Date & Time

• Converting a character variable to a date variable
  - as.Date(variable_name, input_format)
  - strptime(variable_name, input_format)
  - The result is printed as %Y-%m-%d (dates) or %Y-%m-%d %H:%M:%S (date-times)
  - %Y: Year with century
  - %m: Month as decimal number (01-12)
  - %d: Day of the month as decimal number (01-31)
  - %H: Hours as decimal number (00-23)
  - %M: Minutes as decimal number (00-59)
  - %S: Seconds as decimal number (00-59)
• Converting a date variable to a character variable / formatting a date variable
  - strftime(date_variable_name, output_format)
  - format(date_variable_name, output_format)
  - as.character(date_variable_name, output_format)
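A minimal round trip between characters and dates using the format codes above. The sample date and time are arbitrary, and tz = "UTC" is an assumption added so the results do not depend on the machine's local time zone.

```r
# Character -> Date, with the input format described by the % codes
d <- as.Date("21/01/2014", format = "%d/%m/%Y")
print(class(d))           # "Date"
print(d)                  # "2014-01-21"

# Character -> date-time (POSIXlt), time zone fixed explicitly
dt <- strptime("2014-01-21 14:30:05", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(class(dt))          # "POSIXlt" "POSIXt"

# Date / date-time -> character in a chosen output format
print(format(d, "%d.%m.%Y"))                  # "21.01.2014"
print(format(dt, "%H:%M"))                    # "14:30"
print(strftime(dt, "%Y/%m/%d", tz = "UTC"))   # "2014/01/21"
```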
Sub-setting

• [ always returns an object of the same class as the original (except that subsetting a matrix drops dimensions unless drop=FALSE); it can be used to select more than one element
• [[ is used to extract elements of a list or data frame; it can only extract a single element, and the class of the returned object will not necessarily be a list or data frame
• $ is used to extract elements of a list or data frame by name; its semantics are similar to [[
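The difference between [, [[ and $ shows up clearly in the class of what comes back. A small sketch with an illustrative list and data frame.

```r
x <- list(foo = 1:4, bar = 0.6)

a <- x[1]          # `[` keeps the class: a list of length 1
b <- x[[1]]        # `[[` extracts the element itself: an integer vector
bar_val <- x$bar   # `$` extracts by name, like x[["bar"]]
print(class(a))    # "list"
print(class(b))    # "integer"
print(bar_val)     # 0.6

# `[` can select several elements at once
print(names(x[c("foo", "bar")]))  # "foo" "bar"

# The same rules apply to data frames, whose columns are list elements
df <- data.frame(id = 1:3, score = c(10, 20, 30))
print(class(df["score"]))     # "data.frame"
print(class(df[["score"]]))   # "numeric"
```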
Operators

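• <: Less than
• <=: Less than or equal to
• >: Greater than
• >=: Greater than or equal to
• ==: Equal to
• !=: Not equal to
• | or ||: OR
• & or &&: AND
• !: NOT
• | and & work element-wise on vectors; || and && work on a single TRUE/FALSE value and only evaluate the right-hand side when needed, which suits if conditions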
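The element-wise operators return one logical value per element, while && and || stop as soon as the result is known. A minimal sketch with illustrative values.

```r
x <- c(1, 5, 10)

print(x > 2)              # FALSE TRUE TRUE
print(x >= 5 & x < 10)    # FALSE TRUE FALSE  (element-wise AND)
print(x < 2 | x > 8)      # TRUE FALSE TRUE   (element-wise OR)
print(!(x == 5))          # TRUE FALSE TRUE
print(x != 5)             # same result as the line above

# || short-circuits: y > 0 is never evaluated because is.null(y) is TRUE
y <- NULL
print(is.null(y) || y > 0)   # TRUE
```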
Some Examples

• x <- c("a", "b", "c", "c", "d", "a")
  x[1], x[1:4], x[x > "a"], u <- x > "a"
• x <- matrix(1:6, 2, 3)
  x[1,2], x[1,], x[,1], x[1,2, drop=FALSE]
• x <- list(var_1=c(1:10), var_2=c("a", "b", "c"), var_3=0.6)
  x[1], x[[1]], x$var_1
  name <- "var_1"; x[name], x[[name]], x$name
  x[c(1,3)], x[[c(1,3)]], x[[1]][[3]]
• Produce a character vector containing var_1, var_2, var_3… var_999
• Remove missing values from x <- c(1, 2, 3, NA, 4, 5, NA, 6)
• Given y <- c("a", "b", NA, NA, "c", "d", "e", "f"), prepare a matrix containing the two columns x & y that does not have any missing values
• What are the sum & mean of Wind (airquality dataset) for the observations that have Temp greater than 60 & Month equal to 5?
• Create a new directory with a given name
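One possible set of answers, offered as a sketch rather than the only solution. It assumes the built-in airquality dataset for the Wind question; the directory name my_new_dir and its location under tempdir() are illustrative. The first lines also show why x$name does not work with a variable.

```r
# $ looks for an element literally called "name"; [[ uses the variable's value
lst <- list(var_1 = 1:10, var_2 = c("a", "b", "c"), var_3 = 0.6)
name <- "var_1"
print(is.null(lst$name))    # TRUE
print(lst[[name]][1:3])     # 1 2 3
print(lst[[c(1, 3)]])       # 3, the same as lst[[1]][[3]]

# 1. Character vector "var_1", "var_2", ..., "var_999"
vars <- paste0("var_", 1:999)

# 2. Remove missing values
x <- c(1, 2, 3, NA, 4, 5, NA, 6)
x_clean <- x[!is.na(x)]            # 1 2 3 4 5 6

# 3. Two-column matrix of x and y keeping only rows with no NA
y <- c("a", "b", NA, NA, "c", "d", "e", "f")
keep <- complete.cases(x, y)       # TRUE where both x and y are present
xy <- cbind(x, y)[keep, ]          # character matrix (cbind coerces x)

# 4. Sum and mean of Wind where Temp > 60 and Month == 5
sel <- airquality$Temp > 60 & airquality$Month == 5
wind_sum  <- sum(airquality$Wind[sel])
wind_mean <- mean(airquality$Wind[sel])

# 5. Create a new directory (illustrative name, inside the temp directory)
new_dir <- file.path(tempdir(), "my_new_dir")
dir.create(new_dir, showWarnings = FALSE)
```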
Reading / Writing Data Set
• Principal functions for reading data into R
  - read.table(), read.csv(): For reading tabular data
  - readLines(): For reading lines of a text file
  - source(): For reading in R code files
  - dget(): For reading in R objects that were deparsed to a text file (with dput())
  - load(): For reading in saved workspaces
  - unserialize(): For reading single R objects in binary form
• Principal functions for writing data to files
  - write.table()
  - writeLines()
  - dump()
  - dput()
  - save()
  - serialize()
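Most of the reading and writing functions come in pairs. This sketch writes a small data frame out and reads it back with each pair except read.table()/write.table(); all file names are illustrative and placed under tempdir().

```r
tmp <- tempdir()   # all files go to the session's temporary directory
df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.0))

# dput() / dget(): one object written as R-readable text
f_dput <- file.path(tmp, "df_dput.R")
dput(df, file = f_dput)
df_dget <- dget(f_dput)

# dump() / source(): R code that recreates the named objects
f_dump <- file.path(tmp, "df_dump.R")
dump("df", file = f_dump)
rm(df)
source(f_dump)          # df exists again

# save() / load(): one or more objects in R's binary .RData format
f_rdata <- file.path(tmp, "df.RData")
save(df, file = f_rdata)
rm(df)
load(f_rdata)           # df exists again

# writeLines() / readLines(): plain lines of text
f_txt <- file.path(tmp, "notes.txt")
writeLines(c("first line", "second line"), f_txt)
txt <- readLines(f_txt)

# serialize() / unserialize(): one object as a raw (binary) vector in memory
bytes <- serialize(df, connection = NULL)
df_copy <- unserialize(bytes)
```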
Importing / Exporting Data

• read.table() is one of the most commonly used functions for reading data. A few important arguments:
  - file: name of the file to be read
  - header: logical, indicating whether the file has a header line
  - sep: a string indicating how the columns are separated
  - colClasses: a character vector indicating the class of each column in the dataset
  - nrows: the maximum number of rows to be read
  - na.strings: a character vector of strings which are to be interpreted as NA values
  - comment.char: a character string indicating the comment character
  - skip: the number of lines to skip from the beginning
  - stringsAsFactors: logical, indicating whether character variables should be converted to factors (the default became FALSE in R 4.0.0)
• write.table()
  - x: the object to be written, preferably a matrix or a data frame
  - file: path and name of the file to be created
  - sep: a string indicating how the columns are separated
  - row.names, col.names: logical, indicating whether the row names or column names are written along with x
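A sketch of write.table() and read.table() with several of the arguments listed above. The data frame contents and the temporary CSV file are illustrative assumptions.

```r
df <- data.frame(id = 1:3,
                 city = c("Delhi", "Pune", NA),
                 temp = c(21.5, NA, 30.2))

path <- tempfile(fileext = ".csv")   # temporary file, for illustration
write.table(df, file = path, sep = ",", row.names = FALSE, col.names = TRUE)

back <- read.table(path,
                   header = TRUE,
                   sep = ",",
                   colClasses = c("integer", "character", "numeric"),
                   na.strings = "NA",
                   stringsAsFactors = FALSE)
str(back)

# nrows limits how many data rows are read
first_two <- read.table(path, header = TRUE, sep = ",", nrows = 2)
print(nrow(first_two))   # 2
```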
Data Summary / Manipulation

• attach(x): Attaches a data frame (or list) to the search path, so its columns can be referred to by name
• detach(x): Removes it from the search path again
• summary(x): Displays summary statistics of a data set
• str(x): Compactly displays the structure of an object (class, dimensions, column types and first few values), a different view from summary()
• sort(): Sorts a vector or factor
• order(): Returns the ordering permutation; it accepts more than one variable, with later variables breaking ties in earlier ones
• merge(): Merges two data frames by common columns or row names, or performs other versions of database join operations
• cut(x, breaks, labels): Divides the range of x into intervals and codes the values in x according to the interval they fall in. The leftmost interval corresponds to level one, the next leftmost to level two and so on.
  cut(x, 10, 1:10)
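A short sketch of the summary and manipulation functions above on small made-up data frames (people and cities are illustrative names).

```r
scores <- c(42, 7, 19, 7, 88)
print(sort(scores))                     # 7 7 19 42 88
print(sort(scores, decreasing = TRUE))  # 88 42 19 7 7

people <- data.frame(name = c("b", "a", "c", "a"), age = c(30, 25, 35, 20))
summary(people$age)                     # min, quartiles, mean, max
str(people)                             # class, dimensions, column types

# order() sorts by name first, then breaks ties by age
ord <- order(people$name, people$age)   # 4 2 1 3
print(people[ord, ])

# attach() puts the columns on the search path; detach() afterwards
attach(people)
avg_age <- mean(age)                    # 27.5
detach(people)

cities <- data.frame(name = c("a", "b"), city = c("Delhi", "Pune"))
inner <- merge(people, cities, by = "name")               # "c" has no match: dropped
left  <- merge(people, cities, by = "name", all.x = TRUE) # "c" kept, city is NA

bands <- cut(scores, breaks = 3, labels = c("low", "mid", "high"))
print(bands)   # mid low low low high
```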
Data Summary / Manipulation

• pretty(x, n): Computes a sequence of about n+1 equally spaced 'round' values which cover the range of the values in x.
  pretty(x, 100)
• substr(x, start, stop) extracts substrings of a character vector; substr(x, start, stop) <- value replaces them.
• strsplit(x, split): Splits the elements of a character vector x into substrings at the matches of split.
• rank(): Returns the sample ranks of the values in a vector. Ties (i.e., equal values) and missing values can be handled in several ways
• aggregate(): Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.
• ddply() (plyr package): For each subset of a data frame, applies a function, then combines the results into a data frame.
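A sketch of the base R functions above; ddply() is left out because it needs the plyr package. The aggregate() call uses the built-in airquality dataset, and the strings and vectors are illustrative.

```r
p <- pretty(c(1, 97), n = 5)    # 'round' break points covering 1..97
print(p)

s <- "analyst"
print(substr(s, 1, 4))          # "anal"
substr(s, 1, 1) <- "A"          # replacement form
print(s)                        # "Analyst"

parts <- strsplit(c("a,b,c", "d,e"), split = ",")
print(parts)                    # a list: c("a", "b", "c") and c("d", "e")

v <- c(10, 30, 20, 20)
print(rank(v))                        # 1.0 4.0 2.5 2.5 (ties get the average rank)
print(rank(v, ties.method = "min"))   # 1 4 2 2

# Mean Ozone per Month; the formula method drops rows with NA by default
oz <- aggregate(Ozone ~ Month, data = airquality, FUN = mean)
print(oz)
```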
Control Structures

• Control structures allow you to control the flow of execution of a program
• if, else (testing a condition)
  if (condition) {do something} else if (other_condition) {do something different} else {do something else}
• for (executing a loop a fixed number of times)
  for (i in 1:10) {do something}
• while (executing a loop while a condition is true)
  while (condition) {do something}
• repeat (executing an infinite loop, which must be exited with break)
• break (breaking out of a loop)
• next (skipping an iteration of a loop)
• return (exiting a function)
• Create a vector with all integers from 1 to 1000 and replace every even number by its inverse (1/x)
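A sketch of each control structure, ending with two versions of the exercise. It reads "inverse" as the reciprocal 1/x, which is an assumption; classify() is a hypothetical helper used only for the if/else example.

```r
# if / else if / else inside a hypothetical helper function
classify <- function(n) {
  if (n < 0) {
    "negative"
  } else if (n == 0) {
    "zero"
  } else {
    "positive"
  }
}
print(classify(-3))   # "negative"

# while: keep looping while the condition holds
count <- 0
while (count < 3) {
  count <- count + 1
}

# repeat with next and break: the loop only ends through break
k <- 0
repeat {
  k <- k + 1
  if (k %% 2 == 0) next   # skip even values of k
  if (k > 7) break        # leave the loop at the first odd k above 7
}
print(k)   # 9

# Exercise, loop version: replace even numbers by their inverse (1/x)
v <- as.numeric(1:1000)   # doubles, because 1/x is fractional
for (i in seq_along(v)) {
  if (v[i] %% 2 == 0) {
    v[i] <- 1 / v[i]
  }
}

# Exercise, vectorised version: same result without an explicit loop
w <- as.numeric(1:1000)
even <- w %% 2 == 0
w[even] <- 1 / w[even]
```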
Loop Functions

• lapply: Returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X
  lapply(airquality, mean)
  - Calculate the sum of every variable of the airquality dataset, excluding NAs
• sapply: A user-friendly version of lapply that by default returns a vector or matrix when appropriate
  sapply(airquality, mean)
  - Repeat the lapply problem using sapply and see the difference
• apply: Returns a vector, array or list of values obtained by applying a function to the margins of an array or matrix
  apply(airquality, 1, sum)
  - Calculate the deciles, including min and max, of all the variables of the airquality dataset, excluding NAs
  - Calculate the square of each element of a 10 x 2 matrix with entries 1 to 20
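Possible solutions to the exercises above, as a sketch using the built-in airquality dataset. Extra arguments such as na.rm = TRUE and probs are passed through lapply/sapply/apply to the function being applied.

```r
# lapply returns a list; na.rm = TRUE is passed on to sum() to skip NAs
sums_list <- lapply(airquality, sum, na.rm = TRUE)
print(class(sums_list))   # "list"

# sapply simplifies the same result to a named numeric vector
sums_vec <- sapply(airquality, sum, na.rm = TRUE)
print(sums_vec)

# apply over rows (MARGIN = 1); the data frame is converted to a matrix first
row_sums <- apply(airquality, 1, sum, na.rm = TRUE)
print(length(row_sums))   # 153, one value per row

# Deciles including min (0%) and max (100%) of every variable
deciles <- sapply(airquality, quantile, probs = seq(0, 1, by = 0.1), na.rm = TRUE)
print(dim(deciles))       # 11 rows (0%, 10%, ..., 100%) x 6 variables

# Square of each element of a 10 x 2 matrix with entries 1 to 20
m <- matrix(1:20, nrow = 10, ncol = 2)
squares <- apply(m, c(1, 2), function(v) v^2)   # MARGIN = c(1, 2): every cell
# For this particular task the vectorised m^2 gives the same matrix
```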
Loop Functions

• tapply: Applies a function to each cell of a ragged array, that is, to each (non-empty) group of values given by a unique combination of the levels of certain factors
  tapply(airquality$Ozone, airquality$Month, sum)
  - Calculate the sum of the Ozone variable for observations with Month equal to 5
• mapply: A multivariate version of sapply. mapply applies FUN to the first elements of each argument, the second elements, the third elements, and so on
  mapply(rep, 1:4, 4:1)
  - Calculate the sum of two lists of 10 x 2 matrices, with entries 1 to 20, 101 to 120, 201 to 220 & 301 to 320
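A sketch of tapply and mapply on the exercises above. The second exercise is read here as two lists of two 10 x 2 matrices each, summed element by element (first with first, second with second); that reading is an assumption.

```r
# Sum of Ozone within each Month; na.rm = TRUE is passed on to sum()
oz_by_month <- tapply(airquality$Ozone, airquality$Month, sum, na.rm = TRUE)
print(oz_by_month)
may_total <- as.numeric(oz_by_month["5"])   # the exercise: Month equal to 5

# mapply pairs up 1:4 with 4:1: rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)
reps <- mapply(rep, 1:4, 4:1)
print(reps)

# Exercise, under one reading: pairwise sums of matrices from two lists
l1 <- list(matrix(1:20, 10, 2), matrix(101:120, 10, 2))
l2 <- list(matrix(201:220, 10, 2), matrix(301:320, 10, 2))
pair_sums <- mapply(function(a, b) a + b, l1, l2, SIMPLIFY = FALSE)
print(pair_sums[[1]][1, ])   # 202 222
```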
Plotting Functions
• plot(x, y)
• hist(x)
• par(): sets graphical parameters, including
  - pch: plotting symbol
  - lty: line type
  - lwd: line width
  - col: plotting color
  - las: axis label orientation
  - bg: background color
  - mar: margin size
  - oma: outer margin size
  - mfrow: number of plots per row and column, c(nrows, ncols) (plots are filled row-wise)
  - mfcol: number of plots per row and column, c(nrows, ncols) (plots are filled column-wise)
Plotting Functions

• lines: add lines to the plot
• points: add points to the plot
• text: add text labels to the plot
• title: add annotations (x- and y-axis labels, title, subtitle, outer margin text)
• mtext: add text to the margins of the plot
• axis: add axis ticks/labels
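A sketch that uses most of the plotting functions and par() settings above on the built-in airquality data. Drawing into a temporary PDF via pdf() is an assumption so the code also runs without a screen; the colours, text position and the fitted line (via lm() and predict()) are illustrative choices.

```r
# Draw into a temporary PDF file so the example also runs without a screen
out_file <- file.path(tempdir(), "airquality_plots.pdf")
pdf(out_file, width = 8, height = 4)

# Two plots side by side (filled row-wise), custom margins, horizontal axis labels
par(mfrow = c(1, 2), mar = c(4, 4, 2, 1), las = 1)

# Left: scatter plot built up piece by piece
plot(airquality$Temp, airquality$Ozone, pch = 19, col = "grey50",
     xlab = "", ylab = "", axes = FALSE)
axis(1)                                  # x-axis ticks and labels
axis(2)                                  # y-axis ticks and labels
title(main = "Ozone vs Temp", xlab = "Temp (F)", ylab = "Ozone (ppb)")
may <- airquality$Month == 5
points(airquality$Temp[may], airquality$Ozone[may], pch = 17, col = "red")
fit <- lm(Ozone ~ Temp, data = airquality)   # rows with missing Ozone are dropped
temps <- seq(min(airquality$Temp), max(airquality$Temp), length.out = 50)
lines(temps, predict(fit, newdata = data.frame(Temp = temps)), lty = 2, lwd = 2)
text(60, 150, "May in red", adj = 0)
mtext("Source: airquality", side = 1, line = 3, adj = 1, cex = 0.7)

# Right: histogram
hist(airquality$Wind, col = "lightblue", main = "Wind", xlab = "Wind (mph)")

invisible(dev.off())   # close the PDF device (and with it these par() settings)
```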
Functions

• Functions are created with function()
• Arguments are matched by exact name first, then by partial name, then by position
• The return value of a function is the last expression in the function body to be evaluated
• Functions can be nested, so that a function can be defined inside another function
• Functions can be passed as arguments to other functions
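A sketch of argument matching, implicit return values, nested functions and functions as arguments. All function names (power, make_multiplier, triple, apply_twice) are hypothetical examples.

```r
# Default value for exponent; the last evaluated expression is returned
power <- function(base, exponent = 2) {
  base^exponent
}

power(3)                          # 9, exponent takes its default
power(exponent = 3, base = 2)     # 8, exact name matching (order does not matter)
power(2, exp = 3)                 # 8, "exp" partially matches "exponent"
power(2, 3)                       # 8, positional matching

# Nested functions: make_multiplier builds and returns a new function
make_multiplier <- function(k) {
  function(x) x * k
}
triple <- make_multiplier(3)
triple(5)                         # 15

# Functions passed as arguments to other functions
apply_twice <- function(f, x) f(f(x))
apply_twice(triple, 2)            # 18
sapply(1:3, power)                # 1 4 9
```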
Debugging

• Primary tools for debugging functions in R
  - traceback: prints out the function call stack after an error occurs; does nothing if there is no error
  - debug: flags a function for debug mode, which allows you to step through execution of the function one line at a time
  - browser: suspends the execution of a function wherever it is called and puts the function in debug mode
  - trace: allows you to insert debugging code into a function at specific places
  - recover: allows you to modify the error behavior so that you can browse the function call stack (e.g. options(error = recover))
Debugging

• Indications that something is not right
  - message: a generic notification/diagnostic message produced by the message() function; execution of the function continues
  - warning: an indication that something is wrong but not necessarily fatal, produced by the warning() function; execution of the function continues
  - error: an indication that a fatal problem has occurred, produced by the stop() function; execution stops
  - condition: a generic concept for indicating that something unexpected can occur; programmers can create their own conditions
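A sketch showing how each kind of condition is raised and handled. safe_log() and the class name myCondition are hypothetical examples; tryCatch(), withCallingHandlers() and signalCondition() are base R.

```r
# A function that can signal all three kinds of condition
safe_log <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")          # error: execution stops
  if (any(x <= 0)) warning("non-positive values in x")    # warning: continues
  message("taking the log of ", length(x), " value(s)")   # message: continues
  log(x)
}

# tryCatch() handles a condition and returns the handler's value
err_text  <- tryCatch(safe_log("a"), error = function(e) conditionMessage(e))
warn_text <- tryCatch(safe_log(c(1, -1)), warning = function(w) conditionMessage(w))
print(err_text)    # "x must be numeric"
print(warn_text)   # "non-positive values in x"

# withCallingHandlers() lets execution continue after muffling the message
result <- withCallingHandlers(
  safe_log(c(1, 10)),
  message = function(m) invokeRestart("muffleMessage")
)
print(result)      # log(1) and log(10)

# A custom condition class, signalled and caught by its class name
cond <- simpleCondition("custom signal")
class(cond) <- c("myCondition", "condition")
caught <- tryCatch(signalCondition(cond),
                   myCondition = function(cnd) conditionMessage(cnd))
print(caught)      # "custom signal"
```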
Thanks a lot