2. Contents
● What is R? How to invoke?
● Basic data types, control structures
● Environments, functions
● Classes, packages, graphs
3. What is R
● A free software programming language and
software environment for statistical computing
and graphics
● Dialect of the S programming language with
lexical scoping semantics inspired by Scheme
● Provides linear and nonlinear modeling, classical
statistical tests, time-series analysis,
classification, clustering, and more...
● Cross-platform: Windows, Linux, Mac
4. How to invoke
● Command-line interpreter for interactive
programming
– Type R in a terminal
– Use ?topic or help(topic) for help;
try example(topic) for examples
– Use quit() to exit
● GUIs
– RStudio IDE
– Web-interface: http://10.122.85.41:8787/
5. Basic Data Types (1)
● R is value typed. Everything is an object
x <- value # <- is the assignment operator
y <- x # copy semantics: y is independent of x
● Atomic data types
– integer (32 bits) age <- 20L
– double (binary64) gpa <- 3.34
– character name <- "John"
– logical married <- TRUE # or FALSE
– complex, raw
mode(x), typeof(x), class(x), str(x)
● Other types
– closure f <- function() {}
– language q <- quote(x <- 1)
● Special constants
NULL, NA, Inf, NaN
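A minimal sketch of how these types and special constants can be inspected. The variable names follow the slide; the names types, na.types, and special are only illustrative.

```r
# Atomic values as on the slide.
age <- 20L
gpa <- 3.34
name <- "John"
married <- TRUE

# typeof() reports the internal storage type of each value.
types <- c(typeof(age), typeof(gpa), typeof(name), typeof(married))
print(types)

# NA is typed: the plain NA is logical, but typed variants exist.
na.types <- c(typeof(NA), typeof(NA_integer_), typeof(NA_character_))
print(na.types)

# Predicates for the special constants.
special <- c(is.null(NULL), is.na(NA), is.nan(NaN), is.infinite(Inf))
print(special)
```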
6. Basic Data Types (2)
● Vectors
– An ordered collection of objects of one atomic type
banknotes <- c(1,5,10,20,50,100) # c means combine
banknotes[5], length(banknotes), mode(banknotes)
name <- c(given="George", middle="Walker", family="Bush")
name[1], name["given"], names(name)
– Tricks with indexes
x <- c(1,2,3) # try x[0], x[c(1,3)], x[-1], x[-c(1,3)]
x[c(T,F,T)], x>1, x[x>1] # logical indexing
– Recycling of a shorter vector argument
c(1,2,3,4) + c(0,1), 10 * c(1,2,3,4)
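The indexing tricks and the recycling rule above can be tried as follows. This is a small sketch; the names recycled and scaled are illustrative.

```r
x <- c(1, 2, 3)

empty <- x[0]           # index 0 selects nothing: a zero-length vector
tail2 <- x[-1]          # a negative index drops that element
middle <- x[-c(1, 3)]   # drop the first and the third element
picked <- x[c(TRUE, FALSE, TRUE)]  # logical indexing keeps TRUE positions
large <- x[x > 1]       # the comparison yields the logical index

# The shorter operand is recycled to the length of the longer one.
recycled <- c(1, 2, 3, 4) + c(0, 1)
scaled <- 10 * c(1, 2, 3, 4)
print(recycled)
print(scaled)
```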
7. Basic Data Types (3)
● Lists
– An ordered collection of objects, possibly of different types
l <- list(age=20L, gpa=3.34, name="John", married=TRUE)
length(l), names(l)
– Use [] to extract a sublist
l[1], l[c("name","married")], l[-1], l[-c(1,3)]
– Use [[]] or $ to access an object in a list
l[[1]], l[["age"]], l$age
list(c(1, 2, 3), c("a", "b"), function(){})
● Attributes
attributes(age) <- list(units="years")
structure(l, comment="those one guy")
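To make the difference between [] and [[]] concrete, here is a short sketch using the list from the slide. The names sub and val are illustrative.

```r
l <- list(age = 20L, gpa = 3.34, name = "John", married = TRUE)

sub <- l["age"]     # single brackets return a list of length 1
val <- l[["age"]]   # double brackets return the element itself
print(class(sub))
print(class(val))

# Attributes attach metadata to an object without changing its value.
age <- 20L
attr(age, "units") <- "years"
print(attributes(age))
```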
8. Basic Data Types (4)
● Matrices
m <- matrix(c(1,2,3,4), nrow=2, ncol=2)
– Use dim, nrow, ncol, rbind, and cbind with matrices
– Use t to transpose, %*% for matrix multiplication, diag to
extract the diagonal
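A brief sketch of the matrix operations named above. Note that matrix() fills column by column; the names prod.mm and taller are illustrative.

```r
m <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)  # columns are (1,2) and (3,4)
print(dim(m))

mt <- t(m)             # transpose
prod.mm <- m %*% m     # matrix product, not element-wise
d <- diag(m)           # main diagonal
taller <- rbind(m, c(5, 6))  # append a row
print(prod.mm)
```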
● Arrays
a <- array(rnorm(8), c(2,2,2))
● Factors (enumerated type)
faculty <- factor("engineering", c("arts",
"law", "engineering", "finances"))
9. Basic Data Types (5)
● Data frames
– A data frame combines a set of vectors of the same length
df <- data.frame(age=c(20L, 21L),
gpa=c(3.34, 3.14),
name=c("John", "George"),
married=c(T, F))
– Any data frame can be accessed either as a list or as a matrix
df.1 <- df[df$gpa>3.2, c("name","married")]
df.2 <- subset(df, subset=(gpa>3.2),
select=c(name,married))
identical(df.1, df.2) # TRUE
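The two access styles can be compared side by side. This sketch sets stringsAsFactors = FALSE explicitly, an addition not on the slide, so that the name column stays character in older R versions as well.

```r
df <- data.frame(age = c(20L, 21L),
                 gpa = c(3.34, 3.14),
                 name = c("John", "George"),
                 married = c(TRUE, FALSE),
                 stringsAsFactors = FALSE)

by.list <- df$gpa            # list-style access: one column as a vector
by.matrix <- df[2, "name"]   # matrix-style access: row 2, column "name"
high <- df[df$gpa > 3.2, "name"]  # logical row filter, single column
print(dim(df))
```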
12. Environments (1)
● Every variable or function is defined in an environment
environment() # gives the current evaluation environment
● Environments form a tree with the root given by emptyenv()
● The root environment emptyenv() cannot be populated
● .GlobalEnv is the user's working environment or workspace. It
can also be accessed by globalenv()
identical(environment(), globalenv()) # TRUE
● baseenv() is the library environment for the basic R functions
ls(baseenv())
14. Environments (3)
● parent.env(env) returns the parent of environment env
identical(parent.env(baseenv()),emptyenv()) # TRUE
● To create a new environment use new.env(parent)
– If the parent parameter is omitted, it defaults to parent.frame(), which is .GlobalEnv at the top level
● To change the evaluation environment use
evalq(expr, env)
with(data, expr) # does the same for data frames and lists
● Example
e.1 <- new.env() # creates a new environment e.1
parent.env(e.1) # should be .GlobalEnv
evalq(environment(), e.1) # should be e.1
e.2 <- new.env(parent=e.1) # creates a new environment e.2
parent.env(e.2) # should be e.1
evalq(environment(), e.2) # should be e.2
15. Environments (4)
● When resolving a variable or function name, R searches the current evaluation environment, then the
parent environments along the path to the root environment emptyenv()
x <- 0 # set x to 0 in .GlobalEnv
Both evalq(x, e.1) and evalq(x, e.2) should give 0
evalq(x <- 2, e.2) # set x to 2 in e.2
Now evalq(x, e.1) still gives 0 while evalq(x, e.2) has changed to 2
● To set an object, such as a variable or a function, in a particular environment use
assign(obj.name, value, envir=env) # inherits is FALSE by default
● To get the value of an object in a particular environment use
get(obj.name, envir=env) # inherits is TRUE by default
● To check whether an object exists in a particular environment
exists(obj.name, envir=env) # inherits is TRUE by default
For example,
exists("x", e.1, inherits=FALSE) # FALSE
exists("x", e.2, inherits=FALSE) # TRUE
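The lookup rules can be checked on a self-contained chain of two environments. In this sketch x is first placed in e.1 rather than in .GlobalEnv, so that the example does not depend on the workspace; the result names are illustrative.

```r
e.1 <- new.env()
e.2 <- new.env(parent = e.1)

assign("x", 0, envir = e.1)
before <- exists("x", envir = e.2, inherits = FALSE)  # not in e.2 itself
from.parent <- get("x", envir = e.2)                  # found by walking up to e.1

assign("x", 2, envir = e.2)          # creates a separate x in e.2
shadowed <- get("x", envir = e.2)    # the local x wins
untouched <- get("x", envir = e.1)   # the parent's x is unchanged
after <- exists("x", envir = e.2, inherits = FALSE)
print(c(from.parent, shadowed, untouched))
```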
16. Environments (5)
● Every environment can also be treated as a list. For example,
e.2$x gives access to x in e.2
● The so-called search path starts from .GlobalEnv and ends with
baseenv(). The search() function returns string names of the
environments in the search path
[1] ".GlobalEnv" "tools:rstudio"
[3] "package:stats" "package:graphics"
[5] "package:grDevices" "package:utils"
[7] "package:datasets" "package:methods"
[9] "Autoloads" "package:base"
17. Environments (6)
● To restore an environment from its string name, use
as.environment(name). For example,
as.environment("package:base") # maps the string to the baseenv() object
● Unless the evaluation environment was specified explicitly, the
interpreter searches the environments along the search path,
starting from .GlobalEnv, until it hits emptyenv()
18. Environments (7)
● To add an environment env to the search path one can use
attach(env, pos, name) which creates a copy of the
environment env with string name name and inserts it at position
pos>1 in the search path
● find(obj.name) returns all environments along the search path
containing objects with a specified name
● Example
e.1$x <- 1; attach(e.1, 2L, "e.1")
assign("x", 11, e.1) # modifies e.1 but not its attached duplicate
get("x", e.1) # returns 11
x # is still 1
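The copy made by attach() can be observed directly. This sketch uses the hypothetical names demo.value and demo.copy to avoid clashing with anything in the workspace, and detaches the copy at the end.

```r
e <- new.env()
assign("demo.value", 1, envir = e)

# attach() copies the objects of e into a new environment on the search path.
attach(e, pos = 2L, name = "demo.copy")

# Changing the original does not touch the attached copy.
assign("demo.value", 11, envir = e)
in.env <- get("demo.value", envir = e)
in.copy <- get("demo.value", envir = as.environment("demo.copy"))
print(c(in.env, in.copy))

detach("demo.copy")  # remove the copy from the search path again
```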
19. Functions (1)
● Functions in R are “first class objects” which means they can be
treated much like any other object
– Can be passed as arguments to other functions
– Can be nested, so that you can define a function inside of another function
– Can be returned by other functions
● The return value of a function is the last expression in the function
body to be evaluated
● Example: factorial
fact <- function(x) ifelse(x==1, 1, x*fact(x-1)) # like ?: in C
fact # function(x) ifelse(x==1, 1, x*fact(x-1))
fact(5) # 120
fact(1000) # Inf
20. Functions (2)
● A function consists of its formal arguments and a body, and it has a reference to the
enclosing environment (closure)
formals(fact) # $x
body(fact) # ifelse(x == 1, 1, x * fact(x - 1))
environment(fact) # .GlobalEnv
● By default, the enclosing environment is the environment in which the function was
created, but it can be redefined with
environment(fact) <- some.other.environment
● When called, a function creates its own environment, a child of the enclosing environment
● Thus we have
– the environment where the function's name is bound: find("fact")
– the environment where the function was created (enclosing environment): environment(fact)
– the environment created when a function is run: environment()
– the environment from which a function is called: parent.frame()
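The relationship between these environments can be verified with two small helper functions. The names show.envs and caller are hypothetical and serve only this sketch.

```r
# Returns its own execution environment and the environment it was called from.
show.envs <- function() {
  list(execution = environment(), calling = parent.frame())
}

# Calls show.envs() and also records its own execution environment.
caller <- function() {
  list(own = environment(), seen = show.envs())
}

res <- caller()

# The calling environment of show.envs() is the execution environment of caller().
same.caller <- identical(res$seen$calling, res$own)

# The execution environment is a child of the enclosing environment.
child.of.enclosing <- identical(parent.env(res$seen$execution),
                                environment(show.envs))
print(c(same.caller, child.of.enclosing))
```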
21. Functions (3)
● Function arguments are named and may have default values
● You can mix positional matching with matching by name. When an
argument is matched by name it is “taken out” of the argument list
and the remaining unnamed arguments are matched in the order that
they are listed in the function definition
f <- function(x, y, z=0) as.list(environment())
f(1, 2) # x:1, y:2, z:0
f(y=1, x=2, z=3) # x:2, y:1, z:3
f(y=1) # x:, y:1, z:0
f(z=3, 2, 1) # x: 2, y: 1, z: 3
f(1, 2, 3, 4) # error: unused argument (4)
22. Functions (4)
● Arguments are matched in the following order
– Check for an exact match for a named argument
– Check for a partial match
– Check for a positional match
● The ... argument indicates a variable number of arguments that are usually passed on to other
functions
● Any argument that appears after ... in the argument list must be named explicitly and
cannot be partially matched
g <- function(y, z=0) as.list(environment())
f <- function(x, ...) g(...)
f(1) # y:, z: 0
f(1, 2, 3) # y: 2, z: 3
f(1, 2, 3, 4) # error: unused argument (4)
f(y=1, 2, 3) # y: 1, z: 3
f(2, 3, x=1) # y: 2, z: 3
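The rule about arguments placed after ... is easy to miss, so here is a small sketch. The functions h and p and their argument names are hypothetical.

```r
# 'verbose' comes after ..., so it must be spelled out in full.
h <- function(first, ..., verbose = FALSE) {
  list(first = first, extras = list(...), verbose = verbose)
}

r1 <- h(1, 2, 3)            # 2 and 3 are collected by ...
r2 <- h(1, verb = TRUE)     # 'verb' is not partially matched; it lands in ...
r3 <- h(1, verbose = TRUE)  # the exact name reaches the argument

# Without ... in front, partial matching does apply.
p <- function(value, other) c(value = value, other = other)
r4 <- p(val = 5, 1)         # 'val' matches 'value'; 1 fills 'other' by position
print(r4)
```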
23. Functions (5)
● Free variables
f <- function() x # x is a free variable
f() # error: object 'x' not found
x <- 1; f() # 1
● Lexical (static) scoping
f <- function() {
x <- 1
g <- function() x
}
x <- 2; h <- f()
h() # 1 or 2?
● Why 1?
environment(h) # created by f() call, not .GlobalEnv
environment(h)$x # 1
24. Functions (6)
● Example: function that returns function
power <- function(n) {
function(x) x^n
}
n <- 5 # ignored
square <- power(2)
square(3) # 9
cube <- power(3)
cube(2) # 8
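Because each call of the outer function creates a fresh environment, a returned function can also keep private state. The sketch below uses the superassignment operator <<-, which the slides do not cover, and the hypothetical name make.counter.

```r
make.counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # modifies count in the enclosing environment
    count
  }
}

a <- make.counter()
b <- make.counter()   # b has its own, independent count

first <- a()
second <- a()
other <- b()
print(c(first, second, other))
```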
25. Classes (1)
● Everything in R is an object
● A class is the definition of an object
● A method is a function that performs specific calculations
on objects of a specific class. A generic function is used
to determine the class of its arguments and select the
appropriate method. A generic function is a function with
a collection of methods
● print, plot, summary...
● See ?Classes and ?Methods for more info
26. Classes (2)
● S3 classes – old style, quick and dirty, informal
● Set an object's attribute to the class name, e.g.
x <- c("a", "b", "c") # this is an object
class(x) <- "X" # set the class of the object
# Define a method specific to the X class
print.X <- function(x, ...) {
cat("X obj:\n")
print(unclass(x), ...)
}
print(x) # prints "X obj:" and then [1] "a" "b" "c"
● Inheritance
class(x) <- c("X", "Y", "Z")
inherits(x, "Z") # TRUE
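How a generic walks the class vector can be seen with a user-defined generic. The generic describe and its methods are hypothetical names for this sketch; UseMethod() is the standard S3 dispatch mechanism.

```r
# A generic only dispatches; the work is done by the methods.
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "something else"
describe.Z <- function(obj, ...) "a Z object"

x <- structure(c("a", "b", "c"), class = c("X", "Y", "Z"))

# No describe.X or describe.Y exists, so dispatch falls through to describe.Z.
for.x <- describe(x)
# Objects without a matching class use the default method.
for.int <- describe(1:3)
print(c(for.x, for.int))
```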
28. Classes (4)
● S4 classes – new style, rigorous and formal
● Classes have formal definitions which describe their fields
and inheritance structures (parent classes)
● Method dispatch can be based on multiple arguments to a
generic function, not just one
● There is a special operator, @, for extracting slots (aka
fields) from an S4 object
● All S4 related code is stored in the methods package
29. Classes (5)
● To create a new S4 class
setClass(Class, representation)
● Use new() to generate a new object from a class
● To create an S4 method
setMethod(f, signature, definition)
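Put together, the three calls might look like the sketch below. The class Student, its slots, and the generic describeStudent are hypothetical; setGeneric() is added because a method needs a generic to attach to.

```r
library(methods)

# A class with two typed slots.
setClass("Student", representation(name = "character", gpa = "numeric"))

# new() creates an instance; slots are read with @.
s <- new("Student", name = "John", gpa = 3.34)
slot.value <- s@gpa

# Define a generic, then a method for the Student class.
setGeneric("describeStudent", function(obj) standardGeneric("describeStudent"))
setMethod("describeStudent", signature("Student"),
          function(obj) paste(obj@name, "has GPA", obj@gpa))

described <- describeStudent(s)
print(described)
```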
32. Packages (1)
● Packages extend functionality of R
● http://cran.r-project.org/web/packages
– 5434 available packages as of Apr 14, 2014
● repository → installed → loaded
● library(help="package")
● Datasets
data(mtcars); help(mtcars)
● Example: libsvm
install.packages("e1071")
library(e1071)
detach("package:e1071")
33. Packages (2)
● Library and namespace environments
– Library environments, such as "package:stats", contain external objects that the user can resolve
by name. Thus a library environment has to be attached to the search path
– Conversely, namespace environments contain internal objects that are opaque to the user
but visible to the library functions. Usually, a namespace environment is a
descendant of .BaseNamespaceEnv
find("svm") # "package:e1071"
environment(svm) # <environment: namespace:e1071>
length(ls(environment(svm))) # 90 objects in the namespace environment
length(ls("package:e1071")) # 58 objects in the package environment
– Path from the namespace environment to .GlobalEnv
<environment: namespace:e1071> → "imports:e1071" → .BaseNamespaceEnv → .GlobalEnv
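The same structure can be explored without installing e1071 by using the stats package, which ships with R. This sketch swaps in stats for the slide's example and counts exports with getNamespaceExports() instead of listing the attached package environment.

```r
ns <- asNamespace("stats")

# Functions of a package live in its namespace environment.
in.namespace <- identical(environment(stats::sd), ns)

# The namespace holds more objects than the package exports.
n.all <- length(ls(ns, all.names = TRUE))
n.exported <- length(getNamespaceExports(ns))
print(c(all = n.all, exported = n.exported))

# Walk the path: namespace -> imports -> base namespace -> global environment.
imports <- parent.env(ns)
reaches.base <- identical(parent.env(imports), .BaseNamespaceEnv)
reaches.global <- identical(parent.env(.BaseNamespaceEnv), globalenv())
```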
34. Packages (3)
● Example SparkR
sc <- sparkR.init(master)
parallelize(sc, col, numSlices)
map(rdd, func)
reduce(rdd, func)
reduceByKey(rdd, combineFunc, numPartitions)
cache(rdd)
collect(rdd)
● Pi example
37. Appendix: funny stuff about R
● Expression which returns itself
(function(x) substitute((x)(x)))(function(x) substitute((x)(x)))
# evaluates to the same call:
# (function(x) substitute((x)(x)))(function(x) substitute((x)(x)))
expression <- (function(x) substitute((x)(x)))(function(x) substitute((x)(x)))
expression == eval(expression) # TRUE
Editor's Notes
R is a dialect of the S language. It is a case-sensitive, interpreted language. You can enter commands one at a time at the command prompt or run a set of commands from a source file.
NULL object
Special values: NA, NaN, Inf
NA may have a type