Introduction to R
Contents
● What is R? How to invoke?
● Basic data types, control structures
● Environments, functions
● Classes, packages, graphs
What is R
● A free software programming language and
software environment for statistical computing
and graphics
● Dialect of the S programming language with
lexical scoping semantics inspired by Scheme
● Provides linear and nonlinear modeling, classical
statistical tests, time-series analysis,
classification, clustering, and more...
● Cross-platform: Windows, Linux, Mac
How to invoke
● Command-line interpreter for interactive
programming
– Type R in a terminal
– Use ?topic or help(topic) for help;
try example(topic) for examples
– Use quit() to exit
● GUIs
– RStudio IDE
– Web-interface: http://10.122.85.41:8787/
Basic Data Types (1)
● R is value typed. Everything is an object
x<-value # <- is the assignment operator
y<-x # y behaves as an independent copy of x (copy-on-modify)
● Atomic data types
– integer (32 bits) age<-20L
– double (binary64) gpa<-3.34
– character name<-"John"
– logical married<-TRUE # or FALSE
– complex, raw
mode(x), typeof(x), class(x), str(x)
● Other types
– closure f<-function() {}
– language q<-quote(x<-1)
● Special constants
NULL, NA, Inf, NaN
Basic Data Types (2)
● Vectors
– An ordered collection of values of one atomic type
banknotes<-c(1,5,10,20,50,100) # c means combine
banknotes[5], length(banknotes), mode(banknotes)
name<-c(given="George", middle="Walker", family="Bush")
name[1], name["given"], names(name)
– Tricks with indexes
x<-c(1,2,3) # try x[0], x[c(1,3)], x[-1], x[c(-1,-3)]
x[c(T,F,T)], x>1, x[x>1] # logical indexing
– Recycling of a shorter vector argument
c(1,2,3,4) + c(0,1), 10 * c(1,2,3,4)
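The indexing tricks and the recycling rule above can be tried with a short sketch like the one below. The names `without_first`, `nothing`, `large_notes` and `recycled_sum` are illustrative and do not come from the slides.

```r
# Vector from the slide: banknote denominations
banknotes <- c(1, 5, 10, 20, 50, 100)

# A negative index drops that position; index 0 selects nothing
without_first <- banknotes[-1]
nothing <- banknotes[0]

# A comparison builds a logical index that keeps the TRUE positions
large_notes <- banknotes[banknotes > 10]

# Recycling: c(0, 1) is reused as c(0, 1, 0, 1) to match the longer operand
recycled_sum <- c(1, 2, 3, 4) + c(0, 1)

print(without_first)   # 5 10 20 50 100
print(large_notes)     # 20 50 100
print(recycled_sum)    # 1 3 3 5
```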
Basic Data Types (3)
● Lists
– An ordered collection of objects of possibly different types
l<-list(age=20L, gpa=3.34, name="John", married=TRUE)
length(l), names(l)
– Use [] to extract a sublist
l[1], l[c("name","married")], l[-1], l[c(-1,-3)]
– Use [[]] or $ to access an object in a list
l[[1]], l[["age"]], l$age
list(c(1, 2, 3), c("a", "b"), function(){})
● Attributes
attributes(age)<-list(units="years")
structure(l, comment="those one guy")
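The difference between `[]` and `[[]]` is easy to miss, so here is a minimal sketch of it. The variable names `student`, `sublist` and `value` are assumptions made for illustration.

```r
student <- list(age = 20L, gpa = 3.34, name = "John", married = TRUE)

sublist <- student["age"]     # single brackets: still a list, of length 1
value   <- student[["age"]]   # double brackets: the stored integer itself

print(class(sublist))  # "list"
print(class(value))    # "integer"

# Attributes attach metadata to an object without changing its value
age <- 20L
attr(age, "units") <- "years"
print(attributes(age))
```

`student$age` is shorthand for `student[["age"]]` when the name is known in advance.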
Basic Data Types (4)
● Matrices
m<-matrix(c(1,2,3,4), nrow=2, ncol=2) # use dim, nrow, ncol, rbind, and cbind with matrices
– Use t to transpose, %*% for matrix multiplication, diag to
extract diagonal
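As a small sketch of the matrix helpers named above (the result names `transposed`, `product`, `stacked` and `widened` are illustrative only):

```r
# matrix() fills column by column, so m is
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4
m <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

transposed <- t(m)      # rows and columns swapped
product <- m %*% m      # true matrix product, not element-wise
diagonal <- diag(m)     # 1 4

stacked <- rbind(m, c(5, 6))   # append a row: 3 x 2
widened <- cbind(m, c(5, 6))   # append a column: 2 x 3

print(product)
print(dim(stacked))
```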
● Arrays
a<-array(rnorm(8), c(2,2,2))
● Factors (enumerated type)
faculty<-factor("engineering", c("arts",
   "law", "engineering", "finances"))
Basic Data Types (5)
● Data frames
– A data frame combines a set of vectors of the same length
df<-data.frame(age=c(20L, 21L),
               gpa=c(3.34, 3.14),
               name=c("John", "George"),
               married=c(T, F))
– Any data frame can be accessed either as a list or as a matrix
df.1<-df[df$gpa>3.2, c("name","married")]
df.2<-subset(df, subset=(gpa>3.2),
             select=c(name,married))
identical(df.1, df.2) # TRUE
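To make the "list or matrix" point concrete, the following sketch reads the same data frame both ways. The name `students` and the result variables are assumptions for illustration; `stringsAsFactors = FALSE` is set explicitly so the behavior does not depend on the R version.

```r
students <- data.frame(age = c(20L, 21L),
                       gpa = c(3.34, 3.14),
                       name = c("John", "George"),
                       married = c(TRUE, FALSE),
                       stringsAsFactors = FALSE)

# List-style access: a data frame is a list of columns
gpa_column <- students$gpa
same_column <- students[["gpa"]]

# Matrix-style access: [row, column]
second_name <- students[2, "name"]
shape <- dim(students)

print(second_name)  # "George"
print(shape)        # 2 4
```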
Control Structures (1)
if (cond) expr
if (cond) expr1 else expr2
for (var in seq) expr
while (cond) expr
repeat expr
break, next
switch (expr, ...)
ifelse (test, yes, no)
● Implicit looping
lapply, sapply, apply, mapply
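A brief sketch of the constructs listed above follows. The helper names `classify` and `describe`, and the thresholds they use, are made up for illustration.

```r
# if / else chain selecting a label (the thresholds are made up)
classify <- function(gpa) {
  if (gpa >= 3.5) "high" else if (gpa >= 3.0) "medium" else "low"
}

# for with next and break: sum the odd numbers, stopping once i exceeds 7
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next
  if (i > 7) break
  total <- total + i
}

# switch picks a branch by name; the unnamed last argument is the default
describe <- function(code) switch(code, a = "first", b = "second", "other")

# ifelse is the vectorized counterpart of if / else
sizes <- ifelse(c(1, 5, 10) > 4, "big", "small")

print(classify(3.34))   # "medium"
print(total)            # 16
print(describe("b"))    # "second"
print(sizes)            # "small" "big" "big"
```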
Control Structures (2)
● Examples
df.3<-data.frame(name=character(), married=logical())
for (row in seq_len(nrow(df)))
  if (df$gpa[row]>3.2)
    df.3<-rbind(df.3, data.frame(name=df$name[row],
                married=df$married[row]))
identical(df.2, df.3) # TRUE
lapply(l, typeof) # returns a list
sapply(l, typeof) # simplifies the result to a vector
apply(m, 2, max) # max column element (vector)
apply(m, 1:length(dim(m)), sqrt) # (matrix)
mapply(function(x, y) seq_len(x) + y, c(1, 2, 3), c(10, 0, -10))
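The implicit-looping functions differ mainly in what they iterate over and what they return. A self-contained sketch, using made-up data (`scores`, a small integer matrix), might look like this:

```r
scores <- list(math = c(90, 80), art = c(70, 60, 50))

as_list <- lapply(scores, mean)    # list: math = 85, art = 60
as_vector <- sapply(scores, mean)  # named numeric vector

m <- matrix(1:6, nrow = 2)         # columns are (1,2), (3,4), (5,6)
col_max <- apply(m, 2, max)        # 2 4 6
row_sum <- apply(m, 1, sum)        # 9 12

# mapply walks several vectors in parallel
pairs <- mapply(function(x, y) x + y, c(1, 2, 3), c(10, 20, 30))  # 11 22 33

print(as_vector)
print(col_max)
print(pairs)
```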
Environments (1)
● Every variable or function is defined in an environment
environment() # gives the current evaluation environment
● Environments form a tree with the root given by emptyenv()
● The root environment emptyenv() cannot be populated
● .GlobalEnv is the user's working environment or workspace. It
can also be accessed by globalenv()
identical(environment(), globalenv()) # TRUE
● baseenv() is the library environment for the basic R functions
ls(baseenv())
Environments (2)
[Figure: tree of environments with nodes emptyenv(), baseenv(), .GlobalEnv, .BaseNamespaceEnv, e.1 and e.2]
Environments (3)
● parent.env(env) returns the parent of environment env
identical(parent.env(baseenv()),emptyenv()) # TRUE
● To create a new environment use new.env(parent)
– If the parent parameter is omitted, the calling environment (parent.frame()) is used; at top level this is .GlobalEnv
● To change the evaluation environment use
evalq(expr, env)
with(data, expr) # does the same for data frames and lists
● Example
e.1<-new.env() # created a new environment e.1
parent.env(e.1) # should be .GlobalEnv
evalq(environment(), e.1) # should be e.1
e.2<-new.env(parent=e.1) # created a new environment e.2
parent.env(e.2) # should be e.1
evalq(environment(), e.2) # should be e.2
Environments (4)
● When resolving a variable or function name, R searches the current evaluation environment, then the
parent environments along the path to the root environment emptyenv()
x<-0 # set x to 0 in .GlobalEnv
Both evalq(x, e.1) and evalq(x, e.2) should give 0
evalq(x<-2, e.2) # set x to 2 in e.2
Now evalq(x, e.1) still gives 0 while evalq(x, e.2) has changed to 2
● To set an object, such as a variable or a function, in a particular environment use
assign(obj.name, value, envir=env) # inherits is FALSE by default 
● To get the value of an object in a particular environment use
get(obj.name, envir=env) # inherits is TRUE by default
● To check whether an object exists in a particular environment
exists(obj.name, envir=env) # inherits is TRUE by default
For example,
exists("x", e.1, inherits=FALSE) # FALSE
exists("x", e.2, inherits=FALSE) # TRUE
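The lookup and assignment rules above can be checked without touching the global workspace. The sketch below uses two fresh environments; the names `outer_env`, `inner_env` and `counter` are assumptions for illustration.

```r
outer_env <- new.env()
inner_env <- new.env(parent = outer_env)

assign("counter", 0, envir = outer_env)

# Lookup from inner_env walks up to outer_env
seen_from_inner <- get("counter", envir = inner_env)   # 0

# Assignment with <- binds in the evaluation environment itself
evalq(counter <- 2, inner_env)

outer_value <- get("counter", envir = outer_env)   # still 0
inner_value <- get("counter", envir = inner_env)   # now 2

# inherits = FALSE restricts the check to the given environment
bound_in_inner <- exists("counter", envir = inner_env, inherits = FALSE)

print(c(seen_from_inner, outer_value, inner_value))   # 0 0 2
print(bound_in_inner)                                 # TRUE
```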
Environments (5)
● Every environment can also be treated as a list. For example,
e.2$x gives access to x in e.2
● The so-called search path starts from .GlobalEnv and ends with
baseenv(). The search() function returns string names of the
environments in the search path
[1] ".GlobalEnv" "tools:rstudio"
[3] "package:stats" "package:graphics"
[5] "package:grDevices" "package:utils"
[7] "package:datasets" "package:methods"
[9] "Autoloads" "package:base"
Environments (6)
● To restore an environment from its string name, use
as.environment(name). For example,
as.environment("package:base") # maps the string to the baseenv() object
● Unless the evaluation environment was specified explicitly, the
interpreter searches the environments along the search path,
starting from .GlobalEnv, until it hits emptyenv()
Environments (7)
● To add an environment env to the search path one can use
attach(env, pos, name) which creates a copy of the
environment env with string name name and inserts it at position
pos>1 in the search path
● find(obj.name) returns all environments along the search path
containing objects with a specified name
● Example
e.1$x<-1; attach(e.1, 2L, "e.1")
assign("x", 11, e.1) # modified e.1 but not its attached duplicate
get("x", e.1) # returns 11
x # is still 1
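The copy semantics of attach() can be reproduced in isolation. This is a minimal sketch that assumes no object called `demo_value` already exists in the workspace; the names `source_env`, `demo_value` and `demo_attached` are made up.

```r
source_env <- new.env()
assign("demo_value", 1, envir = source_env)

# attach() copies the environment's contents into a new search-path entry
attach(source_env, pos = 2L, name = "demo_attached")

# Changing the original afterwards does not touch the attached copy
assign("demo_value", 11, envir = source_env)

from_original <- get("demo_value", envir = source_env)   # 11
from_search_path <- get("demo_value")                    # 1, via the attached copy
found_in <- find("demo_value")                           # "demo_attached"

# Clean up the search path
detach("demo_attached", character.only = TRUE)

print(c(from_original, from_search_path))
print(found_in)
```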
Functions (1)
● Functions in R are “first class objects” which means they can be
treated much like any other object
– Can be passed as arguments to other functions
– Can be nested, so that you can define a function inside of another function
– Can be returned by other functions
● The return value of a function is the last expression in the function
body to be evaluated
● Example: factorial
fact<-function(x) ifelse(x==1, 1, x*fact(x-1)) # ifelse plays the role of ?: in C
fact # function(x) ifelse(x==1, 1, x*fact(x-1))
fact(5) # 120
fact(1000) # Inf
Functions (2)
● A function consists of its formal arguments and a body, and it has a reference to the
enclosing environment (closure)
formals(fact) # $x
body(fact) # ifelse(x == 1, 1, x * fact(x - 1))
environment(fact) # .GlobalEnv
● By default, the enclosing environment is the environment in which the function was
created, but it can be redefined with
environment(fact)<-some.other.environment
● When called, a function creates its own environment, a child of the enclosing environment
● Thus we have
– the environment where the function's name is bound: find("fact")
– the environment where the function was created (enclosing environment): environment(fact)
– the environment created when a function is run: environment()
– the environment from which a function is called: parent.frame()
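The environments listed above can be told apart by capturing them inside a call. The following sketch uses hypothetical helpers `make_reporter`, `reporter` and `caller`; it is one way to observe the distinction, not the only one.

```r
make_reporter <- function() {
  function() {
    execution <- environment()           # created for this call
    enclosing <- parent.env(execution)   # where the function was defined
    calling <- parent.frame()            # where the call came from
    list(execution = execution, enclosing = enclosing, calling = calling)
  }
}

reporter <- make_reporter()

caller <- function() {
  here <- environment()
  envs <- reporter()
  c(enclosing_matches = identical(envs$enclosing, environment(reporter)),
    calling_matches   = identical(envs$calling, here),
    execution_is_new  = !identical(envs$execution, envs$enclosing))
}

result <- caller()
print(result)   # all three TRUE
```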
Functions (3)
● Function arguments are named and may have default values
● You can mix positional matching with matching by name. When an
argument is matched by name it is “taken out” of the argument list
and the remaining unnamed arguments are matched in the order that
they are listed in the function definition
f<-function(x, y, z=0) as.list(environment())
f(1, 2) # x: 1, y: 2, z: 0
f(y=1, x=2, z=3) # x: 2, y: 1, z: 3
f(y=1) # x: (missing), y: 1, z: 0
f(z=3, 2, 1) # x: 2, y: 1, z: 3
f(1, 2, 3, 4) # error: unused argument (4)
Functions (4)
● The order of operations when matching an argument is
– Check for exact match for a named argument
– Check for partial match
– Check for a positional match
● The ... argument indicates a variable number of arguments that are usually passed on to other
functions
● Any argument that appears after ... in the argument list must be named explicitly and
cannot be partially matched
g<-function(y, z=0) as.list(environment())
f<-function(x, ...) g(...)
f(1) # y: (missing), z: 0
f(1, 2, 3) # y: 2, z: 3
f(1, 2, 3, 4) # error: unused argument (4)
f(y=1, 2, 3) # y: 1, z: 3
f(2, 3, x=1) # y: 2, z: 3
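The matching order and the rule for arguments after ... can be seen side by side in a small sketch. The functions `area` and `count_extra` are hypothetical and exist only for this illustration.

```r
area <- function(width, height = 1) width * height

exact <- area(height = 3, width = 2)   # exact names, any order
partial <- area(2, hei = 3)            # "hei" partially matches height
positional <- area(2, 3)               # matched by position

# Arguments after ... are matched only by their full name
count_extra <- function(..., verbose = FALSE) length(list(...))

full_name <- count_extra(1, 2, verbose = TRUE)   # 2: verbose is matched
abbreviated <- count_extra(1, 2, verb = TRUE)    # 3: verb falls into ...

print(c(exact, partial, positional))   # 6 6 6
print(c(full_name, abbreviated))       # 2 3
```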
Functions (5)
● Free variables
f<-function() x # x is a free variable
f() # error: object 'x' not found
x<-1; f() # 1
● Lexical (static) scoping
f<-function() {
   x<-1
   g<-function() x
}
x<-2; h<-f()
h() # 1 or 2?
● Why 1?
environment(h) # created by f() call, not .GlobalEnv
environment(h)$x # 1
Functions (6)
● Example: function that returns function
power<-function(n) {
  function(x) x^n
}
n<-5 # ignored
square<-power(2)
square(3) # 9
cube<-power(3)
cube(2) # 8
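Each closure returned by power() carries its own n in its enclosing environment. The sketch below builds several at once; the added force(n) call is not in the slides and is included here as a common precaution so that n is evaluated when the closure is created.

```r
power <- function(n) {
  force(n)            # evaluate n now rather than lazily on first use
  function(x) x^n
}

powers <- lapply(1:3, power)   # list of closures: x^1, x^2, x^3

squared <- powers[[2]](4)                 # 16
cubed <- powers[[3]](2)                   # 8
captured <- environment(powers[[3]])$n    # 3

print(c(squared, cubed, captured))
```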
Classes (1)
● Everything in R is an object
● A class is the definition of an object
● A method is a function that performs specific calculations
on objects of a specific class. A generic function is used
to determine the class of its arguments and select the
appropriate method. A generic function is a function with
a collection of methods
● print, plot, summary...
● See ?Classes and ?Methods for more info
Classes (2)
● S3 classes – old style, quick and dirty, informal
● Set an object's attribute to the class name, e.g.
x<-c("a", "b", "c") # this is an object
class(x)<-"X" # set the class of the object
# Define a method specific to the X class
print.X<-function(x, ...) {
  cat("X obj:\n")
  print(unclass(x), ...)
}
print(x) # X obj: a b c
● Inheritance
class(x)<-c("X", "Y", "Z")
inherits(x, "Z") # TRUE
Classes (3)
● Constructor
X<-function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  structure(x, class = "X")
}
● S3 class useful methods
is.object(obj) checks whether an object has a class attribute
class(x), unclass(x), methods(generic.function),
methods(class="class"), inherits(obj, "class"), is(obj, "class")
● Creating new generics
g<-function(x, ...) UseMethod("f", x)
f.X<-function(x, ...) print(x, ...)
g(x) # X obj: a b c
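A slightly fuller S3 sketch shows how a new generic dispatches along the class vector. The generic `speak`, the classes `dog`, `cat` and `animal`, and the helper `new_pet` are all made up for illustration.

```r
# Generic: dispatches on the class attribute of its first argument
speak <- function(obj, ...) UseMethod("speak")

speak.animal <- function(obj, ...) paste(obj$name, "makes a noise")
speak.dog <- function(obj, ...) paste(obj$name, "says woof")

# Constructor-style helper that sets the class vector
new_pet <- function(name, kind) {
  structure(list(name = name), class = c(kind, "animal"))
}

rex <- new_pet("Rex", "dog")
tom <- new_pet("Tom", "cat")

print(speak(rex))   # "Rex says woof"
print(speak(tom))   # "Tom makes a noise" (no speak.cat, so speak.animal is used)
```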
Classes (4)
● S4 classes – new style, rigorous and formal
● Classes have formal definitions which describe their fields
and inheritance structures (parent classes)
● Method dispatch can be based on multiple arguments to a
generic function, not just one
● There is a special operator, @, for extracting slots (aka
fields) from an S4 object
● All S4 related code is stored in the methods package
Classes (5)
● To create a new S4 class
setClass(Class, representation)
● Use new() to generate a new object from a class
● To create an S4 method
setMethod(f, signature, definition)
Classes (6)
● Example
setClass("Person",
  slots = list(name = "character", age = "numeric"))
setClass("Employee",
  slots = list(boss = "Person"),
  contains = "Person")
alice <- new("Person", name = "Alice", age = 40)
john <- new("Employee", name = "John", age = 20, boss = alice)
Classes (7)
● Example
setGeneric("union") # [1] "union"
setMethod("union",
  c(x = "data.frame", y = "data.frame"),
  function(x, y) {
    unique(rbind(x, y))
  }
) # [1] "union"
● Useful functions
getGenerics, getClasses, showMethods
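Putting the S4 pieces together, the sketch below reuses the Person and Employee classes from the earlier slide and adds a hypothetical generic `describe` (not from the slides) to show slot access with @ and dispatch on the more specific class.

```r
library(methods)   # attached by default in interactive R; explicit here for scripts

setClass("Person", slots = list(name = "character", age = "numeric"))
setClass("Employee", slots = list(boss = "Person"), contains = "Person")

# A new generic and two methods; dispatch picks the most specific one
setGeneric("describe", function(obj) standardGeneric("describe"))
setMethod("describe", "Person", function(obj) paste(obj@name, "is", obj@age))
setMethod("describe", "Employee", function(obj) {
  paste(obj@name, "reports to", obj@boss@name)
})

alice <- new("Person", name = "Alice", age = 40)
john <- new("Employee", name = "John", age = 20, boss = alice)

print(describe(alice))       # "Alice is 40"
print(describe(john))        # "John reports to Alice"
print(is(john, "Person"))    # TRUE
```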
Packages (1)
● Packages extend functionality of R
● http://cran.r-project.org/web/packages
– 5434 available packages as of Apr 14, 2014
● repository → installed → loaded
● library(help="package")
● Datasets
data(mtcars); help(mtcars)
● Example: libsvm
install.packages("e1071")
library(e1071)
detach("package:e1071")
Packages (2)
● Library and namespace environments
– Library environments, such as "package:stats", contain external objects that the
user can resolve by name. Thus a library environment has to be attached to the search path
– Conversely, namespace environments contain internal objects that are opaque to the user
but transparent to the library functions. Usually, a namespace environment is a
descendant of .BaseNamespaceEnv
find("svm") # "package:e1071"
environment(svm) # <environment: namespace:e1071>
length(ls(environment(svm))) # 90 objects in the namespace environment
length(ls("package:e1071")) # 58 objects in the package environment
– Path from the namespace environment to .GlobalEnv
<environment: namespace:e1071> → "imports:e1071" → .BaseNamespaceEnv → .GlobalEnv
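The same chain can be walked for a package that ships with R, so no installation is needed. This sketch uses stats::sd as a stand-in for svm; the variable names are illustrative.

```r
ns <- environment(stats::sd)     # namespace environment of the stats package

is_ns <- isNamespace(ns)
ns_name <- environmentName(ns)

imports_env <- parent.env(ns)    # the package's imports environment
base_ns <- parent.env(imports_env)

reaches_base <- identical(base_ns, .BaseNamespaceEnv)
reaches_global <- identical(parent.env(.BaseNamespaceEnv), globalenv())

print(c(is_ns, reaches_base, reaches_global))   # TRUE TRUE TRUE
print(ns_name)                                  # "stats"
```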
Packages (3)
● Example: SparkR
sc<-sparkR.init(master)
parallelize(sc, col, numSlices)
map(rdd, func)
reduce(rdd, func)
reduceByKey(rdd, combineFunc, numPartitions)
cache(rdd)
collect(rdd)
● Pi example
Graphs
demo(graphics); demo(persp)
References
http://www.pitt.edu/~njc23
http://adv-r.had.co.nz/
Appendix: funny stuff about R
● Expression which returns itself
(function(x) substitute((x)(x)))(function(x) substitute((x)(x)))
# returns (function(x) substitute((x)(x)))(function(x) substitute((x)(x)))
expression <- (function(x) substitute((x)(x)))(function(x) substitute((x)(x)))
expression == eval(expression) # TRUE

Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 

Introduction to R

  • 2. Contents
    ● What is R? How to invoke?
    ● Basic data types, control structures
    ● Environments, functions
    ● Classes, packages, graphs
  • 3. What is R
    ● A free software programming language and software environment for statistical computing and graphics
    ● Dialect of the S programming language with lexical scoping semantics inspired by Scheme
    ● Provides linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and more
    ● Cross-platform: Windows, Linux, Mac
  • 4. How to invoke
    ● Command-line interpreter for interactive programming
      – Type R in a terminal
      – Use ?topic or help(topic) for help; try example(topic) for examples
      – Use quit() to exit
    ● GUIs
      – RStudio IDE
      – Web interface: http://10.122.85.41:8787/
  • 5. Basic Data Types (1)
    ● R is value typed. Everything is an object
        x<-value # <- is the assignment
        y<-x # y behaves as an independent copy (R copies on modification)
    ● Atomic data types
      – integer (32 bits) age<-20L
      – double (binary64) gpa<-3.34
      – character name<-"John"
      – logical married<-TRUE # or FALSE
      – complex, raw
        mode(x), typeof(x), class(x), str(x)
    ● Other types
      – closure f<-function() {}
      – language q<-quote(x<-1)
    ● Special constants
        NULL, NA, Inf, NaN
  • 6. Basic Data Types (2)
    ● Vectors
      – A set of objects of an atomic type
        banknotes<-c(1,5,10,20,50,100) # c means combine
        banknotes[5], length(banknotes), mode(banknotes)
        name<-c(given="George", middle="Walker", family="Bush")
        name[1], name["given"], names(name)
      – Tricks with indexes
        x<-c(1,2,3) # try x[0], x[c(1,3)], x[-1], x[c(-1,-3)]
        x[c(T,F,T)], x>1, x[x>1] # logical indexing
      – Cycling through a vector argument
        c(1,2,3,4) + c(0,1), 10 * c(1,2,3,4)
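The indexing and recycling rules on this slide can be checked with a short sketch. The variable names other than banknotes are illustrative and do not come from the slides.

```r
# Illustrative vector taken from the slide
banknotes <- c(1, 5, 10, 20, 50, 100)

fifth   <- banknotes[5]               # positional indexing is 1-based
n       <- length(banknotes)          # length() is a function, so it takes parentheses
dropped <- banknotes[-1]              # a negative index drops that element
large   <- banknotes[banknotes > 10]  # a logical mask keeps elements where it is TRUE

# Recycling: the shorter vector c(0, 1) is repeated to match length 4
recycled <- c(1, 2, 3, 4) + c(0, 1)
```

Recycling is silent when the longer length is a multiple of the shorter one; otherwise R still recycles but emits a warning.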
  • 7. Basic Data Types (3)
    ● Lists
      – A set of objects of different types
        l<-list(age=20L, gpa=3.34, name="John", married=TRUE)
        length(l), names(l)
      – Use [] to extract a sublist
        l[1], l[c("name","married")], l[-1], l[c(-1,-3)]
      – Use [[]] or $ to access an object in a list
        l[[1]], l[["age"]], l$age
        list(c(1, 2, 3), c("a", "b"), function(){})
    ● Attributes
        attributes(age)<-list(units="years")
        structure(l, comment="those one guy")
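The difference between [] and [[]] is easy to miss, so here is a minimal sketch. The name person is an assumption used in place of the slide's l to keep the example readable.

```r
person <- list(age = 20L, gpa = 3.34, name = "John", married = TRUE)

sub <- person["age"]     # single brackets return a sublist (still a list)
val <- person[["age"]]   # double brackets return the element itself
via_dollar <- person$age # $ is shorthand for [[ ]] with a literal name
```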
  • 8. Basic Data Types (4)
    ● Matrices
        m<-matrix(c(1,2,3,4), nrow=2, ncol=2) # use dim, nrow, ncol, rbind, and cbind with matrices
      – Use t to transpose, %*% for matrix multiplication, diag to extract diagonal
    ● Arrays
        a<-array(rnorm(8), c(2,2,2))
    ● Factors (enumerated type)
        faculty<-factor("engineering", c("arts", "law", "engineering", "finances"))
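A small sketch of the matrix helpers and the factor from this slide. The result names (transposed, squared, diagonal) are illustrative assumptions.

```r
m <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)  # filled column by column

transposed <- t(m)     # rows and columns swapped
squared    <- m %*% m  # matrix product, not element-wise multiplication
diagonal   <- diag(m)  # main diagonal as a vector

# A factor stores each value as an integer code into its levels
faculty <- factor("engineering",
                  levels = c("arts", "law", "engineering", "finances"))
code <- as.integer(faculty)  # position of "engineering" among the levels
```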
  • 9. Basic Data Types (5)
    ● Data frames
      – A data frame combines a set of vectors of the same length
        df<-data.frame(age=c(20L, 21L),
                       gpa=c(3.34, 3.14),
                       name=c("John", "George"),
                       married=c(T, F))
      – Any data frame can be accessed either as a list or as a matrix
        df.1<-df[df$gpa>3.2, c("name","married")]
        df.2<-subset(df, subset=(gpa>3.2),
                     select=c(name,married))
        identical(df.1, df.2) # TRUE
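The "list or matrix" claim can be sketched as follows. The name students and the explicit stringsAsFactors = FALSE are assumptions added so the example behaves the same on older R versions.

```r
students <- data.frame(age = c(20L, 21L),
                       gpa = c(3.34, 3.14),
                       name = c("John", "George"),
                       married = c(TRUE, FALSE),
                       stringsAsFactors = FALSE)

# List-style access: a column is an element of the list
gpa_dollar <- students$gpa
gpa_double <- students[["gpa"]]

# Matrix-style access: [row, column]
second_name <- students[2, "name"]
shape <- c(nrow(students), ncol(students))
```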
  • 12. Environments (1)
    ● Every variable or function is defined in an environment
        environment() # gives the current evaluation environment
    ● Environments form a tree with the root given by emptyenv()
    ● The root environment emptyenv() cannot be populated
    ● .GlobalEnv is the user's working environment or workspace. It can also be accessed by globalenv()
        identical(environment(), globalenv()) # TRUE
    ● baseenv() is the library environment for the basic R functions
        ls(baseenv())
  • 14. Environments (3)
    ● parent.env(env) returns the parent of environment env
        identical(parent.env(baseenv()),emptyenv()) # TRUE
    ● To create a new environment use new.env(parent)
      – If the parent parameter is omitted, the calling environment (parent.frame()) is used; at top level this is .GlobalEnv
    ● To change the evaluation environment use
        evalq(expr, env)
        with(data, expr) # does the same to data frames and lists
    ● Example
        e.1<-new.env() # created a new environment e.1
        parent.env(e.1) # should be .GlobalEnv
        evalq(environment(), e.1) # should be e.1
        e.2<-new.env(parent=e.1) # created a new environment e.2
        parent.env(e.2) # should be e.1
        evalq(environment(), e.2) # should be e.2
  • 15. Environments (4)
    ● When resolving a variable or function name, R searches the current evaluation environment, then the parent environments along the path to the root environment emptyenv()
        x<-0 # set x to 0 in .GlobalEnv
      Both evalq(x, e.1) and evalq(x, e.2) should give 0
        evalq(x<-2, e.2) # set x to 2 in e.2
      Now evalq(x, e.1) still gives 0 while evalq(x, e.2) has changed to 2
    ● To set an object, such as a variable or a function, in a particular environment use
        assign(obj.name, value, envir=env) # inherits is FALSE by default
    ● To get the value of an object in a particular environment use
        get(obj.name, envir=env) # inherits is TRUE by default
    ● To check whether an object exists in a particular environment
        exists(obj.name, envir=env) # inherits is TRUE by default
      For example,
        exists("x", e.1, inherits=FALSE) # FALSE
        exists("x", e.2, inherits=FALSE) # TRUE
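The lookup rules for assign, get, and exists can be reproduced in a self-contained sketch that does not depend on .GlobalEnv. The names outer_env and inner_env are assumptions standing in for e.1 and e.2.

```r
outer_env <- new.env()
inner_env <- new.env(parent = outer_env)

assign("x", 0, envir = outer_env)
inherited <- get("x", envir = inner_env)  # not in inner_env, found in its parent

assign("x", 2, envir = inner_env)         # creates a separate x that shadows the parent's
shadowed  <- get("x", envir = inner_env)
untouched <- get("x", envir = outer_env)

# inherits = FALSE restricts the check to the environment itself
only_inner <- exists("x", envir = inner_env, inherits = FALSE)
```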
  • 16. Environments (5)
    ● Every environment can also be treated as a list. For example, e.2$x gives access to x in e.2
    ● The so-called search path starts from .GlobalEnv and ends with baseenv(). The search() function returns string names of the environments in the search path
        [1] ".GlobalEnv" "tools:rstudio"
        [3] "package:stats" "package:graphics"
        [5] "package:grDevices" "package:utils"
        [7] "package:datasets" "package:methods"
        [9] "Autoloads" "package:base"
  • 17. Environments (6)
    ● To restore an environment from its string name, use as.environment(name). For example,
        as.environment("package:base") # maps the string to the baseenv() object
    ● Unless the evaluation environment was specified explicitly, the interpreter searches the environments along the search path, starting from .GlobalEnv, until it hits emptyenv()
  • 18. Environments (7)
    ● To add an environment env to the search path one can use
        attach(env, pos, name)
      which creates a copy of the environment env with string name name and inserts it at position pos>1 in the search path
    ● find(obj.name) returns all environments along the search path containing objects with a specified name
    ● Example
        e.1$x<-1; attach(e.1, 2L, "e.1")
        assign("x", 11, e.1) # modified e.1 but not its attached duplicate
        get("x", e.1) # returns 11
        x # is still 1
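The copy made by attach can be observed directly. This is a sketch with assumed names (demo, attached_value, "demo_env") chosen to avoid clashing with anything already on the search path; it detaches the copy at the end.

```r
demo <- new.env()
assign("attached_value", 1, envir = demo)

attach(demo, pos = 2L, name = "demo_env")  # a copy of demo goes on the search path
assign("attached_value", 11, envir = demo) # changes the original only

original      <- get("attached_value", envir = demo)
attached_copy <- get("attached_value", envir = as.environment("demo_env"))

detach("demo_env")                         # remove the copy from the search path
```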
  • 19. Functions (1)
    ● Functions in R are “first class objects” which means they can be treated much like any other object
      – Can be passed as arguments to other functions
      – Can be nested, so that you can define a function inside of another function
      – Can be returned by other functions
    ● The return value of a function is the last expression in the function body to be evaluated
    ● Example: factorial
        fact<-function(x) ifelse(x==1, 1, x*fact(x-1)) # like ?: in C
        fact # function(x) ifelse(x==1, 1, x*fact(x-1))
        fact(5) # 120
        fact(1000) # Inf
  • 20. Functions (2)
    ● A function consists of its formal arguments and a body and it has a reference to the enclosing environment (closure)
        formals(fact) # $x
        body(fact) # ifelse(x == 1, 1, x * fact(x - 1))
        environment(fact) # .GlobalEnv
    ● By default, the enclosing environment references the environment in which the function was created, but it can be redefined with
        environment(fact)<-some.other.environment
    ● When called, a function creates its own environment, a child of the enclosing environment
    ● Thus we have
      – the environment where the function is created: find("fact")
      – the environment where the function resides (enclosing environment): environment(fact)
      – the environment created when a function is run: environment()
      – the environment where a function is called: parent.frame()
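The relationship between the execution, enclosing, and calling environments can be inspected from inside a function. The helper show_envs below is a hypothetical name introduced only for this sketch.

```r
# Returns the environment created for this call and the caller's environment
show_envs <- function() {
  list(execution = environment(), caller = parent.frame())
}

envs <- show_envs()
enclosing <- environment(show_envs)  # where the function was defined

# The execution environment is a fresh child of the enclosing environment
is_child <- identical(parent.env(envs$execution), enclosing)
is_fresh <- !identical(envs$execution, enclosing)
```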
  • 21. Functions (3)
    ● Function arguments are named and may have default values
    ● You can mix positional matching with matching by name. When an argument is matched by name it is “taken out” of the argument list and the remaining unnamed arguments are matched in the order that they are listed in the function definition
        f<-function(x, y, z=0) as.list(environment())
        f(1, 2) # x:1, y:2, z:0
        f(y=1, x=2, z=3) # x:2, y:1, z:3
        f(y=1) # x:, y:1, z:0
        f(z=3, 2, 1) # x: 2, y: 1, z: 3
        f(1, 2, 3, 4) # error: unused argument (4)
  • 22. Functions (4)
    ● The order of operations when given an argument is
      – Check for exact match for a named argument
      – Check for partial match
      – Check for a positional match
    ● The ... argument indicates a variable number of arguments that are usually passed on to other functions
    ● Any argument that appears after ... in the argument list must be named explicitly and cannot be partially matched
        g<-function(y, z=0) as.list(environment())
        f<-function(x, ...) g(...)
        f(1) # y:, z: 0
        f(1, 2, 3) # y: 2, z: 3
        f(1, 2, 3, 4) # error: unused argument (4)
        f(y=1, 2, 3) # y: 1, z: 3
        f(2, 3, x=1) # y: 2, z: 3
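A compact sketch of the matching order and of forwarding through the ... argument. The function names target, wrapper, and echo are hypothetical and replace the slide's f and g to avoid confusion with earlier definitions.

```r
target  <- function(y, z = 0) c(y = y, z = z)
wrapper <- function(x, ...) target(...)   # everything except x is forwarded

positional <- wrapper(1, 2, 3)     # x = 1; 2 and 3 reach target as y and z
by_name    <- wrapper(1, z = 5, 2) # z is matched by name in target, 2 falls to y

# Partial matching: 'val' is a unique prefix of the formal argument 'value'
echo <- function(value, verbose = FALSE) value
partial_result <- echo(val = 10)
```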
  • 23. Functions (5)
    ● Free variables
        f<-function() x # x is a free variable
        f() # error: object 'x' not found
        x<-1; f() # 1
    ● Lexical (static) scoping
        f<-function() {
          x<-1
          g<-function() x
        }
        x<-2; h<-f()
        h() # 1 or 2?
    ● Why 1?
        environment(h) # created by f() call, not .GlobalEnv
        environment(h)$x # 1
  • 24. Functions (6)
    ● Example: function that returns function
        power<-function(n) {
          function(x) x^n
        }
        n<-5 # ignored
        square<-power(2)
        square(3) # 9
        cube<-power(3)
        cube(2) # 8
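Why the outer n is ignored becomes visible when the closure's environment is inspected. The sketch below adds force(n), which is not on the slide; it is a common precaution that evaluates the argument at creation time.

```r
power <- function(n) {
  force(n)          # evaluate n now rather than lazily (an addition to the slide's code)
  function(x) x^n
}

square <- power(2)
captured <- environment(square)$n  # the n stored in the closure's enclosing environment

n <- 5                             # a different n, in the calling environment
result <- square(3)                # still uses the captured n
```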
  • 25. Classes (1)
    ● Everything in R is an object
    ● A class is the definition of an object
    ● A method is a function that performs specific calculations on objects of a specific class. A generic function is used to determine the class of its arguments and select the appropriate method. A generic function is a function with a collection of methods
    ● print, plot, summary...
    ● See ?Classes and ?Methods for more info
  • 26. Classes (2)
    ● S3 classes – old style, quick and dirty, informal
    ● Set an object's attribute to the class name, e.g.
        x<-c("a", "b", "c") # this is an object
        class(x)<-"X" # set the class of the object
        # Define a method specific to the X class
        print.X<-function(x, ...) {
          cat("X obj:\n")
          print(unclass(x), ...)
        }
        print(x) # X obj: a b c
    ● Inheritance
        class(x)<-c("X", "Y", "Z")
        inherits(x, "Z") # TRUE
  • 27. Classes (3)
    ● Constructor
        X<-function(x) {
          if (!is.numeric(x)) stop("x must be numeric")
          structure(x, class = "X")
        }
    ● Useful functions for S3 classes
      is.object(obj) checks whether an object has a class attribute
        class(x), unclass(x), methods(generic.function), methods(class="class"), inherits(obj, "class"), is(obj, "class")
    ● Creating new generics
        g<-function(x, ...) UseMethod("f", x) # g dispatches to methods named f.<class>
        f.X<-function(x, ...) print(x, ...)
        g(x) # X obj: a b c
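The constructor, generic, and method pieces fit together as in the sketch below. The class name temperature and the generic describe are hypothetical names chosen for illustration; only structure, UseMethod, unclass, and inherits come from the slides.

```r
# Constructor: validates input and tags the object with a class attribute
new_temperature <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  structure(x, class = "temperature")
}

# Generic: dispatches on the class of its first argument
describe <- function(obj, ...) UseMethod("describe")

# Method for the temperature class, plus a fallback for everything else
describe.temperature <- function(obj, ...) paste0(unclass(obj), " degrees")
describe.default     <- function(obj, ...) "unknown object"

reading  <- new_temperature(21)
typed    <- describe(reading)
fallback <- describe("a")
```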
  • 28. Classes (4)
    ● S4 classes – new style, rigorous and formal
    ● Classes have formal definitions which describe their fields and inheritance structures (parent classes)
    ● Method dispatch can be based on multiple arguments to a generic function, not just one
    ● There is a special operator, @, for extracting slots (aka fields) from an S4 object
    ● All S4 related code is stored in the methods package
  • 29. Classes (5)
    ● To create a new S4 class
        setClass(Class, representation)
    ● Use new() to generate a new object from a class
    ● To create an S4 method
        setMethod(f, signature, definition)
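A minimal S4 sketch using the three calls named on this slide. The class Person, its slots, and the generic greet are hypothetical; setGeneric is not mentioned on the slide but is needed here because greet does not already exist as a function.

```r
library(methods)

# Class definition with two typed slots
setClass("Person", representation(name = "character", age = "numeric"))

# A new generic, then a method for the Person class
setGeneric("greet", function(obj, ...) standardGeneric("greet"))
setMethod("greet", signature("Person"),
          function(obj, ...) paste("Hello,", obj@name))

p <- new("Person", name = "John", age = 20)
greeting <- greet(p)   # dispatches on the class of p
age_slot <- p@age      # @ extracts a slot
```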
  • 32. Packages (1)
    ● Packages extend functionality of R
    ● http://cran.r-project.org/web/packages
      – 5434 available packages as of Apr 14, 2014
    ● repository → installed → loaded
    ● library(help="package")
    ● Datasets
        data(mtcars); help(mtcars)
    ● Example: libsvm
        install.packages("e1071")
        library(e1071)
        detach("package:e1071")
  • 33. Packages (2)
    ● Library and namespace environments
      – Library environments, such as "package:stats", contain external objects resolvable to the user by their names. Thus a library environment has to be attached to the search path
      – Conversely, the namespace environments contain internal objects that are opaque to the user but transparent to the library functions. Usually, a namespace environment is a descendant of .BaseNamespaceEnv
        find("svm") # "package:e1071"
        environment(svm) # <environment: namespace:e1071>
        length(ls(environment(svm))) # 90 objects in the namespace environment
        length(ls("package:e1071")) # 58 objects in the package environment
      – Path from the namespace environment to .GlobalEnv
        <environment: namespace:e1071> → "imports:e1071" → .BaseNamespaceEnv → .GlobalEnv
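The same chain can be walked for a package that ships with R, so the sketch does not require e1071. It uses stats::sd purely as an example of a function that lives in a namespace; the variable names are assumptions.

```r
ns      <- environment(stats::sd)  # the namespace environment of stats
imports <- parent.env(ns)          # the imports environment of stats
base_ns <- parent.env(imports)     # expected to be .BaseNamespaceEnv
after   <- parent.env(base_ns)     # expected to be .GlobalEnv

ns_name <- environmentName(ns)
```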
  • 34. Packages (3)
    ● Example SparkR
        sc<-sparkR.init(master)
        parallelize(sc, col, numSlices)
        map(rdd, func)
        reduce(rdd, func)
        reduceByKey(rdd, combineFunc, numPartitions)
        cache(rdd)
        collect(rdd)
    ● Pi example
  • 37. Appendix: funny stuff about R
    ● Expression which returns itself
        (function(x) substitute((x)(x)))(function(x) substitute((x)(x)))
        (function(x) substitute((x)(x)))(function(x) substitute((x)(x)))
        expression <- (function(x) substitute((x)(x)))(function(x) substitute((x)(x)))
        expression == eval(expression) # TRUE

Editor's Notes

  1. R is a dialect of the S language. It is a case-sensitive, interpreted language. You can enter commands one at a time at the command prompt or run a set of commands from a source file.
  2. NULL is the null object. Special values: NA, NaN, Inf. NA may have a type (for example NA_integer_ or NA_character_).