Introduction to R
What is R?
Getting Started
Data structures: Scalar (number, string, Boolean, Date-time), Vector, Matrix, Data frame, List
Input / Output
Plots
Control Logic
Working with Strings
Writing Functions
Angshuman Saha
What is R?
• R is a free software environment for statistical computing and graphics. It
compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
• R can be downloaded and installed from CRAN via the R project website
(http://www.r-project.org/)
• CRAN stands for Comprehensive R Archive Network
• The installation comes with the base, stats and a few other packages. Beyond
these, there are hundreds of contributed packages that enable users to perform
a variety of specialized computations on data
Getting Started in R
Double-click on the R icon on your desktop to start R. This launches the R GUI window.
1. Using the command prompt
At the command prompt you can type your code directly and hit Enter to run it. This, however, runs the
code one line at a time.
2. Using external text files
You can use a standard text editor like Notepad to create your R code and save it in a text file. You can
then manually copy the whole code from there and paste it into the R GUI window. This will run the whole code.
3. Using .r files
You may save your R code in a text file with the extension ".r". You can then source this file to run the code.
Use "File > Source R code" from the menu to do this. Alternatively, you may type the following command at the R
prompt to run the code:
source("D:/myFirstRcode.r")
When using source(), specify the path of your R code file within double quotes. Use the full path unless the
file is in the current working directory.
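A minimal sketch of the third option, assuming a throwaway script written to a temporary file (the file and the variable names a and b are made up for illustration):

```r
# Write a two-line script to a temporary file (hypothetical file, not from the slides).
script_path = tempfile(fileext = ".r")
writeLines(c("a = 10", "b = a * 2"), con = script_path)

# source() runs every line of the file, in order, in the current workspace.
source(script_path)
b   # 20
```

After source() finishes, the variables created by the script (here a and b) are available at the prompt, just as if the lines had been typed one by one.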
Data Structure: Vector
Vector > Creation
x = c(10, 12.3, 45) # create a vector of 3 numbers
x = c(FALSE, TRUE, TRUE, FALSE) # create a vector of 4 logical (boolean) values
x = c("red", "green", "blue") # create a vector of 3 strings
x = c(1:15) # create a vector of integers 1 to 15
x = 1:15 # equivalent to previous code
x = rep(5.6, 10) # repeat 5.6, 10 times. Vector of length 10, all entries equal to 5.6
x = rep(c(1,2), c(3,2)) # x = (1,1,1,2,2)
x = seq(10, 14, 2) # sequence from 10 to 14 in steps of 2. x = (10,12,14)
x = vector(mode="numeric", length=0)
# Initialize a zero-length numeric vector; values will be put inside it later
Vector > Accessing Elements
x = c(10, 12.3 , 45, 55, 65, 75, 85) # create a vector
y=x[2] # y has value 12.3
y=x[c(5,6,7)] # y is a vector with 5th,6th and 7th value of x
y=x[ -c(5,6,7) ] # y is a vector with all but 5th,6th and 7th value of x
y=x[c(1,1,3,4,7,7)] # y = (10,10,45,55,85,85)
Vector > Naming
x = c(10, 45, 55 ) # create a vector
names(x) = c("first", "second", "third") # name the elements of x
y = x["second"] # y = 45. Elements can be accessed by name.
a = "third" ; y = x[a] # y = 55. The name can be passed through another variable
Vector > operations
x = c(10, 45, 55 ) ; y = c(1, 5, 6 ) # create two vectors x and y
z = x + y # z=(11,50,61) . Element-wise addition
z = x - y # z=(9,40,49) . Element-wise subtraction
z = x * y # z=(10,225,330) . Element-wise multiplication
z = x / y # z=(10,9,9.16667) . Element-wise division
z = x ^2 # z=(100,2025,3025) . Element-wise squaring
z = x[x>20] # z=(45,55) . All elements of x that are >20
z= which(x>20) # z= (2,3). Indices of x where x>20
z1 = x[x>20] ; z2 = x[ which( x>20 ) ] ; u= which(x>20) ; z3=x[u]
# z1 z2 and z3 are all identical
Data Structure: Matrix
Matrix > Creation
x = matrix( 10, nrow=3 , ncol = 5) # x is a 3 by 5 matrix with all entries = 10
Matrix can be created from a vector
x = 1:12 ; mat = matrix(x , nrow = 4 , ncol=3)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
By default, numbers are stacked column wise.
To change that , use byrow = TRUE
x = 1:12 ; mat = matrix(x , nrow = 4 , ncol=3 , byrow = TRUE)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[4,] 10 11 12
Row and column names can be assigned
colnames(mat) =c("col1","col2","col3")
rownames(mat) = paste("rowID", 1:4, sep="_")
col1 col2 col3
rowID_1 1 2 3
rowID_2 4 5 6
rowID_3 7 8 9
rowID_4 10 11 12
Matrix > Subsetting
Consider the matrix mat from the previous example
col1 col2 col3
rowID_1 1 2 3
rowID_2 4 5 6
rowID_3 7 8 9
rowID_4 10 11 12
x = mat[2, ] # a vector containing the second row of mat
y = mat[ , 3] # a vector containing the third column of mat
x = mat["rowID_3", ] # third row of mat
x = mat[ , "col2"] # second column of mat
newmat = mat[1:2, 2:3] # sub-matrix of mat
newmat = mat[c(1,2,4), c(1,3)] # sub-matrix of mat
diag_entries = diag(mat) # vector (1,5,9)
Row / column names can be changed
rownames(mat)[3] = "third"
colnames(mat)[2] = "Nm2"
col1 Nm2 col3
rowID_1 1 2 3
rowID_2 4 5 6
third 7 8 9
rowID_4 10 11 12
Set all values > 9 to 99
mat[mat > 9] = 99
col1 Nm2 col3
rowID_1 1 2 3
rowID_2 4 5 6
third 7 8 9
rowID_4 99 99 99
Matrix > Operations
Element-wise operations
mat1 = matrix(1:12, nrow=4, ncol=3)
mat2 = matrix(10*(1:12), nrow=4, ncol=3)
mat3 = mat1 + mat2 # element-wise addition
# Similarly we can have element-wise
# subtraction, multiplication, division
Matrix multiplication
mat1 = matrix(1:16, nrow=4, ncol=4)
mat2 = matrix(10*(1:16), nrow=4, ncol=4)
mat3 = mat1 %*% mat2 # matrix multiplication
Matrix inversion
mat1 = matrix(rnorm(16), 4, 4)
mat2 = solve(mat1) # mat2 is the inverse of mat1
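One way to convince yourself that solve() returned the inverse is to multiply it back with the original matrix. The sketch below assumes a fixed random seed (not in the slides) so the run is repeatable, and compares against the identity matrix with a numerical tolerance rather than exact equality:

```r
set.seed(1)                               # assumption: fixed seed for repeatability
mat1 = matrix(rnorm(16), 4, 4)            # random 4-by-4 matrix
mat2 = solve(mat1)                        # its inverse
check = mat1 %*% mat2                     # should be numerically close to the identity
is_identity = isTRUE(all.equal(check, diag(4)))   # all.equal() allows tiny rounding errors
```

Exact comparison with == is usually too strict here, because floating point arithmetic leaves small rounding errors in the off-diagonal entries.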
Data Structure: Data Frame
Data Frame > Background
• Data frame can be thought of as a matrix where the
columns may be of different types (e.g. text, date, number,
logical)
• Most datasets we work with can be stored as data frame
• Row / column subsetting works just like matrices
• Row and column names can be assigned
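As a small illustration of these points, the sketch below builds a three-row data frame with columns of different types and subsets it with matrix-style indexing. The row names c1, c2, c3 are made up for the example:

```r
cust_dat = data.frame(Name = c("Bob", "John", "Jane"),
                      Age = c(67, 45, 52),
                      ownHouse = c(FALSE, FALSE, TRUE),
                      stringsAsFactors = FALSE)
rownames(cust_dat) = c("c1", "c2", "c3")    # hypothetical row names

second_row = cust_dat[2, ]                  # second row, returned as a one-row data frame
ages = cust_dat[ , "Age"]                   # a single column, returned as a numeric vector
sub_dat = cust_dat[c("c1", "c3"), c("Name", "ownHouse")]   # subset by row and column names
col_types = sapply(cust_dat, class)         # each column keeps its own type
```

Unlike a matrix, selecting a single row keeps the result as a data frame, since the row may mix several types.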
Data Frame > Creation
Data frames can be created by stacking individual vectors column-wise
cust = c("Bob", "John", "Jane")
age = c(67, 45, 52)
ownHouse = c(FALSE, FALSE, TRUE)
cust_dat = data.frame(Name = cust, Age = age, ownHouse = ownHouse)
Name Age ownHouse
1 Bob 67 FALSE
2 John 45 FALSE
3 Jane 52 TRUE
Data frames can also be created by reading data from a csv file
cust_dat = read.csv(file = "custData.csv", header = TRUE, stringsAsFactors = FALSE)
header = TRUE: the 1st row of the file contains column names
stringsAsFactors = FALSE: do not convert character vectors to "factors"
Data Frame > Creation
Two data frames can be stacked below each other
Consider two data frames - cust1 & cust2
cust1:
Name Age ownHouse
1 Bob 67 FALSE
2 John 45 FALSE
3 Jane 52 TRUE
cust2:
Name Age ownHouse
1 Bill 55 TRUE
2 Jack 75 TRUE
3 Deb 49 TRUE
cust = rbind(cust1, cust2)
Name Age ownHouse
1 Bob 67 FALSE
2 John 45 FALSE
3 Jane 52 TRUE
4 Bill 55 TRUE
5 Jack 75 TRUE
6 Deb 49 TRUE
A new data frame can be created by subsetting an existing data frame
cust = cust[cust$Age > 60, ]
Name Age ownHouse
1 Bob 67 FALSE
5 Jack 75 TRUE
Data Frame > Creation
An empty data frame can be created by specifying column names and types. It can be populated later.
cust0 = data.frame(
Name = character(0),
Age = numeric(0),
ownHouse = logical(0)
)
[1] Name Age ownHouse
<0 rows> (or 0-length row.names)
An empty data frame can be created from an existing data frame
cust0 = cust[0, ]
[1] Name Age ownHouse
<0 rows> (or 0-length row.names)
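One possible way to populate an empty data frame later is to append rows with rbind(). This is a sketch under the assumption that each new row is itself a one-row data frame with the same column names; the values are illustrative:

```r
# Start from an empty data frame with known column names and types.
cust0 = data.frame(Name = character(0), Age = numeric(0), ownHouse = logical(0),
                   stringsAsFactors = FALSE)

# Append rows one at a time; each new row must use the same column names.
cust0 = rbind(cust0, data.frame(Name = "Bob", Age = 67, ownHouse = FALSE,
                                stringsAsFactors = FALSE))
cust0 = rbind(cust0, data.frame(Name = "Jane", Age = 52, ownHouse = TRUE,
                                stringsAsFactors = FALSE))
n_rows = nrow(cust0)   # 2
```

Growing a data frame row by row is convenient for small cases; for many rows it is usually faster to collect the pieces first and call rbind() once.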
Data Frame > Creation
Two data frames can be merged by a common column.
By default, only common records are returned.
Using the options all, all.x and all.y, different record sets are
obtained. Records may contain missing values.
cust1:
Name Age ownHouse
1 Bob 67 FALSE
2 John 45 FALSE
3 Jane 52 TRUE
cust2:
Name PetCount hasCar
1 Bob 1 TRUE
2 John 0 FALSE
3 Jill 5 TRUE
cust = merge(cust1, cust2, by = "Name")
Name Age ownHouse PetCount hasCar
1 Bob 67 FALSE 1 TRUE
2 John 45 FALSE 0 FALSE
cust = merge(cust1, cust2, by = "Name", all = TRUE)
Name Age ownHouse PetCount hasCar
1 Bob 67 FALSE 1 TRUE
2 Jane 52 TRUE NA NA
3 Jill NA NA 5 TRUE
4 John 45 FALSE 0 FALSE
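The all.x and all.y options are not shown on the slide. A short sketch with the same two data frames, rebuilt here so the example is self-contained:

```r
cust1 = data.frame(Name = c("Bob", "John", "Jane"), Age = c(67, 45, 52),
                   ownHouse = c(FALSE, FALSE, TRUE), stringsAsFactors = FALSE)
cust2 = data.frame(Name = c("Bob", "John", "Jill"), PetCount = c(1, 0, 5),
                   hasCar = c(TRUE, FALSE, TRUE), stringsAsFactors = FALSE)

# all.x = TRUE keeps every record of cust1; Jane gets NA for PetCount and hasCar.
left = merge(cust1, cust2, by = "Name", all.x = TRUE)

# all.y = TRUE keeps every record of cust2; Jill gets NA for Age and ownHouse.
right = merge(cust1, cust2, by = "Name", all.y = TRUE)
```

In both results the rows come back sorted by Name, as in the outputs shown above.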
Data Structure: List
List > Background
• List can be thought of as a vector, whose elements may be
of different types
[Diagram: a list whose components are a vector, a matrix and another list]
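A small sketch of such a list, holding a vector, a matrix and another list. The component names vec, mat, sub, id and tag are chosen only for illustration:

```r
inner = list(id = 7, tag = "nested")                  # hypothetical inner list
mylist = list(vec = c(1, 5, 7), mat = matrix(0, 3, 3), sub = inner)

# Report the (first) class of each component to show that the types differ.
component_classes = sapply(mylist, function(comp) class(comp)[1])

# Components of a nested list are reached by chaining the accessors.
nested_tag = mylist$sub$tag                           # "nested"
```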
List > Creation
An empty list
mylist = list() # nothing is known about the list
mylist = vector(mode="list", length=5) # length is known upfront
Non-empty list
mylist = list(c(1,5,7), "abc", matrix(0,3,3))
List with names
mylist = list(comp1 = c(1,5,7), comp2 = "abc", comp3 = matrix(0,3,3))
List > Accessing the entries
By Index
mylist = list(c(1,5,7), "abc", matrix(0,3,3))
x = mylist[[1]] # x is a vector (1,5,7)
x = mylist[[2]] # x is a string "abc"
x = mylist[[3]] # x is a 3-by-3 matrix of zeros
By Name
mylist = list(comp1 = c(1,5,7), comp2 = "abc", comp3 = matrix(0,3,3))
x = mylist$comp1 # x is a vector (1,5,7)
x = mylist$comp2 # x is a string "abc"
x = mylist$comp3 # x is a 3-by-3 matrix of zeros
List > Updating entries
mylist = list(comp1 = c(1,5,7), comp2 = "abc", comp3 = matrix(0,3,3))
By Index
mylist[[4]] = 1024 # create a new entry at the 4th position: the number 1024
mylist = mylist[-3] # drop the third entry from mylist
mylist[[2]] = "New Entry" # update the second entry
By Name
mylist$comp99 = 1024 # create a new entry at the 4th position, named "comp99"
mylist$comp1 = c(10,10) # update the entry "comp1"
Renaming components
mylist = list(comp1 = c(1,5,7), comp2 = "abc", comp3 = matrix(0,3,3))
names(mylist) # returns the vector ("comp1", "comp2", "comp3")
names(mylist) = c("A", "B", "C") # change the names of the components
names(mylist)[2] = "second" # change only the name of the second component
Subsets
newlist = mylist[c(1,3)] # new list contains the first and third entries of mylist
Data Structure: Date & Time
Data Structure: Dates
Sys.time() # Returns the current system date and time.
String to Date-time
x = strptime("02-07-2012", format="%m-%d-%Y")
x = strptime("02-feb-2012", format="%d-%b-%Y")
x = strptime("02-feb-2012 15:45:10", format="%d-%b-%Y %H:%M:%S")
Date-time to String
x = Sys.time() # on typing x in console you see: "2012-06-22 11:44:01 IST"
y = strftime(x, format="%d-%b-%Y") # "22-Jun-2012"
y = strftime(x, format="date: %d-%b-%Y >> Time: %H+%M+%S")
# "date: 22-Jun-2012 >> Time: 11+44+01"
y = strftime(x, format="%d-%b-%Y %a >> Time: %H hour %M min %S sec")
# "22-Jun-2012 Fri >> Time: 11 hour 44 min 01 sec"
Study R help on date-time variables to learn about the large
number of possible format options
Data Structure: Dates
Two main (internal) formats for date-time are: POSIXct and POSIXlt
POSIXct: A short format of date-time, typically used to
store date-time columns in a data frame
POSIXlt: A long format of date-time; various other sub-units
of time can be extracted from it
Examples
x = Sys.time() # on typing x in console you see: "2012-06-22 11:44:01 IST"
y = as.POSIXlt(x) # Convert from POSIXct to POSIXlt
z = c(y$mon, y$year, y$hour, y$min, y$wday) # z = (5, 112, 11, 44, 5)
# mon counts months from 0 (5 = June), year counts years since 1900 (112 = 2012), wday counts days from Sunday = 0 (5 = Friday)
difftime
x1 = strptime("02-07-2012 14:20:34", format="%m-%d-%Y %H:%M:%S")
x2 = strptime("11-07-2012 14:20:34", format="%m-%d-%Y %H:%M:%S")
y = x2 - x1 # y is a difftime object
x1 + as.difftime(1, units="days") # "2012-02-08 14:20:34 IST"
x1 + as.difftime(10, units="mins") # "2012-02-07 14:30:34 IST"
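When the units of the difference matter, difftime() lets you state them explicitly. The sketch below reuses the two dates from the slide; the tz = "UTC" argument is an assumption added here so that daylight saving shifts cannot change the result:

```r
# tz = "UTC" is an assumption of this sketch, not part of the original example.
x1 = strptime("02-07-2012 14:20:34", format = "%m-%d-%Y %H:%M:%S", tz = "UTC")
x2 = strptime("11-07-2012 14:20:34", format = "%m-%d-%Y %H:%M:%S", tz = "UTC")

gap_days = difftime(x2, x1, units = "days")          # difftime object measured in days
gap_hours = as.numeric(gap_days, units = "hours")    # the same gap as a plain number of hours

x1_ct = as.POSIXct(x1)    # POSIXlt to POSIXct, the form usually stored in a data frame column
```

With x2 - x1, R picks the units automatically, so asking for them explicitly avoids surprises when the gap is short.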
Data Structure: Others
Data Structures: Others
Other than the data structures described so far, there are a few very useful data types.
NULL
NULL is typically used for initializing variables. The code "x = NULL" creates a
variable x of length zero. It can later be given a value by overwriting x.
The function is.null() returns TRUE or FALSE and tells whether a variable is
NULL or not.
NA
NA is used for denoting missing values. The code "x = NA" creates a variable x whose
value is missing. The function is.na() returns TRUE or FALSE and tells whether a value is NA
or not.
NaN
NaN stands for "Not a Number". The code "x = sqrt(-10) ; y = log(-10)" sets the values of x
and y to NaN and also prints a warning message in the console. The function is.nan() lets you check
whether the value of a variable is NaN or not.
Inf
Inf stands for "Infinity". The code "x = 10/0 ; y = -3/0" sets the value of x to Inf and y to -Inf.
The function is.infinite() lets you check whether the value of a variable is infinite or not. The function
is.finite() returns FALSE for Inf and -Inf, and also for NA and NaN.
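The checks described above can be tried out in a few lines. This is an illustrative sketch; the vector v and the variable names are made up, and suppressWarnings() is used only to keep the console quiet:

```r
x = NULL
null_check = is.null(x)                  # TRUE
len_null = length(x)                     # 0

v = c(1, NA, 3)                          # hypothetical vector with one missing value
na_flags = is.na(v)                      # FALSE TRUE FALSE
mean_all = mean(v)                       # NA, because a value is missing
mean_ok = mean(v, na.rm = TRUE)          # 2, missing value dropped first

nan_val = suppressWarnings(sqrt(-10))    # NaN
inf_val = 10 / 0                         # Inf
finite_flags = is.finite(c(1, inf_val, nan_val, NA))   # TRUE FALSE FALSE FALSE
```

Many summary functions, such as mean() and sum(), accept na.rm = TRUE to skip missing values.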
Input / Output
Input
Read data (row-column format) from a csv file
x = read.csv(file = "D:/mydata.csv", header = TRUE, stringsAsFactors = FALSE)
# x is a data frame containing the data in the csv file
Read data (row-column format) from a delimited file
x = read.table(file = "D:/mydata.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)
# x is a data frame containing the data in the csv file
# read.csv is a special case of read.table with sep=",".
# In read.table you may specify a character of your choice as the separator
Reading arbitrary data using a lower level function: scan()
Using scan(), the user can read a file item by item, with control over the type and separator of the items.
These functions have many more optional input arguments
that let the user control the way in which data is read.
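A minimal sketch of scan() on a file that is not in row-column format. The file is a temporary one created for the example, with a different number of values on each line:

```r
# Create a small comma-separated file with ragged lines (hypothetical data).
data_path = tempfile(fileext = ".txt")
writeLines(c("10,20,30", "40,50"), con = data_path)

# what = numeric() tells scan() the type of each item; sep = "," is the separator.
# quiet = TRUE suppresses the "Read N items" message.
vals = scan(file = data_path, what = numeric(), sep = ",", quiet = TRUE)
# vals is the numeric vector (10, 20, 30, 40, 50)
```

read.csv() would need every line to have the same number of fields, whereas scan() simply returns all items as one vector.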
Output
Write a data frame to a file on disk
# Assume: x is a data frame
# write.csv() writes it to a csv file on disk
# (column names are always written; write.csv() ignores a col.names argument)
write.csv(x, file = "D:/out.csv", row.names = FALSE, na = "")
# write.table() writes it to any user-specified file.
# write.csv() is a special case of write.table()
write.table(x, file = "D:/out.txt",
row.names = FALSE, col.names = TRUE, na = "", sep = "\t")
Write an R object in the R workspace to disk
# Assume: x is an object in R workspace
save(x, file = "D:/out.RData")
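An object written with save() can be read back with load(). The sketch below does a round trip through a temporary file; the data frame contents are made up for illustration:

```r
x = data.frame(id = 1:3, score = c(2.5, NA, 4))   # hypothetical data
rdata_path = tempfile(fileext = ".RData")

save(x, file = rdata_path)   # write the object x to disk
rm(x)                        # remove it from the workspace

# load() restores the object under its original name and
# returns the names of the objects it restored.
loaded_names = load(rdata_path)
```

Because load() restores objects under their saved names, it can silently overwrite a variable of the same name that already exists in the workspace.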
Plots
Plots – xy plot
X-y scatter plot
x = rnorm(100, mean = 2, sd = 2)
y = rnorm(100, mean = 10, sd = 1)
plot(x, y,
xlab = "x-variable", ylab = "y-variable",
main = "scatter plot example",
pch = 19, cex = 0.7, col = "blue")
[Figure: scatter plot with callouts showing where main, xlab and ylab appear]
A large number of options are available to control axes, tick
marks, axis labels, legends, font type and size, etc.
Plots - overlay
Generate a plot
x = rnorm(100, mean = 2, sd = 2)
y = rnorm(100, mean = 10, sd = 1)
plot(x, y, xlab = "x-variable", ylab = "y-variable",
main = "scatter plot example", pch = 19, cex = 0.7, col = "blue")
Add red points later
x1 = rnorm(30, mean = 0, sd = 1)
y1 = rnorm(30, mean = 12, sd = 0.5)
points(x1, y1, pch = 15, col = "red", cex = 1)
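Points added later are drawn inside the axes fixed by the first plot() call, so some of them may fall outside the visible region. One possible refinement, sketched here under a few assumptions (a fixed seed, a temporary pdf file as the output device, and made-up legend labels), is to set the axis limits from both samples up front and add a legend:

```r
set.seed(42)                                   # assumption: fixed seed for repeatability
x = rnorm(100, mean = 2, sd = 2);  y = rnorm(100, mean = 10, sd = 1)
x1 = rnorm(30, mean = 0, sd = 1);  y1 = rnorm(30, mean = 12, sd = 0.5)

plot_path = tempfile(fileext = ".pdf")         # hypothetical output file
pdf(file = plot_path)
# xlim / ylim cover both samples, so the red points cannot fall outside the axes.
plot(x, y, pch = 19, cex = 0.7, col = "blue",
     xlim = range(c(x, x1)), ylim = range(c(y, y1)),
     xlab = "x-variable", ylab = "y-variable", main = "overlay with legend")
points(x1, y1, pch = 15, col = "red")
legend("topright", legend = c("first sample", "second sample"),
       pch = c(19, 15), col = c("blue", "red"))
invisible(dev.off())                           # close the device so the file is written
```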
Plots – multi panel plot
x = rnorm(100, mean = 2, sd = 2)
y = rnorm(100, mean = 10, sd = 1)
par(mfrow=c(2,2))
plot(x, y, xlab = "x-variable", ylab = "y-variable",
main = "scatter plot example", pch = 19, cex = 0.7, col = "blue")
hist(x, xlab = "x-variable", ylab = "frequency",
main = "histogram-x", col = "grey", border = "blue", lwd = 2)
hist(y, xlab = "y-variable", ylab = "frequency",
main = "histogram-y", col = "grey", border = "blue", lwd = 2)
plot(density(x), col = "limegreen", lwd = 2,
xlab = "x", ylab = "density", main = "density plot")
par(mfrow=c(2,2)) splits the plot region into a 2-by-2 matrix.
The next 4 plot commands create plots in cells (1,1), (1,2), (2,1), (2,2)
Plots – saving to a file
x = rnorm(100, mean = 2, sd = 2)
y = rnorm(100, mean = 10, sd = 1)
png(file = "D:/testplots.png")
par(mfrow=c(2,2))
plot(x, y, xlab = "X", ylab = "Y", main = " ", pch = 19, cex = 0.7, col = "blue")
plot(0, 0, type = "n", axes = FALSE, xlab = "", ylab = "", main = "")
text(0, 0, "NO DATA")
hist(y, xlab = "Y", ylab = "frequency", main = "histogram-y", col = "grey", border = "blue", lwd = 2)
plot(density(x), col = "limegreen", lwd = 2, xlab = "x", ylab = "density", main = "density plot (X)")
dev.off()
The code creates a 2-by-2 panel plot and saves it in a png file
at the location D:/testplots.png
Control Logic
Control
while ()
# Generate k random numbers from N(0,1)
# k is not fixed a priori.
# Stop when the sum of the values exceeds 5
x = NULL ; stopIter = FALSE
while (!stopIter) {
x = c(x, rnorm(1, mean=0, sd=1))
sumx = sum(x)
if (sumx > 5) { stopIter = TRUE }
}
for ()
# Example of for loop
x = rnorm(100) ; y = rep(0, length(x))
for (i in 1:length(x)) { y[i] = x[i]^3 }
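Since arithmetic on vectors is element-wise (see Vector > operations), this particular loop can also be written without a loop. A short sketch comparing the two forms, with a fixed seed assumed for repeatability:

```r
set.seed(7)                                  # assumption: fixed seed
x = rnorm(100)

# Loop version; seq_along(x) is safe even when x is empty, unlike 1:length(x).
y_loop = rep(0, length(x))
for (i in seq_along(x)) { y_loop[i] = x[i]^3 }

# Vectorized version: the operation is applied to every element at once.
y_vec = x^3

same = isTRUE(all.equal(y_loop, y_vec))      # both approaches give the same result
```

The vectorized form is shorter and typically faster; explicit loops remain useful when each step depends on the previous one, as in the while() example.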
Working with Strings
x = nchar("WRA data Filtering") # counts the number of characters; x = 18 in this case
MetID = 2 ; x = paste("Met", MetID, sep = ":") # string concatenation; x = "Met:2"
x = substr("Met 12", start = 1, stop = 5) # substring from position 1 to 5; x = "Met 1"
x = strsplit("Met1 has no data", split = " ") # splits the string by " ". Returns a list
y = unlist(x) # y is a vector with 4 elements: "Met1", "has", "no", "data"
x = sub(pattern = "Met1", replacement = "Met2", x = "Met1 is empty")
# replaces the first match; x = "Met2 is empty"
x = gsub("Met1", "Met2", x = "Met1 is empty. Met1 has no data.")
# replaces all matches; x = "Met2 is empty. Met2 has no data."
x = c("red", "Blue", "green", "skyblue")
y = grep(pattern = "blue", x = x, ignore.case = TRUE) # y = (2,4), the positions of matches
z = grep(pattern = "blue", x = x, ignore.case = TRUE, value = TRUE)
# z = ("Blue", "skyblue"), the actual strings that match the pattern
Regular Expressions
x = c("ht_10m", "ht:20m", " ht_30m")
y = gsub("^ht_", "HT:", x) # y = ("HT:10m", "ht:20m", " ht_30m")
# Replace "ht_" at the beginning of the string with "HT:"
y = gsub("m$", "mtr", x) # y = ("ht_10mtr", "ht:20mtr", " ht_30mtr")
# Replace "m" at the end of the string with "mtr"
y = gsub("[0-9]+", "XXX", x) # y = ("ht_XXXm", "ht:XXXm", " ht_XXXm")
# Replace one or more consecutive digits with "XXX"
y = gsub("_[0-9]+", "XXX", x) # y = ("htXXXm", "ht:20m", " htXXXm")
# Replace one or more digits preceded by "_" with "XXX"
u = grep("^ht_[0-9]+m", x) ; y = x ; y[-u] = "invalid!"
# y = ("ht_10m", "invalid!", "invalid!")
# Used for checking the validity of the format of a string
Regular expressions provide a vast number of options for manipulating
strings. Study R help on regular expressions to know more.
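As one more illustration, a capture group (a part of the pattern wrapped in parentheses) can be used to pull the number out of the valid strings. This is a sketch on the same example vector; the variable names are illustrative:

```r
x = c("ht_10m", "ht:20m", " ht_30m")
pattern = "^ht_([0-9]+)m$"                 # the parentheses capture the digits

is_valid = grepl(pattern, x)               # TRUE FALSE FALSE
# "\\1" in the replacement refers to whatever the first pair of parentheses matched.
heights = as.numeric(sub(pattern, "\\1", x[is_valid]))   # 10
```

grepl() returns a logical vector of the same length as x, which is often easier to work with than the positions returned by grep() when no element matches.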
Writing functions
Function
Define the function
GetSummary = function(x = NULL) { # x is the argument; NULL is its default value
output = list(SumOfSqr = NA, Mean_x = NA, Failed = TRUE)
# Input validation: return early if x is missing, empty or not numeric
if (is.null(x) || length(x) == 0 || !is.numeric(x)) { return(output) }
###############
output$SumOfSqr = sum(x^2, na.rm = TRUE)
output$Mean_x = mean(x, na.rm = TRUE)
output$Failed = FALSE
return(output) # return value
}
Use the function
x = rnorm(1000) ; out = GetSummary(x)
Further Resources
Further Help on R
- http://cran.r-project.org/
- http://www.r-project.org/search.html
This page provides links to search engines specific to R
- Search for “R tutorial” , “R forum” …
Have fun exploring
the world of R

More Related Content

What's hot

Econometrics ch6
Econometrics ch6Econometrics ch6
Econometrics ch6
Baterdene Batchuluun
 
Regression ppt.pptx
Regression ppt.pptxRegression ppt.pptx
Regression ppt.pptx
DevendraSinghKaushal1
 
Data Types and Structures in R
Data Types and Structures in RData Types and Structures in R
Data Types and Structures in R
Rupak Roy
 
Machine Learning for Survival Analysis
Machine Learning for Survival AnalysisMachine Learning for Survival Analysis
Machine Learning for Survival Analysis
Chandan Reddy
 
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Simplilearn
 
Multivariate Data Analysis
Multivariate Data AnalysisMultivariate Data Analysis
Multivariate Data Analysis
Merul Romadhani
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
Khaled Abd Elaziz
 
R studio
R studio R studio
R studio
Kinza Irshad
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
RekhaChoudhary24
 
R programming slides
R  programming slidesR  programming slides
R programming slides
Pankaj Saini
 
統計的因果推論勉強会 第1回
統計的因果推論勉強会 第1回統計的因果推論勉強会 第1回
統計的因果推論勉強会 第1回
Hikaru GOTO
 
Diagnósticos do Modelo Clássico de Regressão Linear
Diagnósticos do Modelo Clássico de Regressão LinearDiagnósticos do Modelo Clássico de Regressão Linear
Diagnósticos do Modelo Clássico de Regressão Linear
Felipe Pontes
 
Generalized linear model
Generalized linear modelGeneralized linear model
Generalized linear model
Rahul Rockers
 
Introduction to Data Analysis With R and R Studio
Introduction to Data Analysis With R and R StudioIntroduction to Data Analysis With R and R Studio
Introduction to Data Analysis With R and R Studio
Azmi Mohd Tamil
 
Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...
Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...
Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...
Edureka!
 
Unit 2 - Data Manipulation with R.pptx
Unit 2 - Data Manipulation with R.pptxUnit 2 - Data Manipulation with R.pptx
Unit 2 - Data Manipulation with R.pptx
Malla Reddy University
 
Multiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA IMultiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA I
James Neill
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors
krishna singh
 
Model selection
Model selectionModel selection
Model selection
Animesh Kumar
 

What's hot (20)

Econometrics ch6
Econometrics ch6Econometrics ch6
Econometrics ch6
 
Regression ppt.pptx
Regression ppt.pptxRegression ppt.pptx
Regression ppt.pptx
 
Data Types and Structures in R
Data Types and Structures in RData Types and Structures in R
Data Types and Structures in R
 
Machine Learning for Survival Analysis
Machine Learning for Survival AnalysisMachine Learning for Survival Analysis
Machine Learning for Survival Analysis
 
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
 
Multivariate Data Analysis
Multivariate Data AnalysisMultivariate Data Analysis
Multivariate Data Analysis
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
R studio
R studio R studio
R studio
 
Time Series Analysis Ravi
Time Series Analysis RaviTime Series Analysis Ravi
Time Series Analysis Ravi
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 
R programming slides
R  programming slidesR  programming slides
R programming slides
 
統計的因果推論勉強会 第1回
統計的因果推論勉強会 第1回統計的因果推論勉強会 第1回
統計的因果推論勉強会 第1回
 
Diagnósticos do Modelo Clássico de Regressão Linear
Diagnósticos do Modelo Clássico de Regressão LinearDiagnósticos do Modelo Clássico de Regressão Linear
Diagnósticos do Modelo Clássico de Regressão Linear
 
Generalized linear model
Generalized linear modelGeneralized linear model
Generalized linear model
 
Introduction to Data Analysis With R and R Studio
Introduction to Data Analysis With R and R StudioIntroduction to Data Analysis With R and R Studio
Introduction to Data Analysis With R and R Studio
 
Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...
Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...
Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...
 
Unit 2 - Data Manipulation with R.pptx
Unit 2 - Data Manipulation with R.pptxUnit 2 - Data Manipulation with R.pptx
Unit 2 - Data Manipulation with R.pptx
 
Multiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA IMultiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA I
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors
 
Model selection
Model selectionModel selection
Model selection
 

Similar to A quick introduction to R

Introduction to r
Introduction to rIntroduction to r
Introduction to r
Golden Julie Jesus
 
R programming
R programmingR programming
R programming
Pramodkumar Jha
 
R Programming.pptx
R Programming.pptxR Programming.pptx
R Programming.pptx
kalai75
 
Multi dimensional arrays
Multi dimensional arraysMulti dimensional arrays
Multi dimensional arraysAseelhalees
 
Day 1d R structures & objects: matrices and data frames.pptx
Day 1d   R structures & objects: matrices and data frames.pptxDay 1d   R structures & objects: matrices and data frames.pptx
Day 1d R structures & objects: matrices and data frames.pptx
Adrien Melquiond
 
R Basics
R BasicsR Basics
R Programming Intro
R Programming IntroR Programming Intro
R Programming Intro
062MayankSinghal
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data Manipulation
Chu An
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on r
Abhik Seal
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavVyacheslav Arbuzov
 
Day 1c access, select ordering copy.pptx
Day 1c   access, select   ordering copy.pptxDay 1c   access, select   ordering copy.pptx
Day 1c access, select ordering copy.pptx
Adrien Melquiond
 
[1062BPY12001] Data analysis with R / week 2
[1062BPY12001] Data analysis with R / week 2[1062BPY12001] Data analysis with R / week 2
[1062BPY12001] Data analysis with R / week 2
Kevin Chun-Hsien Hsu
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
Sander Kieft
 
3 Data Structure in R
3 Data Structure in R3 Data Structure in R
3 Data Structure in R
Dr Nisha Arora
 
Chapter2
Chapter2Chapter2
Chapter2
Krishna Kumar
 
Ggplot2 v3
Ggplot2 v3Ggplot2 v3
Ggplot2 v3
Josh Doyle
 
Data import-cheatsheet
Data import-cheatsheetData import-cheatsheet
Data import-cheatsheet
Dieudonne Nahigombeye
 
Matlab Overviiew 2
Matlab Overviiew 2Matlab Overviiew 2
Matlab Overviiew 2
Nazim Naeem
 

Similar to A quick introduction to R (20)

Introduction to r
Introduction to rIntroduction to r
Introduction to r
 
R programming
R programmingR programming
R programming
 
R Programming.pptx
R Programming.pptxR Programming.pptx
R Programming.pptx
 
Multi dimensional arrays
Multi dimensional arraysMulti dimensional arrays
Multi dimensional arrays
 
Day 1d R structures & objects: matrices and data frames.pptx
Day 1d   R structures & objects: matrices and data frames.pptxDay 1d   R structures & objects: matrices and data frames.pptx
Day 1d R structures & objects: matrices and data frames.pptx
 
R Basics
R BasicsR Basics
R Basics
 
R Programming Intro
R Programming IntroR Programming Intro
R Programming Intro
 
bobok
bobokbobok
bobok
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data Manipulation
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on r
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
 
Day 1c access, select ordering copy.pptx
Day 1c   access, select   ordering copy.pptxDay 1c   access, select   ordering copy.pptx
Day 1c access, select ordering copy.pptx
 
[1062BPY12001] Data analysis with R / week 2
[1062BPY12001] Data analysis with R / week 2[1062BPY12001] Data analysis with R / week 2
[1062BPY12001] Data analysis with R / week 2
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
R교육1
R교육1R교육1
R교육1
 
3 Data Structure in R
3 Data Structure in R3 Data Structure in R
3 Data Structure in R
 
Chapter2
Chapter2Chapter2
Chapter2
 
Ggplot2 v3
Ggplot2 v3Ggplot2 v3
Ggplot2 v3
 
Data import-cheatsheet
Data import-cheatsheetData import-cheatsheet
Data import-cheatsheet
 
Matlab Overviiew 2
Matlab Overviiew 2Matlab Overviiew 2
Matlab Overviiew 2
 

Recently uploaded

一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样

A quick introduction to R

  • 1. Introduction to R (Angshuman Saha)
    Contents: What is R?; Getting Started; Data structures (scalar: number, string, Boolean, date-time; vector; matrix; data frame; list); Input / Output; Plots; Control Logic; Working with Strings; Writing Functions
  • 2. What is R?
    • R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
    • R can be downloaded and installed from the CRAN website (http://www.r-project.org/).
    • CRAN stands for Comprehensive R Archive Network.
    • The installation comes with base, stats and a few other packages. Beyond these, there are hundreds of contributed packages that let users perform a variety of specialized computations on data.
  • 4. Getting Started
    Double-click the R icon on your desktop to start R. This launches the R GUI window. There are three ways to run code:
    1. Using the command prompt: type your code directly at the prompt and press Enter. This runs the code one line at a time.
    2. Using external text files: write your R code in a standard text editor such as Notepad and save it in a text file. Copy the whole code from there and paste it into the R GUI window to run all of it.
    3. Using .r files: save your R code in a text file with the extension ".r", then source the file to run the code. Use "File > Source R code" from the menu, or type the following command at the R prompt:
       source("D:/myFirstRcode.r")
       Give the path of the file within double quotes when using source(). A full path avoids depending on the current working directory.
  • 6. Vector > Creation
    x = c(10, 12.3, 45)               # create a vector of 3 numbers
    x = c(FALSE, TRUE, TRUE, FALSE)   # create a vector of 4 logical (boolean) values
    x = c("red", "green", "blue")     # create a vector of 3 strings
    x = c(1:15)                       # create a vector of integers 1 to 15
    x = 1:15                          # equivalent to the previous line
    x = rep(5.6, 10)                  # repeat 5.6, 10 times. Vector of length 10, all entries equal to 5.6
    x = rep(c(1, 2), c(3, 2))         # x = (1, 1, 1, 2, 2)
    x = seq(10, 14, 2)                # sequence from 10 to 14 in steps of 2. x = (10, 12, 14)
    x = vector(mode = "numeric", length = 0)   # initialize a zero-length numeric vector; values will be put inside it later
  • 7. Vector > Accessing Elements
    x = c(10, 12.3, 45, 55, 65, 75, 85)   # create a vector
    y = x[2]                     # y has value 12.3
    y = x[c(5, 6, 7)]            # y is a vector with the 5th, 6th and 7th values of x
    y = x[-c(5, 6, 7)]           # y is a vector with all but the 5th, 6th and 7th values of x
    y = x[c(1, 1, 3, 4, 7, 7)]   # y = (10, 10, 45, 55, 85, 85)
    Vector > Naming
    x = c(10, 45, 55)                          # create a vector
    names(x) = c("first", "second", "third")   # name the elements of x
    y = x["second"]              # y = 45. Elements can be accessed by name.
    a = "third" ; y = x[a]       # y = 55. The name can be passed through another variable.
  • 8. Vector > Operations
    x = c(10, 45, 55) ; y = c(1, 5, 6)   # create two vectors x and y
    z = x + y          # z = (11, 50, 61). Element-wise addition
    z = x - y          # z = (9, 40, 49). Element-wise subtraction
    z = x * y          # z = (10, 225, 330). Element-wise multiplication
    z = x / y          # z = (10, 9, 9.16667). Element-wise division
    z = x^2            # z = (100, 2025, 3025). Element-wise squaring
    z = x[x > 20]      # z = (45, 55). All elements of x that are > 20
    z = which(x > 20)  # z = (2, 3). Indices of x where x > 20
    z1 = x[x > 20] ; z2 = x[which(x > 20)] ; u = which(x > 20) ; z3 = x[u]   # z1, z2 and z3 are all identical
  • 10. Matrix > Creation
    x = matrix(10, nrow = 3, ncol = 5)   # x is a 3 by 5 matrix with all entries = 10
    A matrix can be created from a vector
    x = 1:12 ; mat = matrix(x, nrow = 4, ncol = 3)
         [,1] [,2] [,3]
    [1,]    1    5    9
    [2,]    2    6   10
    [3,]    3    7   11
    [4,]    4    8   12
    By default, numbers are stacked column-wise. To change that, use byrow = TRUE
    x = 1:12 ; mat = matrix(x, nrow = 4, ncol = 3, byrow = TRUE)
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    [3,]    7    8    9
    [4,]   10   11   12
    Row and column names can be assigned
    colnames(mat) = c("col1", "col2", "col3")
    rownames(mat) = paste("rowID", 1:4, sep = "_")
            col1 col2 col3
    rowID_1    1    2    3
    rowID_2    4    5    6
    rowID_3    7    8    9
    rowID_4   10   11   12
  • 11. Matrix > Subsetting
    Consider the matrix mat from the previous example
            col1 col2 col3
    rowID_1    1    2    3
    rowID_2    4    5    6
    rowID_3    7    8    9
    rowID_4   10   11   12
    x = mat[2, ]            # a vector containing the second row of mat
    y = mat[, 3]            # a vector containing the third column of mat
    x = mat["rowID_3", ]    # third row of mat
    x = mat[, "col2"]       # second column of mat
    newmat = mat[1:2, 2:3]               # sub-matrix of mat
    newmat = mat[c(1, 2, 4), c(1, 3)]    # sub-matrix of mat
    diag_entries = diag(mat)             # vector (1, 5, 9)
    Row / column names can be changed
    rownames(mat)[3] = "third" ; colnames(mat)[2] = "Nm2"
            col1 Nm2 col3
    rowID_1    1   2    3
    rowID_2    4   5    6
    third      7   8    9
    rowID_4   10  11   12
    Set all values > 9 to 99
    mat[mat > 9] = 99
            col1 Nm2 col3
    rowID_1    1   2    3
    rowID_2    4   5    6
    third      7   8    9
    rowID_4   99  99   99
  • 12. Matrix > Operations
    Element-wise operations
    mat1 = matrix(1:12, nrow = 4, ncol = 3)
    mat2 = matrix(10 * (1:12), nrow = 4, ncol = 3)
    mat3 = mat1 + mat2     # element-wise addition
    # Similarly we can have element-wise subtraction, multiplication and division
    Matrix multiplication
    mat1 = matrix(1:16, nrow = 4, ncol = 4)
    mat2 = matrix(10 * (1:16), nrow = 4, ncol = 4)
    mat3 = mat1 %*% mat2   # matrix multiplication
    Matrix inversion
    mat1 = matrix(rnorm(16), 4, 4)
    mat2 = solve(mat1)
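One way to convince yourself that solve() returns the inverse is to multiply the result back with the original matrix. The sketch below uses a small fixed matrix m, which is a hypothetical example and not taken from the slides, and compares the product with the identity matrix up to floating-point tolerance.

```r
# Hypothetical 2-by-2 example matrix (not from the slides)
m = matrix(c(2, 1, 1, 3), nrow = 2, ncol = 2)
m_inv = solve(m)            # matrix inverse
prod_check = m %*% m_inv    # matrix product of m and its inverse

# all.equal() compares with a numerical tolerance, which suits floating-point results
is_identity = isTRUE(all.equal(prod_check, diag(2)))
print(is_identity)
```

Using all.equal() rather than == is a deliberate choice here, since an exact comparison of floating-point results can fail on rounding error alone.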
  • 14. Data Frame > Background
    • A data frame can be thought of as a matrix whose columns may be of different types (e.g. text, date, number, logical).
    • Most datasets we work with can be stored as a data frame.
    • Row / column subsetting works just like for matrices.
    • Row and column names can be assigned.
  • 15. Data Frame > Creation
    Data frames can be created by stacking individual vectors column-wise
    cust = c("Bob", "John", "Jane")
    age = c(67, 45, 52)
    ownHouse = c(FALSE, FALSE, TRUE)
    cust_dat = data.frame(Name = cust, Age = age, ownHouse = ownHouse)
      Name Age ownHouse
    1  Bob  67    FALSE
    2 John  45    FALSE
    3 Jane  52     TRUE
    Data frames can also be created by reading data from a csv file
    cust_dat = read.csv(file = "custData.csv", header = TRUE, stringsAsFactors = FALSE)
    header = TRUE says that the 1st row of the file contains column names
    stringsAsFactors = FALSE says do not convert character vectors to "factors"
  • 16. Data Frame > Creation
    Two data frames can be stacked below each other. Consider two data frames, cust1 and cust2
    cust1:
      Name Age ownHouse
    1  Bob  67    FALSE
    2 John  45    FALSE
    3 Jane  52     TRUE
    cust2:
      Name Age ownHouse
    1 Bill  55     TRUE
    2 Jack  75     TRUE
    3  Deb  49     TRUE
    cust = rbind(cust1, cust2)
      Name Age ownHouse
    1  Bob  67    FALSE
    2 John  45    FALSE
    3 Jane  52     TRUE
    4 Bill  55     TRUE
    5 Jack  75     TRUE
    6  Deb  49     TRUE
    A new data frame can be created by subsetting an existing data frame
    cust = cust[cust$Age > 60, ]
      Name Age ownHouse
    1  Bob  67    FALSE
    5 Jack  75     TRUE
  • 17. Data Frame > Creation
    An empty data frame can be created by specifying column names and types. It can be populated later.
    cust0 = data.frame(Name = character(0), Age = numeric(0), ownHouse = logical(0))
    [1] Name     Age      ownHouse
    <0 rows> (or 0-length row.names)
    An empty data frame can also be created from an existing data frame
    cust0 = cust[0, ]
    [1] Name     Age      ownHouse
    <0 rows> (or 0-length row.names)
  • 18. Data Frame > Creation
    Two data frames can be merged by a common column. By default, only common records are returned. Using the options all, all.x and all.y, different record sets are obtained. Records may then contain missing values.
    cust1:
      Name Age ownHouse
    1  Bob  67    FALSE
    2 John  45    FALSE
    3 Jane  52     TRUE
    cust2:
      Name PetCount hasCar
    1  Bob        1   TRUE
    2 John        0  FALSE
    3 Jill        5   TRUE
    cust = merge(cust1, cust2, by = "Name")
      Name Age ownHouse PetCount hasCar
    1  Bob  67    FALSE        1   TRUE
    2 John  45    FALSE        0  FALSE
    cust = merge(cust1, cust2, by = "Name", all = TRUE)
      Name Age ownHouse PetCount hasCar
    1  Bob  67    FALSE        1   TRUE
    2 Jane  52     TRUE       NA     NA
    3 Jill  NA       NA        5   TRUE
    4 John  45    FALSE        0  FALSE
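The slide names the all.x and all.y options without showing them. The following sketch rebuilds cust1 and cust2 from the tables above and illustrates both; the variable names left and right are only illustrative.

```r
cust1 = data.frame(Name = c("Bob", "John", "Jane"), Age = c(67, 45, 52),
                   ownHouse = c(FALSE, FALSE, TRUE), stringsAsFactors = FALSE)
cust2 = data.frame(Name = c("Bob", "John", "Jill"), PetCount = c(1, 0, 5),
                   hasCar = c(TRUE, FALSE, TRUE), stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of the first data frame (Jane gets NA for PetCount and hasCar)
left = merge(cust1, cust2, by = "Name", all.x = TRUE)

# all.y = TRUE keeps every row of the second data frame (Jill gets NA for Age and ownHouse)
right = merge(cust1, cust2, by = "Name", all.y = TRUE)

print(left)
print(right)
```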
  • 20. List > Background
    • A list can be thought of as a vector whose elements may be of different types.
    (Figure: a list whose entries are a vector, a matrix and another list.)
  • 21. List > Creation
    An empty list
    mylist = list()                             # nothing is known about the list
    mylist = vector(mode = "list", length = 5)  # length is known upfront
    Non-empty list
    mylist = list(c(1, 5, 7), "abc", matrix(0, 3, 3))
    List with names
    mylist = list(comp1 = c(1, 5, 7), comp2 = "abc", comp3 = matrix(0, 3, 3))
  • 22. List > Accessing the entries
    By Index
    mylist = list(c(1, 5, 7), "abc", matrix(0, 3, 3))
    x = mylist[[1]]    # x is a vector (1, 5, 7)
    x = mylist[[2]]    # x is a string "abc"
    x = mylist[[3]]    # x is a 3-by-3 matrix of zeros
    By Name
    mylist = list(comp1 = c(1, 5, 7), comp2 = "abc", comp3 = matrix(0, 3, 3))
    x = mylist$comp1   # x is a vector (1, 5, 7)
    x = mylist$comp2   # x is a string "abc"
    x = mylist$comp3   # x is a 3-by-3 matrix of zeros
  • 23. List > Updating entries
    mylist = list(comp1 = c(1, 5, 7), comp2 = "abc", comp3 = matrix(0, 3, 3))
    By Index
    mylist[[4]] = 1024           # create a new entry at the 4th position: the number 1024
    mylist = mylist[-3]          # drop the third entry from mylist
    mylist[[2]] = "New Entry"    # update the second entry
    By Name
    mylist$comp99 = 1024         # create a new entry at the 4th position, with the name "comp99"
    mylist$comp1 = c(10, 10)     # update the entry "comp1"
    Renaming components
    mylist = list(comp1 = c(1, 5, 7), comp2 = "abc", comp3 = matrix(0, 3, 3))
    names(mylist)                       # returns the vector ("comp1", "comp2", "comp3")
    names(mylist) = c("A", "B", "C")    # change the names of the components
    names(mylist)[2] = "second"         # change only the name of the second component
    Subsets
    newlist = mylist[c(1, 3, 4)]        # new list contains the first, third and fourth entries of mylist (mylist must have at least four entries)
  • 25. Data Structure: Dates
    Sys.time()   # returns the current system date and time
    String to Date-time
    x = strptime("02-07-2012", format = "%m-%d-%Y")
    x = strptime("02-feb-2012", format = "%d-%b-%Y")
    x = strptime("02-feb-2012 15:45:10", format = "%d-%b-%Y %H:%M:%S")
    Date-time to String
    x = Sys.time()   # on typing x in the console you see: "2012-06-22 11:44:01 IST"
    y = strftime(x, format = "%d-%b-%Y")   # "22-Jun-2012"
    y = strftime(x, format = "date: %d-%b-%Y >> Time: %H+%M+%S")   # "date: 22-Jun-2012 >> Time: 11+44+01"
    y = strftime(x, format = "%d-%b-%Y %a >> Time: %H hour %M min %S sec")   # "22-Jun-2012 Fri >> Time: 11 hour 44 min 01 sec"
    Study the R help on date-time variables to learn about the large number of possible format options.
  • 26. Data Structure: Dates
    The two main (internal) formats for date-time are POSIXct and POSIXlt
    POSIXct: a short format of date-time, typically used to store date-time columns in a data frame
    POSIXlt: a long format of date-time, from which various sub-units of time can be extracted
    Examples
    x = Sys.time()       # on typing x in the console you see: "2012-06-22 11:44:01 IST"
    y = as.POSIXlt(x)    # convert from POSIXct to POSIXlt
    z = c(y$mon, y$year, y$hour, y$min, y$wday)   # z = (5, 112, 11, 44, 5); mon counts from 0 and year counts from 1900
    difftime
    x1 = strptime("02-07-2012 14:20:34", format = "%m-%d-%Y %H:%M:%S")
    x2 = strptime("11-07-2012 14:20:34", format = "%m-%d-%Y %H:%M:%S")
    y = x2 - x1          # y is a difftime object
    x1 + as.difftime(1, units = "days")    # "2012-02-08 14:20:34 IST"
    x1 + as.difftime(10, units = "mins")   # "2012-02-07 14:30:34 IST"
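When subtracting two date-times, R picks the units of the resulting difftime object automatically. If specific units are needed, the difftime() function accepts a units argument. The sketch below reuses the two dates from the slide, but fixes the time zone to UTC (an assumption made here so that daylight-saving changes cannot affect the result).

```r
# Same two dates as on the slide, parsed in UTC (the time zone is an assumption)
x1 = as.POSIXct("2012-02-07 14:20:34", tz = "UTC")
x2 = as.POSIXct("2012-11-07 14:20:34", tz = "UTC")

d_days  = difftime(x2, x1, units = "days")    # difference expressed in days
d_hours = difftime(x2, x1, units = "hours")   # same difference expressed in hours

# as.numeric() drops the difftime class and returns a plain number
n_days = as.numeric(d_days)
print(d_days)
print(d_hours)
```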
  • 28. Data Structures: Others
    Other than the data structures described so far, there are a few very useful special values.
    NULL: NULL is typically used for initializing variables. The code "x = NULL" creates a variable x of length zero. It can later be given other values by overwriting x. The function is.null() returns TRUE or FALSE and tells whether a variable is NULL or not.
    NA: NA is used for denoting missing values. The code "x = NA" creates a variable x holding a missing value. The function is.na() returns TRUE or FALSE and tells whether a value is NA or not.
    NaN: NaN stands for "Not a Number". The code "x = sqrt(-10) ; y = log(-10)" sets the values of x and y to NaN and also prints a warning message in the console. The function is.nan() lets you check whether a value is NaN or not.
    Inf: Inf stands for "Infinity". The code "x = 10/0 ; y = -3/0" sets the value of x to Inf and y to -Inf. The function is.infinite() lets you check whether a value is infinite; is.finite() returns TRUE only for ordinary finite numbers, so it is also FALSE for NA and NaN.
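The checking functions above are easy to confuse, so a small side-by-side comparison may help. The vector vals is a made-up example that holds one ordinary number and each of the special values.

```r
vals = c(1, NA, NaN, Inf, -Inf)   # hypothetical example vector

na_flags       = is.na(vals)        # TRUE for NA, and also for NaN
nan_flags      = is.nan(vals)       # TRUE only for NaN
finite_flags   = is.finite(vals)    # TRUE only for ordinary finite numbers
infinite_flags = is.infinite(vals)  # TRUE only for Inf and -Inf

x = NULL
null_flag = is.null(x)              # NULL is an object of length zero
null_len  = length(x)

print(rbind(na_flags, nan_flags, finite_flags, infinite_flags))
```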
  • 30. Input
    Read data (row-column format) from a csv file
    x = read.csv(file = "D:/mydata.csv", header = TRUE, stringsAsFactors = FALSE)
    # x is a data frame containing the data in the csv file
    Read data (row-column format) from a delimited file
    x = read.table(file = "D:/mydata.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)
    # x is a data frame containing the data in the csv file
    # read.csv is a special case of read.table with sep = ","
    # In read.table you may specify a separator character of your choice
    Reading arbitrary data using a lower-level function: scan()
    With scan(), the user can read a file one item at a time.
    These functions have many more optional input arguments that let the user control the way in which data is read.
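As a minimal sketch of scan(), the example below reads items from a string through the text argument instead of from a file, so that it runs without any file on disk. The input strings are made up for illustration.

```r
# Read whitespace-separated numbers (the default type for scan() is numeric)
nums = scan(text = "10 12.3 45", quiet = TRUE)

# Read comma-separated strings; 'what' sets the type of the items to read
words = scan(text = "red,green,blue", what = character(), sep = ",", quiet = TRUE)

print(nums)
print(words)
```

To read from a file instead, the file argument would take the place of text.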
  • 31. Output
    Write a data frame to a file on disk
    # Assume: x is a data frame
    # write.csv() writes it to a csv file on disk (column names are always written)
    write.csv(x, file = "D:/out.csv", row.names = FALSE, na = "")
    # write.table() writes it to any user-specified file.
    # write.csv() is a special case of write.table()
    write.table(x, file = "D:/out.txt", row.names = FALSE, col.names = TRUE, na = "", sep = "\t")
    Write an R object in the R workspace to disk
    # Assume: x is an object in the R workspace
    save(x, file = "D:/out.RData")
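The slide shows how to write objects but not how to get them back. The sketch below does a round trip for both formats; it writes to temporary files created with tempfile() rather than to D:/, and the small data frame is a made-up example.

```r
x = data.frame(Name = c("Bob", "Jane"), Age = c(67, NA), stringsAsFactors = FALSE)

# csv round trip: missing values are written as empty fields and read back as NA
csv_file = tempfile(fileext = ".csv")
write.csv(x, file = csv_file, row.names = FALSE, na = "")
x_back = read.csv(csv_file, header = TRUE, stringsAsFactors = FALSE)

# RData round trip: load() restores the object under its original name
rdata_file = tempfile(fileext = ".RData")
save(x, file = rdata_file)
rm(x)
load(rdata_file)

print(x_back)
print(x)
```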
  • 33. Plots: x-y plot
    x = rnorm(100, mean = 2, sd = 2)
    y = rnorm(100, mean = 10, sd = 1)
    plot(x, y, xlab = "x-variable", ylab = "y-variable", main = "scatter plot example", pch = 19, cex = 0.7, col = "blue")
    (Figure: x-y scatter plot, with the title set by main and the axis labels set by xlab and ylab.)
    A large number of options are available to control axes, tick marks, axis labels, legends, font type and size, etc.
  • 34. Plots: overlay
    Generate a plot
    x = rnorm(100, mean = 2, sd = 2)
    y = rnorm(100, mean = 10, sd = 1)
    plot(x, y, xlab = "x-variable", ylab = "y-variable", main = "scatter plot example", pch = 19, cex = 0.7, col = "blue")
    Add red points later
    x1 = rnorm(30, mean = 0, sd = 1)
    y1 = rnorm(30, mean = 12, sd = 0.5)
    points(x1, y1, pch = 15, col = "red", cex = 1)
  • 35. Plots: multi-panel plot
    x = rnorm(100, mean = 2, sd = 2)
    y = rnorm(100, mean = 10, sd = 1)
    par(mfrow = c(2, 2))
    plot(x, y, xlab = "x-variable", ylab = "y-variable", main = "scatter plot example", pch = 19, cex = 0.7, col = "blue")
    hist(x, xlab = "x-variable", ylab = "frequency", main = "histogram-x", col = "grey", border = "blue", lwd = 2)
    hist(y, xlab = "y-variable", ylab = "frequency", main = "histogram-y", col = "grey", border = "blue", lwd = 2)
    plot(density(x), col = "limegreen", lwd = 2, xlab = "x", ylab = "density", main = "density plot")
    par(mfrow = c(2, 2)) splits the plot region into a 2-by-2 matrix. The next 4 plot commands create plots in cells (1,1), (1,2), (2,1), (2,2).
  • 36. Plots: saving to a file
    x = rnorm(100, mean = 2, sd = 2)
    y = rnorm(100, mean = 10, sd = 1)
    png(file = "D:/testplots.png")
    par(mfrow = c(2, 2))
    plot(x, y, xlab = "X", ylab = "Y", main = " ", pch = 19, cex = 0.7, col = "blue")
    plot(0, 0, type = "n", axes = FALSE, xlab = "", ylab = "", main = "")
    text(0, 0, "NO DATA")
    hist(y, xlab = "Y", ylab = "frequency", main = "histogram-y", col = "grey", border = "blue", lwd = 2)
    plot(density(x), col = "limegreen", lwd = 2, xlab = "x", ylab = "density", main = "density plot (X)")
    dev.off()
    The code creates a four-panel plot and saves it as a png file in the location D:/testplots.png
  • 38. Control
    while ()
    # Generate k random numbers from N(0,1)
    # k is not fixed a priori.
    # Stop when the sum of the values exceeds 5
    x = NULL ; stopIter = FALSE
    while (!stopIter) {
      x = c(x, rnorm(1, mean = 0, sd = 1))
      sumx = sum(x) ; if (sumx > 5) { stopIter = TRUE }
    }
    for ()
    # Example of a for loop
    x = rnorm(100) ; y = rep(0, length(x))
    for (i in 1:length(x)) { y[i] = x[i]^3 }
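Since arithmetic on vectors is element-wise in R, a loop like the for example can often be replaced by a single vectorized expression. The sketch below checks that both routes give the same result; the seed is an arbitrary choice made here for reproducibility.

```r
set.seed(1)          # arbitrary seed, only so that the run is reproducible
x = rnorm(100)

# Loop version, as on the slide (seq_along(x) is a safe stand-in for 1:length(x))
y_loop = rep(0, length(x))
for (i in seq_along(x)) { y_loop[i] = x[i]^3 }

# Vectorized version: ^ is applied to every element of x at once
y_vec = x^3

same = isTRUE(all.equal(y_loop, y_vec))
print(same)
```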
  • 40. Working with Strings
    x = nchar("WRA data Filtering")   # counts the number of characters: x = 18 in this case
    MetID = 2 ; x = paste("Met", MetID, sep = ":")   # string concatenation: x = "Met:2"
    x = substr("Met 12", start = 1, stop = 5)        # substring from position 1 to 5: x = "Met 1"
    x = strsplit("Met1 has no data", split = " ")    # splits the string by " ". Returns a list
    y = unlist(x)   # y is a vector with 4 elements: "Met1", "has", "no", "data"
    x = sub(pattern = "Met1", replacement = "Met2", x = "Met1 is empty")   # replaces the first match: x = "Met2 is empty"
    x = gsub("Met1", "Met2", x = "Met1 is empty. Met1 has no data.")       # replaces all matches: x = "Met2 is empty. Met2 has no data."
    x = c("red", "Blue", "green", "skyblue")
    y = grep(pattern = "blue", x = x, ignore.case = TRUE)                 # y = (2, 4): positions of matches
    z = grep(pattern = "blue", x = x, ignore.case = TRUE, value = TRUE)   # z = ("Blue", "skyblue"): the actual strings that match the pattern
  • 41. Regular Expressions
    x = c("ht_10m", "ht:20m", " ht_30m")
    y = gsub("^ht_", "HT:", x)       # y = ("HT:10m", "ht:20m", " ht_30m")
    # Replace "ht_" at the beginning of the string with "HT:"
    y = gsub("m$", "mtr", x)         # y = ("ht_10mtr", "ht:20mtr", " ht_30mtr")
    # Replace "m" at the end of the string with "mtr"
    y = gsub("[0-9]+", "XXX", x)     # y = ("ht_XXXm", "ht:XXXm", " ht_XXXm")
    # Replace one or more occurrences of digits with "XXX"
    y = gsub("_[0-9]+", "XXX", x)    # y = ("htXXXm", "ht:20m", " htXXXm")
    # Replace one or more occurrences of digits preceded by "_" with "XXX"
    u = grep("^ht_[0-9]+m", x) ; y = x ; y[-u] = "invalid!"   # y = ("ht_10m", "invalid!", "invalid!")
    # Used for checking the validity of the format of a string
    Regular expressions provide a vast number of options for manipulating strings. Study the R help on regular expressions to learn more.
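The validity check above relies on negative indexing with the positions returned by grep(). When nothing matches, grep() returns an empty vector and y[-u] selects no elements, so no entry would be marked invalid. A variant based on grepl(), which returns one TRUE/FALSE per string, avoids that case. The sketch below is one possible way to write it; the trailing "$" in the pattern is an addition that requires the string to end right after "m".

```r
x = c("ht_10m", "ht:20m", " ht_30m")

# grepl() returns a logical vector of the same length as x
ok = grepl("^ht_[0-9]+m$", x)

# Keep valid strings as they are and flag the rest
y = ifelse(ok, x, "invalid!")

print(ok)
print(y)
```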
  • 43. Function
    Define the function
    GetSummary = function(x = NULL) {    # x is the argument; NULL is its default value
      output = list(SumOfSqr = NA, Mean_x = NA, Failed = TRUE)
      # Input validation: return the "failed" output for missing, empty or non-numeric input
      if (is.null(x) || length(x) == 0 || !is.numeric(x)) { return(output) }
      output$SumOfSqr = sum(x^2, na.rm = TRUE)
      output$Mean_x = mean(x, na.rm = TRUE)
      output$Failed = FALSE
      return(output)                     # return value
    }
    Use the function
    x = rnorm(1000) ; out = GetSummary(x)
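To see how the default value and the input validation behave, the function can be called with valid, invalid and missing input. The sketch below repeats a definition of GetSummary so that it runs on its own; it assumes the validation is meant to reject non-numeric input, which is an interpretation of the slide rather than something it states outright.

```r
# Assumed reading of the slide's function: reject NULL, empty and non-numeric input
GetSummary = function(x = NULL) {
  output = list(SumOfSqr = NA, Mean_x = NA, Failed = TRUE)
  if (is.null(x) || length(x) == 0 || !is.numeric(x)) { return(output) }
  output$SumOfSqr = sum(x^2, na.rm = TRUE)   # NA entries are ignored
  output$Mean_x = mean(x, na.rm = TRUE)
  output$Failed = FALSE
  return(output)
}

good  = GetSummary(c(1, 2, 3, NA))   # numeric input with one missing value
bad   = GetSummary("abc")            # non-numeric input
empty = GetSummary()                 # no argument, so the default NULL is used

print(good)
print(bad$Failed)
print(empty$Failed)
```

Returning a list with a Failed flag, as the slide does, lets the caller test the outcome without the function stopping with an error.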
  • 45. Further Help on R
    - http://cran.r-project.org/
    - http://www.r-project.org/search.html : this page provides links to search engines specific to R
    - Search for "R tutorial", "R forum", etc.
    Have fun exploring the world of R