2. What is R?
• A language and environment for statistical computing
and graphics
• Wide variety of statistical & graphical techniques built in
• Free and Open Source software
• Compiles and runs on a wide variety of UNIX platforms,
Windows and MacOS
2
3. R Environment
• Most functionality through built in functions
• Basic functions available by default
• Other functions contained in packages that can be
attached
• All datasets created in the session are in Memory
• Output can be used as input to other functions
3
4. R-CRAN
• The Comprehensive R Archive Network
• A network of global web servers storing identical, up-to-
date, versions of code and documentation for R
• Use the CRAN mirror nearest to you to download R
setup at a faster speed
• http://cran.r-project.org/
4
6. The R-UI
6
The R console
-- Type in the command
-- Press enter
-- See the output
7. Create Your Data Set
x<-c(12,23,45)
y<-c(13,21,6) Create vectors x, y, z on R Console
z<-c("a","b","c")
data1<-data.frame(x,y,z) Combine them in a table
data1 Type data name for output
7
8. Your data in Table mode
edit(data1)
Edit your table, add new values & close the window to save your
updates
8
9. Export your updated table
write.csv (data1,file.choose())
Select the path, name your file and click save
9
10. Data Set types
10
Vector 1 dimension
All elements have the same data
types
Numeric
Character
Logic
Factor
Matrix 2 dimensions
Array
2 or more
dimensions
Data frame 2 dimensions
Table-like data object allowing
different data types for different
columns
List
Collection of data objects, each element of a list is a data
object
11. Useful In-Built Functions
min(data1$x) Use $ sign after data set name to use specific columns
max(data1$y) length(data1$x)
mean(data1$y)
levels(data1$z) Gives a list of unique categories in the data
11
12. Save Your Script
Open a new script, write commands, execute using F5 key. Save the file at a desired
folder. Helps to save the script for a later use.
12
13. R Workspace
• Objects that you create during an R session are hold
in memory, the collection of objects that you
currently have is called the workspace
• This workspace is not saved on disk unless you tell R
to do so
• This means that your objects are lost when you close
R and not save the objects, or worse when R or your
system crashes on you during a session
13
14. R Workspace
• If you have saved a workspace image and you start R
the next time, it will restore the workspace
• So all your previously saved objects are available
again
• You can also explicitly load a saved workspace i.e.,
that could be the workspace image of someone else
• Go the `File' menu and select `Load workspace'
14
15. R Workspace
Display last 25 commands
history()
Display all previous commands
history(max.show=Inf)
Save your command history to a file. Default is ".Rhistory"
savehistory(file="myfile")
Recall your command history. Default is ".Rhistory"
loadhistory(file="myfile")
15
16. Types of Variables
• Type: Logical, integer, double, complex, character,
factor or raw
• Type conversion functions have the naming
convention
as.xxx
• For example, as.integer(3.2) returns the integer 3,
and as.character(3.2) returns the string "3.2".
16
17. Factors
• Tell R that a variable is nominal by making it a
factor
• The factor stores the nominal values as a vector of
integers in the range [ 1... k ] (where k is the number
of unique values in the nominal variable), and an
internal vector of character strings (the original
values) mapped to these integers
• An ordered factor is used to represent an ordinal
variable
17
18. Factors
• R will treat factors as nominal variables and ordered
factors as ordinal variables in statistical procedures
and graphical analyses
• You can use options in the factor( ) and ordered( )
functions to control the mapping of integers to
strings (overriding the alphabetical ordering)
18
19. Help in R
• If you know the topic but not the exact function
Help.search(“topic”)
• If you know the exact function
Help(functionname)
19
20. In-Memory Computing
• Storage of information in the main RAM of
dedicated servers rather than in complicated
relational databases operating on comparatively
slow disk drives
• Helps business customers, including retailers,
banks and utilities, to quickly detect patterns,
analyze massive data volumes on the fly, and
perform their operations quickly
20
21. Import .csv Data File
basic_salary<-read.table(“C:/Users/Documents/R/BASIC
SALARY DATA.csv", header=TRUE, sep=",")
• Command requires the file path separated by a /
• header=TRUE if there are row labels
21
22. Data Imported
Use Head function to get an idea about how your data looks like
head(basic_salary)
22
25. Importing .txt File
#importing a text file having dlm = “blank/space”
sample1<-read.table(file.choose(),header=T)
sample1
#importing a comma separated text file
new<-read.table(file.choose(), header=T, sep=",")
new
#another mehod for importing a comma separated text
file
new1<-read.csv(file.choose(),header=T)
new1
25
27. Using Indices
• Row level sub setting: display some selected
rows
• Sub setting by putting condition on observations
• Column level sub setting: display some selected
columns
• Sub setting by putting condition on variable
names
• Display selected rows for selected columns
• Sub setting by putting conditions on variables
and observations
27
50. Recoding Based on User Cut Points
brk3<-with(basic_salary,c(min(basic_salary$ba),
15000,18000,max(basic_salary$ba)))
brk3
basic_salary15<-within(basic_salary,salary_partition<-
cut(basic_salary$ba,breaks=brk3,labels=1:3,include.lowest
=TRUE))
50
61. Merging Cases (append)
# rbind function
Appending two datasets using rbind function requires
both the datasets with exactly the same number of
variables with exactly the same names
If datasets do not have the same number of variables ,
variables can be either dropped or created so both
match.
or else
# smartbind function can be used
#need to donwload gtools package to use this function
61
65. Deleting missing cases# complete.cases, na.exclude(), na.omit()are for dealing manually with NAs in a
dataset
x<-c(12,NA,13,12)
y<-c(25,48,NA,NA)
data<-data.frame(x,y)
data
65
data1<-na.omit(data)
data1
x y
1 12 25
data[complete.cases(da
ta),]
x y
1 12 25
na.exclude(data)
x y
1 12 25
# variable gender with 20 "male" entries and # 30 "female" entries gender <- c(rep("male",20), rep("female", 30)) gender <- factor(gender) # stores gender as 20 1s and 30 2s and associates# 1=female, 2=male internally (alphabetically)# R now treats gender as a nominal variable summary(gender)
# variable rating coded as "large", "medium", "small'rating <- ordered(rating)# recodes rating to 1,2,3 and associates# 1=large, 2=medium, 3=small internally# R now treats rating as ordinal
# variable gender with 20 "male" entries and # 30 "female" entries gender <- c(rep("male",20), rep("female", 30)) gender <- factor(gender) # stores gender as 20 1s and 30 2s and associates# 1=female, 2=male internally (alphabetically)# R now treats gender as a nominal variable summary(gender)
# variable rating coded as "large", "medium", "small'rating <- ordered(rating)# recodes rating to 1,2,3 and associates# 1=large, 2=medium, 3=small internally# R now treats rating as ordinal