2. What is R?
• A language and environment for statistical computing
and graphics
• Wide variety of statistical & graphical techniques built in
• Free and Open Source software
• Compiles and runs on a wide variety of UNIX platforms,
Windows and macOS
www.Sanaitics.com
3. R Environment
• Most functionality is provided through built-in functions
• Basic functions are available by default
• Other functions are contained in packages that can be
attached
• All datasets created in the session are held in memory
• The output of one function can be used as input to other functions
• R commands are case-sensitive
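The points above can be illustrated with a minimal sketch. The object names (values, root_of_sum, myValue) are made up for this example and are not part of the original slides.

```r
# Output of one function used directly as input to another:
# sum() feeds its result to sqrt()
values <- c(9, 16)
root_of_sum <- sqrt(sum(values))
print(root_of_sum)

# Attach a package to make its functions available
# ('stats' ships with R and is normally attached already)
library(stats)

# Names are case-sensitive: myValue and myvalue are different objects
myValue <- 10
has_exact_name <- exists("myValue")
has_lowercase_name <- exists("myvalue")
print(has_exact_name)
print(has_lowercase_name)
```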
4. R-CRAN
• The Comprehensive R Archive Network
• A network of web servers around the world that store identical,
up-to-date versions of code and documentation for R
• Use the CRAN mirror nearest to you to download the R
installer faster
• http://cran.r-project.org/
6. The R-UI
The R console
-- Type in the command
-- Press enter
-- See the output
7. Create Your Data Set
x<-c(12,23,45)             # Create vectors x, y, z in the R console
y<-c(13,21,6)
z<-c("a","b","c")
data1<-data.frame(x,y,z)   # Combine them in a table
data1                      # Type the data name to see the output
Note that R is case-sensitive
8. Your data in Table mode
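data1<-edit(data1)
Edit your table, add new values & close the window. Assign the result
back to data1 as shown, because edit() returns the modified copy and
does not change data1 in place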
9. Deriving & Removing Variables
Add a new variable using arithmetic operators (assigning to an existing
name, such as z here, overwrites that column)
data1$z<-data1$x+data1$y
Delete an unwanted column from your table using either of the
following methods
data1$x<-NULL
data1<-subset(data1, select = -y)
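As a minimal sketch of both steps, the example below works on a separate, made-up table (demo) and a new column name (total), both assumptions for illustration, so that data1 is left untouched.

```r
# Hypothetical table with the same x and y values used earlier
demo <- data.frame(x = c(12, 23, 45), y = c(13, 21, 6))

# Derive a new column from existing ones
demo$total <- demo$x + demo$y

# Remove columns: method 1 assigns NULL, method 2 uses subset()
demo$x <- NULL
demo <- subset(demo, select = -y)

print(names(demo))
print(demo$total)
```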
10. Export your updated table
write.csv(data1,file.choose())
Select the folder, name your file with a .csv extension, then click Open and Yes
to export your table
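When no file dialog is wanted, the path can be passed directly. The sketch below assumes a made-up table (demo) and writes to a temporary folder under a hypothetical file name; row.names = FALSE keeps R's row numbers out of the file.

```r
# Hypothetical table to export
demo <- data.frame(x = c(12, 23, 45), y = c(13, 21, 6))

# Build a path in the session's temporary folder (file name is an assumption)
out_path <- file.path(tempdir(), "demo_export.csv")

# Export without the row-number column
write.csv(demo, out_path, row.names = FALSE)

# Read the file back to confirm what was written
check <- read.csv(out_path)
print(check)
```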
11. Data Set types
• Vector: 1 dimension. All elements have the same data type
(numeric, character, logical or factor)
• Matrix: 2 dimensions. All elements have the same data type
• Array: 2 or more dimensions. All elements have the same data type
• Data frame: 2 dimensions. Table-like data object allowing
different data types for different columns
• List: collection of data objects; each element of a list is a data
object
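The following sketch builds one small object of each kind. All object names and values are made up for illustration.

```r
v <- c(1.5, 2, 3)                               # vector: 1 dimension
m <- matrix(1:6, nrow = 2)                      # matrix: 2 rows x 3 columns
a <- array(1:24, dim = c(2, 3, 4))              # array: 3 dimensions
d <- data.frame(id = 1:2, name = c("a", "b"))   # data frame: mixed column types
l <- list(v, m, d)                              # list: each element is a data object

# Mixing types in a vector forces a single common type (here character)
mixed <- c(1, "a", TRUE)
print(class(mixed))

print(dim(m))
print(dim(a))
print(sapply(d, class))
print(length(l))
```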
12. Useful In-Built Functions
min(data1$x)              # Use $ after the data set name to refer to a specific column
max(data1$y)
length(data1$x)
mean(data1$y)
levels(factor(data1$z))   # Lists the unique categories; z is converted to a factor first
13. Save Your Script
Open a new script, write your commands, and run them with the F5 key. Save the file in a
folder of your choice so that the script can be reused later.
14. R Workspace
• Objects that you create during an R session are held
in memory; the collection of objects that you
currently have is called the workspace
• This workspace is not saved on disk unless you tell R
to do so
• This means that your objects are lost if you close
R without saving them, or worse, if R or your
system crashes during a session
15. R Workspace
• If you have saved a workspace image, R restores the
workspace the next time you start it
• All your previously saved objects are then available
again
• You can also explicitly load a saved workspace, for example
a workspace image created by someone else
• Go to the 'File' menu and select 'Load workspace'
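The same save and restore cycle can be done from the console. The sketch below is an illustration with a hypothetical file name in a temporary folder; save() stores selected objects, while save.image() would store the whole workspace.

```r
# Create an object, then save it to a workspace file (name is an assumption)
x <- c(12, 23, 45)
ws_file <- file.path(tempdir(), "demo_workspace.RData")
save(x, file = ws_file)      # save.image(file = ws_file) would save every object

# Simulate a fresh session by removing the object
rm(x)

# Load the saved file: x is restored under its original name
load(ws_file)
print(x)
```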
16. R Workspace
Display recent commands (the last 25 by default)
history()
Display a chosen number of commands, here the last 25; use max.show=Inf to display all
history(max.show=25)
Save your command history to a file. Default is ".Rhistory"
savehistory(file="myfile")
Recall your command history. Default is ".Rhistory"
loadhistory(file="myfile")
17. Types of Variables
• Types: logical, integer, double, complex, character,
factor
• Type conversion functions follow the naming
convention as.xxx
• For example, as.integer(3.2) returns the integer 3,
and as.character(3.2) returns the string "3.2".
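A few more conversions, sketched for illustration (the variable names are made up). Note that a conversion that cannot succeed yields NA with a warning, which is suppressed here.

```r
int_value  <- as.integer(3.2)      # fractional part is dropped
chr_value  <- as.character(3.2)    # number becomes a string
num_value  <- as.numeric("4.5")    # string becomes a number
lgl_value  <- as.logical("TRUE")   # string becomes a logical

# A string that is not a number cannot be converted: the result is NA
bad_value <- suppressWarnings(as.integer("abc"))

print(typeof(int_value))
print(typeof(chr_value))
print(num_value + 1)
print(lgl_value)
print(bad_value)
```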
18. Factors
• Tell R that a variable is nominal by making it a
factor
• The factor stores the nominal values as a vector of
integers in the range [ 1... k ] (where k is the number
of unique values in the nominal variable), and an
internal vector of character strings (the original
values) mapped to these integers
• An ordered factor is used to represent an ordinal
variable
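The integer codes and the character levels described above can be inspected directly. This is a small sketch with made-up values.

```r
# A nominal variable stored as a factor
colour <- factor(c("red", "green", "red", "blue"))

# The internal character strings (sorted alphabetically by default)
colour_levels <- levels(colour)
print(colour_levels)

# The integer codes 1..k that point into the levels
colour_codes <- as.integer(colour)
print(colour_codes)

# k, the number of unique values
print(nlevels(colour))
```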
19. Factors
• R will treat factors as nominal variables and ordered
factors as ordinal variables in statistical procedures
and graphical analyses
• You can use options in the factor( ) and ordered( )
functions to control the mapping of integers to
strings (overriding the alphabetical ordering)
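As a sketch of overriding the alphabetical ordering, the levels argument below fixes the order explicitly and ordered = TRUE makes the factor ordinal. The size values are invented for the example.

```r
size <- c("medium", "small", "large", "small")

# Default: levels are sorted alphabetically (large, medium, small)
size_default <- factor(size)
print(levels(size_default))

# Explicit level order, declared as an ordinal variable
size_ordered <- factor(size,
                       levels = c("small", "medium", "large"),
                       ordered = TRUE)
print(levels(size_ordered))

# Ordered factors support comparisons that follow the level order
below_large <- size_ordered < "large"
print(below_large)
print(as.integer(size_ordered))
```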
20. Help in R
• If you know the topic but not the exact function
help.search("topic")
• If you know the exact function
help(functionname) OR
?functionname
21. In-Memory Computing
• Storage of information in the main RAM of
dedicated servers rather than in complicated
relational databases operating on comparatively
slow disk drives
• Helps business customers, including retailers,
banks and utilities, to quickly detect patterns,
analyze massive data volumes on the fly, and
perform their operations quickly
22. Import .csv Data File
basic_salary<-read.table("C:/Users/Documents/R/BASIC SALARY DATA.csv", header=TRUE, sep=",")
• Write the file path with forward slashes (/) as separators
• Use header=TRUE if the first row contains the column names
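To try the import without the original file, the sketch below first writes a tiny CSV to a temporary folder and then reads it back. The file name, column names and values are invented for illustration and do not come from the BASIC SALARY DATA file. read.csv() is a shortcut for read.table() with header=TRUE and sep=",".

```r
# Write a small, made-up CSV file to the temporary folder
csv_path <- file.path(tempdir(), "demo_salary.csv")
writeLines(c("Name,Location,Pay",
             "A,MUMBAI,100",
             "B,DELHI,200"),
           csv_path)

# Both calls read the same file in the same way
demo_a <- read.table(csv_path, header = TRUE, sep = ",")
demo_b <- read.csv(csv_path)

print(demo_a)
print(names(demo_b))
```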
24. Importing .txt File
Importing a text file delimited by blanks/spaces
sample1<-read.table(file.choose(),header=TRUE)
sample1
Importing a comma-separated text file
sample2<-read.table(file.choose(), header=TRUE, sep=",")
sample2
Various special characters can act as separators. Check which one the
file uses before importing.
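The effect of the sep argument can be tried without any files by passing the data as text. In this sketch the three strings hold the same made-up table with different separators.

```r
space_text <- "id score\n1 10\n2 20"       # separated by spaces
tab_text   <- "id\tscore\n1\t10\n2\t20"    # separated by tabs
semi_text  <- "id;score\n1;10\n2;20"       # separated by semicolons

# The default separator is any white space; others must be named in sep
d_space <- read.table(text = space_text, header = TRUE)
d_tab   <- read.table(text = tab_text, header = TRUE, sep = "\t")
d_semi  <- read.table(text = semi_text, header = TRUE, sep = ";")

print(d_space)
print(d_tab)
print(d_semi)
```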
25. Data Imported
Use the head() function to get an idea of what your data looks like
Note that by default it displays the first 6 rows
head(basic_salary)
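The number of rows shown can be changed with the n argument, and tail() shows the last rows instead. A small sketch on a made-up table:

```r
# Hypothetical table with 10 rows
demo <- data.frame(id = 1:10, value = (1:10)^2)

first_default <- head(demo)          # first 6 rows by default
first_three   <- head(demo, n = 3)   # first 3 rows
last_two      <- tail(demo, n = 2)   # last 2 rows

print(first_three)
print(last_two)
```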
29. Sub-setting using Indices
• Row level: display some selected rows
• By setting condition on observations
• Column level: display some selected columns
• By setting condition on variable names
• Display selected rows for selected columns
• By setting conditions on variables & observations
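The three cases above can be sketched with square-bracket indexing, written as table[rows, columns]. The staff table and its columns are made up for this illustration.

```r
# Hypothetical table
staff <- data.frame(name  = c("A", "B", "C", "D"),
                    grade = c("GR1", "GR2", "GR1", "GR2"),
                    pay   = c(100, 200, 150, 250))

# Row level: by position, or by a condition on the observations
first_two <- staff[1:2, ]
high_pay  <- staff[staff$pay > 120, ]

# Column level: select columns by name
name_pay <- staff[, c("name", "pay")]

# Selected rows for selected columns
gr1_name_pay <- staff[staff$grade == "GR1", c("name", "pay")]

print(high_pay)
print(gr1_name_pay)
```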
60. Merging Cases (append)
• Appending two datasets with the rbind function requires
both datasets to have exactly the same number of
variables, with exactly the same names
• If the datasets do not have the same number of variables,
variables can be either dropped or created so that both match
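A minimal sketch of both situations, using made-up tables: first two tables that already match, then a table with a missing variable that is created (filled with NA) before appending.

```r
# Two hypothetical tables with the same variables and names
first  <- data.frame(id = 1:2, score = c(10, 20))
second <- data.frame(id = 3:4, score = c(30, 40))
combined <- rbind(first, second)
print(combined)

# A table lacking the 'score' variable: create it so the names match
third <- data.frame(id = 5L)
third$score <- NA
combined_all <- rbind(combined, third)
print(combined_all)
```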
61. R-String Functions
Extracts or replaces substrings in a character vector
General syntax: substr(x, start, stop), where x is a character vector such as a column
basic_salary$Location<-substr(basic_salary$Location,1,1)
head(basic_salary)
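Both uses of substr(), extracting and replacing, are sketched below on a made-up character vector (the values are assumptions, not taken from the salary data).

```r
# Hypothetical character vector
city <- c("MUMBAI", "DELHI")

# Extract characters 1 to 3, and the first character only
first_three <- substr(city, 1, 3)
initial     <- substr(city, 1, 1)
print(first_three)
print(initial)

# Replace: assign to substr() to overwrite characters in place
substr(city, 1, 1) <- "x"
print(city)
```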
62. Replace Function
Try for yourself
x<-c("green", "red", "yellow"); x
x<-replace(x, 1, "good"); x
x<-replace(x, c(2,3), c("apple", "banana")); x
63. Extracting Complete Cases
Create a dataset
x<-c(12, NA, 13, 12)
y<-c(25, 48, NA, NA)
complete_case<-data.frame(x, y); complete_case
Extract complete cases using any one of the three commands
case1<-na.omit(complete_case); case1
case2<-complete_case[complete.cases(complete_case),]
case2
case3<-na.exclude(complete_case); case3
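Before dropping rows it can help to count how much is missing. The sketch below rebuilds the same dataset and counts complete rows and missing values per column; the names n_complete and missing_per_column are made up for the example.

```r
# Same dataset as above
complete_case <- data.frame(x = c(12, NA, 13, 12),
                            y = c(25, 48, NA, NA))

# Number of rows with no missing values
n_complete <- sum(complete.cases(complete_case))
print(n_complete)

# Number of missing values in each column
missing_per_column <- colSums(is.na(complete_case))
print(missing_per_column)
```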
64. FIRST MILESTONE REACHED!
NOW THAT YOU HAVE MASTERED THE FIRST STEP, IT’S TIME FOR
R GET STARTED II