2. A data type defines what sort of value something is. The most commonly
used data types are numbers, which R calls numeric values,
and text, which R calls character values.
Data structures define how data is stored.
Some common data structures:
Vectors: a collection of values that all have the same data type.
A vector can be, for example, a numeric vector or a
character vector. A vector can also be used to represent
a single variable in a data set.
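A minimal sketch of the two vector types described above. The variable names and values (`ages`, `cities`, `mixed`) are hypothetical and do not come from the slides.

```r
# Hypothetical example vectors (names and values are illustrative only)
ages <- c(21, 34, 29)                  # numeric vector
cities <- c("Pune", "Delhi", "Goa")    # character vector

class(ages)     # "numeric"
class(cities)   # "character"

# A vector holds one data type only: mixing types coerces
# every element to a common type (here, character)
mixed <- c(1, "a", 3)
class(mixed)    # "character"
```

Because of this coercion, a column that should be numeric but contains stray text will silently become a character vector.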
Data Types
Rupak Roy
3. Factors: a collection of values used to categorize the data,
stored as levels. A factor is similar to a vector, except that it
stores each value as an integer code that points to a character label (the level).
For example: the gear types in a column.
Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 means the column has
three types of gears, i.e. 3, 4 and 5, and the integers are the codes for those levels.
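The factor output above can be reproduced with a short sketch. The slides do not name the data set; this example assumes it is the built-in `mtcars` data, whose `gear` column gives the same levels and codes.

```r
# Assumption: the gear column comes from the built-in mtcars dataset
gear <- factor(mtcars$gear)

levels(gear)                  # "3" "4" "5"
head(as.integer(gear), 10)    # integer codes: 2 2 2 1 1 1 1 2 2 2
str(gear)                     # prints the "Factor w/ 3 levels" summary
```

Here the code 2 points to the second level, "4", and the code 1 points to the first level, "3".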
Data Structures: factors
4. Matrices: a two-dimensional collection of values that all have
the same data type. The values are arranged
in rows and columns. A matrix is the
two-dimensional case of an array.
#creating a matrix and saving it as an R object
> A = matrix(c(1, 2, 3, 3, 2, 1), nrow = 3, ncol = 2)
Now we can access the elements using:
> A[1, ] #i.e. [row, column] Output => 1 3
> A[ , 2] #i.e. [ , 2nd column] Output => 3 2 1
> A[3, 2] #i.e. [3rd row, 2nd column] Output => 1
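The outputs above follow from the fact that `matrix()` fills values column by column by default. As a small sketch, the same values can be filled row by row with the `byrow` argument; the object `B` is a hypothetical addition for comparison.

```r
# Same matrix as in the slide: values are filled column by column
A <- matrix(c(1, 2, 3, 3, 2, 1), nrow = 3, ncol = 2)
dim(A)     # 3 2  (3 rows, 2 columns)
A[1, ]     # 1 3

# Hypothetical comparison: byrow = TRUE fills row by row instead
B <- matrix(c(1, 2, 3, 3, 2, 1), nrow = 3, ncol = 2, byrow = TRUE)
B[1, ]     # 1 2
B[, 2]     # 2 3 1
```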
Data Structures: matrices
5. Data Structures: data frame
Data frame: a single table with rows and columns, where the columns can have the same or different
data types.
Let’s see how to create a data frame.
#vectors
> Student=c(1,2,3,4,5,6)
> pre_module_score=c(18,21,23,22,24,17)
> post_module_score=c(22,21,24,15,18,19)
> module_name=c( "graded", "graded", "non-graded",
"graded", "non-graded", "non-graded")
> test_scores= data.frame(Student,pre_module_score,post_module_score,
module_name)
Alternatively:
> test_scores = data.frame(Student = c(1,2,3,4,5,6), pre_module_score = c(18,21,23,22,24,17),
post_module_score = c(22,21,24,15,18,19), module_name = c("graded","graded","non-graded",
"graded","non-graded","non-graded"))
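Once the data frame exists, its columns can be inspected and combined. The sketch below rebuilds `test_scores` from the values in the slide and adds a derived column; the column name `change` is hypothetical.

```r
# Rebuild the data frame from the slide values
test_scores <- data.frame(
  Student = c(1, 2, 3, 4, 5, 6),
  pre_module_score = c(18, 21, 23, 22, 24, 17),
  post_module_score = c(22, 21, 24, 15, 18, 19),
  module_name = c("graded", "graded", "non-graded",
                  "graded", "non-graded", "non-graded")
)

str(test_scores)   # one line per column, showing each column's data type

# Hypothetical derived column: score difference per student
test_scores$change <- test_scores$post_module_score - test_scores$pre_module_score
test_scores$change   # 4 0 1 -7 -6 2
```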
6. List: a collection of objects of the same or different data types.
#list
We can access an element of a list by its position, using double square brackets,
like list[[1]]
[1] "bob"
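The slide's list definition did not survive, so the following is only a sketch. Apart from the first element, "bob", every name and value here (`person`, `named`, the age, the logical vector) is hypothetical.

```r
# Hypothetical list: only the first element, "bob", appears in the slides
person <- list("bob", 25, c(TRUE, FALSE))

person[[1]]          # "bob"  (double brackets return the element itself)
class(person[[2]])   # "numeric"
class(person[1])     # "list" (single brackets return a sub-list)

# Elements can also be named and then accessed with $
named <- list(name = "bob", age = 25)
named$name           # "bob"
```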
Data Structures: list
7. #dataframe: a table with rows and columns
#take the following vectors
> subject=c("geography","history","chemistry")
> testscores=c(77,61,65)
> remarks = c("very good","average","good")
> Markscard = data.frame(subject,testscores,remarks)
> Markscard
OR
> Markscard<-data.frame(subject = c("geography","history","chemistry"),
testscores=c(77,61,65),remarks = c("very good","average","good"))
Examples in R
8. To find out what type of data structure Markscard is, use:
> class(Markscard)
And to access an element of the table or data.frame:
> Markscard[ ,3] #i.e. Markscard[row, column]; here, the 3rd column
> Markscard[3, ] #i.e. the 3rd row
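Besides row and column positions, a data frame can be indexed by column name or by a logical condition. The sketch below rebuilds `Markscard` from the slide values; passing `stringsAsFactors = FALSE` is an assumption made so the text columns stay character vectors on every R version.

```r
# Rebuild Markscard from the slide values
Markscard <- data.frame(
  subject = c("geography", "history", "chemistry"),
  testscores = c(77, 61, 65),
  remarks = c("very good", "average", "good"),
  stringsAsFactors = FALSE   # assumption: keep text as character
)

class(Markscard)              # "data.frame"
Markscard[2, "testscores"]    # 61  (row 2, column selected by name)

# Rows can also be selected by a condition on a column
Markscard$subject[Markscard$testscores > 70]   # "geography"
```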
Let’s load the built-in R datasets.
> library(datasets)
#from the list of datasets we will load the iris dataset
> data(iris)
9. #to view the column names
> names(iris)
> colnames(iris)
#to view the data structure of the iris data set
> str(iris)
#to know the dimensions of the iris data set
> dim(iris)
#Output: 150 5, i.e. 150 rows and 5 columns
#to view the number of rows and columns separately
> nrow(iris)
> ncol(iris)
10. Examples in R
#to view the whole dataset (note the capital V)
> View(iris)
#to view the first or the last six rows of the dataset
> head(iris)
> tail(iris)
#to view a chosen number of rows from the top or the bottom of the dataset
> head(iris, 10)
> tail(iris, 10)
#to know more about the head and tail functions, or any function, use:
> ?head
#to view a range of rows or columns from the dataset use
> iris[15:20, ] #i.e. rows 15 to 20
> iris[15:20, 2:3] #i.e. rows 15 to 20 and columns 2 and 3
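As a quick check of what these calls return, the sketch below measures the size of each result instead of printing the rows. It uses only the built-in iris data.

```r
data(iris)

nrow(head(iris))           # 6   (head shows six rows by default)
nrow(tail(iris, 10))       # 10

subset_rows <- iris[15:20, 2:3]
dim(subset_rows)           # 6 2
names(subset_rows)         # "Sepal.Width" "Petal.Length"
```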
11. #we can access all the values of a particular column using $
> iris$Species
#Or we can access a particular value from the column using:
> iris$Species[3] #i.e. the 3rd row of the Species column
#Else a range of values/rows from a particular column using:
> iris$Species[50:100] #i.e. from row 50 to 100
#to have a quick summary of the dataset use:
> summary(iris)
#we can also check the data type of a variable using:
> is.character(iris$Species) #FALSE, because Species is stored as a factor
> is.numeric(iris$Petal.Length) #TRUE
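A short sketch of the type checks on the iris columns. Note that the column is named `Petal.Length`, with a dot, and that `Species` is a factor rather than a character vector. The variable `column_classes` is a hypothetical name.

```r
data(iris)

class(iris$Species)             # "factor"
is.factor(iris$Species)         # TRUE
is.character(iris$Species)      # FALSE
is.numeric(iris$Petal.Length)   # TRUE

# Check the class of every column in one call
column_classes <- sapply(iris, class)
column_classes
```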
12. Examples in R
#we can use the ‘attach’ function to make all the columns of the iris
#data set available as individual objects, so that we don’t have
#to use $ to call the columns of the dataset.
> attach(iris)
> Species #‘Species’ was previously accessed using iris$Species
#it is better to detach the dataset once we are done with
#it, since R uses the system's RAM to perform
#its tasks.
> detach(iris)
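As an alternative that the slides do not cover, base R's `with()` evaluates an expression inside a data frame without attaching it, so there is nothing to detach afterwards. This is only a sketch; the names `avg_sepal` and `species_counts` are hypothetical.

```r
data(iris)

# Columns are visible by name inside with(), and only there
avg_sepal <- with(iris, mean(Sepal.Length))
avg_sepal

species_counts <- with(iris, table(Species))
species_counts   # 50 rows for each of the three species
```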
13. #we can view the working directory of R by
> getwd()
#we can also set the working directory of our choice by
> setwd("c:/Users/data2dimensions/documents")
#to trigger garbage collection manually, use:
> gc()
#garbage collection (GC) releases memory when an object is no longer
#used. R runs it automatically: when no names point to an object any more,
#the object is deleted. Calling gc() forces a collection and reports memory usage.
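A minimal sketch of how removing a name makes an object eligible for garbage collection. The object `big` is hypothetical, and the explicit `gc()` call is optional because R collects unused memory on its own.

```r
# Hypothetical large object: one million zeros
big <- numeric(1e6)
exists("big")    # TRUE

# Remove the only name pointing to the object
rm(big)
exists("big")    # FALSE

# Optionally force a collection; invisible() hides the memory report
invisible(gc())
```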
14. Next:
We will learn how to import and export data in R.
Data types and structures