1. Introduction to R Programming
A Session
By
Vaibhav Kumar
Dept. of CSE
DIT University, Dehradun
Vaibhav Kumar, DIT University, Dehradun
2. R
• R is a programming language and software environment for statistical
analysis, graphical representation and reporting.
• R was created by Ross Ihaka and Robert Gentleman at the University
of Auckland, New Zealand.
• R is freely available under the GNU General Public License.
• It was named R after the first letter of the first names of its two
authors (Robert Gentleman and Ross Ihaka).
3. Features of R
• R is a well-developed, simple and effective programming language
which includes conditionals, loops, user-defined recursive functions
and input and output facilities.
• R has an effective data handling and storage facility.
• R provides a suite of operators for calculations on arrays, lists, vectors
and matrices.
• R provides a large, coherent and integrated collection of tools for data
analysis.
• R provides graphical facilities for data analysis and display, either
directly on screen or printed on paper.
4. A Simple Example
• A simple program to print "Hello" can be written in R as:
print("Hello")
• To add two numbers, a program can be written as:
print(2+3)
(R is case-sensitive, so the function name print must be in lowercase.)
• The first program can also be written as:
message="Hello"
print(message)
5. Data Types and Objects in R
• Unlike many programming languages, R does not require us to declare the
data type of a variable; a variable takes the type of the value assigned to it.
• Some popularly used data types in R are: Logical, Numeric, Integer,
Complex, Character, Raw.
• Some frequently used objects in R are: Vectors, Lists, Matrices, Arrays,
Factors, Data Frames.
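The data types listed above can be checked with the class() function. The following is a minimal sketch; the variable names and values are made up for illustration. It uses <-, the conventional R assignment operator, which behaves like = in these statements.

```r
# One value of each basic type; class() reports the type of each object.
flag  <- TRUE              # logical
num   <- 12.5              # numeric (double precision)
int   <- 7L                # integer (the L suffix requests an integer)
cplx  <- 3 + 2i            # complex
text  <- "Hello"           # character
bytes <- charToRaw("Hi")   # raw (bytes)

types <- c(class(flag), class(num), class(int),
           class(cplx), class(text), class(bytes))
print(types)
```

Note that no type is written in the assignments: each variable takes its type from the value it receives.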
6. Vectors
• The function c() is used to combine elements into a vector.
Example:
fruits=c("Apple", "Orange", "Banana")
print(fruits)
• When we execute the above code, we will get the following output:
[1] "Apple"  "Orange" "Banana"
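Once a vector exists, its elements can be counted, selected by position and used in arithmetic. The sketch below illustrates this; the prices vector is a made-up example, not data from the session.

```r
fruits <- c("Apple", "Orange", "Banana")

n_fruits <- length(fruits)   # number of elements in the vector
second   <- fruits[2]        # indexing starts at 1, so this is the second element

prices  <- c(2, 3, 5)        # hypothetical numeric vector
doubled <- prices * 2        # arithmetic is applied to every element

print(n_fruits)
print(second)
print(doubled)
```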
7. Lists
• A list is an R object which can contain elements of many different types,
such as vectors, functions and even other lists.
Example:
list1=list(c("Apple", "Orange", "Banana"), c(2, 3, 5), 14.5)
print(list1)
When we execute the above code, we will get the following output:
[[1]]
[1] "Apple"  "Orange" "Banana"

[[2]]
[1] 2 3 5

[[3]]
[1] 14.5
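The elements of a list can be taken out again by position or by name. The following sketch shows both; the element names fruits, counts and price are assumptions added for illustration.

```r
list1 <- list(c("Apple", "Orange", "Banana"), c(2, 3, 5), 14.5)

first_element <- list1[[1]]   # [[ ]] extracts the element itself (a character vector)
sub_list      <- list1[1]     # [ ] returns a list of length 1 holding that element

# Elements can also be named and then accessed with $.
named <- list(fruits = c("Apple", "Orange", "Banana"),
              counts = c(2, 3, 5),
              price  = 14.5)
price_value <- named$price

print(first_element)
print(price_value)
```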
8. Matrices
• A matrix in R can be created by passing a vector to the matrix()
function. By default the values fill the matrix column by column.
Example:
M=matrix(c(1,2,3,4,5,6,7,8,9), ncol=3, nrow=3)
print(M)
When we execute the above code, we will get the following output:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
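The output above shows the values filling the matrix one column at a time. A short sketch of how to fill by rows instead, and how to read single elements, is given below; it uses the same values 1 to 9.

```r
M        <- matrix(1:9, nrow = 3, ncol = 3)                 # filled column by column
M_by_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)   # filled row by row

elem_col <- M[2, 3]          # row 2, column 3 of the column-filled matrix
elem_row <- M_by_row[2, 3]   # row 2, column 3 of the row-filled matrix

print(dim(M))                # number of rows and columns
print(elem_col)
print(elem_row)
print(t(M))                  # the transpose of M equals M_by_row
```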
9. Data Frames
• Data frames are tabular data objects.
• Unlike a matrix, each column of a data frame can contain a different mode of data.
• Data frames are created using the data.frame() function.
Example:
BMI=data.frame(
  Name=c("Vaibhav", "Nitin", "Aakash"),
  Height=c(170, 169, 175),
  Weight=c(80, 75, 78),
  Age=c(30, 30, 29))
print(BMI)
When we run the above code, we will get the following output:
     Name Height Weight Age
1 Vaibhav    170     80  30
2   Nitin    169     75  30
3  Aakash    175     78  29
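Columns of a data frame can be selected, filtered and extended. The sketch below works on the same BMI data frame. It assumes that Height is in centimetres and Weight in kilograms, which the slides do not state, and the column name Index is made up for illustration.

```r
BMI <- data.frame(
  Name   = c("Vaibhav", "Nitin", "Aakash"),
  Height = c(170, 169, 175),   # assumed to be in cm
  Weight = c(80, 75, 78),      # assumed to be in kg
  Age    = c(30, 30, 29)
)

heights <- BMI$Height              # one column as a vector
older   <- BMI[BMI$Age >= 30, ]    # rows that satisfy a condition

# Add a computed column: weight (kg) divided by height (m) squared.
BMI$Index <- BMI$Weight / (BMI$Height / 100)^2

str(BMI)       # structure: column names and their types
print(older)
```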
10. R-Excel File
• Microsoft Excel is the most widely used spreadsheet program; it
stores data in the .xls or .xlsx format.
• R can read directly from these files using Excel-specific
packages.
• We have to run the following code to install and load the xlsx package,
which gives R access to Excel files:
install.packages("xlsx")
library("xlsx")
(Note: a Java runtime environment must be installed before running this code.)
11. Reading the Excel File
• Suppose we have an Excel file marks.xlsx in the current working directory*.
We can read this file with the following code:
data=read.xlsx("marks.xlsx", sheetIndex=1)
print(data)
When we execute the above code, we can see the data of the entire file, which is
loaded into the data frame data.
• To make a sub data frame from the main data frame, we can run the
following code:
NameMarks=data.frame(Name=data$Name, Final=data$Final)
(*We can see the current working directory through the function getwd().)
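The contents of marks.xlsx are not shown in these slides. To try the steps without the file, the sketch below builds a small stand-in data frame; its column names (Name, ClassTest, Final) follow the ones used in the session, but all values are made up.

```r
# Stand-in for the result of read.xlsx("marks.xlsx", sheetIndex = 1);
# the student names and marks are hypothetical.
data <- data.frame(
  Name      = c("A", "B", "C", "D", "E"),
  ClassTest = c(8, 6, 9, 5, 7),
  Final     = c(72, 58, 81, 49, 66)
)

# Select two columns by name to form a smaller data frame.
NameMarks <- data[, c("Name", "Final")]
print(NameMarks)

# The directory in which R looks for files such as marks.xlsx.
print(getwd())
```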
12. Statistical Operations in R
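• Let us consider a vector of elements:
values=c(4, 5, 8, 9, 2, 5, 3, 6, 9, 8, 1, 4)
• Mean: mean(values)
• Median: median(values)
• Mode: base R has no built-in function for the statistical mode.
mode(values) returns the storage mode of the object ("numeric"), so the
most frequent value has to be found another way, for example from the
counts given by table(values).
• Returning to the previous example of marks, the mean and median of the
final marks of the students are given by mean(data$Final) and
median(data$Final).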
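The three measures can be computed for the vector above as in the following sketch. Since base R's mode() does not return the most frequent value, the sketch defines a small helper; the name stat_mode is hypothetical and not part of R.

```r
values <- c(4, 5, 8, 9, 2, 5, 3, 6, 9, 8, 1, 4)

# Hypothetical helper: returns every value that occurs most often.
stat_mode <- function(v) {
  counts <- table(v)   # how many times each distinct value occurs
  as.numeric(names(counts)[counts == max(counts)])
}

mean_value   <- mean(values)
median_value <- median(values)
mode_values  <- stat_mode(values)

print(mean_value)
print(median_value)
print(mode_values)
```

For this vector the values 4, 5, 8 and 9 each occur twice, so the helper returns all four of them rather than picking one.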
13. Regression Analysis
• Regression analysis is a very widely used statistical tool to establish a
relationship model between two variables: a predictor and a response.
• The general mathematical equation for a linear regression is:
y=ax+b
where y is the response variable, x is the predictor variable, and a and b
are constants known as the coefficients of regression.
• In R, the lm() function is used to create a relationship model between
these two variables.
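To see how lm() relates to the equation y=ax+b, the sketch below fits a model to made-up data that lies exactly on the line y=2x+1, so the fitted coefficients can be checked by hand. The data values are an assumption for illustration only.

```r
# Made-up data lying exactly on the line y = 2x + 1.
x <- c(1, 2, 3, 4, 5)
y <- 2 * x + 1

relation <- lm(y ~ x)    # response on the left of ~, predictor on the right

coefs <- coef(relation)
b <- coefs[["(Intercept)"]]   # the constant term b
a <- coefs[["x"]]             # the slope a

print(coefs)
```

Here a comes out as 2 and b as 1, matching the line used to generate the data.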
14. Example of Regression Analysis
• Let us take the example of the marks of students.
• Suppose we want to analyze the relation between the class test marks and the
final marks of the students.
• Let y=data$Final, x=data$ClassTest
Then the relation can be created through the code:
relation=lm(y~x)
We can see the relation by running the following code:
print(relation)
• A summary of the relation can be seen through: summary(relation)
(Note: since we are working with a very small amount of data, the estimates
may not be reliable.)
15. Graphical Visualization of Regression
• The regression in the previous example can be visualized graphically as:
png(file="MarksRegression.png")
plot(x, y, col="blue", main="Class Test and Final Marks",
     cex=1.3, pch=16, xlab="Class Test", ylab="Final Marks")
abline(lm(y~x))
dev.off()
Running the above code saves the plot to the file MarksRegression.png in the
current working directory. The plot shows the data points and the regression
line of the relation between class test and final marks.
16. Prediction
• Using the regression model, we can predict the value of the response variable for
a new predictor value through the predict() function.
• Consider the previous example, where we need to predict the final marks of a
student on the basis of their marks in the class test.
Suppose we want to predict the final marks when the class test marks are 10:
a=data.frame(x=10)
result=predict(relation, a)
print(result)
(Note: the prediction is more reliable when a large data set is used to create
the model.)
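The prediction step can be tried end to end with the following sketch. It uses made-up marks that lie exactly on the line y=6x+20, so the predicted value can be verified by hand; these numbers are an assumption and not the marks from the session.

```r
# Made-up marks: final marks lie exactly on the line y = 6x + 20.
x <- c(5, 6, 7, 8, 9)        # class test marks
y <- c(50, 56, 62, 68, 74)   # final marks

relation <- lm(y ~ x)

# The new data frame must use the same column name as the predictor (x).
new_marks <- data.frame(x = 10)
result <- predict(relation, new_marks)

print(result)
```

For a class test mark of 10 the model returns 6*10+20 = 80.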
17. Multiple Regression
• Multiple regression is an extension of linear regression to the
relationship between more than two variables.
• In simple linear regression we have one predictor and one response
variable, but in multiple regression we have more than one predictor
variable and one response variable.
• It can be expressed as:
Y=a+b1X1+b2X2+...+bnXn
where Y is the response variable, a, b1, b2, ..., bn are the coefficients
and X1, X2, ..., Xn are the predictor variables.
18. Multiple Regression in R
• Let us consider an example where the result of students consists of Mid-Term
Exam, Class Test, Quiz and Final marks.
• Suppose we want to create a relation to analyze how the Final marks depend on
the Mid-Term Exam, Class Test and Quiz marks.
Suppose we have another data set, NewData, which contains all these marks. Then
the relation can be created as:
Mul_Regr=lm(Final~MidTerm+ClassTest+Quiz, data=NewData)
We can see this relation by running:
print(Mul_Regr)
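The NewData data set is not shown in these slides. The sketch below builds a made-up version in which the final marks are generated exactly as 5 + 2*MidTerm + 1*ClassTest + 3*Quiz, so the fitted coefficients can be compared with known values. All numbers are assumptions for illustration.

```r
# Hypothetical marks for six students.
NewData <- data.frame(
  MidTerm   = c(10, 12, 15, 9, 14, 11),
  ClassTest = c(5, 7, 6, 8, 9, 4),
  Quiz      = c(3, 4, 2, 5, 1, 4)
)
# Final marks generated from a known linear relation.
NewData$Final <- 5 + 2 * NewData$MidTerm + 1 * NewData$ClassTest + 3 * NewData$Quiz

# Column names are used directly in the formula; data= tells lm() where to find them.
Mul_Regr <- lm(Final ~ MidTerm + ClassTest + Quiz, data = NewData)
coefs <- coef(Mul_Regr)
print(coefs)

# Predict the final marks of a new student from the three predictor values.
new_student <- data.frame(MidTerm = 13, ClassTest = 6, Quiz = 3)
pred <- predict(Mul_Regr, new_student)
print(pred)
```

Writing the formula with bare column names and data=NewData is what lets predict() match the columns of new_student to the predictors.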
19. Pie Chart
• In R, a pie chart is created using the pie() function.
• Example:
x=c(20, 10, 40, 30)
labels=c("Dehradun", "Roorkee", "Delhi", "Ghaziabad")
png(file="PieChart.png")
pie(x, labels)
dev.off()
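A pie chart is often easier to read when each slice shows its share. The following sketch adds percentages to the labels of the same example; the chart title is an assumption. The png() and dev.off() calls are left out here, so the chart goes to R's default graphics device; wrapping the pie() call in them as in the example above saves it to a file instead.

```r
x      <- c(20, 10, 40, 30)
labels <- c("Dehradun", "Roorkee", "Delhi", "Ghaziabad")

# Share of each slice as a whole-number percentage of the total.
percent    <- round(100 * x / sum(x))
pie_labels <- paste0(labels, " (", percent, "%)")

pie(x, labels = pie_labels, main = "Share by city")   # the title is hypothetical
print(pie_labels)
```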
20. Bar Chart
• Consider the final marks of the students. They can be plotted as a bar
chart with:
png(file="BarChart.png")
barplot(data$Final)
dev.off()
21. Histogram
• Consider the example of marks again. Suppose we want to plot the
histogram of the final marks:
png(file="Histogram.png")
hist(data$Final, xlab="Final Marks", col="blue",
     border="red")
dev.off()
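Besides drawing the chart, hist() returns the bins it used, which helps when checking what the picture shows. The sketch below inspects them without drawing anything; the marks are made up, since the real marks file is not shown in these slides.

```r
# Hypothetical final marks of ten students.
final_marks <- c(72, 58, 81, 49, 66, 75, 63, 90, 55, 68)

# plot = FALSE computes the histogram without drawing it.
h <- hist(final_marks, plot = FALSE)

print(h$breaks)   # bin boundaries chosen by hist()
print(h$counts)   # number of marks falling in each bin
```

Every mark falls into exactly one bin, so the counts add up to the number of students.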