Introduction to R
Regression Module
Pinaki M Mukherjee
Regression Module Introduction to R Pinaki M Mukherjee 1 / 42
1 About R software environment
2 Download and install R
3 R packages
4 Types of data in R & import data in R
5 Regression Modelling
6 Go to R Lab
7 Interpretation of R regression output
About R software environment
What is R?
R is a powerful software environment for statistical computing and
graphics
R is free, open-source software licensed under the GNU General Public
License
R runs on Linux, Mac and Windows
Users of R
Economists
Financial Analysts
Market Researchers
Academicians
Bioscientists
Many other professionals engaged in quantitative research
Why use R?
There are many statistical software packages, such as SAS, SPSS, EViews and Stata.
So why use R?
Some very strong reasons:
It's free
More than 2 million users around the world
High acceptability and recognition: Extensively used by large corporate
houses and business schools & universities (like Stanford, Harvard,
Johns Hopkins, Princeton, Washington University, MIT etc.)
New features are being developed all the time
Active R community and free, up-to-date resources
Applications of R
Data sourcing
Data cleaning
Data structuring
Data warehousing
Statistical and econometric modelling
Report generation
Preparing presentations
Automating reproducible research
Download and install R
Install R
Open www.r-project.org
Click the “download R”
Select a CRAN location (a mirror site) and click the corresponding link.
Click on the “Download R for Windows” link at the top of the page
Click on the “install R for the first time” link at the top of the page.
Click “Download R for Windows” and save the executable file
Run the .exe file and follow the installation instructions
After installing R, download and install RStudio, an easy-to-use interface
for R with many user-friendly and useful features. Like R, RStudio is also
free.
Install RStudio
Open www.rstudio.com
Click on the “Download RStudio” button
Click on “Download RStudio Desktop”
Click on the version recommended for your system, or the latest
Windows version
Download and save the executable file
Run the .exe file and follow the installation instructions
R packages
About R packages
Packages are collections of R functions, data and compiled code in a
well-defined format
More than 6000 packages were available for R at the time of writing
However, we need only a handful of packages for most work
As a rule of thumb, about 1% of the contributed packages are enough to
carry out 99% of the work efficiently
The packages are kept on dedicated servers maintained by the R community.
In India, there is a mirror server at IIT Madras
This network of servers is called CRAN (Comprehensive R
Archive Network)
Install R packages
(You need a working internet connection to run these commands successfully.)
To install one specific package, type the following in the R console,
where "forecast" is the name of the package:
install.packages("forecast")
Run the following code in the RStudio "Console" to install the packages
most relevant to us, grouped by CRAN Task View:
install.packages("ctv")
library("ctv")
install.views("Econometrics")
install.views("ReproducibleResearch")
install.views("Finance")
install.views("MachineLearning")
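Re-running install.packages() downloads a package again even if it is already present. A minimal sketch of a helper that installs only what is missing is shown below; the function names missing_packages() and install_if_missing() are hypothetical (not part of base R or ctv), and the check relies on system.file() returning "" for packages that are not installed.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  # system.file() returns "" when a package is not installed
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

# Hypothetical helper: install only the missing packages.
install_if_missing <- function(pkgs) {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) {
    install.packages(todo)  # needs an internet connection
  }
  invisible(todo)
}

# Usage: install_if_missing(c("forecast", "ctv"))
```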
Some important commands
Load a package into the R session
library(forecast)
List the packages attached to the R session (the search path)
search()
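To see how library() changes what search() reports, the sketch below attaches splines, which ships with every R installation, so it runs without downloading anything; the choice of splines is only for illustration.

```r
# 'splines' is one of the base packages shipped with R, so this runs offline.
library(splines)

attached <- search()   # attached packages and environments, in lookup order
print(attached)

"package:splines" %in% attached   # TRUE once the package is attached
```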
Types of data in R & import data in R
Vectors, Factors & Matrix
A vector is a collection of data elements of the same type (called its class
in R). Common classes of data elements are:
Character
Numeric
Logical
Date
Integer
Factors represent qualitative (categorical) variables and are used
extensively in modelling. For example, the interest-rate changes made by
the RBI at different monetary policy review meetings
A matrix is a vector (usually numeric) arranged in two dimensions: rows and
columns
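A short sketch of the classes, a factor and a matrix described above. All values are made up for illustration; the factor levels "cut", "hold" and "hike" are hypothetical labels, not real RBI decisions.

```r
chr <- c("low", "medium", "high")          # character
num <- c(2.5, 3.75, 4.0)                   # numeric (double)
lgl <- c(TRUE, FALSE, TRUE)                # logical
int <- c(10L, 20L, 30L)                    # integer: the L suffix makes an integer
dte <- as.Date(c("2016-02-02", "2016-04-05", "2016-06-07"))  # Date (hypothetical dates)

sapply(list(chr, num, lgl, int, dte), class)
# "character" "numeric" "logical" "integer" "Date"

# A factor stores a qualitative variable as labelled categories (levels)
decision <- factor(c("cut", "hold", "hold"), levels = c("cut", "hold", "hike"))
table(decision)   # counts per level, including the unused level "hike"

# A matrix is filled column by column unless byrow = TRUE
m <- matrix(1:6, nrow = 2, ncol = 3)
m[2, 3]           # element in row 2, column 3: 6
```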
Dataframe & List
Dataframe
Most frequently used format for statistical analysis
Unlike a matrix, a data frame can store columns (vectors) of different classes
It can be created in R by simple data entry, or
It can be imported from external sources, such as CSV files,
Excel files etc.
A list is a collection of objects that may be of different types, including
data frames. A list of data frames resembles a workbook with several sheets
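A minimal sketch of both structures, using a small hypothetical data frame and the built-in mtcars dataset as a second "sheet".

```r
# Hypothetical data: columns of different classes side by side
firms <- data.frame(
  firm   = c("A", "B", "C"),
  sales  = c(10.2, 8.7, 12.1),
  listed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE  # keep text as character (the default since R 4.0.0)
)
str(firms)

# A list can hold objects of any type; here two data frames, like two sheets
workbook <- list(sheet1 = firms, sheet2 = head(mtcars, 3))
names(workbook)        # "sheet1" "sheet2"
workbook$sheet2$mpg    # a column of the second "sheet"
```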
Import external data into R
From a CSV file (use read.delim() or read.table() for other delimited text files)
mydata <- read.csv("data.csv")
From Excel
install.packages("xlsx") (if the xlsx package is not already installed)
library(xlsx) (load the xlsx package into the R session)
mydata <- read.xlsx("data.xlsx", sheetIndex = 1) (import the data in Sheet 1
of data.xlsx)
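Because data.csv is not provided, the sketch below writes a small hypothetical data set to a temporary CSV file and reads it back with read.csv(). The numbers are made up. Note that the xlsx package depends on Java; the readxl package's read_excel() is a commonly used alternative for Excel files.

```r
# Hypothetical advertising-style data (made-up numbers, for illustration only)
ads <- data.frame(TV = c(100, 50, 200), Radio = c(20, 35, 10),
                  Sales = c(12.0, 9.5, 15.2))

path <- tempfile(fileext = ".csv")        # temporary file, so nothing is left behind
write.csv(ads, path, row.names = FALSE)   # row.names = FALSE avoids an extra index column

imported <- read.csv(path)                # header = TRUE is the default for read.csv
str(imported)
```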
Regression Modelling
Linear regression
Linear regression is a simple approach to supervised learning. It
assumes that the dependence of Y on $X_1, X_2, \dots, X_p$ is linear
True regression functions are almost never exactly linear
Although it may seem overly simplistic, linear regression is extremely
useful both conceptually and practically
Questions we might ask
Is there a relationship between the dependent and independent
variables?
How strong is the relationship between the dependent and independent
variables?
Which independent variables contribute to the dependent variable?
Is the relationship linear?
How accurately can we forecast the value of the dependent variable?
Simple linear regression using a single predictor X
We assume a model:
$Y = \beta_0 + \beta_1 X + \epsilon$
where $\beta_0$ and $\beta_1$ are two unknown constants that represent the intercept
and slope, also known as coefficients or parameters, and $\epsilon$ is the error term
Given the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ of the model coefficients, we can
forecast Y using the equation
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
where $\hat{y}$ indicates a prediction of Y on the basis of X = x. The hat symbol
denotes an estimated value.
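A minimal sketch of fitting the simple model with lm() and forecasting with predict(). The data are simulated under assumed values (intercept 7, slope 0.05, noise standard deviation 3); they are not the advertising data used in these slides.

```r
set.seed(1)
# Simulated (hypothetical) data: true beta0 = 7, beta1 = 0.05, noise sd = 3
tv    <- runif(200, min = 0, max = 300)
sales <- 7 + 0.05 * tv + rnorm(200, sd = 3)

fit <- lm(sales ~ tv)
coef(fit)                                             # beta0-hat and beta1-hat
predict(fit, newdata = data.frame(tv = c(50, 150)))   # y-hat at x = 50 and x = 150
```

predict() simply evaluates $\hat{\beta}_0 + \hat{\beta}_1 x$ at the new x values; the column name in newdata must match the predictor name used in the formula.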
Estimation of the parameters by least squares
Let $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ be the prediction for Y based on the ith value of X, $x_i$
Then $e_i = y_i - \hat{y}_i$ represents the ith residual
We define the residual sum of squares (RSS) as
$\mathrm{RSS} = e_1^2 + e_2^2 + \dots + e_n^2$
The least squares approach chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize the RSS
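For simple regression the RSS minimizer has a closed form: $\hat{\beta}_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. The sketch below, on simulated (hypothetical) data, computes these by hand, checks them against lm(), and shows that nudging a coefficient away from the estimate increases the RSS.

```r
set.seed(42)
x <- runif(100, 0, 300)
y <- 7 + 0.05 * x + rnorm(100, sd = 3)   # hypothetical simulated data

# Closed-form least squares estimates for simple linear regression
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# RSS = e_1^2 + ... + e_n^2 for any candidate pair of coefficients
rss <- function(beta0, beta1) sum((y - (beta0 + beta1 * x))^2)

c(b0 = b0, b1 = b1)
coef(lm(y ~ x))                       # the same numbers from lm()
rss(b0, b1) < rss(b0, b1 + 0.001)     # TRUE: any other slope gives a larger RSS
```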
Simple regression model: The advertisement data
[Figure: scatter plot "Sales to TV Advertisement", Sales (y-axis) against TV Ad budget (x-axis)]
Multiple regression using more than one predictor
We assume a model:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$
where $\beta_0$, $\beta_1$ and $\beta_2$ are three unknown constants that represent the
intercept and slopes, also known as coefficients or parameters, and $\epsilon$ is the
error term
Given the estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ of the model coefficients, we can
forecast Y using the equation
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$
where $\hat{y}$ indicates a prediction of Y on the basis of $X_1 = x_1$ and $X_2 = x_2$. The hat symbol
denotes an estimated value.
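The same workflow extends to two predictors by adding terms to the lm() formula. In this sketch the data are simulated under assumed coefficients (3, 0.045 and 0.19) chosen only for illustration.

```r
set.seed(7)
n     <- 200
tv    <- runif(n, 0, 300)
radio <- runif(n, 0, 50)
# Hypothetical true model: beta0 = 3, beta1 = 0.045, beta2 = 0.19
sales <- 3 + 0.045 * tv + 0.19 * radio + rnorm(n, sd = 1.5)

ads <- data.frame(sales, tv, radio)
fit <- lm(sales ~ tv + radio, data = ads)

coef(fit)                                               # beta0-hat, beta1-hat, beta2-hat
predict(fit, newdata = data.frame(tv = 100, radio = 20)) # y-hat at x1 = 100, x2 = 20
```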
Estimation of the parameters by least squares
Let $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i}$ be the prediction for Y based on the ith values $x_{1i}$ of $X_1$ and $x_{2i}$ of $X_2$
Then $e_i = y_i - \hat{y}_i$ represents the ith residual
We define the residual sum of squares (RSS) as
$\mathrm{RSS} = e_1^2 + e_2^2 + \dots + e_n^2$
The least squares approach chooses $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ to minimize the RSS
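With several predictors, the RSS minimizer solves the normal equations $(X^\top X)\hat{\beta} = X^\top y$, where X is the design matrix with a leading column of ones. The sketch below (simulated, hypothetical data) solves them directly and checks the result against lm(); solve() on the cross-products is used instead of explicitly inverting $X^\top X$.

```r
set.seed(7)
n  <- 200
x1 <- runif(n, 0, 300)
x2 <- runif(n, 0, 50)
y  <- 3 + 0.045 * x1 + 0.19 * x2 + rnorm(n, sd = 1.5)   # simulated data

X <- cbind(1, x1, x2)   # design matrix: a column of 1s for the intercept
# Solve (X'X) beta = X'y for the least squares coefficients
beta_hat <- solve(crossprod(X), crossprod(X, y))

e   <- y - X %*% beta_hat   # residuals e_i
rss <- sum(e^2)             # e_1^2 + ... + e_n^2

cbind(normal_eq = drop(beta_hat), lm = coef(lm(y ~ x1 + x2)))
```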
Multiple regression model: The advertisement data
[Figure: 3D scatter plot of Sales against the TV and Radio advertising budgets]
Go to R Lab
Go to R Lab: Objective of the Lab
Import external data
Correlation matrix
Estimating regression coefficients
Computing the residuals (estimates of the error terms)
Print regression model summary
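The lab's data file is not included here, so the sketch below first writes a simulated stand-in to a temporary CSV and then walks through the lab objectives in order. The column names TV, Radio and Sales and all the numbers are assumptions for illustration; the object name reg matches the one used later for exporting output.

```r
set.seed(2016)
# Hypothetical stand-in for the lab data (the real file is not provided)
n   <- 200
ads <- data.frame(TV = runif(n, 0, 300), Radio = runif(n, 0, 50))
ads$Sales <- 3 + 0.045 * ads$TV + 0.19 * ads$Radio + rnorm(n, sd = 1.5)
path <- tempfile(fileext = ".csv")
write.csv(ads, path, row.names = FALSE)

# 1. Import external data
adv <- read.csv(path)
# 2. Correlation matrix
round(cor(adv), 2)
# 3. Estimate the regression coefficients
reg <- lm(Sales ~ TV + Radio, data = adv)
coef(reg)
# 4. Residuals (estimates of the error terms)
head(residuals(reg))
# 5. Model summary
summary(reg)
```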
Interpretation of R regression output
Accuracy of the estimated coefficient: Confidence interval
The standard error of an estimator reflects how it varies under repeated
sampling
These standard errors can be used to compute confidence intervals
A 95% confidence interval is a range of values constructed so that, under
repeated sampling, 95% of such intervals would contain the true unknown
value of the parameter
For $\beta_1$ it has the approximate form $\hat{\beta}_1 \pm 2 \cdot \mathrm{SE}(\hat{\beta}_1)$
That is, there is approximately a 95% chance that the interval
$[\hat{\beta}_1 - 2 \cdot \mathrm{SE}(\hat{\beta}_1),\ \hat{\beta}_1 + 2 \cdot \mathrm{SE}(\hat{\beta}_1)]$
will contain the true value of $\beta_1$
For the advertising data, the 95% confidence interval for $\beta_1$ is [0.042,
0.053]
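In R, confint() returns the exact interval, which uses the t quantile qt(0.975, df) rather than 2; with a reasonable sample size the two are very close. The sketch compares them on simulated (hypothetical) data, not the advertising data quoted above.

```r
set.seed(1)
tv    <- runif(200, 0, 300)
sales <- 7 + 0.05 * tv + rnorm(200, sd = 3)   # simulated data
fit   <- lm(sales ~ tv)

est <- coef(summary(fit))["tv", "Estimate"]
se  <- coef(summary(fit))["tv", "Std. Error"]

rough <- c(est - 2 * se, est + 2 * se)        # "plus or minus 2 SE" rule of thumb
exact <- confint(fit, "tv", level = 0.95)     # uses qt(0.975, df = n - 2)
rough
exact
```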
Hypothesis testing
Standard errors can also be used to perform hypothesis tests on the
coefficients
The most common hypothesis test involves testing
the null hypothesis $H_0$: there is no relationship between X and Y
versus
the alternative hypothesis $H_A$: there is some relationship between X
and Y
Hypothesis testing: the mathematical formulation
Testing
$H_0 : \beta_1 = 0$
versus
$H_A : \beta_1 \neq 0$
If $\beta_1 = 0$, X is not associated with Y
To test the null hypothesis, we compute a t-statistic, given by
$t = \dfrac{\hat{\beta}_1 - 0}{\mathrm{SE}(\hat{\beta}_1)}$
Using R, it is easy to compute the probability, assuming $\beta_1 = 0$, of observing
a value equal to |t| or larger in absolute value. We call this probability the p-value.
If we see a small p-value, then we can infer that there is an association
between the predictor and the response. We reject the null
hypothesis, that is, we declare a relationship to exist between X and Y
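The t-statistic and two-sided p-value reported by summary() can be reproduced by hand from the estimate and its standard error. A sketch on simulated (hypothetical) data:

```r
set.seed(1)
tv    <- runif(200, 0, 300)
sales <- 7 + 0.05 * tv + rnorm(200, sd = 3)   # simulated data
fit   <- lm(sales ~ tv)

tab    <- coef(summary(fit))
t_stat <- (tab["tv", "Estimate"] - 0) / tab["tv", "Std. Error"]
# Two-sided p-value: P(|T| >= |t|) for a t distribution with n - 2 df
p_val  <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

c(t = t_stat, p = p_val)
tab["tv", c("t value", "Pr(>|t|)")]   # the same numbers from summary()
```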
Assessing the Overall Accuracy of the Model: $R^2$
$R^2$, or the fraction of variance explained, is
$R^2 = \dfrac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \dfrac{\mathrm{RSS}}{\mathrm{TSS}}$
TSS = total sum of squares, $\sum_{i=1}^{n} (y_i - \bar{y})^2$, also called total variation
RSS = residual sum of squares, also called unexplained variation
Explained variation = total variation − unexplained variation = TSS − RSS
$R^2$ measures the proportion of variability in the dependent variable that
can be explained using the independent variable(s)
An $R^2$ statistic that is close to 1 indicates that a large proportion of
the variability in the response has been explained by the regression
A number near 0 indicates that the regression did not explain much of
the variability in the response
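A sketch that computes $R^2$ directly from TSS and RSS and compares it with the value summary() reports. It uses R's built-in mtcars dataset purely for illustration, not the advertising data.

```r
fit <- lm(mpg ~ wt, data = mtcars)    # built-in dataset, for illustration only
y   <- mtcars$mpg

tss <- sum((y - mean(y))^2)           # total variation
rss <- sum(residuals(fit)^2)          # unexplained variation
r2  <- 1 - rss / tss                  # fraction of variance explained

c(manual = r2, lm = summary(fit)$r.squared)
```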
For least squares with an intercept, $R^2$ always lies between 0 and 1.
However, it can still be challenging to determine what a good $R^2$ value is.
It depends on the application:
Physics: close to 1; a smaller value might indicate a serious
problem
Biology, psychology: well below 0.1 might be more realistic!
Economics and finance: well above 0.6 might be more acceptable!
What is the value of $R^2$ in our data?
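Assessing the Overall Accuracy of the Model: F Test
$F = \dfrac{(\mathrm{TSS} - \mathrm{RSS})/p}{\mathrm{RSS}/(n - p - 1)}$
n = number of observations
p = number of independent variables
Intuitively, if the model is a good fit, then the explained variation
(TSS − RSS) will be large relative to the RSS.
When there is no relationship between the response and the predictors, F is
expected to be close to 1, so an F value well above 1 is evidence of a relationship
Just how large it must be depends on the sample size n and the number of
independent variables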
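The F-statistic can be computed from TSS and RSS exactly as in the formula, with its p-value from pf(). The sketch uses the built-in mtcars data (two predictors, so p = 2) only for illustration and checks the result against summary().

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # built-in data, for illustration only
y   <- mtcars$mpg
n   <- length(y)                          # number of observations
p   <- 2                                  # number of independent variables (wt, hp)

tss <- sum((y - mean(y))^2)
rss <- sum(residuals(fit)^2)

f_stat <- ((tss - rss) / p) / (rss / (n - p - 1))
p_val  <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)

c(F = f_stat, p.value = p_val)
summary(fit)$fstatistic   # value, numdf and dendf as reported by R
```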
Answer to “Questions we might ask” in advertisement data
Is there a relationship between the dependent and independent variables?
Is there a relationship between advertising budget and sales?
How strong is the relationship between the dependent and independent
variables?
How strong is the relationship?
Which independent variables contribute to the dependent variable?
Which media contribute to sales?
How large is the effect of each medium on sales?
Is the relationship linear?
Is the relationship linear?
How accurately can we forecast the value of the dependent variable?
Exporting the regression output
a <- capture.output(summary(reg))   # reg is the fitted lm() regression model
cat(a, file = "trial.txt", sep = "\n", append = TRUE)
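A self-contained version of the export step, as a sketch: it fits a stand-in model on the built-in mtcars data (an assumption, since the lab's reg object is not defined here) and writes to a temporary file instead of trial.txt so it leaves nothing behind.

```r
reg <- lm(mpg ~ wt, data = mtcars)       # stand-in for the lab's fitted model
out <- tempfile(fileext = ".txt")        # use "trial.txt" to write to the working directory

a <- capture.output(summary(reg))        # the printed summary as a character vector
cat(a, file = out, sep = "\n", append = TRUE)

# writeLines(a, out) is an alternative that overwrites instead of appending
readLines(out, warn = FALSE)[1:5]
```

append = TRUE lets several model summaries accumulate in one file; drop it to overwrite the file on each run.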
Online free resources
R Cookbook : http://www.cookbook-r.com/
Try R: http://tryr.codeschool.com/
Video tutorials: http://www.twotorials.com/
I shall be glad to help you
Follow me on Google Plus and on my blog
Email: pinaki.economics@gmail.com
Mobile: +91 9818383989
Regression Module Introduction to R Pinaki M Mukherjee 42 / 42

More Related Content

What's hot

Introduction to database-Normalisation
Introduction to database-NormalisationIntroduction to database-Normalisation
Introduction to database-Normalisation
Ajit Nayak
 
2 R Tutorial Programming
2 R Tutorial Programming2 R Tutorial Programming
2 R Tutorial Programming
Sakthi Dasans
 
Matlab intro
Matlab introMatlab intro
Matlab introfvijayami
 
Introduction to c++
Introduction to c++Introduction to c++
Introduction to c++
Prof. Dr. K. Adisesha
 
Introduction to c++ ppt
Introduction to c++ pptIntroduction to c++ ppt
Introduction to c++ ppt
Prof. Dr. K. Adisesha
 
R - the language
R - the languageR - the language
R - the language
Mike Martinez
 
Unit II - LINEAR DATA STRUCTURES
Unit II -  LINEAR DATA STRUCTURESUnit II -  LINEAR DATA STRUCTURES
Unit II - LINEAR DATA STRUCTURES
Usha Mahalingam
 
MatLab Basic Tutorial On Plotting
MatLab Basic Tutorial On PlottingMatLab Basic Tutorial On Plotting
MatLab Basic Tutorial On Plotting
MOHDRAFIQ22
 
Unit 1 - R Programming (Part 2).pptx
Unit 1 - R Programming (Part 2).pptxUnit 1 - R Programming (Part 2).pptx
Unit 1 - R Programming (Part 2).pptx
Malla Reddy University
 
Numerical analysis using Scilab: Numerical stability and conditioning
Numerical analysis using Scilab: Numerical stability and conditioningNumerical analysis using Scilab: Numerical stability and conditioning
Numerical analysis using Scilab: Numerical stability and conditioning
Scilab
 
R programming Fundamentals
R programming  FundamentalsR programming  Fundamentals
R programming Fundamentals
Ragia Ibrahim
 
Structure In C
Structure In CStructure In C
Structure In C
yndaravind
 
Fnctions part2
Fnctions part2Fnctions part2
Fnctions part2
yndaravind
 
Final project report
Final project reportFinal project report
Final project reportssuryawanshi
 
stacks in algorithems and data structure
stacks in algorithems and data structurestacks in algorithems and data structure
stacks in algorithems and data structure
faran nawaz
 
Unit iv-syntax-directed-translation
Unit iv-syntax-directed-translationUnit iv-syntax-directed-translation
Unit iv-syntax-directed-translation
Ajith kumar M P
 
Compiler unit 4
Compiler unit 4Compiler unit 4
Compiler unit 4
BBDITM LUCKNOW
 
Computer science ms
Computer science msComputer science ms
Computer science ms
B Bhuvanesh
 
Tokens expressionsin C++
Tokens expressionsin C++Tokens expressionsin C++
Tokens expressionsin C++
HalaiHansaika
 

What's hot (20)

Introduction to database-Normalisation
Introduction to database-NormalisationIntroduction to database-Normalisation
Introduction to database-Normalisation
 
2 R Tutorial Programming
2 R Tutorial Programming2 R Tutorial Programming
2 R Tutorial Programming
 
Matlab intro
Matlab introMatlab intro
Matlab intro
 
Introduction to c++
Introduction to c++Introduction to c++
Introduction to c++
 
Introduction to c++ ppt
Introduction to c++ pptIntroduction to c++ ppt
Introduction to c++ ppt
 
Matlab project
Matlab projectMatlab project
Matlab project
 
R - the language
R - the languageR - the language
R - the language
 
Unit II - LINEAR DATA STRUCTURES
Unit II -  LINEAR DATA STRUCTURESUnit II -  LINEAR DATA STRUCTURES
Unit II - LINEAR DATA STRUCTURES
 
MatLab Basic Tutorial On Plotting
MatLab Basic Tutorial On PlottingMatLab Basic Tutorial On Plotting
MatLab Basic Tutorial On Plotting
 
Unit 1 - R Programming (Part 2).pptx
Unit 1 - R Programming (Part 2).pptxUnit 1 - R Programming (Part 2).pptx
Unit 1 - R Programming (Part 2).pptx
 
Numerical analysis using Scilab: Numerical stability and conditioning
Numerical analysis using Scilab: Numerical stability and conditioningNumerical analysis using Scilab: Numerical stability and conditioning
Numerical analysis using Scilab: Numerical stability and conditioning
 
R programming Fundamentals
R programming  FundamentalsR programming  Fundamentals
R programming Fundamentals
 
Structure In C
Structure In CStructure In C
Structure In C
 
Fnctions part2
Fnctions part2Fnctions part2
Fnctions part2
 
Final project report
Final project reportFinal project report
Final project report
 
stacks in algorithems and data structure
stacks in algorithems and data structurestacks in algorithems and data structure
stacks in algorithems and data structure
 
Unit iv-syntax-directed-translation
Unit iv-syntax-directed-translationUnit iv-syntax-directed-translation
Unit iv-syntax-directed-translation
 
Compiler unit 4
Compiler unit 4Compiler unit 4
Compiler unit 4
 
Computer science ms
Computer science msComputer science ms
Computer science ms
 
Tokens expressionsin C++
Tokens expressionsin C++Tokens expressionsin C++
Tokens expressionsin C++
 

Similar to Introduction to R : Regression Module

Lecture_R.ppt
Lecture_R.pptLecture_R.ppt
Lecture_R.ppt
Abebe334138
 
Rbootcamp Day 1
Rbootcamp Day 1Rbootcamp Day 1
Rbootcamp Day 1
Olga Scrivner
 
Best corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiBest corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbai
Unmesh Baile
 
A Handbook Of Statistical Analyses Using R
A Handbook Of Statistical Analyses Using RA Handbook Of Statistical Analyses Using R
A Handbook Of Statistical Analyses Using R
Nicole Adams
 
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
ArchishaKhandareSS20
 
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
vikassingh569137
 
Modeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.pptModeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.ppt
anshikagoel52
 
STAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdf
STAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdfSTAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdf
STAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdf
SOUMIQUE AHAMED
 
Getting started with R
Getting started with RGetting started with R
R studio
R studio R studio
R studio
Kinza Irshad
 
R basics
R basicsR basics
R basics
Sagun Baijal
 
Data analysis in R
Data analysis in RData analysis in R
Data analysis in R
Andrew Lowe
 
Introduction to R - from Rstudio to ggplot
Introduction to R - from Rstudio to ggplotIntroduction to R - from Rstudio to ggplot
Introduction to R - from Rstudio to ggplot
Olga Scrivner
 
Basics of R
Basics of RBasics of R
Basics of R
Sachita Yadav
 
R Programming Language
R Programming LanguageR Programming Language
R Programming Language
NareshKarela1
 
1_Introduction.pptx
1_Introduction.pptx1_Introduction.pptx
1_Introduction.pptx
ranapoonam1
 
Quantitative Data Analysis using R
Quantitative Data Analysis using RQuantitative Data Analysis using R
Quantitative Data Analysis using R
Taddesse Kassahun
 
R-Language-Lab-Manual-lab-1.pdf
R-Language-Lab-Manual-lab-1.pdfR-Language-Lab-Manual-lab-1.pdf
R-Language-Lab-Manual-lab-1.pdf
KabilaArun
 

Similar to Introduction to R : Regression Module (20)

Lecture_R.ppt
Lecture_R.pptLecture_R.ppt
Lecture_R.ppt
 
Rbootcamp Day 1
Rbootcamp Day 1Rbootcamp Day 1
Rbootcamp Day 1
 
Best corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiBest corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbai
 
A Handbook Of Statistical Analyses Using R
A Handbook Of Statistical Analyses Using RA Handbook Of Statistical Analyses Using R
A Handbook Of Statistical Analyses Using R
 
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
 
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
 
Lecture1 r
Lecture1 rLecture1 r
Lecture1 r
 
Modeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.pptModeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.ppt
 
STAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdf
STAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdfSTAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdf
STAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdf
 
Getting started with R
Getting started with RGetting started with R
Getting started with R
 
Introduction to r
Introduction to rIntroduction to r
Introduction to r
 
R studio
R studio R studio
R studio
 
R basics
R basicsR basics
R basics
 
Data analysis in R
Data analysis in RData analysis in R
Data analysis in R
 
Introduction to R - from Rstudio to ggplot
Introduction to R - from Rstudio to ggplotIntroduction to R - from Rstudio to ggplot
Introduction to R - from Rstudio to ggplot
 
Basics of R
Basics of RBasics of R
Basics of R
 
R Programming Language
R Programming LanguageR Programming Language
R Programming Language
 
1_Introduction.pptx
1_Introduction.pptx1_Introduction.pptx
1_Introduction.pptx
 
Quantitative Data Analysis using R
Quantitative Data Analysis using RQuantitative Data Analysis using R
Quantitative Data Analysis using R
 
R-Language-Lab-Manual-lab-1.pdf
R-Language-Lab-Manual-lab-1.pdfR-Language-Lab-Manual-lab-1.pdf
R-Language-Lab-Manual-lab-1.pdf
 

More from Pinaki Mahata Mukherjee

Business Forecasting with R
Business Forecasting with RBusiness Forecasting with R
Business Forecasting with R
Pinaki Mahata Mukherjee
 
Linux cheat sheet
Linux cheat sheetLinux cheat sheet
Linux cheat sheet
Pinaki Mahata Mukherjee
 

More from Pinaki Mahata Mukherjee (9)

Business Forecasting with R
Business Forecasting with RBusiness Forecasting with R
Business Forecasting with R
 
Linux cheat sheet
Linux cheat sheetLinux cheat sheet
Linux cheat sheet
 
Daily marketstats 07 Feb 2014
Daily marketstats 07 Feb 2014Daily marketstats 07 Feb 2014
Daily marketstats 07 Feb 2014
 
Timeseries Analysis with R
Timeseries Analysis with RTimeseries Analysis with R
Timeseries Analysis with R
 
Daily marketstats24may2013
Daily marketstats24may2013Daily marketstats24may2013
Daily marketstats24may2013
 
Daily marketstats16apr2013
Daily marketstats16apr2013Daily marketstats16apr2013
Daily marketstats16apr2013
 
Daily marketstats15apr2013
Daily marketstats15apr2013Daily marketstats15apr2013
Daily marketstats15apr2013
 
Daily marketstats14apr2013
Daily marketstats14apr2013Daily marketstats14apr2013
Daily marketstats14apr2013
 
Notes econometricswithr
Notes econometricswithrNotes econometricswithr
Notes econometricswithr
 

Recently uploaded

原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 

Recently uploaded (20)

原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 

Introduction to R : Regression Module

  • 1. Introduction to R Regression Module Pinaki M Mukherjee Regression Module Introduction to R Pinaki M Mukherjee 1 / 42
  • 2. Outline: 1. About R software environment; 2. Download and install R; 3. R packages; 4. Types of data in R & import data in R; 5. Regression Modelling; 6. Go to R Lab; 7. Interpretation of R regression output
  • 3. About R software environment
  • 4. What is R? R is a powerful software environment for statistical computing and graphics. R is free, open-source software licensed under the GNU General Public License. R runs on Linux, Mac and Windows. Users of R: economists, financial analysts, market researchers, academicians, bio-scientists, and many other professionals doing quantitative research.
  • 5. Why use R? There are many statistical software packages, such as SAS, SPSS, EViews and Stata. Some very strong reasons to use R: It's free. It has more than 2 million users around the world. It has high acceptability and recognition: it is used extensively by large corporations, business schools and universities (like Stanford, Harvard, Johns Hopkins, Princeton, Washington University, MIT etc.). New features are being developed all the time. There is an active R community with free, up-to-date resources.
  • 6. Applications of R: data sourcing, data cleaning, data structuring, data warehousing, statistical and econometric modelling, report generation, preparing presentations, and automating reproducible research.
  • 7. Download and install R
  • 8. Install R: Open www.r-project.org. Click the “download R” link. Select a CRAN location (a mirror site) and click the corresponding link. Click the “Download R for Windows” link at the top of the page. Click the “install R for the first time” link at the top of the page. Click “Download R for Windows” and save the executable file. Run the .exe file and follow the installation instructions. After installing R, download and install RStudio, an easy-to-use interface for R loaded with many user-friendly and useful features. Like R, RStudio is free.
  • 9. Install RStudio: Open www.rstudio.com. Click the “Download RStudio” button. Click “Download RStudio Desktop”. Click the version recommended for your system, or the latest Windows version. Download and save the executable file. Run the .exe file and follow the installation instructions.
  • 10. R packages
  • 11. About R packages: Packages are collections of R functions, data, and compiled code in a well-defined format. There are more than 6000 packages for R, but we need only a handful of them: as a rule of thumb, about 1% of the external packages are enough to do 99% of the work efficiently. The packages are kept on dedicated servers maintained by the R community; in India, there is a server at IIT Madras. This network of servers is called CRAN (Comprehensive R Archive Network).
  • 12. Install R packages (you need a working internet connection to run this code successfully). To install one specific package, type install.packages("forecast") in the R console, where "forecast" is the name of the package. Run the following code in the Console of RStudio to install the packages in the CRAN task views most relevant for us: install.packages("ctv"); library("ctv"); install.views("Econometrics"); install.views("ReproducibleResearch"); install.views("Finance"); install.views("MachineLearning"). Note that install.views() installs every package in a view; add coreOnly = TRUE to install only each view's core packages.
  • 13. Some important commands: Load a package into the R session: library(forecast). See the packages attached to the R session: search()
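A minimal sketch of these commands that runs without an internet connection: it attaches splines and tools, which ship with every R installation, in place of forecast. The helper use_package() is a hypothetical name for a common install-if-missing pattern built on requireNamespace(); it is not part of any package.

```r
# Attach a package for the current session. 'splines' ships with base R,
# so this works offline; use library(forecast) once forecast is installed.
library(splines)

# search() returns the search path: the global environment, the attached
# packages ("package:...") and a few other environments
print(search())

# Hypothetical helper: install a package only if it is missing, then attach it.
# install.packages() needs an internet connection.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(pkg)
}

use_package("tools")   # 'tools' is also part of base R
```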
  • 14. Types of data in R & import data in R
  • 15. Vectors, Factors & Matrix: A vector is a collection of data elements of the same type (reported by class() in R). The common classes of data elements are character, numeric, integer, logical and Date. Factors are qualitative (categorical) variables, extensively used in modelling; for example, whether the RBI changed the interest rate at each of its monetary policy review meetings. A matrix is a vector arranged in two dimensions, rows and columns; all its elements share one type, usually numeric.
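The classes and structures described above can be checked directly with class(), levels() and dim(). A short sketch; all values, including the dates and the policy decisions, are made up for illustration.

```r
# Vectors: all elements share one type (illustrative values)
chr <- c("TV", "Radio", "Newspaper")            # character
num <- c(10.5, 20, 30.25)                       # numeric (double)
int <- c(1L, 2L, 3L)                            # integer (L suffix)
lgl <- c(TRUE, FALSE, TRUE)                     # logical
dts <- as.Date(c("2016-04-05", "2016-06-07"))   # Date

classes <- sapply(list(chr, num, int, lgl, dts), class)
print(classes)

# Factor: a qualitative variable with a fixed set of levels.
# Hypothetical decisions at four policy review meetings:
decision <- factor(c("hold", "cut", "hold", "hike"),
                   levels = c("cut", "hold", "hike"))
print(levels(decision))
print(table(decision))   # count of meetings per level

# Matrix: a vector with a dimension attribute (rows x columns)
m <- matrix(1:6, nrow = 2, ncol = 3)
print(dim(m))
```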
  • 16. Dataframe & List: A data frame is the most frequently used format for statistical analysis. Unlike a matrix, it can store columns (vectors) of different classes. It can be created in R by simple data entry, or imported from external sources such as CSV or Excel files. A list is a collection of objects that can differ in type, for example several data frames; a list of data frames resembles a workbook with different sheets.
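A sketch of a data frame that mixes column classes, and of a list holding two data frames the way a workbook holds sheets. The column names and values are illustrative only.

```r
# A data frame can mix column classes (illustrative values)
df <- data.frame(
  channel = c("TV", "Radio", "Newspaper"),
  budget  = c(100, 25.5, 10),
  online  = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE   # keep text as character (the default since R 4.0)
)
str(df)
col_classes <- sapply(df, class)
print(col_classes)

# A list can hold objects of any type; here two data frames, like two sheets
workbook <- list(sheet1 = df, sheet2 = df[df$budget > 20, ])
print(names(workbook))
print(nrow(workbook$sheet2))   # rows with budget above 20
```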
  • 17. Import external data into R: From a CSV file (or a comma-separated .txt file): data <- read.csv("data.csv"). From Excel: install.packages("xlsx") (if the xlsx package is not already installed); library(xlsx) (load the xlsx package into the R session); data <- read.xlsx("data.xlsx", sheetIndex = 1) (import the data in Sheet 1 of data.xlsx).
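A self-contained sketch of the CSV route: it writes a small made-up file to a temporary location with writeLines() and reads it back with read.csv(). The Excel route is not shown because the xlsx package depends on Java being installed.

```r
# Write a tiny CSV (made-up numbers) to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("TV,Radio,Sales",
             "100,20,12.5",
             "200,35,18.0",
             "50,10,7.5"), path)

ads <- read.csv(path)   # returns a data frame; assign it to use it later
str(ads)
summary(ads)
```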
  • 18. Regression Modelling
  • 19. Linear regression: Linear regression is a simple approach to supervised learning. It assumes that the dependence of $Y$ on $X_1, X_2, \dots, X_p$ is linear. True regression functions are never exactly linear; although it may seem overly simplistic, linear regression is extremely useful both conceptually and practically.
  • 20. Questions we might ask: Is there a relationship between the dependent and independent variables? How strong is the relationship between the dependent and independent variables? Which independent variables contribute to the dependent variable? Is the relationship linear? How accurately can we forecast the value of the dependent variable?
  • 21. Simple linear regression using a single predictor $X$: We assume the model $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\beta_0$ and $\beta_1$ are two unknown constants that represent the intercept and slope, also known as coefficients or parameters, and $\varepsilon$ is the error term. Given the estimates $\hat\beta_0$ and $\hat\beta_1$ of the model coefficients, we can forecast $Y$ using $\hat y = \hat\beta_0 + \hat\beta_1 x$, where $\hat y$ indicates a prediction of $Y$ on the basis of $X = x$. The hat symbol denotes an estimated value.
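To see the model and the prediction equation at work, the sketch below simulates data from $Y = \beta_0 + \beta_1 X + \varepsilon$ with assumed values $\beta_0 = 7$ and $\beta_1 = 0.05$ (chosen for illustration, not estimates from the advertising data), fits the model with lm(), and forms $\hat y$ with predict().

```r
set.seed(1)                        # reproducible simulation
n     <- 200
beta0 <- 7                         # assumed true intercept
beta1 <- 0.05                      # assumed true slope
x   <- runif(n, min = 0, max = 300)   # e.g. a TV budget (simulated)
eps <- rnorm(n, mean = 0, sd = 3)     # error term
y   <- beta0 + beta1 * x + eps

fit <- lm(y ~ x)                   # least squares estimates of beta0, beta1
print(coef(fit))

# Prediction y-hat = b0-hat + b1-hat * x at new values of x
new_x <- data.frame(x = c(50, 150))
y_hat <- predict(fit, newdata = new_x)
print(y_hat)
```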
  • 22. Estimation of the parameters by least squares: Let $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ be the prediction for $Y$ based on the $i$th value $x_i$ of $X$. Then $e_i = y_i - \hat y_i$ represents the $i$th residual. We define the residual sum of squares (RSS) as $\mathrm{RSS} = e_1^2 + e_2^2 + \dots + e_n^2$. The least squares approach chooses $\hat\beta_0$ and $\hat\beta_1$ to minimize the RSS.
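For a single predictor, minimizing the RSS has a closed-form solution: $\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The sketch computes these by hand on five made-up points and checks them against lm(); for these numbers the estimates are 0.05 and 1.99.

```r
# Five made-up data points
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form least squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

y_hat <- b0 + b1 * x      # fitted values
e     <- y - y_hat        # residuals e_i
rss   <- sum(e^2)         # residual sum of squares

fit <- lm(y ~ x)          # lm() should give the same coefficients
print(c(b0 = b0, b1 = b1))
print(coef(fit))
print(rss)
```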
  • 23. Simple regression model: the advertising data. [Figure: "Sales to TV Advertisement", a scatter plot of Sales against TV Ad budget.]
  • 26. Multiple regression using more than one predictor: We assume the model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$, where $\beta_0$, $\beta_1$ and $\beta_2$ are three unknown constants that represent the intercept and the slopes, also known as coefficients or parameters, and $\varepsilon$ is the error term. Given the estimates $\hat\beta_0$, $\hat\beta_1$ and $\hat\beta_2$ of the model coefficients, we can forecast $Y$ using $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$, where $\hat y$ indicates a prediction of $Y$ on the basis of $X_1 = x_1$ and $X_2 = x_2$. The hat symbol denotes an estimated value.
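A sketch of fitting the two-predictor model with lm(). The data are simulated, with x1 and x2 standing in for two advertising budgets and with assumed true coefficients; they are not the advertising data.

```r
set.seed(42)
n  <- 100
x1 <- runif(n, 0, 300)     # e.g. TV budget (simulated)
x2 <- runif(n, 0, 50)      # e.g. Radio budget (simulated)
y  <- 3 + 0.04 * x1 + 0.2 * x2 + rnorm(n, sd = 1.5)   # assumed true model

dat <- data.frame(y, x1, x2)
fit <- lm(y ~ x1 + x2, data = dat)
print(coef(fit))           # estimates of beta0, beta1, beta2

# y-hat = b0-hat + b1-hat * x1 + b2-hat * x2 for hypothetical new budgets
pred <- predict(fit, newdata = data.frame(x1 = 100, x2 = 20))
print(pred)
```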
  • 27. Estimation of the parameters by least squares: Let $\hat y_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i}$ be the prediction for $Y$ based on the $i$th values $x_{1i}$ of $X_1$ and $x_{2i}$ of $X_2$. Then $e_i = y_i - \hat y_i$ represents the $i$th residual. We define the residual sum of squares (RSS) as $\mathrm{RSS} = e_1^2 + e_2^2 + \dots + e_n^2$. The least squares approach chooses $\hat\beta_0$, $\hat\beta_1$ and $\hat\beta_2$ to minimize the RSS.
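One standard way to see what least squares computes with several predictors is the normal equations: stacking a column of ones, $x_1$ and $x_2$ into a design matrix $X$, the RSS-minimizing coefficients are $\hat\beta = (X^\top X)^{-1} X^\top y$. The sketch solves this system on simulated data and compares the result with lm(), which uses a QR decomposition internally, so the two agree up to rounding.

```r
set.seed(7)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)   # simulated response

X <- cbind(1, x1, x2)                     # design matrix with intercept column
# Solve (X'X) beta = X'y instead of forming the inverse explicitly
beta_hat <- solve(crossprod(X), crossprod(X, y))
rss <- sum((y - X %*% beta_hat)^2)        # RSS at the least squares solution

fit <- lm(y ~ x1 + x2)
print(cbind(normal_equations = drop(beta_hat), lm = coef(fit)))
print(rss)
```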
  • 28. Multiple regression model: the advertising data, adding elements. [Figure: Sales plotted against the TV and Radio advertising budgets.]
  • 31. Go to R Lab
  • 32. Objectives of the lab: import external data; compute a correlation matrix; estimate the regression coefficients; estimate the error term (residuals); print the regression model summary.
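The five lab objectives can be sketched end to end. The lab's own data set is not included here, so this sketch simulates a data frame with hypothetical TV, Radio and Sales columns and round-trips it through a temporary CSV file; replace that part with read.csv() on the real file.

```r
set.seed(123)
n   <- 60
ads <- data.frame(TV = runif(n, 0, 300), Radio = runif(n, 0, 50))
ads$Sales <- 4 + 0.04 * ads$TV + 0.2 * ads$Radio + rnorm(n, sd = 1.5)  # simulated

# 1. Import external data (here: a temporary CSV written a moment ago)
path <- tempfile(fileext = ".csv")
write.csv(ads, path, row.names = FALSE)
ads <- read.csv(path)

# 2. Correlation matrix
print(round(cor(ads), 2))

# 3. Estimate the regression coefficients
reg <- lm(Sales ~ TV + Radio, data = ads)
print(coef(reg))

# 4. Residuals, the estimates of the error term
res <- residuals(reg)
print(head(res))

# 5. Print the model summary
print(summary(reg))
```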
  • 33. Interpretation of R regression output
  • 34. Accuracy of the estimated coefficients: confidence intervals. The standard error of an estimator reflects how it varies under repeated sampling, and these standard errors can be used to compute confidence intervals. A 95% confidence interval is defined as a range of values such that, with 95% probability, the range will contain the true unknown value of the parameter. For $\beta_1$ it has the approximate form $\hat\beta_1 \pm 2 \cdot \mathrm{SE}(\hat\beta_1)$. That is, there is approximately a 95% chance that the interval $[\hat\beta_1 - 2 \cdot \mathrm{SE}(\hat\beta_1),\ \hat\beta_1 + 2 \cdot \mathrm{SE}(\hat\beta_1)]$ will contain the true value of $\beta_1$. For the advertising data, the 95% confidence interval for $\beta_1$ is $[0.042, 0.053]$.
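The factor 2 in $\hat\beta_1 \pm 2 \cdot \mathrm{SE}(\hat\beta_1)$ approximates the 97.5% quantile of a t distribution with $n - 2$ degrees of freedom; R's confint() uses that exact quantile. A sketch on simulated data comparing the two intervals (the numbers will not match the advertising data):

```r
set.seed(2)
n <- 100
x <- runif(n, 0, 300)
y <- 7 + 0.05 * x + rnorm(n, sd = 3)   # simulated data
fit <- lm(y ~ x)

ctab <- coef(summary(fit))              # Estimate, Std. Error, t value, Pr(>|t|)
est  <- ctab["x", "Estimate"]
se   <- ctab["x", "Std. Error"]

rough <- c(est - 2 * se, est + 2 * se)        # the slide's rule of thumb
exact <- confint(fit, "x", level = 0.95)      # uses the t quantile
tq    <- qt(0.975, df = df.residual(fit))     # slightly below 2 for 98 df

print(rough)
print(exact)
print(tq)
```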
  • 35. Hypothesis testing: Standard errors can also be used to perform hypothesis tests on the coefficients. The most common hypothesis test involves testing the null hypothesis $H_0$: there is no relationship between $X$ and $Y$, versus the alternative hypothesis $H_A$: there is some relationship between $X$ and $Y$.
  • 36. Hypothesis testing, mathematically: We test $H_0: \beta_1 = 0$ versus $H_A: \beta_1 \neq 0$. If $\beta_1 = 0$, the model reduces to $Y = \beta_0 + \varepsilon$, so $X$ is not associated with $Y$. To test the null hypothesis, we compute a t-statistic, given by $t = \frac{\hat\beta_1 - 0}{\mathrm{SE}(\hat\beta_1)}$. Using R, it is easy to compute the probability of observing a value equal to $|t|$ or larger in absolute value, assuming $\beta_1 = 0$. We call this probability the p-value. If we see a small p-value, we can infer that there is an association between the predictor and the response: we reject the null hypothesis, that is, we declare a relationship to exist between $X$ and $Y$.
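The t-statistic and p-value that summary() reports for a slope can be reproduced by hand as a check on the formula above. The p-value is two-sided: twice the upper-tail probability beyond $|t|$ of a t distribution with $n - 2$ degrees of freedom. Simulated data, for illustration:

```r
set.seed(3)
n <- 100
x <- rnorm(n)
y <- 1 + 0.3 * x + rnorm(n)    # simulated data
fit <- lm(y ~ x)

ctab  <- coef(summary(fit))
t_val <- (ctab["x", "Estimate"] - 0) / ctab["x", "Std. Error"]
# Probability of a value at least as extreme as |t| under H0: beta1 = 0
p_val <- 2 * pt(abs(t_val), df = df.residual(fit), lower.tail = FALSE)

print(c(t = t_val, p = p_val))
print(ctab["x", c("t value", "Pr(>|t|)")])
```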
  • 37. Assessing the overall accuracy of the model: $R^2$. The $R^2$ statistic, or fraction of variance explained, is $R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$, where $\mathrm{TSS} = \sum_i (y_i - \bar y)^2$ is the total sum of squares, also called total variation, and RSS is the residual sum of squares, also called unexplained variation. Explained variation = total variation − unexplained variation. $R^2$ measures the proportion of variability in the dependent variable that can be explained using the independent variable(s). An $R^2$ statistic close to 1 indicates that a large proportion of the variability in the response has been explained by the regression; a number near 0 indicates that the regression did not explain much of the variability in the response.
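A sketch checking the formula against R: compute TSS and RSS from a fitted model and compare $1 - \mathrm{RSS}/\mathrm{TSS}$ with the r.squared component of summary(). Simulated data:

```r
set.seed(4)
x <- rnorm(80)
y <- 2 + 1.5 * x + rnorm(80)   # simulated data
fit <- lm(y ~ x)

tss <- sum((y - mean(y))^2)      # total variation
rss <- sum(residuals(fit)^2)     # unexplained variation
r2  <- 1 - rss / tss             # same as (tss - rss) / tss

print(c(manual = r2, summary = summary(fit)$r.squared))
```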
  • 38. Assessing the overall accuracy of the model: $R^2$ (continued). For a least squares fit with an intercept, $R^2$ always lies between 0 and 1. However, it can still be challenging to determine what a good $R^2$ value is; it depends on the application. Physics: close to 1; a smaller value might indicate a serious problem. Biology, psychology: well below 0.1 might be more realistic. Economics and finance: well above 0.6 might be more acceptable. What is the value of $R^2$ in our data?
  • 39. Assessing the overall accuracy of the model: F test. $F = \frac{(\mathrm{TSS} - \mathrm{RSS})/p}{\mathrm{RSS}/(n - p - 1)}$, where $n$ = number of observations and $p$ = number of independent variables. Intuitively, if the model is a good fit, the explained variation (TSS − RSS) will be high relative to the RSS. When there is no relationship between the response and the predictors, $F$ is expected to be close to 1, so an F value well above 1 is desired; just how large it needs to be depends on the sample size $n$ and the number of independent variables $p$.
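A sketch computing the F statistic from TSS and RSS for a model with $p = 2$ simulated predictors, comparing it with the fstatistic component of summary(), and turning it into a p-value with pf():

```r
set.seed(5)
n  <- 120
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.8 * x1 + 0.4 * x2 + rnorm(n)   # simulated data
fit <- lm(y ~ x1 + x2)

p   <- 2                                 # number of predictors
tss <- sum((y - mean(y))^2)
rss <- deviance(fit)                     # RSS of an lm fit
f_manual <- ((tss - rss) / p) / (rss / (n - p - 1))

f_lm <- summary(fit)$fstatistic          # named: value, numdf, dendf
p_value <- pf(f_lm[["value"]], f_lm[["numdf"]], f_lm[["dendf"]],
              lower.tail = FALSE)
print(c(manual = f_manual, lm = f_lm[["value"]], p = p_value))
```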
  • 40. Answers to “Questions we might ask” for the advertising data: Is there a relationship between the dependent and independent variables? → Is there a relationship between advertising budget and sales? How strong is the relationship between the dependent and independent variables? → How strong is the relationship? Which independent variables contribute to the dependent variable? → Which media contribute to sales, and how large is the effect of each medium on sales? Is the relationship linear? → Is the relationship linear? How accurately can we forecast the value of the dependent variable?
  • 41. Exporting the regression output: a <- capture.output(summary(reg)) stores the printed summary as a character vector, one element per line; cat(a, file = "trial.txt", sep = "\n", append = TRUE) then appends those lines to trial.txt.
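A self-contained version of the export step: it fits a small model on simulated data, writes the captured summary to a temporary file rather than trial.txt, and reads it back with readLines(). Because cat() places sep between elements, the file may not end with a newline, so warn = FALSE silences readLines()'s incomplete-final-line warning.

```r
set.seed(6)
x <- rnorm(30)
y <- 1 + x + rnorm(30)        # simulated data
reg <- lm(y ~ x)

out_file <- tempfile(fileext = ".txt")   # use e.g. "trial.txt" in practice
a <- capture.output(summary(reg))        # one string per printed line
cat(a, file = out_file, sep = "\n", append = TRUE)

# Read the file back to check what was written
written <- readLines(out_file, warn = FALSE)
print(head(written))
```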
  • 42. Online free resources: R Cookbook (http://www.cookbook-r.com/), Try R (http://tryr.codeschool.com/), video tutorials (http://www.twotorials.com/). I shall be glad to help you. Follow me on Google Plus and on my blog. Email: pinaki.economics@gmail.com. Mobile: +91 9818383989