Unit 1: Introduction to R
Why R?
• A key element of the data analysis chain (acquire / analyze / explain) is the
choice of data analysis software. With so many software packages available, why R?
• The first reason is that R is a free, open-source language, available for most
popular operating systems.
• The second reason is the enormous range of analysis methods supported by R's
growing universe of add-on packages.
• The third reason to adopt R is its growing popularity, undoubtedly fueled by the
reasons just described, which is in turn likely to promote the continued
growth of new capabilities.
Computers, software, and R
Using R, or any other data analysis environment, involves three basic tasks:
1. Make the data you want to analyze available to the analysis software;
2. Perform the analysis;
3. Make the results of the analysis available to those who need them.
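The three tasks can be sketched in a few lines of R. This is only a minimal illustration: it assumes the built-in mtcars data set stands in for your own data, and a temporary CSV file (results_file, a hypothetical name) stands in for wherever the results need to go.

```r
# 1. Make the data available (here: a data set that ships with R)
cars <- mtcars

# 2. Perform the analysis (mean miles per gallon by number of cylinders)
mpg_by_cyl <- aggregate(mpg ~ cyl, data = cars, FUN = mean)

# 3. Make the results available (written to a temporary CSV file)
results_file <- tempfile(fileext = ".csv")
write.csv(mpg_by_cyl, results_file, row.names = FALSE)
print(mpg_by_cyl)
```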
Data
Loosely speaking, the term data refers to a collection of details, recorded to
characterize a source like one of the following:
• an entity, e.g.: family history from a patient in a medical study;
manufacturing lot information for a material sample in a physical testing
application; or competing company characteristics in a marketing analysis;
• an event, e.g.: demographic characteristics of those who voted for different
political candidates in a particular election;
• a process, e.g.: operating data from an industrial manufacturing process.
Data Frame
• In many statistics problems, the term data refers to a rectangular array of observed values,
where each row refers to a different observation of entity, event, or process
characteristics (e.g., distinct patients in a medical study), and each column
represents a different characteristic (e.g., diastolic blood pressure) recorded for
each row.
• In R's terminology, this description defines a data frame.
• The mtcars data frame is one of many built-in data examples in R. This data frame
has 32 rows, each one corresponding to a different car. Each of these cars is
characterized by 11 variables, which constitute the columns of the data frame.
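The description of mtcars can be checked directly at the console. The following sketch uses only base R functions to inspect the shape and contents of the data frame.

```r
dim(mtcars)      # number of rows and number of columns
names(mtcars)    # the variable (column) names
head(mtcars, 3)  # the first three observations (rows)
str(mtcars)      # compact summary of each column's type and values
```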
Metadata
• The information about the data set (how it was collected, which variables are included,
the target entities) constitutes the metadata.
• So, metadata is data about data, and it can vary widely in terms of its
completeness, consistency, and general accuracy.
• If the data set is included in R, typing help(name of data frame) shows you the
metadata (information about the data frame).
• It is extremely important to read the metadata carefully.
The structure of R
The R programming language basically consists of three components:
• a set of base R packages, a required collection of programs that support
language infrastructure and basic statistics and data analysis functions;
• a set of recommended packages, automatically included in almost all R
installations (e.g., the MASS package);
• a very large and growing set of optional add-on packages, available through
the Comprehensive R Archive Network (CRAN).
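One way to see these three groups on your own machine is to look at the Priority field reported by installed.packages(). This is a sketch; the variable names (pkgs, base_pkgs, and so on) are chosen here for illustration, and the lists you get depend on your installation.

```r
pkgs <- installed.packages()       # one row per installed package
priority <- pkgs[, "Priority"]     # "base", "recommended", or NA

base_pkgs <- rownames(pkgs)[which(priority == "base")]
recommended_pkgs <- rownames(pkgs)[which(priority == "recommended")]
print(base_pkgs)
print(recommended_pkgs)

# Packages with no priority were installed as optional add-ons, e.g. from CRAN
addon_count <- sum(is.na(priority))
print(addon_count)
```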
Basics of R
We need some time to learn R.
It is hard in the beginning, but it
gets easier as you go along.
How to download?
R is free to download.
• Search for R or CRAN (Comprehensive R Archive Network) in your
browser, or visit http://www.r-project.org
Nice R Tutorials
The following tutorials are in PDF format.
• P. Kuhnert & B. Venables, An Introduction to R: Software for Statistical
Modeling & Computing
• J.H. Maindonald, Using R for Data Analysis and Graphics
• B. Muenchen, R for SAS and SPSS Users
• W.J. Owen, The R Guide
• D. Rossiter, Introduction to the R Project for Statistical Computing for Use
at the ITC
• W.N. Venables & D. M. Smith, An Introduction to R
Web links
• Paul Geissler's excellent R tutorial
• Excellent tutorial on nearly every aspect of R
• Introduction to R by Vincent Zoonekynd
• R Cookbook
• Data Manipulation Reference
• R Concepts and Data Types
• An Introduction to R
• Import / Export Manual
• R Reference Cards
• Hints on plotting data in R
R History
R is a comprehensive statistical and graphical programming language.
R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics
of the University of Auckland, New Zealand, during the 1990s.
Since 1997: an international “R-core” team of 15 people with access to a
common CVS archive.
R: running
You can enter commands one at a time at the command prompt (>) or
run a set of commands from a source file.
There is a wide variety of data types, including vectors (numerical,
character, logical), matrices, data frames, and lists.
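As a rough illustration of these data types, the sketch below builds one small object of each kind. All object names are made up for the example.

```r
num_vec <- c(1.5, 2, 3.25)                    # numeric vector
chr_vec <- c("a", "b", "c")                   # character vector
lgl_vec <- c(TRUE, FALSE, TRUE)               # logical vector
mat <- matrix(1:6, nrow = 2)                  # matrix with 2 rows and 3 columns
df <- data.frame(id = 1:3, label = chr_vec)   # data frame: columns may differ in type
lst <- list(numbers = num_vec, table = df)    # list: elements can be any objects

class(num_vec)
class(mat)
class(df)
class(lst)
```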
To quit R, use
> q()
R: available functions
Most functionality is provided through built-in and user-created
functions and all data objects are kept in memory during an
interactive session.
Basic functions are available by default. Other functions are contained
in packages that can be attached to a current session as needed
using the library command.
R Overview
A key skill in using R effectively is learning how to use the built-in
help system. Other sections describe the working environment,
inputting programs and outputting results, and installing new
functionality through packages.
A fundamental design feature of R is that the output from most
functions can be used as input to other functions. This is described
in Reusing Results.
R Interface
When you start the R system, the main window (RGui) with a sub window (R Console)
will appear.
In the Console window the cursor is waiting for you to type in some R
commands.
R Language Introduction
• Results of calculations can be stored in objects using the assignment
operators:
• An arrow (<-) formed by a less-than character and a hyphen,
with no space between them.
• The equal character (=).
• Example
> a = x + y
or
> a <- x + y
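The example above assumes x and y already exist. A self-contained version, with values picked arbitrarily for illustration, could look like this:

```r
x <- 2
y <- 3
a <- x + y        # assignment with the arrow
b = x + y         # assignment with the equal character
a                 # typing the name prints the stored value
identical(a, b)   # both operators stored the same result
```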
R Language Introduction
• These objects can then be used in other calculations. To print the object
just enter the name of the object. There are some restrictions when giving
an object a name:
• Object names cannot contain `strange' symbols like !, +, -, #.
• A dot (.) and an underscore (_) are allowed; a name may also start with a
dot.
• Object names can contain a number but cannot start with a number.
• R is case sensitive: X and x are two different objects, as are temp
and temP.
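The naming rules can be tried out directly. In this small sketch every object name is invented for the example, and the one invalid name is left commented out so the code still runs.

```r
temp <- 1          # a plain name
temP <- 2          # a different object, because R is case sensitive
my.value_2 <- 3    # dots, underscores and non-leading digits are allowed
.hidden <- 4       # a name may start with a dot
# 2value <- 5      # not allowed: a name cannot start with a number

temp == temP       # compares two distinct objects
```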
R Language Introduction
• To see the list of objects in your workspace use
> ls()
• To delete use
> rm(x)
• To plot
R Workspace
Objects that you create during an R session are held in
memory; the collection of objects that you currently
have is called the workspace. This workspace is not
saved on disk unless you tell R to do so. This means
that your objects are lost if you close R without
saving them, or worse, when R or your system
crashes on you during a session.
R Workspace
When you close the RGui or the R console window, the
system will ask if you want to save the workspace image.
If you choose to save the workspace image, all the
objects in your current R session are saved in a file named
.RData. This is a binary file located in the working
directory of R; the default working directory depends on how
R was started.
Or you can save it yourself by:
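A minimal sketch of saving and restoring objects by hand, assuming the base functions save() and load() and a temporary file (ws_file, a hypothetical name) in place of .RData. The related function save.image() writes every object in the workspace at once.

```r
ws_file <- file.path(tempdir(), "demo.RData")   # hypothetical file name

x <- 1:5
y <- "hello"
save(x, y, file = ws_file)   # write the named objects to a binary file
# save.image(ws_file) would instead save the whole workspace

rm(x, y)                     # the objects are now gone from memory
load(ws_file)                # ... and restored from the file
x
y
```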
R Working Directory
The working directory is the place where R will save to and read from by
default. To specify the working directory:
> setwd("d:/temp")
To see what your working directory is, type
> getwd()
To load your saved workspace use:
> load(".RData")
To see your previous commands (history), type
> history()
or use the up and down arrows.
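The effect of the working directory on relative file names can be seen in a short sketch. It assumes R's per-session temporary directory as a safe place to experiment, and note.txt is a made-up file name.

```r
old_dir <- getwd()                 # remember the current working directory
setwd(tempdir())                   # switch to the session's temporary directory

writeLines("hello", "note.txt")    # a relative path now resolves inside it
file.exists("note.txt")

setwd(old_dir)                     # restore the original working directory
```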
R History
# save your command history
savehistory(file="myfile") # default is ".Rhistory"
# recall your command history
loadhistory(file="myfile") # default is ".Rhistory"
R Help
Once R is installed, there is a comprehensive built-in help system. At
the program's command prompt you can use any of the following:
help.start() # general help
help(foo) # help about function foo
?foo # same thing
apropos("foo") # list all functions containing string foo
example(foo) # show an example of function foo
R Help
# search for foo in help manuals and archived mailing lists
RSiteSearch("foo")
# get vignettes on using installed packages
vignette() # show available vignettes
vignette("foo") # show specific vignette
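Replacing the placeholder foo with a real function name makes the help commands concrete. The sketch below uses mean as an arbitrary example; the calls that open help pages are wrapped in interactive() so the code also runs in a script.

```r
hits <- apropos("mean")    # names of visible objects containing "mean"
head(hits)

if (interactive()) {
  help(mean)               # open the help page for mean()
  example(mean)            # run the examples from that help page
}
```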
R Datasets
R comes with a number of sample datasets that you can
experiment with. Type
> data( )
to see the available datasets. The results will depend on which
packages you have loaded. Type
help(datasetname)
for details on a sample dataset.
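The dataset catalogue can also be captured as an object rather than only displayed. This sketch assumes the standard datasets package and looks up mtcars in the result.

```r
ds <- data(package = "datasets")          # catalogue of data sets in one package
head(ds$results[, c("Item", "Title")])    # names and short descriptions

"mtcars" %in% ds$results[, "Item"]        # is mtcars among them?
```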
R Packages
• One of the strengths of R is that the system can easily be extended. The
system allows you to write new functions and package those functions
in a so-called `R package' (or `R library'). The R package may also contain
other R objects, for example data sets or documentation. There is a
lively R user community and many R packages have been written and
made available on CRAN for other users. To give just a few examples, there are
packages for portfolio optimization, drawing maps, exporting objects to
html, time series analysis, and spatial statistics, and the list goes on and on.
R Packages
• When you download R, a number of packages (around 30) are
downloaded as well. To use a function in an R package, that package has to be
attached to the system. When you start R, not all of the downloaded packages
are attached; only a few are attached by default (a default session's search path
has nine entries, including the global environment). You can
use the function search to see a list of packages that are currently attached to
the system; this list is also called the search path.
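A short sketch of inspecting the search path. The exact entries depend on how R was started and on which packages have been attached since, so no particular output is assumed here.

```r
sp <- search()     # the current search path
print(sp)

# Keep only the entries that are packages and strip the "package:" prefix
attached_pkgs <- sub("^package:", "", grep("^package:", sp, value = TRUE))
print(attached_pkgs)
```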
R Packages
• To attach another package to the system
you can use the menu or the library
function:
> library(thename)
• To install a new package, use the install
package option in the menu.
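Attaching a package from the console can be sketched as follows. The tools package is used only because it ships with R but is normally not attached at startup; any installed package name would work in its place.

```r
pkg <- "tools"                         # ships with R, usually not attached by default
before <- search()
library(pkg, character.only = TRUE)    # attach the package named in pkg
after <- search()
setdiff(after, before)                 # entries added to the search path

# install.packages("thename")          # would download and install a package from CRAN (not run)
```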
R Packages
• The function library can also be used to list all the available libraries on
your system with a short description.
Source Codes
You can have input come from a script file (a text file containing R
commands) and direct output to a variety of destinations.
Input
The source( ) function runs a script in the current session. If the
filename does not include a path, the file is taken from the current
working directory.
# input a script
source("myfile")
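To try source() without preparing a file by hand, a script can be written to a temporary file first. This is a sketch; the script contents and the names z and z_total are invented for the example.

```r
script <- tempfile(fileext = ".R")                        # temporary script file
writeLines(c("z <- 1:10", "z_total <- sum(z)"), script)   # two R commands

# Run the script; local = TRUE evaluates it in the calling environment
source(script, local = TRUE)
z_total
```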
Output
The sink( ) function defines the direction of the output.
# direct output to a file
sink("myfile", append=FALSE, split=FALSE)
# return output to the terminal
sink()
Output
The append option controls whether output overwrites or adds to a
file.
The split option determines whether output is also sent to the screen as well
as the output file.
Here is an example of the sink() function.
# output directed to myfile.txt in the working directory.
# output is appended to the existing file and also sent to the terminal.
sink("myfile.txt", append=TRUE, split=TRUE)
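The difference between overwriting and appending can be checked with a temporary file. In this sketch log_file is a hypothetical name, and the output is read back afterwards to show what was captured.

```r
log_file <- tempfile(fileext = ".txt")

sink(log_file)                   # from here on, printed output goes to the file
cat("first line\n")
print(summary(mtcars$mpg))
sink()                           # return output to the terminal

sink(log_file, append = TRUE)    # reopen the same file without overwriting it
cat("appended line\n")
sink()

readLines(log_file)              # everything that was captured
```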
Graphs
To redirect graphic output use one of the following functions.
Use dev.off( ) to return output to the terminal.
Function                        Output to
pdf("mygraph.pdf")              pdf file
win.metafile("mygraph.wmf")     windows metafile
png("mygraph.png")              png file
jpeg("mygraph.jpg")             jpeg file
bmp("mygraph.bmp")              bmp file
postscript("mygraph.ps")        postscript file
Redirecting Graphs
# example - output graph to jpeg file
jpeg("d:/temp/myplot.jpg")
plot(x, y) # x and y: numeric vectors defined beforehand
dev.off()
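A self-contained variant of the same idea, assuming the mtcars data for the plot and a temporary PDF file (plot_file, a hypothetical name) instead of a fixed path:

```r
plot_file <- tempfile(fileext = ".pdf")

pdf(plot_file)                              # open a PDF graphics device
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
dev.off()                                   # close the device so the file is written

file.exists(plot_file)
```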
Reusing Results
One of the most useful design features of R is that the output of
analyses can easily be saved and used as input to additional
analyses.
# Example 1
lm(mpg~wt, data=mtcars)
This will run a simple linear regression of miles per gallon on car
weight using the data frame mtcars. Results are sent to the
screen. Nothing is saved.
Reusing Results
# Example 2
fit <- lm(mpg~wt, data=mtcars)
This time, the same regression is performed but the results are saved
under the name fit. No output is sent to the screen. However, you
now can manipulate the results.
str(fit) # view the contents/structure of "fit"
The assignment has actually created a list called "fit" that contains a
wide range of information (including the predicted values,
residuals, coefficients, and more).
Reusing Results
# plot residuals by fitted values (fitted values on the x axis)
plot(fit$fitted.values, fit$residuals)
To see what a function returns, look at the Value section of the online
help for that function. Here we would look at help(lm).
The results can also be used by a wide range of other functions.
# produce diagnostic plots
plot(fit)
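A few more ways the saved fit can feed other functions are sketched below. The extractor functions are part of base R; the new weights in new_cars are made-up values for illustration.

```r
fit <- lm(mpg ~ wt, data = mtcars)

coefs <- coef(fit)                   # intercept and slope
res <- residuals(fit)                # one residual per car
r2 <- summary(fit)$r.squared         # summary() also returns a reusable object

new_cars <- data.frame(wt = c(2.5, 3.5))   # hypothetical car weights
predict(fit, newdata = new_cars)           # the fitted model as input to another function
```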

  • 2. Why R? • A key element of the data analysis chain (acquire / analyze / explain) is the choice of data analysis software. Since there are many software, why R? • First reason is that R is a free, open-source language, available for most popular operating systems. • Second reason: the enormous range of analysis methods supported by R's growing universe of addon packages. • Third reason to adopt R is its growing popularity, undoubtedly fueled by the reasons just described, but which is also likely to promote the continued growth of new capabilities.
  • 3. Computers, software, and R To use R or any other data analysis environment, involves three basic tasks: 1.Make the data you want to analyze available to the analysis software; 2.Perform the analysis; 3.Make the results of the analysis available to those who need them.
  • 4. Data Loosely speaking, the term data refers to a collection of details, recorded to characterize a source like one of the following: • an entity, e.g.: family history from a patient in a medical study; manufacturing lot information for a material sample in a physical testing application; or competing company characteristics in a marketing analysis; • an event, e.g.: demographic characteristics of those who voted for different political candidates in a particular election; • a process, e.g.: operating data from an industrial manufacturing process.
  • 5. Data Frame • In many statistics problem data to refer to a rectangular array of observed values, where each row refers to a different observation of entity, event, or process characteristics (e.g., distinct patients in a medical study), and each column represents a different characteristic (e.g., diastolic blood pressure) recorded for each row. • In R's terminology, this description defines a data frame. • The mtcars data frame is one of many built-in data examples in R. This data frame has 32 rows, each one corresponding to a different car. Each of these cars is characterized by 11 variables, which constitute the columns of the data frame.
  • 6. Metadata • The information about the data set, (how it is collected, variables included, the target entities) constitutes the metadata • So, metadata is data about data, and it can vary widely in terms of its completeness, consistency, and general accuracy. • If the data is included in R typing help(name of data frame) shows you the metadata (information about the data frame). • It is extremely important to read the metadata carefully.
  • 7. The structure of R The R programming language basically consists of three components: • a set of base R packages, a required collection of programs that support language infrastructure and basic statistics and data analysis functions; • a set of recommended packages, automatically included in almost all R installations (e.g., the MASS package); • a very large and growing set of optional add-on packages, available through the Comprehensive R Archive Network (CRAN).
  • 9. 9 We need some time to learn R. It is hard in the beginning, but it gets easier as you go along.
  • 10. 10 How to download? R is free and can be downloaded from: • Just search for R or CRAN (Comprehensive R Archive Network) in your browser or just visit http://www.r-project.org
  • 11. 11 Nice RTutorials The following tutorials are in PDF format. • P. Kuhnert & B.Venables, An Introduction to R: Software for Statistical Modeling & Computing • J.H. Maindonald, Using R for Data Analysis and Graphics • B. Muenchen, R for SAS and SPSS Users • W.J. Owen,The R Guide • D. Rossiter, Introduction to the R Project for Statistical Computing for Use at the ITC • W.N.Venebles & D. M. Smith, An Introduction to R
  • 12. 12 Web links • Paul Geissler's excellent R tutorial • Excellent tutorial an nearly every aspect of R • Introduction to R byVincent Zoonekynd • R Cookbook • Data Manipulation Reference • R Concepts and DataTypes • An Introduction to R • Import / Export Manual • R Reference Cards • Hints on plotting data in R
  • 13. 13 R History R is a comprehensive statistical and graphical programming language. R: initially written by Ross Ihaka and Robert Gentleman at Dep. of Statistics of U of Auckland, New Zealand during 1990s. Since 1997: international “R-core” team of 15 people with access to commonCVS archive.
  • 14. 14 R: running You can enter commands one at a time at the command prompt (>) or run a set of commands from a source file. There is a wide variety of data types, including vectors (numerical, character, logical), matrices, data frames, and lists. To quit R, use >q()
  • 15. 15 R: available functions Most functionality is provided through built-in and user-created functions and all data objects are kept in memory during an interactive session. Basic functions are available by default. Other functions are contained in packages that can be attached to a current session as needed using the library command.
  • 16. 16 R Overview A key skill to using R effectively is learning how to use the built-in help system. Other sections describe the working environment, inputting programs and outputting results, installing new functionality through packages and etc. A fundamental design feature of R is that the output from most functions can be used as input to other functions.This is described in reusing results.
  • 17. 17 R Interface Start the R system, the main window (RGui) with a sub window (R Console) will appear In the `Console' window the cursor is waiting for you to type in some R commands.
  • 19. 19 R Language Introduction • Results of calculations can be stored in objects using the assignment operators: • An arrow (<-) formed by a smaller than character and a hyphen without a space! • The equal character (=). • Example > a=x+y or > a<- x+y
  • 20. 20 R language Introduction • These objects can then be used in other calculations.To print the object just enter the name of the object.There are some restrictions when giving an object a name: • Object names cannot contain `strange' symbols like !, +, -, #. • A dot (.) and an underscore ( ) are allowed, also a name starting with a dot. • Object names can contain a number but cannot start with a number. • R is case sensitive, X and x are two different objects, as well as temp and temP.
  • 22. 22 R Language Introduction • To see the list of object in your working space use > ls() • To delete use > rm(x) • To plot
  • 23. R Workspace Objects that you create during an R session are held in memory; the collection of objects that you currently have is called the workspace. The workspace is not saved to disk unless you tell R to do so. This means that your objects are lost if you close R without saving them or, worse, if R or your system crashes during a session.
  • 24. R Workspace When you close the RGui or the R console window, the system asks whether you want to save the workspace image. If you choose to save it, all the objects in your current R session are saved in a file named .RData. This is a binary file located in R's working directory, which is by default the installation directory of R. Or you can save it yourself by:
  • 25. R Working Directory The working directory is the place where R reads and saves files by default. To set the working directory: > setwd("d:/temp") To see your current working directory: > getwd() To load your saved workspace: > load(".RData") To see your previous commands (history), type > history() or use the up and down arrow keys.
  • 26. R History
      # save your command history
      savehistory(file="myfile")   # default is ".Rhistory"
      # recall your command history
      loadhistory(file="myfile")   # default is ".Rhistory"
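The workspace and working-directory steps can be combined into one round trip. This is a sketch only: it assumes the code is run at the top level of an R session, it works inside tempdir() so that no real folder is touched, and the object and file names (demo_x, demo_y, demo_workspace.RData) are invented for the example.

```r
# Switch to a temporary directory; setwd() returns the previous directory.
old_dir <- setwd(tempdir())
getwd()

# Create two objects and list the workspace.
demo_x <- 1:5
demo_y <- demo_x^2
ls()

# save.image() writes every object in the workspace to a file.
# With no arguments it writes ".RData"; a custom name is used here.
save.image(file = "demo_workspace.RData")

# Remove the objects, then restore them from the saved file.
rm(demo_x, demo_y)
exists("demo_x")
load("demo_workspace.RData")
demo_y

# Return to the original working directory.
setwd(old_dir)
```

The history functions are left out of the sketch because savehistory() and loadhistory() depend on an interactive console.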
  • 27. R Help Once R is installed, a comprehensive built-in help system is available. At the command prompt you can use any of the following:
      help.start()     # general help
      help(foo)        # help about function foo
      ?foo             # same thing
      apropos("foo")   # list all functions whose names contain the string foo
      example(foo)     # show an example of function foo
  • 28. R Help
      # search for foo in help manuals and archived mailing lists
      RSiteSearch("foo")
      # get vignettes on using installed packages
      vignette()       # show available vignettes
      vignette("foo")  # show specific vignette
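As a concrete instance of the help commands, the sketch below substitutes the real function median for the placeholder foo. The calls that open a help viewer are left as comments so the block can also run in a non-interactive session.

```r
# apropos() returns a character vector of object names matching a pattern.
matches <- apropos("median")
print(matches)

# help() and ? open the documentation page for a function.
# help(median)
# ?median

# example() runs the Examples section of a help page and prints the results.
example(median)
```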
  • 29. R Datasets R comes with a number of sample datasets that you can experiment with. Type > data() to see the available datasets. The results depend on which packages you have loaded. Type help(datasetname) for details on a sample dataset.
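A short sketch of exploring the sample datasets without the interactive listing window: data() returns an object whose results component is a character matrix, and the built-in mtcars data frame can be inspected directly. The variable name available is an arbitrary choice.

```r
# List the data sets shipped in the datasets package as a character matrix.
available <- data(package = "datasets")$results
head(available[, c("Item", "Title")])

# mtcars is one of these data sets; look at its first rows and its size.
head(mtcars)
dim(mtcars)   # 32 rows (cars) and 11 columns (characteristics)

# help(mtcars) opens the description of the data set.
```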
  • 30. R Packages • One of the strengths of R is that the system can easily be extended. You can write new functions and bundle them in a so-called R package (or R library). An R package may also contain other R objects, such as data sets or documentation. There is a lively R user community, and many R packages have been written and made available on CRAN for other users. To name just a few examples, there are packages for portfolio optimization, drawing maps, exporting objects to HTML, time series analysis, and spatial statistics, and the list goes on.
  • 31. R Packages • When you download R, a number of packages (around 30) are downloaded as well. To use a function in an R package, that package has to be attached to the system. When you start R, not all of the downloaded packages are attached; only seven (base, stats, graphics, grDevices, utils, datasets, and methods) are attached by default. You can use the function search to see what is currently attached; this list is also called the search path. In a fresh session it has nine entries, because it also includes .GlobalEnv and Autoloads.
  • 32. R Packages • To attach another package to the system, you can use the menu or the library function: > library(thename) • To install a new package, use the install package menu.
  • 33. R Packages • The function library, called with no arguments, can also be used to list all the packages available on your system with a short description of each.
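The attach and search-path behaviour described above can be checked with a package that ships with R but is not attached at startup. The sketch uses tools for that purpose; the file name "report.pdf" and the package name in the commented install.packages() call are hypothetical.

```r
# Show the current search path.
search()

# Attach the tools package and confirm that it now appears on the search path.
library(tools)
"package:tools" %in% search()

# A function can also be called without attaching its package, using ::
tools::file_ext("report.pdf")

# Names of the packages installed on this system.
head(rownames(installed.packages()))

# Installing from CRAN needs an internet connection, so it is not run here.
# install.packages("somePackage")   # hypothetical package name

# Detach the package again to restore the original search path.
detach("package:tools")
```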
  • 34. Source Code You can have input come from a script file (a text file containing R commands) and direct output to a variety of destinations. Input: the source() function runs a script in the current session. If the filename does not include a path, the file is taken from the current working directory.
      # input a script
      source("myfile")
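To see source() in action without an existing script, the sketch below first writes a two-line script to a temporary file and then runs it. The file name demo_script.R and the variables radius and area are invented for the example, and the code is assumed to run at the top level of a session.

```r
# Write a small script to a temporary file.
script_path <- file.path(tempdir(), "demo_script.R")
writeLines(c("radius <- 2",
             "area <- pi * radius^2"),
           script_path)

# Run the script in the current session; its objects land in the workspace.
source(script_path)
area

# echo = TRUE also prints each command as it is evaluated.
source(script_path, echo = TRUE)
```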
  • 35. Output The sink() function defines the direction of the output.
      # direct output to a file
      sink("myfile", append=FALSE, split=FALSE)
      # return output to the terminal
      sink()
  • 36. Output The append option controls whether output overwrites or adds to a file. The split option determines whether output is also sent to the screen as well as to the output file. Here is an example of the sink() function.
      # output directed to myfile.txt in the working directory.
      # output is appended to the existing file and also sent to the terminal.
      sink("myfile.txt", append=TRUE, split=TRUE)
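A complete sink() round trip looks roughly like the following sketch, which redirects output to a temporary file, restores the terminal, and reads the file back. The file name demo_output.txt is an arbitrary choice.

```r
# Redirect printed output to a file in the temporary directory.
out_file <- file.path(tempdir(), "demo_output.txt")
sink(out_file, append = FALSE, split = FALSE)

# Everything printed from here on goes to the file, not the screen.
print(summary(mtcars$mpg))
cat("Mean mpg:", mean(mtcars$mpg), "\n")

# Return output to the terminal.
sink()

# Read the captured lines back to confirm what was written.
captured <- readLines(out_file)
print(captured)
```

Forgetting the closing sink() is a common mistake: until it is called, the console appears to print nothing.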
  • 37. Graphs To redirect graphic output, use one of the following functions. Use dev.off() to return output to the terminal.
      Function                       Output to
      pdf("mygraph.pdf")             PDF file
      win.metafile("mygraph.wmf")    Windows metafile (Windows only)
      png("mygraph.png")             PNG file
      jpeg("mygraph.jpg")            JPEG file
      bmp("mygraph.bmp")             BMP file
      postscript("mygraph.ps")       PostScript file
  • 38. Redirecting Graphs
      # example - output graph to jpeg file
      jpeg("d:/temp/myplot.jpg")
      plot(x,y)
      dev.off()
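The same open, plot, close pattern works for any of the device functions. The sketch below uses pdf() with the built-in mtcars data so that it runs without the undefined x and y of the slide; the file name demo_plot.pdf and the axis labels are choices made for this example.

```r
# Open a PDF graphics device that writes to a temporary file.
plot_file <- file.path(tempdir(), "demo_plot.pdf")
pdf(plot_file)

# Draw the plot; it goes to the file instead of a screen window.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")

# Close the device so the file is written completely.
dev.off()

# Confirm that the file was created.
file.exists(plot_file)
```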
  • 39. Reusing Results One of the most useful design features of R is that the output of analyses can easily be saved and used as input to additional analyses.
      # Example 1
      lm(mpg~wt, data=mtcars)
    This runs a simple linear regression of miles per gallon on car weight using the data frame mtcars. Results are sent to the screen. Nothing is saved.
  • 40. Reusing Results
      # Example 2
      fit <- lm(mpg~wt, data=mtcars)
    This time, the same regression is performed but the results are saved under the name fit. No output is sent to the screen. However, you can now manipulate the results.
      str(fit)   # view the contents/structure of "fit"
    The assignment has actually created a list called "fit" that contains a wide range of information (including the predicted values, residuals, coefficients, and more).
  • 41. Reusing Results
      # plot residuals by fitted values
      plot(fit$fitted.values, fit$residuals)
    To see what a function returns, look at the Value section of the help page for that function. Here we would look at help(lm). The results can also be used by a wide range of other functions.
      # produce diagnostic plots
      plot(fit)
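Beyond plotting, the saved fit can be passed to other functions that extract or build on its contents. The following is a minimal sketch; the data frame new_cars and its two weights are hypothetical values chosen for illustration.

```r
# Fit the regression once and keep the result.
fit <- lm(mpg ~ wt, data = mtcars)

# The components stored in the fitted object.
names(fit)

# Extractor functions reuse the saved result.
coef(fit)              # intercept and slope
head(residuals(fit))   # first few residuals

# Predict mpg for new cars (weights in 1000 lbs, hypothetical values).
new_cars <- data.frame(wt = c(2.5, 3.5))
predict(fit, newdata = new_cars)

# summary() returns another object whose parts can be reused in turn.
summary(fit)$r.squared
```

Extractor functions such as coef() and residuals() are generally preferable to reaching into the list with $, since they work the same way across different kinds of fitted models.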