Introduction to Data Analysis and Graphics using R
Hellen Gakuruh
2017-03-06
Introduction to R and RStudio
Outline
• Introduction to base R and RStudio
• Download and Install both R and RStudio
• Layout (Windows and Panes)
• Interactively work with R and RStudio’s console
• Global environment/workspace and history files
• Code using scripts
• Install R Packages
• Working directory and RStudio’s Projects
• Getting Help
Introduction to base R and RStudio
In this session we get to know R and RStudio: what they are, how they differ and where we can get them.
About R
• A system for statistical computation and graphics (R Documentation)
• A programming language
• A dialect of the S language; its development began in the early 1990s
• Named partly after the first initials of its founders, Ross Ihaka and Robert Gentleman, and partly as a play on the name S
• Began as a program for teaching statistics at a university
• Has since grown to serve a diverse community of users from all over the world
• A collaborative project with many contributors to the base distribution as well as to extensions (packages)
Why R?
• It is free and open source
• Has excellent graphing capabilities, arguably R’s greatest strength
• As a programming language, it is highly extensible; it allows user-defined functions and add-on packages
• A growing number of packages for producing documents (Word, PDF, HTML) and reproducible analyses (rmarkdown, bookdown, blogdown)
• Innovative packages for building interactive statistical applications (apps), such as shiny
• A growing number of users
Why R? (cont.)
• A growing number of free and commercial Integrated Development Environments (IDEs) [1].
• To distinguish R itself from its IDEs, it is usually referred to as base R.
• With a good foundation, R is easy to work with.
[1]: IDEs are programs that ease the coding process; examples include RStudio and Revolution R.
What is RStudio?
• One of R’s Integrated Development Environments (IDEs). Some of its key advantages over base R are:
– A well-thought-out, organized layout (panes), making it easy to see all the core windows at the same time.
– A syntax-highlighting editor that supports direct code execution.
– Workspace management.
What is RStudio (cont)
• Makes importing data easier
• Support for R Markdown files makes documentation and reproducible analysis easier.
• It is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux) or in a browser connected to RStudio Server or RStudio Server Pro (Debian/Ubuntu, RedHat/CentOS, and SUSE Linux).
Download and install both programs
In this sub-session we discuss and demonstrate how to download and install R and RStudio.
Downloading base R
• Base R is available from the Comprehensive R Archive Network (CRAN)
• CRAN is a network of web servers that store identical, up-to-date versions of R’s code and documentation
• There are multiple mirrors (copies) of CRAN located across the globe. It is recommended to download from a mirror close to you
• Select the version that suits your computer (operating system, 32/64-bit)
• Now let’s download and start it up (live demonstration)
Downloading RStudio
• Download from the RStudio website
• Select platform specific version
• Click the executable file to begin the installation process
(live demonstration)
Layout (Windows and Panes)
In this sub-session we quickly look at R’s tool bar and windows.
Base R layout
Tool bar (live demo)
• File: open/create scripts, load/save the workspace and history, print and exit. Workspace and history files are discussed later in this session
• Edit: Usual edit options plus run options and GUI preference (appearance)
options
• Packages: selecting mirrors, loading, installing and uninstalling packages. Packages are add-ons
• Windows: How to arrange the windows
• Help: Manuals and help documentation
Console
• Also called the command line interface
• It’s an interactive platform (it works much like a calculator)
• Each input is evaluated and its result printed immediately (interactive)
• As a window, it can be minimized, maximized, or resized
• Best suited to single lines of code
Console (cont)
Interactive session on console (live demo)
# Simple arithmetic
1 + 1
[1] 2
# Exponential
exp(1)
[1] 2.718282
# Some geometry
2 * pi * (90/360)
[1] 1.570796
# Some trigonometry (R's trigonometric functions work in radians)
cos(90 - 32)
[1] 0.1191801
atan(28/63)
[1] 0.4182243
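R’s trigonometric functions take and return radians, so cos(90 - 32) above is the cosine of 58 radians, not 58 degrees. If degrees were intended (an assumption about the example), convert first. A minimal sketch follows. deg2rad is a hypothetical helper defined here, not a base R function.

```r
# Hypothetical helper: convert degrees to radians (not part of base R)
deg2rad <- function(deg) deg * pi / 180

cos(90 - 32)             # cosine of 58 radians
cos(deg2rad(90 - 32))    # cosine of 58 degrees, about 0.5299

# Going the other way: atan() returns radians; multiply by 180/pi for degrees
atan(28/63) * 180 / pi
```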
Console (cont)
Note:
• R is case sensitive, so Cos and cos are not the same.
• To clear the console, press Ctrl+L
• Use the up and down arrow keys to recall previous commands
• A “+” prompt indicates incomplete syntax; R is waiting for the rest of the expression
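Case sensitivity can be checked directly at the console. Here is a small sketch using base R’s exists() and tryCatch().

```r
# cos is a base R function; Cos does not exist, because names are case sensitive
exists("cos")
exists("Cos")

# Calling the wrong case is an error; tryCatch() captures its message instead of stopping
tryCatch(Cos(0), error = function(e) conditionMessage(e))
```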
R script
• Scripts are text files used to write code
• They are better suited to reproducibility and to multiple lines of code (as when creating a program)
• In base R, the text editor (the program for writing scripts) has to be opened as a window from the File menu (File > New script, or Open script)
• Unlike the console, scripts are not interactive (they don’t give instant results)
• To run code, click Edit then Run all (or Run line or selection), or press Ctrl+R
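Besides the Edit menu, a whole script can be run with source(), which is roughly what Run all does. The sketch below writes a small script to a temporary file and then runs it. The file contents are purely illustrative.

```r
# Write a two-line script to a temporary .R file (illustrative contents)
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(4, 8, 15, 16, 23, 42)",
  "cat('Mean of x:', mean(x), '\\n')"
), script)

# Run every line of the script; echo = TRUE also prints each line as it runs
source(script, echo = TRUE)
```

Because source() evaluates in the global environment by default, x is available at the console afterwards.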
R Script (cont)
Live demonstration on scripting in base R
Global environment/workspace
• An environment in R is a collection of object names, each pointing to the location of its associated value. Environments are hierarchical: every environment (except the empty environment) has a parent environment.
• Environments themselves do not store objects (values/data); they only point to where the objects are located.
• The first environment used in a session is called the global environment, or the workspace.
• When an object name is used in code at the console, R searches the global environment first; if the name is not found there, it searches the parent environment, and so on up the chain (this will become clear as we create objects)
• Base R does not display the global environment; use the function ls() to list its contents
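A minimal sketch of the ideas above, using only base R functions. It creates an object in the global environment and shows that mean is not defined there. R still finds mean by searching the parent environments, which are the attached packages.

```r
# Create an object; its name is now bound in the global environment
x <- 5
ls()                                       # names in the global environment, including "x"

# At the top level, the current environment is the global environment
identical(environment(), globalenv())

# mean is not defined in the global environment itself...
exists("mean", envir = globalenv(), inherits = FALSE)
# ...but R finds it by searching the parent environments
exists("mean")

# The chain searched, starting from the global environment
search()
```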
RStudio Layout
• Very user friendly
• Has four panes with multiple tabs
• The toolbar is similar to base R’s but with some additions
RStudio Panes
• Top left: usually the script/text editor and data viewer tabs
• Top right: usually the global environment, history (logs) and additional tabs (such as Build, Git and Presentations)
• Bottom left: usually the console tab
• Bottom right: files, plots, packages, help and viewer tabs
• However, the panes can be rearranged
Interactive session on RStudio
• No different from base R’s console
• Type code, press Enter, and the output is generated
Live demonstration
Scripting in RStudio
• Works like base R, with the added bonus that each script opens in its own tab with an easily accessible Run button.
• RStudio’s editor works well with a variety of file types, such as .R (pure R code), .Rmd (R Markdown: text and code for reproducible analysis), .md (Markdown), .html and .css.
Live demo (.R file)
Function Calls in R
• Everything that happens in R is the result of a function call
• Functions are the commands R uses to perform actions
• A function call is denoted by parentheses: ()
• Inside the parentheses are the arguments (parameters). Arguments are name-value pairs used to supply input or to specify how the function should behave
• There are two types of functions: named functions and anonymous functions. Named functions include mean (to compute the mean), median (to compute the median), and read.table (to import data)
• Functions can also be considered high or low level. High-level functions are the commonly used commands, while low-level functions are those called by high-level functions
• A “function call” means using a function to perform an action
• When making a function call, arguments can be named or unnamed, and they may or may not have default values
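Here is a minimal sketch of the two kinds of function. The name square is illustrative and not part of base R.

```r
# Named function: the function object is bound to the name `square`
square <- function(x) x^2
square(4)                        # 16

# Anonymous function: defined inline and passed straight to sapply(), never named
sapply(1:3, function(x) x^2)     # 1 4 9

# Typing a function's name without () prints its definition; mean() hands the
# work to lower-level methods through UseMethod()
mean
```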
Function call example: mean function
• To access the documentation for this or any other function, type ? followed by the function name, e.g. ?mean
• mean has arguments (x, trim = 0, na.rm = FALSE, ...)
• Arguments trim and na.rm have default values 0 and FALSE respectively. These can be changed as needed; if the defaults are fine, leave them out of the call.
• Argument x has no default value, so it must be supplied.
• When an argument is given with both name and value, e.g. mean(x = data), it is a named argument call. mean(data) is an unnamed (positional) argument call
• Named arguments can be given in any order without a problem (though this is not really good practice)
• Unnamed arguments, however, must be given in the same order as in the function definition (live demo)
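The calls described above can be sketched as follows. The values are made up for illustration. The vector is called scores rather than data because data is also the name of a base R function.

```r
# Illustrative values (hypothetical) with one missing observation
scores <- c(2, 4, 4, 5, 7, 9, NA)

mean(scores)                            # NA: the default na.rm = FALSE keeps the NA
mean(x = scores, na.rm = TRUE)          # named arguments: 5.166667
mean(na.rm = TRUE, x = scores)          # same result; named arguments in any order
mean(scores, 0, TRUE)                   # unnamed: matched by position to x, trim, na.rm
mean(scores, trim = 0.2, na.rm = TRUE)  # drops 1 value from each end (20% of 6, rounded down): 5
```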
Errors, warnings and messages
• An error means R cannot complete the call, for example because a data set named in the call does not exist in R. Errors are fatal: they stop execution
• A warning flags a possible problem but does not stop execution. It is good practice to check why a warning occurred, to pre-empt a real problem
• Messages are useful information and do not indicate a problem; a good example is the progress messages printed while a package installs
(Live demo)
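A sketch of the three kinds of condition, using base R’s tryCatch() and withCallingHandlers(). The specific calls (log of text, converting text to numbers) are illustrative choices, not taken from the live demo.

```r
# Error: log() of text cannot be computed, so the call stops.
# tryCatch() intercepts the error so the rest of the script still runs.
err <- tryCatch(log("a"), error = function(e) e)
class(err)                 # includes "error"
conditionMessage(err)

# Warning: the call still returns a value (with an NA), but R flags a possible problem
converted <- withCallingHandlers(
  as.numeric(c("1", "two", "3")),
  warning = function(w) {
    cat("Warning caught:", conditionMessage(w), "\n")
    invokeRestart("muffleWarning")   # continue without printing the warning again
  }
)
converted                  # 1 NA 3

# Message: purely informational, not a problem
message("Installation finished")
```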
R Packages
• A package is simply a mechanism for loading optional code, data and documentation.
• All R functions and data sets are stored in packages, and the base R distribution includes about 30 packages [2].
• Some of these are considered part of the R source code and are therefore loaded automatically
• The others are installed (they exist on the computer) but are not available for use until they are loaded with the function library()
• Many more contributed packages are available from CRAN (Comprehensive R Archive Network)
• Generally, base R packages are sufficient for most basic statistics, but if specialized functions are needed, check CRAN (start with the CRAN Task Views)
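Here is a sketch of inspecting and attaching packages. tools ships with base R but is usually not attached by default. shiny is only an example of a CRAN package; its install lines are commented out so the snippet runs offline.

```r
# Packages attached in the current session (after the global environment)
search()

# Every package installed in the library, attached or not
installed <- rownames(installed.packages())
head(installed)

# tools is installed with base R but not attached by default
library(tools)
"package:tools" %in% search()    # TRUE once attached

# Contributed packages are installed from CRAN once, then attached in each session:
# install.packages("shiny")
# library(shiny)
```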
[2]: R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
R Session and Working Directory
• An R session is the current active run of R: it begins when R is started and ends when R is closed
• The working directory is the folder R uses during a session to read (source) and save files
• It is important to set the working directory for each R session, or to configure R to start in the folder you want to use as the working directory
Getting and setting working directory
• To check current working directory, call getwd()
• To set another working directory call setwd(dir)
• Argument dir is a path (location) name; it can be either relative or absolute
• Example: setwd("~/Data Mania Inc/Data_Mgt_Analysis_and_Graphics_R")
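Here is a minimal, self-contained sketch of getwd() and setwd(). It uses R’s per-session temporary folder (tempdir()) instead of the example path above, so it runs on any machine. It writes a small file through a relative path and then restores the original directory.

```r
old_wd <- getwd()            # remember the current working directory

setwd(tempdir())             # absolute path to R's temporary folder for this session
getwd()

# A relative path is resolved against the working directory
write.csv(data.frame(id = 1:3), "example.csv", row.names = FALSE)
file.exists(file.path(tempdir(), "example.csv"))   # TRUE

setwd(old_wd)                # restore the original working directory
```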
RStudio Projects
• A relatively recent feature in RStudio
• Enables working on separate yet inter-related pieces of work, each with its own working directory (folder)
• Can be created in a new or an existing directory, or from a cloned version control repository
Getting Help
• It is important to first read R’s internal help documentation, such as function documentation: access it with either ?function or help(function), e.g. ?read.table or help("read.table"). In help(), the name may be quoted or unquoted; quotes (“”) are required for operators and reserved words, e.g. help("+").
• The R Core Team has invested a lot of time and energy in developing a number of user manuals: access them with help.start() (no arguments)
• Manuals to read as a beginner in R are:
– An Introduction to R and
– R Installation and Administration
• Under References, “Search Engine & Keywords” can be handy for locating specific documentation
• Under Miscellaneous Materials, the “Frequently Asked Questions” are a must-read
• Other documentation on this page can be read bit by bit as it becomes relevant
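A sketch of the help functions mentioned above. The calls that open a viewer or browser are wrapped in interactive() so the snippet can also run non-interactively. "linear model" is just an example search phrase.

```r
# help() accepts the topic quoted or unquoted; ?read.table is shorthand for the same
h1 <- help(read.table)
h2 <- help("read.table")
length(h1) == length(h2)         # both locate the same help file

# Operators and reserved words must be quoted
h3 <- help("+")

if (interactive()) {
  print(h1)                              # display the page, as ?read.table does
  print(help.search("linear model"))     # search installed help; shorthand ??"linear model"
  help.start()                           # open the HTML manuals in a browser
}
```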
Beyond R documentation
• Do a web search, for example on Google or on RSeek, a search engine specialized for R
• Ask a knowledgeable (and helpful) friend
• Search the archives of R’s help mailing list (R-help)
• Search through Stack Overflow’s Q&A
• Finally, post a question to either Stack Overflow or R’s mailing list (consider the latter), but never the same question to both sites
Useful resources
• RStudio’s “Getting Help with R”: https://support.rstudio.com/hc/en-us/articles/200552336-Getting-Help-with-R
• A PDF version of the R Project’s “An Introduction to R”: https://cran.r-project.org/doc/manuals/R-intro.pdf
• The R Project’s reference card: https://cran.r-project.org/doc/contrib/Short-refcard.pdf