R is a statistical software environment used for data analysis and statistical graphics. It was created in 1991 at the University of Auckland. Key features include a large collection of statistical and graphical techniques, tools for displaying and manipulating data, and the ability to write custom code for new applications. RStudio is a widely used integrated development environment (IDE) for R that lets users write code, view output, and access help documentation. Common tasks in R include importing data, generating descriptive statistics, creating visualizations, and performing statistical tests and modeling.
1. Introduction to R
• Statistics is a collection of tools used for converting raw data into
information to help decision makers in their work.
• Types of Statistics:
Descriptive statistics is devoted to the summarization and description
of data.
Inferential statistics uses sample data to make an inference about a
population.
2. Statistical Analysis of Data using R
• Statistical Software Packages
1) SAS
2) SPSS
3) Stata
4) Microsoft Excel
5) R
3. Introduction to R
• R Language:
In 1991, R was created by Ross Ihaka and Robert Gentleman in the
Department of Statistics at the University of Auckland. In 1993 the first
announcement of R was made to the public.
• In 2000 R version 1.0.0 was released to the public.
• Philosophy – ‘How to Make Data Analysis Easier’
• The primary R system is available from the Comprehensive R Archive
Network, also known as CRAN: http://cran.r-project.org
• The main source code archives are maintained by a dedicated group
known as the R Core Team.
4. Introduction to R
• Installation – R GUI
Search “download R”. Go to
https://cran.r-project.org/bin/windows/base/
Click on Download R 4.1.1 for Windows (84 megabytes, 32/64 bit)
Save the file and run it as administrator. Accept all default settings and
complete the installation.
• There is also an integrated development environment (IDE) available for R
that is built by RStudio.
5. Introduction to R
• Installation – RStudio
Search “download RStudio”. Go to
https://rstudio.com/products/rstudio/download/ and click on the first option,
RStudio Desktop (FREE), to download the installer.
Save the file and run it as administrator. Accept all default settings and
complete the installation.
• Set your working directory, which lets R know where to find all of your
files.
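A minimal sketch of checking and changing the working directory with getwd() and setwd(). The temporary directory returned by tempdir() and the file name example.txt are only stand-ins for a real project folder and a real data file.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# tempdir() is a stand-in; replace it with the path to your own project folder
setwd(tempdir())
new_wd <- getwd()
print(new_wd)

# Relative file names are now resolved against the new working directory
writeLines("hello", "example.txt")
found <- file.exists(file.path(new_wd, "example.txt"))

# Go back to where we started
setwd(old_wd)
```

In RStudio the same change can also be made from the Session menu, but calling setwd() in a script keeps the step reproducible.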
6. Introduction to R
• Panels of RStudio
The source editor and data viewer panel
The R console
The command history and workspace browser
The file, help, package, and plots panel
RStudio IDE: Cheat Sheet
R scripts – .R extension
8. Statistical Analysis of Data using R
• Using Packages:
• R packages (often loosely called libraries) are collections of code that hold data and functionality
used in R. A package is in one of three states: (i) installed and loaded automatically, (ii) installed but
needing activation with library(), or (iii) not yet installed.
• Install with install.packages("arules"), update with update.packages(), and cite with citation("arules")
• Writing your own packages: see the Writing R Extensions manual
• Wickham, H. (2015b). R Packages. O’Reilly Media, USA.
• The R Journal - https://journal.r-project.org/
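The three package states can be sketched as follows. The choice of stats and tools as examples of states (i) and (ii) is an assumption made for illustration; both ship with a standard R installation. The calls that download from CRAN are left as comments or messages because they need internet access.

```r
# (i) Installed and attached automatically: stats is on the search path at startup
stats_attached <- "package:stats" %in% search()

# (ii) Installed but not attached: tools ships with R and is activated with library()
library(tools)
tools_attached <- "package:tools" %in% search()

# (iii) Not installed yet: check first, then install from CRAN
if (!requireNamespace("arules", quietly = TRUE)) {
  message("arules is not installed; run install.packages(\"arules\") to get it")
}

# update.packages() refreshes installed packages; commented out because it downloads files
# update.packages(ask = FALSE)

# citation() with no argument shows how to cite R itself;
# citation("arules") does the same for a package once it is installed
r_cite <- citation()
print(r_cite)
```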
9. Introduction to R
• Initial Codes
• Function/operator      Brief description
options                  Set various R options
#                        A comment (ignored by interpreter)
getwd                    Print current working directory
setwd                    Set current working directory
library                  Load an installed package
install.packages         Download and install package
update.packages          Update installed packages
help or ?                Function/object help file
help.search or ??        Search help files
q                        Quit R
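A short sketch of a few of these functions in use. The digits setting and the help topics are arbitrary choices for illustration; the help and quit calls are left as comments so the script runs unattended.

```r
# options() sets global options and invisibly returns the previous values
old_opts <- options(digits = 4)
rounded_pi <- format(pi)   # formatted with 4 significant digits: "3.142"
options(old_opts)          # restore the earlier settings

# getwd() reports the directory R reads from and writes to by default
wd <- getwd()

# library() loads an installed package; stats ships with every R installation
library(stats)

# Help lookups and quitting:
# ?mean               # same as help("mean")
# ??"linear model"    # same as help.search("linear model")
# q()                 # quit the R session
```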
10. Statistical Analysis of Data using R
• The basics of simple arithmetic, assignment, and important object types such as
vectors, matrices, lists, and data frames.
• Functions, loops and conditional statements, which are used to control the flow,
repetition, and execution of ‘your code’.
• Elementary summary statistics such as the mean, variance, quantiles, and
correlation
• Visually explore your data (with both built-in and ggplot2 functionality) by using
and customizing common statistical plots such as histograms and box-and-whisker
plots.
• R implementation and statistical interpretation of some common probability
distributions.
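The first three topics above can be sketched in a few lines of base R. The heights and weights are made-up values, and the helper name cv is hypothetical.

```r
# Assignment and simple arithmetic on numeric vectors (made-up data)
heights <- c(150, 160, 165, 170, 180)   # centimetres
weights <- c(52, 58, 63, 70, 80)        # kilograms
heights_m <- heights / 100              # vectorised arithmetic

# A user-defined function: coefficient of variation in percent
cv <- function(x) 100 * sd(x) / mean(x)

# A loop with a conditional statement
labels <- character(length(heights))
for (k in seq_along(heights)) {
  if (heights[k] > mean(heights)) {
    labels[k] <- "above average"
  } else {
    labels[k] <- "at or below average"
  }
}

# Elementary summary statistics
avg <- mean(heights)
spread <- var(heights)
quartiles <- quantile(heights)
r <- cor(heights, weights)
print(list(mean = avg, variance = spread, cv = cv(heights), correlation = r))
```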
11. Statistical Analysis of Data using R
• Sampling distributions and confidence intervals
• Hypothesis testing and p-values, with implementation and
interpretation in R, including the common ANOVA procedure
• Linear regression modeling
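As a rough illustration of these topics, the sketch below uses the mtcars data set that ships with R. The hypothesised mean of 20 and the choice of variables are assumptions made only for the example.

```r
# mtcars ships with R: mpg = miles per gallon, cyl = cylinders, wt = weight

# One-sample t-test; the result also carries a 95% confidence interval for the mean
tt <- t.test(mtcars$mpg, mu = 20)
ci <- tt$conf.int
p_value <- tt$p.value

# One-way ANOVA: does mean mpg differ across numbers of cylinders?
fit_aov <- aov(mpg ~ factor(cyl), data = mtcars)
print(summary(fit_aov))

# Simple linear regression of mpg on weight
fit_lm <- lm(mpg ~ wt, data = mtcars)
slope <- coef(fit_lm)[["wt"]]
print(summary(fit_lm))
```

A small p-value is evidence against the null hypothesis; the confidence interval gives the range of mean values that remain plausible given the sample.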
12. Statistical Analysis of Data using R
• R Language:
• Data Objects: Vector, List, Matrix, Data Frame
• Data Types: Integer, Numeric (Real Numbers), Logical
(TRUE/FALSE), Character, Complex
• R Packages:
R packages are collections of R functions, data, and compiled code. They
make specialized statistical techniques and graphical tools
(such as ggplot2) available.
Examples: stats, dplyr
At the time of writing, the CRAN package repository featured 16,052 available packages.
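The data types and data objects listed above can be created and inspected as in this sketch; all variable names and values are made up for illustration.

```r
# The five basic data types
num_vec <- c(1.5, 2, 3.25)          # numeric (real numbers)
int_vec <- 1:3                      # integer
lgl_vec <- c(TRUE, FALSE, TRUE)     # logical
chr_vec <- c("a", "b", "c")         # character
cpx_val <- 2 + 3i                   # complex

types <- c(class(num_vec), class(int_vec), class(lgl_vec),
           class(chr_vec), class(cpx_val))
print(types)

# The four common data objects
m <- matrix(1:6, nrow = 2)                        # matrix: one type, two dimensions
lst <- list(name = "sample", values = num_vec)    # list: elements may differ in type
df <- data.frame(id = int_vec, score = num_vec,   # data frame: columns may differ in type
                 passed = lgl_vec)
str(df)
```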
13. Statistical Analysis of Data using R
• Importing Data in R:
The most common way is to use the read.table() function (for .txt files).
Quite often the data values are separated by commas (,). Such a data
file can be imported into R using read.csv():
read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".", ...)
• Importing an Excel File:
Install the readxl package from CRAN, load it with library(readxl), and
use the read_excel() function to import an Excel file into R.
• data() lists the built-in data sets; data(name) loads one of them.
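A minimal sketch of the import functions. The two small files are written to a temporary directory first so the example is self-contained; the file names, their contents, and the Excel path are hypothetical.

```r
# Create a small comma-separated file to import (made-up data)
csv_path <- file.path(tempdir(), "scores.csv")
writeLines(c("name,score", "Asha,81.5", "Ben,74", "Chen,90"), csv_path)
scores <- read.csv(csv_path, header = TRUE)

# The same data as a whitespace-separated text file, read with read.table()
txt_path <- file.path(tempdir(), "scores.txt")
writeLines(c("name score", "Asha 81.5", "Ben 74", "Chen 90"), txt_path)
scores_txt <- read.table(txt_path, header = TRUE)

print(scores)

# Excel files need the readxl package; the path below is a placeholder
# library(readxl)
# xl <- read_excel("path/to/file.xlsx")

# data() loads a built-in data set into the workspace
data(mtcars)
print(head(mtcars, 3))
```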
14. Statistical Analysis of Data using R
• Objectives
Entering the Input and Evaluation
Creating Vectors – The c() function can be used to create vectors of
objects by concatenating things together.
Finding descriptive measures such as range, averages, variation (CV),
and the five-number summary; drawing dot plots and box plots
Perform t-test
Discrete Frequency Distribution and graphs
Creating Matrix – The matrix() function is used (AP)
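These objectives might look like the following in base R. The marks are made-up values, the hypothesised mean of 18 is arbitrary, and the plots are written to a temporary PDF only so the script also runs non-interactively; in an interactive session the pdf() and dev.off() lines can be dropped.

```r
# Create a vector with c() (made-up sample)
marks <- c(12, 15, 15, 18, 20, 20, 20, 25)

# Descriptive measures
rng <- range(marks)                       # minimum and maximum
avg <- mean(marks)
med <- median(marks)
cv <- 100 * sd(marks) / mean(marks)       # coefficient of variation in percent
five <- fivenum(marks)                    # min, lower hinge, median, upper hinge, max
print(summary(marks))

# One-sample t-test against a hypothesised mean
tt <- t.test(marks, mu = 18)

# Discrete frequency distribution
freq <- table(marks)
print(freq)

# Dot plot, box plot, and bar chart of the frequencies
plot_file <- file.path(tempdir(), "marks_plots.pdf")
pdf(plot_file)
stripchart(marks, method = "stack", pch = 16)
boxplot(marks, horizontal = TRUE)
barplot(freq)
dev.off()

# Create a matrix, filling it row by row
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
```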
15. Statistical Analysis of Data using R
• Objectives
Compute binomial, Poisson, and normal distribution
probabilities
Read data from external source using read.csv
Perform Cluster Analysis
Obtain summaries, tables, and graphs
Manage data frames using the dplyr package
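A hedged sketch of several of these objectives. The distribution parameters, the use of hierarchical clustering on three mtcars columns, and the choice of three clusters are all assumptions made for illustration; other clustering methods would serve equally well. The dplyr step runs only if that package is installed.

```r
# P(X = 3) for X ~ Binomial(n = 10, p = 0.5)
p_binom <- dbinom(3, size = 10, prob = 0.5)

# P(X <= 2) for X ~ Poisson(lambda = 4)
p_pois <- ppois(2, lambda = 4)

# P(Z <= 1.96) for a standard normal Z
p_norm <- pnorm(1.96)
print(c(binomial = p_binom, poisson = p_pois, normal = p_norm))

# Cluster analysis: hierarchical clustering on standardised variables
d <- dist(scale(mtcars[, c("mpg", "hp", "wt")]))
hc <- hclust(d, method = "complete")
groups <- cutree(hc, k = 3)      # cut the tree into 3 clusters
print(table(groups))             # cluster sizes as a table

# Managing a data frame with dplyr, if available
if (requireNamespace("dplyr", quietly = TRUE)) {
  by_cyl <- dplyr::summarise(dplyr::group_by(mtcars, cyl), mean_mpg = mean(mpg))
  print(by_cyl)
}
```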