SlideShare a Scribd company logo
Basics of R Programming
Yanchang Zhao
http://www.RDataMining.com
R and Data Mining Course
Beijing University of Posts and Telecommunications,
Beijing, China
July 2019
1 / 44
Quiz
Have you used R before?
2 / 44
Quiz
Have you used R before?
Are you familiar with data mining and machine learning
techniques and algorithms?
2 / 44
Quiz
Have you used R before?
Are you familiar with data mining and machine learning
techniques and algorithms?
Have you used R for data mining and analytics in your
study/research/work?
2 / 44
Contents
Introduction to R
RStudio
Pipe Operations
Data Objects
Control Flow
Parallel Computing
Functions
Data Import and Export
Online Resources
3 / 44
What is R?
R ∗ is a free software environment for statistical computing
and graphics.
R can be easily extended with 14,000+ packages available on
CRAN† (as of July 2019).
Many other packages provided on Bioconductor‡, R-Forge§,
GitHub¶, etc.
R manuals on CRAN
An Introduction to R
The R Language Definition
R Data Import/Export
. . .
∗
http://www.r-project.org/
†
http://cran.r-project.org/
‡
http://www.bioconductor.org/
§
http://r-forge.r-project.org/
¶
https://github.com/
http://cran.r-project.org/manuals.html
4 / 44
Why R?
R is widely used in both academia and industry.
R is one of the most popular tools for data science and
analytics, ranked #1 from 2011 to 2016, but sadly overtaken
by Python since 2017, :-( ∗∗.
The CRAN Task Views †† provide collections of packages for
different tasks.
Machine learning & atatistical learning
Cluster analysis & finite mixture models
Time series analysis
Multivariate statistics
Analysis of spatial data
. . .
∗∗
The KDnuggets polls on Top Analytics, Data Science software
https://www.kdnuggets.com/2019/05/poll-top-data-science-machine-learning-platforms.html
††
http://cran.r-project.org/web/views/
5 / 44
Contents
Introduction to R
RStudio
Pipe Operations
Data Objects
Control Flow
Parallel Computing
Functions
Data Import and Export
Online Resources
6 / 44
RStudio‡‡
An integrated development environment (IDE) for R
Runs on various operating systems like Windows, Mac OS X
and Linux
Suggestion: always using an RStudio project, with subfolders
code: source code
data: raw data, cleaned data
figures: charts and graphs
docs: documents and reports
models: analytics models
‡‡
https://www.rstudio.com/products/rstudio/
7 / 44
RStudio
8 / 44
RStudio Keyboard Shortcuts
Run current line or selection: Ctrl + enter
Comment / uncomment selection: Ctrl + Shift + C
Clear console: Ctrl + L
Reindent selection: Ctrl + I
9 / 44
Writing Reports and Papers
Sweave + LaTex: for academic publications
beamer + LaTex: for presentations
knitr + R Markdown: generating reports and slides in HTML,
PDF and WORD formats
Notebooks: R notebook, Jupiter notebook
10 / 44
Contents
Introduction to R
RStudio
Pipe Operations
Data Objects
Control Flow
Parallel Computing
Functions
Data Import and Export
Online Resources
11 / 44
Pipe Operations
Load library magrittr for pipe operations
Avoid nested function calls
Make code easy to understand
Supported by dplyr and ggplot2
library(magrittr) ## for pipe operations
## traditional way
b <- fun3(fun2(fun1(a), b), d)
## the above can be rewritten to
b <- a %>% fun1() %>% fun2(b) %>% fun3(d)
12 / 44
Pipe Operations
Load library magrittr for pipe operations
Avoid nested function calls
Make code easy to understand
Supported by dplyr and ggplot2
library(magrittr) ## for pipe operations
## traditional way
b <- fun3(fun2(fun1(a), b), d)
## the above can be rewritten to
b <- a %>% fun1() %>% fun2(b) %>% fun3(d)
Quiz: Why not use ’c’ in above example?
12 / 44
Contents
Introduction to R
RStudio
Pipe Operations
Data Objects
Control Flow
Parallel Computing
Functions
Data Import and Export
Online Resources
13 / 44
Data Types and Structures
Data types
Integer
Numeric
Character
Factor
Logical
Date
Data structures
Vector
Matrix
Data frame
List
14 / 44
Vector
## integer vector
x <- 1:10
print(x)
## [1] 1 2 3 4 5 6 7 8 9 10
## numeric vector, generated randomly from a uniform distribution
y <- runif(5)
y
## [1] 0.95724678 0.02629283 0.49250477 0.07112317 0.93636358
## character vector
(z <- c("abc", "d", "ef", "g"))
## [1] "abc" "d" "ef" "g"
15 / 44
Matrix
## create a matrix with 4 rows, from a vector of 1:20
m <- matrix(1:20, nrow = 4, byrow = T)
m
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 2 3 4 5
## [2,] 6 7 8 9 10
## [3,] 11 12 13 14 15
## [4,] 16 17 18 19 20
## matrix subtraction
m - diag(nrow = 4, ncol = 5)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 2 3 4 5
## [2,] 6 6 8 9 10
## [3,] 11 12 12 14 15
## [4,] 16 17 18 18 20
16 / 44
Data Frame
library(magrittr)
age <- c(45, 22, 61, 14, 37)
gender <- c("Female", "Male", "Male", "Female", "Male")
height <- c(1.68, 1.85, 1.8, 1.66, 1.72)
married <- c(T, F, T, F, F)
df <- data.frame(age, gender, height, married) %>% print()
## age gender height married
## 1 45 Female 1.68 TRUE
## 2 22 Male 1.85 FALSE
## 3 61 Male 1.80 TRUE
## 4 14 Female 1.66 FALSE
## 5 37 Male 1.72 FALSE
str(df)
## 'data.frame': 5 obs. of 4 variables:
## $ age : num 45 22 61 14 37
## $ gender : Factor w/ 2 levels "Female","Male": 1 2 2 1 2
## $ height : num 1.68 1.85 1.8 1.66 1.72
## $ married: logi TRUE FALSE TRUE FALSE FALSE
17 / 44
Data Slicing
df$age
## [1] 45 22 61 14 37
df[, 1]
## [1] 45 22 61 14 37
df[1, ]
## age gender height married
## 1 45 Female 1.68 TRUE
df[1, 1]
## [1] 45
df$gender[1]
## [1] Female
## Levels: Female Male
18 / 44
Data Subsetting and Sorting
df %>% subset(gender == "Male")
## age gender height married
## 2 22 Male 1.85 FALSE
## 3 61 Male 1.80 TRUE
## 5 37 Male 1.72 FALSE
idx <- order(df$age) %>% print()
## [1] 4 2 5 1 3
df[idx, ]
## age gender height married
## 4 14 Female 1.66 FALSE
## 2 22 Male 1.85 FALSE
## 5 37 Male 1.72 FALSE
## 1 45 Female 1.68 TRUE
## 3 61 Male 1.80 TRUE
19 / 44
List
x <- 1:10
y <- c("abc", "d", "ef", "g")
ls <- list(x, y) %>% print()
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
##
## [[2]]
## [1] "abc" "d" "ef" "g"
## retrieve an element in a list
ls[[2]]
## [1] "abc" "d" "ef" "g"
ls[[2]][1]
## [1] "abc"
20 / 44
Character
x <- c("apple", "orange", "pear", "banana")
## search for a pattern
grep(pattern = "an", x)
## [1] 2 4
## search for a pattern and return matched elements
grep(pattern = "an", x, value = T)
## [1] "orange" "banana"
## replace a pattern
gsub(pattern = "an", replacement = "**", x)
## [1] "apple" "or**ge" "pear" "b****a"
21 / 44
Date
library(lubridate)
x <- ymd("2019-07-08")
class(x)
## [1] "Date"
year(x)
## [1] 2019
# month(x)
day(x)
## [1] 8
weekdays(x)
## [1] "Monday"
Date parsing functions: ymd(), ydm(), mdy(), myd(), dmy(),
dym(), yq() in package lubridate
22 / 44
Contents
Introduction to R
RStudio
Pipe Operations
Data Objects
Control Flow
Parallel Computing
Functions
Data Import and Export
Online Resources
23 / 44
Conditional Control
if . . . else . . .
score <- 4
if (score >= 3) {
print("pass")
} else {
print("fail")
}
## [1] "pass"
ifelse()
score <- 1:5
ifelse(score >= 3, "pass", "fail")
## [1] "fail" "fail" "pass" "pass" "pass"
24 / 44
Loop Control
for, while, repeat
break, next
for (i in 1:5) {
print(i^2)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
25 / 44
Apply Functions
apply(): apply a function to margins of an array or matrix
lapply(): apply a function to every item in a list or vector
and return a list
sapply(): similar to lapply, but return a vector or matrix
vapply(): similar to sapply, but as a pre-specified type of
return value
26 / 44
Loop vs lapply
## for loop
x <- 1:10
y <- rep(NA, 10)
for (i in 1:length(x)) {
y[i] <- log(x[i])
}
y
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.79...
## [7] 1.9459101 2.0794415 2.1972246 2.3025851
## apply a function (log) to every element of x
tmp <- lapply(x, log)
y <- do.call("c", tmp) %>% print()
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.79...
## [7] 1.9459101 2.0794415 2.1972246 2.3025851
## same as above
sapply(x, log)
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.79...
## [7] 1.9459101 2.0794415 2.1972246 2.3025851
27 / 44
Contents
Introduction to R
RStudio
Pipe Operations
Data Objects
Control Flow
Parallel Computing
Functions
Data Import and Export
Online Resources
28 / 44
Parallel Computing
## on Linux or Mac machines
library(parallel)
n.cores <- detectCores() - 1 %>% print()
tmp <- mclapply(x, log, mc.cores=n.cores)
y <- do.call("c", tmp)
## on Windows machines
library(parallel)
## set up cluster
cluster <- makeCluster(n.cores)
## run jobs in parallel
tmp <- parLapply(cluster, x, log)
## stop cluster
stopCluster(cluster)
# collect results
y <- do.call("c", tmp)
29 / 44
Parallel Computing (cont.)
On Windows machines, libraries and global variables used by a
function to run in parallel have to be explicited exported to all
nodes.
## on Windows machines
library(parallel)
## set up cluster
cluster <- makeCluster(n.cores)
## load required libraries, if any, on all nodes
tmp <- clusterEvalQ(cluster, library(igraph))
## export required variables, if any, to all nodes
clusterExport(cluster, "myvar")
## run jobs in parallel
tmp <- parLapply(cluster, x, myfunc)
## stop cluster
stopCluster(cluster)
# collect results
y <- do.call("c", tmp)
30 / 44
Parallel Computing (cont.)
On Windows machines, libraries and global variables used by a
function to run in parallel have to be explicited exported to all
nodes.
## on Windows machines
library(parallel)
## set up cluster
cluster <- makeCluster(n.cores)
## load required libraries, if any, on all nodes
tmp <- clusterEvalQ(cluster, library(igraph))
## export required variables, if any, to all nodes
clusterExport(cluster, "myvar")
## run jobs in parallel
tmp <- parLapply(cluster, x, myfunc)
## stop cluster
stopCluster(cluster)
# collect results
y <- do.call("c", tmp)
30 / 44
Contents
Introduction to R
RStudio
Pipe Operations
Data Objects
Control Flow
Parallel Computing
Functions
Data Import and Export
Online Resources
31 / 44
Functions
Define your own function: calculate the arithmetic average of a
numeric vector
average <- function(x) {
y <- sum(x)
n <- length(x)
z <- y/n
return(z)
}
## calcuate the average of 1:10
average(1:10)
## [1] 5.5
32 / 44
Contents
Introduction to R
RStudio
Pipe Operations
Data Objects
Control Flow
Parallel Computing
Functions
Data Import and Export
Online Resources
33 / 44
Data Import and Export
Read data from and write data to
R native formats (incl. Rdata and RDS)
CSV files
EXCEL files
ODBC databases
SAS databases
R Data Import/Export:
http://cran.r-project.org/doc/manuals/R-data.pdf
Chapter 2: Data Import and Export, in book R and Data Mining:
Examples and Case Studies.
http://www.rdatamining.com/docs/RDataMining-book.pdf
34 / 44
Save and Load R Objects
save(): save R objects into a .Rdata file
load(): read R objects from a .Rdata file
rm(): remove objects from R
a <- 1:10
save(a, file = "../data/dumData.Rdata")
rm(a)
a
## Error in eval(expr, envir, enclos): object ’a’ not found
load("../data/dumData.Rdata")
a
## [1] 1 2 3 4 5 6 7 8 9 10
35 / 44
Save and Load R Objects - More Functions
save.image():
save current workspace to a file
It saves everything!
readRDS():
read a single R object from a .rds file
saveRDS():
save a single R object to a file
Advantage of readRDS() and saveRDS():
You can restore the data under a different object name.
Advantage of load() and save():
You can save multiple R objects to one file.
36 / 44
Import from and Export to .CSV Files
write.csv(): write an R object to a .CSV file
read.csv(): read an R object from a .CSV file
# create a data frame
var1 <- 1:5
var2 <- (1:5)/10
var3 <- c("R", "and", "Data Mining", "Examples", "Case Studies")
df1 <- data.frame(var1, var2, var3)
names(df1) <- c("VarInt", "VarReal", "VarChar")
# save to a csv file
write.csv(df1, "../data/dummmyData.csv", row.names = FALSE)
# read from a csv file
df2 <- read.csv("../data/dummmyData.csv")
print(df2)
## VarInt VarReal VarChar
## 1 1 0.1 R
## 2 2 0.2 and
## 3 3 0.3 Data Mining
## 4 4 0.4 Examples
## 5 5 0.5 Case Studies
37 / 44
Import from and Export to EXCEL Files
Package openxlsx: read, write and edit XLSX files
library(openxlsx)
xlsx.file <- "../data/dummmyData.xlsx"
write.xlsx(df2, xlsx.file, sheetName = "sheet1", row.names = F)
df3 <- read.xlsx(xlsx.file, sheet = "sheet1")
df3
## VarInt VarReal VarChar
## 1 1 0.1 R
## 2 2 0.2 and
## 3 3 0.3 Data Mining
## 4 4 0.4 Examples
## 5 5 0.5 Case Studies
38 / 44
Read from Databases
Package RODBC: provides connection to ODBC databases.
Function odbcConnect(): sets up a connection to database
sqlQuery(): sends an SQL query to the database
odbcClose() closes the connection.
library(RODBC)
db <- odbcConnect(dsn = "servername", uid = "userid",
pwd = "******")
sql <- "SELECT * FROM lib.table WHERE ..."
# or read query from file
sql <- readChar("myQuery.sql", nchars=99999)
myData <- sqlQuery(db, sql, errors=TRUE)
odbcClose(db)
39 / 44
Read from Databases
Package RODBC: provides connection to ODBC databases.
Function odbcConnect(): sets up a connection to database
sqlQuery(): sends an SQL query to the database
odbcClose() closes the connection.
library(RODBC)
db <- odbcConnect(dsn = "servername", uid = "userid",
pwd = "******")
sql <- "SELECT * FROM lib.table WHERE ..."
# or read query from file
sql <- readChar("myQuery.sql", nchars=99999)
myData <- sqlQuery(db, sql, errors=TRUE)
odbcClose(db)
Functions sqlFetch(), sqlSave() and sqlUpdate(): read, write
or update a table in an ODBC database
39 / 44
Import Data from SAS
Package foreign provides function read.ssd() for importing SAS
datasets (.sas7bdat files) into R.
library(foreign) # for importing SAS data
# the path of SAS on your computer
sashome <- "C:/Program Files/SAS/SASFoundation/9.4"
filepath <- "./data"
# filename should be no more than 8 characters, without extension
fileName <- "dumData"
# read data from a SAS dataset
a <- read.ssd(file.path(filepath), fileName,
sascmd=file.path(sashome, "sas.exe"))
40 / 44
Import Data from SAS
Package foreign provides function read.ssd() for importing SAS
datasets (.sas7bdat files) into R.
library(foreign) # for importing SAS data
# the path of SAS on your computer
sashome <- "C:/Program Files/SAS/SASFoundation/9.4"
filepath <- "./data"
# filename should be no more than 8 characters, without extension
fileName <- "dumData"
# read data from a SAS dataset
a <- read.ssd(file.path(filepath), fileName,
sascmd=file.path(sashome, "sas.exe"))
Alternatives:
function read.xport(): read a file in SAS Transport
(XPORT) format
RStudio : Environment Panel : Import Dataset from
SPSS/SAS/Stata
40 / 44
Contents
Introduction to R
RStudio
Pipe Operations
Data Objects
Control Flow
Parallel Computing
Functions
Data Import and Export
Online Resources
41 / 44
Online Resources
Book titled R and Data Mining: Examples and Case Studies
http://www.rdatamining.com/docs/RDataMining-book.pdf
R Reference Card for Data Mining
http://www.rdatamining.com/docs/RDataMining-reference-card.pdf
Free online courses and documents
http://www.rdatamining.com/resources/
RDataMining Group on LinkedIn (27,000+ members)
http://group.rdatamining.com
Twitter (3,300+ followers)
@RDataMining
42 / 44
The End
Thanks!
Email: yanchang(at)RDataMining.com
Twitter: @RDataMining
43 / 44
How to Cite This Work
Citation
Yanchang Zhao. R and Data Mining: Examples and Case Studies. ISBN
978-0-12-396963-7, December 2012. Academic Press, Elsevier. 256
pages. URL: http://www.rdatamining.com/docs/RDataMining-book.pdf.
BibTex
@BOOK{Zhao2012R,
title = {R and Data Mining: Examples and Case Studies},
publisher = {Academic Press, Elsevier},
year = {2012},
author = {Yanchang Zhao},
pages = {256},
month = {December},
isbn = {978-0-123-96963-7},
keywords = {R, data mining},
url = {http://www.rdatamining.com/docs/RDataMining-book.pdf}
}
44 / 44

More Related Content

What's hot

R programming Fundamentals
R programming  FundamentalsR programming  Fundamentals
R programming Fundamentals
Ragia Ibrahim
 
Introduction to R Programming
Introduction to R ProgrammingIntroduction to R Programming
Introduction to R Programming
izahn
 
An Intoduction to R
An Intoduction to RAn Intoduction to R
An Intoduction to R
Mahmoud Shiri Varamini
 
R basics
R basicsR basics
R basics
FAO
 
Data Analysis with R (combined slides)
Data Analysis with R (combined slides)Data Analysis with R (combined slides)
Data Analysis with R (combined slides)
Guy Lebanon
 
Introduction to the language R
Introduction to the language RIntroduction to the language R
Introduction to the language R
fbenault
 
R tutorial for a windows environment
R tutorial for a windows environmentR tutorial for a windows environment
R tutorial for a windows environment
Yogendra Chaubey
 
Programming in R
Programming in RProgramming in R
Programming in R
Smruti Sarangi
 
R programming for data science
R programming for data scienceR programming for data science
R programming for data science
Sovello Hildebrand
 
Introduction To R Language
Introduction To R LanguageIntroduction To R Language
Introduction To R Language
Gaurang Dobariya
 
Best corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiBest corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbai
Unmesh Baile
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-r
Yanchang Zhao
 
R - the language
R - the languageR - the language
R - the language
Mike Martinez
 
R programming language: conceptual overview
R programming language: conceptual overviewR programming language: conceptual overview
R programming language: conceptual overview
Maxim Litvak
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors
krishna singh
 
Functional Programming in R
Functional Programming in RFunctional Programming in R
Functional Programming in R
Soumendra Dhanee
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
Yanchang Zhao
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
Yanchang Zhao
 
R tutorial
R tutorialR tutorial
R tutorial
Richard Vidgen
 
R programming language
R programming languageR programming language
R programming language
Alberto Minetti
 

What's hot (20)

R programming Fundamentals
R programming  FundamentalsR programming  Fundamentals
R programming Fundamentals
 
Introduction to R Programming
Introduction to R ProgrammingIntroduction to R Programming
Introduction to R Programming
 
An Intoduction to R
An Intoduction to RAn Intoduction to R
An Intoduction to R
 
R basics
R basicsR basics
R basics
 
Data Analysis with R (combined slides)
Data Analysis with R (combined slides)Data Analysis with R (combined slides)
Data Analysis with R (combined slides)
 
Introduction to the language R
Introduction to the language RIntroduction to the language R
Introduction to the language R
 
R tutorial for a windows environment
R tutorial for a windows environmentR tutorial for a windows environment
R tutorial for a windows environment
 
Programming in R
Programming in RProgramming in R
Programming in R
 
R programming for data science
R programming for data scienceR programming for data science
R programming for data science
 
Introduction To R Language
Introduction To R LanguageIntroduction To R Language
Introduction To R Language
 
Best corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiBest corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbai
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-r
 
R - the language
R - the languageR - the language
R - the language
 
R programming language: conceptual overview
R programming language: conceptual overviewR programming language: conceptual overview
R programming language: conceptual overview
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors
 
Functional Programming in R
Functional Programming in RFunctional Programming in R
Functional Programming in R
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
 
R tutorial
R tutorialR tutorial
R tutorial
 
R programming language
R programming languageR programming language
R programming language
 

Similar to RDataMining slides-r-programming

R basics
R basicsR basics
R basics
Sagun Baijal
 
PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop - Xi...
PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop  - Xi...PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop  - Xi...
PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop - Xi...
The Statistical and Applied Mathematical Sciences Institute
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In R
Rsquared Academy
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
Alexandros Karatzoglou
 
Reproducible Computational Research in R
Reproducible Computational Research in RReproducible Computational Research in R
Reproducible Computational Research in R
Samuel Bosch
 
R Programming - part 1.pdf
R Programming - part 1.pdfR Programming - part 1.pdf
R Programming - part 1.pdf
RohanBorgalli
 
Rbootcamp Day 1
Rbootcamp Day 1Rbootcamp Day 1
Rbootcamp Day 1
Olga Scrivner
 
The Curious Clojurist - Neal Ford (Thoughtworks)
The Curious Clojurist - Neal Ford (Thoughtworks)The Curious Clojurist - Neal Ford (Thoughtworks)
The Curious Clojurist - Neal Ford (Thoughtworks)
jaxLondonConference
 
[計一] Basic r programming final0918
[計一] Basic r programming   final0918[計一] Basic r programming   final0918
[計一] Basic r programming final0918Yen_CY
 
[計一] Basic r programming final0918
[計一] Basic r programming   final0918[計一] Basic r programming   final0918
[計一] Basic r programming final0918Chia-Yi Yen
 
R Programming Intro
R Programming IntroR Programming Intro
R Programming Intro
062MayankSinghal
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...
Data Con LA
 
A gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojureA gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojure
Paul Lam
 
Poetry with R -- Dissecting the code
Poetry with R -- Dissecting the codePoetry with R -- Dissecting the code
Poetry with R -- Dissecting the code
Peter Solymos
 
Data analysis in R
Data analysis in RData analysis in R
Data analysis in R
Andrew Lowe
 
R Programming: Export/Output Data In R
R Programming: Export/Output Data In RR Programming: Export/Output Data In R
R Programming: Export/Output Data In R
Rsquared Academy
 
Lecture1_R.pdf
Lecture1_R.pdfLecture1_R.pdf
Lecture1_R.pdf
BusyBird2
 
PPT ON MACHINE LEARNING by Ragini Ratre
PPT ON MACHINE LEARNING by Ragini RatrePPT ON MACHINE LEARNING by Ragini Ratre
PPT ON MACHINE LEARNING by Ragini Ratre
RaginiRatre
 
sonam Kumari python.ppt
sonam Kumari python.pptsonam Kumari python.ppt
sonam Kumari python.ppt
ssuserd64918
 
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
ArchishaKhandareSS20
 

Similar to RDataMining slides-r-programming (20)

R basics
R basicsR basics
R basics
 
PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop - Xi...
PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop  - Xi...PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop  - Xi...
PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop - Xi...
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In R
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
Reproducible Computational Research in R
Reproducible Computational Research in RReproducible Computational Research in R
Reproducible Computational Research in R
 
R Programming - part 1.pdf
R Programming - part 1.pdfR Programming - part 1.pdf
R Programming - part 1.pdf
 
Rbootcamp Day 1
Rbootcamp Day 1Rbootcamp Day 1
Rbootcamp Day 1
 
The Curious Clojurist - Neal Ford (Thoughtworks)
The Curious Clojurist - Neal Ford (Thoughtworks)The Curious Clojurist - Neal Ford (Thoughtworks)
The Curious Clojurist - Neal Ford (Thoughtworks)
 
[計一] Basic r programming final0918
[計一] Basic r programming   final0918[計一] Basic r programming   final0918
[計一] Basic r programming final0918
 
[計一] Basic r programming final0918
[計一] Basic r programming   final0918[計一] Basic r programming   final0918
[計一] Basic r programming final0918
 
R Programming Intro
R Programming IntroR Programming Intro
R Programming Intro
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...
 
A gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojureA gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojure
 
Poetry with R -- Dissecting the code
Poetry with R -- Dissecting the codePoetry with R -- Dissecting the code
Poetry with R -- Dissecting the code
 
Data analysis in R
Data analysis in RData analysis in R
Data analysis in R
 
R Programming: Export/Output Data In R
R Programming: Export/Output Data In RR Programming: Export/Output Data In R
R Programming: Export/Output Data In R
 
Lecture1_R.pdf
Lecture1_R.pdfLecture1_R.pdf
Lecture1_R.pdf
 
PPT ON MACHINE LEARNING by Ragini Ratre
PPT ON MACHINE LEARNING by Ragini RatrePPT ON MACHINE LEARNING by Ragini Ratre
PPT ON MACHINE LEARNING by Ragini Ratre
 
sonam Kumari python.ppt
sonam Kumari python.pptsonam Kumari python.ppt
sonam Kumari python.ppt
 
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
 

More from Yanchang Zhao

RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysis
Yanchang Zhao
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
Yanchang Zhao
 
RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisation
Yanchang Zhao
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-r
Yanchang Zhao
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-r
Yanchang Zhao
 
RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-card
Yanchang Zhao
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter Data
Yanchang Zhao
 
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with R
Yanchang Zhao
 
Time Series Analysis and Mining with R
Time Series Analysis and Mining with RTime Series Analysis and Mining with R
Time Series Analysis and Mining with R
Yanchang Zhao
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with R
Yanchang Zhao
 
Data Clustering with R
Data Clustering with RData Clustering with R
Data Clustering with R
Yanchang Zhao
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with R
Yanchang Zhao
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with R
Yanchang Zhao
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slides
Yanchang Zhao
 
R Reference Card for Data Mining
R Reference Card for Data MiningR Reference Card for Data Mining
R Reference Card for Data Mining
Yanchang Zhao
 

More from Yanchang Zhao (15)

RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysis
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
 
RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisation
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-r
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-r
 
RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-card
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter Data
 
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with R
 
Time Series Analysis and Mining with R
Time Series Analysis and Mining with RTime Series Analysis and Mining with R
Time Series Analysis and Mining with R
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with R
 
Data Clustering with R
Data Clustering with RData Clustering with R
Data Clustering with R
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with R
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with R
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slides
 
R Reference Card for Data Mining
R Reference Card for Data MiningR Reference Card for Data Mining
R Reference Card for Data Mining
 

Recently uploaded

AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 

Recently uploaded (20)

AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 

RDataMining slides-r-programming

  • 1. Basics of R Programming Yanchang Zhao http://www.RDataMining.com R and Data Mining Course Beijing University of Posts and Telecommunications, Beijing, China July 2019 1 / 44
  • 2. Quiz Have you used R before? 2 / 44
  • 3. Quiz Have you used R before? Are you familiar with data mining and machine learning techniques and algorithms? 2 / 44
  • 4. Quiz Have you used R before? Are you familiar with data mining and machine learning techniques and algorithms? Have you used R for data mining and analytics in your study/research/work? 2 / 44
  • 5. Contents Introduction to R RStudio Pipe Operations Data Objects Control Flow Parallel Computing Functions Data Import and Export Online Resources 3 / 44
  • 6. What is R? R ∗ is a free software environment for statistical computing and graphics. R can be easily extended with 14,000+ packages available on CRAN† (as of July 2019). Many other packages provided on Bioconductor‡, R-Forge§, GitHub¶, etc. R manuals on CRAN An Introduction to R The R Language Definition R Data Import/Export . . . ∗ http://www.r-project.org/ † http://cran.r-project.org/ ‡ http://www.bioconductor.org/ § http://r-forge.r-project.org/ ¶ https://github.com/ http://cran.r-project.org/manuals.html 4 / 44
  • 7. Why R? R is widely used in both academia and industry. R is one of the most popular tools for data science and analytics, ranked #1 from 2011 to 2016, but sadly overtaken by Python since 2017, :-( ∗∗. The CRAN Task Views †† provide collections of packages for different tasks. Machine learning & atatistical learning Cluster analysis & finite mixture models Time series analysis Multivariate statistics Analysis of spatial data . . . ∗∗ The KDnuggets polls on Top Analytics, Data Science software https://www.kdnuggets.com/2019/05/poll-top-data-science-machine-learning-platforms.html †† http://cran.r-project.org/web/views/ 5 / 44
  • 8. Contents Introduction to R RStudio Pipe Operations Data Objects Control Flow Parallel Computing Functions Data Import and Export Online Resources 6 / 44
  • 9. RStudio‡‡ An integrated development environment (IDE) for R Runs on various operating systems like Windows, Mac OS X and Linux Suggestion: always using an RStudio project, with subfolders code: source code data: raw data, cleaned data figures: charts and graphs docs: documents and reports models: analytics models ‡‡ https://www.rstudio.com/products/rstudio/ 7 / 44
  • 11. RStudio Keyboard Shortcuts Run current line or selection: Ctrl + enter Comment / uncomment selection: Ctrl + Shift + C Clear console: Ctrl + L Reindent selection: Ctrl + I 9 / 44
  • 12. Writing Reports and Papers Sweave + LaTex: for academic publications beamer + LaTex: for presentations knitr + R Markdown: generating reports and slides in HTML, PDF and WORD formats Notebooks: R notebook, Jupiter notebook 10 / 44
  • 13. Contents Introduction to R RStudio Pipe Operations Data Objects Control Flow Parallel Computing Functions Data Import and Export Online Resources 11 / 44
  • 14. Pipe Operations Load library magrittr for pipe operations Avoid nested function calls Make code easy to understand Supported by dplyr and ggplot2 library(magrittr) ## for pipe operations ## traditional way b <- fun3(fun2(fun1(a), b), d) ## the above can be rewritten to b <- a %>% fun1() %>% fun2(b) %>% fun3(d) 12 / 44
  • 15. Pipe Operations Load library magrittr for pipe operations Avoid nested function calls Make code easy to understand Supported by dplyr and ggplot2 library(magrittr) ## for pipe operations ## traditional way b <- fun3(fun2(fun1(a), b), d) ## the above can be rewritten to b <- a %>% fun1() %>% fun2(b) %>% fun3(d) Quiz: Why not use ’c’ in above example? 12 / 44
  • 16. Contents Introduction to R RStudio Pipe Operations Data Objects Control Flow Parallel Computing Functions Data Import and Export Online Resources 13 / 44
  • 17. Data Types and Structures Data types Integer Numeric Character Factor Logical Date Data structures Vector Matrix Data frame List 14 / 44
  • 18. Vector ## integer vector x <- 1:10 print(x) ## [1] 1 2 3 4 5 6 7 8 9 10 ## numeric vector, generated randomly from a uniform distribution y <- runif(5) y ## [1] 0.95724678 0.02629283 0.49250477 0.07112317 0.93636358 ## character vector (z <- c("abc", "d", "ef", "g")) ## [1] "abc" "d" "ef" "g" 15 / 44
  • 19. Matrix ## create a matrix with 4 rows, from a vector of 1:20 m <- matrix(1:20, nrow = 4, byrow = T) m ## [,1] [,2] [,3] [,4] [,5] ## [1,] 1 2 3 4 5 ## [2,] 6 7 8 9 10 ## [3,] 11 12 13 14 15 ## [4,] 16 17 18 19 20 ## matrix subtraction m - diag(nrow = 4, ncol = 5) ## [,1] [,2] [,3] [,4] [,5] ## [1,] 0 2 3 4 5 ## [2,] 6 6 8 9 10 ## [3,] 11 12 12 14 15 ## [4,] 16 17 18 18 20 16 / 44
  • 20. Data Frame library(magrittr) age <- c(45, 22, 61, 14, 37) gender <- c("Female", "Male", "Male", "Female", "Male") height <- c(1.68, 1.85, 1.8, 1.66, 1.72) married <- c(T, F, T, F, F) df <- data.frame(age, gender, height, married) %>% print() ## age gender height married ## 1 45 Female 1.68 TRUE ## 2 22 Male 1.85 FALSE ## 3 61 Male 1.80 TRUE ## 4 14 Female 1.66 FALSE ## 5 37 Male 1.72 FALSE str(df) ## 'data.frame': 5 obs. of 4 variables: ## $ age : num 45 22 61 14 37 ## $ gender : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 ## $ height : num 1.68 1.85 1.8 1.66 1.72 ## $ married: logi TRUE FALSE TRUE FALSE FALSE 17 / 44
  • 21. Data Slicing df$age ## [1] 45 22 61 14 37 df[, 1] ## [1] 45 22 61 14 37 df[1, ] ## age gender height married ## 1 45 Female 1.68 TRUE df[1, 1] ## [1] 45 df$gender[1] ## [1] Female ## Levels: Female Male 18 / 44
  • 22. Data Subsetting and Sorting df %>% subset(gender == "Male") ## age gender height married ## 2 22 Male 1.85 FALSE ## 3 61 Male 1.80 TRUE ## 5 37 Male 1.72 FALSE idx <- order(df$age) %>% print() ## [1] 4 2 5 1 3 df[idx, ] ## age gender height married ## 4 14 Female 1.66 FALSE ## 2 22 Male 1.85 FALSE ## 5 37 Male 1.72 FALSE ## 1 45 Female 1.68 TRUE ## 3 61 Male 1.80 TRUE 19 / 44
  • 23. List x <- 1:10 y <- c("abc", "d", "ef", "g") ls <- list(x, y) %>% print() ## [[1]] ## [1] 1 2 3 4 5 6 7 8 9 10 ## ## [[2]] ## [1] "abc" "d" "ef" "g" ## retrieve an element in a list ls[[2]] ## [1] "abc" "d" "ef" "g" ls[[2]][1] ## [1] "abc" 20 / 44
  • 24. Character x <- c("apple", "orange", "pear", "banana") ## search for a pattern grep(pattern = "an", x) ## [1] 2 4 ## search for a pattern and return matched elements grep(pattern = "an", x, value = T) ## [1] "orange" "banana" ## replace a pattern gsub(pattern = "an", replacement = "**", x) ## [1] "apple" "or**ge" "pear" "b****a" 21 / 44
  • 25. Date library(lubridate) x <- ymd("2019-07-08") class(x) ## [1] "Date" year(x) ## [1] 2019 # month(x) day(x) ## [1] 8 weekdays(x) ## [1] "Monday" Date parsing functions: ymd(), ydm(), mdy(), myd(), dmy(), dym(), yq() in package lubridate 22 / 44
  • 26. Contents Introduction to R RStudio Pipe Operations Data Objects Control Flow Parallel Computing Functions Data Import and Export Online Resources 23 / 44
  • 27. Conditional Control if . . . else . . . score <- 4 if (score >= 3) { print("pass") } else { print("fail") } ## [1] "pass" ifelse() score <- 1:5 ifelse(score >= 3, "pass", "fail") ## [1] "fail" "fail" "pass" "pass" "pass" 24 / 44
  • 28. Loop Control for, while, repeat break, next for (i in 1:5) { print(i^2) } ## [1] 1 ## [1] 4 ## [1] 9 ## [1] 16 ## [1] 25 25 / 44
  • 29. Apply Functions apply(): apply a function to margins of an array or matrix lapply(): apply a function to every item in a list or vector and return a list sapply(): similar to lapply, but return a vector or matrix vapply(): similar to sapply, but as a pre-specified type of return value 26 / 44
  • 30. Loop vs lapply ## for loop x <- 1:10 y <- rep(NA, 10) for (i in 1:length(x)) { y[i] <- log(x[i]) } y ## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.79... ## [7] 1.9459101 2.0794415 2.1972246 2.3025851 ## apply a function (log) to every element of x tmp <- lapply(x, log) y <- do.call("c", tmp) %>% print() ## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.79... ## [7] 1.9459101 2.0794415 2.1972246 2.3025851 ## same as above sapply(x, log) ## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.79... ## [7] 1.9459101 2.0794415 2.1972246 2.3025851 27 / 44
  • 31. Contents Introduction to R RStudio Pipe Operations Data Objects Control Flow Parallel Computing Functions Data Import and Export Online Resources 28 / 44
  • 32. Parallel Computing ## on Linux or Mac machines library(parallel) n.cores <- detectCores() - 1 %>% print() tmp <- mclapply(x, log, mc.cores=n.cores) y <- do.call("c", tmp) ## on Windows machines library(parallel) ## set up cluster cluster <- makeCluster(n.cores) ## run jobs in parallel tmp <- parLapply(cluster, x, log) ## stop cluster stopCluster(cluster) # collect results y <- do.call("c", tmp) 29 / 44
  • 33. Parallel Computing (cont.) On Windows machines, libraries and global variables used by a function to run in parallel have to be explicited exported to all nodes. ## on Windows machines library(parallel) ## set up cluster cluster <- makeCluster(n.cores) ## load required libraries, if any, on all nodes tmp <- clusterEvalQ(cluster, library(igraph)) ## export required variables, if any, to all nodes clusterExport(cluster, "myvar") ## run jobs in parallel tmp <- parLapply(cluster, x, myfunc) ## stop cluster stopCluster(cluster) # collect results y <- do.call("c", tmp) 30 / 44
  • 34. Parallel Computing (cont.) On Windows machines, libraries and global variables used by a function to run in parallel have to be explicited exported to all nodes. ## on Windows machines library(parallel) ## set up cluster cluster <- makeCluster(n.cores) ## load required libraries, if any, on all nodes tmp <- clusterEvalQ(cluster, library(igraph)) ## export required variables, if any, to all nodes clusterExport(cluster, "myvar") ## run jobs in parallel tmp <- parLapply(cluster, x, myfunc) ## stop cluster stopCluster(cluster) # collect results y <- do.call("c", tmp) 30 / 44
  • 35. Contents Introduction to R RStudio Pipe Operations Data Objects Control Flow Parallel Computing Functions Data Import and Export Online Resources 31 / 44
  • 36. Functions Define your own function: calculate the arithmetic average of a numeric vector average <- function(x) { y <- sum(x) n <- length(x) z <- y/n return(z) } ## calcuate the average of 1:10 average(1:10) ## [1] 5.5 32 / 44
  • 37. Contents Introduction to R RStudio Pipe Operations Data Objects Control Flow Parallel Computing Functions Data Import and Export Online Resources 33 / 44
  • 38. Data Import and Export Read data from and write data to R native formats (incl. Rdata and RDS) CSV files EXCEL files ODBC databases SAS databases R Data Import/Export: http://cran.r-project.org/doc/manuals/R-data.pdf Chapter 2: Data Import and Export, in book R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining-book.pdf 34 / 44
  • 39. Save and Load R Objects save(): save R objects into a .Rdata file load(): read R objects from a .Rdata file rm(): remove objects from R a <- 1:10 save(a, file = "../data/dumData.Rdata") rm(a) a ## Error in eval(expr, envir, enclos): object ’a’ not found load("../data/dumData.Rdata") a ## [1] 1 2 3 4 5 6 7 8 9 10 35 / 44
  • 40. Save and Load R Objects - More Functions save.image(): save current workspace to a file It saves everything! readRDS(): read a single R object from a .rds file saveRDS(): save a single R object to a file Advantage of readRDS() and saveRDS(): You can restore the data under a different object name. Advantage of load() and save(): You can save multiple R objects to one file. 36 / 44
  • 41. Import from and Export to .CSV Files write.csv(): write an R object to a .CSV file read.csv(): read an R object from a .CSV file # create a data frame var1 <- 1:5 var2 <- (1:5)/10 var3 <- c("R", "and", "Data Mining", "Examples", "Case Studies") df1 <- data.frame(var1, var2, var3) names(df1) <- c("VarInt", "VarReal", "VarChar") # save to a csv file write.csv(df1, "../data/dummmyData.csv", row.names = FALSE) # read from a csv file df2 <- read.csv("../data/dummmyData.csv") print(df2) ## VarInt VarReal VarChar ## 1 1 0.1 R ## 2 2 0.2 and ## 3 3 0.3 Data Mining ## 4 4 0.4 Examples ## 5 5 0.5 Case Studies 37 / 44
  • 42. Import from and Export to EXCEL Files Package openxlsx: read, write and edit XLSX files library(openxlsx) xlsx.file <- "../data/dummmyData.xlsx" write.xlsx(df2, xlsx.file, sheetName = "sheet1", row.names = F) df3 <- read.xlsx(xlsx.file, sheet = "sheet1") df3 ## VarInt VarReal VarChar ## 1 1 0.1 R ## 2 2 0.2 and ## 3 3 0.3 Data Mining ## 4 4 0.4 Examples ## 5 5 0.5 Case Studies 38 / 44
  • 43. Read from Databases Package RODBC: provides connection to ODBC databases. Function odbcConnect(): sets up a connection to database sqlQuery(): sends an SQL query to the database odbcClose() closes the connection. library(RODBC) db <- odbcConnect(dsn = "servername", uid = "userid", pwd = "******") sql <- "SELECT * FROM lib.table WHERE ..." # or read query from file sql <- readChar("myQuery.sql", nchars=99999) myData <- sqlQuery(db, sql, errors=TRUE) odbcClose(db) 39 / 44
  • 44. Read from Databases Package RODBC: provides connection to ODBC databases. Function odbcConnect(): sets up a connection to database sqlQuery(): sends an SQL query to the database odbcClose() closes the connection. library(RODBC) db <- odbcConnect(dsn = "servername", uid = "userid", pwd = "******") sql <- "SELECT * FROM lib.table WHERE ..." # or read query from file sql <- readChar("myQuery.sql", nchars=99999) myData <- sqlQuery(db, sql, errors=TRUE) odbcClose(db) Functions sqlFetch(), sqlSave() and sqlUpdate(): read, write or update a table in an ODBC database 39 / 44
  • 45. Import Data from SAS Package foreign provides function read.ssd() for importing SAS datasets (.sas7bdat files) into R. library(foreign) # for importing SAS data # the path of SAS on your computer sashome <- "C:/Program Files/SAS/SASFoundation/9.4" filepath <- "./data" # filename should be no more than 8 characters, without extension fileName <- "dumData" # read data from a SAS dataset a <- read.ssd(file.path(filepath), fileName, sascmd=file.path(sashome, "sas.exe")) 40 / 44
  • 46. Import Data from SAS Package foreign provides function read.ssd() for importing SAS datasets (.sas7bdat files) into R. library(foreign) # for importing SAS data # the path of SAS on your computer sashome <- "C:/Program Files/SAS/SASFoundation/9.4" filepath <- "./data" # filename should be no more than 8 characters, without extension fileName <- "dumData" # read data from a SAS dataset a <- read.ssd(file.path(filepath), fileName, sascmd=file.path(sashome, "sas.exe")) Alternatives: function read.xport(): read a file in SAS Transport (XPORT) format RStudio : Environment Panel : Import Dataset from SPSS/SAS/Stata 40 / 44
  • 47. Contents Introduction to R RStudio Pipe Operations Data Objects Control Flow Parallel Computing Functions Data Import and Export Online Resources 41 / 44
  • 48. Online Resources Book titled R and Data Mining: Examples and Case Studies http://www.rdatamining.com/docs/RDataMining-book.pdf R Reference Card for Data Mining http://www.rdatamining.com/docs/RDataMining-reference-card.pdf Free online courses and documents http://www.rdatamining.com/resources/ RDataMining Group on LinkedIn (27,000+ members) http://group.rdatamining.com Twitter (3,300+ followers) @RDataMining 42 / 44
  • 50. How to Cite This Work Citation Yanchang Zhao. R and Data Mining: Examples and Case Studies. ISBN 978-0-12-396963-7, December 2012. Academic Press, Elsevier. 256 pages. URL: http://www.rdatamining.com/docs/RDataMining-book.pdf. BibTex @BOOK{Zhao2012R, title = {R and Data Mining: Examples and Case Studies}, publisher = {Academic Press, Elsevier}, year = {2012}, author = {Yanchang Zhao}, pages = {256}, month = {December}, isbn = {978-0-123-96963-7}, keywords = {R, data mining}, url = {http://www.rdatamining.com/docs/RDataMining-book.pdf} } 44 / 44