3. Importing
Plain text files: the workhorse function read.table()
read.table("path_to_file",
header = TRUE, # first row as column names
sep = ",", # column separator
stringsAsFactors = FALSE) # do not convert text to factors
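A minimal sketch of the call above, assuming a small comma-separated file. The file contents and the names csv_path and dat are made up for illustration; the file is written to a temporary location so the example runs on its own.

```r
# Hypothetical data: write a tiny CSV to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ann,90", "Bob,85"), csv_path)

dat <- read.table(csv_path,
                  header = TRUE,             # first row as column names
                  sep = ",",                 # column separator
                  stringsAsFactors = FALSE)  # keep text as character

str(dat)  # 2 observations of 2 variables: name (chr) and score (int)
```

With stringsAsFactors = FALSE the name column stays a character vector, which is usually what you want before cleaning the data.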
14. Indexing/Subsetting
Atomic vector:
## zero
x[0] # returns zero-length vector
> numeric(0)
## character vector (subsetting using names)
y <- setNames(x, letters[1:4])
y[c("c", "a", "d")]
> c a d
> 3 2 5
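The two cases above can be run end to end as follows. This is only a sketch: the values of x are an assumption, chosen so that the name-based result matches the output shown (the second element in particular is a guess).

```r
# Assumed values, not taken from the slides.
x <- c(2, 4, 3, 5)

empty <- x[0]                    # zero index: a zero-length numeric vector
y <- setNames(x, letters[1:4])   # attach the names "a" to "d"
picked <- y[c("c", "a", "d")]    # elements returned in the order requested
missing_name <- y["z"]           # a name that does not exist yields NA

print(empty)
print(picked)
print(missing_name)
```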
15. Indexing/Subsetting
List:
Subsetting a list works in the same way as subsetting an atomic vector.
Using [ will always return a list; [[ and $ pull out the components of the list.
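A short sketch of the difference between the three operators. The list lst and its contents are hypothetical and exist only for this illustration.

```r
# Hypothetical list used only for illustration.
lst <- list(a = 1:3, b = "text", c = c(TRUE, FALSE))

sub  <- lst["a"]    # [ keeps the list wrapper: a list of length 1
elem <- lst[["a"]]  # [[ extracts the component itself: an integer vector
same <- lst$a       # $ extracts the component by name as well

class(sub)             # "list"
class(elem)            # "integer"
identical(elem, same)  # TRUE
```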
17. Indexing/Subsetting
Data frames:
Data frames possess the characteristics of both lists and matrices: if you
subset with a single vector, they behave like lists; if you subset with two
vectors, they behave like matrices.
dtf <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])
dtf
>   x y z
> 1 1 3 a
> 2 2 2 b
> 3 3 1 c
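A brief sketch of the two behaviours described on this slide, using the same dtf. The result names cols and block are hypothetical.

```r
dtf <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])

# One index vector: list-style subsetting selects columns.
cols <- dtf[c("x", "z")]

# Two index vectors: matrix-style subsetting selects rows, then columns.
block <- dtf[1:2, c("x", "z")]

dim(cols)   # 3 rows, 2 columns
dim(block)  # 2 rows, 2 columns
```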
20. Indexing/Subsetting
Data frames:
By default, if the result is a single column, [ returns a vector instead of a data frame; use drop = FALSE to keep a data frame.
str(dtf[, "x"]) # simplifying
> int [1:3] 1 2 3
str(dtf[, "x", drop = FALSE]) # preserving
> 'data.frame': 3 obs. of 1 variable:
> $ x: int 1 2 3
25. Reshaping
reshape2 is a package written by Hadley Wickham that makes it easy to transform
data between wide and long formats.
Wide-format data:
day        storeA storeB storeC
2017/03/22     12      2     34
2017/03/23      1     11      5
Long-format data:
day        stores sales
2017/03/22 storeA    12
2017/03/22 storeB     2
2017/03/22 storeC    34
2017/03/23 storeA     1
2017/03/23 storeB    11
2017/03/23 storeC     5
26. Reshaping
melt() takes wide-format data and melts it into long-format data.
long <- melt(dtf, id.vars = "day",
variable.name = "stores", value.name = "sales")
long
>          day stores sales
> 1 2017/03/22 storeA    12
> 2 2017/03/23 storeA     1
> 3 2017/03/22 storeB     2
> 4 2017/03/23 storeB    11
> 5 2017/03/22 storeC    34
> 6 2017/03/23 storeC     5
27. Reshaping
dcast() takes long-format data and casts it into wide-format data.
wide <- dcast(long, day ~ stores, value.var = "sales")
wide
>          day storeA storeB storeC
> 1 2017/03/22     12      2     34
> 2 2017/03/23      1     11      5
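The melt() and dcast() calls can be tried as a round trip. The sketch below assumes the reshape2 package is installed and builds the wide store-sales table by hand; the names wide_in and wide_out are hypothetical.

```r
library(reshape2)  # assumed to be installed

# Wide-format input, typed in from the example table.
wide_in <- data.frame(day    = c("2017/03/22", "2017/03/23"),
                      storeA = c(12, 1),
                      storeB = c(2, 11),
                      storeC = c(34, 5))

# Wide to long: one row per (day, store) pair.
long <- melt(wide_in, id.vars = "day",
             variable.name = "stores", value.name = "sales")

# Long back to wide: one column per store.
wide_out <- dcast(long, day ~ stores, value.var = "sales")

print(long)
print(wide_out)
```

The column named in id.vars stays as it is; every other column is stacked into the stores/sales pair.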
29. Merging
Binding 2 data frames vertically:
## The two data frames must have the same variables,
## but they do not have to be in the same order.
total <- rbind(dtf_A, dtf_B)
Binding 2 data frames horizontally:
## The two data frames must have the same number of rows,
## sorted in the same order (rows are paired by position).
total <- cbind(dtf_A, dtf_B)
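A small sketch of both bindings. The contents of dtf_A and dtf_B are made up for illustration, and dtf_C and combined are hypothetical names.

```r
# Hypothetical data: same variables, different column order.
dtf_A <- data.frame(id = 1:2, score = c(90, 85))
dtf_B <- data.frame(score = c(70, 60), id = 3:4)

# rbind() matches data frame columns by name, not by position.
total <- rbind(dtf_A, dtf_B)

# cbind() pairs rows by position, so the row counts must agree.
dtf_C <- data.frame(grade = c("A", "B"))
combined <- cbind(dtf_A, dtf_C)

print(total)     # 4 rows, columns id and score
print(combined)  # 2 rows, columns id, score and grade
```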
30. Merging
Question: Given
a <- data.frame(x1 = c("A", "B", "C"), x2 = c(1, 2, 3))
b <- data.frame(x1 = c("A", "B", "D"), x3 = c(TRUE, FALSE, TRUE))
Which expression is used to get the following result?
>   x1 x2    x3
> 1  A  1  TRUE
> 2  B  2 FALSE
> 3  C  3    NA
a. merge(a, b, by = "x1", all = TRUE)
b. merge(a, b, by = "x1", all.x = TRUE)
c. merge(a, b, by = "x1", all = FALSE)
d. merge(a, b, by = "x1", all.y = TRUE)
40. Repeating/Looping
More general sequences:
The step in sequences created by : is always 1 (or -1 when counting down).
seq() makes it possible to generate more general sequences.
seq(from,
to,
by, # step size
length.out) # length of final vector
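A few concrete calls, as a sketch of how the arguments combine. The variable names are hypothetical.

```r
by_step   <- seq(from = 0, to = 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00
by_length <- seq(from = 1, to = 10, length.out = 4)  # 1 4 7 10
downward  <- seq(from = 10, to = 1, by = -3)         # 10 7 4 1

print(by_step)
print(by_length)
print(downward)
```

Typically you give either by or length.out together with from and to, and seq() works out the other one.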
47. Repeating/Looping
if-then-else example:
x <- c(-4, 9)
if (x > 0) y <- sqrt(x) else y <- x^2
> Warning in if (x > 0) y <- sqrt(x) else y <- x^2: the condition has length
> > 1 and only the first element will be used
print(y)
> [1] 16 81
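if expects a single TRUE or FALSE, which is why only the first element of x > 0 decides the branch above. From R 4.2.0 onward, a condition of length greater than one is an error rather than a warning. A minimal sketch of an element-wise alternative using ifelse(), which is not part of the original slide:

```r
x <- c(-4, 9)

# ifelse() evaluates both branches for the whole vector, so pmax() guards
# sqrt() against negative inputs and avoids a "NaNs produced" warning.
y <- ifelse(x > 0, sqrt(pmax(x, 0)), x^2)

print(y)  # 16 for the negative element (squared), 3 for the positive one (root)
```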