This document provides an introduction to exploring and visualizing data using the R programming language. It discusses the history and development of R, introduces key R packages like tidyverse and ggplot2 for data analysis and visualization, and provides examples of reading data, examining data structures, and creating basic plots and histograms. It also demonstrates more advanced ggplot2 concepts like faceting, mapping variables to aesthetics, using different geoms, and combining multiple geoms in a single plot.
4. A brief history of R
• In the beginning, there was S, developed at Bell Labs in the 1970s.
• S was owned and licensed by AT&T.
• In the 1990s, two statisticians at the University of Auckland, New Zealand (Ross Ihaka and Robert Gentleman), created a free, open-source reimplementation of S, called R.
• Many of R's unusual features exist because they came from S.
• R itself differs somewhat from S and has a very flexible syntax.
10. Your turn
Inspect the diamonds data set.
With diamonds, make a histogram of the carat variable. Experiment with different bin sizes. What patterns do you see?
Inspect the mpg data set.
With mpg, make a scatter plot showing the relationship between displ and hwy.
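A minimal sketch of one way to start these exercises, assuming ggplot2 is installed (it ships both the diamonds and mpg data sets). The binwidth values 0.5 and 0.01 are arbitrary choices for comparison, and the names p_coarse, p_fine and p_scatter are illustrative only.

```r
library(ggplot2)  # provides ggplot() plus the diamonds and mpg data sets

# Inspect each data set: column names, types and first values
str(diamonds)
str(mpg)

# Histogram of carat; binwidth is measured in carats
p_coarse <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.5)
p_fine <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.01)

# Scatter plot of engine displacement (displ) against highway mileage (hwy)
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

print(p_coarse)
print(p_fine)
print(p_scatter)
```

Wide bins smooth over detail that narrow bins expose, so comparing p_coarse with p_fine is the heart of the exercise.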
20. Your turn
What happens if you use shape instead of color?
Run ?geom_smooth to see the documentation. Then remove the confidence region from the model line.
What happens if you add a model line and map a variable to color?
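A sketch of the three experiments, assuming ggplot2 is installed. Using drv (drive type) for the color mapping is an arbitrary choice; any discrete column would do.

```r
library(ggplot2)

# 1. Shape instead of color. class has 7 levels, but the default shape
#    palette supplies only 6 shapes, so ggplot2 warns and the seventh
#    level gets no shape (those points are dropped when drawn).
p_shape <- ggplot(mpg, aes(x = displ, y = hwy, shape = class)) +
  geom_point()

# 2. se = FALSE (documented in ?geom_smooth) removes the confidence band.
p_smooth <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(se = FALSE)

# 3. Mapping color in the top-level aes() also groups the data,
#    so geom_smooth() fits a separate line for each drive type.
p_groups <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(se = FALSE)

print(p_shape)
print(p_smooth)
print(p_groups)
```

The third plot is the answer to the last question: one model line per color, because the color mapping is inherited by every layer.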
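34. Discrete vs. continuous mappings
Aesthetic | Discrete variable | Continuous variable
Color | Rainbow of colors | Gradient from light blue to dark blue
Size | Discrete size steps | Size grows smoothly with value (ggplot2 scales point area by default)
Shape | A different shape for each value | Not supported (ggplot2 raises an error)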
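A sketch that exercises each row of the table, assuming ggplot2 and its mpg data. cty (city mileage) stands in for a continuous variable and drv for a discrete one; the last mapping is expected to fail, so it is wrapped in tryCatch().

```r
library(ggplot2)

# Continuous variable on color: a light-to-dark blue gradient
p_cont <- ggplot(mpg, aes(x = displ, y = hwy, color = cty)) + geom_point()

# Discrete variable on color: one distinct hue per level
p_disc <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point()

# Continuous variable on size: point size grows with the value
p_size <- ggplot(mpg, aes(x = displ, y = hwy, size = cty)) + geom_point()

# Continuous variable on shape: ggplot2 stops with an error when the plot is built
p_bad <- ggplot(mpg, aes(x = displ, y = hwy, shape = cty)) + geom_point()
shape_result <- tryCatch(
  ggplot_build(p_bad),
  error = function(e) conditionMessage(e)
)
print(shape_result)
```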
39. Tidy data
Each variable is in a column.
Each observation is in a row.
40. Example of non-tidy data
subject sex cond1 cond2 cond3
1 M 7.9 12.3 10.7
2 F 6.3 10.6 11.1
3 F 9.5 13.1 13.8
4 M 11.5 13.4 12.9
Each row holds 3 observations (one per condition), so this table is not tidy.
41. Converting to tidy data
Not tidy:
subject sex cond1 cond2 cond3
1 M 7.9 12.3 10.7
2 F 6.3 10.6 11.1
3 F 9.5 13.1 13.8
4 M 11.5 13.4 12.9
Tidy:
subject sex condition value
1 M cond1 7.9
1 M cond2 12.3
1 M cond3 10.7
2 F cond1 6.3
2 F cond2 10.6
2 F cond3 11.1
3 F cond1 9.5
3 F cond2 13.1
3 F cond3 13.8
4 M cond1 11.5
4 M cond2 13.4
4 M cond3 12.9
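A minimal sketch of this conversion, assuming tidyr 1.0.0 or later, where pivot_longer() is the recommended way to stack columns. The names wide and tidy are illustrative.

```r
library(tibble)
library(tidyr)

# The wide (not tidy) table: one row per subject, one column per condition
wide <- tibble(
  subject = 1:4,
  sex     = c("M", "F", "F", "M"),
  cond1   = c(7.9, 6.3, 9.5, 11.5),
  cond2   = c(12.3, 10.6, 13.1, 13.4),
  cond3   = c(10.7, 11.1, 13.8, 12.9)
)

# Stack cond1:cond3 into a "condition" column (old column names)
# and a "value" column (the measurements): one observation per row
tidy <- pivot_longer(wide, cols = cond1:cond3,
                     names_to = "condition", values_to = "value")
print(tidy)
```

pivot_longer() walks each input row in turn, so the rows come out grouped by subject, matching the tidy table above.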
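44. Filter: get a subset of rows
library(dplyr)    # filter()
library(ggplot2)  # the mpg data set
# AND: both conditions must hold (a comma means the same as &)
filter(mpg, hwy > 30, class == "compact")
filter(mpg, hwy > 30 & class == "compact")
# OR: a row is kept if either condition holds
filter(mpg, hwy > 30 | class == "compact")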
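A quick check of the AND/OR behavior, assuming dplyr and ggplot2 are installed; the object names are illustrative.

```r
library(dplyr)
library(ggplot2)  # for the mpg data set

and_comma <- filter(mpg, hwy > 30, class == "compact")
and_amp   <- filter(mpg, hwy > 30 & class == "compact")
either    <- filter(mpg, hwy > 30 | class == "compact")

# Comma-separated conditions are combined with AND
print(identical(and_comma, and_amp))

# OR keeps every row that passes at least one test, so it is never smaller
print(c(and = nrow(and_amp), or = nrow(either)))
```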
53. Summarise
subject sex condition value
1 M cond1 7.9
1 M cond2 12.3
1 M cond3 10.7
2 F cond1 6.3
2 F cond2 10.6
2 F cond3 11.1
3 F cond1 9.5
3 F cond2 13.1
3 F cond3 13.8
4 M cond1 11.5
4 M cond2 13.4
4 M cond3 12.9
value
11.1
data %>%
  summarise(value = mean(value))
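A runnable version of this slide, assuming dplyr is installed (it re-exports tibble() and the %>% pipe). The table is rebuilt by hand from the tidy data above.

```r
library(dplyr)

# The tidy (long) table from the slides
data <- tibble(
  subject   = rep(1:4, each = 3),
  sex       = rep(c("M", "F", "F", "M"), each = 3),
  condition = rep(c("cond1", "cond2", "cond3"), times = 4),
  value     = c(7.9, 12.3, 10.7, 6.3, 10.6, 11.1,
                9.5, 13.1, 13.8, 11.5, 13.4, 12.9)
)

# summarise() collapses all 12 rows into a single row:
# the sum is 133.1, and 133.1 / 12 is about 11.09, shown as 11.1
overall <- data %>%
  summarise(value = mean(value))
print(overall)
```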
54. Group-wise summarise
subject sex condition value
1 M cond1 7.9
1 M cond2 12.3
1 M cond3 10.7
2 F cond1 6.3
2 F cond2 10.6
2 F cond3 11.1
3 F cond1 9.5
3 F cond2 13.1
3 F cond3 13.8
4 M cond1 11.5
4 M cond2 13.4
4 M cond3 12.9
subject value
1 10.3
2 9.3
3 12.1
4 12.6
data %>%
  group_by(subject) %>%
  summarise(value = mean(value))
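The same summary computed per subject, assuming dplyr is installed. group_by() splits the rows into one group per subject, and summarise() returns one row per group.

```r
library(dplyr)

data <- tibble(
  subject   = rep(1:4, each = 3),
  sex       = rep(c("M", "F", "F", "M"), each = 3),
  condition = rep(c("cond1", "cond2", "cond3"), times = 4),
  value     = c(7.9, 12.3, 10.7, 6.3, 10.6, 11.1,
                9.5, 13.1, 13.8, 11.5, 13.4, 12.9)
)

# One mean per subject, computed from that subject's 3 rows
by_subject <- data %>%
  group_by(subject) %>%
  summarise(value = mean(value))
print(by_subject)
```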
55. Group-wise summarise
subject sex condition value
1 M cond1 7.9
1 M cond2 12.3
1 M cond3 10.7
2 F cond1 6.3
2 F cond2 10.6
2 F cond3 11.1
3 F cond1 9.5
3 F cond2 13.1
3 F cond3 13.8
4 M cond1 11.5
4 M cond2 13.4
4 M cond3 12.9
sex condition value
F cond1 7.9
F cond2 11.85
F cond3 12.45
M cond1 9.7
M cond2 12.85
M cond3 11.8
data %>%
  group_by(sex, condition) %>%
  summarise(value = mean(value))
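A runnable check of the sex-by-condition means, assuming dplyr 1.0.0 or later. The .groups = "drop" argument is not on the slide; it only silences dplyr's message about regrouping and returns an ungrouped result.

```r
library(dplyr)

data <- tibble(
  subject   = rep(1:4, each = 3),
  sex       = rep(c("M", "F", "F", "M"), each = 3),
  condition = rep(c("cond1", "cond2", "cond3"), times = 4),
  value     = c(7.9, 12.3, 10.7, 6.3, 10.6, 11.1,
                9.5, 13.1, 13.8, 11.5, 13.4, 12.9)
)

# Each (sex, condition) cell averages two subjects,
# e.g. F cond1 = (6.3 + 9.5) / 2 = 7.9
by_sex_cond <- data %>%
  group_by(sex, condition) %>%
  summarise(value = mean(value), .groups = "drop")
print(by_sex_cond)
```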
56. Mutate
subject sex condition value
1 M cond1 7.9
1 M cond2 12.3
1 M cond3 10.7
2 F cond1 6.3
2 F cond2 10.6
2 F cond3 11.1
3 F cond1 9.5
3 F cond2 13.1
3 F cond3 13.8
4 M cond1 11.5
4 M cond2 13.4
4 M cond3 12.9
data %>%
  mutate(norm = value - mean(value))
subject sex condition value norm
1 M cond1 7.9 -3.2
1 M cond2 12.3 1.2
1 M cond3 10.7 -0.4
2 F cond1 6.3 -4.8
2 F cond2 10.6 -0.5
2 F cond3 11.1 0
3 F cond1 9.5 -1.6
3 F cond2 13.1 2
3 F cond3 13.8 2.7
4 M cond1 11.5 0.4
4 M cond2 13.4 2.3
4 M cond3 12.9 1.8
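A runnable version of this mutate, assuming dplyr is installed. Unlike summarise(), mutate() keeps all 12 rows; here mean(value) is the overall mean (about 11.09), so the new norm column sums to zero.

```r
library(dplyr)

data <- tibble(
  subject   = rep(1:4, each = 3),
  sex       = rep(c("M", "F", "F", "M"), each = 3),
  condition = rep(c("cond1", "cond2", "cond3"), times = 4),
  value     = c(7.9, 12.3, 10.7, 6.3, 10.6, 11.1,
                9.5, 13.1, 13.8, 11.5, 13.4, 12.9)
)

# Subtract the overall mean from every value
normalised <- data %>%
  mutate(norm = value - mean(value))
print(normalised)
```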
57. Group-wise mutate
subject sex condition value
1 M cond1 7.9
1 M cond2 12.3
1 M cond3 10.7
2 F cond1 6.3
2 F cond2 10.6
2 F cond3 11.1
3 F cond1 9.5
3 F cond2 13.1
3 F cond3 13.8
4 M cond1 11.5
4 M cond2 13.4
4 M cond3 12.9
data %>%
  group_by(subject) %>%
  mutate(norm = value - mean(value))
subject sex condition value norm
1 M cond1 7.9 -2.4
1 M cond2 12.3 2
1 M cond3 10.7 0.4
2 F cond1 6.3 -3
2 F cond2 10.6 1.3
2 F cond3 11.1 1.8
3 F cond1 9.5 -2.6
3 F cond2 13.1 1
3 F cond3 13.8 1.7
4 M cond1 11.5 -1.1
4 M cond2 13.4 0.8
4 M cond3 12.9 0.3
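A runnable version of the group-wise mutate, assuming dplyr is installed. After group_by(subject), mean(value) is each subject's own mean, so the norms sum to zero within every subject. The trailing ungroup() is an addition, not on the slide, that removes the grouping once it is no longer needed.

```r
library(dplyr)

data <- tibble(
  subject   = rep(1:4, each = 3),
  sex       = rep(c("M", "F", "F", "M"), each = 3),
  condition = rep(c("cond1", "cond2", "cond3"), times = 4),
  value     = c(7.9, 12.3, 10.7, 6.3, 10.6, 11.1,
                9.5, 13.1, 13.8, 11.5, 13.4, 12.9)
)

# Subtract each subject's mean from that subject's values
normalised <- data %>%
  group_by(subject) %>%
  mutate(norm = value - mean(value)) %>%
  ungroup()
print(normalised)
```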
60. Converting to tidy data
data:
subject sex cond1 cond2 cond3
1 M 7.9 12.3 10.7
2 F 6.3 10.6 11.1
3 F 9.5 13.1 13.8
4 M 11.5 13.4 12.9
# new key column: condition; new value column: value; columns to stack: cond1 to cond3
gather(data, condition, value, cond1:cond3)
subject sex condition value
1 M cond1 7.9
1 M cond2 12.3
1 M cond3 10.7
2 F cond1 6.3
2 F cond2 10.6
2 F cond3 11.1
3 F cond1 9.5
3 F cond2 13.1
3 F cond3 13.8
4 M cond1 11.5
4 M cond2 13.4
4 M cond3 12.9
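A runnable version of the gather() call, assuming tidyr and dplyr are installed. gather() still works but has been superseded by pivot_longer() since tidyr 1.0.0, so the sketch compares the two. Note that gather() stacks whole columns, so its raw output lists all cond1 rows first; sorting by subject gives the order shown in the table.

```r
library(dplyr)  # arrange(), and re-exports tibble()
library(tidyr)  # gather(), pivot_longer()

data <- tibble(
  subject = 1:4,
  sex     = c("M", "F", "F", "M"),
  cond1   = c(7.9, 6.3, 9.5, 11.5),
  cond2   = c(12.3, 10.6, 13.1, 13.4),
  cond3   = c(10.7, 11.1, 13.8, 12.9)
)

# gather(): key column "condition", value column "value", stacking cond1:cond3.
# Rows come out column by column: cond1 for subjects 1-4, then cond2, then cond3.
long_gather <- gather(data, condition, value, cond1:cond3)

# pivot_longer(): the newer equivalent; keeps each subject's rows together
long_pivot <- pivot_longer(data, cond1:cond3,
                           names_to = "condition", values_to = "value")

# After sorting by subject, the two results contain the same rows in the same order
same_rows <- all.equal(
  as.data.frame(arrange(long_gather, subject, condition)),
  as.data.frame(long_pivot)
)
print(same_rows)
```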