Introduction to ggplot2

Elegant Graphics for Data Analysis
Maik Röder
15.12.2011
RUGBCN and Barcelona Code Meetup


Data Analysis Steps
• Prepare data
  • e.g. using the reshape framework for restructuring data
• Plot data
  • e.g. using ggplot2 instead of base graphics and lattice
• Summarize the data and refine the plots
• Iterative process

ggplot2
grammar of graphics


Grammar
• Oxford English Dictionary:

• The fundamental principles or rules of an art or
science

• A book presenting these in methodical form.
(Now rare; formerly common in the titles of
books.)

• System of rules underlying a given language

• An abstraction which facilitates thinking, reasoning
and communicating


The grammar of graphics
• Move beyond named graphics (e.g. “scatterplot”)
• Gain insight into the deep structure that underlies statistical graphics
• Powerful and flexible system for
  • constructing abstract graphs (sets of points) mathematically
  • realizing physical representations as graphics by mapping aesthetic attributes (size, colour) to graphs
• Lacked an openly available implementation


Specification
Concise description of components of a graphic

• DATA - data operations that create variables from datasets; reshaping uses an algebra with operations
• TRANS - variable transformations
• SCALE - scale transformations
• ELEMENT - graphs and their aesthetic attributes
• COORD - a coordinate system
• GUIDE - one or more guides

Birth/Death Rate

Source: http://www.scalloway.org.uk/popu6.htm


Excess birth (vs. death) rates in selected countries

Source: The Grammar of Graphics, p. 13

Grammar of Graphics
The specification can be run in GPL, as implemented in SPSS

DATA: source("demographics")
DATA: longitude, latitude = map(source("World"))
TRANS: bd = max(birth - death, 0)
COORD: project.mercator()
ELEMENT: point(position(lon * lat), size(bd), color(color.red))
ELEMENT: polygon(position(longitude * latitude))
Source: The Grammar of Graphics, p. 13

Rearrangement of Components
Grammar of Graphics    Layered Grammar of Graphics
Data                   Defaults
Trans                    Data
                         Mapping
Element                Layer
                         Data
                         Mapping
                         Geom
                         Stat
Scale                    Position
Guide                  Scale
                       Coord
Coord                  Facet

Layered Grammar of Graphics
Implementation embedded in R using ggplot2

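library(ggplot2)
# world and demographics are assumed to be data frames that are already loaded
w <- world
d <- demographics
d <- transform(d, bd = pmax(birth - death, 0))
p <- ggplot(d, aes(lon, lat))
p <- p + geom_polygon(data = w)
p <- p + geom_point(aes(size = bd), colour = "red")
p <- p + coord_map(projection = "mercator")  # needs the mapproj package
p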

ggplot2
• Author: Hadley Wickham

• Open Source implementation of the layered grammar of graphics

• High-level R package for creating publication-quality statistical graphics

• Carefully chosen defaults following basic graphical design rules

• Flexible set of components for creating any type of graphics

ggplot2 installation
• In R console:
install.packages("ggplot2")
library(ggplot2)


qplot
• Quickly plot something with qplot
  • for exploring ideas interactively
• Accepts the same options as base plot(), converted to ggplot2
qplot(carat, price,
      data = diamonds,
      main = "Diamonds",
      asp = 1)


Exploring with qplot
First try:

qplot(carat, price,
data=diamonds)
Log transform using functions on the variables:
qplot(log(carat),
log(price),
data=diamonds)
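
As a complement to transforming the variables, here is a minimal sketch that transforms the scales instead. It assumes ggplot2 is loaded; scale_x_log10() and scale_y_log10() are not used in the original slides, and the object name p_log is made up for illustration.

```r
library(ggplot2)

# Transform the scales rather than the variables: the point pattern is the
# same as with log10(carat) and log10(price), but the axis labels stay in
# the original units (carats and dollars).
p_log <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()
p_log
```

The design choice here is readability: transformed scales keep the tick labels interpretable, while transformed variables show raw log values on the axes.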


from qplot to ggplot
qplot(carat, price,
data=diamonds,
main = "Diamonds",
asp = 1)

p <- ggplot(diamonds, aes(carat, price))
p <- p + geom_point()
# opts() was removed from later ggplot2 releases; labs() and theme() replace it
p <- p + labs(title = "Diamonds") +
  theme(aspect.ratio = 1)
p

Data and mapping

• If you need to flexibly restructure and aggregate data beforehand, use reshape

• Data is considered an independent concern
• Need a mapping from variables to aesthetics
  • weight => x, height => y, age => size
• Mappings are defined in scales
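
The weight/height/age mapping above can be sketched as follows. The data frame people and its values are hypothetical, invented only for illustration; the mapping itself is declared with aes().

```r
library(ggplot2)

# Hypothetical data, made up purely for this example
people <- data.frame(
  weight = c(55, 68, 72, 80, 91),
  height = c(160, 171, 168, 182, 177),
  age    = c(23, 35, 41, 29, 52)
)

# aes() declares which variable feeds which aesthetic:
# weight => x, height => y, age => size
p_map <- ggplot(people, aes(x = weight, y = height, size = age)) +
  geom_point()
p_map
```

Because the data is a separate argument, the same mapping can be reused with another data frame that has the same column names.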

Statistical Transformations
• a stat transforms data
• can add new variables to a dataset
• that can be used in aesthetic mappings
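
A minimal sketch of a stat adding new variables, assuming a ggplot2 version that provides after_stat() (3.3.0 or later). The binning stat behind geom_histogram() computes count and density, which are not columns of diamonds; the names p_hist, computed and p_density are made up for illustration.

```r
library(ggplot2)

p_hist <- ggplot(diamonds, aes(carat)) +
  geom_histogram(binwidth = 0.5)

# layer_data() returns the layer's data after the stat has run;
# count and density were computed by the stat, one row per bin
computed <- layer_data(p_hist)
head(computed[, c("x", "count", "density")])

# A computed variable can be used in an aesthetic mapping via after_stat()
p_density <- ggplot(diamonds, aes(carat)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.5)
p_density
```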


stat_smooth
• Fits a smoother to the data
• Displays a smooth and its standard error
ggplot(diamonds, aes(carat, price)) +
geom_point() + geom_smooth()
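
The smoother can also be chosen explicitly. The sketch below assumes a straight-line fit is acceptable for illustration (it is not the default smoother for a dataset of this size) and uses the made-up names p_lm and fit.

```r
library(ggplot2)

# Fit a linear model instead of the default smoother; se = TRUE draws the
# confidence band around the fitted line
p_lm <- ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)
p_lm

# The smoothing stat adds the fitted value (y) and the band (ymin, ymax)
fit <- layer_data(p_lm, 2)
head(fit[, c("x", "y", "ymin", "ymax")])
```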


Geometric Object
• Control the type of plot
• A geom can only display certain aesthetics
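
A short sketch of both points, using the built-in mtcars data (an assumption; the slides use diamonds). Swapping the geom changes the type of plot, and each geom object can report which aesthetics it understands; the names base, p_points and p_lines are made up for illustration.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg))

# Same data and mapping, different geoms: scatterplot versus line plot
p_points <- base + geom_point(aes(shape = factor(cyl)))
p_lines  <- base + geom_line(aes(linetype = factor(cyl)))

# Aesthetics each geom can display
point_aes <- GeomPoint$aesthetics()
line_aes  <- GeomLine$aesthetics()
point_aes
line_aes
```

Points have a shape but no linetype, and lines have a linetype but no shape, which is why the two layers above map cyl to different aesthetics.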


geom_histogram

• Distribution of carats shown in a histogram

ggplot(diamonds, aes(carat)) +
geom_histogram()


Position adjustments
• Tweak positioning of geometric objects
• Avoid overlaps


position_jitter

• Avoid overplotting by jittering points
x <- c(0, 0, 0, 0, 0)
y <- c(0, 0, 0, 0, 0)
overplotted <- data.frame(x, y)
ggplot(overplotted, aes(x, y)) +
  geom_point(position = position_jitter(width = 0.1, height = 0.1))
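
To see what the adjustment does to the data, here is a sketch that inspects the jittered positions. It assumes a ggplot2 version in which position_jitter() accepts a seed argument; the seed value 42 and the names pts, p_jit and jittered are arbitrary choices for illustration.

```r
library(ggplot2)

# Five identical points that would be drawn on top of each other
pts <- data.frame(x = rep(0, 5), y = rep(0, 5))

# A fixed seed makes the random jitter reproducible between renders
p_jit <- ggplot(pts, aes(x, y)) +
  geom_point(position = position_jitter(width = 0.1, height = 0.1, seed = 42))
p_jit

# Positions after the adjustment: each point moved by at most 0.1
# in either direction along each axis
jittered <- layer_data(p_jit)
jittered[, c("x", "y")]
```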

Scales
• Control mapping from data to aesthetic
attributes
• One scale per aesthetic
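
As an illustration of a non-position scale, the sketch below controls how the values of one variable become colours. The mtcars data, the colour names and the legend title are assumptions chosen for the example.

```r
library(ggplot2)

# The colour aesthetic gets its own scale: each cyl value is assigned
# a specific colour, and the scale also names the legend
p_sc <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  scale_colour_manual(
    values = c("4" = "steelblue", "6" = "orange", "8" = "firebrick"),
    name = "Cylinders"
  )
p_sc
```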


scale_x_continuous
scale_y_continuous
x <- c(0, 0, 0, 0, 0)
y <- c(0, 0, 0, 0, 0)
overplotted <- data.frame(x, y)
ggplot(overplotted, aes(x, y)) +
  geom_point(position = position_jitter(width = 0.1, height = 0.1)) +
  scale_x_continuous(limits = c(-1, 1)) +
  scale_y_continuous(limits = c(-1, 1))


Coordinate System
• Maps the position of objects into the plane
• Affects all position variables simultaneously
• Changes the appearance of geoms (unlike scales)
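
A small sketch of how a coordinate system changes the appearance of a geom without touching the data. It uses mtcars and the functions coord_flip() and coord_polar(), which are not shown in the original slides; the object names are made up.

```r
library(ggplot2)

bars <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

# Same layer, different coordinate systems
p_flip  <- bars + coord_flip()   # bars drawn horizontally
p_polar <- bars + coord_polar()  # bars drawn as wedges
p_flip
p_polar

# The computed counts are identical; only the drawing differs
layer_data(bars)$count
layer_data(p_polar)$count
```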


coord_map
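library(ggplot2)
library(maps)
# Outline of New Zealand as x/y coordinates (named nz to avoid shadowing map())
nz <- map("nz", plot = FALSE)[c("x", "y")]
m <- data.frame(nz)
n <- qplot(x, y, data = m, geom = "path")
n
# Columns must be named x and y so the point layer can inherit the mapping
d <- data.frame(x = 0, y = 0)
n + geom_point(data = d, colour = "red")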


Faceting
• lay out multiple plots on a page
• split data into subsets
• plot subsets into different panels


Facet Types
• 2D grid of panels (facet_grid)
• 1D ribbon of panels wrapped into 2D (facet_wrap)


Faceting

aesthetics <- aes(carat, ..density..)
p <- ggplot(diamonds, aesthetics)
p <- p + geom_histogram(binwidth = 0.2)
p + facet_grid(clarity ~ cut)
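
For comparison with the grid, here is a sketch of the wrapped layout using facet_wrap(). The choice of ncol = 3 and the name p_wrap are arbitrary assumptions for illustration.

```r
library(ggplot2)

# One panel per level of cut, laid out as a ribbon wrapped into three columns
p_wrap <- ggplot(diamonds, aes(carat)) +
  geom_histogram(binwidth = 0.2) +
  facet_wrap(~ cut, ncol = 3)
p_wrap
```

A wrapped ribbon tends to suit a single faceting variable with many levels, whereas a grid suits two crossed variables.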


Faceting Formula
no faceting                                       . ~ .
single row, multiple columns                      . ~ a
single column, multiple rows                      b ~ .
multiple rows and columns                         a ~ b
multiple variables in rows and/or columns         . ~ a + b
                                                  a + b ~ .
                                                  a + b ~ c + d
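
The formulas can be tried on a small dataset. This sketch uses mtcars with cyl, am and vs standing in for the placeholder variables (an assumption made for the example); in facet_grid() the left side of the formula gives the rows and the right side gives the columns.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

p_cols  <- base + facet_grid(. ~ cyl)        # single row, one column per cyl
p_rows  <- base + facet_grid(am ~ .)         # single column, one row per am
p_grid  <- base + facet_grid(am ~ cyl)       # rows by am, columns by cyl
p_multi <- base + facet_grid(am + vs ~ cyl)  # rows by combinations of am and vs

p_grid
```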


Scales in Facets
facet_grid(. ~ cyl, scales="free_x")

scales value    free
fixed           -
free            x, y
free_x          x
free_y          y
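
To check the effect of the scales argument, the sketch below compares the x limits of individual panels. It relies on layer_scales(), which returns the scales used by the panel in a given row and column; mtcars and the object names are assumptions made for the example.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

p_fixed  <- base + facet_grid(. ~ cyl)                      # default: fixed
p_free_x <- base + facet_grid(. ~ cyl, scales = "free_x")   # x free per column

# x limits of the first panel (row 1, column 1, i.e. cyl == 4)
fixed_limits <- layer_scales(p_fixed, 1, 1)$x$get_limits()
free_limits  <- layer_scales(p_free_x, 1, 1)$x$get_limits()
fixed_limits  # covers wt for all cars
free_limits   # covers wt for four-cylinder cars only
```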

Layers
• Iteratively update a plot
  • change a single feature at a time
• Think about the high-level aspects of the plot in isolation
• Instead of choosing a static type of plot, create new types of plots on the fly
• Cure against immobility
• Developers can easily develop new layers without affecting other layers
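
The iterative workflow can be sketched as a sequence of small additions, each stored under its own name so that earlier steps stay available. The mtcars data and the particular layers are assumptions chosen for illustration.

```r
library(ggplot2)

# Start with data and mapping only: no layers yet
p0 <- ggplot(mtcars, aes(wt, mpg))

# Add one feature at a time
p1 <- p0 + geom_point()                                 # raw data
p2 <- p1 + geom_smooth(method = "lm", formula = y ~ x)  # add a trend line
p3 <- p2 + facet_grid(. ~ am)                           # split by transmission

p3
```

Each step returns a new plot object, so p1 and p2 are unchanged and can still be printed or extended in a different direction.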

Hierarchy of defaults
Omitted layer    Default chosen by layer
Stat             Geom
Geom             Stat
Mapping          Plot default
Coord            Cartesian coordinates
Scale            Chosen depending on aesthetic and type of variable
Position         Linear scaling for continuous variables; integers for categorical variables
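
Some of these defaults can be inspected on layer and plot objects. This is a sketch that peeks at the stat, geom and coordinates fields of the objects returned by current ggplot2 releases, which is an assumption about their structure rather than a documented interface.

```r
library(ggplot2)

# A geom supplies a default stat: histograms bin the data
hist_layer <- geom_histogram()
class(hist_layer$stat)[1]

# A stat supplies a default geom: binned data is drawn as bars
bin_layer <- stat_bin()
class(bin_layer$geom)[1]

# With no coord given, a plot uses Cartesian coordinates
empty_plot <- ggplot()
class(empty_plot$coordinates)[1]
```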


Thanks!
• Visit the ggplot2 homepage:
• http://had.co.nz/ggplot2/
• Get the ggplot2 book:
• http://amzn.com/0387981403
• Get the Grammar of Graphics book from
Leland Wilkinson:
• http://amzn.com/0387245448
