Ggplot2 1outof3

Learnggplot2
AsKungfu Skill
A K
f Skills
Givenby
KaiXiao(DataSciencetist)
VivianZhang(Co-founder&CTO)
Vi i Zh
(C f
d & CTO)
Contact:vivian.zhang@supstat.com
Contact: vivian zhang@supstat com

• I： Point
• II： Bar
• III：Histogram
g
• IV：Line
• V： Tile
• VI：Map

Introduction
• ggplot2 is a plotting system for R
• based on the《The Grammar of
Graphics》
• which tries to take the good parts
of base and lattice graphics and
none of the bad parts
none of the bad parts
• It takes care of many of the fiddly
details that make plotting a hassle
details that make plotting a hassle
• It becomes easy to produce
complex multi‐layered graphics
p
y
g p

Whyweloveggplot2?
Why we love ggplot2?
•

controltheplotasabstractlayersandmakecreativitybecomereality；

•

getusedtostructuralthinking；
d
l hi ki

•

getbeautifulgraphicswhileavoidingcomplicateddetails
1973murdercasesinUSA

7BasicConcepts
7 Basic Concepts
• Mapping
• Scale
• Geometric
Stat st cs
• Statistics
• Coordinate
• L
Layer
• Facet

Mapping
Mappingcontrolsrelationsbetweenvariables
M
i
t l
l ti
b t
i bl

Scale
Scalewillpresentmappingoncoordinatescales.
Scale will present mapping on coordinate scales
ScaleandMappingiscloselyrelatedconcepts.

Geometric
Geom means the graphical elements such as
meansthegraphicalelements,suchas
points,linesandpolygons.

Statistics
Statenablesustocalculateanddostatistical
Stat enables us to calculate and do statistical
analysisbased,suchasaddingaregressionline.

Stat

Coordinate
Cood will affect how we observe graphical
willaffecthowweobservegraphical
elements.Transformationofcoordinatesisuseful.

Stat

Coord

Layer
Component:data,mapping,geom,stat
Usinglayerwillallowuserstoestablishplotsstep
i
l
ill ll
bli h l
bystep.Itbecomemucheasiertomodifyaplot.

Facet
Facetsplitsdataintogroupsanddraweach
groupseparately.Usually,thereisaorder.

SkillI：Point
Skill I：Point

Sampledata mpg
Sample data--mpg
• Fuel economy data from 1999 and 2008 for 38 popular
y
p p
models of car
•
•
•
•
•
•
•

Details
D il
Displ :               engine displacement, in litres
Cyl:                    number of cylinders
Cyl:
number of cylinders
Trans:                type of transmission
Drv:                   front‐wheel, rear wheel drive, 4wd
,
,
Cty:                    city miles per gallon
Hwy:                  highway miles per gallon

>library(ggplot2)
>str(mpg)
'data.frame':

234obs.of14variables:

$manufacturer:Factorw/15levels"audi","chevrolet",..:
$model:Factorw/38levels"4runner4wd",..:
$displ

:num 1.81.8222.82.83.11.81.82...

$year:int 1999 1999 2008 2008 1999 1999 2008 1999
$ year
: int 19991999200820081999199920081999
$cyl

:int 4444666444...

$trans:Factorw/10levels"auto(av)","auto(l3)",..:
$drv
$d

:Factorw/3levels"4","f","r":
3l
l
f

$cty

:int 18212021161818181620...

$hwy
$fl

:int 29293130262627262528...
:Factorw/5levels"c","d","e","p",..:

$class:Factorw/7levels"2seater","compact",..:

aesthetics
p < ggplot(data mpg, mapping aes(x cty, y hwy))
p <‐ ggplot(data=mpg mapping=aes(x=cty y=hwy))
p + geom_point()

> summary(p)
data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy,
fl, class [234x11]
fl l [
]
mapping: x = cty, y = hwy
faceting: facet_null()
g
_
()
> summary(p+geom_point())
data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy,
data: manufacturer model displ year cyl trans drv cty hwy
fl, class [234x11]
f
i f
ll()
‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐
g
geom_point: na.rm = FALSE
_p
stat_identity:
position_identity: (width = NULL, height = NULL)

p + geom_point(color red4 size=3)
p + geom point(color='red4',size 3)

#addonemorelayer--color
p
p<- ggplot(mpg,aes(x=cty,y=hwy,colour=factor(year)))
ggp ( pg,
(
y,y
y,
(y )))
p+geom_point()

#addonemorestat(loess:localpartialpolynomialregression)
> p + geom_point() + stat_smooth()

p <‐ ggplot(data=mpg, mapping=aes(x=cty,y=hwy))
p g
p + geom_point(aes(colour=factor(year)))+
_p
( (
(y )))
stat_smooth()

Twoequallywaystodraw
q
y
y
p <‐ ggplot(mpg, aes(x=cty,y=hwy))
p <‐ ggplot(mpg aes(x=cty y=hwy))
p + geom_point(aes(colour=factor(year)))+
()
stat_smooth()

d <‐ ggplot() +
()
geom_point(data=mpg, aes(x=cty, y=hwy, colour=factor(year)))+
stat_smooth(data=mpg, aes(x=cty, y=hwy))
print(d)

Besidethe“whitepaper”canvas,wewillfindgeom andstat
canvas.
> summary(d)
data: [0x0]
[ ]
‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐
mapping: x = cty, y = hwy, colour = factor(year)
mapping: x = cty y = hwy colour = factor(year)
geom_point: na.rm = FALSE
stat_identity:
pp g
y, y
y
geom_smooth:
stat_smooth: method = auto, formula = y ~ x, se = TRUE,
n = 80, fullrange = FALSE, level = 0.95, na.rm = FALSE
n = 80 fullrange = FALSE level = 0 95 na rm = FALSE

#Usingscale()function,wecancontrolcolorofscale.
p+geom_point(aes(colour=factor(year)))+
stat_smooth()+
h()
scale_color_manual(values=c('steelblue','red4'))

#Wecanmap“displ”to thesizeofpoint
p+geom_point(aes(colour=factor(year),size=displ))+
stat_smooth()+
h()
scale_color_manual(values=c('steelblue','red4'))

# We solve the problem with overlapping and point being too small
p + geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")+
stat_smooth()+
scale_color_manual(values =c('steelblue','red4'))+
scale_size_continuous(range = c(4, 10))

# We change the coordinate system.
p + geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")+
stat_smooth()+
scale_size_continuous(range = c(4, 10)) + coord_flip()

p + geom_point(aes(colour=factor(year),size=displ),
alpha=0.5,position = "jitter")+
stat_smooth()+
()
scale_size_continuous(range = c(4, 10)) + coord_polar()

p + geom_point(aes(colour=factor(year),size=displ),
alpha=0.5,position = "jitter") + stat_smooth()+
(
(
,
))
scale_size_continuous(range = c(4, 10))+
coord_cartesian(xlim = c(15, 25), ylim=c(15,40))

#Usingfacet()function,wenowsplitdataanddrawthembygroup
p+geom_point(aes(colour=class,size=displ),
alpha=0.5,position="jitter")+
stat_smooth()+
h
scale_size_continuous(range=c(4,10))+
facet_wrap(~year,ncol=1)

#Addplotnameandspecifyallinformationyouwanttoadd
p<- ggplot(mpg,aes(x=cty,y=hwy))
p+geom_point(aes(colour=class,size=displ),
alpha=0.5,position="jitter")+stat_smooth()+
scale_size_continuous(range=c(4,10))+
l i
ti
(
(4 10))
facet_wrap(~year,ncol=1)+opts(title='modelofcarandmpg')+
labs(y='drivingdistancepergallononhighway',x='drivingdistancepergallononcityroad',
size='displacement',colour ='model')

#scatterplotfordiamonddataset
p
p<- ggplot(diamonds,aes(carat,price))
ggp (
,
(
,p
))
p+geom_point()

#usetransparencyandsmallsizepoints
p g
p+geom_point(size=0.1,alpha=0.1)
_p
(
, p
)

#usebincharttoobserveintensityofpoints
p
p+stat_bin2d(bins=40)
_
(
)

#estimatedatadentisy
p+stat_density2d(aes(fill=..level..),geom="polygon")+
coord_cartesian(xlim =c(0,1.5),ylim=c(0,6000))
coord cartesian(xlim = c(0 1 5) ylim=c(0 6000))

SkillIII：Histogram
Skill III：Histogram

SkillIV：Line
Skill IV：Line

Resources
http://learnr.wordpress.com
Redrawallthelatticegraph
Redraw all the lattice graph
byggplot2

Resources
Alltheexamplesaredoneby
ggplot2.
ggplot2

Resources
• http://wiki stdout org/rcookbook/Graphs/
http://wiki.stdout.org/rcookbook/Graphs/
• http://r-blogger.com
• htt //St k
http://Stackoverflow.com
fl
• http://xccds1977.blogspot.com
• http://r-ke.info/
• http://www.youtube.com/watch?v=vnVJJYi1
mbw

Thankyou!Comebackformore!
Signupat:www.meetup.com/nyc-open-data
Givefeedbackat:www.bit.ly/nycopen

Ggplot2 1outof3

Recommended

Recommended

More Related Content

Similar to Ggplot2 1outof3

Similar to Ggplot2 1outof3 (8)

More from Vivian S. Zhang

More from Vivian S. Zhang (20)

Recently uploaded

Recently uploaded (20)

Ggplot2 1outof3