Learnggplot2
AsKungfu Skill
A K
f Skills
Givenby
KaiXiao(DataSciencetist)
VivianZhang(Co-founder&CTO)
Vi i Zh
(C f...
• I: Point
• II: Bar
• III:Histogram
g
• IV:Line
• V: Tile
• VI:Map
Introduction
• ggplot2 is a plotting system for R
• based on the《The Grammar of 
Graphics》
• which tries to take the good ...
Whyweloveggplot2?
Why we love ggplot2?
•

controltheplotasabstractlayersandmakecreativitybecomereality;

•

g...
7BasicConcepts
7 Basic Concepts
• Mapping
• Scale
• Geometric
Stat st cs
• Statistics
• Coordinate
• L
Layer
• Facet
Mapping
Mappingcontrolsrelationsbetweenvariables
M
i
t l
l ti
b t
i bl
Scale
Scalewillpresentmappingoncoordinatescales.
Scale will present mapping on coordinate scales
ScaleandMapping...
Geometric
Geom means the graphical elements such as
meansthegraphicalelements,suchas
points,linesandpolygons.
Statistics
Statenablesustocalculateanddostatistical
Stat enables us to calculate and do statistical
analysisbased...
Coordinate
Cood will affect how we observe graphical
willaffecthowweobservegraphical
elements.Transformationofcoo...
Layer
Component:data,mapping,geom,stat
Usinglayerwillallowuserstoestablishplotsstep
i
l
ill ll
bli h l
byst...
Facet
Facetsplitsdataintogroupsanddraweach
groupseparately.Usually,thereisaorder.
7BasicConcepts
7 Basic Concepts
• Mapping
• Scale
• Geometric
Stat st cs
• Statistics
• Coordinate
• L
Layer
• Facet
SkillI:Point
Skill I:Point
Sampledata mpg
Sample data--mpg
• Fuel economy data from 1999 and 2008 for 38 popular 
y
p p
models of car
•
•
•
•
•
•
•
...
>library(ggplot2)
>str(mpg)
'data.frame':

234obs.of14variables:

$manufacturer:Factorw/15levels"audi","chevr...
aesthetics
p < ggplot(data mpg, mapping aes(x cty, y hwy))
p <‐ ggplot(data=mpg mapping=aes(x=cty y=hwy))
p + geom_point()
> summary(p) 
data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, 
fl, class [234x11] 
fl l [
]
mapping: x ...
p + geom_point(color red4 size=3)
p + geom point(color='red4',size 3)
#addonemorelayer--color
p
p<- ggplot(mpg,aes(x=cty,y=hwy,colour=factor(year)))
ggp ( pg,
(
y,y
y,
(y )))
p+geom_poi...
#addonemorestat(loess:localpartialpolynomialregression)
> p + geom_point() + stat_smooth()
p <‐ ggplot(data=mpg, mapping=aes(x=cty,y=hwy))
p g
p + geom_point(aes(colour=factor(year)))+
_p
( (
(y )))
stat_smooth()
Twoequallywaystodraw
q
y
y
p  <‐ ggplot(mpg, aes(x=cty,y=hwy))
p <‐ ggplot(mpg aes(x=cty y=hwy))
p  + geom_point(aes(c...
Besidethe“whitepaper”canvas,wewillfindgeom andstat
canvas.
> summary(d)
data: [0x0]
[ ]
faceting: facet_null() 
‐...
#Usingscale()function,wecancontrolcolorofscale.
p+geom_point(aes(colour=factor(year)))+
stat_smooth()+
h()
scal...
#Wecanmap“displ”to thesizeofpoint
p+geom_point(aes(colour=factor(year),size=displ))+
stat_smooth()+
h()
scale_col...
# We solve the problem with overlapping and point being too small
p + geom_point(aes(colour=factor(year),size=displ), alph...
# We change the coordinate system.
p + geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")+
sta...
p + geom_point(aes(colour=factor(year),size=displ),
alpha=0.5,position = "jitter")+
stat_smooth()+
()
scale_color_manual(v...
p + geom_point(aes(colour=factor(year),size=displ),
alpha=0.5,position = "jitter")  +    stat_smooth()+
scale_color_manual...
#Usingfacet()function,wenowsplitdataanddrawthembygroup
p+geom_point(aes(colour=class,size=displ),
alpha=0.5,...
#Addplotnameandspecifyallinformationyouwanttoadd
p<- ggplot(mpg,aes(x=cty,y=hwy))
p+geom_point(aes(colour=c...
#scatterplotfordiamonddataset
p
p<- ggplot(diamonds,aes(carat,price))
ggp (
,
(
,p
))
p+geom_point()
#usetransparencyandsmallsizepoints
p g
p+geom_point(size=0.1,alpha=0.1)
_p
(
, p
)
#usebincharttoobserveintensityofpoints
p
p+stat_bin2d(bins=40)
_
(
)
#estimatedatadentisy
p+stat_density2d(aes(fill=..level..),geom="polygon")+
coord_cartesian(xlim =c(0,1.5),ylim=...
SkillII:Bar
Skill II:Bar
SkillIII:Histogram
Skill III:Histogram
SkillIV:Line
Skill IV:Line
SkillV:Tile
Skill V:Tile
SkillVI:Map
Skill VI:Map
Resources
http://learnr.wordpress.com
Redrawallthelatticegraph
Redraw all the lattice graph
byggplot2
Resources
Alltheexamplesaredoneby
ggplot2.
ggplot2
Resources
• http://wiki stdout org/rcookbook/Graphs/
http://wiki.stdout.org/rcookbook/Graphs/
• http://r-blogger.com
• htt...
Thankyou!Comebackformore!
Signupat:www.meetup.com/nyc-open-data
Givefeedbackat:www.bit.ly/nycopen
Upcoming SlideShare
Loading in...5
×

Ggplot2 1outof3

287

Published on

This is the slides I made for NYC Open Data meetup.event detail: http://www.meetup.com/NYC-Open-Data/events/137235202/

NYC Open Data meetup group offers free woekshop twice per week in NYC. Everyone is welcome to come and learn data science and open data topics. Sin up at http://www.meetup.com/NYC-Open-Data/

Published in: Business, Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
287
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
11
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Ggplot2 1outof3

  1. 1. Learnggplot2 AsKungfu Skill A K f Skills Givenby KaiXiao(DataSciencetist) VivianZhang(Co-founder&CTO) Vi i Zh (C f d & CTO) Contact:vivian.zhang@supstat.com Contact: vivian zhang@supstat com
  2. 2. • I: Point • II: Bar • III:Histogram g • IV:Line • V: Tile • VI:Map
  3. 3. Introduction • ggplot2 is a plotting system for R • based on the《The Grammar of  Graphics》 • which tries to take the good parts  of base and lattice graphics and  none of the bad parts none of the bad parts • It takes care of many of the fiddly  details that make plotting a hassle details that make plotting a hassle • It becomes easy to produce  complex multi‐layered graphics p y g p
  4. 4. Whyweloveggplot2? Why we love ggplot2? • controltheplotasabstractlayersandmakecreativitybecomereality; • getusedtostructuralthinking; d l hi ki • getbeautifulgraphicswhileavoidingcomplicateddetails 1973murdercasesinUSA
  5. 5. 7BasicConcepts 7 Basic Concepts • Mapping • Scale • Geometric Stat st cs • Statistics • Coordinate • L Layer • Facet
  6. 6. Mapping Mappingcontrolsrelationsbetweenvariables M i t l l ti b t i bl
  7. 7. Scale Scalewillpresentmappingoncoordinatescales. Scale will present mapping on coordinate scales ScaleandMappingiscloselyrelatedconcepts.
  8. 8. Geometric Geom means the graphical elements such as meansthegraphicalelements,suchas points,linesandpolygons.
  9. 9. Statistics Statenablesustocalculateanddostatistical Stat enables us to calculate and do statistical analysisbased,suchasaddingaregressionline. Stat
  10. 10. Coordinate Cood will affect how we observe graphical willaffecthowweobservegraphical elements.Transformationofcoordinatesisuseful. Stat Coord
  11. 11. Layer Component:data,mapping,geom,stat Usinglayerwillallowuserstoestablishplotsstep i l ill ll bli h l bystep.Itbecomemucheasiertomodifyaplot.
  12. 12. Facet Facetsplitsdataintogroupsanddraweach groupseparately.Usually,thereisaorder.
  13. 13. 7BasicConcepts 7 Basic Concepts • Mapping • Scale • Geometric Stat st cs • Statistics • Coordinate • L Layer • Facet
  14. 14. SkillI:Point Skill I:Point
  15. 15. Sampledata mpg Sample data--mpg • Fuel economy data from 1999 and 2008 for 38 popular  y p p models of car • • • • • • • Details D il Displ :               engine displacement, in litres Cyl:                    number of cylinders  Cyl: number of cylinders Trans:                type of transmission  Drv:                   front‐wheel, rear wheel drive, 4wd  , , Cty:                    city miles per gallon  Hwy:                  highway miles per gallon 
  16. 16. >library(ggplot2) >str(mpg) 'data.frame': 234obs.of14variables: $manufacturer:Factorw/15levels"audi","chevrolet",..: $model:Factorw/38levels"4runner4wd",..: $displ :num 1.81.8222.82.83.11.81.82... $year:int 1999 1999 2008 2008 1999 1999 2008 1999 $ year : int 19991999200820081999199920081999 $cyl :int 4444666444... $trans:Factorw/10levels"auto(av)","auto(l3)",..: $drv $d :Factorw/3levels"4","f","r": 3l l f $cty :int 18212021161818181620... $hwy $fl :int 29293130262627262528... :Factorw/5levels"c","d","e","p",..: $class:Factorw/7levels"2seater","compact",..:
  17. 17. aesthetics p < ggplot(data mpg, mapping aes(x cty, y hwy)) p <‐ ggplot(data=mpg mapping=aes(x=cty y=hwy)) p + geom_point()
  18. 18. > summary(p)  data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy,  fl, class [234x11]  fl l [ ] mapping: x = cty, y = hwy faceting: facet_null()  g _ () > summary(p+geom_point()) data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, data: manufacturer model displ year cyl trans drv cty hwy fl, class [234x11] mapping:  x = cty, y = hwy faceting: facet_null()  f i f ll() ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ g geom_point: na.rm = FALSE  _p stat_identity:   position_identity: (width = NULL, height = NULL)
  19. 19. p + geom_point(color red4 size=3) p + geom point(color='red4',size 3)
  20. 20. #addonemorelayer--color p p<- ggplot(mpg,aes(x=cty,y=hwy,colour=factor(year))) ggp ( pg, ( y,y y, (y ))) p+geom_point()
  21. 21. #addonemorestat(loess:localpartialpolynomialregression) > p + geom_point() + stat_smooth()
  22. 22. p <‐ ggplot(data=mpg, mapping=aes(x=cty,y=hwy)) p g p + geom_point(aes(colour=factor(year)))+ _p ( ( (y ))) stat_smooth()
  23. 23. Twoequallywaystodraw q y y p  <‐ ggplot(mpg, aes(x=cty,y=hwy)) p <‐ ggplot(mpg aes(x=cty y=hwy)) p  + geom_point(aes(colour=factor(year)))+ () stat_smooth() d <‐ ggplot() + () geom_point(data=mpg, aes(x=cty, y=hwy, colour=factor(year)))+ stat_smooth(data=mpg, aes(x=cty, y=hwy)) print(d)
  24. 24. Besidethe“whitepaper”canvas,wewillfindgeom andstat canvas. > summary(d) data: [0x0] [ ] faceting: facet_null()  ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ mapping: x = cty, y = hwy, colour = factor(year)  mapping: x = cty y = hwy colour = factor(year) geom_point: na.rm = FALSE  stat_identity:   position_identity: (width = NULL, height = NULL) pp g y, y y mapping: x = cty, y = hwy  geom_smooth:   stat_smooth: method = auto, formula = y ~ x, se = TRUE,  n = 80, fullrange = FALSE, level = 0.95, na.rm = FALSE  n = 80 fullrange = FALSE level = 0 95 na rm = FALSE position_identity: (width = NULL, height = NULL)
  25. 25. #Usingscale()function,wecancontrolcolorofscale. p+geom_point(aes(colour=factor(year)))+ stat_smooth()+ h() scale_color_manual(values=c('steelblue','red4'))
  26. 26. #Wecanmap“displ”to thesizeofpoint p+geom_point(aes(colour=factor(year),size=displ))+ stat_smooth()+ h() scale_color_manual(values=c('steelblue','red4'))
  27. 27. # We solve the problem with overlapping and point being too small p + geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")+ stat_smooth()+ scale_color_manual(values =c('steelblue','red4'))+ scale_size_continuous(range = c(4, 10))
  28. 28. # We change the coordinate system. p + geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")+ stat_smooth()+ scale_color_manual(values =c('steelblue','red4'))+ scale_size_continuous(range = c(4, 10)) +    coord_flip()
  29. 29. p + geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")+ stat_smooth()+ () scale_color_manual(values =c('steelblue','red4'))+ scale_size_continuous(range = c(4, 10))  +    coord_polar()
  30. 30. p + geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")  +    stat_smooth()+ scale_color_manual(values =c('steelblue','red4'))+ ( ( , )) scale_size_continuous(range = c(4, 10))+                    coord_cartesian(xlim = c(15, 25), ylim=c(15,40))
  31. 31. #Usingfacet()function,wenowsplitdataanddrawthembygroup p+geom_point(aes(colour=class,size=displ), alpha=0.5,position="jitter")+ stat_smooth()+ h scale_size_continuous(range=c(4,10))+ facet_wrap(~year,ncol=1)
  32. 32. #Addplotnameandspecifyallinformationyouwanttoadd p<- ggplot(mpg,aes(x=cty,y=hwy)) p+geom_point(aes(colour=class,size=displ), alpha=0.5,position="jitter")+stat_smooth()+ scale_size_continuous(range=c(4,10))+ l i ti ( (4 10)) facet_wrap(~year,ncol=1)+opts(title='modelofcarandmpg')+ labs(y='drivingdistancepergallononhighway',x='drivingdistancepergallononcityroad', size='displacement',colour ='model')
  33. 33. #scatterplotfordiamonddataset p p<- ggplot(diamonds,aes(carat,price)) ggp ( , ( ,p )) p+geom_point()
  34. 34. #usetransparencyandsmallsizepoints p g p+geom_point(size=0.1,alpha=0.1) _p ( , p )
  35. 35. #usebincharttoobserveintensityofpoints p p+stat_bin2d(bins=40) _ ( )
  36. 36. #estimatedatadentisy p+stat_density2d(aes(fill=..level..),geom="polygon")+ coord_cartesian(xlim =c(0,1.5),ylim=c(0,6000)) coord cartesian(xlim = c(0 1 5) ylim=c(0 6000))
  37. 37. SkillII:Bar Skill II:Bar
  38. 38. SkillIII:Histogram Skill III:Histogram
  39. 39. SkillIV:Line Skill IV:Line
  40. 40. SkillV:Tile Skill V:Tile
  41. 41. SkillVI:Map Skill VI:Map
  42. 42. Resources http://learnr.wordpress.com Redrawallthelatticegraph Redraw all the lattice graph byggplot2
  43. 43. Resources Alltheexamplesaredoneby ggplot2. ggplot2
  44. 44. Resources • http://wiki stdout org/rcookbook/Graphs/ http://wiki.stdout.org/rcookbook/Graphs/ • http://r-blogger.com • htt //St k http://Stackoverflow.com fl • http://xccds1977.blogspot.com • http://r-ke.info/ • http://www.youtube.com/watch?v=vnVJJYi1 mbw
  45. 45. Thankyou!Comebackformore! Signupat:www.meetup.com/nyc-open-data Givefeedbackat:www.bit.ly/nycopen
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×