# ggplot2 1 out of 3


1. 1. Learnggplot2 AsKungfu Skill A K f Skills Givenby KaiXiao(DataSciencetist) VivianZhang(Co-founder&CTO) Vi i Zh (C f d & CTO) Contact:vivian.zhang@supstat.com Contact: vivian zhang@supstat com
2. • I: Point • II: Bar • III: Histogram • IV: Line • V: Tile • VI: Map
3. Introduction
    • ggplot2 is a plotting system for R.
    • It is based on the book "The Grammar of Graphics".
    • It tries to take the good parts of base and lattice graphics and none of the bad parts.
    • It takes care of many of the fiddly details that make plotting a hassle.
    • It becomes easy to produce complex multi-layered graphics.
4. 4. Whyweloveggplot2? Why we love ggplot2? • controltheplotasabstractlayersandmakecreativitybecomereality； • getusedtostructuralthinking； d l hi ki • getbeautifulgraphicswhileavoidingcomplicateddetails 1973murdercasesinUSA
5. 5. 7BasicConcepts 7 Basic Concepts • Mapping • Scale • Geometric Stat st cs • Statistics • Coordinate • L Layer • Facet
6. Mapping: Mapping controls the relations between variables, that is, which data variable drives which visual property (such as x, y or colour).
7. Scale: Scale presents the mapping on coordinate scales. Scale and Mapping are closely related concepts.
8. Geometric: Geom means the graphical elements, such as points, lines and polygons.
9. Statistics: Stat enables us to calculate and do statistical analysis based on the data, such as adding a regression line.
10. Coordinate: Coord affects how we observe graphical elements. Transformation of coordinates is useful.
11. Layer: Components: data, mapping, geom, stat. Using layers allows users to build plots step by step. It becomes much easier to modify a plot.
12. Facet: Facet splits the data into groups and draws each group separately. Usually, there is an order.
13. 13. 7BasicConcepts 7 Basic Concepts • Mapping • Scale • Geometric Stat st cs • Statistics • Coordinate • L Layer • Facet
14. 14. SkillI：Point Skill I：Point
15. 15. Sampledata mpg Sample data--mpg • Fuel economy data from 1999 and 2008 for 38 popular  y p p models of car • • • • • • • Details D il Displ :               engine displacement, in litres Cyl:                    number of cylinders  Cyl: number of cylinders Trans:                type of transmission  Drv:                   front‐wheel, rear wheel drive, 4wd  , , Cty:                    city miles per gallon  Hwy:                  highway miles per gallon
16. Structure of the data

        > library(ggplot2)
        > str(mpg)
        'data.frame': 234 obs. of 11 variables:
         $ manufacturer: Factor w/ 15 levels "audi","chevrolet",..:
         $ model       : Factor w/ 38 levels "4runner 4wd",..:
         $ displ       : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
         $ year        : int 1999 1999 2008 2008 1999 1999 2008 1999 ...
         $ cyl         : int 4 4 4 4 6 6 6 4 4 4 ...
         $ trans       : Factor w/ 10 levels "auto(av)","auto(l3)",..:
         $ drv         : Factor w/ 3 levels "4","f","r":
         $ cty         : int 18 21 20 21 16 18 18 18 16 20 ...
         $ hwy         : int 29 29 31 30 26 26 27 26 25 28 ...
         $ fl          : Factor w/ 5 levels "c","d","e","p",..:
         $ class       : Factor w/ 7 levels "2seater","compact",..:
17. Aesthetics

        p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy))
        p + geom_point()
18. Summary of the plot object

        > summary(p)
        data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, class [234x11]
        mapping: x = cty, y = hwy
        faceting: facet_null()
        > summary(p + geom_point())
        data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, class [234x11]
        mapping: x = cty, y = hwy
        faceting: facet_null()
        -----------------------------------
        geom_point: na.rm = FALSE
        stat_identity:
        position_identity: (width = NULL, height = NULL)
19. `p + geom_point(color = 'red4', size = 3)`
20. 20. #addonemorelayer--color p p<- ggplot(mpg,aes(x=cty,y=hwy,colour=factor(year))) ggp ( pg, ( y,y y, (y ))) p+geom_point()
21. 21. #addonemorestat(loess:localpartialpolynomialregression) > p + geom_point() + stat_smooth()
22.
        p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy))
        p + geom_point(aes(colour = factor(year))) +
          stat_smooth()
23. 23. Twoequallywaystodraw q y y p  <‐ ggplot(mpg, aes(x=cty,y=hwy)) p <‐ ggplot(mpg aes(x=cty y=hwy)) p  + geom_point(aes(colour=factor(year)))+ () stat_smooth() d <‐ ggplot() + () geom_point(data=mpg, aes(x=cty, y=hwy, colour=factor(year)))+ stat_smooth(data=mpg, aes(x=cty, y=hwy)) print(d)
24. 24. Besidethe“whitepaper”canvas,wewillfindgeom andstat canvas. > summary(d) data: [0x0] [ ] faceting: facet_null()  ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ mapping: x = cty, y = hwy, colour = factor(year)  mapping: x = cty y = hwy colour = factor(year) geom_point: na.rm = FALSE  stat_identity:   position_identity: (width = NULL, height = NULL) pp g y, y y mapping: x = cty, y = hwy  geom_smooth:   stat_smooth: method = auto, formula = y ~ x, se = TRUE,  n = 80, fullrange = FALSE, level = 0.95, na.rm = FALSE  n = 80 fullrange = FALSE level = 0 95 na rm = FALSE position_identity: (width = NULL, height = NULL)
25. 25. #Usingscale()function,wecancontrolcolorofscale. p+geom_point(aes(colour=factor(year)))+ stat_smooth()+ h() scale_color_manual(values=c('steelblue','red4'))
26. 26. #Wecanmap“displ”to thesizeofpoint p+geom_point(aes(colour=factor(year),size=displ))+ stat_smooth()+ h() scale_color_manual(values=c('steelblue','red4'))
27. We solve the problem of overlapping points and of points being too small.

        p + geom_point(aes(colour = factor(year), size = displ),
                       alpha = 0.5, position = "jitter") +
          stat_smooth() +
          scale_color_manual(values = c('steelblue', 'red4')) +
          scale_size_continuous(range = c(4, 10))
28. We change the coordinate system.

        p + geom_point(aes(colour = factor(year), size = displ),
                       alpha = 0.5, position = "jitter") +
          stat_smooth() +
          scale_color_manual(values = c('steelblue', 'red4')) +
          scale_size_continuous(range = c(4, 10)) +
          coord_flip()
29.
        p + geom_point(aes(colour = factor(year), size = displ),
                       alpha = 0.5, position = "jitter") +
          stat_smooth() +
          scale_color_manual(values = c('steelblue', 'red4')) +
          scale_size_continuous(range = c(4, 10)) +
          coord_polar()
30.
        p + geom_point(aes(colour = factor(year), size = displ),
                       alpha = 0.5, position = "jitter") +
          stat_smooth() +
          scale_color_manual(values = c('steelblue', 'red4')) +
          scale_size_continuous(range = c(4, 10)) +
          coord_cartesian(xlim = c(15, 25), ylim = c(15, 40))
31. 31. #Usingfacet()function,wenowsplitdataanddrawthembygroup p+geom_point(aes(colour=class,size=displ), alpha=0.5,position="jitter")+ stat_smooth()+ h scale_size_continuous(range=c(4,10))+ facet_wrap(~year,ncol=1)
32. 32. #Addplotnameandspecifyallinformationyouwanttoadd p<- ggplot(mpg,aes(x=cty,y=hwy)) p+geom_point(aes(colour=class,size=displ), alpha=0.5,position="jitter")+stat_smooth()+ scale_size_continuous(range=c(4,10))+ l i ti ( (4 10)) facet_wrap(~year,ncol=1)+opts(title='modelofcarandmpg')+ labs(y='drivingdistancepergallononhighway',x='drivingdistancepergallononcityroad', size='displacement',colour ='model')
33. 33. #scatterplotfordiamonddataset p p<- ggplot(diamonds,aes(carat,price)) ggp ( , ( ,p )) p+geom_point()
34. 34. #usetransparencyandsmallsizepoints p g p+geom_point(size=0.1,alpha=0.1) _p ( , p )
35. 35. #usebincharttoobserveintensityofpoints p p+stat_bin2d(bins=40) _ ( )
36. 36. #estimatedatadentisy p+stat_density2d(aes(fill=..level..),geom="polygon")+ coord_cartesian(xlim =c(0,1.5),ylim=c(0,6000)) coord cartesian(xlim = c(0 1 5) ylim=c(0 6000))
37. 37. SkillII：Bar Skill II：Bar
38. 38. SkillIII：Histogram Skill III：Histogram
39. 39. SkillIV：Line Skill IV：Line
40. 40. SkillV：Tile Skill V：Tile
41. 41. SkillVI：Map Skill VI：Map
42. Resources: http://learnr.wordpress.com. Redraws all the lattice graphs with ggplot2.
43. Resources: All the examples are done with ggplot2.
44. Resources
    • http://wiki.stdout.org/rcookbook/Graphs/
    • http://r-blogger.com
    • http://Stackoverflow.com
    • http://xccds1977.blogspot.com
    • http://r-ke.info/
    • http://www.youtube.com/watch?v=vnVJJYi1mbw
45. 45. Thankyou!Comebackformore! Signupat:www.meetup.com/nyc-open-data Givefeedbackat:www.bit.ly/nycopen