Ggplot2 1outof3

509 views

Published on

This is the slides I made for NYC Open Data meetup.event detail: http://www.meetup.com/NYC-Open-Data/events/137235202/

NYC Open Data meetup group offers free woekshop twice per week in NYC. Everyone is welcome to come and learn data science and open data topics. Sin up at http://www.meetup.com/NYC-Open-Data/

Published in: Business, Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
509
On SlideShare
0
From Embeds
0
Number of Embeds
46
Actions
Shares
0
Downloads
12
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Ggplot2 1outof3

  1. 1. Learnggplot2 AsKungfu Skill A K f Skills Givenby KaiXiao(DataSciencetist) VivianZhang(Co-founder&CTO) Vi i Zh (C f d & CTO) Contact:vivian.zhang@supstat.com Contact: vivian zhang@supstat com
  2. 2. • I: Point • II: Bar • III:Histogram g • IV:Line • V: Tile • VI:Map
  3. 3. Introduction • ggplot2 is a plotting system for R • based on the《The Grammar of  Graphics》 • which tries to take the good parts  of base and lattice graphics and  none of the bad parts none of the bad parts • It takes care of many of the fiddly  details that make plotting a hassle details that make plotting a hassle • It becomes easy to produce  complex multi‐layered graphics p y g p
  4. 4. Whyweloveggplot2? Why we love ggplot2? • controltheplotasabstractlayersandmakecreativitybecomereality; • getusedtostructuralthinking; d l hi ki • getbeautifulgraphicswhileavoidingcomplicateddetails 1973murdercasesinUSA
  5. 5. 7BasicConcepts 7 Basic Concepts • Mapping • Scale • Geometric Stat st cs • Statistics • Coordinate • L Layer • Facet
  6. 6. Mapping Mappingcontrolsrelationsbetweenvariables M i t l l ti b t i bl
  7. 7. Scale Scalewillpresentmappingoncoordinatescales. Scale will present mapping on coordinate scales ScaleandMappingiscloselyrelatedconcepts.
  8. 8. Geometric Geom means the graphical elements such as meansthegraphicalelements,suchas points,linesandpolygons.
  9. 9. Statistics Statenablesustocalculateanddostatistical Stat enables us to calculate and do statistical analysisbased,suchasaddingaregressionline. Stat
  10. 10. Coordinate Cood will affect how we observe graphical willaffecthowweobservegraphical elements.Transformationofcoordinatesisuseful. Stat Coord
  11. 11. Layer Component:data,mapping,geom,stat Usinglayerwillallowuserstoestablishplotsstep i l ill ll bli h l bystep.Itbecomemucheasiertomodifyaplot.
  12. 12. Facet Facetsplitsdataintogroupsanddraweach groupseparately.Usually,thereisaorder.
  13. 13. 7BasicConcepts 7 Basic Concepts • Mapping • Scale • Geometric Stat st cs • Statistics • Coordinate • L Layer • Facet
  14. 14. SkillI:Point Skill I:Point
  15. 15. Sampledata mpg Sample data--mpg • Fuel economy data from 1999 and 2008 for 38 popular  y p p models of car • • • • • • • Details D il Displ :               engine displacement, in litres Cyl:                    number of cylinders  Cyl: number of cylinders Trans:                type of transmission  Drv:                   front‐wheel, rear wheel drive, 4wd  , , Cty:                    city miles per gallon  Hwy:                  highway miles per gallon 
  16. 16. >library(ggplot2) >str(mpg) 'data.frame': 234obs.of14variables: $manufacturer:Factorw/15levels"audi","chevrolet",..: $model:Factorw/38levels"4runner4wd",..: $displ :num 1.81.8222.82.83.11.81.82... $year:int 1999 1999 2008 2008 1999 1999 2008 1999 $ year : int 19991999200820081999199920081999 $cyl :int 4444666444... $trans:Factorw/10levels"auto(av)","auto(l3)",..: $drv $d :Factorw/3levels"4","f","r": 3l l f $cty :int 18212021161818181620... $hwy $fl :int 29293130262627262528... :Factorw/5levels"c","d","e","p",..: $class:Factorw/7levels"2seater","compact",..:
  17. 17. aesthetics p < ggplot(data mpg, mapping aes(x cty, y hwy)) p <‐ ggplot(data=mpg mapping=aes(x=cty y=hwy)) p + geom_point()
  18. 18. > summary(p)  data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy,  fl, class [234x11]  fl l [ ] mapping: x = cty, y = hwy faceting: facet_null()  g _ () > summary(p+geom_point()) data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, data: manufacturer model displ year cyl trans drv cty hwy fl, class [234x11] mapping:  x = cty, y = hwy faceting: facet_null()  f i f ll() ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ g geom_point: na.rm = FALSE  _p stat_identity:   position_identity: (width = NULL, height = NULL)
  19. 19. p + geom_point(color red4 size=3) p + geom point(color='red4',size 3)
  20. 20. #addonemorelayer--color p p<- ggplot(mpg,aes(x=cty,y=hwy,colour=factor(year))) ggp ( pg, ( y,y y, (y ))) p+geom_point()
  21. 21. #addonemorestat(loess:localpartialpolynomialregression) > p + geom_point() + stat_smooth()
  22. 22. p <‐ ggplot(data=mpg, mapping=aes(x=cty,y=hwy)) p g p + geom_point(aes(colour=factor(year)))+ _p ( ( (y ))) stat_smooth()
  23. 23. Twoequallywaystodraw q y y p  <‐ ggplot(mpg, aes(x=cty,y=hwy)) p <‐ ggplot(mpg aes(x=cty y=hwy)) p  + geom_point(aes(colour=factor(year)))+ () stat_smooth() d <‐ ggplot() + () geom_point(data=mpg, aes(x=cty, y=hwy, colour=factor(year)))+ stat_smooth(data=mpg, aes(x=cty, y=hwy)) print(d)
  24. 24. Besidethe“whitepaper”canvas,wewillfindgeom andstat canvas. > summary(d) data: [0x0] [ ] faceting: facet_null()  ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ mapping: x = cty, y = hwy, colour = factor(year)  mapping: x = cty y = hwy colour = factor(year) geom_point: na.rm = FALSE  stat_identity:   position_identity: (width = NULL, height = NULL) pp g y, y y mapping: x = cty, y = hwy  geom_smooth:   stat_smooth: method = auto, formula = y ~ x, se = TRUE,  n = 80, fullrange = FALSE, level = 0.95, na.rm = FALSE  n = 80 fullrange = FALSE level = 0 95 na rm = FALSE position_identity: (width = NULL, height = NULL)
  25. 25. #Usingscale()function,wecancontrolcolorofscale. p+geom_point(aes(colour=factor(year)))+ stat_smooth()+ h() scale_color_manual(values=c('steelblue','red4'))
  26. 26. #Wecanmap“displ”to thesizeofpoint p+geom_point(aes(colour=factor(year),size=displ))+ stat_smooth()+ h() scale_color_manual(values=c('steelblue','red4'))
  27. 27. # We solve the problem with overlapping and point being too small p + geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")+ stat_smooth()+ scale_color_manual(values =c('steelblue','red4'))+ scale_size_continuous(range = c(4, 10))
  28. 28. # We change the coordinate system. p + geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")+ stat_smooth()+ scale_color_manual(values =c('steelblue','red4'))+ scale_size_continuous(range = c(4, 10)) +    coord_flip()
  29. 29. p + geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")+ stat_smooth()+ () scale_color_manual(values =c('steelblue','red4'))+ scale_size_continuous(range = c(4, 10))  +    coord_polar()
  30. 30. p + geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")  +    stat_smooth()+ scale_color_manual(values =c('steelblue','red4'))+ ( ( , )) scale_size_continuous(range = c(4, 10))+                    coord_cartesian(xlim = c(15, 25), ylim=c(15,40))
  31. 31. #Usingfacet()function,wenowsplitdataanddrawthembygroup p+geom_point(aes(colour=class,size=displ), alpha=0.5,position="jitter")+ stat_smooth()+ h scale_size_continuous(range=c(4,10))+ facet_wrap(~year,ncol=1)
  32. 32. #Addplotnameandspecifyallinformationyouwanttoadd p<- ggplot(mpg,aes(x=cty,y=hwy)) p+geom_point(aes(colour=class,size=displ), alpha=0.5,position="jitter")+stat_smooth()+ scale_size_continuous(range=c(4,10))+ l i ti ( (4 10)) facet_wrap(~year,ncol=1)+opts(title='modelofcarandmpg')+ labs(y='drivingdistancepergallononhighway',x='drivingdistancepergallononcityroad', size='displacement',colour ='model')
  33. 33. #scatterplotfordiamonddataset p p<- ggplot(diamonds,aes(carat,price)) ggp ( , ( ,p )) p+geom_point()
  34. 34. #usetransparencyandsmallsizepoints p g p+geom_point(size=0.1,alpha=0.1) _p ( , p )
  35. 35. #usebincharttoobserveintensityofpoints p p+stat_bin2d(bins=40) _ ( )
  36. 36. #estimatedatadentisy p+stat_density2d(aes(fill=..level..),geom="polygon")+ coord_cartesian(xlim =c(0,1.5),ylim=c(0,6000)) coord cartesian(xlim = c(0 1 5) ylim=c(0 6000))
  37. 37. SkillII:Bar Skill II:Bar
  38. 38. SkillIII:Histogram Skill III:Histogram
  39. 39. SkillIV:Line Skill IV:Line
  40. 40. SkillV:Tile Skill V:Tile
  41. 41. SkillVI:Map Skill VI:Map
  42. 42. Resources http://learnr.wordpress.com Redrawallthelatticegraph Redraw all the lattice graph byggplot2
  43. 43. Resources Alltheexamplesaredoneby ggplot2. ggplot2
  44. 44. Resources • http://wiki stdout org/rcookbook/Graphs/ http://wiki.stdout.org/rcookbook/Graphs/ • http://r-blogger.com • htt //St k http://Stackoverflow.com fl • http://xccds1977.blogspot.com • http://r-ke.info/ • http://www.youtube.com/watch?v=vnVJJYi1 mbw
  45. 45. Thankyou!Comebackformore! Signupat:www.meetup.com/nyc-open-data Givefeedbackat:www.bit.ly/nycopen

×