Visualisation in R
                       Hadley Wickham
            Assistant Professor / Dobelman Family Junior Chair
  ...
1 basics

  1. Visualisation in R. Hadley Wickham, Assistant Professor / Dobelman Family Junior Chair, Department of Statistics / Rice University. July 2010. Sunday, 25 July 2010
  2. HELLO, my name is Hadley
  3. http://had.co.nz/vanderbilt-vis
  4. (1) Preview of today. (2) About ggplot2. (3) More resources. (4) Diving in.
  5. Fuel economy: basic graphics
  6. [Figure: scatterplot of hwy against displ]
  7. [Figure: scatterplot of hwy against displ, coloured by class (2seater, compact, midsize, minivan, pickup, subcompact, suv)]
  8. Diamond prices: displaying large data
  9. [Figure: count against depth]
  10. [Figure: scatterplot of price against table]
  11. [Figure: price against carat, with a count legend]
  12. US baby names: data manipulation and transformation
  13. [Figure: avg_length against year, 1880 to 2000, by sex (boy, girl)]
  14. [Figure: prop against year, 1880 to 2000, by sex (boy, girl)]
  15. Polishing your plots
  16. (1) Scales: used to override default perceptual mappings, and tune parameters of axes and legends. (2) Themes: control presentation of non-data elements. (3) Saving your work: to include in reports, presentations, etc.
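
The three polishing steps on slide 16 can be sketched roughly as follows. This is an illustration only, not code from the talk: it uses the ggplot() interface, and the axis labels, the choice of theme_bw(), and the output file name are all assumptions made for the example.

```r
library(ggplot2)

# Base scatterplot of engine displacement against highway mileage.
p <- ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point()

# 1. Scales: tune the axes and the legend (labels here are illustrative).
p <- p +
  scale_x_continuous(name = "Engine displacement (litres)") +
  scale_y_continuous(name = "Highway miles per gallon") +
  scale_colour_discrete(name = "Vehicle class")

# 2. Themes: change the presentation of non-data elements.
p <- p + theme_bw()

# 3. Saving your work: write the plot to a file for a report or presentation.
# The file name is hypothetical; a temporary directory keeps the sketch tidy.
out_file <- file.path(tempdir(), "mpg-scatter.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```

ggsave() picks the graphics device from the file extension, so changing the extension changes the output format.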
  17. [No text on this slide]
  18. ggplot2
  19. About ggplot2: a graphical grammar (domain-specific language), based on “The Grammar of Graphics” by Leland Wilkinson. Specify what you want, not how to create it. Many fiddly details are taken care of. “Instead of spending time making your graph look pretty, you can focus on creating a graph that best reveals the messages in your data.”
  20. Useful resources:
        http://had.co.nz/ggplot2
        http://had.co.nz/ggplot2/book
        http://groups.google.com/group/ggplot2
        http://learnr.wordpress.com
        http://ggplot2.wik.is
  21. Learning a new language is hard!
  22. Scatterplot basics
        install.packages("ggplot2")
        library(ggplot2)
        ?mpg
        head(mpg)
        str(mpg)
        summary(mpg)
        qplot(displ, hwy, data = mpg)
  23. Scatterplot basics (annotated): In ggplot2, we always explicitly specify the data.
        install.packages("ggplot2")
        library(ggplot2)
        ?mpg
        head(mpg)
        str(mpg)
        summary(mpg)
        qplot(displ, hwy, data = mpg)
  24. [Figure: scatterplot of hwy against displ]
        qplot(displ, hwy, data = mpg)
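
The slides use qplot(), the quick-plot helper. Recent ggplot2 releases mark qplot() as deprecated, so readers on a current version may prefer the longer ggplot() form. A minimal sketch of the same scatterplot, assuming only that ggplot2 is installed:

```r
library(ggplot2)

# Equivalent of qplot(displ, hwy, data = mpg) in the full interface:
# the data frame is given explicitly, aes() maps variables to x and y,
# and geom_point() draws one point per row.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

print(p)
```

As in the qplot() call, the data frame is always named explicitly rather than taken from the workspace.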
  25. Additional variables: can display additional variables with aesthetics (like shape, colour, size) or faceting (small multiples displaying different subsets).
  26. [Figure: scatterplot of hwy against displ, coloured by class]
        qplot(displ, hwy, colour = class, data = mpg)
  27. [Same figure as slide 26] Legend chosen and displayed automatically.
        qplot(displ, hwy, colour = class, data = mpg)
  28. Your turn: Experiment with colour, size, and shape aesthetics. What’s the difference between discrete and continuous variables? What happens when you combine multiple aesthetics?
  29. Aesthetics for discrete and continuous variables:
                  Discrete                    Continuous
        Colour    Rainbow of colours          Gradient from red to blue
        Size      Discrete size steps         Linear mapping between radius and value
        Shape     Different shape for each    Doesn’t work
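
One way to explore the exercise on slide 28 is to map the same aesthetic to a discrete and then a continuous variable. The sketch below uses the ggplot() interface; the choice of class and cty as the example variables is an assumption, and the object names are hypothetical.

```r
library(ggplot2)

# Discrete variable: class is a character column, so each class
# gets its own colour and the legend lists the classes.
p_discrete <- ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point()

# Continuous variable: cty is numeric, so colour becomes a gradient
# and the legend becomes a colour bar.
p_continuous <- ggplot(mpg, aes(displ, hwy, colour = cty)) +
  geom_point()

# Combining aesthetics: colour shows class, size shows city mileage.
p_combined <- ggplot(mpg, aes(displ, hwy, colour = class, size = cty)) +
  geom_point()
```

Printing each object in turn shows how the legend changes with the variable type.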
  30. Faceting: small multiples displaying different subsets of the data. Useful for exploring conditional relationships. Useful for large data.
  31. Your turn
        qplot(displ, hwy, data = mpg) + facet_grid(. ~ cyl)
        qplot(displ, hwy, data = mpg) + facet_grid(drv ~ .)
        qplot(displ, hwy, data = mpg) + facet_grid(drv ~ cyl)
        qplot(displ, hwy, data = mpg) + facet_wrap(~ class)
  32. Summary
        facet_grid(): 2d grid, rows ~ cols, . for no split
        facet_wrap(): 1d ribbon wrapped into 2d
        The scales argument controls whether position scales are fixed or free.
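
The scales argument mentioned on slide 32 is not shown in the exercise code. A small sketch of fixed versus free position scales, written with ggplot() and using hypothetical object names:

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# facet_grid(): rows ~ cols, here drive train by number of cylinders.
p_grid <- p + facet_grid(drv ~ cyl)

# facet_wrap() with the default fixed scales: every panel shares
# the same x and y ranges, which makes panels directly comparable.
p_wrap_fixed <- p + facet_wrap(~ class)

# scales = "free" lets each panel choose its own x and y ranges,
# which shows within-panel detail at the cost of comparability.
p_wrap_free <- p + facet_wrap(~ class, scales = "free")
```

The values "free_x" and "free_y" free only one of the two position scales.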
  33. What’s the problem with this plot?
        [Figure: scatterplot of hwy against cty]
        qplot(cty, hwy, data = mpg)
  34. [Figure: jittered scatterplot of hwy against cty]
        qplot(cty, hwy, data = mpg, geom = "jitter")
  35. [Same figure as slide 34] geom controls “type” of plot.
        qplot(cty, hwy, data = mpg, geom = "jitter")
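
A likely reading of slides 33 to 35 is that many cars share the same cty and hwy values, so their points sit exactly on top of each other until they are jittered. The sketch below checks that reading and redraws the jittered plot with ggplot(); the variable names are hypothetical.

```r
library(ggplot2)

# Count rows whose (cty, hwy) pair has already appeared earlier in the data.
# Every such row is hidden behind another point in the plain scatterplot.
n_overlapping <- sum(duplicated(mpg[, c("cty", "hwy")]))

# geom_jitter() adds a small random offset to each point, which
# spreads the overlapping points out so they can be seen.
p_jitter <- ggplot(mpg, aes(cty, hwy)) +
  geom_jitter()
```

Because the offsets are random, the jittered plot looks slightly different each time it is drawn.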
  36. [Figure: hwy against class (2seater, compact, midsize, minivan, pickup, subcompact, suv)]
        qplot(class, hwy, data = mpg)
  37. How could we improve this plot? Brainstorm for 1 minute.
        [Same figure as slide 36]
        qplot(class, hwy, data = mpg)
  38. [Figure: hwy against reorder(class, hwy); classes ordered pickup, suv, minivan, 2seater, midsize, subcompact, compact]
  39. [Same figure as slide 38] Incredibly useful technique!
        qplot(reorder(class, hwy), hwy, data = mpg)
  40. [Figure: jittered points of hwy against reorder(class, hwy)]
        qplot(reorder(class, hwy), hwy, data = mpg, geom = "jitter")
  41. [Figure: boxplots of hwy against reorder(class, hwy)]
        qplot(reorder(class, hwy), hwy, data = mpg, geom = "boxplot")
  42. [Figure: jittered points and boxplots of hwy against reorder(class, hwy)]
        qplot(reorder(class, hwy), hwy, data = mpg, geom = c("jitter", "boxplot"))
  43. Your turn: Read the help for reorder. Redraw the previous plots with class ordered by median hwy. How would you put the jittered points on top of the boxplots?
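
One possible answer to the exercise on slide 43, sketched with the ggplot() interface. It is not the author's solution: the axis label and object names are assumptions, and hiding the boxplot outliers is an optional extra.

```r
library(ggplot2)

# reorder() sorts the levels of class by a summary of hwy.
# FUN defaults to mean; passing median orders by median hwy instead.
class_by_median <- reorder(mpg$class, mpg$hwy, FUN = median)

# Layers are drawn in the order they are added, so adding the boxplot
# first and the jittered points second puts the points on top.
p <- ggplot(mpg, aes(reorder(class, hwy, FUN = median), hwy)) +
  geom_boxplot(outlier.shape = NA) +  # hide outliers; jitter shows every point
  geom_jitter() +
  xlab("class (ordered by median hwy)")
```

With qplot(), the same layering idea would mean listing "boxplot" before "jitter" in the geom vector.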
  44. Aside: coding strategy. At the end of each interactive session, you want a summary of everything you did. Two options: (1) Save everything you did with savehistory(), then remove the unimportant bits. (2) Build up the important bits as you go (this is how I work).
  45. [No text on this slide]
  46. This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
