Stat405
Graphic tips & tricks

Hadley Wickham
Wednesday, 9 September 2009
1. Homework
2. Reading a scatterplot
3. Scatterplot techniques for large data
4. Iteration & storytelling
5. Project & homework



Homework
Great start!
Remember the grading scheme:
4.5–5 = A+, 4–4.5 = A, 3.5–4 = A-
Shorter is better than longer.
Check aspect ratios.
Read the comments!


Revision: reading a scatterplot

• Big patterns
• Small patterns
• Deviations from the pattern
• Strange patterns




Strong linear relationship.
A number of outliers.




Unusual striations. Two groups?
Little relationship between table and price?




Curved (exponential?) relationship.
Outliers mostly cheaper than expected.


But what’s the problem with all these plots?
In pairs, brainstorm solutions for 2 minutes.

qplot(carat, price, data = diamonds)
Ideas

If x is discrete, use boxplots.
Use semi-transparent points.
Divide into bins and count the number of points in each bin (2d histogram).
Display a statistical summary.
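Two of these ideas can be sketched in code. This is a minimal sketch, not the slides' own code: it uses ggplot() layers rather than qplot(), and the alpha value, the bin counts, and the names p_alpha, p_bins and bin_counts are assumptions chosen for illustration.

```r
library(ggplot2)

# Semi-transparent points: dense regions build up to a darker colour.
# alpha = 1/10 is an assumed value; about ten overlapping points give full opacity.
p_alpha <- ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 1/10)

# 2d histogram: divide the plane into rectangular bins and colour by count.
# bins = 50 is an assumed value.
p_bins <- ggplot(diamonds, aes(carat, price)) +
  geom_bin2d(bins = 50)

# The same counting idea in base R, on a coarse 10 x 10 grid:
# each cell holds the number of diamonds falling in that carat/price bin.
bin_counts <- table(cut(diamonds$carat, 10), cut(diamonds$price, 10))
```

Printing p_alpha or p_bins draws the plot. Every diamond lands in exactly one cell of bin_counts, so the cells sum to the number of rows in diamonds.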



Box and whisker plots


Boxplots

Less information than a histogram, but take up much less space.
Already seen them used with discrete x values. Can also be used with continuous x values, by specifying how we want the data grouped.
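To make "specifying how we want the data grouped" concrete, here is a minimal sketch that assumes grouping by the rounded value of table. It uses ggplot() layers rather than qplot(), and the names grp, med_price and p_box are hypothetical.

```r
library(ggplot2)

# Assumed grouping rule: each diamond joins the group of its
# rounded table value.
grp <- round(diamonds$table)

# Median price per group: the centre line of each group's boxplot.
med_price <- tapply(diamonds$price, grp, median)

# One boxplot per group, drawn at that group's position on the x axis.
p_box <- ggplot(diamonds, aes(table, price)) +
  geom_boxplot(aes(group = round(table)))
```

Any function of x that returns one label per observation could serve as the grouping. Rounding is simply the shortest to write.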



qplot(table, price, data = diamonds)
qplot(table, price, data = diamonds, geom = "boxplot")
[Plot: price (y axis) against table (x axis, 50 to 90); a single boxplot with one column of outlying points.]
qplot(table, price, data = diamonds, geom = "boxplot",
  group = round(table))
[Plot: price (y axis) against table (x axis, 50 to 90); one boxplot per rounded table value, with outlying points above the boxes.]
                                                               ●
                                                               ●
                                                               ●
                                                                   ●
                                                                   ●
                                                                   ●
                                                                       ●
                                                                       ●
                                                                       ●   ●
                                                                           ●
                                                                             ●         ●
                                                                                       ●
                                       ●   ●
                                           ●   ●
                                               ●   ●   ●   ●   ●   ●
                                                                   ●   ●   ●
                                ●      ●
                                       ●   ●
                                           ●   ●
                                               ●
                                               ●   ●
                                                   ●   ●
                                                       ●   ●
                                                           ●       ●   ●
                                       ●
                                       ●   ●   ●   ●   ●   ●       ●
                                                                   ●
                                                                   ●   ●           ●
                                       ●
                                       ●   ●   ●   ●
                                                   ●   ●           ●
                                       ●   ●
                                           ●
                                           ●   ●
                                               ●   ●
                                                   ●   ●
                                                       ●           ●
                                                                   ●       ●
                                   ●   ●
                                       ●   ●   ●   ●   ●           ●   ●   ●
                                   ●
                                   ●   ●
                                       ●   ●
                                           ●   ●
                                               ●   ●
                                                   ●               ●
                                                                   ●   ●
                                                                       ●
                                                                       ●   ●       ●●●
                                   ●
                                   ●   ●
                                       ●   ●
                                           ●   ●
                                               ●   ●
                                                   ●               ●
                                                                   ●   ●           ●
                                   ●   ●
                                       ●   ●   ●   ●
                                                   ●               ●
                                                                   ●   ●       ●
                                       ●
                                       ●   ●
                                           ●   ●   ●
                                                   ●               ●
                                                                   ●   ●   ●         ●
                                           ●   ●   ●               ●
                                                                   ●   ●
                                                                       ●   ●
                                   ●
                                       ●
                                       ●
                                       ●
                                           ●
                                           ●   ●
                                               ●
                                               ●
                                                   ●
                                                   ●
                                                   ●               ●
                                                                   ●   ●
                                                                       ●   ●       ●●●     ●
                                       ●
                                       ●   ●
                                           ●   ●
                                               ●   ●
                                                   ●               ●
                                                                   ●   ●
                                                                       ●   ●
                                                                           ●
                                       ●   ●   ●
                                               ●   ●               ●
                                                                   ●   ●   ●   ●           ●
                                       ●
                                       ●   ●
                                           ●   ●
                                               ●
                                               ●   ●
                                                   ●                   ●
                                       ●   ●   ●
                                               ●   ●                   ●   ●           ●
                                   ●
                                   ●   ●
                                       ●   ●
                                           ●   ●
                                               ●                       ●   ●   ●
                                   ●   ●
                                       ●   ●
                                           ●   ●
                                               ●                       ●
                                                                       ●   ●   ●   ●
                                   ●
                                   ●   ●   ●   ●                           ●   ●
                                       ●   ●   ●                           ●
                                   ●   ●
                                       ●   ●
                                           ●   ●
                                               ●                               ●
                                                                               ●   ●● ●        ●
                                       ●
                                       ●   ●   ●                           ●   ●
                                   ●   ●
                                       ●   ●
                                           ●   ●
                                               ●                               ●   ●
                                   ●
                                   ●   ●   ●   ●
                                               ●                           ●         ● ●
                                   ●
                                   ●   ●
                                       ●   ●
                                           ●   ●
                                               ●                           ●
                                   ●   ●   ●
                                           ●
                                           ●   ●
                                               ●                                     ●
         10000                     ●
                                   ●
                                       ●
                                       ●
                                       ●
                                           ●
                                           ●
                                           ●
                                               ●                                   ●
 price




                                   ●   ●
                                       ●   ●
                                           ●
                                   ●
                                   ●   ●   ●                                           ●
                                   ●   ●
                                       ●   ●
                                           ●
                                   ●   ●
                                       ●   ●
                                           ●
                                       ●
                                       ●   ●
                                           ●                                           ●
                                       ●
                                       ●   ●
                                       ●
                                       ●
                                       ●
                                       ●




         5000




     One boxplot for
    each unique value
     of this aesthetic
qplot(table, price, data = diamonds, geom 80 "boxplot",
               50       60         70     =         90
  group = round(table))      table
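To see what group = round(table) does, it can help to do the grouping by hand. The sketch below uses simulated values as a stand-in for the table and price columns (the numbers and the names table_pct, price, by_group and group_medians are illustrative assumptions, not the diamonds data) and base graphics rather than qplot().

```r
# Simulated stand-ins for diamonds$table and diamonds$price (assumption:
# made-up values, used only to keep the example self-contained).
set.seed(1)
table_pct <- runif(500, min = 55, max = 62)
price <- rlnorm(500, meanlog = 8, sdlog = 0.5)

# Rounding turns the continuous variable into a small set of discrete groups.
groups <- round(table_pct)

# One vector of prices per rounded value: each would become one boxplot.
by_group <- split(price, groups)
group_medians <- sapply(by_group, median)

# Base-graphics counterpart of drawing one box per group.
boxplot(price ~ groups, xlab = "round(table)", ylab = "price")
```

A coarser or finer grouping only needs a different rounding rule, for example round(table_pct / 2) * 2 for boxes two units wide.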
Alpha blending



qplot(carat, price, data = diamonds, alpha = I(1/10))
qplot(carat, price, data = diamonds, alpha = I(1/50))
qplot(carat, price, data = diamonds, alpha = I(1/250))
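A rough way to reason about these alpha values, assuming standard "over" compositing of identically coloured points (an assumption about the graphics device, not something stated on the slides): k overlapping points with transparency alpha have a combined opacity of 1 - (1 - alpha)^k. The helper name combined_opacity is hypothetical.

```r
# Combined opacity of k stacked points, each drawn with the same alpha
# (assumes simple "over" compositing).
combined_opacity <- function(k, alpha) 1 - (1 - alpha)^k

alphas <- c(1 / 10, 1 / 50, 1 / 250)

# Opacity reached when the number of overlapping points equals 1 / alpha.
at_denominator <- mapply(combined_opacity, k = 1 / alphas, alpha = alphas)

# Number of overlapping points needed to reach 95% opacity.
points_for_95 <- ceiling(log(1 - 0.95) / log(1 - alphas))

print(round(at_denominator, 3))
print(points_for_95)
```

Under that assumption, a region drawn with alpha = I(1/250) only looks close to solid once several hundred points overlap there, which is why the smaller values reveal structure in the densest part of the plot.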
Statistical summary



qplot(carat, price, data = diamonds) + geom_smooth()
qplot(log10(carat), log10(price), data = diamonds) + geom_smooth()
qplot(log10(carat), log10(price), data = diamonds) +
  geom_smooth(method = "lm")
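A straight line on log-log axes corresponds to a power law on the original scale: log10(price) = a + b * log10(carat) is the same as price = 10^a * carat^b. The sketch below checks this on simulated data with a known exponent (the values 4000 and 1.7 are arbitrary assumptions, not estimates from diamonds).

```r
# Simulated data following price = 4000 * carat^1.7 with multiplicative noise.
# These constants are made up for illustration.
set.seed(42)
carat <- runif(1000, min = 0.2, max = 3)
price <- 4000 * carat^1.7 * exp(rnorm(1000, sd = 0.2))

# Linear model on the log10 scale, as geom_smooth(method = "lm") would fit.
fit <- lm(log10(price) ~ log10(carat))
a <- coef(fit)[[1]]  # intercept: log10 of the price at 1 carat
b <- coef(fit)[[2]]  # slope: the exponent of the power law

# Back-transform the fitted line to the original price scale.
predicted_price <- 10^a * carat^b
```

The slope b is directly interpretable: doubling carat multiplies the fitted price by 2^b.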
2d bins



# Very basic cleaning
diamonds$x[diamonds$x == 0] <- NA
diamonds$y[diamonds$y == 0] <- NA
diamonds$y[diamonds$y > 12] <- NA

qplot(x, y, data = diamonds)
qplot(x, y, data = diamonds, geom = "bin2d")
qplot(x, y, data = diamonds, geom = "hex")
qplot(x, y, data = diamonds, geom = "bin2d", bins = 100)
qplot(x, y, data = diamonds, geom = "hex", bins = 100)

# Zoom in
qplot(x, y, data = diamonds, geom = "bin2d", bins = 100) +
  xlim(4, 7) + ylim(4, 7)
qplot(x, y, data = diamonds, geom = "bin2d", bins = 100) +
  xlim(4, 5) + ylim(4, 5)
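Both bin2d and hex replace individual points with counts per cell. A minimal base R sketch of the rectangular case, on simulated data (x_sim, y_sim and n_bins are illustrative names, and cut() here is a stand-in for, not a copy of, what ggplot2 does internally):

```r
# Simulated, strongly correlated measurements (made-up values).
set.seed(7)
x_sim <- rnorm(10000, mean = 5.7, sd = 1.1)
y_sim <- x_sim + rnorm(10000, sd = 0.05)

# Split each axis into equal-width intervals and count points per cell.
n_bins <- 10
counts <- table(cut(x_sim, breaks = n_bins), cut(y_sim, breaks = n_bins))

# Draw the counts as a grid of coloured tiles (a 2d histogram).
count_matrix <- matrix(as.integer(counts), nrow = n_bins)
image(count_matrix, xlab = "x bin", ylab = "y bin")
```

Increasing n_bins plays the same role as bins = 100 above: finer cells, with fewer points in each.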

qplot(x, x / y, data = diamonds,
  geom = "bin2d")
qplot(x, log(x / y), data = diamonds,
  geom = "bin2d")

clean <- subset(diamonds, abs(log(x / y)) < 0.1)

qplot(x, log(x / y), data = clean, geom = "bin2d")
qplot(x, log(x / y), data = clean, geom = "bin2d",
  bins = 80)




                             What would be a good name for
                               log(x / y)? What other variable
                              might you create to go with it?
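One property worth noticing when thinking about this question: taking the log makes the ratio symmetric around zero, so a stone that is 10% wider than it is long and one that is 10% longer than it is wide sit at the same distance from 0. A small sketch with made-up measurements (x_mm and y_mm are hypothetical values, not rows from diamonds):

```r
# Hypothetical x and y measurements in mm.
x_mm <- c(4.0, 5.0, 6.1)
y_mm <- c(4.4, 5.0, 5.5)

ratio <- x_mm / y_mm
log_ratio <- log(ratio)

# Swapping x and y flips the sign of the log ratio but keeps its size.
swapped <- log(y_mm / x_mm)

# Same rule as the subset() call: keep rows whose ratio lies between
# exp(-0.1) and exp(0.1), roughly 0.905 to 1.105.
keep <- abs(log_ratio) < 0.1
print(data.frame(ratio, log_ratio, swapped, keep))
```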

Your turn
                   Continue to explore the relationship
                   between x, y, z and depth. Create new
                   variables as necessary.
                   (Hint: rerun the cleaning code from last
                   week, and create more as necessary)
                   Some good ideas here:
                   http://www.diamondhelpers.com/fivesteps/4-certified-diamonds.shtml


[Diagram: cut diamond labelled with x, table width, and z]

                                  depth = z / diameter
                              table = table width / x * 100
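The ggplot2 documentation for diamonds describes depth as the total depth percentage, z / mean(x, y) = 2 * z / (x + y), which matches the formula above if "diameter" is read as the mean of x and y. A small sketch with made-up measurements (all four input values are hypothetical, not rows from the dataset):

```r
# Hypothetical measurements in mm.
x <- 5.00
y <- 5.04
z <- 3.10
table_width <- 2.85

# Diameter taken as the mean of the two horizontal measurements (assumption
# based on the dataset documentation, not stated on the slide).
diameter <- mean(c(x, y))

depth_pct <- z / diameter * 100
table_pct <- table_width / x * 100

print(c(depth = depth_pct, table = table_pct))
```

Comparing a recomputed depth_pct with the recorded depth column is one way to spot measurement or data-entry errors.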

y_big <- diamonds$y > 10
z_big <- diamonds$z > 6

x_zero <- diamonds$x == 0
y_zero <- diamonds$y == 0
z_zero <- diamonds$z == 0

diamonds$x[x_zero] <- NA
diamonds$y[y_zero | y_big] <- NA
diamonds$z[z_zero | z_big] <- NA
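Storing each rule as a named logical vector makes it easy to check how many values a rule would remove before overwriting anything. A minimal sketch on a three-row toy data frame (toy and its values are made up for illustration):

```r
# Toy data with one impossible zero in x and z, and one implausibly large y.
toy <- data.frame(
  x = c(4.1, 0, 5.2),
  y = c(4.0, 5.1, 31.8),
  z = c(2.5, 0, 3.2)
)

y_big <- toy$y > 10
x_zero <- toy$x == 0
z_zero <- toy$z == 0

# How many values each rule flags, before changing anything.
flagged <- c(y_big = sum(y_big), x_zero = sum(x_zero), z_zero = sum(z_zero))
print(flagged)

# Replace the flagged values with NA so later plots and summaries skip them.
toy$x[x_zero] <- NA
toy$y[y_big] <- NA
toy$z[z_zero] <- NA
```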




qplot(z / y * 100, depth, data = diamonds)
last_plot() + xlim(50, 100)
last_plot() + xlim(50, 80) + ylim(50, 80)

qplot(z / x * 100, depth, data = diamonds) +
  xlim(50, 80) + ylim(50, 80)
qplot(z / x * 100, depth / (z / x), data = diamonds)
last_plot() + xlim(50, 80) + ylim(80, 120)
last_plot() + ylim(95, 105)

# ...



Iteration & stories



Stories
                   The best data analyses tell a story, with a
                   natural flow from beginning to end.
                   For homework, try to come up with
                   three plots that tell a story.
                   Stories about a small sample of the data
                   can work well.



qplot(cty, hwy, data = mpg)
qplot(cty, hwy, data = mpg, geom = "jitter")
qplot(cty, hwy, data = mpg, geom = "jitter", colour = class)
qplot(cty, cty / hwy, data = mpg, geom = "jitter",
  colour = class)
qplot(cty, cty / hwy, data = mpg, colour = class)
qplot(displ, cty / hwy, data = mpg, colour = class)
qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class)
qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class) +
  geom_smooth(se = FALSE)
qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class) +
  geom_smooth(method = "lm", se = FALSE)

qplot(displ, cty, data = mpg) + facet_wrap(~ class)
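The second step in this sequence switches to jittering because cty and hwy are recorded as whole numbers, so many cars land on exactly the same position. A small base R sketch of the idea, using made-up values and base jitter() as a stand-in for geom = "jitter" (the two do not necessarily use the same default amount of noise):

```r
# Five hypothetical cars that occupy only two distinct (cty, hwy) positions.
set.seed(3)
cty <- c(18, 18, 18, 21, 21)
hwy <- c(29, 29, 29, 30, 30)

distinct_before <- nrow(unique(data.frame(cty, hwy)))

# Add a small amount of random noise to each coordinate.
jittered <- data.frame(cty = jitter(cty), hwy = jitter(hwy))
distinct_after <- nrow(unique(jittered))

print(c(before = distinct_before, after = distinct_after))
```

The noise hides the exact values slightly, which is the price paid for seeing how many observations share each position.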

Project
                   Due in 3.5 weeks.
                   Bigger group data analysis project. (We
                   will discuss group dynamics on
                   Monday.)
                   Homework is to get you started working
                   with the data.



Next week

                   Checking on a slot machine.
                   Learning how to write functions.
                   Basics of simulation.




Feedback
     http://hadley.wufoo.com/forms/stat405-feedback/




05 Tips Tricks

  • 1. Stat405 Graphic tips & tricks Hadley Wickham Wednesday, 9 September 2009
  • 2. 1. Homework 2. Reading a scatterplot 3. Scatterplot techniques for large data 4. Iteration & story telling 5. Project & homework Wednesday, 9 September 2009
  • 3. Homework Great start! Remember the grading scheme: 4.5–5 = A+, 4–4.5 = A, 3.5–4 = A- Shorter is better than longer. Check aspect ratios. Read the comments! Wednesday, 9 September 2009
  • 4. Revision: reading a scatterplot • Big patterns • Small patterns • Deviations from the pattern • Strange patterns Wednesday, 9 September 2009
  • 6. Strong linear relationship. A number of outliers. Wednesday, 9 September 2009
  • 8. Unusual striations. Two groups? Little relationship between table and price? Wednesday, 9 September 2009
  • 10. Curved (exponential?) relationship. Outliers mostly cheaper than expected. Wednesday, 9 September 2009
  • 11. But what’s the problem with all these plots? qplot(carat, price, data = diamonds) Wednesday, 9 September 2009
  • 12. But what’s the problem with all these plots? In pairs, brainstorm solutions for 2 minutes. qplot(carat, price, data = diamonds) Wednesday, 9 September 2009
  • 13. Ideas If x discrete, use boxplots. Use semi-transparent points. Divide into bins and count number of points in each bin (2d histogram). Display statistical summary. Wednesday, 9 September 2009
  • 14. Box and whisker plots Wednesday, 9 September 2009
  • 15. Boxplots Less information than a histogram, but take up much less space. Already seen them used with discrete x values. Can also use with continuous x values, by specifying how we want the data grouped. Wednesday, 9 September 2009
  • 16. qplot(table, price, data = diamonds) Wednesday, 9 September 2009
  • 17. ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 15000 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 10000 price 5000 50 60 70 80 90 qplot(table, price, data = diamonds, geom = "boxplot") table Wednesday, 9 September 2009
  • 18. ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 15000 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 10000 ● ● ● ● ● ● ● ● ● ● price ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5000 qplot(table, price, data = diamonds, geom 80 "boxplot", 50 60 70 = 90 group = round(table)) table Wednesday, 9 September 2009
  • 19. ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 15000 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 10000 ● ● ● ● ● ● ● ● ● ● price ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5000 One boxplot for each unique value of this aesthetic qplot(table, price, data = diamonds, geom 80 "boxplot", 50 60 70 = 90 group = round(table)) table Wednesday, 9 September 2009
  • 20. Alpha blending Wednesday, 9 September 2009
  • 21. qplot(carat, price, data = diamonds, alpha = I(1/10)) Wednesday, 9 September 2009
  • 22. qplot(carat, price, data = diamonds, alpha = I(1/50)) Wednesday, 9 September 2009
  • 23. qplot(carat, price, data = diamonds, alpha = I(1/250)) Wednesday, 9 September 2009
  • 25. qplot(carat, price, data = diamonds) + geom_smooth() Wednesday, 9 September 2009
  • 26. qplot(log10(carat), log10(price), data = diamonds) + geom_smooth() Wednesday, 9 September 2009
  • 27. qplot(log10(carat), log10(price), data = diamonds) + geom_smooth(method = "lm") Wednesday, 9 September 2009
  • 28. 2d bins Wednesday, 9 September 2009
  • 29. # Very basic cleaning diamonds$x[diamonds$x == 0] <- NA diamonds$y[diamonds$y == 0] <- NA diamonds$y[diamonds$y > 12] <- NA qplot(x, y, data = diamonds) qplot(x, y, data = diamonds, geom = "bin2d") qplot(x, y, data = diamonds, geom = "hex") qplot(x, y, data = diamonds, geom = "bin2d", bins = 100) qplot(x, y, data = diamonds, geom = "hex", bins = 100) # Zoom in qplot(x, y, data = diamonds, geom = "bin2d", bins = 100) + xlim(4,7) + ylim(4,7) qplot(x, y, data = diamonds, geom = "bin2d", bins = 100) + xlim(4,5) + ylim(4,5) Wednesday, 9 September 2009
  • 31. qplot(x, x / y, data = diamonds, geom = "bin2d")
      qplot(x, log(x / y), data = diamonds, geom = "bin2d")
      clean <- subset(diamonds, abs(log(x / y)) < 0.1)
      qplot(x, log(x / y), data = clean, geom = "bin2d")
      qplot(x, log(x / y), data = clean, geom = "bin2d", bins = 80)
      What would be a good name for log(x / y)? What other variable might you create to go with it?
  • 32. Your turn: Continue to explore the relationship between x, y, z and depth. Create new variables as necessary. (Hint: rerun the cleaning code from last week, and create more as necessary.) Some good ideas here: http://www.diamondhelpers.com/fivesteps/4-certified-diamonds.shtml
  • 33. [Diagram of a diamond, labelled: x, table width, z]
      depth = z / diameter
      table = table width / x * 100
  • 34. y_big <- diamonds$y > 10
      z_big <- diamonds$z > 6
      x_zero <- diamonds$x == 0
      y_zero <- diamonds$y == 0
      z_zero <- diamonds$z == 0
      diamonds$x[x_zero] <- NA
      diamonds$y[y_zero | y_big] <- NA
      diamonds$z[z_zero | z_big] <- NA
  • 35. qplot(z/y * 100, depth, data = diamonds)
      last_plot() + xlim(50, 100)
      last_plot() + xlim(50, 80) + ylim(50, 80)
      qplot(z/x * 100, depth, data = diamonds) + xlim(50, 80) + ylim(50, 80)
      qplot(z/x * 100, depth / (z/x), data = diamonds)
      last_plot() + xlim(50, 80) + ylim(80, 120)
      last_plot() + ylim(95, 105)
      # ...
  • 36. Iteration & stories
  • 37. Stories: The best data analyses tell a story, with a natural flow from beginning to end. For homework, try to come up with three plots that tell a story. Stories about a small sample of the data can work well.
  • 38. qplot(cty, hwy, data = mpg)
      qplot(cty, hwy, data = mpg, geom = "jitter")
      qplot(cty, hwy, data = mpg, geom = "jitter", colour = class)
      qplot(cty, cty / hwy, data = mpg, geom = "jitter", colour = class)
      qplot(cty, cty / hwy, data = mpg, colour = class)
      qplot(displ, cty / hwy, data = mpg, colour = class)
      qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class)
      qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class) + geom_smooth(se = FALSE)
      qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class) + geom_smooth(method = "lm", se = FALSE)
      qplot(displ, cty, data = mpg) + facet_wrap(~ class)
  • 39. Project: Due in 3.5 weeks. Bigger group data analysis project. (Will be discussing group dynamics on Monday.) Homework is to get you started working with the data.
  • 40. Next week: Checking on a slot machine. Learning how to write functions. Basics of simulation.
  • 41. Feedback: http://hadley.wufoo.com/forms/stat405-feedback/
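The depth definition on slide 33 (depth = z / diameter) and the ratios explored on slide 35 can be checked by hand with a small sketch. This is an illustration only: the measurements below are hypothetical (not taken from the diamonds data), the helper name depth_pct is made up for this example, and "diameter" is assumed to mean the average of x and y.

```r
# Hypothetical measurements in mm; NOT rows from the diamonds data.
stones <- data.frame(
  x = c(4.0, 4.0, 6.0),
  y = c(4.0, 4.2, 6.0),
  z = c(2.4, 2.46, 3.9)
)

# Depth as a percentage, assuming "diameter" is the mean of x and y.
# depth_pct is a hypothetical helper, not a ggplot2 function.
depth_pct <- function(x, y, z) {
  z / ((x + y) / 2) * 100
}

stones$depth_calc <- depth_pct(stones$x, stones$y, stones$z)

# The two single-axis ratios plotted against depth on slide 35.
stones$z_over_x <- stones$z / stones$x * 100
stones$z_over_y <- stones$z / stones$y * 100

print(stones)
```

When x and y are equal, all three ratios agree. When they differ (the second row), z/x and z/y fall on either side of the averaged value, which is one reason the plots of z/x * 100 and z/y * 100 against depth are close to, but not exactly on, a straight line.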