Stat405: Graphics for large data


                           Hadley Wickham
Thursday, 26 August 2010
Majoring in Stat

                    • Declare early (even if you’re not sure)
                    • Weekly lunches
                    • Summer opportunities
                      (research & internships)




1. Leftovers from last lecture
                2. The diamonds data
                3. Histograms and bar charts
                4. More boxplots and scatterplots
                5. Homework



# Remember: start with
library(ggplot2)

qplot(reorder(class, hwy), hwy, data = mpg, geom = "jitter")
[Figure: jittered points of hwy (y-axis, roughly 15 to 40) against reorder(class, hwy) (x-axis); classes appear in the order pickup, suv, minivan, 2seater, midsize, subcompact, compact]
qplot(reorder(class, hwy), hwy, data = mpg, geom = "boxplot")
[Figure: boxplots of hwy (y-axis, roughly 15 to 40) against reorder(class, hwy) (x-axis); classes appear in the order pickup, suv, minivan, 2seater, midsize, subcompact, compact]
qplot(reorder(class, hwy), hwy, data = mpg,
  geom = c("jitter", "boxplot"))
[Figure: jittered points with boxplots drawn over them; hwy (y-axis, roughly 15 to 40) against reorder(class, hwy) (x-axis); classes appear in the order pickup, suv, minivan, 2seater, midsize, subcompact, compact]
Your turn

                    Read the help for reorder. Redraw the
                    previous plots with class ordered by
                    median hwy.
                    How would you put the jittered points on
                    top of the boxplots?
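
One possible answer, sketched with the same qplot() interface used on the earlier slides. It assumes a ggplot2 version in which qplot() is still available (recent releases mark it as deprecated), and it relies on two things: the FUN argument of reorder(), and layers being drawn in the order the geoms are listed.

```r
library(ggplot2)

# reorder() sorts the levels of class by a summary of hwy;
# FUN = median replaces the default summary (the mean).
class_by_median <- reorder(mpg$class, mpg$hwy, FUN = median)
levels(class_by_median)

# Geoms are drawn in the order given, so listing "boxplot" first
# puts the jittered points on top of the boxes.
p <- qplot(reorder(class, hwy, FUN = median), hwy, data = mpg,
  geom = c("boxplot", "jitter"))
p
```

Swapping the order back to c("jitter", "boxplot") reproduces the previous slide, where the boxes hide some of the points.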




Diamonds



Diamonds data
                    ~54,000 round diamonds from
                    http://www.diamondse.info/
                    Carat, colour, clarity, cut
                    Total depth, table, depth,
                    width, height
                    Price


[Diagram: a diamond labelled with x, table width, and z]

                               depth = z / diameter
                           table = table width / x * 100
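
A rough check of the depth formula against the data, offered as a sketch only. It assumes, following the ggplot2 documentation for the diamonds dataset, that "diameter" means the mean of x and y and that depth is stored as a percentage. Rows with a zero dimension are dropped before dividing.

```r
library(ggplot2)

# Drop rows where a dimension was recorded as zero
d <- subset(diamonds, x > 0 & y > 0 & z > 0)

# Assumption: diameter = mean of x and y, depth expressed as a percentage
depth_calc <- 100 * d$z / ((d$x + d$y) / 2)

# Differences should be close to zero for most diamonds
summary(depth_calc - d$depth)
```

The table formula cannot be checked the same way, because the dataset does not include the table width itself.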

Recall

                    Write down five ways to inspect the
                    diamonds dataset.
                    You have one minute!
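
For reference, five common choices using base R functions; these are not the only valid answers.

```r
library(ggplot2)

head(diamonds)      # first few rows
str(diamonds)       # structure: variable types and example values
summary(diamonds)   # per-variable summaries
dim(diamonds)       # number of rows and columns
names(diamonds)     # variable names
```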




Your turn


                    Inspect the data and familiarise yourself
                    with the variables. If you don’t know what
                    they mean, look them up on Wikipedia.




Histogram &
                            bar charts


Histograms and bar charts

                    Used to display the distribution of a
                    variable
                    Categorical variable → bar chart
                    Continuous variable → histogram




Always experiment with the bin width!

Examples
                # With only one variable, qplot guesses that
                # you want a bar chart or histogram
                qplot(cut, data = diamonds)

                qplot(carat, data = diamonds)
                qplot(carat, data = diamonds, binwidth = 1)
                qplot(carat, data = diamonds, binwidth = 0.1)
                qplot(carat, data = diamonds, binwidth = 0.01)
                resolution(diamonds$carat)

                last_plot() + xlim(0, 3)
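
As a small illustration of why 0.01 is the narrowest bin width worth trying here: resolution() reports the smallest gap between distinct values of a variable. This is a sketch that assumes carat is recorded to two decimal places, as in the diamonds data shipped with ggplot2.

```r
library(ggplot2)

# Smallest non-zero gap between carat values
res <- resolution(diamonds$carat)
res

# Bins narrower than the resolution cannot reveal more detail,
# so use it as the smallest bin width
p <- qplot(carat, data = diamonds, binwidth = res) + xlim(0, 3)
p
```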


                # Common ggplot2 technique: adding together plot
                # components, as in last_plot() + xlim(0, 3)

qplot(table, data = diamonds, binwidth = 1)

     # To zoom in on a plot region use xlim() and ylim()
     qplot(table, data = diamonds, binwidth = 1) +
       xlim(50, 70)
     qplot(table, data = diamonds, binwidth = 0.1) +
       xlim(50, 70)
     qplot(table, data = diamonds, binwidth = 0.1) +
       xlim(50, 70) + ylim(0, 50)

     # Note that this type of zooming discards data
     # outside of the plot regions
     # See coord_cartesian() for an alternative
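
A minimal sketch of the two kinds of zooming side by side, using the same table histogram. The object names zoom_drop and zoom_keep are only illustrative.

```r
library(ggplot2)

# xlim()/ylim() set scale limits, so anything outside them is discarded
zoom_drop <- qplot(table, data = diamonds, binwidth = 0.1) +
  xlim(50, 70) + ylim(0, 50)

# coord_cartesian() keeps all the data and only changes the visible region
zoom_keep <- qplot(table, data = diamonds, binwidth = 0.1) +
  coord_cartesian(xlim = c(50, 70), ylim = c(0, 50))

# Number of diamonds that fall outside the x limits
n_outside <- sum(diamonds$table < 50 | diamonds$table > 70)
n_outside
```

Printing zoom_drop produces warnings about removed rows and drops every bar taller than 50, while zoom_keep simply clips tall bars at the top of the panel.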


Additional variables

                    As with scatterplots, you can use aesthetics
                    or faceting. Using aesthetics creates
                    pretty, but ineffective, plots.
                    The following examples show the
                    difference when investigating the
                    relationship between cut and depth.



qplot(depth, data = diamonds, binwidth = 0.2)
[Figure: histogram of depth (x-axis, 56 to 70) against count (y-axis, 0 to 4000)]

qplot(depth, data = diamonds, binwidth = 0.2,
  fill = cut) + xlim(55, 70)
[Figure: stacked histogram of depth (x-axis, 56 to 70) against count (y-axis, 0 to 4000), filled by cut: Fair, Good, Very Good, Premium, Ideal]

Fill is the aesthetic for fill colour

qplot(depth, data = diamonds, binwidth = 0.2) +
  xlim(55, 70) + facet_wrap(~ cut)
[Figure: histograms of depth (x-axis, 56 to 70) against count (y-axis, 0 to 2500), one panel per cut: Fair, Good, Very Good, Premium, Ideal]

Your turn

                    Explore the distribution of price.
                    How does it vary with colour, cut, and
                    clarity?
                    Practice zooming in on regions of interest.
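
One way to get started, as a sketch rather than a model answer. The bin widths and zoom limits are arbitrary starting points, and note that the column in the dataset is spelled color.

```r
library(ggplot2)

# Overall distribution of price; 500 is just a starting bin width
qplot(price, data = diamonds, binwidth = 500)

# One panel per colour grade (the column is named color)
qplot(price, data = diamonds, binwidth = 500) + facet_wrap(~ color)

# Zoom in on cheaper diamonds with a narrower bin width
p <- qplot(price, data = diamonds, binwidth = 50) +
  xlim(0, 2500) + facet_wrap(~ cut)
p
```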




Box and
                           whisker plots


Boxplots

                    Less information than a histogram, but
                    take up much less space.
                    Already seen them used with discrete x
                    values. Can also use with continuous x
                    values, by specifying how we want the
                    data grouped.
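
A minimal sketch of grouping a continuous x variable, assuming that rounding table to the nearest whole number is an acceptable way to form the groups. The choice of round() is an illustration, not necessarily the grouping used in the lecture.

```r
library(ggplot2)

# Assumption: group table into 1-unit-wide bins by rounding
p <- qplot(table, price, data = diamonds, geom = "boxplot",
  group = round(table))
p

# Number of groups, i.e. boxes drawn
n_groups <- length(unique(round(diamonds$table)))
n_groups
```

Without the group argument, all observations are treated as a single group and only one box is drawn.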



qplot(table, price, data = diamonds)
qplot(table, price, data = diamonds, geom = "boxplot")
[Figure: a single boxplot of price (y-axis, 5000 to 15000) against table (x-axis, 50 to 90), with a long column of outlying points]
                                        ●   ●
                                            ●   ●
                                                ●   ●   ●   ●   ●   ●     ●
                                                                          ●     ●
                                    ●   ●   ●
                                            ●   ●
                                                ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●   ●
                               ●
                               ●    ●   ●   ●   ●   ●   ●
                                                        ●   ●   ●   ●
                                                                    ●   ● ●
                               ●    ●   ●
                                        ●   ●
                                            ●   ●   ●
                                                    ●   ●
                                                        ●
                                                        ●   ●
                                                            ●   ●   ●   ●
                           ●   ●
                             ● ●    ●   ●   ●   ●
                                                ●   ●
                                                    ●   ●   ●
                                                            ●   ●   ●
                               ●    ●   ●   ●
                                            ●
                                            ●   ●
                                                ●   ●   ●   ●   ●   ●           ●
                               ●    ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●           ●●
                                                                                ●
                               ●    ●
                                    ●   ●   ●   ●   ●   ●   ●
                                                            ●   ●   ●
                               ●    ●   ●
                                        ●
                                        ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●
                                                                    ●
                                    ●   ●   ●       ●
                                                    ●   ●   ●           ● ●
         15000                 ●
                               ●
                                    ●
                                    ●
                                    ●
                                        ●
                                        ●
                                        ●
                                        ●
                                            ●
                                            ●
                                            ●
                                                ●
                                                ●
                                                ●
                                                ●
                                                    ●
                                                    ●
                                                    ●
                                                        ●
                                                        ●
                                                        ●
                                                        ●
                                                            ●
                                                            ●
                                                            ●
                                                                ●
                                                                ●
                                                                ●
                                                                    ●
                                                                    ●
                                                                    ●   ●
                                                                        ●
                                                                          ●         ●
                                                                                    ●
                                    ●   ●
                                        ●   ●
                                            ●   ●   ●   ●   ●   ●
                                                                ●   ●   ●
                             ●      ●
                                    ●   ●
                                        ●   ●
                                            ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●       ●   ●
                                    ●
                                    ●   ●   ●   ●   ●   ●       ●
                                                                ●
                                                                ●   ●           ●
                                    ●
                                    ●   ●   ●   ●
                                                ●   ●           ●
                                    ●   ●
                                        ●
                                        ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●           ●
                                                                ●       ●
                                ●   ●
                                    ●   ●   ●   ●   ●           ●   ●   ●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●               ●
                                                                ●   ●
                                                                    ●
                                                                    ●   ●       ●●●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●               ●
                                                                ●   ●           ●
                                ●   ●
                                    ●   ●   ●   ●
                                                ●               ●
                                                                ●   ●       ●
                                    ●
                                    ●   ●
                                        ●   ●   ●
                                                ●               ●
                                                                ●   ●   ●         ●
                                        ●   ●   ●               ●
                                                                ●   ●
                                                                    ●   ●
                                ●
                                    ●
                                    ●
                                    ●
                                        ●
                                        ●   ●
                                            ●
                                            ●
                                                ●
                                                ●
                                                ●               ●
                                                                ●   ●
                                                                    ●   ●       ●●●     ●
                                    ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●               ●
                                                                ●   ●
                                                                    ●   ●
                                                                        ●
                                    ●   ●   ●
                                            ●   ●               ●
                                                                ●   ●   ●   ●           ●
                                    ●
                                    ●   ●
                                        ●   ●
                                            ●
                                            ●   ●
                                                ●                   ●
                                    ●   ●   ●
                                            ●   ●                   ●   ●           ●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                       ●   ●   ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                       ●
                                                                    ●   ●   ●   ●
                                ●
                                ●   ●   ●   ●                           ●   ●
                                    ●   ●   ●                           ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                               ●
                                                                            ●   ●● ●        ●
                                    ●
                                    ●   ●   ●                           ●   ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                               ●   ●
                                ●
                                ●   ●   ●   ●
                                            ●                           ●         ● ●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                           ●
                                ●   ●   ●
                                        ●
                                        ●   ●
                                            ●                                     ●
         10000                  ●
                                ●
                                    ●
                                    ●
                                    ●
                                        ●
                                        ●
                                        ●
                                            ●                                   ●
 price




                                ●   ●
                                    ●   ●
                                        ●
                                ●
                                ●   ●   ●                                           ●
                                ●   ●
                                    ●   ●
                                        ●
                                ●   ●
                                    ●   ●
                                        ●
                                    ●
                                    ●   ●
                                        ●                                           ●
                                    ●
                                    ●   ●
                                    ●
                                    ●
                                    ●
                                    ●




         5000




qplot(table, price, data = diamonds, geom 80 "boxplot",
               50       60         70     =         90
  group = round(table))      table
●   ●   ●
                                            ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●   ●
                                    ●
                                    ●   ●
                                        ●   ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●     ●
                            ●       ●   ●   ●
                                            ●   ●   ●   ●   ●   ●
                                                                ●   ●   ●
                              ●     ●   ●
                                        ●   ●   ●
                                                ●   ●   ●   ●   ●   ●
                                    ●   ●   ●       ●   ●   ●   ●
                            ● ●     ●
                                    ●   ●
                                        ●
                                        ●
                                            ●
                                            ●
                                                ●
                                                ●
                                                ●   ●
                                                    ●
                                                    ●
                                                        ●
                                                        ●
                                                        ●   ●
                                                            ●   ●
                                                                ●
                                                                ●
                                                                    ●
                                                                    ●   ● ●
                                    ●   ●   ●   ●
                                                ●   ●
                                                    ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●   ●
                              ●     ●   ●
                                        ●   ●
                                            ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●   ●   ●   ●   ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●
                                                                ●   ●   ●
                              ●     ●
                                    ●   ●
                                        ●   ●   ●
                                                ●   ●   ●
                                                        ●   ●
                                                            ●   ●   ●   ●
                                    ●
                                    ●   ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●
                                                                    ●   ● ●
                                    ●   ●   ●   ●   ●   ●   ●
                                                            ●   ●
                                                                ●   ●           ●
                               ●    ●   ●
                                        ●
                                        ●   ●
                                            ●   ●
                                                ●   ●   ●   ●   ●   ●     ●
                                                                          ●     ●
                                    ●   ●   ●
                                            ●   ●
                                                ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●   ●
                               ●
                               ●    ●   ●   ●   ●   ●   ●
                                                        ●   ●   ●   ●
                                                                    ●   ● ●
                               ●    ●   ●
                                        ●   ●
                                            ●   ●   ●
                                                    ●   ●
                                                        ●
                                                        ●   ●
                                                            ●   ●   ●   ●
                           ●   ●
                             ● ●    ●   ●   ●   ●
                                                ●   ●
                                                    ●   ●   ●
                                                            ●   ●   ●
                               ●    ●   ●   ●
                                            ●
                                            ●   ●
                                                ●   ●   ●   ●   ●   ●           ●
                               ●    ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●           ●●
                                                                                ●
                               ●    ●
                                    ●   ●   ●   ●   ●   ●   ●
                                                            ●   ●   ●
                               ●    ●   ●
                                        ●
                                        ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●
                                                                    ●
                                    ●   ●   ●       ●
                                                    ●   ●   ●           ● ●
         15000                 ●
                               ●
                                    ●
                                    ●
                                    ●
                                        ●
                                        ●
                                        ●
                                        ●
                                            ●
                                            ●
                                            ●
                                                ●
                                                ●
                                                ●
                                                ●
                                                    ●
                                                    ●
                                                    ●
                                                        ●
                                                        ●
                                                        ●
                                                        ●
                                                            ●
                                                            ●
                                                            ●
                                                                ●
                                                                ●
                                                                ●
                                                                    ●
                                                                    ●
                                                                    ●   ●
                                                                        ●
                                                                          ●         ●
                                                                                    ●
                                    ●   ●
                                        ●   ●
                                            ●   ●   ●   ●   ●   ●
                                                                ●   ●   ●
                             ●      ●
                                    ●   ●
                                        ●   ●
                                            ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●       ●   ●
                                    ●
                                    ●   ●   ●   ●   ●   ●       ●
                                                                ●
                                                                ●   ●           ●
                                    ●
                                    ●   ●   ●   ●
                                                ●   ●           ●
                                    ●   ●
                                        ●
                                        ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●           ●
                                                                ●       ●
                                ●   ●
                                    ●   ●   ●   ●   ●           ●   ●   ●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●               ●
                                                                ●   ●
                                                                    ●
                                                                    ●   ●       ●●●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●               ●
                                                                ●   ●           ●
                                ●   ●
                                    ●   ●   ●   ●
                                                ●               ●
                                                                ●   ●       ●
                                    ●
                                    ●   ●
                                        ●   ●   ●
                                                ●               ●
                                                                ●   ●   ●         ●
                                        ●   ●   ●               ●
                                                                ●   ●
                                                                    ●   ●
                                ●
                                    ●
                                    ●
                                    ●
                                        ●
                                        ●   ●
                                            ●
                                            ●
                                                ●
                                                ●
                                                ●               ●
                                                                ●   ●
                                                                    ●   ●       ●●●     ●
                                    ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●               ●
                                                                ●   ●
                                                                    ●   ●
                                                                        ●
                                    ●   ●   ●
                                            ●   ●               ●
                                                                ●   ●   ●   ●           ●
                                    ●
                                    ●   ●
                                        ●   ●
                                            ●
                                            ●   ●
[Figure: boxplots of price (y-axis) against table (x-axis, ticks from 50 to 90), one box per rounded table value]

One boxplot for each unique value of this aesthetic (group).

qplot(table, price, data = diamonds, geom = "boxplot",
  group = round(table))
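
The effect of the group aesthetic can be sketched as follows. This is a minimal illustration, not part of the original slides: it uses ggplot() layers instead of qplot() (which is deprecated in recent ggplot2 releases), and the 2-unit band in the second plot is an arbitrary assumption chosen only to show a coarser grouping.

```r
library(ggplot2)

# One box per integer value of table, as on the slide
p_unit <- ggplot(diamonds, aes(table, price)) +
  geom_boxplot(aes(group = round(table)))

# Coarser grouping: one box per 2-unit band (band width is an assumption)
p_wide <- ggplot(diamonds, aes(table, price)) +
  geom_boxplot(aes(group = round(table / 2) * 2))

# Each distinct group value produces one box
n_unit <- length(unique(round(diamonds$table)))
n_wide <- length(unique(round(diamonds$table / 2) * 2))
```

Printing p_unit or p_wide draws the plot; comparing n_unit and n_wide shows how the grouping expression controls the number of boxes.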

Scatterplots



Interpreting a scatterplot

                    • Global patterns
                    • Local patterns
                    • Deviations




Strong linear relationship.
               A number of outliers.




Unusual striations. Two
                           groups? Little relationship
                           between table and price?




Curved (exponential?)
                           relationship. Outliers mostly
                           cheaper than expected.
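
One hedged way to probe the "exponential?" guess is to redraw the plot on transformed scales. The sketch below assumes the curved plot is price against carat (the slide image is not available here, so this is an assumption). If price were exponential in carat, log(price) against carat would look roughly straight; a straight line on log-log axes would instead point to a power law.

```r
library(ggplot2)

# Assumption: the curved relationship is price against carat
p_semilog <- ggplot(diamonds, aes(carat, log(price))) + geom_point()
p_loglog  <- ggplot(diamonds, aes(log(carat), log(price))) + geom_point()

# Rough numeric companion to the plots: linear correlation on each scale
r_semilog <- cor(diamonds$carat, log(diamonds$price))
r_loglog  <- cor(log(diamonds$carat), log(diamonds$price))
```

The correlations are only a crude summary; the plots themselves are the better guide to which transformation straightens the relationship.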


But what’s the problem with all these plots?

In pairs, brainstorm solutions for 2 minutes.

qplot(carat, price, data = diamonds)

Idea               ggplot
Small points       shape = I(".")
Transparency       alpha = I(1/50)
Jittering          geom = "jitter"
Smooth curve       geom = "smooth"
2d bins            geom = "bin2d" or geom = "hex"
Density contours   geom = "density2d"
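
The ideas in the table above can be sketched as one plot object per remedy. This is an illustrative sketch rather than the slides' own code: it uses ggplot() layers in place of the qplot() arguments listed in the table, because qplot() is deprecated in recent ggplot2 releases. The object names are hypothetical, and geom_hex() is only built when the hexbin package is installed.

```r
library(ggplot2)

# Base plot: about 54,000 points, so many of them overlap
base <- ggplot(diamonds, aes(carat, price))

p_small   <- base + geom_point(shape = ".")     # small points
p_alpha   <- base + geom_point(alpha = 1 / 50)  # transparency
p_jitter  <- base + geom_jitter()               # jittering
p_smooth  <- base + geom_smooth()               # smooth curve
p_bin2d   <- base + geom_bin2d()                # 2d rectangular bins
p_contour <- base + geom_density_2d()           # density contours

# Hexagonal bins need the hexbin package, so build this one conditionally
if (requireNamespace("hexbin", quietly = TRUE)) {
  p_hex <- base + geom_hex()
}
```

Printing any of these objects draws the plot. Layers can also be combined, for example transparent points with a smooth curve on top.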

Your turn

                    Practice doing these plots yourself.
                    Read the online documentation for each
                    plot type: http://had.co.nz/ggplot2




Homework

                    Practice your graphics/data exploration
                    skills with the diamonds or mpg data.
                    Due in one week.
                    Make sure to read the grading rubric, and
                    find a colour printer.



Asking questions

                    You have two minutes to write down as
                    many questions as you can come up with
                    that you might want to answer about the
                    diamonds data.
                    Write your best question on a piece of
                    paper and turn it in.


