Successfully reported this slideshow.
Stat405
              Visualising time & space


                             Hadley Wickham
Wednesday, 21 October 2009
1. New data: baby names by state
               2. Visualise time (done!)
               3. Visualise time conditional on ...
Baby names by state
                   Top 100 male and female baby names
                   for each state, 1960–2008.
  ...
Subset
                   Easier to compare states if we have
                   proportions. To calculate proportions,
  ...
Aaron Alex Allison Alyssa Angela Ashley
                   Carlos Chelsea Christian Eric Evan
                   Gabriel J...
Getting started

               library(ggplot2)
               library(plyr)

               bnames <- read.csv("interest...
Time | Space



Wednesday, 21 October 2009
0.04




        0.03
 prop




        0.02




        0.01




                             1985   1990          1995  ...
0.04




        0.03
 prop




        0.02




        0.01




                             1985   1990   1995   2000  ...
AK        AL         AR         AZ          CA        CO         CT         DC
        0.04
        0.03
        0.02
    ...
AK        AL         AR         AZ         CA         CO         CT         DC
        0.04
        0.03
        0.02
    ...
Your turn

                   Pick some names out of the list and
                   explore. What do you see?
           ...
show_name <- function(name) {
       name <- bnames[bnames$name == name, ]
       qplot(year, prop, data = name, geom = "l...
0.04




        0.03
 prop




        0.02




        0.01




                             1985   1990          1995  ...
0.04




        0.03
 prop




        0.02




        0.01




qplot(year, prop, data = matthew, geom1995 "line", 2000
...
0.04




        0.03
 prop




        0.02




        0.01


                         So we only get one smooth
       ...
Two useful tools
                   Smoothing: can be easier to perceive
                   overall trend by smoothing ind...
library(mgcv)
     smooth <- function(y, x) {
       as.numeric(predict(gam(y ~ s(x))))
     }

     matthew <- ddply(matt...
index <- function(y, x) {
       y / y[order(x)[1]]
     }

     matthew <- ddply(matthew, "state", transform,
       prop...
Your turn
                   Create a plot to show all names
                   simultaneously. Does smoothing every
     ...
longterm <- ddply(bnames, c("name", "state"),
     function(df) {
        if (nrow(df) > 10) {
          df
        }
    ...
qplot(year, prop, data = bnames, geom = "line",
       group = state, alpha = I(1 / 4)) +
       facet_wrap(~ name)

     ...
Space



Wednesday, 21 October 2009
Spatial plots

                   Choropleth map:
                   map colour of areas to value.
                   Prop...
juan2000 <- subset(bnames, name == "Juan" & year == 2000)

     # Turn map data into normal data frame
     library(maps)
...
45




       40
                                                                 prop
                                   ...
What’s the problem
                                                   with this map?
       45                            ...
ggplot(choropleth, aes(long, lat, group = group)) +
       geom_polygon(fill = "white", colour = "grey50") +
       geom_p...
45




       40
                                                                 prop
                                   ...
Problems?

                   What are the problems with this sort of
                   plot?
                   Take one...
Problems
                   Big areas most striking. But in the US (as
                   with most countries) big areas t...
●




       45
                        ●

                                                                               ...
mid_range <- function(x) mean(range(x))
     centres <- ddply(states, c("state"), summarise,
       lat = mid_range(lat), ...
Your turn


                   Replicate either a choropleth or a
                   proportional symbol map with the name...
Space | Time



Wednesday, 21 October 2009
Wednesday, 21 October 2009
Your turn


                   Try and create this plot yourself. What is
                   the main difference between t...
juan <- subset(bnames, name == "Juan")
     bubble <- merge(juan, centres, by = "state")

     ggplot(bubble, aes(long, la...
Aside: geographic data

                   Boundaries for most countries available
                   from: http://gadm.or...
# install.packages("sp")

     library(sp)
     load(url("http://gadm.org/data/rda/CHE_adm1.RData"))

     head(as.data.fr...
Wednesday, 21 October 2009
This work is licensed under the Creative
      Commons Attribution-Noncommercial 3.0 United
      States License. To view ...
Upcoming SlideShare
Loading in …5
×

15 Time Space

1,311 views

Published on

Published in: Technology, Education
  • Be the first to comment

15 Time Space

  1. 1. Stat405 Visualising time & space Hadley Wickham Wednesday, 21 October 2009
  2. 2. 1. New data: baby names by state 2. Visualise time (done!) 3. Visualise time conditional on space 4. Visualise space 5. Visualise space conditional on time Wednesday, 21 October 2009
  3. 3. Baby names by state Top 100 male and female baby names for each state, 1960–2008. 480,000 records (100 * 50 * 2 * 48) Slightly different variables: state, year, name, sex and number. To keep the data manageable, we will look at the top 25 names of all time. CC BY http://www.flickr.com/photos/the_light_show/2586781132 Wednesday, 21 October 2009
  4. 4. Subset Easier to compare states if we have proportions. To calculate proportions, need births. But could only find data from 1981. Then selected 30 names that occurred fairly frequently, and had interesting patterns. Wednesday, 21 October 2009
  5. 5. Aaron Alex Allison Alyssa Angela Ashley Carlos Chelsea Christian Eric Evan Gabriel Jacob Jared Jennifer Jonathan Juan Katherine Kelsey Kevin Matthew Michelle Natalie Nicholas Noah Rebecca Sara Sarah Taylor Thomas Wednesday, 21 October 2009
  6. 6. Getting started library(ggplot2) library(plyr) bnames <- read.csv("interesting-names.csv", stringsAsFactors = F) matthew <- subset(bnames, name == "Matthew") Wednesday, 21 October 2009
  7. 7. Time | Space Wednesday, 21 October 2009
  8. 8. 0.04 0.03 prop 0.02 0.01 1985 1990 1995 2000 2005 year Wednesday, 21 October 2009
  9. 9. 0.04 0.03 prop 0.02 0.01 1985 1990 1995 2000 2005 qplot(year, prop, data = matthew,year geom = "line", group = state) Wednesday, 21 October 2009
  10. 10. AK AL AR AZ CA CO CT DC 0.04 0.03 0.02 0.01 DE FL GA HI IA ID IL IN 0.04 0.03 0.02 0.01 KS KY LA MA MD ME MI MN 0.04 0.03 0.02 0.01 MO MS MT NC NE NH NJ NM 0.04 prop 0.03 0.02 0.01 NV NY OH OK OR PA RI SC 0.04 0.03 0.02 0.01 SD TN TX UT VA VT WA WI 0.04 0.03 0.02 0.01 WV WY 0.04 0.03 0.02 0.01 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 year Wednesday, 21 October 2009
  11. 11. AK AL AR AZ CA CO CT DC 0.04 0.03 0.02 0.01 DE FL GA HI IA ID IL IN 0.04 0.03 0.02 0.01 KS KY LA MA MD ME MI MN 0.04 0.03 0.02 0.01 MO MS MT NC NE NH NJ NM 0.04 prop 0.03 0.02 0.01 NV NY OH OK OR PA RI SC 0.04 0.03 0.02 0.01 SD TN TX UT VA VT WA WI 0.04 0.03 0.02 0.01 WV WY 0.04 0.03 0.02 0.01 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 last_plot() + facet_wrap(~ state)year Wednesday, 21 October 2009
  12. 12. Your turn Pick some names out of the list and explore. What do you see? Can you write a function that plots the trend for a given name? Wednesday, 21 October 2009
  13. 13. show_name <- function(name) { name <- bnames[bnames$name == name, ] qplot(year, prop, data = name, geom = "line", group = state) } show_name("Jessica") show_name("Aaron") show_name("Juan") + facet_wrap(~ state) Wednesday, 21 October 2009
  14. 14. 0.04 0.03 prop 0.02 0.01 1985 1990 1995 2000 2005 year Wednesday, 21 October 2009
  15. 15. 0.04 0.03 prop 0.02 0.01 qplot(year, prop, data = matthew, geom1995 "line", 2000 1985 1990 = group = state) + 2005 geom_smooth(aes(group = 1), se year size = 3) = F, Wednesday, 21 October 2009
  16. 16. 0.04 0.03 prop 0.02 0.01 So we only get one smooth for the whole dataset qplot(year, prop, data = matthew, geom1995 "line", 2000 1985 1990 = group = state) + 2005 geom_smooth(aes(group = 1), se year size = 3) = F, Wednesday, 21 October 2009
  17. 17. Two useful tools Smoothing: can be easier to perceive overall trend by smoothing individual functions Indexing: remove initial differences by indexing to the first value. Not that useful here, but good tools to have in your toolbox. Wednesday, 21 October 2009
  18. 18. library(mgcv) smooth <- function(y, x) { as.numeric(predict(gam(y ~ s(x)))) } matthew <- ddply(matthew, "state", transform, prop_s = smooth(prop, year)) qplot(year, prop_s, data = matthew, geom = "line", group = state) Wednesday, 21 October 2009
  19. 19. index <- function(y, x) { y / y[order(x)[1]] } matthew <- ddply(matthew, "state", transform, prop_i = index(prop, year), prop_si = index(prop_s, year)) qplot(year, prop_i, data = matthew, geom = "line", group = state) qplot(year, prop_si, data = matthew, geom = "line", group = state) Wednesday, 21 October 2009
  20. 20. Your turn Create a plot to show all names simultaneously. Does smoothing every name in every state make it easier to see patterns? Hint: run the following R code on the next slide to eliminate names with less than 10 years of data Wednesday, 21 October 2009
  21. 21. longterm <- ddply(bnames, c("name", "state"), function(df) { if (nrow(df) > 10) { df } }) Wednesday, 21 October 2009
  22. 22. qplot(year, prop, data = bnames, geom = "line", group = state, alpha = I(1 / 4)) + facet_wrap(~ name) longterm <- ddply(longterm, c("name", "state"), transform, prop_s = smooth(prop, year)) qplot(year, prop_s, data = longterm, geom = "line", group = state, alpha = I(1 / 4)) + facet_wrap(~ name) last_plot() + facet_wrap(scales = "free_y") Wednesday, 21 October 2009
  23. 23. Space Wednesday, 21 October 2009
  24. 24. Spatial plots Choropleth map: map colour of areas to value. Proportional symbol map: map size of symbols to value Wednesday, 21 October 2009
  25. 25. juan2000 <- subset(bnames, name == "Juan" & year == 2000) # Turn map data into normal data frame library(maps) states <- map_data("state") states$state <- state.abb[match(states$region, tolower(state.name))] # Merge and then restore original order choropleth <- merge(juan2000, states, by = "state", all.y = T) choropleth <- choropleth[order(choropleth$order), ] # Plot with polygons qplot(long, lat, data = choropleth, geom = "polygon", fill = prop, group = group) Wednesday, 21 October 2009
  26. 26. 45 40 prop 0.004 0.006 lat 0.008 35 0.01 30 −120 −110 −100 −90 −80 −70 long Wednesday, 21 October 2009
  27. 27. What’s the problem with this map? 45 How could we fix it? 40 prop 0.004 0.006 lat 0.008 35 0.01 30 −120 −110 −100 −90 −80 −70 long Wednesday, 21 October 2009
  28. 28. ggplot(choropleth, aes(long, lat, group = group)) + geom_polygon(fill = "white", colour = "grey50") + geom_polygon(aes(fill = prop)) Wednesday, 21 October 2009
  29. 29. 45 40 prop 0.004 0.006 lat 0.008 35 0.01 30 −120 −110 −100 −90 −80 −70 long Wednesday, 21 October 2009
  30. 30. Problems? What are the problems with this sort of plot? Take one minute to brainstorm some possible issues. Wednesday, 21 October 2009
  31. 31. Problems Big areas most striking. But in the US (as with most countries) big areas tend to least populated. Most populated areas tend to be small and dense - e.g. the East coast. (Another computational problem: need to push around a lot of data to create these plots) Wednesday, 21 October 2009
  32. 32. ● 45 ● ● ● ● 40 ● ● ● prop ● ● ● 0.004 ● ● 0.006 lat ● ● ● 0.008 35 ● ● ● 0.010 ● ● 30 ● −120 −110 −100 −90 −80 −70 long Wednesday, 21 October 2009
  33. 33. mid_range <- function(x) mean(range(x)) centres <- ddply(states, c("state"), summarise, lat = mid_range(lat), long = mid_range(long)) bubble <- merge(juan2000, centres, by = "state") qplot(long, lat, data = bubble, size = prop, colour = prop) ggplot(bubble, aes(long, lat)) + geom_polygon(aes(group = group), data = states, fill = NA, colour = "grey50") + geom_point(aes(size = prop, colour = prop)) Wednesday, 21 October 2009
  34. 34. Your turn Replicate either a choropleth or a proportional symbol map with the name of your choice. Wednesday, 21 October 2009
  35. 35. Space | Time Wednesday, 21 October 2009
  36. 36. Wednesday, 21 October 2009
  37. 37. Your turn Try and create this plot yourself. What is the main difference between this plot and the previous? Wednesday, 21 October 2009
  38. 38. juan <- subset(bnames, name == "Juan") bubble <- merge(juan, centres, by = "state") ggplot(bubble, aes(long, lat)) + geom_polygon(aes(group = group), data = states, fill = NA, colour = "grey50") + geom_point(aes(size = prop, colour = prop)) + facet_wrap(~ year) Wednesday, 21 October 2009
  39. 39. Aside: geographic data Boundaries for most countries available from: http://gadm.org To use with ggplot2, use the fortify function to convert to usual data frame. Will also need to install the sp package. Wednesday, 21 October 2009
  40. 40. # install.packages("sp") library(sp) load(url("http://gadm.org/data/rda/CHE_adm1.RData")) head(as.data.frame(gadm)) ch <- fortify(gadm, region = "ID_1") str(ch) qplot(long, lat, group = group, data = ch, geom = "polygon", colour = I("white")) Wednesday, 21 October 2009
  41. 41. Wednesday, 21 October 2009
  42. 42. This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/ 3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Wednesday, 21 October 2009

×