Stat405
Visualising time & space


                            Hadley Wickham
Thursday, 14 October 2010
1. New data: baby names by state
                2. Visualise time (done!)
                3. Visualise time conditional on space
                4. Visualise space
                5. Visualise space conditional on time
                6. Aside: geographic data


Project
                    Project 2 due November 4.
                    Basically same as project 2, but will be
                    using the full play-by-play data from the
                    08/09 NBA season.
                    I expect to see lots of ddply usage, and
                    more advanced graphics (next week).



Baby names by state
                    Top 100 male and female baby
                    names for each state, 1960–2008.
                    480,000 records
                    (100 * 50 * 2 * 48)
                    Slightly different variables: state,
                    year, name, sex and number.

                                          CC BY http://www.flickr.com/photos/the_light_show/2586781132
Subset

                    Easier to compare states if we have
                    proportions. To calculate proportions,
                    need births. Could only find data from
                    1981.
                    Selected 30 names that occurred fairly
                    frequently, and had interesting patterns.
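A minimal sketch of the proportion calculation described above, assuming a separate table of total births per state and year. The data frames `counts` and `births` and every value in them are made up for illustration; only the column names state, year, name, sex and number come from the slides.

```r
# Toy data: made-up counts, not the real baby names data
counts <- data.frame(
  state  = c("TX", "TX", "CA"),
  year   = c(2000, 2001, 2000),
  name   = c("Juan", "Juan", "Juan"),
  sex    = c("boy", "boy", "boy"),
  number = c(200, 150, 300),
  stringsAsFactors = FALSE
)

# Hypothetical births table: total births per state and year
births <- data.frame(
  state  = c("TX", "TX", "CA"),
  year   = c(2000, 2001, 2000),
  births = c(20000, 30000, 60000),
  stringsAsFactors = FALSE
)

# Attach total births to each record, then divide to get a proportion
with_births <- merge(counts, births, by = c("state", "year"))
with_births$prop <- with_births$number / with_births$births
with_births[, c("state", "year", "name", "prop")]
```

Years without a births total would drop out of the merge, which is consistent with the subset only starting where births data could be found.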



Aaron Alex Allison Alyssa Angela Ashley
                    Carlos Chelsea Christian Eric Evan
                    Gabriel Jacob Jared Jennifer Jonathan
                    Juan Katherine Kelsey Kevin Matthew
                    Michelle Natalie Nicholas Noah Rebecca
                    Sara Sarah Taylor Thomas




Getting started

                library(ggplot2)
                library(plyr)

                bnames <- read.csv("interesting-names.csv",
                  stringsAsFactors = FALSE)

                matthew <- subset(bnames, name == "Matthew")




Time | Space
[Figure: prop (y-axis, 0.01 to 0.04) against year (x-axis, 1985 to 2005) for Matthew, one line per state]

qplot(year, prop, data = matthew,
  geom = "line", group = state)
[Figure: the same plot split into one panel per state (AK to WY, including DC); prop on the y-axis, year on the x-axis]

last_plot() + facet_wrap(~ state)
Your turn

                    Ensure that you can re-create these plots
                    for other names. What do you see?
                    Can you write a function that plots the
                    trend for a given name?




show_name <- function(name) {
  # Keep only the rows for the requested name
  one_name <- bnames[bnames$name == name, ]
  qplot(year, prop, data = one_name, geom = "line",
    group = state)
}

show_name("Jessica")
show_name("Aaron")
show_name("Juan") + facet_wrap(~ state)




[Figure: prop (y-axis, 0.01 to 0.04) against year (x-axis, 1985 to 2005) for Matthew, one line per state, with a single thick smooth line overlaid]

qplot(year, prop, data = matthew, geom = "line", group = state) +
  geom_smooth(aes(group = 1), se = FALSE, size = 3)

aes(group = 1): so we only get one smooth for the whole dataset.
Three useful tools
                    Smoothing: can be easier to perceive
                    overall trend by smoothing individual
                    functions
                    Centering: remove differences in center
                    by subtracting mean
                    Scaling: remove differences in range by
                    dividing by sd, or by scaling to [0, 1]
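As a rough illustration of centering and scaling, here is a small sketch on two made-up series (`a` and `b` are hypothetical, not taken from the baby names data). The series differ tenfold in level but have the same shape, which only becomes obvious after scaling.

```r
# Made-up proportions for two hypothetical states
a <- c(0.010, 0.020, 0.030)
b <- c(0.001, 0.002, 0.003)

# Centering: subtract the mean, so the series averages zero
a_c <- a - mean(a)

# Scaling by sd: the series ends up with standard deviation 1
a_sd <- a / sd(a)

# Scaling to [0, 1]: subtract the minimum, divide by the range
a_01 <- (a - min(a)) / (max(a) - min(a))
b_01 <- (b - min(b)) / (max(b) - min(b))

# After scaling to [0, 1] the two series coincide
rbind(a_01, b_01)
```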


library(mgcv)
     smooth <- function(y, x, amount = 0.1) {
       mod <- gam(y ~ s(x, bs = "cr"), sp = amount)
       as.numeric(predict(mod))
     }

     matthew <- ddply(matthew, "state", transform,
       prop_s1 = smooth(prop, year, amount = 0.01),
       prop_s2 = smooth(prop, year, amount = 0.1),
       prop_s3 = smooth(prop, year, amount = 1),
       prop_s4 = smooth(prop, year, amount = 10))

     qplot(year, prop_s1, data = matthew, geom = "line",
       group = state)

center <- function(x) x - mean(x, na.rm = TRUE)

     matthew <- ddply(matthew, "state", transform,
       prop_c = center(prop),
       prop_sc = center(prop_s1))

     qplot(year, prop_c, data = matthew, geom = "line",
       group = state)
     qplot(year, prop_sc, data = matthew, geom = "line",
       group = state)




# Note: defining scale() here masks base::scale() in this session
     scale <- function(x) x / sd(x, na.rm = TRUE)
     scale01 <- function(x) {
       rng <- range(x, na.rm = TRUE)
       (x - rng[1]) / (rng[2] - rng[1])
     }

     matthew <- ddply(matthew, "state", transform,
       prop_ss = scale01(prop_s1))

     qplot(year, prop_ss, data = matthew, geom = "line",
       group = state)



Your turn
                    Create a plot to show all names
                    simultaneously. Does smoothing every
                    name in every state make it easier to see
                    patterns?
                    Hint: run the R code on the next slide to
                    eliminate name/state combinations with 10 or
                    fewer years of data.


longterm <- ddply(bnames, c("name", "state"),
     function(df) {
        if (nrow(df) > 10) {
          df
        }
     })
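The same filtering rule can be sketched in base R on a toy data frame, which makes it easy to check what `nrow(df) > 10` keeps: groups need at least 11 rows to survive. The data frame `toy` and its values are made up for illustration, and `ave()` is swapped in here for `ddply()`.

```r
# Made-up data: "Juan" has 12 years in TX but only 3 in VT
toy <- data.frame(
  name  = "Juan",
  state = rep(c("TX", "VT"), times = c(12, 3)),
  year  = c(1990:2001, 1990:1992),
  stringsAsFactors = FALSE
)

# Number of rows in each name/state group, repeated for every row
n_years <- ave(seq_len(nrow(toy)), toy$name, toy$state, FUN = length)

# Same rule as the slide: keep groups with more than 10 rows
longterm_toy <- toy[n_years > 10, ]
unique(longterm_toy$state)
```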




qplot(year, prop, data = bnames, geom = "line",
       group = state, alpha = I(1 / 4)) +
       facet_wrap(~ name)

     longterm <- ddply(longterm, c("name", "state"),
       transform, prop_s = smooth(prop, year))

     qplot(year, prop_s, data = longterm, geom = "line",
       group = state, alpha = I(1 / 4)) +
       facet_wrap(~ name)
     last_plot() + facet_wrap(~ name, scales = "free_y")



Space

Spatial plots

                    Choropleth map:
                    map colour of areas to value.
                    Proportional symbol map:
                    map size of symbols to value




juan2000 <- subset(bnames, name == "Juan" &
       year == 2000)

     # Turn map data into normal data frame
     library(maps)
     states <- map_data("state")
     states$state <- state.abb[match(states$region,
       tolower(state.name))]

     # Join datasets
     choropleth <- join(states, juan2000, by = "state")

     # Plot with polygons
     qplot(long, lat, data = choropleth, geom = "polygon",
       fill = prop, group = group)
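The abbreviation lookup in the code above can be tried on its own. This sketch uses only the built-in `state.name` and `state.abb` vectors; the `regions` vector is a made-up stand-in for the lower-case region names that the map data is assumed to contain.

```r
# state.name and state.abb are built-in R vectors covering the 50 states
regions <- c("texas", "new york", "district of columbia")

# Find each region's position in the lower-cased state names,
# then use that position to pick the matching abbreviation
abbs <- state.abb[match(regions, tolower(state.name))]
abbs

# "district of columbia" has no entry in state.name, so it becomes NA
is.na(abbs)
```

Any region that comes back as NA will not match a state code in the join, so its polygons would end up with a missing prop.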

[Figure: choropleth map of the US; long on the x-axis (-120 to -70), lat on the y-axis (30 to 45), fill colour shows prop (legend 0.004 to 0.01)]

What’s the problem with this map? How could we fix it?
ggplot(choropleth, aes(long, lat, group = group)) +
       geom_polygon(fill = "white", colour = "grey50") +
       geom_polygon(aes(fill = prop))




[Figure: choropleth map of the US; long on the x-axis (-120 to -70), lat on the y-axis (30 to 45), fill colour shows prop (legend 0.004 to 0.01)]
Problems?

                    What are the problems with this sort of
                    plot?
                    Take one minute to brainstorm some
                    possible issues.




Problems
                    Big areas are the most striking. But in the US (as
                    with most countries) big areas tend to be the
                    least populated. The most populated areas
                    tend to be small and dense, e.g. the East
                    Coast.
                    (Another, computational, problem: need to
                    push around a lot of data to create these
                    plots.)


[Figure: proportional symbol map of the US; one point per state plotted by long (x-axis, -120 to -70) and lat (y-axis, 30 to 45), with a point legend for prop (0.004 to 0.010)]
mid_range <- function(x) mean(range(x))
     centres <- ddply(states, c("state"), summarise,
       lat = mid_range(lat), long = mid_range(long))

     bubble <- join(juan2000, centres, by = "state")
     qplot(long, lat, data = bubble,
       size = prop, colour = prop)

     ggplot(bubble, aes(long, lat)) +
       geom_polygon(aes(group = group), data = states,
         fill = NA, colour = "grey50") +
       geom_point(aes(size = prop, colour = prop))
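The slides do not say why `mid_range()` is used rather than `mean()`. One plausible reason, offered here as an assumption, is that boundary points are not evenly spread, so the mean is pulled towards the most densely traced stretch of border. A sketch with made-up longitudes:

```r
mid_range <- function(x) mean(range(x))

# Made-up longitudes for a boundary traced with many points on its
# eastern edge and only one on its western edge
long <- c(-100, -90, -90.1, -90.2, -90.3, -90.4)

mean_long <- mean(long)        # pulled towards the densely traced edge
mid_long  <- mid_range(long)   # halfway between the two extremes

c(mean = mean_long, mid_range = mid_long)
```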


Your turn


                    Replicate either a choropleth or a
                    proportional symbol map with the name
                    of your choice.




Space | Time
Your turn


                    Try to create this plot yourself. What is
                    the main difference between this plot and
                    the previous one?




juan <- subset(bnames, name == "Juan")
     bubble <- merge(juan, centres, by = "state")

     ggplot(bubble, aes(long, lat)) +
       geom_polygon(aes(group = group), data = states,
         fill = NA, colour = "grey80") +
       geom_point(aes(colour = prop)) +
       facet_wrap(~ year)




Aside: geographic data

                    Boundaries for most countries available
                    from: http://gadm.org
                    To use with ggplot2, use the fortify
                    function to convert to usual data frame.
                    Will also need to install the sp package.



# install.packages("sp")

     library(sp)
     load(url("http://gadm.org/data/rda/CHE_adm1.RData"))

     head(as.data.frame(gadm))
     ch <- fortify(gadm, region = "ID_1")
     str(ch)

     qplot(long, lat, group = group, data = ch,
       geom = "polygon", colour = I("white"))



This work is licensed under the Creative
       Commons Attribution-Noncommercial 3.0 United
       States License. To view a copy of this license,
       visit http://creativecommons.org/licenses/by-nc/3.0/us/
       or send a letter to Creative Commons,
       171 Second Street, Suite 300, San Francisco,
       California, 94105, USA.



