SlideShare a Scribd company logo
1 of 44
Download to read offline
Stat405Visualising time & space


                            Hadley Wickham
Thursday, 14 October 2010
1. New data: baby names by state
                2. Visualise time (done!)
                3. Visualise time conditional on space
                4. Visualise space
                5. Visualise space conditional on time
                6. Aside: geographic data


Thursday, 14 October 2010
Project
                    Project 2 due November 4.
                    Basically same as project 2, but will be
                    using the full play-by-play data from the
                    08/09 NBA season.
                    I expect to see lots of ddply usage, and
                    more advanced graphics (next week).



Thursday, 14 October 2010
Baby names by state
                    Top 100 male and female baby
                    names for each state, 1960–2008.
                    480,000 records
                    (100 * 50 * 2 * 48)
                    Slightly different variables: state,
                    year, name, sex and number.

                                          CC BY http://www.flickr.com/photos/the_light_show/2586781132
Thursday, 14 October 2010
Subset

                    Easier to compare states if we have
                    proportions. To calculate proportions,
                    need births. Could only find data from
                    1981.
                    Selected 30 names that occurred fairly
                    frequently, and had interesting patterns.



Thursday, 14 October 2010
Aaron Alex Allison Alyssa Angela Ashley
                    Carlos Chelsea Christian Eric Evan
                    Gabriel Jacob Jared Jennifer Jonathan
                    Juan Katherine Kelsey Kevin Matthew
                    Michelle Natalie Nicholas Noah Rebecca
                    Sara Sarah Taylor Thomas




Thursday, 14 October 2010
Getting started

                library(ggplot2)
                library(plyr)

                bnames <- read.csv("interesting-names.csv",
                  stringsAsFactors = F)

                matthew <- subset(bnames, name == "Matthew")




Thursday, 14 October 2010
Time |
                            Space
Thursday, 14 October 2010
0.04




        0.03
 prop




        0.02




        0.01




                            1985   1990          1995   2000   2005
                                          year
Thursday, 14 October 2010
0.04




        0.03
 prop




        0.02




        0.01




                            1985   1990   1995   2000      2005
qplot(year, prop, data = matthew,year
                                  geom = "line", group = state)
Thursday, 14 October 2010
AK       AL         AR         AZ          CA        CO         CT         DC
        0.04
        0.03
        0.02
        0.01
                     DE       FL         GA         HI          IA        ID         IL         IN
        0.04
        0.03
        0.02
        0.01
                     KS       KY         LA         MA          MD        ME         MI         MN
        0.04
        0.03
        0.02
        0.01
                    MO       MS          MT         NC          NE        NH         NJ         NM
        0.04
 prop




        0.03
        0.02
        0.01
                     NV       NY         OH         OK          OR        PA         RI         SC
        0.04
        0.03
        0.02
        0.01
                     SD       TN         TX         UT          VA        VT         WA         WI
        0.04
        0.03
        0.02
        0.01
                    WV       WY
        0.04
        0.03
        0.02
        0.01
               198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000
                 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005
                                                         year
Thursday, 14 October 2010
AK       AL         AR         AZ         CA         CO         CT         DC
        0.04
        0.03
        0.02
        0.01
                     DE       FL         GA         HI         IA         ID         IL         IN
        0.04
        0.03
        0.02
        0.01
                     KS       KY         LA         MA         MD         ME         MI         MN
        0.04
        0.03
        0.02
        0.01
                    MO       MS          MT         NC         NE         NH         NJ         NM
        0.04
 prop




        0.03
        0.02
        0.01
                     NV       NY         OH         OK         OR         PA         RI         SC
        0.04
        0.03
        0.02
        0.01
                     SD       TN         TX         UT         VA         VT         WA         WI
        0.04
        0.03
        0.02
        0.01
                    WV       WY
        0.04
        0.03
        0.02
        0.01
               198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000
                 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005

last_plot() + facet_wrap(~ state)year
Thursday, 14 October 2010
Your turn

                    Ensure that you can re-create these plots
                    for other names. What do you see?
                    Can you write a function that plots the
                    trend for a given name?




Thursday, 14 October 2010
show_name <- function(name) {
       name <- bnames[bnames$name == name, ]
       qplot(year, prop, data = name, geom = "line",
         group = state)
     }

     show_name("Jessica")
     show_name("Aaron")
     show_name("Juan") + facet_wrap(~ state)




Thursday, 14 October 2010
0.04




        0.03
 prop




        0.02




        0.01




                            1985   1990          1995   2000   2005
                                          year
Thursday, 14 October 2010
0.04




        0.03
 prop




        0.02




        0.01




qplot(year, prop, data = matthew, geom1995 "line", 2000
                1985       1990         =          group = state) +
                                                              2005
  geom_smooth(aes(group = 1), se year size = 3)
                                 = F,
Thursday, 14 October 2010
0.04




        0.03
 prop




        0.02




        0.01


                         So we only get one smooth
                             for the whole dataset
qplot(year, prop, data = matthew, geom1995 "line", 2000
                1985       1990          =         group = state) +
                                                              2005
  geom_smooth(aes(group = 1), se year size = 3)
                                   = F,
Thursday, 14 October 2010
Three useful tools
                    Smoothing: can be easier to perceive
                    overall trend by smoothing individual
                    functions
                    Centering: remove differences in center
                    by subtracting mean
                    Scaling: remove differences in range by
                    dividing by sd, or by scaling to [0, 1]


Thursday, 14 October 2010
library(mgcv)
     smooth <- function(y, x, amount = 0.1) {
       mod <- gam(y ~ s(x, bs = "cr"), sp = amount)
       as.numeric(predict(mod))
     }

     matthew <- ddply(matthew, "state", transform,
       prop_s1 = smooth(prop, year, amount = 0.01),
       prop_s2 = smooth(prop, year, amount = 0.1),
       prop_s3 = smooth(prop, year, amount = 1),
       prop_s4 = smooth(prop, year, amount = 10))

     qplot(year, prop_s1, data = matthew, geom = "line",
       group = state)

Thursday, 14 October 2010
center <- function(x) x - mean(x, na.rm = T)

     matthew <- ddply(matthew, "state", transform,
       prop_c = center(prop),
       prop_sc = center(prop_s1))

     qplot(year, prop_c, data = matthew, geom = "line",
       group = state)
     qplot(year, prop_sc, data = matthew, geom = "line",
       group = state)




Thursday, 14 October 2010
scale <- function(x) x / sd(x, na.rm = T)
     scale01 <- function(x) {
       rng <- range(x, na.rm = T)
       (x - rng[1]) / (rng[2] - rng[1])
     }

     matthew <- ddply(matthew, "state", transform,
       prop_ss = scale01(prop_s1))

     qplot(year, prop_ss, data = matthew, geom = "line",
       group = state)



Thursday, 14 October 2010
Your turn
                    Create a plot to show all names
                    simultaneously. Does smoothing every
                    name in every state make it easier to see
                    patterns?
                    Hint: run the following R code on the next
                    slide to eliminate names with less than 10
                    years of data


Thursday, 14 October 2010
longterm <- ddply(bnames, c("name", "state"),
     function(df) {
        if (nrow(df) > 10) {
          df
        }
     })




Thursday, 14 October 2010
qplot(year, prop, data = bnames, geom = "line",
       group = state, alpha = I(1 / 4)) +
       facet_wrap(~ name)

     longterm <- ddply(longterm, c("name", "state"),
       transform, prop_s = smooth(prop, year))

     qplot(year, prop_s, data = longterm, geom = "line",
       group = state, alpha = I(1 / 4)) +
       facet_wrap(~ name)
     last_plot() + facet_wrap(~ name, scales = "free_y")



Thursday, 14 October 2010
Space

Thursday, 14 October 2010
Spatial plots

                    Choropleth map:
                    map colour of areas to value.
                    Proportional symbol map:
                    map size of symbols to value




Thursday, 14 October 2010
juan2000 <- subset(bnames, name == "Juan" &
       year == 2000)

     # Turn map data into normal data frame
     library(maps)
     states <- map_data("state")
     states$state <- state.abb[match(states$region,
       tolower(state.name))]

     # Join datasets
     choropleth <- join(states, juan2000, by = "state")

     # Plot with polygons
     qplot(long, lat, data = choropleth, geom = "polygon",
       fill = prop, group = group)

Thursday, 14 October 2010
45




       40
                                                                  prop
                                                                         0.004
                                                                         0.006
 lat




                                                                         0.008
       35
                                                                          0.01




       30




                        −120   −110   −100      −90   −80   −70
                                         long
Thursday, 14 October 2010
What’s the problem
                                                    with this map?
       45                                       How could we fix it?


       40
                                                                      prop
                                                                             0.004
                                                                             0.006
 lat




                                                                             0.008
       35
                                                                              0.01




       30




                        −120   −110   −100      −90    −80    −70
                                         long
Thursday, 14 October 2010
ggplot(choropleth, aes(long, lat, group = group)) +
       geom_polygon(fill = "white", colour = "grey50") +
       geom_polygon(aes(fill = prop))




Thursday, 14 October 2010
45




       40
                                                                  prop
                                                                         0.004
                                                                         0.006
 lat




                                                                         0.008
       35
                                                                          0.01




       30




                        −120   −110   −100      −90   −80   −70
                                         long
Thursday, 14 October 2010
Problems?

                    What are the problems with this sort of
                    plot?
                    Take one minute to brainstorm some
                    possible issues.




Thursday, 14 October 2010
Problems
                    Big areas most striking. But in the US (as
                    with most countries) big areas tend to
                    least populated. Most populated areas
                    tend to be small and dense - e.g. the East
                    coast.
                    (Another computational problem: need to
                    push around a lot of data to create these
                    plots)


Thursday, 14 October 2010
●




       45
                            ●

                                                                                    ●




                                                                                            ●
                                                     ●




       40                                                        ●
                                                                                        ●



                                               ●                                                      prop
                                    ●                    ●
                                                                                                       ●     0.004
                                ●                                                                     ●      0.006
 lat




                                                         ●                     ●
                                                                                                      ●      0.008
       35
                                        ●      ●                                                      ●      0.010

                                                                          ●




                                                   ●
       30

                                                                      ●




                        −120            −110       −100         −90           −80               −70
                                                         long
Thursday, 14 October 2010
mid_range <- function(x) mean(range(x))
     centres <- ddply(states, c("state"), summarise,
       lat = mid_range(lat), long = mid_range(long))

     bubble <- join(juan2000, centres, by = "state")
     qplot(long, lat, data = bubble,
       size = prop, colour = prop)

     ggplot(bubble, aes(long, lat)) +
       geom_polygon(aes(group = group), data = states,
         fill = NA, colour = "grey50") +
       geom_point(aes(size = prop, colour = prop))


Thursday, 14 October 2010
Your turn


                    Replicate either a choropleth or a
                    proportional symbol map with the name
                    of your choice.




Thursday, 14 October 2010
Space |
                             Time
Thursday, 14 October 2010
Thursday, 14 October 2010
Your turn


                    Try and create this plot yourself. What is
                    the main difference between this plot and
                    the previous?




Thursday, 14 October 2010
juan <- subset(bnames, name == "Juan")
     bubble <- merge(juan, centres, by = "state")

     ggplot(bubble, aes(long, lat)) +
       geom_polygon(aes(group = group), data = states,
         fill = NA, colour = "grey80") +
       geom_point(aes(colour = prop)) +
       facet_wrap(~ year)




Thursday, 14 October 2010
Aside: geographic data

                    Boundaries for most countries available
                    from: http://gadm.org
                    To use with ggplot2, use the fortify
                    function to convert to usual data frame.
                    Will also need to install the sp package.



Thursday, 14 October 2010
# install.packages("sp")

     library(sp)
     load(url("http://gadm.org/data/rda/CHE_adm1.RData"))

     head(as.data.frame(gadm))
     ch <- fortify(gadm, region = "ID_1")
     str(ch)

     qplot(long, lat, group = group, data = ch,
       geom = "polygon", colour = I("white"))



Thursday, 14 October 2010
Thursday, 14 October 2010
This work is licensed under the Creative
       Commons Attribution-Noncommercial 3.0 United
       States License. To view a copy of this license,
       visit http://creativecommons.org/licenses/by-nc/
       3.0/us/ or send a letter to Creative Commons,
       171 Second Street, Suite 300, San Francisco,
       California, 94105, USA.




Thursday, 14 October 2010

More Related Content

More from Hadley Wickham (20)

23 data-structures
23 data-structures23 data-structures
23 data-structures
 
R packages
R packagesR packages
R packages
 
22 spam
22 spam22 spam
22 spam
 
21 spam
21 spam21 spam
21 spam
 
19 tables
19 tables19 tables
19 tables
 
17 polishing
17 polishing17 polishing
17 polishing
 
14 case-study
14 case-study14 case-study
14 case-study
 
13 case-study
13 case-study13 case-study
13 case-study
 
12 adv-manip
12 adv-manip12 adv-manip
12 adv-manip
 
11 adv-manip
11 adv-manip11 adv-manip
11 adv-manip
 
11 adv-manip
11 adv-manip11 adv-manip
11 adv-manip
 
10 simulation
10 simulation10 simulation
10 simulation
 
10 simulation
10 simulation10 simulation
10 simulation
 
09 bootstrapping
09 bootstrapping09 bootstrapping
09 bootstrapping
 
08 functions
08 functions08 functions
08 functions
 
07 problem-solving
07 problem-solving07 problem-solving
07 problem-solving
 
06 data
06 data06 data
06 data
 
05 subsetting
05 subsetting05 subsetting
05 subsetting
 
04 reports
04 reports04 reports
04 reports
 
03 extensions
03 extensions03 extensions
03 extensions
 

Recently uploaded

Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
FIDO Alliance
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
FIDO Alliance
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
FIDO Alliance
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc
 

Recently uploaded (20)

The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 

15 time-space

  • 1. Stat405Visualising time & space Hadley Wickham Thursday, 14 October 2010
  • 2. 1. New data: baby names by state 2. Visualise time (done!) 3. Visualise time conditional on space 4. Visualise space 5. Visualise space conditional on time 6. Aside: geographic data Thursday, 14 October 2010
  • 3. Project Project 2 due November 4. Basically same as project 2, but will be using the full play-by-play data from the 08/09 NBA season. I expect to see lots of ddply usage, and more advanced graphics (next week). Thursday, 14 October 2010
  • 4. Baby names by state Top 100 male and female baby names for each state, 1960–2008. 480,000 records (100 * 50 * 2 * 48) Slightly different variables: state, year, name, sex and number. CC BY http://www.flickr.com/photos/the_light_show/2586781132 Thursday, 14 October 2010
  • 5. Subset Easier to compare states if we have proportions. To calculate proportions, need births. Could only find data from 1981. Selected 30 names that occurred fairly frequently, and had interesting patterns. Thursday, 14 October 2010
  • 6. Aaron Alex Allison Alyssa Angela Ashley Carlos Chelsea Christian Eric Evan Gabriel Jacob Jared Jennifer Jonathan Juan Katherine Kelsey Kevin Matthew Michelle Natalie Nicholas Noah Rebecca Sara Sarah Taylor Thomas Thursday, 14 October 2010
  • 7. Getting started library(ggplot2) library(plyr) bnames <- read.csv("interesting-names.csv", stringsAsFactors = F) matthew <- subset(bnames, name == "Matthew") Thursday, 14 October 2010
  • 8. Time | Space Thursday, 14 October 2010
  • 9. 0.04 0.03 prop 0.02 0.01 1985 1990 1995 2000 2005 year Thursday, 14 October 2010
  • 10. 0.04 0.03 prop 0.02 0.01 1985 1990 1995 2000 2005 qplot(year, prop, data = matthew,year geom = "line", group = state) Thursday, 14 October 2010
  • 11. AK AL AR AZ CA CO CT DC 0.04 0.03 0.02 0.01 DE FL GA HI IA ID IL IN 0.04 0.03 0.02 0.01 KS KY LA MA MD ME MI MN 0.04 0.03 0.02 0.01 MO MS MT NC NE NH NJ NM 0.04 prop 0.03 0.02 0.01 NV NY OH OK OR PA RI SC 0.04 0.03 0.02 0.01 SD TN TX UT VA VT WA WI 0.04 0.03 0.02 0.01 WV WY 0.04 0.03 0.02 0.01 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 year Thursday, 14 October 2010
  • 12. AK AL AR AZ CA CO CT DC 0.04 0.03 0.02 0.01 DE FL GA HI IA ID IL IN 0.04 0.03 0.02 0.01 KS KY LA MA MD ME MI MN 0.04 0.03 0.02 0.01 MO MS MT NC NE NH NJ NM 0.04 prop 0.03 0.02 0.01 NV NY OH OK OR PA RI SC 0.04 0.03 0.02 0.01 SD TN TX UT VA VT WA WI 0.04 0.03 0.02 0.01 WV WY 0.04 0.03 0.02 0.01 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 last_plot() + facet_wrap(~ state)year Thursday, 14 October 2010
  • 13. Your turn Ensure that you can re-create these plots for other names. What do you see? Can you write a function that plots the trend for a given name? Thursday, 14 October 2010
  • 14. show_name <- function(name) { name <- bnames[bnames$name == name, ] qplot(year, prop, data = name, geom = "line", group = state) } show_name("Jessica") show_name("Aaron") show_name("Juan") + facet_wrap(~ state) Thursday, 14 October 2010
  • 15. 0.04 0.03 prop 0.02 0.01 1985 1990 1995 2000 2005 year Thursday, 14 October 2010
  • 16. 0.04 0.03 prop 0.02 0.01 qplot(year, prop, data = matthew, geom1995 "line", 2000 1985 1990 = group = state) + 2005 geom_smooth(aes(group = 1), se year size = 3) = F, Thursday, 14 October 2010
  • 17. 0.04 0.03 prop 0.02 0.01 So we only get one smooth for the whole dataset qplot(year, prop, data = matthew, geom1995 "line", 2000 1985 1990 = group = state) + 2005 geom_smooth(aes(group = 1), se year size = 3) = F, Thursday, 14 October 2010
  • 18. Three useful tools Smoothing: can be easier to perceive overall trend by smoothing individual functions Centering: remove differences in center by subtracting mean Scaling: remove differences in range by dividing by sd, or by scaling to [0, 1] Thursday, 14 October 2010
  • 19. library(mgcv) smooth <- function(y, x, amount = 0.1) { mod <- gam(y ~ s(x, bs = "cr"), sp = amount) as.numeric(predict(mod)) } matthew <- ddply(matthew, "state", transform, prop_s1 = smooth(prop, year, amount = 0.01), prop_s2 = smooth(prop, year, amount = 0.1), prop_s3 = smooth(prop, year, amount = 1), prop_s4 = smooth(prop, year, amount = 10)) qplot(year, prop_s1, data = matthew, geom = "line", group = state) Thursday, 14 October 2010
  • 20. center <- function(x) x - mean(x, na.rm = T) matthew <- ddply(matthew, "state", transform, prop_c = center(prop), prop_sc = center(prop_s1)) qplot(year, prop_c, data = matthew, geom = "line", group = state) qplot(year, prop_sc, data = matthew, geom = "line", group = state) Thursday, 14 October 2010
  • 21. scale <- function(x) x / sd(x, na.rm = T) scale01 <- function(x) { rng <- range(x, na.rm = T) (x - rng[1]) / (rng[2] - rng[1]) } matthew <- ddply(matthew, "state", transform, prop_ss = scale01(prop_s1)) qplot(year, prop_ss, data = matthew, geom = "line", group = state) Thursday, 14 October 2010
  • 22. Your turn Create a plot to show all names simultaneously. Does smoothing every name in every state make it easier to see patterns? Hint: run the following R code on the next slide to eliminate names with less than 10 years of data Thursday, 14 October 2010
  • 23. longterm <- ddply(bnames, c("name", "state"), function(df) { if (nrow(df) > 10) { df } }) Thursday, 14 October 2010
  • 24. qplot(year, prop, data = bnames, geom = "line", group = state, alpha = I(1 / 4)) + facet_wrap(~ name) longterm <- ddply(longterm, c("name", "state"), transform, prop_s = smooth(prop, year)) qplot(year, prop_s, data = longterm, geom = "line", group = state, alpha = I(1 / 4)) + facet_wrap(~ name) last_plot() + facet_wrap(~ name, scales = "free_y") Thursday, 14 October 2010
  • 26. Spatial plots Choropleth map: map colour of areas to value. Proportional symbol map: map size of symbols to value Thursday, 14 October 2010
  • 27. juan2000 <- subset(bnames, name == "Juan" & year == 2000) # Turn map data into normal data frame library(maps) states <- map_data("state") states$state <- state.abb[match(states$region, tolower(state.name))] # Join datasets choropleth <- join(states, juan2000, by = "state") # Plot with polygons qplot(long, lat, data = choropleth, geom = "polygon", fill = prop, group = group) Thursday, 14 October 2010
  • 28. 45 40 prop 0.004 0.006 lat 0.008 35 0.01 30 −120 −110 −100 −90 −80 −70 long Thursday, 14 October 2010
  • 29. What’s the problem with this map? 45 How could we fix it? 40 prop 0.004 0.006 lat 0.008 35 0.01 30 −120 −110 −100 −90 −80 −70 long Thursday, 14 October 2010
  • 30. ggplot(choropleth, aes(long, lat, group = group)) + geom_polygon(fill = "white", colour = "grey50") + geom_polygon(aes(fill = prop)) Thursday, 14 October 2010
  • 31. 45 40 prop 0.004 0.006 lat 0.008 35 0.01 30 −120 −110 −100 −90 −80 −70 long Thursday, 14 October 2010
  • 32. Problems? What are the problems with this sort of plot? Take one minute to brainstorm some possible issues. Thursday, 14 October 2010
  • 33. Problems Big areas most striking. But in the US (as with most countries) big areas tend to least populated. Most populated areas tend to be small and dense - e.g. the East coast. (Another computational problem: need to push around a lot of data to create these plots) Thursday, 14 October 2010
  • 34. 45 ● ● ● ● 40 ● ● ● prop ● ● ● 0.004 ● ● 0.006 lat ● ● ● 0.008 35 ● ● ● 0.010 ● ● 30 ● −120 −110 −100 −90 −80 −70 long Thursday, 14 October 2010
  • 35. mid_range <- function(x) mean(range(x)) centres <- ddply(states, c("state"), summarise, lat = mid_range(lat), long = mid_range(long)) bubble <- join(juan2000, centres, by = "state") qplot(long, lat, data = bubble, size = prop, colour = prop) ggplot(bubble, aes(long, lat)) + geom_polygon(aes(group = group), data = states, fill = NA, colour = "grey50") + geom_point(aes(size = prop, colour = prop)) Thursday, 14 October 2010
  • 36. Your turn Replicate either a choropleth or a proportional symbol map with the name of your choice. Thursday, 14 October 2010
  • 37. Space | Time Thursday, 14 October 2010
  • 39. Your turn Try and create this plot yourself. What is the main difference between this plot and the previous? Thursday, 14 October 2010
  • 40. juan <- subset(bnames, name == "Juan") bubble <- merge(juan, centres, by = "state") ggplot(bubble, aes(long, lat)) + geom_polygon(aes(group = group), data = states, fill = NA, colour = "grey80") + geom_point(aes(colour = prop)) + facet_wrap(~ year) Thursday, 14 October 2010
  • 41. Aside: geographic data Boundaries for most countries available from: http://gadm.org To use with ggplot2, use the fortify function to convert to usual data frame. Will also need to install the sp package. Thursday, 14 October 2010
  • 42. # install.packages("sp") library(sp) load(url("http://gadm.org/data/rda/CHE_adm1.RData")) head(as.data.frame(gadm)) ch <- fortify(gadm, region = "ID_1") str(ch) qplot(long, lat, group = group, data = ch, geom = "polygon", colour = I("white")) Thursday, 14 October 2010
  • 44. This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/ 3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Thursday, 14 October 2010