Upcoming SlideShare
×

13 Ddply

1,467 views

Published on

1 Like
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

Views
Total views
1,467
On SlideShare
0
From Embeds
0
Number of Embeds
6
Actions
Shares
0
33
0
Likes
1
Embeds 0
No embeds

No notes for slide

13 Ddply

1. 1. Hadley Wickham Stat405ddply case study Wednesday, 7 October 2009
2. 2. 1. Feedback & homework & project 2. Overall goal: dual-sex names vs. errors 3. Selecting smaller subset 4. Classiﬁcation 5. Individual exploration Wednesday, 7 October 2009
3. 3. I’ll try and go slower when writing things on the board. Remind me! Too much homework? Will try to reduce from now on. This week’s homework is a bit different. Feedback Wednesday, 7 October 2009
4. 4. Homework If you need more practice, all function drills, along with answers, are available on line. Running behind on grading, sorry :( Common mistakes Wednesday, 7 October 2009
5. 5. even <- function(x) { is_even <- x %% 2 == 0 if (is_even) { print("Even!") } else { print("Odd!") } } # Problems # * does it work with vectors? # * can we easily define odd in terms of even? Wednesday, 7 October 2009
6. 6. even <- function(x) { x %% 2 == 0 } even(1:10) odd <- function(x) { !even(x) } # In general, always should return something useful # from functions, rather than printing or plotting Wednesday, 7 October 2009
7. 7. area <- function(r) { a <- pi * r ^ 2 a } # Not necessary! area <- function(r) { pi * r ^ 2 } Wednesday, 7 October 2009
8. 8. # Choose from a, b and c with equal probability x <- runif(1) if (x < 1/3) { "a" } else (x < 2/3) { "b" } else { "c" } # OR sample(c("a","b","c"), 1) Wednesday, 7 October 2009
9. 9. Still working on grading. Will have back to you by next Wednesday (no class on Monday). Next project due Oct 30. Basically same as last time, but working with baby names and you need to include an external data source. Project Wednesday, 7 October 2009
10. 10. For names that are used for both boys and girls, how has usage changed? Can we use names that clearly have the incorrect sex to estimate error rates over time? Questions Wednesday, 7 October 2009
11. 11. Getting started options(stringsAsFactors = FALSE) library(plyr) library(ggplot2) bnames <- read.csv("baby-names.csv") Wednesday, 7 October 2009
12. 12. First task Identify a smaller subset of names that been in the top 1000 for both boys and girls. ~7000 names in total, we want to focus on ~100. In real-life would probably use more, but starting with a subset for easier exploration is still a good idea. Wednesday, 7 October 2009
13. 13. First task Identify a smaller subset of names that been in the top 1000 for both boys and girls. ~7000 names in total, we want to focus on ~100. In real-life would probably use more, but starting with a subset for easier exploration is still a good idea. Take two minutes to brainstorm what variables we might to create to do this. Wednesday, 7 October 2009
14. 14. Your turn Summarise each name with: the total proportion of boys, the total proportion of girls, the number of years the name was in the top 1000 as a girls name, the number of years the name was in the top 1000 as a boys name Hint: Start with a single name and ﬁgure out how to solve the problem. Hint: Use summarise Wednesday, 7 October 2009
15. 15. times <- ddply(bnames, c("name"), summarise, boys = sum(prop[sex == "boy"]), boys_n = sum(sex == "boy"), girls = sum(prop[sex == "girl"]), girls_n = sum(sex == "girl"), .progress = "text" ) nrow(times) times <- subset(times, boys_n > 1 & girls_n > 1) Wednesday, 7 October 2009
16. 16. boys_n girls_n 20 40 60 80 100 120 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 20 40 60 80 100 120 Wednesday, 7 October 2009
17. 17. pmin(boys_n, girls_n) count 0 20 40 60 80 20 40 60 80 100 120 New functions: pmin(a, b) pmax(a,b) Wednesday, 7 October 2009
18. 18. qplot(boys_n, girls_n, data = times) qplot(pmin(boys_n, girls_n), data = times, binwidth = 1) times\$both <- with(times, boys_n > 10 & girls_n > 10) # Still a few too many names. Lets focus on names # that have managed a certain level of popularity. qplot(pmin(boys, girls), data = subset(times, both), binwidth = 0.01) qplot(pmax(boys, girls), data = subset(times, both), binwidth = 0.1) qplot(boys + girls, data = subset(times, both), binwidth = 0.1) Wednesday, 7 October 2009
19. 19. # Now save our selections both_sexes <- subset(times, both & boys + girls > 0.4) selected_names <- both_sexes\$name selected <- subset(bnames, name %in% selected_names) nrow(selected) / nrow(bnames) Wednesday, 7 October 2009
20. 20. Next problem is to classify which names are dual-sex, and which are errors. To do that, we’ll need to calculate yearly summaries for each of those names, and use our knowledge of names to come up with a good classiﬁcation criterion. Yearly summaries Wednesday, 7 October 2009
21. 21. Your turn For each name, in each year, ﬁgure out the total number of boys and girls. Think of ways to summarise the difference between the number of boys and girls, and start visualising the data. Wednesday, 7 October 2009
22. 22. bysex <- ddply(selected, c("name", "year"), summarise, boys = sum(prop[sex == "boy"]), girls = sum(prop[sex == "girl"]), .progress = "text" ) # It's useful to have a symmetric means of comparing # the relative abundance of boys and girls - the log # ratio is good for this. bysex\$lratio <- log10(bysex\$boys / bysex\$girls) bysex\$lratio[!is.finite(bysex\$lratio)] <- NA Wednesday, 7 October 2009
23. 23. year lratio −2 −1 0 1 2 1880 1900 1920 1940 1960 1980 2000 Wednesday, 7 October 2009
24. 24. lratio reorder(name,lratio,na.rm=T) Susan Linda Karen Lisa Barbara Sandra Donna Patricia Amanda Jennifer Nancy Melissa Jessica Sharon Michelle Betty Mary Dorothy Virginia Helen Margaret Ruth Elizabeth Sarah Anna Alice Mildred Emma Marie Martha Lillian Bertha Clara Grace Minnie Edna Annie Kimberly Edith Ethel Florence Rose Louise Irene Doris Julia Frances Carol Ashley Shirley Willie Jerry Ryan Joe Louis Anthony Daniel Eric Joshua Jason Fred Henry Jack Christopher Kevin George Matthew Arthur Walter Harold Kenneth Brian Michael Paul Albert Charles Frank Joseph James Harry Robert John David Donald Thomas Edward William Richard Larry Mark Ronald ●●●●●●●●●● ●●●● ●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●● ●● ●●●●● ●● ●●●●●●●●● ●●●● ● ●●●● ● ●● ●●● ●●● ●●●●●● ●● ●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●●● ●● ●● ●●● ●● ●●●●●●● ●●●● ● ●●●●●●●●● ●●●● ● ●●● ●●●●●●●●●●●●●●●●●● ●● ●●● ●● ●●● ● ●●●● ●● ●● ● ●●● ●●●●●●●●● ●● ●● ●●●● ●●●●●●●●●●● ●● ● ● ●●●● ● ●●●●●●●●●●● ●● ●●●● ●●●●●●●●●● ●●●● ●●●●●●●● ●●●● ●●● ●●●●●●●●●●● ●●● ● ●● ●●●●● ●●●●●●●● ●●● ●●●●●●●●●●●●●●●●● ●●●●●●●● ● ●● ●●●● ●●●● ●● ●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●● ●● ●●●●●● ●● ●●●●●● ●● ●●●● ●● ●●● ●●●●●●● ●●●●●●● ●●● ●●●● ●●● ●●●●● ●● ●●●●● ● ●● ●● ●●●●●●●●●●●● ●●●●● ●●●● ●● ●●●● ● ●●● ●●● ●●● ●●●●● ●● ●●●●●● ●●● ●●●●● ●●● ●● ●●●● ●● ●●●● ●● ●●●● ●●●●● ●●● ●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●● ●●●●●●●●●● ●● ●● ●● ● ●●● ● ●●●● ●●●●● ●●●●●● ●●● ●●● ●●●●● ●●●●● ●● ●●●●● ● ●● ●●● ● ●● ●●●● ●● ●●● ●●●● ●●●●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●● ●●●●●● ● ●● ●● ●● ● ●●●●● ●●●●●●● ● ● ●●●●●● ●●●●●● ●●● ● ●●●●● ●● ●● ●● ●●●●●●●●●● ●● ●●●●●●● ●● ● ●●●● ●●●● ●●●● ●●●●●●●●●● ●●●●● ● ●● ●●● ●●● ●●●●●●●●●●● ●●●● ● ●● ●●● ●●● ●●●●● ● ●● ●● ●● ● ● ●●●●●●●● ●●● ●●●●●● ●● ●● ●●●● ●●● ●●● ●●● ●● ●●●●●●●●● ●●●●●● ●●●●● ●● ●●●● ●●●●● ●●● ●●● ●● ● ●● ● ●●●●●● ● ●● ●●●●● ●● ●● ●●● ●● ●● ●●● ●●●● ●●●● ● ●● ● ●● ● ●● ●●●●● ●● ●● ●●●●●●● ●● ●●●●● ●●● ●●● ●●● ●● ● ●●● ●●● ●● ● ●● ●● ●● ●●● ● ●●●●● ●●●●● ● ●●●●● ●●● ● ●●●● ●● ●● ●● ●● ●●●● ●● ●● ●●● ●● ●● ●● ●● ● ●● ●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● ●●●●● ● ●● ●●●● ●● ● ●●● ●● ●● ●● ●●● ●● ●● ●● ● ●●● ●● ● ●●●●●●●●●● ●●●●● ● ●●● ● ●● ●●●● ●●●● ● ●● ●● ●● ●●●● ●●● ●●●●● ● ●●●● ● ●● ●●● ● ●● ●●● ●● ●●●●●●●● ●●●●● ●●● ●●● ●● ●●● ●● ●●●●● ●●● ●● ●●●●●●●●●●● ●●●● ●● ●●●● ●●● ●●●● ●●●●●●●● ●●●●● ●●●●●●●●● ●●●● ●● ●●●●●●● ●●●● ●●●●● ● ●● ●●● ●●● ●●●●●● ●●●● ●●●●●●● ● ●●● ●●● ●●● ●●●●●●●●●●●●●●● ●●●●● ●●●● ●●● ●●●●●●● ●● ●●●● ●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●● ●●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●● ●●●● ●●●●● ●●● ● ● ●● ●●● ●● ●●●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●● ●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●● ●●●● ●●● ●● ●●● ●●●● ●● ●●● ●●● ●●● ●●● ●●●●●● ●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●● ●●● ● ●●● ●● ●●●● ●● ●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●● ●●●●●●●●● ●● ● ●●●●●●●●●●●●●●●● ●● ●●● ●●●●●●●●●●●●●●●● ●●● ● ●● ● ●● ●●●●● ●●●●●●● ● ●● ●●●●●●● ●●●●●●●● ●● ●●●●●● ●●●●●●●●●● ●●●● ●●●●●●●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●●●●●●●●●●● ●●● ●● ●●●●● ● ●● ●●● ● ●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●● ●●●●● ●●●●● ●●●●●●●●●● ●●●●●●●●●●●●●● ● ● ●●●●● ●● ●●●●●●●●●●●●●●●●●●●● ●● ● ●●● ●●●●●●●●●● ●●●●● ●●●●●●●●●●●●●● ●●●●● ●●● ●●●●●●●●●● ●●● ●●●●●● ●●●●●●●● ●●●●●●●●●●●●●● ●● ●●●● ● ● ● ●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●● ●●● ●●●●● ●●●●●●●● ●● ●● ●●●●●●●● ●● ●●●● ●●●●●● ●●● ●●● ●●●●● ●●●●● ●●● ●●●●●● ●● ●● ●●●●●● ●●●●●●●●●● ●●●●●●●●●●● ●●●● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● ● ●●●● ●● ●●●● ●● ●●● ●●●● ●● ●●●● ●●● ●●● ●●●●●● ●● ●●●●●●●●●● ●● ●● ●●●●● ●● ●● ●● ●●● ●●●● ●●●●●● ●●●●●● ●●● ●● ●●●●●●●●●●●●●●●● ●●●●●●●● ●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●● ● ●● ●●● ●● ●● ●●● ●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●● ● ●● ●● ●●● ●● ●●● ●●● ● ●●● ●●● ●● ● ● ●●●●●● ●●●●●● ●●●●●●●● ●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●●●●●● ●●● ●●●●● ●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● ●● ●●●●●●●●● ●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ●●● ●●●●● ●●●● ● ●● ●● ●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●● ●●●●● ●●●●●●●●●● ●● ●● ●●●●● ●●●● ●●● ●● ●●●●●● ●● ●●●●●●●●●● ●●●●●●●●●●●●●●●● ●●● ●●● ●●●● ●● ●●●● ●●●●●●●●●●●●●●●●●● ●●●●●● ●●● ●●●●●●●●●●●●● ●●●●●●● ●●● ●●●●●●●● ●●●● ●●●● ●● ●● ● ●●●● ●● ●●● ●● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●● ●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●● ●●●●●●● ● ● ●●●●●●●●● ●●●●●● ●●●●●●●● ●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● ● ●●●● ●● ●●●●●●●●● ●●●●●●●●●●●●● ● ●●●●●●●●●● −2 −1 0 1 2 Wednesday, 7 October 2009
25. 25. abs(lratio) reorder(name,lratio,na.rm=T) Susan Linda Karen Lisa Barbara Sandra Donna Patricia Amanda Jennifer Nancy Melissa Jessica Sharon Michelle Betty Mary Dorothy Virginia Helen Margaret Ruth Elizabeth Sarah Anna Alice Mildred Emma Marie Martha Lillian Bertha Clara Grace Minnie Edna Annie Kimberly Edith Ethel Florence Rose Louise Irene Doris Julia Frances Carol Ashley Shirley Willie Jerry Ryan Joe Louis Anthony Daniel Eric Joshua Jason Fred Henry Jack Christopher Kevin George Matthew Arthur Walter Harold Kenneth Brian Michael Paul Albert Charles Frank Joseph James Harry Robert John David Donald Thomas Edward William Richard Larry Mark Ronald ● ●● ● ●● ● ●● ●● ●●●● ●●●●●●●●●● ● ●●●● ● ● ●●● ● ●●● ●●●●● ●● ● ●●●● ● ●● ●●● ●●● ●● ●●●●● ●●●● ● ●●●●● ● ●●●●● ● ●●● ●●●●● ●● ●●●●●●●● ● ●● ● ●●●●●● ●●●● ●●●●● ● ● ●● ●● ●● ● ● ●●●●●●● ● ● ● ● ●● ●● ●●●●●● ● ● ●● ●●● ●●●●●● ● ● ● ●● ●● ●● ●● ●●●●● ● ●●●●● ●● ●●●● ● ● ●●●●●● ●●●● ● ●● ●● ●● ●● ● ●●●●●●●●●● ●●● ● ● ● ●●●● ●● ● ●●●● ● ●●● ●●●●●●●● ●●●●● ●●● ●● ●●● ●●●● ●● ●●●● ●● ● ● ●● ● ●●● ●● ●● ●●●●● ●●●●●●●●●● ●●●●● ● ● ●●●●●●●●●●●●● ●● ●●●● ● ● ● ● ● ●● ●●●● ●● ● ● ●●●● ●●●● ●●●● ●● ●●● ●●●● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●● ● ● ●●● ●●●●●●●● ●●●●● ●●●●●●●●●● ● ●● ● ● ●● ●● ● ●● ●● ● ● ● ●● ● ●●●● ●●●●●●●●●● ●●●●● ●● ●● ● ●● ●● ●●● ●●● ●● ●●●●● ●●● ● ●●● ●● ●● ●● ●●●● ●●●● ●●● ●●● ●● ●● ● ●● ●● ●●●●●●●●●●●● ●●● ●● ●●● ●● ● ●● ●● ●● ●●● ● ●●● ● ●●●● ●● ●● ● ●●● ●● ● ● ● ●● ●●●●● ●●●●●●●● ●●●● ●●●● ●●●● ●●● ●●● ●●● ● ●●●● ●●●● ●● ●● ● ● ●● ●● ● ●● ●●● ●●●●● ●●●●● ●● ● ●● ●● ● ●●● ●●● ●● ● ●●● ●● ●●●● ● ●● ●●●● ● ●●● ● ●●●● ●●● ●●●● ●●● ●● ●●●●●●●●● ●●●● ●● ● ●● ●● ●●● ● ●● ●●●●● ●●●● ●●●●● ● ●●● ● ●● ●● ●●●● ● ●● ●● ●● ●● ●● ● ● ● ●●●●●● ● ●●●● ●●● ●● ●●● ● ● ●●● ●●●● ● ●● ● ● ●●●●●●● ● ● ●● ●●● ●● ● ●● ● ●● ●● ●●●●●●●● ●● ●●●● ●●● ●● ● ●●● ●● ●●● ●●●● ●●● ●●●● ● ●● ●● ●●●●● ● ●●●●● ●● ● ● ● ●● ●● ● ●●● ●● ●● ●●●●● ●●● ●● ●●●●●● ●● ●● ●●● ●●● ●● ●●●● ●● ● ●● ● ●● ●●● ●●●● ●●● ●● ●● ●●● ●●●● ●●● ● ● ●●● ●● ●●●●● ●●●● ●●● ●●● ●●●●●● ●●● ●● ●●●● ● ●●●● ● ●●●● ●●● ●● ●●●● ●●●●●● ● ●● ●●● ●● ●●●● ●● ●● ● ●●●● ●●● ●●● ●●● ●●● ●●●● ● ● ● ●●●● ●● ●● ●●●●●● ●● ●●● ●● ●● ●● ●●● ●●●●●●●●●● ● ● ● ● ● ● ● ● ● ●● ●●●● ●●●●●●●●●●●● ●● ●● ● ●●● ●●● ●●● ●● ●●● ● ●●● ●●●● ●●● ● ●● ●●● ●●●●● ● ●● ●●●● ● ●● ●●●●● ●●● ●●●●● ●●● ●● ●● ●●● ● ●● ●●● ●● ● ●● ●● ●●●● ●● ●●●● ●● ●●● ●●●● ●● ●● ●●● ●●● ●●●● ● ●● ●● ●●● ●●● ●● ●● ● ●● ●● ● ● ●●●●●●● ●●●●● ●●● ●● ●● ● ●● ●●● ●● ● ●●● ●●● ● ●● ●● ●●● ● ●●● ●●●●●● ●● ●●●●●●● ●● ● ●●●● ●●● ●● ●● ● ● ●● ● ● ●●●● ● ● ●● ●●●●● ●●● ● ●●● ●● ● ●● ● ●●● ●● ●●●●●●●●●●●● ●●●● ●● ● ●● ●●●● ●●● ● ● ● ● ●●● ●● ● ●●●●●●● ● ●●●● ●●●●● ● ●●● ●● ● ●● ● ● ● ● ● ● ● ●● ●● ●●● ●● ●● ●●● ●●●●●● ● ● ● ● ●● ● ●●● ● ●●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ●●● ●● ●●● ●●●● ●● ● ●● ●● ●● ●● ●●● ●●●● ●● ● ● ● ● ●● ● ● ● ● ●●●●● ●● ● ● ●●● ●●● ●● ● ●● ● ● ●●●● ●●●● ●●● ●● ●●●●●●● ●●● ●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●● ●●●●● ● ●●● ●●●●● ●● ●●●●●●●● ● ● ● ●● ● ●● ●● ●●●● ●●●●●●●●●●●●● ●●● ●●● ● ● ●●● ●●●●●●●● ●●●● ●● ● ●● ●●●●●● ● ●● ●● ●● ●●●●● ●●● ●● ●●●● ● ● ●● ●●●●●●● ●●●●●●●●● ● ●●●● ●●● ●● ●●● ●● ●● ●● ●●● ●● ● ●●● ●●● ●●● ●●● ● ●●●●●●●●●●● ●●●●●●● ●●●●● ●●●●●●●●●●●● ●●● ●● ● ●● ● ●●● ●● ●●●● ●● ●● ●●●●●●●●●●●● ●● ●● ● ●● ●●●● ●●●●●●● ● ●●●● ● ●●●●●● ●●●●●●●●●●● ● ●●● ●●●●●● ●●●●●● ●● ●●●●●●● ●● ● ●● ●●●●●●●● ●●●●●● ●● ●● ● ●●●● ●●● ●●●●●●●● ● ●●● ● ●● ● ●● ●●●●● ● ●●●●● ● ● ●● ●●●●● ●● ●●●●●●●● ●● ●●●●●● ●●●● ● ●●● ●● ●●● ● ● ●● ●● ●●● ●●● ●●● ●● ●●●●●● ●●●●●●●●● ●●●●●●●● ● ● ●●●●●● ●●●●●● ●●● ●● ●● ● ●● ● ●● ●●● ● ●● ●● ●● ●●●●● ● ●● ●●●●● ●●●●●●● ●●●●●●●● ●●●●● ●●●● ● ●●● ●● ● ●● ●●●●●●● ● ●● ●●● ●●●●●●● ● ● ● ●●●●● ●● ●●●●●●●●●●●●●●●● ●●●● ●● ● ●●● ●● ●●●●●●● ● ●●●●● ●●●●●●●●●●● ● ●● ●●●●● ●●● ●● ●●●●●●●● ●●● ●●●● ● ● ● ●● ●●●●● ●●●●● ●●●●●●●●● ●● ●●●● ● ● ● ●●● ● ●●● ●●●●●●●●●● ●●●●●●●●●●●●●●●●● ●●● ●●●● ● ● ●●●●● ●●● ●●● ●● ●● ●●●●●● ●● ● ● ●●●● ●●●● ●● ● ●●● ● ●● ●●● ●●● ●●● ●●● ●● ●●●● ● ●●● ● ●●●●● ●● ●● ●●●●●● ●●●●●●●●●● ●●●●● ●● ●● ●● ● ●●● ●●●●●● ●● ●●● ●●● ●●●●●● ●● ● ●●●●●●●●●●●●● ● ●●●● ●● ● ● ●● ●● ●●● ●●●● ●● ●●●● ●●● ●●● ●●●●●● ●● ●● ●●●●●●●● ●● ●● ●●●●● ●● ●● ●● ●●● ●●●● ●●● ●●● ●●● ● ●● ● ●● ●● ●●● ●●●● ●●●●●●●●● ●●● ● ● ● ●● ●●●● ●●●● ●●●●● ●●●● ●●●●●●● ●●●●●●● ● ●● ●●● ●● ●● ●●● ●●●● ●● ●●●● ●● ●●●● ●●●●●●●●●●● ●● ●●●●●●●●●● ● ●● ● ●●●●●●●●● ●● ●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●● ●●●●● ● ●● ●● ●● ● ●● ●●● ●●● ● ●● ● ●●● ●● ● ● ●●●● ●● ●●● ●● ● ●●● ●●●●● ●●●●●●●●●●●●●●●●● ●● ●● ●●●●●● ●●● ●● ●●●●●●●● ●● ●●●●●● ●●●●●●●●●●●●●●●●●●● ●●●●●●● ● ● ●●● ●●●● ●●● ●●●●● ●●● ● ●● ●●●●●● ●●● ●●●●●●● ●● ●●●●●●●●●●● ● ● ●● ●●●●● ●●● ● ●● ● ●●●●●● ●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●● ●●● ● ● ●●● ●●●●● ●●●● ● ●● ●● ●●●● ● ●● ●● ●● ●●●● ●●●●● ●●● ●●●●●●●●●●●●●●●●●● ●●● ●●●●● ● ●●●●●●●● ● ●● ●● ●●●●● ●● ●● ●●● ●● ●● ●●●● ●● ●●●●●● ●●●● ●●●●●● ● ●●● ●●●● ●● ●●● ●●● ●●●● ●● ●●● ● ●●● ●●●●●●●● ●●● ●●●● ●●●●●● ●●● ●●● ●● ● ●●●●●●● ●●●●●●● ●● ● ●●● ● ●● ●● ●● ●● ●●●● ●● ●● ● ●●●● ●● ●●● ●● ●●●●● ●●●●● ●●●●●●● ●●●●●●●●● ● ●● ● ●●●● ●●● ●●● ●●● ●● ●● ●● ●●●●●●●●● ●● ●●●●●●●●● ●●●● ●●●● ●●● ●● ●● ● ● ●●●●●●●●● ●● ●●●● ●●● ●●●● ● ●● ●●●● ● ●●● ●●● ●●●●●●● ●●●●●●●●●●●● ● ●●●● ●● ●●●●●●●● ● ●●●● ●●●●●●●●● ● ●●● ●●● ●●● ● 0.5 1.0 1.5 2.0 2.5 Wednesday, 7 October 2009
26. 26. theme_set(theme_grey(10)) qplot(year, lratio, data = bysex, group = name, geom = "line") qplot(lratio, reorder(name, lratio, na.rm = T), data = bysex) qplot(abs(lratio), reorder(name, lratio, na.rm = T), data = bysex) qplot(abs(lratio), reorder(name, lratio, na.rm = T), data = bysex) + geom_point(data = both_sexes, colour = "red") Wednesday, 7 October 2009
27. 27. year lratio −2 −1 0 1 2 1880 1900 1920 1940 1960 1980 2000 What characteristics of each name might we want to use to classify them into dual-sex with sex-errors? Wednesday, 7 October 2009
28. 28. Your turn Compute the mean and range of lratio for each name. Plot and come up with cutoffs that you think separate the two groups. Wednesday, 7 October 2009
29. 29. rng <- ddply(bysex, "name", summarise, diff = diff(range(lratio, na.rm = T)), mean = mean(lratio, na.rm = T) ) qplot(diff, abs(mean), data = rng) qplot(diff, abs(mean), data = rng, colour = abs(mean) < 1.75 | diff > 0.9) shared_names <- subset(rng, abs(mean) < 1.75 | diff > 0.9)\$name qplot(abs(lratio), reorder(name, lratio, na.rm=T), data = subset(bysex, name %in% shared_names)) qplot(year, lratio, geom = "line", group = name, data = subset(bysex, name %in% shared_names)) Wednesday, 7 October 2009
30. 30. Now that we’ve separated the two groups, we’ll explore each in more detail. Next time Wednesday, 7 October 2009