2.
1. Homework
2. Reading a scatterplot
3. Scatterplot techniques for large data
4. Iteration & storytelling
5. Project & homework
Wednesday, 9 September 2009
3. Homework
Great start!
Remember the grading scheme:
4.5–5 = A+, 4–4.5 = A, 3.5–4 = A-
Shorter is better than longer.
Check aspect ratios.
Read the comments!
4. Revision: reading a scatterplot
• Big patterns
• Small patterns
• Deviations from the pattern
• Strange patterns
10. Curved (exponential?)
relationship. Outliers mostly
cheaper than expected.
12. But what’s the
problem with
all these plots?
In pairs, brainstorm
solutions for 2 minutes.
qplot(carat, price, data = diamonds)
13. Ideas
If x discrete, use boxplots.
Use semi-transparent points.
Divide into bins and count number of
points in each bin (2d histogram).
Display statistical summary.
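A minimal sketch of two of these ideas, applied to the carat and price plot above. It is written with ggplot() rather than qplot(), which recent ggplot2 releases deprecate. The alpha of 1/50, the 50 bins and the 10 x 10 grid are arbitrary illustrative choices, not values from the lecture.

```r
library(ggplot2)

# Semi-transparent points: with alpha = 1/50 it takes on the order of 50
# overlapping points to look solid, so dense regions appear darker.
p_alpha <- ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 1 / 50)

# 2d histogram: divide the plane into rectangular bins and fill by count.
p_bin <- ggplot(diamonds, aes(carat, price)) +
  geom_bin2d(bins = 50)

# The same idea by hand: cut each variable into 10 intervals and count
# how many diamonds fall in each cell.
counts <- table(
  carat = cut(diamonds$carat, breaks = 10),
  price = cut(diamonds$price, breaks = 10)
)
```

Printing p_alpha or p_bin draws the plot; counts is the table of frequencies that a 2d histogram displays as colour.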
14. Box and whisker plots
15. Boxplots
Less information than a histogram, but
take up much less space.
Already seen them used with discrete x
values. Can also use with continuous x
values, by specifying how we want the
data grouped.
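One possible way to specify the grouping for a continuous x, sketched with ggplot() and the cut_width() helper from ggplot2. The interval width of 0.1 carat is an assumption chosen for illustration.

```r
library(ggplot2)

# Group carat into intervals 0.1 wide; each interval gets its own boxplot.
p_box <- ggplot(diamonds, aes(carat, price)) +
  geom_boxplot(aes(group = cut_width(carat, 0.1)))

# The grouping variable on its own: a factor with one level per interval.
carat_group <- cut_width(diamonds$carat, 0.1)
```

A narrower width shows more detail but leaves fewer diamonds in each box, so the boxes for large carat values become less reliable.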
29.
# Very basic cleaning: treat impossible dimensions as missing
diamonds$x[diamonds$x == 0] <- NA
diamonds$y[diamonds$y == 0] <- NA
diamonds$y[diamonds$y > 12] <- NA
qplot(x, y, data = diamonds)
qplot(x, y, data = diamonds, geom = "bin2d")
# geom = "hex" requires the hexbin package
qplot(x, y, data = diamonds, geom = "hex")
qplot(x, y, data = diamonds, geom = "bin2d", bins = 100)
qplot(x, y, data = diamonds, geom = "hex", bins = 100)
# Zoom in
qplot(x, y, data = diamonds, geom = "bin2d", bins = 100) +
  xlim(4, 7) + ylim(4, 7)
qplot(x, y, data = diamonds, geom = "bin2d", bins = 100) +
  xlim(4, 5) + ylim(4, 5)
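A short sketch of why zooming with xlim() and ylim() gives finer bins, written with ggplot() on the uncleaned diamonds data. The comparison with coord_cartesian() is an addition for illustration and is not part of the slides.

```r
library(ggplot2)

base <- ggplot(diamonds, aes(x, y)) + geom_bin2d(bins = 100)

# Scale limits: points outside 4-7 are dropped (with a warning) before
# binning, so the 100 bins are spread over the zoomed range only.
p_scale <- base + xlim(4, 7) + ylim(4, 7)

# Coordinate limits: all points are binned over the full data range and
# the plot is then cropped, so the visible bins are much coarser.
p_coord <- base + coord_cartesian(xlim = c(4, 7), ylim = c(4, 7))

# Compare how many observations each version actually binned.
n_scale <- sum(layer_data(p_scale)$count)
n_coord <- sum(layer_data(p_coord)$count)
```

Here n_scale is smaller than n_coord because the scale limits discard the diamonds outside the zoomed window before counting.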
31.
qplot(x, x / y, data = diamonds, geom = "bin2d")
qplot(x, log(x / y), data = diamonds, geom = "bin2d")
# Keep diamonds whose x and y differ by less than about 10%
clean <- subset(diamonds, abs(log(x / y)) < 0.1)
qplot(x, log(x / y), data = clean, geom = "bin2d")
qplot(x, log(x / y), data = clean, geom = "bin2d", bins = 80)
What would be a good name for
log(x / y)? What other variable
might you create to go with it?
32. Your turn
Continue to explore the relationship
between x, y, z and depth. Create new
variables as necessary.
(Hint: rerun the cleaning code from last
week, and create more as necessary)
Some good ideas here:
http://www.diamondhelpers.com/fivesteps/4-certified-diamonds.shtml
33. [Diagram: diamond profile labelled with x, table width and z]
depth = z / diameter
table = table width / x * 100
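The depth formula can be checked against the recorded depth column. This is a sketch under two assumptions: "diameter" is taken as the mean of x and y, and the result is multiplied by 100 because the recorded depth is a percentage. The names diameter, depth_calc and depth_diff are made up for this example.

```r
library(ggplot2)

# Keep only diamonds with plausible dimensions (same spirit as the
# cleaning code: zero values and very large y are treated as errors).
d <- subset(diamonds, x > 0 & y > 0 & z > 0 & y < 12)

# Assumption: "diameter" is the mean of the two horizontal dimensions.
d$diameter <- (d$x + d$y) / 2

# Depth as a percentage, to match the units of the depth column.
d$depth_calc <- d$z / d$diameter * 100

# Difference between recomputed and recorded depth; values far from 0
# point at records where x, y or z is suspect.
d$depth_diff <- d$depth_calc - d$depth
summary(d$depth_diff)
```

Plotting depth_diff against x with a 2d histogram is one way to continue the exploration from here.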
37. Stories
Best data analyses tell a story, with a
natural flow from beginning to end.
For homework, try to come up with
three plots that tell a story.
Stories about a small sample of the data
can work well.
38.
qplot(cty, hwy, data = mpg)
qplot(cty, hwy, data = mpg, geom = "jitter")
qplot(cty, hwy, data = mpg, geom = "jitter", colour = class)
qplot(cty, cty / hwy, data = mpg, geom = "jitter", colour = class)
qplot(cty, cty / hwy, data = mpg, colour = class)
qplot(displ, cty / hwy, data = mpg, colour = class)
qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class)
qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class) +
  geom_smooth(se = FALSE)
qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class) +
  geom_smooth(method = "lm", se = FALSE)
qplot(displ, cty, data = mpg) + facet_wrap(~ class)
39. Project
Due in 3.5 weeks.
Bigger group data analysis project. (Will
be discussing group dynamics on
Monday)
Homework is to get you started working
with the data.
40. Next week
Checking on a slot machine.
Learning how to write functions.
Basics of simulation.
41. Feedback
http://hadley.wufoo.com/forms/stat405-feedback/