01 intro

  1. Stat405: Statistical computing & graphics. Hadley Wickham. Monday, 23 August 2010
  2. Outline: (1) Introductions; (2) Syllabus; (3) Introduction to Linux; (4) Introduction to R; (5) Basic graphics
  3. Hello, my name is Hadley
  4. had.co.nz/stat405 (if you can’t remember, just google stat405). hadley@rice.edu
  5. About me: From New Zealand. Divisional advisor for McMurtry. Major advisor for statistics.
  6. Syllabus
  7. Computing environment. Lab computers: Linux. Your computers: Mac and Windows. Essential tools: R, a text editor, LaTeX. Use of the command line is strongly encouraged. Will show basic setup for Mac, Windows and Linux.
  8. Lab access. Once registrations are finalised, I’ll get everyone lab access. (But you can use R on any computer on campus, including your own.)
  9. Setup. Find the instructions for your operating system on the class website. Follow them to get R up and running. I’ll circulate and make sure everyone gets set up right.
  10. Introduction to R
  11. Learning a new language is hard!
  12. Scatterplot basics: install.packages("ggplot2"); library(ggplot2); ?mpg; head(mpg); str(mpg); summary(mpg); qplot(displ, hwy, data = mpg)
  13. Scatterplot basics (code repeated from slide 12, with a callout on the last line). Callout: always explicitly specify the data. qplot(displ, hwy, data = mpg)
  14. [Plot: scatterplot of hwy (y axis) against displ (x axis).] qplot(displ, hwy, data = mpg)
  15. Additional variables. You can display additional variables with aesthetics (like shape, colour, size) or faceting (small multiples displaying different subsets).
  16. [Plot: hwy against displ, points coloured by class; legend entries: 2seater, compact, midsize, minivan, pickup, subcompact, suv.] qplot(displ, hwy, colour = class, data = mpg)
  17. [Same plot as slide 16.] Callout: legend chosen and displayed automatically. qplot(displ, hwy, colour = class, data = mpg)
  18. Your turn. Experiment with colour, size, and shape aesthetics. What’s the difference between discrete and continuous variables? What happens when you combine multiple aesthetics?
  19. Aesthetics by variable type. Colour: discrete = rainbow of colours; continuous = gradient from red to blue. Size: discrete = discrete size steps; continuous = linear mapping between radius and value. Shape: discrete = different shape for each; continuous = doesn’t work.
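
The discrete and continuous cases in slide 19 can be tried directly. The following is a minimal sketch, assuming ggplot2 is installed; the choice of class and drv as discrete variables and cty as the continuous one, and the object names, are illustrative and not taken from the slides. Recent ggplot2 releases mark qplot() as deprecated and may print a warning, and the default colours may differ from those the slide describes.

```r
library(ggplot2)

# Discrete variable mapped to colour: one distinct hue per class.
p_colour_discrete <- qplot(displ, hwy, colour = class, data = mpg)

# Continuous variable mapped to colour: a gradient with a colour-bar legend.
p_colour_continuous <- qplot(displ, hwy, colour = cty, data = mpg)

# Continuous variable mapped to size: point size grows with cty.
p_size <- qplot(displ, hwy, size = cty, data = mpg)

# Discrete variable mapped to shape: one shape per drive type.
p_shape <- qplot(displ, hwy, shape = drv, data = mpg)

# Combining aesthetics: colour and shape in the same plot.
p_combined <- qplot(displ, hwy, colour = class, shape = drv, data = mpg)

# Printing a plot object draws it.
print(p_combined)
```

Mapping a continuous variable such as cty to shape is expected to fail when the plot is drawn, which corresponds to the "doesn’t work" entry for shape.
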
  20. Faceting. Small multiples displaying different subsets of the data. Useful for exploring conditional relationships. Useful for large data.
  21. Your turn: qplot(displ, hwy, data = mpg) + facet_grid(. ~ cyl); qplot(displ, hwy, data = mpg) + facet_grid(drv ~ .); qplot(displ, hwy, data = mpg) + facet_grid(drv ~ cyl); qplot(displ, hwy, data = mpg) + facet_wrap(~ class)
  22. Summary. facet_grid(): 2d grid, rows ~ cols, . for no split. facet_wrap(): 1d ribbon wrapped into 2d.
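
One way to see the difference between the two functions is to count the panels each layout implies. This is a rough sketch, assuming ggplot2 and its mpg data are available; the helper variable names are made up for illustration.

```r
library(ggplot2)

# Number of distinct values in each faceting variable.
n_drv <- length(unique(mpg$drv))  # rows in facet_grid(drv ~ cyl)
n_cyl <- length(unique(mpg$cyl))  # columns in facet_grid(drv ~ cyl)

# facet_grid() lays out every row/column combination as a 2d grid.
p_grid <- qplot(displ, hwy, data = mpg) + facet_grid(drv ~ cyl)
n_panels_grid <- n_drv * n_cyl

# facet_wrap() makes one panel per value and wraps the ribbon into rows.
p_wrap <- qplot(displ, hwy, data = mpg) + facet_wrap(~ class)
n_panels_wrap <- length(unique(mpg$class))

cat("facet_grid(drv ~ cyl):", n_panels_grid, "panels\n")
cat("facet_wrap(~ class):", n_panels_wrap, "panels\n")
```
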
  23. Aside: workflow. Keep a copy of the slides open so that you can copy and paste the code. For complicated commands, write them in gedit and then copy and paste.
  24. What’s the problem with this plot? [Plot: scatterplot of hwy (y axis) against cty (x axis).] qplot(cty, hwy, data = mpg)
  25. [Plot: hwy against cty with jittered points.] qplot(cty, hwy, data = mpg, geom = "jitter")
  26. [Same plot as slide 25.] Callout: geom controls “type” of plot. qplot(cty, hwy, data = mpg, geom = "jitter")
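
A likely reading of the problem on slide 24, given that the next slide switches to jitter, is overplotting: cty and hwy are recorded as whole numbers, so many cars land on exactly the same point. The sketch below, which assumes ggplot2 and its mpg data are available and uses illustrative variable names, compares the number of cars with the number of distinct plotting positions.

```r
library(ggplot2)

# Keep only the two plotted columns.
xy <- as.data.frame(mpg)[, c("cty", "hwy")]

n_cars <- nrow(xy)               # one row per car
n_positions <- nrow(unique(xy))  # distinct (cty, hwy) pairs

cat(n_cars, "cars share", n_positions, "distinct (cty, hwy) positions\n")

# Jittering adds a little random noise so coincident points separate.
p_jitter <- qplot(cty, hwy, data = mpg, geom = "jitter")
print(p_jitter)
```
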
  27. [Plot: hwy (y axis) against class (x axis), classes in alphabetical order: 2seater, compact, midsize, minivan, pickup, subcompact, suv.] qplot(class, hwy, data = mpg)
  28. How could we improve this plot? Brainstorm for 1 minute. [Same plot as slide 27.] qplot(class, hwy, data = mpg)
  29. [Plot: hwy against reorder(class, hwy); classes now ordered pickup, suv, minivan, 2seater, midsize, subcompact, compact.]
  30. [Same plot as slide 29.] Callout: incredibly useful technique! qplot(reorder(class, hwy), hwy, data = mpg)
  31. [Plot: hwy against reorder(class, hwy) with jittered points.] qplot(reorder(class, hwy), hwy, data = mpg, geom = "jitter")
  32. [Plot: boxplots of hwy for each class, ordered by reorder(class, hwy).] qplot(reorder(class, hwy), hwy, data = mpg, geom = "boxplot")
  33. [Plot: jittered points and boxplots of hwy against reorder(class, hwy).] qplot(reorder(class, hwy), hwy, data = mpg, geom = c("jitter", "boxplot"))
  34. Your turn. Read the help for reorder. Redraw the previous plots with class ordered by median hwy. How would you put the jittered points on top of the boxplots?
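
One possible answer to the exercise on slide 34, offered as a sketch rather than the intended solution. It assumes ggplot2 is installed, relies on the FUN argument of reorder() (which defaults to the mean), and on qplot() adding layers in the order the geoms are listed; the object names are illustrative.

```r
library(ggplot2)

# reorder() sorts factor levels by a summary of a second variable;
# FUN = median replaces the default mean.
class_by_median <- reorder(mpg$class, mpg$hwy, FUN = median)
print(levels(class_by_median))

# Listing "boxplot" before "jitter" draws the boxes first,
# so the jittered points end up on top of them.
p_median <- qplot(reorder(class, hwy, FUN = median), hwy, data = mpg,
                  geom = c("boxplot", "jitter"))
print(p_median)
```
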
  35. Aside: coding strategy. At the end of each interactive session, you want a summary of everything you did. Two options: (1) Save everything you did with savehistory(), then remove the unimportant bits. (2) Build up the important bits as you go (this is how I work).
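
A small sketch of the first option. The file name is hypothetical, and savehistory() only works in an interactive session whose console keeps a command history, so the call is guarded here.

```r
# Hypothetical file name for this session's command history.
history_file <- "stat405-01-intro.Rhistory"

if (interactive()) {
  # Writes the commands typed so far to history_file;
  # open it in a text editor and delete the unimportant lines.
  savehistory(file = history_file)
}
```
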
