R graphics260809
 

    R graphics260809 Presentation Transcript

    • Plotting 1.0. Selene Fernandez-Valverde, Lab Meeting 26-08-09
    • Your scientific graphing options: Others...?
    • Why not only Excel?
        • Excel is relatively limited in its support of scientific graphing
        • Its options provide limited control over the output
        • Limited selection of graph types
        • Limited number of data points that can be plotted (or it dies)
    • What plots can you do with R? Type in your R terminal:
        > demo(graphics)
        > demo(persp)
        > library(lattice)
        > demo(lattice)
      Now, that is something you can't make in Excel or Prism (you actually can in Matlab).
    • Cool, but...
        • Steep learning curve
        • Plotting is step by step
        • Prettifying a graph takes a lot of effort
        • I don't want to script in R, I just want to plot my results
    • How do we avoid that? We use a package made by someone who encountered these problems before:
      "ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and (almost) none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics."
      In summary: R graphs made easy.
    • How do I start? First format the data into a table that looks like this:
        carat  cut        color  clarity  depth  table  price  x     y     z
        0.23   Ideal      E      SI2      61.5   55     326    3.95  3.98  2.43
        0.21   Premium    E      SI1      59.8   61     326    3.89  3.84  2.31
        0.23   Good       E      VS1      56.9   65     327    4.05  4.07  2.31
        0.29   Premium    I      VS2      62.4   58     334    4.2   4.23  2.63
        0.31   Good       J      SI2      63.3   58     335    4.34  4.35  2.75
        0.24   Very Good  J      VVS2     62.8   57     336    3.94  3.96  2.48
      Wait! This looks like an Excel table! Well... it is (it can also be a tab-delimited file). Make sure your variables (columns) are meaningful and allow you to retrieve the information that you want to plot.
    • Read the table into R
      Set your working directory:
        > setwd("./Documents/MyUsername/FolderWhereMyExcelFileIs/")
        > getwd()
        > install.packages("ggplot2", dependencies=TRUE)
        > library(ggplot2)
      If your file is an Excel file:
        > install.packages("gdata")
        > library(gdata)
        > table <- read.xls("MyExcelFile.xls")
      If your file is a tab-delimited file:
        > table <- read.delim("MyExcelFile.txt")
        > summary(table)
      We already have a loaded dataset named "diamonds":
        > summary(diamonds)
      Start plotting!
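      As a minimal sketch of the tab-delimited route, the snippet below writes a few rows of the diamonds-style table to a temporary file and reads them back with read.delim(). The temporary file and the variable name `tbl` are placeholders for illustration only; substitute the path to your own export.

```r
# Write a few rows to a temporary tab-delimited file. This stands in for a
# file exported from Excel; the path is hypothetical and created on the fly.
rows <- c(
  "carat\tcut\tcolor\tclarity\tdepth\ttable\tprice\tx\ty\tz",
  "0.23\tIdeal\tE\tSI2\t61.5\t55\t326\t3.95\t3.98\t2.43",
  "0.21\tPremium\tE\tSI1\t59.8\t61\t326\t3.89\t3.84\t2.31",
  "0.24\tVery Good\tJ\tVVS2\t62.8\t57\t336\t3.94\t3.96\t2.48"
)
path <- tempfile(fileext = ".txt")
writeLines(rows, path)

# read.delim() assumes a header row and tab separators by default.
# stringsAsFactors = TRUE turns text columns such as cut into factors.
tbl <- read.delim(path, stringsAsFactors = TRUE)

str(tbl)      # column types: check that numbers were read as numbers
summary(tbl)  # per-column overview, as on the slide
```

      Checking str() before plotting is worthwhile: a numeric column that was read as text usually means a stray character in the spreadsheet.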
    • Your first plot(s)
        > ggplot(diamonds, aes(color)) + geom_bar()
        > ggplot(diamonds, aes(color, fill=cut)) + geom_bar()
        > ggplot(diamonds, aes(color, fill=cut)) + geom_bar(position="dodge")
        > ggplot(diamonds, aes(color, fill=cut)) + geom_bar(position="dodge") + scale_y_log10()
        > ggplot(diamonds, aes(color, fill=cut)) + geom_bar(position="fill")
        > ggplot(diamonds, aes(color, depth)) + geom_point()
        > ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count")
        > ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + xlab("Diamond Color")
        > ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + xlab("Diamond Color") + coord_flip()
        > ggplot(diamonds, aes(clarity, fill=color)) + geom_bar() + facet_wrap(~ cut)
        > ggplot(diamonds, aes(clarity, fill=color)) + geom_bar() + facet_grid(. ~ cut)
        > ggplot(diamonds, aes(color, depth, color=cut)) + geom_point()
        > ggplot(diamonds, aes(color, depth, color=cut)) + geom_jitter()
        > ggplot(diamonds, aes(color, depth, color=cut)) + geom_jitter() + ylim(53,70)
        > ggplot(diamonds, aes(color, depth, color=cut)) + geom_boxplot()
    • Making the graph prettier
        > ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + xlab("Diamond Color")
        > ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color")
        > ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts")
        > ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_hue("Cut")
        > ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_brewer("Cut")
        > ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_brewer("Cut", palette="Set1")
        > ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts", labels=scales::comma) + scale_color_brewer("Cut", palette="Set1")
        > ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_brewer("Cut", palette="Set1") + facet_wrap(~ clarity)
        > ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly(stat="count") + theme_bw() + labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_brewer("Cut", palette="Set1") + facet_wrap(~ clarity, scales="free_y")
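      Each command on this slide retypes the whole plot and adds one more piece. Because a ggplot is an ordinary R object, the same build-up can be sketched by storing intermediate plots in variables. The names `base`, `counts_plot`, `styled` and `faceted` are made up for this illustration, and stat = "count" is an assumption for recent ggplot2 releases, which will not bin a discrete x variable such as color.

```r
library(ggplot2)

# Data and aesthetic mappings only; nothing is drawn yet.
base <- ggplot(diamonds, aes(color, color = cut, group = cut))

# One layer: a line per cut, counting diamonds in each colour class.
# stat = "count" is assumed here because color is a discrete variable.
counts_plot <- base + geom_freqpoly(stat = "count")

# Cosmetic changes are added on top without touching the layer.
styled <- counts_plot +
  theme_bw() +
  labs(x = "Diamond color", y = "Counts") +
  scale_color_brewer("Cut", palette = "Set1")

# Facets reuse everything defined so far.
faceted <- styled + facet_wrap(~ clarity, scales = "free_y")

# Typing a plot's name at the console (e.g. `styled`) draws it.
```

      Keeping the styled plot in a variable makes it cheap to try alternative facets or scales without retyping the earlier steps.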
    • One-liner graph example
        > ggplot(NHANES, aes(TIBC, Hemoglobin)) + geom_hex() + facet_grid(~Sex) + theme(aspect.ratio = 0.8)
    • Our version
      [Figure: hexagonal-bin plot of Hemoglobin (y axis) against TIBC (x axis), one panel per Sex (F, M), with a fill legend titled "Number of Patients"]
        > ggplot(NHANES, aes(TIBC, Hemoglobin)) + geom_hex() + facet_grid(~Sex) + theme_bw() + theme(aspect.ratio = 0.8) + scale_fill_gradient("Number of Patients")
    • Some things ggplot2 can't do
        • You can't click on your graph to change the labels; you have to rerun the program (as a workaround, edit them in Illustrator) or use opts (theme() in current ggplot2 releases)
        • When stacking data and setting a new limit, you'll lose all the data in any group that goes over that range
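      The second point can be sketched with the diamonds data. The limit of 5000 is an arbitrary value picked for this illustration. ylim() sets limits on the scale, so stacked segments reaching past the limit are dropped before drawing; coord_cartesian() is a commonly used alternative that only zooms the view and keeps the data.

```r
library(ggplot2)

# Height of the tallest stacked bar: the most common colour class.
max_count <- max(table(diamonds$color))

p <- ggplot(diamonds, aes(color, fill = cut)) + geom_bar()

# Scale limit: bar segments that extend beyond 5000 are removed
# (ggplot2 warns about removed rows when the plot is drawn).
p_clipped <- p + ylim(0, 5000)

# Coordinate zoom: the same window, but every bar segment is kept
# and simply runs off the top of the panel.
p_zoomed <- p + coord_cartesian(ylim = c(0, 5000))

# Type `p_clipped` or `p_zoomed` at the console to compare the two.
```

      If the aim is only to look more closely at part of the axis, the coord_cartesian() form is usually the safer choice.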
    • Last thoughts
        • Can handle millions of data points
        • It's free
        • Is good for having a quick look at your data and changing the display in an easy manner
        • Works on all platforms (Windows, Mac and Linux [server] like it)
        • It's pretty (and did I mention it is fast?)
        • I think it saves you a bit of Illustrator time
        • If you already have your scheme and it works for you, it is not worth it, but if you are looking for a new plotting strategy I think it is a good place to start
        • If you get into it you can start doing statistical analysis of your data and plotting it all together
    • For more info
        http://had.co.nz/ggplot2/
        http://learnr.wordpress.com/
    • NHANES Data: National Health and Nutrition Examination Survey
      Description: This is a somewhat large interesting dataset, a data frame of 15 variables (columns) on 9575 persons (rows).
      This data frame contains the following columns:
        • Cancer.Incidence: binary factor with levels No and Yes.
        • Cancer.Death: binary factor with levels No and Yes.
        • Age: numeric vector giving age of the person in years.
        • Smoke: a factor with levels Current, Past, Nonsmoker, and Unknown.
        • Ed: numeric vector of {0,1} codes giving the education level.
        • Race: numeric vector of {0,1} codes giving the person's race.
        • Weight: numeric vector giving the weight in kilograms.
        • BMI: numeric vector giving Body Mass Index, i.e., Weight/Height^2 where Height is in meters, and missings (61%!) are coded as 0 originally.
        • Diet.Iron: numeric giving Dietary iron.
        • Albumin: numeric giving albumin level in g/l.
        • Serum.Iron: numeric giving Serum iron in ug/l.
        • TIBC: numeric giving Total Iron Binding Capacity in ug/l.
        • Transferin: numeric giving Transferin Saturation, which is just 100*serum.iron/TIBC.
        • Hemoglobin: numeric giving Hemoglobin level.
        • Sex: a factor with levels F (female) and M (male).
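      Two details in this description matter before plotting: BMI uses 0 as a missing-value code, and Transferin is derived from two other columns. The sketch below shows both on a small made-up data frame that only borrows the column names; the values in `toy` are invented for illustration and are not NHANES records.

```r
# Hypothetical three-row stand-in using the NHANES column names.
toy <- data.frame(
  Serum.Iron = c(80, 120, 60),
  TIBC       = c(320, 400, 300),
  BMI        = c(24.5, 0, 31.2)   # 0 stands for "missing", as described
)

# Recode the 0 placeholder to NA so it is not plotted as a real BMI of zero.
toy$BMI[toy$BMI == 0] <- NA

# Transferin saturation as defined above: 100 * serum iron / TIBC.
toy$Transferin <- 100 * toy$Serum.Iron / toy$TIBC

toy
```

      Recoding to NA matters for plots as well as summaries, since a pile of zero BMI values would otherwise distort any axis or histogram that includes them.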