Presentation R basic teaching module

Presentation on basic R commands that are useful for biologists. With some biological examples.

Published in: Technology

  1. Introduction to R: Basic Teaching module
       13-10-2010
       Sander Timmer & Myrto Kostadima
  2. Overview
       What is R
       Quick overview of data types, input/output and plots
       Some biological examples
       I’m not a particularly good teacher, so please ask when you’re lost!
  3. What is this R thing?
       R is a powerful, general purpose language and software environment for statistical computing and graphics
       Runs on Linux, OS X and, for the unlucky few, also on Windows
       R is open source and free!
  4. Start your R interface
  5. Variables
         x <- 2
         x <- x^2
         x
         [1] 4
  6. Vectors
       Many ways of generating a vector with a range of numbers:
         x <- 1:10
         assign("x", 1:10)
         x <- c(1,2,3,4,5,6,7,8,9,10)
         x <- seq(1,10, by=1)
         x <- seq(length = 10, from=1, by=1)
         x
         [1] 1 2 3 4 5 6 7 8 9 10
  7. Vectors
       Common way to store multiple values
         x <- c(1,2,4,5,10,12,15)
         length(x)
         mean(x)
         summary(x)
  8. Vectors
       Vectors are indexed (here x <- 1:10, as on slide 6)
         x[5] + x[10]
         [1] 15
         x[-c(5,10)]
         [1] 1 2 3 4 6 7 8 9
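A minimal sketch of the vector slides, assuming nothing beyond base R. It checks that the different ways of building 1 to 10 agree and shows positive, negative and logical indexing (the logical case is an addition, not taken from the slides).

```r
# Build the same vector 1..10 in three ways and confirm they agree.
x <- 1:10
y <- seq(1, 10, by = 1)
z <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
all_equal <- all(x == y) && all(y == z)

# Positive indices select elements; negative indices drop them.
picked  <- x[c(5, 10)]    # elements 5 and 10
dropped <- x[-c(5, 10)]   # everything except elements 5 and 10

# Logical indexing keeps the elements where the condition is TRUE.
evens <- x[x %% 2 == 0]

print(sum(picked))
print(dropped)
print(evens)
```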
  9. Matrices
       Common form of storing 2-dimensional data
       Think about having an Excel sheet
         m = matrix(1:10,2,5)
         m
              [,1] [,2] [,3] [,4] [,5]
         [1,]    1    3    5    7    9
         [2,]    2    4    6    8   10
         summary(m)
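As a small illustration of the matrix slide, the sketch below reuses the same 2 x 5 matrix and shows how elements, rows and columns are addressed. The `byrow = TRUE` variant is an addition, shown only to make the default column-wise filling visible.

```r
# matrix() fills column by column unless byrow = TRUE is given.
m <- matrix(1:10, nrow = 2, ncol = 5)
m_by_row <- matrix(1:10, nrow = 2, ncol = 5, byrow = TRUE)

# Index with [row, column]; leave one side empty to take a whole row or column.
one_value  <- m[2, 3]
first_row  <- m[1, ]
third_col  <- m[, 3]

# Row-wise and column-wise summaries.
row_means <- rowMeans(m)
col_sums  <- colSums(m)

print(dim(m))
print(row_means)
print(col_sums)
```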
  10. Factors
       Factors are vectors with a discrete number of levels:
         x <- factor(c("Cancer", "Cancer", "Normal", "Normal"))
         levels(x)
         [1] "Cancer" "Normal"
         table(x)
         x
         Cancer Normal
              2      2
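Factors become useful once they group other data. The sketch below pairs the factor from the slide with made-up expression values (the numbers in `expr` are hypothetical) and computes a mean per level with `tapply()`.

```r
# Sample status, as on the slide.
status <- factor(c("Cancer", "Cancer", "Normal", "Normal"))

# Hypothetical expression values, one per sample.
expr <- c(8.1, 7.9, 5.2, 5.0)

# Count samples per level and average the values within each level.
counts      <- table(status)
group_means <- tapply(expr, status, mean)

print(counts)
print(group_means)
```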
  11. Lists
       A list can contain "anything"
       Useful for storing several vectors
         list(gene="gene 1", expression=c(5,2,3))
         $gene
         [1] "gene 1"
         $expression
         [1] 5 2 3
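A short sketch of working with the list from the slide: elements are reached with `$` or `[[ ]]`, and new elements can be added by assignment. The `chromosome` field is a hypothetical addition for illustration.

```r
# The list from the slide: a name plus a numeric vector.
gene <- list(gene = "gene 1", expression = c(5, 2, 3))

# Two equivalent ways to reach an element.
gene_name  <- gene$gene
expression <- gene[["expression"]]

# Elements can be used like any other object, and new ones added on the fly.
mean_expr <- mean(gene$expression)
gene$chromosome <- "chr1"   # hypothetical extra field

print(names(gene))
print(mean_expr)
```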
  12. If-else statements
       Essential for any programming language
       if condition then do x, else do y
         if (p < 0.01) {
           print("Significant gene")
         } else {
           print("Insignificant gene")
         }
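`if`/`else` tests a single value. For a whole vector of p-values, base R's `ifelse()` applies the same decision element by element. A minimal sketch, with hypothetical gene names and p-values:

```r
# Hypothetical p-values for three genes.
p_values <- c(geneA = 0.001, geneB = 0.2, geneC = 0.009)

# if/else handles one value at a time.
classify <- function(p, cutoff = 0.01) {
  if (p < cutoff) {
    "Significant gene"
  } else {
    "Insignificant gene"
  }
}
single <- classify(p_values[["geneA"]])

# ifelse() is the vectorised form: one label per p-value.
labels <- ifelse(p_values < 0.01, "Significant gene", "Insignificant gene")

print(single)
print(labels)
```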
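  13. Repetition
       You want to apply one function to every element of a list
         for(element in list){ ....do something.... }
       For loops are easy to write, though they tend to be slow
       The apply family is the fast, idiomatic way of getting things done in R:
         sapply(List, mean)     # every element of a list
         apply(m, 1, mean)      # every row of a matrix m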
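The sketch below computes the same per-gene means with a `for` loop and with `sapply()`, then uses `apply()` on the rows of a matrix. The contents of `expr_list` are hypothetical. Note that `apply()` expects a matrix or array; for a plain list, `lapply()` or `sapply()` is the matching tool.

```r
# Hypothetical expression measurements, one vector per gene.
expr_list <- list(geneA = c(5, 2, 3), geneB = c(10, 12), geneC = c(1, 1, 1, 1))

# Version 1: an explicit for loop filling a pre-allocated result vector.
means_loop <- numeric(length(expr_list))
names(means_loop) <- names(expr_list)
for (name in names(expr_list)) {
  means_loop[name] <- mean(expr_list[[name]])
}

# Version 2: sapply() applies the function to every list element.
means_sapply <- sapply(expr_list, mean)

# apply() works on matrices: MARGIN = 1 means "once per row".
m <- matrix(1:10, nrow = 2, ncol = 5)
row_means <- apply(m, 1, mean)

print(means_sapply)
print(row_means)
```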
  14. Data input
       R has countless ways of importing data:
         CSV
         Excel
         Flat text file
  15. Data input
       Simplest, the CSV file (first line as header, first column as row names):
         read.csv("mydata.csv", header=TRUE, row.names=1)
       Load a tab-separated file:
         read.table("mytable.txt", sep="\t")
       Load an Rdata file:
         load("mydata.Rdata")
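To try the import functions without any data at hand, the sketch below writes two tiny files to a temporary location and reads them back. The file contents and the column names (`probe`, `sample1`, `sample2`) are made up for the example.

```r
# Write a small comma-separated file to a temporary path.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("probe,sample1,sample2",
             "p1,10,12",
             "p2,20,18"), csv_path)

# header = TRUE: first line holds column names.
# row.names = 1: first column holds the row names.
expr <- read.csv(csv_path, header = TRUE, row.names = 1)

# The same table, tab-separated, read with read.table().
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("probe\tsample1\tsample2",
             "p1\t10\t12",
             "p2\t20\t18"), tsv_path)
expr_tab <- read.table(tsv_path, sep = "\t", header = TRUE, row.names = 1)

print(expr)
print(expr_tab)
```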
  16. Data input
       Also for more specific data sources:
         Excel
         Database connections (MySQL -> Ensembl, e.g.)
         Affy: Affymetrix chips data
         HapMap
         .........
  17. Data output
       Simplest, the CSV file:
         write.csv(x, file="myx.csv")
       Save an Rdata file:
         save(x, file="myx.Rdata")
       Save the whole R session:
         save.image(file="mysession.Rdata")
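A round trip makes the output functions easier to trust. The sketch below writes a small, made-up data frame as CSV and as an Rdata file in a temporary location, then reads both back. Loading into a separate environment (`restored`) is a choice made here so the example does not overwrite objects in the workspace.

```r
# A small, hypothetical data frame to save.
x <- data.frame(probe = c("p1", "p2"), value = c(1.5, 2.5))

# CSV round trip; row.names = FALSE avoids an extra column of row numbers.
csv_path <- tempfile(fileext = ".csv")
write.csv(x, file = csv_path, row.names = FALSE)
x_csv <- read.csv(csv_path)

# Rdata round trip; load() returns the names of the restored objects.
rdata_path <- tempfile(fileext = ".Rdata")
save(x, file = rdata_path)
restored <- new.env()
loaded_names <- load(rdata_path, envir = restored)

print(x_csv)
print(loaded_names)
```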
  18. Graphics
       A quick way to study your data is plotting it
       The function "plot" in R can plot almost anything out of the box (even if this doesn’t make sense!)
  19. plot(1:5,5:1)
  20. plot(1:5,5:1, col="red", type="l")
  21. plot(1:5,5:1, col="red", type="l", main="Title of this plot", xlab="x axis", ylab="y axis")
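The plots on these slides appear in a window. To keep one, open a file device first and close it afterwards. A minimal sketch using the base `pdf()` device; the temporary output path and the extra `points()` call are additions for illustration.

```r
# Open a PDF device; everything plotted until dev.off() goes into this file.
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path, width = 6, height = 6)

# The plot from the slide: a red line with a title and axis labels.
plot(1:5, 5:1, col = "red", type = "l",
     main = "Title of this plot", xlab = "x axis", ylab = "y axis")

# Add the individual points on top of the line.
points(1:5, 5:1, pch = 19)

# Close the device so the file is written completely.
invisible(dev.off())

print(file.exists(pdf_path))
```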
  22. Basic graphics
       With R you can plot almost any object
       Multidimensional variables like matrices can be plotted with matplot()
       Other often used plot functions are: boxplot(), hist(), levelplot() (from the lattice package), heatmap()
  23. Advanced plotting
  24. Advanced plotting
  25. Advanced plotting
  26. Before the example
       Help pages for functions in R can be called: ?plot, ?hist, ?vector
       Examples for most functions can be run: example(plot)
       A text search for functions can be done with: ??plot
  27. Example
       An example Affymetrix dataset to play with
         Checking distribution of data
         Plotting data
         Clustering data
         Correlating data
  28. Read file
         library(affy)
         library(affydata)
         data(Dilution)
         print(Dilution)
  29. Read file
         dil = pm(Dilution)[1:2000,]
         dil.ex = exprs(Dilution)[1:2000,]
         rownames(dil.ex) = row.names(probes(Dilution))[1:2000]
  30. Summary
       Checking what we got
         summary(dil)
         mva.pairs(dil)
       Or:
         boxplot(log(dil.ex))
       Or:
         hist(dil.ex, xlim=c(0,500), breaks=1000)
  31. We need to normalise first
       For almost all experiments you have to apply some sort of normalisation
         dil.norm = maffy.normalize(dil, subset=1:nrow(dil))
         colnames(dil.norm) = colnames(dil)
         mva.pairs(dil.norm)
  32. Most similar samples
       Applying Euclidean distance to detect the most similar samples
         dil.norm.dist = dist(t(dil.norm))
         dil.norm.dist.hc = hclust(dil.norm.dist)
         plot(dil.norm.dist.hc)
       Do the same for the non-normalised dataset
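The same distance-and-cluster steps can be tried without the affy packages. The sketch below uses simulated data, not the Dilution set: four hypothetical samples, where `b1` and `b2` are shifted upwards relative to `a1` and `a2`, so the expected grouping is known in advance.

```r
set.seed(1)

# Simulated expression matrix: 100 probes (rows) x 4 samples (columns).
n_probes <- 100
baseline <- rnorm(n_probes, mean = 8)
noise <- function() rnorm(n_probes, sd = 0.1)
expr <- cbind(a1 = baseline + noise(),
              a2 = baseline + noise(),
              b1 = baseline + 3 + noise(),
              b2 = baseline + 3 + noise())

# dist() compares rows, so transpose to compare samples instead of probes.
sample_dist <- dist(t(expr))          # Euclidean distance by default
sample_hc   <- hclust(sample_dist)

# Cut the tree into two groups and see which samples end up together.
groups <- cutree(sample_hc, k = 2)
print(groups)
```

Calling `plot(sample_hc)` draws the dendrogram, as on the slide.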
  33. Checking expression
       Heatmap representation of expression levels for different probes
       (dil.ex.norm is presumably dil.ex normalised as on slide 31)
         heatmap(dil.ex.norm[1:50,])
       You could, for example, apply a t-test to rank the probes and plot only the most significant ones
  35. Checking expression
       You could, for example, apply a t-test to rank the probes and plot only the most significant ones
         library(genefilter)
         f = factor(c(1,1,2,2))
         dil.ex.norm.t = rowttests(dil.ex.norm, fac=f)
         heatmap(dil.ex.norm[order(dil.ex.norm.t$p.value)[1:10],])
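The ranking idea can be sketched in base R alone. This is not the genefilter code from the slide: it swaps in `t.test()` applied row by row, on simulated data with ten hypothetical samples (five per group) in which the first three probes are deliberately shifted in group 2.

```r
set.seed(42)

# Simulated expression matrix: 20 probes x 10 samples, two groups of five.
n_probes <- 20
groups <- factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))
expr <- matrix(rnorm(n_probes * 10, mean = 8, sd = 0.2),
               nrow = n_probes,
               dimnames = list(paste0("probe", 1:n_probes),
                               paste0("s", 1:10)))

# Make the first three probes truly different between the groups.
expr[1:3, groups == 2] <- expr[1:3, groups == 2] + 5

# One equal-variance t-test per probe (row); keep only the p-value.
p_values <- apply(expr, 1, function(row) {
  t.test(row[groups == 1], row[groups == 2], var.equal = TRUE)$p.value
})

# Rank probes from most to least significant and take the top three.
top <- names(sort(p_values))[1:3]
print(top)
```

`heatmap(expr[top, ])` would then plot only the selected probes.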
  36. Want to know more?
       Using R will benefit all PhD students in this room
       Learning by doing
       Loads of basic examples at:
         http://addictedtor.free.fr/graphiques/
         http://www.mayin.org/ajayshah/KB/R/index.html
         http://www.r-project.org/
  37. Just keep in mind......
