Bringing A Statistical Package To The Biologist’s Fingertips With Applications to Microarray Analysis
2. Microarray Experiments
Some examples of the many types of microarray experiments currently being considered:
- Comparison with normal cells.
- Comparison of many cell types, using an appropriate pool of RNA as a reference.
- Time series, using either time 0 or an earlier time point as the reference.
- Knockout experiments.
- Factor experiments.
3. Statistical issues to be addressed
- Image analysis
  - Spot identification
  - Background correction
- Data analysis
  - Normalisation
  - Transformation
  - Identifying significant genes
- Large amounts of data

We need a flexible approach.
4. A tool for analysis: R
- R is free software that is rapidly becoming very widely used.
- It can handle the large data files produced in microarray analysis.
- It is available for Unix, Linux and Windows.
- It has excellent documentation and online help.
5. Image Analysis and R
- In collaboration with CSIRO (Sydney), Jean Yee Hwa Yang and Terry Speed have developed a microarray image analysis package, which is currently being implemented using Z-image and R.
- This automated image analysis program overcomes some of the problems and limitations of other, commercial packages.
- Its output will automatically be set up for further analysis in R.
6. Using R at WEHI
- R is currently available only on unix02.
- From a Macintosh, access is limited to the command-line window. The graphics window can be seen only if an X Window program is installed on the Mac.
- If there is demand for R at WEHI, the Computer Centre will investigate options to change this situation.
- Alternatively, install R for Windows on a PC, or R for Linux.
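7. Using R at WEHI (2)
Starting R at the shell prompt (shown as NAT>) and quitting. Answering y saves the workspace to a file called .RData, so it is restored in the next session:

    NAT> R
    R : Copyright 2000, The R Development Core Team
    Version 1.0.0  (February 29, 2000)
    Type "demo()" for some demos, "help()" for on-line help, or
    "help.start()" for a HTML browser interface to help.
    Type "q()" to quit R.

    > q()
    Save workspace image? [y/n/c]: y

To start R with more memory (--vsize sets the vector heap size and --nsize the number of cons cells; these options belong to early versions such as R 1.0.0, and later versions allocate memory dynamically):

    NAT> R --vsize=50M --nsize=2000k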
8. How to make a vector
t(x) turns the vector into a 1 x 6 matrix; x[index] picks out elements by position.

    > x <- c(1,3,5,4,7,8)
    > x
    [1] 1 3 5 4 7 8
    > t(x)
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    3    5    4    7    8
    > length(x)
    [1] 6
    > index <- c(2,3,4)
    > x[index]
    [1] 3 5 4
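Besides a vector of positions, R also accepts negative indices (drop those elements) and logical vectors (keep the elements where a condition is TRUE). A minimal sketch using the same x; the use of which() and the condition x > 4 are illustrative additions, not part of the original slide:

```r
x <- c(1, 3, 5, 4, 7, 8)

index <- c(2, 3, 4)
x[index]       # elements 2 to 4: 3 5 4
x[-index]      # everything except elements 2 to 4: 1 7 8
x[x > 4]       # elements greater than 4: 5 7 8
which(x > 4)   # their positions: 3 5 6
```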
9. How to make a matrix
Filling by row (byrow=TRUE):

    > xmat <- matrix(x, nrow=2, ncol=3, byrow=TRUE)
    > xmat
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    4    7    8
    > xmat[1,2]
    [1] 3
    > xmat[,3]
    [1] 5 8

Filling by column (byrow=FALSE, the default):

    > xmat <- matrix(x, nrow=2, ncol=3, byrow=FALSE)
    > xmat
         [,1] [,2] [,3]
    [1,]    1    5    7
    [2,]    3    4    8
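The difference between the two fills can be checked directly: filling a 2 x 3 matrix by row gives the same result as filling a 3 x 2 matrix by column and transposing it. A small sketch; the names by_row and by_col are illustrative:

```r
x <- c(1, 3, 5, 4, 7, 8)

by_row <- matrix(x, nrow = 2, ncol = 3, byrow = TRUE)  # fill row by row
by_col <- matrix(x, nrow = 2, ncol = 3)                # default: fill column by column

dim(by_row)     # 2 3 (rows, then columns)
by_row[2, ]     # second row: 4 7 8
by_col[, 2]     # second column: 5 4

# Filling by row equals filling a 3 x 2 matrix by column and transposing it
identical(by_row, t(matrix(x, nrow = 3, ncol = 2)))    # TRUE
```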
10. Adding and removing a column
cbind() adds addcol as a fourth column; the negative index -4 then drops it again.

    > addcol <- c(9,2)
    > newxmat <- cbind(xmat, addcol)
    > newxmat
               addcol
    [1,] 1 5 7      9
    [2,] 3 4 8      2
    > oldxmat <- newxmat[,-4]
    > oldxmat
    [1,] 1 5 7
    [2,] 3 4 8
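Because cbind() names the new column after the variable (addcol), the column can also be removed by name rather than by position, which stays correct if columns are later added in front of it. A sketch, assuming the byrow=FALSE version of xmat from the previous slide:

```r
xmat <- matrix(c(1, 3, 5, 4, 7, 8), nrow = 2, ncol = 3)  # byrow = FALSE version
addcol <- c(9, 2)

newxmat <- cbind(xmat, addcol)
colnames(newxmat)     # "" "" "" "addcol"

# Remove the column by name instead of by position
oldxmat <- newxmat[, colnames(newxmat) != "addcol"]
ncol(oldxmat)         # 3
```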
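11. A script to find the mean of each column
Typed at the prompt (R shows + while a command is incomplete), the loop prints each column mean of the byrow=FALSE matrix:

    > for (i in 1:3) {
    +   print(mean(xmat[,i]))
    + }
    [1] 2
    [1] 4.5
    [1] 7.5

To keep the means rather than just print them, collect them in a vector. These lines can be saved in a text file and pasted into R:

    m <- 0
    for (i in 1:3) {
      m <- c(m, mean(xmat[,i]))
    }
    m <- m[-1]    # drop the starting value 0

    > dim(xmat)
    [1] 2 3
    > m
    [1] 2.0 4.5 7.5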
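The same result can be obtained without the dummy starting value, and without a hard-coded 1:3, by preallocating the result; apply() and colMeans() do it without an explicit loop. A minimal sketch using the same matrix:

```r
xmat <- matrix(c(1, 3, 5, 4, 7, 8), nrow = 2, ncol = 3)

# Loop version: preallocate one slot per column
m <- numeric(ncol(xmat))
for (i in seq_len(ncol(xmat))) {
  m[i] <- mean(xmat[, i])
}
m                        # 2.0 4.5 7.5

# Built-in alternatives
apply(xmat, 2, mean)     # MARGIN = 2 applies mean() to each column
colMeans(xmat)
```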
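12. Reading in Data
The first rows of a spot-level data file, one row per spot with 16 columns (aligned here for readability; in the file the fields are separated by single spaces). CH1 is the red (Cy5) channel and CH2 the green (Cy3) channel; the column names suggest spot intensity (I), background (B) and their standard deviations (ISD, BSD).

    num GR GC SR SC NAME   X       Y      CH1I        CH1B       CH1ISD     CH1BSD    CH2I        CH2B       CH2ISD     CH2BSD
    1   1  1  1  1  CL0001 1220.00 890.00 1223.317505 168.473679 435.352264 37.599304 1014.603149 139.578949 446.614960 21.937578
    2   1  1  1  2  CL0001 1400.00 890.00 1257.714233 233.368423 337.946320 90.568703 975.333313  142.684204 354.194031 22.934818
    3   1  1  1  3  CL0008 1580.00 890.00 333.555542  144.000000 145.992569 15.944347 277.730164  126.842102 156.314529 9.719757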
13. Reading in data from a text file
count.fields() reports 16 fields on every one of the 9217 lines (a header plus 9216 spots). read.table() then uses the first column (num) as row names, and attach() makes the columns available by name, for example CH1I.

    > # check that the file has the same number of fields
    > # on every line
    > count.fields(file="tp04sk1.txt", sep=" ", skip=0)
    . . .
    16 16 16 16
    [9145] 16 16 16 16 16 16 16 16 16 16 16 16 16 16
    [9169] 16 16 16 16 16 16 16 16 16 16 16 16 16 16
    [9193] 16 16 16 16 16 16 16 16 16 16 16 16 16 16
    [9217] 16
    > tp4sk1 <- read.table("tp04sk1.txt", header=TRUE, sep=" ", skip=0, row.names=1)
    > attach(tp4sk1)
    > median(CH1I)
    [1] 375.627
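The same steps can be tried on a small file. This sketch writes the three example rows from the "Reading in Data" slide to a temporary file (a stand-in for tp04sk1.txt), tabulates the field counts instead of printing one per line, and refers to columns with $ rather than attach(), which can mask other objects that have the same names:

```r
# Write the three example spots to a temporary file
txt <- c(
  "num GR GC SR SC NAME X Y CH1I CH1B CH1ISD CH1BSD CH2I CH2B CH2ISD CH2BSD",
  "1 1 1 1 1 CL0001 1220.00 890.00 1223.317505 168.473679 435.352264 37.599304 1014.603149 139.578949 446.614960 21.937578",
  "2 1 1 1 2 CL0001 1400.00 890.00 1257.714233 233.368423 337.946320 90.568703 975.333313 142.684204 354.194031 22.934818",
  "3 1 1 1 3 CL0008 1580.00 890.00 333.555542 144.000000 145.992569 15.944347 277.730164 126.842102 156.314529 9.719757"
)
path <- tempfile(fileext = ".txt")
writeLines(txt, path)

# Compact field check: how many lines have each number of fields
table(count.fields(path, sep = " "))

spots <- read.table(path, header = TRUE, sep = " ", row.names = 1)
dim(spots)            # 3 spots, 15 columns (num became the row names)
median(spots$CH1I)    # column accessed with $ instead of attach()
```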
14. Getting spot info from the data frame

    cy3 <- CH2I            # green
    cy5 <- CH1I            # red

    cy3bc <- CH2I - CH2B   # background corrected
    cy5bc <- CH1I - CH1B

    # Get duplicates: each clone is spotted twice, in consecutive rows
    # (CL0001 in rows 1 and 2), so odd rows hold one copy and even rows the other.
    d1 <- seq(1, nrow(tp4sk1) - 1, 2)
    d2 <- seq(2, nrow(tp4sk1), 2)

    cy3d1 <- cy3bc[d1]
    cy3d2 <- cy3bc[d2]
    cy5d1 <- cy5bc[d1]
    cy5d2 <- cy5bc[d2]
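A quick way to see what the odd/even split does, and to check that duplicate spots agree, is to try it on a few values. A minimal sketch with hypothetical background-corrected intensities for three clones, each spotted twice; the agreement check (log2 differences and correlation) is an illustrative addition:

```r
# Hypothetical background-corrected intensities for 6 spots:
# 3 clones, each spotted twice in consecutive rows
bc <- c(1000, 950, 200, 210, 500, 480)
n <- length(bc)

d1 <- seq(1, n - 1, 2)   # first copy of each clone: rows 1, 3, 5
d2 <- seq(2, n, 2)       # second copy: rows 2, 4, 6

# Duplicate spots should agree closely; compare them on the log2 scale
dup_diff <- log2(bc[d1]) - log2(bc[d2])
round(dup_diff, 2)
cor(log2(bc[d1]), log2(bc[d2]))
```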
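15. Always log the intensities
The top row shows the raw green intensities and their log2 density; the bottom row does the same for red.

    # Assumed definitions (not shown on the slides): Cy3 and Cy5 hold the
    # log base 2 intensities. R is case-sensitive, so Cy3 and cy3 are different
    # objects. The axis labels on the next slide suggest the background-corrected
    # values (cy3bc, cy5bc) may have been used instead.
    Cy3 <- log2(cy3)
    Cy5 <- log2(cy5)

    par(mfrow=c(2,3))
    hist(cy3, col="green")
    plot(density(cy3), col="green")
    plot(density(Cy3), col="green")   # log base 2
    hist(cy5, col="red")
    plot(density(cy5), col="red")
    plot(density(Cy5), col="red")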
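Background correction can make some intensities zero or negative, and log2() then returns -Inf or NaN (with a warning). A minimal sketch of flagging such spots before logging, using hypothetical values; setting them to NA is one possible choice, not necessarily the one used in this analysis:

```r
# Hypothetical background-corrected intensities: spots 3 and 4 have
# background at or above the foreground
bc <- c(1054.8, 24.3, -12.5, 0, 310.0)

ok <- bc > 0
sum(!ok)                       # spots that cannot be logged: 2

logged <- rep(NA_real_, length(bc))
logged[ok] <- log2(bc[ok])     # log base 2 where defined, NA elsewhere
logged

# On the log2 scale a two-fold change is a difference of 1
log2(2 * 310) - log2(310)
```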
16. Normalisation
If the two channels were balanced, the green and red log densities would coincide and the lowess curve of Cy5 against Cy3 would follow the line y = x.

    par(mfrow=c(2,1))
    plot(density(Cy3), type="n")     # empty axes, scaled to the green density
    lines(density(Cy3), col="green")
    lines(density(Cy5), col="red")
    plot(Cy3, Cy5,
         xlab="Log(cy3) Background Corrected",
         ylab="Log(cy5) Background Corrected",
         main="The Need For Normalisation Between Green and Red Intensities")
    lines(lowess(Cy3, Cy5), col="yellow")
17. Normalisation (2)
Assume the green intensity is a constant multiple k of the red intensity:

    cy3 = k * cy5

Taking logs base 2 gives

    log2(cy3) = K + log2(cy5),   where K = log2(k)

so K can be estimated by the median difference of the log intensities, and the red intensities rescaled by k = 2^K:

    K <- median(Cy3 - Cy5)    # the same as median(log2(cy3) - log2(cy5))
    k <- 2^K
    cy5n <- k * cy5           # normalised red intensities
    Cy5n <- log2(cy5n)        # normalised log2 red intensities (Cy5 + K)
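The normalisation steps can be wrapped in a function and checked on simulated data where the true scale factor is known. A sketch; the function name median_normalise and the simulated green/red intensities are hypothetical. After normalisation the median log ratio is 0 by construction:

```r
# Median normalisation of the red channel against the green channel.
# cy3, cy5: positive intensities for the same spots.
median_normalise <- function(cy3, cy5) {
  K <- median(log2(cy3) - log2(cy5))   # estimated log2 scale factor
  k <- 2^K
  list(K = K, cy5n = k * cy5, Cy5n = log2(k * cy5))
}

# Simulated data: red is green scaled down by a factor of 2, plus noise
set.seed(1)
green <- 2^rnorm(1000, mean = 10, sd = 1)
red <- green / 2 * 2^rnorm(1000, sd = 0.2)

res <- median_normalise(green, red)
res$K                              # close to 1, i.e. k close to 2
median(res$Cy5n - log2(green))     # 0 (up to rounding) after normalisation
```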
18. Approximate normality of log ratios
The 16-colour vector is recycled along the spots, so each pair of duplicate spots shares a colour.

    par(mfrow=c(2,1))
    plot(density(Cy5n - Cy3), col="purple")
    qqnorm(Cy5n - Cy3,
           col=c("red","red","yellow","yellow","green","green","blue","blue",
                 "pink","pink","orange","orange","purple","purple","black","black"))
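The colour vector used on this and the following slides is eight colours, each repeated twice, so it can be built more compactly with rep(). A small sketch confirming that the two forms are identical and that, when recycled over 9216 spots, rows 2i-1 and 2i (a duplicate pair) always get the same colour:

```r
# The 16-colour vector as written on the slides
slide_cols <- c("red","red","yellow","yellow","green","green","blue","blue",
                "pink","pink","orange","orange","purple","purple","black","black")

# Same vector with rep(): each = 2 repeats every element twice in turn
pair_cols <- rep(c("red", "yellow", "green", "blue",
                   "pink", "orange", "purple", "black"), each = 2)
identical(slide_cols, pair_cols)    # TRUE

# Recycled over all spots, odd and even rows of each pair share a colour
spot_cols <- rep_len(pair_cols, 9216)
all(spot_cols[seq(1, 9215, 2)] == spot_cols[seq(2, 9216, 2)])   # TRUE
```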
19. A question of significance
Plot the difference of the log intensities against their average. As the plot title states, the variation in the log ratios is not constant across intensities.

    par(mfrow=c(1,1))
    plot(0.5*(Cy3 + Cy5n), Cy5n - Cy3,
         xlab="Average of logRed and logGreen",
         ylab="Difference of logRed and logGreen",
         main="Variation In Intensities Is Not Constant",
         col=c("red","red","yellow","yellow","green","green","blue","blue",
               "pink","pink","orange","orange","purple","purple","black","black"))
    lines(lowess(0.5*(Cy3 + Cy5n), Cy5n - Cy3), col="yellow")
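The plotted quantities can also be computed once and inspected numerically. This sketch uses simulated log intensities (hypothetical, with noise deliberately larger at low intensity) and calls the difference M and the average A for brevity; lowess() returns the smoothed curve as a list with components x (sorted) and y:

```r
set.seed(2)
log_green <- rnorm(500, mean = 9, sd = 1.2)              # hypothetical log2 green
noise_sd <- 0.1 + 0.05 * pmax(12 - log_green, 0)         # larger at low intensity
log_red <- log_green + rnorm(500, sd = noise_sd)         # hypothetical normalised log2 red

A <- 0.5 * (log_green + log_red)   # x-axis: average of the log intensities
M <- log_red - log_green           # y-axis: difference of the log intensities

fit <- lowess(A, M)                # list(x = sorted A, y = smoothed M)

# Compare the spread of M below and above the median average intensity
low <- A < median(A)
c(low = sd(M[low]), high = sd(M[!low]))
```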
20. Identifying a spot on a plot
Draw each spot's row number instead of a point (type="n" sets up empty axes), zooming in with xlim and ylim:

    par(mfrow=c(1,1))
    plot(0.5*(Cy3 + Cy5n), Cy5n - Cy3,
         xlab="Average of logRed and logGreen",
         ylab="Difference of logRed and logGreen",
         main="Variation In Intensities Is Not Constant",
         type="n", ylim=c(-4,4), xlim=c(6,12))
    text(0.5*(Cy3 + Cy5n), Cy5n - Cy3,
         labels=as.character(1:9216),
         col=c("red","red","yellow","yellow","green","green","blue","blue",
               "pink","pink","orange","orange","purple","purple","black","black"),
         cex=1)
    lines(lowess(0.5*(Cy3 + Cy5n), Cy5n - Cy3), col="yellow")
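Spot numbers can also be found without reading labels off the plot: which() returns the row numbers of spots meeting a condition (interactively, identify() lets you click on points in an open plot and returns their indices). A sketch with hypothetical log ratios and a hypothetical cut-off of 1, i.e. a two-fold change on the log2 scale:

```r
# Hypothetical log ratios for 8 spots
M <- c(0.1, -0.2, 1.4, 0.05, -1.1, 0.3, 2.2, -0.4)
cutoff <- 1                          # hypothetical threshold

flagged <- which(abs(M) > cutoff)    # row numbers, as drawn by text()
flagged                              # 3 5 7
M[flagged]
```

With the real data, the same row numbers index back into the data frame, for example tp4sk1[flagged, "NAME"].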
21. Saving graphics to a file (PostScript)
Open a PostScript device, draw the plot, then close the device with dev.off() so the file is written:

    postscript("filename.ps")
    par(mfrow=c(1,1))
    plot(0.5*(Cy3 + Cy5n), Cy5n - Cy3,
         xlab="Average of logRed and logGreen",
         ylab="Difference of logRed and logGreen",
         main="Variation In Intensities Is Not Constant",
         type="n", ylim=c(-0.1,1), xlim=c(10,11))
    text(0.5*(Cy3 + Cy5n), Cy5n - Cy3,
         labels=as.character(1:9216),
         col=c("red","red","yellow","yellow","green","green","blue","blue",
               "pink","pink","orange","orange","purple","purple","black","black"),
         cex=1)
    dev.off()
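A self-contained check of the open, plot, close pattern, writing to a temporary file instead of filename.ps; pdf() and png() are used the same way:

```r
ps_file <- tempfile(fileext = ".ps")

postscript(ps_file)           # open the PostScript device: plots now go to the file
plot(1:10, (1:10)^2, main = "Test plot")
dev.off()                     # close the device so the file is complete

file.exists(ps_file)          # TRUE
file.info(ps_file)$size > 0   # TRUE
```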
22. Using R help

    > ?plot
    Generic X-Y Plotting

    Description:
         Generic function for plotting of R objects.  For more details
         about the graphical parameter arguments, see `par'.

    Usage:
         plot(x, ...)
         plot(x, y, xlim=range(x), ylim=range(y), type="p",
              main, xlab, ylab, ...)
         plot(y ~ x, ...)

    Arguments:
         x: the coordinates of points in the plot. Alternatively, a
            single plotting structure or any R object with a `plot'
            method can be provided.
    :
23. Using R help (2)
help.start() opens the HTML help pages in a web browser:

    > help.start()
24. R Help (3)
25. [Figure: layout of spots on the array. Within each 24 x 24 grid, spots are numbered row by row (1-24 in the first row, 25-48 in the second, ..., up to 576); grid 1 holds spots 1-576, grid 2 spots 577-1152, and so on.]
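26. Level colour plot of background
Rearrange the red background values (CH1B) into a 96 x 96 image of the array. Each of the 16 grids is a 24 x 24 block of 576 spots, stored grid by grid and row by row within a grid; the code assumes the grids are arranged 4 by 4, with grids 1-4 in the first row.

    bkgmat <- matrix(1:24, nrow=24, ncol=1)   # placeholder first column
    for (i in 1:16) {
      s <- ((i-1)*576 + 1):(i*576)            # spots in grid i
      m <- matrix(CH1B[s], nrow=24, ncol=24, byrow=TRUE)
      bkgmat <- cbind(bkgmat, m)              # place grids side by side
    }
    bkgmat <- bkgmat[,-1]                     # drop the placeholder column
    m1 <- bkgmat[, 1:96]                      # grids 1-4
    m2 <- bkgmat[, 97:192]                    # grids 5-8
    m3 <- bkgmat[, 193:288]                   # grids 9-12
    m4 <- bkgmat[, 289:384]                   # grids 13-16
    bkg <- rbind(m1, m2, m3, m4)              # 96 x 96
    # filled.contour() draws bkg[i, j] at x = i, y = j, so matrix rows run
    # along the x-axis
    filled.contour(1:96, 1:96, bkg, nlevels=100, color.palette=heat.colors)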
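The rearrangement can be written without the placeholder column and then checked by feeding in the spot numbers themselves, so that each cell of the result shows which spot landed there. A sketch under the same 4 by 4 layout assumption; the function name arrange_grids and its parameters are hypothetical:

```r
# Arrange spot values (stored grid by grid, row by row within a grid)
# into one matrix, assuming square grids laid out grids_per_row across.
arrange_grids <- function(values, spots_per_side = 24, grids_per_row = 4) {
  n_per_grid <- spots_per_side^2
  n_grids <- length(values) / n_per_grid
  grids <- lapply(seq_len(n_grids), function(i) {
    s <- ((i - 1) * n_per_grid + 1):(i * n_per_grid)
    matrix(values[s], nrow = spots_per_side, byrow = TRUE)
  })
  # Bind each row of grids side by side, then stack the rows of grids
  grid_rows <- split(grids, ceiling(seq_along(grids) / grids_per_row))
  do.call(rbind, lapply(grid_rows, function(g) do.call(cbind, g)))
}

# Check with spot numbers as the values
bkg <- arrange_grids(1:9216)
dim(bkg)       # 96 96
bkg[1, 25]     # 577: first spot of grid 2, to the right of grid 1
bkg[25, 1]     # 2305: first spot of grid 5, below grid 1

# With real data, e.g.:
# filled.contour(1:96, 1:96, arrange_grids(CH1B), nlevels = 100, color.palette = heat.colors)
```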
27. Conclusion
- R is flexible and powerful.
- It is easy to read in data.
- It makes data manipulation straightforward.
- It offers extensive control over, and a wide range of, graphics.
- It has a wide range of statistical functions.
- Add-on packages are available.
- Scripts can be written as text files and sent to collaborators to import into R (use source("filename") to import and execute the code).
- All the work done in a session can be saved.
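The last two points can be illustrated together: write a small script to a file, run it with source(), and save the workspace with save.image(), which writes the same kind of .RData file that q() offers to save on exit. A sketch using temporary files in place of a real script name:

```r
# Write a small script, as you might send to a collaborator
script <- tempfile(fileext = ".R")
writeLines(c(
  "xmat <- matrix(c(1, 3, 5, 4, 7, 8), nrow = 2)",
  "m <- colMeans(xmat)"
), script)

source(script)   # runs the script: xmat and m now exist in the workspace
m                # 2.0 4.5 7.5

# Save everything in the workspace to a file
ws <- tempfile(fileext = ".RData")
save.image(ws)
```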
28. Acknowledgements
- Terry Speed
- Melanie Bahlo
- Asa Wirapati
- George Rudy
- Jean Yee Hwa Yang
- Chuang Fong Kong
- Keith Slattery

×