Training
Manual
Appendix

Crash Course:
R and BioConductor
Jeff Skinner, M.S.
Sudhir Varma, Ph.D.
Bioinformatics and Computational Biosciences Branch (BCBB)
NIH/NIAID/OD/OSMO/OCICB
http://bioinformatics.niaid.nih.gov
ScienceApps@niaid.nih.gov
Appendix
Solutions to Sample Problems for Students
#1. {Fisher’s iris data} Sir Ronald A. Fisher famously used this set of iris flower data
as an example to test his new linear discriminant statistical model. Now, the iris
data set is used as a historical example for new statistical classification models.
A) Search the help menu for the keyword “linear discriminant”, then report
the names of the functions and packages you find.
Ans. > help.search("linear discriminant") returns results for the
functions lda() and predict.lda() from the MASS package library.
B) Search the help menus or a search engine for additional classification
models that could be tested with the iris data.
Ans. Any results are OK, but two examples are the knn() function from the
class package library and the randomForest() function from the
randomForest package library.
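As a rough illustration of parts A) and B), the sketch below fits lda() and knn() to the built-in iris data. It assumes the MASS and class packages are installed (they ship with most R distributions); the choice k = 5 is arbitrary and only for illustration. The accuracies are computed on the training data, so they are optimistic.

```r
# Sketch only: fit two classifiers to the built-in iris data.
library(MASS)   # provides lda()
library(class)  # provides knn()

# Linear discriminant analysis using all four measurements
iris.lda <- lda(Species ~ ., data = iris)
lda.pred <- predict(iris.lda, iris)$class
lda.acc  <- mean(lda.pred == iris$Species)

# k-nearest neighbours (k = 5 is an arbitrary choice for illustration)
knn.pred <- knn(train = iris[, 1:4], test = iris[, 1:4],
                cl = iris$Species, k = 5)
knn.acc  <- mean(knn.pred == iris$Species)

# Resubstitution accuracy (optimistic, since the training data are reused)
c(lda = lda.acc, knn = knn.acc)
```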
C) The measurements from the iris data set were made in centimeters, but
suppose a researcher wanted to compare the performance of their classifier
for measurements in both cm and inches. Remember 1 cm = 0.3937 inch
and create a new iris data set with measurements in inches.
Ans. One possible answer is shown below:
> # name the species column explicitly; otherwise data.frame()
> # generates the awkward column name "iris...5."
> irisINCHES <- data.frame(0.3937*iris[,1:4],Species=iris[,5])
> iris[1:4,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
> irisINCHES[1:4,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1      2.00787     1.37795      0.55118     0.07874  setosa
2      1.92913     1.18110      0.55118     0.07874  setosa
3      1.85039     1.25984      0.51181     0.07874  setosa
4      1.81102     1.22047      0.59055     0.07874  setosa
D) Use indexing to verify that the 77th plant (i.e. row 77) has petal length of
approximately 1.89 inches.
Ans. Two possible answers are shown below:
> iris[77,"Petal.Length"]*0.3937
[1] 1.88976
> irisINCHES[77,3]
[1] 1.88976
#2. {AFP data} Suppose alpha-fetoprotein (AFP) is a potential biomarker for liver
cancer and other cancer types. A researcher might be interested in AFP levels
before and after taking a new drug in one of four concentrations.
A) The example in section 2.7.2 of the manual provided a list of 20 AFP
levels before drug treatment. Use your own methods to enter a new
column of 20 AFP levels after drug treatment, then enter another column
with the difference between the pre- and post-treatment AFP levels.
Ans. One possible answer is shown below:
# simulate post-treatment AFP levels for 20 patients from the pre-treatment values
> AFP.after <- AFP.before - 1.2 + 0.2*rnorm(20)
> AFP.diff <- AFP.after - AFP.before
> afp.data <- data.frame(subject,gender,height,weight,BMI,
drug,AFP.before,AFP.after,AFP.diff)
> afp.data
B) Verify the storage mode of the data set afp.data. Verify the storage
mode of the variable drug. Verify the storage mode of the variable
gender. Convert the storage mode of drug to factor.
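Ans. One possible answer is shown below. (In R 4.0.0 and later,
data.frame() no longer converts character columns to factors by default,
so class(afp.data$gender) returns "character" unless
stringsAsFactors=TRUE was set.)
> class(afp.data)
[1] "data.frame"
> class(afp.data$drug)
[1] "numeric"
> class(afp.data$gender)
[1] "factor"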
> afp.data$drug <- as.factor(afp.data$drug)
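One pitfall worth knowing after converting a numeric variable to a factor: as.numeric() applied to a factor returns the internal level codes, not the original values. The sketch below uses a stand-alone drug vector built the same way as in the manual; the names drug.f, codes and doses are introduced here only for illustration.

```r
# Dose vector built as in the manual's AFP example
drug <- rep(seq(from = 0, to = 20, by = 5), times = 4)
drug.f <- as.factor(drug)

class(drug.f)     # "factor"
levels(drug.f)    # "0" "5" "10" "15" "20"

# as.numeric() on a factor returns the internal level codes 1..5
codes <- as.numeric(drug.f)

# Converting through character recovers the original doses
doses <- as.numeric(as.character(drug.f))
```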
C) Create a subset of the AFP data that only includes male patients with
BMI > 25.5 or weight > 180 lbs. How many men are included in the
data subset?
Ans. Six male patients are included in the subset. One example is shown:
> afp.subset <- afp.data[afp.data$gender=="male",]
> indx <- afp.subset$BMI > 25.5 | afp.subset$weight > 180
> afp.subset <- afp.subset[indx,]
> afp.subset
subject gender height weight BMI drug ...
2 2 male 69.15696 202.9318 29.82865 5 ...
3 3 male 69.35599 211.0632 30.84607 10 ...
5 5 male 71.44586 241.4526 33.25317 20 ...
7 7 male 68.21618 297.4155 44.93081 5 ...
8 8 male 69.77130 289.2935 41.77731 10 ...
10 10 male 66.95951 178.6660 28.01385 20 ...
D) Sort the entire data subset created in part C) by the BMI variable in
descending order. What is the row ordering of the sorted data subset?
Save the data subset as a comma separated value (.csv) text file.
Ans. The row order is: 7, 8, 5, 3, 2, 10. A possible solution is below:
> afp.subset <- afp.subset[order(afp.subset$BMI,
decreasing=TRUE),]
> afp.subset
subject gender height weight BMI drug ...
7 7 male 68.21618 297.4155 44.93081 5 ...
8 8 male 69.77130 289.2935 41.77731 10 ...
5 5 male 71.44586 241.4526 33.25317 20 ...
3 3 male 69.35599 211.0632 30.84607 10 ...
2 2 male 69.15696 202.9318 29.82865 5 ...
10 10 male 66.95951 178.6660 28.01385 20 ...
> write.csv(afp.subset,file="~/subset.csv")
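The same subset-sort-export steps can also be written with subset(). The sketch below is only an illustration on a small made-up data frame (the values in toy are hypothetical, not the manual's AFP data), and it writes to a temporary file rather than to the home directory.

```r
# Toy data frame with hypothetical values (not the manual's AFP data)
toy <- data.frame(subject = 1:6,
                  gender  = c("male", "male", "female", "male", "female", "male"),
                  weight  = c(150, 200, 190, 170, 120, 185),
                  BMI     = c(22.0, 29.5, 31.0, 26.1, 19.8, 24.9))

# subset() combines the gender filter and the BMI/weight condition
toy.sub <- subset(toy, gender == "male" & (BMI > 25.5 | weight > 180))

# Sort by decreasing BMI
toy.sub <- toy.sub[order(toy.sub$BMI, decreasing = TRUE), ]

# Write to a temporary file and read it back; row.names = FALSE
# avoids an extra unnamed column of row labels in the CSV
out.file <- tempfile(fileext = ".csv")
write.csv(toy.sub, file = out.file, row.names = FALSE)
check <- read.csv(out.file)
```

Note the parentheses around the "or" condition: & binds more tightly than |, so without them the gender filter would apply only to the BMI test.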
#3. {AE data} Doctors, epidemiologists and other researchers look at adverse events
to explore the symptoms and medical conditions affecting patients. A researcher
might choose to look for associations between adverse events and diet.
A) One of the adverse events in the data table is “Malaise”. Recode the AE
data table, such that all entries for “Malaise” read “Discomfort” instead.
Ans. Hint: convert the adverse event variable to a character variable first:
> AE$Adverse.Event <- as.character(AE$Adverse.Event)
> indx <- AE$Adverse.Event == "Malaise"
> AE$Adverse.Event <- replace(AE$Adverse.Event,indx,"Discomfort")
> AE$Adverse.Event <- as.factor(AE$Adverse.Event)
B) Look at the results of your recoded adverse events. How many different
types of adverse events are there? Look through their names. Do you see
any potential problems? Fix any problems that you might find.
Ans. Initially, there are 18 different types of adverse events. There appears to
be a typo; “Mylagia” should be “Myalgia”. After correction, there are 17
different types of adverse events.
> length(levels(AE$Adverse.Event))
[1] 18
> AE$Adverse.Event <- as.character(AE$Adverse.Event)
> indx <- AE$Adverse.Event == "Mylagia"
> AE$Adverse.Event <- replace(AE$Adverse.Event,indx,"Myalgia")
> AE$Adverse.Event <- as.factor(AE$Adverse.Event)
> length(levels(AE$Adverse.Event))
[1] 17
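An alternative to the character round trip is to rename factor levels directly. The sketch below uses a short made-up factor (ae is a hypothetical stand-in for AE$Adverse.Event); renaming a level to a name that already exists merges the two levels.

```r
# Hypothetical adverse-event factor for illustration
ae <- factor(c("Malaise", "Nausea", "Mylagia", "Malaise", "Myalgia"))
nlevels(ae)   # 4 levels before recoding

# Rename one level in place
levels(ae)[levels(ae) == "Malaise"] <- "Discomfort"

# Renaming a level to an existing level name merges the two
levels(ae)[levels(ae) == "Mylagia"] <- "Myalgia"

nlevels(ae)
table(ae)
```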
C) Create an adverse event table to examine the relationship between different
adverse event symptoms and their severities. Make sure the “Discomfort”
AE shows up in the table, instead of “Malaise”.
Ans. One possible solution is shown:
> attach(AE)
> AEtable <- table(Adverse.Event,Severity)
> AEtable
               Severity
Adverse.Event   Mild Moderate Severe
  Anemia           2        3      1
  Arthralgia       2        0      0
  Dimpling         1        0      0
  Discomfort       1        1      3
  Ecchymosis       0        2      1
  Elavated CH50    0        0      1
  Erythema         0        3      1
  Headache         1        5      0
  Induration       1        3      0
  Leukopenia       1        1      2
  Myalgia          2        0      1
  Nausea           4        0      1
  Nodule           0        1      0
  Pain             2        5      0
  Papule           0        3      0
  Swelling         1        2      1
  Tenderness       2        2      1
D) Search the help menus for the functions rowSums and colSums. Use these
functions to count up the number of patients with each adverse event and
the number of patients with mild, moderate and severe symptoms.
Ans. An example is shown below
> AEsymptoms <- rowSums(AEtable)
> AEsymptoms
Anemia Arthralgia Dimpling Discomfort ...
6 2 1 5 ...
> AEseverity <- colSums(AEtable)
> AEseverity
Mild Moderate Severe
20 31 13
E) Define a new variable AEmatrix by converting the AE table into a matrix.
Define two new matrix variables: LL = matrix(1,1,17) and RR = c(1,1,1).
Compute the products of LL by AEmatrix; AEmatrix by RR; and LL by
AEmatrix by RR. Do you notice anything?
Ans. The matrix product LL by AEmatrix is equal to the colSums(), AEmatrix
by RR is equal to the rowSums() and LL by AEmatrix by RR is equal to
the sample size n = 64. An example is shown below:
> AEmatrix <- as.matrix(AEtable)
> LL = matrix(1,1,17)
> RR = c(1,1,1)
> LL %*% AEmatrix
Severity
Mild Moderate Severe
[1,] 20 31 13
> AEmatrix %*% RR
Adverse.Event [,1]
Anemia 6
Arthralgia 2
Dimpling 1
Discomfort 5
Ecchymosis 3
Elavated CH50 1
Erythema 4
Headache 6
Induration 4
Leukopenia 4
Myalgia 3
Nausea 5
Nodule 1
Pain 7
Papule 3
Swelling 4
Tenderness 5
> LL %*% AEmatrix %*% RR
[,1]
[1,] 64
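The pattern generalizes: multiplying on the left by a row of ones adds up each column, and multiplying on the right by a vector of ones adds up each row. A minimal sketch on a small made-up matrix (M, ones.left and ones.right are names introduced here for illustration):

```r
# Small hypothetical 3 x 2 count matrix (filled column by column)
M <- matrix(c(2, 0, 1,
              3, 1, 4), nrow = 3, ncol = 2)

ones.left  <- matrix(1, nrow = 1, ncol = nrow(M))  # 1 x 3 row of ones
ones.right <- rep(1, ncol(M))                      # length-2 vector of ones

col.tot <- ones.left %*% M                   # 1 x 2: column totals
row.tot <- M %*% ones.right                  # 3 x 1: row totals
grand   <- ones.left %*% M %*% ones.right    # 1 x 1: grand total
```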
#4. {Fisher’s iris data} Sir Ronald A. Fisher famously used this set of iris flower data
as an example to test his new linear discriminant statistical model. Now, the iris
data set is used as a historical example for new statistical classification models.
A) Make a boxplot of all four measurements from Fisher’s iris data
Ans. An example is shown below:
> boxplot(iris[,1:4],main="Fisher's Iris Data",ylab="cm",
xlab="measurement",col="wheat")
B) Create a multi-panel figure with histograms of all four measurements. Do
you notice anything that could not be seen from the boxplot?
Ans. An example is shown below:
> par(mfrow=c(2,2))
> hist(iris[,1],main="Fisher's Iris Data -- Sepal Length",
ylab="count",xlab="Sepal Length (cm)",col="red")
> hist(iris[,2],main="Fisher's Iris Data -- Sepal Width",
ylab="count",xlab="Sepal Width (cm)",col="yellow")
> hist(iris[,3],main="Fisher's Iris Data -- Petal Length",
ylab="count",xlab="Petal Length (cm)",col="green")
> hist(iris[,4],main="Fisher's Iris Data -- Petal Width",
ylab="count",xlab="Petal Width (cm)",col="blue")
The boxplots didn’t show the bimodal distribution of petal length and
petal width, probably caused by differences among species.
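One quick way to check whether species differences could explain the two modes is to compute the petal means within each species. This is only a sketch using the built-in iris data; petal.len and petal.wid are names introduced here.

```r
# Mean petal measurements within each species (built-in iris data)
petal.len <- tapply(iris$Petal.Length, iris$Species, mean)
petal.wid <- tapply(iris$Petal.Width,  iris$Species, mean)

# Setosa sits well below the other two species on both measurements
round(rbind(Petal.Length = petal.len, Petal.Width = petal.wid), 2)
```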
C) Create a multi-panel figure with boxplots of all four measurements,
paneled by the three different species. Do you notice any differences
among species?
Ans. An example is shown below:
> par(mfrow=c(1,3))
> boxplot(iris[iris$Species=="setosa",1:4],
main="Fisher's Iris Data -- Setosa",ylab="cm",
xlab="measurement",col="wheat")
> boxplot(iris[iris$Species=="versicolor",1:4],
main="Fisher's Iris Data -- Versicolor",ylab="cm",
xlab="measurement",col="olivedrab")
> boxplot(iris[iris$Species=="virginica",1:4],
main="Fisher's Iris Data -- Virginica",ylab="cm",
xlab="measurement",col="grey")
Yes. Setosa has much shorter and narrower petals than versicolor and
virginica, and virginica tends to have the largest petals overall.
#5. {AFP data} Suppose alpha-fetoprotein (AFP) is a potential biomarker for liver
cancer and other cancer types. A researcher might be interested in AFP levels
before and after taking a new drug in one of four concentrations.
A) In section 3.2.1, the barplot() and arrows() commands were used to
create a barchart of mean(BMI) by gender with error bars. Install the
sciplot package library and use the bargraph.CI() command to
replicate that graph.
Ans. An example is shown below:
> library(sciplot)
> bargraph.CI(as.factor(afp.data$gender),afp.data$BMI,
col=c("pink","sky blue"),
main="Mean BMI by Gender",ylim=c(0,50),ylab="BMI")
> legend(x="topleft",legend=c("Female","Male"),
fill=c("pink","sky blue"))
B) Use the bargraph.CI() command to create a bar chart that compares AFP
difference over all five drug concentrations.
Ans. An example is shown below:
> bargraph.CI(as.factor(afp.data$drug),afp.data$AFP.diff,
col=rainbow(5),main="Mean AFP Difference by Drug",
ylim=c(0,-2),ylab="AFP difference",
xlab="Drug Concentration")
> legend(x="topleft",legend=seq(0,20,by=5),fill=rainbow(5),
title="Drug Concentration")
C) Create an interleaved bar chart that plots mean AFP difference by both
drug concentration and gender
Ans. An example is shown below:
> bargraph.CI(as.factor(afp.data$drug),afp.data$AFP.diff,
group=as.factor(afp.data$gender),
col=c("pink","sky blue"),
main="Mean AFP Difference by Drug and Gender",
ylim=c(0,-2),ylab="AFP difference",
xlab="Drug Concentration")
> legend(x="topleft",legend=c("Female","Male"),
fill=c("pink","sky blue"))
#6. {AE data} Doctors, epidemiologists and other researchers look at adverse events
to explore the symptoms and medical conditions affecting patients. A researcher
might choose to look for associations between adverse events and diet.
A) Create a histogram of Percent Body Fat (or your choice of continuous
response variable), then overlay a normal curve.
Ans. An example is shown below:
> norm.curve <- qnorm(seq(0,1,length=10000),
mean(AE$Percent.Body.Fat),
sd(AE$Percent.Body.Fat))
> hist(AE$Percent.Body.Fat,col="wheat",freq=FALSE,
xlab="Percent Body Fat")
> lines(density(norm.curve))
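A more direct way to overlay the normal curve is to evaluate dnorm() on a grid of x values. The sketch below is an alternative to the quantile-based approach above, not the manual's method; because the AE data are not available here, body.fat is a simulated stand-in with hypothetical mean and standard deviation.

```r
# Simulated stand-in for AE$Percent.Body.Fat (hypothetical values)
set.seed(1)
body.fat <- rnorm(64, mean = 25, sd = 6)

# Evaluate the fitted normal density on a grid spanning the data
x.grid <- seq(min(body.fat), max(body.fat), length.out = 200)
y.dens <- dnorm(x.grid, mean = mean(body.fat), sd = sd(body.fat))

# Draw to a temporary PDF so the sketch also runs non-interactively
pdf(tempfile(fileext = ".pdf"))
hist(body.fat, col = "wheat", freq = FALSE, xlab = "Percent Body Fat",
     main = "Histogram with fitted normal density")
lines(x.grid, y.dens, lwd = 2)
dev.off()
```

The option freq=FALSE matters in both versions: it puts the histogram on the density scale so that the curve and the bars are comparable.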
B) Install the lattice package and use the barchart() command to graph the
AEtable data table created for question #3. C) in the previous chapter.
What kind of plot is this? Add the appropriate figure legend.
Ans. The plot is a stacked bar chart, with stacked boxes representing the mild,
moderate and severe symptoms. An example is shown below:
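> library(lattice)
> # legend() belongs to base graphics and generally cannot annotate a
> # lattice (grid) plot, so the key is requested through auto.key
> sev.cols <- c("red","yellow","blue")
> barchart(AEtable,main="Bar Chart of Adverse Event by Severity",
par.settings=list(superpose.polygon=list(col=sev.cols)),
auto.key=list(space="right",title="Severity"))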
#7. {Nonparametric statistics} Search the help menus to find the command(s) for a
non-parametric statistical test analogous to the Student’s t-test (e.g. Mann-
Whitney U-test, Wilcoxon rank sum test, ...). Repeat at least one of the Student’s
t-test examples from section 4.1 with this non-parametric test.
Ans. An example is shown below:
> # Define a vector of % body fat data for men from AE data
> bfat.m <- AE[AE$Gender == "Male",6]
> # Define a vector of % body fat data for women from AE data
> bfat.f <- AE[AE$Gender == "Female",6]
> # Compute a two-sided Wilcoxon rank sum test with AE data
> wilcox.test(bfat.m,bfat.f,alternative="two.sided")
Wilcoxon rank sum test with continuity correction
data: bfat.m and bfat.f
W = 553, p-value = 0.5811
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(bfat.m, bfat.f, alt. = "two.sided") :
cannot compute exact p-value with ties
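The warning above appears because the body fat data contain tied values, so R falls back to a normal approximation. The sketch below compares the two tests on simulated continuous data (grp.a and grp.b are hypothetical groups), where ties are not expected and the exact test can be computed.

```r
set.seed(42)
grp.a <- rnorm(30, mean = 25, sd = 5)   # hypothetical group A
grp.b <- rnorm(30, mean = 27, sd = 5)   # hypothetical group B

tt <- t.test(grp.a, grp.b)                                  # Welch t-test
wt <- wilcox.test(grp.a, grp.b, alternative = "two.sided")  # rank sum test

# Compare the two p-values side by side
c(t.test = tt$p.value, wilcox = wt$p.value)
```

With tied data, passing exact=FALSE to wilcox.test() requests the normal approximation explicitly and avoids the warning.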
#8. {Linear models} Add a second predictor variable to the formula parameter of the
lm() procedure from the regression or ANOVA example in section 4.2 to create a
more complicated linear model. Use the AFP data.
Ans. An example of multiple regression is shown below:
> # Define afp.data data frame with stringsAsFactors FALSE
> afp.data <- data.frame(subject,gender,height,weight,BMI,drug,
AFP.before,AFP.after,AFP.diff,
stringsAsFactors=FALSE)
> # Call the lm() procedure to fit regression
> afp.reg <- lm(formula = AFP.diff ~ drug*BMI, data = afp.data)
> afp.reg
Call:
lm(formula = AFP.diff ~ drug * BMI, data = afp.data)
Coefficients:
(Intercept) drug BMI drug:BMI
-1.3568528 0.0123046 0.0049974 -0.0003010
> anova(afp.reg)
Analysis of Variance Table
Response: AFP.diff
Df Sum Sq Mean Sq F value Pr(>F)
drug 1 0.00863 0.00863 0.2017 0.6594
BMI 1 0.00384 0.00384 0.0897 0.7685
drug:BMI 1 0.00542 0.00542 0.1265 0.7267
Residuals 16 0.68512 0.04282
> summary(afp.reg)
Call:
lm(formula = AFP.diff ~ drug * BMI, data = afp.data)
Residuals:
Min 1Q Median 3Q Max
-0.26127 -0.12370 -0.01925 0.14384 0.40517
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.3568528 0.3473496 -3.906 0.00126 **
drug 0.0123046 0.0268771 0.458 0.65325
BMI 0.0049974 0.0107781 0.464 0.64913
drug:BMI -0.0003010 0.0008463 -0.356 0.72670
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2069 on 16 degrees of freedom
Multiple R-Squared: 0.02545, Adjusted R-squared: -0.1573
F-statistic: 0.1393 on 3 and 16 DF, p-value: 0.935
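In the formula above, drug*BMI is shorthand for the two main effects plus their interaction. The sketch below checks this on made-up data (the data frame toy and its values are hypothetical and only mimic the structure of afp.data).

```r
# Hypothetical predictors, used only to show how the formula expands
set.seed(7)
toy <- data.frame(drug = rep(seq(0, 20, by = 5), times = 4),
                  BMI  = runif(20, 20, 40))
toy$y <- -1.2 + 0.2 * rnorm(20)

# drug*BMI is shorthand for drug + BMI + drug:BMI
fit.star <- lm(y ~ drug * BMI, data = toy)
fit.long <- lm(y ~ drug + BMI + drug:BMI, data = toy)

names(coef(fit.star))
all.equal(coef(fit.star), coef(fit.long))
```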
#9. {Workflow scripting} Create a script to automate the creation, graphing and linear
model analysis of the AFP data. Use your previous results from questions #2, #5
and #8, if necessary.
Ans. An example is shown:
############### Import AFP data ########################
# generate a list of subject IDs, numbered from 1 to 20
subject <- 1:20
# create 10 entries for male subjects
males <- rep("male",10)
# create 10 entries for female subjects
females <- rep("female",10)
# combine male and female entries into one column vector
gender <- c(males,females)
# bind subjectID and gender columns together
afp.data <- cbind(subject,gender)
# generate 10 male and 10 female random normal heights
height <- as.numeric(c(rnorm(10,70,2.5),rnorm(10,64,2.2)))
# generate 10 male and 10 female random uniform weights
weight <- as.numeric(c(runif(10,155,320),runif(10,95,210)))
# compute body mass index (BMI) for 10 men and 10 women
BMI <- as.numeric((weight*703)/(height**2))
# enter five treatment levels of a new drug (ng/mL)
drug <- rep(x = seq(from = 0, to = 20, by = 5), times = 4)
# manually enter Alpha-fetoprotein (AFP) levels for 20 patients
AFP.before <- as.numeric(c(0.8,2.3,1.1,4.8,3.7,12.5,0.3,4.4,4.9,0.0,
1.8,2.4,23.6,8.9,0.7,3.3,3.1,0.5,2.7,4.5))
AFP.after <- AFP.before - 1.2 + 0.2*rnorm(20)
AFP.diff <- AFP.after - AFP.before
afp.data <- data.frame(subject,gender,height,weight,BMI,drug,
AFP.before,AFP.after,AFP.diff)
afp.data
attach(afp.data)
############### Run LM regression ########################
afp.reg <- lm(formula = AFP.diff ~ drug*BMI, data = afp.data)
afp.reg
pdf("Regression.pdf")
par(mfrow = c(3,1))
plot(drug,AFP.diff,ylab="difference",main="regression plot")
abline(coef=afp.reg$coefficients[c(1,2)])
plot(BMI,AFP.diff,ylab="difference",main="regression plot")
abline(coef=afp.reg$coefficients[c(1,3)])
plot(afp.reg$fitted.values,afp.reg$residuals,xlab="fitted",
ylab="residual",main="residual plot")
dev.off()
browseURL("Regression.pdf")
############### Convert drug to factor ###################
afp.data$drug <- as.factor(afp.data$drug)
############### Run LM ANOVA #############################
afp.aov <- lm(formula = AFP.diff ~ drug, data = afp.data)
afp.aov
afp.anova <- anova(afp.aov)
afp.summary <- summary(afp.aov)
############## Plot graphs ##############################
library(sciplot)
pdf("ANOVA.pdf")
main = "One-way ANOVA"
ylab = "AFP differences"
xlab = "drug concentrations"
colors = rainbow(5)
means = afp.aov$fitted.values[1:5]
names(means) = levels(afp.data$drug)
mp <- barplot(height =
means,main=main,xlab=xlab,ylab=ylab,col=colors,ylim=c(0,-2))
X0 <- X1 <- mp
Y0 <- means - afp.summary$sigma
Y1 <- means + afp.summary$sigma
arrows(X0,Y0,X1,Y1,code=3,angle=90)
dev.off()
browseURL("ANOVA.pdf")
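Because the script draws heights, weights and post-treatment AFP levels at random, every run produces different data and different model output. If reproducible results are wanted, one option is to call set.seed() before the first random draw; the sketch below shows the effect (the seed value 2024 is arbitrary).

```r
# Without a seed, each run of rnorm() gives different numbers.
# The seed value 2024 is arbitrary.
set.seed(2024)
first.run <- rnorm(20)

set.seed(2024)
second.run <- rnorm(20)

identical(first.run, second.run)   # same seed, same draws
```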
#10. {Function scripts} Create your own script to compute two new types of row
statistic (e.g. standard deviation and interquartile range) for a data frame or
matrix. Be creative, add graphics or a statistical test (e.g. linear regression).
Ans. An example is shown below:
# Define a function to compute row statistics with a for() loop
row.stats.loop <- function(x){
# Initialize vectors
row.sd <- row.IQR <- vector("numeric",length=nrow(x))
# Use a for() loop to compute the standard deviation and
# interquartile range for each row
for(i in 1:nrow(x)){
# as.numeric() lets the loop handle a data frame row as well as
# a matrix row
xi <- as.numeric(x[i,])
row.sd[i] <- sd(xi)
row.IQR[i] <- IQR(xi)}
# Perform a linear regression
row.reg <- lm(formula = row.sd ~ row.IQR)
# Create a list of output
output <- list()
output[["row sd"]] <- row.sd
output[["row IQR"]] <- row.IQR
output[["lm"]] <- row.reg
output[["anova"]] <- anova(row.reg)
output[["summary"]] <- summary(row.reg)
# Call the output list to report final results
output}
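For a numeric matrix, the row loop can also be replaced with apply(). The sketch below is an alternative, not part of the manual's solution; it uses a made-up matrix x and confirms that apply() gives the same standard deviations as an explicit loop.

```r
# Hypothetical 5 x 8 numeric matrix
set.seed(3)
x <- matrix(rnorm(40), nrow = 5, ncol = 8)

# apply() with MARGIN = 1 evaluates the function once per row
row.sd  <- apply(x, 1, sd)
row.IQR <- apply(x, 1, IQR)

# Same result as an explicit loop over rows
loop.sd <- numeric(nrow(x))
for (i in seq_len(nrow(x))) loop.sd[i] <- sd(x[i, ])
all.equal(row.sd, loop.sd)
```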
#11. Download the microarray dataset with the accession number “GDS10” from the
GEO website using the GEOquery package.
Ans. The following loads the library, downloads the dataset and converts it to an
ExpressionSet object:
library("GEOquery")
gds = getGEO("GDS10")
expset=GDS2eSet(gds, do.log2=TRUE)
A) Convert the data into three data frames, one for gene expression, one for
phenotypes and one for gene annotations
Ans. The following is an example script that will do this. Here we convert gds,
the output from getGEO() to an ExpressionSet object before converting
to the three data frames. We can do this directly from the getGEO() output
too (see the documentation for the GEOquery package on Bioconductor).
#Extract the expression matrix
X=exprs(expset)
#Extract the phenotypes
pheno.names=varLabels(expset)
> pheno.names
[1] "sample"        "tissue"        "strain"        "disease.state"
[5] "description"
phenotypes=data.frame(sample=expset$sample, tissue=expset$tissue,
strain=expset$strain, disease.state=expset$disease.state,
description=expset$description)
#Convert each column from factor to character type
for(i in 1:ncol(phenotypes))
phenotypes[,i]=as.character(phenotypes[,i])
#Extract the gene annotations
annot.columns= fvarLabels(expset)
> annot.columns
[1] "ID" "GB_ACC" "SPOT_ID"
annot.obj=featureData(expset)
annot=data.frame(id=annot.obj$ID, genbank.acc=annot.obj$GB_ACC,
spot.id=annot.obj$SPOT_ID)
B) Plot boxplots for each sample in one plot with different colors for each
sample. (Hint: use the stack() function and use a formula in the
boxplot() function. A vector of n colors can be obtained by using
rainbow(n))
Ans. The following is probably the easiest way to do this. You should look up
the help page for stack() to better understand how this works.
nsamp=ncol(X)
boxcol=rainbow(nsamp)
X.stack=stack(as.data.frame(X))
#Draw the boxplot
#Option las=3 makes the x axis labels vertical
boxplot(values~ind, data=X.stack, col=boxcol, las=3)
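To see what stack() produces before passing it to boxplot(), it can help to try it on a tiny example. The matrix X.toy below is hypothetical (3 genes by 2 samples) and stands in for the expression matrix X.

```r
# Hypothetical expression matrix: 3 genes (rows) x 2 samples (columns)
X.toy <- matrix(c(1.0, 2.0, 3.0,
                  4.0, 5.0, 6.0), nrow = 3,
                dimnames = list(NULL, c("S1", "S2")))

# stack() turns the columns into one long 'values' column plus
# an 'ind' factor recording which column each value came from
X.long <- stack(as.data.frame(X.toy))
X.long
```

The formula values~ind in the boxplot() call then draws one box per level of ind, that is, one box per sample.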
C) Compare the samples from the thymus and spleen for diabetic-resistant
mice and find the 10 most significant genes using the adjusted p-value.
Ans. This is a relatively lengthy script, but the explanation for each step can be
found here and in the manual.
#Find the samples that come from diabetic-resistant mice that
#originate from thymus
qt=which(phenotypes$disease.state=="diabetic-resistant" &
phenotypes$tissue=="thymus")
Xt=X[,qt]
#Find the samples that come from diabetic-resistant mice that
#originate from spleen
qs=which(phenotypes$disease.state=="diabetic-resistant" &
phenotypes$tissue=="spleen")
Xs=X[,qs]
#Compute the p-value and fold change for all genes
p.value=c()
fold.change=c()
for(i in 1:nrow(Xs))
{
#Find number of non-missing samples
n1=sum(!is.na(Xs[i,]))
n2=sum(!is.na(Xt[i,]))
if(n1 >= 2 && n2 >= 2)
{
tt.res=t.test(Xs[i,], Xt[i,])
p.value[i]=tt.res$p.value
#The log fold change is calculated by the
#difference in means between the two classes
fold.change[i]=tt.res$estimate[2]-
tt.res$estimate[1]
}else
{
p.value[i]=NA
fold.change[i]=NA
}
}
#Compute adjusted p-values
adj.p.value=p.adjust(p.value)
#Find the smallest 10 p-values
qo=order(adj.p.value)
sig.genes=qo[1:10]
> adj.p.value[sig.genes]
[1] 1.859514e-12 7.615543e-12 1.852015e-11
[4] 3.337001e-11 4.210158e-11 5.769339e-11
[7] 7.557780e-11 9.369532e-11 1.125353e-10
[10] 1.331595e-10
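Note that p.adjust() uses the Holm method when no method is given, as in the script above. The sketch below compares Holm with the Benjamini-Hochberg adjustment on three made-up p-values (p.raw is hypothetical); which method is appropriate depends on whether family-wise error or false discovery rate is to be controlled.

```r
p.raw <- c(0.01, 0.02, 0.03)   # hypothetical raw p-values

# Default method is Holm (controls the family-wise error rate)
p.holm <- p.adjust(p.raw)                  # same as method = "holm"

# Benjamini-Hochberg controls the false discovery rate instead
p.bh   <- p.adjust(p.raw, method = "BH")

rbind(raw = p.raw, holm = p.holm, BH = p.bh)
# holm: 0.03 0.04 0.04
# BH:   0.03 0.03 0.03
```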
D) Write the gene annotations, p-value, adjusted p-value and expressions in
all the samples for these 10 genes to a CSV file.
Ans. An example is shown below:
d=data.frame(annot[sig.genes,], p.value=
p.value[sig.genes], adj.p.value=adj.p.value[sig.genes],
X[sig.genes,])
write.csv(d, file="report.csv", row.names=FALSE)

More Related Content

Similar to Appendix: Crash course in R and BioConductor

Metaheuristic Tuning of Type-II Fuzzy Inference System for Data Mining
Metaheuristic Tuning of Type-II Fuzzy Inference System for Data MiningMetaheuristic Tuning of Type-II Fuzzy Inference System for Data Mining
Metaheuristic Tuning of Type-II Fuzzy Inference System for Data MiningVarun Ojha
 
Methods of Unsupervised Learning (Article 10 - Practical Exercises)
Methods of Unsupervised Learning (Article 10 - Practical Exercises)Methods of Unsupervised Learning (Article 10 - Practical Exercises)
Methods of Unsupervised Learning (Article 10 - Practical Exercises)Theodore Grammatikopoulos
 
Logistic Regression in Case-Control Study
Logistic Regression in Case-Control StudyLogistic Regression in Case-Control Study
Logistic Regression in Case-Control StudySatish Gupta
 
Chapter3 biostatistics by Dr Ahmed Hussein
Chapter3 biostatistics by Dr Ahmed HusseinChapter3 biostatistics by Dr Ahmed Hussein
Chapter3 biostatistics by Dr Ahmed HusseinDr Ghaiath Hussein
 
Linear Modeling Survival Analysis Statistics Assignment Help
Linear Modeling Survival Analysis Statistics Assignment HelpLinear Modeling Survival Analysis Statistics Assignment Help
Linear Modeling Survival Analysis Statistics Assignment HelpStatistics Assignment Experts
 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_fariaPaulo Faria
 
1 FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx
1  FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx1  FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx
1 FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docxmercysuttle
 
Classification of Heart Diseases Patients using Data Mining Techniques
Classification of Heart Diseases Patients using Data Mining TechniquesClassification of Heart Diseases Patients using Data Mining Techniques
Classification of Heart Diseases Patients using Data Mining TechniquesLovely Professional University
 
Multiple regression in spss
Multiple regression in spssMultiple regression in spss
Multiple regression in spssDr. Ravneet Kaur
 
1 Useful Hints on Assignment 5 Exercise 1 (Chapter
1  Useful Hints on Assignment 5 Exercise 1 (Chapter1  Useful Hints on Assignment 5 Exercise 1 (Chapter
1 Useful Hints on Assignment 5 Exercise 1 (ChapterMartineMccracken314
 
1 Useful Hints on Assignment 5 Exercise 1 (Chapter
1  Useful Hints on Assignment 5 Exercise 1 (Chapter1  Useful Hints on Assignment 5 Exercise 1 (Chapter
1 Useful Hints on Assignment 5 Exercise 1 (ChapterAbbyWhyte974
 
ECN 425 Introduction to Econometrics Alvin Murphy .docx
ECN 425 Introduction to Econometrics Alvin Murphy      .docxECN 425 Introduction to Econometrics Alvin Murphy      .docx
ECN 425 Introduction to Econometrics Alvin Murphy .docxtidwellveronique
 
Sparse Representation for Fetal QRS Detection in Abdominal ECG Recordings
Sparse Representation for Fetal QRS Detection in Abdominal ECG RecordingsSparse Representation for Fetal QRS Detection in Abdominal ECG Recordings
Sparse Representation for Fetal QRS Detection in Abdominal ECG RecordingsRiccardo Bernardini
 
2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin 2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin NUI Galway
 
USING CUCKOO ALGORITHM FOR ESTIMATING TWO GLSD PARAMETERS AND COMPARING IT WI...
USING CUCKOO ALGORITHM FOR ESTIMATING TWO GLSD PARAMETERS AND COMPARING IT WI...USING CUCKOO ALGORITHM FOR ESTIMATING TWO GLSD PARAMETERS AND COMPARING IT WI...
USING CUCKOO ALGORITHM FOR ESTIMATING TWO GLSD PARAMETERS AND COMPARING IT WI...AIRCC Publishing Corporation
 
Using Cuckoo Algorithm for Estimating Two GLSD Parameters and Comparing it wi...
Using Cuckoo Algorithm for Estimating Two GLSD Parameters and Comparing it wi...Using Cuckoo Algorithm for Estimating Two GLSD Parameters and Comparing it wi...
Using Cuckoo Algorithm for Estimating Two GLSD Parameters and Comparing it wi...AIRCC Publishing Corporation
 

Similar to Appendix: Crash course in R and BioConductor (20)

Metaheuristic Tuning of Type-II Fuzzy Inference System for Data Mining
Metaheuristic Tuning of Type-II Fuzzy Inference System for Data MiningMetaheuristic Tuning of Type-II Fuzzy Inference System for Data Mining
Metaheuristic Tuning of Type-II Fuzzy Inference System for Data Mining
 
Methods of Unsupervised Learning (Article 10 - Practical Exercises)
Methods of Unsupervised Learning (Article 10 - Practical Exercises)Methods of Unsupervised Learning (Article 10 - Practical Exercises)
Methods of Unsupervised Learning (Article 10 - Practical Exercises)
 
Statistics Assignment Help
Statistics Assignment HelpStatistics Assignment Help
Statistics Assignment Help
 
Logistic Regression in Case-Control Study
Logistic Regression in Case-Control StudyLogistic Regression in Case-Control Study
Logistic Regression in Case-Control Study
 
Lab manual_statistik
Lab manual_statistikLab manual_statistik
Lab manual_statistik
 
Chapter3 biostatistics by Dr Ahmed Hussein
Chapter3 biostatistics by Dr Ahmed HusseinChapter3 biostatistics by Dr Ahmed Hussein
Chapter3 biostatistics by Dr Ahmed Hussein
 
Linear Modeling Survival Analysis Statistics Assignment Help
Linear Modeling Survival Analysis Statistics Assignment HelpLinear Modeling Survival Analysis Statistics Assignment Help
Linear Modeling Survival Analysis Statistics Assignment Help
 
ppt
pptppt
ppt
 
Longintro
LongintroLongintro
Longintro
 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria
 
1 FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx
1  FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx1  FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx
1 FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx
 
Classification of Heart Diseases Patients using Data Mining Techniques
Classification of Heart Diseases Patients using Data Mining TechniquesClassification of Heart Diseases Patients using Data Mining Techniques
Classification of Heart Diseases Patients using Data Mining Techniques
 
Multiple regression in spss
Multiple regression in spssMultiple regression in spss
Multiple regression in spss
 
1 Useful Hints on Assignment 5 Exercise 1 (Chapter
1  Useful Hints on Assignment 5 Exercise 1 (Chapter1  Useful Hints on Assignment 5 Exercise 1 (Chapter
1 Useful Hints on Assignment 5 Exercise 1 (Chapter
 
1 Useful Hints on Assignment 5 Exercise 1 (Chapter
1  Useful Hints on Assignment 5 Exercise 1 (Chapter1  Useful Hints on Assignment 5 Exercise 1 (Chapter
1 Useful Hints on Assignment 5 Exercise 1 (Chapter
 
ECN 425 Introduction to Econometrics Alvin Murphy .docx
ECN 425 Introduction to Econometrics Alvin Murphy      .docxECN 425 Introduction to Econometrics Alvin Murphy      .docx
ECN 425 Introduction to Econometrics Alvin Murphy .docx
 
Sparse Representation for Fetal QRS Detection in Abdominal ECG Recordings
Sparse Representation for Fetal QRS Detection in Abdominal ECG RecordingsSparse Representation for Fetal QRS Detection in Abdominal ECG Recordings
Sparse Representation for Fetal QRS Detection in Abdominal ECG Recordings
 
2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin 2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin
 
USING CUCKOO ALGORITHM FOR ESTIMATING TWO GLSD PARAMETERS AND COMPARING IT WI...
USING CUCKOO ALGORITHM FOR ESTIMATING TWO GLSD PARAMETERS AND COMPARING IT WI...USING CUCKOO ALGORITHM FOR ESTIMATING TWO GLSD PARAMETERS AND COMPARING IT WI...
USING CUCKOO ALGORITHM FOR ESTIMATING TWO GLSD PARAMETERS AND COMPARING IT WI...
 
Using Cuckoo Algorithm for Estimating Two GLSD Parameters and Comparing it wi...
Using Cuckoo Algorithm for Estimating Two GLSD Parameters and Comparing it wi...Using Cuckoo Algorithm for Estimating Two GLSD Parameters and Comparing it wi...
Using Cuckoo Algorithm for Estimating Two GLSD Parameters and Comparing it wi...
 

Appendix: Crash course in R and BioConductor

  • 1. Training Manual Appendix
Crash Course: R and BioConductor
Jeff Skinner, M.S.
Sudhir Varma, Ph.D.
Bioinformatics and Computational Biosciences Branch (BCBB)
NIH/NIAID/OD/OSMO/OCICB
http://bioinformatics.niaid.nih.gov
ScienceApps@niaid.nih.gov
  • 2. Appendix
Solutions to Sample Problems for Students
#1. {Fisher’s iris data} Sir Ronald A. Fisher famously used this set of iris flower data as an example to test his new linear discriminant statistical model. Now, the iris data set is used as a historical example for new statistical classification models.
A) Search the help menu for the keyword “linear discriminant”, then report the names of the functions and packages you find.
Ans. > help.search("linear discriminant") returns results for the functions lda() and predict.lda() from the MASS package library.
B) Search the help menus or a search engine for additional classification models that could be tested with the iris data.
Ans. Any results are OK, but two examples are the knn() function from the class package library and the randomForest() function from the randomForest package library.
C) The measurements from the iris data set were made in centimeters, but suppose a researcher wanted to compare the performance of their classifier for measurements in both cm and inches. Remember 1 cm = 0.3937 inch and create a new iris data set with measurements in inches.
Ans. One possible answer is shown below:
> irisINCHES <- data.frame(0.3937*iris[,1:4],iris[,5])
> iris[1:4,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
> irisINCHES[1:4,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width iris...5.
1      2.00787     1.37795      0.55118     0.07874    setosa
2      1.92913     1.18110      0.55118     0.07874    setosa
3      1.85039     1.25984      0.51181     0.07874    setosa
4      1.81102     1.22047      0.59055     0.07874    setosa
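The `iris...5.` column name in the output above appears because data.frame() has to invent a name for the unnamed expression iris[,5]. A minimal alternative sketch, assuming it is acceptable to copy the data frame first (irisIN is a hypothetical name, not one used in the manual), rescales the numeric columns in place so the Species label is kept:

```r
# Copy the built-in iris data, then rescale only the four numeric columns.
irisIN <- iris
irisIN[, 1:4] <- irisIN[, 1:4] * 0.3937   # 1 cm = 0.3937 inch

names(irisIN)                 # column names are unchanged, including "Species"
irisIN[77, "Petal.Length"]    # petal length of plant 77, in inches
```

Either approach gives the same numeric values; the difference is only in the column names of the result.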
  • 3.
D) Use indexing to verify that the 77th plant (i.e. row 77) has petal length of approximately 1.89 inches.
Ans. Two possible answers are shown below:
> iris[77,"Petal.Length"]*0.3937
[1] 1.88976
> irisINCHES[77,3]
[1] 1.88976
#2. {AFP data} Suppose alpha-fetoprotein (AFP) is a potential biomarker for liver cancer and other cancer types. A researcher might be interested in AFP levels before and after taking a new drug in one of four concentrations.
A) The example in section 2.7.2 of the manual provided a list of 20 AFP levels before drug treatment. Use your own methods to enter a new column of 20 AFP levels after drug treatment, then enter another column with the difference between the pre- and post-treatment AFP levels.
Ans. One possible answer is shown below:
# simulate post-treatment Alpha-fetoprotein (AFP) levels for the 20 patients
> AFP.after <- AFP.before - 1.2 + 0.2*rnorm(20)
> AFP.diff <- AFP.after - AFP.before
> afp.data <- data.frame(subject,gender,height,weight,BMI,
    drug,AFP.before,AFP.after,AFP.diff)
> afp.data
B) Verify the storage mode of the data set afp.data. Verify the storage mode of the variable drug. Verify the storage mode of the variable gender. Convert the storage mode of drug to factor.
Ans. One possible answer is shown below:
> class(afp.data)
[1] "data.frame"
> class(afp.data$drug)
[1] "numeric"
> class(afp.data$gender)
[1] "factor"
> afp.data$drug <- as.factor(afp.data$drug)
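Part B asks about the storage mode, while the answer reports the class. In R these are related but not identical. The sketch below uses a small made-up data frame (toy is a hypothetical name) to show how class(), typeof() and storage.mode() differ, particularly for a factor column:

```r
# A tiny stand-in for afp.data with one numeric and one factor column.
toy <- data.frame(drug = c(0, 5, 10),
                  gender = factor(c("male", "female", "male")))

class(toy)                # "data.frame"
typeof(toy)               # "list": a data frame is stored as a list of columns
class(toy$gender)         # "factor"
storage.mode(toy$gender)  # "integer": factors are stored as integer codes
class(toy$drug)           # "numeric"

# Convert the numeric dose column to a factor, as in the answer above.
toy$drug <- as.factor(toy$drug)
levels(toy$drug)          # "0" "5" "10"
```

For everyday checks class() is usually what is wanted; typeof() and storage.mode() describe the underlying representation.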
  • 4.
C) Create a subset of the AFP data that only includes male patients with BMI > 25.5 or weight > 180 lbs. How many men are included in the data subset?
Ans. Six male patients are included in the subset. One example is shown:
> afp.subset <- afp.data[afp.data$gender=="male",]
> indx <- afp.subset$BMI > 25.5 | afp.subset$weight > 180
> afp.subset <- afp.subset[indx,]
> afp.subset
   subject gender   height   weight      BMI drug ...
2        2   male 69.15696 202.9318 29.82865    5 ...
3        3   male 69.35599 211.0632 30.84607   10 ...
5        5   male 71.44586 241.4526 33.25317   20 ...
7        7   male 68.21618 297.4155 44.93081    5 ...
8        8   male 69.77130 289.2935 41.77731   10 ...
10      10   male 66.95951 178.6660 28.01385   20 ...
D) Sort the entire data subset created in part C) by the BMI variable in descending order. What is the row ordering of the sorted data subset? Save the data subset as a comma separated value (.csv) text file.
Ans. The row order is: 7, 8, 5, 3, 2, 10. A possible solution is below:
> afp.subset <- afp.subset[order(afp.subset$BMI, decreasing=TRUE),]
> afp.subset
   subject gender   height   weight      BMI drug ...
7        7   male 68.21618 297.4155 44.93081    5 ...
8        8   male 69.77130 289.2935 41.77731   10 ...
5        5   male 71.44586 241.4526 33.25317   20 ...
3        3   male 69.35599 211.0632 30.84607   10 ...
2        2   male 69.15696 202.9318 29.82865    5 ...
10      10   male 66.95951 178.6660 28.01385   20 ...
> write.csv(afp.subset,file="~/subset.csv")
#3. {AE data} Doctors, epidemiologists and other researchers look at adverse events to explore the symptoms and medical conditions affecting patients. A researcher might choose to look for associations between adverse events and diet.
A) One of the adverse events in the data table is “Malaise”. Recode the AE data table, such that all entries for “Malaise” read “Discomfort” instead.
Ans. Hint: you need to convert the adverse event variable to a character variable
> AE$Adverse.Event <- as.character(AE$Adverse.Event)
> indx <- AE$Adverse.Event == "Malaise"
> AE$Adverse.Event <- replace(AE$Adverse.Event,indx,"Discomfort")
> AE$Adverse.Event <- as.factor(AE$Adverse.Event)
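The character round trip above works; another option is to rename the factor level directly. The sketch below is only an illustration on a made-up factor (ae is a hypothetical name and the values are not taken from the AE data):

```r
# A toy factor standing in for AE$Adverse.Event.
ae <- factor(c("Malaise", "Nausea", "Malaise", "Pain"))

# Rename one level in place; every observation with that level follows along.
levels(ae)[levels(ae) == "Malaise"] <- "Discomfort"

levels(ae)   # "Discomfort" "Nausea" "Pain"
table(ae)    # counts per level after recoding
```

Renaming a level this way avoids the conversion to character and back, at the cost of being a little less obvious to readers who are new to factors.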
  • 5.
B) Look at the results of your recoded adverse events. How many different types of adverse events are there? Look through their names. Do you see any potential problems? Fix any problems that you might find.
Ans. Initially, there are 18 different types of adverse events. There appears to be a typo; “Mylagia” should be “Myalgia”. After correction, there are 17 different types of adverse events.
> length(levels(AE$Adverse.Event))
[1] 18
> AE$Adverse.Event <- as.character(AE$Adverse.Event)
> indx <- AE$Adverse.Event == "Mylagia"
> AE$Adverse.Event <- replace(AE$Adverse.Event,indx,"Myalgia")
> AE$Adverse.Event <- as.factor(AE$Adverse.Event)
> length(levels(AE$Adverse.Event))
[1] 17
C) Create an adverse event table to examine the relationship between different adverse event symptoms and their severities. Make sure the “Discomfort” AE shows up in the table, instead of “Malaise”.
Ans. One possible solution is shown:
> attach(AE)
> AEtable <- table(Adverse.Event,Severity)
> AEtable
               Severity
Adverse.Event   Mild Moderate Severe
  Anemia           2        3      1
  Arthralgia       2        0      0
  Dimpling         1        0      0
  Discomfort       1        1      3
  Ecchymosis       0        2      1
  Elavated CH50    0        0      1
  Erythema         0        3      1
  Headache         1        5      0
  Induration       1        3      0
  Leukopenia       1        1      2
  Myalgia          2        0      1
  Nausea           4        0      1
  Nodule           0        1      0
  Pain             2        5      0
  Papule           0        3      0
  Swelling         1        2      1
  Tenderness       2        2      1
  • 6.
D) Search the help menus for the functions rowSums and colSums. Use these functions to count up the number of patients with each adverse event and the number of patients with mild, moderate and severe symptoms.
Ans. An example is shown below:
> AEsymptoms <- rowSums(AEtable)
> AEsymptoms
    Anemia Arthralgia   Dimpling Discomfort ...
         6          2          1          5 ...
> AEseverity <- colSums(AEtable)
> AEseverity
    Mild Moderate   Severe
      20       31       13
E) Define a new variable AEmatrix by converting the AE table into a matrix. Define two new matrix variables: LL = matrix(1,1,17) and RR = c(1,1,1). Compute the products of LL by AEmatrix; AEmatrix by RR; and LL by AEmatrix by RR. Do you notice anything?
Ans. The matrix product LL by AEmatrix is equal to the colSums(), AEmatrix by RR is equal to the rowSums() and LL by AEmatrix by RR is equal to the sample size n = 64. An example is shown below:
> LL = matrix(1,1,17)
> RR = c(1,1,1)
> LL %*% AEmatrix
      Severity
       Mild Moderate Severe
  [1,]   20       31     13
> AEmatrix %*% RR
Adverse.Event   [,1]
  Anemia           6
  Arthralgia       2
  Dimpling         1
  Discomfort       5
  Ecchymosis       3
  Elavated CH50    1
  Erythema         4
  Headache         6
  Induration       4
  Leukopenia       4
  Myalgia          3
  Nausea           5
  Nodule           1
  Pain             7
  Papule           3
  Swelling         4
  Tenderness       5
> LL %*% AEmatrix %*% RR
     [,1]
[1,]   64
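The pattern in part E holds for any numeric matrix: multiplying by a vector of ones on the left adds up each column, and on the right adds up each row. A small sketch with a made-up 3 x 2 matrix (M, left and right are hypothetical names, and the values are not from the AE data) makes this easy to check by hand:

```r
# Toy 3 x 2 matrix; columns are filled top to bottom with 1..6.
M <- matrix(1:6, nrow = 3, ncol = 2)

left  <- matrix(1, 1, nrow(M))   # 1 x 3 row of ones
right <- rep(1, ncol(M))         # vector of two ones

left %*% M             # column totals, same values as colSums(M)
M %*% right            # row totals, same values as rowSums(M)
left %*% M %*% right   # grand total, same value as sum(M)
```

In practice rowSums() and colSums() are the clearer choice; the matrix products are mainly useful for seeing why the totals agree.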
  • 7.
#4. {Fisher’s iris data} Sir Ronald A. Fisher famously used this set of iris flower data as an example to test his new linear discriminant statistical model. Now, the iris data set is used as a historical example for new statistical classification models.
A) Make a boxplot of all four measurements from Fisher’s iris data.
Ans. An example is shown below:
> boxplot(iris[,1:4],main="Fisher's Iris Data",ylab="cm",
    xlab="measurement",col="wheat")
  • 8.
B) Create a multi-panel figure with histograms of all four measurements. Do you notice anything that could not be seen from the boxplot?
Ans. An example is shown below:
> par(mfrow=c(2,2))
> hist(iris[,1],main="Fisher's Iris Data -- Sepal Length",
    ylab="count",xlab="Sepal Length (cm)",col="red")
> hist(iris[,2],main="Fisher's Iris Data -- Sepal Width",
    ylab="count",xlab="Sepal Width (cm)",col="yellow")
> hist(iris[,3],main="Fisher's Iris Data -- Petal Length",
    ylab="count",xlab="Petal Length (cm)",col="green")
> hist(iris[,4],main="Fisher's Iris Data -- Petal Width",
    ylab="count",xlab="Petal Width (cm)",col="blue")
The boxplots didn’t show the bimodal distribution of petal length and petal width, probably caused by differences among species.
  • 9.
C) Create a multi-panel figure with boxplots of all four measurements, paneled by the three different species. Do you notice any differences among species?
Ans. An example is shown below:
> par(mfrow=c(1,3))
> boxplot(iris[iris$Species=="setosa",1:4],
    main="Fisher's Iris Data -- Setosa",ylab="cm",
    xlab="measurement",col="wheat")
> boxplot(iris[iris$Species=="versicolor",1:4],
    main="Fisher's Iris Data -- Versicolor",ylab="cm",
    xlab="measurement",col="olivedrab")
> boxplot(iris[iris$Species=="virginica",1:4],
    main="Fisher's Iris Data -- Virginica",ylab="cm",
    xlab="measurement",col="grey")
Yes. There are big differences among the three species.
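The differences among species seen in the boxplots can also be put into numbers. As a rough sketch (species.medians is a hypothetical name, and the median is just one reasonable summary), aggregate() computes the median of each measurement within each species:

```r
# Median of every numeric column, split by Species.
species.medians <- aggregate(. ~ Species, data = iris, FUN = median)
species.medians

# Petal length increases from setosa to versicolor to virginica.
species.medians$Petal.Length
```

The table has one row per species, which makes it easy to compare against the centre lines of the three boxplot panels.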
  • 10.
#5. {AFP data} Suppose alpha-fetoprotein (AFP) is a potential biomarker for liver cancer and other cancer types. A researcher might be interested in AFP levels before and after taking a new drug in one of four concentrations.
A) In section 3.2.1, the barplot() and arrows() commands were used to create a barchart of mean(BMI) by gender with error bars. Install the sciplot package library and use the bargraph.CI() command to replicate that graph.
Ans. An example is shown below:
> library(sciplot)
> bargraph.CI(as.factor(afp.data$gender),afp.data$BMI,
    col=c("pink","sky blue"),
    main="Mean BMI by Gender",ylim=c(0,50),ylab="BMI")
> legend(x="topleft",legend=c("Female","Male"),
    fill=c("pink","sky blue"))
  • 11.
B) Use the bargraph.CI() command to create a bar chart that compares AFP difference over all five drug concentrations.
Ans. An example is shown below:
> bargraph.CI(as.factor(afp.data$drug),afp.data$AFP.diff,
    col=rainbow(5),main="Mean AFP Difference by Drug",
    ylim=c(0,-2),ylab="AFP difference",
    xlab="Drug Concentration")
> legend(x="topleft",legend=seq(0,20,by=5),fill=rainbow(5),
    title="Drug Concentration")
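bargraph.CI() summarises each group before drawing it. To see what kind of numbers sit behind such a chart, the sketch below computes group means and standard errors by hand with base R. The data are simulated in the same spirit as the manual's AFP example; drug, afp.diff, grp.mean and grp.se are stand-in names, and the exact error-bar definition used by sciplot should be checked in its documentation.

```r
set.seed(1)  # make the simulated values reproducible

# Five dose levels repeated four times, and a simulated AFP difference.
drug <- factor(rep(seq(0, 20, by = 5), times = 4))
afp.diff <- -1.2 + 0.2 * rnorm(20)

# Mean and standard error of the mean within each dose group.
grp.mean <- tapply(afp.diff, drug, mean)
grp.se   <- tapply(afp.diff, drug, function(v) sd(v) / sqrt(length(v)))

cbind(mean = grp.mean, se = grp.se)
```

These are the quantities that the barplot() and arrows() approach from section 3.2.1 would need as input.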
  • 12.
C) Create an interleaved bar chart that plots mean AFP difference by both drug concentration and gender.
Ans. An example is shown below:
> bargraph.CI(as.factor(afp.data$drug),afp.data$AFP.diff,
    group=as.factor(afp.data$gender),
    col=c("pink","sky blue"),
    main="Mean AFP Difference by Drug and Gender",
    ylim=c(0,-2),ylab="AFP difference",
    xlab="Drug Concentration")
> legend(x="topleft",legend=c("Female","Male"),
    fill=c("pink","sky blue"))
#6. {AE data} Doctors, epidemiologists and other researchers look at adverse events to explore the symptoms and medical conditions affecting patients. A researcher might choose to look for associations between adverse events and diet.
A) Create a histogram of Percent Body Fat (or your choice of continuous response variable), then overlay a normal curve.
Ans. An example is shown below:
  • 13.
> norm.curve <- qnorm(seq(0,1,length=10000),
    mean(AE$Percent.Body.Fat),
    sd(AE$Percent.Body.Fat))
> hist(AE$Percent.Body.Fat,col="wheat",freq=FALSE,
    xlab="Percent Body Fat")
> lines(density(norm.curve))
B) Install the lattice package and use the barchart() command to graph the AEtable data table created for question #3 C) in the previous chapter. What kind of plot is this? Add the appropriate figure legend.
Ans. The plot is a stacked bar chart, with stacked boxes representing the mild, moderate and severe symptoms. An example is shown below:
  • 14.
> library(lattice)
> # legend() is a base-graphics command and does not draw on lattice plots,
> # so the key is requested through auto.key and the colors through par.settings
> barchart(AEtable,main="Bar Chart of Adverse Event by Severity",
    par.settings=list(superpose.polygon=list(col=c("red","yellow","blue"))),
    auto.key=list(space="right",title="Severity"))
#7. {Nonparametric statistics} Search the help menus to find the command(s) for a non-parametric statistical test analogous to the Student’s t-test (e.g. Mann-Whitney U-test, Wilcoxon rank sum test, ...). Repeat at least one of the Student’s t-test examples from section 4.1 with this non-parametric test.
Ans. An example is shown below:
> # Define a vector of % body fat data for men from AE data
> bfat.m <- AE[AE$Gender == "Male",6]
> # Define a vector of % body fat data for women from AE data
> bfat.f <- AE[AE$Gender == "Female",6]
> # Compute a two-sided Wilcoxon rank sum test with AE data
> wilcox.test(bfat.m,bfat.f,alternative="two.sided")

        Wilcoxon rank sum test with continuity correction

data:  bfat.m and bfat.f
W = 553, p-value = 0.5811
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(bfat.m, bfat.f, alt. = "two.sided") :
  cannot compute exact p-value with ties
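The warning at the end of the output appears because an exact p-value cannot be computed when the data contain tied values, so R falls back to a normal approximation. One way to make that choice explicit is to set exact = FALSE. The sketch below uses two made-up vectors (a and b are hypothetical, not the body fat data) that share a tied value:

```r
# Two small made-up samples; the value 24.0 occurs in both, creating ties.
a <- c(21.5, 24.0, 24.0, 27.3, 30.1)
b <- c(24.0, 26.2, 28.8, 31.0, 33.4)

# Request the normal approximation directly, so no exact p-value is attempted.
res <- wilcox.test(a, b, alternative = "two.sided", exact = FALSE)

res$method      # names the test that was actually run
res$statistic   # the W statistic
res$p.value     # approximate two-sided p-value
```

The test itself is unchanged; the argument only documents that the approximation is intended.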
  • 15.
#8. {Linear models} Add a second predictor variable to the formula parameter of the lm() procedure from the regression or ANOVA example in section 4.2 to create a more complicated linear model. Use the AFP data.
Ans. An example of multiple regression is shown below:
> # Define afp.data data frame with stringsAsFactors FALSE
> afp.data <- data.frame(subject,gender,height,weight,BMI,drug,
    AFP.before,AFP.after,AFP.diff,
    stringsAsFactors=FALSE)
> # Call the lm() procedure to fit regression
> afp.reg <- lm(formula = AFP.diff ~ drug*BMI, data = afp.data)
> afp.reg

Call:
lm(formula = AFP.diff ~ drug * BMI, data = afp.data)

Coefficients:
(Intercept)         drug          BMI     drug:BMI
 -1.3568528    0.0123046    0.0049974   -0.0003010

> anova(afp.reg)
Analysis of Variance Table

Response: AFP.diff
          Df  Sum Sq Mean Sq F value Pr(>F)
drug       1 0.00863 0.00863  0.2017 0.6594
BMI        1 0.00384 0.00384  0.0897 0.7685
drug:BMI   1 0.00542 0.00542  0.1265 0.7267
Residuals 16 0.68512 0.04282

> summary(afp.reg)

Call:
lm(formula = AFP.diff ~ drug * BMI, data = afp.data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.26127 -0.12370 -0.01925  0.14384  0.40517

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.3568528  0.3473496  -3.906  0.00126 **
drug         0.0123046  0.0268771   0.458  0.65325
BMI          0.0049974  0.0107781   0.464  0.64913
drug:BMI    -0.0003010  0.0008463  -0.356  0.72670
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2069 on 16 degrees of freedom
Multiple R-Squared: 0.02545, Adjusted R-squared: -0.1573
F-statistic: 0.1393 on 3 and 16 DF, p-value: 0.935
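The printed summary is convenient to read, but the same numbers can be pulled out as a matrix for further use. The sketch below fits the same kind of model to simulated data (sim, fit and coef.table are hypothetical names, and the values will not match the output above) and extracts the coefficient table:

```r
set.seed(42)  # make the simulated values reproducible

# Simulated stand-in for afp.data: five dose levels and a random BMI.
sim <- data.frame(drug = rep(seq(0, 20, by = 5), times = 4),
                  BMI  = runif(20, 20, 40))
sim$AFP.diff <- -1.2 + 0.2 * rnorm(20)

# drug*BMI expands to both main effects plus their interaction.
fit <- lm(AFP.diff ~ drug * BMI, data = sim)

# The coefficient table is a numeric matrix with one row per model term.
coef.table <- coef(summary(fit))
rownames(coef.table)   # "(Intercept)" "drug" "BMI" "drug:BMI"
coef.table[, c("Estimate", "Pr(>|t|)")]
```

Working from the matrix avoids copying values out of printed output when the estimates feed into a report or a plot.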
  • 16.
#9. {Workflow scripting} Create a script to automate the creation, graphing and linear model analysis of the AFP data. Use your previous results from questions #2, #5 and #8, if necessary.
Ans. An example is shown:
############### Import AFP data ########################
# generate a list of subject IDs, numbered from 1 to 20
subject <- 1:20
# create 10 entries for male subjects
males <- rep("male",10)
# create 10 entries for female subjects
females <- rep("female",10)
# combine male and female entries into one column vector
gender <- c(males,females)
# bind subjectID and gender columns together
afp.data <- cbind(subject,gender)
# generate 10 male and 10 female random normal heights
height <- as.numeric(c(rnorm(10,70,2.5),rnorm(10,64,2.2)))
# generate 10 male and 10 female random uniform weights
weight <- as.numeric(c(runif(10,155,320),runif(10,95,210)))
# compute body mass index (BMI) for 10 men and 10 women
BMI <- as.numeric((weight*703)/(height**2))
# enter five treatment levels of a new drug (ng/mL)
drug <- rep(x = seq(from = 0, to = 20, by = 5), times = 4)
# manually enter Alpha-fetoprotein (AFP) levels for 20 patients
AFP.before <- as.numeric(c(0.8,2.3,1.1,4.8,3.7,12.5,0.3,4.4,4.9,0.0,
  1.8,2.4,23.6,8.9,0.7,3.3,3.1,0.5,2.7,4.5))
AFP.after <- AFP.before - 1.2 + 0.2*rnorm(20)
AFP.diff <- AFP.after - AFP.before
  • 17.
afp.data <- data.frame(subject,gender,height,weight,BMI,drug,
  AFP.before,AFP.after,AFP.diff)
afp.data
attach(afp.data)
############### Run LM regression ########################
afp.reg <- lm(formula = AFP.diff ~ drug*BMI, data = afp.data)
afp.reg
pdf("Regression.pdf")
par(mfrow = c(3,1))
plot(drug,AFP.diff,ylab="difference",main="regression plot")
abline(coef=afp.reg$coefficients[c(1,2)])
plot(BMI,AFP.diff,ylab="difference",main="regression plot")
abline(coef=afp.reg$coefficients[c(1,3)])
plot(afp.reg$fitted.values,afp.reg$residuals,xlab="fitted",
  ylab="residual",main="residual plot")
dev.off()
browseURL("Regression.pdf")
############### Convert drug to factor ###################
afp.data$drug <- as.factor(afp.data$drug)
############### Run LM ANOVA #############################
afp.aov <- lm(formula = AFP.diff ~ drug, data = afp.data)
afp.aov
afp.anova <- anova(afp.aov)
afp.summary <- summary(afp.aov)
############## Plot graphs ##############################
library(sciplot)
pdf("ANOVA.pdf")
main = "One-way ANOVA"
ylab = "AFP differences"
xlab = "drug concentrations"
colors = rainbow(5)
  • 18.
means = afp.aov$fitted.values[1:5]
names(means) = levels(afp.data$drug)
mp <- barplot(height = means,main=main,xlab=xlab,ylab=ylab,
  col=colors,ylim=c(0,-2))
X0 <- X1 <- mp
Y0 <- means - afp.summary$sigma
Y1 <- means + afp.summary$sigma
arrows(X0,Y0,X1,Y1,code=3,angle=90)
dev.off()
browseURL("ANOVA.pdf")
#10. {Function scripts} Create your own script to compute two new types of row statistic (e.g. standard deviation and interquartile range) for a data frame or matrix. Be creative, add graphics or a statistical test (e.g. linear regression).
Ans. An example is shown below:
# Define a function to compute row statistics with a for() loop
row.stats.loop <- function(x){
# Initialize vectors
row.sd <- row.IQR <- vector("numeric",length=nrow(x))
# Use a for() loop to compute the standard deviation and IQR for each row
for(i in 1:nrow(x)){
# unlist() turns a one-row data frame into a plain vector; a matrix row is unchanged
row.sd[i] <- sd(unlist(x[i,]))
row.IQR[i] <- IQR(unlist(x[i,]))}
# Perform a linear regression
row.reg <- lm(formula = row.sd ~ row.IQR)
# Create a list of output
output <- list()
output[["row sd"]] <- row.sd
output[["row IQR"]] <- row.IQR
output[["lm"]] <- row.reg
output[["anova"]] <- anova(row.reg)
output[["summary"]] <- summary(row.reg)
# Call the output list to report final results
  • 19.
output}
#11. Download the microarray dataset with the accession number “GDS10” from the GEO website using the GEOquery package.
Ans. The following loads the library, downloads the dataset and converts it to an ExpressionSet object:
library("GEOquery")
gds = getGEO("GDS10")
expset=GDS2eSet(gds, do.log2=TRUE)
A) Convert the data into three data frames, one for gene expression, one for phenotypes and one for gene annotations.
Ans. The following is an example script that will do this. Here we convert gds, the output from getGEO(), to an ExpressionSet object before converting to the three data frames. We can do this directly from the getGEO() output too (see the documentation for the GEOquery package on Bioconductor).
#Extract the expression matrix
X=exprs(expset)
#Extract the phenotypes
pheno.names=varLabels(expset)
> pheno.names
[1] "sample"        "tissue"        "strain"        "disease.state"
[5] "description"
phenotypes=data.frame(sample=expset$sample,
  tissue=expset$tissue,
  strain=expset$strain,
  disease.state=expset$disease.state,
  description=expset$description)
#Convert each column from factor to character type
for(i in 1:ncol(phenotypes))
  phenotypes[,i]=as.character(phenotypes[,i])
#Extract the gene annotations
annot.columns=fvarLabels(expset)
> annot.columns
[1] "ID"      "GB_ACC"  "SPOT_ID"
annot.obj=featureData(expset)
annot=data.frame(id=annot.obj$ID,
  genbank.acc=annot.obj$GB_ACC,
  spot.id=annot.obj$SPOT_ID)
B) Plot boxplots for each sample in one plot with different colors for each sample. (Hint: use the stack() function and use a formula in the
  • 20.
boxplot() function. A vector of n colors can be obtained by using rainbow(n))
Ans. The following is probably the easiest way to do this. You should look up the help page for stack() to better understand how this works.
nsamp=ncol(X)
boxcol=rainbow(nsamp)
X.stack=stack(as.data.frame(X))
#Draw the boxplot
#Option las=3 makes the x axis labels vertical
boxplot(values~ind, data=X.stack, col=boxcol, las=3)
C) Compare the samples from the thymus and spleen for diabetic-resistant mice and find the 10 most significant genes using the adjusted p-value.
Ans. This is a relatively lengthy script, but the explanation for each step can be found here and in the manual.
#Find the samples that come from diabetic resistant mice that
#originate from thymus
qt=which(phenotypes$disease.state=="diabetic-resistant"
  & phenotypes$tissue=="thymus")
Xt=X[,qt]
  • 21.
#Find the samples that come from diabetic resistant mice that
#originate from spleen
qs=which(phenotypes$disease.state=="diabetic-resistant"
  & phenotypes$tissue=="spleen")
Xs=X[,qs]
#Compute the p-value and fold change for all genes
p.value=c()
fold.change=c()
for(i in 1:nrow(Xs))
{
  #Find number of non-missing samples
  n1=sum(!is.na(Xs[i,]))
  n2=sum(!is.na(Xt[i,]))
  if(n1 >= 2 & n2 >= 2)
  {
    tt.res=t.test(Xs[i,], Xt[i,])
    p.value[i]=tt.res$p.value
    #The log fold change is calculated by the
    #difference in means between the two classes
    fold.change[i]=tt.res$estimate[2]-tt.res$estimate[1]
  }else
  {
    p.value[i]=NA
    fold.change[i]=NA
  }
}
#Compute adjusted p-values
adj.p.value=p.adjust(p.value)
#Find the smallest 10 p-values
qo=order(adj.p.value)
sig.genes=qo[1:10]
> adj.p.value[sig.genes]
 [1] 1.859514e-12 7.615543e-12 1.852015e-11
 [4] 3.337001e-11 4.210158e-11 5.769339e-11
 [7] 7.557780e-11 9.369532e-11 1.125353e-10
[10] 1.331595e-10
D) Write the gene annotations, p-value, adjusted p-value and expressions in all the samples for these 10 genes to a CSV file.
Ans. An example is shown below:
d=data.frame(annot[sig.genes,],
  p.value=p.value[sig.genes],
  adj.p.value=adj.p.value[sig.genes],
  X[sig.genes,])
write.csv(d, file="report.csv", row.names=FALSE)
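The script calls p.adjust() without a method argument, which means it uses the default Holm correction. Other corrections are available through the method argument. The sketch below compares the default with the Benjamini-Hochberg method on three made-up p-values (p is a hypothetical vector, not taken from the GDS10 results):

```r
# Three made-up raw p-values.
p <- c(0.01, 0.02, 0.03)

# Default method is "holm", which controls the family-wise error rate.
p.adjust(p)                  # 0.03 0.04 0.04

# "BH" controls the false discovery rate and is never more conservative than Holm.
p.adjust(p, method = "BH")   # 0.03 0.03 0.03
```

Which correction is appropriate depends on the study; the point of the sketch is only that the choice is explicit in the method argument.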