Quantitatively monitor expression level for thousands of genes at a time.
All the methods and applications are based on nylon membrane microarrays and can be extended to DNA microarray analysis on other platforms.
Why normalization:
A number of systematic variations can occur during experiments. For example, the different samples being compared are hybridized on different nylon membranes. Normalization is needed to remove these sources of variation.
Well normalized data are the foundation of good analysis results.
Alignment: each gene is represented by two spots. Match these two spots to a schematic representation of the array. The final intensity for the gene is the average of the intensities of the two spots.
Background calculation
external (global): median intensity of the black space between different panels.
user-defined external: median intensity of a user-defined area.
local: median intensity of the space surrounding the gene spot.
Data transformation:
Adjusted intensity = raw intensity - background value
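The alignment, background, and transformation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the numbers are hypothetical, and the same median background is subtracted from both duplicate spots before they are averaged.

```python
from statistics import mean, median


def adjusted_intensity(raw_spots, background_pixels):
    """Return the background-adjusted intensity for one gene.

    raw_spots: raw intensities of the duplicate spots for the gene.
    background_pixels: intensities sampled from the background region
    (external, user-defined, or local, depending on the method chosen).
    """
    background = median(background_pixels)
    # Adjusted intensity = raw intensity - background value, per spot.
    adjusted = [raw - background for raw in raw_spots]
    # The final intensity for the gene is the average of its two spots.
    return mean(adjusted)


# Hypothetical readings: two duplicate spots and three background samples.
gene_intensity = adjusted_intensity([520.0, 480.0], [38.0, 40.0, 45.0])
```

Here the median background is 40, so the two spots become 480 and 440 and the gene intensity is 460.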
Each Clontech Stress array contains 234 sequences expressed in response to stress.
Each insert cDNA is denatured and UV cross-linked to a positively charged membrane
Samples are treated either with DMSO alone or with BaP (benzo(a)pyrene) dissolved in DMSO, so DMSO is the control and BaP is the treatment.
DMSO-treated and BaP-treated samples are hybridized under the same conditions each time. Two membranes are each used three times, one for the DMSO-treated samples and one for the BaP-treated samples.
The three biological replicates are done on the same membrane(s), so the replicates are correlated.
Experimental workflow (figure):
1. Start from a control RNA sample and a test RNA sample.
2. Reverse-transcribe each sample with 33P-dCTP to obtain radio-labelled cDNA probes.
3. Hybridize the probes to the microarray filters.
4. Use a PhosphorImager laser scanner to obtain the density of each spot on the filter.
5. Compare the densities at each spot to determine whether the treatment changes gene expression.
6. Compile the subset of differentially expressed genes.
Gene    Control    Test
A       1X         3X
:       :          :
Z       1X         0.5X
Scatter plots of adjusted log intensities for paired experiments of DMSO vs BaP
Housekeeping genes are a set of genes whose expression levels are not affected by the treatment.
The normalization coefficient is the ratio m_C / m_T, where m_C and m_T are the means of the selected housekeeping genes for control and treatment, respectively.
Problem: housekeeping genes sometimes change their expression level, so the assumption does not always hold.
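As a small worked example of the housekeeping-gene coefficient, the sketch below computes m_C / m_T from a chosen gene set. The gene names and intensities are hypothetical, and applying the coefficient by multiplying the treatment intensities is an assumption made for illustration.

```python
from statistics import mean


def housekeeping_coefficient(control, treatment, housekeeping_genes):
    """Return m_C / m_T computed over the selected housekeeping genes."""
    m_c = mean(control[g] for g in housekeeping_genes)
    m_t = mean(treatment[g] for g in housekeeping_genes)
    return m_c / m_t


# Hypothetical adjusted intensities for one control and one treatment array.
control = {"hk1": 100.0, "hk2": 200.0, "geneA": 50.0}
treatment = {"hk1": 50.0, "hk2": 100.0, "geneA": 75.0}

coef = housekeeping_coefficient(control, treatment, ["hk1", "hk2"])

# Assumption: the treatment array is rescaled onto the control scale.
normalized_treatment = {g: v * coef for g, v in treatment.items()}
```

With these numbers m_C = 150 and m_T = 75, so the coefficient is 2 and geneA is rescaled from 75 to 150. If the housekeeping genes themselves respond to the treatment, the coefficient is biased, which is the problem noted above.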
Trimmed mean normalization (adjusted global method)
Trim off the 5% highest and 5% lowest extreme values, then globally normalize the data. The normalization coefficient is the ratio of the trimmed means for the control and the i-th treatment, respectively.
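A minimal sketch of trimmed mean normalization follows. The exact formula is not shown here, so the coefficient is assumed to be the control trimmed mean divided by the treatment trimmed mean, by analogy with the housekeeping-gene coefficient; the function names and data are hypothetical.

```python
def trimmed_mean(values, trim_percent=5):
    """Mean after dropping the lowest and highest trim_percent of values."""
    ordered = sorted(values)
    # Number of values to drop at each end (integer arithmetic).
    k = len(ordered) * trim_percent // 100
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def trimmed_mean_coefficient(control, treatment, trim_percent=5):
    """Assumed form: trimmed mean of control / trimmed mean of treatment."""
    return trimmed_mean(control, trim_percent) / trimmed_mean(treatment, trim_percent)


# Hypothetical intensities for 20 genes; the treatment array is twice as bright.
control = [float(v) for v in range(1, 21)]
treatment = [2.0 * v for v in control]

coef = trimmed_mean_coefficient(control, treatment)
normalized_treatment = [v * coef for v in treatment]
```

With 20 genes, one value is dropped at each end of each array, and the coefficient comes out as 0.5. Because the extremes are discarded first, a few strongly induced or repressed genes have little influence on the coefficient.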
Global or local, parametric or nonparametric method
No unique normalization method for the same data. It depends on what kind of experiment you have and what the data look like.
No absolute criteria for normalization. Basically, the normalized log ratio should be centered around 0. Combine with post hoc analysis to choose the best method.
Hundreds of genes are tested at the same time. Assume 1000 genes are not differentially expressed. A p-value cutoff of 0.01 (the false positive rate) means that around 10 genes (1000 × 0.01) will nevertheless be called significant.
Bonferroni correction: we want to make sure that P[at least 1 gene significant out of 1000] ≤ 0.05. Consequently, the p-value required for a single gene to be declared significant is: P[single gene] ≤ 0.05/1000 = 0.00005
Conservative and lower power.
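The arithmetic behind the multiple testing problem and the Bonferroni correction can be checked with a short sketch. The variable and function names are illustrative only.

```python
n_genes = 1000          # genes tested, none truly differentially expressed
per_gene_alpha = 0.01   # uncorrected significance level

# Expected number of false positives without any correction: about 10.
expected_false_positives = n_genes * per_gene_alpha

# Bonferroni: divide the family-wise level by the number of tests.
family_alpha = 0.05
bonferroni_threshold = family_alpha / n_genes   # 0.05 / 1000


def bonferroni_significant(p_values, family_alpha=0.05):
    """Flag p-values that pass the Bonferroni-corrected threshold."""
    threshold = family_alpha / len(p_values)
    return [p <= threshold for p in p_values]
```

A gene with p = 0.001 would pass an uncorrected 0.01 cutoff but fail the corrected threshold of 0.00005 when 1000 genes are tested, which illustrates the loss of power.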
Keep the FWR (family-wise error rate) manageable and try some p-value, say 0.001, as the significance level.
Use the normalization method discussed above to normalize data.
Obtain the average log ratio (ALR), which is centered around zero.
Use the normal approximation method.
Step I: if the maximum or minimum value of the ALR is greater than mean + 3*sd or less than mean - 3*sd, treat it as an outlier, delete it from the ALR set, and take it as a differentially expressed gene.
Step II: calculate the mean and sd for the remaining genes and repeat Step I.
Repeat the above steps until no more outliers exist. Then calculate the 95% predictive interval (PI) for the remaining genes. Values outside the PI are significant.
The final set of differentially expressed genes includes the outliers detected in Steps I and II and the genes outside the PI.
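The iterative procedure above can be sketched as follows. This is an illustration rather than a reference implementation: the 95% predictive interval is assumed to be mean ± 1.96*sd under the normal approximation, and the gene names and ALR values are hypothetical.

```python
from statistics import mean, stdev


def differential_genes(alr, z_outlier=3.0, z_interval=1.96):
    """Return the set of genes called differentially expressed.

    alr: dict mapping gene name -> average log ratio.
    """
    remaining = dict(alr)
    flagged = set()

    # Steps I and II: repeatedly remove the most extreme value while it
    # lies more than z_outlier standard deviations from the mean.
    while len(remaining) > 2:
        values = list(remaining.values())
        mu, sd = mean(values), stdev(values)
        # The value farthest from the mean is either the maximum or minimum.
        gene = max(remaining, key=lambda g: abs(remaining[g] - mu))
        if sd == 0 or abs(remaining[gene] - mu) <= z_outlier * sd:
            break
        flagged.add(gene)
        del remaining[gene]

    # Assumed 95% interval for the remaining genes: mean +/- 1.96 * sd.
    values = list(remaining.values())
    mu, sd = mean(values), stdev(values)
    low, high = mu - z_interval * sd, mu + z_interval * sd
    flagged.update(g for g, v in remaining.items() if v < low or v > high)
    return flagged


# Hypothetical ALR values: 20 unchanged genes plus two candidates.
alr = {f"g{i}": (0.1 if i % 2 else -0.1) for i in range(20)}
alr["up"] = 3.0      # removed as an outlier in Step I
alr["mild"] = 0.3    # survives Step I but falls outside the interval

result = differential_genes(alr)
```

In this example "up" is removed in the first pass, and "mild" is only picked up by the interval computed on the remaining genes, so both end up in the final set.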
Assume there is a constant coefficient of variation c for the entire gene set.
The observed differential expression, R_k = T_k / C_k (the ratio of treatment to control intensity at gene k), has a sampling distribution that depends only on c. R_k is approximately normally distributed.
Assume
The density function of R becomes:
Use the maximum likelihood method to estimate the constant c, and use the EM algorithm to get the final estimates of c and m.
Estimation of Variance: limited sample size (= few replicates)
Normal Distribution assumptions: error model still not clear
Multiple Testing
Excel add-in that performs a robust method for differential analysis of microarray data. The method was developed and implemented by the Tibshirani group at Stanford (free for academic use).
Permutation technique: assuming no difference between conditions, all genes are from the same population.
False Discovery Rate: number of falsely called genes divided by the number of genes called differential in the original data.
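To make the permutation idea concrete, here is a simplified sketch that estimates the false discovery rate by relabelling the arrays. It is not the add-in's actual statistic: it uses a plain difference of means, enumerates every relabelling instead of sampling, and all names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def mean_difference(values, treated_idx):
    """Mean of the treated arrays minus mean of the control arrays."""
    treated = [values[i] for i in treated_idx]
    control = [v for i, v in enumerate(values) if i not in treated_idx]
    return mean(treated) - mean(control)


def permutation_fdr(data, treated_idx, threshold):
    """Return (called genes, estimated FDR) for a given cutoff.

    data: dict mapping gene -> list of log intensities, one per array.
    treated_idx: indices of the arrays that are really treated.
    """
    n_arrays = len(next(iter(data.values())))
    called = [g for g, v in data.items()
              if abs(mean_difference(v, treated_idx)) > threshold]

    # Under no difference between conditions, any relabelling is equally
    # likely, so genes called after relabelling count as falsely called.
    false_counts = []
    for idx in combinations(range(n_arrays), len(treated_idx)):
        false_counts.append(sum(abs(mean_difference(v, idx)) > threshold
                                for v in data.values()))
    expected_false = mean(false_counts)

    fdr = expected_false / len(called) if called else 0.0
    return called, fdr


# Hypothetical log intensities: arrays 0-2 are control, 3-5 are treated.
data = {
    "A": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],   # clearly induced
    "B": [1.0, 1.2, 0.8, 1.1, 0.9, 1.0],   # unchanged
}
called, fdr = permutation_fdr(data, treated_idx=(3, 4, 5), threshold=1.0)
```

Only gene A is called in the original data. Of the 20 possible relabellings, 2 still call one gene, so the expected number of falsely called genes is 0.1 and the estimated FDR is about 0.1.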
Cutoff point determination: set up critical point to eliminate genes whose intensity is less than this point.
Statistically significant? No unique method to analyze data. Some methods are better for one data set, but may not be good for other data sets. In practice, we have to try different ways to see which methods work well.
Biologically significant? For the genes picked up by statistics, we have to be careful in drawing conclusions. Some genes shown to be significant may not be functionally meaningful. Conversely, genes that do not show up as significant may still be significant, especially those at the borderline in the statistical test.