2. Background• Few observations made by a black box with unknown parameters μ, σ²• What is the distribution behind the black box?• E.g., with what probability will it output a number bigger than 5?
3. Approach• Easy to determine with many observations• With few observations: – Assume a canonical distribution based on prior knowledge – Determine the parameters of this distribution from the observations, e.g., mean, variance
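A minimal sketch of this approach, assuming a Normal canonical distribution. The observations below are made-up illustrative values, and the names `observations` and `fitted` are hypothetical. The sketch estimates the mean and standard deviation, then reads off the probability that the black box outputs a number bigger than 5.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical observations from the black box (illustrative values only).
observations = [4.1, 5.3, 3.8, 6.0, 4.7]

# Assumption: the canonical distribution is Normal.
mu_hat = mean(observations)      # sample mean
sigma_hat = stdev(observations)  # sample standard deviation (n - 1 denominator)

fitted = NormalDist(mu_hat, sigma_hat)

# Probability that the black box outputs a number bigger than 5.
p_bigger_than_5 = 1.0 - fitted.cdf(5.0)
print(f"mean={mu_hat:.3f}, sd={sigma_hat:.3f}, P(X > 5)={p_bigger_than_5:.3f}")
```

With so few observations the estimated parameters are themselves uncertain, so the resulting probability should be read as a rough estimate.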
4. Estimating the mean
5. Estimating the variance σ²• The scaled sample variance follows a Chi-Square distribution if the original distribution was Normal
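For reference, the standard estimators and their sampling distributions can be written out as follows. This assumes the n observations x_1, ..., x_n are independent draws from a Normal distribution with mean μ and variance σ². The symbols n, x_i, x̄ and s² are notation introduced here, not taken from the slides.

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
\]
\[
\bar{x} \sim \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right),
\qquad
\frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2_{n-1}
\]
```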
6. Microarray Data• Many genes (25,000)• 2 conditions (or more), many replicates within each condition• Which genes are differentially expressed between the two conditions?
7. More Specifically• For a particular gene – Each condition is a black box – Say 3 observations from each black box• Do both black boxes have the same distribution? – Assume same canonical distribution – Do both have the same parameters?
8. Which Canonical Distribution• Use data with many replicates• 418.0294, 295.8019, 272.1220, 315.2978, 294.2242, 379.8320, 392.1817, 450.4758, 335.8242, 265.2478, 196.6982, 289.6532, 274.4035, 246.6807, 254.8710, 165.9416, 281.9463, 246.6434, 259.0019, 242.1968• Distribution??
9. What is a QQ Plot
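As a rough illustration of what a QQ plot compares, the sketch below pairs the sorted replicate values from the "Which Canonical Distribution" slide with the matching quantiles of a Normal distribution fitted to them. The function name `qq_points` and the plotting positions (i - 0.5)/n are assumptions made here; other conventions exist.

```python
from statistics import NormalDist, mean, stdev

# Replicate intensities listed on the "Which Canonical Distribution" slide.
values = [
    418.0294, 295.8019, 272.1220, 315.2978, 294.2242,
    379.8320, 392.1817, 450.4758, 335.8242, 265.2478,
    196.6982, 289.6532, 274.4035, 246.6807, 254.8710,
    165.9416, 281.9463, 246.6434, 259.0019, 242.1968,
]

def qq_points(sample):
    """Return (theoretical, observed) quantile pairs against a fitted Normal."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    observed = sorted(sample)
    # Plotting positions (i - 0.5) / n: one common convention (an assumption here).
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

points = qq_points(values)
for theoretical_q, observed_q in points:
    print(f"{theoretical_q:10.2f} {observed_q:10.2f}")
```

If the data were Normal, the pairs would fall close to the line y = x. Systematic curvature suggests a different distribution, and applying the same function to the log-transformed values gives the log-scale version of the plot.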
10. Distribution of log raw intensities across genes on a single array
11. The QQ plot of log-scale intensities (i.e., actual vs. simulated from Normal)
12. QQ Plot against a Normal Distribution• 10 + 10 replicates in two groups• Single group QQ plot• Combined 2 groups QQ plot• Combined log-scale QQ plot• Shapiro-Wilk Test
13. Which Canonical Distribution• Assume log normal distribution
14. Benford’s Law• Frequency distribution of the first significant digit• Pr(d ≤ x < d+1) = log10(1+d) − log10(d) for d = 1, ..., 9, where x in [1, 10) is the significand of the value• This holds when log10(x) is uniformly distributed in [0, 1)
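The formula above can be evaluated directly. This short sketch tabulates the first-digit probabilities. The function name `benford_probability` is introduced here for illustration.

```python
import math

def benford_probability(d):
    """Probability that the first significant digit equals d (d in 1..9)."""
    return math.log10(d + 1) - math.log10(d)

probabilities = {d: benford_probability(d) for d in range(1, 10)}
for digit, p in probabilities.items():
    print(digit, round(p, 3))
```

The nine probabilities telescope to log10(10) − log10(1) = 1, and the digit 1 is the most frequent at roughly 30%.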
15. Differential Expression• Group 1: μ1, σ1²• Group 2: μ2, σ2²• Is μ1 = μ2?• Is σ1 = σ2?• Is variance a function of mean?
16. SD increases linearly with Mean• SD vs Mean across 3 replicates plotted for all genes
17. SD is flat now, except for very low values• Another reason to work on the log scale• SD vs Mean across 3 replicates computed for all genes after log-transformation
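One way to see why the log scale flattens the SD is the small sketch below. It assumes purely multiplicative noise, meaning replicates scatter by the same relative amount at every expression level. That is an idealisation, and the values and names (`relative_noise`, `sd_raw_and_log`) are hypothetical.

```python
import math
from statistics import stdev

# Hypothetical replicate pattern: three replicates that scatter by the same
# relative amount around a gene's expression level (illustrative values only).
relative_noise = [0.9, 1.0, 1.1]

def sd_raw_and_log(level):
    """Return (SD on the raw scale, SD on the log2 scale) for one gene."""
    replicates = [level * r for r in relative_noise]
    log_replicates = [math.log2(x) for x in replicates]
    return stdev(replicates), stdev(log_replicates)

for level in (100.0, 1000.0, 10000.0):
    sd_raw, sd_log = sd_raw_and_log(level)
    print(f"level={level:8.0f}  raw SD={sd_raw:9.2f}  log2 SD={sd_log:.4f}")
```

Under this assumption the raw-scale SD grows in proportion to the level, while the log-scale SD stays the same, because the logarithm turns the multiplicative factor into an additive shift.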
18. Differential Expression• Group 1: μ1, σ1²• Group 2: μ2, σ2²• Is μ1 = μ2?• Is σ1 = σ2? Sort-of YES
19. The T-Statistic
20. The T-Statistic
21. The T-Statistic
22. The T-Statistic• Flattened Normal or T-Distribution
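The formula from these slides did not survive in the text, so the sketch below uses the standard pooled-variance two-sample t-statistic as a stand-in. It assumes both groups share the same variance, in line with the "sort-of YES" answer on the log scale. The function name `pooled_t_statistic` and the example intensities are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(group1, group2):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: assumes both groups share the same underlying variance.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(group1) - mean(group2)) / standard_error, df

# Hypothetical log-scale intensities for one gene, 3 replicates per condition.
group1 = [8.1, 8.4, 8.0]
group2 = [9.0, 9.3, 9.1]

t, df = pooled_t_statistic(group1, group2)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

With only 3 replicates per group the statistic has 4 degrees of freedom, so it is compared against the heavier-tailed t-distribution rather than the Normal.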
23. A Problem
24. SD vs Mean across 3 replicates computed for all genes after log-transformation• The curve fit here may be a better estimate• Lots of false positives can be avoided here• Not much difference here