Introduction to Statistics II

Published in: Technology, Business

  1. 1. Statistics for Microarray Data
  2. 2. Background (μ, σ²) • Few observations made by a black box • What is the distribution behind the black box? • E.g., with what probability will it output a number bigger than 5?
  3. 3. Approach • Easy to determine with many observations • With few observations: • Assume a canonical distribution based on prior knowledge • Determine the parameters of this distribution (e.g., mean, variance) using the observations
  4. 4. Estimating the mean
  5. 5. Estimating the variance σ² • The estimate follows a scaled chi-square distribution if the original distribution was Normal
  6. 6. Microarray Data • Many genes, 25,000 • 2 conditions (or more), many replicates within each condition • Which genes are differentially expressed between the two conditions?
  7. 7. More Specifically • For a particular gene – Each condition is a black box – Say 3 observations from each black box • Do both black boxes have the same distribution? – Assume the same canonical distribution – Do both have the same parameters?
  8. 8. Which Canonical Distribution • Use data with many replicates • 418.0294, 295.8019, 272.1220, 315.2978, 294.2242, 379.8320, 392.1817, 450.4758, 335.8242, 265.2478, 196.6982, 289.6532, 274.4035, 246.6807, 254.8710, 165.9416, 281.9463, 246.6434, 259.0019, 242.1968 • Which distribution?
  9. 9. What is a QQ Plot
  10. 10. Distribution of log raw intensities across genes on a single array
  11. 11. The QQ plot of log-scale intensities (i.e., actual vs. simulated from a Normal)
  12. 12. QQ Plot against a Normal Distribution • 10 + 10 replicates in two groups • Single-group QQ plot • Combined 2-group QQ plot • Combined log-scale QQ plot • Shapiro-Wilk test
  13. 13. Which Canonical Distribution • Assume a log-normal distribution
  14. 14. Benford’s Law • Frequency distribution of the first significant digit • Pr(d ≤ x < d+1) = log10(1+d) − log10(d) for d = 1, ..., 9, where x is the number rescaled to [1, 10) • Holds when log10(x) is uniformly distributed in [0, 1]
  15. 15. Differential Expression • Group 1: μ₁, σ₁² • Group 2: μ₂, σ₂² • Is μ₁ = μ₂? • Is σ₁ = σ₂? • Is variance a function of mean?
  16. 16. SD increases linearly with mean • Plot: SD vs. mean across 3 replicates, plotted for all genes
  17. 17. SD is flat now, except for very low values • Another reason to work on the log scale • Plot: SD vs. mean across 3 replicates, computed for all genes after log-transformation
  18. 18. Differential Expression • Group 1: μ₁, σ₁² • Group 2: μ₂, σ₂² • Is μ₁ = μ₂? • Is σ₁ = σ₂? Sort-of YES
  19. 19. The T-Statistic
  20. 20. The T-Statistic
  21. 21. The T-Statistic
  22. 22. The T-Statistic • Flattened Normal, or t-distribution
  23. 23. A Problem
  24. 24. The curve fit here may be a better estimate • Lots of false positives can be avoided here • Not much difference here • Plot: SD vs. mean across 3 replicates, computed for all genes after log-transformation
  25. 25. Thank You
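
The question on slide 8 (which distribution do the 20 replicate values follow?) can be explored numerically without a plot. The sketch below is not taken from the slides. It computes the correlation between the sorted data and standard-normal quantiles, which is the straightness of a QQ plot against a Normal, once on the raw scale and once on the log scale. The function name `qq_correlation`, the plotting positions (i + 0.5) / n, and the choice of log base 2 are assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean

# The 20 replicate intensities listed on slide 8.
values = [
    418.0294, 295.8019, 272.1220, 315.2978, 294.2242,
    379.8320, 392.1817, 450.4758, 335.8242, 265.2478,
    196.6982, 289.6532, 274.4035, 246.6807, 254.8710,
    165.9416, 281.9463, 246.6434, 259.0019, 242.1968,
]


def qq_correlation(sample):
    """Correlation between the sorted sample and standard-normal quantiles.

    Values close to 1 mean the QQ plot against a Normal is close to a
    straight line. The plotting positions (i + 0.5) / n are one common
    convention and are an assumption here.
    """
    n = len(sample)
    ordered = sorted(sample)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, my = mean(theoretical), mean(ordered)
    sxy = sum((x - mx) * (y - my) for x, y in zip(theoretical, ordered))
    sxx = sum((x - mx) ** 2 for x in theoretical)
    syy = sum((y - my) ** 2 for y in ordered)
    return sxy / math.sqrt(sxx * syy)


r_raw = qq_correlation(values)
r_log = qq_correlation([math.log2(v) for v in values])
print(f"QQ correlation, raw scale: {r_raw:.4f}")
print(f"QQ correlation, log scale: {r_log:.4f}")
```

This is only a rough summary of a QQ plot. A formal check such as the Shapiro-Wilk test named on slide 12 would need a statistics library.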
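
The first-digit formula on slide 14 can be checked directly. The sketch below tabulates log10(1+d) − log10(d) for d = 1 to 9 and confirms that the nine probabilities sum to 1. The function name `benford_probability` is chosen here for illustration.

```python
import math


def benford_probability(d):
    """Probability that the first significant digit equals d (1..9)."""
    if not 1 <= d <= 9:
        raise ValueError("first significant digit must be between 1 and 9")
    return math.log10(1 + d) - math.log10(d)


probs = {d: benford_probability(d) for d in range(1, 10)}
for d, p in probs.items():
    print(f"first digit {d}: {p:.3f}")

# The terms telescope to log10(10) - log10(1) = 1.
print(f"total: {sum(probs.values()):.3f}")
```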
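
Slides 16 and 17 observe that SD grows with the mean on the raw scale and is roughly flat after log-transformation. One way to see why is to assume the replicate noise is multiplicative. The sketch below uses hypothetical gene means and hypothetical noise factors (none of these numbers come from the slides) and compares the SD across 3 replicates on both scales.

```python
import math
from statistics import mean, stdev

# Hypothetical true expression levels for three genes (not from the slides).
gene_means = [100.0, 1000.0, 10000.0]
# Hypothetical multiplicative noise applied to each of the 3 replicates.
noise_factors = [0.8, 1.0, 1.25]

raw_sds, log_sds = [], []
for level in gene_means:
    replicates = [level * f for f in noise_factors]
    raw_sds.append(stdev(replicates))
    log_sds.append(stdev([math.log2(r) for r in replicates]))
    print(
        f"mean {mean(replicates):10.1f}  "
        f"raw SD {raw_sds[-1]:9.2f}  log2 SD {log_sds[-1]:.4f}"
    )
```

Under this assumption the raw SD scales in proportion to the expression level, while the log-scale SD is the same for every gene.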
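
The formulas on the T-Statistic slides (19 to 22) did not survive in this transcript. As a stand-in, the sketch below computes the standard pooled-variance two-sample t-statistic, which fits the slides' working assumption that σ₁ and σ₂ are roughly equal on the log scale. Whether the slides used exactly this form is an assumption. The function name `pooled_t_statistic` and the replicate values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group1, group2):
    """Two-sample t-statistic with pooled variance (assumes equal variances).

    Returns the statistic and its degrees of freedom, n1 + n2 - 2.
    """
    n1, n2 = len(group1), len(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances (n - 1)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error, n1 + n2 - 2


# Hypothetical intensities for one gene, 3 replicates per condition.
condition1 = [410.0, 380.0, 450.0]
condition2 = [250.0, 270.0, 240.0]

# Work on the log scale, as the slides recommend.
log1 = [math.log2(v) for v in condition1]
log2 = [math.log2(v) for v in condition2]

t_stat, dof = pooled_t_statistic(log1, log2)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

With only 3 replicates per group the variance estimate is noisy, which is the problem slide 23 raises.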
