Introduction to statistics ii

Published in: Technology, Business

Transcript

  • 1. Statistics for Microarray Data
  • 2. Background (μ, σ²)
      • Few observations made by a black box
      • What is the distribution behind the black box?
      • E.g., with what probability will it output a number bigger than 5?
  • 3. Approach
      • Easy to determine with many observations
      • With few observations:
          – Assume a canonical distribution based on prior knowledge
          – Determine parameters of this distribution using the observations, e.g., mean, variance
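The approach on slides 2 and 3 can be sketched in a few lines. This is only an illustration: it assumes the canonical distribution is Normal, and the observations and the function name `prob_greater_than` are hypothetical, not taken from the slides.

```python
from statistics import NormalDist, mean, stdev


def prob_greater_than(observations, threshold):
    """Fit a Normal distribution to a few observations and return P(X > threshold)."""
    mu = mean(observations)
    sigma = stdev(observations)  # sample standard deviation (n - 1 denominator)
    fitted = NormalDist(mu, sigma)
    return 1.0 - fitted.cdf(threshold)


# Hypothetical outputs of the black box (not from the slides).
observations = [3.9, 4.6, 5.2, 4.1]
p = prob_greater_than(observations, 5.0)
print(f"Estimated P(X > 5) = {p:.3f}")
```

With so few observations the fitted parameters are themselves uncertain, which is what the following slides on estimating the mean and variance address.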
  • 4. Estimating the mean
  • 5. Estimating the variance σ²: the scaled sample variance follows a chi-square distribution if the original distribution was Normal
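The formulas on slides 4 and 5 did not survive in the transcript. As a hedged sketch, the usual estimators are the sample mean and the unbiased sample variance, and for Normal data the quantity (n − 1)·s²/σ² follows a chi-square distribution with n − 1 degrees of freedom. The simulation below checks the mean of that quantity; the parameter values and the helper name are assumptions chosen for illustration.

```python
import random
from statistics import mean, variance


def estimate_mean_and_variance(observations):
    """Return the sample mean, unbiased sample variance, and standard error of the mean."""
    n = len(observations)
    m = mean(observations)
    s2 = variance(observations)   # divides by n - 1
    se = (s2 / n) ** 0.5          # standard error of the mean
    return m, s2, se


# Simulate many small Normal samples and look at (n - 1) * s^2 / sigma^2.
rng = random.Random(0)            # fixed seed so the run is repeatable
n, sigma = 3, 2.0                 # assumed sample size and true SD
scaled = []
for _ in range(5000):
    sample = [rng.gauss(10.0, sigma) for _ in range(n)]
    scaled.append((n - 1) * variance(sample) / sigma ** 2)

# A chi-square variable with n - 1 degrees of freedom has mean n - 1.
print(f"mean of scaled variance: {mean(scaled):.2f} (expected about {n - 1})")
```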
  • 6. Microarray Data
      • Many genes, 25000
      • 2 conditions (or more), many replicates within each condition
      • Which genes are differentially expressed between the two conditions?
  • 7. More Specifically
      • For a particular gene
          – Each condition is a black box
          – Say 3 observations from each black box
      • Do both black boxes have the same distribution?
          – Assume same canonical distribution
          – Do both have the same parameters?
  • 8. Which Canonical Distribution
      • Use data with many replicates
      • 418.0294, 295.8019, 272.1220, 315.2978, 294.2242, 379.8320, 392.1817, 450.4758, 335.8242, 265.2478, 196.6982, 289.6532, 274.4035, 246.6807, 254.8710, 165.9416, 281.9463, 246.6434, 259.0019, 242.1968
      • Distribution?
  • 9. What is a QQ Plot?
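A QQ plot pairs the sorted observations with the quantiles a reference distribution would produce at the same ranks; points close to a straight line suggest a good fit. The sketch below computes those pairs for the 20 replicate values from slide 8 against a fitted Normal, on the raw and on the log scale. The plotting positions (i − 0.5)/n and the helper names are assumptions; other conventions exist.

```python
import math
from statistics import NormalDist, mean, stdev

# The 20 replicate values listed on slide 8.
values = [
    418.0294, 295.8019, 272.1220, 315.2978, 294.2242, 379.8320, 392.1817,
    450.4758, 335.8242, 265.2478, 196.6982, 289.6532, 274.4035, 246.6807,
    254.8710, 165.9416, 281.9463, 246.6434, 259.0019, 242.1968,
]


def qq_points(data):
    """Return (theoretical, observed) quantile pairs against a fitted Normal."""
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    observed = sorted(data)
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def correlation(pairs):
    """Pearson correlation of the QQ pairs; values near 1 mean a straighter plot."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


raw = qq_points(values)
logged = qq_points([math.log2(v) for v in values])
print(f"QQ correlation, raw scale: {correlation(raw):.4f}")
print(f"QQ correlation, log scale: {correlation(logged):.4f}")
```

Plotting the pairs (for example as a scatter plot with a reference diagonal) gives the QQ plot itself; the correlation is only a one-number summary of how straight it is.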
  • 10. Distribution of log raw intensities across genes on a single array
  • 11. The QQ plot of log scale intensities (i.e., actual vs simulated from normal)
  • 12. QQ Plot against a Normal Distribution
      • 10 + 10 replicates in two groups
      • Single group QQ plot
      • Combined 2 groups QQ plot
      • Combined log-scale QQ plot
      • Shapiro-Wilk Test
  • 13. Which Canonical Distribution
      • Assume log normal distribution
  • 14. Benford’s Law
      • Frequency distribution of the first significant digit d
      • Pr(d ≤ x < d+1) = log10(1+d) − log10(d), when log10(x) is uniformly distributed in [0,1]
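The first-digit probabilities in the formula on slide 14 are easy to tabulate. A minimal sketch, where the function name `benford_probability` is hypothetical:

```python
import math


def benford_probability(d):
    """Probability that the first significant digit equals d (1 through 9)."""
    if not 1 <= d <= 9:
        raise ValueError("first significant digit must be between 1 and 9")
    return math.log10(1 + d) - math.log10(d)


probs = {d: benford_probability(d) for d in range(1, 10)}
for d, p in probs.items():
    print(f"digit {d}: {p:.3f}")
```

The nine terms telescope to log10(10) − log10(1) = 1, so the probabilities sum to one, and the digit 1 gets log10(2), roughly 30% of the mass.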
  • 15. Differential Expression
      • Group 1: μ1, σ1²; Group 2: μ2, σ2²
      • Is μ1 = μ2?
      • Is σ1 = σ2? Is variance a function of mean?
  • 16. SD increases linearly with mean
      • Figure: SD vs mean across 3 replicates, plotted for all genes
  • 17. SD is flat now, except for very low values
      • Another reason to work on the log scale
      • Figure: SD vs mean across 3 replicates, computed for all genes after log-transformation
  • 18. Differential Expression
      • Group 1: μ1, σ1²; Group 2: μ2, σ2²
      • Is μ1 = μ2?
      • Is σ1 = σ2? Sort-of YES
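The contrast between slides 16 and 17 can be reproduced with a small simulation. This sketch assumes a multiplicative (log-normal) noise model, which is one common way to get an SD proportional to the mean; the expression levels, noise size, and function name are hypothetical.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(1)  # fixed seed so the run is repeatable


def simulate_gene(true_level, n_reps=3, noise_sd=0.2):
    """Replicates of one gene under an assumed multiplicative noise model."""
    return [true_level * math.exp(rng.gauss(0.0, noise_sd)) for _ in range(n_reps)]


levels = [50, 200, 800, 3200]   # hypothetical true expression levels
summary = {}
for level in levels:
    raw_sds, log_sds = [], []
    for _ in range(2000):       # 2000 simulated genes per level
        reps = simulate_gene(level)
        raw_sds.append(stdev(reps))
        log_sds.append(stdev([math.log(r) for r in reps]))
    summary[level] = (mean(raw_sds), mean(log_sds))
    print(f"level {level:5d}: mean raw SD {summary[level][0]:8.2f}, "
          f"mean log-scale SD {summary[level][1]:.3f}")
```

Under this model the raw-scale SD grows in proportion to the level while the log-scale SD stays roughly constant, which is the pattern the two figures describe.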
  • 19. The T-Statistic
  • 20. The T-Statistic
  • 21. The T-Statistic
  • 22. The T-Statistic: flattened Normal, or t-distribution
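The formulas on slides 19 to 22 are missing from the transcript. As a hedged illustration, the sketch below computes the standard pooled two-sample t-statistic, which matches the "σ1 = σ2? Sort-of YES" assumption on slide 18; whether the slides used exactly this form is not confirmed. The log2 intensities are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group1, group2):
    """Two-sample t-statistic with a pooled variance (equal-variance assumption).

    Returns the statistic and its degrees of freedom, n1 + n2 - 2.
    """
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error, n1 + n2 - 2


# Hypothetical log2 intensities for one gene, 3 replicates per condition.
condition_a = [8.1, 8.3, 7.9]
condition_b = [9.0, 9.4, 9.2]
t, df = pooled_t_statistic(condition_a, condition_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

With only 3 replicates per group the statistic has 4 degrees of freedom, so it is compared against a t-distribution, whose tails are heavier than the Normal's.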
  • 23. A Problem
  • 24. Figure: SD vs mean across 3 replicates, computed for all genes after log-transformation
      • The curve fit here may be a better estimate
      • Lots of false positives can be avoided here
      • Not much difference here
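One way to read slides 23 and 24: with only 3 replicates, some genes get a very small SD by chance, which inflates their t-statistic and produces false positives, and a curve fitted to SD versus mean can give a steadier estimate. The slides do not say which fit was used, so the sketch below swaps in a running median over genes ordered by mean and uses it as a floor for each gene's SD. Both the smoothing method and the function names are assumptions.

```python
from statistics import median


def smoothed_sd(means, sds, window=51):
    """Running median of SD over genes ordered by mean (a stand-in for the curve fit)."""
    order = sorted(range(len(means)), key=lambda i: means[i])
    half = window // 2
    fitted = [0.0] * len(means)
    for rank, gene in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        fitted[gene] = median(sds[j] for j in order[lo:hi])
    return fitted


def floored_sd(means, sds, window=51):
    """Per-gene SD, never smaller than the smoothed SD at a similar mean."""
    fitted = smoothed_sd(means, sds, window)
    return [max(s, f) for s, f in zip(sds, fitted)]


# Hypothetical per-gene summaries; the second gene has a suspiciously small SD.
gene_means = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_sds = [0.2, 0.01, 0.2, 0.2, 0.2]
print(floored_sd(gene_means, gene_sds, window=3))
```

Genes whose own SD already exceeds the fitted value are left unchanged, which corresponds to the "not much difference here" region of the figure.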
  • 25. Thank You
