Introduction to Statistics II

Transcript

• 1. Statistics for Microarray Data
• 2. Background ($\mu$, $\sigma^2$)
  • Few observations made by a black box
  • What is the distribution behind the black box?
  • E.g., with what probability will it output a number bigger than 5?
• 3. Approach
  • Easy to determine with many observations
  • With few observations...
  • Assume a canonical distribution based on prior knowledge
  • Determine the parameters of this distribution using the observations, e.g., mean, variance
• 4. Estimating the mean
• 5. Estimating the variance $\sigma^2$
  • The estimate follows a chi-square distribution (up to scaling) if the original distribution was Normal
• 6. Microarray Data
  • Many genes: 25,000
  • 2 conditions (or more), many replicates within each condition
  • Which genes are differentially expressed between the two conditions?
• 7. More Specifically
  • For a particular gene
    – Each condition is a black box
    – Say 3 observations from each black box
  • Do both black boxes have the same distribution?
    – Assume the same canonical distribution
    – Do both have the same parameters?
• 8. Which Canonical Distribution
  • Use data with many replicates
  • 418.0294, 295.8019, 272.1220, 315.2978, 294.2242, 379.8320, 392.1817, 450.4758, 335.8242, 265.2478, 196.6982, 289.6532, 274.4035, 246.6807, 254.8710, 165.9416, 281.9463, 246.6434, 259.0019, 242.1968
  • Distribution?
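The estimation step on slides 4 and 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `observations` values are hypothetical (the slides give none here), and the estimators used are the usual sample mean and the unbiased sample variance. If the underlying distribution is Normal, $(n-1)s^2/\sigma^2$ follows a chi-square distribution with $n-1$ degrees of freedom.

```python
import statistics

# Hypothetical observations from the "black box" (illustrative values only,
# not taken from the slides).
observations = [4.1, 5.3, 4.8, 6.0, 5.2]

n = len(observations)

# Estimate of mu: the sample mean.
mean_hat = statistics.fmean(observations)

# Estimate of sigma^2: the unbiased sample variance (divides by n - 1).
var_hat = statistics.variance(observations)

# Standard error of the mean, i.e. how uncertain mean_hat is with n points.
std_err = (var_hat / n) ** 0.5

print(f"n = {n}, mean = {mean_hat:.3f}, variance = {var_hat:.3f}, SE = {std_err:.3f}")
```

With only a handful of observations the standard error is large relative to the spread, which is why the slides fall back on an assumed canonical distribution.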
• 9. What is a QQ Plot
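As a rough sketch of what a QQ plot computes, the snippet below pairs the sorted replicate values from slide 8 with the quantiles of a Normal distribution fitted to them. The helper name `qq_points` and the plotting positions `(i + 0.5) / n` are assumptions made for illustration (other conventions exist); the slides do not say how their plots were produced.

```python
import math
from statistics import NormalDist, fmean, stdev

# The 20 replicate values listed on slide 8.
values = [
    418.0294, 295.8019, 272.1220, 315.2978, 294.2242,
    379.8320, 392.1817, 450.4758, 335.8242, 265.2478,
    196.6982, 289.6532, 274.4035, 246.6807, 254.8710,
    165.9416, 281.9463, 246.6434, 259.0019, 242.1968,
]


def qq_points(sample):
    """Return (theoretical, observed) quantile pairs against a fitted Normal.

    Hypothetical helper: plotting positions (i + 0.5) / n are one common choice.
    """
    observed = sorted(sample)
    n = len(observed)
    fitted = NormalDist(fmean(observed), stdev(observed))
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))


raw_points = qq_points(values)                             # raw scale
log_points = qq_points([math.log2(v) for v in values])     # log scale

for theo, obs in raw_points[:3]:
    print(f"theoretical {theo:8.2f}   observed {obs:8.2f}")
```

If the sample really came from the fitted distribution, the pairs fall close to the line y = x; systematic curvature suggests a different distribution or scale.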
• 10. Distribution of log raw intensities across genes on a single array
• 11. The QQ plot of log scale intensities(i.e., actual vs simulated from normal)
• 12. QQ Plot against a Normal Distribution
  • 10 + 10 replicates in two groups
  • Single group QQ plot
  • Combined 2 groups QQ plot
  • Combined log-scale QQ plot
  • Shapiro-Wilk test
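• 13. Which Canonical Distribution
  • Assume a log-normal distribution
• 14. Benford's Law
  • Frequency distribution of the first significant digit $d$
  • $\Pr(d \le x < d+1) = \log_{10}(d+1) - \log_{10}(d)$, where $x \in [1, 10)$ is the value with its power of 10 factored out
  • Holds when $\log_{10}(x)$ is uniformly distributed in $[0, 1]$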
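The first-digit frequencies implied by this formula are easy to tabulate. The following is a minimal sketch; the dictionary name `benford` is just an illustrative choice.

```python
import math

# Probability that the first significant digit equals d, for d = 1..9:
# log10(d + 1) - log10(d), which simplifies to log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for digit, prob in benford.items():
    print(f"first digit {digit}: {prob:.4f}")

# The nine probabilities telescope to log10(10) = 1.
print(f"total: {sum(benford.values()):.4f}")
```

The digit 1 comes first about 30% of the time and the frequencies fall steadily towards 9, which is what one expects when the data are roughly uniform on the log scale.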
• 15. Differential Expression
  • Group 1: $\mu_1, \sigma_1^2$; Group 2: $\mu_2, \sigma_2^2$
  • Is $\mu_1 = \mu_2$?
  • Is $\sigma_1 = \sigma_2$?
  • Is variance a function of mean?
• 16. SD increases linearly with mean
  • Figure: SD vs. mean across 3 replicates, plotted for all genes
• 17. SD is flat now, except for very low values
  • Another reason to work on the log scale
  • Figure: SD vs. mean across 3 replicates, computed for all genes after log-transformation
• 18. Differential Expression
  • Group 1: $\mu_1, \sigma_1^2$; Group 2: $\mu_2, \sigma_2^2$
  • Is $\mu_1 = \mu_2$?
  • Is $\sigma_1 = \sigma_2$? Sort-of YES
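One way to see why the log transform flattens the SD-vs-mean trend is to assume the replicate noise is multiplicative. That is an assumption made here for illustration, not something the slides state, and the `noise_factors` and `levels` values below are hypothetical. Under that assumption the raw-scale SD scales with the expression level, while the log-scale SD does not depend on it.

```python
import math
from statistics import fmean, stdev

# Hypothetical multiplicative noise for 3 replicates (assumption for illustration).
noise_factors = [0.9, 1.0, 1.15]

# Hypothetical true expression levels for three genes.
levels = [100.0, 1000.0, 10000.0]

rows = []
for level in levels:
    raw = [level * f for f in noise_factors]     # replicate intensities
    logged = [math.log2(v) for v in raw]         # same replicates on the log2 scale
    rows.append((fmean(raw), stdev(raw), stdev(logged)))

print("mean(raw)    SD(raw)   SD(log2)")
for mean_raw, sd_raw, sd_log in rows:
    print(f"{mean_raw:9.1f} {sd_raw:10.2f} {sd_log:10.4f}")
```

The raw SD grows tenfold each time the level grows tenfold, whereas the log2 SD is the same for all three genes, so a single variance is a more reasonable assumption after the transformation.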
• 19. The T-Statistic
• 20. The T-Statistic
• 21. The T-Statistic
• 22. The T-Statistic
  • Flattened Normal, or t-distribution
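The formulas on the T-Statistic slides are not part of this transcript, so the sketch below assumes the standard two-sample t-statistic with a pooled variance, which matches the earlier "sort-of yes" answer to whether both groups share the same variance. The function name `pooled_t` and the log-scale values in `group1` and `group2` are hypothetical.

```python
from statistics import fmean, variance


def pooled_t(group1, group2):
    """Two-sample t-statistic assuming equal variances (hypothetical helper).

    Returns the statistic and its degrees of freedom, n1 + n2 - 2.
    """
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = fmean(group1), fmean(group2)
    var1, var2 = variance(group1), variance(group2)

    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return (mean1 - mean2) / std_err, df


# Hypothetical log2 intensities for one gene, 3 replicates per condition.
group1 = [8.1, 8.4, 8.2]
group2 = [9.0, 9.3, 9.1]

t_stat, df = pooled_t(group1, group2)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

With 3 replicates per group there are only 4 degrees of freedom, so the reference distribution is a t-distribution with much heavier tails than the Normal, the "flattened Normal" of slide 22.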
• 23. A Problem
• 24. Figure: SD vs. mean across 3 replicates, computed for all genes after log-transformation
  • The curve fit here may be a better estimate
  • Lots of false positives can be avoided here
  • Not much difference here
• 25. Thank You