
# JHU Job Talk

Jeff Leek's JHU Job Talk from 2009 on surrogate variable analysis.

Published in: Data & Analytics

### JHU Job Talk

1. 1. A General Framework for Multiple Testing Dependence. Jeffrey Leek, Johns Hopkins University School of Medicine
2. 2. Big Picture Ideas. High-dimensional multiple hypothesis testing is common. Problem: Dependence between tests can result in incorrect statistical and scientific results. A solution: Define and address multiple testing dependence at the level of the data, not the P-values.
3. 3. High-Dimensional Multiple Testing Is Common: Spatial Epidemiology, Brain Imaging, Molecular Biology
4. 4. Inflammation and the Host Response to Injury. mRNA expression: ~50,000 genes. Clinical data: >150 clinical variables. Patients: Patient 1, Patient 2, …, Patient 166. MOF measures severity of injury.
5. 5. Data at Initial Time Point Multiple Organ Failure
6. 6. Simple Analysis. 1. Fit the model to the data, $x_i$, for gene $i$: $x_i = a_i + b_i\,\mathrm{MOF} + e_i$. 2. Calculate P-values for testing the hypotheses $H_0: b_i = 0$ vs. $H_1: b_i \neq 0$.
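The two steps of the simple analysis can be sketched in a few lines. This is a minimal illustration, not the code used in the talk: the function name `simple_analysis`, the toy data, and the normal approximation to the P-value (used here only to avoid extra dependencies) are all assumptions.

```python
import math
import numpy as np

def simple_analysis(X, mof):
    """Per-gene least squares for x_i = a_i + b_i * MOF + e_i (rows of X are genes)."""
    m, n = X.shape
    S = np.vstack([np.ones(n), mof])              # 2 x n design: intercept and MOF
    SSt_inv = np.linalg.inv(S @ S.T)
    coef = X @ S.T @ SSt_inv                      # m x 2: columns are a_i and b_i
    resid = X - coef @ S
    sigma2 = (resid ** 2).sum(axis=1) / (n - 2)   # residual variance per gene
    t = coef[:, 1] / np.sqrt(sigma2 * SSt_inv[1, 1])
    # Assumption: normal approximation to the two-sided P-value (reasonable for large n).
    p = np.array([math.erfc(abs(ti) / math.sqrt(2.0)) for ti in t])
    return coef[:, 1], t, p

# Toy data (hypothetical): 200 genes, 100 patients, only gene 0 depends on MOF.
rng = np.random.default_rng(0)
mof = rng.standard_normal(100)
X = rng.standard_normal((200, 100))
X[0] += 2.0 * mof
b_hat, t_stat, p_val = simple_analysis(X, mof)
```

Under Gaussian errors the exact reference distribution is a t distribution with n - 2 degrees of freedom; the normal approximation is a simplification for this sketch.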
7. 7. Four “Replicated” Studies. [Figure: P-value histograms (axes: P-value, frequency) for Phases 1 to 4.]
8. 8. Start With The Whole Data ($m$ hypothesis tests, $n$ observations per test). Data for test $i$: $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$. “Primary variable(s)”: $Y = (y_1, y_2, \ldots, y_n)$. Model: $x_{ij} = a_i + \sum_{k=1}^{d} b_{ik} s_k(y_j) + e_{ij}$. Hypothesis test $i$: $H_{0i}: b_i \in \Omega_0$ vs. $H_{1i}: b_i \in \Omega_1$.
9. 9. Underlying Model: $X = B\,S(Y) + E$ (rows are tests, columns are observations).
10. 10. A Simple Simulated Example. [Figure: simulated data (genes by arrays) with independent E and with dependent E.]
11. 11. Null P-Value Distributions. [Figure: null P-value histograms (axes: P-value, frequency) for independent E and dependent E.]
12. 12. Null P-Value Distributions. [Figure: null P-value histograms (axes: P-value, frequency) for independent E and dependent E, annotated with correlations |ρ| = 0.40, 0.31, 0.10, and 0.00.]
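The contrast between independent and dependent E can be reproduced with a small simulation. The sketch below is an assumption-laden toy, not the simulation from the talk: every gene is null for the two-group comparison, the unmodeled factor `G` is a hypothetical ±1 pattern built to have correlation 0.4 with the group label, and P-values use a normal approximation.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
m, n = 1000, 20
group = np.repeat([0.0, 1.0], n // 2)            # primary variable: two groups of arrays

def null_pvalues(E):
    """Two-group t-statistics on pure-noise data, with normal-approximation P-values."""
    a, b = E[:, group == 0], E[:, group == 1]
    pooled = (a.var(axis=1, ddof=1) + b.var(axis=1, ddof=1)) / 2.0
    t = (b.mean(axis=1) - a.mean(axis=1)) / np.sqrt(pooled * (2.0 / (n // 2)))
    return np.array([math.erfc(abs(ti) / math.sqrt(2.0)) for ti in t])

# Hypothetical unmodeled factor, partially aligned with the group label.
G = np.array([-1.0] * 7 + [1.0] * 3 + [-1.0] * 3 + [1.0] * 7)
gamma = rng.standard_normal((m, 1))              # gene-specific loadings on G

E_ind = rng.standard_normal((m, n))              # independent noise
E_dep = gamma * G + rng.standard_normal((m, n))  # dependent noise: shared factor plus noise

frac_ind = float(np.mean(null_pvalues(E_ind) < 0.05))
frac_dep = float(np.mean(null_pvalues(E_dep) < 0.05))
rho = float(np.corrcoef(G, group)[0, 1])
```

In this toy, `frac_dep` exceeds `frac_ind` because the shared factor leaks into the group comparison, so the null P-values are no longer close to uniform.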
13. 13. Null Distribution Behavior. [Figure: panels for dependent E and independent E.]
14. 14. False Discovery Rate Estimates. [Figure: panels for independent E and dependent E.]
15. 15. Ranking Estimates. [Figure: panels for independent E and dependent E.]
16. 16. When To Address Dependence? Analysis steps: Data $X$ → Fit model $X = BS + E$ → Obtain $\hat{B}$ and $R$ → Form test statistics and null distribution → Calculate P-values → Form P-value threshold.
17. 17. When To Address Dependence? Analysis steps: Data $X$ → Fit model $X = BS + E$ → Obtain $\hat{B}$ and $R$ → Form test statistics and null distribution → Calculate P-values → Form P-value threshold. Existing approaches: empirical null approaches modify the null distribution at the test-statistic level; dependence adjustments conservatively modify the P-value threshold.
18. 18. Examples of Existing Approaches. • Empirical Null: Devlin and Roeder, Biometrics (1999); Efron, JASA (2004); Schwartzman, AOAS (2008). • Error Rate Adjustments: Benjamini and Yekutieli, Annals of Statistics (2001); Romano, Shaikh, and Wolf, Test (2001); Dudoit, Gilbert, and van der Laan, Biometrical Journal (2008).
19. 19. When To Address Dependence? Analysis steps: Data $X$ → Fit model $X = BS + E$ → Obtain $\hat{B}$ and $R$ → Form test statistics and null distribution → Calculate P-values → Form P-value threshold. Our approach: fit the model $X = BS + \Gamma G + U$, where $G$ is a valid dependence kernel.
20. 20. When To Address Dependence? Analysis steps: Data $X$ → Fit model $X = BS + E$ → Obtain $\hat{B}$ and $R$ → Form test statistics and null distribution → Calculate P-values → Form P-value threshold. Our approach: fit the model $X = BS + \Gamma G + U$, where $G$ is a valid dependence kernel. Dependence and bias are no longer present at any of these steps; standard methods can be used.
21. 21. New Dependence Definitions (Leek and Storey, 2008). Definition: data $X$ are population-level multiple testing dependent if […]. Definition: data $X$ are estimation-level multiple testing dependent if […].
22. 22. Structure in E. [Figure: genes by arrays panels labeled MOF1, Signal + Dependent Noise, Dependent Noise, and Independent Noise.]
23. 23. Decomposing E: $X = BS + E$ (data = primary variables + random variation; rows are tests, columns are observations).
24. 24. Decomposing E: $X = BS + H + U$ (data = primary variables + dependent variation + independent variation).
25. 25. Decomposing E: $X = BS + \Gamma G + U$, with $H = \Gamma G$ (data = primary variables + dependence kernel + independent variation).
26. 26. Decomposing E (Leek and Storey, 2008). Theorem: Let the data be distributed according to the model $X = BS + E$. Suppose that for each $e_i$ there is no Borel measurable function $g$ such that $e_i = g(e_1, \ldots, e_{i-1}, e_{i+1}, \ldots, e_m)$ almost surely. Then there exist matrices $\Gamma$ ($m \times r$), $G$ ($r \times n$, $r \le n$), and $U$ ($m \times n$) such that $X = BS + \Gamma G + U$, where the rows of $U$ are independent, $u_i \neq 0$, and $u_i = h_i(e_i)$ for a non-random Borel measurable function $h_i$.
27. 27. Dependence Kernel (Leek and Storey, 2008). Definition (dependence kernel): an $r \times n$ matrix $G$ forms a dependence kernel for the data $X$ if the following equality holds: $X = BS + E = BS + \Gamma G + U$, where the rows of $U$ are independent.
28. 28. Fitting S & G Results In Independent Tests (Leek and Storey, 2008). Theorem: Let $G$ be any valid dependence kernel for the data $X$. Suppose that the model […] is fit by least squares, resulting in residuals […]. If the row space jointly spanned by $S$ and $G$ has dimension less than $n$, then the $r_i$ and the $\hat{b}_i$ are jointly independent given $S$ and $G$, and […].
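A small numerical check illustrates the practical effect of including a dependence kernel in the fit. This is a sketch under assumptions, not the talk's analysis: the data are simulated from $X = BS + \Gamma G + U$ with a known one-row `G` (hypothetical), and the helper names `fit` and `mean_abs_corr` are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 200, 20
S = np.vstack([np.ones(n), np.repeat([0.0, 1.0], n // 2)])       # d x n primary variables
G = np.array([[-1.0] * 7 + [1.0] * 3 + [-1.0] * 3 + [1.0] * 7])  # 1 x n kernel (hypothetical)
B = np.column_stack([np.zeros(m), rng.standard_normal(m)])       # m x d coefficients
Gamma = 2.0 * rng.standard_normal((m, 1))
U = rng.standard_normal((m, n))                                   # independent rows
X = B @ S + Gamma @ G + U

def fit(X, design):
    """Least-squares fit of every row of X on the rows of `design`."""
    coef = np.linalg.lstsq(design.T, X.T, rcond=None)[0].T
    return coef, X - coef @ design

def mean_abs_corr(R):
    """Average absolute correlation between distinct rows (tests) of R."""
    C = np.corrcoef(R)
    off_diagonal = C[~np.eye(C.shape[0], dtype=bool)]
    return float(np.mean(np.abs(off_diagonal)))

coef_s, resid_s = fit(X, S)                       # model without the kernel
coef_sg, resid_sg = fit(X, np.vstack([S, G]))     # model with S and G

mse_s = float(np.mean((coef_s[:, 1] - B[:, 1]) ** 2))
mse_sg = float(np.mean((coef_sg[:, 1] - B[:, 1]) ** 2))
corr_s, corr_sg = mean_abs_corr(resid_s), mean_abs_corr(resid_sg)
```

In this toy, fitting $S$ alone leaves residual rows that are correlated across tests and coefficient estimates that absorb part of $\Gamma G$; adding $G$ to the design reduces both.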
29. 29. A “Blessing” of Dimensionality: $X = BS + \Gamma G + U$ (data = primary variables + dependence kernel + independent variation).
30. 30. Iteratively Reweighted Surrogate Variable Analysis. Whole data: $X = BS + \Gamma G + U$. Test $i$ data: $x_i = b_i S + \gamma_i G + u_i$. 1. Estimate the row dimension, $\hat{r}$, of $G$. 2. Form an initial estimate $\hat{G}^{(0)}$ equal to the first $\hat{r}$ right singular vectors of $R = X - \hat{B}S$. Iterate for $b = 0, \ldots, B$: 3. Estimate […]. 4. Weight the $i$th row of $X$ by […] and set $\hat{G}^{(b+1)}$ to be the first $\hat{r}$ right singular vectors of the weighted matrix.
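The initial estimate in step 2 is fully specified and can be sketched directly. The reweighting loop (steps 3 and 4) is not sketched here because the weight expressions are not legible in this transcript. The function name `initial_surrogate_estimate` and the simulated data are assumptions for illustration.

```python
import numpy as np

def initial_surrogate_estimate(X, S, r):
    """Step 2: first r right singular vectors of the residuals R = X - B_hat S."""
    B_hat = np.linalg.lstsq(S.T, X.T, rcond=None)[0].T    # m x d least-squares fit
    R = X - B_hat @ S
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    return Vt[:r]                                         # r x n

# Toy data (hypothetical): one unmodeled factor G, partially correlated with the group.
rng = np.random.default_rng(4)
m, n = 500, 20
S = np.vstack([np.ones(n), np.repeat([0.0, 1.0], n // 2)])
G = np.array([[-1.0] * 7 + [1.0] * 3 + [-1.0] * 3 + [1.0] * 7])
X = (rng.standard_normal((m, 2)) @ S
     + 2.0 * rng.standard_normal((m, 1)) @ G
     + rng.standard_normal((m, n)))

G0 = initial_surrogate_estimate(X, S, r=1)

# The residual SVD can only see the part of G that is orthogonal to S.
G_perp = G - np.linalg.lstsq(S.T, G.T, rcond=None)[0].T @ S
cosine = abs(float(G0[0] @ G_perp[0])) / float(np.linalg.norm(G_perp[0]))
```

Because residualizing on $S$ removes any component of $G$ that lies in the row space of $S$, the initial estimate recovers only the orthogonal part of $G$, which is one motivation for refining it iteratively.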
31. 31. An Example of the IRW-SVA Algorithm (repeated on slides 31 to 37). [Figure panels: The Data, True G, Estimate of G, Pr(G & !S).]
38. 38. Iteratively Re-weighted Surrogate Variable Analysis. Whole data: $X = BS + \Gamma G + U$. Test $i$ data: $x_i = b_i S + \gamma_i G + u_i$. 1. Estimate the row dimension, $\hat{r}$, of $G$. 2. Form an initial estimate $\hat{G}^{(0)}$ equal to the first $\hat{r}$ right singular vectors of $R = X - \hat{B}S$. Iterate for $b = 0, \ldots, B$: 3. Estimate […]. 4. Weight the $i$th row of $X$ by […] and set $\hat{G}^{(b+1)}$ to be the first $\hat{r}$ right singular vectors of the weighted matrix.
39. 39. Estimating The Row Dimension of G. 1. Buja and Eyuboglu (1992) proposed a permutation approach. 2. Patterson, Price, and Reich (2006) proposed a sequential testing strategy based on Tracy-Widom theory. 3. Leek (in preparation) proposes an eigenvalue estimator that is consistent in the number of tests.
40. 40. Estimating The Row Dimension of G. 1. Assume the data follow $X = BS + \Gamma G + U$, where $G$ and $S$ have row dimensions $r$ and $d$, with $r + d < n$. 2. Calculate the singular values $s_1, \ldots, s_n$ of $X$ and choose $b$ such that $r + d < b$. 3. Calculate the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $\frac{1}{m}\left[R^T R - s_b^2 P\right]$, where $P = I - S(S^T S)^{-1} S^T$ and $R = XP$. 4. Set $\hat{r} = \sum_{j=1}^{n} 1\left(\lambda_j > m^{-1/3}\right)$.
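The four steps of the eigenvalue estimator translate almost line for line into NumPy. The sketch below is an illustration under assumptions: `S` is stored as a $d \times n$ matrix, so the projection is written as $I - S^T (S S^T)^{-1} S$ to make the dimensions agree; the function name and the toy data are hypothetical; and the noise scale is deliberately small because the $m^{-1/3}$ threshold is an asymptotic one.

```python
import numpy as np

def estimate_row_dimension(X, S, b):
    """Count eigenvalues of (1/m)[R^T R - s_b^2 P] that exceed m^(-1/3)."""
    m, n = X.shape
    # Projection onto the orthogonal complement of the row space of S (S is d x n here).
    P = np.eye(n) - S.T @ np.linalg.solve(S @ S.T, S)
    R = X @ P
    s = np.linalg.svd(X, compute_uv=False)        # singular values of X, descending
    M = (R.T @ R - s[b - 1] ** 2 * P) / m         # s_b with the 1-based index of the slide
    lam = np.linalg.eigvalsh(M)
    return int(np.sum(lam > m ** (-1.0 / 3.0)))

# Toy data (hypothetical): d = 2 primary variables, r = 2 kernel rows, low noise.
rng = np.random.default_rng(3)
m, n, d, r = 5000, 20, 2, 2
S = np.vstack([np.ones(n), np.repeat([0.0, 1.0], n // 2)])
G = rng.standard_normal((r, n))
X = (rng.standard_normal((m, d)) @ S
     + rng.standard_normal((m, r)) @ G
     + 0.1 * rng.standard_normal((m, n)))

r_hat = estimate_row_dimension(X, S, b=r + d + 1)
```

With noisier data or fewer tests, the fixed threshold may count noise eigenvalues as well, so this toy setting should not be read as evidence about finite-sample behavior.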
41. 41. Estimating The Row Dimension of G (Leek, in prep.). Theorem: As $m \to \infty$, $\hat{r} = \sum_{j=1}^{n} 1\left(\lambda_j > m^{-1/3}\right)$ is a consistent estimate of the row dimension of $G$, provided that: (1) the $u_{ij}$ are independent; (2) $E[u_{ij}] = 0$; (3) $E[u_{ij}^2] = \sigma_i^2 < M_1$; (4) $E[u_{ij}^4] < M_2$; (5) $\lim_{m \to \infty} \frac{1}{m} \Gamma^T \Gamma$ is positive definite with unique eigenvalues.
42. 42. Iteratively Re-weighted Surrogate Variable Analysis. Whole data: $X = BS + \Gamma G + U$. Test $i$ data: $x_i = b_i S + \gamma_i G + u_i$. 1. Estimate the row dimension, $\hat{r}$, of $G$. 2. Form an initial estimate $\hat{G}^{(0)}$ equal to the first $\hat{r}$ right singular vectors of $R = X - \hat{B}S$. Iterate for $b = 0, \ldots, B$: 3. Estimate […]. 4. Weight the $i$th row of $X$ by […] and set $\hat{G}^{(b+1)}$ to be the first $\hat{r}$ right singular vectors of the weighted matrix.
43. 43. Break The Estimation Into Two Components
44. 44. Estimating the Probability Weights. 1. Form F-statistics $F_1, \ldots, F_m$ for testing the hypotheses: […]. 2. Bootstrap from the conditional null model to obtain null statistics $F_1^{0k}, \ldots, F_m^{0k}$, $k = 1, \ldots, K$. 3. From Bayes' Theorem: […], where $F_i^{0k} \sim g_0$ and $F_i \sim \pi_0 g_0 + (1 - \pi_0) g_1$.
45. 45. Estimating the Probability Weights. 1. Form F-statistics $F_1, \ldots, F_m$ for testing the hypotheses: […]. 2. Bootstrap from the conditional null model to obtain null statistics $F_1^{0k}, \ldots, F_m^{0k}$, $k = 1, \ldots, K$. 3. From Bayes' Theorem: […], where $F_i^{0k} \sim g_0$ and $F_i \sim \pi_0 g_0 + (1 - \pi_0) g_1$. 4. Estimate the ratio of the densities with a non-parametric logistic regression where the $F_i$ are “successes” and the $F_i^{0k}$ are “failures” (Anderson and Blair 1982).
46. 46. Estimating the Probability Weights. 1. Form F-statistics $F_1, \ldots, F_m$ for testing the hypotheses: […]. 2. Bootstrap from the conditional null model to obtain null statistics $F_1^{0k}, \ldots, F_m^{0k}$, $k = 1, \ldots, K$. 3. From Bayes' Theorem: […], where $F_i^{0k} \sim g_0$ and $F_i \sim \pi_0 g_0 + (1 - \pi_0) g_1$. 4. Estimate the ratio of the densities with a non-parametric logistic regression where the $F_i$ are “successes” and the $F_i^{0k}$ are “failures” (Anderson and Blair 1982). 5. Estimate $\pi_0$ according to Storey (2002).
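A short derivation shows why a logistic regression of “successes” against “failures” yields the density ratio that Bayes' Theorem needs. It is a sketch under stated assumptions: the pooled sample holds the $m$ observed statistics and the $mK$ bootstrap null statistics, $g = \pi_0 g_0 + (1 - \pi_0) g_1$ is the marginal density of the observed statistics, $p(F)$ is the fitted probability of a “success” at the value $F$, and $H_{0i}$ denotes the null hypothesis of test $i$. The symbols $g$, $p(F)$, and $H_{0i}$ are notation introduced here, not taken from the slides.

$$
\begin{aligned}
p(F) &= \frac{m\, g(F)}{m\, g(F) + mK\, g_0(F)}
\quad\Longrightarrow\quad
\frac{g_0(F)}{g(F)} = \frac{1 - p(F)}{K\, p(F)}, \\
\Pr(H_{0i} \mid F_i) &= \frac{\pi_0\, g_0(F_i)}{g(F_i)} = \frac{\pi_0 \,\{1 - p(F_i)\}}{K\, p(F_i)} .
\end{aligned}
$$

Plugging in an estimate of $\pi_0$ and the fitted $p(F_i)$ then gives an estimated posterior probability for each test without estimating $g_0$ or $g_1$ separately.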
47. 47. Estimating the Probability Weights. [Figure: estimate of the posterior probability that $b_i \neq 0$.]
48. 48. SVA-Adjusted Analysis. 1. Estimate $G$ with IRW-SVA. 2. Fit […]. 3. Test the hypotheses $H_{0i}: b_i \in \Omega_0$ vs. $H_{1i}: b_i \in \Omega_1$.
49. 49. A Simple Simulated Example. [Figure: simulated data (genes by arrays) with independent E and with dependent E.]
50. 50. Null Distribution Behavior. [Figure: panels for dependent E, independent E, and dependent E + IRW-SVA.]
51. 51. False Discovery Rate Estimates. [Figure: panels for independent E, dependent E, and dependent E + IRW-SVA; axes: true false discovery rate and Q-value.]
52. 52. Ranking Estimates. [Figure: panels for independent E, dependent E, and dependent E + IRW-SVA; axes: ranking by true signal to noise and average ranking by t-statistic.]
53. 53. Inflammation and the Host Response to Injury. mRNA expression: ~50,000 genes. Clinical data: >150 clinical variables. Patients: Patient 1, Patient 2, …, Patient 166. MOF1 measures severity of injury.
54. 54. Four “Replicated” Studies. [Figure: P-value histograms (axes: P-value, frequency) for Phases 1 to 4.]
55. 55. Functional Enrichment Across Phases. [Figure: percent of total significant pathways by the number of phases in which a significant pathway appears (1 of 4, 2 of 4, 3 of 4, 4 of 4), for unadjusted and IRW-SVA-adjusted analyses.]
56. 56. Summary. • High-dimensional hypothesis testing is common. • Dependence between tests can result in incorrect statistical and scientific inference. • We can define and address dependence at the level of the model using the dependence kernel. • IRW-SVA can be used to improve inference in high-dimensional multiple hypothesis testing.
57. 57. Future Work. • Multiple Testing: develop dependence kernel estimates for spatial data; develop diagnostic tests for multiple testing procedures. • High-Dimensional Asymptotics: extend methods for asymptotic SVD to binary data. • Feature Selection for High-Dimensional Classifiers: extensions of top-scoring pairs (TSP) to survival data; theoretical connections to LDA and SVM; embedding TSP in a logic regression framework.
58. 58. Thank You
59. 59. Estimating The Row Dimension of G: Buja and Eyuboglu (1992). 1. Calculate the residuals $R = X - \hat{B}S$. 2. Calculate the singular values of $R$, $d_1, \ldots, d_n$. For $k = 1, \ldots, K$ do steps 3-4: 3. Permute each row of $R$ individually to get $R_0$. 4. Take the SVD of the residuals $R^* = R_0 - \hat{B}_0 S$ to obtain null singular values $d_{i0}^{k}$. 5. Compare $d_i$ to $d_{i0}^{k}$ for $k = 1, \ldots, K$ to calculate a P-value for the $i$th right singular vector.
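The permutation steps above can be sketched as follows. This is an illustrative reading of the slide rather than a reference implementation: the function names, the toy data, and the add-one form of the permutation P-value are assumptions.

```python
import numpy as np

def residualize(X, S):
    """Residuals of a least-squares fit of every row of X on the rows of S."""
    B_hat = np.linalg.lstsq(S.T, X.T, rcond=None)[0].T
    return X - B_hat @ S

def permutation_pvalues(X, S, K=20, seed=0):
    rng = np.random.default_rng(seed)
    R = residualize(X, S)                                     # step 1
    d_obs = np.linalg.svd(R, compute_uv=False)                # step 2
    exceed = np.zeros(d_obs.shape)
    for _ in range(K):
        # Step 3: permute each row on its own, which breaks structure shared across rows.
        R0 = np.array([rng.permutation(row) for row in R])
        # Step 4: re-fit S to the permuted residuals and take null singular values.
        d_null = np.linalg.svd(residualize(R0, S), compute_uv=False)
        exceed += d_null >= d_obs
    # Step 5, with an add-one correction (an assumption) so no P-value is exactly zero.
    return (exceed + 1.0) / (K + 1.0)

# Toy data (hypothetical): one strong unmodeled factor.
rng = np.random.default_rng(5)
m, n = 300, 20
S = np.vstack([np.ones(n), np.repeat([0.0, 1.0], n // 2)])
G = np.array([[-1.0] * 7 + [1.0] * 3 + [-1.0] * 3 + [1.0] * 7])
X = (rng.standard_normal((m, 2)) @ S
     + 2.0 * rng.standard_normal((m, 1)) @ G
     + rng.standard_normal((m, n)))

p = permutation_pvalues(X, S, K=20)
```

In this toy the first singular vector receives the smallest attainable P-value, 1/(K+1), while the second does not, which is consistent with a single row in $G$.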
60. 60. Why Does This Work? Leek and Storey (2007), Leek and Storey (2008). Useful fact: $X = BS + E = BS + \Gamma G + U = BS + \Lambda H + U$ if $G$ and $H$ have the same row space.
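The useful fact can be checked numerically: if $H = AG$ for an invertible $r \times r$ matrix $A$, then $G$ and $H$ span the same space across the $n$ observations, and a least-squares fit on $(S, G)$ gives the same fitted values and the same coefficients on $S$ as a fit on $(S, H)$. The matrices below are arbitrary choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, r = 50, 12, 2
S = np.vstack([np.ones(n), np.repeat([0.0, 1.0], n // 2)])   # d x n primary variables
G = rng.standard_normal((r, n))                              # one dependence kernel
A = np.array([[2.0, 1.0], [0.5, 3.0]])                       # invertible (determinant 5.5)
H = A @ G                                                    # different matrix, same span
X = rng.standard_normal((m, n))

def fit(X, design):
    """Least-squares coefficients and fitted values for every row of X."""
    coef = np.linalg.lstsq(design.T, X.T, rcond=None)[0].T
    return coef, coef @ design

coef_G, fitted_G = fit(X, np.vstack([S, G]))
coef_H, fitted_H = fit(X, np.vstack([S, H]))

same_fit = bool(np.allclose(fitted_G, fitted_H))
same_b = bool(np.allclose(coef_G[:, :2], coef_H[:, :2]))     # coefficients on S
```

This is why an estimate of the kernel only needs to recover the span of $G$, not $G$ itself: inference on $B$ is unchanged by an invertible re-mixing of the kernel rows.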
61. 61. References: • Benjamini Y and Hochberg Y. (1995) “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” JRSSB, 57: 289-300. • De Castro MC, Monte-Mor RL, Sawyer DO, and Singer BH. (2005) “Malaria risk on the amazon frontier.” PNAS, 103: 2452-2457. • Devlin B and Roeder K. (1999) “Genomic control for association studies.” Biometrics, 55: 997-1004. • Efron B. (2004) “Large-scale simultaneous hypothesis testing: The choice of a null hypothesis.” JASA, 99: 96-104. • Leek JT and Storey JD. (2008) “A general framework for multiple testing dependence.” Proceedings of the National Academy of Sciences, 105: 18718-18723. • Leek JT and Storey JD. (2007) “Capturing heterogeneity in gene expression studies by ‘Surrogate Variable Analysis’.” PLoS Genetics, 3: e161. • Taylor JE and Worsley KJ. (2007) “Detecting sparse signals in random fields, with applications to brain mapping.” JASA, 102: 913-928. Thank You
62. 62. Empirical Null. 1. Perform each hypothesis test individually. 2. Obtain the test statistic for each test. 3. Compare the distribution of test statistics to the theoretical null distribution. 4. Adjust the theoretical null so that it matches the observed statistics in a low-signal region.
63. 63. Theoretical Null Efron (2004)
64. 64. Theoretical Null Empirical Null Efron (2004)
65. 65. Empirical Null Results in Incorrect Null Distribution Dep. Kernel
66. 66. With Confounding Empirical Null is Ill-Posed. • Observed statistics or observed P-values come from a mixture distribution: $\pi_0 g_0 + \pi_1 g_1$. • Dependence distorts $g_0$ … can go either way. • Must use the full data set to capture dependence.