Fixing the leaks in the pipeline from public genomics data to the clinic

A talk about improving reproducibility, simplifying genomic machine learning, and using the resulting predictors to improve power in clinical trials.

Published in: Health & Medicine

  1. 1. fixing the leaks in the genomics
  2. 2. http://jhudatascience.org/
  3. 3. https://www.coursera.org/specialization/genomics/41
  4. 4. @simplystats http://simplystatistics.org
  5. 5. @jtleek http://www.jtleek.com
  6. 6. https://www.counsyl.com/
  7. 7. Their basic pitch was “Genomics is a fraud” http://www.technologyreview.com/news/535771/a-contrarian-in-biotech/
  8. 8. “The explosive growth of next-generation sequencing data submitted into the SRA exceeds the growth rate of storage capacity” http://www.ncbi.nlm.nih.gov/pubmed/22009675
  9. 9. 3: cost, analyst variation, motivation
  10. 10. 1: cost
  11. 11. costs money interpretability
  12. 12. http://arxiv.org/pdf/math/0606441.pdf
  13. 13. http://www.ncbi.nlm.nih.gov/pubmed/19276151
  14. 14. @leekgroup
  15. 15. http://www.ncbi.nlm.nih.gov/pubmed/25788628
  16. 16. http://www.ncbi.nlm.nih.gov/pubmed/25788628
  17. 17. [Figure: accuracy (0% to 100%) of PAM scaled, PAM unscaled, and TSP predictors on Agilent/Grade 1, Agilent/Grade 3, Illumina/Grade 1, and Illumina/Grade 3 data] http://www.ncbi.nlm.nih.gov/pubmed/25788628
  18. 18. algorithm: 1. select useful pairs 2. screen pairs for association 3. build a simple CART predictor
  19. 19. http://www.ncbi.nlm.nih.gov/pubmed/19276151
  20. 20. Patil et al. (in prep)
  21. 21. Patil et al. (in prep)
  22. 22. Patil et al. (in prep)
  23. 23. @leekgroup Data: $x_{ik}$ = value for feature $i$, sample $k$; $y_k$ = group indicator for sample $k$. TSP is the $(i,j)$ pair that maximizes $|\widehat{\Pr}(x_{ik} < x_{jk} \mid y_k = 1) - \widehat{\Pr}(x_{ik} < x_{jk} \mid y_k = 0)|$ http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1989150/
  24. 24. @leekgroup $z_{ijk} = 1(x_{ik} < x_{jk})$; $E[z_{ijk} \mid y_k] = a_{0ij} + a_{1ij} y_k$ → $\max |a_{1ij}|$ = TSP. Patil et al. (in prep)
  25. 25. @leekgroup • Not the same as TSP • But $|\hat{a}/\mathrm{s.e.}(\hat{a})| = |\hat{u}/\mathrm{s.e.}(\hat{u})|$, algebraically • “Variance regularized” TSP • $z_{ijk}$ invariant to monotone transformations • Fix parameters → find features. $E[y_k \mid z_{ijk}] = u_{0ij} + u_{1ij} z_{ijk}$. Patil et al. (in prep)
  26. 26. @leekgroup 1. Calculate t-statistic for all pairs 2. Choose top pair (or covariate) 3. Continue for a fixed number of pairs. $E[y_k \mid z_{ijk}] = u_{0ij} + u_{1ij} z_{ijk}$. Patil et al. (in prep)
  27. 27. @leekgroup http://astor.som.jhmi.edu/~marchion//breastTSP.html
  28. 28. @leekgroup [Figure: decision tree over the gene pairs USP7 < RP11-423C15.3, NM_018610 < MTCH1, and RND1 < LGALS14, with Yes/No branches and leaves labeled No Recur or Recur]
  29. 29. @leekgroup
  30. 30. @leekgroup Mammaprint Patil et al. (in prep)
  31. 31. 2: analyst variation
  32. 32. what went wrong? 2 things
  33. 33. what went wrong? transparency: The data/code weren’t reproducible
  34. 34. what went wrong? transparency: There was a lack of cooperation
  35. 35. what went wrong? expertise: They used silly prediction rules (Pr(FEC) = 5/8[Pr(F) + Pr(E) + Pr(C)] – ¼)
  36. 36. what went wrong? expertise: They had study design problems (batch effects)
  37. 37. what went wrong? expertise: Their predictions weren’t locked down. Today: Pr(FEC) = 0.8. Tomorrow: Pr(FEC) = 0.1
  38. 38. At the end of the day the Potti analysis was fully reproducible. The problem is that the analysis was wrong.
  39. 39. @leekgroup http://bit.ly/10vS1yt
  40. 40. @leekgroup http://bit.ly/OgW3xv
  41. 41. @leekgroup Drinkel et al. Organometallics 2013
  42. 42. @leekgroup
  43. 43. @leekgroup
  44. 44. @leekgroup
  45. 45. @leekgroup
  46. 46. http://simplystatistics.tumblr.com/post/19646774024/laws-of-nature-and-the-law-of-patents-supreme-court
  47. 47. 3: motivation
  48. 48. $ (from reducing sample size)
  49. 49. basic idea: randomization isn’t perfect; “rebalance” with baseline covariates; improve estimator precision
  50. 50. Ack Math!!!!
  51. 51. Estimate probability of being in arm given baseline covariates
  52. 52. Calculate initial estimate for each person using each arm model using propensity score weighted logistic regression
  53. 53. Define a covariate as the residual from fitting the arm-level models minus the arm-level means and fit new propensity models
  54. 54. Use these propensities to re-fit WLR from (2), then average predictions to get covariate-adjusted treatment effect
  55. 55. @leekgroup http://astor.som.jhmi.edu/~marchion//breastTSP.html
  56. 56. @leekgroup Age, Tumor Size, Grade: 5.1%; Age, Tumor Size, Grade, ER Status: 4.9%; Mammaprint Risk Category (MRC): 5.4%; Age, Tumor Size, Grade, ER Status, MRC: 7.8%
  57. 57. @leekgroup Age, Tumor Size, Grade: 5.1%; Age, Tumor Size, Grade, ER Status: 4.9%; Mammaprint Risk Category (MRC): 5.4%; Age, Tumor Size, Grade, ER Status, MRC: 7.8%; Age, Tumor Size, Grade, ER Status, TSP: 6.2%
  58. 58. 3: cost, analyst variation, motivation
  59. 59. acknowledgements. Leek group: Prasad Patil, Leo Collado Torres, Abhi Nellore, Claire Ruberman, Jack Fu, Kai Kammers. Collaborators: Michael Rosenblum, Benjamin Haibe-Kains, P.O. Bachant-Winner, Roger Peng
  60. 60. Prasad Patil http://www.biostat.jhsph.edu/~prpatil/
  61. 61. Links https://github.com/leekgroup/sig2trial http://jtleek.com/talks/
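
Slides 23 and 24 define the top scoring pair (TSP) as the feature pair whose ordering differs most between the two groups, and note that this equals the largest regression slope of the ordering indicator on the group label. The following is a minimal sketch of that definition in plain Python, not the speaker's implementation. The function names (`tsp_score`, `top_scoring_pair`, `ols_slope`) and the toy data are made up for illustration.

```python
from itertools import combinations


def tsp_score(x_i, x_j, y):
    """Empirical |Pr(x_i < x_j | y=1) - Pr(x_i < x_j | y=0)| for one feature pair."""
    z = [1 if a < b else 0 for a, b in zip(x_i, x_j)]
    z1 = [zk for zk, yk in zip(z, y) if yk == 1]
    z0 = [zk for zk, yk in zip(z, y) if yk == 0]
    return abs(sum(z1) / len(z1) - sum(z0) / len(z0))


def top_scoring_pair(X, y):
    """Return ((i, j), score) for the pair with the largest score.

    X is a list of features, each a list of per-sample values.
    """
    best_pair, best_score = None, -1.0
    for i, j in combinations(range(len(X)), 2):
        s = tsp_score(X[i], X[j], y)
        if s > best_score:
            best_pair, best_score = (i, j), s
    return best_pair, best_score


def ols_slope(pred, resp):
    """Least-squares slope of resp regressed on pred (with an intercept)."""
    n = len(pred)
    mp, mr = sum(pred) / n, sum(resp) / n
    sxy = sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
    sxx = sum((p - mp) ** 2 for p in pred)
    return sxy / sxx


# Toy data, invented for this example: 3 features, 6 samples.
X = [
    [1.0, 2.0, 1.5, 5.0, 6.0, 5.5],  # feature 0: low in group 0, high in group 1
    [3.0, 3.5, 3.2, 3.1, 3.3, 3.4],  # feature 1: roughly constant
    [0.5, 4.0, 0.7, 4.2, 0.6, 4.1],  # feature 2: not aligned with the groups
]
y = [0, 0, 0, 1, 1, 1]

pair, score = top_scoring_pair(X, y)

# Regress the indicator z on the group label y for the winning pair;
# the absolute slope matches the TSP score.
z = [1 if a < b else 0 for a, b in zip(X[pair[0]], X[pair[1]])]
slope = ols_slope(y, z)
print(pair, score, abs(slope))
```

Because the score depends only on whether one feature is below the other within each sample, it is unchanged by any monotone transformation applied to all values, which is the invariance slide 25 points to. The exhaustive search over pairs is quadratic in the number of features, so this sketch is only practical for small feature sets.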
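
Slide 35 quotes the combination rule Pr(FEC) = 5/8[Pr(F) + Pr(E) + Pr(C)] – ¼ as an example of a silly prediction rule. One possible reading, offered here as an assumption rather than the speaker's stated argument, is that the rule is not guaranteed to return a valid probability. The short check below evaluates it at the two extremes; `combined_rule` is a hypothetical helper name.

```python
def combined_rule(p_f, p_e, p_c):
    """The rule quoted on slide 35: 5/8 * [Pr(F) + Pr(E) + Pr(C)] - 1/4."""
    return 5 / 8 * (p_f + p_e + p_c) - 1 / 4


# Evaluate at the extremes of the three input probabilities.
low = combined_rule(0.0, 0.0, 0.0)   # all three inputs are 0
high = combined_rule(1.0, 1.0, 1.0)  # all three inputs are 1
print(low, high)  # -0.25 1.625
```

Both values fall outside the interval [0, 1], so for some inputs the rule's output cannot be interpreted as a probability.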
