
Bioinformatics workflows and study design

Overview of the steps involved in bioinformatics and the impact of batch effects on study design.

Published in: Data & Analytics

  1. 1. Considerations in bioinformatics analyses and study design Elana J Fertig Johns Hopkins University
  2. 2. Why study design and bioinformatics pipelines?
  3. 3. Let’s team up to avoid painful power calculation discussions • https://www.youtube.com/watch?v=PbODigCZqL8
  4. 4. Is there a boundary between standard bioinformatics and AI/data science?
  5. 5. Data science and statistics are a continuum that must work together for best analyses
  6. 6. How does bioinformatics work?
  7. 7. When to contact the bioinformatician? “To call in the statistician after the experiment is done is no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.” Ronald Fisher
  8. 8. Why? Am I wasting your time? Never. Are you really just a control freak? Ok, maybe. The GIGO principle of computer science: Garbage In Garbage Out
  9. 9. Best analyses come from good data cleaning and study design
  10. 10. Considerations for study design • Sample preparation impacts which technologies you can use • Biological hypothesis should drive technology selection • “Off-label” use of technologies impacts technology protocols (e.g., TCR sequence, virus, or splice variant detection from bulk or scRNA-seq) • Design the study to anticipate technical artifacts that may impact data quality (e.g., library, sequencing run, batch, technician, date of processing, age of sample, etc.). Measure twice, cut once
  11. 11. Core coordination minimizes off-label analysis costs Off-label informatics tools work on raw data
  12. 12. Published bladder cancer microarray data set Leek et al. 2010
  13. 13. Even large consortia datasets like TCGA have batch effects Fortin et al., 2014
  14. 14. Design studies to avoid confounding technical artifacts and biological covariates
  15. 15. Batch effects change the correlation structure between genes Leek et al. 2010
  16. 16. Batch effects change the correlation structure between genes Leek et al. 2010
  17. 17. Study design and data cleaning are the most critical parts of any analysis
  18. 18. We can mathematically correct for known batch effects in data with good study designs
  19. 19. We can correct for batch effects if we know they are there
  20. 20. Recognizing confounded designs • Trial Arm A in one batch and trial Arm B in another • Pre-treatment in one batch and post-treatment in another • Responders in one batch and non-responders in another • Designs can get complicated. E.g., what do you do if you have multiple tissue sites from multiple individuals and you want to compare both site and individual differences? We love to help during design!
  21. 21. What should we do? Leek et al. 2010
  22. 22. Bioinformatics as a team sport and best practices • Early consultation for sample preparation, technology selection, and study design • Interactive collaboration during data preprocessing and cleaning • Reproducible scripts to include as manuscript supplements or online to document analysis steps • Open source software for dissemination of any new algorithms employed in analysis
  23. 23. Summary • It is never too early to contact your friendly neighborhood bioinformatician, and we can consult on • Sample preservation • Technology selection • Study design • Analysis plan and preprocessing • Data parasiting • Coordinated collaboration in the data generation process and with the sequencing core minimizes costs and maximizes data quality
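
The confounded designs listed on slide 20 (for example, Trial Arm A in one batch and Trial Arm B in another) can often be caught before any samples are processed by cross-tabulating batch against the biological group. The following is a minimal sketch, not a method from the slides: the function names and the `(batch, group)` sample layout are assumptions made for illustration, and it only flags the simplest case, where a batch contains a single group.

```python
from collections import defaultdict


def crosstab(samples):
    """Count samples per batch and biological group.

    `samples` is a hypothetical layout: an iterable of (batch, group) pairs.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for batch, group in samples:
        counts[batch][group] += 1
    return {batch: dict(groups) for batch, groups in counts.items()}


def single_group_batches(samples):
    """Return the batches that contain only one biological group.

    Within such a batch, batch and group cannot be told apart.
    """
    samples = list(samples)
    all_groups = {group for _, group in samples}
    if len(all_groups) < 2:
        return []  # nothing to compare, so nothing to confound
    table = crosstab(samples)
    return sorted(b for b, groups in table.items() if len(groups) == 1)


def is_fully_confounded(samples):
    """True when there are several batches and each holds a single group."""
    samples = list(samples)
    table = crosstab(samples)
    return len(table) > 1 and len(single_group_batches(samples)) == len(table)


# Arm A processed entirely in batch1 and Arm B entirely in batch2.
confounded = [("batch1", "A")] * 3 + [("batch2", "B")] * 3

# Both arms represented in both batches.
balanced = [("batch1", "A"), ("batch1", "B"), ("batch2", "A"), ("batch2", "B")]

print(is_fully_confounded(confounded))
print(is_fully_confounded(balanced))
```

A check like this only covers a single known technical variable. More complicated designs, such as multiple tissue sites from multiple individuals, still need a discussion with a bioinformatician or statistician during planning.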
