Next Generation Sequencing Informatics - Challenges and Opportunities

Genetic data is the foundation of precision medicine. Next Generation Sequencing (NGS) enables us to obtain whole-genome data at an affordable cost. How can we process huge amounts of NGS data effectively?

Published in: Health & Medicine

  1. 1. Genome Insight. Inside Genome. Next Generation Sequencing Informatics - Challenges and Opportunities. Chung-Tsai Su, Ph.D., CTO, Atgenomix. 2017/03/16 @TMU
  2. 2. Questions Before My Talk (1/3): Q: How many of you have your own “genetic” data? http://tools.thermofisher.com/content/sfs/prodImages/high/GeneChip_generic_microarray_300dpi_white.jpg https://img.buzzfeed.com/buzzfeed-static/static/2016-10/26/13/campaign_images/buzzfeed-prod-fastlane01/23andme-anne-wojcicki-next-generation-sequencing-2-24817-1477502838-3_dblbig.jpg http://www.kenkon.com.tw/data/editor/images/edf08288b46f7acb4a26ffca9a8c1d82.jpg Confidential - Anome internal use only. © 2016 Anome, Inc.
  3. 3. Right to Know and Freedom to Choose: http://www.fashiongonerogue.com/angelina-jolie-movies-style-photos/2/
  4. 4. Questions Before My Talk (2/3): Q: How many of you have heard of Next-Generation Sequencing (NGS)? http://www.anthonybaldor.com/thoughts-and-notes/bioblog/next-generation-sequencing/
  5. 5. Questions Before My Talk (3/3): Q: How many of you have heard of Spark? http://vignette2.wikia.nocookie.net/vsbattles/images/4/49/Spocks.png/revision/latest?cb=20160501151337 http://spark.apache.org/images/spark-logo-trademark.png
  6. 6. My Logical Thinking: Precision Medicine, Human Genome, Next Generation Sequencing, Big Data Technology
  7. 7. About Me: Education: 1994-1998 NTNU ICE, Bachelor; 1998-2000 NTU CSIE, Master; 2000-2007 NTU CSIE, Ph.D. Experience: 2000-2005 Avamax, Engineer; 2007-2008 NTU, Post Doc.; 2008-2015 Trend Micro, Big Data Architect; 2015-now Atgenomix, CTO & Cofounder
  8. 8. Agenda: Precision Medicine; Technology (Next Generation Sequencing, Data Science, Big Data Technology); Challenges; Opportunities; Lessons Learned. http://i2.kym-cdn.com/photos/images/newsfeed/000/653/558/88e.jpg
  9. 9. Precision Medicine
  10. 10. Medicine Today in the US: https://www.washingtonpost.com/news/to-your-health/wp/2016/05/03/researchers-medical-errors-now-third-leading-cause-of-death-in-united-states/?utm_term=.066000857138
  11. 11. Precision Medicine Initiative: Most medical treatments are designed for the "average patient", a "one-size-fits-all" approach that is successful for some patients but not for others.
  12. 12. Imprecision Medicine: http://www.nature.com/news/personalized-medicine-time-for-one-person-trials-1.17411 (high cholesterol) (arthritis) (schizophrenia) (heartburn) (depression) (asthma) (psoriasis) (Crohn's disease) (multiple sclerosis) (neutropenia) http://www.fda.gov/downloads/ScienceResearch/SpecialTopics/PersonalizedMedicine/UCM372421.pdf
  13. 13. The 1000 Genomes Project: http://www.nature.com/nature/journal/v526/n7571/full/nature15393.html
  14. 14. Next Generation Sequencing (NGS)
  15. 15. Cost per Genome: https://www.genome.gov/images/content/costpergenome2015_4.jpg Milestones: Next Generation Sequencing (NGS) debuted; Illumina HiSeq X10 debuted; Human Genome Project (HGP) completed; Precision Medicine Initiative announced.
  16. 16. Illumina Product: http://www.illumina.com/content/dam/illumina-marketing/documents/products/brochures/brochure_sequencing_systems_portfolio.pdf
  17. 17. The First $1,000 Genome: http://systems.illumina.com/systems/hiseq-x-sequencing-system.html
  18. 18. Expectation of Data Processing Power for Illumina HiSeq X Ten
      • A cluster of 10 HiSeq X instruments.
      • Capable of sequencing up to 18,000 whole human genomes each year.
      • Has a run cycle of ~3 days and produces ~150 genomes per run cycle.
      • Running the industry-standard BWA+GATK analysis pipeline on a reasonably high-end compute server (dual Intel Xeon E5-2697v2 CPU, 12 cores, 2.7 GHz, 96 GB DRAM) takes ~24 hours per genome.
      • To achieve the required throughput of 150 genomes every three days, at least 50 of these servers are required.
      • Keeping pace without such a server farm means meeting a target of ~28 minutes per genome (3 days / 150 genomes) for the mapping, aligning, sorting, de-duplication and variant calling.
  19. 19. NGS 101: https://www.broadinstitute.org/gatk/img/cartoon-blackbox-workflow-web-blackblue.png Wet Lab, Dry Lab
  20. 20. GATK Best Practice: http://cdn.vanillaforums.com/gatk.vanillaforums.com/FileUpload/eb/44f317f8850ba74b64ba47b02d1bae.png How do we analyze 4 to 5 million variants?
  21. 21. Read Mapping: http://www.nature.com/nrg/journal/v13/n1/fig_tab/nrg3117_F1.html
  22. 22. Variant Calling: http://www.clcsupport.com/clcgenomicsworkbench/754/SNP-example.png
  23. 23. Data Science
  24. 24. Data Science: https://media.licdn.com/mpr/mpr/p/5/005/06d/041/02978e8.jpg
  25. 25. Data Scientist: https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ http://www.marketingdistillery.com/wp-content/uploads/2014/11/mds_f.png http://buzzorange.com/techorange/2012/10/05/data-scientists-the-definition-of-sexy/
  26. 26. The Three Facets of Data Science: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  27. 27. The Three Facets of Precision Medicine: Clinic, Data Science, Precision Medicine
  28. 28. Big Data Technology
  29. 29. 4V: Velocity, Volume, Variety, Veracity. Volume: MB, GB, TB, PB. Velocity: batch, periodic, near real-time, real-time.
  30. 30. Scale-Up vs. Scale-Out: Vertical scaling (bigger nodes): a more expensive server (big memory, many CPU cores). Horizontal scaling (more nodes): many commodity nodes.
  31. 31. Hadoop: HDFS, Spark, YARN https://www.tutorialspoint.com/hadoop/hadoop_introduction.htm
  32. 32. Map/Reduce: http://railscarma.com/wp-content/uploads/2015/02/graphics1.gif
  33. 33. An Example of Word Count: http://7xjbdi.com1.z0.glb.clouddn.com/word-count-as-mapreduce.png
  34. 34. Performance Comparison (Method: Time in hours, Note)
      • Single-thread GATK process: 16.60, single node
      • 20-thread GATK process: 5.49, single node
      • 40-thread GATK process: 5.48, single node
      • SeqsLab Piper with 40 cores (GATK): 1.20, 9 nodes
      • SeqsLab Piper with 80 cores (GATK): 0.99, 9 nodes
      * Measured on sample NA12878.
  35. 35. Challenges
  36. 36. NGS 102: ~3 days for 150 genomes per run.
      • FASTQ: 100 GB / sample (30X)
      • Read Mapping: ~12 hours / sample (using BWA-MEM, 20 threads)
      • BAM: 100 GB / sample (30X)
      • Variant Calling: ~70 hours / sample (using GATK HaplotypeCaller, single thread)
      • VCF: 5 GB / sample
      • Annotation: ~3 hours / sample (using Annovar)
      • VCF: 10 GB / sample
      • How do we analyze 5 million variants? ∞ hours / sample
  37. 37. Challenges: Wet Lab; Dry Lab (Read Mapping, BAM, Variant Calling, Annotation)
      • Hard to screen variants efficiently
      • Hard to identify causal variants effectively
      • Sample purification
      • Capture capability
      • Hard to distinguish variants from sequencing errors
      • Hard to detect structural variants
      • Hard to provide sufficient evidence
      • Hard to deal with database errors
      • Sequencing error
      • Poor performance in repeat and low-complexity regions
      • Pseudogenes
      • Short read length
      • Long turn-around time
  38. 38. Sequencing Error: Dr. Watson (discoverer of the structure of DNA in 1953); < 0.1%; ~1%; Chimp (closest species to human); Sequencing Error = ~1%; Dr. Su (cofounder of Atgenomix in 2015); ~0.1%
  39. 39. ACMG Standard and Guidelines: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4544753/
  40. 40. ACMG Evidence Framework: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4544753/ Rule set:
  41. 41. Annotation Database: Population Disease LOVD ENCODE 1000 Genomes (phase III) ESP6500 dbSNP ExAC DGV YanHuang CLINVAR COSMIC DVD OMIM ARVC Chrominum COL4A Coloncancer EahadcoagulationFacator Eurowabb Eye Globin Mendelian Mismatchrepairegenes Monogenicdiabetes Musculardystrophy OI Parkinson RB1 RetinalHearing Shared1 TSC VUMC Xchromsome Zjucggm DHS H3K27AC H3K4ME1 H3K4ME3 H3K9AC TFBS Functional Genome Context dbNSFP dbSCSNV Macarthuretal CGD dbNSFP (gene) GENCODE GeneOntology GWAS Haploreg HapmapGF HapmapLD
  42. 42. Opportunities
  43. 43. GATK Best Practice (1/2): https://software.broadinstitute.org/gatk/best-practices/
  44. 44. GATK Best Practice (2/2): http://cdn.vanillaforums.com/gatk.vanillaforums.com/FileUpload/eb/44f317f8850ba74b64ba47b02d1bae.png
  45. 45. But … https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php
  46. 46. SNP and Indels: http://cdn.vanillaforums.com/gatk.vanillaforums.com/FileUpload/eb/44f317f8850ba74b64ba47b02d1bae.png
  47. 47. Types of Structural Variation: http://www.nature.com/nmeth/journal/v9/n2/full/nmeth.1858.html
  48. 48. Detection Methods of Structural Variation
  49. 49. Structural Variation from 1000Genomes (1/2) (Variant Type: Caller)
      • Deletion (DEL): GenomeStrip, Breakdancer, CNVnator, Delly, Variation Hunter, UWash RD, Pindel (short deletions)
      • Multiple Copy Number Variation (mCNV): UWash SSL, GenomeStrip
      • Duplications (DUP): Delly, UWash RD, GenomeStrip
      • Inversions (INV): Delly
      • Mobile Element Insertions (MEI): MELT
      • Mitochondrial Insertions (NUMT): Dinumt
  50. 50. Structural Variation from 1000Genomes (2/2): http://www.nature.com/nature/journal/v526/n7571/full/nature15394.html
  51. 51. $1,000 Whole Genome Sequencing (WGS): https://www.technologyreview.com/s/600950/for-999-veritas-genetics-will-put-your-genome-on-a-smartphone-app/
  52. 52. Why WGS? http://www.nature.com/nature/journal/v511/n7509/full/nature13394.html
  53. 53. $100 Genome: http://www.bio-itworld.com/2017/1/9/illumina-launches-novoseq-sequencers-aiming-replace-1900-sequencers.aspx
  54. 54. Old Drugs, New Uses: http://www.bbc.com/news/health-39253537
  55. 55. Conclusions
  56. 56. Paradigm Shift: https://www.slideshare.net/TWilckens/inn-ventis-precision-medicine2014
  57. 57. Vision of Precision Medicine
  58. 58. Next-Generation Biology: http://journals.plos.org/plosbiology/article?id=10.1371%2Fjournal.pbio.2002050 http://wp.sanger.ac.uk/barrettgroup/files/2012/08/expLab.jpg
  59. 59. Graph Genome: https://www.sevenbridges.com/graph/ https://vimeo.com/184983995
  60. 60. Back to The Beginning: http://www.currencyfundgroup.com/2015/03/07/scientists-are-developing-ways-to-edit-the-dna-of-tomorrows-children/
  61. 61. Q&A
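
Note on slide 18: the server count and the per-genome time budget quoted there follow from simple arithmetic on the slide's own figures (150 genomes per ~3-day run cycle, ~24 hours per genome on one server). The sketch below reproduces that arithmetic. The function names are illustrative and not from the talk, and a 365-day year of back-to-back run cycles is an assumption.

```python
import math


def servers_needed(genomes_per_cycle, cycle_days, hours_per_genome):
    """Servers required so one cycle's genomes are analyzed within that cycle."""
    # How many genomes a single server can finish during one run cycle.
    genomes_per_server = (cycle_days * 24) / hours_per_genome
    return math.ceil(genomes_per_cycle / genomes_per_server)


def minutes_per_genome(genomes_per_cycle, cycle_days):
    """Time budget per genome if genomes are processed one after another."""
    return cycle_days * 24 * 60 / genomes_per_cycle


def genomes_per_year(genomes_per_cycle, cycle_days, days_per_year=365):
    """Yearly output, assuming run cycles follow each other without gaps."""
    return genomes_per_cycle * days_per_year / cycle_days


if __name__ == "__main__":
    # Figures taken from slide 18.
    print(servers_needed(150, 3, 24))
    print(minutes_per_genome(150, 3))
    print(genomes_per_year(150, 3))
```

With the slide's figures this gives 50 servers, a budget of 28.8 minutes per genome, and 18,250 genomes per year, in line with the slide's "at least 50", "~28 minutes" and "up to 18,000".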

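Note on slides 32 and 33: the slides show Map/Reduce and its word-count example only as images. The sketch below illustrates the same idea in plain Python, as an in-process imitation of the map, shuffle and reduce phases; it does not use the Hadoop or Spark APIs. The function names and the sample sentences are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted counts by their key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical input; in a real cluster each line could sit on a different node.
    sample = ["deer bear river", "car car river", "deer car bear"]
    print(word_count(sample))
```

Because each map call looks at one line and each reduce call at one word, both phases can be spread over many commodity nodes, which is the scale-out property slide 30 refers to.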
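
Note on slide 34: the table lists wall-clock times only. A rough way to read it is to compute the speedup over the single-thread run and divide by the number of workers. The sketch below does that with the slide's numbers. Treating threads and cores as comparable workers, and using the single-thread GATK run as the baseline for the SeqsLab Piper rows, are assumptions made here, not claims from the talk.

```python
BASELINE_HOURS = 16.60  # single-thread GATK process, from slide 34

# (label, wall-clock hours, threads or cores) as listed on slide 34
RUNS = [
    ("20-thread GATK, single node", 5.49, 20),
    ("40-thread GATK, single node", 5.48, 40),
    ("SeqsLab Piper, 40 cores, 9 nodes", 1.20, 40),
    ("SeqsLab Piper, 80 cores, 9 nodes", 0.99, 80),
]


def speedup(baseline_hours, hours):
    """How many times faster a run is than the baseline."""
    return baseline_hours / hours


def efficiency(baseline_hours, hours, workers):
    """Speedup per worker; 1.0 would mean perfect linear scaling."""
    return speedup(baseline_hours, hours) / workers


if __name__ == "__main__":
    for label, hours, workers in RUNS:
        s = speedup(BASELINE_HOURS, hours)
        e = efficiency(BASELINE_HOURS, hours, workers)
        print(f"{label}: {s:.1f}x speedup, {e:.0%} parallel efficiency")
```

Read this way, both single-node runs come out at roughly 3x regardless of thread count, while the distributed runs reach roughly 14x and 17x.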