NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)


Course: Bioinformatics for Biomedical Research (2014).
Session: 2.1.2- Next Generation Sequencing. Technologies and Applications. Part II: NGS Applications I.
Statistics and Bioinformatics Unit (UEB) & High Technology Unit (UAT) from Vall d'Hebron Research Institute, Barcelona.

Published in: Science, Technology

  1. 1. Vall d’Hebron Institut de Recerca (VHIR). Rosa Prieto, Head of the High Tech Unit. 15/05/2014. Health research institute accredited by the Instituto de Salud Carlos III (ISCIII). NEXT GENERATION SEQUENCING TECHNOLOGIES AND APPLICATIONS. COURSE OF BIOINFORMATICS FOR BIOMEDICAL RESEARCH
  3. 3. NGS applications: amplicon sequencing; targeted DNA resequencing; exome sequencing; whole genome sequencing; metagenomics; RNA sequencing; targeted RNA resequencing; epigenomics; sequencing of free DNA-RNA (plasma/serum)
  4. 4. Considerations to use NGS
     - What do I want to sequence? Whole genome, exome, several genes, metagenome, epigenome, RNA-seq...
     - How many samples?
     - Length of read required?
     - Quality and quantity of starting material?
     - Size of nucleic acids to sequence
     - Amount of sequence needed: coverage
     (Depth of) coverage: how many times a particular base is sequenced. 30x = each base has been read by 30 sequences (on average). Depth of coverage = (number of reads × read length) / size of target genome.
     (Breadth of) coverage: fraction of the target sequence that has been covered (at a given depth).
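The depth-of-coverage formula on this slide can be worked through in a few lines. The sketch below is only an illustration: the function names and the example numbers (1,000,000 reads of 150 bases over a 5,000,000-base target) are hypothetical and do not come from the course material.

```python
import math


def depth_of_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Average depth = (number of reads * read length) / size of target."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return n_reads * read_length / target_size


def reads_needed(depth: float, read_length: int, target_size: int) -> int:
    """Invert the formula: reads required to reach a desired average depth."""
    return math.ceil(depth * target_size / read_length)


# Hypothetical example values, chosen only to illustrate the arithmetic.
print(depth_of_coverage(1_000_000, 150, 5_000_000))  # 30.0
print(reads_needed(30, 150, 5_000_000))              # 1000000
```

Both functions give averages only; they say nothing about how evenly the reads are spread over the target, which is what breadth of coverage describes.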
  5. 5. Considerations to use NGS. Which depth of coverage do I need? It is an empirical value that depends on the objective of the study and its particular conditions (consensus values may exist).
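One rough way to reason about the required depth is to model per-base depth as Poisson distributed around the average. This is an assumption made here for illustration, not something stated in the slides: it presumes reads land uniformly at random and ignores capture bias, GC content and repeats, so real data will usually look worse. The function name and the thresholds below are hypothetical.

```python
import math


def fraction_covered(mean_depth: float, min_depth: int = 1) -> float:
    """P(X >= min_depth) for X ~ Poisson(mean_depth).

    Under the uniform-sampling assumption this approximates the breadth of
    coverage at a given minimum depth.
    """
    p_below = sum(
        math.exp(-mean_depth) * mean_depth**k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - p_below


# Compare a low-coverage design (5x) with a typical 30x design.
for depth in (5, 30):
    at_least_1 = fraction_covered(depth, min_depth=1)
    at_least_10 = fraction_covered(depth, min_depth=10)
    print(f"{depth}x: >=1 read {at_least_1:.4f}, >=10 reads {at_least_10:.4f}")
```

The comparison shows why the answer depends on the goal of the study: an average depth that touches almost every base at least once can still leave many bases below the depth needed for confident variant calls.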
  6. 6. Amplicon sequencing: viral quasispecies. HCV, HBV and HIV virus populations have special characteristics:
     • In an infected patient the virus population shows high rates of mutation and replication. It is a complex mixture of different mutants.
     • Goal of the study: detection and quantification of mutations, or combinations of mutations, that could confer resistance to viral inhibitors in samples from infected patients.
     • Special interest in mutations present at a low rate (minor variants).
  7. 7. Amplicon sequencing: viral quasispecies. WHY IS NGS APPROPRIATE FOR THIS KIND OF STUDY?
     • Minor variants often play an important role in the development of resistance to antiviral treatments, even if they are present at a very low percentage in the population.
     • Minor variants may not be detected by classical sequencing methods: you obtain hundreds of sequences with much effort and at high cost.
     • NGS detects variants present at a very low rate efficiently: you obtain thousands of sequences at relatively low cost.
     454 technology is the most appropriate method in this particular case (long sequences are achieved).
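The contrast between "hundreds" and "thousands" of sequences can be made concrete with a simple binomial model. This is a sketch under stated assumptions: each read is treated as an independent draw from the viral population, and sequencing and PCR errors, which in practice limit how low a variant can be called reliably, are ignored. The 1% variant frequency and the 5-read threshold are hypothetical values.

```python
import math


def prob_detect(n_reads: int, variant_freq: float, min_reads: int = 1) -> float:
    """P(at least min_reads of n_reads carry the variant), binomial model."""
    p_below = sum(
        math.comb(n_reads, k)
        * variant_freq**k
        * (1 - variant_freq) ** (n_reads - k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


# A minor variant at 1% frequency, requiring at least 5 supporting reads.
for n in (100, 5000):
    print(n, "reads:", round(prob_detect(n, 0.01, min_reads=5), 3))
```

With about a hundred sequences the variant is rarely seen often enough to be called, while with several thousand it is seen almost every time.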
  8. 8. Targeted sequencing using gene panels: array-based capture system; liquid capture system
  9. 9. Targeted sequencing using gene panels: Illumina; Ion Torrent
  10. 10. Considerations that affect capture efficiency
     - Quality and quantity of input DNA
     - Repeat elements, tandem repeats and pseudogenes: uneven distribution of coverage
     - Extreme GC content: 5’UTR, first exons of genes, promoter regions
     - Library insert length and its distribution:
        • Different capture platforms recommend different sets of standard practices for sample library preparation.
        • As a result of these underlying chemistries, each platform has its own range of recommended fragment sizes. Agilent insert size ranges from 100 to 300 bp, Nimblegen from 150 to 250 bp, and TruSeq has the broadest range, 300 to 500 bp.
     - Consistent laboratory procedures.
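The recommended insert sizes quoted above can be turned into a quick lookup. This is a minimal sketch: the ranges are the ones given on the slide, while the function name and the choice to treat both ends of each range as inclusive are assumptions made here.

```python
# Recommended library insert size ranges in bp, as quoted on the slide.
RECOMMENDED_INSERT_BP = {
    "Agilent": (100, 300),
    "Nimblegen": (150, 250),
    "TruSeq": (300, 500),
}


def compatible_platforms(insert_size_bp: int) -> list[str]:
    """Return capture platforms whose recommended range includes the insert size.

    Range ends are treated as inclusive (an assumption of this sketch).
    """
    return [
        name
        for name, (low, high) in RECOMMENDED_INSERT_BP.items()
        if low <= insert_size_bp <= high
    ]


print(compatible_platforms(200))  # ['Agilent', 'Nimblegen']
print(compatible_platforms(400))  # ['TruSeq']
```

A real library has a distribution of insert sizes rather than a single value, so in practice the whole distribution should be compared with the recommended range.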
  11. 11. Sequence capture for cancer genomics
  12. 12. Exome vs. whole genome sequencing
     PROS:
     • Enabling technologies: NGS machines, open-source algorithms, capture reagents, lowering cost, big sample collections
     • Exomes are more cost effective (less sequencing for the same coverage): human genome 3.2 Gb vs. human exome approx. 50 Mb (1-2% of the genome)
     • Simplified bioinformatics analysis compared to whole genomes
     CHALLENGES:
     • Still can’t interpret many Mendelian disorders
     • Rare variants need large sample sizes
     • Exome might miss regions of interest (e.g. novel non-coding genes)
     • Exome reagents do not capture all exons
     • Interpretation of clinical data is sometimes unsuccessful
     Shendure, Genome Biol 2011
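The "less sequencing for the same coverage" argument follows directly from the two target sizes on the slide. The sketch below uses those sizes; the 30x depth is a hypothetical choice, and the calculation assumes every sequenced base falls on target, which capture-based exome sequencing does not achieve in practice.

```python
GENOME_BP = 3_200_000_000  # human genome, 3.2 Gb (from the slide)
EXOME_BP = 50_000_000      # human exome, approx. 50 Mb (from the slide)


def bases_required(target_bp: int, depth: float) -> float:
    """Total sequenced bases needed for a given average depth over a target."""
    return target_bp * depth


depth = 30  # hypothetical target depth
genome_bases = bases_required(GENOME_BP, depth)
exome_bases = bases_required(EXOME_BP, depth)

print(f"exome as fraction of genome: {EXOME_BP / GENOME_BP:.2%}")  # 1.56%
print(f"whole genome at {depth}x: {genome_bases / 1e9:.1f} Gb")      # 96.0 Gb
print(f"exome at {depth}x: {exome_bases / 1e9:.1f} Gb")              # 1.5 Gb
print(f"ratio: {genome_bases / exome_bases:.0f}x")                   # 64x
```

The 1.56% result is consistent with the "1-2% of the genome" figure quoted on the slide.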
  13. 13. Exome sequencing workflow (figure; one step is labelled emPCR)
  14. 14. Illumina exome sequencing. Kits: Nimblegen EZ capture; Agilent SureSelect; Raindance; ... Sequencers (figure)
  15. 15. Ion exome sequencing
  16. 16. Whole genome sequencing: de novo sequencing; resequencing
  17. 17. Whole genome sequencing (shotgun): sequenced reads, contigs, scaffolds, mapped scaffolds, genome map. Long reads (454, PacBio, PE Illumina reads)
  18. 18. Whole genome sequencing. Sequencing of the bacterial strain E. coli O104:H4 with GS Junior, MiSeq and PGM.
     1. Creation of a reference assembly (Roche GS FLX+ shotgun + 8 kb PE, coverage 32x). It contains 1 chromosome (5.3 Mb) and 2 plasmids. 153 gaps remain, corresponding to unresolved repetitive regions.
     2. Sequencing of the same strain using:
        • 2 runs of the 454 GS Junior
        • 2 316 chips of the Ion Torrent PGM
        • 1 run of the MiSeq (2x150 bases)
     Performance comparison of benchtop high-throughput sequencing platforms. Nat. Biotechnol. 30 (5): 434-441 (2012)
  19. 19. Whole genome sequencing. Conclusions: “One important conclusion from this evaluation is that saying that one has ‘sequenced a bacterial genome’ means different things on different benchtop sequencing platforms”
     Comparison (MiSeq / GS Junior / IonTorrent):
     • Throughput/run: the highest / the lowest / the fastest
     • Errors: the lowest / intermediate (indels) / many, especially in homopolymers
     • Read length: intermediate (2x150 bp) / the longest (520 bp) / the shortest (100 bp)
     • Run time: the longest (27 hr) / intermediate (9 hr) / the shortest (3 hr)
     • Price per Mb: the cheapest / the most expensive / intermediate
     • Other considerations: unfillable gaps / errors in homopolymers / the worst performance
     Performance comparison of benchtop high-throughput sequencing platforms. Nat. Biotechnol. 30 (5): 434-441 (2012)
  20. 20. 1000 Genomes Project
     • The small fraction of the genome that varies between individuals may explain differences in susceptibility to disease, in drug response or in the reaction to environmental factors. The 1000 Genomes Project aims to establish a map of the human genome that describes as many of its variations as possible, greatly improving on the information obtained by the HapMap project.
     • The project is carried out with the main support of three institutions: the Wellcome Trust Sanger Institute (Hinxton, England), the Beijing Genomics Institute (Shenzhen, China) and the National Human Genome Research Institute, which is part of the NIH (National Institutes of Health, USA).
  21. 21. 1000 Genomes Project. Methods:
     1. Low coverage (5x) sequencing: SOLiD + Illumina.
     2. Whole exome sequencing (80× average coverage across a consensus target of 24 Mb spanning more than 15,000 genes): SeqCap EZ Human Exome Library from Nimblegen, and SureSelect All Exon V2 Target Enrichment kit from Agilent.
     3. SNP genotyping: initially all samples were typed using a Sequenom MassArray SNP Genotyping panel of 23 SNPs and one gender-determining assay to establish a genetic fingerprint. After gender concordance was verified, the samples were placed on 96-well plates and genotyped using the Illumina HumanOmni2.5-Quad v1.0 B SNP array.
  22. 22. Personal Genome Project. The project will publish the genotype of the volunteers together with detailed information about their phenotype: medical records, various test results, MRI images, etc. All the information will be available to anyone on the Internet, so that researchers can test hypotheses about the relationships between genotype, environment and phenotype.
  23. 23. NCBI’s Resources for Phenotype (MedGen), Tests (GTR) and Variation (ClinVar)
     • MedGen to research the phenotype
     • GTR (Genetic Testing Registry) to choose appropriate tests
     • ClinVar to research variant pathogenicity
  24. 24. NCBI’s Resources for Phenotype (MedGen), Tests (GTR) and Variation (ClinVar). Patient showing signs compatible with Marfan syndrome:
  25. 25. NCBI’s Resources for Phenotype (MedGen), Tests (GTR) and Variation (ClinVar)
  26. 26. List of tests for Marfan syndrome (panels included)
  27. 27. NCBI’s Resources for Phenotype (MedGen), Tests (GTR) and Variation (ClinVar)
  29. 29. NCBI’s Resources for Phenotype (MedGen), Tests (GTR) and Variation (ClinVar)
  30. 30. Searching ClinVar. Example search terms: NM_000138.4:c.4786C>T; FBN1:c.4786C>T; c.4786C>T; Arg1596Ter; R1596*
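The nucleotide-level and protein-level search terms on this slide are related by simple arithmetic on the coding position. The sketch below parses the simple substitution form only and works out which codon a position falls in. The regular expression and function name are hypothetical, it is not an HGVS parser, and it assumes positive coding positions numbered from the A of the ATG start codon (no intronic offsets, no UTR positions).

```python
import re

# Matches only simple coding substitutions such as "NM_000138.4:c.4786C>T",
# with an optional reference sequence or gene prefix before the colon.
HGVS_CDNA_SUB = re.compile(
    r"^(?:(?P<ref_seq>[^:]+):)?c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_cdna_substitution(expr: str) -> dict:
    """Split a simple c. substitution and locate its codon."""
    match = HGVS_CDNA_SUB.match(expr)
    if match is None:
        raise ValueError(f"not a simple c. substitution: {expr!r}")
    pos = int(match.group("pos"))
    return {
        "reference": match.group("ref_seq"),  # None when no prefix is given
        "position": pos,
        "ref": match.group("ref"),
        "alt": match.group("alt"),
        "codon": (pos - 1) // 3 + 1,          # 1-based codon number
        "codon_offset": (pos - 1) % 3 + 1,    # 1, 2 or 3 within the codon
    }


for query in ("NM_000138.4:c.4786C>T", "FBN1:c.4786C>T", "c.4786C>T"):
    print(parse_cdna_substitution(query))
```

Position 4786 is the first base of codon 1596, which matches the protein-level terms Arg1596Ter and R1596*. A C>T change at the first base of an arginine codon such as CGA gives the stop codon TGA in the standard genetic code.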
  31. 31. ClinVar detailed display
     • Allele summary: Gene; Variant type; Genomic location; HGVS expressions*; Molecular consequence*; Links*; Frequency*
     • Phenotype summary: Names; Links*; Age of onset*; Prevalence*
     • Interpretation: Significance; Review status*; Accession.version*
     * May be provided by NCBI
  32. 32. ClinVar detailed display