ECCMID meet-the-expert


This document discusses bioinformatic tools for analyzing high-throughput sequencing data for molecular diagnostics. For quality control, it recommends FastQC and Qualimap to check whether sequencing worked and to measure fragment lengths; BLAST and Kraken can check whether a sample matches its expected identity or is contaminated. Trimmomatic is recommended for adaptor trimming. For analysis, it recommends either a reference-based approach using BWA and GATK or de novo assembly with SPAdes. Each approach has advantages, so most people will want to do both, using a de novo assembly as the reference if none exists.


What bioinformatic tools should I use for analysis of high-throughput sequencing data for molecular diagnostics?
Nick Loman

Workflow overview (diagram): steps and suggested tools
• Two routes: reference-based approach and de novo approach
• Read QC: FastQC, Qualimap, Kraken, BLAST!
• Adaptor/quality trimming: Trimmomatic
• Alignment: BWA
• Variant calling: Samtools/VarScan, GATK
• SNP extraction: Python script!, Snippy
• Recombination filtering: Gubbins
• Assembly: SPAdes
• Whole-genome alignment: Mauve, Parsnp
• Annotation: Prokka
• MLST/Antibiogram: Mlst, abricate, SRST2
• Tree building: FastTree, RAxML, Harvest
• Pan-genome: LS-BSR
• Population genomics: BIGSdb, Phyloviz

Quality Control: Questions to Ask
• Did my sequencing work?
• What are the fragment lengths?
• Is my sample what I think it is?
• Is my sample contaminated?

Did my sequencing work?
• FastQC

What are the fragment lengths?
• Qualimap (or just BWA)
• Bad: fragment length < read length
• OK: fragment length > read length
• Good: fragment length > 2x read length
• Will affect: genome coverage, de novo assembly performance, alignment performance
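
The Bad/OK/Good rule of thumb above can be written down as a small check. This is only a sketch: the function names are made up for illustration, the insert sizes are assumed to have been collected already (for example from a Qualimap report or a BWA alignment), and treating the exact boundary cases as the lower category is an assumption, since the slide does not say.

```python
from statistics import median
from typing import Sequence


def classify_fragment_length(fragment_length: float, read_length: int) -> str:
    """Apply the slide's rule of thumb to one library.

    Boundary cases (fragment == read length, fragment == 2x read length)
    are not specified on the slide; here they fall into the lower category.
    """
    if fragment_length > 2 * read_length:
        return "Good"
    if fragment_length > read_length:
        return "OK"
    return "Bad"


def classify_library(insert_sizes: Sequence[int], read_length: int) -> str:
    """Classify a library from observed insert sizes, using the median."""
    return classify_fragment_length(median(insert_sizes), read_length)
```

For example, `classify_library([180, 220, 240, 260, 900], read_length=250)` uses the median insert size of 240 and reports "Bad", because most fragments are shorter than the reads.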

Is my sample what I think it is?
• BLASTing a few reads is usually very efficient
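
One way to pick "a few reads" is to draw a small random sample from the FASTQ file and write it out as FASTA, ready to paste into BLAST. The sketch below assumes plain four-line FASTQ records; the function names and the default seed are illustrative choices, not part of the talk.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (read id, sequence)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (read_id, sequence) from four-line FASTQ records.

    A truncated final record is silently dropped.
    """
    it = iter(lines)
    for header, seq, _plus, _qual in zip(it, it, it, it):
        yield header.strip()[1:], seq.strip()


def sample_reads(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Reservoir-sample up to k records in one pass over the input."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


def to_fasta(records: Iterable[Record]) -> str:
    """Format records as FASTA text."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq in records)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python sample_reads.py reads.fastq
    with open(sys.argv[1]) as handle:
        print(to_fasta(sample_reads(read_fastq(handle), k=5)), end="")
```

Reservoir sampling keeps memory use constant, so the same function works on a full sequencing run without loading it first.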

Is my sample contaminated?
• Kraken

Adaptor trim reads
• With Nextera libraries, failing to adaptor trim will KILL your assemblies.
• Particularly important when mean fragment length < read length.
• Many trimmers available: I like to use Trimmomatic
For more explanation: http://nickloman.github.io/high-throughput%20sequencing/genomics/bioinformatics/2013/04/17/adaptor-trim-or-die-experiences-with-nextera-libraries/
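
A toy simulation may help show why short fragments are the problem: when the fragment is shorter than the read, the sequencer reads through into the adaptor, and that adaptor sequence ends up in the read. The sketch below is not how Trimmomatic works internally; it uses exact matching only, the function names are invented for illustration, and the adaptor constant is the commonly cited Nextera transposase sequence, included here as an assumption.

```python
# Commonly cited Nextera transposase adaptor sequence (assumption for this toy).
NEXTERA_ADAPTOR = "CTGTCTCTTATACACATCT"


def simulate_read(fragment: str, adaptor: str, read_length: int) -> str:
    """Return what a read of read_length bases looks like for this fragment.

    The read covers the fragment first and then runs into the adaptor.
    Bases beyond the adaptor are not modelled, so the result may be
    shorter than read_length for very short fragments.
    """
    return (fragment + adaptor)[:read_length]


def trim_adaptor(read: str, adaptor: str, min_overlap: int = 5) -> str:
    """Remove a full adaptor, or a partial adaptor prefix at the 3' end.

    Exact matches only; real trimmers also tolerate sequencing errors.
    """
    pos = read.find(adaptor)
    if pos != -1:
        return read[:pos]
    longest = min(len(adaptor), len(read)) - 1
    for overlap in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adaptor[:overlap]):
            return read[: len(read) - overlap]
    return read
```

With a 40-base fragment and 50-base reads, `simulate_read` returns the fragment followed by the first 10 adaptor bases, and `trim_adaptor` recovers the original 40 bases. With a 100-base fragment the read contains no adaptor and is returned unchanged.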

Reference-based or de novo?
• Reference-based
– Implies ALIGNMENT to reference
– Implies you HAVE a reference
– Allows exquisitely sensitive and specific SNP calling (forensic SNP calling to single mutation precision)
– Important for looking at CHAINS OF TRANSMISSION
– Can only call in parts of the genome COMMON between your SAMPLES and REFERENCE
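
The last point can be illustrated with a toy pairwise SNP count. The sketch assumes each sample has already been turned into a sequence of the same length as the reference, with "-" or "N" wherever no call could be made; the function names and the example data are hypothetical. Only positions called in every sequence are compared.

```python
from itertools import combinations
from typing import Dict, List, Tuple

MISSING = {"-", "N"}  # symbols treated as "no call" (assumption)


def core_positions(alignment: Dict[str, str]) -> List[int]:
    """Return the positions that are called in every sequence."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    return [
        i for i in range(length)
        if all(s[i].upper() not in MISSING for s in seqs)
    ]


def snp_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Count differing bases per pair, restricted to shared positions."""
    core = core_positions(alignment)
    return {
        (a, b): sum(
            1 for i in core
            if alignment[a][i].upper() != alignment[b][i].upper()
        )
        for a, b in combinations(sorted(alignment), 2)
    }


# Hypothetical example data: one reference and three samples.
EXAMPLE = {
    "reference": "ACGTACGTAC",
    "patient1": "ACGTACGTAC",
    "patient2": "ACGAACGTAC",
    "patient3": "ACGAAC--AT",
}
```

In this example positions 6 and 7 are missing from patient3, so they are excluded for every pair. Small pairwise distances, such as the single difference between patient1 and patient2, are the kind of signal used when looking at chains of transmission.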

Reference-based or de novo?
• De novo
– Implies de novo assembly
– Does NOT require a reference
– Gives access to the entire PAN-genome
– E.g. unexpected antibiotic resistance genes, virulence factors
– Can give misleading results in REPEAT sequences
– Not suitable for very fine-resolution SNP analysis

In practice
• Most people will want to do both.
• And if you have no reference, you can use a draft de novo assembly AS your reference.
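
The "do both" advice can be sketched as a command plan. This is a hedged outline, not the speaker's pipeline: the function name and file names are hypothetical, and the commands assume SPAdes and BWA are installed with their usual basic invocations (`spades.py -1/-2/-o`, `bwa index`, `bwa mem`). The plan only builds the command lists; nothing is executed.

```python
from typing import List, Optional


def plan_commands(
    reads_1: str,
    reads_2: str,
    reference: Optional[str] = None,
    assembly_dir: str = "spades_out",
) -> List[List[str]]:
    """Build the commands for a de novo assembly plus a reference alignment.

    If no reference is given, the draft assembly (SPAdes writes
    contigs.fasta into its output directory) is used as the reference.
    """
    commands = [["spades.py", "-1", reads_1, "-2", reads_2, "-o", assembly_dir]]
    if reference is None:
        reference = f"{assembly_dir}/contigs.fasta"
    commands.append(["bwa", "index", reference])
    # bwa mem writes SAM to standard output; redirect it when running.
    commands.append(["bwa", "mem", reference, reads_1, reads_2])
    return commands
```

Each inner list can be passed to `subprocess.run` once the paths have been checked. Keeping the plan separate from execution makes it easy to print and review the commands first.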

Acknowledgements
• Twitter comments:
– Tom Connor, Alan McNally, Torsten Seemann, C. Titus Brown, Heng Li, Christoffer Flensburg, Matt MacManes, Rachel Glover, Willem van Schaik


