ECCMID 2015 Meet-The-Expert: Bioinformatics Tools
What bioinformatic tools should I use for
analysis of high-throughput sequencing data
for molecular diagnostics?
Nick Loman
Reference-based approach
• Read QC: FastQC, Qualimap
• Adaptor/quality trimming: Trimmomatic
• Species ID: BLAST, Metaphlan, MOCAT
• Sample QC: Blobology, Kraken, BLAST
• Alignment: BWA
• Variant calling: Samtools/VarScan, GATK
• SNP extraction & filter: Custom script, snippy, SnpEff, BRESEQ
• Recombination filtering: Gubbins, ClonalFrameML
• Tree building: FastTree, RAxML
• MLST/Antibiogram: SRST2
De novo approach
• Assembly: Velvet, SPAdes
• Annotation: Prokka
• Tree building: Harvest
• Population genomics: BIGSdb, Phyloviz
• Pan-genome: LS-BSR
• MLST/Antibiogram: mlst, Abricate
FastQC
• What: Analyse read-level sequence quality.
• Why: Detect serious problems with read quality
that might affect downstream analysis.
• Where: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
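To make "read-level sequence quality" concrete, here is a minimal sketch of one check of this kind: the mean Phred score at each read position. It is not FastQC's implementation. It assumes unwrapped four-line FASTQ records and Phred+33 encoding, and the function name and toy reads are made up for illustration.

```python
from io import StringIO
from typing import List, TextIO


def mean_quality_per_position(handle: TextIO, offset: int = 33) -> List[float]:
    """Return the mean Phred score at each read position of a FASTQ stream."""
    totals: List[int] = []
    counts: List[int] = []
    for line_number, line in enumerate(handle):
        # In four-line FASTQ records the quality string is every fourth line.
        if line_number % 4 != 3:
            continue
        for pos, char in enumerate(line.rstrip("\n")):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


# Two toy reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII##\n")
profile = mean_quality_per_position(example)
print(profile)
```

A profile that drops sharply towards the 3' end, as in the toy data, is the sort of pattern that would prompt quality trimming before downstream analysis.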
Qualimap
• What: Analyse insert size distribution
• Why: Determine whether sequencing has
been effective, particularly for de novo
assembly, and whether adaptor trimming is needed
• Where: http://qualimap.bioinfo.cipf.es/
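The link between insert size and adaptor trimming can be sketched as follows: when a fragment is shorter than the read length, the read runs through into the adaptor. This is only an illustration, not Qualimap's method. It assumes the insert sizes have already been extracted from an alignment, and the function name and numbers are hypothetical.

```python
from statistics import median
from typing import Dict, Sequence


def summarise_insert_sizes(insert_sizes: Sequence[int], read_length: int) -> Dict[str, float]:
    """Summarise an insert size distribution for a paired-end library."""
    if not insert_sizes:
        raise ValueError("no insert sizes supplied")
    # Fragments shorter than the read length imply adaptor read-through.
    short = sum(1 for size in insert_sizes if size < read_length)
    return {
        "median": float(median(insert_sizes)),
        "fraction_shorter_than_read": short / len(insert_sizes),
    }


# Hypothetical insert sizes (bp) for a library sequenced with 150 bp reads.
sizes = [120, 300, 310, 290, 450, 140, 305, 295]
summary = summarise_insert_sizes(sizes, read_length=150)
print(summary)
```

A large fraction of short fragments suggests that adaptor trimming is worth doing before de novo assembly.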
Trimmomatic
• What: One of several million read trimmers
• Why: To remove sequencing adaptors, which
may influence the results of de novo assembly
• Where: http://www.usadellab.org/cms/?page=trimmomatic
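As a rough illustration of what adaptor removal involves, the sketch below clips an exact adaptor match from the 3' end of a read. It is a simplification and not how Trimmomatic works internally: real trimmers tolerate mismatches and use base qualities. The adaptor string, the `min_overlap` default and the function name are example values chosen for this sketch.

```python
def trim_adaptor(read: str, adaptor: str, min_overlap: int = 3) -> str:
    """Clip an exact adaptor match, or a partial adaptor prefix at the 3' end."""
    # Case 1: the whole adaptor occurs somewhere inside the read.
    hit = read.find(adaptor)
    if hit != -1:
        return read[:hit]
    # Case 2: the read ends part-way through the adaptor; try longest overlap first.
    longest = min(len(adaptor) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adaptor[:overlap]):
            return read[: len(read) - overlap]
    # No adaptor found: return the read unchanged.
    return read


ADAPTOR = "AGATCGGAAGAGC"  # example adaptor prefix, assumed for illustration
full = trim_adaptor("ACGTACGTAGATCGGAAGAGCTT", ADAPTOR)
partial = trim_adaptor("ACGTACGTAGATC", ADAPTOR)
clean = trim_adaptor("ACGTACGT", ADAPTOR)
print(full, partial, clean)
```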
Species ID: BLAST
• What: Only the most famous bioinformatics
algorithm ever made
• Why: A few random BLAST searches will reveal
much important information about your data
before you start on a pipeline analysis
• Where: http://ncbi.nlm.nih.gov/BLAST
Species ID: Metaphlan
• What: Designed for metagenomics, this
algorithm will find “taxon-defining” genes to
identify what species are in a sample
• Why: Check for extent of sample
contamination, give an accurate species ID for
unknown samples
• Where:
https://bitbucket.org/biobakery/metaphlan2
Species ID: Kraken
• What: Similar to Metaphlan but even faster
and with a more complete database
• Why: Check for extent of sample
contamination, give an accurate species ID for
unknown samples
• Where: https://ccb.jhu.edu/software/kraken/
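The general idea of assigning reads to taxa by exact k-mer matching can be shown with a toy classifier. This is a heavily simplified sketch, not Kraken's algorithm: Kraken maps shared k-mers to a lowest common ancestor in a taxonomy, whereas this version simply ignores them. The reference sequences, taxon names and value of k are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterator, Optional


def kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer in a sequence."""
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


def build_index(references: Dict[str, str], k: int) -> Dict[str, Optional[str]]:
    """Map each k-mer to its taxon, or to None if it is shared between taxa."""
    index: Dict[str, Optional[str]] = {}
    for taxon, sequence in references.items():
        for kmer in kmers(sequence, k):
            if kmer in index and index[kmer] != taxon:
                index[kmer] = None  # ambiguous k-mer: ignored when voting
            else:
                index[kmer] = taxon
    return index


def classify(read: str, index: Dict[str, Optional[str]], k: int) -> Optional[str]:
    """Return the taxon with the most unambiguous k-mer hits, or None if no hits."""
    votes = Counter(
        index[kmer] for kmer in kmers(read, k) if index.get(kmer) is not None
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy references; real databases hold whole genomes.
references = {"taxon_A": "ACGTACGTTAGC", "taxon_B": "TTGGCCAATTGG"}
index = build_index(references, k=4)
calls = [classify(read, index, k=4) for read in ("CGTACGTT", "GGCCAATT", "AAAAAAAA")]
print(calls)
```

Reads with no hits stay unclassified, and a sample where a noticeable share of reads goes to an unexpected taxon is a hint of contamination.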
Species ID: MOCAT
• What: Uses a phylogenetic approach to
identify novel or divergent species by relying
on distances in conserved marker genes
• Why: Sometimes you sequence something
completely novel and want to know more
about its relationships
• Where: http://vm-lux.embl.de/~kultima/MOCAT/
• Alternatives: Phylosift, rMLST
Sample QC: Blobology
• What: A simple method of plotting de novo
assembly contigs by GC, coverage and taxon
• Why: Characterise contamination, plasmids,
lytic phage in a sample
• Where:
https://github.com/blaxterlab/blobology
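The two numeric axes of such a plot are easy to compute. The sketch below builds the (contig, GC fraction, coverage) rows that a GC versus coverage scatter plot would use. It is not the Blobology code. It assumes per-contig coverage has already been estimated elsewhere, leaves out the taxon annotation, and uses made-up contigs and function names.

```python
from typing import Dict, List, Tuple


def gc_fraction(sequence: str) -> float:
    """Fraction of called bases (A, C, G, T) that are G or C."""
    sequence = sequence.upper()
    called = sum(sequence.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / called


def blob_table(
    contigs: Dict[str, str], coverage: Dict[str, float]
) -> List[Tuple[str, float, float]]:
    """Return (contig, GC fraction, coverage) rows ready for a scatter plot."""
    return [(name, gc_fraction(seq), coverage[name]) for name, seq in contigs.items()]


# Hypothetical contigs and coverage values.
contigs = {"contig_1": "ATATATGC", "contig_2": "GCGCGCGA"}
coverage = {"contig_1": 35.0, "contig_2": 410.0}
rows = blob_table(contigs, coverage)
print(rows)
```

Contigs that sit apart from the main cloud in GC or coverage, like `contig_2` here, are the candidates for contamination, plasmids or phage.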
Reference approach
Alignment: BWA
• What: The standard method for aligning
Illumina sequences to a reference. Use it in
BWA-MEM mode, which works well with most
read lengths
• Why: Finds the likely location of each
sequence read in a reference genome
• Where: https://github.com/lh3/bwa
• Alternatives: SMALT, Bowtie2 (beware the
default insert size parameters)
Variant calling: samtools & VarScan
• What: A way of calling SNPs against a
reference in one or more samples
• Why: VarScan permits easy filtering of SNPs by
allele frequency and strand, useful for getting
a precise dataset
• Where: http://www.htslib.org/
• http://varscan.sourceforge.net/
• Alternatives: GATK, snippy, Nesoni
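Filtering by allele frequency and strand can be sketched in a few lines. This is an illustration of the idea only, not VarScan's code or its option names. The record layout, the 0.9 frequency cut-off and the two-reads-per-strand rule are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnpCall:
    """Read counts supporting one candidate SNP (hypothetical record layout)."""
    position: int
    ref_reads: int
    alt_forward: int
    alt_reverse: int

    @property
    def alt_reads(self) -> int:
        return self.alt_forward + self.alt_reverse

    @property
    def allele_frequency(self) -> float:
        depth = self.ref_reads + self.alt_reads
        return self.alt_reads / depth if depth else 0.0


def filter_snps(
    calls: List[SnpCall], min_frequency: float = 0.9, min_per_strand: int = 2
) -> List[SnpCall]:
    """Keep SNPs with a high variant allele frequency and support on both strands."""
    return [
        call
        for call in calls
        if call.allele_frequency >= min_frequency
        and call.alt_forward >= min_per_strand
        and call.alt_reverse >= min_per_strand
    ]


calls = [
    SnpCall(position=100, ref_reads=1, alt_forward=20, alt_reverse=18),   # kept
    SnpCall(position=200, ref_reads=20, alt_forward=10, alt_reverse=10),  # mixed site
    SnpCall(position=300, ref_reads=0, alt_forward=30, alt_reverse=0),    # one strand only
]
kept = [call.position for call in filter_snps(calls)]
print(kept)
```

For a haploid bacterial isolate, a site at around 50% frequency or with support from one strand only is more likely a mapping or sequencing artefact than a true SNP, which is why both filters help produce a precise dataset.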
Recombination filtering: Gubbins
• What: Detects regions which have undergone
recombination, which would confound phylogenetic
reconstructions that assume clonality
• Why: Important when attempting phylogenetic
reconstructions from recombining organisms
• Where: http://sanger-pathogens.github.io/gubbins/
• Alternatives: ClonalFrameML, BRATNextGen
Tree building: FastTree
• What: Phylogenetic reconstructions from SNP
data
• Why: Tree reconstructions are an effective way of
examining evolutionary relationships in isolates
and testing if they are from an outbreak
• Note: Ensure you don’t hit the double-precision
bug!
(http://darlinglab.org/blog/2015/03/23/not-so-fast-fasttree.html)
• Where: http://meta.microbesonline.org/fasttree/
• Alternatives: RAxML (more thorough, slower), REALPHY
(http://realphy.unibas.ch/fcgi/realphy)
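Before building a tree, a quick way to look at relatedness in SNP data is a pairwise SNP distance table. The sketch below is a simpler stand-in and not what FastTree or RAxML do: they infer a phylogeny rather than just counting differences. It assumes an alignment of equal-length sequences, and the isolate names and sequences are invented.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both sequences have a called base that differs."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"  # skip N and gap characters
    )


def pairwise_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return the SNP distance for every pair of isolates."""
    return {
        (first, second): snp_distance(alignment[first], alignment[second])
        for first, second in combinations(sorted(alignment), 2)
    }


# Hypothetical SNP alignment for three isolates.
alignment = {
    "isolate_1": "ACGTACGT",
    "isolate_2": "ACGTACGA",
    "isolate_3": "TCGAACNA",
}
distances = pairwise_distances(alignment)
print(distances)
```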
MLST & Antibiogram: SRST2
• What: Aligns reads against MLST and
antibiotic resistance databases
• Why: Permits MLST typing with genome data
and a rough prediction of antibiotic resistance
• Where: http://katholt.github.io/srst2/
De novo approach
De novo assembly: SPAdes
• What: A reliable de novo assembler which
works well with multiple data types
• Why: Has in-built error corrector so no need
for read trimming, can use multiple values of k
so less need for experimentation, consistently
performs well in comparisons
• Where: http://bioinf.spbau.ru/spades
De novo assembly: Velvet
• What: The original short-read assembler
• Why: Extremely fast for draft assemblies,
particularly if you just want to do MLST or
antibiograms
• Where:
https://www.ebi.ac.uk/~zerbino/velvet/
• Alternatives: MEGAHIT – even faster!
Annotation: Prokka
• What: Takes de novo assembly contig files and
annotates them with coding sequences and
non-coding features such as RNAs
• Why: A very sensible set of tools and reference
databases in a single package, produces usable
output for other software and database
submission
• Where: http://www.vicbioinformatics.com/software.prokka.shtml
• Alternatives: xBASE annotation interface
Tree building: Harvest
• What: Takes de novo assembly contigs, performs
whole-genome alignment and permits
reconstruction of core genome phylogenies
• Why: Scaleable to hundreds of genomes on a
laptop and with an excellent viewer
• Where: http://harvest.readthedocs.org/en/latest/index.html
• Alternatives: Mauve
Population genomics: BIGSdb
• What: Takes de novo assembly contigs and
applies MLST-like schemes working on
hundreds or thousands of core genes
• Why: Scaleable to >1000s of genomes for
rapid population-level clustering
• Where: http://pubmlst.org/software/database/bigsdb/
• Alternatives: Bionumerics
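An MLST-like scheme over many core genes reduces each genome to a profile of allele numbers, which makes clustering cheap. The sketch below shows one simple way this can work: counting allele differences and grouping genomes by single linkage. It is not BIGSdb's implementation. The profiles, the threshold and the function names are hypothetical, and missing loci are represented as `None`.

```python
from typing import Dict, List, Optional, Sequence

Profile = Sequence[Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci with different allele numbers, skipping loci missing in either genome."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def single_linkage_clusters(profiles: Dict[str, Profile], threshold: int) -> List[List[str]]:
    """Group genomes linked by a chain of pairs at or below the distance threshold."""
    names = sorted(profiles)
    cluster_of = {name: index for index, name in enumerate(names)}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if allele_distance(profiles[first], profiles[second]) <= threshold:
                # Merge the two clusters by relabelling every member of one of them.
                old, new = cluster_of[second], cluster_of[first]
                for name in names:
                    if cluster_of[name] == old:
                        cluster_of[name] = new
    clusters: Dict[int, List[str]] = {}
    for name in names:
        clusters.setdefault(cluster_of[name], []).append(name)
    return list(clusters.values())


# Hypothetical five-locus profiles; real schemes use hundreds or thousands of loci.
profiles = {
    "g1": [1, 1, 1, 1, 1],
    "g2": [1, 1, 1, 1, 2],
    "g3": [1, 1, 1, 3, 2],
    "g4": [5, 6, 7, 8, 9],
}
clusters = single_linkage_clusters(profiles, threshold=1)
print(clusters)
```

Comparing integers per locus rather than aligning sequences is what lets this kind of approach scale to very large collections.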
Pan/accessory genomes: LS-BSR
• What: Takes de novo assembly contigs or
annotations and compares gene content
• Why: To determine differences in gene
content between 1 and 1000s of strains
• Where: https://github.com/jasonsahl/LS-BSR
• Alternatives: OrthoMCL
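Once each strain has a list of genes called present, comparing gene content comes down to set operations. The sketch below splits genes into core (present in every strain) and accessory (present in only some). It assumes presence or absence has already been decided, which is the step LS-BSR itself handles using BLAST score ratios. The strain names and gene lists are invented.

```python
from typing import Dict, Set, Tuple


def core_and_accessory(gene_content: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split the pan-genome into genes shared by all strains and genes found in only some."""
    if not gene_content:
        return set(), set()
    gene_sets = list(gene_content.values())
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan - core


# Hypothetical presence calls for three strains.
gene_content = {
    "strain_A": {"gyrA", "recA", "blaCTX-M"},
    "strain_B": {"gyrA", "recA"},
    "strain_C": {"gyrA", "recA", "stx2"},
}
core, accessory = core_and_accessory(gene_content)
print(sorted(core), sorted(accessory))
```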
MLST/Antibiogram: mlst and Abricate
• What: Works on de novo assemblies to give
MLST prediction and antibiotic resistance
prediction
• Why: A very fast method
• Where: https://github.com/tseemann/mlst
• https://github.com/tseemann/abricate
• Alternatives: SRST2
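The logic of MLST typing from an assembly can be sketched as two lookups: find which known allele of each locus occurs in the contigs, then look up the resulting profile in a table of sequence types. This is a toy version and not the `mlst` tool's code. It uses exact substring matching on one strand only, and the loci, allele sequences and profile table are all made up.

```python
from typing import Dict, Optional, Sequence, Tuple


def call_alleles(assembly: str, alleles: Dict[str, Dict[int, str]]) -> Dict[str, Optional[int]]:
    """For each locus, return the number of the allele found exactly in the assembly."""
    calls: Dict[str, Optional[int]] = {}
    for locus, variants in alleles.items():
        calls[locus] = next(
            (number for number, seq in sorted(variants.items()) if seq in assembly),
            None,  # no known allele found for this locus
        )
    return calls


def sequence_type(
    calls: Dict[str, Optional[int]],
    loci: Sequence[str],
    profiles: Dict[Tuple[int, ...], int],
) -> Optional[int]:
    """Look up the sequence type for a complete allele profile, if one is defined."""
    if any(calls.get(locus) is None for locus in loci):
        return None
    return profiles.get(tuple(calls[locus] for locus in loci))


# Hypothetical three-locus scheme.
LOCI = ("locus_1", "locus_2", "locus_3")
ALLELES = {
    "locus_1": {1: "ACGTAC", 2: "ACGTTC"},
    "locus_2": {1: "GGATCC", 2: "GGTTCC"},
    "locus_3": {1: "TTGACA"},
}
PROFILES = {(1, 1, 1): 1, (1, 2, 1): 5}

assembly = "NNACGTACNNGGTTCCNNTTGACANN"
calls = call_alleles(assembly, ALLELES)
st = sequence_type(calls, LOCI, PROFILES)
print(calls, st)
```

The same pattern, matching contigs against a database of resistance genes instead of MLST alleles, gives a rough antibiogram prediction.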
CLoud Infrastructure for Microbial
Bioinformatics (CLIMB)
• MRC funded project to
develop Cloud
Infrastructure for
microbial bioinformatics
• £4M of hardware, capable
of supporting >1000
individual virtual servers
• Amazon/Google cloud for
Academics
Acknowledgements
• Twitter comments:
– Tom Connor, Alan McNally, Torsten Seemann, C.
Titus Brown, Heng Li, Christoffer Flensburg, Matt
MacManes, Rachel Glover, Willem van Schaik, Bill
Hanage, Jennifer Gardy, Mick Watson, Esther
Robinson, Nicola Fawcett, Aziz Aboobaker, Ruth
Massey

Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 

ECCMID 2015 Meet-The-Expert: Bioinformatics Tools

  • 1. What bioinformatic tools should I use for analysis of high-throughput sequencing data for molecular diagnostics? Nick Loman
  • 2. Read QC (FastQC, Qualimap); Adaptor/quality trimming (Trimmomatic); Species ID (BLAST, Metaphlan, MOCAT); Sample QC (Blobology, Kraken, BLAST) • Reference-based approach: Alignment (BWA); Variant calling (Samtools/VarScan, GATK); SNP extraction & filter (Custom script, snippy, SnpEff, BRESEQ); Recombination filtering (Gubbins, ClonalFrameML); Tree building (FastTree, RAxML); MLST/Antibiogram (SRST2) • De novo approach: Assembly (Velvet, SPADES); Annotation (Prokka); Tree building (Harvest); Population genomics (BigsDB, Phyloviz); Pan-genome (LS-BSR); MLST/Antibiogram (mlst, Abricate)
  • 3. FastQC • What: Analyse read-level sequence quality. • Why: Determine serious errors in read quality that might affect downstream analysis. • Where: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
  • 5. Qualimap • What: Analyse insert size distribution • Why: Determine whether sequencing has been effective, particularly for de novo assembly, need for adaptor trimming • Where: http://qualimap.bioinfo.cipf.es/
  • 6. Trimmomatic • What: One of several million read trimmers • Why: To remove sequence adaptors which may influence the results of de novo assembly • Where: http://www.usadellab.org/cms/?page=trimmomatic
  • 7. Species ID: BLAST • What: Only the most famous bioinformatics algorithm ever made • Why: A few random BLAST searches will reveal much important information about your data before you start on a pipeline analysis • Where: http://ncbi.nlm.nih.gov/BLAST
  • 8. Species ID: Metaphlan • What: Designed for metagenomics, this algorithm will find “taxon-defining” genes to identify what species are in a sample • Why: Check for extent of sample contamination, give an accurate species ID for unknown samples • Where: https://bitbucket.org/biobakery/metaphlan2
  • 9. Species ID: Kraken • What: Similar to Metaphlan but even faster and with a more complete database • Why: Check for extent of sample contamination, give an accurate species ID for unknown samples • Where: https://ccb.jhu.edu/software/kraken/
  • 10. Species ID: MOCAT • What: Uses a phylogenetic approach to identify novel or divergent species by relying on distances in conserved marker genes • Why: Sometimes you sequence something completely novel and want to know more about its relationships • Where: http://vm-lux.embl.de/~kultima/MOCAT/ • Alternatives: Phylosift, rMLST
  • 11. Sample QC: Blobology • What: A simple method of plotting de novo assembly contigs by GC, coverage and taxon • Why: Characterise contamination, plasmids, lytic phage in a sample • Where: https://github.com/blaxterlab/blobology
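The two numeric axes of such a plot, GC content and coverage per contig, can be sketched in a few lines. This is a minimal illustration, not Blobology's own code: the contig names, sequences and coverage values are invented, and taxon assignment is left out.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the contig name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


# Hypothetical toy contigs and made-up coverage values, for illustration only.
toy_fasta = StringIO(">contig_1\nATGCGCGC\nGGCC\n>contig_2\nATATATAT\n")
toy_coverage = {"contig_1": 42.0, "contig_2": 310.0}

# One (name, GC, coverage) point per contig, ready to hand to a plotting library.
points = [
    (name, gc_fraction(seq), toy_coverage[name])
    for name, seq in read_fasta(toy_fasta)
]
for name, gc, cov in points:
    print(f"{name}\tGC={gc:.2f}\tcoverage={cov:.1f}")
```

Contigs that sit far from the main cluster in GC or coverage are the ones worth checking as possible contaminants, plasmids or phage.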
  • 13. Alignment: BWA • What: The standard method for aligning Illumina sequences to a reference; use it in BWA-MEM mode, which works well with most read lengths • Why: Finds the likely location of each sequence read in a reference genome • Where: https://github.com/lh3/bwa • Alternatives: SMALT, Bowtie2 (beware standard insert size parameters)
  • 14. Variant calling: samtools & VarScan • What: A way of calling SNPs against a reference in one or more samples • Why: VarScan permits easy filtering of SNPs by allele frequency and strand, useful for getting a precise dataset • Where: http://www.htslib.org/ • http://varscan.sourceforge.net/ • Alternatives: GATK, snippy, Nesoni
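The idea of filtering SNPs by allele frequency and strand can be sketched as follows. This is not VarScan's implementation or its output format: the `SnpCall` record, its field names, the thresholds and the read counts are all hypothetical, chosen only to show the logic.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical per-site read counts, split by allele and strand."""
    position: int
    ref_fwd: int
    ref_rev: int
    alt_fwd: int
    alt_rev: int

    @property
    def alt_frequency(self):
        """Fraction of reads at this site supporting the alternate allele."""
        total = self.ref_fwd + self.ref_rev + self.alt_fwd + self.alt_rev
        return (self.alt_fwd + self.alt_rev) / total if total else 0.0


def passes_filter(call, min_alt_freq=0.9, min_alt_per_strand=2):
    """Keep a call only if the alternate allele dominates and is seen on both strands.

    The default thresholds are illustrative assumptions, not recommended values.
    """
    return (
        call.alt_frequency >= min_alt_freq
        and call.alt_fwd >= min_alt_per_strand
        and call.alt_rev >= min_alt_per_strand
    )


# Made-up calls for illustration.
calls = [
    SnpCall(1042, ref_fwd=0, ref_rev=1, alt_fwd=25, alt_rev=30),   # well supported
    SnpCall(2210, ref_fwd=20, ref_rev=18, alt_fwd=6, alt_rev=5),   # low allele frequency
    SnpCall(3307, ref_fwd=0, ref_rev=0, alt_fwd=40, alt_rev=0),    # one strand only
]
kept = [call.position for call in calls if passes_filter(call)]
print(kept)
```

Requiring support on both strands removes calls that arise from strand-specific sequencing artefacts, while the frequency threshold removes mixed or low-confidence sites in a haploid genome.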
  • 15. Recombination filtering: Gubbins • What: Detect regions which have undergone recombination which will confound phylogenetic reconstructions assuming clonality • Why: Important when attempting phylogenetic reconstructions from recombining organisms • Where: http://sanger-pathogens.github.io/gubbins/ • Alternatives: ClonalFrameML, BRATNextGen
  • 16. Tree building: FastTree • What: Phylogenetic reconstructions from SNP data • Why: Tree reconstructions are an effective way of examining evolutionary relationships in isolates and testing if they are from an outbreak, FastTree • Note: Ensure you don’t hit the double-precision bug! (http://darlinglab.org/blog/2015/03/23/not-so-fast-fasttree.html) • Where: http://meta.microbesonline.org/fasttree/ • Alternatives: RAxML (more thorough, slower), REALPHY http://realphy.unibas.ch/fcgi/realphy
  • 17. MLST & Antibiogram: SRST2 • What: Aligns reads against MLST and antibiotic resistance databases • Why: Permits MLST typing with genome data and a rough prediction of antibiotic resistance • Where: http://katholt.github.io/srst2/
  • 19. De novo assembly: SPADES • What: A reliable de novo assembler which works well with multiple data types • Why: Has in-built error corrector so no need for read trimming, can use multiple values of k so less need for experimentation, consistently performs well in comparisons • Where: http://bioinf.spbau.ru/spades
  • 20. De novo assembly: Velvet • What: The original short-read assembler • Why: Extremely fast for draft assemblies, particularly if you just want to do MLST or antibiograms • Where: https://www.ebi.ac.uk/~zerbino/velvet/ • Alternatives: MEGAHIT – even faster!
  • 21. Annotation: Prokka • What: Takes de novo assembly contig files and annotates them with coding sequences and non-coding features such as RNAs • Why: A very sensible set of tools and reference databases in a single package, produces usable output for other software and database submission • Where: http://www.vicbioinformatics.com/software.prokka.shtml • Alternatives: xBASE annotation interface
  • 22. Tree building: Harvest • What: Takes de novo assembly contigs, performs whole-genome alignment and permits reconstruction of core genome phylogenies • Why: Scalable to hundreds of genomes on a laptop and with an excellent viewer • Where: http://harvest.readthedocs.org/en/latest/index.html • Alternatives: Mauve
  • 25. Population genomics: BIGSDB • What: Takes de novo assembly contigs and applies MLST-like schemes working on hundreds or thousands of core genes • Why: Scalable to >1000s of genomes for rapid population-level clustering • Where: http://pubmlst.org/software/database/bigsdb/ • Alternatives: Bionumerics
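An MLST-like scheme reduces each genome to an allelic profile: one allele number per locus. Clustering then rests on how many loci differ between two profiles. The sketch below shows that comparison only. It is not BIGSdb code, and the locus names, isolate names and allele numbers are invented.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci with different alleles, ignoring loci missing or uncalled in either profile."""
    shared = [
        locus
        for locus in profile_a
        if profile_a[locus] is not None and profile_b.get(locus) is not None
    ]
    return sum(profile_a[locus] != profile_b[locus] for locus in shared)


# Hypothetical allelic profiles: locus name -> allele number (None = not called).
profiles = {
    "isolate_A": {"locus1": 3, "locus2": 7, "locus3": 1, "locus4": 12},
    "isolate_B": {"locus1": 3, "locus2": 7, "locus3": 2, "locus4": 12},
    "isolate_C": {"locus1": 5, "locus2": 9, "locus3": None, "locus4": 4},
}

# Pairwise distances between all isolates.
names = sorted(profiles)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        distance = allelic_distance(profiles[first], profiles[second])
        print(f"{first} vs {second}: {distance} differing loci")
```

Because each comparison is a simple count over integer allele numbers rather than a sequence alignment, the approach stays cheap as the number of genomes grows.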
  • 26. Pan/accessory genomes: LS-BSR • What: Takes de novo assembly contigs or annotations and compares gene content • Why: To determine differences in gene content between 1 to 1000s of strains • Where: https://github.com/jasonsahl/LS-BSR • Alternatives: OrthoMCL
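Comparing gene content between strains comes down to asking which genes are present in all of them (core) and which in only some (accessory). LS-BSR decides presence using BLAST score ratios. The sketch below swaps in plain presence/absence sets to show the comparison step only, and the strain and gene names are made up.

```python
def core_and_accessory(gene_content):
    """Split genes into core (present in every strain) and accessory (present in some).

    gene_content maps a strain name to an iterable of gene names.
    """
    gene_sets = [set(genes) for genes in gene_content.values()]
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan - core


# Hypothetical gene presence per strain, for illustration only.
gene_content = {
    "strain_1": ["geneA", "geneB", "geneC"],
    "strain_2": ["geneA", "geneB", "geneD"],
    "strain_3": ["geneA", "geneB", "geneC", "geneE"],
}

core, accessory = core_and_accessory(gene_content)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```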
  • 27. MLST/Antibiogram: mlst and Abricate • What: Works on de novo assembly to give MLST prediction and antibiotic resistance prediction • Why: A very fast method • Where: https://github.com/tseemann/mlst • https://github.com/tseemann/abricate • Alternatives: SRST2
  • 28. CLoud Infrastructure for Microbial Bioinformatics (CLIMB) • MRC funded project to develop Cloud Infrastructure for microbial bioinformatics • £4M of hardware, capable of supporting >1000 individual virtual servers • Amazon/Google cloud for Academics
  • 29. Acknowledgements • Twitter comments: – Tom Connor, Alan McNally, Torsten Seemann, C. Titus Brown, Heng Li, Christoffer Flensburg, Matt MacManes, Rachel Glover, Willem van Schaik, Bill Hanage, Jennifer Gardy, Mick Watson, Alan McNally, Esther Robinson, Nicola Fawcett, Aziz Aboobaker, Ruth Massey