SlideShare a Scribd company logo
February 4th, 2015
Mariam Quiñones, PhD
Computational Molecular Biology Specialist
Bioinformatics and Computational Biosciences Branch
OCICB/OSMO/OD/NIAID/NIH
BCBB: A Branch Devoted to Bioinformatics and
Computational Biosciences
  Researchers’ time is increasingly important
  BCBB saves our collaborators time and effort
  Researchers speed projects to completion using
BCBB consultation and development services
  No need to hire extra post docs or use external
consultants or developers
What makes us different?
Software DevelopersComputational
Biologists
PMs and Analysts
2
BCBB
  “NIH Users: Access a menu of
BCBB services on the NIAID
Intranet:
•  http://bioinformatics.niaid.nih.gov/
  Outside of NIH –
•  search “BCBB” on the NIAID
Public Internet Page:
www.niaid.nih.gov
  Email us at:
•  ScienceApps@niaid.nih.gov
3
Remember Sanger?
•  Sanger introduced the “dideoxy method” (also known as Sanger
sequencing) in December 1977
Alignment of reads using tools such as ‘Sequencher’
IMAGE: http://www.lifetechnologies.com
4
Understanding file
formats
@F29EPBU01CZU4O
GCTCCGTCGTAAAAGGGG
+
24469:666811//..,,
@F29EPBU01D60ZF
CTCGTTCTTGATTAATGAAACATTCTTGGCAAA
TGCTTTCGCTCTGGTCCGTCTTGCGCCGGTCCA
AGAATTTCACCTCTAGCGGCGCAATACGAATG
CCCAAACACACCCAACACACCA
+
G???HHIIIIIIIIIBG555?
=IIIIIIIIHHGHHIHHHIIIIIIHHHIIHHHIIIIIIIIIH99;;CB
BCCEI???DEIIIIII??;;;IIGDBCEA?
9944215BB@>>@A=BEIEEE
@F29EPBU01EIPCX
TTAATGATTGGAGTCTTGGAAGCTTGACTACCC
TACGTTCTCCTACAAATGGACCTTGAGAGCTTG
TTTGGAGGTTCTAGCAGGGGAGCGCATCTCCC
CAAACACACCCAACACACCA
+
IIIIIIIIIIIIIIIIIIIIIIHHHHIIIIHHHIIIIIIIIIIIIIHHHIIIIIIIIIIIIIIIIIH
HHIIIIIIIIEIIB94422=4GEEEEEIBBBBHHHFIH??
?CII=?AEEEE
@F29EPBU01DER7Q
TGACGTGCAAATCGGTCGTCCGACCTCGGTAT
AGGGGCGAAGACTAATCGAACCATCTAGTAGC
5
File formats for aligned reads
  SAM (sequence alignment map)
CTTGGGCTGCGTCGTGTCTTCGCTTCACACCCGCGACGAGCGCGGCTTCT
CTTGGGCTGCGTCGTGTCTTCGCTTCACACC
Chr_ start end
Chr2 100000 100050
Chr_ start end
Chr2 50000 50050
6
Most commonly used alignment file formats
  SAM (sequence alignment map)
Unified format for storing alignments to a reference genome
  BAM (binary version of SAM) – used commonly to deliver data
Compressed SAM file, is normally indexed
  BED
Commonly used to report features described by chrom, start, end, name,
score, and strand.
For example:
chr1 11873 14409 uc001aaa.3 0 +
7
Variant Analysis – Class Topics
Introduction
Variant calling,
filtering and
annotation
Overview of
downstream
analyses
8
Efforts at creating databases of variants:
•  Project that started on 2002 with the goal of describing patterns of human
genetic variation and create a haplotype map using SNPs present in at
least 1% of the population, which were deposited in dbSNPs.
1000 Genomes
•  Started in 2008 with a goal of using at least 1000 individuals (about
2,500 samples at 4X coverage), interrogate 1000 gene regions in 900
samples (exome analysis), find most genetic variants with allele
frequencies above 1% and to a 0.1% if in coding regions as well as
Indels and structural variants
•  Make data available to the public ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/
or via Amazon Cloud http://s3.amazonaws.com/1000genomes
9
Variants are not only found in coding sequences
Main Goal:
•  Find all functional elements in the genome
10
Read
alignment
•  Mapping reads to reference
genome, processing to improve
alignment and mark PCR duplicates
Variant
identification
•  Variant calling and
genotyping
Annotation
and Filtering
•  Removing likely false
positive variants
Visualization
and
downstream
analyses
•  (e.g.
association
studies)
A typical variant calling pipeline
11
Mapping reads to reference
Whole Genome Exome-Seq
  Useful for identifying
SNPs but also structural
variants in any genomic
region.
  Interrogates variants in
specific genomic regions,
usually coding sequences.
  Cheaper and provides high
coverage
Read
alignment
Recommended alignment tools: BWA-MEM, Bowtie2, ssaha2, novoalign.
See more tools:
•  Seq Answers Forum: http://seqanswers.com/wiki/Software/list
12
How to identify structural variants?
http://www.sciencedirect.com/science/article/pii/S0168952511001685
Methods and SV tools
1- Read Pair – use pair ends.
GASVPro (Sindi 2012),
BreakDancer (Chen 2009)
2- Read depth – CNVnator
(Abyzov, 2010)
3- Split read – PinDEL (Ye
2009)
4- Denovo assembly
Learn more:
http://bit.ly/16A8FKR
Also see references:
http://bit.ly/1z7xnol - SV
http://bit.ly/qin294 - CNV
http://bit.ly/rLB7jc
http://bit.ly/17fdZTc
13
Exome-Seq
Targeted exome capture
  targets ~20,000 variants
near coding sequences
and a few rare missense or
loss of function variants
  Provides high depth of
coverage for more accurate
variant calling
  It is starting to be used as a
diagnostic tool
Ann Neurol. 2012 Jan;71(1):5-14.
-Nimblegen
-Agilent
-Illumina
14
Analysis
Workflow 
15
Mapping
subject and
family
sequence
data with
BWA
(paired end)
Variants
called with
SAMtools
Filtering out
common
variants in 1000
genomes (>0.05
allele freq)
500K variants
Annotation
using
ANNOVAR*
576 non-
synonymous
filtered
variants
Visualization
of each gene
Whole-exome sequencing reveals a heterozygous LRP5 mutation in a 6-
year-old boy with vertebral compression fractures and low trabecular bone
density Fahiminiya S, Majewski J, Roughley P, Roschger P, Klaushofer K, Rauch F. Bone: July 23, 2013
*Annovar annotated: the type of mutations, presence in
dbSNP132, minor allele frequency in the 1000 Genomes
project, SIFT, Polyphen-2 and PHASTCONS scores
16
Exome-seq
- Also used for finding somatic mutations
Exome sequencing of serous endometrial tumors identifies recurrent somatic
mutations in chromatin- remodeling and ubiquitin ligase complex genes
VOLUME 44 | NUMBER 12 | DECEMBER 2012 Nature Genetics
Figure 1: Somatic mutations
in CHD4, FBXW7 and SPOP
cluster within important
functional domains of the
encoded proteins.
17
Variant Analysis – Class Topics
Introduction
Variant
calling,
filtering and
annotation
Overview of
downstream
analyses
18
Variant
identification
Goals:
•  Identify variant bases, genotype likelihood and
allele frequency while avoiding instrument noise.
•  Generate an output file in vcf format (variant call
format) with genotypes assigned to each sample
19
Challenges of finding variants using NGS
data
  Base calling errors
•  Different types of errors that vary by technology, sequence
cycle and sequence context
 Low coverage sequencing
•  Lack of sequence from two chromosomes of a diploid
individual at a site
  Inaccurate mapping
•  Aligned reads should be reported with mapping quality score
20
Pre-Processing…
21
Processing reads for Variant Analysis
  Trimming:
•  Unless quality of read data is poor, low quality end trimming of short
reads (e.g. Illumina) prior to mapping is usually not required.
•  If adapters are present in a high number of reads, these could be
trimmed to improve mapping.
–  Recommended tools: Prinseq, Btrim64, FastQC
  Marking duplicates:
•  It is not required to remove duplicate reads prior to mapping but
instead it is recommended to mark duplicates after the alignment.
–  Recommended tool: Picard.
•  A library that is composed mainly of PCR duplicates could produce
inaccurate variant calling.
http://www.biomedcentral.com/1471-2164/13/S8/S8
Clip: http://bestclipartblog.com 22
Picard Tools (integrates with GATK)
AddOrReplaceReadGroups.jar
BamIndexStats.jar
BamToBfq.jar
BuildBamIndex.jar
CalculateHsMetrics.jar
CleanSam.jar
CollectAlignmentSummaryMetrics.jar
CollectCDnaMetrics.jar
CollectGcBiasMetrics.jar
CollectInsertSizeMetrics.jar
CollectMultipleMetrics.jar
CompareSAMs.jar
CreateSequenceDictionary.jar
EstimateLibraryComplexity.jar
ExtractIlluminaBarcodes.jar
ExtractSequences.jar
FastqToSam.jar
FixMateInformation.jar
IlluminaBasecallsToSam.jar
MarkDuplicates.jar
MeanQualityByCycle.jar
MergeBamAlignment.jar
MergeSamFiles.jar
NormalizeFasta.jar
picard-1.45.jar
QualityScoreDistribution.jar
ReorderSam.jar
ReplaceSamHeader.jar
RevertSam.jar
sam-1.45.jar
SamFormatConverter.jar
SamToFastq.jar
SortSam.jar
ValidateSamFile.jar
ViewSam.jar
http://picard.sourceforge.net/command-line-overview.shtml
java -jar QualityScoreDistribution.jar I=file.bam CHART=file.pdf
23
SNP Calling Methods
Early SNP callers and some commercial packages use a simple method
of counting reads for each allele that have passed a mapping quality
threshold. ** This is not good enough in particular when coverage is
low.
It is best to use advanced SNP callers which add more statistics for more
accurate variant calling of low coverage datasets and indels.
-e.g. GATK
24
SNP and Genotype calling
http://bib.oxfordjournals.org/content/early/2013/01/21/bib.bbs086/T1.expansion.html
Germline Variant Callers
Somatic Variant Callers
•  Varscan, MuTEC
Samtools: https://github.com/samtools/bcftools/wiki/HOWTOs 25
VCF format (version 4.0)
  Format used to report information about a position in the genome
  Use by 1000 genomes to report all variants
26
VCF format
http://www.broadinstitute.org/gsa/wiki/index.php/Understanding_the_Unified_Genotyper's_VCF_files
27
GATK pipeline (BROAD) for variant calling
http://www.broadinstitute.org/gatk/guide/topic?name=methods-and-workflows
28
GATK Haplotype Caller shows improvements on detecting
misalignments around gaps and reducing false positives
29
GATK uses a Base Quality Recalibration method
http://www.nature.com/ng/journal/v43/n5/full/ng.806.html
Covariates
•  the raw quality score
•  the position of the base in the read
•  the dinucleotide context and the read group
30
GATK uses a bayesian model
http://www.broadinstitute.org/gatk//events/2038/GATKwh0-BP-5-Variant_calling.pdf
•  It determines possible SNP and indel alleles
•  Computes, for each sample, for each genotype, likelihoods of data given
genotypes
•  Computes the allele frequency distribution to determine most likely allele
count, and emit a variant call if determined
– When it reports a variant, it assign a genotype to each sample
31
From the output of thousands of variants,
which ones should you consider?
  Various important considerations
•  Is the variant call of good quality?
•  If you expect a rare mutation, is the variant
commonly found in the general population?
•  What is the predicted effect of the variant?
–  Non-synonymous
–  Detrimental for function
32
Read
alignment
•  Mapping reads to reference
genome, processing to improve
alignment and mark pcr duplicates
Variant
identification
•  Variant calling and
genotyping
Annotation
and Filtering
•  Removing likely false
positive variants
Visualization
and
downstream
analyses
•  (e.g.
association
studies)
A typical variant calling pipeline
33
Filtering variants
  The initial set of variants is
usually filtered extensively in
hopes of removing false
positives.
  More filtering can include public
data or custom filters based on
in-house data
2 novel in chr10
In house
exome
dbSNPs
1000
genomes
17,687 SNPs
PLoS One. 2012;7(1):e29708 34
Removing the common and low frequency to get
the novel SNPs
http://www.cardiovasculairegeneeskunde.nl/bestanden/Kathiresan.pdf
35
Filtering variants
  The initial set of variants is usually filtered extensively
in hopes of removing false positives.
  More filtering can include public data or custom filters
based on in-house data
  Most exome studies will then filter common variants
(>1%) such as those in dbSNP database
–  Good tools for filtering and annotation: Annovar, SNPeff
–  Reports genes at or near variants as well as the type (non-
synonymous coding, UTR, splicing, etc.)
36
Filtering variants
  The initial set of variants is usually filtered extensively in
hopes of removing false positives.
  More filtering can include public data or custom filters
based on in-house data
  Most exome studies will then filter common variants (>1%)
such as those in dbSNP database
–  Good tools for filtering and annotation: Annovar, SNPeff
–  Reports genes at or near variants as well as the type (non-
synonymous coding, UTR, splicing, etc.)
  To filter by variant consequence, use SIFT and Polyphen2.
These are available via Annovar or ENSEMBL VEP
http://www.ensembl.org/Homo_sapiens/Tools/VEP
  For a more flexible annotation workflow, explore GEMINI
http://gemini.readthedocs.org/en/latest/
37
Evaluating Variant Consequence
SIFT predictions - SIFT predicts whether an amino acid substitution affects
protein function based on sequence homology and the physical properties of
amino acids.
PolyPhen predictions - PolyPhen is a tool which predicts possible impact of
an amino acid substitution on the structure and function of a human protein
using straightforward physical and comparative considerations.
Condel consensus predictions - Condel computes a weighed average of the
scores (WAS) of several computational tools aimed at classifying missense
mutations as likely deleterious or likely neutral. The VEP currently presents a
Condel WAS from SIFT and PolyPhen.
38
Read
alignment
•  Mapping reads to reference
genome, processing to improve
alignment and mark pcr duplicates
Variant
identification
•  Variant calling and
genotyping
Annotation
and Filtering
•  Removing likely false
positive variants
Visualization
and
downstream
analyses
•  (e.g.
association
studies)
A typical variant calling pipeline
39
Viewing reads in browser
  If your genome is available via the UCSC genome
browser http://genome.ucsc.edu/, import bam format
file to the UCSC genome browser by hosting the file on
a server and providing the link.
  If your genome is not in UCSC, use another browser
such as IGV http://www.broadinstitute.org/igv/ , or IGB
http://bioviz.org/igb/
•  Import genome (fasta)
•  Import annotations (gff3 or bed format)
•  Import data (bam)
40
Variant Analysis – Class Topics
Introduction
Variant
calling,
filtering and
annotation
Overview of
downstream
analyses
41
Before exome-seq there was GWAS
GWAS – Genome wide association study
  Examination of common genetic variants in individuals
and their association with a trait
  Analysis is usually association to disease (Case/control)
Genome-wide association
study of systemic sclerosis
identifies CD247 as a new
susceptibility locus
Nature Genetics 42, 426–429
(2010)
Manhatan Plot – each
dot is a SNP, Y-axis
shows association level
42
GWAS Association studies
  A typical analysis:
•  Identify SNPs
where one allele
is significantly
more common in
cases than
controls
•  Hardy-Weinberg
Chi Square
http://en.wikipedia.org/wiki/Genome-wide_association_study 43
Analysis Tools for genotype-phenotype association
from sequencing data
  Plink-seq - http://atgu.mgh.harvard.edu/plinkseq/
  EPACTS - http://genome.sph.umich.edu/wiki/EPACTS
  SNPTestv2 / GRANVIL - http://www.well.ox.ac.uk/GRANVIL/
44
Consider a more integrative
approach, which could
include adding more SNP or
exome data, biological
function, pedigree, gene
exprssion network and more.
Science 2002 Oltvai et al., pp. 763 - 764
45
A more integrative approach
http://www.genome.gov/Multimedia/
Slides/SequenceVariants2012/
Day2_IntegratedApproachWG.pdf
PLoS Genet. 2011 Jan
13;7(1):e1001273
46
Pathways Analysis
With pathways analysis
we can study:
• How genes and
molecules communicate
• The effect of a
disruption to a pathway
in relation to the
associated phenotype or
disease
• Which pathways are
likely to be affected an
affected individual
0 1 2 3 4 5
Reference
Test sample1
Test sample2
DNA damage Apoptosis
Interferon signalling
Enrichment and Network
47
Networks Analysis
  Encore: Genetic Association Interaction Network Centrality Pipeline
and Application to SLE Exome Data
Nicholas A. Davis1,†,‡, Caleb A. Lareau2,3,‡, Bill C. White1, Ahwan Pandey1, Graham Wiley3,
Courtney G. Montgomery3, Patrick M. Gaffney3, B. A. McKinney1,2,*
Article first published online: 5 JUN 2013
•  open source network analysis pipeline for genome-wide association
studies and rare variant data
48
Visualization
  Browsers
•  IGV - http://www.broadinstitute.org/igv/ - see slide 31
•  UCSC Genome browser - http://genome.ucsc.edu/
  Other
•  Cytoscape – can be used for example to visualize
variants in the context of affected genes
49
Other applications of variant calling in
Deep Sequence: Virus evolution
Deep Sequencing detects early variants in HIV evolution
http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/v-phaser
http://www.plospathogens.org/article/info%3Adoi%2F10.1371%2Fjournal.ppat.1002529 50
Quasispecies evolution
Whole Genome Deep
Sequencing of HIV-1 Reveals
the Impact of Early Minor
Variants Upon Immune
Recognition During Acute
Infection
  They concluded that the early
control of HIV-1 replication by
immunodominant CD8+ T cell
responses may be substantially
influenced by rapid, low
frequency viral adaptations not
detected by conventional
sequencing approaches, which
warrants further investigation.
http://www.plospathogens.org/article/info%3Adoi%2F10.1371%2Fjournal.ppat.1002529
51
Other applications of variant calling in
Deep Sequence: Fast genotyping
Outbreak investigation using high-throughput genome sequencing
within a diagnostic microbiology laboratory. J Clin Microbiol. 2013 Feb 13
Ion
Torrent
was used!
52
Links for variant calling and filtering tools
  Variant Annotation
-  Annovar http://www.openbioinformatics.org/annovar/
-  snpEff http://snpeff.sourceforge.net
-  SeattleSeq Annotation
http://snp.gs.washington.edu/SeattleSeqAnnotation137/
  Variant Consequence
-  SIFT: http://sift.jcvi.org/
-  Polyphen-2: http://genetics.bwh.harvard.edu/pph2/
  Multiple tools for VCF file manipulation, variant calling,
phenotype association
- Galaxy - https://main.g2.bx.psu.edu/root
53
Thank You
Question or Comments please contact:
quinonesm@niaid.nih.gov
ScienceApps@niaid.nih.gov
54

More Related Content

What's hot

Exome seuencing (steps, method, and applications)
Exome seuencing (steps, method, and applications)Exome seuencing (steps, method, and applications)
Exome seuencing (steps, method, and applications)
Hamza Khan
 
Introduction to Next Generation Sequencing
Introduction to Next Generation SequencingIntroduction to Next Generation Sequencing
Introduction to Next Generation Sequencing
Farid MUSA
 
NGS Data Preprocessing
NGS Data PreprocessingNGS Data Preprocessing
NGS Data Preprocessing
cursoNGS
 
Analysis of ChIP-Seq Data
Analysis of ChIP-Seq DataAnalysis of ChIP-Seq Data
Analysis of ChIP-Seq Data
Phil Ewels
 
RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysis
mikaelhuss
 
Ngs presentation
Ngs presentationNgs presentation
Ngs presentation
Chakradhar Reddy
 
The Cancer Genome Atlas Update
The Cancer Genome Atlas UpdateThe Cancer Genome Atlas Update
The Cancer Genome Atlas Update
Melanoma Research Foundation
 
DNA Methylation: An Essential Element in Epigenetics Facts and Technologies
DNA Methylation: An Essential Element in Epigenetics Facts and TechnologiesDNA Methylation: An Essential Element in Epigenetics Facts and Technologies
DNA Methylation: An Essential Element in Epigenetics Facts and Technologies
QIAGEN
 
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...
Torsten Seemann
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
Uzma Jabeen
 
SNP Detection Methods and applications
SNP Detection Methods and applications SNP Detection Methods and applications
SNP Detection Methods and applications
Aneela Rafiq
 
NGS: Mapping and de novo assembly
NGS: Mapping and de novo assemblyNGS: Mapping and de novo assembly
NGS: Mapping and de novo assembly
Bioinformatics and Computational Biosciences Branch
 
Knowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and VariantsKnowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and Variants
Golden Helix Inc
 
Data analysis pipelines for NGS applications
Data analysis pipelines for NGS applicationsData analysis pipelines for NGS applications
Data analysis pipelines for NGS applications
Vall d'Hebron Institute of Research (VHIR)
 
Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)
LOGESWARAN KA
 
Physical maps and their use in annotations
Physical maps and their use in annotationsPhysical maps and their use in annotations
Physical maps and their use in annotations
Sheetal Mehla
 
2015 functional genomics variant annotation and interpretation- tools and p...
2015 functional genomics   variant annotation and interpretation- tools and p...2015 functional genomics   variant annotation and interpretation- tools and p...
2015 functional genomics variant annotation and interpretation- tools and p...
Gabe Rudy
 
Rna seq pipeline
Rna seq pipelineRna seq pipeline
Rna seq pipeline
Karan Veer Singh
 

What's hot (20)

Exome seuencing (steps, method, and applications)
Exome seuencing (steps, method, and applications)Exome seuencing (steps, method, and applications)
Exome seuencing (steps, method, and applications)
 
Introduction to Next Generation Sequencing
Introduction to Next Generation SequencingIntroduction to Next Generation Sequencing
Introduction to Next Generation Sequencing
 
NGS Data Preprocessing
NGS Data PreprocessingNGS Data Preprocessing
NGS Data Preprocessing
 
Analysis of ChIP-Seq Data
Analysis of ChIP-Seq DataAnalysis of ChIP-Seq Data
Analysis of ChIP-Seq Data
 
RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysis
 
Ngs presentation
Ngs presentationNgs presentation
Ngs presentation
 
Exome Sequencing
Exome SequencingExome Sequencing
Exome Sequencing
 
The Cancer Genome Atlas Update
The Cancer Genome Atlas UpdateThe Cancer Genome Atlas Update
The Cancer Genome Atlas Update
 
DNA Methylation: An Essential Element in Epigenetics Facts and Technologies
DNA Methylation: An Essential Element in Epigenetics Facts and TechnologiesDNA Methylation: An Essential Element in Epigenetics Facts and Technologies
DNA Methylation: An Essential Element in Epigenetics Facts and Technologies
 
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
SNP Detection Methods and applications
SNP Detection Methods and applications SNP Detection Methods and applications
SNP Detection Methods and applications
 
NGS: Mapping and de novo assembly
NGS: Mapping and de novo assemblyNGS: Mapping and de novo assembly
NGS: Mapping and de novo assembly
 
Knowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and VariantsKnowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and Variants
 
Data analysis pipelines for NGS applications
Data analysis pipelines for NGS applicationsData analysis pipelines for NGS applications
Data analysis pipelines for NGS applications
 
Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)
 
Physical maps and their use in annotations
Physical maps and their use in annotationsPhysical maps and their use in annotations
Physical maps and their use in annotations
 
2015 functional genomics variant annotation and interpretation- tools and p...
2015 functional genomics   variant annotation and interpretation- tools and p...2015 functional genomics   variant annotation and interpretation- tools and p...
2015 functional genomics variant annotation and interpretation- tools and p...
 
Ngs ppt
Ngs pptNgs ppt
Ngs ppt
 
Rna seq pipeline
Rna seq pipelineRna seq pipeline
Rna seq pipeline
 

Similar to Variant analysis and whole exome sequencing

Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
GenomeInABottle
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917
GenomeInABottle
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
nist-spin
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
GenomeInABottle
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Nathan Olson
 
Introduction to Apollo for i5k
Introduction to Apollo for i5kIntroduction to Apollo for i5k
Introduction to Apollo for i5k
Monica Munoz-Torres
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
GenomeInABottle
 
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User PerspectiveVarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
Golden Helix
 
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User PerspectiveVarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
Golden Helix
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
Genome Reference Consortium
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM Forum
GenomeInABottle
 
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GenomeInABottle
 
Introduction to 16S Microbiome Analysis
Introduction to 16S Microbiome AnalysisIntroduction to 16S Microbiome Analysis
Introduction to 16S Microbiome Analysis
Bioinformatics and Computational Biosciences Branch
 
Databases_CSS2.pptx
Databases_CSS2.pptxDatabases_CSS2.pptx
Databases_CSS2.pptx
Silpa87
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
European Bioinformatics Institute
 
Review of Liao et al - A draft human pangenome reference - Nature (2023)
Review of Liao et al - A draft human pangenome reference - Nature (2023)Review of Liao et al - A draft human pangenome reference - Nature (2023)
Review of Liao et al - A draft human pangenome reference - Nature (2023)
Stuart MacGowan
 
Overview of the commonly used sequencing platforms, bioinformatic search tool...
Overview of the commonly used sequencing platforms, bioinformatic search tool...Overview of the commonly used sequencing platforms, bioinformatic search tool...
Overview of the commonly used sequencing platforms, bioinformatic search tool...
OECD Environment
 
An introduction to Web Apollo for the Biomphalaria glabatra research community.
An introduction to Web Apollo for the Biomphalaria glabatra research community.An introduction to Web Apollo for the Biomphalaria glabatra research community.
An introduction to Web Apollo for the Biomphalaria glabatra research community.
Monica Munoz-Torres
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
Anshika Bansal
 

Similar to Variant analysis and whole exome sequencing (20)

Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
 
Introduction to Apollo for i5k
Introduction to Apollo for i5kIntroduction to Apollo for i5k
Introduction to Apollo for i5k
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User PerspectiveVarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
 
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User PerspectiveVarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM Forum
 
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
 
Introduction to 16S Microbiome Analysis
Introduction to 16S Microbiome AnalysisIntroduction to 16S Microbiome Analysis
Introduction to 16S Microbiome Analysis
 
Databases_CSS2.pptx
Databases_CSS2.pptxDatabases_CSS2.pptx
Databases_CSS2.pptx
 
Intro to databases
Intro to databasesIntro to databases
Intro to databases
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
 
Review of Liao et al - A draft human pangenome reference - Nature (2023)
Review of Liao et al - A draft human pangenome reference - Nature (2023)Review of Liao et al - A draft human pangenome reference - Nature (2023)
Review of Liao et al - A draft human pangenome reference - Nature (2023)
 
Overview of the commonly used sequencing platforms, bioinformatic search tool...
Overview of the commonly used sequencing platforms, bioinformatic search tool...Overview of the commonly used sequencing platforms, bioinformatic search tool...
Overview of the commonly used sequencing platforms, bioinformatic search tool...
 
An introduction to Web Apollo for the Biomphalaria glabatra research community.
An introduction to Web Apollo for the Biomphalaria glabatra research community.An introduction to Web Apollo for the Biomphalaria glabatra research community.
An introduction to Web Apollo for the Biomphalaria glabatra research community.
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 

More from Bioinformatics and Computational Biosciences Branch

Virus Sequence Alignment and Phylogenetic Analysis 2019
Virus Sequence Alignment and Phylogenetic Analysis 2019Virus Sequence Alignment and Phylogenetic Analysis 2019
Virus Sequence Alignment and Phylogenetic Analysis 2019
Bioinformatics and Computational Biosciences Branch
 
Nephele 2.0: How to get the most out of your Nephele results
Nephele 2.0: How to get the most out of your Nephele resultsNephele 2.0: How to get the most out of your Nephele results
Nephele 2.0: How to get the most out of your Nephele results
Bioinformatics and Computational Biosciences Branch
 
Introduction to METAGENOTE
Introduction to METAGENOTE Introduction to METAGENOTE
Intro to homology modeling
Intro to homology modelingIntro to homology modeling
Protein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modelingProtein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modeling
Bioinformatics and Computational Biosciences Branch
 
Homology modeling: Modeller
Homology modeling: ModellerHomology modeling: Modeller
Protein function prediction
Protein function predictionProtein function prediction
Protein structure prediction with a focus on Rosetta
Protein structure prediction with a focus on RosettaProtein structure prediction with a focus on Rosetta
Protein structure prediction with a focus on Rosetta
Bioinformatics and Computational Biosciences Branch
 
Biological networks
Biological networksBiological networks
UNIX Basics and Cluster Computing
UNIX Basics and Cluster ComputingUNIX Basics and Cluster Computing
UNIX Basics and Cluster Computing
Bioinformatics and Computational Biosciences Branch
 
Statistical applications in GraphPad Prism
Statistical applications in GraphPad PrismStatistical applications in GraphPad Prism
Statistical applications in GraphPad Prism
Bioinformatics and Computational Biosciences Branch
 
Intro to JMP for statistics
Intro to JMP for statisticsIntro to JMP for statistics
Categorical models
Categorical modelsCategorical models
Automating biostatistics workflows using R-based webtools
Automating biostatistics workflows using R-based webtoolsAutomating biostatistics workflows using R-based webtools
Automating biostatistics workflows using R-based webtools
Bioinformatics and Computational Biosciences Branch
 
Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)
Bioinformatics and Computational Biosciences Branch
 
Overview of statistics: Statistical testing (Part I)
Overview of statistics: Statistical testing (Part I)Overview of statistics: Statistical testing (Part I)
Overview of statistics: Statistical testing (Part I)
Bioinformatics and Computational Biosciences Branch
 
GraphPad Prism: Curve fitting
GraphPad Prism: Curve fittingGraphPad Prism: Curve fitting

More from Bioinformatics and Computational Biosciences Branch (20)

Hong_Celine_ES_workshop.pptx
Hong_Celine_ES_workshop.pptxHong_Celine_ES_workshop.pptx
Hong_Celine_ES_workshop.pptx
 
Virus Sequence Alignment and Phylogenetic Analysis 2019
Virus Sequence Alignment and Phylogenetic Analysis 2019Virus Sequence Alignment and Phylogenetic Analysis 2019
Virus Sequence Alignment and Phylogenetic Analysis 2019
 
Nephele 2.0: How to get the most out of your Nephele results
Nephele 2.0: How to get the most out of your Nephele resultsNephele 2.0: How to get the most out of your Nephele results
Nephele 2.0: How to get the most out of your Nephele results
 
Introduction to METAGENOTE
Introduction to METAGENOTE Introduction to METAGENOTE
Introduction to METAGENOTE
 
Intro to homology modeling
Intro to homology modelingIntro to homology modeling
Intro to homology modeling
 
Protein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modelingProtein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modeling
 
Homology modeling: Modeller
Homology modeling: ModellerHomology modeling: Modeller
Homology modeling: Modeller
 
Protein docking
Protein dockingProtein docking
Protein docking
 
Protein function prediction
Protein function predictionProtein function prediction
Protein function prediction
 
Protein structure prediction with a focus on Rosetta
Protein structure prediction with a focus on RosettaProtein structure prediction with a focus on Rosetta
Protein structure prediction with a focus on Rosetta
 
Biological networks
Biological networksBiological networks
Biological networks
 
UNIX Basics and Cluster Computing
UNIX Basics and Cluster ComputingUNIX Basics and Cluster Computing
UNIX Basics and Cluster Computing
 
Statistical applications in GraphPad Prism
Statistical applications in GraphPad PrismStatistical applications in GraphPad Prism
Statistical applications in GraphPad Prism
 
Intro to JMP for statistics
Intro to JMP for statisticsIntro to JMP for statistics
Intro to JMP for statistics
 
Categorical models
Categorical modelsCategorical models
Categorical models
 
Better graphics in R
Better graphics in RBetter graphics in R
Better graphics in R
 
Automating biostatistics workflows using R-based webtools
Automating biostatistics workflows using R-based webtoolsAutomating biostatistics workflows using R-based webtools
Automating biostatistics workflows using R-based webtools
 
Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)
 
Overview of statistics: Statistical testing (Part I)
Overview of statistics: Statistical testing (Part I)Overview of statistics: Statistical testing (Part I)
Overview of statistics: Statistical testing (Part I)
 
GraphPad Prism: Curve fitting
GraphPad Prism: Curve fittingGraphPad Prism: Curve fitting
GraphPad Prism: Curve fitting
 

Recently uploaded

Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
Richard Gill
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
pablovgd
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
anitaento25
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
sonaliswain16
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
Sérgio Sacani
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
AADYARAJPANDEY1
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
muralinath2
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
RenuJangid3
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
AlguinaldoKong
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
muralinath2
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
muralinath2
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
muralinath2
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
aishnasrivastava
 

Recently uploaded (20)

Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 

Variant analysis and whole exome sequencing

  • 1. February 4th, 2015 Mariam Quiñones, PhD Computational Molecular Biology Specialist Bioinformatics and Computational Biosciences Branch OCICB/OSMO/OD/NIAID/NIH
  • 2. BCBB: A Branch Devoted to Bioinformatics and Computational Biosciences   Researchers’ time is increasingly important   BCBB saves our collaborators time and effort   Researchers speed projects to completion using BCBB consultation and development services   No need to hire extra post docs or use external consultants or developers What makes us different? Software DevelopersComputational Biologists PMs and Analysts 2
  • 3. BCBB   “NIH Users: Access a menu of BCBB services on the NIAID Intranet: •  http://bioinformatics.niaid.nih.gov/   Outside of NIH – •  search “BCBB” on the NIAID Public Internet Page: www.niaid.nih.gov   Email us at: •  ScienceApps@niaid.nih.gov 3
  • 4. Remember Sanger? •  Sanger introduced the “dideoxy method” (also known as Sanger sequencing) in December 1977 Alignment of reads using tools such as ‘Sequencher’ IMAGE: http://www.lifetechnologies.com 4
  • 6. File formats for aligned reads   SAM (sequence alignment map) CTTGGGCTGCGTCGTGTCTTCGCTTCACACCCGCGACGAGCGCGGCTTCT CTTGGGCTGCGTCGTGTCTTCGCTTCACACC Chr_ start end Chr2 100000 100050 Chr_ start end Chr2 50000 50050 6
  • 7. Most commonly used alignment file formats   SAM (sequence alignment map) Unified format for storing alignments to a reference genome   BAM (binary version of SAM) – used commonly to deliver data Compressed SAM file, is normally indexed   BED Commonly used to report features described by chrom, start, end, name, score, and strand. For example: chr1 11873 14409 uc001aaa.3 0 + 7
  • 8. Variant Analysis – Class Topics Introduction Variant calling, filtering and annotation Overview of downstream analyses 8
  • 9. Efforts at creating databases of variants: •  Project that started on 2002 with the goal of describing patterns of human genetic variation and create a haplotype map using SNPs present in at least 1% of the population, which were deposited in dbSNPs. 1000 Genomes •  Started in 2008 with a goal of using at least 1000 individuals (about 2,500 samples at 4X coverage), interrogate 1000 gene regions in 900 samples (exome analysis), find most genetic variants with allele frequencies above 1% and to a 0.1% if in coding regions as well as Indels and structural variants •  Make data available to the public ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/ or via Amazon Cloud http://s3.amazonaws.com/1000genomes 9
  • 10. Variants are not only found in coding sequences Main Goal: •  Find all functional elements in the genome 10
  • 11. Read alignment •  Mapping reads to reference genome, processing to improve alignment and mark PCR duplicates Variant identification •  Variant calling and genotyping Annotation and Filtering •  Removing likely false positive variants Visualization and downstream analyses •  (e.g. association studies) A typical variant calling pipeline 11
  • 12. Mapping reads to reference Whole Genome Exome-Seq   Useful for identifying SNPs but also structural variants in any genomic region.   Interrogates variants in specific genomic regions, usually coding sequences.   Cheaper and provides high coverage Read alignment Recommended alignment tools: BWA-MEM, Bowtie2, ssaha2, novoalign. See more tools: •  Seq Answers Forum: http://seqanswers.com/wiki/Software/list 12
  • 13. How to identify structural variants? http://www.sciencedirect.com/science/article/pii/S0168952511001685 Methods and SV tools 1- Read Pair – use pair ends. GASVPro (Sindi 2012), BreakDancer (Chen 2009) 2- Read depth – CNVnator (Abyzov, 2010) 3- Split read – PinDEL (Ye 2009) 4- Denovo assembly Learn more: http://bit.ly/16A8FKR Also see references: http://bit.ly/1z7xnol - SV http://bit.ly/qin294 - CNV http://bit.ly/rLB7jc http://bit.ly/17fdZTc 13
  • 14. Exome-Seq Targeted exome capture   targets ~20,000 variants near coding sequences and a few rare missense or loss of function variants   Provides high depth of coverage for more accurate variant calling   It is starting to be used as a diagnostic tool Ann Neurol. 2012 Jan;71(1):5-14. -Nimblegen -Agilent -Illumina 14
  • 16. Mapping subject and family sequence data with BWA (paired end) Variants called with SAMtools Filtering out common variants in 1000 genomes (>0.05 allele freq) 500K variants Annotation using ANNOVAR* 576 non- synonymous filtered variants Visualization of each gene Whole-exome sequencing reveals a heterozygous LRP5 mutation in a 6- year-old boy with vertebral compression fractures and low trabecular bone density Fahiminiya S, Majewski J, Roughley P, Roschger P, Klaushofer K, Rauch F. Bone: July 23, 2013 *Annovar annotated: the type of mutations, presence in dbSNP132, minor allele frequency in the 1000 Genomes project, SIFT, Polyphen-2 and PHASTCONS scores 16
  • 17. Exome-seq - Also used for finding somatic mutations Exome sequencing of serous endometrial tumors identifies recurrent somatic mutations in chromatin- remodeling and ubiquitin ligase complex genes VOLUME 44 | NUMBER 12 | DECEMBER 2012 Nature Genetics Figure 1: Somatic mutations in CHD4, FBXW7 and SPOP cluster within important functional domains of the encoded proteins. 17
  • 18. Variant Analysis – Class Topics Introduction Variant calling, filtering and annotation Overview of downstream analyses 18
  • 19. Variant identification Goals: •  Identify variant bases, genotype likelihood and allele frequency while avoiding instrument noise. •  Generate an output file in vcf format (variant call format) with genotypes assigned to each sample 19
  • 20. Challenges of finding variants using NGS data   Base calling errors •  Different types of errors that vary by technology, sequence cycle and sequence context  Low coverage sequencing •  Lack of sequence from two chromosomes of a diploid individual at a site   Inaccurate mapping •  Aligned reads should be reported with mapping quality score 20
  • 22. Processing reads for Variant Analysis   Trimming: •  Unless quality of read data is poor, low quality end trimming of short reads (e.g. Illumina) prior to mapping is usually not required. •  If adapters are present in a high number of reads, these could be trimmed to improve mapping. –  Recommended tools: Prinseq, Btrim64, FastQC   Marking duplicates: •  It is not required to remove duplicate reads prior to mapping but instead it is recommended to mark duplicates after the alignment. –  Recommended tool: Picard. •  A library that is composed mainly of PCR duplicates could produce inaccurate variant calling. http://www.biomedcentral.com/1471-2164/13/S8/S8 Clip: http://bestclipartblog.com 22
  • 23. Picard Tools (integrates with GATK) AddOrReplaceReadGroups.jar BamIndexStats.jar BamToBfq.jar BuildBamIndex.jar CalculateHsMetrics.jar CleanSam.jar CollectAlignmentSummaryMetrics.jar CollectCDnaMetrics.jar CollectGcBiasMetrics.jar CollectInsertSizeMetrics.jar CollectMultipleMetrics.jar CompareSAMs.jar CreateSequenceDictionary.jar EstimateLibraryComplexity.jar ExtractIlluminaBarcodes.jar ExtractSequences.jar FastqToSam.jar FixMateInformation.jar IlluminaBasecallsToSam.jar MarkDuplicates.jar MeanQualityByCycle.jar MergeBamAlignment.jar MergeSamFiles.jar NormalizeFasta.jar picard-1.45.jar QualityScoreDistribution.jar ReorderSam.jar ReplaceSamHeader.jar RevertSam.jar sam-1.45.jar SamFormatConverter.jar SamToFastq.jar SortSam.jar ValidateSamFile.jar ViewSam.jar http://picard.sourceforge.net/command-line-overview.shtml java -jar QualityScoreDistribution.jar I=file.bam CHART=file.pdf 23
  • 24. SNP Calling Methods Early SNP callers and some commercial packages use a simple method of counting reads for each allele that have passed a mapping quality threshold. ** This is not good enough in particular when coverage is low. It is best to use advanced SNP callers which add more statistics for more accurate variant calling of low coverage datasets and indels. -e.g. GATK 24
  • 25. SNP and Genotype calling http://bib.oxfordjournals.org/content/early/2013/01/21/bib.bbs086/T1.expansion.html Germline Variant Callers Somatic Variant Callers •  Varscan, MuTEC Samtools: https://github.com/samtools/bcftools/wiki/HOWTOs 25
  • 26. VCF format (version 4.0)   Format used to report information about a position in the genome   Use by 1000 genomes to report all variants 26
  • 28. GATK pipeline (BROAD) for variant calling http://www.broadinstitute.org/gatk/guide/topic?name=methods-and-workflows 28
  • 29. GATK Haplotype Caller shows improvements on detecting misalignments around gaps and reducing false positives 29
  • 30. GATK uses a Base Quality Recalibration method http://www.nature.com/ng/journal/v43/n5/full/ng.806.html Covariates •  the raw quality score •  the position of the base in the read •  the dinucleotide context and the read group 30
  • 31. GATK uses a bayesian model http://www.broadinstitute.org/gatk//events/2038/GATKwh0-BP-5-Variant_calling.pdf •  It determines possible SNP and indel alleles •  Computes, for each sample, for each genotype, likelihoods of data given genotypes •  Computes the allele frequency distribution to determine most likely allele count, and emit a variant call if determined – When it reports a variant, it assign a genotype to each sample 31
  • 32. From the output of thousands of variants, which ones should you consider?   Various important considerations •  Is the variant call of good quality? •  If you expect a rare mutation, is the variant commonly found in the general population? •  What is the predicted effect of the variant? –  Non-synonymous –  Detrimental for function 32
  • 33. Read alignment •  Mapping reads to reference genome, processing to improve alignment and mark pcr duplicates Variant identification •  Variant calling and genotyping Annotation and Filtering •  Removing likely false positive variants Visualization and downstream analyses •  (e.g. association studies) A typical variant calling pipeline 33
  • 34. Filtering variants   The initial set of variants is usually filtered extensively in hopes of removing false positives.   More filtering can include public data or custom filters based on in-house data 2 novel in chr10 In house exome dbSNPs 1000 genomes 17,687 SNPs PLoS One. 2012;7(1):e29708 34
  • 35. Removing the common and low frequency to get the novel SNPs http://www.cardiovasculairegeneeskunde.nl/bestanden/Kathiresan.pdf 35
  • 36. Filtering variants   The initial set of variants is usually filtered extensively in hopes of removing false positives.   More filtering can include public data or custom filters based on in-house data   Most exome studies will then filter common variants (>1%) such as those in dbSNP database –  Good tools for filtering and annotation: Annovar, SNPeff –  Reports genes at or near variants as well as the type (non- synonymous coding, UTR, splicing, etc.) 36
  • 37. Filtering variants   The initial set of variants is usually filtered extensively in hopes of removing false positives.   More filtering can include public data or custom filters based on in-house data   Most exome studies will then filter common variants (>1%) such as those in dbSNP database –  Good tools for filtering and annotation: Annovar, SNPeff –  Reports genes at or near variants as well as the type (non- synonymous coding, UTR, splicing, etc.)   To filter by variant consequence, use SIFT and Polyphen2. These are available via Annovar or ENSEMBL VEP http://www.ensembl.org/Homo_sapiens/Tools/VEP   For a more flexible annotation workflow, explore GEMINI http://gemini.readthedocs.org/en/latest/ 37
  • 38. Evaluating Variant Consequence SIFT predictions - SIFT predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. PolyPhen predictions - PolyPhen is a tool which predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. Condel consensus predictions - Condel computes a weighed average of the scores (WAS) of several computational tools aimed at classifying missense mutations as likely deleterious or likely neutral. The VEP currently presents a Condel WAS from SIFT and PolyPhen. 38
  • 39. Read alignment •  Mapping reads to reference genome, processing to improve alignment and mark pcr duplicates Variant identification •  Variant calling and genotyping Annotation and Filtering •  Removing likely false positive variants Visualization and downstream analyses •  (e.g. association studies) A typical variant calling pipeline 39
  • 40. Viewing reads in browser   If your genome is available via the UCSC genome browser http://genome.ucsc.edu/, import bam format file to the UCSC genome browser by hosting the file on a server and providing the link.   If your genome is not in UCSC, use another browser such as IGV http://www.broadinstitute.org/igv/ , or IGB http://bioviz.org/igb/ •  Import genome (fasta) •  Import annotations (gff3 or bed format) •  Import data (bam) 40
  • 41. Variant Analysis – Class Topics Introduction Variant calling, filtering and annotation Overview of downstream analyses 41
  • 42. Before exome-seq there was GWAS GWAS – Genome wide association study   Examination of common genetic variants in individuals and their association with a trait   Analysis is usually association to disease (Case/control) Genome-wide association study of systemic sclerosis identifies CD247 as a new susceptibility locus Nature Genetics 42, 426–429 (2010) Manhatan Plot – each dot is a SNP, Y-axis shows association level 42
  • 43. GWAS Association studies   A typical analysis: •  Identify SNPs where one allele is significantly more common in cases than controls •  Hardy-Weinberg Chi Square http://en.wikipedia.org/wiki/Genome-wide_association_study 43
  • 44. Analysis Tools for genotype-phenotype association from sequencing data   Plink-seq - http://atgu.mgh.harvard.edu/plinkseq/   EPACTS - http://genome.sph.umich.edu/wiki/EPACTS   SNPTestv2 / GRANVIL - http://www.well.ox.ac.uk/GRANVIL/ 44
  • 45. Consider a more integrative approach, which could include adding more SNP or exome data, biological function, pedigree, gene exprssion network and more. Science 2002 Oltvai et al., pp. 763 - 764 45
  • 46. A more integrative approach http://www.genome.gov/Multimedia/ Slides/SequenceVariants2012/ Day2_IntegratedApproachWG.pdf PLoS Genet. 2011 Jan 13;7(1):e1001273 46
  • 47. Pathways Analysis With pathways analysis we can study: • How genes and molecules communicate • The effect of a disruption to a pathway in relation to the associated phenotype or disease • Which pathways are likely to be affected an affected individual 0 1 2 3 4 5 Reference Test sample1 Test sample2 DNA damage Apoptosis Interferon signalling Enrichment and Network 47
  • 48. Networks Analysis   Encore: Genetic Association Interaction Network Centrality Pipeline and Application to SLE Exome Data Nicholas A. Davis1,†,‡, Caleb A. Lareau2,3,‡, Bill C. White1, Ahwan Pandey1, Graham Wiley3, Courtney G. Montgomery3, Patrick M. Gaffney3, B. A. McKinney1,2,* Article first published online: 5 JUN 2013 •  open source network analysis pipeline for genome-wide association studies and rare variant data 48
  • 49. Visualization   Browsers •  IGV - http://www.broadinstitute.org/igv/ - see slide 31 •  UCSC Genome browser - http://genome.ucsc.edu/   Other •  Cytoscape – can be used for example to visualize variants in the context of affected genes 49
  • 50. Other applications of variant calling in Deep Sequence: Virus evolution Deep Sequencing detects early variants in HIV evolution http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/v-phaser http://www.plospathogens.org/article/info%3Adoi%2F10.1371%2Fjournal.ppat.1002529 50
  • 51. Quasispecies evolution Whole Genome Deep Sequencing of HIV-1 Reveals the Impact of Early Minor Variants Upon Immune Recognition During Acute Infection   They concluded that the early control of HIV-1 replication by immunodominant CD8+ T cell responses may be substantially influenced by rapid, low frequency viral adaptations not detected by conventional sequencing approaches, which warrants further investigation. http://www.plospathogens.org/article/info%3Adoi%2F10.1371%2Fjournal.ppat.1002529 51
  • 52. Other applications of variant calling in Deep Sequence: Fast genotyping Outbreak investigation using high-throughput genome sequencing within a diagnostic microbiology laboratory. J Clin Microbiol. 2013 Feb 13 Ion Torrent was used! 52
  • 53. Links for variant calling and filtering tools   Variant Annotation -  Annovar http://www.openbioinformatics.org/annovar/ -  snpEff http://snpeff.sourceforge.net -  SeattleSeq Annotation http://snp.gs.washington.edu/SeattleSeqAnnotation137/   Variant Consequence -  SIFT: http://sift.jcvi.org/ -  Polyphen-2: http://genetics.bwh.harvard.edu/pph2/   Multiple tools for VCF file manipulation, variant calling, phenotype association - Galaxy - https://main.g2.bx.psu.edu/root 53
  • 54. Thank You Question or Comments please contact: quinonesm@niaid.nih.gov ScienceApps@niaid.nih.gov 54