The Global Network for Genomic Medicine™
Using Genome in a Bottle Data
Andrew Carroll, PhD
Director of Science
What is DNAnexus
Genomic Analysis in the Cloud. Scalable, Cost Effective, Secure, Compliant.
What’s in the Talk
• GIAB in PrecisionFDA
• Datasets on DNAnexus
• Example 1: Comparing mapper + variant caller combinations
• Example 2: Assessing structural variation in AJ-Trio
https://precision.fda.gov/
Public X-Ten Data on DNAnexus
Benchmarking well-known bioinformatics aligners and variant callers using the Pilot Genome (NA12878)
A. B. Diallo, A. Carroll, B. Hannigan, M. Kinsella, S. Ma, N. Thangaraj
DNAnexus, Mountain View, CA. Email: adiallo@dnanexus.com
Mappers
BWA maps sequences against large reference genomes, such as the human genome. It performs very well for low-divergence sequences or reads.
Bowtie2 is a memory-efficient tool for aligning sequencing reads to long reference sequences. It performs extremely well for reads between 50 bp and 1,000 bp long.
ISAAC, developed by Illumina, is a combined DNA sequence aligner and variant caller that uses high-memory hardware to improve efficiency and accuracy.
SNAP is a relatively new aligner, as accurate as existing tools such as BWA-MEM, Bowtie2, and Novoalign. It was developed by a team from the UC Berkeley AMP Lab, Microsoft, and UCSF.
Variant Callers
Atlas is a variant caller known for differentiating genuine SNPs and indels from sequencing and mapping errors. It is mainly used for whole-exome data.
FreeBayes is a haplotype-based Bayesian genetic variant caller designed to find small polymorphisms, specifically SNPs, indels, MNPs, and complex events smaller than the length of a short-read sequencing alignment.
GATK HaplotypeCaller is one of the most popular variant callers. It calls SNPs and indels simultaneously using local de novo assembly and a Bayesian statistical model.
ISAAC, developed by Illumina, is a combined DNA sequence aligner and variant caller that uses high-memory hardware to improve efficiency and accuracy.
Platypus is an efficient variant detection tool that can detect SNPs, MNPs, short indels, and replacements up to several kb.
One Example Analysis
SNPs: Sensitivity (rows: mappers, columns: variant callers)
Mapper    Atlas     FreeBayes  GATK      ISAAC     Platypus
Bowtie2   0.82350   0.94507    0.95825   0.94282   0.89522
BWA       0.97194   0.94550    0.98176   0.93319   0.91955
ISAAC     0.88343   0.93066    0.96390   0.95965   0.90659
SNAP      0.86781   0.93111    0.97531   0.96337   0.91221
SNPs: Specificity (rows: mappers, columns: variant callers)
Mapper    Atlas     FreeBayes  GATK      ISAAC     Platypus
Bowtie2   0.99333   0.98662    0.98946   0.98716   0.98762
BWA       0.98641   0.99229    0.98857   0.93550   0.99093
ISAAC     0.99296   0.99244    0.98332   0.98800   0.99128
SNAP      0.96294   0.98893    0.97712   0.97778   0.99114
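The slides report sensitivity and a second metric that is labeled "specificity" on the per-combination charts and "precision" on the summary chart. As a minimal sketch, assuming the usual truth-set definitions (sensitivity = TP / (TP + FN), precision = TP / (TP + FP)), the two values could be computed as below. The `Counts` class and the example counts are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical counts from comparing a call set to a truth set."""
    tp: int  # called variants that are present in the truth set
    fp: int  # called variants that are absent from the truth set
    fn: int  # truth-set variants that were not called


def sensitivity(c: Counts) -> float:
    """Fraction of truth-set variants that were called."""
    return c.tp / (c.tp + c.fn)


def precision(c: Counts) -> float:
    """Fraction of called variants that are in the truth set."""
    return c.tp / (c.tp + c.fp)


# Made-up counts for illustration only.
example = Counts(tp=950, fp=10, fn=50)
print(f"sensitivity = {sensitivity(example):.5f}")
print(f"precision   = {precision(example):.5f}")
```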
SNPs: Average Sensitivity and Specificity by Mapper
              Bowtie    BWA       ISAAC     SNAP
Sensitivity   0.91297   0.95039   0.92885   0.93980
Precision     0.98884   0.97874   0.98960   0.97958
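A sketch of how the per-mapper averages could be derived, assuming each average is a simple row mean over the five variant callers in the SNP sensitivity table. The variable names are illustrative.

```python
from statistics import mean

# SNP sensitivity per mapper; columns are Atlas, FreeBayes, GATK, ISAAC, Platypus.
snp_sensitivity = {
    "Bowtie2": [0.82350, 0.94507, 0.95825, 0.94282, 0.89522],
    "BWA": [0.97194, 0.94550, 0.98176, 0.93319, 0.91955],
    "ISAAC": [0.88343, 0.93066, 0.96390, 0.95965, 0.90659],
    "SNAP": [0.86781, 0.93111, 0.97531, 0.96337, 0.91221],
}

# Row mean per mapper, rounded to the five decimals used on the slides.
average_sensitivity = {
    mapper: round(mean(values), 5) for mapper, values in snp_sensitivity.items()
}

for mapper, value in average_sensitivity.items():
    print(f"{mapper:8s} {value:.5f}")
```

Under this assumption the Bowtie2, BWA, and ISAAC row means reproduce the summary values (0.91297, 0.95039, 0.92885). The SNAP row mean computed from the table as transcribed is 0.92996 rather than 0.93980, so one of the SNAP figures may differ from the original chart data.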
Indels: Sensitivity (rows: mappers, columns: variant callers)
Mapper    Atlas     FreeBayes  GATK      ISAAC     Platypus
Bowtie2   0.49795   0.78538    0.81924   0.81839   0.85304
BWA       0.76214   0.83467    0.79286   0.78319   0.87780
ISAAC     0.50326   0.73087    0.74374   0.78490   0.83327
SNAP      0.39509   0.71881    0.64430   0.09749   0.68149
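To read the indel sensitivity grid as a ranking, the mapper and caller combinations can be sorted by value. This is only an illustrative sketch; the variable names are not from the talk.

```python
callers = ["Atlas", "FreeBayes", "GATK", "ISAAC", "Platypus"]

# Indel sensitivity per mapper, in the caller order listed above.
indel_sensitivity = {
    "Bowtie2": [0.49795, 0.78538, 0.81924, 0.81839, 0.85304],
    "BWA": [0.76214, 0.83467, 0.79286, 0.78319, 0.87780],
    "ISAAC": [0.50326, 0.73087, 0.74374, 0.78490, 0.83327],
    "SNAP": [0.39509, 0.71881, 0.64430, 0.09749, 0.68149],
}

# Flatten the grid into (value, mapper, caller) tuples, highest value first.
ranked = sorted(
    (
        (value, mapper, caller)
        for mapper, row in indel_sensitivity.items()
        for caller, value in zip(callers, row)
    ),
    reverse=True,
)

best, worst = ranked[0], ranked[-1]
print(f"best:  {best[1]} + {best[2]} ({best[0]:.5f})")
print(f"worst: {worst[1]} + {worst[2]} ({worst[0]:.5f})")
```

On these figures, BWA with Platypus has the highest indel sensitivity (0.87780) and SNAP with ISAAC the lowest (0.09749).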
Indels: Specificity (rows: mappers, columns: variant callers)
Mapper    Atlas     FreeBayes  GATK      ISAAC     Platypus
Bowtie2   0.78217   0.72941    0.62806   0.81787   0.65759
BWA       0.73470   0.85447    0.73998   0.69092   0.67424
ISAAC     0.62956   0.72843    0.85435   0.84755   0.66588
SNAP      0.57962   0.70933    0.64279   0.07105   0.36763
Mappers: CPU-hours
            Bowtie   BWA     ISAAC   SNAP
CPU-hours   308.3    236     94.4    102.7
Variant Callers: CPU-hours
            Atlas    FreeBayes   GATK    ISAAC   Platypus
CPU-hours   270.6    60.8        436.4   37.9    10.9
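The two CPU-hour charts can be combined into a rough cost estimate per pipeline. The sketch below assumes that costs are additive and that a caller's CPU-hours do not depend on which mapper produced the alignments; the slides give only one figure per tool, so this is an approximation.

```python
from itertools import product

mapper_hours = {"Bowtie": 308.3, "BWA": 236.0, "ISAAC": 94.4, "SNAP": 102.7}
caller_hours = {
    "Atlas": 270.6,
    "FreeBayes": 60.8,
    "GATK": 436.4,
    "ISAAC": 37.9,
    "Platypus": 10.9,
}

# Estimated total CPU-hours for every mapper + caller pipeline.
total_hours = {
    (mapper, caller): round(mapper_hours[mapper] + caller_hours[caller], 1)
    for mapper, caller in product(mapper_hours, caller_hours)
}

cheapest = min(total_hours, key=total_hours.get)
most_expensive = max(total_hours, key=total_hours.get)
print(cheapest, total_hours[cheapest])
print(most_expensive, total_hours[most_expensive])
```

Under these assumptions the cheapest pipeline is the ISAAC mapper with Platypus (about 105.3 CPU-hours) and the most expensive is Bowtie with GATK (about 744.7 CPU-hours).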
Use of AJ-Trio to Understand SV
Baylor College of Medicine
Characterizing large genomic variants is essential to expanding the research & clinical applications of genome sequencing.
Adam English, Will Salerno, Narayanan Veeraraghavan, Singer Ma, Andrew Carroll
Pipeline Schematic
Development through Orthogonal Technology
GIAB Inheritance Benchmarks
DNAnexus is working actively with Genome in a Bottle to help develop high-quality benchmark datasets for structural variations in the Ashkenazi Jewish Trio, applying Parliament to combine Illumina and PacBio data alongside other techniques.
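One common way a trio can serve as an inheritance benchmark is a Mendelian consistency check: a child's genotype at a site should be explainable by one allele from each parent. The sketch below illustrates that general idea only. It is an assumption for illustration, not the method described in the talk or the way Parliament works, and the function name and genotype encoding (0 = reference allele, 1 = variant allele) are hypothetical.

```python
from itertools import product
from typing import Tuple

Genotype = Tuple[int, int]  # diploid genotype, e.g. (0, 1) for a heterozygous call


def is_mendelian_consistent(
    child: Genotype, father: Genotype, mother: Genotype
) -> bool:
    """Return True if the child could inherit one allele from each parent."""
    return any(
        sorted((paternal, maternal)) == sorted(child)
        for paternal, maternal in product(father, mother)
    )


# Hypothetical genotypes at a single structural-variant site.
print(is_mendelian_consistent(child=(0, 1), father=(0, 0), mother=(1, 1)))  # True
print(is_mendelian_consistent(child=(1, 1), father=(0, 0), mother=(0, 1)))  # False
```

Sites where the check fails are candidates for a calling error in at least one family member, although a genuine de novo event would fail it as well.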
Jan2016 dnanexus giab uses andrew carroll
