Genome variation graphs
with the vg toolkit
Jean Monlong
Updates from the GRC & GIAB
Oct 15, 2019
Variation Graphs
An approach to incorporating information on human diversity into the
genomic reference.
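As a rough illustration of the idea, a variation graph can be pictured as sequence-labelled nodes joined by edges, with each haplotype being a walk through the graph. The sketch below is a toy model only: the class name `VariationGraph`, its methods, and the example sequences are hypothetical and are not vg's data model (vg graphs are bidirected and far more compact).

```python
from collections import defaultdict


class VariationGraph:
    """Toy directed sequence graph (hypothetical, not vg's implementation)."""

    def __init__(self):
        self.sequences = {}            # node id -> DNA sequence
        self.edges = defaultdict(set)  # node id -> ids of successor nodes

    def add_node(self, node_id, sequence):
        self.sequences[node_id] = sequence

    def add_edge(self, src, dst):
        self.edges[src].add(dst)

    def spell(self, path):
        """Return the sequence spelled by a walk, checking every step is an edge."""
        for src, dst in zip(path, path[1:]):
            if dst not in self.edges[src]:
                raise ValueError(f"no edge {src} -> {dst}")
        return "".join(self.sequences[node_id] for node_id in path)


# A single SNP (A/C) becomes a "bubble" between two shared flanking nodes.
graph = VariationGraph()
graph.add_node(1, "GATT")
graph.add_node(2, "A")    # reference allele
graph.add_node(3, "C")    # alternate allele
graph.add_node(4, "CAT")
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    graph.add_edge(src, dst)

reference_haplotype = graph.spell([1, 2, 4])
alternate_haplotype = graph.spell([1, 3, 4])
print(reference_haplotype, alternate_haplotype)
```

Both alleles are first-class paths here, which is the property that lets reads carrying the alternate allele align without mismatches.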
2
Sequencing reads map better on variation graphs
3
vg is a complete,
open source solution
for graph construction,
read mapping,
and variant calling.
https://github.com/vgteam/vg
4
Garrison et al (2018). Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nature Biotechnology.
Better read mapping in regions with variants
5
Better allele balances at heterozygous sites
(Figure: alternate allele fraction (AltAlleleFraction) for deletions by size, insertions by size, and SNPs.)
Garrison et al (2018). Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nature Biotechnology.
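The allele balance on this slide can be made concrete with a small calculation. At a heterozygous site, unbiased mapping should give an alternate allele fraction near 0.5, and reference bias pulls it below that. The function name and the read counts below are illustrative assumptions, not values from the paper.

```python
def alt_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads supporting the alternate allele, or None without coverage."""
    total = ref_reads + alt_reads
    if total == 0:
        return None
    return alt_reads / total


# Hypothetical read counts at one heterozygous site.
balanced = alt_allele_fraction(ref_reads=15, alt_reads=15)  # no mapping bias
biased = alt_allele_fraction(ref_reads=20, alt_reads=10)    # some alt reads lost

print(balanced, round(biased, 3))
```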
6
Structural Variants (SVs)
Genomic variants >50 bp, e.g. insertions, deletions, inversions.
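A minimal sketch of the size threshold, assuming variants are given as VCF-style REF/ALT strings. The function name and labels are hypothetical. Inversions and other balanced events do not change length, so this simple length check cannot detect them.

```python
SV_MIN_LENGTH = 50  # threshold from the slide: variants >50 bp


def variant_class(ref, alt):
    """Classify a REF/ALT pair by length difference (illustrative only)."""
    delta = len(alt) - len(ref)
    if delta == 0:
        # Same length: SNV/MNV, or a balanced event this check cannot see.
        return "substitution"
    kind = "insertion" if delta > 0 else "deletion"
    scale = "SV" if abs(delta) > SV_MIN_LENGTH else "indel"
    return f"{scale} {kind}"


print(variant_class("A", "A" + "T" * 60))  # 60 bp inserted
print(variant_class("ACGT", "A"))          # 3 bp deleted
```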
7
Genotyping SVs from short-read data
High-quality SV catalogs from long-read sequencing studies (HGSVC 2019, GIAB 2019, SVPOP 2019).
Graph-based SV genotypers: vg, Paragraph, BayesTyper
Traditional SV genotypers: Delly, SVTyper
Hickey et al (2019). Genotyping structural variants in pangenome graphs using the vg toolkit. bioRxiv.
8
Graph from de novo assemblies
Experiment with 12 yeast strains.
- Better read mapping.
- SVs better supported by reads.
Hickey et al (2019). Genotyping structural variants in pangenome graphs using the vg toolkit. bioRxiv.
9
Variant normalization
(Figure steps: construct graph, then realign.)
Robin Rounthwaite
10
Transcriptome analysis with variation graphs
Jonas Sibbesen
https://github.com/jonassibbesen/rpvg
11
New graph formats optimized for memory and speed
VG: Legacy
PG (PackedGraph): Small
HG (HashGraph): Fast
OG (Optimized Dynamic Graph Index): Balanced
XG: Small and fast, but static
Jordan Eizenga, Emily Fujimoto, Erik Garrison
https://github.com/vgteam/libbdsg
12
Graph Burrows-Wheeler Transform
Stores 2,504 haplotypes from 1K Genomes Project in 14.6 GiB
Can scale past tens of thousands of haplotypes
Sirén et al. (2018). Haplotype-aware graph indexes. In 18th International Workshop on Algorithms in Bioinformatics (WABI)
https://github.com/jltsiren/gbwtgraph
13
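For readers unfamiliar with the underlying transform, here is a sketch of the classic Burrows-Wheeler transform on a plain string. This is not the GBWT itself, which indexes haplotype paths through a graph. It only shows the string transform that the graph version builds on, using a naive rotation sort that would not scale to real data.

```python
def bwt(text, terminator="$"):
    """Classic BWT via sorted rotations (naive; for illustration only)."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    text += terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


transformed = bwt("banana")
print(transformed)  # annb$aa
```

The transform tends to group identical symbols into runs, which is what makes compressed indexes over many similar sequences practical.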
Faster short read mapping with giraffe
Xian Chang, Jouni Sirén, Adam Novak
Assumes that most indels are already in the graph and that a large haplotype database is available.
- Minimizer-based indexing.
- Restrict to known haplotypes.
- Gapless extension.
- Faster clustering using a snarl tree.
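The first step in the list above, minimizer-based indexing, can be sketched on a plain string. This is a simplified illustration, not giraffe's implementation: the function name and parameters are hypothetical, k-mers are compared lexicographically rather than by hash, and reverse complements and graph paths are ignored.

```python
def minimizers(sequence, k, w):
    """Return sorted (position, k-mer) pairs: the smallest k-mer in every
    window of w consecutive k-mers, with ties broken by leftmost position."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i], i))
        selected.add((best, kmers[best]))
    return sorted(selected)


seeds = minimizers("GATTACA", k=3, w=3)
print(seeds)  # [(1, 'ATT'), (4, 'ACA')]
```

Only two of the five 3-mers are kept, yet every window still contains a seed. Indexing this subset rather than every k-mer is what keeps the index small and lookups fast.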
14
Visualization
https://github.com/vgteam/sequenceTubeMap
Beyer et al. (2019). Sequence tube maps: making graph genomes intuitive to commuters. Bioinformatics (Oxford, England).
15
Pipelines in Toil and WDL
https://github.com/vgteam/toil-vg
https://github.com/vgteam/vg_wdl
toil-vg WDL
Graph construction x
Read mapping x x
Variant calling (SNVs/indels & SVs) x x
Charles Markello, Glenn Hickey, Mike Lin, Eric T. Dawson
16
Acknowledgements
Benedict Paten
Adam Novak
Erik Garrison
Jordan Eizenga
Glenn Hickey
Jouni Sirén
David Heller
Jonas Sibbesen
Xian Chang
Charles Markello
Yohei Rosen
Robin Rounthwaite
Susanna Morin
Emily Fujimoto
Eric Dawson
Mike Lin
Wolfgang Beyer
Richard Durbin
Daniel Zerbino
Check out Charles Markello's platform talk (PgmNr 15) on applying vg for rare variant discovery!
https://github.com/vgteam/vg
17
