Review Paper digit
Structural Variation Detection
Table of contents
• Detection of structural DNA variation from next generation
sequencing data: a review of informatic approaches
• The software pipeline digit
Detection of structural DNA variation from next generation sequencing
data: a review of informatic approaches
Authors: Haley J. Abel (1), Eric J. Duncavage (2)
(1) Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
(2) Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
Definition
Structural DNA variation is generally defined as
variation in a DNA region larger than 1 kb and
includes several classes such as translocations,
inversions, insertions/deletions and copy number
variations (CNVs).
Methods
• Cytogenetics:
unbiased
BUT limited resolution/sensitivity (350-500 band level)
• FISH - Fluorescence in situ hybridization:
increased resolution, ability to test fixed interphase cells, faster turnaround time,
greater sensitivity
BUT evaluation of multiple loci requires multiple probes/assays ⇒ increasing
complexity
• Microarrays:
especially reliable for CNV and loss of heterozygosity
BUT unable to detect balanced translocations
• Next Generation Sequencing:
ability to detect full range of genetic variation ⇒ potential to streamline testing by
using a single analysis platform
BUT dependent on coverage ⇒ susceptible to GC bias
NGS - Methods
• Depth of coverage analysis
• Discordant read pair analysis
• Split read analysis
Tools
Translocation and Inversion Detection
Discordant pair analysis:
• sensitive, but with low breakpoint resolution and low specificity
• repetitive regions are a source of false positives, yet they also drive true translocations, which makes real events difficult to separate from artifacts
• Many methods apply heuristic cutoffs to improve specificity:
• VariationHunter and Hydra consider multiple high-scoring mappings if available
• GASVPro tries to improve specificity by combining discordant pair and coverage analysis
Split read analysis: excellent breakpoint resolution (up to single-base resolution), but requires much higher coverage.
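The discordant pair idea can be illustrated with a minimal Python sketch. It assumes a standard paired-end library in forward/reverse orientation (mate-pair libraries use a different orientation), and the `ReadPair` type, the labels and the insert-size limits are hypothetical choices made for this example, not taken from any of the tools named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair, min_insert=200, max_insert=600):
    """Give a coarse label to one read pair (insert limits are assumed values)."""
    if pair.chrom1 != pair.chrom2:
        return "translocation_candidate"
    # Order the two mates by position so that "left" is the upstream read.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion_candidate"
    if left_strand == "-" and right_strand == "+":
        # Mates point away from each other.
        return "everted_pair"
    span = right_pos - left_pos
    if span > max_insert:
        return "deletion_candidate"
    if span < min_insert:
        return "insertion_candidate"
    return "concordant"
```

A single discordant pair is weak evidence; callers usually require several pairs supporting the same event, which is where the specificity problems listed above come from.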
Copy Number Variation Detection
Discordant pair analysis:
• performs best on large deletions; struggles with duplications
• cannot detect large insertions with the usual strategy because read pairs do not span the inserted sequence
• Pindel pieces translocation calls together via a pattern growth algorithm to find large insertions
Depth of coverage analysis:
• DNA
• Main problem is accounting for factors that modify read depth like GC
bias
• event-wise testing (EWT) algorithms rely purely on deviations in coverage from the sample's mean depth. GC content is addressed by analysing the genome bin-wise.
• SegSeq, CNVnator, CNAseg and CNV-seq compare the same region across multiple samples (control samples). These methods also use bins/partitions and rely on coverage ratios, which permit finer CNV mapping.
• Exome
• target capture data increase GC bias
• small target size makes paired normals or population controls a requirement
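A minimal sketch of bin-wise depth of coverage analysis might look as follows. It is not the algorithm of any tool listed above: the GC stratification by fixed-width strata, the median scaling and the pseudocount are assumptions chosen to keep the example short.

```python
import math
from collections import defaultdict
from statistics import median


def gc_normalize(counts, gc_fractions, gc_step=0.05):
    """Rescale each bin so that bins of similar GC content share one median."""
    strata = defaultdict(list)
    for count, gc in zip(counts, gc_fractions):
        strata[int(gc / gc_step)].append(count)
    overall = median(counts)
    normalized = []
    for count, gc in zip(counts, gc_fractions):
        stratum_median = median(strata[int(gc / gc_step)])
        if stratum_median > 0:
            normalized.append(count * overall / stratum_median)
        else:
            normalized.append(0.0)
    return normalized


def log2_ratios(sample_counts, control_counts, pseudocount=0.5):
    """Per-bin log2 coverage ratio of a sample against a control."""
    return [
        math.log2((s + pseudocount) / (c + pseudocount))
        for s, c in zip(sample_counts, control_counts)
    ]
```

Bins with a ratio well above zero suggest a gain and bins well below zero a loss; real callers then segment neighbouring bins rather than judging each bin alone.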
• Exome methods first estimate copy number locally and then merge the local calls using various strategies
• CONTRA: uses circular binary segmentation for merging
• CoNVEX: denoises coverage ratios with a discrete wavelet transform and then uses a Hidden Markov Model to identify gains and losses
• ExomeCNV: models B-allele frequencies to detect loss of heterozygosity
• Some methods try to find sporadic CNVs in population exome data by normalizing read counts with principal component analysis
Insertion and Deletion Detection
• Alignment based:
• offered by many packages: SAMtools, GATK, VarScan
• usually rely on probabilistic models to make indel calls
• Dindel and Stampy rely on these methods but employ filters to differentiate common errors from true indels.
• all of these methods require considerable validation
• insertion detection is limited to 15% of total read length
• Split read based:
• Suitable for medium sized indels
• High false-positive rate, because no probabilistic models discriminate
between alignment errors and true events
Conclusion
• There is currently no single informatic method capable of identifying the full range of structural DNA variation.
• Multiple complementary tools are required for robust variant detection.
• Since methods can perform differently based on assay design,
extensive validation is required for clinical use.
digit - A tool for detection and identification of genomic
inter-chromosomal translocations
Authors: Richard Meier (1,4), Stefan Graw (1,4), Julian R Molina (3), Peter Beyerlein (1), Devin Koestler (2), Jeremy Chien (4)
(1) Technical University of Applied Sciences Wildau, 15745 Wildau, Germany
(2) Department of Biostatistics, University of Kansas Medical Center, Kansas City, KS 66160
(3) Department of Medical Oncology, Mayo Clinic, Rochester, MN 55905
(4) Department of Cancer Biology, University of Kansas Medical Center, Kansas City, KS 66160
Goals of the project
• Interchromosomal translocation detection utilizing mate-pair
sequencing data
• Handle artifacts and robustly remove false positive calls
• Investigate translocation profiles of populations / trait associated
groups
Mate-pair sequencing
[Figure: mate-pair library preparation. Genome / chromosome template → fragmentation → adapter ligation → circularisation → fragmentation → sequencing of the terminal fragment as read1 and read2.]
digit overview
[Figure: digit workflow.]
• start from preprocessed read pairs (read_1, read_2 on chromosome_1 and chromosome_2)
• retain discordantly mapping read pairs
• find read pair clusters (cluster_A, cluster_B, ...)
• calculate MVMs for each pair and filter out low value pairs (inset: MVM density of concordant and discordant pairs, with a threshold separating rejected from approved pairs)
• recluster remaining read pairs
• compare samples and search for group associations (discordant read pair clusters from several samples form a group associated super cluster)
• report called translocations as paired regions, e.g. chr5:25819112-25821940 & chr9:5151006-5154147
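The clustering step of such a workflow can be sketched in Python as below. This is a simple greedy stand-in, not digit's actual clustering: the function name, the tuple layout, the `max_gap` window and the `min_support` cutoff are all assumptions for illustration.

```python
from collections import defaultdict


def cluster_pairs(pairs, max_gap=3000, min_support=2):
    """Greedily cluster inter-chromosomal pairs given as
    (chrom_a, pos_a, chrom_b, pos_b) tuples. All limits are assumed values."""
    by_chroms = defaultdict(list)
    for chrom_a, pos_a, chrom_b, pos_b in pairs:
        if chrom_a == chrom_b:
            continue  # only inter-chromosomal pairs are of interest here
        if chrom_a > chrom_b:
            # Canonical order, so both read orders land in the same group.
            chrom_a, pos_a, chrom_b, pos_b = chrom_b, pos_b, chrom_a, pos_a
        by_chroms[(chrom_a, chrom_b)].append((pos_a, pos_b))

    clusters = []
    for (chrom_a, chrom_b), points in sorted(by_chroms.items()):
        points.sort()
        current = [points[0]]
        for point in points[1:]:
            last = current[-1]
            close_a = point[0] - last[0] <= max_gap
            close_b = abs(point[1] - last[1]) <= max_gap
            if close_a and close_b:
                current.append(point)
            else:
                clusters.append((chrom_a, chrom_b, current))
                current = [point]
        clusters.append((chrom_a, chrom_b, current))

    calls = []
    for chrom_a, chrom_b, members in clusters:
        if len(members) < min_support:
            continue
        a_positions = [m[0] for m in members]
        b_positions = [m[1] for m in members]
        calls.append({
            "region_a": (chrom_a, min(a_positions), max(a_positions)),
            "region_b": (chrom_b, min(b_positions), max(b_positions)),
            "support": len(members),
        })
    return calls
```

Because the pass is greedy over positions sorted on one chromosome, interleaved events can be split into several clusters; reclustering after filtering, as in the workflow above, is one way to tidy that up.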
Mapping validity measure (MVM)
[Figure: a read pair whose two reads the mapper assigns to different regions, one on chromosome A and one on chromosome B. Scale label: 2 kb.]
• The two reads of a read pair are remapped to both regions the mapping software
originally assigned them to.
• If a read maps equally well to both regions it is impossible to resolve the read
pair’s origin and it is rejected.
• The MVM judges how ambiguous the mappability of a read pair is.
• The MVM distribution of concordant (well behaved) read pairs in a sample is used as an internal standard to determine a filtering threshold.
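The thresholding idea can be sketched as follows. The slides do not state how the MVM is computed or how the threshold is derived from the concordant distribution, so this example takes MVM values as given and assumes, purely for illustration, that the threshold is a low quantile (`q`) of the concordant pairs' MVMs.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, 0 <= q <= 1."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("quantile of empty list")
    index = q * (len(ordered) - 1)
    lower = int(index)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = index - lower
    return ordered[lower] * (1 - fraction) + ordered[upper] * fraction


def mvm_threshold(concordant_mvms, q=0.05):
    """Assumed rule: take a low quantile of the concordant MVMs as cutoff."""
    return quantile(concordant_mvms, q)


def filter_pairs(discordant, threshold):
    """Keep the ids of discordant pairs whose MVM reaches the threshold.

    `discordant` is a list of (pair_id, mvm) tuples."""
    return [pair_id for pair_id, mvm in discordant if mvm >= threshold]
```

Deriving the cutoff per sample from that sample's own concordant pairs means no global constant has to be tuned across libraries of different quality.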
Simulated data
Real data
Across all samples, the MVM thresholds separated ambiguous from distinct read pairs well.
[Figure: MVM density plots of concordant and discordant read pairs with the threshold marked (x-axis from 1.0 to 2.5, y-axis: density), one panel per sample: LU526 (N = 749), LU748 (N = 461), LU271 (N = 641), LU820 (N = 534), LU1160 (N = 268), LU1184 (N = 370), LU1434 (N = 391), LU1466 (N = 585).]
• We processed 20 patient samples from a non-cancer background and
35 patient samples with a lung cancer background.
• After comparing the two populations we retrieved 218 sample specific
events, 160 of which were from cancer.
• 328 translocation calls were shared between 2 or more samples
• 16 translocations were shared between cancer samples exclusively.
• 13 translocations shared between cancer and normal samples were
labeled potentially disease relevant.
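The slides do not say which statistic is used to call a shared translocation "enriched" in cancer. As one possible approach, a one-sided Fisher's exact test on the number of samples carrying a call in each group can be sketched with the standard library alone. The function name and the example counts are hypothetical; only the group sizes (35 cancer, 20 non-cancer) come from the slide.

```python
from math import comb


def fisher_enrichment_p(cancer_with, cancer_total, normal_with, normal_total):
    """One-sided Fisher's exact p-value for a call being at least this
    frequent in the cancer group, given the observed margins."""
    total_with = cancer_with + normal_with
    total = cancer_total + normal_total
    denominator = comb(total, total_with)
    max_in_cancer = min(cancer_total, total_with)
    numerator = 0
    for k in range(cancer_with, max_in_cancer + 1):
        # comb() returns 0 when more carriers are requested than samples exist.
        numerator += comb(cancer_total, k) * comb(normal_total, total_with - k)
    return numerator / denominator


# Hypothetical call seen in 5 of 35 cancer samples and 0 of 20 non-cancer samples.
p_value = fisher_enrichment_p(5, 35, 0, 20)
```

With groups of this size even a call seen only in cancer samples gives a modest p-value, and many calls would be tested at once, so larger cohorts are needed before such associations can be trusted.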
Translocations exclusively found in cancer
Translocations enriched in cancer
Conclusion
• The method successfully reduces the false positive rate.
• Group comparison and population analysis work, but more samples are needed to make reliable judgements.
• Comparisons with other tools are in progress.
• Combining strategies from different tools might be worth investigating in future projects.
Questions?