ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.

OPTIMIZATION AND
PERFORMANCE OF A VERY LARGE
MPS SNP PANEL FOR MISSING
PERSONS
Michelle Peck, Sejla Idrizbegovic, Felix Bittner, and Thomas Parsons
May 3, 2018, 7th QIAGEN Investigator Forum, San Antonio, TX

International Commission on
Missing Persons
ICMP endeavors to secure
the co-operation of
governments and other
authorities in locating and
identifying persons missing
as a result of conflicts,
human rights abuses,
disasters, organized violence
and other causes and to
assist them in doing so.

• 1995: Dayton Peace Accord
– ~35,000 missing at end of conflict
• 1996: Search for the missing begins
• June 1996: ICMP was created
following the G-7 Summit in Lyon,
France
• 2003: Expanded Global Mandate
– Post-conflict, other regions
– DVI globally
• Individual DNA Identifications:
18,347
International Commission on
Missing Persons

A new treaty-level International Organization
dedicated exclusively to issues of the missing
• ICMP, Version 2.0
– December, 2014. Signing ceremony in Brussels.
• Internationally recognized mandate
• New ICMP Headquarters in The Hague
• Legal status: ability to operate and protect sensitive
data, immune from legal process

DNA-led Process in former
Yugoslavia (Bosnia and Kosovo)
Family reference
blood samples: 92,735
Missing represented: 30,011
Bone samples profiled: 43,355
Unique DNA profiles: 21,925
Individual DNA identifications: 18,347
~5 identifications per working day, for 15 years

Unresolved Cases - Balkans
• 30,011 missing persons who have some references
• 18,347 DNA Identified
• 11,664 still need to be identified
• Postmortem Profiles from 21,925 individuals
• 18,347 Matched to families
• 3578 Unmatched PM Profiles
• 83% Match Rate of PM Profiles
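
The counts on this slide follow from simple subtraction and division. A minimal sketch that recomputes them from the three figures quoted above (the variable names are illustrative, not ICMP terminology); the computed match rate is about 83.7%, which the slide gives as 83%:

```python
# Figures quoted on the slide.
missing_with_references = 30_011
dna_identified = 18_347
unique_pm_profiles = 21_925

# Missing persons with references who are not yet identified.
still_unidentified = missing_with_references - dna_identified

# Postmortem (PM) profiles that have not been matched to a family.
unmatched_pm_profiles = unique_pm_profiles - dna_identified

# Share of PM profiles matched to a family.
match_rate = dna_identified / unique_pm_profiles

print(still_unidentified)      # 11664
print(unmatched_pm_profiles)   # 3578
print(f"{match_rate:.1%}")     # 83.7%
```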

Primary Causes of Unresolved Cases
• Failure to find bodies
• DNA testing failed
– Overall a 72% success rate for submitting profile
– STR DNA targets too big
• Insufficient reference samples from families
– 2287 missing persons for whom this is possibly
or probably the case
• Combination of the above
– Partial profile gives insufficient power to match
when family references are deficient.
Insufficient family references
Degraded DNA

What do missing persons need from
MPS?
• Ability to handle very degraded samples
• Strong capabilities for kinship analysis
– Single, distant relatives?
• Vastly diminished costs
– Multiplexing: simultaneous testing of many samples
– Optimization and streamlining
– Homogeneous assays, robust pipelines
• Informatics that interface with other necessary data
– Matching and case management in an integrated identification system

The case for SNPs
• Shortest possible target, suitable for degraded DNA
• Low mutation rates
– Eliminates a complication with STRs that can be a pain, and that seriously affects efficient algorithms in database matching
• Lower power of discrimination
– But can target many SNPs with MPS
– Not all SNPs are binary: increased power
• Ease of data handling, nomenclature, and reporting
• Possibly less sensitive to not using the perfect allele
frequency database
– If selected SNPs have similar allele frequencies globally

QIASeq Targeted DNA Assay V3
• Unique Molecular
Indices (UMIs) added to
original DNA template
• You know how many
original DNA molecules
your data represents
• Possible strong advantage
in forensic validation with
regard to coverage
thresholds and stochastic
effects
• Single Primer Extension
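
To illustrate why UMIs tell you how many original DNA molecules the data represents, here is a minimal sketch of UMI collapsing. It is not the QIAseq or CLC implementation: the read tuples, UMI strings, and majority-vote consensus rule are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def collapse_by_umi(reads):
    """Collapse reads into one consensus base per (UMI, position).

    reads: iterable of (umi, position, base) tuples.
    Reads sharing a UMI at a position are treated as PCR copies of one
    original molecule; the consensus is a simple majority vote.
    """
    groups = defaultdict(Counter)
    for umi, position, base in reads:
        groups[(umi, position)][base] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in groups.items()}


# Six reads at one SNP site that come from only three original molecules.
reads = [
    ("AACGT", 568235, "C"), ("AACGT", 568235, "C"), ("AACGT", 568235, "T"),
    ("GGTCA", 568235, "T"), ("GGTCA", 568235, "T"),
    ("TTAGC", 568235, "C"),
]

consensus = collapse_by_umi(reads)
molecule_counts = Counter(consensus.values())
print(len(reads), "reads ->", len(consensus), "molecules:", dict(molecule_counts))
```

Counting molecules rather than reads means a single amplified template cannot masquerade as deep coverage, which is the advantage the slide points to for coverage thresholds and stochastic effects.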

SNP Assay Design Collaborative
Project
• Goal: Take advantage of the ability to design a very large
SNP assay that maximizes utility for individual
identification with degraded DNA and kinship analysis.
• ICMP
• QIAGEN: Keith Elliott, Raed Samara, Eric Lader
• Chris Phillips, University of Santiago de Compostela
• Andreas Tillmar, Linköping University, and Swedish
National Board of Forensic Medicine
• Ken Kidd, Yale University: Microhaplotype loci

SNP Assay Design Collaborative
Project
• For further details:
– https://investigatorforum.qiagen.com/latest-update

Design Pipeline
• Data mine 1000 Genomes Project database for tri- and
tetra-allelic SNPs (Chris Phillips)
– High heterozygosity compared to binary SNPs
– Mostly balanced across populations
• But not a uniform feature
– ~3000 candidate SNPs
• Include short micro-haplotype loci
• Eliminate candidates that are closely linked to each
other, or to common STR Loci (Andreas Tillmar)
• Screen out sites that are bad candidates for QIASeq
primer extension
– Extension primer 75 bp or less from target SNP site
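
The screening steps above can be sketched as a simple filter. This is a hypothetical illustration, not the project's pipeline: real linkage screening relies on genetic distance or linkage disequilibrium, whereas this sketch uses a physical-distance cutoff (`min_spacing_bp`, an assumed value) as a crude stand-in. Only the 75 bp primer-distance limit comes from the slide; the candidate records are invented.

```python
def screen_candidates(candidates, max_primer_distance=75, min_spacing_bp=1_000_000):
    """Keep candidates with a close extension primer and enough spacing.

    candidates: list of dicts with 'id', 'chrom', 'pos', 'primer_distance'.
    min_spacing_bp is a hypothetical proxy for linkage screening.
    """
    kept = []
    last_kept_pos = {}
    for c in sorted(candidates, key=lambda c: (c["chrom"], c["pos"])):
        if c["primer_distance"] > max_primer_distance:
            continue  # extension primer too far from the target SNP
        previous = last_kept_pos.get(c["chrom"])
        if previous is not None and c["pos"] - previous < min_spacing_bp:
            continue  # too close to a locus that was already kept
        kept.append(c)
        last_kept_pos[c["chrom"]] = c["pos"]
    return kept


# Invented candidate records for illustration.
candidates = [
    {"id": "snp_a", "chrom": "5", "pos": 1_000_000, "primer_distance": 40},
    {"id": "snp_b", "chrom": "5", "pos": 1_200_000, "primer_distance": 30},
    {"id": "snp_c", "chrom": "5", "pos": 9_000_000, "primer_distance": 90},
    {"id": "snp_d", "chrom": "6", "pos": 568_235, "primer_distance": 60},
]

kept_ids = [c["id"] for c in screen_candidates(candidates)]
print(kept_ids)  # ['snp_a', 'snp_d']
```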

“MPSplex” Panel Final Design
• 1457 Loci after linkage screening
– Only 6 sites eliminated based on gene specific
primer disqualification!
• 1377 tri-allelic autosomal SNPs
• 34 tri-allelic X chromosome SNPs
• 46 micro-haplotype loci
• Closely linked SNPs that define haplotypes,
resulting in highly heterozygous loci
• 2832 target enrichment extension primers
– 80% of sites with redundant targeting

MPSplex Design
>96% are less than 100 bp
away from the target
>75% are less than 50 bp
away from the target

Kinship Simulation, Andreas Tillmar
[Figure: kinship simulation; annotated value: 10^225]

Kinship Simulations:
[Figures: annotated values: 10^40 and 10^15]

QIAseq Chemistry and Workflow
• Sample Preparation (recommended input: 40 ng)
• Fragmentation and Ligation
• Target Enrichment PCR (MPSplex custom panel)
• Universal PCR
• Sample Normalization
• Library QC
• Sequencing: GeneReader, Illumina, Ion Torrent
* Bead purification (three steps in the workflow)

QIAseq Analysis
QIAseq DNA V3 Workflow (CLC Biomedical Workbench)
• FASTQ
• Adapter Trimming
• Map Reads to Reference (hg19)
• Calculate UMIs, Create Super Reads
• Preliminary Profile
• SNP Calling (Identify Known Mutations Tool)
• Profile Review, Apply Analysis Thresholds
• Final Review
Critical Analysis Metrics
• Coverage
– Super reads: attributable to a unique molecule
• Thresholds
– 10X
– 10% variant frequency
• Quality
• Allele Balance
• Forward/Reverse Balance
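
A minimal sketch of how the two thresholds on this slide (10X coverage, 10% variant frequency) and the allele-balance metric could be applied at one SNP. It assumes the thresholds are applied to super-read counts, which the slide suggests but does not state outright; the function names and example counts are hypothetical, and this is not the CLC workflow's code.

```python
def call_genotype(allele_counts, min_coverage=10, min_variant_frequency=0.10):
    """Return the sorted list of called alleles, or None if coverage is too low.

    allele_counts: dict mapping base -> super-read count at one SNP (assumed input).
    """
    coverage = sum(allele_counts.values())
    if coverage < min_coverage:
        return None  # below the 10X threshold: no call
    return sorted(
        base for base, count in allele_counts.items()
        if count / coverage >= min_variant_frequency
    )


def allele_balance(allele_counts, genotype):
    """Minor/major count ratio for a two-allele (heterozygous) call."""
    if genotype is None or len(genotype) != 2:
        return None
    first, second = (allele_counts[base] for base in genotype)
    return min(first, second) / max(first, second)


het_site = {"C": 30, "T": 26, "A": 2}   # A falls below 10% and is ignored
low_site = {"C": 6, "T": 2}             # only 8X: no call
hom_site = {"G": 40, "A": 3}            # A is about 7%: homozygous G

for site in (het_site, low_site, hom_site):
    genotype = call_genotype(site)
    print(genotype, allele_balance(site, genotype))
```

Forward/reverse balance could be computed the same way from per-strand counts.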

Read Mapping
[Figure: forward and reverse reads mapped at 6: 568,235 (C,T)]

Microhaplotypes
MH14
• 5: 9,619,905 – C,T
• 5: 9,619,936 – A,C

Microhaplotypes – Phased calls
MH14: TC, CA
• 5: 9,619,905 – C,T
• 5: 9,619,936 – A,C
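
Because the SNPs of a microhaplotype sit close together, a single read can span both, so phase can be read directly off the reads. The sketch below shows the idea for MH14 using invented reads. It is an assumption-laden illustration, not the workflow's tool: the read representation is hypothetical, and the 10% minimum fraction is borrowed from the variant-frequency threshold.

```python
from collections import Counter


def phased_haplotypes(reads, positions, min_fraction=0.10):
    """Call haplotypes from reads that cover every SNP position.

    reads: list of dicts mapping position -> base observed on that read.
    """
    counts = Counter()
    for read in reads:
        if all(pos in read for pos in positions):  # read must span all SNPs
            counts["".join(read[pos] for pos in positions)] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(h for h, n in counts.items() if n / total >= min_fraction)


mh14_positions = (9_619_905, 9_619_936)

# Invented reads: five support T-C, four support C-A, one covers one SNP only.
reads = (
    [{9_619_905: "T", 9_619_936: "C"}] * 5
    + [{9_619_905: "C", 9_619_936: "A"}] * 4
    + [{9_619_905: "T"}]
)

mh14_call = phased_haplotypes(reads, mh14_positions)
print(mh14_call)  # ['CA', 'TC']
```

The unphased calls (C,T and A,C) are compatible with either TC/CA or TA/CC; only reads spanning both sites distinguish the two.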

MPSplex Testing

Parameter                     Experiment 1    Experiment 2      Experiment 3  Experiment 4       Experiment 5   Experiment 6
Sample type                   GIAB            Bone samples      GIAB          Reference samples  GIAB           GIAB
DNA input                     1 ng            50 pg - 20 ng     1 ng - 20 ng  500 pg - 40 ng     5 ng or 40 ng  2.5 ng, 5 ng, or 40 ng
Fragmentation time            24 min          24 min            14 min        14 min             14 min         14 min
Target enrichment PCR cycles  6               6                 6             8                  8              8
Universal PCR cycles          24              21-24             20-24         25                 25             25
Sequencing instrument         Illumina MiSeq  Illumina NextSeq  GeneReader    GeneReader         GeneReader     GeneReader
Sequencing cycles             2 x 150         2 x 150           100           150                150            150
Samples multiplexed           6               17 (3 NCs)        1             10                 12 or 4        12

Known Reference Samples
• Forensic validation requires known samples
– Dense genome-wide markers require control DNA with a known genome
• Genome in a Bottle (GIAB)
• NA12878
• Ashkenazi Family Trio
– Mother, Father, Son

GIAB Testing
• NA12878
– 12 Replicates
• Illumina and GeneReader
• DNA Input: 1 ng – 40 ng
• Ashkenazi Family Trio
– 5 Replicates each
• GeneReader
• DNA Input: 2.5 ng, 5 ng, and 40 ng

NA12878 – Auto & X SNPs (1151)
Coverage and concordance (compared to GIAB)

Sample  Sequencer   Input (ng)  Avg coverage (X)  ≥10X  % ≥10X   Concordant  % concordant
1       Illumina    1           101               1151  100.00%  1145        99.50%
2       Illumina    1           104               1149  99.80%   1144        99.60%
3       GeneReader  1           51.8              1150  99.90%   1138        99.00%
4       GeneReader  10          159               1151  100.00%  1143        99.30%
5       GeneReader  10          433               1151  100.00%  1145        99.50%
6       GeneReader  20          471               1151  100.00%  1144        99.40%
7       GeneReader  40          96.5              1047  91.00%   1036        98.90%
8       GeneReader  5           61.9              1141  99.10%   1133        99.30%
9       GeneReader  5           94.9              1148  99.70%   1142        99.50%
10      GeneReader  40          549.5             1116  96.96%   1107        99.19%
11      GeneReader  5           875.7             1120  97.31%   1108        98.93%
12      GeneReader  2.5         364.8             1116  96.96%   1105        99.01%
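
The percentage columns in this table appear to be derived as follows: the share of sites at ≥10X is taken over all 1151 Auto & X SNPs, and concordance is taken over the sites that reached ≥10X. A short sketch that recomputes three rows under that reading (the first nine rows of the table appear to be rounded to one decimal place):

```python
PANEL_SITES = 1151  # Auto & X SNPs evaluated for NA12878


def percent(part, whole):
    return 100.0 * part / whole


# (sample, sites at >=10X, concordant calls), copied from the table.
rows = [
    (1, 1151, 1145),
    (7, 1047, 1036),
    (10, 1116, 1107),
]

summary = {}
for sample, covered, concordant in rows:
    summary[sample] = (percent(covered, PANEL_SITES), percent(concordant, covered))
    print(sample, f"{summary[sample][0]:.2f}%", f"{summary[sample][1]:.2f}%")
```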

NA12878 – MH SNPs (117)
Microhaplotypes: coverage and concordance (compared to GIAB)

Sample  Sequencer   Input (ng)  Avg coverage (X)  ≥10X  % ≥10X   Concordant  % concordant
1       Illumina    1           100               117   100.00%  117         100.00%
2       Illumina    1           103               117   100.00%  117         100.00%
3       GeneReader  1           61.3              117   100.00%  117         100.00%
4       GeneReader  10          218               117   100.00%  117         100.00%
5       GeneReader  10          561               117   100.00%  117         100.00%
6       GeneReader  20          668               117   100.00%  117         100.00%
7       GeneReader  40          151               115   98.30%   115         100.00%
8       GeneReader  5           85.4              117   100.00%  117         100.00%
9       GeneReader  5           126               117   100.00%  117         100.00%
10      GeneReader  40          419               116   99.15%   115         99.14%
11      GeneReader  5           716               116   99.15%   115         99.14%
12      GeneReader  2.5         304               116   99.15%   115         99.14%

Kinship Power
• Mother, Father, Son Trio tested for
Microhaplotypes
– Phase manually determined for 43 targeted
MHs
• Kinship index of child given parents = 4e+21
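
For readers unfamiliar with how such an index accumulates, here is a textbook-style sketch of a trio likelihood ratio: the probability of the child's genotype given both parents, divided by its probability if the child were unrelated to them, multiplied across independent loci. The slide does not say how the 4e+21 figure was calculated, so this is an assumed formulation; the genotypes and haplotype frequencies below are invented and it ignores mutation and genotyping error.

```python
from itertools import product


def transmission_probability(child, mother, father):
    """P(child genotype | both parental genotypes), Mendelian inheritance."""
    target = tuple(sorted(child))
    hits = sum(
        1 for m, f in product(mother, father)
        if tuple(sorted((m, f))) == target
    )
    return hits / 4.0


def population_probability(child, freqs):
    """P(child genotype) for an unrelated person, Hardy-Weinberg proportions."""
    a, b = child
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def kinship_index(loci):
    """Product of per-locus likelihood ratios over independent loci."""
    ratio = 1.0
    for child, mother, father, freqs in loci:
        ratio *= (transmission_probability(child, mother, father)
                  / population_probability(child, freqs))
    return ratio


# Invented genotypes and frequencies for two loci (illustration only).
loci = [
    (("TC", "CA"), ("TC", "TC"), ("CA", "TT"), {"TC": 0.3, "CA": 0.3, "TT": 0.4}),
    (("A", "A"), ("A", "B"), ("A", "C"), {"A": 0.2, "B": 0.5, "C": 0.3}),
]

index = kinship_index(loci)
print(round(index, 2))
```

Each informative locus multiplies the ratio, which is how a few dozen highly heterozygous microhaplotypes can reach very large values.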

Norway Samples
• Degraded bone samples
• DNA Input
– 50 pg – 20 ng
– One sample replicated at 3 inputs
• 22 ng, 11 ng, and 5.5 ng
• Illumina NextSeq
– 34-sample sequencing multiplex

Norway Samples – Coverage

                    Auto & X SNPs (1151)         MH SNPs (117)
Sample  Input (ng)  Avg cov. (X)  ≥10X  % ≥10X   Avg cov. (X)  ≥10X  % ≥10X
10-2    22.3        85.4          1116  97.0%    69.8          115   98.3%
5-6     19.43       99.9          1128  98.0%    82.1          114   97.4%
7-2     11.20       34.8          1078  93.7%    31.6          110   94.0%
6-2     5.5         19.7          944   82.0%    18.8          96    82.1%
7-6     4.91        109.3         1149  99.8%    99.1          117   100.0%
3-6     3.43        32.1          1102  95.7%    29.9          114   97.4%
2-6     3.23        38.3          1130  98.2%    38.2          117   100.0%
10-6    2.04        17.0          932   81.0%    14.4          88    75.2%
11-2    1.91        12.3          756   65.7%    10.7          67    57.3%
4-6     1.45        5.8           157   13.6%    4.5           6     5.1%
6-6     1.21        18.3          993   86.3%    15.0          99    84.6%
1-6     0.85        13.0          793   68.9%    11.9          74    63.2%
12-2    0.05        1.0           0     0.0%     1.1           0     0.0%
5-2     Neg         1.3           0     0.0%     1.1           0     0.0%
11-6    Neg         0.5           1     0.1%     0.4           0     0.0%
12-6    Neg         0.0           0     0.0%     0.0           0     0.0%
14-2    Pos (1 ng)  103.7         1149  99.8%    103.2         117   100.0%

Norway Samples - Concordance
Concordance (compared to 10-2 replicate)

                    Auto & X SNPs (1151)            MH SNPs (117)
Sample  Input (ng)  Concordant  Both ≥10X  %        Concordant  Both ≥10X  %
10-2    22.3        (reference replicate)
7-2     11.2        1068        1072       99.6%    96          96         100.0%
6-2     5.5         929         942        98.6%    110         110        100.0%

GC-content distribution of reads
[Figure: GC-content distributions for Bones (1-6), Human DNA (minor peak in bimodal distribution), Human DNA (all), and NA12878. Slide: QIAGEN]
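
A GC-content distribution like the one on this slide can be built by computing the G+C fraction of each read and binning the results. The sketch below shows that calculation only; the example sequences are made up, and the bin count is an arbitrary choice rather than what was used for the figure.

```python
def gc_fraction(sequence):
    """Fraction of called bases (A, C, G, T) that are G or C; None if no calls."""
    sequence = sequence.upper()
    called = sum(sequence.count(base) for base in "ACGT")
    if called == 0:
        return None
    return (sequence.count("G") + sequence.count("C")) / called


def gc_histogram(reads, bins=10):
    """Count reads per GC-fraction bin; reads with no called bases are skipped."""
    histogram = [0] * bins
    for read in reads:
        gc = gc_fraction(read)
        if gc is None:
            continue
        histogram[min(int(gc * bins), bins - 1)] += 1
    return histogram


# Made-up reads for illustration.
example_reads = ["ATATATAT", "GCGCGCGC", "ATGCATGC", "NNNN"]
histogram = gc_histogram(example_reads)
print(histogram)  # [1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
```

A second peak away from the human distribution is the kind of signal that would point to reads from another source in a bone extract.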

Panel Development
• Parallel assessment of SNP performance
and sequencing approach
– Applied to forensics
– Relatively new library preparation chemistry
approach
• Marker Evaluation
• Chemistry and Performance Evaluation

Marker Evaluation
• Evaluation of sequencing performance
• Flagged positions for further review
– Samples
• 35 samples utilized – 9 unique
– Initial manual review of all sites – NA12878
– Discordance and outliers for analysis metrics
• Quality, heterozygous allele balance, homozygous
variant frequency, low coverage

Evaluated sequencing quality
and accuracy of calls
• Evaluation criteria
– Homopolymeric or repeat region
– Non-specific reads
– Tri-allelic call
– Concordance
– Variants only observed in one direction
– Poor primer performance

Evaluated sequencing quality
and accuracy of calls
• Classified markers
– Confirmed – acceptable data
– Questioned – need further review
• Poor primer performance, low analysis metrics, minor alignment issues, chemistry-specific issues
– Discarded – poor genomic targets

Homopolymeric/Repeat Region
[Figure panels: GeneReader, Illumina, GIAB]

Non-specific reads
[Figure panels: GeneReader, Illumina, GIAB]

Tri-allelic
[Figure panels: GeneReader, Illumina, GIAB; calls shown: T; A,C,T; A,C,T]

Panel Review
MPSplex_v3
• Confirmed Markers – 1097
– Auto SNPs – 1033
– X SNPs – 19
– MHs – 45
• Questioned Markers – 218
• Discarded Markers – 142
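
The review counts can be cross-checked against the 1457-locus final design. A small sketch using only numbers quoted in the slides (the dictionary layout is just for illustration):

```python
# Final design, from the "MPSplex" Panel Final Design slide.
design = {"tri-allelic autosomal SNPs": 1377, "tri-allelic X SNPs": 34,
          "micro-haplotype loci": 46}

# Outcome of the marker review, from this slide.
review = {"confirmed": 1097, "questioned": 218, "discarded": 142}
confirmed_by_type = {"Auto SNPs": 1033, "X SNPs": 19, "MHs": 45}

total_loci = sum(design.values())

# Every designed locus should fall into exactly one review class.
assert total_loci == 1457
assert sum(review.values()) == total_loci
assert sum(confirmed_by_type.values()) == review["confirmed"]

confirmed_share = review["confirmed"] / total_loci
print(f"{confirmed_share:.1%} of loci confirmed")  # 75.3% of loci confirmed
```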

Performance: Coverage
• DNA Input
• Sample Type
• Multiplexing
• Library preparation and sequencing
chemistry

Norway Samples – Coverage
[Figure: average coverage (X) versus DNA input (ng)]

Coverage – DNA Input
[Figure: average coverage (X) by sample (1-9); input groups labeled 40 ng, 5 ng, 1 ng, and 500 pg; annotation: Decreased Input]

Coverage – Sample Multiplex
[Figure: average coverage (X) by sample (2-5); groups labeled 40 ng and 5 ng; annotation: Decreased Sample Multiplex]

Considerations for Setting
Thresholds
• Sensitivity testing
– High quality samples
– Artificially degraded high quality samples
– Degraded bone samples
• Metrics to evaluate
– Coverage
– Allele balance
– Quality
– F/R Balance

Next Steps
• Bone samples
– Test more samples
– Evaluate performance on the GeneReader
– Investigate methods for removing bacterial DNA
and/or bioinformatic removal of bacterial reads
• Artificially degraded high quality samples
• Finalize chemistry and optimize
• Sensitivity Studies
– Evaluate performance at lower inputs and start to
determine thresholds

Next Steps
• Analysis workflow and bioinformatics
– Microhaplotype phasing tool
• Kinship stats
– Evaluate Auto & X SNPs with family trio
– Impact of removing targets
• Integration into matching software
– SNPs
– Population databases
– Compatible profile format

Take Home Points
• Accurate and reproducible
• Optimization of method and analysis
– Need to see performance on degraded
samples
• Thresholds
– Require further testing and evaluation
• Need for appropriate data infrastructure

Acknowledgments
• ICMP: Rene Huel, Ana Bilic, Zlatan
Bajunovic, and the rest of the team
• QIAGEN: Keith Elliott, Erik
Soderback, Leif Schauser, Ben
Turner, Simon Hughes, Eric Lader,
Zhong Wu, Mehdi Motallebipour,
Joost van Dijk, Vikas Gupta, Holger
Karas
• Chris Phillips (University of
Santiago de Compostela)
• Andreas Tillmar (Linköping
University)

Questions?
Michelle Peck
Michelle.peck@icmp.int
