Optimization and Performance of a Very Large MPS SNP Panel for Missing Persons, by Michelle Peck et al., International Commission on Missing Persons. Presented May 3, 2018, at the QIAGEN Investigator Forum, San Antonio, TX.
ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.
1. OPTIMIZATION AND
PERFORMANCE OF A VERY LARGE
MPS SNP PANEL FOR MISSING
PERSONS
Michelle Peck, Sejla Idrizbegovic, Felix Bittner, and Thomas Parsons
May 3, 2018, 7th QIAGEN Investigator Forum, San Antonio, TX
2. International Commission on
Missing Persons
ICMP endeavors to secure
the co-operation of
governments and other
authorities in locating and
identifying persons missing
as a result of conflicts,
human rights abuses,
disasters, organized violence
and other causes and to
assist them in doing so.
3. • 1995: Dayton Peace Accord
– ~35,000 missing at end of conflict
• 1996: Search for the missing begins
• June 1996: ICMP was created
following the G-7 Summit in Lyon,
France
• 2003: Expanded Global Mandate
– Post-conflict, other regions
– DVI globally
• Individual DNA Identifications:
18,347
International Commission on
Missing Persons
4. A new treaty-level International Organization
dedicated exclusively to issues of the missing
• ICMP, Version 2.0
– December, 2014. Signing ceremony in Brussels.
• Internationally recognized mandate
• New ICMP Headquarters in The Hague
• Legal status: ability to operate and protect sensitive
data, immune from legal process
5.
6. DNA-led Process in former
Yugoslavia (Bosnia and Kosovo)
Family reference
blood samples: 92,735
Missing represented: 30,011
Bone samples profiled: 43,355
Unique DNA profiles: 21,925
Individual DNA Identifications: 18,347
~5 identifications per working day, for 15 years
7. Unresolved Cases - Balkans
• 30,011 missing persons that have some references
• 18,347 DNA Identified
• 11,664 still need to be identified
• Postmortem Profiles from 21,925 individuals
• 18,347 Matched to families
• 3578 Unmatched PM Profiles
• 83% Match Rate of PM Profiles
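The figures on this slide follow from simple arithmetic on the three counts given. A short Python check, using only the numbers shown above (the variable names are illustrative, not from the presentation):

```python
# Counts taken from the slide above; variable names are illustrative only.
missing_with_references = 30_011
dna_identified = 18_347
postmortem_profiles = 21_925

# Missing persons with references who are not yet identified.
still_to_identify = missing_with_references - dna_identified

# Postmortem (PM) profiles that have not matched any family.
unmatched_pm_profiles = postmortem_profiles - dna_identified

# Fraction of PM profiles matched to families.
match_rate = dna_identified / postmortem_profiles

print(still_to_identify)      # 11664
print(unmatched_pm_profiles)  # 3578
print(f"{match_rate:.1%}")    # 83.7%
```

The computed match rate is about 83.7%, which the slide reports as 83%.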
8. Primary Causes of Unresolved Cases
• Failure to find bodies
• DNA testing failed
– Overall a 72% success rate for submitting profile
– STR DNA targets too big
• Insufficient reference samples from families
– 2287 missing persons for whom this is possibly
or probably the case
• Combination of the above
– Partial profile gives insufficient power to match
when family references are deficient.
Insufficient family references
Degraded DNA
9. What do missing persons need from
MPS?
• Ability to handle very degraded samples
• Strong capabilities of kinship analysis
– Single, distant relatives?
• Vastly diminished costs
– Multiplexing: Simultaneous testing of many samples
– Optimization and streamlining
– Homogeneous assays, robust pipelines
• Informatics that interfaces with other necessary
data
– Matching and case management in an integrated
identification system
10. The case for SNPs
• Shortest possible target, suitable for degraded DNA
• Low mutation rates
– Eliminates a complication with STRs that can be a
pain and that seriously affects efficient algorithms in
database matching
• Lower power of discrimination
– But can target many SNPs with MPS
– Not all SNPs are binary: increased power
• Ease of data handling, nomenclature, and reporting
• Possibly less sensitive to not using the perfect allele
frequency database
– If selected SNPs have similar allele frequencies globally
14. QIASeq Targeted DNA Assay V3
• Unique Molecular
Indices (UMIs) added to
original DNA template
• You know how many
original DNA molecules
your data represents
• Possible strong advantage
in forensic validation with
regard to coverage
thresholds and stochastic
effects
• Single Primer Extension
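The idea that UMIs reveal how many original DNA molecules the data represent can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the QIAGEN pipeline: the read records, target names, and UMI strings are hypothetical, and real tools also need to tolerate sequencing errors within the UMI itself.

```python
from collections import defaultdict


def count_original_molecules(reads):
    """Count distinct UMIs per target.

    Each distinct UMI at a target is treated as one original DNA molecule;
    reads that share a UMI are treated as PCR copies of that molecule.
    """
    umis_per_target = defaultdict(set)
    for target, umi in reads:
        umis_per_target[target].add(umi)
    return {target: len(umis) for target, umis in umis_per_target.items()}


# Hypothetical reads as (target, UMI) pairs.
reads = [
    ("snp_001", "ACGTACGTACGT"),
    ("snp_001", "ACGTACGTACGT"),  # PCR duplicate of the read above
    ("snp_001", "TTGACCATGGCA"),
    ("snp_002", "GGCATTACCGTA"),
]

molecule_counts = count_original_molecules(reads)
print(molecule_counts)  # {'snp_001': 2, 'snp_002': 1}
```

Here three reads at `snp_001` collapse to two molecules, which is the quantity that matters when reasoning about coverage thresholds and stochastic effects.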
15. SNP Assay Design Collaborative
Project
• Goal: Take advantage of the ability to design a very large
SNP assay that maximizes utility for individual
identification with degraded DNA and kinship analysis.
• ICMP
• QIAGEN: Keith Elliott, Raed Samara, Eric Lader
• Chris Phillips, University of Santiago de Compostela
• Andreas Tillmar, Linköping University, and Swedish
National Board of Forensic Medicine
• Ken Kidd, Yale University: Microhaplotype loci
16. SNP Assay Design Collaborative
Project
• For further details:
– https://investigatorforum.qiagen.com/latest-update
17. Design Pipeline
• Data mine 1000 Genomes Project database for tri- and
tetra-allelic SNPs (Chris Phillips)
– High heterozygosity compared to binary SNPs
– Mostly balanced across populations
• But not a uniform feature
– ~3000 candidate SNPs
• Include short micro-haplotype loci
• Eliminate candidates that are closely linked to each
other, or to common STR Loci (Andreas Tillmar)
• Screen out sites that are bad candidates for QIASeq
primer extension
– Extension primer 75 bp or less from target SNP site
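The higher heterozygosity of tri- and tetra-allelic SNPs compared with binary SNPs can be illustrated with the standard expected-heterozygosity formula, H = 1 - sum(p_i^2). The sketch below uses perfectly balanced allele frequencies, which is an idealized assumption for illustration only; real candidate SNPs have population-specific frequencies.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity under Hardy-Weinberg: 1 - sum of squared frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Idealized, perfectly balanced loci (illustrative frequencies only).
binary = expected_heterozygosity([0.5, 0.5])
tri_allelic = expected_heterozygosity([1 / 3, 1 / 3, 1 / 3])
tetra_allelic = expected_heterozygosity([0.25, 0.25, 0.25, 0.25])

print(f"binary:        {binary:.3f}")         # 0.500
print(f"tri-allelic:   {tri_allelic:.3f}")    # 0.667
print(f"tetra-allelic: {tetra_allelic:.3f}")  # 0.750
```

A binary SNP can never exceed 0.5, so each additional balanced allele raises the ceiling on per-locus information.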
18. “MPSplex” Panel Final Design
• 1457 Loci after linkage screening
– Only 6 sites eliminated based on gene specific
primer disqualification!
• 1377 tri-allelic autosomal SNPs
• 34 tri-allelic X chromosome SNPs
• 46 micro-haplotype loci
• Closely linked SNPs that define haplotypes,
resulting in highly heterozygous loci
• 2832 target enrichment extension primers
– 80% of sites with redundant targeting
19. MPSplex Design
>96% are less than 100 bp
away from the target
>75% are less than 50 bp
away from the target
28. MPSplex Testing
Parameter                         | Experiment 1   | Experiment 2     | Experiment 3 | Experiment 4      | Experiment 5  | Experiment 6
Sample Type                       | GIAB           | Bone Samples     | GIAB         | Reference Samples | GIAB          | GIAB
DNA Input                         | 1 ng           | 50 pg - 20 ng    | 1 ng - 20 ng | 500 pg - 40 ng    | 5 ng or 40 ng | 2.5 ng, 5 ng, or 40 ng
Fragmentation Time                | 24 min         | 24 min           | 14 min       | 14 min            | 14 min        | 14 min
Target Enrichment PCR Cycles      | 6              | 6                | 6            | 8                 | 8             | 8
Universal PCR Cycles              | 24             | 21-24            | 20-24        | 25                | 25            | 25
Sequencing Instrument             | Illumina MiSeq | Illumina NextSeq | GeneReader   | GeneReader        | GeneReader    | GeneReader
Sequencing Cycles                 | 2 x 150        | 2 x 150          | 100          | 150               | 150           | 150
Number of Samples Multiplexed     | 6              | 17 (3 NCs)       | 1            | 10                | 12 or 4       | 12
29. • Forensic validation requires known
samples
– Dense genome-wide markers require control
DNA of known genome
• Genome in a Bottle
• NA12878
• Ashkenazi Family Trio
– Mother, Father, Son
Known Reference Samples
30. GIAB Testing
• NA12878
– 12 Replicates
• Illumina and GeneReader
• DNA Input: 1 ng – 40 ng
• Ashkenazi Family Trio
– 5 Replicates each
• GeneReader
• DNA Input: 2.5 ng, 5 ng, and 40 ng
35. Kinship Power
• Mother, Father, Son Trio tested for
Microhaplotypes
– Phase manually determined for 43 targeted
MHs
• Kinship Index of Child given parents = 4e+21
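To put the reported kinship index in perspective: a combined index over independent loci is commonly the product of per-locus likelihood ratios. Assuming, which the slide does not state, that the 4e+21 value is such a product over the 43 phased microhaplotype loci, the geometric-mean contribution per locus can be estimated as follows.

```python
import math

# Values from the slide; treating the index as a product over 43
# independent loci is an assumption made for this illustration.
combined_kinship_index = 4e21
n_loci = 43

# Work in log10 space, as is usual for products of many likelihood ratios.
log10_index = math.log10(combined_kinship_index)
geometric_mean_lr = 10 ** (log10_index / n_loci)

print(f"log10(KI) = {log10_index:.2f}")
print(f"geometric-mean per-locus LR = {geometric_mean_lr:.2f}")
```

Under that assumption each locus contributes a likelihood ratio of roughly 3 on average.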
36. Norway Samples
• Degraded bone
samples
• DNA Input
– 50 pg – 20 ng
– One sample replicated at 3
inputs
• 22 ng, 11 ng, and 5.5 ng
• Illumina NextSeq
– 34 Sample sequencing
multiplex
39. Norway Samples - Concordance
Concordance (Compared to 10-2 Replicate)
       |            | Auto & X SNPs (1151)       | MH SNPs (117)
Sample | Input (ng) | Count | Both ≥10X | %      | Count | Both ≥10X | %
10-2   | 22.3       | -     | -         | -      | -     | -         | -
7-2    | 11.2       | 1068  | 1072      | 99.6%  | 96    | 96        | 100.0%
6-2    | 5.5        | 929   | 942       | 98.6%  | 110   | 110       | 100.0%
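The percentages in the concordance figures above can be reproduced with a short Python sketch. It assumes that "Count" is the number of concordant calls and that "Both ≥10X" is the number of sites where both the sample and the 10-2 replicate reached at least 10X coverage; that reading fits the numbers but is not spelled out on the slide.

```python
def percent_concordance(concordant, compared):
    """Percentage of compared sites whose genotype calls agree."""
    if compared <= 0:
        raise ValueError("no sites available for comparison")
    return 100.0 * concordant / compared


# (concordant, compared) pairs copied from the slide; keys are sample IDs.
auto_x_snps = {"7-2": (1068, 1072), "6-2": (929, 942)}
mh_snps = {"7-2": (96, 96), "6-2": (110, 110)}

auto_x_pct = {s: round(percent_concordance(*c), 1) for s, c in auto_x_snps.items()}
mh_pct = {s: round(percent_concordance(*c), 1) for s, c in mh_snps.items()}

print(auto_x_pct)  # {'7-2': 99.6, '6-2': 98.6}
print(mh_pct)      # {'7-2': 100.0, '6-2': 100.0}
```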
40. GC-content distribution of reads
Bones (1-6)
Human DNA (minor
peak in bimodal distribution)
Human DNA (all)
NA12878
Slide: QIAGEN
41. Panel Development
• Parallel assessment of SNP performance
and sequencing approach
– Applied to forensics
– Relatively new library preparation chemistry
approach
• Marker Evaluation
• Chemistry and Performance Evaluation
42. Marker Evaluation
• Evaluation of sequencing performance
• Flagged positions for further review
– Samples
• 35 samples utilized – 9 unique
– Initial manual review of all sites – NA12878
– Discordance and outliers for analysis metrics
• Quality, heterozygous allele balance, homozygous
variant frequency, low coverage
43. Evaluated sequencing quality
and accuracy of calls
• Evaluation criteria
– Homopolymeric or repeat region
– Non-specific reads
– Tri-allelic call
– Concordance
– Variants only observed in one direction
– Poor primer performance
44. Evaluated sequencing quality
and accuracy of calls
• Classified markers
–Confirmed – acceptable data
–Questioned – need further review
• Poor primer performance, low analysis
metrics, minor alignment issues, chemistry
specific issues
–Discarded – poor genomic targets
54. Considerations for Setting
Thresholds
• Sensitivity testing
– High quality samples
– Artificially degraded high quality samples
– Degraded bone samples
• Metrics to evaluate
– Coverage
– Allele balance
– Quality
– F/R Balance
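Three of the metrics listed above (coverage, allele balance, and F/R balance) can be computed from strand-specific allele read counts, as in the minimal Python sketch below. The function names, the input layout, and especially the threshold values are hypothetical placeholders; the presentation states that thresholds still need to be determined.

```python
def balance(a, b):
    """Return min/max of two counts, or 0.0 when both are zero."""
    larger = max(a, b)
    return min(a, b) / larger if larger else 0.0


def site_metrics(a1_fwd, a1_rev, a2_fwd, a2_rev):
    """Metrics for a heterozygous site from per-allele forward/reverse read counts."""
    allele1 = a1_fwd + a1_rev
    allele2 = a2_fwd + a2_rev
    return {
        "coverage": allele1 + allele2,
        "allele_balance": balance(allele1, allele2),
        "fr_balance": balance(a1_fwd + a2_fwd, a1_rev + a2_rev),
    }


# Placeholder minimums for illustration only, not validated thresholds.
PLACEHOLDER_MINIMUMS = {"coverage": 10, "allele_balance": 0.5, "fr_balance": 0.2}


def passes(metrics, minimums=PLACEHOLDER_MINIMUMS):
    """True when every metric meets or exceeds its minimum."""
    return all(metrics[name] >= value for name, value in minimums.items())


good_site = site_metrics(30, 25, 28, 22)
weak_site = site_metrics(3, 0, 1, 0)
print(good_site, passes(good_site))
print(weak_site, passes(weak_site))
```

Keeping the metrics separate from the pass/fail rule makes it straightforward to re-run the same data against different candidate thresholds during sensitivity testing.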
55. Next Steps
• Bone samples
– Test more samples
– Evaluate performance on the GeneReader
– Investigate methods for removing bacterial DNA
and/or bioinformatic removal of bacterial reads
• Artificially degraded high quality samples
• Finalize chemistry and optimize
• Sensitivity Studies
– Evaluate performance at lower inputs and start to
determine thresholds
56. Next Steps
• Analysis workflow and bioinformatics
– Microhaplotype phasing tool
• Kinship stats
– Evaluate Auto & X SNPs with family trio
– Impact of removing targets
• Integration into matching software
– SNPs
– Population databases
– Compatible profile format
57. Take Home Points
• Accurate and reproducible
• Optimization of method and analysis
– Need to see performance on degraded
samples
• Thresholds
– Require further testing and evaluation
• Need for appropriate data infrastructure
58. Acknowledgments
• ICMP: Rene Huel, Ana Bilic, Zlatan
Bajunovic, and the rest of the team
• QIAGEN: Keith Elliott, Erik
Soderback, Leif Schauser, Ben
Turner, Simon Hughes, Eric Lader,
Zhong Wu, Mehdi Motallebipour,
Joost van Dijk, Vikas Gupta, Holger
Karas
• Chris Phillips (University of
Santiago de Compostela)
• Andreas Tillmar (Linköping
University)