ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.

Optimization and Performance of a Very Large MPS SNP Panel for Missing Persons, by Michelle Peck et al., International Commission on Missing Persons. Presented May 3, 2018, at the QIAGEN Investigator Forum, San Antonio, TX.

Published in: Science

  1. 1. OPTIMIZATION AND PERFORMANCE OF A VERY LARGE MPS SNP PANEL FOR MISSING PERSONS Michelle Peck, Sejla Idrizbegovic, Felix Bittner, and Thomas Parsons. May 3, 2018, 7th QIAGEN Investigator Forum, San Antonio, TX
  2. 2. International Commission on Missing Persons ICMP endeavors to secure the co-operation of governments and other authorities in locating and identifying persons missing as a result of conflicts, human rights abuses, disasters, organized violence and other causes and to assist them in doing so.
  3. 3. International Commission on Missing Persons • 1995: Dayton Peace Accord – ~35,000 missing at end of conflict • 1996: Search for the missing begins • June 1996: ICMP was created following the G-7 Summit in Lyon, France • 2003: Expanded Global Mandate – Post-conflict, other regions – DVI globally • Individual DNA Identifications: 18,347
  4. 4. A new treaty-level International Organization dedicated exclusively to issues of the missing • ICMP, Version 2.0 – December, 2014. Signing ceremony in Brussels. • Internationally recognized mandate • New ICMP Headquarters in The Hague • Legal status: ability to operate and protect sensitive data, immune from legal process
  5. 5. DNA-led Process in former Yugoslavia (Bosnia and Kosovo) • Family reference blood samples: 92,735 • Missing represented: 30,011 • Bone samples profiled: 43,355 • Unique DNA profiles: 21,925 • Individual DNA Identifications: 18,347 • ~5 identifications per working day, for 15 years
  6. 6. Unresolved Cases – Balkans • 30,011 missing persons that have some references • 18,347 DNA Identified • 11,664 still need to be identified • Postmortem Profiles from 21,925 individuals • 18,347 Matched to families • 3,578 Unmatched PM Profiles • 83% Match Rate of PM Profiles
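The figures on this slide follow from simple subtraction and division. A minimal Python check, using only the counts quoted above (the variable names are illustrative, not from the presentation):

```python
# Counts quoted on the slide (Balkans caseload).
missing_with_references = 30_011
dna_identified = 18_347
postmortem_profiles = 21_925

still_unidentified = missing_with_references - dna_identified
unmatched_pm_profiles = postmortem_profiles - dna_identified
match_rate = dna_identified / postmortem_profiles

print(still_unidentified)     # 11664
print(unmatched_pm_profiles)  # 3578
print(f"{match_rate:.1%}")    # 83.7%
```

The computed match rate is 83.7%, which the slide reports as 83%.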
  7. 7. Primary Causes of Unresolved Cases • Failure to find bodies • DNA testing failed – Overall a 72% success rate for submitting profile – STR DNA targets too big • Insufficient reference samples from families – 2287 missing persons for whom this is possibly or probably the case • Combination of the above – Partial profile gives insufficient power to match when family references are deficient. Insufficient family references Degraded DNA
  8. 8. What do missing persons need from MPS? • Ability to handle very degraded samples • Strong capabilities of kinship analysis – Single, distant relatives? • Vastly diminished costs – Multiplexing: Simultaneous testing of many samples – Optimization and streamlining – Homogeneous assays, robust pipelines • Informatics that interfaces with other necessary data – Matching and case management in an integrated identification system
  9. 9. The case for SNPs • Shortest possible target, suitable for degraded DNA • Low mutation rates – Eliminates a complication with STRs that can be a pain and seriously affects efficient algorithms in database matching • Lower power of discrimination – But can target many SNPs with MPS – Not all SNPs are binary: increased power • Ease of data handling, nomenclature, and reporting • Possibly less sensitive to not using the perfect allele frequency database – If selected SNPs have similar allele frequencies globally
  11. 11. Qiagen:ICMP Partnership
  12. 12. QIASeq Targeted DNA Assay V3 • Unique Molecular Indices (UMIs) added to original DNA template • You know how many original DNA molecules your data represents • Possible strong advantage in forensic validation with regard to coverage thresholds and stochastic effects • Single Primer Extension
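To illustrate why UMIs tell you how many original DNA molecules the data represents, here is a minimal sketch. It assumes each read has already been reduced to a hypothetical `(umi, position, base)` tuple; the function name, UMI strings, and read data are invented for illustration, and this is not the QIAseq or CLC implementation.

```python
from collections import Counter, defaultdict

def collapse_by_umi(reads):
    """Group reads by (position, UMI) and return one consensus base per group."""
    groups = defaultdict(list)
    for umi, position, base in reads:
        groups[(position, umi)].append(base)
    # Majority vote within each UMI family; ties resolve to the first base seen.
    return {key: Counter(bases).most_common(1)[0][0] for key, bases in groups.items()}

# Toy data: six reads that derive from only three original molecules.
reads = [
    ("AACG", 568235, "C"), ("AACG", 568235, "C"), ("AACG", 568235, "T"),
    ("GGTA", 568235, "T"), ("GGTA", 568235, "T"),
    ("CTTC", 568235, "C"),
]
molecules = collapse_by_umi(reads)
print(len(reads), "reads ->", len(molecules), "original molecules")
print(Counter(molecules.values()))
```

Counting consensus "super reads" rather than raw reads means PCR duplicates do not inflate coverage, which is the property the slide highlights for setting coverage thresholds.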
  13. 13. SNP Assay Design Collaborative Project • Goal: Take advantage of the ability to design a very large SNP assay that maximizes utility for individual identification with degraded DNA and kinship analysis. • ICMP • Qiagen: Keith Elliott, Raed Samara, Eric Lader • Chris Phillips, University of Santiago de Compostela • Andreas Tillmar, Linköping University, and Swedish National Board of Forensic Medicine • Ken Kidd, Yale University: Microhaplotype loci
  14. 14. SNP Assay Design Collaborative Project • For further details: – https://investigatorforum.qiagen.com/latest-update
  15. 15. Design Pipeline • Data mine 1000 Genomes Project database for tri- and tetra-allelic SNPs (Chris Phillips) – High heterozygosity compared to binary SNPs – Mostly balanced across populations • But not a uniform feature – ~3000 candidate SNPs • Include short micro-haplotype loci • Eliminate candidates that are closely linked to each other, or to common STR Loci (Andreas Tillmar) • Screen out sites that are bad candidates for QIASeq primer extension – Extension primer 75 bp or less from target SNP site
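The "high heterozygosity compared to binary SNPs" point can be made concrete with the standard expected-heterozygosity formula, H = 1 − Σ pᵢ². The sketch below uses hypothetical, perfectly balanced allele frequencies; real candidate SNPs from the 1000 Genomes data will be less balanced.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity under Hardy-Weinberg: 1 minus the sum of squared frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)

# Hypothetical, perfectly balanced loci (upper bounds for each allele count).
binary = expected_heterozygosity([0.5, 0.5])
tri_allelic = expected_heterozygosity([1 / 3, 1 / 3, 1 / 3])
tetra_allelic = expected_heterozygosity([0.25] * 4)

print(f"binary: {binary:.3f}, tri-allelic: {tri_allelic:.3f}, tetra-allelic: {tetra_allelic:.3f}")
```

A binary SNP cannot exceed 0.5, while a balanced tri-allelic SNP reaches about 0.667 and a tetra-allelic SNP 0.75, which is why fewer multi-allelic loci are needed for the same discrimination power.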
  16. 16. “MPSplex” Panel Final Design • 1457 Loci after linkage screening – Only 6 sites eliminated based on gene specific primer disqualification! • 1377 tri-allelic autosomal SNPs • 34 tri-allelic X chromosome SNPs • 46 micro-haplotype loci • Closely linked SNPs that define haplotypes, resulting in highly heterozygous loci • 2832 target enrichment extension primers – 80% of sites with redundant targeting
  17. 17. MPSplex Design >96% are less than 100 bp away from the target >75% are less than 50 bp away from the target
  18. 18. Kinship Simulations (Andreas Tillmar) [figure; slide label: 10225]
  19. 19. Kinship Simulations (Andreas Tillmar) [figure; slide label: 1040]
  20. 20. Kinship Simulations (Andreas Tillmar) [figure; slide label: 1015]
  21. 21. QIAseq Chemistry and Workflow • Sample Preparation – Recommended Input: 40 ng • Fragmentation and Ligation • Target Enrichment PCR – MPSplex Custom Panel • Universal PCR • Library QC and Sample Normalization • Sequencing – GeneReader, Illumina, Ion Torrent • *Bead Purification (marked at three steps in the diagram)
  22. 22. QIAseq Analysis – QIAseq DNA V3 Workflow (CLC Biomedical Workbench) • FASTQ • Adapter Trimming • Map Reads to Reference (hg19) • Calculate UMIs, Create Super Reads • SNP Calling (Identify Known Mutations Tool) • Preliminary Profile • Profile Review, Apply Analysis Thresholds • Final Review Critical Analysis Metrics • Coverage • Super reads – attributable to a unique molecule • Thresholds • 10X • 10% Variant frequency • Quality • Allele Balance • Forward/Reverse Balance
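As a rough illustration of how the two thresholds on this slide (10X coverage, 10% variant frequency) might be applied to super-read counts at one site, consider the sketch below. How the thresholds are combined is an assumption made here for illustration; the function name and counts are hypothetical, and this is not the CLC Biomedical Workbench logic.

```python
MIN_COVERAGE = 10        # "10X" threshold from the slide
MIN_VARIANT_FREQ = 0.10  # "10% variant frequency" threshold from the slide

def call_genotype(super_read_counts):
    """Return a sorted tuple of called alleles, or None if coverage is below threshold."""
    total = sum(super_read_counts.values())
    if total < MIN_COVERAGE:
        return None  # not enough unique molecules to make a call
    called = sorted(
        allele for allele, n in super_read_counts.items() if n / total >= MIN_VARIANT_FREQ
    )
    if len(called) == 1:
        return (called[0], called[0])  # homozygous
    return tuple(called)  # heterozygous; more than two alleles would need manual review

print(call_genotype({"C": 48, "T": 52}))  # ('C', 'T')
print(call_genotype({"C": 97, "T": 3}))   # ('C', 'C')
print(call_genotype({"C": 4, "T": 3}))    # None
```

In the second example the T allele sits at 3% of super reads, below the 10% threshold, so it is treated as noise rather than as a second allele.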
  23. 23. Read Mapping 6: 568,235 C,T Forward Reads Reverse Reads
  24. 24. Microhaplotypes 5: 9,619,905 C,T 5: 9,619,936 A,C MH14
  25. 25. Microhaplotypes – Phased calls MH14 TC, CA 5: 9,619,905 C,T 5: 9,619,936 A,C
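Phased microhaplotype calls such as "TC, CA" for MH14 can in principle be read directly from individual reads that span both SNP positions. The following is a hypothetical sketch of that idea, not the tool used in this work (a later slide states that phase was determined manually). Reads are modelled as dictionaries mapping position to base, and the 10% minimum fraction is an assumption borrowed from the variant-frequency threshold.

```python
from collections import Counter

MH14_POSITIONS = (9_619_905, 9_619_936)  # chr5 positions shown on the slide

def phase_microhaplotype(reads, positions, min_fraction=0.10):
    """Count haplotypes from reads that cover every SNP position in the locus."""
    haplotypes = Counter(
        "".join(read[p] for p in positions)
        for read in reads
        if all(p in read for p in positions)
    )
    total = sum(haplotypes.values())
    return sorted(h for h, n in haplotypes.items() if total and n / total >= min_fraction)

# Toy reads: two haplotypes, plus one read that covers only the first SNP.
reads = (
    [{9_619_905: "T", 9_619_936: "C"}] * 6
    + [{9_619_905: "C", 9_619_936: "A"}] * 5
    + [{9_619_905: "T"}]  # cannot inform phase, so it is ignored
)
print(phase_microhaplotype(reads, MH14_POSITIONS))  # ['CA', 'TC']
```

Because the two SNPs are only 31 bp apart, a single short read can cover both, which is what makes microhaplotypes usable with degraded DNA.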
  26. 26. MPSplex Testing
      Parameter | Experiment 1 | Experiment 2 | Experiment 3 | Experiment 4 | Experiment 5 | Experiment 6
      Sample Type | GIAB | Bone Samples | GIAB | Reference Samples | GIAB | GIAB
      DNA Input | 1 ng | 50 pg - 20 ng | 1 ng - 20 ng | 500 pg - 40 ng | 5 ng or 40 ng | 2.5 ng, 5 ng, or 40 ng
      Fragmentation Time | 24 min | 24 min | 14 min | 14 min | 14 min | 14 min
      Target Enrichment PCR Cycles | 6 | 6 | 6 | 8 | 8 | 8
      Universal PCR Cycles | 24 | 21-24 | 20-24 | 25 | 25 | 25
      Sequencing Instrument | Illumina MiSeq | Illumina NextSeq | GeneReader | GeneReader | GeneReader | GeneReader
      Sequencing Cycles | 2 x 150 | 2 x 150 | 100 | 150 | 150 | 150
      Number of Samples Multiplexed | 6 | 17 (3 NCs) | 1 | 10 | 12 or 4 | 12
  27. 27. Known Reference Samples • Forensic validation requires known samples – Dense genome-wide markers require control DNA of known genome • Genome in a Bottle • NA12878 • Ashkenazi Family Trio – Mother, Father, Son
  28. 28. GIAB Testing • NA12878 – 12 Replicates • Illumina and GeneReader • DNA Input: 1 ng – 40 ng • Ashkenazi Family Trio – 5 Replicates each • GeneReader • DNA Input: 2.5 ng, 5 ng, and 40 ng
  29. 29. NA12878 – Auto & X SNPs (1151): Coverage and Concordance (Compared to GIAB)
      Sample | Sequencer | Input (ng) | Average Coverage (X) | Sites ≥10X | % ≥10X | Concordant Count | Concordance %
      1 | Illumina | 1 | 101 | 1151 | 100.00% | 1145 | 99.50%
      2 | Illumina | 1 | 104 | 1149 | 99.80% | 1144 | 99.60%
      3 | GeneReader | 1 | 51.8 | 1150 | 99.90% | 1138 | 99.00%
      4 | GeneReader | 10 | 159 | 1151 | 100.00% | 1143 | 99.30%
      5 | GeneReader | 10 | 433 | 1151 | 100.00% | 1145 | 99.50%
      6 | GeneReader | 20 | 471 | 1151 | 100.00% | 1144 | 99.40%
      7 | GeneReader | 40 | 96.5 | 1047 | 91.00% | 1036 | 98.90%
      8 | GeneReader | 5 | 61.9 | 1141 | 99.10% | 1133 | 99.30%
      9 | GeneReader | 5 | 94.9 | 1148 | 99.70% | 1142 | 99.50%
      10 | GeneReader | 40 | 549.5 | 1116 | 96.96% | 1107 | 99.19%
      11 | GeneReader | 5 | 875.7 | 1120 | 97.31% | 1108 | 98.93%
      12 | GeneReader | 2.5 | 364.8 | 1116 | 96.96% | 1105 | 99.01%
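The percentage columns in this table appear to be derived as follows: "% ≥10X" is the number of sites at or above 10X divided by the 1151 evaluated SNPs, and the concordance percentage is the concordant count divided by the sites at or above 10X. That relationship is inferred from the numbers rather than stated on the slide; a short check against two rows:

```python
PANEL_SITES = 1151  # autosomal + X SNPs evaluated for NA12878

def coverage_and_concordance(sites_ge_10x, concordant):
    """Return (% of panel sites at >=10X, % of those sites concordant with GIAB)."""
    pct_covered = 100 * sites_ge_10x / PANEL_SITES
    pct_concordant = 100 * concordant / sites_ge_10x
    return round(pct_covered, 2), round(pct_concordant, 2)

sample_7 = coverage_and_concordance(1047, 1036)
sample_10 = coverage_and_concordance(1116, 1107)
print(sample_7)   # slide reports 91.00% and 98.90%
print(sample_10)  # slide reports 96.96% and 99.19%
```

Note that the concordance denominator shrinks with coverage, so a low-coverage sample can still show high concordance on the sites it does cover.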
  31. 31. NA12878 – MH SNPs (117): Microhaplotype Coverage and Concordance (Compared to GIAB)
      Sample | Sequencer | Input (ng) | Average Coverage (X) | Sites ≥10X | % ≥10X | Concordant Count | Concordance %
      1 | Illumina | 1 | 100 | 117 | 100.00% | 117 | 100.00%
      2 | Illumina | 1 | 103 | 117 | 100.00% | 117 | 100.00%
      3 | GeneReader | 1 | 61.3 | 117 | 100.00% | 117 | 100.00%
      4 | GeneReader | 10 | 218 | 117 | 100.00% | 117 | 100.00%
      5 | GeneReader | 10 | 561 | 117 | 100.00% | 117 | 100.00%
      6 | GeneReader | 20 | 668 | 117 | 100.00% | 117 | 100.00%
      7 | GeneReader | 40 | 151 | 115 | 98.30% | 115 | 100.00%
      8 | GeneReader | 5 | 85.4 | 117 | 100.00% | 117 | 100.00%
      9 | GeneReader | 5 | 126 | 117 | 100.00% | 117 | 100.00%
      10 | GeneReader | 40 | 419 | 116 | 99.15% | 115 | 99.14%
      11 | GeneReader | 5 | 716 | 116 | 99.15% | 115 | 99.14%
      12 | GeneReader | 2.5 | 304 | 116 | 99.15% | 115 | 99.14%
  33. 33. Kinship Power • Mother, Father, Son Trio tested for Microhaplotypes – Phase manually determined for 43 targeted MHs • Kinship Index of Child given parents= 4e+21
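For readers unfamiliar with how a kinship index of this size arises, the sketch below computes a single-locus likelihood ratio for a child given both parents versus a random member of the population, then multiplies across independent loci. The choice of hypotheses, the absence of a mutation model, and the allele frequencies are all assumptions made for illustration; the slide does not describe the software or model actually used.

```python
import math
from itertools import product

def trio_likelihood_ratio(child, mother, father, freqs):
    """LR for one locus: P(child | these parents) / P(child | random population)."""
    child = tuple(sorted(child))
    # Each parent transmits either allele with probability 1/2 (no mutation modelled).
    p_related = sum(
        0.25 for m, f in product(mother, father) if tuple(sorted((m, f))) == child
    )
    a, b = child
    p_unrelated = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
    return p_related / p_unrelated

# Hypothetical tri-allelic locus and genotypes.
freqs = {"A": 0.3, "C": 0.3, "T": 0.4}
lr = trio_likelihood_ratio(("A", "C"), ("A", "A"), ("C", "T"), freqs)
print(round(lr, 2))

# Independent loci combine by multiplication.
combined = math.prod([lr] * 3)

# If the reported 4e+21 came from the 43 phased microhaplotypes, the
# geometric-mean contribution per locus would be roughly:
per_locus = (4e21) ** (1 / 43)
print(round(per_locus, 2))
```

Under that reading, each microhaplotype contributes a factor of about 3.2 on average, which shows how a few dozen highly heterozygous loci reach an index above 10^21.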
  34. 34. Norway Samples • Degraded bone samples • DNA Input – 50 pg – 20 ng – One sample replicated at 3 inputs • 22 ng, 11 ng, and 5.5 ng • Illumina Nextseq – 34 Sample sequencing multiplex
  35. 35. Norway Samples – Coverage
      Sample | Input (ng) | Auto & X SNPs (1151): Average Coverage (X) | Auto & X: Sites ≥10X | Auto & X: % ≥10X | MH SNPs (117): Average Coverage (X) | MH: Sites ≥10X | MH: % ≥10X
      10-2 | 22.3 | 85.4 | 1116 | 97.0% | 69.8 | 115 | 98.3%
      5-6 | 19.43 | 99.9 | 1128 | 98.0% | 82.1 | 114 | 97.4%
      7-2 | 11.20 | 34.8 | 1078 | 93.7% | 31.6 | 110 | 94.0%
      6-2 | 5.5 | 19.7 | 944 | 82.0% | 18.8 | 96 | 82.1%
      7-6 | 4.91 | 109.3 | 1149 | 99.8% | 99.1 | 117 | 100.0%
      3-6 | 3.43 | 32.1 | 1102 | 95.7% | 29.9 | 114 | 97.4%
      2-6 | 3.23 | 38.3 | 1130 | 98.2% | 38.2 | 117 | 100.0%
      10-6 | 2.04 | 17.0 | 932 | 81.0% | 14.4 | 88 | 75.2%
      11-2 | 1.91 | 12.3 | 756 | 65.7% | 10.7 | 67 | 57.3%
      4-6 | 1.45 | 5.8 | 157 | 13.6% | 4.5 | 6 | 5.1%
      6-6 | 1.21 | 18.3 | 993 | 86.3% | 15.0 | 99 | 84.6%
      1-6 | 0.85 | 13.0 | 793 | 68.9% | 11.9 | 74 | 63.2%
      12-2 | 0.05 | 1.0 | 0 | 0.0% | 1.1 | 0 | 0.0%
      5-2 | Neg | 1.3 | 0 | 0.0% | 1.1 | 0 | 0.0%
      11-6 | Neg | 0.5 | 1 | 0.1% | 0.4 | 0 | 0.0%
      12-6 | Neg | 0.0 | 0 | 0.0% | 0.0 | 0 | 0.0%
      14-2 | Pos (1 ng) | 103.7 | 1149 | 99.8% | 103.2 | 117 | 100.0%
  37. 37. Norway Samples – Concordance (Compared to 10-2 Replicate)
      Sample | Input (ng) | Auto & X SNPs (1151): Concordant Count | Auto & X: Both ≥10X | Auto & X: % | MH SNPs (117): Concordant Count | MH: Both ≥10X | MH: %
      10-2 | 22.3 | (reference) | | | | |
      7-2 | 11.2 | 1068 | 1072 | 99.6% | 96 | 96 | 100.0%
      6-2 | 5.5 | 929 | 942 | 98.6% | 110 | 110 | 100.0%
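Replicate concordance of this kind is computed only over sites that pass the coverage threshold in both samples. A minimal sketch, assuming each profile is a dictionary from site name to genotype that contains only sites called at 10X or above (the site names and genotypes below are made up):

```python
def replicate_concordance(reference, replicate):
    """Compare genotype calls at sites present (called at >=10X) in both profiles."""
    shared = [site for site in reference if site in replicate]
    matching = sum(reference[site] == replicate[site] for site in shared)
    percent = 100 * matching / len(shared) if shared else float("nan")
    return matching, len(shared), percent

# Hypothetical profiles; "snp4" dropped out of the lower-input replicate.
high_input = {"snp1": ("A", "C"), "snp2": ("T", "T"), "snp3": ("C", "G"), "snp4": ("A", "A")}
low_input = {"snp1": ("A", "C"), "snp2": ("T", "T"), "snp3": ("C", "C")}

result = replicate_concordance(high_input, low_input)
print(result)

# The same ratio reproduces the autosomal and X SNP figures on the slide.
print(round(100 * 1068 / 1072, 1), round(100 * 929 / 942, 1))  # 99.6 98.6
```

In the toy data, "snp3" shows the typical failure mode at low input: one allele of a heterozygote drops out and the site is called homozygous.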
  38. 38. GC-content distribution of reads [figure; labels: Bones (1-6); Human DNA (minor peak in bimodal distribution); Human DNA (all); NA12878] Slide: QIAGEN
  39. 39. Panel Development • Parallel assessment of SNP performance and sequencing approach – Applied to forensics – Relatively new library preparation chemistry approach • Marker Evaluation • Chemistry and Performance Evaluation
  40. 40. Marker Evaluation • Evaluation of sequencing performance • Flagged positions for further review – Samples • 35 samples utilized – 9 unique – Initial manual review of all sites – NA12878 – Discordance and outliers for analysis metrics • Quality, heterozygous allele balance, homozygous variant frequency, low coverage
  41. 41. Evaluated sequencing quality and accuracy of calls • Evaluation criteria – Homopolymeric or repeat region – Non-specific reads – Tri-allelic call – Concordance – Variants only observed in one direction – Poor primer performance
  42. 42. Evaluated sequencing quality and accuracy of calls • Classified markers –Confirmed – acceptable data –Questioned – need further review • Poor primer performance, low analysis metrics, minor alignment issues, chemistry specific issues –Discarded – poor genomic targets
  43. 43. Homopolymeric/Repeat Region [figure; panels: GeneReader, Illumina, GIAB]
  44. 44. Non-specific reads [figure; panels: GeneReader, Illumina, GIAB]
  45. 45. Tri-allelic [figure; panels: GeneReader, Illumina, GIAB; labels: T; A,C,T; A,C,T]
  46. 46. Panel Review MPSplex_v3 • Confirmed Markers – 1097 – Auto SNPs – 1033 – X SNPs – 19 – MHs – 45 • Questioned Markers – 218 • Discarded Markers – 142
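The marker counts on this slide can be reconciled with the 1457-locus design reported earlier. The check below uses only numbers from the two slides; matching each confirmed class to its design class (autosomal, X, microhaplotype) is an assumption about how the categories line up.

```python
design = {"autosomal SNPs": 1377, "X SNPs": 34, "microhaplotypes": 46}
confirmed = {"autosomal SNPs": 1033, "X SNPs": 19, "microhaplotypes": 45}
questioned, discarded = 218, 142

confirmed_total = sum(confirmed.values())
panel_total = confirmed_total + questioned + discarded

print(confirmed_total)                        # 1097
print(panel_total == sum(design.values()))    # True (1457 loci)
print(f"{confirmed_total / panel_total:.1%}") # 75.3%

# Share of each designed class that reached "confirmed" status.
for marker_class, n_designed in design.items():
    print(marker_class, f"{confirmed[marker_class] / n_designed:.1%}")
```

About three quarters of the designed loci were confirmed at this stage, with microhaplotypes retained at the highest rate (45 of 46).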
  47. 47. Performance: Coverage • DNA Input • Sample Type • Multiplexing • Library preparation and sequencing chemistry
  48. 48. Coverage
  49. 49. Norway Samples – Coverage [plot: Average Coverage (X) versus DNA Input (ng)]
  50. 50. Coverage – DNA Input [plot: Average Coverage (X) by Sample (1-9); inputs labeled 40 ng, 5 ng, 1 ng, 500 pg; annotation: Decreased Input]
  51. 51. Coverage – Sample Multiplex [plot: Average Coverage (X) by Sample (2-5); series: 40 ng, 5 ng; annotation: Decreased Sample Multiplex]
  52. 52. Considerations for Setting Thresholds • Sensitivity testing – High quality samples – Artificially degraded high quality samples – Degraded bone samples • Metrics to evaluate – Coverage – Allele balance – Quality – F/R Balance
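Two of the metrics listed here, allele balance and forward/reverse balance, are simple read-count ratios. The slide does not define them, so the definitions below are common conventions adopted as assumptions, and the counts are invented.

```python
def allele_balance(count_a, count_b):
    """Minor/major read ratio for a heterozygous call (1.0 means perfectly balanced)."""
    major = max(count_a, count_b)
    return min(count_a, count_b) / major if major else 0.0

def strand_balance(forward, reverse):
    """Fraction of reads on the forward strand (0.5 means balanced)."""
    total = forward + reverse
    return forward / total if total else 0.0

print(round(allele_balance(48, 52), 3))  # 0.923
print(strand_balance(30, 70))            # 0.3
```

Thresholds on these ratios would be set from the sensitivity studies the slide describes, since degraded, low-input samples are expected to show wider stochastic variation than high-quality controls.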
  53. 53. Next Steps • Bone samples – Test more samples – Evaluate performance on the GeneReader – Investigate methods for removing bacterial DNA and/or bioinformatic removal of bacterial reads • Artificially degraded high quality samples • Finalize chemistry and optimize • Sensitivity Studies – Evaluate performance at lower inputs and start to determine thresholds
  54. 54. Next Steps • Analysis workflow and bioinformatics – Microhaplotype phasing tool • Kinship stats – Evaluate Auto & X SNPs with family trio – Impact of removing targets • Integration into matching software – SNPs – Population databases – Compatible profile format
  55. 55. Take Home Points • Accurate and reproducible • Optimization of method and analysis – Need to see performance on degraded samples • Thresholds – Require further testing and evaluation • Need for appropriate data infrastructure
  56. 56. Acknowledgments • ICMP: Rene Huel, Ana Bilic, Zlatan Bajunovic, and the rest of the team • QIAGEN: Keith Elliot, Erik Soderback, Leif Schauser, Ben Turner, Simon Hughes, Eric Lader, Zhong Wu, Mehdi Motallebipour, Joost van Dijk, Vikas Gupta, Holger Karas • Chris Phillips (University of Santiago de Compostela) • Andreas Tillmar (Linköping University)
  57. 57. Questions? Michelle Peck Michelle.peck@icmp.int
