Hidden value in medical genetics databases. Splice the silence!

Silent mutations in medical genetics databases such as ClinVar carry extra value when analyzed with current genomics tools. In most large-scale genomics analyses, silent mutations are given low priority unless something else raises their importance, such as a location in a functionally important DNA sequence. This presentation describes a method for adding value to silent mutations in the human exome. Specifically, mapping variants, including silent ones, to known exon-intron boundaries identifies silent mutations whose pathogenic potential would otherwise be much harder to judge.

Published in: Science, Technology

  1. THEY AREN’T PERFECT! WIDELY USED GENOMICS DATABASES, THAT IS… DIG DEEP!

     By Mehis Pold, MD. Email: mehisp@hotmail.com

     Public databases have become one of the most used sources of information in genomics data analysis. Like anything else in this world, though, they aren’t perfect. Case in point: ClinVar. Using my UVM (Universal Variation Mapping) tools, I mapped all ClinVar GRCh37 single nucleotide variants to the known exon-intron boundaries provided by CCDS. My particular interest was to find ClinVar variants annotated as silent mutations but located at conserved positions on splice junctions (see Fig. 1). A silent mutation at a conserved position of an exon-intron boundary is certainly of higher value for interpreting a genetic finding than a silent mutation anywhere else on an exon. In brief, I was curious whether I could add value to the silent variants described in ClinVar.

     RESULT: I found a total of 2,426 GRCh37 variants in ClinVar that map to conserved splice junction positions but are not annotated as such. Of these 2,426 variants, 159 were annotated as silent mutations (see Tables 1 and 2 for details). For me in particular, the silent mutations on BRCA1 and BRCA2 stood out.

     CONCLUSION: Anyone using public databases still needs additional tools to uncover the full value found in them. Relying only on the annotations the databases provide means overlooking important details.

     Figure 1. Mapping was carried out to conserved positions on exon-intron boundaries. E: exon, I: intron, 1: first position, 2: second position, z: last position, y: second-last position.
     [Figure: exon-intron-exon schematic marking the conserved positions Ez, I1, I2 at the donor site and Iy, Iz, E1 at the acceptor site.]
  2. Table 1. Summary of ClinVar variant mapping to the known exon-intron boundaries as provided by CCDS.

     Total number of entries annotated as GRCh37: 97,811
     Variants, no annotation to splice junction: 2,426/97,811 = 2.5%
     Silent coding region variants, no annotation to splice junction: 159/97,811 = 0.16%

     Location on splice junction | Number of hits in ClinVar | Number of hits with ClinVar silent mutation
     E1_A_MINUS | 166 | 19
     E1_A_PLUS | 191 | 22
     Ez_D_MINUS | 245 | 39
     Ez_D_PLUS | 246 | 73
     I1_D_MINUS | 309 | 3
     I1_D_PLUS | 315 | 0
     I2_D_MINUS | 92 | 1
     I2_D_PLUS | 140 | 2
     Iy_A_MINUS | 154 | 0
     Iy_A_PLUS | 175 | 0
     Iz_A_MINUS | 178 | 0
     Iz_A_PLUS | 215 | 0
     TOTALS | 2426 | 159

     Splice junction description key:
     E1 - first base on exon
     Ez - last base on exon
     I1 - first base on intron
     I2 - second base on intron
     Iy - second last base on intron
     Iz - last base on intron
     A - acceptor
     D - donor
     MINUS - minus strand
     PLUS - plus strand

     Source file: ClinVar database, variant summary, downloaded 06/24/2014
     Genome assembly: GRCh37
     Variant type: single nucleotide variant
     Source for exon-intron boundary coordinates: CCDS

     Table 2. The list of ClinVar silent mutations mapping to the conserved positions on exon-intron boundaries.

     E1_A_MINUS
     Variant Description | Gene ID | Pathogenicity | Genome Coordinates
     AGA:c.699C>T (p.Gly233Gly=) | AGA | not provided | 4:178355643
     ATP4A:c.2007G>A (p.Lys669Lys=) | ATP4A | not provided | 19:36046492
     COBL:c.246C>T (p.Ser82Ser=) | COBL | not provided | 7:51261286
     MMP1:c.351G>A (p.Arg117Arg=) | MMP1 | not provided | 11:102667893
     MYH1:c.4182G>A (p.Lys1394Lys=) | MYH1 | not provided | 17:10401234
     NLRP13:c.2958G>A (p.Gly986=) | NLRP13 | not provided | 19:56407485
     NPHP3:c.1986G>A (p.Arg662=) | NPHP3 | Benign | 3:132416206
     PDE4C:c.243C>T (p.Arg81Arg=) | PDE4C | not provided | 19:18333133
  3. Table 2, E1_A_MINUS (continued)
     Variant Description | Gene ID | Pathogenicity | Genome Coordinates
     PNKP:c.579G>A (p.Arg193=) | PNKP | Uncertain significance | 19:50367493
     POTEF:c.522G>A (p.Arg174Arg=) | POTEF | not provided | 2:130872901
     PTPRD:c.2263C>T (p.Leu755Leu=) | PTPRD | not provided | 9:8465675
     PTPRT:c.2343G>A (p.Leu781Leu=) | PTPRT | not provided | 20:40828028
     STRA6:c.114G>A (p.Gly38Gly=) | STRA6 | not provided | 15:74490159
     TMPRSS3:c.783C>T (p.Asp261=) | TMPRSS3 | Likely benign | 21:43802343
     TSC1:c.1998G>A (p.Lys666Lys=) | TSC1 | not provided | 9:135779841
     TSC1:c.738G>A (p.Arg246Arg=) | TSC1 | not provided | 9:135787844
     TTN:c.50355A>G (p.Arg16785=) | TTN | Uncertain significance | 2:179476681
     USH2A:c.6486G>A (p.Gln2162=) | USH2A | Likely benign | 1:216172400
     ZNF335:c.2703C>T (p.Ser901=) | ZNF335 | Benign | 20:44581348

     E1_A_PLUS
     Variant Description | Gene ID | Pathogenicity | Genome Coordinates
     AATF:c.1620G>A (p.Arg540Arg=) | AATF | not provided | 17:35413901
     ATP2C1:c.2127G>A (p.Thr709Thr=) | ATP2C1 | not provided | 3:130715524
     ATP6V0A2:c.1515T>C (p.Asn505=) | ATP6V0A2 | Benign | 12:124229429
     B3GALTL:c.348T>C (p.His116His=) | B3GALTL | Benign | 13:31821992
     CHD7:c.4851T>C (p.Gly1617=) | CHD7 | Uncertain significance | 8:61757423
     DSP:c.2631G>A (p.Arg877=) | DSP | Benign | 6:7576527
     GALNT13:c.687G>A (p.Arg229Arg=) | GALNT13 | not provided | 2:155102325
     ITPR1:c.1962G>A (p.Lys654=) | ITPR1 | Benign | 3:4712413
     KIF1B:c.4809G>A (p.Lys1603Lys=) | KIF1B | not provided | 1:10434374
     LGR6:c.999G>A (p.Leu333Leu=) | LGR6 | not provided | 1:202273687
     LMNA:c.357C>T (p.Arg119Arg=) | LMNA | Benign;Uncertain significance | 1:156100408
     LMO7:c.3186G>A (p.Arg1062Arg=) | LMO7 | not provided | 13:76416987
     MYO7A:c.2283G>A (p.Arg761Arg=) | MYO7A | Likely benign | 11:76890091
     NPSR1:c.1026G>A (p.Arg342Arg=) | NPSR1 | not provided | 7:34889177
     NRG3:c.1158G>A (p.Lys386Lys=) | NRG3 | not provided | 10:84718705
     OSBPL2:c.456G>A (p.Arg152Arg=) | OSBPL2 | not provided | 20:60854213
     PIWIL4:c.1269G>A (p.Arg423Arg=) | PIWIL4 | not provided | 11:94330970
     PLCB1:c.3337C>T (p.Leu1113=) | PLCB1 | Benign | 20:8770822
     RUNDC3B:c.459G>A (p.Arg153Arg=) | RUNDC3B | not provided | 7:87369107
     SLC4A10:c.1443G>A (p.Arg481Arg=) | SLC4A10 | not provided | 2:162760514
     STXBP1:c.1548C>T (p.Ser516=) | STXBP1 | Uncertain significance | 9:130444685
     TRPM2:c.255G>A (p.Gly85Gly=) | TRPM2 | not provided | 21:45783997

     Ez_D_MINUS
     Variant Description | Gene ID | Pathogenicity | Genome Coordinates
     ABCA4:c.768G>T (p.Val256=) | ABCA4 | not provided | 1:94564350
     AKT2:c.960G>A (p.Glu320=) | AKT2 | Uncertain significance | 19:40742164
     BRCA1:c.4185G>A (p.Gln1395Gln=) | BRCA1 | Pathogenic | 17:41242961
     BRCA1:c.5277G>A (p.Lys1759Lys=) | BRCA1 | Pathogenic | 17:41209069
     CA5A:c.555G>A (p.Lys185=) | CA5A | Pathogenic | 16:87936031
     CORO1C:c.195G>A (p.Lys65Lys=) | CORO1C | not provided | 12:109094900
     CPAMD8:c.627G>A (p.Lys209Lys=) | CPAMD8 | not provided | 19:17122274
     CSTB:c.168G>A (p.Lys56=) | CSTB | Likely pathogenic | 21:45194539
     CSTB:c.66G>A (p.Gln22=) | CSTB | Likely pathogenic | 21:45196085
     CUBN:c.1530G>A (p.Lys510=) | CUBN | Likely pathogenic | 10:17145124
     CUBN:c.489G>A (p.Lys163=) | CUBN | Likely pathogenic | 10:17165587
     DPYD:c.1905C>T (p.Asn635Asn=) | DPYD | not provided | 1:97915615
     GAMT:c.327G>A (p.Lys109Lys=) | GAMT | Pathogenic | 19:1399792
     GCK:c.483G>A (p.Lys161Lys=) | GCK | Likely pathogenic | 7:44190555
     GNPTAB:c.771G>A (p.Leu257Leu=) | GNPTAB | Pathogenic | 12:102173930
     KMT2D:c.10740G>A (p.Gln3580=) | KMT2D | Likely pathogenic | 12:49427850
     LAMA4:c.3813G>A (p.Gly1271=) | LAMA4 | Uncertain significance | 6:112453955
     MKS1:c.387G>A (p.Glu129=) | MKS1 | Likely pathogenic | 17:56293449
     MLC1:c.597A>G (p.Ser199=) | MLC1 | Benign | 22:50515270
     MYH7:c.2922G>A (p.Lys974=) | MYH7 | Likely benign | 14:23893116
  4. Table 2, Ez_D_MINUS (continued)
     Variant Description | Gene ID | Pathogenicity | Genome Coordinates
     MYH7:c.345C>T (p.Tyr115=) | MYH7 | Uncertain significance | 14:23902293
     MYH7:c.732C>T (p.Phe244=) | MYH7 | Benign | 14:23900794
     NEB:c.18693G>C (p.Ala6231=) | NEB | Benign | 2:152420120
     NEB:c.2943G>A (p.Glu981=) | NEB | Uncertain significance | 2:152539176
     NPHS2:c.378G>A (p.Lys126Lys=) | NPHS2 | not provided | 1:179533825
     PAH:c.168G>A (p.Glu56=) | PAH | not provided | 12:103306569
     PAH:c.912G>A (p.Gln304=) | PAH | not provided | 12:103245465
     PAH:c.969A>G (p.Thr323=) | PAH | not provided | 12:103240673
     PRKAG2:c.114G>A (p.Pro38=) | PRKAG2 | Likely benign | 7:151573592
     PYGM:c.1827G>A (p.Lys609Lys=) | PYGM | Pathogenic | 11:64519069
     RBM19:c.2385G>A (p.Gln795=) | RBM19 | Uncertain significance | 12:114358416
     RRM2B:c.48G>A (p.Glu16=) | RRM2B | Pathogenic | 8:103251055
     SLC17A5:c.291G>A (p.Thr97=) | SLC17A5 | Likely pathogenic | 6:74354130
     SLC25A13:c.15G>A (p.Lys5Lys=) | SLC25A13 | Pathogenic | 7:95951254
     TBC1D4:c.1611T>G (p.Ser537=) | TBC1D4 | Benign | 13:75915261
     TP53:c.672G>A (p.Glu224Glu=) | TP53 | not provided | 17:7578177
     TSC1:c.363G>A (p.Lys121Lys=) | TSC1 | not provided | 9:135800974
     VWF:c.7437G>A (p.Ser2479=) | VWF | not provided | 12:6085277
     WDR74:c.618G>A (p.Gln206Gln=) | - | not provided | 11:62602903

     Ez_D_PLUS
     Variant Description | Gene ID | Pathogenicity | Genome Coordinates
     ATM:c.2250G>A (p.Lys750=) | ATM | Likely pathogenic;Pathogenic | 11:108127067
     ATM:c.3576G>A (p.Lys1192=) | ATM | Pathogenic | 11:108151895
     ATP8A2:c.2679G>A (p.Glu893Glu=) | ATP8A2 | not provided | 13:26349097
     ATP8B2:c.3393G>A (p.Thr1131Thr=) | ATP8B2 | not provided | 1:154321014
     BCKDHA:c.288C>T (p.His96=) | BCKDHA | Uncertain significance | 19:41916721
     BRCA2:c.516G>A (p.Lys172Lys=) | BRCA2 | Uncertain significance | 13:32900419
     BRCA2:c.8331G>A (p.Lys2777Lys=) | BRCA2 | Uncertain significance | 13:32937670
     BRCA2:c.8754G>A (p.Glu2918Glu=) | BRCA2 | Pathogenic | 13:32950928
     BRCA2:c.9117G>A (p.Pro3039Pro=) | BRCA2 | Pathogenic | 13:32954050
     BRCA2:c.9501G>A (p.Glu3167Glu=) | BRCA2 | Pathogenic | 13:32969070
     BTD:c.459G>A (p.Glu153Glu=) | BTD | Pathogenic | 3:15683564
     CCDC83:c.672G>A (p.Glu224Glu=) | CCDC83 | not provided | 11:85610058
     CCND1:c.723G>A (p.Pro241=) | CCND1 | risk factor | 11:69462910
     CD22:c.1944G>A (p.Lys648Lys=) | CD22 | not provided | 19:35836029
     CD96:c.855G>A (p.Glu285Glu=) | CD96 | not provided | 3:111304225
     CDH23:c.7362G>A (p.Thr2454=) | CDH23 | Likely pathogenic | 10:73559386
     CFTR:c.1209G>A (p.Glu403=) | CFTR | not provided | 7:117182162
     CFTR:c.1584G>A (p.Glu528=) | CFTR | Benign;Uncertain significance | 7:117199709
     CFTR:c.2619G>A (p.Glu873=) | CFTR | not provided | 7:117235112
     CFTR:c.2988G>A (p.Gln996Gln=) | CFTR | Pathogenic;Uncertain significance | 7:117246807
     CFTR:c.3468G>A (p.Leu1156=) | CFTR | not provided | 7:117254767
     CFTR:c.3717G>A (p.Arg1239=) | CFTR | not provided | 7:117267824
     CFTR:c.489G>A (p.Lys163=) | CFTR | not provided | 7:117171168
     CHRND:c.243C>T (p.His81=) | CHRND | Benign | 2:233392155
     CNTNAP2:c.1083G>A (p.Val361=) | CNTNAP2 | Benign;Uncertain significance | 7:146825928
     COL3A1:c.2022G>A (p.Lys674=) | COL3A1 | Pathogenic | 2:189863444
     COL3A1:c.2337G>A (p.Lys779=) | COL3A1 | Pathogenic | 2:189866176
     COL3A1:c.4254G>A (p.Thr1418=) | COL3A1 | Pathogenic | 2:189875616
     COL6A1:c.1056C>T (p.Asp352=) | COL6A1 | Benign | 21:47410740
     CYP27A1:c.1017G>C (p.Thr339=) | CYP27A1 | Pathogenic | 2:219677819
     DYSF:c.6009G>A (p.Ala2003=) | DYSF | Uncertain significance | 2:71906365
     EGFLAM:c.2166G>A (p.Glu722Glu=) | EGFLAM | not provided | 5:38431390
     ERGIC1:c.765G>A (p.Thr255Thr=) | ERGIC1 | not provided | 5:172362313
     FBXW12:c.33G>A (p.Lys11Lys=) | FBXW12 | not provided | 3:48413810
     FRMPD1:c.612G>A (p.Lys204Lys=) | FRMPD1 | not provided | 9:37724317
     GRN:c.264G>A (p.Glu88=) | GRN | not provided | 17:42426919
     GRN:c.708C>T (p.Asn236=) | GRN | not provided | 17:42428168
  5. Table 2, Ez_D_PLUS (continued)
     Variant Description | Gene ID | Pathogenicity | Genome Coordinates
     HAS3:c.636G>A (p.Gln212Gln=) | HAS3 | not provided | 16:69143934
     KCNQ1:c.1032G>A (p.Ala344Ala=) | KCNQ1 | Pathogenic | 11:2604775
     KCNQ1:c.1032G>C (p.Ala344=) | KCNQ1 | not provided | 11:2604775
     KCNQ1:c.921G>A (p.Val307=) | KCNQ1 | not provided | 11:2594216
     LMNA:c.1698C>T (p.His566His=) | LMNA | Benign | 1:156107534
     LMNA:c.513G>A (p.Lys171Lys=) | LMNA | not provided | 1:156100564
     LMNA:c.810G>A (p.Lys270Lys=) | LMNA | not provided | 1:156104766
     MAPT:c.1866T>C (p.Ser622=) | MAPT | not provided | 17:44087768
     MLH1:c.1038G>A (p.Gln346Gln=) | MLH1 | Pathogenic | 3:37061954
     MLH1:c.1731G>A (p.Ser577Ser=) | MLH1 | Pathogenic | 3:37083822
     MLH1:c.1731G>C (p.Ser577=) | MLH1 | Uncertain significance | 3:37083822
     MLH1:c.1896G>A (p.Glu632Glu=) | MLH1 | Pathogenic | 3:37089174
     MLH1:c.1989G>A (p.Glu663Glu=) | MLH1 | Likely pathogenic | 3:37090100
     MLH1:c.2103G>A (p.Gln701=) | MLH1 | Likely pathogenic | 3:37090508
     MLH1:c.453G>A (p.Thr151=) | MLH1 | Likely pathogenic | 3:37048554
     MSH2:c.2634G>A (p.Glu878=) | MSH2 | Pathogenic | 2:47708010
     MSH2:c.942G>A (p.Gln314=) | MSH2 | Pathogenic | 2:47641557
     MYH14:c.810C>T (p.Phe270=) | MYH14 | Likely benign | 19:50728934
     MYOCD:c.1125G>A (p.Lys375Lys=) | MYOCD | not provided | 17:12649389
     NISCH:c.1653G>A (p.Gln551Gln=) | NISCH | not provided | 3:52518653
     PNLDC1:c.1224C>T (p.Ile408Ile=) | PNLDC1 | not provided | 6:160239686
     PRRG2:c.261G>A (p.Thr87Thr=) | - | not provided | 19:50086974
     SERPING1:c.51G>A (p.Gly17Gly=) | SERPING1 | not provided | 11:57365794
     SLC26A4:c.1614C>T (p.Asn538Asn=) | SLC26A4 | Likely benign | 7:107338556
     SLC26A4:c.918G>A (p.Val306Val=) | SLC26A4 | not provided | 7:107323799
     SLC34A3:c.756G>A (p.Gln252Gln=) | SLC34A3 | Pathogenic | 9:140127856
     SLC34A3:c.846G>A (p.Pro282Pro=) | SLC34A3 | Pathogenic | 9:140128174
     TGFBR2:c.1524G>A (p.Gln508Gln=) | TGFBR2 | Pathogenic | 3:30730003
     TSC2:c.1443G>A (p.Glu481Glu=) | TSC2 | not provided | 16:2113054
     TSC2:c.1599G>A (p.Lys533Lys=) | TSC2 | not provided | 16:2114428
     TSC2:c.1716G>A (p.Gln572=) | TSC2 | not provided | 16:2115636
     TSC2:c.2742G>A (p.Lys914Lys=) | TSC2 | not provided | 16:2126171
     TSC2:c.4662G>A (p.Gln1554Gln=) | TSC2 | not provided | 16:2135323
     TTC8:c.489G>A (p.Thr163Thr=) | TTC8 | Pathogenic | 14:89307540
     UROD:c.942G>A (p.Glu314Glu=) | UROD | Pathogenic | 1:45480678
     VPS13B:c.1563G>A (p.Lys521Lys=) | VPS13B | Pathogenic | 8:100147961

     I1_D_MINUS
     Variant Description | Gene ID | Pathogenicity | Genome Coordinates
     KCNQ2:c.1119G>A (p.Arg373Arg=) | KCNQ2 | Pathogenic | 20:62065161
     PAX3:c.1449G>A (p.Ala483Ala=) | PAX3 | not provided | 2:223066130
     PTHLH:c.525G>A (p.Arg175Arg=) | PTHLH | not provided | 12:28116280

     I2_D_MINUS
     Variant Description | Gene ID | Pathogenicity | Genome Coordinates
     BRCA1:c.789T>C (p.Gly263=) | BRCA1 | not provided | 17:41246759

     I2_D_PLUS
     Variant Description | Gene ID | Pathogenicity | Genome Coordinates
     EPGN:c.108T>C (p.Gly36=) | EPGN | not provided | 4:75174874
     MLH1:c.303T>G (p.Gly101Gly=) | MLH1 | Likely benign | 3:37042541
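
The location labels used in Tables 1 and 2 (for example Ez_D_PLUS) combine a conserved position, the site type (A or D), and the strand. The sketch below shows one way such labels might be derived from exon coordinates and combined with a check for silent annotations. It is not the UVM tool, whose internals are not described in this presentation. The function names, the made-up exon coordinates, the 1-based inclusive coordinate convention, and the rule that a transcript's first exon has no acceptor site and its last exon has no donor site are all assumptions made for illustration.

```python
import re

# Synonymous ("silent") protein notation as written in Table 2,
# e.g. "(p.Gly233Gly=)" or the shorter "(p.Gly986=)".
SILENT_RE = re.compile(r"\(p\.([A-Z][a-z]{2})\d+(?:\1)?=\)")


def is_silent(description):
    """True if a variant description carries a synonymous protein change."""
    return SILENT_RE.search(description) is not None


def junction_label(pos, exons, strand):
    """Return a label such as 'Ez_D_PLUS' for a conserved position, else None.

    pos    -- 1-based genomic coordinate of the variant
    exons  -- (start, end) pairs, 1-based inclusive, for one transcript
    strand -- '+' or '-'
    """
    suffix = "PLUS" if strand == "+" else "MINUS"
    # Put exons in transcript order: ascending on plus, descending on minus.
    ordered = sorted(exons, reverse=(strand == "-"))
    for i, (start, end) in enumerate(ordered):
        if strand == "+":
            acceptor = {"Iy": start - 2, "Iz": start - 1, "E1": start}
            donor = {"Ez": end, "I1": end + 1, "I2": end + 2}
        else:
            acceptor = {"Iy": end + 2, "Iz": end + 1, "E1": end}
            donor = {"Ez": start, "I1": start - 1, "I2": start - 2}
        # Assumption: the first exon has no acceptor, the last has no donor.
        if i > 0:
            for key, site in acceptor.items():
                if site == pos:
                    return f"{key}_A_{suffix}"
        if i < len(ordered) - 1:
            for key, site in donor.items():
                if site == pos:
                    return f"{key}_D_{suffix}"
    return None


def flag_silent_at_junction(description, pos, exons, strand):
    """Return the junction label only for silent variants at a conserved position."""
    return junction_label(pos, exons, strand) if is_silent(description) else None


# Hypothetical three-exon transcript; these coordinates are made up.
EXAMPLE_EXONS = [(100, 200), (300, 400), (500, 600)]
print(junction_label(200, EXAMPLE_EXONS, "+"))  # Ez_D_PLUS
print(junction_label(400, EXAMPLE_EXONS, "-"))  # E1_A_MINUS
```

In practice the exon list would come from the CCDS coordinate files and the descriptions and positions from the ClinVar variant summary. Because a gene can have several transcripts, a variant that is exonic and silent in one transcript can fall on an intronic position (I1, I2) of another, so each variant would need to be checked against every transcript of its gene.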
