Ratio of harmful amino acid substitutions: Case studies of BTK and MMR system - Mauno Vihinen

Numerous disease-causing and neutral variations have been identified in several genes and proteins. To gain insight into the total number of harmful variations, we performed protein-wide predictions for Bruton tyrosine kinase (BTK), which is involved in primary immunodeficiency, and for mismatch repair (MMR) system proteins, which are involved in gastric cancers. Mutation rates have been measured and estimated for certain proteins, but they are not very useful in this respect because the rates are known to vary 100-fold depending on the gene.

We developed dedicated predictors both for BTK, called PON-BTK (1), and for four MMR proteins, called PON-MMR2 (2). The methods were carefully evaluated before being used for the analyses, and both perform very well according to several measures. In the BTK kinase domain, we found that altogether two thirds of all possible single-nucleotide substitution-caused amino acid variations (SNAVs) are harmful. For the MMR proteins, where all possible substitutions (19 per position) were investigated, the ratios are somewhat smaller than for BTK and differ between individual proteins. Predictions with PON-P2 (3), a generic pathogenicity predictor, indicate very different ratios of harmful variants in different proteins. PON-P2 is well suited to this kind of analysis because of its high performance and its ability to distinguish harmful variants. We have also applied this approach to all 22 mitochondrial tRNA molecules by using our novel PON-mt-tRNA predictor (4). Several mechanisms lie behind the harmful variants. When developing a new predictor for variants affecting protein solubility (PON-Sol, 5), we tested the method on interleukin 1-β and found structure context dependent differences in the distributions of solubility-increasing and solubility-decreasing variants.
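
The difference between SNAVs and the full set of 19 substitutions per position can be illustrated with a small sketch. It assumes the standard genetic code; the function name `snavs` and the example codons are illustrative only and are not taken from PON-BTK or PON-MMR2.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varies fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snavs(codon):
    """Return the amino acids reachable from `codon` by changing exactly
    one nucleotide, excluding synonymous changes and stop codons."""
    original = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutated = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutated]
            if aa != original and aa != "*":
                reachable.add(aa)
    return reachable


if __name__ == "__main__":
    # Hypothetical example codons, not taken from the BTK sequence.
    for codon in ("ATG", "TGG", "CGA"):
        subs = snavs(codon)
        print(codon, CODON_TABLE[codon], len(subs), "of 19:", "".join(sorted(subs)))
```

Each codon reaches only a subset of the 19 alternative residues through a single nucleotide change, which is why a ratio computed over SNAVs is not directly comparable with one computed over all substitutions.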

Knowledge about harmful variants in genes and proteins allows deeper insight into diseases and, when combined with other information such as protein structures, helps to explain the mechanisms by which variants cause disease.

All the tools mentioned are freely available at http://structure.bmc.lu.se/

References
1. Väliaho, J., Faisal, I., Ortutay, C., Smith, C. I. E. and Vihinen, M. (2015) Characterization of all possible single-nucleotide change caused amino acid substitutions in the kinase domain of Bruton tyrosine kinase. Hum. Mutat. 36, 638-647.
2. Niroula, A. and Vihinen, M. (2015) Classification of amino acid substitutions in mismatch repair proteins using PON-MMR2. Hum. Mutat.
3. Niroula, A., Urolagin, S. and Vihinen, M. (2015) PON-P2: Prediction method for fast and reliable identification of harmful variants. PLoS ONE 10(2):e0117380.
4. Niroula, A. and Vihinen, M. (2016) PON-mt-tRNA: a multifactorial probability-based method for classification of mitochondrial tRNA variations. Nucleic Acids Res. 44, 2020-2027.
5. Yang, Y., Niroula, A., Shen, B., and Vihinen, M. PON-Sol: prediction of effects of variants on protein solubility. Bioinf. (in press).

Published in: Science

  1. Ratio of harmful amino acid substitutions. Mauno Vihinen, Protein Structure and Bioinformatics Group, Department of Experimental Medical Science, Lund University, Sweden
  2. Focus on variation: Data collection; Method performance assessment; Systematics; Prediction method development; Predictors; Mechanisms
  3. How many variations are disease related? Current databases don't provide the answer; Depends on gene/protein/domain/region; Bruton tyrosine kinase (BTK) in X-linked agammaglobulinemia
  4. Amino acid substitutions
     Residue groups: Hydrophilic, Hydrophobic, Acidic, Basic, Polar, Special
     Columns (->): A F I L M V W Y D E H K R N Q S T C G P X total
     [Blank cells are missing from the source, so row values are listed in order and are not aligned to columns.]
     A: 0 1 0.6 0.4 0 0 0 0.8 0 (total 2.8)
     F: 0 0 0.4 0.1 0.3 1 0.1 0.1 (total 2.1)
     I: 0.1 0 0 0.1 0 0 0 1 0 0.1 0.6 0 (total 2)
     L: 1.3 0.3 0 0 0.3 0.1 0 0.6 0.1 0.6 3.8 0.6 (total 7.6)
     M: 1.1 0.1 0 0.4 0.4 0.1 2 0 (total 4.2)
     V: 0.4 0.7 0 0 0 0.1 0.4 0.1 0.3 0 0.1 (total 2.2)
     W: 0.1 0.8 0.3 0.3 0 4.1 (total 5.6)
     Y: 0 0 0.7 0.6 0.6 0.8 1.3 4.8 (total 8.7)
     D: 0 0.4 0.1 0 0.1 0.1 0.3 0.3 0 (total 1.4)
     E: 0 0 0.7 0 0.3 0 0.4 1.8 (total 3.2)
     H: 0 0.1 0.1 0 0.4 0 0.1 0.3 0 (total 1.1)
     K: 0.1 0 1.1 0 0.4 0.3 0 0 1.4 (total 3.4)
     R: 0 0.3 0 6.3 4.1 0.6 0.1 4.5 1 0.3 2.5 2 1.1 8.3 (total 31)
     N: 0 0.1 0 0 0.1 0 0 0 0.1 (total 0.4)
     Q: 0 0 0 0.3 0 0.1 0 0.1 6.2 (total 6.7)
     S: 0 0.7 0.1 0.1 0 0.4 0.1 0 0 0 0 0 0.7 1 (total 3.2)
     T: 0.1 0.4 0 0 0 0 0 0 1.1 0 (total 1.7)
     C: 0.7 0.3 1.7 0.4 0.3 0 0.4 0.7 (total 4.5)
     G: 0.1 0.1 0.1 1.1 2.1 1.5 0 0 0.1 0.3 (total 5.6)
     P: 0.3 0.6 0 0 0.3 0 0.7 0.7 0 0 (total 2.5)
     Column totals: A 1, F 3.5, I 2.1, L 1.7, M 0.1, V 2.5, W 6.9, Y 2.8, D 3.6, E 3.9, H 5, K 1.4, R 5, N 2.1, Q 4.8, S 4.8, T 3.5, C 4.2, G 3.5, P 8, X 29.5, total 100
  5. Approach: Predictions; Methods of highest possible performance; Machine learning methods trained with experimentally verified cases, careful selection, representativeness; Benchmarking
  6. PON-BTK: More than 500 kinases, BTK contains the largest number of disease-causing variants among them; Dedicated predictor for kinase domain of Bruton tyrosine kinase; SNAVs, single-nucleotide substitution caused amino acid variations; Analysis of all amino acid substitutions caused by single nucleotide change; 67% of variations harmful (Väliaho et al. Hum. Mutat. 2015)
  7. Experimental studies
  8. PON-MMR: Dedicated predictor for a protein complex, mismatch repair (MMR) system; Lynch syndrome and other gastrointestinal cancers; Collection of data from literature, InSiGHT database; 785 amino acid substitutions in InSiGHT database in 5 proteins; Clear functional information for disease relevance only for 168; InSiGHT variation interpretation committee report (Thompson et al. Nat. Genet. 2014) classified 1370 variants out of 2360 investigated; 46 of those we had predicted to be either benign or pathogenic; 44/46 were correct (96%) (Ali et al. Hum. Mutat. 2012)
  9. PON-MMR, structural verification: Structure of MSH2-MSH6 dimer; Structural explanations match for 105 out of 109 variants
  10. PON-MMR2: Novel predictor for MLH1, MSH2, MSH6, PMS2; Trained with InSiGHT and PON-MMR data; Extensive feature selection and method training; 5 useful features from altogether 624; Accuracy and MCC: 0.84 and 0.70 in cross-validation, 0.85 and 0.67 for an independent test dataset (Niroula and Vihinen, Hum. Mutat. 2015)
  11. Ratios of harmful variants
  12. Mitochondrial tRNA: Human cells have two sets of tRNAs, nuclear and mitochondrial; Almost all pathogenic variations in tRNAs are found in mitochondrial tRNAs; Evidence-based methods have been developed to classify mitochondrial tRNA variations; Conservation, biochemical test, histochemical test, heteroplasmy, segregation, single fibre, trans-mitochondrial cybrid, etc.; About 200 variants were classified by Yarham et al. (Niroula and Vihinen, NAR 2016)
  13. PON-mt-tRNA: Classification of all possible single nucleotide substitutions in all 22 human mt-tRNAs; Pathogenic variations are concentrated in the stems; Anticodon loop has the highest frequency of pathogenic variations among loops; 51.0% of all variants predicted to be harmful; 42.0% of all variants in yeast arginine tRNACCU (Li et al. Science 2016)
  14. PON-P2: Classifies amino acid substitutions into three classes: benign, pathogenic, unclassified variants (UVs); Machine learning-based method (random forest); Features: physical and biochemical properties of amino acids, Gene Ontology annotations, evolutionary features, functional annotation (Niroula et al. PLoS ONE 2015)
  15. Variants in cancer: Somatic variations in 7,042 cancer genomes/exomes, Alexandrov et al. (Nature 2013); 30 cancer types; ~5 million variants; 2.63 million in coding region; 824,001 amino acid substitutions; Predicted the impact of AASs using PON-P2; 14.24% of all variations were predicted harmful; 14.71% of COSMIC variation (total 647,872); 39.88% variations in Cancer Gene Census (CGC) genes (Niroula and Vihinen, BMC Med Genet 2015)
  16. Summary: Estimates for the ratio of harmful variants: BTK kinase domain, SNAVs; MMR system proteins, all variants; Mitochondrial tRNAs; Cancer (30 cancer types, COSMIC, Cancer Gene Census proteins). Highly variable ratios between proteins and domains, and within proteins
  17. Thanks! Abhishek Niroula, Gabriel Teku, Gerard Schaafsma, Jelena Calyseva, Siddhaling Urolagin, Yang Yang, Heidi Ali, Preethy Nair, Jouni Väliaho. http://structure.bmc.lu.se mauno.vihinen@med.lu.se
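
Slide 10 reports predictor performance as accuracy and Matthews correlation coefficient (MCC). As a minimal sketch of how these two measures are computed from a binary confusion matrix, the following uses the standard textbook definitions. The counts in the example are hypothetical and are not the PON-MMR2 evaluation data.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when the denominator is zero, a common convention
    for the case where a whole row or column of the matrix is empty.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts: pathogenic is the positive class.
    tp, tn, fp, fn = 40, 45, 5, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")
```

MCC uses all four cells of the confusion matrix, so it stays informative when the benign and pathogenic classes differ in size, which is one reason to report it alongside accuracy.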