A bioinformatics approach for the prioritization of
disease candidate human mtDNA mutations
BiP-Day 2014 Seconda Giornata della Bioinformatica Pugliese
Bari, 19 December 2014
Mariangela Santorsola
mar. ’15 Mariangela Santorsola
Mitochondrial mutations
1 Mutations spreading in one or more populations and/or haplogroup-associated events
2 Rare mutations occurring in highly conserved sites subject to functional and selective constraints (somatic and/or germline): mitochondrial mutations potentially affecting function
MToolBox (1) functional annotation
Patho Table of all possible non-synonymous mutations
• Nucleotide variability values (SiteVar algorithm (2))
• Six pathogenicity predictions from:
1 MutPred (3)
2 Polyphen-2 (4)
3 SNPs&GO (5)
1 Calabrese et al., 2014
2 Pesole and Saccone, 2001
3 Li et al., 2009
4 Adzhubei et al., 2010
5 Capriotti et al., 2013
Because different methods give discrepant pathogenicity predictions, a single
score is needed to summarize them and classify a mitochondrial
non-synonymous mutation as ‘disease’ or ‘benign’.
Disease Score
The weighted mean of the probabilities of being deleterious provided by the
pathogenicity predictors for each i-th non-synonymous mutation, ranging between 0 and 1:

\[
DS_i = \frac{P^i_{MP} W_{MP} + P^i_{PPD} W_{PPD} + P^i_{PPV} W_{PPV} + P^i_{PT} W_{PT} + P^i_{PS} W_{PS} + P^i_{SG} W_{SG}}{W_{MP} + W_{PPD} + W_{PPV} + W_{PT} + W_{PS} + W_{SG}}
\]

P_MP = MutPred probability; W_MP = MutPred weight
P_PPD = Polyphen-2 HumDiv probability; W_PPD = Polyphen-2 HumDiv weight
P_PPV = Polyphen-2 HumVar probability; W_PPV = Polyphen-2 HumVar weight
P_PT = PANTHER probability; W_PT = PANTHER weight
P_PS = PhD-SNP probability; W_PS = PhD-SNP weight
P_SG = SNPs&GO probability; W_SG = SNPs&GO weight
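The disease score above is a plain weighted mean, so it can be sketched in a few lines. This is a minimal illustration, not the MToolBox implementation: the function name, the dictionary keys (taken from the subscripts MP, PPD, PPV, PT, PS, SG) and the example values are assumptions made here.

```python
from typing import Mapping

# Keys mirror the subscripts used on the slide (an assumed naming choice).
PREDICTORS = ("MP", "PPD", "PPV", "PT", "PS", "SG")


def disease_score(probs: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of the six predictor probabilities for one mutation."""
    total_weight = sum(weights[k] for k in PREDICTORS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(probs[k] * weights[k] for k in PREDICTORS)
    return weighted_sum / total_weight


# Hypothetical probabilities and weights, for illustration only.
example_probs = {"MP": 0.9, "PPD": 0.8, "PPV": 0.7, "PT": 0.6, "PS": 0.5, "SG": 0.4}
example_weights = {k: 1.0 for k in PREDICTORS}
print(disease_score(example_probs, example_weights))
```

With equal weights the score reduces to the arithmetic mean of the six probabilities; since every probability lies in [0, 1] and the weights are non-negative, the score also stays in [0, 1].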
Weight
W = (hp + rp) / (2n)
• hp = number of times the method provides the highest probability
• rp = number of times the method provides the right prediction
(“affecting function or disease”)
• n = number of training mutations
• An ideal method that provides the highest probability n times and the
right prediction n times for n mutations would have weight 1
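The weight definition can be illustrated with a short sketch. The slides do not say how ties for the highest probability are handled or which probability counts as a "right prediction", so two assumptions are made here and marked in the comments: tied methods all get credit, and a prediction is right when the probability exceeds 0.5. The function names and the toy table are hypothetical.

```python
from typing import Mapping, Sequence, Tuple


def method_weight(hp: int, rp: int, n: int) -> float:
    """W = (hp + rp) / (2n), as defined on the slide."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not (0 <= hp <= n and 0 <= rp <= n):
        raise ValueError("hp and rp must lie between 0 and n")
    return (hp + rp) / (2 * n)


def count_hp_rp(
    table: Sequence[Mapping[str, float]], method: str, threshold: float = 0.5
) -> Tuple[int, int]:
    """Count hp and rp for one method over a training set.

    Each row maps method name -> probability for one training mutation,
    all of which are known to affect function.
    ASSUMPTION: ties for the highest probability credit every tied method.
    ASSUMPTION: a prediction is 'right' when probability > threshold (0.5).
    """
    hp = sum(1 for row in table if row[method] == max(row.values()))
    rp = sum(1 for row in table if row[method] > threshold)
    return hp, rp


# Hypothetical two-method, two-mutation training table.
toy_table = [{"A": 0.9, "B": 0.6}, {"A": 0.4, "B": 0.7}]
for name in ("A", "B"):
    hp, rp = count_hp_rp(toy_table, name)
    print(name, method_weight(hp, rp, len(toy_table)))
```

A method that is both the most confident and correct on every training mutation reaches the maximum weight of 1, matching the ideal case described on the slide.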
Disease Score: training dataset
53 non-synonymous mutations previously validated as affecting function
• 28 disease-associated mutations, annotated in Mitomap as ‘confirmed’
pathogenic by at least two independent laboratories (1)
• 25 cancer-associated mutations previously validated (2)
1 http://www.mitomap.org
2 Pereira et al., 2012
[Figure: disease score distributions for the ‘Disease’ and ‘Benign’ groups]
Disease: Min 0.66, Median 0.87, Max 0.92
Benign: Min 0.05, Median 0.13, Max 0.43
Distribution of disease scores for observed non-synonymous mutations
predicted as ‘Benign’ or ‘Disease’ by all six pathogenicity predictors
at the same time.
Disease score cutoff: >= 0.4311
Bimodal distribution of disease scores for 1872 non-synonymous mutations
observed in 15385 mtDNA genomes from healthy individuals stored in
HmtDB (1) (last update May 2014).
The cutoff is the disease score value at which the probability of belonging
to the second (‘disease’) component of the mixture model is ten times greater
than the probability of belonging to the first (‘neutral’) component.
[Figure: distribution of disease scores; x axis: Disease Scores; y axis: Frequency; regions labeled Benign and Disease]
1 Rubino et al., 2012
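The cutoff rule (posterior probability of the ‘disease’ component ten times that of the ‘neutral’ component) can be sketched as follows. The slides do not state the mixture family or the fitting procedure, so this sketch assumes a two-component Gaussian mixture whose parameters are already fitted; the function names and all parameter values are hypothetical and are not expected to reproduce 0.4311.

```python
import math
from typing import Tuple

# A component is (mixing weight, mean, standard deviation).
Component = Tuple[float, float, float]


def log_weighted_density(x: float, comp: Component) -> float:
    """Log of (mixing weight * Gaussian density) at x. ASSUMPTION: Gaussian components."""
    w, mean, sd = comp
    z = (x - mean) / sd
    return math.log(w) - math.log(sd) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z


def log_posterior_ratio(x: float, neutral: Component, disease: Component) -> float:
    """log[ P(disease component | x) / P(neutral component | x) ]."""
    return log_weighted_density(x, disease) - log_weighted_density(x, neutral)


def find_cutoff(neutral: Component, disease: Component,
                ratio: float = 10.0, tol: float = 1e-9) -> float:
    """Bisection between the two means for the score where the ratio is reached.

    Between the means the log ratio increases monotonically, so bisection is
    valid as long as the target is bracketed by the two means.
    """
    lo, hi = neutral[1], disease[1]
    target = math.log(ratio)
    if not (log_posterior_ratio(lo, neutral, disease) < target
            < log_posterior_ratio(hi, neutral, disease)):
        raise ValueError("target ratio is not reached between the component means")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_posterior_ratio(mid, neutral, disease) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical fitted parameters, for illustration only.
neutral_component = (0.6, 0.15, 0.08)
disease_component = (0.4, 0.80, 0.10)
print(find_cutoff(neutral_component, disease_component))
```

Working with log densities avoids division by a vanishing density far from a component mean.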
Mitochondrial mutations potentially affecting function
• Low nucleotide variability value
• High disease score
[Figure: nucleotide variability/disease score correlation for all possible non-synonymous mutations]
Nucleotide variability cutoff: <= 0.0026 (3rd quartile)
The nucleotide variability cutoff below which a mutation may be considered
potentially deleterious was determined as the third quartile of the
distribution of variability values associated with the 816 non-synonymous
events with a disease score above the established disease score cutoff.
[Figure: distribution of nucleotide variability values; x axis: Nucleotide variability; y axis: Frequency]
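The variability cutoff is a third quartile over the mutations that pass the disease score cutoff. A minimal sketch of that step is given here; the function name and input data are hypothetical, and the quartile interpolation method (`method="inclusive"`) is an assumption, since the slides do not say how the quartile was computed.

```python
import statistics
from typing import Sequence

DS_CUTOFF = 0.4311  # disease score cutoff reported on the slides


def variability_cutoff(variabilities: Sequence[float],
                       disease_scores: Sequence[float],
                       ds_cutoff: float = DS_CUTOFF) -> float:
    """Third quartile of variability among mutations with score >= ds_cutoff."""
    selected = [v for v, s in zip(variabilities, disease_scores) if s >= ds_cutoff]
    if len(selected) < 2:
        raise ValueError("need at least two mutations above the disease score cutoff")
    # ASSUMPTION: inclusive (linear) interpolation between order statistics.
    return statistics.quantiles(selected, n=4, method="inclusive")[2]


# Hypothetical values, for illustration only.
toy_variability = [0.0, 0.001, 0.002, 0.003, 0.004, 0.5]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.1]
print(variability_cutoff(toy_variability, toy_scores))
```

The last toy mutation is excluded because its score falls below the disease score cutoff, so only the first five variability values enter the quartile.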
Polymorphic and haplogroup-associated vs rare mutations
MToolBox functional annotation
• Hg_MHCS (Major Haplogroup Consensus Sequences (1))
• rCRS (revised Cambridge Reference Sequence (2))
• RSRS (Reconstructed Sapiens Reference Sequence (3))
[Figure: phylogenetic relationships among virtual Major Haplogroup Consensus Sequences and two real mitochondrial sequences (Phylotree (4)) for each haplogroup]
1 Calabrese et al., 2014
2 Anderson et al., 1981
3 Behar et al., 2012
4 van Oven and Kayser, 2009
Prioritization criteria for mtDNA non-synonymous mutations
potentially affecting function, for future analysis
• Recognized as variants against all three reference sequences
• Occurring in non-haplogroup-defining sites
• Featuring nucleotide variability values <= 0.0026
• Featuring disease score >= 0.4311
• Heteroplasmy level (*)
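The criteria listed above amount to a conjunction of simple tests, which can be sketched as a filter. The record type, its field names and the example variants are hypothetical. The heteroplasmy criterion is left out because the slides give no threshold for it.

```python
from dataclasses import dataclass
from typing import List

NT_VAR_CUTOFF = 0.0026  # nucleotide variability cutoff from the slides
DS_CUTOFF = 0.4311      # disease score cutoff from the slides


@dataclass
class Variant:
    """Hypothetical record; field names are illustrative assumptions."""
    name: str
    n_references_recognized: int  # references (of Hg_MHCS, rCRS, RSRS) against which it is a variant
    haplogroup_defining: bool
    nt_variability: float
    disease_score: float


def is_prioritized(v: Variant) -> bool:
    """Apply the four criteria that have explicit thresholds on the slides."""
    return (
        v.n_references_recognized == 3
        and not v.haplogroup_defining
        and v.nt_variability <= NT_VAR_CUTOFF
        and v.disease_score >= DS_CUTOFF
    )


def prioritize(variants: List[Variant]) -> List[Variant]:
    """Keep prioritized variants, highest disease score first."""
    kept = [v for v in variants if is_prioritized(v)]
    return sorted(kept, key=lambda v: v.disease_score, reverse=True)


# Hypothetical variants, for illustration only.
toy_variants = [
    Variant("v1", 3, False, 0.0003, 0.88),
    Variant("v2", 3, True, 0.0003, 0.88),   # haplogroup-defining site
    Variant("v3", 2, False, 0.0003, 0.88),  # not a variant against all references
    Variant("v4", 3, False, 0.0100, 0.88),  # site too variable
    Variant("v5", 3, False, 0.0003, 0.20),  # disease score too low
    Variant("v6", 3, False, 0.0026, 0.4311),
]
print([v.name for v in prioritize(toy_variants)])
```

Both cutoffs are inclusive, as written on the slide, so a variant sitting exactly on either threshold is kept.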
Application of MToolBox and prioritization criteria
21 ovarian tumor pre-chemotherapy mtDNA samples
Tumor-specific nature checked by sequencing mtDNA from blood
of the same individuals
• 77.78% of prioritized variants were tumor-specific

Sample  Variant allele  HF    Locus    AA change  Nt variability  Disease score  Tumor-specific/Germline
EOC5    3380A           0.75  MT-ND1   R25Q       0.0003          0.8764         tumor-specific
EOC40   14969C          0.50  MT-CYB   Y75H       0.0003          0.8526         tumor-specific
EOC16   9837A           0.45  MT-CO3   G211S      0.0000          0.8379         tumor-specific
EOC20   15255C          0.80  MT-CYB   V170A      0.0000          0.8195         tumor-specific
EOC20   10696T          0.75  MT-ND4L  A76V       0.0000          0.7810         tumor-specific
EOC14   6121C           0.45  MT-CO1   I73T       0.0007          0.7054         tumor-specific
EOC5    8412C           1.00  MT-ATP8  M16T       0.0023          0.6587         germline
EOC32   14249A          1.00  MT-ND6   A142V      0.0020          0.4498         germline

List of the 8/268 non-synonymous mutations prioritized as potentially affecting function
6/21 mutated samples (33%)
• All synonymous mutations occurring at sites with variability
values below the variability cutoff proved to be germline
Acknowledgements
Department of Medical and Surgical Sciences
University of Bologna
Giuseppe Gasparre
Claudia Calabrese
Rosanna Clima
Giulia Girolimetti
Department of Biosciences, Biotechnologies and Biopharmaceutics,
University of Bari
Prof Marcella Attimonelli
Saverio Vicario
Domenico Simone
Maria Angela Diroma