
Identification of pathological mutations from the single-gene case to exome projects: lessons from the Fabry disease (Xavier de la Cruz)


*Watch the video at the end of the presentation
Seminar led by Dr. Xavier de la Cruz, ICREA Research Professor and head of the Translational Bioinformatics in Neuroscience group of VHIR, held at VHIR on 22nd November 2012.

Content: The need to identify the pathological character of mutations may arise in different contexts in biomedical research. However, the methods available to address this problem essentially depend on the number of cases under analysis. When we work with only a few mutations we can use an artisan-like approach, where all information available on protein sequence, structure and function is manually retrieved and studied. However, when we need to characterize many variants, as can be the case in exome projects, faster methods are required to assess their pathogenicity. In my talk I will illustrate the principles underlying these two approaches with examples from the study of Fabry disease mutations, resulting from our collaborative work at the VHIR.



  1. Identification of pathological mutations from the single-gene case to exome projects: lessons from the Fabry disease. Xavier de la Cruz
  2. Identification of pathological…
     Interpretation context of mutation data
     Identifying pathological mutations:
     ◦ Present tools
     ◦ Problems?
     A VHIR-based tool for mutation scoring
     ◦ Development
     ◦ Performance
     Future directions
     ◦ Implementing a standardized mutation report
  3. From base pairs to bedside (Green & Guyer, Nature, 2011)
     Figure: five research domains (understanding genome structure, understanding genome biology, understanding disease biology, advancing medical science, improving healthcare effectiveness) across the periods 1990-2003 (Genome Project), 2004-2010, 2011-2020 and beyond 2020.
  4. So, is there a problem?
     Figure labels: 2017; $517000; $285000; INTERPRETATION COST; <$100
  5. From base pairs to …
     Sample; exome sequencing; variant identification and quality control; INTERPRETATION
  6. The interpretation problem
     "…enormous amounts of raw data, but still very little understanding of what it means"
     Exome sequencing context:
     ◦ Identify disease-causative variants
     ◦ Prioritize variants
     ◦ Speed: can we do this for 100s-1000s of variants in "reasonable" time?
     ◦ Reliability: can we provide good error models for counseling/diagnosis/prognosis?
  7. Exome-ready mutation annotation tools
     PolyPhen (Adzhubei et al., 2010), SIFT (Kumar et al., 2009):
     ◦ Mutation mining
       - pathological: databases + literature + private datasets
       - neutral: experimental sets (LacI, lysozyme, etc.), evolutionary model, databases (dbSNP)
     ◦ Building model: machine learning
  8. PERFORMANCE
  9. CONDEL (González-Pérez & López-Bigas, 2011) and CAROL (Lopes et al., 2012)
     ROC area:
     • CONDEL: 0.849
     • CAROL: 0.852
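The ROC areas quoted for CONDEL and CAROL can be read as the probability that a randomly chosen pathological variant receives a higher score than a randomly chosen neutral one. The following is a minimal sketch of that rank-based definition; the scores are hypothetical and the function name `roc_auc` is an illustration, not part of either tool.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example scores higher; ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predictor scores (higher = more likely pathological).
pathological = [0.9, 0.8, 0.4]
neutral = [0.7, 0.3, 0.2]
auc = roc_auc(pathological, neutral)
print(f"ROC area: {auc:.3f}")
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the values reported on the slide.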
  10. Limits of present annotation tools
     Consensus tools:
     ◦ Understanding of molecular damage is harder
     ◦ They depend on the existence of primary tools (PolyPhen, SIFT)
     Primary (PolyPhen, SIFT):
     ◦ Average over many mutations and genes
  11. Type III Hereditary Hemochromatosis: TFR2
     • TFR2, a dimeric type II transmembrane protein expressed mostly in the liver and CD71+ early erythroids.
     • At least 50 families and 69 patients have been described with mutations in the TFR2 gene.
  12. Fabry disease
     Systemic disorder characterized by: progressive renal failure, cardiovascular or cerebrovascular disease, etc.
     Caused by mutations in the lysosomal enzyme α-galactosidase A
  13. CYS52
  14. PRESENT PREDICTORS
     GENE 1: PATHOL. MUT. 1
     GENE 2: PATHOL. MUT. 2
     GENE 3: PATHOL. MUT. 3
     ………: ………
     All genes feed a single MUTATION PREDICTOR
  15. GENE-SPECIFIC PREDICTORS
     GENE 1: PATHOL. MUT. 1: MUTATION PREDICTOR 1
     GENE 2: PATHOL. MUT. 2: MUTATION PREDICTOR 2
     GENE 3: PATHOL. MUT. 3: MUTATION PREDICTOR 3
     ………: ………: MUTATION PREDICTOR …
  16. Improving mutation annotation tools
     Train on single genes (Ferrer-Costa et al., 2004): success rate increases by 5%-10%
  17. METHODS
  18. MUTATION: property 1, property 2, …, property i, property j, …, property N; output: PATHOLOGICAL / NEUTRAL
  19. DATAMINE pathological mutations; CHARACTERIZE PROTEIN DAMAGE; BUILD COMPUTATIONAL MODEL
     Application:
     • score
     • prioritize
     • Experimental study
     • Counseling
     • Etc.
  20. Datamine Pathological Mutations
     General databases: UniProt, OMIM
     Specific databases:
     ◦ Fabry database (http://fabry-database.org/)
     ◦ p53 database (http://p53.free.fr/Database p53_database.html)
     Literature
     Institution mutation collections
  21. Protein damage
     Figure labels: …KKRHCSGWL… (Y); protein stability; functional interactions; unspecific cellular interactions
  22. Conceptual context: impact of mutations on protein structure/function
     Empirical rules from site-directed mutagenesis ('80s, '90s):
     ◦ break disulphide bridges, burial of charged residues, hydrogen bond loss, disturb protein-protein interface, etc.
     ◦ protein structure destabilization is associated with loss of function
     Evolutionary conservation is linked to biological function
  23. Mutation properties
     Sequence-based: V, , Blosum62 elements, etc.
     Structure-based: relate to mutation location: accessibility, contact number and type, etc.
     Evolutionary-based properties:
     ◦ wild-type (wt) conservation degree
     ◦ mutant rarity
     ◦ sequence variability at the mutation locus (entropy)
  24. Multiple alignments
     Low similarity, only two sequences:
       AVTTGLNMWTTAKRPGMDDFYTILLPGLMNCI
       GLFTAIDMHFFGRKPACEEYFTLVVDGLCNCI
     Low similarity, multiple sequences:
       GIFTDIDMHFYVKKPGLDEFFTLVLRTLCMAA
       ALTTGIDMWTTAKRPDMDDYYTIIIPGLMNCI
       AVTTGLNMWTTAKRPGMDDFYTILLPGLMNCI
       GVTTGLNMYFTARRPGLDEFYTLVLRTLCMCL
       GIFTDIDMHFYVKKPGLDEFFTLVLRTLCMAA
       AVTTGLNMWTTAKRPGMDDFYTILLPGLMNCI
       GLFTALNMHFFGRKPACEEYFTLVVDGLCNCI
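Sequence variability at a mutation locus is commonly measured as the Shannon entropy of the corresponding alignment column. The sketch below applies that idea to the distinct sequences shown on the slide; treating them as an ungapped alignment, dropping the repeated sequences, and using log base 2 are assumptions made for illustration, not details confirmed by the talk.

```python
import math
from collections import Counter

# Distinct sequences from the "multiple sequences" example (repeats dropped).
alignment = [
    "GIFTDIDMHFYVKKPGLDEFFTLVLRTLCMAA",
    "ALTTGIDMWTTAKRPDMDDYYTIIIPGLMNCI",
    "AVTTGLNMWTTAKRPGMDDFYTILLPGLMNCI",
    "GVTTGLNMYFTARRPGLDEFYTLVLRTLCMCL",
    "GLFTALNMHFFGRKPACEEYFTLVVDGLCNCI",
]


def column_entropy(seqs, col):
    """Shannon entropy (bits) of the residues found in one alignment column."""
    counts = Counter(s[col] for s in seqs)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


entropies = [column_entropy(alignment, i) for i in range(len(alignment[0]))]
for position, value in enumerate(entropies, start=1):
    print(position, round(value, 3))
```

Columns with entropy 0 are fully conserved in this small sample; higher values indicate positions that tolerate more residue types.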
  25. MSA: the technical side
     For very divergent proteins, good MSAs are very hard to obtain
     Protocol to build alignments:
     ◦ Recover family members with PsiBlast (E-value: 0.001; seq. id. > 40%) from UniRef100
     ◦ Align with MUSCLE
     Conservation may be misleading:
     ◦ protein function is highly relevant for living beings, e.g. histones. OK!
     ◦ database bias, e.g. only Hominidae sequences are available. PROBLEM?!
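The homolog selection step of this protocol amounts to filtering search hits by E-value and sequence identity before alignment. Here is a minimal sketch of that filter, assuming the hits have already been parsed into records; the `Hit` type, the identifiers, and the choice of inclusive versus strict comparisons are all hypothetical, and no PsiBlast or MUSCLE call is made.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One parsed search hit (hypothetical record layout)."""
    seq_id: str
    evalue: float
    identity: float  # percent identity to the query


def select_family(hits: List[Hit], max_evalue: float = 0.001,
                  min_identity: float = 40.0) -> List[Hit]:
    """Keep hits with E-value <= max_evalue and identity > min_identity."""
    return [h for h in hits
            if h.evalue <= max_evalue and h.identity > min_identity]


# Hypothetical hits, for illustration only.
hits = [
    Hit("seqA", 1e-50, 85.0),   # kept
    Hit("seqB", 1e-10, 35.0),   # rejected: identity too low
    Hit("seqC", 0.5, 60.0),     # rejected: E-value too high
    Hit("seqD", 1e-4, 41.0),    # kept
]
family = select_family(hits)
print([h.seq_id for h in family])
```

The selected sequences would then be passed to the aligner; that step is omitted here.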
  26. Our predictor
     7 properties: sequence-based (V, , Blosum62), structure (relative accessibility), MSA-based (entropy, pssm(wt), pssm(mt))
     Neural networks (Weka package)
     ◦ Multilayer perceptron (1 hidden layer, 4 units)
     ◦ No hidden layer
     Training: 2-fold cross-validation scheme (25 replicas to …
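A repeated 2-fold cross-validation scheme splits the dataset in half, trains on one half and tests on the other, swaps the roles, and repeats with a fresh random split for each replica. The sketch below generates such splits in plain Python; it is not the Weka procedure used in the talk, and the stratification by class, the seed, and the function name are assumptions added for illustration.

```python
import random


def repeated_two_fold(labels, replicas=25, seed=0):
    """Yield (train_indices, test_indices) pairs for repeated 2-fold CV.

    Each replica shuffles every class separately (stratification is an
    assumption here), cuts it in half, and yields both train/test roles.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for _ in range(replicas):
        half_a, half_b = [], []
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            mid = len(shuffled) // 2
            half_a.extend(shuffled[:mid])
            half_b.extend(shuffled[mid:])
        yield half_a, half_b
        yield half_b, half_a


# Class sizes taken from the Fabry dataset quoted in the talk.
labels = ["pathological"] * 313 + ["neutral"] * 59
splits = list(repeated_two_fold(labels))
print(len(splits), "train/test splits")
```

Averaging a performance measure over all splits gives an estimate that depends less on any single random partition.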
  27. Performance measure: Matthews correlation coefficient
     $\mathrm{MCC} = \dfrac{tp \cdot tn - fp \cdot fn}{\sqrt{(tp+fp)(tp+fn)(tn+fp)(tn+fn)}}$
     $-1 \le \mathrm{MCC} \le 1$
     ◦ 0: predictive power similar to random
     ◦ 1: perfect prediction power
     ◦ -1: bad prediction, small samples, the problem cannot be solved?
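The MCC formula translates directly into a few lines of code. This is a minimal sketch taking the four confusion-matrix counts as inputs; returning 0 when the denominator vanishes is a common convention adopted here as an assumption, since the slide does not say how that case is handled.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an empty row or column gives MCC = 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


print(mcc(10, 10, 0, 0))   # all predictions correct
print(mcc(5, 5, 5, 5))     # no better than random
print(mcc(0, 0, 10, 10))   # all predictions wrong
```

The three calls illustrate the interpretive anchors listed on the slide: 1, 0 and -1.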
  28. RESULTS
  29. Pathogenicity prediction in Fabry disease
     Mutation dataset: 313 pathological and 59 neutral mutations
     Discriminant power of parameters
     Performance
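With 313 pathological and 59 neutral mutations the dataset is strongly imbalanced, so a trivial predictor that labels every mutation as pathological already reaches a high success rate. The short calculation below works out that baseline; the trivial predictor is a hypothetical reference point introduced here, not a method from the talk.

```python
n_pathological = 313
n_neutral = 59
total = n_pathological + n_neutral

# A predictor that always answers "pathological" is right on every
# pathological mutation and wrong on every neutral one.
baseline_success_rate = n_pathological / total

print(f"Dataset size: {total}")
print(f"Always-pathological success rate: {baseline_success_rate:.3f}")
```

Any reported success rate is therefore best read against this baseline, which is one reason to report a class-balance-aware measure alongside it.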
  30. Amino acid volume
  31. Residue conservation
  32. Performance
  33. ROC curves
  34. Performance (Success rate)
  35. Performance (MCC)
  36. GENE-SPECIFIC PREDICTORS
     α-galactosidase: PREDICTOR
     MYH7: PREDICTOR
     ………: PREDICTOR
  37. MYH7 (Beta-cardiac myosin heavy chain)
     Large structural protein (1390 aa)
     Mutations cause familial hypertrophic cardiomyopathy 1
     Mutation dataset obtained from UniProt, OMIM and CardioGenomics
     ◦ 74 disease-causing mutations
     ◦ 45 neutral mutations (MSA)
  38. Performance (Success rate)
  39. Performance (MCC)
  40. Gene-specific performances
     Qtot = (tp+tn)/(tp+tn+fp+fn)
     Sensitivity = tp/(tp+fn)
     Specificity = tn/(tn+fp)
     Figure labels: (neutral), (pathological)
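The three measures defined on this slide can be computed together from the confusion-matrix counts. The following is a minimal sketch; the counts in the example are hypothetical and do not correspond to any result from the talk.

```python
def performance(tp, tn, fp, fn):
    """Return Qtot, sensitivity and specificity from confusion-matrix counts."""
    return {
        "qtot": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts, for illustration only.
result = performance(tp=8, tn=6, fp=2, fn=4)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity separately shows whether a predictor's overall success rate comes from one class at the expense of the other.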
  41. Future directions: pathogenicity prediction/analysis
     Extend to more genes:
     ◦ Enough mutation data
     ◦ Not enough mutation data
     Can we predict other disease phenotypes?
     ◦ First tests suggest a similar approach could work for severity
  42. Summary
     There is room for improvement in mutation annotation tools
     We are developing a new, gene-based tool that improves present methods
     Our method will work for large-scale scoring projects (exome) and for single-mutation analyses
  43. WORKING TOGETHER
  44. Towards a unique mutation damage report
     Standardize the description/reporting of mutation impact:
     ◦ Sequence-level
     ◦ Structure-level
     ◦ MSA
     ◦ Miscellaneous information
     Community effort
  45. TRANSLATIONAL BIOINFORMATICS IN NEUROSCIENCES GROUP
     • Neurovascular Disease, Neurosciences: Joan Montaner, Israel Fernández-Cadenas
     • Nanomed. lysos. storage diseas., CIBBIM, Nanomedicine: M. Carmen Domínguez
     • Immunology, Respiratory & Systemic Diseases: Ricardo Pujol, Mónica Martínez, Roger Colobran
     • Neuromusc. & Mitoch. Pathol., Neurosciences: Elena García, Tomás Pinós
     • Cancer and Iron group, IMPPC: Mayka Sánchez, Ricky Joshi
     • Biomedicine & Translat. & Pediatrics Oncol., Oncology: Jaume Reventos, Eva Colás, Andreas Doll, Marina Rigau, Marta García
