PhDc exam presentation

These are the slides I used for my PhD candidature exam.

Published in: Education, Technology

  1. 1. Functional Characterisation of Metabolic Networks. Carlos Manuel Estévez-Bretón, MSc. Doctorate in Systems Engineering and Computer Sciences. Advisors: Luis Fernando Niño, PhD; Liliana Lopez Kleine, PhD. Intelligent Systems Research Laboratory - LISI. Bioinformatics and Computational Biology research line “BioLisi”. Examining Committee: Dr. Jason Papin, U. of Virginia, Bioengineering. Dr. Andres Gonzalez, U. de los Andes, Chemical Engineering. Dr. Fabio Gonzalez, U. Nacional, Systems Engineering.
  2. 2. Agenda. What... Why... Research Question. How... (Goals, Evaluation, Deliverables). Progress ...
  3. 3. What? http://www.impactcommunicationsinc.com/wp-content/uploads/2011/10/11-11_speak_up.jpg
  4. 4. Metabolism is the complete set of metabolic networks and physical processes that determine the physiological and biochemical properties of a cell. With the sequencing of complete genomes, it is now possible to reconstruct the network of biochemical reactions in many organisms, from bacteria to humans...
  5. 5. Introduction. Systems Biology. Ecological Scale. Lucas B. Edelman, James A. Eddy, and Nathan D. Price. Wiley Interdiscip Rev Syst Biol Med. 2010 Jul-Aug; 2(4): 438–459. doi: 10.1002/wsbm.75. PMC 2011 August 17.
  9. 9. Introduction. Systems Biology. Multilevel field. Ecological Scale. Lucas B. Edelman, James A. Eddy, and Nathan D. Price. Wiley Interdiscip Rev Syst Biol Med. 2010 Jul-Aug; 2(4): 438–459. doi: 10.1002/wsbm.75. PMC 2011 August 17.
  10. 10. Introduction. Systems Biology. Multilevel field. Studied. Interdisciplinary. Ecological Scale. Lucas B. Edelman, James A. Eddy, and Nathan D. Price. Wiley Interdiscip Rev Syst Biol Med. 2010 Jul-Aug; 2(4): 438–459. doi: 10.1002/wsbm.75. PMC 2011 August 17.
  11. 11. Introduction. Better and cheaper processing power.
  12. 12. Introduction. Multilevel Information. Better and cheaper processing power.
  13. 13. Introduction. Regulatory Networks. Protein-Protein Interaction Networks. Metabolic Networks. Ecological Networks.
  14. 14. Introduction. Main Data Sources. Regulatory Networks. Protein-Protein Interaction Networks. Metabolic Networks. Ecological Networks.
  15. 15. “Techniques such as high-throughput (HT) sequencing and gene/protein profiling have transformed biological research” (Khatri et al., 2012). “In this way, the advent of HT profiling technologies presents a new challenge, that of extracting meaning from a long list of differentially expressed genes and proteins” (Khatri et al., 2012).
  16. 16. “Techniques such as high-throughput (HT) sequencing and gene/protein profiling have transformed biological research” (Khatri et al., 2012). “In this way, the advent of HT profiling technologies presents a new challenge, that of extracting meaning from a long list of differentially expressed genes and proteins” (Khatri et al., 2012). These biological techniques change the way we study biological science. Interdisciplinary effort to extract meaning, analyze, and obtain information with high levels of confidence and quality.
  17. 17. Intelligent Systems. Page excerpt from Jensen & Bateman (2011), Bioinformatics, pp. 3331–3332. Fig. 1. The growth of supervised machine learning methods in PubMed. Fig. 2. Heatmap showing the co-occurrence of machine learning techniques within articles. “The results show that none of the major league methods has gone out of fashion, but we do see moderate decreases in the use of both ANNs and Markov models in the literature.” “Hot techniques”: ANN, Markov Models, and “new ones” SVM and Random Forests (Jensen & Bateman, 2011). Latent Topic Analysis is not in the list of methods.
  18. 18. “In particular, supervised machine learning has been used to great effect in numerous bioinformatics prediction methods” (Jensen & Bateman, 2011). Machine learning is of immense importance in bioinformatics and more generally for biomedical sciences (Larrañaga et al., 2006; Tarca et al., 2007). Because it is not common in metabolic systems analysis, I think it is important to emphasise that:
  19. 19. Intelligent Systems. There are no references in the literature for the analysis of metabolic pathways from a functional approach, or using the proposed machine learning methods.
  20. 20. Machine Learning. Larrañaga et al.
  21. 21. Machine Learning. Larrañaga et al. Bayesian classifiers, Feature subset selection; SVM, ANN, classification trees, Evolutionary algorithms; tabu search; nearest neighbour, SVM, Bayesian classifier, fuzzy k-NN; Bayesian generalization of the SVM, ANN, linear discriminant analysis, classification trees, ANN; SVM and HMM, linear discriminant analysis, quadratic discriminant analysis, k-NN classifier, bagging and boosting classification trees, SVM and random forest.
  22. 22. Machine Learning. Larrañaga et al. Bayesian classifiers, Feature subset selection; SVM, ANN, classification trees, Evolutionary algorithms; tabu search; nearest neighbour, SVM, Bayesian classifier, fuzzy k-NN; Bayesian generalization of the SVM, ANN, linear discriminant analysis, classification trees, ANN; probabilistic graphical models, classification trees, boosting with classification trees; SVM and HMM, linear discriminant analysis, quadratic discriminant analysis, k-NN classifier, bagging and boosting classification trees, SVM and random forest.
  23. 23. Why? http://www.perftrends.com/images/why.jpg
  24. 24. ... or Methods are not applied to Metabolic Pathways... ...or are based on Topological (Graph Based) network representations.
  25. 25. Statement. • It should be possible to make some advances in understanding the underlying functional conformation of metabolic pathways. http://www.scriptmag.com/wp-content/uploads/BrainStorm-NewColor-12-22_32-1280x980at86.jpg
  26. 26. Statement. • Supervised Clustering - useful to test the given representation - by classifying the biochemical reactions. http://www.scriptmag.com/wp-content/uploads/BrainStorm-NewColor-12-22_32-1280x980at86.jpg http://www.ee.ryerson.ca/~courses/ele888/ele_888_pat_class.gif
  27. 27. Statement. http://diversity-mining-lab.wikispaces.com/
  28. 28. Statement. • Information Retrieval algebraic models, like vector space based ones, should “reveal” topics that occur in document collections. • Is it possible to generate new - “really new” - pathways? • ...I’m talking about synthetic biology. http://diversity-mining-lab.wikispaces.com/
  29. 29. Research Question. Is it possible to classify metabolic networks using only functional features?
  30. 30. How? http://www.wired.com/images_blogs/threatlevel/2012/10/harris002.jpg
  31. 31. Goals. • To classify metabolic pathways functionally (without considering the topological structure), based on machine learning methods.
  32. 32. Goals. • To classify metabolic pathways functionally (without considering the topological structure), based on machine learning methods. • To build or adapt a system of functional representation for metabolic networks.
  33. 33. Goals. • To classify metabolic pathways functionally (without considering the topological structure), based on machine learning methods. • To build or adapt a system of functional representation for metabolic networks. • To classify metabolic networks using machine learning methods.
  34. 34. Goals. • To classify metabolic pathways functionally (without considering the topological structure), based on machine learning methods. • To build or adapt a system of functional representation for metabolic networks. • To classify metabolic networks using machine learning methods. • To apply (in new ways) machine learning methods in the study of systems biology.
  35. 35. Methodology. Pipeline: Data Source (1. MetaCyc, 2. KEGG) → Representation → Classification (Method 1, Method 2) → Evaluation (ROC, Confusion matrix, Entropy, purity, adjusted Rand Index, Accuracy). General Metabolic Reaction Model - GMRM: S1 + S2 + … Sn → P1 + P2 + … Pn, with Enzyme, CoFactor, CoEnzyme. Vectorization of GMRM: S1 S2 S3 Enzyme CoF CoE P1 P2 P3. paper, paper, paper. Carlos Manuel Estévez-Bretón R. 2012
  36. 36. Data Sources. 1. MetaCyc, 2. KEGG
  37. 37. Data Representation. General Metabolic Reaction Model - GMRM: S1 + S2 + … Sn → P1 + P2 + … Pn, with Enzyme, CoFactor, CoEnzyme. Vectorization of GMRM: S1 S2 S3 Enzyme CoF CoE P1 P2 P3
  38. 38. Classification. Method 1: Supervised Classification
  39. 39. Method 2. • Let’s think about clustering without any prior knowledge... • Applying Information Retrieval methods to Metabolic Pathways data.
  40. 40. Evaluation. ROC, Confusion matrix, Entropy, purity, adjusted Rand Index, Accuracy. Confusion matrix (Really is / Classified as): Positive / Positive = True Positive; Positive / Negative = False Negative; Negative / Positive = False Positive; Negative / Negative = True Negative. http://www.intechopen.com/source/html/38584/media/image56.jpeg
  41. 41. Evaluation. ROC, Confusion matrix, Entropy, purity, adjusted Rand Index, Accuracy. Confusion matrix (Really is / Classified as): Positive / Positive = True Positive; Positive / Negative = False Negative; Negative / Positive = False Positive; Negative / Negative = True Negative. Derived measures: Error Rate, Recall/sensitivity, Specificity/True Negative Rate, Precision, 1-Specificity/False Alarm Rate. http://www.intechopen.com/source/html/38584/media/image56.jpeg
  42. 42. Evaluation. ROC, Confusion matrix, Entropy, purity, adjusted Rand Index, Accuracy. http://www.intechopen.com/source/html/38584/media/image56.jpeg http://wwww.cbgstat.com/v2/method_ROC_curve_MedCalc/images/ROC_curve_MedCalc_Snap17.gif
  43. 43. Deliverables. A computational metabolic representation proposal. A computational metabolic classification method. A generative metabolic pathways model. A pipeline for metabolic pathways analysis.
  44. 44. Progress ... http://desktop.freewallpaper4.me/view/original/3714/the-lonely-man.jpg
  45. 45. Preliminary Results. Pipeline: Data Source (1. MetaCyc, 2. KEGG) → Representation → Classification (Method 1, Method 2) → Evaluation (ROC, Confusion matrix, Entropy, purity, adjusted Rand Index, Accuracy). General Metabolic Reaction Model - GMRM: S1 + S2 + … Sn → P1 + P2 + … Pn, with Enzyme, CoFactor, CoEnzyme. Vectorization of GMRM: S1 S2 S3 Enzyme CoF CoE P1 P2 P3. paper, paper, paper. Carlos Manuel Estévez-Bretón R. 2012
  46. 46. Linguistic Analogy. Complexity. Metabolic side: Metabolic Pathway, Reaction, Metabolites/ome, Metabolic Switch. Molecules: Glucose, Glucose 6P, ATP, Hydrolase, Pyrophosphate. Linguistic side: Vocabulary, Document, Phrase, Paragraph. Words: the, Murder, for, a, jar, of, red, rum, frog, soap. Example phrase: “Murder for a jar of red rum”. Example reaction: Glucose + ATP → Glucose 6P + ADP (Hydrolase). General Metabolic Reaction Model - GMRM: S1 + S2 + … Sn → P1 + P2 + … Pn, with Enzyme, CoFactor, CoEnzyme. Vectorization of GMRM: S1 S2 S3 Enzyme CoF CoE P1 P2 P3
  47. 47. Representation. General Metabolic Reaction Model - GMRM: S1 + S2 + … Sn → P1 + P2 + … Pn, with Enzyme, CoFactor, CoEnzyme. Vectorization of GMRM: S1 S2 S3 Enzyme CoF CoE P1 P2 P3
  48. 48. Classification. Method 1: Supervised. 4 Pathways: 2 carbohydrate metabolism, 1 lipid metabolism, 1 from nucleotide metabolism. 24 organisms. Classifiers: Support Vector Machines, Classification Tree, K Nearest Neighbour, CN2, Naive Bayes.
  49. 49. Pipeline
  50. 50. Review. - Proposing a vector representation of biochemical reactions, based on a linguistic analogy. I’m going to classify metabolic networks using only functional features... To find patterns that suggest constitution rules of metabolic pathways. - Searching for patterns by clustering.
  51. 51. Thanks. @karelman
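
As a reading aid for slides 37, 41 and 48, the following is a minimal Python sketch of the pipeline idea: flatten each reaction into the nine GMRM slots (S1 S2 S3, Enzyme, CoF, CoE, P1 P2 P3), classify it with a nearest-neighbour vote, and derive the measures named on the evaluation slide from a binary confusion matrix. It is not the author's implementation. The slot padding, the Hamming distance, the function names and the toy reactions (all invented placeholder identifiers) are assumptions made only for illustration.

```python
from collections import Counter

PAD = "-"  # assumed filler for unused GMRM slots


def vectorize(reaction, n_substrates=3, n_products=3):
    """Flatten a reaction dict into the 9 slots S1 S2 S3 Enzyme CoF CoE P1 P2 P3."""
    subs = list(reaction.get("substrates", []))[:n_substrates]
    prods = list(reaction.get("products", []))[:n_products]
    subs += [PAD] * (n_substrates - len(subs))
    prods += [PAD] * (n_products - len(prods))
    middle = [reaction.get(key, PAD) for key in ("enzyme", "cofactor", "coenzyme")]
    return tuple(subs + middle + prods)


def hamming(u, v):
    """Number of slots in which two reaction vectors differ (assumed distance)."""
    return sum(a != b for a, b in zip(u, v))


def knn_predict(train, query, k=1):
    """Majority vote among the k training vectors closest to the query."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive):
    """Measures from the evaluation slide, computed from a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "error_rate": ratio(fp + fn, len(pairs)),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "false_alarm_rate": ratio(fp, fp + tn),
    }


# Toy reactions with made-up identifiers; not taken from MetaCyc or KEGG.
TRAIN = [
    ({"substrates": ["sugar_a", "atp"], "enzyme": "kinase_x",
      "products": ["sugar_a_p", "adp"]}, "carbohydrate"),
    ({"substrates": ["sugar_b", "atp"], "enzyme": "kinase_y",
      "products": ["sugar_b_p", "adp"]}, "carbohydrate"),
    ({"substrates": ["fatty_a", "coa"], "enzyme": "ligase_x",
      "cofactor": "mg", "products": ["acyl_a"]}, "lipid"),
    ({"substrates": ["fatty_b", "coa"], "enzyme": "ligase_y",
      "cofactor": "mg", "products": ["acyl_b"]}, "lipid"),
]
TEST = [
    ({"substrates": ["sugar_c", "atp"], "enzyme": "kinase_z",
      "products": ["sugar_c_p", "adp"]}, "carbohydrate"),
    ({"substrates": ["fatty_c", "coa"], "enzyme": "ligase_z",
      "cofactor": "mg", "products": ["acyl_c"]}, "lipid"),
]

train_vectors = [(vectorize(r), label) for r, label in TRAIN]
y_true = [label for _, label in TEST]
y_pred = [knn_predict(train_vectors, vectorize(r)) for r, _ in TEST]
metrics = binary_metrics(y_true, y_pred, positive="carbohydrate")

if __name__ == "__main__":
    print(y_pred)
    print(metrics)
```

Positional slots make the distance sensitive to the order in which substrates and products are listed; a bag-of-tokens encoding, closer to the vector space models mentioned on slide 28, would be one alternative under the same assumptions.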
