PhD thesis presentation



  1. Next-generation text mining applied to toxicogenomics data analysis. Kristina Hettne, PhD thesis defense, 20 December 2012
  2. Toxicogenomics: study whether a chemical causes damage to genes. Text mining: teach a computer to “read” articles and extract explicit information. Next-generation text mining: teach a computer to find implicit information in articles.
  3. Drug safety is essential! But… how to minimize animal testing? Image source: The Independent, July 12, 2012
  4. Toxicogenomics data: interpretation using knowledge from manually curated databases. Image sources: Verhallen and Piersma, 2011; de Jong et al., 2011
  5. Toxicogenomics data: interpretation using knowledge from manually curated databases. Not sufficient in coverage. We hypothesize that next-generation text mining can increase the information coverage. Image sources: Verhallen and Piersma, 2011; de Jong et al., 2011
  6. Next-generation text mining = concept profile matching. Information cloud for a gene concept; shared concepts; information cloud for a chemical concept. Image source: Herman van Haagen
  7. Concepts come from a thesaurus and are identified in text with concept identification software. A good thesaurus = the basis for good concept identification. Image source: Herman van Haagen
  8. Research objectives:
      • Investigate information coverage in public biomedical and chemical thesauri and databases
      • Provide methods to improve the quality and coverage
      • Give recommendations for use
      • Investigate the added value of next-generation text mining when interpreting toxicogenomics data
  9. Results
  10. A thesaurus of chemical concepts [1] and methods [1, 2, 3] to prepare a thesaurus to be used with concept identification software. 1. Hettne et al., Bioinformatics, 2009. 2. Hettne et al., Journal of Biomedical Semantics, 2010. 3. Hettne et al., Journal of Cheminformatics, 2010
  11. A next-generation text mining-based method for interpreting biological data. Diagram labels: Biological data; Next-generation text mining; Statistical test. This method gives more, and more specific, results [1] than other available tools. 1. Jelier R, Goeman JJ, Hettne KM, Schuemie MJ, den Dunnen JT, 't Hoen PA. Briefings in Bioinformatics, 2011
  12. Application to toxicogenomics. Hettne et al. (submitted)
  13. See developmental defects in stem cells instead of in animal embryos. Embryonic structure [1, 2]. Posterior neuropore open. A) Control group rat embryo. B) Triazole-exposed rat embryo. Image sources: 1. Verhallen and Piersma, 2011; 2. De Jong et al., 2012
  14. Toxicity class prediction (case study: Triazoles). 25 times larger chemical-gene matrix compared to manual work (Comparative Toxicogenomics Database). Chemical [1]. Image source 1: Verhallen and Piersma, 2011
  15. Conclusions: Next-generation text mining combined with statistical tests complements, and is sometimes superior to, manually curated databases in:
      - Relating chemical information to gene expression data
      - Identifying toxic effects already at the gene expression stage
      - Discriminating between different classes of chemicals
  16. Future
      1. Make the method easier to use (currently being worked on)
      2. Apply the method to new drugs with unknown toxicity
      Early prediction of toxicity -> less animal testing and safer drugs
  17. Thank you to all who made this possible!
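
Slides 6 and 7 describe concept identification with a thesaurus and concept profile matching through shared concepts. The following is a minimal sketch of that idea, not the software used in the thesis. Everything in it is an assumption made for illustration: the concept IDs, the toy sentences, the co-occurrence counting used to build profiles, and the cosine score used to match them.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical mini-thesaurus: term -> concept ID (IDs are invented).
THESAURUS = {
    "triazole": "CHEM:1",
    "flusilazole": "CHEM:2",
    "retinoic acid": "CHEM:3",
    "neural tube": "ANAT:1",
    "neural tube defect": "DIS:1",
    "cyp26a1": "GENE:1",
}

# Toy sentences standing in for article abstracts (invented, not findings).
DOCS = [
    "Flusilazole exposure was associated with neural tube defect in embryos.",
    "CYP26A1 regulates retinoic acid levels.",
    "Retinoic acid imbalance can cause neural tube defect.",
    "Flusilazole alters retinoic acid metabolism.",
]


def identify_concepts(text, thesaurus=THESAURUS):
    """Return concept IDs found in text, preferring the longest matching term."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    max_len = max(len(term.split()) for term in thesaurus)
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in thesaurus:
                found.append(thesaurus[candidate])
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return found


def build_profile(target, documents):
    """Count the concepts that co-occur with `target` in the same document."""
    profile = Counter()
    for doc in documents:
        concepts = set(identify_concepts(doc))
        if target in concepts:
            profile.update(concepts - {target})
    return profile


def match_profiles(a, b):
    """Return (cosine similarity, sorted shared concepts) for two profiles."""
    shared = sorted(set(a) & set(b))
    dot = sum(a[c] * b[c] for c in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return (dot / norm if norm else 0.0), shared


chemical_profile = build_profile("CHEM:2", DOCS)  # flusilazole
gene_profile = build_profile("GENE:1", DOCS)      # cyp26a1
score, shared_concepts = match_profiles(chemical_profile, gene_profile)
print(score, shared_concepts)
```

In this toy corpus the chemical and the gene never appear in the same sentence, yet their profiles share the retinoic acid concept. That is the kind of implicit link slide 2 attributes to next-generation text mining. The longest-match rule is why "neural tube defect" is identified as one concept rather than as "neural tube" plus a stray word, which hints at why slide 7 stresses thesaurus quality.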