Rinaldi - ODIN

1. Using ODIN for a PharmGKB revalidation experiment
Fabio Rinaldi (1), Simon Clematide (1), Yael Garten (2), Michelle Whirl-Carrillo (2), Li Gong (2), Joan M. Hebert (2), Katrin Sangkuhl (2), Caroline F. Thorn (2), Teri E. Klein (2), Russ B. Altman (2)
(1) OntoGene group, University of Zurich; (2) PharmGKB group, Stanford University
Biocuration 2012

2. Introduction (PharmGKB, OntoGene); IE Approach (Entities, Interactions); Revalidation; Results; Conclusion (Outlook, Acknowledgments); Extra (ME Ranking Evaluation)
Biocuration 2012 Rinaldi et al. ODIN-PharmGKB

3. PharmGKB
Mission: PharmGKB is a pharmacogenomics knowledge resource that encompasses clinical information, potentially clinically actionable gene-drug associations, and genotype-phenotype relationships.
Approach: PharmGKB collects, curates and disseminates knowledge about the impact of human genetic variation on drug responses through many activities, including annotating genetic variants and gene-drug-disease relationships via literature reviews.

5. http://www.pharmgkb.org/

6. OntoGene group
Aims: Develop innovative text mining technologies for the automatic extraction of information from the biomedical literature. http://www.ontogene.org/
Selected results: PPI, IMT (BioCreative 2006); PPI (BioCreative 2009, best results); ACT, IMT, IAT (BioCreative 2010).

8. SASEBio: Missions
SASEBio: Semi-automated semantic enrichment of biomedical texts
Mission I “Relation/Text Mining”: Extraction of semantic relations between biomedical entities (proteins, genes, drugs) using linguistic text mining methods.
Mission II “Literature Curation”: Development of a flexible interactive curation interface for efficient human validation and annotation.

10. Relation/Text Mining: Automatic Document Analysis

11. Relation Mining: Syntactic Approach
Using dependency parses and machine learning.

12. Literature Curation: Interactive Curation Environment

13. ODIN: Interactive Curation Environment
Using client-side Web-based techniques: XML, CSS, DOM manipulation by JavaScript, and AJAX.

14. ODIN: Interactive Curation Environment
Extensive logging facilities.

17. Relations between Genes, Drugs, Diseases
PharmGKB: Pharmacogenomics Knowledge Base as a Gold Standard
Subset of information in PharmGKB used: 26,122 binary relations between diseases, drugs, and genes; 5062 PubMed abstracts referenced.
Goal: Compute high-quality relation candidates and rank them according to a confidence score.
Information used for text mining: PubMed abstracts plus MeSH terms and chemical substance terms.

19. Baseline: Abstract-wide Co-occurrence-based Candidate Relation Generation
Basic idea: Combine all concepts identified in the abstract into relation candidate pairs. However, do not combine concepts stemming from the same ambiguous term.
Basic ranking (occurrences and zoning): Score of a pair of concepts $c_1, c_2$ in an abstract ($C$ = all concepts):
$$\mathrm{score}(c_1, c_2) = \frac{\mathrm{freq}(c_1) + \mathrm{freq}(c_2)}{\mathrm{freq}(C)}$$
Text zone boosting: An occurrence in an article title is counted 10 times.
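The baseline above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the OntoGene implementation: the input format (a list of `(concept_id, zone, term)` tuples), the function name `rank_candidates`, and the concept identifiers are all hypothetical. It also assumes that freq(C) is the sum of the (title-boosted) frequencies of all concepts in the abstract, and that "stemming from the same ambiguous term" means the two concepts share at least one surface term.

```python
from collections import Counter
from itertools import combinations

TITLE_WEIGHT = 10  # from the slide: a title occurrence is counted 10 times


def rank_candidates(mentions):
    """Return (pair, score) tuples sorted by descending score.

    `mentions` is a hypothetical input format: one (concept_id, zone, term)
    tuple per recognized occurrence, where zone is "title" or "abstract".
    """
    freq = Counter()   # boosted occurrence count per concept
    terms = {}         # surface terms that produced each concept
    for concept, zone, term in mentions:
        freq[concept] += TITLE_WEIGHT if zone == "title" else 1
        terms.setdefault(concept, set()).add(term)

    total = sum(freq.values())  # assumption: freq(C) sums over all concepts
    scored = []
    for c1, c2 in combinations(sorted(freq), 2):
        # Assumed reading of the slide: skip pairs whose concepts share a
        # surface term, i.e. competing readings of one ambiguous term.
        if terms[c1] & terms[c2]:
            continue
        scored.append(((c1, c2), (freq[c1] + freq[c2]) / total))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    example = [
        ("GENE:CYP2D6", "title", "CYP2D6"),
        ("DRUG:codeine", "abstract", "codeine"),
        ("DRUG:codeine", "abstract", "codeine"),
        ("DISEASE:pain", "abstract", "pain"),
        ("GENE:PAIN1", "abstract", "pain"),  # invented ambiguous reading
    ]
    for pair, score in rank_candidates(example):
        print(pair, round(score, 3))
```

Because of the title boost, any pair involving the concept mentioned in the title dominates the ranking in this toy input, which is the intended effect of zone boosting.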

20. Improving Relation Ranking
Core ideas for improved ranking:
Identify noisy concepts recognized by the term recognizer and penalize them.
Weight individual concepts according to their likelihood of appearing in a gold relation. Adapt the ranking of relations to the gold standard.
Combine the weights of individual concepts for the score of relation candidates.
Generally penalize relations between concepts of the same type (a rare phenomenon).

24. Revalidation Experiment
Goal: Revalidation of PharmGKB relations with respect to false positives. Collaboration with Stanford Center for Biomedical Informatics Research.
In 3059 out of 5378 articles we find all relations.
Keep 1407 where the number of relations is > 1 and ≤ 20.
Almost half of the 3059 contain only 1 relation.
Each of the 5 curators revalidates 25 articles.
Sampling of articles according to number of relations per article:
Relations  Articles
2          8
3          9
4          2
5          3
6-7        1
8-9        1
10-20      1
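The article filter and the per-curator sampling scheme on this slide can be illustrated with a short Python sketch. The quota values (8, 9, 2, 3, 1, 1, 1 articles, 25 in total) and the bounds (more than 1 and at most 20 relations) come from the slide. Everything else is an assumption made for illustration: the function names, the `{pmid: relation_count}` input format, and the use of uniform random sampling within each stratum, which the slides do not specify.

```python
import random

# Per-curator quota from the slide: (min, max) relations per article -> articles.
QUOTA = {
    (2, 2): 8,
    (3, 3): 9,
    (4, 4): 2,
    (5, 5): 3,
    (6, 7): 1,
    (8, 9): 1,
    (10, 20): 1,
}


def eligible(relation_counts, low=2, high=20):
    """Keep articles with more than 1 and at most 20 relations."""
    return {pmid: n for pmid, n in relation_counts.items() if low <= n <= high}


def sample_for_curator(relation_counts, rng):
    """Draw one curator's sample, stratified by relations per article.

    `relation_counts` maps a PMID to its number of relations (hypothetical
    input format). Uniform sampling within a stratum is an assumption.
    """
    pool = eligible(relation_counts)
    chosen = []
    for (lo, hi), k in QUOTA.items():
        stratum = sorted(pmid for pmid, n in pool.items() if lo <= n <= hi)
        # Take fewer articles if a stratum is smaller than its quota.
        chosen.extend(rng.sample(stratum, min(k, len(stratum))))
    return chosen


if __name__ == "__main__":
    # Synthetic data: 10 made-up articles for each relation count 1..25.
    counts = {f"A{n}_{i}": n for n in range(1, 26) for i in range(10)}
    sample = sample_for_curator(counts, random.Random(0))
    print(len(sample), "articles sampled")
```

Passing an explicit `random.Random` instance keeps a sample reproducible. Drawing disjoint samples for the five curators would additionally require removing already chosen articles from the pool, which this sketch leaves out.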

29. Revalidation Process and Categories
Revalidation process: Our initial setup from the IAT BioCreative task: the curator deletes unwanted relations and exports the wanted ones. But curators didn’t like that: they want checkboxes for revalidation categories for each relation.
Revalidation categories: Our initial setup: verified = true positive; falsified = false positive. But curators wanted more:
Need full text: a relation can only be revalidated by recourse to the full text.
Negative relation: the article denies a relation between two entities.
http://kitt.cl.uzh.ch/kitt/bcms/pharmgkbmeB/#pmid=11990384

30. Customized ODIN interface

31. Lessons Learnt for Usability
1 Ask experienced users what they want (or what they are used to).
2 Rapidly implement prototypes and get feedback from users! (The use of a JavaScript framework allows this easily!)
3 Let the users test on real data!
4 Respect user needs (as far as possible or sensible)!
Go to item 1!
Prepare simple and good documentation!
Be prepared for the unforeseeable!

38. Revalidation Results
[Figure: distribution of curator decisions: reject, needs full text, negative, confirm.]

39. Revalidation Results by Relation Types
[Figure: number of relations (y-axis, 0 to 150) by relation type (x-axis: Disease/Drug, Disease/Ds., Drug/Drug, Drug/Gene, Gene/Gene), split by decision: reject, needs full text, negative, confirm.]

40. Revalidation Results by Curators
[Figure: number of relations (y-axis, 0 to 70) by curator (x-axis: A to E), split by decision: reject, needs full text, negative, confirm.]

41. Revalidation Results by Confidence Score Ranking
[Figure: relative distribution of decisions for curated relations (y-axis, 0.0 to 1.0; confirm, negative, needs full text, reject) by rank of a relation according to the confidence score (x-axis: 1., 2., 3-5., 6-20.).]

42. Concept Identification Quality as Rated by Curators
[Figure: distribution of ratings: bad, N/A, ok, good.]

43. Concept Identification Quality as Rated by Curators
[Figure: number of articles (y-axis, 0 to 25) by curator (x-axis: A to E), split by rating: N/A, good, ok, bad.]

44. Mean Time for Decision Taking for One Relation
[Figure: mean curation time per article in seconds (y-axis, 0 to 350) by curator (x-axis: A to E).]

45. Concept Identification Quality and Mean Time for Decision Taking
[Figure: mean curation time per article in seconds (y-axis, 0 to 350) by rating of the quality of concept identification per article (x-axis: bad, ok, good).]

47. Conclusion
The PharmGKB resource is an interesting gold standard for relation detection between drugs, genes and diseases (apart from the common protein-protein interaction detection task).
Proper ranking is crucial for real-world applications. Supervised machine learning methods improve rankings dramatically.
Usability of the interface is a crucial acceptability criterion.

48. Future Work
For measuring inter-annotator agreement, each article sample should be revalidated by at least two curators.
Another experiment for the detection of false negatives: select PubMed articles where our text mining system suggests a non-existing relation with a high confidence score.
Consider other databases: we are interested in research collaborations.

51. SMBM 2012
Semantic Mining in Biomedicine, Zurich, September 3-4, 2012. http://www.smbm.eu/

53. Acknowledgements
Yael Garten, Michelle Whirl-Carrillo, Li Gong, Joan M. Hebert, Katrin Sangkuhl, Caroline F. Thorn, Teri E. Klein, Russ B. Altman from Stanford University
Gerold Schneider and Kaarel Kaljurand
Martin Romacker from NITAS, Novartis
Thank you for your attention! Questions?
