Rinaldi - ODIN

Using ODIN for a PharmGKB re-validation experiment, by Rinaldi, Fabio; Clematide, Simon; Garten, Yael; Whirl-Carrillo, Michelle; Gong, Li; Hebert, Joan; Sangkuhl, Katrin; Thorn, Caroline; Klein, Teri; Altman, Russ.

Presented at the 5th International Biocuration Conference, hosted by PIR in Washington, DC, April 2-4, 2012.

Published in: Health & Medicine, Technology

  1. Using ODIN for a PharmGKB revalidation experiment. Fabio Rinaldi (1), Simon Clematide (1), Yael Garten (2), Michelle Whirl-Carrillo (2), Li Gong (2), Joan M. Hebert (2), Katrin Sangkuhl (2), Caroline F. Thorn (2), Teri E. Klein (2), Russ B. Altman (2). (1) OntoGene group, University of Zurich; (2) PharmGKB group, Stanford University. Biocuration 2012.
  2. Outline: Introduction (PharmGKB, OntoGene); IE Approach (Entities, Interactions); Revalidation; Results; Conclusion (Outlook, Acknowledgments); Extra (ME Ranking, Evaluation). Biocuration 2012, Rinaldi et al., ODIN-PharmGKB.
  3. PharmGKB.
     Mission: PharmGKB is a pharmacogenomics knowledge resource that encompasses clinical information, potentially clinically actionable gene-drug associations, and genotype-phenotype relationships.
     Approach: PharmGKB collects, curates, and disseminates knowledge about the impact of human genetic variation on drug responses through many activities, including annotating genetic variants and gene-drug-disease relationships via literature reviews.
  5. http://www.pharmgkb.org/
  6. OntoGene group.
     Aims: Develop innovative text mining technologies for the automatic extraction of information from the biomedical literature. http://www.ontogene.org/
     Selected results: PPI, IMT (BioCreative 2006); PPI (BioCreative 2009, best results); ACT, IMT, IAT (BioCreative 2010).
  8. SASEBio: Missions. SASEBio stands for semi-automated semantic enrichment of biomedical texts.
     Mission I, “Relation/Text Mining”: Extraction of semantic relations between biomedical entities (proteins, genes, drugs) using linguistic text mining methods.
     Mission II, “Literature Curation”: Development of a flexible interactive curation interface for efficient human validation and annotation.
  10. Relation/Text Mining: Automatic Document Analysis.
  11. Relation Mining: Syntactic Approach. Using dependency parses and machine learning.
  12. Literature Curation: Interactive Curation Environment.
  13. ODIN: Interactive Curation Environment. Using client-side Web-based techniques: XML, CSS, DOM manipulation by JavaScript and AJAX.
  14. ODIN: Interactive Curation Environment. Extensive logging facilities.
  17. Relations between Genes, Drugs, Diseases. PharmGKB (Pharmacogenomics Knowledge Base) as a gold standard.
     Subset of information in PharmGKB used: 26,122 binary relations between diseases, drugs, and genes; 5062 PubMed abstracts referenced.
     Goal: Compute high-quality relation candidates and rank them according to a confidence score.
     Information used for text mining: PubMed abstracts plus MeSH terms and chemical substance terms.
  19. Baseline: Abstract-wide Co-occurrence-based Candidate Relation Generation.
     Basic idea: Combine all concepts identified in the abstract into relation candidate pairs. However, do not combine concepts stemming from the same ambiguous term.
     Basic ranking (occurrences and zoning): The score of a pair of concepts c1, c2 in an abstract (C = all concepts) is $\mathrm{score}(c_1, c_2) = \frac{\mathrm{freq}(c_1) + \mathrm{freq}(c_2)}{\mathrm{freq}(C)}$.
     Text zone boosting: An occurrence in an article title is counted 10 times.
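The baseline on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the OntoGene implementation: the mention format `(term, concept, zone)`, the function name `rank_candidates`, and the concept identifiers are all hypothetical, and freq(C) is assumed to mean the summed (boosted) frequency of all concepts in the abstract.

```python
from collections import Counter, defaultdict
from itertools import combinations

TITLE_BOOST = 10  # from the slide: a title occurrence counts 10 times


def rank_candidates(mentions):
    """Rank concept pairs by (freq(c1) + freq(c2)) / freq(C).

    mentions: iterable of (term, concept, zone) tuples, where zone is
    "title" or "abstract" (hypothetical input format).
    """
    freq = Counter()
    concepts_by_term = defaultdict(set)
    for term, concept, zone in mentions:
        freq[concept] += TITLE_BOOST if zone == "title" else 1
        concepts_by_term[term].add(concept)

    # Assumption: freq(C) is the total boosted frequency of all concepts.
    total = sum(freq.values())

    # Pairs of concepts that stem from the same ambiguous term are not combined.
    blocked = {
        frozenset(pair)
        for concepts in concepts_by_term.values()
        for pair in combinations(sorted(concepts), 2)
    }

    scored = [
        ((a, b), (freq[a] + freq[b]) / total)
        for a, b in combinations(sorted(freq), 2)
        if frozenset((a, b)) not in blocked
    ]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Synthetic toy abstract: term "T3" is ambiguous between two gene concepts.
mentions = [
    ("T1", "drug:D1", "title"),
    ("T2", "gene:G1", "title"),
    ("T1", "drug:D1", "abstract"),
    ("T4", "disease:S1", "abstract"),
    ("T3", "gene:G2", "abstract"),
    ("T3", "gene:G3", "abstract"),
]
ranked = rank_candidates(mentions)
```

In this toy input the drug and the gene mentioned in the title dominate the ranking because of the title boost, and the pair formed by the two readings of the ambiguous term `T3` is never generated.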
  22. Improving Relation Ranking. Core ideas for improved ranking:
     • Identify noisy concepts recognized by the term recognizer and penalize them.
     • Weight individual concepts according to their likelihood of appearing in a gold relation!
     • Adapt the ranking of relations to the gold standard.
     • Combine the weights of individual concepts into the score of relation candidates.
     • Generally penalize relations between concepts of the same type (rare phenomenon).
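The core ideas on this slide can be illustrated with a deliberately simple stand-in. The outline mentions "ME Ranking" as the method, which is not described in this transcript, so the sketch below swaps in a smoothed relative-frequency estimate instead; the function names, the penalty factor, the default weight, and the `type:id` concept naming are all assumptions made for illustration.

```python
from collections import Counter

SAME_TYPE_PENALTY = 0.1  # hypothetical factor for e.g. gene/gene pairs


def learn_concept_weights(training, smoothing=1.0):
    """Estimate how likely a recognized concept is to appear in a gold relation.

    training: iterable of (recognized_concepts, gold_pairs), one per abstract.
    Returns a dict concept -> smoothed relative frequency.
    """
    seen, in_gold = Counter(), Counter()
    for recognized, gold_pairs in training:
        gold_concepts = {c for pair in gold_pairs for c in pair}
        for concept in set(recognized):
            seen[concept] += 1
            if concept in gold_concepts:
                in_gold[concept] += 1
    return {c: (in_gold[c] + smoothing) / (seen[c] + 2 * smoothing) for c in seen}


def concept_type(concept):
    """Assumes identifiers of the form 'type:id', e.g. 'gene:G1'."""
    return concept.split(":", 1)[0]


def score_pair(a, b, weights, default=0.5):
    """Combine the two concept weights; penalize pairs of the same type."""
    score = weights.get(a, default) * weights.get(b, default)
    if concept_type(a) == concept_type(b):
        score *= SAME_TYPE_PENALTY
    return score


# Synthetic training data: 'disease:S9' is recognized often but never curated,
# so it behaves like a noisy concept and receives a low weight.
training = [
    (["drug:D1", "gene:G1", "disease:S9"], [("drug:D1", "gene:G1")]),
    (["drug:D1", "gene:G2", "disease:S9"], [("drug:D1", "gene:G2")]),
    (["gene:G1", "gene:G2", "disease:S9"], [("gene:G1", "gene:G2")]),
]
weights = learn_concept_weights(training)
```

With these weights, a drug/gene pair scores higher than a pair involving the noisy disease concept, which in turn scores higher than the penalized gene/gene pair.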
  27. Revalidation Experiment.
     Goal: Revalidation of PharmGKB relations with respect to false positives. Collaboration with Stanford Center for Biomedical Informatics Research.
     • In 3059 out of 5378 articles we find all relations.
     • Keep 1407 where the number of relations is > 1 and ≤ 20.
     • Almost half of the 3059 contain only 1 relation.
     • Each of the 5 curators revalidates 25 articles.
     • Sampling of articles according to the number of relations per article:

     Relations   Articles
     2           8
     3           9
     4           2
     5           3
     6-7         1
     8-9         1
     10-20       1
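The sampling step can be sketched as a stratified draw. The quotas below are the Relations/Articles figures from the slide (they sum to 25, the number of articles per curator); the function name, the input mapping, the seed handling, and the synthetic article identifiers are assumptions for illustration only.

```python
import random

# (min_relations, max_relations) -> number of articles to draw, from the slide.
QUOTAS = {
    (2, 2): 8,
    (3, 3): 9,
    (4, 4): 2,
    (5, 5): 3,
    (6, 7): 1,
    (8, 9): 1,
    (10, 20): 1,
}


def sample_articles(relation_counts, quotas=QUOTAS, seed=0):
    """Draw a stratified sample of article ids.

    relation_counts: dict article_id -> number of relations in that article.
    """
    rng = random.Random(seed)
    sample = []
    for (low, high), quota in quotas.items():
        # Sort so that the draw is reproducible for a given seed.
        stratum = sorted(a for a, n in relation_counts.items() if low <= n <= high)
        if len(stratum) < quota:
            raise ValueError(f"stratum {low}-{high} has fewer than {quota} articles")
        sample.extend(rng.sample(stratum, quota))
    return sample


# Synthetic pool: 400 made-up article ids with 1 to 20 relations each.
relation_counts = {f"article{i}": (i % 20) + 1 for i in range(400)}
sample = sample_articles(relation_counts)
```

Drawing with a different seed per curator would give each of the five curators a different set of 25 articles with the same distribution of relation counts.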
  29. Revalidation Process and Categories.
     Revalidation process:
     • Our initial setup from the BioCreative IAT task: the curator deletes unwanted relations and exports the wanted ones.
     • But curators didn’t like that: they want checkboxes for revalidation categories for each relation.
     Revalidation categories:
     • Our initial setup: verified = true positive; falsified = false positive.
     • But curators wanted more. Need full text: a relation can only be revalidated by recourse to the full text. Negative relation: the article denies a relation between two entities.
     http://kitt.cl.uzh.ch/kitt/bcms/pharmgkbmeB/#pmid=11990384
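One way to model the four revalidation categories is an enumeration with exactly one decision stored per relation, mirroring the checkbox behaviour the curators asked for. This is only a sketch: the class and function names and the concept identifiers are hypothetical, the category labels follow the chart legends later in the deck, and ODIN's actual data model is not described here.

```python
from enum import Enum


class Decision(Enum):
    CONFIRM = "confirm"                  # verified = true positive
    REJECT = "reject"                    # falsified = false positive
    NEEDS_FULL_TEXT = "needs full text"  # cannot be decided from the abstract
    NEGATIVE = "negative"                # article denies the relation


def record(log, pmid, concept_a, concept_b, decision):
    """Store one decision per (article, unordered concept pair).

    A later decision for the same relation replaces the earlier one.
    """
    if not isinstance(decision, Decision):
        raise TypeError("decision must be a Decision")
    log[(pmid, frozenset((concept_a, concept_b)))] = decision


# The PMID is the one in the slide's URL; the concept ids are made up.
log = {}
record(log, "11990384", "drug:D1", "gene:G1", Decision.CONFIRM)
record(log, "11990384", "gene:G1", "drug:D1", Decision.NEEDS_FULL_TEXT)
```

Keying on an unordered pair means the second call overrides the first, so each relation ends up with a single category regardless of the order in which its two concepts are given.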
  30. Customized ODIN interface.
  31. Lessons Learnt for Usability.
     1. Ask experienced users what they want (or what they are used to).
     2. Rapidly implement prototypes and get feedback from users! (The use of a JavaScript framework allows this easily!)
     3. Let the users test on real data!
     4. Respect user needs (as far as possible or sensible)! Go to item 1!
     • Prepare simple and good documentation!
     • Be prepared for the unforeseeable!
  38. Revalidation Results. [Figure: decisions by category: reject, needs full text, negative, confirm.]
  39. Revalidation Results by Relation Types. [Figure: number of relations per relation type (Disease/Drug, Disease/Ds., Drug/Drug, Drug/Gene, Gene/Gene), split by category: reject, needs full text, negative, confirm.]
  40. Revalidation Results by Curators. [Figure: number of relations per curator (A to E), split by category: reject, needs full text, negative, confirm.]
  41. Revalidation Results by Confidence Score Ranking. [Figure: relative distribution of decisions for curated relations (confirm, negative, needs full text, reject) against the rank of a relation according to the confidence score (1., 2., 3-5., 6-20.).]
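The quantity plotted on the last of these slides, the relative distribution of decisions per rank bucket, can be computed as sketched below. The bucket boundaries come from the axis labels; the input format, the function names, and the sample decisions are assumptions, and the numbers are synthetic rather than results from the experiment.

```python
from collections import Counter, defaultdict

# Rank buckets as labelled on the chart axis.
RANK_BUCKETS = [(1, 1, "1."), (2, 2, "2."), (3, 5, "3-5."), (6, 20, "6-20.")]


def bucket_for(rank):
    """Map a 1-based rank to its bucket label."""
    for low, high, label in RANK_BUCKETS:
        if low <= rank <= high:
            return label
    raise ValueError(f"rank {rank} is outside the buckets")


def decision_shares(curated):
    """curated: iterable of (rank, decision) pairs.

    Returns bucket label -> {decision: share of decisions in that bucket}.
    """
    counts = defaultdict(Counter)
    for rank, decision in curated:
        counts[bucket_for(rank)][decision] += 1
    return {
        bucket: {d: n / sum(c.values()) for d, n in c.items()}
        for bucket, c in counts.items()
    }


# Synthetic decisions, for illustration only.
curated = [
    (1, "confirm"), (1, "confirm"), (1, "reject"),
    (2, "confirm"),
    (4, "needs full text"),
    (7, "reject"), (12, "reject"), (15, "negative"),
]
shares = decision_shares(curated)
```

Normalizing within each bucket makes buckets of different sizes comparable, which is what a stacked chart scaled from 0.0 to 1.0 shows.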
  42. Concept Identification Quality as Rated by Curators. [Figure: ratings by category: bad, N/A, ok, good.]
  43. Concept Identification Quality as Rated by Curators. [Figure: number of articles per curator (A to E), split by rating: N/A, good, ok, bad.]
  44. Mean Time for Decision Taking for One Relation. [Figure: mean curation time per article in seconds, per curator (A to E).]
  45. Concept Identification Quality and Mean Time for Decision Taking. [Figure: mean curation time per article in seconds against the rating of concept identification quality per article (bad, ok, good).]
  47. Conclusion.
     • The PharmGKB resource is an interesting gold standard for relation detection between drugs, genes, and diseases (apart from the common protein-protein interaction detection task).
     • Proper ranking is crucial for real-world applications.
     • Supervised machine learning methods improve rankings dramatically.
     • Usability of the interface is a crucial acceptability criterion.
  50. Future Work.
     • For measuring inter-annotator agreement, each article sample should be revalidated by at least two curators.
     • Another experiment for the detection of false negatives: select PubMed articles where our text mining system suggests, with a high confidence score, a relation that is not in PharmGKB.
     • Consider other databases: we are interested in research collaborations.
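The slide does not say which agreement measure would be used. As one common option, Cohen's kappa for two curators who label the same relations could be computed as in the sketch below; the choice of kappa, the function name, and the sample labels are assumptions, not part of the presented work.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Synthetic decisions by two hypothetical curators on six relations.
curator_1 = ["confirm", "confirm", "reject", "reject", "confirm", "needs full text"]
curator_2 = ["confirm", "reject", "reject", "reject", "confirm", "needs full text"]
kappa = cohen_kappa(curator_1, curator_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one category such as "confirm" dominates the decisions.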
  51. SMBM 2012: Semantic Mining in Biomedicine, Zurich, September 3-4, 2012. http://www.smbm.eu/
  53. Acknowledgements.
     • Yael Garten, Michelle Whirl-Carrillo, Li Gong, Joan M. Hebert, Katrin Sangkuhl, Caroline F. Thorn, Teri E. Klein, Russ B. Altman from Stanford University
     • Gerold Schneider and Kaarel Kaljurand
     • Martin Romacker from NITAS, Novartis
     Thank you for your attention! Questions?
