Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

A new approach for summarizing SemRep predications

188 views

Published on

Semantic MEDLINE applies automatic summarization techniques to manage the semantic predications extracted from the biomedical literature by SemRep. It does so by selecting salient predications based on several criteria. In this study, we investigated a new technique to automatically summarize SemRep predications. Our technique leverages hierarchical relations from the UMLS Metathesaurus for aggregating the semantic predications. We also generated new inferences from the aggregated semantic predications. Several quantitative measures are de ned to evaluate the system. We applied our method to summarize medications used to treat diseases and also adverse drug events reported in the biomedical literature. Our preliminary experimental results are promising in terms of summarization rate. They also indicate that less than half of the newly generated inferences correspond to existing relations. Further work is needed to evaluate the rest of the inferences.

Published in: Healthcare
  • Be the first to comment

  • Be the first to like this

A new approach for summarizing SemRep predications

  1. 1. A new approach for summarizing SemRep predications Vahid Taslimitehrani, PhD candidate, Kno.e.sis Center, WSU Dr. Olivier Bodenreider, Cognitive Science Branch, NLM/NIH
  2. 2. Introduction • Medline Citations • UMLS Metathesarus SemRep • List of semantic predications Semantic MEDLINE • Summarized list of semantic predications We want to propose an alternative/complementary summarization technique 2
  3. 3. Background  Semantic MEDLINE summarization system  Semantic MEDLINE (2008)  Relevance  Connectivity  Novelty  Saliency  Degree Centrality (2011)  Clustering Cliques (2013) 3 ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- Semantic MEDLINE Our technique Semantic MEDLINE +Our technique
  4. 4. Motivation – an example Dobutamine TREAT S Congestive heart failure Dopexamine TREAT S Congestive heart failure Dopexamine hydrochloride TREAT S Congestive heart failure Xamoterol TREAT S Congestive heart failure SemRep semantic predications UMLS Metathesaurus Selective beta-1 adrenoceptor stimulants IS_A IS_A IS_A IS_A Selective beta-1 adrenoceptor stimulants TREATS Congestive heart failure Summarize Inferred 4
  5. 5. Motivation-an example  Based on the example, we observe: 1. 4 semantic predications are aggregated. 2. A new inference is made. Objective: Our technique leverages hierarchical relations from the UMLS Metathesaurus for aggregating the semantic predications and generating new inferences from the aggregated predications. 5
  6. 6. Methodology-an overview Asserted Predication Inferred Predication Med 1 Med 2 Med 3 Disease* TREATS* Med TREATS* Med 1 Med 2 Med 3 Disease* TREATS* Med Med 4 TREATS* 6
  7. 7. Methodology  Let’s discuss about the details of methodology using an example. We’re interested to summarize the semantic predications returned in response to the following question: What are the medications used to TREAT Congestive heart failure (C0018802)?  SemRep returns 6013 sematic predications.  SemRep returns 684 unique treatment options. 7
  8. 8. Methodology  Step 1: Retrieve unique semantic predications from SemRep when  The predicate is TREATS or any descendant.  The object is a disease or any descendant.  Step 2: Extract semantic groups of each subject and remove those predications with procedures as semantic group (T061, …)  Step 3: Retrieve all parents of each medication returned from step 2. Congestive heart failure Xamoterol is one of the medications from step 2 (Xamoterol TREATS Congestive heart failure) Xamoterol has two parents: 1. Selective beta-1 adrenoceptor stimulants (Xamoterol IS_A Selective beta-1 adrenoceptor stimulants) 2. Sympathomimetics (Xamoterol IS_A Sympathomimetics) 684 Semantic predications 500 Semantic Predications 8
  9. 9. Methodology  Step 4: Retrieve all children of the parents returned from step 3.  Step 5: Some of the parents are too generic such as C0993159 (Oral product). If an ancestor has too many descendants, then it is not used. Selective beta-1 adrenoceptor stimulants has 4 children: 1. Dobutamine (Selective beta-1 adrenoceptor stimulants INVERSE_ISA Dobutamine) 2. Dopexamine (Selective beta-1 adrenoceptor stimulants INVERSE_ISA Dopexamine ) 3. Dopexamine hydrochloride (Selective beta-1 adrenoceptor stimulants INVERSE_ISA Dopexamine hydrochloride) 4. Xamoterol (Selective beta-1 adrenoceptor stimulants INVERSE_ISA Xamoterol) Sympathomimetics has many children such as: 1. Adrenergic alpha-agonists 2. Dopamine 3. Ephedrine 4. Etilefrine 5. Xamoterol 9
  10. 10. Methodology  Step 6: For each child of a parent returned from step 5, we need to verify the child TREATS the disease or not. If all children TREAT the disease, we aggregate semantic predications and make a new inference. Dobutamine TREATS CHF ✔ Dopexamine TREATS CHF ✔ Dopexamine hydrochloride TREATS CHF ✔ Xamoterol TREATS CHF ✔ Adrenergic TREATS CHF ✔ Dopamine TREATS CHF ✔ Ephedrine TREATS CHF ✖ Xamoterol TREATS CHF ✔ Selective beta-1 adrenoceptor stimulants Sympathomimetics Aggregate Do not aggregate Selective beta-1 adrenoceptor stimulants TREATS Congestive heart failure 10 ✖
  11. 11. Methodology  Step 7: If we can aggregate semantic predications in step 6, we continue to aggregate into the higher levels. If not, it is the highest level of summarization. 11 Selective beta-1 adrenoceptor stimulants has just one parent: 1. Adrenergic beta agonist and adrenergic beta agonist has 6 children: 1. Selective beta-1 adrenoceptor stimulants ✔ 2. Selective beta-2 adrenoceptor stimulants ✖ 3. Dobutamine ✔ 4. Dobutamine hydrochloride ✔ 5. Isoproterenol ✖ 6. Mirabegon ✖
  12. 12. Implementation  We used Biomedical Knowledge Repository (BKR) to implement our technique.  UMLS in RDF  SemRep predications in RDF  Created at NLM by Dr. Olivier Bodenreider and Dr. Thomas Rindflesch.  We used 2013 version that includes more than 27 million predications from 13 million articles by SemRep.  Technologies  Semantic Web  RDF, SPARQL standards  Virtuoso triple store  Programming  Java 12
  13. 13. Evaluation  We defined 4 quantitative measures to evaluate the performance of our technique.  Summarization rate:  Inference ratio:  Number of generated inferences  Ratio of validated inferences 1- # semantic predications after # semantic predications before =1- 300 400 = 0.25 # predications can be aggregated #inf erences = 130 30 @ 4.3 270 130 270 30 before after 400 300 13
  14. 14. Experimental results  We investigate two questions: 1. What are the medications used to treat disease X? 2. What are the medications caused disease X? (adverse drug events)  For each question, we select five diseases with  high numbers of predications (more than 400 unique predications)  medium numbers of predications (between 100 and 400 predications) 14
  15. 15. Experimental results-question #1 Disease # pred. Summarization rate # generated inferences Ratio of validated inferences Inference ratio Hypertensive disorder 1122 26% 63 38% 5.6 Congestive heart failure 499 29% 24 21% 7 Depression 400 27% 39 23% 3.7 Myocardial infarction 419 29% 24 16% 6 Schizophrenia 401 30% 26 38% 5.6 Diseases with high numbers of semantic predications (more than 400) 15
  16. 16. Experimental results-question #1 Disease # pred. Summarization rate # generated inferences Ratio of validated inferences Inference ratio Hypercholesterolemia 240 21% 18 33% 3.8 Pruitus 179 47% 12 50% 8 Burn injury 316 45% 11 55% 13.9 Pseudomonas Infections 201 26% 23 43% 3.3 Glaucoma 343 29% 21 62% 5.7 Diseases with the medium numbers of semantic predications 16
  17. 17. Experimental results-question #2 Disease # pred. Summarizati on rate # generated inferences Ratio of validated inferences Inference ratio Traumatic Injury 1045 40% 83 17% 6 Ischemia 622 44% 32 3% 9.5 Cerebrovasc ular accident 401 34.5% 16 6% 9.6 Obstruction 1815 37% 75 16% 10 Septicemia 748 41% 42 14% 11.5 Diseases with high numbers of semantic predications (more than 400) 17
  18. 18. Experimental results-question #2 Disease # pred. Summarizati on rate # generated inferences Ratio of validated inferences Inference ratio Wounds & injuries 139 44% 6 50% 8.4 Cardiovascul ar disease 170 37% 8 25% 6.7 Pulmonary embolism 152 35% 7 28% 6.6 Asthma 192 41% 11 36% 11.3 Gastroesoph ageal reflux disease 101 32% 5 60% 14.8 Diseases with the medium numbers of semantic predications 18
  19. 19. Limitations  We need to validate the rest of new inferences by the experts.  In the current experiments, we’re using SNOMED_CT and MEDCIN hierarchy and IS_A relations are used to explore the hierarchy. The code has the ability to generalize to any other hierarchies. 19
  20. 20. Conclusion  We propose a new technique to summarize SemRep predications.  Our technique is based on aggregating semantic predications and in the same time making new inferences from the aggregated inferences.  We also designed four measures to evaluate the performance of our technique.  Preliminary experimental results are promising.  We can use this technique as complementary to the semantic MEDLINE summarization. 20
  21. 21. Acknowledgment I would like to thank  Dr. Olivier Bodenreider  Dr. Paul Fontelo  Dr. Marcelo Fizman  Dr. Thomas Rindflesch  Dr. McDonald  Lister Hill Center 21

×