Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries

112 views

Published on

Accurately answering verbose queries that describe a clinical case and aim at finding articles in a collection of medical literature requires capturing many explicit and latent aspects of complex information needs underlying such queries. Proper representation of these aspects often requires query analysis to identify the most important query concepts as well as query transformation by adding new concepts to a query, which can be extracted from the top retrieved documents or medical knowledge bases. Traditionally, query analysis and expansion have been done separately. In this paper, we propose a method for representing verbose domain-specific queries based on weighted unigram, bigram, and multi-term concepts in the query itself, as well as extracted from the top retrieved documents and external knowledge bases. We also propose a graduated non-convexity optimization framework, which allows to unify query analysis and expansion by jointly determining the importance weights for the query and expansion concepts depending on their type and source. Experiments using a collection of PubMed articles and TREC Clinical Decision Support (CDS) track queries indicate that applying our proposed method results in significant improvement of retrieval accuracy over state-of-the-art methods for ad hoc and medical IR.

Published in: Science
  • Be the first to comment

  • Be the first to like this

Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries

  1. 1. 1/27 Introduction Method Experiments Conclusions Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries Saeid Balaneshin-kordan Alexander Kotov Wayne State University September 16, 2016 Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  2. 2. 2/27 Introduction Method Experiments Conclusions Introduction Method Experiments Conclusions Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  3. 3. 3/27 Introduction Method Experiments Conclusions Introduction Method Experiments Conclusions Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  4. 4. 4/27 Introduction Method Experiments Conclusions Queries and Explicit and Latent Concepts (Example) § Query: 33-year-old male presents with severe abdominal pain one week after a bike accident, in which he sustained abdominal trauma. He is hypotensive and tachycardic, and imaging reveals a ruptured spleen and intraperitoneal hemorrhage Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  5. 5. 4/27 Introduction Method Experiments Conclusions Queries and Explicit and Latent Concepts (Example) § Query: 33-year-old male presents with severe abdominal pain one week after a bike accident, in which he sustained abdominal trauma. He is hypotensive and tachycardic, and imaging reveals a ruptured spleen and intraperitoneal hemorrhage § Explicit concepts: ”bike accident”, “abdominal trauma”, “tachycardia”, “splenic rupture”, “intraperitoneal hemorrhage” Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  6. 6. 4/27 Introduction Method Experiments Conclusions Queries and Explicit and Latent Concepts (Example) § Query: 33-year-old male presents with severe abdominal pain one week after a bike accident, in which he sustained abdominal trauma. He is hypotensive and tachycardic, and imaging reveals a ruptured spleen and intraperitoneal hemorrhage § Explicit concepts: ”bike accident”, “abdominal trauma”, “tachycardia”, “splenic rupture”, “intraperitoneal hemorrhage” § Latent concepts: “splenic trauma”, “Injury of spleen”, “Traffic accidents” Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  7. 7. 5/27 Introduction Method Experiments Conclusions Sources of Latent Concepts for Query Expansion § Top retrieved documents § External domain-specific knowledge repositories (e.g., UMLS) § External general-purpose resources (e.g., Wikipedia) Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  8. 8. 6/27 Introduction Method Experiments Conclusions Retrieval model § Retrieval score of document D with respect to query Q: sc(Q, D) = ÿ TPTQ ÿ cPCT λT(c)fT(c, D) Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  9. 9. 6/27 Introduction Method Experiments Conclusions Retrieval model § Retrieval score of document D with respect to query Q: sc(Q, D) = ÿ TPTQ ÿ cPCT λT(c)fT(c, D) § Matching function of concept c in document D (two-stage smoothing method [Zhai and Lafferty, SIGIR’02]): fT(c, D) = log((1 ´ λ) n(c, D) + µn(c,Col) |Col| |D| + µ + λ n(c, Col) |Col| ) Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  10. 10. 6/27 Introduction Method Experiments Conclusions Retrieval model § Retrieval score of document D with respect to query Q: sc(Q, D) = ÿ TPTQ ÿ cPCT λT(c)fT(c, D) § Weight of concept c with type T: λT(c) = Nÿ n=1 wn φφn , Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  11. 11. 6/27 Introduction Method Experiments Conclusions Retrieval model § Retrieval score of document D with respect to query Q: sc(Q, D) = ÿ TPTQ ÿ cPCT λT(c)fT(c, D) § Weight of concept c with type T: λT(c) = Nÿ n=1 wn φφn , § Objective: find the weights wn φ that maximize infNDCG Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  12. 12. 7/27 Introduction Method Experiments Conclusions infNDCG retrieval metric by varying the weight of one of the features 0.325 0.330 0.335 0.340 0.0 0.1 0.2 0.3 0.4 wGI infNDCG Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  13. 13. 8/27 Introduction Method Experiments Conclusions Proposed Method § Represent verbose domain-specific queries using weighted unigram, bigram and multi-term concepts in a query itself, top retrieved documents and knowledge bases. § Leverage Graduated Non-Convexity Optimization (GNC) method to jointly determine the importance weights for the query and expansion concepts depending on their type and source. Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  14. 14. 9/27 Introduction Method Experiments Conclusions Introduction Method Experiments Conclusions Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  15. 15. 10/27 Introduction Method Experiments Conclusions 0.325 0.330 0.335 0.340 0.0 0.1 0.2 0.3 0.4 wGI infNDCG ∆ 0.025 0.33 0.34 0.0 0.1 0.2 0.3 0.4 wGI infNDCG § Sample infNDCG for the following values of wφ: ws,φ = [wφ,´M, . . . , wφ,0, . . . , wφ,M] § Using the fixed sampling interval (∆wφ): wφ,m = wφ,0 +m∆wφ, m P [´M, . . . , M] § Polynomial of degree K is used for smoothing the objective function: ˜E(wφ,m) = Kÿ k=0 akmk , m P [´M, . . . , M] Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  16. 16. 11/27 Introduction Method Experiments Conclusions 0.325 0.330 0.335 0.340 0.0 0.1 0.2 0.3 0.4 wGI infNDCG ∆ 0.025 0.33 0.34 0.0 0.1 0.2 0.3 0.4 wGI infNDCG § Cost Function (Mean Square Error (MSE)): εφ = 1 2M + 1 Mÿ m=´M (˜E(wφ,m)´E(wφ,m))2 § Optimal a: a = (JT J)´1 JT ws,φ § J is a Jacobian of the vector: [J]m,k = (m´M)k , m P [0, 2M], k P [0, K] . Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  17. 17. 12/27 Introduction Method Experiments Conclusions The first three iterations... ∆ 0.025 0.33 0.34 0.0 0.1 0.2 0.3 0.4 wGI infNDCG (a) 1st Iteration ∆ 0.338 0.340 0.342 0.344 0.00 0.01 0.02 0.03 0.04 wGI infNDCG (b) 2nd Iteration ∆w 0.3426 0.3429 0.3432 0.3435 0.023 0.024 0.025 0.026 0.027 wGI infNDCG (c) 3rd Iteration Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  18. 18. 13/27 Introduction Method Experiments Conclusions Multi-variate optimization § Using Powell’s method, the multi-variate objective function is decomposing into a series of univariate objective functions to weight the n-th feature at the j-th iteration: En,j (wn φ) = E([ˆw1 φ, . . . , ˆwn´1 φ , wn φ, ˆwn+1 φ , . . . , ˆwN φ]) Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  19. 19. 14/27 Introduction Method Experiments Conclusions Algorithm 1: Identify explicit and latent concepts 2: Randomly initialize the feature weights vector (wφ) 3: for j = 1 : jmax do 4: Randomly shuffle wφ 5: for n = 1 : N do 6: for each sampling policy do 7: Sample En,j(wn φ) 8: Obtain ˜En,j(wn φ,m) 9: Obtain the optimum point ˆwn φ 10: Update n-th element of wφ by ˆwn φ 11: end for 12: end for 13: if Convergence then 14: Break 15: end if 16: end for Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  20. 20. 15/27 Introduction Method Experiments Conclusions Introduction Method Experiments Conclusions Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  21. 21. 16/27 Introduction Method Experiments Conclusions Experimental Setup § All retrieval methods were implemented using Indri 5.9 IR toolkit § We used INQUERY stoplist and Porter stemmer § Very frequent terms (that occur in more than 15% of documents) and very rare terms (that occur in less than 5 documents) were ignored for estimation of translation models, LDA and NMF. § We used the implementation of NMF from scikit-learn 17.1 (Double Singular Value Decomposition for non-random initialization of factors and Coordinate Descent solver for optimization) § Gibbs sampler for posterior inference of LDA parameters was run for 200 iterations, and the convergence threshold for NMF was set to 10´5 § Word embeddings were obtained by using word2vec (version 0.1c) tool 3 (by using Skip-gram architecture for its training) Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  22. 22. 17/27 Introduction Method Experiments Conclusions Explicit and Latent Query Concept Types Concept Type Concept Source(s) Concept Representation Concept Extraction QU Query unigrams all query unigrams QOB Query ordered bigrams all query bigrams QUB Query unordered bigrams all query bigrams QUU Query, UMLS unigrams MetaMap QUOB Query, UMLS ordered bigrams MetaMap QUUB Query, UMLS unordered bigrams MetaMap QDU Query, Wikipedia unigrams health-relatedness measure QDOB Query, Wikipedia ordered bigrams health-relatedness measure QDUB Query, Wikipedia unordered bigrams health-relatedness measure TU Top-docs unigrams direct identification TOB Top-docs ordered bigrams direct identification TUB Top-docs unordered bigrams direct identification TUU Top-docs, UMLS unigrams MetaMap TUOB Top-docs, UMLS ordered bigrams MetaMap TUUB Top-docs, UMLS unordered bigrams MetaMap TDU Top-docs, Wikipedia unigrams health-relatedness measure TDOB Top-docs, Wikipedia ordered bigrams health-relatedness measure TDUB Top-docs, Wikipedia unordered bigrams health-relatedness measure UU UMLS unigrams UMLS relationships UOB UMLS ordered bigrams UMLS relationships UUB UMLS unordered bigrams UMLS relationships Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  23. 23. 18/27 Introduction Method Experiments Conclusions Features (when concept sources are Query and UMLS) § TF-IDF of concept c in the collection, § Number of top retrieved documents containing concept c, § Sum of retrieval scores of top-ranked documents containing concept c, § Average/Maximum collection co-occurrence of concept c with other concepts in the query, § Average/Maximum co-occurrence of concept c with other query concepts in top retrieved documents, § Does concept c have a UMLS semantic type that is effective for medical query expansion? § Node degree of concept c in the UMLS concept graph, § Average distance between concept c in the UMLS concept graph and other query, top document and related UMLS concepts identified for a query. Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  24. 24. 19/27 Introduction Method Experiments Conclusions Considered Semantic Types of UMLS concepts Obtained using backward feature elimination process from [Limsopatham et al., OAIR’13]: Semantic Group Semantic Type Chemicals & Drugs Clinical Drug Disorders Disease or Syndrome Disorders Injury or Poisoning Disorders Sign or Symptom Procedures Therapeutic or Preventive Procedure Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  25. 25. 20/27 Introduction Method Experiments Conclusions Number of Top-docs and number of their Concepts ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.26 0.28 0.30 0.32 5 10 15 Number of Top−docs infNDCG ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.28 0.29 0.30 0.31 0.32 0.33 5 10 15 Number of Concepts infNDCG Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  26. 26. 21/27 Introduction Method Experiments Conclusions Performance Comparison Query set TREC 2014 CDS track TREC 2015 CDS track Method infNDCG infAP P@5 infNDCG infAP P@5 Two-Stage 0.1945 0.0493 0.3533 0.2110 0.0449 0.4200 Wiki-Orig 0.2069 0.0550 0.3533 0.2193 0.0457 0.4133 UMLS-Orig 0.2074 0.0569 0.3867 0.2206 0.0478 0.4400 UMLS-TD˚ 0.2724 0.0810 0.4133 0.2503 0.0614 0.4667 PQE 0.2796 0.0873 0.4733 0.2792 0.0762 0.4400 Wiki-TD˚ 0.2883 0.0944 0.4600 0.2519 0.0633 0.4600 TREC best 0.2631 0.0757 0.4067 0.2928 0.0777 0.4467 INTGR-LS 0.3114 0.0993 0.4867 0.2987 0.0792 0.4800 INTGR 0.3401 0.1229 0.5267 0.3135 0.0873 0.5200 § INTGR significantly out- performs all of the best performing baselines in terms of all retrieval metrics. § Using graduated non-convexity as a uni-variate optimization method results in: § 5-9% improvement of retrieval accuracy in terms of infNDCG, § 10-23% improve- ment in terms of infAP and § 8% improvement in terms of P@5 on different query sets. Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  27. 27. 22/27 Introduction Method Experiments Conclusions Effectiveness of different type of knowledge bases Table : TREC 2014 CDS track topics Method infNDCG infAP P@5 INTGR using no knowledge bases 0.2673 0.0875 0.4601 INTGR using only Wikipedia 0.2975 0.0936 0.4533 (11.30%) (6.97%) (-1.47%) INTGR using only UMLS 0.3309 0.1170 0.5200 (23.79%) (33.71%) (13.02%) INTGR using UMLS and Wikipedia 0.3401 0.1229 0.5267 (27.23%) (40.46%) (14.47%) § the methods based on semantic concepts, UMLS is a better choice than Wikipedia with respect to all metrics, if only one knowledge repository is used § combining both knowledge bases results in better retrieval accuracy than using any one of them individually Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  28. 28. 22/27 Introduction Method Experiments Conclusions Effectiveness of different type of knowledge bases Table : TREC 2015 CDS track topics Method infNDCG infAP P@5 INTGR using no knowledge bases 0.2771 0.0758 0.4633 INTGR using only Wikipedia 0.2954 0.0779 0.4667 (6.60%) (2.77%) (0.09%) INTGR using only UMLS 0.3012 0.0786 0.5033 (8.67%) (3.93%) (7.93%) INTGR using UMLS and Wikipedia 0.3135 0.0873 0.5200 (13.14%) (15.17%) (11.52%) § the methods based on semantic concepts, UMLS is a better choice than Wikipedia with respect to all metrics, if only one knowledge repository is used § combining both knowledge bases results in better retrieval accuracy than using any one of them individually Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  29. 29. 23/27 Introduction Method Experiments Conclusions Comparison of INTGR with the baselines in terms of P@k for k ď 10: 0.3 0.4 0.5 0.6 1 2 3 4 5 6 7 8 9 10 k P@k (d) TREC 2014 CDS track topics 0.40 0.45 0.50 0.55 1 2 3 4 5 6 7 8 9 10 k P@k (e) TREC 2015 CDS track topics § INTGR has slightly lower accuracy improvement over its best-performing baseline and Two-Stage on the testing query set than on the training query set § Improvement that INTGR achieves over Two-Stage is much higher than the improvement of the best performing base- line over Two-Stage. Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  30. 30. 24/27 Introduction Method Experiments Conclusions Topic-level differences of the infNDCG values 16 27 15 10 19 24 29 9 26 4 2 25 12 18 23 21 20 8 6 11 22 3 13 17 28 30 1 5 7 14 mean = 0.0518 σ = 0.12 −0.2 0.0 0.2 0.4 Query ID infNDCG−difference (f) TREC 2014 CDS track topics § infNDCG of INTGR is greater than that of Wiki-TD‹ on 67% of the queries § Highest Improvement: 28-year-old female with neck pain and left arm numbness 3 weeks after working with stray animals. Physical exam initially remarkable for slight left arm tremor and spasticity. Three days later she presented with significant arm spasticity, diaphoresis, agitation, difficulty swallowing, and hydrophobia. § Lowest Degradation: 85-year-old man who was in a car accident 3 weeks ago, now with 3 days of progressively decreasing level of consciousness and impaired ability to perform activities of daily living. Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  31. 31. 24/27 Introduction Method Experiments Conclusions Topic-level differences of the infNDCG values 6 15 22 28 30 4 16 23 7 12 3 9 27 18 26 17 10 29 24 19 25 2 5 1 20 11 13 21 14 8 mean = 0.0345 σ = 0.0734 −0.2 0.0 0.2 0.4 Query ID infNDCG−difference (g) TREC 2015 CDS track topics § infNDCG of INTGR is greater than that of PQE on 73% of the queries § Highest Improvement:A 46-year-old woman with sweaty hands, exophthalmia, and weight loss despite increased eating. § Lowest Degradation:An 89-year-old man with progressive change in personality, poor memory, and myoclonic jerks. Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  32. 32. 25/27 Introduction Method Experiments Conclusions Introduction Method Experiments Conclusions Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  33. 33. 26/27 Introduction Method Experiments Conclusions Conclusions § We proposed a method to represent CDS queries using explicit concepts from the original query and the latent concepts from the top retrieved documents and knowledge bases § We proposed the features to individually weigh each query concept depending on its type and source § We proposed to use graduated optimization method to directly optimize the parameters of the concept based retrieval model with respect to the target retrieval metric § Proposed method significantly outperforms state-of-the-art baselines as well as best systems in CDS track of TREC 2014 and 2015 Balaneshin, Kotov Wayne State University Optimization Method for Weighting Explicit and Latent Concepts in Clinical Decision Support Queries
  34. 34. Many Thanks to the ACM SIGIR Student Travel Grant! 27/27

×