Be the first to like this
Accurately answering verbose queries that describe a clinical case and aim at finding articles in a collection of medical literature requires capturing many explicit and latent aspects of complex information needs underlying such queries. Proper representation of these aspects often requires query analysis to identify the most important query concepts as well as query transformation by adding new concepts to a query, which can be extracted from the top retrieved documents or medical knowledge bases. Traditionally, query analysis and expansion have been done separately. In this paper, we propose a method for representing verbose domain-specific queries based on weighted unigram, bigram, and multi-term concepts in the query itself, as well as extracted from the top retrieved documents and external knowledge bases. We also propose a graduated non-convexity optimization framework, which allows to unify query analysis and expansion by jointly determining the importance weights for the query and expansion concepts depending on their type and source. Experiments using a collection of PubMed articles and TREC Clinical Decision Support (CDS) track queries indicate that applying our proposed method results in significant improvement of retrieval accuracy over state-of-the-art methods for ad hoc and medical IR.