The need to recognise biomedical and clinical concepts in free text has been driven by demand for semantic information retrieval and decision support. Comprehensive, large-scale ontologies, such as the Foundational Model of Anatomy (FMA) and the Disease Ontology (DO), form the building blocks of the Unified Medical Language System (UMLS) and are the basis of dictionary-based biomedical concept recognisers such as MetaMap. However, these tools typically require substantial computing resources (disk space, memory and processing time) to run. Recently, regular-expression (regex) based concept recognisers such as mGrep have begun to address this shortcoming, but a method that allows researchers to create their own concept recogniser from a given ontology has yet to be described.
I present a method for semantic decomposition of biomedical ontologies, applied to the FMA and DO to create a high-performance tool for identifying anatomical and disease concepts in free text. The method involves 1) tokenising each ontology into distinct words, 2) extracting free and bound morphemes from the word list, 3) classifying each morpheme according to semantic type or grammatical role, 4) generating regexes over each morpheme set, and 5) applying simple grammatical rules over the regexes to identify potential concepts. I evaluate the precision and recall of the method against manually annotated clinical and biomedical corpora, and compare the results with the performance of 1) direct ontology lookup and 2) MetaMap on the same corpora.
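The five steps can be illustrated with a minimal Python sketch. It is not the tool described here: the term list, the hand-made classification tables and every function name are hypothetical stand-ins, and the morpheme extraction is reduced to stripping one known suffix.

```python
import re

# Toy stand-ins for ontology term labels (assumed; not real FMA/DO extracts).
TERMS = ["left ventricle", "right atrium", "gastritis", "hepatitis"]

# Hypothetical hand-made tables used to classify morphemes.
LATERALITY = {"left", "right"}
DISEASE_SUFFIXES = ("itis",)


def tokenize(terms):
    """Step 1: reduce the term list to distinct lower-cased words."""
    return sorted({w for t in terms for w in re.findall(r"[a-z]+", t.lower())})


def classify(words):
    """Steps 2-3: split off bound morphemes and sort everything into classes."""
    classes = {"modifier": set(), "anatomy": set(), "root": set(), "suffix": set()}
    for word in words:
        suffix = next((s for s in DISEASE_SUFFIXES if word.endswith(s)), None)
        if word in LATERALITY:
            classes["modifier"].add(word)
        elif suffix:
            classes["root"].add(word[: -len(suffix)])  # bound stem, e.g. "gastr"
            classes["suffix"].add(suffix)
        else:
            classes["anatomy"].add(word)  # free morpheme
    return classes


def alternation(items):
    """Step 4: one regex alternation per morpheme class, longest first."""
    ordered = sorted(items, key=lambda w: (-len(w), w))
    return "(?:" + "|".join(re.escape(w) for w in ordered) + ")"


def build_recogniser(terms):
    """Step 5: combine the class regexes with two simple grammatical rules."""
    c = {name: alternation(members) for name, members in classify(tokenize(terms)).items()}
    anatomy = rf"\b(?:{c['modifier']}\s+)?{c['anatomy']}\b"  # optional modifier + noun
    disease = rf"\b{c['root']}{c['suffix']}\b"               # stem + suffix
    return re.compile(rf"(?P<anatomy>{anatomy})|(?P<disease>{disease})", re.IGNORECASE)


def recognise(text, recogniser):
    """Return (concept type, matched text) pairs in order of appearance."""
    return [(m.lastgroup, m.group(0).lower()) for m in recogniser.finditer(text)]


recogniser = build_recogniser(TERMS)
found = recognise("Hepatitis was excluded; the left ventricle appeared normal.", recogniser)
```

Because the rules operate on morpheme classes rather than whole terms, this sketch also matches "left atrium", a combination that does not appear in the toy term list. The same property means a recogniser built this way can over-generate, which is why evaluation against annotated corpora matters.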
As measured by the Mann-Whitney rank sum test, the method demonstrates a significant (p < 0.01) improvement in accuracy over direct ontology lookup. Against MetaMap, it shows a measurable improvement in accuracy that is not statistically significant (p > 0.05), while reducing processing time by several orders of magnitude.
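For readers unfamiliar with the rank sum test, the following sketch shows how such a comparison might be computed. It is a simplified, assumed implementation: it uses the normal approximation without tie or continuity correction, and the function name is hypothetical. The inputs would be per-document scores for the two systems being compared; no data from this study is reproduced here.

```python
import math


def mann_whitney_u(xs, ys):
    """Return (U statistic for xs, two-sided p-value by normal approximation)."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])

    # Assign 1-based ranks, giving tied values the average of their ranks.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j

    n1, n2 = len(xs), len(ys)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0

    # Normal approximation to the null distribution of U (no tie correction).
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided
```

The normal approximation is only reasonable for moderately large samples; for small samples an exact test from a statistics library would be the safer choice.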